Research Insights on IHEP Disk Storage Systems and Cloud Computing Integration


Ms. Wang Lu, Ph.D., an associate researcher at CC-IHEP, specializes in the management and analysis of the IHEP disk storage system, IO behavior analysis of HEP jobs, and the integration of cloud storage with HEP computing. Her research involves software development, evaluation, and problem diagnosis in the context of large-scale storage systems. The IHEP disk storage system is built on Lustre, with a total space of around 3PB and an aggregated throughput of 40GB/s. Ms. Wang is also actively involved in collecting and analyzing IO behavior data, with plans to apply more advanced machine learning algorithms to storage optimization. Additionally, she explores the potential benefits of integrating cloud storage with existing systems for increased elasticity, cost-effectiveness, and data reliability.


Uploaded on Sep 28, 2024



Presentation Transcript


  1. Summary
  Ms. Wang Lu, Ph.D., Associate Researcher
  Email: wanglu@ihep.ac.cn
  Works at CC-IHEP on:
  - Management of the IHEP disk storage system
  - IO behavior analysis of HEP jobs
  - Integration of cloud storage with HEP computing
  - Related software development, evaluation, and problem diagnosis

  2. The IHEP disk storage system (1)
  The current system is built on open-source Lustre with a relatively simple architecture.
  - Lustre version: 1.8.8
  - Network: 10G Ethernet
  - Installation: Quattor/Puppet
  - Monitoring: Ganglia
  - Problem detection: Nagios
  - Diagnosis: Crashdump/GDB/SystemTap
  - Usage statistics: Robinhood (developed by CEA) and homemade scripts
  Total space is around 3PB with an aggregated throughput of 40GB/s, divided into 8 mount points serving different experiments and purposes. The system has been running for about 6 years and is probably the second largest Lustre system in the HEP community.

  3. The IHEP disk storage system (2)
  Several problems were identified before the next wave of HEP data:
  - Hardware-dependent data redundancy: cost of disk array controllers, potential risk to data reliability
  - Single metadata server design: limited metadata performance
  Evaluation of new storage technologies (GlusterFS, Ceph (planned), Loogstore, Dawoo Storage, ParaStore) is under way, aiming at:
  - Software-level data redundancy, for higher data durability
  - Better metadata server designs, for better metadata performance and reliability
  Production usage of GlusterFS/ParaStore is at the hundred-TB scale; it is too early to draw a conclusion.
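The slide contrasts hardware-dependent redundancy (disk array controllers) with software-level redundancy. As an illustration only, not the scheme any of the named systems actually uses, the simplest software-level redundancy is a RAID-5-style XOR parity block computed across data blocks, which lets one lost block be reconstructed:

```python
def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of equal-size data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover_missing(blocks, parity):
    """Reconstruct the single missing block (given as None) from the parity.

    XOR-ing the parity with all surviving blocks cancels them out,
    leaving exactly the bytes of the missing block.
    """
    missing = bytearray(parity)
    for block in blocks:
        if block is not None:
            for i, b in enumerate(block):
                missing[i] ^= b
    return bytes(missing)
```

Real systems use replication or erasure codes that tolerate more than one failure, but the durability argument is the same: redundancy is computed in software across commodity disks instead of being locked inside a proprietary array controller.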

  4. IO behavior collection and analysis (1)
  Supported by NSFC. The results can serve as input for storage optimization, requirements analysis, and long-term storage planning.

  5. IO behavior collection and analysis (2)
  - Collected over 1 million job samples
  - With a correct training set and a logistic regression algorithm, classification accuracy is about 85%
  - More sophisticated machine learning algorithms over large sets of job samples are planned:
    - Neural networks / SVM
    - On-line training
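The slide reports about 85% accuracy with logistic regression but does not describe the features used. A minimal sketch of the technique, with hypothetical features (e.g. read fraction and normalized mean request size per job) invented for illustration:

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Train a binary logistic regression classifier by stochastic gradient descent.

    X: list of feature vectors; y: list of 0/1 labels (e.g. job IO classes).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - yi                      # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Classify a feature vector: 1 if the predicted probability >= 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0
```

In production one would use an established library; the point here is only that the classifier is small enough to retrain online as new job samples arrive, which is what the "on-line training" item anticipates.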

  6. Integration of Cloud Storage with HEP computing (1)
  Goal: understand the possibility of using cloud storage as a supplement to current storage systems, with merits such as elasticity, cost-effectiveness, and high data reliability. Joint work with Fabio Hernandez.
  Our approaches:
  - Implemented an IO plugin for ROOT, so that physicists and backend jobs can access data on cloud storage without modifying their programs.
  - Developed a FUSE-based cloud file system to bridge the protocol difference between cloud storage and the Linux file system.
  - Investigated several cloud storage GUIs provided by third parties.
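Both the ROOT IO plugin and the FUSE file system have to bridge the same gap: POSIX-style reads at arbitrary offsets versus object stores that serve whole objects over HTTP. A common bridging technique (assumed here; the slide does not detail the actual implementation) is to map each `read(offset, size)` onto an HTTP ranged GET. A sketch, with `get` standing in for any hypothetical HTTP client wrapper:

```python
def range_header(offset, size):
    """Build the HTTP Range header for reading `size` bytes at `offset`
    from an object store that supports partial GETs (e.g. S3-style APIs)."""
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + size - 1}"}

def read_from_object(get, key, offset, size):
    """Serve a POSIX-like read(offset, size) via a ranged GET.

    `get` is any callable (key, headers) -> bytes, i.e. an HTTP client
    wrapper supplied by the caller; only the requested byte range travels
    over the network, not the whole object.
    """
    return get(key, range_header(offset, size))
```

This is why unmodified ROOT jobs can work: the plugin only has to answer the byte-range reads ROOT already issues, translating each one into a partial GET.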

  7. Integration of Cloud Storage with HEP computing (2)
  Three possible scenarios:
  - Storage of physics data at small sites
  - Data sharing among participating sites
  - Storage backend for the enormous number of small files
  A remote collaborative evaluation is needed before a conclusive opinion can be drawn. CC-IN2P3, which has a good network connection and an existing collaboration with IHEP, is a potential partner.

  8. Past Projects
  - R&D on a distributed file system, July 2011 - March 2012: dynamic replicas, read-ahead, multithreaded IO features
  - Design and optimization of the metadata server of the GRASS storage system, PhD thesis, Sept. 2009 - Mar. 2011: in-memory metadata server design, Bloom-filter-based optimization, DRBD-based data HA
  - Deployment of an animated monitor for an HSM storage system, Sept. 2007 - Sept. 2008: based on Adobe Flex and the Cairngorm framework
  - Design and implementation of a BES file-set operating service over the BES grid, Master's thesis, Sept. 2005 - Sept. 2007: grid middleware design and development, TCP tuning, replica selection
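The thesis work mentions a Bloom-filter-based optimization for the metadata server. The slide gives no details, but the standard use of a Bloom filter in this setting is to answer "does this entry possibly exist?" in memory, so that lookups for nonexistent names never touch the metadata store. A minimal sketch:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic set membership.

    might_contain() never returns False for an added item (no false
    negatives); it may rarely return True for an item never added
    (false positives), which only costs one unnecessary real lookup.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k independent bit positions by salting one strong hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For a metadata server, a "definitely not present" answer from the in-memory filter short-circuits the expensive path, which is the usual rationale for this optimization.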

  9. Thanks for your time!
