PROJECT Data Storage
The PROJECT data storage facility, which occupies five server cabinets (racks), has been fully available to users since March 2021. As its name suggests, it is used for storing and backing up data processed or generated on the Ostrava supercomputers. Besides being built on state-of-the-art technology, specifically IBM's Spectrum Scale solution, it is also much more user-friendly for researchers who rely on supercomputers. The solution promises ample capacity and reliable, fast operation, and it runs as a central system, i.e., independently of any single supercomputer. "This means that once project data is uploaded to the storage, users can access it from any supercomputer they are currently using to perform their calculations. Unlike the previous practice, which required having copies of the data on every supercomputer, this system brings significant time savings to users of Ostrava's supercomputing infrastructure," says Radovan Pasek, Head of HPC Operations and Administration at IT4Innovations.
Another advantage of the PROJECT storage facility is that it will serve not only the specific data needs of the existing Karolina, Barbora, and NVIDIA DGX-2 supercomputers but, thanks to its expected lifespan of 8 to 10 years, also those of their successors. In addition, it has sufficient capacity for projects with long life cycles, typically longer than one year, which was another reason for its acquisition.
One of the most important parameters of the PROJECT storage facility is its capacity: 15 petabytes. How can one picture such a number? "Let's compare the PROJECT storage facility to a 1TB external drive that a typical PC user uses to back up their data. If this user wanted to match our high-capacity storage, he or she would have to buy exactly 15,000 such external drives," calculates Roman Sliva, Senior HPC Architect at IT4Innovations.
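The comparison above is a simple unit conversion. The short Python sketch below reproduces it, assuming decimal (SI) units as drive manufacturers use them, i.e. 1 PB = 1,000 TB; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, as used by drive manufacturers: 1 PB = 1,000 TB.
TB_PER_PB = 1_000

project_capacity_pb = 15  # PROJECT storage capacity in petabytes
external_drive_tb = 1     # a typical 1 TB external backup drive

# Number of 1 TB drives needed to match 15 PB.
drives_needed = project_capacity_pb * TB_PER_PB // external_drive_tb
print(f"{drives_needed:,} external drives")  # prints "15,000 external drives"
```

With binary units (1 PiB = 1,024 TiB) the count would differ slightly, which is why the unit convention is stated explicitly.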
Other PROJECT storage parameters include:
· The storage consists of a combination of faster SSDs and slower NL-SATA drives, which together with other infrastructure elements are arranged in three independent blocks for high availability and easy data replication.
· The system achieves a total throughput of 39 GB/s and handles up to 57,000 I/O operations per second.
· Although the capacity is currently sufficient, it is already clear that it will have to be increased in the future, given the ever-growing volume of data handled by research projects. Thanks to the modular design of the solution, the storage can be expanded when necessary.
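To put the capacity and throughput figures side by side, the sketch below estimates how long it would take to write the full 15 PB at 39 GB/s. It is an idealized lower bound under loudly stated assumptions: decimal units, and the aggregate peak throughput sustained for the entire transfer, which real workloads would not achieve.

```python
# Idealized estimate: time to fill the PROJECT storage at peak throughput.
# Assumptions: decimal units (1 PB = 10**15 B, 1 GB = 10**9 B) and that the
# 39 GB/s aggregate peak is sustained for the whole transfer.
SECONDS_PER_DAY = 86_400

capacity_bytes = 15 * 10**15          # 15 PB
throughput_bytes_per_s = 39 * 10**9   # 39 GB/s

seconds = capacity_bytes / throughput_bytes_per_s
days = seconds / SECONDS_PER_DAY
print(f"{seconds:,.0f} s, about {days:.1f} days")
```

Under these assumptions the result is roughly 384,600 seconds, or about four and a half days.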
The PROJECT storage facility was supplied by DATERA, a company specializing in the design and implementation of data storage systems. The procurement was funded under the IT4Innovations National Supercomputing Center – Path to exascale project.
Data storage facility of the CESNET association
With a capacity of an incredible 33.673 petabytes, it is one of the largest data storage facilities in the country. Besides its capacity, its weight is also worth mentioning: 3.3 tons. It occupies a total of six racks. The solution contains 1,776 Seagate Exos X18 rotating drives, each with a capacity of 18 terabytes, as well as 74 Lenovo ThinkSystem SR635 servers for data storage and 12 servers for monitoring the entire system.
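Multiplying out the hardware figures gives a sense of scale. The sketch below computes the raw capacity of the rotating drives alone (decimal units, no allowance for redundancy or file-system overhead) and the average number of drives per storage server; an even spread of drives across servers is an assumption, since the source does not describe the layout, nor how the quoted total capacity was calculated.

```python
# Raw capacity of the rotating drives only (decimal units).
# Assumption: no redundancy or file-system overhead is subtracted.
drive_count = 1776
drive_capacity_tb = 18   # Seagate Exos X18, 18 TB each
storage_servers = 74     # Lenovo ThinkSystem SR635 data-storage servers

raw_tb = drive_count * drive_capacity_tb
raw_pb = raw_tb / 1_000
print(f"{raw_tb:,} TB = {raw_pb:.3f} PB raw")  # prints "31,968 TB = 31.968 PB raw"

# Average drives per storage server, assuming an even distribution.
drives_per_server = drive_count / storage_servers
print(f"{drives_per_server:.0f} drives per server on average")  # prints "24 drives per server on average"
```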
The CESNET association and the e-INFRA CZ project, which includes IT4Innovations and CERIT-SC in addition to CESNET, are behind this large-scale data storage facility. It will serve educational institutions in the Czech Republic for the transfer, storage, archiving, and processing of scientific data.
The solution, worth CZK 39.9 million, was supplied by M Computers, a Czech company with experience in HPC installations. For more information, see www.mcomputers.cz/en/2022/03/23/we-built-one-of-the-largest-data-storage-facilities-in-the-czech-republic/.
The procurement of the PROJECT data storage was funded by the OP RDE project IT4Innovations National Supercomputing Center – Path to exascale (project ID: CZ.02.1.01/0.0/0.0/16_013/0001791).