Data Capacitor: Frequently Asked Questions

What is the Data Capacitor?
Who can use the Data Capacitor?
How do I access the Data Capacitor?
How much space can I use?
How long can I keep data on the Data Capacitor?
Is the Data Capacitor backed up?
What is the advantage of using the Data Capacitor?
Should I use the Data Capacitor differently than my local disk?
What is a Lustre file system?
My application is I/O bound, what can I do?

What is the Data Capacitor?

The Data Capacitor is a high-performance storage system built for processing and moving large volumes of data across distributed systems. At its core it is a high-speed, high-capacity file system designed for short- to mid-term storage of large research data sets.

Who can use the Data Capacitor?

Data Capacitor file space is divided into two categories: Project and Scratch.

The Data Capacitor projects directory is dedicated to long-term projects whose storage requirements cannot be met by other existing systems. Requests for project space are submitted to Team Data Capacitor and evaluated by the Data Capacitor Allocation Committee. (Request project space here)

A project request must include:

  • Project proposal and justification
  • Project participants
  • Storage requested (TB)
  • Data rate required
  • Special requests (for example, additional mounts)
  • Project name, used as the directory name: /N/dc/projects/project_name

The default project allocation is 10 TB. Projects that need more than 10 TB must submit a written request, which is reviewed by the Data Capacitor Allocation Committee.
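As a rough way to see how close a directory tree is to the 10 TB default, the following Python sketch sums the sizes of the files beneath it. It is an illustration only: the function names are hypothetical, it assumes decimal terabytes (10^12 bytes), and it counts apparent file sizes, which may differ from how the system itself accounts for quota.

```python
import os
import stat

# Default project allocation from this FAQ. Assumption: decimal units,
# 1 TB = 10**12 bytes; the system's own quota accounting may differ.
DEFAULT_QUOTA_BYTES = 10 * 10**12


def directory_usage_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # the file was removed while the walk was running
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def exceeds_default_quota(used_bytes: int, quota_bytes: int = DEFAULT_QUOTA_BYTES) -> bool:
    """True when used_bytes is larger than the default 10 TB allocation."""
    return used_bytes > quota_bytes
```

For example, directory_usage_bytes("/N/dc/projects/project_name") returns a byte count that can be passed to exceeds_default_quota. Walking a tree that holds many files is itself a metadata-heavy operation, so it is best run occasionally rather than in a loop.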

Files in project space that have not been accessed for more than 30 days may be purged.

The Data Capacitor scratch directory is a temporary workspace currently available to all users of Big Red. Scratch space is not allocated to individual users, and its total capacity fluctuates with the amount of space allocated to projects.

Files in scratch space may be purged after 14 days.

How do I access the Data Capacitor?

The Data Capacitor is mounted on Big Red as /N/dc/... and behaves like any other file system on that machine. Anyone with a Big Red account can access /N/dc/scratch. Access to /N/dc/projects requires a project allocation (see "Who can use the Data Capacitor?" above).

How much space can I use?

The entire system provides up to 524 TB of space, shared by all of its users. By default, each project receives a quota of 10 TB; larger quotas can be requested if more space is needed. Storing a large number of small files is discouraged for performance reasons, but arrangements can be made if there is a genuine need.

Scratch space consists of whatever portion of the Data Capacitor is not allocated to projects, so the amount available varies with project use.
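One common way to avoid storing many small files is to bundle them into a single archive before placing them on the Data Capacitor. The sketch below uses Python's standard tarfile module; pack_directory is a hypothetical helper name, and whether an uncompressed tar is the best container for a given workload is an assumption worth checking.

```python
import os
import tarfile


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Bundle src_dir into one uncompressed tar archive and return the number of regular files packed.

    The archive unpacks into a single top-level directory named after src_dir.
    Write archive_path OUTSIDE src_dir, otherwise the archive would try to include itself.
    Use mode "w:gz" instead of "w" if compression is worth the extra CPU time.
    """
    src_dir = os.path.abspath(src_dir)  # also strips any trailing slash
    with tarfile.open(archive_path, "w") as tar:
        # TarFile.add recurses into directories by default.
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    with tarfile.open(archive_path, "r") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())
```

One archive of many small files turns thousands of metadata operations into a handful, at the cost of having to unpack (or read through tarfile) to reach an individual file.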

How long can I keep data on the Data Capacitor?

Project files that have not been accessed for more than 30 days may be deleted, and scratch files older than 14 days may be deleted.

It is the user's responsibility to arrange long-term storage for any data on the system.
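To see which files are at risk under the purge policy described above, you can list files whose last access time is older than the limit (30 days for project space, 14 days for scratch). This Python sketch assumes the purge is driven by access time, as stated for project space; the actual purge tool may use different criteria, and access times may not be updated on every read if the file system is mounted with options such as noatime or relatime. The function name is hypothetical.

```python
import os
import stat
import time
from typing import List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def purge_candidates(root: str, max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Return regular files under root whose last access time is more than max_age_days old.

    Assumption: access time (st_atime) is the purge criterion. `now` can be fixed for testing.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat does not follow symlinks
            except FileNotFoundError:
                continue  # removed while walking
            if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                old.append(path)
    return sorted(old)
```

For example, purge_candidates("/N/dc/scratch/username", 14) would list scratch files not read in the last two weeks (the username path is a placeholder), which are the ones to copy elsewhere first.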

Is the Data Capacitor backed up?

No. The Data Capacitor is not intended for long-term data storage. Data stored or created on the Data Capacitor can be archived to HPSS using, for example, hsi or any of the other methods used to access HPSS.

What is the advantage of using the Data Capacitor?

The Data Capacitor reads and writes data across multiple physical disks at once, which yields substantially higher read/write speeds than a single disk can deliver. It also supports very large files, including files larger than the capacity of any single drive.

Should I use the Data Capacitor differently than my local disk?

The Data Capacitor stores metadata (file names, sizes, etc.) differently than other file systems. Directories containing a very large number of files (more than 10,000) respond slowly to metadata operations such as ls. Files are also striped across several storage devices to improve read/write speed, and data is transferred to and from these stripes in 1 MB blocks. For these reasons, a few large files will perform better than many small files.
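Directories that exceed the 10,000-entry guideline above can be found with a short walk of a directory tree. This Python sketch is illustrative: crowded_directories is a hypothetical name, and the walk itself performs the metadata operations that are slow on crowded directories, so it is best run sparingly.

```python
import os
from typing import List, Tuple

# Threshold from this FAQ: above it, ls and other metadata operations slow down.
MAX_ENTRIES = 10_000


def crowded_directories(root: str, limit: int = MAX_ENTRIES) -> List[Tuple[str, int]]:
    """Return (directory, entry_count) pairs under root with more than `limit` entries, largest first."""
    crowded = []
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)  # subdirectories and files both count as entries
        if count > limit:
            crowded.append((dirpath, count))
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

Any directory this reports is a candidate for splitting into subdirectories or for bundling its contents into a few larger files.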

What is a Lustre file system?

Lustre is an open-source parallel distributed file system, a product of Cluster File Systems (CFS), Inc.

My application is I/O bound, what can I do?

The Lustre file system allows a file to be striped across several storage devices, so that different parts of the file can be read or written in parallel. This can significantly improve I/O rates.
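The idea behind striping can be shown with a simple model: a file is split into fixed-size blocks that are dealt out round-robin across the storage devices in its layout, so consecutive blocks land on different devices and can be transferred in parallel. The Python sketch below assumes the 1 MB block size described in this FAQ, numbers devices by their position within the file's layout (the real system may start a layout on any device), and uses hypothetical function names; it models the layout and does not query Lustre.

```python
from typing import List, Tuple

STRIPE_SIZE = 1 << 20  # 1 MB blocks, as described in this FAQ


def stripe_location(offset: int, stripe_count: int, stripe_size: int = STRIPE_SIZE) -> Tuple[int, int]:
    """Map a byte offset in a file to (device index, offset within that device's part of the file)."""
    if offset < 0 or stripe_count < 1 or stripe_size < 1:
        raise ValueError("offset must be >= 0; stripe_count and stripe_size must be >= 1")
    block = offset // stripe_size       # which block of the file holds this byte
    device = block % stripe_count       # blocks are dealt round-robin to devices
    row = block // stripe_count         # full rounds completed before this block
    return device, row * stripe_size + offset % stripe_size


def bytes_per_device(file_size: int, stripe_count: int, stripe_size: int = STRIPE_SIZE) -> List[int]:
    """How many bytes of a file of file_size land on each of stripe_count devices."""
    if file_size < 0 or stripe_count < 1 or stripe_size < 1:
        raise ValueError("file_size must be >= 0; stripe_count and stripe_size must be >= 1")
    full_blocks, tail = divmod(file_size, stripe_size)
    base, extra = divmod(full_blocks, stripe_count)
    # Every device gets `base` full blocks; the first `extra` devices get one more.
    totals = [base * stripe_size + (stripe_size if d < extra else 0) for d in range(stripe_count)]
    if tail:
        totals[extra] += tail  # the partial last block goes to the next device in the rotation
    return totals
```

For example, with four devices, the first 1 MB of a file goes to device 0, the next 1 MB to device 1, and so on, and byte 5 MB + 10 falls on device 1 at offset 1 MB + 10 within that device's share. Because a large sequential read touches every device in turn, spreading a big file over more devices lets more disks work on it at once.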