Database Design
Warning
Final MongoDB Supported Version: 0.7.0
0.7.0 is the last major release that supports MongoDB. Fractal is moving to a PostgreSQL database to make upgrades more stable and because PostgreSQL is better suited to the nature of QCArchive data. The Fractal developers will provide the upgrade path from MongoDB to PostgreSQL in the next release. Because the upgrade is complex, it will be performed through scripts that will be provided. After the PostgreSQL upgrade, built-in utilities will be available to upgrade the database.
QCArchive stores all its data and computations in a database in the backend of QCFractal. The DB is designed with extensibility in mind, allowing flexibility and easy accommodation of future features. The current DB storage backend is built on top of a non-relational DB, MongoDB, but it can be readily implemented in a relational DB such as MySQL or Postgres. In addition, Object Relational Mapping (ORM) is used to add structure and validation to MongoDB, which enforces neither by default. The ORM used is Mongoengine, the most popular general-purpose Python ORM for MongoDB.
The main idea behind the QCArchive DB design is to store and retrieve a wide range of quantum chemistry computations run with different programs and a variety of configurations. The DB also stores information about the jobs submitted to request computations and all their related data, along with registered users and computational managers.
The QCArchive DB is organized into a set of tables (or documents), each of which is detailed below.
1) Molecule
The molecule table stores molecules used in any computation in the system. The molecule structure is based on the standard QCSchema. It stores entries like geometry, masses, and fragment charges. Please refer to the QCSchema for a complete description of all the possible fields.
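As a rough illustration of the kind of entry described above, a water molecule in a QCSchema-style layout might look like the following. The field names follow the QCSchema molecule specification as understood here, the numeric values are illustrative only, and `check_molecule` is a hypothetical helper rather than part of QCFractal. The QCSchema remains the authoritative list of fields.

```python
# Illustrative QCSchema-style molecule; values are examples, not reference data.
water = {
    "symbols": ["O", "H", "H"],
    # Flat list of x, y, z coordinates (Bohr), three numbers per atom.
    "geometry": [
        0.0, 0.0, -0.1295,
        0.0, -1.4942, 1.0274,
        0.0, 1.4942, 1.0274,
    ],
    "masses": [15.9949, 1.0078, 1.0078],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
    # A single fragment containing all three atoms, with its own charge.
    "fragments": [[0, 1, 2]],
    "fragment_charges": [0.0],
}


def check_molecule(mol):
    """Hypothetical shape check: three coordinates and one mass per atom,
    and one charge per fragment."""
    n_atoms = len(mol["symbols"])
    return (
        len(mol["geometry"]) == 3 * n_atoms
        and len(mol["masses"]) == n_atoms
        and len(mol["fragments"]) == len(mol["fragment_charges"])
    )
```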
2) Keyword
Keywords are a store of key-value pairs that hold the configuration for a QC program. The store is flexible, and there is no restriction on what configuration can be kept here. This table is referenced by the Result table.
3) Result
This table stores the actual computation results along with the attributes used to calculate them. Each entry is a single unit of computation. The following set of keys (or indices) uniquely defines a result:
- driver: the type of calculation being evaluated (i.e. energy, gradient, hessian, properties)
- program: such as gamess or psi4 (lower case)
- molecule: the ID of the molecule in the Molecule table
- method: the method used in the computation (b3lyp, mp2, ccsd(t))
- keywords: the ID of the keywords in the Keywords table
- basis: the name of the basis used in the computation (6-31g, cc-pvdz, def2-svp)
For more information see: Results.
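To make the idea of a unique key concrete, the following is a minimal sketch of how the six fields could act as a composite key that prevents duplicate results. It uses an in-memory dictionary rather than a real database, and `ResultKey`, `make_key`, and `ResultStore` are hypothetical names, not QCFractal's implementation. Lower-casing the program name follows the note above; treating `keywords` and `basis` as optional is an assumption.

```python
from typing import NamedTuple, Optional


class ResultKey(NamedTuple):
    """The six fields that together identify one result (hypothetical type)."""
    driver: str
    program: str
    molecule: str            # ID of an entry in the Molecule table
    method: str
    keywords: Optional[str]  # ID of an entry in the Keywords table
    basis: Optional[str]


def make_key(driver, program, molecule, method, keywords=None, basis=None):
    # Program names are stored in lower case, so normalize before comparing.
    return ResultKey(driver, program.lower(), molecule, method, keywords, basis)


class ResultStore:
    """In-memory stand-in for a table with a unique composite index."""

    def __init__(self):
        self._results = {}

    def add(self, key, value):
        """Insert a result; return False if the key already exists."""
        if key in self._results:
            return False
        self._results[key] = value
        return True

    def get(self, key):
        return self._results.get(key)
```

With this sketch, submitting the same computation twice (for example once with `"Psi4"` and once with `"psi4"`) maps to a single stored entry.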
4) Procedure
Procedures are also computational results, but of a more complex kind. They perform aggregate computations such as optimizations, torsion drives, and grid optimizations. The DB can support new types of procedures by inheriting from the base procedure table. Each procedure usually references several results from the Result table, and possibly other procedures (self-reference).
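The reference structure described above can be sketched as a small tree walk. This is only an illustration under assumed names: the `Procedure` dataclass, its fields, and `collect_result_ids` are hypothetical and do not mirror QCFractal's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Procedure:
    """Hypothetical procedure record holding references by ID."""
    id: str
    kind: str                                                # e.g. "optimization"
    result_ids: List[str] = field(default_factory=list)      # entries in Result
    procedure_ids: List[str] = field(default_factory=list)   # self-references


def collect_result_ids(proc_id, procedures, _seen=None):
    """Return every result ID reachable from a procedure, following
    references to other procedures and skipping any already visited."""
    seen = set() if _seen is None else _seen
    if proc_id in seen:
        return []
    seen.add(proc_id)
    proc = procedures[proc_id]
    found = list(proc.result_ids)
    for child_id in proc.procedure_ids:
        found.extend(collect_result_ids(child_id, procedures, seen))
    return found
```

For example, a torsion drive that references two optimizations would resolve to the union of the results those optimizations reference.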
5) Services
Services are more flexible workflows that eventually produce results to be stored in the Result and/or Procedure tables when they are done. From the DB point of view, this is therefore an intermediate table for ongoing iterative computations.
More about services in QCArchive can be found here: Services.
6) TaskQueue
This table is the main task queue of the system. Tasks are submitted to this table by QCFractal and wait for a manager to pull them for computation. Each task in the queue references a Result or a Procedure, meaning that it corresponds to a specific quantum chemistry computation. The table stores the status of the task (WAITING, RUNNING, COMPLETE, or ERROR) and also keeps track of the execution manager and the modification dates.
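The task life cycle described above can be sketched with a small in-memory queue. The four status names come from the text; everything else (the `Task` fields, `TaskQueue.claim`, `TaskQueue.finish`) is a hypothetical illustration, not QCFractal's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class TaskStatus(Enum):
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"


def _now():
    return datetime.now(timezone.utc)


@dataclass
class Task:
    base_result: str                    # ID of a Result or Procedure entry
    status: TaskStatus = TaskStatus.WAITING
    manager: Optional[str] = None       # name of the manager running the task
    modified_on: datetime = field(default_factory=_now)


class TaskQueue:
    def __init__(self):
        self.tasks: List[Task] = []

    def submit(self, base_result):
        task = Task(base_result)
        self.tasks.append(task)
        return task

    def claim(self, manager, limit=1):
        """Hand up to `limit` waiting tasks to a manager, marking them RUNNING."""
        claimed = []
        for task in self.tasks:
            if len(claimed) >= limit:
                break
            if task.status is TaskStatus.WAITING:
                task.status = TaskStatus.RUNNING
                task.manager = manager
                task.modified_on = _now()
                claimed.append(task)
        return claimed

    def finish(self, task, success=True):
        """Record the outcome of a task and update its modification date."""
        task.status = TaskStatus.COMPLETE if success else TaskStatus.ERROR
        task.modified_on = _now()
```

A real deployment would need the claim step to be atomic in the database so that two managers cannot pull the same task; the sketch ignores concurrency.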
7) QueueManagers
Managers are the registered servers that compute tasks from the TaskQueue. This table keeps information about each server, such as the host, cluster, and number of completed tasks, submissions, and failures.
The database only keeps track of what Tasks have been handed out to each Manager and maintains a heartbeat to ensure the Manager is still connected. More information about the configuration and execution of managers can be found here: Fractal Queue Managers.
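The heartbeat bookkeeping mentioned above might be sketched as follows. The `ManagerRegistry` class, its method names, and the five-minute timeout are all assumptions made for illustration; the source does not state how QCFractal decides that a manager has disconnected.

```python
from datetime import datetime, timedelta, timezone


class ManagerRegistry:
    """Hypothetical record of manager heartbeats and handed-out tasks."""

    def __init__(self, timeout=timedelta(minutes=5)):
        self.timeout = timeout   # assumed value, not taken from QCFractal
        self.last_seen = {}      # manager name -> time of last heartbeat
        self.assigned = {}       # manager name -> set of task IDs

    def heartbeat(self, name, now=None):
        """Record that a manager checked in."""
        self.last_seen[name] = now or datetime.now(timezone.utc)
        self.assigned.setdefault(name, set())

    def assign(self, name, task_id):
        """Record that a task was handed out to a manager."""
        self.assigned.setdefault(name, set()).add(task_id)

    def inactive(self, now=None):
        """Names of managers whose last heartbeat is older than the timeout."""
        now = now or datetime.now(timezone.utc)
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

Tasks assigned to a manager reported by `inactive` could then be returned to the queue, although that recovery step is outside the scope of this sketch.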