CHESSComputing/ChessDataManagement

Chess Data Management service

Introduction

The CHESS data flow has been discussed in this document.

Here we propose a possible architecture for CHESS data management based on gradual enhancement of the existing infrastructure:

[Figure: ChessDataManagement architecture diagram]

In particular, we propose to introduce the following components:

  • MetaData DB based on MongoDB or a similar document-oriented database. Such a solution should provide the following features:

    • be able to handle free-structured text documents
    • provide a rich query language (QL)
  • Files DB based on any relational database, e.g. MySQL or its free alternative MariaDB. The purpose of this database is to provide data bookkeeping capabilities and organize meta-data in the following form:

    • a dataset is a collection of files (or blocks)
    • each dataset name may carry an Experiment name and additional meta-data information
    • organize files in specific data-tiers, e.g. RAW for raw data, AOD for processed data, etc.
    • as such each dataset will have a form of a path: /Experiment/Processing/Tier
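
The dataset naming convention above can be illustrated with a minimal Go sketch. It assumes that a dataset path has exactly three non-empty components; the `Dataset` type, its field names, `ParseDataset`, and the sample path are hypothetical and are not taken from the actual Files DB schema.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Dataset mirrors the three components of a dataset path.
// The type and field names are illustrative assumptions only.
type Dataset struct {
	Experiment string
	Processing string
	Tier       string
}

// Path renders the dataset in the /Experiment/Processing/Tier form.
func (d Dataset) Path() string {
	return "/" + d.Experiment + "/" + d.Processing + "/" + d.Tier
}

// ParseDataset splits a dataset path into its three components. It rejects
// paths without a leading slash, with empty components, or with a number of
// components other than three.
func ParseDataset(path string) (Dataset, error) {
	if !strings.HasPrefix(path, "/") {
		return Dataset{}, errors.New("dataset path must start with /")
	}
	parts := strings.Split(strings.TrimPrefix(path, "/"), "/")
	if len(parts) != 3 {
		return Dataset{}, fmt.Errorf("expected 3 path components, got %d", len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return Dataset{}, errors.New("dataset path components must not be empty")
		}
	}
	return Dataset{Experiment: parts[0], Processing: parts[1], Tier: parts[2]}, nil
}

func main() {
	// The path below is a made-up example, not a real CHESS dataset.
	d, err := ParseDataset("/ExperimentA/pass1/RAW")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Experiment, d.Processing, d.Tier, d.Path())
}
```

Keeping the three components as separate fields would let a relational Files DB index each of them individually, while the path form stays a convenient user-facing name.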

Both databases may reside in their own data-service called MetaData Service. Such a service can provide RESTful APIs for end-users, such as

  • inject data to DBs
  • fetch results
  • update data in DBs
  • delete data in DBs
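
One possible mapping of these four operations onto HTTP methods is sketched below in Go. This is only an illustration under stated assumptions: the `/records/<id>` endpoint, the `Store` type, and the in-memory map standing in for the two databases are all hypothetical, and the actual service may expose different URLs and payloads.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Store is an in-memory stand-in for the MetaData and Files databases.
type Store struct {
	mu   sync.Mutex
	data map[string]string // record id -> raw document
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// ServeHTTP maps HTTP methods on the hypothetical /records/<id> endpoint to
// the four operations: POST injects, GET fetches, PUT updates, DELETE deletes.
func (s *Store) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	id := strings.TrimPrefix(r.URL.Path, "/records/")
	if id == "" || id == r.URL.Path {
		http.Error(w, "expected /records/<id>", http.StatusBadRequest)
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	_, exists := s.data[id]
	switch r.Method {
	case http.MethodPost, http.MethodPut:
		if r.Method == http.MethodPost && exists {
			http.Error(w, "record already exists", http.StatusConflict)
			return
		}
		if r.Method == http.MethodPut && !exists {
			http.NotFound(w, r)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		s.data[id] = string(body)
		w.WriteHeader(http.StatusNoContent)
	case http.MethodGet:
		if !exists {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, s.data[id])
	case http.MethodDelete:
		if !exists {
			http.NotFound(w, r)
			return
		}
		delete(s.data, id)
		w.WriteHeader(http.StatusNoContent)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// call sends one request to the handler in-process (no network socket)
// and returns the response status code and body.
func call(h http.Handler, method, path, body string) (int, string) {
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	s := NewStore()
	fmt.Println(call(s, http.MethodPost, "/records/1", `{"tier":"RAW"}`))
	fmt.Println(call(s, http.MethodGet, "/records/1", ""))
	fmt.Println(call(s, http.MethodPut, "/records/1", `{"tier":"AOD"}`))
	fmt.Println(call(s, http.MethodDelete, "/records/1", ""))
}
```

In a deployed service the same handler could be passed to `http.ListenAndServe`; the in-process `call` helper is used here only so the sketch runs without opening a port.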

In addition, we suggest introducing an Input Data Service which can take care of standardizing user inputs, e.g. key-value pairs, tagging, etc. It is not required initially, but in the long run it will help to provide a uniform data representation for the MetaData Service.
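
As a rough illustration of what such standardization could look like, the Go sketch below normalizes user-supplied key-value pairs and tags. The specific rules (lower-casing, trimming whitespace, replacing spaces in keys with underscores, de-duplicating tags) and the `Normalize` and `NormalizeTags` names are assumptions made for this example, not the service's actual behavior.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Normalize returns a copy of the user-supplied key-value pairs with keys
// lower-cased, inner whitespace in keys replaced by underscores, and
// surrounding whitespace trimmed from keys and values. Pairs with an empty
// key are dropped. It returns an error if two input keys collapse to the
// same normalized key, since the result would otherwise be ambiguous.
func Normalize(input map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(input))
	for k, v := range input {
		key := strings.Join(strings.Fields(strings.ToLower(k)), "_")
		if key == "" {
			continue
		}
		if _, dup := out[key]; dup {
			return nil, fmt.Errorf("duplicate key after normalization: %q", key)
		}
		out[key] = strings.TrimSpace(v)
	}
	return out, nil
}

// NormalizeTags splits a comma-separated tag string into a sorted,
// de-duplicated list of lower-case tags.
func NormalizeTags(raw string) []string {
	seen := make(map[string]bool)
	var tags []string
	for _, t := range strings.Split(raw, ",") {
		t = strings.ToLower(strings.TrimSpace(t))
		if t == "" || seen[t] {
			continue
		}
		seen[t] = true
		tags = append(tags, t)
	}
	sort.Strings(tags)
	return tags
}

func main() {
	// The keys and values below are made-up sample input.
	rec, err := Normalize(map[string]string{" Beam Energy ": " 12.4 ", "Sample": "Si"})
	fmt.Println(rec, err)
	fmt.Println(NormalizeTags("Raw, calibration, RAW"))
}
```

Rejecting colliding keys, rather than silently keeping one of them, keeps the output deterministic regardless of Go's map iteration order.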

Finally, data access can be organized via an XRootD service.

References

  1. Server
  2. Client
  3. Maintenance