Skip to content

suchismitasarker/ChessDataManagement

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Chess Data Management service

Build Status Go Report Card

Introduction

The CHESS data flow has been discussed in this document.

Here we propose a possible architecture for CHESS data management based on gradual enchancement of existing infrastructure:

ChessDataManagement

In particular, we propose to introduce the following components:

  • MetaData DB based on MongoDB or similar document-oriented database. Such solution should provide the following features:

    • be able to handle free-structured text documents
    • provide reach QueryLanguage (QL)
  • Files DB based on any relation database, e.g. MySQL or free alternative MariaDB. The purpose of this database is provide data bookkeeping capabilities and organize meta-data in the following form:

    • a dataset is a collection of files (or blocks)
    • each dataset name may carry on an Experiment name and additional meta-data information
    • organize files in specific data-tiers, e.g. RAW for raw data, AOD for processed data, etc.
    • as such each dataset will have a form of a path: /Experiment/Processing/Tier

Both databases may reside in their own data-service called MetaData Service. Such service can provide RESTful APIs for end-users, such as

  • inject data to DBs
  • fetch results
  • update data in DBs
  • delete data in DBs

In addition, we suggest to introduce Input Data Service which can take care of standardization of user inputs, e.g. key-value pairs, tagging, etc. It is not required originally, but will help in a long run to provide uniform data representation for Meta Data Service.

Finally, the data access can be organized via XrootD service.

Insert data into MetaData DB:

We provide a chess_parser.py script to parse input Microsoft Word documents, extract and inject its content into MongoDB. Here is an example of such operation

# prepare files.db (so far we use SqliteDB)
rm files.db; sqlite3 files.db < doc/schema.sql

# start MongoDB

# prepare parameter file which will contain DB, experiment parameters
# so far I choose JSON data-format, but it can be replaced with
# YAML data-format which is much more simpler and intuitive for end-users
cat doc/params.json
{
    "fname": "doc/miller-774-1_beamtime_notes.docx",
    "path": "files",
    "dburi": "mongodb://localhost:8230",
    "dbname": "chess",
    "dbcoll": "meta",
    "filesdb": "files.db",
    "experiment": "Titanium",
    "processing": "FirstPass",
    "tier": "RAW"
}

# inject data into MongoDB (MetaDataDB) and Sqlite (FilesDB)
./chess_parser.py --params=doc/params.json --verbose

Find documents in MetaData DB

We provide basic implementation of finder script chess_finder.py which should be able to find required meta-data in MongoDB via provide free-text query:

# find meta-data information
./chess_finder.py --params=doc/params.json --query="scan 74-77"
# find corredponding files
./chess_finder.py --params=doc/params.json --query="scan 74-77" --list-files --verbose

Please note, to perform free text search queries we need to define text index, e.g.

db.meta.createIndex( { description: "text" } )

References

  1. MongoDB
  2. MySQL
  3. MariaDB
  4. MongoDB free-text search

About

CHESS Data Management

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 78.4%
  • Python 12.1%
  • JavaScript 7.0%
  • Makefile 1.4%
  • Dockerfile 1.1%