GitHub - suchismitasarker/ChessDataManagement: CHESS Data Management

Chess Data Management service

Introduction

The CHESS data flow has been discussed in this document.

Here we propose a possible architecture for CHESS data management based on gradual enchancement of existing infrastructure:

In particular, we propose to introduce the following components:

MetaData DB based on MongoDB or similar document-oriented database. Such solution should provide the following features:
- be able to handle free-structured text documents
- provide reach QueryLanguage (QL)
Files DB based on any relation database, e.g. MySQL or free alternative MariaDB. The purpose of this database is provide data bookkeeping capabilities and organize meta-data in the following form:
- a dataset is a collection of files (or blocks)
- each dataset name may carry on an Experiment name and additional meta-data information
- organize files in specific data-tiers, e.g. RAW for raw data, AOD for processed data, etc.
- as such each dataset will have a form of a path: /Experiment/Processing/Tier

Both databases may reside in their own data-service called MetaData Service. Such service can provide RESTful APIs for end-users, such as

inject data to DBs
fetch results
update data in DBs
delete data in DBs

In addition, we suggest to introduce Input Data Service which can take care of standardization of user inputs, e.g. key-value pairs, tagging, etc. It is not required originally, but will help in a long run to provide uniform data representation for Meta Data Service.

Finally, the data access can be organized via XrootD service.

Insert data into MetaData DB:

We provide a chess_parser.py script to parse input Microsoft Word documents, extract and inject its content into MongoDB. Here is an example of such operation

# prepare files.db (so far we use SqliteDB)
rm files.db; sqlite3 files.db < doc/schema.sql

# start MongoDB

# prepare parameter file which will contain DB, experiment parameters
# so far I choose JSON data-format, but it can be replaced with
# YAML data-format which is much more simpler and intuitive for end-users
cat doc/params.json
{
    "fname": "doc/miller-774-1_beamtime_notes.docx",
    "path": "files",
    "dburi": "mongodb://localhost:8230",
    "dbname": "chess",
    "dbcoll": "meta",
    "filesdb": "files.db",
    "experiment": "Titanium",
    "processing": "FirstPass",
    "tier": "RAW"
}

# inject data into MongoDB (MetaDataDB) and Sqlite (FilesDB)
./chess_parser.py --params=doc/params.json --verbose

Find documents in MetaData DB

We provide basic implementation of finder script chess_finder.py which should be able to find required meta-data in MongoDB via provide free-text query:

# find meta-data information
./chess_finder.py --params=doc/params.json --query="scan 74-77"
# find corredponding files
./chess_finder.py --params=doc/params.json --query="scan 74-77" --list-files --verbose

Please note, to perform free text search queries we need to define text index, e.g.

db.meta.createIndex( { description: "text" } )

Name		Name	Last commit message	Last commit date
Latest commit History 267 Commits
.github/workflows		.github/workflows
GenerateSchema		GenerateSchema
client		client
doc		doc
web		web
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Chess Data Management service

Introduction

Insert data into MetaData DB:

Find documents in MetaData DB

References

About

Releases

Packages

Languages

License

suchismitasarker/ChessDataManagement

Folders and files

Latest commit

History

Repository files navigation

Chess Data Management service

Introduction

Insert data into MetaData DB:

Find documents in MetaData DB

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages