This repository was archived by the owner on Jul 5, 2020. It is now read-only.

PDBxmmCIF JSON Schema

Keith T. Star edited this page Apr 22, 2016 · 6 revisions

This note describes how we created a JSON Schema from the PDBx/mmCIF metadata, and a little about how it is used in Sphinx. See issue #1 for more information on what exactly PDBx/mmCIF is all about. If you need some help with JSON Schema, this is a nice tutorial.

We generate the JSON schema file from the PDBx/mmCIF schema. The code used to do this is located in the build_pdbx_types.py file. It relies heavily on this Python library, which was only slightly modified to support Python 3.
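As a rough illustration of this kind of conversion, the sketch below turns a list of item descriptions for one category into a JSON Schema object definition. It is not the code in build_pdbx_types.py: the input dicts (`name`, `type_code`, `mandatory`), the `TYPE_MAP` table, and the `category_to_definition` function are all assumptions made up for this example.

```python
# Assumed mapping from a few mmCIF item type codes to JSON Schema types;
# any code not listed here falls back to "string".
TYPE_MAP = {"int": "integer", "float": "number"}


def category_to_definition(items):
    """Build a JSON Schema object definition from a list of item dicts.

    Each item is a hypothetical, pre-parsed dict with the keys
    'name', 'type_code', and 'mandatory'.
    """
    properties = {}
    required = []
    for item in items:
        json_type = TYPE_MAP.get(item["type_code"], "string")
        properties[item["name"]] = {"type": json_type}
        if item["mandatory"]:
            required.append(item["name"])
    definition = {"type": "object", "properties": properties}
    if required:
        definition["required"] = required
    return definition


# Hypothetical, trimmed description of the atom_site category.
atom_site_items = [
    {"name": "id", "type_code": "code", "mandatory": True},
    {"name": "Cartn_x", "type_code": "float", "mandatory": False},
]
definition = category_to_definition(atom_site_items)
```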

The resultant JSON Schema defines 378 different JSON objects from the V4 PDBx/mmCIF schema. Each of these objects is placed in the top-level definitions section of our schema and is referenced from the properties section by a JSON Pointer. This organization allows for embedding schemata, as well as for validation. The end result is a single "root schema" that contains all of the data from the PDBx/mmCIF schema and supports both extension of the schema and validation of its instances.
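To make that layout concrete, here is a minimal sketch of a root schema with a single entry in `definitions` that is referenced from `properties` through a `$ref` JSON Pointer. The `atom_site` fields shown and the `resolve_pointer` helper are illustrative assumptions, not the generated schema or the project's code.

```python
import json

# Hypothetical, heavily trimmed root schema; the real one holds 378 definitions.
ROOT_SCHEMA = {
    "type": "object",
    "definitions": {
        "atom_site": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "Cartn_x": {"type": "number"},
            },
            "required": ["id"],
        }
    },
    "properties": {
        "atom_site": {"$ref": "#/definitions/atom_site"}
    },
}


def resolve_pointer(document, ref):
    """Resolve a same-document reference such as '#/definitions/atom_site'.

    Only object (dict) members are followed; array indexes are not handled.
    """
    if not ref.startswith("#"):
        raise ValueError("only same-document references are supported")
    pointer = ref[1:]
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("malformed JSON Pointer: " + ref)
    node = document
    for token in pointer.split("/")[1:]:
        # Undo JSON Pointer escaping (RFC 6901): ~1 is '/', ~0 is '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


ref = ROOT_SCHEMA["properties"]["atom_site"]["$ref"]
atom_site = resolve_pointer(ROOT_SCHEMA, ref)
print(json.dumps(atom_site, indent=2))
```

Because every property only points into `definitions`, another schema can embed or extend a type by referring to the same pointer instead of copying the definition.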

The entire thing is wrapped up in the TypeManager class. It provides a Python wrapper to the JSON Schema, supporting operations like creating a new type, extending an existing type[1], and creating instances of types.
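The three operations can be pictured with a toy stand-in like the one below. This is not the TypeManager API: the class name `ToyTypeManager`, its method names, and the `charge` field are hypothetical, and extension is modelled with `allOf` purely as one plausible way to compose types.

```python
import copy


class ToyTypeManager:
    """Hypothetical stand-in; not the project's TypeManager API."""

    def __init__(self):
        self.schema = {"type": "object", "definitions": {}, "properties": {}}

    def _register(self, name, definition):
        # Store the definition and expose it through a $ref in properties.
        self.schema["definitions"][name] = definition
        self.schema["properties"][name] = {"$ref": "#/definitions/" + name}

    def create_type(self, name, properties, required=()):
        definition = {"type": "object", "properties": dict(properties)}
        if required:
            definition["required"] = list(required)
        self._register(name, definition)

    def extend_type(self, base, name, extra_properties):
        if base not in self.schema["definitions"]:
            raise KeyError(base)
        # Compose the base type with the extra properties via allOf.
        definition = {
            "allOf": [
                {"$ref": "#/definitions/" + base},
                {"type": "object", "properties": dict(extra_properties)},
            ]
        }
        self._register(name, definition)

    def required_fields(self, name):
        """Collect required fields, following allOf references to base types."""
        definition = self.schema["definitions"][name]
        fields = list(definition.get("required", []))
        for part in definition.get("allOf", []):
            if "$ref" in part:
                fields += self.required_fields(part["$ref"].rsplit("/", 1)[-1])
            else:
                fields += part.get("required", [])
        return fields

    def new_instance(self, name, **values):
        """Create an instance after a shallow check of required fields only."""
        missing = [f for f in self.required_fields(name) if f not in values]
        if missing:
            raise ValueError("missing required fields: " + ", ".join(missing))
        return {name: copy.deepcopy(values)}


manager = ToyTypeManager()
manager.create_type("atom_site", {"id": {"type": "string"}}, required=["id"])
manager.extend_type("atom_site", "apbs_atom_site", {"charge": {"type": "number"}})
instance = manager.new_instance("apbs_atom_site", id="1", charge=-0.5)
```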

N.B.: the schema is not generated with perfect fidelity. For example, the PDBx/mmCIF schema contains regular expressions that are used to validate type instances, and these are not reflected in the JSON Schema. Although JSON Schema does support regular expression "patterns" for validation, they were left out when the generator was written, in the interest of time.
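For reference, carrying a regular expression over would mean adding a `pattern` keyword to the relevant property. The sketch below shows what such a property could look like and how the check behaves. The regex is made up for illustration and is not taken from the PDBx/mmCIF dictionary, and `matches_pattern` is a hypothetical helper. Note also that JSON Schema patterns use the ECMA 262 dialect, which Python's `re` only approximates.

```python
import re

# Hypothetical property definition; the regex is illustrative only.
ID_PROPERTY = {"type": "string", "pattern": "^[A-Za-z0-9_]+$"}


def matches_pattern(property_schema, value):
    """Return True if value satisfies the property's 'pattern', if any."""
    pattern = property_schema.get("pattern")
    if pattern is None or not isinstance(value, str):
        # 'pattern' only constrains strings; other values pass this keyword.
        return True
    # JSON Schema patterns are not implicitly anchored, hence search().
    return re.search(pattern, value) is not None


print(matches_pattern(ID_PROPERTY, "atom_1"))
print(matches_pattern(ID_PROPERTY, "atom 1"))
```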

1: This is implemented as something that looks more like composition than encapsulation, which is sort of a bummer. My initial vision was of a hierarchical type system. The advantage I saw was that a plugin might handle something like atom_site, and if you sent it an instance of a sub-type of atom_site, like apbs_atom_site, then it would handle that naturally. Not that this is impossible with composition; it's just more awkward, IMO. All that said, the utility of my initial idea is still unknown, so it's really not the right time to pursue it further.
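A minimal sketch of how sub-type handling might still work under composition, assuming (hypothetically) that an extended type is expressed as an `allOf` that references its base type. The definitions, the `charge` field, and the `composed_from` and `find_handler` helpers are all invented for this example. The extra walk over the references is the awkward part that a true type hierarchy would give for free.

```python
# Hypothetical definitions: apbs_atom_site is composed from atom_site.
SCHEMA_DEFINITIONS = {
    "atom_site": {"type": "object", "properties": {"id": {"type": "string"}}},
    "apbs_atom_site": {
        "allOf": [
            {"$ref": "#/definitions/atom_site"},
            {"type": "object", "properties": {"charge": {"type": "number"}}},
        ]
    },
}

PREFIX = "#/definitions/"


def composed_from(definitions, type_name):
    """Return type_name followed by every type it pulls in via allOf/$ref."""
    found = [type_name]
    for part in definitions[type_name].get("allOf", []):
        ref = part.get("$ref", "")
        if ref.startswith(PREFIX):
            found += composed_from(definitions, ref[len(PREFIX):])
    return found


def find_handler(definitions, handlers, type_name):
    """Pick the handler for the type itself, else for a type it is built from."""
    for candidate in composed_from(definitions, type_name):
        if candidate in handlers:
            return handlers[candidate]
    return None


# A plugin that only knows about atom_site still receives apbs_atom_site.
handlers = {"atom_site": lambda instance: "handled as atom_site"}
handler = find_handler(SCHEMA_DEFINITIONS, handlers, "apbs_atom_site")
print(handler({"id": "1", "charge": -0.5}))
```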
