diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 1a632f927..d0fb8d27a 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -19,7 +19,7 @@
 * [Metadata](./metadata.md)
   - [Defining metadata types in rust](./metadata_derive.md)
   - [Metadata and tables](./metadata_tables.md)
-  - [Metadata schema](./metadata_schema.md)
+  - [Metadata processing with Python](./metadata_python.md)
   - [Advanced topics](./metadata_advanced.md)
 
 * [Error handling](./error_handling.md)
diff --git a/book/src/metadata_python.md b/book/src/metadata_python.md
new file mode 100644
index 000000000..4cf1ed110
--- /dev/null
+++ b/book/src/metadata_python.md
@@ -0,0 +1,31 @@
+# Metadata processing with Python
+
+## `JSON` metadata
+
+If your metadata are generated in `JSON` format via `serde` (see [here](metadata_derive.md)), then the metadata are simple to access from Python.
+The code repository for `tskit-rust` contains examples in the `python/` subdirectory.
+
+You may work with `JSON` metadata with or without a metadata schema (see [here](https://tskit.dev/tskit/docs/stable/metadata.html)).
+A schema is useful for data validation but there is an unfortunate inefficiency if your input to Python is a tree sequence rather than a table collection.
+You will have to copy the tables, add the metadata schema, and regenerate a tree sequence.
+See the examples mentioned above.
+
+## Other formats
+
+The `tskit-python` API only supports `JSON` and Python's `struct` data formats.
+It is useful to use a format other than `JSON` in order to minimize storage requirements.
+However, doing so will require that you provide a method to convert the data into a valid Python object.
+
+An easy way to provide conversion methods is to use [pyo3](https://pyo3.rs) to create a small Python module to deserialize your metadata into Python objects.
+The `tskit-rust` code repository contains an example of this in the `python/` subdirectory.
+The module is shown in its entirety below:
+
+```rust, noplayground, ignore
+{{#include ../../python/tskit_glue/src/lib.rs}}
+```
+
+Using it in Python is just a matter of importing the module:
+
+```python
+{{#include ../../python/test_bincode_metadata.py}}
+```
diff --git a/book/src/metadata_schema.md b/book/src/metadata_schema.md
deleted file mode 100644
index 036c0323f..000000000
--- a/book/src/metadata_schema.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Metadata schema
-
-For useful data interchange with `tskit-python`, we need to define [metadata schema](https://tskit.dev/tskit/docs/stable/metadata.html).
-
-There are currently several points slowing down a rust API for schema:
-
-* It is not clear which `serde` formats are compatible with metadata on the Python side.
-* Experiments have shown that `serde_json` works with `tskit-python`.
-  * Ideally, we would also like a binary format compatible with the Python `struct`
-    module.
-* However, we have not found a solution eliminating the need to manually write the
-  schema as a string and add it to the tables.
-  Various crates to generate JSON schema from rust structs return schema that are over-specified
-  and fail to validate in `tskit-python`.
-* We also have the problem that we will need to add some Python to our CI to prove to ourselves
-  that some reasonable tests can pass.
-
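
Note on the "Other formats" section of the new `metadata_python.md`: the storage argument for a non-`JSON` format, and the need for a conversion method back to a Python object, can be illustrated with a rough standard-library sketch. The record fields (`effect_size`, `dominance`, `origin`) and the helper names are hypothetical and chosen only for illustration. The fixed-width layout uses Python's `struct` module and is not the `bincode` encoding used by the example in the `python/` subdirectory.

```python
import json
import struct

# Hypothetical metadata record, for illustration only:
# two float64 fields followed by one int32 field.
# "<" selects little-endian with no padding, so the size is 8 + 8 + 4 bytes.
RECORD_FORMAT = "<ddi"


def encode_binary(effect_size: float, dominance: float, origin: int) -> bytes:
    """Pack one record into a fixed-width binary blob."""
    return struct.pack(RECORD_FORMAT, effect_size, dominance, origin)


def decode_binary(raw: bytes) -> dict:
    """Convert the binary blob back into a plain Python object."""
    effect_size, dominance, origin = struct.unpack(RECORD_FORMAT, raw)
    return {"effect_size": effect_size, "dominance": dominance, "origin": origin}


record = {"effect_size": -0.0125, "dominance": 0.5, "origin": 1024}

# The same record in both encodings.
as_json = json.dumps(record).encode("utf-8")
as_binary = encode_binary(**record)

# Both encodings round-trip to the same Python object.
assert json.loads(as_json) == record
assert decode_binary(as_binary) == record

# The binary form is smaller because it stores no field names or digits as text.
print(f"binary: {len(as_binary)} bytes, JSON: {len(as_json)} bytes")
```

The trade-off is that the binary blob is not self-describing: a reader needs `RECORD_FORMAT` (or an equivalent decoder such as a small pyo3 module) to recover the fields, whereas the `JSON` bytes can be read without any extra information.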