
Commit f94c48e

Add experimental support for Zarr format 3
1 parent d870b5b commit f94c48e


1 file changed: +14 -10 lines changed

vcf_zarr_spec.md

Lines changed: 14 additions & 10 deletions
@@ -5,7 +5,7 @@
 This document is a technical specification for VCF Zarr, a means of encoding VCF data in chunked-columnar form using the Zarr format.

 This specification depends on definitions and terminology from [The Variant Call Format Specification, VCFv4.3 and BCFv2.2](https://samtools.github.io/hts-specs/VCFv4.3.pdf),
-and [Zarr storage specification version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html).
+and [Zarr storage specification version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html) or [Zarr core specification version 3](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html) [experimental].

 ## Compatibility with VCF and BCF

@@ -42,17 +42,19 @@ Each VCF field is stored in a separate Zarr array. This specification only manda

 This document uses a shorthand notation to refer to Zarr data types (dtypes). The following table shows the mapping to VCF types.

-| Shorthand | Zarr dtypes                                              | VCF Type  |
-|-----------|----------------------------------------------------------|-----------|
-| `bool`    | `\|b1`                                                   | Flag      |
-| `int`     | `<i1`, `<i2`, `<i4`, `<i8` or `>i1`, `>i2`, `>i4`, `>i8` | Integer   |
-| `float`   | `<f4`, `<f8` or `>f4`, `>f8`                             | Float     |
-| `char`    | `\<U1` or `\>U1`                                         | Character |
-| `str`     | `\|O`                                                    | String    |
+| Shorthand | Zarr 2 data types                                        | Zarr 3 data types                 | VCF Type  |
+|-----------|----------------------------------------------------------|-----------------------------------|-----------|
+| `bool`    | `\|b1`                                                   | `bool`                            | Flag      |
+| `int`     | `<i1`, `<i2`, `<i4`, `<i8` or `>i1`, `>i2`, `>i4`, `>i8` | `int8`, `int16`, `int32`, `int64` | Integer   |
+| `float`   | `<f4`, `<f8` or `>f4`, `>f8`                             | `float32`, `float64`              | Float     |
+| `char`    | `\<U1` or `\>U1`                                         | `string`                          | Character |
+| `str`     | `\|O`                                                    | `string`                          | String    |

 This specification does not mandate a byte order for numeric types: little-endian (e.g. `<i4`) or big-endian (`>i4`) are both permitted.

-The `str` dtype is used to represent [variable-length strings](https://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays). In this case a Zarr array filter with an `id` of `vlen-utf8` must be specified for the array.
+*[Zarr 2 only]* The `str` dtype is used to represent [variable-length strings](https://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays). In this case a Zarr array filter with an `id` of `vlen-utf8` must be specified for the array.
+
+*[Zarr 3 only]* The `str` dtype is used to represent [variable-length strings](https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/string). In this case a Zarr array [`vlen-utf8`](https://github.com/zarr-developers/zarr-extensions/blob/main/codecs/vlen-utf8/README.md) codec must be specified for the array.

 ### Missing and fill values
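
The data type table and the two `vlen-utf8` rules in this hunk can be sketched as a pair of small checks. This is only an illustration under stated assumptions, not part of the specification or of any Zarr library: the function names `zarr3_data_type` and `declares_vlen_utf8` are hypothetical, and the metadata dictionaries are assumed to be already-parsed, partial JSON (`.zarray` for Zarr 2, `zarr.json` for Zarr 3).

```python
from typing import Any, Dict

# Zarr 3 numeric data type names that appear in the table above.
_ZARR3_NUMERIC = {"int8", "int16", "int32", "int64", "float32", "float64"}
_KIND_PREFIX = {"i": "int", "f": "float"}


def zarr3_data_type(zarr2_dtype: str) -> str:
    """Translate a Zarr 2 dtype string into the Zarr 3 name from the table.

    Hypothetical helper for illustration only.
    """
    if zarr2_dtype == "|b1":
        return "bool"
    if zarr2_dtype in ("<U1", ">U1", "|O"):
        # `char` and `str` both map to the Zarr 3 `string` data type.
        return "string"
    if len(zarr2_dtype) >= 3 and zarr2_dtype[0] in "<>":
        kind, size = zarr2_dtype[1], zarr2_dtype[2:]
        if kind in _KIND_PREFIX and size.isdigit():
            # Zarr 2 sizes are in bytes; Zarr 3 names carry the bit width.
            name = f"{_KIND_PREFIX[kind]}{int(size) * 8}"
            if name in _ZARR3_NUMERIC:
                return name
    raise ValueError(f"dtype not covered by the table: {zarr2_dtype!r}")


def declares_vlen_utf8(metadata: Dict[str, Any], zarr_format: int) -> bool:
    """Check that array metadata names `vlen-utf8` where each format expects it.

    Hypothetical helper for illustration only.
    """
    if zarr_format == 2:
        # Zarr 2: a filter whose `id` is `vlen-utf8`.
        entries, key = metadata.get("filters") or [], "id"
    elif zarr_format == 3:
        # Zarr 3: a codec whose `name` is `vlen-utf8`.
        entries, key = metadata.get("codecs") or [], "name"
    else:
        raise ValueError(f"unsupported Zarr format: {zarr_format!r}")
    return any(isinstance(e, dict) and e.get(key) == "vlen-utf8" for e in entries)
```

The byte-order prefix is dropped in the translation because the Zarr 3 names in the table do not carry one. The second helper would accept, for example, `{"dtype": "|O", "filters": [{"id": "vlen-utf8"}]}` with format 2 and `{"data_type": "string", "codecs": [{"name": "vlen-utf8"}]}` with format 3.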

@@ -73,7 +75,9 @@ An array called `<name>` may have an accompanying array called `<name>_mask` wit

 ### Array dimension names

-Following [Xarray conventions](http://xarray.pydata.org/en/stable/internals/zarr-encoding-spec.html), each Zarr array has an attribute `_ARRAY_DIMENSIONS`, which is a list of strings naming the dimensions.
+*[Zarr 2 only]* Following [Xarray conventions](http://xarray.pydata.org/en/stable/internals/zarr-encoding-spec.html), each Zarr array has an attribute `_ARRAY_DIMENSIONS`, which is a list of strings naming the dimensions.
+
+*[Zarr 3 only]* The Zarr array metadata must include `dimension_names`, which is a list of strings naming the dimensions.

 The reserved dimension names and their sizes are listed in the following table, along with the corresponding VCF Number value, if applicable.
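
The difference between the two formats here is where the names live: in a user attribute for Zarr 2 and in the array metadata itself for Zarr 3. A minimal sketch of a reader that handles both, assuming already-parsed JSON documents, might look like the following. The function name `dimension_names` and the example names `variants` and `samples` are illustrative assumptions, not taken from this diff.

```python
from typing import Any, Dict, List, Optional


def dimension_names(
    zarr_format: int,
    array_metadata: Dict[str, Any],
    attributes: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Return the dimension names of an array for either Zarr format.

    Hypothetical helper: `array_metadata` is the parsed `.zarray` (Zarr 2)
    or `zarr.json` (Zarr 3) document, and `attributes` is the parsed
    `.zattrs` document, which is only consulted for Zarr 2.
    """
    if zarr_format == 2:
        # Zarr 2: Xarray convention, stored as a user attribute.
        names = (attributes or {}).get("_ARRAY_DIMENSIONS")
    elif zarr_format == 3:
        # Zarr 3: a field of the array metadata document.
        names = array_metadata.get("dimension_names")
    else:
        raise ValueError(f"unsupported Zarr format: {zarr_format!r}")
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("dimension names are missing or not a list of strings")
    return names


# Illustrative, partial metadata documents (not complete Zarr metadata).
v2_names = dimension_names(2, {"zarr_format": 2}, {"_ARRAY_DIMENSIONS": ["variants", "samples"]})
v3_names = dimension_names(3, {"zarr_format": 3, "dimension_names": ["variants", "samples"]})
```

Rejecting a missing or non-string list follows the wording of the added text, which requires a list of strings; a more lenient reader is of course possible.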
