A small package to simplify creating dataclasses from JSON and validating that JSON.
$ pip install dataglasses
Requires Python 3.10 or later.
If you wish to validate arbitrary JSON data against the generated JSON schemas in Python, consider installing jsonschema, though this is unnecessary when using dataglasses to convert JSON into dataclasses.
>>> from dataclasses import dataclass
>>> from dataglasses import from_dict, to_json_schema
>>> from json import dumps
>>> @dataclass
... class InventoryItem:
...     name: str
...     unit_price: float
...     quantity_on_hand: int = 0
>>> from_dict(InventoryItem, { "name": "widget", "unit_price": 3.0})
InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)
>>> print(dumps(to_json_schema(InventoryItem), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/InventoryItem",
"$defs": {
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
}
}
}
The purpose of this library is to speed up development by making it trivial to populate dataclasses with dictionary data extracted from JSON (or elsewhere), as well as to perform basic validation on that data. The library contains just one file and two functions, so it can even be copied directly into a project.
It is not intended for complex validation or high performance. For those, consider using pydantic.
The package contains just two functions:
def from_dict(
cls: type[T],
value: Any,
*,
strict: bool = False,
transform: Optional[TransformRules] = None,
local_refs: Optional[set[type]] = None,
) -> T:
This converts a nested dictionary value of input data into the given dataclass type cls, raising an exception if the conversion is not possible. (The optional keyword arguments are described further down.)
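To picture what such a conversion involves, here is a minimal standard-library sketch. It is not the dataglasses implementation: simple_from_dict is a hypothetical helper that only handles plain classes such as str or int and nested dataclasses, with none of the collection, union or transform support described further down.

```python
from dataclasses import MISSING, dataclass, fields, is_dataclass
from typing import Any, get_type_hints


def simple_from_dict(cls: type, value: Any) -> Any:
    """Hypothetical helper: recursively build `cls` from plain dict data."""
    if not is_dataclass(cls):
        # Leaf value: only accept data that already has the expected type.
        if not isinstance(value, cls):
            raise TypeError(f"expected {cls.__name__}, got {value!r}")
        return value
    if not isinstance(value, dict):
        raise TypeError(f"expected an object for {cls.__name__}, got {value!r}")
    hints = get_type_hints(cls)  # field name -> annotated type
    kwargs = {}
    for field in fields(cls):
        if field.name in value:
            kwargs[field.name] = simple_from_dict(hints[field.name], value[field.name])
        elif field.default is MISSING and field.default_factory is MISSING:
            raise TypeError(f"missing required field {field.name!r}")
    return cls(**kwargs)


@dataclass
class Point:
    x: int
    y: int = 0


@dataclass
class Segment:
    start: Point
    end: Point


segment = simple_from_dict(Segment, {"start": {"x": 1}, "end": {"x": 2, "y": 3}})
# segment == Segment(start=Point(x=1, y=0), end=Point(x=2, y=3))
```

Fields missing from the input fall back to their dataclass defaults, and a missing field without a default raises an exception.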
def to_json_schema(
cls: type,
*,
strict: bool = False,
transform: Optional[TransformRules] = None,
local_refs: Optional[set[type]] = None,
) -> dict[str, Any]:
This generates a 2020-12 JSON schema representing valid inputs for the dataclass type cls, raising an exception if the class cannot be represented in JSON. (Again, the optional keyword arguments are described further down.)
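As a rough illustration of how a dataclass maps onto a schema, the sketch below derives the "properties" and "required" entries for a flat dataclass using only the standard library. The helper simple_schema and the JSON_TYPES table are assumptions made for this example, not part of dataglasses, and the sketch omits "$schema", "$defs" and every non-primitive type.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, get_type_hints

# Assumed mapping from Python primitives to JSON schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def simple_schema(cls: type) -> dict[str, Any]:
    """Hypothetical helper: schema for a flat dataclass of primitive fields."""
    hints = get_type_hints(cls)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for field in fields(cls):
        prop: dict[str, Any] = {"type": JSON_TYPES[hints[field.name]]}
        if field.default is not MISSING:
            prop["default"] = field.default  # defaulted fields are optional
        elif field.default_factory is MISSING:
            required.append(field.name)  # no default of any kind: required
        properties[field.name] = prop
    return {"type": "object", "properties": properties, "required": required}


@dataclass
class StockItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


schema = simple_schema(StockItem)
# schema["required"] == ["name", "unit_price"]
```

Fields with a default carry a "default" entry and are left out of "required", which matches the shape of the InventoryItem schema shown earlier.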
Below is a summary of the different supported use cases:
Dataclasses can be nested, using either global or local definitions.
>>> @dataclass
... class TrackedItem:
...
...     @dataclass
...     class GPS:
...         lat: float
...         long: float
...
...     item: InventoryItem
...     location: GPS
>>> from_dict(TrackedItem, {
... "item": { "name": "pie", "unit_price": 42},
... "location": { "lat": 52.2, "long": 0.1 } })
TrackedItem(item=InventoryItem(name='pie', unit_price=42, quantity_on_hand=0),
location=TrackedItem.GPS(lat=52.2, long=0.1))
>>> print(dumps(to_json_schema(TrackedItem), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/TrackedItem",
"$defs": {
"TrackedItem": {
"type": "object",
"properties": {
"item": {
"$ref": "#/$defs/InventoryItem"
},
"location": {
"$ref": "#/$defs/TrackedItem.GPS"
}
},
"required": [
"item",
"location"
]
},
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
},
"TrackedItem.GPS": {
"type": "object",
"properties": {
"lat": {
"type": "number"
},
"long": {
"type": "number"
}
},
"required": [
"lat",
"long"
]
}
}
}
There is automatic support for the generic collection types most compatible with JSON: list[T], tuple[...] and Sequence[T] (encoded as arrays) and dict[str, T] and Mapping[str, T] (encoded as objects).
>>> from collections.abc import Mapping, Sequence
>>> @dataclass
... class Catalog:
...     items: Sequence[InventoryItem]
...     publisher: tuple[str, int]
...     purchases: Mapping[str, int]
>>> from_dict(Catalog, {
... "items": [{ "name": "widget", "unit_price": 3.0}],
... "publisher": ["ACME", 1982],
... "purchases": { "Wile E. Coyote": 52}})
Catalog(items=[InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)],
publisher=('ACME', 1982), purchases={'Wile E. Coyote': 52})
>>> print(dumps(to_json_schema(Catalog), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Catalog",
"$defs": {
"Catalog": {
"type": "object",
"properties": {
"items": {
"type": "array",
"items": {
"$ref": "#/$defs/InventoryItem"
}
},
"publisher": {
"type": "array",
"prefixItems": [
{
"type": "string"
},
{
"type": "integer"
}
],
"minItems": 2,
"maxItems": 2
},
"purchases": {
"type": "object",
"patternProperties": {
"^.*$": {
"type": "integer"
}
}
}
},
"required": [
"items",
"publisher",
"purchases"
]
},
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
}
}
}
Unrestricted types like list or dict (or set or Any) and mappings with non-str keys can be used with from_dict but not with to_json_schema. Alternatively, these, alongside unsupported generic types like set[T], can be used with both from_dict and to_json_schema by defining an appropriate encoding transformation (see section below).
Union types (S | T or Union[S, T, ...]) are matched against all their permitted subtypes in order, returning the first successful match, or raising an exception if there are none. Optional types (T | None or Optional[T]) are handled similarly. Note that an optional type is not the same as an optional field (i.e. one with a default): a field with an optional type is still a required field unless it has a default value (which could be None but could also be something else).
>>> from typing import Optional
>>> @dataclass
... class ItemPurchase:
...     items: Sequence[InventoryItem | TrackedItem]
...     invoice: Optional[int] = None
>>> from_dict(ItemPurchase, {
... "items": [{
... "item": { "name": "pie", "unit_price": 42},
... "location": { "lat": 52.2, "long": 0.1 } }],
... "invoice": 1234})
ItemPurchase(items=[TrackedItem(item=
InventoryItem(name='pie', unit_price=42, quantity_on_hand=0),
location=TrackedItem.GPS(lat=52.2, long=0.1))], invoice=1234)
>>> print(dumps(to_json_schema(ItemPurchase), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/ItemPurchase",
"$defs": {
"ItemPurchase": {
"type": "object",
"properties": {
"items": {
"type": "array",
"items": {
"anyOf": [
{
"$ref": "#/$defs/InventoryItem"
},
{
"$ref": "#/$defs/TrackedItem"
}
]
}
},
"invoice": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null
}
},
"required": [
"items"
]
},
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
},
"TrackedItem": {
"type": "object",
"properties": {
"item": {
"$ref": "#/$defs/InventoryItem"
},
"location": {
"$ref": "#/$defs/TrackedItem.GPS"
}
},
"required": [
"item",
"location"
]
},
"TrackedItem.GPS": {
"type": "object",
"properties": {
"lat": {
"type": "number"
},
"long": {
"type": "number"
}
},
"required": [
"lat",
"long"
]
}
}
}
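The distinction between an optional type and an optional field can be checked with the standard library alone. In this sketch, Order and required_field_names are hypothetical names chosen for illustration. A field counts as required exactly when it has no default, whatever its type says.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class Order:
    # Optional *type* but no default: the field is still required.
    coupon: Optional[str]
    # Optional *field*: it has a default, so it may be omitted.
    quantity: int = 1


def required_field_names(cls: type) -> list[str]:
    """Hypothetical helper: names of fields that have no default."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


required = required_field_names(Order)
# required == ["coupon"]
```

This mirrors the schema above, where "invoice" is absent from "required" only because it has a default, not because its type allows None.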
Both Enum and Literal types can be used to match explicit enumerations. By default, Enum types match both the values and symbolic names (preferring the former in case of a clash). This behaviour can be overridden using a transformation if desired (see section below).
>>> from enum import auto, StrEnum
>>> from typing import Literal
>>> class BuildType(StrEnum):
...     DEBUG = auto()
...     OPTIMIZED = auto()
>>> @dataclass
... class Release:
...     build: BuildType
...     approved: Literal["Yes", "No"]
>>> from_dict(Release, {"build": "debug", "approved": "Yes"})
Release(build=<BuildType.DEBUG: 'debug'>, approved='Yes')
>>> print(dumps(to_json_schema(Release), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Release",
"$defs": {
"Release": {
"type": "object",
"properties": {
"build": {
"enum": [
"debug",
"optimized",
"DEBUG",
"OPTIMIZED"
]
},
"approved": {
"enum": [
"Yes",
"No"
]
}
},
"required": [
"build",
"approved"
]
}
}
}
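The "values first, then symbolic names" rule for Enum types can be sketched with the standard library as follows. This is an illustration of the matching order only, not the library's code. Mode and lookup_member are hypothetical names, and the LEGACY member is contrived to create a clash between one member's value and another member's name.

```python
from enum import Enum


class Mode(Enum):
    DEBUG = "debug"
    OPTIMIZED = "optimized"
    # Deliberate clash: this member's *value* equals another member's *name*.
    LEGACY = "DEBUG"


def lookup_member(enum_cls, raw):
    """Hypothetical helper: match by value first, then by symbolic name."""
    try:
        return enum_cls(raw)  # lookup by value
    except ValueError:
        pass
    try:
        return enum_cls[raw]  # lookup by name
    except KeyError:
        raise ValueError(f"{raw!r} is not a valid {enum_cls.__name__}") from None


by_value = lookup_member(Mode, "debug")     # Mode.DEBUG
by_name = lookup_member(Mode, "OPTIMIZED")  # Mode.OPTIMIZED
clash = lookup_member(Mode, "DEBUG")        # Mode.LEGACY, because the value wins
```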
Annotated types can be used to populate the property "description" annotations in the JSON schema.
>>> from typing import Annotated
>>> @dataclass
... class InventoryItem:
...     name: Annotated[str, "item name"]
...     unit_price: Annotated[float, "unit price"]
...     quantity_on_hand: Annotated[int, "quantity on hand"] = 0
>>> from_dict(InventoryItem, { "name": "widget", "unit_price": 3.0})
InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)
>>> print(dumps(to_json_schema(InventoryItem), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/InventoryItem",
"$defs": {
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "item name"
},
"unit_price": {
"type": "number",
"description": "unit price"
},
"quantity_on_hand": {
"type": "integer",
"description": "quantity on hand",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
}
}
}
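For readers curious how Annotated metadata can be read back at runtime, the following standard-library sketch collects the first string annotation of each field. The helper field_descriptions and the Item class are assumptions for illustration. How dataglasses itself extracts descriptions is not specified here.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass
class Item:
    name: Annotated[str, "item name"]
    quantity: int = 0


def field_descriptions(cls: type) -> dict[str, str]:
    """Hypothetical helper: collect the first string annotation per field."""
    descriptions: dict[str, str] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            _, *metadata = get_args(hint)  # first arg is the underlying type
            for meta in metadata:
                if isinstance(meta, str):
                    descriptions[name] = meta
                    break
    return descriptions


descriptions = field_descriptions(Item)
# descriptions == {"name": "item name"}
```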
Forward reference types (written as string literals or ForwardRef objects) are supported, permitting recursive dataclasses. Global and class-scoped references are handled automatically:
>>> @dataclass
... class Cons:
...     head: "Head"
...     tail: Optional["Cons"] = None
...
...     @dataclass
...     class Head:
...         v: int
...
...     def __repr__(self):
...         return f"{self.head.v}::{self.tail}"
>>> from_dict(Cons, {"head": {"v": 1}, "tail": {"head": {"v": 2}}})
1::2::None
>>> print(dumps(to_json_schema(Cons), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Cons",
"$defs": {
"Cons": {
"type": "object",
"properties": {
"head": {
"$ref": "#/$defs/Cons.Head"
},
"tail": {
"anyOf": [
{
"$ref": "#/$defs/Cons"
},
{
"type": "null"
}
],
"default": null
}
},
"required": [
"head"
]
},
"Cons.Head": {
"type": "object",
"properties": {
"v": {
"type": "integer"
}
},
"required": [
"v"
]
}
}
}
Locally-scoped references, however, must be specified using the local_refs keyword:
>>> def reverse_cons(seq):
...
...     @dataclass
...     class Cons:
...         head: int
...         tail: Optional["Cons"] = None
...
...         def __repr__(self):
...             return f"{self.head}::{self.tail}"
...
...     value = None
...     for x in seq: value = { "head": x, "tail": value }
...     return from_dict(Cons, value, local_refs={Cons})
>>> reverse_cons([1,2,3])
>>> reverse_cons([1,2,3])
3::2::1::None
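The need to name local classes explicitly follows from how Python resolves string annotations: a class defined inside a function lives in neither the module globals nor the class namespace. The standard-library sketch below demonstrates this with typing.get_type_hints. The helper resolve_hints is hypothetical and only mimics the role of local_refs. It is not how dataglasses is necessarily implemented.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


def make_node_class():
    @dataclass
    class Node:
        value: int
        next: Optional["Node"] = None  # forward reference to a local class

    return Node


LocalNode = make_node_class()


def resolve_hints(cls, local_refs=()):
    """Hypothetical helper: resolve annotations, given extra local classes."""
    localns = {ref.__name__: ref for ref in local_refs}
    return get_type_hints(cls, localns=localns)


try:
    resolve_hints(LocalNode)  # "Node" cannot be found by name alone
    resolved_without_help = True
except NameError:
    resolved_without_help = False

# Supplying the class by name lets the forward reference resolve.
hints = resolve_hints(LocalNode, local_refs={LocalNode})
```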
Both from_dict and to_json_schema default to ignoring additional properties that are not part of a dataclass (similar to additionalProperties defaulting to true in JSON schemas). This can be disabled with the strict keyword.
>>> value = { "name": "widget", "unit_price": 4.0, "comment": "too expensive"}
>>> from_dict(InventoryItem, value)
InventoryItem(name='widget', unit_price=4.0, quantity_on_hand=0)
>>> from_dict(InventoryItem, value, strict=True)
TypeError: Unexpected <class '__main__.InventoryItem'> fields {'comment'}
>>> print(dumps(to_json_schema(InventoryItem, strict=True), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/InventoryItem",
"$defs": {
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "item name"
},
"unit_price": {
"type": "number",
"description": "unit price"
},
"quantity_on_hand": {
"type": "integer",
"description": "quantity on hand",
"default": 0
}
},
"required": [
"name",
"unit_price"
],
"additionalProperties": false
}
}
}
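Conceptually, a strict check amounts to comparing the input keys with the dataclass field names. A minimal standard-library sketch follows, in which Item and unexpected_keys are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Item:
    name: str
    unit_price: float


def unexpected_keys(cls: type, value: dict) -> set[str]:
    """Hypothetical helper: keys in `value` that are not fields of `cls`."""
    return set(value) - {f.name for f in fields(cls)}


value = {"name": "widget", "unit_price": 4.0, "comment": "too expensive"}
extra = unexpected_keys(Item, value)
# extra == {"comment"}
```

A strict conversion would raise when this set is non-empty, while a non-strict one would simply ignore the extra keys.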
Transformations allow you to override the handling of specific types or dataclass fields, and can be used to normalise inputs or convert them into different types, including ones that aren't normally supported. Transformations are specified with the transform keyword, using a mapping:
- the mapping keys are either:
  - a type used somewhere in the output dataclass: e.g. str or set[int]
  - a dataclass field specified by a class-name tuple: e.g. (InventoryItem, "name") or (Cons, "head")
- the mapping values are a tuple consisting of:
  - the JSON-serialisable input type that we want to represent this output type or field
  - a callable function to convert from that input type to the output type
Note that the input type can be the same as the output type. Conversely, note that transformations don't help with serialising the dataclasses back into JSON from non-serialisable types.
>>> @dataclass
... class Person:
...     name: str
...     aliases: set[str]
>>> transform = {
... str: (str, str.title),
... set[str]: (list[str], set),
... (Person, "name"): (str, lambda s: s + "!")}
>>> from_dict(Person, {"name": "robert", "aliases": ["bob", "bobby"]}, transform=transform)
Person(name='Robert!', aliases={'Bobby', 'Bob'})
>>> print(dumps(to_json_schema(Person, transform=transform), indent=2))
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Person",
"$defs": {
"Person": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"aliases": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"name",
"aliases"
]
}
}
}
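Since transformations only cover the input direction, converting a dataclass with non-serialisable fields back into JSON needs separate handling. One possible approach, sketched here with the standard library only, is to pass a default hook to json.dumps. The hook encode_extra is a hypothetical name, and sorting the set is an arbitrary choice made to get stable output.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Person:
    name: str
    aliases: set[str]


def encode_extra(obj):
    """Hypothetical `default` hook: turn sets into sorted lists for JSON."""
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serialisable")


person = Person(name="Robert", aliases={"Bob", "Bobby"})
text = json.dumps(asdict(person), default=encode_extra)
# text == '{"name": "Robert", "aliases": ["Bob", "Bobby"]}'
```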
Bug reports, feature requests and contributions are very welcome. Note that PRs must include tests with 100% code coverage and pass the necessary quality checks before they can be merged.
To run the tests, make sure you have uv installed, then type:
$ uv run task tests
To perform the formatting and linting checks, type:
$ uv run task check
To automatically fix formatting and linting issues where possible, type:
$ uv run task format