Commit 4884de8

documentation
1 parent 6eafa6e commit 4884de8

5 files changed

+147
-26
lines changed

docs/reference/dsl_how_to_guides.md

Lines changed: 120 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -630,7 +630,7 @@ For more comprehensive examples have a look at the [DSL examples](https://github
630630

631631
### Document [doc_type]
632632

633-
If you want to create a model-like wrapper around your documents, use the `Document` class. It can also be used to create all the necessary mappings and settings in elasticsearch (see `life-cycle` for details).
633+
If you want to create a model-like wrapper around your documents, use the `Document` class (or the equivalent `AsyncDocument` for asynchronous applications). It can also be used to create all the necessary mappings and settings in Elasticsearch (see [Document life cycle](#life-cycle) below for details).
634634

635635
```python
636636
from datetime import datetime
@@ -721,9 +721,19 @@ class Post(Document):
721721
    published: bool  # same as published = Boolean(required=True)
722722
```
723723

724-
It is important to note that when using `Field` subclasses such as `Text`, `Date` and `Boolean`, they must be given in the right-side of an assignment, as shown in examples above. Using these classes as type hints will result in errors.
724+
::::{note}
725+
When using `Field` subclasses such as `Text`, `Date` and `Boolean` to define attributes, these classes must be given on the right-hand side of an assignment.
726+
727+
```python
728+
class Post(Document):
729+
    title = Text()  # correct
730+
    subtitle: Text  # incorrect
731+
```
725732

726-
Python types are mapped to their corresponding field types according to the following table:
733+
Using a `Field` subclass as a Python type hint will result in errors.
734+
::::
735+
736+
Python types are mapped to their corresponding `Field` types according to the following table:
727737

728738
| Python type | DSL field |
729739
| --- | --- |
@@ -735,7 +745,7 @@ Python types are mapped to their corresponding field types according to the foll
735745
| `datetime` | `Date(required=True)` |
736746
| `date` | `Date(format="yyyy-MM-dd", required=True)` |
737747

738-
To type a field as optional, the standard `Optional` modifier from the Python `typing` package can be used. When using Python 3.10 or newer, "pipe" syntax can also be used, by adding `| None` to a type. The `List` modifier can be added to a field to convert it to an array, similar to using the `multi=True` argument on the field object.
748+
To type a field as optional, the standard `Optional` modifier from the Python `typing` package can be used. When using Python 3.10 or newer, "pipe" syntax can also be used, by adding `| None` to a type. The `List` modifier can be added to a field to convert it to an array, similar to using the `multi=True` argument on the `Field` object.
739749

740750
```python
741751
from typing import Optional, List
@@ -763,7 +773,7 @@ class Post(Document):
763773
    comments: List[Comment]  # same as comments = Nested(Comment, required=True)
764774
```
765775

766-
Unfortunately it is impossible to have Python type hints that uniquely identify every possible Elasticsearch field type. To choose a field type that is different than the one that is assigned according to the table above, the desired field instance can be added explicitly as a right-side assignment in the field declaration. The next example creates a field that is typed as `Optional[str]`, but is mapped to `Keyword` instead of `Text`:
776+
Unfortunately it is impossible to have Python type hints that uniquely identify every possible Elasticsearch `Field` type. To choose a type that is different than the one that is assigned according to the table above, the desired `Field` instance can be added explicitly as a right-side assignment in the field declaration. The next example creates a field that is typed as `Optional[str]`, but is mapped to `Keyword` instead of `Text`:
767777

768778
```python
769779
class MyDocument(Document):
@@ -787,7 +797,7 @@ class MyDocument(Document):
787797
    category: str = mapped_field(Keyword(), default="general")
788798
```
789799

790-
When using the `mapped_field()` wrapper function, an explicit field type instance can be passed as a first positional argument, as the `category` field does in the example above.
800+
The `mapped_field()` wrapper function can optionally be given an explicit field type instance as its first positional argument. In the example above, the `category` field uses this to be mapped as `Keyword` instead of the default `Text`.
791801

792802
Static type checkers such as [mypy](https://mypy-lang.org/) and [pyright](https://github.com/microsoft/pyright) can use the type hints and the dataclass-specific options added to the `mapped_field()` function to improve type inference and provide better real-time code completion and suggestions in IDEs.
793803

@@ -829,17 +839,17 @@ s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)
829839

830840
When specifying sorting order, the `+` and `-` unary operators can be used on the class field attributes to indicate ascending and descending order.
831841

832-
Finally, the `ClassVar` annotation can be used to define a regular class attribute that should not be mapped to the Elasticsearch index:
842+
Finally, it is also possible to define class attributes that are ignored when building the Elasticsearch mapping. One way is to type attributes with the `ClassVar` annotation. Alternatively, the `mapped_field()` wrapper function accepts an `exclude` argument that can be set to `True`:
833843

834844
```python
835845
from typing import ClassVar
836846

837847
class MyDoc(Document):
838848
    title: M[str]
    created_at: M[datetime] = mapped_field(default_factory=datetime.now)
839849
    my_var: ClassVar[str]  # regular class variable, ignored by Elasticsearch
850+
    another_custom_var: int = mapped_field(exclude=True)  # also ignored by Elasticsearch
840851
```
841852

842-
843853
#### Note on dates [_note_on_dates]
844854

845855
The DSL module will always respect the timezone information (or lack thereof) on the `datetime` objects passed in or stored in Elasticsearch. Elasticsearch itself interprets all datetimes with no timezone information as `UTC`. If you wish to reflect this in your Python code, you can specify `default_timezone` when instantiating a `Date` field:
@@ -878,7 +888,7 @@ first.meta.id = 47
878888
first.save()
879889
```
880890

881-
All the metadata fields (`id`, `routing`, `index` etc) can be accessed (and set) via a `meta` attribute or directly using the underscored variant:
891+
All the metadata fields (`id`, `routing`, `index`, etc.) can be accessed (and set) via a `meta` attribute or directly using the underscored variant:
882892

883893
```python
884894
post = Post(meta={'id': 42})
@@ -961,12 +971,111 @@ first = Post.get(id=42)
961971
first.delete()
962972
```
963973

974+
#### Integration with Pydantic models
975+
976+
::::{warning}
977+
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
978+
::::
979+
980+
::::{note}
981+
This feature is available in the Python Elasticsearch client starting with release 9.2.0.
982+
::::
983+
984+
Applications that define their data models using [Pydantic](https://docs.pydantic.dev/latest/) can combine these
985+
models with Elasticsearch DSL annotations. To take advantage of this option, Pydantic's `BaseModel` base class
986+
needs to be replaced with `BaseESModel` (or `AsyncBaseESModel` for asynchronous applications), and then the model
987+
can include type annotations for both Pydantic and Elasticsearch, as demonstrated in the following example:
988+
989+
```python
990+
from typing import Annotated
991+
from pydantic import Field
992+
from elasticsearch import dsl
993+
from elasticsearch.dsl.pydantic import BaseESModel
994+
995+
class Quote(BaseESModel):
996+
    quote: str
997+
    author: Annotated[str, dsl.Keyword()]
998+
    tags: Annotated[list[str], dsl.Keyword(normalizer="lowercase")]
999+
    embedding: Annotated[list[float], dsl.DenseVector()] = Field(init=False, default=[])
1000+
1001+
    class Index:
1002+
        name = "quotes"
1003+
```
1004+
1005+
In this example, the `quote` attribute is annotated with a `str` type hint. Both Pydantic and Elasticsearch use this
1006+
annotation.
1007+
1008+
The `author` and `tags` attributes have a Python type hint and an Elasticsearch annotation, both wrapped with
1009+
Python's `typing.Annotated`. When using the `BaseESModel` class, the typing information intended for Elasticsearch needs
1010+
to be defined inside `Annotated`.
1011+
1012+
The `embedding` attribute includes a base Python type and an Elasticsearch annotation in the same format as the
1013+
other fields, but it adds Pydantic's `Field` definition as a right-hand side assignment.
1014+
1015+
Finally, any other items that need to be defined for the Elasticsearch document class, such as `class Index` and
1016+
`class Meta` entries (discussed later), can be added as well.
1017+
1018+
The next example demonstrates how to define `Object` and `Nested` fields:
1019+
1020+
```python
1021+
from typing import Annotated
1022+
from pydantic import BaseModel, Field
1023+
from elasticsearch import dsl
1024+
from elasticsearch.dsl.pydantic import BaseESModel
1025+
1026+
class Phone(BaseModel):
1027+
    type: Annotated[str, dsl.Keyword()] = Field(default="Home")
1028+
    number: str
1029+
1030+
class Person(BaseESModel):
1031+
    name: str
1032+
    main_phone: Phone  # same as Object(Phone)
1033+
    other_phones: list[Phone]  # same as Nested(Phone)
1034+
1035+
    class Index:
1036+
        name = "people"
1037+
```
1038+
1039+
Note that inner classes do not need to be defined with a custom base class; these should be standard Pydantic model
1040+
classes. The attributes defined in these classes can include Elasticsearch annotations, as long as they are given
1041+
in an `Annotated` type hint.
1042+
1043+
All model classes that are created as described in this section function like normal Pydantic models and can be used
1044+
anywhere standard Pydantic models are used, but they have some added attributes:
1045+
1046+
- `_doc`: a class attribute that is a dynamically generated `Document` class to use with the Elasticsearch index.
1047+
- `meta`: an attribute added to all models that includes Elasticsearch document metadata items such as `id`, `score`, etc.
1048+
- `to_doc()`: a method that converts the Pydantic model to an Elasticsearch document.
1049+
- `from_doc()`: a class method that accepts an Elasticsearch document as an argument and returns an equivalent Pydantic model.
1050+
1051+
These are demonstrated in the examples below:
1052+
1053+
```python
1054+
# create a Pydantic model
1055+
quote = Quote(
1056+
    quote="An unexamined life is not worth living.",
1057+
    author="Socrates",
1058+
    tags=["philosophy"]
1059+
)
1060+
1061+
# save the model to the Elasticsearch index
1062+
quote.to_doc().save()
1063+
1064+
# get a document from the Elasticsearch index as a Pydantic model
1065+
quote = Quote.from_doc(Quote._doc.get(id=42))
1066+
1067+
# run a search and print the Pydantic models
1068+
s = Quote._doc.search().query(Match(Quote._doc.quote, "life"))
1069+
for doc in s:
1070+
    quote = Quote.from_doc(doc)
1071+
    print(quote.meta.id, quote.meta.score, quote.quote)
1072+
```
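
As a rough illustration of why the Elasticsearch field has to be placed inside `Annotated`, the following standard-library sketch shows how extra metadata can be recovered from such a type hint. The `Keyword` class and the `split_annotations()` helper are hypothetical stand-ins introduced only for this example; they are not the real `dsl.Keyword`, and this is not necessarily how `BaseESModel` processes annotations internally.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class Keyword:
    """Hypothetical stand-in for a DSL field class (not the real dsl.Keyword)."""

    def __init__(self, **options):
        self.options = options


class Quote:
    quote: str
    author: Annotated[str, Keyword()]
    tags: Annotated[list[str], Keyword(normalizer="lowercase")]


def split_annotations(model):
    """Return {attribute: (python_type, extra_metadata)} for a class."""
    result = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it
    for name, hint in get_type_hints(model, include_extras=True).items():
        if get_origin(hint) is Annotated:
            base, *extras = get_args(hint)
            result[name] = (base, extras)
        else:
            # a plain hint such as `str` carries no extra metadata
            result[name] = (hint, [])
    return result


fields = split_annotations(Quote)
```

Because the plain Python type stays in the first position of `Annotated`, a validator such as Pydantic can keep reading it as usual, while a second consumer can pick up the extra items that follow it.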
9641073

9651074
#### Analysis [_analysis]
9661075

9671076
To specify `analyzer` values for `Text` fields you can just use the name of the analyzer (as a string) and either rely on the analyzer being defined (like built-in analyzers) or define the analyzer yourself manually.
9681077

969-
Alternatively you can create your own analyzer and have the persistence layer handle its creation, from our example earlier:
1078+
Alternatively, you can create your own analyzer and have the persistence layer handle its creation, from our example earlier:
9701079

9711080
```python
9721081
from elasticsearch.dsl import analyzer, tokenizer
@@ -1622,7 +1731,7 @@ for response in responses:
16221731

16231732
### Asynchronous Documents, Indexes, and more [_asynchronous_documents_indexes_and_more]
16241733

1625-
The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and `FacetedSearch` classes all have asynchronous versions that use the same name with an `Async` prefix. These classes expose the same interfaces as the synchronous versions, but any methods that perform I/O are defined as coroutines.
1734+
The `Document`, `BaseESModel`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and `FacetedSearch` classes all have asynchronous versions that use the same name with an `Async` prefix. These classes expose the same interfaces as the synchronous versions, but any methods that perform I/O are defined as coroutines.
16261735

16271736
Auxiliary classes that do not perform I/O do not have asynchronous versions. The same classes can be used in synchronous and asynchronous applications.
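
The `Optional` and `List` modifiers described earlier in this file can be pictured as two independent flags on a base type. The sketch below uses only the standard library to decompose a type hint in that way. The `describe_hint()` helper is hypothetical and exists only for illustration; the assumption that a bare `List[...]` stays required follows the `comments: List[Comment]` example in the guide, and the DSL module's actual logic may differ.

```python
import types
from typing import List, Optional, Union, get_args, get_origin

# `X | None` (Python 3.10+) may report types.UnionType instead of typing.Union
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def describe_hint(hint):
    """Return (base_type, required, multi) for a type hint.

    Hypothetical helper for illustration; not part of the DSL module.
    """
    required, multi = True, False
    if get_origin(hint) in _UNION_ORIGINS:
        members = get_args(hint)
        non_none = [m for m in members if m is not type(None)]
        # only the Optional[X] shape (exactly one real type plus None) is handled
        if len(non_none) != 1 or len(non_none) == len(members):
            raise TypeError(f"unsupported union: {hint!r}")
        required = False
        hint = non_none[0]
    if get_origin(hint) is list:
        # List[X] or list[X]: the field holds an array of X
        multi = True
        (hint,) = get_args(hint)
    return hint, required, multi


print(describe_hint(Optional[List[int]]))
```

Here `required` corresponds to the `required=True` argument shown in the mapping table, and `multi` to the `multi=True` argument on a `Field` object.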
16281737

docs/reference/dsl_tutorials.md

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ In this example you can see:
134134
* retrieving and saving the object into Elasticsearch
135135
* accessing the underlying client for other APIs
136136

137-
You can see more in the `persistence` chapter.
137+
You can see more in the [persistence](dsl_how_to_guides.md#_persistence_2) chapter.
138138

139139

140140
## Pre-built Faceted Search [_pre_built_faceted_search]

elasticsearch/dsl/pydantic.py

Lines changed: 24 additions & 12 deletions
@@ -24,6 +24,8 @@
2424

2525

2626
class ESMeta(BaseModel):
27+
    """Metadata items associated with Elasticsearch documents."""
28+
2729
    id: str = ""
2830
    index: str = ""
2931
    primary_term: int = 0
@@ -40,10 +42,15 @@ class _BaseModel(BaseModel):
4042

4143

4244
class _BaseESModelMetaclass(type(BaseModel)):  # type: ignore[misc]
45+
    """Generic metaclass methods for BaseESModel and AsyncBaseESModel."""
46+
4347
    @staticmethod
4448
    def process_annotations(
4549
        metacls: Type["_BaseESModelMetaclass"], annotations: Dict[str, Any]
4650
    ) -> Dict[str, Any]:
51+
        """Process Pydantic typing annotations and adapt them so that they can
52+
        be used to create the Elasticsearch document.
53+
        """
4754
        updated_annotations = {}
4855
        for var, ann in annotations.items():
4956
            if isinstance(ann, type(BaseModel)):
@@ -71,6 +78,8 @@ def make_dsl_class(
7178
        pydantic_model: type,
7279
        pydantic_attrs: Optional[Dict[str, Any]] = None,
7380
    ) -> type:
81+
        """Create a DSL document class dynamically, using the structure of a
82+
        Pydantic model."""
7483
        dsl_attrs = {
7584
            attr: value
7685
            for attr, value in dsl_class.__dict__.items()
@@ -94,47 +103,50 @@ def make_dsl_class(
94103

95104

96105
class BaseESModelMetaclass(_BaseESModelMetaclass):
106+
    """Metaclass for the BaseESModel class."""
107+
97108
    def __new__(cls, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]) -> Any:
98109
        model = super().__new__(cls, name, bases, attrs)
99110
        model._doc = cls.make_dsl_class(cls, dsl.Document, model, attrs)
100111
        return model
101112

102113

114+
class AsyncBaseESModelMetaclass(_BaseESModelMetaclass):
115+
    """Metaclass for the AsyncBaseESModel class."""
116+
117+
    def __new__(cls, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]) -> Any:
118+
        model = super().__new__(cls, name, bases, attrs)
119+
        model._doc = cls.make_dsl_class(cls, dsl.AsyncDocument, model, attrs)
120+
        return model
121+
122+
103123
@dataclass_transform(kw_only_default=True, field_specifiers=(Field, PrivateAttr))
104124
class BaseESModel(_BaseModel, metaclass=BaseESModelMetaclass):
105125
    _doc: ClassVar[Type[dsl.Document]]
106126

107127
    def to_doc(self) -> dsl.Document:
128+
        """Convert this model to an Elasticsearch document."""
108129
        data = self.model_dump()
109130
        meta = {f"_{k}": v for k, v in data.pop("meta", {}).items() if v}
110131
        return self._doc(**meta, **data)
111132

112133
    @classmethod
113134
    def from_doc(cls, dsl_obj: dsl.Document) -> Self:
135+
        """Create a model from the given Elasticsearch document."""
114136
        return cls(meta=ESMeta(**dsl_obj.meta.to_dict()), **dsl_obj.to_dict())
115137

116138

117-
class AsyncBaseESModelMetaclass(_BaseESModelMetaclass):
118-
    def __new__(cls, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]) -> Any:
119-
        model = super().__new__(cls, name, bases, attrs)
120-
        model._doc = cls.make_dsl_class(cls, dsl.AsyncDocument, model, attrs)
121-
        return model
122-
123-
124139
@dataclass_transform(kw_only_default=True, field_specifiers=(Field, PrivateAttr))
125140
class AsyncBaseESModel(_BaseModel, metaclass=AsyncBaseESModelMetaclass):
126141
    _doc: ClassVar[Type[dsl.AsyncDocument]]
127142

128143
    def to_doc(self) -> dsl.AsyncDocument:
144+
        """Convert this model to an Elasticsearch document."""
129145
        data = self.model_dump()
130146
        meta = {f"_{k}": v for k, v in data.pop("meta", {}).items() if v}
131147
        return self._doc(**meta, **data)
132148

133149
    @classmethod
134150
    def from_doc(cls, dsl_obj: dsl.AsyncDocument) -> Self:
151+
        """Create a model from the given Elasticsearch document."""
135152
        return cls(meta=ESMeta(**dsl_obj.meta.to_dict()), **dsl_obj.to_dict())
136-
137-
138-
# TODO
139-
# - object and nested fields
140-
# - tests
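
The metadata handling in `to_doc()` can be tried in isolation. The sketch below copies the dictionary comprehension from the method into a standalone function that works on plain dictionaries, so it runs without Elasticsearch or Pydantic installed. The `split_meta()` name and the sample data are made up for this illustration.

```python
def split_meta(data):
    """Split a dumped model into underscored metadata kwargs and field values."""
    data = dict(data)  # copy so the caller's dictionary is not mutated
    # same expression as in to_doc(): prefix keys with "_" and skip falsy values
    meta = {f"_{k}": v for k, v in data.pop("meta", {}).items() if v}
    return meta, data


# hypothetical output of model_dump() for a model whose metadata is mostly unset
dumped = {
    "quote": "An unexamined life is not worth living.",
    "author": "Socrates",
    "meta": {"id": "42", "index": "", "primary_term": 0},
}

meta, fields = split_meta(dumped)
print(meta)    # {'_id': '42'}
print(fields)  # {'quote': 'An unexamined life is not worth living.', 'author': 'Socrates'}
```

The `if v` filter means that the empty defaults declared in `ESMeta` (such as `""` and `0`) are not forwarded to the document constructor. It also means that any metadata value that is falsy, including a genuine `0`, is dropped in the same way.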

test_elasticsearch/test_dsl/_async/test_document.py

Lines changed: 1 addition & 1 deletion
@@ -533,7 +533,7 @@ def test_document_inheritance() -> None:
533533
} == MySubDoc._doc_type.mapping.to_dict()
534534

535535

536-
def test_childdoc_class_can_override_parent() -> None:
536+
def test_child_class_can_override_parent() -> None:
537537
    class A(AsyncDocument):
538538
        o = field.Object(dynamic=False, properties={"a": field.Text()})
539539

test_elasticsearch/test_dsl/_sync/test_document.py

Lines changed: 1 addition & 1 deletion
@@ -533,7 +533,7 @@ def test_document_inheritance() -> None:
533533
} == MySubDoc._doc_type.mapping.to_dict()
534534

535535

536-
def test_childdoc_class_can_override_parent() -> None:
536+
def test_child_class_can_override_parent() -> None:
537537
    class A(Document):
538538
        o = field.Object(dynamic=False, properties={"a": field.Text()})
539539
