Commit 36e0f09

fix: qa cookbook (#203)
* fix: qa cookbook
* fix: docs data layer
* fix: data layers
* fix: cloud -> files system
1 parent 1363574 commit 36e0f09

File tree

9 files changed

+364
-291
lines changed

api-reference/data-persistence/custom-data-layer.mdx

+2-1
@@ -2,7 +2,8 @@
22
title: Custom Data Layer
33
---
44

5-
The `BaseDataLayer` class serves as an abstract foundation for data persistence operations within the Chainlit framework. This class outlines methods for managing users, feedback, elements, steps, and threads in a chatbot application.
5+
The `BaseDataLayer` class serves as an abstract foundation for data persistence operations within the Chainlit framework.
6+
This class outlines methods for managing users, feedback, elements, steps, and threads in a chatbot application.
67

78
## Methods
89

data-layers/dynamodb.mdx

+142
@@ -0,0 +1,142 @@
1+
---
2+
title: DynamoDB Data Layer
3+
---
4+
5+
This data layer supports the `BaseStorageClient`, which lets you store your elements in AWS S3 or Azure Blob Storage.
6+
7+
## Example
8+
Here is an example of setting up this data layer. First install boto3:
9+
```bash
10+
pip install boto3
11+
```
12+
13+
Import the custom data layer and storage client, and set the `cl_data._data_layer` variable at the beginning of your Chainlit app.
14+
15+
```python
16+
import chainlit.data as cl_data
17+
from chainlit.data.dynamodb import DynamoDBDataLayer
18+
from chainlit.data.storage_clients import S3StorageClient
19+
20+
storage_client = S3StorageClient(bucket="<Your Bucket>")
21+
22+
cl_data._data_layer = DynamoDBDataLayer(table_name="<Your Table>", storage_provider=storage_client)
23+
```
24+
25+
## Table structure
26+
27+
Here is the CloudFormation template used to create the DynamoDB table:
28+
```json
29+
{
30+
"AWSTemplateFormatVersion": "2010-09-09",
31+
"Resources": {
32+
"DynamoDBTable": {
33+
"Type": "AWS::DynamoDB::Table",
34+
"Properties": {
35+
"TableName": "<YOUR-TABLE-NAME>",
36+
"AttributeDefinitions": [
37+
{
38+
"AttributeName": "PK",
39+
"AttributeType": "S"
40+
},
41+
{
42+
"AttributeName": "SK",
43+
"AttributeType": "S"
44+
},
45+
{
46+
"AttributeName": "UserThreadPK",
47+
"AttributeType": "S"
48+
},
49+
{
50+
"AttributeName": "UserThreadSK",
51+
"AttributeType": "S"
52+
}
53+
],
54+
"KeySchema": [
55+
{
56+
"AttributeName": "PK",
57+
"KeyType": "HASH"
58+
},
59+
{
60+
"AttributeName": "SK",
61+
"KeyType": "RANGE"
62+
}
63+
],
64+
"GlobalSecondaryIndexes": [
65+
{
66+
"IndexName": "UserThread",
67+
"KeySchema": [
68+
{
69+
"AttributeName": "UserThreadPK",
70+
"KeyType": "HASH"
71+
},
72+
{
73+
"AttributeName": "UserThreadSK",
74+
"KeyType": "RANGE"
75+
}
76+
],
77+
"Projection": {
78+
"ProjectionType": "INCLUDE",
79+
"NonKeyAttributes": ["id", "name"]
80+
}
81+
}
82+
],
83+
"BillingMode": "PAY_PER_REQUEST"
84+
}
85+
}
86+
}
87+
}
88+
```
89+
90+
## Logging
91+
92+
The DynamoDB data layer defines a child logger of the Chainlit logger. You can set its level independently:
93+
94+
```python
95+
import logging
96+
from chainlit import logger
97+
98+
logger.getChild("DynamoDB").setLevel(logging.DEBUG)
99+
```
100+
101+
## Limitations
102+
Filtering by positive/negative feedback is not supported.
103+
104+
The data layer methods are not truly asynchronous. Boto3 is a synchronous library, so the data layer performs blocking I/O.
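
A general way to keep an event loop responsive around blocking code is to run that code in a worker thread. The sketch below illustrates the pattern using only the standard library. `blocking_lookup` is a hypothetical stand-in for a synchronous request, and nothing here is taken from the DynamoDB data layer's own implementation.

```python
import asyncio
import time


def blocking_lookup(key: str) -> dict:
    """Hypothetical stand-in for a synchronous call (for example a boto3 request)."""
    time.sleep(0.05)  # simulate network latency with a blocking sleep
    return {"PK": key}


async def lookup(key: str) -> dict:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_lookup, key)


async def main() -> list:
    # The two lookups overlap instead of running back to back.
    return await asyncio.gather(lookup("THREAD#1"), lookup("THREAD#2"))


results = asyncio.run(main())
```

`asyncio.to_thread` requires Python 3.9 or later. This only helps code you control; it does not change how the data layer itself issues requests.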
105+
106+
## Design
107+
108+
This implementation uses single-table design: four entity types share one table and are distinguished by the prefixes in their PK and SK values.
109+
110+
Here are the entity types:
111+
```ts
112+
type User = {
113+
PK: f"USER#{user.identifier}"
114+
SK: "USER"
115+
// ...PersistedUser
116+
}
117+
118+
type Thread = {
119+
PK: f"THREAD#{thread_id}"
120+
SK: "THREAD"
121+
// GSI: UserThread for querying in list_threads
122+
UserThreadPK: f"USER#{user_id}"
123+
UserThreadSK: f"TS#{ts}"
124+
// ...ThreadDict
125+
}
126+
127+
type Step = {
128+
PK: f"THREAD#{threadId}"
129+
SK: f"STEP#{stepId}"
130+
// ...StepDict
131+
132+
// feedback is stored as part of step.
133+
// NOTE: feedback.value is stored as Decimal in dynamo which is not json serializable
134+
feedback?: Feedback
135+
}
136+
137+
type Element = {
138+
PK: f"THREAD#{threadId}"
139+
SK: f"ELEMENT#{element.id}"
140+
// ...ElementDict
141+
}
142+
```
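
The key layout above can be sketched as a few small Python helpers. This is an illustration of the prefix scheme only: the function names are hypothetical and are not part of `chainlit.data.dynamodb`. `user_threads_query` builds the parameters a low-level DynamoDB `query` call would take to list a user's threads through the `UserThread` index, assuming newest-first ordering is wanted.

```python
def user_key(identifier: str) -> dict:
    """Primary key of a User item."""
    return {"PK": f"USER#{identifier}", "SK": "USER"}


def thread_key(thread_id: str) -> dict:
    """Primary key of a Thread item."""
    return {"PK": f"THREAD#{thread_id}", "SK": "THREAD"}


def step_key(thread_id: str, step_id: str) -> dict:
    """Steps share the thread's partition and sort under the STEP# prefix."""
    return {"PK": f"THREAD#{thread_id}", "SK": f"STEP#{step_id}"}


def element_key(thread_id: str, element_id: str) -> dict:
    """Elements share the thread's partition and sort under the ELEMENT# prefix."""
    return {"PK": f"THREAD#{thread_id}", "SK": f"ELEMENT#{element_id}"}


def entity_type(item: dict) -> str:
    """Classify an item by the prefix of its sort key."""
    return item["SK"].split("#", 1)[0]


def user_threads_query(table_name: str, user_id: str, limit: int = 20) -> dict:
    """Parameters for a low-level DynamoDB query against the UserThread index."""
    return {
        "TableName": table_name,
        "IndexName": "UserThread",
        "KeyConditionExpression": "UserThreadPK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"USER#{user_id}"}},
        "ScanIndexForward": False,  # assumption: newest threads first (descending TS#)
        "Limit": limit,
    }
```

Because a thread, its steps, and its elements all share the partition key `THREAD#{id}`, a single query on that partition can return the whole conversation.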

data-layers/official.mdx

+24
@@ -0,0 +1,24 @@
1+
---
2+
title: Official Data Layer
3+
---
4+
5+
Follow the steps in this repository to persist your conversations in 2 minutes:
6+
<Card
7+
title="Official Data Layer"
8+
color="#F80061"
9+
icon="message"
10+
href="https://github.com/Chainlit/chainlit-datalayer"
11+
>
12+
Out-of-the-box data layer schema to store your threads, steps, feedback, etc.
13+
</Card>
14+
15+
<Warning>
16+
Do not forget to have your Chainlit application point to the database you set up by
17+
adding the `DATABASE_URL` environment variable in your `.env`.
18+
19+
If you wish to store elements, the same goes for your file system configuration.
20+
</Warning>
21+
22+
<Tip>
23+
Custom element `props` are stored directly in PostgreSQL, not on cloud storage.
24+
</Tip>

data-layers/overview.mdx

+69
@@ -0,0 +1,69 @@
1+
---
2+
title: Overview
3+
---
4+
5+
Choose one of the following options for your open source data layer:
6+
- use the official Chainlit data layer (PostgreSQL + SQLAlchemy)
7+
- leverage a community-based data layer
8+
- or build your own!
9+
10+
<CardGroup cols={2}>
11+
<Card
12+
title="Official data layer"
13+
icon="check"
14+
color="#16a34a"
15+
href="/data-layers/official"
16+
>
17+
The official Chainlit data layer
18+
</Card>
19+
<Card
20+
title="Community SQLAlchemy data layer"
21+
icon="database"
22+
color="#0285c7"
23+
href="/data-layers/sqlalchemy"
24+
>
25+
The community SQLAlchemy data layer
26+
</Card>
27+
<Card
28+
title="Community DynamoDB data layer"
29+
icon="database"
30+
color="#3afadc"
31+
href="/data-layers/dynamodb"
32+
>
33+
The community DynamoDB data layer
34+
</Card>
35+
<Card
36+
title="Custom data layer API"
37+
icon="text"
38+
color="#ea5a0c"
39+
href="/api-reference/data-persistence/custom-data-layer"
40+
>
41+
The custom data layer implementation reference
42+
</Card>
43+
</CardGroup>
44+
45+
46+
## Official data layer
47+
48+
When using the [official data layer](/data-layers/official), just add the `DATABASE_URL` variable to your `.env` and
49+
a cloud storage configuration if relevant.
50+
51+
## Community data layers
52+
53+
For community data layers, you need to import the corresponding data layer in your Chainlit app.
54+
Here is how you would do it with `SQLAlchemyDataLayer`:
55+
56+
```python
57+
import chainlit as cl
58+
59+
from chainlit.data.sql_alchemy import SQLAlchemyDataLayer
60+
61+
@cl.data_layer
62+
def get_data_layer():
63+
return SQLAlchemyDataLayer(conninfo="...")
64+
```
65+
66+
## Custom data layers
67+
68+
Follow the [reference](/api-reference/data-persistence/custom-data-layer) for an exhaustive list of the methods your custom data layer needs to implement.
69+

data-layers/sqlalchemy.mdx

+102
@@ -0,0 +1,102 @@
1+
---
2+
title: SQLAlchemy Data Layer
3+
---
4+
5+
This custom layer has been tested with PostgreSQL; however, it should support other SQL databases thanks to its use of SQLAlchemy.
6+
7+
This data layer also supports the `BaseStorageClient`, which lets you store your elements in Azure Blob Storage or AWS S3.
8+
9+
Here is the SQL used to create the schema for this data layer:
10+
11+
```sql
12+
CREATE TABLE users (
13+
"id" UUID PRIMARY KEY,
14+
"identifier" TEXT NOT NULL UNIQUE,
15+
"metadata" JSONB NOT NULL,
16+
"createdAt" TEXT
17+
);
18+
19+
CREATE TABLE IF NOT EXISTS threads (
20+
"id" UUID PRIMARY KEY,
21+
"createdAt" TEXT,
22+
"name" TEXT,
23+
"userId" UUID,
24+
"userIdentifier" TEXT,
25+
"tags" TEXT[],
26+
"metadata" JSONB,
27+
FOREIGN KEY ("userId") REFERENCES users("id") ON DELETE CASCADE
28+
);
29+
30+
CREATE TABLE IF NOT EXISTS steps (
31+
"id" UUID PRIMARY KEY,
32+
"name" TEXT NOT NULL,
33+
"type" TEXT NOT NULL,
34+
"threadId" UUID NOT NULL,
35+
"parentId" UUID,
36+
"streaming" BOOLEAN NOT NULL,
37+
"waitForAnswer" BOOLEAN,
38+
"isError" BOOLEAN,
39+
"metadata" JSONB,
40+
"tags" TEXT[],
41+
"input" TEXT,
42+
"output" TEXT,
43+
"createdAt" TEXT,
44+
"start" TEXT,
45+
"end" TEXT,
46+
"generation" JSONB,
47+
"showInput" TEXT,
48+
"language" TEXT,
49+
"indent" INT,
50+
FOREIGN KEY ("threadId") REFERENCES threads("id") ON DELETE CASCADE
51+
);
52+
53+
CREATE TABLE IF NOT EXISTS elements (
54+
"id" UUID PRIMARY KEY,
55+
"threadId" UUID,
56+
"type" TEXT,
57+
"url" TEXT,
58+
"chainlitKey" TEXT,
59+
"name" TEXT NOT NULL,
60+
"display" TEXT,
61+
"objectKey" TEXT,
62+
"size" TEXT,
63+
"page" INT,
64+
"language" TEXT,
65+
"forId" UUID,
66+
"mime" TEXT,
67+
FOREIGN KEY ("threadId") REFERENCES threads("id") ON DELETE CASCADE
68+
);
69+
70+
CREATE TABLE IF NOT EXISTS feedbacks (
71+
"id" UUID PRIMARY KEY,
72+
"forId" UUID NOT NULL,
73+
"threadId" UUID NOT NULL,
74+
"value" INT NOT NULL,
75+
"comment" TEXT,
76+
FOREIGN KEY ("threadId") REFERENCES threads("id") ON DELETE CASCADE
77+
);
78+
```
79+
80+
## Example
81+
82+
Here is an example of setting up this data layer on a PostgreSQL database with an Azure storage client. First install the required dependencies:
83+
84+
```bash
85+
pip install asyncpg SQLAlchemy azure-identity azure-storage-file-datalake
86+
```
87+
88+
Import the custom data layer and storage client, and indicate which data layer to use with `@cl.data_layer` at the beginning of your Chainlit app:
89+
90+
```python
91+
import chainlit as cl
92+
from chainlit.data.sql_alchemy import SQLAlchemyDataLayer
93+
from chainlit.data.storage_clients import AzureStorageClient
94+
95+
storage_client = AzureStorageClient(account_url="<your_account_url>", container="<your_container>")
96+
97+
@cl.data_layer
98+
def get_data_layer():
99+
return SQLAlchemyDataLayer(conninfo="<your conninfo>", storage_provider=storage_client)
100+
```
101+
102+
Note that you need to add `+asyncpg` to the protocol in the `conninfo` string so that it uses the asyncpg library.
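
As an illustration, a `conninfo` string with the `+asyncpg` driver suffix might be assembled as below. This is a minimal sketch: `build_conninfo` is a hypothetical helper, and the user, password, host, port, and database values are placeholders. Percent-encoding the credentials keeps characters such as `@` from being misread as URL delimiters.

```python
from urllib.parse import quote


def build_conninfo(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL URL that tells SQLAlchemy to use the asyncpg driver."""
    # Percent-encode credentials so special characters do not break the URL.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql+asyncpg://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Placeholder values for illustration only.
conninfo = build_conninfo("app", "p@ss word", "localhost", 5432, "chainlit")
```

The resulting string can be passed as the `conninfo` argument of `SQLAlchemyDataLayer`.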
