
Commit 6476b90

Merge pull request #54 from py-why/creating_augmented_suggester
Causal Relationship suggester based on an existing database
2 parents 5928968 + de4c85f commit 6476b90

10 files changed: +6374 -2002 lines changed


README.md

Lines changed: 13 additions & 1 deletion
@@ -28,6 +28,7 @@ PyWhy-LLM seamlessly integrates into your existing causal inference process. Imp
 from pywhyllm.suggesters.model_suggester import ModelSuggester
 from pywhyllm.suggesters.identification_suggester import IdentificationSuggester
 from pywhyllm.suggesters.validation_suggester import ValidationSuggester
+from pywhyllm.suggesters.augmented_model_suggester import AugmentedModelSuggester
 from pywhyllm import RelationshipStrategy
 
 ```
@@ -49,11 +50,22 @@ domain_expertises = modeler.suggest_domain_expertises(all_factors)
 # Suggest a set of potential confounders
 suggested_confounders = modeler.suggest_confounders(treatment, outcome, all_factors, domain_expertises)
 
-# Suggest pair-wise relationship between variables
+# Suggest pair-wise relationships between variables
 suggested_dag = modeler.suggest_relationships(treatment, outcome, all_factors, domain_expertises, RelationshipStrategy.Pairwise)
 ```
 
+### Retrieval Augmented Generation (RAG)-based Modeler
 
+```python
+# Create instance of Modeler
+modeler = AugmentedModelSuggester('gpt-4')
+
+treatment = "smoking"
+outcome = "lung cancer"
+
+# Suggest pair-wise relationship between two given variables, utilizing CauseNet for RAGing the LLM
+suggested_relationship = modeler.suggest_relationships(treatment, outcome)
+```
 
 ### Identifier
 
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "pip install dotenv"
+      ],
+      "metadata": {
+        "id": "cmZerbMu6Uk4"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "EulKv3Km4nMa"
+      },
+      "outputs": [],
+      "source": [
+        "from dotenv import load_dotenv\n",
+        "import os\n",
+        "\n",
+        "load_dotenv()\n",
+        "\n",
+        "os.environ[\"OPENAI_API_KEY\"] = '' # specify your key here"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "pip install pywhyllm"
+      ],
+      "metadata": {
+        "collapsed": true,
+        "id": "83sxVcP97xlH"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Here we introduce the AugmentedModelSuggester class. Creating an instance of it enables the chosen LLM to utilize Retrieval Augmented Generation (RAG) to determine causality. It currently does this by searching the CauseNet dataset for a relevant causal pair and augmenting the LLM with the corresponding evidence/information stored in the dataset.\n",
+        "\n",
+        "CauseNet is a large-scale knowledge base of causal relations extracted from the web, created by Heindorf et al. (2020). CauseNet is available at [causenet.org](https://causenet.org) and is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)."
+      ],
+      "metadata": {
+        "id": "DjYECuX84vbN"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from pywhyllm.suggesters.augmented_model_suggester import AugmentedModelSuggester\n",
+        "\n",
+        "model = AugmentedModelSuggester('gpt-4')"
+      ],
+      "metadata": {
+        "id": "VdfEKuDLEYcU"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "AugmentedModelSuggester can suggest the pairwise relationship given two variables. If a relevant causal pair is found in CauseNet, the LLM is augmented with the aforementioned information in CauseNet. If not found, by default, the LLM will rely on its own knowledge."
+      ],
+      "metadata": {
+        "id": "dES0LwHV57eX"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result = model.suggest_pairwise_relationship(\"smoking\", \"lung cancer\")"
+      ],
+      "metadata": {
+        "id": "D85ec6Pk5JzA"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result"
+      ],
+      "metadata": {
+        "id": "W3bFehXh5SQl"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result = model.suggest_pairwise_relationship(\"income\", \"exercise level\")"
+      ],
+      "metadata": {
+        "id": "odFkp921hQsX"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result"
+      ],
+      "metadata": {
+        "id": "ZIeStj9OwIPe"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result = model.suggest_pairwise_relationship(\"flooding\", \"rain\")"
+      ],
+      "metadata": {
+        "id": "Fm5XCFrRwKsV"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result"
+      ],
+      "metadata": {
+        "id": "HDo098ICwzi7"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}
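
Reviewer note: the notebook's markdown cells describe a retrieve-then-augment flow (look up the variable pair in CauseNet, add the stored evidence to the prompt if a match exists, otherwise let the LLM answer from its own knowledge). The sketch below illustrates that control flow only. It is not the code from this commit: `TOY_EVIDENCE`, `retrieve_evidence`, and `build_prompt` are hypothetical names, the in-memory dictionary merely stands in for CauseNet, and no LLM is called.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a causal knowledge base such as CauseNet.
# The real dataset schema and the library's lookup logic are NOT shown in this
# commit; this dictionary exists only to make the sketch runnable.
TOY_EVIDENCE = {
    ("smoking", "lung cancer"): ["Smoking causes lung cancer."],
    ("rain", "flooding"): ["Heavy rain led to flooding."],
}


def retrieve_evidence(cause: str, effect: str) -> Optional[List[str]]:
    """Return stored evidence sentences for a directed (cause, effect) pair, or None."""
    key: Tuple[str, str] = (cause.strip().lower(), effect.strip().lower())
    return TOY_EVIDENCE.get(key)


def build_prompt(var1: str, var2: str) -> str:
    """Build the text that would be sent to the LLM (assumed prompt wording)."""
    question = f"Does a change in '{var1}' cause a change in '{var2}'?"
    evidence = retrieve_evidence(var1, var2)
    if evidence is None:
        # No match in the knowledge base: the model answers unaided.
        return question
    # Match found: prepend the retrieved evidence as context.
    context = "\n".join(f"- {sentence}" for sentence in evidence)
    return f"Evidence from the knowledge base:\n{context}\n\n{question}"


if __name__ == "__main__":
    print(build_prompt("smoking", "lung cancer"))   # augmented prompt
    print(build_prompt("income", "exercise level"))  # plain question
```

In this toy version the lookup is directed, so a query for ("flooding", "rain") finds nothing and falls back to the plain question. Whether the actual suggester also checks the reverse direction cannot be determined from this diff.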
