Commit 71d344e

committed
Updated the docs for app-architecture-v3 with Java and Jena
1 parent 8818c45 commit 71d344e

8 files changed

+189
-34
lines changed

impl/docs/application_architecture.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,16 +1,16 @@
11
# CosmosAIGraph : Application Architecture
22

33
<p align="center">
4-
<img src="img/app-architecture-v2.png" width="90%">
4+
<img src="img/app-architecture-v3.png" width="90%">
55
</p>
66

77
---
88

99
## Application Components
1010

1111
- Microservices
12-
- web microervice - UI front end
13-
- graph microservice - Contains the in-memory rdflib graph and AI functionality
12+
- web microservice - UI front end with AI functionality
13+
- graph microservice - Contains the in-memory graph
1414
- Azure Container App - Runtime orchestrator for the above two microservices
1515
- Cosmos DB NoSQL or Mongo vCore API - Domain data and conversational AI documents, embeddings
1616
- Azure OpenAI - completions and embeddings service

impl/docs/cosmos_design_modeling.md

Lines changed: 3 additions & 2 deletions
@@ -46,16 +46,17 @@ This domain of software libraries was chosen because it should be **relatable**
4646
to most customers, and it is also suitable for **Bill-of-Materials** graphs.
4747

4848
The PyPi JSON files were obtained with HTTP requests to public URLs such as
49-
**https://pypi.org/pypi/{libname}/json**, and their HTML contents were tranformed into JSON.
49+
**https://pypi.org/pypi/{libname}/json**, and their HTML contents were transformed into JSON.
5050

5151
Subsequent data wrangling fetched referenced HTML documentation, produced
5252
**text summarization with Azure OpenAI and semantic-kernel** and produced
53-
a **vectorized embedding value** from several concatinated text attributes
53+
a **vectorized embedding value** from several concatenated text attributes
5454
within each library JSON document. A full description of this data wrangling
5555
process is beyond the scope of this documentation, but the process itself
5656
is in file 'impl/app/wrangle.py' in the repo.
5757

5858
## Next Steps: Load Cosmos DB with Library Documents
5959

60+
Depending on the Cosmos DB API you chose:
6061
- See [Load Azure Cosmos DB vCore](load_cosmos_vcore.md)
6162
- See [Load Azure Cosmos DB NoSQL](load_cosmos_nosql.md)
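
As a rough illustration of the data wrangling described above, the following Python sketch builds the public PyPI JSON URL for a library and concatenates several text attributes into a single string before it would be sent for embedding. The attribute names (`name`, `summary`, `description`) and both helper functions are assumptions made for illustration; the actual logic is in 'impl/app/wrangle.py' and may differ.

```python
from typing import Iterable, Mapping

PYPI_URL_TEMPLATE = "https://pypi.org/pypi/{libname}/json"


def pypi_json_url(libname: str) -> str:
    """Return the public PyPI JSON URL for the given library name."""
    return PYPI_URL_TEMPLATE.format(libname=libname)


def embedding_text(doc: Mapping[str, object], attrs: Iterable[str]) -> str:
    """Concatenate the non-empty text attributes of a library document.

    The attribute names are supplied by the caller; they are hypothetical
    here and not taken from the repository.
    """
    parts = []
    for attr in attrs:
        value = doc.get(attr)
        # Skip missing, non-string, and whitespace-only values.
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return " ".join(parts)


if __name__ == "__main__":
    doc = {"name": "fastapi", "summary": "  web framework ", "description": None}
    print(pypi_json_url("fastapi"))
    print(embedding_text(doc, ["name", "summary", "description"]))
```

Keeping the concatenation in one small function makes it easy to change which attributes feed the embedding without touching the fetch logic.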

impl/docs/environment_variables.md

Lines changed: 6 additions & 3 deletions
@@ -23,10 +23,16 @@ All of these begin with the prefix `CAIG_`.
2323
| CAIG_AZURE_REGION | The Azure region where the ACA app is deployed to |
2424
| CAIG_CONFIG_CONTAINER | The vCore container for configuration JSON values |
2525
| CAIG_CONVERSATIONS_CONTAINER | The vCore container where the chat conversations and history are persisted |
26+
| CAIG_COSMOSDB_NOSQL_ACCT | The name of your Cosmos DB NoSQL account |
27+
| CAIG_COSMOSDB_NOSQL_AUTH_MECHANISM | The Cosmos DB NoSQL authentication mechanism; key or rbac |
2628
| CAIG_COSMOSDB_NOSQL_KEY1 | The key of your Cosmos DB NoSQL account |
29+
| CAIG_COSMOSDB_NOSQL_RG | The Resource Group of your Cosmos DB NoSQL account |
2730
| CAIG_COSMOSDB_NOSQL_URI | The URI of your Cosmos DB NoSQL account |
2831
| CAIG_DEFINED_AUTH_USERS | |
32+
| CAIG_ENCRYPTION_SYMMETRIC_KEY | optional symmetric key for encryption/decryption |
2933
| CAIG_FEEDBACK_CONTAINER | The vCore container where user feedback is persisted |
34+
| CAIG_GRAPH_DUMP_OUTFILE | The file to write to if CAIG_GRAPH_DUMP_UPON_BUILD is true |
35+
| CAIG_GRAPH_DUMP_UPON_BUILD | Boolean true/false to dump the Java/Jena model to CAIG_GRAPH_DUMP_OUTFILE after GraphBuilder completes |
3036
| CAIG_GRAPH_NAMESPACE | The custom namespace for the RDF graph |
3137
| CAIG_GRAPH_SERVICE_NAME | |
3238
| CAIG_GRAPH_SERVICE_PORT | |
@@ -39,9 +45,6 @@ All of these begin with the prefix `CAIG_`.
3945
| CAIG_HOME | Root directory of the CosmosAIGraph GitHub repository on your system |
4046
| CAIG_LA_WORKSPACE_NAME | The Log Analytics workspace name used by the Azure Container App (ACA) |
4147
| CAIG_LOG_LEVEL | a python logging standard-lib level name: notset, debug, info, warning, error, or critical |
42-
| CAIG_PG_FLEX_PASS | Azure PostgreSQL Flex Server user password |
43-
| CAIG_PG_FLEX_SERVER | Azure PostgreSQL Flex Server hostname |
44-
| CAIG_PG_FLEX_USER | Azure PostgreSQL Flex Server user |
4548
| CAIG_WEBSVC_AUTH_HEADER | x-caig-auth |
4649
| CAIG_WEBSVC_AUTH_VALUE | K6ZQw!81 |
4750
| CAIG_WEB_APP_NAME | |
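
To make the new variables in this table concrete, here is a minimal Python sketch of how an application might interpret them. The variable names come from the table; the helper functions, the default of `key` for the authentication mechanism, and the rule that only the string `true` enables the dump are assumptions of this sketch, not the repository's implementation.

```python
import os
from typing import Mapping, Optional


def env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean true/false flag."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def nosql_auth_mechanism(env: Mapping[str, str]) -> str:
    """Return 'key' or 'rbac'; defaulting to 'key' is an assumption."""
    value = env.get("CAIG_COSMOSDB_NOSQL_AUTH_MECHANISM", "key").strip().lower()
    if value not in ("key", "rbac"):
        raise ValueError(f"unsupported auth mechanism: {value!r}")
    return value


def graph_dump_outfile(env: Mapping[str, str]) -> Optional[str]:
    """Return the dump file only when CAIG_GRAPH_DUMP_UPON_BUILD is true."""
    if env_bool(env, "CAIG_GRAPH_DUMP_UPON_BUILD"):
        return env.get("CAIG_GRAPH_DUMP_OUTFILE")
    return None


if __name__ == "__main__":
    print(nosql_auth_mechanism(os.environ))
    print(graph_dump_outfile(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside each function, keeps the parsing easy to exercise with test values.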

impl/docs/img/app-architecture-v3.png

464 KB

impl/docs/load_cosmos_nosql.md

Lines changed: 17 additions & 9 deletions
@@ -1,4 +1,4 @@
1-
# CosmosAIGraph : Load Azure Cosmos DB NoSQL
1+
# CosmosAIGraph : Load Azure Cosmos DB for NoSQL
22

33
## Configuration
44

@@ -8,6 +8,7 @@ This page assumes that you have set the following environment variables:
88
CAIG_GRAPH_SOURCE_TYPE <-- must be set to 'cosmos_nosql'
99
CAIG_COSMOSDB_NOSQL_URI <-- this value is unique to your Azure deployment
1010
CAIG_COSMOSDB_NOSQL_KEY1 <-- Read/Write key value
11+
CAIG_COSMOSDB_NOSQL_AUTH_MECHANISM <-- Authentication mechanism - key or RBAC (Entra ID)
1112
1213
CAIG_GRAPH_SOURCE_DB <-- defaults to 'caig'
1314
CAIG_GRAPH_SOURCE_CONTAINER <-- defaults to 'libraries'
@@ -54,6 +55,18 @@ solution.
5455

5556
---
5657

58+
Navigate to the **impl\app** directory of this repo and execute
59+
the following commands:
60+
61+
```
62+
> .\venv.ps1 <-- create the python virtual environment
63+
64+
> .\venv\Scripts\Activate.ps1 <-- activate the python virtual environment
65+
```
66+
67+
---
68+
69+
5770
## Load the entities document into the config container
5871

5972
This step will load one document into the **config** container.
@@ -90,13 +103,9 @@ For your CosmosAIGraph implementation, create and upload a similar file.
90103

91104
## Load the Library data into Cosmos DB NoSQL
92105

93-
Navigate to the **impl\app** directory of this repo and execute
94-
the following commands:
106+
This step will load the main dataset into the libraries container:
95107

96108
```
97-
> .\venv.ps1 <-- create the python virtual environment
98-
99-
> .\venv\Scripts\Activate.ps1 <-- activate the python virtual environment
100109
101110
> python main_nosql.py load_libraries caig libraries 999999
102111
@@ -120,7 +129,7 @@ the following commands:
120129

121130
### Execute a Vector Search with the loaded data
122131

123-
First generate an embedding value from the words:
132+
First generate an embedding value from the words:
124133
"asynchronous web framework with pydantic".
125134
Then use that embedding in a vector search against the Cosmos DB
126135
libraries container.
@@ -136,5 +145,4 @@ doc 3: {'pk': 'pypi', 'id': 'pypi_async_asgi_testclient', 'name': 'async-asgi-te
136145
...
137146
```
138147

139-
Notice that the **FastAPI** library is correctly identified as the top semantic search result.
140-
148+
Notice that the **FastAPI** library is correctly identified as the top semantic search result.
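
The vector search step above can be sketched as a parameterized Cosmos DB NoSQL query that orders documents by the `VectorDistance` system function. This is a hedged illustration only: the property name `embedding`, the selected fields, and the `vector_search_spec` helper are assumptions, and the query actually issued by main_nosql.py may differ.

```python
from typing import Any, Dict, List


def vector_search_spec(embedding: List[float], top: int = 3) -> Dict[str, Any]:
    """Build the query text and parameters for a Cosmos DB NoSQL vector search.

    Assumes each document stores its vector in a property named 'embedding'
    (a hypothetical name) and that the container has a vector policy.
    """
    if top < 1:
        raise ValueError("top must be a positive integer")
    query = (
        f"SELECT TOP {int(top)} c.id, c.name, "
        "VectorDistance(c.embedding, @embedding) AS score "
        "FROM c ORDER BY VectorDistance(c.embedding, @embedding)"
    )
    return {
        "query": query,
        # The embedding is passed as a parameter, never inlined in the text.
        "parameters": [{"name": "@embedding", "value": embedding}],
    }


if __name__ == "__main__":
    spec = vector_search_spec([0.1, 0.2, 0.3])
    print(spec["query"])
```

With the azure-cosmos Python SDK, the two values could then be passed to `container.query_items(query=..., parameters=..., enable_cross_partition_query=True)`.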

impl/docs/local_execution.md

Lines changed: 31 additions & 13 deletions
@@ -39,19 +39,30 @@ This approach executes the application packaged as **Docker Containers** rather
3939

4040
Start your **Docker Desktop** application if it's not already running.
4141

42-
Be sure to modify your environment variables in the **docker-compose.yml**
43-
before starting the microservices.
42+
Be sure to modify your environment variables in the appropriate
43+
**docker-compose-xxx.yml** file before starting the microservices.
44+
45+
Two docker-compose yml files are available:
46+
47+
- docker/docker-compose-with-rdflib.yml
48+
- This uses the Python-based web application
49+
- This uses the Python-based graph microservice using rdflib
50+
51+
- docker/docker-compose-with-jena.yml
52+
- This also uses the same Python-based web application
53+
- This uses the Java-based graph microservice using Apache Jena
4454

4555
Create two PowerShell Terminal windows, and navigate to the **impl/app/** directory in each.
4656

47-
In the first terminal window, execute the following command to start the application
48-
(both microservices).
57+
In the first terminal window, execute the following command to start the application (both microservices).
4958

5059
```
51-
docker compose -f docker/docker-compose.yml up
60+
docker compose -f docker/docker-compose-with-rdflib.yml up
61+
or
62+
docker compose -f docker/docker-compose-with-jena.yml up
5263
```
5364

54-
You should see verbose output that includes the following:
65+
You should see similar verbose output that includes the following:
5566

5667
<p align="center">
5768
<img src="img/docker-compose-up.png" width="50%">
@@ -62,10 +73,12 @@ You should see verbose output that includes the following:
6273
In the second terminal window, execute the following command to terminate the application.
6374

6475
```
65-
docker compose -f docker/docker-compose.yml down
76+
docker compose -f docker/docker-compose-with-rdflib.yml down
77+
or
78+
docker compose -f docker/docker-compose-with-jena.yml down
6679
```
6780

68-
You should see verbose output that includes the following:
81+
You should see similar verbose output that includes the following:
6982

7083
<p align="center">
7184
<img src="img/docker-compose-down.png" width="40%">
@@ -75,15 +88,20 @@ You should see verbose output that includes the following:
7588

7689
### The Docker Containers
7790

78-
These two pre-built Docker containers exist on **DockerHub**:
91+
These three pre-built Docker containers exist on **DockerHub**:
7992

8093
- cjoakim/caig_web_v2:latest
8194
- cjoakim/caig_graph_v2:latest
95+
- cjoakim/caig_graph_java_jena_v1:latest
8296

83-
These are used by default by the above **docker-compose** script
97+
These are used by default by the above **docker-compose** scripts
8498
and also by the **Azure Container App** deployment process.
8599

86-
If you wish to rebuild these containers and deploy them to your own Container Registry,
87-
please see the **impl/app/docker-builds.ps1** and **impl/app/docker-builds.sh**
88-
scripts in this repository. You're free to modify these as necessary.
100+
If you wish to rebuild these containers and deploy them to your own
101+
Container Registry, please see the following Dockerfiles in this repo.
102+
You're free to modify these as necessary.
89103
Please change the **cjoakim** prefix to your own identifier.
104+
105+
- impl/app/docker/Dockerfile_graph
106+
- impl/app/docker/Dockerfile_web
107+
- impl/java_jena_graph_websvc/Dockerfile
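
For readers who script their local runs, the choice between the two compose files can be sketched in Python as follows. The compose file paths are the ones named in this page; the `compose_command` helper and the short names `rdflib` and `jena` are hypothetical and exist only for this example.

```python
from typing import List

# Compose files named in this page, relative to the impl/app directory.
COMPOSE_FILES = {
    "rdflib": "docker/docker-compose-with-rdflib.yml",
    "jena": "docker/docker-compose-with-jena.yml",
}


def compose_command(graph_impl: str, action: str) -> List[str]:
    """Return the docker compose argument list for the chosen graph service."""
    if graph_impl not in COMPOSE_FILES:
        raise ValueError(f"unknown graph implementation: {graph_impl!r}")
    if action not in ("up", "down"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["docker", "compose", "-f", COMPOSE_FILES[graph_impl], action]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(compose_command("jena", "up")))
```

The returned list can be handed to `subprocess.run(..., check=True)` from the **impl/app/** directory to start or stop the containers.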

impl/docs/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11

22
<p align="center">
3-
<img src="img/app-architecture-v2.png" width="90%">
3+
<img src="img/app-architecture-v3.png" width="90%">
44
</p>
55

66

impl/docs/understanding_the_code.md

Lines changed: 128 additions & 3 deletions
@@ -11,7 +11,7 @@ or none.
1111

1212
## The impl\app directory
1313

14-
**This directory contains the current codebase.**
14+
**This directory contains the Python codebase.**
1515

1616
It contains these directories:
1717

@@ -24,10 +24,11 @@ It contains these directories:
2424

2525
## The impl\app directory
2626

27-
This is where the implementation code and scripts are. It contains these directories.
27+
This is where the Python implementation code and scripts are.
28+
It contains these directories.
2829

2930
```
30-
├── docker Dockerfiles and docker-compose.yml
31+
├── docker Dockerfiles and docker-compose yml files
3132
├── keys Future use
3233
├── ontologies OWL schema files, with *.owl file suffix
3334
├── rdf RDF graph data files in "triples" format, with *.nt file suffix
@@ -275,3 +276,127 @@ personal coding style preferences. In addition to reformatting the source
275276
code, black can also identify some problems/errors in the code.
276277

277278
See the **code-reformat.ps1** and **code-reformat.sh** scripts in the impl directory.
279+
280+
---
281+
282+
## The impl\java_jena_graph_websvc directory
283+
284+
This directory contains the newer Graph Microservice implemented
285+
with Java, Spring Boot, and Apache Jena. Gradle is used as the
286+
build tool.
287+
288+
Please see the readme.md file in this directory regarding
289+
building and executing this version of the Graph Microservice.
290+
291+
It contains these directories:
292+
293+
```
294+
├── build Output of the Gradle-based compilation and packaging process
295+
├── data
296+
├── data/cosmosdb_documents.json
297+
├── ontologies
298+
├── rdf RDF files optionally used to load the graph
299+
├── src Standard Java source code directory structure
300+
│   ├── main
301+
│   └── test
302+
└── tmp Create this directory if it doesn't already exist
303+
```
304+
305+
### Key Classes in the Java/Jena implementation
306+
307+
This section describes the primary Java classes in the Java/Jena implementation.
308+
309+
#### com.microsoft.cosmosdb.caig.WebApp
310+
311+
This is the Spring Boot entry point for the application, per the
312+
@SpringBootApplication annotation.
313+
314+
#### com.microsoft.cosmosdb.caig.web.GraphRestController
315+
316+
Spring @RestController that implements the **/sparql_query** HTTP endpoint.
317+
This endpoint is invoked by the Python-based Web UI when
318+
executing graph queries.
319+
320+
#### com.microsoft.cosmosdb.caig.web.HealthRestController
321+
322+
Spring @RestController that implements the **/health** HTTP endpoint.
323+
This endpoint can optionally be invoked by your container
324+
orchestrator runtime environment - such as Azure Container Apps (ACA)
325+
or Azure Kubernetes Service (AKS).
326+
327+
#### com.microsoft.cosmosdb.caig.web.PingRestController
328+
329+
Spring @RestController that implements the **/** and **/ping** HTTP endpoints.
330+
The / endpoint simply returns the epoch time, while /ping returns
331+
uptime and JVM memory information.
332+
333+
#### com.microsoft.cosmosdb.caig.web.AppStartup
334+
335+
This contains the application startup logic per the Spring
336+
ApplicationListener interface. It contains this logic which
337+
loads the in-memory graph in class AppGraph.
338+
339+
```
340+
AppGraph g = AppGraphBuilder.build(null);
341+
AppGraph.setSingleton(g);
342+
```
343+
344+
#### com.microsoft.cosmosdb.caig.util.AppConfig
345+
346+
This static class returns almost all configuration values for the
347+
application, such as from **environment variables**.
348+
349+
The environment variables begin with the **CAIG_** prefix and are described
350+
in the [Environment Variables](environment_variables.md) page.
351+
352+
The Spring Boot framework also uses the src/main/resources/application.properties
353+
file for some configuration values. But all **application configuration**
354+
is done with environment variables and class AppConfig. This approach
355+
is typically used by Docker containerized applications.
356+
357+
#### com.microsoft.cosmosdb.caig.graph.AppGraphBuilder
358+
359+
This class creates and populates the in-memory graph object
360+
which is an instance of class **org.apache.jena.rdf.model.Model**.
361+
362+
AppGraphBuilder can populate the graph from one of several
363+
sources per the CAIG_GRAPH_SOURCE_TYPE environment variable.
364+
CAIG_GRAPH_SOURCE_TYPE may have one of the following values:
365+
366+
- json_docs_file - the graph is sourced from file cosmosdb_documents.json in the repo
367+
- rdf_file - the graph is sourced from the file specified in CAIG_GRAPH_SOURCE_RDF_FILENAME
368+
- cosmos_nosql - the graph is sourced from your Cosmos DB NoSQL account
369+
- cosmos_vcore - the graph is sourced from your Cosmos DB Mongo vCore account
370+
371+
After the Jena graph is populated, you can optionally dump that
372+
graph to a file per the CAIG_GRAPH_DUMP_UPON_BUILD and CAIG_GRAPH_DUMP_OUTFILE
373+
environment variables.
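
The source selection described above can be illustrated with a small Python sketch. The four source types and the environment variable names are taken from this page; the `describe_graph_source` function, its return strings, and its error handling are assumptions for illustration and do not reproduce the Java AppGraphBuilder code.

```python
from typing import Mapping

SOURCE_TYPES = ("json_docs_file", "rdf_file", "cosmos_nosql", "cosmos_vcore")


def describe_graph_source(env: Mapping[str, str]) -> str:
    """Describe where the graph would be loaded from, per CAIG_GRAPH_SOURCE_TYPE."""
    source_type = env.get("CAIG_GRAPH_SOURCE_TYPE", "")
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unsupported CAIG_GRAPH_SOURCE_TYPE: {source_type!r}")
    if source_type == "json_docs_file":
        return "file data/cosmosdb_documents.json"
    if source_type == "rdf_file":
        # The RDF filename is required only for this source type.
        filename = env.get("CAIG_GRAPH_SOURCE_RDF_FILENAME")
        if not filename:
            raise ValueError("CAIG_GRAPH_SOURCE_RDF_FILENAME is not set")
        return f"RDF file {filename}"
    if source_type == "cosmos_nosql":
        return "Cosmos DB NoSQL account"
    return "Cosmos DB Mongo vCore account"


if __name__ == "__main__":
    print(describe_graph_source({"CAIG_GRAPH_SOURCE_TYPE": "cosmos_nosql"}))
```

Rejecting unknown values up front means a misconfigured container stops at startup instead of serving an empty graph.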
374+
375+
#### com.microsoft.cosmosdb.caig.graph.AppGraph
376+
377+
This class contains the singleton instance of class
378+
org.apache.jena.rdf.model.Model in the Apache Jena SDK.
379+
380+
It implements this primary method signature:
381+
382+
```
383+
public synchronized SparqlQueryResponse query(SparqlQueryRequest request) {
384+
}
385+
```
386+
387+
The method is **synchronized** so as to be thread safe. Each HTTP request
388+
to the Spring Boot application runs in its own thread.
389+
390+
Classes SparqlQueryRequest and SparqlQueryResponse are simple
391+
JSON serializable classes used to receive the HTTP POSTed query
392+
and to return the JSON response to the query.
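
The singleton-plus-synchronized-method pattern described above has a close Python analogue, sketched below with a `threading.Lock`. This is not the Java class: the in-memory "graph" is a plain list of tuples, the request is a dict with a hypothetical `subject` key, and no SPARQL is evaluated. It only shows how one shared object can serve concurrent requests one at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


class AppGraph:
    """Holds one shared in-memory graph; query() is serialized by a lock."""

    _singleton: Optional["AppGraph"] = None

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = list(triples)
        self._lock = threading.Lock()
        self.query_count = 0

    @classmethod
    def set_singleton(cls, graph: "AppGraph") -> None:
        cls._singleton = graph

    @classmethod
    def get_singleton(cls) -> "AppGraph":
        if cls._singleton is None:
            raise RuntimeError("graph has not been built yet")
        return cls._singleton

    def query(self, request: Dict[str, str]) -> Dict[str, object]:
        """Return the triples whose subject equals request['subject']."""
        with self._lock:  # plays the role of Java's synchronized keyword
            self.query_count += 1
            rows = [t for t in self._triples if t[0] == request.get("subject")]
            return {"request": request, "rows": rows}


if __name__ == "__main__":
    AppGraph.set_singleton(AppGraph([("flask", "uses_lib", "jinja2")]))
    graph = AppGraph.get_singleton()
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(graph.query, [{"subject": "flask"}] * 20))
    print(graph.query_count, len(results))
```

Serializing every query is the simplest way to stay thread safe; the trade-off is that concurrent requests wait on each other while a query runs.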
393+
394+
#### com.microsoft.cosmosdb.caig.graph.LibrariesGraphTriplesBuilder
395+
396+
For the cases where AppGraphBuilder sources the graph from Cosmos DB,
397+
instances of LibrariesGraphTriplesBuilder are used to populate the
398+
graph from each appropriate Cosmos DB document.
399+
400+
Customers should implement their own GraphTriplesBuilder class for
401+
their needs, tailored to the shape of their Cosmos DB
402+
documents and graph schema.
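
To suggest what a custom triples builder does, here is a minimal Python sketch that turns one library document into N-Triples lines. Everything specific in it is hypothetical: the namespace `http://example.org/caig#`, the `Lib` class, the `uses_lib` predicate, and the `name` and `dependencies` document fields are illustrative choices, not the schema used by LibrariesGraphTriplesBuilder.

```python
from typing import Iterator, Mapping
from urllib.parse import quote

NAMESPACE = "http://example.org/caig#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local_name: str) -> str:
    """Return an N-Triples IRI term in the example namespace."""
    return "<" + NAMESPACE + quote(local_name, safe="") + ">"


def library_triples(doc: Mapping[str, object]) -> Iterator[str]:
    """Yield N-Triples lines for one library document.

    Assumes hypothetical fields: 'name' (str) and 'dependencies' (list of str).
    """
    subject = iri(str(doc["name"]))
    # One type triple for the library itself.
    yield subject + " <" + RDF_TYPE + "> " + iri("Lib") + " ."
    # One edge per dependency.
    for dep in doc.get("dependencies") or []:
        yield subject + " " + iri("uses_lib") + " " + iri(str(dep)) + " ."


if __name__ == "__main__":
    for line in library_triples({"name": "flask", "dependencies": ["jinja2"]}):
        print(line)
```

A builder for a different domain would keep the same shape, one function from document to triples, and change only the fields it reads and the predicates it emits.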
