docs: Refactor docs
iusztinpaul committed Aug 1, 2024
1 parent e0f268d commit 6621975
Showing 4 changed files with 33 additions and 92 deletions.
22 changes: 22 additions & 0 deletions GENERATE_INSTRUCT_DATASET.md
@@ -0,0 +1,22 @@
# Generate Data for the LLM Fine-Tuning Task Component

## Component Structure

### File Handling
- `file_handler.py`: Manages file I/O operations, enabling reading and writing of JSON formatted data.

### LLM Communication
- `llm_communication.py`: Handles communication with OpenAI's LLMs, sending prompts and processing responses.

### Data Generation
- `generate_data.py`: Orchestrates the generation of training data by integrating file handling, LLM communication, and data formatting.
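
A minimal sketch of how these three pieces could fit together. The function and field names below (`read_json`, `write_json`, the `content`/`instruction` keys) are illustrative assumptions, not the project's actual API; the LLM call is injected as a plain callable so the flow is visible without an API key.

```python
import json
from pathlib import Path
from typing import Callable

def read_json(path: Path) -> list[dict]:
    """file_handler.py role: load raw documents from a JSON file."""
    return json.loads(path.read_text())

def write_json(path: Path, rows: list[dict]) -> None:
    """file_handler.py role: persist the generated instruct dataset."""
    path.write_text(json.dumps(rows, indent=2))

def generate_instruct_dataset(
    in_path: Path, out_path: Path, llm: Callable[[str], str]
) -> None:
    """generate_data.py role: orchestrate read -> prompt the LLM -> write.

    `llm` stands in for llm_communication.py, which in the real project
    sends prompts to OpenAI's models and parses the responses.
    """
    rows = [
        {"instruction": llm(doc["content"]), "content": doc["content"]}
        for doc in read_json(in_path)
    ]
    write_json(out_path, rows)
```

Injecting the LLM as a callable keeps file handling and data formatting testable offline; the real module would pass a wrapper around the OpenAI client instead.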


### Usage

The project includes a `Makefile` for easy management of common tasks. Here are the main commands you can use:

- `make help`: Displays help for each make command.
- `make local-start`: Build and start MongoDB, the message queue (MQ), and Qdrant.
- `make local-test-github`: Insert data into MongoDB.
- `make generate-dataset`: Generate the dataset for fine-tuning and version it in Comet ML.
10 changes: 5 additions & 5 deletions INSTALL_AND_USAGE.md
@@ -50,7 +50,7 @@ Behind the scenes it will build and run all the Docker images defined in the [do
> 127.0.0.1 mongo3
> ```
>
- > From what we know, on `Windows`, it `works out-of-the-box`.
+ > From what we know, on `Windows`, it `works out-of-the-box`. For more details, check out this article: https://medium.com/workleap/the-only-local-mongodb-replica-set-with-docker-compose-guide-youll-ever-need-2f0b74dd8384
> [!WARNING]
> For `arm` users (e.g., `M1/M2/M3 macOS devices`), go to your Docker desktop application and enable `Use Rosetta for x86_64/amd64 emulation on Apple Silicon` from the Settings. There is a checkbox you have to check.
@@ -112,7 +112,7 @@ make local-test-retriever

The last step before fine-tuning is to generate an instruct dataset and track it as an artifact in Comet ML. To do so, run:
```shell
-make local-generate-dataset
+make generate-dataset
```

> Now open [Comet ML](https://www.comet.com/signup/?utm_source=decoding_ml&utm_medium=partner&utm_content=github), go to your workspace, and open the `Artifacts` tab. There, you should find three artifacts as follows:
@@ -123,19 +123,19 @@ make local-generate-dataset

### Step 5: Fine-tuning

- For details on setting up the training pipeline on [Qwak](https://www.qwak.com/lp/end-to-end-mlops/?utm_source=github&utm_medium=referral&utm_campaign=decodingml) and running it, please referr to the [TRAINING]() document.
+ For details on setting up the training pipeline on [Qwak](https://www.qwak.com/lp/end-to-end-mlops/?utm_source=github&utm_medium=referral&utm_campaign=decodingml) and running it, please refer to the [TRAINING](https://github.com/decodingml/llm-twin-course/blob/main/TRAINING.md) document.

### Step 6: Inference

- After you finetuned your model, the first step is to deploy the inference pipeline to Qwak as a REST API service:
+ After you have finetuned your model, the first step is to deploy the inference pipeline to Qwak as a REST API service:
```shell
make deploy-inference-pipeline
```

> [!NOTE]
> You can check out the progress of the deployment on [Qwak](https://www.qwak.com/lp/end-to-end-mlops/?utm_source=github&utm_medium=referral&utm_campaign=decodingml).
- After the deployment is finished (it will take a while) you can call it by calling:
+ After the deployment is finished (it will take a while), you can call it by running:
```shell
make call-inference-pipeline
```
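
Under the hood, calling the deployed REST API service boils down to an HTTP POST against the inference endpoint. A hedged sketch of what such a call could look like, using only the standard library; the URL and the `instruction` payload field are illustrative assumptions, not Qwak's documented schema:

```python
import json
import urllib.request

def build_inference_request(url: str, query: str) -> urllib.request.Request:
    """Build a POST request carrying the user query as JSON.

    The payload shape is a placeholder; the real deployment defines
    its own request schema.
    """
    payload = json.dumps({"instruction": query}).encode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would then be:
#   with urllib.request.urlopen(build_inference_request(url, "What is an LLM twin?")) as resp:
#       answer = json.load(resp)
```

Separating request construction from sending keeps the payload logic unit-testable without a live endpoint.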
2 changes: 1 addition & 1 deletion Makefile
@@ -60,7 +60,7 @@ cloud-test-github: # Send command to the cloud lambda with a Github repository
local-feature-pipeline: # Run the RAG feature pipeline
RUST_BACKTRACE=full poetry run python -m bytewax.run 3-feature-pipeline/main.py

-local-generate-dataset: # Generate dataset for finetuning and version it in Comet ML
+generate-dataset: # Generate dataset for finetuning and version it in Comet ML
docker exec -it llm-twin-bytewax python -m finetuning.generate_data

# ------ RAG ------
91 changes: 5 additions & 86 deletions course/module-3/README.md → RAG.md
@@ -1,8 +1,3 @@
# Introduction
This module is composed from 2 components:
- RAG component
- Finetuning dataset preparation component

# RAG component
A production RAG system is split into 3 main components:

@@ -48,8 +43,6 @@ To prepare your environment for these components, follow these steps:
- `poetry init`
- `poetry install`



## Docker Settings
### Host Configuration
To ensure that your Docker containers can communicate with each other, you need to update your `/etc/hosts` file.
@@ -62,7 +55,7 @@ Add the following entries to map the hostnames to your local machine:
127.0.0.1 mongo3
```

- For the Windows users check this article: https://medium.com/workleap/the-only-local-mongodb-replica-set-with-docker-compose-guide-youll-ever-need-2f0b74dd8384
+ For Windows users check this article: https://medium.com/workleap/the-only-local-mongodb-replica-set-with-docker-compose-guide-youll-ever-need-2f0b74dd8384

# CometML Integration

@@ -98,8 +91,6 @@ To access and set up the necessary CometML variables for your project, follow th
4. **Set Environment Variables**:
- Add the obtained `COMET_API_KEY` to your environment variables, along with the `COMET_PROJECT` and `COMET_WORKSPACE` names you have set up.
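
A small sketch of loading these settings at startup and failing fast when one is missing. The variable names follow the document; the helper itself is illustrative, not part of the project:

```python
import os

# The three Comet ML settings the steps above describe.
REQUIRED_VARS = ("COMET_API_KEY", "COMET_PROJECT", "COMET_WORKSPACE")

def load_comet_settings() -> dict:
    """Read the Comet ML settings from the environment.

    Raising early on missing variables gives a clearer error than a
    failed API call deep inside the pipeline.
    """
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"Missing Comet ML settings: {missing}")
    return {v: os.environ[v] for v in REQUIRED_VARS}
```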



# Qdrant Integration

## Overview
@@ -188,8 +179,8 @@ The `insert_data_mongo.py` script is designed to manage the automated downloadin

# RAG Component

-## RAG Module Structure
-
+# RAG Module Structure
### Query Expansion
- `query_expansion.py`: Handles the expansion of a given query into multiple variations using language model-based templates. It integrates the `ChatOpenAI` class from `langchain_openai` and a custom `QueryExpansionTemplate` to generate expanded queries suitable for further processing.
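
An illustrative sketch of the query-expansion flow. The real module uses `ChatOpenAI` from `langchain_openai` with a custom `QueryExpansionTemplate`; here the template wording is an assumption and a stub LLM stands in for the model so the parsing logic is visible without an API key:

```python
from typing import Callable

# Assumed prompt wording; the project's QueryExpansionTemplate differs.
EXPANSION_TEMPLATE = (
    "Generate {n} alternative phrasings of the following question, "
    "one per line, to improve vector-search recall:\n{question}"
)

def expand_query(question: str, llm: Callable[[str], str], n: int = 3) -> list[str]:
    """Ask the LLM for `n` variations and parse one query per line."""
    prompt = EXPANSION_TEMPLATE.format(n=n, question=question)
    variations = [
        line.strip() for line in llm(prompt).splitlines() if line.strip()
    ]
    # Always retrieve with the original question as well.
    return [question] + variations

def stub_llm(prompt: str) -> str:
    # Stand-in for a ChatOpenAI call; returns canned variations.
    return "What is an LLM twin?\nHow does an LLM twin work?"
```

Each expanded query is then embedded and searched independently, which increases the chance that at least one phrasing lands near the relevant chunks in the vector store.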

@@ -216,78 +207,6 @@ The workflow is straightforward:
The project includes a `Makefile` for easy management of common tasks. Here are the main commands you can use:

- `make help`: Displays help for each make command.
- `make local-start-infra`: Build and start mongodb, mq and qdrant.
- `make local-start-cdc`: Start cdc system
- `make insert-data-mongo`: Insert data to mongodb
- `make local-bytewax`: Run bytewax pipeline and send data to Qdrant
- `make local-test-retriever`: Test RAG retrieval


# Generate Data for LLM finetuning task component

# Component Structure


### File Handling
- `file_handler.py`: Manages file I/O operations, enabling reading and writing of JSON formatted data.

### LLM Communication
- `llm_communication.py`: Handles communication with OpenAI's LLMs, sending prompts and processing responses.

### Data Generation
- `generate_data.py`: Orchestrates the generation of training data by integrating file handling, LLM communication, and data formatting.


### Usage

The project includes a `Makefile` for easy management of common tasks. Here are the main commands you can use:

- `make help`: Displays help for each make command.
- `make local-start-infra`: Build and start mongodb, mq and qdrant.
- `make local-start-cdc`: Start cdc system
- `make insert-data-mongo`: Insert data to mongodb
- `make local-bytewax`: Run bytewax pipeline and send data to Qdrant
- `make generate-dataset`: Generate dataset for finetuning and version it in CometML



# Meet your teachers!
The course is created under the [Decoding ML](https://decodingml.substack.com/) umbrella by:

<table>
<tr>
<td><a href="https://github.com/iusztinpaul" target="_blank"><img src="https://github.com/iusztinpaul.png" width="100" style="border-radius:50%;"/></a></td>
<td>
<strong>Paul Iusztin</strong><br />
<i>Senior ML & MLOps Engineer</i>
</td>
</tr>
<tr>
<td><a href="https://github.com/alexandruvesa" target="_blank"><img src="https://github.com/alexandruvesa.png" width="100" style="border-radius:50%;"/></a></td>
<td>
<strong>Alexandru Vesa</strong><br />
<i>Senior AI Engineer</i>
</td>
</tr>
<tr>
<td><a href="https://github.com/Joywalker" target="_blank"><img src="https://github.com/Joywalker.png" width="100" style="border-radius:50%;"/></a></td>
<td>
<strong>Răzvanț Alexandru</strong><br />
<i>Senior ML Engineer</i>
</td>
</tr>
</table>

# License

This course is an open-source project released under the MIT license. Thus, as long you distribute our LICENSE and acknowledge our work, you can safely clone or fork this project and use it as a source of inspiration for whatever you want (e.g., university projects, college degree projects, personal projects, etc.).

# 🏆 Contribution

A big "Thank you 🙏" to all our contributors! This course is possible only because of their efforts.

<p align="center">
<a href="https://github.com/decodingml/llm-twin-course/graphs/contributors">
<img src="https://contrib.rocks/image?repo=decodingml/llm-twin-course" />
</a>
</p>
- `make local-start`: Build and start MongoDB, the message queue (MQ), and Qdrant.
- `make local-test-github`: Insert data into MongoDB.
- `make local-test-retriever`: Test RAG retrieval.
