
# GenAI

Implementations of LLMs for different use cases

I divide RAG pipelines into a three-step process: 1) chunking, 2) vector DB and persistence, 3) output generation.

Creating the right chunks for a specific use case is the most important part of getting accurate answers, and most of the work goes into this step. To persist data we use SQLite; other stores such as an in-memory database or PostgreSQL can also be used. For indexing the vectors and performing vector search we use FAISS. Different sentence-transformers models can also be swapped in to see how accuracy changes for a given use case.
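As a rough illustration of these three steps, here is a minimal sketch. The model name, chunk size, and file paths are illustrative assumptions, not the exact values used in this repo:

```python
# Minimal sketch of chunking, SQLite persistence, and FAISS indexing.
# The model name, chunk size, and paths below are illustrative assumptions.
import sqlite3
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=500, overlap=50):
    """Naive fixed-size chunking with a small overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

text = open("datasets/sample.txt").read()  # assumed input file
chunks = chunk_text(text)

# Persist raw chunks in SQLite so they can be looked up by index later.
conn = sqlite3.connect("chunks.db")
conn.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO chunks (id, text) VALUES (?, ?)", list(enumerate(chunks)))
conn.commit()

# Embed the chunks and index them with FAISS for vector search.
model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-transformers model works
embeddings = model.encode(chunks).astype(np.float32)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "chunks.faiss")
```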

## RAG

The project under RAG is an implementation of the pipeline with native Python modules. This gives us a lot of control over the pipeline and keeps us from depending on bloated Python SDKs. Here we can clearly understand and control the chain and retrievals, set prompts easily, see the exact chunk (context) of data being passed, and modify it according to our needs, as in the sketch below.
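A minimal sketch of the retrieval and prompt-assembly step, kept in plain Python so the context being passed stays fully visible. The query, top-k value, and prompt template here are assumptions and may differ from what main.py actually uses:

```python
# Retrieval + prompt assembly in plain Python; the chunks passed as
# context can be printed and edited freely. Query, top_k, and the
# prompt template below are illustrative assumptions.
import sqlite3
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("chunks.faiss")
conn = sqlite3.connect("chunks.db")

def retrieve(query, top_k=3):
    """Return the top_k most similar chunks for the query."""
    query_vec = model.encode([query]).astype(np.float32)
    _, ids = index.search(query_vec, top_k)
    return [conn.execute("SELECT text FROM chunks WHERE id = ?", (int(i),)).fetchone()[0]
            for i in ids[0]]

def build_prompt(query, context_chunks):
    """The prompt is just a string, so it can be inspected and changed easily."""
    context = "\n\n".join(context_chunks)
    return (f"Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

question = "What problem does Bitcoin solve?"
prompt = build_prompt(question, retrieve(question))
print(prompt)  # inspect exactly what will be sent to the model
```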

## Getting Started

Clone the repo:

```bash
git clone https://github.com/abhinav23484/GenAI.git
```

cd into RAG:

```bash
cd RAG
```

Create a virtual environment:

```bash
python -m venv path-to-project_dir/GenAI/RAG/venv
```

Activate the environment:

```bash
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run it as an endpoint:

```bash
uvicorn main:app --reload
```

The endpoint can be tested at http://127.0.0.1:8000/docs.
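The endpoint can also be called programmatically. A sketch with Python `requests` follows; the route name and payload shape here are assumptions, so check /docs for the actual schema:

```python
# Illustrative client call; the /query route and its JSON fields are assumed.
# The real routes and schemas are listed at http://127.0.0.1:8000/docs.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/query",
    json={"question": "What problem does Bitcoin solve?"},
)
print(resp.json())
```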

Example run (the document/dataset used here is `RAG/datasets/bitcoin.pdf`): see the screenshot in the repo.

Here the model is loaded using ctransformers. The model parameters are:

**Config**

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `top_k` | `int` | The top-k value to use for sampling. | 40 |
| `top_p` | `float` | The top-p value to use for sampling. | 0.95 |
| `temperature` | `float` | The temperature to use for sampling. | 0.8 |
| `repetition_penalty` | `float` | The repetition penalty to use for sampling. | 1.1 |
| `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty. | 64 |
| `seed` | `int` | The seed value to use for sampling tokens. | -1 |
| `max_new_tokens` | `int` | The maximum number of new tokens to generate. | 256 |
| `stop` | `List[str]` | A list of sequences to stop generation when encountered. | None |
| `stream` | `bool` | Whether to stream the generated text. | False |
| `reset` | `bool` | Whether to reset the model state before generating text. | True |
| `batch_size` | `int` | The batch size to use for evaluating tokens in a single prompt. | 8 |
| `threads` | `int` | The number of threads to use for evaluating tokens. | -1 |
| `context_length` | `int` | The maximum context length to use. | -1 |
| `gpu_layers` | `int` | The number of layers to run on GPU. | 0 |
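A minimal sketch of loading a local model with ctransformers and overriding a few of these parameters. The model file, `model_type`, and parameter values below are assumptions, not the exact setup used in this repo:

```python
# Sketch of loading a local GGUF model with ctransformers; the model file,
# model_type, and parameter values below are illustrative assumptions.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/llama-2-7b.Q4_K_M.gguf",  # assumed local model file
    model_type="llama",
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    max_new_tokens=256,
    context_length=2048,
)

print(llm("Summarize the retrieved context in one sentence:"))
```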

In a similar way the model can be loaded using LlamaIndex, llama.cpp, or transformers itself, depending on the available infrastructure.

The FastAPI host and port can be configured:

```bash
uvicorn main:app --host 0.0.0.0 --port 80
```

This endpoint can then be deployed and integrated with other UI applications.