A multilingual internship search assistant that uses LangChain + Llama 3 (Groq) to convert natural-language questions into SQL queries over a local SQLite database of internships. It returns clickable internship links, formatted answers, and an HTML table view, all accessible through a simple Flask + Jinja web UI and deployed as a Hugging Face Space.
Live Demo: https://huggingface.co/spaces/joshi-deepak08/Internship_extraction_chatbot-Sahayak
## Table of Contents

- Overview
- Features
- Folder Structure
- How to Run Locally
- Architecture & Design Decisions
- Approach
- Pipeline Design
- Challenges & Trade-Offs
## Overview

SAHAYAK is an AI assistant that helps students discover internships from a structured local database using natural language queries.
Key capabilities:
- Understands user questions in any language (auto-detected & translated).
- Uses LangChain + Llama 3 (Groq) to generate SQL queries dynamically.
- Executes those SQL queries against a local `internship.db` SQLite database.
- Returns:
  - a human-friendly answer with Markdown links
  - an optional HTML table of internships (title, link, stipend).
- Frontend is a Flask app with a simple chat-style interface showing previous conversation history.
## Features

- NL → SQL using LangChain + Llama 3.
- Multilingual support via `deep_translator` (input + output).
- SQLite database of internships (`internship.db`).
- Tabular view of selected query results (e.g., internships with low stipend).
- Conversation history maintained in memory for UX.
- Deployed on a Hugging Face Space for easy access.
## Folder Structure

```
Internship-Extraction-Chatbot/
│
└── SAHAYAK: An AI Assistant/
    ├── static/              # CSS, JS, assets for UI
    ├── templates/
    │   └── index.html       # Chat UI (Flask + Jinja)
    ├── app.py               # Flask app + LangChain pipeline
    ├── internship.db        # SQLite database with internship records
    └── README.md
```
(Your GitHub repo root may directly contain these files depending on layout.)
## How to Run Locally

```bash
git clone https://github.com/JoshiDeepak08/Internship-Extraction-Chatbot.git
cd Internship-Extraction-Chatbot/SAHAYAK:\ An\ AI\ Assistant
```

(or navigate into the folder that contains `app.py` and `internship.db`.)
```bash
python -m venv venv
source venv/bin/activate   # macOS / Linux
# or
venv\Scripts\activate      # Windows
```

```bash
pip install -r requirements.txt   # if present
```

If there is no `requirements.txt` at this level, install the key libs manually (`sqlite3` ships with Python's standard library, so it does not need to be installed):

```bash
pip install flask pandas langchain langchain-community langchain-groq \
    deep-translator
```

In `app.py` you currently have:
```python
os.environ["OPENAI_API_KEY"] = ""
os.environ["GROQ_API_KEY"] = ""
```

Instead of hardcoding, export these before running:

```bash
export GROQ_API_KEY="your_groq_api_key_here"
# (OPENAI_API_KEY can remain empty; it's not used right now)
```

On Windows (PowerShell):

```powershell
$env:GROQ_API_KEY="your_groq_api_key_here"
```

Alternatively, you can set them directly in the code (not recommended for production).
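A safer pattern inside `app.py` is to read the key from the environment and fail fast if it is missing. A minimal sketch (the `require_env` helper is hypothetical, not the current code):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required environment variable or fail fast with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# In app.py you would call this once at startup instead of hardcoding:
#   GROQ_API_KEY = require_env("GROQ_API_KEY")
```

This keeps secrets out of version control while still surfacing configuration mistakes immediately.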
```bash
python app.py
```

By default it will start in debug mode at:
http://127.0.0.1:5000
Open this in your browser to use the chatbot.
## Architecture & Design Decisions

- Backend: Flask
- Database: SQLite (`internship.db`)
- LLM: `llama3-8b-8192` via `ChatGroq`
- Orchestration: LangChain (`SQLDatabase`, `create_sql_query_chain`, `QuerySQLDataBaseTool`)
- Translation: `deep_translator.GoogleTranslator`
- Frontend: Jinja2 HTML template (`templates/index.html`) + simple CSS/Bootstrap
Why LangChain:

- Automatically maps natural language to SQL given the DB schema.
- Handles the query generation + execution pipeline in a few lines.
- Easy to change the LLM or database backend later.

Why Llama 3 via Groq:

- Fast inference, good cost-performance.
- Open-weight model with strong reasoning over structured tasks like SQL generation.

Why SQLite:

- Lightweight, file-based DB.
- Perfect for a single-file internship dataset.
- Easy to ship with the repo and deploy on a Hugging Face Space.
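To illustrate how such a file-based DB can be queried, here is a self-contained sketch using Python's built-in `sqlite3` module. The table and column names below are assumptions for illustration, not the actual `internship.db` schema:

```python
import sqlite3

# Assumed schema for illustration; the real internship.db may differ.
conn = sqlite3.connect(":memory:")  # use "internship.db" for the real file
conn.execute(
    "CREATE TABLE internships (title TEXT, link TEXT, stipend INTEGER)"
)
conn.executemany(
    "INSERT INTO internships VALUES (?, ?, ?)",
    [
        ("Data Science Intern", "https://example.com/ds", 10000),
        ("Web Dev Intern", "https://example.com/web", 5000),
    ],
)

# The kind of query the LLM might generate for "internships with low stipend":
rows = conn.execute(
    "SELECT title, link, stipend FROM internships WHERE stipend < 8000"
).fetchall()
```

Because the whole database is one file, shipping it inside the repo (and the Hugging Face Space) requires no server setup at all.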
## Approach

1. User types a question (can be in Hindi, English, or any language).
2. The app:
   - Translates the question → English.
   - Uses `create_sql_query_chain` with `ChatGroq` and `SQLDatabase` to generate a SQL query.
   - Extracts the SQL text from the LLM output (`generated_query.split("SQLQuery: ")[-1].strip()`).
   - Executes the SQL using `QuerySQLDataBaseTool`.
3. The raw SQL result (rows) is:
   - optionally converted into an HTML table (especially for internship listings), and
   - passed to another LLM prompt (`answer_prompt`) to generate a short, readable summary.
4. The answer (in English) is translated back to the user's original language using `GoogleTranslator`.
5. The final, translated response and HTML table are rendered on the page.
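The extraction and table-rendering steps above can be sketched in plain Python. The `split("SQLQuery: ")` idiom is the one quoted from `app.py`; the `rows_to_html` helper is a hypothetical illustration, not the exact code in the repo:

```python
# Step: pull the SQL text out of the LLM's "SQLQuery: ..." style output.
generated_query = (
    "Question: low stipend internships\n"
    "SQLQuery: SELECT title, link, stipend FROM internships WHERE stipend < 8000"
)
sql = generated_query.split("SQLQuery: ")[-1].strip()

# Step: turn raw result rows into a simple HTML table (hypothetical helper).
def rows_to_html(rows, headers=("Title", "Link", "Stipend")):
    head = "".join(f"<th>{h}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

table_html = rows_to_html([("Web Dev Intern", "https://example.com/web", 5000)])
```

Note that string-splitting on `"SQLQuery: "` is brittle; if the model changes its output format, the extraction step breaks silently.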
## Pipeline Design

```mermaid
flowchart TD
    A["User Question (Any Language)"] --> B["Flask Route"]
    B --> C["Translate to English\ndeep-translator"]
    C --> D["LangChain SQL Query Chain"]
    D --> E["Generated SQL Query"]
    E --> F["Execute on SQLite DB"]
    F --> G["Raw SQL Result Rows"]
```
## Challenges & Trade-Offs

- Direct LLM-to-SQL can be risky if the DB has write/drop access.
- Here, the DB is read-only and local, so the impact is contained.
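One way to enforce read-only access at the connection level is SQLite's URI mode, which makes any write attempt fail. A sketch (it is an assumption that `app.py` does not already do this):

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file (mode=ro requires an existing file).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE internships (title TEXT)")
conn.close()

# Open the database read-only: writes now raise sqlite3.OperationalError.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
try:
    ro.execute("DROP TABLE internships")  # an LLM-generated destructive query
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

With this guard, even a hallucinated `DROP` or `DELETE` from the model cannot modify the data.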
- Using `GoogleTranslator` for both directions introduces:
  - possible semantic drift,
  - but a huge benefit of multilingual UX.
- Trade-off: slight inaccuracy vs. accessibility for non-English users.
- The quality of SQL generation depends heavily on:
  - clear column names in `internship.db`,
  - proper metadata exposure via `SQLDatabase.from_uri`.
- If the schema changes, prompts may need updating.
- Currently, only a simple in-memory list `previous_conversations` is used.
- No complex context or multi-turn reasoning yet, but good enough for a first prototype.
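That history can be as simple as a list of question/answer pairs, optionally capped so memory does not grow unbounded. A hedged sketch; the actual `previous_conversations` structure in `app.py` may differ, and the cap is an added suggestion:

```python
# In-memory conversation history (structure assumed, not taken from app.py).
previous_conversations = []
MAX_HISTORY = 50  # hypothetical cap to keep memory bounded

def remember(question: str, answer: str) -> None:
    """Append one turn and drop the oldest turns beyond the cap."""
    previous_conversations.append({"question": question, "answer": answer})
    del previous_conversations[:-MAX_HISTORY]

remember("Show internships with low stipend", "Here are 2 internships ...")
```

Since the list lives in process memory, it resets on every restart and is shared across all users of the Space, which is acceptable for a prototype but not for multi-user production use.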