Commit 4404c0a

Updated root README
1 parent 6e9bd6a commit 4404c0a

3 files changed

Lines changed: 496 additions & 2 deletions

README.md

Lines changed: 354 additions & 2 deletions
@@ -13,6 +13,358 @@ Propulse is a multi-agent system that leverages AI to generate high-quality prop
- Verifier Agent: Ensures factual accuracy and compliance
- **Modern Tech Stack**: Built with FastAPI, Streamlit, and Google Cloud Platform

## 🚀 Quick Start

1. **Clone the Repository**
   ```bash
   git clone https://github.com/nerdy1texan/propulse.git
   cd propulse
   ```

2. **Set Up Environment**

   For Windows Git Bash:
   ```bash
   # Initialize conda in Git Bash (do this once)
   source ~/anaconda3/etc/profile.d/conda.sh

   # Create and activate conda environment
   conda env create -f environment.yml
   conda activate propulse
   ```

   For other terminals:
   ```bash
   # Create and activate conda environment
   conda env create -f environment.yml
   conda activate propulse
   ```

3. **Configure Environment Variables**
   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. **Start Services**
   ```bash
   # Start backend
   cd backend
   uvicorn main:app --reload

   # In another terminal, start frontend
   cd frontend
   streamlit run main.py
   ```

## 🏗️ Architecture

### System Architecture
```mermaid
graph TD
    subgraph "Frontend Layer"
        UI[Streamlit UI]
        Upload[Document Upload]
        Preview[Proposal Preview]
    end

    subgraph "Backend Layer"
        API[FastAPI Service]
        Auth[Authentication]
        Cache[Redis Cache]
    end

    subgraph "Agent Pipeline"
        R[Retriever Agent]
        W[Writer Agent]
        V[Verifier Agent]
    end

    subgraph "Storage Layer"
        VDB1[Vector DB - RFPs]
        VDB2[Vector DB - Proposals]
        DB[(PostgreSQL)]
        GCS[Cloud Storage]
    end

    UI --> API
    Upload --> API
    API --> Auth
    API --> Cache
    API --> R
    R --> VDB1
    R --> VDB2
    R --> W
    W --> V
    V --> API
    API --> Preview
    API --> DB
    API --> GCS
```

### Workflow Diagram
```mermaid
sequenceDiagram
    actor User
    participant UI as Frontend
    participant API as Backend
    participant R as Retriever
    participant W as Writer
    participant V as Verifier
    participant DB as Databases

    User->>UI: Upload RFP/Enter Prompt
    UI->>API: Submit Request
    API->>R: Get Relevant Context
    R->>DB: Query Vector DBs
    DB-->>R: Return Matches
    R->>W: Context + Prompt
    W->>V: Generated Proposal
    V->>API: Verified Content
    API->>UI: Return Proposal
    UI->>User: Display Result
```
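
Read as code, the workflow above is a chain of three stages. The following is a minimal sketch under stated assumptions, not the project's implementation: `Proposal`, `retrieve`, `write`, `verify`, and `run_pipeline` are hypothetical names, and the keyword matching stands in for the vector search that the diagram attributes to the Retriever.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical container passed between pipeline stages."""
    prompt: str
    context: list[str] = field(default_factory=list)
    draft: str = ""
    verified: bool = False


def retrieve(prompt: str, corpus: list[str]) -> Proposal:
    # Stand-in for the Retriever: keep passages sharing a word with the prompt.
    words = set(prompt.lower().split())
    matches = [p for p in corpus if words & set(p.lower().split())]
    return Proposal(prompt=prompt, context=matches)


def write(proposal: Proposal) -> Proposal:
    # Stand-in for the Writer: stitch the retrieved context into a draft.
    proposal.draft = f"Proposal for: {proposal.prompt}\n" + "\n".join(proposal.context)
    return proposal


def verify(proposal: Proposal) -> Proposal:
    # Stand-in for the Verifier: accept only drafts backed by some context.
    proposal.verified = bool(proposal.context)
    return proposal


def run_pipeline(prompt: str, corpus: list[str]) -> Proposal:
    # Retriever -> Writer -> Verifier, mirroring the diagram's message order.
    return verify(write(retrieve(prompt, corpus)))
```

Keeping each stage a function that takes and returns the same object makes it straightforward to test the stages in isolation before any model or database is wired in.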

## 📁 Detailed Project Structure

```
Propulse/
├── backend/                  # FastAPI backend service
│   ├── agents/               # Agent implementations
│   │   ├── retriever/        # Retriever agent logic
│   │   │   ├── __init__.py
│   │   │   ├── agent.py
│   │   │   └── utils.py
│   │   ├── writer/           # Writer agent logic
│   │   │   ├── __init__.py
│   │   │   ├── agent.py
│   │   │   └── templates.py
│   │   └── verifier/         # Verifier agent logic
│   │       ├── __init__.py
│   │       ├── agent.py
│   │       └── rules.py
│   ├── api/                  # API endpoints
│   │   ├── v1/
│   │   │   ├── __init__.py
│   │   │   ├── auth.py
│   │   │   ├── proposals.py
│   │   │   └── users.py
│   │   └── middleware/
│   ├── core/                 # Core business logic
│   │   ├── config/
│   │   ├── models/
│   │   └── services/
│   ├── logs/                 # Log files
│   └── main.py
├── frontend/                 # Streamlit frontend
│   ├── assets/               # Static assets
│   │   ├── css/
│   │   └── img/
│   ├── components/           # Reusable components
│   │   ├── upload/
│   │   ├── prompt/
│   │   └── preview/
│   ├── pages/                # Application pages
│   │   ├── home.py
│   │   ├── generate.py
│   │   └── history.py
│   └── main.py
├── shared/                   # Shared resources
│   ├── mcp_schemas/          # MCP protocol schemas
│   │   ├── input/
│   │   └── output/
│   ├── sample_rfps/          # Sample RFP documents
│   └── templates/            # Proposal templates
├── infra/                    # Infrastructure code
│   ├── gcp/                  # GCP configurations
│   │   ├── backend/
│   │   └── frontend/
│   └── terraform/            # Terraform configurations
├── scripts/                  # Utility scripts
│   ├── setup.sh
│   └── cleanup.sh
├── .github/                  # GitHub configurations
│   └── workflows/            # CI/CD workflows
├── tests/                    # Test suite
│   ├── unit/
│   └── integration/
├── .env.example              # Environment variables template
├── environment.yml           # Conda environment file
├── .gitignore                # Git ignore rules
└── README.md                 # Project documentation
```

## 🔑 Key Features

### Implemented Components ✅

#### **Retriever Agent (Prompt 2)**
- **Dual Vector Search**: Simultaneously queries RFP and proposal vector databases
- **Multi-Format Support**: Processes PDF, DOCX, and TXT documents
- **Smart Text Chunking**: Intelligent document segmentation with overlapping windows
- **MCP Compliance**: Follows Model Context Protocol for standardized I/O
- **Real-time Logging**: Comprehensive JSONL logs with retrieval metadata
- **Error Resilience**: Graceful handling of missing files or processing errors
- **Flexible Querying**: Supports text-only, document-only, or combined queries
- **Embedding Models**: Uses Sentence Transformers for semantic similarity
- **FAISS Integration**: High-performance vector similarity search
- **GPU Acceleration**: Optional GPU support for faster processing
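
To make the overlapping-window idea concrete, here is a minimal sketch of word-based chunking. The function name `chunk_text`, its parameters, and the default sizes are assumptions for illustration; the retriever's actual chunking logic and units (words, characters, or tokens) are not documented here.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some words twice.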

#### **Text Processing Pipeline**
- **PDF Extraction**: Advanced PDF text extraction with page preservation
- **DOCX Processing**: Complete DOCX parsing including tables and paragraphs
- **Text Normalization**: Intelligent cleaning and formatting
- **Metadata Preservation**: Maintains source file information and processing timestamps
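
As a rough illustration of what text normalization can involve after PDF or DOCX extraction, the sketch below applies a few standard-library clean-up steps. `normalize_text` and the specific rules are assumptions, not the pipeline's actual behavior.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Apply a few conservative clean-up steps to extracted text."""
    text = unicodedata.normalize("NFKC", raw)     # fold ligatures and full-width forms
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)          # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # keep at most one blank line
    return text.strip()
```

One known limitation of the hyphen rule: a genuinely hyphenated compound that happens to break at a line end (such as "well-" followed by "known") is merged into one word.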

#### **Vector Database Management**
- **Automated Building**: Scripts to build vector databases from document collections
- **Index Management**: FAISS index creation and optimization
- **Metadata Storage**: JSON-based chunk and database metadata
- **Version Control**: Timestamped database builds with provenance tracking
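
One way to picture JSON-based metadata with timestamped builds is a small file written next to each index. The sketch below is hypothetical: `write_build_metadata`, the `metadata.json` filename, and every field name are assumptions chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_build_metadata(output_dir: Path, model_name: str, chunks: list[dict]) -> Path:
    """Write a hypothetical metadata.json describing one vector database build."""
    output_dir.mkdir(parents=True, exist_ok=True)

    # Hash the chunk texts so two builds can be compared for identical content.
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk["text"].encode("utf-8"))

    metadata = {
        "built_at": datetime.now(timezone.utc).isoformat(),  # build timestamp (UTC)
        "embedding_model": model_name,
        "num_chunks": len(chunks),
        "sources": sorted({chunk["source"] for chunk in chunks}),
        "content_sha256": digest.hexdigest(),
    }
    path = output_dir / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

Recording the embedding model alongside the index matters because vectors from different models are not comparable, so a query must be embedded with the same model that built the database.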

### Upcoming Components 🚧
- Writer Agent: Context-aware proposal generation
- Verifier Agent: Hallucination detection and fact-checking
- API Integration: RESTful endpoints for agent coordination
- Frontend Interface: Streamlit-based user interface
- Cloud Deployment: GCP Cloud Run deployment pipeline

## 💻 Usage Commands

### Environment Setup
```bash
# Initialize conda in Git Bash (Windows)
source ~/anaconda3/etc/profile.d/conda.sh

# Create and activate environment
conda env create -f environment.yml
conda activate propulse

# Copy environment variables template
cp .env.example .env
# Edit .env with your configuration
```

### Vector Database Operations
```bash
# Build vector databases from sample documents
python scripts/build_vector_db.py

# Build with custom paths
python scripts/build_vector_db.py \
    --rfp-dir shared/sample_rfps \
    --proposal-dir shared/templates \
    --output-dir data/vector_dbs

# Build with GPU acceleration
python scripts/build_vector_db.py --gpu

# Use different embedding model
python scripts/build_vector_db.py --model all-mpnet-base-v2
```

### Retriever Agent Usage
```python
# Basic retrieval example
from backend.agents.retriever_agent import RetrieverAgent, QueryInput

# Initialize agent
agent = RetrieverAgent(
    rfp_db_path="data/vector_dbs/rfp_db",
    proposal_db_path="data/vector_dbs/proposal_db"
)

# Text-only query
query = QueryInput(
    text="Need web application development with user authentication",
    top_k=5,
    similarity_threshold=0.2
)
result = agent.retrieve(query)

# Query with document upload
query_with_doc = QueryInput(
    text="Software development project",
    document_path="path/to/rfp.pdf",
    top_k=10
)
result = agent.retrieve(query_with_doc)

# Save results
agent.save_result(result)
```
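
The `top_k` and `similarity_threshold` parameters in the example above suggest a filter-then-rank step. The sketch below shows one plausible reading using cosine similarity in plain Python; `cosine_similarity` and `select_matches` are hypothetical helpers, and the agent may use a different metric or ordering internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_matches(query_vec, candidates, top_k=5, similarity_threshold=0.2):
    """Return up to `top_k` (doc_id, score) pairs scoring at least the threshold.

    `candidates` maps a document id to its embedding vector.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return kept[:top_k]
```

Under this reading, the threshold can return fewer than `top_k` results when few documents are similar enough, which is usually preferable to padding the context with weak matches.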

### Testing
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_retriever.py -v

# Run with coverage
pytest --cov=backend tests/

# Run only unit tests (skip integration)
pytest -m "not integration"
```
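
The `-m "not integration"` filter relies on an `integration` marker. A minimal sketch of registering one in a `conftest.py` is shown below, assuming the project does not already declare it elsewhere (for example in `pytest.ini`); the marker description is illustrative.

```python
# conftest.py (sketch; the marker description is an assumption)


def pytest_configure(config):
    # Register the marker so that selecting or deselecting it with `-m`
    # does not trigger an unknown-marker warning.
    config.addinivalue_line(
        "markers",
        "integration: tests that need built vector databases or external services",
    )
```

Tests would then opt in with the `@pytest.mark.integration` decorator.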
311+
312+
### Development Tools
313+
```bash
314+
# Code formatting
315+
black .
316+
isort .
317+
318+
# Linting
319+
flake8
320+
321+
# Type checking
322+
mypy backend/
323+
324+
# Pre-commit hooks
325+
pre-commit install
326+
pre-commit run --all-files
327+
```

### Service Management
```bash
# Start backend service
cd backend
uvicorn main:app --reload --port 8000

# Start frontend (in separate terminal)
cd frontend
streamlit run main.py

# View API documentation
# http://localhost:8000/docs
```

### Logging and Monitoring
```bash
# View retriever logs
tail -f logs/retriever_log.jsonl

# Monitor vector database build
tail -f logs/vector_db_build.log

# Clean up logs and artifacts
bash scripts/cleanup.sh
```
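
Since the retriever log is JSONL (one JSON object per line), it can also be inspected from Python. The helper below is a sketch; `read_jsonl` is a hypothetical name, and the log's field names are not documented here, so none are assumed.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSONL file, skipping blank lines and lines that are not valid JSON."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # e.g. a partial line still being written
    return records
```

Skipping malformed lines keeps the reader usable on a log that is being appended to while it is read.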

### Infrastructure Management
```bash
# Deploy to GCP (when implemented)
cd infra/terraform/prod
terraform init
terraform plan
terraform apply

# View cloud resources
gcloud run services list
gcloud storage ls
```

## 🏗️ Architecture

### System Architecture
@@ -246,5 +598,5 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file

 ## 👥 Team

-- Project Lead: [Your Name](https://github.com/yourusername)
-- Contributors: [See all contributors](https://github.com/yourusername/propulse/graphs/contributors) # Propulse
+- Project Lead: [Maulin Raval](https://github.com/nerdy1texan)
+- Contributors: [See all contributors](https://github.com/yourusername/propulse/graphs/contributors)
