Claude/build ai system project eb6ui #27
Open
JusSil501 wants to merge 7 commits into
- Add pawpal_system.py: Task/Pet/Owner dataclasses + Scheduler with sorting, filtering, daily/weekly recurrence, and conflict detection
- Add main.py: CLI demo verifying data flow end-to-end
- Add tests/test_pawpal.py: 17 tests covering all core behaviors and edge cases
- Update app.py: full Streamlit UI wired to logic layer with Claude AI schedule explanation via Anthropic API
- Update README.md: architecture overview, UML diagram, features, testing docs
- Update reflection.md: design decisions, tradeoffs, AI collaboration notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reliability logger writes to pawpal.log; don't check it in. Also drop the accidentally tracked .pyc from the starter.
https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
logger_setup.py:
- get_logger() wires a shared console + file handler (pawpal.log)
- sanitize_user_text() screens user input before any LLM call:
empty / oversized / prompt-injection / secret-leak patterns
- Raises GuardrailError so callers can surface a clean UI message.
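A minimal sketch of the screening step described above, for illustration only. The function and exception names come from this commit message; the length cap, the two regex patterns, and the error strings are assumptions and are not taken from logger_setup.py.

```python
import re


class GuardrailError(ValueError):
    """Raised when user text fails a pre-LLM screening check."""


# Assumed cap; the real limit in logger_setup.py is not stated in the PR.
MAX_CHARS = 2000

# Illustrative patterns only; the actual block list is not shown in the PR.
_INJECTION = re.compile(
    r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE
)
_SECRET = re.compile(r"sk-[A-Za-z0-9_-]{20,}")


def sanitize_user_text(text):
    """Return stripped text, or raise GuardrailError with a UI-safe message."""
    if text is None or not text.strip():
        raise GuardrailError("Please enter a question.")
    cleaned = text.strip()
    if len(cleaned) > MAX_CHARS:
        raise GuardrailError("That message is too long.")
    if _INJECTION.search(cleaned):
        raise GuardrailError("That request looks like a prompt injection.")
    if _SECRET.search(cleaned):
        raise GuardrailError("That message appears to contain a secret.")
    return cleaned
```

A caller would wrap the call in try/except GuardrailError and show the exception message to the user instead of a stack trace.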
knowledge_base.py:
- 15 curated pet-care snippets covering exercise, feeding, medication,
grooming, enrichment, seniors, puppies, hydration, training.
- Deterministic keyword + tag retriever (TF-style scorer, no deps).
- min_score guard returns [] for off-topic queries rather than
grounding the model on noise.
- format_context() renders [n]-numbered context for prompt injection.
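The retriever behaviour described above could look roughly like the sketch below. Only format_context and min_score are names from the commit message; the Snippet class, the two placeholder snippets, the stopword list, and the tag weight of 2.0 are hypothetical and do not reflect the actual 15-snippet corpus.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    title: str
    text: str
    tags: tuple


# Placeholder corpus for illustration; not the PR's curated snippets.
KB = [
    Snippet("Daily exercise",
            "Adult dogs usually benefit from regular walks and play.",
            ("exercise", "walks")),
    Snippet("Hydration",
            "Keep fresh water available throughout the day.",
            ("hydration", "water")),
]

# Assumed stopword list so that filler words never count as matches.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "my", "how",
             "much", "does", "what", "do", "i"}


def _tokens(s):
    return re.findall(r"[a-z]+", s.lower())


def retrieve(query, k=3, min_score=1.0):
    """Score snippets by term frequency plus a tag bonus; drop weak matches."""
    terms = set(_tokens(query)) - STOPWORDS
    scored = []
    for idx, snip in enumerate(KB):
        words = _tokens(snip.title + " " + snip.text)
        score = float(sum(words.count(t) for t in terms))
        score += 2.0 * sum(1 for tag in snip.tags if tag in terms)
        if score >= min_score:
            scored.append((score, idx, snip))
    # Sort by score descending, then by corpus order, so ties are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [snip for _, _, snip in scored[:k]]


def format_context(snippets):
    """Render snippets as [n]-numbered lines; empty input gives an empty string."""
    return "\n".join(
        f"[{n}] {s.title}: {s.text}" for n, s in enumerate(snippets, 1)
    )
```

With this toy corpus, a hydration question returns only the Hydration snippet, while an off-topic query scores below min_score and returns an empty list.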
https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
ai_agent.py:
- answer_question(): RAG-grounded Q&A via Claude Haiku. Retrieves top-k
snippets, forces inline [n] citations, returns an AgentResult with
confidence derived from citation coverage.
- ScheduleReviewAgent: plan -> act -> check loop.
PLAN retrieves guidance from the KB.
ACT asks the LLM to emit JSON listing at most 3 concrete issues.
CHECK runs each issue through a deterministic validator against
the real Scheduler. Hallucinated conflicts are dropped.
- Confidence decays with every validator rejection; AgentResult
carries the full trace so the UI can show what happened.
- _extract_json tolerates prose wrapping + code fences (Haiku sometimes
wraps JSON in "Sure! Here you go:" preamble).
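One way a tolerant extractor like _extract_json might work is sketched below. This is an assumption about the approach, not the code in ai_agent.py: it tries the raw text, then any fenced block, then the outermost brace span, and returns None if nothing parses. The brace-span fallback only covers JSON objects, not top-level arrays.

```python
import json
import re

# Matches a triple-backtick fence with an optional "json" language tag.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(raw):
    """Return the first parseable JSON value found in raw, else None."""
    if not raw or not raw.strip():
        return None
    candidates = [raw.strip()]
    candidates += [body.strip() for body in _FENCE.findall(raw)]
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        # Fallback: outermost {...} span, which skips surrounding prose.
        candidates.append(raw[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```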
evaluator.py:
- 8 offline checks: KB size, retriever relevance, retriever noise
rejection, guardrail block/allow paths, scheduler conflict detection,
empty-owner safety, agent JSON parse tolerance.
- All deterministic / no network -- safe for CI and runs from
the Streamlit reliability tab.
- EvalReport exposes passed/total/score + markdown rendering.
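For reviewers, a rough sketch of how a report with passed/total/score and markdown rendering might be structured. EvalReport is the name from the commit message; CheckResult, run_checks, to_markdown, and the table layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


@dataclass
class EvalReport:
    results: list = field(default_factory=list)

    @property
    def passed(self):
        return sum(1 for r in self.results if r.passed)

    @property
    def total(self):
        return len(self.results)

    @property
    def score(self):
        # An empty report scores 0.0 rather than dividing by zero.
        return self.passed / self.total if self.total else 0.0

    def to_markdown(self):
        lines = [f"**{self.passed}/{self.total} checks passed**", "",
                 "| Check | Result | Detail |", "| --- | --- | --- |"]
        for r in self.results:
            status = "PASS" if r.passed else "FAIL"
            lines.append(f"| {r.name} | {status} | {r.detail} |")
        return "\n".join(lines)


def run_checks(checks):
    """Run (name, callable) pairs; an exception counts as a failed check."""
    report = EvalReport()
    for name, fn in checks:
        try:
            ok, detail = bool(fn()), ""
        except Exception as exc:
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        report.results.append(CheckResult(name, ok, detail))
    return report
```

Catching exceptions per check keeps one broken check from hiding the results of the others, which suits a report shown in a UI tab.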
https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
34 new tests (all deterministic, no network):
- test_knowledge_base.py: canonical-query coverage, noise rejection,
k-respect, determinism, empty-context safety.
- test_guardrails.py: each block pattern (injection, secret leak,
system-prompt exfil, empty/None, length cap) plus happy-path.
- test_agent.py: JSON extractor (strict / prose-wrapped / code-fenced
/ empty), conflict validator (accepts real / rejects fake times /
rejects when no real conflicts exist), and a stubbed plan-act-check
run that confirms hallucinated claims are dropped.
- test_evaluator.py: meta-tests that the evaluator itself passes and
exposes a sane score.
Total suite: 51 tests, ~0.2s, 100% pass.
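The conflict-validator cases listed for test_agent.py (accept real conflicts, reject fake times, reject when no conflicts exist) can be illustrated with the sketch below. The real Scheduler API is not shown in this PR, so tasks are modelled here as plain (time, name) tuples, and the linear confidence decay with base 0.9 and step 0.25 is an assumed shape, not the PR's formula.

```python
from collections import Counter


def real_conflict_times(tasks):
    """Times at which more than one task is scheduled (stand-in for Scheduler)."""
    counts = Counter(time for time, _name in tasks)
    return {time for time, n in counts.items() if n > 1}


def check_issues(claimed, tasks, base=0.9, decay=0.25):
    """Keep only claims that match a real conflict; lower confidence per rejection."""
    real = real_conflict_times(tasks)
    kept = [issue for issue in claimed if issue.get("time") in real]
    rejected = [issue for issue in claimed if issue.get("time") not in real]
    # Assumed decay: subtract a fixed step per rejected claim, floored at 0.
    confidence = max(0.0, base - decay * len(rejected))
    return kept, rejected, confidence


tasks = [("08:00", "Walk"), ("08:00", "Feed"), ("12:00", "Meds")]
claimed = [
    {"type": "conflict", "time": "08:00"},  # real: two tasks at 08:00
    {"type": "conflict", "time": "15:00"},  # hallucinated: nothing at 15:00
]
kept, rejected, confidence = check_issues(claimed, tasks)
```

Here the 08:00 claim is kept and the 15:00 claim is dropped, which lowers the confidence by one decay step.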
https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
app.py:
- Four tabs mapping 1:1 to the architecture:
Schedule, Ask PawPal (RAG), Review Agent, Reliability.
- Every AI response shows a traffic-light confidence indicator
(green >=0.7, yellow >=0.4, red otherwise) and the sources used.
- Sidebar indicates whether ANTHROPIC_API_KEY is set so users know
which tabs will work offline.
- Guardrail errors surface as clean UI messages, not stack traces.
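The traffic-light thresholds above map to a small pure function. The thresholds are the ones stated in this commit message; the function name confidence_light is hypothetical, and the Streamlit rendering is left out so the sketch runs without dependencies.

```python
def confidence_light(confidence):
    """Map a confidence in [0, 1] to a traffic-light colour name."""
    if confidence >= 0.7:
        return "green"
    if confidence >= 0.4:
        return "yellow"
    return "red"
```

Keeping the mapping separate from the UI code makes the boundaries (exactly 0.7 and exactly 0.4) easy to unit-test.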
main.py:
- CLI demo now runs the full applied-AI surface: scheduler,
retriever, guardrails, evaluator, and (if key present) live
LLM calls for Q&A + review agent.
https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
- README identifies the Module-2 base project, documents the four new AI features (RAG, agentic workflow, reliability eval, guardrails), setup + sample interactions, design tradeoffs, testing summary.
- model_card.md covers intended use, limitations, biases (corpus skew, keyword-retriever gaps), misuse mitigations, testing surprises, and two concrete AI-collaboration examples (one helpful, one flawed).
- assets/system_architecture.md contains the Mermaid source for the architecture diagram (exported to PNG via the Mermaid Live Editor).
https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
No description provided.