Skip to content

Claude/build ai system project eb6ui#27

Open
JusSil501 wants to merge 7 commits into
codepath:mainfrom
JusSil501:claude/build-ai-system-project-Eb6ui
Open

Claude/build ai system project eb6ui#27
JusSil501 wants to merge 7 commits into
codepath:mainfrom
JusSil501:claude/build-ai-system-project-Eb6ui

Conversation

@JusSil501

Copy link
Copy Markdown

No description provided.

JusSil501 and others added 7 commits March 18, 2026 12:22
- Add pawpal_system.py: Task/Pet/Owner dataclasses + Scheduler with
  sorting, filtering, daily/weekly recurrence, and conflict detection
- Add main.py: CLI demo verifying data flow end-to-end
- Add tests/test_pawpal.py: 17 tests covering all core behaviors and edge cases
- Update app.py: full Streamlit UI wired to logic layer with Claude AI
  schedule explanation via Anthropic API
- Update README.md: architecture overview, UML diagram, features, testing docs
- Update reflection.md: design decisions, tradeoffs, AI collaboration notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reliability logger writes to pawpal.log; don't check it in.
Also drop the accidentally-tracked .pyc from the starter.

https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
logger_setup.py:
  - get_logger() wires a shared console + file handler (pawpal.log)
  - sanitize_user_text() screens user input before any LLM call:
    empty / oversized / prompt-injection / secret-leak patterns
  - Raises GuardrailError so callers can surface a clean UI message.

knowledge_base.py:
  - 15 curated pet-care snippets covering exercise, feeding, medication,
    grooming, enrichment, seniors, puppies, hydration, training.
  - Deterministic keyword + tag retriever (TF-style scorer, no deps).
  - min_score guard returns [] for off-topic queries rather than
    grounding the model on noise.
  - format_context() renders [n]-numbered context for prompt injection.

https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
ai_agent.py:
  - answer_question(): RAG-grounded Q&A via Claude Haiku. Retrieves top-k
    snippets, forces inline [n] citations, returns an AgentResult with
    confidence derived from citation coverage.
  - ScheduleReviewAgent: plan -> act -> check loop.
      PLAN  retrieves guidance from the KB.
      ACT   asks the LLM to emit JSON listing at most 3 concrete issues.
      CHECK runs each issue through a deterministic validator against
            the real Scheduler. Hallucinated conflicts are dropped.
  - Confidence decays with every validator rejection; AgentResult
    carries the full trace so the UI can show what happened.
  - _extract_json tolerates prose wrapping + code fences (Haiku sometimes
    wraps JSON in "Sure! Here you go:" preamble).

evaluator.py:
  - 8 offline checks: KB size, retriever relevance, retriever noise
    rejection, guardrail block/allow paths, scheduler conflict detection,
    empty-owner safety, agent JSON parse tolerance.
  - All deterministic / no network -- safe for CI and runs from
    the Streamlit reliability tab.
  - EvalReport exposes passed/total/score + markdown rendering.

https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
34 new tests (all deterministic, no network):
  - test_knowledge_base.py: canonical-query coverage, noise rejection,
    k-respect, determinism, empty-context safety.
  - test_guardrails.py: each block pattern (injection, secret leak,
    system-prompt exfil, empty/None, length cap) plus happy-path.
  - test_agent.py: JSON extractor (strict / prose-wrapped / code-fenced
    / empty), conflict validator (accepts real / rejects fake times /
    rejects when no real conflicts exist), and a stubbed plan-act-check
    run that confirms hallucinated claims are dropped.
  - test_evaluator.py: meta-tests that the evaluator itself passes and
    exposes a sane score.

Total suite: 51 tests, ~0.2s, 100% pass.

https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
app.py:
  - Four tabs mapping 1:1 to the architecture:
      Schedule, Ask PawPal (RAG), Review Agent, Reliability.
  - Every AI response shows a traffic-light confidence indicator
    (green >=0.7, yellow >=0.4, red otherwise) and the sources used.
  - Sidebar indicates whether ANTHROPIC_API_KEY is set so users know
    which tabs will work offline.
  - Guardrail errors surface as clean UI messages, not stack traces.

main.py:
  - CLI demo now runs the full applied-AI surface: scheduler,
    retriever, guardrails, evaluator, and (if key present) live
    LLM calls for Q&A + review agent.

https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
- README identifies the Module-2 base project, documents the four new
  AI features (RAG, agentic workflow, reliability eval, guardrails),
  setup + sample interactions, design tradeoffs, testing summary.
- model_card.md covers intended use, limitations, biases (corpus skew,
  keyword-retriever gaps), misuse mitigations, testing surprises, and
  two concrete AI-collaboration examples (one helpful, one flawed).
- assets/system_architecture.md contains the Mermaid source for the
  architecture diagram (exported to PNG via the Mermaid Live Editor).

https://claude.ai/code/session_01NdDTc7eYWfQ8b1Fmsc1Dtp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants