Introduce Terraform Drift Detector with GitHub and Teams integration #36
Open
vibhatsrivastava wants to merge 38 commits into
Introduce a new Terraform Drift Detector project that uses a ReAct agent with RAG (Chroma + embeddings) to detect drift between Terraform state and live AWS resources and to analyze policy impact. Adds planner and README docs, .env.example, policy YAMLs, RAG/vector store initialization, tools for parsing .tfstate, fetching AWS resources, diffing, and policy analysis, plus an agent entrypoint (src/main.py) and unit tests. Includes security and operational considerations (sensitive redaction, rate limiting, vector store persistence) and sample policy/docs to ground LLM outputs.
Provide a standalone test_infrastructure to run manual integration tests for the Terraform Drift Detector agent. Adds Terraform configuration (providers.tf, variables.tf, main.tf, outputs.tf, terraform.tfvars.example) to provision a tagged EC2 instance, plus a .gitignore for Terraform artifacts and a detailed test_infrastructure/README.md with quick start, workflow, expected results, and cleanup instructions. Also updates the project README to include a Manual Integration Testing section that explains how to provision the instance, simulate tag drift, run the agent against the generated terraform.tfstate, and destroy resources afterward.
Add optional GitHub issue creation and Microsoft Teams notifications to the Terraform drift detector. Introduces GitHub API tools (src/tools/github_tools.py) with create/search/update/close/post issue functionality, Teams adaptive-card notifier (src/integrations/teams_notifications.py), and teams.yaml ownership mappings plus a teams_parser utility for assignee resolution. main.py updated to extract the agent JSON output, deduplicate/create issues, and send Teams summary notifications when enabled. Documentation and environment templates updated (.env.example, README) with setup and workflow details, and tests added for GitHub, Teams and teams parser components.
Introduce a comprehensive planner doc outlining Phase 3 (automated remediation) and Phase 4 (testing & CI/CD) for the Terraform Drift Detector. Covers architecture (FastAPI webhook listener, AWX integration, GitHub Actions), security considerations, detailed implementation steps, pseudo-code, diagrams, test strategy, and example workflows to enable slash-command driven remediation, AWX job execution, validation, issue lifecycle, and PR/scheduled drift checks.
Introduce a comprehensive Mermaid flowchart and accompanying legend/descriptions to the project README to illustrate the full CLI workflows (Report, Issue Analysis, Auto-Analyze). The diagram documents steps for single vs. multi-repo runs, GitHub API fetch/post operations, LLM analysis, Teams notification paths, dry-run behavior, and key decision points to clarify execution logic and configuration flags.
Expand README with deployment guidance: add Deployment Options (AWX, Local CLI, GitHub Actions, Docker) and recommend AWX for production. Include a Deployment Comparison table, DevOps/Platform Teams benefits, and an End-to-End AWX execution flow (Mermaid diagram + legend) describing triggers, credential injection, remote execution, LLM integration, and notifications. Add a dedicated "Ansible AWX Deployment" quick-start with required files, setup steps, example job template, and expected JSON output, plus a tip linking to awx/README.md for full setup details.
Reorganize and tidy integration module initialization and minor cleanup across files:
- cli/ai_agent_builder/integrations/__init__.py: Consolidated and reordered imports for caching, observability, orchestration, and vector_stores so modules are imported for registration side-effects; reformatted registry imports and updated __all__ to explicitly expose submodules.
- cli/ai_agent_builder/integrations/orchestration/awx.py: Removed unused pathlib.Path import (fixes linter) and cleaned up whitespace/formatting in the AWX output parser helper.
- cli/ai_agent_builder/env_manager.py: Removed trailing blank spaces in the generated .env template comments.

These are non-functional cleanup changes to improve readability, lint compliance, and the module export surface.
Remove an unused `Path` import from cli/tests/test_awx_integration.py and clean up trailing whitespace and blank-line formatting throughout the file. Purely cosmetic formatting changes; no logic or test behavior was modified.
Remove numerous unused imports and perform small refactors across tests and project modules to reduce linter noise and simplify code. Changes include: removed unused Path/os/pytest/MagicMock/find_dotenv/chromadb/typing imports, simplified test assertions by calling functions without assigning return values, fixed a few string formatting instances (f-strings -> plain strings), adjusted typing hints, removed an unused import (send_drift_issue_notification) from a main module, and made small import/order tweaks in langfuse tracing tests. These edits are non-functional and intended to clean up warnings and improve code clarity.
Add # noqa: E402 to several delayed imports across example projects to silence flake8 E402 warnings after modifying sys.path. Replace a bare except with except Exception in awx/import_survey_api.py to avoid catching BaseException. Reorder and tidy imports in the GitHub reporter main module and remove a fragile captured output assertion from the reporter tests to prevent flaky failures. These are linting and test-stability improvements; no functional behavior changes intended.
Agent-Logs-Url: https://github.com/vibhatsrivastava/Agentic_AI_Development_Framework/sessions/904b8da2-bb71-49b1-82d4-e3c06e117d5b Co-authored-by: vibhatsrivastava <36897531+vibhatsrivastava@users.noreply.github.com>
…w-failure Fix root-level pytest import resolution for Terraform Drift Detector tests
…erage Agent-Logs-Url: https://github.com/vibhatsrivastava/Agentic_AI_Development_Framework/sessions/147325d9-6f5c-45e9-876b-36900c9f9c16 Co-authored-by: vibhatsrivastava <36897531+vibhatsrivastava@users.noreply.github.com>
…terraform-drift-detector Fix Terraform Drift Detector CI regressions after import-path patch
Prepend the repository root to sys.path in projects/05_terraform_drift_detector/src/main.py so the project can import shared 'common' modules from the monorepo. Adds repo_root calculation and a conditional sys.path.insert; no other logic changes.
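A minimal sketch of the conditional sys.path insert described above. The function name `ensure_repo_root_on_path` and the `parents_up` parameter are assumptions for illustration; the actual main.py may compute repo_root differently. For a file at projects/05_terraform_drift_detector/src/main.py, the repository root is four directory levels above the file, which is index 3 in `Path.parents`.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(file_path: str, parents_up: int = 3) -> str:
    """Prepend the repository root to sys.path if it is not already there.

    parents_up=3 assumes the layout <repo>/projects/<project>/src/main.py.
    """
    repo_root = str(Path(file_path).resolve().parents[parents_up])
    if repo_root not in sys.path:
        # Insert at the front so shared packages under the root resolve first.
        sys.path.insert(0, repo_root)
    return repo_root


if __name__ == "__main__":
    # In the real entrypoint this would be called with __file__.
    root = ensure_repo_root_on_path(
        "/tmp/repo/projects/05_terraform_drift_detector/src/main.py"
    )
    print(root)
```

Imports that follow such a call sit below executable code, which is what triggers flake8 E402 and explains the `# noqa: E402` markers mentioned in an earlier commit.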
Add the 'unstructured' package to requirements.txt to enable document parsing for unstructured data. Also mark the Terraform output 'ami_id' as sensitive in test_infrastructure/outputs.tf to avoid exposing the AMI ID in Terraform outputs. Files changed: projects/05_terraform_drift_detector/requirements.txt, projects/05_terraform_drift_detector/test_infrastructure/outputs.tf.
🚀 Staging Deployment ✅ Success |
Add AWS SSM parameter fetching and improve the Terraform drift detection pipeline. Removed unsupported callbacks from OllamaEmbeddings to avoid pydantic v2 issues. Update validate_state_file regex to accept Windows-style paths and prevent path traversal. In aws_tools: add _fetch_ssm_parameters, restructure EC2 output (attributes, tags), and add debug logging. In diff_tools: refactor comparison logic (extract _compare_resources_impl), add a pydantic args model, more robust input normalization (accept strings, dicts, lists and wrapped payloads), improved tag/attribute handling, skip unsupported types, and return structured dicts (with debug prints). terraform_tools: improve _redact_sensitive_attributes handling for invalid paths and non-string keys and log warnings. Tests added/updated for SSM, compare_resources wrappers, and Windows path handling. Also add prebuilt Chroma vector store database file.
…relax tfstate path validation
Attach Langfuse callback handler to OllamaEmbeddings by building a callbacks list and passing it into the embeddings constructor. Improve validate_state_file to normalize Windows backslashes to the OS separator before validation and tighten the regex/validation message to prevent path traversal and allow only safe characters. Also includes an updated Chroma vector store binary.
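The path validation described above might look roughly like the following sketch. The function name `is_safe_state_path`, the allowed character set, and the `.tfstate` suffix requirement are all assumptions; the project's validate_state_file may use a different pattern. For simplicity the sketch normalizes backslashes to forward slashes rather than to the OS separator.

```python
import re

# Assumed whitelist: letters, digits, underscore, dot, dash, space,
# both slash styles, and a colon for Windows drive letters.
_SAFE_STATE_PATH = re.compile(r"[A-Za-z0-9_.\- /\\:]+\.tfstate")


def is_safe_state_path(path) -> bool:
    """Return True if the path uses only safe characters and has no '..' segment."""
    text = str(path)  # accept pathlib.Path as well as str
    segments = text.replace("\\", "/").split("/")
    if ".." in segments:
        return False  # path traversal attempt
    return _SAFE_STATE_PATH.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in (
        "envs/prod/terraform.tfstate",
        "C:\\infra\\terraform.tfstate",
        "../../etc/passwd",
        "envs/../secrets/terraform.tfstate",
    ):
        print(candidate, is_safe_state_path(candidate))
```

Using `fullmatch` rather than `match` with a `$` anchor avoids accepting a path that ends in a trailing newline.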
…ources Fix: sanitize truncated JSON in compare_resources and relax tfstate validation
🚀 Staging Deployment ✅ Success |
Add robust parsing and sanitization to compare_resources: support raw/truncated JSON strings, attempt json.loads and ast.literal_eval, strip ellipses/trailing commas, heuristically extract embedded objects/arrays, and extract specific fields (state_resources/cloud_resources). Add debug prints for incoming payloads and improve error/warning logging. Import re and ast and add helper parsers (try_parse_payload_string, extract_json_field) to better handle LLM/tool output. Adjust terraform state file validation to coerce path to string and be more permissive. Refine teams_parser fallback logic to track when a config was provided and only use GITHUB_ISSUE_ASSIGNEE in appropriate fallback cases. Add two helper scripts (scripts/debug_tools_call.py and scripts/invoke_compare_resources_test.py) to exercise parsing and compare flows locally. These changes increase resilience against malformed or partial JSON produced by upstream tools and LLMs.
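The layered parsing strategy described above can be sketched as follows. This is not the project's try_parse_payload_string; the name `parse_loose_payload` and the exact sanitization rules are assumptions. The sketch tries strict JSON first, then strips ellipses and trailing commas, and only then falls back to `ast.literal_eval` for Python-style literals such as single-quoted dicts.

```python
import ast
import json
import re


def _sanitize(text: str) -> str:
    """Heuristically remove ellipses and trailing commas left by truncation."""
    cleaned = text.replace("...", "").replace("\u2026", "")
    # Drop a comma that directly precedes a closing brace or bracket.
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return cleaned.strip()


def parse_loose_payload(text: str):
    """Return the parsed object, or None if every strategy fails."""
    try:
        return json.loads(text)
    except (ValueError, TypeError):
        pass
    cleaned = _sanitize(text)
    try:
        return json.loads(cleaned)
    except (ValueError, TypeError):
        pass
    try:
        # literal_eval runs only on sanitized text: a raw "..." would
        # otherwise be accepted as Python's Ellipsis object.
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        return None


if __name__ == "__main__":
    print(parse_loose_payload('{"a": 1, "b": [1, 2, ...],}'))
    print(parse_loose_payload("{'state_resources': []}"))
```

Because the sanitizer works on raw text, it would also alter an ellipsis or a comma-before-bracket sequence inside a string value, so it is best treated as a last-resort recovery path rather than a general parser.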
Add recovery, caching, and concurrency improvements for the Terraform drift detector. Key changes:
- Add compare_resources_raw tool to heuristically extract state/cloud JSON from malformed LLM/tool output and recover comparator results.
- Harden compare_resources parsing and resource sanitization to better handle truncated/malformed JSON fragments.
- Use enforce_json when creating the agent and add retry/backoff + increased recursion_limit for agent.invoke to reduce streaming/truncation failures.
- Add DRY_RUN env handling to disable GitHub/Teams side-effects even when .env enables them.
- Implement an in-process _VECTOR_STORE_CACHE in rag/vector_store to reuse Chroma stores and avoid repeated rebuilds.
- Parallelize AWS fetchers: chunked/parallel describe_instances and threaded SSM parameter fetches to improve latency and reliability.
- Fix OllamaEmbeddings call to avoid passing callback handlers (pydantic schema incompatibility).
- Export compare_resources_raw in tools package and add two helper scripts (scripts/run_compare_raw.py, scripts/test_recovery.py) for local testing/recovery.

These changes aim to make the detector more resilient to LLM output noise, faster when interacting with cloud APIs, and safer to run in CI/production by respecting dry-run flags.
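The DRY_RUN behavior in the list above could be gated along these lines. Only the DRY_RUN variable name comes from the commit message; the feature-flag names (`GITHUB_ISSUES_ENABLED`, `TEAMS_NOTIFICATIONS_ENABLED`), the set of truthy strings, and both helper functions are hypothetical.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from the environment (or an injected mapping)."""
    source = os.environ if env is None else env
    return source.get(name, "").strip().lower() in _TRUTHY


def side_effects_enabled(
    feature_flag: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """DRY_RUN wins over any feature flag that .env may have enabled."""
    if env_flag("DRY_RUN", env):
        return False
    return env_flag(feature_flag, env)


if __name__ == "__main__":
    settings = {"DRY_RUN": "true", "GITHUB_ISSUES_ENABLED": "true"}
    # Issue creation stays off even though the feature flag is on.
    print(side_effects_enabled("GITHUB_ISSUES_ENABLED", settings))
```

Passing the environment as a mapping keeps the check easy to unit-test without mutating `os.environ`.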
Add performance and observability improvements for the Terraform Drift Detector:
- Integrate Langfuse tracing (session grouping, metadata/tags, observation spans) in src/main.py and policy tools to surface LLM/tool latency and cache metrics.
- Replace large system prompt with a compact JSON-first prompt and add format_drift_report() to render human-friendly markdown from agent JSON output.
- Reduce RAG retrieval k from 5 to 2 in check and fix flows to reduce retrieved tokens.
- Increase vector chunk size (500→1500) and overlap (50→200); exclude teams.yaml from indexing to reduce noise and chunk count (src/rag/vector_store.py).
- Add aggressive caching layers: RAG cache and LLM response cache in src/tools/policy_tools.py, plus state-file parsing cache in src/tools/terraform_tools.py; include cache hit/miss logging.
- Improve semantic query construction for policy retrieval and add RAG/LLM caching helpers with trace annotations.
- Add two new docs: OPTIMIZATION_SUMMARY.md (detailed changes and impact) and TESTING_GUIDE.md (validation steps and expected results).
- Update vector_store SQLite data (vector_store/chroma.sqlite3) as part of reindexing.

These changes target 50–70% end-to-end latency improvement by reducing tokens, retrievals, and redundant parsing while adding observability for further tuning.
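To see why the larger chunk size lowers the chunk count, here is a small worked example. It uses a plain character-based sliding window, which is an assumption made for illustration; the splitter used in src/rag/vector_store.py is not shown in this pull request and may split on separators instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    document = "x" * 6000  # stand-in for a 6,000-character policy document
    print(len(chunk_text(document, 500, 50)))    # old settings
    print(len(chunk_text(document, 1500, 200)))  # new settings
```

For this 6,000-character input the old settings yield 14 chunks and the new settings yield 5, so fewer embeddings are stored and each retrieved chunk carries more surrounding context.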
Improve observability and credential handling for the Terraform drift detector and workspace docs.
- .github/copilot-instructions.md: add a Credential Resolution Order (Vault → project .env → root .env), require load_project_env() to raise a clear EnvironmentError if root .env is missing, prefer code-first responses, update Vault/Langfuse wording, and add testing/coverage guidance for common/ changes and refined CI dependency guidance.
- projects/05_terraform_drift_detector/src/main.py: add Langfuse import fallbacks, early detection for malformed tool-call payloads, improved retry logic, and extensive timing instrumentation (perf_counter + TIMING logs) across vector store init, agent creation/invoke, JSON parse, GitHub/Teams actions, and total run times; minor error/logging refinements and whitespace cleanup.
- projects/05_terraform_drift_detector/vector_store/chroma.sqlite3: update binary vector store database.

These changes increase reliability, diagnostics, and clarity around secret resolution and testing expectations.
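The credential resolution order described above (Vault, then project .env, then root .env) can be sketched as a lookup over three mappings. This is not the repository's load_project_env(); the function `resolve_credential` and its signature are hypothetical, and the sources are plain dicts so the example stays self-contained.

```python
from typing import Mapping, Optional, Tuple

Source = Optional[Mapping[str, str]]


def resolve_credential(
    name: str, vault: Source, project_env: Source, root_env: Source
) -> Tuple[str, str]:
    """Return (value, source_label) using Vault -> project .env -> root .env."""
    if root_env is None:
        # Mirrors the documented requirement that a missing root .env is an error.
        raise EnvironmentError("root .env not found; create it from .env.example")
    ordered = (
        ("vault", vault),
        ("project .env", project_env),
        ("root .env", root_env),
    )
    for label, source in ordered:
        if source is not None:
            value = source.get(name)
            if value:  # skip missing keys and empty strings
                return value, label
    raise KeyError(f"{name} is not set in any credential source")


if __name__ == "__main__":
    value, origin = resolve_credential(
        "GITHUB_TOKEN",
        vault=None,                             # Vault not configured
        project_env={"GITHUB_TOKEN": "proj-token"},
        root_env={"GITHUB_TOKEN": "root-token"},
    )
    print(origin)
```

Returning the source label alongside the value makes it easy to log where a secret came from without logging the secret itself.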
Add deterministic recovery and safer tool invocation in main (_call_tool, _recover_drift_from_state_file) so malformed model tool-calls can be rebuilt from state files. Harden compare and raw-compare logic in diff_tools: accept native objects, robustly extract/parse embedded JSON, improve Pydantic args with usage guidance, and sanitize error cases. Add GitHub helpers to validate/filter repository labels and assignees before creating issues to avoid API 422s. Make policy analysis tool accept dict inputs (and strings) for smoother tool chaining. Update tests to cover new parsing, recovery, and GitHub behaviors; update teams.yaml owners and refresh vector store DB snapshot.
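The label and assignee filtering mentioned above amounts to intersecting what the agent asks for with what the repository actually has. A minimal sketch, assuming the available names have already been fetched from the GitHub API; the helper `filter_known` is hypothetical and does not reflect the actual functions in src/tools/github_tools.py.

```python
from typing import Iterable, List


def filter_known(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Keep only requested names that exist, using the repository's own spelling.

    Comparison is case-insensitive; order of `requested` is preserved
    and duplicates are dropped.
    """
    canonical = {name.lower(): name for name in available}
    kept: List[str] = []
    seen = set()
    for name in requested:
        key = name.strip().lower()
        if key in canonical and key not in seen:
            seen.add(key)
            kept.append(canonical[key])
    return kept


if __name__ == "__main__":
    repo_labels = ["infrastructure-drift", "bug", "Security"]
    wanted = ["infrastructure-drift", "security", "made-up-label"]
    # Unknown labels are dropped so the create-issue request is not rejected.
    print(filter_known(wanted, repo_labels))
```

The same helper can be applied to assignees by passing the list of assignable logins as `available`.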
Improve Teams notification compatibility by sending adaptive-card payloads first and falling back to a legacy MessageCard (connector) when the webhook rejects adaptive cards. Add helper _teams_webhook_succeeded and _build_summary_message_card, more robust error handling and logging around the webhook POST flow. Improve GitHub deduplication by adding _matches_existing_drift_issue and a fallback in search_existing_issues that scans recent open issues with the infrastructure-drift label when the Search API lookup fails; surface any search error in the result. Add tests covering both the Teams fallback and the GitHub issue-list fallback.
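The adaptive-card-first, MessageCard-fallback flow described above might be structured like this sketch. The functions `build_adaptive_card`, `build_message_card`, and `send_summary` are hypothetical, as is treating any 2xx status as success; the project's _teams_webhook_succeeded may apply different checks. The HTTP call is injected as a `post` callable so the example runs without a network.

```python
from typing import Callable, Dict


def build_adaptive_card(title: str, text: str) -> Dict:
    """Wrap an Adaptive Card in the message envelope used by Teams webhooks."""
    return {
        "type": "message",
        "attachments": [{
            "contentType": "application/vnd.microsoft.card.adaptive",
            "content": {
                "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
                "type": "AdaptiveCard",
                "version": "1.4",
                "body": [
                    {"type": "TextBlock", "text": title, "weight": "Bolder"},
                    {"type": "TextBlock", "text": text, "wrap": True},
                ],
            },
        }],
    }


def build_message_card(title: str, text: str) -> Dict:
    """Legacy connector card format used as the fallback payload."""
    return {
        "@type": "MessageCard",
        "@context": "https://schema.org/extensions",
        "summary": title,
        "title": title,
        "text": text,
    }


def send_summary(title: str, text: str, post: Callable[[Dict], int]) -> str:
    """Try the adaptive card first; fall back to a MessageCard on rejection."""
    if 200 <= post(build_adaptive_card(title, text)) < 300:
        return "adaptive_card"
    if 200 <= post(build_message_card(title, text)) < 300:
        return "message_card"
    return "failed"


if __name__ == "__main__":
    # Fake webhook that rejects adaptive cards, as some connector URLs do.
    def legacy_only(payload: Dict) -> int:
        return 400 if "attachments" in payload else 200

    print(send_summary("Drift detected", "2 resources drifted", legacy_only))
```

In a real integration `post` would wrap an HTTP client call and return the response status code.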
Binary SQLite database for the Chroma vector store was updated at projects/05_terraform_drift_detector/vector_store/chroma.sqlite3. This commit records a change to the serialized vector store (binary diff). No source code changes are included; ensure Chroma version compatibility and consider backing up previous DB state if required.
…ze payload structure
This pull request primarily updates the GitHub Actions workflow for Copilot auto-implementation, streamlining how issues are assigned to the Copilot agent and how notifications are posted. Additionally, it includes several small code quality and import cleanups in the Python test and integration modules, removing unused imports and improving clarity.
GitHub Actions Workflow Improvements:
- The .github/workflows/copilot-implement.yml workflow now automatically assigns issues to the Copilot agent when approved, using a dedicated API call with explicit configuration (branch, model, and repository conventions). It provides clear success and error notifications, and posts detailed next-step instructions for users.

Python Module and Test Cleanup:
- Cleaned up imports and exports in cli/ai_agent_builder/integrations/__init__.py, ensuring only necessary modules and symbols are exported and removing redundant imports.
- Removed unused imports (Path, pytest, MagicMock, etc.) from various test files to clean up and simplify the codebase:
  - cli/ai_agent_builder/integrations/orchestration/awx.py
  - cli/tests/test_awx_integration.py
  - common/tests/test_awx_utils.py
  - common/tests/test_awx_wrapper.py
  - common/tests/test_base_prompts.py
  - common/tests/test_exceptions.py
  - common/tests/test_langfuse_tracing.py
  - common/tests/test_llm_factory.py
  - common/tests/test_vault.py