From dde7de08449bc27c83fbe891d512b6dd1e075c56 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Allamand?= Date: Fri, 12 Sep 2025 17:23:37 +0200 Subject: [PATCH 1/3] add documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Sébastien Allamand --- ARCHITECTURE.md | 732 +++++++++++++++++++++++++++++++++++++++++++++ GETTING-STARTED.md | 475 +++++++++++++++++++++++++++++ README.md | 178 +++++++++-- TODO.md | 1 + 4 files changed, 1356 insertions(+), 30 deletions(-) create mode 100644 ARCHITECTURE.md create mode 100644 GETTING-STARTED.md diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 000000000..87ff33281 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,732 @@ +# Platform Engineering on EKS - Platform Architecture Guide + +This document provides a comprehensive overview of the platform components and GitOps architecture implemented in the appmod-blueprints repository, focusing on the platform services and application deployment patterns. + +## 📚 For Workshop Participants +This guide will help you understand how the platform components work together to support your application development and deployment exercises. + +## 🏢 For Platform Adopters +Use this guide to understand the platform architecture patterns and GitOps workflows that can be implemented in your organization for production use. + +## ⚙️ For Infrastructure Engineers +This document provides detailed technical specifications for platform components, GitOps configurations, and customization points for extending the platform. + +## 👩‍💻 For Developers +Learn how platform services support your development workflows, from code commit to production deployment through self-service capabilities. 
+ +## Table of Contents +- [Overview](#overview) +- [Platform Architecture](#platform-architecture) +- [GitOps Architecture](#gitops-architecture) +- [Platform Components](#platform-components) +- [Application Blueprints](#application-blueprints) +- [Data Flow and Workflows](#data-flow-and-workflows) +- [Security and Compliance](#security-and-compliance) +- [Deployment Scenarios](#deployment-scenarios) + +## Overview + +The appmod-blueprints repository contains the platform implementation for a complete GitOps-based platform engineering solution. This repository provides the platform services, application templates, and operational patterns that run on the infrastructure established by the platform-engineering-on-eks repository. + +## Platform Architecture + +### High-Level Platform Components + +```mermaid +graph TB + subgraph "Developer Interface" + BACKSTAGE[Backstage Portal] + IDE[Development Environment] + GIT[Git Repositories] + end + + subgraph "Platform Control Plane" + ARGOCD[ArgoCD] + CROSSPLANE[Crossplane] + EXTERNAL_SECRETS[External Secrets Operator] + CERT_MANAGER[Cert Manager] + end + + subgraph "Application Runtime" + INGRESS[Ingress Controller] + MONITORING[Monitoring Stack] + LOGGING[Logging Stack] + SERVICE_MESH[Service Mesh] + end + + subgraph "Infrastructure Services" + EKS[EKS Clusters] + RDS[RDS Databases] + S3[S3 Storage] + SECRETS[AWS Secrets Manager] + end + + BACKSTAGE --> ARGOCD + IDE --> GIT + GIT --> ARGOCD + ARGOCD --> CROSSPLANE + ARGOCD --> EXTERNAL_SECRETS + ARGOCD --> CERT_MANAGER + ARGOCD --> INGRESS + ARGOCD --> MONITORING + CROSSPLANE --> RDS + CROSSPLANE --> S3 + EXTERNAL_SECRETS --> SECRETS + INGRESS --> EKS +``` + +## GitOps Architecture + +### GitOps Repository Structure + +The platform implements a multi-repository GitOps pattern: + +```mermaid +graph LR + subgraph "GitOps Repositories" + PLATFORM[Platform Repo<br/>Core platform services] + ADDONS[Addons Repo<br/>Cluster addons] + WORKLOADS[Workloads Repo<br/>Applications] + FLEET[Fleet Repo<br/>Multi-cluster config] + end + + subgraph "ArgoCD Applications" + PLATFORM_APP[Platform ApplicationSet] + ADDONS_APP[Addons ApplicationSet] + WORKLOADS_APP[Workloads ApplicationSet] + FLEET_APP[Fleet ApplicationSet] + end + + subgraph "Target Environments" + HUB[Hub Cluster] + SPOKE_DEV[Spoke Dev] + SPOKE_PROD[Spoke Prod] + end + + PLATFORM --> PLATFORM_APP + ADDONS --> ADDONS_APP + WORKLOADS --> WORKLOADS_APP + FLEET --> FLEET_APP + + PLATFORM_APP --> HUB + ADDONS_APP --> HUB + ADDONS_APP --> SPOKE_DEV + ADDONS_APP --> SPOKE_PROD + WORKLOADS_APP --> SPOKE_DEV + WORKLOADS_APP --> SPOKE_PROD + FLEET_APP --> HUB +``` + +### GitOps Workflow Pattern + +```mermaid +sequenceDiagram + participant Dev as Developer + participant Backstage as Backstage Portal + participant Git as Git Repository + participant ArgoCD as ArgoCD + participant K8s as Kubernetes + + Dev->>Backstage: Create application from template + Backstage->>Git: Generate application code & GitOps config + Dev->>Git: Push code changes + Git->>ArgoCD: Webhook notification + ArgoCD->>Git: Pull latest configuration + ArgoCD->>K8s: Apply Kubernetes manifests + K8s->>ArgoCD: Report deployment status + ArgoCD->>Dev: Deployment notification +``` + +### GitOps Bridge Architecture + +The GitOps Bridge is a critical component that connects infrastructure provisioning with ArgoCD-based GitOps deployments. It acts as a data pipeline that passes infrastructure metadata from the bootstrap infrastructure to Kubernetes secrets, which are then consumed by ArgoCD ApplicationSets. + +```mermaid +graph TB + subgraph "Infrastructure Layer" + BOOTSTRAP[Bootstrap Infrastructure<br/>CDK/Terraform] + ADDONS_META[Infrastructure Metadata<br/>Cluster info, secrets, URLs] + end + + subgraph "GitOps Bridge" + GB_MODULE[GitOps Bridge Module] + K8S_SECRETS[Kubernetes Cluster Secrets<br/>Infrastructure data] + end + + subgraph "ArgoCD Layer" + APPSETS[ApplicationSets<br/>Multi-cluster deployment] + HELM_VALUES[Helm Values<br/>Dynamic configuration] + GITOPS_APPS[GitOps Applications<br/>Platform services] + end + + subgraph "AWS Services" + SECRETS_MGR[AWS Secrets Manager<br/>Secure credentials] + SSM[AWS Systems Manager<br/>Configuration parameters] + IAM[AWS IAM Roles<br/>Service authentication] + end + + BOOTSTRAP --> ADDONS_META + ADDONS_META --> GB_MODULE + GB_MODULE --> K8S_SECRETS + K8S_SECRETS --> APPSETS + APPSETS --> HELM_VALUES + HELM_VALUES --> GITOPS_APPS + + SECRETS_MGR -.-> GITOPS_APPS + SSM -.-> GITOPS_APPS + IAM -.-> GITOPS_APPS +``` + +#### How the GitOps Bridge Works + +The GitOps Bridge enables infrastructure data to flow seamlessly into GitOps applications: + +1. **Infrastructure Metadata Collection**: The bootstrap infrastructure (from platform-engineering-on-eks) collects key information like cluster names, AWS regions, VPC IDs, and service URLs + +2. **Bridge Module Processing**: The GitOps Bridge module transforms this metadata into Kubernetes secrets that can be consumed by ArgoCD applications + +3. **ApplicationSet Consumption**: ArgoCD ApplicationSets use the cluster secrets to dynamically configure applications across multiple environments + +4. **Helm Value Injection**: Infrastructure data is injected into Helm charts as values, enabling environment-specific configurations + +#### Key Benefits + +- **Decoupling**: GitOps templates don't need to know dynamic infrastructure details +- **Predictability**: Infrastructure data follows consistent patterns and naming conventions +- **Security**: Sensitive data is managed through AWS Secrets Manager and Kubernetes secrets +- **Maintainability**: Changes to infrastructure automatically propagate to applications +- **Scalability**: New environments can be added without modifying GitOps configurations + +#### Example: Secret Management Pattern + +The platform uses predictable naming conventions for secrets that eliminate the need for dynamic secret name passing: + +``` +{project_context_prefix}-{service}-{type}-password +``` + +Examples: +- `peeks-workshop-gitops-keycloak-admin-password` +- `peeks-workshop-gitops-backstage-postgresql-password` + +This pattern allows GitOps applications to reference secrets by name without needing dynamic infrastructure data 
injection. + +### Cluster Registration and Discovery + +A key architectural pattern in the platform is the automatic cluster registration system. When EKS clusters are created through any infrastructure tool (Terraform, KRO, Crossplane), they automatically register themselves in AWS Secrets Manager with metadata that enables ArgoCD ApplicationSets to discover and configure them dynamically. + +#### Cluster Registration Flow + +```mermaid +sequenceDiagram + participant IaC as Infrastructure Tool<br/>(Terraform/KRO/Crossplane) + participant AWS as AWS Secrets Manager + participant ESO as External Secrets Operator + participant ArgoCD as ArgoCD ApplicationSets + participant Cluster as Target Cluster + + Note over IaC,AWS: Cluster Creation & Registration + IaC->>AWS: Create EKS cluster + IaC->>AWS: Create cluster registration secret<br/>with labels & annotations + + Note over ESO,ArgoCD: Discovery & Configuration + ESO->>AWS: Sync cluster secrets to hub + ESO->>ArgoCD: Provide cluster metadata as K8s secrets + ArgoCD->>ArgoCD: Generate ApplicationSets<br/>based on cluster labels + + Note over ArgoCD,Cluster: Dynamic Deployment + ArgoCD->>Cluster: Deploy addons using<br/>cluster-specific configuration + ArgoCD->>Cluster: Deploy workloads using<br/>environment-specific values +``` + +#### Cluster Registration Secret Format + +Each cluster creates a standardized registration secret that ApplicationSets use for dynamic configuration: + +```json +{ + "cluster_name": "spoke-dev-us-east-1", + "cluster_endpoint": "https://ABC123.gr7.us-east-1.eks.amazonaws.com", + "cluster_ca_certificate": "LS0tLS1CRUdJTi...", + "aws_region": "us-east-1", + "resource_prefix": "peeks-workshop", + "environment": "dev", + "tenant": "platform-team", + "cluster_type": "spoke", + "labels": { + "environment": "dev", + "region": "us-east-1", + "cluster-type": "spoke", + "tenant": "platform-team", + "resource-prefix": "peeks-workshop", + "workload-type": "applications" + }, + "annotations": { + "addons_repo_basepath": "gitops/addons/", + "workloads_repo_basepath": "gitops/workloads/", + "kustomize_path": "environments/dev", + "helm_values_path": "values/dev.yaml", + "resource_prefix": "peeks-workshop", + "sync_wave": "10" + } +} +``` + +#### ApplicationSet Integration + +ArgoCD ApplicationSets automatically discover clusters and use their metadata for configuration: + +```yaml +apiVersion: argoproj.io/v1alpha1 +kind: ApplicationSet +metadata: + name: platform-addons + namespace: argocd +spec: + generators: + - clusters: + selector: + matchLabels: + cluster-type: spoke + values: + environment: '{{metadata.labels.environment}}' + tenant: '{{metadata.labels.tenant}}' + addons_path: '{{metadata.annotations.addons_repo_basepath}}' + values_path: '{{metadata.annotations.helm_values_path}}' + template: + metadata: + name: '{{name}}-platform-addons' + spec: + project: '{{values.tenant}}' + source: + repoURL: https://github.com/aws-samples/appmod-blueprints + path: '{{values.addons_path}}environments/{{values.environment}}' + targetRevision: main + helm: + valueFiles: + - '{{values.values_path}}' + destination: + server: '{{server}}' + namespace: kube-system + syncPolicy: + automated: + prune: true + selfHeal: true + syncOptions: + - 
CreateNamespace=true +``` + +#### Multi-Tool Support + +The platform supports cluster creation through multiple infrastructure tools: + +**Terraform Integration:** +```hcl +resource "aws_secretsmanager_secret" "cluster_registration" { + name = "cluster-registration-${var.cluster_name}" +} + +# The payload is stored on a secret version; aws_secretsmanager_secret has no secret_string argument +resource "aws_secretsmanager_secret_version" "cluster_registration" { + secret_id = aws_secretsmanager_secret.cluster_registration.id + + secret_string = jsonencode({ + cluster_name = var.cluster_name + resource_prefix = var.resource_prefix + environment = var.environment + tenant = var.tenant + labels = { + environment = var.environment + "cluster-type" = var.cluster_type + tenant = var.tenant + "resource-prefix" = var.resource_prefix + } + annotations = { + addons_repo_basepath = "gitops/addons/" + kustomize_path = "environments/${var.environment}" + resource_prefix = var.resource_prefix + } + }) +} +``` + +**KRO (Kube Resource Orchestrator):** +```yaml +apiVersion: kro.run/v1alpha1 +kind: ResourceGroup +metadata: + name: eks-with-registration +spec: + resources: + - id: cluster-registration + template: + apiVersion: secretsmanager.aws.crossplane.io/v1beta1 + kind: Secret + spec: + forProvider: + name: cluster-registration-{{ .spec.clusterName }} + secretString: | + { + "cluster_name": "{{ .spec.clusterName }}", + "resource_prefix": "{{ .spec.resourcePrefix }}", + "environment": "{{ .spec.environment }}", + "labels": {{ .spec.labels | merge(dict "resource-prefix" .spec.resourcePrefix) | toJson }} + } +``` + +**Crossplane Composition:** +```yaml +apiVersion: apiextensions.crossplane.io/v1 +kind: Composition +metadata: + name: eks-cluster-with-registration +spec: + resources: + - name: cluster-registration-secret + base: + apiVersion: secretsmanager.aws.crossplane.io/v1beta1 + kind: Secret + spec: + forProvider: + secretString: | + { + "cluster_name": {{ .spec.clusterName | quote }}, + "resource_prefix": {{ .spec.resourcePrefix | quote }}, + "environment": {{ .spec.environment | quote }}, + "labels": {{ .spec.labels | merge(dict "resource-prefix" .spec.resourcePrefix) | toJson }} + } +``` + +#### Benefits of 
Cluster Registration Pattern + +1. **Automatic Discovery**: New clusters are immediately available to ArgoCD without manual configuration +2. **Environment Isolation**: Labels enable environment-specific deployment patterns +3. **Multi-Tenant Support**: Tenant-based cluster organization and access control +4. **Tool Flexibility**: Works with any infrastructure tool that can create AWS secrets +5. **GitOps Native**: Seamless integration with ArgoCD ApplicationSets and cluster generators +6. **Configuration Flexibility**: Annotations enable cluster-specific deployment customization +7. **Audit and Compliance**: Full audit trail of cluster registrations in AWS Secrets Manager + +## Platform Components + +### Platform Component Relationships + +```mermaid +graph TB + subgraph "Developer Interface Layer" + BACKSTAGE[Backstage Portal<br/>Self-service templates] + GIT[Git Repositories<br/>Source code & GitOps config] + end + + subgraph "Platform Control Layer" + ARGOCD[ArgoCD<br/>GitOps controller] + CROSSPLANE[Crossplane<br/>Infrastructure as Code] + ESO[External Secrets<br/>Secret management] + CERT_MGR[Cert Manager<br/>TLS automation] + end + + subgraph "Application Runtime Layer" + INGRESS[Ingress Controller<br/>Traffic routing] + MONITORING[Monitoring Stack<br/>Observability] + SERVICE_MESH[Service Mesh<br/>Communication] + end + + subgraph "Infrastructure Layer" + EKS[EKS Clusters<br/>Container orchestration] + AWS_SERVICES[AWS Services<br/>RDS, S3, Secrets Manager] + end + + BACKSTAGE --> GIT + GIT --> ARGOCD + ARGOCD --> CROSSPLANE + ARGOCD --> ESO + ARGOCD --> CERT_MGR + ARGOCD --> INGRESS + ARGOCD --> MONITORING + CROSSPLANE --> AWS_SERVICES + ESO --> AWS_SERVICES + INGRESS --> EKS + MONITORING --> EKS + SERVICE_MESH --> EKS +``` + +### Core Platform Services + +#### ArgoCD - GitOps Controller +**Purpose**: Continuous deployment and configuration management +**Configuration**: `gitops/platform/charts/argo-cd/` +**Key Features**: +- Multi-cluster application deployment +- ApplicationSets for environment promotion +- RBAC integration with external identity providers +- Automated sync and drift detection + +#### Crossplane - Infrastructure as Code +**Purpose**: Cloud resource provisioning through Kubernetes APIs +**Configuration**: `platform/crossplane/` +**Key Features**: +- AWS resource compositions (RDS, S3, IAM) +- Self-service infrastructure through Kubernetes CRDs +- Policy-driven resource management +- Cost optimization through resource lifecycle management + +#### External Secrets Operator - Secret Management +**Purpose**: Secure secret synchronization from AWS Secrets Manager +**Configuration**: Deployed via GitOps addons +**Key Features**: +- AWS Secrets Manager integration +- Automatic secret rotation +- Cross-namespace secret sharing +- Pod Identity authentication + +#### Backstage - Developer Portal +**Purpose**: Self-service developer experience platform +**Configuration**: `platform/backstage/` +**Key Features**: +- Application templates and scaffolding +- Service catalog and documentation +- CI/CD pipeline integration +- Infrastructure visibility + +## Application Blueprints + +### Supported Application Types + +The platform includes blueprints for multiple technology stacks: + +#### .NET Applications +**Location**: `applications/dotnet/` +**Features**: +- Clean Architecture pattern +- Entity Framework with PostgreSQL +- Health checks and observability +- Container-optimized builds + 
+#### Java Applications +**Location**: `applications/java/` +**Features**: +- Spring Boot microservices +- JPA with database integration +- Actuator endpoints for monitoring +- Maven-based builds + +#### Node.js Applications +**Location**: `applications/node/` +**Features**: +- Express.js framework +- TypeScript support +- npm/yarn package management +- Modern JavaScript patterns + +#### Python Applications +**Location**: `applications/python/` +**Features**: +- FastAPI framework +- Async/await patterns +- Poetry dependency management +- Type hints and validation + +#### Rust Applications +**Location**: `applications/rust/` +**Features**: +- High-performance web services +- Cargo build system +- Memory safety and performance +- Cloud-native patterns + +#### Go Applications +**Location**: `applications/golang/` +**Features**: +- Standard library HTTP server +- Goroutine concurrency +- Module-based dependency management +- Minimal container images + +### Application Deployment Pattern + +```mermaid +graph TB + subgraph "Application Development" + TEMPLATE[Backstage Template] + CODE[Application Code] + DOCKERFILE[Container Definition] + end + + subgraph "CI/CD Pipeline" + BUILD[Container Build] + REGISTRY[Container Registry] + GITOPS[GitOps Config Update] + end + + subgraph "Platform Deployment" + ARGOCD_SYNC[ArgoCD Sync] + K8S_DEPLOY[Kubernetes Deployment] + INGRESS_CONFIG[Ingress Configuration] + end + + TEMPLATE --> CODE + CODE --> DOCKERFILE + DOCKERFILE --> BUILD + BUILD --> REGISTRY + BUILD --> GITOPS + GITOPS --> ARGOCD_SYNC + ARGOCD_SYNC --> K8S_DEPLOY + K8S_DEPLOY --> INGRESS_CONFIG +``` + +## Data Flow and Workflows + +### Developer Workflow + +```mermaid +sequenceDiagram + participant Dev as Developer + participant Backstage as Backstage + participant Git as Git Repository + participant CI as CI Pipeline + participant ArgoCD as ArgoCD + participant K8s as Kubernetes + participant Monitor as Monitoring + + Dev->>Backstage: Create new application + 
Backstage->>Git: Generate repository with template + Dev->>Git: Clone and develop locally + Dev->>Git: Push code changes + Git->>CI: Trigger build pipeline + CI->>Git: Update GitOps configuration + Git->>ArgoCD: Webhook notification + ArgoCD->>K8s: Deploy application + K8s->>Monitor: Emit metrics and logs + Monitor->>Dev: Deployment status and health +``` + +### Infrastructure Provisioning Workflow + +```mermaid +sequenceDiagram + participant Dev as Developer + participant Crossplane as Crossplane + participant AWS as AWS APIs + participant K8s as Kubernetes + participant App as Application + + Dev->>K8s: Create infrastructure claim + K8s->>Crossplane: Process claim via composition + Crossplane->>AWS: Provision resources (RDS, S3, etc.) + AWS->>Crossplane: Return resource details + Crossplane->>K8s: Create connection secrets + K8s->>App: Mount secrets as environment variables + App->>AWS: Connect to provisioned resources +``` + +## Security and Compliance + +### Security Architecture + +```mermaid +graph TB + subgraph "Identity and Access" + IAM[AWS IAM] + RBAC[Kubernetes RBAC] + POD_IDENTITY[EKS Pod Identity] + OIDC[OIDC Integration] + end + + subgraph "Secret Management" + SECRETS_MGR[AWS Secrets Manager] + EXTERNAL_SECRETS[External Secrets Operator] + K8S_SECRETS[Kubernetes Secrets] + end + + subgraph "Network Security" + VPC[VPC Isolation] + SECURITY_GROUPS[Security Groups] + NETWORK_POLICIES[Network Policies] + TLS[TLS Termination] + end + + subgraph "Compliance and Auditing" + CLOUDTRAIL[CloudTrail] + AUDIT_LOGS[Kubernetes Audit Logs] + POLICY_ENGINE[Policy Engine] + end + + IAM --> POD_IDENTITY + POD_IDENTITY --> EXTERNAL_SECRETS + SECRETS_MGR --> EXTERNAL_SECRETS + EXTERNAL_SECRETS --> K8S_SECRETS + RBAC --> K8S_SECRETS + VPC --> SECURITY_GROUPS + SECURITY_GROUPS --> NETWORK_POLICIES + NETWORK_POLICIES --> TLS + CLOUDTRAIL --> AUDIT_LOGS + AUDIT_LOGS --> POLICY_ENGINE +``` + +### Security Best Practices + +#### Identity and Access Management +- **Pod 
Identity**: Eliminates long-lived credentials for AWS service access +- **RBAC**: Fine-grained permissions for Kubernetes resources +- **OIDC Integration**: Centralized authentication through external identity providers +- **Least Privilege**: Minimal permissions for each component and user + +#### Secret Management +- **External Secrets**: Centralized secret management through AWS Secrets Manager +- **Automatic Rotation**: Secrets are rotated automatically without application downtime +- **Encryption**: Secrets encrypted at rest and in transit +- **Audit Trail**: All secret access is logged and monitored + +#### Network Security +- **VPC Isolation**: Network-level isolation between environments +- **Security Groups**: Application-level firewall rules +- **Network Policies**: Kubernetes-native network segmentation +- **TLS Everywhere**: End-to-end encryption for all communications + +## Deployment Scenarios + +### Platform-Only Deployment +**Purpose**: Core platform services without workshop-specific components +**Components**: +- ArgoCD for GitOps +- Crossplane for infrastructure provisioning +- External Secrets for secret management +- Ingress controller for traffic routing +- Monitoring and logging stack + +**Use Cases**: +- Production platform deployment +- Organizational platform adoption +- Custom application development + +### Full Workshop Environment +**Purpose**: Complete learning environment with sample applications +**Additional Components**: +- Backstage developer portal +- Sample applications across multiple technology stacks +- Workshop-specific configurations and examples +- Development tools and utilities + +**Use Cases**: +- Training and education +- Platform evaluation +- Proof-of-concept development + +### Development Environment +**Purpose**: Minimal setup for individual developers +**Components**: +- Single-cluster deployment +- Essential platform services only +- Local development tools integration +- Simplified networking and security + 
+**Use Cases**: +- Individual developer workstations +- Local testing and development +- Resource-constrained environments + +--- + +## Related Documentation + +- **Infrastructure Bootstrap**: See [platform-engineering-on-eks ARCHITECTURE.md](https://gitlab.aws.dev/aws-tfc-containers/containers-hands-on-content/platform-engineering-on-eks/-/blob/main/ARCHITECTURE.md) for infrastructure provisioning details +- **Getting Started**: See [GETTING-STARTED.md](GETTING-STARTED.md) for deployment instructions +- **Deployment Guide**: See [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for detailed deployment scenarios +- **Troubleshooting**: See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for common issues and solutions + +This architecture provides a comprehensive foundation for platform engineering that balances developer productivity, operational efficiency, and security requirements. \ No newline at end of file diff --git a/GETTING-STARTED.md b/GETTING-STARTED.md new file mode 100644 index 000000000..bc8a8f1b8 --- /dev/null +++ b/GETTING-STARTED.md @@ -0,0 +1,475 @@ +# Getting Started with Application Modernization Blueprints + +## What are Application Modernization Blueprints? + +Application Modernization Blueprints provide a comprehensive set of patterns, templates, and configurations for building modern, cloud-native applications on AWS. This repository contains the platform implementation that enables organizations to adopt platform engineering practices and accelerate application modernization initiatives. 
+ +The blueprints include: + +- **GitOps Platform**: Complete ArgoCD-based continuous delivery platform +- **Application Templates**: Ready-to-use blueprints for multiple programming languages +- **Platform Components**: Crossplane, Backstage, monitoring, and security integrations +- **Operational Patterns**: Best practices for observability, security, and scalability +- **Developer Experience**: Self-service capabilities and standardized workflows + +## Why Use These Blueprints? + +### For Developers +- **Faster Time-to-Production**: Pre-configured templates and automated workflows +- **Consistent Patterns**: Standardized approaches across all applications +- **Self-Service Capabilities**: Deploy and manage applications independently +- **Built-in Best Practices**: Security, monitoring, and scalability included by default + +### For Platform Teams +- **Reference Implementation**: Production-ready platform engineering patterns +- **Extensible Architecture**: Customize and extend for organizational needs +- **Operational Excellence**: Integrated monitoring, logging, and alerting +- **Developer Productivity**: Reduce cognitive load and improve developer experience + +### For Organizations +- **Accelerated Modernization**: Proven patterns for application transformation +- **Reduced Risk**: Battle-tested configurations and security practices +- **Improved Governance**: Consistent policies and compliance across applications +- **Cost Optimization**: Efficient resource utilization and automated scaling + +## 40-Minute Platform Evaluation + +This quick start helps you evaluate the platform capabilities and understand the developer experience in under 40 minutes. 
+ +### Prerequisites (5 minutes) + +#### If Using Existing Infrastructure +If the platform infrastructure is already deployed (via the CloudFormation template, a manual deployment, or an AWS event), you need: + +- Access to the deployed VSCode IDE environment +- Platform services (ArgoCD, GitLab, Backstage) running +- Basic familiarity with Kubernetes and GitOps concepts + +#### If Starting Fresh +You'll need to deploy the infrastructure first using one of these options: + +**Option A: CloudFormation Template (Recommended for Public Users)** +- **What**: Pre-generated CloudFormation template for complete workshop setup +- **When**: First-time evaluation or workshop participation +- **Requirements**: AWS CLI, basic AWS knowledge +- **Time**: ~45 minutes for complete deployment + +**Option B: IDE-Only CloudFormation Template** +- **What**: Lightweight template that deploys only the VSCode IDE environment +- **When**: You have existing platform services or want to explore platform concepts +- **Requirements**: AWS CLI, basic AWS knowledge +- **Time**: ~15 minutes for IDE deployment + +**Option C: Manual Platform Setup** +- **What**: Step-by-step manual deployment using provided guides +- **When**: Custom requirements or production-like deployment +- **Requirements**: Advanced AWS/Kubernetes knowledge +- **Time**: 2+ hours for complete setup + +**CloudFormation Deployment Steps:** + +1. **Download Template**: Get the appropriate CloudFormation template: + - **Full Workshop Template**: `peeks-workshop-team-stack-self.json` - Complete platform with all services + - **Central Stack Template**: `peeks-workshop-central-stack-self.json` - Central platform services + - Available from: [GitHub Releases](https://github.com/aws-samples/appmod-blueprints/releases) or AWS Workshop Studio + +2. **Deploy via AWS Console or CLI**: + ```bash + # Option 1: AWS Console + # 1. Open CloudFormation in AWS Console + # 2. Create Stack → Upload template file + # 3. 
Fill in required parameter: ParticipantAssumedRoleArn + # 4. Deploy and wait for completion (~45 minutes for full, ~15 minutes for IDE-only) + + # Option 2: AWS CLI + aws cloudformation create-stack \ + --stack-name platform-engineering-workshop \ + --template-body file://peeks-workshop-team-stack-self.json \ + --parameters ParameterKey=ParticipantAssumedRoleArn,ParameterValue=arn:aws:iam::YOUR-ACCOUNT-ID:role/YourRoleName \ + --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \ + --region us-west-2 + ``` + +3. **Configure Parameters**: + - **ParticipantAssumedRoleArn**: IAM role ARN that the deployment will assume for AWS operations + - Example: `arn:aws:iam::123456789012:role/WorkshopParticipantRole` + - This role needs permissions for EKS, EC2, CloudFormation, and other AWS services used by the platform + +4. **Access Environment**: After deployment completes, check CloudFormation outputs for: + - **VSCode IDE URL**: Your development environment + - **IDE Password**: Auto-generated access credentials + +5. 
**Access Platform Services**: Once in the IDE environment: + ```bash + # Get all platform service URLs and credentials + ./scripts/6-tools-urls.sh + + # This script provides a table with: + # - Service URLs (ArgoCD, GitLab, Backstage, Grafana) + # - Access credentials for each service + # - Direct links to access the platform services + ``` + +### Step 1: Explore the Platform (10 minutes) + +#### Access Platform Services + +From your development environment, get all platform service URLs and credentials: + +```bash +# Get comprehensive service information +./scripts/6-tools-urls.sh + +# This provides a formatted table with: +# - ArgoCD URL and admin credentials +# - GitLab URL and root credentials +# - Backstage URL and access information +# - Grafana URL and admin credentials +# - All other platform service endpoints + +# Check platform status +kubectl get applications -n argocd +``` + +#### Explore Repository Structure + +```bash +# Navigate to the blueprints repository +cd appmod-blueprints + +# Explore key directories +ls -la applications/ # Application blueprints and examples +ls -la gitops/ # GitOps configurations for ArgoCD +ls -la platform/ # Platform components and compositions +ls -la packages/ # Helm charts and platform packages +``` + +### Step 2: Deploy a Sample Application (10 minutes) + +#### Option A: Using Backstage (Recommended) + +1. **Access Backstage Developer Portal** + - Open Backstage URL in your browser + - Browse the software catalog + - Explore available application templates + +2. **Create New Application** + - Click "Create Component" + - Choose an application template (e.g., .NET Northwind, Java Spring Boot) + - Fill in application details + - Submit the template + +3. 
**Monitor Deployment** + - Watch GitOps workflow in ArgoCD + - Observe application deployment progress + - Verify application health and status + +#### Option B: Using GitOps Directly + +```bash +# Deploy a sample .NET application +kubectl apply -f applications/dotnet/northwind/manifests/ + +# Monitor deployment +kubectl get applications -n argocd +kubectl get pods -n northwind-app + +# Check application logs +kubectl logs -n northwind-app -l app=northwind-api +``` + +### Step 3: Understand the Developer Workflow (10 minutes) + +#### GitOps Flow + +1. **Code Changes**: Developers push code to GitLab repositories +2. **CI Pipeline**: Automated builds and tests run in GitLab CI +3. **Image Build**: Container images built and pushed to registry +4. **GitOps Sync**: ArgoCD detects changes and deploys to clusters +5. **Monitoring**: Applications monitored via Grafana and Prometheus + +#### Self-Service Capabilities + +```bash +# View available Crossplane compositions +kubectl get compositions + +# Check platform APIs +kubectl get crds | grep -E "(backstage|crossplane|argo)" + +# Explore monitoring setup +kubectl get servicemonitors --all-namespaces +``` + +## Deployment Scenarios Comparison + +Choose the approach that best fits your evaluation or adoption needs: + +| Scenario | Time | Complexity | Use Case | Prerequisites | +|--------------------------------|-----------|------------|-------------------------------|----------------------------------| +| **🚀 CloudFormation Workshop** | 45 min | Low | Complete platform evaluation | AWS account, basic AWS knowledge | +| **💻 CloudFormation IDE-Only** | 15 min | Low | Development environment only | AWS account, basic AWS knowledge | +| **🏗️ Platform Adoption** | 2-4 hours | Medium | Organizational implementation | Kubernetes knowledge | +| **⚙️ Custom Implementation** | 1-2 weeks | High | Production deployment | Advanced platform engineering | + +### CloudFormation Workshop (Recommended for First-Time Users) +- 
**What**: Deploy complete platform using pre-generated CloudFormation template +- **When**: First-time evaluation, workshops, or comprehensive platform assessment +- **Includes**: Full platform stack, VSCode IDE, sample applications, GitOps workflows +- **Next Steps**: Explore platform capabilities, plan organizational adoption + +### Platform Adoption +- **What**: Implement the platform for organizational use +- **When**: Ready to adopt platform engineering practices +- **Includes**: Full platform deployment, team training, application migration +- **Next Steps**: Customize platform components, onboard development teams + +### Developer Onboarding +- **What**: Learn platform workflows and self-service capabilities +- **When**: Onboarding developers to existing platform +- **Includes**: Application templates, GitOps workflows, monitoring practices +- **Next Steps**: Deploy production applications, contribute to platform evolution + +### Custom Implementation +- **What**: Adapt platform for specific organizational requirements +- **When**: Production deployment with custom needs +- **Includes**: Platform customization, security integration, operational procedures +- **Next Steps**: Production rollout, team training, continuous improvement + +## Application Blueprints Overview + +### Supported Technologies + +#### .NET Applications +- **Northwind Sample**: Clean architecture demonstration with Entity Framework +- **Microservices**: Service-to-service communication patterns +- **API Gateway**: Centralized API management and routing + +#### Java Applications +- **Spring Boot**: Microservices with Spring Cloud patterns +- **Observability**: Integrated tracing, metrics, and logging +- **Data Access**: JPA patterns with PostgreSQL integration + +#### Node.js Applications +- **Express APIs**: RESTful service patterns with modern tooling +- **Event-Driven**: Message queue integration with SQS/SNS +- **Frontend Integration**: React/Vue.js deployment patterns + +#### Python 
Applications +- **FastAPI Services**: High-performance async API patterns +- **Data Processing**: ETL pipelines with AWS services +- **Machine Learning**: MLOps patterns for model deployment + +#### Rust Applications +- **High-Performance Services**: Memory-safe system programming +- **WebAssembly**: Browser and edge deployment patterns +- **Async Patterns**: Tokio-based concurrent applications + +#### Go Applications +- **Cloud-Native Services**: Kubernetes-native application patterns +- **gRPC Services**: High-performance service communication +- **CLI Tools**: Platform tooling and automation utilities + +### Architecture Patterns + +#### Microservices Architecture +- Service mesh integration with Istio +- Inter-service communication patterns +- Distributed tracing and observability +- Circuit breaker and retry patterns + +#### Event-Driven Architecture +- Message queue integration (SQS, SNS, EventBridge) +- Event sourcing and CQRS patterns +- Saga pattern for distributed transactions +- Dead letter queue handling + +#### Serverless Integration +- Lambda function deployment patterns +- API Gateway integration +- Event-driven serverless workflows +- Cost optimization strategies + +## What's Next? + +After completing the evaluation, choose your path forward: + +### 🎯 Platform Adopters +1. **Architecture Review**: Study [ARCHITECTURE.md](ARCHITECTURE.md) for platform design details +2. **Deployment Planning**: Review [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for production setup +3. **Team Preparation**: Plan developer onboarding and training programs +4. **Customization**: Identify platform modifications for your organization + +### 👩‍💻 Developers +1. **Template Exploration**: Try different application blueprints +2. **Workflow Mastery**: Practice GitOps deployment patterns +3. **Platform Services**: Learn to use Backstage, monitoring, and security tools +4. 
**Contribution**: Add new templates or improve existing patterns + +### ⚙️ Platform Engineers +1. **Component Deep Dive**: Understand Crossplane compositions and platform APIs +2. **Customization**: Extend platform capabilities for organizational needs +3. **Operations**: Set up monitoring, alerting, and maintenance procedures +4. **Security**: Implement organization-specific security and compliance requirements + +## Troubleshooting Common Issues + +### IDE Configuration Issues + +**Platform Services Not Available or Environment Variables Missing** +```bash +# Re-run the configuration entrypoint script +./scripts/0-install.sh + +# This script will: +# - Set up environment variables for platform services +# - Configure kubectl access to clusters +# - Install required tools and dependencies +# - Verify platform service connectivity +``` + +**IDE Environment Corrupted or Incomplete Setup** +```bash +# From within the VSCode IDE terminal, re-run setup +cd /workspace/appmod-blueprints +./scripts/0-install.sh + +# Check if environment variables are properly set +echo $ARGOCD_URL +echo $GITLAB_URL +echo $BACKSTAGE_URL +``` + +### Platform Access Problems + +**Cannot Access Backstage/ArgoCD** +```bash +# Check service status +kubectl get services -n backstage +kubectl get services -n argocd + +# Verify ingress configuration +kubectl get ingress --all-namespaces + +# Check pod health +kubectl get pods -n backstage +kubectl get pods -n argocd +``` + +**GitLab Authentication Issues** +```bash +# Verify GitLab service +kubectl get services -n gitlab + +# Check GitLab pod logs +kubectl logs -n gitlab -l app=gitlab + +# Verify external access +curl -k $GITLAB_URL/api/v4/version +``` + +### Application Deployment Issues + +**ArgoCD Sync Failures** +```bash +# Check ArgoCD application status +kubectl get applications -n argocd + +# View application details +kubectl describe application -n argocd + +# Check ArgoCD logs +kubectl logs -n argocd -l 
app.kubernetes.io/name=argocd-server +``` + +**Application Pod Failures** +```bash +# Check pod status +kubectl get pods -n <namespace> + +# View pod logs +kubectl logs <pod-name> -n <namespace> + +# Describe pod for events +kubectl describe pod <pod-name> -n <namespace> +``` + +### Getting Help + +- **Platform Issues**: Check [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for detailed solutions +- **Architecture Questions**: Review [ARCHITECTURE.md](ARCHITECTURE.md) for platform design +- **Deployment Problems**: See [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for alternative approaches +- **Infrastructure Setup**: Use provided CloudFormation templates or follow [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) + +## Validation and Success Criteria + +### Platform Readiness Checklist + +Verify your platform is ready for application deployment: + +- [ ] **ArgoCD**: Applications syncing successfully +- [ ] **Backstage**: Developer portal accessible with templates +- [ ] **GitLab**: Repositories accessible with CI/CD pipelines +- [ ] **Monitoring**: Grafana dashboards showing platform metrics +- [ ] **Networking**: Service mesh and ingress working correctly + +### Application Deployment Validation + +After deploying a sample application: + +- [ ] **Deployment**: Application pods running and healthy +- [ ] **Networking**: Application accessible via ingress +- [ ] **Monitoring**: Metrics and logs flowing to observability stack +- [ ] **GitOps**: Changes sync automatically from Git repositories +- [ ] **Security**: Security policies applied and enforced + +### Developer Experience Validation + +Confirm the developer experience meets expectations: + +- [ ] **Self-Service**: Developers can deploy applications independently +- [ ] **Templates**: Application blueprints work as expected +- [ ] **Feedback**: Clear status and error messages throughout workflows +- [ ] **Documentation**: Developers can find help and guidance easily +- [ ] **Performance**: Reasonable response times for platform operations + +## Security and Compliance + +### 
Security Best Practices + +The platform implements several security patterns: + +- **Network Policies**: Kubernetes network segmentation +- **RBAC**: Role-based access control for platform services +- **Secret Management**: External Secrets Operator integration +- **Image Security**: Container image scanning and policies +- **Service Mesh**: mTLS for service-to-service communication + +### Compliance Considerations + +For production deployments, consider: + +- **Data Governance**: Data classification and handling policies +- **Audit Logging**: Comprehensive audit trails for all platform operations +- **Access Controls**: Integration with organizational identity providers +- **Vulnerability Management**: Regular security scanning and patching +- **Backup and Recovery**: Data protection and disaster recovery procedures + +### Security Validation + +```bash +# Check network policies +kubectl get networkpolicies --all-namespaces + +# Verify RBAC configuration +kubectl get rolebindings --all-namespaces +kubectl get clusterrolebindings + +# Check security policies +kubectl get psp # Pod Security Policies (only on clusters older than Kubernetes 1.25) +kubectl get namespaces -L pod-security.kubernetes.io/enforce # Pod Security Standards enforcement level per namespace + +# Review service mesh security +kubectl get peerauthentications --all-namespaces +kubectl get authorizationpolicies --all-namespaces +``` \ No newline at end of file diff --git a/README.md b/README.md index 4e5b5e8a4..847493205 100644 --- a/README.md +++ b/README.md @@ -1,53 +1,171 @@ -# Modern Engineering on AWS +# Platform Engineering on EKS - Application Modernization Blueprints -This repository is the main solution repository for the Modern Engineering on AWS initiative. It contains a comprehensive set of application modernization blueprints and patterns that cover various aspects of modern cloud-native development and operations on AWS. +## What is this? 
-## Overview +This repository contains the platform implementation and application modernization blueprints for building modern, cloud-native applications on AWS. It provides GitOps configurations, platform components, application templates, and operational patterns that work together to create a comprehensive platform engineering solution. -Modern Engineering on AWS is an initiative aimed at providing developers and organizations with best practices, patterns, and blueprints for building and managing modern applications on the AWS cloud platform. This repository serves as a central resource for implementing cutting-edge engineering practices in cloud environments. +## Quick Start + +Choose your path: + +- **🚀 Try it now** (30 min): [GETTING-STARTED.md](GETTING-STARTED.md) +- **🏗️ Deploy it**: [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) +- **🔧 Understand it**: [ARCHITECTURE.md](ARCHITECTURE.md) +- **❓ Fix issues**: [TROUBLESHOOTING.md](TROUBLESHOOTING.md) + +## Repository Relationship + +This repository works with [platform-engineering-on-eks](https://gitlab.aws.dev/aws-tfc-containers/containers-hands-on-content/platform-engineering-on-eks) to provide a complete platform engineering solution. + +- **This repo (appmod-blueprints)**: Platform implementation, GitOps configurations, application blueprints, and operational patterns +- **Other repo (platform-engineering-on-eks)**: Infrastructure bootstrap, CDK deployment, workshop environment setup ## Key Features -- **Platform Engineering**: Blueprints and patterns for setting up robust, scalable platform infrastructure on AWS. -- **Application Deployment**: Best practices and tools for efficient and reliable application deployment processes. -- **GitOps**: Implementations of GitOps principles for managing infrastructure and applications as code. -- **KubeVela**: Integration patterns and examples using KubeVela for application delivery and management. 
-- **Progressive Delivery**: Strategies and implementations for gradual rollouts and feature flagging. +- **GitOps Platform**: Complete ArgoCD-based platform with multi-environment support +- **Application Blueprints**: Ready-to-use templates for .NET, Java, Node.js, Python, Rust, and Go applications +- **Platform Components**: Crossplane, Backstage, monitoring, security, and networking configurations +- **Developer Experience**: Self-service capabilities through Backstage templates and GitOps workflows +- **Production Ready**: Scalable patterns for real-world platform adoption + +## Who Should Use This + +- **👩‍💻 Developers**: Building and deploying applications on the platform +- **🏒 Platform Adopters**: Implementing platform engineering in their organization +- **⚙️ Platform Engineers**: Customizing and extending platform capabilities +- **🎯 DevOps Teams**: Establishing GitOps workflows and operational practices ## Application Blueprints -Our repository includes various application blueprints that demonstrate modern engineering practices. These blueprints cover: +Our repository includes various application blueprints demonstrating modern engineering practices: -1. Microservices architectures -2. Serverless applications -3. Containerized applications -4. Event-driven architectures -5. CI/CD pipelines +### Supported Technologies -Each blueprint provides a detailed walkthrough of the architecture, implementation details, and best practices. 
+- **.NET**: Northwind sample application with clean architecture +- **Java**: Spring Boot microservices with observability +- **Node.js**: Express applications with modern tooling +- **Python**: FastAPI services with async patterns +- **Rust**: High-performance web services +- **Go**: Cloud-native microservices -## Getting Started +### Architecture Patterns -To get started with Modern Engineering on AWS: +- Microservices architectures with service mesh +- Event-driven architectures with messaging +- Serverless integration patterns +- Progressive delivery and canary deployments +- Observability and monitoring integration -1. Clone this repository -2. Provision the platform that consists of the management cluster and two workloads clusters that represent dev and prod environments. The environments can be customized based on the customer needs. -3. Navigate to the specific tech or pattern you're interested in, e.g platform/backstage, platform/crossplane, platform/components (or traits). -4. Follow the README instructions in each subdirectory for detailed setup and usage guidelines +## Platform Components -## Contributing +### Core Services -We welcome contributions to the Modern Engineering on AWS initiative. Please read our [CONTRIBUTING](CONTRIBUTING.md) guide for details on our code of conduct and the process for submitting pull requests. +- **Backstage**: Developer portal and service catalog +- **ArgoCD**: GitOps continuous delivery +- **Crossplane**: Infrastructure as code and composition +- **Grafana**: Monitoring and observability +- **Keycloak**: Identity and access management -## Security +### Development Tools -See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information on reporting security issues. +- **GitLab**: Source code management and CI/CD +- **VS Code**: Cloud-based development environment +- **Argo Workflows**: Workflow orchestration +- **External Secrets**: Secure secret management -## License +## Need Help? 
+ +- Check [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for common platform and application issues +- Review [ARCHITECTURE.md](ARCHITECTURE.md) for platform design and component relationships +- Explore [applications/](applications/) for example implementations +- Visit the [platform-engineering-on-eks repository](https://gitlab.aws.dev/aws-tfc-containers/containers-hands-on-content/platform-engineering-on-eks) for infrastructure setup + +## Quick Reference + +### Prerequisites + +Before using the platform, ensure you have the required setup: + +- **📋 Setup Guide**: See [GETTING-STARTED.md](GETTING-STARTED.md#prerequisites) for complete prerequisites +- **🔧 Platform Access**: GitLab URL, ArgoCD URL, Backstage portal access +- **⚙️ Environment**: IDE with required environment variables and credentials + +### For Developers + +```bash +# Access platform services (URLs provided after deployment) +# - Backstage Developer Portal: Self-service application creation +# - GitLab: Source code management and CI/CD +# - ArgoCD: GitOps deployment status and management + +# Deploy applications using Backstage templates +# 1. Access Backstage developer portal +# 2. Choose application template +# 3. Fill in application details +# 4. GitOps workflow handles deployment automatically +``` + +### For Platform Teams -This library is licensed under the MIT-0 License. See the LICENSE file for details. +```bash +# Platform components are managed via GitOps +# Key directories for customization: +# - gitops/platform/ - Core platform services +# - packages/ - Helm charts and configurations +# - platform/components/ - Crossplane compositions +# - platform/traits/ - Application deployment patterns +``` -## Contact +### Repository Structure -For any questions or feedback regarding Modern Engineering on AWS, please open an issue in this repository. 
+- `applications/` - Application blueprints and examples +- `gitops/` - GitOps configurations for ArgoCD +- `packages/` - Helm charts and platform packages +- `platform/` - Platform components and compositions +- `deployment/` - Environment-specific configurations + +## Getting Started Paths + +### 🚀 Quick Evaluation (30 minutes) + +Perfect for decision makers and teams evaluating platform engineering: + +1. Use the [platform-engineering-on-eks](https://gitlab.aws.dev/aws-tfc-containers/containers-hands-on-content/platform-engineering-on-eks) repository to deploy infrastructure +2. Access the pre-configured development environment +3. Deploy a sample application using Backstage templates +4. Experience the complete developer workflow + +### 🏗️ Platform Adoption + +For teams ready to implement platform engineering: + +1. Review [ARCHITECTURE.md](ARCHITECTURE.md) to understand the platform design +2. Follow [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for production deployment +3. Customize platform components in `platform/` directory +4. Adapt application blueprints for your technology stack + +### 👩‍💻 Application Development + +For developers using an existing platform: + +1. Access your organization's Backstage developer portal +2. Browse available application templates +3. Create new applications using self-service workflows +4. Follow GitOps patterns for deployment and updates + +## Contributing + +We welcome contributions to improve the platform and add new application blueprints: + +- Read our [CONTRIBUTING.md](CONTRIBUTING.md) guide +- Follow our [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) +- Submit issues for bugs or feature requests +- Create pull requests for improvements + +## Security + +See [CONTRIBUTING.md](CONTRIBUTING.md#security-issue-notifications) for information on reporting security issues. + +## License +This project is licensed under the MIT-0 License. See the [LICENSE](LICENSE) file for details. 
\ No newline at end of file diff --git a/TODO.md b/TODO.md index e415868d1..af605a5d1 100644 --- a/TODO.md +++ b/TODO.md @@ -92,3 +92,4 @@ task taskcat-clean-deployment - [x] cloudwatch logs groups - like /aws/codebuild/PEEKSGITIAMStackDeployProje-30HWWmiBcgx0 or /aws/lambda/tCaT-peeks-workshop-test--IDEPEEKSIdePasswordExpor-OOmtoIML4oGw or tCaT-peeks-workshop-test-fleet-workshop-test-92b4118d493d47dcb827190d4e5ac6b9-IDEPEEKSIdeLogGroup3808F7B1-xMtRAbEgY8Ho +- Is WORKSHOP_ID=28c283c1-1d60-43fa-a604-4e983e0e8038 the good one? \ No newline at end of file From 0fb0c841fc656ad5b439f8376ffcbc12517df046 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Allamand?= Date: Mon, 15 Sep 2025 08:17:00 +0200 Subject: [PATCH 2/3] add docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Sébastien Allamand --- ARCHITECTURE.md | 277 ++++++- DEPLOYMENT-GUIDE.md | 1875 +++++++++++++++++++++++++++++++++++++++++++ GETTING-STARTED.md | 60 +- TODO.md | 9 +- TROUBLESHOOTING.md | 761 ++++++++++++++++++ 5 files changed, 2934 insertions(+), 48 deletions(-) create mode 100644 DEPLOYMENT-GUIDE.md create mode 100644 TROUBLESHOOTING.md diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 87ff33281..4fa434b6a 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,21 +1,34 @@ +--- +title: "Platform Engineering on EKS - Platform Architecture Guide" +persona: ["workshop-participant", "platform-adopter", "infrastructure-engineer", "developer"] +deployment-scenario: ["full-workshop", "platform-only", "ide-only", "manual"] +difficulty: "intermediate" +estimated-time: "45 minutes" +prerequisites: ["EKS cluster", "Basic GitOps knowledge", "Kubernetes experience"] +related-pages: ["GETTING-STARTED.md", "DEPLOYMENT-GUIDE.md", "platform-engineering-on-eks/ARCHITECTURE.md"] +repository: "appmod-blueprints" +last-updated: "2025-01-09" +--- + # Platform Engineering on EKS - Platform Architecture Guide -This document provides a 
comprehensive overview of the platform components and GitOps architecture implemented in the appmod-blueprints repository, focusing on the platform services and application deployment patterns. +This document provides a comprehensive overview of the platform components and GitOps architecture implemented in the appmod-blueprints repository, focusing on the platform services, application deployment patterns, and operational workflows that run on AWS EKS infrastructure. ## 📚 For Workshop Participants -This guide will help you understand how the platform components work together to support your application development and deployment exercises. +This guide will help you understand how the platform components work together to support your application development and deployment exercises. Learn about the self-service capabilities, GitOps workflows, and how to use Backstage templates for application creation. ## 🏒 For Platform Adopters -Use this guide to understand the platform architecture patterns and GitOps workflows that can be implemented in your organization for production use. +Use this guide to understand the platform architecture patterns, GitOps workflows, and operational practices that can be implemented in your organization for production use. Focus on the platform services, multi-cluster patterns, and governance capabilities that enable developer productivity at scale. ## ⚙️ For Infrastructure Engineers -This document provides detailed technical specifications for platform components, GitOps configurations, and customization points for extending the platform. +This document provides detailed technical specifications for platform components, GitOps configurations, Kubernetes operators, and customization points for extending the platform. Understand the CloudFormation-based infrastructure patterns and how platform services integrate with AWS services. 
## 👩‍💻 For Developers -Learn how platform services support your development workflows, from code commit to production deployment through self-service capabilities. +Learn how platform services support your development workflows, from code commit to production deployment through self-service capabilities. Understand how to use Backstage for application scaffolding, GitOps for deployment, and platform services for infrastructure provisioning. ## Table of Contents - [Overview](#overview) +- [Key Concepts and Terminology](#key-concepts-and-terminology) - [Platform Architecture](#platform-architecture) - [GitOps Architecture](#gitops-architecture) - [Platform Components](#platform-components) @@ -23,10 +36,67 @@ Learn how platform services support your development workflows, from code commit - [Data Flow and Workflows](#data-flow-and-workflows) - [Security and Compliance](#security-and-compliance) - [Deployment Scenarios](#deployment-scenarios) +- [Integration Points](#integration-points) ## Overview -The appmod-blueprints repository contains the platform implementation for a complete GitOps-based platform engineering solution. This repository provides the platform services, application templates, and operational patterns that run on the infrastructure established by the platform-engineering-on-eks repository. +The appmod-blueprints repository contains the platform implementation for a complete GitOps-based platform engineering solution. This repository provides the platform services, application templates, and operational patterns that run on AWS EKS infrastructure provisioned through CloudFormation templates and Terraform modules. 
+ +### Platform Foundation + +The platform is built on AWS EKS clusters with the following foundational services: +- **EKS Clusters**: Container orchestration platform running Kubernetes +- **AWS Load Balancer Controller**: Ingress and service load balancing +- **AWS VPC CNI**: Native AWS networking for pods +- **EBS CSI Driver**: Persistent storage for stateful applications +- **AWS Secrets Manager**: Centralized secret management +- **AWS Systems Manager**: Configuration parameter storage + +### Repository Relationship + +This platform implementation works with the bootstrap infrastructure: + +1. **appmod-blueprints** (this repository): Platform services, GitOps workflows, and application templates +2. **platform-engineering-on-eks**: Bootstrap infrastructure that provisions the foundational AWS services + +### Key Platform Capabilities + +- **GitOps-Based Deployment**: ArgoCD manages all platform and application deployments +- **Self-Service Infrastructure**: Crossplane enables developers to provision AWS resources +- **Developer Portal**: Backstage provides application templates and service catalog +- **Multi-Cluster Management**: Hub-and-spoke architecture for environment isolation +- **Automated Secret Management**: External Secrets Operator syncs from AWS Secrets Manager +- **Application Blueprints**: Pre-configured templates for multiple technology stacks + +## Key Concepts and Terminology + +### Platform Architecture Terms +- **Platform Services**: Core Kubernetes operators and controllers that provide platform capabilities +- **GitOps Workflow**: Declarative deployment pattern using Git as the source of truth +- **Hub-and-Spoke Architecture**: Multi-cluster pattern with centralized control plane and distributed workload clusters +- **Application Blueprints**: Standardized templates for different technology stacks and deployment patterns +- **Self-Service Infrastructure**: Developer-accessible APIs for provisioning cloud resources + +### GitOps Terms +- 
**ArgoCD**: GitOps controller that manages continuous deployment from Git repositories +- **ApplicationSets**: ArgoCD resources that enable templated, multi-cluster application deployment +- **GitOps Bridge**: Data pipeline connecting infrastructure metadata to GitOps applications +- **Cluster Registration**: Process where clusters automatically register metadata for GitOps discovery +- **Sync Waves**: Ordered deployment phases ensuring proper dependency management + +### Platform Components +- **Backstage**: Developer portal providing self-service application creation and service catalog +- **Crossplane**: Kubernetes-native infrastructure as code for cloud resource provisioning +- **External Secrets Operator**: Kubernetes operator for syncing secrets from external systems +- **Ingress Controller**: Traffic routing and load balancing for applications +- **Service Mesh**: Communication layer providing security, observability, and traffic management + +### Infrastructure Integration Terms +- **CloudFormation Integration**: AWS native infrastructure as code service integration patterns +- **Pod Identity**: AWS EKS feature for secure, credential-free access to AWS services +- **Resource Prefix**: Consistent naming convention for all platform resources +- **Environment Isolation**: Separation of development, staging, and production environments +- **Multi-Tenant Architecture**: Platform design supporting multiple teams and applications securely ## Platform Architecture @@ -34,45 +104,86 @@ The appmod-blueprints repository contains the platform implementation for a comp ```mermaid graph TB - subgraph "Developer Interface" - BACKSTAGE[Backstage Portal] - IDE[Development Environment] - GIT[Git Repositories] + subgraph "Developer Interface Layer" + BACKSTAGE[Backstage Portal
Self-Service Templates] + IDE[Development Environment
VSCode + GitLab] + GIT[Git Repositories
Source Code & GitOps] end - subgraph "Platform Control Plane" - ARGOCD[ArgoCD] - CROSSPLANE[Crossplane] - EXTERNAL_SECRETS[External Secrets Operator] - CERT_MANAGER[Cert Manager] + subgraph "Platform Control Plane (Hub Cluster)" + ARGOCD[ArgoCD
GitOps Controller] + CROSSPLANE[Crossplane
Infrastructure as Code] + ESO[External Secrets Operator
Secret Management] + CERT_MGR[Cert Manager
TLS Automation] + GITLAB[GitLab
Git Repository Hosting] end - subgraph "Application Runtime" - INGRESS[Ingress Controller] - MONITORING[Monitoring Stack] - LOGGING[Logging Stack] - SERVICE_MESH[Service Mesh] + subgraph "Application Runtime (Spoke Clusters)" + INGRESS[AWS Load Balancer Controller
Traffic Routing] + MONITORING[Monitoring Stack
Prometheus + Grafana] + LOGGING[Logging Stack
Fluent Bit + CloudWatch] + WORKLOADS[Application Workloads
Multi-Language Support] end - subgraph "Infrastructure Services" - EKS[EKS Clusters] - RDS[RDS Databases] - S3[S3 Storage] - SECRETS[AWS Secrets Manager] + subgraph "AWS Infrastructure Services" + EKS[EKS Clusters
Hub + Spoke Architecture] + VPC[VPC & Networking
Multi-AZ Configuration] + SECRETS[AWS Secrets Manager
Centralized Secrets] + RDS[RDS Databases
Managed Databases] + S3[S3 Storage
Object Storage] + IAM[IAM Roles
Pod Identity] end - BACKSTAGE --> ARGOCD + BACKSTAGE --> GIT IDE --> GIT GIT --> ARGOCD ARGOCD --> CROSSPLANE - ARGOCD --> EXTERNAL_SECRETS - ARGOCD --> CERT_MANAGER + ARGOCD --> ESO + ARGOCD --> CERT_MGR ARGOCD --> INGRESS ARGOCD --> MONITORING + ARGOCD --> LOGGING + ARGOCD --> WORKLOADS + CROSSPLANE --> RDS CROSSPLANE --> S3 - EXTERNAL_SECRETS --> SECRETS + ESO --> SECRETS + ESO --> IAM + INGRESS --> EKS + MONITORING --> EKS + WORKLOADS --> VPC + + style ARGOCD fill:#e1f5fe + style BACKSTAGE fill:#f3e5f5 + style EKS fill:#e8f5e8 + style SECRETS fill:#fff3e0 +``` + +### Platform Service Dependencies + +```mermaid +sequenceDiagram + participant Dev as Developer + participant Backstage as Backstage Portal + participant Git as Git Repository + participant ArgoCD as ArgoCD + participant ESO as External Secrets + participant AWS as AWS Services + participant K8s as Kubernetes + + Note over Dev,K8s: Application Creation & Deployment + Dev->>Backstage: Create application from template + Backstage->>Git: Generate code & GitOps config + Dev->>Git: Push application changes + + Note over Git,K8s: GitOps Deployment Flow + Git->>ArgoCD: Webhook notification + ArgoCD->>ESO: Trigger secret sync + ESO->>AWS: Fetch secrets from Secrets Manager + ESO->>K8s: Create Kubernetes secrets + ArgoCD->>K8s: Deploy application manifests + K8s->>Dev: Application ready notification ``` ## GitOps Architecture @@ -139,12 +250,12 @@ sequenceDiagram ### GitOps Bridge Architecture -The GitOps Bridge is a critical component that connects infrastructure provisioning with ArgoCD-based GitOps deployments. It acts as a data pipeline that passes infrastructure metadata from the bootstrap infrastructure to Kubernetes secrets, which are then consumed by ArgoCD ApplicationSets. +The GitOps Bridge is a critical component that connects infrastructure provisioning with ArgoCD-based GitOps deployments. 
It acts as a data pipeline that passes infrastructure metadata from CloudFormation stacks and Terraform modules to Kubernetes secrets, which are then consumed by ArgoCD ApplicationSets. ```mermaid graph TB subgraph "Infrastructure Layer" - BOOTSTRAP[Bootstrap Infrastructure
CDK/Terraform] + BOOTSTRAP[Bootstrap Infrastructure
CloudFormation/Terraform] ADDONS_META[Infrastructure Metadata
Cluster info, secrets, URLs] end @@ -181,7 +292,7 @@ graph TB The GitOps Bridge enables infrastructure data to flow seamlessly into GitOps applications: -1. **Infrastructure Metadata Collection**: The bootstrap infrastructure (from platform-engineering-on-eks) collects key information like cluster names, AWS regions, VPC IDs, and service URLs +1. **Infrastructure Metadata Collection**: CloudFormation stacks and Terraform modules collect key information like cluster names, AWS regions, VPC IDs, and service URLs 2. **Bridge Module Processing**: The GitOps Bridge module transforms this metadata into Kubernetes secrets that can be consumed by ArgoCD applications @@ -316,7 +427,50 @@ spec: - CreateNamespace=true ``` -#### Multi-Tool Support +#### Infrastructure Integration Patterns + +The platform supports cluster creation and infrastructure provisioning through multiple tools and patterns: + +#### CloudFormation Integration +The platform integrates with AWS CloudFormation for infrastructure provisioning: + +```yaml +# CloudFormation template for EKS cluster with registration +Resources: + EKSCluster: + Type: AWS::EKS::Cluster + Properties: + Name: !Sub "${ResourcePrefix}-${Environment}-cluster" + Version: "1.28" + RoleArn: !GetAtt EKSServiceRole.Arn + ResourcesVpcConfig: + SubnetIds: !Ref SubnetIds + SecurityGroupIds: + - !Ref EKSSecurityGroup + + ClusterRegistrationSecret: + Type: AWS::SecretsManager::Secret + Properties: + Name: !Sub "cluster-registration-${EKSCluster}" + SecretString: !Sub | + { + "cluster_name": "${EKSCluster}", + "cluster_endpoint": "${EKSCluster.Endpoint}", + "resource_prefix": "${ResourcePrefix}", + "environment": "${Environment}", + "labels": { + "environment": "${Environment}", + "cluster-type": "spoke", + "resource-prefix": "${ResourcePrefix}" + }, + "annotations": { + "addons_repo_basepath": "gitops/addons/", + "resource_prefix": "${ResourcePrefix}" + } + } +``` + +### Multi-Tool Support The platform supports cluster creation through 
multiple infrastructure tools: @@ -729,4 +883,61 @@ graph TB - **Deployment Guide**: See [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for detailed deployment scenarios - **Troubleshooting**: See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for common issues and solutions +## Integration Points + +### Bootstrap Infrastructure Integration + +This platform implementation integrates with the bootstrap infrastructure through several key integration points: + +#### Infrastructure Dependency Flow +```mermaid +sequenceDiagram + participant Bootstrap as Bootstrap Infrastructure + participant AWS as AWS Services + participant ESO as External Secrets Operator + participant ArgoCD as ArgoCD + participant Platform as Platform Services + + Bootstrap->>AWS: Create EKS clusters & secrets + Bootstrap->>AWS: Configure VPC & networking + ESO->>AWS: Sync secrets to Kubernetes + ArgoCD->>AWS: Discover cluster registration secrets + ArgoCD->>Platform: Deploy platform services + Platform->>AWS: Consume infrastructure resources +``` + +#### Key Integration Mechanisms + +1. **Cluster Discovery**: ArgoCD ApplicationSets automatically discover EKS clusters through registration secrets created by the bootstrap infrastructure + +2. **Secret Synchronization**: External Secrets Operator syncs AWS Secrets Manager secrets created by the bootstrap infrastructure + +3. **Network Integration**: Platform services use VPC, subnets, and security groups provisioned by the bootstrap infrastructure + +4. **IAM Integration**: Platform components use Pod Identity roles and policies created by the bootstrap infrastructure + +#### Infrastructure Prerequisites + +The platform requires the following infrastructure components to be provisioned first: + +- **EKS Clusters**: Hub and spoke clusters with proper networking and security configurations +- **AWS Secrets Manager**: Secrets for platform services (Keycloak, Backstage, etc.) 
+- **VPC Configuration**: Proper networking setup with subnets and security groups +- **IAM Roles**: Pod Identity associations for secure AWS service access +- **GitOps Repositories**: Git repositories configured for ArgoCD access + +#### Cross-Repository Dependencies + +- **Bootstrap → Platform**: Bootstrap infrastructure must be deployed before platform services +- **Shared Configuration**: Both repositories use consistent resource naming patterns and metadata +- **Secret Management**: Platform consumes secrets created by bootstrap infrastructure +- **Network Policies**: Platform services rely on network configurations from bootstrap + +### Related Documentation + +For complete infrastructure understanding, refer to: +- **Bootstrap Infrastructure**: [platform-engineering-on-eks ARCHITECTURE.md](https://gitlab.aws.dev/aws-tfc-containers/containers-hands-on-content/platform-engineering-on-eks/-/blob/main/ARCHITECTURE.md) for infrastructure provisioning details +- **Deployment Guide**: [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for detailed platform deployment scenarios +- **Getting Started**: [GETTING-STARTED.md](GETTING-STARTED.md) for platform evaluation and adoption paths + This architecture provides a comprehensive foundation for platform engineering that balances developer productivity, operational efficiency, and security requirements. \ No newline at end of file diff --git a/DEPLOYMENT-GUIDE.md b/DEPLOYMENT-GUIDE.md new file mode 100644 index 000000000..5552fda7c --- /dev/null +++ b/DEPLOYMENT-GUIDE.md @@ -0,0 +1,1875 @@ +# Deployment Guide - Application Modernization Blueprints + +## Overview + +This guide provides comprehensive deployment instructions for the Application Modernization Blueprints platform. Choose the deployment approach that best fits your organizational needs, from quick evaluation to production-ready platform adoption. 
+ +## Deployment Scenarios Comparison + +| Scenario | Time | Complexity | Use Case | Prerequisites | +|--------------------------------|-----------|------------|-------------------------------|----------------------------------| +| **🚀 CloudFormation Workshop** | 45 min | Low | Complete platform evaluation | AWS account, basic AWS knowledge | +| **💻 CloudFormation IDE-Only** | 15 min | Low | Development environment only | AWS account, basic AWS knowledge | +| **🏗️ Platform Adoption** | 2-4 hours | Medium | Organizational implementation | Kubernetes knowledge | +| **⚙️ Custom Implementation** | 1-2 weeks | High | Production deployment | Advanced platform engineering | + +## Prerequisites + +### Basic Requirements + +All deployment scenarios require: + +- AWS account with appropriate permissions +- AWS CLI v2 configured +- Basic understanding of Kubernetes and GitOps concepts + +### Tool Verification + +Verify all required tools are installed correctly: + +```bash +# Core tools +aws --version # Should show AWS CLI 2.x +kubectl version --client # Should show kubectl version +jq --version # Should show jq version +yq --version # Should show yq version +direnv --version # Should show direnv version (if installed) + +# Platform tools (if installed) +helm version # Should show Helm version +argocd version --client # Should show ArgoCD CLI version +# Note: eksctl is optional - you can use existing EKS clusters + +# Development tools (if installed) +node --version # Should show Node.js version +npm --version # Should show npm version +yarn --version # Should show yarn version +docker --version # Should show Docker version +terraform --version # Should show Terraform version + +# Test AWS authentication +aws sts get-caller-identity +``` + +### Tool Installation + +#### Core Tools (Required for all scenarios) + +```bash +# AWS CLI v2 +curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +unzip awscliv2.zip && sudo ./aws/install + +# 
kubectl (Kubernetes command-line tool) +curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" +sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl + +# jq (JSON processor) +# macOS +brew install jq +# Ubuntu/Debian +sudo apt-get install jq +# Amazon Linux/RHEL/CentOS +sudo yum install jq + +# yq (YAML processor) +# macOS +brew install yq +# Linux +sudo wget -qO /usr/local/bin/yq https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 +sudo chmod +x /usr/local/bin/yq + +# direnv (environment variable management - recommended) +# macOS +brew install direnv +# Ubuntu/Debian +sudo apt-get install direnv +# Add to your shell profile: eval "$(direnv hook bash)" or eval "$(direnv hook zsh)" + +# Git +# Usually pre-installed on most systems +``` + +**Platform-Specific Notes:** +- **macOS**: Most tools can be installed via Homebrew (`brew install `) +- **Ubuntu/Debian**: Use `apt-get install ` for system packages +- **Amazon Linux/RHEL/CentOS**: Use `yum install ` for system packages +- **Windows**: Consider using WSL2 with Ubuntu for the best experience + +#### Platform Tools (Required for platform adoption scenarios) + +```bash +# Helm (Kubernetes package manager) +curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash + +# ArgoCD CLI (for GitOps management) +# macOS +brew install argocd +# Linux +curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64 +sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd +rm argocd-linux-amd64 + +# Note: eksctl is optional - you can use existing EKS clusters or other cluster creation methods +``` + +#### Development Tools (Optional) + +```bash +# Docker (for local development and testing) +# macOS +brew install --cask docker +# Ubuntu/Debian +curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh + +# Node.js and npm (for Backstage development) 
+curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash +nvm install 20 && nvm use 20 + +# Yarn (alternative package manager) +npm install -g yarn + +# Terraform (for infrastructure as code) +# macOS +brew install terraform +# Linux +wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg +echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list +sudo apt update && sudo apt install terraform +``` + +## Scenario 1: CloudFormation Workshop (Recommended) + +**Best for**: First-time evaluation, workshops, comprehensive platform assessment + +### What You Get + +- Complete platform stack with all services +- VSCode IDE accessible via browser +- ArgoCD, GitLab, Backstage, and monitoring +- Sample applications and GitOps workflows +- Pre-configured development environment + +### Step-by-Step Deployment + +#### 1. Download CloudFormation Template (2 minutes) + +```bash +# Download the complete workshop template (-L follows the GitHub release redirect) +curl -L -o peeks-workshop-team-stack-self.json \ + https://github.com/aws-samples/appmod-blueprints/releases/latest/download/peeks-workshop-team-stack-self.json + +# Or download from AWS Workshop Studio if attending an event +``` + +#### 2. Deploy via AWS Console (40 minutes) + +1. **Open CloudFormation Console** + - Navigate to AWS CloudFormation in your target region + - Click "Create Stack" → "With new resources" + +2. **Upload Template** + - Choose "Upload a template file" + - Select the downloaded JSON file + - Click "Next" + +3. **Configure Parameters** + - **Stack Name**: `platform-engineering-workshop` + - **ParticipantAssumedRoleArn**: `arn:aws:iam::YOUR-ACCOUNT-ID:role/YourRoleName` + - Click "Next" + +4. **Configure Stack Options** + - Add tags if desired + - Leave other options as default + - Click "Next" + +5. 
**Review and Deploy** + - Check "I acknowledge that AWS CloudFormation might create IAM resources" + - Click "Create Stack" + +#### 3. Alternative: Deploy via AWS CLI + +```bash +# Set your parameters +export AWS_REGION=us-west-2 +export PARTICIPANT_ROLE_ARN="arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/YourRoleName" + +# Deploy the stack +aws cloudformation create-stack \ + --stack-name platform-engineering-workshop \ + --template-body file://peeks-workshop-team-stack-self.json \ + --parameters ParameterKey=ParticipantAssumedRoleArn,ParameterValue=$PARTICIPANT_ROLE_ARN \ + --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \ + --region $AWS_REGION +``` + +#### 4. Access Your Environment (3 minutes) + +After deployment completes (~45 minutes): + +```bash +# Get IDE access information +IDE_URL=$(aws cloudformation describe-stacks \ + --stack-name platform-engineering-workshop \ + --query "Stacks[0].Outputs[?OutputKey=='IdeUrl'].OutputValue" \ + --output text) + +IDE_PASSWORD=$(aws cloudformation describe-stacks \ + --stack-name platform-engineering-workshop \ + --query "Stacks[0].Outputs[?OutputKey=='IdePassword'].OutputValue" \ + --output text) + +echo "IDE URL: $IDE_URL" +echo "IDE Password: $IDE_PASSWORD" +``` + +### Validation Steps + +From within the IDE environment: + +```bash +# Get platform service URLs and credentials +./scripts/6-tools-urls.sh + +# Verify platform services +kubectl get applications -n argocd + +# Check platform health +kubectl get pods --all-namespaces | grep -E "(argocd|gitlab|backstage|grafana)" + +# Test sample application deployment +kubectl get applications -n argocd -o wide +``` + +### Expected Outcomes + +✅ **Success Criteria**: +- IDE accessible via browser with provided credentials +- ArgoCD dashboard showing deployed applications +- GitLab accessible with pre-configured repositories +- Backstage developer portal with application templates +- Sample applications deployed and healthy +- 
Monitoring dashboards available in Grafana + +### Troubleshooting + +#### IDE Environment Issues + +```bash +# If platform services are not accessible from within the IDE +cd /workspace/appmod-blueprints +./scripts/0-install.sh + +# This reconfigures: +# - Environment variables for platform services +# - kubectl cluster access +# - Tool installations and dependencies +# - Platform service connectivity verification +``` + +#### Platform Services Not Ready + +```bash +# Check ArgoCD application sync status +kubectl get applications -n argocd -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' + +# Restart ArgoCD if needed +kubectl rollout restart deployment argocd-server -n argocd + +# Check GitLab pod status +kubectl get pods -n gitlab -l app=gitlab +``` + +## Scenario 2: CloudFormation IDE-Only + +**Best for**: Development environment testing, tool evaluation, minimal resource usage + +### What You Get + +- Browser-accessible VSCode IDE +- Pre-configured development tools +- Git and AWS CLI setup +- No EKS clusters or platform services (you can connect to existing ones) + +### Deployment Steps + +#### 1. Deploy IDE-Only Template + +```bash +# Download IDE-only template +curl -o ide-stack.yaml \ + https://raw.githubusercontent.com/aws-samples/java-on-aws/main/infrastructure/cfn/ide-stack.yaml + +# Create unique S3 bucket for deployment +CFN_S3=cfn-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]') +aws s3 mb s3://$CFN_S3 + +# Deploy IDE-only stack +aws cloudformation deploy \ + --stack-name ide-stack \ + --template-file ./ide-stack.yaml \ + --s3-bucket $CFN_S3 \ + --capabilities CAPABILITY_NAMED_IAM +``` + +#### 2. 
Configure Development Environment + +```bash +# Get IDE access details +IDE_URL=$(aws cloudformation describe-stacks --stack-name ide-stack --query "Stacks[0].Outputs[?OutputKey=='IdeUrl'].OutputValue" --output text) +IDE_PASSWORD=$(aws cloudformation describe-stacks --stack-name ide-stack --query "Stacks[0].Outputs[?OutputKey=='IdePassword'].OutputValue" --output text) + +echo "Access your IDE at: $IDE_URL" +echo "Password: $IDE_PASSWORD" +``` + +#### 3. Setup Platform Repository (from within IDE) + +```bash +# Configure workspace environment +cat << EOF > ~/environment/.envrc +export AWS_REGION=us-west-2 +export WORKSPACE_PATH="\$HOME/environment" +export AWS_ACCOUNT_ID=\$(aws sts get-caller-identity --output text --query Account) +export WORKSHOP_GIT_URL="https://github.com/aws-samples/appmod-blueprints.git" +export WORKSHOP_GIT_BRANCH="main" +EOF + +# If direnv is installed (recommended) +cd ~/environment && direnv allow + +# Or manually source the file +source ~/environment/.envrc + +# Verify environment variables +echo "AWS_REGION: $AWS_REGION" +echo "WORKSPACE_PATH: $WORKSPACE_PATH" + +# Clone the blueprints repository +git clone $WORKSHOP_GIT_URL $WORKSPACE_PATH/appmod-blueprints +cd $WORKSPACE_PATH/appmod-blueprints +git checkout $WORKSHOP_GIT_BRANCH +``` + +### Expected Outcomes + +✅ **Success Criteria**: +- IDE accessible via browser +- Development tools functional (Git, AWS CLI, kubectl) +- Platform repository cloned and accessible +- Ready to connect to existing EKS clusters + +## Scenario 3: Platform Adoption + +**Best for**: Organizations implementing platform engineering practices + +### Prerequisites + +- Existing EKS cluster or ability to create one +- Understanding of GitOps workflows +- Kubernetes cluster admin access +- Helm 3.x installed + +### Step 1: Prepare EKS Cluster + +#### Option A: Create New EKS Cluster + +```bash +# Set cluster configuration +export CLUSTER_NAME=platform-adoption-cluster +export AWS_REGION=us-west-2 + +# Create EKS 
cluster (using AWS CLI or Console) +# Option 1: Use AWS Console to create EKS cluster with Auto Mode +# Option 2: Use AWS CLI +aws eks create-cluster \ + --name $CLUSTER_NAME \ + --version 1.31 \ + --role-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/eks-service-role \ + --resources-vpc-config subnetIds=subnet-xxx,subnet-yyy \ + --compute-config nodePoolsToCreate=system +``` + +#### Option B: Use Existing EKS Cluster + +```bash +# Update kubeconfig for existing cluster +aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME + +# Verify cluster access +kubectl get nodes +``` + +### Step 2: Install Core Platform Components + +#### 1. Install ArgoCD + +```bash +# Create ArgoCD namespace +kubectl create namespace argocd + +# Install ArgoCD +kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml + +# Wait for ArgoCD to be ready +kubectl wait --for=condition=available --timeout=300s deployment/argocd-server -n argocd + +# Get ArgoCD admin password +ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d) +echo "ArgoCD admin password: $ARGOCD_PASSWORD" +``` + +#### 2. 
Configure ArgoCD Access + +```bash +# Port forward to access ArgoCD UI +kubectl port-forward svc/argocd-server -n argocd 8080:443 & + +# Or configure ingress for production access +kubectl apply -f - < platform-spec.yaml < security-policies.yaml < platform-security/network-policies/kustomization.yaml < platform-security/network-policies/default-deny.yaml < platform-security/network-policies/argocd-policies.yaml < platform-security/network-policies-app.yaml < platform-security/pod-security-standards/namespace-security.yaml < platform-security/pod-security-app.yaml < platform-security/aws-load-balancer-controller.yaml < load-balancer-controller-service-account.yaml < external-dns-policy.json < external-dns-service-account.yaml < argocd-applications-backup.yaml +kubectl get appprojects -n argocd -o yaml > argocd-projects-backup.yaml + +# Backup Backstage configuration +kubectl get configmaps -n backstage -o yaml > backstage-config-backup.yaml +kubectl get secrets -n backstage -o yaml > backstage-secrets-backup.yaml + +# Create EBS snapshots for persistent volumes +aws ec2 describe-volumes --filters "Name=tag:kubernetes.io/cluster/prod-platform-hub,Values=owned" \ + --query 'Volumes[].VolumeId' --output text | \ + xargs -I {} aws ec2 create-snapshot --volume-id {} --description "Platform backup $(date +%Y-%m-%d)" +``` + +### Monitoring and Alerting + +```bash +# Configure critical alerts +kubectl apply -f - < 0 + for: 5m + labels: + severity: warning + annotations: + summary: "High pod restart rate detected" +EOF +``` + +### Maintenance Procedures + +```bash +# Update platform components +helm upgrade argocd argo/argo-cd --namespace argocd --reuse-values +helm upgrade backstage backstage/backstage --namespace backstage --reuse-values +helm upgrade monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --reuse-values + +# Update EKS cluster +aws eks update-cluster-version --name prod-platform-hub --version 1.32 + +# Note: EKS Auto Mode handles node 
updates automatically +``` + +## Cleanup and Maintenance + +### Complete Cleanup + +#### CloudFormation Deployments + +```bash +# For CloudFormation workshop deployments +aws cloudformation delete-stack --stack-name platform-engineering-workshop + +# For IDE-only deployments +aws cloudformation delete-stack --stack-name ide-stack + +# Clean up S3 bucket used for deployment (if created) +aws s3 rb s3://$CFN_S3 --force + +# Comprehensive cleanup of any remaining platform resources +# This handles resources that may not be cleaned up by CloudFormation deletion +task taskcat-clean-deployment-force --region us-west-2 --prefix peeks +``` + +#### Platform Adoption Deployments + +```bash +# Remove ArgoCD applications (in reverse order of dependencies) +kubectl delete application test-app -n argocd +kubectl delete application monitoring -n argocd +kubectl delete application backstage -n argocd +kubectl delete application platform-security -n argocd + +# Remove ArgoCD itself +helm uninstall argocd -n argocd +kubectl delete namespace argocd + +# Remove other platform components +helm uninstall monitoring -n monitoring +kubectl delete namespace monitoring + +# Clean up persistent volumes +kubectl get pv | grep -E "(argocd|backstage|monitoring)" | awk '{print $1}' | xargs kubectl delete pv + +# Remove custom resources +kubectl delete crd -l app.kubernetes.io/part-of=argocd +``` + +#### Custom Implementation Cleanup + +```bash +# Remove all ArgoCD applications +kubectl get applications -n argocd -o name | xargs kubectl delete + +# Remove platform applications in dependency order +helm uninstall backstage -n backstage +helm uninstall monitoring -n monitoring +helm uninstall crossplane -n crossplane-system +helm uninstall aws-load-balancer-controller -n kube-system +helm uninstall external-dns -n kube-system +helm uninstall argocd -n argocd + +# Clean up namespaces +kubectl delete namespace backstage monitoring crossplane-system argocd + +# Remove EKS cluster (if created for this 
platform) +aws eks delete-cluster --name prod-platform-hub + +# Clean up any remaining AWS resources +# Use the comprehensive cleanup tool from platform-engineering-on-eks +curl -o cleanup.sh https://raw.githubusercontent.com/aws-samples/platform-engineering-on-eks/main/taskcat/scripts/enhanced-cleanup/enhanced-cleanup.sh +chmod +x cleanup.sh +./cleanup.sh --force --yes --region us-west-2 --prefix platform +``` + +### Maintenance Procedures + +#### Regular Maintenance Tasks + +```bash +# Update platform components +helm repo update + +# Update ArgoCD +helm upgrade argocd argo/argo-cd --namespace argocd --reuse-values + +# Update monitoring stack +helm upgrade monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --reuse-values + +# Update Backstage +helm upgrade backstage backstage/backstage --namespace backstage --reuse-values + +# Update AWS Load Balancer Controller +helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --reuse-values +``` + +#### Backup Procedures + +```bash +# Backup ArgoCD configuration +kubectl get applications -n argocd -o yaml > argocd-applications-backup-$(date +%Y%m%d).yaml +kubectl get appprojects -n argocd -o yaml > argocd-projects-backup-$(date +%Y%m%d).yaml + +# Backup Backstage configuration +kubectl get configmaps -n backstage -o yaml > backstage-config-backup-$(date +%Y%m%d).yaml +kubectl get secrets -n backstage -o yaml > backstage-secrets-backup-$(date +%Y%m%d).yaml + +# Backup monitoring configuration +kubectl get prometheusrules -n monitoring -o yaml > monitoring-rules-backup-$(date +%Y%m%d).yaml +kubectl get servicemonitors --all-namespaces -o yaml > service-monitors-backup-$(date +%Y%m%d).yaml + +# Create EBS snapshots for persistent volumes +aws ec2 describe-volumes --filters "Name=tag:kubernetes.io/cluster/prod-platform-hub,Values=owned" \ + --query 'Volumes[].VolumeId' --output text | \ + xargs -I {} aws ec2 create-snapshot --volume-id {} --description "Platform 
backup $(date +%Y-%m-%d)" +``` + +#### Disaster Recovery + +```bash +# Restore ArgoCD applications from backup +kubectl apply -f argocd-applications-backup-YYYYMMDD.yaml +kubectl apply -f argocd-projects-backup-YYYYMMDD.yaml + +# Restore Backstage configuration +kubectl apply -f backstage-config-backup-YYYYMMDD.yaml +kubectl apply -f backstage-secrets-backup-YYYYMMDD.yaml + +# Restore monitoring configuration +kubectl apply -f monitoring-rules-backup-YYYYMMDD.yaml +kubectl apply -f service-monitors-backup-YYYYMMDD.yaml + +# Verify all applications are syncing +kubectl get applications -n argocd -o wide +``` + +## Cost Optimization + +### Resource Right-Sizing + +```bash +# Analyze resource usage +kubectl top nodes +kubectl top pods --all-namespaces + +# Implement resource quotas +kubectl apply -f - <> ~/.bashrc # for bash +echo 'eval "$(direnv hook zsh)"' >> ~/.zshrc # for zsh +# Then restart your shell or source the profile +``` + +#### ArgoCD Sync Failures + +```bash +# Check application status +kubectl describe application -n argocd + +# View sync logs +kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server + +# Force refresh and sync +argocd app refresh +argocd app sync +``` + +#### Backstage Connection Issues + +```bash +# Check Backstage logs +kubectl logs -n backstage -l app.kubernetes.io/name=backstage + +# Verify database connectivity +kubectl exec -n backstage -it deployment/backstage -- nc -zv postgresql 5432 + +# Check configuration +kubectl get configmap backstage-config -n backstage -o yaml +``` + +#### Resource Constraints + +```bash +# Check node resources +kubectl describe nodes + +# View resource usage +kubectl top nodes +kubectl top pods --all-namespaces --sort-by=memory + +# Check for evicted pods +kubectl get pods --all-namespaces --field-selector=status.phase=Failed +``` + +### Performance Optimization + +```bash +# Optimize ArgoCD performance +kubectl patch configmap argocd-cmd-params-cm -n argocd --patch 
'{"data":{"controller.repo.server.timeout.seconds":"300","controller.self.heal.timeout.seconds":"30"}}' + +# Configure resource limits +kubectl patch deployment argocd-server -n argocd --patch '{"spec":{"template":{"spec":{"containers":[{"name":"argocd-server","resources":{"limits":{"cpu":"500m","memory":"1Gi"},"requests":{"cpu":"250m","memory":"512Mi"}}}]}}}}' +``` + +## Next Steps + +After successful platform deployment: + +1. **Team Onboarding**: Train development teams on platform capabilities and workflows +2. **Application Migration**: Begin migrating existing applications to the platform +3. **Custom Templates**: Create organization-specific Backstage templates +4. **Security Integration**: Integrate with organizational identity providers and security tools +5. **Compliance Validation**: Ensure platform meets regulatory requirements +6. **Continuous Improvement**: Establish feedback loops and platform evolution processes + +For detailed application development patterns and platform usage, explore the application blueprints in the `applications/` directory. \ No newline at end of file diff --git a/GETTING-STARTED.md b/GETTING-STARTED.md index bc8a8f1b8..902209b76 100644 --- a/GETTING-STARTED.md +++ b/GETTING-STARTED.md @@ -14,31 +14,35 @@ The blueprints include: ## Why Use These Blueprints? 
-### For Developers +### 👩‍💻 For Developers + - **Faster Time-to-Production**: Pre-configured templates and automated workflows - **Consistent Patterns**: Standardized approaches across all applications - **Self-Service Capabilities**: Deploy and manage applications independently - **Built-in Best Practices**: Security, monitoring, and scalability included by default -### For Platform Teams +### 🏢 For Platform Teams + - **Reference Implementation**: Production-ready platform engineering patterns - **Extensible Architecture**: Customize and extend for organizational needs - **Operational Excellence**: Integrated monitoring, logging, and alerting - **Developer Productivity**: Reduce cognitive load and improve developer experience -### For Organizations +### ⚙️ For Organizations + - **Accelerated Modernization**: Proven patterns for application transformation - **Reduced Risk**: Battle-tested configurations and security practices - **Improved Governance**: Consistent policies and compliance across applications - **Cost Optimization**: Efficient resource utilization and automated scaling -## 40-Minute Platform Evaluation +## Platform Evaluation Quick Start -This quick start helps you evaluate the platform capabilities and understand the developer experience in under 40 minutes. +This quick start helps you evaluate the platform capabilities and understand the developer experience. Total time depends on your starting point - 30 minutes if infrastructure is already deployed, or 45+ minutes if deploying from scratch. 
### Prerequisites (5 minutes) #### If Using Existing Infrastructure + If you have the platform infrastructure already deployed (via CloudFormation template, manually deployed, or at an AWS event): - Access to the deployed VSCode IDE environment @@ -46,21 +50,25 @@ If you have the platform infrastructure already deployed (via CloudFormation tem - Basic familiarity with Kubernetes and GitOps concepts #### If Starting Fresh + You'll need to deploy the infrastructure first using one of these options: **Option A: CloudFormation Template (Recommended for Public Users)** + - **What**: Pre-generated CloudFormation template for complete workshop setup - **When**: First-time evaluation or workshop participation - **Requirements**: AWS CLI, basic AWS knowledge - **Time**: ~45 minutes for complete deployment **Option B: IDE-Only CloudFormation Template** + - **What**: Lightweight template that deploys only VSCode IDE environment - **When**: You have existing platform services or want to explore platform concepts -- **Requirements**: AWS CLI, basic AWS knowledge -- **Time**: ~15 minutes for IDE deployment +- **Requirements**: AWS CLI, basic AWS knowledge +- **Time**: ~10 minutes for IDE deployment **Option C: Manual Platform Setup** + - **What**: Step-by-step manual deployment using provided guides - **When**: Custom requirements or production-like deployment - **Requirements**: Advanced AWS/Kubernetes knowledge @@ -74,6 +82,7 @@ You'll need to deploy the infrastructure first using one of these options: - Available from: [GitHub Releases](https://github.com/aws-samples/appmod-blueprints/releases) or AWS Workshop Studio 2. **Deploy via AWS Console**: + ```bash # Option 1: AWS Console # 1. Open CloudFormation in AWS Console @@ -100,6 +109,7 @@ You'll need to deploy the infrastructure first using one of these options: - **IDE Password**: Auto-generated access credentials 5. 
**Access Platform Services**: Once in the IDE environment: + ```bash # Get all platform service URLs and credentials ./scripts/6-tools-urls.sh @@ -213,24 +223,28 @@ Choose the approach that best fits your evaluation or adoption needs: | **⚙️ Custom Implementation** | 1-2 weeks | High | Production deployment | Advanced platform engineering | ### CloudFormation Workshop (Recommended for First-Time Users) + - **What**: Deploy complete platform using pre-generated CloudFormation template - **When**: First-time evaluation, workshops, or comprehensive platform assessment - **Includes**: Full platform stack, VSCode IDE, sample applications, GitOps workflows - **Next Steps**: Explore platform capabilities, plan organizational adoption ### Platform Adoption + - **What**: Implement the platform for organizational use - **When**: Ready to adopt platform engineering practices - **Includes**: Full platform deployment, team training, application migration - **Next Steps**: Customize platform components, onboard development teams ### Developer Onboarding + - **What**: Learn platform workflows and self-service capabilities - **When**: Onboarding developers to existing platform - **Includes**: Application templates, GitOps workflows, monitoring practices - **Next Steps**: Deploy production applications, contribute to platform evolution ### Custom Implementation + - **What**: Adapt platform for specific organizational requirements - **When**: Production deployment with custom needs - **Includes**: Platform customization, security integration, operational procedures @@ -241,31 +255,37 @@ Choose the approach that best fits your evaluation or adoption needs: ### Supported Technologies #### .NET Applications + - **Northwind Sample**: Clean architecture demonstration with Entity Framework - **Microservices**: Service-to-service communication patterns - **API Gateway**: Centralized API management and routing -#### Java Applications +#### Java Applications + - **Spring Boot**: 
Microservices with Spring Cloud patterns - **Observability**: Integrated tracing, metrics, and logging - **Data Access**: JPA patterns with PostgreSQL integration #### Node.js Applications + - **Express APIs**: RESTful service patterns with modern tooling - **Event-Driven**: Message queue integration with SQS/SNS - **Frontend Integration**: React/Vue.js deployment patterns #### Python Applications + - **FastAPI Services**: High-performance async API patterns - **Data Processing**: ETL pipelines with AWS services - **Machine Learning**: MLOps patterns for model deployment #### Rust Applications + - **High-Performance Services**: Memory-safe system programming - **WebAssembly**: Browser and edge deployment patterns - **Async Patterns**: Tokio-based concurrent applications #### Go Applications + - **Cloud-Native Services**: Kubernetes-native application patterns - **gRPC Services**: High-performance service communication - **CLI Tools**: Platform tooling and automation utilities @@ -273,18 +293,21 @@ Choose the approach that best fits your evaluation or adoption needs: ### Architecture Patterns #### Microservices Architecture + - Service mesh integration with Istio - Inter-service communication patterns - Distributed tracing and observability - Circuit breaker and retry patterns #### Event-Driven Architecture + - Message queue integration (SQS, SNS, EventBridge) - Event sourcing and CQRS patterns - Saga pattern for distributed transactions - Dead letter queue handling #### Serverless Integration + - Lambda function deployment patterns - API Gateway integration - Event-driven serverless workflows @@ -295,18 +318,21 @@ Choose the approach that best fits your evaluation or adoption needs: After completing the evaluation, choose your path forward: ### 🎯 Platform Adopters + 1. **Architecture Review**: Study [ARCHITECTURE.md](ARCHITECTURE.md) for platform design details 2. **Deployment Planning**: Review [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) for production setup 3. 
**Team Preparation**: Plan developer onboarding and training programs 4. **Customization**: Identify platform modifications for your organization ### 👩‍💻 Developers + 1. **Template Exploration**: Try different application blueprints 2. **Workflow Mastery**: Practice GitOps deployment patterns 3. **Platform Services**: Learn to use Backstage, monitoring, and security tools 4. **Contribution**: Add new templates or improve existing patterns ### ⚙️ Platform Engineers + 1. **Component Deep Dive**: Understand Crossplane compositions and platform APIs 2. **Customization**: Extend platform capabilities for organizational needs 3. **Operations**: Set up monitoring, alerting, and maintenance procedures @@ -316,7 +342,8 @@ After completing the evaluation, choose your path forward: ### IDE Configuration Issues -**Platform Services Not Available or Environment Variables Missing** +#### Platform Services Not Available or Environment Variables Missing + ```bash # Re-run the configuration entrypoint script ./scripts/0-install.sh @@ -328,7 +355,8 @@ After completing the evaluation, choose your path forward: # - Verify platform service connectivity ``` -**IDE Environment Corrupted or Incomplete Setup** +#### IDE Environment Corrupted or Incomplete Setup + ```bash # From within the VSCode IDE terminal, re-run setup cd /workspace/appmod-blueprints @@ -342,7 +370,8 @@ echo $BACKSTAGE_URL ### Platform Access Problems -**Cannot Access Backstage/ArgoCD** +#### Cannot Access Backstage/ArgoCD + ```bash # Check service status kubectl get services -n backstage @@ -356,7 +385,8 @@ kubectl get pods -n backstage kubectl get pods -n argocd ``` -**GitLab Authentication Issues** +#### GitLab Authentication Issues + ```bash # Verify GitLab service kubectl get services -n gitlab @@ -370,7 +400,8 @@ curl -k $GITLAB_URL/api/v4/version ### Application Deployment Issues -**ArgoCD Sync Failures** +#### ArgoCD Sync Failures + ```bash # Check ArgoCD application status kubectl get applications
-n argocd @@ -382,7 +413,8 @@ kubectl describe application <app-name> -n argocd kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server ``` -**Application Pod Failures** +#### Application Pod Failures + ```bash # Check pod status kubectl get pods -n <namespace> diff --git a/TODO.md b/TODO.md index af605a5d1..dbf1f24c3 100644 --- a/TODO.md +++ b/TODO.md @@ -92,4 +92,11 @@ task taskcat-clean-deployment - [x] cloudwatch logs groups - like /aws/codebuild/PEEKSGITIAMStackDeployProje-30HWWmiBcgx0 or /aws/lambda/tCaT-peeks-workshop-test--IDEPEEKSIdePasswordExpor-OOmtoIML4oGw or tCaT-peeks-workshop-test-fleet-workshop-test-92b4118d493d47dcb827190d4e5ac6b9-IDEPEEKSIdeLogGroup3808F7B1-xMtRAbEgY8Ho -- Does WORKSHOP_ID=28c283c1-1d60-43fa-a604-4e983e0e8038 is the goor one ? \ No newline at end of file +- Is WORKSHOP_ID=28c283c1-1d60-43fa-a604-4e983e0e8038 the good one? +- update region in backstage templates + + + +- #### Production Security Hardening - the hardening should be done in a GitOps manner, not using kubectl +- same for installing network policy, load balancer controller, all should be done with GitOps, and maybe gitops-bridge if we need dependencies on resources deployed with Terraform +- also add the cleanup after destroy with task taskcat-clean-deployment-force, which can help remove any remaining AWS resources deployed by the platform diff --git a/TROUBLESHOOTING.md b/TROUBLESHOOTING.md new file mode 100644 index 000000000..4b59c1ec9 --- /dev/null +++ b/TROUBLESHOOTING.md @@ -0,0 +1,761 @@ +# Troubleshooting Guide - Application Modernization Blueprints + +## Overview + +This guide provides solutions to common issues encountered when using the Application Modernization Blueprints platform. Issues are organized by symptoms to help you quickly identify and resolve problems during platform operations and application development.
+ +## Quick Diagnostic Commands + +Before diving into specific issues, run these commands to gather basic information: + +```bash +# Check platform service status +kubectl get pods -A | grep -E "(argocd|backstage|gitlab|grafana)" | grep -v Running + +# Check ArgoCD applications +kubectl get applications -n argocd -o custom-columns="NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status" + +# Check cluster connectivity +kubectl get nodes +kubectl cluster-info + +# Check recent events +kubectl get events --sort-by='.lastTimestamp' | tail -10 +``` + +## Platform Access Issues + +### Cannot Access Platform Services + +**Symptoms:** +- Backstage, ArgoCD, or GitLab URLs return connection errors +- Services show as running but are not accessible +- Authentication failures across platform services + +**Diagnostic Commands:** +```bash +# Check service status +kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server +kubectl get pods -n backstage -l app.kubernetes.io/name=backstage +kubectl get pods -n gitlab -l app=gitlab-webservice-default + +# Check ingress configuration +kubectl get ingress -A +kubectl get services -A --field-selector spec.type=LoadBalancer + +# Check platform URLs script +./scripts/6-tools-urls.sh +``` + +**Common Causes & Solutions:** + +1. **CloudFront Distribution Issues** + ```bash + # Check CloudFront distribution status + aws cloudfront list-distributions --query 'DistributionList.Items[?contains(Origins.Items[0].Id, `http-origin`)]' + + # Get current domain name + DOMAIN_NAME=$(kubectl get secret ${RESOURCE_PREFIX}-hub-cluster -n argocd -o jsonpath='{.metadata.annotations.ingress_domain_name}' 2>/dev/null) + echo "Platform domain: $DOMAIN_NAME" + ``` + +2. 
**Service Pod Issues** + ```bash + # Restart problematic services + kubectl rollout restart deployment argocd-server -n argocd + kubectl rollout restart deployment backstage -n backstage + kubectl rollout restart deployment gitlab-webservice-default -n gitlab + + # Check pod logs for errors + kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server --tail=50 + kubectl logs -n backstage -l app.kubernetes.io/name=backstage --tail=50 + ``` + +3. **Authentication Service Issues** + ```bash + # Check Keycloak status + kubectl get pods -n keycloak + kubectl logs -n keycloak -l app.kubernetes.io/name=keycloak --tail=50 + + # Restart Keycloak if needed + kubectl rollout restart deployment keycloak -n keycloak + ``` + +### IDE Environment Not Working + +**Symptoms:** +- Cannot access development environment +- Environment variables not set correctly +- Tools not available in IDE + +**Diagnostic Commands:** +```bash +# Check environment variables +env | grep -E "(AWS_|CLUSTER_|RESOURCE_|WORKSPACE_)" + +# Check if bootstrap script ran +ls -la /workspace/appmod-blueprints/scripts/ +cat /workspace/appmod-blueprints/.bootstrap-complete 2>/dev/null || echo "Bootstrap not completed" + +# Check tool availability +kubectl version --client +aws --version +argocd version --client 2>/dev/null || echo "ArgoCD CLI not available" +``` + +**Solutions:** +```bash +# Re-run bootstrap script +cd /workspace/appmod-blueprints +./scripts/0-install.sh + +# Manually set environment variables if needed +source /etc/profile.d/workshop.sh + +# Source bashrc configurations +if [ -d /home/ec2-user/.bashrc.d ]; then + for file in /home/ec2-user/.bashrc.d/*.sh; do + [ -f "$file" ] && source "$file" + done +fi + +# Verify cluster access +kubectl get nodes +``` + +## GitOps and ArgoCD Issues + +### ArgoCD Applications Not Syncing + +**Symptoms:** +- Applications stuck in "OutOfSync" state +- Sync operations fail or timeout +- Applications show "Progressing" for extended periods + +**Diagnostic 
Commands:** +```bash +# Check application status +kubectl get applications -n argocd -o wide + +# Check specific application details +kubectl describe application <app-name> -n argocd + +# Check ArgoCD server logs +kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server --tail=100 + +# Check repository server logs +kubectl logs -n argocd -l app.kubernetes.io/name=argocd-repo-server --tail=100 +``` + +**Solutions:** + +1. **Force Application Sync** + ```bash + # Using ArgoCD CLI (if available) + argocd app sync <app-name> --force + + # Using kubectl + kubectl patch application <app-name> -n argocd --type merge -p '{"operation":{"sync":{"revision":"HEAD"}}}' + + # Terminate stuck operations + kubectl patch application <app-name> -n argocd --type merge -p '{"operation":null}' + ``` + +2. **Repository Access Issues** + ```bash + # Check repository secrets + kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=repository + + # Test repository connectivity + kubectl exec -n argocd deployment/argocd-repo-server -- git ls-remote https://github.com/aws-samples/appmod-blueprints.git + + # Refresh repository cache + kubectl delete pods -n argocd -l app.kubernetes.io/name=argocd-repo-server + ``` + +3.
**Resource Conflicts** + ```bash + # Check for resource conflicts + kubectl get events -n <namespace> --sort-by='.lastTimestamp' + + # Check resource quotas + kubectl describe resourcequotas -n <namespace> + + # Check for stuck resources + kubectl get all -n <namespace> | grep -E "(Terminating|Pending)" + ``` + +### Cluster Registration Issues + +**Symptoms:** +- Spoke clusters not visible in ArgoCD +- Cross-cluster deployments failing +- Cluster secrets missing + +**Diagnostic Commands:** +```bash +# Check cluster secrets in ArgoCD +kubectl get secrets -n argocd | grep cluster- + +# Check cluster registration script logs +kubectl logs -n argocd job/cluster-registration 2>/dev/null || echo "No cluster registration job found" + +# Verify spoke cluster access +kubectl config get-contexts +``` + +**Solutions:** +```bash +# Re-register spoke clusters +./scripts/3-register-terraform-spoke-clusters.sh dev +./scripts/3-register-terraform-spoke-clusters.sh prod + +# Manually add cluster if script fails +argocd cluster add <context-name> --name <cluster-name> + +# Check cluster connectivity +argocd cluster list +``` + +## Application Development Issues + +### Backstage Templates Not Working + +**Symptoms:** +- Cannot create new applications from templates +- Template scaffolding fails +- Generated repositories are empty or malformed + +**Diagnostic Commands:** +```bash +# Check Backstage pod status +kubectl get pods -n backstage -l app.kubernetes.io/name=backstage + +# Check Backstage logs +kubectl logs -n backstage -l app.kubernetes.io/name=backstage --tail=100 + +# Check template configuration +kubectl get configmap backstage-app-config -n backstage -o yaml | grep -A 20 "catalog:" +``` + +**Solutions:** +```bash +# Restart Backstage +kubectl rollout restart deployment backstage -n backstage + +# Check template repository access +kubectl exec -n backstage deployment/backstage -- curl -s https://github.com/aws-samples/appmod-blueprints/tree/main/backstage/examples/template + +# Verify GitLab integration +kubectl get secrets -n backstage |
grep gitlab +kubectl logs -n backstage -l app.kubernetes.io/name=backstage | grep -i gitlab +``` + +### Application Deployment Failures + +**Symptoms:** +- Applications created in Backstage don't deploy +- GitOps workflows not triggering +- Applications stuck in initial state + +**Diagnostic Commands:** +```bash +# Check if application was created in ArgoCD +kubectl get applications -n argocd | grep <app-name> + +# Check GitLab repository creation +# Access GitLab UI and verify repository exists + +# Check ArgoCD application events +kubectl describe application <app-name> -n argocd +``` + +**Solutions:** +```bash +# Manually create ArgoCD application if missing +kubectl apply -f - <<EOF +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: <app-name> + namespace: argocd +spec: + project: default + source: + repoURL: <repository-url> + targetRevision: HEAD + path: deployment + destination: + server: https://kubernetes.default.svc + namespace: <namespace> + syncPolicy: + automated: + prune: true + selfHeal: true + syncOptions: + - CreateNamespace=true +EOF + +# Check GitLab webhook configuration +# Verify webhook is configured to trigger ArgoCD sync +``` + +### Build and CI/CD Issues + +**Symptoms:** +- GitLab CI/CD pipelines failing +- Container builds not completing +- Image push failures + +**Diagnostic Commands:** +```bash +# Check GitLab runner status +kubectl get pods -n gitlab-runner 2>/dev/null || echo "GitLab runner not found" + +# Check GitLab CI/CD logs in GitLab UI +# Navigate to Project > CI/CD > Pipelines + +# Check container registry access +kubectl get secrets -n <namespace> | grep regcred +``` + +**Solutions:** +```bash +# Restart GitLab runners +kubectl rollout restart deployment gitlab-runner -n gitlab-runner 2>/dev/null || echo "No GitLab runner deployment found" + +# Check GitLab configuration +kubectl get configmap gitlab-gitlab -n gitlab -o yaml | grep -A 10 "registry:" + +# Verify container registry credentials +kubectl create secret docker-registry regcred \ + --docker-server=<registry-url> \ + --docker-username=<username> \ + --docker-password=<password> \ + --docker-email=<email> +``` + +## Infrastructure and
Scaling Issues + +### Auto Mode Scaling Problems + +**Symptoms:** +- Pods stuck in Pending state +- Insufficient resources despite Auto Mode +- Nodes not scaling up + +**Diagnostic Commands:** +```bash +# Check Auto Mode configuration +aws eks describe-cluster --name <cluster-name> --query 'cluster.computeConfig' + +# Check pod resource requests +kubectl describe pod <pod-name> + +# Check node capacity +kubectl describe nodes | grep -A 5 "Capacity:" + +# Check cluster autoscaler logs (if applicable) +kubectl logs -n kube-system -l app=cluster-autoscaler 2>/dev/null || echo "Cluster autoscaler not found" +``` + +**Solutions:** +```bash +# Verify Auto Mode is enabled +kubectl get nodes -o yaml | grep -E "(compute-type|nodegroup)" + +# Check if pods have appropriate resource requests +kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}}}' + +# Force pod rescheduling +kubectl delete pod <pod-name> + +# Check for resource quotas limiting scaling +kubectl describe resourcequotas -A +``` + +### Storage Issues + +**Symptoms:** +- Persistent volumes not mounting +- Storage class issues +- Database connectivity problems + +**Diagnostic Commands:** +```bash +# Check storage classes +kubectl get storageclass + +# Check persistent volumes +kubectl get pv,pvc -A + +# Check EBS CSI driver +kubectl get pods -n kube-system -l app=ebs-csi-controller +``` + +**Solutions:** +```bash +# Restart EBS CSI driver +kubectl rollout restart deployment ebs-csi-controller -n kube-system + +# Check PVC events +kubectl describe pvc <pvc-name> -n <namespace> + +# Verify storage class configuration +kubectl describe storageclass gp3 +``` + +## Monitoring and Observability Issues + +### Grafana Not Showing Data + +**Symptoms:** +- Grafana dashboards empty +- No metrics data available +- Prometheus not scraping targets + +**Diagnostic Commands:** +```bash +# Check Grafana pod status +kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana + +# Check Prometheus
targets +kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 & +# Visit http://localhost:9090/targets + +# Check Prometheus configuration +kubectl get configmap -n monitoring | grep prometheus +``` + +**Solutions:** +```bash +# Restart monitoring stack +kubectl rollout restart deployment grafana -n monitoring +kubectl rollout restart statefulset prometheus-prometheus-kube-prometheus-prometheus -n monitoring + +# Check service monitors +kubectl get servicemonitor -A + +# Verify metrics endpoints +kubectl get endpoints -n monitoring +``` + +### Log Collection Issues + +**Symptoms:** +- Logs not appearing in centralized logging +- Fluent Bit or log collectors not working +- High log volume causing issues + +**Diagnostic Commands:** +```bash +# Check log collection pods +kubectl get pods -n amazon-cloudwatch 2>/dev/null || echo "CloudWatch logging not configured" +kubectl get pods -n logging 2>/dev/null || echo "Logging namespace not found" + +# Check log collector configuration +kubectl get configmap -n amazon-cloudwatch | grep fluent +``` + +**Solutions:** +```bash +# Restart log collectors +kubectl rollout restart daemonset fluent-bit -n amazon-cloudwatch 2>/dev/null + +# Check log collector logs +kubectl logs -n amazon-cloudwatch -l k8s-app=fluent-bit --tail=50 + +# Adjust log levels if needed +kubectl patch configmap fluent-bit-config -n amazon-cloudwatch --patch '{"data":{"fluent-bit.conf":"[SERVICE]\n Log_Level info"}}' +``` + +## Security and Access Issues + +### RBAC and Permission Problems + +**Symptoms:** +- Users cannot access certain resources +- Service accounts lack necessary permissions +- Pod security policy violations + +**Diagnostic Commands:** +```bash +# Check current user permissions +kubectl auth can-i --list + +# Check service account permissions +kubectl describe serviceaccount <service-account> -n <namespace> + +# Check role bindings +kubectl get rolebindings,clusterrolebindings -A | grep <user-or-service-account> + +# Check pod security policies +kubectl get psp 2>/dev/null ||
echo "Pod Security Policies not configured" +``` + +**Solutions:** +```bash +# Create necessary role binding +kubectl create rolebinding <binding-name> \ + --clusterrole=<role-name> \ + --user=<user-name> \ + --namespace=<namespace> + +# Check pod security standards +kubectl get namespaces -o yaml | grep -A 3 "pod-security" + +# Update service account permissions +kubectl patch serviceaccount <service-account> -n <namespace> \ + -p '{"metadata":{"annotations":{"eks.amazonaws.com/role-arn":"<role-arn>"}}}' +``` + +### Network Policy Issues + +**Symptoms:** +- Services cannot communicate +- Network connectivity blocked +- DNS resolution failures + +**Diagnostic Commands:** +```bash +# Check network policies +kubectl get networkpolicies -A + +# Test connectivity between pods +kubectl run -it --rm debug --image=busybox --restart=Never -- wget -qO- http://<service-name>.<namespace>.svc.cluster.local + +# Check DNS resolution +kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup <service-name>.<namespace>.svc.cluster.local +``` + +**Solutions:** +```bash +# Temporarily disable network policies for testing +kubectl delete networkpolicy --all -n <namespace> + +# Create allow-all network policy for debugging +kubectl apply -f - <<EOF +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: <policy-name> + namespace: <namespace> +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress + ingress: + - {} + egress: + - {} +EOF + +# Check CoreDNS configuration +kubectl get configmap coredns -n kube-system -o yaml +``` + +## Performance Optimization + +### High Resource Usage + +**Symptoms:** +- Platform services consuming excessive CPU/memory +- Slow response times +- Frequent pod restarts due to resource limits + +**Diagnostic Commands:** +```bash +# Check resource usage +kubectl top pods -A --sort-by=cpu +kubectl top pods -A --sort-by=memory + +# Check resource limits and requests +kubectl describe pods -A | grep -A 5 -B 5 "Limits:\|Requests:" + +# Check for memory leaks +kubectl logs <pod-name> -n <namespace> | grep -i "out of memory\|oom" +``` + +**Solutions:** +```bash +# Adjust resource limits +kubectl patch deployment <deployment-name> -n <namespace> -p
'{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"limits":{"cpu":"1000m","memory":"2Gi"},"requests":{"cpu":"500m","memory":"1Gi"}}}]}}}}' + +# Enable horizontal pod autoscaling +kubectl autoscale deployment <deployment-name> --cpu-percent=70 --min=2 --max=10 + +# Check for resource quotas +kubectl describe resourcequotas -A +``` + +### Database Performance Issues + +**Symptoms:** +- Slow database queries +- Connection timeouts +- Database pods restarting frequently + +**Diagnostic Commands:** +```bash +# Check database pod status +kubectl get pods -n <namespace> | grep -E "(postgres|mysql|redis)" + +# Check database logs +kubectl logs <database-pod> -n <namespace> --tail=100 + +# Check database connections +kubectl exec -it <database-pod> -n <namespace> -- psql -U <username> -c "SELECT count(*) FROM pg_stat_activity;" +``` + +**Solutions:** +```bash +# Increase database resources +kubectl patch statefulset <statefulset-name> -n <namespace> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"limits":{"cpu":"2000m","memory":"4Gi"}}}]}}}}' + +# Check database configuration +kubectl get configmap <database-config> -n <namespace> -o yaml + +# Optimize database settings +kubectl patch configmap <database-config> -n <namespace> --patch '{"data":{"postgresql.conf":"max_connections = 200\nshared_buffers = 256MB"}}' +``` + +## Recovery and Maintenance + +### Platform Recovery After Failure + +**Symptoms:** +- Multiple services down +- Cluster in degraded state +- Data corruption or loss + +**Recovery Steps:** +```bash +# 1. Assess current state +kubectl get pods -A | grep -v Running +kubectl get nodes +kubectl get applications -n argocd + +# 2. Restart core services in order +kubectl rollout restart deployment coredns -n kube-system +kubectl rollout restart deployment argocd-server -n argocd +kubectl rollout restart deployment argocd-application-controller -n argocd + +# 3. Force sync critical applications +kubectl patch application bootstrap -n argocd --type merge -p '{"operation":{"sync":{"revision":"HEAD"}}}' + +# 4.
Check application health +kubectl get applications -n argocd -o custom-columns="NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status" + +# 5. Re-run bootstrap if needed +cd /workspace/appmod-blueprints +./scripts/0-install.sh +``` + +### Backup and Restore + +**Symptoms:** +- Need to backup platform configuration +- Restore from previous state +- Migrate to new cluster + +**Backup Commands:** +```bash +# Backup ArgoCD applications +kubectl get applications -n argocd -o yaml > argocd-applications-backup.yaml + +# Backup secrets +kubectl get secrets -A -o yaml > secrets-backup.yaml + +# Backup configmaps +kubectl get configmaps -A -o yaml > configmaps-backup.yaml + +# Backup persistent volume claims +kubectl get pvc -A -o yaml > pvc-backup.yaml +``` + +**Restore Commands:** +```bash +# Restore ArgoCD applications +kubectl apply -f argocd-applications-backup.yaml + +# Restore secrets (be careful with sensitive data) +kubectl apply -f secrets-backup.yaml + +# Restore configmaps +kubectl apply -f configmaps-backup.yaml +``` + +## Getting Additional Help + +### Escalation Paths + +1. **Platform Team Support** + - Check internal documentation and runbooks + - Contact platform engineering team + - Review platform architecture documentation + +2. **Community Resources** + - [ArgoCD Community](https://github.com/argoproj/argo-cd/discussions) + - [Backstage Community](https://github.com/backstage/backstage/discussions) + - [Kubernetes Slack](https://kubernetes.slack.com/) + +3. 
**AWS Support** + - For EKS-related issues, create an AWS support case + - Include cluster name, region, and error messages + - Check [AWS EKS Best Practices](https://aws.github.io/aws-eks-best-practices/) + +### Collecting Diagnostic Information + +Before seeking help, collect this information: + +```bash +# Create diagnostic bundle (capture the timestamp once so every step uses the same directory) +STAMP=$(date +%Y%m%d-%H%M%S) +mkdir -p "platform-diagnostics/$STAMP" && cd "platform-diagnostics/$STAMP" + +# Basic cluster information +kubectl cluster-info > cluster-info.txt +kubectl get nodes -o wide > nodes.txt +kubectl version > version.txt + +# Platform service status +kubectl get pods -A > all-pods.txt +kubectl get applications -n argocd -o wide > argocd-apps.txt +kubectl get ingress -A > ingress.txt +kubectl get services -A > services.txt + +# Recent events +kubectl get events --sort-by='.lastTimestamp' > events.txt + +# Logs from key services +kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server --tail=200 > argocd-server.log +kubectl logs -n backstage -l app.kubernetes.io/name=backstage --tail=200 > backstage.log +kubectl logs -n gitlab -l app=gitlab-webservice-default --tail=200 > gitlab.log + +# Configuration +kubectl get configmaps -A -o yaml > configmaps.yaml +kubectl get secrets -A -o yaml > secrets.yaml # Be careful with sensitive data + +# Create archive +cd .. +tar -czf "platform-diagnostics-$STAMP.tar.gz" "$STAMP/" +``` + +### Log Analysis + +```bash +# Search for common error patterns +grep -r -i "error\|fail\|exception" platform-diagnostics/ + +# Check for resource issues +grep -r -i "insufficient\|resource\|memory\|cpu" platform-diagnostics/ + +# Look for network issues +grep -r -i "connection\|timeout\|dns\|network" platform-diagnostics/ + +# Check authentication problems +grep -r -i "auth\|permission\|forbidden\|unauthorized" platform-diagnostics/ +``` + +This diagnostic information will help support teams identify and resolve issues more efficiently.
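The four searches in the Log Analysis section can also be rolled into a single pass that reports how many lines match each category, which gives a quick sense of where to look first. The following is a minimal sketch under stated assumptions: `summarize_diagnostics` is a hypothetical helper name that does not exist in the repository, and the category labels are illustrative. The patterns are the same ones used above, written in extended-regex form.

```bash
# Sketch only: summarize_diagnostics is a hypothetical helper, not a repository script.
# It counts matching lines per category across every file under a diagnostics directory.
summarize_diagnostics() {
  local dir="${1:-platform-diagnostics}"
  local label pattern count
  while IFS='=' read -r label pattern; do
    # grep -c prints "file:count" for each file; awk sums the last field
    # and prints 0 when nothing matched or the directory is missing.
    count="$(grep -r -i -c -E "${pattern}" "${dir}" 2>/dev/null \
      | awk -F: '{ total += $NF } END { print total + 0 }')"
    printf '%-10s %s\n' "${label}" "${count}"
  done <<'PATTERNS'
errors=error|fail|exception
resources=insufficient|resource|memory|cpu
network=connection|timeout|dns|network
auth=auth|permission|forbidden|unauthorized
PATTERNS
}
```

Calling `summarize_diagnostics platform-diagnostics/` prints one line per category. The counts are matching lines rather than distinct problems, so treat them as a rough triage signal before reading the individual files.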
\ No newline at end of file From 9cdaff5b9ad7e9d9a20a8a4a525555f223e7a9cd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Allamand?= Date: Mon, 22 Sep 2025 15:08:33 +0200 Subject: [PATCH 3/3] Add development tooling and expand documentation - Add pre-commit configuration and GitHub workflow - Add comprehensive AI context documentation - Add Taskfile for task automation - Expand troubleshooting guide significantly - Update deployment and getting started guides - Resolve TODO merge conflicts - Add setup script for pre-commit hooks --- .github/workflows/pre-commit.yml | 36 ++ .pre-commit-config.yaml | 25 ++ AI-CONTEXT.md | 493 +++++++++++++++++++++ DEPLOYMENT-GUIDE.md | 12 + GETTING-STARTED.md | 12 + README.md | 12 + TODO.md | 4 +- TROUBLESHOOTING.md | 725 ++++++++++++++++++++++++++++++- Taskfile.yaml | 211 +++++++++ amazon-q-target-file.md | 140 +++++- setup-pre-commit.sh | 27 ++ 11 files changed, 1691 insertions(+), 6 deletions(-) create mode 100644 .github/workflows/pre-commit.yml create mode 100644 .pre-commit-config.yaml create mode 100644 AI-CONTEXT.md create mode 100644 Taskfile.yaml create mode 100755 setup-pre-commit.sh diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml new file mode 100644 index 000000000..7828158ea --- /dev/null +++ b/.github/workflows/pre-commit.yml @@ -0,0 +1,36 @@ +name: Pre-commit Checks + +on: + push: + branches: [main, master] + pull_request: + branches: [main, master] + +jobs: + pre-commit: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: "20" + cache: "yarn" + cache-dependency-path: backstage/yarn.lock + + - name: Install Backstage dependencies + run: | + cd backstage + yarn install --frozen-lockfile + + - name: Set up Python + uses: actions/setup-python@v4 + with: + python-version: "3.x" + + - name: Install pre-commit + run: pip install pre-commit + + - name: Run pre-commit + run: pre-commit run 
--all-files diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml new file mode 100644 index 000000000..908031049 --- /dev/null +++ b/.pre-commit-config.yaml @@ -0,0 +1,25 @@ +repos: + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.4.0 + hooks: + - id: trailing-whitespace + - id: end-of-file-fixer + - id: check-yaml + - id: check-added-large-files + - id: check-merge-conflict + + - repo: local + hooks: + - id: backstage-typecheck + name: Backstage TypeScript Check + entry: bash -c 'cd backstage && yarn tsc --noEmit' + language: system + files: '^backstage/.*\.(ts|tsx)$' + pass_filenames: false + + - id: terraform-fmt + name: Terraform Format Check + entry: terraform fmt -check -recursive + language: system + files: '^platform/.*\.tf$' + pass_filenames: false diff --git a/AI-CONTEXT.md b/AI-CONTEXT.md new file mode 100644 index 000000000..30316b7c9 --- /dev/null +++ b/AI-CONTEXT.md @@ -0,0 +1,493 @@ +--- +title: "AI Context Document - Application Modernization Blueprints Platform" +persona: ["ai-assistant"] +deployment-scenario: ["platform-only", "full-workshop", "manual"] +difficulty: "advanced" +estimated-time: "reference" +prerequisites: ["AI assistant context", "Platform knowledge"] +related-pages: ["README.md", "ARCHITECTURE.md", "platform-engineering-on-eks/AI-CONTEXT.md"] +repository: "appmod-blueprints" +last-updated: "2025-01-19" +--- + +# AI Context Document - Application Modernization Blueprints Platform + +## Project Overview + +This repository contains the **platform implementation** and application modernization blueprints for building modern, cloud-native applications on AWS EKS. It provides GitOps configurations, platform components, application templates, and operational patterns that create a comprehensive platform engineering solution. 
+ +### Repository Purpose +- **Primary Role**: Platform services, GitOps workflows, and application blueprints +- **Target Users**: Developers, platform adopters, platform engineers, DevOps teams +- **Key Output**: Complete GitOps-based platform with self-service capabilities +- **Integration**: Works with [platform-engineering-on-eks](https://gitlab.aws.dev/aws-tfc-containers/containers-hands-on-content/platform-engineering-on-eks) for infrastructure bootstrap + +### What This Repository Provides +1. **GitOps Platform**: Complete ArgoCD-based platform with multi-environment support +2. **Application Blueprints**: Ready-to-use templates for multiple technology stacks +3. **Platform Components**: Crossplane, Backstage, monitoring, security, and networking +4. **Developer Experience**: Self-service capabilities through Backstage templates and GitOps workflows + +## Architecture Overview + +### High-Level Platform Components +``` +Developer Interface Layer +├── Backstage Portal (Self-Service Templates) +├── Development Environment (VSCode + GitLab) +└── Git Repositories (Source Code & GitOps) + +Platform Control Plane (Hub Cluster) +├── ArgoCD (GitOps Controller) +├── Crossplane (Infrastructure as Code) +├── External Secrets Operator (Secret Management) +├── Cert Manager (TLS Automation) +└── GitLab (Git Repository Hosting) + +Application Runtime (Spoke Clusters) +├── AWS Load Balancer Controller (Traffic Routing) +├── Monitoring Stack (Prometheus + Grafana) +├── Logging Stack (Fluent Bit + CloudWatch) +└── Application Workloads (Multi-Language Support) + +AWS Infrastructure Services +├── EKS Clusters (Hub + Spoke Architecture) +├── VPC & Networking (Multi-AZ Configuration) +├── AWS Secrets Manager (Centralized Secrets) +├── RDS Databases (Managed Databases) +├── S3 Storage (Object Storage) +└── IAM Roles (Pod Identity) +``` + +### Repository Relationship +- **This repo**: Platform
implementation, GitOps configurations, application blueprints +- **platform-engineering-on-eks**: Infrastructure bootstrap, CDK deployment, workshop environment +- **Integration Flow**: Bootstrap creates infrastructure → This repo deploys platform services → GitOps manages applications + +## Key Concepts and Terminology + +### Platform Architecture Terms +- **Platform Services**: Core Kubernetes operators and controllers providing platform capabilities +- **GitOps Workflow**: Declarative deployment pattern using Git as source of truth +- **Hub-and-Spoke Architecture**: Multi-cluster pattern with centralized control plane and distributed workload clusters +- **Application Blueprints**: Standardized templates for different technology stacks and deployment patterns +- **Self-Service Infrastructure**: Developer-accessible APIs for provisioning cloud resources + +### GitOps Terms +- **ArgoCD**: GitOps controller managing continuous deployment from Git repositories +- **ApplicationSets**: ArgoCD resources enabling templated, multi-cluster application deployment +- **GitOps Bridge**: Data pipeline connecting infrastructure metadata to GitOps applications +- **Cluster Registration**: Process where clusters automatically register metadata for GitOps discovery +- **Sync Waves**: Ordered deployment phases ensuring proper dependency management + +### Platform Components +- **Backstage**: Developer portal providing self-service application creation and service catalog +- **Crossplane**: Kubernetes-native infrastructure as code for cloud resource provisioning +- **External Secrets Operator**: Kubernetes operator for syncing secrets from external systems +- **Ingress Controller**: Traffic routing and load balancing for applications +- **Service Mesh**: Communication layer providing security, observability, and traffic management + +### Infrastructure Integration Terms +- **CloudFormation Integration**: AWS native infrastructure as code service integration patterns +- **Pod
Identity**: AWS EKS feature for secure, credential-free access to AWS services +- **Resource Prefix**: Consistent naming convention for all platform resources +- **Environment Isolation**: Separation of development, staging, and production environments +- **Multi-Tenant Architecture**: Platform design supporting multiple teams and applications securely + +## File Structure and Key Components + +### Platform Infrastructure (`platform/`) +``` +platform/ +├── infra/terraform/ # Terraform infrastructure modules +│ ├── common/ # Shared infrastructure (VPC, EKS, S3) +│ ├── hub/ # Hub cluster and platform services +│ ├── spokes/ # Spoke clusters for workloads +│ └── old/ # Legacy configurations +├── backstage/ # Backstage developer portal +│ ├── templates/ # Software templates for scaffolding +│ └── components/ # Service catalog components +└── components/ # Platform CUE components +``` + +### GitOps Configurations (`gitops/`) +``` +gitops/ +├── addons/ # Platform addon configurations +│ ├── charts/ # Helm charts for platform services +│ ├── bootstrap/ # Bootstrap configurations +│ ├── environments/ # Environment-specific configs +│ └── tenants/ # Tenant-specific configurations +├── fleet/ # Fleet management configurations +│ ├── bootstrap/ # Fleet ApplicationSets +│ └── members/ # Fleet member configurations +├── platform/ # Platform service configurations +└── workloads/ # Application workload configurations +``` + +### Application Blueprints (`applications/`) +``` +applications/ +├── dotnet/ # .NET applications with clean architecture +├── java/ # Java Spring Boot microservices +├── node/ # Node.js Express applications +├── python/ # Python FastAPI services +├── rust/ # Rust high-performance web services +├── golang/ # Go cloud-native microservices +├── next-js/ # Next.js React applications +└── mono-a2c/ # Monolith to
containers migration +``` + +### Package Configurations (`packages/`) +``` +packages/ +├── ack/ # AWS Controllers for Kubernetes +├── argocd/ # ArgoCD configurations +├── backstage/ # Backstage configurations +├── cert-manager/ # Certificate management +├── crossplane/ # Infrastructure as code +├── external-secrets/ # Secret management +├── grafana/ # Monitoring and dashboards +├── ingress-nginx/ # Ingress controller +├── keycloak/ # Identity and access management +└── kyverno/ # Policy engine +``` + +## Terraform Module Architecture + +### Common Module (`platform/infra/terraform/common/`) +**Purpose**: Foundational infrastructure shared across all environments + +**Key Resources**: +- VPC Configuration (Multi-AZ networking with public/private subnets) +- EKS Cluster (Managed Kubernetes with auto-scaling node groups) +- S3 Backend (Terraform state storage with DynamoDB locking) +- IAM Configuration (Cluster access roles and service account policies) +- Core Addons (AWS Load Balancer Controller, EBS CSI Driver) +- Security Groups (Network access control for cluster components) + +### Hub Module (`platform/infra/terraform/hub/`) +**Purpose**: Central platform services and GitOps control plane + +**Key Resources**: +- Backstage Developer Portal (Service catalog and software templates) +- ArgoCD GitOps Controller (Continuous deployment management) +- Keycloak Identity Provider (SSO and OIDC authentication) +- External Secrets Operator (AWS Secrets Manager integration) +- Ingress Controllers (Traffic routing and SSL termination) +- Monitoring Stack (CloudWatch integration and observability) + +### Spokes Module (`platform/infra/terraform/spokes/`) +**Purpose**: Application workload environments (staging, production) + +**Key Resources**: +- Separate EKS Clusters (Isolated environments for applications) +- ArgoCD Registration (Connection to hub cluster GitOps) +- Environment-Specific Networking (Workload-appropriate
configurations) +- Application Monitoring (Environment-specific observability) +- Workload Security (RBAC and network policies) + +## GitOps Architecture and Patterns + +### GitOps Repository Structure +``` +GitOps Flow: +Git Repositories → ArgoCD ApplicationSets → Kubernetes Manifests → Running Applications + +Repository Types: +├── Platform Repo (Core platform services) +├── Addons Repo (Cluster addons) +├── Workloads Repo (Applications) +└── Fleet Repo (Multi-cluster config) +``` + +### Cluster Registration Pattern +**Key Feature**: Automatic cluster discovery and configuration + +**How it Works**: +1. Infrastructure tools (Terraform/KRO/Crossplane) create EKS clusters +2. Cluster registration secret created in AWS Secrets Manager with metadata +3. External Secrets Operator syncs secrets to hub cluster +4. ArgoCD ApplicationSets discover clusters and deploy applications dynamically + +**Cluster Registration Secret Format**: +```json +{ + "cluster_name": "spoke-dev-us-east-1", + "cluster_endpoint": "https://ABC123.gr7.us-east-1.eks.amazonaws.com", + "resource_prefix": "peeks-workshop", + "environment": "dev", + "tenant": "platform-team", + "cluster_type": "spoke", + "labels": { + "environment": "dev", + "cluster-type": "spoke", + "tenant": "platform-team" + }, + "annotations": { + "addons_repo_basepath": "gitops/addons/", + "workloads_repo_basepath": "gitops/workloads/", + "kustomize_path": "environments/dev", + "resource_prefix": "peeks-workshop" + } +} +``` + +### ApplicationSet Integration +ArgoCD ApplicationSets use cluster metadata for dynamic configuration: +- **Cluster Discovery**: Automatically find new clusters via secrets +- **Environment-Specific Deployment**: Use labels for environment targeting +- **Multi-Tenant Support**: Tenant-based application deployment +- **Configuration Templating**: Annotations provide deployment customization + +## Platform Components Deep Dive + +### ArgoCD - GitOps Controller +**Configuration**:
`gitops/platform/charts/argo-cd/` +**Key Features**: +- Multi-cluster application deployment +- ApplicationSets for environment promotion +- RBAC integration with external identity providers +- Automated sync and drift detection +- Web UI for deployment visualization + +### Backstage - Developer Portal +**Configuration**: `platform/backstage/` +**Key Features**: +- Application templates and scaffolding +- Service catalog and documentation +- CI/CD pipeline integration +- Infrastructure visibility +- Plugin ecosystem for extensibility + +### Crossplane - Infrastructure as Code +**Configuration**: `platform/crossplane/` +**Key Features**: +- AWS resource compositions (RDS, S3, IAM) +- Self-service infrastructure through Kubernetes CRDs +- Policy-driven resource management +- Cost optimization through resource lifecycle management +- GitOps-native infrastructure provisioning + +### External Secrets Operator - Secret Management +**Key Features**: +- AWS Secrets Manager integration +- Automatic secret rotation +- Cross-namespace secret sharing +- Pod Identity authentication +- Multiple secret store support + +## Application Blueprints and Technology Support + +### Supported Technology Stacks + +#### .NET Applications (`applications/dotnet/`) +- **Architecture**: Clean Architecture pattern with Entity Framework +- **Database**: PostgreSQL integration with migrations +- **Observability**: Health checks, metrics, and logging +- **Deployment**: Multi-stage Dockerfile with optimized builds + +#### Java Applications (`applications/java/`) +- **Framework**: Spring Boot microservices with Spring Data JPA +- **Build System**: Maven with multi-module projects +- **Monitoring**: Actuator endpoints and Micrometer metrics +- **Testing**: JUnit 5 with integration test support + +#### Node.js Applications (`applications/node/`) +- **Framework**: Express.js with TypeScript support +- **Package Management**: npm/yarn with lock files +- **Development**: Hot reload and debugging support 
+- **Security**: Helmet.js and security best practices + +#### Python Applications (`applications/python/`) +- **Framework**: FastAPI with async/await patterns +- **Dependency Management**: Poetry for reproducible builds +- **Type Safety**: Type hints and Pydantic validation +- **Testing**: pytest with async test support + +#### Rust Applications (`applications/rust/`) +- **Performance**: High-performance web services with Tokio +- **Build System**: Cargo with workspace support +- **Safety**: Memory safety and zero-cost abstractions +- **Deployment**: Minimal container images with multi-stage builds + +#### Go Applications (`applications/golang/`) +- **Concurrency**: Goroutine-based concurrent processing +- **Modules**: Go modules for dependency management +- **Performance**: Compiled binaries with small footprint +- **Cloud Native**: Kubernetes-native patterns and health checks + +### Application Deployment Pattern +``` +Developer Workflow: +Backstage Template → Generated Repository → Code Development → Git Push → +CI Pipeline → Container Build → GitOps Config Update → ArgoCD Sync → +Kubernetes Deployment → Application Running +``` + +## Deployment Process and Commands + +### Terraform Deployment (Use Scripts Only) +**⚠️ IMPORTANT**: Always use deployment scripts, never run terraform commands directly + +```bash +# Deploy common infrastructure +cd platform/infra/terraform/common && ./deploy.sh + +# Deploy hub cluster +cd platform/infra/terraform/hub && ./deploy.sh + +# Deploy spoke cluster +cd platform/infra/terraform/spokes && ./deploy.sh dev + +# Destroy resources +cd platform/infra/terraform/hub && ./destroy.sh +``` + +### GitOps Deployment Flow +1. **Infrastructure Deployment**: Terraform creates EKS clusters and AWS resources +2. **Cluster Registration**: Clusters automatically register in AWS Secrets Manager +3. **ArgoCD Bootstrap**: ArgoCD discovers clusters and begins application deployment +4.
**Platform Services**: Core platform services deploy via GitOps +5. **Application Deployment**: Applications deploy through Backstage templates and GitOps + +### Bootstrap Script Workflow +```bash +# Enhanced bootstrap process with health monitoring +scripts/0-bootstrap.sh +├── 1-argocd-gitlab-setup.sh # ArgoCD and GitLab integration +├── Wait for ArgoCD health # Monitor application sync status +├── 2-bootstrap-accounts.sh # Account setup after ArgoCD ready +└── 6-tools-urls.sh # Generate access URLs +``` + +## Secret Management Architecture + +### Predictable Naming Convention +The platform uses consistent secret naming that eliminates dynamic configuration needs: + +**Pattern**: `{resource_prefix}-{service}-{type}-password` + +**Examples**: +- `peeks-workshop-gitops-keycloak-admin-password` +- `peeks-workshop-gitops-backstage-postgresql-password` +- `peeks-workshop-gitops-argocd-admin-password` + +### Secret Management Flow +``` +AWS Secrets Manager → External Secrets Operator → Kubernetes Secrets → Applications +``` + +**Benefits**: +- **Decoupling**: GitOps templates don't need dynamic infrastructure details +- **Predictability**: Secret names follow consistent patterns +- **Security**: Centralized secret management in AWS Secrets Manager +- **Maintainability**: Infrastructure changes don't require GitOps updates + +## Common Interaction Patterns + +### Developer Workflow +1. **Access Platform**: Use Backstage developer portal for self-service +2. **Create Application**: Choose technology template and generate repository +3. **Develop Locally**: Clone repository and develop using preferred tools +4. **Deploy Application**: Push code triggers CI/CD and GitOps deployment +5. **Monitor Application**: Use platform observability tools for monitoring + +### Platform Engineer Workflow +1. **Customize Platform**: Modify Terraform modules and GitOps configurations +2. **Add Services**: Extend platform with new operators and services +3.
**Manage Environments**: Configure environment-specific settings and policies +4. **Monitor Platform**: Use ArgoCD and monitoring tools for platform health +5. **Troubleshoot Issues**: Use logs, metrics, and diagnostic tools + +### Platform Adopter Workflow +1. **Evaluate Platform**: Deploy platform-only configuration for assessment +2. **Customize Components**: Adapt platform services for organizational needs +3. **Integrate Systems**: Connect platform with existing tools and processes +4. **Train Teams**: Onboard development teams to platform capabilities +5. **Scale Adoption**: Expand platform usage across organization + +### DevOps Team Workflow +1. **Establish GitOps**: Set up GitOps workflows and repository structures +2. **Configure Environments**: Create development, staging, and production environments +3. **Implement Policies**: Define security, compliance, and operational policies +4. **Monitor Operations**: Set up alerting, logging, and incident response +5. **Optimize Performance**: Tune platform performance and resource utilization + +## Troubleshooting Patterns + +### Common Platform Issues +- **ArgoCD Sync Failures**: Applications not syncing due to configuration errors +- **Secret Management**: External Secrets Operator not syncing from AWS Secrets Manager +- **Cluster Registration**: New clusters not discovered by ArgoCD ApplicationSets +- **Network Connectivity**: Services unable to communicate across clusters +- **Resource Constraints**: Insufficient cluster resources for platform services + +### Diagnostic Commands +```bash +# Check ArgoCD application status +kubectl get applications -n argocd + +# Verify External Secrets Operator +kubectl get externalsecrets -A +kubectl get secretstores -A + +# Check cluster registration secrets +kubectl get secrets -n argocd | grep cluster + +# Monitor platform service health +kubectl get pods -n backstage +kubectl get pods -n crossplane-system +kubectl get pods -n external-secrets +``` + +### Recovery 
Procedures +1. **ArgoCD Issues**: Check application logs, verify Git repository access, validate RBAC +2. **Secret Sync Issues**: Verify Pod Identity permissions, check AWS Secrets Manager +3. **Cluster Discovery**: Validate cluster registration secrets and External Secrets config +4. **Network Issues**: Check security groups, network policies, and ingress configurations +5. **Resource Issues**: Scale cluster nodes, adjust resource requests/limits + +## Security Considerations + +### Identity and Access Management +- **Pod Identity**: Eliminates long-lived credentials for AWS service access +- **RBAC**: Fine-grained Kubernetes permissions for users and services +- **OIDC Integration**: Centralized authentication through Keycloak +- **Multi-Tenant Security**: Namespace isolation and tenant-based access control + +### Secret Management Security +- **External Secrets**: Centralized secret management through AWS Secrets Manager +- **Automatic Rotation**: Secrets rotated without application downtime +- **Encryption**: Secrets encrypted at rest and in transit +- **Audit Trail**: All secret access logged and monitored + +### Network Security +- **VPC Isolation**: Network-level isolation between environments +- **Security Groups**: Application-level firewall rules +- **Network Policies**: Kubernetes-native network segmentation +- **TLS Everywhere**: End-to-end encryption for all communications + +### GitOps Security +- **Git Repository Security**: Access control and audit logging for GitOps repositories +- **Signed Commits**: Cryptographic verification of configuration changes +- **Policy as Code**: Automated policy enforcement through Kyverno +- **Compliance Monitoring**: Continuous compliance checking and reporting + +## Key Success Metrics + +### Platform Health Indicators +- **ArgoCD Sync Status**: All applications synced and healthy +- **Platform Services**: Backstage, Crossplane, and other services operational +- **Cluster Connectivity**: Hub cluster can manage all 
spoke clusters +- **Secret Synchronization**: All secrets syncing from AWS Secrets Manager +- **Application Deployments**: Applications deploying successfully through GitOps + +### Developer Experience Metrics +- **Time to First Application**: How quickly developers can deploy first application +- **Self-Service Adoption**: Usage of Backstage templates and self-service features +- **Deployment Frequency**: How often applications are deployed through platform +- **Mean Time to Recovery**: How quickly issues are resolved +- **Developer Satisfaction**: Feedback on platform usability and capabilities + +This context provides AI assistants with comprehensive understanding of the platform implementation repository, its components, and common interaction patterns for different user types. \ No newline at end of file diff --git a/DEPLOYMENT-GUIDE.md b/DEPLOYMENT-GUIDE.md index 5552fda7c..02b6bb33d 100644 --- a/DEPLOYMENT-GUIDE.md +++ b/DEPLOYMENT-GUIDE.md @@ -1,3 +1,15 @@ +--- +title: "Deployment Guide - Application Modernization Blueprints" +persona: ["platform-adopter", "infrastructure-engineer"] +deployment-scenario: ["platform-only", "full-workshop", "manual"] +difficulty: "intermediate" +estimated-time: "60 minutes" +prerequisites: ["EKS Cluster", "kubectl", "ArgoCD CLI", "Git access"] +related-pages: ["GETTING-STARTED.md", "ARCHITECTURE.md", "TROUBLESHOOTING.md"] +repository: "appmod-blueprints" +last-updated: "2025-01-19" +--- + # Deployment Guide - Application Modernization Blueprints ## Overview diff --git a/GETTING-STARTED.md b/GETTING-STARTED.md index 902209b76..12820ba3b 100644 --- a/GETTING-STARTED.md +++ b/GETTING-STARTED.md @@ -1,3 +1,15 @@ +--- +title: "Getting Started with Application Modernization Blueprints" +persona: ["platform-adopter", "developer", "infrastructure-engineer"] +deployment-scenario: ["platform-only", "full-workshop"] +difficulty: "beginner" +estimated-time: "30 minutes" +prerequisites: ["EKS Cluster", "kubectl access", "Basic GitOps 
knowledge"] +related-pages: ["README.md", "DEPLOYMENT-GUIDE.md", "ARCHITECTURE.md"] +repository: "appmod-blueprints" +last-updated: "2025-01-19" +--- + # Getting Started with Application Modernization Blueprints ## What are Application Modernization Blueprints? diff --git a/README.md b/README.md index 847493205..b5beca1e7 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,15 @@ +--- +title: "Platform Engineering on EKS - Application Modernization Blueprints" +persona: ["platform-adopter", "infrastructure-engineer", "developer"] +deployment-scenario: ["platform-only", "full-workshop"] +difficulty: "intermediate" +estimated-time: "30 minutes" +prerequisites: ["AWS Account", "kubectl", "Basic Kubernetes knowledge"] +related-pages: ["GETTING-STARTED.md", "ARCHITECTURE.md", "DEPLOYMENT-GUIDE.md"] +repository: "appmod-blueprints" +last-updated: "2025-01-19" +--- + # Platform Engineering on EKS - Application Modernization Blueprints ## What is this? diff --git a/TODO.md b/TODO.md index dbf1f24c3..a3f13d8df 100644 --- a/TODO.md +++ b/TODO.md @@ -95,8 +95,10 @@ task taskcat-clean-deployment - Does WORKSHOP_ID=28c283c1-1d60-43fa-a604-4e983e0e8038 is the goor one ? - update region in backstage templates +sudo npm install -g yarn +- gitlab - {"message":{"project_namespace.path":["can only include non-accented letters, digits, '_', '-' and '.'. It must not start with '-', '_', or '.', nor end with '-', '_', '.', '.git', or '.atom'."],"path":["can only include non-accented letters, digits, '_', '-' and '.'. 
It must not start with '-', '_', or '.', nor end with '-', '_', '.', '.git', or '.atom'."]}} -- #### Production Security Hardening - the hardening should be done, in a gitops manner, not using kubectl +#### Production Security Hardening - the hardening should be done, in a gitops manner, not using kubectl - same for installing network policy, load balancer controller, all should be done with gitops, and maybe gitops-bridge if we need dependency with resources deployed in terraform - add also the cleanup after destroy with tsk taskcat-clean-deployment-force, that can help remove any remaining aws resources deploy by the platform diff --git a/TROUBLESHOOTING.md b/TROUBLESHOOTING.md index 4b59c1ec9..c6aaee6f3 100644 --- a/TROUBLESHOOTING.md +++ b/TROUBLESHOOTING.md @@ -1,9 +1,38 @@ +--- +title: "Troubleshooting Guide - Application Modernization Blueprints" +persona: ["platform-adopter", "infrastructure-engineer", "developer", "workshop-participant"] +deployment-scenario: ["platform-only", "full-workshop", "manual"] +difficulty: "intermediate" +estimated-time: "varies" +prerequisites: ["Platform access", "kubectl", "Basic troubleshooting skills"] +related-pages: ["DEPLOYMENT-GUIDE.md", "ARCHITECTURE.md", "platform-engineering-on-eks/TROUBLESHOOTING.md"] +repository: "appmod-blueprints" +last-updated: "2025-01-19" +--- + # Troubleshooting Guide - Application Modernization Blueprints ## Overview This guide provides solutions to common issues encountered when using the Application Modernization Blueprints platform. Issues are organized by symptoms to help you quickly identify and resolve problems during platform operations and application development. 
+**📚 For Workshop Participants**: Focus on platform access and application deployment issues to complete workshop exercises +**🏢 For Platform Adopters**: Review GitOps workflows and platform operations troubleshooting for production use +**⚙️ For Infrastructure Engineers**: Use advanced diagnostics and performance optimization procedures for platform management +**👩‍💻 For Developers**: Check application development and CI/CD pipeline issues for daily development workflows + +> **Quick Navigation**: Jump to [Platform Access](#platform-access-issues) | [GitOps Issues](#gitops-and-argocd-issues) | [Application Development](#application-development-issues) | [Performance](#performance-optimization) | [Getting Help](#getting-additional-help) + +## Prerequisites + +Before troubleshooting, ensure you have: + +- Access to the EKS cluster with `kubectl` configured +- ArgoCD CLI installed (optional but helpful) +- Access to platform services (Backstage, GitLab, ArgoCD) +- Proper environment variables set in your development environment +- Understanding of your platform deployment scenario + ## Quick Diagnostic Commands Before diving into specific issues, run these commands to gather basic information: @@ -21,6 +50,22 @@ kubectl cluster-info # Check recent events kubectl get events --sort-by='.lastTimestamp' | tail -10 + +# Run platform health check +./scripts/0-install.sh --check-only 2>/dev/null || echo "Run ./scripts/0-install.sh to initialize platform" +``` + +### Platform Service URLs + +Get current platform service URLs: + +```bash +# Get platform URLs (requires access to hub cluster secrets) +kubectl get secret ${RESOURCE_PREFIX}-hub-cluster -n argocd -o jsonpath='{.metadata.annotations.ingress_domain_name}' 2>/dev/null || echo "Platform domain not available" + +# Check service endpoints +kubectl get ingress -A +kubectl get services -A --field-selector spec.type=LoadBalancer ``` ## Platform Access Issues @@ -690,17 +735,27 @@ kubectl apply -f
configmaps-backup.yaml ### Escalation Paths -1. **Platform Team Support** +1. **Platform Documentation** + - Check platform getting started: [GETTING-STARTED.md](GETTING-STARTED.md) + - Review platform architecture: [ARCHITECTURE.md](ARCHITECTURE.md) + - Consult deployment scenarios: [DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md) + +2. **Infrastructure Issues** + - For infrastructure deployment issues, see [platform-engineering-on-eks troubleshooting](https://gitlab.aws.dev/aws-samples/platform-engineering-on-eks/-/blob/main/TROUBLESHOOTING.md) + - Check CDK deployment documentation + +3. **Platform Team Support** - Check internal documentation and runbooks - Contact platform engineering team - Review platform architecture documentation -2. **Community Resources** +4. **Community Resources** - [ArgoCD Community](https://github.com/argoproj/argo-cd/discussions) - [Backstage Community](https://github.com/backstage/backstage/discussions) - [Kubernetes Slack](https://kubernetes.slack.com/) + - [Platform Engineering Community](https://platformengineering.org/) -3. **AWS Support** +5. **AWS Support** - For EKS-related issues, create AWS support case - Include cluster name, region, and error messages - Check [AWS EKS Best Practices](https://aws.github.io/aws-eks-best-practices/) @@ -758,4 +813,666 @@ grep -r -i "connection\|timeout\|dns\|network" platform-diagnostics/ grep -r -i "auth\|permission\|forbidden\|unauthorized" platform-diagnostics/ ``` -This diagnostic information will help support teams identify and resolve issues more efficiently. \ No newline at end of file +This diagnostic information will help support teams identify and resolve issues more efficiently. 
+ +## Platform Deployment Scenarios + +### Platform-Only Deployment + +**🏢 For Platform Adopters**: Troubleshooting platform without workshop components + +**Symptoms:** +- Deploying platform services without workshop-specific components +- Need production-ready configuration +- Want to customize platform for organizational needs + +**Verification Steps:** +```bash +# Check core platform services +kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server +kubectl get pods -n backstage -l app.kubernetes.io/name=backstage + +# Verify GitOps is working +kubectl get applications -n argocd -o custom-columns="NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status" + +# Check platform configuration +kubectl get configmap backstage-app-config -n backstage -o yaml | grep -A 5 "organization:" +``` + +**Common Platform-Only Issues:** + +1. **Missing Workshop Dependencies** + ```bash + # Remove workshop-specific applications + kubectl delete applications -n argocd -l workshop=true 2>/dev/null + + # Update platform configuration for production + kubectl patch configmap backstage-app-config -n backstage --patch '{"data":{"app-config.yaml":"organization:\n name: Your Organization"}}' + ``` + +2.
**Authentication Configuration** + ```bash + # Configure production authentication + kubectl get secrets -n backstage | grep auth + kubectl get configmap backstage-app-config -n backstage -o yaml | grep -A 10 "auth:" + + # Update for your identity provider + kubectl patch configmap backstage-app-config -n backstage --patch '{"data":{"app-config.yaml":"auth:\n providers:\n oauth2Proxy: {}\n environment: production"}}' + ``` + +### IDE-Only Setup + +**👩‍💻 For Developers**: Setting up development environment without full platform + +**Symptoms:** +- Want development environment without full platform overhead +- Need access to platform tools for development +- Working on platform customization + +**Setup Commands:** +```bash +# Minimal environment setup +export AWS_PROFILE=your-profile +export AWS_REGION=your-region +export KUBECONFIG=~/.kube/config + +# Install required tools +curl -sSL https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64 -o argocd +chmod +x argocd && sudo mv argocd /usr/local/bin/ + +# Connect to existing cluster +aws eks update-kubeconfig --region $AWS_REGION --name your-cluster-name +kubectl get nodes +``` + +**Development Workflow:** +```bash +# Clone platform repositories +git clone https://github.com/aws-samples/appmod-blueprints.git +cd appmod-blueprints + +# Set up development environment +source ./scripts/setup-dev-env.sh 2>/dev/null || echo "Set environment variables manually" + +# Test platform connectivity +kubectl get applications -n argocd +argocd app list --server argocd-server.argocd.svc.cluster.local +``` + +### Manual Configuration + +**⚙️ For Infrastructure Engineers**: Step-by-step manual setup without automation + +**Symptoms:** +- Automation scripts not working in your environment +- Need to understand each step for customization +- Working in restricted environment + +**Manual Setup Steps:** + +1.
**Core Platform Services:** + ```bash + # Install ArgoCD + kubectl create namespace argocd + kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml + + # Wait for ArgoCD to be ready + kubectl wait --for=condition=available --timeout=300s deployment/argocd-server -n argocd + ``` + +2. **Platform Applications:** + ```bash + # Create bootstrap application + kubectl apply -f - < -n argocd | grep -A 10 "Operation:" + +# Check for resource conflicts +kubectl get events -n --sort-by='.lastTimestamp' | grep -i conflict + +# Check for manual changes outside GitOps +kubectl diff -f + +# Monitor sync operations +watch 'kubectl get applications -n argocd -o custom-columns="NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status,OPERATION:.status.operationState.phase"' +``` + +**Solutions:** +```bash +# Disable auto-sync temporarily +kubectl patch application -n argocd --type merge -p '{"spec":{"syncPolicy":{"automated":null}}}' + +# Force hard refresh +kubectl patch application -n argocd --type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}' + +# Reset application to clean state +kubectl patch application -n argocd --type merge -p '{"operation":null}' +kubectl patch application -n argocd --type merge -p '{"status":null}' + +# Re-enable auto-sync after fixing conflicts +kubectl patch application -n argocd --type merge -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true}}}}' +``` + +### Multi-Cluster GitOps Issues + +**Symptoms:** +- Applications not deploying to spoke clusters +- Cluster connectivity issues +- Cross-cluster resource dependencies failing + +**Diagnostic Commands:** +```bash +# Check cluster registration status +argocd cluster list + +# Test cluster connectivity +kubectl --context= get nodes + +# Check cluster secrets in ArgoCD +kubectl get secrets -n argocd | grep cluster- + +# Verify cluster access from ArgoCD +kubectl exec -n argocd 
deployment/argocd-server -- argocd cluster list +``` + +**Solutions:** +```bash +# Re-register spoke cluster +argocd cluster add --name + +# Update cluster credentials +kubectl patch secret -n argocd --type merge -p '{"data":{"config":""}}' + +# Test cross-cluster deployment +kubectl apply -f - < + namespace: test + syncPolicy: + automated: + prune: true + selfHeal: true + syncOptions: + - CreateNamespace=true +EOF +``` + +## Platform Operations Issues + +### Resource Management and Scaling + +**Symptoms:** +- Platform services consuming excessive resources +- Auto-scaling not working as expected +- Resource quotas preventing deployments + +**Diagnostic Commands:** +```bash +# Check platform resource consumption +kubectl top pods -n argocd --sort-by=cpu +kubectl top pods -n backstage --sort-by=memory +kubectl top pods -n gitlab --sort-by=memory + +# Check resource quotas and limits +kubectl describe resourcequotas -A +kubectl describe limitranges -A + +# Check horizontal pod autoscaler status +kubectl get hpa -A +kubectl describe hpa -n + +# Monitor resource usage over time +kubectl top nodes --sort-by=cpu +kubectl top nodes --sort-by=memory +``` + +**Solutions:** +```bash +# Adjust resource limits for platform services +kubectl patch deployment argocd-server -n argocd -p '{"spec":{"template":{"spec":{"containers":[{"name":"argocd-server","resources":{"limits":{"cpu":"1000m","memory":"2Gi"},"requests":{"cpu":"500m","memory":"1Gi"}}}]}}}}' + +# Enable HPA for platform services +kubectl autoscale deployment backstage -n backstage --cpu-percent=70 --min=2 --max=5 + +# Adjust resource quotas +kubectl patch resourcequota -n --type merge -p '{"spec":{"hard":{"requests.cpu":"4","requests.memory":"8Gi"}}}' + +# Scale platform services based on usage +kubectl scale deployment argocd-repo-server --replicas=3 -n argocd +kubectl scale deployment gitlab-sidekiq-all-in-1-v2 --replicas=2 -n gitlab +``` + +### Platform Service Dependencies + +**Symptoms:** +- Services failing due 
to dependency issues +- Database connectivity problems +- External service integration failures + +**Diagnostic Commands:** +```bash +# Check service dependencies +kubectl get pods -A -o wide | grep -E "(postgres|redis|keycloak)" + +# Test database connectivity +kubectl exec -it deployment/backstage -n backstage -- nc -zv postgresql 5432 +kubectl exec -it deployment/gitlab-webservice-default -n gitlab -- nc -zv redis 6379 + +# Check external service connectivity +kubectl run -it --rm debug --image=busybox --restart=Never -- wget -qO- https://api.github.com +kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup github.com + +# Check service mesh configuration (if applicable) +kubectl get virtualservices -A +kubectl get destinationrules -A +``` + +**Solutions:** +```bash +# Restart dependent services in order +kubectl rollout restart deployment postgresql -n postgresql +kubectl rollout restart deployment redis -n redis +kubectl rollout restart deployment keycloak -n keycloak +kubectl rollout restart deployment backstage -n backstage + +# Fix database connection issues +kubectl patch deployment backstage -n backstage -p '{"spec":{"template":{"spec":{"containers":[{"name":"backstage","env":[{"name":"POSTGRES_HOST","value":"postgresql.postgresql.svc.cluster.local"}]}]}}}}' + +# Update external service configurations +kubectl patch configmap backstage-app-config -n backstage --patch '{"data":{"app-config.yaml":"integrations:\n github:\n - host: github.com\n token: ${GITHUB_TOKEN}"}}' +``` + +## Application Development Troubleshooting + +### Template and Scaffolding Issues + +**Symptoms:** +- Backstage templates failing to generate code +- Generated applications missing files +- Template parameters not being substituted + +**Diagnostic Commands:** +```bash +# Check template catalog status +kubectl logs -n backstage -l app.kubernetes.io/name=backstage | grep -i template + +# Verify template repository access +kubectl exec -n backstage deployment/backstage 
-- curl -s https://github.com/aws-samples/appmod-blueprints/tree/main/platform/backstage/templates + +# Check template configuration +kubectl get configmap backstage-app-config -n backstage -o yaml | grep -A 20 "catalog:" + +# Test template processing +kubectl exec -it deployment/backstage -n backstage -- cat /app/app-config.yaml | grep -A 10 "techdocs:" +``` + +**Solutions:** +```bash +# Refresh template catalog +kubectl exec -n backstage deployment/backstage -- curl -X POST http://localhost:7007/api/catalog/refresh + +# Update template repository configuration +kubectl patch configmap backstage-app-config -n backstage --patch '{"data":{"app-config.yaml":"catalog:\n locations:\n - type: url\n target: https://github.com/aws-samples/appmod-blueprints/blob/main/platform/backstage/templates/template.yaml"}}' + +# Restart Backstage to reload templates +kubectl rollout restart deployment backstage -n backstage + +# Check template validation +kubectl logs -n backstage -l app.kubernetes.io/name=backstage | grep -i "template.*error" +``` + +## Quick Reference + +### Essential Platform Commands + +```bash +# Platform health check +./scripts/0-install.sh --check-only + +# ArgoCD application status +kubectl get applications -n argocd -o custom-columns="NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status" + +# Platform service status +kubectl get pods -A | grep -E "(argocd|backstage|gitlab|grafana)" | grep -v Running + +# Force sync all applications (kubectl patch has no --all flag, so patch each Application in turn) +kubectl get applications -n argocd -o name | xargs -I{} kubectl patch {} -n argocd --type merge -p '{"operation":{"sync":{"revision":"HEAD"}}}' + +# Reset platform to clean state +./scripts/restore_template_defaults.sh +``` + +### Platform Service Access + +```bash +# Get platform domain +kubectl get secret ${RESOURCE_PREFIX}-hub-cluster -n argocd -o jsonpath='{.metadata.annotations.ingress_domain_name}' 2>/dev/null + +# Port forward for local 
access +kubectl port-forward svc/argocd-server -n argocd 8080:443 & +kubectl port-forward svc/backstage -n
backstage 8081:7007 & +kubectl port-forward svc/gitlab-webservice-default -n gitlab 8082:8181 & +``` + +### Development Workflow + +```bash +# Create new application from template +# Use Backstage UI: https:///backstage/create + +# Check application deployment +kubectl get applications -n argocd | grep +kubectl get pods -n + +# Debug application issues +kubectl describe application -n argocd +kubectl logs -n -l app= +``` + +### Common File Locations + +- Platform configuration: `/workspace/appmod-blueprints/gitops/` +- Application templates: `/workspace/appmod-blueprints/platform/backstage/templates/` +- Platform components: `/workspace/appmod-blueprints/platform/components/` +- Deployment manifests: `/workspace/appmod-blueprints/applications/` +- Scripts: `/workspace/appmod-blueprints/scripts/` + +### Related Documentation + +- [Getting Started Guide](GETTING-STARTED.md) - Platform evaluation and setup +- [Architecture Documentation](ARCHITECTURE.md) - Platform components and design +- [Deployment Guide](DEPLOYMENT-GUIDE.md) - Platform deployment scenarios +- [Infrastructure Troubleshooting](https://gitlab.aws.dev/aws-samples/platform-engineering-on-eks/-/blob/main/TROUBLESHOOTING.md) - Infrastructure-specific issues + +### CI/CD Pipeline Failures + +**Symptoms:** +- GitLab CI/CD pipelines not triggering +- Build failures in pipelines +- Deployment stages failing + +**Diagnostic Commands:** +```bash +# Check GitLab runner status +kubectl get pods -n gitlab-runner +kubectl logs -n gitlab-runner -l app=gitlab-runner --tail=50 + +# Check GitLab CI/CD configuration +# (Access GitLab UI and check Project > Settings > CI/CD) + +# Verify container registry access +kubectl get secrets -A | grep regcred +kubectl describe secret regcred -n + +# Check pipeline logs in GitLab UI +# Navigate to Project > CI/CD > Pipelines > [Pipeline] > Jobs +``` + +**Solutions:** +```bash +# Restart GitLab runners +kubectl rollout restart deployment gitlab-runner -n gitlab-runner + +# Update 
runner configuration +kubectl patch configmap gitlab-runner-config -n gitlab-runner --patch '{"data":{"config.toml":"concurrent = 4\ncheck_interval = 0\n\n[session_server]\n session_timeout = 1800"}}' + +# Fix container registry credentials +kubectl create secret docker-registry regcred \ + --docker-server= \ + --docker-username= \ + --docker-password= \ + --docker-email= \ + -n + +# Update GitLab CI/CD variables +# Use GitLab UI: Project > Settings > CI/CD > Variables +``` + +### Application Runtime Issues + +**Symptoms:** +- Applications crashing after deployment +- Performance issues in deployed applications +- Service-to-service communication failures + +**Diagnostic Commands:** +```bash +# Check application pod status +kubectl get pods -n -o wide + +# Check application logs +kubectl logs -n -l app= --tail=100 + +# Check service endpoints +kubectl get endpoints -n + +# Test service connectivity +kubectl run -it --rm debug --image=busybox --restart=Never -- wget -qO- http://..svc.cluster.local + +# Check resource usage +kubectl top pods -n +``` + +**Solutions:** +```bash +# Adjust application resource limits +kubectl patch deployment -n -p '{"spec":{"template":{"spec":{"containers":[{"name":"","resources":{"limits":{"cpu":"1000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}}}]}}}}' + +# Fix service configuration +kubectl patch service -n -p '{"spec":{"ports":[{"port":80,"targetPort":8080}]}}' + +# Enable application health checks +kubectl patch deployment -n -p '{"spec":{"template":{"spec":{"containers":[{"name":"","livenessProbe":{"httpGet":{"path":"/health","port":8080},"initialDelaySeconds":30,"periodSeconds":10}}]}}}}' + +# Scale application based on load +kubectl autoscale deployment -n --cpu-percent=70 --min=2 --max=10 +``` + +## Security and Compliance Issues + +### RBAC and Access Control + +**Symptoms:** +- Users cannot access platform services +- Service accounts lack necessary permissions +- Cross-namespace access issues + +**Diagnostic 
Commands:** +```bash +# Check current user permissions +kubectl auth can-i --list --as= + +# Check service account permissions +kubectl describe serviceaccount -n +kubectl get rolebindings,clusterrolebindings -A | grep + +# Check RBAC policies +kubectl get roles,clusterroles -A | grep +kubectl describe role -n + +# Test specific permissions +kubectl auth can-i create pods --as=system:serviceaccount:: +``` + +**Solutions:** +```bash +# Create necessary role binding +kubectl create rolebinding \ + --clusterrole= \ + --serviceaccount=: \ + --namespace= + +# Create custom role for specific permissions +kubectl apply -f - < + name: +rules: +- apiGroups: [""] + resources: ["pods", "services"] + verbs: ["get", "list", "create", "update", "patch", "delete"] +- apiGroups: ["apps"] + resources: ["deployments"] + verbs: ["get", "list", "create", "update", "patch", "delete"] +EOF + +# Update service account with IAM role (for AWS) +kubectl patch serviceaccount -n \ + -p '{"metadata":{"annotations":{"eks.amazonaws.com/role-arn":""}}}' +``` + +### Secret Management Issues + +**Symptoms:** +- Applications cannot access secrets +- Secret rotation not working +- External secret integration failures + +**Diagnostic Commands:** +```bash +# Check secret availability +kubectl get secrets -A | grep +kubectl describe secret -n + +# Check external secrets operator +kubectl get pods -n external-secrets-system +kubectl logs -n external-secrets-system -l app.kubernetes.io/name=external-secrets + +# Check secret store configuration +kubectl get secretstore -A +kubectl describe secretstore -n + +# Test secret access from pod +kubectl exec -it -n -- env | grep +``` + +**Solutions:** +```bash +# Restart external secrets operator +kubectl rollout restart deployment external-secrets -n external-secrets-system + +# Force secret refresh +kubectl patch externalsecret -n --type merge -p '{"metadata":{"annotations":{"force-sync":"'$(date +%s)'"}}}' + +# Create manual secret if external secrets fail 
+kubectl create secret generic \ + --from-literal== \ + -n + +# Update secret store configuration +kubectl patch secretstore -n --type merge -p '{"spec":{"provider":{"aws":{"region":"","auth":{"secretRef":{"accessKeyID":{"name":"","key":"access-key"}}}}}}}' +``` + +## Monitoring and Alerting + +### Observability Stack Issues + +**Symptoms:** +- Metrics not appearing in Grafana +- Prometheus not scraping targets +- Alert manager not sending notifications + +**Diagnostic Commands:** +```bash +# Check monitoring stack status +kubectl get pods -n monitoring + +# Check Prometheus targets +kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 & +# Visit http://localhost:9090/targets + +# Check Grafana data sources +kubectl port-forward -n monitoring svc/grafana 3000:3000 & +# Visit http://localhost:3000 and check data sources + +# Check alert manager configuration +kubectl get configmap alertmanager-config -n monitoring -o yaml +``` + +**Solutions:** +```bash +# Restart monitoring components +kubectl rollout restart deployment grafana -n monitoring +kubectl rollout restart statefulset prometheus-prometheus-kube-prometheus-prometheus -n monitoring +kubectl rollout restart statefulset alertmanager-alertmanager-kube-prometheus-alertmanager -n monitoring + +# Fix service monitor configuration +kubectl apply -f - <-metrics + namespace: monitoring +spec: + selector: + matchLabels: + app: + endpoints: + - port: metrics + path: /metrics + interval: 30s +EOF + +# Update Grafana data source +kubectl patch configmap grafana-datasources -n monitoring --patch '{"data":{"datasources.yaml":"apiVersion: 1\ndatasources:\n- name: Prometheus\n type: prometheus\n url: http://prometheus-operated:9090\n access: proxy\n isDefault: true"}}' +``` + +This comprehensive troubleshooting guide provides platform operators and developers with the tools and procedures needed to diagnose and resolve common issues in the Application Modernization Blueprints platform. 
\ No newline at end of file diff --git a/Taskfile.yaml b/Taskfile.yaml new file mode 100644 index 000000000..79aa39311 --- /dev/null +++ b/Taskfile.yaml @@ -0,0 +1,211 @@ +version: "3.17" + +vars: + AWS_REGION: '{{env "AWS_REGION" | default .AWS_REGION | default "us-west-2"}}' + AWS_PROFILE: '{{.AWS_PROFILE | default (env "AWS_PROFILE") | default "default"}}' + +tasks: + # ===== DOCUMENTATION VALIDATION COMMANDS ===== + + validate-docs: + desc: "Validate appmod-blueprints documentation" + cmds: + - | + echo "🔍 Running appmod-blueprints documentation validation..." + echo "📝 Validating local documentation files" + echo "" + + # Simple validation for this repository only + ERRORS=0 + WARNINGS=0 + + echo "Checking documentation files..." + + # Check for required documentation files + REQUIRED_FILES=("README.md" "GETTING-STARTED.md" "ARCHITECTURE.md" "DEPLOYMENT-GUIDE.md" "TROUBLESHOOTING.md") + + for file in "${REQUIRED_FILES[@]}"; do + if [ -f "$file" ]; then + echo "✅ $file - Present" + + # Check for metadata header + if head -n 5 "$file" | grep -q "^---$"; then + echo " ✅ Metadata header found" + else + echo " ⚠️ Missing metadata header" + WARNINGS=$((WARNINGS + 1)) + fi + + # Check for basic content structure + if [ $(wc -l < "$file") -lt 10 ]; then + echo " ⚠️ File seems too short (less than 10 lines)" + WARNINGS=$((WARNINGS + 1)) + fi + + else + echo "❌ $file - Missing" + ERRORS=$((ERRORS + 1)) + fi + done + + echo "" + echo "📊 Validation Summary:" + echo " Errors: $ERRORS" + echo " Warnings: $WARNINGS" + + if [ $ERRORS -eq 0 ]; then + echo "✅ Documentation validation passed" + else + echo "❌ Documentation validation failed" + exit 1 + fi + summary: | + Validates appmod-blueprints documentation: + - Checks for required documentation files + - Validates metadata headers + - Ensures basic content structure + - Repository-specific validation only + + lint-docs: + desc: "Lint documentation files for common issues" + cmds: + - | + echo "🔍 Linting
documentation files..." + + # Check for common markdown issues + find . -name "*.md" -not -path "./.git/*" | while read -r file; do + echo "Checking $file..." + + # Check for trailing whitespace + if grep -q '[[:space:]]$' "$file"; then + echo " ⚠️ Trailing whitespace found" + fi + + # Check for inconsistent heading levels + if grep -q '^#[^#]' "$file" && grep -q '^###[^#]' "$file" && ! grep -q '^##[^#]' "$file"; then + echo " ⚠️ Inconsistent heading levels (h1 to h3 without h2)" + fi + + # Check for broken internal links (basic check) + grep -o '\[.*\]([^)]*\.md[^)]*)' "$file" | while read -r link; do + target=$(echo "$link" | sed 's/.*(\([^)]*\)).*/\1/') + if [ ! -f "$target" ] && [ ! -f "$(dirname "$file")/$target" ]; then + echo " ⚠️ Potentially broken internal link: $target" + fi + done + done + + echo "✅ Documentation linting completed" + summary: | + Lints documentation for common issues: + - Trailing whitespace + - Inconsistent heading levels + - Broken internal links + - Markdown formatting issues + + # ===== PLATFORM OPERATIONS ===== + + install: + desc: "Install dependencies for appmod-blueprints platform" + cmds: + - | + echo "📦 Installing appmod-blueprints dependencies..." + + # Check for Helm + if command -v helm &> /dev/null; then + echo "✅ Helm found" + else + echo "⚠️ Helm not found - required for platform operations" + fi + + # Check for kubectl + if command -v kubectl &> /dev/null; then + echo "✅ kubectl found" + else + echo "⚠️ kubectl not found - required for platform operations" + fi + + # Check for ArgoCD CLI + if command -v argocd &> /dev/null; then + echo "✅ ArgoCD CLI found" + else + echo "⚠️ ArgoCD CLI not found - recommended for GitOps operations" + fi + + echo "✅ Dependency check completed" + + lint: + desc: "Lint GitOps configurations and Helm charts" + cmds: + - | + echo "🔍 Linting appmod-blueprints configurations..."
+ + # Lint YAML files + if command -v yamllint &> /dev/null; then + echo "Running yamllint on YAML files..." + find . -name "*.yaml" -o -name "*.yml" | grep -v ".git" | xargs yamllint -d relaxed || echo "yamllint issues found" + else + echo "⚠️ yamllint not found - skipping YAML validation" + fi + + # Lint Helm charts if present + if [ -d "packages" ] && command -v helm &> /dev/null; then + echo "Linting Helm charts..." + find packages -name "Chart.yaml" -exec dirname {} \; | while read -r chart_dir; do + echo "Linting $chart_dir..." + helm lint "$chart_dir" || echo "Helm lint issues in $chart_dir" + done + fi + + echo "✅ Configuration linting completed" + + validate-gitops: + desc: "Validate GitOps configurations" + cmds: + - | + echo "🔍 Validating GitOps configurations..." + + # Check for ArgoCD applications + if [ -d "gitops" ]; then + echo "Checking ArgoCD applications..." + find gitops -name "*.yaml" -o -name "*.yml" | while read -r file; do + if grep -q "kind: Application" "$file"; then + echo "✅ ArgoCD Application found: $file" + fi + done + fi + + # Check for Kustomization files + find . -name "kustomization.yaml" -o -name "kustomization.yml" | while read -r file; do + echo "✅ Kustomization found: $file" + if command -v kubectl &> /dev/null; then + kubectl kustomize "$(dirname "$file")" > /dev/null && echo " ✅ Valid" || echo " ❌ Invalid" + fi + done + + echo "✅ GitOps validation completed" + + test: + desc: "Run platform configuration tests" + cmds: + - | + echo "🧪 Running appmod-blueprints tests..." + + # Test Helm chart rendering + if [ -d "packages" ] && command -v helm &> /dev/null; then + echo "Testing Helm chart rendering..." + find packages -name "Chart.yaml" -exec dirname {} \; | while read -r chart_dir; do + echo "Testing $chart_dir..." + helm template test "$chart_dir" > /dev/null && echo " ✅ Renders successfully" || echo " ❌ Rendering failed" + done + fi + + # Test Kustomize builds + find .
-name "kustomization.yaml" -o -name "kustomization.yml" | while read -r file; do + if command -v kubectl &> /dev/null; then + echo "Testing $(dirname "$file")..." + kubectl kustomize "$(dirname "$file")" > /dev/null && echo " ✅ Builds successfully" || echo " ❌ Build failed" + fi + done + + echo "✅ Platform tests completed" diff --git a/amazon-q-target-file.md b/amazon-q-target-file.md index dcda273c5..08cda2689 100644 --- a/amazon-q-target-file.md +++ b/amazon-q-target-file.md @@ -1,4 +1,31 @@ -# AppMod Blueprints - Platform Architecture +# AppMod Blueprints - AI Context Document + +This document provides comprehensive context for AI assistants working with the AppMod Blueprints platform engineering solution. It includes architecture overview, common interaction patterns, troubleshooting guidance, and key concepts for effective AI assistance. + +## Quick Reference for AI Assistants + +### Project Overview +AppMod Blueprints is a platform engineering solution that provides: +- **GitOps-based application delivery** using ArgoCD and Kubernetes +- **Developer self-service** through Backstage templates and service catalog +- **Multi-environment management** with hub and spoke cluster architecture +- **Identity and access management** via Keycloak OIDC integration +- **Infrastructure as Code** using Terraform and Helm charts + +### Repository Relationship +This repository (appmod-blueprints) works with the platform-engineering-on-eks repository: +- **platform-engineering-on-eks**: Bootstrap infrastructure (CDK, IDE, base clusters) +- **appmod-blueprints**: Platform implementation (GitOps, applications, developer tools) + +### Common User Interaction Patterns +1. **Platform Adoption**: Users want to deploy the platform for their organization +2. **Application Development**: Developers need to deploy applications using platform services +3. **Troubleshooting**: Users encounter issues with GitOps, ArgoCD, or application deployments +4.
**Customization**: Teams want to modify platform components or add new services + +--- + +# Platform Architecture Details ## Bootstrap Script Improvements (2025-09-02) @@ -659,6 +686,117 @@ variable "create_github_repos" { This architecture provides a production-ready platform engineering solution that combines infrastructure automation, GitOps workflows, developer productivity tools, and enterprise security in a scalable, maintainable manner. +## AI Assistant Guidance + +### Key Concepts for AI Understanding + +#### GitOps Workflow +- **Source of Truth**: Git repositories contain all configuration +- **ArgoCD**: Monitors Git and applies changes to Kubernetes clusters +- **ApplicationSets**: Generate multiple Applications from templates +- **Sync Policies**: Automatic vs manual synchronization strategies + +#### Platform Services +- **Backstage**: Developer portal for service catalog and templates +- **Keycloak**: Identity provider for SSO across all platform services +- **External Secrets**: Manages secrets from AWS Secrets Manager +- **Ingress Controllers**: Handle traffic routing and SSL termination + +#### Common Troubleshooting Areas +1. **ArgoCD Sync Issues**: Applications stuck in "OutOfSync" or "Progressing" state +2. **Secret Management**: External Secrets not creating Kubernetes secrets +3. **Authentication**: OIDC integration problems between services +4. **Networking**: Ingress and load balancer configuration issues +5. **Resource Dependencies**: Services failing due to missing prerequisites + +### Typical User Questions and Guidance + +#### "How do I deploy an application?" +Guide users to: +1. Use Backstage templates for scaffolding +2. Understand GitOps workflow (Git commit → ArgoCD sync) +3. Check ArgoCD UI for deployment status +4. Review application logs and events for issues + +#### "ArgoCD shows my application as unhealthy" +Help users: +1. Check ArgoCD application details and events +2.
Verify Kubernetes resource status (`kubectl get pods`, `kubectl describe`) +3. Review application logs (`kubectl logs`) +4. Check for resource quotas or permission issues + +#### "I can't access platform services" +Troubleshoot: +1. Verify ingress configuration and DNS resolution +2. Check Keycloak authentication and user permissions +3. Validate SSL certificates and load balancer health +4. Review security group and network policy settings + +#### "How do I customize the platform?" +Direct users to: +1. Modify Helm charts in `gitops/addons/charts/` +2. Update ApplicationSet templates for new services +3. Add custom Backstage templates in `platform/backstage/templates/` +4. Configure environment-specific overrides + +### File Structure Context + +#### Critical Directories +- `gitops/`: All GitOps configurations and ArgoCD ApplicationSets +- `platform/infra/terraform/`: Infrastructure as Code modules +- `platform/backstage/`: Developer portal configuration and templates +- `scripts/`: Deployment and utility scripts + +#### Key Configuration Files +- `gitops/addons/bootstrap/`: Core platform service ApplicationSets +- `gitops/fleet/`: Multi-cluster management configurations +- `platform/infra/terraform/*/variables.tf`: Infrastructure configuration options +- `platform/backstage/templates/`: Software templates for developers + +### Environment Variables and Configuration + +#### Required Environment Variables +```bash +RESOURCE_PREFIX="peeks-workshop" # Prefix for all AWS resources +GIT_PASSWORD="${IDE_PASSWORD}" # Authentication for Git operations +TFSTATE_BUCKET_NAME="..." 
# S3 bucket for Terraform state +AWS_REGION="us-west-2" # Target AWS region +``` + +#### Common Configuration Patterns +- Resource naming: `${resource_prefix}-${component}-${type}` +- Secret naming: `${resource_prefix}-${service}-${secret_type}` +- Cluster naming: `${resource_prefix}-${environment}-cluster` + +### Integration Points + +#### With platform-engineering-on-eks Repository +- Shares Terraform state backend (S3 bucket, DynamoDB table) +- Uses IAM roles and policies created by bootstrap infrastructure +- Inherits environment variables from CDK deployment +- Connects to IDE environment for development workflow + +#### With AWS Services +- **EKS**: Kubernetes clusters for application hosting +- **Secrets Manager**: External secret storage +- **Route 53**: DNS management for ingress +- **CloudWatch**: Logging and monitoring integration + +### Best Practices for AI Assistance + +1. **Always check prerequisites**: Ensure bootstrap infrastructure exists before platform deployment +2. **Verify environment variables**: Many issues stem from missing or incorrect environment configuration +3. **Follow GitOps principles**: Changes should go through Git, not direct kubectl commands +4. **Use deployment scripts**: Never run terraform commands directly, always use provided deploy.sh scripts +5. 
**Check ArgoCD first**: For application issues, ArgoCD UI provides the best troubleshooting starting point + +### Related Documentation +- **[README.md](README.md)**: Project overview and quick start +- **[GETTING-STARTED.md](GETTING-STARTED.md)**: 30-minute evaluation guide +- **[ARCHITECTURE.md](ARCHITECTURE.md)**: Detailed system architecture +- **[DEPLOYMENT-GUIDE.md](DEPLOYMENT-GUIDE.md)**: Step-by-step deployment instructions +- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)**: Common issues and solutions + ## Deployment and Git Configuration (2025-08-29) ### Load Balancer Naming Fix diff --git a/setup-pre-commit.sh b/setup-pre-commit.sh new file mode 100755 index 000000000..94a7e9944 --- /dev/null +++ b/setup-pre-commit.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +# Setup pre-commit hooks for the appmod-blueprints repository + +echo "Setting up pre-commit hooks for appmod-blueprints..." + +# Check if pre-commit is installed +if ! command -v pre-commit &> /dev/null; then + echo "pre-commit is not installed. Installing via pip..." + pip install pre-commit +fi + +# Install the git hook scripts +pre-commit install + +echo "Pre-commit hooks installed successfully!" +echo "" +echo "The following checks will run on commit:" +echo "- Trailing whitespace removal" +echo "- End of file fixer" +echo "- YAML syntax check" +echo "- Large files check" +echo "- Merge conflict check" +echo "- Backstage TypeScript type checking" +echo "- Terraform format checking" +echo "" +echo "To run checks manually: pre-commit run --all-files" \ No newline at end of file