This Python script (`db_refresh_copy.py`) automates the process of copying database dumps between AWS EC2 instances using AWS Systems Manager (SSM) and Amazon S3. It supports multi-threaded execution, file compression, splitting large files, and service management (start/stop) on source and destination instances.
- Overview
- Features
- Prerequisites
- Installation
- Usage
- Input File Format
- Output File Format
- How It Works
- Logging
- Error Handling
- Contributing
- License
## Overview

The `db_refresh_copy.py` script is designed to facilitate the transfer of database dumps between AWS EC2 instances. It uses AWS SSM to execute commands on instances, compresses and optionally splits database dumps, uploads them to an S3 bucket, and downloads and extracts them on destination instances. The script supports dry-run mode for testing and multi-threaded processing for efficiency.
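
The script's internals are not reproduced here, but the core SSM interaction can be sketched roughly as follows with boto3 (the function and variable names are illustrative, not the script's actual ones):

```python
import boto3

def run_shell_command(session, instance_id, commands):
    """Run shell commands on an EC2 instance through SSM and return stdout (illustrative)."""
    ssm = session.client("ssm")
    resp = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    command_id = resp["Command"]["CommandId"]
    # Block until the command finishes, then fetch its output.
    ssm.get_waiter("command_executed").wait(CommandId=command_id, InstanceId=instance_id)
    result = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
    return result["StandardOutputContent"]
```
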
## Features

- Multi-threaded Execution: Processes multiple destination instances concurrently to reduce execution time (see the sketch after this list)
- File Compression and Splitting: Compresses database dumps and splits large files (>5GB) into smaller parts for efficient transfer
- Service Management: Stops and starts services on source and destination instances using SSM
- Dry-run Mode: Simulates the process without making changes, for testing purposes
- Disk Space Validation: Checks available disk space before processing to prevent failures
- Size Verification: Compares source and destination file sizes to ensure data integrity
- Error Handling: Robust error handling for AWS SSO, SSM, and S3 operations
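
A minimal sketch of how destinations could be processed concurrently with Python's `concurrent.futures`; `process_destination` and the destination dictionaries are placeholders for the script's actual internals:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_destinations(destinations, process_destination, max_workers=4):
    """Fan out one worker per destination instance and collect their results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_destination, dest): dest for dest in destinations}
        for future in as_completed(futures):
            dest = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                # Keep processing the remaining destinations even if one fails.
                results.append({"instanceId": dest.get("instanceId"),
                                "status": "Error", "message": str(exc)})
    return results
```
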
## Prerequisites

- Before running the automation, make sure all AWS environment profiles mentioned in the `event.json` file are authenticated using AWS SSO.
- Run the following command for each environment profile listed under `source.environment` and `destinations[].environment`:

  ```bash
  aws sso login --profile <env>
  ```

- Python 3.8+
- AWS CLI configured with appropriate profiles and SSO authentication (a per-profile session sketch follows this list)
- Required Python libraries (`pathlib` is part of the Python 3 standard library, so only `boto3` needs to be installed):

  ```bash
  pip install boto3
  ```

- AWS Permissions:
  - Access to AWS SSM (`AWS-RunShellScript` document)
  - Read/write permissions for the specified S3 bucket
  - Permissions to manage EC2 instances and services
- Tools on EC2 Instances:
  - `tar` and `pigz` for compression
  - `aws` CLI installed for S3 operations
- Input File: A JSON file (`event.json`) specifying services, instances, and paths
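
How the script wires up credentials is not documented here; a minimal sketch, assuming one boto3 session per SSO profile (the profile names are the example values from `event.json`):

```python
import boto3

# One boto3 session per environment profile referenced in event.json
# (profile names here are the example values from the input file).
source_session = boto3.Session(profile_name="engu")
dest_session = boto3.Session(profile_name="ppj")

ssm = source_session.client("ssm")  # runs AWS-RunShellScript commands
s3 = source_session.client("s3")    # interacts with the transfer bucket
```
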
## Installation

- Clone the repository:

  ```bash
  git clone https://vcsmgt.atpco.org/Engines/db_refresh.git
  cd db-refresh-copy
  ```

- Install required Python packages:

  ```bash
  pip install -r requirements.txt
  ```

- Configure AWS CLI:

  ```bash
  aws configure
  ```

- Place the input `event.json` file in the `event` directory
## Usage

- Prepare the `event.json` file
- Run the script:

  ```bash
  python db_refresh_copy.py
  ```

- The script will generate an output file in the `output` directory
- Dry-run Mode: Set `"dryRun": true` in `event.json` to simulate the run without making changes
## Input File Format

```json
{
"services": [
{
"enabled": true,
"name": "Routings Engine",
"source": {
"environment": "engu",
"instanceId": "i-0b1b77f5c6948d082",
"path": "/opt/atpco/engine/db/neo4j/chgdetroutings/Routings"
},
"destinations": [
{
"environment": "ppj",
"instanceId": "i-057d31b7dfd13887d",
"path": "/opt/atpco/engine/db/neo4j/chgdetroutings/Routings"
}
]
}
],
"s3Bucket": "ppj-transfer-bucket",
"dryRun": true
}
```
- services: List of services to process (see the loading sketch after this list)
  - `enabled`: Boolean to enable/disable processing of the service
  - `name`: Service name (e.g., "Routings Engine")
  - `source`: Source instance details (environment, instance ID, path)
  - `destinations`: List of destination instances (environment, instance ID, path)
- s3Bucket: S3 bucket for temporary storage
- dryRun: Boolean to enable dry-run mode
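
A minimal sketch of loading and filtering an input file with this structure (the helper name and the `event/event.json` location follow the conventions above, but the script's actual parsing code may differ):

```python
import json
from pathlib import Path

def load_event(path="event/event.json"):
    """Load event.json and return enabled services plus the global settings."""
    event = json.loads(Path(path).read_text())
    return {
        "services": [svc for svc in event.get("services", []) if svc.get("enabled")],
        "s3_bucket": event["s3Bucket"],
        "dry_run": event.get("dryRun", False),
    }
```
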
## Output File Format

The script generates a JSON output file in the `output` directory (e.g., `db_copy_output_YYYYMMDD_HHMMSS.json`) with the following structure (a sketch of how such a file might be written appears after the field descriptions below):
Dry-run Mode:
```json
{
"status": "done",
"results": [
{
"name": "Routings Engine",
"status": "Success",
"upload_result": {
"status": "Success",
"service": "Routings Engine",
"message": "Dry-run: Upload skipped"
},
"download_results": [
{
"environment": "ppj",
"instanceId": "i-057d31b7dfd13887d",
"path": "/opt/atpco/engine/db/neo4j/chgdetroutings/Routings",
"status": "Success",
"service": "Routings Engine",
"message": "Dry-run: Download skipped"
},
{
"environment": "ppj",
"instanceId": "i-0591295a1783a221a",
"path": "/opt/atpco/engine/db/neo4j/chgdetroutings/Routings/",
"status": "Success",
"service": "Routings Engine",
"message": "Dry-run: Download skipped"
}
],
"message": "All destinations completed"
}
]
}
```
Normal Execution:
```json
{
"status": "done",
"results": [
{
"name": "Routings Engine",
"status": "Success",
"upload_result": {
"status": "Success",
"service": "Routings Engine",
"message": "Upload completed",
"duration_seconds": 271.87160444259644
},
"download_results": [
{
"environment": "ppj",
"instanceId": "i-057d31b7dfd13887d",
"path": "/opt/atpco/engine/db/neo4j/chgdetroutings/Routings",
"status": "Success",
"service": "Routings Engine",
"message": "Download and extraction completed for Routings Engine",
"details": [
"Download and extraction completed successfully"
],
"duration_seconds": 209.64332580566406
},
{
"environment": "ppj",
"instanceId": "i-0591295a1783a221a",
"path": "/opt/atpco/engine/db/neo4j/chgdetroutings/Routings/",
"status": "Success",
"service": "Routings Engine",
"message": "Download and extraction completed for Routings Engine",
"details": [
"Download and extraction completed successfully"
],
"duration_seconds": 267.83109521865845
}
],
"message": "All destinations completed"
}
]
}
```
- status: Overall execution status ("done")
- results: List of service results, including:
  - `name`: Service name
  - `status`: Success or Error
  - `upload_result`: Details of the upload process (status, message, duration)
  - `download_results`: List of download results for each destination instance
  - `message`: Summary message
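
A sketch of how a result file with this shape could be written; the directory and filename pattern follow the convention described above, while the helper itself is illustrative:

```python
import json
from datetime import datetime
from pathlib import Path

def write_output(results, out_dir="output"):
    """Write per-service results to output/db_copy_output_YYYYMMDD_HHMMSS.json."""
    Path(out_dir).mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_path = Path(out_dir) / f"db_copy_output_{stamp}.json"
    out_path.write_text(json.dumps({"status": "done", "results": results}, indent=2))
    return out_path
```
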
## How It Works

- Initialization:
  - Loads the input `event.json` file
  - Configures logging and AWS sessions
- Service Processing:
  - Iterates through enabled services in the input file
  - For each service:
    - Source Instance:
      - Stops the service using SSM
      - Checks available disk space
      - Compresses the database dump (splits if >5GB)
      - Uploads the compressed file(s) to S3 (see the sketch after this list)
      - Restarts the service
    - Destination Instances:
      - Stops the service
      - Downloads and extracts the file(s) from S3
      - Restarts the service
      - Verifies file sizes to ensure data integrity
- Output:
  - Generates a JSON output file with execution details
  - Logs all actions and errors
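
The exact commands are defined inside the script; the following is a hedged sketch of the kind of shell pipeline the source instance could run for the compress/split/upload step (the bucket, key prefix, and 5GB split size are illustrative):

```python
import os

def build_upload_commands(dump_path, s3_bucket, key_prefix, split_size="5G"):
    """Build the shell pipeline sent to the source instance via AWS-RunShellScript (illustrative)."""
    archive = f"/tmp/{key_prefix}.tar.gz"
    return [
        # Compress the dump directory with pigz (parallel gzip).
        f"tar -C {os.path.dirname(dump_path)} -cf - {os.path.basename(dump_path)} | pigz > {archive}",
        # Split into parts (the real script splits only when the archive exceeds 5GB).
        f"split -b {split_size} -d {archive} {archive}.part-",
        # Upload the parts to the transfer bucket with the aws CLI on the instance.
        f"aws s3 cp /tmp/ s3://{s3_bucket}/{key_prefix}/ --recursive "
        f"--exclude '*' --include '{os.path.basename(archive)}.part-*'",
    ]
```

The returned command list would then be passed to SSM's `AWS-RunShellScript` document, as in the earlier sketch.
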
## Logging

- The script uses Python's `logging` module with INFO level by default (see the sketch after this list)
- Logs include:
  - Start/end of script execution
  - Service processing status
  - Upload/download progress
  - Errors and warnings
- Logs are prefixed with emojis for clarity (e.g., ✅ for success, ❌ for errors)
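
A representative logging setup matching the description above (the format string, logger name, and example messages are assumptions):

```python
import logging

# INFO-level logging with timestamps; the real script's format string may differ.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
)
log = logging.getLogger("db_refresh_copy")

log.info("🚀 Starting database copy for %s", "Routings Engine")
log.info("📤 Upload to S3 completed")
log.error("❌ SSM command failed on instance %s", "i-0b1b77f5c6948d082")
```
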
## Error Handling

- AWS SSO Errors: Prompts to run `aws sso login --profile <env>` if the SSO session expires (see the sketch after this list)
- Disk Space Issues: Checks available disk space and logs the required cleanup size if insufficient
- SSM Command Failures: Captures and logs SSM command errors
- S3 Upload/Download Failures: Logs specific errors and marks the service status as "Error"
- General Exceptions: Logs full stack traces for debugging
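
As an illustration of the SSO and client error paths (the exception classes come from `botocore`; the script's actual handling may differ):

```python
import logging
from botocore.exceptions import ClientError, SSOTokenLoadError, UnauthorizedSSOTokenError

log = logging.getLogger("db_refresh_copy")

def safe_aws_call(fn, *args, profile="<env>", **kwargs):
    """Wrap an AWS API call and translate common failures into actionable log messages."""
    try:
        return fn(*args, **kwargs)
    except (SSOTokenLoadError, UnauthorizedSSOTokenError):
        log.error("SSO session expired - run: aws sso login --profile %s", profile)
        raise
    except ClientError as exc:
        # Covers SSM send_command and S3 upload/download failures alike.
        log.error("AWS call failed: %s", exc.response["Error"]["Message"])
        raise
```
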
## S3 Policy

The transfer bucket (`ppj-transfer-bucket`) uses a bucket policy like the following (a quick access check is sketched after the policy):

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::985663031727:root"
},
"Action": [
"s3:PutObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::ppj-transfer-bucket",
"arn:aws:s3:::ppj-transfer-bucket/*"
]
},
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::985663031727:root"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::ppj-transfer-bucket/*"
}
]
}
```
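
A quick way to confirm that a profile can actually reach the bucket before starting a transfer; this is a convenience check, not part of the documented script:

```python
import boto3

def check_bucket_access(profile_name, bucket="ppj-transfer-bucket"):
    """Fail fast if the given profile cannot reach the transfer bucket."""
    s3 = boto3.Session(profile_name=profile_name).client("s3")
    s3.head_bucket(Bucket=bucket)                 # raises ClientError on 403/404
    s3.list_objects_v2(Bucket=bucket, MaxKeys=1)  # exercises s3:ListBucket
    print(f"{profile_name}: access to s3://{bucket} looks OK")
```
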