fix: issue with resource validation assuming Fargate (#34) #37
base: main
Conversation
As described in issue snakemake#34, validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue and only run the Fargate-specific validation when necessary.
📝 Walkthrough

The changes introduce platform detection logic to identify whether AWS Batch job queues use EC2 or Fargate compute environments, with corresponding platform-specific resource validation. Two new pass-through methods are added to BatchClient for AWS Batch describe operations.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant BatchJobBuilder
    participant AWS as AWS Batch API
    participant Validator as Resource Validator
    BatchJobBuilder->>+AWS: _get_platform_from_queue()
    AWS-->>-BatchJobBuilder: describe_job_queues response
    BatchJobBuilder->>+AWS: describe_compute_environments
    AWS-->>-BatchJobBuilder: compute environment config
    alt computeResources.type is FARGATE/FARGATE_SPOT
        BatchJobBuilder->>BatchJobBuilder: platform = FARGATE
    else
        BatchJobBuilder->>BatchJobBuilder: platform = EC2
    end
    Note over BatchJobBuilder: Platform determined and stored
    BatchJobBuilder->>+Validator: _validate_resources(vcpu, mem)
    alt platform == FARGATE
        Validator->>Validator: _validate_fargate_resources()
        Note over Validator: Check vCPU/memory<br/>combos against<br/>Fargate constraints
    else
        Validator->>Validator: _validate_ec2_resources()
        Note over Validator: Enforce vcpu >= 1<br/>and mem >= 1024 MiB
    end
    Validator-->>-BatchJobBuilder: validated vcpu, mem
```
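For illustration, the detection step sketched in the diagram could be implemented roughly like this, assuming boto3-shaped responses; the function name, queue name, and exact response handling are hypothetical, not necessarily what the PR uses:

```python
import boto3


def get_platform_from_queue(batch, job_queue: str) -> str:
    """Infer whether a job queue is backed by Fargate or EC2 compute environments."""
    queue = batch.describe_job_queues(jobQueues=[job_queue])["jobQueues"][0]
    ce_arns = [ce["computeEnvironment"] for ce in queue["computeEnvironmentOrder"]]
    envs = batch.describe_compute_environments(computeEnvironments=ce_arns)
    # A queue's compute environments all share one provisioning model,
    # so inspecting the first one is sufficient.
    ce_type = envs["computeEnvironments"][0].get("computeResources", {}).get("type", "")
    return "FARGATE" if ce_type in ("FARGATE", "FARGATE_SPOT") else "EC2"


# Usage (hypothetical queue name):
# platform = get_platform_from_queue(boto3.client("batch"), "my-queue")
```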
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (5 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- snakemake_executor_plugin_aws_batch/batch_client.py (1 hunks)
- snakemake_executor_plugin_aws_batch/batch_job_builder.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the `self` argument of methods.
Do not suggest type annotation of the `cls` argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a `return` statement.
Files:
- snakemake_executor_plugin_aws_batch/batch_job_builder.py
- snakemake_executor_plugin_aws_batch/batch_client.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (2)
snakemake_executor_plugin_aws_batch/batch_client.py (2)
- describe_job_queues (76-83)
- describe_compute_environments (85-92)
snakemake_executor_plugin_aws_batch/constant.py (1)
- BATCH_JOB_PLATFORM_CAPABILITIES (57-59)
🪛 Ruff (0.14.4)
snakemake_executor_plugin_aws_batch/batch_job_builder.py
102-102: Do not catch blind exception: Exception
(BLE001)
118-118: Avoid specifying long messages outside the exception class
(TRY003)
134-134: Avoid specifying long messages outside the exception class
(TRY003)
136-136: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (8)
snakemake_executor_plugin_aws_batch/batch_client.py (1)
76-92: LGTM! Clean pass-through wrappers. Both methods follow the established pattern in the class and enable the downstream platform detection logic.
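For context, pass-through wrappers of this kind usually look something like the following sketch; the actual constructor and method bodies in batch_client.py may differ:

```python
import boto3


class BatchClient:
    """Thin wrapper around the boto3 Batch client (illustrative excerpt)."""

    def __init__(self, region_name: str | None = None):
        self.client = boto3.client("batch", region_name=region_name)

    def describe_job_queues(self, **kwargs):
        # Pass straight through to the underlying boto3 call.
        return self.client.describe_job_queues(**kwargs)

    def describe_compute_environments(self, **kwargs):
        # Pass straight through to the underlying boto3 call.
        return self.client.describe_compute_environments(**kwargs)
```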
snakemake_executor_plugin_aws_batch/batch_job_builder.py (7)
33-34: Good approach to detect platform early. Determining the platform during initialization enables fail-fast behavior and consistent platform-specific validation throughout the job lifecycle.
102-106: Broad exception catch is acceptable here. While the static analysis tool flags the broad `Exception` catch, the fallback to EC2 with logging is appropriate for robustness during initialization. EC2 is the safer default, as it has more flexible resource constraints than Fargate.
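The fallback pattern described here might look roughly like this (a sketch building on the detection helper above; names are illustrative):

```python
import logging

logger = logging.getLogger(__name__)


def detect_platform_safely(batch, job_queue: str) -> str:
    """Fall back to EC2 when the queue or compute environment cannot be inspected."""
    try:
        return get_platform_from_queue(batch, job_queue)  # helper sketched earlier
    except Exception as err:  # noqa: BLE001 - deliberate broad catch during init
        logger.warning(
            "Could not determine platform for queue %s (%s); defaulting to EC2",
            job_queue,
            err,
        )
        return "EC2"
```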
127-137: EC2 validation appropriately minimal. The basic sanity checks (vcpu >= 1, memory >= 1024 MiB) are appropriate for EC2, which supports flexible resource allocation across various instance types. The 1024 MiB minimum aligns with the default memory value change.
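Based on the limits described in the comment, the EC2 validator presumably reduces to a couple of sanity checks, roughly as in this sketch (the WorkflowError import path and messages are assumptions):

```python
from snakemake_interface_common.exceptions import WorkflowError  # assumed import path


def validate_ec2_resources(vcpu: int, mem: int) -> tuple[str, str]:
    """Minimal sanity checks for EC2-backed queues."""
    if vcpu < 1:
        raise WorkflowError(f"Invalid vCPU value {vcpu}: EC2 jobs need at least 1 vCPU")
    if mem < 1024:
        raise WorkflowError(f"Invalid memory value {mem} MiB: EC2 jobs need at least 1024 MiB")
    return str(vcpu), str(mem)
```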
139-150: Clean platform-aware routing. The validation routing correctly delegates to platform-specific validators based on the detected platform. The integer conversion ensures consistent handling across both validators.
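The routing itself is then a small dispatch, roughly as follows (using the EC2 sketch above; the Fargate validator is the method quoted in the diff further down):

```python
def validate_fargate_resources(vcpu: int, mem: int) -> tuple[str, str]:
    ...  # strict memory/vCPU combinations, as in the method quoted below


def validate_resources(platform: str, vcpu, mem) -> tuple[str, str]:
    """Dispatch to the platform-specific validator after normalizing to int."""
    vcpu, mem = int(vcpu), int(mem)
    if platform == "FARGATE":
        return validate_fargate_resources(vcpu, mem)
    return validate_ec2_resources(vcpu, mem)  # sketch from the previous snippet
```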
160-160: Verify the default memory reduction is intentional. The default memory was reduced from 2048 MiB to 1024 MiB. While 1024 MiB is the minimum for EC2, this behavioral change could affect existing users who rely on the previous default.
Please confirm:
- Is this default change intentional and documented in release notes?
- Are there any implications for existing jobs that don't explicitly specify memory?
205-205: Core fix: dynamic platform capability. Excellent! This change from a hardcoded EC2 value to the dynamically determined `self.platform` is the key fix that enables proper platform-specific behavior for both EC2 and Fargate environments.
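In the boto3 Batch API, the platform capability is supplied when registering the job definition, so the change presumably boils down to passing the detected value instead of a literal; the surrounding arguments in this sketch are illustrative:

```python
import boto3

batch = boto3.client("batch")
platform = "EC2"  # or "FARGATE", as detected from the job queue

response = batch.register_job_definition(
    jobDefinitionName="snakemake-job",          # illustrative name
    type="container",
    platformCapabilities=[platform],            # previously hardcoded, now dynamic
    containerProperties={
        "image": "snakemake/snakemake:latest",  # illustrative image
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "1024"},
        ],
        # Fargate additionally requires executionRoleArn and networkConfiguration.
    },
)
```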
70-71: The assumption in the review comment is invalid. AWS Batch does not permit mixing EC2 and FARGATE compute environments in the same job queue: all compute environments attached to a single queue must use the same provisioning model (EC2/Spot or FARGATE/FARGATE_SPOT) and architecture. Since the queue's compute environments are guaranteed to have the same platform, checking only the first one is correct and sufficient.
Likely an incorrect or invalid review comment.
```diff
     def _validate_fargate_resources(self, vcpu: int, mem: int) -> tuple[str, str]:
         """Validates vcpu and memory conform to Fargate requirements.
         Fargate requires strict memory/vCPU combinations.
         https://docs.aws.amazon.com/batch/latest/userguide/fargate.html
         """
         if mem in VALID_RESOURCES_MAPPING:
             if vcpu in VALID_RESOURCES_MAPPING[mem]:
                 return str(vcpu), str(mem)
             else:
-                raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB")
+                raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB on Fargate")
         else:
             min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
             self.logger.warning(
-                f"Memory value {mem} MB is invalid for vCPU {vcpu}."
+                f"Memory value {mem} MB is invalid for vCPU {vcpu} on Fargate. "
                 f"Setting memory to minimum allowed value {min_mem} MB."
             )
             return str(vcpu), str(min_mem)
```
Handle edge case: invalid vCPU with no valid memory mapping.
Line 120 could raise ValueError if the vcpu value doesn't exist in any memory mapping. This happens when both vcpu and mem are invalid.
Consider adding a check before line 120:
```diff
         else:
+            # Check if vcpu exists in any memory mapping
+            valid_memories = [m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v]
+            if not valid_memories:
+                raise WorkflowError(
+                    f"Invalid vCPU value {vcpu} for Fargate. "
+                    f"Valid vCPU values are: {sorted(set(v for values in VALID_RESOURCES_MAPPING.values() for v in values))}"
+                )
-            min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
+            min_mem = min(valid_memories)
             self.logger.warning(
```

🧰 Tools
🪛 Ruff (0.14.4)
118-118: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In snakemake_executor_plugin_aws_batch/batch_job_builder.py around lines 108 to
125, when mem is not a key in VALID_RESOURCES_MAPPING the current code computes
min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
which will raise a ValueError if the provided vcpu does not appear in any
mapping; add a guard: build the list of candidate memory values first, if it's
empty raise a WorkflowError with a clear message about invalid vCPU and memory
for Fargate, otherwise take min(candidate_memories), log the adjustment as
before and return str(vcpu), str(min_mem).
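Putting the suggestion together, a guarded version of the Fargate branch could look roughly like this standalone sketch; `VALID_RESOURCES_MAPPING` is passed in explicitly here, and the import path and messages are assumptions:

```python
from snakemake_interface_common.exceptions import WorkflowError  # assumed import path


def validate_fargate_resources(vcpu: int, mem: int, mapping: dict, logger) -> tuple[str, str]:
    """Fargate allows only specific memory/vCPU combinations (see the AWS Batch docs)."""
    if mem in mapping:
        if vcpu in mapping[mem]:
            return str(vcpu), str(mem)
        raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB on Fargate")
    # Guard: if the vCPU value is not valid for *any* memory size, fail clearly
    # instead of letting min() raise ValueError on an empty sequence.
    valid_memories = [m for m, v in mapping.items() if vcpu in v]
    if not valid_memories:
        raise WorkflowError(
            f"Invalid vCPU/memory combination ({vcpu} vCPU, {mem} MB) for Fargate"
        )
    min_mem = min(valid_memories)
    logger.warning(
        f"Memory value {mem} MB is invalid for vCPU {vcpu} on Fargate. "
        f"Setting memory to minimum allowed value {min_mem} MB."
    )
    return str(vcpu), str(min_mem)
```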
This closes #34, the issue being that validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue and only run the Fargate-specific validation when necessary.
Tested and working with an EC2 queue only.
Summary by CodeRabbit
New Features
Changes