Conversation

@radusuciu radusuciu commented Nov 15, 2025

This closes #34, the issue being that validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate-specific validation when necessary.

Tested and working with an EC2 queue only.

Summary by CodeRabbit

  • New Features

    • Added automatic platform detection for AWS Batch compute environments (EC2 or FARGATE).
    • Implemented platform-specific resource validation with tailored constraints for each compute platform type.
  • Changes

    • Updated default memory allocation to 1024 MiB.

As described in issue snakemake#34, validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate-specific validation when necessary.

coderabbitai bot commented Nov 15, 2025

📝 Walkthrough

The changes introduce platform detection logic to identify whether AWS Batch job queues use EC2 or Fargate compute environments, with corresponding platform-specific resource validation. Two new pass-through methods are added to BatchClient for AWS Batch describe operations.

Changes

Cohort / File(s): Summary

  • BatchClient extensions (snakemake_executor_plugin_aws_batch/batch_client.py):
    Added two new pass-through methods, describe_job_queues(**kwargs) and describe_compute_environments(**kwargs), which delegate AWS Batch operations to the underlying boto3 client.
  • BatchJobBuilder platform support (snakemake_executor_plugin_aws_batch/batch_job_builder.py):
    Added platform detection in __init__ via a new _get_platform_from_queue() method that queries AWS Batch to determine EC2 vs. Fargate. Refactored _validate_resources() to route to platform-specific validators: _validate_fargate_resources() (enforces Fargate vCPU/memory combinations) and _validate_ec2_resources() (basic sanity checks). Updated the default memory from 2048 MiB to 1024 MiB. Changed platformCapabilities in job definition registration from a hardcoded EC2 value to the dynamic self.platform.
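The detection step summarized above can be sketched as a pure helper that classifies a queue from the two describe-call responses. This is an illustrative sketch, not the plugin's actual `_get_platform_from_queue()` code; the function name and the response shapes shown in the usage are assumptions based on the AWS Batch API.

```python
def platform_from_responses(queue_resp: dict, ce_resp: dict) -> str:
    """Classify a job queue as "FARGATE" or "EC2".

    queue_resp: a describe_job_queues response (used only to confirm the
                queue exists)
    ce_resp:    a describe_compute_environments response for the queue's
                attached compute environments
    """
    if not queue_resp.get("jobQueues"):
        return "EC2"  # unknown queue: fall back to the looser EC2 rules
    for env in ce_resp.get("computeEnvironments", []):
        ce_type = env.get("computeResources", {}).get("type", "")
        if ce_type in ("FARGATE", "FARGATE_SPOT"):
            return "FARGATE"
    return "EC2"
```

Because AWS Batch does not allow mixing EC2 and Fargate compute environments in one queue, classifying from the attached environments is unambiguous.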

Sequence Diagram(s)

sequenceDiagram
    participant BatchJobBuilder
    participant AWS as AWS Batch API
    participant Validator as Resource Validator

    BatchJobBuilder->>+AWS: _get_platform_from_queue()
    AWS-->>-BatchJobBuilder: describe_job_queues response
    BatchJobBuilder->>+AWS: describe_compute_environments
    AWS-->>-BatchJobBuilder: compute environment config
    
    alt computeResources.type is FARGATE/FARGATE_SPOT
        BatchJobBuilder->>BatchJobBuilder: platform = FARGATE
    else
        BatchJobBuilder->>BatchJobBuilder: platform = EC2
    end
    
    Note over BatchJobBuilder: Platform determined and stored
    
    BatchJobBuilder->>+Validator: _validate_resources(vcpu, mem)
    
    alt platform == FARGATE
        Validator->>Validator: _validate_fargate_resources()
        Note over Validator: Check vCPU/memory<br/>combos against<br/>Fargate constraints
    else
        Validator->>Validator: _validate_ec2_resources()
        Note over Validator: Enforce vcpu >= 1<br/>and mem >= 1024 MiB
    end
    
    Validator-->>-BatchJobBuilder: validated vcpu, mem

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Platform detection logic: Review AWS Batch API integration in _get_platform_from_queue(), including error handling, fallback behavior, and logging for missing or invalid data.
  • Validator routing and logic: Verify that _validate_resources() correctly routes to platform-specific validators and that both _validate_fargate_resources() and _validate_ec2_resources() enforce the correct constraints for their respective platforms.
  • Integration points: Confirm that the dynamically determined self.platform is correctly propagated to register_job_definition() and that the default memory value change (2048 → 1024 MiB) aligns with product requirements.
  • Edge cases: Check handling of missing or incomplete AWS Batch responses and whether fallback to EC2 is appropriate for all error scenarios.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title clearly identifies the main fix: addressing overly restrictive resource validation that assumed Fargate. It directly relates to the primary changeset.
  • Linked Issues Check: Passed. The PR implements all requirements from issue #34: platform detection from the job queue, Fargate-specific validation, and flexible EC2 resource handling.
  • Out of Scope Changes Check: Passed. All changes are directly scoped to solving issue #34: delegating AWS Batch operations and implementing platform-aware resource validation.
  • Docstring Coverage: Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.

@radusuciu radusuciu changed the title from "fix: issue with resource validation assuming Fargate #34" to "fix: issue with resource validation assuming Fargate (#34)" Nov 15, 2025
@radusuciu radusuciu marked this pull request as ready for review November 15, 2025 17:58
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2525ca0 and 3cc3117.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_aws_batch/batch_client.py (1 hunks)
  • snakemake_executor_plugin_aws_batch/batch_job_builder.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • snakemake_executor_plugin_aws_batch/batch_job_builder.py
  • snakemake_executor_plugin_aws_batch/batch_client.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (2)
snakemake_executor_plugin_aws_batch/batch_client.py (2)
  • describe_job_queues (76-83)
  • describe_compute_environments (85-92)
snakemake_executor_plugin_aws_batch/constant.py (1)
  • BATCH_JOB_PLATFORM_CAPABILITIES (57-59)
🪛 Ruff (0.14.4)
snakemake_executor_plugin_aws_batch/batch_job_builder.py

102-102: Do not catch blind exception: Exception

(BLE001)


118-118: Avoid specifying long messages outside the exception class

(TRY003)


134-134: Avoid specifying long messages outside the exception class

(TRY003)


136-136: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (8)
snakemake_executor_plugin_aws_batch/batch_client.py (1)

76-92: LGTM! Clean pass-through wrappers.

Both methods follow the established pattern in the class and enable the downstream platform detection logic.
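A pass-through wrapper in this style might look like the following. This is a minimal sketch of the pattern, not the plugin's actual BatchClient code; the constructor signature is an assumption.

```python
class BatchClient:
    """Thin wrapper delegating to a boto3 Batch client (illustrative)."""

    def __init__(self, client):
        # `client` would typically be boto3.client("batch")
        self.client = client

    def describe_job_queues(self, **kwargs):
        # Pass-through: forward keyword arguments unchanged to boto3
        return self.client.describe_job_queues(**kwargs)

    def describe_compute_environments(self, **kwargs):
        return self.client.describe_compute_environments(**kwargs)
```

The pass-through shape keeps the wrapper testable with a fake client and leaves all argument validation to the underlying boto3 API.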

snakemake_executor_plugin_aws_batch/batch_job_builder.py (7)

33-34: Good approach to detect platform early.

Determining the platform during initialization enables fail-fast behavior and consistent platform-specific validation throughout the job lifecycle.


102-106: Broad exception catch is acceptable here.

While the static analysis tool flags the broad Exception catch, the fallback to EC2 with logging is appropriate for robustness during initialization. EC2 is the safer default as it has more flexible resource constraints than Fargate.
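The broad-catch-with-fallback pattern discussed above can be sketched generically. The helper name and the callable argument are hypothetical; they stand in for however the plugin invokes its detection logic.

```python
import logging

logger = logging.getLogger("batch_job_builder")

def detect_platform_safely(lookup) -> str:
    """Call a zero-argument platform-lookup callable, falling back to EC2.

    The catch is deliberately broad: any failure during initialization
    (network error, missing permissions, malformed response) should not
    abort the workflow, and EC2 is the safer default because its resource
    constraints are looser than Fargate's.
    """
    try:
        return lookup()
    except Exception as exc:
        logger.warning("Platform detection failed (%s); assuming EC2", exc)
        return "EC2"
```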


127-137: EC2 validation appropriately minimal.

The basic sanity checks (vcpu >= 1, memory >= 1024 MiB) are appropriate for EC2, which supports flexible resource allocation across various instance types. The 1024 MiB minimum aligns with the default memory value change.


139-150: Clean platform-aware routing.

The validation routing correctly delegates to platform-specific validators based on the detected platform. The integer conversion ensures consistent handling across both validators.
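The routing described above can be sketched as follows. The two per-platform validators here are simplified stand-ins for the plugin's _validate_fargate_resources / _validate_ec2_resources, and the Fargate table is an illustrative subset, not the real VALID_RESOURCES_MAPPING.

```python
def fargate_ok(vcpu: int, mem: int) -> tuple[str, str]:
    # Hypothetical subset of Fargate's memory (MiB) -> valid vCPUs table
    allowed = {2048: [1], 4096: [1, 2], 8192: [1, 2, 4]}
    if mem in allowed and vcpu in allowed[mem]:
        return str(vcpu), str(mem)
    raise ValueError(f"Invalid Fargate combination: {vcpu} vCPU / {mem} MiB")

def ec2_ok(vcpu: int, mem: int) -> tuple[str, str]:
    # EC2 only needs basic sanity checks
    if vcpu < 1 or mem < 1024:
        raise ValueError("EC2 requires vcpu >= 1 and mem >= 1024 MiB")
    return str(vcpu), str(mem)

def validate_resources(platform: str, vcpu, mem) -> tuple[str, str]:
    vcpu, mem = int(vcpu), int(mem)  # normalize before validating
    return fargate_ok(vcpu, mem) if platform == "FARGATE" else ec2_ok(vcpu, mem)
```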


160-160: Verify the default memory reduction is intentional.

The default memory was reduced from 2048 MiB to 1024 MiB. While 1024 MiB is the minimum for EC2, this behavioral change could affect existing users who rely on the previous default.

Please confirm:

  1. Is this default change intentional and documented in release notes?
  2. Are there any implications for existing jobs that don't explicitly specify memory?

205-205: Core fix: dynamic platform capability.

Excellent! This change from a hardcoded EC2 value to the dynamically determined self.platform is the key fix that enables proper platform-specific behavior for both EC2 and Fargate environments.


70-71: Checking only the first compute environment is sufficient.

The assumption in the review comment is invalid. AWS Batch does not permit mixing EC2 and FARGATE compute environments in the same job queue — all compute environments attached to a single queue must use the same provisioning model (EC2/Spot or FARGATE/FARGATE_SPOT) and architecture. Since the queue's compute environments are guaranteed to have the same platform, checking only the first one is correct and sufficient.

Likely an incorrect or invalid review comment.

Comment on lines +108 to 125
 def _validate_fargate_resources(self, vcpu: int, mem: int) -> tuple[str, str]:
     """Validates vcpu and memory conform to Fargate requirements.
     Fargate requires strict memory/vCPU combinations.
     https://docs.aws.amazon.com/batch/latest/userguide/fargate.html
     """
     if mem in VALID_RESOURCES_MAPPING:
         if vcpu in VALID_RESOURCES_MAPPING[mem]:
             return str(vcpu), str(mem)
         else:
-            raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB")
+            raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB on Fargate")
     else:
         min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
         self.logger.warning(
-            f"Memory value {mem} MB is invalid for vCPU {vcpu}."
+            f"Memory value {mem} MB is invalid for vCPU {vcpu} on Fargate. "
             f"Setting memory to minimum allowed value {min_mem} MB."
         )
         return str(vcpu), str(min_mem)
⚠️ Potential issue | 🟠 Major

Handle edge case: invalid vCPU with no valid memory mapping.

Line 120 could raise ValueError if the vcpu value doesn't exist in any memory mapping. This happens when both vcpu and mem are invalid.

Consider adding a check before line 120:

         else:
+            # Check if vcpu exists in any memory mapping
+            valid_memories = [m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v]
+            if not valid_memories:
+                raise WorkflowError(
+                    f"Invalid vCPU value {vcpu} for Fargate. "
+                    f"Valid vCPU values are: {sorted(set(v for values in VALID_RESOURCES_MAPPING.values() for v in values))}"
+                )
-            min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
+            min_mem = min(valid_memories)
             self.logger.warning(
🤖 Prompt for AI Agents
In snakemake_executor_plugin_aws_batch/batch_job_builder.py around lines 108 to
125, when mem is not a key in VALID_RESOURCES_MAPPING the current code computes
min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
which will raise a ValueError if the provided vcpu does not appear in any
mapping; add a guard: build the list of candidate memory values first, if it's
empty raise a WorkflowError with a clear message about invalid vCPU and memory
for Fargate, otherwise take min(candidate_memories), log the adjustment as
before and return str(vcpu), str(min_mem).
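Put together, the guarded validator might look like the sketch below. The mapping is an illustrative subset of Fargate's memory-to-vCPU table, not the plugin's actual VALID_RESOURCES_MAPPING, and ValueError stands in for WorkflowError so the sketch is self-contained.

```python
# Hypothetical subset: memory (MiB) -> valid vCPU counts on Fargate
VALID_RESOURCES_MAPPING = {2048: [1], 4096: [1, 2], 8192: [1, 2, 4]}

def validate_fargate_resources(vcpu: int, mem: int) -> tuple[str, str]:
    if mem in VALID_RESOURCES_MAPPING:
        if vcpu in VALID_RESOURCES_MAPPING[mem]:
            return str(vcpu), str(mem)
        raise ValueError(f"Invalid vCPU value {vcpu} for memory {mem} MB on Fargate")
    # Guard against the edge case above: if the vCPU appears in no memory
    # mapping, fail with a clear error instead of letting min() raise
    # ValueError on an empty sequence.
    valid_memories = [m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v]
    if not valid_memories:
        raise ValueError(f"Invalid vCPU value {vcpu} for Fargate")
    return str(vcpu), str(min(valid_memories))
```

The real implementation would also log the memory adjustment, as the original method does.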



Development

Successfully merging this pull request may close these issues.

VALID_RESOURCES_MAPPING is restrictive, and only applicable to Fargate compute environments.
