Support more methods of WDL task disk specification #5001

Merged: 78 commits, Oct 10, 2024
Changes from 8 commits
Commits
78 commits
3971be6
better disk logic and add logic to mount specific points
stxue1 Jun 29, 2024
48990d0
cromwell compatibility
stxue1 Jun 29, 2024
e8d223c
Convert from wdl string to normal string
stxue1 Jun 29, 2024
fbe0eef
Merge branch 'master' into issues/4995-disk-spec-wdl
stxue1 Jun 29, 2024
8f7199a
floats
stxue1 Jul 1, 2024
166bf41
Merge branch 'issues/4995-disk-spec-wdl' of github.com:DataBiosphere/…
stxue1 Jul 1, 2024
d7719b9
Satisfy mypy
stxue1 Jul 1, 2024
0c131e0
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 2, 2024
b4714ec
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 10, 2024
4443728
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 11, 2024
8da0d7d
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 16, 2024
9379715
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 17, 2024
85a4df9
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 18, 2024
6aa73b1
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 19, 2024
cc36f71
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 19, 2024
36fd277
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 19, 2024
c7bec56
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 20, 2024
4b4e7f0
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
6bf2a4e
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
31b7e27
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
98bba55
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
25e0e51
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 26, 2024
42fedf6
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 29, 2024
d35d033
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 30, 2024
9d75e4b
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 31, 2024
f752535
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 1, 2024
4ce33d5
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 3, 2024
e9df2f9
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 8, 2024
01b8102
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 14, 2024
2955c4d
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 15, 2024
a364601
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 19, 2024
02d873f
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 19, 2024
b65b315
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 21, 2024
6f30676
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 22, 2024
58196ce
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 22, 2024
66d3e50
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 22, 2024
ea19cb6
Follow new spec
stxue1 Aug 23, 2024
32c65dd
mypy
stxue1 Aug 23, 2024
63b4410
Merge branch 'issues/4995-disk-spec-wdl' of github.com:DataBiosphere/…
stxue1 Aug 23, 2024
a1a8651
Support cromwell disks attributes for backwards compatibility
stxue1 Aug 23, 2024
29ffd3f
Deal with pipes deprecation
stxue1 Aug 23, 2024
7068810
Update md5sum test to be compatible with newer docker/singularity ver…
stxue1 Aug 23, 2024
c090823
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 27, 2024
14e2ee1
Merge branch 'master' of github.com:DataBiosphere/toil into issues/49…
stxue1 Aug 29, 2024
901c4c2
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 3, 2024
aa58e2f
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 10, 2024
8b15af6
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 10, 2024
e04f5c1
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 12, 2024
ae2f169
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 16, 2024
eb56ef9
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 16, 2024
a21fc3a
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 16, 2024
1a098b4
Address comments
stxue1 Sep 17, 2024
ceccb07
Merge branch 'issues/4995-disk-spec-wdl' of github.com:DataBiosphere/…
stxue1 Sep 17, 2024
839e09b
Update src/toil/wdl/wdltoil.py
stxue1 Sep 17, 2024
89ca0d4
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 24, 2024
bf3ca2a
Fix missing reverse iteration loop and make local-disk disambiguation…
stxue1 Sep 24, 2024
4130ab8
move out disk parse into a function
stxue1 Sep 24, 2024
d194edc
Fix issues with cromwell compatibility
stxue1 Sep 24, 2024
f2080a8
Move local-disk into parse and dont convert_units in parse function
stxue1 Sep 24, 2024
f0dbe3f
Fix edge case where only size is requested
stxue1 Sep 24, 2024
e6a9082
Add tests
stxue1 Sep 24, 2024
6fbef8c
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 26, 2024
68fb254
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 26, 2024
8d093dd
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 26, 2024
b184386
Remove redef
stxue1 Sep 27, 2024
67b2554
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 30, 2024
7f4b452
Add docstring and remove dead comment
stxue1 Oct 1, 2024
07d6b31
Merge branch 'master' of github.com:DataBiosphere/toil into issues/49…
stxue1 Oct 2, 2024
f3a3a2f
Add back dropped mount_spec argument
stxue1 Oct 2, 2024
9ab5c9c
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 3, 2024
0f247a6
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 3, 2024
067ee8d
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 3, 2024
a7a1459
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 4, 2024
c7ac131
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 4, 2024
806ff9c
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 7, 2024
acaa894
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 8, 2024
accf831
Merge branch 'master' of github.com:DataBiosphere/toil into issues/49…
stxue1 Oct 9, 2024
a0b65f4
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 9, 2024
123 changes: 100 additions & 23 deletions src/toil/wdl/wdltoil.py
@@ -1747,40 +1747,71 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
memory_spec = human2bytes(memory_spec)
runtime_memory = memory_spec

mount_spec: Dict[str, int] = dict()
if runtime_bindings.has_binding('disks'):
# Miniwdl doesn't have this, but we need to be able to parse things like:
# local-disk 5 SSD
# which would mean we need 5 GB space. Cromwell docs for this are at https://cromwell.readthedocs.io/en/stable/RuntimeAttributes/#disks
# We ignore all disk types, and complain if the mount point is not `local-disk`.
disks_spec: str = runtime_bindings.resolve('disks').value
all_specs = disks_spec.split(',')
disks_spec: Union[List[WDL.Value.String], str] = runtime_bindings.resolve('disks').value
if isinstance(disks_spec, list):
# SPEC says to use the first one
# the parser gives an array of WDL string objects
all_specs = [part.value for part in disks_spec]
else:
all_specs = disks_spec.split(',')
# Sum up the gigabytes in each disk specification
stxue1 marked this conversation as resolved.
total_gb = 0
total_bytes: float = 0
for spec in all_specs:
# Split up each spec as space-separated. We assume no fields
# are empty, and we want to allow people to use spaces after
# their commas when separating the list, like in Cromwell's
# examples, so we strip whitespace.
spec_parts = spec.strip().split(' ')
if len(spec_parts) != 3:
# TODO: Add a WDL line to this error
raise ValueError(f"Could not parse disks = {disks_spec} because {spec} does not have 3 space-separated parts")
if spec_parts[0] != 'local-disk':
# TODO: Add a WDL line to this error
raise NotImplementedError(f"Could not provide disks = {disks_spec} because only the local-disks mount point is implemented")
try:
total_gb += int(spec_parts[1])
except:
# TODO: Add a WDL line to this error
raise ValueError(f"Could not parse disks = {disks_spec} because {spec_parts[1]} is not an integer")
part_size = None
# default to GiB as per spec
part_suffix: str = "GiB"
# default to the execution directory
part_mount_point: str = self._wdl_options.get("execution_dir") or os.getcwd()
for i, part in enumerate(spec_parts):
if part.replace(".", "", 1).isdigit():
# round down floats
part_size = int(float(part))
continue
if i == 0:
# mount point is always the first
part_mount_point = part
continue
if part_size is not None:
# suffix will always be after the size, if it exists
part_suffix = part
continue
Member:
We don't have anything here to prohibit extraneous pieces. We probably should reject anything that follows neither the spec nor Cromwell's convention, because in that case we know we can't do whatever is being asked for.


if part_size is None:
# Disk spec did not include a size
raise ValueError(f"Could not parse disks = {disks_spec} because {spec} does not specify a disk size")

per_part_size = convert_units(part_size, part_suffix)
total_bytes += per_part_size
if mount_spec.get(part_mount_point) is not None:
# raise an error as all mount points must be unique
raise ValueError(f"Could not parse disks = {disks_spec} because the mount point {part_mount_point} is specified multiple times")

# TODO: we always ignore the disk type and assume we have the right one.
# TODO: Cromwell rounds LOCAL disks up to the nearest 375 GB. I
# can't imagine that ever being standardized; just leave it
# alone so that the workflow doesn't rely on this weird and
# likely-to-change Cromwell detail.
if spec_parts[2] == 'LOCAL':
if part_mount_point != "local-disk":
# Don't mount local-disk. This isn't in the spec, but is carried over from cromwell
mount_spec[part_mount_point] = int(per_part_size)

if not os.path.exists(part_mount_point):
# this isn't a valid mount point
raise NotImplementedError(f"Cannot use mount point {part_mount_point} as it does not exist")
Member:
This is what the spec says we're supposed to require (the mount point needs to exist on the "host system"), but I've read the disks section of the spec twice and that requirement doesn't really make any sense, because we're mounting storage into the container. The spec doesn't say we actually do anything with this path on the host.


if part_suffix == "LOCAL":
# TODO: Cromwell rounds LOCAL disks up to the nearest 375 GB. I
# can't imagine that ever being standardized; just leave it
# alone so that the workflow doesn't rely on this weird and
# likely-to-change Cromwell detail.
logger.warning('Not rounding LOCAL disk to the nearest 375 GB; workflow execution will differ from Cromwell!')
Member:
We should also probably switch the default unit to GB here, since that is what the Cromwell syntax expects.

Contributor Author:
I think it's better to keep the default unit to GiB as that is the WDL spec default https://github.com/openwdl/wdl/blob/e43e042104b728df1f1ad6e6145945d2b32331a6/SPEC.md?plain=1#L5082

total_bytes: float = convert_units(total_gb, 'GB')
runtime_disk = int(total_bytes)
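The GiB-versus-GB question debated in the review is not cosmetic: the two interpretations differ by about 7% per unit. A minimal sketch of the distinction (`to_bytes` is a hypothetical helper for illustration, not Toil's actual `convert_units`):

```python
def to_bytes(size: float, unit: str = "GiB") -> int:
    """Convert a size to bytes. GiB (binary) is the WDL spec default;
    GB (decimal) is what Cromwell's 'local-disk N SSD' syntax implies."""
    units = {"GB": 10**9, "GiB": 2**30, "MB": 10**6, "MiB": 2**20}
    return int(size * units[unit])

# The same numeric request yields different byte counts:
print(to_bytes(5, "GiB"))  # 5368709120
print(to_bytes(5, "GB"))   # 5000000000
```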

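The token-scanning logic in the hunk above (first numeric token is the size, an earlier token is the mount point, a later token is the unit or disk-type suffix) can be sketched as a standalone function. All names here are illustrative, not Toil's actual API, and unknown suffixes such as SSD fall back to the spec-default GiB:

```python
from typing import Optional, Tuple

# Bytes per unit; GiB is the WDL spec default.
UNIT_BYTES = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9,
              "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

def parse_disk_spec(spec: str, default_mount: str = "/") -> Tuple[str, float, str]:
    """Parse one disk spec like 'local-disk 5 SSD', '/mnt/data 10 GiB', or '10'.

    Returns (mount_point, size_bytes, suffix). Mirrors the loop in the diff:
    the first numeric token is the size, a token before it (position 0) is the
    mount point, and a token after it is the suffix.
    """
    parts = spec.strip().split(" ")
    size: Optional[int] = None
    suffix = "GiB"
    mount_point = default_mount
    for i, part in enumerate(parts):
        if part.replace(".", "", 1).isdigit():
            size = int(float(part))  # round floats down, as the diff does
            continue
        if i == 0:
            mount_point = part  # mount point is always first if present
            continue
        if size is not None:
            suffix = part  # suffix always follows the size
    if size is None:
        raise ValueError(f"disk spec {spec!r} does not specify a size")
    size_bytes = size * UNIT_BYTES.get(suffix, UNIT_BYTES["GiB"])
    return mount_point, size_bytes, suffix
```

For example, `parse_disk_spec("local-disk 5 SSD")` treats SSD as a disk type, not a unit, and sizes the request in GiB.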
# The gpu field is the WDL 1.1 standard, so this field will be the absolute truth on whether to use GPUs or not
@@ -1806,7 +1837,7 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
runtime_accelerators = [accelerator_requirement]

# Schedule to get resources. Pass along the bindings from evaluating all the inputs and decls, and the runtime, with files virtualized.
run_job = WDLTaskJob(self._task, virtualize_files(bindings, standard_library), virtualize_files(runtime_bindings, standard_library), self._task_id, self._namespace, self._task_path, cores=runtime_cores or self.cores, memory=runtime_memory or self.memory, disk=runtime_disk or self.disk, accelerators=runtime_accelerators or self.accelerators, wdl_options=self._wdl_options)
run_job = WDLTaskJob(self._task, virtualize_files(bindings, standard_library), virtualize_files(runtime_bindings, standard_library), self._task_id, self._namespace, self._task_path, mount_spec, cores=runtime_cores or self.cores, memory=runtime_memory or self.memory, disk=runtime_disk or self.disk, accelerators=runtime_accelerators or self.accelerators, wdl_options=self._wdl_options)
# Run that as a child
self.addChild(run_job)

@@ -1829,7 +1860,7 @@ class WDLTaskJob(WDLBaseJob):
All bindings are in terms of task-internal names.
"""

def __init__(self, task: WDL.Tree.Task, task_internal_bindings: Promised[WDLBindings], runtime_bindings: Promised[WDLBindings], task_id: List[str], namespace: str, task_path: str, **kwargs: Any) -> None:
def __init__(self, task: WDL.Tree.Task, task_internal_bindings: Promised[WDLBindings], runtime_bindings: Promised[WDLBindings], task_id: List[str], namespace: str, task_path: str, mount_spec: Dict[str, int], **kwargs: Any) -> None:
"""
Make a new job to run a task.

@@ -1853,6 +1884,7 @@ def __init__(self, task: WDL.Tree.Task, task_internal_bindings: Promised[WDLBind
self._task_id = task_id
self._namespace = namespace
self._task_path = task_path
self._mount_spec = mount_spec

###
# Runtime code injection system
@@ -2204,8 +2236,53 @@ def patched_run_invocation(*args: Any, **kwargs: Any) -> List[str]:
return command_line

# Apply the patch
task_container._run_invocation = patched_run_invocation # type: ignore
task_container._run_invocation = patched_run_invocation # type: ignore

singularity_original_prepare_mounts = task_container.prepare_mounts

def patch_prepare_mounts_singularity() -> List[Tuple[str, str, bool]]:
"""
Mount the mount points specified from the disk requirements.

The singularity and docker patch are separate as they have different function signatures
"""
# todo: support AWS EBS/Kubernetes persistent volumes
# this logic likely only works for local clusters as we don't deal with the size of each mount point
mounts: List[Tuple[str, str, bool]] = singularity_original_prepare_mounts()
for mount_point, _ in self._mount_spec.items():
abs_mount_point = os.path.abspath(mount_point)
mounts.append((abs_mount_point, abs_mount_point, True))
Member:
How sure are you that the spec actually means you're supposed to mount this particular host path through to the container?

It does say that the host path is required to exist. But it also says a lot about mounting "persistent volumes" at these mount points, and making them the required size. It doesn't seem to say anything about mounting those host paths specifically into the container.

What if you ask for a 100 GiB /tmp mount, and /tmp exists on the host, but /tmp on the host is only 10 GiB and Toil is actually doing all its work on a much larger /scratch/tmp? Shouldn't you get a 100 GiB /tmp mount in the container that actually refers to some location in /scratch/tmp on the host?

If the mount point feature was really meant to mount particular host paths across, wouldn't it take both a host-side path and a container-side path like the actual underlying mount functionality uses?

return mounts
task_container.prepare_mounts = patch_prepare_mounts_singularity # type: ignore[method-assign]
elif isinstance(task_container, SwarmContainer):
docker_original_prepare_mounts = task_container.prepare_mounts

try:
# miniwdl depends on docker so this should be available but check just in case
import docker
# docker stubs are still WIP: https://github.com/docker/docker-py/issues/2796
from docker.types import Mount # type: ignore[import-untyped]

def patch_prepare_mounts_docker(logger: logging.Logger) -> List[Mount]:
"""
Same as the singularity patch but for docker
"""
mounts: List[Mount] = docker_original_prepare_mounts(logger)
for mount_point, _ in self._mount_spec.items():
abs_mount_point = os.path.abspath(mount_point)
mounts.append(
Mount(
abs_mount_point.rstrip("/").replace("{{", '{{"{{"}}'),
abs_mount_point.rstrip("/").replace("{{", '{{"{{"}}'),
type="bind",
)
)
return mounts
task_container.prepare_mounts = patch_prepare_mounts_docker # type: ignore[method-assign]
except ImportError:
logger.warning("Docker package not installed. Unable to add mount points.")
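The monkey-patch pattern used in both branches above, capturing the original bound method, wrapping it, and assigning the wrapper back onto the instance, can be illustrated with a dummy container class (all names here are hypothetical; miniwdl's real TaskContainer is more involved):

```python
import os
from typing import List, Tuple

class DummyContainer:
    """Stand-in for miniwdl's TaskContainer, for illustration only."""
    def prepare_mounts(self) -> List[Tuple[str, str, bool]]:
        # (host_path, container_path, writable)
        return [("/work/inputs", "/mnt/inputs", False)]

container = DummyContainer()
mount_spec = {"/mnt/scratch": 10 * 2**30}  # mount point -> size in bytes

original_prepare_mounts = container.prepare_mounts  # capture the bound original

def patched_prepare_mounts() -> List[Tuple[str, str, bool]]:
    mounts = original_prepare_mounts()
    for mount_point in mount_spec:
        abs_point = os.path.abspath(mount_point)
        mounts.append((abs_point, abs_point, True))  # writable bind mount
    return mounts

# Assigning to the instance attribute shadows the class method for this
# object only, which is why the wrapper takes no self argument.
container.prepare_mounts = patched_prepare_mounts  # type: ignore[method-assign]
```

The reviewer's point stands, though: upstreaming a hook into miniwdl would avoid this instance-level shadowing entirely.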
Member:
I'm not sure it makes sense to add the ability to make these mounts in Toil as a monkey-patch. There's nothing here that wouldn't make just as much sense in MiniWDL (or more, since MiniWDL actually has a shared filesystem as an assumption), so instead of doing monkey-patches we should PR this machinery to MiniWDL.

Or if we're starting to add multiple monkey-patches to the TaskContainers, maybe we really want to extend them instead?

Contributor Author stxue1 commented on Aug 23, 2024:

It probably is a good idea to PR this machinery to MiniWDL. The main reason why I had to monkeypatch this in the first place is because MiniWDL doesn't actually support the disks runtime attribute. Instead of PR-ing the code one-to-one, I think a PR that adds functionality to extend the list of mount points for both docker and singularity is best

# Show the runtime info to the container
task_container.process_runtime(miniwdl_logger, {binding.name: binding.value for binding in devirtualize_files(runtime_bindings, standard_library)})
