Support more methods of WDL task disk specification #5001
base: master
Conversation
…toil into issues/4995-disk-spec-wdl
I think we might need to go back and get clarification from the WDL folks before we can implement this properly.
The spec talks about "persistent volumes", but doesn't really explain what those are or which tasks would expect to be able to read which other tasks' writes. The implementation here doesn't actually provide any kind of persistence that I can see, unless running somewhere where the worker nodes have persistent filesystems.
It's not really clear to me whether we're meant to be mounting arbitrary host paths into the containers, or doing something more like Docker volumes. There's a requirement for the host-side path to exist but no other evidence that the task would be able to expect to actually access anything at that host-side path, and the execution engine is somehow responsible for making the volumes be the right size, which is impossible if it is just meant to mount across whatever's already there.
Can we dig up any workflows that genuinely use the mountpoint feature as more than just a test case, to see how they expect it to behave? Can we find or elicit any more explanation from the spec authors as to what the mount point feature is meant to accomplish?
It also might not really make sense to implement this on our own in Toil without some support from MiniWDL. We rely on MiniWDL for ordering up an appropriate container given the runtime spec, and unless we need to hook it into Toil's job requirements logic or make the batch system do special things with batch-system-specific persistent storage, it would be best if we could just get this feature for free when it shows up in MiniWDL.
src/toil/wdl/wdltoil.py
Outdated
```python
if not os.path.exists(part_mount_point):
    # this isn't a valid mount point
    raise NotImplementedError(f"Cannot use mount point {part_mount_point} as it does not exist")
```
This is what the spec says we're supposed to require (the mount point needs to exist on the "host system"), but I've read the disks section of the spec twice and that requirement doesn't really make any sense, because we're mounting storage into the container. The spec doesn't say we actually do anything with this path on the host.
src/toil/wdl/wdltoil.py
Outdated
```python
singularity_original_prepare_mounts = task_container.prepare_mounts

def patch_prepare_mounts_singularity() -> List[Tuple[str, str, bool]]:
    """
    Mount the mount points specified from the disk requirements.

    The singularity and docker patches are separate as they have different function signatures.
    """
    # todo: support AWS EBS/Kubernetes persistent volumes
    # this logic likely only works for local clusters as we don't deal with the size of each mount point
    mounts: List[Tuple[str, str, bool]] = singularity_original_prepare_mounts()
    for mount_point, _ in self._mount_spec.items():
        abs_mount_point = os.path.abspath(mount_point)
        mounts.append((abs_mount_point, abs_mount_point, True))
    return mounts

task_container.prepare_mounts = patch_prepare_mounts_singularity  # type: ignore[method-assign]
elif isinstance(task_container, SwarmContainer):
    docker_original_prepare_mounts = task_container.prepare_mounts

    try:
        # miniwdl depends on docker so this should be available but check just in case
        import docker
        # docker stubs are still WIP: https://github.com/docker/docker-py/issues/2796
        from docker.types import Mount  # type: ignore[import-untyped]

        def patch_prepare_mounts_docker(logger: logging.Logger) -> List[Mount]:
            """
            Same as the singularity patch but for docker.
            """
            mounts: List[Mount] = docker_original_prepare_mounts(logger)
            for mount_point, _ in self._mount_spec.items():
                abs_mount_point = os.path.abspath(mount_point)
                mounts.append(
                    Mount(
                        abs_mount_point.rstrip("/").replace("{{", '{{"{{"}}'),
                        abs_mount_point.rstrip("/").replace("{{", '{{"{{"}}'),
                        type="bind",
                    )
                )
            return mounts

        task_container.prepare_mounts = patch_prepare_mounts_docker  # type: ignore[method-assign]
    except ImportError:
        logger.warning("Docker package not installed. Unable to add mount points.")
```
I'm not sure it makes sense to add the ability to make these mounts in Toil as a monkey-patch. There's nothing here that wouldn't make just as much sense in MiniWDL (or more, since MiniWDL actually has a shared filesystem as an assumption), so instead of doing monkey-patches we should PR this machinery to MiniWDL.
Or if we're starting to add multiple monkey-patches to the TaskContainers, maybe we really want to extend them instead?
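To make the "extend them instead" option concrete, here is a minimal sketch of what subclassing could look like. The base class is stubbed in so the snippet is self-contained (MiniWDL's real `SingularityContainer` lives elsewhere), and the `mount_spec` attribute is a hypothetical addition, not an existing MiniWDL field:

```python
import os

class SingularityContainerStub:
    """Stand-in for MiniWDL's SingularityContainer, stubbed so this
    sketch runs on its own."""
    def prepare_mounts(self):
        # The real class computes input/output mounts here.
        return [("/input", "/input", False)]

class MountAwareContainer(SingularityContainerStub):
    """Extend, rather than monkey-patch, the base class's mounts."""
    def __init__(self, mount_spec):
        # mount point -> requested size in bytes (hypothetical attribute)
        self.mount_spec = mount_spec

    def prepare_mounts(self):
        mounts = super().prepare_mounts()
        for mount_point in self.mount_spec:
            abs_mount_point = os.path.abspath(mount_point)
            # (host path, container path, writable) as in the singularity patch
            mounts.append((abs_mount_point, abs_mount_point, True))
        return mounts
```

The subclass keeps the base behavior reachable through `super()`, so later MiniWDL changes to `prepare_mounts` are picked up automatically instead of being shadowed by an instance-level reassignment.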
It probably is a good idea to PR this machinery to MiniWDL. The main reason why I had to monkeypatch this in the first place is because MiniWDL doesn't actually support the `disks` runtime attribute. Instead of PR-ing the code one-to-one, I think a PR that adds functionality to extend the list of mount points for both docker and singularity is best.
src/toil/wdl/wdltoil.py
Outdated
```python
# this logic likely only works for local clusters as we don't deal with the size of each mount point
for mount_point, _ in self._mount_spec.items():
    abs_mount_point = os.path.abspath(mount_point)
    mounts.append((abs_mount_point, abs_mount_point, True))
```
How sure are you that the spec actually means you're supposed to mount this particular host path through to the container?

It does say that the host path is required to exist. But it also says a lot about mounting "persistent volumes" at these mount points, and making them the required size. It doesn't seem to say anything about mounting those host paths specifically into the container.

What if you ask for a 100 GiB `/tmp` mount, and `/tmp` exists on the host, but `/tmp` on the host is only 10 GiB and Toil is actually doing all its work on a much larger `/scratch/tmp`? Shouldn't you get a 100 GiB `/tmp` mount in the container that actually refers to some location in `/scratch/tmp` on the host?

If the mount point feature was really meant to mount particular host paths across, wouldn't it take both a host-side path and a container-side path like the actual underlying mount functionality uses?
@stxue1 Don't we now have the clarification we need so that we can finish this? I think the WDL spec is being revised to make it clearer that the mounts are meant to request so much storage available at such-and-such a path in the container, and that they are not actually meant to mount specific paths into the container. But we still need the changes in this PR adding array-of-mounts support.
…toil into issues/4995-disk-spec-wdl
I think the total space requirement is not actually getting through to the job's disk requirement field. Also, we're only checking for enough space for each mountpoint individually, when we know they're all going to be fulfilled from the same underlying filesystem and we can just check for the total amount of free space.
src/toil/wdl/wdltoil.py
Outdated
```python
if part.replace(".", "", 1).isdigit():
    # round down floats
    part_size = int(float(part))
    continue
if i == 0:
    # mount point is always the first
    specified_mount_point = part
    continue
```
The mount point isn't always the first; if the first item is all numeric it gets interpreted as a size and not a mount point.

I think we get away with this because it's impossible to have an all-numeric mount point that is valid, since the mount point is specified to be an "absolute" path and thus necessarily contains `/`.

But if someone asks for `001 15 GiB` hoping for a directory named `001` somewhere, we are not going to parse that like they meant it. I think we will interpret that as a root volume size of 15 GiB.
src/toil/wdl/wdltoil.py
Outdated
```python
if part_size is not None:
    # suffix will always be after the size, if it exists
    part_suffix = part
    continue
```
We don't have anything here to prohibit extraneous pieces. We probably should reject anything that follows neither the spec nor Cromwell's convention, because in that case we know we can't do whatever is being asked for.
```python
# can't imagine that ever being standardized; just leave it
# alone so that the workflow doesn't rely on this weird and
# likely-to-change Cromwell detail.
logger.warning('Not rounding LOCAL disk to the nearest 375 GB; workflow execution will differ from Cromwell!')
```
We should also probably switch the default unit to `GB` here, since that is what the Cromwell syntax expects.
I think it's better to keep the default unit as `GiB`, since that is the WDL spec default: https://github.com/openwdl/wdl/blob/e43e042104b728df1f1ad6e6145945d2b32331a6/SPEC.md?plain=1#L5082
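The two defaults are not interchangeable: `GB` is decimal while `GiB` is binary, a difference of roughly 7%. A quick standalone illustration (plain arithmetic, not Toil's actual `convert_units` helper):

```python
# Multipliers for decimal (SI) vs binary (IEC) size units
UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
}

def to_bytes(size: float, unit: str) -> int:
    """Convert a size with a unit suffix to a whole number of bytes."""
    return int(size * UNITS[unit])
```

Treating a `100 GiB` request as `100 GB` would under-allocate by about 7.4 GB, so whichever default is chosen, it changes real scheduling behavior.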
src/toil/wdl/wdltoil.py
Outdated
```python
if mount_spec.get(specified_mount_point) is not None:
    # raise an error as all mount points must be unique
    raise ValueError(f"Could not parse disks = {disks_spec} because the mount point {specified_mount_point} is specified multiple times")
```
We never actually add `local-disk` in here, so you are going to be allowed to specify it multiple times. Maybe that's fine? It makes sense to sum them.
I added a check to catch `local-disk` being specified multiple times.
```python
part_suffix = "GB"

per_part_size = convert_units(part_size, part_suffix)
total_bytes += per_part_size
```
The total space needed (including any `local-disk` or mount-point-less size) doesn't ever get copied to `runtime_disk` and so isn't actually used for scheduling.
src/toil/wdl/wdltoil.py
Outdated
```python
per_part_size = convert_units(part_size, part_suffix)
total_bytes += per_part_size
if specified_mount_point is not None:
```
We're not prohibiting multiple mount-point-less specifications, or using e.g. `25 GiB` and `local-disk 300 SSD` together. And we also don't store either of those in the `mount_spec` dict, which means they don't feed into the `df` check later.
There should be a check now; I also store it into the `mount_spec` dict now.
src/toil/wdl/wdltoil.py
Outdated
```python
def ensure_mount_point(self, file_store: AbstractFileStore, mount_spec: Dict[str, int]) -> Dict[str, str]:
    """
    Ensure the mount point sources are available.

    Will check if the mount point source has the requested amount of space available.

    Note: We are depending on Toil's job scheduling backend to error when the sum of multiple mount points'
    disk requests is greater than the total available. For example, if a task has two mount points requesting
    100 GB each but there is only 100 GB available, the df check may pass but Toil should fail to schedule
    the jobs internally.

    :param mount_spec: Mount specification from the disks attribute in the WDL task. Is a dict where the key is the mount point target and the value is the size
    :param file_store: File store to create a tmp directory for the mount point source
    :return: Dict mapping mount point target to mount point source
    """
    logger.debug("Detected mount specifications, creating mount points.")
    mount_src_mapping = {}
    # Create one tmpdir to encapsulate all mount point sources; each mount point will be associated with a subdirectory
    tmpdir = file_store.getLocalTempDir()

    # The POSIX standard doesn't specify how to escape spaces in mount points and file system names.
    # The only defect of this regex is if the target mount point is in the same format as the df output.
    # It is likely reliable enough to trust the user has not created a mount with a df output-like name.
    regex_df = re.compile(r".+ \d+ +\d+ +(\d+) +\d+% +.+")
    try:
        for mount_target, mount_size in mount_spec.items():
```
Rather than doing this as a loop over each requested mount point (except the `local-disk`/no-mount-point ones that set the size of the working directory, which never made it into `mount_spec`), and checking for space for each individually, we should sum up all the space needed and check for at least that much total space. We know all the mount points and also the task working directory are going to come from the same underlying filesystem where the Toil work directory is. So we only need to ask about the free space on that filesystem once.

Then we would no longer have the two-100GB-mount-points problem; we would know that we need 200GB total and only have, say, 150.

We can probably just read the job's disk requirement (I think `self.requirements.disk`) and make sure `df` shows that much space. Then we don't need two copies of the logic to sum up the total of all the mount points.
…toil into issues/4995-disk-spec-wdl
Co-authored-by: Adam Novak <[email protected]>
Closes #4995
Changelog Entry
To be copied to the draft changelog by merger:
Reviewer Checklist
Merger Checklist