Add flags for non-interactive use #15
I think the original idea was to be able to use this tool in a background job on a cluster, using a command such as sbatch. When run this way there is no user input, and the job must run to completion without any. So anywhere we prompt for user input may not behave as expected. I'm aware of two places in the current code that read input:
cautious-robot/src/cautiousrobot/__main__.py Lines 214 to 217 in 632103d
cautious-robot/src/cautiousrobot/__main__.py Lines 224 to 227 in 632103d
I'm wondering if the code above would hang at the |
These two scenarios seem like error handling:
One approach is to have these two scenarios error out instead of prompting and add flags to allow a user to manually override. |
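A minimal sketch of the "error out unless overridden" approach, for discussion only. The flag names `--ignore-missing-urls` and `--allow-existing-dir` and the helper `require_override` are hypothetical and are not part of cautious-robot; the two conditions are just examples of checks that might otherwise prompt.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with two hypothetical override flags."""
    parser = argparse.ArgumentParser(prog="downloader-sketch")
    # Hypothetical flag: accept rows whose URL is missing instead of stopping.
    parser.add_argument("--ignore-missing-urls", action="store_true")
    # Hypothetical flag: accept an output directory that already exists.
    parser.add_argument("--allow-existing-dir", action="store_true")
    return parser


def require_override(problem: str, override: bool, flag: str) -> None:
    """Exit with an error unless the matching override flag was given."""
    if override:
        print(f"Warning: {problem}; continuing because {flag} was set.",
              file=sys.stderr)
        return
    # sys.exit with a string prints it to stderr and exits with status 1.
    sys.exit(f"Error: {problem}. Fix the input or re-run with {flag}.")
```

Because nothing here calls `input()`, a batch job either continues or fails fast with a message that names the flag to use.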
Another thought I had related to running cautious-robot in a cluster is running multiple instances of cautious-robot in parallel, for example using an sbatch array job. Each job would download part of a CSV. A workaround that might be acceptable is to have users split their CSV into multiple files and run an instance of cautious-robot with each. Using cautious-robot this way may require some changes to how we create output directories.
cautious-robot/src/cautiousrobot/__main__.py Lines 134 to 135 in 632103d
Then when re-running it works. Changing this to |
Another consideration is making the tool friendly to workflow languages.
cautious-robot/src/cautiousrobot/__main__.py Lines 229 to 232 in 632103d
Being able to specify the output directory or file paths for all files created by the tool would resolve this problem. |
For interactive vs non-interactive behavior, |
I think there are some edge cases for |
Adding in an ending index would also help the batching, since a user wouldn't need to create separate CSVs to implement that, but this would be a good start. |
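To make the batching idea concrete, here is a rough sketch of how a start and end row could be derived for each task of an array job, so one CSV can be shared instead of split. The function `task_bounds` is hypothetical and the tool does not currently expose it; the task ID is assumed to come from something like Slurm's `SLURM_ARRAY_TASK_ID` environment variable.

```python
import math


def task_bounds(total_rows: int, num_tasks: int, task_id: int) -> tuple[int, int]:
    """Return the half-open row range [start, end) for one array task."""
    if num_tasks < 1 or not 0 <= task_id < num_tasks:
        raise ValueError("task_id must satisfy 0 <= task_id < num_tasks")
    # Round the chunk size up so that every row is covered by some task.
    chunk = math.ceil(total_rows / num_tasks)
    start = min(task_id * chunk, total_rows)
    end = min(start + chunk, total_rows)
    return start, end
```

With 10 rows and 3 tasks this gives the ranges (0, 4), (4, 8), and (8, 10); a task whose range is empty simply has nothing to download.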
We do have the option to specify output directory for images. Part of the reasoning for doing it this way was to match the logs to the CSV used for download and to avoid placing the logs into the image folder, as that could be rather inconvenient. Would it be reasonable to create an optional |
I agree that putting logs into the image folder would be painful. Having an optional flag to control where output logs are created sounds good to me. |
There could be a race condition in the logging code when running multiple downloads at once if all processes use the same log filename. |
If it's being run from multiple CSVs, then the default logs would be named differently. However, if we set up an end index option for multiprocessing or if a user passed the same logging location/name for their multiple CSVs that would become an issue. We could add submission date-time to the log filename to improve robustness? |
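A possible shape for the date-time idea, as a sketch only. The helper `log_path`, the `_log_` naming pattern, and the `.jsonl` extension are assumptions for illustration and may not match how the tool names its logs. The process ID is added because array tasks can start within the same second.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_path(csv_path: str, log_dir: Optional[str] = None,
             now: Optional[datetime] = None) -> Path:
    """Build a log path named after the CSV plus a timestamp and process ID."""
    csv = Path(csv_path)
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    # The process ID separates jobs that start within the same second.
    name = f"{csv.stem}_log_{stamp}_{os.getpid()}.jsonl"
    # Default to the CSV's own folder, keeping logs out of the image folder.
    return (Path(log_dir) if log_dir else csv.parent) / name
```

Process IDs can still repeat across different nodes, so on a cluster an array task ID in the name would likely be a safer discriminator.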
FWIW, using
sounds like a good solution to the core issue here. For the missing URLs issue:
with a message explaining how to use the flag or to fix the input. For the existing output directory issue: So when the output location is found to already exist, With this functionality, I'm not sure when or why a user would want to overwrite any already existing output data, so an overwrite flag may not be needed unless I'm missing something? |
Agreed.
Yes, let's make that adjustment to the resized folder name. Our rewrite of the resize code definitely makes separating that out simpler, and it's a good idea.
If we remove the overwrite (instead checking for each image already downloaded, as noted here), then it would be more about whether or not a user wants to place images in a folder that already exists (and thus has contents that would be included in the checksum at the end). Not sure if that would warrant a flag. |
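For reference, the per-image check could look roughly like this. It is a sketch under the assumption that an image counts as downloaded when a file with its name exists in the image folder; `pending_downloads` is a hypothetical helper, not existing code.

```python
from pathlib import Path
from typing import Iterable, List


def pending_downloads(filenames: Iterable[str], img_dir: str) -> List[str]:
    """Return the filenames that are not yet present in img_dir."""
    root = Path(img_dir)
    # A missing folder just means every file is still pending.
    return [name for name in filenames if not (root / name).is_file()]
```

An existence check alone cannot tell a complete file from a partial one, so a re-run after an interrupted job might still need a size or checksum comparison.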
Suggested adjustment for avoiding overwrite. Everything following |
@johnbradley and @thompsonmj, any thoughts on this?
|
Yep, I think that's what I was going for. So with this change, |
Breaking that part out sounds good to me. |
It would be good to be able to schedule batch runs without requiring user input. This could be implemented as an exit wherever there is currently an interactive prompt (non-unique filenames, not enough filenames for existing URLs, or overwriting the targeted image download folder). In that case, it would likely be a good idea to check all three before exiting, so the user can fix everything at once instead of finding more issues on resubmit.
@johnbradley, as this was your request, I'd very much appreciate your opinion on the implementation.
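One way the "check all three before exiting" idea might be structured, as a sketch rather than a proposed implementation. The function names and the plain-list inputs are assumptions made for illustration; the real code works from the CSV and its own arguments.

```python
import sys
from collections import Counter
from pathlib import Path
from typing import List, Optional, Sequence


def collect_problems(urls: Sequence[Optional[str]],
                     filenames: Sequence[Optional[str]],
                     img_dir: str) -> List[str]:
    """Run all three checks and return one message per problem found."""
    problems = []
    # Check 1: filenames must be unique.
    counts = Counter(name for name in filenames if name)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        problems.append(f"non-unique filenames: {', '.join(dupes)}")
    # Check 2: every row that has a URL also needs a filename.
    missing = sum(1 for url, name in zip(urls, filenames) if url and not name)
    if missing:
        problems.append(f"{missing} URL(s) have no filename")
    # Check 3: the image folder must not already exist.
    if Path(img_dir).exists():
        problems.append(f"image folder already exists: {img_dir}")
    return problems


def exit_if_problems(problems: List[str]) -> None:
    """Print every problem, then exit non-zero if there were any."""
    if problems:
        for p in problems:
            print(f"Error: {p}", file=sys.stderr)
        sys.exit(1)
```

Collecting messages first and exiting once means a single failed submission reports every issue, which is the point of checking all three together.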