Reorganise run-processing logs #118

Merged: 2 commits from processing-logs into master, Oct 30, 2023
Conversation

takluyver (Member)

At present, variables evaluated in listener subprocesses go into amore.log, variables evaluated under amore-proto reprocess are logged in the terminal of whoever ran it, and variables evaluated in a Slurm job get a per-job log file in slurm_logs. This makes it hard to find any log messages from processing a given run (see #45, #49).

With this change, all DAMNIT processing for a given run will be logged in a file like process_logs/r123_p4507.out. I've tried to put some markers in to delineate different processes adding to the file.

Example log file
----- Processing r147 (p3210) -----
2023-10-27 19:27:45,343 INFO __main__: args.run_data='raw', args.match=[]
2023-10-27 19:27:45,472 INFO __main__: Ensured p3210 r147 in database
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.0068 s
INFO:__main__:Using 7 variables (of 8) from context file (cluster variables will be processed later)
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.005 s
INFO:__main__:Writing 8 variables to 22 datasets in extracted_data/p3210_r147.h5
INFO:__main__:Writing 8 variables to 8 datasets in /tmp/tmpta779nkr/reduced.h5
Running in /home/kluyvert/Code/mid-2833/env/bin/python3
2023-10-27 19:28:00,033 INFO __main__: Reduced data has 8 fields
2023-10-27 19:28:00,034 INFO __main__: Adding p3210 r147 to database, with 8 columns
2023-10-27 19:28:00,059 INFO __main__: Sent Kafka update to topic 'amore-db-f7186c94cfbc37ea52d983c43021fe83c109c110'
2023-10-27 19:28:00,099 INFO __main__: Launched Slurm job 3873126 to calculate cluster variables
Running in /home/kluyvert/Code/mid-2833/env/bin/python3

----- Processing r147 (p3210) -----
2023-10-27 19:28:02,152 INFO __main__: args.run_data='raw', args.match=[]
2023-10-27 19:28:02,152 INFO __main__: Extracting cluster variables in Slurm job 3873126 on max-exfl021.desy.de
2023-10-27 19:28:02,284 INFO __main__: Ensured p3210 r147 in database
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.063 s
INFO:__main__:Using 8 variables (of 8) from context file 
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.0051 s
INFO:__main__:Writing 9 variables to 25 datasets in extracted_data/p3210_r147.h5
INFO:__main__:Writing 9 variables to 9 datasets in /tmp/tmpfe1g0kjs/reduced.h5
Running in /home/kluyvert/Code/mid-2833/env/bin/python3
2023-10-27 19:28:06,871 INFO __main__: Reduced data has 9 fields
2023-10-27 19:28:06,871 INFO __main__: Adding p3210 r147 to database, with 9 columns
2023-10-27 19:28:06,877 INFO __main__: Sent Kafka update to topic 'amore-db-f7186c94cfbc37ea52d983c43021fe83c109c110'
Running in /home/kluyvert/Code/mid-2833/env/bin/python3
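
A rough sketch of the kind of setup this implies, for illustration only — the helper name attach_run_log is hypothetical and this is not the code in the PR:

```python
# Illustrative only -- not DAMNIT's actual implementation.
# All processing for one run appends to a single file such as
# process_logs/r147_p3210.out, with a marker line separating each
# process that writes to it.
import logging
from pathlib import Path

def attach_run_log(proposal: int, run: int, log_dir: str = "process_logs"):
    path = Path(log_dir) / f"r{run}_p{proposal}.out"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps output from earlier processes/reprocessing
    # (the discussion below settles on appending rather than overwriting).
    handle = open(path, "a")
    handle.write(f"----- Processing r{run} (p{proposal}) -----\n")
    handle.flush()
    # Route this process's log records into the same file, in a format
    # similar to the example above.
    logging.basicConfig(
        stream=handle,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return handle
```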

Question: Should we keep older logs at all? For now, when you reprocess a run, its log file is overwritten. Another option is to append each new processing to the same file, and another is to write a new file each time (while making it easy to see the latest).
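
For illustration, the three options amount to something like the following (hypothetical sketch; none of these names are in DAMNIT):

```python
# Hypothetical sketch of the three retention options; illustrative only.
import time
from pathlib import Path

proposal, run = 3210, 147
log_dir = Path("process_logs")
log_dir.mkdir(parents=True, exist_ok=True)
path = log_dir / f"r{run}_p{proposal}.out"

# 1. Overwrite on each reprocess (the behaviour in this PR as proposed):
handle = open(path, "w")

# 2. Append each new processing to the same file:
handle = open(path, "a")

# 3. A new file per processing, with a stable name pointing at the latest:
stamped = log_dir / f"r{run}_p{proposal}_{time.strftime('%Y%m%dT%H%M%S')}.out"
handle = open(stamped, "w")
if path.exists() or path.is_symlink():
    path.unlink()
path.symlink_to(stamped.name)  # r147_p3210.out -> the newest log file
```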

I view this PR as part 1 of 2: this changes where the backend puts the information; part 2 will be to use the new structure so you can get to the logs from the GUI. I've got some ideas about how precisely to do that, but I think having the Slurm, non-Slurm, listener & reprocess logs all going to the same place is a good starting point.

cc @JamesWrigley @matheuscteo

@JamesWrigley (Member) left a comment:

I think this is a great idea; the logs being in multiple places is definitely quite annoying. When it comes to the implementation, the one concern I have is that this will lead to multiple extractors writing to the same file, which will cause their logs to be interleaved. There are two situations I can think of where this would happen:

  1. Someone reprocesses a run (perhaps only a non-cluster variable with --match) while a slurm job for that run is already in progress.
  2. At some point I want to add support for multiple slurm jobs 😈 This is useful whenever you have multiple heavy-but-independent variables.

And it would be really nice if the logs weren't interleaved 😅 That's actually my main gripe with the current logging system: locally-processed runs often overlap in execution, which causes their logs to be interleaved and makes debugging tricky.

I wrote a whole thing about this on Zulip, but the solutions all seem to be rather complex. I would say that right now this is a net win and we should merge it and fix the interleaving/concurrency issues later.

WRT old logs, let's keep them and append new ones to the same file. Sometimes multiple variables will be failing, and while debugging and amore-proto reprocess --match'ing one variable, you might not want the logs for the other failing variables to be deleted. This also ties in with the versioning thing; I think we should preserve logs for each version.

@takluyver (Member, Author)

Yeah, I can see that interleaving is going to be annoying, especially if there are multiple Slurm jobs per run. I can see some possible ways to deal with that, but they all have their own drawbacks. Let's discuss that further elsewhere.

> WRT old logs, let's keep them and append new ones to the same file.

Sure, that's easy to do. 🙂

@JamesWrigley (Member) left a comment:

LGTM!

@takluyver (Member, Author)

Thanks for the review 👍

takluyver merged commit 2acc9ce into master on Oct 30, 2023
takluyver deleted the processing-logs branch on Oct 30, 2023 at 15:06