Reorganise run-processing logs #118

Merged: 2 commits from processing-logs into master, Oct 30, 2023
Conversation

takluyver (Member)

At present, variables evaluated in listener subprocesses go into amore.log, variables evaluated under amore-proto reprocess are logged in the terminal of whoever ran it, and variables evaluated in a Slurm job get a per-job log file in slurm_logs. This makes it hard to find any log messages from processing a given run (see #45, #49).

With this change, all DAMNIT processing for a given run will be logged in a file like process_logs/r123_p4507.out. I've tried to put some markers in to delineate different processes adding to the file.

Example log file
----- Processing r147 (p3210) -----
2023-10-27 19:27:45,343 INFO __main__: args.run_data='raw', args.match=[]
2023-10-27 19:27:45,472 INFO __main__: Ensured p3210 r147 in database
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.0068 s
INFO:__main__:Using 7 variables (of 8) from context file (cluster variables will be processed later)
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.005 s
INFO:__main__:Writing 8 variables to 22 datasets in extracted_data/p3210_r147.h5
INFO:__main__:Writing 8 variables to 8 datasets in /tmp/tmpta779nkr/reduced.h5
Running in /home/kluyvert/Code/mid-2833/env/bin/python3
2023-10-27 19:28:00,033 INFO __main__: Reduced data has 8 fields
2023-10-27 19:28:00,034 INFO __main__: Adding p3210 r147 to database, with 8 columns
2023-10-27 19:28:00,059 INFO __main__: Sent Kafka update to topic 'amore-db-f7186c94cfbc37ea52d983c43021fe83c109c110'
2023-10-27 19:28:00,099 INFO __main__: Launched Slurm job 3873126 to calculate cluster variables
Running in /home/kluyvert/Code/mid-2833/env/bin/python3

----- Processing r147 (p3210) -----
2023-10-27 19:28:02,152 INFO __main__: args.run_data='raw', args.match=[]
2023-10-27 19:28:02,152 INFO __main__: Extracting cluster variables in Slurm job 3873126 on max-exfl021.desy.de
2023-10-27 19:28:02,284 INFO __main__: Ensured p3210 r147 in database
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.063 s
INFO:__main__:Using 8 variables (of 8) from context file 
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/MID/202221/p003210' in 0.0051 s
INFO:__main__:Writing 9 variables to 25 datasets in extracted_data/p3210_r147.h5
INFO:__main__:Writing 9 variables to 9 datasets in /tmp/tmpfe1g0kjs/reduced.h5
Running in /home/kluyvert/Code/mid-2833/env/bin/python3
2023-10-27 19:28:06,871 INFO __main__: Reduced data has 9 fields
2023-10-27 19:28:06,871 INFO __main__: Adding p3210 r147 to database, with 9 columns
2023-10-27 19:28:06,877 INFO __main__: Sent Kafka update to topic 'amore-db-f7186c94cfbc37ea52d983c43021fe83c109c110'
Running in /home/kluyvert/Code/mid-2833/env/bin/python3
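
A rough sketch of the kind of setup this implies, for illustration only — the helper name attach_run_log is hypothetical and this is not the code in the PR:

```python
# Illustrative only -- not DAMNIT's actual implementation.
# All processing for one run appends to a single file such as
# process_logs/r147_p3210.out, with a marker line separating each
# process that writes to it.
import logging
from pathlib import Path

def attach_run_log(proposal: int, run: int, log_dir: str = "process_logs"):
    path = Path(log_dir) / f"r{run}_p{proposal}.out"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps output from earlier processes/reprocessing
    # (the discussion below settles on appending rather than overwriting).
    handle = open(path, "a")
    handle.write(f"----- Processing r{run} (p{proposal}) -----\n")
    handle.flush()
    # Route this process's log records into the same file, in a format
    # similar to the example above.
    logging.basicConfig(
        stream=handle,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return handle
```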

Question: Should we keep older logs at all? For now, when you reprocess a run, its log file is overwritten. Another option is to append each new processing to the same file, and another is to write a new file each time (while making it easy to see the latest).
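
For illustration, the three options amount to something like the following (hypothetical sketch; none of these names are in DAMNIT):

```python
# Hypothetical sketch of the three retention options; illustrative only.
import time
from pathlib import Path

proposal, run = 3210, 147
log_dir = Path("process_logs")
log_dir.mkdir(parents=True, exist_ok=True)
path = log_dir / f"r{run}_p{proposal}.out"

# 1. Overwrite on each reprocess (the behaviour in this PR as proposed):
handle = open(path, "w")

# 2. Append each new processing to the same file:
handle = open(path, "a")

# 3. A new file per processing, with a stable name pointing at the latest:
stamped = log_dir / f"r{run}_p{proposal}_{time.strftime('%Y%m%dT%H%M%S')}.out"
handle = open(stamped, "w")
if path.exists() or path.is_symlink():
    path.unlink()
path.symlink_to(stamped.name)  # r147_p3210.out -> the newest log file
```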

I view this PR as part 1 of 2: this changes where the backend puts the information; part 2 will be to use the new structure so you can get to the logs from the GUI. I've got some ideas about how precisely to do that, but I think having the Slurm, non-Slurm, listener & reprocess logs all going to the same place is a good starting point.

cc @JamesWrigley @matheuscteo

@JamesWrigley (Member) left a comment:

I think this is a great idea; the logs being in multiple places is definitely quite annoying. When it comes to the implementation, the one concern I have is that this will lead to multiple extractors writing to the same file, which will cause their logs to be interleaved. There are two situations I can think of where this would happen:

  1. Someone reprocesses a run (perhaps only a non-cluster variable with --match) while a slurm job for that run is already in progress.
  2. At some point I want to add support for multiple slurm jobs 😈 This is useful whenever you have multiple heavy-but-independent variables.

And it would be really nice if the logs weren't interleaved 😅 That's actually my main gripe with the current logging system: locally-processed runs often overlap in execution, which causes their logs to be interleaved and makes debugging tricky.

I wrote a whole thing about this on Zulip, but the solutions all seem to be rather complex. I would say that right now this is a net win and we should merge it and fix the interleaving/concurrency issues later.

WRT old logs, let's keep them and append new ones to the same file. Sometimes multiple variables will be failing, and while debugging and amore-proto reprocess --match'ing one variable, you might not want the logs for the other failing variables to be deleted. This also ties in with the versioning thing; I think we should preserve logs for each version.

@takluyver (Member, Author)

Yeah, I can see that interleaving is going to be annoying, especially if there are multiple Slurm jobs per run. I can see some possible ways to deal with that, but they all have their own drawbacks. Let's discuss that further elsewhere.

> WRT old logs, let's keep them and append new ones to the same file.

Sure, that's easy to do. 🙂

@JamesWrigley (Member) left a comment:

LGTM!

@takluyver (Member, Author)

Thanks for the review 👍

takluyver merged commit 2acc9ce into master on Oct 30, 2023
takluyver deleted the processing-logs branch on Oct 30, 2023 at 15:06