Add option to reprocess runs in parallel #250

takluyver · 2024-05-10T12:06:57Z

To address #240.

This isn't particularly elegant code, but it does work, at least in a quick test. The most awkward bit is the extractor object: I can't use the same one across all threads, because the SQLite database connection, at least, shouldn't be shared across threads. But I also don't want to create a new Extractor for each individual run, so I didn't just map() over the available runs. I'm still thinking about how to handle this better.
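For readers following along, here is a minimal sketch of the general pattern described above: one extractor per worker thread, each holding its own SQLite connection, reused across the runs that thread handles. It is not the code in this PR. The `Extractor` class below is a hypothetical stand-in, and `get_extractor`, `reprocess_run` and `reprocess_parallel` are names invented for the illustration.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# Each worker thread gets its own attribute namespace on this object.
_local = threading.local()


class Extractor:
    """Hypothetical stand-in for the real Extractor (assumption)."""

    def __init__(self, db_path):
        # sqlite3 connections are, by default, only usable from the
        # thread that created them, so each thread opens its own.
        self.db = sqlite3.connect(db_path)

    def process(self, run):
        # Placeholder work: a trivial query instead of real reprocessing.
        (value,) = self.db.execute("SELECT ? * 2", (run,)).fetchone()
        return value


def get_extractor(db_path):
    """Return this thread's Extractor, creating it on first use."""
    extractor = getattr(_local, "extractor", None)
    if extractor is None:
        extractor = _local.extractor = Extractor(db_path)
    return extractor


def reprocess_run(db_path, run):
    return get_extractor(db_path).process(run)


def reprocess_parallel(db_path, runs, max_workers=4):
    # At most max_workers extractors are created, however many runs there are.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda run: reprocess_run(db_path, run), runs))
```

With this shape, map() over the runs still works, because the per-thread setup cost is hidden behind `get_extractor` rather than paid once per run.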

I wanted to show less output when processing in parallel, because the usual output would probably be too fast and too jumbled up to be much use. The longer output is still directed to the relevant file in the process_logs folder.
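A rough sketch of that output split, using only the standard `logging` module: the detailed messages go to a per-run file, and the caller prints a single short line per run. The file naming (`r<run>.log`), the logger names and the `reprocess_run` body are assumptions made for the illustration, not what this PR does.

```python
import logging
from pathlib import Path


def run_logger(run, log_dir="process_logs"):
    """Return a logger that writes detailed output to a per-run file."""
    log_dir = Path(log_dir)
    log_dir.mkdir(exist_ok=True)
    logger = logging.getLogger(f"reprocess.r{run}")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the detail off the console
    if not logger.handlers:
        handler = logging.FileHandler(log_dir / f"r{run}.log")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def reprocess_run(run, log_dir="process_logs"):
    log = run_logger(run, log_dir)
    log.info("Starting reprocessing of run %d", run)
    log.debug("Detailed per-variable output would go here")
    log.info("Finished run %d", run)
    # Only this short summary is meant for the terminal.
    return f"r{run}: done"
```

Each run has its own logger and file, so concurrent workers never interleave lines within one log.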

JamesWrigley · 2024-05-10T16:43:59Z

For the record, I'm fine with creating extractors for each run :) That's essentially what's happening now when I use parallel anyway.

takluyver · 2024-05-10T18:24:08Z

Thanks, that's good to know. I vaguely remember that sometimes setting up the Kafka producer could take a while, so for recomputing one small variable in several runs it was an annoying overhead to recreate it each time.

I'll have a look next week to see if I can find a neater way to do it; otherwise I can fall back to just creating a new Extractor for each run.

takluyver · 2024-07-01T18:09:58Z

I'm going to close this, because I think #270 meets the need better, by submitting jobs to the solaris cluster to run in parallel.

Add option to reprocess runs in parallel

0827e87

Use a thread local variable to refer to the database

877f5b4

takluyver mentioned this pull request Jun 19, 2024

Use Slurm jobs for reprocessing #270

Merged

takluyver closed this Jul 1, 2024

JamesWrigley deleted the reprocess-parallel branch July 1, 2024 18:14
