Update parallel vignette to address {data.table} multithreading #319

jdblischak · 2025-04-02T20:22:18Z

This PR updates the parallel vignette to better compare the results from sequential and parallel processing. We thank @michaelmayer2 for alerting us to the fact that the default multithreading of {data.table} made the sequential processing timings confusing.

I made the following updates:

Updated the variable names to reflect whether they were created by sequential or parallel processing
Two important code chunks with the results had set eval=FALSE. I replaced the results table with a histogram to demonstrate the difference in durations for the two different enrollment strategies. I used all.equal() to demonstrate that set.seed() worked as expected
I set data.table::setDTthreads(threads = 1) so that the sequential processing time was completely sequential. I didn't need to do any extra for the parallel processing because {data.table} automatically sets threads = 1 when it runs inside a forked process. Thus this will behave like we want when run in a non-interactive R session on a Linux machine (namely in GitHub Actions to produce the pkgdown page and on CRAN to produce the bundled vignette). The multithreading behavior gets more complex when run in Windows or from RStudio, but I don't think we need to worry about this

nanxstats

I appreciate the suggestions and efforts to improve the correctness and readability of this parallel vignette - it's often a challenge because it's more convoluted than it should be. Now this is the definition of ✨clarity✨

jdblischak added 4 commits April 2, 2025 13:50

Update variable names to reflect results from parallel runs

c13d50e

Evaluate code chunks with results in parallel vignette

9b533c6

Set {data.table} threads to 1 for sequential run of sim_fixed_n()

440dadc

Add caveat that parallel behavior is affected by Windows and RStudio

552c483

jdblischak requested review from nanxstats and cmansch April 2, 2025 20:22

jdblischak self-assigned this Apr 2, 2025

jdblischak changed the title ~~Update parallel vignette~~ Update parallel vignette to address {data.table} multithreading Apr 2, 2025

nanxstats approved these changes Apr 3, 2025

View reviewed changes

LittleBeannie merged commit 27ad462 into Merck:main Apr 3, 2025
9 checks passed

jdblischak deleted the update-parallel-vignette branch April 4, 2025 15:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update parallel vignette to address {data.table} multithreading #319

Update parallel vignette to address {data.table} multithreading #319

Uh oh!

jdblischak commented Apr 2, 2025

Uh oh!

nanxstats left a comment

Uh oh!

Uh oh!

Uh oh!

Update parallel vignette to address {data.table} multithreading #319

Update parallel vignette to address {data.table} multithreading #319

Uh oh!

Conversation

jdblischak commented Apr 2, 2025

Uh oh!

nanxstats left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!