Update parallel vignette to address {data.table} multithreading #319
This PR updates the parallel vignette to better compare the results from sequential and parallel processing. We thank @michaelmayer2 for alerting us to the fact that the default multithreading of {data.table} made the sequential processing timings confusing.
I made the following updates:
- `eval=FALSE`. I replaced the results table with a histogram to demonstrate the difference in durations for the two different enrollment strategies. I used `all.equal()` to demonstrate that `set.seed()` worked as expected.
- I called `data.table::setDTthreads(threads = 1)` so that the sequential processing was truly single-threaded. I didn't need to do anything extra for the parallel processing because {data.table} automatically sets `threads = 1` when it runs inside a forked process. Thus this will behave as we want when run in a non-interactive R session on a Linux machine (namely in GitHub Actions to produce the pkgdown page and on CRAN to produce the bundled vignette). The multithreading behavior gets more complex when run on Windows or from RStudio, but I don't think we need to worry about this.
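
For reviewers who want to see the idea in isolation, here is a minimal sketch of the comparison described above. It is not the vignette code: `simulate_once()` and `seeds` are hypothetical stand-ins, and the use of `parallel::mclapply()` with two cores is an assumption made purely for illustration.

```r
library(parallel)

# Hypothetical stand-in for one simulation replicate (not the package's function).
simulate_once <- function(seed) {
  set.seed(seed)        # seed inside the task so results do not depend on the worker
  mean(rnorm(1000))
}

seeds <- 1:8            # hypothetical replicate seeds

# Pin {data.table} to a single thread so the sequential timing is truly sequential.
# setDTthreads() returns the previous setting, which is restored at the end.
has_dt <- requireNamespace("data.table", quietly = TRUE)
if (has_dt) old_threads <- data.table::setDTthreads(threads = 1)

seq_time <- system.time(
  sequential <- lapply(seeds, simulate_once)
)

# mclapply() forks on Unix-alikes; on Windows mc.cores must be 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
par_time <- system.time(
  parallel_res <- mclapply(seeds, simulate_once, mc.cores = cores)
)

# Because each task sets its own seed, both runs should give identical results.
results_match <- isTRUE(all.equal(sequential, parallel_res))

# Restore the previous {data.table} thread setting.
if (has_dt) data.table::setDTthreads(old_threads)
```

Seeding inside each task, rather than once before the loop, is what lets the sequential and forked runs be compared with `all.equal()`; `seq_time` and `par_time` can then be compared without {data.table} threads inflating the sequential side.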