Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 18226 - Leakage between vignettes #12

Open
gmbecker opened this issue Jun 22, 2022 · 6 comments
Open

Bug 18226 - Leakage between vignettes #12

gmbecker opened this issue Jun 22, 2022 · 6 comments
Labels
Add-ons Issues with add-ons discuss fix Add comment(s) on how to fix the bug needs diagnosis Track down the cause of the bug, or identify as not a bug

Comments

@gmbecker
Copy link

https://bugs.r-project.org/show_bug.cgi?id=18226

Do you agree this a bug?

Does the fix proposed by Duncan Murdoch make sense (probably yes, but good to consider ourselves)?

How likely is it this can be changed in the short term? If not likely, what would the staging of this change look like?

@gmbecker gmbecker added needs diagnosis Track down the cause of the bug, or identify as not a bug discuss fix Add comment(s) on how to fix the bug Add-ons Issues with add-ons labels Jun 22, 2022
@kcf-jackson
Copy link

Summary

Issue: When two vignettes are built, the function defined in the first vignette can persist to the second one.

Likely cause: When vignettes (in the same directory) are built using tools::buildVignettes(), the code chunks are evaluated using the same environment .GlobalEnv (more detail in the next section).

Analysis

I traced the issue in R-4.2.1 as follows:

  • Starting with src/library/tools/R/build.R : .build_packages -> prepare_pkg -> line 318 defines shQuote("tools::buildVignettes(dir = ".", tangle = TRUE)" . It is then executed at line 323.
    I think this is where the vignettes build process begins (when the package is built).

  • In tools::buildVignettes(), the function loops through the list of vignettes files in a given directory and calls engine$weave() to build the vignettes one at a time. engine$weave() internally calls utils::Sweave().

  • utils::Sweave() calls driver$runcode to execute a code chunk, where driver is initiated as RweaveLatex() by default.

  • driver$runcode calls evalFunc to evaluate an expression, where evalFunc is initiated as RweaveEvalWithOpt() by default.

  • evalFunc is where the evaluation of the expression happens: eval(expr, .GlobalEnv)

I believe the use of .GlobalEnv is what causes the namespace to persist over multiple vignettes.

Thoughts

@gmbecker

Do you agree this a bug?

It is unclear what the intended behavior is. If vignettes of the same directory are considered as one long vignette split into small pieces, then persisting the namespace may make sense. Otherwise, if vignettes are considered self-contained documents, then this could be a bug.

In terms of risk, between the users assuming the namespace persistence and the ones who do not, I think it is easier for the former group to find out when persistence is not available (since the latter vignettes using the namespace would just fail) than the latter group to find out the persistence is implicitly there all the time.

In any case, we may not need to speculate what the intended behavior is, but rather to support both - keeping the existing behavior as the default for backward compatibility and allowing that to be overridden when needed.

Does the fix proposed by Duncan Murdoch make sense (probably yes, but good to consider ourselves)?

It may not be necessary to start a new R session to build a vignette (assuming that is what Duncan suggested). new.env() might suffice here, but care needs to be taken to ensure that it can persist within the same file over multiple code chunks.

How likely is it this can be changed in the short term? If not likely, what would the staging of this change look like?

Not sure. It does not seem to require much change to the code, so maybe this can be changed in the short term.

Thanks.

@bastistician
Copy link

Thank you for your contribution. I think evaluation in the global environment should be considered a feature of Sweave() (and similarly knitr::knit(), by default), as it allows running the vignette in an interactive session and easily inspect/work with the results.

As discussed in the R-devel thread, one option for buildVignettes would be to clean up after each vignette, but I don't think we should consider that option. It will be imperfect, think of vignettes loading different packages, potentially modifiying the S3 methods table etc. If anything, buildVignettes should be modified to do the same as during R CMD check (since R 3.6.0), i.e., weave the vignettes in separate R sessions. That is a breaking change, but given that

(a) the current R CMD build vignette strategy does not seem to be documented (maybe someone can confirm this), and
(b) R CMD check rebuilds vignettes in separate sessions since a long time by default,

the impact of such a change should be small. Of course, changes to the build procedure need great care as they cannot be verified by simply rechecking existing source builds from CRAN for example.

@kcf-jackson
Copy link

Thank you for the fast response and for explaining the rationale!

Confirming my understanding:

  • Evaluation in the global environment is a feature.
  • Evaluation strategy is engine agnostic, so modification would not happen inside the individual engine, but at the level of the build command.
  • The way forward is to make buildVignettes weave the vignettes in separate R sessions (hence declaring vignettes leakage is indeed a bug?). The change is a breaking one; the impact should be small; great care is needed.

(a) the current R CMD build vignette strategy does not seem to be documented (maybe someone can confirm this),

I checked Manual - R-devel - Writing R extensions - section 1.4 - writing package vignettes and didn't see the strategy documented. The four points related the R CMD build are:

  • "the tarball built by R CMD build includes in inst/doc the components intended to be installed."
  • "R CMD build will automatically vignettes in inst/doc for distribution with the package sources."
  • "By default R CMD build will run Sweave on all Sweave vignette source files in vignettes..., then R CMD build will try to run make after the Sweave runs, ... ."
  • "When R CMD build builds the vignettes, it copies these and the vignette sources from directory vignettes to inst/doc."

So perhaps the next thing to explore is to see roughly how much change it would incur matching R CMD check?

P.S. Sorry I did not realise the post was a thread, and I overlooked the responses. Felt a bit silly duplicating the effort, but it has also been a lot of fun seeing firsthand how Sweave builds the document line-by-line! For future reference, this links to the last response of the thread which contains all the key discussion.

@elinw
Copy link

elinw commented Jun 24, 2022

But one issue is that if the vignettes are built in different orders they won't be the same. I think DM is really talking about that this can happen unintentionally. So the question is whether we should allow it to happen intentionally.

@gmbecker
Copy link
Author

gmbecker commented Jun 28, 2022

Some of these issues came up during discussion in the first session.

Personally, I think vignettes should be able to run in a clean session (with packages involved and data available), primarily because I think that should be true of all dynamic documents that haven't been linked together explicitly via, e.g., an explicitly declared dependency structure (a la make).

Barring that, however, it does seem to me that if vignettes are going to be run in the same session, there should be a way to declare a collate order for them, similar to what we do for R files in <pkg>/R/ via <pkg>/DESCRIPTION (Though the mechanism need not be identical, and, e.g., could live somewhere else than DESCRIPTION if desired)

@kcf-jackson
Copy link

Thanks a lot for the input.

@elinw I agree, and to add to what you said, I think it makes a difference to know whether the behavior is by design or not, because if the persistence is by design, then what needs to be changed is the documentation, not the code; the different results caused by the different vignettes order (unexpected by users) is a problem communicating the right behavior to the users, not a problem in the implementation.

@gmbecker Great point on declaring a collate order. At the moment, I think the collate order follows the order returned from list.files (called on the vignettes directory), would it be worthwhile to document that and make a note that order matters?

For everyone's information, after investigating a bit further, I notice:

  • buildVignettes() already supports building vignettes with separate session using the ser_elibs argument, which is designed for R CMD check. One can pass a empty RDS file to it to enable that option. Here is an example
# Using testpkg1 provided by Duncan Murdoch
# tools::buildVignettes(dir = ".")   # with leakage
# Without leakage
f <- tempfile()
saveRDS(NULL, file = f)
tools::buildVignettes(dir = ".", ser_elibs = f)

I don't think R CMD build accepts arguments that let you configure ser_elibs. But it is good to know what is needed is already there. The relevant section in the source code is here.

  • Another piece of information is that R CMD check seems to do two rounds of checks on the vignette code. One round checks that the code can run fine [link], and the other round checks the vignettes can be built successfully [link]. In the second round, one can use the environment variables _R_CHECK_BUILD_VIGNETTES_SEPARATELY_ and _R_CHECK_DEPENDS_ONLY_VIGNETTES_ to control the vignettes build process to use one session or separate sessions. Though I am not sure what these flags would do to the other parts of the build process.

To move forward with this issue, I think we need to settle whether there should be one / two behaviors for the R CMD build process (knowing that buildVignettes() can support both). If one, which one? If two, how should the non-default one be incorporated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Add-ons Issues with add-ons discuss fix Add comment(s) on how to fix the bug needs diagnosis Track down the cause of the bug, or identify as not a bug
Projects
None yet
Development

No branches or pull requests

4 participants