Messaging should be printed #43

Open
stemangiola opened this issue Mar 13, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@stemangiola

Thanks for this package

The message about msigdbdf that includes the install instructions is only printed when library(msigdbr) is called:

library(msigdbr)
For full functionality, please install the 'msigdbdf' package with:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')

However, if the function is called with the package prefix (so library() is never called), only a shorter message is printed:

> msigdbr::msigdbr() 
The 'msigdbdf' package must be installed to access the full dataset.
# A tibble: 11,597 × 19
   gene_symbol ncbi_gene ensembl_gene    db_gene_symbol db_ncbi_gene
   <chr>       <chr>     <chr>           <chr>          <chr>       
 1 ABCC4       10257     ENSG00000125257 ABCC4          10257       
 2 ABTB3       121551    ENSG00000151136 ABTB3          121551      
 3 ADAMTSL3    57188     ENSG00000156218 ADAMTSL3       57188       
 4 ANKRD13A    88455     ENSG00000076513 ANKRD13A       88455       
 5 ATL1        51062     ENSG00000198513 ATL1           51062       
 6 B4GALNT3    283358    ENSG00000139044 B4GALNT3       283358      
 7 CA10        56934     ENSG00000154975 CA10           56934       
 8 CACNB1      782       ENSG00000067191 CACNB1         782         
 9 CAMK4       814       ENSG00000152495 CAMK4          814         
10 CCDC106     29903     ENSG00000173581 CCDC106        29903       
# ℹ 11,587 more rows
# ℹ 14 more variables: db_ensembl_gene <chr>, source_gene <chr>, gs_id <chr>,
#   gs_name <chr>, gs_collection <chr>, gs_subcollection <chr>,
#   gs_collection_name <chr>, gs_description <chr>, gs_source_species <chr>,
#   gs_pmid <chr>, gs_geoid <chr>, gs_url <chr>, db_version <chr>,
#   db_target_species <chr>
# ℹ Use `print(n = ...)` to see more rows

The instructions to install the package are never printed.

message("The 'msigdbdf' package must be installed to access the full dataset.")

I think the instructions for installing the package should be printed in every case, not only when the package is attached.
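
A minimal sketch of how this could work, assuming the check runs inside the data-access function rather than only when the package is attached. The names msigdbdf_install_hint() and check_data_package() are hypothetical and are not part of msigdbr; the message text and repository URL are the ones shown above.

```r
# Hypothetical helper: builds the full install hint in one place, so the
# attach-time message and the call-time message cannot drift apart.
msigdbdf_install_hint <- function() {
  paste0(
    "The 'msigdbdf' package must be installed to access the full dataset.\n",
    "Install it with:\n",
    "install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')"
  )
}

# Hypothetical check, meant to run at the top of the data-access function,
# so the hint also appears for msigdbr::msigdbr() without library().
# `pkg` is a parameter only to make the sketch easy to try out.
check_data_package <- function(pkg = "msigdbdf") {
  installed <- requireNamespace(pkg, quietly = TRUE)
  if (!installed) {
    message(msigdbdf_install_hint())
  }
  invisible(installed)
}
```

Because the hint goes through message(), users who do not want it can still wrap the call in suppressMessages().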

Thanks a lot.

igordot added the enhancement label Mar 13, 2025
@igordot
Owner

igordot commented Mar 13, 2025

Thank you for the suggestion. I was debating how best to handle the messaging. A more extreme approach I considered was not including any data and returning an error, but I decided against that since it would break some of the reverse dependencies. Regardless, I agree it's a good idea to make the message clearer. This was posted on CRAN yesterday, so I would like to give it a couple of days before pushing an update in case anything else comes up.

In general, the new version is obviously a big change from the previous CRAN releases and there were some intermediate attempts to provide a more seamless experience that did not pass CRAN review. Apologies about any complications.

@stemangiola
Author

What is the reason for not including the data as a dependency? Bioconductor's ExperimentHub allows downloading data when needed, so the data dependency package could be added to the DESCRIPTION gracefully.

@igordot
Owner

igordot commented Mar 14, 2025

My original plan was to keep this as a single package on CRAN. This worked well for several years. Eventually I hit the size limit. I tried to submit it as a data-only package and it was not approved. I decided to host the data package on R-universe. The caveat with non-CRAN dependencies is that they need to be optional (DESCRIPTION Suggests field), so they are not installed by default.

ExperimentHub is an interesting option. I have not looked into it extensively, but I believe it would still require an extra step to install the data. More importantly, the data is tied to a specific Bioconductor release. The most recent data release would then only be available on the latest Bioconductor release, which in turn is tied to the latest R release.

@stemangiola
Author

Given the scope of this software, I would think Bioconductor is the right place. You can definitely trigger the data download automatically whenever a function is called from the parent package.

Plus, ExperimentHub gives you the whole caching framework for free.
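
To illustrate the download-on-first-use idea in general terms, here is a rough base R sketch. It is not the ExperimentHub API and not how msigdbr works; cached_data_path() and its arguments are hypothetical, and the fetch step is passed in as a function so that no real URL has to be assumed.

```r
# Hypothetical helper: return the path to a cached file, fetching it only
# the first time it is requested.
# `fetch` is any function(dest) that writes the data to `dest`, for example
# a wrapper around download.file() pointing at wherever the data is hosted.
cached_data_path <- function(file_name, fetch,
                             cache_dir = tools::R_user_dir("msigdbr", which = "cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, file_name)
  if (!file.exists(dest)) {
    # Tell the user what is happening before the slow step.
    message("Downloading ", file_name, " to ", cache_dir)
    fetch(dest)
  }
  dest
}
```

A function in the parent package could call this before reading the data, so the first call pays the download cost and later calls read from the local cache. tools::R_user_dir() needs R 4.0.0 or later.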

The way it is now is a bit cumbersome :)
