Availability of msigdbdf in standard repos? #48

Open
assaron opened this issue Mar 18, 2025 · 2 comments
Labels
bug Something isn't working enhancement New feature or request

Comments


assaron commented Mar 18, 2025

Hi,

Due to the recent update of msigdbr, my package fgsea started to fail builds, since it relies on pathways from msigdbr in one of the vignettes: https://bioconductor.org/checkResults/release/bioc-LATEST/fgsea/nebbiolo2-buildsrc.html. Since there is now a separate package, msigdbdf, which I can't add to dependencies, I don't have a simple, clean option to fix the build, which is a bit unfortunate.

With this, I have a question: do you have any plans to submit msigdbdf anywhere? From what I understood, you split it out due to size restrictions at CRAN? Did you consider Bioconductor? You can publish a data package there, and they can be pretty big.

Thanks,
Alexey

@igordot igordot added bug Something isn't working enhancement New feature or request labels Mar 18, 2025
Owner

igordot commented Mar 18, 2025

Thank you for reporting the error. Apologies about any difficulties related to the update. As you may have noticed, I mention fgsea in the msigdbr vignette. I recognize it's a very useful tool and I don't want to hinder its development.

The exact error you are seeing is due to MSigDB reorganization. The "CP:KEGG" sub-collection was converted to "CP:KEGG_MEDICUS" and "CP:KEGG_LEGACY".
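For a vignette that has to work across this rename, one option is to pick whichever KEGG sub-collection name the installed data actually provides. The following is only a sketch: `pick_subcollection` is a hypothetical helper, not part of msigdbr, and the name vectors are illustrative rather than taken from any particular release.

```r
# Hypothetical helper (not part of msigdbr): return the first name in
# `preferred` that also appears in `available`, or NA if none match.
pick_subcollection <- function(available,
                               preferred = c("CP:KEGG_LEGACY",
                                             "CP:KEGG_MEDICUS",
                                             "CP:KEGG")) {
  hits <- preferred[preferred %in% available]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Illustrative sub-collection names, as a newer and an older release
# might list them (assumed for this example).
newer <- c("CP:BIOCARTA", "CP:KEGG_LEGACY", "CP:KEGG_MEDICUS", "CP:REACTOME")
older <- c("CP:BIOCARTA", "CP:KEGG", "CP:REACTOME")

pick_subcollection(newer)
pick_subcollection(older)
```

The returned name could then be passed to the msigdbr query. The argument that selects a sub-collection may be named differently across msigdbr versions, so it is worth checking the documentation of the installed version.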

However, you have a valid concern about the separate data package in general. The msigdbr package includes all Hallmark gene sets and a small subset of the rest of the database. Except for changes related to the database organization like renamed collections, code that worked before should still be working. Since only a few gene sets are returned, the results may not be biologically meaningful, but they should not throw an error.

The separate data package resulted from CRAN size limitations. My original plan was to keep msigdbr as a single package on CRAN, which is a great repository with few restrictions. This worked well for several years. Eventually I hit the size limit and postponed updates for a while. I tried to submit a data-only package and it was not approved. I decided to host the data package on R-universe. The caveat with non-CRAN dependencies is that they need to be optional (DESCRIPTION Suggests field), so they are not installed by default.
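For a downstream package, an optional dependency of this kind might be declared in DESCRIPTION roughly as follows. This is a sketch under assumptions: the repository URL is assumed from the R-universe hosting mentioned above and should be checked against the msigdbr installation instructions.

```
Suggests:
    msigdbdf
Additional_repositories: https://igordot.r-universe.dev
```

Because a package listed under Suggests may not be installed, code that uses it would typically check `requireNamespace("msigdbdf", quietly = TRUE)` first and fall back to the data bundled with msigdbr otherwise.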

I am considering Bioconductor ExperimentHub as another option. My main concern with any Bioconductor-based solution is it will be tied to a specific Bioconductor release. The most recent release is only available on the latest Bioconductor release which in turn is tied to the latest R release.

Author

assaron commented Mar 18, 2025

@igordot thanks for the info! I didn't notice that you still store some of the pathways within the msigdbr package; that's a great solution. I still have to modify the vignette, since it reproduces one particular study that relied on KEGG gene sets, but apparently it's not as bad as I thought initially.

> My main concern with any Bioconductor-based solution is it will be tied to a specific Bioconductor release. The most recent release is only available on the latest Bioconductor release which in turn is tied to the latest R release.

My experience with Bioconductor is pretty great. The releases are tied to half-year cycles, but you can always have the most recent version available in the development branch and on GitHub.
