Database of all object references in all CRAN packages #15

Open
mpadge opened this issue Aug 27, 2021 · 2 comments

mpadge commented Aug 27, 2021

@krlmlr Our discussions about Sourcegraph got me thinking that the routines in this package could be used to generate a database of all object references in all CRAN packages: function calls in R, but also arbitrarily complex object references in all other languages used in src and inst. All of this information is currently extracted in the CRAN archive trawl, but then discarded so that each package's statistics can be summarised as a single vector. The full intermediate results could instead be dumped into a database and put somewhere publicly accessible, so that anyone could query object relationships and cross-references within and between all R packages.

I note in particular that the "References" in Sourcegraph seem to be merely text-based rather than actual object references; the whole system treats code as mere text. With these routines we could build a proper Sourcegraph-like system that links any object (function, class, struct, whatever) to all references to it in all CRAN packages. Thoughts?

krlmlr commented Aug 29, 2021

I love the idea of such a database. I think support for R code is most important -- I often want to find uses of methods or functions in other packages.

What would be the size of the database? Should we start with a machine-readable dump into individual files committed to GitHub, and take it from there?
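
As a rough illustration of what "a machine-readable dump into individual files" might look like, here is a minimal sketch that writes one CSV file per package. The function name `dump_calls`, the file naming scheme, and the column names are all assumptions made for this example and are not part of pkgstats; the three counts are a subset of the dplyr 1.0.7 values reported further down in this thread.

```r
# Hypothetical layout: one CSV file per package, named "<pkg>_<version>.csv",
# with one row per called package and the number of calls made to it.
dump_calls <- function (calls, pkg, version, dir) {
    if (!dir.exists (dir)) {
        dir.create (dir, recursive = TRUE)
    }
    f <- file.path (dir, paste0 (pkg, "_", version, ".csv"))
    write.csv (calls, file = f, row.names = FALSE)
    invisible (f) # return the path so callers can re-read or commit the file
}

# Illustrative subset of the external calls made by dplyr 1.0.7
calls <- data.frame (pkg = c ("base", "generics", "rlang"),
                     ncalls = c (654L, 22L, 3L))

out_dir <- file.path (tempdir (), "external-calls")
f <- dump_calls (calls, "dplyr", "1.0.7", out_dir)

# Reading the file back recovers the same table
calls2 <- read.csv (f)
```

Plain per-package text files like this would diff cleanly in git, which seems to suit the "commit to GitHub and take it from there" approach; whether that scales to all of CRAN is exactly the size question raised above.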

mpadge commented Sep 22, 2021

@krlmlr Interim progress report on this: pkgstats can now extract the network of all external calls, entirely through static analysis. Example via #20, with a summary of all external calls from dplyr:

```r
library (pkgstats)
packageVersion ("pkgstats")
#> [1] '0.0.1.6'
u <- "https://cran.r-project.org/src/contrib/dplyr_1.0.7.tar.gz"
path <- file.path (tempdir (),
                   tail (strsplit (u, "\\/") [[1]], 1))
download.file (u, destfile = path)

s <- pkgstats (path)
pkgstats_summary (s)$external_calls
#> [1] "base:654,DBI:3,dplyr:316,generics:22,glue:7,graphics:1,lobstr:3,methods:11,pillar:4,rlang:3,RSQLite:1,stats:5,tidyselect:9,utils:10,vctrs:5"
# Counts of numbers of external calls to different pkgs

# Can be processed to extract further info:
x <- strsplit (pkgstats_summary (s)$external_calls, ",") [[1]]
x <- do.call (rbind, strsplit (x, ":"))
x <- data.frame (pkg = x [, 1],
                 ncalls = as.integer (x [, 2]))
x$ncalls_rel <- round (x$ncalls / sum (x$ncalls), 3)
x <- x [order (x$ncalls, decreasing = TRUE), ]
rownames (x) <- NULL
print (x)
#>           pkg ncalls ncalls_rel
#> 1        base    654      0.620
#> 2       dplyr    316      0.300
#> 3    generics     22      0.021
#> 4     methods     11      0.010
#> 5       utils     10      0.009
#> 6  tidyselect      9      0.009
#> 7        glue      7      0.007
#> 8       stats      5      0.005
#> 9       vctrs      5      0.005
#> 10     pillar      4      0.004
#> 11        DBI      3      0.003
#> 12     lobstr      3      0.003
#> 13      rlang      3      0.003
#> 14   graphics      1      0.001
#> 15    RSQLite      1      0.001
```

Created on 2021-09-22 by the reprex package (v2.0.0.9000)
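
To show how such summaries might feed a cross-package lookup, the parsing steps above can be wrapped in a function and the results stacked into a single edge table. This is only a sketch: `parse_external_calls`, the `from`/`to` column names, and the `summaries` vector are hypothetical and not part of pkgstats, and the only data used is the dplyr string printed above, so the block runs without pkgstats installed.

```r
# Hypothetical helper: turn one "pkg:n,pkg:n,..." summary string into a
# data frame of edges from the analysed package to each package it calls.
parse_external_calls <- function (s, from) {
    x <- do.call (rbind, strsplit (strsplit (s, ",") [[1]], ":"))
    data.frame (from = from,
                to = x [, 1],
                ncalls = as.integer (x [, 2]))
}

# Summary strings keyed by package name. Only the dplyr value from the
# output above is included; further packages would be added the same way.
summaries <- c (
    dplyr = "base:654,DBI:3,dplyr:316,generics:22,glue:7,graphics:1,lobstr:3,methods:11,pillar:4,rlang:3,RSQLite:1,stats:5,tidyselect:9,utils:10,vctrs:5"
)

# Stack the per-package tables into one edge list
edges <- do.call (rbind,
                  unname (Map (parse_external_calls,
                               summaries, names (summaries))))
rownames (edges) <- NULL

# All recorded calls into a given package, e.g. rlang
rlang_users <- edges [edges$to == "rlang", ]
```

With more packages in `summaries`, the same filter on `edges$to` would list every package that calls into a given one, although only at the package level rather than for individual functions or objects.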

mpadge mentioned this issue Dec 6, 2021
mpadge added a commit that referenced this issue Dec 6, 2021