Database of all object references in all CRAN packages #15
Comments
I love the idea of such a database. I think support for R code is most important -- I often want to find uses of methods or functions in other packages. What would be the size of the database? Should we start with a machine-readable dump into individual files committed to GitHub, and take it from there?
@krlmlr Interim progress report on this: pkgstats can now extract the network of all external calls, all done through static analysis. Example via #20, with a summary of all external calls:
library (pkgstats)
packageVersion ("pkgstats")
#> [1] '0.0.1.6'
u <- "https://cran.r-project.org/src/contrib/dplyr_1.0.7.tar.gz"
path <- file.path (tempdir (),
tail (strsplit (u, "\\/") [[1]], 1))
download.file (u, destfile = path)
s <- pkgstats (path)
pkgstats_summary (s)$external_calls
#> [1] "base:654,DBI:3,dplyr:316,generics:22,glue:7,graphics:1,lobstr:3,methods:11,pillar:4,rlang:3,RSQLite:1,stats:5,tidyselect:9,utils:10,vctrs:5"
# Counts of numbers of external calls to different pkgs
# Can be processed to extract further info:
x <- strsplit (pkgstats_summary (s)$external_calls, ",") [[1]]
x <- do.call (rbind, strsplit (x, ":"))
x <- data.frame (pkg = x [, 1],
ncalls = as.integer (x [, 2]))
x$ncalls_rel <- round (x$ncalls / sum (x$ncalls), 3)
x <- x [order (x$ncalls, decreasing = TRUE), ]
rownames (x) <- NULL
print (x)
#>           pkg ncalls ncalls_rel
#> 1        base    654      0.620
#> 2       dplyr    316      0.300
#> 3    generics     22      0.021
#> 4     methods     11      0.010
#> 5       utils     10      0.009
#> 6  tidyselect      9      0.009
#> 7        glue      7      0.007
#> 8       stats      5      0.005
#> 9       vctrs      5      0.005
#> 10     pillar      4      0.004
#> 11        DBI      3      0.003
#> 12     lobstr      3      0.003
#> 13      rlang      3      0.003
#> 14   graphics      1      0.001
#> 15    RSQLite      1      0.001
Created on 2021-09-22 by the reprex package (v2.0.0.9000)
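The parsing steps above could be wrapped in a small helper, so that summaries from several packages can be stacked into one long table and written out as one machine-readable file per package, as suggested earlier in the thread. This is only a sketch in base R: `parse_external_calls()` and `dump_calls()` are hypothetical names, not part of pkgstats, and the input string is copied from the dplyr output above rather than recomputed.

```r
# Hypothetical helper (not part of pkgstats): turn one "pkg:n,pkg:n,..."
# summary string into a data frame with one row per called package.
parse_external_calls <- function (calls, from) {
    parts <- strsplit (strsplit (calls, ",") [[1]], ":")
    data.frame (
        from = from,
        to = vapply (parts, function (p) p [1], character (1)),
        ncalls = as.integer (vapply (parts, function (p) p [2], character (1))),
        stringsAsFactors = FALSE
    )
}

# String copied from the dplyr 1.0.7 summary shown above
dplyr_calls <- paste0 (
    "base:654,DBI:3,dplyr:316,generics:22,glue:7,graphics:1,lobstr:3,",
    "methods:11,pillar:4,rlang:3,RSQLite:1,stats:5,tidyselect:9,utils:10,vctrs:5"
)
edges <- parse_external_calls (dplyr_calls, from = "dplyr")

# Hypothetical dump: write one CSV file per calling package into `dir`,
# and return the names of the CSV files found there afterwards.
dump_calls <- function (edges, dir) {
    for (p in unique (edges$from)) {
        f <- file.path (dir, paste0 (p, ".csv"))
        write.csv (edges [edges$from == p, ], f, row.names = FALSE)
    }
    invisible (list.files (dir, pattern = "\\.csv$"))
}

out_dir <- file.path (tempdir (), "external_calls")
dir.create (out_dir, showWarnings = FALSE)
files <- dump_calls (edges, out_dir)
```

Results for further packages could be appended with `rbind ()` before dumping. The long `from`/`to`/`ncalls` layout is just one possible choice; it keeps each file easy to diff if committed to GitHub.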
@krlmlr Our discussions about Sourcegraph got me thinking that the routines in this package could be used to generate a database of all object references in all CRAN packages: function calls in R, but also arbitrarily more complex object references in all other `src` and `inst` languages. All of this information is currently extracted in the CRAN archive trawl, yet ultimately discarded in order to summarise all stats for each package as a single vector. The full intermediate results could nevertheless be dumped in a database, the whole thing put in some publicly accessible place, and everyone would have the ability to query object relationships and cross-references within and between all R packages.

I note in particular that the "References" in Sourcegraph seem to be merely text-based, and are not actual object references: the whole system treats code as mere text. With this system we could build a proper Sourcegraph-like system that linked any object (function, class, struct, whatever) to all other references in all CRAN packages. Thoughts?
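To make the idea a little more concrete, here is a minimal sketch of what one table in such a database and a "find all references" query might look like. Everything in it is an assumption for illustration: the column names, the `find_references()` helper, and the package and object names (`pkgA`, `fit_model`, and so on) are invented, and none of it reflects what pkgstats actually stores.

```r
# Toy reference table; every row is invented for illustration only.
# Each row records that object `from_obj` in `from_pkg` refers to
# object `to_obj` in `to_pkg`.
refs <- data.frame (
    from_pkg = c ("pkgA", "pkgA", "pkgB", "pkgC"),
    from_obj = c ("fit_model", "fit_model", "summarise_all", "plot_fit"),
    to_pkg = c ("pkgB", "stats", "stats", "pkgA"),
    to_obj = c ("summarise_all", "lm", "median", "fit_model"),
    language = c ("R", "R", "R", "R"),
    stringsAsFactors = FALSE
)

# Hypothetical query: list every object that references `pkg::obj`
find_references <- function (refs, pkg, obj) {
    index <- refs$to_pkg == pkg & refs$to_obj == obj
    hits <- refs [index, c ("from_pkg", "from_obj")]
    rownames (hits) <- NULL
    hits
}

find_references (refs, "pkgA", "fit_model")
```

Because each row names a specific object on both sides, a query like this would return actual object references rather than text matches. A `language` column would leave room for references found in compiled code.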