Description
I'm running into an issue where code that used to run fine has become so slow that it doesn't finish. I've traced the slowness back to the function `[.integer64` in this package.
Lines 886 to 905 in 62cd4ee
`[.integer64` <- function(x, i, ...) {
  cl <- oldClass(x)
  ret <- NextMethod()
  # Begin NA-handling from Leonardo Silvestri
  if (!missing(i)) {
    if (inherits(i, "character")) {
      na_idx <- union(which(!(i %in% names(x))), which(is.na(i)))
      if (length(na_idx))
        ret[na_idx] <- NA_integer64_
    } else {
      na_idx <- is.na(rep(TRUE, length(x))[i])
      if (any(na_idx))
        ret[na_idx] <- NA_integer64_
    }
  }
  # End NA-handling from Leonardo Silvestri
  oldClass(ret) <- cl
  remcache(ret)
  ret
}
Or more concretely to this line:
na_idx <- is.na(rep(TRUE, length(x))[i])

In my case I'm using DBI to fetch the results of a query against a duckdb database in batches. Here i is 100k integers and x is almost 500 million integer64 values. Materializing a logical vector of length(x) on every call seems to be the issue. Tag 4.0.5 does not have this problem, while 4.5.2 and later do. I think the following bugfix from the changelog is responsible: ""[.integer64"(x,i) can now cope with i longer than x"
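To put the size of that temporary in perspective, here is a rough back-of-the-envelope sketch. It relies on R storing each logical element in 4 bytes, and it assumes x has 5e8 elements (a stand-in for the "almost 500 million" mentioned above; the variable names are only illustrative).

```r
# R stores each logical element in 4 bytes, so rep(TRUE, n) costs about 4 * n bytes.
n_small <- 1e6
tmp <- rep(TRUE, n_small)
bytes_per_element <- as.numeric(object.size(tmp)) / n_small

# Assumed size of x: 5e8 elements ("almost 500 million" in the report above).
n_x <- 5e8
est_gb <- 4 * n_x / 1e9   # estimated size of the temporary, in GB

cat(sprintf("%.2f bytes per element, about %.1f GB for n = %g\n",
            bytes_per_element, est_gb, n_x))
```

The temporary is allocated on every subsetting call, regardless of how short i is, which is consistent with the cost scaling with length(x) rather than length(i).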
The following code should demonstrate the difference when running with tags 4.0.5 and 4.6.0-1
# remotes::install_github("r-lib/bit64@4.0.5") or remotes::install_github("r-lib/bit64@4.6.0-1")
library(bit64)
library(microbenchmark)
x <- as.integer64(rep(1, 1e8))
i <- sample(c(NA, 1:1e4, 1e7 + 1), size = 1e5, replace = TRUE)
microbenchmark(
  bit64 = x[i],
  times = 10L
)

4.0.5:
Unit: microseconds
expr min lq mean median uq max neval
bit64 672.959 894.574 916.0243 945.992 961.139 1027.361 10
4.6.0-1:
Unit: milliseconds
expr min lq mean median uq max neval
bit64 194.0301 200.1372 213.4576 204.3696 233.6707 235.2293 10
current main:
Unit: milliseconds
expr min lq mean median uq max neval
bit64 193.7276 200.4751 214.2612 205.575 232.2011 238.9536 10
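That is a slowdown of roughly 220x by median (about 0.95 ms versus about 204 ms). I thought about replacing the offending line with
na_idx <- is.na(i) | (i > length(x))

Unfortunately, that doesn't work when i is logical and length(i) > length(x) (an untested case, btw): a logical i is compared as 0/1 rather than as a position, so out-of-range positions are never flagged.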
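One possible way around the logical case, offered as an untested-in-the-package sketch rather than a proposed patch: take the cheap path only for numeric subscripts without negative entries, and fall back to the current expression for everything else (logical or negative i). The helper names `na_idx_reference` and `na_idx_fast` are hypothetical and do not exist in bit64; the sketch uses base R only.

```r
# Current behaviour: allocates a logical vector of length n on every call.
na_idx_reference <- function(i, n) is.na(rep(TRUE, n)[i])

# Hypothetical alternative: avoids the length-n temporary for the common case.
na_idx_fast <- function(i, n) {
  # Fast path only for numeric subscripts with no negative entries.
  if (is.numeric(i) && !any(i < 0, na.rm = TRUE)) {
    i <- trunc(unname(i))       # R truncates fractional subscripts toward zero
    i <- i[is.na(i) | i != 0]   # zero subscripts are dropped from the result
    return(is.na(i) | i > n)    # NA subscripts and out-of-range positions give NA
  }
  # Logical or negative subscripts: keep the existing (slower) behaviour.
  na_idx_reference(i, n)
}

# Small checks against the reference, including a logical i longer than x.
n <- 10L
i_num <- c(NA, 1, 3, 0, 11, 10.5, 2.9)
i_lgl <- c(rep(TRUE, 10), NA, FALSE, TRUE)
print(identical(na_idx_fast(i_num, n), na_idx_reference(i_num, n)))
print(identical(na_idx_fast(i_lgl, n), na_idx_reference(i_lgl, n)))
```

The fallback still pays the full cost for logical and negative subscripts, but in those cases the subscript or the result is typically on the order of length(x) anyway, so the extra temporary matters less.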