dbGetQuery() should optionally omit to attempt to create a data frame #23
Comments
This sounds like a good feature; maybe it can be introduced as … Would you like to take a shot at it? I'll be happy to review it. I'm the only dedicated RMongo dev at the moment, so I can use your help. On Tue, Oct 29, 2013 at 10:06 PM, mgoeker [email protected] wrote:
Tommy Chheng
Dear Tommy, Quoting Tommy Chheng [email protected]:
Sorry for the delayed response. Last week I had a very tight schedule. I am not familiar with Scala, hence I made a first attempt in R (file …).
Yours
Hi, can you attach the modifications as a pull request?
Dear RMongo team,
thanks a lot for creating this useful R package. We are currently trying to use RMongo in conjunction with our opm package for database I/O. In our case we are using S4 objects that can be converted to and from JSON or YAML via lists. These are nested lists with a partially undefined structure, which fits the MongoDB concept well.
I do admit that yielding data frames is normally an appropriate approach in R because users want to work with rectangular data. But in our case the underlying objects are non-rectangular, and we already have our own customized conversion functions to get the S4 objects back from a list.
When trying RMongo with our kind of data, database input was fast and coding was a pleasure (except for the need to convert all dots in R names, but this is apparently a restriction of MongoDB itself). Database queries were more problematic: they were much slower, even though the database only contained the few previously inserted objects. When running R CMD Rprof on dbGetQuery() I noticed that most of the running time is spent within scan(), which seems to be called via read.csv(). So it may be that most of the time is spent unnecessarily trying to create a data frame. The data frames were indeed convertible because each field was a JSON string:
  mongo2opm <- function(x) {
    x <- split(x, seq_len(nrow(x)))
    x <- rapply(x, rjson::fromJSON, classes = "character", deflt = NULL,
      how = "replace")
    opms(x, precomputed = FALSE, skip = FALSE, group = TRUE)
  }
...but there might be some unnecessary code here. (Note that opms() is our own function to obtain the objects we want.) So my question is whether dbGetQuery() could optionally skip all attempts to create a data frame and either return a list or a JSON character string, or whether there are any other solutions to the problem.
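For anyone who wants to reproduce the profiling observation inside an R session, here is a minimal sketch using Rprof() and summaryRprof() from base R, which produce the same kind of summary that R CMD Rprof prints. The function slow_call() is a hypothetical stand-in, not part of RMongo: it only mimics the suspected read.csv() step on synthetic text, and in practice it would be replaced by the actual dbGetQuery() call against a live database.

```r
# Hypothetical stand-in for the slow query (an assumption, not RMongo code):
# it parses CSV text into a data frame, as dbGetQuery() appears to do.
slow_call <- function(n = 50000) {
  lines <- c("a,b", sprintf("%d,%d", seq_len(n), seq_len(n)))
  read.csv(text = lines)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)                  # start the sampling profiler
for (i in seq_len(5)) {           # repeat so that enough samples are taken
  result <- slow_call()
}
Rprof(NULL)                       # stop profiling

prof <- summaryRprof(prof_file)   # summarise the samples per function
print(head(prof$by.total, 10))    # functions ranked by total time
```

If scan() and read.csv() (or read.table()) dominate the by.total table for the real query as well, this would support the suspicion that the data frame construction, and not the database access, is the bottleneck.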
Yours
Markus