
dbGetQuery() should optionally skip attempting to create a data frame #23

mgoeker opened this issue Oct 30, 2013 · 3 comments

mgoeker commented Oct 30, 2013

Dear RMongo team,

thanks a lot for creating this useful R package. We are currently trying to use RMongo in conjunction with our opm package for database I/O. In our case we are using S4 objects that can be converted to and from JSON or YAML via lists. These are nested lists with a partially undefined structure, fitting well to the MongoDB concept.

I do admit that yielding data frames is normally an appropriate approach in R because users want to work with rectangular data. But in our case the underlying objects are non-rectangular, and we already have our own customized conversion functions to get the S4 objects back from a list.

When trying RMongo with our kind of data, database input was fast and coding was a pleasure (except for the need to convert all dots in R names, but this is apparently a restriction of MongoDB itself). Database queries were more problematic, however, because they were much slower, even though the database contained only the few previously inserted objects. When running R CMD Rprof on dbGetQuery() I noticed that most of the running time is spent within scan(), which seems to be called via read.csv(). So it might be that most of the time is spent unnecessarily trying to create a data frame. The data frames could indeed be converted back to our objects, because each field was a JSON string:

mongo2opm <- function(x) {
  ## split the data frame returned by dbGetQuery() into one row per document
  x <- split(x, seq_len(nrow(x)))
  ## parse every character field (each a JSON string) into a nested list
  x <- rapply(x, rjson::fromJSON, "character", NULL, "replace")
  ## convert the nested lists back into our S4 objects
  opms(x, precomputed = FALSE, skip = FALSE, group = TRUE)
}

...but there might be some unnecessary code here. (Note that opms() is our own function to obtain the objects we want.) So my question is whether dbGetQuery() could optionally skip all attempts to create a data frame and either return a list or a JSON character string, or whether there are any other solutions to the problem.

Yours
Markus
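[Editor's note] The scan()/read.csv() hotspot described above can be reproduced with base R's profiler; this is a minimal sketch, assuming an open RMongo connection `mongo` and a collection name `my_collection` (both hypothetical):

```r
## Profile a single query; Rprof() and summaryRprof() are base R.
Rprof("dbGetQuery.out")
res <- dbGetQuery(mongo, "my_collection", "{}")
Rprof(NULL)
## Inspect where the time went; per the report, scan() (called via
## read.csv()) should dominate the by.total listing.
summaryRprof("dbGetQuery.out")$by.total
```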


tc commented Oct 30, 2013

This sounds like a good feature; maybe it could be introduced as
dbGetQueryJson or dbGetQueryRaw.

Would you like to take a shot at it? I'll be happy to review it.

I'm the only dedicated RMongo dev at the moment, so I could use your help.
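[Editor's note] A sketch of what such a variant might look like at the R level; dbGetQueryJson is hypothetical and not part of RMongo. This version still goes through the existing dbGetQuery(), so it would not fix the speed problem; a real implementation would return the raw JSON strings from the Scala layer and skip the read.csv() step entirely:

```r
## Hypothetical interface sketch only -- name and behaviour are the
## proposal from this thread, not an existing RMongo function.
dbGetQueryJson <- function(rmongo.object, collection, query) {
  df <- dbGetQuery(rmongo.object, collection, query)
  ## return one nested list per document, with each character field
  ## parsed from its JSON string
  lapply(split(df, seq_len(nrow(df))),
         function(row) rapply(as.list(row), rjson::fromJSON,
                              "character", NULL, "replace"))
}
```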



mgoeker commented Nov 10, 2013

Dear Tommy,


Sorry for the delayed response. Last week I had a very tight schedule.

I am not familiar with Scala, hence I made a first attempt in R (file
attached). It might be more efficient, however, to avoid the Scala
dbGetQuery method (and the call to toCsvOutput) and to use the JSON
parser that comes with the Java libraries that are loaded anyway. But
at least the return value is the kind of object I had in mind.

Yours
Markus




tc commented Nov 11, 2013

Hi, can you attach the modifications as a pull request?
