
dbGetQuery() should optionally skip attempting to create a data frame #23

mgoeker opened this issue Oct 30, 2013 · 3 comments

mgoeker commented Oct 30, 2013

Dear RMongo team,

thanks a lot for creating this useful R package. We are currently trying to use RMongo in conjunction with our opm package for database I/O. In our case we are using S4 objects that can be converted to and from JSON or YAML via lists. These are nested lists with a partially undefined structure, fitting well to the MongoDB concept.

I do admit that yielding data frames is normally an appropriate approach in R because users want to work with rectangular data. But in our case the underlying objects are non-rectangular, and we already have our own customized conversion functions to get the S4 objects back from a list.

When trying RMongo with our kind of data, database input was fast and coding was a pleasure (except for the need to convert all dots in R names, but this is apparently a restriction of MongoDB itself). Database queries were more problematic, however, because they were much slower, even though the database contained only the few previously inserted objects. When running R CMD Rprof on dbGetQuery() I noticed that most of the running time is spent within scan(), which seems to be called via read.csv(). So it might be that most of the time is spent unnecessarily trying to create a data frame. The data frames could indeed be converted back to our objects, because each field was a JSON string:

mongo2opm <- function(x) {
  ## split the data frame returned by dbGetQuery() into one row per document
  x <- split(x, seq_len(nrow(x)))
  ## parse every character field (each a JSON string) into a nested list
  x <- rapply(x, rjson::fromJSON, "character", NULL, "replace")
  ## convert the nested lists back into our S4 objects
  opms(x, precomputed = FALSE, skip = FALSE, group = TRUE)
}

...but there might be some unnecessary code here. (Note that opms() is our own function to obtain the objects we want.) So my question is whether dbGetQuery() could optionally skip all attempts to create a data frame and either return a list or a JSON character string, or whether there are any other solutions to the problem.

Yours
Markus
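[Editor's note] The scan()/read.csv() hotspot described above can be reproduced with base R's profiler; this is a minimal sketch, assuming an open RMongo connection `mongo` and a collection name `my_collection` (both hypothetical):

```r
## Profile a single query; Rprof() and summaryRprof() are base R.
Rprof("dbGetQuery.out")
res <- dbGetQuery(mongo, "my_collection", "{}")
Rprof(NULL)
## Inspect where the time went; per the report, scan() (called via
## read.csv()) should dominate the by.total listing.
summaryRprof("dbGetQuery.out")$by.total
```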


tc commented Oct 30, 2013

This sounds like a good feature; maybe it could be introduced as
dbGetQueryJson or dbGetQueryRaw.

Would you like to take a shot at it? I'll be happy to review it.

I'm the only dedicated RMongo dev at the moment, so I could use your help.
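[Editor's note] A sketch of what such a variant might look like at the R level; dbGetQueryJson is hypothetical and not part of RMongo. This version still goes through the existing dbGetQuery(), so it would not fix the speed problem; a real implementation would return the raw JSON strings from the Scala layer and skip the read.csv() step entirely:

```r
## Hypothetical interface sketch only -- name and behaviour are the
## proposal from this thread, not an existing RMongo function.
dbGetQueryJson <- function(rmongo.object, collection, query) {
  df <- dbGetQuery(rmongo.object, collection, query)
  ## return one nested list per document, with each character field
  ## parsed from its JSON string
  lapply(split(df, seq_len(nrow(df))),
         function(row) rapply(as.list(row), rjson::fromJSON,
                              "character", NULL, "replace"))
}
```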



mgoeker commented Nov 10, 2013

Dear Tommy,


Sorry for the delayed response. Last week I had a very tight schedule.

I am not familiar with Scala, hence I made a first attempt in R (file
attached). It might be more efficient, however, to avoid the Scala
dbGetQuery method (and the call to toCsvOutput) and to use the JSON
parser that comes with the Java libraries that are loaded anyway. But
at least the return value is the kind of object I had in mind.

Yours
Markus




tc commented Nov 11, 2013

Hi, can you attach the modifications as a pull request?
