Implement duckdb #428

shahronak47 · 2025-01-13T15:37:41Z

Hi Andres, I have added the code to the implement-duckdb branch. First, you need to set the path to the DuckDB file, which I have on the E drive.

devtools::load_all()
Sys.setenv('PIP_CACHE_FILE' = fs::path('e:/PIP/pipapi_data/demo.duckdb'))

There is a vignette called duckdb-caching.Rmd that explains the process. The DuckDB file contains two master files (stored as tables), rg_master_file and fg_master_file, and the fill_gaps argument determines which one is used. I have left both empty so that you can test and verify the examples.
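As a rough illustration of the selection described above, the choice between the two master files can be sketched as follows. The helper name `select_master_table()` is hypothetical, and the mapping (fill_gaps = TRUE to fg_master_file, otherwise rg_master_file) is inferred from the table names and the examples, not taken from the pipapi source.

```r
# Hypothetical helper: pick the cache table that matches `fill_gaps`.
# Table names come from the description above; the mapping is an assumption.
select_master_table <- function(fill_gaps = FALSE) {
  stopifnot(is.logical(fill_gaps), length(fill_gaps) == 1L, !is.na(fill_gaps))
  if (fill_gaps) "fg_master_file" else "rg_master_file"
}

select_master_table(FALSE)
select_master_table(TRUE)
```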

Here are some examples for you to try out. Please create the lkup object before running the following code.

## 1.

pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)
# Since rg_master_file was empty initially, now it should have 2 rows
DBI::dbGetQuery(con, "select * from rg_master_file")

## 2. 
pip(country = c("AGO"), year = 2000, lkup = lkup)
# This has already been calculated above so the number of rows should still remain as 2
DBI::dbGetQuery(con, "select * from rg_master_file")

## 3.
pip(country = c("AGO", "COL"), year = 2000, lkup = lkup)
# We now have a new country here so the number of rows will now be 3
DBI::dbGetQuery(con, "select * from rg_master_file")

## 4. 

pip(country = "all", year = 2000, lkup = lkup)
DBI::dbGetQuery(con, "select * from rg_master_file")

## 5. 

pip(country = "AGO", year = "all", lkup = lkup)
DBI::dbGetQuery(con, "select * from rg_master_file")

## 6. 

pip(country = "AGO", year = "all", lkup = lkup, fill_gaps = TRUE)
DBI::dbGetQuery(con, "select * from rg_master_file")
DBI::dbGetQuery(con, "select * from fg_master_file")

dbDisconnect(con)

Once you have got the hang of it, you can try out different pip calls and verify the caching algorithm and the results.
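The snippets above assume that a `con` object already exists. A minimal sketch of how such a connection might be opened for inspection is shown below. It uses a temporary DuckDB file with an empty rg_master_file table as a stand-in for the real cache; the column names are made up for the demo, and in practice the path would presumably come from Sys.getenv("PIP_CACHE_FILE").

```r
library(DBI)

# Stand-in for the real cache: a temporary DuckDB file with an empty table.
# The columns below are hypothetical and only serve the demo.
cache_file <- tempfile(fileext = ".duckdb")
setup <- dbConnect(duckdb::duckdb(), dbdir = cache_file)
dbExecute(
  setup,
  "CREATE TABLE rg_master_file (country_code VARCHAR, reporting_year INTEGER)"
)
dbDisconnect(setup, shutdown = TRUE)

# Read-only connection for inspecting the cache between pip() calls.
con <- dbConnect(duckdb::duckdb(), dbdir = cache_file, read_only = TRUE)
n_rows <- dbGetQuery(con, "SELECT count(*) AS n FROM rg_master_file")$n
print(n_rows)

# Close explicitly once the inspection is done.
dbDisconnect(con, shutdown = TRUE)
```

Opening the inspection connection as read-only is a design choice for this sketch: it keeps manual checks from modifying the cache that pip() writes to.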

randrescastaneda

Hi @shahronak47,

Thank you for your PR. This is looking great. I reviewed the code and have a few points to discuss:

- The lines DBI::dbGetQuery(con, "select * ") cannot be executed because the con object is not available in the global environment. However, this should not be an issue if subsequent calls to pip() use the con object created in the previous call.
- Running pip(country = "AGO", year = "all", lkup = lkup, fill_gaps = TRUE) produces the following warning:
  ```
  Warning message:
  Connection is garbage-collected, use dbDisconnect() to avoid this.
  ```
- The idea of displaying whether the data is loaded from cache is great. However, this message should only appear in interactive mode, right?
- We need a separate DuckDB file for each release, and it should not be placed at the root of the folder. The cache must be release-dependent.
- The con object should be created in the .Rprofile when the lookups are created.
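Two of the points above, the garbage-collection warning and the interactive-only message, can be illustrated with a small sketch. The function `query_cache()` and its arguments are hypothetical and are not part of pipapi; the sketch only shows the general pattern of closing the connection with on.exit() and guarding the message with interactive().

```r
library(DBI)

# Hypothetical wrapper: run one query and always release the connection.
query_cache <- function(cache_file, sql, from_cache = TRUE) {
  con <- dbConnect(duckdb::duckdb(), dbdir = cache_file)
  # Disconnect explicitly on exit so the connection is never left for the
  # garbage collector to clean up.
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)

  # Report the cache status only when someone is at the console.
  if (interactive() && from_cache) {
    message("Returning data from cache.")
  }

  dbGetQuery(con, sql)
}

# Demo on a throwaway database file.
tmp <- tempfile(fileext = ".duckdb")
res <- query_cache(tmp, "SELECT 42 AS answer")
print(res)
```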

Let's discuss this further in a phone call.

Thank you.

shahronak47 · 2025-02-14T17:53:07Z

Hi Andres,

Based on our discussions, here are the changes implemented:

- The DuckDB file name changed from demo.duckdb to cache.duckdb.
- cache.duckdb now sits in the root of the release folder.
- A function, reset_cache, has been introduced to erase the data for a specific release. It erases the cache only when two environment variables, PIP_CACHE_LOCAL_KEY and PIP_CACHE_SERVER_KEY, have the same value. PIP_CACHE_SERVER_KEY is set on the server and PIP_CACHE_LOCAL_KEY is passed by the user.
- Read and write connections are created separately in the pip call and are closed immediately after use.
- Set options(pipapi.query_live_data = TRUE) to bypass the cache.
- A new endpoint (/duckdb-reset) has also been introduced to reset the cache; however, I need to talk to you about how it will be used.
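The key check behind reset_cache and the live-data option can be sketched in base R as follows. Both helper names (`cache_keys_match()` and `use_live_data()`) are hypothetical, and the extra requirement that neither key be empty is an assumption of this sketch; the actual reset_cache implementation may differ.

```r
# Hypothetical guard: TRUE only when both keys are set and identical.
# Requiring non-empty keys is an assumption, not stated in the PR.
cache_keys_match <- function() {
  local_key  <- Sys.getenv("PIP_CACHE_LOCAL_KEY", unset = "")
  server_key <- Sys.getenv("PIP_CACHE_SERVER_KEY", unset = "")
  nzchar(local_key) && nzchar(server_key) && identical(local_key, server_key)
}

# Hypothetical helper: TRUE when the cache should be bypassed.
use_live_data <- function() {
  isTRUE(getOption("pipapi.query_live_data", default = FALSE))
}

# Matching keys allow a reset; mismatched keys do not.
Sys.setenv(PIP_CACHE_LOCAL_KEY = "abc", PIP_CACHE_SERVER_KEY = "abc")
cache_keys_match()

Sys.setenv(PIP_CACHE_LOCAL_KEY = "xyz")
cache_keys_match()

# Bypass the cache for the current session.
options(pipapi.query_live_data = TRUE)
use_live_data()
```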

shahronak47 · 2025-03-06T16:13:16Z

Hi @randrescastaneda, one thing to note here is that the latest duckplyr (> 1.0.0) depends on dplyr, so we need to import dplyr in pipapi.

randrescastaneda · 2025-03-10T14:33:31Z

> Hi @randrescastaneda, one thing to note here is that the latest duckplyr (> 1.0.0) depends on dplyr, so we need to import dplyr in pipapi.

OOHHH. Ok. Let's do it. It is not ideal, but it is what it is. Thanks for letting me know.

Best,
Andres

shahronak47 and others added 21 commits December 12, 2024 06:46

first draft

0beb12c

bring all functions in pipapi

8f799dd

finish 3rd case

5adc8bf

draft for case 4

8d0d547

change all case

48669d7

fix case 4

6efb333

draft push

af7bad1

making sure everything works except country and year all

dbbfa79

new version

fa96de3

fix for new implementation

cd752f3

change for fill_gaps

3da0aff

use more keys for joining

55ccf31

final touches

4d40952

add data'

050fa11

Speed comparison

49d5d55

time complete

f4295b2

add more stats

7e377aa

more comparison

82902af

update vignettee

2726d0e

update numbers

819dc36

call connection object only once

33bb2f8

shahronak47 commented Jan 13, 2025

Ronak Sunil Shah and others added 8 commits January 14, 2025 06:20

update speed comparison

21d4331

include dcos

83cc628

fix conflicts

79b05e2

rm missing comma

b6795ad

push draft

9aaf684

ready for separate master files

7c52503

remove bugs

4f03a8d

fix docs

70ed16e

update timing

fb62569

shahronak47 requested a review from randrescastaneda January 24, 2025 16:08

shahronak47 added 2 commits January 27, 2025 20:01

add updates

c2c5fbb

Vignette builder

82c8903

randrescastaneda requested changes Feb 4, 2025

shahronak47 added 7 commits February 7, 2025 23:22

separate read and write connection

d054382

added reset cache function

f668d2e

move connection in func;reset_cache ready for API

2d4c9f4

option to query live data

0fd49fc

Merge branch 'DEV' into implement-duckdb

dc65a50

add an API endpoint

6ed301e

Merge branch 'implement-duckdb' of https://github.com/PIP-Technical-T…

5bcec83

…eam/pipapi into implement-duckdb

Ronak Sunil Shah and others added 11 commits February 19, 2025 11:46

fix-fg_pip_local

80eca1d

early response for empty table

a6cec56

Merge branch 'implement-duckdb' of https://github.com/PIP-Technical-T…

5ed78a4

…eam/pipapi into implement-duckdb

add test file for testing cache

f75b011

add default pipapi.query_live_data option and clean it up a little

344d001

create file if it doesn't exist

f37d565

Merge branch 'implement-duckdb' of https://github.com/PIP-Technical-T…

a61cdb7

…eam/pipapi into implement-duckdb

lineup_year issue solve

3d8b3d0

fix tests

8514b1b

change condition

5afb426

depend on latest duckplyr

c0f30aa

update vignette

1623ca8

randrescastaneda approved these changes Mar 19, 2025

randrescastaneda merged commit aa15bc9 into DEV Mar 19, 2025

randrescastaneda deleted the implement-duckdb branch March 19, 2025 17:24
