Development is a community effort, and we welcome participation.

Issues

Before posting a new issue or discussion topic, please take a moment to search for existing similar threads in order to avoid duplication.

For bug reports: if you can, please install the latest GitHub version of pipapi (i.e. remotes::install_github("PIP-Technical-Team/pipapi")) and verify that the issue still persists.

Describe your issue in prose as clearly and concisely as possible.

For any problem you identify, post a minimal reproducible example so the maintainer can troubleshoot. A reproducible example is:

Runnable: post enough R code and data so that any onlooker can reproduce the error on their own computer.

Minimal: reduce the runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
Contributions

External code contributions are extremely helpful in the right circumstances. Here are the recommended steps.

Before contributing, please propose your idea in a discussion topic or issue thread so you and the maintainer can define the intent and scope of your work.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
What this means is that if the environment variable checked by Sys.getenv("PIPAPI_APPLY_CACHING") is set, we create a caching file which saves the result on disk. This disk is your local computer if you are working locally, or the server if the package is deployed there. The caching file is generated at location d, i.e. rappdirs::user_cache_dir("pipapi") in this case; however, you can change it to any local folder while debugging. The rest of the lines register the functions that we are caching, such as pip, ui_hp_stacked, pip_grp_logic and so on.
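As a minimal sketch (variable names are illustrative; the actual precaching script differs), the setup amounts to wrapping each cached function with memoise backed by a cachem disk cache:

```r
library(memoise)
library(cachem)

# Sketch only: cache to disk when the environment variable is set
if (nzchar(Sys.getenv("PIPAPI_APPLY_CACHING"))) {
  cache_dir <- rappdirs::user_cache_dir("pipapi")  # swap for a local folder while debugging
  cd <- cachem::cache_disk(
    dir     = cache_dir,
    logfile = file.path(cache_dir, "cache.log")  # records every caching operation
  )
  pip <- memoise::memoise(pip, cache = cd)
  # ... likewise for ui_hp_stacked, pip_grp_logic, and the other cached functions
}
```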
So how this works: let's say we call any function which is cached, for example pip. A file is then created at the cache location which contains its output.

The name of this file is a hash of the arguments passed to the cached function (pip), and the log file records all caching operations. If we call the same function with the same arguments again, the cached result is used and the output is returned instantly. No processing is done at all in this case. No new file is generated this time around, but the log file is updated.
Debugging caching
In the pip-precaching-script repository, I have created a branch called debug-ronak, and in this branch, if you look at the file main.R, you will see the exact script that I used to debug the API. Note that there are two levels at which we need to check caching or any general PIP issue. One is when you are using pip directly as a function, as shown above, like pip(country = "all", year = "all", lkup = lkup), and the other is via the API, as shown in the main.R file. Also note that I am using the pip function as a general example here; this is true for all the functions in the pipapi package. The API calls the functions from the pipapi package, so the basic code is the same across both levels. However, the API has some additional layers on top of these functions which might make them behave differently.
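For illustration, the two levels might look like this (the local URL, port and endpoint path are assumptions, not taken from main.R):

```r
# Level 1: call the function directly from the pipapi package
res_fun <- pip(country = "all", year = "all", lkup = lkup)

# Level 2: call the same logic through a running API
# (assumes an API launched locally on port 8080)
res_api <- httr::GET("http://localhost:8080/api/v1/pip?country=all&year=all")
```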
The recent case of caching not working only for country = "all" and year = "all" was visible via the API; however, when we used the pip call directly, caching was working perfectly fine. So in this case it was something in the API that was causing the trouble. It is very rare, but it does happen every now and then. And just to conclude the topic: the issue was that for intensive calculations like country = "all" and year = "all" we were using the promises::future_promise function for asynchronous calling, and we forgot to include the promises package in the DESCRIPTION file of the pipapi package. So the promises package was not available, and the call did not work when using country = "all" and year = "all".
Moreover, it is very important to kill the API you are launching if you are using the same port for debugging (apis$kill()). We are using callr to launch the API in a new session, so we can't actually "see" that a session has been launched, and it is important to be aware of this. For example, if we launch the API on port 8080 with callr, that session is busy in the background. The current session that we have access to works as normal and we can execute our code. If we make some changes and run the same code to launch the API, it would not reflect the changes, because the background session is still busy with the previous code and has not been killed yet. In such a scenario we have a couple of options:

Use a new port number to run the code in a different session.

Kill the previous session and launch the API again to see the changes.
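A minimal sketch of this launch/kill cycle (the exact start_api() arguments are illustrative):

```r
library(callr)

# Launch the API in a background R session; the handle is the only
# visible trace that a session exists
apis <- callr::r_bg(function() {
  pipapi::start_api(port = 8080)
})

# ... debug against the running API ...

# Kill the background session before relaunching on the same port,
# otherwise the old code keeps serving port 8080
apis$kill()
```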
The current caching mechanism for pip uses traditional caching: a hash is created from the values of the arguments passed to the function, and if someone calls the same function with the same arguments again, the cached result is returned instead of doing the same calculation again. For pip we used the packages cachem and memoise to implement this system of caching. This traditional caching strategy works well in general; however, pip is a special case, and it would benefit much more from a custom caching strategy.
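The traditional argument-hash behaviour can be demonstrated with memoise alone (slow_square is a toy function, not part of pipapi):

```r
library(memoise)

slow_square <- function(x) { Sys.sleep(1); x^2 }
fast_square <- memoise(slow_square)

fast_square(4)  # first call: computed (takes ~1 second)
fast_square(4)  # same arguments hash to the same key: served from cache instantly
fast_square(5)  # different arguments: computed again
```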
How caching currently works
Consider these pip calls

# 1.
pip(country = "all", year = 2000, lkup = lkup)

# 2.
pip(country = "AGO", year = 2000, lkup = lkup)
Now, since these are separate sets of arguments, 2 caching files are created and saved on the disk. If a call to pip is then made again, pip(country = "AGO", year = 2000, lkup = lkup), which is the same as 2), it would return the result from the cached file stored on the disk without doing any calculation. Needless to say, this is much faster.
However, notice that the 2nd call is a subset of the 1st one. What I mean by that is that the result of 2) is already present in the result of 1). We have done the calculations for all the countries for the year 2000 in 1); we just need the output for "AGO" from it to get the result for 2).
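In data.table terms, the idea looks like this (the column name country_code is an assumption about pip's output, used only for illustration):

```r
# Hypothetical illustration: the result of call 1 already contains call 2
res_all <- pip(country = "all", year = 2000, lkup = lkup)  # 1.
res_ago <- res_all[country_code == "AGO"]                  # the same rows call 2 would return
```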
Custom caching for pipapi

What if we could take a subset of an existing cache, as described above? However, this is not how traditional caching systems work. We would need to implement something custom to make this work.
We came up with an idea to implement this custom caching using a duckdb table. Basically, all the queries called so far are saved in this table, and whenever a new call is made, it checks whether the query has already been called; if yes, it returns the result immediately, or else it does the calculation, saves the result in the table for next use, and returns the result. There are various scenarios that we need to consider; let's use an example to understand each one of them.
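A hedged sketch of this lookup-then-compute flow (the column names are assumptions; only the PIP_CACHE_FILE variable and the master_file table are taken from the text below):

```r
library(DBI)
library(duckdb)

# Open (or create) the duckdb cache file
con <- dbConnect(duckdb::duckdb(),
                 dbdir = Sys.getenv("PIP_CACHE_FILE", "demo.duckdb"))
dbExecute(con, "CREATE TABLE IF NOT EXISTS master_file (
  country_code VARCHAR, reporting_year INTEGER, headcount DOUBLE)")

# Check which parts of the query are already cached
cached <- dbGetQuery(con, "
  SELECT * FROM master_file
  WHERE country_code IN ('AGO', 'USA') AND reporting_year = 2000")

# Compute only the missing part, append it for next time,
# then combine cached and fresh rows into the final output
# missing <- ...full pip calculation for the uncached countries...
# dbWriteTable(con, "master_file", missing, append = TRUE)

dbDisconnect(con, shutdown = TRUE)
```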
Consider that we are just starting out and there is nothing saved in the table.
Scenario 1 -

pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)

Now, since nothing is saved in the table, this will go through the whole round of calculation, save the result in the table for future use, and return the output.
Scenario 2 -

Now this is a call whose result we have already calculated in our previous call. In traditional caching, this would be treated as a separate call and the calculation would have been performed again. However, in our custom caching it goes through the existing table and checks whether we already have the result for this call. Since we do have it saved in our table, we simply return the result from the table, as is, without doing any calculation.
Scenario 3 -

pip(country = c("ARG", "USA"), year = 2000, lkup = lkup)

Notice this time it is a combination of scenarios 1 and 2, where one part of the calculation we already have ("USA") and another part we don't ("ARG"). In this case, we return the result for the country that we have in the table and send the rest of the arguments for calculation. We save the result from the calculation in the table and return the output by combining both results.
Scenario 4 -

In this scenario (a call with country = "all"), before we check the table we need to decode the "all" argument into the list of actual country names, because in the table we save the data with actual country names. Once we have the list of country names, we check which of those values are already available in the table. If we consider the 3 scenarios above, then we already have results for c("ARG", "AGO", "USA") and we need to find results for the remaining countries. After saving the data for the remaining countries in the table, we return the result by combining the two.
Scenario 5 -

This is similar to scenario 4, but instead of country = "all" we have year = "all", so in this case we need to decode the year parameter. However, the sequence of operations remains the same as above.
Scenario 6 -

This is a combination of scenarios 4 and 5, where we need to decode both the country and year parameters, check the values that are present in the table, query the data that does not exist, save it into the table, combine the results and return the output.
These are the 6 different scenarios that can occur. Note that I have not used all the arguments here in the pip call. We are using the default povline, i.e. 1.9, here, but it can be something else. In that case, it becomes scenario 1, where nothing is found in the table, the output is calculated and the result is saved in the table. Similarly, we can also have situations where fill_gaps is set to TRUE, which would follow the same process.
Code overview
+
+
We are creating a duckdb file to save our table. The location of this file is stored in an environment variable PIP_CACHE_FILE (for example, Sys.setenv(PIP_CACHE_FILE = "demo.duckdb")). A table called master_file is created inside it, where we save our cache.
Based on the fill_gaps parameter, we call either the function fg_pip or rg_pip. Both functions call the subset_lkup function to filter the data from lkup that is relevant to our call. In the subset_lkup function we call the function return_if_exists, which, as the name suggests, returns the data from the cache if it exists. A new file called duckdb_fun.R has been added to manage all the functions related to duckdb.
A named list is returned from the return_if_exists function, containing the final output (if it exists) from the master file and the subsetted lkup (scenario 3, where we have part of the data in the master file). The partial (or full) final output is again returned as a named list from the subset_lkup function, which is used at the end to combine the two outputs. If lkup is non-empty, then after all the calculation is done we use the function update_master_file to append the new data to the master file.
Speed comparison
+
+
microbenchmark::microbenchmark(
  duckdb_caching = pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)
)

# Unit: milliseconds
#           expr     min       lq     mean   median       uq      max neval
# duckdb_caching 102.745 107.2221 112.5106 109.2035 114.8544 146.8841   100

microbenchmark::microbenchmark(
  pip_DEV = pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)
)

# Unit: milliseconds
#    expr      min       lq     mean   median       uq      max neval
# pip_DEV 51.01007 53.67546 62.35531 56.08937 60.12046 354.7717   100

microbenchmark::microbenchmark(
  duckdb_caching = pip(country = "all", year = "all", lkup = lkup)
)

# Unit: milliseconds
#           expr      min       lq     mean   median       uq      max neval
# duckdb_caching 110.0725 115.0695 129.6408 118.4356 122.2621 494.3855   100

microbenchmark::microbenchmark(
  pip_DEV = pip(country = "all", year = "all", lkup = lkup)
)

# Unit: seconds
#    expr      min       lq     mean   median       uq      max neval
# pip_DEV 14.42378 14.78717 14.98249 14.96088 15.11043 17.44418   100
In the recent update to pipapi we added three new endpoints: grouped-stats, regression-params and lorenz-curve. The goal of this post is to share the process of adding new endpoints in pipapi, as there are a lot of checks in place in the code and it can be quite challenging to do so.
Arguments: Every argument and its value passed to the API is validated. This enhances the security of the API by making sure only expected arguments and values are passed through the API.

The arguments are validated in the function validate_query_parameters, which has a list of all valid arguments for all the endpoints. If your endpoint uses an existing argument, then you don't need to do anything, since both the argument and its value are already validated. For example, if your endpoint has an argument ppp, which is an existing argument in the API, then you don't need to make any changes, since ppp is already validated.
Values: Validate your input values in the create_query_controls function by adding a range or list of accepted values. If the argument is character, then you need to give all possible values that it can take. If the argument is numeric, then you need to supply min and max values to ensure that the numeric value stays in range. Based on the type of argument, check_param_chr, check_param_num or check_param_lgl is called. This also ensures that an argument name means the same thing everywhere, so the same argument cannot have two different meanings. For example, it is not possible for the argument requested_mean to accept values from 0 to 1 in one endpoint and c("yes", "no") in another endpoint, again ensuring consistency.
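As a hedged illustration only (the actual list structure inside create_query_controls may differ), a character argument and a numeric argument might be declared along these lines:

```r
# Illustrative shape of the accepted-values controls; names are assumptions
query_controls <- list(
  welfare_type = list(
    values = c("all", "consumption", "income"),  # every value a character arg may take
    type   = "character"                         # routed to check_param_chr
  ),
  povline = list(
    values = list(min = 0, max = 10000),         # numeric args get a min/max range
    type   = "numeric"                           # routed to check_param_num
  )
)
```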
Another thing to note is that the arguments and values are available in both req$args and req$argsQuery; however, all the validation is performed only on argsQuery, and only argsQuery is used throughout the entire API. So we suggest continuing to use argsQuery for consistency purposes.
Fujs T, Eilertsen A (2025). pipapi: API for the Poverty and Inequality Platform. R package version 1.3.11.9000, https://github.com/PIP-Technical-Team/pipapi, https://pip-technical-team.github.io/pipapi.
@Manual{,
  title = {pipapi: API for the Poverty and Inequality Platform},
  author = {Tony Fujs and Aleksander Eilertsen},
  year = {2025},
  note = {R package version 1.3.11.9000, https://github.com/PIP-Technical-Team/pipapi},
  url = {https://pip-technical-team.github.io/pipapi},
}
Helper function that creates a unique hash of code + data. This hash value will be used as the value of the etag header to facilitate caching of PIP API responses.
extract_ppp_date: Return the ppp date from the version of the data.

extract_release_date: Return the release date from the version of the data.
fillin_list: Fill in named objects of a list with the values of named objects in the parent frame in which the list has been created. These objects must have the same names as the objects of the list.

fillin_list(l, assign = TRUE)

Arguments

l: list to populate with named objects

assign: logical: whether to assign to parent frame

Value

Invisibly, the list l populated with objects of the same frame.
filter_lkup: Helper to filter metadata. Aggregate distributions need to be filtered out when popshare is not null. This is a temporary function until a full fix is implemented and popshare is supported for all distributions.
get_ctr_alt_agg: Helper function to retrieve the required countries needed to compute alternative aggregates requested by the user. Get countries that belong to aggregates requested by the user that are NOT official but alternative aggregates. We need to find the missing-data estimates only for those countries. For instance, if the user requested LAC and AFE, we don't care about the countries with missing data in LAC because their estimates are done implicitly. We DO care about the estimates of the missing countries in AFE because we need the explicit SSA estimates.
get_impl_ctrs: Helper function to retrieve the implicit country surveys present in both alternative and official aggregates.

is_forked: Helper function to determine whether an API call is compute intensive and should be forked to a parallel process to avoid blocking the main R process.
povline: numeric: Proportion of the population living below the poverty line

fill_gaps: logical: If set to TRUE, will interpolate / extrapolate values for missing years

group_by: character: Will return aggregated values for predefined sub-groups

welfare_type: character: Welfare type

reporting_level: character: Geographical reporting level

ppp: numeric: Custom Purchase Power Parity value

lkup: list: A list of lkup tables

censor: logical: Triggers censoring of country/year statistics

lkup_hash: character: hash of pip

additional_ind: logical: If TRUE, add a new set of indicators. Default is FALSE

Value

data.table

Examples

if (FALSE) {
# Create lkups
lkups <- create_lkups("<data-folder>")

# A single country and year
pip(country = "AGO",
    year = 2000,
    povline = 1.9,
    lkup = lkups)

# All years for a single country
pip(country = "AGO",
    year = "all",
    povline = 1.9,
    lkup = lkups)

# Fill gaps
pip(country = "AGO",
    year = "all",
    povline = 1.9,
    fill_gaps = TRUE,
    lkup = lkups)

# Group by regions
pip(country = "all",
    year = "all",
    povline = 1.9,
    group_by = "wb",
    lkup = lkups)
}
pip_aggregate: Calculate estimates for aggregates different to the official regional aggregation.

group_by: character: Will return aggregated values for predefined sub-groups

welfare_type: character: Welfare type

reporting_level: character: Geographical reporting level

lkup: list: A list of lkup tables

censor: logical: Triggers censoring of country/year statistics

lkup_hash: character: hash of pip

Value

data.table

Examples

if (FALSE) {
# Create lkups
lkups <- create_lkups("<data-folder>")

# A single country and year
pip_grp(country = "all",
        year = 2000,
        povline = 1.9,
        group_by = "wb",
        lkup = lkups)
}
pip_grp_logic: Logic for computing new aggregate.

Returns the Lorenz curve. The user provides the cumulative welfare and cumulative weight, as well as the number of points on the Lorenz curve required. By default, the best-fitting Lorenz parameterization (quadratic or beta) is selected.

welfare: numeric vector of cumulative share of welfare (income/consumption)

weight: numeric vector of cumulative share of the population

lorenz: either "lb" or "lq"

n_bins: atomic double vector of length 1: number of points on the Lorenz curve

params: list of parameters

Value

Returns a list which contains: the numeric Lorenz curve, the corresponding points on the x-axis, whether the lq or lb parameterization was used, and, if complete = TRUE, all params.

Examples

if (FALSE) {
# Example 1: Generating a Lorenz curve with default settings
pipgd_lorenz_curve(welfare = pip_gd$L,
                   weight = pip_gd$P)

# Example 2: Specifying the number of bins for the Lorenz curve
pipgd_lorenz_curve(welfare = pip_gd$L,
                   weight = pip_gd$P,
                   n_bins = 50)

# Example 3: Using pre-calculated parameters
use_params <- pipgd_params(welfare = pip_gd$L,
                           weight = pip_gd$P)
pipgd_lorenz_curve(params = use_params)

# Example 4: Generating a Lorenz curve with a specific Lorenz model (e.g. Lorenz beta)
pipgd_lorenz_curve(params = use_params,
                   lorenz = "lb")
}
character: List of valid region codes that can be used for region selection

Value

logical vector
select_off_alt_agg: Helper function to identify how Official and Alternative regions should be handled.

start_api(api_version = "v1", port = 80, host = "0.0.0.0")

Arguments

api_version: character: API version to launch

port: integer: Port

host: character: Host

Value

plumber API
subset_ctry_years: Subset the country-years table. This is a table created at start time to facilitate imputations. It is part of the interpolated_list object.

numeric vector of length one: lineup year for the world
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_additional_indicators_grp.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_aux_table.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_aux_table_ui.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_caller_names.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_ctr_alt_agg.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_grp_to_compute.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_impl_ctrs.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_md_vars.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_metaregion_table.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_param_values.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_pg_table.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_pip_version.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_spr_table.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_svy_data.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_user_alt_gt.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_user_x_code.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/get_valid_aux_long_format_tables.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ifel_isnull.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/index.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/is_empty.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/is_forked.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/lkup.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/pip.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/pipgd_lorenz_curve.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/pip_aggregate.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/pip_grp.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/pip_grp_logic.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/reporting_level_list.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/return_correct_version.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/return_if_exists.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/rg_pip.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/select_country.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/select_off_alt_agg.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/select_reporting_level.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/select_user_aggs.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/select_years.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/start_api.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/subset_ctry_years.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/subset_lkup.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_cp_charts.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_cp_download.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_cp_key_indicators.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_cp_poverty_charts.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_hp_countries.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_hp_stacked.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_pc_charts.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_pc_regional.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/ui_svy_meta.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/update_master_file.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/validate_input_grouped_stats.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/valid_years.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/version_dataframe.html
+
+
+ https://pip-technical-team.github.io/pipapi/reference/wld_lineup_year.html
+
+
diff --git a/inst/TMP/TMP_duckdb_cache.R b/inst/TMP/TMP_duckdb_cache.R
new file mode 100644
index 00000000..e37562d9
--- /dev/null
+++ b/inst/TMP/TMP_duckdb_cache.R
@@ -0,0 +1,38 @@
+devtools::load_all(".")
+force <- FALSE
+if (!"lkups" %in% ls() || isTRUE(force)) {
+ data_dir <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_LOCAL") |>
+ fs::path()
+ fs::dir_ls(data_dir, recurse = FALSE)
+}
+
+
+latest_version <-
+ pipapi:::available_versions(data_dir) |>
+ max()
+
+latest_version <- NULL
+latest_version <- "20240627_2017_01_02_PROD"
+lkups <- create_versioned_lkups(data_dir,
+ vintage_pattern = latest_version)
+
+lkup <- lkups$versions_paths[[lkups$latest_release]]
+
+
+reset_cache(lkup = lkup)
+
+
+
+# 1.
+pip(country = "all", year = 2000, lkup = lkup)
+
+# 2.
+pip(country = "AGO", year = 2000, lkup = lkup)
+
+
+pip(country = "all", year = "all", lkup = lkup)
+
+
+pip(country = "IND", year = 2018, lkup = lkup)
+
+pip(country = "IND", year = "all", lkup = lkup)
diff --git a/inst/plumber/v1/endpoints.R b/inst/plumber/v1/endpoints.R
index c88873e5..3a61a7a5 100644
--- a/inst/plumber/v1/endpoints.R
+++ b/inst/plumber/v1/endpoints.R
@@ -85,7 +85,7 @@ function(req, res) {
# treated asynchronously.
# 2) The introduction of PPP versioning implies having a dynamic default
# poverty line
-
+ browser()
req <- pipapi:::assign_required_params(req,
pl_lkup = lkups$pl_lkup)
@@ -371,6 +371,18 @@ function() {
}
}
+#* Reset DuckDB cache file
+#* @get /api/v1/duckdb-reset
+#* @param pass:[chr] Local password, this password is checked against the server password
+#* @param type:[chr] Which table do you want to delete? Values accepted are "both", "rg" and "fg"
+#* @serializer unboxedJSON
+function(req, res) {
+ params <- req$argsQuery
+ params$lkup <- lkups$versions_paths[[params$version]]
+ params$version <- NULL
+ do.call(pipapi:::reset_cache, params)
+}
+
# #* Return cache log
# #* @get /api/v1/cache-log
# #* @serializer print list(quote = FALSE)
diff --git a/man/estimate_type_var.Rd b/man/estimate_type_var.Rd
index 8ef32d77..e6e983cd 100644
--- a/man/estimate_type_var.Rd
+++ b/man/estimate_type_var.Rd
@@ -9,7 +9,7 @@ estimate_type_var(df, lkup)
\arguments{
\item{df}{data.table: Table to censor.}
-\item{censored_table}{data.table: Censor table}
+\item{lkup}{lkup value}
}
\description{
It also censors specific stats
diff --git a/man/fg_pip.Rd b/man/fg_pip.Rd
index 9012b01b..3f671bdf 100644
--- a/man/fg_pip.Rd
+++ b/man/fg_pip.Rd
@@ -12,7 +12,8 @@ fg_pip(
welfare_type,
reporting_level,
ppp,
- lkup
+ lkup,
+ con
)
}
\arguments{
@@ -32,6 +33,8 @@ poverty line}
\item{ppp}{numeric: Custom Purchase Power Parity value}
\item{lkup}{list: A list of lkup tables}
+
+\item{con}{duckdb connection object}
}
\value{
data.frame
diff --git a/man/get_user_x_code.Rd b/man/get_user_x_code.Rd
index ad051135..fc1ba375 100644
--- a/man/get_user_x_code.Rd
+++ b/man/get_user_x_code.Rd
@@ -2,7 +2,7 @@
% Please edit documentation in R/create_countries_vctr.R
\name{get_user_x_code}
\alias{get_user_x_code}
-\title{Helper function to define user_{var}_code}
+\title{Helper function to define user_\{var\}_code}
\usage{
get_user_x_code(x)
}
@@ -13,5 +13,5 @@ get_user_x_code(x)
character
}
\description{
-Helper function to define user_{var}_code
+Helper function to define user_\{var\}_code
}
diff --git a/man/pipgd_lorenz_curve.Rd b/man/pipgd_lorenz_curve.Rd
index a5e40185..7547a5cd 100644
--- a/man/pipgd_lorenz_curve.Rd
+++ b/man/pipgd_lorenz_curve.Rd
@@ -15,8 +15,6 @@ pipgd_lorenz_curve(welfare = NULL, weight = NULL, lorenz = NULL, n_bins = 100)
\item{n_bins}{atomic double vector of length 1: number of points on the
lorenz curve}
-
-\item{params}{list of parameters}
}
\value{
Returns a list which contains:
diff --git a/man/return_if_exists.Rd b/man/return_if_exists.Rd
new file mode 100644
index 00000000..967f4689
--- /dev/null
+++ b/man/return_if_exists.Rd
@@ -0,0 +1,24 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/duckdb_func.R
+\name{return_if_exists}
+\alias{return_if_exists}
+\title{Return the rows of the table if they exist in master file}
+\usage{
+return_if_exists(lkup, povline, con, fill_gaps)
+}
+\arguments{
+\item{lkup}{list: A list of lkup tables}
+
+\item{povline}{numeric: Poverty line}
+
+\item{con}{Connection object to duckdb table}
+
+\item{fill_gaps}{logical: If set to TRUE, will interpolate / extrapolate
+values for missing years}
+}
+\value{
+Dataframe
+}
+\description{
+Return the rows of the table if they exist in master file
+}
diff --git a/man/rg_pip.Rd b/man/rg_pip.Rd
index 4a4f1881..5d107e95 100644
--- a/man/rg_pip.Rd
+++ b/man/rg_pip.Rd
@@ -12,7 +12,8 @@ rg_pip(
welfare_type,
reporting_level,
ppp,
- lkup
+ lkup,
+ con
)
}
\arguments{
@@ -32,6 +33,8 @@ poverty line}
\item{ppp}{numeric: Custom Purchase Power Parity value}
\item{lkup}{list: A list of lkup tables}
+
+\item{con}{duckdb connection object}
}
\value{
data.frame
diff --git a/man/subset_lkup.Rd b/man/subset_lkup.Rd
index 9278bf4b..48b7d162 100644
--- a/man/subset_lkup.Rd
+++ b/man/subset_lkup.Rd
@@ -11,7 +11,10 @@ subset_lkup(
reporting_level,
lkup,
valid_regions,
- data_dir = NULL
+ data_dir = NULL,
+ povline,
+ con,
+ fill_gaps
)
}
\arguments{
@@ -29,6 +32,13 @@ subset_lkup(
for region selection}
\item{data_dir}{character: directory path from lkup$data_root}
+
+\item{povline}{numeric: Poverty line}
+
+\item{con}{duckdb connection object}
+
+\item{fill_gaps}{logical: If set to TRUE, will interpolate / extrapolate
+values for missing years}
}
\value{
data.frame
diff --git a/man/update_master_file.Rd b/man/update_master_file.Rd
new file mode 100644
index 00000000..f4e712a7
--- /dev/null
+++ b/man/update_master_file.Rd
@@ -0,0 +1,22 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/duckdb_func.R
+\name{update_master_file}
+\alias{update_master_file}
+\title{Update master file with the contents of the dataframe}
+\usage{
+update_master_file(dat, cache_file_path, fill_gaps)
+}
+\arguments{
+\item{dat}{Dataframe to be appended}
+
+\item{cache_file_path}{path where cache file is saved}
+
+\item{fill_gaps}{logical: If set to TRUE, will interpolate / extrapolate
+values for missing years}
+}
+\value{
+number of rows updated
+}
+\description{
+Update master file with the contents of the dataframe
+}
diff --git a/pipapi.Rproj b/pipapi.Rproj
index 7470a16d..7a441a72 100644
--- a/pipapi.Rproj
+++ b/pipapi.Rproj
@@ -1,5 +1,5 @@
Version: 1.0
-ProjectId: 73133a1c-39b9-4622-abdf-7862f43a14a1
+ProjectId: 3fab55c7-e5d2-495a-8d65-ef16b44733fe
RestoreWorkspace: No
SaveWorkspace: No
diff --git a/tests/testthat/test-fg_pip-local.R b/tests/testthat/test-fg_pip-local.R
index ee5523ed..c3c3ae97 100644
--- a/tests/testthat/test-fg_pip-local.R
+++ b/tests/testthat/test-fg_pip-local.R
@@ -13,6 +13,8 @@ lkups <- create_versioned_lkups(data_dir,
vintage_pattern = latest_version)
lkup <- lkups$versions_paths[[lkups$latest_release]]
+con <- duckdb::dbConnect(duckdb::duckdb(), dbdir = fs::path(lkup$data_root, "cache", ext = "duckdb"))
+
local_mocked_bindings(
get_caller_names = function() c("else")
)
@@ -28,10 +30,11 @@ test_that("Imputation is working for extrapolated aggregated distribution", {
welfare_type = "all",
reporting_level = "all",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
- expect_equal(nrow(tmp), 2)
+ expect_equal(nrow(tmp$main_data), 2)
tmp <- fg_pip(
country = "CHN",
@@ -41,10 +44,11 @@ test_that("Imputation is working for extrapolated aggregated distribution", {
welfare_type = "all",
reporting_level = "national",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
- expect_equal(nrow(tmp), 2)
+ expect_equal(nrow(tmp$main_data), 2)
})
## Interpolation ----
@@ -57,10 +61,11 @@ test_that("Imputation is working for interpolated mixed distribution", {
welfare_type = "all",
reporting_level = "all",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
- expect_equal(nrow(tmp), 2)
+ expect_equal(nrow(tmp$main_data), 2)
tmp <- fg_pip(
country = "IND",
@@ -70,10 +75,11 @@ test_that("Imputation is working for interpolated mixed distribution", {
welfare_type = "all",
reporting_level = "national",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
- expect_equal(nrow(tmp), 2)
+ expect_equal(nrow(tmp$main_data), 2)
})
test_that("Imputation is working for interpolated aggregate distribution", {
@@ -85,10 +91,11 @@ test_that("Imputation is working for interpolated aggregate distribution", {
welfare_type = "all",
reporting_level = "all",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
- expect_equal(nrow(tmp), 2)
+ expect_equal(nrow(tmp$main_data), 2)
tmp <- fg_pip(
country = "CHN",
@@ -98,10 +105,11 @@ test_that("Imputation is working for interpolated aggregate distribution", {
welfare_type = "all",
reporting_level = "national",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
- expect_equal(nrow(tmp), 2)
+ expect_equal(nrow(tmp$main_data), 2)
})
@@ -150,9 +158,10 @@ tmp <- fg_pip(
welfare_type = "all",
reporting_level = "all",
ppp = NULL,
- lkup = lkup
+ lkup = lkup,
+ con = con
)
-
+tmp <- tmp$main_data
# dt <- pip(country = "ALL",
# lkup = lkup,
# povline = 2.15,
diff --git a/tests/testthat/test-ui_poverty_indicators.R b/tests/testthat/test-ui_poverty_indicators.R
index b4422c97..d209887e 100644
--- a/tests/testthat/test-ui_poverty_indicators.R
+++ b/tests/testthat/test-ui_poverty_indicators.R
@@ -20,7 +20,7 @@ test_that("ui_pc_charts() works as expected", {
povline = 1.9,
lkup = lkups)
expect_equal(class(res), c("data.table", "data.frame"))
- expect_equal(names(res), lkups$return_cols$ui_pc_charts$cols)
+ expect_equal(names(res), setdiff(lkups$return_cols$ui_pc_charts$cols, "estimate_type"))
expect_equal(nrow(res), nrow(lkups$svy_lkup[country_code == "AGO"]))
},
get_caller_names = function() c("else")
diff --git a/tests/testthat/test-utils.R b/tests/testthat/test-utils.R
index da38fa01..5ec627d5 100644
--- a/tests/testthat/test-utils.R
+++ b/tests/testthat/test-utils.R
@@ -68,7 +68,7 @@ test_that("subset_lkup correctly selects all countries", {
valid_regions = valid_regions,
data_dir = data_dir)
- expect_equal(nrow(tmp), nrow(ref_lkup))
+ expect_equal(nrow(tmp$lkup), nrow(ref_lkup))
})
test_that("subset_lkup correctly selects countries", {
@@ -81,7 +81,7 @@ test_that("subset_lkup correctly selects countries", {
valid_regions = valid_regions,
data_dir = data_dir)
- expect_equal(sort(unique(tmp$country_code)), sort(selection))
+ expect_equal(sort(unique(tmp$lkup$country_code)), sort(selection))
})
test_that("subset_lkup correctly selects single regions", {
@@ -94,7 +94,7 @@ test_that("subset_lkup correctly selects single regions", {
valid_regions = valid_regions,
data_dir = data_dir)
- expect_equal(sort(unique(tmp$region_code)), sort(selection))
+ expect_equal(sort(unique(tmp$lkup$region_code)), sort(selection))
})
test_that("subset_lkup correctly selects multiple regions", {
@@ -107,7 +107,7 @@ test_that("subset_lkup correctly selects multiple regions", {
valid_regions = valid_regions,
data_dir = data_dir)
- expect_equal(sort(unique(tmp$region_code)), sort(selection))
+ expect_equal(sort(unique(tmp$lkup$region_code)), sort(selection))
})
test_that("subset_lkup correctly selects countries and regions", {
@@ -125,9 +125,9 @@ test_that("subset_lkup correctly selects countries and regions", {
data_dir = data_dir)
# Regions are selected
- expect_true(all(region_selection %in% (unique(tmp$region_code))))
+ expect_true(all(region_selection %in% (unique(tmp$lkup$region_code))))
# Countries are selected
- expect_true(all(country_selection %in% (unique(tmp$country_code))))
+ expect_true(all(country_selection %in% (unique(tmp$lkup$country_code))))
})
# select_country() test suite
diff --git a/vignettes/debug-caching.Rmd b/vignettes/debug-caching.Rmd
index ce110bfe..5b82048c 100644
--- a/vignettes/debug-caching.Rmd
+++ b/vignettes/debug-caching.Rmd
@@ -1,8 +1,12 @@
---
title: "Debug caching and API endpoints"
-output: html_document
date: "2024-10-02"
author: "Ronak Shah"
+output: rmarkdown::html_vignette
+vignette: >
+ %\VignetteIndexEntry{Debug caching and API endpoints}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
---
## How caching works?
diff --git a/vignettes/duckdb-caching.Rmd b/vignettes/duckdb-caching.Rmd
new file mode 100644
index 00000000..fe5749c8
--- /dev/null
+++ b/vignettes/duckdb-caching.Rmd
@@ -0,0 +1,159 @@
+---
+title: "DuckDB Caching"
+date: "2024-12-26"
+author: "Ronak Shah"
+output: rmarkdown::html_vignette
+vignette: >
+ %\VignetteIndexEntry{DuckDB Caching}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(eval = FALSE, echo = TRUE)
+```
+
+## Introduction
+
+The current caching mechanism for pip uses traditional memoisation: a hash is computed from the values of the arguments passed to a function, and if the same function is called again with the same arguments, the cached result is returned instead of repeating the calculation. For `pip` we used the packages `cachem` and `memoise` to implement this system. We will call this the traditional caching strategy. It works well in general; however, `pip` is a special case that would benefit much more from a custom caching strategy.
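
The traditional strategy can be sketched in a few lines of base R. This is a simplified stand-in for what `memoise` and `cachem` actually do; the point is that the cache key is derived from the argument values, so a call is either a full hit or a full miss:

```r
# Simplified sketch of hash-based memoisation: cache keyed on argument values.
make_cached <- function(fun) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    # memoise uses a real hash; deparse() of the arguments is enough here
    key <- paste(deparse(list(...)), collapse = "|")
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache))
    }
    result <- fun(...)
    assign(key, result, envir = cache)
    result
  }
}

slow_square <- function(x) {
  Sys.sleep(0.2)  # stand-in for an expensive pip computation
  x^2
}
fast_square <- make_cached(slow_square)

fast_square(4)  # full miss: computed (slow)
fast_square(4)  # full hit: returned from cache (fast)
fast_square(5)  # different arguments: full miss again
```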
+
+## How does caching currently work?
+
+Consider these pip functions
+
+```{r, eval=FALSE}
+# 1.
+pip(country = "all", year = 2000, lkup = lkup)
+
+# 2.
+pip(country = "AGO", year = 2000, lkup = lkup)
+```
+
+Since these are two distinct sets of arguments, two cache files are created and saved on disk. If `pip(country = "AGO", year = 2000, lkup = lkup)` is called again (the same call as 2), the result is returned from the cached file on disk without repeating the calculation. Needless to say, this is much faster.
+
+However, notice that the second call is a subset of the first one: the result of 2) is already contained in the result of 1). In 1) we computed results for all countries for the year 2000, so we only need to extract the `"AGO"` rows from it to obtain the result for 2).
+
+## Custom caching for pipapi
+
+What if we could take a subset of an existing cache, as in the example above? That is not how traditional caching systems work, so there is no ready-made solution available; we need to implement this logic from scratch.
+
+Our idea is to implement this custom caching with `duckdb`, saving the output in a table. The results of all queries made so far are stored in the table; whenever a new call is made, we check whether its result already exists there. If it does, it is returned immediately; otherwise the calculation is performed, the result is saved to the table for later use, and then returned. Several scenarios need to be considered to understand this approach. Let's walk through each of them with an example.
+
+Assume we are starting fresh, so the table is empty.
+
+#### Scenario 1 -
+
+```{r}
+pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)
+```
+
+Since the table is empty, this call performs the full calculation, saves the result in the table for future use, and returns the output.
+
+#### Scenario 2 -
+
+```{r}
+pip(country = "USA", year = 2000, lkup = lkup)
+```
+
+This is something we already calculated in the previous call. In traditional caching this would be treated as a separate call and the calculation would be performed again. Our custom caching, however, checks the existing table for the result of this call. Since it is already saved in the table, we simply return it as-is, without any calculation.
+
+#### Scenario 3 -
+
+```{r}
+pip(country = c("ARG", "USA"), year = 2000, lkup = lkup)
+```
+
+This time the call is a combination of scenarios 1 and 2: one part of the calculation is already available (`"USA"`) and another part is not (`"ARG"`). In this case we take the `"USA"` result from the table and send the remaining arguments for calculation. The newly calculated result is saved in the table, and the output is returned by combining the two.
+
+#### Scenario 4 -
+
+```{r}
+pip(country = "all", year = 2000, lkup = lkup)
+```
+
+In this scenario, before checking the table we need to decode the "all" argument into the list of actual country codes, because the table stores data under actual country codes. Once we have the list, we check which of those countries are already in the table. Given the three scenarios above, we already have results for `c("ARG", "AGO", "USA")` and only need to compute the remaining countries. After saving the data for the remaining countries in the table, we return the combined result.
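
The decoding step can be sketched as follows; `all_countries` and `cached` are illustrative vectors, not the actual lkup or table contents:

```r
# Decode `country = "all"` before the cache lookup, then keep only the
# countries that still need to be computed.
all_countries <- c("AGO", "ARG", "BGD", "CHN", "IND", "USA")
cached        <- c("AGO", "ARG", "USA")   # countries already in the table

requested <- "all"
countries <- if (identical(requested, "all")) all_countries else requested

to_compute <- setdiff(countries, cached)  # only these need calculation
to_compute
# [1] "BGD" "CHN" "IND"
```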
+
+#### Scenario 5 -
+
+```{r}
+pip(country = "AGO", year = "all", lkup = lkup)
+```
+
+This is similar to scenario 4, but instead of `country = "all"` we have `year = "all"`, so here we need to decode the `year` parameter. The sequence of operations otherwise remains the same.
+
+#### Scenario 6 -
+
+```{r}
+pip(country = "all", year = "all", lkup = lkup)
+```
+
+This is a combination of scenarios 4 and 5: we decode both the `country` and `year` parameters, check which values are present in the table, compute the data that does not exist, save it into the table, combine the results, and return the output.
+
+These are the six scenarios that can occur. Note that not all `pip` arguments are used in the calls above. We are using the default `povline` of 1.9 here, but it can be something else, in which case the call falls back to scenario 1: nothing is found in the table, the output is calculated, and the result is saved. Similarly, calls with `fill_gaps` set to `TRUE` follow the same process.
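
The check / compute / append cycle behind these scenarios can be sketched with a plain data.frame standing in for the duckdb table. All names and the computation below are illustrative, not the pipapi internals:

```r
# Plain-data.frame sketch of the custom caching cycle.
cache_table <- data.frame(country = character(), year = integer(),
                          headcount = numeric())

compute_stats <- function(countries, year) {
  # stand-in for the real pip computation
  data.frame(country = countries, year = year,
             headcount = runif(length(countries)))
}

cached_pip <- function(countries, year) {
  # 1. check: which requested rows already exist in the table?
  hit  <- cache_table[cache_table$country %in% countries &
                        cache_table$year == year, ]
  todo <- setdiff(countries, hit$country)
  if (length(todo) > 0) {
    # 2. compute only the missing part
    fresh <- compute_stats(todo, year)
    # 3. append it to the table for next time, then combine
    cache_table <<- rbind(cache_table, fresh)
    hit <- rbind(hit, fresh)
  }
  hit
}

cached_pip(c("AGO", "USA"), 2000)  # scenario 1: full miss, computed and stored
cached_pip("USA", 2000)            # scenario 2: full hit, read from the table
cached_pip(c("ARG", "USA"), 2000)  # scenario 3: only "ARG" is computed
```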
+
+## Code overview
+
+We create a duckdb file to store our tables. The file is specific to one release and is saved in the root of the release folder as `cache.duckdb`. It contains two tables, `rg_master_file` and `fg_master_file`; the `fill_gaps` argument determines which table is used to save and retrieve data. Based on the `fill_gaps` parameter we call either `fg_pip` or `rg_pip`. Both functions call `subset_lkup` to filter the rows of `lkup` relevant to the call. Within `subset_lkup` we call `return_if_exists`, which, as the name suggests, returns the data from the cache if it exists. A new file, `R/duckdb_func.R`, has been added to hold all the duckdb-related functions.
+
+`return_if_exists` returns a named list containing the final output from the master file (if it exists) and the subsetted `lkup`. The partial (or full) output is in turn returned as a named list from `subset_lkup` and is used at the end to combine the cached and newly computed results. If the subsetted `lkup` is non-empty, `update_master_file` is called after the calculation to append the new data to the master file. When the function runs for the first time in a release, the code also creates an empty `cache.duckdb` file with the two tables.
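
As a rough illustration of the shape of these return values (the field names follow the tests in this PR; the body is a stand-in, not the real implementation):

```r
# Illustrative shape of the named list passed around by return_if_exists()
# and subset_lkup(): cached rows plus the still-to-compute part of lkup.
return_if_exists_sketch <- function(lkup, cached_rows) {
  list(
    main_data = cached_rows,  # rows already served from the cache
    lkup      = lkup[!lkup$id %in% cached_rows$id, , drop = FALSE]
  )
}

lkup   <- data.frame(id = 1:4, country = c("AGO", "ARG", "IND", "USA"))
cached <- data.frame(id = c(1L, 4L), country = c("AGO", "USA"))

out <- return_if_exists_sketch(lkup, cached)
nrow(out$main_data)  # rows served from the cache
nrow(out$lkup)       # rows left for fg_pip()/rg_pip() to compute
```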
+
+## Speed comparison
+
+For analysis purposes, we compare the speed of different scenarios on the `DEV` branch against the `implement-duckdb` branch.
+
+```{r}
+microbenchmark::microbenchmark(
+ duckdb_DEV = pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)
+)
+
+#Unit: microseconds
+# expr min lq mean median uq max neval
+# duckdb_DEV 830.463 899.872 2241.876 918.446 954.1285 116429.3 100
+
+microbenchmark::microbenchmark(
+ duckdb_caching = pip(country = c("AGO", "USA"), year = 2000, lkup = lkup)
+)
+
+#Unit: milliseconds
+# expr min lq mean median uq max neval
+# duckdb_caching 161.6227 170.5818 185.5906 175.3116 184.9512 793.8183 100
+```
+
+```{r}
+country_list <- c("AGO", "ARG", "AUT", "BEL", "BGD", "BLR", "BOL", "CAN", "CHE",
+ "CHL", "COL", "CRI", "DEU", "DNK", "DOM", "ECU", "ESP", "EST",
+ "FIN", "FRA", "FSM", "GBR", "GEO", "GRC", "GTM", "HRV", "HUN",
+ "IDN", "IDN", "IDN", "IRL", "ITA", "KGZ", "LTU", "LUX", "MAR",
+ "MDA", "MEX", "MKD", "MRT", "NOR", "PAN", "PER", "PHL", "PHL",
+ "POL", "ROU", "RUS", "RWA", "SLV", "STP", "SWE", "SWZ", "THA",
+ "TON", "TUN", "TWN", "TZA", "URY", "USA", "UZB", "ZAF")
+
+tictoc::tic()
+
+for(i in seq_along(country_list)) {
+ out <- pip(country = country_list[seq_len(i)], year = 2000, lkup = lkup)
+}
+
+tictoc::toc()
+
+## For DEV version
+# 16.36 sec elapsed
+
+## For Duckdb
+#18.17 sec elapsed
+```
+
+```{r}
+tictoc::tic()
+
+for(i in seq_along(country_list)) {
+ out <- pip(country = country_list[seq_len(i)], year = "all", lkup = lkup)
+}
+
+tictoc::toc()
+## DEV
+# 403.38 sec elapsed
+
+## Duckdb caching
+# 33.53 sec elapsed
+```
diff --git a/vignettes/new-endpoints.Rmd b/vignettes/new-endpoints.Rmd
index df7f1fa6..e52a7e3b 100644
--- a/vignettes/new-endpoints.Rmd
+++ b/vignettes/new-endpoints.Rmd
@@ -18,3 +18,5 @@ The arguments are validated in function `validate_query_parameters` which has a
**Values :** Validate your input values in the `create_query_controls` function by adding a range or list of accepted values. If the argument is a character, you need to list all possible values it can take. If the argument is numeric, you need to supply `min` and `max` values to ensure it stays in range. Based on the type of argument, `check_param_chr`, `check_param_num` or `check_param_lgl` is called. This also ensures that an argument name means the same thing everywhere: the same argument cannot have two different meanings. For example, `requested_mean` cannot accept values from 0 to 1 in one endpoint and `c("yes"/"no")` in another, again ensuring consistency.
Another thing to note: the arguments and values are available in both `req$args` and `req$argsQuery`; however, all validation is performed only on `argsQuery`, and only `argsQuery` is used throughout the API. We therefore suggest using `argsQuery` for consistency.
+
+Once you have made these changes, don't forget to refresh the session before testing them.
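
As a hypothetical illustration of the kind of range check such a validator performs (the real `check_param_num` signature in pipapi may differ):

```r
# Illustrative numeric range check in the spirit of check_param_num():
# query-string values arrive as character, so coerce, then enforce min/max.
check_num_range <- function(value, min, max) {
  value <- suppressWarnings(as.numeric(value))
  if (is.na(value) || value < min || value > max) {
    stop("value must be a number between ", min, " and ", max)
  }
  value
}

check_num_range("2.15", min = 0, max = 100)  # 2.15
```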