-
Notifications
You must be signed in to change notification settings - Fork 0
release 20250930 #459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release 20250930 #459
Conversation
…rs not included as lineup years
|
Caution Review failedFailed to post review comments WalkthroughMajor version update introduces lineup-based dispatch, new and legacy FGT pipelines, aggregation APIs, environment helpers, timeout-safe API endpoints, and expanded documentation. Numerous signatures change, new exports are added, caching and DuckDB update logic are revised, and UI functions/endpoints are adapted to the new flows and feature flags. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant pip as pip()
participant Sel as use_new_lineup_version
participant New as pip_new_lineups()
participant Old as pip_old_lineups()
Client->>pip: country, year, povline/popshare, lkup, ...
pip->>Sel: lkup$version
alt New lineup
pip->>New: delegated call
New->>New: validate inputs, cache, fill_gaps branch
New-->>pip: data.table result
else Old lineup
pip->>Old: delegated call
Old->>Old: validate inputs, cache, fill_gaps branch
Old-->>pip: data.table result
end
pip-->>Client: result
sequenceDiagram
autonumber
actor User
participant API as Plumber Endpoint (safe_endpoint)
participant Core as Core handler
participant TO as with_req_timeout
Note over API: Attach request_id, timing
User->>API: HTTP request
API->>TO: with timeout expr
TO->>Core: execute handler
alt completes before timeout
Core-->>TO: result
TO-->>API: ok: TRUE, result
API-->>User: 200 + payload
else timeout
TO-->>API: ok: FALSE, error: timeout
API-->>User: 503 + JSON error {request_id, endpoint}
end
Note over API: On error: 500 + structured JSON
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Poem
Pre-merge checks and finishing touches and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests
✨ Finishing touches🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 55
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (12)
man/rg_pip_old.Rd (1)
7-56: Fix the duplicated usage block and incorrect return/value docs.The Rd now renders two identical
rg_pip_old(...)usage signatures plus duplicated\value{}and\description{}sections, which is confusing. More importantly, the\value{}stanza still saysdata.frame, but the implementation inR/rg_pip_old.Rreturns a list withmain_dataanddata_in_cache. The documentation needs to be regenerated so it advertises a single usage block and the correct return structure (list with its components). Please update the roxygen comments accordingly and re-runroxygen2.man/update_master_file.Rd (1)
11-22: Document the newdecimalargument (andverbose).Line [12] adds
decimal = 2to the public signature, but the \arguments section still stops atfill_gaps, so readers get no guidance on eitherverboseordecimal. Please add \item entries for both parameters so the documentation reflects the exported API. You can apply a change like:\arguments{ \item{dat}{Dataframe to be appended} \item{cache_file_path}{path where cache file is saved} \item{fill_gaps}{logical: If set to TRUE, will interpolate / extrapolate values for missing years} +\item{verbose}{logical: print progress information when TRUE.} +\item{decimal}{integer: number of decimal places to round numeric fields.} }man/fg_pip.Rd (1)
15-36: Document the newpipenvargument.The signature now exposes
pipenv = NULL, but there’s no matching\item{pipenv}{...}entry. Readers of the help page won’t know what to pass. Please add a clear argument description (purpose, expected type/default behavior).R/add_agg_stats.R (1)
146-160: Bug: column-wide NA coercion in negative/zero handlers.
negative_to_naandzeros_to_nareturn a singleNA_real_if any offending value exists, wiping entire columns duringlapply. These should be element-wise replacements.Apply this fix:
-negative_to_na <- function(x) { - if (any(x < 0, na.rm = TRUE) || anyNA(x)) { - NA_real_ - } else { - x - } -} +negative_to_na <- function(x) { + ix <- x < 0 + if (any(ix, na.rm = TRUE)) x[ix] <- NA_real_ + x +} -zeros_to_na <- function(x) { - if (any(x == 0, na.rm = TRUE)) { - NA_real_ - } else { - x - } -} +zeros_to_na <- function(x) { + ix <- x == 0 + if (any(ix, na.rm = TRUE)) x[ix] <- NA_real_ + x +}tests/testthat/test-utils.R (1)
11-14: Replace secondlkupsassignment withlkup
In both blocks of tests/testthat/test-utils.R (around lines 11–14 and 16–22), change:-lkups <- lkups$versions_paths[[lkups$latest_release]] +lkup <- lkups$versions_paths[[lkups$latest_release]]This defines
lkupand preserveslkupsfor subsequent tests.man/add_dist_stats.Rd (1)
7-13: Update argument docs to match new signature.The usage now shows
add_dist_stats(df, lkup, fill_gaps), but the\arguments{}block still documentsdist_statsand omitslkup/fill_gaps. Please align the argument documentation with the current parameters so users get accurate guidance.man/get_aux_table.Rd (1)
7-15: Updatedata_dirargument description to match new default.Now that
data_dirdefaults toNULL, the argument section should clarify that callers may omit it and that the function will derive the directory (e.g., fromlkup/environment helpers) when absent. Without that note, the docs imply the parameter is required and hide the new behavior introduced in this release. Please revise the argument text (and description/details if needed) to reflect how the function resolvesdata_dir = NULL.tests/testthat/test-fg_pip-local.R (1)
25-34: Close or remove the unused DuckDB connection.All the
fg_pip()calls now rely solely onlkup, but we still open a DuckDB connection that is never used or closed. This leaves a stray handle to the cache database for the duration of the test session. Please drop the manualduckdb::dbConnect()(or ensure you disconnect it right away) so the test teardown stays clean.Apply this diff to remove the unused connection:
-con <- duckdb::dbConnect(duckdb::duckdb(), dbdir = fs::path(lkup$data_root, "cache", ext = "duckdb"))R/ui_country_profile.R (1)
40-42:yearparameter is ignored; honor MRV/ALL/specific yearsThe function always picks MRV. This breaks ui_cp_download(year = "ALL") and explicit year requests.
ui_cp_ki_headcount <- function(country, - year = "MRV", + year = "MRV", povline, lkup) { @@ - res <- - res_all[country_code == country - ][, - .SD[which.max(reporting_year)]] + res_country <- res_all[country_code == country] + if (is.character(year) && toupper(year) == "MRV") { + res <- res_country[, .SD[which.max(reporting_year)]] + } else if (is.character(year) && toupper(year) == "ALL") { + res <- res_country + } else { + yrs <- suppressWarnings(as.integer(year)) + res <- res_country[reporting_year %chin% yrs] + if (nrow(res) == 0L) res <- res_country[, .SD[which.max(reporting_year)]] + } @@ - res <- - if (nrow(res) > 1) { - if (all(c("consumption", "income") %in% res$welfare_type)) { - # Get consumpton when both are available - res[welfare_type == "consumption"] - } else { - # get the first observations. this must be revised. - res[1] - } - } else { - res - } + # within-year tie-breaker: prefer consumption; ensure one row per year + res <- res[, { + if (any(welfare_type == "consumption")) .SD[welfare_type == "consumption"][1] + else .SD[1] + }, by = .(country_code, reporting_year)]R/pip_grp_logic.R (1)
72-115: Dispatch ignoreslkup$use_new_lineup_versionnow
Switching tofg_pip_old()forces the legacy pipeline even whenlkup$use_new_lineup_versionisTRUE. That means any request hitting a “new lineup” lookup will now grab stale old-lineup FG data (or fail if the old cache doesn’t cover it), breaking aggregated outputs. Please reinstate the version-aware dispatch—e.g. callfg_pip()and let it choose, or replicate the flag check here and route tofg_pip_new()vsfg_pip_old().R/create_lkups.R (1)
703-717: Hash uses unstripped query_controls (version field not removed)You created
query_controls_hashwithout theversionfield, buthash_lkupusesquery_controls. This defeats the intent and can cause unnecessary cache churn.- hash_lkup <- list(hash_svy_lkup, + hash_lkup <- list(hash_svy_lkup, hash_ref_lkup, dist_stats, pop_region, cp_lkups, pl_lkup, censored, aux_files, return_cols, - query_controls, + query_controls_hash, aux_tables, valid_years )inst/plumber/v1/endpoints.R (1)
814-815: Docs typo in reporting_level options.Missing quote around "national".
-#* @param reporting_level:[chr] Reporting level. Options are "all", national", "urban", "rural". +#* @param reporting_level:[chr] Reporting level. Options are "all", "national", "urban", "rural".
🧹 Nitpick comments (59)
inst/TMP/TMP_API_launcher.R (1)
7-9: Guard against missing local data root
Sys.getenv()returns an empty string whenPIPAPI_DATA_ROOT_FOLDER_LOCALis absent. Passing that straight intofs::path()/fs::dir_ls()silently falls back to the working directory, so the subsequentcreate_versioned_lkups()call fails later with a much less obvious error. Please short-circuit here with an explicit check so failures are immediate and easier to diagnose.- data_dir <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_LOCAL") |> - fs::path() - fs::dir_ls(data_dir, recurse = FALSE) + data_root <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_LOCAL") + if (identical(data_root, "")) { + stop("PIPAPI_DATA_ROOT_FOLDER_LOCAL is not set; cannot locate local data root.", call. = FALSE) + } + data_dir <- fs::path(data_root) + fs::dir_ls(data_dir, recurse = FALSE)R/utils-plumber.R (2)
646-653: Trace capture may be empty; add robust fallback
rlang::trace_back()without an rlang-traced condition often yields little. Capture a usable trace for base errors.- out$trace <- utils::capture.output( - rlang::trace_back(bottom = 10, simplify = "branch") - ) + out$trace <- tryCatch({ + if (inherits(e, "rlang_error")) { + utils::capture.output(rlang::trace_back(e, simplify = "branch")) + } else { + as.character(sys.calls()) + } + }, error = \(.) NULL)
629-631: Make PIPAPI_DEBUG parsing more permissiveSupport common truthy values (“true”, “1”, “yes”, “t”) to avoid surprises across environments.
- if (is.null(debug)) { - debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE") - } + if (is.null(debug)) { + debug <- tolower(Sys.getenv("PIPAPI_DEBUG", "false")) %in% c("true","1","t","yes") + }R/create_countries_vctr.R (2)
15-18: Defensive checks for lkup structureFail fast if
lkupis missing required members; helps avoid cryptic errors downstream.create_countries_vctr <- function(country, year, lkup) { - valid_years <- lkup$valid_years - aux_files <- lkup$aux_files + stopifnot(is.list(lkup), !is.null(lkup$valid_years), !is.null(lkup$aux_files)) + valid_years <- lkup$valid_years + aux_files <- lkup$aux_files
261-268: Avoid eval(parse()) for column selection; build predicate programmaticallyParsing strings is harder to maintain and easy to break. Use
Reduce/lapplyover columns.- out <- user_alt_gt_code |> - # Create filter for data.table - paste0(" %in% user_alt_agg", collapse = " | ") |> - # parse the filter as unevaluated expression - {\(.) parse(text = .) }() |> - # filter and get country codes - {\(.) cl[eval(.), country_code] }() + idx <- Reduce(`|`, lapply(user_alt_gt_code, \(col) cl[[col]] %in% user_alt_agg)) + out <- cl[idx, country_code]man/assign_stat.Rd (1)
18-22: Clarify behavior for missing/extra names and recyclingDocument how
statnames are matched tolev(must cover all unique levels? errors vs. NA fill), and whether unnamed scalars are recycled.Suggested additions:
- If
statis a named vector/list, names must include all uniquelev; missing names yieldNA(or error).- If
statis scalar, it is broadcast to all rows.- Length of
countsmust matchlev.Also applies to: 24-33
R/compute_fgt_old.R (2)
11-51: Handle NAs and avoid recomputing denominators in FGT/WATTS loop
fmean/fsumshould setna.rm = TRUEto avoid NA propagation.- Precompute
W <- fsum(wt, na.rm = TRUE)once.- Minor: compute
lpov <- log(pov)once per iteration.compute_fgt_dt_old <- function(dt, welfare, weight, povlines) { w <- dt[[welfare]] wt <- dt[[weight]] n <- length(w) m <- length(povlines) # Pre-allocate result matrix res <- matrix(NA_real_, nrow = m, ncol = 3) colnames(res) <- c("FGT0", "FGT1", "FGT2") watts_vec <- numeric(m) # Precompute log(w) for efficiency logw <- rep(NA_real_, n) pos <- w > 0 logw[pos] <- log(w[pos]) + W <- fsum(wt, na.rm = TRUE) for (i in seq_along(povlines)) { - pov <- povlines[i] + pov <- povlines[i]; lpov <- log(pov) poor <- w < pov rel_dist <- 1 - (w / pov) rel_dist[!poor] <- 0 - res[i, 1] <- fmean(poor, w = wt) # FGT0 - res[i, 2] <- fmean(rel_dist, w = wt) # FGT1 - res[i, 3] <- fmean(rel_dist^2, w = wt) # FGT2 + res[i, 1] <- fmean(poor, w = wt, na.rm = TRUE) # FGT0 + res[i, 2] <- fmean(rel_dist, w = wt, na.rm = TRUE) # FGT1 + res[i, 3] <- fmean(rel_dist^2, w = wt, na.rm = TRUE) # FGT2 # Optimized Watts index calculation keep <- poor & pos if (any(keep, na.rm = TRUE)) { - watts_vec[i] <- (fsum((log(pov) - logw[keep]) * wt[keep])) / fsum(wt) + watts_vec[i] <- fsum((lpov - logw[keep]) * wt[keep], na.rm = TRUE) / W } else { watts_vec[i] <- 0 } }
87-99: Guardsetnames(dt, "area", "reporting_level")whenareais absentSome inputs may already use
reporting_level. Avoid errors by checking column existence.- setnames(dt, "area", "reporting_level") + if ("area" %in% names(dt)) setnames(dt, "area", "reporting_level")man/fg_remove_duplicates.Rd (1)
12-14: Document new argument in the Rd arguments list (via roxygen).
use_new_lineup_version = FALSEwas added to the usage, but there’s no corresponding\item{use_new_lineup_version}{...}in\arguments{}. Update the roxygen in R/fg_pip.R (add@param use_new_lineup_version logical: …, optionally linking to\code{\link{use_new_lineup_version}}), then regenerate docs to keep the public interface consistent.DESCRIPTION (1)
76-77: Pin remote dependency for reproducible PROD builds.
Remotes: PIP-Technical-Team/wbpip@DEVis volatile for a release. Consider pinning to a tag or commit SHA (e.g.,@vX.Y.Zor@<sha>) to avoid unreviewed upstream changes entering PROD.R/add_agg_stats.R (1)
16-23: Prefer data.table split by columns over interaction().For readability and speed:
-aggregated_list <- split(aggregated, - interaction( - aggregated$country_code, - aggregated$reporting_year, - aggregated$poverty_line - ), - drop = TRUE -) +aggregated_list <- split( + aggregated, + by = c("country_code", "reporting_year", "poverty_line"), + drop = TRUE +)tests/testthat/test-pip-local.R (1)
845-852: Stabilize the new popshare regression test.Add a row-count assertion to guard against accidental multi-row outputs and make the intent explicit.
test_that("make sure popshare bug no which was reported no longer exists", { - out <- pip(country = "USA", year = 2022, + out <- pip(country = "USA", year = 2022, popshare = .5, lkup = lkup) + expect_equal(nrow(out), 1) # Ensure poverty line is not the default one expect_false(out$poverty_line %in% c(1.9, 3)) # Ensure headcount is closer to 0.5 expect_equal(out$headcount, 0.5, tolerance = .05) })R/pipapi-env.R (1)
24-41: Consider validating keys before touching .pipapienv.Right now any non-string input bubbles straight into
rlang::env_get/env_poke, which will error on, e.g., vectors. A quickrlang::is_string(key) || abortcheck would make the helpers safer for downstream callers.man/add_attributes_as_columns_multi.Rd (3)
11-18: Tighten wording forreporting_level_rows$rows.Current: “rows (integer cumulative row ends)”. Suggest: “rows: integer vector of cumulative end-row indices for each segment.” Update in the roxygen source (R/utils-pipdata.R) so this file regenerates.
5-8: Add cross-reference to the vectorized variant.Add a See Also entry in the roxygen source:
@seealso [add_attributes_as_columns_vectorized]to help users choose between APIs.
31-34: Provide a minimal, non-executed example.Consider a small
\dontrun{}example that shows setting required attributes on a tiny data.table and calling the function. This improves discoverability. Update the roxygen source, not this Rd file.R/pip.R (3)
45-57: Avoid referencinglkupin a default argument; computelkup_hashinside.Defaulting
lkup_hash = lkup$cache_data_id$hash_pipin formals is brittle. Prefer a NULL default and derive inside after validatinglkup.-pip <- function(country = "ALL", +pip <- function(country = "ALL", ... - lkup, - censor = FALSE, - lkup_hash = lkup$cache_data_id$hash_pip, + lkup, + censor = FALSE, + lkup_hash = NULL, additional_ind = FALSE) { + # Normalize lkup_hash once lkup is available + if (is.null(lkup_hash)) { + stopifnot(!missing(lkup)) + lkup_hash <- lkup$cache_data_id$hash_pip + }
61-61: Make the lineup switch robust.Coerce to logical safely and handle missing flag.
- use_new <- lkup$use_new_lineup_version + use_new <- isTRUE(lkup$use_new_lineup_version)
45-53: Normalize enum-like arguments locally withmatch.arg.Ensure consistent behavior regardless of callee expectations.
pip <- function(country = "ALL", year = "ALL", povline = 1.9, popshare = NULL, fill_gaps = FALSE, - group_by = c("none", "wb"), - welfare_type = c("all", "consumption", "income"), - reporting_level = c("all", "national", "rural", "urban"), + group_by = c("none", "wb"), + welfare_type = c("all", "consumption", "income"), + reporting_level = c("all", "national", "rural", "urban"), ppp = NULL, lkup, censor = FALSE, lkup_hash = NULL, additional_ind = FALSE) { + group_by <- match.arg(group_by) + welfare_type <- match.arg(welfare_type) + reporting_level <- match.arg(reporting_level)R/rg_pip_old.R (3)
61-62: Qualifyrbindlistto avoid namespace ambiguity.Be explicit to ensure the data.table implementation is used.
- res <- rbindlist(res, fill = TRUE) + res <- data.table::rbindlist(res, fill = TRUE)
67-73: Qualifyjoinor switch to a well-known merge.
join(...)is not a base or data.table function. Either qualify it (e.g.,pipapi::join) or usedata.table::mergewithall=TRUEandon=..- out <- join(res, + out <- pipapi::join(res, metadata, on = c("file", "reporting_level"), how = "full", validate = "m:1", verbose = 0)Alternative with data.table:
# set keys/columns beforehand data.table::setkey(res, file, reporting_level) data.table::setkey(metadata, file, reporting_level) out <- metadata[res, on = .(file, reporting_level), nomatch = 0L, allow.cartesian = TRUE]
74-78: Guard renames if columns are absent.If
survey_mean_ppporsurvey_median_pppare missing, this will error. Add presence checks.- out[, `:=`( - mean = survey_mean_ppp, - median = survey_median_ppp, - file = NULL - )] + if ("survey_mean_ppp" %in% names(out)) out[, mean := survey_mean_ppp] + if ("survey_median_ppp" %in% names(out)) out[, median := survey_median_ppp] + out[, file := NULL]man/safe_endpoint.Rd (1)
54-65: Example: show idiomatic wrapper usageThe example currently invokes the returned function inline. Consider documenting the typical pattern (returning the wrapped handler) for clarity.
-#* @get /api/v1/pip -function(req, res) { - safe_endpoint(function(req, res) { - params <- req$argsQuery - params$lkup <- lkups$versions_paths[[req$argsQuery$version]] - params$version <- NULL - do.call(pipapi::ui_pip, params) - }, endpoint = "/api/v1/pip")(req, res) -} +#* @get /api/v1/pip +function(req, res) safe_endpoint( + fun = function(req, res) { + params <- req$argsQuery + params$lkup <- lkups$versions_paths[[req$argsQuery$version]] + params$version <- NULL + do.call(pipapi::ui_pip, params) + }, + endpoint = "/api/v1/pip" +)(req, res)inst/plumber/v1/plumber.R (2)
3-7: Respect pre-set thread env varsAvoid overriding admin‑provided caps. Set only if unset.
-Sys.setenv( - OPENBLAS_NUM_THREADS = "1", - MKL_NUM_THREADS = "1", - OMP_NUM_THREADS = "1" -) +env <- Sys.getenv(c("OPENBLAS_NUM_THREADS","MKL_NUM_THREADS","OMP_NUM_THREADS"), NA) +set_if_empty <- function(k, v) if (is.na(env[[k]]) || nzchar(env[[k]]) == FALSE) Sys.setenv(structure(v, names = k)) +set_if_empty("OPENBLAS_NUM_THREADS", "1") +set_if_empty("MKL_NUM_THREADS", "1") +set_if_empty("OMP_NUM_THREADS", "1")
79-85: Guard API spec path existencesystem.file can return "". Add a check to avoid failing at startup.
- plumber::pr_set_api_spec(yaml::read_yaml(api_spec_path)) + plumber::pr_set_api_spec({ + if (!nzchar(api_spec_path) || !file.exists(api_spec_path)) { + warning("API spec not found at: ", api_spec_path) + NULL + } else { + yaml::read_yaml(api_spec_path) + } + })R/ui_country_profile.R (3)
121-127: Make PG optional and resilientIf pg_svy is absent or empty, avoid errors and still return charts.
- pg <- get_aux_table(data_dir = lkup$data_root, - "pg_svy") - pg <- pg[country_code == country] - dl2[["pg"]] <- pg + pg <- tryCatch(get_aux_table(data_dir = lkup$data_root, "pg_svy"), + error = function(e) data.table::as.data.table(NULL)) + if (NROW(pg)) { + dl2[["pg"]] <- pg[country_code == country] + }
129-131: List shaping: avoid extra nestingIf consumers don’t expect an extra list level, drop the outer list().
- dl <- list(append(dl, dl2)) |> - setNames(country) + dl <- append(dl, dl2) |> + (\(.) list(.[[1]] = .))() |> + setNames(country)
322-322: Qualify rbindlist or ensure importUnqualified rbindlist requires NAMESPACE import. Safer to use data.table::rbindlist here.
- rbindlist(use.names = TRUE) + data.table::rbindlist(use.names = TRUE)man/add_attributes_as_columns_vectorized.Rd (1)
13-16: Document invisible return and by‑reference natureSmall doc polish: make it explicit the function returns dt invisibly, modified by reference.
-\value{ -The same \code{data.table}, modified by reference, with added columns: +\value{ +The same \code{data.table}, \emph{modified by reference} (returned \code{invisibly}), +with added columns:R/rg_pip.R (1)
20-23: Remove premature povline unlist (dead/bug‑prone)This is overwritten by metadata$povline later; keeping it risks type mismatches.
- # povline is set to NULL if popshare is given - if (!is.null(popshare)) povline <- NULL - if (is.list(povline)) povline <- unlist(povline) + # povline is set to NULL if popshare is given + if (!is.null(popshare)) povline <- NULLR/ui_home_page.R (2)
11-13: Roxygen/doc mismatch for lkup_hash defaultThe param says “hash of pip” but default uses
hash_pip_grp. Align the docstring (or the default) to avoid confusion.Apply this small doc tweak:
-#' @param lkup_hash character: hash of pip +#' @param lkup_hash character: cache hash (pip_grp)
85-89: Guard against NA and control precision for UI valuesIf either
reporting_poporheadcountis NA you’ll propagate NA. Consider a NA‑safe computation and optional rounding for display.- out[, - `:=`( - pop_in_poverty = reporting_pop * headcount / pop_units, - reporting_pop = reporting_pop / pop_units - )] + out[, `:=`( + reporting_pop = reporting_pop / pop_units, + pop_in_poverty = fifelse(!is.na(headcount) & !is.na(reporting_pop), + reporting_pop * headcount, NA_real_) + )] + # Optionally round for UI: + # out[, c("reporting_pop","pop_in_poverty") := lapply(.SD, round, 2), + # .SDcols = c("reporting_pop","pop_in_poverty")]R/create_lkups.R (3)
78-80: Name shadowing makes the code harder to reason about
use_new_lineup_version <- use_new_lineup_version(versions)shadows the function with a logical. Prefer a different variable name.- use_new_lineup_version <- use_new_lineup_version(versions) + use_new <- use_new_lineup_version(versions)Also update subsequent
if (use_new_lineup_version)checks toif (use_new).
777-793: Expose lineup metadata in lkup for downstream consumers
ui_hp_stacked()needs lineup years whenuse_newis true. Consider attaching them for easy access.lkup <- list( @@ - use_new_lineup_version = use_new_lineup_version) + use_new_lineup_version = use_new + ) + if (isTRUE(use_new)) { + lkup$lineup_years <- lineup_years$lineup_years + }
95-95: Inline “Why is this necessary?” commentThe
setnames(country_list, 'region', 'region_name')question should be clarified or removed before release.Add a brief rationale (schema normalization) or drop the comment.
R/fgt_cumsum.R (3)
43-51: Avoidlist2env+rm; use explicit localsUsing
list2env(LDTg, envir = environment())makes scope brittle. Prefer explicit binding for clarity and safety.-get_total_pop <- \(LDTg) { - list2env(LDTg, envir = environment()) - rm(LDTg) - add_vars(g[["groups"]], - get_vars(DT, c("weight")) |> - fsum(g)) |> - setnames(old = "weight", - new = "W") -} +get_total_pop <- \(LDTg) { + DT <- LDTg$DT + g <- LDTg$g + add_vars(g[["groups"]], + get_vars(DT, "weight") |> fsum(g)) |> + setnames("weight", "W") +}
151-163: Deterministic coding: OK, add explicit integer type forcodeLooks good. For robustness after rbinds, consider setting
storage.mode(code) <- "integer"once at the end (after possible updates).
213-224: Error message: count of unseen pairs usesfsum(nas)
fsumover logicals is fine, but to avoid dependency surprises usesum(nas)here; it’s clearer and avoids type confusion in messages.- "encode_pairs(): {fsum(nas)} unseen (id, reporting_level) pair(s).", + "encode_pairs(): {sum(nas)} unseen (id, reporting_level) pair(s).",R/utils-pipdata.R (6)
10-28: Type/shape handling intransform_input
- If
yearsis a vector, it’s replicated per country as intended; OK.- Minor: “formated” -> “formatted” in return doc. Also clarify expected
input_liststructure in comments (current@inheritParamsmay mislead).
98-101: Prefersetalloccolover comment uncertainty
setalloccol(dt)is the by-ref allocator; the comment shows uncertainty. Keep it and drop the comment, or usealloc.col(dt)consistently across helpers.- setalloccol(dt) # pre-allocate room for new columns... #AC, I am still not sure about this. + setalloccol(dt) # pre-allocate room for new columns
111-119: Replace loop with vectorized replicationYou already use a vectorized approach in
add_attributes_as_columns_multi. Apply the same here for speed and simplicity.- reporting_level_vec <- character(n) - start <- 1L - for (i in seq_along(lev)) { - end <- rows[i] - reporting_level_vec[start:end] <- lev[i] - start <- end + 1L - } - dt[, reporting_level := reporting_level_vec] + dt[, reporting_level := rep.int(lev, diff(c(0L, rows)))]
133-149: Distribution stats loop can be simplified and protected
- If
dscontains NULLs or missing names, assignments per level may warn.- Consider using
assign_stat(you already export it) or map once by names to avoid per-level subsetting.- if (length(ds)) { - dstats <- c("mean", "median") - for (l in lev) { - wrl <- whichv(dt$reporting_level, l) - ld <- lapply(dstats, \(d) { - if (is.null(ds[[d]][[l]])) { - NA - } else { - ds[[d]][[l]] - } - }) - dt[wrl, (dstats) := ld] - } - } + if (length(ds)) { + assign_stat(dt, lev, diff(c(0L, rows)), ds$mean, "mean") + assign_stat(dt, lev, diff(c(0L, rows)), ds$median, "median") + }
181-214: Consistency with allocatorHere you use
alloc.col(dt)while the vectorized variant usessetalloccol(dt). Choose one;setalloccolis idiomatic for by-ref growth.
317-322: Handle NULLdist_statsentries safelyIf any element yields
NULL,rbindlistcan error. Filter out NULLs first.- lapply(lt_att, \(.) { - .$dist_stats - }) |> - rbindlist(fill = TRUE) + stats_list <- lapply(lt_att, \(.) .$dist_stats) + stats_list <- Filter(Negate(is.null), stats_list) + rbindlist(stats_list, fill = TRUE)R/fg_pip.R (2)
224-241: Stray debug print and comment blocks
print("here")and commented blocks should be removed or guarded by a verbose flag.- print("here") + # message("fg_remove_duplicates: legacy lineup path")
255-266:fg_standardize_cache_id: wrong variable sanitizedYou
gsub(..., interpolation_id)insideifelse, but are returning a standardized cache_id. Likely intent is to sanitizecache_id. Also, the function is currently unused; either fix or remove to avoid confusion.- out <- ifelse(grepl("|", interpolation_id, fixed = TRUE), - gsub(paste0("_", - unique(reporting_level), - collapse = '|'), - '', - interpolation_id), - cache_id) + out <- ifelse( + grepl("|", interpolation_id, fixed = TRUE), + gsub(paste0("_", unique(reporting_level), collapse = "|"), "", cache_id), + cache_id + )R/pip_old_lineups.R (3)
145-149: Prefer robust binding over rowbind() and confirm originrowbind may not be available/consistent; rbindlist is safer and faster.
- out <- main_data |> - rowbind(cached_data) + out <- data.table::rbindlist(list(main_data, cached_data), use.names = TRUE, fill = TRUE)If rowbind() is intended (internal helper), confirm it’s in namespace and handles column mismatches.
67-68: Unused parameter lkup_hashlkup_hash is never referenced in this function. Remove it or wire it into cache/update paths.
109-113: DuckDB file creation racefile.exists() + create could race under concurrency. Consider tryCatch around creation or lazy connect that creates the DB if missing.
R/fg_pip_old.R (2)
96-103: Loop index: prefer nrow(ctry_years) over length of a columnSafer when interpolation_id has NAs or differing length.
- for (ctry_year_id in seq_along(ctry_years$interpolation_id)) { + for (ctry_year_id in seq_len(nrow(ctry_years))) {
241-249: Regex replacement clarity in fg_standardize_cache_id_oldPattern builds alternation from unique(reporting_level). Add explicit anchors or word boundaries to avoid accidental matches (e.g., substrings).
R/compute_fgt_new.R (2)
255-256: Guard setnames for area → reporting_levelsetnames() errors if 'area' is absent. Guard to avoid hard failures across heterogeneous inputs.
- setnames(dt, "area", "reporting_level") + if ("area" %in% names(dt)) { + setnames(dt, "area", "reporting_level") + }
24-31: Log handling: precompute only where positiveYou already mask non-positive with pos in compute_fgt_dt; replicate or document behavior in compute_fgt for consistency (currently logs all w with warnings suppressed).
inst/plumber/v1/endpoints.R (4)
18-30: Harden request-id generation (avoid FP artifacts/precision loss).Formatting doubles to string can yield decimals and precision loss for epoch-ms. Generate an integer-like string reliably.
-.req_id <- function() { - ts_ms_num <- as.numeric(Sys.time()) * 1000 - ts_ms_chr <- format(ts_ms_num, scientific = FALSE, trim = TRUE) - rnd <- sample.int(1e9, 1) - id_raw <- paste0(ts_ms_chr, "-", rnd) - - list( - id_raw = id_raw, - timestamp = as.POSIXct(as.numeric(ts_ms_chr) / 1000, - origin = "1970-01-01", tz = "UTC"), - random = rnd - ) -} +.req_id <- function() { + ts_ms_chr <- formatC(as.numeric(Sys.time()) * 1000, format = "f", digits = 0) + rnd <- sample.int(1e9, 1) + id_raw <- paste0(ts_ms_chr, "-", rnd) + list( + id_raw = id_raw, + timestamp = as.POSIXct(as.numeric(ts_ms_chr) / 1000, + origin = "1970-01-01", tz = "UTC"), + random = rnd + ) +}
230-231: Poverty line rounding inconsistent with pipeline (2 vs 3 decimals).API rounds to 2 d.p., pip_new_lineups() rounds to 3 (Line 76 in R/pip_new_lineups.R). Align to one rule to avoid cache/data mismatches.
Would you like me to generate a small patch aligning both to 2 d.p., or is 3 d.p. the new standard?
327-354: Avoid per-endpoint tryCatch/timeout duplication; prefer safe_endpoint wrapper.You already introduced safe_endpoint for cp-charts. Consider wrapping /api/v1/pip similarly to centralize 503/500 shaping and request_id propagation.
858-867: pc-download lacks timeout handling; add with_req_timeout + CSV-friendly 503.Without a timeout guard this route can hang. Return a minimal CSV row on timeout to honor the serializer.
-#* @serializer csv -function(req) { +#* @serializer csv +function(req, res) { params <- req$argsQuery params$lkup <- lkups$versions_paths[[req$argsQuery$version]] params$pop_units <- 1 params$version <- NULL params$censor <- TRUE - do.call(pipapi::ui_pc_charts, params) + out <- do.call(pipapi::ui_pc_charts, params) |> + with_req_timeout() + if (is.null(out)) { + res$status <- 503 + return(data.frame(error = "Request timed out", + request_id = req$.id, + endpoint = "/api/v1/pc-download")) + } + out }R/pip_new_lineups.R (2)
76-77: Rounding policy: 3 d.p. here vs 2 d.p. in endpoints.Decide one rounding rule for povline to keep caches/keys consistent across API and internal calls.
If 2 d.p. is expected at the API boundary, should pip_new_lineups() accept the raw value but round to 2 d.p. internally?
107-113: Ensure cache file existence creates parent dir and handles failures.If data_root lacks a cache dir, create it before writing the DuckDB file; surface failure with a clear error.
- cache_file_path <- fs::path(lkup$data_root, 'cache', ext = "duckdb") + cache_dir <- fs::path(lkup$data_root) + fs::dir_create(cache_dir, recurse = TRUE) + cache_file_path <- fs::path(cache_dir, 'cache', ext = "duckdb") if (!file.exists(cache_file_path)) { # Create an empty duckdb file create_duckdb_file(cache_file_path) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (98)
DESCRIPTION(2 hunks)NAMESPACE(3 hunks)NEWS.md(1 hunks)R/aaa.R(1 hunks)R/add_agg_stats.R(1 hunks)R/compute_fgt_new.R(1 hunks)R/compute_fgt_old.R(1 hunks)R/create_countries_vctr.R(2 hunks)R/create_lkups.R(9 hunks)R/duckdb_func.R(13 hunks)R/fg_pip.R(3 hunks)R/fg_pip_old.R(1 hunks)R/fgt_cumsum.R(1 hunks)R/get_aux_table.R(2 hunks)R/infer_poverty_line.R(1 hunks)R/pip.R(2 hunks)R/pip_agg.R(1 hunks)R/pip_grp.R(3 hunks)R/pip_grp_logic.R(4 hunks)R/pip_grp_new.R(1 hunks)R/pip_new_lineups.R(1 hunks)R/pip_old.R(7 hunks)R/pip_old_lineups.R(1 hunks)R/pipapi-env.R(1 hunks)R/pipapi-package.R(1 hunks)R/rg_pip.R(3 hunks)R/rg_pip_old.R(1 hunks)R/ui_country_profile.R(3 hunks)R/ui_home_page.R(3 hunks)R/ui_poverty_indicators.R(4 hunks)R/utils-pipdata.R(1 hunks)R/utils-plumber.R(1 hunks)R/utils.R(10 hunks)R/zzz.R(2 hunks)data-raw/data.R(1 hunks)inst/TMP/TMP_API_launcher.R(1 hunks)inst/TMP/TMP_duckdb_cache.R(1 hunks)inst/plumber/v1/endpoints.R(14 hunks)inst/plumber/v1/plumber.R(1 hunks)man/add_attributes_as_columns_multi.Rd(1 hunks)man/add_attributes_as_columns_vectorized.Rd(1 hunks)man/add_dist_stats.Rd(1 hunks)man/add_dist_stats_old.Rd(1 hunks)man/assign_stat.Rd(1 hunks)man/build_pair_dict.Rd(1 hunks)man/compute_fgt.Rd(1 hunks)man/compute_fgt_dt.Rd(1 hunks)man/compute_fgt_dt_old.Rd(1 hunks)man/create_countries_vctr.Rd(1 hunks)man/create_full_list.Rd(1 hunks)man/decode_pairs.Rd(1 hunks)man/empty_response_fg_add.Rd(1 hunks)man/encode_pairs.Rd(1 hunks)man/fg_assign_nas_values_to_dup_cols_old.Rd(1 hunks)man/fg_pip.Rd(1 hunks)man/fg_pip_old.Rd(1 hunks)man/fg_remove_duplicates.Rd(1 hunks)man/fg_remove_duplicates_old.Rd(1 hunks)man/fg_standardize_cache_id_old.Rd(1 hunks)man/fgt_cumsum.Rd(1 hunks)man/format_lfst.Rd(1 hunks)man/get_aux_table.Rd(1 hunks)man/get_aux_table_ui.Rd(1 hunks)man/get_country_code_subset.Rd(1 hunks)man/get_dt_dist_stats.Rd(1 hunks)man/get_from_pipapienv.Rd(1 hunks)man/get_lt_attr.Rd(1 hunks)man/get_mean_median.Rd(1 hunks)man/get_pipapienv.Rd(1 hunks)man/get_rl_rows_single.Rd(1 hunks)man/get_total_pop.Rd(1 hunks)man/infer_poverty_line.Rd(1 hunks)man/list_code_column_values.Rd(1 hunks)man/load_data_list.Rd(1 hunks)man/load_data_list_old.Rd(1 hunks)man/load_list_refy.Rd(1 hunks)man/map_fgt.Rd(1 hunks)man/pip.Rd(2 hunks)man/pip_agg.Rd(3 hunks)man/pip_aggregate_by.Rd(1 hunks)man/pip_new_lineups.Rd(1 hunks)man/pip_old.Rd(1 hunks)man/pip_old_lineups.Rd(1 hunks)man/rg_pip_old.Rd(2 hunks)man/safe_endpoint.Rd(1 hunks)man/set_in_pipapienv.Rd(1 hunks)man/subset_lkup.Rd(2 hunks)man/transform_input.Rd(1 hunks)man/ui_cp_poverty_charts.Rd(1 hunks)man/ui_pc_charts.Rd(2 hunks)man/update_master_file.Rd(1 hunks)man/update_pair_dict.Rd(1 hunks)man/use_new_lineup_version.Rd(1 hunks)man/with_req_timeout.Rd(1 hunks)tests/testthat/test-create_countries_vctr.R(7 hunks)tests/testthat/test-fg_pip-local.R(7 hunks)tests/testthat/test-pip-local.R(1 hunks)tests/testthat/test-utils.R(5 hunks)
🔇 Additional comments (33)
R/compute_fgt_old.R (1)
32-35: No action needed—NAMESPACE already imports fmean/fsum
NAMESPACE containsimport(collapse, except = fdroplevels)(line 64), sofmean/fsumare available unqualified.Likely an incorrect or invalid review comment.
tests/testthat/test-utils.R (1)
69-71: LGTM: pass povline = NULL explicitly.The updated
subset_lkupcalls reflect the new signature and keep test intent clear.Also applies to: 83-85, 97-99, 111-113, 129-131
DESCRIPTION (1)
73-75: Imports verified BothparallelandR.utilsare called viapkg::fun(e.g.parallel::detectCores,R.utils::withTimeout), so noimportFromentries are required in NAMESPACE.parallelships with base R andR.utilsis listed under Imports, ensuring it’s installed in CI.R/pipapi-package.R (1)
62-69: LGTM on the new dataset entry.
The added Rd stub matches the existing pattern for empty-response datasets and looks consistent.tests/testthat/test-create_countries_vctr.R (1)
455-580: Thanks for updating the test calls.
Passinglkupkeeps the tests aligned with the refactoredcreate_countries_vctrsignature.man/fg_standardize_cache_id_old.Rd (1)
5-20: Doc entry looks good.
The usage, arguments, and description line up with the OLD helper’s purpose.man/empty_response_fg_add.Rd (1)
1-16: Documentation looks good.The dataset metadata aligns with the new export and matches the pattern of the existing empty_response documentation.
man/list_code_column_values.Rd (1)
1-25: Clear and complete helper documentation.Argument descriptions, return value, and example are crisp—thank you for keeping this helper well documented.
man/update_pair_dict.Rd (1)
1-24: Internal helper doc looks solid.The usage, argument descriptions, and purpose are all spelled out clearly for
update_pair_dict; no issues on my end.man/subset_lkup.Rd (1)
17-45: Nice addition ofpopsharedocs.The new parameter is clearly introduced with defaults and description, keeping the API docs aligned with the implementation.
man/decode_pairs.Rd (1)
19-41: Clear and complete docstring.The parameter and return-value sections read cleanly and match the new decode workflow. No further action needed.
R/pip_old.R (1)
144-150: Nice guard on mixed path types.Forcing
pathto character before binding cached rows prevents type clashes when re-reading cached data. Good catch.man/add_attributes_as_columns_multi.Rd (1)
21-23: Confirm exposure offilecolumn in public output.Including
filein the returned columns can leak internal filenames and collide during joins. If this column is only for internal joins, consider dropping it before returning or explicitly documenting its contents and stability guarantees. Please confirm intent; otherwise, adjust implementation/docs.R/pip.R (2)
55-55: Behavior change:censornow defaults to FALSE.This is a breaking default change. Please confirm downstream callers/tests and update NEWS/docs accordingly.
65-93: Dispatcher structure looks good.Clear delegation to new/old implementations based on the lookup flag; parameter pass-through is consistent.
If either callee expects scalar enums, the
match.argnormalization above will avoid surprises.R/rg_pip_old.R (1)
47-50: Empty-result handling is clean.Early return with placeholder and cache flags avoids downstream errors.
man/safe_endpoint.Rd (1)
27-35: Clarify message masking vs. debug behaviorYou state message is actual error only in debug, otherwise “Internal Server Error”. Please ensure this matches the implemented wrapper. If the global plumber error handler still returns err$message unconditionally, update either the code or this doc to avoid mixed guarantees.
R/rg_pip.R (2)
71-79: process_dt signature matches per-survey povline usage Verified thatprocess_dtis defined asfunction(dt, povline, …)inR/compute_fgt_new.R, so bothMap(process_dt, lt, povline)andlapply(lt, process_dt, povline = povline)correctly align.
59-69: Ensure correct sorting before calling infer_poverty_line
infer_poverty_line(..., assume_sorted = TRUE)skips any internal sorting and requireswelfare/weightto be pre-ordered. Sinceload_data_list()doesn’t guarantee sorted inputs, either sortx$welfareandx$weightbefore the call or setassume_sorted = FALSE. You can mirror the logic infg_pip.R(lines 71–75) to choose or enforce sorting.inst/plumber/v1/plumber.R (1)
24-43: Omit preroute hook:req$.startis already initialized
Thectxfilter ininst/plumber/v1/endpoints.Rsetsreq$.start <- .now()before routing, so adding a preroute hook is unnecessary.man/fg_remove_duplicates_old.Rd (1)
1-24: Doc matches exported helper
Signature and default columns line up with the code infg_pip_old, so the generated Rd looks good.NAMESPACE (1)
3-66: Exports/imports consistent with new surface
The additional exports and the explicitdata.tableimport reflect the new helpers and look consistent with the rest of the PR changes.R/create_lkups.R (1)
1018-1030: Exported helper looks good
use_new_lineup_version(x)is clear and robust. No changes requested.R/pip_grp.R (2)
280-288: Order of operations: compute pop_in_poverty after fixing populationAfter correcting aggregation to sum
reporting_pop, computingpop_in_povertyis fine. No change beyond the fixes above.Please re-check a few known totals (e.g., WLD headcount × pop vs. pop_in_poverty) after applying the aggregation fixes.
53-62: fg_pip_old usage acknowledgedUsing
fg_pip_old()here is fine as a transitional step. Ensure retirement plan/timeline is tracked.R/fgt_cumsum.R (1)
75-85: Cutpoint join assumes existence of index==0 row
findInterval(..., left.open = TRUE)can return 0, and the next join expects an index==0 row to exist inDT_min. Please assert/guarantee that zero row exists per group before the join.R/utils-pipdata.R (2)
248-276: Typed NA assignment inassign_statis OK viarep.intLooks good. Minor: when broadcasting scalar, you can avoid
rep.intbydt[, (colname) := unname(v)](data.table recycles). Current version is explicit and fine.
329-335: Attribute name consistencyYou return
dist_stats = attr(., "dt_dist_stats")but other helpers expect an attribute nameddist_stats. Confirm this divergence is intentional. Otherwise, standardize attribute names to avoid subtle bugs.R/fg_pip.R (4)
63-64:load_list_refydepends on fix to_placeholderThis call will currently fail due to the
_bug reported in R/fgt_cumsum.R. After fixing that, this section should work.
68-83: Popshare path: ensurepovlinecolumn name consistencyDownstream you unconditionally
setnames(out, "povline", "poverty_line"). Confirm thatprocess_dtreturns apovlinecolumn in the popshare branch; otherwisesetnameswill error. Adjust to rename only when present.- setnames(out, "povline", "poverty_line") + if ("povline" %in% names(out)) setnames(out, "povline", "poverty_line")
163-175:join(..., overid = 2): confirm intended duplicate-handlingUsing
overid = 2changes how duplicate keys are handled. Please confirm this aligns with expectations for metadata multiplicity or document the choice.
45-47: LikelysetDT()misuse with pipe (assigns NULL)
metadata <- metadata$lkup |> setDT()can assignNULLbecausesetDT()converts by reference and returns (often invisibly). Useas.data.table()or assign thensetDT()separately.- metadata <- metadata$lkup |> - setDT() + metadata <- as.data.table(metadata$lkup) + # or: + # metadata <- metadata$lkup; setDT(metadata)Likely an incorrect or invalid review comment.
R/compute_fgt_new.R (1)
51-66: mean_and_med path assumes columns existffirst(dt$mean)/median/reporting_year require those columns present in dt. Add checks or compute them upstream to avoid subscript errors.
| fg <- fg_pip( | ||
| ctr, | ||
| year = year, | ||
| povline = 3, | ||
| welfare_type = "all", | ||
| reporting_level = "all", | ||
| popshare = NULL, | ||
| lkup = lkup | ||
| ) |> | ||
| rbindlist(fill = TRUE) | ||
| empty_response_fg <- fg[-1] | ||
|
|
||
|
|
||
| add_vars_out_of_pipeline(fg, fill_gaps = TRUE, lkup = lkup) | ||
|
|
||
| empty_response_fg_add <- fg[-1] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty_response_fg mutates alongside fg
empty_response_fg <- fg[-1] captures a shallow view of the same column pointers. When add_vars_out_of_pipeline(fg, …) appends columns by reference, those columns also appear in empty_response_fg, so we lose the “pre augmentation” snapshot. Please take a defensive copy before calling add_vars_out_of_pipeline, e.g.
-empty_response_fg <- fg[-1]
+empty_response_fg <- data.table::copy(fg[-1])That guarantees empty_response_fg stays untouched while fg gains the extra fields.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fg <- fg_pip( | |
| ctr, | |
| year = year, | |
| povline = 3, | |
| welfare_type = "all", | |
| reporting_level = "all", | |
| popshare = NULL, | |
| lkup = lkup | |
| ) |> | |
| rbindlist(fill = TRUE) | |
| empty_response_fg <- fg[-1] | |
| add_vars_out_of_pipeline(fg, fill_gaps = TRUE, lkup = lkup) | |
| empty_response_fg_add <- fg[-1] | |
| fg <- fg_pip( | |
| ctr, | |
| year = year, | |
| povline = 3, | |
| welfare_type = "all", | |
| reporting_level = "all", | |
| popshare = NULL, | |
| lkup = lkup | |
| ) |> | |
| rbindlist(fill = TRUE) | |
| empty_response_fg <- data.table::copy(fg[-1]) | |
| add_vars_out_of_pipeline(fg, fill_gaps = TRUE, lkup = lkup) | |
| empty_response_fg_add <- fg[-1] |
🤖 Prompt for AI Agents
In data-raw/data.R around lines 38 to 54, empty_response_fg currently holds a
shallow view of fg (fg[-1]) so when add_vars_out_of_pipeline mutates fg by
reference those new columns also appear in empty_response_fg; make a defensive
deep copy of the snapshot before calling add_vars_out_of_pipeline (use
data.table::copy on the result you want to preserve) so empty_response_fg
remains the pre-augmentation snapshot while fg can be safely modified in-place.
| ncores <- parallel::detectCores(logical = FALSE) | ||
| data.table::setDTthreads(max(1L, ncores)) | ||
| collapse::set_collapse(nthreads = max(1L, ncores)) | ||
| fst::threads_fst(max(1L, ncores)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle detectCores() == NA to avoid propagating NA threads
detectCores can return NA in some environments; max(1L, NA) ⇒ NA. Guard it.
-ncores <- parallel::detectCores(logical = FALSE)
-data.table::setDTthreads(max(1L, ncores))
-collapse::set_collapse(nthreads = max(1L, ncores))
-fst::threads_fst(max(1L, ncores))
+ncores <- parallel::detectCores(logical = FALSE)
+ncores <- if (is.na(ncores)) 1L else as.integer(max(1L, ncores))
+data.table::setDTthreads(ncores)
+collapse::set_collapse(nthreads = ncores)
+fst::threads_fst(ncores)🤖 Prompt for AI Agents
In inst/plumber/v1/plumber.R around lines 9 to 13, parallel::detectCores(logical
= FALSE) can return NA which makes max(1L, ncores) produce NA and propagate
invalid thread counts; change the assignment to coerce a safe integer fallback
(e.g. set ncores <- as.integer(parallel::detectCores(logical = FALSE)); if
(is.na(ncores) || ncores < 1L) ncores <- 1L) and then use that safe ncores when
calling setDTthreads(), set_collapse(), and threads_fst().
| plumber::pr_set_error(function(req, res, err) { | ||
| method <- req$REQUEST_METHOD %||% "" | ||
| path <- req$PATH_INFO %||% "" | ||
| rid <- req$.id %||% "NA" | ||
|
|
||
| cat( | ||
| sprintf( | ||
| '{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n', | ||
| rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE) | ||
| ), | ||
| file = stderr() | ||
| ) | ||
|
|
||
| res$status <- 500 | ||
| list( | ||
| error = "Internal Server Error", | ||
| message = err$message, | ||
| path = path, | ||
| method = method, | ||
| request_id = rid | ||
| ) | ||
| }) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t leak internal error messages in PROD responses
The handler returns err$message to clients unconditionally. Gate it on PIPAPI_DEBUG and keep full details in logs.
-plumber::pr_set_error(function(req, res, err) {
+plumber::pr_set_error(function(req, res, err) {
method <- req$REQUEST_METHOD %||% ""
path <- req$PATH_INFO %||% ""
rid <- req$.id %||% "NA"
cat(
sprintf(
'{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n',
rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE)
),
file = stderr()
)
res$status <- 500
+ debug <- tolower(Sys.getenv("PIPAPI_DEBUG", "false")) %in% c("true","1","t","yes","y")
list(
error = "Internal Server Error",
- message = err$message,
+ message = if (debug) err$message else "Internal Server Error",
path = path,
method = method,
request_id = rid
)
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| plumber::pr_set_error(function(req, res, err) { | |
| method <- req$REQUEST_METHOD %||% "" | |
| path <- req$PATH_INFO %||% "" | |
| rid <- req$.id %||% "NA" | |
| cat( | |
| sprintf( | |
| '{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n', | |
| rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE) | |
| ), | |
| file = stderr() | |
| ) | |
| res$status <- 500 | |
| list( | |
| error = "Internal Server Error", | |
| message = err$message, | |
| path = path, | |
| method = method, | |
| request_id = rid | |
| ) | |
| }) |> | |
| plumber::pr_set_error(function(req, res, err) { | |
| method <- req$REQUEST_METHOD %||% "" | |
| path <- req$PATH_INFO %||% "" | |
| rid <- req$.id %||% "NA" | |
| cat( | |
| sprintf( | |
| '{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n', | |
| rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE) | |
| ), | |
| file = stderr() | |
| ) | |
| res$status <- 500 | |
| debug <- tolower(Sys.getenv("PIPAPI_DEBUG", "false")) %in% c("true", "1", "t", "yes", "y") | |
| list( | |
| error = "Internal Server Error", | |
| message = if (debug) err$message else "Internal Server Error", | |
| path = path, | |
| method = method, | |
| request_id = rid | |
| ) | |
| }) |> |
🤖 Prompt for AI Agents
In inst/plumber/v1/plumber.R around lines 56 to 77, the error handler currently
returns err$message to clients unconditionally; change it to only include the
full err$message in the HTTP response when PIPAPI_DEBUG is set (e.g.
Sys.getenv("PIPAPI_DEBUG") == "1"), otherwise return a generic message like
"Internal Server Error"; keep the existing detailed JSON error written to stderr
(for logs) and populate the response fields (error, message, path, method,
request_id) with the generic message when not in debug mode and with the real
err$message when in debug mode.
| dest_cache <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_SERVER") |> | ||
| fs::path(ver_to_use, "cache.duckdb") | ||
|
|
||
| if (fs::file_exists(ori_cache)) { | ||
| fs::file_copy(ori_cache, dest_cache, overwrite = TRUE) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create the destination directory before copying.
Line 55 constructs a new per-release folder, but Line 59 calls fs::file_copy without ensuring that directory exists. On a fresh release this will raise No such file or directory. Please create the parent directory first.
dest_cache <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_SERVER") |>
fs::path(ver_to_use, "cache.duckdb")
if (fs::file_exists(ori_cache)) {
+ fs::dir_create(fs::path_dir(dest_cache), recurse = TRUE)
fs::file_copy(ori_cache, dest_cache, overwrite = TRUE)
}🤖 Prompt for AI Agents
In inst/TMP/TMP_duckdb_cache.R around lines 55 to 60, the code builds dest_cache
but calls fs::file_copy without ensuring its parent directory exists, which will
fail on a fresh release; before copying, create the destination directory (e.g.,
call fs::dir_create on the parent directory of dest_cache) so the directory
exists, then proceed to fs::file_copy as before.
| \title{Title} | ||
| \usage{ | ||
| compute_fgt_dt_old(dt, welfare, weight, povlines) | ||
| } | ||
| \arguments{ | ||
| \item{dt}{data frame with \code{welfare} and \code{weight} columns} | ||
|
|
||
| \item{welfare}{character: welfare variable name} | ||
|
|
||
| \item{weight}{character: weight variable name} | ||
|
|
||
| \item{povlines}{double: vector with poveryt lines} | ||
| } | ||
| \value{ | ||
| data.table with estimates poverty estimates | ||
| } | ||
| \description{ | ||
| Title |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace placeholder copy and typos.
The help file still has scaffolding text (Title, Title) and a typo in “poveryt”. This will surface to users as-is. Please give the function a meaningful title/description and clean up the argument spelling before release.
🤖 Prompt for AI Agents
In man/compute_fgt_dt_old.Rd around lines 5–22, replace the placeholder "Title"
entries and fix the typo "poveryt" in the \arguments and \description sections:
provide a concise, meaningful \title (e.g., "Compute Foster–Greer–Thorbecke
poverty measures"), write a clear \description summarizing what
compute_fgt_dt_old() does and its inputs/outputs, correct "poveryt" to "poverty"
in \item{povlines}, and improve \value and \usage text to accurately describe
the returned data.table and function signature so the help page is user-facing
and free of scaffolding.
| country_code = country, | ||
| reporting_year = res$reporting_year, | ||
| poverty_line = povline, | ||
| headcount = res$headcount | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fill poverty_line from result to avoid NULL column
If povline is NULL, constructing data.table omits the column. Use res$poverty_line.
- out <- data.table::data.table(
- country_code = country,
- reporting_year = res$reporting_year,
- poverty_line = povline,
- headcount = res$headcount
- )
+ out <- data.table::data.table(
+ country_code = res$country_code,
+ reporting_year = res$reporting_year,
+ poverty_line = res$poverty_line,
+ headcount = res$headcount
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| country_code = country, | |
| reporting_year = res$reporting_year, | |
| poverty_line = povline, | |
| headcount = res$headcount | |
| ) | |
| out <- data.table::data.table( | |
| country_code = res$country_code, | |
| reporting_year = res$reporting_year, | |
| poverty_line = res$poverty_line, | |
| headcount = res$headcount | |
| ) |
🤖 Prompt for AI Agents
In R/ui_country_profile.R around lines 78 to 82, the data.table construction
uses povline which can be NULL and causes the poverty_line column to be omitted;
replace poverty_line = povline with poverty_line = res$poverty_line (or use a
coalesce/rescue to ensure a non-NULL value) so the poverty_line column is always
present in the resulting data.table.
| use_new <- lkup$use_new_lineup_version | ||
|
|
||
| # Run correct function | ||
| #------------------------------------- | ||
| out <- if (use_new) { | ||
| x <- pip_grp_new(country = "ALL", | ||
| year = ref_years, | ||
| povline = povline, | ||
| welfare_type = "all", | ||
| reporting_level = "national", | ||
| lkup = lkup, | ||
| censor = FALSE) | ||
|
|
||
| regs <- lkup$aux_files$country_list[, region_code] |> | ||
| funique() |> | ||
| c("WLD") | ||
|
|
||
| x |> | ||
| fsubset(region_code %in% regs) | ||
|
|
||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New-lineup uses ref_lkup years; should use lineup/refy years
When use_new is TRUE you still pass year = ref_years derived from lkup$ref_lkup. For lineup flow, years should come from lineup/refy data (or lkup$valid_years if that already includes lineup years). This can return wrong or empty results.
Use lineup/refy years and only exclude the 1981–1989 filter on the old path:
- ref_years <- sort(unique(lkup$ref_lkup$reporting_year))
- ref_years <- ref_years[!ref_years %in% c(1981:1989)]
+ # decide years based on lineup path
+ use_new <- lkup$use_new_lineup_version
+ if (isTRUE(use_new) && "refy_lkup" %in% names(lkup)) {
+ ref_years <- sort(unique(lkup$refy_lkup$reporting_year))
+ } else {
+ ref_years <- sort(unique(lkup$ref_lkup$reporting_year))
+ ref_years <- ref_years[!ref_years %in% c(1981:1989)]
+ }
@@
- out <- if (use_new) {
+ out <- if (isTRUE(use_new)) {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| use_new <- lkup$use_new_lineup_version | |
| # Run correct function | |
| #------------------------------------- | |
| out <- if (use_new) { | |
| x <- pip_grp_new(country = "ALL", | |
| year = ref_years, | |
| povline = povline, | |
| welfare_type = "all", | |
| reporting_level = "national", | |
| lkup = lkup, | |
| censor = FALSE) | |
| regs <- lkup$aux_files$country_list[, region_code] |> | |
| funique() |> | |
| c("WLD") | |
| x |> | |
| fsubset(region_code %in% regs) | |
| } else { | |
| # decide years based on lineup path | |
| use_new <- lkup$use_new_lineup_version | |
| if (isTRUE(use_new) && "refy_lkup" %in% names(lkup)) { | |
| ref_years <- sort(unique(lkup$refy_lkup$reporting_year)) | |
| } else { | |
| ref_years <- sort(unique(lkup$ref_lkup$reporting_year)) | |
| ref_years <- ref_years[!ref_years %in% c(1981:1989)] | |
| } | |
| # Run correct function | |
| #------------------------------------- | |
| out <- if (isTRUE(use_new)) { | |
| x <- pip_grp_new( | |
| country = "ALL", | |
| year = ref_years, | |
| povline = povline, | |
| welfare_type = "all", | |
| reporting_level = "national", | |
| lkup = lkup, | |
| censor = FALSE | |
| ) | |
| regs <- lkup$aux_files$country_list[, region_code] |> | |
| funique() |> | |
| c("WLD") | |
| x |> | |
| fsubset(region_code %in% regs) | |
| } else { | |
| # ... existing "old" path ... | |
| } |
🤖 Prompt for AI Agents
In R/ui_home_page.R around lines 18–38, when use_new is TRUE the code
incorrectly passes year = ref_years (from lkup$ref_lkup); change the year
argument for the new-lineup branch to use the lineup/refy years (e.g.,
lkup$lineup$refy or lkup$valid_years if that already contains lineup years) so
the pip_grp_new call uses lineup years, and ensure the 1981–1989
exclusion/filter is only applied on the old-path (else) branch, not in this
new-lineup branch.
| regs <- cl[, .(region_code, africa_split_code)] |> | ||
| unlist() |> # convert to vector | ||
| na_omit() |> | ||
| unique() |> | ||
| unname() |> | ||
| c("WLD") # add the world |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix undefined na_omit helper
na_omit() is not an exported function (base R provides stats::na.omit()), so this call will throw at runtime and block regional results. Please switch to the correct helper.
- na_omit() |>
+ stats::na.omit() |>🤖 Prompt for AI Agents
R/ui_poverty_indicators.R around lines 106 to 111: the pipeline calls a
non-existent na_omit() helper causing a runtime error; replace that call with
the exported stats::na.omit() (or simply na.omit) so the NA removal works as
intended and the downstream unique()/unname() calls operate on a clean vector.
| #' @export | ||
| safe_endpoint <- function(fun, endpoint, debug = NULL) { | ||
| if (is.null(debug)) { | ||
| debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE") | ||
| } | ||
|
|
||
| function(req, res) { | ||
| tryCatch( | ||
| { | ||
| fun(req, res) | ||
| }, | ||
| error = function(e) { | ||
| res$status <- 500L | ||
| out <- list( | ||
| error = paste("Error in", endpoint), | ||
| message = if (debug) conditionMessage(e) else "Internal Server Error", | ||
| request_id = tryCatch(req$.id, error = \(.) NA), | ||
| endpoint = endpoint | ||
| ) | ||
| if (debug) { | ||
| out$class <- class(e)[[1]] | ||
| out$call <- as.character(conditionCall(e)) | ||
| out$query <- req$argsQuery | ||
| out$trace <- utils::capture.output( | ||
| rlang::trace_back(bottom = 10, simplify = "branch") | ||
| ) | ||
| } | ||
| out | ||
| } | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeouts currently surface as 200; map them to 504 and propagate metadata
with_req_timeout() returns a non-error list (ok = FALSE), so safe_endpoint() treats it as success and leaves res$status as 200. Clients won’t detect timeouts. Detect this structured timeout and set 504, or raise a typed condition from with_req_timeout() and map it in safe_endpoint().
Apply either approach:
A) Minimal change (detect structured timeout in safe_endpoint):
safe_endpoint <- function(fun, endpoint, debug = NULL) {
if (is.null(debug)) {
- debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE")
+ debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE")
}
function(req, res) {
- tryCatch(
- {
- fun(req, res)
- },
- error = function(e) {
+ val <- tryCatch(
+ {
+ fun(req, res)
+ },
+ error = function(e) {
res$status <- 500L
out <- list(
error = paste("Error in", endpoint),
message = if (debug) conditionMessage(e) else "Internal Server Error",
request_id = tryCatch(req$.id, error = \(.) NA),
endpoint = endpoint
)
if (debug) {
out$class <- class(e)[[1]]
out$call <- as.character(conditionCall(e))
out$query <- req$argsQuery
- out$trace <- utils::capture.output(
- rlang::trace_back(bottom = 10, simplify = "branch")
- )
+ out$trace <- tryCatch({
+ if (inherits(e, "rlang_error")) {
+ utils::capture.output(rlang::trace_back(e, simplify = "branch"))
+ } else {
+ as.character(sys.calls())
+ }
+ }, error = \(.) NULL)
}
- out
+ out
}
)
+ # Map structured timeout objects to 504 Gateway Timeout
+ if (is.list(val)) {
+ msg <- tryCatch(tolower(as.character(val$error)), error = \(.) "")
+ if (isFALSE(val$ok) && grepl("timeout", msg)) {
+ res$status <- 504L
+ val$request_id <- tryCatch(req$.id, error = \(.) NA)
+ val$endpoint <- endpoint
+ return(val)
+ }
+ }
+ val
}
}B) Typed timeout error (preferred), keep safe_endpoint() simple:
with_req_timeout <- function(expr,
secs = as.numeric(Sys.getenv("PLUMBER_REQ_TIMEOUT", "150"))) {
if (!is.finite(secs) || secs <= 0) return(force(expr))
start <- proc.time()[["elapsed"]]
tryCatch(
{
R.utils::withTimeout(
expr = force(expr),
timeout = secs,
onTimeout = "error"
)
},
TimeoutException = \(e) {
elapsed <- proc.time()[["elapsed"]] - start
- list(
- ok = FALSE,
- error = sprintf("Request exceeded timeout of %s seconds", secs),
- elapsed = elapsed
- )
+ stop(structure(
+ list(message = sprintf("Request exceeded timeout of %s seconds", secs),
+ elapsed = elapsed),
+ class = c("pipapi_timeout", "error", "condition")
+ ))
}
)
}If using (B), add in safe_endpoint():
- res$status <- 500L
+ res$status <- if (inherits(e, "pipapi_timeout")) 504L else 500LAlso applies to: 675-698
🤖 Prompt for AI Agents
In R/utils-plumber.R around lines 627-658 (and similarly 675-698), safe_endpoint
currently treats the non-error return from with_req_timeout() as success and
leaves res$status 200, so timeouts aren’t propagated; fix by either A) detecting
the structured timeout return (e.g. a list with ok == FALSE or a specific
timeout flag) inside the error handler path and setting res$status <- 504 and
returning the timeout metadata, or B) preferably modify with_req_timeout() to
signal a typed timeout condition (classed error or abort), then update
safe_endpoint’s tryCatch to catch that timeout type and set res$status <- 504
while attaching the timeout metadata to the response; implement one approach,
ensure the response body contains request_id/endpoint and any timeout info, and
keep other error handling unchanged.
| join(x = fgt, | ||
| y = dist, | ||
| on = by_var, | ||
| how = "left", | ||
| validate = "m:1", # multiple povlines | ||
| verbose = 0L) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix undefined join() call
get_mean_median() invokes join(...) without qualifying or importing it, so the first call will throw “could not find function 'join'”. Use the fully qualified helper (e.g., collapse::join) or a data.table merge so the new lineup path doesn’t hard-crash.
- join(x = fgt,
- y = dist,
- on = by_var,
- how = "left",
- validate = "m:1", # multiple povlines
- verbose = 0L)
+ collapse::join(
+ x = fgt,
+ y = dist,
+ on = by_var,
+ how = "left",
+ validate = "m:1", # multiple povlines
+ verbose = 0L
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| join(x = fgt, | |
| y = dist, | |
| on = by_var, | |
| how = "left", | |
| validate = "m:1", # multiple povlines | |
| verbose = 0L) | |
| } | |
| collapse::join( | |
| x = fgt, | |
| y = dist, | |
| on = by_var, | |
| how = "left", | |
| validate = "m:1", # multiple povlines | |
| verbose = 0L | |
| ) | |
| } |
🤖 Prompt for AI Agents
In R/utils.R around lines 1538 to 1544, the call to join(...) is undefined and
will throw "could not find function 'join'"; replace it with a fully qualified
call to the intended helper (for example collapse::join(...)) or convert it to
an explicit data.table merge so the function does not rely on an unimported
symbol; if you choose collapse::join, also ensure collapse is declared in
DESCRIPTION (Imports) or add an import/importFrom in NAMESPACE so the package
builds cleanly.
Summary by CodeRabbit
New Features
Changes
Bug Fixes
Documentation
Refactor
Tests