Skip to content

Conversation

@randrescastaneda
Copy link
Member

@randrescastaneda randrescastaneda commented Sep 29, 2025

Summary by CodeRabbit

  • New Features

    • Added pip_agg and new/old lineup pipelines with improved aggregation and caching.
    • Popshare support now infers poverty lines per survey.
    • API gains standardized error handling, request timeouts, and request IDs.
    • UI endpoints enhanced (home page, country profile, poverty indicators).
    • New dataset empty_response_fg_add and environment helpers.
    • Decimal precision control for cached poverty lines.
  • Changes

    • pip default censor set to FALSE.
    • Aggregations now account for poverty_line.
  • Bug Fixes

    • Resolved popshare-related accuracy issue.
  • Documentation

    • Extensive new/updated help pages and release notes.
  • Refactor

    • Endpoints adopt (req, res) signatures; routing between new/old lineups.
  • Tests

    • Added popshare regression test; updated suite to new signatures.

@coderabbitai
Copy link

coderabbitai bot commented Sep 29, 2025

Caution

Review failed

Failed to post review comments

Walkthrough

Major version update introduces lineup-based dispatch, new and legacy FGT pipelines, aggregation APIs, environment helpers, timeout-safe API endpoints, and expanded documentation. Numerous signatures change, new exports are added, caching and DuckDB update logic are revised, and UI functions/endpoints are adapted to the new flows and feature flags.

Changes

Cohort / File(s) Summary of changes
Versioning & Namespace
DESCRIPTION, NAMESPACE, NEWS.md
Bump to 1.4.1. Add imports (parallel, R.utils, data.table). Export many new APIs. Append extensive release notes for 1.4.x and 1.3.x.
Env & Startup
R/aaa.R, R/pipapi-env.R, R/zzz.R
Add private .pipapienv and exported accessors (get_*, set_*). Extend memoised bindings and store config in env.
Lineup Dispatch & Selection
R/pip.R, R/pip_new_lineups.R, R/pip_old_lineups.R, R/create_lkups.R, R/create_countries_vctr.R
Introduce wrapper dispatching to new/old lineup flows via use_new_lineup_version. Add new lineup pipeline, legacy variant, and helper. Update signatures and lookups. Change pip() default censor=FALSE.
FGT Engine (new)
R/compute_fgt_new.R, R/fgt_cumsum.R, R/infer_poverty_line.R, R/rg_pip.R
Add vector and data.table FGT computation, cumulative-sum pipeline, quantile-based poverty line inference, and survey-year flow integrating per-survey povlines. Remove legacy helpers from rg_pip.
FGT Engine (old)
R/compute_fgt_old.R, R/rg_pip_old.R, R/fg_pip_old.R
Add legacy FGT computation and survey/imputed-year pipelines with duplicate-handling helpers.
Aggregation APIs
R/pip_agg.R, R/pip_grp_new.R, R/pip_grp_logic.R, R/pip_grp.R, R/add_agg_stats.R
Introduce pip_agg and new group flow, refactor legacy group logic, adjust signatures, and include poverty_line in aggregation keys.
Data utilities & stats merge
R/utils.R, R/utils-pipdata.R
Add add_dist_stats, get_mean_median, popshare-aware lookup, and multiple attribute-to-column utilities with helpers.
Plumber safety & timeout
R/utils-plumber.R, inst/plumber/v1/endpoints.R, inst/plumber/v1/plumber.R
Add safe_endpoint and with_req_timeout. Wrap endpoints with standardized error/timeout handling, request context/timing, and global error hooks.
DuckDB & Caching
R/duckdb_func.R
Extend update_master_file with decimal, schema-aware mean/median handling, improved cache merge, robust connection errors.
UI functions
R/ui_country_profile.R, R/ui_home_page.R, R/ui_poverty_indicators.R
Update signatures (add lkup_hash, censor, defaults), switch to new aggregation paths, add prosperity gap, refactor selections/filters.
Datasets & data build
R/pipapi-package.R, data-raw/data.R
Generate empty_response_fg via pipeline; add and export empty_response_fg_add.
Temporary tooling
inst/TMP/TMP_API_launcher.R, inst/TMP/TMP_duckdb_cache.R
Simplify data dir/bootstrap, add fastverse, new cache reset/build demo scripts.
Documentation
man/*
Add/update Rd docs for new/changed functions, signatures, datasets, and internals; reflect API additions and parameter changes.
Tests
tests/testthat/*
Update for new signatures and behaviors; add popshare regression test and adjust expectations.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant pip as pip()
  participant Sel as use_new_lineup_version
  participant New as pip_new_lineups()
  participant Old as pip_old_lineups()
  Client->>pip: country, year, povline/popshare, lkup, ...
  pip->>Sel: lkup$version
  alt New lineup
    pip->>New: delegated call
    New->>New: validate inputs, cache, fill_gaps branch
    New-->>pip: data.table result
  else Old lineup
    pip->>Old: delegated call
    Old->>Old: validate inputs, cache, fill_gaps branch
    Old-->>pip: data.table result
  end
  pip-->>Client: result
Loading
sequenceDiagram
  autonumber
  actor User
  participant API as Plumber Endpoint (safe_endpoint)
  participant Core as Core handler
  participant TO as with_req_timeout
  Note over API: Attach request_id, timing
  User->>API: HTTP request
  API->>TO: with timeout expr
  TO->>Core: execute handler
  alt completes before timeout
    Core-->>TO: result
    TO-->>API: ok: TRUE, result
    API-->>User: 200 + payload
  else timeout
    TO-->>API: ok: FALSE, error: timeout
    API-->>User: 503 + JSON error {request_id, endpoint}
  end
  Note over API: On error: 500 + structured JSON
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Poem

I thump my paw: new lines, old burrows align,
Pip-ways fork where flags now shine.
FGTs hop fast, cumsums glide,
Timeouts guard the API tide.
Caches purr, ducks in a row,
Bunny approves—ship we go! 🐇✨

Pre-merge checks and finishing touches and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title Check ❓ Inconclusive The title “release 20250930” conveys only a release date and does not mention the new version number or the primary changes introduced (such as bumping from v1.3.19 to v1.4.1 and adding major features and dependencies), making it too generic for clear history tracking. It fails to highlight the main focus of the pull request and may not inform reviewers or future readers about the key update. Consider updating the title to specify the version and main change, for example “Release pipapi v1.4.1” or “Bump pipapi to v1.4.1 with new FGT modules and exports,” to clearly reflect the key update.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch QA
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch QA

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 55

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
man/rg_pip_old.Rd (1)

7-56: Fix the duplicated usage block and incorrect return/value docs.

The Rd now renders two identical rg_pip_old(...) usage signatures plus duplicated \value{} and \description{} sections, which is confusing. More importantly, the \value{} stanza still says data.frame, but the implementation in R/rg_pip_old.R returns a list with main_data and data_in_cache. The documentation needs to be regenerated so it advertises a single usage block and the correct return structure (list with its components). Please update the roxygen comments accordingly and re-run roxygen2.

man/update_master_file.Rd (1)

11-22: Document the new decimal argument (and verbose).

Line [12] adds decimal = 2 to the public signature, but the \arguments section still stops at fill_gaps, so readers get no guidance on either verbose or decimal. Please add \item entries for both parameters so the documentation reflects the exported API. You can apply a change like:

 \arguments{
 \item{dat}{Dataframe to be appended}
 
 \item{cache_file_path}{path where cache file is saved}
 
 \item{fill_gaps}{logical: If set to TRUE, will interpolate / extrapolate
 values for missing years}
+\item{verbose}{logical: print progress information when TRUE.}
+\item{decimal}{integer: number of decimal places to round numeric fields.}
 }
man/fg_pip.Rd (1)

15-36: Document the new pipenv argument.

The signature now exposes pipenv = NULL, but there’s no matching \item{pipenv}{...} entry. Readers of the help page won’t know what to pass. Please add a clear argument description (purpose, expected type/default behavior).

R/add_agg_stats.R (1)

146-160: Bug: column-wide NA coercion in negative/zero handlers.

negative_to_na and zeros_to_na return a single NA_real_ if any offending value exists, wiping entire columns during lapply. These should be element-wise replacements.

Apply this fix:

-negative_to_na <- function(x) {
-  if (any(x < 0, na.rm = TRUE) || anyNA(x)) {
-    NA_real_
-  } else {
-    x
-  }
-}
+negative_to_na <- function(x) {
+  ix <- x < 0
+  if (any(ix, na.rm = TRUE)) x[ix] <- NA_real_
+  x
+}
 
-zeros_to_na <- function(x) {
-  if (any(x == 0, na.rm = TRUE)) {
-    NA_real_
-  } else {
-    x
-  }
-}
+zeros_to_na <- function(x) {
+  ix <- x == 0
+  if (any(ix, na.rm = TRUE)) x[ix] <- NA_real_
+  x
+}
tests/testthat/test-utils.R (1)

11-14: Replace second lkups assignment with lkup
In both blocks of tests/testthat/test-utils.R (around lines 11–14 and 16–22), change:

-lkups <- lkups$versions_paths[[lkups$latest_release]]
+lkup  <- lkups$versions_paths[[lkups$latest_release]]

This defines lkup and preserves lkups for subsequent tests.

man/add_dist_stats.Rd (1)

7-13: Update argument docs to match new signature.

The usage now shows add_dist_stats(df, lkup, fill_gaps), but the \arguments{} block still documents dist_stats and omits lkup/fill_gaps. Please align the argument documentation with the current parameters so users get accurate guidance.

man/get_aux_table.Rd (1)

7-15: Update data_dir argument description to match new default.

Now that data_dir defaults to NULL, the argument section should clarify that callers may omit it and that the function will derive the directory (e.g., from lkup/environment helpers) when absent. Without that note, the docs imply the parameter is required and hide the new behavior introduced in this release. Please revise the argument text (and description/details if needed) to reflect how the function resolves data_dir = NULL.

tests/testthat/test-fg_pip-local.R (1)

25-34: Close or remove the unused DuckDB connection.

All the fg_pip() calls now rely solely on lkup, but we still open a DuckDB connection that is never used or closed. This leaves a stray handle to the cache database for the duration of the test session. Please drop the manual duckdb::dbConnect() (or ensure you disconnect it right away) so the test teardown stays clean.

Apply this diff to remove the unused connection:

-con <- duckdb::dbConnect(duckdb::duckdb(), dbdir = fs::path(lkup$data_root, "cache", ext = "duckdb"))
R/ui_country_profile.R (1)

40-42: year parameter is ignored; honor MRV/ALL/specific years

The function always picks MRV. This breaks ui_cp_download(year = "ALL") and explicit year requests.

 ui_cp_ki_headcount <- function(country,
-                               year = "MRV",
+                               year = "MRV",
                                povline,
                                lkup) {
@@
-  res <-
-    res_all[country_code == country
-            ][,
-              .SD[which.max(reporting_year)]]
+  res_country <- res_all[country_code == country]
+  if (is.character(year) && toupper(year) == "MRV") {
+    res <- res_country[, .SD[which.max(reporting_year)]]
+  } else if (is.character(year) && toupper(year) == "ALL") {
+    res <- res_country
+  } else {
+    yrs <- suppressWarnings(as.integer(year))
+    res <- res_country[reporting_year %chin% yrs]
+    if (nrow(res) == 0L) res <- res_country[, .SD[which.max(reporting_year)]]
+  }
@@
-  res <-
-    if (nrow(res) > 1) {
-      if (all(c("consumption", "income") %in% res$welfare_type)) {
-        # Get consumpton when both are available
-        res[welfare_type == "consumption"]
-      } else {
-        # get the first observations. this must be revised.
-        res[1]
-      }
-  } else {
-    res
-  }
+  # within-year tie-breaker: prefer consumption; ensure one row per year
+  res <- res[, {
+    if (any(welfare_type == "consumption")) .SD[welfare_type == "consumption"][1]
+    else .SD[1]
+  }, by = .(country_code, reporting_year)]
R/pip_grp_logic.R (1)

72-115: Dispatch ignores lkup$use_new_lineup_version now
Switching to fg_pip_old() forces the legacy pipeline even when lkup$use_new_lineup_version is TRUE. That means any request hitting a “new lineup” lookup will now grab stale old-lineup FG data (or fail if the old cache doesn’t cover it), breaking aggregated outputs. Please reinstate the version-aware dispatch—e.g. call fg_pip() and let it choose, or replicate the flag check here and route to fg_pip_new() vs fg_pip_old().

R/create_lkups.R (1)

703-717: Hash uses unstripped query_controls (version field not removed)

You created query_controls_hash without the version field, but hash_lkup uses query_controls. This defeats the intent and can cause unnecessary cache churn.

-  hash_lkup <- list(hash_svy_lkup,
+  hash_lkup <- list(hash_svy_lkup,
                     hash_ref_lkup,
                     dist_stats,
                     pop_region,
                     cp_lkups,
                     pl_lkup,
                     censored,
                     aux_files,
                     return_cols,
-                    query_controls,
+                    query_controls_hash,
                     aux_tables,
                     valid_years
                     )
inst/plumber/v1/endpoints.R (1)

814-815: Docs typo in reporting_level options.

Missing quote around "national".

-#* @param reporting_level:[chr] Reporting level. Options are "all", national", "urban", "rural".
+#* @param reporting_level:[chr] Reporting level. Options are "all", "national", "urban", "rural".
🧹 Nitpick comments (59)
inst/TMP/TMP_API_launcher.R (1)

7-9: Guard against missing local data root

Sys.getenv() returns an empty string when PIPAPI_DATA_ROOT_FOLDER_LOCAL is absent. Passing that straight into fs::path()/fs::dir_ls() silently falls back to the working directory, so the subsequent create_versioned_lkups() call fails later with a much less obvious error. Please short-circuit here with an explicit check so failures are immediate and easier to diagnose.

-  data_dir <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_LOCAL") |>
-    fs::path()
-  fs::dir_ls(data_dir, recurse = FALSE)
+  data_root <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_LOCAL")
+  if (identical(data_root, "")) {
+    stop("PIPAPI_DATA_ROOT_FOLDER_LOCAL is not set; cannot locate local data root.", call. = FALSE)
+  }
+  data_dir <- fs::path(data_root)
+  fs::dir_ls(data_dir, recurse = FALSE)
R/utils-plumber.R (2)

646-653: Trace capture may be empty; add robust fallback

rlang::trace_back() without an rlang-traced condition often yields little. Capture a usable trace for base errors.

-          out$trace <- utils::capture.output(
-            rlang::trace_back(bottom = 10, simplify = "branch")
-          )
+          out$trace <- tryCatch({
+            if (inherits(e, "rlang_error")) {
+              utils::capture.output(rlang::trace_back(e, simplify = "branch"))
+            } else {
+              as.character(sys.calls())
+            }
+          }, error = \(.) NULL)

629-631: Make PIPAPI_DEBUG parsing more permissive

Support common truthy values (“true”, “1”, “yes”, “t”) to avoid surprises across environments.

-  if (is.null(debug)) {
-    debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE")
-  }
+  if (is.null(debug)) {
+    debug <- tolower(Sys.getenv("PIPAPI_DEBUG", "false")) %in% c("true","1","t","yes")
+  }
R/create_countries_vctr.R (2)

15-18: Defensive checks for lkup structure

Fail fast if lkup is missing required members; helps avoid cryptic errors downstream.

 create_countries_vctr <- function(country,
                                   year,
                                   lkup) {
-  valid_years <-   lkup$valid_years
-  aux_files   <-   lkup$aux_files
+  stopifnot(is.list(lkup), !is.null(lkup$valid_years), !is.null(lkup$aux_files))
+  valid_years <- lkup$valid_years
+  aux_files   <- lkup$aux_files

261-268: Avoid eval(parse()) for column selection; build predicate programmatically

Parsing strings is harder to maintain and easy to break. Use Reduce/lapply over columns.

-    out      <- user_alt_gt_code |>
-      # Create filter for data.table
-      paste0(" %in% user_alt_agg", collapse =  " | ") |>
-      # parse the filter as unevaluated expression
-      {\(.) parse(text = .) }() |>
-      # filter and get country codes
-      {\(.) cl[eval(.), country_code] }()
+    idx <- Reduce(`|`, lapply(user_alt_gt_code, \(col) cl[[col]] %in% user_alt_agg))
+    out <- cl[idx, country_code]
man/assign_stat.Rd (1)

18-22: Clarify behavior for missing/extra names and recycling

Document how stat names are matched to lev (must cover all unique levels? errors vs. NA fill), and whether unnamed scalars are recycled.

Suggested additions:

  • If stat is a named vector/list, names must include all unique lev; missing names yield NA (or error).
  • If stat is scalar, it is broadcast to all rows.
  • Length of counts must match lev.

Also applies to: 24-33

R/compute_fgt_old.R (2)

11-51: Handle NAs and avoid recomputing denominators in FGT/WATTS loop

  • fmean/fsum should set na.rm = TRUE to avoid NA propagation.
  • Precompute W <- fsum(wt, na.rm = TRUE) once.
  • Minor: compute lpov <- log(pov) once per iteration.
 compute_fgt_dt_old <- function(dt, welfare, weight, povlines) {
   w <- dt[[welfare]]
   wt <- dt[[weight]]
   n <- length(w)
   m <- length(povlines)
 
   # Pre-allocate result matrix
   res <- matrix(NA_real_, nrow = m, ncol = 3)
   colnames(res) <- c("FGT0", "FGT1", "FGT2")
   watts_vec <- numeric(m)
 
   # Precompute log(w) for efficiency
   logw <- rep(NA_real_, n)
   pos <- w > 0
   logw[pos] <- log(w[pos])
+  W <- fsum(wt, na.rm = TRUE)
 
   for (i in seq_along(povlines)) {
-    pov <- povlines[i]
+    pov <- povlines[i]; lpov <- log(pov)
     poor <- w < pov
     rel_dist <- 1 - (w / pov)
     rel_dist[!poor] <- 0
-    res[i, 1] <- fmean(poor, w = wt) # FGT0
-    res[i, 2] <- fmean(rel_dist, w = wt) # FGT1
-    res[i, 3] <- fmean(rel_dist^2, w = wt) # FGT2
+    res[i, 1] <- fmean(poor, w = wt, na.rm = TRUE)       # FGT0
+    res[i, 2] <- fmean(rel_dist, w = wt, na.rm = TRUE)   # FGT1
+    res[i, 3] <- fmean(rel_dist^2, w = wt, na.rm = TRUE) # FGT2
 
     # Optimized Watts index calculation
     keep <- poor & pos
     if (any(keep, na.rm = TRUE)) {
-      watts_vec[i] <- (fsum((log(pov) - logw[keep]) * wt[keep])) / fsum(wt)
+      watts_vec[i] <- fsum((lpov - logw[keep]) * wt[keep], na.rm = TRUE) / W
     } else {
       watts_vec[i] <- 0
     }
   }

87-99: Guard setnames(dt, "area", "reporting_level") when area is absent

Some inputs may already use reporting_level. Avoid errors by checking column existence.

-      setnames(dt, "area", "reporting_level")
+      if ("area" %in% names(dt)) setnames(dt, "area", "reporting_level")
man/fg_remove_duplicates.Rd (1)

12-14: Document new argument in the Rd arguments list (via roxygen).

use_new_lineup_version = FALSE was added to the usage, but there’s no corresponding \item{use_new_lineup_version}{...} in \arguments{}. Update the roxygen in R/fg_pip.R (add @param use_new_lineup_version logical: …, optionally linking to \code{\link{use_new_lineup_version}}), then regenerate docs to keep the public interface consistent.

DESCRIPTION (1)

76-77: Pin remote dependency for reproducible PROD builds.

Remotes: PIP-Technical-Team/wbpip@DEV is volatile for a release. Consider pinning to a tag or commit SHA (e.g., @vX.Y.Z or @<sha>) to avoid unreviewed upstream changes entering PROD.

R/add_agg_stats.R (1)

16-23: Prefer data.table split by columns over interaction().

For readability and speed:

-aggregated_list <- split(aggregated,
-  interaction(
-    aggregated$country_code,
-    aggregated$reporting_year,
-    aggregated$poverty_line
-  ),
-  drop = TRUE
-)
+aggregated_list <- split(
+  aggregated,
+  by = c("country_code", "reporting_year", "poverty_line"),
+  drop = TRUE
+)
tests/testthat/test-pip-local.R (1)

845-852: Stabilize the new popshare regression test.

Add a row-count assertion to guard against accidental multi-row outputs and make the intent explicit.

 test_that("make sure popshare bug no which was reported no longer exists", {
-  out <- pip(country = "USA", year = 2022,
+  out <- pip(country = "USA", year = 2022,
              popshare = .5, lkup = lkup)
+  expect_equal(nrow(out), 1)
   # Ensure poverty line is not the default one
   expect_false(out$poverty_line %in% c(1.9, 3))
   # Ensure headcount is closer to 0.5
   expect_equal(out$headcount, 0.5, tolerance = .05)
 })
R/pipapi-env.R (1)

24-41: Consider validating keys before touching .pipapienv.

Right now any non-string input bubbles straight into rlang::env_get/env_poke, which will error on, e.g., vectors. A quick rlang::is_string(key) || abort check would make the helpers safer for downstream callers.

man/add_attributes_as_columns_multi.Rd (3)

11-18: Tighten wording for reporting_level_rows$rows.

Current: “rows (integer cumulative row ends)”. Suggest: “rows: integer vector of cumulative end-row indices for each segment.” Update in the roxygen source (R/utils-pipdata.R) so this file regenerates.


5-8: Add cross-reference to the vectorized variant.

Add a See Also entry in the roxygen source: @seealso [add_attributes_as_columns_vectorized] to help users choose between APIs.


31-34: Provide a minimal, non-executed example.

Consider a small \dontrun{} example that shows setting required attributes on a tiny data.table and calling the function. This improves discoverability. Update the roxygen source, not this Rd file.

R/pip.R (3)

45-57: Avoid referencing lkup in a default argument; compute lkup_hash inside.

Defaulting lkup_hash = lkup$cache_data_id$hash_pip in formals is brittle. Prefer a NULL default and derive inside after validating lkup.

-pip <- function(country         = "ALL",
+pip <- function(country         = "ALL",
                 ...
-                lkup,
-                censor          = FALSE,
-                lkup_hash       = lkup$cache_data_id$hash_pip,
+                lkup,
+                censor          = FALSE,
+                lkup_hash       = NULL,
                 additional_ind  = FALSE) {
+  # Normalize lkup_hash once lkup is available
+  if (is.null(lkup_hash)) {
+    stopifnot(!missing(lkup))
+    lkup_hash <- lkup$cache_data_id$hash_pip
+  }

61-61: Make the lineup switch robust.

Coerce to logical safely and handle missing flag.

-  use_new <- lkup$use_new_lineup_version
+  use_new <- isTRUE(lkup$use_new_lineup_version)

45-53: Normalize enum-like arguments locally with match.arg.

Ensure consistent behavior regardless of callee expectations.

 pip <- function(country         = "ALL",
                 year            = "ALL",
                 povline         = 1.9,
                 popshare        = NULL,
                 fill_gaps       = FALSE,
-                group_by        = c("none", "wb"),
-                welfare_type    = c("all", "consumption", "income"),
-                reporting_level = c("all", "national", "rural", "urban"),
+                group_by        = c("none", "wb"),
+                welfare_type    = c("all", "consumption", "income"),
+                reporting_level = c("all", "national", "rural", "urban"),
                 ppp             = NULL,
                 lkup,
                 censor          = FALSE,
                 lkup_hash       = NULL,
                 additional_ind  = FALSE) {
+  group_by        <- match.arg(group_by)
+  welfare_type    <- match.arg(welfare_type)
+  reporting_level <- match.arg(reporting_level)
R/rg_pip_old.R (3)

61-62: Qualify rbindlist to avoid namespace ambiguity.

Be explicit to ensure the data.table implementation is used.

-  res <- rbindlist(res, fill = TRUE)
+  res <- data.table::rbindlist(res, fill = TRUE)

67-73: Qualify join or switch to a well-known merge.

join(...) is not a base or data.table function. Either qualify it (e.g., pipapi::join) or use data.table::merge with all=TRUE and on=..

-  out <- join(res,
+  out <- pipapi::join(res,
               metadata,
               on = c("file", "reporting_level"),
               how = "full",
               validate = "m:1",
               verbose = 0)

Alternative with data.table:

# set keys/columns beforehand
data.table::setkey(res, file, reporting_level)
data.table::setkey(metadata, file, reporting_level)
out <- metadata[res, on = .(file, reporting_level), nomatch = 0L, allow.cartesian = TRUE]

74-78: Guard renames if columns are absent.

If survey_mean_ppp or survey_median_ppp are missing, this will error. Add presence checks.

-  out[, `:=`(
-    mean = survey_mean_ppp,
-    median = survey_median_ppp,
-    file = NULL
-  )]
+  if ("survey_mean_ppp" %in% names(out))  out[,  mean := survey_mean_ppp]
+  if ("survey_median_ppp" %in% names(out)) out[, median := survey_median_ppp]
+  out[, file := NULL]
man/safe_endpoint.Rd (1)

54-65: Example: show idiomatic wrapper usage

The example currently invokes the returned function inline. Consider documenting the typical pattern (returning the wrapped handler) for clarity.

-#* @get /api/v1/pip
-function(req, res) {
-  safe_endpoint(function(req, res) {
-    params <- req$argsQuery
-    params$lkup <- lkups$versions_paths[[req$argsQuery$version]]
-    params$version <- NULL
-    do.call(pipapi::ui_pip, params)
-  }, endpoint = "/api/v1/pip")(req, res)
-}
+#* @get /api/v1/pip
+function(req, res) safe_endpoint(
+  fun = function(req, res) {
+    params <- req$argsQuery
+    params$lkup <- lkups$versions_paths[[req$argsQuery$version]]
+    params$version <- NULL
+    do.call(pipapi::ui_pip, params)
+  },
+  endpoint = "/api/v1/pip"
+)(req, res)
inst/plumber/v1/plumber.R (2)

3-7: Respect pre-set thread env vars

Avoid overriding admin‑provided caps. Set only if unset.

-Sys.setenv(
-  OPENBLAS_NUM_THREADS = "1",
-  MKL_NUM_THREADS      = "1",
-  OMP_NUM_THREADS      = "1"
-)
+env <- Sys.getenv(c("OPENBLAS_NUM_THREADS","MKL_NUM_THREADS","OMP_NUM_THREADS"), NA)
+set_if_empty <- function(k, v) if (is.na(env[[k]]) || nzchar(env[[k]]) == FALSE) Sys.setenv(structure(v, names = k))
+set_if_empty("OPENBLAS_NUM_THREADS", "1")
+set_if_empty("MKL_NUM_THREADS",      "1")
+set_if_empty("OMP_NUM_THREADS",      "1")

79-85: Guard API spec path existence

system.file can return "". Add a check to avoid failing at startup.

-  plumber::pr_set_api_spec(yaml::read_yaml(api_spec_path))
+  plumber::pr_set_api_spec({
+    if (!nzchar(api_spec_path) || !file.exists(api_spec_path)) {
+      warning("API spec not found at: ", api_spec_path)
+      NULL
+    } else {
+      yaml::read_yaml(api_spec_path)
+    }
+  })
R/ui_country_profile.R (3)

121-127: Make PG optional and resilient

If pg_svy is absent or empty, avoid errors and still return charts.

-  pg <- get_aux_table(data_dir = lkup$data_root,
-                      "pg_svy")
-  pg <- pg[country_code == country]
-  dl2[["pg"]] <- pg
+  pg <- tryCatch(get_aux_table(data_dir = lkup$data_root, "pg_svy"),
+                 error = function(e) data.table::as.data.table(NULL))
+  if (NROW(pg)) {
+    dl2[["pg"]] <- pg[country_code == country]
+  }

129-131: List shaping: avoid extra nesting

If consumers don’t expect an extra list level, drop the outer list().

-  dl <- list(append(dl, dl2)) |>
-    setNames(country)
+  dl <- append(dl, dl2) |>
+    (\(.) list(.[[1]] = .))() |>
+    setNames(country)

322-322: Qualify rbindlist or ensure import

Unqualified rbindlist requires NAMESPACE import. Safer to use data.table::rbindlist here.

-    rbindlist(use.names = TRUE)
+    data.table::rbindlist(use.names = TRUE)
man/add_attributes_as_columns_vectorized.Rd (1)

13-16: Document invisible return and by‑reference nature

Small doc polish: make it explicit the function returns dt invisibly, modified by reference.

-\value{
-The same \code{data.table}, modified by reference, with added columns:
+\value{
+The same \code{data.table}, \emph{modified by reference} (returned \code{invisibly}),
+with added columns:
R/rg_pip.R (1)

20-23: Remove premature povline unlist (dead/bug‑prone)

This is overwritten by metadata$povline later; keeping it risks type mismatches.

-  # povline is set to NULL if popshare is given
-  if (!is.null(popshare)) povline <- NULL
-  if (is.list(povline))   povline <- unlist(povline)
+  # povline is set to NULL if popshare is given
+  if (!is.null(popshare)) povline <- NULL
R/ui_home_page.R (2)

11-13: Roxygen/doc mismatch for lkup_hash default

The param says “hash of pip” but default uses hash_pip_grp. Align the docstring (or the default) to avoid confusion.

Apply this small doc tweak:

-#' @param lkup_hash character: hash of pip
+#' @param lkup_hash character: cache hash (pip_grp)

85-89: Guard against NA and control precision for UI values

If either reporting_pop or headcount is NA you’ll propagate NA. Consider a NA‑safe computation and optional rounding for display.

-  out[,
-      `:=`(
-        pop_in_poverty = reporting_pop * headcount / pop_units,
-        reporting_pop  = reporting_pop / pop_units
-        )]
+  out[, `:=`(
+    reporting_pop  = reporting_pop / pop_units,
+    pop_in_poverty = fifelse(!is.na(headcount) & !is.na(reporting_pop),
+                             reporting_pop * headcount, NA_real_)
+  )]
+  # Optionally round for UI:
+  # out[, c("reporting_pop","pop_in_poverty") := lapply(.SD, round, 2),
+  #     .SDcols = c("reporting_pop","pop_in_poverty")]
R/create_lkups.R (3)

78-80: Name shadowing makes the code harder to reason about

use_new_lineup_version <- use_new_lineup_version(versions) shadows the function with a logical. Prefer a different variable name.

-  use_new_lineup_version <- use_new_lineup_version(versions)
+  use_new <- use_new_lineup_version(versions)

Also update subsequent if (use_new_lineup_version) checks to if (use_new).


777-793: Expose lineup metadata in lkup for downstream consumers

ui_hp_stacked() needs lineup years when use_new is true. Consider attaching them for easy access.

   lkup <- list(
@@
-    use_new_lineup_version = use_new_lineup_version)
+    use_new_lineup_version = use_new
+  )
+  if (isTRUE(use_new)) {
+    lkup$lineup_years <- lineup_years$lineup_years
+  }

95-95: Inline “Why is this necessary?” comment

The setnames(country_list, 'region', 'region_name') question should be clarified or removed before release.

Add a brief rationale (schema normalization) or drop the comment.

R/fgt_cumsum.R (3)

43-51: Avoid list2env+rm; use explicit locals

Using list2env(LDTg, envir = environment()) makes scope brittle. Prefer explicit binding for clarity and safety.

-get_total_pop <- \(LDTg) {
-  list2env(LDTg, envir = environment())
-  rm(LDTg)
-  add_vars(g[["groups"]],
-           get_vars(DT, c("weight")) |>
-           fsum(g)) |>
-    setnames(old = "weight",
-             new =  "W")
-}
+get_total_pop <- \(LDTg) {
+  DT <- LDTg$DT
+  g  <- LDTg$g
+  add_vars(g[["groups"]],
+           get_vars(DT, "weight") |> fsum(g)) |>
+    setnames("weight", "W")
+}

151-163: Deterministic coding: OK, add explicit integer type for code

Looks good. For robustness after rbinds, consider setting storage.mode(code) <- "integer" once at the end (after possible updates).


213-224: Error message: count of unseen pairs uses fsum(nas)

fsum over logicals is fine, but to avoid dependency surprises use sum(nas) here; it’s clearer and avoids type confusion in messages.

-          "encode_pairs(): {fsum(nas)} unseen (id, reporting_level) pair(s).",
+          "encode_pairs(): {sum(nas)} unseen (id, reporting_level) pair(s).",
R/utils-pipdata.R (6)

10-28: Type/shape handling in transform_input

  • If years is a vector, it’s replicated per country as intended; OK.
  • Minor: “formated” -> “formatted” in return doc. Also clarify expected input_list structure in comments (current @inheritParams may mislead).

98-101: Prefer setalloccol over comment uncertainty

setalloccol(dt) is the by-ref allocator; the comment shows uncertainty. Keep it and drop the comment, or use alloc.col(dt) consistently across helpers.

-  setalloccol(dt)    # pre-allocate room for new columns... #AC, I am still not sure about this.
+  setalloccol(dt)    # pre-allocate room for new columns

111-119: Replace loop with vectorized replication

You already use a vectorized approach in add_attributes_as_columns_multi. Apply the same here for speed and simplicity.

-  reporting_level_vec <- character(n)
-  start <- 1L
-  for (i in seq_along(lev)) {
-    end <- rows[i]
-    reporting_level_vec[start:end] <- lev[i]
-    start <- end + 1L
-  }
-  dt[, reporting_level := reporting_level_vec]
+  dt[, reporting_level := rep.int(lev, diff(c(0L, rows)))]

133-149: Distribution stats loop can be simplified and protected

  • If ds contains NULLs or missing names, assignments per level may warn.
  • Consider using assign_stat (you already export it) or map once by names to avoid per-level subsetting.
-  if (length(ds)) {
-    dstats <- c("mean", "median")
-    for (l in lev) {
-      wrl <- whichv(dt$reporting_level, l)
-      ld <- lapply(dstats, \(d) {
-        if (is.null(ds[[d]][[l]])) {
-          NA
-        } else {
-          ds[[d]][[l]]
-        }
-      })
-      dt[wrl, (dstats) := ld]
-    }
-  }
+  if (length(ds)) {
+    assign_stat(dt, lev, diff(c(0L, rows)), ds$mean,   "mean")
+    assign_stat(dt, lev, diff(c(0L, rows)), ds$median, "median")
+  }

181-214: Consistency with allocator

Here you use alloc.col(dt) while the vectorized variant uses setalloccol(dt). Choose one; setalloccol is idiomatic for by-ref growth.


317-322: Handle NULL dist_stats entries safely

If any element yields NULL, rbindlist can error. Filter out NULLs first.

-  lapply(lt_att, \(.) {
-    .$dist_stats
-  }) |>
-    rbindlist(fill = TRUE)
+  stats_list <- lapply(lt_att, \(.) .$dist_stats)
+  stats_list <- Filter(Negate(is.null), stats_list)
+  rbindlist(stats_list, fill = TRUE)
R/fg_pip.R (2)

224-241: Stray debug print and comment blocks

print("here") and commented blocks should be removed or guarded by a verbose flag.

-    print("here")
+    # message("fg_remove_duplicates: legacy lineup path")

255-266: fg_standardize_cache_id: wrong variable sanitized

You gsub(..., interpolation_id) inside ifelse, but are returning a standardized cache_id. Likely intent is to sanitize cache_id. Also, the function is currently unused; either fix or remove to avoid confusion.

-  out <- ifelse(grepl("|", interpolation_id, fixed = TRUE),
-                gsub(paste0("_",
-                            unique(reporting_level),
-                            collapse = '|'),
-                     '',
-                     interpolation_id),
-                cache_id)
+  out <- ifelse(
+    grepl("|", interpolation_id, fixed = TRUE),
+    gsub(paste0("_", unique(reporting_level), collapse = "|"), "", cache_id),
+    cache_id
+  )
R/pip_old_lineups.R (3)

145-149: Prefer robust binding over rowbind() and confirm origin

rowbind may not be available/consistent; rbindlist is safer and faster.

-    out <- main_data |>
-      rowbind(cached_data)
+    out <- data.table::rbindlist(list(main_data, cached_data), use.names = TRUE, fill = TRUE)

If rowbind() is intended (internal helper), confirm it’s in namespace and handles column mismatches.


67-68: Unused parameter lkup_hash

lkup_hash is never referenced in this function. Remove it or wire it into cache/update paths.


109-113: DuckDB file creation race

file.exists() + create could race under concurrency. Consider tryCatch around creation or lazy connect that creates the DB if missing.

R/fg_pip_old.R (2)

96-103: Loop index: prefer nrow(ctry_years) over length of a column

Safer when interpolation_id has NAs or differing length.

-    for (ctry_year_id in seq_along(ctry_years$interpolation_id)) {
+    for (ctry_year_id in seq_len(nrow(ctry_years))) {

241-249: Regex replacement clarity in fg_standardize_cache_id_old

Pattern builds alternation from unique(reporting_level). Add explicit anchors or word boundaries to avoid accidental matches (e.g., substrings).

R/compute_fgt_new.R (2)

255-256: Guard setnames for area → reporting_level

setnames() errors if 'area' is absent. Guard to avoid hard failures across heterogeneous inputs.

-      setnames(dt, "area", "reporting_level")
+      if ("area" %in% names(dt)) {
+        setnames(dt, "area", "reporting_level")
+      }

24-31: Log handling: precompute only where positive

You already mask non-positive with pos in compute_fgt_dt; replicate or document behavior in compute_fgt for consistency (currently logs all w with warnings suppressed).

inst/plumber/v1/endpoints.R (4)

18-30: Harden request-id generation (avoid FP artifacts/precision loss).

Formatting doubles to string can yield decimals and precision loss for epoch-ms. Generate an integer-like string reliably.

-.req_id <- function() {
-  ts_ms_num <- as.numeric(Sys.time()) * 1000
-  ts_ms_chr <- format(ts_ms_num, scientific = FALSE, trim = TRUE)
-  rnd       <- sample.int(1e9, 1)
-  id_raw    <- paste0(ts_ms_chr, "-", rnd)
-
-  list(
-    id_raw    = id_raw,
-    timestamp = as.POSIXct(as.numeric(ts_ms_chr) / 1000,
-                           origin = "1970-01-01", tz = "UTC"),
-    random    = rnd
-  )
-}
+.req_id <- function() {
+  ts_ms_chr <- formatC(as.numeric(Sys.time()) * 1000, format = "f", digits = 0)
+  rnd       <- sample.int(1e9, 1)
+  id_raw    <- paste0(ts_ms_chr, "-", rnd)
+  list(
+    id_raw    = id_raw,
+    timestamp = as.POSIXct(as.numeric(ts_ms_chr) / 1000,
+                           origin = "1970-01-01", tz = "UTC"),
+    random    = rnd
+  )
+}

230-231: Poverty line rounding inconsistent with pipeline (2 vs 3 decimals).

API rounds to 2 d.p., pip_new_lineups() rounds to 3 (Line 76 in R/pip_new_lineups.R). Align to one rule to avoid cache/data mismatches.

Would you like me to generate a small patch aligning both to 2 d.p., or is 3 d.p. the new standard?


327-354: Avoid per-endpoint tryCatch/timeout duplication; prefer safe_endpoint wrapper.

You already introduced safe_endpoint for cp-charts. Consider wrapping /api/v1/pip similarly to centralize 503/500 shaping and request_id propagation.


858-867: pc-download lacks timeout handling; add with_req_timeout + CSV-friendly 503.

Without a timeout guard this route can hang. Return a minimal CSV row on timeout to honor the serializer.

-#* @serializer csv
-function(req) {
+#* @serializer csv
+function(req, res) {
   params <- req$argsQuery
   params$lkup <- lkups$versions_paths[[req$argsQuery$version]]
   params$pop_units <- 1
   params$version <- NULL
   params$censor  <- TRUE
 
-  do.call(pipapi::ui_pc_charts, params)
+  out <- do.call(pipapi::ui_pc_charts, params) |>
+    with_req_timeout()
+  if (is.null(out)) {
+    res$status <- 503
+    return(data.frame(error = "Request timed out",
+                      request_id = req$.id,
+                      endpoint = "/api/v1/pc-download"))
+  }
+  out
 }
R/pip_new_lineups.R (2)

76-77: Rounding policy: 3 d.p. here vs 2 d.p. in endpoints.

Decide one rounding rule for povline to keep caches/keys consistent across API and internal calls.

If 2 d.p. is expected at the API boundary, should pip_new_lineups() accept the raw value but round to 2 d.p. internally?


107-113: Ensure cache file existence creates parent dir and handles failures.

If data_root lacks a cache dir, create it before writing the DuckDB file; surface failure with a clear error.

-  cache_file_path <- fs::path(lkup$data_root, 'cache', ext = "duckdb")
+  cache_dir       <- fs::path(lkup$data_root)
+  fs::dir_create(cache_dir, recurse = TRUE)
+  cache_file_path <- fs::path(cache_dir, 'cache', ext = "duckdb")
   if (!file.exists(cache_file_path)) {
     # Create an empty duckdb file
     create_duckdb_file(cache_file_path)
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32c908e and a521953.

📒 Files selected for processing (98)
  • DESCRIPTION (2 hunks)
  • NAMESPACE (3 hunks)
  • NEWS.md (1 hunks)
  • R/aaa.R (1 hunks)
  • R/add_agg_stats.R (1 hunks)
  • R/compute_fgt_new.R (1 hunks)
  • R/compute_fgt_old.R (1 hunks)
  • R/create_countries_vctr.R (2 hunks)
  • R/create_lkups.R (9 hunks)
  • R/duckdb_func.R (13 hunks)
  • R/fg_pip.R (3 hunks)
  • R/fg_pip_old.R (1 hunks)
  • R/fgt_cumsum.R (1 hunks)
  • R/get_aux_table.R (2 hunks)
  • R/infer_poverty_line.R (1 hunks)
  • R/pip.R (2 hunks)
  • R/pip_agg.R (1 hunks)
  • R/pip_grp.R (3 hunks)
  • R/pip_grp_logic.R (4 hunks)
  • R/pip_grp_new.R (1 hunks)
  • R/pip_new_lineups.R (1 hunks)
  • R/pip_old.R (7 hunks)
  • R/pip_old_lineups.R (1 hunks)
  • R/pipapi-env.R (1 hunks)
  • R/pipapi-package.R (1 hunks)
  • R/rg_pip.R (3 hunks)
  • R/rg_pip_old.R (1 hunks)
  • R/ui_country_profile.R (3 hunks)
  • R/ui_home_page.R (3 hunks)
  • R/ui_poverty_indicators.R (4 hunks)
  • R/utils-pipdata.R (1 hunks)
  • R/utils-plumber.R (1 hunks)
  • R/utils.R (10 hunks)
  • R/zzz.R (2 hunks)
  • data-raw/data.R (1 hunks)
  • inst/TMP/TMP_API_launcher.R (1 hunks)
  • inst/TMP/TMP_duckdb_cache.R (1 hunks)
  • inst/plumber/v1/endpoints.R (14 hunks)
  • inst/plumber/v1/plumber.R (1 hunks)
  • man/add_attributes_as_columns_multi.Rd (1 hunks)
  • man/add_attributes_as_columns_vectorized.Rd (1 hunks)
  • man/add_dist_stats.Rd (1 hunks)
  • man/add_dist_stats_old.Rd (1 hunks)
  • man/assign_stat.Rd (1 hunks)
  • man/build_pair_dict.Rd (1 hunks)
  • man/compute_fgt.Rd (1 hunks)
  • man/compute_fgt_dt.Rd (1 hunks)
  • man/compute_fgt_dt_old.Rd (1 hunks)
  • man/create_countries_vctr.Rd (1 hunks)
  • man/create_full_list.Rd (1 hunks)
  • man/decode_pairs.Rd (1 hunks)
  • man/empty_response_fg_add.Rd (1 hunks)
  • man/encode_pairs.Rd (1 hunks)
  • man/fg_assign_nas_values_to_dup_cols_old.Rd (1 hunks)
  • man/fg_pip.Rd (1 hunks)
  • man/fg_pip_old.Rd (1 hunks)
  • man/fg_remove_duplicates.Rd (1 hunks)
  • man/fg_remove_duplicates_old.Rd (1 hunks)
  • man/fg_standardize_cache_id_old.Rd (1 hunks)
  • man/fgt_cumsum.Rd (1 hunks)
  • man/format_lfst.Rd (1 hunks)
  • man/get_aux_table.Rd (1 hunks)
  • man/get_aux_table_ui.Rd (1 hunks)
  • man/get_country_code_subset.Rd (1 hunks)
  • man/get_dt_dist_stats.Rd (1 hunks)
  • man/get_from_pipapienv.Rd (1 hunks)
  • man/get_lt_attr.Rd (1 hunks)
  • man/get_mean_median.Rd (1 hunks)
  • man/get_pipapienv.Rd (1 hunks)
  • man/get_rl_rows_single.Rd (1 hunks)
  • man/get_total_pop.Rd (1 hunks)
  • man/infer_poverty_line.Rd (1 hunks)
  • man/list_code_column_values.Rd (1 hunks)
  • man/load_data_list.Rd (1 hunks)
  • man/load_data_list_old.Rd (1 hunks)
  • man/load_list_refy.Rd (1 hunks)
  • man/map_fgt.Rd (1 hunks)
  • man/pip.Rd (2 hunks)
  • man/pip_agg.Rd (3 hunks)
  • man/pip_aggregate_by.Rd (1 hunks)
  • man/pip_new_lineups.Rd (1 hunks)
  • man/pip_old.Rd (1 hunks)
  • man/pip_old_lineups.Rd (1 hunks)
  • man/rg_pip_old.Rd (2 hunks)
  • man/safe_endpoint.Rd (1 hunks)
  • man/set_in_pipapienv.Rd (1 hunks)
  • man/subset_lkup.Rd (2 hunks)
  • man/transform_input.Rd (1 hunks)
  • man/ui_cp_poverty_charts.Rd (1 hunks)
  • man/ui_pc_charts.Rd (2 hunks)
  • man/update_master_file.Rd (1 hunks)
  • man/update_pair_dict.Rd (1 hunks)
  • man/use_new_lineup_version.Rd (1 hunks)
  • man/with_req_timeout.Rd (1 hunks)
  • tests/testthat/test-create_countries_vctr.R (7 hunks)
  • tests/testthat/test-fg_pip-local.R (7 hunks)
  • tests/testthat/test-pip-local.R (1 hunks)
  • tests/testthat/test-utils.R (5 hunks)
🔇 Additional comments (33)
R/compute_fgt_old.R (1)

32-35: No action needed—NAMESPACE already imports fmean/fsum
NAMESPACE contains import(collapse, except = fdroplevels) (line 64), so fmean/fsum are available unqualified.

Likely an incorrect or invalid review comment.

tests/testthat/test-utils.R (1)

69-71: LGTM: pass povline = NULL explicitly.

The updated subset_lkup calls reflect the new signature and keep test intent clear.

Also applies to: 83-85, 97-99, 111-113, 129-131

DESCRIPTION (1)

73-75: Imports verified Both parallel and R.utils are called via pkg::fun (e.g. parallel::detectCores, R.utils::withTimeout), so no importFrom entries are required in NAMESPACE. parallel ships with base R and R.utils is listed under Imports, ensuring it’s installed in CI.

R/pipapi-package.R (1)

62-69: LGTM on the new dataset entry.
The added Rd stub matches the existing pattern for empty-response datasets and looks consistent.

tests/testthat/test-create_countries_vctr.R (1)

455-580: Thanks for updating the test calls.
Passing lkup keeps the tests aligned with the refactored create_countries_vctr signature.

man/fg_standardize_cache_id_old.Rd (1)

5-20: Doc entry looks good.
The usage, arguments, and description line up with the OLD helper’s purpose.

man/empty_response_fg_add.Rd (1)

1-16: Documentation looks good.

The dataset metadata aligns with the new export and matches the pattern of the existing empty_response documentation.

man/list_code_column_values.Rd (1)

1-25: Clear and complete helper documentation.

Argument descriptions, return value, and example are crisp—thank you for keeping this helper well documented.

man/update_pair_dict.Rd (1)

1-24: Internal helper doc looks solid.

The usage, argument descriptions, and purpose are all spelled out clearly for update_pair_dict; no issues on my end.

man/subset_lkup.Rd (1)

17-45: Nice addition of popshare docs.

The new parameter is clearly introduced with defaults and description, keeping the API docs aligned with the implementation.

man/decode_pairs.Rd (1)

19-41: Clear and complete docstring.

The parameter and return-value sections read cleanly and match the new decode workflow. No further action needed.

R/pip_old.R (1)

144-150: Nice guard on mixed path types.

Forcing path to character before binding cached rows prevents type clashes when re-reading cached data. Good catch.

man/add_attributes_as_columns_multi.Rd (1)

21-23: Confirm exposure of file column in public output.

Including file in the returned columns can leak internal filenames and collide during joins. If this column is only for internal joins, consider dropping it before returning or explicitly documenting its contents and stability guarantees. Please confirm intent; otherwise, adjust implementation/docs.

R/pip.R (2)

55-55: Behavior change: censor now defaults to FALSE.

This is a breaking default change. Please confirm downstream callers/tests and update NEWS/docs accordingly.


65-93: Dispatcher structure looks good.

Clear delegation to new/old implementations based on the lookup flag; parameter pass-through is consistent.

If either callee expects scalar enums, the match.arg normalization above will avoid surprises.

R/rg_pip_old.R (1)

47-50: Empty-result handling is clean.

Early return with placeholder and cache flags avoids downstream errors.

man/safe_endpoint.Rd (1)

27-35: Clarify message masking vs. debug behavior

You state message is actual error only in debug, otherwise “Internal Server Error”. Please ensure this matches the implemented wrapper. If the global plumber error handler still returns err$message unconditionally, update either the code or this doc to avoid mixed guarantees.

R/rg_pip.R (2)

71-79: process_dt signature matches per-survey povline usage Verified that process_dt is defined as function(dt, povline, …) in R/compute_fgt_new.R, so both Map(process_dt, lt, povline) and lapply(lt, process_dt, povline = povline) correctly align.


59-69: Ensure correct sorting before calling infer_poverty_line
infer_poverty_line(..., assume_sorted = TRUE) skips any internal sorting and requires welfare/weight to be pre-ordered. Since load_data_list() doesn’t guarantee sorted inputs, either sort x$welfare and x$weight before the call or set assume_sorted = FALSE. You can mirror the logic in fg_pip.R (lines 71–75) to choose or enforce sorting.

inst/plumber/v1/plumber.R (1)

24-43: Omit preroute hook: req$.start is already initialized
The ctx filter in inst/plumber/v1/endpoints.R sets req$.start <- .now() before routing, so adding a preroute hook is unnecessary.

man/fg_remove_duplicates_old.Rd (1)

1-24: Doc matches exported helper
Signature and default columns line up with the code in fg_pip_old, so the generated Rd looks good.

NAMESPACE (1)

3-66: Exports/imports consistent with new surface
The additional exports and the explicit data.table import reflect the new helpers and look consistent with the rest of the PR changes.

R/create_lkups.R (1)

1018-1030: Exported helper looks good

use_new_lineup_version(x) is clear and robust. No changes requested.

R/pip_grp.R (2)

280-288: Order of operations: compute pop_in_poverty after fixing population

After correcting aggregation to sum reporting_pop, computing pop_in_poverty is fine. No change beyond the fixes above.

Please re-check a few known totals (e.g., WLD headcount × pop vs. pop_in_poverty) after applying the aggregation fixes.


53-62: fg_pip_old usage acknowledged

Using fg_pip_old() here is fine as a transitional step. Ensure retirement plan/timeline is tracked.

R/fgt_cumsum.R (1)

75-85: Cutpoint join assumes existence of index==0 row

findInterval(..., left.open = TRUE) can return 0, and the next join expects an index==0 row to exist in DT_min. Please assert/guarantee that zero row exists per group before the join.

R/utils-pipdata.R (2)

248-276: Typed NA assignment in assign_stat is OK via rep.int

Looks good. Minor: when broadcasting scalar, you can avoid rep.int by dt[, (colname) := unname(v)] (data.table recycles). Current version is explicit and fine.


329-335: Attribute name consistency

You return dist_stats = attr(., "dt_dist_stats") but other helpers expect an attribute named dist_stats. Confirm this divergence is intentional. Otherwise, standardize attribute names to avoid subtle bugs.

R/fg_pip.R (4)

63-64: load_list_refy depends on fix to _ placeholder

This call will currently fail due to the _ bug reported in R/fgt_cumsum.R. After fixing that, this section should work.


68-83: Popshare path: ensure povline column name consistency

Downstream you unconditionally setnames(out, "povline", "poverty_line"). Confirm that process_dt returns a povline column in the popshare branch; otherwise setnames will error. Adjust to rename only when present.

-  setnames(out, "povline", "poverty_line")
+  if ("povline" %in% names(out)) setnames(out, "povline", "poverty_line")

163-175: join(..., overid = 2): confirm intended duplicate-handling

Using overid = 2 changes how duplicate keys are handled. Please confirm this aligns with expectations for metadata multiplicity or document the choice.


45-47: Likely setDT() misuse with pipe (assigns NULL)

metadata <- metadata$lkup |> setDT() can assign NULL because setDT() converts by reference and returns (often invisibly). Use as.data.table() or assign then setDT() separately.

-  metadata <- metadata$lkup |>
-    setDT()
+  metadata <- as.data.table(metadata$lkup)
+  # or:
+  # metadata <- metadata$lkup; setDT(metadata)

Likely an incorrect or invalid review comment.

R/compute_fgt_new.R (1)

51-66: mean_and_med path assumes columns exist

ffirst(dt$mean)/median/reporting_year require those columns present in dt. Add checks or compute them upstream to avoid subscript errors.

Comment on lines +38 to 54
fg <- fg_pip(
ctr,
year = year,
povline = 3,
welfare_type = "all",
reporting_level = "all",
popshare = NULL,
lkup = lkup
) |>
rbindlist(fill = TRUE)
empty_response_fg <- fg[-1]


add_vars_out_of_pipeline(fg, fill_gaps = TRUE, lkup = lkup)

empty_response_fg_add <- fg[-1]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

empty_response_fg mutates alongside fg
empty_response_fg <- fg[-1] captures a shallow view of the same column pointers. When add_vars_out_of_pipeline(fg, …) appends columns by reference, those columns also appear in empty_response_fg, so we lose the “pre augmentation” snapshot. Please take a defensive copy before calling add_vars_out_of_pipeline, e.g.

-empty_response_fg <- fg[-1]
+empty_response_fg <- data.table::copy(fg[-1])

That guarantees empty_response_fg stays untouched while fg gains the extra fields.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fg <- fg_pip(
ctr,
year = year,
povline = 3,
welfare_type = "all",
reporting_level = "all",
popshare = NULL,
lkup = lkup
) |>
rbindlist(fill = TRUE)
empty_response_fg <- fg[-1]
add_vars_out_of_pipeline(fg, fill_gaps = TRUE, lkup = lkup)
empty_response_fg_add <- fg[-1]
fg <- fg_pip(
ctr,
year = year,
povline = 3,
welfare_type = "all",
reporting_level = "all",
popshare = NULL,
lkup = lkup
) |>
rbindlist(fill = TRUE)
empty_response_fg <- data.table::copy(fg[-1])
add_vars_out_of_pipeline(fg, fill_gaps = TRUE, lkup = lkup)
empty_response_fg_add <- fg[-1]
🤖 Prompt for AI Agents
In data-raw/data.R around lines 38 to 54, empty_response_fg currently holds a
shallow view of fg (fg[-1]) so when add_vars_out_of_pipeline mutates fg by
reference those new columns also appear in empty_response_fg; make a defensive
deep copy of the snapshot before calling add_vars_out_of_pipeline (use
data.table::copy on the result you want to preserve) so empty_response_fg
remains the pre-augmentation snapshot while fg can be safely modified in-place.

Comment on lines +9 to +13
ncores <- parallel::detectCores(logical = FALSE)
data.table::setDTthreads(max(1L, ncores))
collapse::set_collapse(nthreads = max(1L, ncores))
fst::threads_fst(max(1L, ncores))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle detectCores() == NA to avoid propagating NA threads

detectCores can return NA in some environments; max(1L, NA) ⇒ NA. Guard it.

-ncores <- parallel::detectCores(logical = FALSE)
-data.table::setDTthreads(max(1L, ncores))
-collapse::set_collapse(nthreads = max(1L, ncores))
-fst::threads_fst(max(1L, ncores))
+ncores <- parallel::detectCores(logical = FALSE)
+ncores <- if (is.na(ncores)) 1L else as.integer(max(1L, ncores))
+data.table::setDTthreads(ncores)
+collapse::set_collapse(nthreads = ncores)
+fst::threads_fst(ncores)
🤖 Prompt for AI Agents
In inst/plumber/v1/plumber.R around lines 9 to 13, parallel::detectCores(logical
= FALSE) can return NA which makes max(1L, ncores) produce NA and propagate
invalid thread counts; change the assignment to coerce a safe integer fallback
(e.g. set ncores <- as.integer(parallel::detectCores(logical = FALSE)); if
(is.na(ncores) || ncores < 1L) ncores <- 1L) and then use that safe ncores when
calling setDTthreads(), set_collapse(), and threads_fst().

Comment on lines +56 to +77
plumber::pr_set_error(function(req, res, err) {
method <- req$REQUEST_METHOD %||% ""
path <- req$PATH_INFO %||% ""
rid <- req$.id %||% "NA"

cat(
sprintf(
'{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n',
rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE)
),
file = stderr()
)

res$status <- 500
list(
error = "Internal Server Error",
message = err$message,
path = path,
method = method,
request_id = rid
)
}) |>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t leak internal error messages in PROD responses

The handler returns err$message to clients unconditionally. Gate it on PIPAPI_DEBUG and keep full details in logs.

-plumber::pr_set_error(function(req, res, err) {
+plumber::pr_set_error(function(req, res, err) {
   method <- req$REQUEST_METHOD %||% ""
   path   <- req$PATH_INFO %||% ""
   rid    <- req$.id %||% "NA"
 
   cat(
     sprintf(
       '{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n',
       rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE)
     ),
     file = stderr()
   )
 
   res$status <- 500
+  debug <- tolower(Sys.getenv("PIPAPI_DEBUG", "false")) %in% c("true","1","t","yes","y")
   list(
     error      = "Internal Server Error",
-    message    = err$message,
+    message    = if (debug) err$message else "Internal Server Error",
     path       = path,
     method     = method,
     request_id = rid
   )
 })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
plumber::pr_set_error(function(req, res, err) {
method <- req$REQUEST_METHOD %||% ""
path <- req$PATH_INFO %||% ""
rid <- req$.id %||% "NA"
cat(
sprintf(
'{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n',
rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE)
),
file = stderr()
)
res$status <- 500
list(
error = "Internal Server Error",
message = err$message,
path = path,
method = method,
request_id = rid
)
}) |>
plumber::pr_set_error(function(req, res, err) {
method <- req$REQUEST_METHOD %||% ""
path <- req$PATH_INFO %||% ""
rid <- req$.id %||% "NA"
cat(
sprintf(
'{"type":"error","id":"%s","method":"%s","path":"%s","msg":%s}\n',
rid, method, path, jsonlite::toJSON(err$message, auto_unbox = TRUE)
),
file = stderr()
)
res$status <- 500
debug <- tolower(Sys.getenv("PIPAPI_DEBUG", "false")) %in% c("true", "1", "t", "yes", "y")
list(
error = "Internal Server Error",
message = if (debug) err$message else "Internal Server Error",
path = path,
method = method,
request_id = rid
)
}) |>
🤖 Prompt for AI Agents
In inst/plumber/v1/plumber.R around lines 56 to 77, the error handler currently
returns err$message to clients unconditionally; change it to only include the
full err$message in the HTTP response when PIPAPI_DEBUG is set (e.g.
Sys.getenv("PIPAPI_DEBUG") == "1"), otherwise return a generic message like
"Internal Server Error"; keep the existing detailed JSON error written to stderr
(for logs) and populate the response fields (error, message, path, method,
request_id) with the generic message when not in debug mode and with the real
err$message when in debug mode.

Comment on lines +55 to +60
dest_cache <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_SERVER") |>
fs::path(ver_to_use, "cache.duckdb")

if (fs::file_exists(ori_cache)) {
fs::file_copy(ori_cache, dest_cache, overwrite = TRUE)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Create the destination directory before copying.

Line 55 constructs a new per-release folder, but Line 59 calls fs::file_copy without ensuring that directory exists. On a fresh release this will raise No such file or directory. Please create the parent directory first.

 dest_cache <- Sys.getenv("PIPAPI_DATA_ROOT_FOLDER_SERVER") |>
   fs::path(ver_to_use, "cache.duckdb")

 if (fs::file_exists(ori_cache)) {
+  fs::dir_create(fs::path_dir(dest_cache), recurse = TRUE)
   fs::file_copy(ori_cache, dest_cache, overwrite = TRUE)
 }
🤖 Prompt for AI Agents
In inst/TMP/TMP_duckdb_cache.R around lines 55 to 60, the code builds dest_cache
but calls fs::file_copy without ensuring its parent directory exists, which will
fail on a fresh release; before copying, create the destination directory (e.g.,
call fs::dir_create on the parent directory of dest_cache) so the directory
exists, then proceed to fs::file_copy as before.

Comment on lines +5 to +22
\title{Title}
\usage{
compute_fgt_dt_old(dt, welfare, weight, povlines)
}
\arguments{
\item{dt}{data frame with \code{welfare} and \code{weight} columns}

\item{welfare}{character: welfare variable name}

\item{weight}{character: weight variable name}

\item{povlines}{double: vector with poveryt lines}
}
\value{
data.table with estimates poverty estimates
}
\description{
Title
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace placeholder copy and typos.

The help file still has scaffolding text (Title, Title) and a typo in “poveryt”. This will surface to users as-is. Please give the function a meaningful title/description and clean up the argument spelling before release.

🤖 Prompt for AI Agents
In man/compute_fgt_dt_old.Rd around lines 5–22, replace the placeholder "Title"
entries and fix the typo "poveryt" in the \arguments and \description sections:
provide a concise, meaningful \title (e.g., "Compute Foster–Greer–Thorbecke
poverty measures"), write a clear \description summarizing what
compute_fgt_dt_old() does and its inputs/outputs, correct "poveryt" to "poverty"
in \item{povlines}, and improve \value and \usage text to accurately describe
the returned data.table and function signature so the help page is user-facing
and free of scaffolding.

Comment on lines +78 to 82
country_code = country,
reporting_year = res$reporting_year,
poverty_line = povline,
headcount = res$headcount
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fill poverty_line from result to avoid NULL column

If povline is NULL, constructing data.table omits the column. Use res$poverty_line.

-  out <- data.table::data.table(
-    country_code   = country,
-    reporting_year = res$reporting_year,
-    poverty_line   = povline,
-    headcount      = res$headcount
-  )
+  out <- data.table::data.table(
+    country_code   = res$country_code,
+    reporting_year = res$reporting_year,
+    poverty_line   = res$poverty_line,
+    headcount      = res$headcount
+  )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
country_code = country,
reporting_year = res$reporting_year,
poverty_line = povline,
headcount = res$headcount
)
out <- data.table::data.table(
country_code = res$country_code,
reporting_year = res$reporting_year,
poverty_line = res$poverty_line,
headcount = res$headcount
)
🤖 Prompt for AI Agents
In R/ui_country_profile.R around lines 78 to 82, the data.table construction
uses povline which can be NULL and causes the poverty_line column to be omitted;
replace poverty_line = povline with poverty_line = res$poverty_line (or use a
coalesce/rescue to ensure a non-NULL value) so the poverty_line column is always
present in the resulting data.table.

Comment on lines +18 to +38
use_new <- lkup$use_new_lineup_version

# Run correct function
#-------------------------------------
out <- if (use_new) {
x <- pip_grp_new(country = "ALL",
year = ref_years,
povline = povline,
welfare_type = "all",
reporting_level = "national",
lkup = lkup,
censor = FALSE)

regs <- lkup$aux_files$country_list[, region_code] |>
funique() |>
c("WLD")

x |>
fsubset(region_code %in% regs)

} else {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

New-lineup uses ref_lkup years; should use lineup/refy years

When use_new is TRUE you still pass year = ref_years derived from lkup$ref_lkup. For lineup flow, years should come from lineup/refy data (or lkup$valid_years if that already includes lineup years). This can return wrong or empty results.

Use lineup/refy years and only exclude the 1981–1989 filter on the old path:

-  ref_years <- sort(unique(lkup$ref_lkup$reporting_year))
-  ref_years <- ref_years[!ref_years %in% c(1981:1989)]
+  # decide years based on lineup path
+  use_new <- lkup$use_new_lineup_version
+  if (isTRUE(use_new) && "refy_lkup" %in% names(lkup)) {
+    ref_years <- sort(unique(lkup$refy_lkup$reporting_year))
+  } else {
+    ref_years <- sort(unique(lkup$ref_lkup$reporting_year))
+    ref_years <- ref_years[!ref_years %in% c(1981:1989)]
+  }
@@
-  out <- if (use_new) {
+  out <- if (isTRUE(use_new)) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
use_new <- lkup$use_new_lineup_version
# Run correct function
#-------------------------------------
out <- if (use_new) {
x <- pip_grp_new(country = "ALL",
year = ref_years,
povline = povline,
welfare_type = "all",
reporting_level = "national",
lkup = lkup,
censor = FALSE)
regs <- lkup$aux_files$country_list[, region_code] |>
funique() |>
c("WLD")
x |>
fsubset(region_code %in% regs)
} else {
# decide years based on lineup path
use_new <- lkup$use_new_lineup_version
if (isTRUE(use_new) && "refy_lkup" %in% names(lkup)) {
ref_years <- sort(unique(lkup$refy_lkup$reporting_year))
} else {
ref_years <- sort(unique(lkup$ref_lkup$reporting_year))
ref_years <- ref_years[!ref_years %in% c(1981:1989)]
}
# Run correct function
#-------------------------------------
out <- if (isTRUE(use_new)) {
x <- pip_grp_new(
country = "ALL",
year = ref_years,
povline = povline,
welfare_type = "all",
reporting_level = "national",
lkup = lkup,
censor = FALSE
)
regs <- lkup$aux_files$country_list[, region_code] |>
funique() |>
c("WLD")
x |>
fsubset(region_code %in% regs)
} else {
# ... existing "old" path ...
}
🤖 Prompt for AI Agents
In R/ui_home_page.R around lines 18–38, when use_new is TRUE the code
incorrectly passes year = ref_years (from lkup$ref_lkup); change the year
argument for the new-lineup branch to use the lineup/refy years (e.g.,
lkup$lineup$refy or lkup$valid_years if that already contains lineup years) so
the pip_grp_new call uses lineup years, and ensure the 1981–1989
exclusion/filter is only applied on the old-path (else) branch, not in this
new-lineup branch.

Comment on lines +106 to +111
regs <- cl[, .(region_code, africa_split_code)] |>
unlist() |> # convert to vector
na_omit() |>
unique() |>
unname() |>
c("WLD") # add the world
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix undefined na_omit helper

na_omit() is not an exported function (base R provides stats::na.omit()), so this call will throw at runtime and block regional results. Please switch to the correct helper.

-    na_omit() |>
+    stats::na.omit() |>
🤖 Prompt for AI Agents
R/ui_poverty_indicators.R around lines 106 to 111: the pipeline calls a
non-existent na_omit() helper causing a runtime error; replace that call with
the exported stats::na.omit() (or simply na.omit) so the NA removal works as
intended and the downstream unique()/unname() calls operate on a clean vector.

Comment on lines +627 to +658
#' @export
safe_endpoint <- function(fun, endpoint, debug = NULL) {
if (is.null(debug)) {
debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE")
}

function(req, res) {
tryCatch(
{
fun(req, res)
},
error = function(e) {
res$status <- 500L
out <- list(
error = paste("Error in", endpoint),
message = if (debug) conditionMessage(e) else "Internal Server Error",
request_id = tryCatch(req$.id, error = \(.) NA),
endpoint = endpoint
)
if (debug) {
out$class <- class(e)[[1]]
out$call <- as.character(conditionCall(e))
out$query <- req$argsQuery
out$trace <- utils::capture.output(
rlang::trace_back(bottom = 10, simplify = "branch")
)
}
out
}
)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Timeouts currently surface as 200; map them to 504 and propagate metadata

with_req_timeout() returns a non-error list (ok = FALSE), so safe_endpoint() treats it as success and leaves res$status as 200. Clients won’t detect timeouts. Detect this structured timeout and set 504, or raise a typed condition from with_req_timeout() and map it in safe_endpoint().

Apply either approach:

A) Minimal change (detect structured timeout in safe_endpoint):

 safe_endpoint <- function(fun, endpoint, debug = NULL) {
   if (is.null(debug)) {
-    debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE")
+    debug <- identical(Sys.getenv("PIPAPI_DEBUG"), "TRUE")
   }
 
   function(req, res) {
-    tryCatch(
-      {
-        fun(req, res)
-      },
-      error = function(e) {
+    val <- tryCatch(
+      {
+        fun(req, res)
+      },
+      error = function(e) {
         res$status <- 500L
         out <- list(
           error      = paste("Error in", endpoint),
           message    = if (debug) conditionMessage(e) else "Internal Server Error",
           request_id = tryCatch(req$.id, error = \(.) NA),
           endpoint   = endpoint
         )
         if (debug) {
           out$class <- class(e)[[1]]
           out$call  <- as.character(conditionCall(e))
           out$query <- req$argsQuery
-          out$trace <- utils::capture.output(
-            rlang::trace_back(bottom = 10, simplify = "branch")
-          )
+          out$trace <- tryCatch({
+            if (inherits(e, "rlang_error")) {
+              utils::capture.output(rlang::trace_back(e, simplify = "branch"))
+            } else {
+              as.character(sys.calls())
+            }
+          }, error = \(.) NULL)
         }
-        out
+        out
       }
     )
+    # Map structured timeout objects to 504 Gateway Timeout
+    if (is.list(val)) {
+      msg <- tryCatch(tolower(as.character(val$error)), error = \(.) "")
+      if (isFALSE(val$ok) && grepl("timeout", msg)) {
+        res$status <- 504L
+        val$request_id <- tryCatch(req$.id, error = \(.) NA)
+        val$endpoint   <- endpoint
+        return(val)
+      }
+    }
+    val
   }
 }

B) Typed timeout error (preferred), keep safe_endpoint() simple:

 with_req_timeout <- function(expr,
                              secs = as.numeric(Sys.getenv("PLUMBER_REQ_TIMEOUT", "150"))) {
   if (!is.finite(secs) || secs <= 0) return(force(expr))
 
   start <- proc.time()[["elapsed"]]
   tryCatch(
     {
       R.utils::withTimeout(
         expr     = force(expr),
         timeout  = secs,
         onTimeout = "error"
       )
 
     },
     TimeoutException = \(e) {
       elapsed <- proc.time()[["elapsed"]] - start
-      list(
-        ok     = FALSE,
-        error  = sprintf("Request exceeded timeout of %s seconds", secs),
-        elapsed = elapsed
-      )
+      stop(structure(
+        list(message = sprintf("Request exceeded timeout of %s seconds", secs),
+             elapsed = elapsed),
+        class = c("pipapi_timeout", "error", "condition")
+      ))
     }
   )
 }

If using (B), add in safe_endpoint():

-        res$status <- 500L
+        res$status <- if (inherits(e, "pipapi_timeout")) 504L else 500L

Also applies to: 675-698

🤖 Prompt for AI Agents
In R/utils-plumber.R around lines 627-658 (and similarly 675-698), safe_endpoint
currently treats the non-error return from with_req_timeout() as success and
leaves res$status 200, so timeouts aren’t propagated; fix by either A) detecting
the structured timeout return (e.g. a list with ok == FALSE or a specific
timeout flag) inside the error handler path and setting res$status <- 504 and
returning the timeout metadata, or B) preferably modify with_req_timeout() to
signal a typed timeout condition (classed error or abort), then update
safe_endpoint’s tryCatch to catch that timeout type and set res$status <- 504
while attaching the timeout metadata to the response; implement one approach,
ensure the response body contains request_id/endpoint and any timeout info, and
keep other error handling unchanged.

Comment on lines +1538 to +1544
join(x = fgt,
y = dist,
on = by_var,
how = "left",
validate = "m:1", # multiple povlines
verbose = 0L)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix undefined join() call

get_mean_median() invokes join(...) without qualifying or importing it, so the first call will throw “could not find function 'join'”. Use the fully qualified helper (e.g., collapse::join) or a data.table merge so the new lineup path doesn’t hard-crash.

-  join(x = fgt,
-      y = dist,
-      on = by_var,
-      how = "left",
-      validate = "m:1", # multiple povlines
-      verbose = 0L)
+  collapse::join(
+    x = fgt,
+    y = dist,
+    on = by_var,
+    how = "left",
+    validate = "m:1", # multiple povlines
+    verbose = 0L
+  )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
join(x = fgt,
y = dist,
on = by_var,
how = "left",
validate = "m:1", # multiple povlines
verbose = 0L)
}
collapse::join(
x = fgt,
y = dist,
on = by_var,
how = "left",
validate = "m:1", # multiple povlines
verbose = 0L
)
}
🤖 Prompt for AI Agents
In R/utils.R around lines 1538 to 1544, the call to join(...) is undefined and
will throw "could not find function 'join'"; replace it with a fully qualified
call to the intended helper (for example collapse::join(...)) or convert it to
an explicit data.table merge so the function does not rely on an unimported
symbol; if you choose collapse::join, also ensure collapse is declared in
DESCRIPTION (Imports) or add an import/importFrom in NAMESPACE so the package
builds cleanly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants