diff --git a/CLAUDE.md b/CLAUDE.md index 05a7659..69658f3 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ ## What this package does -**urbnindicators** is an R package that provides analysis-ready American Community Survey (ACS) data with minimal user effort. The main entry point is `compile_acs_data()`, which pulls hundreds of standardized variables (raw counts + calculated percentages), generates a codebook, and computes margins of error and coefficients of variation. +**urbnindicators** is an R package that provides analysis-ready American Community Survey (ACS) data with minimal user effort. The main entry point is `compile_acs_data()`, which pulls hundreds of standardized variables (raw counts + calculated percentages), generates a codebook, and computes margins of error. - Five-year ACS estimates only; tract-level geography and up (no block groups) - Lifecycle stage: experimental @@ -37,7 +37,7 @@ CI runs on GitHub Actions: `test-coverage.yaml` (push/PR to main) and `pkgdown.y - **Indentation**: 2 spaces - **Naming**: `snake_case` for functions and variables - **Variable naming pattern**: `[concept]_[subconcept]_[characteristic]_[metric]` (e.g., `race_nonhispanic_white_alone_percent`) -- **Variable suffixes**: `_percent` for percentages, `_universe` or `_universe_` for universe variables, `_M` for margins of error, `_CV` for coefficients of variation, `_SE` for standard errors +- **Variable suffixes**: `_percent` for percentages, `_universe` or `_universe_` for universe variables, `_M` for margins of error - **Documentation**: roxygen2 (v7.3.2) with markdown mode enabled - **Conditionals**: `dplyr::if_else()` (not base `ifelse()`) - **Division**: use `safe_divide(x, y)` for percentage calculations (returns 0 instead of NaN) @@ -69,12 +69,10 @@ Users can request specific subsets of data: # Pull specific tables (using construct-level names) compile_acs_data(tables = c("race", "snap"), years = 2022, geography = "county", states = "NJ") -# Pull by 
indicator name (returns the full parent table) -compile_acs_data(indicators = c("snap_received_percent"), years = 2022, geography = "county", states = "NJ") - -# Discover available tables, indicators, and variables +# Discover available tables and variables list_tables() -list_variables() # tibble of all variables and their table names +list_variables() # tibble of all variables and their table names +get_acs_codebook() # browse ACS variables with clean names and table codes ``` **Construct-level table names**: Some ACS tables contain multiple constructs. These are split into separate user-facing tables: @@ -83,7 +81,7 @@ list_variables() # tibble of all variables and their table names Both construct names and internal names are accepted by `compile_acs_data(tables = ...)` and `resolve_tables()`. -When `tables`/`indicators` are specified: +When `tables` are specified: 1. `resolve_tables()` determines which tables are needed (always includes `total_population`) 2. `collect_raw_variables()` builds the named ACS variable vector for those tables 3. Only those tables' `compute_fn` functions are called @@ -92,19 +90,19 @@ When `tables`/`indicators` are specified: ### Key source files -1. **`R/table_registry.R`** - Central registry: table definitions, `list_tables()`, `list_indicators()`, `resolve_tables()`, `collect_raw_variables()`, `expand_codebook_entry()`, and all `register_table()` calls. -2. **`R/list_acs_variables.R`** - `list_acs_variables()` (supports optional `tables` param), `select_variables_by_name()`, `filter_variables()`. -3. **`R/compile_acs_data.R`** - `compile_acs_data()` (with `tables`, `indicators`, deprecated `variables`), `internal_compute_acs_variables()` (legacy), `safe_divide()`. +1. **`R/table_registry.R`** - Central registry: table definitions, `list_tables()`, `resolve_tables()`, `collect_raw_variables()`, `expand_codebook_entry()`, and all `register_table()` calls. +2. 
**`R/list_acs_variables.R`** - `list_acs_variables()` (supports optional `tables` param), `select_variables_by_name()`, `filter_variables()`, `get_acs_codebook()`. +3. **`R/compile_acs_data.R`** - `compile_acs_data()` (with `tables`, deprecated `variables`), `internal_compute_acs_variables()` (legacy), `safe_divide()`. 4. **`R/generate_codebook.R`** - `generate_codebook()` (registry-based) and `generate_codebook_legacy()` (AST-based, for deprecated `variables` path). -5. **`R/calculate_cvs.R`** - Computes standard errors and coefficients of variation. Parses codebook definition text strings. No changes needed when adding tables. +5. **`R/calculate_cvs.R`** - Computes margins of error for derived variables (uses standard errors as intermediates internally). Parses codebook definition text strings. No changes needed when adding tables. 6. **`R/make_pretty_names.R`** - Converts variable names to publication-ready labels. 7. **`R/utils-pipe.R`** - Re-exports `%>%`. ### Exported functions -- `compile_acs_data(tables, indicators, ...)` - Pull and compute ACS data +- `compile_acs_data(tables, ...)` - Pull and compute ACS data - `list_tables()` - Available table names for the `tables` parameter (construct-level names) -- `list_indicators()` - Available indicator names for the `indicators` parameter +- `get_acs_codebook(year, table)` - Browse ACS variables with clean names and table codes - `list_variables(year)` - Tibble mapping all variables (raw + computed) to their table name - `list_acs_variables(year, tables)` - Named vector of ACS variable codes - `select_variables_by_name(variable_name, census_codebook)` - Filter variables by pattern @@ -121,9 +119,9 @@ To add a new ACS table to the package: - `compute_fn` that calculates derived indicators using `safe_divide()` and `dplyr::across()` - `codebook_entries` with structured entries (types: `simple_percent`, `across_percent`, `across_sum`, `complex`, `one_minus`, `metadata`) 2. 
**Add any new global variables** to the `utils::globalVariables()` call at the bottom of `R/table_registry.R` -3. **Verify**: `devtools::load_all()` then `list_tables()` shows your table; `list_indicators()` shows your indicators +3. **Verify**: `devtools::load_all()` then `list_tables()` shows your table 4. **Verify codebook**: the codebook auto-generates from `codebook_entries` -- no changes to `R/generate_codebook.R` needed -5. **Verify CVs**: `R/calculate_cvs.R` parses codebook definition strings -- no changes needed if definitions follow standard patterns +5. **Verify MOEs**: `R/calculate_cvs.R` parses codebook definition strings -- no changes needed if definitions follow standard patterns 6. **Update pretty names** if needed (`R/make_pretty_names.R` -- rarely needed) ### Codebook entry types @@ -142,7 +140,7 @@ To add a new ACS table to the package: - Percentages must be 0-1 bounded - All measures must have meaningful, non-missing values - At least 2 distinct values per measure -- CVs should be reasonable for tract-level data (flag if >50 for many tracts) +- MOEs should be reasonable for tract-level data - Compare to published Census Bureau benchmarks when available ## Legacy path diff --git a/NAMESPACE b/NAMESPACE index 16d22d4..a3a377a 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -4,6 +4,7 @@ export("%>%") export(calculate_custom_geographies) export(compile_acs_data) export(filter_variables) +export(get_acs_codebook) export(list_acs_variables) export(list_tables) export(list_variables) diff --git a/R/auto_percent.R b/R/auto_percent.R new file mode 100644 index 0000000..b15b022 --- /dev/null +++ b/R/auto_percent.R @@ -0,0 +1,241 @@ +#' @importFrom magrittr %>% + +####----AUTO-PERCENTAGE COMPUTATION FOR ARBITRARY ACS TABLES----#### + +## Detect whether a string looks like a raw ACS table code (e.g., "B25070", "C15002B", "B01001APR") +is_raw_acs_code = function(x) { + grepl("^[BC][0-9]{5}[A-I]?(?:PR)?$", x, perl = TRUE) +} + +## Resolve a user-supplied string to 
an ACS table code. +## Accepts: +## 1. A raw ACS code ("B25070") -> returned as-is +## 2. A cleaned variable name from clean_acs_names() -> resolved to parent table +## Returns the table code, or NULL if not resolvable. +resolve_to_acs_table = function(x, year, census_variables = NULL) { + if (is_raw_acs_code(x)) return(x) + + ## load census codebook if not provided + if (is.null(census_variables)) { + suppressMessages({suppressWarnings({ + census_variables = tidycensus::load_variables(year = year, dataset = "acs5") + })}) + } + + ## apply clean_acs_names and search for a match + cleaned = census_variables %>% clean_acs_names() + clean_col = cleaned$clean_names %>% stringr::str_remove("_$") + + ## exact match first + match_idx = which(clean_col == x) + if (length(match_idx) == 0) { + ## try partial match (user gives a prefix) + match_idx = which(stringr::str_starts(clean_col, stringr::fixed(x))) + } + + if (length(match_idx) == 0) return(NULL) + + ## extract the ACS table code from the variable name (e.g., "B25070_001" -> "B25070") + acs_name = cleaned$name[match_idx[1]] + table_code = stringr::str_extract(acs_name, "^[BC][0-9]{5}[A-I]?(?:PR)?") + return(table_code) +} + +## Build a label tree for a single ACS table. +## Takes a data frame filtered to one table (from tidycensus::load_variables) +## with clean_acs_names() already applied. +## Returns the data frame with additional columns: segments, depth, is_total, +## is_subtotal, parent_code, parent_clean_name. +build_label_tree = function(variables_df) { + ## parse label segments (split on !!) 
+ variables_df = variables_df %>% + dplyr::mutate( + segments = stringr::str_split(label, "!!"), + depth = purrr::map_int(segments, length), + is_total = stringr::str_detect(name, "_001$"), + is_subtotal = stringr::str_detect(label, ":$") & !is_total, + clean_name_trimmed = stringr::str_remove(clean_names, "_$")) + + ## assign parent for each variable + ## for variable i, walk backward to find the nearest ancestor subtotal + ## whose segments are a strict prefix of this variable's segments + n = nrow(variables_df) + total_name = variables_df$name[1] + total_clean = variables_df$clean_name_trimmed[1] + + parent_results = purrr::map(seq_len(n), function(i) { + if (variables_df$is_total[i]) { + return(list(parent_code = NA_character_, parent_clean_name = NA_character_)) + } + + current_segments = variables_df$segments[[i]] + candidates = rev(seq_len(i - 1)) + + ## find the nearest ancestor whose segments are a strict prefix + match_idx = purrr::detect(candidates, function(j) { + candidate_segments = variables_df$segments[[j]] + length(candidate_segments) < length(current_segments) && + all(candidate_segments == current_segments[seq_along(candidate_segments)]) && + (variables_df$is_subtotal[j] || variables_df$is_total[j]) + }) + + if (!is.null(match_idx)) { + list(parent_code = variables_df$name[match_idx], + parent_clean_name = variables_df$clean_name_trimmed[match_idx]) + } else { + ## fallback to _001 (table total) + list(parent_code = total_name, parent_clean_name = total_clean) + } + }) + + variables_df$parent_code = purrr::map_chr(parent_results, "parent_code") + variables_df$parent_clean_name = purrr::map_chr(parent_results, "parent_clean_name") + return(variables_df) +} + +## Classify an ACS table as "count" (percentages appropriate) or "skip" (not appropriate). +## Detection based on concept field and the _001 label. 
+classify_acs_table = function(nodes) { + concept = nodes$concept[1] + total_label = nodes$label[nodes$is_total][1] + + concept_lower = tolower(concept) + label_lower = tolower(total_label) + + ## patterns that indicate non-percentage-amenable tables + skip_patterns = c( + "median", "aggregate", "average", "mean", + "allocation of", "imputation of", + "margin of error") + + has_skip_pattern = purrr::some(skip_patterns, function(pattern) { + grepl(pattern, concept_lower, fixed = TRUE) || + grepl(pattern, label_lower, fixed = TRUE) + }) + if (has_skip_pattern) return("skip") + + ## singleton tables (only one variable) — no meaningful percentages + if (nrow(nodes) <= 1) return("skip") + + ## tables where the total is not a count (e.g., median income tables may have + ## a numeric label rather than "Estimate!!Total:") + if (!grepl(":", total_label) && !grepl("^Estimate!!Total$", total_label)) { + return("skip") + } + + return("count") +} + +## Generate simple_percent definitions for auto-computed tables. +## For each non-total variable, produces a define_percent() call. +## denominator_mode: "parent" (nearest parent subtotal), "total" (_001), or a specific ACS variable code. +generate_auto_definitions = function(nodes, denominator_mode = "parent", + custom_denominator = NULL) { + ## only process non-total variables + leaf_nodes = nodes %>% dplyr::filter(!is_total) + + if (nrow(leaf_nodes) == 0) return(list()) + + ## determine total row clean name (for "total" mode or fallback) + total_clean_name = nodes$clean_name_trimmed[nodes$is_total][1] + + ## if a custom denominator ACS code is given, find its clean name + custom_denom_clean = NULL + if (!is.null(custom_denominator)) { + match_row = nodes %>% dplyr::filter(name == custom_denominator) + if (nrow(match_row) > 0) { + custom_denom_clean = match_row$clean_name_trimmed[1] + } else { + rlang::warn(paste0("Custom denominator '", custom_denominator, + "' not found in table. 
Falling back to table total.")) + denominator_mode = "total" + } + } + + purrr::map(seq_len(nrow(leaf_nodes)), function(i) { + row = leaf_nodes[i, ] + numerator = row$clean_name_trimmed + + ## determine denominator + if (!is.null(custom_denom_clean)) { + denominator = custom_denom_clean + } else if (denominator_mode == "total") { + denominator = total_clean_name + } else { + ## "parent" mode: use parent_clean_name, fall back to total + denominator = row$parent_clean_name + if (is.na(denominator)) denominator = total_clean_name + } + + ## skip if numerator == denominator (the total itself as a subtotal) + if (identical(numerator, denominator)) return(NULL) + + ## raw variables ending in _pct (renamed from _percent by clean_acs_names): + ## replace _pct with _percent so the computed column gets the standard suffix + if (grepl("_pct$", numerator)) { + output = sub("_pct$", "_percent", numerator) + } else { + output = paste0(numerator, "_percent") + } + define_percent(output = output, numerator = numerator, denominator = denominator) + }) %>% purrr::compact() +} + +## Orchestrator: build a complete auto-table entry from an ACS table code. +## Returns a list with the same shape as register_table() entries, plus is_auto = TRUE. +## Pass census_variables to avoid redundant tidycensus::load_variables() calls. +build_auto_table_entry = function(table_code, year, denominator_mode = "parent", + custom_denominator = NULL, + census_variables = NULL) { + ## load variables only if not provided + if (is.null(census_variables)) { + suppressMessages({suppressWarnings({ + census_variables = tidycensus::load_variables(year = year, dataset = "acs5") + })}) + } + + table_vars = census_variables %>% + dplyr::filter(stringr::str_detect(name, paste0("^", table_code, "_"))) + + if (nrow(table_vars) == 0) { + stop(paste0("ACS table '", table_code, "' not found in the ", year, + " 5-year ACS. 
Check the table code.")) + } + + ## apply clean_acs_names + table_vars = table_vars %>% clean_acs_names() + + ## build label tree + nodes = build_label_tree(table_vars) + + ## classify table + table_type = classify_acs_table(nodes) + + ## generate definitions + if (table_type == "count") { + definitions = generate_auto_definitions( + nodes, + denominator_mode = denominator_mode, + custom_denominator = custom_denominator) + } else { + definitions = list() + } + + ## build raw_variables named vector (clean_name_ -> ACS code) + raw_variables = stats::setNames(nodes$name, paste0(nodes$clean_name_trimmed, "_")) + + list( + name = table_code, + description = nodes$concept[1], + acs_tables = table_code, + depends_on = character(0), + raw_variable_source = list(type = "manual"), + raw_variables = raw_variables, + definitions = definitions, + is_auto = TRUE, + table_type = table_type) +} + +utils::globalVariables(c( + "clean_names", "is_total", "clean_name_trimmed", + "segments", "depth", "is_subtotal", "parent_code", "parent_clean_name")) diff --git a/R/calculate_custom_geographies.R b/R/calculate_custom_geographies.R index 36cce43..8f8a2c7 100644 --- a/R/calculate_custom_geographies.R +++ b/R/calculate_custom_geographies.R @@ -258,9 +258,7 @@ calculate_custom_geographies = function( if ("total_population_universe_M" %in% colnames(result)) { result = result %>% dplyr::mutate( - population_density_land_sq_kilometer_SE = se_simple(total_population_universe_M) / area_land_sq_kilometer, - population_density_land_sq_kilometer_M = population_density_land_sq_kilometer_SE * 1.645, - population_density_land_sq_kilometer_CV = cv(population_density_land_sq_kilometer, population_density_land_sq_kilometer_SE)) + population_density_land_sq_kilometer_M = (se_simple(total_population_universe_M) / area_land_sq_kilometer) * 1.645) } } @@ -271,10 +269,10 @@ calculate_custom_geographies = function( dplyr::select(dplyr::all_of(c(group_id, "data_source_year")), dplyr::matches("_M$")) - ## Strip 
MOE/SE/CV columns before re-running definitions + ## Strip MOE columns before re-running definitions result_for_defs = result %>% as.data.frame() %>% - dplyr::select(-dplyr::matches("_M$|_SE$|_CV$")) + dplyr::select(-dplyr::matches("_M$")) ## Re-run definitions for each resolved table to recalculate percentages result_for_defs = purrr::reduce(resolved_tables, function(.data, table_name) { @@ -368,9 +366,7 @@ calculate_custom_geographies = function( df %>% dplyr::mutate( - !!paste0(var_name, "_SE") := percent_se, - !!paste0(var_name, "_M") := percent_se * 1.645, - !!paste0(var_name, "_CV") := cv(df[[var_name]], percent_se)) + !!paste0(var_name, "_M") := percent_se * 1.645) } result = purrr::reduce( @@ -379,34 +375,10 @@ calculate_custom_geographies = function( .init = result) } - ####----Calculate SEs and CVs for Sum Variables----#### - existing_sum_moe_cols = paste0(sum_variables, "_M") - existing_sum_moe_cols = existing_sum_moe_cols[existing_sum_moe_cols %in% colnames(result)] - - if (length(existing_sum_moe_cols) > 0) { - result = result %>% - dplyr::mutate( - dplyr::across( - dplyr::all_of(existing_sum_moe_cols), - ~ se_simple(.x), - .names = "{stringr::str_replace(.col, '_M$', '_SE')}")) - - existing_sum_vars = stringr::str_remove(existing_sum_moe_cols, "_M$") - existing_sum_vars = existing_sum_vars[existing_sum_vars %in% colnames(result)] - - if (length(existing_sum_vars) > 0) { - result = purrr::reduce( - existing_sum_vars, - function(df, var_name) { - se_col = paste0(var_name, "_SE") - cv_col = paste0(var_name, "_CV") - if (se_col %in% colnames(df) && var_name %in% colnames(df)) { - df %>% - dplyr::mutate(!!cv_col := cv(df[[var_name]], df[[se_col]])) - } else { df } - }, - .init = result) - } + ####----Sum Variables Already Have MOEs----#### + ## Sum variables already have _M columns from the aggregation step above; + ## no additional error derivation is needed for them. 
+ { } ####----Attach Geometry if Spatial----#### @@ -445,5 +417,5 @@ utils::globalVariables(c( ":=", "variable_type", "aggregation_strategy", "calculated_variable", "total_population_universe", "area_land_sq_kilometer", "total_population_universe_M", "population_density_land_sq_kilometer", - "population_density_land_sq_kilometer_SE", "data_source_year", "geometry", + "data_source_year", "geometry", "numerator_vars", "numerator_subtract_vars", "denominator_vars", "denominator_subtract_vars")) diff --git a/R/calculate_cvs.R b/R/calculate_cvs.R index 79525da..4bc3aba 100644 --- a/R/calculate_cvs.R +++ b/R/calculate_cvs.R @@ -107,21 +107,6 @@ se_proportion_ratio = function( return(se) } -#' @title Calculate a coefficient of variation -#' @details Return a coefficient of variation at the 90% level -#' @param estimate The estimate -#' @param se The standard error -#' @returns A coefficient of variation at the 90% level -cv = function(estimate, se) { - cv = se / estimate * 100 - - ## when the estimate is zero, this produces an infinite value - ## replacing this with an NA value - cv = dplyr::if_else(is.infinite(cv), NA, cv) - - return(cv) -} - #' @title Calculate standard error for a product of two estimates #' @details Calculate the standard error for an estimate derived by multiplying #' two estimates together. For example, multiplying a proportion by a population @@ -244,20 +229,21 @@ se_weighted_mean = function( return(se) } -#' @title Calculate coefficients of variation -#' @details Create CVs for all ACS estimates and derived indicators. -#' Uses pre-parsed codebook columns (numerator_vars, denominator_vars, -#' se_calculation_type) to determine how to calculate standard errors. +#' @title Calculate margins of error for derived variables +#' @details Calculates margins of error for all derived ACS estimates. Standard +#' errors are computed internally as an intermediate step but are not included +#' in the returned dataframe. 
Uses pre-parsed codebook columns +#' (numerator_vars, denominator_vars, se_calculation_type) to determine how +#' to calculate standard errors. #' @param .df The dataset returned from \code{compile_acs_data()}. #' The argument to this parameter must have an attribute named `codebook` (as is #' true of results from \code{compile_acs_data())}. -#' @returns A modified dataframe that includes newly calculated indicators. +#' @returns A modified dataframe that includes margins of error (suffixed +#' \code{_M}) for derived variables. #' @keywords internal -calculate_cvs = function(.df) { - warning("Coefficients of variation and related calculated measures of error such - as margins of error (for derived variables) and standard errors are - experimental features and should be used with caution. - Such measures are respectively suffixed with `_CV`, `_M`, and `_SE`.") +calculate_moes = function(.df) { + warning("Margins of error for derived variables are an experimental feature + and should be used with caution. 
Such measures are suffixed with `_M`.") ## the codebook attached to the default compile_acs_data() return codebook = .df %>% attr("codebook") @@ -324,7 +310,7 @@ calculate_cvs = function(.df) { .names = "{.col}_M")) ## Step 2: calculate SEs for all variables (using df_with_sum_moes which has derived MOEs) - df_cvs1 = df_with_sum_moes %>% + df_with_ses = df_with_sum_moes %>% dplyr::mutate( dplyr::across( .cols = dplyr::any_of(cv_variables), @@ -401,28 +387,19 @@ calculate_cvs = function(.df) { purrr::map(denominator_estimate_variables, ~ df_with_sum_moes %>% dplyr::pull(.x))))} return(SE)}, - .names = "{.col}_SE")) %>% - dplyr::mutate( - ## create coefficients of variation from standard errors - dplyr::across( - .cols = dplyr::matches("_SE$"), - .fns = ~ cv( - estimate = get(dplyr::cur_column() %>% stringr::str_remove("_SE$")), - se = .x), - .names = "{.col %>% stringr::str_remove('_SE')}_CV")) + .names = "{.col}_SE")) - moe_variables = df_cvs1 %>% + moe_variables = df_with_ses %>% dplyr::select(dplyr::matches("_M$")) %>% colnames() %>% stringr::str_remove("_M$") - se_variables = df_cvs1 %>% + se_variables = df_with_ses %>% dplyr::select(dplyr::matches("_SE$")) %>% colnames() %>% stringr::str_remove("_SE$") - # Variables with SE but not MOE occur when we calculate SEs directly - # (e.g., for complex percentages) but didn't create corresponding MOEs - df_cvs = df_cvs1 %>% + ## Convert SEs to MOEs for variables that don't already have one + df_moes = df_with_ses %>% dplyr::mutate( dplyr::across( .cols = dplyr::all_of(se_variables[!se_variables %in% moe_variables] %>% stringr::str_c("_SE")), @@ -431,9 +408,11 @@ calculate_cvs = function(.df) { ## reduce number of digits dplyr::across( .cols = dplyr::where(is.numeric), - .fns = ~ round(.x, digits = 4))) + .fns = ~ round(.x, digits = 4))) %>% + ## drop intermediate SE columns + dplyr::select(-dplyr::matches("_SE$")) - return(df_cvs) + return(df_moes) } utils::globalVariables(c( diff --git a/R/compile_acs_data.R 
b/R/compile_acs_data.R index 8953313..0c173ac 100644 --- a/R/compile_acs_data.R +++ b/R/compile_acs_data.R @@ -16,12 +16,24 @@ safe_divide = function(x, y) { dplyr::if_else(y == 0, 0, x / y) } #' @description Construct measures frequently used in social sciences #' research, leveraging \code{tidycensus::get_acs()} to acquire raw estimates from #' the Census Bureau API. -#' @param tables A character vector of table names to include (e.g., -#' \code{c("race", "snap")}). Use \code{list_tables()} to see available tables. -#' When NULL (default) and \code{indicators} is also NULL, all tables are included. -#' @param indicators A character vector of indicator names to include (e.g., -#' \code{c("snap_received_percent")}). Each indicator's parent table is -#' automatically included. +#' @param tables A character vector of table names to include. Two formats are +#' accepted: +#' \itemize{ +#' \item \strong{Registered table names} (e.g., \code{"race"}, \code{"snap"}). +#' These are pre-built tables with curated variable definitions. Use +#' \code{list_tables()} to see all available registered tables. +#' \item \strong{Raw ACS table codes} (e.g., \code{"B25070"}, \code{"C15002B"}). +#' Any valid ACS Detailed or Collapsed table code can be passed directly. +#' These are auto-processed at runtime: raw variables are fetched, the +#' label hierarchy is parsed, and percentages are computed automatically. +#' Use the \code{denominator} parameter to control how percentages are +#' calculated for these tables. +#' } +#' Both formats can be mixed freely (e.g., \code{c("snap", "B25070")}). +#' If an ACS code corresponds to an already-registered table, the registered +#' version is used automatically. +#' When NULL (default), all registered tables are included (unregistered ACS +#' tables must be requested explicitly). #' @param years A numeric vector of four-digit years for which to pull five-year #' American Community Survey estimates. 
#' @param geography A geography type that is accepted by \code{tidycensus::get_acs()}, e.g., @@ -33,6 +45,12 @@ safe_divide = function(x, y) { dplyr::if_else(y == 0, 0, x / y) } #' will override the \code{states} parameter. If \code{NULL}, all counties in the the #' state(s) specified in the \code{states} parameter will be included. #' @param spatial Boolean. Return a simple features (sf), spatially-enabled dataframe? +#' @param denominator Controls how auto-computed percentages choose their +#' denominator. \code{"parent"} (default) uses the nearest parent subtotal from +#' the ACS label hierarchy. \code{"total"} uses the table total (variable +#' \code{_001}). A specific ACS variable code (e.g., \code{"B25070_001"}) uses +#' that variable. Only affects unregistered (auto) tables; registered tables +#' always use their predefined definitions. #' @param ... Deprecated arguments. If \code{variables} is passed, a deprecation #' warning is issued and the value is ignored. #' @seealso \code{tidycensus::get_acs()}, which this function wraps. 
@@ -49,21 +67,29 @@ safe_divide = function(x, y) { dplyr::if_else(y == 0, 0, x / y) } #' df = compile_acs_data(tables = c("race", "snap"), years = 2022, #' geography = "county", states = "NJ") #' -#' ## Pull by indicator name (returns the full parent table) -#' df = compile_acs_data(indicators = c("snap_received_percent"), -#' years = 2022, geography = "county", states = "NJ") +#' ## Pull an unregistered ACS table by code +#' df = compile_acs_data(tables = "B25070", years = 2022, +#' geography = "state", states = "DC") +#' +#' ## Mix registered and unregistered tables +#' df = compile_acs_data(tables = c("snap", "B25070"), years = 2022, +#' geography = "state", states = "DC") +#' +#' ## Use table total as denominator instead of parent subtotals +#' df = compile_acs_data(tables = "B25070", denominator = "total", +#' years = 2022, geography = "state", states = "DC") #' } #' @export #' @importFrom magrittr %>% compile_acs_data = function( tables = NULL, - indicators = NULL, - years = c(2022), + years = c(2024), geography = "county", states = NULL, counties = NULL, spatial = FALSE, + denominator = "parent", ...) { ## handle deprecated `variables` parameter and unknown arguments @@ -72,7 +98,7 @@ compile_acs_data = function( lifecycle::deprecate_warn( when = "0.1.0", what = "compile_acs_data(variables)", - details = "The `variables` parameter is ignored. Use `tables` or `indicators` to select specific data, or call with no selection arguments for all tables." + details = "The `variables` parameter is ignored. Use `tables` to select specific data, or call with no selection arguments for all tables." 
) } unknown_args = setdiff(names(dots), "variables") @@ -99,23 +125,124 @@ compile_acs_data = function( stop("`years` must be between 2009 (earliest 5-year ACS) and the current year.") } + ## validate denominator parameter (table-code pattern mirrors is_raw_acs_code(), including the optional PR suffix) + valid_denominator = denominator %in% c("parent", "total") || + grepl("^[BC][0-9]{5}[A-I]?(?:PR)?(_[0-9]{3})?$", denominator, perl = TRUE) + if (!valid_denominator) { + stop(paste0("`denominator` must be \"parent\", \"total\", or a valid ACS variable code (e.g., \"B25070_001\"). Got: \"", denominator, "\".")) + } + + ####----Partition tables into registry vs auto (raw ACS codes)----#### + auto_table_entries = list() + registry_tables = tables + raw_acs_codes = character(0) + + if (!is.null(tables)) { + construct_map = build_construct_map() + internal_names = names(.table_registry$tables) + + ## load census variables once for resolve_to_acs_table lookups + suppressMessages({suppressWarnings({ + census_variables_for_resolve = tidycensus::load_variables(year = years[1], dataset = "acs5") + })}) + + ## collect all acs_tables from registered tables to detect overlap + registered_acs_codes = purrr::map(internal_names, function(tn) { + entry = get_table(tn) + if (!is.null(entry[["acs_tables"]])) entry[["acs_tables"]] else character(0) + }) %>% unlist() %>% unique() + + ## helper: find the registered table covering a given ACS code + find_covering_table = function(acs_code) { + purrr::detect(internal_names, function(tn) { + entry = get_table(tn) + acs_code %in% entry[["acs_tables"]] + }) + } + + ## classify each user-supplied table name + classified = purrr::map(tables, function(tbl) { + if (tbl %in% internal_names || tbl %in% names(construct_map)) { + ## known registry table or construct name + return(list(type = "registry", value = tbl)) + } + if (is_raw_acs_code(tbl)) { + ## raw ACS table code — check for overlap with registered tables + if (tbl %in% registered_acs_codes) { + covering = find_covering_table(tbl) + if (!is.null(covering)) return(list(type = 
"registry", value = covering)) + } + return(list(type = "auto", value = tbl)) + } + ## try resolving as a cleaned variable name + resolved_code = resolve_to_acs_table(tbl, year = years[1], + census_variables = census_variables_for_resolve) + if (!is.null(resolved_code)) { + if (resolved_code %in% registered_acs_codes) { + covering = find_covering_table(resolved_code) + if (!is.null(covering)) return(list(type = "registry", value = covering)) + } + return(list(type = "auto", value = resolved_code)) + } + ## not resolvable — pass through to resolve_tables() which will error if invalid + list(type = "registry", value = tbl) + }) + + registry_tables = purrr::map_chr( + purrr::keep(classified, ~ .x$type == "registry"), "value") %>% unique() + raw_acs_codes = purrr::map_chr( + purrr::keep(classified, ~ .x$type == "auto"), "value") %>% unique() + } + ####----Resolve tables and variables via the registry----#### ## resolve which tables to include - if (is.null(tables) && is.null(indicators)) { + if (is.null(tables)) { ## default: all internal table names resolved_tables = names(.table_registry$tables) } else { - resolved_tables = resolve_tables(tables = tables, indicators = indicators) + ## pass only registry tables (not raw ACS codes) to resolve_tables + registry_tables_input = if (length(registry_tables) > 0) registry_tables else NULL + resolved_tables = resolve_tables(tables = registry_tables_input) } ## determine whether tigris geometry is needed needs_tigris = isTRUE(spatial) || ("population_density" %in% resolved_tables) + ####----Build auto table entries for raw ACS codes----#### + if (length(raw_acs_codes) > 0) { + ## determine denominator mode and custom denominator + denominator_mode = denominator + custom_denominator = NULL + if (!denominator %in% c("parent", "total")) { + denominator_mode = "custom" + custom_denominator = denominator + } + + suppressMessages({suppressWarnings({ + auto_table_entries = purrr::map(raw_acs_codes, function(code) { + 
build_auto_table_entry( + table_code = code, + year = years[1], + denominator_mode = denominator_mode, + custom_denominator = custom_denominator, + census_variables = census_variables_for_resolve) + }) + })}) + names(auto_table_entries) = raw_acs_codes + } + ## collect raw ACS variables from the registry suppressWarnings({suppressMessages({ variables = collect_raw_variables(resolved_tables = resolved_tables, year = years[1]) })}) + ## append auto-table raw variables + if (length(auto_table_entries) > 0) { + auto_variables = purrr::map(auto_table_entries, ~ .x[["raw_variables"]]) %>% + unname() %>% unlist() + variables = c(variables, auto_variables) + } + ## default values for the states argument if (length(states) == 0) { states = tigris::fips_codes %>% @@ -296,16 +423,23 @@ this function returns.")} } }, .init = df_calculated_estimates) + ## apply auto-table definitions + if (length(auto_table_entries) > 0) { + df_calculated_estimates = purrr::reduce(auto_table_entries, function(.data, auto_entry) { + if (!is.null(auto_entry[["definitions"]]) && length(auto_entry[["definitions"]]) > 0) { + execute_definitions(.data, auto_entry[["definitions"]]) + } else { + .data + } + }, .init = df_calculated_estimates) + } + ####----Generate codebook----#### - ## generate codebook BEFORE the _pct rename so that regex matching in - ## expand_codebook_entry() works on the original _percent column names; - ## the codebook's internal rename (percent -> pct for Count variables) - ## produces the correct output names - codebook = generate_codebook(.data = df_calculated_estimates, resolved_tables = resolved_tables) + codebook = generate_codebook(.data = df_calculated_estimates, + resolved_tables = resolved_tables, + auto_table_entries = auto_table_entries) df_calculated_estimates = df_calculated_estimates %>% - ## these variable names end in "percent", but they're actually count estimates - dplyr::rename_with(.cols = dplyr::matches("household_income.*percent$"), .fn = ~ 
stringr::str_replace(., "percent$", "pct")) %>% ## ensure the vintage of the data and the GEOID for each observation are the first columns dplyr::select(data_source_year, GEOID, dplyr::everything()) @@ -319,30 +453,29 @@ this function returns.")} df_calculated_estimates = df_calculated_estimates %>% dplyr::left_join( ., - moes %>% - dplyr::rename_with(.cols = dplyr::matches("household_income.*percent_M$"), .fn = ~ stringr::str_replace(., "percent_M$", "pct_M")), + moes, by = c("GEOID", "data_source_year")) - ####----Calculate CVs----#### + ####----Calculate MOEs for derived variables----#### attr(df_calculated_estimates, "codebook") = codebook suppressMessages({suppressWarnings({ - df_cvs = calculate_cvs(df_calculated_estimates) %>% + df_moes = calculate_moes(df_calculated_estimates) %>% {if (!needs_tigris || spatial == FALSE) . else dplyr::right_join(., geometries %>% dplyr::select(GEOID, data_source_year), by = c("GEOID", "data_source_year"), relationship = "one-to-one")} })}) ## attach the codebook and resolved tables as attributes to the returned dataset - attr(df_cvs, "codebook") = codebook %>% + attr(df_moes, "codebook") = codebook %>% dplyr::select(calculated_variable, variable_type, definition, dplyr::everything()) - attr(df_cvs, "resolved_tables") = resolved_tables + attr(df_moes, "resolved_tables") = resolved_tables - if (isTRUE(spatial)) { df_cvs = sf::st_as_sf(df_cvs) } + if (isTRUE(spatial)) { df_moes = sf::st_as_sf(df_moes) } - return(df_cvs) + return(df_moes) } utils::globalVariables(c( "ALAND", "AWATER", "area_land_sq_kilometer", "area_water_sq_kilometer", "total_population_universe", "state", "GEOID", "data_source_year", ".", "state_code", "county_code", "county_fips", "state_name", "county", - "needs_tigris", "resolved_tables")) + "needs_tigris", "resolved_tables", "auto_table_entries")) diff --git a/R/generate_codebook.R b/R/generate_codebook.R index 971dfe4..c26b0b7 100644 --- a/R/generate_codebook.R +++ b/R/generate_codebook.R @@ -6,6 +6,8 @@ 
#' @param .data The dataset returned from \code{urbnindicators::compile_acs_data()}. #' @param resolved_tables A character vector of resolved table names from the #' table registry. When NULL (default), all registered tables are used. +#' @param auto_table_entries A list of auto-generated table entries from +#' \code{build_auto_table_entry()}. Default is an empty list. #' @returns A tibble containing the names and definitions of variables returned from #' \code{urbnindicators::compile_acs_data()}. #' @examples @@ -16,13 +18,13 @@ #' states = "NJ", #' counties = NULL, #' spatial = FALSE) %>% -#' dplyr::select(-dplyr::matches("_M$|_SE$|_CV$")) +#' dplyr::select(-dplyr::matches("_M$")) #' codebook = generate_codebook(.data = df) #' } #' @importFrom magrittr %>% #' @keywords internal -generate_codebook = function(.data, resolved_tables = NULL) { +generate_codebook = function(.data, resolved_tables = NULL, auto_table_entries = list()) { .data = .data %>% sf::st_drop_geometry() @@ -83,6 +85,19 @@ generate_codebook = function(.data, resolved_tables = NULL) { stringsAsFactors = FALSE) } + ## Add auto-table raw variables to crosswalk + if (length(auto_table_entries) > 0) { + auto_crosswalk_rows = purrr::map(auto_table_entries, function(auto_entry) { + raw_variable_codes = auto_entry[["raw_variables"]] + clean_names_vec = names(raw_variable_codes) %>% stringr::str_remove("_$") + data.frame( + raw_name = as.character(raw_variable_codes), + clean_name = clean_names_vec, + stringsAsFactors = FALSE) + }) + crosswalk_rows = c(crosswalk_rows, auto_crosswalk_rows) + } + variable_name_crosswalk = dplyr::bind_rows(crosswalk_rows) %>% dplyr::distinct(clean_name, .keep_all = TRUE) @@ -105,6 +120,26 @@ generate_codebook = function(.data, resolved_tables = NULL) { ~ expand_codebook_entry(entry = .x, .data = .data, crosswalk = variable_name_crosswalk)) %>% purrr::list_rbind() }) %>% purrr::list_rbind() + ## Expand auto-table definitions into codebook rows + if (length(auto_table_entries) 
> 0) { + auto_documentation = purrr::map(auto_table_entries, function(auto_entry) { + if (is.null(auto_entry[["definitions"]]) || length(auto_entry[["definitions"]]) == 0) { + return(tibble::tibble(calculated_variable = character(0), + variable_type = character(0), + definition = character(0), + numerator_vars = list(), + numerator_subtract_vars = list(), + denominator_vars = list(), + denominator_subtract_vars = list())) + } + purrr::map( + auto_entry[["definitions"]], + ~ expand_codebook_entry(entry = .x, .data = .data, crosswalk = variable_name_crosswalk)) %>% purrr::list_rbind() + }) %>% purrr::list_rbind() + + partial_documentation = dplyr::bind_rows(partial_documentation, auto_documentation) + } + ####----Raw Variables----#### ## collect all raw variable clean names from the resolved tables raw_variable_names = variable_name_crosswalk$clean_name %>% @@ -154,15 +189,7 @@ generate_codebook = function(.data, resolved_tables = NULL) { stringr::str_detect(calculated_variable, "quintile") ~ "Quintile ($)", stringr::str_detect(calculated_variable, "index") ~ "Index", is.na(variable_type) ~ "Count", - .default = variable_type)) %>% - dplyr::mutate( - calculated_variable = dplyr::if_else( - stringr::str_detect(calculated_variable, "percent$") & variable_type == "Count", - stringr::str_replace(calculated_variable, "percent$", "pct"), - calculated_variable), - definition = dplyr::case_when( - stringr::str_detect(definition, "household_income_by_gross_rent") ~ stringr::str_replace_all(definition, "percent ", "pct "), - TRUE ~ definition)) + .default = variable_type)) ####----Add SE Calculation Type and Aggregation Strategy----#### result1 = result1 %>% diff --git a/R/list_acs_variables.R b/R/list_acs_variables.R index 4e68f5e..887596a 100644 --- a/R/list_acs_variables.R +++ b/R/list_acs_variables.R @@ -57,7 +57,7 @@ filter_variables = function(variable_vector, match_string, match_type = "positiv #' `r lifecycle::badge("deprecated")` #' #' Use [list_variables()] instead 
to see available variables, or pass -#' `tables`/`indicators` to [compile_acs_data()]. +#' `tables` to [compile_acs_data()]. #' @param year The year for which variable names should be selected. #' @param tables An optional character vector of table names from the table #' registry (e.g., \code{c("race", "snap")}). When provided, only variables @@ -74,9 +74,54 @@ list_acs_variables = function(year = "2022", tables = NULL) { when = "0.1.0", what = "list_acs_variables()", with = "list_variables()", - details = "Use `list_variables()` to see available variables, or pass `tables`/`indicators` to `compile_acs_data()`." + details = "Use `list_variables()` to see available variables, or pass `tables` to `compile_acs_data()`." ) invisible(NULL) } +#' @title Browse the ACS codebook with clean variable names +#' @description Returns a tibble of ACS variables for the given year, with the +#' parent table code, raw variable code, and a cleaned snake_case name. +#' Useful for finding the table code to pass to +#' \code{compile_acs_data(tables = ...)}. +#' @param year A four-digit year for the five-year ACS estimates (default 2022). +#' @param table An optional ACS table code (e.g., \code{"B22003"}) to filter +#' results to a single table. +#' @returns A tibble with columns \code{table} (parent ACS table code), +#' \code{variable_raw} (ACS variable code), and \code{variable_clean} +#' (snake_case name produced by the package). 
+#' @examples +#' \dontrun{ +#' ## Browse all variables +#' get_acs_codebook() +#' +#' ## Filter to a specific table +#' get_acs_codebook(table = "B22003") +#' +#' ## Search for variables by keyword +#' get_acs_codebook() %>% dplyr::filter(stringr::str_detect(variable_clean, "snap")) +#' } +#' @export +get_acs_codebook = function(year = 2022, table = NULL) { + suppressWarnings({suppressMessages({ + census_variables = tidycensus::load_variables(year = year, dataset = "acs5") + })}) + + if (!is.null(table)) { + pattern = paste0("^", table, "_") + census_variables = census_variables %>% + dplyr::filter(stringr::str_detect(name, pattern)) + if (nrow(census_variables) == 0) { + stop(paste0("No variables found for table '", table, "' in year ", year, ".")) + } + } + + census_variables %>% + clean_acs_names() %>% + dplyr::transmute( + table = stringr::str_extract(name, "^[BC][0-9]{5}[A-I]?(?:PR)?"), + variable_raw = name, + variable_clean = stringr::str_remove(clean_names, "_$")) +} + utils::globalVariables(c("name", "concept", "label", "clean_names")) diff --git a/R/table_registry.R b/R/table_registry.R index 11c6dc1..41533f4 100644 --- a/R/table_registry.R +++ b/R/table_registry.R @@ -281,6 +281,9 @@ execute_definitions = function(.data, definitions) { #' requested via the \code{tables} parameter of \code{compile_acs_data()}. #' Multi-construct tables (e.g., \code{sex_by_age}) are reported as their #' individual constructs (e.g., \code{"age"} and \code{"sex"}). +#' Note: only pre-registered tables are listed here. Any valid ACS table code +#' (e.g., \code{"B25070"}) can also be passed to \code{compile_acs_data(tables = ...)} +#' and will be auto-processed. #' @returns A character vector of table names. 
#' @examples #' list_tables() @@ -293,52 +296,10 @@ list_tables = function() { sort(unique(all_names)) } -#' @title List available indicators -#' @description Returns a tibble of all derived indicator names and their -#' parent tables, for use with the \code{indicators} parameter of -#' \code{compile_acs_data()}. Table names reflect construct-level names -#' (e.g., \code{"age"} rather than \code{"sex_by_age"}). -#' @returns A tibble with columns \code{indicator} and \code{table}. -#' @keywords internal -list_indicators = function() { - purrr::map( - names(.table_registry$tables), - function(table_name) { - table_entry = .table_registry$tables[[table_name]] - indicators = character(0) - if (!is.null(table_entry[["definitions"]])) { - ## use [["output"]] to avoid partial matching (e.g., matching output_naming_function) - ## skip entries without a literal "output" field (across_percent, across_sum - ## produce dynamic output names that depend on the data) - indicators = purrr::compact( - purrr::map(table_entry[["definitions"]], function(entry) { - out = entry[["output"]] - if (is.character(out)) out else NULL - })) %>% - unlist() %>% - unique() - } - if (length(indicators) > 0) { - ## assign construct-level table names - constructs = table_entry[["constructs"]] - table_names = purrr::map_chr(indicators, function(indicator) { - if (!is.null(constructs)) { - assign_construct(indicator, constructs, table_name) - } else { - table_name - } - }) - tibble::tibble(indicator = indicators, table = table_names) - } else { - tibble::tibble(indicator = character(0), table = character(0)) - } - }) %>% purrr::list_rbind() -} - ## Resolve user selections to full set of internal table names (internal) -## Always includes total_population. Resolves indicator names to parent tables. +## Always includes total_population. ## Accepts both construct names (e.g., "age") and internal names (e.g., "sex_by_age"). 
-resolve_tables = function(tables = NULL, indicators = NULL) { +resolve_tables = function(tables = NULL) { resolved = "total_population" construct_map = build_construct_map() internal_names = names(.table_registry$tables) @@ -362,24 +323,6 @@ resolve_tables = function(tables = NULL, indicators = NULL) { resolved = union(resolved, mapped) } - if (!is.null(indicators)) { - indicator_table = list_indicators() - unknown_indicators = indicators[!indicators %in% indicator_table$indicator] - if (length(unknown_indicators) > 0) { - stop(paste0("Unknown indicator(s): ", paste0(unknown_indicators, collapse = ", "), - ". Use list_indicators() to see available indicators.")) - } - ## list_indicators() returns construct-level table names; map them to internal names - parent_constructs = indicator_table %>% - dplyr::filter(indicator %in% indicators) %>% - dplyr::pull(table) %>% - unique() - parent_tables = unique(purrr::map_chr(parent_constructs, function(construct_name) { - if (construct_name %in% names(construct_map)) construct_map[[construct_name]] else construct_name - })) - resolved = union(resolved, parent_tables) - } - ## resolve dependencies resolved = purrr::reduce(resolved, function(accumulated, table_name) { table_entry = get_table(table_name) @@ -431,7 +374,10 @@ collect_raw_variables = function(resolved_tables, year = 2022) { #' @title List all variables and their tables #' @description Returns a tibble mapping all variables (raw ACS variables and #' computed indicators) to their construct-level table name. This provides a -#' comprehensive view of every variable that \code{compile_acs_data()} produces. +#' comprehensive view of every variable that \code{compile_acs_data()} produces +#' for registered tables. Variables from unregistered ACS tables passed as +#' raw codes (e.g., \code{"B25070"}) are not included here; they are +#' auto-generated at runtime. #' @param year The ACS year used to resolve variable names (default 2022). 
#' @returns A tibble with columns \code{variable} and \code{table}. #' @examples @@ -511,11 +457,7 @@ list_variables = function(year = 2022) { computed_names = result$computed } - ## Step 3: apply household_income percent -> pct rename (matches compile_acs_data behavior) - raw_clean_names = stringr::str_replace(raw_clean_names, "^(household_income.*)percent$", "\\1pct") - computed_names = stringr::str_replace(computed_names, "^(household_income.*)percent$", "\\1pct") - - ## Step 4: assign each variable to a construct + ## Step 3: assign each variable to a construct all_variable_names = unique(c(raw_clean_names, computed_names)) purrr::map(all_variable_names, function(variable_name) { if (!is.null(constructs)) { @@ -1167,27 +1109,27 @@ register_table(list( raw_variables = NULL, definitions = list( define_percent("cost_burdened_30percentormore_allincomes_percent", - numerator_regex = "household_income_by_gross_rent.*(30_0|35_0|40_0|50_0).*(percent)", + numerator_regex = "household_income_by_gross_rent.*(30_0|35_0|40_0|50_0).*(pct)", denominator_regex = "household_income_by_gross_rent.*([0-9]$|100000_more$)", subtract_regex = "household_income.*not_computed"), define_percent("cost_burdened_50percentormore_allincomes_percent", - numerator_regex = "household_income_by_gross_rent.*50_0.*percent", + numerator_regex = "household_income_by_gross_rent.*50_0.*pct", denominator_regex = "household_income_by_gross_rent.*([0-9]$|100000_more$)", subtract_regex = "household_income.*not_computed"), define_percent("cost_burdened_30percentormore_incomeslessthan35000_percent", - numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999).*(30_0|35_0|40_0|50_0).*(percent)", + numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999).*(30_0|35_0|40_0|50_0).*(pct)", denominator_regex = "household_income_by_gross_rent.*(10000|19999|34999)$", subtract_regex = "household_income.*(10000_|19999|34999).*not_computed"), 
define_percent("cost_burdened_50percentormore_incomeslessthan35000_percent", - numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999).*50_0.*(percent)", + numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999).*50_0.*(pct)", denominator_regex = "household_income_by_gross_rent.*(10000|19999|34999)$", subtract_regex = "household_income.*(10000_|19999|34999).*not_computed"), define_percent("cost_burdened_30percentormore_incomeslessthan50000_percent", - numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999|49999).*(30_0|35_0|40_0|50_0).*(percent)", + numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999|49999).*(30_0|35_0|40_0|50_0).*(pct)", denominator_regex = "household_income_by_gross_rent.*(10000|19999|34999|49999)$", subtract_regex = "household_income.*(10000_|19999|34999|49999).*not_computed"), define_percent("cost_burdened_50percentormore_incomeslessthan50000_percent", - numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999|49999).*50_0.*percent", + numerator_regex = "household_income_by_gross_rent.*(10000_|19999|34999|49999).*50_0.*pct", denominator_regex = "household_income_by_gross_rent.*(10000|19999|34999|49999)$", subtract_regex = "household_income.*(10000_|19999|34999|49999).*not_computed")) )) diff --git a/R/utils-clean-names.R b/R/utils-clean-names.R index 0e20210..23a2326 100644 --- a/R/utils-clean-names.R +++ b/R/utils-clean-names.R @@ -41,7 +41,8 @@ clean_acs_names = function(variables_df) { "american_indian_alaska_native" = "aian", "black_african_american" = "black", "household_income_by_gross_rent_as_a_percentage_of_household_income_in_the_past_12_months" = - "household_income_by_gross_rent_as_a_percentage_of_household_income")), + "household_income_by_gross_rent_as_a_percentage_of_household_income", + "_percent$" = "_pct")), clean_names = dplyr::if_else( label %in% c("Estimate!!Total:", "Estimate!!Total"), paste0(clean_names, "_universe_"), diff --git a/README.Rmd 
b/README.Rmd index bae9d07..dc7151d 100644 --- a/README.Rmd +++ b/README.Rmd @@ -40,8 +40,8 @@ With a single function call, you get: - Access to hundreds of standardized variables, such as percentages and the raw count variables used to produce them. -- Margins of error and coefficients of variation for all - variables--those direct from the API as well as derived variables. +- Margins of error for all variables--those direct from the API as + well as derived variables. - Meaningful, consistent variable names. @@ -119,6 +119,8 @@ df = compile_acs_data( states = "NJ") glimpse(df) |> head(10) + +?compile_acs_data ``` ## Visualize Data @@ -138,7 +140,7 @@ plot_data = df %>% transmute( county_name = NAME %>% str_remove(" County, New Jersey"), race_personofcolor_percent, - race_personofcolor_percent_SE, + race_personofcolor_percent_M, data_source_year = factor(data_source_year)) state_averages = plot_data %>% @@ -179,7 +181,7 @@ ggplot() + x = county_name, ydist = distributional::dist_normal( race_personofcolor_percent, - race_personofcolor_percent_SE), + race_personofcolor_percent_M / 1.645), color = data_source_year), point_size = 2, .width = .95) + diff --git a/README.md b/README.md index c5d920c..286cd77 100644 --- a/README.md +++ b/README.md @@ -24,8 +24,8 @@ With a single function call, you get: - Access to hundreds of standardized variables, such as percentages and the raw count variables used to produce them. -- Margins of error and coefficients of variation for all variables–those - direct from the API as well as derived variables. +- Margins of error for all variables–those direct from the API as well + as derived variables. - Meaningful, consistent variable names. 
diff --git a/debug_auto.R b/debug_auto.R new file mode 100644 index 0000000..f49492c --- /dev/null +++ b/debug_auto.R @@ -0,0 +1,61 @@ +## Debug auto table integration - insert debug prints into compile_acs_data + +devtools::load_all() + +## Temporarily override execute_definition to add debug info +orig_execute_definition = environment(compile_acs_data)$execute_definition + +## Run compile_acs_data and capture the error with full debug info +tryCatch({ + result = compile_acs_data( + tables = "B25070", + years = 2022, + geography = "state", + states = "DC") + cat("SUCCESS\n") + print(colnames(result)[1:10]) +}, error = function(e) { + cat("ERROR:", conditionMessage(e), "\n\n") + + ## Now debug: build the variables ourselves and check naming + entry = build_auto_table_entry("B25070", year = 2022) + tp_vars = suppressMessages(suppressWarnings( + collect_raw_variables(resolved_tables = "total_population", year = 2022))) + auto_vars = entry$raw_variables + all_vars = c(tp_vars, auto_vars) + + cat("Variable keys:\n") + print(names(all_vars)) + + ## Get the actual data the same way compile_acs_data does + df = suppressMessages(tidycensus::get_acs( + geography = "state", + variables = all_vars, + year = 2022, + state = "DC", + survey = "acs5", + output = "wide")) + + df = df %>% + dplyr::mutate(data_source_year = 2022) + + moes = df %>% dplyr::select(GEOID, data_source_year, dplyr::matches("_M$")) + + df = df %>% + dplyr::select(-dplyr::matches("_M$")) %>% + dplyr::rename_with(~ stringr::str_remove(.x, "_E$")) + + cat("\nActual column names:\n") + print(colnames(df)) + + cat("\nFirst definition:\n") + print(entry$definitions[[1]]) + + ## Test execute_definitions + tryCatch({ + df2 = execute_definitions(df, entry$definitions) + cat("\nexecute_definitions succeeded\n") + }, error = function(e2) { + cat("\nexecute_definitions failed:", conditionMessage(e2), "\n") + }) +}) diff --git a/man/calculate_cvs.Rd b/man/calculate_cvs.Rd deleted file mode 100644 index d29d567..0000000 
--- a/man/calculate_cvs.Rd +++ /dev/null @@ -1,25 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/calculate_cvs.R -\name{calculate_cvs} -\alias{calculate_cvs} -\title{Calculate coefficients of variation} -\usage{ -calculate_cvs(.df) -} -\arguments{ -\item{.df}{The dataset returned from \code{compile_acs_data()}. -The argument to this parameter must have an attribute named \code{codebook} (as is -true of results from \code{compile_acs_data())}.} -} -\value{ -A modified dataframe that includes newly calculated indicators. -} -\description{ -Calculate coefficients of variation -} -\details{ -Create CVs for all ACS estimates and derived indicators. -Uses pre-parsed codebook columns (numerator_vars, denominator_vars, -se_calculation_type) to determine how to calculate standard errors. -} -\keyword{internal} diff --git a/man/calculate_moes.Rd b/man/calculate_moes.Rd new file mode 100644 index 0000000..be76c34 --- /dev/null +++ b/man/calculate_moes.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/calculate_cvs.R +\name{calculate_moes} +\alias{calculate_moes} +\title{Calculate margins of error for derived variables} +\usage{ +calculate_moes(.df) +} +\arguments{ +\item{.df}{The dataset returned from \code{compile_acs_data()}. +The argument to this parameter must have an attribute named \code{codebook} (as is +true of results from \code{compile_acs_data())}.} +} +\value{ +A modified dataframe that includes margins of error (suffixed +\code{_M}) for derived variables. +} +\description{ +Calculate margins of error for derived variables +} +\details{ +Calculates margins of error for all derived ACS estimates. Standard +errors are computed internally as an intermediate step but are not included +in the returned dataframe. Uses pre-parsed codebook columns +(numerator_vars, denominator_vars, se_calculation_type) to determine how +to calculate standard errors. 
+} +\keyword{internal} diff --git a/man/compile_acs_data.Rd b/man/compile_acs_data.Rd index bd182de..e96402b 100644 --- a/man/compile_acs_data.Rd +++ b/man/compile_acs_data.Rd @@ -6,23 +6,34 @@ \usage{ compile_acs_data( tables = NULL, - indicators = NULL, - years = c(2022), + years = c(2024), geography = "county", states = NULL, counties = NULL, spatial = FALSE, + denominator = "parent", ... ) } \arguments{ -\item{tables}{A character vector of table names to include (e.g., -\code{c("race", "snap")}). Use \code{list_tables()} to see available tables. -When NULL (default) and \code{indicators} is also NULL, all tables are included.} - -\item{indicators}{A character vector of indicator names to include (e.g., -\code{c("snap_received_percent")}). Each indicator's parent table is -automatically included.} +\item{tables}{A character vector of table names to include. Two formats are +accepted: +\itemize{ +\item \strong{Registered table names} (e.g., \code{"race"}, \code{"snap"}). +These are pre-built tables with curated variable definitions. Use +\code{list_tables()} to see all available registered tables. +\item \strong{Raw ACS table codes} (e.g., \code{"B25070"}, \code{"C15002B"}). +Any valid ACS Detailed or Collapsed table code can be passed directly. +These are auto-processed at runtime: raw variables are fetched, the +label hierarchy is parsed, and percentages are computed automatically. +Use the \code{denominator} parameter to control how percentages are +calculated for these tables. +} +Both formats can be mixed freely (e.g., \code{c("snap", "B25070")}). +If an ACS code corresponds to an already-registered table, the registered +version is used automatically. 
+When NULL (default), all registered tables are included (unregistered ACS +tables must be requested explicitly).} \item{years}{A numeric vector of four-digit years for which to pull five-year American Community Survey estimates.} @@ -40,6 +51,13 @@ state(s) specified in the \code{states} parameter will be included.} \item{spatial}{Boolean. Return a simple features (sf), spatially-enabled dataframe?} +\item{denominator}{Controls how auto-computed percentages choose their +denominator. \code{"parent"} (default) uses the nearest parent subtotal from +the ACS label hierarchy. \code{"total"} uses the table total (variable +\code{_001}). A specific ACS variable code (e.g., \code{"B25070_001"}) uses +that variable. Only affects unregistered (auto) tables; registered tables +always use their predefined definitions.} + \item{...}{Deprecated arguments. If \code{variables} is passed, a deprecation warning is issued and the value is ignored.} } @@ -63,9 +81,17 @@ df = compile_acs_data(years = c(2022), geography = "county", states = "NJ") df = compile_acs_data(tables = c("race", "snap"), years = 2022, geography = "county", states = "NJ") -## Pull by indicator name (returns the full parent table) -df = compile_acs_data(indicators = c("snap_received_percent"), - years = 2022, geography = "county", states = "NJ") +## Pull an unregistered ACS table by code +df = compile_acs_data(tables = "B25070", years = 2022, + geography = "state", states = "DC") + +## Mix registered and unregistered tables +df = compile_acs_data(tables = c("snap", "B25070"), years = 2022, + geography = "state", states = "DC") + +## Use table total as denominator instead of parent subtotals +df = compile_acs_data(tables = "B25070", denominator = "total", + years = 2022, geography = "state", states = "DC") } } \seealso{ diff --git a/man/cv.Rd b/man/cv.Rd deleted file mode 100644 index b7f4704..0000000 --- a/man/cv.Rd +++ /dev/null @@ -1,22 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit 
documentation in R/calculate_cvs.R -\name{cv} -\alias{cv} -\title{Calculate a coefficient of variation} -\usage{ -cv(estimate, se) -} -\arguments{ -\item{estimate}{The estimate} - -\item{se}{The standard error} -} -\value{ -A coefficient of variation at the 90\% level -} -\description{ -Calculate a coefficient of variation -} -\details{ -Return a coefficient of variation at the 90\% level -} diff --git a/man/generate_codebook.Rd b/man/generate_codebook.Rd index 1a2888d..72405b5 100644 --- a/man/generate_codebook.Rd +++ b/man/generate_codebook.Rd @@ -4,13 +4,16 @@ \alias{generate_codebook} \title{Document variables from \code{compile_acs_data()}} \usage{ -generate_codebook(.data, resolved_tables = NULL) +generate_codebook(.data, resolved_tables = NULL, auto_table_entries = list()) } \arguments{ \item{.data}{The dataset returned from \code{urbnindicators::compile_acs_data()}.} \item{resolved_tables}{A character vector of resolved table names from the table registry. When NULL (default), all registered tables are used.} + +\item{auto_table_entries}{A list of auto-generated table entries from +\code{build_auto_table_entry()}. 
Default is an empty list.} } \value{ A tibble containing the names and definitions of variables returned from @@ -32,7 +35,7 @@ df = compile_acs_data( states = "NJ", counties = NULL, spatial = FALSE) \%>\% - dplyr::select(-dplyr::matches("_M$|_SE$|_CV$")) + dplyr::select(-dplyr::matches("_M$")) codebook = generate_codebook(.data = df) } } diff --git a/man/get_acs_codebook.Rd b/man/get_acs_codebook.Rd new file mode 100644 index 0000000..10d80a4 --- /dev/null +++ b/man/get_acs_codebook.Rd @@ -0,0 +1,37 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/list_acs_variables.R +\name{get_acs_codebook} +\alias{get_acs_codebook} +\title{Browse the ACS codebook with clean variable names} +\usage{ +get_acs_codebook(year = 2022, table = NULL) +} +\arguments{ +\item{year}{A four-digit year for the five-year ACS estimates (default 2022).} + +\item{table}{An optional ACS table code (e.g., \code{"B22003"}) to filter +results to a single table.} +} +\value{ +A tibble with columns \code{table} (parent ACS table code), +\code{variable_raw} (ACS variable code), and \code{variable_clean} +(snake_case name produced by the package). +} +\description{ +Returns a tibble of ACS variables for the given year, with the +parent table code, raw variable code, and a cleaned snake_case name. +Useful for finding the table code to pass to +\code{compile_acs_data(tables = ...)}. 
+} +\examples{ +\dontrun{ +## Browse all variables +get_acs_codebook() + +## Filter to a specific table +get_acs_codebook(table = "B22003") + +## Search for variables by keyword +get_acs_codebook() \%>\% dplyr::filter(stringr::str_detect(variable_clean, "snap")) +} +} diff --git a/man/list_acs_variables.Rd b/man/list_acs_variables.Rd index b8f5b5f..765a343 100644 --- a/man/list_acs_variables.Rd +++ b/man/list_acs_variables.Rd @@ -21,7 +21,7 @@ available table names.} \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} Use \code{\link[=list_variables]{list_variables()}} instead to see available variables, or pass -\code{tables}/\code{indicators} to \code{\link[=compile_acs_data]{compile_acs_data()}}. +\code{tables} to \code{\link[=compile_acs_data]{compile_acs_data()}}. } \examples{ \dontrun{ diff --git a/man/list_indicators.Rd b/man/list_indicators.Rd deleted file mode 100644 index 2aebb50..0000000 --- a/man/list_indicators.Rd +++ /dev/null @@ -1,18 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/table_registry.R -\name{list_indicators} -\alias{list_indicators} -\title{List available indicators} -\usage{ -list_indicators() -} -\value{ -A tibble with columns \code{indicator} and \code{table}. -} -\description{ -Returns a tibble of all derived indicator names and their -parent tables, for use with the \code{indicators} parameter of -\code{compile_acs_data()}. Table names reflect construct-level names -(e.g., \code{"age"} rather than \code{"sex_by_age"}). -} -\keyword{internal} diff --git a/man/list_tables.Rd b/man/list_tables.Rd index 746d966..9cc6f3f 100644 --- a/man/list_tables.Rd +++ b/man/list_tables.Rd @@ -14,6 +14,9 @@ Returns the names of all registered ACS tables that can be requested via the \code{tables} parameter of \code{compile_acs_data()}. 
Multi-construct tables (e.g., \code{sex_by_age}) are reported as their individual constructs (e.g., \code{"age"} and \code{"sex"}). +Note: only pre-registered tables are listed here. Any valid ACS table code +(e.g., \code{"B25070"}) can also be passed to \code{compile_acs_data(tables = ...)} +and will be auto-processed. } \examples{ list_tables() diff --git a/man/list_variables.Rd b/man/list_variables.Rd index 381056c..40d52cd 100644 --- a/man/list_variables.Rd +++ b/man/list_variables.Rd @@ -15,7 +15,10 @@ A tibble with columns \code{variable} and \code{table}. \description{ Returns a tibble mapping all variables (raw ACS variables and computed indicators) to their construct-level table name. This provides a -comprehensive view of every variable that \code{compile_acs_data()} produces. +comprehensive view of every variable that \code{compile_acs_data()} produces +for registered tables. Variables from unregistered ACS tables passed as +raw codes (e.g., \code{"B25070"}) are not included here; they are +auto-generated at runtime. 
} \examples{ \dontrun{ diff --git a/renv.lock b/renv.lock index c0d9724..0d5e238 100644 --- a/renv.lock +++ b/renv.lock @@ -145,6 +145,24 @@ "Maintainer": "Winston Chang ", "Repository": "CRAN" }, + "RColorBrewer": { + "Package": "RColorBrewer", + "Version": "1.1-3", + "Source": "Repository", + "Date": "2022-04-03", + "Title": "ColorBrewer Palettes", + "Authors@R": "c(person(given = \"Erich\", family = \"Neuwirth\", role = c(\"aut\", \"cre\"), email = \"erich.neuwirth@univie.ac.at\"))", + "Author": "Erich Neuwirth [aut, cre]", + "Maintainer": "Erich Neuwirth ", + "Depends": [ + "R (>= 2.0.0)" + ], + "Description": "Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at http://colorbrewer2.org.", + "License": "Apache License 2.0", + "NeedsCompilation": "no", + "Repository": "https://packagemanager.posit.co/cran/latest", + "Encoding": "UTF-8" + }, "Rcpp": { "Package": "Rcpp", "Version": "1.1.1", @@ -176,7 +194,68 @@ "NeedsCompilation": "yes", "Author": "Dirk Eddelbuettel [aut, cre] (ORCID: ), Romain Francois [aut] (ORCID: ), JJ Allaire [aut] (ORCID: ), Kevin Ushey [aut] (ORCID: ), Qiang Kou [aut] (ORCID: ), Nathan Russell [aut], Iñaki Ucar [aut] (ORCID: ), Doug Bates [aut] (ORCID: ), John Chambers [aut]", "Maintainer": "Dirk Eddelbuettel ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "Rttf2pt1": { + "Package": "Rttf2pt1", + "Version": "1.3.12", + "Source": "GitHub", + "Title": "'ttf2pt1' Program", + "Author": "Winston Chang, Andrew Weeks, Frank M. Siegert, Mark Heath, Thomas Henlick, Sergey Babkin, Turgut Uyar, Rihardas Hepas, Szalay Tamas, Johan Vromans, Petr Titera, Lei Wang, Chen Xiangyang, Zvezdan Petkovic, Rigel, I. Lee Hetherington", + "Maintainer": "Winston Chang ", + "Description": "Contains the program 'ttf2pt1', for use with the 'extrafont' package. 
This product includes software developed by the 'TTF2PT1' Project and its contributors.", + "Depends": [ + "R (>= 2.15)" + ], + "License": "file LICENSE", + "URL": "https://github.com/wch/Rttf2pt1", + "Encoding": "UTF-8", + "RoxygenNote": "7.2.3", + "RemoteType": "github", + "RemoteHost": "api.github.com", + "RemoteUsername": "wch", + "RemoteRepo": "Rttf2pt1", + "RemoteRef": "main", + "RemoteSha": "f625326af9783f6ae4d42cc5302dd6f2968e008f" + }, + "S7": { + "Package": "S7", + "Version": "0.2.1", + "Source": "Repository", + "Title": "An Object Oriented System Meant to Become a Successor to S3 and S4", + "Authors@R": "c( person(\"Object-Oriented Programming Working Group\", role = \"cph\"), person(\"Davis\", \"Vaughan\", role = \"aut\"), person(\"Jim\", \"Hester\", role = \"aut\", comment = c(ORCID = \"0000-0002-2739-7082\")), person(\"Tomasz\", \"Kalinowski\", role = \"aut\"), person(\"Will\", \"Landau\", role = \"aut\"), person(\"Michael\", \"Lawrence\", role = \"aut\"), person(\"Martin\", \"Maechler\", role = \"aut\", comment = c(ORCID = \"0000-0002-8685-9910\")), person(\"Luke\", \"Tierney\", role = \"aut\"), person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0003-4757-117X\")) )", + "Description": "A new object oriented programming system designed to be a successor to S3 and S4. It includes formal class, generic, and method specification, and a limited form of multiple dispatch. 
It has been designed and implemented collaboratively by the R Consortium Object-Oriented Programming Working Group, which includes representatives from R-Core, 'Bioconductor', 'Posit'/'tidyverse', and the wider R community.", + "License": "MIT + file LICENSE", + "URL": "https://rconsortium.github.io/S7/, https://github.com/RConsortium/S7", + "BugReports": "https://github.com/RConsortium/S7/issues", + "Depends": [ + "R (>= 3.5.0)" + ], + "Imports": [ + "utils" + ], + "Suggests": [ + "bench", + "callr", + "covr", + "knitr", + "methods", + "rmarkdown", + "testthat (>= 3.2.0)", + "tibble" + ], + "VignetteBuilder": "knitr", + "Config/build/compilation-database": "true", + "Config/Needs/website": "sloop", + "Config/testthat/edition": "3", + "Config/testthat/parallel": "TRUE", + "Config/testthat/start-first": "external-generic", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.3", + "NeedsCompilation": "yes", + "Author": "Object-Oriented Programming Working Group [cph], Davis Vaughan [aut], Jim Hester [aut] (ORCID: ), Tomasz Kalinowski [aut], Will Landau [aut], Michael Lawrence [aut], Martin Maechler [aut] (ORCID: ), Luke Tierney [aut], Hadley Wickham [aut, cre] (ORCID: )", + "Maintainer": "Hadley Wickham ", + "Repository": "https://packagemanager.posit.co/cran/latest" }, "askpass": { "Package": "askpass", @@ -203,6 +282,27 @@ "Maintainer": "Jeroen Ooms ", "Repository": "CRAN" }, + "base64enc": { + "Package": "base64enc", + "Version": "0.1-6", + "Source": "Repository", + "Title": "Tools for 'base64' Encoding", + "Author": "Simon Urbanek [aut, cre, cph] (https://urbanek.nz, ORCID: )", + "Authors@R": "person(\"Simon\", \"Urbanek\", role=c(\"aut\",\"cre\",\"cph\"), email=\"Simon.Urbanek@r-project.org\", comment=c(\"https://urbanek.nz\", ORCID=\"0000-0003-2297-1732\"))", + "Maintainer": "Simon Urbanek ", + "Depends": [ + "R (>= 2.9.0)" + ], + "Enhances": [ + "png" + ], + "Description": "Tools for handling 'base64' encoding. 
It is more flexible than the orphaned 'base64' package.", + "License": "GPL-2 | GPL-3", + "URL": "https://www.rforge.net/base64enc", + "BugReports": "https://github.com/s-u/base64enc/issues", + "NeedsCompilation": "yes", + "Repository": "CRAN" + }, "bit": { "Package": "bit", "Version": "4.6.0", @@ -270,6 +370,32 @@ "Maintainer": "Michael Chirico ", "Repository": "CRAN" }, + "cachem": { + "Package": "cachem", + "Version": "1.1.0", + "Source": "Repository", + "Title": "Cache R Objects with Automatic Pruning", + "Description": "Key-value stores with automatic pruning. Caches can limit either their total size or the age of the oldest object (or both), automatically pruning objects to maintain the constraints.", + "Authors@R": "c( person(\"Winston\", \"Chang\", , \"winston@posit.co\", c(\"aut\", \"cre\")), person(family = \"Posit Software, PBC\", role = c(\"cph\", \"fnd\")))", + "License": "MIT + file LICENSE", + "Encoding": "UTF-8", + "ByteCompile": "true", + "URL": "https://cachem.r-lib.org/, https://github.com/r-lib/cachem", + "Imports": [ + "rlang", + "fastmap (>= 1.2.0)" + ], + "Suggests": [ + "testthat" + ], + "RoxygenNote": "7.2.3", + "Config/Needs/routine": "lobstr", + "Config/Needs/website": "pkgdown", + "NeedsCompilation": "yes", + "Author": "Winston Chang [aut, cre], Posit Software, PBC [cph, fnd]", + "Maintainer": "Winston Chang ", + "Repository": "CRAN" + }, "class": { "Package": "class", "Version": "7.3-23", @@ -410,6 +536,43 @@ "Maintainer": "Matthew Lincoln ", "Repository": "CRAN" }, + "conflicted": { + "Package": "conflicted", + "Version": "1.2.0", + "Source": "Repository", + "Title": "An Alternative Conflict Resolution Strategy", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@rstudio.com\", role = c(\"aut\", \"cre\")), person(\"RStudio\", role = c(\"cph\", \"fnd\")) )", + "Description": "R's default conflict management system gives the most recently loaded package precedence. 
This can make it hard to detect conflicts, particularly when they arise because a package update creates ambiguity that did not previously exist. 'conflicted' takes a different approach, making every conflict an error and forcing you to choose which function to use.", + "License": "MIT + file LICENSE", + "URL": "https://conflicted.r-lib.org/, https://github.com/r-lib/conflicted", + "BugReports": "https://github.com/r-lib/conflicted/issues", + "Depends": [ + "R (>= 3.2)" + ], + "Imports": [ + "cli (>= 3.4.0)", + "memoise", + "rlang (>= 1.0.0)" + ], + "Suggests": [ + "callr", + "covr", + "dplyr", + "Matrix", + "methods", + "pkgload", + "testthat (>= 3.0.0)", + "withr" + ], + "Config/Needs/website": "tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Encoding": "UTF-8", + "RoxygenNote": "7.2.3", + "NeedsCompilation": "no", + "Author": "Hadley Wickham [aut, cre], RStudio [cph, fnd]", + "Maintainer": "Hadley Wickham ", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, "cpp11": { "Package": "cpp11", "Version": "0.5.3", @@ -455,7 +618,7 @@ "NeedsCompilation": "no", "Author": "Davis Vaughan [aut, cre] (ORCID: ), Jim Hester [aut] (ORCID: ), Romain François [aut] (ORCID: ), Benjamin Kietzman [ctb], Posit Software, PBC [cph, fnd]", "Maintainer": "Davis Vaughan ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "crayon": { "Package": "crayon", @@ -521,6 +684,49 @@ "Maintainer": "Jeroen Ooms ", "Repository": "CRAN" }, + "distributional": { + "Package": "distributional", + "Version": "0.6.0", + "Source": "Repository", + "Title": "Vectorised Probability Distributions", + "Authors@R": "c(person(given = \"Mitchell\", family = \"O'Hara-Wild\", role = c(\"aut\", \"cre\"), email = \"mail@mitchelloharawild.com\", comment = c(ORCID = \"0000-0001-6729-7695\")), person(given = \"Matthew\", family = \"Kay\", role = c(\"aut\"), comment = c(ORCID = \"0000-0001-9446-0419\")), person(given = \"Alex\", family = \"Hayes\", role 
= c(\"aut\"), comment = c(ORCID = \"0000-0002-4985-5160\")), person(given = \"Rob\", family = \"Hyndman\", role = c(\"aut\"), comment = c(ORCID = \"0000-0002-2140-5352\")), person(given = \"Earo\", family = \"Wang\", role = c(\"ctb\"), comment = c(ORCID = \"0000-0001-6448-5260\")), person(given = \"Vencislav\", family = \"Popov\", role = c(\"ctb\"), comment = c(ORCID = \"0000-0002-8073-4199\")))", + "Description": "Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.", + "License": "GPL-3", + "Depends": [ + "R (>= 4.0.0)" + ], + "Imports": [ + "vctrs (>= 0.3.0)", + "rlang (>= 0.4.5)", + "generics", + "stats", + "numDeriv", + "utils", + "lifecycle", + "pillar" + ], + "Suggests": [ + "testthat (>= 2.1.0)", + "covr", + "mvtnorm", + "actuar (>= 2.0.0)", + "evd", + "ggdist", + "ggplot2", + "gk", + "pkgdown" + ], + "RdMacros": "lifecycle", + "URL": "https://pkg.mitchelloharawild.com/distributional/, https://github.com/mitchelloharawild/distributional", + "BugReports": "https://github.com/mitchelloharawild/distributional/issues", + "Encoding": "UTF-8", + "Language": "en-GB", + "RoxygenNote": "7.3.3", + "NeedsCompilation": "no", + "Author": "Mitchell O'Hara-Wild [aut, cre] (ORCID: ), Matthew Kay [aut] (ORCID: ), Alex Hayes [aut] (ORCID: ), Rob Hyndman [aut] (ORCID: ), Earo Wang [ctb] (ORCID: ), Vencislav Popov [ctb] (ORCID: )", + "Maintainer": "Mitchell O'Hara-Wild ", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, "dplyr": { "Package": "dplyr", "Version": "1.2.0", @@ -616,6 +822,108 @@ "Repository": "CRAN", "Encoding": "UTF-8" }, + 
"extrafont": { + "Package": "extrafont", + "Version": "0.19", + "Source": "GitHub", + "Title": "Tools for Using Fonts", + "Author": "Winston Chang ", + "Maintainer": "Winston Chang ", + "Description": "Tools to using fonts other than the standard PostScript fonts. This package makes it easy to use system TrueType fonts and with PDF or PostScript output files, and with bitmap output files in Windows. extrafont can also be used with fonts packaged specifically to be used with, such as the fontcm package, which has Computer Modern PostScript fonts with math symbols.", + "Depends": [ + "R (>= 2.15)" + ], + "Imports": [ + "extrafontdb", + "grDevices", + "utils", + "Rttf2pt1" + ], + "Suggests": [ + "fontcm" + ], + "License": "GPL-2", + "URL": "https://github.com/wch/extrafont", + "RoxygenNote": "7.1.2", + "RemoteType": "github", + "RemoteHost": "api.github.com", + "RemoteUsername": "wch", + "RemoteRepo": "extrafont", + "RemoteRef": "master", + "RemoteSha": "028fc67103b14318410ad84fa182acc3975b54f2", + "Remotes": "Rttf2pt1=github::wch/Rttf2pt1" + }, + "extrafontdb": { + "Package": "extrafontdb", + "Version": "1.1", + "Source": "Repository", + "Type": "Package", + "Title": "Holding the Database for the 'extrafont' Package", + "Date": "2025-09-25", + "Depends": [ + "R (>= 2.14)" + ], + "Suggests": [ + "testthat (>= 3.0.0)" + ], + "Authors@R": "c( person(given = \"Winston\", family= \"Chang\", role = c(\"aut\")), person(given = \"Frederic\", family= \"Bertrand\", role = c(\"cre\"), email = \"frederic.bertrand@lecnam.net\", comment = c(ORCID = \"0000-0002-0837-8281\")) )", + "Author": "Winston Chang [aut], Frederic Bertrand [cre] (ORCID: )", + "Maintainer": "Frederic Bertrand ", + "Description": "It is meant to be used with the 'extrafont' package. 
The 'extrafont' package contains the code to install and use fonts, while the 'extrafontdb' package contains the font database.", + "License": "GPL-2", + "LazyLoad": "yes", + "NeedsCompilation": "no", + "URL": "https://github.com/fbertran/extrafontdb", + "BugReports": "https://github.com/fbertran/extrafontdb/issues", + "RoxygenNote": "7.3.3", + "Encoding": "UTF-8", + "Config/testthat/edition": "3", + "Collate": "'extrafontdb.r'", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "farver": { + "Package": "farver", + "Version": "2.1.2", + "Source": "Repository", + "Type": "Package", + "Title": "High Performance Colour Space Manipulation", + "Authors@R": "c( person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Berendea\", \"Nicolae\", role = \"aut\", comment = \"Author of the ColorSpace C++ library\"), person(\"Romain\", \"François\", , \"romain@purrple.cat\", role = \"aut\", comment = c(ORCID = \"0000-0002-2444-4226\")), person(\"Posit, PBC\", role = c(\"cph\", \"fnd\")) )", + "Description": "The encoding of colour can be handled in many different ways, using different colour spaces. As different colour spaces have different uses, efficient conversion between these representations are important. 
The 'farver' package provides a set of functions that gives access to very fast colour space conversion and comparisons implemented in C++, and offers speed improvements over the 'convertColor' function in the 'grDevices' package.", + "License": "MIT + file LICENSE", + "URL": "https://farver.data-imaginist.com, https://github.com/thomasp85/farver", + "BugReports": "https://github.com/thomasp85/farver/issues", + "Suggests": [ + "covr", + "testthat (>= 3.0.0)" + ], + "Config/testthat/edition": "3", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.1", + "NeedsCompilation": "yes", + "Author": "Thomas Lin Pedersen [cre, aut] (), Berendea Nicolae [aut] (Author of the ColorSpace C++ library), Romain François [aut] (), Posit, PBC [cph, fnd]", + "Maintainer": "Thomas Lin Pedersen ", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "fastmap": { + "Package": "fastmap", + "Version": "1.2.0", + "Source": "Repository", + "Title": "Fast Data Structures", + "Authors@R": "c( person(\"Winston\", \"Chang\", email = \"winston@posit.co\", role = c(\"aut\", \"cre\")), person(given = \"Posit Software, PBC\", role = c(\"cph\", \"fnd\")), person(given = \"Tessil\", role = \"cph\", comment = \"hopscotch_map library\") )", + "Description": "Fast implementation of data structures, including a key-value store, stack, and queue. Environments are commonly used as key-value stores in R, but every time a new key is used, it is added to R's global symbol table, causing a small amount of memory leakage. This can be problematic in cases where many different keys are used. 
Fastmap avoids this memory leak issue by implementing the map using data structures in C++.", + "License": "MIT + file LICENSE", + "Encoding": "UTF-8", + "RoxygenNote": "7.2.3", + "Suggests": [ + "testthat (>= 2.1.1)" + ], + "URL": "https://r-lib.github.io/fastmap/, https://github.com/r-lib/fastmap", + "BugReports": "https://github.com/r-lib/fastmap/issues", + "NeedsCompilation": "yes", + "Author": "Winston Chang [aut, cre], Posit Software, PBC [cph, fnd], Tessil [cph] (hopscotch_map library)", + "Maintainer": "Winston Chang ", + "Repository": "CRAN" + }, "generics": { "Package": "generics", "Version": "0.1.4", @@ -646,6 +954,199 @@ "NeedsCompilation": "no", "Author": "Hadley Wickham [aut, cre] (ORCID: ), Max Kuhn [aut], Davis Vaughan [aut], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Hadley Wickham ", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "ggdist": { + "Package": "ggdist", + "Version": "3.3.3", + "Source": "Repository", + "Title": "Visualizations of Distributions and Uncertainty", + "Date": "2025-04-20", + "Authors@R": "c( person(\"Matthew\", \"Kay\", role = c(\"aut\", \"cre\"), email = \"mjskay@northwestern.edu\"), person(\"Brenton M.\", \"Wiernik\", role = \"ctb\", email = \"brenton@wiernik.org\") )", + "Maintainer": "Matthew Kay ", + "Description": "Provides primitives for visualizing distributions using 'ggplot2' that are particularly tuned for visualizing uncertainty in either a frequentist or Bayesian mode. Both analytical distributions (such as frequentist confidence distributions or Bayesian priors) and distributions represented as samples (such as bootstrap distributions or Bayesian posterior samples) are easily visualized. 
Visualization primitives include but are not limited to: points with multiple uncertainty intervals, eye plots (Spiegelhalter D., 1999) , density plots, gradient plots, dot plots (Wilkinson L., 1999) , quantile dot plots (Kay M., Kola T., Hullman J., Munson S., 2016) , complementary cumulative distribution function barplots (Fernandes M., Walls L., Munson S., Hullman J., Kay M., 2018) , and fit curves with multiple uncertainty ribbons.", + "Depends": [ + "R (>= 4.0.0)" + ], + "Imports": [ + "grid", + "ggplot2 (>= 3.5.0)", + "scales", + "rlang (>= 0.3.0)", + "cli", + "tibble", + "vctrs", + "withr", + "glue", + "gtable", + "distributional (>= 0.3.2)", + "numDeriv", + "quadprog", + "Rcpp" + ], + "Suggests": [ + "tidyselect", + "dplyr (>= 1.0.0)", + "fda", + "posterior (>= 1.4.0)", + "beeswarm (>= 0.4.0)", + "rmarkdown", + "knitr", + "testthat (>= 3.0.0)", + "vdiffr (>= 1.0.0)", + "svglite (>= 2.1.0)", + "fontquiver", + "sysfonts", + "showtext", + "mvtnorm", + "covr", + "broom (>= 0.5.6)", + "patchwork", + "tidyr (>= 1.0.0)", + "ragg (>= 1.3.0)", + "pkgdown" + ], + "License": "GPL (>= 3)", + "Language": "en-US", + "BugReports": "https://github.com/mjskay/ggdist/issues", + "URL": "https://mjskay.github.io/ggdist/, https://github.com/mjskay/ggdist/", + "VignetteBuilder": "knitr", + "RoxygenNote": "7.3.2", + "LazyData": "true", + "Encoding": "UTF-8", + "Collate": "\"ggdist-package.R\" \"util.R\" \"compat.R\" \"rd.R\" \"RcppExports.R\" \"abstract_geom.R\" \"abstract_stat.R\" \"abstract_stat_slabinterval.R\" \"auto_partial.R\" \"binning_methods.R\" \"bounder.R\" \"curve_interval.R\" \"cut_cdf_qi.R\" \"data.R\" \"density.R\" \"distributions.R\" \"draw_key_slabinterval.R\" \"geom.R\" \"geom_slabinterval.R\" \"geom_dotsinterval.R\" \"geom_blur_dots.R\" \"geom_interval.R\" \"geom_lineribbon.R\" \"geom_pointinterval.R\" \"geom_slab.R\" \"geom_spike.R\" \"geom_swarm.R\" \"guide_rampbar.R\" \"interval_widths.R\" \"lkjcorr_marginal.R\" \"parse_dist.R\" \"partial_colour_ramp.R\" 
\"point_interval.R\" \"position_dodgejust.R\" \"pr.R\" \"rd_density.R\" \"rd_dotsinterval.R\" \"rd_slabinterval.R\" \"rd_spike.R\" \"rd_lineribbon.R\" \"scale_colour_ramp.R\" \"scale_thickness.R\" \"scale_side_mirrored.R\" \"scale_.R\" \"smooth.R\" \"stat.R\" \"stat_slabinterval.R\" \"stat_dotsinterval.R\" \"stat_mcse_dots.R\" \"stat_pointinterval.R\" \"stat_interval.R\" \"stat_lineribbon.R\" \"stat_spike.R\" \"student_t.R\" \"subguide.R\" \"subscale.R\" \"testthat.R\" \"theme_ggdist.R\" \"thickness.R\" \"tidy_format_translators.R\" \"weighted_ecdf.R\" \"weighted_hist.R\" \"weighted_quantile.R\" \"deprecated.R\"", + "Config/testthat/edition": "3", + "LinkingTo": [ + "Rcpp" + ], + "NeedsCompilation": "yes", + "Author": "Matthew Kay [aut, cre], Brenton M. Wiernik [ctb]", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "ggplot2": { + "Package": "ggplot2", + "Version": "4.0.2", + "Source": "Repository", + "Title": "Create Elegant Data Visualisations Using the Grammar of Graphics", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\", comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Winston\", \"Chang\", role = \"aut\", comment = c(ORCID = \"0000-0002-1576-2126\")), person(\"Lionel\", \"Henry\", role = \"aut\"), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Kohske\", \"Takahashi\", role = \"aut\"), person(\"Claus\", \"Wilke\", role = \"aut\", comment = c(ORCID = \"0000-0002-7470-9261\")), person(\"Kara\", \"Woo\", role = \"aut\", comment = c(ORCID = \"0000-0002-5125-4188\")), person(\"Hiroaki\", \"Yutani\", role = \"aut\", comment = c(ORCID = \"0000-0002-3385-7233\")), person(\"Dewey\", \"Dunnington\", role = \"aut\", comment = c(ORCID = \"0000-0002-9415-4582\")), person(\"Teun\", \"van den Brand\", role = \"aut\", comment = c(ORCID = \"0000-0002-9335-7468\")), person(\"Posit, PBC\", role = c(\"cph\", 
\"fnd\"), comment = c(ROR = \"03wc8by49\")) )", + "Description": "A system for 'declaratively' creating graphics, based on \"The Grammar of Graphics\". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.", + "License": "MIT + file LICENSE", + "URL": "https://ggplot2.tidyverse.org, https://github.com/tidyverse/ggplot2", + "BugReports": "https://github.com/tidyverse/ggplot2/issues", + "Depends": [ + "R (>= 4.1)" + ], + "Imports": [ + "cli", + "grDevices", + "grid", + "gtable (>= 0.3.6)", + "isoband", + "lifecycle (> 1.0.1)", + "rlang (>= 1.1.0)", + "S7", + "scales (>= 1.4.0)", + "stats", + "vctrs (>= 0.6.0)", + "withr (>= 2.5.0)" + ], + "Suggests": [ + "broom", + "covr", + "dplyr", + "ggplot2movies", + "hexbin", + "Hmisc", + "hms", + "knitr", + "mapproj", + "maps", + "MASS", + "mgcv", + "multcomp", + "munsell", + "nlme", + "profvis", + "quantreg", + "quarto", + "ragg (>= 1.2.6)", + "RColorBrewer", + "roxygen2", + "rpart", + "sf (>= 0.7-3)", + "svglite (>= 2.1.2)", + "testthat (>= 3.1.5)", + "tibble", + "vdiffr (>= 1.0.6)", + "xml2" + ], + "Enhances": [ + "sp" + ], + "VignetteBuilder": "quarto", + "Config/Needs/website": "ggtext, tidyr, forcats, tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2025-04-23", + "Encoding": "UTF-8", + "LazyData": "true", + "RoxygenNote": "7.3.3", + "Collate": "'ggproto.R' 'ggplot-global.R' 'aaa-.R' 'aes-colour-fill-alpha.R' 'aes-evaluation.R' 'aes-group-order.R' 'aes-linetype-size-shape.R' 'aes-position.R' 'all-classes.R' 'compat-plyr.R' 'utilities.R' 'aes.R' 'annotation-borders.R' 'utilities-checks.R' 'legend-draw.R' 'geom-.R' 'annotation-custom.R' 'annotation-logticks.R' 'scale-type.R' 'layer.R' 'make-constructor.R' 'geom-polygon.R' 'geom-map.R' 'annotation-map.R' 'geom-raster.R' 'annotation-raster.R' 'annotation.R' 'autolayer.R' 'autoplot.R' 'axis-secondary.R' 'backports.R' 'bench.R' 'bin.R' 'coord-.R' 
'coord-cartesian-.R' 'coord-fixed.R' 'coord-flip.R' 'coord-map.R' 'coord-munch.R' 'coord-polar.R' 'coord-quickmap.R' 'coord-radial.R' 'coord-sf.R' 'coord-transform.R' 'data.R' 'docs_layer.R' 'facet-.R' 'facet-grid-.R' 'facet-null.R' 'facet-wrap.R' 'fortify-map.R' 'fortify-models.R' 'fortify-spatial.R' 'fortify.R' 'stat-.R' 'geom-abline.R' 'geom-rect.R' 'geom-bar.R' 'geom-tile.R' 'geom-bin2d.R' 'geom-blank.R' 'geom-boxplot.R' 'geom-col.R' 'geom-path.R' 'geom-contour.R' 'geom-point.R' 'geom-count.R' 'geom-crossbar.R' 'geom-segment.R' 'geom-curve.R' 'geom-defaults.R' 'geom-ribbon.R' 'geom-density.R' 'geom-density2d.R' 'geom-dotplot.R' 'geom-errorbar.R' 'geom-freqpoly.R' 'geom-function.R' 'geom-hex.R' 'geom-histogram.R' 'geom-hline.R' 'geom-jitter.R' 'geom-label.R' 'geom-linerange.R' 'geom-pointrange.R' 'geom-quantile.R' 'geom-rug.R' 'geom-sf.R' 'geom-smooth.R' 'geom-spoke.R' 'geom-text.R' 'geom-violin.R' 'geom-vline.R' 'ggplot2-package.R' 'grob-absolute.R' 'grob-dotstack.R' 'grob-null.R' 'grouping.R' 'properties.R' 'margins.R' 'theme-elements.R' 'guide-.R' 'guide-axis.R' 'guide-axis-logticks.R' 'guide-axis-stack.R' 'guide-axis-theta.R' 'guide-legend.R' 'guide-bins.R' 'guide-colorbar.R' 'guide-colorsteps.R' 'guide-custom.R' 'guide-none.R' 'guide-old.R' 'guides-.R' 'guides-grid.R' 'hexbin.R' 'import-standalone-obj-type.R' 'import-standalone-types-check.R' 'labeller.R' 'labels.R' 'layer-sf.R' 'layout.R' 'limits.R' 'performance.R' 'plot-build.R' 'plot-construction.R' 'plot-last.R' 'plot.R' 'position-.R' 'position-collide.R' 'position-dodge.R' 'position-dodge2.R' 'position-identity.R' 'position-jitter.R' 'position-jitterdodge.R' 'position-nudge.R' 'position-stack.R' 'quick-plot.R' 'reshape-add-margins.R' 'save.R' 'scale-.R' 'scale-alpha.R' 'scale-binned.R' 'scale-brewer.R' 'scale-colour.R' 'scale-continuous.R' 'scale-date.R' 'scale-discrete-.R' 'scale-expansion.R' 'scale-gradient.R' 'scale-grey.R' 'scale-hue.R' 'scale-identity.R' 'scale-linetype.R' 'scale-linewidth.R' 
'scale-manual.R' 'scale-shape.R' 'scale-size.R' 'scale-steps.R' 'scale-view.R' 'scale-viridis.R' 'scales-.R' 'stat-align.R' 'stat-bin.R' 'stat-summary-2d.R' 'stat-bin2d.R' 'stat-bindot.R' 'stat-binhex.R' 'stat-boxplot.R' 'stat-connect.R' 'stat-contour.R' 'stat-count.R' 'stat-density-2d.R' 'stat-density.R' 'stat-ecdf.R' 'stat-ellipse.R' 'stat-function.R' 'stat-identity.R' 'stat-manual.R' 'stat-qq-line.R' 'stat-qq.R' 'stat-quantilemethods.R' 'stat-sf-coordinates.R' 'stat-sf.R' 'stat-smooth-methods.R' 'stat-smooth.R' 'stat-sum.R' 'stat-summary-bin.R' 'stat-summary-hex.R' 'stat-summary.R' 'stat-unique.R' 'stat-ydensity.R' 'summarise-plot.R' 'summary.R' 'theme.R' 'theme-defaults.R' 'theme-current.R' 'theme-sub.R' 'utilities-break.R' 'utilities-grid.R' 'utilities-help.R' 'utilities-patterns.R' 'utilities-resolution.R' 'utilities-tidy-eval.R' 'zxx.R' 'zzz.R'", + "NeedsCompilation": "no", + "Author": "Hadley Wickham [aut] (ORCID: ), Winston Chang [aut] (ORCID: ), Lionel Henry [aut], Thomas Lin Pedersen [aut, cre] (ORCID: ), Kohske Takahashi [aut], Claus Wilke [aut] (ORCID: ), Kara Woo [aut] (ORCID: ), Hiroaki Yutani [aut] (ORCID: ), Dewey Dunnington [aut] (ORCID: ), Teun van den Brand [aut] (ORCID: ), Posit, PBC [cph, fnd] (ROR: )", + "Maintainer": "Thomas Lin Pedersen ", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "ggrepel": { + "Package": "ggrepel", + "Version": "0.9.7", + "Source": "Repository", + "Authors@R": "c( person(\"Kamil\", \"Slowikowski\", email = \"kslowikowski@gmail.com\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-2843-6370\")), person(\"Teun\", \"van den Brand\", role = \"ctb\", comment = c(ORCID = \"0000-0002-9335-7468\")), person(\"Alicia\", \"Schep\", role = \"ctb\", comment = c(ORCID = \"0000-0002-3915-0618\")), person(\"Sean\", \"Hughes\", role = \"ctb\", comment = c(ORCID = \"0000-0002-9409-9405\")), person(\"Trung Kien\", \"Dang\", role = \"ctb\", comment = c(ORCID = \"0000-0001-7562-6495\")), 
person(\"Saulius\", \"Lukauskas\", role = \"ctb\"), person(\"Jean-Olivier\", \"Irisson\", role = \"ctb\", comment = c(ORCID = \"0000-0003-4920-3880\")), person(\"Zhian N\", \"Kamvar\", role = \"ctb\", comment = c(ORCID = \"0000-0003-1458-7108\")), person(\"Thompson\", \"Ryan\", role = \"ctb\", comment = c(ORCID = \"0000-0002-0450-8181\")), person(\"Dervieux\", \"Christophe\", role = \"ctb\", comment = c(ORCID = \"0000-0003-4474-2498\")), person(\"Yutani\", \"Hiroaki\", role = \"ctb\"), person(\"Pierre\", \"Gramme\", role = \"ctb\"), person(\"Amir Masoud\", \"Abdol\", role = \"ctb\"), person(\"Malcolm\", \"Barrett\", role = \"ctb\", comment = c(ORCID = \"0000-0003-0299-5825\")), person(\"Robrecht\", \"Cannoodt\", role = \"ctb\", comment = c(ORCID = \"0000-0003-3641-729X\")), person(\"Michał\", \"Krassowski\", role = \"ctb\", comment = c(ORCID = \"0000-0002-9638-7785\")), person(\"Michael\", \"Chirico\", role = \"ctb\", comment = c(ORCID = \"0000-0003-0787-087X\")), person(\"Pedro\", \"Aphalo\", role = \"ctb\", comment = c(ORCID = \"0000-0003-3385-972X\")), person(\"Francis\", \"Barton\", role = \"ctb\") )", + "Title": "Automatically Position Non-Overlapping Text Labels with 'ggplot2'", + "Description": "Provides text and label geoms for 'ggplot2' that help to avoid overlapping text labels. 
Labels repel away from each other and away from the data points.", + "Depends": [ + "R (>= 4.5.0)", + "ggplot2 (>= 3.5.2)" + ], + "Imports": [ + "grid", + "Rcpp", + "rlang (>= 1.1.6)", + "S7", + "scales (>= 1.4.0)", + "withr (>= 3.0.2)" + ], + "Suggests": [ + "knitr", + "rmarkdown", + "testthat", + "svglite", + "vdiffr", + "gridExtra", + "ggpp", + "patchwork", + "devtools", + "prettydoc", + "ggbeeswarm", + "dplyr", + "magrittr", + "readr", + "stringr", + "marquee", + "rsvg", + "sf" + ], + "VignetteBuilder": "knitr", + "License": "GPL-3 | file LICENSE", + "URL": "https://ggrepel.slowkow.com/, https://github.com/slowkow/ggrepel", + "BugReports": "https://github.com/slowkow/ggrepel/issues", + "RoxygenNote": "7.3.3", + "LinkingTo": [ + "Rcpp" + ], + "Encoding": "UTF-8", + "NeedsCompilation": "yes", + "Author": "Kamil Slowikowski [aut, cre] (ORCID: ), Teun van den Brand [ctb] (ORCID: ), Alicia Schep [ctb] (ORCID: ), Sean Hughes [ctb] (ORCID: ), Trung Kien Dang [ctb] (ORCID: ), Saulius Lukauskas [ctb], Jean-Olivier Irisson [ctb] (ORCID: ), Zhian N Kamvar [ctb] (ORCID: ), Thompson Ryan [ctb] (ORCID: ), Dervieux Christophe [ctb] (ORCID: ), Yutani Hiroaki [ctb], Pierre Gramme [ctb], Amir Masoud Abdol [ctb], Malcolm Barrett [ctb] (ORCID: ), Robrecht Cannoodt [ctb] (ORCID: ), Michał Krassowski [ctb] (ORCID: ), Michael Chirico [ctb] (ORCID: ), Pedro Aphalo [ctb] (ORCID: ), Francis Barton [ctb]", + "Maintainer": "Kamil Slowikowski ", "Repository": "CRAN" }, "glue": { @@ -687,7 +1188,78 @@ "NeedsCompilation": "yes", "Author": "Jim Hester [aut] (), Jennifer Bryan [aut, cre] (), Posit Software, PBC [cph, fnd]", "Maintainer": "Jennifer Bryan ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "gridExtra": { + "Package": "gridExtra", + "Version": "2.3", + "Source": "Repository", + "Authors@R": "c(person(\"Baptiste\", \"Auguie\", email = \"baptiste.auguie@gmail.com\", role = c(\"aut\", \"cre\")), person(\"Anton\", \"Antonov\", email = 
\"tonytonov@gmail.com\", role = c(\"ctb\")))", + "License": "GPL (>= 2)", + "Title": "Miscellaneous Functions for \"Grid\" Graphics", + "Type": "Package", + "Description": "Provides a number of user-level functions to work with \"grid\" graphics, notably to arrange multiple grid-based plots on a page, and draw tables.", + "VignetteBuilder": "knitr", + "Imports": [ + "gtable", + "grid", + "grDevices", + "graphics", + "utils" + ], + "Suggests": [ + "ggplot2", + "egg", + "lattice", + "knitr", + "testthat" + ], + "RoxygenNote": "6.0.1", + "NeedsCompilation": "no", + "Author": "Baptiste Auguie [aut, cre], Anton Antonov [ctb]", + "Maintainer": "Baptiste Auguie ", + "Repository": "https://packagemanager.posit.co/cran/latest", + "Encoding": "UTF-8" + }, + "gtable": { + "Package": "gtable", + "Version": "0.3.6", + "Source": "Repository", + "Title": "Arrange 'Grobs' in Tables", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\"), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", + "Description": "Tools to make it easier to work with \"tables\" of 'grobs'. The 'gtable' package defines a 'gtable' grob class that specifies a grid along with a list of grobs and their placement in the grid. 
Further the package makes it easy to manipulate and combine 'gtable' objects so that complex compositions can be built up sequentially.", + "License": "MIT + file LICENSE", + "URL": "https://gtable.r-lib.org, https://github.com/r-lib/gtable", + "BugReports": "https://github.com/r-lib/gtable/issues", + "Depends": [ + "R (>= 4.0)" + ], + "Imports": [ + "cli", + "glue", + "grid", + "lifecycle", + "rlang (>= 1.1.0)", + "stats" + ], + "Suggests": [ + "covr", + "ggplot2", + "knitr", + "profvis", + "rmarkdown", + "testthat (>= 3.0.0)" + ], + "VignetteBuilder": "knitr", + "Config/Needs/website": "tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2024-10-25", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.2", + "NeedsCompilation": "no", + "Author": "Hadley Wickham [aut], Thomas Lin Pedersen [aut, cre], Posit Software, PBC [cph, fnd]", + "Maintainer": "Thomas Lin Pedersen ", + "Repository": "https://packagemanager.posit.co/cran/latest" }, "hms": { "Package": "hms", @@ -763,6 +1335,48 @@ "Maintainer": "Hadley Wickham ", "Repository": "CRAN" }, + "isoband": { + "Package": "isoband", + "Version": "0.3.0", + "Source": "Repository", + "Title": "Generate Isolines and Isobands from Regularly Spaced Elevation Grids", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\", comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Claus O.\", \"Wilke\", , \"wilke@austin.utexas.edu\", role = \"aut\", comment = c(\"Original author\", ORCID = \"0000-0002-7470-9261\")), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Posit, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", + "Description": "A fast C++ implementation to generate contour lines (isolines) and contour polygons (isobands) from regularly spaced grids containing elevation data.", + "License": "MIT + file LICENSE", + "URL": 
"https://isoband.r-lib.org, https://github.com/r-lib/isoband", + "BugReports": "https://github.com/r-lib/isoband/issues", + "Imports": [ + "cli", + "grid", + "rlang", + "utils" + ], + "Suggests": [ + "covr", + "ggplot2", + "knitr", + "magick", + "bench", + "rmarkdown", + "sf", + "testthat (>= 3.0.0)", + "xml2" + ], + "VignetteBuilder": "knitr", + "Config/Needs/website": "tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2025-12-05", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.3", + "Config/build/compilation-database": "true", + "LinkingTo": [ + "cpp11" + ], + "NeedsCompilation": "yes", + "Author": "Hadley Wickham [aut] (ORCID: ), Claus O. Wilke [aut] (Original author, ORCID: ), Thomas Lin Pedersen [aut, cre] (ORCID: ), Posit, PBC [cph, fnd] (ROR: )", + "Maintainer": "Thomas Lin Pedersen ", + "Repository": "https://packagemanager.posit.co/cran/latest" + }, "janitor": { "Package": "janitor", "Version": "2.2.1", @@ -839,6 +1453,26 @@ "Author": "Jeroen Ooms [aut, cre] (), Duncan Temple Lang [ctb], Lloyd Hilaiel [cph] (author of bundled libyajl)", "Repository": "CRAN" }, + "labeling": { + "Package": "labeling", + "Version": "0.4.3", + "Source": "Repository", + "Type": "Package", + "Title": "Axis Labeling", + "Date": "2023-08-29", + "Author": "Justin Talbot,", + "Maintainer": "Nuno Sempere ", + "Description": "Functions which provide a range of axis labeling algorithms.", + "License": "MIT + file LICENSE | Unlimited", + "Collate": "'labeling.R'", + "NeedsCompilation": "no", + "Imports": [ + "stats", + "graphics" + ], + "Repository": "https://packagemanager.posit.co/cran/latest", + "Encoding": "UTF-8" + }, "lifecycle": { "Package": "lifecycle", "Version": "1.0.5", @@ -954,6 +1588,36 @@ "NeedsCompilation": "yes", "Author": "Stefan Milton Bache [aut, cph] (Original author and creator of magrittr), Hadley Wickham [aut], Lionel Henry [cre], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Lionel Henry ", + "Repository": 
"https://packagemanager.posit.co/cran/latest" + }, + "memoise": { + "Package": "memoise", + "Version": "2.0.1", + "Source": "Repository", + "Title": "'Memoisation' of Functions", + "Authors@R": "c(person(given = \"Hadley\", family = \"Wickham\", role = \"aut\", email = \"hadley@rstudio.com\"), person(given = \"Jim\", family = \"Hester\", role = \"aut\"), person(given = \"Winston\", family = \"Chang\", role = c(\"aut\", \"cre\"), email = \"winston@rstudio.com\"), person(given = \"Kirill\", family = \"Müller\", role = \"aut\", email = \"krlmlr+r@mailbox.org\"), person(given = \"Daniel\", family = \"Cook\", role = \"aut\", email = \"danielecook@gmail.com\"), person(given = \"Mark\", family = \"Edmondson\", role = \"ctb\", email = \"r@sunholo.com\"))", + "Description": "Cache the results of a function so that when you call it again with the same arguments it returns the previously computed value.", + "License": "MIT + file LICENSE", + "URL": "https://memoise.r-lib.org, https://github.com/r-lib/memoise", + "BugReports": "https://github.com/r-lib/memoise/issues", + "Imports": [ + "rlang (>= 0.4.10)", + "cachem" + ], + "Suggests": [ + "digest", + "aws.s3", + "covr", + "googleAuthR", + "googleCloudStorageR", + "httr", + "testthat" + ], + "Encoding": "UTF-8", + "RoxygenNote": "7.1.2", + "NeedsCompilation": "no", + "Author": "Hadley Wickham [aut], Jim Hester [aut], Winston Chang [aut, cre], Kirill Müller [aut], Daniel Cook [aut], Mark Edmondson [ctb]", + "Maintainer": "Winston Chang ", "Repository": "CRAN" }, "mime": { @@ -977,6 +1641,26 @@ "Maintainer": "Yihui Xie ", "Repository": "CRAN" }, + "numDeriv": { + "Package": "numDeriv", + "Version": "2016.8-1.1", + "Source": "Repository", + "Title": "Accurate Numerical Derivatives", + "Description": "Methods for calculating (usually) accurate numerical first and second order derivatives. Accurate calculations are done using 'Richardson''s' extrapolation or, when applicable, a complex step derivative is available. 
A simple difference method is also provided. Simple difference is (usually) less accurate but is much quicker than 'Richardson''s' extrapolation and provides a useful cross-check. Methods are provided for real scalar and vector valued functions.", + "Depends": [ + "R (>= 2.11.1)" + ], + "LazyLoad": "yes", + "ByteCompile": "yes", + "License": "GPL-2", + "Copyright": "2006-2011, Bank of Canada. 2012-2016, Paul Gilbert", + "Author": "Paul Gilbert and Ravi Varadhan", + "Maintainer": "Paul Gilbert ", + "URL": "http://optimizer.r-forge.r-project.org/", + "NeedsCompilation": "no", + "Repository": "https://packagemanager.posit.co/cran/latest", + "Encoding": "UTF-8" + }, "openssl": { "Package": "openssl", "Version": "2.3.4", @@ -1065,7 +1749,7 @@ "NeedsCompilation": "no", "Author": "Kirill Müller [aut, cre] (ORCID: ), Hadley Wickham [aut], RStudio [cph]", "Maintainer": "Kirill Müller ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "pkgconfig": { "Package": "pkgconfig", @@ -1089,7 +1773,7 @@ "BugReports": "https://github.com/r-lib/pkgconfig/issues", "Encoding": "UTF-8", "NeedsCompilation": "no", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "prettyunits": { "Package": "prettyunits", @@ -1222,7 +1906,25 @@ "NeedsCompilation": "yes", "Author": "Hadley Wickham [aut, cre] (ORCID: ), Lionel Henry [aut], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Hadley Wickham ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "quadprog": { + "Package": "quadprog", + "Version": "1.5-8", + "Source": "Repository", + "Type": "Package", + "Title": "Functions to Solve Quadratic Programming Problems", + "Date": "2019-11-20", + "Author": "S original by Berwin A. Turlach R port by Andreas Weingessel Fortran contributions from Cleve Moler (dposl/LINPACK and (a modified version of) dpodi/LINPACK)", + "Maintainer": "Berwin A. 
Turlach ", + "Description": "This package contains routines and documentation for solving quadratic programming problems.", + "Depends": [ + "R (>= 3.1.0)" + ], + "License": "GPL (>= 2)", + "NeedsCompilation": "yes", + "Repository": "https://packagemanager.posit.co/cran/latest", + "Encoding": "UTF-8" }, "rappdirs": { "Package": "rappdirs", @@ -1504,6 +2206,50 @@ "Maintainer": "Edzer Pebesma ", "Repository": "CRAN" }, + "scales": { + "Package": "scales", + "Version": "1.4.0", + "Source": "Repository", + "Title": "Scale Functions for Visualization", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\"), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Dana\", \"Seidel\", role = \"aut\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", + "Description": "Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.", + "License": "MIT + file LICENSE", + "URL": "https://scales.r-lib.org, https://github.com/r-lib/scales", + "BugReports": "https://github.com/r-lib/scales/issues", + "Depends": [ + "R (>= 4.1)" + ], + "Imports": [ + "cli", + "farver (>= 2.0.3)", + "glue", + "labeling", + "lifecycle", + "R6", + "RColorBrewer", + "rlang (>= 1.1.0)", + "viridisLite" + ], + "Suggests": [ + "bit64", + "covr", + "dichromat", + "ggplot2", + "hms (>= 0.5.0)", + "stringi", + "testthat (>= 3.0.0)" + ], + "Config/Needs/website": "tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2025-04-23", + "Encoding": "UTF-8", + "LazyLoad": "yes", + "RoxygenNote": "7.3.2", + "NeedsCompilation": "no", + "Author": "Hadley Wickham [aut], Thomas Lin Pedersen [cre, aut] (), Dana Seidel [aut], Posit Software, PBC [cph, fnd] (03wc8by49)", + "Maintainer": "Thomas Lin Pedersen ", + "Repository": 
"https://packagemanager.posit.co/cran/latest" + }, "selectr": { "Package": "selectr", "Version": "0.5-1", @@ -1677,7 +2423,7 @@ "Author": "Marek Gagolewski [aut, cre, cph] (), Bartek Tartanus [ctb], Unicode, Inc. and others [ctb] (ICU4C source code, Unicode Character Database)", "Maintainer": "Marek Gagolewski ", "License_is_FOSS": "yes", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "stringr": { "Package": "stringr", @@ -1722,7 +2468,7 @@ "NeedsCompilation": "no", "Author": "Hadley Wickham [aut, cre, cph], Posit Software, PBC [cph, fnd]", "Maintainer": "Hadley Wickham ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "sys": { "Package": "sys", @@ -1748,6 +2494,54 @@ "Maintainer": "Jeroen Ooms ", "Repository": "CRAN" }, + "systemfonts": { + "Package": "systemfonts", + "Version": "1.3.1", + "Source": "Repository", + "Type": "Package", + "Title": "System Native Font Finding", + "Authors@R": "c( person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Jeroen\", \"Ooms\", , \"jeroen@berkeley.edu\", role = \"aut\", comment = c(ORCID = \"0000-0002-4035-0289\")), person(\"Devon\", \"Govett\", role = \"aut\", comment = \"Author of font-manager\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", + "Description": "Provides system native access to the font catalogue. As font handling varies between systems it is difficult to correctly locate installed fonts across different operating systems. The 'systemfonts' package provides bindings to the native libraries on Windows, macOS and Linux for finding font files that can then be used further by e.g. graphic devices. 
The main use is intended to be from compiled code but 'systemfonts' also provides access from R.", + "License": "MIT + file LICENSE", + "URL": "https://github.com/r-lib/systemfonts, https://systemfonts.r-lib.org", + "BugReports": "https://github.com/r-lib/systemfonts/issues", + "Depends": [ + "R (>= 3.2.0)" + ], + "Imports": [ + "base64enc", + "grid", + "jsonlite", + "lifecycle", + "tools", + "utils" + ], + "Suggests": [ + "covr", + "farver", + "ggplot2", + "graphics", + "knitr", + "ragg", + "rmarkdown", + "svglite", + "testthat (>= 2.1.0)" + ], + "LinkingTo": [ + "cpp11 (>= 0.2.1)" + ], + "VignetteBuilder": "knitr", + "Config/build/compilation-database": "true", + "Config/Needs/website": "tidyverse/tidytemplate", + "Config/usethis/last-upkeep": "2025-04-23", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.2", + "SystemRequirements": "fontconfig, freetype2", + "NeedsCompilation": "yes", + "Author": "Thomas Lin Pedersen [aut, cre] (ORCID: ), Jeroen Ooms [aut] (ORCID: ), Devon Govett [aut] (Author of font-manager), Posit Software, PBC [cph, fnd] (ROR: )", + "Maintainer": "Thomas Lin Pedersen ", + "Repository": "CRAN" + }, "tibble": { "Package": "tibble", "Version": "3.3.1", @@ -1811,7 +2605,7 @@ "NeedsCompilation": "yes", "Author": "Kirill Müller [aut, cre] (ORCID: ), Hadley Wickham [aut], Romain Francois [ctb], Jennifer Bryan [ctb], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Kirill Müller ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "tidycensus": { "Package": "tidycensus", @@ -2100,6 +2894,54 @@ "Maintainer": "Edzer Pebesma ", "Repository": "CRAN" }, + "urbnthemes": { + "Package": "urbnthemes", + "Version": "0.0.3", + "Source": "GitHub", + "Type": "Package", + "Title": "Additional theme and utilities for \"ggplot2\" in the Urban Institute style", + "Authors@R": "c( person(given = \"Aaron\", family = \"Williams\", middle = \"R.\", email = \"awilliams@urban.org\", role = c(\"aut\", \"cre\")), person(given = 
\"Kyle\", family = \"Ueyama\", email = \"kueyama@urban.org\", role = \"aut\"), person(given = \"Ajjit\", family = \"Narayanan\", email = \"anarayanan@urban.org\", role = \"aut\"), person(given = \"Ben\", family = \"Chartoff\", email = \"bchartoff@urban.org\", role = \"aut\") )", + "Description": "Align \"ggplot2\" output more closely with the Urban Institute Data Visualization style guide .", + "Depends": [ + "R (>= 3.1.0)" + ], + "Imports": [ + "extrafont", + "ggplot2 (>= 3.3.0)", + "ggrepel", + "grid", + "gridExtra", + "lifecycle", + "scales", + "conflicted", + "tibble", + "purrr", + "stringr", + "systemfonts" + ], + "License": "GPL-3", + "URL": "https://github.com/UrbanInstitute/urbnthemes", + "BugReports": "https://github.com/UrbanInstitute/urbnthemes/issues", + "Encoding": "UTF-8", + "LazyData": "true", + "RoxygenNote": "7.3.2", + "Suggests": [ + "knitr", + "rmarkdown", + "testthat" + ], + "VignetteBuilder": "knitr", + "Roxygen": "list(markdown = TRUE)", + "Author": "Aaron R. Williams [aut, cre], Kyle Ueyama [aut], Ajjit Narayanan [aut], Ben Chartoff [aut]", + "Maintainer": "Aaron R. Williams ", + "RemoteType": "github", + "RemoteHost": "api.github.com", + "RemoteUsername": "UrbanInstitute", + "RemoteRepo": "urbnthemes", + "RemoteRef": "main", + "RemoteSha": "c7c37dd1ce8d1fee7eb7e1aed7f4eb7dcaf4d5b4", + "Remotes": "extrafont=github::wch/extrafont" + }, "utf8": { "Package": "utf8", "Version": "1.2.6", @@ -2129,7 +2971,7 @@ "NeedsCompilation": "yes", "Author": "Patrick O. Perry [aut, cph], Kirill Müller [cre] (ORCID: ), Unicode, Inc. 
[cph, dtc] (Unicode Character Database)", "Maintainer": "Kirill Müller ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "uuid": { "Package": "uuid", @@ -2197,7 +3039,35 @@ "NeedsCompilation": "yes", "Author": "Hadley Wickham [aut], Lionel Henry [aut], Davis Vaughan [aut, cre], data.table team [cph] (Radix sort based on data.table's forder() and their contribution to R's order()), Posit Software, PBC [cph, fnd]", "Maintainer": "Davis Vaughan ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" + }, + "viridisLite": { + "Package": "viridisLite", + "Version": "0.4.3", + "Source": "Repository", + "Type": "Package", + "Title": "Colorblind-Friendly Color Maps (Lite Version)", + "Date": "2026-02-03", + "Authors@R": "c( person(\"Simon\", \"Garnier\", email = \"garnier@njit.edu\", role = c(\"aut\", \"cre\")), person(\"Noam\", \"Ross\", email = \"noam.ross@gmail.com\", role = c(\"ctb\", \"cph\")), person(\"Bob\", \"Rudis\", email = \"bob@rud.is\", role = c(\"ctb\", \"cph\")), person(\"Marco\", \"Sciaini\", email = \"sciaini.marco@gmail.com\", role = c(\"ctb\", \"cph\")), person(\"Antônio Pedro\", \"Camargo\", role = c(\"ctb\", \"cph\")), person(\"Cédric\", \"Scherer\", email = \"scherer@izw-berlin.de\", role = c(\"ctb\", \"cph\")) )", + "Maintainer": "Simon Garnier ", + "Description": "Color maps designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency. The color maps are also perceptually-uniform, both in regular form and also when converted to black-and-white for printing. 
This is the 'lite' version of the 'viridis' package that also contains 'ggplot2' bindings for discrete and continuous color and fill scales and can be found at .", + "License": "MIT + file LICENSE", + "Encoding": "UTF-8", + "Depends": [ + "R (>= 2.10)" + ], + "Suggests": [ + "hexbin (>= 1.27.0)", + "ggplot2 (>= 1.0.1)", + "testthat", + "covr" + ], + "URL": "https://sjmgarnier.github.io/viridisLite/, https://github.com/sjmgarnier/viridisLite/", + "BugReports": "https://github.com/sjmgarnier/viridisLite/issues/", + "RoxygenNote": "7.3.3", + "NeedsCompilation": "no", + "Author": "Simon Garnier [aut, cre], Noam Ross [ctb, cph], Bob Rudis [ctb, cph], Marco Sciaini [ctb, cph], Antônio Pedro Camargo [ctb, cph], Cédric Scherer [ctb, cph]", + "Repository": "https://packagemanager.posit.co/cran/latest" }, "vroom": { "Package": "vroom", @@ -2307,7 +3177,7 @@ "NeedsCompilation": "no", "Author": "Jim Hester [aut], Lionel Henry [aut, cre], Kirill Müller [aut], Kevin Ushey [aut], Hadley Wickham [aut], Winston Chang [aut], Jennifer Bryan [ctb], Richard Cotton [ctb], Posit Software, PBC [cph, fnd]", "Maintainer": "Lionel Henry ", - "Repository": "CRAN" + "Repository": "https://packagemanager.posit.co/cran/latest" }, "wk": { "Package": "wk", diff --git a/tests/testthat/test-auto_percent.R b/tests/testthat/test-auto_percent.R new file mode 100644 index 0000000..1504fbe --- /dev/null +++ b/tests/testthat/test-auto_percent.R @@ -0,0 +1,271 @@ +####----UNIT TESTS (no API calls)----#### + +test_that("is_raw_acs_code identifies valid ACS table codes", { + ## positive cases + expect_true(is_raw_acs_code("B25070")) + expect_true(is_raw_acs_code("B01001")) + expect_true(is_raw_acs_code("C15002")) + expect_true(is_raw_acs_code("B01001A")) + expect_true(is_raw_acs_code("B01001I")) + expect_true(is_raw_acs_code("B01001APR")) + + ## negative cases + expect_false(is_raw_acs_code("race")) + expect_false(is_raw_acs_code("snap")) + expect_false(is_raw_acs_code("B2507")) ## too few digits + 
expect_false(is_raw_acs_code("B250700")) ## too many digits + expect_false(is_raw_acs_code("D25070")) ## wrong prefix + expect_false(is_raw_acs_code("B25070_001")) ## variable code, not table code + expect_false(is_raw_acs_code("b25070")) ## lowercase +}) + +test_that("build_label_tree correctly assigns parent-child relationships", { + ## mock a minimal variables_df for B22003 (SNAP receipt) + mock_df = data.frame( + name = c("B22003_001", "B22003_002", "B22003_003", "B22003_004", + "B22003_005", "B22003_006", "B22003_007"), + label = c( + "Estimate!!Total:", + "Estimate!!Total:!!Received Food Stamps/SNAP in the past 12 months:", + "Estimate!!Total:!!Received Food Stamps/SNAP in the past 12 months:!!Household income in the past 12 months below poverty level", + "Estimate!!Total:!!Received Food Stamps/SNAP in the past 12 months:!!Household income in the past 12 months at or above poverty level", + "Estimate!!Total:!!Did not receive Food Stamps/SNAP in the past 12 months:", + "Estimate!!Total:!!Did not receive Food Stamps/SNAP in the past 12 months:!!Household income in the past 12 months below poverty level", + "Estimate!!Total:!!Did not receive Food Stamps/SNAP in the past 12 months:!!Household income in the past 12 months at or above poverty level" + ), + concept = rep("RECEIPT OF FOOD STAMPS/SNAP IN THE PAST 12 MONTHS BY POVERTY STATUS IN THE PAST 12 MONTHS FOR HOUSEHOLDS", 7), + stringsAsFactors = FALSE + ) + + ## apply clean_acs_names (requires the package function) + mock_df = mock_df %>% clean_acs_names() + result = build_label_tree(mock_df) + + ## total has no parent + expect_true(result$is_total[1]) + expect_true(is.na(result$parent_code[1])) + + ## subtotals (received, did not receive) should have total as parent + expect_equal(result$parent_code[2], "B22003_001") + expect_equal(result$parent_code[5], "B22003_001") + + ## leaves should have their subtotal as parent + expect_equal(result$parent_code[3], "B22003_002") + expect_equal(result$parent_code[4], 
"B22003_002") + expect_equal(result$parent_code[6], "B22003_005") + expect_equal(result$parent_code[7], "B22003_005") +}) + +test_that("classify_acs_table correctly identifies count vs skip tables", { + ## count table: has a "Total:" label + count_nodes = data.frame( + concept = "RECEIPT OF FOOD STAMPS/SNAP", + label = c("Estimate!!Total:", "Estimate!!Total:!!Received"), + is_total = c(TRUE, FALSE), + stringsAsFactors = FALSE + ) + expect_equal(classify_acs_table(count_nodes), "count") + + ## median table + median_nodes = data.frame( + concept = "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS", + label = c("Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars)"), + is_total = c(TRUE), + stringsAsFactors = FALSE + ) + expect_equal(classify_acs_table(median_nodes), "skip") + + ## aggregate table + aggregate_nodes = data.frame( + concept = "AGGREGATE HOUSEHOLD INCOME", + label = c("Estimate!!Aggregate household income in the past 12 months (in 2022 inflation-adjusted dollars)"), + is_total = c(TRUE), + stringsAsFactors = FALSE + ) + expect_equal(classify_acs_table(aggregate_nodes), "skip") + + ## singleton table (only 1 variable) + singleton_nodes = data.frame( + concept = "GINI INDEX OF INCOME INEQUALITY", + label = c("Estimate!!Gini Index"), + is_total = c(TRUE), + stringsAsFactors = FALSE + ) + expect_equal(classify_acs_table(singleton_nodes), "skip") +}) + +test_that("generate_auto_definitions produces correct definitions with parent mode", { + ## mock nodes with parent info + mock_nodes = data.frame( + name = c("B22003_001", "B22003_002", "B22003_003"), + clean_name_trimmed = c("snap_universe", "snap_received", "snap_below_poverty"), + is_total = c(TRUE, FALSE, FALSE), + parent_code = c(NA, "B22003_001", "B22003_002"), + parent_clean_name = c(NA, "snap_universe", "snap_received"), + stringsAsFactors = FALSE + ) + + defs = generate_auto_definitions(mock_nodes, denominator_mode = "parent") + + expect_length(defs, 2) + 
expect_equal(defs[[1]][["type"]], "simple_percent") + expect_equal(defs[[1]][["output"]], "snap_received_percent") + expect_equal(defs[[1]][["numerator"]], "snap_received") + expect_equal(defs[[1]][["denominator"]], "snap_universe") ## parent of snap_received + + expect_equal(defs[[2]][["output"]], "snap_below_poverty_percent") + expect_equal(defs[[2]][["denominator"]], "snap_received") ## parent of snap_below_poverty +}) + +test_that("generate_auto_definitions produces correct definitions with total mode", { + mock_nodes = data.frame( + name = c("B22003_001", "B22003_002", "B22003_003"), + clean_name_trimmed = c("snap_universe", "snap_received", "snap_below_poverty"), + is_total = c(TRUE, FALSE, FALSE), + parent_code = c(NA, "B22003_001", "B22003_002"), + parent_clean_name = c(NA, "snap_universe", "snap_received"), + stringsAsFactors = FALSE + ) + + defs = generate_auto_definitions(mock_nodes, denominator_mode = "total") + + expect_length(defs, 2) + ## all denominators should be the total + expect_equal(defs[[1]][["denominator"]], "snap_universe") + expect_equal(defs[[2]][["denominator"]], "snap_universe") +}) + +####----INTEGRATION TESTS (require Census API key)----#### + +test_that("compile_acs_data works with an unregistered ACS table", { + skip_on_cran() + skip_if_not(nchar(Sys.getenv("CENSUS_API_KEY")) > 0, "Census API key not available") + + ## B25070: Gross Rent as a Percentage of Household Income (not registered) + result = compile_acs_data( + tables = "B25070", + years = 2022, + geography = "state", + states = "DC") + + ## should have auto-computed percentage columns + pct_cols = grep("_percent$", colnames(result), value = TRUE) + expect_true(length(pct_cols) > 0) + + ## computed percentage columns should be 0-1 bounded + for (col in pct_cols) { + vals = result[[col]] + expect_true(all(vals >= 0 & vals <= 1, na.rm = TRUE), + info = paste0(col, " has values outside [0, 1]")) + } + + ## should have codebook + codebook = attr(result, "codebook") + 
expect_true(is.data.frame(codebook)) + expect_true(nrow(codebook) > 0) + + ## should have MOE columns but not SE/CV columns + expect_true(any(grepl("_M$", colnames(result)))) + expect_false(any(grepl("_SE$", colnames(result)))) + expect_false(any(grepl("_CV$", colnames(result)))) +}) + +test_that("compile_acs_data works with mixed registry + unregistered tables", { + skip_on_cran() + skip_if_not(nchar(Sys.getenv("CENSUS_API_KEY")) > 0, "Census API key not available") + + result = compile_acs_data( + tables = c("snap", "B25070"), + years = 2022, + geography = "state", + states = "DC") + + ## should have both snap and auto variables + expect_true("snap_received_percent" %in% colnames(result)) + ## should have auto-computed B25070 percentage variables + auto_pct = grep("gross_rent.*_percent$", colnames(result), value = TRUE) + expect_true(length(auto_pct) > 0) +}) + +test_that("compile_acs_data with denominator = 'total' uses table total", { + skip_on_cran() + skip_if_not(nchar(Sys.getenv("CENSUS_API_KEY")) > 0, "Census API key not available") + + ## B25070 is not registered, so denominator param fully controls it + result = compile_acs_data( + tables = "B25070", + denominator = "total", + years = 2022, + geography = "state", + states = "DC") + + ## verify codebook: all auto-table percent definitions should use the _001 total + codebook = attr(result, "codebook") + auto_pct_rows = codebook %>% + dplyr::filter(variable_type == "Percent", + !grepl("total_population", calculated_variable)) + + expect_true(nrow(auto_pct_rows) > 0) + + ## all definitions should contain the universe/_001 variable as denominator + for (i in seq_len(nrow(auto_pct_rows))) { + def = auto_pct_rows$definition[i] + expect_true(grepl("_001\\)", def) || grepl("universe", def), + info = paste0("Definition for ", auto_pct_rows$calculated_variable[i], + " does not use total as denominator: ", def)) + } +}) + +test_that("compile_acs_data handles median tables silently (no percentages)", { + 
skip_on_cran() + skip_if_not(nchar(Sys.getenv("CENSUS_API_KEY")) > 0, "Census API key not available") + + ## B19013: Median household income - should return raw variables, no percentages + result = compile_acs_data( + tables = "B19013", + years = 2022, + geography = "state", + states = "DC") + + ## should have raw variables but no auto-computed percent columns for this table + auto_pct = grep("median.*household.*income.*_percent$", colnames(result), value = TRUE) + expect_length(auto_pct, 0) +}) + +test_that("compile_acs_data works with race-iterated table", { + skip_on_cran() + skip_if_not(nchar(Sys.getenv("CENSUS_API_KEY")) > 0, "Census API key not available") + + ## B06007: Place of birth by language spoken at home — not registered, has hierarchy + result = compile_acs_data( + tables = "B06007", + years = 2022, + geography = "state", + states = "DC") + + pct_cols = grep("_percent$", colnames(result), value = TRUE) + expect_true(length(pct_cols) > 0) + + ## percentages should be 0-1 bounded + for (col in pct_cols) { + vals = result[[col]] + expect_true(all(vals >= 0 & vals <= 1, na.rm = TRUE), + info = paste0(col, " has values outside [0, 1]")) + } +}) + +test_that("compile_acs_data detects overlap with registered tables", { + skip_on_cran() + skip_if_not(nchar(Sys.getenv("CENSUS_API_KEY")) > 0, "Census API key not available") + + ## B22003 is the ACS table behind the registered "snap" table + ## passing the raw code should silently use the registered version + result = compile_acs_data( + tables = c("B22003"), + years = 2022, + geography = "state", + states = "DC") + + ## should have the registered snap variable + expect_true("snap_received_percent" %in% colnames(result)) +}) diff --git a/tests/testthat/test-calculate_cvs.R b/tests/testthat/test-calculate_cvs.R index b36b970..13c2d72 100644 --- a/tests/testthat/test-calculate_cvs.R +++ b/tests/testthat/test-calculate_cvs.R @@ -2,93 +2,22 @@ test_data_path = test_path("fixtures", "test_data_2026-02-08.rds") 
test_codebook_path = test_path("fixtures", "codebook_2026-02-08.rds") testthat::test_that( - "No CV has missing values for all observations", + "No MOE has missing values for all observations", { testthat::skip_if_not(file.exists(test_data_path), "Test fixture not available") - ## Statistics for CA and TX tracts df = readRDS(test_data_path) - measure_level_quality = df %>% - dplyr::select(dplyr::matches("_CV$")) %>% - tidyr::pivot_longer(dplyr::everything()) %>% - dplyr::arrange(dplyr::desc(value)) %>% - dplyr::mutate( - flag = dplyr::case_when( - is.na(value) ~ NA, - value > 1000 ~ "1000+", - value > 100 ~ "100+", - value > 30 ~ "30+", - TRUE ~ "<=30")) %>% - dplyr::group_by(name, flag) %>% - dplyr::summarize( - count = dplyr::n()) %>% - dplyr::group_by(name) %>% - dplyr::mutate( - total = sum(count), - percent_missing = count / total) %>% - dplyr::ungroup() + moe_cols = df %>% + dplyr::select(dplyr::matches("_M$")) - testthat::expect_lt( - measure_level_quality %>% - dplyr::slice(1) %>% - dplyr::pull(percent_missing), 1) } - ) - -testthat::test_that( - "All measures have at least some values with modest CVs", - { - testthat::skip_if_not(file.exists(test_data_path), "Test fixture not available") - ## Statistics for CA and TX tracts - df = readRDS(test_data_path) - - measure_level_quality = df %>% - dplyr::select(dplyr::matches("_CV$")) %>% - tidyr::pivot_longer(dplyr::everything()) %>% - dplyr::arrange(dplyr::desc(value)) %>% - dplyr::mutate( - flag = dplyr::case_when( - is.na(value) ~ NA, - value > 1000 ~ "1000+", - value > 100 ~ "100+", - value > 30 ~ "30+", - TRUE ~ "<=30")) %>% - dplyr::group_by(name, flag) %>% - dplyr::summarize( - count = dplyr::n()) %>% - dplyr::group_by(name) %>% - dplyr::mutate( - total = sum(count), - percent_missing = count / total) %>% - dplyr::ungroup() - - testthat::expect_gt( - measure_level_quality %>% - dplyr::filter(flag == "<=30") %>% - nrow(), 0) } -) - -testthat::test_that( - "There is a CV for every variable that has an MOE 
(or for which one can be calculated)", - { - testthat::skip_if_not(file.exists(test_data_path), "Test fixture not available") - df = readRDS(test_data_path) - moes = df %>% - dplyr::select(dplyr::matches("_M$")) %>% - colnames() %>% - stringr::str_remove("_M$") - cvs = df %>% - dplyr::select(matches("_CV$")) %>% - colnames() %>% - stringr::str_remove("_CV$") - - testthat::expect_equal( - moes[!moes %in% cvs] %>% length(), - 0) + ## every MOE column should have at least one non-NA value + all_na_count = purrr::map_lgl(moe_cols, ~ all(is.na(.x))) %>% sum() + testthat::expect_equal(all_na_count, 0) } ) testthat::test_that( - "All _pct variables have CVs calculated", + "All _pct variables have MOEs calculated", { testthat::skip_if_not(file.exists(test_data_path), "Test fixture not available") testthat::skip_if_not(file.exists(test_codebook_path), "Test fixture not available") @@ -98,21 +27,15 @@ testthat::test_that( ## _pct variables are raw ACS count variables renamed from _percent pct_vars = colnames(df) %>% stringr::str_subset("_pct$") %>% - stringr::str_subset("_M$|_SE$|_CV$", negate = TRUE) + stringr::str_subset("_M$", negate = TRUE) ## all _pct variables should be in the codebook pct_in_codebook = pct_vars[pct_vars %in% codebook$calculated_variable] testthat::expect_equal(length(pct_in_codebook), length(pct_vars)) - ## all _pct variables should have corresponding _CV columns - pct_cvs = paste0(pct_vars, "_CV") - pct_cvs_present = pct_cvs[pct_cvs %in% colnames(df)] - testthat::expect_equal(length(pct_cvs_present), length(pct_vars)) - - ## all _pct variables should have corresponding _SE columns - pct_ses = paste0(pct_vars, "_SE") - pct_ses_present = pct_ses[pct_ses %in% colnames(df)] - testthat::expect_equal(length(pct_ses_present), length(pct_vars)) + ## all _pct variables should have corresponding _M columns + pct_moes = paste0(pct_vars, "_M") + pct_moes_present = pct_moes[pct_moes %in% colnames(df)] + testthat::expect_equal(length(pct_moes_present), 
length(pct_vars)) } ) - diff --git a/tests/testthat/test-compile_acs_data.R b/tests/testthat/test-compile_acs_data.R index a2d48e5..7e4b9a5 100644 --- a/tests/testthat/test-compile_acs_data.R +++ b/tests/testthat/test-compile_acs_data.R @@ -861,27 +861,7 @@ testthat::test_that( }) testthat::test_that( - "compile_acs_data() accepts indicators parameter", - { - ## verify indicator resolution returns the correct parent tables - resolved = resolve_tables(indicators = c("snap_received_percent")) - testthat::expect_true("snap" %in% resolved) - testthat::expect_true("total_population" %in% resolved) - }) - -testthat::test_that( - "compile_acs_data() accepts mixed tables and indicators", - { - resolved = resolve_tables( - tables = "race", - indicators = "snap_received_percent") - testthat::expect_true("race" %in% resolved) - testthat::expect_true("snap" %in% resolved) - testthat::expect_true("total_population" %in% resolved) - }) - -testthat::test_that( - "Default (no tables/indicators) resolves to all tables", + "Default (no tables) resolves to all tables", { all_tables = list_tables() testthat::expect_gte(length(all_tables), 30) diff --git a/tests/testthat/test-generate_codebook.R b/tests/testthat/test-generate_codebook.R index a6ff37f..cdf79c1 100644 --- a/tests/testthat/test-generate_codebook.R +++ b/tests/testthat/test-generate_codebook.R @@ -105,17 +105,10 @@ test_codebook_path = test_path("fixtures", "codebook_2026-02-08.rds") ####----Table Registry Codebook Tests----#### testthat::test_that( - "Registry codebook entries have required columns.", + "Registry codebook entries reference valid tables.", { - ## verify the registry codebook has the expected structure all_tables = list_tables() - indicators = list_indicators() - - testthat::expect_true("indicator" %in% colnames(indicators)) - testthat::expect_true("table" %in% colnames(indicators)) - - ## every indicator should reference a valid table - testthat::expect_true(all(indicators$table %in% all_tables)) + 
testthat::expect_gte(length(all_tables), 30) }) testthat::test_that( diff --git a/tests/testthat/test-table_registry.R b/tests/testthat/test-table_registry.R index bcc9d40..1e93e45 100644 --- a/tests/testthat/test-table_registry.R +++ b/tests/testthat/test-table_registry.R @@ -83,54 +83,6 @@ testthat::test_that( "Unknown table") }) -testthat::test_that( - "list_indicators() returns construct-level table names", - { - indicators = list_indicators() - - testthat::expect_true(tibble::is_tibble(indicators)) - testthat::expect_true("indicator" %in% colnames(indicators)) - testthat::expect_true("table" %in% colnames(indicators)) - testthat::expect_gt(nrow(indicators), 0) - - ## spot-check specific indicators - testthat::expect_true("snap_received_percent" %in% indicators$indicator) - testthat::expect_true("race_personofcolor_percent" %in% indicators$indicator) - - ## verify construct-level table names for sex_by_age split - age_indicators = indicators %>% dplyr::filter(table == "age") - testthat::expect_true("age_over_64_percent" %in% age_indicators$indicator) - testthat::expect_true("age_under_18_percent" %in% age_indicators$indicator) - - sex_indicators = indicators %>% dplyr::filter(table == "sex") - testthat::expect_true("sex_female_percent" %in% sex_indicators$indicator) - testthat::expect_true("sex_male_percent" %in% sex_indicators$indicator) - - ## verify construct-level table names for nativity_language split - nativity_indicators = indicators %>% dplyr::filter(table == "nativity") - testthat::expect_true("nativity_native_born_percent" %in% nativity_indicators$indicator) - - language_indicators = indicators %>% dplyr::filter(table == "language") - testthat::expect_true("ability_speak_english_very_well_better_percent" %in% language_indicators$indicator) - }) - -testthat::test_that( - "resolve_tables() resolves indicators to parent tables", - { - ## indicators with construct-level table names should still resolve correctly - resolved = resolve_tables(indicators = 
"snap_received_percent") - testthat::expect_true("snap" %in% resolved) - testthat::expect_true("total_population" %in% resolved) - - ## age indicator should resolve to sex_by_age internal table - resolved = resolve_tables(indicators = "age_over_64_percent") - testthat::expect_true("sex_by_age" %in% resolved) - - ## sex indicator should resolve to sex_by_age internal table - resolved = resolve_tables(indicators = "sex_female_percent") - testthat::expect_true("sex_by_age" %in% resolved) - }) - testthat::test_that( "Every registered table has a name and definitions", { diff --git a/vignettes/custom-geographies.Rmd b/vignettes/custom-geographies.Rmd index ad13078..e1932d2 100644 --- a/vignettes/custom-geographies.Rmd +++ b/vignettes/custom-geographies.Rmd @@ -88,17 +88,19 @@ dc_quadrants = calculate_custom_geographies( The maps below show the share of households receiving SNAP benefits. Notice how aggregating to quadrants produces more precise estimates with -smaller coefficients of variation. Indeed, the median coefficient of variation -for tract level is greater than 30, a common upper bound for "reliable" estimates. +smaller margins of error. Indeed, the median coefficient of variation +(derived from the MOE) for tract level is greater than 30, a common +upper bound for "reliable" estimates. 
```{r, fig.height = 4} -# Tract-level map bind_rows( dc_tracts %>% mutate(geography = "Tract"), dc_quadrants %>% mutate(geography = "Quadrant")) %>% mutate( .by = geography, - median_cv = round(median(snap_received_percent_CV, na.rm = TRUE)), + cv = (snap_received_percent_M / 1.645) / snap_received_percent * 100, + cv = if_else(is.infinite(cv), NA_real_, cv), + median_cv = round(median(cv, na.rm = TRUE)), label = str_c(geography, " - median CV: ", median_cv)) %>% ggplot() + geom_sf(aes(fill = snap_received_percent), color = "white", linewidth = 0.1) + @@ -108,8 +110,8 @@ for tract level is greater than 30, a common upper bound for "reliable" estimate facet_wrap(~ label) ``` -The quadrant-level estimates have substantially lower CVs, indicating -more reliable estimates. +The quadrant-level estimates have substantially lower margins of error, +indicating more reliable estimates. # Detecting Statistically Significant Differences diff --git a/vignettes/quantified-survey-error.Rmd b/vignettes/quantified-survey-error.Rmd index 685fd4f..cd9a33a 100644 --- a/vignettes/quantified-survey-error.Rmd +++ b/vignettes/quantified-survey-error.Rmd @@ -92,8 +92,8 @@ measures of error. while 10 of those times, our estimate would fall outside this range. - **Standard Errors (SE)** are derived from MOEs by dividing the MOE - against a confidence level-related value. `urbnindicators` returns - 90% SEs, which are calculated by dividing an MOE by 1.645. + against a confidence level-related value. For a 90% MOE, the SE is + the MOE divided by 1.645. - **Coefficients of Variation (CV)** relate error to the size of the estimate. They are calculated by dividing the SE by the estimate and @@ -105,13 +105,14 @@ measures of error. # Evaluate Estimate Quality -CVs allow us to assess whether estimates have problematically large +MOEs allow us to assess whether estimates have problematically large errors. 
While there's not a right-or-wrong threshold for what -constitutes a good or bad CV, many people employ thresholds between 30 -and 40 (that is, where the error is 30-40% of the size of the estimate). +constitutes a problematic MOE, a common approach is to calculate the +coefficient of variation (CV = SE / estimate * 100, where SE = MOE / +1.645) and use thresholds between 30 and 40. As shown below, variables that rely on larger sample sizes tend to have -smaller CVs. Typically, there are two strategies to reduce CVs: (1) +smaller MOEs. Typically, there are two strategies to reduce error: (1) aggregate estimates, either across geographies or across variables, or (2) use larger geographies. @@ -131,20 +132,33 @@ acs_df_tract = compile_acs_data( ``` ```{r, message = FALSE, warning = FALSE, fig.width = 7, fig.height = 12} +## derive CVs from MOEs for quality evaluation plot_df = bind_rows( acs_df_county %>% mutate(geography = "County"), acs_df_tract %>% mutate(geography = "Tract")) %>% st_drop_geometry() %>% select( geography, - c(matches("^age.*percent.*CV") & matches("(_6|_7|_8)"))) %>% - rename_with( - .cols = everything(), - .fn = ~ .x %>% - str_replace_all(c("_" = " ", "percent" = "(%)", "age|CV" = "")) %>% - str_squish() %>% str_trim()) %>% + c(matches("^age.*percent$") & matches("(_6|_7|_8)")), + c(matches("^age.*percent_M$") & matches("(_6|_7|_8)"))) %>% pivot_longer(-geography) %>% - mutate(plot_title = str_c(geography, ": ", name)) + mutate( + base_var = str_remove(name, "_M$"), + type = if_else(str_detect(name, "_M$"), "moe", "estimate")) %>% + pivot_wider( + id_cols = c(geography, base_var), + names_from = type, + values_from = value, + values_fn = list) %>% + tidyr::unnest(c(estimate, moe)) %>% + mutate( + cv = (moe / 1.645) / estimate * 100, + cv = if_else(is.infinite(cv), NA_real_, cv), + name = base_var %>% + str_replace_all(c("_" = " ", "percent" = "(%)")) %>% + str_remove_all("age") %>% + str_squish() %>% str_trim(), + plot_title = str_c(geography, ": 
", name)) factor_levels = plot_df %>% arrange(name) %>% @@ -156,7 +170,7 @@ plot_df %>% plot_title = factor(plot_title, levels = factor_levels, ordered = TRUE)) %>% ggplot() + geom_histogram( - aes(x = value), fill = palette_urbn_main[1], bins = 50) + + aes(x = cv), fill = palette_urbn_main[1], bins = 50) + geom_vline(xintercept = 30, linetype = "dashed") + facet_wrap(~ plot_title, ncol = 2, scales = "free") + scale_x_continuous(limits = c(0, 150)) + @@ -164,10 +178,10 @@ plot_df %>% theme_urbn_print() + theme(axis.text.y = element_blank()) + labs( - x = "CV", + x = "CV (derived from MOE)", y = "Distribution", title = "County CVs are Smaller than Tract CVs\nCVs for Wider Age Ranges are Smaller than CVs for Narrow Age Ranges", - subtitle = "Coefficients of variation (CV) at the county and tract levels for various age groupings") + subtitle = "Coefficients of variation at the county and tract levels for various age groupings") ``` # Conduct Statistical Significance Testing @@ -185,11 +199,6 @@ compare each tract-level value within a single county to that of the corresponding county. 
```{r, fig.width = 7} -## utility to help us derive an MOE from a CV -cv_to_moe = function(cv, estimate) { - cv / 100 * 1.645 * estimate -} - plot_data = acs_df_tract %>% filter(str_detect(NAME, "Atlantic")) %>% select(GEOID, matches("age_over_64_percent")) %>% @@ -210,10 +219,8 @@ plot_data = acs_df_tract %>% significance = significance( est1 = age_over_64_percent, est2 = age_over_64_percent_county, - moe1 = cv_to_moe( - cv = age_over_64_percent_CV, estimate = age_over_64_percent), - moe2 = cv_to_moe( - cv = age_over_64_percent_CV_county, estimate = age_over_64_percent_county), + moe1 = age_over_64_percent_M, + moe2 = age_over_64_percent_M_county, clevel = 0.9), statistical_difference = case_when( significance == FALSE ~ "Not significant", diff --git a/vignettes/urbnindicators.Rmd b/vignettes/urbnindicators.Rmd index 9bbfede..e4d7ab2 100644 --- a/vignettes/urbnindicators.Rmd +++ b/vignettes/urbnindicators.Rmd @@ -139,15 +139,21 @@ a `geography` option comprising more units, by selecting more states, or selecti more years--can significantly increase the query time. A tract-level query of the entire US for all supported variables can take 30+ minutes. -Use `list_tables()` to see which tables are available: +Use `list_tables()` to see some of the most commonly used tables: ```{r} list_tables() |> head(10) ``` +Or use `get_acs_codebook()` to see every table supported by the Census Bureau API: + +```{r} +get_acs_codebook() |> + ## just showing a sample of the 28,000+ variables available + slice_sample(n = 10) +``` + Here we request just two tables--`disability` and `transportation_to_work`. -Alternately, you can set `tables = NULL` (the default) and get a very wide -dataset comprising every variable supported by the package. 
```{r, message = FALSE, warning = FALSE} df_urbnindicators = compile_acs_data( @@ -158,6 +164,18 @@ df_urbnindicators = compile_acs_data( spatial = TRUE) ``` +Alternately, you can pass the name of a variable or table from `get_acs_codebook()` +to `compile_acs_data()`. The equivalent would be: + +```{r} +df_urbnindicators = compile_acs_data( + years = 2024, + tables = c("sex_by_age_by_disability_status_universe", "B08301"), + geography = "county", + states = "NJ", + spatial = TRUE) +``` + ### Analyze or visualize data And now we're ready to analyze or plot our data. Simplistically: @@ -186,33 +204,15 @@ The codebook specifies the variable type and provides a definition of how the variable was calculated. Most (though not all) variables that are directly available from the ACS are count variables. Many of the variables that are calculated by `library(urbnindicators)` are percent -variables, where we divide two count variables. For example, -`disability_percent` is simply a numerator divided by a denominator: +variables, where we divide two count variables. -```{r} -df_urbnindicators %>% - attr("codebook") %>% - filter(str_detect(calculated_variable, "^disability_percent$")) %>% - select(calculated_variable, variable_type, definition) %>% - reactable::reactable() -``` - -The most common convention is that a percent variable is calculated by -dividing a count variable (e.g., `means_transportation_work_walked`) by the -universe variable for the corresponding table (e.g., `means_transportation_work_universe`). -But other derived variables are more complex, such as those for commute -mode. Here, we've aggregated three counts representing different types -of individual motor vehicle transportation to create the numerator, -while the denominator is the table universe minus individuals who work -from home. +Others, however, are quite complex. 
For example, `disability_percent` is +the sum of all of the sex-by-age groupings for people with disabilities (numerator) +divided by the table universe. ```{r} df_urbnindicators %>% attr("codebook") %>% - filter(str_detect(calculated_variable, "means.*motor_vehicle")) %>% - select(calculated_variable, variable_type, definition) %>% - reactable::reactable() + filter(str_detect(calculated_variable, "^disability_percent$")) %>% + pull(definition) ``` - -This allows us say something along the lines of: "Of individuals who commute to work, -XX% use a motor vehicle as their primary commute mode."
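Because this change drops the `_CV` and `_SE` columns, the vignettes now derive CVs inline from the `_M` columns. A minimal sketch of that derivation as a standalone helper, assuming 90% MOEs (z = 1.645); `moe_to_cv()` is a hypothetical name used only for illustration and is not a function in the package:

```r
## Hypothetical helper (not part of urbnindicators): derive a coefficient of
## variation from an estimate and its 90% margin of error, mirroring the
## inline calculation used in the vignettes.
moe_to_cv = function(estimate, moe, z = 1.645) {
  se = moe / z               # standard error implied by the MOE
  cv = se / estimate * 100   # CV as a percent of the estimate
  ## a zero estimate yields Inf (or NaN for 0 / 0); unlike the vignette code,
  ## which only recodes Inf, this sketch treats every non-finite value as missing
  replace(cv, !is.finite(cv), NA_real_)
}

## example: one ordinary estimate, one zero estimate, one missing estimate
moe_to_cv(estimate = c(0.20, 0, NA), moe = c(0.05, 0.02, 0.01))
```

Returning `NA` for zero or missing estimates keeps summaries such as `median(cv, na.rm = TRUE)` from being distorted by infinite values.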