UI-Research · wcurrangroome · Mar 4, 2026 · Mar 1, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -2,7 +2,7 @@
 
 ## What this package does
 
-**urbnindicators** is an R package that provides analysis-ready American Community Survey (ACS) data with minimal user effort. The main entry point is `compile_acs_data()`, which pulls hundreds of standardized variables (raw counts + calculated percentages), generates a codebook, and computes margins of error and coefficients of variation.
+**urbnindicators** is an R package that provides analysis-ready American Community Survey (ACS) data with minimal user effort. The main entry point is `compile_acs_data()`, which pulls hundreds of standardized variables (raw counts + calculated percentages), generates a codebook, and computes margins of error.
 
 - Five-year ACS estimates only; tract-level geography and up (no block groups)
 - Lifecycle stage: experimental
@@ -37,7 +37,7 @@ CI runs on GitHub Actions: `test-coverage.yaml` (push/PR to main) and `pkgdown.y
 - **Indentation**: 2 spaces
 - **Naming**: `snake_case` for functions and variables
 - **Variable naming pattern**: `[concept]_[subconcept]_[characteristic]_[metric]` (e.g., `race_nonhispanic_white_alone_percent`)
-- **Variable suffixes**: `_percent` for percentages, `_universe` or `_universe_` for universe variables, `_M` for margins of error, `_CV` for coefficients of variation, `_SE` for standard errors
+- **Variable suffixes**: `_percent` for percentages, `_universe` or `_universe_` for universe variables, `_M` for margins of error
 - **Documentation**: roxygen2 (v7.3.2) with markdown mode enabled
 - **Conditionals**: `dplyr::if_else()` (not base `ifelse()`)
 - **Division**: use `safe_divide(x, y)` for percentage calculations (returns 0 instead of NaN)
@@ -69,12 +69,10 @@ Users can request specific subsets of data:
 # Pull specific tables (using construct-level names)
 compile_acs_data(tables = c("race", "snap"), years = 2022, geography = "county", states = "NJ")
 
-# Pull by indicator name (returns the full parent table)
-compile_acs_data(indicators = c("snap_received_percent"), years = 2022, geography = "county", states = "NJ")
-
-# Discover available tables, indicators, and variables
+# Discover available tables and variables
 list_tables()
-list_variables()  # tibble of all variables and their table names
+list_variables()       # tibble of all variables and their table names
+get_acs_codebook()     # browse ACS variables with clean names and table codes
 ```
 
 **Construct-level table names**: Some ACS tables contain multiple constructs. These are split into separate user-facing tables:
@@ -83,7 +81,7 @@ list_variables()  # tibble of all variables and their table names
 
 Both construct names and internal names are accepted by `compile_acs_data(tables = ...)` and `resolve_tables()`.
 
-When `tables`/`indicators` are specified:
+When `tables` are specified:
 1. `resolve_tables()` determines which tables are needed (always includes `total_population`)
 2. `collect_raw_variables()` builds the named ACS variable vector for those tables
 3. Only those tables' `compute_fn` functions are called
@@ -92,19 +90,19 @@ When `tables`/`indicators` are specified:
 
 ### Key source files
 
-1. **`R/table_registry.R`** - Central registry: table definitions, `list_tables()`, `list_indicators()`, `resolve_tables()`, `collect_raw_variables()`, `expand_codebook_entry()`, and all `register_table()` calls.
-2. **`R/list_acs_variables.R`** - `list_acs_variables()` (supports optional `tables` param), `select_variables_by_name()`, `filter_variables()`.
-3. **`R/compile_acs_data.R`** - `compile_acs_data()` (with `tables`, `indicators`, deprecated `variables`), `internal_compute_acs_variables()` (legacy), `safe_divide()`.
+1. **`R/table_registry.R`** - Central registry: table definitions, `list_tables()`, `resolve_tables()`, `collect_raw_variables()`, `expand_codebook_entry()`, and all `register_table()` calls.
+2. **`R/list_acs_variables.R`** - `list_acs_variables()` (supports optional `tables` param), `select_variables_by_name()`, `filter_variables()`, `get_acs_codebook()`.
+3. **`R/compile_acs_data.R`** - `compile_acs_data()` (with `tables`, deprecated `variables`), `internal_compute_acs_variables()` (legacy), `safe_divide()`.
 4. **`R/generate_codebook.R`** - `generate_codebook()` (registry-based) and `generate_codebook_legacy()` (AST-based, for deprecated `variables` path).
-5. **`R/calculate_cvs.R`** - Computes standard errors and coefficients of variation. Parses codebook definition text strings. No changes needed when adding tables.
+5. **`R/calculate_cvs.R`** - Computes margins of error for derived variables (uses standard errors as intermediates internally). Parses codebook definition text strings. No changes needed when adding tables.
 6. **`R/make_pretty_names.R`** - Converts variable names to publication-ready labels.
 7. **`R/utils-pipe.R`** - Re-exports `%>%`.
 
 ### Exported functions
 
-- `compile_acs_data(tables, indicators, ...)` - Pull and compute ACS data
+- `compile_acs_data(tables, ...)` - Pull and compute ACS data
 - `list_tables()` - Available table names for the `tables` parameter (construct-level names)
-- `list_indicators()` - Available indicator names for the `indicators` parameter
+- `get_acs_codebook(year, table)` - Browse ACS variables with clean names and table codes
 - `list_variables(year)` - Tibble mapping all variables (raw + computed) to their table name
 - `list_acs_variables(year, tables)` - Named vector of ACS variable codes
 - `select_variables_by_name(variable_name, census_codebook)` - Filter variables by pattern
@@ -121,9 +119,9 @@ To add a new ACS table to the package:
    - `compute_fn` that calculates derived indicators using `safe_divide()` and `dplyr::across()`
    - `codebook_entries` with structured entries (types: `simple_percent`, `across_percent`, `across_sum`, `complex`, `one_minus`, `metadata`)
 2. **Add any new global variables** to the `utils::globalVariables()` call at the bottom of `R/table_registry.R`
-3. **Verify**: `devtools::load_all()` then `list_tables()` shows your table; `list_indicators()` shows your indicators
+3. **Verify**: `devtools::load_all()` then `list_tables()` shows your table
 4. **Verify codebook**: the codebook auto-generates from `codebook_entries` -- no changes to `R/generate_codebook.R` needed
-5. **Verify CVs**: `R/calculate_cvs.R` parses codebook definition strings -- no changes needed if definitions follow standard patterns
+5. **Verify MOEs**: `R/calculate_cvs.R` parses codebook definition strings -- no changes needed if definitions follow standard patterns
 6. **Update pretty names** if needed (`R/make_pretty_names.R` -- rarely needed)
 
 ### Codebook entry types
@@ -142,7 +140,7 @@ To add a new ACS table to the package:
 - Percentages must be 0-1 bounded
 - All measures must have meaningful, non-missing values
 - At least 2 distinct values per measure
-- CVs should be reasonable for tract-level data (flag if >50 for many tracts)
+- MOEs should be reasonable for tract-level data
 - Compare to published Census Bureau benchmarks when available
 
 ## Legacy path

diff --git a/NAMESPACE b/NAMESPACE
@@ -4,6 +4,7 @@ export("%>%")
 export(calculate_custom_geographies)
 export(compile_acs_data)
 export(filter_variables)
+export(get_acs_codebook)
 export(list_acs_variables)
 export(list_tables)
 export(list_variables)

diff --git a/R/auto_percent.R b/R/auto_percent.R
@@ -0,0 +1,241 @@
+#' @importFrom magrittr %>%
+
+####----AUTO-PERCENTAGE COMPUTATION FOR ARBITRARY ACS TABLES----####
+
+## Detect whether a string looks like a raw ACS table code (e.g., "B25070", "C15002B", "B01001APR")
+is_raw_acs_code = function(x) {
+  grepl("^[BC][0-9]{5}[A-I]?(?:PR)?$", x, perl = TRUE)
+}
+
+## Resolve a user-supplied string to an ACS table code.
+## Accepts:
+##   1. A raw ACS code ("B25070") -> returned as-is
+##   2. A cleaned variable name from clean_acs_names() -> resolved to parent table
+## Returns the table code, or NULL if not resolvable.
+resolve_to_acs_table = function(x, year, census_variables = NULL) {
+  if (is_raw_acs_code(x)) return(x)
+
+  ## load census codebook if not provided
+  if (is.null(census_variables)) {
+    suppressMessages({suppressWarnings({
+      census_variables = tidycensus::load_variables(year = year, dataset = "acs5")
+    })})
+  }
+
+  ## apply clean_acs_names and search for a match
+  cleaned = census_variables %>% clean_acs_names()
+  clean_col = cleaned$clean_names %>% stringr::str_remove("_$")
+
+  ## exact match first
+  match_idx = which(clean_col == x)
+  if (length(match_idx) == 0) {
+    ## try partial match (user gives a prefix)
+    match_idx = which(stringr::str_starts(clean_col, stringr::fixed(x)))
+  }
+
+  if (length(match_idx) == 0) return(NULL)
+
+  ## extract the ACS table code from the variable name (e.g., "B25070_001" -> "B25070")
+  acs_name = cleaned$name[match_idx[1]]
+  table_code = stringr::str_extract(acs_name, "^[BC][0-9]{5}[A-I]?(?:PR)?")
+  return(table_code)
+}
+
+## Build a label tree for a single ACS table.
+## Takes a data frame filtered to one table (from tidycensus::load_variables)
+## with clean_acs_names() already applied.
+## Returns the data frame with additional columns: segments, depth, is_total,
+## is_subtotal, clean_name_trimmed, parent_code, parent_clean_name.
+build_label_tree = function(variables_df) {
+  ## parse label segments (split on !!)
+  variables_df = variables_df %>%
+    dplyr::mutate(
+      segments = stringr::str_split(label, "!!"),
+      depth = purrr::map_int(segments, length),
+      is_total = stringr::str_detect(name, "_001$"),
+      is_subtotal = stringr::str_detect(label, ":$") & !is_total,
+      clean_name_trimmed = stringr::str_remove(clean_names, "_$"))
+
+  ## assign parent for each variable
+  ## for variable i, walk backward to find the nearest ancestor subtotal
+  ## whose segments are a strict prefix of this variable's segments
+  n = nrow(variables_df)
+  total_name = variables_df$name[1]
+  total_clean = variables_df$clean_name_trimmed[1]
+
+  parent_results = purrr::map(seq_len(n), function(i) {
+    if (variables_df$is_total[i]) {
+      return(list(parent_code = NA_character_, parent_clean_name = NA_character_))
+    }
+
+    current_segments = variables_df$segments[[i]]
+    candidates = rev(seq_len(i - 1))
+
+    ## find the nearest ancestor whose segments are a strict prefix
+    match_idx = purrr::detect(candidates, function(j) {
+      candidate_segments = variables_df$segments[[j]]
+      length(candidate_segments) < length(current_segments) &&
+        all(candidate_segments == current_segments[seq_along(candidate_segments)]) &&
+        (variables_df$is_subtotal[j] || variables_df$is_total[j])
+    })
+
+    if (!is.null(match_idx)) {
+      list(parent_code = variables_df$name[match_idx],
+           parent_clean_name = variables_df$clean_name_trimmed[match_idx])
+    } else {
+      ## fallback to _001 (table total)
+      list(parent_code = total_name, parent_clean_name = total_clean)
+    }
+  })
+
+  variables_df$parent_code = purrr::map_chr(parent_results, "parent_code")
+  variables_df$parent_clean_name = purrr::map_chr(parent_results, "parent_clean_name")
+  return(variables_df)
+}
+
+## Classify an ACS table as "count" (percentages appropriate) or "skip" (not appropriate).
+## Detection based on concept field and the _001 label.
+classify_acs_table = function(nodes) {
+  concept = nodes$concept[1]
+  total_label = nodes$label[nodes$is_total][1]
+
+  concept_lower = tolower(concept)
+  label_lower = tolower(total_label)
+
+  ## patterns that indicate non-percentage-amenable tables
+  skip_patterns = c(
+    "median", "aggregate", "average", "mean",
+    "allocation of", "imputation of",
+    "margin of error")
+
+  has_skip_pattern = purrr::some(skip_patterns, function(pattern) {
+    grepl(pattern, concept_lower, fixed = TRUE) ||
+      grepl(pattern, label_lower, fixed = TRUE)
+  })
+  if (has_skip_pattern) return("skip")
+
+  ## singleton tables (only one variable) — no meaningful percentages
+  if (nrow(nodes) <= 1) return("skip")
+
+  ## tables where the total is not a count (e.g., median income tables may have
+  ## a numeric label rather than "Estimate!!Total:")
+  if (!grepl(":", total_label) && !grepl("^Estimate!!Total$", total_label)) {
+    return("skip")
+  }
+
+  return("count")
+}
+
+## Generate simple_percent definitions for auto-computed tables.
+## For each non-total variable, produces a define_percent() call.
+## denominator_mode: "parent" (nearest parent subtotal) or "total" (_001). A specific ACS variable code can be supplied via custom_denominator, which takes precedence when found in the table.
+generate_auto_definitions = function(nodes, denominator_mode = "parent",
+                                     custom_denominator = NULL) {
+  ## only process non-total variables
+  leaf_nodes = nodes %>% dplyr::filter(!is_total)
+
+  if (nrow(leaf_nodes) == 0) return(list())
+
+  ## determine total row clean name (for "total" mode or fallback)
+  total_clean_name = nodes$clean_name_trimmed[nodes$is_total][1]
+
+  ## if a custom denominator ACS code is given, find its clean name
+  custom_denom_clean = NULL
+  if (!is.null(custom_denominator)) {
+    match_row = nodes %>% dplyr::filter(name == custom_denominator)
+    if (nrow(match_row) > 0) {
+      custom_denom_clean = match_row$clean_name_trimmed[1]
+    } else {
+      rlang::warn(paste0("Custom denominator '", custom_denominator,
+                         "' not found in table. Falling back to table total."))
+      denominator_mode = "total"
+    }
+  }
+
+  purrr::map(seq_len(nrow(leaf_nodes)), function(i) {
+    row = leaf_nodes[i, ]
+    numerator = row$clean_name_trimmed
+
+    ## determine denominator
+    if (!is.null(custom_denom_clean)) {
+      denominator = custom_denom_clean
+    } else if (denominator_mode == "total") {
+      denominator = total_clean_name
+    } else {
+      ## "parent" mode: use parent_clean_name, fall back to total
+      denominator = row$parent_clean_name
+      if (is.na(denominator)) denominator = total_clean_name
+    }
+
+    ## skip if numerator == denominator (e.g., the row is itself the chosen denominator)
+    if (identical(numerator, denominator)) return(NULL)
+
+    ## raw variables ending in _pct (renamed from _percent by clean_acs_names):
+    ## replace _pct with _percent so the computed column gets the standard suffix
+    if (grepl("_pct$", numerator)) {
+      output = sub("_pct$", "_percent", numerator)
+    } else {
+      output = paste0(numerator, "_percent")
+    }
+    define_percent(output = output, numerator = numerator, denominator = denominator)
+  }) %>% purrr::compact()
+}
+
+## Orchestrator: build a complete auto-table entry from an ACS table code.
+## Returns a list with the same shape as register_table() entries, plus is_auto = TRUE.
+## Pass census_variables to avoid redundant tidycensus::load_variables() calls.
+build_auto_table_entry = function(table_code, year, denominator_mode = "parent",
+                                  custom_denominator = NULL,
+                                  census_variables = NULL) {
+  ## load variables only if not provided
+  if (is.null(census_variables)) {
+    suppressMessages({suppressWarnings({
+      census_variables = tidycensus::load_variables(year = year, dataset = "acs5")
+    })})
+  }
+
+  table_vars = census_variables %>%
+    dplyr::filter(stringr::str_detect(name, paste0("^", table_code, "_")))
+
+  if (nrow(table_vars) == 0) {
+    stop(paste0("ACS table '", table_code, "' not found in the ", year,
+                " 5-year ACS. Check the table code."))
+  }
+
+  ## apply clean_acs_names
+  table_vars = table_vars %>% clean_acs_names()
+
+  ## build label tree
+  nodes = build_label_tree(table_vars)
+
+  ## classify table
+  table_type = classify_acs_table(nodes)
+
+  ## generate definitions
+  if (table_type == "count") {
+    definitions = generate_auto_definitions(
+      nodes,
+      denominator_mode = denominator_mode,
+      custom_denominator = custom_denominator)
+  } else {
+    definitions = list()
+  }
+
+  ## build raw_variables named vector (clean_name_ -> ACS code)
+  raw_variables = stats::setNames(nodes$name, paste0(nodes$clean_name_trimmed, "_"))
+
+  list(
+    name = table_code,
+    description = nodes$concept[1],
+    acs_tables = table_code,
+    depends_on = character(0),
+    raw_variable_source = list(type = "manual"),
+    raw_variables = raw_variables,
+    definitions = definitions,
+    is_auto = TRUE,
+    table_type = table_type)
+}
+
+utils::globalVariables(c(
+  "clean_names", "is_total", "clean_name_trimmed",
+  "segments", "depth", "is_subtotal", "parent_code", "parent_clean_name"))
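
Reviewer note: the table-code check and the parent lookup in `R/auto_percent.R` may be easier to follow with a toy example. The following is a minimal base-R sketch, not the package's implementation. `is_raw_acs_code()` reuses the regex from the diff. `find_parent_index()` is a hypothetical helper that mirrors the strict-prefix logic of `build_label_tree()` without dplyr, purrr, or stringr, and the labels are invented for illustration rather than taken from a real ACS table.

```r
## Same pattern as is_raw_acs_code() in the diff above.
is_raw_acs_code = function(x) {
  grepl("^[BC][0-9]{5}[A-I]?(?:PR)?$", x, perl = TRUE)
}

## Hypothetical helper (not part of the package): for each label, return the
## row index of the nearest earlier row that is a subtotal (label ends in ":")
## or the table total (row 1), and whose "!!"-separated segments are a strict
## prefix of the current row's segments. Falls back to row 1.
find_parent_index = function(labels) {
  segments = strsplit(labels, "!!", fixed = TRUE)
  is_group = grepl(":$", labels)
  vapply(seq_along(labels), function(i) {
    ## the table total has no parent
    if (i == 1L) return(NA_integer_)
    current = segments[[i]]
    ## walk backward from the row just above the current one
    for (j in rev(seq_len(i - 1L))) {
      candidate = segments[[j]]
      if (length(candidate) < length(current) &&
          all(candidate == current[seq_along(candidate)]) &&
          (is_group[j] || j == 1L)) {
        return(j)
      }
    }
    ## no ancestor found: fall back to the table total
    1L
  }, integer(1))
}

## Invented labels in the style of tidycensus::load_variables() output.
toy_labels = c(
  "Estimate!!Total:",
  "Estimate!!Total:!!Owner occupied:",
  "Estimate!!Total:!!Owner occupied:!!With a mortgage",
  "Estimate!!Total:!!Owner occupied:!!Without a mortgage",
  "Estimate!!Total:!!Renter occupied")

codes_ok = is_raw_acs_code(c("B25070", "C15002B", "B01001APR", "snap", "B2507"))
parents = find_parent_index(toy_labels)

print(codes_ok)
print(parents)
```

In "parent" mode the two mortgage rows would be divided by the "Owner occupied:" subtotal (row 2), while "Renter occupied" would be divided by the table total (row 1). In "total" mode every row would use row 1.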