diff --git a/README.Rmd b/README.Rmd
index dae7fa2..cb07116 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -275,4 +275,8 @@ Cite the organizations that produce the crosswalks returned by this package:
*CT Data Collaborative. (2023). 2022 Census Tract Crosswalk. Retrieved from: https://github.com/CT-Data-Collaborative/2022-tract-crosswalk.*
-- **For this package:** https://ui-research.github.io/crosswalk/authors.html#citation
\ No newline at end of file
+- **For this package:** https://ui-research.github.io/crosswalk/authors.html#citation
+
+## AI Use
+
+This package was written in part with the use of agentic AI tools under the supervision of the author.
\ No newline at end of file
diff --git a/README.md b/README.md
index 6505ad4..c274746 100644
--- a/README.md
+++ b/README.md
@@ -1,328 +1,312 @@
-
-# crosswalk
-
-An R package for translating data across space and time.
-
-## Overview
-
-This package provides a consistent API and standardized versions of
-crosswalks to enable consistent approaches that work across different
-geography and year combinations. The package also facilitates
-interpolation–that is, adjusting source geography/year values by their
-crosswalk weights and translating these values to the desired target
-geography/year–including diagnostics of the joins between source data
-and crosswalks.
-
-The package sources crosswalks from:
-
-- **Geocorr** (Missouri Census Data Center) - for inter-geography
- crosswalks (same-decade)
-- **IPUMS NHGIS** - for inter-temporal crosswalks (across decades)
-- **CT Data Collaborative** - for Connecticut 2020→2022 crosswalks
- (planning region changes)
-
-## Why Use `crosswalk`?
-
-- **Programmatic access**: No more manual downloads from web interfaces;
- data is cached for speed
-- **Standardized output**: Consistent column names across all crosswalk
- sources
-- **Metadata tracking**: Full provenance of crosswalks stored as
- attributes
-- **Crosswalk chaining**: Automatic chaining when multiple crosswalks
- are required
-
-## Installation
-
-``` r
-# Install from GitHub
-renv::install("UI-Research/crosswalk")
-```
-
-## Quick Start
-
-First we obtain a crosswalk and apply it to our data:
-
-``` r
-library(crosswalk)
-library(dplyr)
-library(ggplot2)
-library(stringr)
-library(sf)
-library(tidycensus)
-library(tigris)
-library(scales)
-
-source_data = get_acs(
- year = 2023,
- geography = "zcta",
- output = "wide",
- variables = c(below_poverty_level = "B17001_002")) %>%
- select(
- source_geoid = GEOID,
- count_below_poverty_level = below_poverty_levelE)
-
-# Get a crosswalk from ZCTAs to PUMAs (same year, uses Geocorr (2022))
-zcta_puma_crosswalk <- get_crosswalk(
- source_geography = "zcta",
- target_geography = "puma22",
- weight = "population")
-
-# Apply the crosswalk to your data
-crosswalked_data <- crosswalk_data(
- data = source_data,
- crosswalk = zcta_puma_crosswalk)
-
-## Or in a single step
-crosswalked_data = crosswalk_data(
- data = source_data,
- source_geography = "zcta",
- target_geography = "puma22",
- weight = "population")
-```
-
-What does the crosswalk(s) reflect and how was it sourced?
-
-``` r
-## and there's more (not shown)
-names(attr(crosswalked_data, "crosswalk_metadata")) %>% head()
-#> [1] "call_parameters" "data_source" "data_source_full_name"
-#> [4] "download_url" "api_endpoint" "documentation_url"
-```
-
-How well did the crosswalk join to our source data?
-
-``` r
-## look at all the characteristics of the join(s) between the source data
-## and the crosswalks
-join_quality = attr(crosswalked_data, "join_quality")
-
-## what share of records in the source data do not join to a crosswalk and
-## thus are dropped during the crosswalking process?
-join_quality$pct_data_unmatched
-#> [1] 0.4234277
-
-## zctas aren't nested within states, otherwise join_quality$state_analysis_data
-## would help us to ID whether non-joining source data were clustered within one
-## or a few states. instead we can join to spatial data to diagnose further:
-zctas_sf = zctas(year = 2023, progress_bar = FALSE)
-states_sf = states(year = 2023, cb = TRUE, progress_bar = FALSE)
-
-## apart from DC, which has a disproportionate number of non-joining ZCTAs--
-## seemingly corresponding to federal areas and buildings--the distribution of
-## non-joining ZCTAs appears proportionate to state-level populations and is
-## distributed across many states:
-zctas_sf %>%
- filter(GEOID20 %in% join_quality$data_geoids_unmatched) %>%
- st_intersection(states_sf %>% select(NAME)) %>%
- st_drop_geometry() %>%
- count(NAME, sort = TRUE) %>%
- head()
-#> NAME n
-#> 1 District of Columbia 19
-#> 2 New York 15
-#> 3 Texas 9
-#> 4 California 8
-#> 5 Colorado 6
-#> 6 Utah 6
-```
-
-And how accurate was the crosswalking process?
-
-``` r
-comparison_data = get_acs(
- year = 2023,
- geography = "puma",
- output = "wide",
- variables = c(
- below_poverty_level = "B17001_002")) %>%
- select(
- source_geoid = GEOID,
- count_below_poverty_level_acs = below_poverty_levelE)
-
-combined_data = left_join(
- comparison_data,
- crosswalked_data,
- by = c("source_geoid" = "geoid"))
-
-combined_data %>%
- select(source_geoid, matches("count")) %>%
- mutate(difference_percent = (count_below_poverty_level_acs - count_below_poverty_level) / count_below_poverty_level_acs) %>%
- ggplot() +
- geom_histogram(aes(x = difference_percent)) +
- theme_minimal() +
- theme(panel.grid = element_blank()) +
- scale_x_continuous(labels = percent) +
- labs(
- title = "Crosswalked data approximates observed values",
- subtitle = "Block group-level source data would produce more accurate crosswalked values",
- y = "",
- x = "Percent difference between observed and crosswalked values")
-```
-
-
-
-## Core Functions
-
-The package has two main functions, though you can also specify the
-needed crosswalk(s) directly from `crosswalk_data()` and omit the
-intermediate `get_crosswalk()` call.
-
-| Function | Purpose |
-|----|----|
-| `get_crosswalk()` | Fetch crosswalk(s) |
-| `crosswalk_data()` | Apply crosswalk(s) to interpolate data to the target geography-year |
-
-## Output Structure
-
-`get_crosswalk()` **always returns a list** structured as follows:
-
-The list contains three elements:
-
-| Element | Description |
-|--------------|-------------------------------------------------------|
-| `crosswalks` | A named list of crosswalks (`step_1`, `step_2`, etc.) |
-| `plan` | Details about what crosswalks are being fetched |
-| `message` | A description of the crosswalk chain |
-
-### Multi-Step Crosswalks
-
-For some source year/geography -\> target year/geography combinations,
-there is not a single direct crosswalk. The package automatically plans
-and fetches the required chain of crosswalks, using a year-first
-strategy:
-
-1. **NHGIS step(s)**: Change year while keeping geography constant
- (multiple hops if the temporal span requires it, e.g. 1990→2010→2020)
-2. **Geocorr step**: Change geography at the target year
-
-``` r
-result <- get_crosswalk(
- source_geography = "tract",
- target_geography = "zcta",
- source_year = 2010,
- target_year = 2020,
- weight = "population",
- silent = TRUE)
-
-# Two crosswalks are returned
-# Step 1: 2010 tracts -> 2020 tracts (NHGIS)
-# Step 2: 2020 tracts -> 2020 ZCTAs (Geocorr)
-
-# Longer chains are produced when needed, e.g.
-# 2000 tracts -> 2020 ZCTAs produces three steps:
-# Step 1: 2000 tracts -> 2010 tracts (NHGIS)
-# Step 2: 2010 tracts -> 2020 tracts (NHGIS)
-# Step 3: 2020 tracts -> 2020 ZCTAs (Geocorr)
-```
-
-### Crosswalk Structure
-
-Each crosswalk contains standardized columns:
-
-| Column | Description |
-|----|----|
-| `source_geoid` | Identifier for source geography |
-| `target_geoid` | Identifier for target geography |
-| `allocation_factor_source_to_target` | Weight for interpolating values |
-| `weighting_factor` | What attribute was used (population, housing, land) |
-
-Additional columns may include `source_year`, `target_year`,
-`population_2020`, `housing_2020`, and `land_area_sqmi` depending on the
-source of the crosswalk.
-
-### Accessing Metadata
-
-Each crosswalk tibble has a `crosswalk_metadata` attribute that
-documents what the crosswalk represents and how it was created:
-
-``` r
-metadata <- attr(result$crosswalks$step_1, "crosswalk_metadata")
-names(metadata)
-```
-
-## Interpolation
-
-`crosswalk_data()` applies crosswalk weights to transform your data. If
-you’re in a hurry, you can omit a call to `get_crosswalk()` and specify
-the needed crosswalk parameters to `crosswalk_data()`, which will pass
-these to `get_crosswalk()` behind the scenes. Or you can call
-`get_crosswalk()` explicitly and then pass the result to
-`crosswalk_data()`.
-
-### Column Naming Convention
-
-The function auto-detects columns based on prefixes:
-
-| Prefix | Treatment |
-|----|----|
-| `count_` | Summed after weighting (for counts like population, housing units) |
-| `mean_`, `median_`, `percent_`, `ratio_` | Weighted mean (for rates, percentages, averages) |
-
-You can also specify columns explicitly via `count_columns` and
-`non_count_columns`. All non-count variables are interpolated using
-weighted means, weighting by the allocation factor from the crosswalk.
-
-## Supported Geography and Year Combinations
-
-`get_available_crosswalks()` returns a listing of all supported
-year-geography combinations.
-
-``` r
-get_available_crosswalks() %>%
- head()
-#> # A tibble: 6 × 4
-#> source_geography target_geography source_year target_year
-#>
-#> 1 block block 1990 2010
-#> 2 block block 2000 2010
-#> 3 block block 2010 2020
-#> 4 block block 2020 2010
-#> 5 block block 2020 2022
-#> 6 block block 2022 2020
-```
-
-## API Keys
-
-NHGIS crosswalks require an IPUMS API key. Get one at
- and add to your `.Renviron`:
-
-``` r
-usethis::edit_r_environ()
-# Add: IPUMS_API_KEY=your_key_here
-```
-
-## Caching
-
-Use the `cache` parameter to save crosswalks locally for ease:
-
-``` r
-result <- get_crosswalk(
- source_geography = "tract",
- target_geography = "zcta",
- weight = "population",
- cache = here::here("crosswalks-cache"))
-```
-
-## Citations
-
-Cite the organizations that produce the crosswalks returned by this
-package:
-
-**For NHGIS**, see requirements at:
-
-
-**For Geocorr**, a suggested citation (update the year):
-
-> Missouri Census Data Center, University of Missouri. (2022/2018).
-> Geocorr 2022/2018: Geographic Correspondence Engine. Retrieved from:
->
-
-**For CTData**, a suggested citation (adjust for alternate source
-geography):
-
-> CT Data Collaborative. (2023). 2022 Census Tract Crosswalk. Retrieved
-> from: .
-
-**For this package**, refer here:
-
+
+# crosswalk
+
+An R package for translating data across space and time.
+
+## Overview
+
+This package provides a simple API and standardized versions of
+crosswalks to enable consistent, programmatic approaches that work
+across different geography and year combinations.
+
+The package also facilitates interpolation (that is, adjusting source
+geography/year values by their crosswalk weights and translating these
+values to the desired target geography/year), including diagnostics of
+the joins between source data and crosswalks.
+
+The package sources crosswalks from:
+
+- **Geocorr** (Missouri Census Data Center) - for inter-geography
+ crosswalks (same-decade)
+- **IPUMS NHGIS** - for inter-temporal crosswalks (across decades)
+- **CT Data Collaborative** - for Connecticut 2020→2022 crosswalks
+ (planning region changes)
+
+## Why Use `crosswalk`?
+
+- **Programmatic access**: No more manual downloads from web interfaces;
+ data is cached for speed
+- **Standardized output**: Consistent column names across all crosswalk
+ sources
+- **Metadata tracking**: Full provenance of crosswalks stored as
+ attributes
+- **Crosswalk chaining**: Automatic chaining when multiple crosswalks
+ are required
+
+## Installation
+
+    # Install from GitHub
+    renv::install("UI-Research/crosswalk")
+
+## Quick Start
+
+We obtain a crosswalk and apply it to our data:
+
+``` r
+library(crosswalk)
+library(dplyr)
+library(ggplot2)
+library(stringr)
+library(sf)
+library(tidycensus)
+library(tigris)
+library(scales)
+
+source_data = get_acs(
+ year = 2023,
+ geography = "zcta",
+ output = "wide",
+ variables = c(below_poverty_level = "B17001_002")) %>%
+ select(
+ source_geoid = GEOID,
+ count_below_poverty_level = below_poverty_levelE)
+
+# Get a crosswalk from ZCTAs to PUMAs (same year, uses Geocorr (2022))
+zcta_puma_crosswalk <- get_crosswalk(
+ source_geography = "zcta",
+ target_geography = "puma22",
+ weight = "population")
+
+# Apply the crosswalk to your data
+crosswalked_data <- crosswalk_data(
+ data = source_data,
+ crosswalk = zcta_puma_crosswalk)
+
+## Or in a single step
+crosswalked_data = crosswalk_data(
+ data = source_data,
+ source_geography = "zcta",
+ target_geography = "puma22",
+ weight = "population")
+```
+
+What do the crosswalk(s) reflect, and how were they sourced?
+
+``` r
+## and there's more (not shown)
+names(attr(crosswalked_data, "crosswalk_metadata")) %>% head()
+#> [1] "call_parameters" "data_source" "data_source_full_name"
+#> [4] "download_url" "api_endpoint" "documentation_url"
+```
+
+How well did the crosswalk join to our source data?
+
+``` r
+## look at all the characteristics of the join(s) between the source data
+## and the crosswalks
+join_quality = attr(crosswalked_data, "join_quality")
+
+## what share of records in the source data do not join to a crosswalk and
+## thus are dropped during the crosswalking process?
+join_quality$pct_data_unmatched
+#> [1] 0.4234277
+
+## zctas aren't nested within states, otherwise join_quality$state_analysis_data
+## would help us to ID whether non-joining source data were clustered within one
+## or a few states. instead we can join to spatial data to diagnose further:
+zctas_sf = zctas(year = 2023, progress_bar = FALSE)
+states_sf = states(year = 2023, cb = TRUE, progress_bar = FALSE)
+
+## apart from DC, which has a disproportionate number of non-joining ZCTAs--
+## seemingly corresponding to federal areas and buildings--the distribution of
+## non-joining ZCTAs appears proportionate to state-level populations and is
+## distributed across many states:
+zctas_sf %>%
+ filter(GEOID20 %in% join_quality$data_geoids_unmatched) %>%
+ st_intersection(states_sf %>% select(NAME)) %>%
+ st_drop_geometry() %>%
+ count(NAME, sort = TRUE) %>%
+ head()
+#> NAME n
+#> 1 District of Columbia 19
+#> 2 New York 15
+#> 3 Texas 9
+#> 4 California 8
+#> 5 Colorado 6
+#> 6 Utah 6
+```
+
+And how accurate was the crosswalking process?
+
+``` r
+comparison_data = get_acs(
+ year = 2023,
+ geography = "puma",
+ output = "wide",
+ variables = c(
+ below_poverty_level = "B17001_002")) %>%
+ select(
+ source_geoid = GEOID,
+ count_below_poverty_level_acs = below_poverty_levelE)
+
+combined_data = left_join(
+ comparison_data,
+ crosswalked_data,
+ by = c("source_geoid" = "geoid"))
+
+combined_data %>%
+ select(source_geoid, matches("count")) %>%
+ mutate(difference_percent = (count_below_poverty_level_acs - count_below_poverty_level) / count_below_poverty_level_acs) %>%
+ ggplot() +
+ geom_histogram(aes(x = difference_percent)) +
+ theme_minimal() +
+ theme(panel.grid = element_blank()) +
+ scale_x_continuous(labels = percent) +
+ labs(
+ title = "Crosswalked data approximates observed values",
+ subtitle = "Block group-level source data would produce more accurate crosswalked values",
+ y = "",
+ x = "Percent difference between observed and crosswalked values")
+```
+
+![](man/figures/README-unnamed-chunk-5-1.png)
+
+## Core Functions
+
+The package has two main functions, though you can also specify the
+needed crosswalk(s) directly from `crosswalk_data()` and omit the
+intermediate `get_crosswalk()` call.
+
+| Function | Purpose |
+|----|----|
+| `get_crosswalk()` | Fetch crosswalk(s) |
+| `crosswalk_data()` | Apply crosswalk(s) to interpolate data to the target geography-year |
+
+## Output Structure
+
+`get_crosswalk()` **always returns a list** structured as follows:
+
+The list contains three elements:
+
+| Element | Description |
+|--------------|-------------------------------------------------------|
+| `crosswalks` | A named list of crosswalks (`step_1`, `step_2`, etc.) |
+| `plan` | Details about what crosswalks are being fetched |
+| `message` | A description of the crosswalk chain |
+
+### Multi-Step Crosswalks
+
+For some source year/geography -\> target year/geography combinations,
+there is not a single direct crosswalk. In such cases, we need two
+crosswalks. The package automatically plans and fetches the required
+crosswalks:
+
+1. **Step 1 (NHGIS)**: Change year, keep geography constant
+2. **Step 2 (Geocorr)**: Change geography at target year
+
+``` r
+result <- get_crosswalk(
+ source_geography = "tract",
+ target_geography = "zcta",
+ source_year = 2010,
+ target_year = 2020,
+ weight = "population",
+ silent = TRUE)
+
+# Two crosswalks are returned
+# Step 1: 2010 tracts -> 2020 tracts (NHGIS)
+# Step 2: 2020 tracts -> 2020 ZCTAs (Geocorr)
+```
+
+### Crosswalk Structure
+
+Each crosswalk contains standardized columns:
+
+| Column | Description |
+|----|----|
+| `source_geoid` | Identifier for source geography |
+| `target_geoid` | Identifier for target geography |
+| `allocation_factor_source_to_target` | Weight for interpolating values |
+| `weighting_factor` | What attribute was used (population, housing, land) |
+
+Additional columns may include `source_year`, `target_year`,
+`population_2020`, `housing_2020`, and `land_area_sqmi` depending on the
+source of the crosswalk.
+
+### Accessing Metadata
+
+Each crosswalk tibble has a `crosswalk_metadata` attribute that
+documents what the crosswalk represents and how it was created:
+
+``` r
+metadata <- attr(result$crosswalks$step_1, "crosswalk_metadata")
+names(metadata)
+```
+
+## Interpolation
+
+`crosswalk_data()` applies crosswalk weights to transform your data. If
+you’re in a hurry, you can omit a call to `get_crosswalk()` and specify
+the needed crosswalk parameters to `crosswalk_data()`, which will pass
+these to `get_crosswalk()` behind the scenes. Or you can call
+`get_crosswalk()` explicitly and then pass the result to
+`crosswalk_data()`.
+
+## Supported Geography and Year Combinations
+
+`get_available_crosswalks()` returns a listing of all supported
+year-geography combinations.
+
+``` r
+get_available_crosswalks() %>%
+ head()
+#> # A tibble: 6 × 4
+#> source_geography target_geography source_year target_year
+#>
+#> 1 block aiannh 2022 2022
+#> 2 block block 1990 2010
+#> 3 block block 2000 2010
+#> 4 block block 2010 2020
+#> 5 block block 2020 2010
+#> 6 block block 2020 2022
+```
+
+## API Keys
+
+NHGIS crosswalks require an IPUMS API key. Get one at
+ and add it to your `.Renviron`:
+
+``` r
+usethis::edit_r_environ()
+# Add: IPUMS_API_KEY=your_key_here
+```
+
+## Caching
+
+Use the `cache` parameter to save crosswalks locally so later calls can reuse them:
+
+``` r
+result <- get_crosswalk(
+ source_geography = "tract",
+ target_geography = "zcta",
+ weight = "population",
+ cache = here::here("crosswalks-cache"))
+```
+
+## Citations
+
+Cite the organizations that produce the crosswalks returned by this
+package:
+
+**For NHGIS**, see requirements at:
+
+
+**For Geocorr**, a suggested citation (update the year):
+
+> Missouri Census Data Center, University of Missouri. (2022/2018).
+> Geocorr 2022/2018: Geographic Correspondence Engine. Retrieved from:
+>
+
+- **For CT Data Collaborative**, a suggested citation (adjust for
+ alternate source geography):
+
+*CT Data Collaborative. (2023). 2022 Census Tract Crosswalk. Retrieved
+from: <https://github.com/CT-Data-Collaborative/2022-tract-crosswalk>.*
+
+- **For this package:**
+  <https://ui-research.github.io/crosswalk/authors.html#citation>
+
+## AI Use
+
+This package was written in part with the use of agentic AI tools under
+the supervision of the author.
diff --git a/man/figures/README-unnamed-chunk-5-1.png b/man/figures/README-unnamed-chunk-5-1.png
new file mode 100644
index 0000000..c9ec9ef
Binary files /dev/null and b/man/figures/README-unnamed-chunk-5-1.png differ
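
Note on the interpolation described in the README above: the following is a minimal base-R sketch of how an allocation factor could be applied to a count column. It is an illustration under stated assumptions, not the package's implementation. The column names `source_geoid`, `target_geoid`, and `allocation_factor_source_to_target` come from the "Crosswalk Structure" table; the GEOIDs, factors, counts, and the `toy_*` object names are invented for the example.

```r
# Toy crosswalk using the standardized column names from the README.
# GEOIDs and allocation factors are hypothetical.
toy_crosswalk <- data.frame(
  source_geoid = c("A", "A", "B"),
  target_geoid = c("X", "Y", "Y"),
  allocation_factor_source_to_target = c(0.25, 0.75, 1.0)
)

# Toy source data with a single count column (values are made up).
toy_data <- data.frame(
  source_geoid = c("A", "B"),
  count_below_poverty_level = c(100, 40)
)

# Join source values to the crosswalk and weight each source value
# by the share of the source unit allocated to each target unit.
joined <- merge(toy_data, toy_crosswalk, by = "source_geoid")
joined$weighted_count <- joined$count_below_poverty_level *
  joined$allocation_factor_source_to_target

# Sum the weighted values within each target geography.
interpolated <- aggregate(
  weighted_count ~ target_geoid,
  data = joined,
  FUN = sum
)
interpolated
```

Because each source unit's allocation factors sum to 1 in this toy crosswalk, the total count is the same before and after interpolation (140 in both cases). Real crosswalks may not join to every source record, which is what the `join_quality` diagnostics in the README are meant to surface.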