20 changes: 16 additions & 4 deletions .github/workflows/test-coverage.yaml
Expand Up @@ -4,9 +4,10 @@ on:
push:
branches: [main, master]
pull_request:
branches: [main, master]

name: test-coverage
name: test-coverage.yaml

permissions: read-all

jobs:
test-coverage:
Expand All @@ -23,18 +24,29 @@ jobs:

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::covr
extra-packages: any::covr, any::xml2
needs: coverage

- name: Test coverage
run: |
covr::codecov(
cov <- covr::package_coverage(
quiet = FALSE,
clean = FALSE,
install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
)
print(cov)
covr::to_cobertura(cov)
shell: Rscript {0}

- uses: codecov/codecov-action@v5
with:
# Fail on error if not on a PR, or if on a PR and a token is given
fail_ci_if_error: ${{ github.event_name != 'pull_request' || secrets.CODECOV_TOKEN }}
files: ./cobertura.xml
plugins: noop
disable_search: true
token: ${{ secrets.CODECOV_TOKEN }}

- name: Show testthat output
if: always()
run: |
Expand Down
6 changes: 3 additions & 3 deletions R/calculate_cvs.R
Expand Up @@ -232,10 +232,10 @@ se_weighted_mean = function(
}

#' @title Calculate a coefficient of variation
#' @details Return a coefficient of variation at the 90% level
#' @details Return a coefficient of variation reflecting the ratio of the SE to the estimate
#' @param estimate The estimate
#' @param se The standard error
#' @returns A coefficient of variation at the 90% level
#' @param se The standard error (SE)
#' @returns A coefficient of variation
cv = function(estimate, se) {
cv = se / estimate * 100

Expand Down
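For reference, the calculation documented in `cv()` above can be checked by hand. The sketch below uses made-up numbers. The MOE-to-SE step assumes a 90 percent ACS margin of error (divide by 1.645) and is not taken from this diff.

```r
# Hypothetical values, for illustration only.
estimate <- 5000     # an estimated count
moe_90   <- 411.25   # its 90 percent margin of error

# Assumption: ACS MOEs are published at the 90 percent level, so SE = MOE / 1.645.
se <- moe_90 / 1.645

# Same arithmetic as the body of cv(): the SE as a percentage of the estimate.
cv_value <- se / estimate * 100
print(cv_value)
```

Here the result is approximately 5, meaning the SE is about 5 percent of the estimate.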
55 changes: 21 additions & 34 deletions README.Rmd
Expand Up @@ -27,36 +27,40 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Codecov test
coverage](https://codecov.io/gh/UI-Research/urbnindicators/graph/badge.svg)](https://app.codecov.io/gh/UI-Research/urbnindicators)

[![Codecov test coverage](https://codecov.io/gh/UI-Research/urbnindicators/graph/badge.svg)](https://app.codecov.io/gh/UI-Research/urbnindicators)
<!-- badges: end -->

# Overview

**urbnindicators** aims to provide users with analysis-ready data from
the American Community Survey (ACS).

With a single function call, you get:
What you can access:

- Hundreds of pre-computed variables, including percentages and
the raw count variables used to produce them. Or flexibly query
any table your heart desires.

- Access to hundreds of standardized variables, such as percentages and
the raw count variables used to produce them.
- Or flexibly specify your own derived variables with a series of
helper functions.

- Margins of error for all variables--those direct from the API as
well as derived variables.
well as derived variables--with correctly calculated pooled margins
of error, per Census Bureau guidance.

- Meaningful, consistent variable names.
- Meaningful, consistent variable names--no more "B01003_001"; try
"total_population_universe" instead. (But if you're fond of the API's
variable names, those are stored in the codebook as well for cross-referencing.)

- A codebook that describes how each variable is calculated.

- The built-in capacity to pull data for multiple years and multiple
states.
- Data for multiple years and multiple states out of the box.

- Supplemental measures, such as population density, that aren't
available from the ACS.

- Built-in quality checks to help ensure that calculated variables
and measures of error are accurate. Plus some good, old-fashioned manual QC.
That said--use at your own risk. We cannot and do not guarantee there aren't bugs.

- Tools to aggregate or interpolate your data to different
geographies--along with correctly adjusted margins of error.


# Installation
Expand Down Expand Up @@ -222,7 +226,7 @@ Confidence intervals are presented around each point but are extremely small"),
ACS data are available for standard geographies (tracts, counties,
states, etc.), but many analyses require non-standard areas like
neighborhoods, school zones, or planning districts.
`interpolate_acs()` aggregates tract-level data to
`interpolate_acs()` aggregates source data to
any user-defined geography, properly re-deriving percentages and
propagating margins of error:

Expand Down Expand Up @@ -274,10 +278,7 @@ df = compile_acs_data(
"snap_not_received_percent",
numerator_variables = c("snap_universe"),
numerator_subtract_variables = c("snap_received"),
denominator_variables = c("snap_universe")),
define_one_minus(
"snap_received_complement",
source_variable = "snap_received_percent")),
denominator_variables = c("snap_universe"))),
years = 2024,
geography = "county",
states = "DC")
Expand All @@ -287,18 +288,8 @@ df %>%
glimpse()
```
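
In plain arithmetic, the `define_percent()` call above appears to describe the share of the SNAP universe that did not receive SNAP. A minimal base R sketch with hypothetical counts follows. It assumes that `numerator_subtract_variables` are subtracted from the summed `numerator_variables`, which is inferred from the argument names rather than confirmed by this diff:

```r
# Hypothetical counts, for illustration only (not real ACS values).
snap_universe <- 1000   # units in the SNAP universe
snap_received <- 150    # units that received SNAP

# Numerator: universe minus recipients. Denominator: universe.
snap_not_received_percent <- (snap_universe - snap_received) / snap_universe

# The result is a proportion (between 0 and 1), not a value on a 0-100 scale.
print(snap_not_received_percent)
```

The result is a proportion, consistent with the package's convention of expressing percentages on a 0 to 1 scale.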

The available helpers are:

| Helper | Use case |
|---|---|
| `define_percent()` | Ratio of a numerator to a denominator |
| `define_across_percent()` | Percentages for every column matching a regex |
| `define_across_sum()` | Sum paired columns (e.g., male + female counts) |
| `define_one_minus()` | Complement of an existing percentage (1 - x) |
| `define_metadata()` | Codebook-only entry for a non-computed variable |

See `vignette("custom-derived-variables")` for detailed examples of
each helper.
each of the `define_*()` helpers.

# Learn More

Expand Down Expand Up @@ -331,9 +322,5 @@ Check out the vignettes for additional details:

This package is built on top of and enormously indebted to
`library(tidycensus)`, which provides the core functionality for
accessing the Census Bureau API. For users who want additional
variables, `library(tidycensus)` exposes the entire range of
pre-tabulated variables available from the ACS and provides access to
ACS microdata and other Census Bureau datasets.

Learn more here: <https://walker-data.com/tidycensus/index.html>.
accessing the Census Bureau API. Learn more here:
<https://walker-data.com/tidycensus/index.html>.
57 changes: 23 additions & 34 deletions README.md
Expand Up @@ -11,36 +11,42 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Codecov test
coverage](https://codecov.io/gh/UI-Research/urbnindicators/graph/badge.svg)](https://app.codecov.io/gh/UI-Research/urbnindicators)

[![Codecov test
coverage](https://codecov.io/gh/UI-Research/urbnindicators/graph/badge.svg)](https://app.codecov.io/gh/UI-Research/urbnindicators)
<!-- badges: end -->

# Overview

**urbnindicators** aims to provide users with analysis-ready data from
the American Community Survey (ACS).

With a single function call, you get:
What you can access:

- Hundreds of pre-computed variables, including percentages and the raw
count variables used to produce them. Or flexibly query any table your
heart desires.

- Access to hundreds of standardized variables, such as percentages and
the raw count variables used to produce them.
- Or flexibly specify your own derived variables with a series of helper
functions.

- Margins of error for all variables–those direct from the API as well
as derived variables.
as derived variables–with correctly calculated pooled margins of
error, per Census Bureau guidance.

- Meaningful, consistent variable names.
- Meaningful, consistent variable names–no more “B01003_001”; try
“total_population_universe” instead. (But if you’re fond of the API’s
variable names, those are stored in the codebook as well for
cross-referencing.)

- A codebook that describes how each variable is calculated.

- The built-in capacity to pull data for multiple years and multiple
states.
- Data for multiple years and multiple states out of the box.

- Supplemental measures, such as population density, that aren’t
available from the ACS.

- Built-in quality checks to help ensure that calculated variables and
measures of error are accurate. Plus some good, old-fashioned manual
QC. That said–use at your own risk. We cannot and do not guarantee
there aren’t bugs.
- Tools to aggregate or interpolate your data to different
geographies–along with correctly adjusted margins of error.

# Installation

Expand Down Expand Up @@ -209,7 +215,7 @@ Confidence intervals are presented around each point but are extremely small"),
ACS data are available for standard geographies (tracts, counties,
states, etc.), but many analyses require non-standard areas like
neighborhoods, school zones, or planning districts. `interpolate_acs()`
aggregates tract-level data to any user-defined geography, properly
aggregates source data to any user-defined geography, properly
re-deriving percentages and propagating margins of error:

``` r
Expand Down Expand Up @@ -265,10 +271,7 @@ df = compile_acs_data(
"snap_not_received_percent",
numerator_variables = c("snap_universe"),
numerator_subtract_variables = c("snap_received"),
denominator_variables = c("snap_universe")),
define_one_minus(
"snap_received_complement",
source_variable = "snap_received_percent")),
denominator_variables = c("snap_universe"))),
years = 2024,
geography = "county",
states = "DC")
Expand All @@ -284,18 +287,8 @@ df %>%
#> $ snap_not_received_percent_M <dbl> 0.0071
```

The available helpers are:

| Helper | Use case |
|---------------------------|-------------------------------------------------|
| `define_percent()` | Ratio of a numerator to a denominator |
| `define_across_percent()` | Percentages for every column matching a regex |
| `define_across_sum()` | Sum paired columns (e.g., male + female counts) |
| `define_one_minus()` | Complement of an existing percentage (1 - x) |
| `define_metadata()` | Codebook-only entry for a non-computed variable |

See `vignette("custom-derived-variables")` for detailed examples of each
helper.
of the `define_*()` helpers.

# Learn More

Expand Down Expand Up @@ -328,9 +321,5 @@ Check out the vignettes for additional details:

This package is built on top of and enormously indebted to
`library(tidycensus)`, which provides the core functionality for
accessing the Census Bureau API. For users who want additional
variables, `library(tidycensus)` exposes the entire range of
pre-tabulated variables available from the ACS and provides access to
ACS microdata and other Census Bureau datasets.

Learn more here: <https://walker-data.com/tidycensus/index.html>.
accessing the Census Bureau API. Learn more here:
<https://walker-data.com/tidycensus/index.html>.
6 changes: 3 additions & 3 deletions man/cv.Rd

7 changes: 5 additions & 2 deletions vignettes/codebook.Rmd
Expand Up @@ -18,7 +18,7 @@ knitr::opts_chunk$set(
comment = "#>")
```

```{r setup, echo = FALSE}
```{r setup}
library(urbnindicators)
library(dplyr)
library(reactable)
Expand Down Expand Up @@ -68,7 +68,10 @@ critical.
## Browse the codebook

Use the search box below to filter by variable name, type, or
definition text.
definition text. Note that this codebook reflects all variables from
the tables returned by `list_tables()`; if you specify different
tables in your `compile_acs_data()` call, your codebook will list
different variables.

```{r, echo = FALSE}
reactable(
Expand Down
2 changes: 1 addition & 1 deletion vignettes/custom-geographies.Rmd
Expand Up @@ -20,7 +20,7 @@ knitr::opts_chunk$set(
comment = "#>")
```

```{r setup, echo = FALSE}
```{r setup}
library(dplyr)
library(ggplot2)
library(scales)
Expand Down
32 changes: 7 additions & 25 deletions vignettes/design-philosophy.Rmd
Expand Up @@ -16,15 +16,10 @@ knitr::opts_chunk$set(
comment = "#>")
```

**urbnindicators** makes a number of opinionated design choices about
what data to select from the Census Bureau API, how to process it, what
relevant derived variables to calculate, and even which types of
geographies to support.

**urbnindicators** makes a number of opinionated design choices.
"Opinionated" doesn't mean that these decisions are the best ones for
every user or use-case, but these decisions are designed to either speed up
or improve the accuracy of a common use-case involving a large set of variables
(optionally over multiple years).
or improve the accuracy of common workflows.

## Design choices

Expand All @@ -39,14 +34,6 @@ or improve the accuracy of a common use-case involving a large set of variables
larger-population geographies, such as tracts, zip codes, and
some places and counties.

- **Support only a subset of ACS variables.** Pre-calculated ACS
estimates cover tens of thousands of different variables. But, in
our work, only a small fraction of these is used frequently. We've
tried to select those common variables to return by default,
cognizant that at present, every additional variable returned
results in a slower query. Open an issue in GitHub if you'd like to
see additional variables added to the default set.

- **Rename all variables.** The default variable names returned by the
API are not human-friendly. Not only is it challenging to
determine what a given variable represents when you're looking at a
Expand All @@ -61,7 +48,8 @@ or improve the accuracy of a common use-case involving a large set of variables
documentation anywhere (apart from the codebook returned by this
package!) of a variable named, for example,
`race_personofcolor_percent`. Variables in the codebook have
their original API names included in their definitions.
their original API names included in their definitions so that you
can cross-reference these as needed.

- **Use a consistent variable naming convention.** Variable names
follow the pattern
Expand All @@ -77,7 +65,9 @@ or improve the accuracy of a common use-case involving a large set of variables
are expressed as proportions (e.g., 0.25 rather than 25). This
avoids ambiguity and simplifies downstream calculations (e.g.,
multiplying a proportion by a population count). Use
`scales::percent()` for display formatting.
`scales::percent()` for display formatting. You can always just multiply
these values (and the MOEs) by 100 if you prefer; this multiplication
requires no other adjustments to the MOEs.

- **Always propagate margins of error.** When `urbnindicators`
derives a new variable from two or more raw ACS estimates, it
Expand All @@ -88,11 +78,3 @@ or improve the accuracy of a common use-case involving a large set of variables
`vignette("quantified-survey-error")`) but are far preferable to
dropping error information entirely.

- **Design for extensibility.** New ACS tables can be added to the
package via a single `register_table()` call in
`R/table_registry.R`. The registration declaratively specifies
raw variables, derived calculations, and codebook metadata; the
codebook and margin of error calculations are generated
automatically. See `vignette("custom-derived-variables")` for a
walkthrough.
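
The margin-of-error points above can be illustrated with the Census Bureau's standard approximation formulas for derived estimates. This is a sketch under stated assumptions, not the package's internal code: `moe_sum()` and `moe_proportion()` are hypothetical helper names, and all input values are made up.

```r
# Hypothetical helpers (not part of urbnindicators) implementing the
# Census Bureau approximation formulas for derived-estimate MOEs.

# MOE of a sum of estimates: square root of the sum of squared MOEs.
moe_sum <- function(moes) sqrt(sum(moes^2))

# MOE of a proportion p = num / den, where num is a subset of den.
moe_proportion <- function(num, den, moe_num, moe_den) {
  p <- num / den
  radicand <- moe_num^2 - p^2 * moe_den^2
  # Per Census guidance, fall back to the ratio formula (plus sign)
  # when the value under the square root would be negative.
  if (radicand < 0) radicand <- moe_num^2 + p^2 * moe_den^2
  sqrt(radicand) / den
}

# Made-up inputs, for illustration only.
p     <- 250 / 1000
moe_p <- moe_proportion(num = 250, den = 1000, moe_num = 40, moe_den = 60)

# Rescaling to a 0-100 scale multiplies the estimate and its MOE by the
# same constant; no further adjustment to the MOE is needed.
pct     <- p * 100
pct_moe <- moe_p * 100

print(c(proportion = p, moe = moe_p, percent = pct, percent_moe = pct_moe))
```

Because rescaling by a constant applies equally to the estimate and its MOE, the relative size of the error is unchanged whether values are stored as proportions or percentages.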
