rowcolr

The goal of rowcolr is to extract structured data from semi-structured Excel files by identifying row and column labels using predefined identifiers or regex patterns. Processing such data can be challenging with existing packages. rowcolr simplifies this process by efficiently locating values at the intersection of row and column labels, making data wrangling and cleaning more seamless.

Built on the powerful tidyxl package, rowcolr ensures robust and reliable extraction of cell contents from Excel files.

Origin

The development of rowcolr was driven by a specific use case: existing Excel files used as input forms by organizations in the arts and culture sector to provide information to the Dutch Ministry of Education, Culture, and Science (OCW). To minimize disruptions to the existing workflow, the forms were kept as similar as possible to their original format while still allowing structured extraction of all relevant data from the Excel files.

Installation

You can install the development version of rowcolr from GitHub with:

devtools::install_github("coeneisma/rowcolr")

Usage

The primary function of rowcolr is extract_values(), which allows you to extract structured data from Excel files by identifying row and column labels. The function works by drawing a horizontal line from a detected row label and a vertical line from a detected column label. The value located at their intersection is extracted and stored.

The package can be loaded with:

library(rowcolr)

Example: Extracting Data from an Excel File

Values can be extracted from an Excel file using row and column identifiers or regex patterns:

# Extract values from an example Excel file with regex patterns
dataset <- extract_values(rowcolr_example("example.xlsx"), 
                          row_pattern = ".*_row$", 
                          col_pattern = ".*_col$")

head(dataset |> dplyr::select(-c(filename, sheet, row, col)))
#> # A tibble: 6 × 9
#>   row_label            col_label description data_type error logical  numeric
#>   <chr>                <chr>     <chr>       <chr>     <chr> <lgl>      <dbl>
#> 1 name_row             value_col _           character <NA>  NA            NA
#> 2 year_row             value_col _           numeric   <NA>  NA          2025
#> 3 kvk_row              value_col _           numeric   <NA>  NA      12345678
#> 4 test1_row            2026_col  _           character <NA>  NA            NA
#> 5 tes2_row             2026_col  _           character <NA>  NA            NA
#> 6 int_fixed_assets_row 2025_col  _           numeric   <NA>  NA        100000
#> # ℹ 2 more variables: date <dttm>, character <chr>

For more details on using the package, refer to the vignette: vignette("rowcolr").

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.github		.github
R		R
data		data
inst/extdata		inst/extdata
man		man
pkgdown/favicon		pkgdown/favicon
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
rowcolr.Rproj		rowcolr.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

rowcolr

Origin

Installation

Usage

Example: Extracting Data from an Excel File

About

Licenses found

Releases

Packages

Contributors 2

Languages

License

Licenses found

coeneisma/rowcolr

Folders and files

Latest commit

History

Repository files navigation

rowcolr

Origin

Installation

Usage

Example: Extracting Data from an Excel File

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages