This package allows you to parse entire folders of non-rectangular ‘xlsx’ files into a single rectangular and tidy ‘data.frame’ based on a custom template file defining the column names of the output.
You can install the latest stable version of this package from CRAN:
install.packages("xlcutter")
or the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("Bisaloo/xlcutter")
Non-rectangular excel files are common in many domains. For a simple demonstration here, we use the example of the “Blue timesheet” from https://templates.office.com/, where employees can log their working hours.
A typical use case of xlcutter in this example would be for a manager who want to get a single rectangular dataset with the timesheets from different employees.
Your first step to extract the data is to define the various columns you
want in the output in a template file. You can mark the data cells to
extract with any custom marker, with the default being
{{ column_name }}
.
library(xlcutter)
data_files <- list.files(
system.file("example", "timesheet", package = "xlcutter"),
pattern = "\\.xlsx$",
full.names = TRUE
)
template_file <- system.file(
"example", "timesheet_template.xlsx",
package = "xlcutter"
)
xlsx_cutter(
data_files,
template_file
)
#> employee_firstname contract_hours employee_lastname realised_hours
#> 1 Leon 35 Bedu 29.00
#> 2 Paul 35 Dupont 35.00
#> 3 Marianne 35 Lebrun 36.25
#> manager_firstname manager_lastname period_start period_end
#> 1 <NA> Dubois 2022-01-03 2022-01-07
#> 2 Lydia Dubois 2022-01-03 2022-01-07
#> 3 Lydia Dubois 2022-01-03 2022-01-07
Other typical use cases for this package could be:
- an hospital that wants to collate non-rectangular information sheets from different patients into a single rectangular dataset