Options for another kind of stacking? #2006

ddsjoberg · 2024-09-29T04:50:11Z

Rather than using the row headers in gt (and for the other print engines, just adding a new column to the left), I would like to be able to stack tables and have the individual tables indented with the headers.

I also want this somehow integrated with tbl_strata() for stacking, but haven't thought through all those details yet.


ddsjoberg · 2024-10-01T15:57:25Z

Maybe we could add a new function (something like tbl_nested_stack()); the option could then easily be added as a combine method in tbl_strata().

dereksonderegger · 2024-10-10T15:34:10Z

In the following, I'm thinking about creating a grid of tables, each already containing a by split...

There are two approaches to merging/stacking:

1. Independently create a bunch of tables and then merge/stack them. In this case, the function has to check and align column and row information in case the subtables differ, e.g. a by level that is present in one subtable and not the other (possible if by is a character string and not a factor). This is particularly frustrating when both merging and stacking. If one of the subtables has no data, then we are in real trouble, because gtsummary::tbl_summary() won't produce a completely empty table suitable to take up that cell's space in the subsequent grid.
2. In tbl_ard_summary(), have merge= and stack= options that take one or more of the grouping variables. The nice thing about this is that we get to see all of the data and determine which grouping levels are present, so that when the grid of merged/stacked tables is created, any grid cell with no data can be filled in with missing statistics. It would make sense to create a cards::ard_expand() function that just makes sure all combinations of grouping variables are present, inserting n=0, N=0 and NAs for the statistics. Then we could merge and stack knowing that there aren't any missing grid cells.
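
To make the "expand" idea in the second approach concrete, here is a minimal base R sketch. It does not use cards or any real ARD object: the cell_stats data frame, its column names, and the values in it are made up purely for illustration. It only shows the padding step, where every combination of grouping levels gets a row, absent cells get n = 0, and their statistics stay NA.

```r
# Hypothetical long-format summary: one row per (sex, region) cell that has data.
# The female/EU cell is absent because no such subjects exist in this toy example.
cell_stats <- data.frame(
  sex      = c("female", "male", "male"),
  region   = c("US", "US", "EU"),
  n        = c(2L, 1L, 2L),
  mean_age = c(61.5, 58.0, 64.0),
  stringsAsFactors = FALSE
)

# Build every combination of the observed grouping levels
full_grid <- expand.grid(
  sex    = unique(cell_stats$sex),
  region = unique(cell_stats$region),
  stringsAsFactors = FALSE
)

# Left-join the statistics onto the full grid; absent cells get NA everywhere
expanded <- merge(full_grid, cell_stats, by = c("sex", "region"), all.x = TRUE)

# An absent cell has a count of zero, while its statistics stay NA
expanded$n[is.na(expanded$n)] <- 0L

print(expanded)
```

With all four cells present in expanded, a later merge/stack step would no longer have to special-case the missing female/EU combination.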

For my own use, I've written a wrapper around tbl_summary(), tbl_merge(), and tbl_stack() to automate the first approach. This function has by, merge, and stack parameters but unfortunately bombs out in the edge case where some combination of merge and stack levels has no data. This has been particularly useful for quickly creating tables for subgroup analyses in Clinical Research (e.g. by=treatment_method, merge=sex, stack=region to look at the treatment effect by sex across different world regions). This could crash out if for some reason I didn't have any female subjects in some region. This happens a lot during the early enrollment time in a clinical study because the subject numbers are still small.
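
A wrapper like the one described could guard against that edge case by checking which merge/stack cells are empty before calling tbl_summary(). The sketch below is one possible check in base R, not the wrapper itself: the subjects data frame is invented toy data, and it assumes the merge and stack variables are stored as factors so that split() keeps the empty combinations.

```r
# Hypothetical toy data mirroring the by/merge/stack example above:
# there are no female subjects in the EU region
subjects <- data.frame(
  treatment_method = c("A", "B", "A", "B", "A"),
  sex    = factor(c("female", "male", "male", "female", "male")),
  region = factor(c("US", "US", "EU", "US", "EU"))
)

# Splitting on two factors yields one data frame per sex-by-region cell,
# including zero-row data frames for combinations with no subjects
cells <- split(subjects, list(subjects$sex, subjects$region), drop = FALSE)

# Count the rows in each cell and pick out the empty ones
cell_sizes  <- vapply(cells, nrow, integer(1))
empty_cells <- names(cell_sizes)[cell_sizes == 0L]

print(cell_sizes)
print(empty_cells)
```

The wrapper could then skip the cells named in empty_cells, or substitute a placeholder, instead of failing partway through the loop.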

Ultimately I can see use cases for both approaches and would love to see both implemented.

As always, gtsummary is amazing and I'm thrilled to see you working on a package that takes the ARD structure and makes it easy to do the formatting and then spread the formatted statistics out into a structured table.

Just in case my discussion of a grid cell with missing data isn't clear enough, I've added an example...

library(tidyverse)

data <- palmerpenguins::penguins |>
  filter(!is.na(sex)) |>
  filter( !(species=='Chinstrap' & year==2007) ) |> # remove a group
  select(body_mass_g, sex, species, island, year)


# Adelie penguins are on all three islands, so there will be 3 columns
A_2007 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Adelie', year == 2007) |>
    select(body_mass_g, sex, island), 
  by=island
)
A_2008 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Adelie', year == 2008) |>
    select(body_mass_g, sex, island), 
  by=island
)

# Chinstrap penguins are only on Dream island.
# 
# This blows up so in my looping, I have to double check if there is data
# C_2007 <- gtsummary::tbl_summary(
#   data |> 
#     filter(species == 'Chinstrap', year == 2007) |>
#     select(body_mass_g, sex, island), 
#   by=island
# )
C_2008 <- gtsummary::tbl_summary(
  data |>
    filter(species == 'Chinstrap', year == 2008) |>
    select(body_mass_g, sex, island),
  by=island
)


# Gentoo penguins are only on Biscoe island
G_2007 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Gentoo', year == 2007) |>
    select(body_mass_g, sex, island), 
  by=island
)
G_2008 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Gentoo', year == 2008) |>
    select(body_mass_g, sex, island), 
  by=island
)


R_2007 <- gtsummary::tbl_merge( 
  list(A_2007,G_2007), 
  tab_spanner=c('Adelie','Gentoo') )
R_2008 <- gtsummary::tbl_merge( 
  list(A_2008, C_2008, G_2008), 
  tab_spanner=c('Adelie','Chinstrap','Gentoo') )

# Now the stack is all messed up even ignoring the N counts
gtsummary::tbl_stack( 
  list(R_2007, R_2008),                    
  group_header=c('2007','2008'))
#> Column headers among stacked tables differ. Headers from the first table are
#> used.
#> ℹ Use `quiet = TRUE` to suppress this message.

Characteristic	Adelie			Gentoo
Characteristic	Biscoe N = 10¹	Dream N = 19¹	Torgersen N = 15¹	Biscoe N = 33¹	Dream N = 0¹	Torgersen N = 0¹	Biscoe N = 45¹	Dream N = 0¹	Torgersen N = 0¹
2007
body_mass_g	3,700 (3,400, 3,800)	3,550 (3,300, 4,150)	3,700 (3,450, 4,200)	5,050 (4,650, 5,550)	NA (NA, NA)	NA (NA, NA)
sex
female	5 (50%)	9 (47%)	8 (53%)	16 (48%)	0 (NA%)	0 (NA%)
male	5 (50%)	10 (53%)	7 (47%)	17 (52%)	0 (NA%)	0 (NA%)
2008
body_mass_g	3,650 (3,350, 4,050)	3,650 (3,450, 4,200)	3,850 (3,575, 4,175)	NA (NA, NA)	3,750 (3,500, 4,100)	NA (NA, NA)	5,000 (4,700, 5,400)	NA (NA, NA)	NA (NA, NA)
sex
female	9 (50%)	8 (50%)	8 (50%)	0 (NA%)	9 (50%)	0 (NA%)	22 (49%)	0 (NA%)	0 (NA%)
male	9 (50%)	8 (50%)	8 (50%)	0 (NA%)	9 (50%)	0 (NA%)	23 (51%)	0 (NA%)	0 (NA%)
¹ Median (Q1, Q3); n (%)

Created on 2024-10-10 with reprex v2.1.1

ddsjoberg mentioned this issue Nov 13, 2024

tbl_hierarchical() enhancements #2021

