
Allow for user-specified & cross-experiment-consistent column ordering #181

@katossky

Description


First, thanks so much for the package. This is not meant as criticism, even though I am not very good at making it sound nice.

At the moment it is quite tedious to reuse the variable ordering of a corrr-generated plot in another plot.

Say I want to try two different correlation measures, or two ways of pre-processing my data: I could not find any straightforward way to plot the correlations in exactly the same variable order. I have just spent 1h30 trying to hack the resulting cor_df and ggplot object to get what I wanted, without success. In the end I had to build the plot from scratch, reading the autoplot code to reach my goal.

I see two ways forward:

  1. add examples showing how to do this simply (hopefully there is an undocumented easy way)
  2. add an option to rearrange() for a user-specified order

Current strategy:

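library(dplyr)   # select(), used unprefixed below
library(ggplot2) # ggplot(), aes(), geom_tile(), ...

g <- iris |>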
  select(-Species) |>
  corrr::correlate(quiet = TRUE) |>
  corrr::rearrange() |>
  corrr::shave() |>
  corrr::stretch() |>
  dplyr::mutate(
    x = factor(x, levels = unique(x)),
    y = factor(y, levels = unique(y))
  ) |>
  dplyr::filter(!is.na(r)) |>
  ggplot() + aes(x = x, y = y, fill = r) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(limits = c(-1, 1)) +
  scale_x_discrete() +
  labs(x = NULL, y = NULL, fill = NULL) +
  coord_fixed() +
  theme_minimal() +
  theme(
    panel.grid = element_blank(), 
    axis.text.x = element_text(angle = 315, vjust = 1, hjust = 0)
  )
g

Now switch to Spearman correlation while keeping the same variable order:

ordering <- levels(g$data$x)

iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE, method = "spearman") |> # new !!!
  corrr::stretch() |>
  dplyr::mutate(
      x = factor(x, levels = ordering), # new !!!
      y = factor(y, levels = ordering)  # new, needed for filtering !!!
  ) |>
  dplyr::filter(!is.na(r), as.integer(x) < as.integer(y)) |> # new !!!
  dplyr::mutate(
      y = factor(y, levels = rev(ordering)) # back to standard plotting !!!
  ) |>
  ggplot() + aes(x = x, y = y, fill = r) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(limits = c(-1, 1)) +
  scale_x_discrete() +
  labs(x = NULL, y = NULL, fill = NULL) +
  coord_fixed() +
  theme_minimal() +
  theme(
      panel.grid = element_blank(), 
      axis.text.x = element_text(angle = 315, vjust = 1, hjust = 0)
  )
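Stripped of the plotting code, the workaround above boils down to computing one variable ordering and indexing every correlation matrix by it. A minimal base-R sketch of that idea (no corrr needed), assuming hierarchical clustering on 1 - r as the ordering rule; this is only an illustrative choice, not necessarily what corrr::rearrange() does:

```r
# Minimal base-R sketch: fix one variable ordering, reuse it for every matrix.
num <- iris[, setdiff(names(iris), "Species")]

pearson  <- cor(num, method = "pearson")
spearman <- cor(num, method = "spearman")

# Derive the ordering once. Clustering on 1 - r is an assumption made for
# illustration; any rule that yields a vector of variable names works.
hc <- hclust(as.dist(1 - pearson))
ordering <- colnames(pearson)[hc$order]

# Apply the same ordering to both matrices via name-based indexing,
# so rows and columns line up across experiments.
pearson_ord  <- pearson[ordering, ordering]
spearman_ord <- spearman[ordering, ordering]
```

Because the ordering is just a character vector, it can be saved once and reused for any later variation (other methods, other pre-processing), as long as the variable names stay the same.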

Ideal strategy (proposed, not an existing API):

iris_corr <- iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE) |>
  corrr::rearrange(method = "PCA")
cols <- setdiff(colnames(iris_corr), "term") # long-term ordering for side-by-side comparisons
autoplot(iris_corr)

# then variations
iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE, method = "spearman") |>
  autoplot(ordering = cols)

iris |>
  mutate(Sepal.Length = ifelse(Sepal.Length > 0, 0, Sepal.Length)) |> # or whatever
  select(-Species) |>
  corrr::correlate(quiet = TRUE) |>
  autoplot(ordering = cols) # if this path is chosen, perhaps ordering should supersede method, though there may be programmatic edge cases where this causes problems?
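Until something like an ordering argument exists, the user-side half of the proposal can be sketched as a small helper. reorder_cor_df() below is hypothetical and not part of corrr; it assumes the layout that corrr::correlate() returns (a term column plus one column per variable) and only reorders rows and columns. The example builds that layout from base cor() so it runs without corrr:

```r
# Hypothetical helper, NOT part of corrr: reorder a cor_df-like data frame
# (a `term` column plus one column per variable) to a user-supplied order.
reorder_cor_df <- function(x, ordering) {
  vars <- setdiff(names(x), "term")
  # The requested order must name exactly the variables present.
  stopifnot(setequal(ordering, vars), setequal(ordering, x$term))
  # Reorder rows by term and columns by name in a single subset.
  out <- x[match(ordering, x$term), c("term", ordering), drop = FALSE]
  rownames(out) <- NULL
  # Restore the original class (e.g. "cor_df") so methods still dispatch.
  class(out) <- class(x)
  out
}

# Example without corrr: build the same layout from base cor().
m <- cor(iris[, 1:4])
df <- data.frame(term = rownames(m), m, row.names = NULL, check.names = FALSE)
res <- reorder_cor_df(df, rev(colnames(m)))
```

With corrr loaded, the same call should apply to a real cor_df, e.g. reorder_cor_df(corrr::correlate(iris[1:4], method = "spearman", quiet = TRUE), cols), after which shave() and stretch() can be used as in the workaround above. autoplot() may still apply its own rearranging method on top, which is not verified here.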

Metadata


Labels: feature (a feature request or enhancement)
