Improve functions available for discrete observations #343

@TeemuSailynoja (Collaborator)

Currently, bayesplot offers three functions for discrete PPCs: ppc_bars, ppc_bars_grouped, and ppc_rootogram.

In our preprint, we argue that bar graphs are usually not a good PPC, especially for binary observations.

Image

Calibration plots

I suggest adding the binned calibration plot and the PAV-adjusted calibration plot under a function called ppc_calibration. For ordinal or categorical data, it would probably be clearer to have another function that produces calibration plots for 1-vs-others or cumulative event probabilities.
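To make the binned variant concrete, here is a minimal sketch in plain Python (not bayesplot code). It assumes equal-width bins on [0, 1]; the function name `binned_calibration` and the `n_bins` argument are hypothetical and only illustrate the idea of comparing mean predicted probability with observed event rate per bin.

```python
def binned_calibration(p, y, n_bins=10):
    """Compare mean predicted probability with observed event rate per bin.

    p: predicted event probabilities in [0, 1]
    y: binary observations (0 or 1), same length as p
    Returns one (mean_p, event_rate, count) tuple per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p_i, y_i in zip(p, y):
        # Equal-width bins; p_i == 1.0 is placed in the last bin.
        idx = min(int(p_i * n_bins), n_bins - 1)
        bins[idx].append((p_i, y_i))

    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins instead of reporting NaN
        n = len(members)
        mean_p = sum(m[0] for m in members) / n
        event_rate = sum(m[1] for m in members) / n
        rows.append((mean_p, event_rate, n))
    return rows
```

Plotting `event_rate` against `mean_p` gives the binned calibration plot; points near the diagonal indicate good calibration. The bin count is a tuning choice the sketch leaves to the caller.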

Residual plots

Residual plots are another gap. ppc_error_scatter_vs_x and ppc_error_binned cover some of this ground, but the scatter plot doesn't work for discrete observations, and ppc_error_binned doesn't currently support covariates on the x-axis.

Image

  • The confidence bands in these need work, but at the very least adding options for binned residual plots and PAV-adjusted residuals would be an improvement.
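As a rough illustration of binned residuals against a covariate, the sketch below is plain Python, not bayesplot code. It assumes equal-count bins along the sorted covariate; `binned_residuals` and its arguments are hypothetical names, and no confidence bands are computed.

```python
def binned_residuals(x, p, y, n_bins=10):
    """Average the residuals y - p within equal-count bins of a covariate x.

    x: covariate values, p: predicted event probabilities,
    y: binary observations; all three have the same length.
    Returns one (mean_x, mean_residual) tuple per bin.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    n_bins = min(n_bins, n)  # never request more bins than observations
    rows = []
    for b in range(n_bins):
        # Integer arithmetic splits the sorted indices into near-equal chunks.
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        mean_x = sum(x[i] for i in idx) / len(idx)
        mean_resid = sum(y[i] - p[i] for i in idx) / len(idx)
        rows.append((mean_x, mean_resid))
    return rows
```

Plotting `mean_resid` against `mean_x` gives the covariate-on-the-x-axis view that the text says is currently missing.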

TeemuSailynoja (Collaborator, Author) commented on Apr 14, 2025

This issue could serve as a general discussion of functionality for discrete distributions. I already have bayesplot-like implementations of these from the article, but would appreciate input on the details of arguments and features.

TeemuSailynoja (Collaborator, Author) commented on Apr 14, 2025

The binned residual plots are also mentioned in #263, and binned calibration plots in #150.

jgabry (Member) commented on Apr 15, 2025

This all sounds good, thanks Teemu! Let me know which specific feedback you want. I think adding all of this is a good idea.

TeemuSailynoja (Collaborator, Author) commented on Apr 16, 2025

Had a conversation about the PAV-adjusted calibration plots with @avehtari, and wanted to continue it here:

For predicted event probabilities p = p1, ..., pN, the original CORP reliability diagrams paper constructs the consistency regions as follows:

  1. Sample p_ by bootstrapping from p.
  2. Sample predictions y_ ~ Bernoulli(p_).
  3. Transform y_ into conditional event probabilities (CEPs) with the PAV algorithm.
  4. Repeat steps 1-3 n.bootstrap times.
  5. Compute alpha level central confidence intervals for the CEPs at each p.
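The five steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the CORP paper's or bayesplot's implementation: the function names are hypothetical, each bootstrap CEP curve is evaluated at the original p values as a step function, the quantiles are crude order statistics, and `alpha` is assumed to mean that the band covers 1 - alpha.

```python
import bisect
import random

def pav(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y."""
    blocks = []  # each block is [sum, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (means are compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def cep_curve(p, y):
    """Sort by p and PAV-fit y. Ties in p are ordered by descending y,
    which makes tied predictions share one fitted value."""
    pairs = sorted(zip(p, y), key=lambda t: (t[0], -t[1]))
    return [t[0] for t in pairs], pav([t[1] for t in pairs])

def step_eval(xs, fit, q):
    """Evaluate the fitted step function at q (last knot at or below q)."""
    i = bisect.bisect_right(xs, q) - 1
    return fit[max(i, 0)]

def consistency_band(p, n_boot=200, alpha=0.1, seed=1):
    rng = random.Random(seed)
    grid = sorted(set(p))
    draws = [[] for _ in grid]
    for _ in range(n_boot):                                # step 4
        p_star = rng.choices(p, k=len(p))                  # step 1
        y_star = [int(rng.random() < q) for q in p_star]   # step 2
        xs, fit = cep_curve(p_star, y_star)                # step 3
        for j, q in enumerate(grid):
            draws[j].append(step_eval(xs, fit, q))
    band = []
    for q, values in zip(grid, draws):                     # step 5
        values.sort()
        lower = values[int((alpha / 2) * (n_boot - 1))]
        upper = values[int((1 - alpha / 2) * (n_boot - 1))]
        band.append((q, lower, upper))
    return band
```

The observed calibration curve would be `cep_curve(p, y)` on the real data, drawn on top of the band.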

In bayesplot, we would expect y and yrep from the user, so we can just compute p = colMeans(yrep) and use the posterior draws for the consistency regions. This is what we did in the Recommendations for visual PCs article.
In our conversation with Aki, I got confused by the possibility that, especially from brms and rstanarm, we could receive posterior samples of the predicted event probability p itself. This would allow the consistency region computation to also include the uncertainty about p, by changing the pool of available p between bootstrapping steps.

  • Now that I'm writing this, I feel that this is not complexity that we would necessarily want in the function, but what do you think?
  • Should it still be an option to give posterior draws of event probabilities?
  • Should I make a short demo to highlight the difference?
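
One possible reading of the y and yrep approach, sketched in plain Python rather than R: take p as the column means of yrep, then let each posterior predictive draw play the role of one bootstrap replicate. The names `col_means` and `calibration_from_draws` are hypothetical, ties in p are not pooled, `alpha` is assumed to give a band covering 1 - alpha, and the option of passing draws of p itself is not covered.

```python
def pav(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y."""
    blocks = []  # each block is [sum, count]
    for value in y:
        blocks.append([float(value), 1])
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def col_means(yrep):
    """Column means of a draws-by-observations matrix (list of rows)."""
    n_draws = len(yrep)
    return [sum(col) / n_draws for col in zip(*yrep)]

def calibration_from_draws(y, yrep, alpha=0.1):
    """Observed PAV curve plus a band built from the yrep draws.

    y: binary observations; yrep: one row of binary predictions per draw.
    Returns (sorted p, observed CEPs, list of (lower, upper) per point).
    """
    p = col_means(yrep)
    order = sorted(range(len(p)), key=lambda i: p[i])
    p_sorted = [p[i] for i in order]
    observed = pav([y[i] for i in order])
    # Each posterior predictive draw yields one calibration curve.
    curves = [pav([draw[i] for i in order]) for draw in yrep]
    band = []
    for j in range(len(order)):
        values = sorted(curve[j] for curve in curves)
        lower = values[int((alpha / 2) * (len(values) - 1))]
        upper = values[int((1 - alpha / 2) * (len(values) - 1))]
        band.append((lower, upper))
    return p_sorted, observed, band
```

Supporting draws of p would mean resampling from a changing pool of p between replicates, which is the extra complexity the questions above are about.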

jgabry (Member) commented on Apr 17, 2025

It does sound nice to be able to include the uncertainty about p, but I also agree that it's good to avoid too much complexity in the functions. I suppose we could add an optional argument for providing the draws of event probabilities but this argument would never be required. Is that basically what you were suggesting?

TeemuSailynoja (Collaborator, Author) commented on Apr 24, 2025

Here is a quick demo of how ppc_calibration_overlay would show the posterior uncertainty of the calibration.
This is a raw version without proper theming; it could, for example, be made into a ribbon.

On the left, PAV-adjusted calibration plots for two models; on the right, posterior samples of calibration curves for the same models.

jgabry (Member) commented on Apr 25, 2025

Thanks for the demo. I think this would be very nice to have!

jgabry (Member) commented on Apr 25, 2025

And just to confirm, is the idea to have two separate functions for these, ppc_calibration and ppc_calibration_overlay?

TeemuSailynoja (Collaborator, Author) commented on May 13, 2025

Yes, I was thinking the overlay would be a separate function. The other could be just ppc_calibration(), with an option for showing the red calibration curve for the mean posterior predicted event probability.
There could also be ppc_loo_calibration(), which would show LOO predictive probabilities.

jgabry (Member) commented on May 14, 2025

Sounds good!

        Improve functions available for discrete observations · Issue #343 · stan-dev/bayesplot