Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Violin quantiles are based on observations #5912

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

teunbrand
Copy link
Collaborator

This PR aims to fix #4120.

Briefly, instead of GeomViolin estimating quantiles based on the density, StatYdensity includes quantile as a computed variable. GeomViolin is then able to draw the quantiles based on that computed variable.

I consider this a breaking change as StatYdensity how 'owns' the draw_quantiles parameter, rather than GeomViolin. For users, this should only matter when they are breaking the link between geom_violin() and stat_ydensity(). It is also a visual change, as the quantiles are now correct.

Reprex from the issue, notice how the red quantiles of the violin now aligns with the boxplot quantiles.

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2
set.seed(5)

types <- list(
  rnorm(n = 20, mean = 5),
  c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1),
  c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))
)

df = data.frame(
  type = rep(paste0("Type ", seq_along(types)), lengths(types)),
  val  = unlist(types)
)

ggplot(df, aes(x = type, y = val)) + theme_classic() + 
  geom_boxplot(alpha = 0.5) + 
  geom_violin(scale = "area", alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75), 
              colour = "red") +
  geom_dotplot(binaxis = "y", stackdir = "center", alpha = 0.3, dotsize = 0.4)
#> Bin width defaults to 1/30 of the range of the data. Pick better value with
#> `binwidth`.

Created on 2024-05-28 with reprex v2.1.0

@teunbrand teunbrand added the breaking change ☠️ API change likely to affect existing code label May 28, 2024
@thomasp85
Copy link
Member

I'm considering whether draw_quantiles should still be a parameter of the geom. It seems wrong that drawing is controlled by a stat.

Would it make sense to have the stat always compute the quantile, and the geom drawing it depending on the value of draw_quantile?

@teunbrand
Copy link
Collaborator Author

teunbrand commented Jul 12, 2024

In principle I agree with you, though there are some practicalities that may get in the way.
The issue is that draw_quantiles currently is not a boolean and carries the numeric values for which quantiles the stat should know about, but the geom doesn't need to know about.
Another issue is what the default quantiles that are computed should be, but it'd probably should be c(0.25, 0.5, 0.75).

Copy link
Member

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not like how the stat flows into the how the geom behaves. Is it not possible to gracefully deprecate the old behaviour.

So, geom_violin() will check the value of draw_quantile. If it is not a boolean it will emit a deprecation warning and forward the value to quantiles in the stat (which does what draw_quantile does now)

@teunbrand
Copy link
Collaborator Author

We should discuss if we should give this the #5423 treatment. So instead of draw_quantile = TRUE/FALSE we use quantiles.linetype = 0 and add quantiles.colour = NULL, quantiles.linewidth = NULL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: quantiles based on observations instead of density estimates with geom_violin
2 participants