Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categorical x apparently not supported by arviz.plot_hdi() #2412

Open
milesalanmoore opened this issue Jan 19, 2025 · 1 comment · May be fixed by #2413
Open

Categorical x apparently not supported by arviz.plot_hdi() #2412

milesalanmoore opened this issue Jan 19, 2025 · 1 comment · May be fixed by #2413

Comments

@milesalanmoore
Copy link

milesalanmoore commented Jan 19, 2025

Hello arviz-devs! Thanks for a wonderful package. This issue comes from a discussion on the PyMC discourse including @tomicapretto and I.

Describe the bug
To my understanding, plot_hdi() does not currently support categorical x values (see code below), but this isn't explicitly noted in the documentation nor is there a ValueError or TypeError raised. Further, the functions default of smooth=True throws it's own error when a user passes a categorical x.

To Reproduce

import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simulate data
np.random.seed(42)
x = ['A', 'B', 'C']
yA = np.random.normal(loc=5, scale=3, size=30)
yB = np.random.normal(loc=2, scale=4, size=30)
yC = np.random.normal(loc=7, scale=1.8, size=30)

# Create a DataFrame
data = pd.DataFrame({
    'y': np.concatenate([yA, yB, yC]),
    'group': np.repeat(x, repeats=30)
})
data['group'] = data['group'].astype('category')

# Plot the data
plt.figure(figsize=(8, 6))
sns.boxplot(x='group', y='y', data=data, palette='Set3')
sns.stripplot(x='group', y='y', data=data, color='black', alpha=0.5, jitter=True)
plt.title('Distribution of y across groups')
plt.xlabel('Group')
plt.ylabel('y')
plt.show()

Image

# Fit a Bayesian ANOVA model using Bambi
model = bmb.Model('y ~ group', data)
idata = model.fit()

preds = model.predict(idata, kind="response_params", inplace=False)
y_mu = az.extract(preds["posterior"])["mu"].values
group = data.loc[:, "group"].values

az.plot_hdi(x=group, y=y_mu.T)

Returns:

#UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U1'), dtype('float64')) -> None

The traceback points to np.linspace under the if smooth: block:

if smooth:
if isinstance(x[0], np.datetime64):
raise TypeError("Cannot deal with x as type datetime. Recommend setting smooth=False.")
if smooth_kwargs is None:
smooth_kwargs = {}
smooth_kwargs.setdefault("window_length", 55)
smooth_kwargs.setdefault("polyorder", 2)
x_data = np.linspace(x.min(), x.max(), 200)
x_data[0] = (x_data[0] + x_data[1]) / 2
hdi_interp = griddata(x, hdi_data, x_data)
y_data = savgol_filter(hdi_interp, axis=0, **smooth_kwargs)

Setting smooth to False does not return an expected plot:

az.plot_hdi(x=group, y=y_mu.T, smooth=False)

Image

Expected behavior
Given the documentation, I'd expect the behavior to mirror the output from bambi.interpret

bmb.interpret.plot_predictions(
    model=model,
    idata=idata,
    conditional="group",
);

Image

Additional context
arviz 0.20.0 via conda-forge
Python 3.12.0


I think a TypeError informing the user of the lack of support for categorical (str) types would be very helpful to future users.

@milesalanmoore milesalanmoore linked a pull request Jan 19, 2025 that will close this issue
4 tasks
@amaloney
Copy link

Thanks @milesalanmoore I was able to reproduce everything you gave above, and I thank you greatly for the effort given to the minimum reproducible example.

It looks like you were expecting categorical data to plot similarly to how bambi plots HDI values for categories. What I find interesting is that the ArviZ version is actually returning the same data, but it is connecting all the points together into lines. You can see this by overlaying the bambi plot with the ArviZ one.

I think a better error message at this time would be a NotImplementedError with a message saying plot_hdi does not plot categorical values.

On another note, it looks to me like you are wanting a forest plot. To do that using ArviZ, I would do the following.

import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd
import xarray as xr

# Simulate data
np.random.seed(42)
x = ['A', 'B', 'C']
yA = np.random.normal(loc=5, scale=3, size=30)
yB = np.random.normal(loc=2, scale=4, size=30)
yC = np.random.normal(loc=7, scale=1.8, size=30)

# Create a DataFrame
data = pd.DataFrame({
    'y': np.concatenate([yA, yB, yC]),
    'group': np.repeat(x, repeats=30)
})

# Fit a Bayesian ANOVA model using Bambi
model = bmb.Model('y ~ group', data)
idata = model.fit()
preds = model.predict(idata, kind="response_params", inplace=False)
y_mu = az.extract(preds["posterior"])["mu"].values

# forest plot
az.plot_forest(
    xr.DataArray(
        data=y_mu.reshape(90, 1, 4000),
        dims=["group", "chain", "draw"],
        coords={"group": data["group"].values},
    )
)
plt.show()

Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants