Overview

Simuk is a Python library for simulation-based calibration (SBC) and the generation of synthetic data.

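Prior Simulation-Based Calibration (Prior SBC) is a method for validating Bayesian inference. Parameters are drawn from the prior, data are simulated from those parameters, and the model is fit to each simulated dataset. If inference is correct, the resulting posteriors, averaged over simulations, recover the prior.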
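As a rough illustration of the idea, here is a minimal sketch in plain NumPy that does not use Simuk. It assumes a toy conjugate model (a standard normal prior on a mean, with unit-variance normal observations) so that the posterior is available in closed form. The function name prior_sbc_ranks and all of its parameters are made up for this example.

```python
import numpy as np


def prior_sbc_ranks(num_simulations=1000, num_draws=99, n_obs=10, seed=0):
    """Rank statistics for a toy model: theta ~ N(0, 1), y_i ~ N(theta, 1)."""
    rng = np.random.default_rng(seed)
    ranks = np.empty(num_simulations, dtype=int)
    for s in range(num_simulations):
        # 1. Draw a "true" parameter from the prior.
        theta_true = rng.normal(0.0, 1.0)
        # 2. Simulate a dataset from that parameter.
        y = rng.normal(theta_true, 1.0, size=n_obs)
        # 3. Draw from the exact conjugate posterior (stands in for MCMC).
        post_var = 1.0 / (n_obs + 1.0)
        post_mean = y.sum() * post_var
        draws = rng.normal(post_mean, np.sqrt(post_var), size=num_draws)
        # 4. Rank of the true parameter among the posterior draws.
        ranks[s] = np.sum(draws < theta_true)
    return ranks


ranks = prior_sbc_ranks()
```

When inference is correct, as it is here by construction, the ranks are uniformly distributed over the integers 0 to num_draws. A biased or miscalibrated sampler would show up as a departure from uniformity.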

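Posterior Simulation-Based Calibration (Posterior SBC) is a method for validating Bayesian inference conditional on the observed data. Parameters are drawn from the posterior given the original data, replicated data are simulated from those parameters and appended to the original data, and the model is refit on the augmented data (original + posterior predictive). If inference is correct, the resulting posteriors, averaged over simulations, recover the original posterior.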
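The same idea can be sketched for the posterior variant, again in plain NumPy rather than with Simuk. This sketch assumes a toy conjugate model (standard normal prior, unit-variance normal observations) and replicated datasets of the same size as the original. The helper names conjugate_posterior and posterior_sbc_ranks and the example data are hypothetical.

```python
import numpy as np


def conjugate_posterior(y):
    """Posterior mean and sd for theta ~ N(0, 1) and y_i ~ N(theta, 1)."""
    post_var = 1.0 / (len(y) + 1.0)
    return y.sum() * post_var, np.sqrt(post_var)


def posterior_sbc_ranks(y_obs, num_simulations=1000, num_draws=99, seed=0):
    """Rank statistics conditional on the observed data y_obs."""
    rng = np.random.default_rng(seed)
    mean_obs, sd_obs = conjugate_posterior(y_obs)
    ranks = np.empty(num_simulations, dtype=int)
    for s in range(num_simulations):
        # 1. Draw a parameter from the posterior given the original data.
        theta_star = rng.normal(mean_obs, sd_obs)
        # 2. Simulate replicated data from that parameter.
        y_rep = rng.normal(theta_star, 1.0, size=len(y_obs))
        # 3. Augment: original data followed by replicated data.
        augmented = np.concatenate([y_obs, y_rep])
        # 4. Draw from the posterior given the augmented data.
        mean_aug, sd_aug = conjugate_posterior(augmented)
        draws = rng.normal(mean_aug, sd_aug, size=num_draws)
        # 5. Rank of the posterior draw among the augmented-posterior draws.
        ranks[s] = np.sum(draws < theta_star)
    return ranks


y_obs = np.array([0.3, -1.2, 0.8, 1.5, 0.1])
ranks = posterior_sbc_ranks(y_obs)
```

As in the prior case, correct inference yields ranks that are uniform over 0 to num_draws, but here the check is specific to the region of parameter space supported by y_obs.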

Prior SBC Quickstart

This quickstart guide provides a simple example to help you get started. If you’re looking for more examples and use cases, be sure to check out the Examples section.

To use SBC, you need to define a model function that generates simulated data and corresponding prior predictive samples, then compare them to posterior samples obtained through inference.

In our case, we will take a PyMC model and pass it into our SBC class.

from arviz_plots import plot_ecdf_pit
import numpy as np
import pymc as pm
import simuk

data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model() as centered_eight:
    mu = pm.Normal('mu', mu=0, sigma=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta = pm.Normal('theta', mu=mu, sigma=tau, shape=8)
    y_obs = pm.Normal('y', mu=theta, sigma=sigma, observed=data)

# Pass it into the SBC class
sbc = simuk.SBC(centered_eight, num_simulations=100, sample_kwargs={'draws': 25, 'tune': 50})

Now, we use the run_simulations method to generate and analyze simulated data, running the model multiple times to compare prior and posterior distributions.

sbc.run_simulations()

Plot the empirical CDF with plot_ecdf_pit to compare the prior and posterior.

plot_ecdf_pit(sbc.simulations, visuals={"xlabel": False})

The lines should be nearly uniform and fall within the oval envelope. If they do, this suggests that the prior and posterior distributions are properly aligned and that the model has no significant biases or other issues.
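To make "nearly uniform" more concrete, the sketch below computes the difference between an empirical CDF and the uniform CDF for two synthetic sets of values in [0, 1]. It is plain NumPy and is not how plot_ecdf_pit is implemented. The helper ecdf_difference is hypothetical, and the sketch does not compute the confidence envelope shown in the plot.

```python
import numpy as np


def ecdf_difference(pit_values, grid_size=101):
    """Return an evaluation grid and ECDF(grid) - grid for values in [0, 1]."""
    pit = np.sort(np.asarray(pit_values, dtype=float))
    grid = np.linspace(0.0, 1.0, grid_size)
    # Fraction of values less than or equal to each grid point.
    ecdf = np.searchsorted(pit, grid, side="right") / pit.size
    return grid, ecdf - grid


rng = np.random.default_rng(0)
calibrated = rng.uniform(size=1000)      # what well-calibrated inference looks like
biased = rng.beta(2.0, 1.0, size=1000)   # values piled up towards 1

_, diff_calibrated = ecdf_difference(calibrated)
_, diff_biased = ecdf_difference(biased)

max_calibrated = np.abs(diff_calibrated).max()
max_biased = np.abs(diff_biased).max()
```

For the calibrated values the difference stays close to zero everywhere, while the biased values produce a large, systematic deviation. In the plot, that kind of deviation appears as a line leaving the envelope.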

Posterior SBC Quickstart

While Prior SBC checks the global validity of an inference algorithm across the entire prior space, Posterior SBC evaluates validity locally, conditional on your observed data. Posterior SBC is currently implemented only for PyMC. To use it, pass method="posterior" and the original trace to the SBC class.

Warning

Model requirements for Posterior SBC

Posterior SBC augments the observed data (concatenating original + replicated), which changes its size. For this to work, store observed data in pm.Data containers, and specify size using the dims parameter instead of setting a static shape. If your model uses dims and coords, you are also responsible for resizing them to the correct size corresponding to the new augmented dataset via the update_data callback. Similarly, if your model has covariates, store them in pm.Data so they can be resized in the same callback.
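The shape bookkeeping behind these requirements can be sketched in plain NumPy, using the eight-schools arrays from this page. The replicated values below are only a stand-in for a posterior predictive draw, and the variable names are chosen for this illustration.

```python
import numpy as np

data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])
school_idx = np.arange(8)

# Stand-in for one posterior predictive draw (an assumption for this sketch).
rng = np.random.default_rng(0)
replicated = rng.normal(loc=data, scale=sigma)

# The observed data doubles in length: original followed by replicated.
augmented_y = np.concatenate([data, replicated])

# Every per-observation covariate has to be resized to match.
augmented_sigma = np.concatenate([sigma, sigma])
augmented_idx = np.concatenate([school_idx, school_idx])

# A per-school parameter keeps its length of 8; indexing maps it to 16 rows.
theta = np.zeros(8)
mu_per_observation = theta[augmented_idx]
```

This is why the observed data and covariates need to be resizable, while a parameter indexed by school can keep a fixed dimension: the 16 augmented observations still refer to the same 8 schools.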

# Define the model conforming to the Posterior SBC implementation requirements.
from arviz_plots import plot_ecdf_pit
import numpy as np
import pymc as pm
import simuk

data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model(coords={"school": np.arange(8)}) as centered_eight:
    school_idx = pm.Data("school_idx", np.arange(8))
    y_data = pm.Data("y_data", data)
    sigma_data = pm.Data("sigma_data", sigma)

    mu = pm.Normal('mu', mu=0, sigma=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta = pm.Normal('theta', mu=mu, sigma=tau, dims="school")
    y_obs = pm.Normal('y', mu=theta[school_idx], sigma=sigma_data, observed=y_data)

# Run the model and save the trace.
with centered_eight:
    idata = pm.sample(progressbar=False)

# Define necessary callbacks to resize our covariates
def update_data(model, augmented_data, simulation_idx):
    with model:
        pm.set_data({
            "sigma_data": np.concatenate([sigma, sigma]),
            "school_idx": np.concatenate([np.arange(8), np.arange(8)])
        })

# Run Posterior SBC
post_sbc = simuk.SBC(
    centered_eight,
    method="posterior",
    trace=idata,
    update_data=update_data,
    num_simulations=100,
    sample_kwargs={'draws': 25, 'tune': 50},
    progress_bar=False
)
post_sbc.run_simulations()

plot_ecdf_pit(post_sbc.simulations, group="posterior_sbc", visuals={"xlabel": False})

For more advanced use cases, such as custom data augmentation or re-evaluating rank statistics, check out the Posterior SBC tutorial.

References

Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). “Validating Bayesian Inference Algorithms with Simulation-Based Calibration.” arXiv:1804.06788.
Modrák, M., Moon, A., Kim, S., Bürkner, P., Huurre, N., Faltejsková, K., Gelman, A., and Vehtari, A. (2023). “Simulation-based calibration checking for Bayesian computation: The choice of test quantities shapes sensitivity.” Bayesian Analysis, advance publication. DOI: 10.1214/23-BA1404.
Säilynoja, T., Schmitt, M., Bürkner, P.-C., and Vehtari, A. (2025). “Posterior SBC: Simulation-Based Calibration Checking Conditional on Data.” arXiv:2502.03279.