Overview
========

Simuk is a Python library for simulation-based calibration (SBC) and the generation of synthetic data.

Prior Simulation-Based Calibration (Prior SBC) is a method for validating Bayesian inference by checking whether the posterior distributions align with the expected theoretical results derived from the prior.

Posterior Simulation-Based Calibration (Posterior SBC) is a method for validating Bayesian inference by checking whether the posterior distributions conditioned on the augmented data (original + posterior predictive) align with the expected theoretical results derived from the posterior.

Prior SBC Quickstart
--------------------

This quickstart guide provides a simple example to help you get started. If you're looking for more examples and use cases, be sure to check out the :doc:`examples` section.

To use SBC, you need to define a model function that generates simulated data and corresponding prior predictive samples, then compare them to posterior samples obtained through inference.

In our case, we will take a PyMC model and pass it into our ``SBC`` class.

.. code-block:: python

    from arviz_plots import plot_ecdf_pit
    import numpy as np
    import pymc as pm
    import simuk

    data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
    sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

    with pm.Model() as centered_eight:
        mu = pm.Normal('mu', mu=0, sigma=5)
        tau = pm.HalfCauchy('tau', beta=5)
        theta = pm.Normal('theta', mu=mu, sigma=tau, shape=8)
        y_obs = pm.Normal('y', mu=theta, sigma=sigma, observed=data)

    # Pass it into the SBC class
    sbc = simuk.SBC(
        centered_eight,
        num_simulations=100,
        sample_kwargs={'draws': 25, 'tune': 50},
    )

Now, we use the ``run_simulations`` method to generate and analyze simulated data, running the model multiple times to compare prior and posterior distributions.

.. code-block:: python

    sbc.run_simulations()

Plot the empirical CDF to compare the differences between the prior and posterior.

.. code-block:: python

    plot_ecdf_pit(sbc.simulations, visuals={"xlabel": False})

The lines should be nearly uniform and fall within the oval envelope. When they do, this suggests that the prior and posterior distributions are properly aligned and that there are no significant biases or issues with the model.

Posterior SBC Quickstart
------------------------

While Prior SBC checks the global validity of an inference algorithm across the entire prior space, Posterior SBC evaluates validity locally, conditional on your observed data. Posterior SBC is currently implemented only for PyMC. To use it, pass ``method="posterior"`` and the original ``trace`` to the ``SBC`` class.

.. warning::

    **Model requirements for Posterior SBC**

    Posterior SBC augments the observed data (concatenating original + replicated), which changes its size. For this to work, store observed data in ``pm.Data`` containers, and specify size using the ``dims`` parameter instead of setting a static shape.

    If your model uses ``dims`` and ``coords``, you are also responsible for resizing them to match the new augmented dataset via the ``update_data`` callback. Similarly, if your model has covariates, store them in ``pm.Data`` so they can be resized in the same callback.

.. code-block:: python

    # Define the model conforming to the Posterior SBC implementation requirements.
    from arviz_plots import plot_ecdf_pit
    import numpy as np
    import pymc as pm
    import simuk

    data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
    sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

    with pm.Model(coords={"school": np.arange(8)}) as centered_eight:
        school_idx = pm.Data("school_idx", np.arange(8))
        y_data = pm.Data("y_data", data)
        sigma_data = pm.Data("sigma_data", sigma)

        mu = pm.Normal('mu', mu=0, sigma=5)
        tau = pm.HalfCauchy('tau', beta=5)
        theta = pm.Normal('theta', mu=mu, sigma=tau, dims="school")
        y_obs = pm.Normal('y', mu=theta[school_idx], sigma=sigma_data, observed=y_data)

    # Run the model and save the trace.
    with centered_eight:
        idata = pm.sample(progressbar=False)

    # Define necessary callbacks to resize our covariates
    def update_data(model, augmented_data, simulation_idx):
        with model:
            pm.set_data({
                "sigma_data": np.concatenate([sigma, sigma]),
                "school_idx": np.concatenate([np.arange(8), np.arange(8)])
            })

    # Run Posterior SBC
    post_sbc = simuk.SBC(
        centered_eight,
        method="posterior",
        trace=idata,
        update_data=update_data,
        num_simulations=100,
        sample_kwargs={'draws': 25, 'tune': 50},
        progress_bar=False
    )
    post_sbc.run_simulations()

    plot_ecdf_pit(post_sbc.simulations, group="posterior_sbc", visuals={"xlabel": False})

For more advanced use cases, such as custom data augmentation or re-evaluating rank statistics, check out the :doc:`Posterior SBC tutorial `.

.. toctree::
    :maxdepth: 1
    :hidden:
    :caption: Getting Started

    Overview
    installation

.. toctree::
    :maxdepth: 2
    :hidden:
    :caption: API documentation

    api/index.rst

.. toctree::
    :maxdepth: 2
    :hidden:
    :caption: Examples

    examples

.. toctree::
    :maxdepth: 1
    :hidden:
    :caption: References

    contributing
    changelog

References
----------

- Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). "Validating Bayesian Inference Algorithms with Simulation-Based Calibration." `arXiv:1804.06788 `_.
- Modrák, M., Moon, A., Kim, S., Bürkner, P., Huurre, N., Faltejsková, K., Gelman, A., and Vehtari, A. (2023). "Simulation-based calibration checking for Bayesian computation: The choice of test quantities shapes sensitivity." Bayesian Analysis, advance publication, DOI: `10.1214/23-BA1404 `_.
- Säilynoja, T., Schmitt, M., Bürkner, P.-C., and Vehtari, A. (2025). "Posterior SBC: Simulation-Based Calibration Checking Conditional on Data." `arXiv:2502.03279 `_.