simuk.SBC
- class simuk.SBC(model, method='prior', num_simulations=1000, sample_kwargs=None, seed=None, data_dir=None, simulator=None, trace=None, augment_observed=None, update_data=None, transform=None, keep_fits=True, progress_bar=True)
Simulation-based calibration checking (SBC).
Supports two modes of operation:
- Prior SBC (method="prior", default): validates that the inference algorithm is calibrated across the prior. Reference draws come from the prior and replicated data from the prior predictive (Talts et al., 2020 [1]).
- Posterior SBC (method="posterior"): validates that the inference algorithm is calibrated across the posterior. Reference draws come from the original posterior and replicated data from the posterior predictive. The model is then re-fit on the original observations augmented with the replicated data, by default their concatenation (Säilynoja et al., 2025 [2]).
- Parameters:
model (pymc.Model, bambi.Model or numpyro.infer.mcmc.MCMCKernel) – A PyMC model, a Bambi model, or a NumPyro MCMC kernel. For a PyMC model, the data must be defined as mutable data.
method ({"prior", "posterior"}, default "prior") – Which variant of SBC to perform.
num_simulations (int, default 1000) – How many SBC iterations to run.
sample_kwargs (dict, optional) – Keyword arguments forwarded to pymc.sample (or bambi.Model.fit / numpyro.infer.MCMC).
seed (int, optional) – Random seed. The seed persists even if running the simulations is paused.
data_dir (dict, optional) – Keyword arguments passed to the NumPyro model; intended for use when model is a NumPyro MCMC kernel.
simulator (callable, optional) – A custom data-generating function. It receives the model parameter values as keyword arguments plus a seed integer, and must return a dict mapping observed-variable names to numpy arrays.
trace (arviz.InferenceData, optional) – Required for method="posterior". An InferenceData object that contains both the posterior and observed_data groups. The number of posterior draws per chain must be at least num_simulations.
augment_observed (callable, optional) – Posterior SBC only. Signature: (model, observed_data, replicated_data, simulation_idx) -> dict. Builds the augmented observed data that the model will be conditioned on. observed_data is the xarray Dataset from trace["observed_data"], and replicated_data is a dict[str, np.ndarray] of the simulated observations from the original posterior predictive for the current iteration. The returned dict maps variable names to the augmented data. The default behaviour concatenates the original and replicated observations along the first axis for each variable. Provide this callback when simple concatenation is not valid, e.g. for structured data.
update_data (callable, optional) – Posterior SBC only. Signature: (model, augmented_data, simulation_idx) -> None. Called before conditioning the model on the augmented data. Use this to resize covariates, coordinate labels, or other pm.Data containers so that the model is consistent with the augmented dataset.
transform (callable, optional) – A transform applied to both the reference draw and the posterior draws before computing the rank statistic. Signature: (param_name, param_value) -> transformed_value. Useful for defining scalar test quantities (e.g. lambda param_name, param_value: np.mean(param_value) to test the mean of a vector parameter). The return values must be comparable with the < operator. The default is the identity (rank on the raw parameter values).
keep_fits (bool, default True) – Whether to store the posteriors so that rank statistics can be re-evaluated using a different quantity (compute_rank_statistics) without running the simulations again.
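The simulator and augment_observed callbacks described above are plain Python functions. The sketch below is illustrative only: it assumes a model with hypothetical parameters mu and sigma and a single observed variable named "y", and it assumes the seed arrives as a keyword argument named seed. The augment_observed function merely reproduces the documented default (concatenation along the first axis), so treat it as a template for a custom variant rather than something you need to pass.

```python
import numpy as np


def simulator(mu, sigma, seed, **kwargs):
    """Hypothetical simulator: parameter names, the "y" key and size=20 are assumptions."""
    rng = np.random.default_rng(seed)
    # Must return a dict mapping observed-variable names to numpy arrays.
    return {"y": rng.normal(np.asarray(mu), np.asarray(sigma), size=20)}


def augment_observed(model, observed_data, replicated_data, simulation_idx):
    """Mirror the documented default: concatenate along the first axis.

    observed_data may be the xarray Dataset from trace["observed_data"];
    np.asarray turns each DataArray into a plain numpy array.
    """
    return {
        name: np.concatenate(
            [np.asarray(observed_data[name]), np.asarray(values)], axis=0
        )
        for name, values in replicated_data.items()
    }
```

A custom augment_observed would change only the body, for example concatenating along a different axis for structured data, while keeping the same four-argument signature.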
Notes
Prior SBC exploits the self-consistency of Bayesian updating: if \(\theta' \sim \pi(\theta)\) and \(y' \sim \pi(y \mid \theta')\), then \(\theta'\) is also a draw from \(\pi(\theta \mid y')\). See Talts et al., 2020 [1].
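The self-consistency argument can be checked numerically on a conjugate model, \(\theta \sim \mathcal{N}(0, 1)\) and \(y_j \mid \theta \sim \mathcal{N}(\theta, \sigma^2)\). This is a minimal NumPy sketch, not simuk's implementation: the exact conjugate posterior stands in for the inference algorithm, and the function name prior_sbc_ranks, the sample sizes, and the model are illustrative assumptions. Because the posterior is exact, the ranks should be approximately uniform on \(\{0, \dots, L\}\) with \(L\) = num_draws.

```python
import numpy as np


def prior_sbc_ranks(num_simulations=1000, num_draws=99, n_obs=10, sigma=1.0, seed=0):
    """Prior SBC for theta ~ N(0, 1), y_j | theta ~ N(theta, sigma^2).

    The exact conjugate posterior replaces MCMC, so ranks should be uniform.
    """
    rng = np.random.default_rng(seed)
    ranks = np.empty(num_simulations, dtype=int)
    for i in range(num_simulations):
        theta_ref = rng.normal(0.0, 1.0)                # theta' ~ pi(theta)
        y = rng.normal(theta_ref, sigma, size=n_obs)    # y' ~ pi(y | theta')
        post_prec = 1.0 + n_obs / sigma**2              # conjugate precision update
        post_mean = (y.sum() / sigma**2) / post_prec
        draws = rng.normal(post_mean, post_prec**-0.5, size=num_draws)
        ranks[i] = np.sum(draws < theta_ref)            # rank of theta' among draws
    return ranks
```

Shrinking the posterior spread (for example, replacing post_prec**-0.5 with a smaller value) mimics an overconfident inference algorithm and piles the ranks up at 0 and num_draws, which is the kind of miscalibration SBC is meant to expose.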
Posterior SBC uses the same self-consistency after conditioning on observed data \(y_{\text{obs}}\). A draw \(\theta'_i \sim \pi(\theta \mid y_{\text{obs}})\) and a replicated dataset \(y_i \sim \pi(y \mid \theta'_i)\) are combined so that \(\theta'_i\) is also a draw from \(\pi(\theta \mid y_i, y_{\text{obs}})\). The rank of \(\theta'_i\) among augmented-posterior draws should be uniformly distributed if the inference is calibrated. See Säilynoja et al., 2025 [2].
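The same conjugate model illustrates the posterior variant described above. In this sketch (again not simuk's code; the function names and sizes are hypothetical), \(\theta'_i\) is drawn from the exact posterior given \(y_{\text{obs}}\), a replicated dataset \(y_i\) of the same size is simulated, and the exact posterior given the concatenation of \(y_{\text{obs}}\) and \(y_i\) stands in for the re-fit. Since \(y_i\) is conditionally independent of \(y_{\text{obs}}\) given \(\theta\), the ranks should again be approximately uniform.

```python
import numpy as np


def normal_posterior(y, sigma=1.0):
    """Exact (mean, sd) posterior for theta ~ N(0, 1), y_j | theta ~ N(theta, sigma^2)."""
    prec = 1.0 + y.size / sigma**2
    return (y.sum() / sigma**2) / prec, prec**-0.5


def posterior_sbc_ranks(y_obs, num_simulations=1000, num_draws=99, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mean_obs, sd_obs = normal_posterior(y_obs, sigma)
    ranks = np.empty(num_simulations, dtype=int)
    for i in range(num_simulations):
        theta_ref = rng.normal(mean_obs, sd_obs)               # theta'_i ~ pi(theta | y_obs)
        y_rep = rng.normal(theta_ref, sigma, size=y_obs.size)  # y_i ~ pi(y | theta'_i)
        y_aug = np.concatenate([y_obs, y_rep])                 # default-style augmentation
        mean_aug, sd_aug = normal_posterior(y_aug, sigma)      # pi(theta | y_i, y_obs)
        draws = rng.normal(mean_aug, sd_aug, size=num_draws)
        ranks[i] = np.sum(draws < theta_ref)
    return ranks
```

For example, posterior_sbc_ranks(np.random.default_rng(42).normal(1.5, 1.0, size=8)) returns 1000 ranks in 0 to 99 whose histogram should look flat.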
References
Examples
Prior SBC (default):
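import numpy as np
import pymc as pm
import simuk

# Synthetic observations for illustration; replace with your own data.
obs = np.random.default_rng(0).normal(size=50)

with pm.Model() as model:
    x = pm.Normal('x')
    y = pm.Normal('y', mu=2 * x, observed=obs)

sbc = simuk.SBC(model, num_simulations=200)
sbc.run_simulations()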
Posterior SBC – validate inference conditional on observed data:
import numpy as np
import pymc as pm
import simuk

# Synthetic observations for illustration; replace with your own data.
obs = np.random.default_rng(0).normal(size=50)

with pm.Model() as model:
    x = pm.Normal('x')
    y = pm.Normal('y', mu=2 * x, observed=obs)
    # 1. Obtain posterior samples from the real data
    trace = pm.sample()

# 2. Run posterior SBC
sbc = simuk.SBC(
    model,
    method="posterior",
    trace=trace,
    num_simulations=200,
)
sbc.run_simulations()
Methods
- compute_rank_statistics(transform=None)
Compute the rank statistic for the reference parameters.
This method computes the rank of each reference parameter value relative to the newly sampled posterior draws for each simulation.
This allows users to recompute rank statistics rapidly using a different parameter transformation without needing to rerun the simulations.
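The rank computation described above can be sketched in a few lines of NumPy. This is an assumption-laden illustration, not simuk's internal code: rank_statistic and its param_name default are hypothetical, the draw dimension is assumed to come first, and the count uses strict < as suggested by the transform parameter's requirement that return values support that operator.

```python
import numpy as np


def _identity(param_name, param_value):
    return param_value


def rank_statistic(reference, posterior_draws, param_name="theta", transform=None):
    """Rank of a reference value among posterior draws (hypothetical helper).

    transform follows the (param_name, param_value) convention and defaults
    to the identity; it is applied to the reference and to every draw.
    """
    if transform is None:
        transform = _identity
    ref = np.asarray(transform(param_name, np.asarray(reference)))
    draws = np.stack([np.asarray(transform(param_name, d)) for d in posterior_draws])
    # Count the transformed draws that fall strictly below the transformed reference.
    return np.sum(draws < ref, axis=0)
```

With the identity transform a vector parameter gets one rank per element; passing lambda name, value: np.mean(value) collapses it to a single rank for the mean, which is the kind of re-evaluation compute_rank_statistics enables without re-running the fits.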
- Parameters:
transform (callable, optional) – A function that accepts two arguments: (param_name, param_value). This function is applied to both the posterior draws and the reference parameter draws before computing the rank. For instance, it can be used to take the mean of a vector-valued parameter. If None, defaults to the transform passed during class initialization.
- Returns:
An xarray.DataTree containing the computed rank statistics, matching the output structure generated by run_simulations.
- Return type:
xarray.DataTree