Overview
========

Simuk is a Python library for simulation-based calibration (SBC) and the generation of synthetic data.

Prior Simulation-Based Calibration (Prior SBC) is a method for validating Bayesian inference by checking
whether the posterior distributions align with the expected theoretical results derived from the prior.
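
As a rough illustration of the idea (not Simuk's implementation), the following NumPy-only sketch runs Prior SBC on a hypothetical conjugate model, ``theta ~ Normal(0, 1)`` and ``y ~ Normal(theta, 1)``, where the exact posterior is known in closed form. The model, the variable names, and the simulation sizes are assumptions chosen for the example.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy model: theta ~ Normal(0, 1), y_i | theta ~ Normal(theta, 1).
    n_obs = 10      # observations per simulated dataset
    n_sims = 500    # number of SBC simulations
    n_draws = 99    # posterior draws per simulation, so ranks lie in 0..99

    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta_true = rng.normal(0.0, 1.0)              # 1. draw a parameter from the prior
        y = rng.normal(theta_true, 1.0, size=n_obs)    # 2. simulate data given that draw
        # 3. exact conjugate posterior: precision 1 + n_obs, mean sum(y) / (1 + n_obs)
        post_var = 1.0 / (1.0 + n_obs)
        post_mean = post_var * y.sum()
        draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
        ranks[i] = np.sum(draws < theta_true)          # 4. rank of the prior draw

    # Count how often each rank value occurs across simulations.
    counts = np.bincount(ranks, minlength=n_draws + 1)

Because the posterior draws here are exact, the ranks should be approximately uniform on ``0..n_draws``. A biased or poorly converged sampler would instead produce a skewed or U-shaped rank histogram.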

Posterior Simulation-Based Calibration (Posterior SBC) is a method for validating Bayesian inference by
checking whether the posterior distributions conditioned on the augmented data (original + posterior predictive) 
align with the expected theoretical results derived from the posterior.
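
The same hypothetical conjugate model can illustrate the posterior variant. In this sketch (an assumption-laden illustration, not Simuk's implementation), parameters are drawn from the posterior given the observed data, a replicated dataset is simulated from each draw, and the posterior is recomputed on the augmented data.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(1)

    def posterior_params(y):
        """Exact posterior mean and sd of theta for theta ~ N(0, 1), y_i ~ N(theta, 1)."""
        var = 1.0 / (1.0 + y.size)
        return var * y.sum(), np.sqrt(var)

    y_obs = rng.normal(0.5, 1.0, size=10)    # stand-in for the observed data
    mean_obs, sd_obs = posterior_params(y_obs)

    n_sims, n_draws = 500, 99
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta_star = rng.normal(mean_obs, sd_obs)              # draw from p(theta | y_obs)
        y_rep = rng.normal(theta_star, 1.0, size=y_obs.size)   # posterior predictive replicate
        y_aug = np.concatenate([y_obs, y_rep])                 # original + replicated data
        mean_aug, sd_aug = posterior_params(y_aug)
        draws = rng.normal(mean_aug, sd_aug, size=n_draws)     # draws from p(theta | y_aug)
        ranks[i] = np.sum(draws < theta_star)                  # rank of the posterior draw

Note that the augmented dataset is twice the size of the original one, which is why a model used for Posterior SBC has to accept resized data.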

Prior SBC Quickstart
--------------------

This quickstart guide provides a simple example to help you get started. If you're looking for more examples
and use cases, be sure to check out the :doc:`examples` section.

To use SBC, you need a model from which parameters can be drawn from the prior and data can be simulated from
those draws. The prior draws are then compared to posterior samples obtained through inference on the simulated data.

In our case, we will take a PyMC model and pass it into our ``SBC`` class.

.. code-block:: python

    from arviz_plots import plot_ecdf_pit
    import numpy as np
    import pymc as pm
    import simuk

    data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
    sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

    with pm.Model() as centered_eight:
        mu = pm.Normal('mu', mu=0, sigma=5)
        tau = pm.HalfCauchy('tau', beta=5)
        theta = pm.Normal('theta', mu=mu, sigma=tau, shape=8)
        y_obs = pm.Normal('y', mu=theta, sigma=sigma, observed=data)

    # Pass it into the SBC class
    sbc = simuk.SBC(centered_eight, num_simulations=100, sample_kwargs={'draws': 25, 'tune': 50})

Now, we use the ``run_simulations`` method to generate and analyze simulated data, running the model multiple times to
compare prior and posterior distributions.

.. code-block:: python

    sbc.run_simulations()

Plot the empirical CDF to compare the differences between the prior and posterior.

.. code-block:: python

    plot_ecdf_pit(sbc.simulations, visuals={"xlabel": False})

For a well-calibrated model, the lines should be nearly uniform and fall within the oval envelope. This suggests that
the prior and posterior distributions are properly aligned and that there are no significant biases or issues with the model.
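
To see why the envelope is oval, the sketch below computes the difference between an empirical CDF and the uniform CDF, together with a pointwise band. This is a simplified illustration under stated assumptions: the values in ``pit`` are synthetic stand-ins for normalized ranks, and the band uses a normal approximation, whereas the plotting function may construct its envelope differently (for example as a simultaneous band).

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(2)

    n_sims = 200
    pit = rng.uniform(size=n_sims)    # synthetic stand-in for normalized ranks

    z = np.linspace(0.0, 1.0, 101)    # evaluation grid on [0, 1]
    ecdf = np.mean(pit[:, None] <= z[None, :], axis=0)
    ecdf_diff = ecdf - z              # deviation from the uniform CDF

    # Pointwise ~95% band: normal approximation to Binomial(n_sims, z) / n_sims.
    half_width = 1.96 * np.sqrt(z * (1.0 - z) / n_sims)

    # Fraction of grid points where the deviation stays inside the band.
    frac_inside = np.mean(np.abs(ecdf_diff) <= half_width)

The band width is proportional to ``sqrt(z * (1 - z))``, so it vanishes at 0 and 1 and is widest at 0.5, which gives the envelope its oval shape.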

Posterior SBC Quickstart
------------------------

While Prior SBC checks the global validity of an inference algorithm across the entire prior space,
Posterior SBC evaluates validity locally, conditional on your observed data. It is currently implemented only for PyMC.
To use it, pass ``method="posterior"`` and the original ``trace`` to the ``SBC`` class.

.. warning::

    **Model requirements for Posterior SBC**

    Posterior SBC augments the observed data (concatenating original + replicated),
    which changes its size. For this to work, store observed data in ``pm.Data``
    containers, and specify size using the ``dims`` parameter instead of setting a static shape.
    If your model uses ``dims`` and ``coords``, you are also responsible for resizing them to match
    the augmented dataset via the ``update_data`` callback.
    Similarly, if your model has covariates, store them in ``pm.Data`` so they
    can be resized in the same callback.

.. code-block:: python

    # Define the model conforming to the Posterior SBC implementation requirements.
    from arviz_plots import plot_ecdf_pit
    import numpy as np
    import pymc as pm
    import simuk

    data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
    sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

    with pm.Model(coords={"school": np.arange(8)}) as centered_eight:
        school_idx = pm.Data("school_idx", np.arange(8))
        y_data = pm.Data("y_data", data)
        sigma_data = pm.Data("sigma_data", sigma)
        
        mu = pm.Normal('mu', mu=0, sigma=5)
        tau = pm.HalfCauchy('tau', beta=5)
        theta = pm.Normal('theta', mu=mu, sigma=tau, dims="school")
        y_obs = pm.Normal('y', mu=theta[school_idx], sigma=sigma_data, observed=y_data)

    # Run the model and save the trace.
    with centered_eight:
        idata = pm.sample(progressbar=False)
    
    # Define necessary callbacks to resize our covariates
    def update_data(model, augmented_data, simulation_idx):
        with model:
            pm.set_data({
                "sigma_data": np.concatenate([sigma, sigma]),
                "school_idx": np.concatenate([np.arange(8), np.arange(8)])
            })
    
    # Run Posterior SBC
    post_sbc = simuk.SBC(
        centered_eight,
        method="posterior",
        trace=idata,
        update_data=update_data,
        num_simulations=100,
        sample_kwargs={'draws': 25, 'tune': 50},
        progress_bar=False
    )
    post_sbc.run_simulations()

    plot_ecdf_pit(post_sbc.simulations, group="posterior_sbc", visuals={"xlabel": False})

For more advanced use cases, such as custom data augmentation or re-evaluating rank statistics, check out the :doc:`Posterior SBC tutorial <examples/gallery/posterior_sbc>`.

.. toctree::
   :maxdepth: 1
   :hidden:
   :caption: Getting Started

   Overview <self>
   installation

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: API documentation

   api/index.rst

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Examples

   examples

.. toctree::
   :maxdepth: 1
   :hidden:
   :caption: References

   contributing
   changelog

References
----------

- Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). "Validating Bayesian Inference Algorithms with Simulation-Based Calibration." `arXiv:1804.06788 <https://doi.org/10.48550/arXiv.1804.06788>`_.
- Modrák, M., Moon, A., Kim, S., Bürkner, P., Huurre, N., Faltejsková, K., Gelman, A., and Vehtari, A. (2023). "Simulation-based calibration checking for Bayesian computation: The choice of test quantities shapes sensitivity." Bayesian Analysis, advance publication. DOI: `10.1214/23-BA1404 <https://projecteuclid.org/journals/bayesian-analysis/volume--1/issue--1/Simulation-Based-Calibration-Checking-for-Bayesian-Computation--The-Choice/10.1214/23-BA1404.full>`_.
- Säilynoja, T., Schmitt, M., Bürkner, P.-C., and Vehtari, A. (2025). "Posterior SBC: Simulation-Based Calibration Checking Conditional on Data." `arXiv:2502.03279 <https://doi.org/10.48550/arXiv.2502.03279>`_.