API Reference

This reference provides detailed documentation for all modules, classes, and methods in the current release of Bambi.

bambi.diagnostics

bambi.diagnostics.autocorr(x)[source]

Compute the autocorrelation of the input array at every lag using FFT (see https://en.wikipedia.org/wiki/Autocorrelation#Efficient_computation).

Parameters

x (array-like) – An array containing MCMC samples.

Returns

An array of the same size as the input array.

Return type

np.ndarray

bambi.diagnostics.autocov(x)[source]

Compute autocovariance estimates for every lag for the input array.

Parameters

x (array-like) – An array containing MCMC samples.

Returns

An array of the same size as the input array.

Return type

np.ndarray
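The FFT approach referenced for autocorr and autocov can be sketched in NumPy as follows. This is an illustration only, not Bambi's implementation: the names autocov_fft and autocorr_fft are hypothetical, and the 1/n normalization at every lag is an assumption that may differ from what bambi.diagnostics applies.

```python
import numpy as np


def autocov_fft(x):
    """Biased autocovariance estimates for lags 0..n-1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    # Zero-pad to length 2n so the circular correlation computed via FFT
    # matches the linear (non-wrapping) correlation at every lag.
    spectrum = np.fft.rfft(centered, n=2 * n)
    acov = np.fft.irfft(spectrum * np.conj(spectrum), n=2 * n)[:n]
    return acov / n  # assumption: 1/n normalization at every lag


def autocorr_fft(x):
    """Autocorrelation: autocovariance scaled so that lag 0 equals 1."""
    acov = autocov_fft(x)
    return acov / acov[0]


samples = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rho = autocorr_fft(samples)  # same length as the input; rho[0] is 1
```

Both helpers return an array of the same size as the input, matching the return values documented above.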

bambi.diagnostics.effective_n(mcmc)[source]
Parameters

mcmc (MCMCResults) – Pre-sliced MCMC samples to compute diagnostics for.

bambi.diagnostics.gelman_rubin(mcmc)[source]
Parameters

mcmc (MCMCResults) – Pre-sliced MCMC samples to compute diagnostics for.
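For orientation, the textbook Gelman-Rubin statistic for a single parameter can be sketched in NumPy as below. This is not Bambi's code: the function name gelman_rubin_sketch and the (n_samples, n_chains) input layout are assumptions made for the example, whereas the documented function takes an MCMCResults object.

```python
import numpy as np


def gelman_rubin_sketch(chains):
    """Potential scale reduction factor for one parameter.

    `chains` has shape (n_samples, n_chains); the name and layout are
    assumptions made for this sketch.
    """
    chains = np.asarray(chains, dtype=float)
    n_samples, n_chains = chains.shape
    # W: mean of the within-chain sample variances.
    within = chains.var(axis=0, ddof=1).mean()
    # B: n_samples times the variance of the chain means.
    between = n_samples * chains.mean(axis=0).var(ddof=1)
    # Pooled estimate of the posterior variance.
    pooled = (n_samples - 1) / n_samples * within + between / n_samples
    return np.sqrt(pooled / within)


rng = np.random.default_rng(0)
mixed = rng.normal(size=(1000, 4))              # chains sampling the same target
stuck = mixed + np.array([0.0, 0.0, 0.0, 5.0])  # one chain offset from the rest
r_mixed = gelman_rubin_sketch(mixed)
r_stuck = gelman_rubin_sketch(stuck)
```

Values close to 1 suggest the chains agree; the offset chain in the second case pushes the statistic well above 1.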

bambi.models

class bambi.models.Model(data=None, default_priors=None, auto_scale=True, dropna=False, taylor=None, noncentered=True)[source]
data

The dataset to use. Either a pandas DataFrame, or the name of the file containing the data, which will be passed to pd.read_csv().

Type

DataFrame, str

default_priors

An optional specification of the default priors to use for all model terms. Either a dict containing named distributions, families, and terms (see the documentation in priors.PriorFactory for details), or the name of a JSON file containing the same information.

Type

dict, str

auto_scale

If True (default), priors are automatically rescaled to the data (to be weakly informative) any time default priors are used. Note that any priors explicitly set by the user will always take precedence over default priors.

Type

bool

dropna

When True, rows with any missing values in either the predictors or outcome are automatically dropped from the dataset in a listwise manner.

Type

bool

taylor

Order of Taylor expansion to use in approximate variance when constructing the default priors. Should be between 1 and 13. Lower values are less accurate, tending to undershoot the correct prior width, but are faster to compute and more stable. Odd-numbered values tend to work better. Defaults to 5 for Normal models and 1 for non-Normal models. Values higher than the defaults are generally not recommended as they can be unstable.

Type

int

noncentered

If True (default), uses a non-centered parameterization for normal hyperpriors on grouped parameters. If False, naive (centered) parameterization is used.

Type

bool

add(fixed=None, random=None, priors=None, family='gaussian', link=None, categorical=None, append=True)[source]

Adds one or more terms to the model via an R-like formula syntax.

Parameters
  • fixed (str) – Optional formula specification of fixed effects.

  • random (list) – Optional list-based specification of random effects.

  • priors (dict) – Optional specification of priors for one or more terms. A dict where the keys are the names of terms in the model, and the values are either instances of class Prior or ints, floats, or strings that specify the width of the priors on a standardized scale.

  • family (str, Family) – A specification of the model family (analogous to the family object in R). Either a string, or an instance of class priors.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are ‘gaussian’, ‘bernoulli’, ‘poisson’, and ‘t’.

  • link (str) – The model link function to use. Can be either a string (must be one of the options defined in the current backend; typically this will include at least ‘identity’, ‘logit’, ‘inverse’, and ‘log’), or a callable that takes a 1D ndarray or theano tensor as the sole argument and returns one with the same shape.

  • categorical (str, list) – The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in the DataFrame will be used to infer handling. In cases where numeric columns are to be treated as categoricals (e.g., random factors coded as numerical IDs), explicitly passing variable names via this argument is recommended.

  • append (bool) – If True, terms are appended to the existing model rather than replacing any existing terms. This allows formula-based specification of the model in stages.

build(backend=None)[source]

Set up the model for sampling/fitting.

Performs any steps that require access to all model terms (e.g., scaling priors on each term), then calls the BackEnd’s build() method.

Parameters

backend (str) – The name of the backend to use for model fitting. Currently, ‘pymc’ and ‘stan’ are supported. If None, assume that fit() has already been called (possibly without building), and look in self._backend_name.

fit(fixed=None, random=None, priors=None, family='gaussian', link=None, run=True, categorical=None, backend=None, **kwargs)[source]

Fit the model using the specified BackEnd.

Parameters
  • fixed (str) – Optional formula specification of fixed effects.

  • random (list) – Optional list-based specification of random effects.

  • priors (dict) – Optional specification of priors for one or more terms. A dict where the keys are the names of terms in the model, and the values are either instances of class Prior or ints, floats, or strings that specify the width of the priors on a standardized scale.

  • family (str, Family) – A specification of the model family (analogous to the family object in R). Either a string, or an instance of class priors.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are ‘gaussian’, ‘bernoulli’, ‘poisson’, and ‘t’.

  • link (str) – The model link function to use. Can be either a string (must be one of the options defined in the current backend; typically this will include at least ‘identity’, ‘logit’, ‘inverse’, and ‘log’), or a callable that takes a 1D ndarray or theano tensor as the sole argument and returns one with the same shape.

  • run (bool) – Whether or not to immediately begin fitting the model once any set up of passed arguments is complete.

  • categorical (str, list) – The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in the DataFrame will be used to infer handling. In cases where numeric columns are to be treated as categoricals (e.g., random factors coded as numerical IDs), explicitly passing variable names via this argument is recommended.

  • backend (str) – The name of the BackEnd to use. Currently only ‘pymc’ and ‘stan’ backends are supported. Defaults to PyMC3.

property fixed_terms

Return dict of all and only fixed effects in model.

property random_terms

Return dict of all and only random effects in model.

reset()[source]

Reset list of terms and y-variable.

set_priors(priors=None, fixed=None, random=None, match_derived_names=True)[source]

Set priors for one or more existing terms.

Parameters
  • priors (dict) – Dict of priors to update. Keys are names of terms to update; values are the new priors (either a Prior instance, or an int or float that scales the default priors). Note that a tuple can be passed as the key, in which case the same prior will be applied to all terms named in the tuple.

  • fixed (Prior, int, float, str) – A prior specification to apply to all fixed terms currently included in the model.

  • random (Prior, int, float, str) – A prior specification to apply to all random terms currently included in the model.

  • match_derived_names (bool) – If True, the specified prior(s) will be applied not only to terms that match the keyword exactly, but also to the levels of random effects that were derived from the original specification with the passed name. For example, priors={‘condition|subject’:0.5} would apply the prior to the terms with names ‘1|subject’, ‘condition[T.1]|subject’, and so on. If False, an exact match is required for the prior to be applied.
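The tuple-key convention described for the priors argument can be pictured with a small pure-Python sketch. The helper expand_prior_keys and the term names are hypothetical and only illustrate the idea; this is not how Bambi necessarily handles the dict internally.

```python
def expand_prior_keys(priors):
    """Flatten tuple keys so each term name maps to its own entry.

    Hypothetical helper for illustration; not part of Bambi.
    """
    expanded = {}
    for key, prior in priors.items():
        names = key if isinstance(key, tuple) else (key,)
        for name in names:
            expanded[name] = prior
    return expanded


# Values are placeholder scale factors; Prior instances could be used the same way.
priors = {("x1", "x2"): 0.5, "x3": 1.0}
flat = expand_prior_keys(priors)
```

Here the single entry keyed by ("x1", "x2") ends up applied to both x1 and x2.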

property term_names

Return names of all terms in order of addition to model.

class bambi.models.RandomTerm(name, data, predictor, grouper, categorical=False, prior=None, constant=None)[source]
class bambi.models.Term(name, data, categorical=False, prior=None, constant=None)[source]

Representation of a single (fixed) model term.

name

Name of the term.

Type

str

data

The term values.

Type

DataFrame, Series, ndarray

categorical

If True, the source variable is interpreted as nominal/categorical. If False, the source variable is treated as continuous.

Type

bool

prior

A specification of the prior(s) to use. An instance of class priors.Prior.

Type

Prior

constant

Indicates whether the term levels collectively act as a constant, in which case the term is treated as an intercept for prior distribution purposes.

Type

bool

bambi.priors

class bambi.priors.Family(name, prior, link, parent)[source]

A specification of model family.

name

Family name.

Type

str

prior

A Prior instance specifying the model likelihood prior.

Type

Prior

link

The name of the link function transforming the linear model prediction to a parameter of the likelihood.

Type

str

parent

The name of the prior parameter to set to the link-transformed predicted outcome (e.g., mu, p, etc.).

Type

str

class bambi.priors.Prior(name, scale=None, **kwargs)[source]

Abstract specification of a term prior.

name

Name of prior distribution (e.g., Normal, Bernoulli, etc.)

Type

str

kwargs

Optional keywords specifying the parameters of the named distribution.

Type

dict

update(**kwargs)[source]

Update the model arguments with additional arguments.

Parameters

kwargs (dict) – Optional keyword arguments to add to prior args.

class bambi.priors.PriorFactory(defaults=None, dists=None, terms=None, families=None)[source]

An object that supports specification and easy retrieval of default priors.

defaults

Optional base configuration containing default priors for distribution, families, and term types. If a string, the name of a JSON file containing the config. If a dict, must contain keys for ‘dists’, ‘terms’, and ‘families’; see the built-in JSON configuration for an example. If None, a built-in set of priors will be used as defaults.

Type

str, dict

dists

Optional specification of named distributions to use as priors. Each key gives the name of a newly defined distribution; values are two-element lists, where the first element is the name of the built-in distribution to use (‘Normal’, ‘Cauchy’, etc.), and the second element is a dictionary of parameters on that distribution (e.g., {‘mu’: 0, ‘sd’: 10}). Priors can be nested to arbitrary depths by replacing any parameter with another prior specification.

Type

dict

terms

Optional specification of default priors for different model term types. Valid keys are ‘intercept’, ‘fixed’, or ‘random’. Values are either strings prepended by a #, in which case they are interpreted as pointers to distributions named in the dists dictionary, or key -> value specifications in the same format as elements in the dists dictionary.

Type

dict
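The # pointer convention and the nesting of prior specifications can be illustrated with a short pure-Python sketch. The helper resolve_spec and the distribution names my_dist and my_scale are hypothetical; the sketch only mimics the documented format and is not Bambi's own resolution logic.

```python
def resolve_spec(spec, dists):
    """Replace '#name' pointers with the matching entry from `dists`.

    Hypothetical helper for illustration; not Bambi's own resolution logic.
    """
    if isinstance(spec, str) and spec.startswith("#"):
        return resolve_spec(dists[spec[1:]], dists)
    if isinstance(spec, list):
        name, params = spec
        return [name, {k: resolve_spec(v, dists) for k, v in params.items()}]
    return spec  # plain parameter value such as a number


dists = {
    "my_dist": ["Normal", {"mu": 10, "sd": 1000}],
    "my_scale": ["HalfCauchy", {"beta": 5}],
}
terms = {
    "fixed": "#my_dist",                               # pointer into dists
    "random": ["Normal", {"mu": 0, "sd": "#my_scale"}],  # nested prior on sd
}
resolved = {kind: resolve_spec(spec, dists) for kind, spec in terms.items()}
```

After resolution, the fixed entry is the full my_dist specification and the random entry carries a nested specification in place of its sd parameter.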

families

Optional specification of default priors for named family objects. Keys are family names, and values are dicts containing mandatory keys for ‘dist’, ‘link’, and ‘parent’.

Type

dict

Examples

>>> from bambi.priors import PriorFactory
>>> dists = {'my_dist': ['Normal', {'mu': 10, 'sd': 1000}]}
>>> pf = PriorFactory(dists=dists)
>>> families = {'normalish': {'dist': ['normal', {'sd': '#my_dist'}],
...                           'link': 'identity', 'parent': 'mu'}}
>>> pf = PriorFactory(dists=dists, families=families)

get(dist=None, term=None, family=None)[source]

Retrieve default prior for a named distribution, term type, or family.

Parameters
  • dist (str) – Name of desired distribution. Note that the name is the key in the defaults dictionary, not the name of the Distribution object used to construct the prior.

  • term (str) – The type of term family to retrieve defaults for. Must be one of ‘intercept’, ‘fixed’, or ‘random’.

  • family (str) – The name of the Family to retrieve. Must be a value defined internally. In the default config, this is one of ‘gaussian’, ‘bernoulli’, ‘poisson’, or ‘t’.

bambi.results

class bambi.results.MCMCResults(model, data, names, dims, levels, transformed_vars=None)[source]

Holds sampler results; provides slicing, plotting, and summarization tools.

model

a bambi Model instance specifying the model.

Type

Model

data

Raw storage of MCMC samples in an array whose dimensions 0, 1, and 2 index samples, chains, and variables, respectively.

Type

array-like

names

Names of all Terms.

Type

list

dims

Numbers of levels for all Terms.

Type

list

levels

Names of all levels for all Terms.

Type

list

transformed

Optional list of variable names to treat as transformed, and hence to exclude from the output by default.

Type

list

plot(varnames=None, ranefs=True, transformed=False, combined=False, hist=False, bins=20, kind='trace')[source]

Plots posterior distributions and sample traces.

Basically a wrapper for pm.traceplot() plus some niceties, based partly on code from https://pymc-devs.github.io/pymc3/notebooks/GLM-model-selection.html.

Parameters
  • varnames (list) – List of variable names to plot. If None, all eligible variables are plotted.

  • ranefs (bool) – If True (default), shows trace plots for individual random effects.

  • transformed (bool) – If False (default), excludes internally transformed variables from plotting.

  • combined (bool) – If True, concatenates all chains into one before plotting. If False (default), plots separate lines for each chain (on the same axes).

  • hist (bool) – If True, plots a histogram for each fixed effect, in addition to the kde plot. To prevent visual clutter, histograms are never plotted for random effects.

  • bins (int) – If hist is True, the number of bins in the histogram. Ignored if hist is False.

  • kind (str) – Either ‘trace’ (default) or ‘priors’. If ‘priors’, this simply calls Model.plot() internally.

summary(varnames=None, ranefs=False, transformed=False, hpd=0.95, quantiles=None, diagnostics=['effective_n', 'gelman_rubin'])[source]

Returns a DataFrame of summary/diagnostic statistics for the parameters.

Parameters
  • varnames (list) – List of variable names to include; if None (default), all eligible variables are included.

  • ranefs (bool) – Whether or not to include random effects in the summary. Default is False.

  • transformed (bool) – Whether or not to include internally transformed variables in the summary. Default is False.

  • hpd (float, between 0 and 1) – Show Highest Posterior Density (HPD) intervals with specified width/proportion for all parameters. If None, HPD intervals are suppressed.

  • quantiles (float, list) – Show specified quantiles of the marginal posterior distributions for all parameters. If list, must be a list of floats between 0 and 1. If None (default), no quantiles are shown.

  • diagnostics (list) – List of functions to use to compute convergence diagnostics for all parameters. Each element can be either a callable or a string giving the name of a function in the diagnostics module. Valid strings are ‘gelman_rubin’ and ‘effective_n’. Functions must accept an MCMCResults object as the sole input, and return a DataFrame with one labeled row per parameter. If None, no convergence diagnostics are computed.
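To make the hpd and quantiles options concrete, here is a minimal NumPy sketch of both quantities computed from raw draws. The helper hpd_interval is hypothetical and assumes a unimodal marginal posterior; summary() may compute its intervals differently.

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Narrowest interval containing at least `prob` of the samples.

    Hypothetical helper; assumes a unimodal marginal posterior.
    """
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = ordered.size
    n_inside = int(np.ceil(prob * n))  # samples each candidate interval covers
    # Width of every window of n_inside consecutive sorted samples.
    widths = ordered[n_inside - 1:] - ordered[:n - n_inside + 1]
    start = int(np.argmin(widths))
    return ordered[start], ordered[start + n_inside - 1]


draws = np.array([30.0, 0.0, 11.0, 10.0])
lower, upper = hpd_interval(draws, prob=0.5)  # narrowest window holding 2 draws
median = np.quantile(draws, 0.5)              # a quantile of the same draws
```

Unlike a quantile-based interval, the HPD interval is chosen by width, so it need not be centered on the median.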

to_df(varnames=None, ranefs=False, transformed=False, chains=None)[source]

Returns the MCMC samples in a nice, neat pandas DataFrame with all MCMC chains concatenated.

Parameters
  • varnames (list) – List of variable names to include; if None (default), all eligible variables are included.

  • ranefs (bool) – Whether or not to include random effects in the returned DataFrame. Default is False.

  • transformed (bool) – Whether or not to include internally transformed variables in the result. Default is False.

  • chains (int, list) – Index, or list of indexes, of chains to concatenate. E.g., [1, 3] would concatenate the first and third chains, and ignore any others. If None (default), concatenates all available chains.
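What concatenating chains means for an array laid out as (samples, chains, variables) can be sketched in NumPy as below. The helper concat_chains is hypothetical and returns a plain array rather than a DataFrame; note that it uses zero-based NumPy indices for chains, which is an assumption of this sketch and not a statement about the indexing to_df() expects.

```python
import numpy as np

n_samples, n_chains, n_vars = 3, 2, 2
# Toy array laid out as (samples, chains, variables).
data = np.arange(n_samples * n_chains * n_vars).reshape(n_samples, n_chains, n_vars)


def concat_chains(data, chains=None):
    """Stack the selected chains end to end, one column per variable.

    Hypothetical helper; `chains` holds zero-based NumPy indices here.
    """
    if chains is None:
        chains = range(data.shape[1])
    elif isinstance(chains, int):
        chains = [chains]
    return np.concatenate([data[:, c, :] for c in chains], axis=0)


all_draws = concat_chains(data)      # every chain, shape (6, 2)
first_only = concat_chains(data, 0)  # a single chain, shape (3, 2)
```

Each row of the result is one draw and each column one variable, which is the shape a DataFrame of concatenated chains would have.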

class bambi.results.PyMC3ADVIResults(model, params)[source]

Holds PyMC3 ADVI results and provides plotting and summarization tools.

model

A bambi Model instance specifying the model.

Type

Model

params

ADVI parameters returned by PyMC3.

Type

MultiTrace