API Reference
This reference provides detailed documentation for all modules, classes, and methods in the current release of Bambi.
bambi.models
- class bambi.models.Model(formula=None, data=None, family='gaussian', priors=None, link=None, categorical=None, potentials=None, dropna=False, auto_scale=True, automatic_priors='default', noncentered=True, priors_cor=None, taylor=None)
Specification of model class.
- Parameters
formula (str) – A model description written in model formula language.
data (DataFrame or str) – The dataset to use. Either a pandas DataFrame, or the name of the file containing the data, which will be passed to pd.read_csv().
family (str or Family) – A specification of the model family (analogous to the family object in R). Either a string, or an instance of class priors.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are 'gaussian', 'bernoulli', 'beta', 'binomial', 'poisson', 'gamma', 'wald', and 'negativebinomial'. Defaults to 'gaussian'.
priors (dict) – Optional specification of priors for one or more terms. A dictionary where the keys are the names of terms in the model, ‘common’ or ‘group_specific’, and the values are either instances of class Prior or an int, float, or str that specifies the width of the priors on a standardized scale.
link (str) – The model link function to use. Can be either a string (must be one of the options defined in the current backend; typically this will include at least 'identity', 'logit', 'inverse', and 'log'), or a callable that takes a 1D ndarray or theano tensor as the sole argument and returns one with the same shape.
categorical (str or list) – The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in the DataFrame will be used to infer handling. In cases where numeric columns are to be treated as categoricals (e.g., group specific factors coded as numerical IDs), explicitly passing variable names via this argument is recommended.
potentials (list of 2-tuples) – Optional specification of potentials. A potential is an arbitrary expression added to the likelihood; this is generally useful for adding constraints to models that are difficult to express otherwise. The first element of a 2-tuple is the name of a variable in the model, the second a lambda function expressing the desired constraint. If a constraint involves n variables, you can pass n 2-tuples or pass a tuple whose first element is an n-tuple and whose second element is a lambda function with n arguments. The number and order of the lambda function's arguments must match the number and order of the variable names.
dropna (bool) – When True, rows with any missing values in either the predictors or outcome are automatically dropped from the dataset in a listwise manner.
auto_scale (bool) – If True (default), priors are automatically rescaled to the data (to be weakly informative) any time default priors are used. Note that any priors explicitly set by the user will always take precedence over default priors.
automatic_priors (str) – An optional specification to compute/scale automatic priors. "default" means to use a method inspired by the R rstanarm library. "mle" means to use the old default priors in Bambi that rely on maximum likelihood estimates obtained via the statsmodels library.
noncentered (bool) – If True (default), uses a non-centered parameterization for normal hyperpriors on grouped parameters. If False, naive (centered) parameterization is used.
priors_cor (dict) – The value of eta in the prior for the correlation matrix of group-specific terms. Keys in the dictionary indicate the groups, and values indicate the value of eta.
taylor (int) – Order of Taylor expansion to use in approximate variance when constructing the default priors. Should be between 1 and 13. Lower values are less accurate, tending to undershoot the correct prior width, but are faster to compute and more stable. Odd-numbered values tend to work better. Defaults to 5 for Normal models and 1 for non-Normal models. Values higher than the defaults are generally not recommended as they can be unstable.
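The noncentered option refers to a standard reparameterization. The following is a minimal plain-Python sketch of the idea, not Bambi's implementation; the function names and the values of mu and sigma are illustrative assumptions. It shows that drawing a standard-normal offset and then shifting and scaling it produces draws with the same distribution as sampling from Normal(mu, sigma) directly.

```python
import random
import statistics

def centered_draw(rng, mu, sigma):
    """Centered: sample the group-level effect directly from Normal(mu, sigma)."""
    return rng.gauss(mu, sigma)

def noncentered_draw(rng, mu, sigma):
    """Non-centered: sample a standard-normal offset, then shift and scale it."""
    offset = rng.gauss(0.0, 1.0)
    return mu + sigma * offset

rng = random.Random(0)
mu, sigma = 1.5, 0.5  # hypothetical hyperparameter values, for illustration only

centered = [centered_draw(rng, mu, sigma) for _ in range(20000)]
noncentered = [noncentered_draw(rng, mu, sigma) for _ in range(20000)]

# Both sets of draws should show roughly the same mean and standard deviation.
print(round(statistics.mean(centered), 2), round(statistics.stdev(centered), 2))
print(round(statistics.mean(noncentered), 2), round(statistics.stdev(noncentered), 2))
```

The two parameterizations describe the same model; they differ only in which quantity the sampler explores, which is why the choice can affect sampling efficiency without changing the results.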
- build()
Set up the model for sampling/fitting.
Performs any steps that require access to all model terms (e.g., scaling priors on each term), then calls the backend’s build() method.
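As a hedged usage sketch based only on the Model signature and the build() method documented on this page, the snippet below prepares a small dataset and defines a helper that constructs and builds a model without fitting it. The records, the column names (y, x, subject), and the helper make_model are hypothetical. The pandas and Bambi imports are deferred into the helper, so nothing is imported until it is called.

```python
# Toy records; the column names y, x and subject are hypothetical.
RECORDS = [
    {"y": 1.2, "x": 0.1, "subject": 1},
    {"y": 2.3, "x": 0.9, "subject": 1},
    {"y": 0.7, "x": -0.4, "subject": 2},
    {"y": 1.9, "x": 0.5, "subject": 2},
]

# Keyword arguments taken from the documented Model signature.
MODEL_KWARGS = {
    "family": "gaussian",
    "categorical": "subject",  # numeric IDs that should be treated as a factor
    "dropna": True,            # drop rows with missing values listwise
}

def make_model(records=None):
    """Construct and build (but do not fit) a model. Requires pandas and Bambi."""
    import pandas as pd
    from bambi.models import Model

    data = pd.DataFrame(RECORDS if records is None else records)
    model = Model("y ~ x + (1|subject)", data, **MODEL_KWARGS)
    model.build()  # set up the model for sampling, as described above
    return model
```

Calling make_model() in an environment with a matching Bambi release should return a built model; fitting is a separate step via fit().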
- property common_terms
Return dict of all and only common effects in model.
- fit(omit_offsets=True, **kwargs)
Fit the model using the specified backend.
- Parameters
omit_offsets (bool) – Omits offset terms in the InferenceData object when the model includes group specific effects. Defaults to True.
- graph(formatting='plain', name=None, figsize=None, dpi=300, fmt='png')
Produce a graphviz Digraph from a built Bambi model.
Requires graphviz, which may be installed most easily with
conda install -c conda-forge python-graphviz
Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the Python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.
- Parameters
formatting (str) – One of 'plain' or 'plain_with_params'. Defaults to 'plain'.
name (str) – Name of the figure to save. Defaults to None, no figure is saved.
figsize (tuple) – Maximum width and height of figure in inches. Defaults to None, the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None.
dpi (int) – Dots per inch of the figure to save. Defaults to 300. Only works if name is not None.
fmt (str) – Format of the figure to save. Defaults to 'png'. Only works if name is not None.
Example
>>> model = Model('y ~ x + (1|z)')
>>> model.build()
>>> model.graph()

>>> model = Model('y ~ x + (1|z)')
>>> model.fit()
>>> model.graph()
- property group_specific_terms
Return dict of all and only group specific effects in model.
- property intercept_term
Return the intercept term.
- plot_priors(draws=5000, var_names=None, random_seed=None, figsize=None, textsize=None, hdi_prob=None, round_to=2, point_estimate='mean', kind='kde', bins=None, omit_offsets=True, omit_group_specific=True, ax=None)
Samples from the prior distribution and plots its marginals.
- Parameters
draws (int) – Number of draws to sample from the prior predictive distribution. Defaults to 5000.
var_names (str or list) – A list of names of variables for which to compute the prior predictive distribution. Defaults to both observed and unobserved RVs.
random_seed (int) – Seed for the random number generator.
figsize (tuple) – Figure size. If None it will be defined automatically.
textsize (float) – Text size scaling factor for labels, titles and lines. If None it will be autoscaled based on figsize.
hdi_prob (float) – Plots highest density interval for chosen percentage of density. Use 'hide' to hide the highest density interval. Defaults to 0.94.
round_to (int) – Controls formatting of floats. Defaults to 2 or the integer part, whichever is bigger.
point_estimate (str) – Plot point estimate per variable. Values should be 'mean', 'median', 'mode' or None. Defaults to 'auto' i.e. it falls back to default set in ArviZ’s rcParams.
kind (str) – Type of plot to display ('kde' or 'hist'). For discrete variables this argument is ignored and a histogram is always used.
bins (integer or sequence or 'auto') – Controls the number of bins, accepts the same keywords matplotlib.hist() does. Only works if kind == hist. If None (default) it will use auto for continuous variables and range(xmin, xmax + 1) for discrete variables.
omit_offsets (bool) – Whether to omit offset terms in the plot. Defaults to True.
omit_group_specific (bool) – Whether to omit group specific effects in the plot. Defaults to True.
ax (numpy array-like of matplotlib axes or bokeh figures) – A 2D array of locations into which to plot the densities. If not supplied, ArviZ will create its own array of plot areas (and return it).
**kwargs – Passed as-is to plt.hist() or plt.plot() function depending on the value of kind.
- Returns
axes
- Return type
matplotlib axes or bokeh figures
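The hdi_prob argument refers to the highest density interval: the narrowest interval that contains the requested share of the draws. The sketch below is a plain-Python illustration of that definition for a unimodal sample. It is not the ArviZ routine used for plotting, and the function name hdi_from_draws is hypothetical.

```python
import math

def hdi_from_draws(draws, hdi_prob=0.94):
    """Return the narrowest interval containing about hdi_prob of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    # Number of draws that the interval must cover.
    n_inside = max(1, math.ceil(hdi_prob * n))
    best_low, best_high = ordered[0], ordered[n_inside - 1]
    # Slide a window of n_inside consecutive draws and keep the narrowest one.
    for start in range(1, n - n_inside + 1):
        low, high = ordered[start], ordered[start + n_inside - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high

# A right-skewed toy sample: the interval stays on the dense left side.
sample = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.7, 1.5, 4.0]
print(hdi_from_draws(sample, hdi_prob=0.8))
```

Unlike an equal-tailed interval, this interval is not forced to cut the same share of draws from each tail, so for skewed samples it shifts toward the region where the draws are densest.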
- posterior_predictive(idata, draws=500, var_names=None, inplace=True, random_seed=None)
Generate samples from the posterior predictive distribution.
- Parameters
idata (InferenceData) – InferenceData with samples from the posterior distribution.
draws (int) – Number of draws to sample from the posterior predictive distribution. Defaults to 500.
var_names (str or list) – A list of names of variables for which to compute the posterior predictive distribution. Defaults to observed RVs.
inplace (bool) – If True it will add a posterior_predictive group to idata, otherwise it will return a copy of idata with the added group. If True and idata already has a posterior_predictive group it will be overwritten.
random_seed (int) – Seed for the random number generator.
- Returns
When inplace=True, adds a posterior_predictive group to idata and returns None. Otherwise, returns a copy of idata with a posterior_predictive group.
- Return type
None or InferenceData
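The inplace return contract described above can be sketched with a plain dict standing in for InferenceData. This is an illustration under stated assumptions, not Bambi's code: the function add_posterior_predictive, the dict layout, and the Gaussian likelihood with (mu, sigma) draws are all hypothetical.

```python
import copy
import random

def add_posterior_predictive(idata, inplace=True, random_seed=None):
    """Mimic the documented return contract using a plain dict as a stand-in."""
    rng = random.Random(random_seed)
    target = idata if inplace else copy.deepcopy(idata)
    # One simulated observation per posterior draw of (mu, sigma).
    # An existing "posterior_predictive" entry is overwritten.
    target["posterior_predictive"] = [
        rng.gauss(mu, sigma) for mu, sigma in target["posterior"]
    ]
    return None if inplace else target

# Stand-in for InferenceData: the "posterior" group holds (mu, sigma) draws.
idata = {"posterior": [(0.0, 1.0), (0.2, 0.9), (-0.1, 1.1)]}

# inplace=False leaves the original untouched and returns an augmented copy.
result = add_posterior_predictive(idata, inplace=False, random_seed=1)
print("posterior_predictive" in idata, "posterior_predictive" in result)

# inplace=True mutates the original and returns None.
add_posterior_predictive(idata, inplace=True, random_seed=1)
print("posterior_predictive" in idata)
```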
- predict(idata, kind='mean', data=None, draws=None, inplace=True)
Predict method for Bambi models.
Obtains in-sample and out-of-sample predictions from a fitted Bambi model.
- Parameters
idata (InferenceData) – InferenceData with samples from the posterior distribution.
kind (str) – Indicates the type of prediction required. Can be "mean" or "pps". The first returns the posterior distribution of the mean, while the latter returns the posterior predictive distribution (i.e. the posterior probability distribution for a new observation). Defaults to "mean".
data (pd.DataFrame or None) – An optional data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used.
draws (int or None) – The number of random draws per chain. Only used if kind="pps". Not recommended unless more than ndraws times nchains posterior predictive samples are needed. Defaults to None, which means ndraws times nchains.
inplace (bool) – If True it will add a posterior_predictive group to idata, otherwise it will return a copy of idata with the added group. If True and idata already has a posterior_predictive group it will be overwritten.
- Returns
A NumPy array with predictions.
- Return type
np.ndarray
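The difference between kind="mean" and kind="pps" can be illustrated for a Bernoulli model with a logit link. The sketch below is plain Python written for this explanation, not the predict implementation; predict_sketch, the posterior draws, and the single predictor value x are hypothetical.

```python
import math
import random

def inverse_logit(eta):
    """Map a linear predictor value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_sketch(posterior, x, kind="mean", random_seed=None):
    """posterior: list of (intercept, slope) draws; x: one predictor value."""
    rng = random.Random(random_seed)
    # Posterior draws of the mean (here, the success probability at x).
    means = [inverse_logit(a + b * x) for a, b in posterior]
    if kind == "mean":
        return means
    if kind == "pps":
        # Simulate one new 0/1 observation per posterior draw.
        return [1 if rng.random() < p else 0 for p in means]
    raise ValueError("kind must be 'mean' or 'pps'")

posterior = [(-0.5, 1.0), (-0.4, 1.2), (-0.6, 0.9)]  # hypothetical draws
print(predict_sketch(posterior, x=0.5, kind="mean"))
print(predict_sketch(posterior, x=0.5, kind="pps", random_seed=0))
```

The "mean" output carries only parameter uncertainty, while the "pps" output adds the sampling variability of a new observation on top of it.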
- prior_predictive(draws=500, var_names=None, omit_offsets=True, random_seed=None)
Generate samples from the prior predictive distribution.
- Parameters
draws (int) – Number of draws to sample from the prior predictive distribution. Defaults to 500.
var_names (str or list) – A list of names of variables for which to compute the prior predictive distribution. Defaults to both observed and unobserved RVs.
random_seed (int) – Seed for the random number generator.
- Returns
InferenceData object with the groups prior, prior_predictive and observed_data.
- Return type
InferenceData
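Prior predictive sampling draws parameters from their priors and then simulates data from the likelihood given those parameters. The following is a minimal sketch for a Gaussian linear model, with a plain dict mirroring the three groups named above. The priors, the term names, and the function prior_predictive_sketch are assumptions made for illustration, not Bambi defaults.

```python
import random

def prior_predictive_sketch(x_values, y_observed, draws=500, random_seed=None):
    """Sample parameters from (hypothetical) priors, then simulate outcomes."""
    rng = random.Random(random_seed)
    prior, simulated = [], []
    for _ in range(draws):
        intercept = rng.gauss(0.0, 2.5)   # assumed prior: Normal(0, 2.5)
        slope = rng.gauss(0.0, 1.0)       # assumed prior: Normal(0, 1)
        sigma = abs(rng.gauss(0.0, 1.0))  # assumed prior: HalfNormal(1)
        prior.append({"Intercept": intercept, "x": slope, "sigma": sigma})
        # One simulated outcome per row of the predictor.
        simulated.append(
            [rng.gauss(intercept + slope * x, sigma) for x in x_values]
        )
    return {
        "prior": prior,
        "prior_predictive": simulated,
        "observed_data": list(y_observed),
    }

out = prior_predictive_sketch([0.0, 1.0, 2.0], [0.1, 0.9, 2.2], draws=5, random_seed=3)
print(sorted(out))
```

Comparing the simulated outcomes with the observed data is a common way to check whether the priors imply plausible values before fitting.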
- set_priors(priors=None, common=None, group_specific=None)
Set priors for one or more existing terms.
- Parameters
priors (dict) – Dictionary of priors to update. Keys are names of terms to update; values are the new priors (either a Prior instance, or an int or float that scales the default priors). Note that a tuple can be passed as the key, in which case the same prior will be applied to all terms named in the tuple.
common (Prior, int, float or str) – A prior specification to apply to all common terms included in the model.
group_specific (Prior, int, float or str) – A prior specification to apply to all group specific terms included in the model.
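The tuple-key behaviour of the priors dictionary can be pictured as a simple expansion step. The helper below is a hypothetical illustration of that rule, not the method's actual code; the term names are made up, and plain floats stand in for Prior instances or scale values.

```python
def expand_prior_keys(priors):
    """Expand tuple keys so every term name maps to its own prior entry."""
    expanded = {}
    for key, prior in priors.items():
        # A tuple key means "apply this prior to every term named in the tuple".
        names = key if isinstance(key, tuple) else (key,)
        for name in names:
            expanded[name] = prior
    return expanded

# Hypothetical specification: x1 and x2 share one prior, Intercept gets another.
spec = {("x1", "x2"): 0.5, "Intercept": 2.0}
print(expand_prior_keys(spec))
```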
- property term_names
Return names of all terms in order of addition to model.
bambi.priors
- class bambi.priors.Family(name, likelihood, link)
A specification of model family.
- Parameters
name (str) – Family name.
likelihood (Likelihood) – A Likelihood instance specifying the model likelihood function.
link (str or Link) – The name of the link function, or the function itself, transforming the linear model prediction to the mean parameter of the likelihood. If a function, it must be able to operate over theano tensors rather than numpy arrays.
- class bambi.priors.Likelihood(name, parent=None, pps=None, **kwargs)
Representation of a Likelihood function for a Bambi model.
‘parent’ must not be in ‘kwargs’. ‘parent’ is inferred from ‘name’ if it is a known name.
- Parameters
name (str) – Name of the likelihood function. Must be a valid PyMC3 distribution name.
parent (str) – Optional specification of the name of the mean parameter in the likelihood. This is the parameter whose transformation is modeled by the linear predictor.
kwargs – Keyword arguments that indicate prior distributions for auxiliary parameters in the likelihood.
- class bambi.priors.Link(name, link=None, linkinv=None, linkinv_backend=None)
Representation of link function.
This object actually contains two main functions. One is the link function itself, the function that maps values in the response scale to the linear predictor, and the other is the inverse of the link function, that maps values of the linear predictor to the response scale.
The great majority of users will never interact with this class unless they want to create a custom Family with a custom Link. This is automatically handled for all the built-in families.
- Parameters
name (str) – The name of the link function. If it is a known name, it’s not necessary to pass any other arguments because functions are already defined internally. If not known, all of link, linkinv and linkinv_backend must be specified.
link (function) – A function that maps the response to the linear predictor. Known as ‘g()’ in GLM jargon. Does not need to be specified when name is a known name.
linkinv (function) – A function that maps the linear predictor to the response. Known as ‘g^-1()’ in GLM jargon. Does not need to be specified when name is a known name.
linkinv_backend (function) – Same as linkinv but must be something that works with the PyMC3 backend (i.e. it must work with Aesara tensors). Does not need to be specified when name is a known name.
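To make the link and linkinv roles concrete, here is a minimal sketch of a logit link pair in plain Python. The function names are chosen for illustration. Functions like these correspond to the link and linkinv arguments, whereas linkinv_backend would additionally need a version that operates on the backend's tensors, which is not shown here.

```python
import math

def logit(mu):
    """Link g(): map a mean on the response scale (0, 1) to the linear predictor."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Inverse link: map the linear predictor back to the response scale."""
    return 1.0 / (1.0 + math.exp(-eta))

# The two functions should undo each other (up to floating-point error).
for mu in (0.1, 0.3, 0.5, 0.9):
    eta = logit(mu)
    print(mu, round(eta, 4), round(inverse_logit(eta), 4))
```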
- class bambi.priors.Prior(name, auto_scale=True, scale=None, **kwargs)
Abstract specification of a term prior.
- Parameters
name (str) – Name of prior distribution. Must be the name of a PyMC3 distribution (e.g., 'Normal', 'Bernoulli', etc.).
auto_scale (bool) – Whether to adjust the parameters of the prior or use them as passed. Defaults to True.
scale (num or str) –
kwargs (dict) – Optional keywords specifying the parameters of the named distribution.
- class bambi.priors.PriorScalerMLE(model, taylor)
Scale prior distribution parameters.
Used internally. Based on https://arxiv.org/abs/1702.01201
- fit_mle()
Fits MLE of the common part of the model.
This used to be called in the class instantiation, but there is no need to fit the GLM when there are no automatic priors. So this method is only called when needed.
- get_slope_stats(exog, name, values, sigma_corr, points=4, full_model=None)
- Parameters
name (str) – Name of the term
values (np.array) – Values of the term
full_model (statsmodels.genmod.generalized_linear_model.GLM) – Statsmodels GLM to replace MLE model. For when 'predictor' is not in the common part of the model.
points (int) – Number of points to use for LL approximation.
- bambi.priors.extract_family_prior(family, priors)
Extract priors for a given family.
If a key in the priors dictionary matches the name of a nuisance parameter of the response distribution for the given family, this function extracts and returns the prior for that nuisance parameter. The result of this function can be safely used to update the Prior of the response term.
- Parameters
family (str or Family) – The name of a built-in family or a Family instance.
priors (dict) – A dictionary where keys represent parameter/term names and values represent prior distributions.
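The matching rule described above can be sketched in a few lines of plain Python. This is an illustration only: the table of nuisance parameter names per family is an assumption made for the example, and extract_family_prior_sketch is a hypothetical stand-in rather than the library function.

```python
# Assumed nuisance (auxiliary) parameter names per family; illustrative only.
AUXILIARY_PARAMS = {
    "gaussian": ["sigma"],
    "negativebinomial": ["alpha"],
}

def extract_family_prior_sketch(family, priors):
    """Return only the priors whose keys name a nuisance parameter of the family."""
    aux_names = AUXILIARY_PARAMS.get(family, [])
    return {name: priors[name] for name in aux_names if name in priors}

# Strings stand in for Prior instances; "x" is a hypothetical model term.
priors = {"x": "Normal(0, 1)", "sigma": "HalfNormal(2)"}
print(extract_family_prior_sketch("gaussian", priors))
```

Entries that name model terms (here "x") are left for the term-level prior handling, and only the nuisance parameter prior is passed on to the response term.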