API Reference
This reference provides detailed documentation for all modules, classes, and methods in the current release of Bambi.
bambi.models
- class bambi.models.Model(formula=None, data=None, family='gaussian', priors=None, link=None, categorical=None, dropna=False, auto_scale=True, default_priors=None, noncentered=True, taylor=None)
Specification of model class.
- Parameters
formula (str) – A model description written in model formula language.
data (DataFrame or str) – The dataset to use. Either a pandas DataFrame, or the name of the file containing the data, which will be passed to pd.read_csv().
family (str or Family) – A specification of the model family (analogous to the family object in R). Either a string, or an instance of class priors.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are 'gaussian', 'bernoulli', 'poisson', 'gamma', 'wald', and 'negativebinomial'. Defaults to 'gaussian'.
priors (dict) – Optional specification of priors for one or more terms. A dictionary where the keys are the names of terms in the model, ‘common’ or ‘group_specific’, and the values are either instances of class Prior or int, float, or str that specify the width of the priors on a standardized scale.
link (str) – The model link function to use. Can be either a string (must be one of the options defined in the current backend; typically this will include at least 'identity', 'logit', 'inverse', and 'log'), or a callable that takes a 1D ndarray or theano tensor as the sole argument and returns one with the same shape.
categorical (str or list) – The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in the DataFrame will be used to infer handling. In cases where numeric columns are to be treated as categoricals (e.g., group specific factors coded as numerical IDs), explicitly passing variable names via this argument is recommended.
dropna (bool) – When True, rows with any missing values in either the predictors or outcome are automatically dropped from the dataset in a listwise manner.
auto_scale (bool) – If True (default), priors are automatically rescaled to the data (to be weakly informative) any time default priors are used. Note that any priors explicitly set by the user will always take precedence over default priors.
default_priors (dict or str) – An optional specification of the default priors to use for all model terms. Either a dictionary containing named distributions, families, and terms (see the documentation in priors.PriorFactory for details), or the name of a JSON file containing the same information.
noncentered (bool) – If True (default), uses a non-centered parameterization for normal hyperpriors on grouped parameters. If False, naive (centered) parameterization is used.
taylor (int) – Order of Taylor expansion to use in approximate variance when constructing the default priors. Should be between 1 and 13. Lower values are less accurate, tending to undershoot the correct prior width, but are faster to compute and more stable. Odd-numbered values tend to work better. Defaults to 5 for Normal models and 1 for non-Normal models. Values higher than the defaults are generally not recommended as they can be unstable.
- build(backend='pymc')
Set up the model for sampling/fitting.
Performs any steps that require access to all model terms (e.g., scaling priors on each term), then calls the backend’s build() method.
- Parameters
backend (str) – The name of the backend to use for model fitting. Currently only 'pymc' is supported.
- property common_terms
Return a dict containing only the common effects in the model.
- fit(omit_offsets=True, backend='pymc', **kwargs)
Fit the model using the specified backend.
- Parameters
omit_offsets (bool) – Omits offset terms in the InferenceData object when the model includes group specific effects. Defaults to True.
backend (str) – The name of the backend to use. Currently only the 'pymc' backend is supported.
- graph(formatting='plain', name=None, figsize=None, dpi=300, fmt='png')
Produce a graphviz Digraph from a Bambi model.
- Requires graphviz, which may be installed most easily with
conda install -c conda-forge python-graphviz
Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the Python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.
- Parameters
formatting (str) – One of 'plain' or 'plain_with_params'. Defaults to 'plain'.
name (str) – Name of the figure to save. Defaults to None, in which case no figure is saved.
figsize (tuple) – Maximum width and height of figure in inches. Defaults to None, in which case the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None.
dpi (int) – Dots per inch of the figure to save. Defaults to 300. Only works if name is not None.
fmt (str) – Format of the figure to save. Defaults to 'png'. Only works if name is not None.
- property group_specific_terms
Return a dict containing only the group specific effects in the model.
- plot_priors(draws=5000, var_names=None, random_seed=None, figsize=None, textsize=None, hdi_prob=None, round_to=2, point_estimate='mean', kind='kde', bins=None, omit_offsets=True, omit_group_specific=True, ax=None)
Samples from the prior distribution and plots its marginals.
- Parameters
draws (int) – Number of draws to sample from the prior predictive distribution. Defaults to 5000.
var_names (str or list) – A list of names of variables for which to sample and plot the prior distribution. Defaults to both observed and unobserved RVs.
random_seed (int) – Seed for the random number generator.
figsize (tuple) – Figure size. If None, it will be defined automatically.
textsize (float) – Text size scaling factor for labels, titles and lines. If None, it will be autoscaled based on figsize.
hdi_prob (float) – Plots highest density interval for chosen percentage of density. Use 'hide' to hide the highest density interval. Defaults to 0.94.
round_to (int) – Controls formatting of floats. Defaults to 2 or the integer part, whichever is bigger.
point_estimate (str) – Plot point estimate per variable. Values should be 'mean', 'median', 'mode' or None. Defaults to 'mean'.
kind (str) – Type of plot to display ('kde' or 'hist'). For discrete variables this argument is ignored and a histogram is always used.
bins (integer or sequence or 'auto') – Controls the number of bins; accepts the same keywords matplotlib.hist() does. Only works if kind == 'hist'. If None (default), it will use auto for continuous variables and range(xmin, xmax + 1) for discrete variables.
omit_offsets (bool) – Whether to omit offset terms in the plot. Defaults to True.
omit_group_specific (bool) – Whether to omit group specific effects in the plot. Defaults to True.
ax (numpy array-like of matplotlib axes or bokeh figures) – A 2D array of locations into which to plot the densities. If not supplied, ArviZ will create its own array of plot areas (and return it).
**kwargs – Passed as-is to plt.hist() or plt.plot() function depending on the value of kind.
- Returns
axes
- Return type
matplotlib axes or bokeh figures
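To make the hdi_prob parameter more concrete, here is a minimal sketch of how a highest density interval can be computed from a list of draws: it is the narrowest interval that spans roughly the requested share of the sorted samples. This is a simplified illustration written for this reference, not the ArviZ implementation, and the function name hdi is an assumption.

```python
import math


def hdi(samples, prob=0.94):
    """Return the narrowest interval spanning about `prob` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = int(math.floor(prob * n))  # index distance between interval endpoints
    n_intervals = n - k            # number of candidate intervals to compare
    if n_intervals < 1:
        raise ValueError("prob is too large for this number of samples")
    widths = [xs[i + k] - xs[i] for i in range(n_intervals)]
    best = min(range(n_intervals), key=widths.__getitem__)
    return xs[best], xs[best + k]


samples = [0, 10, 11, 12, 13, 14, 15, 30, 40, 50]
low, high = hdi(samples, prob=0.5)
print(low, high)
```

With these ten values, the tightest window covering half of the sorted draws runs from 10 to 15, because the values cluster there.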
- posterior_predictive(idata, draws=500, var_names=None, inplace=True, random_seed=None)
Generate samples from the posterior predictive distribution.
- Parameters
idata (InferenceData) – InferenceData with samples from the posterior distribution.
draws (int) – Number of draws to sample from the posterior predictive distribution. Defaults to 500.
var_names (str or list) – A list of names of variables for which to compute the posterior predictive distribution. Defaults to both observed and unobserved RVs.
inplace (bool) – If True, it will add a posterior_predictive group to idata; otherwise it will return a copy of idata with the added group. If True and idata already has a posterior_predictive group, it will be overwritten.
random_seed (int) – Seed for the random number generator.
- Returns
When inplace=True, adds a posterior_predictive group to idata and returns None. Otherwise, returns a copy of idata with a posterior_predictive group.
- Return type
None or InferenceData
- prior_predictive(draws=500, var_names=None, omit_offsets=True, random_seed=None)
Generate samples from the prior predictive distribution.
- Parameters
draws (int) – Number of draws to sample from the prior predictive distribution. Defaults to 500.
var_names (str or list) – A list of names of variables for which to compute the prior predictive distribution. Defaults to both observed and unobserved RVs.
random_seed (int) – Seed for the random number generator.
- Returns
InferenceData object with the groups prior, prior_predictive and observed_data.
- Return type
InferenceData
- set_priors(priors=None, common=None, group_specific=None, match_derived_names=True)
Set priors for one or more existing terms.
- Parameters
priors (dict) – Dictionary of priors to update. Keys are names of terms to update; values are the new priors (either a Prior instance, or an int or float that scales the default priors). Note that a tuple can be passed as the key, in which case the same prior will be applied to all terms named in the tuple.
common (Prior, int, float or str) – A prior specification to apply to all common terms included in the model.
group_specific (Prior, int, float or str) – A prior specification to apply to all group specific terms included in the model.
match_derived_names (bool) – If True, the specified prior(s) will be applied not only to terms that match the keyword exactly, but to the levels of group specific effects that were derived from the original specification with the passed name. For example, priors={'condition|subject':0.5} would apply the prior to the terms with names '1|subject', 'condition[T.1]|subject', and so on. If False, an exact match is required for the prior to be applied.
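The two key-handling behaviours described for set_priors (tuple keys and match_derived_names) can be illustrated with a small stand-alone sketch. The helper names and the matching rule below are assumptions chosen to reproduce the 'condition|subject' example; they are not Bambi's internal logic.

```python
import re


def expand_prior_keys(priors):
    """Flatten tuple keys so each named term maps to its own prior."""
    flat = {}
    for key, prior in priors.items():
        names = key if isinstance(key, tuple) else (key,)
        for name in names:
            flat[name] = prior
    return flat


def derived_matches(name, term_names):
    """Return terms derived from a 'predictor|group' name (illustrative rule)."""
    if "|" not in name:
        return [t for t in term_names if t == name]
    predictor, group = name.split("|", 1)
    matches = []
    for term in term_names:
        if "|" not in term:
            continue
        term_pred, term_group = term.split("|", 1)
        base = re.sub(r"\[.*?\]", "", term_pred)  # drop level suffix such as [T.1]
        if term_group == group and (term_pred == "1" or base == predictor):
            matches.append(term)
    return matches


terms = ["Intercept", "condition", "1|subject", "condition[T.1]|subject", "1|item"]
matched = derived_matches("condition|subject", terms)
flat = expand_prior_keys({("condition", "age"): 0.5, "1|item": 1.0})
print(matched, flat)
```

In this sketch the name 'condition|subject' selects the group intercept '1|subject' and the level-specific term 'condition[T.1]|subject', while '1|item' is left alone because it belongs to a different grouping factor.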
- property term_names
Return names of all terms in order of addition to model.
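The Model methods documented above can be combined into a short workflow. The sketch below is hedged: it assumes Bambi and its PyMC backend are installed, that fit() returns an InferenceData object, and that draws and tune are accepted as sampler keyword arguments. The formula, the column names y, x and group, and the helper name run_workflow are hypothetical.

```python
FORMULA = "y ~ x + (1|group)"            # hypothetical column names
FIT_KWARGS = {"draws": 500, "tune": 500}  # assumed to be forwarded to the sampler


def run_workflow(data):
    """Fit a model and draw predictive samples; needs Bambi and its PyMC backend."""
    # Imported lazily so that this sketch can be loaded without Bambi installed.
    from bambi.models import Model

    model = Model(FORMULA, data, family="gaussian", dropna=True)
    model.build(backend="pymc")  # set up the model before sampling from the prior
    prior_idata = model.prior_predictive(draws=200, random_seed=1)
    idata = model.fit(**FIT_KWARGS)  # assumption: returns an InferenceData object
    # inplace=False returns a copy with a posterior_predictive group added.
    with_pp = model.posterior_predictive(idata, draws=200, inplace=False)
    return model.term_names, prior_idata, with_pp
```

Here data may be a pandas DataFrame or the name of a CSV file, as described for the data parameter of Model.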
bambi.priors
- class bambi.priors.Family(name, prior, link, parent)
A specification of model family.
- Parameters
name (str) – Family name.
prior (Prior) – A Prior instance specifying the model likelihood prior.
link (str) – The name of the link function transforming the linear model prediction to a parameter of the likelihood.
parent (str) – The name of the prior parameter to set to the link-transformed predicted outcome (e.g., 'mu', 'p', etc.).
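The relationship between link and parent can be sketched as follows: the linear predictor is passed through the inverse of the link function, and the result becomes the value of the parent parameter (for example 'mu' or 'p'). The definitions below are the conventional ones for the four link names listed in this reference; the dictionary and function names are assumptions, and the code is not taken from Bambi's backend.

```python
import math

# Conventional inverse link functions, keyed by link name.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "log": math.exp,
    "inverse": lambda eta: 1.0 / eta,
}


def parent_value(link, eta):
    """Map a linear predictor value to the family's parent parameter."""
    return INVERSE_LINKS[link](eta)


p = parent_value("logit", 0.0)  # probability for a Bernoulli-style family
mu = parent_value("log", 0.0)   # mean for a Poisson-style family
print(p, mu)
```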
- class bambi.priors.Prior(name, scale=None, **kwargs)
Abstract specification of a term prior.
- Parameters
name (str) – Name of prior distribution (e.g., 'Normal', 'Bernoulli', etc.).
kwargs (dict) – Optional keywords specifying the parameters of the named distribution.
- class bambi.priors.PriorFactory(defaults=None, dists=None, terms=None, families=None)
An object that supports specification and easy retrieval of default priors.
- Parameters
defaults (str or dict) – Optional base configuration containing default priors for distributions, families, and term types. If a string, the name of a JSON file containing the config. If a dict, must contain keys for 'dists', 'terms', and 'families'; see the built-in JSON configuration for an example. If None, a built-in set of priors will be used as defaults.
dists (dict) – Optional specification of named distributions to use as priors. Each key gives the name of a newly defined distribution; values are two-element lists, where the first element is the name of the built-in distribution to use ('Normal', 'Cauchy', etc.), and the second element is a dictionary of parameters on that distribution (e.g., {'mu': 0, 'sigma': 10}). Priors can be nested to arbitrary depths by replacing any parameter with another prior specification.
terms (dict) – Optional specification of default priors for different model term types. Valid keys are 'intercept', 'common', or 'group_specific'. Values are either strings prepended by a #, in which case they are interpreted as pointers to distributions named in the dists dictionary, or key -> value specifications in the same format as elements in the dists dictionary.
families (dict) – Optional specification of default priors for named family objects. Keys are family names, and values are dicts containing mandatory keys for 'dist', 'link', and 'parent'.
Examples
>>> from bambi.priors import PriorFactory
>>> dists = {'my_dist': ['Normal', {'mu': 10, 'sigma': 1000}]}
>>> pf = PriorFactory(dists=dists)
>>> families = {'normalish': {'dist': ['normal', {'sigma': '#my_dist'}],
...                           'link': 'identity', 'parent': 'mu'}}
>>> pf = PriorFactory(dists=dists, families=families)
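The '#' pointer convention and the nesting of priors described above can be illustrated with a small resolver. This is a sketch of the idea only: the function name resolve and its recursion are assumptions, not PriorFactory's actual code, and it does not guard against pointers that refer to each other in a cycle.

```python
def resolve(spec, dists):
    """Recursively replace '#name' pointers with the named entry from dists."""
    if isinstance(spec, str) and spec.startswith("#"):
        return resolve(dists[spec[1:]], dists)
    if isinstance(spec, list) and len(spec) == 2 and isinstance(spec[1], dict):
        name, params = spec
        # Parameters may themselves be pointers or nested prior specifications.
        return [name, {k: resolve(v, dists) for k, v in params.items()}]
    return spec


dists = {"my_dist": ["Normal", {"mu": 10, "sigma": 1000}]}
families = {"normalish": {"dist": ["normal", {"sigma": "#my_dist"}],
                          "link": "identity", "parent": "mu"}}
resolved = resolve(families["normalish"]["dist"], dists)
print(resolved)
```

After resolution, the sigma parameter of the 'normalish' family holds the full 'my_dist' specification instead of the '#my_dist' pointer.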
- get(dist=None, term=None, family=None)
Retrieve default prior for a named distribution, term type, or family.
- Parameters
dist (str) – Name of desired distribution. Note that the name is the key in the defaults dictionary, not the name of the Distribution object used to construct the prior.
term (str) – The type of term family to retrieve defaults for. Must be one of 'intercept', 'common', or 'group_specific'.
family (str) – The name of the Family to retrieve. Must be a value defined internally. In the default config, this is one of 'gaussian', 'bernoulli', 'poisson', 'gamma', 'wald', or 'negativebinomial'.