# Getting Started¶

## Prior Transforms¶

The **prior transform** function is used to implicitly specify the Bayesian
prior \(\pi(\boldsymbol{\Theta})\) for Nested Sampling. It functions as
a transformation from a space where variables are i.i.d. within the
\(D\)-dimensional unit cube (i.e. uniformly distributed from 0 to 1)
to the parameter space of interest. For independent
parameters, this would be the product of the inverse cumulative distribution
function (CDF) (also
known as the “percent point function” or “quantile function”) associated with
each parameter.

It is crucial to note that increasing the size of the prior *directly*
impacts the amount of time needed to integrate over the posterior.
We highlight some examples of prior transforms below.

### Example: Uniform Priors¶

Suppose we want our prior to be Uniform from [-10, 10) for all variables:

The prior transform for this distribution would be:

```
def prior_transform(u):
"""Transforms the uniform random variable `u ~ Unif[0., 1.)`
to the parameter of interest `x ~ Unif[-10., 10.)`."""
x = 2. * u - 1. # scale and shift to [-1., 1.)
x *= 10. # scale to [-10., 10.)
return x
```

### Example: Non-uniform priors¶

Suppose we instead have a more complicated prior in 5 variables.
The first 2 are drawn from a
bivariate Normal distribution,
the third is drawn from a
Beta distribution,
the fourth from a
Gamma distribution,
and the fifth from a truncated normal distribution.
To handle more complicated functions like these, we can use the built-in
functions
in `scipy.stats`

, which include a **percent point function (ppf)** that
is analagous to our prior transform. Using those, our above examples
would look like:

```
def prior_transform(u):
"""Transforms the uniform random variables `u ~ Unif[0., 1.)`
to the parameters of interest."""
x = np.array(u) # copy u
# Bivariate Normal
t = scipy.stats.norm.ppf(u[0:2]) # convert to standard normal
Csqrt = np.array([[2., 1.],
[1., 2.]]) # C^1/2 for C=((5, 4), (4, 5))
x[0:2] = np.dot(Csqrt, t) # correlate with appropriate covariance
mu = np.array([5., 2.]) # mean
x[0:2] += mu # add mean
# Beta
a, b = 2.31, 0.627 # shape parameters
x[2] = scipy.stats.beta.ppf(u[2], a, b)
# Gamma
alpha = 5. # shape parameter
x[3] = scipy.stats.gamma.ppf(u[3], alpha)
# Truncated Normal
m, s = 5, 2 # mean and standard deviation
low, high = 2., 10. # lower and upper bounds
low_n, high_n = (low - m) / s, (high - m) / s # standardize
x[4] = scipy.stats.truncnorm.ppf(u[4], low_n, high_n, loc=m, scale=s)
return x
```

### Example: Conditional priors¶

This procedure can be generalized to construct priors that only can be
expressed in conditional form. As an example, let’s assume we have
a three-parameter model where the prior for the third parameter depends
on the values for the first two. This might be the case in, e.g., a
hierarchical
model where the prior over `c`

is a Normal distribution whose mean
`m`

and standard deviation `s`

are determined by a corressponding
“hyper-prior”. We can easily set up a prior transform for this model
by just going through the variables in order. This would look like:

```
def prior_transform(u):
"""Transforms the uniform random variables `u ~ Unif[0., 1.)`
to the parameters of interest."""
x = np.array(u) # copy u
# Mean hyper-prior
mu, sigma = 5., 1. # mean, standard deviation
x[0] = scipy.stats.norm.ppf(u[0], loc=mu, scale=sigma)
# Standard deviation hyper-prior
x[1] = 10. ** (u[1] * 2. - 1.) # log10(std) ~ Uniform[-1, 1]
# Prior
x[2] = scipy.stats.norm.ppf(u[2], loc=x[0], scale=x[1])
return x
```

More complicated dependencies can be constructed using similar approaches.

## Nested Sampling with dynesty¶

To give a concrete example of running `dynesty`

on a real problem,
let’s return to the simple 3-D multivariate normal
likelihood and uniform prior from [-10, 10) used in Crash Course to
define the `loglikelihood()`

and `prior_transform()`

functions:

```
import numpy as np
# Define the dimensionality of our problem.
ndim = 3
# Define our 3-D correlated multivariate normal log-likelihood.
C = np.identity(ndim)
C[C==0] = 0.95
Cinv = linalg.inv(C)
lnorm = -0.5 * (np.log(2 * np.pi) * ndim +
np.log(np.linalg.det(C)))
def loglike(x):
return -0.5 * np.dot(x, np.dot(Cinv, x)) + lnorm
# Define our uniform prior via the prior transform.
def ptform(u):
return 20. * u - 10.
```

### Initialization¶

Nested Sampling in `dynesty`

is done via a particular `sampler`

object that is initialized from the Top-Level Interface. To start,
let’s use `NestedSampler()`

to initialize a particular
sampler from `nestedsamplers`

. There are only 3 required arguments:
a log-likelihood function (`loglike`

), a prior transform function (`ptform`

),
and the number of dimensions taken by the loglikelihood (`ndim`

).

Using the functions above, we can initialize our sampler using:

```
from dynesty import NestedSampler
# initialize our nested sampler
sampler = NestedSampler(loglike, ptform, ndim)
```

See Top-Level Interface for more details on the API, Examples for more examples of usage, and FAQ for some additional advice. Here we’ll go over just the basics.

### Live Points¶

Similar to ensemble sampling methods such as emcee, the behavior of Nested Sampling can also be sensitive to the number of live points used. Increasing the number of live points leads to smaller changes in the prior volume \(\ln X\) over time. This improves the effective resolution while simultaneously increasing the runtime.

In addition, the number of live points can also affect the stability of our
Bounding Options. By default, `dynesty`

inflates the size of the
chosen bounds by an enlargement factor to ensure they effectively bound the
iso-likelihood contours. These bounds become more robust the more live points
are used, leading to more efficient proposals.

It is important to note that running with too few live points can lead to mode “die off”. When there are multiple modes with live points distributed between them, live points can randomly “jump” between them at any given iteration. If there are only a handful of live points at a particular mode, it is possible that, entirely by chance, all of them could transfer completely to the other mode even as both remain equally likely, leading it to “die off” and likely never be located again. As a rule-of-thumb, you should allocate around 50 live points per possible mode to guard against this.

The number of live points can be specified upon initialization via the
`nlive`

argument. For example, if we want to run with 1000 live points rather
than the default 250, we would use:

```
NestedSampler(loglike, ptform, ndim, nlive=1500)
```

### Bounding Options¶

`dynesty`

supports a number of options for bounding the target distribution:

**no bound**(`'none'`

), i.e. sampling from the entire unit cube,a

**single bounding ellipsoid**(`'single'`

),**multiple**(possibly overlapping)**bounding ellipsoids**(`'multi'`

),**overlapping balls**centered on each live point (`'balls'`

), and**overlapping cubes**centered on each live point (`'cubes'`

).

By default, `dynesty`

uses multi-ellipsoidal decomposition (`'multi'`

),
which often is flexible enough to capture the complexity of many likelihood
distributions while simple enough to quickly and efficiently generate new
samples. For more complex distributions, overlapping balls (`'balls'`

)
or cubes (`'cubes'`

) can generate more flexible bounding distributions but
come with significantly more overhead that can be less efficient at generating
samples. For simpler distributions, a single ellipsoid (`'single'`

) is often
sufficient. Sampling directly from the unit cube (`'none'`

) is extremely
inefficient but is a useful option to verify your results and
look for possible biases. It otherwise should only be used if the
log-likelihood is trivial to compute.

Specifying the particular bounding distribution can be done upon initialization
via the `bound`

argument. If we wanted to sample using overlapping balls rather
than multiple bounding ellipsoids, for instance, we would use:

```
NestedSampler(loglike, ptform, ndim, nlive=1500, bound='balls')
```

As mentioned in Live Points, bounding distributions in `dynesty`

are
enlarged in an attempt to conservatively encompass the iso-likelihood contour
associated with each dead point. The default behavior increases the
volume by 25%, although this can also be done in real-time using
bootstrapping methods (this procedure can lead to some instability in the size
of the bounds if fewer than the optimal number of live points are being used;
see the FAQ for additional details).
The volume enlargement factor or the number of
bootstrap realizations used can be specified using the `enlarge`

and `bootstrap`

arguments.

Additional information on the bounding objects can be found under Bounding and in Examples.

To avoid excessive overhead spent constructing bounding
distributions, `dynesty`

only updates bounding distributions after a fixed
number of likelihood calls specified by the `update_interval`

argument. Larger
values generally decrease the sampling efficiency but can improve overall
performance. This value by default is set to different values for different
sampling methods (see the API for additional details), but if
we wanted to instead use a particular value we could just specify that via:

```
NestedSampler(loglike, ptform, ndim, nlive=1500, bound='balls',
bootstrap=50, update_interval=1.2)
```

Passing a float like `1.2`

sets the update interval to be after
`round(1.2 * nlive)`

functional calls so that it scales based on the
number of live points (and thus the speed at which we expect to traverse
the prior volume). If we’d like to specific the number of function calls
directly, however, we can instead pass an integer:

```
NestedSampler(loglike, ptform, ndim, nlive=1500, bound='balls',
bootstrap=50, update_interval=600)
```

This now specifies that we will update our bounds after `600`

function
calls.

See Top-Level Interface for more information.

### Sampling Options¶

`dynesty`

also supports several different sampling methods *conditioned on*
the provided bounds which can be passed via the `sample`

argument:

**uniform**sampling (`'unif'`

),**random walks**away from a current live point (`'rwalk'`

),**multivariate slice sampling**away from a current live point (`'slice'`

),**random slice sampling**away from a current live point (`'rslice'`

), and**“Hamiltonian” slice sampling**away from a current live point (`'hslice'`

).

In addition, `dynesty`

also supports passing **custom callable functions**
to the `sample`

argument, provided they follow the same format as the
default sampling functions defined here.
These can also be accompanied by custom “update functions” that try to
adaptively scale proposals to ensure better overall sampling efficiency.
See here
for examples of some of the functions that are associated with the default
sampling methods described above.

By default, `dynesty`

automatically picks a sampling method
based on the dimensionality of the problem via the `'auto'`

argument, which
uses the following logic:

If \(D < 10\),

`'unif'`

is chosen since uniform proposals can be quite efficient in low dimensions.If \(10 \leq D \geq 20\),

`'rwalk'`

is chosen since random walks are more robust to underestimated bounding distributions in higher dimensions,If \(D > 20\) and a gradient is not provided,

`'rslice'`

is chosen since non-rejection sampling methods scale in polynomial (rather than exponential) time as the dimensionality increases.If \(D > 20\) and a gradient

*is*provided,`'hslice'`

is chosen to take advantage of Hamiltonian dynamics, which scale better than`'rslice'`

as the dimensionality increases.

Note that `'hslice'`

, while using gradients, is substantially less efficient
(and in general less reliable) than other gradient-based approaches such
as Hamiltonian Monte Carlo. As such, **use them at your own risk.**

One benefit to using random walks or slice sampling is that they require many
fewer live points to adapt to structure in higher dimensions (since they only
sample *conditioned* on the bounds, rather than **within** them).
They also do not require any sort of bootstrap-style corrections
since they contain built-in methods to tune their step sizes. This, however,
does not mean that they are immune to issues that arise when running with
fewer live points such as mode “die-off” (see Live Points).

Following the example above, let’s say we wanted to combine the flexibility of multiple bounding ellipsoids and slice sampling. This might look something like:

```
NestedSampler(loglike, ptform, ndim, bound='multi', sample='slice')
```

See Top-Level Interface for additional information.

### Early-Time Behavior¶

`dynesty`

tries to avoid constructing bounding distributions
early in the run to avoid issues where the bounds can significantly exceed the
unit cube. For instance, in most cases the bounding distribution
of the initial set of points *by construction* will exceed
the bounds of the unit cube when `enlarge > 1`

. This can lead to a
variety of problems associated with each method, especially in higher
dimensions (since volume scales as \(\propto r^D\)).

To avoid this behavior, `dynesty`

deliberately delays the first bounding
update until at least `2 * nlive`

function calls have been made *and* the
efficiency has fallen to 10%. This generally assumes that the overall
efficiency will be below 10%, which is the case for almost all sampling
methods (see below). If we wanted to adjust this behavior so
that we construct our first bounding distributions much earlier,
we could do so by passing some parameters using the `first_update`

argument:

```
NestedSampler(loglike, ptform, ndim, nlive=1500, bound='balls',
first_update={'min_ncall': 100, 'min_eff': 50.})
```

This will now trigger an update when 100 log-likelihood function calls have been made and the effiency drops below 50%.

### Special Boundary Conditions¶

By default, `dynesty`

assumes that all parameters have hard bounds.
In other words, if for some reason you propose outside of the unit cube
that defines the prior (see Prior Transforms), that point is
automatically rejected and a new one is proposed instead. This is
the desired behavior for most problems, since individual parameters are often
either defined everywhere (i.e. from negative infinity to infinity)
or over a finite range (e.g., from \(10\) to \(25\)).

Specific problems, however, may have parameters that behave differently.
In particular, `dynesty`

supports both **reflective** and **periodic**
boundary conditions. The former can arise when parameters are ratios (where
\(1/2\) and \(2/1\) may be equivalent) or angles (since 90 degrees and
450 degrees are often equivalent). Imposing these specific boundary conditions
on relevant parameters can help improve the overall sampling efficiency,
especially when solutions end up near the bounds (e.g., at \(0\) or
\(2\pi\) for phases). These can be enabled by just
specifying the indices of the relevant parameters, as shown below:

```
NestedSampler(loglike, ptform, ndim, nlive=1000, bound='cubes',
periodic=[0, 2], reflective=[1, 5])
```

### Parallelization¶

If you want to run computations in parallel, `dynesty`

can use a user-defined
`pool`

to execute a variety of internal operations in “parallel” rather than
in serial. This can be done by passing the `pool`

object to the sampler
upon initialization:

```
# initialize sampler with pool
sampler = NestedSampler(loglike, ptform, ndim, pool=pool)
```

By default, `dynesty`

tries to grab the size of the pool from the `pool.size`

attribute of the `pool`

. If this is not defined, the number of function
evaluations to execute in parallel can be set manually using the `queue_size`

argument:

```
# initialize sampler with pool with pre-defined queue
sampler = NestedSampler(loglike, ptform, ndim, pool=pool, queue_size=8)
```

Parallel operations in `dynesty`

are done by simply swapping in the
`pool.map`

function over the default `map`

function when making likelihood
calls. Note that this is a *synchronous* function call, which requires that
all members of the pool have completed their respective tasks before receiving
the pool’s output. The call time for functions is therefore limited
by the slowest-performing member of the pool.

The reason why “parallel” is written in quotes above is that while function evaluations can be made in parallel, live point proposals must be done serially in order to avoid breaking the statistical properties of Nested Sampling. Assuming we are using \(M\) processes with \(K\) live points, this leads to sub-linear scaling \(S\) of the form (Handley et al. 2015):

This scales pretty linearly as long as the number of processes is much smaller than the number of live points, but falls off as the pool becomes relatively larger.

Depending on where the bottleneck of the computation lies, the provided
`pool`

can be disabled during certain function evaluations (e.g., when
initializing points) using the `use_pool`

argument:

```
# initialize sampler with pool with pre-defined queue
sampler = NestedSampler(loglike, ptform, ndim,
nlive=2000, bound='single', sample='rwalk',
pool=pool, queue_size=16,
use_pool={'prior_transform': False})
```

See Pool Questions on the FAQ page for additional troubleshooting tips.

Note that, as discussed in Combining Runs, it is actually possible to
combine multiple independent Nested Sampling runs into a single run, giving
users an option as to whether they want to parallelize `dynesty`

*during*
runtime (using a user-provided `pool`

) or *after* runtime (by merging
the runs together).

### Running Internally¶

Sampling from our target distribution can be done using the
`run_nested()`

function in the provided
`sampler`

:

```
sampler.run_nested()
```

Sampling will continue until specified stopping criteria are reached, and
the current state of the sampler is by default output to `stderr`

in
real time. The stopping criteria can be any combination of:

a fixed number of iterations (

`maxiter`

),a fixed number of likelihood calls (

`maxcall`

),a maximum log-likelihood

`(logl_max`

),a specified \(\Delta \ln \hat{\mathcal{Z}}_i\) tolerance (

`dlogz`

), anda specified Effective Sample Size (ESS) (

`n_effective`

).

For instance, running one of the examples above would produce output like:

Out:

```
iter: 12521 | +1500 | bound: 7 | nc: 1 | ncall: 66884 | eff(%): 20.963 |
loglstar: -inf < -0.301 < inf | logz: -8.960 +/- 0.082 |
dlogz: 0.001 > 1.509
```

From left to right, this records: the current iteration (plus the number of
live points added after stopping), the current bound being used, the number
of log-likelihood calls made before accepting the last sample, the total number
of log-likelihood calls, the overall sampling efficiency,
the current log-likelihood bounds (`-inf`

and `inf`

because we began sampling from the prior and didn’t declare a `logl_max`

),
the current estimated evidence, and the remaining `dlogz`

relative
to the stopping criterion.

By default, the stopping criteria are optimized for evidence estimation, with posteriors treated as a nice byproduct. We can modify this by passing in something like:

```
sampler.run_nested(dlogz=0.5, maxiter=10000, maxcall=50000)
```

Since sampling is done through the `sampler`

objects, users can also continue
to add new samples based on where they left off. This is as easy as:

```
# initialize our sampler
sampler = NestedSampler(loglike, ptform, ndim, nlive=1000)
# start our run
sampler.run_nested(dlogz=0.5)
res1 = sampler.results
# (possibly) add more samples
sampler.run_nested(maxcall=10000)
res2 = sampler.results
# (possibly) add more samples again
sampler.run_nested(dlogz=0.01)
res3 = sampler.results
```

### Running Externally¶

Similar to emcee, `sampler`

objects in
`dynesty`

can also be run externally as a **generator** via the
`sample()`

function. This might look something
like:

```
# The main nested sampling loop.
for it, res in enumerate(sampler.sample(dlogz=0.5)):
pass
# Adding the final set of live points.
for it_final, res in enumerate(sampler.add_live_points()):
pass
```

as opposed to:

```
# The main nested sampling loop.
sampler.run_nested(dlogz=0.5, add_live=False)
# Adding the final set of live points.
sampler.add_final_live()
```

This can be extremely useful if you would like to manipulate the results in real-time, generate plots, save intermediate outputs, etc.

### Combining Runs¶

Nested sampling is “trivially parallelizable”, which makes it really
straightforward to combine the results from multiple independent runs.
`dynesty`

contains built-in utilities for combining results
from separate runs into a single run with improved posterior/evidence
estimates. This can be extremely useful if, for instance, you have performed
multiple independent analyses over the course of a project that you would
like to combine, or if you want to add additional samples to a
preliminary analysis (but don’t have the `sampler`

currently loaded in memory).

`dynesty`

makes this process relatively straightforward. An example is
shown below:

```
from dynesty import utils as dyfunc
# Create several independent nested sampling runs.
sampler = NestedSampler(loglike, ptform, ndim)
rlist = []
for i in range(10):
sampler.run_nested()
rlist.append(sampler.results)
sampler.reset()
# Merge into a single run.
results = dyfunc.merge_runs(rlist)
```

This process works with Dynamic Nested Sampling as well. See Unraveling/Merging Runs for additional details.

### Sampling with Gradients¶

As mentioned in Sampling Options,
`dynesty`

can utilize log-likelihood gradients \(\nabla \ln \mathcal{L}\)
by proposing new samples using Hamiltonian dynamics
(often referred to as **reflective slice sampling**). However, because
sampling in `dynesty`

occurs on the *unit cube* (\(\mathbf{u}\)) rather
than in the target space of our original variables (\(\mathbf{x}\)),
these gradients have to be defined with respect to \(\mathbf{u}\) rather
than \(\mathbf{x}\) even though they are evaluated at \(\mathbf{x}\).
This requires computing the Jacobian matrix
\(\mathbf{J}\) from \(\mathbf{x}\) to \(\mathbf{u}\).

While this Jacobian might seem difficult to derive, it can be shown that given independent priors on each parameter

where \(\pi_i(x_i)\) is the prior for the i-th parameter \(x_i\) that the Jacobian is diagonal where each diagonal element is simply

By default, `dynesty`

assumes that any gradient you pass in
**already has the appropriate Jacobian applied**. If not, you can tell
`dynesty`

to numerically estimate the Jacobian by setting
`compute_jac=True`

.

For the simple 3-D multivariate normal likelihood and uniform prior from [-10, 10) used in Crash Course, sampling with gradients would look something like:

```
import numpy as np
import dynesty
# Define the dimensionality of our problem.
ndim = 3
# Define our 3-D correlated multivariate normal log-likelihood.
C = np.identity(ndim)
C[C==0] = 0.95
Cinv = linalg.inv(C)
lnorm = -0.5 * (np.log(2 * np.pi) * ndim +
np.log(np.linalg.det(C)))
def loglike(x):
return -0.5 * np.dot(x, np.dot(Cinv, x)) + lnorm
# Define our uniform prior via the prior transform.
def ptform(u):
return 20. * u - 10.
# Define our gradient with and without the Jacobian applied.
def grad_x(x):
return -np.dot(Cinv, x) # without Jacobian
def grad_u(x):
return -np.dot(Cinv, x) * 20. # with Jacobian for uniform [-10, 10)
# Sample with `grad_u` (including Jacobian).
sampler = dynesty.NestedSampler(loglike, ptform, ndim, sample='hslice',
gradient=grad_u)
sampler.run_nested()
results_with_jac = sampler.results
# Sample with `grad_x` (compute Jacobian numerically).
sampler = dynesty.NestedSampler(loglike, ptform, ndim, sample='hslice',
gradient=grad_x, compute_jac=True)
sampler.run_nested()
results_without_jac = sampler.results
```

For other independent priors discussed in Prior Transforms,
we can use the built-in
functions
in `scipy.stats`

, which include a **probability density function (pdf)** that
is exactly our desired \(\pi_i(v_i)\) function. These then enable us to
compute and apply the (diagonal) Jacobian matrix directly.
In more complex cases such as the simple hierarchical model in
Example: Conditional priors, however, we may need to resort to
estimating the Jacobian numerically to deal with the expected
off-diagonal terms.

## Results¶

Sampling results can be accessed through the `results`

property and are returned as a (modified) dictionary:

```
results = sampler.results
```

We can print a quick summary of the run using
`summary()`

, which provides basic information
about the evidence estimates and overall sampling efficiency:

```
# Print out a summary of the results.
res1.summary()
res2.summary()
```

Out:

```
Summary
=======
nlive: 1000
niter: 6718
ncall: 39582
eff(%): 19.499
logz: -8.832 +/- 0.132
Summary
=======
nlive: 1000
niter: 13139
ncall: 49499
eff(%): 28.564
logz: -8.818 +/- 0.084
```

### Quick Rundown¶

While a number of quantities are contained in the `Results`

instance,
the relevant quantities for most users will be the collection
of samples from the run (`samples`

), their corresponding (unnormalized)
log-weights (`logwt`

), the cumulative log-evidence (`logz`

), and the
*approximate* error on the evidence (`logzerr`

). The remaining quantities are
used to help visualize the output (see Visualizing Results) and might
also be useful for more advanced users who want additional information about
the nested sampling run.

### Full Summary¶

As a dictionary, the full set of quantities provided in `Results`

can be
accessed using `keys()`

. A description of the full set of quantities
included in `Results`

are listed below:

`nlive`

: the number of live points used in the run,`niter`

: the number of iterations (samples),`ncall`

: the total number of function calls,`eff`

: the overall sampling efficiency,`samples`

: the set of samples in the*native parameter space*,`samples_u`

: the set of samples in the*unit cube*,`samples_id`

: the unique particle index associated with each sample,`samples_it`

: the iteration the sample was*originally*proposed,`logwt`

: the log-weight (unnormalized) associated with each sample,`logl`

: the log-likelihood associated with each sample,`logvol`

: the (expected) ln(prior volume) associated with each sample,`logz`

: the cumulative evidence at each iteration (sample),`logzerr`

: the estimated error (standard deviation) on`logz`

, and`information`

: the estimated “information” (see Priors in Nested Sampling) at each iteration (sample).

If the bounding distributions are also saved (the default behavior), then the following quantities are also provided:

`bound`

: a (deep) copy of the set of bounding objects,`bound_iter`

: the index of the bounding object active at a given iteration,`samples_bound`

: the index of the bounding object the sample was*originally proposed from*, and`scale`

: the scale-factor used at a given iteration (used to scale the bounds for non-uniform proposals).

Note that some of these quantities change when using Dynamic Nested Sampling.

## Visualizing Results¶

Assuming we’ve completed a run and stored the resulting `res1`

and `res2`

`Results`

dictionaries as defined above, we can compare what
their relative weights by comparing them directly, as shown below.

In the initial run (`res1`

), we see that the majority of the importance weight
\(\hat{p}_i\) is concentrated near the mode; in the extended run, however,
it is instead concentrated in a ring around the mode. This behavior represents
the fundamental compromise between the likelihood \(\mathcal{L}_i\) and the
change in prior volume \(\Delta X_i\). The stark difference in the
distribution of weights between the two samples is driven entirely by
differences in \(\Delta X_i\). In the extended run (`res2`

), the
distribution of weights directly follows the shape expected from the “typical
set” (see Typical Sets for additional discussion).

By contrast, since the final set of live points after \(N\) samples are
uniformly sampled within \(X_{i=N}\), the expected change in the prior volume
is *constant*. This leads to *linear* (rather than exponential) compression of
the remaining prior volume, where the weight assigned to the
live point with the \(k\)-th lowest likelihood is then
\(\propto f(\mathcal{L}_{N+k}) \, X_N\). In the case where there is a
significant portion of prior volume remaining (as with `res1`

), this leads to
extremely rapid traversal of the remaining prior volume and hence large
importance weights.

### dyplot¶

To avoid introducing an excessive burden on typical users, `dynesty`

comes
with a variety of built-in plotting utilities in the `plotting`

module. These include a variety of generic summary plots as well as ways of
visualizing bounding distributions throughout the course of a run. We can
import them using:

```
from dynesty import plotting as dyplot
```

The `dyplot`

alias will be used for convenient shorthand throughout the
remainded of the documentation. While some basic usage will be demonstrated
below, please see the API for additional details.

One important note is that **the default credible intervals in all plotting
utilities are defined to be 95% (2-sigma) rather than 68% (1-sigma)**.
This is a deliberate choice meant to highlight more realistic uncertainties
(1-in-3 vs 1-in-20 chances) and better capture possible secondary solutions
at the 2.5% level rather than the roughly 16% level.

### Summary Plots¶

One of the most direct ways of visualizing how Nested Sampling computes
the *evidence* is by examining the relationship between the prior volume
\(\ln X_i\) and:

the (effective) iteration \(i\), which illustrates how quickly/slowly our samples are compressing the prior volume,

the likelihood \(\mathcal{L}_i\), to see how smoothly we sample “up” the likelihood distribution to the maximum likelihood (ML) estimate,

the importance weight \(\hat{p}_i\), showcasing where the bulk of the

**posterior mass**is located (i.e. the typical set), andthe evidence \(\hat{\mathcal{Z}}_i\), to see where most of the contribution to the evidence (and its respective errors) are coming from.

A **summary (run) plot** showcasing these features can be generated using
`runplot()`

. As an example, a summary plot for `res2`

comparing it to the actual analytic \(\ln \mathcal{Z}\) evidence solution
can be generated using:

```
lnz_truth = ndim * -np.log(2 * 10.) # analytic evidence solution
fig, axes = dyplot.runplot(res2, lnz_truth=lnz_truth) # summary (run) plot
```

Up until we recycle our final set of live points (see Basic Algorithm), as indicated by the dashed lines, the relationship between \(\ln X_i\) and \(i\) is linear (i.e. prior volume compression is exponential). Afterwards, however, it stretches out, rapidly traversing the remaining prior volume in linear fashion. Comparing the general shape of the likelihood and importance weights subplots also highlight how the typical set is as much a function of \(\Delta X_i\) as \(\mathcal{L}_i\): although contributions initially increase as the likelihood increases, they quickly fall as the ML region encompasses increasingly smaller effective volumes.

### Trace Plots¶

Another common way to visualize the results of many sampling algorithms is to
generate a **trace plot** showing the evolution of particles (and their
marginal posterior distributions) in 1-D projections. This can be done using
the `traceplot()`

function, which plots a combination
of particle positions as a function of \(\ln X\) (colored by importance
weight) and the corresponding 1-D marginalized posterior:

```
fig, axes = dyplot.traceplot(res2, truths=np.zeros(ndim),
truth_color='black', show_titles=True,
trace_cmap='viridis', connect=True,
connect_highlight=range(5))
```

By default, `traceplot()`

returns the samples color-coded
by their relative posterior mass and the 1-D marginalized
posteriors smoothed by a Normal (Gaussian) kernel
with a standard deviation set to ~2% of the provided range
(which defaults to the 5-sigma bounds computed from the set of weighted
samples). It also can overplot input truth vectors as well as highlight
specific particle paths (shown above) to inspect the behavior of individual
particles. These can be useful to qualitatively identify problematic behavior
such as strongly correlated samples.

### Corner Plots¶

In addition to trace plots, another common way to visualize (weighted) samples
is using **corner plots** (also called “triangle plots”), which show a
combination of 1-D and 2-D marginalized posteriors. `dynesty`

supports
several corner plotting functions. The most straightforward is
`cornerpoints()`

, which simply plots the sample positions
colored according to their estimated posterior mass if `kde=True`

and
raw importance weights if `kde=False`

. An example highlighting the
difference between the two runs is shown below:

```
# initialize figure
fig, axes = plt.subplots(2, 5, figsize=(25, 10))
axes = axes.reshape((2, 5)) # reshape axes
# add white space
[a.set_frame_on(False) for a in axes[:, 2]]
[a.set_xticks([]) for a in axes[:, 2]]
[a.set_yticks([]) for a in axes[:, 2]]
# plot initial run (res1; left)
fg, ax = dyplot.cornerpoints(res1, cmap='plasma', truths=np.zeros(ndim),
kde=False, fig=(fig, axes[:, :2]))
# plot extended run (res2; right)
fg, ax = dyplot.cornerpoints(res2, cmap='viridis', truths=np.zeros(ndim),
kde=False, fig=(fig, axes[:, 3:]))
```

Just by looking at our projected samples, it is apparent that the results from
the extended run `res2`

does a much better job of localizing the overall
distribution compared to `res1`

. We can get a better qualitative and
quantitative handle on this by plotting the marginal 1-D and 2-D posterior
density estimates using `cornerplot()`

as:

```
# initialize figure
fig, axes = plt.subplots(3, 7, figsize=(35, 15))
axes = axes.reshape((3, 7)) # reshape axes
# add white space
[a.set_frame_on(False) for a in axes[:, 3]]
[a.set_xticks([]) for a in axes[:, 3]]
[a.set_yticks([]) for a in axes[:, 3]]
# plot initial run (res1; left)
fg, ax = dyplot.cornerplot(res1, color='blue', truths=np.zeros(ndim),
truth_color='black', show_titles=True,
max_n_ticks=3, quantiles=None,
fig=(fig, axes[:, :3]))
# plot extended run (res2; right)
fg, ax = dyplot.cornerplot(res2, color='dodgerblue', truths=np.zeros(ndim),
truth_color='black', show_titles=True,
quantiles=None, max_n_ticks=3,
fig=(fig, axes[:, 4:]))
```

Similar to `runplot()`

, the marginal distributions shown
are by default smoothed by 2% in the specified range using a Normal (Gaussian)
kernel. Notice that even though our original run `res1`

gave
similar evidence estimates to the extended run `res2`

, it gives somewhat
“noisier” estimates of the posterior.

### Bounding Distribution Plots¶

To visualize how we’re sampling in nested “shells”, we can look at the
evolution of our bounding distributions in a given 2-D projection over the
course of a run. The `boundplot()`

function allows us to
look at the bounding distributions from two different perspectives: the
bounding distribution used when proposing new live points at a specific
iteration (specified using `it`

), or the bounding distribution that a given
dead point originated from (specified using `idx`

). While
`boundplot()`

natively plots in the space of the unit
cube, if a specified `prior_transform()`

is passed all samples are instead
converted to the original (native) model space.

Using `boundplot()`

, we can examine the evolution of the
bounding distributions over a given run via:

```
# initialize figure
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
# plot 6 snapshots over the course of the run
for i, a in enumerate(axes.flatten()):
it = int((i+1)*res2.niter/8.)
# overplot the result onto each subplot
temp = dyplot.boundplot(res2, dims=(0, 1), it=it,
prior_transform=prior_transform,
max_n_ticks=3, show_live=True,
span=[(-10, 10), (-10, 10)],
fig=(fig, a))
a.set_title('Iteration {0}'.format(it), fontsize=26)
fig.tight_layout()
```

The figure illustrates that we first begin sampling directly from the unit
cube. After the conditions in `first_update`

are satisfied, we then switch over
to the default multi-ellipsoidal bounding distributions. We see that these are
able to adapt well to the target distribution over time, ensuring we continue
to sample efficiently. We can also see the impact of bootstrapping
on the bounding ellipsoids since they always remain slightly larger than the
set of live points. While it slightly decreases the overall sampling
efficiency, this shows how the procedure helps to ensure no likelihood is
“left out” during the course of the Nested Sampling run.

Alternately, we can generate a corner plot of the bounding distribution using
`cornerbound()`

via:

```
fig, axes = dyplot.cornerbound(res2, it=5000,
prior_transform=prior_transform,
show_live=True,
span=[(-10, 10), (-10, 10)])
```

## Basic Post-Processing¶

In addition to plotting, `dynesty`

also contains some post-processing
utilities in the `utils`

module. In many cases, a rough
approximation of the posterior using the first two moments (mean and
covariance) can be useful. These can be computed from the set of (weighted)
samples using the `mean_and_cov()`

function:

```
from dynesty import utils as dyfunc
samples, weights = res2.samples, np.exp(res2.logwt - res2.logz[-1])
mean, cov = dyfunc.mean_and_cov(samples, weights)
```

Runs can also be resampled to give a mew set of points with equal
weights, similar to MCMC methods, using the
`resample_equal()`

function:

```
new_samples = dyfunc.resample_equal(samples, weights)
```

See Nested Sampling Errors for some additional discussion and demonstration of more functions.