Bayesian deconvolution of stick lengths


Consider trying to determine the length of a straight stick. Instead of the measurement errors being clustered about zero, suppose the errors are known to be always positive, that is, no measurement ever underestimates the length of the stick. Such a stick is shown in the figure below.

[Figure: StickPlusError, a straight stick whose measured length always overestimates the true length]

Basically, what’s observed is y_{i} = L + x_{i}, with x_{i} being a random variable such that x_{i} > 0.

This problem is actually equivalent to a class of others. For example, the figure below shows one. In this case there are actually M straight sticks, each of which has a length L_{j} and an “error in measurement” x_{i,j}. But what’s observed is a single length, y_{i}, as

y_{i} = \sum_{j=1}^{M} (L_{j} + x_{i,j}) = \sum_{j=1}^{M} L_{j} + \sum_{j=1}^{M} x_{i,j}

Here the overall length L = \sum_{j=1}^{M} L_{j} and what was called the added random variable before is x_{i} =  \sum_{j=1}^{M} x_{i,j}.

[Figure: ArticulatedStickWithErrors, an articulated stick of M segments, each measured with its own positive error]

How would this be done? Well, L is constant, so if it were known how the measurement errors were distributed, obtaining L would be straightforward.

Suppose, for instance, that the x_{i} are distributed as a Poisson. Then, because the mean of a Poisson is equal to its variance, and because, for any random variable, X, and constant, c,

\text{VAR}[c + X] = \text{VAR}[X]

estimating L amounts to:

  1. calculating the mean of y_{i}
  2. calculating the variance of y_{i}
  3. taking their difference, or E[y_{i}] - \text{VAR}[y_{i}]
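
As a quick illustration, here is a minimal R sketch of that recipe on simulated data; the stick length of 13 and the Poisson mean of 2 are illustrative choices, not anything prescribed above.

set.seed(1)
L <- 13                    # true stick length (illustrative)
n <- 20                    # number of measurements
x <- rpois(n, lambda = 2)  # non-negative errors with mean = variance = 2
y <- L + x                 # observed lengths, never underestimates

# Since VAR[L + x] = VAR[x] = E[x] for Poisson errors,
# E[y] - VAR[y] estimates L.
L.hat <- mean(y) - var(y)
L.hat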

In some applications, the “randomness” of the x_{i} may be a descriptive approximation. For example, in percolation networks, the path length of a water particle through a closely packed, already wet section of sand of differing grain sizes can be thought of as the result of a series of turns made around impeding grains. The net effect is that these path lengths look like random errors of positive size, even though in each instance they are specific and unchanging, just not knowable. Such a path is shown in the figure below.

[Figure: percolation, the path of a water particle winding around grains in a packed section of sand]

These are interesting mathematical problems, with applications in hydrology, geophysics, epidemiology, and sociology.

What happens if the errors are not Poisson? Different models for the errors could be used, or even a mixture of models, but, in general, such errors could be multi-modal, as in the figure below.

[Figure: BimodalAntsSmaller, an example of a bimodal distribution]
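
For instance, a bimodal, strictly positive error can be mocked up in R as a two-component Gamma mixture; the components and the mixing weight below are purely illustrative.

set.seed(2)
n <- 1000
pick <- rbinom(n, size = 1, prob = 0.4)           # choose a component per draw
x <- ifelse(pick == 1,
            rgamma(n, shape = 40, scale = 0.1),   # component with mean 4
            rgamma(n, shape = 10, scale = 0.1))   # component with mean 1
hist(x, breaks = 40, main = "Illustrative bimodal positive errors")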

In that case, a non-parametric model is desirable. One can be built in a Bayesian scheme using a stick-breaking prior. The length of the stick, L, cannot be larger than the minimum length observed, \min_{i} y_{i}. So, as a candidate for L, pick a value, \lambda_{1}, drawn uniformly on the interval from zero to \min_{i} y_{i}. Then, as a candidate for the mean of the measurement noise, pick a value, \lambda_{2}, drawn uniformly on the interval from zero to \max_{i} y_{i} - \lambda_{1}. Finally, model the observed variance of the y_{i} as a Gamma random variable with some reasonable shape. This is a Bayesian hierarchical model.

To get specific,

\lambda_{1} \sim \mathcal{U}(0, \min_{i} y_{i})
\lambda_{2} \sim \mathcal{U}(0, \max_{i} y_{i} - \lambda_{1})
\text{VAR}[y_{i}] \sim \mathsf{Ga}(\kappa, \frac{\lambda_{2}}{\kappa})
\kappa \sim \mathcal{U}(2, 100)

Here the Gamma density \mathsf{Ga}(\kappa, \frac{\lambda_{2}}{\kappa}) has a shape of \kappa and a scale of \frac{\lambda_{2}}{\kappa}, so its mean is \lambda_{2}. Sometimes, as in JAGS and BUGS, Gamma densities are parameterized in terms of a rate instead of a scale; the rate is the reciprocal of the scale, or, here, \frac{\kappa}{\lambda_{2}}. This can be seen as a set-up for a stochastic search.
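
A quick R check of the two parameterizations (the numbers are arbitrary): with shape \kappa and scale \frac{\lambda_{2}}{\kappa} the mean works out to \lambda_{2}, and specifying the equivalent rate \frac{\kappa}{\lambda_{2}} gives exactly the same density.

kappa   <- 10
lambda2 <- 2

kappa * (lambda2 / kappa)   # mean = shape * scale = lambda2 = 2

x <- seq(0.1, 6, by = 0.1)
all.equal(dgamma(x, shape = kappa, scale = lambda2 / kappa),
          dgamma(x, shape = kappa, rate  = kappa / lambda2))   # TRUE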

How well does it work?

Well, first, here is the model code, in BUGS syntax as run under JAGS:

var m, n, y[n,m], mu[2,m],
    ymin[m], ymax[m], yvar[m],
    kappa;
model
{
  # PRIORS
  kappa ~ dunif(2, 100)
  for (j in 1:m)
  {
    # mu[1,j] is the estimate of the stick length, bounded by the observed minimum
    # mu[2,j] is the mean of the added noise
    mu[1,j] ~ dunif(0, ymin[j])
    mu[2,j] ~ dunif(0, (ymax[j] - mu[1,j]))
    yvar[j] ~ dgamma( kappa, kappa/mu[2,j] )
  }
  # MODEL
  for (j in 1:m)
  {
    for (i in 1:n)
    {
      y[i,j] ~ dnorm((mu[1,j] + mu[2,j]), 100)
    }
  }
  #inits# .RNG.name, .RNG.seed
  #monitor# mu, kappa
  #data# n, m, y, ymin, ymax, yvar
}

The tests reported here ran 3 independent cases, so m = 3. n denotes the number of observations per case, a value which will be indicated when the results are reported below. y denotes the observations, n rows and m columns. \text{ymin}, \text{ymax}, and \text{yvar} are each vectors of length m containing the minimum, maximum, and variance for each of the m cases. Different kinds of positive error densities were used for a simple evaluation. In all cases the “true lengths” of the sticks were 13, 5, and 21, respectively, for cases 1, 2, and 3.
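
To make the setup concrete, here is a minimal R driver sketch for the first (Poisson) experiment using the runjags package. The file name sticks.bug, the seed, and the shortened chain settings are illustrative; the actual runs used the longer settings quoted below, and the data here are re-simulated rather than the exact data behind the reported numbers. It assumes the model above is saved as sticks.bug with the #inits#, #monitor#, and #data# directive lines removed, since the data and monitors are passed explicitly.

library(runjags)

set.seed(3)
true.L <- c(13, 5, 21)   # true stick lengths for the 3 cases
n <- 20
m <- 3

# Each column of y is one case; errors are Poisson with mean 2.
y <- sapply(true.L, function(L) L + rpois(n, lambda = 2))

# Per-case summaries the model conditions on.
ymin <- apply(y, 2, min)
ymax <- apply(y, 2, max)
yvar <- apply(y, 2, var)

fit <- run.jags(model = "sticks.bug",
                data = list(n = n, m = m, y = y,
                            ymin = ymin, ymax = ymax, yvar = yvar),
                monitor = c("mu", "kappa"),
                n.chains = 3, adapt = 10000, burnin = 20000,
                sample = 30000, thin = 10, method = "parallel")
summary(fit)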

The first errors added were Poisson, each having a mean and variance of 2. There were n = 20 observations generated for each of the 3 cases. The means of the 3 sets of 20 were 15.6, 6.55, and 23.25, respectively, and their variances were 3.515789474, 1.944736842, and 1.039473684. (The excess precision is deliberate, for purposes of documentation.) The recipe given above for Poisson data produces estimates of the lengths as 12.08421053, 4.605263158, and 22.21052632, compared to the actuals of 13, 5, and 21. The stick-breaking Bayesian procedure above produced estimates of 11.5131224643, 4.2231132697, and 20.688221523. This was run in JAGS with 10,000 adaptation steps, 200,000 burn-in steps, and 3 million primary steps, thinning 1-in-100, for effectively 30,000 samples. The Gelman-Rubin multivariate PSRF was 1.02, and all variables had PSRF values under 1.05. (This run took 3 minutes using 3 of 4 cores on a 3.2 GHz 64-bit workstation, with no memory constraints.) The common \kappa value had a mean of 25.9.

The next errors added were 4 times a Beta random variable with both of its shape parameters equal to 2. That density is the purple line in the figure below.

[Figure: Beta_distribution, Beta probability densities for several shape-parameter pairs]

Its mean is 2 and its maximum is 4. n = 50 observations per case were used. The stick-breaking Bayesian procedure produced estimates of 12.8382797743, 5.0637212340, and 20.2159258567, respectively, for the cases where the actual lengths are 13, 5, and 21. Again, JAGS was used, with 10,000 adaptation steps, 200,000 burn-in steps, and 3 million primary steps, thinning 1-in-100. The multivariate PSRF was worse than in the Poisson case, being 1.15. Convergence was poorer for the first of the three cases than for the others. (This run took 6.6 minutes on the same hardware as above.) The common \kappa value had a mean of 11.0.

The next errors added were half-Gaussians, of mean 4 and standard deviation 1. In other words, Gaussian errors were generated, but their absolute values were taken before adding them to the true stick lengths. n = 30 observations were used in each case. The stick-breaking Bayesian procedure produced estimates of 14.0260363155, 5.4179282352, and 20.370834914, respectively. Note the corresponding minimum values of the populations were 15.5700347670, 6.6854522364, and 23.0599805370. 10,000 adaptation steps were used, and 200,000 burn-in steps. In this case 9 million primary steps were taken, again thinning 1-in-100. (This took 11.2 minutes.) The multivariate PSRF was 1.15, with the first and third cases showing some convergence stress. The common \kappa value had a mean of 3.2.

Finally, Gamma errors were added, having a mean of 2 and a variance of 0.25. n = 50 observations were generated. The stick-breaking Bayesian procedure yielded estimates of 13.7986086833, 5.9373813183, and 22.1958578133, respectively, for the cases of lengths 13, 5, and 21. There were 10,000 adaptation steps, 200,000 burn-in steps, and 3 million primary steps, with 1-in-100 thinning. Multivariate PSRF was unity, and all of the parameters had PSRF values below 1.05. (This took 6.3 minutes.) The common \kappa value had a mean of 11.0.
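
For reference, here are plausible R generators for the four error types described above, with parameters chosen to match the stated means and variances; in particular, a Gamma with mean 2 and variance 0.25 corresponds to shape 16 and scale 0.125. These are consistent with the descriptions, not necessarily the exact generators used.

n <- 50
e.pois  <- rpois(n, lambda = 2)                   # Poisson: mean = variance = 2
e.beta  <- 4 * rbeta(n, shape1 = 2, shape2 = 2)   # 4 x Beta(2,2): mean 2, maximum 4
e.half  <- abs(rnorm(n, mean = 4, sd = 1))        # "half-Gaussian": |N(4, 1)|
e.gamma <- rgamma(n, shape = 16, scale = 0.125)   # Gamma: mean 2, variance 0.25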

As is typical of non-parametric procedures, these results suggest the Bayesian non-parametric approach is robust for this kind of estimation but needs more data. In return, it is less susceptible to the risk of model mismatch.
