“Consider the following.” (Bill Nye the *Science Guy*)

There are $N$ random variables drawn from the same *kind* of probability distribution, but with different parameters for each. In this example, I'll consider random variables $B_1, B_2, \dots, B_N$, that is, each $B_i$ drawn from a *Bernoulli distribution*, but having different probabilities $p_i$ of taking the value one.

Of interest is a particular sum:

$Y = \sum_{i=1}^{N} w_i B_i,$

where the $w_i$ are fixed *weights*, not all of which are zero. Given the values $p_i$ and the weights $w_i$, the problem is to calculate the value of $Y$, say, $y_q$, at a given quantile $q$ of the cumulative distribution of $Y$. Indeed, suppose *many* such values are sought, for various $q$. How do you do it?

One technique which most data scientists proffer is to *simulate* many such Bernoulli draws and calculate quantiles empirically from the resulting simulations. This is possible, but the question is how much error there is in such simulations given arbitrary choices of $N$, the $p_i$, the $w_i$, and $q$. Also, how does the error behave as the number of simulated draws increases? Does it decrease monotonically?

There is another approach: using the *Cornish-Fisher expansion*, a technique dating from 1937. But before meeting the expansion, *cumulants* need to be introduced.

**Cumulants**

Technically, cumulants are the coefficients obtained from a series expansion of the *log of the characteristic function of a probability density function*. That's dense. Let's break it down.

The *characteristic function* is the *expected value* of a particular non-linear function of a random variable $X$, where $X \sim f_X$, and $f_X$ is the probability density function for $X$. That non-linear function is

$e^{\mathrm{i}tX},$

where $\mathrm{i} = \sqrt{-1}$, the complex basis. Using the definition of expected value, and assuming the domain for $X$ is continuous,

$\varphi_X(t) = \mathbb{E}\left[e^{\mathrm{i}tX}\right] = \int_{-\infty}^{\infty} e^{\mathrm{i}tx} f_X(x)\, dx.$

If $X$ were discrete, then

$\varphi_X(t) = \sum_{x \in \mathcal{D}} e^{\mathrm{i}tx} \Pr(X = x),$

where $\mathcal{D}$ denotes the (finite) domain of the random variable $X$.

Inspection shows these expressions are the value of a general Fourier transform of the probability density function, $f_X$, one under a particular choice of sign and normalization constants.
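As a quick numeric sketch of the discrete case: for a Bernoulli($p$) variable the characteristic function has the closed form $\varphi(t) = (1-p) + p\,e^{\mathrm{i}t}$, which can be checked directly against the sum over the domain $\{0, 1\}$ (the values of $p$ and $t$ below are arbitrary choices for illustration):

```r
p <- 0.3
t <- 1.7
# Closed form for Bernoulli(p): phi(t) = (1 - p) + p * exp(i t)
phi.closed <- (1 - p) + p * exp(1i * t)
# Direct evaluation of the discrete-domain sum over {0, 1}
phi.sum <- sum(c(1 - p, p) * exp(1i * t * c(0, 1)))
abs(phi.closed - phi.sum)   # essentially zero
```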

Taking the log, then

$\log \varphi_X(t) = \sum_{j=1}^{\infty} \kappa_j \frac{(\mathrm{i}t)^j}{j!},$

where (finally!) the $\kappa_j$ are the **cumulants**. There is a cumulant generating function definable,

$K_X(t) = \log \mathbb{E}\left[e^{tX}\right] = \sum_{j=1}^{\infty} \kappa_j \frac{t^j}{j!},$

roughly analogous to the *moment generating function*, or

$M_X(t) = \mathbb{E}\left[e^{tX}\right] = \sum_{j=0}^{\infty} \mu_j \frac{t^j}{j!},$

where $\mu_j$ denotes the $j$th moment of $X$. In fact, moments and cumulants are interrelated and can be obtained from one another. This can be useful. Another fact, particularly useful for the problem at hand, is that if $\kappa_j(Y)$ denotes the $j$th cumulant of some random variable $Y$ and that random variable is itself the sum of other, mutually independent, random variables, per

$Y = \sum_{i=1}^{N} X_i,$

then

$\kappa_j(Y) = \sum_{i=1}^{N} \kappa_j(X_i),$

assuming cumulants exist for all $X_i$. What this means is that, to the degree to which the probability distribution of a sum of random variables can be characterized by, or even recovered from, knowledge of its cumulants, we can readily find the cumulants of such a sum from the cumulants of each component of the sum. If moment generating functions are used, or simply probability densities, one can calculate the same things, but doing so entails convolutions of densities or products of characteristic functions. These interchanges are really just consequences of the product-convolution identity for Fourier transforms.
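The additivity property can be checked numerically for a small weighted Bernoulli sum. With $N = 3$ independent components, all $2^3$ outcomes can be enumerated exactly, so the first three cumulants of $Y$ are computed without sampling error and compared against the sums of per-component cumulants (the weights and probabilities here are arbitrary illustrative choices):

```r
w <- c(1.0, 2.0, 0.5)          # arbitrary fixed weights (assumption)
p <- c(0.2, 0.5, 0.7)          # arbitrary Bernoulli probabilities (assumption)
# Enumerate all 2^3 joint outcomes and their exact probabilities.
outcomes <- as.matrix(expand.grid(0:1, 0:1, 0:1))
probs <- apply(outcomes, 1, function(b) prod(ifelse(b == 1, p, 1 - p)))
y <- outcomes %*% w
# Exact cumulants of Y from the enumerated distribution; for j <= 3,
# cumulants coincide with central moments (after the mean).
k1 <- sum(probs * y)
k2 <- sum(probs * (y - k1)^2)
k3 <- sum(probs * (y - k1)^3)
# Per-component cumulants, summed, using kappa_j(w*B) = w^j * kappa_j(B).
c(k1 - sum(w * p),
  k2 - sum(w^2 * p * (1 - p)),
  k3 - sum(w^3 * p * (1 - p) * (1 - 2 * p)))   # all essentially zero
```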

This just sketches the beginning of what can be done with cumulants. See *k*-statistics and polykays for others, including empirical uses.

**The Cornish-Fisher expansion**

Given knowledge of cumulants, the Cornish-Fisher expansion gives an estimate for $y_q$ directly from the cumulants of $Y$, assuming these exist. In the case of the weighted Bernoulli sum here, they certainly do.

**Cumulants for $Y$**

The first five cumulants for each part, $B_i$, of the sum are:

$\kappa_1(B_i) = p_i$
$\kappa_2(B_i) = p_i (1 - p_i)$
$\kappa_3(B_i) = p_i (1 - p_i)(1 - 2 p_i)$
$\kappa_4(B_i) = p_i (1 - p_i)\left(1 - 6 p_i (1 - p_i)\right)$
$\kappa_5(B_i) = p_i (1 - p_i)(1 - 2 p_i)\left(1 - 12 p_i (1 - p_i)\right)$

These are elemental cumulants and, so, to obtain the corresponding cumulants for the pairs of $w_i$ and $B_i$ where, recall, $Y = \sum_{i=1}^{N} w_i B_i$,

$\kappa_j(Y) = \sum_{i=1}^{N} \kappa_j(w_i B_i) = \sum_{i=1}^{N} w_i^j\, \kappa_j(B_i).$

Note that

- $\kappa_1$ corresponds to the *mean*.
- $\kappa_2$ corresponds to the *variance*.
- $\kappa_3$ corresponds to *skewness*, hereinafter denoted $\gamma_1 = \kappa_3 / \kappa_2^{3/2}$.
- $\kappa_4$ corresponds to *kurtosis*, hereinafter denoted $\gamma_2 = \kappa_4 / \kappa_2^{2}$.
- $\kappa_5$ does not correspond to any central moment or simple combination of them.


**A Cornish-Fisher expansion for $Y$**
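In the notation above, with $z_q = \Phi^{-1}(q)$ the standard Gaussian quantile and $He_n$ the probabilists' Hermite polynomials, the fourth-order expansion used here can be written in a standard form, matching the coefficients $h_1, h_2, h_{11}, h_3, h_{12}, h_{111}$ appearing in the code below:

```latex
w(z_q) = z_q + \gamma_1 \frac{He_2(z_q)}{6}
       + \gamma_2 \frac{He_3(z_q)}{24}
       - \gamma_1^2 \frac{2\,He_3(z_q) + He_1(z_q)}{36}
       + \gamma_3 \frac{He_4(z_q)}{120}
       - \gamma_1 \gamma_2 \frac{He_4(z_q) + He_2(z_q)}{24}
       + \gamma_1^3 \frac{12\,He_4(z_q) + 19\,He_2(z_q)}{324},
\qquad
y_q = \mu + \sigma\, w(z_q).
```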

```r
library(mpoly)

# Cumulants of the weighted Bernoulli sum Y = sum(W * B), accumulated
# per the additivity property, with kappa_j(w*B) = w^j * kappa_j(B).
theoreticalMean <- function(W, P) sum(W * P)
theoreticalVariance <- function(W, P) sum(W^2 * P * (1 - P))
theoreticalSkew <- function(W, P) sum(P * (P - 1) * (2 * P - 1) * W^3)
# Fourth cumulant, not the fourth central moment. kappa_4 / sigma^4 is
# already excess kurtosis, i.e., kurtosis relative to the Gaussian.
theoreticalKurtosis <- function(W, P) sum(P * (1 - P) * (1 + 6 * P * (P - 1)) * W^4)
# Fifth cumulant of Bernoulli: p(1-p)(1-2p)(1 - 12 p (1-p)); note the
# minus sign on the 12 p (1-p) term, written here as 12 * P * (P - 1).
theoreticalKappa5 <- function(W, P) sum(P * (P - 1) * (2 * P - 1) * (1 + 12 * P * (P - 1)) * W^5)

# Standardized cumulants.
gamma1 <- function(W, P) theoreticalSkew(W, P) / sqrt(theoreticalVariance(W, P)^3)
gamma2 <- function(W, P) theoreticalKurtosis(W, P) / sqrt(theoreticalVariance(W, P)^4)
gamma3 <- function(W, P) theoreticalKappa5(W, P) / sqrt(theoreticalVariance(W, P)^5)

# Probabilists' Hermite polynomials He_1 .. He_4.
He.polynomials <- hermite(degree = 1:4, kind = "he", normalized = TRUE)

ThePoint <- 0.10
TheQuantile <- qnorm(ThePoint)

yAtThePoint <- function(xPoint = 0.1, W, P)
{
  xQuantile <- qnorm(xPoint)
  # Evaluate He_1(z) .. He_4(z) at the Gaussian quantile z.
  He <- unlist(sapply(X = He.polynomials,
                      FUN = function(He.k) as.function(He.k, silent = TRUE)(xQuantile)))
  # Cornish-Fisher coefficients.
  h1 <- He[2] / 6
  h2 <- He[3] / 24
  h11 <- -(2 * He[3] + He[1]) / 36
  h3 <- He[4] / 120
  h12 <- -(He[4] + He[2]) / 24
  h111 <- (12 * He[4] + 19 * He[2]) / 324
  #
  g1 <- gamma1(W, P)
  g2 <- gamma2(W, P)
  g3 <- gamma3(W, P)
  #
  mu <- theoreticalMean(W, P)
  sigma <- sqrt(theoreticalVariance(W, P))
  # Adjusted standard quantile, then mapped back to the scale of Y.
  w <- xQuantile + (g1 * h1) + (g2 * h2 + g1^2 * h11) + (g3 * h3 + g1 * g2 * h12 + g1^3 * h111)
  yp <- mu + sigma * w
  return(yp)
}
```

**Simulating using many Bernoulli draws**

A simulation for $Y$ was developed wherein a vector, $P$, of probabilities was drawn from a Beta distribution, a vector of non-negative weights, $W$, was drawn from a Gamma distribution, and then, governed by $P$, vectors of Bernoulli draws $B_i$ were generated. Then $Y = \sum_i w_i B_i$ was calculated for each draw and saved. $N$, the length of these vectors, was chosen to be 10, typical for the application motivating this problem. It also suffices for illustration. An estimated probability density was developed from the collection of saved sums. Thirty of these are shown below.
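A minimal sketch of such a simulation in base R follows; the Beta and Gamma shape parameters, seed, and number of draws here are assumptions for illustration, not necessarily the original choices:

```r
set.seed(42)
N <- 10                                  # length of the weight/probability vectors
n.draws <- 100000                        # number of simulated sums
P <- rbeta(N, shape1 = 2, shape2 = 2)    # probabilities from a Beta distribution
W <- rgamma(N, shape = 2, rate = 1)      # non-negative weights from a Gamma distribution
# Each column is one vector of N Bernoulli draws governed by P.
B <- matrix(rbinom(N * n.draws, size = 1, prob = P), nrow = N)
Y <- as.vector(W %*% B)                  # one weighted sum per simulated draw
quantile(Y, probs = 0.10)                # an empirical 10% quantile
```

An empirical density of `Y` can then be sketched with `plot(density(Y))`.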

**Comparing the two approaches**

A quantile of interest was chosen and used for comparison. In addition to comparison with the numerical simulation, the *PDQutils* package of **R** was used to independently calculate the value for these quantiles as a check on derivations. Empirical quantiles were obtained from the simulations using the *hdquantile* function from the *Hmisc* package of **R**. The results are given below.
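For readers who want to reproduce the comparison without *mpoly*, *PDQutils*, or *Hmisc*, here is a self-contained base-R sketch: the Hermite polynomials are written out directly, base `quantile()` stands in for `hdquantile`, and the Beta/Gamma parameters and tolerance are assumptions:

```r
# Cornish-Fisher quantile of Y = sum(W * B) from the Bernoulli cumulants.
cf.quantile <- function(q, W, P) {
  mu <- sum(W * P)
  k2 <- sum(W^2 * P * (1 - P))
  k3 <- sum(W^3 * P * (1 - P) * (1 - 2 * P))
  k4 <- sum(W^4 * P * (1 - P) * (1 - 6 * P * (1 - P)))
  k5 <- sum(W^5 * P * (1 - P) * (1 - 2 * P) * (1 - 12 * P * (1 - P)))
  g1 <- k3 / k2^(3/2) ; g2 <- k4 / k2^2 ; g3 <- k5 / k2^(5/2)
  z <- qnorm(q)
  # Probabilists' Hermite polynomials He_1 .. He_4, written explicitly.
  He1 <- z ; He2 <- z^2 - 1 ; He3 <- z^3 - 3*z ; He4 <- z^4 - 6*z^2 + 3
  w <- z + g1 * He2/6 +
       (g2 * He3/24 - g1^2 * (2*He3 + He1)/36) +
       (g3 * He4/120 - g1*g2 * (He4 + He2)/24 + g1^3 * (12*He4 + 19*He2)/324)
  mu + sqrt(k2) * w
}

set.seed(1)
N <- 10
P <- rbeta(N, 2, 2)
W <- rgamma(N, 2, 1)
B <- matrix(rbinom(N * 200000, size = 1, prob = P), nrow = N)
Y <- as.vector(W %*% B)
q.cf  <- cf.quantile(0.10, W, P)          # Cornish-Fisher estimate
q.sim <- unname(quantile(Y, 0.10))        # Monte Carlo estimate
c(q.cf, q.sim)                            # the two should be close
```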

**Comment on Cumulants and the Cornish-Fisher expansion in statistical education**

Cumulants and the Cornish-Fisher expansion aren't regularly taught any longer in courses on theoretical statistics, at least judging by the material in statistics textbooks. For example,

- J. S. Bendat, A. G. Piersol, *Random Data: Analysis and Measurement Procedures*, Wiley, 2010
- A. W. Drake, *Fundamentals of Applied Probability*, McGraw-Hill Book Company, 1967
- J. A. Rice, *Mathematical Statistics and Data Analysis*, Duxbury, Thomson/Brooks/Cole, 2007
- M. H. DeGroot, M. J. Schervish, *Probability and Statistics*, Addison-Wesley, 2002
- K. V. Mardia, J. T. Kent, J. M. Bibby, *Multivariate Analysis*, Academic Press, 1979
- R. Durrett, *Probability: Theory and Examples*, Cambridge University Press, 2010

lack any mention of either cumulants or the expansion, although moment-generating functions are inevitably mentioned. In contrast,

- R. V. Hogg, J. W. McKean, A. T. Craig, *Introduction to Mathematical Statistics*, Pearson/Prentice-Hall, 2005
- A. N. Shiryaev, R. P. Boas (translator), *Probability*, Springer, 1996
- E. Parzen, *Modern Probability Theory and Its Applications*, John Wiley, 1960
- D. E. Knuth, *The Art of Computer Programming, Volume 1: Fundamental Algorithms*, Addison-Wesley, 1997

*do* mention cumulants. Parzen (1960) has quite a bit about them, including their relationship to sums of random variables. The erudite Professor Knuth has three pages mentioning them. That suggests most specialists in computer science ought to know of them, but I'd bet most don't.

Cumulants are often mentioned during courses on random walks and diffusion, because a random walk can be considered a kind of sum.

**Postscript**

Should readers be interested in the code, I can prepare a documented version of it for use, in the interest of reproducibility. As mentioned above, the *PDQutils* package of **R** offers a means of calculating quantiles directly from cumulants, via its function *qapx_cf*. Of course, theoretical expressions for quantiles demand symbolic calculation, such as is offered by the capabilities of *Wolfram Mathematica*.