## Cumulants and the Cornish-Fisher Expansion

“Consider the following.” (Bill Nye the Science Guy)

There are $N$ random variables drawn from the same kind of probability distribution, but with different parameters for each. In this example, I’ll consider $N$ random variables $x_{j} \sim \mathcal{B}(p_{j})$, that is, each drawn from a Bernoulli distribution, but having different probabilities $p_{j}$.

Of interest is a particular sum: $S = \sum_{j=0}^{N-1} w_{j} x_{j}.$

where the $w_{j}$ are fixed weights, not all of which are zero. Given the $N$ values $p_{j}$ and the $N$ weights $w_{j}$, the problem is to calculate the value of $S$, say $S_{q}$, at a given quantile $q$ of the cumulative distribution of $S$. Indeed, suppose many such values $S_{q}$ are sought, for various $q$. How do you do it?

One technique most data scientists would proffer is to simulate many such Bernoulli draws and calculate quantiles empirically from the resulting samples. This works, but how much error is there in such simulations, given arbitrary choices of $N$, the $p_{j}$, the $w_{j}$, and $q$? And how does the error behave as the number of simulated draws increases? Does it decrease monotonically?
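As a sketch of the simulation approach (the particular $N$, $p_{j}$, and $w_{j}$ below are illustrative assumptions, not values from any application):

```r
set.seed(42)

N <- 10
p <- seq(0.1, 0.9, length.out = N)  # illustrative Bernoulli probabilities
w <- rep(1, N)                      # illustrative weights

# Empirical q = 0.1 quantile of S from M simulated draws of the x vector.
empiricalQuantile <- function(M, q = 0.1) {
  x <- matrix(rbinom(M * N, size = 1, prob = rep(p, each = M)), nrow = M)
  S <- as.vector(x %*% w)
  unname(quantile(S, probs = q))
}

# Re-estimating at increasing M shows the estimate settling down, though
# any single sequence of estimates need not improve monotonically.
estimates <- sapply(c(100, 1000, 10000), empiricalQuantile)
```

Repeating the last line under different seeds gives a feel for the sampling error at each $M$.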

There is another approach: using the Cornish-Fisher expansion, a technique dating from 1937. But before meeting the expansion, cumulants need to be introduced.

### Cumulants

Technically, cumulants are the coefficients obtained from a series expansion of the log of the characteristic function of a probability density function. That’s dense. Let’s break it down.

The characteristic function is the expected value of a particular non-linear function of a random variable $X$, where $X \sim P(x)$, and $P(x)$ is the probability density function for $X$. That non-linear function is $\exp{(\mathbf{i} t X)}$, giving $\varphi_{X}(t) = E\llbracket \exp{(\mathbf{i} t X)} \rrbracket.$

where $\mathbf{i} = \sqrt{-1}$, the imaginary unit. Using the definition of expected value, and assuming the domain for $X$ is continuous, $\varphi_{X}(t) = E\llbracket \exp{(\mathbf{i} t X)} \rrbracket = \int_{-\infty}^{\infty} \exp{(\mathbf{i} t x)} P(x)\,dx.$

If $X$ were discrete, then $\varphi_{X}(t) = E\llbracket \exp{(\mathbf{i} t X)} \rrbracket = \sum_{x \in \mathcal{D}(X)} \exp{(\mathbf{i} t x)} P(x).$

where $\mathcal{D}(X)$ denotes the (possibly countably infinite) domain of the random variable $X$, and the probabilities $P(x)$ sum to one over that domain.
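As a small concrete instance, a minimal check in R of the discrete formula for a Bernoulli variable, whose characteristic function is $\varphi(t) = (1-p) + p\,e^{\mathbf{i}t}$ (the value of $p$ here is arbitrary):

```r
p <- 0.3   # arbitrary Bernoulli probability

# Closed form: phi(t) = (1 - p) e^{i t 0} + p e^{i t 1}.
phi <- function(t) (1 - p) + p * exp(1i * t)

# The expectation definition, summed over the domain {0, 1}.
phiByExpectation <- function(t) sum(c(1 - p, p) * exp(1i * t * c(0, 1)))
```

Note that `phi(0)` is 1, as it must be for any characteristic function.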

Inspection shows these expressions are the value of a general Fourier transform of the probability density function $P(x)$, taken in the convention with parameters $a = 1$, $b = 1$.

Taking the log, then $\log{\varphi_{X}(t) } = \sum_{n=1}^{\infty} \kappa_{n} \frac{(\mathbf{i} t)^{n}}{n!}.$

where (finally!) the $\kappa_{1}, \kappa_{2}, \dots$ are the cumulants. There is also a cumulant generating function, definable from the moment generating function $\mathbf{M}_{X}(h) = E\llbracket \exp{(h X)} \rrbracket = 1 + \sum_{n=1}^{\infty} \frac{h^{n} E\llbracket X^{n} \rrbracket}{n!} = 1 + \sum_{n=1}^{\infty} \frac{h^{n} \mu_{n}}{n!},$ namely $\mathbf{K}(h) = \log{\mathbf{M}_{X}(h)} = \sum_{n=1}^{\infty} \kappa_{n} \frac{h^{n}}{n!}.$

where $\mu_{n}$ denotes the $n$th moment of $X$. In fact, moments and cumulants are interrelated and can be obtained from one another. This can be useful. Another fact, particularly useful for the problem at hand, is that if $\kappa_{r}(Y)$ denotes the $r$th cumulant of some random variable $Y$, and that random variable is itself the sum of $m$ mutually independent random variables, per $Y = \sum_{k=1}^{m} X_{k}.$

then $\kappa_{r}(Y) = \kappa_{r}(\sum_{k=1}^{m} X_{k}) = \sum_{k=1}^{m} \kappa_{r}(X_{k}).$

assuming, crucially, that the $X_{k}$ are independent and their cumulants exist. What this means is that, to the degree the probability distribution of a sum of random variables can be characterized by, or even recovered from, its cumulants, the cumulants of such a sum can be found readily from the cumulants of each component. The same result can be obtained from moment generating functions, or from the probability densities themselves, but that entails convolutions of densities or products of characteristic functions. These interchanges are really just consequences of the product-convolution identity for Fourier transforms.
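A minimal numerical check of this additivity, for the sum of two independent Bernoulli variables, using the standard identity $\kappa_{3} = \mu_{3} - 3\mu_{2}\mu_{1} + 2\mu_{1}^{3}$ to extract the third cumulant directly from the distribution of the sum (the values of $p_{1}$ and $p_{2}$ are arbitrary):

```r
# Third cumulant of a Bernoulli(p) variable: kappa_3 = p (1 - p) (1 - 2 p).
kappa3.bernoulli <- function(p) p * (1 - p) * (1 - 2 * p)

# Third cumulant recovered from raw moments: kappa_3 = mu_3 - 3 mu_2 mu_1 + 2 mu_1^3.
kappa3.from.moments <- function(values, probs) {
  m1 <- sum(probs * values)
  m2 <- sum(probs * values^2)
  m3 <- sum(probs * values^3)
  m3 - 3 * m2 * m1 + 2 * m1^3
}

# Two independent Bernoulli components with arbitrary probabilities.
p1 <- 0.3
p2 <- 0.6

# Exact distribution of the sum Y = X1 + X2.
y.values <- c(0, 1, 2)
y.probs <- c((1 - p1) * (1 - p2),
             p1 * (1 - p2) + (1 - p1) * p2,
             p1 * p2)

additive <- kappa3.bernoulli(p1) + kappa3.bernoulli(p2)  # sum of component cumulants
direct <- kappa3.from.moments(y.values, y.probs)         # cumulant of the sum itself
# Both come to 0.036 (up to floating point).
```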

This just sketches the beginning of things that can be done with cumulants. See k-statistics and polykays for others, even empirical uses.

### The Cornish-Fisher expansion

Given knowledge of cumulants, the Cornish-Fisher expansion gives an estimate for $S_{q}$ directly from the cumulants of $S$, assuming these exist. In the case of the weighted Bernoulli sum here, they certainly do.
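Written out through third order in the standardized cumulants $\gamma_{1} = \kappa_{3}/\kappa_{2}^{3/2}$, $\gamma_{2} = \kappa_{4}/\kappa_{2}^{2}$, $\gamma_{3} = \kappa_{5}/\kappa_{2}^{5/2}$, with $z_{q}$ the standard Gaussian quantile and $\mathrm{He}_{n}$ the probabilists' Hermite polynomials, the expansion reads:

```latex
\begin{aligned}
S_{q} &\approx \mu + \sigma \, w(z_{q}), \\
w(z) &= z + \gamma_{1} \frac{\mathrm{He}_{2}(z)}{6}
        + \gamma_{2} \frac{\mathrm{He}_{3}(z)}{24}
        - \gamma_{1}^{2} \frac{2\,\mathrm{He}_{3}(z) + \mathrm{He}_{1}(z)}{36} \\
      &\quad + \gamma_{3} \frac{\mathrm{He}_{4}(z)}{120}
        - \gamma_{1} \gamma_{2} \frac{\mathrm{He}_{4}(z) + \mathrm{He}_{2}(z)}{24}
        + \gamma_{1}^{3} \frac{12\,\mathrm{He}_{4}(z) + 19\,\mathrm{He}_{2}(z)}{324}.
\end{aligned}
```

Here $\mu = \kappa_{1}$ and $\sigma = \sqrt{\kappa_{2}}$ are the mean and standard deviation of $S$.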

### Cumulants for $S$

The first five cumulants for each part, $w_{j} x_{j}$, of the sum $S$ are:

$\kappa_{1,j} = p_{j} w_{j}.$

$\kappa_{2,j} = p_{j} (1 - p_{j}) w_{j}^{2}.$

$\kappa_{3,j} = p_{j} (p_{j} - 1) (2 p_{j} - 1) w_{j}^{3}.$

$\kappa_{4,j} = p_{j} (1 - p_{j}) (1 + 6 p_{j} (p_{j} - 1)) w_{j}^{4}.$

$\kappa_{5,j} = p_{j} (p_{j} - 1) (2 p_{j} - 1) (1 + 12 p_{j} (p_{j} - 1)) w_{j}^{5}.$

These are elemental, per-component cumulants, and so the corresponding cumulants of $S$ for the $N$ pairs of $w_{j}$ and $p_{j}$ (where, recall, $x_{j} \sim \mathcal{B}(p_{j})$) follow by additivity: $\kappa_{k} = \sum_{j=0}^{N-1} \kappa_{k,j}.$
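A small sanity check on the first two of these formulas: for a small $N$, the mean and variance of $S$ can be computed exactly by enumerating all $2^{N}$ outcomes and compared against $\kappa_{1}$ and $\kappa_{2}$ (the particular $\mathbf{w}$ and $\mathbf{p}$ below are arbitrary):

```r
# Illustrative parameters for a small exact check.
w <- c(0.5, 1.0, 2.0)
p <- c(0.2, 0.5, 0.7)
N <- length(w)

# Enumerate all 2^N Bernoulli outcomes with their probabilities.
outcomes <- as.matrix(expand.grid(rep(list(0:1), N)))
probs <- apply(outcomes, 1, function(x) prod(ifelse(x == 1, p, 1 - p)))
S.values <- outcomes %*% w

# Exact mean and variance of S by direct summation ...
exact.mean <- sum(probs * S.values)
exact.var <- sum(probs * (S.values - exact.mean)^2)

# ... versus the cumulant formulas.
kappa1 <- sum(p * w)
kappa2 <- sum(p * (1 - p) * w^2)
```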

Note that

• $\kappa_{1}$ is the mean.
• $\kappa_{2}$ is the variance.
• $\kappa_{3}$ determines the skewness, hereinafter denoted $\mathcal{S} = \kappa_{3}/\kappa_{2}^{3/2}$.
• $\kappa_{4}$ determines the excess kurtosis, hereinafter denoted $\mathcal{K} = \kappa_{4}/\kappa_{2}^{2}$.
• $\kappa_{5}$ does not correspond to any central moment or simple combination of them.


### A Cornish-Fisher expansion for $S$

```r
library(mpoly)

# Cumulants of S = sum_j w_j x_j, x_j ~ Bernoulli(p_j).
theoreticalMean<- function(W,P) sum(W*P)
theoreticalVariance<- function(W,P) sum(W^2*P*(1-P))
theoreticalSkew<- function(W,P) sum(P*(P-1)*(2*P-1)*W^3)
# (This is the fourth cumulant kappa_4; kappa_4/sigma^4 is already the
# excess kurtosis, i.e. kurtosis relative to the Gaussian.)
theoreticalKurtosis<- function(W,P) sum(P*(1-P)*(1 + 6*P*(P - 1))*W^4)
theoreticalKappa5<- function(W,P) sum(P*(P-1)*(2*P-1)*(1 + 12*P*(P-1))*W^5)

# Standardized cumulants gamma_r = kappa_{r+2} / kappa_2^{(r+2)/2}.
gamma1<- function(W,P) theoreticalSkew(W,P)/sqrt(theoreticalVariance(W,P)^3)
gamma2<- function(W,P) theoreticalKurtosis(W,P)/sqrt(theoreticalVariance(W,P)^4)
gamma3<- function(W,P) theoreticalKappa5(W,P)/sqrt(theoreticalVariance(W,P)^5)

# Probabilists' Hermite polynomials He_1 .. He_4. Note normalized=FALSE:
# the Cornish-Fisher coefficients below assume the un-normalized He_n.
He.polynomials<- hermite(degree=1:4, kind="he", normalized=FALSE)
ThePoint<- 0.10
TheQuantile<- qnorm(ThePoint)

yAtThePoint<- function(xPoint=0.1, W, P)
{
  xQuantile<- qnorm(xPoint)
  # Evaluate He_1(z), ..., He_4(z) at the Gaussian quantile z.
  He<- unlist(sapply(X=He.polynomials,
                     FUN=function(He.k) as.function(He.k, silent=TRUE)(xQuantile)))
  # Cornish-Fisher adjustment terms, through third order.
  h1<- He[2]/6
  h2<- He[3]/24
  h11<- - (2*He[3] + He[1])/36
  h3<- He[4]/120
  h12<- -(He[4] + He[2])/24
  h111<- (12*He[4] + 19*He[2])/324
  #
  g1<- gamma1(W,P)
  g2<- gamma2(W,P)
  g3<- gamma3(W,P)
  #
  mu<- theoreticalMean(W,P)
  sigma<- sqrt(theoreticalVariance(W,P))
  # Adjusted standard quantile, then rescaled to the distribution of S.
  w<- xQuantile + (g1*h1) + (g2*h2 + g1^2*h11) + (g3*h3 + g1*g2*h12 + g1^3*h111)
  yp<- mu + sigma*w
  return(yp)
}
```
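For readers without mpoly at hand, the same Cornish-Fisher step can be sketched in a self-contained way, with the Hermite polynomials written out explicitly, and cross-checked loosely against a Monte Carlo quantile (all parameters below are illustrative, and `cornishFisherQuantile` is a name chosen here, not from any package):

```r
set.seed(7)

# Probabilists' Hermite polynomials He_1 .. He_4, written out explicitly.
He1 <- function(z) z
He2 <- function(z) z^2 - 1
He3 <- function(z) z^3 - 3*z
He4 <- function(z) z^4 - 6*z^2 + 3

# Cornish-Fisher quantile of S = sum_j w_j x_j, x_j ~ Bernoulli(p_j), at probability q.
cornishFisherQuantile <- function(q, W, P) {
  k1 <- sum(W * P)
  k2 <- sum(W^2 * P * (1 - P))
  k3 <- sum(W^3 * P * (1 - P) * (1 - 2*P))
  k4 <- sum(W^4 * P * (1 - P) * (1 - 6*P*(1 - P)))
  k5 <- sum(W^5 * P * (1 - P) * (1 - 2*P) * (1 - 12*P*(1 - P)))
  g1 <- k3/k2^1.5; g2 <- k4/k2^2; g3 <- k5/k2^2.5
  z <- qnorm(q)
  z.adj <- z + g1*He2(z)/6 +
    (g2*He3(z)/24 - g1^2*(2*He3(z) + He1(z))/36) +
    (g3*He4(z)/120 - g1*g2*(He4(z) + He2(z))/24 +
       g1^3*(12*He4(z) + 19*He2(z))/324)
  k1 + sqrt(k2)*z.adj
}

# Illustrative parameters, and a loose Monte Carlo cross-check.
W <- runif(10, 0.5, 1.5)
P <- runif(10, 0.2, 0.8)
cf <- cornishFisherQuantile(0.1, W, P)
mc <- unname(quantile(replicate(20000, sum(W * rbinom(10, 1, P))), 0.1))
```

The two estimates should agree to within the Monte Carlo noise and the truncation error of the expansion.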


### Simulating $S$ using many Bernoulli draws

A simulation for $S$ was developed wherein a vector, $\mathbf{p}$, of probabilities was drawn from a Beta distribution, a vector of non-negative weights, $\mathbf{w}$, was drawn from a Gamma distribution, and then, governed by $\mathbf{p}$, $M = 10000$ vectors $\mathbf{x}_{k}, k \in \{1, \dots, M\}$, were drawn from the Bernoulli distribution. Then $S_{k} = \mathbf{w} \cdot \mathbf{x}_{k}$ was calculated for each and saved. $N$, the length of these vectors, was chosen to be 10, typical for the application motivating this problem; it also suffices for illustration. An estimated probability density was developed from the $M$-sized collection of saved sums. Thirty of these are shown below.

### Comparing the two approaches

The quantile $S_{q=0.1}$ was chosen as the point of comparison. In addition to comparison with the numerical simulation, the PDQutils package of R was used to independently calculate the value for these quantiles as a check on derivations. Empirical quantiles were obtained from the simulations using the hdquantile function from the Hmisc package of R. The results are given below.

### Comment on Cumulants and the Cornish-Fisher expansion in statistical education

Cumulants and the Cornish-Fisher expansion aren’t regularly taught any longer in courses on theoretical statistics, at least judging by material in statistics textbooks. For example,

• J. S. Bendat, A. G. Piersol, Random Data: Analysis and Measurement Procedures, $4^{th}$ edition, Wiley, 2010
• A. W. Drake, Fundamentals of Applied Probability, McGraw-Hill Book Company, 1967
• J. A. Rice, Mathematical Statistics and Data Analysis, $3^{rd}$ edition, Duxbury, Thomson/Brooks/Cole, 2007
• M. H. DeGroot, M. J. Schervish, Probability and Statistics, $3^{rd}$ edition, Addison-Wesley, 2002
• K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis, Academic Press, 1979
• R. Durrett, Probability: Theory and Examples, $4^{th}$ edition, Cambridge University Press, 2010

lack any mention of either cumulants or the expansion, although moment-generating functions are inevitably mentioned. In contrast,

• R. V. Hogg, J. W. McKean, A. T. Craig, Introduction to Mathematical Statistics, $6^{th}$ edition, Pearson/Prentice-Hall, 2005
• A. N. Shiryaev, R. P. Boas (translator), Probability, $2^{nd}$ edition, Springer, 1996
• E. Parzen, Modern Probability Theory and Its Applications, John Wiley, 1960
• D. E. Knuth, The Art of Computer Programming, Volume 1: Fundamental Algorithms, $3^{rd}$ edition, Addison-Wesley, 1997

do mention cumulants. Parzen (1960) has quite a bit about them, and about their relationship to sums of random variables. The erudite Professor Knuth devotes three pages to them. That suggests most specialists in computer science ought to know of them, but I'd bet most don't.

Cumulants are often mentioned during courses on random walks and diffusion, because a random walk can be considered a kind of a sum.

### Postscript

Should readers be interested in the code, I can prepare a documented version of it for use, in the interest of reproducibility. As mentioned above, the PDQutils package of R offers a means of calculating quantiles directly from cumulants, via its function qapx_cf. Of course, theoretical expressions for quantiles demand symbolic calculation, such as that offered by Wolfram Mathematica.