## Today, now, and what of the future?

From Aldo Leopold in his A Sand County Almanac:

One of the penalties of an ecological education is that one lives alone in a world of wounds. Much of the damage inflicted on land is quite invisible to laymen. An ecologist must either harden his shell and make believe that the consequences of science are none of his business, or he must be the doctor who sees the marks of death in a community that believes itself well and does not want to be told otherwise.

Among many other notable efforts, Aldo Leopold attempted to reconcile ecology with economic imperatives. I still, however, cannot stomach the encouragement of hunting which Leopold's posthumously established Foundation pursues.

## Fast means, fast moments (originally devised 1984)

###### (Updated 4th December 2018.)

There are many devices available for making numerical calculations fast. Modern datasets and computational problems are tackled with specialized architectures and problem-specific approaches, such as algorithms that calculate only dominant eigenvectors, or non-classical statistical devices like shrinkage for estimating correlation matrices well. But sometimes it's simpler than that.

In particular, some calculations can be preconditioned. What’s done varies with the problem, but often there is a one-time investment in creating a numerical data structure which is then used repeatedly to obtain fast performance of some important kind.

This post is a re-presentation of a technique I devised in 1984 for finding sums of contiguous submatrices of a given matrix in constant time. The same technique can be used to calculate moments of such submatrices, in support of estimating statistical moments or moments of inertia, but I won't address those in this post. Students of these problems, given this technique, can imagine how that might be done. If there's interest and a need, expressed in comments or by email, I'll some day post the algorithm and code for doing moments.

#### The Algorithm

Assume there is a square matrix, $\mathbf{A}$, of floating-point numbers ("flonums"). Being square, it has both $m$ rows and $m$ columns. Produce from it a transformed matrix $\mathbf{A_{1,2}}$ in two steps, first producing a transformed matrix, $\mathbf{A_{1}}$, and then the transformed matrix, $\mathbf{A_{1,2}}$. Do this by first forming $\mathbf{A_{1}}$ as $\mathbf{A}$ but with its $j^{\text{th}}$ column replaced by the cumulative sum of the $j^{\text{th}}$ column of $\mathbf{A}$. That is,

$\mathbf{A_{1}}_{i,j} = \sum_{k=1}^{i} \mathbf{A}_{k,j}$

To form $\mathbf{A_{1,2}}$ do something similar, except with rows:

$\mathbf{A_{1,2}}_{i,j} = \sum_{k=1}^{j} \mathbf{A_{1}}_{i,k}$

$\mathbf{A_{1,2}}$ is then the preconditioned version of $\mathbf{A}$. Specifically, if the sum of any contiguous rectangular submatrix of $\mathbf{A}$ is sought, say with its upper left coordinate within $\mathbf{A}$ being $(i,j)$ and its lower right being $(k,l)$, then look up the value of $\mathbf{A_{1,2}}_{k,l}$, subtract from it the value of $\mathbf{A_{1,2}}_{i-1,l}$ and the value of $\mathbf{A_{1,2}}_{k,j-1}$, and then, by the principle of inclusion and exclusion, add back in the value of $\mathbf{A_{1,2}}_{i-1,j-1}$. If any of these are off the premises of $\mathbf{A_{1,2}}$ because $i-1$ or $j-1$ is too small, treat the corresponding subtrahends as zero.
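Written out, with any term whose row or column index is zero taken to be zero, the sum $S$ of the block with corners $(i,j)$ and $(k,l)$ is:

$S = \mathbf{A_{1,2}}_{k,l} - \mathbf{A_{1,2}}_{i-1,l} - \mathbf{A_{1,2}}_{k,j-1} + \mathbf{A_{1,2}}_{i-1,j-1}$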

Because the calculation of the sum of any submatrix involves only four lookups and three additions or subtractions, the algorithm does this in constant time, irrespective of the sizes of the submatrices. There are dynamic range tricks that can be applied should the sums of values in the preconditioned matrix grow very large in magnitude.

The code is available in the Google Drive folder for such things.

    ###############################################################################################
    # This section of code up to the next group of #### marks is owned and copyrighted by         #
    # Jan Galkowski, who has placed it in the public domain, available for any purpose by anyone. #
    ###############################################################################################

    fastMeansFastMomentsPrecondition<- function(X) {
      #
      # (From Jan Galkowski, "Fast means, fast moments", 1984,
      #  IBM Federal Systems Division, Owego, NY. Released into the public domain, 1994.)
      #
      stopifnot( is.matrix(X) )
      M<- nrow(X)
      stopifnot( M == ncol(X) )
      AX1<- apply(X=X, MARGIN=2, FUN=cumsum)
    # AX2<- t(apply(X=X, MARGIN=1, FUN=cumsum))
      AX12<- t(apply(X=AX1, MARGIN=1, FUN=cumsum))
      return(AX12)
    }

    fastMeansFastMomentsBlock<- function(P, iUL, jUL, iLR, jLR) {
      #
      # (From Jan Galkowski, "Fast means, fast moments", 1984,
      #  IBM Federal Systems Division, Owego, NY. Released into the public domain, 1994.)
      #
      # P is the preconditioned AX12 from above.
      #
      stopifnot( is.matrix(P) )
      M<- nrow(P)
      stopifnot( M == ncol(P) )
      stopifnot( (1 <= iUL) && (iUL <= M) )
      stopifnot( (1 <= jUL) && (jUL <= M) )
      stopifnot( (1 <= iLR) && (iLR <= M) )
      stopifnot( (1 <= jLR) && (jLR <= M) )
      #
      iUL1<- iUL-1
      jUL1<- jUL-1
      iLR1<- iLR-1
      jLR1<- jLR-1
      #
      if (0 == iUL1) {
        W.AL<- 0
        W.A<- 0
        if (0 == jUL1) {
          W.L<- 0
        } else {
          W.L<- P[iLR,jUL1]
        }
      } else if (0 == jUL1) {
        W.AL<- 0
        W.L<- 0
        if (0 == iUL1) {
          W.A<- 0
        } else {
          W.A<- P[iUL1,jLR]
        }
      } else {
        W.AL<- P[iUL1,jUL1]
        W.A<- P[iUL1,jLR]
        W.L<- P[iLR,jUL1]
      }
      #
      W<- P[iLR,jLR] + W.AL - W.A - W.L
      #
      return(W)
    }

    # Self-test FMFM ...
    cat("Fast means, fast moments self-test ...\n")
    Z<- matrix(round(runif(100, min=1, max=100)), 10, 10)
    Z.P<- fastMeansFastMomentsPrecondition(Z)
    stopifnot( sum(Z[1:4,1:5]) == fastMeansFastMomentsBlock(Z.P, 1, 1, 4, 5) )
    stopifnot( sum(Z[8:10, 8:9]) == fastMeansFastMomentsBlock(Z.P, 8, 8, 10, 9) )
    stopifnot( sum(Z[4:7, 3:5]) == fastMeansFastMomentsBlock(Z.P, 4, 3, 7, 5) )
    rm(list=c("Z", "Z.P"))
    cat("... Self-test completed.\n")

    ###############################################################################################
    # End of public domain code.                                                                  #
    ###############################################################################################

    randomizeSeed<- function() {
      #set.seed(31415)
      # Futz with the random seed
      E<- proc.time()["elapsed"]
      names(E)<- NULL
      rf<- E - trunc(E)
      set.seed(round(10000*rf))
      #
      rm(list=c("E", "rf"))
      return( sample.int(2000000, size=sample.int(2000, size=1), replace=TRUE)[1] )
    }

    wonkyRandom<- randomizeSeed()
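To illustrate the constant-time claim, here is a minimal sketch (my own addition, using the two functions from the listing above) comparing the preconditioned lookup against a direct `sum()` over a large block:

    # Sketch: block sums via the preconditioned matrix vs. direct summation.
    # Assumes fastMeansFastMomentsPrecondition and fastMeansFastMomentsBlock,
    # as defined above, are already in the workspace.
    N<- 1000
    A<- matrix(rnorm(N*N), N, N)
    A.P<- fastMeansFastMomentsPrecondition(A)
    # Same answer either way, up to floating-point roundoff:
    print(c(direct=sum(A[10:990, 5:995]),
            fast=fastMeansFastMomentsBlock(A.P, 10, 5, 990, 995)))
    # The lookup's cost does not depend on the block's size; the direct sum's does.
    system.time(for (k in 1:1000) fastMeansFastMomentsBlock(A.P, 10, 5, 990, 995))
    system.time(for (k in 1:1000) sum(A[10:990, 5:995]))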

So, there’s a little story of how this came to be public domain.

I used to work for IBM Federal Systems Division, in Owego, NY. I worked first as a software engineer and later, in a much more fun role, as a test engineer specializing in hardware-software systems which did quantitative calculations. IBM, as is common, had a strong intellectual property protection policy and framework.

Well, in 1994, Loral bought Federal Systems from IBM. As former employees, we were encouraged to join the new organization, but, naturally, were asked to sign a bunch of paperwork. To my personal surprise, there was nothing in the paperwork which had to do with IBM’s rights to intellectual property we might have originated or held. All they wanted was for us to sign onto the new division, part of Loral.

Before I signed, therefore, I approached corporate counsel and pointed out there was no restriction on intellectual property or constraint upon its disposition. They, interested in making the transition happen, said "Yes". I went off and prepared a document of all the material and algorithms and such which I thought I had developed while on IBM time and in which they hadn't expressed any interest, including offers to put it up for patenting or whatever. It was a reasonably thick document, maybe 100 pages, and I still have a copy. I asked the attorneys to sign over the intellectual property rights to these to me, and they did. It was witnessed. I had made my coming to Loral contingent upon doing this, and they seemed happy to do it. I signed once I had this in hand.

I did not know at the time, but Loral’s interest was in purchasing Federal Systems, and they had every intention of “flipping it”, as one might a house, to another buyer in a short time, and they apparently didn’t care about this.

But, as a result, this algorithm, for fast means and fast moments, which I had developed in 1984 while doing some image processing work, became mine. And I have always treated it as public domain, available to anyone for any purpose. And, here, with this post, I put it out there for your public use, for any purpose whatsoever, without constraints. Very open source. Commercial or otherwise.

Enjoy.

It would be nice to credit me, but you don’t have to do that.

## Heat has no hair (from Eli Rabett)

See Eli’s post.

Excerpt:

We can summarize the data in the figure above adding that ~40 W/m2 go directly from the surface to space as IR radiation of the 398 W/m2 leaving the surface. In and out in the table … [AT THE POST] … means into and out the surface the atmosphere and space respectively. In is taken as a positive addition to the heat content and negative a decrease …

… The important point is to realize that surface IR radiation absorbed in the atmosphere is rapidly (10 μs) thermalized and converted into random motion of the molecules in the atmosphere, just as is latent heat from condensation of water vapor and from sensible heat. Very little, less than a part per million, is directly radiated back to the surface and we can neglect that.

The 342 W/m2 of back radiation is OBSERVED, so this ain’t a model or a theory, where does it come from? It comes from ALL of the sources pushing heat into the atmosphere, from the convective and radiative heat transfer from the surface.

###### (Emphasis in the original.)

Perhaps it's just my physics training, but I never understood the primacy some (even scientists) place on convection in terms of climate. I mean, sure, a bunch of energy can come into the convection-dominated part of the climate system, and it might reside there for a duration, perhaps a long one, but, really, that doesn't matter. Eli's point is that if a different bunch of the same amount doesn't leave the climate system, the system will warm. And it doesn't matter how long the first bunch stays in the climate system, or what path it takes through it, or anything of the kind.

So, to me, the idea that the various oscillations, like the NAO or PDO or ENSO, somehow have something substantial to do with the overall climate problem is specious. Yeah, there are big energy flows from one part of the system to another, just as there are big flows of Carbon to and from oceans and atmosphere, but that's just slosh, and the point is the net balance. And human emissions of (about) 10 GtC per annum are affecting that a lot.

## Blackbody radiation and the greenhouse effect, via plates (from Eli Rabett)

See Eli’s post.

Excerpt:

Eli can keep on adding plates, Ms. Rabett has gone out to buy some extras. Here is the red plate special. If somebunny works it through they will find that b’=3/4 a, go another plate and, as Christian pointed out, now b’ has increased to 4/5 a and so on.

Eli has not said anything about how the heat is being transferred, radiation, convection or conduction but since heat transfer, no matter the mechanism, is always proportional to temperature, the temperature of the blue plate must increase as more plates are added.

## The Democrats have no plan to address Climate Change (either)

… [T]he Democratic Party does not have a plan to address climate change. This is true at almost every level of the policy-making process: It does not have a consensus bill on the issue waiting in the wings; it does not have a shared vision for what that bill could look like; and it does not have a guiding slogan—like “Medicare for all”—to express how it wants to stop global warming.

Many people in the party know that they want to do something about climate change, but there’s no agreement about what that something may be.

This is not for lack of trying. Democrats have struggled to formulate a post-Obama climate policy because substantive political obstacles stand in their way. They have not yet identified a mechanism that will make a dent in Earth’s costly, irreversible warming while uniting the many factions of their coalition. These problems could keep the party scrambling to face the climate crisis for years to come.

This remains true. The only Democrats in the national view who keep mitigation of climate change in focus are Senator Bernie Sanders and Senator Sheldon Whitehouse. In fact, Senator Sanders and Senator Whitehouse are the only ones with plans, this being Senator Sanders's and this being Senator Whitehouse's, quite contrary to the impression The Atlantic article gives. Also, the claim that "Unlike Clinton's policies, Sanders would surely have required a Democratic Congress to enshrine his policies" is completely disingenuous. Only the most limited policies can be enacted without Congress, but that should never be a reason for failing to champion them, or for making excuses about why they can't be done, like President Obama's insistence that we cannot sacrifice economic growth in the pursuit of climate mitigation.

So, I would suggest that what The Atlantic and I mean here is that the standard, vanilla-flavored Democratic Party has no idea about what to do, and it doesn’t really care. What it cares about is winning, and it will compromise on policy in order to make that happen.

This is predominantly why Claire and I are so supportive of Bob Massie as Democratic candidate for governor of Massachusetts. See his position on climate change.

It's more tiring to say it again than it is to listen to it, but we are running out of time, and the economic costs of doing something real in time to stop awesome, amazing, and recurring harm from climate change increase by the month.

We determine the point of no return (PNR) for climate change, which is the latest year to take action to reduce greenhouse gases to stay, with a certain probability, within thresholds set by the Paris Agreement. For a 67% probability and a 2K (Kelvin) threshold, the PNR is the year 2035 when the share of renewable energy rises by 2% per year. We show the impact on the PNR of the speed by which emissions are cut, the risk tolerance, climate uncertainties and the potential for negative emissions.

In short, both political parties — and especially the Democrats, since they claim to know better — are failing the United States Constitution and the people of the United States:

Preamble. We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

Amendment XIV (Ratified July 9, 1868)

Section 1.
All persons born or naturalized in the United States, and subject to the jurisdiction thereof, are citizens of the United States and of the State wherein they reside. No State shall make or enforce any law which shall abridge the privileges or immunities of citizens of the United States; nor shall any State deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws.

Indeed, given this situation, as I’ve mentioned before, I really wonder if the Constitution of the United States is up to this challenge, because it lacks the mechanism to achieve this. Of course, given that Congresses and Presidents disregard the Constitution, notably

Article. VI.

… This Constitution, and the Laws of the United States which shall be made in Pursuance thereof; and all Treaties made, or which shall be made, under the Authority of the United States, shall be the supreme Law of the Land; and the Judges in every State shall be bound thereby, any Thing in the Constitution or Laws of any State to the Contrary notwithstanding.

with respect to, say, the United Nations Framework Convention on Climate Change, which remains one of the "treaties in force" considered so by the U.S. State Department (page 515).

This is why Juliana v United States is so essential. See the compact details.

## “Why we need Jean-Luc Picard in 2018”

See the story, by Daniel W Drezner. On CBS All Access.

Yes, “Make it so.”

## What will happen to fossil fuel-fired electric bills everywhere, eventually, including those fired by natural gas

See Cost of Coal: Electric Bills Skyrocket in Appalachia as Region’s Economy Collapses, by James Bruggers at Inside Climate News. Excerpt:

The common denominator is American Electric Power, one of the nation’s largest utilities. It owns Kentucky Power, along with two subsidiaries in neighboring West Virginia, Wheeling Power and Appalachian Power.

In May, Wheeling Power and Appalachian Power requested permission from the Public Service Commission of West Virginia to boost their monthly residential bill 11 percent because of declining sales. That was on top of a 29 percent increase between 2014 and 2018.

Customers in both states are furious that the regulators are going along.

“Our jobs available in this state are not a living wage, and many are working two, three jobs just to make it,” wrote Elizabeth Bland of Beckley, West Virginia, in her protest letter to the commission. “Please turn down this request from Appalachian Power for the sake of all West Virginians.”

Rising rates are just part of the problem.

Kentucky Power’s monthly bill also includes surcharges, and a line for each customer’s share of the utility’s fixed costs. These add up in precious dollars.

"They're doubling down on coal at a time when coal is not competitive," said James M. Van Nostrand, a professor at the West Virginia University College of Law with decades of experience in the energy field. "It's really tragic."

The average bill per customer at Kentucky Power has been among the highest in the nation for an investor-owned utility, according to 2016 numbers from the U.S. Energy Information Agency, the most recent comparisons available.

"We're hit hard," Alice Craft, a Whitesburg-area resident, told InsideClimate News. "The power companies, they are just greedy, greedy, greedy."

This will inevitably happen to all regions depending primarily upon fossil-fuel-fired electricity, including Massachusetts, with consequences for the public, for utility shareholders, for local real estate property values, and for local business expansion. Accordingly, the actions of the Massachusetts House on recent energy legislation are incredibly myopic, to say the least, and do not support the stated goals of House leadership, especially those of Democratic House Speaker Robert DeLeo to look out for "the little guy". His actions say he's looking out for utility and energy companies, and the interests of AIM, whatever he says his motivations are.

## Censorship of Science by the administration of President Donald Trump

… President Trump has directed EPA and DOI to reconsider regulations adopted to control greenhouse gas emissions, despite the wealth of data showing that those emissions are the key cause of climate change. Faced with this contradiction, both agencies have sought to downplay the science, including by restricting the availability of information (for example, by removing climate data from websites and deleting references to humans’ role in climate change from reports). Similar action has also been taken by a raft of other entities, with the SST indicating that at least 20 different federal bodies, including both Congress and the White House, have attempted to restrict access to scientific information or otherwise silence science …

## “All of Monsanto’s problems just landed on Bayer” (by Chris Hughes at Bloomberg)

Monsanto has touted Roundup (whose active ingredient is glyphosate, more properly $\textbf{\texttt{N-(phosphonomethyl)glycine}}$) as a safe remedy for weed control, often in the taming of so-called "invasive species". It's used on playfields where children are exposed to it, including, apparently, in my home town of Westwood, Massachusetts.

### But what is it good for? Case 1: Markov chain transition matrices

Consider again (1):

$(1')\,\,\,\left[ \begin{array} {c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right] = \left[ \begin{array} {ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array} \right] \left[ \begin{array} {c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right]$

This also happens to be the template for a 3-state Markov chain, a structure with many applications.

The following example is taken from the famous paper by Rabiner, as presented by Resch:

• L. R. Rabiner, “A tutorial on Hidden Markov Models and selected applications in speech recognition”, Proceedings of the IEEE, February 1989, 77(2), DOI:10.1109/5.18626.
• B. Resch, “Hidden Markov Models”, notes for the course Computational Intelligence, Graz University of Technology, 2011.
They begin with the transition diagram:

which if cast into the form of (1′) and (2) looks like:

$(18)\,\,\,\left[ \begin{array} {c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right] = \left[ \begin{array} {ccc} 0.8 & 0.05 & 0.15 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{array} \right] \left[ \begin{array} {c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right]$

The rows, top-to-bottom, are labeled sunny, rainy, and foggy, as are the columns, left-to-right. Cell $(i,j)$ gives the probability for going from state $i$ to state $j$. For example, the probability of going from sunny to foggy is 0.15. Here’s a prettier rendition from Resch:

Resch and Rabiner go on to teach Hidden Markov Models ("HMMs"), where $\mathbf{A}$ is not known and, moreover, the weather is not directly observed. Instead, information about the weather is obtained by observing whether or not a third party takes an umbrella to work. Here, however, suppose the weather is directly known. And suppose $\mathbf{A}$ is known, except that nothing is known about what happens after foggy, other than the probability that it remains foggy. Symbolically:

$(19)\,\,\,\left[ \begin{array} {c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right] = \left[ \begin{array} {ccc} 0.8 & 0.05 & 0.15 \\ 0.2 & 0.6 & 0.2 \\ a_{31} & a_{32} & 0.5 \end{array} \right] \left[ \begin{array} {c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right]$

Note in (18) or Resch's tableau how the rows each sum to one. This is a characteristic of first order Markov models: once in a state, the transition has to go somewhere, even if it is to stay in that state. Transitions can't just cause the system to disappear, so all the outgoing probabilities need to sum to one. This means, however, that when what happens after foggy is introduced, there aren't two unconstrained parameters, there is only one. Accordingly, rather than introducing $a_{32}$, I could write $a_{32} = 0.5 - a_{31}$. As it turns out, in my experience with nloptr, it is often better to specify this constraint explicitly, so the optimizer knows about it, rather than building it implicitly into the objective function, even at the price of introducing another parameter and its space to explore.
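As a concrete aside (a minimal sketch of my own, not from Rabiner or Resch), here is the matrix of (18) in R, with a check that each row sums to one and a draw of the next day's weather given a sunny day:

    # The transition matrix of (18); rows and columns are sunny, rainy, foggy.
    states<- c("sunny", "rainy", "foggy")
    A18<- matrix( c( 0.8, 0.05, 0.15,
                     0.2, 0.60, 0.20,
                     0.2, 0.30, 0.50 ),
                  nrow=3, ncol=3, byrow=TRUE, dimnames=list(states, states) )
    stopifnot( all( abs(rowSums(A18) - 1) < 1e-9 ) )   # each row sums to one
    A18["sunny", "foggy"]                      # 0.15, the sunny-to-foggy probability
    sample(states, size=1, prob=A18["sunny",]) # a draw of tomorrow's weather, given sunny today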

The challenge I'll pose here is somewhat tougher than that faced by HMMs. The data in hand are not a series of sunny, rainy, or foggy weather records. Because, say, the records were jumbled, all that's in hand is a count of how many sunny, rainy, and foggy days there were on the first day of each pair, and a count of the weather on the days which followed. In particular:

$(20)\,\,\,\left[ \begin{array} {c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right] = \left[ \begin{array} {c} 1020 \\ 301 \\ 155 \end{array} \right]$

meaning that, across the set of day pairs, the first day was sunny 1020 times, rainy 301 times, and foggy 155 times. Statistical spidey sense wonders how many observations are needed to pin down transition probabilities well, but let's set that aside for now. (At least it's plausible that if ordering information is given up, there might be a need for more count information.) And the count of what the weather was on the second days is:

$(21)\,\,\,\left[ \begin{array} {c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right] = \left[ \begin{array} {c} 854 \\ 416 \\ 372 \end{array} \right]$

or 854 sunny days, 416 rainy days, and 372 foggy days.

Note that, unlike in (16), here in (19) there is no need to pick upper and lower bounds on the values: these are probabilities, so by definition they are limited to the unit interval. But $a_{31} + a_{32} + 0.5 = 1$ always, so that constraint needs to be stated.

Here’s the code:
    P2.func<- function(x) {
      # Sunny, Rainy, Foggy
      stopifnot( is.vector(x) )
      stopifnot( 2 == length(x) )
      #
      a.31<- x[1]
      a.32<- x[2]
      #
      P2<- matrix( c( 0.8,  0.05, 0.15,
                      0.2,  0.6,  0.2,
                      a.31, a.32, 0.5
                    ), nrow=3, ncol=3, byrow=TRUE )
      return(P2)
    }

    objective2<- function(x) {
      stopifnot( is.vector(x) )
      stopifnot( 2 == length(x) )
      x.right<- matrix(c(1020, 301, 155), 3, 1)
      b<- matrix(c(854, 416, 372), 3, 1)
      P2<- P2.func(x)
      d<- b - P2 %*% x.right
      # L2 norm
      return( L2norm(d) )
    }

    constraint2<- function(x) {
      return( (x[1] + x[2] - 0.5) )
    }

    nloptr.options2<- list("algorithm"="NLOPT_GN_ISRES", "xtol_rel"=1.0e-4,
                           "print_level"=0, "maxeval"=100000, "population"=1000)

    Y2<- nloptr(x0=rep(0.5,2),
                eval_f=objective2,
                eval_g_eq=constraint2,
                lb=rep(0,2), ub=rep(1,2),
                opts=nloptr.options2
               )

    print(Y2)

    cat(sprintf("Y2 resulting estimates for a_{31}, a_{32} are: %.2f, %.2f\n",
                Y2$solution[1], Y2$solution[2]))

This run results in:

    Call:
    nloptr(x0 = rep(0.5, 2), eval_f = objective2, lb = rep(0, 2),
           ub = rep(1, 2), eval_g_eq = constraint2, opts = nloptr.options2)

    Minimization using NLopt version 2.4.2

    NLopt solver status: 5 ( NLOPT_MAXEVAL_REACHED: Optimization stopped
    because maxeval (above) was reached. )

    Number of Iterations....: 100000
    Termination conditions:  xtol_rel: 1e-04  maxeval: 1e+05
    Number of inequality constraints:  0
    Number of equality constraints:    1
    Current value of objective function:  0.500013288564363
    Current value of controls: 0.20027284199 0.29972776012

Y2 resulting estimates for $a_{31}, a_{32}$ are: 0.20, 0.30 
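As a quick check (a small sketch of my own, reusing `P2.func` and the counts above), plugging the recovered $a_{31} = 0.20$ and $a_{32} = 0.30$ back in reproduces the observed second-day counts closely:

    # Predicted second-day counts from the recovered transition parameters.
    P2.hat<- P2.func(c(0.20, 0.30))
    x<- c(1020, 301, 155)
    predicted<- as.vector(P2.hat %*% x)
    observed<- c(854, 416, 372)
    print(data.frame(predicted=predicted, observed=observed))
    # predicted is approximately (854.3, 415.6, 371.8), close to the observed counts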

Suppose some of the data is missing? In particular, suppose instead:

$(20a)\,\,\,\left[ \begin{array} {c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right] = \left[ \begin{array} {c} 1020 \\ r(\eta, 155, 1020) \\ 155 \end{array} \right]$

where $\eta$ is on the unit interval and so all that’s known is that $x_{2}$ is between 155 and 1020, that is, bounded by the other two terms in $\mathbf{x}$.
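The helper $r(\cdot)$ isn't defined in the code shown here; a definition consistent with its use below (an assumption on my part: a linear map of $\eta \in [0,1]$ onto the interval $[a, b]$) would be:

    # Assumed helper (not necessarily the post's own definition):
    # map eta in [0,1] linearly onto [a, b].
    r<- function(eta, a, b) {
      return( a + eta*(b - a) )
    }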

Now there is an additional parameter to search, $\eta$, and it is unconstrained apart from lying on the unit interval. The code for this is:

    P3.func<- function(x) {
      # Sunny, Rainy, Foggy
      stopifnot( is.vector(x) )
      stopifnot( 3 == length(x) )
      #
      a.31<- x[1]
      a.32<- x[2]
      # There's an x[3] but it isn't used in P3.func. See objective3.
      #
      P3<- matrix( c( 0.8,  0.05, 0.15,
                      0.2,  0.6,  0.2,
                      a.31, a.32, 0.5
                    ), nrow=3, ncol=3, byrow=TRUE )
      return(P3)
    }

    objective3<- function(x) {
      stopifnot( is.vector(x) )
      stopifnot( 3 == length(x) )
      x.right<- matrix(c(1020, r(x[3], 155, 1020), 155), 3, 1)
      b<- matrix(c(854, 416, 372), 3, 1)
      P3<- P3.func(x)
      d<- b - P3 %*% x.right
      # L2 norm
      return( L2norm(d) )
    }

    constraint3<- function(x) {
      stopifnot( 3 == length(x) )
      return( (x[1] + x[2] - 0.5) )
    }

    nloptr.options3<- list("algorithm"="NLOPT_GN_ISRES", "xtol_rel"=1.0e-4,
                           "print_level"=0, "maxeval"=100000, "population"=1000)

    Y3<- nloptr(x0=rep(0.5,3),
                eval_f=objective3,
                eval_g_eq=constraint3,
                lb=rep(0,3), ub=rep(1,3),
                opts=nloptr.options3
               )

    print(Y3)

cat(sprintf("Y3 resulting estimates for a_{31}, a_{32}, and eta are: %.2f, %.2f, %.2f\n", Y3$solution[1], Y3$solution[2], Y3$solution[3]))  The results are:  Call: nloptr(x0 = rep(0.5, 3), eval_f = objective3, lb = rep(0, 3), ub = rep(1, 3), eval_g_eq = constraint3, opts = nloptr.options3)  Minimization using NLopt version 2.4.2 NLopt solver status: 5 ( NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached. ) Number of Iterations....: 100000 Termination conditions: xtol_rel: 1e-04 maxeval: 1e+05 Number of inequality constraints: 0 Number of equality constraints: 1 Current value of objective function: 0.639962390444759 Current value of controls: 0.20055501795 0.29944464945 0.16847867543 Y3 resulting estimates for a_{31}, a_{32}, and $\eta$ are: 0.20, 0.30, 0.17, with that $\eta$ corresponding to 301  That 301 versus the true 372 isn’t too bad. An example of where this kind of estimation is done more generally, see: ### But what is it good for? Case 2: Learning prediction matrices When systems like (2) arise in cases of statistical regression, the matrix $\mathbf{A}$ is called a prediction or design matrix. The idea is that its columns represent sequences of predictions for the response, represented by the column vector $\mathbf{b}$, and the purpose of regression is to find the best weights, represented by column vector $\mathbf{x}$, for predicting the response. Consider (2) again but instead of $\mathbf{b}$ and $\mathbf{x}$ being column vectors, as in (5), they are matrices, $\mathbf{B}$ and $\mathbf{X}$, respectively. In other words, the situation is that there are lots of $(\mathbf{b}_{k}, \mathbf{x}_{l})$ pairs available. And then suppose nothing is known about $\mathbf{A}$, that is, it just contains nine unknown parameters: $(22)\,\,\,\mathbf{A} = \left[ \begin{array} {ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array} \right]$ There are, of course, questions about how many $(\mathbf{b}_{k}, \mathbf{x}_{l})$ pairs are needed in tandem with choice of number of iterations (see maxeval discussion in Miscellaneous Notes below.) Here, 8 pairs were used for purposes of illustration. 
$(23)\,\,\,\mathbf{X} = \left[\begin{array}{cccccccc} 1356 & 7505 & 4299 & 3419 & 7132 & 1965 & 8365 & 8031 \\ 5689 & 8065 & 7001 & 638 & 8977 & 1088 & 3229 & 1016 \\ 3777 & 8135 & 3689 & 1993 & 3635 & 9776 & 8967 & 7039 \end{array} \right]$

and

$(24)\,\,\,\mathbf{B} = \left[\begin{array}{cccccccc} 5215 & 13693 & 7265 & 4217 & 9367 & 10588 & 14372 & 12043 \\ 7528 & 17825 & 11024 & 4989 & 14860 & 9447 & 16162 & 13087 \\ 6161 & 12798 & 7702 & 3023 & 9551 & 8908 & 11429 & 8734 \end{array} \right]$

The code for this case is:

    objective4<- function(x) {
      stopifnot( is.vector(x) )
      stopifnot( 9 == length(x) )
      B<- matrix(c(5215, 13693,  7265, 4217,  9367, 10588, 14372, 12043,
                   7528, 17825, 11024, 4989, 14860,  9447, 16162, 13087,
                   6161, 12798,  7702, 3023,  9551,  8908, 11429,  8734
                  ), 3, 8, byrow=TRUE)
      X.right<- matrix(c(1356, 7505, 4299, 3419, 7132, 1965, 8365, 8031,
                         5689, 8065, 7001,  638, 8977, 1088, 3229, 1016,
                         3777, 8135, 3689, 1993, 3635, 9776, 8967, 7039
                        ), 3, 8, byrow=TRUE)
      P4<- matrix(x, nrow=3, ncol=3, byrow=TRUE)
      d<- B - P4 %*% X.right
      # L2 norm for matrix
      return( L2normForMatrix(d, scaling=1000) )
    }

    nloptr.options4<- list("algorithm"="NLOPT_GN_ISRES", "xtol_rel"=1.0e-6,
                           "print_level"=0, "maxeval"=300000, "population"=1000)

    Y4<- nloptr(x0=rep(0.5,9),
                eval_f=objective4,
                lb=rep(0,9), ub=rep(1,9),
                opts=nloptr.options4
               )

    print(Y4)

    cat("Y4 resulting estimates for A:\n")
    print(matrix(Y4$solution, 3, 3, byrow=TRUE))

The run results are:
    Call:
    nloptr(x0 = rep(0.5, 9), eval_f = objective4, lb = rep(0, 9),
           ub = rep(1, 9), opts = nloptr.options4)

    Minimization using NLopt version 2.4.2

    NLopt solver status: 5 ( NLOPT_MAXEVAL_REACHED: Optimization stopped
    because maxeval (above) was reached. )

    Number of Iterations....: 300000
    Termination conditions:  xtol_rel: 1e-06  maxeval: 3e+05
    Number of inequality constraints:  0
    Number of equality constraints:    0
    Current value of objective function:  0.0013835300300619
    Current value of controls:
        0.66308125177 0.13825982301 0.93439957114
        0.92775614187 0.63095968859 0.70967190127
        0.3338899268  0.47841968691 0.79082981177

    Y4 resulting estimates for A:
                  [,1]          [,2]          [,3]
    [1,] 0.66308125177 0.13825982301 0.93439957114
    [2,] 0.92775614187 0.63095968859 0.70967190127
    [3,] 0.33388992680 0.47841968691 0.79082981177

In fact, the held back version of $\mathbf{A}$ used to generate these test data sets was:

$(25)\,\,\,\mathbf{A} = \left[\begin{array}{ccc} 0.663 & 0.138 & 0.934 \\ 0.928 & 0.631 & 0.710 \\ 0.334 & 0.478 & 0.791 \end{array} \right]$

and that matches the result rather well. So, in a sense, the algorithm has “learned” $\mathbf{A}$ from the 8 data pairs presented.
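As a final check (a small sketch of my own, reusing the $\mathbf{X}$ and $\mathbf{B}$ of (23) and (24) and the `Y4` result above), the recovered $\mathbf{A}$ reproduces $\mathbf{B}$ with small relative residuals:

    # Compare B against A.hat %*% X, where A.hat is the estimate returned in Y4.
    A.hat<- matrix(Y4$solution, 3, 3, byrow=TRUE)
    X<- matrix(c(1356, 7505, 4299, 3419, 7132, 1965, 8365, 8031,
                 5689, 8065, 7001,  638, 8977, 1088, 3229, 1016,
                 3777, 8135, 3689, 1993, 3635, 9776, 8967, 7039), 3, 8, byrow=TRUE)
    B<- matrix(c(5215, 13693,  7265, 4217,  9367, 10588, 14372, 12043,
                 7528, 17825, 11024, 4989, 14860,  9447, 16162, 13087,
                 6161, 12798,  7702, 3023,  9551,  8908, 11429,  8734), 3, 8, byrow=TRUE)
    max( abs(B - A.hat %*% X) / B )   # maximum relative discrepancy; small here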