The Myth of Baseload Power (Amory Lovins)

Posted in Amory Lovins, Anthropocene, Bloomberg New Energy Finance, BNEF, Buckminster Fuller, capitalism, decentralized electric power generation, decentralized energy, design science, distributed generation, economics, energy efficiency, energy reduction, environment, Hermann Scheer, Hyper Anthropocene, ILSR, integrative design, John Farrell, Joseph Schumpeter, leaving fossil fuels in the ground, local generation, local self reliance, microgrids, petroleum, reasonableness, solar democracy, solar domination, solar energy, solar power, Spaceship Earth, sustainability, the energy of the people, the green century, the right to know, utility company death spiral, wind energy, wind power, zero carbon | 1 Comment

A “capacity for sustained muddle-headedness”

Hat tip to Paul Lauenstein and his physician brother for pointing me to the great insights of the late Dr Larry Weed:

Great lines, great quotes, a lot of humor:

  • “… a tolerance of ambiguity …”
  • “Y’know, Pavlov said you must teach a graduate student gradualness”
  • “… a final diagnosis is a myth …”
  • “… they were getting more information on what they did than what they had …”
  • “If you cannot evaluate what you’re doing, then there is a very serious possibility that you do not know what you are doing.”
  • “… they played `Sherlock Holmes’ too early. They asked the first question, and then the next question would be determined by the first question, because they were brought up in a CPC sort of atmosphere. `What do you think of next, doctor?’”

Unfortunately, it’s not clear that medicine, or statistics, has progressed much beyond 1971. Note the 1999 report from the National Academy of Sciences, To Err is Human.

The last quote reminds me of something I was taught in graduate school (in 1973, noting the above video is from 1971), when I took 6.871, Knowledge-based application systems, and medical decision support was covered as a subject. I distinctly recall that problems of software-physician interaction during the patient interview centered on the comparatively unstructured way in which physicians gathered information, and that they could not be constrained to using a diagnostic or taxonomic key as is popular in, say, Botany. The thought crossed my mind at the time: “How do we know if the approach the physicians are using is the most effective?” However, being a student, and knowing next to nothing about diagnostic medicine, I suppressed my doubts and took the advice as definitive. Given Dr Weed’s comments, I should have been more assertive with my doubts. And, unfortunately, the methods of practice which Dr Weed criticized have apparently become ingrained in decision support software for medicine. See E. H. Shortliffe, “Computer programs to support clinical decision making”, Journal of the American Medical Association, 258, 61-66, for their status as of 1986, some 15 years after Weed’s talk.

Incidentally, while I have found two additional references by medical authors who attribute the title phrase of this post to Alfred North Whitehead, that is, who claim that Whitehead mentioned a “capacity for sustained muddle-headedness”, my online research has failed to turn up that reference. The closest I can find is a mention by Lomax in an article in the Journal of Psychiatric Practice, 17(1), January 2011, on page 46, where he cites Whitehead’s 1938 book Modes of Thought and writes

Alfred Whitehead said that the job of the philosopher was “living with sustained muddle-headedness”.

Whitehead liked the term muddle-headed, even applying it to himself. I doubt he ever meant it, however, in exactly the way Dr Weed and colleagues used it.

Posted in American Association for the Advancement of Science, American Statistical Association, anemic data, Bayesian, cardiovascular system, David Spiegelhalter, machine learning, Massachusetts Institute of Technology, medicine, Paul Lauenstein, rationality, reason, reasonableness, risk, statistics | Leave a comment

“Explosive methane will create two million jobs!”

The American Petroleum Institute has trotted out a commissioned study claiming an increase of two million jobs in 2040 off a base of four million (in 2015).

First, these are not people working with development or distribution of natural gas. 44% are “end user” workers, that is, someone someplace who works to “… convert natural gas and its associated liquids to electricity, petrochemical and other products and the industries that manufacture, sell, install and maintain gas-fired appliances and equipment used in the residential, commercial, vehicle and industrial sectors”. (API, “Key Observations and Findings”) The implication is that if natural gas went away, so would these 44% of jobs. That is not correct. The report further defines these as

The largest NAICS codes associated with the end-use segment are Chemical Manufacturing, Gas-fired Electric Power Generation, Power Boiler and Heat Exchanger Manufacturing, Household Appliance Repair and Maintenance, and Industrial Process Furnace and Oven Manufacturing. The end-use segment also includes portions of the jobs related to Industrial Equipment and Machinery Repair and Maintenance, Industrial Construction, Freight Trucks, Turbine and Turbine Generator Set Manufacturing, Iron and Steel Pipe and Tube Manufacturing, and Freight Rail.

While it is not clear what “largest” means here, nevertheless there is no notion of substitution or displacement. The natural gas industry is really responsible for jobs in “Household Appliance Repair and Maintenance” and “Chemical Manufacturing”?

Second, 30% of the claimed jobs are in fact directly associated with natural gas mining, production and distribution. These are fully tied to natural gas.

Third, 25% of the claimed jobs consist “…of oil and gas production companies and their suppliers of goods and services…”, or, to quote their NAICS descriptions,

The largest NAICS Codes primarily associated with the production segment include Support Activities for Oil and Gas Operations, Crude Petroleum and Natural Gas Extraction, Drilling Oil and Gas Wells, and Oil and Gas Field Machinery Manufacturing.

Natural gas can be credited with all of these jobs?

Fourth, the study limited itself to 2015-2016 natural gas growth scenarios, and cherry-picked data from the U.S. EIA Annual Energy Outlook for those years. In particular, here are the case studies the report chose:

In contrast, EIA AEO scenarios from 2017 show some projections with flat natural gas consumption, per:

Presumably because there is no increase in use of natural gas, there is also no increase in numbers of jobs.

So, I conclude, in a best case scenario, about 600,000 jobs could be added by 2040 for which the natural gas industry is responsible, and perhaps another 200,000, depending upon how the other categories are counted. There is no allocation that I can see to jobs fixing existing pipeline infrastructure, which don’t increase as production does. The method used to make these projections, as far as I can tell, is linear fits based upon historical employment numbers.
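As a rough check on the 600,000 figure, here is a minimal sketch that applies the segment shares quoted above to the claimed increase of two million jobs. Reading the 600,000 as the 30% directly tied share is an assumption made for illustration, and the names `SEGMENT_SHARES`, `CLAIMED_INCREASE`, and `jobs_by_segment` are hypothetical; the report itself may allocate the increase differently.

```python
# Shares of the claimed jobs by segment, as quoted from the API report above.
# The three shares sum to 99%; the report excerpt does not say where the rest goes.
SEGMENT_SHARES = {
    "end_use": 0.44,      # "end user" workers
    "direct": 0.30,       # mining, production and distribution of natural gas
    "production": 0.25,   # oil and gas production companies and their suppliers
}

# Claimed increase in jobs by 2040 (assumption: shares apply uniformly to it).
CLAIMED_INCREASE = 2_000_000


def jobs_by_segment(total, shares):
    """Split a total job count across segments in proportion to each share."""
    return {name: round(total * share) for name, share in shares.items()}


allocation = jobs_by_segment(CLAIMED_INCREASE, SEGMENT_SHARES)
for name, jobs in allocation.items():
    print(f"{name:>10}: {jobs:,}")
```

Under that assumption the directly tied segment accounts for 600,000 of the claimed two million; the additional 200,000 depends on how much of the other two segments one is willing to credit to natural gas, which the sketch does not try to settle.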

Incidentally, the API report contains an interesting Appendix B which compares the jobs intensity for construction of natural gas plants, nuclear, coal, onshore wind, and two types of solar, sourcing modules from the U.S. or China. While there are plenty of examples of unfair comparison in that Appendix, including failing to account for additional decrease in price for solar modules between now and 2040, the study produces the result that for new wind, solar, and natural gas construction, the job creation intensity is about the same. For some reason they excluded offshore wind.

Finally, as I’ve noted before, there is nothing natural about natural gas. It is explosive methane. Natural gas ain’t granola.

Posted in American Petroleum Institute, Anthropocene, being carbon dioxide, Carbon Worshipers, disingenuity, economics, EIA, emissions, explosive methane, false advertising, fossil fuels, gas pipeline leaks, Hyper Anthropocene, methane, natural gas, petroleum, pipelines, the right to be and act stupid, the show, the tragedy of our present civilization, tragedy of the horizon | 1 Comment

Solar Costs at `Jaw-Dropping Lows`; `No Evidence That Changing Power Mix Endangers Electric System Reliability`

From GTM:

`Solar Moves in a Curious Direction Since Trump Quit Paris Deal: Up’

There is “[n]o evidence that the changing power mix endangers electric system reliability”. Two reports:

Also see Bigger is Not Better: Grid Modernization and the Antiquated Concept of ‘Baseload’, and in particular the comment by Gene Grindle to that post.

And “Henbest: Energy to 2040 — Faster Shift to Clean, Dynamic, Distributed”:

As some of the coal and nuclear plants face retirement decisions, focusing on their status as “baseload” generation is not a useful perspective for ensuring the cost-effective and reliable supply of electricity. Instead, system planners, market administrators such as regional effectively and efficiently defines and measures system needs and (b) develops planning tools, scheduling processes, and market mechanisms to elicit and compensate broad range of resources
that have become available to meet those needs. Fortunately, planners and operators have been hard at work at such innovations and have moved past the concept of “baseload” to focus on the attributes of resources and the services they provide to the system that help the modernized electricity system operate more reliably, efficiently, and nimbly. While coal and nuclear power plants—as well as a broad range of other resource types—are recognized for providing a wide range of reliability services to the grid, the traditional definition of power supply resource adequacy is being revisited by some system operators and planners. Still, additional work is needed in planning and markets to better recognize and compensate resources for the value they provide to the system, and to incorporate the environmental impacts of electricity generation, including resources’ ability to reduce the system’s greenhouse gas emissions, consistent with public policy goals.

Coal and nuclear plants do not provide unique operational services that are specifically identified by or correlated with the term “baseload” generation. The term does not reflect the broader range of services that various resources can provide. As system planning and electricity market design are modernized, it is becoming increasingly clear that the services and attributes most under-recognized by today’s markets are greenhouse gas emissions in some jurisdictions and operational flexibility. A resource is considered flexible when it can react to operational signals to ramp its power generation up and down to help meet the needs of the system over multiple hours and minute-to-minute. Flexible resources can cost-effectively assist with meeting changing system loads and integrating the variable output of renewable resources. These flexibility needs are rapidly expanding as a result of numerous industry trends: (a) recognition by policymakers that renewable energy resources are needed to meet long-term emissions reductions goals; (b) customers’ increasing desire to voluntarily procure renewable energy or generate electricity on-site; and (c) substantial technological improvements that have driven down the cost of renewable resources to the point where, even before accounting for tax incentives, they are the lowest-cost option for new generating plants in some regions of the country.

(From Advancing Past “Baseload” to a Flexible Grid)

Posted in American Solar Energy Society, Anthropocene, Bloomberg, Bloomberg New Energy Finance, BNEF, bridge to somewhere, Buckminster Fuller, clean disruption, climate business, corporate litigation on damage from fossil fuel emissions, decentralized electric power generation, decentralized energy, demand-side solutions, destructive economic development, distributed generation, Donald Trump, economics, electricity markets, engineering, force multiplier, fossil fuel divestment, fossil fuel infrastructure, Green Tech Media, grid defection, Hyper Anthropocene, ILSR, investment in wind and solar energy, Joseph Schumpeter, leaving fossil fuels in the ground, local generation, local self reliance, microgrids, public utility commissions, reasonableness, Sankey diagram, solar democracy, solar domination, solar energy, solar power, Spaceship Earth, stranded assets, sustainability, the energy of the people, the green century, the right to be and act stupid, the value of financial assets, Tony Seba, tragedy of the horizon, utility company death spiral, wind energy, wind power, zero carbon | 5 Comments

Defying technology, trends … nay, defying Mathematics!

The creatures from Trumpland are planning an upcoming Energy Week, probably to lead up to the Fourth of July celebrations. Our Orange Leader

… will tout surging U.S. exports of oil and natural gas during a week of events aimed at highlighting the country’s growing energy dominance.

[He] also plans to emphasize that after decades of relying on foreign energy supplies, the U.S. is on the brink of becoming a net exporter of oil, gas, coal and other energy resources.

(Brief excerpt from the Bloomberg article on the subject)

Trouble is, this defies trends and the cost curves of wind and especially solar technology, as noted by Bloomberg New Energy Finance and Lazard. Worse, as energy supplies are more constrained in their sources and demand more exotic methods to extract, either the price per unit needs to increase, or there needs to be a greater subsidy from the federal government. Fossil fuels in the USA already receive huge subsidies: Consider the FERC-directed eminent domain takings which pipeline companies receive, rather than having to buy the land their pipes cross and despoil. (In contrast, consider what Amtrak has to do for its rights of way.) As additional supplies of fossil fuel are dumped into the marketplace, prices are depressed, especially for explosive methane (“Natural gas ain’t granola”).

Worse, their prices are at best constant, and, over time, increase as the reserves get rarer, whereas renewables, without subsidies, will be cheaper than costs of transmission for electricity in the early 2020s.

This is matching a linear curve with an upslope against exponentially decaying curves. Trumpland wants overseas consumption, but at what price? Cheaper than, say, coal dug in China with fewer health and workplace protections? With lower transportation costs?
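To make the shape of that comparison concrete, here is a minimal sketch of a linearly rising cost set against an exponentially decaying one. Every number in it (starting costs, slope, decay rate) is a made-up illustrative parameter, not data from BNEF, Lazard, or anyone else, and the function names are hypothetical. The only point is that, whatever the starting gap, a decaying curve eventually drops below a rising line.

```python
import math


def fossil_cost(t, start=50.0, slope=0.5):
    """Linearly rising cost per unit after t years (illustrative numbers only)."""
    return start + slope * t


def renewable_cost(t, start=100.0, decay=0.1):
    """Exponentially decaying cost per unit after t years (illustrative numbers only)."""
    return start * math.exp(-decay * t)


def first_crossover_year(max_years=50):
    """Return the first whole year in which the decaying cost is below the rising one."""
    for t in range(max_years + 1):
        if renewable_cost(t) < fossil_cost(t):
            return t
    return None  # no crossover within the horizon


year = first_crossover_year()
print("crossover year:", year)
```

With these invented parameters the renewable curve starts at twice the fossil cost and still undercuts it within a decade. Different parameters move the crossover year, but not the conclusion that one occurs.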

What do the markets think? Trumpland is happy to quote, take credit for, and lie about increases in jobs (e.g., increases in numbers of jobs in coal as cited by EPA head Pruitt), but how have energy sources performed since the junta took office?


Not too well.

In contrast, have a look at wind and solar investment (*):

I daresay that solar via TAN:ETF has perked up recently, after a time of doldrums.

By Trumpland’s criteria, they aren’t doing too well. It is a tall order to provide convincing evidence of how the United States is going to defy headwinds, develop markets, and avoid having climate damage reparations set against it, let alone financially succeed. It’s possible that Trumpland is trying to set up a protection racket, consistent with organized crime, where if the rest of Earth doesn’t want their planet trashed, then the USA should get paid off. But, if that is the objective, it speaks for itself.

And it won’t work. I’ve detailed many times elsewhere here why.

Update, 2017-07-01

From Kevin Book of the Center for Strategic & International Studies, in his “An energy policy of dominance”, 28th June 2017:

Governments in the United States can position the country for dominance by rationalizing disparate policies that muddy price signals for private industry. For example, the $1/gallon federal biodiesel tax credit implies a CO2 price of ~$196 per metric ton. By contrast, the ¢24.4/gallon federal diesel tax corresponds to a CO2 price of ~$24 per metric ton. Federal wind energy tax credits of $24/megawatt hour imply ~$54 per metric ton, but the nine-state Regional Greenhouse Gas Initiative auctioned carbon allowances in June for ~$2.80 per metric ton. Paying green energy producers 10 to 20 times more to abate greenhouse gases than we charge fossil fuel consumers for emitting them is distortion, not dominance.

That, incidentally, shows how silly is the claim that government subsidies support renewables and that this is why they’re winning.
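The figures in Book’s excerpt can be cross-checked with a little arithmetic. The sketch below is only that, a sketch: it assumes an implied carbon price is simply the incentive per unit divided by the tonnes of CO2 attributed to that unit, and it backs out the emission factors that the quoted prices imply rather than taking them from any official source. The function names are hypothetical.

```python
def implied_co2_price(incentive_per_unit, tonnes_co2_per_unit):
    """Dollars per metric ton implied by an incentive on one unit of energy or fuel."""
    return incentive_per_unit / tonnes_co2_per_unit


def implied_tonnes_per_unit(incentive_per_unit, co2_price):
    """Tonnes of CO2 per unit that make an incentive correspond to a given CO2 price."""
    return incentive_per_unit / co2_price


# Figures as quoted in the excerpt (dollars per unit, dollars per metric ton).
biodiesel_credit, biodiesel_price = 1.00, 196.0   # per gallon
diesel_tax, diesel_price = 0.244, 24.0            # per gallon
wind_credit, wind_price = 24.0, 54.0              # per megawatt hour
rggi_price = 2.80                                 # auction price per metric ton

# Emission factors implied by the quoted pairs (backed out, not sourced).
diesel_tonnes_per_gallon = implied_tonnes_per_unit(diesel_tax, diesel_price)
wind_tonnes_per_mwh = implied_tonnes_per_unit(wind_credit, wind_price)

# How many times more is paid to abate than is charged to emit.
fuel_ratio = biodiesel_price / diesel_price
power_ratio = wind_price / rggi_price

print(f"diesel: {diesel_tonnes_per_gallon * 1000:.2f} kg CO2 per gallon implied")
print(f"wind:   {wind_tonnes_per_mwh:.3f} t CO2 per MWh implied")
print(f"fuel ratio {fuel_ratio:.1f}x, power ratio {power_ratio:.1f}x")
```

On the quoted numbers the two ratios come out at roughly 8 and 19, the same ballpark as the “10 to 20 times” in the excerpt.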

Also, outgoing FERC member Colette Honorable says “I don’t see any problems with reliability, and I say bring on more renewables”.

(*) Statement of interest: Both my wife, Claire, and I have holdings in the FAN and TAN ETFs. Also, this is not intended as any kind of financial advice. You should consult with a professionally licensed advisor before acting on this information.

Posted in American Petroleum Institute, American Solar Energy Society, Anthropocene, being carbon dioxide, Bloomberg, Bloomberg New Energy Finance, BNEF, bridge to nowhere, bridge to somewhere, carbon dioxide, Carbon Worshipers, clean disruption, climate business, climate economics, corporate litigation on damage from fossil fuel emissions, corporations, destructive economic development, Donald Trump, economics, electricity markets, energy, energy utilities, evidence, explosive methane, exponential growth, extended supply chains, false advertising, FERC, fossil fuel divestment, fossil fuel infrastructure, fossil fuels, fracking, global blinding, greenhouse gases, Humans have a lot to answer for, Hyper Anthropocene, investing, investment in wind and solar energy, investments, Joseph Schumpeter, leaving fossil fuels in the ground, military inferiority, Minsky moment, pipelines, politics, pollution, rights of the inhabitants of the Commonwealth, risk, solar democracy, solar domination, solar energy, solar power, the energy of the people, the green century, the problem of evil, the right to be and act stupid, the stack of lies, the tragedy of our present civilization, the value of financial assets, tragedy of the horizon, United States, utility company death spiral, wind energy, wind power | Leave a comment



Today, the City of Boston released a website that features information scrubbed from the U.S. EPA Climate Change website.

ref: Azimuth Backup Project

Just another band out of Boston …

Posted in Azimuth Backup Project, climate change, climate data, climate models, Environmental Protection Agency | Leave a comment

Causation and the Tenuous Relevance of Philosophy to Modern Science

I was asked by ATTP at their blog:

Which bit of what Dikran said do you disagree with? It certainly seems reasonable to me; if you want to explain how something could cause something else, you need to use more than just statistics.

After consideration, I posted a long explanation, worthy of a blog post on its own. But I’m leaving it there, and just putting a link to it here.

Posted in causation, john d norton, philosophy of science, science | Leave a comment

Deloitte: The drumbeats for the extinction of utilities have begun

Deloitte Resources 2017 Study — Energy management: Sustainability and progress.

From The Economist, 25th February 2017:

FROM his office window, Philipp Schröder points out over the Bavarian countryside and issues a Bond villain’s laugh: “In front of you, you can see the death of the conventional utility, all financed by Mr and Mrs Schmidt. It’s a beautiful sight.”

Microgrids where the Big Grids don’t go.

Why microgrids.

Update, 2017-09-14

Quote from Brian Bentz, CEO of Canadian electricity provider Alectra Utilities:

Someone’s going to cannibalize our business — it may as well be us. Someone’s going to eat our lunch. They’re lining up to do it. I think repositioning, redefining the grid from what it is as sort of a passive purveyor of energy to one that is an enabler of transactive energy at the grassroots level is a much more sustainable, profitable strategy.

Posted in Bloomberg, Bloomberg New Energy Finance, BNEF, bridge to somewhere, Buckminster Fuller, clean disruption, CleanTechnica, decentralized electric power generation, decentralized energy, destructive economic development, disruption, distributed generation, economics, EIA, electrical energy storage, energy utilities, engineering, finance, fossil fuel divestment, green tech, grid defection, investment in wind and solar energy, John Farrell, Joseph Schumpeter, leaving fossil fuels in the ground, local generation, local self reliance, Mark Jacobson, Massachusetts Clean Energy Center, microgrids, public utility commissions, PUCs, rate of return regulation, reason, reasonableness, solar democracy, solar domination, solar energy, solar power, Techno Utopias, the energy of the people, the green century, the value of financial assets, Tony Seba, wind energy, wind power, zero carbon | 1 Comment

Statements by the Ecological Society of America on the proposed U.S. exit from the Paris Agreement, and on Climate Change

By withdrawing from the Paris Agreement on climate change, the United States is abdicating its role as the world leader in using science-based information to inform policy. Business, political, and scientific leaders the world over are condemning the decision. More than 190 signatory nations agreed to take actions towards reducing future temperature increases and addressing the serious threats posed by a changing climate to people, livelihoods, and nature. The science-based evidence is clear that humans are driving climate change.

Read the full statement.

And read the ESA’s statement on “Ecosystem management in a changing climate”. An excerpt:

Management strategies have traditionally operated under the assumption that natural systems fluctuate within a certain range—the past has served as an indicator of future conditions. But this assumption does not hold in the face of rapid climate change. Even conservative warming projections show that natural systems will experience unprecedented stresses, including shifting habitats and ecological processes (e.g. wildlife migration and reproduction) and more frequent and severe natural disturbances, such as fires, floods, and droughts. These unavoidable changes will require management that addresses ecological thresholds, tipping points, and other sources of uncertainty. Ecosystems are naturally dynamic and diverse—they are the products of change and adaptation. But human activity has impaired the ability of many systems to respond. Preserving natural function is central to maintaining resilience and safeguarding ecosystem services in the face of climate change.

I am a member of the Ecological Society of America (ESA). Great technical literature! Interesting problems!

Posted in adaptation, American Association for the Advancement of Science, American Statistical Association, Anthropocene, argoecology, Carl Safina, climate change, climate disruption, complex systems, ecological services, ecology, Ecology Action, environment, global warming, Hyper Anthropocene, marine biology, mesh models, model-free forecasting, population biology, population dynamics, quantitative biology, quantitative ecology, science, Science magazine, Spaceship Earth, sustainability, Takens embedding theorem, the tragedy of our present civilization, the value of financial assets, tragedy of the horizon, Wordpress, zero carbon | Leave a comment

Installed Non-Utility Solar, Massachusetts, 12/2016


Posted in Bloomberg New Energy Finance, BNEF, clean disruption, CleanTechnica, decentralized electric power generation, decentralized energy, destructive economic development, distributed generation, electricity markets, energy utilities, engineering, feed-in tariff, green tech, grid defection, ILSR, investment in wind and solar energy, ISO-NE, John Farrell, local generation, local self reliance, making money, Massachusetts Clean Energy Center, microgrids, New England, regime shifts, RevoluSun, solar democracy, solar domination, solar energy, solar power, stranded assets, SunPower, sustainability, the energy of the people, the green century, the value of financial assets, utility company death spiral, zero carbon | Leave a comment

Prayer Vigil for the Earth, Needham Common, Massachusetts, 4 June 2017

Avoiding [in excess of] 2 degrees [Celsius] warming `is now totally unrealistic.”’

Led by our own UU Needham Reverend Catie Scudera, with Reverend Daryn Bunce Stylianopoulos of the First Baptist Church of Needham and Reverend Jim Mitulski of the Congregational Church of Needham, members of their combined congregations held a vigil on Sunday, 4th June 2017, singing songs of reflection and protest in response to Thursday’s announcement by President 45 that he would withdraw the United States from the Paris Climate agreement.

In response …

Why? Oh, why not try something from an episode on Tyson’s excellent Cosmos:


(properly, for pay)

(YouTube, for pay)

And, for background to this, read Ogden and Sleep.

In the world in excess of +2°C, dragons prowl. The most obvious are methane clathrates, but who knows what else.

And, while solemn, the songs, the sun, and the moment were memorable.

And there is some good news, even if it is partial.

Posted in Anthropocene, climate disruption, Hyper Anthropocene, methane, Reverend Catie Scudera, Unitarian Universalism, UU Needham | Leave a comment

Disney’s Robert Iger resigns from Trump advisory panel over the Trumpistas’ decision to quit COP

“Protecting our planet and driving economic growth are critical to our future, and they aren’t mutually exclusive,” he said in a statement. “I deeply disagree with the decision to withdraw from the Paris Agreement and, as a matter of principle, I’ve resigned from the President’s advisory council.”

— Disney’s CEO, Robert Iger

Story from Variety.

Disney’s environmental commitment.

Disney’s environmental goals and targets.

Posted in Anthropocene, Buckminster Fuller, carbon dioxide, citizenship, climate business, climate change, climate disruption, climate economics, COP21, destructive economic development, Disney, Donald Trump, environment, Florida, global warming, Hyper Anthropocene, Robert Iger, Walt Disney Company | Leave a comment

GForce Waste Sorters!

Waste management at the Boston Red Sox.

Check out their Wide World of Waste.

Posted in ecology, Ecology Action, economics, materials science, recycling, resiliency, solid waste management, sustainability | Leave a comment

`Exxon Shareholders Approve Climate Resolution: 62% Vote for Disclosure’

Flash from InsideClimate News:

ExxonMobil shareholders voted Wednesday to require the world’s largest oil and gas company to report on the impacts of climate change to its business—defying management, and marking a milestone in a 28-year effort by activist investors.

Sixty-two percent of shareholders voted for Exxon to begin producing an annual report that explains how the company will be affected by global efforts to reduce greenhouse gas emissions under the Paris climate agreement. The analysis should address the financial risks the company faces as nations slash fossil fuel use in an effort to prevent worldwide temperatures from rising more than 2 degrees Celsius.
… [I]nstitutional investors argue that climate risk is a long-term financial risk that should be integrated into financial reporting.

BlackRock, the world’s largest investment firm, with $5.1 trillion in assets under management, and several major global investors—including State Street, Aviva, and Legal & General—have signaled that they want more transparency on climate change risk. BlackRock’s first vote against corporate management on climate came this year against Occidental, where it was the largest institutional investor.
Patrick Doherty, director of corporate governance for the New York State Office of the State Comptroller, which spearheaded the Exxon resolution along with the Church of England, said that climate is a very real financial concern for the employees paying into state pension funds and looking to payouts decades into the future. The New York State Common Retirement Fund, one of the world’s largest public employees investment funds, holds more than $1 billion in Exxon stock.

“We have a very, very strong financial interest in the long-term health of the company,” Doherty said.

“The average CEO has a tenure of five years, and hedge funds are looking to maybe the next quarter,” he said. “Only institutional investors have this longer view. And one of the reasons that support for climate disclosure has been increasing over the years is more and more institutional shareholders are saying, hey, there can be large long-term risk and long-term damage.”

(The above is the Carbon Tracker 2014 Unburnable Carbon report.)

And regarding the claim that

Oil companies cannot predict the long-term impact of climate and climate policy with enough precision to provide the kind of risk analysis that shareholders are seeking, IHS Markit said. Financial disclosure under securities regulation looks ahead over a much shorter time frame.

what Markit is saying is that the management of fossil fuel companies does not know how to do their job. If that is correct, which I doubt, they should step aside and allow someone who knows how to do it, do it.

See the following links for additional news on this development:

Posted in Anthropocene, Bloomberg, Bloomberg New Energy Finance, BNEF, bridge to nowhere, business, capitalism, Carbon Worshipers, clean disruption, climate, climate business, climate change, climate disruption, climate economics, corporations, destructive economic development, environmental law, extended supply chains, Exxon, fossil fuel divestment, fossil fuel infrastructure, fossil fuels, Global Carbon Project, global warming, greenhouse gases, Hyper Anthropocene, investing, investments, Joseph Schumpeter, leaving fossil fuels in the ground, making money, Our Children's Trust, petroleum, pollution, rationality, reason, reasonableness, statistics, stranded assets, sustainability, the right to know, the tragedy of our present civilization, the value of financial assets, tragedy of the horizon, zero carbon | Leave a comment

Dikran Marsupial’s excellent bit on hypothesis testing applied to climate, or how it should be applied, if at all

Frankly, I wish some geophysicists and climate scientists wrote more as if they thoroughly understood this, let alone the deniers who try to discredit climate disruption. See “What does statistically significant actually mean?”.

Of course, while statistical power of a test is important to keep in mind, as well as the effects of arbitrary alterations or recodings of data upon it (see also Andrew Gelman’s comment on this), people should really look at this from a purely Bayesian perspective, and there’s no longer a computational excuse to ignore that approach.

Posted in Anthropocene, anti-science, Bayesian, climate change, climate data, climate disruption, D. K. Marsupial, Frequentist, global warming, hiatus, Hyper Anthropocene, ignorance, John Kruschke, regime shifts, statistics, Student t distribution | Leave a comment

The Rule of 135

From SingingBanana.

Posted in Conway's Game of Life, dynamical systems, finite-state machines, mathematical publishing, mathematics, mathematics education, maths, Patterson's Worm, random walks, state-space models, statistical dependence, statistics | Leave a comment

Massachusetts Senate, Climate and Clean Energy Tour (Senator Marc Pacheco, and others), testimony

I testified at the Weymouth, Massachusetts hearing for the MA Senate Climate and Clean Energy Tour.

Here’s Senator Marc R Pacheco introducing the Tour:

The Weymouth hearing was recorded and is available on YouTube in three parts:


My written testimony is attached below:

A better version having hyperlinks intact is available here, but be sure to download before reading in order to get access to the links.

Posted in adaptation, American Association for the Advancement of Science, American Meteorological Association, American Solar Energy Society, Anthropocene, being carbon dioxide, Bloomberg New Energy Finance, BNEF, bridge to somewhere, citizenship, clean disruption, climate disruption, climate economics, climate justice, Constitution of the Commonwealth of Massachusetts, decentralized electric power generation, destructive economic development, Ecology Action, global blinding, Global Carbon Project, global warming, greenhouse gases, Hermann Scheer, Hyper Anthropocene, investment in wind and solar energy, investments, John Farrell, Joseph Schumpeter, Mark Carney, Mark Jacobson, Massachusetts, Massachusetts Clean Energy Center, Massachusetts Interfaith Coalition for Climate Action, science, solar democracy, solar domination, solar energy, solar power, Spaceship Earth, the energy of the people, the green century, the value of financial assets, Tony Seba, tragedy of the horizon, wind energy, wind power, zero carbon | Leave a comment

“Bigger Isn’t Always Better When It Comes to Data”: Barry Nussbaum

The President’s Corner in the May 2017 issue of Amstat News, the monthly newsletter of the American Statistical Association (“ASA”), features an interesting exposition by environmental statistician and President of the ASA, Barry Nussbaum, called “Bigger isn’t always better when it comes to data.” Key paragraph:

Notice a subtle nuance here. Normally, you have a population and you sample elements from the population. Here, we really didn’t know if the vehicle’s emissions belonged to the population, due to the maintenance and use restrictions, until we administered the questionnaire after the vehicle had been randomly selected.

Good read.


Posted in American Statistical Association, emissions, sampling, sampling without replacement, smoothing, spatial statistics, statistics | Leave a comment

Akamai Technologies invests in Texas wind farm

Akamai (NASDAQ: AKAM) said it is making a 20-year investment in the planned Seymour Hills Wind Farm, which will be based outside of Dallas and is expected to begin operating next year. The project is being developed by Infinity Renewables, and the plan is to construct 38 wind turbines across about 8,000 acres, Akamai said in a news release. Akamai said it intends to pull enough energy from the wind farm to offset its aggregate data center operations based in Texas, which account for about 7 percent of Akamai’s global power load.

This is part of Akamai’s commitment to reduce Carbon emissions and cover 50% of its operating requirements for electrical energy by 2020. See the details in Akamai’s press release.

See more about Akamai’s sustainability initiative, and its long range plans.

See Akamai’s progress.

Posted in Akamai Technologies, American Solar Energy Society, Anthropocene, Bloomberg, Bloomberg New Energy Finance, BNEF, bridge to somewhere, Buckminster Fuller, business, clean disruption, climate business, climate change, climate disruption, climate economics, coastal communities, corporations, decentralized electric power generation, decentralized energy, destructive economic development, ecology, economics, efficiency, electricity, electricity markets, global warming, green tech, Hyper Anthropocene, investing, investment in wind and solar energy, the energy of the people, the green century, wind energy, wind power, zero carbon | Leave a comment

“The [transport-as-a-service] disruption will crater the value chain of the oil industry” (RethinkX)

… By 2030, the report predicts that oil demand will drop to 70 million barrels per day. The resulting collapse in prices will be catastrophic for the industry, and these effects are likely to be felt as early as 2021.

The report suggests that oil demand from passenger road transport will drop by 90 percent by 2030; demand from the trucking industry will drop by 7 million barrels per day globally. This is, as the report says, an existential crisis for the industry. Current share prices and projections are based on the presumption of a system of individually owned vehicles.

See the news report for an overview, and the detailed report written by James Arbib and Tony Seba of RethinkX.

As far as I’m concerned, it couldn’t happen to a “nicer” bunch of people, this economic catastrophe. And it can’t happen soon enough!

Elsewhere, Why fossil fuel giants underestimate electric cars and renewable energy?

Posted in American Petroleum Institute, Bloomberg New Energy Finance, BNEF, bridge to somewhere, Buckminster Fuller, capitalism, clean disruption, CleanTechnica, climate business, decentralized energy, destructive economic development, economics, efficiency, fossil fuel divestment, green tech, ILSR, investments, John Farrell, leaving fossil fuels in the ground, local self reliance, making money, Mark Jacobson, Massachusetts Clean Energy Center, public transport, public utility commissions, rationality, reason, stranded assets, sustainability, the energy of the people, the green century, the value of financial assets, Tony Seba, zero carbon | Leave a comment

`Evidence of a decline in electricity use by U.S. households’ (Prof Lucas Davis, U.C. Berkeley)

This is from a blog post by Professor Lucas Davis at his blog. In addition to the subject, that’s an interesting way of presenting a change over time I’ll need to think about: It seems the model could be used in other, more comprehensive ways. Note it’s really a matched pairs test, where each state is a candidate and its electricity use in 2010 is matched with that in 2015. Even though the amount of electricity used by any individual state over time is a dependent quantity, electricity use of one state is more or less independent of that in another state. They might be dependent if, say, the United States economy crashed, or if it underwent a sudden boom.
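To make the matched-pairs idea concrete, here is a minimal sketch in Python using only the standard library. The state labels and the kWh figures are made up for illustration and are not Professor Davis’s data; the point is only that the test works on the per-state differences rather than on the two years separately.

```python
import math
import statistics

# Hypothetical average annual household electricity use (kWh) per state.
# Illustrative numbers only; they are not taken from the blog post or from EIA.
use_2010 = {"AA": 11200.0, "BB": 9800.0, "CC": 13400.0, "DD": 7600.0, "EE": 10100.0}
use_2015 = {"AA": 10900.0, "BB": 9500.0, "CC": 13300.0, "DD": 7200.0, "EE": 10000.0}


def paired_t_statistic(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for matched pairs."""
    keys = sorted(before)
    # Each state is paired with itself: only the within-state change matters.
    diffs = [after[k] - before[k] for k in keys]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t, n - 1


mean_diff, t_stat, dof = paired_t_statistic(use_2010, use_2015)
print(f"mean change = {mean_diff:.1f} kWh, t = {t_stat:.2f}, df = {dof}")
```

The t statistic would then be compared against a Student t distribution with the reported degrees of freedom. The independence across states discussed above is what justifies treating the differences as a simple sample.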

Posted in American Solar Energy Society, American Statistical Association, anomaly detection, Bloomberg New Energy Finance, BNEF, bridge to somewhere, convergent cross-mapping, decentralized electric power generation, decentralized energy, demand-side solutions, dependent data, efficiency, EIA, electricity, electricity markets, energy, energy reduction, energy utilities, engineering, evidence, green tech, local self reliance, Lucas Davis, marginal energy sources, Massachusetts Clean Energy Center, model-free forecasting, multivariate statistics, public utility commissions, rate of return regulation, statistics, Takens embedding theorem | Leave a comment

I’m afraid, dear progressive friends, Mr Maher is 110% correct

I see it nearly every week in the comedy called progressive plans for energy sources in the Commonwealth of Massachusetts. Progressives, it seems, eschew cooperation with business and attorneys and, as a result, never get anything respectable done. They are, as I’ve sometimes remarked, in practice, liberal climate deniers, because they rate the survival of their collective political power more important than that of civilization.

(Hat tip to Climate Denial Crock of the Week)

Posted in Anthropocene, atheism, Bill Maher, Buckminster Fuller, Canettes Blues Band, citizenship, civilization, climate, climate change, climate disruption, climate economics, Daniel Kahneman, decentralized energy, destructive economic development, electricity markets, engineering, environmental law, fossil fuel divestment, free flow of labor, global warming, green tech, greenwashing, Hermann Scheer, Humans have a lot to answer for, Hyper Anthropocene, Kevin Anderson, leaving fossil fuels in the ground, liberal climate deniers, local self reliance, Michael Osborne, politics, rationality, the right to be and act stupid, the tragedy of our present civilization, zero carbon | 2 Comments

Investing, and Sharpe’s inequality

See the statement from Sharpe himself.

Hat tip to Matt Levine of Bloomberg.

Posted in investments, statistics | 2 Comments

Liang, information flows, causation, and convergent cross-mapping

Someone recommended the work of Liang recently in connection with causation and attribution studies, and their application to CO2 and climate change. Liang’s work is related to information flows and transfer entropies. As far as I know, the definitive work on that is James, Barnett, and Crutchfield, “Information Flows? A Critique of Transfer Entropies.” Liang’s paper claims, in part,

The whole new formalism is derived from first principles, rather than as an empirically defined ansatz, with the property of causality guaranteed in proven theorems. This is in contrast to other causality analyses, say that based on Granger causality or convergent cross mapping (CCM)

Well I’ve written about CCM here before, in 2013, 2016, and just recently.

Anyway, I don’t see anything obviously superior regarding Liang’s information flows approach, at least in comparison with Granger causality or CCM, and, so, I’ll take conclusions about causation of CO2 and climate they derive with a big grain of salt. I prefer Egbert van Nes, Marten Scheffer, Victor Brovkin, Timothy Lenton, Hao Ye, Ethan Deyle, and George Sugihara on “Causal feedbacks in climate change.”

Posted in Akaike Information Criterion, American Association for the Advancement of Science, Anthropocene, attribution, carbon dioxide, climate, climate change, climate disruption, complex systems, convergent cross-mapping, ecology, Egbert van Nes, Ethan Deyle, Floris Takens, George Sughihara, global warming, Hao Ye, Hyper Anthropocene, information theoretic statistics, Lenny Smith, model-free forecasting, nonlinear systems, physics, statistics, Takens embedding theorem, theoretical physics, Timothy Lenton, Victor Brovkin | Leave a comment

Just because the data lies sometimes doesn’t mean it’s okay to censor it

Or, there’s no such thing as an outlier …

Eli put up a post titled “The Data Lies. The Crisis in Observational Science and the Virtue of Strong Theory” at his lagomorph blog. Think of it: Data lying. Obviously this is worth a remark. After all, the Bayesian project is all about treating data as given and fixed, a nod of deep respect, and then, in a kind of generalization of maximum likelihood philosophy, finding those parameters offered by theory which are most consistent with it. But in experimental and, especially, observational science things aren’t so easy.

So I say … Maybe it is …

Well. Of course. Eddington: “It is also a good rule not to put overmuch confidence in the observational results that are put forward until they are confirmed by theory” (from his book). On the other hand …

It is also possible to score theory’s consistency with experiment with techniques better than t-tests and the like, notably the important information criteria that have been developed (Burnham and Anderson). These are bidirectional. For example, it is entirely possible an observational experiment, however well constructed, might be useless for testing a model. Observational experiments are not as powerful in this regard as are constructed experiments.
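As a minimal sketch of what scoring with an information criterion looks like, the following Python compares a constant-mean model with a straight-line model using AIC for least-squares fits with Gaussian errors. The series is made up, the helper names are hypothetical, and the AIC is computed only up to an additive constant, which is harmless because only differences in AIC between models matter.

```python
import math


def aic_gaussian(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant:
    n * ln(RSS / n) + 2 * k, where k counts fitted parameters including the variance."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params


def fit_constant(y):
    """Residuals from fitting a single mean."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def fit_line(x, y):
    """Residuals from an ordinary least-squares straight line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Made-up series: a clear trend plus a small deterministic wiggle.
x = list(range(20))
y = [0.5 * t + math.sin(t) for t in x]

aic_const = aic_gaussian(fit_constant(y), n_params=2)  # mean + variance
aic_line = aic_gaussian(fit_line(x, y), n_params=3)    # intercept + slope + variance
print(f"AIC constant = {aic_const:.1f}, AIC line = {aic_line:.1f}")
print("preferred:", "line" if aic_line < aic_const else "constant")
```

The model with the lower AIC is preferred. If the data cannot separate the two models, the difference will be small, which is one way an experiment shows itself to be uninformative about a model.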

But I think the put-down of the random walk as a model is a bit strong. After all, that is the basis of a Kalman filter-smoother, at least in the step-level change version. Sure, the state equation need not assume random variation and could have a deterministic core about which there is random variation. But it is possible to posit a “null model” if you will which involves no more than a random walk to initialize, and then takes advantage of Markov chains as universal models to lock onto and track whatever a phenomenon is.
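A minimal sketch of that “null model” in Python: a Kalman filter for the local level model, where the state is nothing more than a random walk observed with noise. The variances, the synthetic step series, and the function name are assumptions chosen for illustration, and only the filter is shown, not the smoother.

```python
import random


def local_level_filter(observations, process_var, obs_var, init_mean=0.0, init_var=1e6):
    """Kalman filter for the local level model:
        state:       x[t] = x[t-1] + w[t],  w ~ N(0, process_var)   (a random walk)
        observation: y[t] = x[t]   + v[t],  v ~ N(0, obs_var)
    Returns the list of filtered state means."""
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var_pred = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
        filtered.append(mean)
    return filtered


# Synthetic data: a level that steps from 0 to 5 halfway through, plus noise.
rng = random.Random(42)
truth = [0.0] * 50 + [5.0] * 50
obs = [x + rng.gauss(0.0, 1.0) for x in truth]
est = local_level_filter(obs, process_var=0.1, obs_var=1.0)
print(f"estimate just before the step: {est[49]:.2f}, at the end: {est[-1]:.2f}")
```

Nothing in the state equation knows about the step; the random walk simply gives the filter enough freedom to follow the level wherever the observations take it. The large initial variance plays the role of a vague initialization.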

Better, it’s possible to integrate over parameters, as was done in the bivariate response for temperature anomalies in the above, to estimate best fits for process variance. It’s possible to use priors on these parameters, but the outcomes can be sensitive to initializations. It’s also possible to use non-parametric smoothing splines fit using generalized cross-validation. These are a lot better than some of the multiple sets of linear fits I’ve seen done in Nature Climate Change and they tell the same story:

No doubt, there are serious questions about how pertinent these models are to paleoclimate calculations. However, if they are parameterized correctly, especially in the manner of hierarchical Bayesian models, these could well provide constraints in the way of priors for processes which could be applicable to paleoclimate.

While certainly theory can be used, and much of it is approachable and very accessible, I understand why people might want to do something else. Business and economic forecasts are often done using ARIMA models, even if these are not appropriate.

But there is an important area of quantitative research which offers so-called model-free techniques for understanding complex systems, and, in my opinion, these should not be casually dismissed. In particular, the best quantitative evidence of which I am aware teasing out the causal role CO2 has for forcing at all periods comes from this work. In fact, I’m surprised more people aren’t aware of — and use — the methods Ye, Deyle, Sugihara, and the rest of their team offer.

I should mention, too, that there are R packages called:

  • Package nwfscNLTS: Non-linear time series
  • Package rEDM: an R package for Empirical Dynamic Modeling and Convergent Cross-Mapping
  • Package multispatialCCM: Multispatial Convergent Cross Mapping

[P.S. Sorry, I can’t help it if Judith Curry likes it, too. It’s good stuff.]

But, personally, I like Bayesian Dirichlet stick-breaking …

Posted in Akaike Information Criterion, American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, Anthropocene, Bayes, Bayesian, climate, climate change, climate models, data science, dynamical systems, ecology, Eli Rabett, environment, Ethan Deyle, George Sughihara, Hao Ye, Hyper Anthropocene, information theoretic statistics, IPCC, Kalman filter, kriging, Lenny Smith, maximum likelihood, model comparison, model-free forecasting, physics, quantitative ecology, random walk processes, random walks, science, smart data, state-space models, statistics, Takens embedding theorem, the right to know, Timothy Lenton, Victor Brovkin | 1 Comment

A response to “We might not be certain but …” at … and Then There’s Physics

I posted a response to a comment from the blog author at the ellipsis-loving … and Then There’s Physics. The figures didn’t make it into the comment, and, so, I am reproducing the intended comment in its entirety here.

ATTP, you were correctly pointing out I was partly incorrect, and certainly incomplete. Kudos to you, and apologies to you and to the readers.

I hadn’t read Armour 2017. I have now. I did read ATTP’s assessment and, yes, it does mention Armour deals with nonlinearity. And, yes, it does mention that the histogram is from CMIP runs, but I interpreted it differently than it should have been interpreted. I have not read Richardson, and probably won’t. I also assumed that the Armour figure was something Stephens was using in his “criticism of excessive certainty” but have gone back and seen that there is another parse to this post which is consistent with Stephens not mentioning Armour at all.

I also have not read Stephens, and perhaps I should before commenting, but I won’t.

The point I tried to make was essentially that uncertainty and ignorance in a place where a decision ought to be made and when the consequences could be enormous is not the place to claim “It’s okay to remain ignorant.” Essentially, this is enshrining “Do nothing until someone proves you have to do so,” which might work for some common decisions, but taking a big ship into an iceberg-strewn sea because it hasn’t hit anything yet hardly seems prudent.

I also am not convinced, commenting with respect for Armour, that the adjustment for nonlinearity they attempt helps the argument much, and ATTP hinted at that in his previous post (beginning “… A few additional points. We don’t know that these adjustments are correct. However, we do have a situation where there is a mismatch between different climate sensitivity estimates …”). In the public discussion of climate change, highlighting these kinds of papers tends, I think, to convince people there’s more arbitrariness to this process than is correct. After all, there have been similar papers published by Meraner, Mauritsen, and Voigt, as well as Caballero and Huber, the latter focussing upon nonlinearity in ECS and having a good introduction. These emphasize Pierrehumbert’s comment “Here there (may) be dragons”, and, as of 2013,

…there have already been great strides in understanding the magnitude and pattern of warmth in hothouse climates, which have helped resolve some earlier modeling paradoxes, but much remains to be done. In particular, narrowing the broad error bars on past atmospheric CO2 is crucial to relating these climates to what is going on at present.

More recently there is the published work of Friedrich, Timmermann, Tigchelaar, Timm, and Ganopolski.

Consider Pierrehumbert’s equation (3.14) for temperature sensitivity (specifically mean surface temperature) with respect to some parameter, \Lambda, where \Lambda might be, as Pierrehumbert suggests, albedo, or CO2 concentration, or the solar constant:

\frac{dT}{d\Lambda} = -\frac{\frac{\partial{}G}{\partial{}\Lambda}}{\frac{\partial{}G}{\partial{}T}}

Here G is the top-of-atmosphere flux, and \text{OLR} is outgoing longwave radiation at the surface (*). This is pretty standard, even if it is very general, much more general than, say, Armour’s equations (1)-(3). From a statistical perspective what’s striking about the above is that if

\frac{\partial{}G}{\partial{}\Lambda} \quad \text{and} \quad \frac{\partial{}G}{\partial{}T}

are each interpreted to be random variables worthy of estimation by whatever means, then that implies \frac{dT}{d\Lambda} is a random variable which is drawn from a ratio distribution. And should the Highest Density Probability Interval for \frac{\partial{}G}{\partial{}T} include zero, whatever the physical reason, the distribution of \frac{dT}{d\Lambda} is pretty meaningless. A good physical imagination offers any number of ways this could happen, but Professor Pierrehumbert’s discussion in Section 3.4 of his book describes the possible (mathematical) range, irrespective of the geophysical details. And because what we are about is \delta{}T as a function of all relevant \Lambda, that being a total differential, the excessive variability in any one such \Lambda will dominate that of the rest. Note extreme variability is not our friend, no matter what vision of a cultural or economic future we might have.
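The ratio-distribution point can be illustrated with a small Monte Carlo sketch in Python. It assumes, purely for illustration, that the numerator and denominator are independent Gaussians; the means and standard deviations are arbitrary unitless numbers, not estimates of any physical quantity, and the function names are hypothetical.

```python
import random


def simulate_sensitivity(num_mean, num_sd, den_mean, den_sd, draws=20000, seed=1):
    """Draw dT/dLambda = -(dG/dLambda) / (dG/dT) with independent Gaussian
    numerator and denominator (an assumption made only for illustration)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        num = rng.gauss(num_mean, num_sd)
        den = rng.gauss(den_mean, den_sd)
        samples.append(-num / den)
    return samples


def central_width(samples, lo=0.025, hi=0.975):
    """Width of the central 95% interval, from empirical quantiles."""
    s = sorted(samples)
    return s[int(hi * len(s))] - s[int(lo * len(s))]


# Denominator well away from zero, versus one whose spread straddles zero.
tight = simulate_sensitivity(1.0, 0.2, -2.0, 0.2)
loose = simulate_sensitivity(1.0, 0.2, -2.0, 1.5)

print("95% width, denominator far from zero:", round(central_width(tight), 2))
print("95% width, denominator spread includes zero:", round(central_width(loose), 2))
print("largest |draw| in the second case:", round(max(abs(v) for v in loose), 1))
```

When the denominator can come arbitrarily close to zero, occasional draws are enormous and of either sign, so summaries like a mean or variance of the ratio stop being informative. That is the sense in which the distribution becomes pretty meaningless.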

If ECS is going to continue to be used as the basis of argument and policy, it seems to need to be made far more robust than it is. That’s the point of my argument for much more additional work. If we are to keep this troubled concept in the planning stables, we desperately need to understand the bounds on its applicability. Armour is a start, but Armour simply says there might be problems when we already know there are problems from theory. What we need are constraints. Otherwise, ECS is a “nice to have if the world were a different place.” But then we don’t really have it, except knowing that there could be “dragons” out there.

I think there are much better arguments, and there are much better problems to chase. For instance, here is the definitive plot from Fyfe, Gillett, and Zwiers:

I have noted (**; Section 7) that what’s wrong with this presentation is not that the Highest Density Probability Interval for the climate models fails to overlap the observational mean and cloud, it’s that there is such a big difference between the observational variance and that of the model ensemble. The specifics of the discrepancy seen as a t-test based upon a difference in means led to the later explanation by Cowtan and Way and then a rebuttal by Fyfe and Gillett. I say, rather, that the reason for the discrepancy is deep, having to do more with the difference in variances (***), and probably not something we can expect most of the public or most policymakers to understand, at least without understanding something like Leonard Smith’s Chaos: A Very Short Introduction. The climate ensemble simulates all possible futures, and Earth takes one future at a time. I have read all around this in the literature, and there seems to be a confusion about what internal variability means. Yes, there’s unexplained internal variability, but there’s a lot of evidence for stochastic variability even if all the phenomena in internal variability were deeply understood. That’s important, because it makes what Bret Stephens and others like Judith Curry want to do a fundamentally flawed project. This stochastic variability on top of everything could be enough to send us all over some kind of potential cliff, even if emissions were managed to some precalculated minimax loss-versus-economic benefit point.

Here’s a rhetorical question when dealing with the public and policymakers: Why not go back to simple conservation of energy arguments, and point out that radiative forcing from CO2 is indisputable? The excess energy from forcing is going to go somewhere, and where it’s gone in the past may not be where it continues to go, ditto CO2 itself. Sure, this frustrates people who want a cost put on the phenomenon. But making up a cost is arguably worse than saying “We don’t have one.” Will the latter produce inaction? Possibly. But that’s what’s happening now, and people are trying to produce cost estimates.

Oh, and indeed, there are but 21 single socks in the Broman climate collection, per Armour’s count of the number of GCMs used reported at the top right of the second page of their article.

Other work on climate sensitivity is reported by Held and Winton (assuming the NOAA site continues to be maintained), and at Isaac Held’s blog.

(*) See Professor Ray Pierrehumbert’s book for the intimate portrait of Earth as a planet, in the manner of Arnold Ross, with associated and very fine Python code.

(**) WARNING: Not peer-reviewed.

(***) Were the observational variance to be appreciably larger, the conclusion of a statistical test would be that the difference in means was less significant.
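Footnote (***) can be made concrete with the usual two-sample statistic for a difference in means with unequal variances. Welch’s form is used here as an assumption, since the post does not say which test is meant. The subscripts o and m, for observations and model ensemble, are notation introduced only for this illustration, with sample means \bar{x}, sample variances s^2, and sample sizes n:

```latex
t = \frac{\bar{x}_m - \bar{x}_o}{\sqrt{\dfrac{s_m^2}{n_m} + \dfrac{s_o^2}{n_o}}}
```

Holding the means and sample sizes fixed, a larger s_o^2 enlarges the denominator and shrinks |t|, so the same difference in means looks less significant.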

Posted in Uncategorized | Leave a comment

Why we sold our Disney Vacation Club timeshares

Hat tip to Climate Denial Crock of the Week, in their “Florida slowly confronting sea level nightmare.”

Posted in Anthropocene, being carbon dioxide, Bloomberg, carbon dioxide, climate, climate change, climate disruption, climate economics, coastal communities, coasts, Disney, Disney Vacation Club, environmental law, flooding, Florida, global warming, greenhouse gases, Hyper Anthropocene, Joe Romm, leaving fossil fuels in the ground, living shorelines, science education, shorelines, sustainability, the right to be and act stupid, the tragedy of our present civilization, the value of financial assets | Leave a comment

March for Science, Boston, 22 April 2017

Cold and wet. A very typical Massachusetts day in Spring.

But great …

Posted in American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, being carbon dioxide, Buckminster Fuller, Earth Day, Environmental Protection Agency, Hyper Anthropocene, Minsky moment, National Center for Atmospheric Research, NCAR, Principles of Planetary Climate, science, science education, scientific publishing, Scripps Institution of Oceanography, Spaceship Earth, Stephen Schneider, Svante Arrhenius, the right to be and act stupid, the right to know, the tragedy of our present civilization, WHOI, Woods Hole Oceanographic Institution, XKCD | Leave a comment

“You don’t have that option.”

Dr Neil deGrasse Tyson. I think he’s awesome. Marvelous. I saw him in Boston. He and I did not get off to a good start, because of my being awestruck, and feeling very awkward, and the short time we had in his meeting us backstage in Boston. I regret that, but I could not be other than what I was.

But he is someone I will and do always admire, and follow. He knows how to challenge and communicate.

He’s great.

And he would be the first to challenge that.

Because of Science. And its values. “Prove it,” I think he’d say.

This is much better than Religion, although those are my feelings and thoughts, not Dr Tyson’s.

“This is Science. It’s not something to toy with.”

All this is about people, and the human situation. Science is a means of getting beyond that.

“Recognize what Science is, and allow it to be and what it can be in the service of civilization.”

March for Science, Saturday, 22nd April 2017. Earth Day. I will be marching in Boston. And I will be doing it as a member of:

And, I believe, citizen scientists have a big role to play in the Science of now and of the future. And, yes, that’s a very real thing.

Update, 2017-04-21

It seems fitting to have another image of the Pale Blue Dot here, taken by JPL’s Cassini at Saturn, on 12th April 2017.

Posted in American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, Bayesian, citizen data, citizen science, Climate Lab Book, Earth Day, ecological services, ecology, environment, Hyper Anthropocene, Neill deGrasse Tyson, Principles of Planetary Climate, rationality, Ray Pierrehumbert, reason, reasonableness, religion, science, science education, Science magazine, scientific publishing, secularism, Spaceship Earth, sustainability, the right to be and act stupid, the right to know, the tragedy of our present civilization, United States, XKCD | Leave a comment

“Hadoop is NOT ‘Big Data’ is NOT Analytics”

Arun Krishnan, CEO & Founder at \mathbf{n!}\, Analytical Sciences comments on this serious problem with the field. Short excerpt:

… A person who is able to write code using Hadoop and the associated frameworks is not necessarily someone who can understand the underlying patterns in that data and come up with actionable insights. That is what a data scientist is supposed to do. Again, data scientists might not be able to write the code to convert “Big Data” into “actionable” data. That’s what a Hadoop practitioner does. These are very distinct job descriptions.

While the term analytics has become a catch-all phrase used across the entire value chain, I personally prefer to use it more for the job of actually working with the data to get analytical insights. That separates out upstream and downstream elements of the entire data mining workflow.

I have repeatedly observed practitioners and especially managers who treat — or would very much like to treat — tools and techniques from this area as if they were Magical Boxes, to which you can send arbitrary data and obtain wonderful results, like the elixir of the Alchemists. There is also a cynical aspect to the attitude of some managers — some seem indoctrinated by the old “Internet time” and “agile sprint” notions — that if something does not show tangible and substantial progress over the short term (on the order of a week or two), there is something fundamentally wrong with the process. Sure, progress needs to be shown and reportable, but some problems, especially those involving data which are not obviously meaningful (*), demand a deep familiarization with the data and a good deal of data cleansing (**). This is hard, especially when the data are large. And not all worthwhile problems can be solved in two weeks, even for a corporation. Consider the project and planning timelines which a Walt Disney Company does for their parks or an energy company like DONG does for their offshore wind projects.

This is unfortunate, and it is more than simply a matter of personal style. Projects which proceed with the magical thinking that the right tool or algorithm is going to solve all their issues typically fail, after expending large resources on computing assets, data licenses, and labor. When they do, they give analytics and “Big Data” a tarnished reputation, especially among upper management who blame and distrust new things rather than incompetent engineers or, perhaps, engineers without the integrity to explain to their management that these tools have promise, but the project schedules for venturing into new sources of data are long, and best done with a very small team for the first portion.

In fact, one severe failing of the current suite of “Big Data” tools I see is that, while they are strong on certain modeling algorithms, and representational devices like Python pandas-esque and R-esque data frames, they offer little in the way of advanced data cleaning tools, ones which can marshal clusters to completely rewrite data in order for it to be useful for analysis and machine learning.

(*) Data which are obviously meaningful consist of self-evident records like purchasing transactions, or, as is increasingly less common, have records and fields documented carefully in a data dictionary. These have fallen out of fashion because of the NoSQL movement and I applaud the desire to push analysis and data sources beyond structured data offerings. However, just because an analytical tool can parse unstructured text does not mean it somehow automatically recovers meaning from that text. Indeed, what you have now, instead of structured data, is a problem in natural language processing, for which there are, indeed, excellent tools available, like Python’s nltk. But few people who embrace NoSQL know or use this kind of thing.

It is even harder to know what to do with semi-structured textual data, such as the headers of IETF RFC 2616. In these cases, while there is official guidance, there is no effective enforcement mechanism and, so, instances of these headers are, by the criteria of the RFC, malformed, even if there are dialects in Internet communities which are self-consistent and practiced in breach of the RFC. The trouble is that, here, there is no computable definition of malformed, so what is meaningful is something which needs to be learned from the corpora available. This is not an easy task, and may be dependent not only upon the communities in question, but upon geographic origins and takeup, as well as Internet protocol and netblocks.
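As a first, very rough step toward learning what is “normal” in such a corpus, one could tally how often each header-name shape occurs and whether it conforms to the RFC’s token grammar. The sketch below is Python, standard library only. The tiny corpus, the shape reduction, and the function names are assumptions made for illustration; the character class follows RFC 2616’s definition of a token as any character except controls and separators.

```python
import re
from collections import Counter

# RFC 2616 defines a header as field-name ":" [field-value], where field-name
# is a "token": one or more characters excluding controls and separators.
TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")


def shape(name):
    """Reduce a field name to a coarse shape, e.g. 'X-Foo_Bar' -> 'A-A_A'."""
    s = re.sub(r"[A-Za-z]+", "A", name)
    return re.sub(r"[0-9]+", "9", s)


def tally(lines):
    """Count (conforms_to_token_grammar, shape) pairs over raw header lines."""
    counts = Counter()
    for line in lines:
        name, sep, _value = line.partition(":")
        if not sep:
            # No colon at all: not even the outline of a header.
            counts[(False, "<no colon>")] += 1
            continue
        counts[(bool(TOKEN.match(name)), shape(name))] += 1
    return counts


# A tiny made-up corpus; a real one would come from captured traffic.
corpus = [
    "Content-Type: text/html",
    "content-type: text/html",
    "User Agent: example/1.0",   # space in the name: not a valid token
    "X-Custom_Header: 1",
    "garbage line without separator",
]
for (ok, shp), n in sorted(tally(corpus).items(), key=lambda kv: -kv[1]):
    print(f"{n:3d}  conforms={ok}  shape={shp}")
```

Frequent non-conforming shapes would be candidates for a community dialect rather than noise, which is the distinction the RFC alone cannot make. Anything beyond counting, such as conditioning on community or netblock, would need a real model.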

(**) There are plenty of examples of these in the single thread, single core world. There is, for instance, an open source version called OpenRefine.

Posted in alchemy, American Statistical Association, artificial intelligence, big data, data science, engineering, Internet, jibber jabber, machine learning, natural language processing, NLTK, sociology, superstition | Leave a comment