Generating supports for classification rules in black box regression models

Inspired by the extensive and excellent work in approximate Bayesian computation, especially that done by Professor Christian Robert and colleagues and by Professor Simon Wood, it occurred to me that the complaints regarding the lack of interpretability of “black box regression models” for, say, the binary logistic regression problem could be readily resolved using these techniques. Such complaints are offered by:

Essentially, if the classifier, \mathcal{C}, is trained to the investigator's satisfaction and is assumed to produce a binary outcome from a space of attributes, \mathcal{A}, an estimate of the support for \mathcal{C} can be had by simulating draws from \mathcal{A}, testing them with \mathcal{C}, and retaining the draws for which the classification outcome is affirmative.
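As a minimal sketch of that accept-and-retain loop, here is some Python. Everything named in it is hypothetical: `toy_classifier` merely stands in for a trained \mathcal{C} (affirmative inside the unit disc), and `toy_draw` stands in for whatever generator of points from \mathcal{A} is actually available.

```python
import random

def estimate_support(classifier, draw, n_draws, seed=0):
    """Return the simulated draws that the classifier labels affirmative."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        a = draw(rng)          # one simulated point from the attribute space
        if classifier(a):      # keep it only if the outcome is affirmative
            accepted.append(a)
    return accepted

# Hypothetical stand-in for a trained black box: affirmative inside the unit disc.
def toy_classifier(a):
    x, y = a
    return x * x + y * y < 1.0

# Hypothetical generator: uniform on the square [-2, 2] x [-2, 2].
def toy_draw(rng):
    return (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))

support = estimate_support(toy_classifier, toy_draw, 20000)
```

The retained points in `support` are the estimate. The fraction retained also gives a rough idea of how much of the sampled region the classifier accepts.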

Efficient generation of draws from \mathcal{A} is the key question. This can be done using many methods, including those described in the comprehensive textbook Monte Carlo Statistical Methods by Robert and Casella. In many cases, though, the generation can be simpler than that.

If independence of the attributes in \mathcal{A} from one another is assumed, then a sample of each attribute's range is available in the training data used to train \mathcal{C}. Empirical methods for estimating each attribute's distribution function can be applied, and if a quantile function can be derived from each, then generators of values for each attribute are in hand: generate uniform deviates on the unit interval and transform them by these quantile functions. It is then possible to produce a very large number of such draws, subjecting each to classification by \mathcal{C}. Those classified in the affirmative are retained. Assuming independence cannot cause a portion of the support to be missed, because the product of the attributes' marginal ranges contains every jointly feasible combination of values, so its hypervolume is at least as large as that of any region subject to contingencies or constraints. That is, dependency means that the space of one variable conditional upon another is no larger than the independent version.
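The procedure in this paragraph might be sketched as follows, using only the Python standard library. The interpolated order-statistic quantile function is one assumed choice among many empirical estimators, and `training_rows` and `toy_classifier` are hypothetical placeholders for the real training data and the trained \mathcal{C}.

```python
import random

def empirical_quantile_function(sample):
    """Quantile function for one attribute, by linear interpolation
    between the sorted training values (an assumed, simple estimator)."""
    xs = sorted(sample)
    n = len(xs)
    def q(u):
        pos = u * (n - 1)              # fractional position among order statistics
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
    return q

def sample_support(classifier, training_rows, n_draws, seed=0):
    """Draw each attribute independently from its empirical distribution
    and retain the candidates classified in the affirmative."""
    rng = random.Random(seed)
    columns = list(zip(*training_rows))            # one column per attribute
    quantile_fns = [empirical_quantile_function(c) for c in columns]
    accepted = []
    for _ in range(n_draws):
        # Uniform deviates on the unit interval, pushed through each quantile function.
        candidate = tuple(q(rng.random()) for q in quantile_fns)
        if classifier(candidate):
            accepted.append(candidate)
    return accepted

# Hypothetical training data and classifier, for illustration only.
data_rng = random.Random(1)
training_rows = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(5.0, 2.0)) for _ in range(500)]

def toy_classifier(a):
    x1, x2 = a
    return x1 + 0.5 * (x2 - 5.0) > 1.0

accepted = sample_support(toy_classifier, training_rows, 10000)
```

One consequence of this particular estimator is that no draw falls outside the range of values seen in training, so any part of the support beyond the observed extremes would go unexplored.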

Once the large collection of accepted attribute vectors is in hand, it can be described by any of the modern means of multivariate presentation and analysis, and these descriptions interpreted as appropriate for the problem domain.
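As one very simple instance of such a description, the sketch below reduces the accepted draws to a five-number summary per attribute. The `accepted` list and the attribute names are made up for illustration. A real analysis would more likely turn to scatterplot matrices, density estimates, or similar multivariate tools.

```python
import statistics

def describe_accepted(accepted, names):
    """Per-attribute five-number summary of the accepted draws."""
    summary = {}
    for name, column in zip(names, zip(*accepted)):
        q1, median, q3 = statistics.quantiles(column, n=4)   # three quartile cut points
        summary[name] = {
            "min": min(column),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(column),
        }
    return summary

# Hypothetical accepted draws over two attributes.
accepted = [(1.2, 5.1), (0.8, 7.4), (2.1, 4.9), (1.7, 6.3), (0.9, 8.0)]
summary = describe_accepted(accepted, ["x1", "x2"])
```

Marginal summaries like these hide any dependence among attributes within the accepted region, so they are a starting point rather than a full description.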

About hypergeometric

See http://www.linkedin.com/in/deepdevelopment/ and http://667-per-cm.net
