Inspired by the extensive and excellent work in approximate Bayesian computation, especially that done by Professor Christian Robert and colleagues and by Professor Simon Wood, it occurred to me that the complaints regarding the lack of interpretability of “black box regression models” for, say, the binary logistic regression problem could be readily resolved using these techniques. Such complaints are offered by:
- Lipton
- Kim, Loh, Shih, and Chaudhuri
- Hofner, Mayr, Robinzonov, and Schmid
- The complaint is especially pointed in the description of the function blackboost in the previous authors’ mboost package
Essentially, if the classifier is trained to the investigator’s satisfaction and is assumed to produce a binary outcome from a space of attributes, an estimate of the classifier’s support, the region of that space it classifies affirmatively, can be had by simulating draws from the attribute space, testing them with the classifier, and retaining the collection of draws for which the classification outcome is affirmative.
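The draw, test, and retain loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `classifier` function (affirmative inside the unit disc) and the uniform proposal over a box are hypothetical stand-ins, not a trained model or a recommended generator.

```python
import random

def classifier(x):
    """Hypothetical stand-in for a trained black-box classifier:
    affirmative when the point lies inside the unit disc."""
    return x[0] ** 2 + x[1] ** 2 <= 1.0

def draw_attributes(rng):
    """Hypothetical generator: a uniform draw from the box [-2, 2] x [-2, 2]."""
    return (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))

def sample_support(classify, draw, n_draws, seed=0):
    """Simulate n_draws attribute vectors and keep those classified affirmatively."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        x = draw(rng)
        if classify(x):
            accepted.append(x)
    return accepted

n_draws = 20000
accepted = sample_support(classifier, draw_attributes, n_draws)
acceptance_rate = len(accepted) / n_draws
```

The retained points in `accepted` trace out the region the classifier marks affirmative. In this toy setup the acceptance rate should sit near the ratio of the disc's area to the box's area, which gives a rough sanity check.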
Efficient generation of draws from the attribute space is the key question, but this can be done using many methods, including those described in the comprehensive textbook Monte Carlo Statistical Methods by Robert and Casella. In many cases, though, the generation can be simpler than that.
If the attributes are assumed independent of one another, then a sample of each attribute’s range is available in the data used to train the classifier. Empirical methods for estimating each attribute’s distribution function can be applied, and if the quantile function can be derived from these, then generators of values for each attribute are in hand: generate uniform deviates on the unit interval and transform them by these quantile functions. It is then possible to produce a very large number of such draws, subjecting each to classification by the classifier. Those classified in the affirmative are retained. Assuming independence can never cause a portion of the support to be missed, for the hypervolume covered under independence must be at least as large as that of any region subject to contingencies or constraints. That is, dependency means that the space of one variable conditional upon another is no larger than in the independent version.
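As a rough sketch of the independent-attribute generator, the following builds an empirical quantile function for each attribute from the training data and pushes uniform deviates through it. The training rows, the `classifier`, and the choice of linear interpolation between order statistics are all assumptions made for illustration; other empirical quantile estimators would serve equally well.

```python
import random

def empirical_quantile(values):
    """Build a quantile function from a sample by linear
    interpolation between its order statistics."""
    xs = sorted(values)
    n = len(xs)

    def q(u):
        # Map u in [0, 1] to a fractional position among the sorted values.
        pos = u * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return q

rng = random.Random(1)

# Hypothetical training data: two attributes, the second dependent on the first.
training = []
for _ in range(500):
    a = rng.uniform(0.0, 1.0)
    training.append((a, a + rng.gauss(0.0, 0.1)))

def classifier(x):
    """Hypothetical stand-in for the trained black-box classifier."""
    return x[0] + x[1] > 1.5

# One quantile function per attribute, estimated column by column.
columns = list(zip(*training))
quantile_fns = [empirical_quantile(col) for col in columns]

def draw_independent(rng):
    """Draw each attribute independently by inverse-transform sampling."""
    return tuple(q(rng.random()) for q in quantile_fns)

candidates = [draw_independent(rng) for _ in range(10000)]
accepted = [x for x in candidates if classifier(x)]
```

Because each attribute is drawn from its own marginal, the candidates cover combinations the dependent training data never exhibits, which is the sense in which the independent version cannot miss part of the support.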
Once the large collection of accepted attribute draws is in hand, it can be described by any of the modern means of multivariate presentation and analysis, and these descriptions interpreted as appropriate for the problem domain.
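One very simple way to begin describing the accepted draws is with per-attribute summaries and a pairwise correlation. The sketch below uses a small hypothetical `accepted` list in place of a real retained collection, and the choice of summaries is only an example of what might be computed.

```python
import statistics

# Hypothetical accepted draws (two attributes each), standing in
# for the retained collection.
accepted = [(0.9, 0.8), (0.7, 0.95), (0.85, 0.7), (0.95, 0.9), (0.6, 0.99)]

def summarize(draws):
    """Per-attribute range and location summaries of the accepted draws."""
    return [
        {
            "min": min(col),
            "max": max(col),
            "mean": statistics.mean(col),
            "median": statistics.median(col),
        }
        for col in zip(*draws)
    ]

def pearson(xs, ys):
    """Pearson correlation between two attributes."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

summary = summarize(accepted)
col0, col1 = zip(*accepted)
rho = pearson(col0, col1)
```

The per-attribute ranges give a bounding box for the estimated support, and a nonzero correlation among accepted draws hints at how the classifier trades one attribute against another.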