Active Modeling in Cost Sensitive Environments
Red McCombs School of Business
The University of Texas at Austin
Decision makers use probability estimates to evaluate the expected utility
of each alternative in a set, for example when ranking offers to consumers
by their preferences. Supervised learning is often used to build class
probability estimates; however, obtaining training data with outcome labels
is often very costly. Active learning aims to economize on the cost of
learning by identifying especially informative data for labeling. We outline
the critical features for an active learning approach and present an active
learning method for estimating class probabilities and ranking. We show that,
across a wide variety of domains, the method significantly reduces the cost
of learning by economizing on the number of training instances that must be
obtained and labeled. In a direct marketing domain, the method, Bootstrap-lv,
yields significant dollar savings when building targeting models from data
that are noisy and especially difficult to learn from. We investigate the
contribution of the components
of the algorithm and establish that each helps identify informative
examples. We compare the performance of our approach against alternatives,
demonstrating its superiority and providing insight for improving existing
active learning approaches.
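
To make the idea of selecting especially informative data for labeling concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not state how informativeness is scored, so the sketch assumes one plausible proxy: the variance of class probability estimates across models trained on bootstrap resamples of the labeled data. The k-nearest-neighbor estimator, the one-dimensional feature, and all function names (`knn_probability`, `estimate_variances`, `select_for_labeling`) are hypothetical choices made only for illustration.

```python
import random
from statistics import pvariance


def knn_probability(train, x, k=3):
    """Estimate P(y=1 | x) as the fraction of positives among the k nearest points.

    `train` is a list of (feature, label) pairs with labels in {0, 1}.
    This simple estimator is an assumption of the sketch.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def estimate_variances(labeled, pool, n_models=10, k=3, seed=0):
    """Return, for each unlabeled x in `pool`, the variance of its class
    probability estimates across models built from bootstrap resamples."""
    rng = random.Random(seed)
    # One bootstrap resample (sampling with replacement) per model.
    samples = [[rng.choice(labeled) for _ in labeled] for _ in range(n_models)]
    return [
        pvariance([knn_probability(sample, x, k) for sample in samples])
        for x in pool
    ]


def select_for_labeling(labeled, pool, batch_size=2, **kwargs):
    """Pick the `batch_size` pool examples whose estimates vary the most."""
    variances = estimate_variances(labeled, pool, **kwargs)
    ranked = sorted(range(len(pool)), key=lambda i: variances[i], reverse=True)
    return [pool[i] for i in ranked[:batch_size]]


if __name__ == "__main__":
    labeled = [(0.0, 0), (0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
    pool = [0.05, 0.5, 0.95]
    print(select_for_labeling(labeled, pool, batch_size=1))
```

In a full active learning loop, the selected examples would be sent for labeling, added to the labeled set, and the procedure repeated until the labeling budget is spent. Taking the highest-variance examples deterministically is the simplest choice; sampling in proportion to the scores is another option the sketch does not cover.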