Active Modeling in Cost Sensitive Environments

Maytal Saar-Tsechansky

Red McCombs School of Business
The University of Texas at Austin

Probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives, for example to rank offers to consumers according to their preferences. Supervised learning is often used to build models that estimate class probabilities; however, obtaining training data with outcome labels is often very costly. Active learning aims to reduce the cost of learning by identifying especially informative data for labeling. We outline the critical features of an active learning approach and present Bootstrap-LV, an active learning method for class probability estimation and ranking. We show that, across a wide variety of domains, the method significantly reduces the cost of learning by reducing the number of training instances that must be acquired and labeled. In a direct marketing domain, Bootstrap-LV yields significant dollar savings when building targeting models from data that are noisy and especially difficult to learn from. We examine the contribution of each component of the algorithm and establish that each one helps identify informative examples. Finally, we compare our approach with alternative methods, demonstrating its superior performance and offering insights for improving existing active learning approaches.
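As a rough, hypothetical illustration of how an active learner might identify informative examples for class probability estimation, the sketch below assumes (as the method's name suggests, but this is an assumption rather than the paper's stated procedure) that an unlabeled example is informative when its estimated probability varies across models trained on bootstrap samples of the labeled data. The k-nearest-neighbor estimator, the function names `knn_probability`, `bootstrap_variance_scores` and `select_for_labeling`, and the top-variance selection rule are all illustrative choices, not the exact Bootstrap-LV algorithm.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# Hypothetical data layout: (feature value, binary label in {0, 1}).
Example = Tuple[float, int]


def knn_probability(train: Sequence[Example], x: float, k: int = 3) -> float:
    """Laplace-smoothed k-nearest-neighbor estimate of P(y = 1 | x).

    A stand-in estimator chosen only for illustration.
    """
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return (positives + 1) / (len(nearest) + 2)


def bootstrap_variance_scores(
    labeled: Sequence[Example],
    pool: Sequence[float],
    n_models: int = 25,
    k: int = 3,
    seed: int = 0,
) -> List[float]:
    """Score each unlabeled example by the variance of its probability
    estimate across models trained on bootstrap samples of the labeled data."""
    rng = random.Random(seed)
    estimates: List[List[float]] = [[] for _ in pool]
    for _ in range(n_models):
        # Bootstrap sample: draw len(labeled) examples with replacement.
        sample = [rng.choice(labeled) for _ in labeled]
        for i, x in enumerate(pool):
            estimates[i].append(knn_probability(sample, x, k))
    # Population variance of the n_models estimates for each pool example.
    return [statistics.pvariance(ests) for ests in estimates]


def select_for_labeling(
    labeled: Sequence[Example],
    pool: Sequence[float],
    batch_size: int,
    n_models: int = 25,
    k: int = 3,
    seed: int = 0,
) -> List[int]:
    """Return indices of the batch_size pool examples with the highest scores
    (ties keep their original pool order)."""
    scores = bootstrap_variance_scores(labeled, pool, n_models, k, seed)
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    labeled = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0), (0.45, 0),
               (0.55, 1), (0.7, 1), (0.8, 1), (0.9, 1), (1.0, 1)]
    pool = [0.05, 0.5, 0.95]
    print(select_for_labeling(labeled, pool, batch_size=1))
```

Taking the highest-scoring examples is the simplest selection rule; sampling examples with probability proportional to their scores is a common alternative that reduces the chance of repeatedly acquiring near-duplicate examples from the same region.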