The Problem
Our competition data are customer analytics records, used to predict (classify) customer behavior such as churn (customers leaving the service for another option). Approximately 7500 examples are provided, each with 107 features. The first 41 features are numeric (real-valued), the next 28 are discrete categorical, and the final 38 are binary-valued. The meaning of the features (what quantities they correspond to) has been either lost or deliberately obscured for privacy reasons. We have also pre-processed the data in a number of ways (balancing the two classes, imputing some missing data, etc.).
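
Because the three feature types usually need different handling, it can help to separate them by column position. The following is a minimal sketch, assuming each example is a flat sequence of 107 values in the order described above. The names `split_row` and `demo_row` are hypothetical, and the demo values are placeholders rather than real competition data.

```python
# Column counts taken from the description above: 41 + 28 + 38 = 107.
N_NUMERIC, N_CATEGORICAL, N_BINARY = 41, 28, 38
N_FEATURES = N_NUMERIC + N_CATEGORICAL + N_BINARY

# Column ranges for each feature type (0-based, end-exclusive).
NUMERIC = slice(0, N_NUMERIC)
CATEGORICAL = slice(N_NUMERIC, N_NUMERIC + N_CATEGORICAL)
BINARY = slice(N_NUMERIC + N_CATEGORICAL, N_FEATURES)


def split_row(row):
    """Split one 107-value example into (numeric, categorical, binary) parts."""
    if len(row) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(row)}")
    return row[NUMERIC], row[CATEGORICAL], row[BINARY]


# Placeholder example: the value at each position is just its column index.
demo_row = list(range(N_FEATURES))
numeric, categorical, binary = split_row(demo_row)
print(len(numeric), len(categorical), len(binary))  # 41 28 38
```

The same slices can be applied to the columns of a 2-D array if the data are loaded that way.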