Researcher: Perlate Diala, University of the Witwatersrand, Johannesburg
Supervisor: Dr. Hairong Wang
Customer churn has become one of the major concerns of the telecommunication industry in recent years. This is due to intense competition among providers and high customer acquisition costs, which make retaining existing customers highly valuable. It is therefore important to prevent churn by building prediction models that are effective and accurate. However, the major challenges in building such models for telecommunication are large volumes of data, an enormous feature space, and the Class Imbalance Problem (CIP). This study compares the performance of various machine learning classifiers for predicting customer churn in telecommunication. In particular, we explore pre-processing of the dataset, namely dimensionality reduction and seven oversampling techniques, to mitigate CIP and hence improve the performance of the machine learning models considered. To evaluate the selected models, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) were adopted. The experimental results showed that the Logistic Regression classifier, coupled with Random Oversampling (ROS) and dimensionality reduction based on a linear autoencoder, performed better than all other classifiers.
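To make two of the ingredients named above concrete, the following is a minimal sketch of Random Oversampling and of the AUC metric in plain Python. It is not the implementation used in this study: the function names `random_oversample` and `roc_auc` and the toy data are hypothetical, and the AUC is computed through its pairwise-ranking interpretation rather than by tracing the ROC curve.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows of this class in the original (not yet resampled) data.
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # sampling with replacement
            y_out.append(label)
    return X_out, y_out


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive (label 1) is scored
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four non-churners (0) and one churner (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have four rows

# Hypothetical classifier scores for four held-out customers.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In such a setup, oversampling would normally be applied to the training split only, so that the AUC is measured on data with the original class distribution. Library equivalents exist, for example `RandomOverSampler` in imbalanced-learn and `roc_auc_score` in scikit-learn.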