Building effective data mining models: a case study of customer churn prediction

Author: Grzegorz Migut
Publisher: edu-Libri
Edition: Kraków, 2024
ISBN: 978-83-66395-79-4
Pages: 252
Price: 99 PLN

Building a customer retention model is a multi-stage process, full of challenges and dependent on many interrelated factors. This book guides the reader through the key stages of that process, from defining the model's business parameters to surveying modeling strategies and techniques, including:

  • data preparation,
  • building effective classification models,
  • validation and deployment of retention models.

Special attention is given to assessing the model's predictive power, including an analysis of how sensitive evaluation metrics are to class imbalance. The book also includes a practical case study that illustrates, step by step, the key actions and decisions taken during model development. The case study can serve as inspiration and as a reference for following each stage of the modeling process, from initial decisions to final implementation.

The research results presented in the book may support the selection of suitable model-building strategies, increasing the chances of achieving the desired outcomes.

The author, who has over 20 years of consulting experience, shares practical knowledge and presents examples that help build effective analytical models. For years, as part of StatSoft Polska, he has delivered courses and lectures in prestigious postgraduate programs, introducing participants to data analysis, data mining, and machine learning. His experience includes supporting clients in the academic, business, and industrial sectors, where he combines theory with real-world analytical challenges.


Table of contents:

  Introduction
  1. Customer retention as a field of marketing modeling
    1.1. The role of modeling in marketing research
    1.2. Relationship marketing concept in customer loyalty modeling
    1.3. Buyer loyalty – definitions and determinants
    1.4. Types and levels of consumer loyalty
    1.5. Methods of measuring and modeling loyalty – selected approaches
    1.6. Methodologies for building data mining models for customer retention
    1.7. Summary
  2. Data preparation in customer retention modeling
    2.1. Defining key parameters of the analytical project
    2.2. Data reliability, identifying and imputing missing data
    2.3. Methods for mitigating issues with heterogeneous datasets
    2.4. Variable transformations and preparation of derived variables
    2.5. Unbalanced target variable distribution as a challenge in rare event prediction
    2.6. Summary
  3. Building a classification model
    3.1. Variable selection in model development
      3.1.1. Filter methods – directed and undirected
      3.1.2. Wrapper methods
      3.1.3. Embedded methods
      3.1.4. Summary
    3.2. Choosing a modeling technique
      3.2.1. Logistic regression
      3.2.2. Neural networks
      3.2.3. Support vector machines
      3.2.4. Classification and regression trees
      3.2.5. Multi-model approach
    3.3. Hyperparameter optimization
    3.4. Summary
  4. Validation and deployment of customer retention models
    4.1. Goodness-of-fit metrics for retention models
      4.1.1. Metrics based on classification confusion matrices
      4.1.2. Quality metrics and a posteriori probability
      4.1.3. Metrics based on probabilistic interpretation of model output
    4.2. Validation strategies for retention models
      4.2.1. Data-splitting methods
      4.2.2. Resampling-based methods
    4.3. Determining the optimal cut-off point
    4.4. Customer selection for sales and retention strategies (uplift modeling)
      4.4.1. Building uplift models
      4.4.2. Evaluating uplift models
      4.4.3. Monitoring retention models
    4.5. Summary
  5. Step-by-step identification of a customer churn model
    5.1. Understanding and preparing the data
      5.1.1. Data quality assessment
      5.1.2. Step-by-step data preparation
      5.1.3. Dataset segmentation
      5.1.4. Dataset partitioning
      5.1.5. Missing data imputation
      5.1.6. Additional categorization of categorical variables
      5.1.7. Derived variables in the model
      5.1.8. Initial variable selection
      5.1.9. Model development using selected analytical tools
    5.2. Logistic regression model
    5.3. Classification and regression tree (CART) model
    5.4. Boosted tree model
    5.5. Neural network model (multilayer perceptron)
    5.6. Final conclusions
