Machine Learning Algorithm — RFM Model

Introduction to Customer Segmentation using Python

Sivaranjani Prabasankar
Jun 1, 2020

In this tutorial, we will learn how to implement customer segmentation with RFM (Recency, Frequency, Monetary) analysis in Python.

What is Customer Segmentation?

Customer segmentation is the process of dividing customers into groups or clusters based on common characteristics.

In a business-to-consumer (B2C) model, a company can segment customers based on factors such as:

1) Demographic (Age, Gender, Occupation, Marital Status)

2) Geographic (Location, Region, Urban/Rural)

3) Behavioural (Spending, Consumption, Usage)

4) Psychographic (Lifestyle, Social Status, Personality)

Need for Customer Segmentation

To identify the customers with the highest potential

To communicate easily with a targeted audience group

To improve quality of service, loyalty, and retention

To better understand the needs of customers in each segment

To upsell and cross-sell products more effectively

To identify new products that customers could be interested in

Customer Segmentation using RFM Analysis

RFM (Recency, Frequency, Monetary) analysis is a behaviour-based approach to grouping customers into segments. It groups customers based on their previous purchase transactions, considering factors such as:

· How recently a customer bought

· How often a customer bought

· How much a customer spent

RFM sorts customers into groups so that each group can be served better. It helps managers identify high-potential customers and run a more profitable business.

For example, a customer may be a big spender, but what if they purchased only once, or a long time ago? Do they purchase our product often? RFM answers these questions by looking at all three dimensions together.

It also helps the company run effective promotional campaigns and offer personalized service.

· Recency (R): Who has purchased recently? Measured as the number of days since the last purchase (lower is better).

· Frequency (F): Who has purchased frequently? Measured as the total number of purchases (higher is better).

· Monetary Value (M): Who has spent the most? Measured as the total amount of money the customer has spent (higher is better).

Steps in Python

1) Load Data

Data source: https://www.kaggle.com/jihyeseo/online-retail-data-set-from-uci-ml-repo?
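
A minimal sketch of the loading step, assuming the download is a CSV with the UCI Online Retail columns (InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country). The helper name `load_transactions` and the file name in the comment are hypothetical, and a small in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def load_transactions(source):
    """Read the transactions and parse InvoiceDate as a datetime column."""
    return pd.read_csv(
        source,
        parse_dates=["InvoiceDate"],
        dtype={"CustomerID": "float64"},  # float so that missing IDs can be NaN
    )


# Stand-in for the real file, e.g. load_transactions("OnlineRetail.csv")
# (hypothetical path; the real file may also need an explicit encoding).
sample_csv = io.StringIO(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "536365,85123A,WHITE HANGING HEART,6,2010-12-01 08:26:00,2.55,17850,United Kingdom\n"
    "536366,22633,HAND WARMER,6,2010-12-01 08:28:00,1.85,17850,United Kingdom\n"
    "536370,22728,ALARM CLOCK,24,2010-12-01 08:45:00,3.75,12583,France\n"
)
df = load_transactions(sample_csv)
print(df.shape)
print(df.dtypes)
```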

2) Data Exploration

a) Unique Values

b) Grouping by country
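
The two exploration steps could look roughly like this. The toy rows below only borrow the dataset's column names; the values are made up for illustration.

```python
import pandas as pd

# Toy rows using the dataset's column names; values are illustrative only.
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536370", "536380", "536381"],
    "CustomerID": [17850, 17850, 12583, 13047, 13047],
    "Country": ["United Kingdom", "United Kingdom", "France",
                "United Kingdom", "United Kingdom"],
})

# a) Number of distinct values in each column.
unique_counts = df.nunique()
print(unique_counts)

# b) Number of distinct customers per country, largest first.
customers_by_country = (
    df.groupby("Country")["CustomerID"]
    .nunique()
    .sort_values(ascending=False)
)
print(customers_by_country)
```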

3) Data Cleaning

a) Removing Rows with negative values for quantity

b) Checking for Duplicate values

c) Checking for Missing values and Dropping them
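
A sketch of the three cleaning steps on toy data. Dropping missing values only on `CustomerID` is an assumption made here, since RFM needs a customer identifier on every row; the article does not say which columns were checked.

```python
import numpy as np
import pandas as pd

# Toy rows; values are illustrative only.
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536381"],
    "Quantity": [6, 6, -1, 12, 3],
    "UnitPrice": [2.55, 2.55, 27.50, 1.25, 4.95],
    "CustomerID": [17850.0, 17850.0, 14527.0, 13047.0, np.nan],
})

# a) Remove rows with a negative quantity.
clean = df[df["Quantity"] >= 0]

# b) Count exact duplicate rows, then drop them.
n_duplicates = int(clean.duplicated().sum())
clean = clean.drop_duplicates()

# c) Count missing values per column, then drop rows without a CustomerID
#    (restricting to CustomerID is an assumption of this sketch).
missing_per_column = clean.isnull().sum()
clean = clean.dropna(subset=["CustomerID"])

print(n_duplicates, len(clean))
```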

4) Data Pre-processing

a) Calculating Total price and adding it as a new feature

b) Calculating the snapshot date from the most recent invoice date
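
Both pre-processing steps are short in pandas. In this sketch the column name `TotalPrice` and the one-day offset added to the latest invoice date are assumptions; the offset simply ensures that the most recent buyer gets a recency of at least one day.

```python
import pandas as pd

# Toy rows; values are illustrative only.
df = pd.DataFrame({
    "Quantity": [6, 12, 3],
    "UnitPrice": [2.5, 1.25, 4.0],
    "InvoiceDate": pd.to_datetime(
        ["2011-11-01 10:00", "2011-12-05 09:30", "2011-12-09 12:50"]
    ),
})

# a) Total price of each invoice line = quantity x unit price.
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# b) Snapshot date: one day after the most recent invoice
#    (the one-day offset is an assumption of this sketch).
snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)

print(df["TotalPrice"].tolist(), snapshot_date)
```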

5) RFM Customer Segmentation

I. Calculate RFM values

Recency

i) Group customers by their ID

ii) Take the difference between the snapshot date and each customer's most recent invoice date to get their recency

Frequency

i) Group customers by their ID

ii) Count the number of invoices for each customer to get their frequency

Monetary

i) Group customers by their ID

ii) Sum the total price of all purchases for each customer to get their monetary value
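
The three calculations above can be sketched as a single `groupby` with named aggregations. Counting distinct invoice numbers (rather than invoice lines) for Frequency is a choice made in this sketch, and the toy data is illustrative only.

```python
import pandas as pd

# Toy transactions: customer 1 has two invoices, one of them with two lines.
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01", "2011-12-01", "2011-12-08",
        "2011-10-01", "2011-11-15", "2011-11-20",
    ]),
    "TotalPrice": [10.0, 5.0, 20.0, 100.0, 7.5, 2.5],
})
snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)  # 2011-12-09

rfm = df.groupby("CustomerID").agg(
    # Days between the snapshot date and the customer's latest invoice.
    Recency=("InvoiceDate", lambda dates: (snapshot_date - dates.max()).days),
    # Number of distinct invoices.
    Frequency=("InvoiceNo", "nunique"),
    # Total amount spent.
    Monetary=("TotalPrice", "sum"),
)
print(rfm)
```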

II. Plot RFM distributions

The distribution plots show how skewed the data is. The important thing to note here is that we will be grouping these values into quantiles, so the skew does not need to be corrected for this analysis. However, when we later examine customer segmentation using K-Means, it will be very important to scale the data so that each feature has a comparable mean and standard deviation.
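
The plot itself is not reproduced here. As a numeric complement, the sketch below measures the skew of each column and shows one possible way to scale the data before K-Means. The log transform is a common choice for right-skewed values but is an assumption of this sketch, not something the tutorial specifies.

```python
import numpy as np
import pandas as pd

# Toy RFM table; values are illustrative only.
rfm = pd.DataFrame(
    {
        "Recency": [1, 3, 10, 40, 200],
        "Frequency": [50, 8, 4, 2, 1],
        "Monetary": [9000.0, 700.0, 300.0, 120.0, 30.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="CustomerID"),
)

# Positive skewness indicates a long right tail.
skewness = rfm.skew()
print(skewness)

# Log-transform to reduce the skew, then standardise each column
# to mean 0 and standard deviation 1.
rfm_log = np.log1p(rfm)
rfm_scaled = (rfm_log - rfm_log.mean()) / rfm_log.std(ddof=0)
print(rfm_scaled.round(2))
```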

III. Create R, F and M groups based on Quartiles
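
A minimal sketch of the quartile step with `pd.qcut`, on toy data. The score direction (4 = best) is an assumption: recency labels are reversed so that recent buyers receive the highest score.

```python
import pandas as pd

# Toy RFM table with eight customers; values are illustrative only.
rfm = pd.DataFrame(
    {
        "Recency": [2, 5, 9, 20, 35, 60, 120, 300],
        "Frequency": [40, 25, 18, 12, 9, 6, 3, 1],
        "Monetary": [5000.0, 3200.0, 2100.0, 900.0, 650.0, 400.0, 150.0, 40.0],
    },
    index=pd.Index(range(1, 9), name="CustomerID"),
)

# Fewer days since the last purchase is better, so labels run 4 -> 1.
rfm["R"] = pd.qcut(rfm["Recency"], q=4, labels=[4, 3, 2, 1]).astype(int)
# For Frequency and Monetary, larger values score higher.
rfm["F"] = pd.qcut(rfm["Frequency"], q=4, labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["Monetary"], q=4, labels=[1, 2, 3, 4]).astype(int)

print(rfm[["R", "F", "M"]])
```

On real data with many tied values (for example, many customers with a single purchase), `qcut` can fail because the quartile edges are not unique; ranking the column first with `rank(method="first")` is one common workaround.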

IV. Calculate RFM score — Add a new column to combine RFM score
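
There is more than one way to combine the three scores. The sketch below shows two common options, a concatenated label and a numeric sum; the column names `RFM_Segment` and `RFM_Score` are hypothetical, and which variant the original code used is not stated.

```python
import pandas as pd

# Toy quartile scores; values are illustrative only.
rfm = pd.DataFrame(
    {"R": [4, 3, 1], "F": [4, 2, 1], "M": [4, 3, 1]},
    index=pd.Index([101, 102, 103], name="CustomerID"),
)

# Concatenated label such as "444": the three scores readable at a glance.
rfm["RFM_Segment"] = (
    rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
)

# Numeric score: the sum of the three quartile scores (ranges from 3 to 12).
rfm["RFM_Score"] = rfm[["R", "F", "M"]].sum(axis=1)

print(rfm)
```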

V. Assign each customer to a segment bin based on RFM score
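
One way to bin customers by score is `pd.cut`. The cut-offs and segment names below are hypothetical and only illustrate the mechanics; in practice they should be chosen to fit the business.

```python
import pandas as pd

# Toy summed RFM scores (possible range 3 to 12); values are illustrative only.
rfm = pd.DataFrame(
    {"RFM_Score": [12, 10, 8, 6, 4, 3]},
    index=pd.Index([101, 102, 103, 104, 105, 106], name="CustomerID"),
)

# Hypothetical cut-offs and names. Intervals are right-inclusive:
# (2, 5], (5, 8], (8, 10], (10, 12]
bins = [2, 5, 8, 10, 12]
labels = ["At Risk", "Needs Attention", "Loyal", "Champions"]
rfm["Segment"] = pd.cut(rfm["RFM_Score"], bins=bins, labels=labels)

print(rfm)
```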

VI. Calculate the mean of each segment bin for segment statistics
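
The per-segment statistics can be sketched with another named aggregation. The segment names are the same hypothetical ones as before, and the customer count is added because it is a natural size measure for a later plot.

```python
import pandas as pd

# Toy RFM table with segments already assigned; values are illustrative only.
rfm = pd.DataFrame({
    "Recency": [2, 4, 30, 50, 200, 300],
    "Frequency": [40, 30, 10, 6, 2, 1],
    "Monetary": [5000.0, 3000.0, 800.0, 600.0, 100.0, 50.0],
    "Segment": ["Champions", "Champions", "Loyal", "Loyal", "At Risk", "At Risk"],
})

# Mean R, F and M per segment, plus the number of customers in each segment.
segment_stats = rfm.groupby("Segment").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Monetary", "count"),
).round(1)

print(segment_stats)
```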

VII. Plot a treemap based on segment stats for better understanding

Conclusion

In this tutorial, we covered customer segmentation, RFM analysis, and an implementation of RFM in Python. We also used some basic pandas features such as handling duplicates, groupby, and qcut() for binning based on sample quantiles, along with squarify for plotting the segments.

Hopefully, this will help you analyze your own datasets.

Happy Learning!

Sivaranjani
