Machine Learning Algorithm — RFM Model
Introduction to Customer Segmentation using Python
In this tutorial, we will learn how to implement customer segmentation with RFM (Recency, Frequency, Monetary) analysis in Python.
What is Customer Segmentation?
Customer segmentation is the process of dividing customers into groups or clusters based on common characteristics.
A company can segment its customers based on various factors, such as:
1) Demographic (Age, Gender, Occupation, Marital Status)
2) Geographic (Location, Region, Urban/Rural)
3) Behavioural (Spending, Consumption, Usage)
4) Psychographic (Lifestyle, Social Status, Personality)
Need for Customer Segmentation
To identify the customers with the most potential
To communicate easily with a targeted audience group
To improve the quality of service, loyalty, and retention
To better understand the needs of the customers in each segment
For better upselling and cross-selling of products
To identify new products that customers could be interested in
Customer Segmentation using RFM Analysis
RFM (Recency, Frequency, Monetary) analysis is a behaviour-based approach to grouping customers into segments. It groups customers based on their previous purchase transactions, considering:
· How recently a customer bought
· How often they bought
· How much they spent
RFM sorts customers into groups so that each group can be served better. It also helps managers identify high-potential customers and run a more profitable business.
For example, some customers are big spenders, but what if they purchased only once, or not for a long time? Do they buy our products often? Spending alone cannot answer these questions, so RFM combines it with recency and frequency.
RFM also helps the company run effective, personalized promotional campaigns.
· Recency (R): Who purchased recently? The number of days since the last purchase (lower is better)
· Frequency (F): Who purchases frequently? The total number of purchases (higher is better)
· Monetary Value (M): Who spends the most? The total amount of money the customer spent (higher is better)
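The following small worked example in plain Python makes the three measures concrete for a single customer. The purchase history and the snapshot date are made up for illustration. `rfm_for_customer` is a hypothetical helper and is not part of any library.

```python
from datetime import date

# Hypothetical purchase history for one customer: (invoice date, amount spent).
purchases = [
    (date(2011, 9, 1), 120.50),
    (date(2011, 11, 15), 80.00),
    (date(2011, 12, 5), 45.25),
]

# Assumed snapshot date: the reference day from which recency is measured.
snapshot = date(2011, 12, 10)


def rfm_for_customer(purchases, snapshot):
    """Return (recency_days, frequency, monetary) for one customer's purchases."""
    last_purchase = max(day for day, _ in purchases)
    recency = (snapshot - last_purchase).days  # days since last purchase: lower is better
    frequency = len(purchases)  # number of purchases: higher is better
    monetary = sum(amount for _, amount in purchases)  # total spent: higher is better
    return recency, frequency, monetary


print(rfm_for_customer(purchases, snapshot))  # (5, 3, 245.75)
```

The last purchase was on 5 December, so recency is 5 days. The customer bought 3 times and spent 245.75 in total.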
Steps in Python
1) Load Data
Data source: https://www.kaggle.com/jihyeseo/online-retail-data-set-from-uci-ml-repo?
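Here is a minimal loading sketch with pandas. It assumes the Kaggle download is an Excel file named `Online Retail.xlsx` and that it uses the UCI column names (`InvoiceNo`, `StockCode`, `Description`, `Quantity`, `InvoiceDate`, `UnitPrice`, `CustomerID`, `Country`). Adjust the path if your copy is different. `load_retail_data` is a hypothetical helper name. Reading `.xlsx` files with pandas requires the optional `openpyxl` package.

```python
import pandas as pd

# Assumed file name of the Kaggle download; change it to match your local copy.
DATA_PATH = "Online Retail.xlsx"


def load_retail_data(path=DATA_PATH):
    """Read the transactions and make sure InvoiceDate is a datetime column.

    pandas needs the optional openpyxl package to read .xlsx files.
    """
    df = pd.read_excel(path)
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    return df
```

Usage: once the file is available locally, call `df = load_retail_data()` and then run `df.head()` and `df.shape` to inspect the first rows and the size of the data.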
2) Data Exploration
a) Unique Values
b) Grouping by country
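Both exploration steps map onto standard pandas calls. This sketch uses a few made-up rows with the dataset's column names. Counting distinct customers per country is only one possible summary for the country grouping. You could equally sum quantities or count invoices.

```python
import pandas as pd

# Made-up transactions using the Online Retail column names (illustrative only).
df = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003", "1004"],
    "StockCode": ["A1", "B2", "A1", "C3", "B2"],
    "Quantity": [6, 2, 1, 4, 3],
    "UnitPrice": [2.55, 3.39, 2.55, 1.25, 3.39],
    "CustomerID": [17850, 17850, 13047, 12583, 13047],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom", "France", "United Kingdom"],
})

# a) Number of unique values in each column (invoices, products, customers, ...)
print(df.nunique())

# b) Group by country: number of distinct customers per country, largest first
customers_per_country = (
    df.groupby("Country")["CustomerID"].nunique().sort_values(ascending=False)
)
print(customers_per_country)
```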
3) Data Cleaning
a) Removing rows with negative quantity values
b) Checking for duplicate values
c) Checking for missing values and dropping them
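The three cleaning steps can be chained as shown below. The rows are made up, and `clean_transactions` is a hypothetical helper. The final cast of `CustomerID` to integers is an extra convenience step that we assume is useful, because missing values force pandas to store that column as floats.

```python
import pandas as pd

# Made-up rows: one exact duplicate, one negative quantity, one missing CustomerID.
df = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1001", "C1002", "1003", "1004"],
    "Quantity": [6, 6, 2, -1, 4, 3],
    "UnitPrice": [2.55, 2.55, 3.39, 2.55, 1.25, 3.39],
    "CustomerID": [17850, 17850, 17850, 13047, None, 12583],
})


def clean_transactions(df):
    # a) Keep only rows with a positive quantity (negative values typically mark cancellations)
    df = df[df["Quantity"] > 0]
    # b) Report, then drop, exact duplicate rows
    print("Duplicate rows:", df.duplicated().sum())
    df = df.drop_duplicates()
    # c) Report missing values per column, then drop rows containing any
    print(df.isnull().sum())
    df = df.dropna()
    # NaNs forced CustomerID to float; restore integer IDs (assumed convenience step)
    return df.astype({"CustomerID": int})


clean = clean_transactions(df)
print(clean)
```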
4) Data Pre-processing
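a) Calculating the total price (quantity × unit price) of each line and adding it as a new feature
b) Calculating a snapshot date from the most recent invoice date, to serve as the reference point for measuring recency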
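This sketch shows both pre-processing steps on made-up data. The column names `Quantity`, `UnitPrice` and `InvoiceDate` follow the UCI dataset's layout, and the new column name `TotalPrice` is our own choice. Both are assumptions here. Placing the snapshot date one day after the last invoice is a common convention, also assumed here. It guarantees that every customer's recency is at least one day.

```python
import pandas as pd

# Made-up line items using the dataset's column names.
df = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047],
    "Quantity": [6, 2, 4],
    "UnitPrice": [2.55, 3.39, 1.25],
    "InvoiceDate": pd.to_datetime(["2011-12-01 08:26", "2011-12-05 10:00", "2011-12-09 12:50"]),
})

# a) Total price of each line item = quantity × unit price
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# b) Snapshot date: one day after the most recent invoice (assumed convention)
snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)
print(snapshot_date)  # 2011-12-10 12:50:00
```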
5) RFM Customer Segmentation
I. Calculate RFM values
Recency
i) Group customers by their ID
ii) For each customer, compute the number of days between their most recent invoice date and the snapshot date; this is their recency
Frequency
i) Group customers by their ID
ii) Count the number of invoices for each customer; this is their frequency
Monetary
i) Group customers by their ID
ii) Sum the total price of all purchases for each customer; this is their monetary value
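A single `groupby` with named aggregation can compute all three measures at once. This sketch assumes the cleaned data already has a `TotalPrice` column and a snapshot date, as in step 4. The transactions are made up. Here, frequency is the number of distinct invoices (`nunique`), because one invoice can span several rows. Counting rows instead would give the number of line items.

```python
import pandas as pd

# Made-up cleaned transactions (one row per invoice line).
df = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003", "1004", "1005"],
    "CustomerID": [17850, 17850, 17850, 13047, 13047, 12583],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-11-01", "2011-12-05", "2011-10-20", "2011-12-08", "2011-06-15",
    ]),
    "TotalPrice": [15.30, 6.78, 20.00, 5.00, 12.50, 100.00],
})
snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)  # 2011-12-09

rfm = df.groupby("CustomerID").agg(
    # Days between the customer's latest invoice and the snapshot date
    Recency=("InvoiceDate", lambda dates: (snapshot_date - dates.max()).days),
    # Number of distinct invoices per customer
    Frequency=("InvoiceNo", "nunique"),
    # Total money spent per customer
    Monetary=("TotalPrice", "sum"),
)
print(rfm)
```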
II. Plot RFM distributions
These plots give us some interesting insights, in particular how skewed the data is. Note that here we group these values into quantiles. Quantile groups depend only on the order of the values, so the skew does not distort them. However, when we segment customers with K-Means in the next tutorial, it will be very important to scale the data so that each feature has a mean of 0 and a standard deviation of 1.
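You can measure the skew described above with `DataFrame.skew()`. The scaling needed for K-Means is a z-score standardisation. The values below are made up to mimic a long right tail. If you prefer scikit-learn, `sklearn.preprocessing.StandardScaler` performs the same standardisation and also uses the population standard deviation (`ddof=0`).

```python
import pandas as pd

# Made-up RFM values with a long right tail (a few very frequent, high-spending customers).
rfm = pd.DataFrame({
    "Recency": [2, 5, 9, 30, 60, 120, 300],
    "Frequency": [1, 1, 2, 2, 3, 5, 40],
    "Monetary": [15.0, 20.0, 35.0, 50.0, 80.0, 150.0, 5000.0],
})

# Skewness > 0 means a long tail to the right.
print(rfm.skew())

# z-score standardisation: subtract each column's mean and divide by its
# population standard deviation, so every column has mean 0 and std 1.
scaled = (rfm - rfm.mean()) / rfm.std(ddof=0)
print(scaled.round(2))
```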
III. Create R, F and M groups based on Quartiles
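This sketch builds the quartile groups with `pd.qcut` on made-up RFM values. The recency labels are reversed because fewer days since the last purchase is better. Ranking `Frequency` before binning is an extra step we assume here. Frequency values often contain many ties, and `qcut` raises an error when ties produce duplicate bin edges.

```python
import pandas as pd

# Made-up RFM values for eight customers.
rfm = pd.DataFrame(
    {
        "Recency": [5, 40, 3, 200, 90, 12, 60, 150],
        "Frequency": [10, 2, 15, 1, 1, 6, 3, 1],
        "Monetary": [900.0, 120.0, 1500.0, 30.0, 45.0, 400.0, 150.0, 60.0],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106, 107, 108], name="CustomerID"),
)

# Recency: the lowest quartile (most recent buyers) gets the best label, 4.
rfm["R"] = pd.qcut(rfm["Recency"], q=4, labels=[4, 3, 2, 1]).astype(int)
# Frequency: rank first so tied values still yield four distinct quartile bins.
rfm["F"] = pd.qcut(rfm["Frequency"].rank(method="first"), q=4, labels=[1, 2, 3, 4]).astype(int)
# Monetary: the highest spenders get label 4.
rfm["M"] = pd.qcut(rfm["Monetary"], q=4, labels=[1, 2, 3, 4]).astype(int)
print(rfm)
```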
IV. Calculate the RFM score: add a new column that combines the R, F and M group values
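There are two common ways to combine the groups: a concatenated code (e.g. `"443"`) and a summed score from 3 to 12. The sketch below adds both. The column names `RFM_Segment` and `RFM_Score` and the input values are assumptions for illustration.

```python
import pandas as pd

# R, F and M quartile labels as produced in step III (made-up values).
rfm = pd.DataFrame(
    {"R": [4, 3, 4, 1], "F": [4, 2, 4, 1], "M": [4, 2, 3, 1]},
    index=pd.Index([101, 102, 103, 104], name="CustomerID"),
)

# Concatenated code such as "443": keeps the individual R, F, M levels visible.
rfm["RFM_Segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
# Summed score from 3 (worst) to 12 (best): one number to rank customers by.
rfm["RFM_Score"] = rfm[["R", "F", "M"]].sum(axis=1)
print(rfm)
```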
V. Assign each customer to a segment bin based on their RFM score
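One way to bin customers by score is `pd.cut` with fixed cut points. The cut points and segment names below are hypothetical examples, not a standard. Choose ones that fit your business.

```python
import pandas as pd

# Summed RFM scores (3 to 12) for a few made-up customers.
rfm = pd.DataFrame(
    {"RFM_Score": [12, 7, 11, 3, 9, 5]},
    index=pd.Index([101, 102, 103, 104, 105, 106], name="CustomerID"),
)

# Hypothetical cut points and names; intervals are (2, 5], (5, 8], (8, 10], (10, 12].
bins = [2, 5, 8, 10, 12]
labels = ["Lost", "At Risk", "Loyal", "Champions"]
rfm["Segment"] = pd.cut(rfm["RFM_Score"], bins=bins, labels=labels)
print(rfm)
```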
VI. Calculate the mean recency, frequency and monetary value in each segment bin to get segment statistics
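The per-segment statistics come from a `groupby` on the segment label. This sketch assumes a `Segment` column from step V and uses made-up values. The output column names are hypothetical.

```python
import pandas as pd

# Made-up RFM values with a segment label per customer.
rfm = pd.DataFrame({
    "Recency": [3, 10, 45, 60, 200, 250],
    "Frequency": [12, 8, 3, 2, 1, 1],
    "Monetary": [1500.0, 900.0, 200.0, 150.0, 40.0, 20.0],
    "Segment": ["Champions", "Champions", "Loyal", "Loyal", "Lost", "Lost"],
})

# Mean R, F, M per segment plus the number of customers in it.
segment_stats = rfm.groupby("Segment").agg(
    RecencyMean=("Recency", "mean"),
    FrequencyMean=("Frequency", "mean"),
    MonetaryMean=("Monetary", "mean"),
    Count=("Recency", "count"),
).round(1)
print(segment_stats)
```

The `Count` column is what a treemap would typically use as the rectangle sizes.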
VII. Plot a treemap of the segment statistics to make the segments easier to interpret
Conclusion
In this tutorial, we covered customer segmentation, RFM analysis and an implementation of RFM in Python. We also used some basic pandas features, such as handling duplicates, groupby() and qcut() for binning values based on sample quantiles, along with the squarify library for plotting the segments.
Hopefully, this helps you analyze your own datasets.
Happy Learning!
Sivaranjani