摘要:Recency (R)– 用户最后一次购买的时间。距离现在时间越短,用户再次购买的可能性越大。Frequency (F)-用户在一定时间内购买的次数。频率越高,表明用户对品牌的忠诚度越高。Monetary (M)-用户在一定时间内为公司带来的总收益。金额越高,
在产品运营中,我们经常需要将用户进行分层,以便更好针对性做运营策略。本文分享了如何用SQL结合RFM模型,对用户进行分层的方法,供大家参考学习。
RFM模型通常用于分析用户数据库,以识别最有价值的用户。
Recency (R)– 用户最后一次购买的时间。距离现在时间越短,用户再次购买的可能性越大。Frequency (F)-用户在一定时间内购买的次数。频率越高,表明用户对品牌的忠诚度越高。Monetary (M)-用户在一定时间内为公司带来的总收益。金额越高,表明用户的价值越大。
通过RFM模型,企业可以对用户进行细分,比如将用户分为高价值用户、需要挽留的用户、有潜力的用户等,然后根据这些细分采取不同的营销策略。
作为产品经理如何使用SQL计算RFM模型,对用户进行分层呢?
一、数据源准备用户会员表数据
订单表数据(部分字段)
因 MySQL 性能问题,我们将数据通过Binlog订阅同步到 Hive 进行计算;
2.1、RFM模型的计算步骤如下:确定时间范围:首先确定分析的时间范围,比如过去一年或过去六个月。
这里我们使用
AND TO_DATE(o.SOCreateTime) >= ‘2024-04-01′ AND TO_DATE(o.SOCreateTime)
收集数据:收集客户在所选时间范围内的所有交易记录。
SELECT m.mimemberid AS memberid,
MAX(o.socreatetime) AS last_order_time,
DATEDIFF(‘2024-07-01’, MAX(o.socreatetime)) AS R,
COUNT(o.soordersn) AS F,
SUM(o.sototalamount) AS M
FROM ods_travel.v_teschoolinnermarket_memberinfo m
LEFT JOIN paimon.fts_base_tetravelrvsorder.schoolorder o
ON m.mimemberid = o.somemberid
WHERE o.SOPayStatus = 2
AND m.MIStatus = 0
AND TO_DATE(o.SOCreateTime) >= ‘2024-04-01’
AND TO_DATE(o.SOCreatetime)
GROUP BY m.mimemberid
ORDER BY R ASC;
计算Frequency (F):
计算Monetary (M):
DROP TABLE IF EXISTS adsxyt_travel.userrfm;
CREATE TABLE adsxyt_travel.userrfm
STORED AS ORC AS
WITH mada_order_num AS (
SELECT a.SOMemberId AS memberid, COUNT(*) AS ordernum
FROM paimon.fts_base_tetravelrvsorder.SchoolOrder a
INNER JOIN paimon.fts_base_tetravelrvsorder.SchoolOrderExpand b ON a.SOOrderSn = b.SOEOrderSn
WHERE a.SOPayStatus = 2
AND TO_DATE(a.SOCreateTime) >= ‘2024-04-01’
AND TO_DATE(a.SOCreateTime)
GROUP BY a.SOMemberId
),
base_data AS (
— 查询最原始的RFM值
SELECT m.mimemberid AS memberid, MAX(o.socreatetime) AS last_order_time
, DATEDIFF(‘2024-07-01’, MAX(o.socreatetime)) AS R
, COUNT(o.soordersn) AS F, SUM(o.sototalamount) AS M
FROM ods_travel.v_teschoolinnermarket_memberinfo m
LEFT JOIN paimon.fts_base_tetravelrvsorder.schoolorder o ON m.mimemberid = o.somemberid
WHERE o.SOPayStatus = 2
AND m.MIStatus = 0
AND TO_DATE(o.SOCreateTime) >= ‘2024-04-01’
AND TO_DATE(o.SOCreateTime)
GROUP BY m.mimemberid
),
quartiles AS (
— 按照数据的4分位数计算RFM得分
SELECT *, NTILE(4) OVER (ORDER BY R) AS R_score
, NTILE(4) OVER (ORDER BY F DESC) AS F_score
, NTILE(4) OVER (ORDER BY M DESC) AS M_score
FROM base_data
),
quartiles_fixed AS (
— 四分位数修正
SELECT *
, CASE
WHEN R_score = 1 THEN 4
WHEN R_score = 2 THEN 3
WHEN R_score = 3 THEN 2
ELSE 1
END AS R_score_fixed
, CASE
WHEN F_score = 1 THEN 1
WHEN F_score = 2 THEN 2
WHEN F_score = 3 THEN 3
ELSE 4
END AS F_score_fixed
, CASE
WHEN M_score = 1 THEN 1
WHEN M_score = 2 THEN 2
WHEN M_score = 3 THEN 3
ELSE 4
END AS M_score_fixed
FROM quartiles
),
means AS (
SELECT AVG(R_score_fixed) AS r_mean, AVG(F_score_fixed) AS f_mean
, AVG(M_score_fixed) AS m_mean
FROM quartiles_fixed
)
SELECT qf.memberid, mc.ordernum, qf.last_order_time, qf.R, qf.F
, qf.M, qf.R_score_fixed AS R_score, qf.F_score_fixed AS F_score, qf.M_score_fixed AS M_score, m.r_mean
, m.f_mean, m.m_mean
, CASE
WHEN R_score_fixed > m.r_mean THEN ‘高’
ELSE ‘低’
END AS R_label
, CASE
WHEN F_score_fixed > m.f_mean THEN ‘高’
ELSE ‘低’
END AS F_label
, CASE
WHEN M_score_fixed > m.m_mean THEN ‘高’
ELSE ‘低’
END AS M_label
FROM quartiles_fixed qf
CROSS JOIN means m
LEFT JOIN order_num mc ON qf.memberid = mc.memberid
ORDER BY qf.R ASC;
通过一系列公共表表达式(CTEs)构建了一个RFM(最近购买行为、购买频率、购买金额)分析模型,用于对会员进行分类。首先,它计算了每个会员在指定时间段内的订单数量、最后下单时间、以及基于这些数据的RFM原始值。接着,通过四分位数方法为每个RFM值分配得分,并进行修正以确保得分与会员价值正相关。然后,计算这些得分的平均值,用于确定每个会员的RFM标签(高或低)。最后,结合这些标签和订单数量,对会员进行分类,并按最近购买行为进行排序。
为RFM打分:
将R、F、M的值分别进行标准化或归一化,以便于比较。例如,可以使用排名或百分比来为每个维度打分。Recency可以按照时间从近到远进行排序,然后分配分数,时间越近分数越高。Frequency可以按照购买次数从多到少进行排序,然后分配分数,购买次数越多分数越高。Monetary可以按照总金额从高到低进行排序,然后分配分数,金额越高分数越高。综合RFM得分:
将R、F、M的分数相加,得到每个客户的RFM总分。根据总分将客户分为不同的群体,如高价值客户、需要挽留的客户、低价值客户等。分析和应用:
分析不同RFM群体的特征,制定相应的营销策略。例如,对于高RFM得分的客户,可以提供忠诚度奖励或个性化服务;对于低RFM得分的客户,可以设计促销活动以提高其购买频率和金额。四等位数法
其中使用了四分位数,是统计学分位数中的一种,把所有数值从低到高(或者从高到底)排列并分成四等份,处于三个分割点位置的数值就是四分位数。
一般表示为:
Q1:样本排列中处于25%位置的数字;
Q2:又称为中位数,指的是样本排列中处于50%位置,即中间位置的数据;
Q3:样本排列中处于75%位置的数字。
假设样本数据项数一共是N:
则Q1的位置数值=(N+1)/4;
Q2的位置数值=(N+1)/2;
Q3的位置数值=3(N+1)/4。
如果(N+1)恰好是4的倍数,则确定四分位数比较简单,如果不是4的倍数,相关位置的四分位数就应该是相邻两个数值的标志值的平均数。权数的大小取决于两个数值距离的远近,距离越近权数越大,距离越远,权数越小,权数之和等于1。
DROP TABLE IF EXISTS adsxyt_travel.userrfmcategory;
CREATE TABLE adsxyt_travel.userrfmcategory
STORED AS ORC AS
SELECT memberid, mada_ordernum, last_order_time, R, F, M
, R_score, F_score, M_score, r_mean, f_mean
, m_mean, R_label, F_label, M_label
, CASE
WHEN R_label = ‘高’
AND F_label = ‘高’
AND M_label = ‘高’
THEN ‘重要价值用户’
WHEN R_label = ‘高’
AND F_label = ‘低’
AND M_label = ‘高’
THEN ‘重要发展用户’
WHEN R_label = ‘低’
AND F_label = ‘高’
AND M_label = ‘高’
THEN ‘重要保持用户’
WHEN R_label = ‘低’
AND F_label = ‘低’
AND M_label = ‘高’
THEN ‘重要挽留用户’
WHEN R_label = ‘高’
AND F_label = ‘高’
AND M_label = ‘低’
THEN ‘一般价值用户’
WHEN R_label = ‘高’
AND F_label = ‘低’
AND M_label = ‘低’
THEN ‘一般发展用户’
WHEN R_label = ‘低’
AND F_label = ‘高’
AND M_label = ‘低’
THEN ‘一般保持用户’
WHEN R_label = ‘低’
AND F_label = ‘低’
AND M_label = ‘低’
THEN ‘一般挽留用户’
ELSE ‘未分类’
END AS user_category
FROM adsxyt_travel.userrfm;
使用CASE语句根据R、F、M的标签值(’高’或’低’)来确定用户类别。这些标签可能代表了用户在最近性(Recency)、频率(Frequency)、货币价值(Monetary)三个方面的表现。user_category是根据R、F、M的标签值组合来定义的用户类别,如“重要价值用户”、“重要发展用户”等。(参照上述表格)敏感数据不做暂时,本文提供 SQL 计算解决思路,具体可参照实验。
本文由 @李昂 原创发布于人人都是产品经理。未经作者许可,禁止转载
题图来自Unsplash,基于CC0协议
该文观点仅代表作者本人,人人都是产品经理平台仅提供信息存储空间服务
来源:人人都是产品经理一点号