By Nesrin Othmann | September 20, 2018
Uplift Modeling Blog
Motivation
This blog post is a guide to building code for Uplift Models from scratch. It introduces the topic of Uplift Models and the differences between common model types, provides a step-by-step coding tutorial on implementing and evaluating Uplift Models in R, and closes with an open-access data example that readers can run themselves.
1. Introduction to Uplift Modeling
E-mails with a coupon, flyers in the letterbox, two products for the price of one: companies use many tricks to convince customers to buy certain products. But are these marketing actions and campaigns effective at all, or do they simply cost money? To make them worthwhile, it is important to target the relevant customers. But how can you measure the effectiveness of such a campaign? Does a customer buy because of the campaign, or because he or she already wanted to buy the product or service anyway?
1.1 The traditional Response Model
The traditional Response Model targets customers who bought during a previous campaign, or customers who share the characteristics of historical buyers. The big problem of this model is that it does not distinguish between customers who buy because of the campaign and customers who would buy anyway. This is an important issue, because the traditional Response Model potentially wastes money on customers who will buy anyway.
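As a rough sketch of this idea (the data frames past_campaign and prospects and the column name Conversion are hypothetical placeholders), a traditional response model simply scores every customer by the predicted probability of responding and targets the top scores:
# Minimal sketch of a traditional response model (placeholder data/column names)
# fit a logistic regression on last campaign's responders ...
response_model <- glm(Conversion ~ ., family = binomial, data = past_campaign)
# ... then score new prospects and target the highest predicted response rates
scores <- predict(response_model, newdata = prospects, type = "response")
target <- prospects[order(-scores), ][1:1000, ]  # e.g. mail the 1,000 top-scored customers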
1.2 Uplift Models
Uplift Models try to target the people who are most likely to buy because of the campaign (the Persuadables) and thereby overcome the problem of the traditional Response Model. But it is not as easy as it sounds: Persuadables buy if they are treated with a campaign, yet you cannot treat and not treat the same observation at the same time. So it is unknown how a treated customer who responded would have reacted had he or she not been treated.
The treated responders are nevertheless important to target, because they include the possible Persuadables.
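The quantity an Uplift Model therefore tries to estimate is the difference between the response probability with and without treatment. A tiny numerical illustration (the probabilities are invented):
# Hypothetical estimated response probabilities for one customer
p_treated <- 0.30   # P(response | campaign)
p_control <- 0.12   # P(response | no campaign)
uplift    <- p_treated - p_control   # 0.18 = incremental effect of the campaign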
1.3 Different Uplift Models
Several different Uplift Models exist. The Two-Model-Approach models the uplift as the difference of the response probabilities in the treatment and control groups. The response probabilities are estimated separately for each group, which leads to an approach based on two models (Radcliffe and Surry, 1999).
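A minimal sketch of this two-model idea (trainset, testset and the column names Conversion and Treatment are placeholders; the full implementation follows in Section 2.1):
# Two-Model-Approach sketch: one logistic regression per group (placeholder names)
treat_model   <- glm(Conversion ~ . - Treatment, family = binomial,
                     data = subset(trainset, Treatment == 1))
control_model <- glm(Conversion ~ . - Treatment, family = binomial,
                     data = subset(trainset, Treatment == 0))
# predicted uplift = difference of the two response probabilities
uplift <- predict(treat_model, testset, type = "response") -
          predict(control_model, testset, type = "response")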
The approach of Lo (2002) (Lo's Approach) changes the independent variables of a logistic regression: the model includes a dummy treatment variable and treatment interaction terms as predictors. Only a single model is learned, but predicted probabilities are calculated for both scenarios, treatment and control. For these predictions the dummy treatment variable in the test set is set to 1 (treatment) and to 0 (control), and the predicted uplift is the difference of the two predictions (Lo, 2002).
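Sketched in a few lines (again with placeholder names; Section 2.1 contains the full function):
# Lo's approach sketch: one model with treatment interactions (placeholder names)
lo_model <- glm(Conversion ~ . * Treatment, family = binomial, data = trainset)
test_t1  <- transform(testset, Treatment = 1)  # everyone hypothetically treated
test_t0  <- transform(testset, Treatment = 0)  # everyone hypothetically untreated
uplift   <- predict(lo_model, test_t1, type = "response") -
            predict(lo_model, test_t0, type = "response")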
The Class Variable Transformation (CVT) model (Jaskowski and Jaroszewicz, 2012) and Lai's Weighted Uplift Method (LWUM) (Kane, Lo and Zheng, 2014) redefine the dependent y variable. The redefinition stems from the idea that the Control Non-Responders could also include possible Persuadables, because these people never had the chance to be treated and to react to the treatment; had they been treated, they might have responded. So not only the Treated Responders are important, but also the Control Non-Responders (Figure 2), which could likewise include Persuadables. This leads to a new y variable (in the literature called the z-variable), which is 1 if the observation was treated and responded, or was not treated and did not respond; otherwise the new y variable is 0. The difference between CVT and LWUM is that LWUM additionally assigns weights to the observations according to the treatment and control group proportions (Kane, Lo and Zheng, 2014).
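A short sketch of the transformation (dataset with placeholder columns Conversion and Treatment):
# z-variable: 1 for treated responders and control non-responders, 0 otherwise
dataset$z <- as.integer((dataset$Treatment == 1 & dataset$Conversion == 1) |
                        (dataset$Treatment == 0 & dataset$Conversion == 0))
# LWUM additionally attaches observation weights based on the treated share
p_treat        <- mean(dataset$Treatment == 1)
dataset$weight <- ifelse(dataset$Treatment == 1, 1 - p_treat, p_treat)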
1.4 Evaluating Uplift Models
The evaluation of Uplift Models leans on gain charts, adapted especially for Uplift Models. These charts are built by sorting the population from the best to the worst predicted uplift and partitioning it into segments. The y-axis represents the cumulative incremental gains and the x-axis the proportion of the population targeted. From the per-segment calculations an Uplift Curve and a random curve are constructed. The Qini-Coefficient is the difference between the area under the Uplift Curve and the area under the random curve. A Qini-Coefficient near one represents a good performance of the Uplift Model, while a Qini-Coefficient near zero represents a poor one.
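As a rough sketch of the arithmetic behind such a chart (the counts below are invented; the full calculation follows in Section 2.2), the uplift of a segment is the difference of the response rates of the treated and control observations, and the curve is formed by the cumulative incremental gains:
# Hypothetical counts for three segments, ordered from best to worst predicted uplift
Rt <- c(110, 60, 40); Nt <- c(250, 250, 250)   # treated responders / treated per segment
Rc <- c( 75, 45, 35); Nc <- c(250, 250, 250)   # control responders / controls per segment
uplift_per_segment <- Rt / Nt - Rc / Nc                 # response-rate difference
cig <- cumsum((Rt - Rc * sum(Nt) / sum(Nc)) / sum(Nt))  # cumulative incremental gains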
To demonstrate the performance of Uplift Models, the datasets of the Information package in R are used. The code below shows an example of the model-building and uplift-prediction process, the calculation of the Qini-Coefficient and a graphical visualization of the Qini-Charts (Qini-Curves).
2. Step-by-Step Coding Tutorial
All functions support five different models: the traditional Response Model, the Two-Model-Approach, Lo's Approach, the CVT and the LWUM.
2.1 Function 1: PredictionProcess
PredictionProcess = function(dataset,Conversion,Treatment, Model = "LWUM"){
#drop the original conversion and treatment columns, then re-attach the
#conversion and treatment vectors in front under standardized names
d1 = dataset[,-Conversion]
d2 = d1[,-Treatment]
Dataset = cbind(Conversion,Treatment,d2)
Dataset = as.data.frame(Dataset)
names(Dataset)[1] = "Conversion"
names(Dataset)[2] = "Treatment"
#Check the arguments for Model
if (!Model%in% c("LWUM", "tradResponse","2ModApp","CVT","LosApp"))
stop("Model must be either LWUM,tradResponse, 2ModApp, LosApp or CVT")
#different model building
#yvar is the conversion variable
if(Model == "tradResponse"||Model=="LosApp"){Dataset$yvar = Dataset$Conversion}
else if (Model == "2ModApp"){
#Split the Dataset in two different Datasets depending if there was a treatment or not
Dataset1=Dataset
new = which(Dataset1$Treatment==1)
Dataset1 = Dataset1[new,]
Dataset2 = Dataset
new2 = which(Dataset2$Treatment==0)
Dataset2 = Dataset2[new2,]
#yvar is the conversion variable
Dataset1$yvar = Dataset1$Conversion
Dataset2$yvar = Dataset2$Conversion}
#new variable yvar is the response variable in a logistic regression
else if(Model=="CVT"){ Dataset$yvar = 0
#yvar is redefined for the CVT: represents the z-variable
Dataset$yvar[Dataset$Conversion == 1 & Dataset$Treatment == 1] =1
Dataset$yvar[Dataset$Conversion == 0 & Dataset$Treatment == 0] =1
Dataset$yvar}
else if(Model=="LWUM"){Dataset$yvar = 0
#yvar is redefined in LWUM: represents the z-variable
Dataset$yvar[Dataset$Conversion == 1 & Dataset$Treatment == 1] =1
Dataset$yvar[Dataset$Conversion == 0 & Dataset$Treatment == 0] =1
Dataset$yvar
proportiontreat=sum(Dataset$Treatment==1)/length(Dataset$Treatment)
Dataset$weight = ifelse(Dataset$Treatment == 1, 1 - proportiontreat,proportiontreat)
}
#Model learning and prediction with cross validation
if(Model == "tradResponse"||Model=="LosApp"||Model=="CVT"||Model=="LWUM"){
# set the random seed, then shuffle the Dataset -> shuffled
set.seed(1)
n = nrow(Dataset)
shuffled = Dataset[sample(n),]
#cross validation
for (i in 1:4) {
# indices determine intervals of the test set
indices = (((i-1) * round((1/4)*nrow(shuffled))) + 1):((i*round((1/4) * nrow(shuffled))))
# encapsulate the test sets from the Dataset = train set
train = shuffled[-indices,]
# indices (25% of the data) form the test set
testset = shuffled[indices,]
# A Model will be learned with every training set:
if(Model=="LWUM"||Model=="tradResponse"||Model=="CVT"){
train$Treatment = NULL
train$Conversion=NULL
if(Model=="LWUM"){train$weight=NULL}
#build a logistic regression
logisticReg = glm(yvar ~., data = train, family = "binomial")
#make predictions on the test set
pred = predict(logisticReg, testset, type = "response")
if(Model=="LWUM"){pred=2*pred*testset$weight-1}
if(Model=="CVT"){pred=2*pred-1}
Testdata = data.frame(pred,testset)}
else if(Model=="LosApp"){ train$Conversion =NULL
logisticReg = glm(yvar ~.*Treatment, data = train, family = "binomial")
testset1=testset
testset1$Treatment=rep(1,length(testset1$Treatment))
pred1 = predict(logisticReg, testset1, type = "response")
testset2=testset
testset2$Treatment=rep(0,length(testset2$Treatment))
pred2= predict(logisticReg, testset2, type = "response")
pred = pred1-pred2
Testdata = data.frame(pred,testset)}}}
else if(Model=="2ModApp"){
# set the random seed, then shuffle Dataset1
set.seed(1)
n = nrow(Dataset1)
shuffled = Dataset1[sample(n),]
#cross validation
for (i in 1:4) {
indices = (((i-1) * round((1/4)*nrow(shuffled))) + 1):((i*round((1/4) * nrow(shuffled))))
train = shuffled[-indices,]
testset1 = shuffled[indices,]
train$Treatment=NULL
train$Conversion=NULL
#build a logistic regression
logisticReg1 = glm(yvar ~., data = train, family = "binomial")}
# set the random seed, then shuffle Dataset2
set.seed(1)
n = nrow(Dataset2)
shuffled = Dataset2[sample(n),]
#cross validation
for (i in 1:4) {
indices = (((i-1) * round((1/4)*nrow(shuffled))) + 1):((i*round((1/4) * nrow(shuffled))))
train = shuffled[-indices,]
testset2 = shuffled[indices,]
train$Treatment=NULL
train$Conversion=NULL
logisticReg2 = glm(yvar ~., data = train, family = "binomial")}
testset=rbind(testset1,testset2)
pred1 = predict(logisticReg1, testset, type = "response")
pred2=predict(logisticReg2, testset, type = "response")
pred=pred1-pred2
Testdata = data.frame(pred,testset)}
return(Testdata)
}
The first function, PredictionProcess, takes a few arguments as input: the dataset, the two variables of interest (Conversion and Treatment) and the Model that should be used. The conversion and treatment variables have different names in different datasets, which is why the two columns are renamed for easier use inside the function. Excluding both columns from the dataset and then binding them back onto the reduced dataset (without the treatment and conversion columns) yields a dataset in which the conversion variable is the first column and the treatment variable the second. The columns are then renamed and the Model argument is checked. After that, the different models are built according to their definition.
Apart from the LWUM and the CVT, the models use Conversion directly as the dependent y variable (yvar). For the LWUM and the CVT the function has to redefine the dependent variable as the z-variable. The LWUM additionally needs observation weights based on the proportion of treated observations: a treated observation receives 1 minus the treated proportion as its weight, and a control observation receives the treated proportion (for example, if 60% of the observations are treated, treated observations get the weight 0.4 and controls 0.6). The Two-Model-Approach is also special in the model-building step, because the dataset has to be split into two datasets according to membership in the treatment or control group.
For model learning and prediction the dataset has to be split into a training and a test set. For more reliable predictions a cross-validation process is recommended and used in the function; it runs four iterations (folds). Beforehand the dataset is shuffled to obtain a random distribution of the observations across the folds. The first for-loop handles all models except the Two-Model-Approach, because the Two-Model-Approach requires two separate learning processes, as already mentioned. After building the training set, each model is learned according to its definition: every model is fitted as a logistic regression with its predefined y-variable, and afterwards predictions are made on the test set.
There are differences in the model building. For example, for learning Lo's model the logistic regression has to include the treatment variable and the treatment interaction terms as independent variables. After learning, two prediction values are computed by setting the dummy treatment variable in the test set first to 1 and then to 0, as described above. The predicted uplift is the difference between those two predictions.
For the traditional Response Model and the CVT model there are no further particularities apart from the different y-variable. The predicted uplift for the CVT and the LWUM is calculated with the formula of Jaskowski and Jaroszewicz (2012); the LWUM additionally includes the weight variable in the formula.
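This rescaling of the predicted probability of the z-variable corresponds to the following lines, shown here in isolation (logisticReg, testset and the weight column refer to the objects used inside PredictionProcess):
# Rescaling the predicted z-probability into an uplift estimate
p_z         <- predict(logisticReg, testset, type = "response")  # P(z = 1 | x)
uplift_cvt  <- 2 * p_z - 1                    # CVT (Jaskowski and Jaroszewicz, 2012)
uplift_lwum <- 2 * p_z * testset$weight - 1   # LWUM variant including the weights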
The Two-Model-Approach is a bit more elaborate. After separating the dataset into a treatment set and a control set, each set requires its own learning process, including a separate cross-validation. For the prediction, the two test sets resulting from the two processes are merged into one test set. Then the two regressions produce two prediction values on this combined test set, and the predicted uplift is the difference between those predictions.
In the end, the prediction variable is merged with the test dataset so that the prediction is the first column, the conversion variable moves to the second position and the treatment variable to the third. This ordering matters for the further steps.
2.2 Function 2: Qini
####Qini-Function######
Qini=function(predVector, y, ct, Segments = 10) {
#Split Dataset in different segments
seg = cbind(predVector = predVector, y = y, ct = ct, predrank = rank(-predVector))
#quantile breakpoints that define the segment boundaries
brk = unique(quantile(seg[, 4],
probs = seq(0, 1, 1 / Segments)))
#Creating segments: cut the prediction ranks at the quantile breakpoints
seg = cbind(seg, decile = cut(seg[, 4], breaks = brk, labels = NULL,include.lowest = TRUE))
Rc = tapply(seg[seg[, 3] == 0, ][, 2], seg[seg[, 3] == 0, ][, 5], sum)
Rt = tapply(seg[seg[, 3] == 1, ][, 2], seg[seg[, 3] == 1, ][, 5], sum)
RcMean = tapply(seg[seg[, 3] == 0, ][, 2], seg[seg[, 3] == 0, ][, 5], mean)
RtMean = tapply(seg[seg[, 3] == 1, ][, 2], seg[seg[, 3] == 1, ][, 5], mean)
Nc = tapply(seg[seg[, 3] == 0, ][, 2], seg[seg[, 3] == 0, ][, 5], length)
Nt = tapply(seg[seg[, 3] == 1, ][, 2], seg[seg[, 3] == 1, ][, 5], length)
PM = merge(cbind(Rc, RcMean, Nc), cbind(Rt, RtMean, Nt), by= "row.names", all = TRUE)
PM$Row.names = as.numeric(PM$Row.names)
PM[, c(2, 4, 5, 7)][is.na(PM[, c(2, 4, 5, 7)])]= 0 # missing implies 0 counts
PM = PM[order(PM$Row.names),]
res = cbind(group=PM$Row.names, Nt=PM$Nt, Nc=PM$Nc, Rt = PM$Rt, Rc = PM$Rc,RtMean = PM$RtMean,RcMean = PM$RcMean)
# Uplift
#PM$uplift = (Rt-((Rc*Nt)/Nc))
#alternative Uplift:
PM$uplift = (Rt/Nt)-(Rc/Nc)
#cumulative incremental gain
PM$cig =(res[, 4] - res[, 5] * sum(res[, 2]) / sum(res[, 3])) / sum(res[, 2])
PM$Is = rep(0,Segments)
PM$Is = PM$cig[1]
for(i in 2:Segments){ #cumulate with a for-loop
PM$Is[i] = PM$Is[i-1]+PM$cig[i]
}
cumincrgain = PM$Is
# alternative to incremental cumulative gain: cumsum(PM$cig)
#overall incremental gain
Ioa = sum(res[, 4]) / sum(res[, 2]) - sum(res[, 5]) / sum(res[, 3])
PM$Ioa= Ioa
groupNum = nrow(PM)
#random cumulative incremental gain
PM$Irand1 = Ioa/groupNum
PM$Irand= rep(0,Segments)
PM$Irand = PM$Irand1[1]
for(i in 2:Segments){PM$Irand[i] = PM$Irand[i-1]+PM$Irand1[i]} #cumulate with a for-loop
randcumincrgain = PM$Irand
# alternative to for-loop: Irand = cumsum(rep(Ioa / groupNum, groupNum))
#Area under the Uplift-Curve
x = seq(1 / groupNum, 1, 1 / groupNum)
y = cumincrgain
AUC = 0
for (s in 2:length(x)) { #sum up the individual trapezoid areas with a for-loop
width =x[s] - x[s-1] #width of a trapezoid: the segment section on the x-axis
height = y[s] + y[s-1] #"height" of a trapezoid: sum of the cumulative incremental gains of the two adjacent segments
AUCSegment = 0.5*width*height
AUC = AUC + AUCSegment}
#Top segment
AUC1 = 0.5*(x[2]-x[1])*(y[2]+y[1])
#area under the random curve
x = seq(1 / groupNum, 1, 1 / groupNum)
y.randcumincrgain = randcumincrgain
AUCrand = 0
for (i in 2:length(x)) {#with a for-loop calculate the sum of the individual trapezoid areas
width = x[i] - x[i-1]
height = y.randcumincrgain[i] + y.randcumincrgain[i-1]
AUCrandSegment = 0.5*width*height
AUCrand = AUCrand + AUCrandSegment}
#Top segment
AUCrand1 = 0.5*(x[2]-x[1])*(y.randcumincrgain[2]+y.randcumincrgain[1])
#Difference of the areas: Qini-coefficient
Qini = AUC - AUCrand
#Qini top segment
QiniTop= AUC1-AUCrand1
#Define output of the function
PM = PM[order(PM$Row.names), ]
res = cbind(group = PM$Row.names,Nt= PM$Nt,Nc= PM$Nc,Rt = PM$Rt, Rc = PM$Rc,RtMean = PM$RtMean,RcMean = PM$RcMean,
uplift = PM$uplift,cumincrgain = PM$Is, Ioa, randcumincrgain= PM$Irand,AUC,AUCrand,Qini,QiniTop,groupNum)
res = round(res, 6)
return(as.data.frame(res))}
The Qini function performs all the calculations that are necessary for the evaluation of the models. It takes four arguments: the vector of predictions, the conversion (response) variable, the treatment variable and, last, the number of desired segments.
The function starts by splitting the data into the desired number of segments (ten by default), ordered by the predicted uplift. After that, for each segment, the sums and means of the responses are calculated separately for the treatment and control group.
Next, the number of treated observations and the number of untreated (control) observations are counted per segment. These vectors are used to calculate the actual uplift, the cumulative incremental gains, the overall incremental gain and the random cumulative incremental gains. The cumulation in the code is done with a for-loop.
For the calculation of the area under the curves (uplift or random), the corresponding cumulative incremental gains are needed; they represent the y-axis of the coordinate system. For each segment (the segments lie on the x-axis) the area is calculated and then summed up so that the whole area under the curve is captured. The area of one segment is calculated with the trapezoid formula. The difference between the area under the uplift curve and the area under the random curve is the Qini-Coefficient.
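The trapezoid loop can also be written as a single vectorized expression; a sketch, assuming x, y and randcumincrgain as defined inside Qini():
# Vectorized equivalent of the trapezoid loops above
AUC     <- sum(0.5 * diff(x) * (head(y, -1) + tail(y, -1)))
AUCrand <- sum(0.5 * diff(x) * (head(randcumincrgain, -1) + tail(randcumincrgain, -1)))
Qini    <- AUC - AUCrand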
2.3 Function 3: QiniPlot
QiniPlot <- function(Model="LWUM",cumincrgain,randcumincrgain, groupNum){
#groupNum is passed as a column of the Qini() output, so its length equals the number of segments
groupNum <- length(groupNum)
#Check the arguments for Model
if (!Model%in% c("LWUM","tradResponse","2ModApp","LosApp","CVT"))
stop("Model must be either LWUM,tradResponse, 2ModApp, LosApp or CVT")
miny <- 100 * min(c(randcumincrgain, cumincrgain))
maxy <- 100 * max(c(randcumincrgain, cumincrgain))
if(Model=="LWUM"){
headline = "Qini Curve LWUM (logistic regression)"}
else if(Model=="tradResponse"){
headline = "Qini Curve traditional Response Model (logistic regression)"}
else if(Model=="LosApp"){
headline = "Qini Curve Lo´s Approach (logistic regression)"}
else if(Model=="2ModApp"){
headline = "Qini Curve Two Model Approach (logistic regression)"}
else if(Model=="CVT"){
headline = "Qini Curve CVT (logistic regression)"}
if(Model=="LosApp"||Model=="2ModApp"||Model=="CVT"||Model=="LWUM"){Mdl="Uplift Model"}
else{Mdl="traditional Response Model"}
Qini_Plot <- plot(cumincrgain * 100 ~ seq(100 / groupNum, 100, 100 / groupNum), type ="b",
col="blue" , main = headline ,lty = 2, xlab = "Proportion of population targeted (%)",
ylab = "Cumulative incremental gains (pc pt)",ylim = c(miny, maxy))
lines(randcumincrgain * 100 ~ seq(100 / groupNum, 100, 100 / groupNum), type = "l", col="red" , lty = 1)
legend("topright", c(Mdl, "Random Model"), col=c("blue", "red"), lty=c(2,1))}
The third function, QiniPlot, plots the Qini curve and the random curve into one coordinate system. The y-axis represents the cumulative incremental gains in percentage points and the x-axis the proportion of the population targeted in percent. The arguments of the function are the cumulative incremental gains, the random cumulative incremental gains for the random curve, the model (which determines the title of the chart) and the number of groups/segments for the sections on the x-axis.
3. Example: Executing the code with the Information package in R
The Information package in R includes two datasets called train and valid, because the data has already been split. To use the presented code, both datasets can be merged into one, since the data is split again during code execution.
library(Information)
Info = rbind(valid,train)
Three execution examples are presented below: the traditional Response Model, the Two-Model-Approach and the CVT model.
test6= PredictionProcess(Info, Info$PURCHASE, Info$TREATMENT, Model= "tradResponse")
QinitradR2 = Qini(test6[,1],test6[,2],test6[,3],Segments = 10)
QiniPlot(Model="tradResponse",QinitradR2$cumincrgain,QinitradR2$randcumincrgain,QinitradR2$groupNum)
test7= PredictionProcess(Info, Info$PURCHASE, Info$TREATMENT, Model="2ModApp" )
Qini2ModApp2 = Qini(test7[,1],test7[,2],test7[,3],Segments = 10)
QiniPlot(Model="2ModApp",Qini2ModApp2$cumincrgain,Qini2ModApp2$randcumincrgain,Qini2ModApp2$groupNum)
test8= PredictionProcess(Info, Info$PURCHASE, Info$TREATMENT, Model= "CVT")
QiniCVT2 = Qini(test8[,1],test8[,2],test8[,3],Segments = 10)
QiniPlot(Model="CVT",QiniCVT2$cumincrgain,QiniCVT2$randcumincrgain,QiniCVT2$groupNum)
In the resulting figures you can see that the Uplift Models (Two-Model-Approach and CVT) worked better than the traditional Response Model, whose uplift curve lies below the random curve. This means the Qini-Coefficient of the traditional Response Model is negative, which indicates a poor performance. The Uplift Models thus select the customers who actually react because of a marketing action much more effectively.
The Qini-Coefficient and the other quantities calculated by the Qini function, shown here for the Two-Model-Approach:
Qini2ModApp2
## group Nt Nc Rt Rc RtMean RcMean uplift cumincrgain Ioa
## 1 1 247 253 113 78 0.457490 0.308300 0.149189 0.013756 0.019561
## 2 2 253 247 65 45 0.256917 0.182186 0.074731 0.021615 0.019561
## 3 3 271 229 49 33 0.180812 0.144105 0.036707 0.027910 0.019561
## 4 4 226 274 39 23 0.172566 0.083942 0.088625 0.034231 0.019561
## 5 5 255 245 34 18 0.133333 0.073469 0.059864 0.040564 0.019561
## 6 6 249 251 8 11 0.032129 0.043825 -0.011696 0.039340 0.019561
## 7 7 248 252 13 12 0.052419 0.047619 0.004800 0.039708 0.019561
## 8 8 246 254 49 43 0.199187 0.169291 0.029896 0.041990 0.019561
## 9 9 258 242 62 74 0.240310 0.305785 -0.065475 0.037016 0.019561
## 10 10 255 245 78 121 0.305882 0.493878 -0.187995 0.019561 0.019561
## randcumincrgain AUC AUCrand Qini QiniTop groupNum
## 1 0.001956 0.029903 0.009683 0.02022 0.001475 10
## 2 0.003912 0.029903 0.009683 0.02022 0.001475 10
## 3 0.005868 0.029903 0.009683 0.02022 0.001475 10
## 4 0.007824 0.029903 0.009683 0.02022 0.001475 10
## 5 0.009781 0.029903 0.009683 0.02022 0.001475 10
## 6 0.011737 0.029903 0.009683 0.02022 0.001475 10
## 7 0.013693 0.029903 0.009683 0.02022 0.001475 10
## 8 0.015649 0.029903 0.009683 0.02022 0.001475 10
## 9 0.017605 0.029903 0.009683 0.02022 0.001475 10
## 10 0.019561 0.029903 0.009683 0.02022 0.001475 10
References:
Radcliffe, N. J., 2007. Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models. Direct Marketing Journal, Direct Marketing Association Analytics Council (1), pp. 14-21.
Kane, K., Lo, V. S., and Zheng, J., 2014. Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods. Journal of Marketing Analytics, 2(4), pp. 218-238.
Lai, L. T., 2006. Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers. Master of Science Thesis, Simon Fraser University School of Computing Science, Burnaby, BC, Canada.
Lo, V. S., 2002. The True Lift Model: A Novel Data Mining Approach to Response Modeling in Database Marketing. ACM SIGKDD Explorations Newsletter, 4(2), pp. 78-86.
Jaskowski, M., and Jaroszewicz, S., 2012. Uplift Modeling for Clinical Trial Data. ICML 2012 Workshop on Clinical Data Analysis.
Radcliffe, N. J., and Surry, P. D., 1999. Differential Response Analysis: Modeling True Response by Isolating the Effect of a Single Action. Credit Scoring and Credit Control VI.