How do promotional offers affect Starbucks users? — A data exploration.

This blog post describes the results of my data exploration of the simulated dataset provided by Starbucks. The exploration is the final project of Udacity’s Data Scientist Nanodegree. The dataset simulates how customers make purchases and how their decisions might be affected by Starbucks’ promotional offers. People in the simulation receive three types of offers: 1. buy-one-get-one (BOGO), 2. discount and 3. informational. They generate various events, such as receiving offers, viewing offers and making purchases. Each person has hidden traits that shape his or her purchasing patterns. The goal of this project is to identify groups of people who share similar purchasing patterns and to find out whether they are influenced by promotional offers.

The problem

Starbucks, like many other companies, is always interested in identifying different customer segments. The dataset is designed to simulate customer purchasing patterns under different types of promotional offers. There are two main questions we’d like to answer about this dataset.

  1. How can we identify different groups of customers? What do they have in common?
  2. How do different groups of customers respond to different types of promotional offers?

The data exploration

This dataset contains three parts: the customer profiles, the promotion portfolio and a transcript of the events that customers produce. The following figures illustrate what the raw data looks like.

Fig 1: The Head of the provided Profile Dataset

The customer profile data contains each customer’s id, age, gender, income and the date he or she became a member. To look further into the customers’ information, distribution plots of their ages and incomes are shown below. In the age distribution, a peak appears close to 120 because missing (‘NaN’) ages are encoded as this value. For the income plot, all ‘NaN’ values are dropped.

Fig 2: The Distribution Plot of the Customers’ Age
Fig 3: The Distribution Plot of the Customers’ Income
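The cleaning described for these two plots can be sketched in plain Python as below. This is a minimal illustration, not the author’s code: the sentinel value 118 is an assumption (the text only says “close to 120”), and the helper names `clean_profile` and `non_missing` are hypothetical.

```python
import math

# Assumption: missing ages are stored as a sentinel close to 120
# (118 in the copies of the data we have seen); change it if yours differs.
AGE_PLACEHOLDER = 118


def clean_profile(rows, age_placeholder=AGE_PLACEHOLDER):
    """Return copies of the profile rows with the age sentinel replaced by None."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the raw records stay untouched
        if row.get("age") == age_placeholder:
            row["age"] = None
        cleaned.append(row)
    return cleaned


def non_missing(rows, field):
    """Values of `field` with None and NaN dropped, ready for a histogram."""
    values = []
    for row in rows:
        value = row.get(field)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        values.append(value)
    return values
```

Replacing the sentinel with `None` before plotting removes the artificial spike near 120 instead of hiding it behind an axis limit.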

The following figure presents all the types of promotional offers, including offer id, offer type, difficulty (how much you need to spend to complete an offer, in dollars), reward (the amount given for completing an offer, in dollars), duration (how long the offer stays open, in days) and channels.

Fig 4: The Promotion Portfolio Dataset
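One way to read the portfolio fields is as a small record type plus a check of whether a customer’s spending meets an offer’s difficulty within its window. The sketch below is illustrative only: `Offer` and `reaches_difficulty` are hypothetical names, the lowercase offer-type labels are assumed, and it assumes spending counts from the moment the offer is received. Duration and transaction times must be in the same unit, so convert days to hours if your transaction times are in hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: str
    offer_type: str     # assumed labels: "bogo", "discount", "informational"
    difficulty: float   # dollars to spend before the offer counts as completed
    reward: float       # dollars rewarded on completion
    duration: float     # open window, in the same time unit as transaction times
    channels: tuple     # e.g. ("email", "mobile")


def reaches_difficulty(offer, received_at, transactions):
    """True if spend inside the offer window reaches the offer's difficulty.

    `transactions` is a list of (time, amount) pairs for one customer.
    Assumption: spend counts from when the offer is received until the window
    closes; informational offers have nothing to complete, so return False.
    """
    if offer.offer_type == "informational":
        return False
    window_end = received_at + offer.duration
    spent = sum(amount for time, amount in transactions
                if received_at <= time <= window_end)
    return spent >= offer.difficulty
```

For example, a 10-dollar discount open for 7 days, with transaction times in hours, would be `Offer("x", "discount", 10.0, 2.0, 7 * 24, ("email",))`.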

Figure 5 shows an example of the transcript of events that customers produce. It includes the person id, event type, time and value.

Fig 5: The Head of the Transcript Dataset
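Because the `value` column holds different fields depending on the event, it is convenient to flatten each record before combining datasets. The sketch below assumes `value` is a dict keyed by `"offer id"` for received/viewed offers, `"amount"` for transactions and `"offer_id"` plus `"reward"` for completions; these key names and the helper `flatten_event` are assumptions for illustration, so check them against your copy of the data.

```python
def flatten_event(event):
    """Flatten one transcript record into (person, event, time, offer_id, amount, reward).

    Assumption: `value` is a dict whose keys depend on the event type.
    Both spellings of the offer key ("offer id" and "offer_id") are accepted;
    fields that do not apply to the event come back as None.
    """
    value = event.get("value", {})
    offer_id = value.get("offer id", value.get("offer_id"))
    return (
        event["person"],
        event["event"],
        event["time"],
        offer_id,
        value.get("amount"),
        value.get("reward"),
    )
```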

The data preprocessing

To preprocess the datasets, we combine them into a single dataset that holds all the personal and transactional information for each person. A function extracts per-person values, including the total transaction amount, the percentage of valid transactions and the valid completions of promotional offers. One situation needs to be pointed out: in some cases, people complete an offer without ever viewing it. Such a completion is not valid, since the promotion had no influence on the customer’s purchase. The following figure shows the combined dataset produced by the preprocessing.

Fig 6: The Head of the Combined Dataset
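The valid-completion rule described above can be expressed as a single pass over each person’s events in time order. This is a minimal sketch, not the author’s function: it assumes the events are already flattened into dicts with `person`, `event`, `time` and, where relevant, `offer_id` or `amount`, and `summarize_customers` is a hypothetical name. As a simplification, a view of an offer id counts for any later completion of that id, even if the offer was received more than once.

```python
from collections import defaultdict


def summarize_customers(events):
    """Per-customer totals, counting a completion as valid only if viewed first."""
    stats = defaultdict(lambda: {"total_amount": 0.0, "n_transactions": 0,
                                 "completions": 0, "valid_completions": 0})
    viewed = defaultdict(set)  # person -> offer ids viewed so far
    # sorted() is stable, so events with equal times keep their input order
    for e in sorted(events, key=lambda e: e["time"]):
        s = stats[e["person"]]
        if e["event"] == "offer viewed":
            viewed[e["person"]].add(e["offer_id"])
        elif e["event"] == "transaction":
            s["total_amount"] += e["amount"]
            s["n_transactions"] += 1
        elif e["event"] == "offer completed":
            s["completions"] += 1
            if e["offer_id"] in viewed[e["person"]]:
                s["valid_completions"] += 1
    return dict(stats)
```

A customer who completes an offer without a prior "offer viewed" event still has the purchase counted in `total_amount`, but the completion is excluded from `valid_completions`.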

Question 1: How can we identify different groups of customers? What do they have in common?

To identify different groups of customers, the AgglomerativeClustering algorithm is applied for segmentation. The SSE (sum of squared distances from each point to its cluster centre) is tracked as a function of k (the number of clusters). The optimal k is found to be 4, where the rate at which SSE decreases with k drops sharply.
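To make the SSE-versus-k procedure concrete, here is a small pure-Python sketch. It is a stand-in, not the author’s pipeline: instead of scikit-learn’s `AgglomerativeClustering` (whose default linkage is Ward), it implements a naive Ward merge loop that is only practical for toy inputs, and the ratio rule in `elbow_k` is one possible way to formalise “where the decrease slows sharply”, not necessarily the criterion used in the post. All function names are hypothetical.

```python
def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(m, centroid)) for m in members)
    return total


def ward_labels(points, k):
    """Naive Ward agglomerative clustering down to k clusters.

    Each step merges the pair of clusters whose merge raises the SSE the
    least; that increase is n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        centroids = [[sum(coords) / len(idx) for coords in zip(*(points[i] for i in idx))]
                     for idx in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                dist2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    labels = [0] * len(points)
    for label, idx in enumerate(clusters):
        for i in idx:
            labels[i] = label
    return labels


def elbow_k(sse_by_k):
    """k where the SSE decrease slows most sharply: maximises
    drop(k-1 -> k) / drop(k -> k+1). Expects consecutive k values."""
    ks = sorted(sse_by_k)
    best_k, best_ratio = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[nxt]
        ratio = drop_before / drop_after if drop_after > 0 else float("inf")
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k
```

Usage: compute `{k: sse(points, ward_labels(points, k)) for k in range(1, 7)}` on standardised features and pass it to `elbow_k`. On real data, `sklearn.cluster.AgglomerativeClustering(n_clusters=k)` would replace `ward_labels`.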

After building the clusters, it’s necessary to investigate the unique purchasing patterns of different clusters.

Fig 7: Clustering Identification

Figure 7 shows the distributions of income, total number of transactions and total transaction amount for these 4 clusters. Interestingly, income turns out to be the main factor that differentiates the clusters. Moreover, customers in different clusters show clearly distinct purchasing patterns. For example, customers in Cluster 0 (blue), who have the highest income, usually make the fewest transactions while spending the most in total of all customers. Customers in Cluster 3 (red), who have the lowest income, purchase relatively often but spend the least in total. These observations agree with common sense and suggest that the clustering separates customers into meaningful groups.
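A per-cluster summary like the one behind Figure 7 amounts to grouping customers by cluster label and averaging a few columns. The sketch below assumes each customer is a dict with hypothetical field names `income`, `n_transactions` and `total_amount`; it reports means, whereas the figure shows full distributions.

```python
from collections import defaultdict
from statistics import mean


def profile_clusters(customers, labels,
                     fields=("income", "n_transactions", "total_amount")):
    """Mean of each field per cluster label, ordered by label."""
    groups = defaultdict(list)
    for customer, label in zip(customers, labels):
        groups[label].append(customer)
    return {label: {f: mean(c[f] for c in members) for f in fields}
            for label, members in sorted(groups.items())}
```

Comparing these means across clusters is a quick check of statements such as “the highest-income cluster makes the fewest transactions but spends the most”.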

Question 2: How do different groups of customers respond to different types of promotional offers?

Customers’ responses are investigated based on the clustering result from Q1. The percentage of valid BOGO and discount offer completions is compared across all 4 clusters and with the overall average.
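The comparison described above can be sketched as counting, per cluster and offer type, how many received offers were validly completed. The input format (one `(cluster, offer_type, completed_validly)` tuple per offer received), the lowercase type labels and the choice of offers received as the denominator are all assumptions for illustration; `completion_rates` is a hypothetical name.

```python
from collections import defaultdict


def completion_rates(records, offer_types=("bogo", "discount")):
    """Percentage of received offers that were validly completed.

    `records` holds one (cluster, offer_type, completed_validly) tuple per
    offer received. Returns {cluster: {offer_type: pct}} plus an "overall"
    entry pooling every cluster. Other offer types are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # (cluster, type) -> [valid, received]
    for cluster, offer_type, valid in records:
        if offer_type not in offer_types:
            continue
        for key in ((cluster, offer_type), ("overall", offer_type)):
            counts[key][0] += int(valid)
            counts[key][1] += 1
    rates = defaultdict(dict)
    for (cluster, offer_type), (valid, received) in counts.items():
        rates[cluster][offer_type] = 100.0 * valid / received
    return dict(rates)
```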

Fig 8: Valid Offer Percentage for Different Clusters

According to the figure above, customers in Cluster 0 (blue) and Cluster 2 (green) complete promotions at a higher rate, while Cluster 3 (red) barely responds to either BOGO or discount offers. Meanwhile, compared with BOGO, the discount offer usually has more influence on customers’ purchasing behaviour, especially for people in Cluster 1 (yellow) and Cluster 3 (red).

My conclusion

By applying the clustering algorithm, the simulated Starbucks customers are divided into segments that are differentiated mainly by personal income. Each segment shows a distinct purchasing pattern that agrees well with common sense.

Customers in different segments also respond to promotional offers differently. The discount offer receives a better response than BOGO among lower-income customers, while higher-income customers complete both offer types at a similar rate. Overall, higher-income customers respond better to promotional offers than lower-income customers.

Founder of Beyond, PM Library & Riptide, Ex-N26, Co-Organizer of Product Tank Hamburg & Barcelona