Level 3: Loyalty Illusion
Problem 3.1: The VIP Program Request
Ting ting! Not the monthly paycheck notification :(, but an email from Kelly, the Marketing Director.
From: Kelly (Marketing Director)
To: Data Team
Subject: VIP List for Coupon Campaign
"Hi Team! 🚀
We want to launch a 'VIP Loyalty Program' to reward our best users.
I need a list of all Repeat Customers (anyone who has placed more than 1 order with us).
Please give me the count of how many customers qualify. Thanks!
This time, use updated duckdb file: https://public.lelouvincx.com/brazilian_ecommerce.duckdb
Total Repeat Customers is: 0
Logic / Approach:
-
A "repeat customer" means a customer who comes back and places more than one order
-
In the dataset, each row in orders represents one order, and customers are identified by customer_id
-
Therefore: repeat customer = number of orders per customer > 1.
-
To solve this, we:
- Group by customer_id
- Count orders for each customer: COUNT(order_id)
- Keep customers with COUNT(order_id) > 1
WITH repeat_customers AS (
SELECT
customer_id
FROM orders
GROUP BY customer_id
HAVING COUNT(order_id) > 1 )
SELECT
COUNT(*) AS repeat_customer_count
FROM repeat_customers;

Problem 3.2: Check
I hope before writing a single line of code, you decided to audit the customers table to see how it's structured.
Run this query to look at the first few rows of the customers table:
SELECT *
FROM customers
LIMIT 5;
What do you see?
You should see 2 different ID columns:
customer_idcustomer_unique_id
Why would a table have two IDs for the same person?
Answer:
A table can have two IDs for the same person because one customer can appear in the system in multiple records
- customer_id represents a specific customer record, which may change if the customer creates a new account or updates their information
- customer_unique_id represents the actual person, allowing the system to link multiple records to the same customer
Having two IDs helps the business manage transactions correctly while still analyzing customer behavior at the individual level
Problem 3.3
Can you answer the problem 3.1 again?
Answer:
After reviewing the customers table, the problem should be solved using customer_unique_id instead of customer_id, because multiple customer records can belong to the same person. This provides a more accurate view of customer behavior
We want to count repeat customers by person, so we should count by customer_unique_id
However, customer_unique_id exists only in the customers table
The order history (orders count) is in the orders table
Therefore, we must join orders with customers to link each order to a person:
- Join key: orders.customer_id = customers.customer_id
- After joining, each order will have a customer_unique_id.
WITH repeat_customers AS (
SELECT
c.customer_unique_id
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
GROUP BY c.customer_unique_id
HAVING COUNT(o.order_id) > 1 )
SELECT
COUNT(*) AS repeat_customer_count
FROM repeat_customers;
Total Repeat Customers is: 2,997
