Level 3: Loyalty Illusion

Problem 3.1: The VIP Program Request

Ting ting! Not the monthly paycheck notification :(, but an email from Kelly, the Marketing Director.

From: Kelly (Marketing Director)

To: Data Team

Subject: VIP List for Coupon Campaign

"Hi Team! 🚀

We want to launch a 'VIP Loyalty Program' to reward our best users.

I need a list of all Repeat Customers (anyone who has placed more than 1 order with us).

Please give me the count of how many customers qualify. Thanks!

tip

This time, use updated duckdb file: https://public.lelouvincx.com/brazilian_ecommerce.duckdb

Total Repeat Customers is: 0

Logic / Approach:

A "repeat customer" means a customer who comes back and places more than one order
In the dataset, each row in orders represents one order, and customers are identified by customer_id
Therefore: repeat customer = number of orders per customer > 1.
To solve this, we:

Group by customer_id

Count orders for each customer: COUNT(order_id)

Keep customers with COUNT(order_id) > 1

WITH repeat_customers AS (
  SELECT
    customer_id
  FROM orders
  GROUP BY customer_id
  HAVING COUNT(order_id) > 1 )
SELECT
  COUNT(*) AS repeat_customer_count
FROM repeat_customers;

Problem 3.2: Check

info

I hope before writing a single line of code, you decided to audit the customers table to see how it's structured.

Run this query to look at the first few rows of the customers table:

SELECT *
FROM customers
LIMIT 5;

What do you see?

You should see 2 different ID columns:

customer_id
customer_unique_id

Why would a table have two IDs for the same person?

Answer:

A table can have two IDs for the same person because one customer can appear in the system in multiple records

customer_id represents a specific customer record, which may change if the customer creates a new account or updates their information

customer_unique_id represents the actual person, allowing the system to link multiple records to the same customer
Having two IDs helps the business manage transactions correctly while still analyzing customer behavior at the individual level

Problem 3.3

info

Can you answer the problem 3.1 again?

Answer:

After reviewing the customers table, the problem should be solved using customer_unique_id instead of customer_id, because multiple customer records can belong to the same person. This provides a more accurate view of customer behavior
We want to count repeat customers by person, so we should count by customer_unique_id
However, customer_unique_id exists only in the customers table
The order history (orders count) is in the orders table
Therefore, we must join orders with customers to link each order to a person:

Join key: orders.customer_id = customers.customer_id

After joining, each order will have a customer_unique_id.

WITH repeat_customers AS (
  SELECT
   c.customer_unique_id
  FROM orders o
  JOIN customers c
   ON o.customer_id = c.customer_id
  GROUP BY c.customer_unique_id
  HAVING COUNT(o.order_id) > 1 )
SELECT
  COUNT(*) AS repeat_customer_count
FROM repeat_customers;

Total Repeat Customers is: 2,997

Problem 3.1: The VIP Program Request​

Problem 3.2: Check​

Problem 3.3​

Problem 3.1: The VIP Program Request

Problem 3.2: Check

Problem 3.3