1. Dataset Introduction
Conceptual Overview
The world you’re entering:
- Olist connects small sellers to big online marketplaces.
- Orders are shipped through logistics partners across Brazil.
- The Brazilian E-Commerce Public Dataset by Olist captures ~100k real orders from 2016–2018, including customers, orders, payments, deliveries, and reviews.

Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners.
After a customer purchases the product from Olist Store a seller gets notified to fulfill that order. Once the customer receives the product, or the estimated delivery date is due, the customer gets a satisfaction survey by email where he can give a note for the purchase experience and write down some comments.
ERD
ERD link: https://dbdocs.io/lelouvincx/brazillian-ecommerce
You begin with read access to the following:
-
orders(central fact table)order_id— unique order keyorder_status— delivered / shipped / canceled / etc.order_estimated_delivery_date— when Olist promised deliveryorder_delivered_customer_date— when the customer actually got it
-
order_reviewsorder_idreview_score— 1 to 5 stars
-
customerscustomer_idcustomer_state— two-letter state code (e.g., RJ, SP, MG)
Later levels may optionally touch:
order_items(each line item in an order)sellers(seller info)geolocation(ZIP → lat/long)
Rule: For this quest, you must always filter on customers in customer_state = 'RJ'. That’s your battlefield.
Dataset schema (full)
// Brazilian E-commerce Database Schema
Project brazilian_ecommerce {
database_type: 'PostgreSQL'
Note: '''
# Brazilian E-commerce Dataset ERD
This database contains information about orders, customers, products, sellers, and reviews from a Brazilian e-commerce platform (Olist).
**Source**: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
**Dataset size**: ~100K orders, 33K products, 3K sellers
**Time period**: 2016-2018
'''
}
Table customers {
customer_id uuid [primary key, note: 'Unique identifier for each customer (per order)']
customer_unique_id uuid [note: 'Unique identifier for customer across multiple orders']
customer_zip_code_prefix int [note: '5-digit ZIP code prefix']
customer_city varchar [note: 'Customer city name']
customer_state varchar(2) [note: '2-letter state abbreviation (e.g., SP, RJ)']
Note: 'Customer information. One customer_id per order, but customer_unique_id links multiple orders from same customer.'
indexes {
customer_unique_id
customer_zip_code_prefix
(customer_city, customer_state)
}
}
Table sellers {
seller_id uuid [primary key, note: 'Unique identifier for each seller/merchant']
seller_zip_code_prefix int [note: '5-digit ZIP code prefix of seller location']
seller_city varchar [note: 'Seller city name']
seller_state varchar(2) [note: '2-letter state abbreviation']
Note: 'Seller/merchant information. Each seller can have multiple products in order_items.'
indexes {
seller_zip_code_prefix
(seller_city, seller_state)
}
}
Table products {
product_id uuid [primary key, note: 'Unique identifier for each product']
product_category_name varchar [note: 'Product category name in Portuguese']
product_name_lenght int [null, note: 'Length of product name in characters']
product_description_lenght int [null, note: 'Length of product description in characters']
product_photos_qty int [null, note: 'Number of product photos']
product_weight_g int [null, note: 'Product weight in grams']
product_length_cm int [null, note: 'Product length in centimeters']
product_height_cm int [null, note: 'Product height in centimeters']
product_width_cm int [null, note: 'Product width in centimeters']
Note: 'Product catalog with dimensions and category. Physical dimensions used for shipping calculations.'
indexes {
product_category_name
}
}
Table product_category_translation {
product_category_name varchar [primary key, note: 'Category name in Portuguese']
product_category_name_english varchar [note: 'Category name translated to English']
Note: 'Translation table for product categories from Portuguese to English. Contains 70 category mappings.'
}
Table geolocation {
geolocation_zip_code_prefix int [note: '5-digit ZIP code prefix']
geolocation_lat decimal(10,8) [note: 'Latitude coordinate']
geolocation_lng decimal(11,8) [note: 'Longitude coordinate']
geolocation_city varchar [note: 'City name']
geolocation_state varchar(2) [note: '2-letter state abbreviation']
Note: 'Geographic lookup table with 1M+ records mapping ZIP codes to coordinates. Multiple entries per ZIP code due to coordinate variations.'
indexes {
geolocation_zip_code_prefix [name: 'idx_geo_zip']
(geolocation_city, geolocation_state) [name: 'idx_geo_location']
}
}
Table orders {
order_id uuid [primary key, note: 'Unique identifier for each order']
customer_id uuid [not null, note: 'Reference to customer who placed the order']
order_status varchar [note: 'Order status: delivered, shipped, canceled, etc.']
order_purchase_timestamp timestamp [not null, note: 'When the order was created']
order_approved_at timestamp [null, note: 'When payment was approved']
order_delivered_carrier_date timestamp [null, note: 'When order was handed to logistics carrier']
order_delivered_customer_date timestamp [null, note: 'When order was delivered to customer']
order_estimated_delivery_date timestamp [note: 'Estimated delivery date shown to customer']
Note: 'Central fact table for orders. Contains order lifecycle timestamps from purchase to delivery.'
indexes {
customer_id
order_status
order_purchase_timestamp
}
}
Table order_items {
order_id uuid [note: 'Reference to parent order']
order_item_id int [note: 'Sequential number identifying item within order (1, 2, 3...)']
product_id uuid [not null, note: 'Reference to product purchased']
seller_id uuid [not null, note: 'Reference to seller fulfilling this item']
shipping_limit_date timestamp [note: 'Deadline for seller to ship the item']
price decimal(10,2) [note: 'Item price in BRL (Brazilian Real)']
freight_value decimal(10,2) [note: 'Shipping cost for this item in BRL']
Note: 'Order line items. Bridge table connecting orders to products and sellers. Each order can have multiple items from different sellers.'
indexes {
(order_id, order_item_id) [pk]
product_id
seller_id
shipping_limit_date
}
}
Table order_payments {
order_id uuid [note: 'Reference to parent order']
payment_sequential int [note: 'Sequential number for multiple payments per order (1, 2, 3...)']
payment_type varchar [note: 'Payment method: credit_card, boleto, voucher, debit_card']
payment_installments int [note: 'Number of installments (common in Brazil)']
payment_value decimal(10,2) [note: 'Payment amount in BRL']
Note: 'Payment information. Orders can have multiple payment methods (e.g., split payment). One payment_sequential per payment type used.'
indexes {
(order_id, payment_sequential) [pk]
payment_type
}
}
Table order_reviews {
review_id uuid [pk, note: 'Unique identifier for each review']
order_id uuid [not null, note: 'Reference to order being reviewed']
review_score int [note: 'Rating from 1 to 5 stars']
review_comment_title varchar [null, note: 'Review title written by customer']
review_comment_message text [null, note: 'Review message/body written by customer']
review_creation_date timestamp [note: 'When customer wrote the review']
review_answer_timestamp timestamp [note: 'When review was processed/answered']
Note: 'Customer reviews and ratings. Not all orders have reviews. Contains satisfaction scores and optional text feedback.'
indexes {
order_id
review_score
review_creation_date
}
}
// ===== RELATIONSHIPS =====
// Customer to Orders (1:M)
Ref has: customers.customer_id < orders.customer_id
// Orders to Order Items (1:M)
Ref contains: orders.order_id < order_items.order_id
// Orders to Order Payments (1:M)
Ref paid_by: orders.order_id < order_payments.order_id
// Orders to Order Reviews (1:1 or 1:0)
Ref reviewed_in: orders.order_id - order_reviews.order_id
// Products to Order Items (1:M)
Ref sold_in: products.product_id < order_items.product_id
// Sellers to Order Items (1:M)
Ref fulfills: sellers.seller_id < order_items.seller_id
// Product Category Translation to Products (1:M)
Ref categorizes: product_category_translation.product_category_name < products.product_category_name
// Geolocation to Customers (M:M via zip code lookup)
Ref locates_customer: geolocation.geolocation_zip_code_prefix < customers.customer_zip_code_prefix
// Geolocation to Sellers (M:M via zip code lookup)
Ref locates_seller: geolocation.geolocation_zip_code_prefix < sellers.seller_zip_code_prefix
Acknowledgements
Thanks to Olist for releasing this dataset.
Original link: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce