rkharwar-nv/dataset_analysis

Retail Datasets — A New PM's Field Guide

A curated, opinionated reference to the most widely used public retail datasets, written from a Product Manager's perspective. For each dataset you get: what it is, license, schema, a real sample row, and how a retail PM would actually use it.

Quick navigation

  • datasets/ — one deep-dive page per dataset
  • scorecards/ — PM scorecard templates (RFM, delivery SLA, promo lift, forecast accuracy)
  • regulatory/ — GDPR / CCPA / LGPD / PCI checklist for retail data
  • samples/ — small sample CSVs for the freely-licensed datasets
  • scripts/ — fetch scripts for restricted datasets
  • notebooks/ — Jupyter quickstarts (load → profile → visualize)
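
As a rough illustration of what an RFM (recency, frequency, monetary) scorecard from scorecards/ might compute, here is a minimal standard-library sketch. The field layout `(customer_id, invoice_date, amount)`, the sample rows, and the `rfm` helper are all hypothetical and are not taken from this repo or from any dataset listed here. Frequency is assumed to mean the number of distinct purchase days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical invoice lines: (customer_id, invoice_date, amount).
# These rows are invented for illustration only.
ROWS = [
    ("C1", date(2011, 11, 1), 20.0),
    ("C1", date(2011, 12, 1), 35.5),
    ("C2", date(2011, 6, 15), 120.0),
]


def rfm(rows, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_seen = {}                    # most recent purchase date per customer
    purchase_days = defaultdict(set)  # distinct purchase days per customer
    spend = defaultdict(float)        # total spend per customer
    for customer_id, day, amount in rows:
        if customer_id not in last_seen or day > last_seen[customer_id]:
            last_seen[customer_id] = day
        purchase_days[customer_id].add(day)
        spend[customer_id] += amount
    return {
        c: ((as_of - last_seen[c]).days, len(purchase_days[c]), round(spend[c], 2))
        for c in last_seen
    }


scores = rfm(ROWS, as_of=date(2011, 12, 10))
```

A real scorecard would typically bucket these raw values into quantile scores; that step is left out to keep the sketch short.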

Master table

| # | Dataset | What it's about | License |
|---|---------|-----------------|---------|
| 1 | UCI Online Retail / II | UK non-store online retailer, 2009–2011, ~1M invoice lines, gifts + wholesale | CC BY 4.0 |
| 2 | Brazilian E-Commerce by Olist | ~100K real BR orders 2016–2018, 9 relational tables incl. reviews & geo | CC BY-NC-SA 4.0 |
| 3 | Instacart Online Grocery | 3M+ grocery orders, 200K users, reorder patterns | Instacart Open Dataset (non-commercial, no redistribution) |
| 4 | Amazon Customer Reviews | 75GB+ product reviews & metadata across categories | Amazon CR License (research only) |
| 5 | Walmart M5 Forecasting | Daily sales for 3,049 SKUs × 10 stores, CA/TX/WI | Kaggle Competition Rules |
| 6 | H&M Personalized Fashion | 2yr fashion transactions + customer metadata + product images | H&M Competition Rules |
| 7 | Dunnhumby — Complete Journey | 2yr household grocery transactions + promo + demographics | Dunnhumby Source Files (academic/non-commercial) |
| 8 | Google Analytics Sample (GA4) | 3mo obfuscated event-level data, Google Merchandise Store | Google APIs ToS |
| 9 | US Census Monthly Retail Trade | Monthly/annual retail sales by NAICS, quarterly e-commerce | Public domain |
| 10 | Kaggle E-Commerce Data (UK) | 541K transactions from UK online retailer (mirror of UCI) | CC0 |
| 11 | Retail Transactions (synthetic) | Synthetic transactions for basket/segmentation prototyping | CC0 |

Suggested reading order for a new Retail PM

  1. US Census MRTS — macro context, 30 min
  2. UCI Online Retail — simplest schema, get hands-on
  3. Olist — multi-table modeling + review/funnel signals
  4. Instacart + Dunnhumby — basket-level intuition
  5. M5 + H&M — forecasting & fashion-specific dynamics
  6. GA4 sample — read your own site's analytics fluently

How attribution works in this repo

Each dataset page is split into:

  • From the dataset — content verbatim or paraphrased from the source page, with link.
  • General knowledge — my PM framing and use-case suggestions.

Both are clearly labeled inline.

Licensing

The writeups, scorecards, checklists, and code in this repo are licensed under CC BY 4.0. The datasets themselves are not — each retains its own license, reproduced on its page.

If you want to add a sample to samples/, the dataset must allow redistribution. When in doubt, use scripts/fetch_samples.py instead.
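
The redistribution rule above can be sketched as a small gate that decides whether a dataset belongs in samples/ or should go through a fetch script. This is only an illustration: the `REDISTRIBUTABLE` allow-list and the `sample_action` helper are hypothetical and are not part of this repo. The allow-list is a deliberately conservative assumption based on the license labels in the master table, so any label not on it falls back to scripts/.

```python
# Hypothetical allow-list: license labels from the master table that this
# sketch treats as safe to redistribute. A conservative assumption, not a
# legal determination; anything not listed is routed to a fetch script.
REDISTRIBUTABLE = {"CC BY 4.0", "CC0", "Public domain"}


def sample_action(license_label: str) -> str:
    """Return 'samples/' if a sample CSV may be committed, else 'scripts/'."""
    if license_label.strip() in REDISTRIBUTABLE:
        return "samples/"
    return "scripts/"
```

For example, `sample_action("CC0")` returns `"samples/"`, while `sample_action("Kaggle Competition Rules")` returns `"scripts/"`.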

About

This repository surveys widely used public retail datasets.
