A ready-to-use data science environment for VS Code, designed for data science and ML bootcamp students. Covers data visualization, data cleaning, feature engineering, and traditional machine learning.
All users
- Docker Desktop (Windows / Mac) or Docker Engine (Linux)
- VS Code with the Dev Containers extension
NVIDIA GPU users (also required)
- NVIDIA driver ≥570 (available from NVIDIA's driver download page)
- NVIDIA Container Toolkit (Linux only, not needed on Windows)
Mac users: GPU acceleration (Metal/MPS) does not pass through to Docker containers. The Mac configuration runs on the native ARM64 CPU, so no extra setup is needed beyond Docker Desktop.
- Fork this repository (click Fork at the top of this page)

- Clone your fork:

      git clone https://github.com/<your-username>/datascience-devcontainer.git

- Open the folder in VS Code, then open the Command Palette (`Ctrl+Shift+P` / `Cmd+Shift+P`) and run Dev Containers: Open Folder in Container. VS Code will ask which configuration to use; pick the one that matches your machine (see table below).

- Verify your setup by running `notebooks/environment_test.ipynb`

| If you have... | Choose this |
|---|---|
| NVIDIA GPU (GTX 10xx / RTX / Quadro / Tesla) | DataScience NVIDIA |
| Windows or Linux machine, no NVIDIA GPU | DataScience CPU |
| Apple Silicon Mac (M1 / M2 / M3 / M4) | DataScience Mac |
Not sure if your GPU is compatible? Check NVIDIA's CUDA GPUs list; you need compute capability ≥6.0.
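The table above can be read as a small decision rule. The sketch below is one hedged way to express it in Python. `suggest_configuration` is a hypothetical helper, not part of this repository, and treating `nvidia-smi` on the PATH as a sign of an NVIDIA GPU is an assumption (it says nothing about compute capability).

```python
import platform
import shutil
from typing import Optional


def suggest_configuration(system: str, machine: str, has_nvidia: bool) -> Optional[str]:
    """Map a machine description to a configuration name from the table.

    Hypothetical helper: returns None for machines the table does not
    cover (for example, Intel Macs).
    """
    if system == "Darwin":
        # Apple Silicon reports "arm64"; the GPU is not used inside Docker.
        return "DataScience Mac" if machine == "arm64" else None
    if system in ("Windows", "Linux"):
        return "DataScience NVIDIA" if has_nvidia else "DataScience CPU"
    return None


if __name__ == "__main__":
    # Assumption: nvidia-smi on the PATH means an NVIDIA driver is installed.
    detected = suggest_configuration(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
    )
    print(detected or "No matching configuration in the table")
```

Run it on the host machine, not inside the container, since the container reports its own platform.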
Fork this repo once, then use it as a GitHub template to spin up new projects instantly.
- Go to your fork on GitHub
- Click Settings → scroll to Template repository → enable it
- Go to your fork and click Use this template → Create a new repository

- Name your new repo and click Create repository

- Clone it and start working:

      git clone https://github.com/<your-username>/my-new-project.git

- Clean it up - remove anything that doesn't belong to your project:
  - Update `README.md` to describe your project
  - Delete unused devcontainer configs (e.g. if you only use CPU, remove `nvidia/` and `mac/`)
  - Remove or replace `notebooks/environment_test.ipynb` with your own notebooks
  - Delete test data from `data/`

        git add -A && git commit -m "Initial project setup" && git push

    pip install <package-name>

- Create a `requirements.txt` in the repository root:

      lightgbm
      shap

- Add a `postCreateCommand` to the relevant `.devcontainer/*/devcontainer.json`:

      "postCreateCommand": "pip install -r requirements.txt"

- Rebuild the container (`Ctrl+Shift+P` → Dev Containers: Rebuild Container)
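After a rebuild, it can help to confirm that everything listed in `requirements.txt` was actually installed. The following is a minimal sketch, assuming the file holds plain package names or simple version pins, one per line. `missing_requirements` is a hypothetical helper; it does not handle extras, URLs, `-r` includes, or environment markers.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(lines):
    """Return the requirement names that have no installed distribution."""
    missing = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        # Keep only the name before a version specifier such as ==, >=, ~=.
        name = re.split(r"[<>=!~;\[ ]", entry, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # repository root, as in the steps above
    if req_file.exists():
        lines = req_file.read_text().splitlines()
        print("Missing:", missing_requirements(lines) or "none")
```

An empty result suggests the `postCreateCommand` ran; a non-empty one usually means the container was not rebuilt after the file changed.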
    # Add upstream once
    git remote add upstream https://github.com/gperdrizet/datascience-devcontainer.git
    # Pull in updates
    git fetch upstream && git merge upstream/main

| Package | Purpose |
|---|---|
| numpy, pandas, scipy | Core data science stack |
| scikit-learn, xgboost, statsmodels | Machine learning and statistics |
| matplotlib, seaborn, plotly | Visualization |
| optuna | Hyperparameter optimization |
| jupyterlab | Interactive notebooks |
| cupy-cuda12x | GPU-accelerated arrays (NVIDIA only) |
| python-dotenv | Environment variable management |
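
For a quick check outside the notebook, the packages in the table can be probed for importability. This is a minimal sketch, not a replacement for `notebooks/environment_test.ipynb`. `IMPORT_NAMES` and `check_imports` are illustrative names, and note that some import names differ from the package names (`sklearn` for scikit-learn, `dotenv` for python-dotenv, `cupy` for cupy-cuda12x).

```python
from importlib.util import find_spec

# Package name from the table -> module name used in `import`.
IMPORT_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "statsmodels": "statsmodels",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "optuna": "optuna",
    "jupyterlab": "jupyterlab",
    "cupy-cuda12x": "cupy",  # expected to be missing outside the NVIDIA config
    "python-dotenv": "dotenv",
}


def check_imports(import_names):
    """Return {package: found?} by locating modules without importing them."""
    return {pkg: find_spec(mod) is not None for pkg, mod in import_names.items()}


if __name__ == "__main__":
    for pkg, ok in check_imports(IMPORT_NAMES).items():
        print(f"{pkg:15s} {'ok' if ok else 'MISSING'}")
```

Using `find_spec` keeps the check fast because it only locates each module; it does not prove the package imports cleanly.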
Requires compute capability ≥6.0 (Pascal / GTX 10xx or newer):
| Architecture | Example GPUs | Compute Capability |
|---|---|---|
| Pascal | GTX 1050–1080, Tesla P100 | 6.0–6.1 |
| Volta | Tesla V100, Titan V | 7.0 |
| Turing | RTX 2060–2080, GTX 1660 | 7.5 |
| Ampere | RTX 3060–3090, A100 | 8.0–8.6 |
| Ada Lovelace | RTX 4060–4090 | 8.9 |
| Hopper | H100, H200 | 9.0 |
| Blackwell | RTX 5070–5090, B100, B200 | 10.0 (B100, B200), 12.0 (RTX 50xx) |
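
The ≥6.0 requirement can also be checked in code. The sketch below compares a compute capability against that minimum; `parse_capability` and `meets_minimum` are hypothetical helpers. The optional lookup assumes CuPy is installed and a GPU is visible, and relies on `cupy.cuda.Device(0).compute_capability`, which returns a digit string such as `"86"` for 8.6.

```python
MIN_CAPABILITY = (6, 0)  # Pascal, per the table above


def parse_capability(text):
    """Parse "8.6" or a CuPy-style "86" into a (major, minor) tuple."""
    if "." in text:
        major, minor = text.split(".", 1)
    else:
        major, minor = text[:-1], text[-1]  # last digit is the minor version
    return int(major), int(minor)


def meets_minimum(text, minimum=MIN_CAPABILITY):
    """True if the given compute capability is at least the minimum."""
    return parse_capability(text) >= minimum


if __name__ == "__main__":
    try:
        import cupy  # only present in the NVIDIA configuration

        cc = cupy.cuda.Device(0).compute_capability
        print(cc, "supported" if meets_minimum(cc) else "too old")
    except Exception as exc:  # no CuPy, no GPU, or a driver problem
        print("Could not query a GPU:", exc)
```

Comparing tuples rather than floats avoids surprises with two-digit major versions such as 12.0.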
datascience-devcontainer/
├── .devcontainer/
│ ├── nvidia/
│ │ └── devcontainer.json # NVIDIA GPU configuration
│ ├── cpu/
│ │ └── devcontainer.json # CPU configuration
│ └── mac/
│ └── devcontainer.json # Mac (ARM64) configuration
├── data/ # Store datasets here
├── notebooks/
│ └── environment_test.ipynb # Verify your setup
├── .gitignore
├── LICENSE
└── README.md
| Problem | Solution |
|---|---|
| Docker won't start | Enable virtualization in BIOS / enable WSL2 on Windows |
| Permission denied (Linux) | Add your user to the docker group, then log out and back in |
| GPU not detected | Update NVIDIA drivers (≥570); Linux: install NVIDIA Container Toolkit |
| Container build fails | Check your internet connection |
| Module not found | Add the package to requirements.txt and rebuild the container |