sijeong-kim/3d-generation

Diversifying Text-to-3D Generation with Repulsive 3D Gaussian Splatting

📌 MSc Individual Research Project — Imperial College London
Author: Sijeong Kim


📌 Overview

This repository investigates how repulsion-based optimization can improve diversity and stability in text-to-3D generation using 3D Gaussian Splatting (3DGS).

🚩 Problem

Standard SDS-based text-to-3D pipelines often produce:

  • nearly identical shapes across runs,
  • mode collapse,
  • unstable geometry or over-smoothing.

✨ Core Idea

Introduce feature-space repulsion (DINOv2 / CLIP features) into DreamGaussian training so that Gaussian particles spread apart in semantic space while maintaining fidelity.

✅ Key Contributions

Repulsion variants implemented

  • SVGD repulsion
  • RLSD-style feature repulsion
  • Baseline (no repulsion)
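
As a rough illustration of the SVGD variant listed above, the sketch below computes the standard SVGD update direction for a handful of particles with an RBF kernel. It is a minimal stdlib-only example, not the code in this repository: the function name `svgd_direction`, the `scores` argument (standing in for whatever per-particle gradient the training loss provides), and the fixed bandwidth `h` are all assumptions made for illustration.

```python
import math

def rbf(x, y, h):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / h)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / h)

def svgd_direction(particles, scores, h=1.0):
    """Return the SVGD update direction for each particle.

    particles: list of n vectors (lists of floats), e.g. feature embeddings
    scores:    list of n vectors, the gradient of the log-density at each particle
               (hypothetical stand-in for the guidance signal)
    """
    n, dim = len(particles), len(particles[0])
    directions = []
    for xi in particles:
        phi = [0.0] * dim
        for xj, sj in zip(particles, scores):
            k = rbf(xj, xi, h)
            for t in range(dim):
                attract = k * sj[t]                      # kernel-weighted score term
                repel = (2.0 / h) * (xi[t] - xj[t]) * k  # gradient of k(xj, xi) w.r.t. xj
                phi[t] += (attract + repel) / n
        directions.append(phi)
    return directions
```

With zero scores, only the repulsive term remains, so two nearby particles are pushed in opposite directions; this is the behaviour the repulsion variants rely on to keep particles from collapsing onto one shape.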

Feature-space guidance

  • DINOv2 / CLIP embeddings
  • RBF & cosine kernels
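
To make the two kernel choices concrete, here is a small sketch of pairwise RBF and cosine kernel matrices over a list of feature vectors. It assumes plain Python lists rather than the tensors the repository's `kernels.py` presumably uses, and the median-heuristic bandwidth is a common default in the SVGD literature that may or may not match this project; the function names are hypothetical.

```python
import math
from statistics import median

def pairwise_sq_dists(feats):
    """Squared Euclidean distance between every pair of feature vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in feats] for x in feats]

def rbf_kernel_matrix(feats, h=None):
    """K[i][j] = exp(-||f_i - f_j||^2 / h)."""
    d2 = pairwise_sq_dists(feats)
    n = len(feats)
    if h is None:
        # Median heuristic (assumed default): h = median(d^2) / log(n + 1)
        off_diag = [d2[i][j] for i in range(n) for j in range(n) if i != j]
        h = median(off_diag) / math.log(n + 1)
    h = max(h, 1e-12)  # guard against identical features
    return [[math.exp(-v / h) for v in row] for row in d2]

def cosine_kernel_matrix(feats):
    """K[i][j] = cosine similarity between f_i and f_j."""
    norms = [max(math.sqrt(sum(a * a for a in x)), 1e-12) for x in feats]
    return [
        [sum(a * b for a, b in zip(x, y)) / (nx * ny) for y, ny in zip(feats, norms)]
        for x, nx in zip(feats, norms)
    ]
```

The RBF kernel depends on distance and needs a bandwidth, while the cosine kernel depends only on direction, which can matter when embedding norms vary between renders.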

Large-scale evaluation

  • Semantic diversity increased by 98%
  • CLIP fidelity preserved (ΔCLIP ≈ −0.006)
  • Multi-view consistency C > 0.83
  • Human perceptual study (n = 41)
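
The README does not define how semantic diversity is measured. One common choice, shown here purely as an assumption, is the mean pairwise cosine distance between the embeddings of the generated particles; the metric in `metrics.py` may be defined differently, and the function names below are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(x, y):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def mean_pairwise_cosine_distance(embeddings):
    """Average of (1 - cosine similarity) over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(1.0 - cosine_similarity(x, y) for x, y in pairs) / len(pairs)
```

Under this definition, a set of identical embeddings scores 0 and mutually orthogonal embeddings score 1, so higher values indicate more varied outputs.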

Reproducible research pipeline

  • Automatic sweeps
  • Multi-scene parallel training
  • Run metadata, configs, CSVs, and plots auto-generated

🎬 Demo Results

Comparison of Our Best Model with Baseline (seed=42)

Each prompt below is shown as a Baseline vs. Ours (Best) pair:

  • "a small saguaro cactus plated in a clay pot"
  • "a photo of an ice cream"
  • "an ice cream sundae"
  • "a photo of a hamburger"
  • "a photo of a tulip"
  • "a bulldozer made out of toy bricks"

🚀 Installation

git clone https://github.com/sijeong-kim/3D-Generation.git
cd 3D-Generation

# Local interactive environment
bash scripts/envs/setup_interactive.sh

# Or cluster environment (SLURM)
bash scripts/envs/setup_sbatch.sh

⚡️ Quick Start

✅ Single run (baseline)

python main_ours.py --config configs/text_baseline.yaml \
    prompt="a photo of a hamburger"

✅ Repulsion-enabled run (ours)

python main_ours.py --config configs/text_ours.yaml \
    prompt="a photo of a hamburger" \
    repulsion_type=rlsd \
    kernel_type=rbf \
    lambda_repulsion=1000 \
    num_particles=8 \
    outdir=exp/demo
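
The commands above pass a config file plus `key=value` overrides. The sketch below shows one way such a command line could be split into a config path and an override dictionary using only the standard library. It is an assumption about the general pattern, not a copy of `main_ours.py`; the helper names `parse_overrides` and `parse_cli` are hypothetical, and the real script may rely on a config library instead.

```python
import argparse
import ast

def parse_overrides(extras):
    """Turn ['key=value', ...] into a dict, converting numeric/bool literals."""
    overrides = {}
    for item in extras:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            value = ast.literal_eval(raw)  # e.g. "1000" -> 1000
        except (ValueError, SyntaxError):
            value = raw                    # keep anything else as a plain string
        overrides[key] = value
    return overrides

def parse_cli(argv):
    """Return (config_path, overrides) for a '--config file key=value ...' command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="path to the YAML config file")
    args, extras = parser.parse_known_args(argv)
    return args.config, parse_overrides(extras)
```

For example, `parse_cli(["--config", "configs/text_ours.yaml", "num_particles=8"])` would yield the config path and `{"num_particles": 8}`, which could then be merged over the values loaded from the YAML file.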

✅ Automatic experiment sweeps

bash scripts/experiments/run_exp_interactive.sh exp6_ours_best

✅ SLURM (cluster)

sbatch scripts/exp_sbatch/run_exp_sbatch.sh exp6_ours_best

📁 Output Structure

exp/
  ├── <sweep_name>/<config_name>/
  │    ├── config.yaml
  │    ├── run_metadata.yaml
  │    ├── out / err
  │    └── figures/ (PSNR, SSIM, CLIP, diversity stats, Pareto plots)
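
Given the layout above, completed runs can be located by looking for `config.yaml` two levels below `exp/`. The helper below is a small sketch under that assumption; `list_runs` is a hypothetical name and not part of the repository's `analysis/` code.

```python
from pathlib import Path

def list_runs(exp_root):
    """Map '<sweep_name>/<config_name>' to the run directory for each run with a config.yaml."""
    root = Path(exp_root)
    runs = {}
    for cfg in sorted(root.glob("*/*/config.yaml")):
        run_dir = cfg.parent
        runs[f"{run_dir.parent.name}/{run_dir.name}"] = run_dir
    return runs
```

Directories without a `config.yaml` are skipped, so partially created run folders would not appear in the result.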

Repository Structure

3D-Generation/
├── configs/               # YAML configs & sweep definitions
├── scripts/
│   ├── experiments/       # Local interactive runs
│   └── exp_sbatch/        # SLURM submit helpers
├── analysis/              # Result parsing & plotting
├── guidance/              # Feature extraction (CLIP/DINOv2) + RNG hooks
├── results/               # Example outputs & CSVs
├── main_ours.py           # Main training pipeline (ours)
├── main_pure_baseline.py  # DreamGaussian baseline
├── kernels.py             # RBF & cosine kernels
├── feature_extractor.py   # Feature-space similarity backend
├── gs_renderer.py         # Gaussian Splatting renderer utilities
├── metrics.py             # CLIP, consistency, and diversity metrics
└── visualizer.py          # Particle visualization

Acknowledgements

This work was conducted as part of the MSc programme at Imperial College London. GitHub Copilot and Cursor were used only for boilerplate refactoring; all design, implementation, experiments, and the report were completed by Sijeong Kim.