Semantic Search Over People In MOT17 Frames

This project indexes people from MOT-style image sequences, describes each detected person with Florence-2, embeds those descriptions with MiniLM, and supports natural-language retrieval from a local Chroma database.

The main pipeline is now person-first:

Read MOT frames.
Sample them like a video at an effective FPS.
Detect people in each sampled frame.
Crop each person.
Caption each crop.
Embed the caption text.
Store the result for search.

What The Repo Does

python app.py
- no-argument shortcut
- runs index-persons with the hardcoded defaults from camera_search/config.py
python app.py index
- default pipeline
- detects people with OpenCV HOG
- stores one searchable record per detected person observation
python app.py index-persons
- tracked pipeline
- stores one searchable record per person track summary
- supports mot_gt, hog_iou, and yolo_bytetrack
python app.py search
- searches person observations by default
- can search track summaries with --index-type tracks
python app.py ui
- text UI
- lets you change source sequence and sample FPS for the current session

Current Model Stack

Person detector for default pipeline: OpenCV HOG people detector
Caption model: florence-community/Florence-2-base
Florence prompt: <DETAILED_CAPTION>
Text embedding model: sentence-transformers/all-MiniLM-L6-v2
Vector store: local persistent ChromaDB
Default track backend: yolo_bytetrack using Ultralytics YOLO + ByteTrack
Other track backends: mot_gt, hog_iou

Requirements

Windows-tested Python environment
Python 3.10+ recommended
CPU works
GPU is optional
Internet may be needed on first run so Florence and MiniLM can download model files from Hugging Face

Install

Create and activate a local virtual environment, then install dependencies.

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Data Layout

The app expects a MOT-style sequence directory:

train/
  MOT17-02/
    img1/
      000001.jpg
      000002.jpg
      ...
    seqinfo.ini
    gt/
      gt.txt

Notes:

img1/ and seqinfo.ini are required for all indexing.
gt/gt.txt is only required for --tracking-backend mot_gt.
The repo currently includes local train/ and test/ sequence folders.
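Both metadata files are plain text. Below is a minimal sketch of reading them. It assumes the standard MOT17 layouts: seqinfo.ini has a [Sequence] section with keys such as frameRate, seqLength, imDir and imExt, and each gt.txt row is frame, id, bb_left, bb_top, bb_width, bb_height, consider flag, class, visibility. The helper names (load_sequence_info, frame_path, load_gt_boxes) are hypothetical and do not come from the repo. Keeping only rows with consider flag 1 and class 1 (pedestrian) is also an assumption about what a ground-truth backend would want.

```python
import configparser
import csv
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SequenceInfo:
    name: str
    frame_rate: float
    seq_length: int
    im_dir: str
    im_ext: str


def load_sequence_info(seq_dir: Path) -> SequenceInfo:
    """Read the [Sequence] section of a MOT-style seqinfo.ini (hypothetical helper)."""
    parser = configparser.ConfigParser()
    with open(seq_dir / "seqinfo.ini", encoding="utf-8") as fh:
        parser.read_file(fh)
    sec = parser["Sequence"]  # configparser matches keys case-insensitively
    return SequenceInfo(
        name=sec.get("name", seq_dir.name),
        frame_rate=sec.getfloat("frameRate"),
        seq_length=sec.getint("seqLength"),
        im_dir=sec.get("imDir", "img1"),
        im_ext=sec.get("imExt", ".jpg"),
    )


def frame_path(seq_dir: Path, info: SequenceInfo, frame_index: int) -> Path:
    """MOT frames are 1-based and zero-padded to six digits, e.g. img1/000001.jpg."""
    return seq_dir / info.im_dir / f"{frame_index:06d}{info.im_ext}"


def load_gt_boxes(seq_dir: Path) -> dict[int, list[tuple[int, float, float, float, float]]]:
    """Group gt/gt.txt rows by frame: frame -> [(track_id, x, y, w, h), ...].

    Assumption: keep only rows marked 'consider' (column 7 == 1) with class 1 (pedestrian).
    """
    boxes = defaultdict(list)
    with open(seq_dir / "gt" / "gt.txt", newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if not row:  # skip blank lines
                continue
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = (float(v) for v in row[2:6])
            consider, cls = int(row[6]), int(row[7])
            if consider == 1 and cls == 1:
                boxes[frame].append((track_id, x, y, w, h))
    return dict(boxes)
```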

Plain CLI Defaults

The plain CLI uses hardcoded defaults from camera_search/config.py:

DEFAULT_SOURCE_SPLIT = "train"
DEFAULT_SOURCE_SEQUENCE_NAME = "MOT17-02"
DEFAULT_SAMPLE_FPS = 1.0
DEFAULT_TRACKING_BACKEND = "yolo_bytetrack"

That means plain CLI indexing uses:

source: train/MOT17-02
effective sampling: 1 FPS

If you run python app.py with no subcommand, it currently behaves like:

python app.py index-persons --tracking-backend yolo_bytetrack
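The no-subcommand fallback can be sketched with argparse. This is a hypothetical reconstruction, not the repo's app.py. The subcommand and flag names come from this README. The "observations" value for --index-type and the defaults for --top-k and --yolo-model are assumptions.

```python
import argparse

# Hypothetical mirror of the defaults in camera_search/config.py.
DEFAULT_TRACKING_BACKEND = "yolo_bytetrack"
TRACKING_BACKENDS = ("mot_gt", "hog_iou", "yolo_bytetrack")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="app.py")
    sub = parser.add_subparsers(dest="command")  # command is None if omitted

    sub.add_parser("index", help="one record per detected person observation")

    persons = sub.add_parser("index-persons", help="one record per person track")
    persons.add_argument("--tracking-backend", choices=TRACKING_BACKENDS,
                         default=DEFAULT_TRACKING_BACKEND)
    persons.add_argument("--yolo-model", default="yolo11n.pt")  # assumed default

    search = sub.add_parser("search")
    search.add_argument("--query", required=True)
    search.add_argument("--top-k", type=int, default=5)  # assumed default
    search.add_argument("--index-type", choices=("observations", "tracks"),
                        default="observations")  # "observations" is an assumed name

    sub.add_parser("ui")
    return parser


def parse_cli(argv=None):
    """Parse argv; with no subcommand, behave like index-persons with the default backend."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        args = parser.parse_args(
            ["index-persons", "--tracking-backend", DEFAULT_TRACKING_BACKEND])
    return args
```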

If you want to change source or sample FPS without editing code, use the UI.

Quick Start

Index person observations with the default source and sample FPS:

python app.py index

Search person observations:

python app.py search --query "woman in purple jacket and white pants"
python app.py search --query "man with backpack" --top-k 10
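Conceptually, a search embeds the query text with the MiniLM model used at index time and returns the stored records whose caption embeddings are closest. Below is a pure-Python sketch of that ranking step. Toy 3-dimensional vectors stand in for MiniLM's 384-dimensional ones. In the app itself this lookup is delegated to Chroma, and the function names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, records, k=5):
    """Rank (record_id, caption, vector) records by similarity to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), rid, caption)
              for rid, caption, vec in records]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

This README does not say whether the collection uses cosine distance or Chroma's default L2 distance. all-MiniLM-L6-v2 normalizes its embeddings to unit length, and for unit vectors both metrics produce the same ranking.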

Index track summaries:

python app.py index-persons --tracking-backend mot_gt
python app.py index-persons --tracking-backend hog_iou
python app.py index-persons --tracking-backend yolo_bytetrack --yolo-model yolo11n.pt

Search track summaries:

python app.py search --query "person wearing dark clothes" --index-type tracks --top-k 5

Run the text UI:

python app.py ui

UI Behavior

The UI shows:

current source sequence
current sample FPS

It also lets you:

index person observations
index tracked person summaries with mot_gt
index tracked person summaries with hog_iou
index tracked person summaries with yolo_bytetrack
search person observations
search person tracks
change source and sample FPS for the current session

UI changes do not rewrite config.py. They only affect the current UI run.

Outputs

The project writes data locally under data/:

data/
  frames/
  crops/
  chroma/

What gets written:

sampled frame images into data/frames/
person crop images into data/crops/
embeddings and metadata into data/chroma/

Available Pipelines

Default Observation Pipeline

python app.py index

detects people with OpenCV HOG
captions each crop with Florence
normalizes the description
embeds the text
stores one record per detected person observation
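The detect-and-crop step can be sketched with OpenCV's built-in HOG people detector. The detectMultiScale parameters and the 10% crop padding are illustrative assumptions and are not taken from the repo's detector.py or crops.py. cv2 and numpy are imported lazily, so the pure box-clamping helper can be used without them.

```python
def clamp_box(x, y, w, h, img_w, img_h, pad_frac=0.1):
    """Expand an (x, y, w, h) box by pad_frac on each side and clip it to the image."""
    px, py = int(round(w * pad_frac)), int(round(h * pad_frac))
    x0 = max(0, int(x) - px)
    y0 = max(0, int(y) - py)
    x1 = min(img_w, int(x + w) + px)
    y1 = min(img_h, int(y + h) + py)
    return x0, y0, x1, y1


def detect_and_crop(image, pad_frac=0.1):
    """Run OpenCV's default HOG people detector on a BGR image and return padded crops."""
    import cv2  # lazy imports: only needed when detection actually runs
    import numpy as np

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    rects, weights = hog.detectMultiScale(image, winStride=(8, 8), padding=(8, 8), scale=1.05)
    # Normalize shapes: OpenCV may return empty tuples when nothing is found.
    rects = np.asarray(rects, dtype=int).reshape(-1, 4)
    scores = np.asarray(weights, dtype=float).ravel()

    img_h, img_w = image.shape[:2]
    crops = []
    for (x, y, w, h), score in zip(rects, scores):
        x0, y0, x1, y1 = clamp_box(x, y, w, h, img_w, img_h, pad_frac)
        crops.append((image[y0:y1, x0:x1], (x0, y0, x1, y1), float(score)))
    return crops
```

For example, with frame = cv2.imread("train/MOT17-02/img1/000001.jpg"), detect_and_crop(frame) returns (crop, box, score) tuples. Each crop would then be passed to Florence for captioning.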

Track Pipeline

python app.py index-persons

Tracking backends:

mot_gt
- uses MOT ground-truth boxes and IDs
hog_iou
- uses OpenCV HOG detection plus a simple IoU tracker
yolo_bytetrack
- uses Ultralytics YOLO plus ByteTrack
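A simple IoU tracker like the one behind hog_iou can be sketched as greedy matching. Each new detection joins the existing track it overlaps most, provided the IoU clears a threshold, and any detection left unmatched starts a new track. This is a hypothetical illustration, not the repo's tracker.py. The 0.3 threshold and the max_missed aging rule are assumptions, and boxes are (x0, y0, x1, y1).

```python
from dataclasses import dataclass, field


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IoUTracker:
    iou_threshold: float = 0.3  # assumed value
    max_missed: int = 2         # frames a track may go unmatched before it is dropped
    next_id: int = 1
    tracks: dict = field(default_factory=dict)  # track_id -> (last_box, missed_count)

    def update(self, detections):
        """Assign a track_id to each detection box; returns [(track_id, box), ...]."""
        # All (IoU, track, detection) pairs, highest overlap first.
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, (box, _) in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        matched_tracks, assigned = set(), {}
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break
            if tid in matched_tracks or di in assigned:
                continue
            matched_tracks.add(tid)
            assigned[di] = tid
        # Age unmatched tracks and drop those missing for too long.
        for tid in list(self.tracks):
            if tid not in matched_tracks:
                box, missed = self.tracks[tid]
                if missed + 1 > self.max_missed:
                    del self.tracks[tid]
                else:
                    self.tracks[tid] = (box, missed + 1)
        # Matched detections keep their track_id; the rest start new tracks.
        results = []
        for di, det in enumerate(detections):
            tid = assigned.get(di)
            if tid is None:
                tid = self.next_id
                self.next_id += 1
            self.tracks[tid] = (det, 0)
            results.append((tid, det))
        return results
```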

The tracked pipeline aggregates repeated observations into one searchable record per track_id.
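This README does not describe how a track's observations are merged. One plausible policy, sketched below as an assumption, keeps the frame range and observation count and joins the most frequent distinct captions into a single summary text. That text can then be embedded once per track_id. The function and field names are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_tracks(observations, max_captions=3):
    """Collapse per-frame observations into one summary record per track_id.

    observations: iterable of dicts with 'track_id', 'frame' and 'caption' keys.
    """
    by_track = defaultdict(list)
    for obs in observations:
        by_track[obs["track_id"]].append(obs)

    summaries = []
    for track_id, obs_list in sorted(by_track.items()):
        frames = [o["frame"] for o in obs_list]
        counts = Counter(o["caption"].strip() for o in obs_list)
        # Most frequent captions first; ties keep first-seen order.
        top = [caption for caption, _ in counts.most_common(max_captions)]
        summaries.append({
            "track_id": track_id,
            "first_frame": min(frames),
            "last_frame": max(frames),
            "num_observations": len(obs_list),
            "summary_text": " ".join(top),
        })
    return summaries
```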

First-Run Downloads

On first use:

Florence may download model files from Hugging Face
MiniLM may download model files from Hugging Face

After that, they run from the local cache.

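OpenCV HOG does not require a Hugging Face model download. With yolo_bytetrack, Ultralytics may also download the requested YOLO weights file (for example yolo11n.pt) on first use if it is not already present locally.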

Notes On Sampling

MOT sequences are stored as images, but the app treats them like pseudo-video.

It reads the sequence FPS from seqinfo.ini and converts the configured target sample FPS into a stride. For example, if the sequence is 30 FPS and the target sample FPS is 1.0, the app processes about every 30th frame.
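The FPS-to-stride conversion can be sketched as follows. Rounding to the nearest integer and clamping to a minimum of 1 are assumptions; the repo may truncate instead. MOT frame numbers are 1-based, so sampling starts at frame 1.

```python
def frame_stride(sequence_fps: float, target_fps: float) -> int:
    """Source frames to advance per sampled frame (at least 1, i.e. every frame)."""
    if sequence_fps <= 0 or target_fps <= 0:
        raise ValueError("FPS values must be positive")
    return max(1, round(sequence_fps / target_fps))


def sampled_frames(seq_length: int, sequence_fps: float, target_fps: float) -> list[int]:
    """1-based MOT frame indices to process: 1, 1 + stride, 1 + 2 * stride, ..."""
    stride = frame_stride(sequence_fps, target_fps)
    return list(range(1, seq_length + 1, stride))
```

With a 30 FPS, 600-frame sequence and a 1.0 FPS target, this processes frames 1, 31, 61 and so on, for 20 frames in total. A target above the sequence FPS falls back to processing every frame.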

This gives denser, more video-like indexing than the earlier approach, which advanced through the sequence in large fixed frame steps.

Important Limitations

This is not yet a live webcam, RTSP, or real streaming pipeline.
Input is still MOT-style frame folders.
The default pipeline does not yet track people across frames.
Search results from different indexing runs still share the same collection within an index type.
There is no collection isolation per experiment yet.
The HOG default path is a lightweight baseline, not a strong production detector.
yolo_bytetrack is stronger, but it adds an Ultralytics dependency plus model and runtime overhead.

Troubleshooting

Florence import/runtime issues

Florence requires a recent Transformers version. This repo expects the project virtual environment, not an older global Python install.

Use:

.\.venv\Scripts\python.exe app.py index

Hugging Face warnings

Unauthenticated Hugging Face warnings are not fatal. They just mean downloads are using public access.

`mot_gt` backend fails

Make sure the selected sequence has:

img1/
seqinfo.ini
gt/gt.txt
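A quick preflight check for these files can be sketched as below. The function and the backend-to-file mapping are hypothetical helpers. They mirror only the requirements stated in Data Layout: img1/ and seqinfo.ini are always required, and gt/gt.txt is required only for mot_gt.

```python
from pathlib import Path

# Hypothetical mapping that mirrors the README's stated requirements.
REQUIRED_ALWAYS = ("img1", "seqinfo.ini")
REQUIRED_FOR_BACKEND = {"mot_gt": ("gt/gt.txt",)}


def missing_inputs(seq_dir, tracking_backend=None):
    """Return the required paths (relative to seq_dir) that do not exist."""
    seq_dir = Path(seq_dir)
    required = list(REQUIRED_ALWAYS) + list(REQUIRED_FOR_BACKEND.get(tracking_backend, ()))
    return [rel for rel in required if not (seq_dir / rel).exists()]
```

For example, missing_inputs("train/MOT17-02", "mot_gt") returns an empty list when the sequence is complete.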

Search results look mixed

Current collections are shared across runs. Repeated experiments can mix old and new records until collection isolation or cleanup is added.

Repo Layout

camera_search/
  app.py
  camera.py
  config.py
  crops.py
  detector.py
  florence_describer.py
  mot_tracking.py
  person_summary.py
  tracker.py
  ui.py
  vector_store.py
data/
  frames/
  crops/
  chroma/
train/
test/
app.py
requirements.txt
README.md
MEMORY.md
