This project indexes people from MOT-style image sequences, describes each detected person with Florence-2, embeds those descriptions with MiniLM, and supports natural-language retrieval from a local Chroma database.
The main pipeline is now person-first:
- Read MOT frames.
- Sample them like a video at an effective FPS.
- Detect people in each sampled frame.
- Crop each person.
- Caption each crop.
- Embed the caption text.
- Store the result for search.
- `python app.py`
  - no-argument shortcut
  - runs `index-persons` with the hardcoded defaults from `camera_search/config.py`
- `python app.py index`
  - default pipeline
  - detects people with OpenCV HOG
  - stores one searchable record per detected person observation
- `python app.py index-persons`
  - tracked pipeline
  - stores one searchable record per person track summary
  - supports `mot_gt`, `hog_iou`, and `yolo_bytetrack`
- `python app.py search`
  - searches person observations by default
  - can search track summaries with `--index-type tracks`
- `python app.py ui`
  - text UI
  - lets you change source sequence and sample FPS for the current session
- Person detector for default pipeline: OpenCV HOG people detector
- Caption model: `florence-community/Florence-2-base`
- Florence prompt: `<DETAILED_CAPTION>`
- Text embedding model: `sentence-transformers/all-MiniLM-L6-v2`
- Vector store: local persistent ChromaDB
- Default track backend: `yolo_bytetrack` using Ultralytics YOLO + ByteTrack
- Other track backends: `mot_gt`, `hog_iou`
- Windows-tested Python environment
- Python 3.10+ recommended
- CPU works
- GPU is optional
- Internet may be needed on first run so Florence and MiniLM can download model files from Hugging Face
Create and activate a local virtual environment, then install dependencies.
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

The app expects a MOT-style sequence directory:
train/
  MOT17-02/
    img1/
      000001.jpg
      000002.jpg
      ...
    seqinfo.ini
    gt/
      gt.txt
Notes:
- `img1/` and `seqinfo.ini` are required for all indexing.
- `gt/gt.txt` is only required for `--tracking-backend mot_gt`.
- The repo currently includes local `train/` and `test/` sequence folders.
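
The requirements above can be checked before indexing starts. The following is a minimal sketch, not code from this repo: the helper name `missing_sequence_files` and its signature are hypothetical, and it only encodes the three rules listed in the notes.

```python
from pathlib import Path


def missing_sequence_files(sequence_dir, tracking_backend=None):
    """Return the required paths that are missing from a MOT-style sequence.

    Hypothetical helper for illustration; the name and signature are not
    taken from the project code.
    """
    root = Path(sequence_dir)
    # img1/ and seqinfo.ini are needed for every indexing mode.
    required = [root / "img1", root / "seqinfo.ini"]
    # gt/gt.txt only matters when tracks come from MOT ground truth.
    if tracking_backend == "mot_gt":
        required.append(root / "gt" / "gt.txt")
    return [p.relative_to(root).as_posix() for p in required if not p.exists()]
```

An empty list means the sequence folder has everything the chosen backend needs; otherwise the list names what to add.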
The plain CLI uses hardcoded defaults from camera_search/config.py:
- `DEFAULT_SOURCE_SPLIT = "train"`
- `DEFAULT_SOURCE_SEQUENCE_NAME = "MOT17-02"`
- `DEFAULT_SAMPLE_FPS = 1.0`
- `DEFAULT_TRACKING_BACKEND = "yolo_bytetrack"`
That means plain CLI indexing uses:
- source: `train/MOT17-02`
- effective sampling: `1 FPS`
If you run python app.py with no subcommand, it currently behaves like:
python app.py index-persons --tracking-backend yolo_bytetrack

If you want to change source or sample FPS without editing code, use the UI.
Index person observations with the default source and sample FPS:
python app.py index

Search person observations:
python app.py search --query "woman in purple jacket and white pants"
python app.py search --query "man with backpack" --top-k 10

Index track summaries:
python app.py index-persons --tracking-backend mot_gt
python app.py index-persons --tracking-backend hog_iou
python app.py index-persons --tracking-backend yolo_bytetrack --yolo-model yolo11n.pt

Search track summaries:

python app.py search --query "person wearing dark clothes" --index-type tracks --top-k 5

Run the text UI:

python app.py ui

The UI shows:
- current source sequence
- current sample FPS
It also lets you:
- index person observations
- index tracked person summaries with `mot_gt`
- index tracked person summaries with `hog_iou`
- index tracked person summaries with `yolo_bytetrack`
- search person observations
- search person tracks
- change source and sample FPS for the current session
UI changes do not rewrite config.py. They only affect the current UI run.
The project writes data locally under data/:
data/
  frames/
  crops/
  chroma/
What gets written:
- sampled frame images into `data/frames/`
- person crop images into `data/crops/`
- embeddings and metadata into `data/chroma/`
python app.py index
- detects people with OpenCV HOG
- captions each crop with Florence
- normalizes the description
- embeds the text
- stores one record per detected person observation
python app.py index-persons
Tracking backends:
- `mot_gt`
  - uses MOT ground-truth boxes and IDs
- `hog_iou`
  - uses OpenCV HOG detection plus a simple IoU tracker
- `yolo_bytetrack`
  - uses Ultralytics YOLO plus ByteTrack
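
To illustrate what a "simple IoU tracker" does, here is a minimal sketch of IoU-based association between existing tracks and new detections. It is not the code in `tracker.py`: the function names, the `(x1, y1, x2, y2)` box format, the greedy matching strategy, and the `0.3` threshold are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks, detections, threshold=0.3):
    """Greedily assign each detection to the best unclaimed track.

    tracks: dict of track_id -> last known box.
    detections: list of boxes from the current frame.
    Returns a dict of detection index -> track_id. Detections that are
    absent from the result matched nothing and would start new tracks.
    """
    assignments = {}
    claimed = set()
    for index, detection in enumerate(detections):
        best_id, best_score = None, threshold
        for track_id, box in tracks.items():
            if track_id in claimed:
                continue
            score = iou(box, detection)
            if score >= best_score:
                best_id, best_score = track_id, score
        if best_id is not None:
            assignments[index] = best_id
            claimed.add(best_id)
    return assignments
```

Greedy matching is the simplest option and is order-dependent; a tracker that needs better assignments in crowded scenes would typically use a global method instead.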
The tracked pipeline aggregates repeated observations into one searchable record per track_id.
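
As a rough illustration of that aggregation step, the sketch below collapses per-frame observations into one summary per `track_id`. The field names (`frame`, `caption`) and the "most frequent caption wins" rule are assumptions for this example; the actual logic in `person_summary.py` may differ.

```python
from collections import Counter, defaultdict


def summarize_tracks(observations):
    """Collapse per-frame observations into one record per track_id.

    observations: iterable of dicts with "track_id", "frame", "caption".
    The field names and the majority-caption rule are assumptions made
    for this sketch.
    """
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["track_id"]].append(obs)

    summaries = []
    for track_id, items in sorted(grouped.items()):
        captions = Counter(o["caption"] for o in items)
        frames = [o["frame"] for o in items]
        summaries.append({
            "track_id": track_id,
            # Keep the caption seen most often for this track.
            "caption": captions.most_common(1)[0][0],
            "first_frame": min(frames),
            "last_frame": max(frames),
            "num_observations": len(items),
        })
    return summaries
```

Each summary would then be embedded and stored once, so a search returns one hit per person track rather than one hit per frame.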
On first use:
- Florence may download model files from Hugging Face
- MiniLM may download model files from Hugging Face
After that, they run from the local cache.
OpenCV HOG does not require a Hugging Face model download.
MOT sequences are stored as images, but the app treats them like pseudo-video.
It reads the sequence FPS from seqinfo.ini and converts the configured target sample FPS into a stride. For example, if the sequence is 30 FPS and the target sample FPS is 1.0, the app processes about every 30th frame.
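
The FPS-to-stride conversion can be sketched as follows. This is an illustration under assumptions, not the code in `camera.py`: the function names are hypothetical, the rounding rule is a guess, and starting at frame 1 is assumed from the MOT file naming. The `[Sequence]` section and `frameRate` key follow the standard MOT `seqinfo.ini` layout.

```python
import configparser


def read_sequence_fps(seqinfo_path):
    """Read frameRate from the [Sequence] section of a MOT seqinfo.ini."""
    parser = configparser.ConfigParser()
    with open(seqinfo_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return parser.getfloat("Sequence", "frameRate")


def sample_stride(sequence_fps, target_fps):
    """Convert a target sample FPS into a frame stride of at least 1."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    # Rounding to the nearest integer is an assumption of this sketch.
    return max(1, round(sequence_fps / target_fps))


def sampled_frame_numbers(seq_length, stride):
    """List the 1-indexed frame numbers that would be processed."""
    return list(range(1, seq_length + 1, stride))
```

With a 30 FPS sequence and a target of 1.0, `sample_stride` gives a stride of 30, so a 100-frame sequence would be sampled at frames 1, 31, 61, and 91. A target FPS above the sequence FPS is clamped to a stride of 1, meaning every frame is processed.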
This gives denser, more video-like indexing than the old large frame-step approach.
- This is not yet a live webcam, RTSP, or real streaming pipeline.
- Input is still MOT-style frame folders.
- The default pipeline does not yet track people across frames.
- Search results from different indexing runs still share the same collection within an index type.
- There is no collection isolation per experiment yet.
- The HOG default path is a lightweight baseline, not a strong production detector.
- `yolo_bytetrack` is stronger, but adds an Ultralytics dependency and model/runtime overhead.
Florence requires a recent Transformers version. This repo expects the project virtual environment, not an older global Python install.
Use:
.\.venv\Scripts\python.exe app.py index

Unauthenticated Hugging Face warnings are not fatal. They only mean that downloads are using public access.
Make sure the selected sequence has:
- `img1/`
- `seqinfo.ini`
- `gt/gt.txt`
Current collections are shared across runs. Repeated experiments can mix old and new records until collection isolation or cleanup is added.
camera_search/
  app.py
  camera.py
  config.py
  crops.py
  detector.py
  florence_describer.py
  mot_tracking.py
  person_summary.py
  tracker.py
  ui.py
  vector_store.py
data/
  frames/
  crops/
  chroma/
train/
test/
app.py
requirements.txt
README.md
MEMORY.md