AlexBalats/Camera-project

Semantic Search Over People In MOT17 Frames

This project indexes people from MOT-style image sequences, describes each detected person with Florence-2, embeds those descriptions with MiniLM, and supports natural-language retrieval from a local Chroma database.

The main pipeline is now person-first:

  1. Read MOT frames.
  2. Sample them like video frames at a target FPS.
  3. Detect people in each sampled frame.
  4. Crop each person.
  5. Caption each crop.
  6. Embed the caption text.
  7. Store the result for search.
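
The seven steps above can be sketched as a single loop. This is a minimal illustration, not the repo's code: `index_frames` and its `detect`, `crop`, `caption`, `embed`, and `store` callables are hypothetical stand-ins for the HOG detector, the cropper, Florence-2, MiniLM, and Chroma.

```python
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed convention: (x, y, width, height)


def index_frames(
    frames: Iterable[Any],
    detect: Callable[[Any], List[Box]],
    crop: Callable[[Any, Box], Any],
    caption: Callable[[Any], str],
    embed: Callable[[str], List[float]],
    store: Callable[[dict, List[float]], None],
    stride: int = 1,
) -> int:
    """Run the person-first pipeline and return how many records were stored."""
    stored = 0
    for frame_index, frame in enumerate(frames):
        # Step 2: keep only every `stride`-th frame.
        if frame_index % stride != 0:
            continue
        # Steps 3-7: detect, crop, caption, embed, store.
        for box in detect(frame):
            text = caption(crop(frame, box))
            vector = embed(text)
            store({"frame_index": frame_index, "box": box, "caption": text}, vector)
            stored += 1
    return stored
```

Passing the stages in as callables keeps the loop independent of any particular model, so it can be exercised with simple fakes before the real models are downloaded.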

What The Repo Does

  • python app.py
    • no-argument shortcut
    • runs index-persons with the hardcoded defaults from camera_search/config.py
  • python app.py index
    • observation pipeline (called the default pipeline in this README)
    • detects people with OpenCV HOG
    • stores one searchable record per detected person observation
  • python app.py index-persons
    • tracked pipeline
    • stores one searchable record per person track summary
    • supports mot_gt, hog_iou, and yolo_bytetrack
  • python app.py search
    • searches person observations by default
    • can search track summaries with --index-type tracks
  • python app.py ui
    • text UI
    • lets you change source sequence and sample FPS for the current session

Current Model Stack

  • Person detector for default pipeline: OpenCV HOG people detector
  • Caption model: florence-community/Florence-2-base
  • Florence prompt: <DETAILED_CAPTION>
  • Text embedding model: sentence-transformers/all-MiniLM-L6-v2
  • Vector store: local persistent ChromaDB
  • Default track backend: yolo_bytetrack using Ultralytics YOLO + ByteTrack
  • Other track backends: mot_gt, hog_iou

Requirements

  • Python environment (tested on Windows)
  • Python 3.10+ recommended
  • CPU works
  • GPU is optional
  • Internet may be needed on first run so Florence and MiniLM can download model files from Hugging Face

Install

Create and activate a local virtual environment, then install dependencies.

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Data Layout

The app expects a MOT-style sequence directory:

train/
  MOT17-02/
    img1/
      000001.jpg
      000002.jpg
      ...
    seqinfo.ini
    gt/
      gt.txt

Notes:

  • img1/ and seqinfo.ini are required for all indexing.
  • gt/gt.txt is only required for --tracking-backend mot_gt.
  • The repo currently includes local train/ and test/ sequence folders.
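
A quick pre-flight check for the layout above might look like the following. It is a sketch only: `missing_mot_entries` is a hypothetical helper, not a function from this repo.

```python
from pathlib import Path
from typing import List


def missing_mot_entries(sequence_dir: str, need_ground_truth: bool = False) -> List[str]:
    """Return the required entries that are absent from a MOT sequence folder."""
    root = Path(sequence_dir)
    # img1/ and seqinfo.ini are needed for every indexing mode.
    required = ["img1", "seqinfo.ini"]
    # gt/gt.txt is only needed for the mot_gt tracking backend.
    if need_ground_truth:
        required.append("gt/gt.txt")
    return [name for name in required if not (root / name).exists()]
```

An empty list means the sequence is ready; for example, `missing_mot_entries("train/MOT17-02", need_ground_truth=True)` checks everything the mot_gt backend needs.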

Plain CLI Defaults

The plain CLI uses hardcoded defaults from camera_search/config.py:

  • DEFAULT_SOURCE_SPLIT = "train"
  • DEFAULT_SOURCE_SEQUENCE_NAME = "MOT17-02"
  • DEFAULT_SAMPLE_FPS = 1.0
  • DEFAULT_TRACKING_BACKEND = "yolo_bytetrack"

That means plain CLI indexing uses:

  • source: train/MOT17-02
  • effective sampling: 1 FPS

If you run python app.py with no subcommand, it currently behaves like:

python app.py index-persons --tracking-backend yolo_bytetrack

If you want to change source or sample FPS without editing code, use the UI.

Quick Start

Index person observations with the default source and sample FPS:

python app.py index

Search person observations:

python app.py search --query "woman in purple jacket and white pants"
python app.py search --query "man with backpack" --top-k 10

Index track summaries:

python app.py index-persons --tracking-backend mot_gt
python app.py index-persons --tracking-backend hog_iou
python app.py index-persons --tracking-backend yolo_bytetrack --yolo-model yolo11n.pt

Search track summaries:

python app.py search --query "person wearing dark clothes" --index-type tracks --top-k 5

Run the text UI:

python app.py ui

UI Behavior

The UI shows:

  • current source sequence
  • current sample FPS

It also lets you:

  • index person observations
  • index tracked person summaries with mot_gt
  • index tracked person summaries with hog_iou
  • index tracked person summaries with yolo_bytetrack
  • search person observations
  • search person tracks
  • change source and sample FPS for the current session

UI changes do not rewrite config.py. They only affect the current UI run.
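
One way to get session-only overrides like this is to copy the defaults into an immutable settings object and replace fields on the copy. The sketch below assumes that approach. `SessionSettings` is a hypothetical name, and `MOT17-04` is only an illustrative sequence.

```python
from dataclasses import dataclass, replace

# Values mirror the defaults listed for camera_search/config.py.
DEFAULT_SOURCE_SPLIT = "train"
DEFAULT_SOURCE_SEQUENCE_NAME = "MOT17-02"
DEFAULT_SAMPLE_FPS = 1.0


@dataclass(frozen=True)
class SessionSettings:  # hypothetical container, not part of the repo
    source_split: str = DEFAULT_SOURCE_SPLIT
    sequence_name: str = DEFAULT_SOURCE_SEQUENCE_NAME
    sample_fps: float = DEFAULT_SAMPLE_FPS


defaults = SessionSettings()
# The override lives only in this object; the module-level defaults are untouched.
session = replace(defaults, sequence_name="MOT17-04", sample_fps=2.0)
```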

Outputs

The project writes data locally under data/:

data/
  frames/
  crops/
  chroma/

What gets written:

  • sampled frame images into data/frames/
  • person crop images into data/crops/
  • embeddings and metadata into data/chroma/

Available Pipelines

Default Observation Pipeline

python app.py index

  • detects people with OpenCV HOG
  • captions each crop with Florence
  • normalizes the description
  • embeds the text
  • stores one record per detected person observation
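
Detectors can return boxes that extend past the frame edge, so a cropping step usually clamps each box first. The helper below is a minimal sketch under that assumption; `clamp_box` is a hypothetical name and the `(x, y, width, height)` box convention is assumed, not taken from the repo.

```python
from typing import Optional, Tuple


def clamp_box(
    x: int, y: int, w: int, h: int, frame_w: int, frame_h: int
) -> Optional[Tuple[int, int, int, int]]:
    """Clamp an (x, y, w, h) box to the frame and return (x1, y1, x2, y2).

    Returns None when nothing of the box lies inside the frame.
    """
    x1 = max(0, min(x, frame_w))
    y1 = max(0, min(y, frame_h))
    x2 = max(0, min(x + w, frame_w))
    y2 = max(0, min(y + h, frame_h))
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2
```

With a NumPy-style image array, the crop would then be `frame[y1:y2, x1:x2]`.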

Track Pipeline

python app.py index-persons

Tracking backends:

  • mot_gt
    • uses MOT ground-truth boxes and IDs
  • hog_iou
    • uses OpenCV HOG detection plus a simple IoU tracker
  • yolo_bytetrack
    • uses Ultralytics YOLO plus ByteTrack
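
For context on hog_iou, a simple IoU tracker assigns each new detection to the existing track whose last box overlaps it most. The sketch below shows that idea only. `iou` and `match_track` are hypothetical helpers, and the 0.3 threshold is an assumed value, not the repo's setting.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_track(box: Box, tracks: Dict[int, Box], threshold: float = 0.3) -> Optional[int]:
    """Return the id of the best-overlapping track, or None if none reaches the threshold."""
    best_id, best_score = None, threshold
    for track_id, last_box in tracks.items():
        score = iou(box, last_box)
        if score >= best_score:
            best_id, best_score = track_id, score
    return best_id
```

A detection that matches no track would start a new track with a fresh id.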

The tracked pipeline aggregates repeated observations into one searchable record per track_id.
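
The aggregation can be pictured as a group-by on track_id. How the repo actually merges captions is not described here, so the sketch below simply joins the distinct captions in order; `summarize_tracks` and the record field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_tracks(observations: List[Dict]) -> List[Dict]:
    """Collapse per-frame observations into one summary record per track_id."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["track_id"]].append(obs)

    summaries = []
    for track_id, items in sorted(grouped.items()):
        frames = [o["frame"] for o in items]
        # dict.fromkeys drops repeated captions while keeping first-seen order.
        captions = list(dict.fromkeys(o["caption"] for o in items))
        summaries.append(
            {
                "track_id": track_id,
                "first_frame": min(frames),
                "last_frame": max(frames),
                "observations": len(items),
                "summary": " ".join(captions),
            }
        )
    return summaries
```

Each summary's text would then be embedded once, which is why a track search returns one hit per person rather than one per frame.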

First-Run Downloads

On first use:

  • Florence may download model files from Hugging Face
  • MiniLM may download model files from Hugging Face

After that, they run from the local cache.

OpenCV HOG does not require a Hugging Face model download.

Notes On Sampling

MOT sequences are stored as images, but the app treats them like pseudo-video.

It reads the sequence FPS from seqinfo.ini and converts the configured target sample FPS into a stride. For example, if the sequence is 30 FPS and the target sample FPS is 1.0, the app processes about every 30th frame.

Sampling by target FPS gives denser, more video-like indexing than the earlier approach of stepping through frames with a large fixed step.
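
The FPS-to-stride conversion can be sketched as follows. MOT seqinfo.ini files keep the frame rate under `frameRate` in a `[Sequence]` section. The rounding rule and both function names are assumptions, and the sample text holds illustrative values only.

```python
import configparser


def read_sequence_fps(seqinfo_text: str) -> float:
    """Read frameRate from the [Sequence] section of a seqinfo.ini file."""
    parser = configparser.ConfigParser()
    parser.read_string(seqinfo_text)
    return parser.getfloat("Sequence", "frameRate")


def sample_stride(sequence_fps: float, target_fps: float) -> int:
    """Convert a target sample FPS into a frame stride of at least 1."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    # Assumed policy: round to the nearest whole stride, never below 1.
    return max(1, round(sequence_fps / target_fps))


SAMPLE_SEQINFO = """
[Sequence]
name=MOT17-02
frameRate=30
seqLength=600
"""

stride = sample_stride(read_sequence_fps(SAMPLE_SEQINFO), target_fps=1.0)
# MOT frame numbers start at 1, so sampling begins at frame 1.
sampled_frames = list(range(1, 600 + 1, stride))
```

Clamping the stride to 1 means that asking for more frames per second than the sequence holds falls back to processing every frame.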

Important Limitations

  • This is not yet a live webcam, RTSP, or real streaming pipeline.
  • Input is still MOT-style frame folders.
  • The default observation pipeline (python app.py index) does not yet track people across frames.
  • Records from different indexing runs share one collection per index type, so search results can mix runs.
  • There is no per-experiment collection isolation yet.
  • The HOG default path is a lightweight baseline, not a strong production detector.
  • yolo_bytetrack is stronger, but adds Ultralytics dependency and model/runtime overhead.

Troubleshooting

Florence import/runtime issues

Florence requires a recent Transformers version. Run the app from the project virtual environment rather than an older global Python install.

Use:

.\.venv\Scripts\python.exe app.py index

Hugging Face warnings

Unauthenticated Hugging Face warnings are not fatal. They just mean downloads are using public access.

mot_gt backend fails

Make sure the selected sequence has:

  • img1/
  • seqinfo.ini
  • gt/gt.txt
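
If the file exists but the backend still fails, it can help to check that gt/gt.txt parses. In the MOT16/17 ground-truth format, each row is `frame, id, bb_left, bb_top, bb_width, bb_height, flag, class, visibility`. The sketch below reads such rows and keeps active pedestrian boxes. `parse_gt` is a hypothetical helper, and whether the repo filters the same way is an assumption.

```python
import csv
import io
from typing import List, NamedTuple


class GtBox(NamedTuple):
    frame: int
    track_id: int
    left: float
    top: float
    width: float
    height: float


def parse_gt(text: str, pedestrian_class: int = 1) -> List[GtBox]:
    """Parse MOT ground-truth rows, keeping active boxes of the pedestrian class."""
    boxes = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 9:
            continue  # skip blank or malformed rows
        frame, track_id = int(row[0]), int(row[1])
        left, top, width, height = (float(v) for v in row[2:6])
        active, cls = int(float(row[6])), int(float(row[7]))
        # Assumed filter: drop rows flagged as ignored and non-pedestrian classes.
        if active == 0 or cls != pedestrian_class:
            continue
        boxes.append(GtBox(frame, track_id, left, top, width, height))
    return boxes
```

An empty result for a file that clearly has rows points to a format mismatch rather than a missing file.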

Search results look mixed

Current collections are shared across runs. Repeated experiments can mix old and new records until collection isolation or cleanup is added.

Repo Layout

camera_search/
  app.py
  camera.py
  config.py
  crops.py
  detector.py
  florence_describer.py
  mot_tracking.py
  person_summary.py
  tracker.py
  ui.py
  vector_store.py
data/
  frames/
  crops/
  chroma/
train/
test/
app.py
requirements.txt
README.md
MEMORY.md
