
UniV-Baseline

The Official Baseline for UniV: the First Large-Scale Video-Based University Geo-Localization Benchmark

Drone Video ↔ Satellite Image Cross-View Retrieval & Navigation

License: MIT

• Watch the full results on Bilibili.

🔥 Highlights

  • First video-to-satellite geo-localization benchmark built on real drone videos (not still images only!)
  • Two challenging tasks:
    → Task 1: Video-based drone-view → satellite localization
    → Task 2: Satellite-guided drone video navigation
  • Zero university overlap between train/val/test → a large, challenging domain gap
  • Full training/evaluation code + pretrained weights released
  • Strong baseline using Video2BEV + two-stage training

Paper: Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization (ICCV 2025)


TODOs

  • (Optional) Release the 2-fps BEVs for both training and evaluation
  • Release the requirements.txt
  • Release the UniV dataset
  • Release the weight of the second stage
  • Release the evaluation code for the second stage
  • Release the training code for the second stage
  • Release the weight of the first stage
  • Release the evaluation code for the first stage
  • Release the training code for the first stage

About Dataset


Download

BaiduCloud | Google Drive

The dataset split is as follows:

| Split | #data | #buildings | #universities |
|---|---|---|---|
| Training | 701 vids + 12,364 imgs | 701 | 33 |
| Query_drone | 701 vids | 701 | 39 |
| Query_satellite | 701 imgs | 701 | 39 |
| Query_ground | 2,579 imgs | 701 | 39 |
| Gallery_drone | 951 vids | 951 | 39 |
| Gallery_satellite | 951 imgs | 951 | 39 |
| Gallery_ground | 2,921 imgs | 793 | 39 |

More detailed file structure:

.
├── 30
│   ├── 10fps
│   │   ├── test
│   │   │   └── gallery_drone
│   │   └── train
│   │       └── drone
│   ├── 2fps
│   │   ├── test
│   │   │   └── gallery_drone
│   │   └── train
│   │       └── drone
│   └── 5fps
│       ├── test
│       │   └── gallery_drone
│       └── train
│           └── drone
├── 45
│   ├── 10fps
│   │   ├── test
│   │   │   └── gallery_drone
│   │   └── train
│   │       └── drone
│   ├── 2fps
│   │   ├── test
│   │   │   ├── gallery_drone
│   │   │   ├── gallery_satellite
│   │   │   └── gallery_street
│   │   └── train
│   │       ├── drone
│   │       ├── google
│   │       ├── satellite
│   │       └── street
│   └── 5fps
│       ├── test
│       │   └── gallery_drone
│       └── train
│           └── drone
├── dataset_split.json
└── organize_univ.py

Note that there is no overlap between the 33 universities in the training set and the 39 universities in the test set.
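The zero-overlap protocol can be sanity-checked with a short script. The JSON keys below are assumptions for illustration only; inspect the real dataset_split.json for its actual layout.

```python
# Hypothetical layout -- the real keys live in dataset_split.json.
split = {
    "train_universities": ["univ_A", "univ_B", "univ_C"],
    "test_universities": ["univ_X", "univ_Y"],
}

def overlapping_universities(split):
    """Return universities that appear in both the train and test lists."""
    return sorted(
        set(split["train_universities"]) & set(split["test_universities"])
    )

# Zero-overlap protocol: the intersection should be empty.
print(overlapping_universities(split))  # -> []
```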

Getting started

Installation

conda create --name video2bev python=3.7
# pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt

# (optional but recommended) install apex
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

If you have any questions about installing apex, please refer to issue-2 first, then search for possible solutions.
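After installation, a quick check (a hypothetical helper, not part of this repo) confirms the key packages are importable before you start training:

```python
import importlib.util

def check_packages(names):
    """Map each package name to whether it can be imported in this env."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

for name, ok in check_packages(["torch", "torchvision", "apex"]).items():
    print(f"{name}: {'OK' if ok else 'MISSING -- see the steps above'}")
```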

Dataset & Preparation

  • Download UniV.

  • cat and unzip the dataset:

    • mkdir UniV
    • cat UniV.tar.xz.* | tar -xJf - -C UniV --transform='s/\*/_/g'
    • update the path of UniV in organize_univ.py and python ./organize_univ.py
  • [Optional] If you are interested in reproducing or evaluating the proposed Video2BEV, please feel free to contact us to request the BEVs and synthetic negative samples (generated with a fine-tuned diffusers model) via Baidu Disk or Google Drive.

    • Unzip UniV-supp

      • cd UniV-supp
      • cat UniV-supp.tar.xz.* | tar -xvJf - --transform 's|.*/|UniV-supp/|'
    • Organize UniV-supp

      • Set the path in organize_univ-supp.py (included in the download above)
      • python ./organize_univ-supp.py
  • Note

    • Building 281 is not in the 2-fps training split; you can download it from University-1652 or extract it from a higher-fps split.
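After extraction and running organize_univ.py, you can verify the top-level layout matches the tree shown earlier. This checker is a sketch assuming the entries from that tree:

```python
import os
import tempfile

# Top-level entries expected after extraction (from the tree above).
EXPECTED = {"30", "45", "dataset_split.json", "organize_univ.py"}

def missing_entries(root):
    """Return expected entries that are absent from the UniV root dir."""
    return sorted(EXPECTED - set(os.listdir(root)))

# Demo on a stand-in directory; pass your real UniV path in practice.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "30"))
    open(os.path.join(root, "dataset_split.json"), "w").close()
    print(missing_entries(root))  # -> ['45', 'organize_univ.py']
```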

Training & Evaluation

Training

First-stage training & evaluation

  • First-stage training:
    • Switch to the first-stage branch: git checkout first-stage
    • Refer to this file
  • First-stage evaluation:
    • Switch to the first-stage branch: git checkout first-stage
    • Refer to this file
# Train:
# In the first stage, we fine-tune the encoder with the instance loss and contrastive loss.
sh train.sh
# Evaluation:
python test_collect_weights.py;
sh test.sh

Second-stage training & evaluation

  • Second-stage training:
    • Switch to the second-stage-training branch: git checkout second-stage-training
    • Refer to this file
  • Second-stage evaluation:
    • Switch to the second-stage-evalution branch: git checkout second-stage-evalution
    • Refer to this file
# Train:
# In the second stage, we freeze the encoder and train mlps with matching loss.
# please change contents in train.sh
sh train.sh

# Evaluation:
# please change contents in test_collect_weights.py and test.sh
python test_collect_weights.py;
sh test.sh

Weights

Download link

.
├── first-stage
│   ├── 30-degree
│   │   └── model_xxxx_xxxx
│   │       └── two_view_long_share_d0.75_256_s1
│   │           └── model_xxxx_xxxx_xxx
│   │               ├── net_9301.pth
│   │               └── opts.yaml
│   └── 45-degree
│       └── model_2024-08-20-19_19_36
│           └── two_view_long_share_d0.75_256_s1
│               └── model_2024-08-20-19_19_36_059
│                   ├── net_059.pth
│                   └── opts.yaml
├── second-stage
│   ├── 30degree-2fps
│   │   └── model_2024-11-02-03-05-31.zip
│   ├── 45degree-2fps
│   │   └── model_2024-10-05-02_49_11.zip
│   └── 45degree-2fps-better
│       └── model_2024-10-20-06_02_09.zip
└── vit_small_p16_224-15ec54c9.pth

Choose a weight archive, unzip it, and place it in the root of your working directory for this repo.

PS:

  • model_2024-11-02-03-05-31 is the weight for 30-degree UniV (2 fps); model_2024-10-05-02_49_11 is the weight for 45-degree UniV (2 fps).
    • The evaluation numbers should match those reported in our paper.
  • By tuning hyper-parameters, you can obtain a better result.
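To avoid mixing up the released archives, a tiny helper (hypothetical, not part of the repo) can map the tilt angle and frame rate to the matching second-stage weight from the tree above:

```python
# Released second-stage weights, keyed by (tilt degrees, fps);
# archive names are taken from the tree above.
WEIGHTS = {
    (30, 2): "model_2024-11-02-03-05-31.zip",
    (45, 2): "model_2024-10-05-02_49_11.zip",
}

def pick_weight(degree, fps):
    """Return the archive name for a setting, or raise if none is released."""
    try:
        return WEIGHTS[(degree, fps)]
    except KeyError:
        raise ValueError(f"no released second-stage weight for {degree} deg @ {fps} fps")

print(pick_weight(30, 2))  # -> model_2024-11-02-03-05-31.zip
```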

Citation

The following paper uses and reports the results of this baseline model. Please consider citing it in your paper.

@article{ju2024video2bev,
  title={Video2bev: Transforming drone videos to bevs for video-based geo-localization},
  author={Ju, Hao and Huang, Shaofei and Liu, Si and Zheng, Zhedong},
  journal={ICCV},
  year={2025}
}

Others:

@article{zheng2020university,
  title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
  author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
  journal={ACM Multimedia},
  year={2020}
}
@article{zheng2017dual,
  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3383184},
  volume={16},
  number={2},
  pages={1--23},
  year={2020},
  publisher={ACM New York, NY, USA}
}

