Skip to content

mever-team/auvire

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AuViRe

Implementation of WACV 2026 paper "AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization" by Christos Koutlis and Symeon Papadopoulos, available at https://arxiv.org/abs/2511.18993.

auvire

With the rapid advancement of sophisticated synthetic audio-visual content, e.g., for subtle malicious manipulations, ensuring the integrity of digital media has become paramount. This work presents a novel approach to temporal localization of deepfakes by leveraging Audio-Visual Speech Representation Reconstruction (AuViRe). Specifically, our approach reconstructs speech representations from one modality (e.g., lip movements) based on the other (e.g., audio waveform). Cross-modal reconstruction is significantly more challenging in manipulated video segments, leading to amplified discrepancies, thereby providing robust discriminative cues for precise temporal forgery localization. AuViRe outperforms the state of the art by +8.9 AP@0.95 on LAV-DF, +9.6 AP@0.5 on AV-Deepfake1M, and +5.1 AUC on an in-the-wild experiment. Code will be publicly available upon acceptance.

Setup

Clone the repo

cd $HOME
git clone https://github.com/mever-team/auvire

Build the environment

# Create and activate with conda
conda create -n auvire python=3.10
conda activate auvire

# Install torch-related dependences
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install sox

# Install AVHubert related dependences (fairseq)
git clone https://github.com/facebookresearch/av_hubert
cd av_hubert
git submodule init
git submodule update
cp -r fairseq ../auvire
cd ../auvire/fairseq
pip install --editable ./
cd ..

# Install rest dependences
pip install -r requirements.txt

⚠️ In fairseq/fairseq/data/indexed_dataset.py replace np.float with float to avoid errors.

AVHubert dependences

Download the checkpoint from https://facebookresearch.github.io/av_hubert/ and place it in src/avhubert/base_lrs3_iter4.pt. Download the src/avhubert/misc/20words_mean_face.npy and src/avhubert/misc/shape_predictor_68_face_landmarks.dat as described in https://colab.research.google.com/drive/1bNXkfpHiVHzXQH8WjGhzQ-fsDxolpUjD#scrollTo=fenUTcC2Disi.

Data

To obtain the data first download LAV-DF as described in https://github.com/ControlNet/LAV-DF and AV-Deepfake1M as described in https://github.com/ControlNet/AV-Deepfake1M. Put all data under data/.

AuViRe expects AVHubert features as input so they should be extraced for all videos:

cp external/inference.py ../av_hubert/avhubert
cd ../av_hubert/avhubert
python inference.py -d lavdf -i <part-index>  # Should be 0-999
python inference.py -d avdeepfake1m -i <part-index>  # Should be 0-999

⚠️ Note that the feature extraction requires the AVHubert environment built as described in https://github.com/facebookresearch/av_hubert.

The ablation analysis expects to also have (Ma et al. 2022) features extracted for LAV-DF, which can be done by following instructions in https://github.com/mever-team/dimodif.

Checkpoints

Download the model checkpoints from https://zenodo.org/records/17698401 and place them in ckpt.

Results

To obtain the core results of the paper run:

python scripts/results.py

Train

To train AuVire on LAV-DF and AV-Deepfake1M run:

python scripts/train.py -d lavdf
python scripts/train.py -d avdeepfake1m

Training logs, in json format, and model checkpoints, in pth format, will be created in ckpt folder.

⚠️ We already provide them so to re-run the training, one should first move them.

Test

To evaluate AuVire on the test set of LAV-DF and the validation set of AV-Deepfake1M (in-dataset and cross-dataset) run:

python scripts/test.py

⚠️ Note for AV-Deepfake1M: The above evaluates on the validation set, not the test set. For evaluation on the test set, get predictions with scripts/predict.py and submit to Codabench (cf. https://deepfakes1m.github.io/2024/evaluation for details):

# Predict with AuViRe trained on LAV-DF
TRAINED_ON="lavdf"
python scripts/predict.py -r "$TRAINED_ON"
zip -j "results/avdeepfake1m_test_predictions/$TRAINED_ON/prediction.zip" \
    "results/avdeepfake1m_test_predictions/$TRAINED_ON/dfd/prediction.txt" \
    "results/avdeepfake1m_test_predictions/$TRAINED_ON/tfl/prediction.json"

# Predict with AuViRe trained on AV-Deepfake1M
TRAINED_ON="avdeepfake1m"
python scripts/predict.py -r "$TRAINED_ON"
zip -j "results/avdeepfake1m_test_predictions/$TRAINED_ON/prediction.zip" \
    "results/avdeepfake1m_test_predictions/$TRAINED_ON/dfd/prediction.txt" \
    "results/avdeepfake1m_test_predictions/$TRAINED_ON/tfl/prediction.json"

The results we got from Codabench are provided in results/avdeepfake1m_test_predictions/lavdf/metrics.json and results/avdeepfake1m_test_predictions/avdeepfake1m/metrics.json.

Real-world analysis

To download the real-world data run:

python real-world-data/download.py

To obtain inference results on the real-world data run:

export PYTHONPATH="$HOME/auvire:$HOME/auvire/fairseq"
python scripts/itw.py -d lavdf -i <video-index>  # Should be 0-370

Robustness

To conduct the robustness analysis run:

export PYTHONPATH="$HOME/auvire:$HOME/auvire/fairseq"
python scripts/robustness.py -m without
python scripts/robustness.py -m visual -i <visual-distortion-type-level-index>  # Should be 0-34
python scripts/robustness.py -m audio -i <audio-distortion-type-level-index>  # Should be 0-19
python -u scripts/robustness.py -m backbone -i <audio-visual-distortion-type-index>  # Should be 0-10

Ablation

To conduct the ablation analysis run:

python scripts/ablation.py -i <ablation-index>  # Should be 0-21

Slurm

We run our experiment inside a slurm cluster. For your convenience we provide our sbatch files in slurm/sh.

Contact

Christos Koutlis (ckoutlis@iti.gr)

About

Implementation of paper "AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization".

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors