Foundation Models and Biometrics: A Survey and Outlook

🚀 About the Survey

This paper provides an overview of the recent advancements in foundation models and discusses potential applications of these models in the field of biometrics. Foundation models, such as Large Language Models (LLMs), Vision Language Models (VLMs), Audio-Language Models (ALMs), and Large Multi-modal Models (LMMs), are based on large neural networks which are trained with massive amounts of data and enable robust feature extraction for transfer learning. These models allow efficient zero-shot and few-shot learning, achieving state-of-the-art performance in downstream tasks. Biometrics is also an active field of research, which involves various research problems, ranging from robust recognition to security and privacy in biometric systems. In this paper, we present an in-depth analysis of state-of-the-art methodologies regarding foundation multi-modal models, their advancements, and their applicability to biometrics tasks. We also highlight current limitations and provide insights into potential future research directions in the applications of foundation models in biometrics. To our knowledge, this paper is the first survey which investigates the applications of foundation models in biometrics. [ Link to pre-print ] [ Link to paper on IEEE-Xplore (Open Access) ]

The survey is structured as follows for clarity and readability:

Foundation Models: In this section, we review recent advancements in foundation models and list state-of-the-art models. We categorise foundation models into four categories:
- Large Language Models (LLMs)
- Vision Language Models (VLMs)
- Audio-Language Models (ALMs)
- Large Multi-modal Models (LMMs)
Biometric Recognition and Security: In this section, we review the general pipeline of biometric systems. We describe attack points in biometric systems and discuss security and privacy threats. For more information about this topic, we refer readers to Section III of our survey paper.
Applications of Foundation Models in Biometrics: In this section, we review recent papers on the applications of foundation models in biometrics:
- Foundation Models for Biometric Recognition
- Foundation Models for Soft-biometric Detection
- Foundation Models for Deepfake and Forgery Detection
- Foundation Models for Anti-spoofing
- Foundation Models for Synthetic Biometric Generation

✨ Foundation Models

In this section, we review recent advancements in foundation models and list state-of-the-art models. We categorise foundation models into four categories:

- Large Language Models (LLMs)
- Vision Language Models (VLMs)
- Audio-Language Models (ALMs)
- Large Multi-modal Models (LMMs)

Large Language Models (LLMs)

Model	Paper Title	Year	Paper	Code
GPT	Improving language understanding by generative pre-training	2018	link	link
GPT-2	Language models are unsupervised multitask learners	2019	link	link
GPT-3	Language models are few-shot learners	2020	link	link
GPT-4	GPT-4 technical report	2023	link	NA
o1	OpenAI o1 System Card	2024	link	NA
o3-mini	OpenAI o3-mini System Card	2025	link	NA
BERT	BERT: Pretraining of deep bidirectional transformers for language understanding	2018	link	link
T5	Exploring the limits of transfer learning with a unified text-to-text transformer	2020	link	link
FLAN-T5	Scaling instruction-finetuned language models	2024	link	link
OPT	OPT: Open pre-trained transformer language models	2022	link	NA
Falcon	The Falcon Series of Open Language Models	2023	link	link
Mistral	Mistral 7B	2023	link	link
Mixtral	Mixtral of experts	2023	link	link
LLaMA	LLaMA: Open and efficient foundation language models	2023	link	link
LLaMA 2	Llama 2: Open foundation and fine-tuned chat models	2023	link	link
Vicuna	Judging LLM-as-a-judge with MT-bench and chatbot arena	2023	link	link
Gemma	Gemma: Open models based on Gemini research and technology	2024	link	link
Gemma 2	Gemma 2: Improving open language models at a practical size	2024	link	link
Nemotron 4	Nemotron-4 340B technical report	2024	link	link
Qwen	Qwen technical report	2023	link	link
Qwen 2.5	Qwen2.5 technical report	2024	link	link
Qwen 3	Qwen3 technical report	2025	link	link
Phi-4	Phi-4 technical report	2024	link	link
DeepSeek	DeepSeek LLM: Scaling open-source language models with longtermism	2024	link	link
DeepSeek-V2	DeepSeek-v2: A strong, economical, and efficient mixture-of-experts language model	2024	link	link
DeepSeek-V3	DeepSeek-V3 technical report	2024	link	link
DeepSeek-R1	DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning	2025	link	link
ReasonFlux	ReasonFlux: Hierarchical LLM reasoning via scaling thought templates	2025	link	link

Vision Language Models (VLMs)

Model	Paper Title	Year	Paper	Code
DINO	Emerging properties in self-supervised vision transformers	2021	link	link
DINOv2	Dinov2: Learning robust visual features without supervision	2023	link	link
BEiT	Beit: Bert pre-training of image transformers	2021	link	link
CLIP	Learning transferable visual models from natural language supervision	2021	link	link
ALIGN	Scaling up visual and vision-language representation learning with noisy text supervision	2021	link	NA
FLAVA	Flava: A foundational language and vision alignment model	2022	link	link
Florence	Florence: A new foundation model for computer vision	2021	link	NA
OFA	Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework	2022	link	link
Unified-IO	Unified-io: A unified model for vision, language, and multi-modal tasks	2022	link	link
AIM	Scalable Pre-training of Large Autoregressive Image Models	2024	link	link
AIMv2	Multimodal Autoregressive Pre-training of Large Vision Encoders	2024	link	link
BLIP	Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation	2022	link	link
BLIP 2	Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models	2023	link	link
SigLIP	Sigmoid loss for language image pre-training	2023	link	link
SigLIP 2	Siglip 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features	2025	link	link
OpenCLIP	Reproducible Scaling Laws for Contrastive Language-Image Learning	2023	link	link
SAM	Segment anything	2023	link	link
SAM 2	Sam 2: Segment anything in images and videos	2024	link	link
DALL-E	Zero-shot text-to-image generation	2021	link	NA
DALL-E 2	Hierarchical text-conditional image generation with clip latents	2022	link	NA
DALL-E 3	Improving image generation with better captions	2023	link	NA
Stable Diffusion	High-resolution image synthesis with latent diffusion models	2022	link	link
Imagen 3	Imagen 3	2024	link	NA
Edify	Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models	2024	link	NA
LlamaGen	Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation	2024	link	link
GPT-4V	GPT-4V	2024	link	NA
MiniGPT-4	Minigpt-4: Enhancing vision-language understanding with advanced large language models	2023	link	link
Flamingo	Flamingo: a Visual Language Model for Few-Shot Learning	2022	link	NA
LLaVA	Improved baselines with visual instruction tuning	2024	link	link
Video-LLaVA	Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	2023	link	link
Pixtral	Pixtral 12B	2024	link	link
Phi-3.5-Vision	Phi-3 technical report: A highly capable language model locally on your phone	2024	link	link
VILA	Vila: On pre-training for visual language models	2024	link	link
NVILA	NVILA: Efficient frontier visual language models	2024	link	link
VILA-U	Vila-u: a unified foundation model integrating visual understanding and generation	2024	link	link
TokenFlow	Tokenflow: Unified image tokenizer for multimodal understanding and generation	2024	link	link
VAR	Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction	2024	link	link
InstructBLIP	Instructblip: Towards general-purpose vision-language models with instruction tuning	2023	link	link
Yi-VL	Yi: Open foundation models by 01.AI	2024	link	link
Qwen-VL	Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond	2023	link	link
Qwen2-VL	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	2024	link	link
Qwen2.5-VL	Qwen2.5-VL Technical Report	2025	link	link
InternVL	Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks	2024	link	link
InternVL 1.5	How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites	2024	link	link
InternVL3	Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models	2025	link	link
InternVideo2	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	2024	link	link
LLaVA-OneVision	LLaVA-OneVision: Easy Visual Task Transfer	2024	link	link
LLaVA-NeXT	Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models	2024	link	link
CogVLM2	Cogvlm2: Visual language models for image and video understanding	2024	link	link
Bunny	Efficient multimodal learning from data-centric perspective	2024	link	link
Chameleon	Chameleon: Mixed-Modal Early-Fusion Foundation Models	2024	link	link
Apollo	Apollo: An Exploration of Video Understanding in Large Multimodal Models	2024	link	link
DeepSeek-VL	DeepSeek-VL: Towards Real-World Vision-Language Understanding	2024	link	link
DeepSeek-VL2	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	2024	link	link
Emu 3	Emu3: Next-Token Prediction is All You Need	2024	link	link
Janus	Janus: Decoupling visual encoding for unified multimodal understanding and generation	2024	link	link
JanusFlow	JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	2024	link	link
Janus-Pro	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	2025	link	link
Movie Gen	Movie Gen: A Cast of Media Foundation Models	2024	link	NA
Mochi	[blog] Mochi 1: A new SOTA in open text-to-video	2024	link	link
Imagen Video	Imagen video: High definition video generation with diffusion models	2022	link	NA
Make-A-Video	Make-A-Video: Text-to-Video Generation without Text-Video Data	2023	link	link
Tune-A-Video	Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation	2023	link	link
PixelDance	Make pixels dance: High-dynamic video generation	2024	link	link
CogVideoX	CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer	2024	link	link
FlashVideo	FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation	2025	link	link
Goku	Goku: Flow Based Video Generative Foundation Models	2025	link	link
Step-Video-T2V	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	2025	link	link
Sora	[blog] Sora: Creating Video from Text	2024	link	NA

Audio-Language Models (ALMs)

Model	Paper Title	Year	Paper	Code
Wav2Vec 2.0	wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations	2020	link	link
HuBERT	Hubert: Self-supervised speech representation learning by masked prediction of hidden units	2021	link	link
WavLM	Wavlm: Large-scale self-supervised pre-training for full stack speech processing	2022	link	link
Whisper	Robust speech recognition via large-scale weak supervision	2023	link	link
USM	Google usm: Scaling automatic speech recognition beyond 100 languages	2023	link	NA
UniAudio	Uniaudio: An audio foundation model toward universal audio generation	2023	link	link
MERT	Mert: Acoustic music understanding model with large-scale self-supervised training	2023	link	link
CLAP	Clap learning audio concepts from natural language supervision	2023	link	link
SenseVoice	Funaudiollm: Voice understanding and generation foundation models for natural interaction between humans and llms	2024	link	link
CosyVoice	Cosyvoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens	2024	link	link
Vall-E	Neural codec language models are zero-shot text to speech synthesizers	2023	link	NA
SpeechT5	Speecht5: Unified-modal encoder-decoder pre-training for spoken language processing	2021	link	link
SLM	Slm: Bridge the thin gap between speech and text foundation models	2023	link	NA
AudioGPT	Audiogpt: Understanding and generating speech, music, sound, and talking head	2024	link	link
SpeechGPT	SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities	2023	link	link
AudioPaLM	Audiopalm: A large language model that can speak and listen	2023	link	NA
SALMONN	SALMONN: Towards Generic Hearing Abilities for Large Language Models	2024	link	link
WavLLM	Wavllm: Towards robust and adaptive speech large language model	2024	link	link
Pengi	Pengi: An audio language model for audio tasks	2023	link	link
LTU	Listen, Think, and Understand	2024	link	link
GAMA	GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities	2024	link	link
Qwen2-Audio	Qwen2-audio technical report	2024	link	link
SeamlessM4T	SeamlessM4T-Massively Multilingual & Multimodal Machine Translation	2023	link	link
Step-Audio	Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction	2025	link	link
MusicLM	Musiclm: Generating music from text	2023	link	NA
AudioLDM	Audioldm: Text-to-audio generation with latent diffusion models	2023	link	link

Large Multi-modal Models (LMMs)

Model	Paper Title	Year	Paper	Code
AudioCLIP	Audioclip: Extending clip to image, text and audio	2022	link	link
4M	4m: Massively multimodal masked modeling	2023	link	link
ImageBind	Imagebind: One embedding space to bind them all	2023	link	link
PandaGPT	Pandagpt: One model to instruction-follow them all	2023	link	link
NExT-GPT	NExT-GPT: Any-to-Any Multimodal LLM	2023	link	link
Video-LLaMA	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	2023	link	link
Video-LLaMA2	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	2024	link	link
Video-SALMONN	video-salmonn: Speech-enhanced audio-visual large language models	2024	link	link
Gemini Pro	Gemini: a family of highly capable multimodal models	2023	link	NA
LLaMA 3	The llama 3 herd of models	2024	link	link
Qwen2.5-Omni	Qwen2.5-Omni technical report	2025	link	link
GPT-4o	GPT-4o System Card	2024	link	NA

🔥 Applications of Foundation Models in Biometrics

In this section, we review recent papers on the applications of foundation models in biometrics:

- Foundation Models for Biometric Recognition
- Foundation Models for Soft-biometric Detection
- Foundation Models for Deepfake and Forgery Detection
- Foundation Models for Anti-spoofing
- Foundation Models for Synthetic Biometric Generation

Foundation Models for Biometric Recognition

Paper Title	Year	Modality / Task	Paper	Code
Exploring wav2vec 2.0 on speaker verification and language identification	2020	speaker and language identification	link	NA
ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities	2024	face verification, gender detection, age estimation	link	NA
How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability	2024	face verification	link	NA
ChatGPT Meets Iris Biometrics	2024	iris recognition	link	NA
Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition	2025	face verification	link	NA
Benchmarking Foundation Models for Zero-Shot Biometric Tasks	2025	face verification, soft biometric attribute prediction (gender and race), iris recognition, iris presentation attack detection, face morph detection, and face deepfake detection	link	NA
A fine-tuned wav2vec 2.0/hubert benchmark for speech emotion recognition, speaker verification and spoken language understanding	2021	speaker verification	link	link
Iris-SAM: Iris Segmentation Using a Foundation Model	2024	iris segmentation	link	link
SAM-Iris: A SAM-Based Iris Segmentation Algorithm	2025	iris segmentation	link	NA
Froundation: Are foundation models ready for face recognition?	2024	face recognition	link	link
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	2025	audio-visual human video recognition (emotion recognition, expression description, and action understanding)	link	link
FaceLLM: A Multimodal Large Language Model for Face Understanding	2025	face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting	link	link
FaceXBench: Evaluating Multimodal LLMs on Face Understanding	2025	face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting	link	link
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants	2025	facial attributes, age estimation, expression recognition, attack detection, recognition; human attributes, action, spatial/social relations, re-ID	link	link
From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing	2024	face recognition explainability	link	NA
FaceOracle: Chat with a Face Image Oracle	2025	face image quality assessment	link	NA
Unispeech-sat: Universal speech representation learning with speaker aware pre-training	2022	speaker ID, verification, diarization, phoneme recognition, keyword spotting, emotion recognition	link	link
Large-scale self-supervised speech representation learning for automatic speaker verification	2022	speaker verification	link	link
General facial representation learning in a visual-linguistic manner	2022	face parsing, alignment, attribute recognition	link	link
Marlin: Masked autoencoder for facial video representation learning	2023	face attribute recognition, expression recognition, deepfake detection, lip synchronization	link	link
Self-Supervised Facial Representation Learning with Facial Region Awareness	2024	face expression and attribute recognition	link	link
Pose-disentangled contrastive learning for self-supervised facial representation	2023	face expression, face recognition, head pose estimation	link	link
Pros: Facial omni-representation learning via prototype-based self-distillation	2024	face parsing, attribute recognition, emotion detection, landmark detection	link	link
ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces	2024	face expression change, weight change, age change estimation	link	NA
SwinFace: a multi-task transformer for face recognition, expression recognition, age estimation and attribute estimation	2023	face attributes, age estimation, expression recognition, face recognition	link	link
FaceXFormer: A Unified Transformer for Facial Analysis	2024	face parsing, landmarks, head pose estimation, age/gender/race estimation, attribute recognition, expression recognition	link	link
Task-adaptive Q-Face	2024	head pose estimation, face attribute recognition, age estimation, expression recognition	link	NA
Faceptor: A generalist model for face perception	2024	face parsing, landmarks, age and gender estimation, attribute recognition, expression recognition, face recognition	link	link
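
Many of the recognition entries above compare embeddings extracted by a pre-trained model. As a rough illustration only (not the method of any listed paper), the sketch below scores a probe against an enrolled reference with cosine similarity and accepts the match when the score reaches a threshold. The 4-dimensional vectors, the function names, and the threshold of 0.5 are all assumptions made for the example; a real system would use embeddings from an actual model and a threshold tuned on validation data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(probe, reference, threshold=0.5):
    """Return (accepted, score). The default threshold is illustrative, not tuned."""
    score = cosine_similarity(probe, reference)
    return score >= threshold, score


# Toy vectors standing in for embeddings from a frozen foundation model (assumed values).
reference = [0.9, 0.1, 0.3, 0.2]
genuine_probe = [0.8, 0.2, 0.35, 0.15]
impostor_probe = [-0.1, 0.9, -0.4, 0.3]

genuine_ok, genuine_score = verify(genuine_probe, reference)
impostor_ok, impostor_score = verify(impostor_probe, reference)
print(f"genuine:  accepted={genuine_ok}, score={genuine_score:.3f}")
print(f"impostor: accepted={impostor_ok}, score={impostor_score:.3f}")
```

Raising the threshold trades fewer false accepts for more false rejects, so in practice it is chosen for a target operating point rather than fixed in advance.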

Foundation Models for Soft-biometric Detection

Paper Title	Year	Modality / Task	Paper	Code
Robust light-weight facial affective behavior recognition with clip	2024	facial expression classification; action unit detection	link	link
Cliper: A unified vision-language framework for in-the-wild facial expression recognition	2024	face static & dynamic expression recognition	link	link
Emoclip: A vision-language method for zero-shot video facial expression recognition	2024	video facial emotion recognition	link	link
Finecliper: Multi-modal fine-grained clip for dynamic facial expression recognition with adapters	2024	dynamic facial expression recognition	link	NA
Face-mllm: A large face perception model	2024	face age/gender, expression, action units, attributes	link	NA
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces	2024	face 3DMM parameter generation	link	NA
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	2025	face attribute detection	link	link
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning	2025	face expression recognition, action unit detection, facial attribute detection, age estimation, and deepfake detection	link	link
FaceInsight: A Multimodal Large Language Model for Face Perception	2025	face attribute recognition, age/ gender/ race estimation, and expression prediction	link	NA
R1-omni: Explainable omni-multimodal emotion recognition with reinforcement learning	2025	audio-visual emotion recognition with reasoning	link	link
ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities	2024	face gender detection, age estimation	link	NA
How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability	2024	age, gender, ethnicity, hair color	link	NA
ChatGPT Meets Iris Biometrics	2024	iris–face matching; soft-biometrics	link	NA
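
Several entries above build on CLIP-style models, where an attribute can be predicted zero-shot by comparing an image embedding with the embeddings of text prompts. The sketch below is a minimal, assumed illustration of that idea using toy 3-dimensional vectors in place of real encoder outputs; the prompt strings, the function names, and the temperature of 0.07 are choices made for the example and are not taken from any listed paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def zero_shot_classify(image_emb, prompt_embs, temperature=0.07):
    """Pick the prompt whose embedding is most similar to the image embedding.

    Returns (best_label, probabilities), where probabilities come from a
    softmax over temperature-scaled cosine similarities.
    """
    img = normalize(image_emb)
    labels = list(prompt_embs)
    logits = [
        sum(a * b for a, b in zip(img, normalize(prompt_embs[label]))) / temperature
        for label in labels
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Toy embeddings (assumed values); a real pipeline would obtain them from
# the image and text encoders of a vision-language model.
prompts = {
    "a photo of a happy face": [1.0, 0.0, 0.0],
    "a photo of a sad face": [0.0, 1.0, 0.0],
    "a photo of a neutral face": [0.0, 0.0, 1.0],
}
image_embedding = [0.2, 0.9, 0.1]

label, probabilities = zero_shot_classify(image_embedding, prompts)
print(label)
```

No labelled training images are involved: changing the set of attributes only requires changing the prompt list.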

Foundation Models for Deepfake and Forgery Detection

Paper Title	Year	Modality / Task	Paper	Code
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection	2024	face forgery detection	link	link
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection	2024	face forgery detection	link	link
MADation: Face Morphing Attack Detection with Foundation Models	2025	face morph attack detection	link	link
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning	2024	deepfake detection, anti-spoofing, unseen diffusion forgery	link	link
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation	2022	voice spoofing & deepfake detection	link	link
X2-dfd: A framework for explainable and extendable deepfake detection	2024	face deepfake detection	link	link
Ffaa: Multimodal large language model based explainable open-world face forgery analysis assistant	2024	forgery analysis assistant	link	link
Towards general visual-linguistic face forgery detection (v2)	2025	face forgery detection	link	link
Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection	2024	face morph attack detection	link	link
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models	2024	speaker deepfake detection	link	NA
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector	2025	face deepfake detection + description	link	link
Standing on the shoulders of giants: Reprogramming visual-language model for general deepfake detection	2025	face deepfake detection	link	link
Can chatgpt detect deepfakes? a study of using multimodal large language models for media forensics	2024	face deepfake detection	link	link
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception	2024	audio-visual deepfake detection	link	NA
ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models	2025	face morph detection	link	NA

Foundation Models for Anti-spoofing

Paper Title	Year	Modality / Task	Paper	Code
Flip: Cross-domain face anti-spoofing with language guidance	2023	fine‐tune CLIP image encoder for face (FLIP alignment)	link	link
On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection	2023	SSL via masked‐fingerprint prediction with prompt tuning	link	NA
CPL-CLIP: Compound Prompt Learning for Flexible-Modal Face Anti-Spoofing	2024	face anti-spoofing	link	NA
Fm-clip: Flexible modal clip for face anti-spoofing	2024	cross‐modal antispoofing	link	NA
La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection	2024	Unified physical-digital face attack detection	link	NA
Cfpl-fas: Class free prompt learning for generalizable face anti-spoofing	2024	face anti-spoofing	link	NA
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing	2025	face anti-spoofing	link	link
Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing	2025	Multimodal face anti-spoofing	link	link
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models	2025	face anti-spoofing (classification and attack localization)	link	link
Interpretable face anti-spoofing: Enhancing generalization with multimodal large language models	2025	face anti-spoofing	link	NA
Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning	2025	face anti-spoofing (spoofing detection and reasoning)	link	NA
VL-FAS: Domain Generalization via Vision-Language Model For Face Anti-Spoofing	2024	face anti‐spoofing	link	NA
FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection	2025	face anti‐spoofing	link	link
Towards Iris Presentation Attack Detection with Foundation Models	2025	iris anti‐spoofing	link	NA
Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning	2025	face presentation attack detection	link	link
Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection?	2025	face presentation attack detection	link	link
Shield: An evaluation benchmark for face spoofing and forgery detection with multimodal large language models	2025	face anti-spoofing (RGB, infrared, depth) and forgery detection	link	link
ChatGPT Meets Iris Biometrics	2024	iris presentation‐attack detection	link	NA

Foundation Models for Synthetic Biometric Generation

Paper Title	Year	Modality / Task	Paper	Code
Toward open-world text-driven face generation and manipulation via stylegan3	2024	Text-to-face synthesis	link	NA
AnyFace++: A unified framework for free-style text-to-face synthesis and manipulation	2024	Text-guided face editing	link	NA
AnyFace: Free-style text-to-face synthesis and manipulation	2022	Text-to-face generation	link	NA
Towards counterfactual image manipulation via clip	2022	Controllable text-to-face	link	link
Prompt-Based Modality Bridging for Unified Text-to-Face Generation and Manipulation	2024	Prompt-based face synthesis	link	NA
Tecm-clip: Text-based controllable multi-attribute face image manipulation	2022	face attribute / expression editing	link	link
Stylemc: Multi-channel based fast text-guided image generation and manipulation	2022	face multi-attribute editing	link	link
Photoverse: Tuning-free image customization with text-to-image diffusion models	2023	Few-shot personalised face portrait generation	link	link
Fastcomposer: Tuning-free multi-subject image generation with localized attention	2024	fast subject-driven face text-to-image	link	link
Moa: Mixture-of-attention for subject-context disentanglement in personalized image generation	2024	multi-concept face portrait generation	link	NA
Photomaker: Customizing realistic human photos via stacked id embedding	2024	high-fidelity face personalisation	link	link
Face0: Instantaneously conditioning a text-to-image model on a face	2023	Identity-preserving face text-to-image	link	NA
Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models	2023	face instant personalisation	link	link
Dreamidentity: Improved editability for efficient face-identity preserved image generation	2023	face identity-guided generation	link	NA
Portraitbooth: A versatile portrait model for fast identity-preserved personalization	2024	face few-shot portrait generation	link	NA
Instantid: Zero-shot identity-preserving generation in seconds	2024	face real-time personalisation	link	link
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning	2024	face identity-consistent generation	link	link
Facestudio: Put your face everywhere in seconds	2023	face ID & style controllable text-to-image	link	link
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models	2024	identity-aware face editing	link	NA
Arc2face: A foundation model for id-consistent human faces	2024	identity-conditioned face generation	link	link
Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model	2024	General identity-conditioned face generation	link	link
Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance	2025	Identity-conditioned 3D head / avatar generation	link	link
ClipSwap: Towards High Fidelity Face Swapping via Attributes and CLIP-Informed Loss	2024	Face swapping	link	NA

📚 Missing Papers

We will keep updating the git repository and webpage of our survey. Please contact the first author (hatef.otroshi@idiap.ch) or complete the following form to add your paper:

👉 Submit Your Paper

We appreciate your contributions and look forward to keeping this survey comprehensive and up to date!

©️ Citation

If you find this survey useful, please consider citing it:

@article{fmbiometrics2025survey,
  title={Foundation Models and Biometrics: A Survey and Outlook},
  author={Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
  journal={IEEE Transactions on Information Forensics and Security},
  year={2025},
  publisher={IEEE}
}

About

[IEEE T-IFS] Foundation Models and Biometrics: A Survey and Outlook

otroshi.github.io/foundation-models-biometrics/
