otroshi/foundation-models-biometrics

Foundation Models and Biometrics: A Survey and Outlook

🚀 About the Survey

This paper provides an overview of the recent advancements in foundation models and discusses potential applications of these models in the field of biometrics. Foundation models, such as Large Language Models (LLMs), Vision Language Models (VLMs), Audio-Language Models (ALMs), and Large Multi-modal Models (LMMs), are based on large neural networks which are trained with massive amounts of data and enable robust feature extraction for transfer learning. These models allow efficient zero-shot and few-shot learning, achieving state-of-the-art performance in downstream tasks. Biometrics is also an active field of research, which involves various research problems, ranging from robust recognition to security and privacy in biometric systems. In this paper, we present an in-depth analysis of state-of-the-art methodologies regarding foundation multi-modal models, their advancements, and their applicability to biometrics tasks. We also highlight current limitations and provide insights into potential future research directions in the applications of foundation models in biometrics. To our knowledge, this paper is the first survey which investigates the applications of foundation models in biometrics. [ Link to pre-print ] [ Link to paper on IEEE-Xplore (Open Access) ]

The survey is structured as follows for clarity and readability:

  • Foundation Models: In this section, we review recent advancements in foundation models and mention state-of-the-art models. We categorise foundation models into four categories:

    • Large Language Models (LLMs)
    • Vision Language Models (VLMs)
    • Audio-Language Models (ALMs)
    • Large Multi-modal Models (LMMs)
  • Biometric Recognition and Security: In this section, we review the general pipeline of biometric systems. We describe attack points in biometric systems and discuss security and privacy threats. For more information on this topic, we refer readers to Section III of our survey paper.

  • Applications of Foundation Models in Biometrics: In this section, we review recent papers on the applications of foundation models in biometrics:

    • Foundation Models for Biometric Recognition
    • Foundation Models for Soft-biometric Detection
    • Foundation Models for Deepfake and Forgery Detection
    • Foundation Models for Anti-spoofing
    • Foundation Models for Synthetic Biometric Generation

✨ Foundation Models

In this section, we review recent advancements in foundation models and mention state-of-the-art models. We categorise foundation models into four categories:

Large Language Models (LLMs)

Model Paper Title Year Paper Code
GPT Improving language understanding by generative pre-training 2018 link link
GPT-2 Language models are unsupervised multitask learners 2019 link link
GPT-3 Language models are few-shot learners 2020 link link
GPT-4 GPT-4 technical report 2023 link NA
o1 OpenAI o1 System Card 2024 link NA
o3-mini OpenAI o3-mini System Card 2025 link NA
BERT BERT: Pretraining of deep bidirectional transformers for language understanding 2018 link link
T5 Exploring the limits of transfer learning with a unified text-to-text transformer 2020 link link
FLAN-T5 Scaling instruction-finetuned language models 2024 link link
OPT OPT: Open pre-trained transformer language models 2022 link NA
Falcon The Falcon Series of Open Language Models 2023 link link
Mistral Mistral 7B 2023 link link
Mixtral Mixtral of experts 2023 link link
LLaMA LLaMA: Open and efficient foundation language models 2023 link link
LLaMA 2 Llama 2: Open foundation and fine-tuned chat models 2023 link link
Vicuna Judging LLM-as-a-judge with MT-bench and chatbot arena 2023 link link
Gemma Gemma: Open models based on Gemini research and technology 2024 link link
Gemma 2 Gemma 2: Improving open language models at a practical size 2024 link link
Nemotron 4 Nemotron-4 340B technical report 2024 link link
Qwen Qwen technical report 2023 link link
Qwen 2.5 Qwen2.5 technical report 2024 link link
Qwen 3 Qwen3 technical report 2025 link link
Phi-4 Phi-4 technical report 2024 link link
DeepSeek DeepSeek LLM: Scaling open-source language models with longtermism 2024 link link
DeepSeek-V2 DeepSeek-v2: A strong, economical, and efficient mixture-of-experts language model 2024 link link
DeepSeek-V3 DeepSeek-V3 technical report 2024 link link
DeepSeek-R1 DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning 2025 link link
ReasonFlux ReasonFlux: Hierarchical LLM reasoning via scaling thought templates 2025 link link

Vision Language Models (VLMs)

Model Paper Title Year Paper Code
DINO Emerging properties in self-supervised vision transformers 2021 link link
DINOv2 Dinov2: Learning robust visual features without supervision 2023 link link
BEiT Beit: Bert pre-training of image transformers 2021 link link
CLIP Learning transferable visual models from natural language supervision 2021 link link
ALIGN Scaling up visual and vision-language representation learning with noisy text supervision 2021 link NA
FLAVA Flava: A foundational language and vision alignment model 2022 link link
Florence Florence: A new foundation model for computer vision 2021 link NA
OFA Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework 2022 link link
Unified-IO Unified-io: A unified model for vision, language, and multi-modal tasks 2022 link link
AIM Scalable Pre-training of Large Autoregressive Image Models 2024 link link
AIMv2 Multimodal Autoregressive Pre-training of Large Vision Encoders 2024 link link
BLIP Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation 2022 link link
BLIP 2 Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models 2023 link link
SigLIP Sigmoid loss for language image pre-training 2023 link link
SigLIP 2 Siglip 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features 2025 link link
OpenCLIP Reproducible Scaling Laws for Contrastive Language-Image Learning 2023 link link
SAM Segment anything 2023 link link
SAM 2 Sam 2: Segment anything in images and videos 2024 link link
DALL-E Zero-shot text-to-image generation 2021 link NA
DALL-E 2 Hierarchical text-conditional image generation with clip latents 2022 link NA
DALL-E 3 Improving image generation with better captions 2023 link NA
Stable Diffusion High-resolution image synthesis with latent diffusion models 2022 link link
Imagen 3 Imagen 3 2024 link NA
Edify Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models 2024 link NA
LlamaGen Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation 2024 link link
GPT-4V GPT-4V 2024 link NA
MiniGPT-4 Minigpt-4: Enhancing vision-language understanding with advanced large language models 2023 link link
Flamingo Flamingo: a Visual Language Model for Few-Shot Learning 2022 link NA
LLaVA Improved baselines with visual instruction tuning 2024 link link
Video-LLaVA Video-LLaVA: Learning United Visual Representation by Alignment Before Projection 2023 link link
Pixtral Pixtral 12B 2024 link link
Phi-3.5-Vision Phi-3 technical report: A highly capable language model locally on your phone 2024 link link
VILA Vila: On pre-training for visual language models 2024 link link
NVILA NVILA: Efficient frontier visual language models 2024 link link
VILA-U Vila-u: a unified foundation model integrating visual understanding and generation 2024 link link
TokenFlow Tokenflow: Unified image tokenizer for multimodal understanding and generation 2024 link link
VAR Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction 2024 link link
InstructBLIP Instructblip: Towards general-purpose vision-language models with instruction tuning 2023 link link
Yi-VL Yi: Open foundation models by 01.AI 2024 link link
Qwen-VL Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond 2023 link link
Qwen2-VL Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution 2024 link link
Qwen2.5-VL Qwen2.5-VL Technical Report 2025 link link
InternVL Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks 2024 link link
InternVL 1.5 How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites 2024 link link
InternVL3 Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models 2025 link link
InternVideo2 InternVideo2: Scaling Foundation Models for Multimodal Video Understanding 2024 link link
LLaVA-OneVision LLaVA-OneVision: Easy Visual Task Transfer 2024 link link
LLaVA-NeXT Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models 2024 link link
CogVLM2 Cogvlm2: Visual language models for image and video understanding 2024 link link
Bunny Efficient multimodal learning from data-centric perspective 2024 link link
Chameleon Chameleon: Mixed-Modal Early-Fusion Foundation Models 2024 link link
Apollo Apollo: An Exploration of Video Understanding in Large Multimodal Models 2024 link link
DeepSeek-VL DeepSeek-VL: Towards Real-World Vision-Language Understanding 2024 link link
DeepSeek-VL2 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding 2024 link link
Emu 3 Emu3: Next-Token Prediction is All You Need 2024 link link
Janus Janus: Decoupling visual encoding for unified multimodal understanding and generation 2024 link link
JanusFlow JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation 2024 link link
Janus-Pro Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling 2025 link link
Movie Gen Movie Gen: A Cast of Media Foundation Models 2024 link NA
Mochi [blog] Mochi 1: A new SOTA in open text-to-video 2024 link link
Imagen Video Imagen video: High definition video generation with diffusion models 2022 link NA
Make-A-Video Make-A-Video: Text-to-Video Generation without Text-Video Data 2023 link link
Tune-A-Video Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation 2023 link link
PixelDance Make pixels dance: High-dynamic video generation 2024 link link
CogVideoX CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer 2024 link link
FlashVideo FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation 2025 link link
Goku Goku: Flow Based Video Generative Foundation Models 2025 link link
Step-Video-T2V Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model 2025 link link
Sora [blog] Sora: Creating Video from Text 2024 link NA

Audio-Language Models (ALMs)

Model Paper Title Year Paper Code
Wav2Vec 2.0 wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations 2020 link link
HuBERT Hubert: Self-supervised speech representation learning by masked prediction of hidden units 2021 link link
WavLM Wavlm: Large-scale self-supervised pre-training for full stack speech processing 2022 link link
Whisper Robust speech recognition via large-scale weak supervision 2023 link link
USM Google usm: Scaling automatic speech recognition beyond 100 languages 2023 link NA
UniAudio Uniaudio: An audio foundation model toward universal audio generation 2023 link link
MERT Mert: Acoustic music understanding model with large-scale self-supervised training 2023 link link
CLAP Clap learning audio concepts from natural language supervision 2023 link link
SenseVoice Funaudiollm: Voice understanding and generation foundation models for natural interaction between humans and llms 2024 link link
CosyVoice Cosyvoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens 2024 link link
Vall-E Neural codec language models are zero-shot text to speech synthesizers 2023 link NA
SpeechT5 Speecht5: Unified-modal encoder-decoder pre-training for spoken language processing 2021 link link
SLM Slm: Bridge the thin gap between speech and text foundation models 2023 link NA
AudioGPT Audiogpt: Understanding and generating speech, music, sound, and talking head 2024 link link
SpeechGPT SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities 2023 link link
AudioPaLM Audiopalm: A large language model that can speak and listen 2023 link NA
SALMONN SALMONN: Towards Generic Hearing Abilities for Large Language Models 2024 link link
WavLLM Wavllm: Towards robust and adaptive speech large language model 2024 link link
Pengi Pengi: An audio language model for audio tasks 2023 link link
LTU Listen, Think, and Understand 2024 link link
GAMA GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities 2024 link link
Qwen2-Audio Qwen2-audio technical report 2024 link link
SeamlessM4T SeamlessM4T-Massively Multilingual & Multimodal Machine Translation 2023 link link
Step-Audio Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction 2025 link link
MusicLM Musiclm: Generating music from text 2023 link NA
AudioLDM Audioldm: Text-to-audio generation with latent diffusion models 2023 link link

Large Multi-modal Models (LMMs)

Model Paper Title Year Paper Code
AudioCLIP Audioclip: Extending clip to image, text and audio 2022 link link
4M 4m: Massively multimodal masked modeling 2023 link link
ImageBind Imagebind: One embedding space to bind them all 2023 link link
PandaGPT Pandagpt: One model to instruction-follow them all 2023 link link
NExT-GPT NExT-GPT: Any-to-Any Multimodal LLM 2023 link link
Video-LLaMA Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding 2023 link link
Video-LLaMA2 VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs 2024 link link
Video-SALMONN video-salmonn: Speech-enhanced audio-visual large language models 2024 link link
Gemini Pro Gemini: a family of highly capable multimodal models 2023 link NA
LLaMA 3 The llama 3 herd of models 2024 link link
Qwen2.5-Omni Qwen2.5-Omni technical report 2025 link link
GPT-4o GPT-4o System Card 2024 link NA

🔥 Applications of Foundation Models in Biometrics

In this section, we review recent papers on the applications of foundation models in biometrics:

Foundation Models for Biometric Recognition

Paper Title Year Modality / Task Paper Code
Exploring wav2vec 2.0 on speaker verification and language identification 2020 speaker and language identification link NA
ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities 2024 face verification, gender detection, age estimation link NA
How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability 2024 face verification link NA
ChatGPT Meets Iris Biometrics 2024 iris recognition link NA
Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition 2025 face verification link NA
Benchmarking Foundation Models for Zero-Shot Biometric Tasks 2025 face verification, soft biometric attribute prediction (gender and race), iris recognition, iris presentation attack detection, face morph detection, and face deepfake detection link NA
A fine-tuned wav2vec 2.0/hubert benchmark for speech emotion recognition, speaker verification and spoken language understanding 2021 speaker verification link link
Iris-SAM: Iris Segmentation Using a Foundation Model 2024 iris segmentation link link
SAM-Iris: A SAM-Based Iris Segmentation Algorithm 2025 iris segmentation link NA
Froundation: Are foundation models ready for face recognition? 2024 face recognition link link
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding 2025 audio-visual human video recognition (emotion recognition, expression description, and action understanding) link link
FaceLLM: A Multimodal Large Language Model for Face Understanding 2025 face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting link link
FaceXBench: Evaluating Multimodal LLMs on Face Understanding 2025 face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting link link
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants 2025 facial attributes, age estimation, expression recognition, attack detection, recognition; human attributes, action, spatial/social relations, re-ID link link
From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing 2024 face recognition explainability link NA
FaceOracle: Chat with a Face Image Oracle 2025 face image quality assessment link NA
Unispeech-sat: Universal speech representation learning with speaker aware pre-training 2022 speaker ID, verification, diarization, phoneme recognition, keyword spotting, emotion recognition link link
Large-scale self-supervised speech representation learning for automatic speaker verification 2022 speaker verification link link
General facial representation learning in a visual-linguistic manner 2022 face parsing, alignment, attribute recognition link link
Marlin: Masked autoencoder for facial video representation learning 2023 face attribute recognition, expression recognition, deepfake detection, lip synchronization link link
Self-Supervised Facial Representation Learning with Facial Region Awareness 2024 face expression and attribute recognition link link
Pose-disentangled contrastive learning for self-supervised facial representation 2023 face expression, face recognition, head pose estimation link link
Pros: Facial omni-representation learning via prototype-based self-distillation 2024 face parsing, attribute recognition, emotion detection, landmark detection link link
ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces 2024 face expression change, weight change, age change estimation link NA
SwinFace: a multi-task transformer for face recognition, expression recognition, age estimation and attribute estimation 2023 face attributes, age estimation, expression recognition, face recognition link link
FaceXFormer: A Unified Transformer for Facial Analysis 2024 face parsing, landmarks, head pose estimation, age/gender/race estimation, attribute recognition, expression recognition link link
Task-adaptive Q-Face 2024 head pose estimation, face attribute recognition, age estimation, expression recognition link NA
Faceptor: A generalist model for face perception 2024 face parsing, landmarks, age and gender estimation, attribute recognition, expression recognition, face recognition link link
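
Several of the verification tasks listed above reduce, at inference time, to comparing two feature vectors extracted by a pretrained model. The following is a minimal sketch of that comparison step only, assuming cosine similarity with a fixed decision threshold. The embeddings, their dimension, and the threshold value are hypothetical and are not taken from any listed paper; in practice the embeddings would come from a foundation model and the threshold would be tuned on a validation set.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(probe: Sequence[float], reference: Sequence[float],
           threshold: float = 0.5) -> bool:
    """Accept the identity claim if the similarity reaches the threshold.

    The default threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(probe, reference) >= threshold


# Toy 4-dimensional embeddings (hypothetical values, not produced by a real model).
enrolled = [0.9, 0.1, 0.3, 0.2]
same_person = [0.8, 0.2, 0.35, 0.1]
other_person = [-0.1, 0.9, -0.4, 0.3]

print(verify(enrolled, same_person))   # similarity is high, claim accepted
print(verify(enrolled, other_person))  # similarity is negative, claim rejected
```

Raising the threshold lowers false accepts at the cost of more false rejects, which is why the operating point is normally chosen per application rather than fixed in code.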

Foundation Models for Soft-biometric Detection

Paper Title Year Modality / Task Paper Code
Robust light-weight facial affective behavior recognition with clip 2024 facial expression classification; action unit detection link link
Cliper: A unified vision-language framework for in-the-wild facial expression recognition 2024 face static & dynamic expression recognition link link
Emoclip: A vision-language method for zero-shot video facial expression recognition 2024 video facial emotion recognition link link
Finecliper: Multi-modal fine-grained clip for dynamic facial expression recognition with adapters 2024 dynamic facial expression recognition link NA
Face-mllm: A large face perception model 2024 face age/gender, expression, action units, attributes link NA
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces 2024 face 3DMM parameter generation link NA
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs 2025 face attribute detection link link
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning 2025 face expression recognition, action unit detection, facial attribute detection, age estimation, and deepfake detection link link
FaceInsight: A Multimodal Large Language Model for Face Perception 2025 face attribute recognition, age/ gender/ race estimation, and expression prediction link NA
R1-omni: Explainable omni-multimodal emotion recognition with reinforcement learning 2025 audio-visual emotion recognition with reasoning link link
ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities 2024 face gender detection, age estimation link NA
How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability 2024 age, gender, ethnicity, hair color link NA
ChatGPT Meets Iris Biometrics 2024 iris–face matching; soft-biometrics link NA
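
Several of the CLIP-based entries above rely on zero-shot prediction: an image embedding is compared against the embeddings of a few text prompts, and the closest prompt gives the predicted label. The sketch below illustrates that scoring step under stated assumptions. The prompt texts, the 3-dimensional vectors, and the temperature value are hypothetical placeholders; a real system would obtain both kinds of embedding from a vision-language model's image and text encoders.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_scores(image_embedding: Sequence[float],
                     prompt_embeddings: Dict[str, Sequence[float]],
                     temperature: float = 0.07) -> Dict[str, float]:
    """Softmax over temperature-scaled cosine similarities, one score per prompt.

    The temperature default is an assumption chosen for illustration.
    """
    labels = list(prompt_embeddings)
    logits = [cosine_similarity(image_embedding, prompt_embeddings[label]) / temperature
              for label in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Hypothetical prompt and image embeddings (not produced by a real model).
prompts = {
    "a photo of a smiling face": [0.7, 0.6, 0.1],
    "a photo of a neutral face": [0.1, 0.2, 0.9],
}
image = [0.6, 0.7, 0.2]

scores = zero_shot_scores(image, prompts)
predicted = max(scores, key=scores.get)
print(predicted)
```

Because no classifier is trained, changing the label set only requires changing the prompts, which is what makes this setup attractive for soft-biometric attributes with few labelled examples.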

Foundation Models for Deepfake and Forgery Detection

Paper Title Year Modality / Task Paper Code
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection 2024 face forgery detection link link
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection 2024 face forgery detection link link
MADation: Face Morphing Attack Detection with Foundation Models 2025 face morph attack detection link link
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning 2024 deepfake detection, anti-spoofing, unseen diffusion forgery link link
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation 2022 voice spoofing & deepfake detection link link
X2-dfd: A framework for explainable and extendable deepfake detection 2024 face deepfake detection link link
Ffaa: Multimodal large language model based explainable open-world face forgery analysis assistant 2024 forgery analysis assistant link link
Towards general visual-linguistic face forgery detection (v2) 2025 face forgery detection link link
Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection 2024 face morph attack detection link link
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models 2024 speaker deepfake detection link NA
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector 2025 face deepfake detection + description link link
Standing on the shoulders of giants: Reprogramming visual-language model for general deepfake detection 2025 face deepfake detection link link
Can chatgpt detect deepfakes? a study of using multimodal large language models for media forensics 2024 face deepfake detection link link
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception 2024 audio-visual deepfake detection link NA
ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models 2025 face morph detection link NA

Foundation Models for Anti-spoofing

Paper Title Year Modality / Task Paper Code
Flip: Cross-domain face anti-spoofing with language guidance 2023 fine‐tune CLIP image encoder for face (FLIP alignment) link link
On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection 2023 SSL via masked‐fingerprint prediction with prompt tuning link NA
CPL-CLIP: Compound Prompt Learning for Flexible-Modal Face Anti-Spoofing 2024 face anti-spoofing link NA
Fm-clip: Flexible modal clip for face anti-spoofing 2024 cross‐modal antispoofing link NA
La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection 2024 Unified physical-digital face attack detection link NA
Cfpl-fas: Class free prompt learning for generalizable face anti-spoofing 2024 face anti-spoofing link NA
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing 2025 face anti-spoofing link link
Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing 2025 Multimodal face anti-spoofing link link
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models 2025 face anti-spoofing (classification and attack localization) link link
Interpretable face anti-spoofing: Enhancing generalization with multimodal large language models 2025 face anti-spoofing link NA
Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning 2025 face anti-spoofing (spoofing detection and reasoning) link NA
VL-FAS: Domain Generalization via Vision-Language Model For Face Anti-Spoofing 2024 face anti‐spoofing link NA
FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection 2025 face anti‐spoofing link link
Towards Iris Presentation Attack Detection with Foundation Models 2025 iris anti‐spoofing link NA
Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning 2025 face presentation attack detection link link
Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection? 2025 face presentation attack detection link link
Shield: An evaluation benchmark for face spoofing and forgery detection with multimodal large language models 2025 face anti-spoofing (RGB, infrared, depth) and forgery detection link link
ChatGPT Meets Iris Biometrics 2024 iris presentation‐attack detection link NA

Foundation Models for Synthetic Biometric Generation

Paper Title Year Modality / Task Paper Code
Toward open-world text-driven face generation and manipulation via stylegan3 2024 Text-to-face synthesis link NA
AnyFace++: A unified framework for free-style text-to-face synthesis and manipulation 2024 Text-guided face editing link NA
AnyFace: Free-style text-to-face synthesis and manipulation 2022 Text-to-face generation link NA
Towards counterfactual image manipulation via clip 2022 Controllable text-to-face link link
Prompt-Based Modality Bridging for Unified Text-to-Face Generation and Manipulation 2024 Prompt-based face synthesis link NA
Tecm-clip: Text-based controllable multi-attribute face image manipulation 2022 face attribute / expression editing link link
Stylemc: Multi-channel based fast text-guided image generation and manipulation 2022 face multi-attribute editing link link
Photoverse: Tuning-free image customization with text-to-image diffusion models 2023 Few-shot personalised face portrait generation link link
Fastcomposer: Tuning-free multi-subject image generation with localized attention 2024 fast subject-driven face text-to-image link link
Moa: Mixture-of-attention for subject-context disentanglement in personalized image generation 2024 multi-concept face portrait generation link NA
Photomaker: Customizing realistic human photos via stacked id embedding 2024 high-fidelity face personalisation link link
Face0: Instantaneously conditioning a text-to-image model on a face 2023 Identity-preserving face text-to-image link NA
Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models 2023 face instant personalisation link link
Dreamidentity: Improved editability for efficient face-identity preserved image generation 2023 face identity-guided generation link NA
Portraitbooth: A versatile portrait model for fast identity-preserved personalization 2024 face few-shot portrait generation link NA
Instantid: Zero-shot identity-preserving generation in seconds 2024 face real-time personalisation link link
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning 2024 face identity-consistent generation link link
Facestudio: Put your face everywhere in seconds 2023 face ID & style controllable text-to-image link link
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models 2024 identity-aware face editing link NA
Arc2face: A foundation model for id-consistent human faces 2024 identity-conditioned face generation link link
Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model 2024 General identity-conditioned face generation link link
Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance 2025 Identity-conditioned 3D head / avatar generation link link
ClipSwap: Towards High Fidelity Face Swapping via Attributes and CLIP-Informed Loss 2024 Face swapping link NA

📚 Missing Papers

We will keep updating the git repository and webpage of our survey. Please contact the first author (hatef.otroshi@idiap.ch) or complete the following form to add your paper:

👉 Submit Your Paper

We appreciate your contributions and look forward to keeping this survey comprehensive and up to date!

©️ Citation

If you find this survey useful, please consider citing it:

@article{fmbiometrics2025survey,
  title={Foundation Models and Biometrics: A Survey and Outlook},
  author={Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
  journal={IEEE Transactions on Information Forensics and Security},
  year={2025},
  publisher={IEEE}
}