Image-to-text training pipelines, plus feature extractors (each input is represented as a point in the feature space) for ViT, ResNet50, and VAE models.
- Chest X-Ray Captions Using Qwen and ReXGradient-160K Dataset Colab
- Qwen2-VL-7B image-text fine-tuning using the LaTeX OCR dataset Colab
- Chest X-Ray Captions Using Transformers and the ReXGradient-160K Dataset Colab
- Some experiments on how to Build an AI Agent from Scratch in Raw Python
- A very simple attention mechanism implementation
- Image Captions Using ViT and GPT2 Transformers: an image-to-text training pipeline.
- Example of text augmentations
- Image Captions with Minimal Details: a Vision Encoder Decoder (ViT + GPT2) model fine-tuned on flickr8k-dataset for the image-to-text task.
- Vision Transformer (ViT): fine-tuning a ViT model using the timm library. After fine-tuning, the model is used as a feature extractor. Colab
- ResNet50 fine-tuning and feature extractor.
- Variational Autoencoder (VAE) feature extractor.
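
For readers who want a feel for the "very simple attention mechanism" item in the list above, here is a minimal sketch of scaled dot-product attention in plain Python. It is not the code from that notebook; the function names `softmax` and `scaled_dot_product_attention` and the list-of-lists input layout are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors (one per key)
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

The query here is closer to the first key, so the output leans toward the first value vector. A real model would use batched tensors and learned projection matrices instead of Python lists.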
