jahidhasanlinix/gpt-2_inference_optimization
🧠 GPT-2 Inference API Development

In this project, I serve the Hugging Face GPT-2 model for inference through a FastAPI server and test it on Google Colab.


🚀 1. GPT-2 Inference Server (FastAPI + ngrok)

Run this notebook to:

  • Load a GPT-2 model with the Hugging Face transformers library
  • Expose the FastAPI server through an ngrok tunnel and test it

👉 GPT-2 Inference Server Google Colab

πŸ› οΈ Features

  • /generate endpoint for single-prompt inference
  • /batch_generate endpoint for multi-prompt batch inference
  • A starting point for building your own prototypes and demos
  • Hooks for experimenting with quantization, batching, and latency

🧪 2. API Client-side Testing

Use this notebook to:

  • Send test requests to your public FastAPI endpoint
  • Measure latency and the number of tokens generated
  • Exercise both single and batch inference calls

👉 Client-side Testing Google Colab


🙌 Credits

Built with:

  • ⚡ FastAPI – High-performance Python web framework for building APIs
  • 🤗 Hugging Face Transformers – State-of-the-art NLP models
  • 🧠 Google Colab – Free cloud notebooks with GPU support
  • 🌐 ngrok – Public URLs for localhost APIs
