🚀 GPT-2 Inference Server (Google Colab)

In this project, I serve the Hugging Face GPT-2 model for inference with FastAPI, tested on Google Colab.

Run this notebook to:
- Load a GPT-2 model using the Hugging Face `transformers` library
- Expose a `/generate` endpoint for single-prompt inference
- Expose a `/batch_generate` endpoint for multi-prompt batch inference
- Test the FastAPI server over a public ngrok URL
- Build your own prototypes or demos on top of it
- Experiment with quantization, batching, and latency
🧪 Client-End Testing (Google Colab)

Use this notebook to:
- Send test requests to your public FastAPI endpoint
- Measure tokens generated and latency
- Make both single-prompt and batch inference calls
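A client helper along these lines can time each request and give a rough token count. The ngrok URL is a placeholder, and the response field names match the assumptions in the server sketch, so adjust both to your actual deployment:

```python
# Hypothetical client-side helpers for latency and token measurement.
import time

import requests

API_URL = "https://<your-ngrok-subdomain>.ngrok.io"  # placeholder, replace


def timed_post(url: str, payload: dict, post=requests.post):
    """POST a JSON payload and return (response_json, latency_seconds)."""
    start = time.perf_counter()
    resp = post(url, json=payload)
    latency = time.perf_counter() - start
    resp.raise_for_status()
    return resp.json(), latency


def count_new_tokens(prompt: str, generated: str) -> int:
    # Rough whitespace-based count of text beyond the prompt; for exact
    # numbers the server could return counts from its own tokenizer.
    return len(generated[len(prompt):].split())


# Example usage (requires the server to be running):
# data, latency = timed_post(f"{API_URL}/generate", {"prompt": "Hello"})
# print(latency, count_new_tokens("Hello", data["generated_text"]))
```

Injecting the `post` callable keeps the timing logic testable without a live server.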
Built with:
- ⚡ FastAPI – High-performance Python web framework for building APIs
- 🤗 Hugging Face Transformers – State-of-the-art NLP models
- 🧠 Google Colab – Free cloud notebooks with GPU support
- 🌐 ngrok – Public URLs for localhost APIs