This project demonstrates that lightweight Large Language Models (LLMs) can achieve strong performance on tasks requiring specialized technical knowledge, such as interpreting 3GPP telecommunications standards, while remaining efficient enough for deployment on resource-constrained hardware such as edge devices or consumer-grade GPUs (e.g., an NVIDIA RTX 3050 Ti). By leveraging a Llama 3.2 model with only 3 billion parameters, enhanced with fine-tuning and Retrieval-Augmented Generation (RAG) and quantized to 4 bits, we achieved results comparable to or better than the proprietary GPT-4o-mini model while using significantly less memory (2.768 GB, versus more than 6 GB for the unquantized model). This approach, detailed in our paper Lightweight LLMs for 3GPP Specifications: Fine-Tuning, Retrieval-Augmented Generation and Quantization, uses the TeleQnA dataset for training and evaluation and the TSpec-LLM dataset for RAG-based context retrieval.
For a streamlined guide to reproduce the core experiments, see the lightweight repository:
https://github.com/josearimatea/3gpp_llm_eval_light
The experiments evaluated three models—GPT-4o-mini, Llama 3.2 3B, and a fine-tuned Llama-4bit-Tuned—on 600 multiple-choice questions from TeleQnA, evenly split across 3GPP Releases 17, 18, and others (200 questions each). The focus was on achieving high accuracy with minimal computational overhead, enabling practical deployment on modest hardware. Key findings include:
- Llama-4bit-Tuned with RAG: Achieved the highest average accuracy of 76.3%, surpassing GPT-4o-mini (74.0%) despite being quantized to 4 bits, reducing memory usage to 2.768 GB (peak 3.5 GB during RAG inference) compared to larger models requiring over 6 GB unquantized.
- Efficiency through Quantization: The 4-bit quantization of Llama 3.2 3B, fine-tuned on 4,000 TeleQnA questions, enabled local execution on an NVIDIA RTX 3050 Ti (3.712 GB max memory), making it viable for resource-constrained environments like edge devices or softwarized networks.
- RAG Impact: Boosted accuracy significantly across all models; Llama 3.2 3B improved from 54.7% (no RAG) to 67.8% (with RAG), a 13.1-percentage-point gain, showing RAG's effectiveness in compensating for the limited internal knowledge of smaller, quantized models.
- Fine-Tuning Benefit: Fine-tuning Llama-4bit-Tuned on telecom-specific data increased its accuracy by up to 14.5% over the untuned Llama 3.2 3B (e.g., 76.3% vs. 67.8% with RAG), particularly excelling on Release 18 (81.5%).
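The retrieval side of the RAG pipeline can be sketched as follows. This is a minimal, self-contained illustration using plain NumPy: in the project the chunk embeddings come from a sentence-transformers encoder and are stored in a FAISS index, so the random vectors, dimensions, and counts below are stand-in assumptions, not the repository's actual data.

```python
import numpy as np

# Stand-in embeddings for TSpec-LLM chunks. In the project these come from a
# sentence-transformers encoder and live in a FAISS index; random unit vectors
# keep this sketch self-contained. Dimensions and counts are illustrative.
rng = np.random.default_rng(0)
chunk_embs = rng.standard_normal((100, 384))
chunk_embs /= np.linalg.norm(chunk_embs, axis=1, keepdims=True)

def retrieve(query_emb, k=3):
    """Return indices of the k chunks most similar to the query (cosine)."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = chunk_embs @ q  # dot product on unit vectors = cosine similarity
    return np.argsort(-scores)[:k].tolist()

# Sanity check: a query identical to chunk 7 should rank chunk 7 first.
top = retrieve(chunk_embs[7])
print(top[0])  # → 7
```

The retrieved chunks are then prepended to the question prompt before it is passed to the model, which is how RAG compensates for the limited internal knowledge of the 3B model.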
These results, detailed in Lightweight LLMs for 3GPP Specifications: Fine-Tuning, Retrieval-Augmented Generation and Quantization, highlight the feasibility of deploying efficient, open-source LLMs for telecom question-answering without sacrificing performance, even on hardware with limited resources.
The figure below, located in the repository at Figures/Large_all_releases_600.png, illustrates the average accuracy across all releases for the three models:
To provide a statistical perspective, the figure below shows the accuracy of each model with 95% confidence intervals across the 600 test questions:
This image shows Llama-4bit-Tuned with RAG achieving 76.3% accuracy with a tight confidence interval, indicating reliable performance compared to GPT-4o-mini (74.0%) and Llama 3.2 3B (67.8% with RAG), despite its reduced memory footprint.
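For readers who want to reproduce the interval width by hand: with 600 test questions per model, a 95% confidence interval for an accuracy figure can be approximated as below. This uses the standard Wald (normal-approximation) interval; the paper's exact interval method may differ.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """95% confidence interval for a binomial proportion (Wald approximation)."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Accuracies on the 600-question test set reported above.
for name, acc in [("Llama-4bit-Tuned + RAG", 0.763),
                  ("GPT-4o-mini", 0.740),
                  ("Llama 3.2 3B + RAG", 0.678)]:
    lo, hi = wald_ci(acc, 600)
    print(f"{name}: {acc:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

For 76.3% over 600 questions this gives roughly a ±3.4-percentage-point half-width, consistent with the tight intervals shown in the figure.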
The following flowchart outlines the entire workflow, from data preprocessing to evaluation:
This diagram illustrates the process from dataset acquisition (TeleQnA and TSpec-LLM), through fine-tuning and RAG integration, to evaluation on modest hardware.
Below is a focused flowchart for the fine-tuned Llama-4bit-Tuned model with RAG:
This highlights the fine-tuning of Llama 3.2 3B on 4,000 TeleQnA questions, quantization to 4 bits, RAG integration, and evaluation on 600 test questions.
To explore the project:
git clone https://github.com/josearimatea/3gpp_llm_evaluation.git
cd 3gpp_llm_evaluation
pip install torch transformers sentence-transformers faiss-gpu unsloth langchain pandas numpy jupyter
Note: Ensure CUDA compatibility for local GPU use, or use Google Colab.
For a complete step-by-step reproduction guide, refer to the lightweight repository:
https://github.com/josearimatea/3gpp_llm_eval_light
This project also investigated AI agents combined with RAG for enhanced query handling. While promising, agent-based results (e.g., 56% accuracy with Llama 3.2 3B and Llama 3.1 70B) did not surpass RAG-only performance, highlighting challenges in prompt tuning and step sequencing. See the report for more details.
This folder is intended to store the processed datasets used in the project. It does not come pre-populated, so you’ll need to download and process the data manually:
- Processed versions of TSpec-LLM (3GPP documents in markdown format) and TeleQnA (10,000 telecom questions with answers).
- Used for fine-tuning (4,000 questions from TeleQnA), testing (600 questions), and RAG retrieval (TSpec-LLM chunks).
- TSpec-LLM: Hugging Face (3GPP specs for RAG).
- TeleQnA: GitHub (questions for training/evaluation).
- Telco-RAG: GitHub (optional reference for RAG setup).
Action:
Create a data/ subfolder and place processed files here (see the lightweight repository for preprocessing steps).
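Preprocessing the TSpec-LLM markdown files for RAG typically involves splitting them into overlapping chunks before embedding. The sketch below shows the general idea; the chunk size and overlap values are illustrative assumptions, not the paper's exact settings (see the lightweight repository for the real preprocessing steps).

```python
# Hypothetical preprocessing step: split a TSpec-LLM markdown document into
# fixed-size, overlapping character chunks before embedding. The size and
# overlap values here are illustrative, not the paper's exact settings.
def chunk_text(text, size=500, overlap=100):
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "x" * 1200  # placeholder for one markdown spec file
parts = chunk_text(doc)
print([len(p) for p in parts])  # → [500, 500, 400]
```

Overlap between consecutive chunks helps retrieval when a relevant passage straddles a chunk boundary.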
This folder holds intermediate outputs generated during the project’s development:
- Model weights (e.g., fine-tuned Llama-4bit-Tuned), embeddings (e.g., Faiss indices for RAG), and logs from training/inference runs.
- Stores artifacts so you can reuse them without rerunning time-intensive tasks like embedding generation (4 hours) or fine-tuning (42 minutes).
Note:
Some files are specific to the hardware used (e.g., RTX 3050 Ti) and may vary if regenerated on different setups.
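The reuse pattern for these artifacts is simple load-or-compute caching. The sketch below illustrates it for embeddings; the function name and file path are illustrative, not the repository's actual API.

```python
import os
import tempfile
import numpy as np

def cached_embeddings(path, compute_fn):
    """Load embeddings from `path` if present; otherwise compute and save them.
    Names here are illustrative, not the repository's actual API."""
    if os.path.exists(path):
        return np.load(path)
    embs = compute_fn()
    np.save(path, embs)
    return embs

# Demo: the second call loads the cached file instead of recomputing,
# which is what saves the ~4-hour embedding-generation step on reruns.
path = os.path.join(tempfile.mkdtemp(), "tspec_embeddings.npy")
first = cached_embeddings(path, lambda: np.ones((2, 3)))
second = cached_embeddings(path, lambda: np.zeros((2, 3)))
print((second == 1).all())  # → True
```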
This folder contains documentation and reference materials:
- The project report and related research papers used as inspiration (e.g., TeleQnA and Telco-RAG articles).
- Provides detailed methodology, results, and context for the project, serving as a comprehensive guide to the work done.
Key File:
Project Report (English) includes accuracy tables, confidence intervals, and discussions from the paper.
This folder contains all the source code for the project:
- Jupyter notebooks and scripts for data preprocessing (e.g., Process_TeleQnA.ipynb, Process_tspec_llm.ipynb), fine-tuning (e.g., Llama_fine_tuning_4000_shortavailability.ipynb), inference (e.g., Inference_RAG_llama_3.2.ipynb), and result compilation (e.g., main_results_600_questions.ipynb).
- Enables full execution of the project, from data preparation to result generation.
Key File:
/Source/main_results_600_questions.ipynb - Run this to view the main accuracy results (e.g., 76.3% for Llama-4bit-Tuned with RAG).
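The final scoring step in that notebook amounts to comparing predicted answer options against the gold answers. A minimal sketch of that computation, with hypothetical data structures (the notebook's real ones may differ):

```python
# Hypothetical shape of the final scoring step: compare predicted option
# letters against the gold answers for the 600-question test set.
def accuracy(preds, golds):
    assert len(preds) == len(golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

print(accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # → 0.75
```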
- Provides in-depth analysis and methodology.
- https://github.com/josearimatea/3gpp_llm_eval_light offers a step-by-step guide to replicate the core experiments.