3GPP LLM Evaluation - Project Overview

This project demonstrates that lightweight Large Language Models (LLMs) can achieve strong performance on tasks requiring technical and specialized knowledge, such as interpreting 3GPP telecommunications standards, while being efficient enough for deployment on resource-constrained hardware like edge devices or consumer-grade GPUs (e.g., NVIDIA RTX 3050 Ti). By leveraging a Llama 3.2 model with only 3 billion parameters, enhanced with Fine-Tuning and Retrieval-Augmented Generation (RAG), and quantized to 4 bits, we achieved results comparable to or better than the proprietary GPT-4o-mini model, using significantly less memory (2.768 GB vs. resource-heavy alternatives). This approach, detailed in our paper Lightweight LLMs for 3GPP Specifications: Fine-Tuning, Retrieval-Augmented Generation and Quantization, utilizes the TeleQnA dataset for training and evaluation and the TSpec-LLM dataset for RAG-based context retrieval.

For a streamlined guide to reproduce the core experiments, see the lightweight repository:
https://github.com/josearimatea/3gpp_llm_eval_light

Key Results

The experiments evaluated three models—GPT-4o-mini, Llama 3.2 3B, and a fine-tuned Llama-4bit-Tuned—on 600 multiple-choice questions from TeleQnA, evenly split across 3GPP Releases 17, 18, and others (200 questions each). The focus was on achieving high accuracy with minimal computational overhead, enabling practical deployment on modest hardware. Key findings include:

Llama-4bit-Tuned with RAG: Achieved the highest average accuracy of 76.3%, surpassing GPT-4o-mini (74.0%) despite being quantized to 4 bits, reducing memory usage to 2.768 GB (peak 3.5 GB during RAG inference) compared to larger models requiring over 6 GB unquantized.
Efficiency through Quantization: The 4-bit quantization of Llama 3.2 3B, fine-tuned on 4,000 TeleQnA questions, enabled local execution on an NVIDIA RTX 3050 Ti (3.712 GB max memory), making it viable for resource-constrained environments like edge devices or softwarized networks.
RAG Impact: Boosted accuracy significantly across all models, with Llama 3.2 3B improving from 54.7% (no RAG) to 67.8% (with RAG), a 13.1% gain, showing RAG’s effectiveness in compensating for limited internal knowledge in smaller, quantized models.
Fine-Tuning Benefit: Fine-tuning Llama-4bit-Tuned on telecom-specific data increased its accuracy by up to 14.5% over the untuned Llama 3.2 3B (e.g., 76.3% vs. 67.8% with RAG), particularly excelling on Release 18 (81.5%).

These results, detailed in Lightweight LLMs for 3GPP Specifications: Fine-Tuning, Retrieval-Augmented Generation and Quantization, highlight the feasibility of deploying efficient, open-source LLMs for telecom question-answering without sacrificing performance, even on hardware with limited resources.

Figure of Results

The figure below, located in the repository at Figures/Large_all_releases_600.png, illustrates the average accuracy across all releases for the three models:

Confidence Interval Analysis

To provide a statistical perspective, the figure below shows the accuracy of each model with 95% confidence intervals across the 600 test questions:

This image shows Llama-4bit-Tuned with RAG achieving 76.3% accuracy with a tight confidence interval, indicating reliable performance compared to GPT-4o-mini (74.0%) and Llama 3.2 3B (67.8% with RAG), despite its reduced memory footprint.

Complete Flowchart

The following flowchart outlines the entire workflow, from data preprocessing to evaluation:

This diagram illustrates the process from dataset acquisition (TeleQnA and TSpec-LLM), through fine-tuning and RAG integration, to evaluation on modest hardware.

Flowchart of Tuned Model

Below is a focused flowchart for the fine-tuned Llama-4bit-Tuned model with RAG:

This highlights the fine-tuning of Llama 3.2 3B on 4,000 TeleQnA questions, quantization to 4 bits, RAG integration, and evaluation on 600 test questions.

Cloning and Setup

To explore the project:

Step 1: Clone the Repository

git clone https://github.com/josearimatea/3gpp_llm_evaluation.git
cd 3gpp_llm_evaluation

Step 2: Install Dependencies

pip install torch transformers sentence-transformers faiss-gpu unsloth langchain pandas numpy jupyter

Note: Ensure CUDA compatibility for local GPU use, or use Google Colab.

Reproduction Guide

For a complete step-by-step reproduction guide, refer to the lightweight repository:
https://github.com/josearimatea/3gpp_llm_eval_light

Agents Exploration

This project also investigated AI agents combined with RAG for enhanced query handling. While promising, agent-based results (e.g., 56% accuracy with Llama 3.2 3B and Llama 3.1 70B) did not surpass RAG-only performance, highlighting challenges in prompt tuning and step sequencing. See the report for more details.

Folder Structure

Dataset Folder

This folder is intended to store the processed datasets used in the project. It does not come pre-populated, so you’ll need to download and process the data manually:

Purpose

Used for fine-tuning (4,000 questions from TeleQnA), testing (600 questions), and RAG retrieval (TSpec-LLM chunks).

Download Instructions

TSpec-LLM: Hugging Face (3GPP specs for RAG).
TeleQnA: GitHub (questions for training/evaluation).
Telco-RAG: GitHub (optional reference for RAG setup).

Action:
Create a data/ subfolder and place processed files here (see the lightweight repository for preprocessing steps).

Files Folder

This folder holds intermediate outputs generated during the project’s development:

Purpose

Stores artifacts so you can reuse them without rerunning time-intensive tasks like embedding generation (4 hours) or fine-tuning (42 minutes).

Note:
Some files are specific to the hardware used (e.g., RTX 3050 Ti) and may vary if regenerated on different setups.

Papers Folder

This folder contains documentation and reference materials:

Purpose

Provides detailed methodology, results, and context for the project, serving as a comprehensive guide to the work done.

Key File:
Project Report (English) includes accuracy tables, confidence intervals, and discussions from the paper.

Source Folder

This folder contains all the source code for the project:

Jupyter notebooks and scripts for data preprocessing (e.g., Process_TeleQnA.ipynb, Process_tspec_llm.ipynb), fine-tuning (e.g., Llama_fine_tuning_4000_shortavailability.ipynb), inference (e.g., Inference_RAG_llama_3.2.ipynb), and result compilation (e.g., main_results_600_questions.ipynb).

Purpose

Enables full execution of the project, from data preparation to result generation.

Key File:
/Source/main_results_600_questions.ipynb - Run this to view the main accuracy results (e.g., 76.3% for Llama-4bit-Tuned with RAG).

More Information

Full Report

Provides in-depth analysis and methodology.

Lightweight Reproduction

https://github.com/josearimatea/3gpp_llm_eval_light offers a step-by-step guide to replicate the core experiments.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Dataset		Dataset
Evaluations/RAGAS		Evaluations/RAGAS
Figures		Figures
Files		Files
Models_responses		Models_responses
Papers		Papers
Source		Source
.gitattributes		.gitattributes
.gitignore		.gitignore
Project_report(english_version).pdf		Project_report(english_version).pdf
Project_report(portuguese_version).pdf		Project_report(portuguese_version).pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

3GPP LLM Evaluation - Project Overview

Key Results

Figure of Results

Confidence Interval Analysis

Complete Flowchart

Flowchart of Tuned Model

Cloning and Setup

Step 1: Clone the Repository

Step 2: Install Dependencies

Reproduction Guide

Agents Exploration

Folder Structure

Dataset Folder

Contents

Purpose

Download Instructions

Files Folder

Contents

Purpose

Papers Folder

Contents

Purpose

Source Folder

Contents

Purpose

More Information

Full Report

Lightweight Reproduction

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

3GPP LLM Evaluation - Project Overview

Key Results

Figure of Results

Confidence Interval Analysis

Complete Flowchart

Flowchart of Tuned Model

Cloning and Setup

Step 1: Clone the Repository

Step 2: Install Dependencies

Reproduction Guide

Agents Exploration

Folder Structure

Dataset Folder

Contents

Purpose

Download Instructions

Files Folder

Contents

Purpose

Papers Folder

Contents

Purpose

Source Folder

Contents

Purpose

More Information

Full Report

Lightweight Reproduction

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages