Fine-tuning tutorial: train ModelSpanExtractor on a custom domain #30

## Summary

Write a tutorial (notebook or markdown guide) showing how to fine-tune `ModelSpanExtractor` on custom domain data using the existing training pipeline in `verbatim_rag/extractor_models/`.

## Motivation

The `verbatim-rag-modern-bert-v2` model was trained on scientific papers, medical literature, financial tables, and other domains, but users in niche domains (legal contracts, internal documentation, code output) may want a specialized extractor. The training code exists but is undocumented for external users.

## Scope

1. Data format: how to structure `{question, document, gold_spans[]}` pairs as a training dataset
2. Training: `python verbatim_rag/extractor_models/train.py --data_path data/ --output_dir output/`
3. Evaluation: word-F1 on a held-out dev set
4. Loading the fine-tuned model: `ModelSpanExtractor(model_path="./output/checkpoint-best")`
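For step 3, one possible way to compute word-F1 is sketched below. It is an assumption, not the metric as implemented in the repository: it lowercases text, tokenizes on whitespace, and scores the bag-of-words overlap between all predicted spans and all gold spans for an example. The function names `word_f1` and `mean_word_f1`, and the `predicted_spans` key, are hypothetical.

```python
from collections import Counter


def word_f1(predicted_spans, gold_spans):
    """Bag-of-words F1 between predicted and gold spans for one example.

    Assumption: whitespace tokenization and lowercasing; the project's own
    evaluation may tokenize or match spans differently.
    """
    pred_tokens = Counter(w for span in predicted_spans for w in span.lower().split())
    gold_tokens = Counter(w for span in gold_spans for w in span.lower().split())

    # Both empty: the extractor correctly returned nothing.
    if not pred_tokens and not gold_tokens:
        return 1.0

    # Multiset intersection counts each shared word at most min(pred, gold) times.
    overlap = sum((pred_tokens & gold_tokens).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(gold_tokens.values())
    return 2 * precision * recall / (precision + recall)


def mean_word_f1(examples):
    """Average word-F1 over dicts with 'predicted_spans' and 'gold_spans' keys."""
    if not examples:
        return 0.0
    scores = [word_f1(ex["predicted_spans"], ex["gold_spans"]) for ex in examples]
    return sum(scores) / len(scores)
```

Averaging per example (macro) keeps long documents from dominating the score; a tutorial could equally report a corpus-level (micro) figure, as long as it states which one it uses.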

Either a Jupyter notebook in `examples/` or a `docs/fine_tuning.md` guide is acceptable.

## Notes

- The training script CLI already exists at `verbatim_rag/extractor_models/train.py`
- `QADataset`, `QAModel`, and `Trainer` are in `verbatim_rag/extractor_models/`
- A small synthetic dataset example would make the tutorial self-contained
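As a starting point for the synthetic dataset, here is a minimal sketch that generates `{question, document, gold_spans[]}` records and writes them as JSON Lines. The field names come from the Scope section; everything else is an assumption made for illustration: the JSONL layout, the `train.jsonl`/`dev.jsonl` file names, the toy contract sentences, and the helper names. The format that `train.py` actually reads should be checked against `QADataset` before the tutorial relies on it.

```python
import json
import random
from pathlib import Path

# Toy (topic, sentence) pairs for a made-up legal-contract domain.
FACTS = [
    ("termination notice period",
     "Either party may terminate this agreement with 30 days written notice."),
    ("governing law",
     "This agreement is governed by the laws of the State of New York."),
    ("payment terms",
     "Invoices are payable within 45 days of receipt."),
    ("confidentiality period",
     "Confidentiality obligations survive for 3 years after termination."),
]


def make_example(rng):
    """Build one record whose gold span is a verbatim substring of the document."""
    facts = rng.sample(FACTS, k=3)
    topic, answer = facts[0]
    sentences = [sentence for _, sentence in facts]
    rng.shuffle(sentences)  # vary where the answer sits in the document
    return {
        "question": f"What does the contract say about the {topic}?",
        "document": " ".join(sentences),
        "gold_spans": [answer],
    }


def build_dataset(n, seed=0):
    """Generate n reproducible synthetic examples."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


def write_jsonl(examples, path):
    """Write one JSON object per line (assumed layout, not confirmed by train.py)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")


if __name__ == "__main__":
    examples = build_dataset(100)
    write_jsonl(examples[:80], "data/train.jsonl")  # hypothetical file names
    write_jsonl(examples[80:], "data/dev.jsonl")
```

Keeping every gold span a verbatim substring of its document matters for a span extractor, so the tutorial may want to assert that property when loading user data as well.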
