MoCA-Fin — Market-of-Claims Agent for Financial Reasoning

MoCA-Fin (Market-of-Claims Agent) is a multi-agent, code-generating system for financial and tabular reasoning. Instead of free-form multi-agent debate, MoCA-Fin runs a claim market: a question is decomposed into atomic, tradable claims; specialist trader agents buy/sell those claims; the market clears a price and a confidence per claim; and a synthesizer writes an executable Python program from the market-supported claims. A sandboxed executor runs the program to produce the final answer.

Backbone. All headline results use a single Qwen/Qwen3.6-27B backbone served via vLLM (OpenAI-compatible) for every role: the trader panel, the synthesizer, the verifier, and the selector committee. The prompts are identical across datasets — only the data loader changes.

Method at a glance

question + table/context
        │
        ▼
 ┌──────────────────┐   atomic, typed, tradable claims
 │  catalog builder │ ───────────────────────────────────┐
 └──────────────────┘                                     ▼
                                            ┌─────────────────────────────┐
 trader panel (buy/sell orders on claims):  │        claim market          │
   • extractor   (reads cells/values)       │  clears price + confidence   │
   • formula     (proposes the computation) │  per claim (accept/reject)   │
   • accountant  (sign / unit / scale)      └─────────────────────────────┘
   • skeptic     (shorts over-confident claims)            │
                                                            ▼
                                              ┌──────────────────────────┐
                                              │       synthesizer        │  Python program
                                              │ (uses supported claims)  │ ─────────────┐
                                              └──────────────────────────┘              ▼
                                                                              ┌───────────────────┐
        one market-aware repair round on failure  ◀───────────────────────── │ sandboxed executor│
                                                                              └───────────────────┘
 hybrid routing: a baseline proposer + a multi-lens selector committee + a
 conflict arbiter keep the strongest grounded candidate.

Headline results

Exact-match / execution accuracy (%). Bold = best overall; prev. best is the strongest previously published number for that track (see the paper for the full leaderboards and citations).

C1 — Financial Numerical Reasoning

Dataset	MoCA-Fin	Prev. best
FinQA	78.29	74.18 (Fino1-14B)
DocMath-Simplong	70.00	60.00 (GPT-4o)
DocMath-Complong	50.67	42.33 (DeepSeek-V3)
FinanceMath	76.00	67.0 (GPT-4o PoT)

C2 — General Tabular Reasoning

Dataset	MoCA-Fin	Prev. best
TabMWP	96.00	94.70 (CREATOR)
WikiTableQuestions	81.40	80.80 (ARTEMIS-DA)
HiTab	77.27	79.10 (SS-CoT)
MultiHiertt	71.17	56.78 (Fortune RL w/ CS)

C3 — Domain Knowledge QA

Dataset	MoCA-Fin	Prev. best
ESGenius	86.88	83.80 (Gemma-3 12B + IT + RAG)

C4 — Multimodal Chart Reasoning (FinChart-Bench)

Subtask	MoCA-Fin	Prev. best
True/False	97.27	97.86 (o3)
Multiple-Choice	84.85	92.17 (o3)
Open QA	74.65	63.59 (Claude Sonnet 4)
Average	85.59	84.32 (Claude Sonnet 4)

The ten benchmarks

Cat.	Dataset	`--dataset` key	Eval N	Split	Modality
C1	FinQA	`finqa`	1,147	test	text + table
C1	DocMath-Simplong	`dm_simplong`	100	testmini	text + multi-table
C1	DocMath-Complong	`dm_complong`	300	testmini	text + multi-table
C1	FinanceMath	`financemath`	200	validation	text + table
C2	HiTab	`hitab`	1,584	test	hierarchical table
C2	MultiHiertt	`multihiertt`	1,044	dev	text + multi-hier. table
C2	TabMWP	`tabmwp`	1,000	test (1k)	semi-structured table
C2	WikiTableQuestions	`wikitq`	4,344	test	semi-structured table
C3	ESGenius	`esgenius`	1,136	full	text (MCQ)
C4	FinChart-Bench (TF/MC/QA)	`finchart_bench`	2,384 / 2,350 / 2,284	full	chart image

See docs/datasets.md for the per-dataset data paths, sources, and exact run commands.

Installation

git clone <this-repo> mocafin-fin && cd mocafin-fin
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# To self-host the backbone on a GPU node, also:
pip install -r requirements-serve.txt   # vLLM

Python 3.10+ is recommended. The agent itself only needs an OpenAI-compatible endpoint (vLLM or Azure OpenAI); vllm is required only on the machine that serves the model.

1. Download the data

Nothing is redistributed here — the helper fetches each benchmark from its original source (GitHub / Hugging Face).

# everything
python scripts/download_data.py --datasets all
# or a subset
python scripts/download_data.py --datasets finqa tabmwp finchart_bench

Notes:

FinanceMath is a gated HF dataset — accept the terms on its HF page and run huggingface-cli login (or export HF_TOKEN=...) first.
DocMath (dm_simplong + dm_complong) both come from the single yale-nlp/DocMath-R snapshot.

Data lands under data/raw/<dataset>/ exactly where the loaders expect it (mocafin/data.py::COMMON_DATA_PATHS).

2. Serve the backbone with vLLM

On a 4×H100 node:

export LLM_BACKEND=vllm
export VLLM_MODEL_NAME=Qwen/Qwen3.6-27B
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen/Qwen3.6-27B \
    --port 8000 --tensor-parallel-size 4 --max-model-len 65536 \
    --gpu-memory-utilization 0.85 --reasoning-parser qwen3 \
    --trust-remote-code --limit-mm-per-prompt '{"image": 8}'

On SLURM, use scripts/serve_vllm.slurm. The same multimodal server handles both text and the FinChart vision pass. Full notes (cache redirection, long context, smoke tests) are in docs/vllm_setup.md.

3. Point the agent at the server

export LLM_BACKEND=vllm
export VLLM_BASE_URL=http://localhost:8000/v1
export VLLM_API_KEY=EMPTY
export VLLM_MODEL_NAME=Qwen/Qwen3.6-27B

Single-example demo:

python -m mocafin.demo \
  --question "What is the YoY revenue growth from 2022 to 2023?" \
  --table "| Year | Revenue |
|------|--------|
| 2022 | 5200 |
| 2023 | 6100 |"

4. Run an evaluation

python -m mocafin.evaluate \
  --dataset finqa \
  --data_path data/raw/finqa/test.json \
  --output results/finqa.jsonl \
  --workers 16

This writes per-example records to results/finqa.jsonl and a summary to results/finqa_metrics.json (answer accuracy + the three internal diagnostics: code-execution rate, code–answer consistency, mean market confidence). Runs are resumable — re-running the same command continues from the existing file.

FinChart-Bench (multimodal) additionally needs the vision pass:

python -m mocafin.evaluate --dataset finchart_bench \
  --data_path data/raw/finchart_bench_qa_only \
  --vision_transcribe question_conditioned \
  --output results/finchart_qa.jsonl --workers 4

Reproduce everything in one SLURM job

scripts/run_eval.slurm launches vLLM, waits for it, runs the requested benchmarks, and shuts the server down:

mkdir -p logs results
# edit the SBATCH account/partition for your cluster, then:
sbatch --account=YOUR_ACCOUNT scripts/run_eval.slurm                  # all text tracks
sbatch --account=YOUR_ACCOUNT \
  --export=ALL,DATASETS="finqa tabmwp" scripts/run_eval.slurm         # just two

Tunable env vars: MODEL PORT TP MAXLEN WORKERS DATASETS DATA_ROOT OUT CACHE_ROOT.

Configuration

All configuration is via environment variables — no secrets are stored in the source. The most important ones:

Variable	Default	Meaning
`LLM_BACKEND`	`azure`	`vllm` for a local OpenAI-compatible server
`VLLM_BASE_URL`	`http://localhost:8000/v1`	server URL
`VLLM_MODEL_NAME`	`Qwen/Qwen3.6-27B`	served model id
`VLLM_VISION_BASE_URL` / `VLLM_VISION_MODEL_NAME`	= text values	vision endpoint (FinChart)
`MOCAFIN_MAX_CLAIMS`	`10`	max claims per question
`MOCAFIN_REPAIR_ROUNDS`	`1`	market-aware code repair rounds
`MOCAFIN_EXECUTION_TIMEOUT`	`10`	sandbox timeout (s)

The full set (trader role weights, market thresholds, selector/arbiter thresholds) is in mocafin/config.py; the paper settings are the defaults.

To use Azure OpenAI instead of vLLM, set LLM_BACKEND=azure and provide AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_DEPLOYMENT_NAME.

Repository layout

mocafin-release/
├── mocafin/                 # the agent package (paper datasets only)
│   ├── agent.py             # MoCAFinAgent orchestration
│   ├── market.py            # claim market clearing
│   ├── models.py            # claim / market / program dataclasses
│   ├── prompts.py           # role prompts (catalog, traders, synth, verify)
│   ├── verifier.py          # operation-aware structural checks
│   ├── executor.py          # sandboxed Python execution + answer matching
│   ├── llm_client.py        # Azure / vLLM OpenAI-compatible client
│   ├── data.py              # loaders for the 10 paper benchmarks
│   ├── evaluate.py          # evaluation CLI
│   ├── demo.py              # single-example CLI
│   └── vision/              # image → text transcription for FinChart
├── scripts/
│   ├── download_data.py     # fetch the 10 benchmarks from upstream
│   ├── serve_vllm.slurm     # serve Qwen3.6-27B on 4×H100
│   └── run_eval.slurm       # serve + evaluate + teardown, one job
├── docs/
│   ├── vllm_setup.md
│   └── datasets.md
├── requirements.txt
├── requirements-serve.txt   # vLLM (server side only)
└── LICENSE

Citation

soon

License

Code: MIT (see LICENSE). Each benchmark is governed by its own upstream license — consult and comply with it before use. No dataset is redistributed by this repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MoCA-Fin — Market-of-Claims Agent for Financial Reasoning

Method at a glance

Headline results

C1 — Financial Numerical Reasoning

C2 — General Tabular Reasoning

C3 — Domain Knowledge QA

C4 — Multimodal Chart Reasoning (FinChart-Bench)

The ten benchmarks

Installation

1. Download the data

2. Serve the backbone with vLLM

3. Point the agent at the server

4. Run an evaluation

Reproduce everything in one SLURM job

Configuration

Repository layout

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
docs		docs
mocafin		mocafin
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements-serve.txt		requirements-serve.txt
requirements.txt		requirements.txt

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

MoCA-Fin — Market-of-Claims Agent for Financial Reasoning

Method at a glance

Headline results

C1 — Financial Numerical Reasoning

C2 — General Tabular Reasoning

C3 — Domain Knowledge QA

C4 — Multimodal Chart Reasoning (FinChart-Bench)

The ten benchmarks

Installation

1. Download the data

2. Serve the backbone with vLLM

3. Point the agent at the server

4. Run an evaluation

Reproduce everything in one SLURM job

Configuration

Repository layout

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages