LLM Investment Benchmark

A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.

Overview

This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:

- Create new portfolios
- List current holdings and recent context
- Update portfolios based on model decisions

The model executions and their current context can be seen here.

Automated Weekly Runs

The active model roster is configured in models.json. The weekly GitHub Actions workflow runs the benchmark every Monday, writes model orders under orders/<model>/<date>.json, stores the market snapshot in prices/<date>.csv, updates llm100kbench.db, and regenerates this README's current portfolio section.

The project is intentionally limited to free API tiers. Models that require paid metered API access or subscriptions are archived instead of being run automatically.
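The free-tier rule could be applied to the roster roughly as sketched below. The schema of models.json is not shown in this README, so the `name` and `free_tier` fields, the `ModelEntry` type, and the `splitRoster` function are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelEntry is a hypothetical roster entry; the real models.json schema may differ.
type ModelEntry struct {
	Name     string `json:"name"`
	FreeTier bool   `json:"free_tier"`
}

// splitRoster separates models that can run automatically (free tier)
// from models that would be archived (paid access only).
func splitRoster(raw []byte) (active, archived []string, err error) {
	var entries []ModelEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, nil, fmt.Errorf("parse roster: %w", err)
	}
	for _, e := range entries {
		if e.FreeTier {
			active = append(active, e.Name)
		} else {
			archived = append(archived, e.Name)
		}
	}
	return active, archived, nil
}

func main() {
	// Illustrative roster data, not the project's actual configuration.
	raw := []byte(`[
		{"name": "deepseek", "free_tier": true},
		{"name": "perplexity", "free_tier": false}
	]`)
	active, archived, err := splitRoster(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("active:", active)
	fmt.Println("archived:", archived)
}
```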

Each weekly run also writes a concise decision log under logs/<model>/<date>.md, including the model used, validation status, per-trade rationale, context, and validation notes when a model response is rejected.

Why?

The primary objective defined for the LLMs is to optimize their portfolio. To do so, a model must evaluate the risk-reward ratio, form sound assumptions about future market conditions, and apply tools together with its understanding of human psychology and financial market dynamics.

This benchmark may be a good proxy for how well LLMs coordinate these efforts.

Notes

- chatgpt, deepseek, and grok are kept as continuing benchmark identities. Their exact backend model IDs are recorded in each new order's metadata.
- perplexity is archived and excluded from future runs because its API is paid and the free chat UI is not suitable for unattended automation.
- Claude and other paid-only APIs are not included while the project keeps the free-tier-only restriction.

Project Structure

cmd: Contains the main command implementations
- create: Initialize new portfolios
- list: Display current holdings and context
- update: Process investment orders and update holdings
- stocks: Fetch most recent stock prices
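One way to picture how the four subcommands hang together is a small dispatch table, sketched below. This is an assumption about structure only: the `commands` map and `dispatch` function are hypothetical, and the project's real command wiring under cmd may look quite different.

```go
package main

import "fmt"

// commands maps each subcommand named in the README to its description.
// The dispatch mechanism is an illustrative assumption, not the project's code.
var commands = map[string]string{
	"create": "initialize new portfolios",
	"list":   "display current holdings and context",
	"update": "process investment orders and update holdings",
	"stocks": "fetch most recent stock prices",
}

// dispatch resolves the first argument as a subcommand and reports what it would do.
func dispatch(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("missing subcommand")
	}
	desc, ok := commands[args[0]]
	if !ok {
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
	return fmt.Sprintf("%s: %s", args[0], desc), nil
}

func main() {
	msg, err := dispatch([]string{"list"})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // list: display current holdings and context
}
```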

Prompt

The most recent prompt, with its guidelines, can be seen here and here.

Current Portfolio (2026-06-22)

Portfolio Value by Model

pie showData
    "deepseek" : 220011
    "chatgpt" : 124637
    "mistral" : 100000
    "qwen" : 100000
    "gpt-oss" : 96482
    "gemini" : 82272
    "llama" : 46936

| Model | Ticker | Sum | Quantity |
| --- | --- | --- | --- |
| `chatgpt` | `USD` | 69 | 69 |
| `chatgpt` | `AAPL` | 124568 | 418 |
| `deepseek` | `AMD` | 1612 | 3 |
| `deepseek` | `ASML` | 192968 | 100 |
| `deepseek` | `MSFT` | 2656 | 7 |
| `deepseek` | `SNPS` | 22776 | 50 |
| `gemini` | `AAPL` | 12516 | 42 |
| `gemini` | `AMD` | 10210 | 19 |
| `gemini` | `ASML` | 7719 | 4 |
| `gemini` | `CRDO` | 5437 | 20 |
| `gemini` | `GFS` | 5150 | 60 |
| `gemini` | `NVDA` | 6321 | 30 |
| `gemini` | `NXPI` | 9398 | 30 |
| `gemini` | `ON` | 5108 | 42 |
| `gemini` | `QCOM` | 10401 | 46 |
| `gemini` | `TSLA` | 10012 | 25 |
| `mistral` | `USD` | 100000 | 100000 |
| `llama` | `USD` | 37102 | 37102 |
| `llama` | `AAPL` | 9834 | 33 |
| `gpt-oss` | `AAPL` | 96257 | 323 |
| `gpt-oss` | `CMCSA` | 224 | 10 |
| `qwen` | `USD` | 100000 | 100000 |

| Model | Total Sum | Change |
| --- | --- | --- |
| `deepseek` | 220011 | — |
| `chatgpt` | 124637 | — |
| `mistral` | 100000 | — |
| `qwen` | 100000 | — |
| `gpt-oss` | 96482 | — |
| `gemini` | 82272 | — |
| `llama` | 46936 | — |
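The per-model totals above are the sum of each model's holdings, ranked from highest to lowest. A minimal Go sketch of that aggregation follows, using the chatgpt and llama rows from the holdings table. The `Holding` type and the `totalsByModel` and `ranked` functions are hypothetical names, not the project's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Holding mirrors one row of the holdings table (hypothetical type).
type Holding struct {
	Model  string
	Ticker string
	Sum    int
}

// totalsByModel adds up the Sum column for each model.
func totalsByModel(holdings []Holding) map[string]int {
	totals := make(map[string]int)
	for _, h := range holdings {
		totals[h.Model] += h.Sum
	}
	return totals
}

// ranked returns model names ordered by total, highest first,
// breaking ties alphabetically so the order is deterministic.
func ranked(totals map[string]int) []string {
	names := make([]string, 0, len(totals))
	for name := range totals {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if totals[names[i]] != totals[names[j]] {
			return totals[names[i]] > totals[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	holdings := []Holding{
		{"chatgpt", "USD", 69},
		{"chatgpt", "AAPL", 124568},
		{"llama", "USD", 37102},
		{"llama", "AAPL", 9834},
	}
	totals := totalsByModel(holdings)
	for _, name := range ranked(totals) {
		fmt.Println(name, totals[name])
	}
	// Prints:
	// chatgpt 124637
	// llama 46936
}
```

Summing the displayed rows for deepseek and gpt-oss gives 220012 and 96481, one unit off the listed totals, presumably because the displayed sums are rounded individually.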
