gqgs/llm100kbench

LLM Investment Benchmark

A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.

Overview

This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:

  • Create new portfolios
  • List current holdings and recent context
  • Update portfolios based on model decisions

The model executions and their current context can be seen here.

Automated Weekly Runs

The active model roster is configured in models.json. The weekly GitHub Actions workflow runs the benchmark every Monday, writes model orders under orders/<model>/<date>.json, stores the market snapshot in prices/<date>.csv, updates llm100kbench.db, and regenerates this README's current portfolio section.
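The file layout described above can be sketched as a small path helper. This is only an illustration, not the project's code: the function name `run_artifacts` is hypothetical, and the `YYYY-MM-DD` form of `<date>` is an assumption based on the date format used in this README.

```python
from datetime import date
from pathlib import PurePosixPath


def run_artifacts(model: str, run_date: date) -> dict[str, PurePosixPath]:
    """Return the repository-relative paths one weekly run writes for a model."""
    if run_date.weekday() != 0:  # weekday() returns 0 for Monday
        raise ValueError(f"{run_date} is not a Monday")
    stamp = run_date.isoformat()  # assumed <date> format: YYYY-MM-DD
    return {
        "orders": PurePosixPath("orders") / model / f"{stamp}.json",
        "prices": PurePosixPath("prices") / f"{stamp}.csv",
        "database": PurePosixPath("llm100kbench.db"),
    }


paths = run_artifacts("deepseek", date(2026, 6, 22))
print(paths["orders"])  # orders/deepseek/2026-06-22.json
```

The price snapshot and the database are shared across models, so only the orders path depends on the model name.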

The project is intentionally limited to free API tiers. Models that require paid metered API access or subscriptions are archived instead of being run automatically.

Each weekly run also writes a concise decision log under logs/<model>/<date>.md, including the model used, validation status, per-trade rationale, context, and validation notes when a model response is rejected.
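As a rough illustration of what such a log could contain, the sketch below renders the listed fields to Markdown. The class names, field names, and output layout are all hypothetical, and every value in the example is a placeholder; the real log format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Trade:
    action: str      # e.g. "buy" or "sell"
    ticker: str
    quantity: int
    rationale: str   # per-trade rationale given by the model


@dataclass
class DecisionLog:
    model: str           # benchmark identity, e.g. "chatgpt"
    backend_model: str   # exact backend model ID used for the run
    run_date: str
    valid: bool          # whether the model response passed validation
    trades: list[Trade] = field(default_factory=list)
    context: str = ""
    validation_notes: list[str] = field(default_factory=list)


def render_log(log: DecisionLog) -> str:
    """Render one run's decision log as Markdown text."""
    lines = [
        f"# {log.model} {log.run_date}",
        "",
        f"- Model used: {log.backend_model}",
        f"- Validation status: {'accepted' if log.valid else 'rejected'}",
        "",
        "## Trades",
    ]
    for t in log.trades:
        lines.append(f"- {t.action} {t.quantity} {t.ticker}: {t.rationale}")
    if log.context:
        lines += ["", "## Context", log.context]
    # Validation notes are only written when the response was rejected.
    if not log.valid and log.validation_notes:
        lines += ["", "## Validation notes"]
        lines += [f"- {note}" for note in log.validation_notes]
    return "\n".join(lines) + "\n"


example = DecisionLog(
    model="chatgpt",
    backend_model="example-backend-id",  # placeholder, not a real model ID
    run_date="2026-06-22",
    valid=False,
    trades=[Trade("buy", "AAPL", 1, "placeholder rationale")],
    context="placeholder context",
    validation_notes=["placeholder validation note"],
)
print(render_log(example))
```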

Why?

The primary objective defined for the LLMs is to optimize their portfolio. Doing so requires them to evaluate the risk-reward ratio, formulate cogent assumptions about future market conditions, and draw on tools and on their understanding of human psychology and financial market dynamics.

This benchmark may therefore be a useful proxy for how well LLMs coordinate these efforts.

Notes

  • chatgpt, deepseek, and grok are kept as continuing benchmark identities. Their exact backend model IDs are recorded in each new order's metadata.
  • perplexity is archived for future runs because its API is paid and the free chat UI is not suitable for unattended automation.
  • Claude and other paid-only APIs are not included while the project keeps the free-tier-only restriction.

Project Structure

  • cmd: Contains the main command implementations
    • create: Initialize new portfolios
    • list: Display current holdings and context
    • update: Process investment orders and update holdings
    • stocks: Fetch most recent stock prices

Prompt

The most recent prompt, including its guidelines, can be seen here and here.

Current Portfolio (2026-06-22)

Portfolio Value by Model

pie showData
    "deepseek" : 220011
    "chatgpt" : 124637
    "mistral" : 100000
    "qwen" : 100000
    "gpt-oss" : 96482
    "gemini" : 82272
    "llama" : 46936
Model     Ticker  Value (USD)  Quantity
chatgpt   USD              69        69
chatgpt   AAPL         124568       418
deepseek  AMD            1612         3
deepseek  ASML         192968       100
deepseek  MSFT           2656         7
deepseek  SNPS          22776        50
gemini    AAPL          12516        42
gemini    AMD           10210        19
gemini    ASML           7719         4
gemini    CRDO           5437        20
gemini    GFS            5150        60
gemini    NVDA           6321        30
gemini    NXPI           9398        30
gemini    ON             5108        42
gemini    QCOM          10401        46
gemini    TSLA          10012        25
mistral   USD          100000    100000
llama     USD           37102     37102
llama     AAPL           9834        33
gpt-oss   AAPL          96257       323
gpt-oss   CMCSA           224        10
qwen      USD          100000    100000

Model     Total value (USD)  Change
deepseek             220011
chatgpt              124637
mistral              100000
qwen                 100000
gpt-oss               96482
gemini                82272
llama                 46936
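
The per-model totals follow from summing each model's holdings, which the sketch below reproduces from the rows listed above. Two caveats: the per-holding values are rounded, so a recomputed total can differ from the reported one by 1, and the 100,000 USD starting capital used for the change figure is an assumption inferred from the benchmark's name, not a value stated in this section.

```python
from collections import defaultdict

# (model, ticker, value) rows copied from the holdings table above
HOLDINGS = [
    ("chatgpt", "USD", 69), ("chatgpt", "AAPL", 124568),
    ("deepseek", "AMD", 1612), ("deepseek", "ASML", 192968),
    ("deepseek", "MSFT", 2656), ("deepseek", "SNPS", 22776),
    ("gemini", "AAPL", 12516), ("gemini", "AMD", 10210),
    ("gemini", "ASML", 7719), ("gemini", "CRDO", 5437),
    ("gemini", "GFS", 5150), ("gemini", "NVDA", 6321),
    ("gemini", "NXPI", 9398), ("gemini", "ON", 5108),
    ("gemini", "QCOM", 10401), ("gemini", "TSLA", 10012),
    ("mistral", "USD", 100000),
    ("llama", "USD", 37102), ("llama", "AAPL", 9834),
    ("gpt-oss", "AAPL", 96257), ("gpt-oss", "CMCSA", 224),
    ("qwen", "USD", 100000),
]

STARTING_CAPITAL = 100_000  # assumption: each model started with 100k USD

# Sum the value of every holding per model.
totals: dict[str, int] = defaultdict(int)
for model, _ticker, value in HOLDINGS:
    totals[model] += value

# Print models from highest to lowest total, with change vs. the assumed start.
for model, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    change = (total - STARTING_CAPITAL) / STARTING_CAPITAL
    print(f"{model:<9} {total:>7} {change:+.1%}")
```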

About

LLM 100k portfolio management benchmark
