Atlasmind-Lite

A natural language to JQL (Jira Query Language) generator using RAG (Retrieval-Augmented Generation) with pgvector. Supports multiple LLM backends: local Ollama, vLLM (GPU inference server), Groq cloud, Anthropic Claude direct API, and AWS Bedrock-compatible endpoints. Returns structured JSON with a JQL query, a chart specification, and a plain-text answer. A two-stage router answers general questions immediately without touching the JQL pipeline. Try it live: atlasmind.de


Prerequisites

- PostgreSQL with the pgvector extension
- One of the following LLM backends:
  - Ollama running locally with a model loaded (default: qwen2.5:3b-instruct-q4_K_M)
  - A Groq API key (GROQ_API_KEY)
  - A vLLM inference server (VLLM_URL)
  - An Anthropic API key (CLAUDE_API_KEY) for Claude direct
  - An AWS Bedrock-compatible endpoint (CUSTOM_ENDPOINT) plus a bearer token (AWS_BEARER_TOKEN_BEDROCK) for --model bedrock
- Python 3.12+ and uv

Setup

uv sync

One-time Jira field fetch

Run these once against your active Jira profile before starting the server for the first time, or after switching to a new Jira instance:

# Fetch all Jira field metadata and cache locally
uv run python -c "from jira.jira_field_api import fetch_and_save_fields; fetch_and_save_fields()"

# Fetch allowed values for all eligible fields (status, priority, issue types, custom options)
uv run python -c "
import asyncio
from jira.jira_field_api import fetch_and_save_allowed_values
asyncio.run(fetch_and_save_allowed_values())
"

Files are written to data/{domain_slug}/ and used to seed the pgvector tables on next startup. AtlasMind will also fetch them automatically on first run if they are absent.

Set the following environment variables (or rely on the defaults in settings.py):

Variable	Default	Description
`DATABASE_URL`	`postgresql://postgres:postgres@localhost:5432/jql_vectordb`	pgvector connection string
`EMBEDDING_MODEL`	`BAAI/bge-small-en-v1.5`	SentenceTransformer model name
`LLM_BACKEND`	`ollama`	LLM backend: `ollama`, `groq`, `vllm`, `claude`, or `bedrock` (overrides `--model` when set)
`JQL_OLLAMA_URL`	`http://localhost:11434`	Ollama base URL
`JQL_LOCAL_MODEL`	`qwen2.5:3b-instruct-q4_K_M`	Ollama model to use
`JQL_OLLAMA_TIMEOUT`	`120`	Read timeout in seconds for LLM inference
`GROQ_API_KEY`	—	Groq API key (local dev)
`GROQ_API_KEY_OCID`	—	OCI Vault secret OCID for `GROQ_API_KEY` (takes priority over `GROQ_API_KEY`)
`GROQ_MODEL`	`meta-llama/llama-4-scout-17b-16e-instruct`	Groq model name
`JQL_ANNOTATION_FILE`	`data/jira_jql_annotated_queries.md`	Path to JQL annotation file
`MAX_JIRA_RESULTS`	`2000`	Maximum number of Jira issues fetched per query (paginated automatically)
`JQL_MAX_ATTEMPTS`	`4`	Total JQL attempts per query: 1 initial + (`JQL_MAX_ATTEMPTS` − 1) retries on Jira validation errors
`MAX_INTENT_FIELDS`	`5`	Maximum extra fields the LLM may propose per query
`STANDARD_FIELD_IDS`	`key,summary,assignee,priority,issuetype,created,resolutiondate`	Comma-separated list of Jira field IDs always shown in results — override per project or Docker deployment
`VALUE_AUTO_CORRECT_THRESHOLD`	`0.15`	Cosine distance below which the sanitizer silently auto-corrects a bad value to the nearest known allowed value (e.g. typo correction)
`VALUE_HINT_THRESHOLD`	`0.40`	Cosine distance threshold for JQL value correction — bad values within this distance of a known allowed value are flagged
`VALUE_HINT_MAX_CANDIDATES`	`3`	Maximum candidate values surfaced per field for JQL sanitizer corrections
`VALUE_PROMPT_MAX_CANDIDATES`	`3`	Maximum candidate values injected into the retry prompt as hints for the LLM
`EMBEDDING_BATCH_SIZE`	`256`	Batch size for SentenceTransformer encoding during seeding; larger batches usually shorten seeding on CPU/GPU at the cost of more memory
`MAX_VALUES_FOR_EMBEDDING`	`50`	Maximum allowed values embedded per field in `jira_field_values`. High-cardinality fields (versions, components) are capped here; the in-memory exact-match dict always holds all values so casing correction is unaffected
`VLLM_URL`	—	vLLM server base URL (e.g. `http://100.x.x.x:8002`)
`VLLM_FALLBACK`	`ollama`	Backend to use if vLLM is unreachable at startup (`ollama`, `groq`, `claude`, `bedrock`)
`VLLM_TIMEOUT`	`240`	Read timeout in seconds for vLLM inference
`VLLM_MAX_TOKENS`	—	Max tokens for vLLM responses
`VLLM_API_KEY`	—	API key if the vLLM server requires authentication
`CLAUDE_API_KEY`	—	Anthropic API key (local dev); used when `--model claude`
`CLAUDE_API_KEY_OCID`	—	OCI Vault secret OCID for `CLAUDE_API_KEY` (takes priority)
`CLAUDE_MODEL`	`claude-sonnet-4-6`	Anthropic model name
`AWS_BEARER_TOKEN_BEDROCK`	—	Bearer token for the Bedrock-compatible endpoint; used when `--model bedrock`
`CUSTOM_ENDPOINT`	—	Bedrock-compatible API endpoint URL — required for `--model bedrock`
`BEDROCK_REGION`	`custom`	Region name passed to boto3 (gateway may override internally)
`BEDROCK_MODEL`	`claude-sonnet-4.6`	Model ID sent to the Bedrock endpoint
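For reference, a minimal sketch of how such variables can be read with typed fallbacks to the defaults above. The helpers `env_int`, `env_float` and `env_list` are hypothetical and not taken from settings.py; they only illustrate the parsing (for example, STANDARD_FIELD_IDS is a comma-separated string).

```python
import os
from collections.abc import Mapping


def env_int(name: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    # Unset or empty variables fall back to the documented default.
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def env_float(name: str, default: float, env: Mapping[str, str] = os.environ) -> float:
    raw = env.get(name, "").strip()
    return float(raw) if raw else default


def env_list(name: str, default: str, env: Mapping[str, str] = os.environ) -> list[str]:
    # Comma-separated values such as STANDARD_FIELD_IDS; blank items are dropped.
    raw = env.get(name) or default
    return [item.strip() for item in raw.split(",") if item.strip()]


JQL_MAX_ATTEMPTS = env_int("JQL_MAX_ATTEMPTS", 4)
VALUE_HINT_THRESHOLD = env_float("VALUE_HINT_THRESHOLD", 0.40)
STANDARD_FIELD_IDS = env_list(
    "STANDARD_FIELD_IDS",
    "key,summary,assignee,priority,issuetype,created,resolutiondate",
)
```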

Running the app

All modes are accessed through app.py.

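Setting environment variables on Windows: the inline VAR=value command syntax (for example JQL_MAX_ATTEMPTS=5 uv run python app.py --server) works only in Linux/macOS shells. On Windows, use:

CMD: set "JQL_MAX_ATTEMPTS=5" && uv run python app.py --server (the quotes stop CMD from including the space before && in the value)

PowerShell: $env:JQL_MAX_ATTEMPTS="5"; uv run python app.py --server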

Interactive REPL

uv run python app.py --query                    # local Ollama (default)
uv run python app.py --query --model groq       # Groq cloud
uv run python app.py --query --model vllm       # vLLM inference server
uv run python app.py --query --model claude     # Anthropic Claude direct
uv run python app.py --query --model bedrock    # AWS Bedrock-compatible endpoint

Starts a Rich terminal loop with the AtlasMind banner. Type a natural language query and press Enter to get JQL and an answer.

[atlasmind]> list open bugs assigned to me

  Route   : JQL pipeline
  JQL     : assignee = currentUser() AND issuetype = Bug AND status != Done ORDER BY created DESC
  Chart   : {"type": "bar", "x_field": "status", "y_field": "count", "title": "Open bugs by status"}
  Answer  : Open bugs currently assigned to you
  Response time : 2.34s

General questions are answered directly without going through the JQL pipeline:

[atlasmind]> what is the difference between a bug and a task?

  Route   : General answer
  Answer  : A bug represents a defect or unexpected behaviour in the software...
  Response time : 0.81s

REPL commands:

Command	Description
`am help`	Show example queries and command list
`am history`	Show query history for this session
`exit` / `quit` / `q` / `am quit`	Exit the REPL
`Ctrl+C` at prompt	Exit cleanly
`Ctrl+C` during query	Interrupt the current query, return to prompt

Single-shot query

uv run python app.py --query "list open bugs assigned to me"

Runs one query, prints JQL and Answer, then exits. Useful for scripting.

FastAPI server

uv run python app.py --server                             # Ollama backend, port 8000
uv run python app.py --server --model groq --port 9000    # Groq backend, port 9000
uv run python app.py --server --model vllm --port 9000    # vLLM backend, port 9000
uv run python app.py --server --model claude              # Anthropic Claude direct
uv run python app.py --server --model bedrock             # AWS Bedrock-compatible endpoint

Starts the REST API on http://0.0.0.0:8000 by default; use --host and --port to change the bind address and port.

Method	Endpoint	Description
`GET`	`/health`	Liveness check — returns `{"status": "ok"}`
`GET`	`/meta`	Server metadata: active model name, LLM backend, and timeout
`GET`	`/query`	Generate JQL from natural language (query via `q` URL param)
`POST`	`/query`	Same as GET but query in request body (`{"query": "..."}`)
`POST`	`/event`	Client events: `{"event": "cancel", "request_id": "..."}` to abort an in-flight query

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "list open bugs assigned to me"}'

Abridged example response (the full set of fields is described under Response model):

{
  "jql": "assignee = currentUser() AND issuetype = Bug AND status != Done ORDER BY created DESC",
  "chart_spec": {"type": "bar", "x_field": "status", "y_field": "count", "title": "Open bugs by status"},
  "answer": "Open bugs currently assigned to you"
}
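The same POST request from Python, as a sketch using only the standard library. `build_query_request` and `run_query` are hypothetical helpers, the 300-second timeout is an arbitrary choice for slow local models, and the server is assumed to listen on localhost:8000 as in the curl example.

```python
import json
import urllib.request


def build_query_request(query: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    # POST /query expects a JSON body of the form {"query": "..."}.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, base_url: str = "http://localhost:8000") -> dict:
    # Sends the request and decodes the JSON QueryResponse.
    with urllib.request.urlopen(build_query_request(query, base_url), timeout=300) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = run_query("list open bugs assigned to me")
    print(result["jql"])
    print(result["answer"])
```

Keeping the request builder separate lets you inspect the request without a running server.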

Routing overrides

The query router automatically classifies each query as JQL or general. If the router misclassifies a query, or you need to control which Jira search API is used, append a flag to your query:

Flag	Effect
`/jql`	Forces the JQL pipeline regardless of LLM classification
`/general`	Forces the general answer path, skipping the JQL pipeline
`/cloud`	Forces `POST /rest/api/3/search/jql` (Jira Cloud API) for this request
`/server`	Forces `GET /rest/api/2/search` (Jira Server API) for this request

All flags are stripped from the query before it is sent to the LLM, so they do not affect the generated JQL or answer.

/cloud and /server override the search method and path for the current request only — they do not switch the active profile or change the Jira base URL. Use them when the active profile's jira_type is correct but you need to temporarily force a different API version. Flags can be combined:

[atlasmind]> list open issues in KAFKA /cloud
[atlasmind]> project = KAFKA AND status = Open /raw /cloud

Examples:

[atlasmind]> how many states are there in India /general

  Route   : General answer
  Answer  : India has 28 states and 8 union territories...

[atlasmind]> list issues in KAFKA /jql

  Route   : JQL pipeline
  JQL     : project = KAFKA ORDER BY created DESC

Overrides work across all LLM backends (Ollama, Groq, vLLM, Claude, Bedrock).
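A sketch of the flag handling described in this section, assuming flags are whitespace-separated tokens anywhere in the query. `split_route_flags` and `resolve_route` are hypothetical names, and only the four flags from the table are recognised here.

```python
KNOWN_FLAGS = {"/jql", "/general", "/cloud", "/server"}


def split_route_flags(text: str) -> tuple[str, set[str]]:
    # Separate recognised flags from the natural-language query text.
    flags: set[str] = set()
    words: list[str] = []
    for token in text.split():
        if token.lower() in KNOWN_FLAGS:
            flags.add(token.lower())
        else:
            words.append(token)
    return " ".join(words), flags


def resolve_route(flags: set[str], classified: str) -> str:
    # Explicit routing flags win over the router's LLM classification.
    if "/jql" in flags:
        return "jql"
    if "/general" in flags:
        return "general"
    return classified


query, flags = split_route_flags("list open issues in KAFKA /cloud")
print(query, flags)  # list open issues in KAFKA {'/cloud'}
```

Only the cleaned query text would be sent to the LLM; /cloud and /server would be passed on separately to the Jira search step.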

Architecture

Data flow:

- JQL_Embeddings.run() seeds pgvector with (annotation, JQL) pairs parsed from the annotation file.
- Jira_Field_Embeddings.run() seeds pgvector with Jira field metadata (name, type, allowed values); the metadata is auto-fetched from the Jira REST API on first run if the file is absent.
- At query time, QueryRouter makes a single fast LLM call to classify the query:
  - General query → answered immediately; no embeddings or Jira API calls
  - JQL query → full RAG pipeline: encode → similarity search → prompt → LLM → Jira API
- The assembled prompt is split on the ## Available Jira Fields marker and sent to the active LLM. For Claude and Bedrock, the stable system instructions (everything before the marker) are sent as a cached system block. Cache reads are billed at about 10% of the normal input-token price (cache writes cost somewhat more), and an ephemeral cache entry lives for 5 minutes by default, refreshed on each hit. For Groq, the split maps to the OpenAI system / user roles. Ollama and vLLM receive the full prompt as a single string. Actual token counts (including cache hits) are logged from every API response.
- The LLM returns structured JSON with jql, intent_fields, chart_spec, and answer.
- intent_fields (LLM-proposed display columns) are resolved via FieldResolver. Exact name matches are tried first; unknown names fall back to an embedding similarity search against the field metadata vector store, which catches LLM variants like fixVersion → Fix Version/s without any extra LLM call.
- The JQL is validated by JqlSanitizer before execution. Invalid field values are detected by cosine similarity against the jira_field_values vector store and corrected deterministically, so no LLM call is needed for known-value fields.
- The validated JQL is executed against the Jira REST API. On failure, the Jira error is appended to the accumulated retry prompt and the LLM is asked to correct the JQL. Each successive retry carries the full failure history of all prior attempts, so the model sees every error at once. At most JQL_MAX_ATTEMPTS attempts are made in total (default 4). Certain errors are fixed deterministically without an LLM call: invalid field values are stripped, comment IS NOT EMPTY is rewritten to comment ~ '.', and unsupported IS [NOT] EMPTY operators on fields like issueLinkType are stripped inline.
- Token usage (system prompt, field block, examples, and cumulative retry tokens) is tracked per query and returned in token_usage on every response, including error responses.
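The retry behaviour in the data flow above can be sketched as follows, assuming an async `generate` callable that turns a prompt into JQL and an async `execute` callable that raises on a Jira validation error. Both callables and `JiraError` are hypothetical stand-ins; the point is that each failure is appended to one growing prompt, so every attempt sees the full history.

```python
from collections.abc import Awaitable, Callable


class JiraError(Exception):
    """Hypothetical: raised when Jira rejects a JQL query."""


async def run_with_retries(
    prompt: str,
    generate: Callable[[str], Awaitable[str]],
    execute: Callable[[str], Awaitable[dict]],
    max_attempts: int = 4,
) -> tuple[str, dict]:
    history = prompt
    last_error: JiraError | None = None
    for attempt in range(1, max_attempts + 1):
        jql = await generate(history)
        try:
            return jql, await execute(jql)
        except JiraError as exc:
            last_error = exc
            # Accumulate instead of replacing: later attempts see every prior failure.
            history += f"\n\nAttempt {attempt} JQL: {jql}\nJira error: {exc}\nPlease correct the JQL."
    raise RuntimeError(f"JQL still invalid after {max_attempts} attempts") from last_error
```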

Both seeding steps are hash-gated — re-encoding is skipped if the source files have not changed since the last run.

A third vector table (jira_field_values) stores one embedding per (field_id, allowed_value) pair. It is seeded from the jira_allowed_values.json file at startup and used by the JqlSanitizer for pure-DB value correction — no LLM call, no token cost. High-cardinality fields are capped at MAX_VALUES_FOR_EMBEDDING values (default 50) to keep seeding fast; the full value list is still held in-memory for exact-match correction. The seed key encodes the cap value (::cap50) so changing MAX_VALUES_FOR_EMBEDDING automatically triggers a re-seed on next startup without any manual DB intervention.

A fourth vector table (jira_asset_values) stores one embedding per (field_id, label) pair for Jira Assets (formerly Insight) object fields. Asset object labels (e.g. "Sample Domain (ABCD-1234)") come from a different API than standard Jira field options and are stored in a dedicated table. At prompt-assembly time, the closest asset labels to the user query are injected as value hints so the LLM generates the exact label string on the first pass. Asset fields are optional — if jira_assets.json is absent or the Assets API is unreachable, the server starts normally without asset hints; an error is logged but startup is never blocked.

Jira fields are stored per domain under data/{domain_slug}/ (e.g. data/issues_apache_org/jira_fields.json). Switching the active profile in config/profiles.json automatically uses the correct set of files for that Jira instance.
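The example path suggests the slug is the Jira host name with dots replaced by underscores (issues.apache.org → issues_apache_org). A sketch under that assumption; `domain_slug` and `data_dir` are hypothetical helpers, not necessarily how AtlasMind computes the directory.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def domain_slug(jira_url: str) -> str:
    # Keep only the host; a context path such as /jira does not affect the slug.
    host = urlparse(jira_url).hostname or ""
    return re.sub(r"[^a-z0-9]+", "_", host.lower()).strip("_")


def data_dir(jira_url: str, root: str = "data") -> Path:
    return Path(root) / domain_slug(jira_url)


print(data_dir("https://issues.apache.org/jira") / "jira_fields.json")
# data/issues_apache_org/jira_fields.json (on POSIX)
```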

Key files:

File	Role
`app.py`	CLI entry point — `--query` (REPL / single-shot), `--server`, `--model`, `--host`, `--port`
`server.py`	FastAPI app with the `/health`, `/meta`, `/query` and `/event` endpoints
`core/atlasmind.py`	Top-level orchestrator — `run()` seeds both DBs, `generate_jql()` is the query entry point
`core/router.py`	Two-stage query router — fast LLM classify before triggering RAG pipeline
`core/ollama_client.py`	Sync `test_connection()` and async `generate_jql()` against the Ollama API
`core/groq_client.py`	Async Groq REST client (OpenAI-compatible); splits prompt into `system` / `user` roles at the `## Available Jira Fields` marker; logs token usage; used when `--model groq`
`core/vllm_client.py`	Async vLLM REST client (OpenAI-compatible); auto-detects model from `/v1/models`; used when `--model vllm`
`core/claude_client.py`	Async Anthropic SDK client; caches system prompt via `cache_control: ephemeral` + `anthropic-beta` header; logs input/output/cache token counts; used when `--model claude`
`core/bedrock_claude_client.py`	boto3 `converse()` client for Bedrock-compatible endpoints; caches system prompt via `cacheConfig: default` in the `system` block; used when `--model bedrock`
`core/jira_auth.py`	Per-request Jira auth — `X-Jira-Token` and `X-Jira-Url` FastAPI dependencies; `JiraProfile` / `JiraCredential` Pydantic models
`cloud/oci_vault.py`	OCI Vault secret fetching via Instance Principal; fallback to plain env var
`rag/jql_embeddings.py`	Seeds and searches the JQL annotation pgvector table
`rag/jira_field_embeddings.py`	Seeds and searches the Jira field metadata pgvector table; `find_similar_field_name()` provides embedding fallback for unknown intent field names
`rag/jira_field_value_embeddings.py`	Seeds and searches the `jira_field_values` pgvector table — one embedding per `(field_id, allowed_value)` pair; used by `JqlSanitizer` for value correction without LLM calls
`rag/jira_asset_embeddings.py`	Seeds and searches the `jira_asset_values` pgvector table — one embedding per `(field_id, label)` pair for Jira Assets object fields; used to inject query-ranked asset labels as value hints before LLM generation
`core/jql_sanitizer.py`	Deterministic JQL pre-execution corrections: strips invalid field values, rewrites unsupported operators, injects value-hint candidates into retry prompts
`jira/jira_field_api.py`	Fetches field metadata and allowed values from the Jira REST API
`jira/jira_assets_api.py`	Fetches Jira Assets object labels via the Assets AQL API; `list_asset_fields()` prints detected Insight/Assets custom fields from the cached `jira_fields.json`
`rag/seed_manager.py`	MD5 hash-based seeding gate stored in a `seed_metadata` pgvector table
`config/profiles.json`	Jira connection profiles (URL, credentials); `default` key selects the active one
`config/system_prompt.md`	JQL-only system prompt (general answers handled by router)
`config/router_prompt.md`	Router prompt template with Jira vocabulary list and few-shot examples
`settings.py`	All defaults and env-overridable settings for every LLM backend (Ollama, Groq, vLLM, Claude, Bedrock)

Jira connection profiles

Edit config/profiles.json to configure your Jira instance:

{
  "default": "work",
  "profiles": {
    "work": {
      "jira_url": "https://issues.apache.org/jira",
      "email": "",
      "token": "",
      "jira_type": "server",
      "search_path": ""
    },
    "personal": {
      "jira_url": "https://myorg.atlassian.net",
      "email": "me@example.com",
      "token": "",
      "jira_type": "cloud",
      "search_path": ""
    }
  }
}

Change "default" to switch the active instance. Jira fields are auto-fetched and stored in data/{domain_slug}/ on first run.

`jira_type`

Controls the Jira search API used for every query:

`jira_type`	Method	Endpoint	Pagination
`cloud`	`POST`	`/rest/api/3/search/jql`	Cursor (`nextPageToken`)
`server`	`GET`	`/rest/api/2/search`	Offset (`startAt`)

Defaults to cloud when omitted.

`search_path`

Optional override for the search endpoint path. Leave empty to use the default for jira_type. Set this to change the search path without rebuilding or redeploying — useful if the Jira API version changes:

"search_path": "/rest/api/4/search/jql"

When search_path is set, it overrides the default path but jira_type still determines the HTTP method and pagination strategy (cloud → POST + cursor, server → GET + offset).

Per-request auth headers

Frontends can override credentials per request using HTTP headers — no server restart needed:

Header	Description
`X-Jira-Token`	PAT or API token; takes precedence over the profile `token` field
`X-Jira-Url`	Jira base URL; overrides the profile `jira_url` (must be a valid `http`/`https` URL)

Both headers are optional. When absent, the active profile values are used as fallback.

Jira Assets (optional)

If your Jira instance uses the Assets module (formerly Insight), AtlasMind automatically detects Assets-type fields, fetches their object labels, and embeds them so the LLM generates the correct aqlFunction JQL pattern instead of guessing a raw value.

For example, a user query like "show issues in domain Sample Domain" generates:

"Domain" IN aqlFunction('Name = "Sample Domain"')

instead of the incorrect domain = "Sample Domain" that a plain LLM would produce.

How it works

At startup, AtlasMind:

- Reads config/jira_assets_fields.json to get the asset field detection keywords (default: [".insight", ".cmdb"]). The file is read at runtime, so no rebuild is needed.
- Reads jira_fields.json and detects every field whose schema.custom contains any of the configured keywords, which is the definitive Jira Assets/Insight indicator.
- Fetches all object labels for each detected field from the Jira Assets AQL API (GET /rest/assets/1.0/object/aql).
- Writes the results to data/{domain_slug}/jira_assets.json and seeds the jira_asset_values pgvector table.

On subsequent startups, the re-fetch is skipped if jira_fields.json has not changed (hash-gated).
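The keyword detection step reduces to a substring test on each field's schema.custom value. A sketch, assuming jira_fields.json maps field IDs to objects with a `name` and the `schema` entry returned by /rest/api/2/field; `load_keywords` and `detect_asset_fields` are hypothetical names.

```python
import json
from pathlib import Path

DEFAULT_KEYWORDS = [".insight", ".cmdb"]


def load_keywords(config_path: Path) -> list[str]:
    # Fall back to the defaults when the config file is absent or has no keyword list.
    if not config_path.exists():
        return DEFAULT_KEYWORDS
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("asset_field_keywords", DEFAULT_KEYWORDS)


def detect_asset_fields(fields: dict[str, dict], keywords: list[str]) -> dict[str, str]:
    # Returns {field_id: display name} for fields whose schema.custom matches a keyword.
    detected = {}
    for field_id, meta in fields.items():
        custom = (meta.get("schema") or {}).get("custom") or ""
        if any(keyword in custom for keyword in keywords):
            detected[field_id] = meta.get("name", field_id)
    return detected
```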

No manual configuration is required. A log line confirms the load:

INFO  Detected 1 Assets field(s): customfield_10200
INFO  Asset fields loaded: 1 field(s) — customfield_10200

If the Assets API is unreachable, the server starts normally without asset hints and logs an error — startup is never blocked.

Forcing a refresh

Run this after bulk changes to asset objects in Jira (bypasses the hash gate):

uv run python -c "
import asyncio
from jira.jira_assets_api import refresh_asset_values
asyncio.run(refresh_asset_values())
"

Configuring asset field detection keywords

AtlasMind detects Assets/Insight fields by looking for keywords in schema.custom. Default keywords: [".insight", ".cmdb"].

To add a custom keyword (e.g., for a vendor-specific plugin), edit config/jira_assets_fields.json:

{
    "asset_field_keywords": [".insight", ".cmdb", ".custom-plugin"]
}

Keywords are read from the config file at runtime on every startup — no rebuild or re-deploy needed. Restart AtlasMind after changing this file to re-seed the asset vector table with the updated detection rules.

Overriding the object type name

By default, AtlasMind uses the field display name as the AQL object type (e.g. field named "Domain" → objectType = "Domain"). This covers most cases. When the display name differs from the AQL object type, add an override to config/jira_assets_fields.json:

{
    "customfield_10200": {
        "display_name": "Domain",
        "object_type": "CustomerDomain"
    }
}

Entries in this file take precedence over auto-detected values. If no override file exists, auto-detection runs without it.
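A sketch of this precedence rule, together with the aqlFunction clause from the Jira Assets example. `resolve_object_type` and `asset_jql_clause` are hypothetical names. Because the same file also holds asset_field_keywords (a list), only dict entries carrying an object_type are treated as overrides.

```python
def resolve_object_type(field_id: str, display_name: str, overrides: dict) -> str:
    # An explicit override wins; otherwise the display name doubles as the object type.
    entry = overrides.get(field_id)
    if isinstance(entry, dict) and entry.get("object_type"):
        return entry["object_type"]
    return display_name


def asset_jql_clause(display_name: str, label: str) -> str:
    # Mirrors the aqlFunction pattern shown earlier; quotes inside labels are not escaped.
    return f'"{display_name}" IN aqlFunction(\'Name = "{label}"\')'


overrides = {
    "asset_field_keywords": [".insight", ".cmdb"],
    "customfield_10200": {"display_name": "Domain", "object_type": "CustomerDomain"},
}
print(resolve_object_type("customfield_10200", "Domain", overrides))  # CustomerDomain
```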

Discovering asset field IDs

uv run python -c "from jira.jira_assets_api import list_asset_fields; list_asset_fields()"

Prints all Assets-type fields detected in jira_fields.json with their field IDs and schema keys.

Response model

The /query endpoint returns a QueryResponse Pydantic model:

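from pydantic import BaseModel

# ChartSpec, ServerMeta and TokenUsage are AtlasMind models defined elsewhere (TokenUsage is shown below).
class QueryResponse(BaseModel):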
    type:           str                        # "jql" or "general"
    profile:        str                        # active Jira profile name
    jira_base_url:  str
    jira_type:      str | None                 # effective search API: "cloud" or "server"
    answer:         str | None
    jql:            str | None                 # None for general queries
    total:          int                        # total matching issues in Jira
    shown:          int                        # issues returned in this response
    display_fields: list[str]                  # ordered column headers for the frontend
    issues:         list[dict]                 # normalised issue dicts
    chart_spec:     ChartSpec | None
    filters:        dict[str, list[str]] | None  # facet values for filter dropdowns
    meta:           ServerMeta | None          # model name, backend, timeout
    token_usage:    TokenUsage | None          # prompt token estimates for this query

jira_type reflects the search API actually used for this response: either the profile default or the per-request /cloud or /server override. The UI can use this to display which Jira API version was active, or to adapt its behaviour for cloud vs server responses.

TokenUsage breaks down prompt size per query:

class TokenUsage(BaseModel):
    system_tokens:   int   # system prompt character count ÷ 4
    fields_tokens:   int   # field context block ÷ 4
    examples_tokens: int   # RAG examples block ÷ 4
    total_tokens:    int   # total prompt tokens for the initial LLM call
    retry_tokens:    int   # cumulative tokens added across all retry extensions
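The ÷ 4 figures are a characters-per-token heuristic, not tokenizer counts. A sketch of that arithmetic, assuming each block is measured on its own, total_tokens covers the whole initial prompt, and retry_tokens counts only the text appended on each retry. `estimate_tokens` and `build_usage` are hypothetical helpers that mirror the TokenUsage fields.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about four characters per token for English text.
    return len(text) // 4


def build_usage(system: str, fields: str, examples: str, full_prompt: str,
                retry_extensions: list[str]) -> dict[str, int]:
    return {
        "system_tokens": estimate_tokens(system),
        "fields_tokens": estimate_tokens(fields),
        "examples_tokens": estimate_tokens(examples),
        "total_tokens": estimate_tokens(full_prompt),
        # Sum of the text appended by each retry, not the full re-sent prompt.
        "retry_tokens": sum(estimate_tokens(ext) for ext in retry_extensions),
    }
```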

token_usage is present on every response including error responses, so the frontend can track cost even when a query fails.

For general (non-Jira) questions, jql and chart_spec are None and answer contains the plain-text response.

For JQL queries, answer always includes a result-count suffix appended by the server after the Jira search completes, for example "Found 42 result(s).", "Found 500 result(s); showing 500." (when paginated), or "No results found."

Data files

JQL annotation file (`data/jira_jql_annotated_queries.md`)

Markdown file of /* comment */ lines, each followed by its JQL on the next line; the pairs are used as few-shot examples:

/* open bugs assigned to me */
assignee = currentUser() AND issuetype = Bug AND status != Done ORDER BY created DESC

/* high priority tickets created this week */
priority = High AND created >= startOfWeek() ORDER BY created DESC

Jira fields (`data/{domain_slug}/jira_fields.json`)

Fetched automatically on first run from /rest/api/2/field. Keyed by field ID. A companion jira_allowed_values.json is also fetched and merged in to enrich descriptions with discrete option lists (e.g. status values, issue types).

Jira Assets override config (`config/jira_assets_fields.json`)

Contains asset_field_keywords (keywords to detect Assets/Insight fields in schema.custom) and optional per-field object type overrides. See the Jira Assets section.

Jira Assets cache (`data/{domain_slug}/jira_assets.json`)

Written automatically at startup by the Assets auto-detect flow. Contains all object labels per detected asset field. Re-run refresh_asset_values() whenever asset objects change in Jira — the server detects the hash change and re-seeds the jira_asset_values table on next startup.

Running vLLM on a GPU system (GPU inference server)

AtlasMind on OCI A1 can offload all LLM inference to a local GPU system over Tailscale. Only vLLM needs to run on the GPU system — no database, no AtlasMind installation required there.

What runs where

Machine	What runs
GPU system	vLLM only — serves the model over HTTP
OCI A1 (always-on)	AtlasMind + Postgres + Ollama (fallback) + frontend

AtlasMind on OCI A1 sends prompts to vLLM on the GPU system over Tailscale. If the GPU system is unreachable when AtlasMind starts, AtlasMind falls back to the backend set in VLLM_FALLBACK (its local Ollama by default).

Step 1 — Install WSL2 (Windows only)

vLLM does not run natively on Windows. You need WSL2 with Ubuntu.

Open PowerShell as Administrator and run:

wsl --install

Restart when prompted. After restart, Ubuntu opens and asks you to create a username and password. This is your Linux environment — all remaining steps run inside WSL2.

To open WSL2 later: search for Ubuntu in the Start menu, or run wsl in any terminal.

Step 2 — Verify the GPU is visible in WSL2

The NVIDIA driver is automatically bridged from Windows into WSL2 — no separate CUDA toolkit installation needed. vLLM's pip package bundles the CUDA runtime libraries it needs.

Run inside WSL2:

nvidia-smi

You should see your GPU listed with driver version and VRAM. If this command fails, reinstall the latest NVIDIA driver on Windows first, then retry.

Step 3 — Install vLLM in a virtual environment

Ubuntu 24.04 does not allow system-wide pip installs. Use a virtual environment:

python3 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install vllm

After activation you will see (vllm-env) in your prompt. This download is large (~5 GB) — let it complete fully before continuing.

Always activate the environment before running vLLM in future sessions:

source ~/vllm-env/bin/activate

Step 4 — Choose and run a model

With 8 GB VRAM, use a quantized 7B model. AWQ quantization gives the best quality-to-size ratio and is natively supported by vLLM.

Recommended for AtlasMind (reliable structured JSON and JQL output):

vllm serve Qwen/Qwen2.5-Coder-7B-Instruct-AWQ \
  --quantization awq \
  --gpu-memory-utilization 0.85 \
  --max-model-len 8192 \
  --port 8002 \
  --host 0.0.0.0

Qwen2.5-Coder is preferred over the general instruct variant because JQL is a query language (similar to SQL). The Coder model is trained on code and structured DSLs, making it more reliable at generating syntactically correct JQL and strictly following the JSON output format (jql, intent_fields, chart_spec, answer).

--gpu-memory-utilization 0.85 reserves 85% of VRAM for vLLM. The default, 0.9 (90%), can exceed the available VRAM on 8 GB cards because of Windows/WSL2 overhead. Lower it to 0.80 if startup still fails.

--max-model-len 8192 caps the context window at 8192 tokens. The model's default (32768) requires more KV cache than fits in 8 GB after loading weights. 8192 is sufficient for AtlasMind — typical prompts (system prompt + RAG examples + query) are 1500–2500 tokens.
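The arithmetic behind this, as a rough sketch: the KV cache stores one key and one value vector per layer and per key/value head for every token. The architecture numbers are assumptions taken from the published Qwen2.5-7B config (28 layers, 4 key/value heads, head dimension 128) with an fp16 cache; check config.json of the model you actually serve.

```python
def kv_cache_bytes(tokens: int, layers: int = 28, kv_heads: int = 4,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    # Factor 2 = one key tensor plus one value tensor per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


for context in (8192, 32768):
    print(f"{context:>6} tokens -> {kv_cache_bytes(context) / 2**20:,.0f} MiB")
# prints 448 MiB for 8192 tokens and 1,792 MiB for 32768 tokens
```

Quadrupling the context quadruples the cache, which is consistent with the 32768-token default failing to fit next to the ~4.5 GB of weights on an 8 GB card.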

On first run, this downloads the model weights from Hugging Face (~4.5 GB). Subsequent runs load from the local cache. Wait until you see:

INFO:     Application startup complete.

The server is now listening on port 8002.

Alternative models (all fit in 8 GB VRAM with AWQ):

Model	VRAM	Notes
`Qwen/Qwen2.5-7B-Instruct-AWQ`	~4.5 GB	General instruct, solid fallback
`hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4`	~5.5 GB	Strong reasoning, good alternative

Step 5 — Verify the server is running

From WSL2, confirm the API responds:

curl http://localhost:8002/v1/models

You should see a JSON response listing the loaded model name.

Step 6 — Configure AtlasMind on OCI A1

On the OCI A1 machine, set the following environment variables before starting AtlasMind:

export VLLM_URL=http://<gpu-system-tailscale-ip>:8002

Replace <gpu-system-tailscale-ip> with the GPU system's Tailscale IP address (find it by running tailscale ip -4 in PowerShell on the GPU system, or by clicking the Tailscale tray icon).

Then start AtlasMind with the vLLM backend:

uv run python app.py --server --model vllm

AtlasMind auto-detects the loaded model from vLLM's /v1/models endpoint — no need to set the model name explicitly.

Keeping vLLM running across WSL2 sessions

WSL2 shuts down when you close the terminal. To keep vLLM running in the background:

source ~/vllm-env/bin/activate
nohup vllm serve Qwen/Qwen2.5-Coder-7B-Instruct-AWQ \
  --quantization awq \
  --gpu-memory-utilization 0.85 \
  --max-model-len 8192 \
  --port 8002 \
  --host 0.0.0.0 > ~/vllm.log 2>&1 &

Logs go to ~/vllm.log. Check them with tail -f ~/vllm.log.

Setting up Tailscale for vLLM access

Tailscale creates a private network between your GPU system and OCI A1, so AtlasMind can reach vLLM securely without exposing any ports to the internet.

Step 1 — Install Tailscale on Windows

Download and install Tailscale from tailscale.com/download. Run the installer and sign in with your Tailscale account (Google, GitHub, or Microsoft login).

Once signed in, Tailscale assigns your Windows machine a private IP in the 100.x.x.x range. You will see the Tailscale icon in the system tray.

Step 2 — Configure WSL2 networking

Edit (or create) C:\Users\<username>\.wslconfig and add:

[wsl2]
networkingMode=mirrored
firewall=false

Restart WSL2 to apply:

wsl --shutdown

networkingMode=mirrored makes WSL2 share the Windows network stack directly — vLLM is reachable at the Windows machine's IP without any port proxy. firewall=false disables the WSL2 Hyper-V firewall layer, which otherwise blocks inbound connections independently of other firewall rules.

Step 3 — Configure the Windows firewall

Add an inbound allow rule for TCP port 8002.

If you use Windows Defender Firewall only:

Press Win + R → type wf.msc → Enter
Inbound Rules → New Rule → Port → TCP → 8002 → Allow the connection → All profiles → Finish

If you use a third-party firewall suite (e.g. Norton 360, McAfee):

Third-party firewall suites include their own firewall engine that runs alongside Windows Defender Firewall. Add the port 8002 allow rule in your firewall suite's settings — for Norton: Settings → Firewall → Traffic Rules → Add → Action: Allow, Direction: Inbound, Protocol: TCP, Local port: 8002, Profile: All.

Note: If inbound connections are still blocked after adding the rule, both firewall engines may be active simultaneously and conflicting. If your third-party suite is the intended firewall, disable Windows Defender Firewall so only one engine is enforcing rules. Run the following in PowerShell as Administrator:
Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False
This disables Windows Defender Firewall across all profiles. Your third-party firewall (Norton, McAfee, etc.) remains active.

Step 4 — Restarting vLLM after WSL2 shutdown

WSL2 resets completely on every shutdown (wsl --shutdown, PC restart, or closing the terminal) — all running processes including vLLM are killed. You must restart vLLM each time WSL2 comes back up.

To make this less tedious, add a shell alias to your ~/.bashrc:

echo "alias start-vllm='source ~/vllm-env/bin/activate && vllm serve Qwen/Qwen2.5-Coder-7B-Instruct-AWQ --quantization awq --gpu-memory-utilization 0.85 --max-model-len 8192 --port 8002 --host 0.0.0.0'" >> ~/.bashrc
source ~/.bashrc

Then to start vLLM in any future session:

start-vllm

Or in the background:

start-vllm > ~/vllm.log 2>&1 &

Then follow the logs:

tail -f ~/vllm.log

Step 5 — Find your Tailscale IP (on the GPU system)

In PowerShell on Windows, run:

tailscale ip -4

Or click the Tailscale system tray icon — your IP is shown at the top. It will look like 100.x.x.x.

Note this IP — you will set it as VLLM_URL on OCI A1.

Step 6 — Install Tailscale on OCI A1

On the OCI A1 instance, run:

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

Follow the authentication link printed in the terminal to connect OCI A1 to the same Tailscale account. Once authenticated, OCI A1 and your GPU system are on the same private network.

Step 7 — Verify connectivity

From OCI A1, confirm it can reach vLLM on the GPU system (replace with your actual Tailscale IP):

curl http://100.x.x.x:8002/v1/models

You should get back a JSON response listing the loaded model. If the request times out, check that:

- vLLM is running in WSL2 with --host 0.0.0.0
- .wslconfig has networkingMode=mirrored and firewall=false, and WSL2 was restarted after the change
- Both machines show as Connected in the Tailscale admin console at login.tailscale.com
- The firewall allow rule for port 8002 is in place (Step 3)
- If you use a third-party firewall suite, the two firewall engines are not conflicting (see the Step 3 note)

Step 8 — Configure AtlasMind

On OCI A1, set the Tailscale IP before starting the server:

export VLLM_URL=http://100.x.x.x:8002
uv run python app.py --server --model vllm

Running tests

uv run python -m pytest tests/ -v
