(base) acker@Ackermanns-Mac-Studio ~ % /Users/acker/.unsloth/studio/unsloth_studio/bin/unsloth studio -H 0.0.0.0 -p 8888
Starting Unsloth Studio on http://84.236.97.122:8888
✅ Frontend loaded from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/frontend/dist
INFO: Started server process [46112]
INFO: Waiting for application startup.
Hardware detected: CPU (no GPU backend available)
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
{"timestamp": "2026-04-07T09:24:42.770458Z", "level": "info", "event": "Pre-caching helper GGUF: unsloth/Qwen3.5-4B-GGUF/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:24:43.254440Z", "level": "info", "event": "Helper GGUF cached: 1 file(s)"}
🦥 Unsloth Studio is running
────────────────────────────────────────────────────
On this machine — open this in your browser:
http://127.0.0.1:8888
(same as http://localhost:8888)
From another device on your network / to share:
http://84.236.97.122:8888
API & health:
http://127.0.0.1:8888/api
http://127.0.0.1:8888/api/health
────────────────────────────────────────────────────
Tip: if you are on the same computer, use the Local link above.
{"timestamp": "2026-04-07T09:24:52.319399Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 8.69}
{"timestamp": "2026-04-07T09:25:06.103265Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 4.67}
{"timestamp": "2026-04-07T09:25:31.689921Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 9.7}
{"timestamp": "2026-04-07T09:25:36.552190Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 360.71}
{"timestamp": "2026-04-07T09:25:39.321466Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:25:39.322826Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 591.4}
{"timestamp": "2026-04-07T09:25:39.337462Z", "level": "info", "event": "InferenceOrchestrator initialized (subprocess mode)"}
{"timestamp": "2026-04-07T09:25:39.338286Z", "level": "info", "event": "Unloaded model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:25:39.339055Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 7.62}
{"timestamp": "2026-04-07T09:25:39.562447Z", "level": "info", "event": "Top GGUF models: ['unsloth/Qwen3.5-35B-A3B-GGUF', 'unsloth/Qwen3.5-9B-GGUF', 'unsloth/gemma-4-26B-A4B-it-GGUF', 'unsloth/Qwen3.5-27B-GGUF', 'unsloth/gemma-4-31B-it-GGUF', 'unsloth/Qwen3.5-4B-GGUF', 'unsloth/gemma-4-E4B-it-GGUF', 'unsloth/Qwen3.5-122B-A10B-GGUF', 'unsloth/Qwen3.5-0.8B-GGUF', 'unsloth/Qwen3.5-2B-GGUF', 'unsloth/gemma-4-E2B-it-GGUF', 'unsloth/LTX-2.3-GGUF', 'unsloth/Qwen3-Coder-Next-GGUF', 'unsloth/gpt-oss-20b-GGUF', 'unsloth/Nemotron-3-Nano-30B-A3B-GGUF', 'unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF', 'unsloth/gpt-oss-120b-GGUF', 'unsloth/GLM-4.7-Flash-GGUF', 'unsloth/gemma-3-27b-it-GGUF', 'unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF', 'unsloth/Qwen-Image-Edit-2511-GGUF', 'unsloth/gemma-3-12b-it-GGUF', 'unsloth/Qwen3-VL-4B-Instruct-GGUF', 'unsloth/Qwen3.5-397B-A17B-GGUF', 'unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF', 'unsloth/Qwen2.5-VL-7B-Instruct-GGUF', 'unsloth/Llama-3.2-1B-Instruct-GGUF', 'unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF', 'unsloth/gemma-3-4b-it-GGUF', 'unsloth/gemma-3-270m-it-GGUF', 'unsloth/Qwen3-4B-Instruct-2507-GGUF', 'unsloth/Qwen3-VL-8B-Instruct-GGUF', 'unsloth/Qwen3-4B-GGUF']"}
{"timestamp": "2026-04-07T09:25:39.562946Z", "level": "info", "event": "Top hub models: ['unsloth/mistral-7b-v0.3-bnb-4bit', 'unsloth/Llama-3.1-8B-Instruct', 'unsloth/DeepSeek-OCR-2', 'unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit', 'unsloth/Qwen3-0.6B', 'unsloth/GLM-4.7-Flash-FP8-Dynamic', 'unsloth/Qwen3-0.6B-unsloth-bnb-4bit', 'unsloth/Meta-Llama-3.1-8B-Instruct', 'unsloth/Qwen2.5-7B-Instruct-bnb-4bit', 'unsloth/Qwen3-14B-unsloth-bnb-4bit', 'unsloth/Qwen3.5-9B', 'unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit', 'unsloth/Qwen3.5-4B', 'unsloth/GLM-4.7-Flash', 'unsloth/Qwen2.5-7B', 'unsloth/Qwen2.5-7B-Instruct', 'unsloth/gpt-oss-20b-unsloth-bnb-4bit', 'unsloth/gpt-oss-20b', 'unsloth/Llama-3.2-1B-Instruct', 'unsloth/Qwen3.5-2B', 'unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit', 'unsloth/Qwen3-1.7B-unsloth-bnb-4bit', 'unsloth/Llama-3.2-3B-Instruct', 'unsloth/gpt-oss-120b-BF16', 'unsloth/Qwen3-8B-unsloth-bnb-4bit', 'unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit', 'unsloth/Qwen2-7B', 'unsloth/Qwen3.5-0.8B', 'unsloth/gpt-oss-20b-BF16', 'unsloth/Llama-3.2-1B-Instruct-unsloth-bnb-4bit', 'unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit', 'unsloth/Qwen3-VL-4B-Instruct', 'unsloth/Qwen2.5-7B-Instruct-unsloth-bnb-4bit', 'unsloth/Qwen2.5-0.5B-unsloth-bnb-4bit', 'unsloth/Llama-3.2-1B', 'unsloth/Qwen3-4B-bnb-4bit', 'unsloth/Qwen3-1.7B', 'unsloth/llama-3-8b-bnb-4bit', 'unsloth/Qwen3-4B', 'unsloth/Qwen3-4B-unsloth-bnb-4bit']"}
{"timestamp": "2026-04-07T09:25:39.927531Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:25:40.816457Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:25:40.816604Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:25:41.266413Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:25:41.991329Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:25:42.512953Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:25:42.513036Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:25:42.513053Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:25:42.513087Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:25:42.525948Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 20.0 GB, context: 262144, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:25:42.526166Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:25:42.526258Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:25:42.526284Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62097 -c 262144 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:25:44.082119Z", "level": "info", "event": "llama-server ready on port 62097 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:25:44.082907Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:25:44.106699Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:25:44.111345Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 4765.91}
{"timestamp": "2026-04-07T09:25:44.122256Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 4.48}
{"timestamp": "2026-04-07T09:25:44.126407Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:25:44.129685Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:25:44.129782Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:25:44.129961Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 6.88}
{"timestamp": "2026-04-07T09:25:44.130083Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 6.96}
{"timestamp": "2026-04-07T09:27:31.403234Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 11.62}
{"timestamp": "2026-04-07T09:27:40.894766Z", "level": "warning", "event": "Could not detect base model for LoRA: /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:41.455463Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 564.57}
{"timestamp": "2026-04-07T09:27:41.594420Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:27:41.594478Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:27:41.594688Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 133.56}
{"timestamp": "2026-04-07T09:27:41.599314Z", "level": "warning", "event": "Could not detect base model for LoRA: /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:41.599826Z", "level": "info", "event": "ExportOrchestrator initialized (subprocess mode)"}
{"timestamp": "2026-04-07T09:27:41.599934Z", "level": "info", "event": "Spawning fresh inference subprocess for '/Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit' (transformers 4.x)"}
{"timestamp": "2026-04-07T09:27:41.608854Z", "level": "info", "event": "Inference subprocess started (pid=46852)"}
{"timestamp": "2026-04-07T09:27:42.832999Z", "level": "warning", "event": "Could not detect base model for LoRA: /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:42.833453Z", "level": "info", "event": "Using default transformers (4.57.x) for /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:42.833847Z", "level": "info", "event": "Subprocess status: Importing Unsloth..."}
{"timestamp": "2026-04-07T09:27:42.922291Z", "level": "error", "event": "Error loading model: Subprocess error: Failed to import ML libraries: Unsloth currently only works on NVIDIA, AMD and Intel GPUs.", "exc_info": true}
{"timestamp": "2026-04-07T09:27:42.922647Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 500, "process_time_ms": 1324.03}
{"timestamp": "2026-04-07T09:27:43.813890Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:27:45.526025Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:27:45.526213Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:27:45.978189Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:27:46.700592Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:27:47.797772Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:27:47.797818Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:27:47.797840Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:27:47.797877Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:27:47.804610Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 0.3 GB, context: 4096, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:27:47.804705Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:27:47.804789Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:27:47.804816Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62112 -c 4096 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:27:48.850551Z", "level": "info", "event": "llama-server ready on port 62112 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:27:48.851224Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:27:48.870135Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:27:48.873484Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 5946.47}
{"timestamp": "2026-04-07T09:27:48.884500Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 3.75}
{"timestamp": "2026-04-07T09:27:48.888380Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:27:48.891559Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:27:48.891654Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:27:48.891821Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 6.53}
{"timestamp": "2026-04-07T09:27:48.891934Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 6.6}
{"timestamp": "2026-04-07T09:29:58.630117Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 10.58}
{"timestamp": "2026-04-07T09:30:10.506922Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 3.94}
{"timestamp": "2026-04-07T09:30:11.782547Z", "level": "info", "event": "Detected local GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf"}
{"timestamp": "2026-04-07T09:30:11.783210Z", "level": "info", "event": "Detected mmproj for vision: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:11.783760Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 4.98}
{"timestamp": "2026-04-07T09:30:11.863510Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:30:11.863655Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:30:11.864011Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 77.86}
{"timestamp": "2026-04-07T09:30:11.869897Z", "level": "info", "event": "Detected local GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf"}
{"timestamp": "2026-04-07T09:30:11.870097Z", "level": "info", "event": "Detected mmproj for vision: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:11.917898Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:30:11.917950Z", "level": "info", "event": "GGUF metadata: chat_template=7756 chars"}
{"timestamp": "2026-04-07T09:30:11.917964Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:30:11.917999Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:30:11.924693Z", "level": "info", "event": "GGUF size: 19.7 GB, est. KV cache: 0.2 GB, context: 4096, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:30:11.924811Z", "level": "info", "event": "Reasoning model: enable_thinking=True by default"}
{"timestamp": "2026-04-07T09:30:11.924879Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:11.924905Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf --port 62125 -c 4096 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": true} --mmproj /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:13.471887Z", "level": "info", "event": "llama-server ready on port 62125 for model '/Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF'"}
{"timestamp": "2026-04-07T09:30:13.472304Z", "level": "info", "event": "Loaded GGUF model via llama-server: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:30:13.490184Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:30:13.493457Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 1625.12}
{"timestamp": "2026-04-07T09:30:13.501453Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 3.36}
{"timestamp": "2026-04-07T09:30:13.505145Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:30:13.508733Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:30:13.508844Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:30:13.509033Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 6.87}
{"timestamp": "2026-04-07T09:30:13.509176Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 6.97}
{"timestamp": "2026-04-07T09:31:19.354060Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 11.31}
{"timestamp": "2026-04-07T09:31:32.682409Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 181.99}
{"timestamp": "2026-04-07T09:31:35.624802Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:31:35.626184Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 872.99}
{"timestamp": "2026-04-07T09:31:35.764832Z", "level": "info", "event": "Unloaded GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:31:35.764892Z", "level": "info", "event": "Unloaded GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:31:35.765089Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 130.99}
{"timestamp": "2026-04-07T09:31:36.737589Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:31:37.339912Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:31:37.340072Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:31:37.683185Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:31:38.107990Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:31:38.618221Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:31:38.618274Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:31:38.618290Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:31:38.618324Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:31:38.625933Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 0.3 GB, context: 4096, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:31:38.626039Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:31:38.626115Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:31:38.626146Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62138 -c 4096 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:31:39.668044Z", "level": "info", "event": "llama-server ready on port 62138 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:31:39.668669Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:31:39.688732Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:31:39.691752Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 3921.88}
{"timestamp": "2026-04-07T09:31:39.702758Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 3.81}
{"timestamp": "2026-04-07T09:31:39.706864Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:31:39.711015Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:31:39.711118Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:31:39.711263Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 7.79}
{"timestamp": "2026-04-07T09:31:39.711518Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 1.51}
{"timestamp": "2026-04-07T09:32:33.392744Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 11.05}
{"timestamp": "2026-04-07T09:32:35.022329Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:32:35.022483Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:32:35.022865Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 128.25}
{"timestamp": "2026-04-07T09:32:35.029772Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 2.94}
{"timestamp": "2026-04-07T09:32:35.031819Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:32:35.031934Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:32:35.032090Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 1.54}
{"timestamp": "2026-04-07T09:32:35.032211Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 1.62}
{"timestamp": "2026-04-07T09:33:03.304072Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 9.51}
{"timestamp": "2026-04-07T09:33:05.365800Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 229.14}
{"timestamp": "2026-04-07T09:33:08.995626Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:33:08.996673Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 1147.97}
{"timestamp": "2026-04-07T09:33:10.200225Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:33:10.833191Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:33:10.833401Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:33:11.136599Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:33:11.430023Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:33:11.815927Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:33:11.815984Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:33:11.816000Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:33:11.816034Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:33:11.823487Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 20.0 GB, context: 262144, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:33:11.823618Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:33:11.823715Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:33:11.823762Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62151 -c 262144 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:33:13.371150Z", "level": "info", "event": "llama-server ready on port 62151 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:33:13.372028Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:33:13.397559Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:33:13.401291Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 4396.82}
{"timestamp": "2026-04-07T09:33:13.411958Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 4.08}
{"timestamp": "2026-04-07T09:33:13.420743Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:33:13.424083Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:33:13.424190Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:33:13.424390Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 11.56}
{"timestamp": "2026-04-07T09:33:13.424538Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 11.66}
Hardware: Mac Studio M3 Ultra 512GB
Problem: After Unsloth Studio fails to load a model, the context length in the chat interface drops to 4096 tokens. The 4096 limit then persists even when models that support much larger contexts are loaded successfully afterwards.
Workaround: Explicitly unloading the model via the model selector and then loading it again restores the correct context length (read from the GGUF metadata).
Reproduction (the log above was captured while executing these steps):
1. Start Unsloth Studio on Apple Silicon (CPU backend, no GPU detected).
2. Load a GGUF model (unsloth/qwen3.5-4b-gguf). llama-server starts with the full context from the GGUF metadata (`-c 262144`).
3. Attempt to load an unsupported model (gpt-oss-120b-mlx-8bit). The load fails with "Unsloth currently only works on NVIDIA, AMD and Intel GPUs." (status 500).
4. Load a GGUF model again. llama-server now starts with `-c 4096`, even though the metadata still reports context_length=262144. This also affects other models (the local Qwen3.5-35B-A3B load at 09:30 also gets `-c 4096`).
5. Explicitly unload via the model selector, then load again. llama-server starts with `-c 262144` as expected.
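The pattern in the log (full context on a clean load, 4096 after a failed load, full context again only after an explicit unload) is consistent with a fallback context value being left behind by the failed load and reused by later loads. This is only a hypothesis about the cause; the class and field names below are illustrative, not Unsloth Studio's actual code:

```python
# Hypothetical sketch of the suspected state bug: a failed load leaves a
# fallback context length (4096) behind, and later successful loads reuse
# that stale value instead of re-reading the context from GGUF metadata.
# Only an explicit unload() clears the state. Names are illustrative.

class InferenceOrchestrator:
    FALLBACK_CONTEXT = 4096  # matches the "-c 4096" seen in the log

    def __init__(self):
        self.context_length = None  # no model loaded yet

    def load(self, metadata_context, fail=False):
        """Load a model; metadata_context is the GGUF context_length."""
        if fail:
            # Failed load path: the fallback sticks around afterwards.
            self.context_length = self.FALLBACK_CONTEXT
            raise RuntimeError("load failed")
        # Suspected bug: an existing value is kept instead of
        # re-reading the context from the new model's metadata.
        if self.context_length is None:
            self.context_length = metadata_context
        return self.context_length

    def unload(self):
        """Explicit unload clears the state, explaining the workaround."""
        self.context_length = None
```

Under this hypothesis, the fix would be to reset the stored context (or always take it from the freshly read metadata) on every successful load, not only when no value is present.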