(base) acker@Ackermanns-Mac-Studio ~ % /Users/acker/.unsloth/studio/unsloth_studio/bin/unsloth studio -H 0.0.0.0 -p 8888
Starting Unsloth Studio on http://84.236.97.122:8888
✅ Frontend loaded from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/frontend/dist
INFO: Started server process [46112]
INFO: Waiting for application startup.
Hardware detected: CPU (no GPU backend available)
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
{"timestamp": "2026-04-07T09:24:42.770458Z", "level": "info", "event": "Pre-caching helper GGUF: unsloth/Qwen3.5-4B-GGUF/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:24:43.254440Z", "level": "info", "event": "Helper GGUF cached: 1 file(s)"}
🦥 Unsloth Studio is running
────────────────────────────────────────────────────
On this machine — open this in your browser:
http://127.0.0.1:8888
(same as http://localhost:8888)
From another device on your network / to share:
http://84.236.97.122:8888
API & health:
http://127.0.0.1:8888/api
http://127.0.0.1:8888/api/health
────────────────────────────────────────────────────
Tip: if you are on the same computer, use the Local link above.
{"timestamp": "2026-04-07T09:24:52.319399Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 8.69}
{"timestamp": "2026-04-07T09:25:06.103265Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 4.67}
{"timestamp": "2026-04-07T09:25:31.689921Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 9.7}
{"timestamp": "2026-04-07T09:25:36.552190Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 360.71}
{"timestamp": "2026-04-07T09:25:39.321466Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:25:39.322826Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 591.4}
{"timestamp": "2026-04-07T09:25:39.337462Z", "level": "info", "event": "InferenceOrchestrator initialized (subprocess mode)"}
{"timestamp": "2026-04-07T09:25:39.338286Z", "level": "info", "event": "Unloaded model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:25:39.339055Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 7.62}
{"timestamp": "2026-04-07T09:25:39.562447Z", "level": "info", "event": "Top GGUF models: ['unsloth/Qwen3.5-35B-A3B-GGUF', 'unsloth/Qwen3.5-9B-GGUF', 'unsloth/gemma-4-26B-A4B-it-GGUF', 'unsloth/Qwen3.5-27B-GGUF', 'unsloth/gemma-4-31B-it-GGUF', 'unsloth/Qwen3.5-4B-GGUF', 'unsloth/gemma-4-E4B-it-GGUF', 'unsloth/Qwen3.5-122B-A10B-GGUF', 'unsloth/Qwen3.5-0.8B-GGUF', 'unsloth/Qwen3.5-2B-GGUF', 'unsloth/gemma-4-E2B-it-GGUF', 'unsloth/LTX-2.3-GGUF', 'unsloth/Qwen3-Coder-Next-GGUF', 'unsloth/gpt-oss-20b-GGUF', 'unsloth/Nemotron-3-Nano-30B-A3B-GGUF', 'unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF', 'unsloth/gpt-oss-120b-GGUF', 'unsloth/GLM-4.7-Flash-GGUF', 'unsloth/gemma-3-27b-it-GGUF', 'unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF', 'unsloth/Qwen-Image-Edit-2511-GGUF', 'unsloth/gemma-3-12b-it-GGUF', 'unsloth/Qwen3-VL-4B-Instruct-GGUF', 'unsloth/Qwen3.5-397B-A17B-GGUF', 'unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF', 'unsloth/Qwen2.5-VL-7B-Instruct-GGUF', 'unsloth/Llama-3.2-1B-Instruct-GGUF', 'unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF', 'unsloth/gemma-3-4b-it-GGUF', 'unsloth/gemma-3-270m-it-GGUF', 'unsloth/Qwen3-4B-Instruct-2507-GGUF', 'unsloth/Qwen3-VL-8B-Instruct-GGUF', 'unsloth/Qwen3-4B-GGUF']"}
{"timestamp": "2026-04-07T09:25:39.562946Z", "level": "info", "event": "Top hub models: ['unsloth/mistral-7b-v0.3-bnb-4bit', 'unsloth/Llama-3.1-8B-Instruct', 'unsloth/DeepSeek-OCR-2', 'unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit', 'unsloth/Qwen3-0.6B', 'unsloth/GLM-4.7-Flash-FP8-Dynamic', 'unsloth/Qwen3-0.6B-unsloth-bnb-4bit', 'unsloth/Meta-Llama-3.1-8B-Instruct', 'unsloth/Qwen2.5-7B-Instruct-bnb-4bit', 'unsloth/Qwen3-14B-unsloth-bnb-4bit', 'unsloth/Qwen3.5-9B', 'unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit', 'unsloth/Qwen3.5-4B', 'unsloth/GLM-4.7-Flash', 'unsloth/Qwen2.5-7B', 'unsloth/Qwen2.5-7B-Instruct', 'unsloth/gpt-oss-20b-unsloth-bnb-4bit', 'unsloth/gpt-oss-20b', 'unsloth/Llama-3.2-1B-Instruct', 'unsloth/Qwen3.5-2B', 'unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit', 'unsloth/Qwen3-1.7B-unsloth-bnb-4bit', 'unsloth/Llama-3.2-3B-Instruct', 'unsloth/gpt-oss-120b-BF16', 'unsloth/Qwen3-8B-unsloth-bnb-4bit', 'unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit', 'unsloth/Qwen2-7B', 'unsloth/Qwen3.5-0.8B', 'unsloth/gpt-oss-20b-BF16', 'unsloth/Llama-3.2-1B-Instruct-unsloth-bnb-4bit', 'unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit', 'unsloth/Qwen3-VL-4B-Instruct', 'unsloth/Qwen2.5-7B-Instruct-unsloth-bnb-4bit', 'unsloth/Qwen2.5-0.5B-unsloth-bnb-4bit', 'unsloth/Llama-3.2-1B', 'unsloth/Qwen3-4B-bnb-4bit', 'unsloth/Qwen3-1.7B', 'unsloth/llama-3-8b-bnb-4bit', 'unsloth/Qwen3-4B', 'unsloth/Qwen3-4B-unsloth-bnb-4bit']"}
{"timestamp": "2026-04-07T09:25:39.927531Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:25:40.816457Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:25:40.816604Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:25:41.266413Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:25:41.991329Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:25:42.512953Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:25:42.513036Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:25:42.513053Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:25:42.513087Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:25:42.525948Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 20.0 GB, context: 262144, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:25:42.526166Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:25:42.526258Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:25:42.526284Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62097 -c 262144 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:25:44.082119Z", "level": "info", "event": "llama-server ready on port 62097 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:25:44.082907Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:25:44.106699Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:25:44.111345Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 4765.91}
{"timestamp": "2026-04-07T09:25:44.122256Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 4.48}
{"timestamp": "2026-04-07T09:25:44.126407Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:25:44.129685Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:25:44.129782Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:25:44.129961Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 6.88}
{"timestamp": "2026-04-07T09:25:44.130083Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 6.96}
{"timestamp": "2026-04-07T09:27:31.403234Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 11.62}
{"timestamp": "2026-04-07T09:27:40.894766Z", "level": "warning", "event": "Could not detect base model for LoRA: /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:41.455463Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 564.57}
{"timestamp": "2026-04-07T09:27:41.594420Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:27:41.594478Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:27:41.594688Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 133.56}
{"timestamp": "2026-04-07T09:27:41.599314Z", "level": "warning", "event": "Could not detect base model for LoRA: /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:41.599826Z", "level": "info", "event": "ExportOrchestrator initialized (subprocess mode)"}
{"timestamp": "2026-04-07T09:27:41.599934Z", "level": "info", "event": "Spawning fresh inference subprocess for '/Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit' (transformers 4.x)"}
{"timestamp": "2026-04-07T09:27:41.608854Z", "level": "info", "event": "Inference subprocess started (pid=46852)"}
{"timestamp": "2026-04-07T09:27:42.832999Z", "level": "warning", "event": "Could not detect base model for LoRA: /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:42.833453Z", "level": "info", "event": "Using default transformers (4.57.x) for /Users/acker/.lmstudio/models/lmstudio-community/gpt-oss-120b-mlx-8bit"}
{"timestamp": "2026-04-07T09:27:42.833847Z", "level": "info", "event": "Subprocess status: Importing Unsloth..."}
{"timestamp": "2026-04-07T09:27:42.922291Z", "level": "error", "event": "Error loading model: Subprocess error: Failed to import ML libraries: Unsloth currently only works on NVIDIA, AMD and Intel GPUs.", "exc_info": true}
{"timestamp": "2026-04-07T09:27:42.922647Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 500, "process_time_ms": 1324.03}
{"timestamp": "2026-04-07T09:27:43.813890Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:27:45.526025Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:27:45.526213Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:27:45.978189Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:27:46.700592Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:27:47.797772Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:27:47.797818Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:27:47.797840Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:27:47.797877Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:27:47.804610Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 0.3 GB, context: 4096, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:27:47.804705Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:27:47.804789Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:27:47.804816Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62112 -c 4096 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:27:48.850551Z", "level": "info", "event": "llama-server ready on port 62112 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:27:48.851224Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:27:48.870135Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:27:48.873484Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 5946.47}
{"timestamp": "2026-04-07T09:27:48.884500Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 3.75}
{"timestamp": "2026-04-07T09:27:48.888380Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:27:48.891559Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:27:48.891654Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:27:48.891821Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 6.53}
{"timestamp": "2026-04-07T09:27:48.891934Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 6.6}
{"timestamp": "2026-04-07T09:29:58.630117Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 10.58}
{"timestamp": "2026-04-07T09:30:10.506922Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 3.94}
{"timestamp": "2026-04-07T09:30:11.782547Z", "level": "info", "event": "Detected local GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf"}
{"timestamp": "2026-04-07T09:30:11.783210Z", "level": "info", "event": "Detected mmproj for vision: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:11.783760Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 4.98}
{"timestamp": "2026-04-07T09:30:11.863510Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:30:11.863655Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:30:11.864011Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 77.86}
{"timestamp": "2026-04-07T09:30:11.869897Z", "level": "info", "event": "Detected local GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf"}
{"timestamp": "2026-04-07T09:30:11.870097Z", "level": "info", "event": "Detected mmproj for vision: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:11.917898Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:30:11.917950Z", "level": "info", "event": "GGUF metadata: chat_template=7756 chars"}
{"timestamp": "2026-04-07T09:30:11.917964Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:30:11.917999Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:30:11.924693Z", "level": "info", "event": "GGUF size: 19.7 GB, est. KV cache: 0.2 GB, context: 4096, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:30:11.924811Z", "level": "info", "event": "Reasoning model: enable_thinking=True by default"}
{"timestamp": "2026-04-07T09:30:11.924879Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:11.924905Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf --port 62125 -c 4096 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": true} --mmproj /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf"}
{"timestamp": "2026-04-07T09:30:13.471887Z", "level": "info", "event": "llama-server ready on port 62125 for model '/Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF'"}
{"timestamp": "2026-04-07T09:30:13.472304Z", "level": "info", "event": "Loaded GGUF model via llama-server: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:30:13.490184Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:30:13.493457Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 1625.12}
{"timestamp": "2026-04-07T09:30:13.501453Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 3.36}
{"timestamp": "2026-04-07T09:30:13.505145Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:30:13.508733Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:30:13.508844Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:30:13.509033Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 6.87}
{"timestamp": "2026-04-07T09:30:13.509176Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 6.97}
{"timestamp": "2026-04-07T09:31:19.354060Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 11.31}
{"timestamp": "2026-04-07T09:31:32.682409Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 181.99}
{"timestamp": "2026-04-07T09:31:35.624802Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:31:35.626184Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 872.99}
{"timestamp": "2026-04-07T09:31:35.764832Z", "level": "info", "event": "Unloaded GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:31:35.764892Z", "level": "info", "event": "Unloaded GGUF model: /Users/acker/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF"}
{"timestamp": "2026-04-07T09:31:35.765089Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 130.99}
{"timestamp": "2026-04-07T09:31:36.737589Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:31:37.339912Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:31:37.340072Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:31:37.683185Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:31:38.107990Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:31:38.618221Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:31:38.618274Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:31:38.618290Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:31:38.618324Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:31:38.625933Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 0.3 GB, context: 4096, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:31:38.626039Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:31:38.626115Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:31:38.626146Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62138 -c 4096 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:31:39.668044Z", "level": "info", "event": "llama-server ready on port 62138 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:31:39.668669Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:31:39.688732Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:31:39.691752Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 3921.88}
{"timestamp": "2026-04-07T09:31:39.702758Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 3.81}
{"timestamp": "2026-04-07T09:31:39.706864Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:31:39.711015Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:31:39.711118Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:31:39.711263Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 7.79}
{"timestamp": "2026-04-07T09:31:39.711518Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 1.51}
{"timestamp": "2026-04-07T09:32:33.392744Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 11.05}
{"timestamp": "2026-04-07T09:32:35.022329Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:32:35.022483Z", "level": "info", "event": "Unloaded GGUF model: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:32:35.022865Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/unload", "status_code": 200, "process_time_ms": 128.25}
{"timestamp": "2026-04-07T09:32:35.029772Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 2.94}
{"timestamp": "2026-04-07T09:32:35.031819Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:32:35.031934Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:32:35.032090Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 1.54}
{"timestamp": "2026-04-07T09:32:35.032211Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 1.62}
{"timestamp": "2026-04-07T09:33:03.304072Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/local", "status_code": 200, "process_time_ms": 9.51}
{"timestamp": "2026-04-07T09:33:05.365800Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/gguf-variants", "status_code": 200, "process_time_ms": 229.14}
{"timestamp": "2026-04-07T09:33:08.995626Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:33:08.996673Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/validate", "status_code": 200, "process_time_ms": 1147.97}
{"timestamp": "2026-04-07T09:33:10.200225Z", "level": "info", "event": "Detected remote GGUF repo 'unsloth/qwen3.5-4b-gguf', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-04-07T09:33:10.833191Z", "level": "info", "event": "GGUF download: 2.7 GB needed, 872.0 GB free on disk"}
{"timestamp": "2026-04-07T09:33:10.833401Z", "level": "info", "event": "Resolving GGUF: unsloth/qwen3.5-4b-gguf/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:33:11.136599Z", "level": "info", "event": "GGUF resolved from cache: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-07T09:33:11.430023Z", "level": "info", "event": "Downloading mmproj: unsloth/qwen3.5-4b-gguf/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:33:11.815927Z", "level": "info", "event": "GGUF metadata: context_length=262144"}
{"timestamp": "2026-04-07T09:33:11.815984Z", "level": "info", "event": "GGUF metadata: chat_template=7816 chars"}
{"timestamp": "2026-04-07T09:33:11.816000Z", "level": "info", "event": "GGUF metadata: model supports reasoning (enable_thinking)"}
{"timestamp": "2026-04-07T09:33:11.816034Z", "level": "info", "event": "GGUF metadata: model supports tool calling"}
{"timestamp": "2026-04-07T09:33:11.823487Z", "level": "info", "event": "GGUF size: 2.7 GB, est. KV cache: 20.0 GB, context: 262144, GPUs free: [], selected: None, fit: True"}
{"timestamp": "2026-04-07T09:33:11.823618Z", "level": "info", "event": "Reasoning model: enable_thinking=False by default"}
{"timestamp": "2026-04-07T09:33:11.823715Z", "level": "info", "event": "Using mmproj for vision: /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:33:11.823762Z", "level": "info", "event": "Starting llama-server: /Users/acker/.unsloth/llama.cpp/llama-server -m /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/Qwen3.5-4B-UD-Q4_K_XL.gguf --port 62151 -c 262144 --parallel 1 --flash-attn on --fit on --jinja --chat-template-kwargs {\"enable_thinking\": false} --mmproj /Users/acker/.cache/huggingface/hub/models--unsloth--qwen3.5-4b-gguf/snapshots/e87f176479d0855a907a41277aca2f8ee7a09523/mmproj-BF16.gguf"}
{"timestamp": "2026-04-07T09:33:13.371150Z", "level": "info", "event": "llama-server ready on port 62151 for model 'unsloth/qwen3.5-4b-gguf'"}
{"timestamp": "2026-04-07T09:33:13.372028Z", "level": "info", "event": "Loaded GGUF model via llama-server: unsloth/qwen3.5-4b-gguf"}
{"timestamp": "2026-04-07T09:33:13.397559Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:33:13.401291Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 200, "process_time_ms": 4396.82}
{"timestamp": "2026-04-07T09:33:13.411958Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/list", "status_code": 200, "process_time_ms": 4.08}
{"timestamp": "2026-04-07T09:33:13.420743Z", "level": "info", "event": "Loaded default model defaults from /Users/acker/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/backend/assets/configs/model_defaults/default.yaml"}
{"timestamp": "2026-04-07T09:33:13.424083Z", "level": "info", "event": "Found 0 trained LoRA adapters in /Users/acker/.unsloth/studio/outputs"}
{"timestamp": "2026-04-07T09:33:13.424190Z", "level": "info", "event": "Found 0 exported models in /Users/acker/.unsloth/studio/exports"}
{"timestamp": "2026-04-07T09:33:13.424390Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 11.56}
{"timestamp": "2026-04-07T09:33:13.424538Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/models/loras", "status_code": 200, "process_time_ms": 11.66}
Hardware: Mac Studio M3 Ultra 512GB
Problem: After Unsloth Studio fails to load a model, the context length in the chat interface drops to 4096 tokens. The 4096 limit then persists even when models that support much larger contexts are loaded successfully afterwards.
Workaround: Explicitly unloading the model via the model selector and then loading it again restores the correct context length (read from the GGUF metadata).
Reproduction (the log above was captured while executing these steps):
1. Start Unsloth Studio on Apple Silicon (CPU backend, no GPU detected).
2. Load a GGUF model (unsloth/qwen3.5-4b-gguf). llama-server starts with the full context from the GGUF metadata (`-c 262144`).
3. Attempt to load an unsupported model (gpt-oss-120b-mlx-8bit). The load fails with "Unsloth currently only works on NVIDIA, AMD and Intel GPUs." (status 500).
4. Load a GGUF model again. llama-server now starts with `-c 4096`, even though the metadata still reports context_length=262144. This also affects other models (the local Qwen3.5-35B-A3B load at 09:30 also gets `-c 4096`).
5. Explicitly unload via the model selector, then load again. llama-server starts with `-c 262144` as expected.
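The pattern in the log (full context on a clean load, 4096 after a failed load, full context again only after an explicit unload) is consistent with a fallback context value being left behind by the failed load and reused by later loads. This is only a hypothesis about the cause; the class and field names below are illustrative, not Unsloth Studio's actual code:

```python
# Hypothetical sketch of the suspected state bug: a failed load leaves a
# fallback context length (4096) behind, and later successful loads reuse
# that stale value instead of re-reading the context from GGUF metadata.
# Only an explicit unload() clears the state. Names are illustrative.

class InferenceOrchestrator:
    FALLBACK_CONTEXT = 4096  # matches the "-c 4096" seen in the log

    def __init__(self):
        self.context_length = None  # no model loaded yet

    def load(self, metadata_context, fail=False):
        """Load a model; metadata_context is the GGUF context_length."""
        if fail:
            # Failed load path: the fallback sticks around afterwards.
            self.context_length = self.FALLBACK_CONTEXT
            raise RuntimeError("load failed")
        # Suspected bug: an existing value is kept instead of
        # re-reading the context from the new model's metadata.
        if self.context_length is None:
            self.context_length = metadata_context
        return self.context_length

    def unload(self):
        """Explicit unload clears the state, explaining the workaround."""
        self.context_length = None
```

Under this hypothesis, the fix would be to reset the stored context (or always take it from the freshly read metadata) on every successful load, not only when no value is present.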