When I add keep_alive to the config, it doesn't seem to be sent to Ollama: 'ollama ps' still shows the time to unload as 5m, which is the default.
https://docs.ollama.com/faq#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately
I also tested other values such as 1800, in case only numbers are expected, but that doesn't work either.
My config:
[API]
secret_KEY=blahdude #don't use one w ollama
# Ollama with native API format
api_url=http://localhost:11434/
route_chat_completions=api/generate
response_type=ollama
keep_alive=30m #A valid value
model=qwen2.5:7b-instruct
temperature=0.7
max_tokens=0
top_p=0.8
frequency_penalty=0
presence_penalty=0
[PLUGIN]
total_tokens_used=0
keep_question=1
is_chat=1
chat_limit=10
I'd really like this to work. Right now I have to change my server setting or create a Modelfile to stop the model from unloading every 5 minutes. Thanks!
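For reference, here is a minimal sketch of the request I'd expect to go out when keep_alive is set. It is not the plugin's actual code: the function names (parse_keep_alive, build_generate_request, send_request) are made up for illustration, and I'm assuming a bare number such as 1800 should be sent as a JSON number of seconds while a value like 30m goes out as a string, as the Ollama FAQ linked above describes.

```python
import json
import urllib.request


def parse_keep_alive(raw):
    """Turn the config value into what goes in the JSON body.

    Assumption: digit-only values (e.g. "1800", "-1") become integers
    (seconds); anything else (e.g. "30m") is passed through as a string.
    """
    value = raw.strip()
    try:
        return int(value)
    except ValueError:
        return value


def build_generate_request(api_url, route, model, prompt, keep_alive):
    """Build a POST request for Ollama's native generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response
        "keep_alive": parse_keep_alive(keep_alive),
    }
    url = api_url.rstrip("/") + "/" + route.lstrip("/")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(req):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling send_request(build_generate_request("http://localhost:11434/", "api/generate", "qwen2.5:7b-instruct", "hello", "30m")) and then running 'ollama ps' should show whether the unload time follows the keep_alive value when it is actually present in the request body.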