Tabby-API-Ollama
A fork of Tabby-API that functions as a drop-in replacement for Ollama.
Motivation
I really disliked every LLM frontend besides open-webui. But open-webui only works with Ollama. Ollama lacks my favorite model format, Exllamav2. This format is quite fast for models that fit entirely in the GPU(s). So why not make open-webui compatible with my go-to server, TabbyAPI?
The Process
I fired up open-webui and pointed it at the TabbyAPI endpoint. I read the error message, then the open-webui backend code to see what it was expecting, and then the TabbyAPI server code to figure out how to satisfy it. I ultimately made all of the changes in Tabby, since that was simplest and would make it a true drop-in replacement for Ollama. For each request open-webui made, I found the endpoint it was trying to hit and the response it expected, then implemented it in Tabby: model lists, version numbers, chat completions, and so on.
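As a rough illustration of the kind of shims involved (a sketch, not the fork's actual code), open-webui probes Ollama endpoints such as /api/version and /api/tags. Since TabbyAPI is built on FastAPI, those can be mapped onto the server's model state roughly like this; get_loaded_models and the version string here are placeholders, not real names from either codebase:

```python
# Hypothetical sketch of Ollama-compatible shim routes on a FastAPI app.
from datetime import datetime, timezone

from fastapi import APIRouter

router = APIRouter()


def get_loaded_models() -> list[str]:
    """Placeholder for however the server actually tracks model names."""
    return ["my-exl2-model"]


@router.get("/api/version")
async def ollama_version():
    # open-webui pings this to decide whether it is talking to Ollama at all.
    return {"version": "0.1.32"}


@router.get("/api/tags")
async def ollama_tags():
    # Ollama's model list: open-webui reads the "models" array to populate
    # its model picker.
    now = datetime.now(timezone.utc).isoformat()
    return {
        "models": [
            {
                "name": name,
                "model": name,
                "modified_at": now,
                "size": 0,
                "digest": "",
                "details": {"format": "exl2", "family": "llama"},
            }
            for name in get_loaded_models()
        ]
    }
```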
I worked like this for a couple of days, bouncing between reading open-webui code and implementing the missing pieces in Tabby. I finally got text generating and flowing back to open-webui, but open-webui wouldn't display it; I got a JSON parsing error instead. After an embarrassingly long time, I realized the answer was in the Ollama backend: I just had to mimic Ollama's streaming response format in Tabby.
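For context, Ollama streams chat responses as newline-delimited JSON, one complete object per line with a final "done": true chunk, rather than OpenAI-style "data: ..." SSE events, which is why a client expecting Ollama chokes on an SSE stream. A minimal sketch of that shape (not the fork's actual code; generate_tokens and the model name are made-up placeholders):

```python
# Hedged sketch of an Ollama-style /api/chat streaming response.
import json
from datetime import datetime, timezone

from fastapi import APIRouter
from fastapi.responses import StreamingResponse

router = APIRouter()


async def generate_tokens():
    """Placeholder for the real token stream coming out of the model."""
    for piece in ["Hello", ", ", "world", "!"]:
        yield piece


@router.post("/api/chat")
async def ollama_chat():
    async def ndjson_stream():
        # Each chunk is a complete JSON object terminated by a newline,
        # not an OpenAI-style "data: {...}\n\n" SSE event.
        async for piece in generate_tokens():
            chunk = {
                "model": "my-exl2-model",
                "created_at": datetime.now(timezone.utc).isoformat(),
                "message": {"role": "assistant", "content": piece},
                "done": False,
            }
            yield json.dumps(chunk) + "\n"
        # Final chunk flips "done" to true so the client stops reading.
        yield json.dumps({
            "model": "my-exl2-model",
            "created_at": datetime.now(timezone.utc).isoformat(),
            "message": {"role": "assistant", "content": ""},
            "done": True,
        }) + "\n"

    return StreamingResponse(ndjson_stream(), media_type="application/x-ndjson")
```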
With that final issue resolved, I can finally use the two together and experience the sweet relief of being able to move on with my life.