Motivation
I’m currently playing around with some LLMs for different projects. One project aims to add textual descriptions to a large image gallery. Those descriptions can be used both for better image organization and as alternative texts when publishing the images on websites.
My basic setup consists of ollama for running the LLMs and some Python code that locates the images, uploads them to ollama and saves the returned descriptions in the gallery’s database.
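The core of such a script is small. As a sketch (the model name, prompt and ollama address are assumptions, not the author’s actual code), sending an image to ollama’s /api/generate endpoint looks like this – images are passed base64-encoded, and stream=False returns a single JSON object:

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default ollama port; adjust as needed
MODEL = "llava"  # assumed vision-capable model

def build_describe_payload(image_bytes: bytes, model: str = MODEL) -> dict:
    """Build the JSON payload for ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": "Describe this image in one or two sentences.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def describe_image(path: str) -> str:
    """Send one image to ollama and return the generated description."""
    with open(path, "rb") as fh:
        payload = build_describe_payload(fh.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

The returned description can then be written to the gallery’s database however that is organized.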
While testing I found that I could run 4 instances of ollama on 2 Nvidia cards to get the most out of the available hardware. However, I’d need some kind of load balancing to make use of this parallelism.
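For reference, one way to run several instances (a sketch, not necessarily the exact setup used here) is to bind each ollama instance to its own port via OLLAMA_HOST and pin it to a card via CUDA_VISIBLE_DEVICES, two instances per GPU:

```shell
# Sketch: ports and GPU assignment are assumptions matching the haproxy config
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11435 ollama serve &
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11436 ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11437 ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11438 ollama serve &
```

In a production setup you’d wrap these in systemd units or similar instead of backgrounded shell jobs.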
Load balancing
So I researched and tried several ollama-specific load balancing solutions (like ollamaflow or ollama_proxy_server), however none of them provided up-to-date and working installation instructions or – if they did so – the configured system just failed for whatever reason (and I was unwilling to spent much time on further investigation).
So I decided to go with haproxy for now. As I had used it on several occasions before, the basic load balancing setup was easy:
linux # cat /etc/haproxy/conf.d/ollama.cfg
defaults
    mode http
    balance leastconn

frontend ft_ollama
    bind :80
    default_backend be_ollama
    timeout client 60s

backend be_ollama
    mode http
    balance leastconn
    option httpchk GET /
    option forwardfor
    timeout server 60s
    timeout connect 60s
    http-request set-header X-Real-IP %[src]
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    server ollama1 127.0.0.1:11435 check maxconn 2 fall 3 rise 2 inter 10s downinter 10s
    server ollama2 127.0.0.1:11436 check maxconn 2 fall 3 rise 2 inter 10s downinter 10s
    server ollama3 127.0.0.1:11437 check maxconn 2 fall 3 rise 2 inter 10s downinter 10s
    server ollama4 127.0.0.1:11438 check maxconn 2 fall 3 rise 2 inter 10s downinter 10s
So basically we get a frontend listening on port 80 that load balances across the 4 ollama instances running on localhost (ports 11435–11438). With "balance leastconn" and "maxconn 2", each instance handles at most 2 concurrent requests, and new requests go to the least busy instance.
Adding bearer authentication
However, what’s missing with plain ollama is authentication: everyone who can establish a network connection can use these systems. Some of the tools (that I couldn’t get to work) had bearer authentication built in, so I was wondering whether this feature could also be implemented with haproxy.
Searching for “haproxy bearer authentication” didn’t turn up an easy solution, however there were some pointers in the right direction (see here).
After some testing I ended up with this prototype:
linux # cat /etc/haproxy/conf.d/ollama.cfg
<...>
frontend ft_ollama
    bind :80
    default_backend be_ollama
    timeout client 60s

    acl valid_bearer var(txn.bearer) -m str 'sk_mysupersecure_bearer_token1'
    acl valid_bearer var(txn.bearer) -m str 'sk_mysupersecure_bearer_token2'

    # extract the token from the "Authorization: Bearer <token>" header
    http-request set-var(txn.bearer) http_auth_bearer
    http-request deny content-type 'text/html' string 'Missing Authorization HTTP header' unless { req.hdr(authorization) -m found }
    http-request deny content-type 'text/html' string 'Not authorized' unless valid_bearer
<...>
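With authentication in place, every client has to present one of the configured tokens. As a sketch (the balancer address is a placeholder and the token is the example value from the config above), the Python code mentioned at the beginning could attach it like this:

```python
import urllib.request

# Placeholder values – adjust to your balancer address and configured token
BALANCER_URL = "http://10.10.0.1:80/api/generate"
TOKEN = "sk_mysupersecure_bearer_token1"

def authorized_request(url: str, data: bytes) -> urllib.request.Request:
    """Build a POST request that carries the bearer token haproxy checks for."""
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
```

Requests built this way pass both deny rules; everything else is rejected before it ever reaches an ollama backend.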
Testing
A simple test using curl shows the desired effect:
linux # curl http://10.10.0.1:80/
Missing Authorization HTTP header
linux # curl -H 'Authorization: Bearer some-invalid-token' http://10.10.0.1:80/
Not authorized
linux # curl -H 'Authorization: Bearer sk_mysupersecure_bearer_token1' http://10.10.0.1:80/
Ollama is running
Conclusion
For a basic load balancing solution (even with rudimentary authentication) haproxy
will to fine. Of course other solutions could make use of internal ollama knowledge to achieve better results (something like choosing the right node that has the desired LLM already in memory instead of choosing a random one that may need to unload an existing LLM and load the new one and so on…).
You’d also want additional SSL protection for a production system, but there’s plenty of documentation on how to achieve that with haproxy, so I decided to keep it simple here 🙂 .