I am just starting my AI journey, and getting Ollama to work on my Linux box turned out to be an interesting non-AI experience.
I noticed that every time I tried something new, my Linux box reliably got stuck whenever I pulled a new model. htop helped point out that each time I ran ollama pull or ollama run, Ollama spun up a ton of threads. Often things got so bad that the system became quite unresponsive. Here you can see "when" I triggered the pull:
Reply from 192.168.85.24: bytes=32 time=7ms TTL=64
Reply from 192.168.85.24: bytes=32 time=7ms TTL=64
Reply from 192.168.85.24: bytes=32 time=7ms TTL=64
Reply from 192.168.85.24: bytes=32 time=8ms TTL=64
Reply from 192.168.85.24: bytes=32 time=65ms TTL=64
Reply from 192.168.85.24: bytes=32 time=286ms TTL=64
Reply from 192.168.85.24: bytes=32 time=286ms TTL=64
Reply from 192.168.85.24: bytes=32 time=304ms TTL=64
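If you want to watch the thread explosion for yourself, a quick way is to poll the thread count while a pull is running. This is just a sketch, assuming procps ps is installed and that the processes are simply named ollama:
# Sketch: show the PID, thread count (nlwp) and name of each ollama process, refreshed every second
watch -n1 'ps -C ollama -o pid,nlwp,comm'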
A little searching led me to this ongoing GitHub thread where a feature like --rate-limit was requested for multiple reasons. Some people were unhappy with how a pull clogged their routers; others were unhappy with how it jammed all other downloads and browsing on the machine. I was troubled because my Linux box (a not-so-recent but still 6.5k BogoMIPS, 4-vCPU i5) came to a crawl.
While the --rate-limit feature takes shape, here are two solutions that worked for me:
- As soon as I started the fetch (ollama run or ollama pull, etc.), I used iotop to change the ionice priority to idle. This made the issue go away completely (or at least made the system quite usable). However, it was still frustrating since (unlike top and htop) one had to type in the PIDs... and as you may have guessed already, Ollama creates quite a few of them during a fetch.
Note that something like nice -n 19 did not help here, because the ollama processes weren't actually consuming (much) CPU for this task at all! I then tried ionice, which didn't work either: since Ollama uses threads, and ionice applied to the parent process doesn't reach the threads inside it, something like the following did not work for me:
# These did not help!
robins@dell:~$ nice -n 19 ollama run mistral # Did not work!
robins@dell:~$ ionice -c3 ollama run mistral # Did not work either!!
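To confirm which threads exist and what IO class they are running with, the thread IDs can be queried one by one. A minimal sketch, assuming procps ps and util-linux ionice, and processes simply named ollama:
# Sketch: list every ollama thread and print its current IO scheduling class
for tid in $(ps -T -C ollama -o spid=); do
  printf '%s: ' "$tid"
  ionice -p "$tid"   # prints e.g. "best-effort: prio 4" or "idle"
done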
- After some trial and error, a far simpler solution was to just run a short series of commands immediately after triggering a new model fetch. Essentially, it grabs the parent PID and then sets ionice for each of the threads under that parent:
pid=`ps -ef | grep "ollama run" | grep -v grep | awk '{print $2}'`   # PID of the "ollama run" process
echo $pid
sudo ionice -c3 -p `ps -T -p $pid | awk '{print $2}' | grep -v SPID | tr '\r\n' ' '`   # set the idle IO class on every thread of that PID
In practice, it looked like this:
robins@dell:~$ pid=`ps -ef | grep "ollama run" | grep -v grep | awk '{print $2}'` && [ ${#pid} -gt 1 ] && ( sudo ionice -c3 -p `ps -T -p $pid | awk '{print $2}' | grep -v SPID | tr '\r\n' ' '` ; echo "done" ) || echo "skip"
skip
robins@dell:~$ pid=`ps -ef | grep "ollama run" | grep -v grep | awk '{print $2}'` && [ ${#pid} -gt 1 ] && ( sudo ionice -c3 -p `ps -T -p $pid | awk '{print $2}' | grep -v SPID | tr '\r\n' ' '` ; echo "done" ) || echo "skip"
done
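Since Ollama keeps spawning threads while the download runs, a small wrapper that re-applies the idle class every few seconds saves retyping the one-liner. A rough sketch only; the 5-second interval and the match on "ollama run" are my assumptions:
# Sketch: keep demoting every thread of the "ollama run" process to the idle IO class
while pgrep -f "ollama run" > /dev/null; do
  pid=$(pgrep -of "ollama run")                    # oldest matching PID
  tids=$(ps -T -p "$pid" -o spid= | tr '\n' ' ')   # all thread IDs under it
  [ -n "$tids" ] && sudo ionice -c3 -p $tids       # unquoted so each thread ID is a separate argument
  sleep 5
done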
After the above, iotop started showing idle in front of each of the ollama processes:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.27 M/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 36.76 K/s
TID PRIO USER DISK READ DISK WRITE> COMMAND
2692712 idle ollama 0.00 B/s 867.62 K/s ollama serve
2705767 idle ollama 0.00 B/s 852.92 K/s ollama serve
2692707 idle ollama 0.00 B/s 849.24 K/s ollama serve
2693740 idle ollama 0.00 B/s 783.07 K/s ollama serve
1 be/4 root 0.00 B/s 0.00 B/s init splash
2 be/4 root 0.00 B/s 0.00 B/s [kthreadd]
3 be/4 root 0.00 B/s 0.00 B/s [pool_workqueue_release]
4 be/0 root 0.00 B/s 0.00 B/s [kworker/R-rcu_g]
5 be/0 root 0.00 B/s 0.00 B/s [kworker/R-rcu_p]
6 be/0 root 0.00 B/s 0.00 B/s [kworker/R-slub_]
While at it, it was funny to note that the fastest way to see whether the unresponsive system was "going to" recover (because of whatever I had just tried) was to keep a separate ping session running against the Linux box. On my local network, I knew the system would come back to life within the next few seconds once the pings started coming back in 5-8 ms instead of the ~100+ ms during the logjam.
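For reference, that early-warning signal is nothing fancier than a plain ping from another machine on the LAN (the address is the box from the trace above; the half-second interval is an arbitrary choice):
# Sketch: single-digit ms replies mean the box is responsive again
ping -i 0.5 192.168.85.24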
So yeah, +10 on --rate-limit or something similar!
Reference:
- https://github.com/ollama/ollama/issues/2006