Showing posts with label ssh. Show all posts
Showing posts with label ssh. Show all posts

8 Jul 2024

On-Prem AI chatbot - Hello World!

In continuation of the recent posts...


Finally got a on-premise chat-bot running! Once downloaded, the linux box is able to spin up / down the interface in a second.

(myvenv) ai@dell:~/proj/ollama$ time ollama run mistral
>>> /bye

real    0m1.019s
user    0m0.017s
sys     0m0.009s

That, on a measly ~$70 Marketplace i5/8GB machine is appreciable (given what all I had read about the NVidia RTX 4090s etc.). Now obviously this doesn't do anything close to 70 tokens per second, but am okay with that.

(myvenv) ai@dell:~/proj/ollama$ sudo dmesg | grep -i bogo
[sudo] password for ai:
[    0.078220] Calibrating delay loop (skipped), value calculated using timer frequency.. 6585.24 BogoMIPS (lpj=3292624)
[    0.102271] smpboot: Total of 4 processors activated (26340.99 BogoMIPS)

Next, I wrote a small little hello-world script to test the bot. Now where's the fun if it were to print a static text!!:

(myvenv) ai@dell:~/t$ cat a.py
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
result=llm.invoke("Why is 42 the answer to everything? Keep it very brief.")
print (result)

And here's the output, in just ......... 33 seconds :)

(myvenv) ai@dell:~/t$ time python a.py
A popular question! The joke about 42 being the answer to everything originated from Douglas Adams' science fiction series "The Hitchhiker's Guide to the Galaxy." In the book, a supercomputer named Deep Thought takes 7.5 million years to calculate the "Answer to the Ultimate Question of Life, the Universe, and Everything," which is... 42!

real    0m33.299s
user    0m0.568s
sys     0m0.104s
(myvenv) ai@dell:~/t$

And, just for kicks, works across languages / scripts too. Nice!

(myvenv) ai@dell:~/t$ ollama run mistral
>>> भारत की सबसे लंबी नदी कौन सी है?
 भारत की सबसे लंबी नदी गंगा है, जिसका पूरण 3670 किमी होता है। यह एक विश्वमित्र नदी है और बहुप्रकार से कई प्रदेशों के झिल्ले-ढाल में विचलित है।

>>>

Again, am pretty okay with this for now. I'll worry about speed tomorrow, when I have a script that's able to test the limits, and that's not today.

Hello World!

6 Jul 2024

Ollama is missing --rate-limits on downloads

I am just starting my AI journey, and trying to get Ollama to work on my linux box, was an interesting non-AI experience.

I noticed, that everytime I was trying out something new, my linux box got reliably stuck every single time I pulled a new model. htop helped point out, that each time I did a ollama pull or ollama run, it spun up a ton of threads.

Often things got so bad, that the system became quite unresponsive. Here, you can see "when" I triggered the pull:

Reply from 192.168.85.24: bytes=32 time=7ms TTL=64
Reply from 192.168.85.24: bytes=32 time=7ms TTL=64
Reply from 192.168.85.24: bytes=32 time=7ms TTL=64
Reply from 192.168.85.24: bytes=32 time=8ms TTL=64
Reply from 192.168.85.24: bytes=32 time=65ms TTL=64
Reply from 192.168.85.24: bytes=32 time=286ms TTL=64
Reply from 192.168.85.24: bytes=32 time=286ms TTL=64
Reply from 192.168.85.24: bytes=32 time=304ms TTL=64

A little searching, led me to this on-going Github thread where a feature like --rate-limit were requested for multiple reasons. Some people were unhappy with how a pull clogged their routers, some were unhappy with how it jammed all other downloads / browsing on the machine. I was troubled since my linux box (a not-so-recent but still 6.5k BogoMIPS 4vCPU i5) came to a crawl.

While the --rate-limit feature takes shape, here are two solutions that did work for me :

  1. As soon as I started the fetch (ollama run or ollama pull etc), I used iotop to change the ionice priority to idle. This made the issue go away completely (or at least made the system quite usable). However, it was still frustrating since (unlike top and htop) one had to type the PIDs... and as you may have guessed it already, Ollama creates quite a few when it does such the fetch.

Note that doing something like nice -n 19 did not help here. This was because the ollama processes weren't actually consuming (much) CPU for this task at all!

Then I tried to use ionice, which didn't work either! Note that since Ollama uses threads, the ionice tool didn't work for me. This was because ionice doesn't work with threads within a parent process. So this meant, something like the following did not work for me:

# These did not help!

robins@dell:~$ nice -n 19 ollama run mistral # Did not work!
robins@dell:~$ ionice -c3 ollama run mistral # Did not work either!!
  1. After some trial-and-error, a far simpler solution was to just run a series of commands immediately after triggered a new model fetch. Essentially, it got the parent PID, and then set ionice for each of the child processes for that parent:
pid=`ps -ef | grep "ollama run" | grep -v grep | awk '{print $2}'`
echo $pid
sudo ionice -c3 -p `ps -T -p $pid | awk '{print $2}' | grep -v SPID | tr '\r\n' ' '`

This worked something like this:

robins@dell:~$ pid=`ps -ef | grep "ollama run" | grep -v grep | awk '{print $2}'` && [ ${#pid} -gt 1 ] && ( sudo ionice -c3 -p `ps -T -p $pid | awk '{print $2}' | grep -v SPID | tr '\r\n' ' '` ; echo "done" ) || echo "skip"skip
robins@dell:~$ pid=`ps -ef | grep "ollama run" | grep -v grep | awk '{print $2}'` && [ ${#pid} -gt 1 ] && ( sudo ionice -c3 -p `ps -T -p $pid | awk '{print $2}' | grep -v SPID | tr '\r\n' ' '` ; echo "done" ) || echo "skip"done

After the above, iotop started showing idle in front of each of the ollama processes:

Total DISK READ:         0.00 B/s | Total DISK WRITE:         3.27 M/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:      36.76 K/s
    TID  PRIO  USER     DISK READ DISK WRITE>    COMMAND                                                                                                                                                                                                                      2692712 idle ollama      0.00 B/s  867.62 K/s ollama serve
2705767 idle ollama      0.00 B/s  852.92 K/s ollama serve
2692707 idle ollama      0.00 B/s  849.24 K/s ollama serve
2693740 idle ollama      0.00 B/s  783.07 K/s ollama serve
      1 be/4 root        0.00 B/s    0.00 B/s init splash
      2 be/4 root        0.00 B/s    0.00 B/s [kthreadd]
      3 be/4 root        0.00 B/s    0.00 B/s [pool_workqueue_release]
      4 be/0 root        0.00 B/s    0.00 B/s [kworker/R-rcu_g]
      5 be/0 root        0.00 B/s    0.00 B/s [kworker/R-rcu_p]
      6 be/0 root        0.00 B/s    0.00 B/s [kworker/R-slub_]

While at it, it was funny to note that the fastest way to see whether the unresponsive system is "going to" recover (because of what I just tried) was by keeping a separate ping session to the linux box. On my local network, I knew the system is going to come back to life in the next few seconds, when I noticed that the pings begin ack'ing in 5-8ms instead of ~100+ ms during the logjam.

So yeah, +10 on the --rate-limit or something similar!

Reference:

  1. https://github.com/ollama/ollama/issues/2006

20 Nov 2015

Reverse Port Forwarding and how !

Recently had to work on a PoC that required spinning off Docker instances with custom configurations. The problem wasn't as much with the Docker aspect, which is probably up for another story sometime, as much as the fact that as luck had it, we couldn't commission a separate machine for this PoC, and I eventually had to do the entire development inside a 2G VirtualBox VM on my 8G Laptop.

The tricky part was that we had to submit the URL to the PoC Landing Page even before it could be ported to a full-fledged VM/server.

So effectively, we:
  • Submitted a URL (on ServerA) running Apache, but no space to host a 40G VM.
  • Had a Laptop where the VM was developed, but can't stay online since its not always connected
  • Had another server (ServerB) where the VM needs to be ported, to be always up for PoC evaluation
  • ServerA was in the US, whereas, ServerB (& Laptop) are half-way across the world in India.
Separately, the VM was an Ubuntu installation that had the following:
  • A running NodeJS (Bootstrap / Express combo) serving the PoC GUI
  • Separately it hosted the docker runtime that spun off Docker instances that had their own propietary application + Tomcat running on port 8080, all of which eventually supposed to be visible company-wide
  • Thirdly, it had a PostgreSQL 9.3 database that served as a backend to all the running docker-based app instances

To connect ServerA with the PoC hosted on ServerB we tried the following:
  • Creating VirtualHost entries on the ServerA that pointed to corresponding ports on the VM
    • This failed because the VM was running inside a NATted configuration and thus was invisible to ServerA
  • Then we tried an alternate solution, wherein Apache->VirtualHost entries on ServerA were pointed to port 80 on ServerB. This required port-forwarding port 80 on ServerB to port 3000 (NodeJS) on ServerB. Although for this we used NetSH, this failed for various reasons, some of which we could identify and some we couldn't. For e.g. we installed the ipv4 module for NetSH, also tried installing ipv6, as well as, enabling Windows Firewall for NetSH to work (contrary to popular belief that Windows Firewall might be the reason for things getting blocked)
  • We finally settled down to running persistent SSH connections directly from the VM to ServerA (thereby by-passing all Windows Firewall issues) and setting up Remote Port-Forward such that remote ports could connect to Node Server inside the VM
    • This too initially didn't work as expected. To identify the cause we:
      • isabled SELinux on ServerB, then ServerA (although that didn't make sense) as well as on the VM to no avail
      • Then we disabled IPTables on all the relevant machines and that still didn't help
      • Finally we realised that the Remote Port-Forwarding though working well, was getting attached to the localhost interface on ServerA. Which means that although 127.0.0.1:80 on ServerA was working as expected, 192.168.1.1:80 on the same server immediately gave a "connection failed". It took a while to realise that though I was logging in trans-atlantic, the immediate 'Connection Failed' was an issue with ServerA and not with this side of the line.
      • What complicated things post that was that the regular -g option was insufficient. It required setting the GatewayPorts on the SSHD (with a sshd restart) to work as expected.

Eventually, after a few hours of transatlantic jugglery, finally got a working URL that looked something like this... What a day !



What's in an empty table?

How much storage does an empty table in Postgres take? This is a post about Postgres tables that store ... well basically ...  Nothing . The...