Begrudgingly Trying AI
I don't like "AI". More specifically, I don't like whatever is being marketed as AI these days. I do, however, like technology, progress, and innovation. And, I hear ignorance in my own voice when I speak on these topics. So, I want to learn—even though I don't like it.
If I'm going to do this at all, it has to run locally on my own hardware; no corporation is going to get that much insight into my thought process for free, and I will never blindly trust anything they put in front of me as "fact". So, to trust the model's output at all, it has to be an open model, running on my own hardware: enter Ollama.
Ollama
Ollama is a CLI built on the llama.cpp library, which can "run" AI models that have been serialised to a file (in the GGUF format, for example).
The CLI is easy enough to use, as well.
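To make "serialised GGUF format in a file" a bit less abstract, here is a rough sketch that reads the fixed-size header at the start of a GGUF file. It assumes the version 2 and later layout (a 4-byte magic, a 32-bit version, then two 64-bit counts, all little-endian); the function name is made up for illustration and this is not how Ollama or llama.cpp load models internally.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_FORMAT = "<4sIQQ"  # magic, version, tensor count, metadata key/value count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(data: bytes) -> dict:
    """Parse the first 24 bytes of a GGUF file (assumes the v2+ header layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    magic, version, tensor_count, kv_count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic was {magic!r})")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Usage (the path is a placeholder, point it at any GGUF file you have):
# with open("some-model.gguf", "rb") as f:
#     print(read_gguf_header(f.read(HEADER_SIZE)))
```

Everything after those 24 bytes (the metadata itself, then the tensor descriptions and weights) is deliberately left alone here.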
This is where I made my first mistake. I installed ollama through dnf, my system's package manager. It was available when I searched for it, after all.
Unfortunately, with how quickly things are changing, the ollama package on my system was a little out of date, and therefore couldn't take advantage of my very old GPU. This meant all the models were running directly on my CPU, which, while not a lemon by any means (13th Gen Intel(R) Core(TM) i7-13700K), was still not state-of-the-art.
This meant I was limited to small models (under 5B parameters), lest they take ten seconds to emit a single word.
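"Ten seconds per word" can be turned into an actual number. Ollama serves a local HTTP API alongside the CLI, and a non-streaming reply from its generate endpoint includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The sketch below assumes the default address of localhost:11434; the helper names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # assumed default address


def tokens_per_second(reply: dict) -> float:
    """Generation speed from a reply: eval_count tokens in eval_duration nanoseconds."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming prompt to a locally running Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Usage (needs the Ollama server running and the model already pulled):
# reply = generate("granite4.1:3b", "Why is the sky blue?")
# print(f"{tokens_per_second(reply):.1f} tokens/s")
```

Running the same prompt on the same model before and after a change in setup gives a fairer comparison than eyeballing the terminal.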
Off I went, trying out different models that were compatible and seemed interesting.
ollama run qwen3.5:0.8b
Trash. "Thinking" takes up the entire output, and it never generates a result for "Why is the sky blue?", "2 + 2 = ?", etc. Lots of "Remember to mention 'blue'..."
ollama run granite4.1:3b
Oh, that's actually impressive. It generated an answer for "Why is the sky blue?" that was reasonably accurate, and the answer was given within ten to fifteen words, which only took a second or so to generate. This … this is usable.
It seems IBM has cooked up something very nice. I like it.
Utilising the GPU
After lots of fiddling around, I just downloaded the install script from Ollama themselves, vetted it, then removed the dnf package and ran the script to install ollama on my system. During install, "NVIDIA GPU detected" or something like it appeared in the output. Yay.
After that, it "just worked" to run models and have ollama utilise my GPU.
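To check that a model really landed on the GPU rather than taking it on faith, the local API's running-models endpoint (`/api/ps`) reports, for each loaded model, its total `size` and how much of that sits in VRAM as `size_vram`. A minimal sketch, assuming the default address and those field names as documented in the Ollama API reference; the helper names are hypothetical.

```python
import json
import urllib.request


def gpu_fractions(ps_reply: dict) -> dict:
    """Map each loaded model name to the fraction of its size held in VRAM."""
    fractions = {}
    for model in ps_reply.get("models", []):
        size = model.get("size", 0)
        if size:  # skip entries with no reported size to avoid dividing by zero
            fractions[model["name"]] = model.get("size_vram", 0) / size
    return fractions


def fetch_running_models(base_url: str = "http://localhost:11434") -> dict:
    """Ask a locally running Ollama server which models are currently loaded."""
    with urllib.request.urlopen(base_url + "/api/ps") as reply:
        return json.load(reply)


# Usage (while a model is loaded, e.g. during `ollama run ...` in another terminal):
# for name, fraction in gpu_fractions(fetch_running_models()).items():
#     print(f"{name}: {fraction:.0%} in VRAM")
```

A fraction of 1.0 means the whole model is in VRAM; anything lower means part of it spilled over to system memory and the CPU.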
ollama run granite4.1:8b
Impressive. Hallucinates on industry-specific details, or on things that don't appear over and over in the training data (like the workings of Emacs packages, or the Go language). Despite that, it provides quick answers, displays impressively cohesive reasoning, and I'd say is a very "usable" model if you have quick questions, need help planning a project, things of that nature.
ollama run nemotron-3-nano:4b
Holy crap, a thinking model you can run locally that isn't trash. Provides quick answers to "Why is the sky blue?", "2 + 2 = ?", etc. without getting stuck. Cohesive reasoning, despite the low parameter count. Whatever NVIDIA is cooking here, it's good.
ollama run rnj-1:8b
Similar to granite4.1, but seems to be more "deterministic". Repeats itself, has "one way" of solving problems and sticks to it, and if it doesn't know how to solve your exact problem it tries to adapt the solution it does know to your problem. Overall, responsive, reasonable, and usable. Lump it in with Granite4.1.
I appreciate the plain-text style output, versus overusing Markdown elements.
ollama run ministral-3:8b
Ew. Too many emojis, too much markdown (in every line?!), but not inaccurate or unusable. Just unlikable, to me.
Impressive to get an 8.9B-parameter model to fit in 8 GB of VRAM.
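Some back-of-the-envelope arithmetic shows why that fit is plausible: the weights take roughly (parameter count × bits per weight ÷ 8) bytes. The quantisation of that particular tag is not stated here, so the 4-bit figure below is an assumption for illustration, and the estimate ignores the context cache and runtime overhead, which also need VRAM.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 8.9B parameters at a few illustrative precisions (the bit widths are assumptions):
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(8.9, bits):.2f} GB")
```

At 16 bits per weight the same model would need about 17.8 GB for weights alone, while at 4 bits it comes to about 4.45 GB, which leaves headroom on an 8 GB card.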
TODO AnythingLLM
I've heard AnythingLLM is an open-source tool oriented towards RAG (retrieval-augmented generation: basically, using larger documents as input to AI, and asking the model specific questions about the document). I'd be very interested if I could feed it the Intel manual, and ask about things like the shadow stack, byte encoding of certain instructions, etc. That could actually be really helpful when writing a compiler, to be able to get specific answers to your questions about how certain hardware works.
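The basic shape of RAG fits in a few lines: split the document into chunks, pick the chunks most relevant to the question, and paste them into the prompt. The toy sketch below scores chunks by plain keyword overlap, which is a stand-in chosen for simplicity; real tools typically use embedding vectors instead, and nothing here reflects how AnythingLLM is actually implemented. All function names and the sample sentences are made up for illustration.

```python
import re


def chunk_text(text: str, max_words: int = 80) -> list:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> set:
    """Lowercased set of alphanumeric words, used for crude overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(question_words & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list, k: int = 2) -> str:
    """Assemble a prompt that hands the model only the retrieved context."""
    context = "\n\n".join(top_chunks(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy stand-ins for chunks of a much larger manual:
chunks = [
    "The shadow stack stores return addresses.",
    "The ModRM byte encodes operands.",
    "Paging translates addresses.",
]
print(build_prompt("How does the shadow stack work?", chunks, k=1))
```

The resulting prompt is what would get sent to the model, so the model only has to read the relevant slice rather than the entire manual.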
I may update this post with more info, or make a new post, depending on how it goes.