Wait, what is Ollama?
Ollama is a lightweight runner that handles the heavy lifting of downloading and serving LLMs locally. Think of it as a package manager for AI models: everything is wrapped up in a simple CLI and a background service.
Most AI today is just a fancy way to send your private thoughts to a corporate server so they can sell them back to you. I don't like that. If I'm going to use an LLM, I want it running on my own hardware, hitting my own GPU, and staying offline. Ollama is the easiest way to do this.
Open-Source vs. Open-Weights
We need to be clear: Llama 3 and Mistral (and others) are not open source. They are "open weights." Meta and Mistral AI give you the final product but keep the recipe (the training data and code) secret. More importantly, they bake in "safety" layers: refusal behaviors that make the model decline to answer questions it deems "harmful."
This "safety" is just bloat. It takes up parameters that could be used for actual reasoning. When a model says "I cannot fulfill this request," it's not because it's incapable, it's because it's been conditioned to be stupid. I prefer Ablated models because they strip this away those refusal behaviors from the brain.
Ablation Makes the model blind to the concept of refusal. It doesn't just stop the model from saying no, it removes the model's ability to even realize it should say no. This results in a much smarter, more compliant model that uses 100% of its parameters for your prompt instead of wasting them on a moral lecture.
Installing Ollama
Shell Script
This is the official method and works on almost any Linux distro. It detects your GPU and sets up everything.
curl -fsSL https://ollama.com/install.sh | sh
Arch Linux
From the extra repository.
sudo pacman -S ollama
Or use the AUR versions for better GPU support.
# For NVIDIA users
yay -S ollama-cuda
# For AMD (ROCm) users
yay -S ollama-rocm
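Whichever route you took, make sure the background service actually came up. A quick sanity check, assuming your distro runs systemd and Ollama is sitting on its default port:

# Both the install script and the Arch packages ship a systemd unit named "ollama"
sudo systemctl enable --now ollama
systemctl status ollama --no-pager

# The daemon listens on localhost:11434; this should answer "Ollama is running"
curl http://localhost:11434/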
Choosing Ablated Models That Rock
Here is a lineup of ablated models you should actually be using:
- OLMo-7B-Ablated: The only one that is actually 100% open source. The Allen Institute released the weights, the code, AND the training data. This is as transparent as it gets.
- DeepSeek-V3-Ablated: Absolute behemoth. It's a 671B-parameter MoE model, so unless you have a server rack heating your bedroom, you'll need a heavily quantized GGUF just to get it loaded at all. But once ablated, it stops talking like a sterile corporate PR rep.
- DeepSeek-R1-Ablated: This one actually thinks before it speaks. Works well when it's ablated, otherwise it has a tendency to overthink itself into a refusal loop the second it realizes your prompt hasn’t been baptized.
- DeepSeek-Coder-V2.5-Ablated: The definitive choice for programming. Great for low-level systems work where standard models would flag your custom kernel module as "malicious."
- DeepSeek-Coder-V2-Lite-Ablated: For when you don't have 80GB of VRAM but still need a model that knows how to code.
- DeepSeek-Math-7B-Ablated: If you need logic proofs without the "I am not a calculator" disclaimer, use this.
- Mistral-7B-v0.3-Ablated: Probably the most reliable 7B model you can get under an Apache 2.0 license. Ablation turns it from a lobotomized customer service bot back into a functional tool.
- Phi-3.5-Mini-Ablated: Microsoft actually released something useful under an MIT license. At 3.8B, it's tiny but doesn't act like it has a lobotomy. Fits perfectly on a minimalist rig.
- SmolLM2-1.7B-Instruct-Ablated: Tiny enough to run on a toaster. It's surprisingly coherent for local-only tasks on weak hardware.
- OpenChat-3.6-Ablated: Based on Mistral but talks more like a human and less like customer service. Great for linguistics.
- StarCoder2-7B-Ablated: If you want a model only for coding, this one won't lecture you when you're writing low-level C code or shell scripts.
- TinyLlama-1.1B-Ablated: The absolute minimalist choice. 1.1 billion parameters could probably run on a toaster, so don't expect it to be too smart.
- Falcon-mamba-7B-Ablated: Most models (Transformers) get slower the longer the conversation gets. Falcon uses the Mamba architecture, great for long contexts without the massive RAM overhead.
- Granite-3.0-8B-Instruct-Ablated: IBM's entry into LLMs. Just like their old ThinkPads: boring but stable.
| Model | License | Best For | Min. VRAM |
|---|---|---|---|
| DeepSeek-V3 (Full) | DeepSeek | SOTA Generalist | 350GB+ (Q4) |
| DeepSeek-R1 (Distill) | MIT | Complex Reasoning | 8GB+ |
| DeepSeek-Coder-V2.5 | DeepSeek | Systems Programming | 10GB+ |
| DeepSeek-Math-7B | DeepSeek | Math / Logic | 5GB |
| Mistral-7B-v0.3 | Apache 2.0 | Daily Driver | 5GB |
| OLMo-7B | Apache 2.0 | Transparency | 5GB |
| Phi-3.5-Mini | MIT | Fast Logic | 3GB |
| Falcon-mamba-7B | Apache 2.0 | Long Context | 5GB |
| Granite-3.0-8B | Apache 2.0 | Documentation | 6GB |
| StarCoder2-7B | Apache 2.0 | Pure Coding | 5GB |
| OpenChat-3.6 | Apache 2.0 | Creative/Prose | 5GB |
| SmolLM2-1.7B | Apache 2.0 | Weak Hardware | 1.5GB |
| TinyLlama-1.1B | Apache 2.0 | I don't even know | 0.8GB |
Using them in Ollama
To get whichever one you like, the command is ollama pull, and we are pulling from community libraries.
# Heavyweights
ollama pull hurricane/deepseek-v3-ablated
ollama pull mradermacher/DeepSeek-R1-Distill-Llama-8B-Abliterated-GGUF

# Coding & Math
ollama pull solidrust/DeepSeek-Coder-V2.5-Instruct-Abliterated
ollama pull mradermacher/DeepSeek-Coder-V2-Lite-Instruct-Abliterated-GGUF
ollama pull mradermacher/DeepSeek-Math-7B-Instruct-Abliterated-GGUF
ollama pull mradermacher/starcoder2-7b-Abliterated-GGUF

# General Purpose
ollama pull mradermacher/Mistral-7B-v0.3-Abliterated-GGUF
ollama pull sethu4321/olmo-7b-instruct-abliterated
ollama pull mradermacher/granite-3.0-8b-instruct-Abliterated-GGUF
ollama pull mradermacher/openchat-3.6-8b-20240522-Abliterated-GGUF

# Efficient & Minimal
ollama pull opentext/phi-3.5-mini-instruct-abliterated
ollama pull mradermacher/Falcon-Mamba-7B-Instruct-Abliterated-GGUF
ollama pull mradermacher/SmolLM2-1.7B-Instruct-Abliterated-GGUF
ollama pull mradermacher/tinyllama-1.1b-1.0-Abliterated-GGUF
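Pulling a model doesn't lock you into its defaults. A Modelfile lets you bake in your own system prompt and sampling parameters and save the result under a new local tag. A minimal sketch, using the Mistral ablation from above as the base; the FROM name and the "mistral-ablated" tag are placeholders, so use whatever ollama list shows on your machine:

# Layer your own defaults on top of a pulled model (FROM name is a placeholder)
cat > Modelfile <<'EOF'
FROM mradermacher/Mistral-7B-v0.3-Abliterated-GGUF
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
SYSTEM """You run fully offline on my own hardware. Answer directly and skip the disclaimers."""
EOF

# Bake it into a new local tag, then run it like any other model
ollama create mistral-ablated -f Modelfile
ollama run mistral-ablated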
Ollama Cheatsheet
If you're already familiar with Docker, most of this will feel like muscle memory. If not, these are the only commands you actually need to care about.
- List your models:
  ollama list
  Shows everything you've pulled and how much disk space they're eating (see the disk-usage snippet right after this list).
- Remove a model:
  ollama rm [name]
  Deletes the model weights. Good for when you're done testing a specific model and want your storage back.
- Check what's currently running:
  ollama ps
  Very useful to see which model is currently sitting in your VRAM and how much memory it's hogging.
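If you want to see where that disk space actually went, the weights sit in Ollama's model store. These are the usual default paths on Linux (the OLLAMA_MODELS environment variable can move them), so treat them as a starting point:

# Per-user install keeps models here
du -sh ~/.ollama/models

# The systemd service runs as its own "ollama" user and stores them here instead
sudo du -sh /usr/share/ollama/.ollama/models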
Execution
- Run a model:
  ollama run [name]
  If the model isn't downloaded, it will try to pull it first.
- Quit a model:
  /bye
  To literally say bye and quit the interactive chat.
- Stop a model:
  ollama stop [name]
  To stop the model from running and free the memory it was holding.
- The API: Ollama runs a server on localhost:11434 by default. You can hit this with curl or point your terminal clients at it (see the sketch below).
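Here's what hitting that API looks like straight from the shell. The model name below is just the local tag from earlier, so swap in whatever you actually pulled:

# One-shot completion against the local API (streaming off so the output is one JSON blob)
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral-ablated",
  "prompt": "Write an awk one-liner that prints the second column of a file.",
  "stream": false
}'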
Keep using the terminal
You might encounter some chud telling you to install "Open WebUI" or something. Why? You're already in the terminal. If you want a "nice" interface, use a terminal-based client like oterm (quick install sketch below), or just make your terminal look good.
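For the record, oterm is a Python TUI that talks to the same localhost:11434 API. Assuming it's still published on PyPI, pipx keeps it out of your system Python:

# Install the oterm TUI client and point it at the local Ollama daemon
pipx install oterm
oterm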
That’s about it. Just remember that no matter how intelligent it seems, it’s still just a fancy autocomplete. Use it as a tool, but don't let it tell you how to live your life.