Wait, what is Ollama?
Ollama is a lightweight runner that handles the heavy lifting of downloading and serving LLMs locally. Think of it as a package manager for AI models: everything is wrapped up in a simple CLI and a background service.
Most AI today is just a fancy way to send your private thoughts to a corporate server so they can sell them back to you. I don't like that. If I'm going to use an LLM, I want it running on my own hardware, hitting my own GPU, and staying offline. Ollama is the easiest way to do this.
Open-Source vs. Open-Weights
We need to be clear: Llama 3 and Mistral (and others) are not open source. They are "open weights." Meta and Mistral AI give you the final product but keep the recipe (the training data and code) secret. More importantly, they bake in "safety" layers: refusal behaviors that make the model decline to answer questions it deems "harmful."
This "safety" is just bloat. It takes up parameters that could be used for actual reasoning. When a model says "I cannot fulfill this request," it's not because it's incapable, it's because it's been conditioned to be stupid. I prefer Ablated models because they strip this away those refusal behaviors from the brain.
Ablation Makes the model blind to the concept of refusal. It doesn't just stop the model from saying no, it removes the model's ability to even realize it should say no. This results in a much smarter, more compliant model that uses 100% of its parameters for your prompt instead of wasting them on a moral lecture.
Installing Ollama
Shell Script
This is the official method and works on almost any Linux distro. It detects your GPU and sets up everything.
curl -fsSL https://ollama.com/install.sh | sh
Arch Linux
From the extra repository.
sudo pacman -S ollama
Or use the AUR versions for better GPU support.
# For NVIDIA users
yay -S ollama-cuda
# For AMD (ROCm) users
yay -S ollama-rocm
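Whichever route you took, make sure the background service actually came up. A quick sanity check, assuming your distro runs systemd and Ollama is sitting on its default port:

# Both the install script and the Arch packages ship a systemd unit named "ollama"
sudo systemctl enable --now ollama
systemctl status ollama --no-pager

# The daemon listens on localhost:11434; this should answer "Ollama is running"
curl http://localhost:11434/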
Choosing Ablated Models That Rock
Here is a lineup of ablated models you should actually be using:
- OLMo-7B-Ablated: The only one that is actually 100% open source. The Allen Institute released the weights, the code, AND the training data. This is as transparent as it gets.
- DeepSeek-V3-Ablated: Absolute behemoth. It's a 671B-parameter MoE model, so unless you have a server rack heating your bedroom, you'll need a heavily quantized GGUF just to get it loaded at all. But once ablated, it stops talking like a sterile corporate PR rep.
- DeepSeek-R1-Ablated: This one actually thinks before it speaks. Works well when it's ablated, otherwise it has a tendency to overthink itself into a refusal loop the second it realizes your prompt hasn’t been baptized.
- DeepSeek-Coder-V2.5-Ablated: The definitive choice for programming. Great for low-level systems work where standard models would flag your custom kernel module as "malicious."
- DeepSeek-Coder-V2-Lite-Ablated: For when you don't have 80GB of VRAM but still need a model that knows how to code.
- DeepSeek-Math-7B-Ablated: If you need logic proofs without the "I am not a calculator" disclaimer, use this.
- Mistral-7B-v0.3-Ablated: Probably the most reliable 7B model you can get under an Apache 2.0 license. Ablation turns it from a lobotomized customer service bot back into a functional tool.
- Phi-3.5-Mini-Ablated: Microsoft actually released something useful under an MIT license. At 3.8B, it's tiny but doesn't act like it has a lobotomy. Fits perfectly on a minimalist rig.
- SmolLM2-1.7B-Instruct-Ablated: Tiny enough to run on a toaster. It's surprisingly coherent for local-only tasks on weak hardware.
- OpenChat-3.6-Ablated: Based on Mistral but talks more like a human and less like customer service. Great for linguistics.
- StarCoder2-7B-Ablated: If you want a model only for coding, this one won't lecture you when you're writing low-level C code or shell scripts.
- TinyLlama-1.1B-Ablated: The absolute minimalist choice. 1.1 billion parameters could probably run on a toaster, so don't expect it to be too smart.
- Falcon-mamba-7B-Ablated: Most models (Transformers) get slower the longer the conversation gets. Falcon uses the Mamba architecture, great for long contexts without the massive RAM overhead.
- Granite-3.0-8B-Instruct-Ablated: IBM's entry into LLMs. Just like their old ThinkPads: boring but stable.
| Model | License | Best For | Min. VRAM |
|---|---|---|---|
| DeepSeek-V3 (Full) | DeepSeek | SOTA Generalist | 350GB+ (Q4) |
| DeepSeek-R1 (Distill) | MIT | Complex Reasoning | 8GB+ |
| DeepSeek-Coder-V2.5 | DeepSeek | Systems Programming | 10GB+ |
| DeepSeek-Math-7B | DeepSeek | Math / Logic | 5GB |
| Mistral-7B-v0.3 | Apache 2.0 | Daily Driver | 5GB |
| OLMo-7B | Apache 2.0 | Transparency | 5GB |
| Phi-3.5-Mini | MIT | Fast Logic | 3GB |
| Falcon-mamba-7B | Apache 2.0 | Long Context | 5GB |
| Granite-3.0-8B | Apache 2.0 | Documentation | 6GB |
| StarCoder2-7B | Apache 2.0 | Pure Coding | 5GB |
| OpenChat-3.6 | Apache 2.0 | Creative/Prose | 5GB |
| SmolLM2-1.7B | Apache 2.0 | Weak Hardware | 1.5GB |
| TinyLlama-1.1B | Apache 2.0 | I don't even know | 0.8GB |
Using them in Ollama
To get whichever one you like, the command is ollama pull, and we are pulling from community libraries.
# Heavyweights
ollama pull hurricane/deepseek-v3-ablated
ollama pull mradermacher/DeepSeek-R1-Distill-Llama-8B-Abliterated-GGUF

# Coding & Math
ollama pull solidrust/DeepSeek-Coder-V2.5-Instruct-Abliterated
ollama pull mradermacher/DeepSeek-Coder-V2-Lite-Instruct-Abliterated-GGUF
ollama pull mradermacher/DeepSeek-Math-7B-Instruct-Abliterated-GGUF
ollama pull mradermacher/starcoder2-7b-Abliterated-GGUF

# General Purpose
ollama pull mradermacher/Mistral-7B-v0.3-Abliterated-GGUF
ollama pull sethu4321/olmo-7b-instruct-abliterated
ollama pull mradermacher/granite-3.0-8b-instruct-Abliterated-GGUF
ollama pull mradermacher/openchat-3.6-8b-20240522-Abliterated-GGUF

# Efficient & Minimal
ollama pull opentext/phi-3.5-mini-instruct-abliterated
ollama pull mradermacher/Falcon-Mamba-7B-Instruct-Abliterated-GGUF
ollama pull mradermacher/SmolLM2-1.7B-Instruct-Abliterated-GGUF
ollama pull mradermacher/tinyllama-1.1b-1.0-Abliterated-GGUF
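Pulling a model doesn't lock you into its defaults. A Modelfile lets you bake in your own system prompt and sampling parameters and save the result under a new local tag. A minimal sketch, using the Mistral ablation from above as the base; the FROM name and the "mistral-ablated" tag are placeholders, so use whatever ollama list shows on your machine:

# Layer your own defaults on top of a pulled model (FROM name is a placeholder)
cat > Modelfile <<'EOF'
FROM mradermacher/Mistral-7B-v0.3-Abliterated-GGUF
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
SYSTEM """You run fully offline on my own hardware. Answer directly and skip the disclaimers."""
EOF

# Bake it into a new local tag, then run it like any other model
ollama create mistral-ablated -f Modelfile
ollama run mistral-ablated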
Ollama Cheatsheet
If you're already familiar with Docker, most of this will feel like muscle memory. If not, these are the only commands you actually need to care about.
- List your models:
  ollama list
  Shows everything you've pulled and how much disk space they're eating (see the disk-usage snippet right after this list).
- Remove a model:
  ollama rm [name]
  Deletes the model weights. Good for when you're done testing a specific model and want your storage back.
- Check what's currently running:
  ollama ps
  Very useful to see which model is currently sitting in your VRAM and how much memory it's hogging.
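If you want to see where that disk space actually went, the weights sit in Ollama's model store. These are the usual default paths on Linux (the OLLAMA_MODELS environment variable can move them), so treat them as a starting point:

# Per-user install keeps models here
du -sh ~/.ollama/models

# The systemd service runs as its own "ollama" user and stores them here instead
sudo du -sh /usr/share/ollama/.ollama/models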
Execution
- Run a model:
  ollama run [name]
  If the model isn't downloaded, it will try to pull it first.
- Quit a model:
  /bye
  To literally say bye and quit the interactive chat.
- Stop a model:
  ollama stop [name]
  To stop the model from running and free the memory it was holding.
- The API: Ollama runs a server on localhost:11434 by default. You can hit this with curl or point your terminal clients at it (see the sketch below).
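Here's what hitting that API looks like straight from the shell. The model name below is just the local tag from earlier, so swap in whatever you actually pulled:

# One-shot completion against the local API (streaming off so the output is one JSON blob)
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral-ablated",
  "prompt": "Write an awk one-liner that prints the second column of a file.",
  "stream": false
}'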
Keep using the terminal
You might encounter some chud telling you to install "Open WebUI" or something. Why? You're already in the terminal. If you want a "nice" interface, use a terminal-based client like oterm (quick install sketch below), or just make your terminal look good.
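For the record, oterm is a Python TUI that talks to the same localhost:11434 API. Assuming it's still published on PyPI, pipx keeps it out of your system Python:

# Install the oterm TUI client and point it at the local Ollama daemon
pipx install oterm
oterm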
That’s about it. Just remember that no matter how intelligent it seems, it’s still just a fancy autocomplete. Use it as a tool, but don't let it tell you how to live your life.