 and left behind a graveyard of shims."
---

K8s needed Docker. Then it didn't. The breakup took five years and left behind a graveyard of shims.

---

## Before the divorce: why they married

In 2014, "container" meant Docker. For practical purposes there was nothing else. K8s launched with Docker API calls hardcoded into kubelet. No abstraction. No interface. `kubelet` knew how to talk to Docker the same way your code knows how to talk to `console.log`: directly, with no layer in between.

This worked. Docker was the only runtime. Why build an abstraction for one implementation?

---

## The affair: rkt shows up

In 2015, CoreOS released rkt (pronounced "rocket"). A competing container runtime. Different architecture. And it wanted to plug into K8s.

K8s had a problem. Supporting rkt meant writing a second set of hardcoded calls in kubelet. Then if a third runtime appeared, a third set. Every new runtime meant more hardcoded integration code in kubelet.

So K8s did what any engineer does when two implementations exist: define an interface. They called it **CRI** — Container Runtime Interface. A gRPC API. kubelet would speak CRI. Any runtime that understood CRI could plug in. Clean.

rkt implemented CRI. containerd implemented CRI. Docker didn't.

rkt won the battle: it forced CRI into existence. But it lost the war. containerd had Docker's ecosystem behind it. CRI-O had Red Hat. rkt had CoreOS, and CoreOS got acquired by Red Hat in 2018. Red Hat already had CRI-O. rkt became redundant. CNCF archived the project in 2019. Dead.

---

## Docker never implemented CRI

Docker had its own orchestrator: **Swarm**. Same job as K8s, simpler interface. `docker swarm init` and you had a cluster. `docker service scale web=5` and you had five replicas. No YAML walls, no CRD, no etcd.

Docker never implemented CRI. Why? Docker never said. What's clear is that Docker and K8s were competing orchestrators at the time, and CRI was K8s's standard.

---

## K8s blinks first: dockershim

K8s couldn't drop Docker. In 2016, Docker was the only runtime most clusters knew. Dropping support meant losing the entire user base.

So K8s wrote the translator itself. `dockershim` — a shim that converted CRI calls into Docker API calls. It lived inside the K8s codebase. K8s maintained it. K8s paid the cost. Every CRI change meant updating dockershim too.

> **Era 0** (2014–2016): Hardcoded Docker calls in kubelet. No interface.
>
> **Era 1** (2016+): CRI exists. Docker doesn't speak it. K8s writes dockershim.
>
> `kubelet → dockershim → Docker Engine → containerd → runc → container`
>
> K8s doesn't need the Docker Engine layer.

The irony: Docker Engine internally used containerd to manage containers and runc to create them. K8s was talking to Docker, who was talking to containerd. The middleman added latency for zero value.

K8s defined a standard, the biggest player refused to follow it, and K8s wrote a shim to accommodate. That shim haunted the codebase for six years.

---

## The breakup, in four steps

### containerd 1.0 — bypass Docker

Someone asked: "Why not talk to containerd directly?" K8s built `CRI-containerd`, a standalone daemon that translated CRI calls to containerd.

> **Era 2**: Bypass Docker, but add another daemon.
>
> `kubelet → CRI-containerd (standalone daemon) → containerd → runc → container`

Docker Engine gone. But now there's a new process to deploy and maintain.

### containerd 1.1 — absorb the translator

containerd added CRI support as a built-in plugin. The standalone daemon merged into containerd itself.

> **Era 3**: Clean path.
>
> `kubelet → containerd (built-in CRI plugin) → runc → container`

One call. No middlemen. This is what most clusters run today.

### CRI-O — Red Hat's alternative

Red Hat built CRI-O from scratch. Purpose-built for K8s. Nothing extra. No Docker legacy. Speaks CRI on top, speaks OCI on bottom.

> Alternative path: `kubelet → CRI-O → runc → container`

OpenShift uses CRI-O by default. It does less than containerd, which is the point.

### K8s 1.24 (2022) — dockershim deleted

K8s removed dockershim from its codebase. containerd became the de facto default runtime. Running containers worked fine. But something else broke.

**The collateral damage: docker.sock**

Before 1.24, every Node ran Docker Engine. Docker Engine exposed `/var/run/docker.sock`. Many CI pipelines — Jenkins, GitLab Runner, Tekton — ran as Pods inside the cluster and mounted that socket to build images on the Node's Docker.

Before 1.24, Node had two processes:
- `Docker Engine` → `/var/run/docker.sock` ← CI Pods mounted this
- `containerd` → `/run/containerd/containerd.sock` ← Docker used this internally

After 1.24, Node has one:
- `containerd` → `/run/containerd/containerd.sock` ← kubelet talks directly to this
- Docker Engine → gone
- docker.sock → gone

containerd has its own socket. But `docker build` doesn't speak containerd's API. It only talks to Docker's socket. Different program, different protocol.

> `docker build` → looks for `/var/run/docker.sock` → file doesn't exist → fails

The problem was never "no socket." It was "Docker is gone from this Node, and docker CLI can't talk to anything else."
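A quick way to see which situation a Node is in is to look for the runtime sockets. The sketch below is illustrative only: `list_runtime_sockets` and its optional root-prefix argument are hypothetical, and the paths are the usual defaults for Docker, containerd, and CRI-O, which a given distribution may change.

```bash
# Hypothetical helper: report which container runtime sockets exist on a node.
# $1 is an optional root prefix (illustrative only, handy for dry runs
# against a scratch directory).
list_runtime_sockets() {
  root="${1:-}"
  for sock in /var/run/docker.sock \
              /run/containerd/containerd.sock \
              /var/run/crio/crio.sock; do
    # -e only checks that the path exists; -S would also require a real socket.
    if [ -e "${root}${sock}" ]; then
      echo "present ${sock}"
    else
      echo "missing ${sock}"
    fi
  done
}

list_runtime_sockets
```

On a post-1.24 Node running containerd, only the containerd line should read `present`, which is exactly the case where `docker build` has nothing to talk to.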

**Three ways out:**

1. **Install Docker Engine plus cri-dockerd** → Docker Engine comes back → docker.sock returns → pipeline unchanged
2. **Switch to kaniko** → builds images in user space → no daemon, no socket, no privileged mode
3. **Switch to buildah** → same idea as kaniko, Red Hat's version

kaniko and buildah gained adoption as daemon-less alternatives. cri-dockerd kept the lights on for teams that couldn't rewrite their pipelines overnight. But the call chain tells you why it's a dead end:

> `kubelet → cri-dockerd → Docker Engine → containerd → runc`

Three hops to reach containerd. The mainstream path reaches it in one.
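For contrast, here is roughly what the kaniko route looks like. This is a sketch, not a drop-in pipeline: the image name and flags are taken from kaniko's README as I understand it, and the Git repository is a placeholder. Check the project's current status and documentation before relying on it.

```bash
# Write a Pod manifest that builds an image with kaniko instead of docker.sock.
# The Git repository below is a placeholder; substitute your own build context.
cat > kaniko-build.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - "--context=git://github.com/example-org/example-app.git"
        - "--dockerfile=Dockerfile"
        - "--no-push"   # build only; use --destination=<registry>/<image>:<tag> to push
EOF

# Apply with: kubectl apply -f kaniko-build.yaml
```

Note what is absent: no `hostPath` volume for a socket and no privileged security context. The build happens inside the Pod's own container.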

---

## The three layers

Container management splits into three layers. Each layer has an interface. Each interface has multiple implementations.

```
High-level management     High-level runtime      Low-level runtime
(who gives orders)        (who manages lifecycle)  (who creates containers)
                │                    │                       │
                │       CRI          │          OCI          │
                │    (gRPC API)      │      (spec + API)     │
                ▼                    ▼                       ▼
Kubernetes ─────────→ containerd ──────────→ runc ─────→ container
crictl          ────→ CRI-O    ──────────→ kata-runtime → VM + container
docker          ────→ Docker   ──────────→ gVisor (runsc) → sandbox + container
podman          ────→ libpod
```

Every name in the left column:

- **Kubernetes / kubelet** — talks CRI to whatever runtime is configured.
- **crictl** — CLI debugging tool for CRI runtimes. Think of it as "`docker ps` for CRI runtimes." Speaks CRI directly. `crictl ps` shows K8s containers that `docker ps` can't see. Maintained in the kubernetes-sigs `cri-tools` project and usually installed on Nodes alongside kubeadm.
- **docker** — the Docker CLI. Talks to Docker Engine (dockerd) over Docker's own API. Nothing to do with CRI.
- **podman** — Red Hat's Docker replacement for local development. Same CLI (`podman build`, `podman run`). **Daemonless** — each command runs its own process. Uses **libpod** as its container library. Not a K8s runtime. Not CRI. A developer tool for running containers without Docker.

### CRI — the left interface

Container Runtime Interface. A gRPC API that kubelet speaks. Any high-level runtime that implements CRI can plug into K8s.

CRI only asks the minimum. It doesn't care what's behind the interface — Linux namespace, VM, sandbox. As long as you can "create an isolated environment, run a process, report status," you're a valid CRI implementation. That's why virtlet could run VMs pretending to be Pods. kubelet never asked "are you a container or a VM?" because CRI doesn't have that question.

The API splits into two gRPC services:

**RuntimeService** (Pod and container lifecycle):
- `RunPodSandbox` → create the sandbox (network namespace, etc.)
- `StopPodSandbox` → stop it
- `CreateContainer` → create a container inside a sandbox
- `StartContainer` → start it
- `StopContainer` → stop it
- `ListContainers` → what's running?
- `ContainerStatus` → what state is it in?
- `ExecSync` → run a command inside a container
- `Attach` → attach stdin/stdout
- `PortForward` → forward a port

**ImageService** (image management):
- `PullImage` → pull from registry
- `ListImages` → what's cached?
- `RemoveImage` → delete from cache
- `ImageStatus` → size, digest, etc.

Full protobuf definition: [kubernetes/cri-api/api.proto](https://github.com/kubernetes/cri-api/blob/v0.33.1/pkg/apis/runtime/v1/api.proto)

kubelet doesn't care if it's talking to containerd or CRI-O. It calls `RunPodSandbox`, `CreateContainer`, `StartContainer`. The runtime handles the rest.
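The same sequence can be driven by hand with `crictl`, which makes the RPCs visible. A minimal sketch, assuming a Node with a running CRI runtime and root access: the config files follow the shape used in the Kubernetes crictl debugging guide, while `cri_lifecycle_demo`, the sandbox name, and the UID are made up for illustration.

```bash
# Work in a scratch directory so the sample configs do not clutter anything.
workdir="$(mktemp -d)"

# Sandbox config: the argument to RunPodSandbox.
cat > "$workdir/pod-config.json" <<'EOF'
{
  "metadata": {
    "name": "demo-sandbox",
    "namespace": "default",
    "attempt": 1,
    "uid": "demo-sandbox-uid-0001"
  },
  "log_directory": "/tmp",
  "linux": {}
}
EOF

# Container config: the argument to CreateContainer.
cat > "$workdir/container-config.json" <<'EOF'
{
  "metadata": { "name": "busybox" },
  "image": { "image": "busybox" },
  "command": ["top"],
  "log_path": "busybox.0.log",
  "linux": {}
}
EOF

# Hypothetical wrapper: the call order kubelet uses, issued through crictl.
# Needs root and a running CRI runtime; it is only defined here, not called.
cri_lifecycle_demo() {
  crictl pull busybox                                   # ImageService.PullImage
  pod_id="$(crictl runp "$workdir/pod-config.json")"    # RunPodSandbox
  ctr_id="$(crictl create "$pod_id" \
    "$workdir/container-config.json" "$workdir/pod-config.json")"  # CreateContainer
  crictl start "$ctr_id"                                # StartContainer
  crictl ps --pod "$pod_id"                             # ListContainers
}
```

Running `cri_lifecycle_demo` against containerd or CRI-O should behave the same way, since both sit behind the same interface.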

### OCI — the right interface

Open Container Initiative. Two specs:

**Image spec** — how container images are packaged. Layers, manifests, digests. Docker invented this format. Then donated it to OCI. `docker pull nginx` and `ctr image pull docker.io/library/nginx:latest` get the same bytes (`ctr` needs the fully qualified reference). Docker's manifest v2 and OCI's manifest have minor structural differences. Registries and runtimes handle both transparently.

**Runtime spec** — how containers are created. A standard directory layout: `config.json` (namespaces, cgroups, mounts, env vars) plus a `rootfs` (the filesystem). Any low-level runtime that reads this format can run the container.

containerd doesn't care if it calls runc or kata-runtime. It passes an OCI bundle. The low-level runtime reads `config.json`, sets up isolation, and starts the process.
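To make the bundle concrete, the sketch below lays one out by hand. `make_bundle_skeleton` and the container name `demo-container` are hypothetical. `runc spec` and `runc run --bundle` are real runc commands, but they only run here if runc is installed, and the root filesystem step is left as a comment because it depends on what tooling you have.

```bash
# Hypothetical helper: lay out an empty OCI bundle directory.
make_bundle_skeleton() {
  mkdir -p "$1/rootfs"
}

bundle="$(mktemp -d)/demo-bundle"
make_bundle_skeleton "$bundle"

# With runc installed, `runc spec` writes a default config.json into the bundle.
if command -v runc >/dev/null 2>&1; then
  (cd "$bundle" && runc spec) || echo "runc spec failed"
fi

# rootfs still needs a filesystem, for example exported from an existing image:
#   docker export "$(docker create busybox)" | tar -C "$bundle/rootfs" -xf -
# Then, as root:
#   runc run --bundle "$bundle" demo-container
ls "$bundle"
```

This is the hand-built version of what containerd prepares before it invokes the low-level runtime.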

---

## Every name explained

### containerd

High-level runtime. Pulls images, manages snapshots (filesystem layers), calls runc to create containers, monitors their lifecycle. Doesn't create containers itself — delegates to a low-level runtime.

containerd started inside Docker. Docker's monolith got too big, so Docker split itself into pieces. containerd was one of them. In 2017 Docker donated it to CNCF (the same foundation that hosts K8s). In 2019 it graduated as an independent project. Now it belongs to nobody. Docker uses it. K8s uses it. Neither owns it.

### runc

Low-level runtime. The default. Takes an OCI bundle, sets up Linux namespaces and cgroups, pivots into the bundle's root filesystem, then exec's the container process. Finishes in milliseconds. Then exits.

That's the key: **runc exits after creating the container.** It's a one-shot tool. Start the container, leave. Someone else needs to babysit.

### containerd-shim

runc exits after creating the container. shim stays as the container's parent process. containerd can crash, restart, upgrade — shim keeps the container alive. Uses a double fork trick to reparent to PID 1 (systemd), cutting the process tree link to containerd.

v2 (`containerd-shim-runc-v2`) changed from per-container to per-Pod, gRPC to ttrpc, and added pluggable runtime naming.

Deep dive: [Inside containerd: How shim and runc Actually Work](/v2/containerd-shim-and-runc)

### CRI-O

High-level runtime. Red Hat's answer to containerd. Built specifically for K8s. No Docker legacy, no image build, no extras. Speaks CRI, calls OCI runtimes. Versions match K8s versions (CRI-O 1.28 for K8s 1.28).

### dockershim (dead)

Translator that lived in K8s source code. Converted CRI calls to Docker API calls. Removed in K8s 1.24. If you still need Docker as a runtime, use `cri-dockerd` instead.

### cri-dockerd (the afterlife of dockershim)

Mirantis pulled dockershim's code out of K8s's repo, renamed it `cri-dockerd`, and maintained it independently. Same translation logic. Different address. See [K8s 1.24 section](#k8s-124-2022--dockershim-deleted) for why it exists and why most teams skip it.

### The full routing picture

Three paths into runc. Every one ends at the same place.

```
                    ┌─ containerd (CRI plugin) → runc → container    ← mainstream
                    │
kubelet ── CRI ─────┤─ CRI-O → runc → container                     ← OpenShift
                    │
                    └─ cri-dockerd → Docker → containerd → runc      ← legacy detour
```
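To find out which of these paths a cluster actually uses, the runtime string each Node reports is usually enough. A small sketch: `runtime_path` and `annotate_nodes` are hypothetical helpers, and reading `docker://` as "cri-dockerd" assumes a cluster on 1.24 or later (before that it meant dockershim).

```bash
# Hypothetical helper: map a node's containerRuntimeVersion string
# (for example "containerd://1.7.13") to one of the routing paths.
runtime_path() {
  case "$1" in
    containerd://*) echo "kubelet -> containerd (CRI plugin) -> runc" ;;
    cri-o://*)      echo "kubelet -> CRI-O -> runc" ;;
    docker://*)     echo "kubelet -> cri-dockerd -> Docker -> containerd -> runc" ;;
    *)              echo "unknown runtime: $1" ;;
  esac
}

# Read name and runtime for every node, then annotate each with its path.
annotate_nodes() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' |
    while read -r name runtime; do
      echo "$name: $(runtime_path "$runtime")"
    done
}
```

`kubectl get nodes -o wide` shows the same string in its CONTAINER-RUNTIME column if you would rather eyeball it.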

---

## Low-level runtime alternatives

runc creates real Linux containers. But containers share the host kernel. A kernel exploit in one container can escape to the host.

Before gVisor and kata, people had two ways to harden containers:

**Kernel-level filtering** (seccomp, AppArmor, SELinux):
- Don't change the runtime. Restrict which syscalls a container can make (seccomp) and which files and capabilities it can touch (AppArmor, SELinux).
- runc already supports this: seccomp profiles ship with Docker and containerd by default.
- Block mount, reboot, ptrace. Lightweight. Negligible overhead: the BPF filter runs on every syscall, but it costs nanoseconds.
- Problem: the attack surface is still the host kernel itself. A kernel bug bypasses all filters. You're locking doors in a house with no walls.

**Full VM isolation** (KVM):
- Run each container inside a traditional VM. Separate kernel. Total isolation.
- Problem: too heavy. A traditional VM takes seconds to boot and gigabytes of memory. That defeats the whole point of lightweight containers.

Filters were too thin. VMs were too thick. gVisor and kata found the middle ground.

### gVisor (runsc)

Google's approach. Intercepts system calls with a user-space kernel called Sentry. The container thinks it's talking to Linux. It's actually talking to gVisor, which filters and re-implements syscalls before forwarding a subset to the real kernel.

- Normal container: `app → syscall → host kernel`
- gVisor container: `app → syscall → Sentry (user-space) → filtered syscall → host kernel`

Safer. Slower. The syscall interception adds overhead. Good for running untrusted code.

### kata-runtime

Creates a real lightweight VM for each container. The container runs inside its own kernel. Full isolation. A kernel exploit stays inside the VM.

- Normal container: `app → shared host kernel`
- kata container: `app → guest kernel (inside VM) → host kernel`

Safest. Heaviest. Requires hardware virtualization support. Used when you absolutely cannot share a kernel.

Both are OCI-compatible. Swap them into containerd or CRI-O through a Kubernetes `RuntimeClass`. The workload doesn't know the difference.
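On the Kubernetes side the swap is two small objects. The sketch below assumes the Node's runtime has already been configured with a handler named `runsc` for gVisor. That handler name, the class name `gvisor`, and the Pod name are illustrative, so match them to your own runtime configuration.

```bash
# RuntimeClass maps a name to a handler configured in the node's CRI runtime.
# "runsc" is assumed to be the handler name set up for gVisor on the node.
cat > runtimeclass-demo.yaml <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-nginx
spec:
  runtimeClassName: gvisor   # omit this line and the Pod uses the default runtime
  containers:
    - name: nginx
      image: nginx
EOF

# Apply with: kubectl apply -f runtimeclass-demo.yaml
```

Pointing `handler` at a kata handler instead should be the only change needed to move the same Pod into a lightweight VM.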

---

## containerd namespace isolation

containerd isolates K8s workloads from Docker workloads using namespaces (not K8s namespaces — containerd's own concept).

```bash
$ sudo ctr namespaces ls
NAME    LABELS
k8s.io          # kubelet's containers
moby            # Docker's containers
```

- `kubelet → containerd → containerd-shim (k8s.io namespace) → runc → container`
- `docker → containerd → containerd-shim (moby namespace) → runc → container`

Two clients sharing one containerd. They can't see each other's containers. `docker ps` won't show K8s pods. `crictl ps` won't show Docker containers. Same runtime, isolated worlds.
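You can see the split from containerd's side by listing each namespace separately. A rough sketch: `count_by_namespace` is a hypothetical helper, and it assumes `ctr` can reach containerd's socket, which normally means running as root on the Node.

```bash
# Hypothetical helper: count containers per containerd namespace.
# ctr needs access to containerd's socket, so run this as root on a node.
count_by_namespace() {
  for ns in k8s.io moby; do
    # -q prints one container ID per line, so counting lines counts containers.
    n="$(ctr --namespace "$ns" containers list -q | wc -l | tr -d ' ')"
    echo "$ns: $n"
  done
}
```

On a Node that runs both kubelet and Docker, each namespace reports its own count, and neither client's tooling lists the other's containers.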

---

## The timeline

| Year | Event |
|---|---|
| 2014 | K8s launches. Docker hardcoded in kubelet. No interface |
| 2015 | CoreOS releases rkt. Wants to plug into K8s. Can't — Docker is hardcoded |
| 2016 | K8s defines CRI. rkt and containerd implement it. Docker doesn't. K8s writes dockershim |
| 2017 | containerd 1.0. Docker donates it to CNCF. CRI-containerd runs as standalone daemon |
| 2018 | containerd 1.1. CRI plugin built in. Clean kubelet → containerd path |
| 2019 | CRI-O matures. Red Hat ships it with OpenShift. Docker adds K8s support to Docker Desktop — Swarm has lost |
| 2020 | K8s announces dockershim deprecation. Panic ensues |
| 2022 | K8s 1.24. dockershim removed. docker.sock disappears from Nodes. CI pipelines break. kaniko and buildah gain adoption |

Eight years from "Docker is everything" to "Docker is optional." The tools changed. The lesson didn't: interfaces (CRI, OCI) outlive implementations. That's why K8s defined them.

---

## References

- [CRI API protobuf definition (RuntimeService, ImageService)](https://github.com/kubernetes/cri-api/blob/v0.33.1/pkg/apis/runtime/v1/api.proto)
- [K8s official: Container Runtime Interface (CRI)](https://kubernetes.io/docs/concepts/architecture/cri/)
- [K8s blog: Introducing CRI (2016, architecture diagrams)](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
- [K8s blog: Dockershim Historical Context (2022, full story)](https://kubernetes.io/blog/2022/05/03/dockershim-historical-context/)
- [K8s blog: Don't Panic — Kubernetes and Docker (2020)](https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/)
- [K8s blog: Dockershim Removal FAQ (2022)](https://kubernetes.io/blog/2022/02/17/dockershim-faq/)
- [Alibaba Cloud: Container Runtime evolution (before/after architecture diagrams)](https://www.alibabacloud.com/blog/a-discussion-on-container-runtime---starting-with-dockershim-being-deleted-by-kubernetes_600118)
- [K8s official: Container Runtimes setup guide](https://kubernetes.io/docs/setup/production-environment/container-runtimes/)
- [containerd components: the role of containerd-shim-runc-v2 (cnblogs, in Chinese)](https://www.cnblogs.com/zhangmingcheng/p/17524721.html)
