ut between that call and the running container, three programs do a relay — and one of them commits suicide on purpose."
---

kubelet calls containerd. Container appears. But between that call and the running container, three programs do a relay — and one of them commits suicide on purpose.

> This is containerd's internal machinery. K8s doesn't know any of this exists. For how K8s got to containerd in the first place, see [Container Runtime: A Divorce Story](/v2/container-runtime-evolution).

---

## runc: the one-shot tool

runc reads an OCI bundle (a `config.json` + `rootfs` directory). Sets up Linux namespaces and cgroups, switches the root filesystem (`pivot_root` by default rather than a plain `chroot`). Execs the container process. Then exits.
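
To make "OCI bundle" concrete, here is a minimal sketch that writes the two pieces runc looks for. The helper name `write_minimal_bundle` is hypothetical, the `rootfs` is left empty, and the config is trimmed to a handful of OCI runtime-spec fields; a real bundle (for example one generated with `runc spec`) carries many more.

```python
import json
import os
import tempfile


def write_minimal_bundle(bundle_dir, args):
    """Create <bundle_dir>/rootfs and <bundle_dir>/config.json; return the config dict."""
    # The root filesystem the container will see (empty in this sketch).
    os.makedirs(os.path.join(bundle_dir, "rootfs"), exist_ok=True)
    config = {
        "ociVersion": "1.0.2",
        # The command to exec inside the container.
        "process": {"args": list(args), "cwd": "/"},
        # Path to the root filesystem, relative to the bundle directory.
        "root": {"path": "rootfs"},
        # The namespaces the runtime is asked to create.
        "linux": {
            "namespaces": [
                {"type": t} for t in ("pid", "mount", "ipc", "uts", "network")
            ]
        },
    }
    with open(os.path.join(bundle_dir, "config.json"), "w") as f:
        json.dump(config, f, indent=2)
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_minimal_bundle(d, ["sleep", "60"])
        print(sorted(os.listdir(d)))
```

Everything runc needs to know comes from that directory. Nothing in it says who should keep watching the process afterwards.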

That last part is the key. **runc doesn't babysit.** It builds the house, hands over the keys, and leaves. The container process is running, but runc is already gone. Who's the parent now?

---

## The orphan problem

Every Linux process has a parent. The parent is responsible for calling `wait()` when the child dies. Until it does, the dead child stays a zombie: a row in the process table that holds the exit status and doesn't get cleaned up. Zombies pile up and can eventually exhaust available PIDs.
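
The zombie state is easy to observe. The sketch below is Linux-only and assumes `/proc` is mounted; the helper names `proc_state` and `zombie_then_reap` are made up for illustration. It forks a child that exits immediately, reads the child's state letter from `/proc/<pid>/stat`, and only then calls `waitpid()`.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat, e.g. 'S' or 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_then_reap(exit_code=7, timeout=5.0):
    """Fork a child that exits at once; return (state_before_wait, reaped_exit_code)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child: die immediately
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    # Poll until the kernel reports the dead child as a zombie.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    _, status = os.waitpid(pid, 0)       # the wait() call that removes the zombie
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(zombie_then_reap())
```

The child stays in state `Z` for as long as the parent postpones the `waitpid()` call, and the exit code is only available to whoever makes that call.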

If runc just exited and containerd was the parent:

> `containerd (PID 500) → runc → container (PID 502) → runc exits`
>
> container's parent = containerd
>
> - **Problem 1**: containerd restarts for an upgrade → container's parent dies → container loses stdio
> - **Problem 2**: containerd crashes → container is reparented to PID 1 → PID 1 reaps it and throws away the exit code → nobody can report how the container died

containerd can't be the parent. It needs to be free to restart, upgrade, and crash without killing containers. So there's a middleman.

---

## containerd-shim: the double fork trick

containerd doesn't directly parent the container. It uses a classic Unix trick: **double fork**.

1. containerd forks → `containerd-shim` (child)
2. `containerd-shim` forks → `containerd-shim-runc-v2` (grandchild)
3. `containerd-shim` exits immediately
4. `containerd-shim-runc-v2` is now an orphan
5. Linux kernel reparents it to PID 1 (systemd)

> These steps are inferred from the double fork pattern and the final process tree confirmed by pstree output in the cnblogs reference. No single source documents all five steps together.

```
Before (right after fork):
  containerd (PID 500)
    └── containerd-shim (PID 501)
          └── containerd-shim-runc-v2 (PID 502)

After (containerd-shim exits):
  containerd (PID 500)              ← no longer parent of anything
  systemd (PID 1)
    └── containerd-shim-runc-v2 (PID 502)   ← adopted by PID 1
          └── container process (PID 503)
```
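
The double fork pattern itself, independent of containerd, can be sketched in a few lines. This is a generic illustration, not containerd's code: the function name `double_fork` and the pipes used for reporting are assumptions made for the demo. The adopting process is PID 1 on a typical systemd host, but it can be a different process where a subreaper or a container's init is in play.

```python
import os
import time


def double_fork(linger=0.5):
    """Return (middle_pid, grandchild_pid, grandchild's parent PID after the middle exits)."""
    pid_r, pid_w = os.pipe()        # middle -> caller: the grandchild's PID
    ppid_r, ppid_w = os.pipe()      # grandchild -> caller: its parent PID after adoption
    middle = os.fork()
    if middle == 0:                 # middle process (the short-lived one)
        middle_pid = os.getpid()
        grandchild = os.fork()
        if grandchild == 0:         # grandchild (stands in for the long-lived shim)
            deadline = time.monotonic() + 5
            # Wait until the kernel has reparented us, i.e. the middle has exited.
            while os.getppid() == middle_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(ppid_w, str(os.getppid()).encode())
            time.sleep(linger)      # keep running as an orphan for a moment
            os._exit(0)
        os.write(pid_w, str(grandchild).encode())
        os._exit(0)                 # middle exits at once, orphaning the grandchild
    os.waitpid(middle, 0)           # caller reaps the middle so it is not left a zombie
    grandchild_pid = int(os.read(pid_r, 32))
    adopted_by = int(os.read(ppid_r, 32))
    for fd in (pid_r, pid_w, ppid_r, ppid_w):
        os.close(fd)
    return middle, grandchild_pid, adopted_by


if __name__ == "__main__":
    middle, grandchild, adopted_by = double_fork()
    print(f"middle={middle} grandchild={grandchild} adopted_by={adopted_by}")
```

The caller only ever waits on the middle process. The grandchild's lifetime is no longer tied to the caller in the process tree.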

containerd and shim are no longer in a parent-child relationship. They communicate over a ttrpc socket. Socket can disconnect and reconnect. Process tree can't.

---

## Why orphans don't die

A common misconception: "parent dies, child dies too." Wrong. That's not how Linux works.

> Parent dies → child gets adopted by PID 1 → child keeps running.

The confusion comes from **shells**. When you close a terminal:

- Terminal closes → sends `SIGHUP` to bash
- Bash receives `SIGHUP` → forwards `SIGHUP` to all its children
- Children receive `SIGHUP` → default behavior is to exit
- Bash exits

The children didn't die because their parent died. They died because **bash actively killed them before dying.** That's shell behavior, not kernel behavior.

`nohup` works by ignoring SIGHUP:

```bash
nohup python script.py &
# terminal closes → bash sends SIGHUP → python ignores it → python keeps running
# bash dies → python is orphan → kernel gives it to PID 1
```
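
The SIGHUP half of what `nohup` arranges can be reproduced directly. This is a sketch under stated assumptions: the function name `survives_sighup` and the `grace` timeout are hypothetical, and the parent here plays the role of bash by sending the signal itself.

```python
import os
import signal
import time


def survives_sighup(ignore, grace=0.5):
    """Fork a child, send it SIGHUP, and return True if it is still alive afterwards."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # SIG_IGN is the disposition nohup sets up; SIG_DFL terminates on SIGHUP.
        disposition = signal.SIG_IGN if ignore else signal.SIG_DFL
        signal.signal(signal.SIGHUP, disposition)
        os.write(ready_w, b"1")          # tell the parent the disposition is in place
        time.sleep(30)
        os._exit(0)
    os.read(ready_r, 1)
    os.close(ready_r)
    os.close(ready_w)
    os.kill(pid, signal.SIGHUP)          # what bash does to its jobs on hangup
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False                 # the child died from the signal
        time.sleep(0.01)
    os.kill(pid, signal.SIGKILL)         # clean up the survivor
    os.waitpid(pid, 0)
    return True


if __name__ == "__main__":
    print("default disposition survives:", survives_sighup(ignore=False))
    print("ignoring SIGHUP survives:", survives_sighup(ignore=True))
```

The only difference between the two runs is the signal disposition. The parent-child relationship is the same in both.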

containerd-shim exits without sending any signal. It's not a shell. So containerd-shim-runc-v2 just becomes an orphan and keeps running.

---

## What the shim actually does all day

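When runc exits, the container process is orphaned as well. The shim marks itself as a child subreaper (`PR_SET_CHILD_SUBREAPER`), so the kernel hands the container to the shim instead of PID 1. After that, the shim sits there and does three things: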

| Job | Why |
|---|---|
| Hold stdin/stdout/stderr | `kubectl logs` and `kubectl attach` need a file descriptor to read from. shim keeps them open |
| Call `wait()` when container dies | Collects exit code. Prevents zombie. Reports back to containerd |
| Respond to containerd over ttrpc | containerd asks "is it still running?" or "kill it." shim executes |

That's it. Most of the time the shim is idle. It wakes up when the container exits or when containerd sends a request.
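
The first two jobs fit in a toy babysitter. This sketch is not the shim's implementation: `babysit` is a hypothetical name, Python's `subprocess` stands in for the fork and exec plumbing, and the ttrpc side is left out entirely.

```python
import subprocess
import sys


def babysit(argv):
    """Run argv, hold its stdout/stderr pipes, wait for it, return (exit_code, out, err)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes until EOF and then wait()s on the child,
    # so the child is reaped and its exit code is kept.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    code, out, err = babysit(
        [sys.executable, "-c", "print('hello'); raise SystemExit(3)"]
    )
    print(code, out, err)
```

Run as a script, it should report exit code 3 along with the captured `hello` line. Holding the pipes and collecting the exit status are the parts a one-shot tool like runc cannot do once it has exited.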

---

## v1 → v2: same job, better packaging

The core mechanism (double fork, reparent, wait, hold stdio) is identical in v1 and v2. What changed:

**v1: one shim per container**

- Pod with 3 containers → 3 shim processes
- 30 Pods × 2 containers average → 60 shim processes on one Node
- Each shim uses gRPC (HTTP/2 + protobuf) to talk to containerd. Heavy for small messages.

**v2: one shim per Pod**

- Pod with 3 containers → 1 shim process managing all three
- 30 Pods → 30 shim processes (not 60)
- Uses ttrpc (raw socket + protobuf) instead of gRPC. Lighter.

The naming convention also made runtimes pluggable:

- `containerd-shim-runc-v2` → uses runc (default)
- `containerd-shim-runsc-v1` → uses gVisor (the `v1` is gVisor's own shim version; it still implements the shim v2 API)
- `containerd-shim-kata-v2` → uses Kata Containers

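containerd picks the binary from the runtime type configured for the Pod's RuntimeClass handler: `io.containerd.runc.v2` resolves to `containerd-shim-runc-v2`. Adding a new runtime means dropping a new binary on the Node and pointing a runtime entry in containerd's config at it.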
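
The naming rule is mechanical enough to sketch. The function below follows the convention described in the runtime v2 README, where the last two dot-separated parts of the runtime type become the name and version in the binary name. `shim_binary_name` is a hypothetical helper, not containerd's code, and it assumes the RuntimeClass handler has already been resolved to a runtime type string.

```python
def shim_binary_name(runtime_type):
    """Map a runtime type such as 'io.containerd.runc.v2' to a shim binary name."""
    parts = runtime_type.split(".")
    if len(parts) < 2 or not parts[0]:
        raise ValueError(f"unexpected runtime type: {runtime_type!r}")
    # The last two components are the runtime name and the shim version.
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


if __name__ == "__main__":
    for rt in ("io.containerd.runc.v2", "io.containerd.kata.v2"):
        print(rt, "->", shim_binary_name(rt))
```

containerd then looks for a binary with that name on the Node, which is why a new runtime can arrive as a single executable.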

---

## The full picture

```
kubelet
  │ CRI call: "start this Pod"
  ▼
containerd
  │ fork
  ▼
containerd-shim (middle process)
  │ fork
  ▼
containerd-shim-runc-v2 (babysitter) ← reparented to PID 1
  │ fork + exec
  ▼
runc
  │ set up namespaces, cgroups, root filesystem
  │ start container process
  │ exit (job done)
  ▼
container process (running)
  parent = containerd-shim-runc-v2
  grandparent = systemd (PID 1)

Steady state (containerd-shim and runc both exited):

systemd (PID 1)
  ├── containerd
  └── containerd-shim-runc-v2
          └── nginx (container PID 1)
```

containerd and containerd-shim-runc-v2 are siblings under systemd. Not parent-child. They talk over a ttrpc socket, not through the process tree.

`nginx` shows PID 1 **inside the container's PID namespace**. From the host, it has a normal PID like 50321. Linux PID namespaces give each container its own numbering.
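
On Linux the two numberings can be read side by side from the `NSpid` field in `/proc/<pid>/status`, which lists a process's PID in each nested PID namespace, outermost first. A minimal sketch, assuming `/proc` is mounted and the kernel exposes that field (the helper name `ns_pids` is made up):

```python
def ns_pids(pid="self"):
    """Return the process's PID in each nested PID namespace, outermost first."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("NSpid:"):
                return [int(x) for x in line.split()[1:]]
    return []  # field not present (older kernel or restricted /proc)


if __name__ == "__main__":
    print(ns_pids())
```

Pointed from the host at a containerized process like the `nginx` above, the list would be expected to hold two entries: the host PID first, then 1.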

containerd can now crash, restart, upgrade. shim is still there. Container keeps running.

---

## References

- [Containerd components: the role of containerd-shim-runc-v2 (cnblogs, in Chinese)](https://www.cnblogs.com/zhangmingcheng/p/17524721.html)
- [Implementing Container Runtime Shim: runc (iximiuz)](https://iximiuz.com/en/posts/implementing-container-runtime-shim/)
- [containerd runtime/v2 README (official, shim API and ttrpc design)](https://github.com/containerd/containerd/blob/main/core/runtime/v2/README.md)
- [What are the differences between runc v1 and v2? (containerd Discussion #7407)](https://github.com/containerd/containerd/discussions/7407)
- [Shim v1 vs v2 architecture comparison (openEuler, Kata perspective)](https://www.openeuler.org/en/blog/gaohuatao/2021-04-09-isulad-shimv1-shimv2-diff.html)
