> This difference is particularly noticeable with multiple images sharing the same base layers. With legacy storage drivers, shared base layers were stored once locally, and reused images that depended on them. With containerd, each image stores its own compressed version of shared layers, even though the uncompressed layers are still de-duplicated through snapshotters.
This seems like a really weird decision. If base images are duplicated for every image you have, that will add up quickly.
I think there is an Issue/PR right now to change this. See: https://github.com/containerd/containerd/issues/13307
This is hell for a lot of ML containers that have gigabytes of CUDA and PyTorch. Before, at least you could keep your code contained to a layer. But if I understand this correctly, every code revision duplicates gigabytes of the same damn bloated crap.
If you have problems with 13 (I believe) GB of docker layers ... how do you deal with terabytes or petabytes of AI training data?
You don't even need MB of training data for some ML applications. AI is the sexy thing nowadays, but neural networks (Torch is a NN library) are generally useful for regression and classification problems.
Petabytes of training data is only one application of PyTorch, which is going to use tens of thousands of containers, but...
Inference, development cycles, any of the application domains of PyTorch that don't involve training frontier models... all of those are complicated by excessive container layers.
But mostly, dev really sucks when you have to write out an extra 10GB for a small code change.
The training data is on a separate drive; or the training data isn't that large for this use case; or they aren't training.
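To put rough numbers on the complaint above, here is a back-of-the-envelope sketch that takes the quoted docs at face value: uncompressed layers are shared, but every image keeps its own compressed copy of each layer. The function names and all sizes (in MB) are made up for illustration, and real behavior may differ, especially if the linked issue changes things.

```sh
#!/bin/sh
# Rough disk-usage estimate in MB. All names and numbers are hypothetical.

# legacy_mb BASE_UNCOMP CODE_UNCOMP N
# Legacy drivers: one shared uncompressed base + N small uncompressed code layers.
legacy_mb() {
  echo $(( $1 + $2 * $3 ))
}

# containerd_mb BASE_UNCOMP BASE_COMP CODE_UNCOMP CODE_COMP N
# Per the quoted docs: same uncompressed data as above, plus a compressed
# copy of base + code layer for each of the N images.
containerd_mb() {
  echo $(( $1 + $3 * $5 + ($2 + $4) * $5 ))
}

# Five revisions of an image on a 10 GB base (4 GB compressed),
# each with a 50 MB code layer (20 MB compressed).
legacy_mb 10000 50 5               # prints 10250
containerd_mb 10000 4000 50 20 5   # prints 30350
```

With these made-up sizes, five code revisions go from about 10 GB to about 30 GB, which is the "adds up quickly" effect people are describing.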
Docker is already hogging a lot of disk space and needs to be pruned regularly. I can't imagine what it's going to be like now.
Docker already fills up my dev machines, yet they decided on this insane solution:
> The containerd image store uses more disk space than the legacy storage drivers for the same images. This is because containerd stores images in both compressed and uncompressed formats, while the legacy drivers stored only the uncompressed layers.
Why?
> https://docs.docker.com/reference/cli/docker/system/prune/
Just in case - I'm always amazed how many Docker users don't know about the prune command for cleaning up the caches and deleting unused container images and just slowly let their docker image cache eat their disk.
Sounds like a straightforward time-space tradeoff: if you have the compressed layers sitting around when you need them, you can avoid the expense and time of compressing them.
Why would I need the compressed layers?
Pushing
To save disk space /s
I'm not sure about the fastest MacBook disk access, but even with NVMe storage I've found lz4 to be faster than the disk. That is (it's hard to phrase this exactly right), compressed content gets read/written FASTER than uncompressed content, because fewer bytes need to transit the disk interface and the CPU is able to compress/decompress significantly faster than data is able to go through whatever disk bus you've got.
On my two-year-old ThinkPad, the SSD is faster than lz4. On a fat EC2 server, lz4 is faster. So one really has to test a particular config.
Did you mean the first "compressed" to be "uncompressed"?
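A crude way to see why both of the experiences above can be true: if reading and decompressing are pipelined, the rate at which you get uncompressed bytes is roughly the smaller of (disk bandwidth × compression ratio) and decompressor throughput. The sketch below just encodes that rule of thumb; the function name and every number are assumptions for illustration, not measurements of lz4 or any real disk.

```sh
#!/bin/sh
# effective_mbps DISK_MBPS RATIO_X100 DECOMP_MBPS
# RATIO_X100 is the compression ratio times 100 (200 means 2.0x),
# to stay within integer shell arithmetic.
# Prints the approximate rate of uncompressed data delivered, in MB/s.
effective_mbps() {
  from_disk=$(( $1 * $2 / 100 ))
  if [ "$from_disk" -lt "$3" ]; then
    echo "$from_disk"   # disk-bound: the disk is the bottleneck
  else
    echo "$3"           # CPU-bound: the decompressor is the bottleneck
  fi
}

# Slower disk (500 MB/s raw): compressed reads win, 1000 vs 500.
effective_mbps 500 200 3000    # prints 1000
# Fast NVMe (7000 MB/s raw): the decompressor caps you at 3000, raw reads win.
effective_mbps 7000 200 3000   # prints 3000
```

Compare the printed value against the raw disk bandwidth to see which side wins for a given machine. It ignores latency, caching, and multi-threaded decompression, so it only tells you where to start measuring.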
That will make Apple happy: all the people who didn't get a large enough disk when they purchased their laptops last time around are already struggling with local AI models.
It is shameful for Apple to hard-solder their disks. There is no benefit to the user.
As we have seen with Framework, even hard-soldered RAM is not needed to get reasonable performance. At least let me expand my memory, even if it doesn't perform as fast as on-chip memory.
What does Apple have to do with any of this?
> It is shameful for Apple to hard-solder their disks. There is no benefit to the user.
Actually, there is. The speed and latency difference does matter; that is how even an 8GB RAM MacBook feels snappier than many a 32GB Windows machine: it can use the disk as swap.
I had to work on a Mac M3 for a year and it sucked. It did not feel snappier than any Windows or Linux machine (including this one) that I've ever used, and that is going back to the 1980s.
I suggest you judge based on benchmarks rather than vibes.
If you believe the latest M3 does not perform better than machines you’ve used in the 80s, I have no idea how to even start a reasonable discussion about this.
Docker v29 (released 2025-11) switched to using containerd for its image store for new installs.
This means `/var/lib/docker` is no longer "hermetic": images and container snapshots are located in `/var/lib/containerd` now.
More info about the switch: https://www.docker.com/blog/docker-engine-version-29/
To configure this directory, see https://docs.docker.com/engine/storage/containerd/.
I noticed the change because I wanted to persist Docker-related data between container instantiations on IncusOS. I couldn't understand why the custom volume I had mounted on /var/lib/docker didn't contain the downloaded images.
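To keep both /var/lib/{containerd,docker} in sync, I use a single ZFS dataset ("custom filesystem volume" in Incus parlance) and mount subpaths inside the container:
  incus storage volume create local docker-data
  incus config device add docker docker disk pool=local source=docker-data/docker path=/var/lib/docker
  incus config device add docker containerd disk pool=local source=docker-data/containerd path=/var/lib/containerd
There are other ways to achieve the same of course.
The article says to regularly run prune, how regularly? Currently I run the following once per day from cron: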
From the docs, you can just run `docker system prune -a --volumes`
Ref: https://docs.docker.com/reference/cli/docker/system/prune/
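On the "how regularly" question, one option is to prune on demand instead of on a fixed schedule: check how full the disk is and only prune above a threshold. This is a minimal sketch, not anything from the article. The helper names, the 80% threshold, and the default path are my assumptions; with the containerd image store you may want to check /var/lib/containerd instead. `-f` only skips the confirmation prompt, and `--volumes` also removes unused volumes, so drop it if you keep data in them.

```sh
#!/bin/sh
# Prune Docker only when the disk is getting full.
# Helper names, threshold and default path are assumptions; adjust to taste.

# used_pct PATH: print the df "Capacity" column for PATH as a bare integer.
used_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# should_prune USED THRESHOLD: succeed when USED >= THRESHOLD.
should_prune() {
  [ "$1" -ge "$2" ]
}

# prune_if_full [PATH] [THRESHOLD]
prune_if_full() {
  path=${1:-/var/lib/docker}
  threshold=${2:-80}
  if should_prune "$(used_pct "$path")" "$threshold"; then
    # Same command as suggested above, plus -f to skip the prompt.
    docker system prune -a --volumes -f
  fi
}
```

Save it as a script that ends with a call to `prune_if_full`, then run that from cron as often as you like; it is a no-op until the threshold is crossed.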
I should start looking into Podman.
Why not just use podman at this point?
They are adopting the containerd standard; not sure why the negative sentiment.