Huge files in Docker containers

I need to create a Docker image (and containers from this image) that uses large files (containing genomic data and therefore reaching ~10 GB in size).

How can I handle them efficiently? Should I include them in a container (e.g. COPY large_folder large_folder_in_container)? Is there a better way to reference such files? The thing is, it seems strange to me to push such a container (which will be > 10 GB) to my private repository. I wonder if there is a way to attach a volume to the container without packing all of those gigabytes into it.
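For reference, the naive approach I have in mind is a Dockerfile roughly like this (the base image and paths are just placeholders):

 # Bakes ~10 GB of genomic data into an image layer,
 # so every push/pull of the image moves all of it
 FROM ubuntu:22.04
 COPY large_folder /large_folder_in_container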

Thanks.

Tags: docker, docker-container, dockerfile
2 answers

Should I include them in a container (e.g. COPY large_folder large_folder_in_container )?

If you do this, it will include them in the image, not in the container: you can run 20 containers from this image, and the actual disk space used will still be only 10 GB, because the containers share the image's read-only layers.
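As a quick check (the image and container names below are placeholders, assuming the image defines a default command), you can confirm that containers share the image layers rather than copying them:

 docker run -d --name c1 my-genomics-image
 docker run -d --name c2 my-genomics-image
 docker ps -s        # SIZE lists only each container's small writable layer
 docker system df    # the 10 GB of image layers is counted once, not per container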

If you build another image from your first image, the layered filesystem will reuse the layers of the parent image, and the new image will still be "only" 10 GB plus whatever its own layers add.
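For example (the image name, paths, and command below are hypothetical), a derived image only stores and pushes the layers it adds on top of the shared 10 GB parent:

 FROM my-genomics-image
 COPY analysis_scripts /opt/analysis
 CMD ["/opt/analysis/run.sh"]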


Is there a better way to link to such files?

If you already have a way to distribute the data, I would use a bind mount to attach it to the containers:

 docker run -v /path/to/data/on/host:/path/to/data/in/container <image> ... 

This way you can change the image without having to transfer the large data set again every time.
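If the containers only read the data, appending :ro makes the bind mount read-only, which protects the data set from accidental modification:

 docker run -v /path/to/data/on/host:/path/to/data/in/container:ro <image> ...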

If you want to use the registry to distribute the large data set, but want to manage changes to the data set separately, you can use a data volume container built from a Dockerfile like this:

 FROM scratch
 COPY dataset /dataset
 VOLUME /dataset
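The data image can then be versioned and pushed like any other image (the registry name and tag below are placeholders):

 docker build -t registry.example.com/genomic-dataset:v1 .
 docker push registry.example.com/genomic-dataset:v1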

In the application container, you can attach this volume using:

 # Create (but never start) a container that just holds the data volume;
 # the trailing command is required but never executed
 docker create --name dataset <data volume image name> true
 docker run --volumes-from dataset <image> ...

Anyway, I think https://docs.docker.com/engine/tutorials/dockervolumes/ is what you want.
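That tutorial also covers named volumes, which likewise keep the data out of the application image; a minimal sketch (the volume name and the busybox helper are my own choices, assuming the data is already on the host):

 docker volume create genomic-data
 # populate the volume once, then reuse it from any container
 docker run --rm -v /path/to/data/on/host:/src:ro -v genomic-data:/dataset busybox cp -a /src/. /dataset/
 docker run -v genomic-data:/dataset <image> ...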

