Dockerfile Explained

A Dockerfile is a script that includes a series of commands to automatically build a new Docker image from a base image. The Dockerfile is provided to the Docker daemon, which in turn executes the instructions inside the Dockerfile and creates the image.

Use Cases

One of the simplest use cases is one wants to customize a Docker image pulled from Dockerhub, adding new commands or changing the provided entrypoint scripts.

Dockerfile can also be useful to dynamic container provisioning. Imagine you work at a company provides PaaS or FaaS. The services requests sent from your clients can be mapped into the Dockerfiles. Docker daemon will then build the image on demand and pass the containers back to your clients.

Instructions Used by Dockerfile

You may have already noticed that Dockerfile’s syntax is rather simple. Each line is either a comment or a instruction followed by arguments, as shown below.

# Comment
INSTRUCTION arguments

We will now walk through a sample Dockerfile, taken from a Jupyter build, and explain the structure and commands step-by-step.

Dockerfile use # for line comment. The command FROM indicates the base image to use. In this example, it uses jupyter/pyspark-notebook as the base image. If the base image isn’t already on your host, Docker daemon will try to pull the image from Dockerhub.

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
FROM jupyter/pyspark-notebook

Define the maintainer.

MAINTAINER Jupyter Project <jupyter@googlegroups.com>

Define the user that runs the container.

USER root

The ENV command is to set the environment variables that can be accessed by the processes running inside the container. This is equivalent to run export VAR=arguments in a Linux shell.

# RSpark config
ENV R_LIBS_USER $SPARK_HOME/R/lib

The RUN command is to execute its arguments, in this case apt-get, inside the container. The scope of RUN is within the building time.

# R pre-requisites
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    fonts-dejavu \
    gfortran \
    gcc && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

USER $NB_USER

# R packages
RUN conda config --add channels r && \
    conda install --quiet --yes \
    'r-base=3.3.2' \
    'r-irkernel=0.7*' \
    'r-ggplot2=2.2*' \
    'r-rcurl=1.95*' && conda clean -tipsy

# Apache Toree kernel
RUN pip --no-cache-dir install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
RUN jupyter toree install --sys-prefix

# Spylon-kernel
RUN conda install --quiet --yes 'spylon-kernel=0.2*'
RUN python -m spylon_kernel install --sys-prefix

Build the Image

The following example shows how to build an image using the Dockerfile. It is always recommended you build the image from the directory where the Dockerfile lives in. Be careful about the dot at the end of the line, it instructs the build to use current working dir as the build context.

## --rm  clean up the intermediate layers
## -t    target, e.g., apache/toree:1.02. The default tag is latest 
sudo docker build --rm -t repo:tag .

It is worth to mention, Docker uses cache to accelerate the build. If the Dockerfile has a new line inserted, Docker will use cached image layers before that new line and rebuild everything from that new line to the end.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s