Update ES mapping

Assume there is a doc type called session under an index called abc.

The following example updates the existing field date and enforces a date format.

PUT /abc/_mapping/session
{
  "properties": {
    "date": {
      "type": "date",
      "format" : "yyyy-MM-dd" 
    }
  } 
}

If you want to add a new field called scans of the nested type,

PUT /abc/_mapping/session
{
  "properties": {
    "scans": {
      "type": "nested"
    }
  }
}
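
To confirm that the changes took effect, you can fetch the mapping back (same index and type names as above):

GET /abc/_mapping/session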

Create an index in Elasticsearch

The following shows a minimal setup for creating an index in Elasticsearch (5.1.0). The example creates an index called test and defines some of the properties for a type, my_type. One thing worth mentioning in the example is the keyword mapping for string-typed fields defined in dynamic_templates. This setting adds a keyword sub-field (previously known as the raw field) to all string fields. ES does not automatically create such keyword fields for custom types, which sometimes causes trouble for querying or visualization because string fields are tokenized by default. Having such an extra keyword (not analyzed) value for string fields is often useful.

PUT test
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
      "my_type": {
        "_all": {
          "enabled": true,
          "norms": false
        },
        "dynamic_templates": [
          {
            "message_field": {
              "path_match": "message",
              "match_mapping_type": "string",
              "mapping": {
                "norms": false,
                "type": "text"
              }
            }
          },
          {
            "string_fields": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  }
                },
                "norms": false,
                "type": "text"
              }
            }
          }
        ]
      }
    }
}
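
Once documents are indexed, the keyword sub-field added by the dynamic template can be used for exact-match queries and aggregations. Below is a minimal sketch against the index above; the field name user is a made-up example, not part of the mapping defined here.

POST test/my_type/_search
{
  "size": 0,
  "aggs": {
    "top_users": {
      "terms": { "field": "user.keyword" }
    }
  }
}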

Dockerfile Explained

A Dockerfile is a script that includes a series of commands to automatically build a new Docker image from a base image. The Dockerfile is provided to the Docker daemon, which in turn executes the instructions inside the Dockerfile and creates the image.

Use Cases

One of the simplest use cases is customizing a Docker image pulled from Docker Hub, adding new commands or changing the provided entrypoint scripts.

A Dockerfile can also be useful for dynamic container provisioning. Imagine you work at a company that provides PaaS or FaaS. The service requests sent by your clients can be mapped into Dockerfiles; the Docker daemon then builds the images on demand and hands the containers back to your clients.

Instructions Used by Dockerfile

You may have already noticed that the Dockerfile syntax is rather simple. Each line is either a comment or an instruction followed by its arguments, as shown below.

# Comment
INSTRUCTION arguments

We will now walk through a sample Dockerfile, taken from a Jupyter build, and explain the structure and commands step-by-step.

Dockerfiles use # for line comments. The FROM command indicates the base image to use. In this example, it uses jupyter/pyspark-notebook as the base image. If the base image isn’t already on your host, the Docker daemon will try to pull it from Docker Hub.

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
FROM jupyter/pyspark-notebook

Define the maintainer.

MAINTAINER Jupyter Project <jupyter@googlegroups.com>

Define the user that runs the container.

USER root

The ENV command sets environment variables that can be accessed by the processes running inside the container. This is equivalent to running export VAR=value in a Linux shell.

# RSpark config
ENV R_LIBS_USER $SPARK_HOME/R/lib

The RUN command executes its arguments, in this case apt-get, inside the container. RUN instructions only take effect at build time.

# R pre-requisites
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    fonts-dejavu \
    gfortran \
    gcc && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

USER $NB_USER

# R packages
RUN conda config --add channels r && \
    conda install --quiet --yes \
    'r-base=3.3.2' \
    'r-irkernel=0.7*' \
    'r-ggplot2=2.2*' \
    'r-rcurl=1.95*' && conda clean -tipsy

# Apache Toree kernel
RUN pip --no-cache-dir install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
RUN jupyter toree install --sys-prefix

# Spylon-kernel
RUN conda install --quiet --yes 'spylon-kernel=0.2*'
RUN python -m spylon_kernel install --sys-prefix

Build the Image

The following example shows how to build an image using the Dockerfile. It is recommended to build the image from the directory where the Dockerfile lives. Be careful about the dot at the end of the line; it instructs the build to use the current working directory as the build context.

## --rm  remove intermediate containers after a successful build
## -t    name and optional tag for the image, e.g., apache/toree:1.02. The default tag is latest
sudo docker build --rm -t repo:tag .

It is worth mentioning that Docker uses a cache to accelerate builds. If a new line is inserted into the Dockerfile, Docker reuses the cached image layers from before that line and rebuilds everything from that line to the end.
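
If you suspect a stale cached layer, the --no-cache flag forces Docker to rebuild every layer from scratch:

## ignore the build cache and rebuild all layers
sudo docker build --rm --no-cache -t repo:tag .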

Bring up ELK on Docker Swarm

Assuming there is a working Docker Swarm, this blog describes the steps to bring up an ELK stack on Docker Swarm.

First off, you need to decide whether the official ELK Docker images on Docker Hub work for you or whether you need custom images. If the official ones (Elasticsearch, Kibana, Logstash) serve the purpose, you may skip directly to the service creation section; otherwise you need to build the images on every individual node in the Swarm cluster or set up your own Docker registry.
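
If you go with custom images and a registry, a minimal sketch looks like the following; the registry address registry.example.com:5000 and the image name kibana/plugin are placeholders for your own setup.

## build the custom image from its Dockerfile
docker build -t kibana/plugin .

## tag and push it to a private registry reachable from all Swarm nodes
docker tag kibana/plugin registry.example.com:5000/kibana/plugin
docker push registry.example.com:5000/kibana/plugin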

Service Creation

All services should be created on the manager node in the Swarm cluster. First, create an Elasticsearch service called es-master, mapping a host directory /data/es to /usr/share/elasticsearch/data inside the container. This also assumes an overlay network called es already exists.

docker service create \
               --network es \
               --name es-master \
               -p 9200:9200 \
               --mount type=bind,source=/data/es,destination=/usr/share/elasticsearch/data \
               elasticsearch

Create a Kibana service called kibana that joins the es network. The -e option points it at es-master. The example command uses a custom Kibana image called kibana/plugin.

docker service create \
               --network es \
               --name kibana \
               -p 5601:5601 \
               -e ELASTICSEARCH_URL=http://es-master:9200 kibana/plugin

To verify the services,

docker service ls

ID            NAME       REPLICAS  IMAGE          COMMAND
5w8v5jksx7h5  kibana     1/1       kibana/plugin  
bpojoyb5wz16  es-master  1/1       elasticsearch  

To see on which node kibana is running,

docker service ps kibana

ID                         NAME      IMAGE          NODE          DESIRED STATE  CURRENT STATE           ERROR
39sadh4cfpqp0zwdh6mbh47er  kibana.1  kibana/plugin  indocgubt104  Running        Running 34 seconds ago  

To launch Kibana in a browser, type node_IP:5601 in the URL bar. Note that you can use the IP address of either the manager node or the worker node that actually runs Kibana.
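
As a quick sanity check of Elasticsearch itself, you can also hit port 9200, which the es-master service published above. Since the port goes through the Swarm routing mesh, node_IP can be the IP address of any node in the cluster.

curl http://node_IP:9200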

Setup a Docker Overlay Network on Multiple Hosts

We have seen many use cases where one fires up a few Docker containers on a single host. To accommodate the growth of data or complexity in business, we may need to run the containerized tasks on multiple physical hosts. One of the challenges is how to maintain communication among the distributed tasks as if they were on the same host.

Fortunately, Docker provides a mechanism called Overlay Networking, which basically creates a VXLAN layer 2 overlay tunnel on top of layer 3, i.e., TCP/IP. The details won’t be discussed here, but interested readers can go here for more information. It is not hard to imagine that this allows two containers sitting on different hosts to talk to each other. Cool!

This blog will walk through a simple example of creating a Docker Swarm that spans two physical hosts, and we will create an Overlay Network to stitch together the distributed containers.

Since version 1.12.0, Docker Engine natively includes Swarm mode, which makes bringing up a Swarm cluster much easier than using the previous standalone Swarm. Say there are two nodes, node 1 and node 2. We decide to elect node 1 to be the manager node. Note that one can have more than one manager node in a Swarm cluster, but for the sake of simplicity we just use node 1.

Initiate the Swarm

The following command on node 1 will initiate the Swarm and elect that node as the manager. This command will also spit out the command you would use on node 2 to join the cluster.

docker swarm init

Copy and paste the output from the command above and run it on node 2.

docker swarm join --token ...

Go back to node 1 to verify that both nodes are present in the cluster.

docker node ls

ID                           HOSTNAME      STATUS  AVAILABILITY  MANAGER STATUS
1gwudwxftloza3vldyr4p6p4y *  indocgubt103  Ready   Active        Leader
e9bcxw8vy1ow0jp80gopr2c58    indocgubt104  Ready   Active        

Create an Overlay Network

Now let’s create an Overlay Network called es. On node 1 run the following:

docker network create -d overlay es

To verify, run the following on node 1. Please note that es won’t show up on node 2 until a container on that node actually uses the network.

docker network ls

NETWORK ID          NAME                DRIVER              SCOPE
43652a980910        bridge              bridge              local               
1ac35860a4cb        docker_gwbridge     bridge              local               
912wlikzt94x        es                  overlay             swarm               
5032a295b055        host                host                local               
1my3c1fbunaq        ingress             overlay             swarm               
7687da500317        none                null                local  
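
To dig into the details of the es network, such as its subnet and the containers attached on the current node, inspect it:

docker network inspect es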

Attach the Service to the Overlay Network

Using the example in this post, I would like to deploy an Elasticsearch service to the es network.

docker service create \
               --network es \
               --name es-master \
               -p 9200:9200 \
               --mount type=bind,source=/data/es,destination=/usr/share/elasticsearch/data \
               elasticsearch

Now we bring up another test container to see if it can talk to / ping es-master.

docker service create \
               --name test \
               --network es \
               busybox sleep 300000

Run docker service ps test to find out which node the busybox container was scheduled on, switch to that node, and run

docker exec -it container_ID /bin/sh
ping es-master
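
As a further check from inside the same container, the service name should also resolve through Docker’s embedded DNS (assuming the busybox image you pulled ships with nslookup, which the official one does):

nslookup es-master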

If nothing goes wrong, the ping should return results. More information about service communication on an overlay network is here