In a previous post we showed how to create an Elasticsearch cluster with manually defined nodes. In this post we use Docker Compose to scale things up!

Elasticsearch has many solutions to play with (e.g. Logstash and Kibana, aka ELK, HQ etc.), but I have tried to keep the example simple and comment on various issues that I came across.

Creating the compose file

First create a project directory and an empty docker-compose.yml file:

$docker>mkdir es-cluster
$docker>cd es-cluster
$es-cluster>touch docker-compose.yml

Open docker-compose.yml with a text editor and add the following two services:

version: '2'
services:
    master:
        image: library/elasticsearch
        command: elasticsearch -network.host=0.0.0.0 -node.master=true -cluster.name=cluster-01 -node.name="Master of Disaster"
        volumes:
            - ./elasticsearch/config/:/usr/share/elasticsearch/config/
            - ./elasticsearch/logs/:/usr/share/elasticsearch/logs/
        ports:
            - "9200:9200"
        restart: always
        container_name: es_master
    node:
        image: library/elasticsearch
        command: elasticsearch -network.host=0.0.0.0 -cluster.name=cluster-01 -discovery.zen.ping.unicast.hosts=es_master
        restart: always
        volumes:
            - ./elasticsearch/config/:/usr/share/elasticsearch/config/
            - ./elasticsearch/logs/:/usr/share/elasticsearch/logs/
        depends_on:
            - master
        links:
            - master:es_master

master: The first service named master will be the master of the cluster.

node: The second service named node will represent all data nodes that participate in the cluster.

image: library/elasticsearch pulls the official Elasticsearch image (if not already present locally).

-node.master=true we set this in the master service to declare it the master of the cluster.

-node.name="Master of Disaster" optionally we give a name to the master node.

container_name: es_master we set the name es_master on the container so that the master is discoverable by the rest of the data nodes.

-discovery.zen.ping.unicast.hosts=es_master makes each node look up the es_master container in order to connect to the cluster.

- "9200:9200" in the ports section we expose the HTTP port to the host.

-network.host=0.0.0.0 defines where the service will be published. If this is not set, the service binds only to the container's localhost (127.0.0.1) and won't be reachable outside of it (at least not on Docker 1.12 Beta, which I am using). By setting the network host to 0.0.0.0, the service is published on the container's network IP (e.g. publish_address {172.22.0.2:9200}).

-cluster.name=cluster-01 defines the name of the cluster. This should be unique and identical in both the master and node services so that all nodes join the same cluster.

In the volumes section we map the local directories to those in the containers:

- ./elasticsearch/config/:/usr/share/elasticsearch/config/
- ./elasticsearch/logs/:/usr/share/elasticsearch/logs/

restart: always so that the container restarts automatically if it stops or the system reboots.

- master in the depends_on section declares that node depends on master, so the master starts first.

- master:es_master finally, in the links section we link the master service under the alias es_master.
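
Before the first run it is worth validating the file. Assuming a reasonably recent docker-compose, the built-in config command parses the file and prints the resolved configuration (or complains about errors):

$es-cluster>docker-compose config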

Preparing for the first run

Before running this configuration for the first time, it is a good idea to create a logging configuration file and place it in the config directory, so that the log4j instance of Elasticsearch exports logs in a favourable format. So, create the directories needed:

$es-cluster>mkdir elasticsearch
$es-cluster>cd elasticsearch
$elasticsearch>mkdir config

Then create the logging.yml file in the config directory:

$elasticsearch>cd config
$config>touch logging.yml

To create two appenders (one for the console and one that writes into a file) open logging.yml with a text editor and add the following. Note that appender is a top-level key, a sibling of logger, not a child of it:

es.logger.level: INFO
rootLogger: ${es.logger.level}, console, file
logger:
    action: TRACE
appender:
    console:
        type: console
        layout:
            type: consolePattern
            conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
    file:
        type: dailyRollingFile
        file: ${path.logs}/${cluster.name}.log
        datePattern: "yyyy-MM-dd"
        layout:
            type: pattern
            conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
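
Once the cluster is up, the file appender above writes into the logs directory that we mapped to the host. Since the cluster name is cluster-01, the daily rolling log file can be followed with something like:

$es-cluster>tail -f elasticsearch/logs/cluster-01.log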

Run for the first time

Run it first to test without scaling:

$es-cluster>docker-compose up

This will create a default network for the containers, es-cluster_default, named after the directory in which the compose file is placed. The services are attached to that network so they can communicate.
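
If you want to see this for yourself, list the networks and inspect the one Compose created; the inspect output includes the attached containers (the exact network name can vary slightly between Compose versions):

$es-cluster>docker network ls
$es-cluster>docker network inspect es-cluster_default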

To see the cluster's health, enter the following URL in a browser:

localhost:9200/_cluster/health?pretty=true

Something like this should be shown:

{
    "cluster_name" : "cluster-01",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 2,
    "number_of_data_nodes" : 2,
    "active_primary_shards" : 0,
    "active_shards" : 0,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 100.0
}
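
To check which container was elected master, the _cat/nodes endpoint prints one line per node; the node marked with * in the master column is the elected master:

$es-cluster>curl 'localhost:9200/_cat/nodes?v'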

Scale up the cluster

Now that we are positive that everything runs smoothly, we can stop the running processes (either with Ctrl-C or with docker-compose stop). Beware of one thing though: if you terminate the processes with docker-compose down, it will stop and remove not only the running containers but also the es-cluster_default network. To avoid this, and to not depend on that network, it is better to create an external network and use it in the compose file:

$es-cluster>docker network create es-network

Edit the docker-compose.yml file and add the following lines at the end of the file:

networks:
    default:
        external:
            name: es-network

This section defines that the default network for the containers will be es-network instead of es-cluster_default. And because it is the default network, there is no need to add a networks section to each service. So, now we are ready to fire up our cluster with a master node and, let's say, 5 more nodes:

$es-cluster>docker-compose scale master=1 node=5
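
To see the containers that were started, run docker-compose ps; you should get es_master plus five node containers (their exact names depend on the Compose project name):

$es-cluster>docker-compose ps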

Once again, to see the cluster's health, open the following address in the browser:

localhost:9200/_cluster/health?pretty=true

Something like this should be shown:

{
    "cluster_name" : "cluster-01",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 6,
    "number_of_data_nodes" : 6,
    "active_primary_shards" : 0,
    "active_shards" : 0,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 100.0
}

Loading data

At this point you need to load some data into Elasticsearch. From Loading the Sample Dataset you can download the sample dataset (accounts.json), extract it to your current directory, and load it into your cluster as follows:

$es-cluster>curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"

After loading it run:

$es-cluster>curl 'localhost:9200/_cat/indices?v'

and it will show:

health status index pri rep docs.count docs.deleted store.size pri.store.size
green open bank 5 1 889 0 344.3kb 189.7kb

If we refresh the health page in the browser, it now reflects the new index; the five primary shards of the bank index plus one replica of each account for the ten active shards:

{
    "cluster_name" : "cluster-01",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 6,
    "number_of_data_nodes" : 6,
    "active_primary_shards" : 5,
    "active_shards" : 10,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 100.0
}
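
With the data in place, a quick match-all query via the URI search API confirms the documents are searchable:

$es-cluster>curl 'localhost:9200/bank/_search?q=*&pretty'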

Adding the HQ plugin

A nice plugin for visualising the cluster and querying the data within is HQ. The easiest way to install it is to run the plugin install command inside the master container:

$es-cluster>docker exec es_master plugin install royrusso/elasticsearch-HQ

This will download and install HQ. Then, in the browser, go to localhost:9200/_plugin/hq and connect to http://localhost:9200 at the top of the screen.
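
To double-check the installation, the same plugin script has a list command that should include the HQ plugin in its output:

$es-cluster>docker exec es_master plugin list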


Summary

This was yet another simple example of creating clusters with Docker Compose. I hope, though, that I shed some light on various parts of Docker Compose and its networking…
Enjoy coding!