Deploying Hadoop and Solr with Docker


Docker is an open platform for building, shipping, and running distributed applications. Many Docker images are available, based on different operating systems and bundled with different applications such as Hadoop and MongoDB.

When we want to learn a tool or give it a try, we can just call docker run with the corresponding image, for example: docker run --name some-mongo -d mongo
This does not clutter the host environment. When we are done, we can call docker kill to stop the running container, and docker rm to remove it.
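
The try-and-discard workflow above could be wrapped in two small shell helpers. This is only a minimal sketch: the function names start_throwaway and discard_throwaway and the DOCKER override variable are hypothetical names introduced here for illustration, while docker run, docker kill, and docker rm are standard Docker CLI subcommands.

```shell
# DOCKER is a hypothetical override so the helpers can be dry-run with "echo";
# it defaults to the real docker CLI.
DOCKER="${DOCKER:-docker}"

# start_throwaway NAME IMAGE: start IMAGE detached under the container name NAME.
start_throwaway() {
    "$DOCKER" run --name "$1" -d "$2"
}

# discard_throwaway NAME: kill the container, then remove it so the name can be reused.
discard_throwaway() {
    "$DOCKER" kill "$1" && "$DOCKER" rm "$1"
}

# Usage (not executed here):
#   start_throwaway some-mongo mongo
#   discard_throwaway some-mongo
```

Setting DOCKER=echo before calling the helpers prints the commands instead of running them, which is a cheap way to check what would be executed.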

We can also use Docker to create a consistent environment which can be run on any Docker-enabled machine.

In this article, I would like to show how to run Hadoop and Solr in Docker.

Install the Hadoop Image and Run It
Search for Hadoop in the Docker registry (https://registry.hub.docker.com). I chose the most popular image, sequenceiq/hadoop-docker.
Run the following command on the Ubuntu host:
docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash

This downloads the hadoop-docker image and starts a container from it. After several minutes, we get a bash prompt inside the hadoop-docker container.

Install Solr in the Hadoop Container
Run the following commands to download Solr 4.10.1 (the latest release at the time of writing) and unpack it.
mkdir -p /home/lifelongprogrammer/src/solr; cd /home/lifelongprogrammer/src/solr
curl -O http://mirrors.advancedhosters.com/apache/lucene/solr/4.10.1/solr-4.10.1.tgz
tar -xf solr-4.10.1.tgz
cd /home/lifelongprogrammer/src/solr/solr-4.10.1/example

Then run the following command to start Solr on the default port 8983, with its index and update log stored in HDFS.
java -Dsolr.directoryFactory=HdfsDirectoryFactory \
     -Dsolr.lock.type=hdfs \
     -Dsolr.data.dir=hdfs://$(hostname):9000/solr/datadir \
     -Dsolr.updatelog=hdfs://$(hostname):9000/solr/updateLog -jar start.jar
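
Because the same four system properties are needed again when Solr is started from the bootstrap script, they could be generated by a small helper. This is a sketch under assumptions: solr_hdfs_opts is a hypothetical function name, and the default port 9000 simply mirrors the NameNode port used in the command above.

```shell
# solr_hdfs_opts HOST [PORT]: print the JVM system properties that point
# Solr's index and update log at HDFS. PORT defaults to 9000.
solr_hdfs_opts() {
    host="$1"
    port="${2:-9000}"
    base="hdfs://${host}:${port}/solr"
    printf '%s %s %s %s\n' \
        "-Dsolr.directoryFactory=HdfsDirectoryFactory" \
        "-Dsolr.lock.type=hdfs" \
        "-Dsolr.data.dir=${base}/datadir" \
        "-Dsolr.updatelog=${base}/updateLog"
}

# Usage (not executed here), from the Solr example directory:
#   java $(solr_hdfs_opts "$(hostname)") -jar start.jar
```

The command substitution in the usage line is left unquoted on purpose so that the shell splits the output into four separate arguments for java.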

Run Solr in the Background on Startup
Edit /etc/bootstrap.sh and add the following command after the line that calls $HADOOP_PREFIX/sbin/start-yarn.sh:
cd /home/lifelongprogrammer/src/solr/solr-4.10.1/example && nohup java -Dsolr.directoryFactory=HdfsDirectoryFactory \
   -Dsolr.lock.type=hdfs \
   -Dsolr.data.dir=hdfs://$(hostname):9000/solr/datadir \
   -Dsolr.updatelog=hdfs://$(hostname):9000/solr/updateLog -jar start.jar &

Commit Changes and Create a Docker Image
First run docker ps in another terminal on the host to get the container id:
CONTAINER ID        IMAGE 
2cd8fadba668        93186936bee2

Then let's commit the changes and create our own Docker image:
docker commit 2cd8fadba668   hadoop_docker_withsolr

Run exit in the open container shell to leave it. Then run:
docker run -d -t -p 8983:8983 hadoop_docker_withsolr /etc/bootstrap.sh -d

The first -d tells Docker to run the container in detached mode, -t allocates a pseudo-terminal, and -p publishes a container port to the host (here container port 8983 on host port 8983).
The last -d is a parameter of /etc/bootstrap.sh, not of docker.

After several minutes, we can open http://linuxhostip:8983/solr/#/ to reach the Solr admin page. Solr is now running inside the Hadoop container.
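
Instead of waiting a few minutes and reloading the page by hand, the check can be polled from the host. The following is a generic retry loop, offered as a sketch: retry_until is a hypothetical helper name, and the curl invocation in the usage comment assumes curl is installed on the host and that linuxhostip stands for the real host address.

```shell
# retry_until MAX DELAY CMD...: run CMD until it succeeds, making up to MAX
# attempts (at least one) and sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
retry_until() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage (not executed here): poll every 5 seconds, up to 60 times.
#   retry_until 60 5 curl -sf -o /dev/null http://linuxhostip:8983/solr/
```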

After we are done with our test, we run docker ps to get the container id, then call docker kill $container_id to kill it.
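
The lookup and the kill can be combined into one step. This sketch assumes a Docker version whose docker ps supports the ancestor filter; kill_by_image and the DOCKER override variable are hypothetical names used only for illustration.

```shell
# DOCKER is a hypothetical override for dry runs; it defaults to the real CLI.
DOCKER="${DOCKER:-docker}"

# kill_by_image IMAGE: kill every running container created from IMAGE.
# Returns 1 (with a message on stderr) when no such container is running.
kill_by_image() {
    ids="$("$DOCKER" ps -q --filter "ancestor=$1")"
    if [ -z "$ids" ]; then
        echo "no running container for image $1" >&2
        return 1
    fi
    # $ids is intentionally unquoted: one argument per container id.
    "$DOCKER" kill $ids
}

# Usage (not executed here):
#   kill_by_image hadoop_docker_withsolr
```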

Persist Modified Image
Now let's save our modified docker image:
docker save hadoop_docker_withsolr  > hadoop_docker_withsolr_save.tar

Now we can copy this tar to another machine, and load it:
docker load < hadoop_docker_withsolr_save.tar

