Previous: 4. The vanilla universe Index Next: 6. The parallel universe

5. The docker universe

Why the docker universe?

I wrote a short guide about Docker here.

To summarize: the vanilla examples showed that the list of files to copy from the submit machine to the execution machine can be long. If you instead create a container that includes the full working setup for your executables, you only have to run the job inside the container, and you can be sure that it will find everything it needs to work properly.

For programs that only need a specific version of some software that may not be installed on every computer in the pool (R, Java or Perl, for example), you will probably find an existing Docker image on Docker Hub that suits your needs.

If you need something more specific, such as particular data or libraries, you may instead have to create or customize your own image and push it to a repository.

Once the image you need is in a repository, you can easily submit docker universe jobs and have your code running.
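As a rough sketch of the "create and push" step, the commands below build an image from a Dockerfile in the current directory and push it to a repository. The image name myimage and the latest tag are hypothetical, and to4pxl:5000 is the local repository that appears in the examples of this chapter; adapt all three to your setup. The build and push run only if Docker and a Dockerfile are actually found.

```shell
#!/bin/bash
# Sketch only: image name and tag are hypothetical placeholders.
registry="to4pxl:5000"     # local repository used in this chapter
name="myimage"             # hypothetical image name
tag="latest"               # hypothetical tag
image_ref="$registry/$name:$tag"

# This is the value to put in the submit file
echo "docker_image=$image_ref"

# Build and push only where Docker and a Dockerfile are available
if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    docker build -t "$image_ref" . && docker push "$image_ref"
else
    echo "skipping build: docker or Dockerfile not found"
fi
```

The full name, including the repository prefix, is what the job submission has to refer to later.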

Specific commands

The docker universe is very similar to the vanilla universe.

What changes between a vanilla job and a docker job? First, and obviously, Universe = docker.

The other differences all come from the fact that the job runs inside the container. Since a container can be configured to run a default executable, the first important difference is that the Executable instruction may be omitted. This is possible only if the Docker image is properly configured!

The second difference is that you must include the instruction that tells HTCondor which Docker image to use: this is docker_image. Its value must be the name of an existing image. If the image is not already present on the node that executes the job, Docker will look for it in the remote repository. If no repository is specified, it will search Docker Hub. Some examples:

#from Docker Hub
docker_image=hello-world
docker_image=r-base:latest
#from the local repository in to4pxl:5000
docker_image=to4pxl:5000/python:latest
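Putting these instructions together, a minimal docker universe job could look like the sketch below. It writes a submit file that relies on the default command of the hello-world image, so it has no Executable line, and submits it only if condor_submit is available. The file names (hello_docker.sub and the hello.* log files) are hypothetical.

```shell
#!/bin/bash
# Sketch: write a minimal docker universe submit file (file names are hypothetical).
submit_file="hello_docker.sub"

# The quoted 'EOF' keeps $(Cluster) literal, so HTCondor expands it, not the shell
cat > "$submit_file" <<'EOF'
Universe     = docker
docker_image = hello-world
# No Executable line: the default command of the image is used

Output = hello.$(Cluster).o
Error  = hello.$(Cluster).e
Log    = hello.$(Cluster).l

Queue
EOF

# Submit only where HTCondor is installed
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit "$submit_file"
else
    echo "condor_submit not found; wrote $submit_file only"
fi
```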

To run the job inside the container, HTCondor mounts the job's scratch directory (on the node filesystem) so that it appears in the container filesystem at the same path. The job then starts inside the container, in the same path it would have outside it. This directory is the only folder in the container that is visible from outside, so you must make sure that all the output files are copied there; otherwise they will be lost when the job terminates.

The user that HTCondor uses to run docker universe jobs is the same one it would use for a vanilla job: the submitting user if possible, or nobody otherwise. Make sure that all the folders and files inside the container that nobody must read or write have the correct permissions. The scratch directory is already properly configured.
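A wrapper script run inside the container can follow the pattern sketched below: remember the starting directory, check that the current user can write there, do the work elsewhere, and copy the results back before exiting. The temporary work directory and the output.txt file are hypothetical stand-ins for a folder that exists only in the container and for the real output of your program.

```shell
#!/bin/bash
# Sketch of a wrapper run inside the container (paths and file names are hypothetical).
scratchdir=$(pwd)        # the job starts in the directory shared with the node
workdir=$(mktemp -d)     # stands in for a folder that exists only in the container

echo "running as: $(id -un 2>/dev/null || id -u)"

# Fail early if the current user cannot write where the results must end up
if [ ! -w "$scratchdir" ]; then
    echo "cannot write to $scratchdir" >&2
    exit 1
fi

# Placeholder for the real computation
cd "$workdir" || exit 1
echo "result" > output.txt

# Copy the results back before exiting, otherwise they are lost
cp output.txt "$scratchdir"/
cd "$scratchdir" || exit 1
rm -r "$workdir"
ls -l output.txt
```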

A working example

Here are the submission scripts that I use to run my cosmomc jobs. I created a Docker image, cosmobox-ready, with all the likelihoods and codes, and pushed it to the local repository on to4pxl. I will use the 160725 version of the image for my submission.

This is the submit file:

Universe = docker
JOBNAME  = cosmomcDocker

docker_image=to4pxl:5000/cosmobox-ready:160725
Executable = ./cosmomcwrapper_docker
Arguments  = 0 test_planck.ini

transfer_input_files=condor/submit/docker/cosmomcwrapper_docker,condor/submit/docker/test_planck.ini
when_to_transfer_output = ON_EXIT_OR_EVICT

on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
next_job_start_delay=60

initial_dir= /home/gariazzo/
Log        = condor_logs/$(JOBNAME).$(Cluster).l
Output     = condor_logs/$(JOBNAME).$(Cluster).o
Error      = condor_logs/$(JOBNAME).$(Cluster).e

Queue

This is the bash script that performs the operations before and after the cosmomc execution inside the Docker container:

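#!/bin/bash
# Wrapper run inside the container: $1 is the git branch, the rest goes to cosmomc
scratchdir=$(pwd)
echo "starting in the docker container, wd: $scratchdir"

# Set up the Planck likelihood environment
source /home/common/Planck15/plc-2.0/bin/clik_profile.sh
cd /home/cosmomc_git/ || exit 1

echo "requesting branch $1"
git checkout "$1"
shift

# Copy the transferred input files next to the code
echo "copying input..."
cp "$scratchdir"/* -t /home/cosmomc_git/
# Make chains/ point to the starting directory, so the output is visible outside
rm -r chains/
ln -s "$scratchdir" chains
mkdir chains/clusters

echo "make"
make

echo "running (in $(pwd)):"
echo ./cosmomc "$@"
echo " "
./cosmomc "$@"
err=$?
echo "exiting cosmomc ($err)"

echo "listing $scratchdir content:"
ls chains/*

echo "exiting the script"
exit $err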
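Before submitting, it can help to try the wrapper by hand in a container that sees the current directory at the same path, similar to what the job will find. The helper below is only an approximation under that assumption: HTCondor passes further options to Docker (the user to run as, resource limits and so on) that are not reproduced here. The function name run_like_condor and the DOCKER variable are hypothetical; by default the example only prints the command instead of running it.

```shell
#!/bin/bash
# Sketch: run a command in a container that sees the current directory
# at the same path. Not the exact command line HTCondor uses.
run_like_condor() {
    # $1: image name; remaining arguments: command to run in the container
    image=$1
    shift
    jobdir=$(pwd)
    # -v mounts the directory at the same path, -w starts the command there
    ${DOCKER:-docker} run --rm -v "$jobdir:$jobdir" -w "$jobdir" "$image" "$@"
}

# Dry run: DOCKER="echo docker" prints the command instead of executing it
DOCKER="echo docker" run_like_condor to4pxl:5000/cosmobox-ready:160725 \
    ./cosmomcwrapper_docker 0 test_planck.ini
```

Dropping the DOCKER="echo docker" prefix runs the container for real, which requires Docker and access to the repository that holds the image.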



