Previous: 2. User commands Index Next: 4. The vanilla universe

3. Opportunistic computation


Job execution policies

HTCondor differs from other queue managers in its great flexibility in managing job execution. This flexibility comes from a few configuration expressions that define when a node should start, suspend, vacate or kill a job.

These policies are what distinguish a dedicated HTCondor setup from an opportunistic one. For example, on a dedicated resource you would expect jobs to be always running and never suspended, while on a personal computer jobs should not run while the owner of the PC is working. All these behaviours can be defined in the HTCondor configuration files of each node, using expressions that evaluate to True or False, as listed below.

An excerpt from the HTCondor manual:

START
When TRUE, the machine is willing to spawn a remote HTCondor job.
WANT_SUSPEND
If True, the machine evaluates the SUSPEND expression to see if it should transition to the Suspended activity. If any value other than True, the machine will look at the PREEMPT expression.
SUSPEND
If WANT_SUSPEND is True, and the machine is in the Claimed/Busy state, it enters the Suspended activity if SUSPEND is True.
CONTINUE
If the machine is in the Claimed/Suspended state, it enters the Busy activity if CONTINUE is True.
PREEMPT
If the machine is either in the Claimed/Suspended activity, or is in the Claimed/Busy activity and WANT_SUSPEND is FALSE, the machine enters the Claimed/Retiring state whenever PREEMPT is TRUE.
WANT_VACATE
This is checked only when the PREEMPT expression is True and the machine enters the Preempting state. If WANT_VACATE is True, the machine enters the Vacating activity. If it is False, the machine will proceed directly to the Killing activity.
KILL
If the machine is in the Preempting/Vacating state, it enters Preempting/Killing whenever KILL is True.
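To make the order of evaluation in the list above concrete, here is a toy model written as a shell function. It is only a simplified sketch of the rules as quoted, not how condor_startd is implemented: the function name next_activity and its argument order are hypothetical, each expression value is passed in as the literal string true or false, the Retiring step assumes the retirement period is already over, and when both CONTINUE and PREEMPT are true in the Suspended activity the sketch gives CONTINUE precedence (an assumption, not taken from the manual).

```shell
#!/bin/sh
# Toy model of the policy rules quoted above (hypothetical helper, not HTCondor code).
# Usage: next_activity ACTIVITY WANT_SUSPEND SUSPEND CONTINUE PREEMPT WANT_VACATE KILL
#   ACTIVITY is Busy, Suspended, Retiring or Vacating; the other arguments are the
#   current values of the expressions, written as true or false.
next_activity() {
  activity=$1 want_suspend=$2 suspend=$3 cont=$4 preempt=$5 want_vacate=$6 kill=$7
  case $activity in
    Busy)
      if [ "$want_suspend" = true ]; then
        # Suspension enabled: SUSPEND is checked and PREEMPT is not looked at.
        if [ "$suspend" = true ]; then echo Suspended; else echo Busy; fi
      elif [ "$preempt" = true ]; then
        echo Retiring
      else
        echo Busy
      fi ;;
    Suspended)
      # Assumption of this sketch: CONTINUE wins if both CONTINUE and PREEMPT are true.
      if [ "$cont" = true ]; then echo Busy
      elif [ "$preempt" = true ]; then echo Retiring
      else echo Suspended
      fi ;;
    Retiring)
      # Assume retirement is over: the machine enters the Preempting state, where
      # WANT_VACATE chooses between a graceful vacate and an immediate kill.
      if [ "$want_vacate" = true ]; then echo Vacating; else echo Killing; fi ;;
    Vacating)
      # KILL cuts a slow vacate short.
      if [ "$kill" = true ]; then echo Killing; else echo Vacating; fi ;;
    *)
      echo "unknown activity: $activity" >&2
      return 1 ;;
  esac
}
```

For example, on a desktop with suspension enabled, next_activity Busy true true false false true false prints Suspended, while on a node with WANT_SUSPEND set to false, next_activity Busy false false false true true false prints Retiring.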

I will now describe the situation for the to4pxl pool, where dedicated and opportunistic resources coexist.

In the dedicated network (the z* nodes), the configuration allows jobs to start only if the node is not being used by non-HTCondor processes, and job suspension is disabled.
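As an illustration only, a dedicated-node policy of this kind could be written with expressions like the ones below. This is a hypothetical sketch, not the content of the to4pxl files: the helper name write_dedicated_policy, the macro name NonCondorLoad and the 0.3 load threshold are assumptions. LoadAvg and CondorLoadAvg are standard machine attributes (total load and load generated by HTCondor jobs), so their difference measures non-HTCondor activity.

```shell
#!/bin/sh
# Hypothetical helper: write a sketch of a dedicated-node policy to the file given as $1.
# The 0.3 load threshold is an example value, not the to4pxl setting.
write_dedicated_policy() {
  # The quoted 'EOF' keeps the shell from expanding $(...) inside the fragment.
  cat > "$1" <<'EOF'
# Load caused by processes that are not HTCondor jobs.
NonCondorLoad = (LoadAvg - CondorLoadAvg)
# Start jobs only when no other work is loading the node.
START         = $(NonCondorLoad) <= 0.3
# Never suspend or preempt a running job.
WANT_SUSPEND  = False
SUSPEND       = False
CONTINUE      = True
PREEMPT       = False
KILL          = False
EOF
}
```

For example, write_dedicated_policy /tmp/dedicated_policy && cat /tmp/dedicated_policy prints the fragment. Copying it into /etc/condor/config.d/ on a pool node would change the pool policy, so keep such experiments to a test machine.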

In the opportunistic case, instead, suspension is enabled. The current settings favour the execution of jobs submitted by the owner of the machine, who can use it to test their code while working. Jobs of other users can run only if the keyboard is idle and the CPU is free. Otherwise, they are suspended or vacated, so that the owner can keep working smoothly.
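A desktop policy with the behaviour just described could look roughly like the sketch below. Again, this is a hypothetical example, not the actual content of 02gr4_common or 03gr4_desktop: the helper name write_desktop_policy, the macro names OwnerJob, NonCondorLoad and MachineIdle, and all the time and load thresholds are assumptions. KeyboardIdle, LoadAvg, CondorLoadAvg, Activity and EnteredCurrentActivity are standard machine attributes, TARGET.Owner is the owner of the job, and MACHINE_OWNER is written in the same quoted form used in 03gr4_desktop.

```shell
#!/bin/sh
# Hypothetical helper: write a sketch of an opportunistic desktop policy to $1.
# Thresholds (15 min keyboard idle, 0.3 load, 10 min timeouts) are example values only.
write_desktop_policy() {
  cat > "$1" <<'EOF'
# Placeholder, set as in 03gr4_desktop.
MACHINE_OWNER = "your_username"
OwnerJob      = (TARGET.Owner =?= $(MACHINE_OWNER))
NonCondorLoad = (LoadAvg - CondorLoadAvg)
MachineIdle   = (KeyboardIdle > 15 * 60 && $(NonCondorLoad) <= 0.3)

# The owner's jobs can always start; other jobs only on an idle machine.
START         = $(OwnerJob) || $(MachineIdle)
# Suspend other users' jobs as soon as the keyboard is touched.
WANT_SUSPEND  = True
SUSPEND       = !$(OwnerJob) && KeyboardIdle < 60
CONTINUE      = $(MachineIdle)
# Give up on a job that has been suspended for more than 10 minutes.
PREEMPT       = (Activity == "Suspended") && (time() - EnteredCurrentActivity > 10 * 60)
# Try a graceful vacate first, kill if it takes more than 10 minutes.
WANT_VACATE   = True
KILL          = (time() - EnteredCurrentActivity > 10 * 60)
EOF
}
```

With WANT_SUSPEND set to True, PREEMPT is only consulted once a job is already suspended, which is why the sketch evicts jobs only after a long suspension.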

You can find more details about the policies in the configuration files. They are in /etc/condor/config.d/ on each node, and in /home/condor/config.d/ on to4pxl. In particular, 02gr4_common contains many comments (taken from the HTCondor manual) next to the policy settings.
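To get a quick overview of which policy expressions are set in which file, a small grep is enough. The helper name show_policy is hypothetical; it only matches lines that start with one of the seven expression names listed above, so definitions split over several lines show up only partially. On a running node, condor_config_val -verbose START (and likewise for the other names) reports the value HTCondor actually uses and where it was defined.

```shell
#!/bin/sh
# Hypothetical helper: print the policy definitions found in a configuration directory.
# Usage: show_policy [DIR]   (DIR defaults to /etc/condor/config.d)
show_policy() {
  dir=${1:-/etc/condor/config.d}
  # -H: prefix each match with the file name; -n: add the line number;
  # -i: HTCondor configuration macro names are case-insensitive.
  grep -HniE '^[[:space:]]*(START|WANT_SUSPEND|SUSPEND|CONTINUE|PREEMPT|WANT_VACATE|KILL)[[:space:]]*=' "$dir"/*
}
```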

Install HTCondor and Docker

WARNING 1: these instructions were written in August 2016. They may be out of date.

WARNING 2: I will assume you are using some version of Ubuntu.

The first step is to install HTCondor. The best option is to use the HTCondor repository, but this is a problem on a 32-bit architecture, because the repository only contains 64-bit deb packages. If you cannot install 64-bit packages, skip the echo and wget lines of the following code, which add the HTCondor repository.
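If you are not sure whether your system is 64-bit, you can check before touching sources.list. A minimal sketch: the helper name is_64bit is hypothetical; it looks at the machine name printed by uname -m (x86_64 on 64-bit PCs) and also accepts the Debian name amd64, so you can pass it the output of dpkg --print-architecture instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given architecture (default: uname -m) is 64-bit x86.
is_64bit() {
  arch=${1:-$(uname -m)}
  case $arch in
    x86_64|amd64) return 0 ;;
    *)            return 1 ;;
  esac
}

# Example: decide whether the HTCondor repository lines should be run.
if is_64bit; then
  echo "64-bit system: you can add the HTCondor repository"
else
  echo "not 64-bit x86: skip the repository lines"
fi
```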

Open a terminal and log in as root (type sudo su and enter your password), then run:

#as root:
rm -f /etc/condor/condor_credential
echo "deb http://research.cs.wisc.edu/htcondor/ubuntu/stable/ trusty contrib" >> /etc/apt/sources.list
wget -qO - http://research.cs.wisc.edu/htcondor/ubuntu/HTCondor-Release.gpg.key | apt-key add -
apt update
apt install -y condor

At this point HTCondor is installed, but not configured.

Before configuring HTCondor, let's also install Docker, to make the docker universe available:

#as root:
apt update
apt install apt-transport-https ca-certificates
apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D

Then run only the line that matches your Ubuntu release:

#ubuntu 14.04:
echo deb https://apt.dockerproject.org/repo ubuntu-trusty main >> /etc/apt/sources.list.d/docker.list
#ubuntu 16.04:
echo deb https://apt.dockerproject.org/repo ubuntu-xenial main >> /etc/apt/sources.list.d/docker.list
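If you prefer not to pick the line by hand, the release codename can select it for you. This is a sketch under the assumption that lsb_release is available (it is on standard Ubuntu installations); the helper name docker_repo_line is hypothetical and only knows the two releases covered above.

```shell
#!/bin/sh
# Hypothetical helper: print the Docker repository line for an Ubuntu codename
# (default: the codename of the running system, from lsb_release -cs).
docker_repo_line() {
  codename=${1:-$(lsb_release -cs)}
  case $codename in
    trusty|xenial)
      echo "deb https://apt.dockerproject.org/repo ubuntu-$codename main" ;;
    *)
      echo "unsupported Ubuntu release: $codename" >&2
      return 1 ;;
  esac
}
```

As root, docker_repo_line >> /etc/apt/sources.list.d/docker.list then replaces the manual choice.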

This will ensure that you are using the most recent Docker version. Now, continue with:

apt update
apt install -y linux-image-extra-$(uname -r) docker-engine
service docker start
groupadd -f docker   # -f: do not fail if the group already exists
usermod -aG docker condor

The final step is to add your user to the docker group, so that you can use docker commands without root privileges: as root, run usermod -aG docker your_username. The new group membership takes effect after you log out and log back in.
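To check that both the condor user and your own user ended up in the docker group, you can look at their group lists with id -nG. A minimal sketch; the helper name in_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $1 belongs to group $2.
in_group() {
  # id -nG prints the names of all groups of the user, separated by spaces.
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Check the condor user and the user running this script.
for u in condor "$(id -un)"; do
  if in_group "$u" docker; then
    echo "$u is in the docker group"
  else
    echo "$u is NOT in the docker group"
  fi
done
```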

Configure HTCondor for the to4pxl pool

First of all, to add your PC to the pool you need an account on the central node, to4pxl. The account is needed to copy the configuration files but also, more importantly, to submit jobs! To obtain an account, ask me (gariazzo@to.infn.it) or Carlo Giunti (giunti@to.infn.it).

Once you have access to to4pxl, copy three files from /home/condor/config.d/ to your local machine: 01gr4_work, 02gr4_common and 03gr4_desktop. They must end up in /etc/condor/config.d/ on your machine, but since that directory is writable only by root, you need an intermediate step, for example:

#from your pc:
for f in 01gr4_work 02gr4_common 03gr4_desktop;
do
   scp to4pxl:/home/condor/config.d/$f .
   sudo mv $f /etc/condor/config.d/
done
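After the loop, it is worth checking that all three files actually arrived, since a failed scp only prints an error and the loop carries on. A minimal sketch; the helper name check_gr4_files is hypothetical and the directory argument defaults to /etc/condor/config.d.

```shell
#!/bin/sh
# Hypothetical helper: report which of the three pool files are missing from DIR.
# Usage: check_gr4_files [DIR]   (default: /etc/condor/config.d)
check_gr4_files() {
  dir=${1:-/etc/condor/config.d}
  missing=0
  for f in 01gr4_work 02gr4_common 03gr4_desktop; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Exit status 0 only if all three files are present.
  return $missing
}
```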

Before starting HTCondor, update one parameter in 03gr4_desktop so that the policy settings work correctly:

#in 03gr4_desktop, set:
MACHINE_OWNER = "your_username"
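The same change can be made non-interactively with sed. This sketch assumes GNU sed (the default on Ubuntu) for the -i option, a username without / or & characters, and a file that already contains a line starting with MACHINE_OWNER; the helper name set_machine_owner is hypothetical. Since the real file is owned by root, run it as root.

```shell
#!/bin/sh
# Hypothetical helper: set MACHINE_OWNER = "USER" in FILE (GNU sed, edits in place).
# Usage: set_machine_owner FILE USER
set_machine_owner() {
  file=$1 user=$2
  # Fail if there is no MACHINE_OWNER line to replace.
  grep -q '^MACHINE_OWNER[[:space:]]*=' "$file" || return 1
  sed -i "s/^MACHINE_OWNER[[:space:]]*=.*/MACHINE_OWNER = \"$user\"/" "$file"
}
```

For example: set_machine_owner /etc/condor/config.d/03gr4_desktop your_username.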

Now it is time to start HTCondor. If it was already started after the installation, use restart instead of start in the following command:

sudo service condor start

The last thing to do is to save the pool password. HTCondor must be running for this command to work: you can verify that at least condor_master is running with ps -ef | grep condor. The password is stored in the file /home/condor/pool_credential on to4pxl. Run the following command and type the password when prompted:

condor_store_cred -c add

Conclude the configuration with

condor_restart

You can now check whether your node is connected to the pool: condor_status -schedd shows whether the schedd of the pool is recognized, and condor_status shows whether your node is listed as part of the pool.
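To automate the last check, you can compare your host name with the machines reported by the pool. A sketch under a few assumptions: the helper name node_in_pool is hypothetical, it reads machine names from standard input so that it can be fed with condor_status -af Machine (one Machine value per slot, one per line), and it assumes the Machine attribute matches the fully qualified name printed by hostname -f.

```shell
#!/bin/sh
# Hypothetical helper: succeed if HOST appears in the machine list read from stdin.
# Usage: condor_status -af Machine | node_in_pool [HOST]
node_in_pool() {
  host=${1:-$(hostname -f)}
  # A node with several slots appears once per slot; any exact match is enough.
  grep -qxF "$host"
}
```

Typical use: condor_status -af Machine | node_in_pool && echo "this node is in the pool".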