Previous: 2. User commands | Index | Next: 4. The vanilla universe |
HTCondor differs from other queue managers in the extreme flexibility with which it manages job execution. This flexibility is summarized in a few configuration commands that define when a node should run, suspend, vacate or kill a job.
The differences between using HTCondor on dedicated computation resources and in its opportunistic version are defined by these policies. For example, on a dedicated resource you would expect jobs to be always running and never suspended, while on a personal computer jobs should not run while the owner of the pc is working. All these behaviours can be defined in the HTCondor configuration files of each node, using expressions that evaluate to True or False, as listed below.
An excerpt of the HTCondor manual:
I will now describe the situation for the to4pxl pool, where dedicated and opportunistic resources coexist.
In the dedicated network (the z* nodes), the configuration allows jobs to start only if the node is not used by a non-HTCondor job, while job suspension is disabled.
In the opportunistic case, instead, suspension is enabled. The current settings favor the execution of jobs submitted by the owner of the machine, who can use it to test their code while working. Jobs of other users can run only if the keyboard is idle and the CPU is free. If not, jobs are suspended or vacated, so that the owner can keep working on the machine without slowdowns.
You can see more details about the policies in the configuration files. They are in /etc/condor/config.d/ on each node or, on to4pxl, in /home/condor/config.d/. In particular, 02gr4_common contains a lot of comments next to the policy configurations (taken from the HTCondor manual).
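To see which policy a node is actually applying, the expressions can also be read back from the running configuration with condor_config_val. The following is only a sketch: it assumes the usual HTCondor policy expression names (START, SUSPEND, CONTINUE, PREEMPT, KILL), the files of the pool may define others, and the helper name show_policy is made up for this example.

```shell
# Print the main policy expressions of the local node.
# Assumption: the policy uses the standard expression names listed below.
show_policy() {
    for expr in START SUSPEND CONTINUE PREEMPT KILL; do
        if command -v condor_config_val >/dev/null 2>&1; then
            # condor_config_val fails if the expression is not defined
            value=$(condor_config_val "$expr" 2>/dev/null) || value="(not defined)"
        else
            value="(condor_config_val not found)"
        fi
        printf '%s = %s\n' "$expr" "$value"
    done
}

show_policy
```

On a machine without HTCondor the function only prints placeholders, so it is safe to try anywhere.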
WARNING 1: these instructions were written in August 2016. They may be out of date.
WARNING 2: I will assume you are using some version of Ubuntu.
First of all, you have to install HTCondor. The best option is to use the HTCondor repository, but it contains only 64-bit deb packages, which is a problem if you have a 32-bit architecture. If you cannot install 64-bit packages, skip the echo and wget lines of the following code (the ones that add the repository and its key).
Open a terminal and log in as root (type sudo su and enter your password), then:
#as root:
rm /etc/condor/condor_credential
echo "deb http://research.cs.wisc.edu/htcondor/ubuntu/stable/ trusty contrib" >> /etc/apt/sources.list
wget -qO - http://research.cs.wisc.edu/htcondor/ubuntu/HTCondor-Release.gpg.key | sudo apt-key add -
apt update
apt install -y condor
At this point HTCondor is installed, but not configured.
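The 32-bit caveat mentioned above can be checked from the shell before touching the repository list. This is a minimal sketch, assuming that "64-bit" here means 64-bit x86 (x86_64 in the output of uname -m); the function name repo_usable is only illustrative.

```shell
# Succeeds if the architecture given as $1 (default: output of `uname -m`)
# is 64-bit x86. Assumption: this is what the 64-bit deb packages require.
repo_usable() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;
    esac
}

if repo_usable; then
    echo "64-bit system: the HTCondor repository can be added"
else
    echo "not a 64-bit x86 system: skip the repository and key lines"
fi
```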
Before configuring it, let's also install Docker, to make the docker universe available:
#as root:
apt update
apt install apt-transport-https ca-certificates
apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
After this, use only the line that matches your Ubuntu release:
#ubuntu 14.04:
echo deb https://apt.dockerproject.org/repo ubuntu-trusty main >> /etc/apt/sources.list.d/docker.list
#ubuntu 16.04:
echo deb https://apt.dockerproject.org/repo ubuntu-xenial main >> /etc/apt/sources.list.d/docker.list
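If you prefer not to pick the line by hand, the choice can be derived from the release codename. The sketch below handles only the two releases covered here (trusty and xenial); the function name docker_repo_line is made up, and lsb_release is assumed to be installed, as it normally is on Ubuntu.

```shell
# Print the Docker repository line for the Ubuntu codename given as $1.
# Only the two releases discussed in the text are handled.
docker_repo_line() {
    case "$1" in
        trusty|xenial)
            echo "deb https://apt.dockerproject.org/repo ubuntu-$1 main" ;;
        *)
            echo "unsupported Ubuntu release: $1" >&2
            return 1 ;;
    esac
}

# Example: show the line for the running system, if lsb_release is available.
if command -v lsb_release >/dev/null 2>&1; then
    docker_repo_line "$(lsb_release -cs)" || true
fi
```

As root, the printed line can then be appended to /etc/apt/sources.list.d/docker.list with >>, exactly as in the commands above.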
Adding this repository ensures that you get the most recent Docker version. Now, continue with:
apt update
apt install -y linux-image-extra-$(uname -r) docker-engine
service docker start
groupadd docker
usermod -aG docker condor
The final step is to add your user to the docker group, so that you can use docker commands without root privileges:
usermod -aG docker your_username
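To verify that the two usermod commands had the intended effect, the group membership can be checked as follows. This is just a sketch; the helper name in_group is made up.

```shell
# Succeeds if user $1 belongs to group $2 according to the group database.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -Fxq -- "$2"
}

# Check the condor user and the user running this script.
for u in condor "$(id -un)"; do
    if in_group "$u" docker; then
        echo "$u: in the docker group"
    else
        echo "$u: not in the docker group (or user not found)"
    fi
done
```

Note that a shell that was already open keeps its old group list: a new login is needed before docker commands work without root privileges.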
to4pxl pool
First of all, if you want to include your pc in the pool, you need an account on the central node, to4pxl. This is required to copy the configuration files and, more importantly, to submit jobs! To obtain an account, ask me (gariazzo@to.infn.it) or Carlo Giunti (giunti@to.infn.it).
Once you have access to to4pxl, you should copy three files from /home/condor/config.d/ to your local machine: 01gr4_work, 02gr4_common and 03gr4_desktop. These files must end up in /etc/condor/config.d/ on your machine. Since writing there requires root privileges, you will have to do an intermediate step, for example:
#from your pc:
for f in 01gr4_work 02gr4_common 03gr4_desktop; do
    scp to4pxl:/home/condor/config.d/"$f" .
    sudo mv "$f" /etc/condor/config.d/
done
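After the copy, it may be worth checking that all three files arrived. A small sketch, with a made-up helper name (missing_configs):

```shell
# Print the names of the expected configuration files that are missing from
# directory $1 (default: /etc/condor/config.d). Prints nothing if all exist.
missing_configs() {
    dir=${1:-/etc/condor/config.d}
    for f in 01gr4_work 02gr4_common 03gr4_desktop; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

missing=$(missing_configs)
if [ -z "$missing" ]; then
    echo "all configuration files are in place"
else
    echo "missing files:" $missing
fi
```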
Before starting HTCondor, you should update one parameter in 03gr4_desktop, so that the policy settings work correctly:
#in 03gr4_desktop, set:
MACHINE_OWNER = "your_username"
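The same change can be made without opening an editor. The sketch below assumes that the file already contains a line starting with MACHINE_OWNER, which it simply rewrites; the function name set_machine_owner and the demo value "someone_else" are made up.

```shell
# Replace the MACHINE_OWNER line in file $1 with the user name $2.
# Assumption: the file already contains a line starting with MACHINE_OWNER.
set_machine_owner() {
    tmp=$(mktemp) || return 1
    sed "s/^MACHINE_OWNER[[:space:]]*=.*/MACHINE_OWNER = \"$2\"/" "$1" > "$tmp" &&
        cat "$tmp" > "$1"   # overwrite in place, keeping owner and permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Demonstration on a scratch file, not on the real configuration.
demo=$(mktemp)
printf 'MACHINE_OWNER = "someone_else"\n' > "$demo"
set_machine_owner "$demo" your_username
cat "$demo"
rm -f "$demo"
```

On the real file the function has to run as root, for example set_machine_owner /etc/condor/config.d/03gr4_desktop your_username.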
Now it is time to start HTCondor. If for some reason it was already started after the installation, use restart instead of start in the following command:
sudo service condor start
The last thing to do is to save the pool password. HTCondor must be running for this command to work: you can verify that at least condor_master is running with ps -ef | grep condor. The password is written in the file /home/condor/pool_credential on to4pxl. Type the following command and then, when asked, the password:
condor_store_cred -c add
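Since condor_store_cred only works while HTCondor is running, the ps check mentioned above can be wrapped in a small helper and run first. A sketch, with a made-up function name (master_running):

```shell
# Succeeds if a process named condor_master is running.
master_running() {
    ps -e -o comm= 2>/dev/null | grep -Fxq condor_master
}

if master_running; then
    echo "condor_master is running: the pool password can be stored"
else
    echo "condor_master is not running: start HTCondor first"
fi
```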
Conclude the configuration with condor_restart.
You can now check whether your node is connected to the pool: use condor_status -schedd; condor_status to see whether the pool schedd is recognized and whether the node is listed as part of the pool.
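Instead of reading the condor_status table by eye, the second check can be scripted. The sketch below assumes that the node is advertised with the same name that hostname -f prints, which depends on the local network setup; the helper name listed_in_pool is made up.

```shell
# Succeeds if the host name given as $1 appears in the list read from stdin
# (one machine name per line).
listed_in_pool() {
    grep -Fxq -- "$1"
}

if command -v condor_status >/dev/null 2>&1; then
    # Assumption: the Machine attribute matches the output of `hostname -f`.
    if condor_status -autoformat Machine | listed_in_pool "$(hostname -f)"; then
        echo "this node is listed in the pool"
    else
        echo "this node is not listed (yet)"
    fi
else
    echo "condor_status not found: is HTCondor installed?"
fi
```

Right after condor_restart the node may need a little time before it shows up, so a negative answer is worth rechecking.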