Elastiq and HTCondor

This page mainly explains how the HTCondor pool in the GR4 TorinoCloud is managed.

Introduction

In the gr4cloud, HTCondor is used together with the Elastiq system. Idle VMs are destroyed, so that the cloud infrastructure is freed for other users, and when new jobs are added to the queue the system tries to create new VMs. If some physical node has free resources, the new VMs are created and automatically added to the common pool, so that HTCondor can send new jobs to them. You don't have to take care of creating the new machines: Elastiq does what it can to enlarge the pool whenever jobs are waiting in the HTCondor queue.

HTCondor gr4cloud pool

You can find a lot of information on HTCondor here. On this page I only list some details that are specific to the gr4cloud pool.

In the GR4 Cloud, all the nodes are configured to host slots with 1 core each. All the nodes can run parallel jobs, and the dedicated scheduler is hosted on the head node, which you can reach through its public IP. See also this page.
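As an illustration of a parallel job on such a pool, the Python sketch below only assembles the text of a submit description for HTCondor's parallel universe. The executable name my_parallel_job.sh, the output file names and the count of 4 are hypothetical placeholders, not files or settings taken from the gr4cloud; the resulting text would be saved to a file and passed to condor_submit on the head node.

```python
def parallel_submit_description(executable, machine_count):
    """Return the text of a submit description for the parallel universe."""
    if machine_count < 1:
        raise ValueError("machine_count must be at least 1")
    lines = [
        "universe = parallel",
        f"executable = {executable}",
        # Number of slots requested; with 1-core slots this is also the number of cores.
        f"machine_count = {machine_count}",
        # $(Node) expands to the index of each process of the parallel job.
        # The file names below are hypothetical.
        "output = job.$(Node).out",
        "error = job.$(Node).err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the description so it can be redirected to a file, e.g. parallel.sub
    print(parallel_submit_description("my_parallel_job.sh", 4), end="")
```

Keeping the description as plain text makes it easy to compare with the submit files that other users have already written.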

If your code is parallelized with OpenMP, you can take advantage of the parallelization and request the entire machine to run the program, instead of single 1-core slots. You just have to insert +RequiresWholeMachine = True in the submit script. This means that instead of requesting slots (1 core each), you request machines (three slots at once, that is, three cores). Note that if you forget this option the program will probably use all the cores of the machine anyway, but the other slots will be marked as Unclaimed: they will probably be used to run other users' jobs, so you will not have the entire machine for your job.
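A minimal sketch of a whole-machine submit description, again built as text in Python. Only the +RequiresWholeMachine = True line comes from this page; the vanilla universe, the executable name my_openmp_program, the file names and the OMP_NUM_THREADS setting are assumptions added for illustration.

```python
def whole_machine_submit_description(executable, cores=3):
    """Return the text of a submit description that requests a whole machine."""
    lines = [
        # Assumption: a plain serial-universe job that uses threads internally.
        "universe = vanilla",
        f"executable = {executable}",
        # Custom job attribute that asks for the whole machine instead of one slot.
        "+RequiresWholeMachine = True",
        # Assumption: tell OpenMP to use as many threads as the machine has cores.
        f'environment = "OMP_NUM_THREADS={cores}"',
        # Hypothetical file names.
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(whole_machine_submit_description("my_openmp_program"), end="")
```

The default of 3 cores matches the three slots per machine mentioned above; change it if the machines are configured differently.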

A note: having a network-shared filesystem makes things much simpler. HTCondor can transfer the files that are not present on the worker nodes, but you would have to configure the submit script to copy all the required ones, which can be uncomfortable. Fortunately, that is not necessary here.
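To make the difference concrete, the sketch below returns the file-transfer lines that a submit description would typically contain in the two situations. The submit commands are standard HTCondor ones, but the helper function and the file names input.dat and params.cfg are hypothetical, and whether the gr4cloud submit files set should_transfer_files explicitly is an assumption.

```python
def transfer_settings(shared_filesystem, input_files=()):
    """Return the file-transfer lines of a submit description."""
    if shared_filesystem:
        # Every node sees the same files, so nothing has to be copied.
        return ["should_transfer_files = NO"]
    lines = [
        "should_transfer_files = YES",
        # Copy the output files back when the job exits.
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        # Every required input file must be listed explicitly.
        lines.append("transfer_input_files = " + ", ".join(input_files))
    return lines


if __name__ == "__main__":
    print("With a shared filesystem:")
    print("\n".join(transfer_settings(True)))
    print("Without a shared filesystem:")
    print("\n".join(transfer_settings(False, ["input.dat", "params.cfg"])))
```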

Examples

If you have access to the GR4 Cloud, you can find some useful scripts and a few submit files in /data/condor. The aim is to share what we use, so that other people can benefit from what users have already done: feel free to copy what is there and to add what you have modified for your personal use. Please don't edit what other people have created: create new files instead!