Previous: 6. The parallel universe Index Next: 8. Useful scripts and commands

7. The standard universe

Motivations

One of the big advantages of HTCondor is its very flexible job management. If a job running on a machine must be interrupted for any reason, HTCondor may be able to save the state of the run, suspend it for some time, move it to a different node, and so on.

This is possible in the HTCondor standard universe. As mentioned earlier, each HTCondor universe defines the rules under which jobs are run. The standard universe is the one offering the most interesting features: checkpointing, suspending, resuming and migrating jobs.

condor_compile

To enable these features, submitting the job to the standard universe is not enough: the executable must also be linked with the HTCondor libraries. The HTCondor distribution provides a command that takes care of linking these libraries when building an application: condor_compile.

The command is very simple to use, and in the default installation it works with many C, C++ and Fortran compilers (gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld). For example, to compile a C source file for the standard universe, you put condor_compile in front of the usual compiler command:

condor_compile gcc main.c -o main.exe
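Since what condor_compile adds are the HTCondor libraries at link time, a program made of several source files can, as a sketch, be compiled as usual and only relinked through condor_compile. The shell function below illustrates this two-step build; the function name build_standard and the source file names are hypothetical, and it assumes gcc and condor_compile are on the PATH.

```shell
# Hypothetical helper: compile C sources to object files with the plain
# compiler, then perform only the final link through condor_compile.
# Usage: build_standard OUTPUT SOURCE...
build_standard() {
    out=$1
    shift
    objs=
    for src in "$@"; do
        obj=${src%.c}.o
        # Ordinary compilation; no HTCondor libraries are involved yet.
        gcc -c "$src" -o "$obj" || return 1
        objs="$objs $obj"
    done
    # Relink with the HTCondor libraries for the standard universe.
    # $objs is deliberately unquoted so it splits into one argument per
    # object file (this assumes file names without spaces).
    condor_compile gcc $objs -o "$out"
}
```

For example, build_standard main.exe main.c tools.c compiles main.o and tools.o with gcc and then runs condor_compile gcc main.o tools.o -o main.exe.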

Using condor_compile with other tools, for example with make, requires the full installation of condor_compile, which replaces the system ld with the HTCondor one. Nothing changes for ordinary builds: when it is not invoked through condor_compile, the HTCondor ld simply falls back to the system linker (see also this page).

On a system where the default ld is /usr/bin/ld, the full installation is done as root with the following commands (adjust /usr/lib/condor/ld if the HTCondor libraries are installed elsewhere):

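# keep the system linker as ld.real
mv /usr/bin/ld /usr/bin/ld.real
# put the HTCondor ld in its place, owned by root and executable by all
cp /usr/lib/condor/ld /usr/bin/ld
chown root /usr/bin/ld
chmod 755 /usr/bin/ld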

After this, condor_compile also works with make:

condor_compile make all

Note that the full installation is undone if a software update overwrites /usr/bin/ld. In that case, you may have to run the above commands again.
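Because an update can silently restore the system linker, it may help to check whether the active ld is still the HTCondor one before repeating the installation. The function below is a hypothetical helper that performs the same commands only when needed; its name and its two optional parameters (the linker directory and the location of the HTCondor ld, defaulting to the paths used above) are assumptions, and on a real system it has to be run as root.

```shell
# Hypothetical helper: (re)apply the full condor_compile installation only
# when the linker in LD_DIR is not already the HTCondor one.
# Usage: ensure_condor_ld [LD_DIR] [CONDOR_LD]
ensure_condor_ld() {
    ld_dir=${1:-/usr/bin}
    condor_ld=${2:-/usr/lib/condor/ld}

    # Already installed: the active ld is byte-for-byte the HTCondor ld.
    if cmp -s "$condor_ld" "$ld_dir/ld"; then
        echo "HTCondor ld already installed in $ld_dir"
        return 0
    fi

    # Keep the current system linker as ld.real. After an update this
    # replaces a stale ld.real with the newly installed system linker.
    mv "$ld_dir/ld" "$ld_dir/ld.real" || return 1
    cp "$condor_ld" "$ld_dir/ld" || return 1
    # Ownership can only be given to root when running as root.
    if [ "$(id -u)" -eq 0 ]; then
        chown root "$ld_dir/ld" || return 1
    fi
    chmod 755 "$ld_dir/ld" || return 1
    echo "HTCondor ld installed in $ld_dir"
}
```

Running ensure_condor_ld after each system update is then harmless: if the HTCondor ld is still in place, the function only reports it and changes nothing.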

Job submission

Standard universe jobs are submitted with universe = standard in the submit description file. Some additional commands specific to this universe may also be useful.

For example, if you want to run a script that performs some preparatory actions first, it is not enough to write a wrapper and use its name as the executable. Because a standard universe job must be handled in a special way, HTCondor needs to know that the executable is not the main program; you tell condor_submit this with allow_startup_script = True. The script must also end with exec my_real_executable, so that my_real_executable replaces the shell and keeps the same PID as the startup script. An example:

#!/bin/sh
# get the host name of the machine (no $ when assigning a shell variable)
host=`uname -n`
# grab a standard universe executable designed specifically for this host
# (elsewhere@hostname stands for the remote account and server)
scp elsewhere@hostname:${host} my_real_executable || exit 1
chmod +x my_real_executable
# The PID MUST stay the same, so exec the new standard universe process.
exec ./my_real_executable ${1+"$@"}
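A submit description file for such a startup script could look like the following sketch, written from the shell with a here-document. The script name startup.sh, the output file names and the function name write_startup_submit are hypothetical; only universe = standard and allow_startup_script = True come from the text above. The program fetched and exec'd by the script must itself have been built with condor_compile.

```shell
# Hypothetical helper: write a submit description file for a standard
# universe job whose executable is a startup script ending in exec.
write_startup_submit() {
    cat > "${1:-startup.sub}" <<'EOF'
universe             = standard
# The executable is the wrapper script, not the condor_compile'd program
executable           = startup.sh
allow_startup_script = True
output               = job.out
error                = job.err
log                  = job.log
queue
EOF
}
```

The job would then be submitted with write_startup_submit startup.sub followed by condor_submit startup.sub.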

Other commands specific to this universe tell HTCondor which files must be copied in full to the execute machine when the job accesses them, and copied back when the job closes them (fetch_files = file1, file2, ...); which files may only be appended to (append_files = file1, file2, ...); which files are temporary and must be read and written on the execute machine instead of being transferred (local_files = file1, file2, ...); and which files must be compressed when written and decompressed when read (compress_files = file1, file2, ...). Further commands are described in the condor_submit man page.
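As an illustration, these file-handling commands could be combined in one submit description file, as in the sketch below. All file names and the function name write_files_submit are hypothetical; main.exe is assumed to be the condor_compile'd program from the earlier example.

```shell
# Hypothetical helper: write a submit description file that uses the
# standard universe file-handling commands. All file names are examples.
write_files_submit() {
    cat > "${1:-files.sub}" <<'EOF'
universe       = standard
executable     = main.exe
# Copied in full to the execute machine when the job opens it
fetch_files    = lookup_table.dat
# All writes are appended; useful for a running log
append_files   = progress.log
# Read and written on the execute machine; for temporary files only
local_files    = scratch.tmp
# Compressed on write and decompressed on read (gzip format)
compress_files = events.gz
log            = job.log
queue
EOF
}
```

Since append_files forces appends, a job evicted and restarted may write the same records more than once, so it suits logs rather than precise results.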



