Previous: 6. The parallel universe | Index | Next: 8. Useful scripts and commands |
One of the big advantages of the HTCondor system is its flexible job management. If a machine is running jobs that must be interrupted for any reason, HTCondor may be able to save the state of the run, suspend it for some time, move it to a different node, and so on.
This is possible in the context of the standard HTCondor universe. We have already said that the HTCondor universes contain the rules for running jobs. The standard universe is the one that offers the most interesting features: checkpointing, suspending, resuming and moving jobs.
To enable these features, it is not sufficient to submit a standard universe job: the executable must also be compiled with the HTCondor libraries.
The HTCondor distribution provides an executable that properly links the HTCondor libraries when compiling an application: condor_compile.
The command is very simple to use, and in the default installation it works with many C, C++ and Fortran compilers (gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld).
For example, to compile a C source so that it can run in the standard universe:
condor_compile gcc main.c -o main.exe
To do more, for example to use make, a complete installation of the HTCondor compiler is required.
The full installation is done by replacing the system ld executable with the HTCondor one.
Nothing changes outside the condor_compile command, because the HTCondor ld defaults to the standard ld libraries when it is not used together with condor_compile (see also this page).
On a system where the default ld is in /usr/bin/, the full installation of condor_compile is done with:
mv /usr/bin/ld /usr/bin/ld.real
cp /usr/lib/condor/ld /usr/bin/ld
chown root /usr/bin/ld
chmod 755 /usr/bin/ld
After this, condor_compile also works with make:
condor_compile make all
Note that the full installation breaks if a software update overwrites /usr/bin/ld. In that case you may have to run the above commands again.
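A quick way to notice that an update has replaced the linker is to compare /usr/bin/ld with the HTCondor copy. The function below is only a sketch: the name check_condor_ld is hypothetical, and it assumes that the installed /usr/bin/ld is a byte-for-byte copy of /usr/lib/condor/ld, as produced by the cp command of the full installation.

```shell
#!/bin/sh
# Hypothetical helper: check whether the system ld is still the HTCondor one.
# Both paths can be overridden; the defaults are the ones used by the
# full installation commands (assumption: ld was installed with plain cp).
check_condor_ld() {
    system_ld=${1:-/usr/bin/ld}
    condor_ld=${2:-/usr/lib/condor/ld}
    # cmp -s is silent and succeeds only if both files exist and are identical
    if cmp -s "$system_ld" "$condor_ld"; then
        echo "HTCondor ld is in place"
        return 0
    fi
    echo "HTCondor ld is missing or was overwritten"
    return 1
}
```

Calling check_condor_ld without arguments after a system update tells you whether the installation commands need to be run again.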
Standard universe jobs are submitted by setting universe = standard in the submit file.
Some additional commands specific to this universe, however, may be useful.
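As a minimal sketch, the following lines write a submit file for the main.exe binary built with condor_compile. The file name standard_job.submit and the output, error and log file names are arbitrary choices made for this example, and the sketch assumes an HTCondor installation that supports the standard universe.

```shell
#!/bin/sh
# Write a minimal standard universe submit file.
# Assumption: main.exe was linked with condor_compile;
# all other file names are placeholders for this sketch.
cat > standard_job.submit <<'EOF'
universe   = standard
executable = main.exe
output     = main.out
error      = main.err
log        = main.log
queue
EOF

# Show what was written
cat standard_job.submit
```

The job can then be submitted as usual with condor_submit standard_job.submit.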
For example, if you want to run a script that performs some preparatory actions, it is not sufficient to write a wrapper and use its name as executable.
Since a standard universe job must be treated in a special way, HTCondor must know that the executable is not the main program.
You can tell condor_submit this with allow_startup_script = True.
The script must also be written so that its last line is exec my_real_executable, so that my_real_executable keeps the PID of the startup script.
An example:
#! /bin/sh
# get the host name of the machine
host=`uname -n`
# grab a standard universe executable designed specifically for this host
scp elsewhere@hostname:${host} my_real_executable
# The PID MUST stay the same, so exec the new standard universe process.
exec my_real_executable ${1+"$@"}
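The reason for the final exec can be checked without HTCondor. The sketch below is an illustration and not part of any HTCondor tool: it creates a throwaway wrapper script that prints its own PID and then uses exec to start a second shell that prints its PID as well. Because exec replaces the running process instead of creating a child, the two numbers are the same.

```shell
#!/bin/sh
# Demonstrate that exec replaces the current process and keeps its PID.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
echo "$$"                # PID of the wrapper script
exec sh -c 'echo "$$"'   # the new shell takes over the wrapper's process
EOF

# Run the wrapper and collect the two printed PIDs (one per line)
pids=$(sh "$demo")
rm -f "$demo"
echo "$pids"
```

Without exec, the second shell would run as a child process with a different PID, which is what a standard universe startup script must avoid.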
Other commands specific to the standard universe tell HTCondor which files must be retrieved before the job is started again (fetch_files = file1, file2, ...), which files only have content appended to them (append_files = file1, file2, ...), which files are temporary and must not be transferred across nodes (local_files = file1, file2, ...), and which files must be compressed when written and decompressed when read (compress_files = file1, file2, ...).
Other commands are described in the condor_submit man page.
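To show how these commands fit together, the sketch below writes a submit file that combines a startup script with the file handling commands. All file names (startup.sh, input.dat, results.log, scratch.tmp, big_output.dat, startup_job.submit) are hypothetical placeholders, and the sketch assumes an HTCondor installation that supports the standard universe.

```shell
#!/bin/sh
# Write a standard universe submit file that uses a startup script
# and the file handling commands. All file names are placeholders.
cat > startup_job.submit <<'EOF'
universe             = standard
# startup.sh must end with: exec my_real_executable
executable           = startup.sh
allow_startup_script = True
fetch_files          = input.dat
append_files         = results.log
local_files          = scratch.tmp
compress_files       = big_output.dat
log                  = startup_job.log
queue
EOF

# Show what was written
cat startup_job.submit
```

Each of the file commands accepts a comma-separated list, so several files can be given on the same line.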