Mail address: gariazzo@to.infn.it
The GR4 Cloud is a system of virtual machines hosted in the Centro di Calcolo network, where the TIER2 for CERN also runs.
We bought some new physical machines that are supposed to be available to everyone in the network
when we are not using them.
For this reason we are testing a cloud system in which we can create and destroy a number of virtual machines (VMs),
for a total of 48 cores.
The preferred format for these machines is 3 cores/8 GB RAM, but there are other formats.
The cores do not have a fixed speed: it depends on the physical machine on which the VM is created, and
we have no control over this.
There is a head node with a public IP (193.205.66.216, with alias gr4cloud or gr4cloud.to.infn.it);
the other nodes are on a private subnetwork.
Since they are VMs, the nodes can be created and destroyed in a relatively short time.
We are asked to destroy the slave nodes when we are not using them: in this way,
the physical resources are available for other users of the network.
After a VM is destroyed, the changes made on it are lost.
VMs are created using a configuration file that allows us to prepare the environment in some specified ways.
In particular, through this mechanism we can run some specific commands and install specific packages,
so that every VM is created with the same initial setup.
Each VM mounts several filesystems. Some of them are temporary, which means they are destroyed with the VM and the changes made on them are lost.
You can log in to the system through the node with the public IP, using
ssh user@gr4cloud
from inside the Torino INFN network, or from the public login machines.
After login, you are on the head node, from which you can log in to the other nodes with
ssh node-XXX
where the number must match the IP of one of the currently existing VMs.
The default password should be the usual one on the z* nodes (please check),
but you are encouraged to create a .ssh/authorized_keys file after your first access,
so that you can log in using only your SSH key
(see here for useful information).
Since the home directory is shared, if you generate an SSH key
on the head node you can add its public part to the .ssh/authorized_keys file
and then log in to all the slave nodes with it.
Once all of you have your SSH keys configured correctly,
password login should be disabled for security reasons (tell one of the admins).
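As a minimal sketch of the key setup described above, the helper below appends a public key to authorized_keys only once and sets restrictive permissions on the files. The function name authorize_key and its arguments are our own illustration, not something installed on the cluster.

```shell
# Append a public key to authorized_keys exactly once.
# Arguments: $1 = public key file, $2 = .ssh directory (defaults to ~/.ssh).
# The function name is hypothetical; adapt it as you like.
authorize_key() {
    pubkey_file="$1"
    ssh_dir="${2:-$HOME/.ssh}"
    auth_file="$ssh_dir/authorized_keys"

    # Keep the directory and the file private to the owner, since sshd
    # (with its default StrictModes) can refuse keys in files others can write.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # Add the key only if an identical line is not already present.
    grep -qxF -f "$pubkey_file" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
}
```

A typical use, assuming you accept the default key location, would be to run ssh-keygen -t rsa -b 4096 on the head node and then authorize_key ~/.ssh/id_rsa.pub. Because the home directory is shared, the same key should then be accepted by the slave nodes.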
Since we will create and destroy the machines, if you need some specific package or configuration, there are two possibilities:
The VMs run Ubuntu 14.04 LTS with some useful packages preinstalled, but you may need some other software: please send me (or Hannes) the list of packages you need, so that we can add them to the startup routine or install them in some common folder. We can give you sudo rights, if really needed.
Some software is installed in the /data/common/ folder, which should be accessible to everyone.
You can import the correct paths into your environment with the instruction lines in /data/common/exportvariables,
which contains the commands for all the software installed there.
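A possible way to pick up those paths is sketched here, assuming the file can be sourced as a whole by your shell (if it is meant as a collection of lines to choose from, copy only the ones you need into your own startup file instead). The function name load_common_env is hypothetical.

```shell
# Source a shared environment file if it is readable.
# Argument: $1 = path to the file (defaults to the path mentioned above).
# Assumption: the file contains plain shell commands such as "export PATH=...".
load_common_env() {
    env_file="${1:-/data/common/exportvariables}"
    if [ -r "$env_file" ]; then
        # "." runs the file in the current shell, so the exports persist.
        . "$env_file"
    else
        echo "cannot read $env_file" >&2
        return 1
    fi
}
```

Calling load_common_env from your ~/.bashrc would make the settings available in every new session on any node, since the home directory is shared.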
Jobs must be submitted using the HTCondor system. You can find a lot of information on HTCondor here.
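To give an idea of what a submission looks like, the sketch below writes a minimal HTCondor submit description file. The helper name write_submit_file, the file names, and the choice of one core per job are assumptions made for illustration; the submit files actually used on the cluster may differ.

```shell
# Write a minimal HTCondor submit description file.
# Arguments: $1 = submit file to create, $2 = executable to run.
# All names here are hypothetical examples.
write_submit_file() {
    submit_file="$1"
    executable="$2"
    # \$ keeps $(Cluster) and $(Process) literal, so that HTCondor
    # (not the shell) expands them for each job.
    cat > "$submit_file" <<EOF
universe       = vanilla
executable     = $executable
output         = job.\$(Cluster).\$(Process).out
error          = job.\$(Cluster).\$(Process).err
log            = job.\$(Cluster).log
request_cpus   = 1
queue
EOF
}
```

After something like write_submit_file myjob.sub ./run_analysis.sh, the job would be submitted with condor_submit myjob.sub and its state checked with condor_q.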
At the moment we are using a flexible system, Elastiq, that manages the creation and shutdown of the VMs and is connected to HTCondor. Idle VMs are destroyed, so that the cloud infrastructure is freed for other users, and when new jobs are added to the HTCondor queue the system tries to create new VMs. If there are free resources on some physical node, the new VMs are created and automatically added to the common pool, so that HTCondor can send them any new job. You don't have to take care of creating the new machines yourself, since Elastiq will do what it can to enlarge the pool whenever jobs are waiting in the HTCondor queue.
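Since new VMs are requested when jobs are waiting, it can be handy to check how many jobs are idle. The following is only a sketch: count_idle_jobs is a hypothetical helper, and depending on the HTCondor version condor_q may report only your own jobs or those of all users.

```shell
# Count jobs that are idle (JobStatus == 1), i.e. waiting for a free slot.
# Prints one ClusterId per idle job and counts the lines.
count_idle_jobs() {
    condor_q -constraint 'JobStatus == 1' -format '%d\n' ClusterId | wc -l | tr -d ' '
}
```

The machines currently in the pool can be listed with condor_status, which shows whether new VMs have already joined.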
If you can enter the GR4 Cloud, you can find some useful scripts and a few submit files in /data/condor.
The aim is to share what we are going to use, so that other people can take advantage of what the users have already done:
feel free to copy what is there and to add what you have modified for your personal use.
Please don't edit what has been created by other people: create new files instead!
Please report any problem you experience to me, Hannes or Carlo, so that we can try to solve it or ask the people in Centro di Calcolo: we are not the first group to test this dynamic system, so someone else may have faced the same problem. Any feedback is definitely useful to improve the system.