Submitting jobs
All the resources of the SISSA/Democritos infrastructure are managed through the Torque/Portable Batch System (Torque/PBS), a workload management system for Linux clusters. Besides managing access to the resources, it provides commands to submit, monitor and delete jobs. Its key components are:
- the Job Server or PBS server, which provides the basic batch services such as receiving, creating and running a batch job, modifying it and protecting it against system crashes
- the Job Executor or PBS mom, which is a daemon that actually takes care of the execution when it receives a copy of the job from the Job Server. The PBS mom creates a new session as similar as possible to a user login session and returns the job output to the user
- the Job Scheduler, which is a daemon that contains the site's policy controlling which job is run, and where and when it is run. PBS allows each site to create its own scheduler. On the SISSA/Democritos infrastructure the Maui scheduler is used. The Maui scheduler communicates with the various moms to keep track of the system's resources, and with the server to monitor the availability of jobs to execute.
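In a standard Torque installation the submit, monitor and delete commands mentioned above are `qsub`, `qstat` and `qdel`. As a minimal sketch, and not the site's official procedure, the Python snippet below assembles a small job script and the matching command lines without running anything. The job name, resource request, walltime, program name, script file name and the job ID `12345` are hypothetical placeholders; the values that actually apply on this infrastructure are those given in the PBS How-To.

```python
def make_job_script(name, queue, nodes, ppn, walltime, command):
    """Return the text of a minimal Torque/PBS job script.

    All argument values are supplied by the caller; nothing here is
    site policy. The #PBS lines are standard Torque directives.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                    # job name
        f"#PBS -q {queue}",                   # destination queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",   # nodes and processors per node
        f"#PBS -l walltime={walltime}",       # maximum run time (hh:mm:ss)
        "cd $PBS_O_WORKDIR",                  # directory qsub was called from
        command,
    ]
    return "\n".join(lines) + "\n"


def lifecycle_commands(script_path, job_id):
    """Return the command lines for submitting, monitoring and deleting a job."""
    return {
        "submit": " ".join(["qsub", script_path]),
        "monitor": " ".join(["qstat", job_id]),
        "delete": " ".join(["qdel", job_id]),
    }


# Hypothetical example: one full zebra node for one hour.
script = make_job_script("test_job", "zebra", 1, 8, "01:00:00", "./my_program")
print(script)
for step, cmd in lifecycle_commands("job.pbs", "12345").items():
    print(f"{step}: {cmd}")
```

The commands are built as strings rather than executed, so the sketch can be read and tried on a machine without Torque installed.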
All the computing nodes available on the SISSA/Democritos infrastructure, namely the HG1 cluster, are divided into partitions, which the scheduler identifies as different queues. The table below lists the queues together with the hardware specifications of their nodes.
| Queue Name | # of Nodes | # of Cores | Nodes name | CPU (per node) | RAM (per node) | Network |
|---|---|---|---|---|---|---|
| zebra | 22 | 176 | p0xx | Intel Xeon E5420 2.5GHz (2x4 cores) | 16GB | Infiniband 20G |
| cmb | zebra clone, reserved for Planck project members, with higher priority | | | | | |
| blade | 88 | 352 | m0xx/cxxx | AMD Opteron 280 2.4GHz (2x2 cores) | 8GB | Infiniband 10G |
| iblade | 56 | 224 | ixxx | AMD Opteron 275 2.2GHz (2x2 cores) | 8GB | Infiniband 2x2.5G |
| smp | routing queue to submit jobs on the following two execution queues | | | | | |
| smp4 | 12 | 48 | a2xx | AMD Opteron 275 2.2GHz (2x2 cores) | 8GB | n.a. |
| smp2* | 23 | 46 | a0xx | AMD Opteron 252 2.6GHz (2x1 cores) | 4GB | n.a. |
| up* | 23 | 46 | a0xx | AMD Opteron 252 2.6GHz (2x1 cores) | 4GB | n.a. |
*Note: the smp2 and up queues share the same physical machines
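To make the table easier to reason about, the following Python sketch encodes the execution queues as data, checks that the core counts equal nodes times cores per node, and lists the queues whose nodes meet a given per-node requirement. It only illustrates how to read the table and does not reflect how the Maui scheduler assigns jobs; the `Queue` class and the `candidate_queues` helper are hypothetical names introduced here. The `cmb` and `smp` rows are left out because the table gives no separate hardware figures for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    name: str
    nodes: int
    cores_per_node: int
    ram_gb: int          # RAM per node
    infiniband: bool     # False where the table says "n.a."

    @property
    def total_cores(self):
        return self.nodes * self.cores_per_node


# Values copied from the table above.
QUEUES = [
    Queue("zebra", 22, 8, 16, True),
    Queue("blade", 88, 4, 8, True),
    Queue("iblade", 56, 4, 8, True),
    Queue("smp4", 12, 4, 8, False),
    Queue("smp2", 23, 2, 4, False),
    Queue("up", 23, 2, 4, False),
]


def candidate_queues(cores_per_node, ram_gb, need_infiniband=False):
    """Return the names of queues whose nodes satisfy the per-node request."""
    return [
        q.name
        for q in QUEUES
        if q.cores_per_node >= cores_per_node
        and q.ram_gb >= ram_gb
        and (q.infiniband or not need_infiniband)
    ]


for q in QUEUES:
    print(f"{q.name}: {q.nodes} nodes x {q.cores_per_node} cores = {q.total_cores}")

print(candidate_queues(8, 16))                       # needs 8 cores and 16GB per node
print(candidate_queues(4, 8, need_infiniband=True))  # needs 4 cores, 8GB and Infiniband
```

Since smp2 and up run on the same physical machines, their cores should not be counted twice when estimating the total capacity of the cluster.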
The following sections give detailed instructions to help you submit your jobs to the system.
Important! Read the PBS How-To before jumping to the other sections: the instructions given there apply to every type of submission.
