Job Submission
1. Short Introduction
User applications are generally executed by submitting them to the grid as jobs. The Workload Management System (WMS), a set of gLite services, is responsible for accepting the jobs submitted by the user and dispatching them to appropriate CEs (Computing Elements), depending on the job requirements and the available resources. Submitting a job requires GSI authentication both between the User Interface (UI) and the WMS and between the WMS and the CE. A valid proxy certificate is therefore required when submitting a job, monitoring it, or retrieving its output. The remaining lifetime of the proxy should be longer than the execution time of the job.
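The lifetime rule above can be checked before submitting. The following is a minimal Python sketch and not part of gLite: the function name, the safety margin, and the idea of passing in the remaining proxy lifetime in seconds (obtained by whatever proxy inspection tool is available on the UI) are assumptions made for illustration.

```python
def proxy_outlives_job(proxy_seconds_left, expected_runtime_seconds, margin_seconds=3600):
    """Return True if the proxy should still be valid when the job finishes.

    margin_seconds is a hypothetical safety buffer (for example for time
    spent waiting in a queue); the one-hour default is an arbitrary choice.
    """
    if min(proxy_seconds_left, expected_runtime_seconds, margin_seconds) < 0:
        raise ValueError("durations must be non-negative")
    # The proxy must cover the whole run plus the safety buffer.
    return proxy_seconds_left > expected_runtime_seconds + margin_seconds
```

If the function returns False, the simplest remedy is to create a new proxy with a longer lifetime before submitting.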
A special file, the job description file, specifies the parameters of the submitted job. Job description files are written in the Job Description Language (JDL), which expresses the desired job characteristics and constraints. The WMS retrieves information from the Information System (IS) and the File Catalog and looks for the best CE given the requirements stated in the job description file. This search process is called “match making”.
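To make the idea of match making concrete, here is a toy Python sketch. It is not the WMS algorithm: the CE records, the `free_slots` and `max_wall_minutes` fields, and the ranking rule are all invented for illustration. It only shows the two steps described above, namely filtering the CEs by the job requirements and then ordering the remaining ones so that the best comes first.

```python
def match_making(ces, required_vo, required_wall_minutes):
    """Return the names of the CEs that satisfy the job, best candidate first."""
    # Step 1: keep only CEs that support the VO and allow a long enough run.
    candidates = [
        ce for ce in ces
        if required_vo in ce["vos"] and ce["max_wall_minutes"] >= required_wall_minutes
    ]
    # Step 2: rank the survivors; here "best" simply means most free slots.
    candidates.sort(key=lambda ce: ce["free_slots"], reverse=True)
    return [ce["name"] for ce in candidates]


# Hypothetical snapshot of what an information system might publish.
example_ces = [
    {"name": "ce-a", "vos": {"gridbox"}, "max_wall_minutes": 720, "free_slots": 4},
    {"name": "ce-b", "vos": {"gridbox"}, "max_wall_minutes": 60, "free_slots": 20},
    {"name": "ce-c", "vos": {"othervo"}, "max_wall_minutes": 720, "free_slots": 50},
    {"name": "ce-d", "vos": {"gridbox"}, "max_wall_minutes": 1440, "free_slots": 9},
]
```

For a two-hour job of the gridbox VO, `match_making(example_ces, "gridbox", 120)` discards ce-b (time limit too short) and ce-c (wrong VO) and ranks the two remaining CEs by free slots.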
2. My first JDL file
Here is an example of a simple JDL file.
[
Executable = "/bin/echo";
VirtualOrganisation = "gridbox";
Arguments = "Hello from $HOSTNAME";
StdOutput = "hello.out";
StdError = "hello.err";
ShallowRetryCount = 0;
OutputSandbox = {"hello.out", "hello.err"};
]
The Executable attribute specifies the command to be run on the Worker Node, and Arguments gives the arguments passed to it. The StdOutput and StdError attributes name the files to which the output and error streams are redirected. The OutputSandbox attribute lists the files to be copied back after job execution; normally these are the same two files. Finally, ShallowRetryCount sets the number of retries in case of failure; here it is 0, so the job is not retried.
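When many similar jobs are needed, the JDL file can be generated rather than typed. The sketch below is a minimal Python illustration, not a gLite tool: the helper names `jdl_value` and `render_jdl` are made up, and only the value types used in the example above (strings, integers, and lists of strings) are handled.

```python
def jdl_value(value):
    """Render a Python value as a JDL literal (string, integer, or list)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("embedded double quotes are not handled by this sketch")
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    raise TypeError(f"unsupported JDL value: {value!r}")


def render_jdl(attributes):
    """Render a dict of attributes as a bracketed JDL job description."""
    lines = ["["]
    for name, value in attributes.items():
        lines.append(f"{name} = {jdl_value(value)};")
    lines.append("]")
    return "\n".join(lines) + "\n"


# The same job as in the example above.
hello_job = {
    "Executable": "/bin/echo",
    "VirtualOrganisation": "gridbox",
    "Arguments": "Hello from $HOSTNAME",
    "StdOutput": "hello.out",
    "StdError": "hello.err",
    "ShallowRetryCount": 0,
    "OutputSandbox": ["hello.out", "hello.err"],
}
```

Writing `render_jdl(hello_job)` to a file named hello.jdl should reproduce the example line by line.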
3. Credentials delegation
All interactions with the WMS go through the WMProxy web service, which itself requires a proxy. The user provides it by means of a credential delegation procedure. There are two options:
- The user can perform the delegation manually with the command below, specifying the delegationId to be associated with the delegated proxy:
glite-wms-job-delegate-proxy -d $USER
With the -d option the delegation is created and stored under the given name (here the value of the environment variable USER, that is, the login name). Subsequent invocations of glite-wms-job-submit and glite-wms-job-list-match can then be given that name, again with the -d option, instead of delegating a new proxy.
- The user can pass the -a option to any of the glite-wms* commands, which causes a delegated proxy to be established automatically. Heavy use of this option is not recommended: it delegates a new proxy for each command issued, and delegation is a time-consuming operation, so it is better to delegate once with glite-wms-job-delegate-proxy and reuse the delegation.
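The difference between the two options can be sketched in Python as follows. The code only builds the command lines and does not run anything; the helper names are hypothetical, while the command names and the -d and -a options are those described above.

```python
def delegate_command(delegation_id):
    """argv for delegating a proxy once under an explicit identifier."""
    return ["glite-wms-job-delegate-proxy", "-d", delegation_id]


def wms_command(command, jdl_file, delegation_id=None):
    """argv for a WMS command that needs a delegated proxy.

    With a delegation_id the existing delegation is reused (-d);
    without one the command delegates automatically (-a).
    """
    if command not in ("glite-wms-job-submit", "glite-wms-job-list-match"):
        raise ValueError(f"unexpected command: {command}")
    option = ["-d", delegation_id] if delegation_id else ["-a"]
    return [command, *option, jdl_file]


def plan(jdl_files, delegation_id):
    """Delegate once, then submit every JDL file reusing that delegation."""
    commands = [delegate_command(delegation_id)]
    commands += [wms_command("glite-wms-job-submit", f, delegation_id) for f in jdl_files]
    return commands
```

For N jobs, `plan` yields one delegation followed by N submissions, whereas using -a on every submission would delegate N times. On a UI each argv list could be passed to `subprocess.run`.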
In the following we create a delegation towards WMProxy, using as identifier our username, which is available in the environment variable $USER.
[user01@ui-1 ~]$ echo $USER
user01
[user01@ui-1 ~]$ glite-wms-job-delegate-proxy -d $USER
Connecting to the service https://wms-4.grid.box:7443/glite_wms_wmproxy_server
================== glite-wms-job-delegate-proxy Success ==================
Your proxy has been successfully delegated to the WMProxy:
https://wms-4.grid.box:7443/glite_wms_wmproxy_server
with the delegation identifier: user01
==========================================================================
4. Job List Match
A JDL (Job Description Language) file describes the requirements of the task we want to run on the grid. Before submitting it, it is wise to check which computing elements (CEs) match those requirements and would accept the job. This can be done with the glite-wms-job-list-match command. The command interacts with the WMProxy service and therefore requires the delegated proxy we created before. The -d option specifies the delegation identifier; since we created the delegation under our username (taken from $USER), this is the value we give to the option.
[user01@ui-1 ~]$ glite-wms-job-list-match -d $USER hello.jdl
Connecting to the service https://wms-4.grid.box:7443/glite_wms_wmproxy_server
==========================================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- ce-1.grid.box:2119/jobmanager-lcgpbs-gridbox
- ce-2.grid.box:2119/jobmanager-lcgpbs-gridbox
==========================================================================
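If the list of matching CEs has to be processed by a script, it can be extracted from the command output. The following Python sketch assumes the layout shown in the transcript above (a `*CEId*` marker followed by lines starting with "- "); the real output may differ in whitespace or wording, and the function name is hypothetical.

```python
def parse_ce_ids(list_match_output):
    """Return the CE identifiers listed after the *CEId* marker."""
    ce_ids = []
    in_list = False
    for raw_line in list_match_output.splitlines():
        line = raw_line.strip()
        if line == "*CEId*":
            in_list = True  # the CE list starts on the next line
        elif in_list and line.startswith("- "):
            ce_ids.append(line[2:].strip())
    return ce_ids


# Output copied from the example session above.
SAMPLE_OUTPUT = """\
Connecting to the service https://wms-4.grid.box:7443/glite_wms_wmproxy_server
==========================================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- ce-1.grid.box:2119/jobmanager-lcgpbs-gridbox
- ce-2.grid.box:2119/jobmanager-lcgpbs-gridbox
==========================================================================
"""
```

An empty result means that no CE currently matches the job requirements, in which case submitting the job would be pointless.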
5. Job Submission
5.1 WMS Job Submission
We are now ready to submit our simple job with the glite-wms-job-submit command. We use the -d option to specify the delegated proxy and the -o option to save the job identifier in a file, so that we can later check the status of our job.
[user01@ui-1 ~]$ glite-wms-job-submit -d $USER -o jobid hello.jdl
Connecting to the service https://wms-4.grid.box:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
The job identifier has been saved in the following file:
/home/user01/jobid
==========================================================================
The file /home/user01/jobid contains the jobID(s) (https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA) returned by the submission process. If another job is submitted specifying the same output file, its jobID is appended.
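Since the file accumulates one identifier per submission, a script can read it back to find all submitted jobs. The Python sketch below assumes one identifier per line, each starting with https://, and skips anything else (blank lines or header lines, should the file contain any). The function name is hypothetical.

```python
def read_job_ids(text):
    """Return the job identifiers found in the contents of a job ID file."""
    job_ids = []
    for line in text.splitlines():
        line = line.strip()
        # Keep identifier lines only, and drop repeated identifiers.
        if line.startswith("https://") and line not in job_ids:
            job_ids.append(line)
    return job_ids
```

For example, `read_job_ids(open("/home/user01/jobid").read())` would return the identifiers in submission order.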
5.2 Job Status
To check the status of a job, use the glite-wms-job-status command. It takes the job identifier(s) as input and queries the LB (Logging and Bookkeeping) service for the status of the job. No delegation is needed for this command.
[user01@ui-1 ~]$ glite-wms-job-status -i jobid
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
Current Status: Done (Success)
Logged Reason(s):
-
- Job terminated successfully
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce-1.grid.box:2119/jobmanager-lcgpbs-gridbox
Submitted: Sat Sep 13 16:56:15 2008 CEST
*************************************************************
If the option -i input_file is specified, the command scans input_file for job identifiers. When the file contains more than one identifier, the command lists them and asks the user which ones to act on, as shown below:
[user01@ui-1 ~]$ glite-wms-job-status -i jobid
------------------------------------------------------------------
1 : https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
2 : https://wms-4.grid.box:9000/aIjWLzsTMv1WUXcLdtTQLw
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list - [1-2]all:2
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms-4.grid.box:9000/aIjWLzsTMv1WUXcLdtTQLw
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: ce-1.grid.box:2119/jobmanager-lcgpbs-gridbox
Submitted: Sat Sep 13 17:03:58 2008 CEST
*************************************************************
In the above examples we can see that the first job has completed while the second one is still running.
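When several jobs are being followed, it can be handy to reduce the status report to a table of job identifier and current status. The Python sketch below assumes the field labels shown in the transcripts above ("Status info for the Job :" and "Current Status:"); it is an illustration with a hypothetical function name, not a gLite utility.

```python
def parse_statuses(status_output):
    """Map each job identifier in a status report to its current status."""
    statuses = {}
    current_job = None
    for raw_line in status_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Status info for the Job :"):
            # Split on the first colon only: the identifier is a URL with colons.
            current_job = line.split(":", 1)[1].strip()
        elif line.startswith("Current Status:") and current_job is not None:
            statuses[current_job] = line.split(":", 1)[1].strip()
            current_job = None
    return statuses


# Two report blocks copied from the example sessions above (shortened).
SAMPLE_REPORT = """\
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
Current Status: Done (Success)
Exit code: 0
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms-4.grid.box:9000/aIjWLzsTMv1WUXcLdtTQLw
Current Status: Running
"""
```

A script could call this periodically and stop polling once every job has left the Running state.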
5.3 Job Output
As soon as the glite-wms-job-status command reports that the job has completed successfully, its results can be retrieved with the glite-wms-job-output command. The command requires the job identifier and can also read it from an input file (-i option). It copies all the files listed in the output sandbox into a local directory. Again, no delegation identifier is needed for this command. Note that by default the output is retrieved into a rather inconvenient location (/tmp/JobOutput), in a directory whose name is derived from the jobID. It is much better to use the options shown below to control where the output directory is stored (-o) and how it is named (--dir).
[user01@ui-1 ~]$ glite-wms-job-output -o $PWD --dir job1 -i jobid
Connecting to the service https://10.10.0.9:7443/glite_wms_wmproxy_server
================================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
have been successfully retrieved and stored in the directory:
/home/user01/job1
================================================================================
In order to inspect the job output, list the files in the indicated directory and show the content of the output file(s).
[user01@ui-1 ~]$ ls -alrt job1/
total 12
-rw-rw-r-- 1 user01 user01 28 Sep 13 17:20 hello.out
-rw-rw-r-- 1 user01 user01 0 Sep 13 17:20 hello.err
drwx------ 4 user01 user01 4096 Sep 13 17:20 ..
drwxr-xr-x 2 user01 user01 4096 Sep 13 17:20 .
[user01@ui-1 ~]$ cat job1/hello.out
Hello from ce-1wn2.grid.box
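The same inspection can be automated by checking that every file named in the OutputSandbox actually arrived. This is a small Python sketch with a hypothetical function name; it assumes the files are stored directly in the retrieved directory, as in the listing above.

```python
from pathlib import Path


def missing_sandbox_files(output_dir, expected_files):
    """Return the expected output sandbox files that are absent from output_dir."""
    directory = Path(output_dir)
    return [name for name in expected_files if not (directory / name).is_file()]
```

For the job above, `missing_sandbox_files("job1", ["hello.out", "hello.err"])` should return an empty list once the output has been retrieved.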
5.4 Job Cancel
If anything goes wrong, a job can be cancelled with the glite-wms-job-cancel command. Again the -i option is available, which is especially useful for cancelling several jobs with a single command:
[user01@ui-1 ~]$ glite-wms-job-cancel -i jobid
------------------------------------------------------------------
1 : https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
2 : https://wms-4.grid.box:9000/aIjWLzsTMv1WUXcLdtTQLw
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list - [1-2]all (use , as separator or - for a range): a
Are you sure you want to remove specified job(s) [y/n]y : y
Warning - Not allowed to cancel the job:
https://wms-4.grid.box:9000/EeIEZVi31yv6as1TA6BLiA
Current Job Status is Cleared
Warning - Not allowed to cancel the job:
https://wms-4.grid.box:9000/aIjWLzsTMv1WUXcLdtTQLw
Current Job Status is Done
Error - Operation Failed
Unable to cancel any job
In this case neither job was cancelled, because both had already completed successfully. The status "Cleared" of the first job shows that its output has already been retrieved.
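To avoid such warnings, a script can filter out finished jobs before attempting a cancellation. The Python sketch below takes a mapping from job identifier to status string. The states "Done" and "Cleared" are taken from the transcript above; treating "Aborted" and "Cancelled" as final too is an assumption, and the names used are hypothetical.

```python
# States after which a job can no longer be cancelled. "Done" and "Cleared"
# come from the transcript above; "Aborted" and "Cancelled" are assumptions.
FINAL_STATES = {"Done", "Cleared", "Aborted", "Cancelled"}


def split_cancellable(statuses):
    """Split job identifiers into (still cancellable, already finished)."""
    cancellable, finished = [], []
    for job_id, status in statuses.items():
        # "Done (Success)" -> "Done": compare only the first word.
        words = status.split()
        state = words[0] if words else ""
        if state in FINAL_STATES:
            finished.append(job_id)
        else:
            cancellable.append(job_id)
    return cancellable, finished
```

Only the identifiers in the first list are worth passing to glite-wms-job-cancel.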
