RUSH RENDER QUEUE - EXAMPLES
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.40g 05/06/03

Strikeout text indicates features not yet implemented


	Rush Examples

Introduction To Rush
Technical Overview
Template Submit Script
Template Render Script
Creating Render and Submit Scripts
Submitting A Job
Monitoring Frames
Done Mail
Frame Logs
Requeuing Frames
Pausing A Job
Advanced: Submit/Monitoring

Introduction To Rush

General Overview

The Rush render queue allows users to manage jobs. A 'job' is usually just a range of frames that need to be rendered. To start a job you run a Submit Script, which defines the job: the frame range to be rendered, which machines are to be used for rendering, the 'priority' the job should run at, the pathname of the Render Script for rendering frames, etc.
These two scripts are all you need to use the render queue: a Submit Script and a Render Script. Rush can create template scripts for you using the 'rush -tss' (Template Submit Script) and 'rush -trs' (Template Render Script) commands. (See 'Creating Render and Submit Scripts' below for examples.)
To render each frame, the render queue runs the Render Script, usually a simple script containing whatever UNIX commands are needed to invoke the renderer or compositor.
The 'current frame number' is passed to the render script via the $RUSH_FRAME environment variable. The Render Script can invoke any command line based programs: renderers, compositors, custom C programs, perl scripts, etc.
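As a rough illustration of using $RUSH_FRAME, the sketch below builds a zero-padded file name from the frame number. It is written in Bourne shell rather than the csh used by the templates, which is an assumption about what your site allows. The helper name pad_frame and the 4-digit width are hypothetical, chosen to match the 0001-style names in the examples; the customized render script later in this document uses ${RUSH_PADFRAME} for the same purpose.

```shell
#!/bin/sh
# Hypothetical helper: zero-pad a frame number to 4 digits.
# Assumes the number has no leading zeros (e.g. "7", not "0007"),
# because printf would read a leading zero as octal.
pad_frame() {
    printf '%04d\n' "$1"
}

# RUSH_FRAME is set by rush when the Render Script runs; default to 1
# here so the sketch can be tried by hand.
frame=${RUSH_FRAME:-1}
echo "/job/VEGA/ribs/$(pad_frame "$frame").rib"
```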
The Render Script is just a top level wrapper script that sets up the proper rendering environment, runs the renderer, and returns one of the 'rush exit codes' to indicate success or failure: 0=Done ok, 1=Failed, 2=Retry. The Render Script will be executed on all the machines configured in your Submit Script via the same pathname, so it must be accessible via NFS (or, in the case of Win/NT, the "Network Neighborhood").
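The exit code translation described above might be sketched as follows in Bourne shell (an assumption; the templates use csh). The helper name run_frame is hypothetical, and the sketch treats any nonzero status from the render command as a failure. When a failure should instead return the Retry code is site policy and is not decided here.

```shell
#!/bin/sh
# Rush exit codes as listed above: 0=Done, 1=Fail, 2=Retry.
RUSH_DONE=0
RUSH_FAIL=1
RUSH_RETRY=2   # not used below; shown for completeness

# Hypothetical helper: run a command and translate its exit status.
# Assumption: status 0 means the frame rendered; anything else failed.
run_frame() {
    if "$@"; then
        echo "--- DONE"
        return "$RUSH_DONE"
    else
        echo "--- FAIL"
        return "$RUSH_FAIL"
    fi
}

# Example use inside a render script ('sleep 1' stands in for the renderer):
#   run_frame sleep 1
#   exit $?
```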
When the render script runs, various environment variables are passed from the render queue, which may be useful to intermediate or advanced shell programmers.
The Submit Script is executed to start the job running. As the job runs, frames are started on the various networked machines as needed, eating through the frame list until there are no more frames to render. After each frame renders, the system records how long the frame took to run in the Frame List Report and logs the error output for each frame in the Frame Logs.
The render queue uses Priority Values to allow important jobs to take precedence over lower priority jobs. When priorities of different jobs are equal, a 'round robin' approach is used to allow jobs to vie for cpus.
Priority flags allow a job to fight off other, lower priority jobs instead of passively waiting for idle cpus to become available ('k', the Kill flag). A job can also have a priority such that no other jobs can kill it ('a', the Almighty flag). Combined, these flags enable a job to kill off other jobs without allowing other jobs to kill it. Sysadmins will want to monitor audit logs for the use of such flags to prevent misuse.

Technical Overview

The render queue consists of two executables:

rush(1) is the command line oriented user front end tool.
rushd(8) is the network daemon that runs on each host, one daemon per host.

rush(1) is used to control all aspects of the render queue. It is basically a 'client', and the daemon is a 'server'. rush(1) uses mostly TCP connections to communicate with the daemon.
The rushd(8) daemon is usually started by a machine's boot script, and accepts both TCP and UDP protocols, mostly using UDP to intercommunicate with the other rushd(8) daemons running on other hosts. Absolutely *no* broadcasting or multicasting is used; all UDP traffic is unicast (point to point).
There is one rushd(8) daemon that runs per host. Even multi-processor hosts use only one instance of the daemon to manage all processors.
The render queue system has two configuration files, located in /usr/local/rush/etc/*:

rush.conf

hosts

The rush.conf file contains general configurable settings for the system, used both by the daemon and front end tools.
The 'hosts' file contains a list of all hosts participating in the render queue system, along with the number of configured cpus on each host, and other host specific information.
The daemon reloads both files automatically, within 30 seconds of a change to their date stamps.
These files should be rdist(1)ed from a central location whenever modified. Neither file should be in an NFS mounted directory, nor should the daemon executables.

Examples


	Template Submit Script


[erco@howland]% rush -tss            # Template Submit Script - Let's look at it.
#!/bin/csh -f                        # It's just a csh script.

#
#  S U B M I T
#

source /usr/tmp/rush/etc/.submit

rush -submit << EOF
title           SHOW/SHOT             # Title of job
ram             250                   # Amount of RAM job expects to use (MB)
frames          1-100                 # Frame range(s)
logdir          $cwd/logs             # Directory for frame logs
command         $cwd/render-script    # Path to render script
donemail        erco                  # Who to send mail to when job is done
autodump        done                  # Autodump job when done
cpus            howland=1@100         # cpu(s) to run on

# Optional
#notes          This is a test        # Optional free form notes for job
#state          Pause                 # Optional starting state for job
EOF
exit $status

There are other examples of submit scripts written in perl, and even C/C++.

Template Render Script

[erco@howland]% rush -trs            # Template Render Script - Let's look at it.
#!/bin/csh -f                        # It's just a csh script, too.

###############################
#  R E N D E R   S C R I P T  #
###############################

# Source your render environment as needed
source $RUSH_DIR/etc/.render         # System environment settings. (You can add other sources)

echo "--- Working on frame $RUSH_FRAME - `date`"

### YOUR RENDER COMMAND(S) HERE
time sleep 10                        # Your render command
set err = $status                    # Keep exit code from your 'render command'

# Rush exit codes: 0=DONE 1=FAIL 2=RETRY
if ( $err ) then                     # Translate render command exit code to rush exit codes (0|1|2)
    echo --- FAIL; exit 1
else
    echo --- DONE; exit 0
endif
#NOTREACHED#

Creating Render and Submit Scripts

[erco@howland]% rush -tss > submit_me          # Create submit script
[erco@howland]% rush -trs > render_me          # Create render script
[erco@howland]% chmod +x submit_me render_me
[erco@howland]% ls -la submit_me render_me
-rwxrwxr-x   1 erco   stree   443 Jan 24 00:59 render_me
-rwxrwxr-x   1 erco   stree   359 Jan 24 00:59 submit_me
[erco@howland]% vi submit_me render_me         # Customize the scripts (see below)
[..]
[erco@howland]% cat submit_me
#!/bin/csh -f

#
#  S U B M I T
#

source /usr/tmp/rush/etc/.submit

rush -submit << EOF
title           VEGA                  # Set our title
ram             100                   # MB of RAM we expect to use
frames          1-10                  # Frame range to use (1 thru 10)
logdir          $cwd/logs
command         $cwd/render_me        # Our render script
donemail        erco
autodump        done
cpus            howland=3@100         # Use up to 3 cpus on howland at 100 priority
EOF
exit $status
[erco@howland]% cat render_me
#!/bin/csh -f

###############################
#  R E N D E R   S C R I P T  #
###############################

# Source your render environment as needed
source $RUSH_DIR/etc/.render

echo "--- Working on frame $RUSH_FRAME - `date`"

### YOUR RENDER COMMAND(S) HERE
render < /job/VEGA/ribs/${RUSH_PADFRAME}.rib   # Command to be rendered
set err = $status

# Rush exit codes: 0=DONE 1=FAIL 2=RETRY
if ( $err ) then
    echo --- FAIL; exit 1
else
    echo --- DONE; exit 0
endif
#NOTREACHED#
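The 'cpus' line above reads as host=cpus@priority ("up to 3 cpus on howland at 100 priority"). As a small sketch of how such a spec breaks apart, the Bourne shell function below splits it with parameter expansion. The name parse_cpuspec is hypothetical, and only this simple form is handled; real specs may allow more than this sketch understands.

```shell
#!/bin/sh
# Hypothetical helper: split a cpu spec of the form host=cpus@priority.
parse_cpuspec() {
    spec=$1
    host=${spec%%=*}        # text before '='
    rest=${spec#*=}         # text after '='
    cpus=${rest%%@*}        # text before '@'
    priority=${rest#*@}     # text after '@' (kept as text, so flags survive)
    echo "host=$host cpus=$cpus priority=$priority"
}

parse_cpuspec "howland=3@100"   # prints: host=howland cpus=3 priority=100
```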

Submitting A Job

[erco@howland]% ./submit_me                    # Submit job by running submit script
setenv RUSH_JOBID how.848                      # Our jobid (grab with mouse and paste below)
[erco@howland]% setenv RUSH_JOBID how.848
[erco@howland]% rush -lj                       # List Jobs
STATUS  JOBID        TITLE         OWNER     %DONE  BUSY  NOTES
------  -----------  ------------  --------  -----  ----  -----------
Run     how.848      VEGA          erco      %30    3     00:00:07
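The submit script prints a csh 'setenv' line, which a Bourne-style shell cannot evaluate directly. A minimal sketch for picking the jobid out of that output follows; the helper name extract_jobid is hypothetical, and it assumes the output contains a line of exactly the form shown above.

```shell
#!/bin/sh
# Hypothetical helper: read submit output on stdin and print the jobid
# from a line of the form "setenv RUSH_JOBID <jobid>".
extract_jobid() {
    awk '$1 == "setenv" && $2 == "RUSH_JOBID" { print $3; exit }'
}

# Demonstration with the output shown above. With a real job this might be:
#   RUSH_JOBID=$(./submit_me | extract_jobid); export RUSH_JOBID
echo "setenv RUSH_JOBID how.848" | extract_jobid
```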

Monitoring Frames

[erco@howland]% rush -lf                       # List Frames to see how they're doing
STAT FRAME TRY HOSTNAME  PID    START           ELAPSED   _
Run  0001  1   howland   12499  01/24,01:00:25  00:00:09   | 3 frames running on
Run  0002  1   howland   12500  01/24,01:00:25  00:00:09   | howland for last 9 secs.
Run  0003  1   howland   12501  01/24,01:00:25  00:00:09  _|
Que  0004  0   -         0      00/00,00:00:00  00:00:00
Que  0005  0   -         0      00/00,00:00:00  00:00:00
Que  0006  0   -         0      00/00,00:00:00  00:00:00
Que  0007  0   -         0      00/00,00:00:00  00:00:00
Que  0008  0   -         0      00/00,00:00:00  00:00:00
Que  0009  0   -         0      00/00,00:00:00  00:00:00
Que  0010  0   -         0      00/00,00:00:00  00:00:00
[erco@howland]% rush -lfi                      # List Frame Info for brief report
State   Total   Perc
-----   -----   ----
Que     7       %69                            # %69 to go
Run     3       %30                            # %30 busy running
Done    0       %0                             # no frames done yet
Fail    0       %0                             # no frames failed either
Hold    0       %0
[erco@howland]% rush -lc                       # List cpus to see all cpus we submitted
CPUSPEC[HOST]   STATE  FRM   PID    JOBTID  ELAPSED   NOTES
how=3@100       Run    0001  12499  2       00:00:31
how=3@100       Run    0002  12500  3       00:00:31
how=3@100       Run    0003  12501  4       00:00:31
[erco@howland]% rush -lfi
State   Total   Perc
-----   -----   ----
Que     4       %40
Run     3       %30
Done    3       %30                            # Some frames are done; up to %30..
Fail    0       %0
Hold    0       %0
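For scripts that want a similar summary from a saved frame list, the sketch below counts frames per state in a 'rush -lf' style listing. The helper name count_states is hypothetical. It assumes the state is the first column and that the header line starts with "STAT", as in the listings above. Percentages are truncated to whole numbers, so they will not always match the figures 'rush -lfi' prints (it shows %69 for 7 of 10 frames above).

```shell
#!/bin/sh
# Hypothetical helper: summarize a 'rush -lf' style listing read from stdin.
# Prints one line per state: "<state> <count> <percent>%".
count_states() {
    awk '$1 != "STAT" && NF > 0 { n[$1]++; total++ }
         END {
             for (s in n)
                 printf "%s %d %d%%\n", s, n[s], (n[s] * 100) / total
         }' | sort
}

# Demonstration with a shortened listing.
count_states <<'EOF'
STAT FRAME TRY HOSTNAME  PID    START           ELAPSED
Run  0001  1   howland   12499  01/24,01:00:25  00:00:09
Run  0002  1   howland   12500  01/24,01:00:25  00:00:09
Que  0003  0   -         0      00/00,00:00:00  00:00:00
Que  0004  0   -         0      00/00,00:00:00  00:00:00
EOF
```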

Done Mail

[erco@howland]% You have new mail.             # Rush sends mail when job is done
[erco@howland]% Mail                           # Let's read the mail..
"/usr/mail/erco": 1 message 1 new
>N  2 erco@erco.com  Mon Jan 24 01:02  [how-848] VEGA (%0 QUE, %100 DONE, %0 FAIL)
&
From erco@erco.com Mon Jan 24 01:02:30 2000
Date: Mon, 24 Jan 2000 01:02:30 -0800
From: erco@erco.com (Greg Ercolano)
To: erco@erco.com
Subject: [how-848] VEGA (QUE=%0, DONE=%100, FAIL=%0)    # Subject shows jobid, title, stats

Jobid: how-848
Title: VEGA
Priority: 1
LogDir: /usr/var/tmp/rush/logs
Ram: 100
Command: /usr/var/tmp/rush/render_me
ChkCommand:
EndCommand:
AutoDump: done
User: erco (1000/1007)
DoneMail: erco
StartDate: Mon Jan 24 01:00:24 2000
EndDate: -
Elapsed: 00:02:05
Frames: 10
Cpus: how=3@100
Notes[0]: -
Criteria:

STAT FRAME TRY HOSTNAME  PID    START           ELAPSED    # Frame list dump
Done 0001  1   howland   12499  01/24,01:00:25  00:00:32
Done 0002  1   howland   12500  01/24,01:00:25  00:00:32
Done 0003  1   howland   12501  01/24,01:00:25  00:00:32
Done 0004  1   howland   12517  01/24,01:00:56  00:00:31
Done 0005  1   howland   12519  01/24,01:00:56  00:00:32
Done 0006  1   howland   12522  01/24,01:00:57  00:00:32
Done 0007  1   howland   12535  01/24,01:01:28  00:00:32
Done 0008  1   howland   12537  01/24,01:01:28  00:00:31
Done 0009  1   howland   12539  01/24,01:01:28  00:00:31
Done 0010  1   howland   12548  01/24,01:02:00  00:00:30
& q
Held 1 message in /usr/mail/erco

Frame Logs

[erco@howland]% ls logs                        # Frame logs directory
0001  0003  0005  0007  0009  framelist
0002  0004  0006  0008  0010
[erco@howland]% more logs/0001                 # Frame 0001's log
------------------------------------------   _
-- Host: howland                              |
-- Pid: 12499                                 |
-- Jobid: how.848                             |
-- Frame: 1                                   |
-- Owner: erco (1000/1007)                    | Rush header
-- Tmpdir: /usr/var/tmp/RUSH_TMP.12499        |
-- Logfile: /usr/var/tmp/rush/logs/0001       |
-- Command: /usr/var/tmp/rush/render_me       |
-- Started: Mon Jan 24 01:00:26 2000         _|
------------------------------------------   _
--- Working on frame 1 - Mon Jan 24 01:00:26  |
Writing /job/VEGA/tif/0001.tif                | Output from render script
--- DONE                                     _|
[erco@howland]% more logs/framelist            # Frame List when job completed
STAT FRAME TRY HOSTNAME  PID    START           ELAPSED   NOTES
Done 0001  1   howland   12499  01/24,01:00:25  00:00:32
Done 0002  1   howland   12500  01/24,01:00:25  00:00:32
Done 0003  1   howland   12501  01/24,01:00:25  00:00:32
Done 0004  1   howland   12517  01/24,01:00:56  00:00:31
Done 0005  1   howland   12519  01/24,01:00:56  00:00:32
Done 0006  1   howland   12522  01/24,01:00:57  00:00:32
Done 0007  1   howland   12535  01/24,01:01:28  00:00:32
Done 0008  1   howland   12537  01/24,01:01:28  00:00:31
Done 0009  1   howland   12539  01/24,01:01:28  00:00:31
Done 0010  1   howland   12548  01/24,01:02:00  00:00:30

Requeuing Frames

[erco@howland]% rush -lf
STAT FRAME TRY HOSTNAME  PID    START           ELAPSED   _
Fail 0100  1   howland   23386  01/25,12:19:09  00:00:22   |
Fail 0101  1   howland   23387  01/25,12:19:09  00:00:22   | Failed frames
Fail 0102  1   howland   23388  01/25,12:19:09  00:00:22  _|
Done 0103  1   howland   23410  01/25,12:19:30  00:00:22
Done 0104  1   howland   23412  01/25,12:19:30  00:00:23
[..]
[erco@howland]% rush -que 100-102              # Requeue them manually to start again
0100: Que
0101: Que
0102: Que
[erco@howland]% rush -lf
STAT FRAME TRY HOSTNAME  PID    START           ELAPSED   _
Que  0100  1   howland   23386  01/25,12:19:09  00:00:22   |
Que  0101  1   howland   23387  01/25,12:19:09  00:00:22   | They'll restart shortly
Que  0102  1   howland   23388  01/25,12:19:09  00:00:22  _|
Done 0103  1   howland   23410  01/25,12:19:30  00:00:22
Done 0104  1   howland   23412  01/25,12:19:30  00:00:23
[..]
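When many frames fail, working out the ranges by eye gets tedious. The sketch below reads a 'rush -lf' style listing and prints the failed frames as ranges such as 100-102. The helper name failed_ranges is hypothetical, and it assumes the listing is sorted by frame number with the state in the first column. Only the range form of 'rush -que' appears above, so handing these values to 'rush -que' (single frames in particular) is left for you to verify.

```shell
#!/bin/sh
# Hypothetical helper: print failed frames from a 'rush -lf' style listing
# (read from stdin) as ranges, one per line. Runs of consecutive frame
# numbers are collapsed into "first-last".
failed_ranges() {
    awk '
        function emit() {
            if (start == prev) print start
            else print start "-" prev
        }
        $1 == "Fail" {
            f = $2 + 0                       # "0100" becomes 100
            if (!have) { start = prev = f; have = 1 }
            else if (f == prev + 1) { prev = f }
            else { emit(); start = prev = f }
        }
        END { if (have) emit() }
    '
}

# Demonstration with the listing shown above.
failed_ranges <<'EOF'
STAT FRAME TRY HOSTNAME  PID    START           ELAPSED
Fail 0100  1   howland   23386  01/25,12:19:09  00:00:22
Fail 0101  1   howland   23387  01/25,12:19:09  00:00:22
Fail 0102  1   howland   23388  01/25,12:19:09  00:00:22
Done 0103  1   howland   23410  01/25,12:19:30  00:00:22
Done 0104  1   howland   23412  01/25,12:19:30  00:00:23
EOF
```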

Pausing A Job

[erco@howland]% rush -pause                    # Pause Job.
Job how.850 is now 'Pause'                     # No new frames will be started, running frames allowed to finish.
[erco@howland]% rush -lj
STATUS  JOBID        TITLE         OWNER     %DONE  BUSY  NOTES
------  -----------  ------------  --------  -----  ----  -----------
Pause   how.850      WERNER/C33    erco      %55    0     Job paused.
[erco@howland]% rush -cont                     # Continue Job
Job how.850 is now 'Run'
[erco@howland]% rush -lj
STATUS  JOBID        TITLE         OWNER     %DONE  BUSY  NOTES
------  -----------  ------------  --------  -----  ----  -----------
Run     how.850      WERNER/C33    erco      %55    0     00:29:20

Advanced: Submit/Monitoring

[erco@howland]% eval `./submit_me`             # Submit job, sets RUSH_JOBID automatically
[erco@howland]% rush -ljf                      # List Job Full - All info about job
Jobid: how-848
Title: VEGA
Priority: 1
LogDir: /usr/var/tmp/rush/logs
Ram: 100
Command: /usr/var/tmp/rush/render_me
ChkCommand:
EndCommand:
AutoDump: done
User: erco (1000/1007)
DoneMail: erco
StartDate: Mon Jan 24 01:00:24 2000
EndDate: -
Elapsed: 00:00:16
Frames: 10
Cpus: how=3@100
Notes[0]: -
Criteria:
[erco@howland]% rush -lff | more               # List Frame Full - all information about frames
Frame: 1
State: Run
NewState: ???
Hostname: howland
Priority: 100
Pid: 12499
Tid: 1
TaskSeqID: 2
Tries: 1
Notes:
StartDate: Mon Jan 24 01:00:25 2000
EndDate: Sun Jan 01 00:00:00 0000
Elapsed: 00:00:22

Frame: 2
State: Run
NewState: ???
Hostname: howland
Priority: 100
Pid: 12500
Tid: 2
TaskSeqID: 3
Tries: 1
Notes:
StartDate: Mon Jan 24 01:00:25 2000
EndDate: Sun Jan 01 00:00:00 0000
Elapsed: 00:00:22

Frame: 3
State: Run
NewState: ???
Hostname: howland
Priority: 100
Pid: 12501
[..]
[erco@howland]% rush -tasklist howland         # Task List - View all jobs competing
                                               # for howland's cpus.
TID JOBSID TASKSID PID    JOBID/NAME           FRM   PRI   STATE UTRY NOTES              _
3   4      4       0      how-848,VEGA         0000  100   Idle  1    Last: 0004 DONE     |
4   4      5       0      how-848,VEGA         0000  100   Idle  1    Last: 0005 DONE     | erco's 3 task reservations; one 'Run'ing
5   4      6       15650  how-848,VEGA         0006  100   Run   1    Elapsed=00:00:55   _|
6   6      7       15761  how-851,TESLA/MATTE  0502  200k  Run   2    Elapsed=00:00:25    | liza's 2 task reservations; running at
7   6      8       15763  how-851,TESLA/MATTE  0503  200k  Run   2    Elapsed=00:00:25   _| higher 'killer' priority - both are busy.

In the above example, 5 task reservations are on howland. Two belong to Liza's job, three to Erco's. Erco's job has priority 100, and Liza's job has 200k (kill) priority, higher than Erco's. Howland has only 3 cpus, so Liza's higher priority job runs both of its tasks, and the one leftover cpu is taken by Erco's job.
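The arithmetic of that outcome can be sketched as follows: sort the reservations by priority, highest first, and hand out cpus until they run out. This is an illustration only, not Rush's scheduler. The function name allocate_cpus and the "owner priority" input format are hypothetical, and the sketch ignores kill flags, round robin between equal priorities, and everything else the daemon takes into account.

```shell
#!/bin/sh
# Illustration only: give N cpus to the highest-priority task reservations.
# Input on stdin: one reservation per line, "owner priority".
# Output: "owner count" for each owner that received at least one cpu.
allocate_cpus() {
    ncpus=$1
    sort -k2,2nr | head -n "$ncpus" |
        awk '{ got[$1]++ } END { for (o in got) print o, got[o] }' | sort
}

# The five reservations from the task list above, on a 3 cpu host.
# A numeric sort reads "200k" as 200.
allocate_cpus 3 <<'EOF'
erco 100
erco 100
erco 100
liza 200k
liza 200k
EOF
```

With the reservations above this gives liza two cpus and erco one, matching the task list.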