RUSH RENDER QUEUE - EXAMPLES
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.40g 05/06/03
Strikeout text indicates features not yet implemented


Rush Examples

   Introduction To Rush  

General Overview

    The Rush render queue allows users to manage jobs. A 'job' is usually just a range of frames that need to be rendered. To start a job you need to run a Submit Script, which contains: the instructions defining the job, the frame range to be rendered, which machines are to be used for rendering, the 'priority' the job should run at, the pathname of the Render Script for rendering frames, etc.

    All you need to use the render queue are these two scripts: a Submit Script and a Render Script. Rush can create template scripts for you using the 'rush -tss' (Template Submit Script) and 'rush -trs' (Template Render Script) commands. The examples later in this document show how to create them.

    To render each frame, the render queue runs the Render Script, usually a simple script containing whatever UNIX commands are needed to invoke the renderer or compositor.

    The 'current frame number' is passed to the render script via the $RUSH_FRAME environment variable. The Render Script can invoke any command line based programs: renderers, compositors, custom C programs, perl scripts, etc.
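
    As a rough illustration of how a Render Script might use the frame number, the sketch below zero-pads it for use in a filename. It is written in Bourne shell rather than csh. The pad_frame helper and the 4-digit width are assumptions, chosen to match the 0001-style frame names in the listings later in this document; the helper is not part of Rush.

```sh
#!/bin/sh
# pad_frame: hypothetical helper, not part of Rush.
# Prints the given frame number zero-padded to 4 digits.
# Expects an unpadded decimal number such as 7 or 123.
pad_frame() {
    printf '%04d\n' "$1"
}

# RUSH_FRAME is set by the render queue; default to 1 when testing by hand.
frame=${RUSH_FRAME:-1}
echo "frame $frame -> $(pad_frame "$frame").rib"
```

    The .rib suffix is only an example, borrowed from the render command shown in the scripts below.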

    The Render Script is just a top level wrapper script that sets up the proper rendering environment, runs the renderer, and returns one of the 'rush exit codes' to indicate success or failure: 0=Done ok, 1=Failed, 2=Retry. The Render Script is executed via the same pathname on all the machines configured in your Submit Script, so it must be accessible via NFS (or, in the case of Win/NT, via the "Network Neighborhood").
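
    The translation from a renderer's exit status to the three rush exit codes could be sketched as follows, again in Bourne shell. The rush_exit_code helper and the idea of naming one status as "retry" are assumptions for illustration; which statuses are worth retrying depends entirely on your renderer.

```sh
#!/bin/sh
# rush_exit_code: hypothetical helper, not part of Rush.
# $1 = exit status of the render command
# $2 = optional status to treat as "retry" (an assumption; which
#      statuses are transient depends on your renderer)
rush_exit_code() {
    status=$1
    retry_status=${2:-}
    if [ "$status" -eq 0 ]; then
        echo 0          # 0 = Done ok
    elif [ -n "$retry_status" ] && [ "$status" -eq "$retry_status" ]; then
        echo 2          # 2 = Retry
    else
        echo 1          # 1 = Failed
    fi
}

# Possible use at the end of a render script (my_render_command and
# the status 75 are placeholders):
#   my_render_command; exit "$(rush_exit_code $? 75)"
```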

    When the render script runs, various environment variables are passed from the render queue, which may be useful to intermediate or advanced shell programmers.

    The Submit Script is executed to start the job running. As the job runs, frames are started on the various networked machines as needed, eating through the frame list until there are no more frames to render.  After each frame renders, the system records how long the frame took to run in the Frame List Report and logs the error output for each frame in the Frame Logs.

    The render queue uses Priority Values to allow important jobs to take precedence over lower priority jobs. When priorities of different jobs are equal, a 'round robin' approach is used to allow jobs to vie for cpus.

    Priority flags allow a job to fight off other, lower priority jobs instead of passively waiting for idle cpus to become available ('k', the Kill flag). A job can also have a priority such that no other jobs can kill it ('a', the Almighty flag). Combined, these flags enable a job to kill off other jobs without allowing other jobs to kill it. Sysadmins will want to monitor audit logs for the use of such flags to prevent misuse.

   Technical Overview  
The render queue consists of two executables:

  • rush(1) is the command line oriented user front end tool.
  • rushd(8) is the network daemon that runs on each host, one daemon per host.

rush(1) is used to control all aspects of the render queue. It is basically a 'client', and the daemon is a 'server'. rush(1) uses mostly TCP connections to communicate with the daemon.

The rushd(8) daemon is usually started by a machine's boot script, and accepts both TCP and UDP protocols, mostly using UDP to intercommunicate with the other rushd(8) daemons running on other hosts. Absolutely *no* broadcasting or multicasting is used; all UDP traffic is unicast (point to point).

There is one rushd(8) daemon that runs per host. Even multi-processor hosts use only one instance of the daemon to manage all processors.

The render queue system has two configuration files, located in /usr/local/rush/etc/*:

  • rush.conf
  • hosts

The rush.conf file contains general configurable settings for the system, used both by the daemon and front end tools.

The 'hosts' file contains a list of all hosts participating in the render queue system, along with the number of configured cpus on each host, and other host specific information.

The daemon reloads both files automatically, within 30 seconds of a change to their date stamps.

These files should be rdist(1)ed from a central location whenever modified. Neither file should be in an NFS mounted directory, nor should the daemon executables.


Examples

   Template Submit Script  

[erco@howland]% rush -tss            # Template Submit Script - Let's look at it.
#!/bin/csh -f                        # It's just a csh script.

#
#  S U B M I T
#

source /usr/tmp/rush/etc/.submit

rush -submit << EOF
title           SHOW/SHOT             # Title of job
ram             250                   # Amount of RAM job expects to use (MB)
frames          1-100                 # Frame range(s)
logdir          $cwd/logs             # Directory for frame logs
command         $cwd/render-script    # Path to render script
donemail        erco                  # Who to send mail to when job is done
autodump        done                  # Autodump job when done
cpus            howland=1@100         # cpu(s) to run on

# Optional
#notes          This is a test        # Optional free form notes for job
#state          Pause                 # Optional starting state for job
EOF
exit $status
    

There are other examples of submit scripts written in perl, and even C/C++.

   Template Render Script  

[erco@howland]% rush -trs                # Template Render Script - Let's look at it.
#!/bin/csh -f                            # It's just a csh script, too.

###############################
#  R E N D E R   S C R I P T  #
###############################

# Source your render environment as needed
source $RUSH_DIR/etc/.render              # System environment settings. (You can add other sources)

echo "--- Working on frame $RUSH_FRAME - `date`" 

### YOUR RENDER COMMAND(S) HERE
time sleep 10                             # Your render command
set err = $status                         # Keep exit code from your 'render command'

# Rush exit codes: 0=DONE  1=FAIL  2=RETRY 
if ( $err ) then                          # Translate render command exit code to rush exit codes (0|1|2)
    echo --- FAIL; exit 1 
else 
    echo --- DONE; exit 0 
endif 
#NOTREACHED#
    

   Creating Render and Submit Scripts  

[erco@howland]% rush -tss > submit_me    # Create submit script
[erco@howland]% rush -trs > render_me    # Create render script
[erco@howland]% chmod +x submit_me render_me
[erco@howland]% ls -la submit_me render_me
-rwxrwxr-x    1 erco     stree        443 Jan 24 00:59 render_me
-rwxrwxr-x    1 erco     stree        359 Jan 24 00:59 submit_me

[erco@howland]% vi submit_me render_me   # Customize the scripts (see below)

[..]

[erco@howland]% cat submit_me
#!/bin/csh -f

#
#  S U B M I T
#

source /usr/tmp/rush/etc/.submit

rush -submit << EOF
title           VEGA              # Set our title
ram             100               # MB of RAM we expect to use
frames          1-10              # Frame range to use (1 thru 10)
logdir          $cwd/logs
command         $cwd/render_me    # Our render script
donemail        erco
autodump        done
cpus            howland=3@100     # Use up to 3 cpus on howland at 100 priority
EOF
exit $status

[erco@howland]% cat render_me
#!/bin/csh -f

###############################
#  R E N D E R   S C R I P T  #
###############################

# Source your render environment as needed
source $RUSH_DIR/etc/.render

echo "--- Working on frame $RUSH_FRAME - `date`" 

### YOUR RENDER COMMAND(S) HERE
render < /job/VEGA/ribs/${RUSH_PADFRAME}.rib      # Render command for this frame
set err = $status

# Rush exit codes: 0=DONE  1=FAIL  2=RETRY 
if ( $err ) then 
    echo --- FAIL; exit 1 
else 
    echo --- DONE; exit 0 
endif 
#NOTREACHED#
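
The 'cpus' line in the submit script above packs three values into one word: the host, the number of cpus, and the priority. As a small sketch of how that layout breaks apart, the Bourne shell function below splits a spec of the form HOST=CPUS@PRIORITY. The parse_cpuspec name is hypothetical, and the function only handles the simple form shown in these examples.

```sh
#!/bin/sh
# parse_cpuspec: hypothetical helper, not part of Rush.
# Splits a spec of the form HOST=CPUS@PRIORITY (the form used in the
# scripts above) and prints the three fields separated by spaces.
parse_cpuspec() {
    spec=$1
    host=${spec%%=*}        # text before the first '='
    rest=${spec#*=}         # text after the first '='
    cpus=${rest%%@*}        # text before the '@'
    pri=${rest#*@}          # text after the '@'
    echo "$host $cpus $pri"
}

parse_cpuspec "howland=3@100"    # prints: howland 3 100
```

The priority is kept as text rather than converted to a number, so any trailing characters in that field pass through unchanged.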
    

   Submitting A Job  

[erco@howland]% ./submit_me     # Submit job by running submit script
setenv RUSH_JOBID how.848       # Our jobid (grab with mouse and paste below)

[erco@howland]% setenv RUSH_JOBID how.848

[erco@howland]% rush -lj        # List Jobs
STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES
------ ----------- ------------ -------- ----- ---- -----------
Run    how.848     VEGA         erco     %30   3    00:00:07
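
The submit script prints a csh 'setenv' line, which Bourne shell users cannot paste directly. One possible workaround, sketched below, is to pull the jobid out of that line and export it. The jobid_from_submit function is hypothetical and assumes the output contains a line in exactly the form shown above.

```sh
#!/bin/sh
# jobid_from_submit: hypothetical helper, not part of Rush.
# Reads submit-script output on stdin and prints the value from the
# first line of the form "setenv RUSH_JOBID <jobid>".
jobid_from_submit() {
    awk '$1 == "setenv" && $2 == "RUSH_JOBID" { print $3; exit }'
}

# Example using the output shown above; in practice you would pipe
# ./submit_me into the function instead of using printf.
RUSH_JOBID=$(printf 'setenv RUSH_JOBID how.848\n' | jobid_from_submit)
export RUSH_JOBID
echo "$RUSH_JOBID"    # prints: how.848
```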
    

   Monitoring Frames  

[erco@howland]% rush -lf        # List Frames to see how they're doing

STAT FRAME TRY HOSTNAME       PID   START          ELAPSED   _ 
Run  0001  1   howland        12499 01/24,01:00:25 00:00:09   | 3 frames running on
Run  0002  1   howland        12500 01/24,01:00:25 00:00:09   | howland for last 9 secs.
Run  0003  1   howland        12501 01/24,01:00:25 00:00:09  _|
Que  0004  0   -              0     00/00,00:00:00 00:00:00 
Que  0005  0   -              0     00/00,00:00:00 00:00:00 
Que  0006  0   -              0     00/00,00:00:00 00:00:00 
Que  0007  0   -              0     00/00,00:00:00 00:00:00 
Que  0008  0   -              0     00/00,00:00:00 00:00:00 
Que  0009  0   -              0     00/00,00:00:00 00:00:00 
Que  0010  0   -              0     00/00,00:00:00 00:00:00 

[erco@howland]% rush -lfi       # List Frame Info for brief report
State Total Perc
----- ----- ----
Que   7     %69                 # %69 to go
Run   3     %30                 # %30 busy running
Done  0     %0                  # no frames done yet
Fail  0     %0                  # no frames failed either
Hold  0     %0

[erco@howland]% rush -lc        # List cpus to see all cpus we submitted

CPUSPEC[HOST]        STATE       FRM  PID     JOBTID  ELAPSED  NOTES
how=3@100            Run         0001 12499   2       00:00:31
how=3@100            Run         0002 12500   3       00:00:31
how=3@100            Run         0003 12501   4       00:00:31

[erco@howland]% rush -lfi
State Total Perc
----- ----- ----
Que   4     %40    
Run   3     %30
Done  3     %30                 # Some frames are done; up to %30..
Fail  0     %0
Hold  0     %0
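
For scripted monitoring, the per-state totals can also be tallied from the frame list itself. The Bourne shell sketch below assumes the column layout shown in the 'rush -lf' listings above (state in column 1, frame number in column 2); the tally_frames name is hypothetical.

```sh
#!/bin/sh
# tally_frames: hypothetical helper, not part of Rush.
# Reads a frame list on stdin in the column layout shown above
# (STAT in column 1, FRAME in column 2) and prints one
# "state count" line for each of the five states.
tally_frames() {
    awk '
        $2 ~ /^[0-9]+$/ { n[$1]++ }      # data rows have a numeric frame
        END {
            split("Que Run Done Fail Hold", states, " ")
            for (i = 1; i <= 5; i++)
                printf "%s %d\n", states[i], n[states[i]]
        }'
}

# Intended use:
#   rush -lf | tally_frames
```

The header row is skipped because its second column is not a number.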
    

   Done Mail  

[erco@howland]%
You have new mail.             # Rush sends mail when job is done

[erco@howland]% Mail           # Let's read the mail..
"/usr/mail/erco": 1 message 1 new
>N  2 erco@erco.com       Mon Jan 24 01:02  [how-848] VEGA (%0 QUE, %100 DONE, %0 FAIL)
& 
From erco@erco.com  Mon Jan 24 01:02:30 2000
Date: Mon, 24 Jan 2000 01:02:30 -0800
From: erco@erco.com (Greg Ercolano)
To: erco@erco.com
Subject: [how-848] VEGA (QUE=%0, DONE=%100, FAIL=%0)   # Subject shows jobid, title, stats

     Jobid: how-848
     Title: VEGA
  Priority: 1
    LogDir: /usr/var/tmp/rush/logs
       Ram: 100
   Command: /usr/var/tmp/rush/render_me    
ChkCommand: 
EndCommand: 
  AutoDump: done
      User: erco (1000/1007)
  DoneMail: erco
 StartDate: Mon Jan 24 01:00:24 2000
   EndDate: -
   Elapsed: 00:02:05
    Frames: 10
      Cpus: how=3@100
  Notes[0]: -
  Criteria: 

STAT FRAME TRY HOSTNAME       PID   START          ELAPSED   # Frame list dump
Done 0001  1   howland        12499 01/24,01:00:25 00:00:32 
Done 0002  1   howland        12500 01/24,01:00:25 00:00:32 
Done 0003  1   howland        12501 01/24,01:00:25 00:00:32 
Done 0004  1   howland        12517 01/24,01:00:56 00:00:31 
Done 0005  1   howland        12519 01/24,01:00:56 00:00:32 
Done 0006  1   howland        12522 01/24,01:00:57 00:00:32 
Done 0007  1   howland        12535 01/24,01:01:28 00:00:32 
Done 0008  1   howland        12537 01/24,01:01:28 00:00:31 
Done 0009  1   howland        12539 01/24,01:01:28 00:00:31 
Done 0010  1   howland        12548 01/24,01:02:00 00:00:30 

& q
Held 1 message in /usr/mail/erco
    

   Frame Logs  


[erco@howland]% ls logs                         # Frame logs directory
0001       0003       0005       0007       0009       framelist
0002       0004       0006       0008       0010

[erco@howland]% more logs/0001                  # Frame 0001's log
------------------------------------------       _
--    Host: howland                               |
--     Pid: 12499                                 |
--   Jobid: how.848                               |
--   Frame: 1                                     |
--   Owner: erco (1000/1007)                      |  Rush header
--  Tmpdir: /usr/var/tmp/RUSH_TMP.12499           |
-- Logfile: /usr/var/tmp/rush/logs/0001           |
-- Command: /usr/var/tmp/rush/render_me           |
-- Started: Mon Jan 24 01:00:26 2000             _|
------------------------------------------       _ 
--- Working on frame 1 - Mon Jan 24 01:00:26      |
Writing /job/VEGA/tif/0001.tif                    |  Output from render script
--- DONE                                         _|

[erco@howland]% more logs/framelist             # Frame List when job completed

STAT FRAME TRY HOSTNAME       PID   START          ELAPSED  NOTES
Done 0001  1   howland        12499 01/24,01:00:25 00:00:32 
Done 0002  1   howland        12500 01/24,01:00:25 00:00:32 
Done 0003  1   howland        12501 01/24,01:00:25 00:00:32 
Done 0004  1   howland        12517 01/24,01:00:56 00:00:31 
Done 0005  1   howland        12519 01/24,01:00:56 00:00:32 
Done 0006  1   howland        12522 01/24,01:00:57 00:00:32 
Done 0007  1   howland        12535 01/24,01:01:28 00:00:32 
Done 0008  1   howland        12537 01/24,01:01:28 00:00:31 
Done 0009  1   howland        12539 01/24,01:01:28 00:00:31 
Done 0010  1   howland        12548 01/24,01:02:00 00:00:30 
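
Because each frame log ends with whatever the render script printed, the logs directory can be scanned for frames that never reported success. The Bourne shell sketch below assumes the render script echoes a line starting with "--- DONE" on success, as the template does; the frames_not_done name is hypothetical.

```sh
#!/bin/sh
# frames_not_done: hypothetical helper, not part of Rush.
# $1 = frame log directory. Prints the name of every frame log that
# does not contain a line starting with "--- DONE". The "framelist"
# file is skipped because it is a report, not a frame log.
frames_not_done() {
    for log in "$1"/*; do
        [ -f "$log" ] || continue
        name=${log##*/}
        [ "$name" = "framelist" ] && continue
        grep -q '^--- DONE' "$log" || echo "$name"
    done
}

# Intended use:
#   frames_not_done logs
```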
    

   Requeuing Frames  


[erco@howland]% rush -lf
STAT FRAME TRY HOSTNAME       PID   START          ELAPSED  _
Fail 0100  1   howland        23386 01/25,12:19:09 00:00:22  |
Fail 0101  1   howland        23387 01/25,12:19:09 00:00:22  | Failed frames
Fail 0102  1   howland        23388 01/25,12:19:09 00:00:22 _|
Done 0103  1   howland        23410 01/25,12:19:30 00:00:22 
Done 0104  1   howland        23412 01/25,12:19:30 00:00:23 
[..]

[erco@howland]% rush -que 100-102   # Requeue them manually to start again
0100: Que
0101: Que
0102: Que

[erco@howland]% rush -lf
STAT FRAME TRY HOSTNAME       PID   START          ELAPSED   _
Que  0100  1   howland        23386 01/25,12:19:09 00:00:22   |
Que  0101  1   howland        23387 01/25,12:19:09 00:00:22   | They'll restart shortly
Que  0102  1   howland        23388 01/25,12:19:09 00:00:22  _|
Done 0103  1   howland        23410 01/25,12:19:30 00:00:22 
Done 0104  1   howland        23412 01/25,12:19:30 00:00:23 
[..]
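
Picking the failed frames out of a long frame list by eye gets tedious. As a sketch, the Bourne shell function below reads a frame list in the layout shown above and collapses the failed frames into ranges like the '100-102' passed to 'rush -que'. The failed_ranges name is hypothetical, and the function assumes the frames are listed in ascending order.

```sh
#!/bin/sh
# failed_ranges: hypothetical helper, not part of Rush.
# Reads a frame list on stdin (STAT in column 1, FRAME in column 2,
# frames in ascending order) and prints the failed frames as
# space-separated ranges, e.g. "100-102 107".
failed_ranges() {
    awk '
        function emit(    r) {
            if (start == "") return
            r = ((start == prev) ? start : (start "-" prev))
            out = ((out == "") ? r : (out " " r))
        }
        $1 == "Fail" && $2 ~ /^[0-9]+$/ {
            f = $2 + 0                      # strip zero padding
            if (start != "" && f == prev + 1) { prev = f; next }
            emit(); start = f; prev = f
        }
        END { emit(); if (out != "") print out }'
}

# Intended use:
#   rush -lf | failed_ranges
```

This document only shows 'rush -que' taking a single range, so whether several ranges can be passed in one command is an assumption to check before scripting it.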
    

   Pausing A Job  


[erco@howland]% rush -pause        # Pause Job.
Job how.850 is now 'Pause'         # No new frames will be started, running frames allowed to finish.

[erco@howland]% rush -lj
STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES
------ ----------- ------------ -------- ----- ---- -----------
Pause  how.850     WERNER/C33   erco     %55   0    Job paused.

[erco@howland]% rush -cont         # Continue Job
Job how.850 is now 'Run'

[erco@howland]% rush -lj
STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES
------ ----------- ------------ -------- ----- ---- -----------
Run    how.850     WERNER/C33   erco     %55   0    00:29:20
    

   Advanced: Submit/Monitoring  


[erco@howland]% eval `./submit_me`   # Submit job, sets RUSH_JOBID automatically
[erco@howland]% rush -ljf            # List Job Full - All info about job
     Jobid: how-848
     Title: VEGA
  Priority: 1
    LogDir: /usr/var/tmp/rush/logs
       Ram: 100
   Command: /usr/var/tmp/rush/render_me    
ChkCommand: 
EndCommand: 
  AutoDump: done
      User: erco (1000/1007)
  DoneMail: erco
 StartDate: Mon Jan 24 01:00:24 2000
   EndDate: -
   Elapsed: 00:00:16
    Frames: 10
      Cpus: how=3@100
  Notes[0]: -
  Criteria: 

[erco@howland]% rush -lff | more      # List Frame Full - all information about frames
    Frame: 1
    State: Run
 NewState: ???
 Hostname: howland
 Priority: 100
      Pid: 12499
      Tid: 1
TaskSeqID: 2
    Tries: 1
    Notes: 
StartDate: Mon Jan 24 01:00:25 2000
  EndDate: Sun Jan 01 00:00:00 0000
  Elapsed: 00:00:22

    Frame: 2
    State: Run
 NewState: ???
 Hostname: howland
 Priority: 100
      Pid: 12500
      Tid: 2
TaskSeqID: 3
    Tries: 1
    Notes: 
StartDate: Mon Jan 24 01:00:25 2000
  EndDate: Sun Jan 01 00:00:00 0000
  Elapsed: 00:00:22

    Frame: 3
    State: Run
 NewState: ???
 Hostname: howland
 Priority: 100
      Pid: 12501
[..]

[erco@howland]% rush -tasklist howland     # Task List - View all jobs competing
                                           # for howland's cpus.
TID JOBSID  TASKSID PID   JOBID/NAME           FRM  PRI   STATE   UTRY NOTES               _
3   4       4       0     how-848,VEGA         0000 100   Idle    1    Last: 0004 DONE      |
4   4       5       0     how-848,VEGA         0000 100   Idle    1    Last: 0005 DONE      | erco's 3 task reservations; one 'Run'ing
5   4       6       15650 how-848,VEGA         0006 100   Run     1    Elapsed=00:00:55    _|
6   6       7       15761 how-851,TESLA/MATTE  0502 200k  Run     2    Elapsed=00:00:25     | liza's 2 task reservations; running at
7   6       8       15763 how-851,TESLA/MATTE  0503 200k  Run     2    Elapsed=00:00:25    _| higher 'killer' priority - both are busy.
    

In the above example, there are 5 task reservations on howland: three belong to Erco's job and two to Liza's. Erco's job has a priority of 100, and Liza's job has the higher priority of 200k (the 'k' is the Kill flag). Howland has only 3 cpus, so Liza's higher priority job runs both of its tasks, and Erco's job takes the one leftover cpu.
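
The arithmetic behind that outcome can be reproduced with a toy model in Bourne shell. This is not Rush's scheduler: it simply hands the available cpus to the highest priority reservations first and ignores the Kill and Almighty flags. The allocate_cpus name and the "owner priority" input format are assumptions made for the illustration.

```sh
#!/bin/sh
# allocate_cpus: toy model, not Rush's actual scheduler.
# $1 = number of cpus on the host. Reads "owner priority" lines on
# stdin (one per task reservation), hands the cpus to the
# highest-priority reservations first, and prints "owner count"
# for each owner that received at least one cpu.
allocate_cpus() {
    sort -k2,2nr | head -n "$1" |
        awk '{ got[$1]++ } END { for (o in got) print o, got[o] }' |
        sort
}

printf '%s\n' 'erco 100' 'erco 100' 'erco 100' 'liza 200' 'liza 200' |
    allocate_cpus 3
# prints:
#   erco 1
#   liza 2
```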