Rush Render Queue - Tutorial
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.42 07/11/05
Strikeout text indicates features not yet implemented

Creating A Submit Script

    When creating a submit script, you can either copy an existing one, or start with the Template Submit Script and modify it:

      rush -tss > submit_me
      chmod +x submit_me

    When starting with a template, you should at least change the Title, Ram, Command, LogDir, and Cpus fields.

    Be sure to use absolute paths for Command and LogDir.



% rush -tss > submit_me
% chmod +x submit_me
% vi submit_me
% cat submit_me
#!/bin/csh -f

#
#  S U B M I T
#

set thisdir = ( `pwd` )

rush -submit << EOF
title           TEST
ram             1
frames          5000-5500
command         $thisdir/render_me
logdir          $thisdir/logs
cpus            +any=5
EOF
exit $status
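
    If you prefer to build job specs from a Bourne shell (sh) script rather than csh, the sketch below prints the same fields as the heredoc above and refuses relative paths for Command and LogDir. The function name submit_spec and its argument order are hypothetical, not part of rush; the output is meant to be piped into 'rush -submit', just as the csh script feeds its heredoc to it.

```shell
# Hypothetical helper (sh): print a rush job spec on stdout.
# Usage: submit_spec TITLE RAM FRAMES COMMAND LOGDIR CPUS
submit_spec() {
    # Command ($4) and LogDir ($5) must be absolute paths
    for path in "$4" "$5"; do
        case "$path" in
            /*) ;;
            *)  echo "submit_spec: '$path' is not an absolute path" >&2
                return 1 ;;
        esac
    done
    cat <<EOF
title           $1
ram             $2
frames          $3
command         $4
logdir          $5
cpus            $6
EOF
}

# Example (not run here), equivalent to the csh submit script above:
#   thisdir=`pwd`
#   submit_spec TEST 1 5000-5500 "$thisdir/render_me" "$thisdir/logs" +any=5 | rush -submit
```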



Creating A Render Script

    Again, you can either copy an existing script, or start with a copy of the Template Render Script and modify that.

      rush -trs > render_me
      chmod +x render_me

    When starting with a template, you should at least add your render command in the section marked..

      ### YOUR RENDER COMMAND(S) HERE

    At minimum, the Render Script must detect whether the render worked, and exit with one of these rush exit codes:

      exit 0    # OK - it worked
      exit 1    # FAIL - failed, don't retry
      exit 2    # RETRY - failed, requeue the frame


% rush -trs > render_me
% chmod +x render_me
% vi render_me
% cat render_me
#!/bin/csh -f

###############################
#  R E N D E R   S C R I P T  #
###############################

# Source your render environment as needed
source $RUSH_DIR/etc/.render

echo "--- Working on frame $RUSH_FRAME - `date`" 

### YOUR RENDER COMMAND(S) HERE
render /job/TEXAS/shot17/ribs/$RUSH_PADFRAME.rib
set err = $status

# Rush exit codes: 0=DONE  1=FAIL  2=RETRY 
if ( $err ) then 
    echo --- FAIL; exit 1 
else 
    echo --- DONE; exit 0 
endif 
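
    The exit-code decision can also be factored into a small sh function. This is a minimal sketch: the name rush_exit_code is hypothetical, and treating a "no available licenses" message in the log as RETRY (exit 2) rather than FAIL (exit 1) is an assumed policy, not something rush decides for you.

```shell
# Hypothetical helper (sh): map a render's exit status to a rush exit code.
#   0 = DONE, 1 = FAIL (don't retry), 2 = RETRY (requeue the frame)
# Usage: rush_exit_code STATUS [LOGFILE]
rush_exit_code() {
    if [ "$1" -eq 0 ]; then
        echo 0
    # Assumed policy: a license shortage is transient, so retry the frame
    elif [ -n "$2" ] && grep -qi 'no.available.licenses' "$2" 2>/dev/null; then
        echo 2
    else
        echo 1
    fi
}

# In a sh render script (not run here):
#   render /job/TEXAS/shot17/ribs/$RUSH_PADFRAME.rib
#   exit `rush_exit_code $? "$RUSH_LOGFILE"`
```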


Submitting A Job

    Running the submit script submits the job and prints its jobid as an environment variable setting, which looks something like:

      setenv RUSH_JOBID va.215

    Using your mouse, just cut and paste this into your shell to set the variable.

    If you're using csh under Unix, you can submit the job and set the variable in one operation by invoking the submit script as:

      eval `./submit_me`

    The environment variable setting allows you to control the job with successive rush commands, without having to specify the jobid for each command. Since you can have several jobs running at a time, rush must know which jobid you're controlling at all times.


% ./submit_me
setenv RUSH_JOBID va.215

% setenv RUSH_JOBID va.215

% rush -lj
STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------
Run    va.215      TEST         erco     %0    %0    3    00:00:10

% 
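
    The 'eval' trick relies on csh, since the submit script prints a csh 'setenv' line. Users of sh or bash could instead extract the jobid with sed, as in this sketch (the function name rush_jobid_from is hypothetical):

```shell
# Hypothetical helper (sh): read the submit script's output on stdin and
# print just the jobid from its "setenv RUSH_JOBID <jobid>" line.
rush_jobid_from() {
    sed -n 's/^setenv RUSH_JOBID[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Usage (not run here):
#   RUSH_JOBID=`./submit_me | rush_jobid_from`
#   export RUSH_JOBID
```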

Frame List

    The rush -lf (List Frames) command shows the frame list: the Status of each frame, how many Tries have been made to run the frame, which Hostname the frame ran on, the PID it ran as, the Start time for the running frame, and its Elapsed time.

    To view specific frames, pipe the frame list through grep or other typical Unix filters, e.g.:

      rush -lf | grep Run
      rush -lf | grep Fail
      rush -lf | grep '500[1-9]'
      rush -lf | grep vaio


% rush -lf | head
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Run  5000  1   vaio           18092   02/26,02:15:40 00:00:21 
Run  5001  1   vaio           18093   02/26,02:15:40 00:00:21 
Run  5002  1   vaio           18100   02/26,02:15:50 00:00:11 
Que  5003  0   -              0       00/00,00:00:00 00:00:00 
Que  5004  0   -              0       00/00,00:00:00 00:00:00 
Que  5005  0   -              0       00/00,00:00:00 00:00:00 
Que  5006  0   -              0       00/00,00:00:00 00:00:00 
Que  5007  0   -              0       00/00,00:00:00 00:00:00 
Que  5008  0   -              0       00/00,00:00:00 00:00:00 

% rush -lf | grep Run
Run  5000  1   vaio           18092   02/26,02:15:40 00:00:21 
Run  5001  1   vaio           18093   02/26,02:15:40 00:00:21 
Run  5002  1   vaio           18100   02/26,02:15:50 00:00:11 

% 
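
    Note that grep matches anywhere on the line, so a pattern like '500[1-9]' can also hit a PID or timestamp. Comparing a single column with awk avoids this. A sketch, assuming the column layout shown above (STAT first, FRAME second, HOSTNAME fourth); the function names are hypothetical:

```shell
# Hypothetical helpers (sh): filter a 'rush -lf' report read on stdin.
# Line 1 is the header; columns are STAT FRAME TRY HOSTNAME PID ...

# Print the frames whose STAT column equals $1 (e.g. Run, Fail, Que)
frames_in_state() {
    awk -v want="$1" 'NR > 1 && $1 == want { print $2 }'
}

# Print the frames whose HOSTNAME column equals $1
frames_on_host() {
    awk -v host="$1" 'NR > 1 && $4 == host { print $2 }'
}

# Usage (not run here):
#   rush -lf | frames_in_state Fail
#   rush -lf | frames_on_host vaio
```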

Frame Logs

    As renders run, they print informative text messages. Rush saves these messages in a separate log file for each frame, written to the LogDir. The logs can easily be viewed with the 'rush -log' command, e.g.:

      rush -log 3

    ..which would show the log for frame 0003. Several frames, or even frame ranges, can be specified at a time, e.g.:

      rush -log 1-10
      rush -log 1 8 55 342

    Or you can simply more(1) the log files directly; each log file is named after its frame number (e.g. 0003 for frame 3).

    In this example, the user wants to look at the log output for frame #3..

      more /net/job/rush/logs/0003

    ..which is more or less equivalent to:

      rush -log 3



% rush -lf
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Done 0001  1   vaio           19458   02/27,13:36:59 00:08:54 
Done 0002  1   vaio           19461   02/27,13:36:59 00:08:54 
Fail 0003  1   tahoe          1172    02/26,02:15:40 00:00:03
Run  0004  1   vaio           19593   02/26,02:15:40 00:00:21 
Run  0005  1   vaio           19600   02/26,02:15:50 00:00:11 
Que  0006  0   -              0       00/00,00:00:00 00:00:00 
Que  0007  0   -              0       00/00,00:00:00 00:00:00 
Que  0008  0   -              0       00/00,00:00:00 00:00:00 

% rush -log 3                 # See what happened
###                                    __ 
### vaio.215: 0003                       |
###                                      |
-------------------------------------    |
--    Host: tahoe                        |
--     Pid: 1172                         |
--   Jobid: vaio.215                     |  Rush Header.
--   Frame: 3                            |  Shows info about this frame.
--   Tries: 0                            |
--   Owner: erco (1000/1007)             |
--    Nice: 10                           |
--  Tmpdir: /var/tmp/.RUSH_TMP.412       |
-- LogFile: /net/job/rush/logs           |
-- Command: /net/job/rush/render_me      |
-- Started: Fri Jun 15 00:33:48 2001   __|
-------------------------------------  __ 
Sourcing user environment.. OK           |
Sourcing BMRT environment.. OK           |
                                         |  Render Script Output
Working on frame 0003                    |  The stdout/stderr of your script.
/net/job: File system is full            |
FAIL 1                                 __|
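
    To build log paths from a shell script, for example to open a range of logs directly, a helper can apply the same zero padding. This sketch assumes the four-digit padding seen above (frame 3 logs to 0003); frame_log_paths is a hypothetical name. awk does the arithmetic because some printf implementations (bash's, for one) reject an argument like 0008 as an invalid octal number, while awk reads it as decimal 8.

```shell
# Hypothetical helper (sh): print log file paths for frames FIRST..LAST,
# assuming four-digit zero-padded names such as 0003.
# Usage: frame_log_paths LOGDIR FIRST [LAST]
frame_log_paths() {
    awk -v dir="$1" -v first="$2" -v last="${3:-$2}" 'BEGIN {
        # "+ 0" converts to a number; awk reads "0008" as decimal 8
        for (f = first + 0; f <= last + 0; f++)
            printf "%s/%04d\n", dir, f
    }'
}

# Usage (not run here):
#   more `frame_log_paths /net/job/rush/logs 3`
#   more `frame_log_paths /net/job/rush/logs 1 10`
```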


Requeue Frames

    Some frames may 'hang' while running on certain machines. Their Status may continue to show as Run after an obviously long render time, or they may even Fail for one reason or another.

    Sometimes simply requeueing the frames will let them render successfully on the second try.

    You can requeue frames with rush -que by specifying particular frames or frame ranges, or even by requeueing all frames in a particular state, e.g.:

      rush -que 1 4 10 19
      rush -que 1-10
      rush -que fail

    Requeueing a "Run" frame kills it and returns it to the Que state, where it will be restarted, possibly on another machine.


% rush -lf | head
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Run  5000  1   vaio           18092   02/26,02:15:40 00:00:21 
Run  5001  1   vaio           18093   02/26,02:15:40 00:00:21 
Run  5002  1   vaio           18100   02/26,02:15:50 00:00:11
Que  5003  0   -              0       00/00,00:00:00 00:00:00 
Que  5004  0   -              0       00/00,00:00:00 00:00:00 
Que  5005  0   -              0       00/00,00:00:00 00:00:00 
Que  5006  0   -              0       00/00,00:00:00 00:00:00 
Que  5007  0   -              0       00/00,00:00:00 00:00:00 
Que  5008  0   -              0       00/00,00:00:00 00:00:00 

% rush -que 5000-5003
5000: Que [Killed]
5001: Que [Killed]
5002: Que [Killed]
5003: Que

% rush -lf | head
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Run  5000  2   rotwang        18117   02/26,02:16:14 00:00:05 
Run  5001  2   rotwang        18119   02/26,02:16:14 00:00:05 
Run  5002  2   vaio           18115   02/26,02:16:14 00:00:05
Que  5003  0   -              0       00/00,00:00:00 00:00:00 
Que  5004  0   -              0       00/00,00:00:00 00:00:00 
Que  5005  0   -              0       00/00,00:00:00 00:00:00 
Que  5006  0   -              0       00/00,00:00:00 00:00:00 
Que  5007  0   -              0       00/00,00:00:00 00:00:00 
Que  5008  0   -              0       00/00,00:00:00 00:00:00 

% 
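
    Spotting hung frames by eye gets tedious on large jobs. The sketch below reads a 'rush -lf' report and prints the Run frames whose ELAPSED time (HH:MM:SS, seventh column) exceeds a threshold you choose, so they can be passed to 'rush -que'. The function name and the two-hour threshold in the usage comment are illustrative assumptions.

```shell
# Hypothetical helper (sh): read a 'rush -lf' report on stdin and print
# the Run frames whose ELAPSED column (7th, HH:MM:SS) exceeds $1 seconds.
long_running_frames() {
    awk -v max="$1" 'NR > 1 && $1 == "Run" {
        n = split($7, t, ":")
        if (n == 3 && t[1] * 3600 + t[2] * 60 + t[3] > max + 0)
            print $2
    }'
}

# Usage (not run here): requeue frames running longer than two hours
#   hung=`rush -lf | long_running_frames 7200`
#   [ -n "$hung" ] && rush -que $hung
```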

Requeue Failed Frames

    Occasionally frames may Fail because of missing files or other such problems. In these cases you will want to fix the problem, then requeue all the Fail frames so they render again.

    You can either requeue the frames individually or requeue all the Fail frames with a single command:

      rush -que 3 5 9 10 11 18       # This..
      rush -que fail                 # ..or this.

    The same trick works for all the frame states, e.g.:

      rush -que done    # All 'Done' frames become 'Que'
      rush -que hold    # All 'Hold' frames become 'Que'
      rush -done que    # All 'Que'  frames become 'Done'
      (etc.)


% rush -lf
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Done 0001  1   vaio           19458   02/27,13:36:59 00:03:04 
Done 0002  1   vaio           19461   02/27,13:36:59 00:03:04 
Fail 0003  1   rotwang        19463   02/27,13:36:59 00:00:00 
Done 0004  1   vaio           19467   02/27,13:36:59 00:03:04 
Fail 0005  1   rotwang        19470   02/27,13:36:59 00:00:01 
Done 0006  1   vaio           19472   02/27,13:48:59 00:03:04 
Done 0007  1   vaio           19476   02/27,13:49:00 00:03:04 
Done 0008  1   vaio           19479   02/27,13:49:00 00:03:04 
Fail 0009  1   rotwang        19481   02/27,13:49:00 00:00:00 
Fail 0010  1   rotwang        19485   02/27,13:49:00 00:00:00 
Fail 0011  1   rotwang        19425   02/27,13:49:59 00:00:00 
Done 0012  1   vaio           19428   02/27,13:55:59 00:03:04 
Done 0013  1   vaio           19431   02/27,13:55:59 00:03:04 
Done 0014  1   vaio           19434   02/27,13:55:59 00:03:04 
Done 0015  1   vaio           19437   02/27,13:59:04 00:03:04 
Done 0016  1   vaio           19440   02/27,13:59:04 00:03:04 
Done 0017  1   vaio           19444   02/27,13:59:04 00:03:04 
Fail 0018  1   rotwang        19446   02/27,13:59:45 00:00:00 
Done 0019  1   vaio           19449   02/27,14:09:05 00:03:04 
Done 0020  1   vaio           19452   02/27,14:09:05 00:03:04 

% rush -que fail
0003: Que
0005: Que
0009: Que
0010: Que
0011: Que
0018: Que

% 
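
    Before requeueing every Fail frame, it can help to see how many frames are in each state. A sketch that tallies the STAT column of a 'rush -lf' report; count_frame_states is a hypothetical name:

```shell
# Hypothetical helper (sh): tally the STAT column of a 'rush -lf'
# report read on stdin, printing one "STATE COUNT" line per state.
count_frame_states() {
    awk 'NR > 1 && NF > 0 { n[$1]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage (not run here):
#   rush -lf | count_frame_states
```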

Holding Frames

    At times you may need a job to hold off rendering certain frames, for instance because the data needed to generate them isn't ready yet.

    You could pause the entire job with:

      rush -pause

    ...but then ALL frames stop rendering until you continue the job.

    If you only want certain frames to be skipped, mark them with 'Hold':

      rush -hold 35-50

    Rush skips those frames and renders all the others. Once there are no more frames to render, the job remains active (even if it is set to AutoDump) until the 'Hold' frames are manually marked 'Que' again:

      rush -que 35-50    # this..
      rush -que hold     # or this.



% rush -hold 35-50
0035: Hold
0036: Hold
0037: Hold
0038: Hold
0039: Hold
0040: Hold
0041: Hold
0042: Hold
0043: Hold
0044: Hold
0045: Hold
0046: Hold
0047: Hold
0048: Hold
0049: Hold
0050: Hold

% rush -lf
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Run  0030  1   vaio           19552   02/27,13:56:40 00:01:09 
Run  0031  1   vaio           19554   02/27,13:56:40 00:01:09 
Run  0032  1   vaio           19556   02/27,13:56:40 00:01:09 
Que  0033  0   -              0       00/00,00:00:00 00:00:00 
Que  0034  0   -              0       00/00,00:00:00 00:00:00 
Hold 0035  0   -              0       00/00,00:00:00 00:00:00 
Hold 0036  0   -              0       00/00,00:00:00 00:00:00 
Hold 0037  0   -              0       00/00,00:00:00 00:00:00 
Hold 0038  0   -              0       00/00,00:00:00 00:00:00 
Hold 0039  0   -              0       00/00,00:00:00 00:00:00 
Hold 0040  0   -              0       00/00,00:00:00 00:00:00 
Hold 0041  0   -              0       00/00,00:00:00 00:00:00 
Hold 0042  0   -              0       00/00,00:00:00 00:00:00 
Hold 0043  0   -              0       00/00,00:00:00 00:00:00 
Hold 0044  0   -              0       00/00,00:00:00 00:00:00 
Hold 0045  0   -              0       00/00,00:00:00 00:00:00 
Hold 0046  0   -              0       00/00,00:00:00 00:00:00 
Hold 0047  0   -              0       00/00,00:00:00 00:00:00 
Hold 0048  0   -              0       00/00,00:00:00 00:00:00 
Hold 0049  0   -              0       00/00,00:00:00 00:00:00 
Hold 0050  0   -              0       00/00,00:00:00 00:00:00 
Que  0051  0   -              0       00/00,00:00:00 00:00:00 
Que  0052  0   -              0       00/00,00:00:00 00:00:00 
Que  0053  0   -              0       00/00,00:00:00 00:00:00 
Que  0054  0   -              0       00/00,00:00:00 00:00:00 

% 
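
    Because a job with only 'Hold' frames left stays active, a script may want to detect that situation. This sketch exits with status 0 when a 'rush -lf' report shows Hold frames but no Que or Run frames; the function name is hypothetical and the check relies only on the STAT column.

```shell
# Hypothetical helper (sh): read a 'rush -lf' report on stdin; exit 0 when
# there are Hold frames but no Que or Run frames left, else exit 1.
only_held_frames_left() {
    awk 'NR > 1 { n[$1]++ }
         END { exit !(n["Hold"] > 0 && n["Que"] + n["Run"] == 0) }'
}

# Usage (not run here):
#   rush -lf | only_held_frames_left && echo "Only Hold frames remain: rush -que hold"
```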

Pausing A Job

    Pausing a job prevents it from starting new frames, while letting running frames finish.

    You can pause the current job at any time with:

      rush -pause

    To continue the job again, use:

      rush -cont

    Paused jobs show up in 'rush -lj' reports with 'Pause' in the STATUS column.

    You can pause / continue several jobs at a time:

      rush -pause vaio.215 vaio.216 tahoe.353
      rush -cont vaio.215 vaio.216 tahoe.353

    If your login name is 'fred', you can pause all your jobs with:

      rush -pause fred

    Use pause if you want to leave the job in the queue while you fix scene files, then continue the job so it picks up where it left off.

    You can pause a job and kill all running frames using this rush combination:

      rush -pause ; rush -que run


[erco@rotwang] % rush -pause
Job rotwang.52 is now 'Pause'

[erco@rotwang] % rush -lj
STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------
Pause  rotwang.52  FOO          erco     %15   %0    2    Job paused.

[erco@rotwang] % rush -cont
Job rotwang.52 is now 'Run'

[erco@rotwang] % rush -lj
STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------
Run    rotwang.52  FOO          erco     %25   %0    3    00:00:55
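
    Written with && instead of ;, the pause/requeue combination only kills running frames if the pause succeeded, assuming rush exits non-zero when a command fails. A sketch in sh, relying on rush using RUSH_JOBID when no jobid is given, as described above; pause_and_kill_running is a hypothetical name:

```shell
# Hypothetical helper (sh): pause jobs and requeue (kill) their running
# frames. The requeue runs only if the pause command succeeded.
# Usage: pause_and_kill_running [JOBID ...]   (default: $RUSH_JOBID)
pause_and_kill_running() {
    if [ $# -eq 0 ]; then
        rush -pause && rush -que run
        return
    fi
    for jobid in "$@"; do
        # rush reads the jobid from RUSH_JOBID for each command
        RUSH_JOBID=$jobid rush -pause &&
            RUSH_JOBID=$jobid rush -que run || return 1
    done
}
```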


Dumping A Job

    Dumping a job kills it completely and removes it from the queue.

    All running frames are killed, and the job will remain in 'rush -lj' reports until all frames are confirmed killed. 'rush -lc' will show which frames are being waited for.

    If you have DoneMail configured, mail will be sent indicating the job was dumped. If a DoneCommand is configured, it will be executed before the job is removed from the queue.

    To dump jobs, you can either specify the jobid(s) on the command line individually or let the RUSH_JOBID variable determine which jobs will be dumped.

      rush -dump va.15 va.16   # Specific jobids,
      rush -dump               # ..or use RUSH_JOBID.


% rush -lj
STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------
Run    va.15       TEST         erco     %30   %0    3    00:08:10
Run    va.16       TEST2        erco     %10   %0    8    00:08:12

% rush -dump va.15 va.16
Job va.15 is now 'Dump'
Job va.16 is now 'Dump'

% rush -lj
STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------

% 
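
    To dump several related jobs without typing each jobid, you could select them from a 'rush -lj' report by title. A sketch, assuming titles contain no spaces (the report is split on whitespace) and skipping the two header lines; jobids_with_title is a hypothetical name:

```shell
# Hypothetical helper (sh): read a 'rush -lj' report on stdin and print
# the jobids whose TITLE (3rd column) matches the regular expression $1.
jobids_with_title() {
    awk -v pat="$1" 'NR > 2 && $3 ~ pat { print $2 }'
}

# Usage (not run here):
#   ids=`rush -lj | jobids_with_title '^TEST'`
#   [ -n "$ids" ] && rush -dump $ids
```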

Frame Notes

    You can make use of the 'NOTES' field in 'rush -lf' reports by adding code to your render scripts that detects particular error conditions and, when one occurs, sets the notes for that frame.

    You can embed 'rush -notes' commands into your render script to alter the 'NOTES' field for the rendering frame, e.g.:

      if ( error_condition ) then
          rush -notes ${RUSH_FRAME}:'Short error msg here'
      endif

    Frame notes are cleared each time a frame begins rendering, so your render script never needs to clear them. In fact, doing so is discouraged, for the reason given in the following warning:

    Warning: Render scripts should invoke 'rush' commands only under error conditions. Invoking rush unconditionally puts a heavy TCP load on the job server daemon, and many simultaneous connections can critically slow its response.

    This is especially true when rendering on many cpus with short render times. Restricting 'rush' commands in render scripts to error conditions keeps the number of TCP connections low during normal operation.


% cat render_me
[..]
echo "--- Working on frame $RUSH_FRAME - `date`" 

### YOUR RENDER COMMAND(S) HERE
particle $DATA/files/stars-$RUSH_PADFRAME.par
set err = $status

# CHECK FOR MISSING FILES (egrep exits 0 when it finds a match)
egrep -i no.such.file.or.directory $RUSH_LOGFILE > /dev/null
if ( $status == 0 ) rush -notes ${RUSH_FRAME}:'Missing file'

# CHECK FOR CORE DUMPS
egrep -i core.dumped $RUSH_LOGFILE > /dev/null
if ( $status == 0 ) rush -notes ${RUSH_FRAME}:'Core dumped'

# CHECK FOR LICENSE ERRORS
egrep -i no.available.licenses $RUSH_LOGFILE > /dev/null
if ( $status == 0 ) then
    rush -notes ${RUSH_FRAME}:'License error'
    sleep 10
endif

[..]
% rush -lf
STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
Fail 0030  2   vaio           20338   02/27,14:41:22 00:01:03 Missing file
Fail 0031  2   vaio           20339   02/27,14:41:22 00:01:03 Missing file
Fail 0032  2   vaio           20340   02/27,14:41:22 00:01:03 Missing file
Run  0033  9   vaio           20365   02/27,14:55:25 00:00:45 License error
Done 0034  9   vaio           20367   02/27,14:41:25 00:01:04 -
Done 0035  8   vaio           20369   02/27,14:41:25 00:01:04 -
Done 0036  8   tahoe          20389   02/27,14:41:29 00:01:03 -
Done 0037  8   tahoe          20394   02/27,14:41:29 00:01:03 -
Done 0038  8   tahoe          20396   02/27,14:41:29 00:01:03 -
Done 0039  8   superior       20413   02/27,14:41:32 00:01:03 -
Done 0040  8   superior       20423   02/27,14:41:32 00:01:03 -
Done 0041  8   erie           20425   02/27,14:41:32 00:01:03 -
Done 0042  8   rotwang.erco.c 12662   02/27,14:41:32 00:01:06 -
Done 0043  8   rotwang.erco.c 12663   02/27,14:41:32 00:01:06 -
Fail 0044  8   rotwang.erco.c 12664   02/27,14:55:35 00:00:55 Missing file
Fail 0045  8   ontario        20434   02/27,14:55:35 00:00:55 Missing file
Fail 0046  8   ontario        20441   02/27,14:55:35 00:00:55 Missing file
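
    For render scripts written in sh, the same checks can be collected into one function that prints a note only when an error pattern is found, so rush is contacted only under error conditions, as the warning above recommends. A sketch; frame_note is a hypothetical name, and unlike the csh version it reports only the first matching pattern.

```shell
# Hypothetical helper (sh): print a short note for the first known error
# message found in a log file ($1), or nothing if none is found.
frame_note() {
    if grep -qi 'no.such.file.or.directory' "$1" 2>/dev/null; then
        echo 'Missing file'
    elif grep -qi 'core.dumped' "$1" 2>/dev/null; then
        echo 'Core dumped'
    elif grep -qi 'no.available.licenses' "$1" 2>/dev/null; then
        echo 'License error'
    fi
}

# In a sh render script (not run here); rush runs only if a note was found:
#   note=`frame_note "$RUSH_LOGFILE"`
#   [ -n "$note" ] && rush -notes "$RUSH_FRAME:$note"
```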

All Hosts/All Jobs/All Cpus Reports

    Excerpts from the network-wide reports: rush -lah (all hosts), rush -laj (all jobs), and rush -lac (all cpus).


% rush -lah
IP               Hostname   Ram  Cpus MinPri Criteria
10.100.100.1     tahoe      512  2    0      +any,irix
10.100.100.2     superior   512  2    0      +any,irix,+dante
10.100.100.3     erie       512  2    0      +any,irix,+dante
10.100.100.4     ontario    512  2    0      +any,irix,+dante
10.100.100.5     vaio       128  1    0      +any,linux,intel
10.100.100.6     rotwang    128  1    100    +any,linux,intel

% rush -laj
STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------
Run    tahoe.1     BURN/anchor  colby    %100  %0    0    42:11:46
Run    superior.3  BURN/relic   zev      %80   %0    3    03:22:03
Run    superior.4  BURN/relic2  zev      %0    %50   1    02:06:13
Run    erie.2      COMSAT/anten renke    %0    %0    0    19:32:15
Run    erie.4      COMSAT/retra renke    %0    %100  0    08:59:08
Run    ontario.6   BURN/lick1   colby    %100  %0    0    43:37:41
Run    ontario.8   BURN/lick3   colby    %100  %0    0    22:46:14
Run    ontario.9   BURN/lick4   colby    %75   %25   0    22:31:44
Run    ontario.10  BURN/lick5   colby    %0    %100  0    22:25:38
Run    ontario.11  BURN/lick6   colby    %100  %0    0    21:47:35
Run    ontario.12  BURN/lick7   colby    %99   %1    0    20:56:52
Run    vaio.11     TEST/r+d     wu       %100  %0    0    24:42:34
Run    erie.12     TEST/r+d     wu       %100  %0    0    22:10:29

% rush -lac
HOST       OWNER  JOBID      TITLE        FRM  PRI   PID    ELAPSED
tahoe      -      -          -            -    -     -       Online
tahoe      -      -          -            -    -     -       Online
superior   zev    superior.3 BURN/relic   0034 100   4026  00:10:01
superior   zev    superior.3 BURN/relic   0035 100   4027  00:10:02
erie       zev    superior.3 BURN/relic   0041 900k  14560 00:10:15
erie       zev    superior.4 BURN/relic2  0038 100k  4038  00:10:02
ontario    -      -          -            -    -     -      Offline
ontario    -      -          -            -    -     -      Offline
vaio       -      -          -            -    -     -      Offline
*** NO RESPONSE FROM:
*** rotwang
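
    The 'rush -lah' report can also be summarized from a script, for instance to see how many cpus carry a given criteria before choosing a job's Cpus field. A sketch, assuming the column layout shown above (Cpus fourth, comma-separated Criteria sixth); cpus_with_criteria is a hypothetical name:

```shell
# Hypothetical helper (sh): read a 'rush -lah' report on stdin and print
# the total Cpus (4th column) of hosts whose Criteria (6th) include $1.
cpus_with_criteria() {
    awk -v want="$1" 'NR > 1 {
        n = split($6, c, ",")
        for (i = 1; i <= n; i++)
            if (c[i] == want) { total += $4; break }
    }
    END { print total + 0 }'
}

# Usage (not run here):
#   rush -lah | cpus_with_criteria +dante
```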