Rush Render Queue - How To Use Rush Features
V 103.07b 05/11/16
(C) Copyright 2008, 2016 Seriss Corporation. All rights reserved.
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.

Strikeout text indicates features not yet implemented





   Chaining Jobs  
There are two ways to chain jobs:

  1. Chaining Individual Frames
  2. Chaining Job Completion

Most people will want the newer 'Chaining Individual Frames' approach, which lets jobs depend on each other at the frame level using the 'dependon' submit command.

I. Chaining Frames

Jobs can be submitted such that one job's frames wait for another job's frames, using the submit script command DependOn.

This allows you to create a chain of dependencies, such that some jobs render frames in parallel, while other jobs wait for individual frames to finish.

You can do this by making a script that invokes several job submissions; each time a job is submitted, the jobid(s) are saved, and used in the 'dependon' command in the NEXT submit script. This creates a frame dependency chain between one job and others. Many jobs can be chained in this manner.

Example

Here's a typical fg/bg/comp example; a submit script that starts three jobs; two renders (fg/bg) run in parallel, and a third (comp) waits, starting comps as frames in the fg/bg job complete. Note how the csh eval command is used to gather up jobids for the comp job's dependon command.

    FG/BG/COMP Frame Dependent Jobs
    #!/bin/csh -f
    
    ### SUBMIT SCRIPT -- Create frame dependencies between jobs    
    
    # Job #1: FOREGROUND ELEMENT
    eval `rush -submit` << EOF
       title      MYSHOW/FG
       ram        250
       frames     1-10
       command    $cwd/render-fg
       cpus       +any=10@100
       logdir     $cwd/logs-fg
    EOF
    if ( $status ) exit 1
    set fgjobid = $RUSH_JOBID
    echo "  FG: setenv RUSH_JOBID $RUSH_JOBID"
    
    # Job #2 -- BACKGROUND ELEMENT
    #           This job can run in parallel with the foreground,
    #           so no dependency is defined.
    #
    eval `rush -submit` << EOF
       title      MYSHOW/BG
       ram        250
       frames     1-10
       command    $cwd/render-bg
       cpus       +any=10@100
       logdir     $cwd/logs-bg
    EOF
    if ( $status ) exit 1
    set bgjobid = $RUSH_JOBID
    echo "  BG: setenv RUSH_JOBID $RUSH_JOBID"
    
    # Job #3 -- COMP
    #           This job waits for individual frames in FG and BG jobs 
    #           to complete successfully before comping frames.
    #
    eval `rush -submit` << EOF
       title      MYSHOW/COMP
       ram        250
       frames     1-10
       command    $cwd/render-comp
       cpus       +any=10@100
       logdir     $cwd/logs-comp
       dependon   $fgjobid $bgjobid
    EOF
    if ( $status ) exit 1
    echo "COMP: setenv RUSH_JOBID $RUSH_JOBID"
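
If your submit scripts are written in Python rather than csh, the same chain can be sketched as follows. This is only a sketch, not part of the Rush distribution: it assumes 'rush -submit' prints a 'setenv RUSH_JOBID <jobid>' line on stdout (the line the csh eval above consumes), and the helper names submit_text, parse_jobid, submit and submit_chain are made up for illustration.

```python
import subprocess

def submit_text(title, command, logdir, dependon=()):
    """Build the text fed to 'rush -submit' (same fields as the csh example)."""
    lines = [
        "title      %s" % title,
        "ram        250",
        "frames     1-10",
        "command    %s" % command,
        "cpus       +any=10@100",
        "logdir     %s" % logdir,
    ]
    if dependon:
        # Frame-level dependency on one or more previously submitted jobs
        lines.append("dependon   %s" % " ".join(dependon))
    return "\n".join(lines) + "\n"

def parse_jobid(output):
    """Pull the jobid out of an (assumed) 'setenv RUSH_JOBID <jobid>' line."""
    for line in output.splitlines():
        words = line.split()
        if len(words) == 3 and words[:2] == ["setenv", "RUSH_JOBID"]:
            return words[2]
    return None

def submit(text):
    """Feed the submit text to 'rush -submit' and return the new jobid."""
    p = subprocess.Popen(["rush", "-submit"], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = p.communicate(text)
    jobid = parse_jobid(out)
    if p.returncode != 0 or jobid is None:
        raise RuntimeError("rush -submit failed")
    return jobid

def submit_chain(cwd):
    """FG and BG run in parallel; COMP waits on frames from both."""
    fg = submit(submit_text("MYSHOW/FG", cwd + "/render-fg", cwd + "/logs-fg"))
    bg = submit(submit_text("MYSHOW/BG", cwd + "/render-bg", cwd + "/logs-bg"))
    comp = submit(submit_text("MYSHOW/COMP", cwd + "/render-comp",
                              cwd + "/logs-comp", dependon=(fg, bg)))
    return fg, bg, comp
```

Calling submit_chain(os.getcwd()) from a submit script would submit all three jobs; nothing is submitted until that call is made.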
        

II. Chaining Job Completion

    If you want to chain jobs at the frame level, see the Chaining Frames section above. However, if you want one job to wait for other jobs to COMPLETELY finish before it starts, read on.

Jobs can be submitted such that one job waits for another to complete. New in 102.40 and up: a job can wait not only for another job to dump, but also for it to change to certain states such as Fail or Done.

Typically the situation is to wait for other jobs to become Done. To do this, use the submit script 'waitfor' command to wait for other jobids, being sure to set waitforstate done so that the job waits for the other job to be done.

You can make a high level script that creates several jobs, taking the jobids from the first two, and passing them as arguments to 'waitfor' for the third job. There is no limit to how many jobs can be linked this way; you can have multiple jobs chain together.

(Note: It is no longer necessary to use autodump done to force jobs to dump when chaining jobs; the submit command waitforstate lets you set a job to wait for other jobs to simply be Done or Fail, or even to dump.)

Here's a simple example showing how the csh eval command can be used to gather up the jobid of the first job, so that it can be passed to the second job's waitfor command.

    Chaining Jobs
    #!/bin/csh -f
    source $RUSH_DIR/etc/.submit
    
    ### SUBMIT SCRIPT -- Chaining Multiple Jobs
    
    # Job #1
    eval `rush -submit` << EOF
       title      MYSHOW/MYRENDER
       ram        250
       frames     1-10
       command    $cwd/render-script 
       cpus       +any=10@100
       cpus       vaio=8@100                
       logdir     $cwd/logs-1
    EOF
    if ( $status ) exit 1
    
    # (eval eats the setenv command, so we duplicate it here)
    echo "setenv RUSH_JOBID $RUSH_JOBID" 
    
    # Job #2 -- this job will wait for the above job to finish      
    rush -submit << EOF
       title      MYSHOW/MYCOMP
       ram        250
       frames     1-10
       command    $cwd/comp-script  
       cpus       +any=10@100
       cpus       vaio=8@100
       logdir     $cwd/logs-2
       waitfor    $RUSH_JOBID 
       waitforstate donefail 
    EOF
    	

   Batching Multiple Frames  
Large renders can often benefit from running several frames at a time in a single process execution, instead of rendering each frame with a separate process.

The benefit is mainly avoiding having the entire scene (geometry, texture maps, animation files) reloaded on /every/ frame. By having the renderer run several frames (a 'batch' of frames) each time it loads, it saves the overhead of loading all the data each frame; it only loads once for the whole batch. This makes the overall render time shorter.

A good technique is to tell the render queue to render on 'tens' (ie. 1-500,10) and have the render script invoke the renderer to run ten frames at a time, using $RUSH_FRAME as the start frame and ($RUSH_FRAME + 9) as the end frame.

Scripting this involves two things: (1) telling rush to create a framelist for every 10th frame (so it only invokes the render script every 10th frame), and (2) passing this batch value to the render script so it knows how many frames to render in one execution. We also pass the job's end frame to make sure we don't render beyond the last frame in the job. A csh script example:

    Submit Script: Batch Rendering
    #!/bin/csh -f
    
    ## SUBMIT SCRIPT
    
    set job_sfrm  = 1        # First frame to render
    set job_efrm  = 25       # Last frame to render
    set job_batch = 10       # Number of frames to batch at a time       
    
    # SUBMIT THE JOB
    rush -submit << EOF
    title           BATCH_RENDER
    ram             1
    frames          $job_sfrm-$job_efrm,$job_batch
    command         $cwd/render-batch $job_batch $job_efrm
    logdir          $cwd/logs
    cpus            +any=10@100
    EOF
    exit $status
    	

    Render Script: Batch Rendering
    #!/bin/csh -f
    
    ### RENDER SCRIPT
    
    # JOB BATCH FRAME + END FRAME VALUES
    @ job_batch  = $argv[1]
    @ job_efrm   = $argv[2]
    
    # DETERMINE BATCH START + END
    @ batch_sfrm = $RUSH_FRAME
    @ batch_efrm = ( $batch_sfrm + $job_batch - 1 )
    
    # BATCH CLIP - PREVENT RENDERING BEYOND JOB'S END FRAME
    if ( $batch_efrm > $job_efrm ) @ batch_efrm = $job_efrm
    
    echo "--- Working on frames $batch_sfrm - $batch_efrm - `date`"       
    myrender -s $batch_sfrm -e $batch_efrm
    if ( $status ) exit 1 
    exit 0
            

Here's a Python version of the above batch rendering scripts:

    Python Submit Script: Batch Rendering
    #!/usr/bin/python -B
    
    import os,sys
    
    ## PYTHON SUBMIT SCRIPT
    job_sfrm  = 1			# First frame to render
    job_efrm  = 25			# Last frame to render
    job_batch = 10			# Number of frames to batch at a time
    
    # SUBMIT THE JOB
    p = os.popen("rush -submit", "w")
    p.write("title           BATCH_RENDER\n"
           +"ram             1\n"
           +"frames          %d-%d,%d\n" % (job_sfrm, job_efrm, job_batch)
           +"command         python %s/render-batch.py %d %d\n" % (os.getcwd(), job_batch, job_efrm)
           +"logdir          %s/logs\n" % os.getcwd()
           +"cpus            +any=10@100\n"
           )
    err = p.close()                     # None if 'rush -submit' exited 0
    sys.exit(1 if err else 0)
    	

    Python Render Script: Batch Rendering
    #!/usr/bin/python -B
    
    ### PYTHON RENDER SCRIPT
    
    import os,sys
    
    # JOB BATCH FRAME + END FRAME VALUES
    job_batch  = int(sys.argv[1])
    job_efrm   = int(sys.argv[2])
    
    # DETERMINE BATCH START + END
    batch_sfrm = int(os.environ["RUSH_FRAME"])
    batch_efrm = ( batch_sfrm + job_batch - 1 )
    
    # BATCH CLIP - PREVENT RENDERING BEYOND JOB'S END FRAME
    if ( batch_efrm > job_efrm ): batch_efrm = job_efrm     
    
    print("--- Working on frames %s - %s" % (batch_sfrm, batch_efrm))
    sys.stdout.flush()
    sys.stderr.flush()
    err = os.system("myrender -s " + str(batch_sfrm) + " -e " + str(batch_efrm))
    if ( err ): 
        sys.exit(1)
    sys.exit(0)
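
To see what the batch clip does, here is a small stand-alone sketch (not a render script) that lists the frame range each invocation of the render script would handle. It assumes, as described above, that a framelist of 'sfrm-efrm,batch' makes rush issue every batch-th frame starting at sfrm; the function name batch_ranges is made up for illustration.

```python
def batch_ranges(job_sfrm, job_efrm, job_batch):
    """Return the (start, end) range each render script invocation handles."""
    ranges = []
    # Frames rush issues for 'frames sfrm-efrm,batch'
    for batch_sfrm in range(job_sfrm, job_efrm + 1, job_batch):
        # Batch clip: never go beyond the job's end frame
        batch_efrm = min(batch_sfrm + job_batch - 1, job_efrm)
        ranges.append((batch_sfrm, batch_efrm))
    return ranges

# Same values as the submit scripts above: frames 1-25, batches of 10
for sfrm, efrm in batch_ranges(1, 25, 10):
    print("RUSH_FRAME=%d renders frames %d-%d" % (sfrm, sfrm, efrm))
```

With the values above, the last batch starts at frame 21 and is clipped to end at 25 instead of 30.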
    	

Because batching takes over rush's frame step rate (eg. 'frames 1-100,10'), you need a little more logic if the user also wants a step rate for their render (ie. render only odd frames, or stepping on 2's) along with batching on 10's.

So in the following, we adapt the above script pair to have both 'batch' AND 'step' as separate options:

    Submit Script: Batching With Step Rates
    #!/bin/csh -f
    
    ## SUBMIT SCRIPT
    
    # NUMBER OF FRAMES TO BATCH
    #     Change these first values, sfrm/efrm/batch/step, as needed
    #
    set job_sfrm  = 1           # start frame for render
    set job_efrm  = 105         # end frame for render
    set job_batch = 10          # batch frames (how many frames to render per process)
    set job_step  = 2           # step rate for renders (e.g. 2 renders every other frame)    
    @ batchstep = ( $job_batch * $job_step )
    
    # SUBMIT THE JOB
    rush -submit << EOF
    title           BATCH_AND_STEP_RENDER
    ram             10
    frames          $job_sfrm-$job_efrm,$batchstep
    command         $cwd/render-batch $job_batch $job_step $job_efrm
    logdir          $cwd/logs
    cpus            +any=10@100
    EOF
    exit $status
    	

    Render Script: Batching With Step Rates
    #!/bin/csh -f
    
    ### RENDER SCRIPT
    
    # START/END FRAME FOR BATCHING
    @ job_batch   = $argv[1]          # user's batch frames
    @ batch_step  = $argv[2]          # user's step rate
    @ job_efrm    = $argv[3]          # user's last frame in range
    @ batch_sfrm  = $RUSH_FRAME
    @ batch_efrm  = ( $batch_sfrm + ($job_batch * $batch_step) - 1 )
    if ( $batch_efrm > $job_efrm ) @ batch_efrm = $job_efrm    # don't exceed user's last frame in range     
    
    echo "--- Working on frames $batch_sfrm-$batch_efrm,$batch_step - `date`"
    myrender -s $batch_sfrm -e $batch_efrm -i $batch_step
    if ( $status ) exit 1 
    exit 0
            

Python example of batching with step rates..

    Python SUBMIT Script: Batching With Step Rates
    #!/usr/bin/env python
    import os,sys
    
    ## SUBMIT SCRIPT - Batching with step rates
    
    # NUMBER OF FRAMES TO BATCH
    #    Change these first values, sfrm/efrm/batch/step, as needed
    #
    job_sfrm      = 1                # start frame for render
    job_efrm      = 105              # end frame for render
    job_batch     = 10               # batch frames (how many frames to render per process)
    job_step      = 2                # step rate for renders (2=render every other frame)
    batch_step    = ( job_batch * job_step )
    cwd           = os.getcwd()
    
    # SUBMIT THE JOB
    fp = os.popen("rush -submit", "w")
    fp.write("""
    title           BATCH_AND_STEP_RENDER
    ram             50
    frames          """ + "%d-%d,%d" % (job_sfrm, job_efrm, batch_step) + """
    command         """ + "python %s/render-batch %d %d %d" % (cwd, job_batch, job_step, job_efrm) + """
    logdir          """ + cwd + "/logs" + """
    cpus            +any=10@100
    """)
    fp.close()
    		

Note how the above submit passes job_batch, job_step and job_efrm as arguments to the render script, which then uses them to determine the range of frames to render in the batch, with a step rate:

    Python RENDER Script: Batching With Step Rates
    #!/usr/bin/env python
    import os,sys
    
    ## RENDER SCRIPT - Batching with step rates
    
    # START/END FOR BATCH
    job_batch   = int(sys.argv[1])
    job_step    = int(sys.argv[2])
    job_efrm    = int(sys.argv[3])
    batch_sfrm  = int(os.environ["RUSH_FRAME"])
    batch_efrm  = (batch_sfrm + (job_batch * job_step) - 1)
    if batch_efrm > job_efrm: batch_efrm = job_efrm     # don't exceed job's last frame     
    
    # INVOKE RENDERER
    print("--- Working on frames %d-%d,%d" % (batch_sfrm, batch_efrm, job_step))
    sys.stdout.flush()
    err = os.system("myrender -s %d -e %d -i %d" % (batch_sfrm, batch_efrm, job_step))
    if err == 0: sys.exit(0)
    else:        sys.exit(1)
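
As a worked example of the batch/step arithmetic, the following stand-alone sketch lists which frames each invocation ends up rendering. It assumes the renderer's step option starts counting at the batch start frame and stops at or before the batch end frame ('myrender' above is only a placeholder); the function name batch_step_frames is made up for illustration.

```python
def batch_step_frames(job_sfrm, job_efrm, job_batch, job_step):
    """Return a list of (RUSH_FRAME, [frames rendered by that invocation])."""
    result = []
    batch_step = job_batch * job_step       # step rate given to rush
    for batch_sfrm in range(job_sfrm, job_efrm + 1, batch_step):
        # Same end frame calculation and clip as the render scripts above
        batch_efrm = min(batch_sfrm + batch_step - 1, job_efrm)
        frames = list(range(batch_sfrm, batch_efrm + 1, job_step))
        result.append((batch_sfrm, frames))
    return result

# Same values as the submit scripts above: 1-105, batch of 10, step of 2
for rush_frame, frames in batch_step_frames(1, 105, 10, 2):
    print("RUSH_FRAME=%d renders %s" % (rush_frame, frames))
```

Each full batch renders ten frames (1,3,..19, then 21,23,..39 and so on); the final batch starting at 101 is clipped at 105 and renders only 101, 103 and 105.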
    		

   Retry Frames  
It is often useful to retry failed renders several times before giving up on the frame, and leaving it FAILed.

Whenever your render script returns an exit code of 2 (REQUEUE), the frame is requeued, the 'Try' count is incremented (shown in 'rush -lf'), and the frame is executed again.

Rush passes the retry count to the render script as an environment variable $RUSH_TRY which the script can use to act conditionally.

    Render Script: Simple Retry Counting
    #!/bin/csh -f
    source $RUSH_DIR/etc/.render
    
    ### RENDER SCRIPT
    
    echo "--- Working on frame $RUSH_FRAME - `date`" 
                                                                 
    render /job/MYJOB/MYSHOT/ribs/fg.$RUSH_PADFRAME.rib
    if ( $status == 0 ) exit 0     # it worked
    
    # FAILED? RETRY 3 TIMES
    if ( $RUSH_TRY < 3 ) exit 2   # retry up to 3 times
    exit 1                        # otherwise fail
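
For Python render scripts, the same decision can be sketched as a small function. The exit codes (0=done, 1=fail, 2=requeue) and $RUSH_TRY are as described above; the function names, and treating an unset RUSH_TRY as 0, are assumptions made for this sketch.

```python
import os

MAX_TRIES = 3       # same limit as the csh example above

def current_try(environ=os.environ):
    """Return rush's try count for this frame (0 if RUSH_TRY is not set)."""
    return int(environ.get("RUSH_TRY", "0"))

def exit_code_for(render_status, rush_try, max_tries=MAX_TRIES):
    """Map the renderer's exit status and the try count to a rush exit code."""
    if render_status == 0:
        return 0            # DONE: it worked
    if rush_try < max_tries:
        return 2            # REQUEUE: rush bumps the try count and reruns
    return 1                # FAIL: retries used up
```

A render script would end with something like sys.exit(exit_code_for(err, current_try())), where err is the status returned by the render command.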
    	    

In some cases, using just the rush try counter can be problematic if, say, there are killer jobs on your network that regularly bump frames, causing try counts to clock up unexpectedly.

In such cases, you might want to make your own try counter that counts completed attempts on the rendered frame, ignoring frames bumped by other 'killer' jobs. To do this, you could make your render script use this approach:

    Render Script: Retry Counts Around Killer Jobs
    
    #!/bin/csh -f
    source $RUSH_DIR/etc/.render
    
    ### RENDER SCRIPT
    
    # GET THE CURRENT TRY COUNT
    #     Keep our own 'try file' that is basically the log filename
    #     with a .try extension on the end. Note that resetting the
    #     rush try count will reset our own try counter as well.
    #
    if ( $RUSH_TRY == 0 || ! -e $RUSH_LOGFILE.try ) echo 0 > $RUSH_LOGFILE.try
    set mytry = `cat $RUSH_LOGFILE.try`
    if ( $mytry >= 3 ) then
        echo --- FAIL: TRY COUNT EXCEEDED ; exit 1
    endif
    
    # Render command here
    render /myshow/myshot/foo.$RUSH_PADFRAME.rib
    set err = $status 
    echo --- RENDER EXIT CODE: $err
    
    # Update try count AFTER render completes
    #    This way we count complete tries, not bumps.
    #
    @ mytry++ ; echo $mytry > $RUSH_LOGFILE.try
    
    if ( $err ) then
        echo --- FAIL ; exit 1
    endif
    echo --- OK 
    exit 0
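
The same bookkeeping can be sketched in Python. This mirrors the csh script above step for step (including returning 1 when the render fails); the function names are made up for illustration, and 'render' stands for any callable that runs your renderer and returns its exit status.

```python
import os

def write_try_count(tryfile, count):
    """Save our own try count to the try file."""
    with open(tryfile, "w") as fp:
        fp.write("%d\n" % count)

def read_try_count(tryfile, rush_try):
    """Read our own try count; reset it if rush's count is 0 or no file exists."""
    if rush_try == 0 or not os.path.exists(tryfile):
        write_try_count(tryfile, 0)
    with open(tryfile) as fp:
        return int(fp.read().strip())

def run_with_try_count(tryfile, rush_try, render, max_tries=3):
    """Run render() unless our own try count is used up; return a rush exit code."""
    mytry = read_try_count(tryfile, rush_try)
    if mytry >= max_tries:
        return 1                            # FAIL: try count exceeded
    err = render()
    # Only reached if the render ran to completion, so bumped
    # frames (killed mid-render) never increase our count.
    write_try_count(tryfile, mytry + 1)
    return 1 if err else 0
```

In a real render script, tryfile would be os.environ["RUSH_LOGFILE"] + ".try" and rush_try would come from os.environ["RUSH_TRY"].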
    	

   Detecting Render Problems With Grep  
Some renderers don't return useful exit codes. You may encounter renderers that return with 'exit 0' even if the render failed, making it hard to determine whether the render script should return a FAIL or DONE exit code.

For instance, a 3rd party program might output error messages like 'cannot open file' or 'write error', but always return an exit code of 0. A savvy render script programmer can use 'egrep' to detect the error message and report the failure back to rush.

    Detecting Render Problems with Grep
    #!/bin/csh -f
    
    ### RENDER SCRIPT
    
    my_render $RUSH_FRAME
    
    # 'my_render' always returns an exit code of 0, 
    # so to detect errors we have to grep for them.
                                                                  
    egrep 'cannot open file|write error' $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) then
        echo -- FAILED --
        exit 1
    else
        echo -- DONE --
        exit 0
    endif
    	   

Here's the same example using Perl. This works on both Windows (which doesn't have grep(1)) and Unix:

    Detecting Render Problems with Perl
    #!/usr/bin/perl
    
    ### RENDER SCRIPT
    
    # 'my_render' always returns an exit code of 0,  
    # so to detect errors we have to grep for them. 
                                                                  
    system("my_render $ENV{RUSH_FRAME}");
    
    # Check for error messages from the log file
    if ( open(FD, "<$ENV{RUSH_LOGFILE}") ) {
        while ( <FD> ) {
    	if ( /cannot open file/ || /write error/ ) {
    	    print STDERR "-- FAILED --\n";
    	    exit(1);
    	}
        }
        close(FD);
    }
    else { 
        print STDERR "$ENV{RUSH_LOGFILE}: $!\n";
    }
    print STDERR "-- OK --\n";
    exit(0);
        

The following shows what a frames report can look like when the render script checks for particular errors and adds TD-friendly messages to the NOTES column of the job's framelist, highlighting what kind of error each failed frame encountered:

    Custom Error Messages in Notes Field
        % rush -lf
        STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
        Fail 0030  2   vaio           20338   02/27,14:41:22 00:01:03 Missing file
        Fail 0031  2   vaio           20339   02/27,14:41:22 00:01:03 Missing file
        Fail 0032  2   vaio           20340   02/27,14:41:22 00:01:03 Missing file
        Run  0033  9   vaio           20365   02/27,14:55:25 00:00:45 License error  
        Done 0034  9   vaio           20367   02/27,14:41:25 00:01:04 -
        Done 0035  8   rotwang.erco.c 12663   02/27,14:41:32 00:01:06 -
        Fail 0036  8   rotwang.erco.c 12664   02/27,14:55:35 00:00:55 Missing file
        Fail 0037  8   ontario        20434   02/27,14:55:35 00:00:55 Missing file
        Fail 0038  8   ontario        20441   02/27,14:55:35 00:00:55 Missing file
    	    

What follows is an advanced example of a render script showing the detection logic that generates these messages. There is also a simple example in the rush tutorial.

     Grep: An Advanced Example to make Custom Error Messages 
    
    #!/bin/csh -f
    
    ### RENDER SCRIPT
    
    echo "--- Working on frame $RUSH_FRAME - `date`" 
    
    ### MAYA RENDER
    Render30 -s $RUSH_FRAME -e $RUSH_FRAME -b 1 -proj $1 -rd /jobs/MYSHOW/MYSHOT/images $2
    set err = $status
    
    ### GREP FOR ERROR MESSAGES
    set msg = ""
    if ( `grep -q "Texture file"          $RUSH_LOGFILE; echo $status` == "0" ) set msg = "Texture File"
    if ( `grep -q "Failed to open IFF"    $RUSH_LOGFILE; echo $status` == "0" ) set msg = "IFF Error"
    if ( `grep -q "find destination plug" $RUSH_LOGFILE; echo $status` == "0" ) set msg = "Plug Error"
    if ( `grep -q "ESEC_J"                $RUSH_LOGFILE; echo $status` == "0" ) set msg = "License Error"
    if ( `grep -q "doesn"                 $RUSH_LOGFILE; echo $status` == "0" ) set msg = "Missing File"
    if ( `grep -q "TrenderTesselation"    $RUSH_LOGFILE; echo $status` == "0" ) set msg = "Tesselation Error"
    if ( `grep -q "Memory exception"      $RUSH_LOGFILE; echo $status` == "0" ) set msg = "Memory Error"
    if ( `grep -q "post-process stage"    $RUSH_LOGFILE; echo $status` == "0" ) set msg = "Post Process"
    
    ### FOUND ONE OF THE ABOVE?
    if ( "$msg" != "" ) then
    
        # MAKE NOTE IN FRAMELIST FOR TD/RENDER WATCHER
        rush -notes ${RUSH_FRAME}:"$msg"
    
        switch ( "$msg" )
    
            ### NON-FATAL
            case "License Error":
            case "IFF Error":
            case "Plug Error":
            case "Tesselation Error":
            case "Memory Error":
            case "Post Process":
                echo -- REQUEUE 
                exit 2
    
            ### FATAL
            case "Texture File":
            case "Missing File":
            default:
                echo -- FAIL 
                exit 1
    
        endsw
    endif
    
    # NON-SPECIFIC ERROR?
    if ( $err != 0 ) then
        rush -notes ${RUSH_FRAME}:"See Logs"
        echo -- FAIL 
        exit 1
    endif
    
    # NO ERRORS
    echo -- OK 
    exit 0
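
The detection logic above is essentially a lookup table, which can be sketched in Python as follows. The search strings, notes and exit codes are copied from the csh script; the names RULES and classify are made up for illustration, and the function only decides what to report (running 'rush -notes' and exiting is left to the caller).

```python
# (text to look for in the log, note for the framelist, rush exit code)
# Exit code 2 = requeue (non-fatal), 1 = fail (fatal).
RULES = [
    ("Texture file",          "Texture File",      1),
    ("Failed to open IFF",    "IFF Error",         2),
    ("find destination plug", "Plug Error",        2),
    ("ESEC_J",                "License Error",     2),
    ("doesn",                 "Missing File",      1),
    ("TrenderTesselation",    "Tesselation Error", 2),
    ("Memory exception",      "Memory Error",      2),
    ("post-process stage",    "Post Process",      2),
]

def classify(log_text, render_status):
    """Return (note, exit_code); note is None when there is nothing to report."""
    found = None
    for needle, note, code in RULES:
        if needle in log_text:
            found = (note, code)    # as in the csh script, the last match wins
    if found:
        return found
    if render_status != 0:
        return ("See Logs", 1)      # non-specific error
    return (None, 0)                # no errors
```

Keeping the rules in one table makes it easy to add a new message or to change whether an error is treated as fatal.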
        

   Disk Management  
Most disk space management is handled these days via Unix quotas and large RAIDed disk systems. Some companies still have individual, un-RAIDed disks, such that a large project might need to render on several different disks to prevent disk full problems from stopping renders.  Here's a technique that automatically fails over to different physical disks as disks become full:

    Disk Management In Render Scripts
    #!/bin/csh -f
    [..]
    
    # AVAILABLE DISKS
    # Cycle through the disks (in order of preference) until we
    # find one with enough free space.
    #
    set disks = ( /mnt/disk1 /mnt/disk2 /mnt/disk3 done )
    
    foreach outdir ( $disks )
    
        if ( "$outdir" == "done" ) then
            echo --- All disks are too full to use ; exit 1
        endif
    
        # Disk < 95% in use? use it
        # (df -P prints the POSIX format: capacity is column 5, eg. '42%')
        if ( `df -P $outdir | awk 'NR==2 {sub(/%/,"",$5); print $5}'` < 95 ) break
    end
    end
    
    # YOUR RENDER COMMANDS HERE #
    my_render $outdir/foo.$RUSH_PADFRAME.tif
    
    [..]
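
In a Python render script, the same failover can be sketched without parsing df output by using os.statvfs (Unix only). The function names are made up for illustration, and the 'usage' argument exists only so the selection logic can be tried out without real disks.

```python
import os

def percent_used(path):
    """Percentage of the filesystem holding 'path' that is in use (Unix only)."""
    st = os.statvfs(path)
    used = st.f_blocks - st.f_bfree
    total = used + st.f_bavail      # space usable by non-root users
    return 100.0 * used / total if total else 100.0

def pick_disk(disks, max_percent=95, usage=percent_used):
    """Return the first disk (in order of preference) below max_percent used."""
    for outdir in disks:
        if usage(outdir) < max_percent:
            return outdir
    return None                     # all disks are too full to use
```

A render script would call pick_disk(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]) and exit 1 if it returns None.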
            

   Frame Notes  
You can make use of the NOTES field in 'rush -lf' reports by putting some extra code in your render scripts to detect certain error conditions and, when one is encountered, change the notes for that frame.

You can embed 'rush -notes' commands into your render script to alter the 'notes' field for the rendering frame, eg:

    if ( error_occurred ) then
	rush -notes ${RUSH_FRAME}:'Your msg here'
    endif
    

Frame notes are cleared each time a frame begins rendering, so there's no need to specify a rush command to clear the frame notes in your render script. In fact, that's discouraged because of the following warning..

Warning: Each execution of 'rush -notes' invokes a TCP connection to the job server daemon. Invoking 'rush' commands on a per frame basis is unwise (except under error condition circumstances), as it imposes a large TCP load on the job server daemon if many connections occur all at once, slowing the daemon's response critically.

This happens especially if your render times are short, and you are rendering on many cpus. Therefore you are encouraged to embed 'rush' commands in render scripts only under error conditions (ie. infrequently), so as to lessen the possibility of many concurrent TCP connections.

Here's an example showing a render script that makes use of the NOTES field to report helpful errors to the user..

    Render Script: Reporting Errors In Frame Notes
    
    % cat render_me
    
    #!/bin/csh -f
    echo "--- Working on frame $RUSH_FRAME - `date`" 
    
    ### YOUR RENDER COMMAND(S) HERE
    particle $DATA/files/stars-$RUSH_PADFRAME.par
    set err = $status
    
    ### CHECK FOR MISSING FILES
    egrep -i no.such.file.or.directory $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) then
        rush -notes ${RUSH_FRAME}:'Missing file'
        exit 1	# FAIL
    endif
    
    ### CHECK FOR CORE DUMPS
    egrep -i core.dumped $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) then
       rush -notes ${RUSH_FRAME}:'Core dumped'
       exit 1	# FAIL
    endif
    
    ### CHECK FOR LICENSE ERRORS
    egrep -i no.available.licenses $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) then
        rush -notes ${RUSH_FRAME}:'License error'
        sleep 10
        exit 2	# RETRY
    endif
    
    ### NON-SPECIFIC ERRORS
    if ( $err ) then
        rush -notes ${RUSH_FRAME}:'?'
        exit 1
    endif
    exit 0
    
    % rush -lf
    STAT FRAME TRY HOSTNAME       PID     START          ELAPSED  NOTES
    Fail 0030  2   vaio           20338   02/27,14:41:22 00:01:03 Missing file
    Fail 0031  2   vaio           20339   02/27,14:41:22 00:01:03 Missing file
    Fail 0032  2   vaio           20340   02/27,14:41:22 00:01:03 Missing file
    Que  0033  9   vaio           20365   02/27,14:55:25 00:00:45 License error  
    Done 0034  9   vaio           20367   02/27,14:41:25 00:01:04 -
    Done 0035  8   vaio           20369   02/27,14:41:25 00:01:04 -
    Done 0036  8   tahoe          20389   02/27,14:41:29 00:01:03 -
    Done 0037  8   tahoe          20394   02/27,14:41:29 00:01:03 -
    Done 0038  8   tahoe          20396   02/27,14:41:29 00:01:03 -
    Done 0039  8   superior       20413   02/27,14:41:32 00:01:03 -
    Done 0040  8   superior       20423   02/27,14:41:32 00:01:03 -
    Fail 0041  8   erie           20425   02/27,14:41:32 00:00:08 Core dumped
    Done 0042  8   rotwang.erco.c 12662   02/27,14:41:32 00:01:06 -
    Done 0043  8   rotwang.erco.c 12663   02/27,14:41:32 00:01:06 -
    Fail 0044  8   rotwang.erco.c 12664   02/27,14:55:35 00:00:55 Missing file
    Fail 0045  8   ontario        20434   02/27,14:55:35 00:00:55 Missing file
    Fail 0046  8   ontario        20441   02/27,14:55:35 00:00:55 Missing file
                

When one of the above failed frames is requeued, the NOTES field is cleared as soon as the frame starts rendering again, preventing stale error messages from remaining when the frame re-renders.

To disable this 'auto-clearing' behavior, use the submit command 'FrameFlags keepnotes'.

   Handling Renderer License Errors  
License errors create a situation where you may either want to make several attempts before giving up, or pause the entire job so that no new frames will be issued for a while.

To pause the job for a short period, use the new 'rush -licpause' option in your render script; it will pause the job for 60 seconds (unless changed with the submit command LicPauseSecs):

    Handling License Errors
    #!/bin/csh -f
    source $RUSH_DIR/etc/.render
    
    ###############################
    #  R E N D E R   S C R I P T  #
    ###############################
    
    echo "--- Working on frame $RUSH_FRAME - `date`" 
    
    # INVOKE RENDERER
    hscript $hipfile < foo.hscript
    
    # CHECK FOR RENDER LICENSE ERRORS
    egrep 'Error acquiring license' $RUSH_LOGFILE > /dev/null    
    if ( $status == 0 ) then
    
        # PAUSE JOB FOR SHORT TIME, REQUEUE FRAME
        rush -licpause
        rush -notes ${RUSH_FRAME}:'License Error'
        exit 2
    
    endif
    
    exit 0
            

Another way to prevent license problems: if you have fewer Renderman licenses than you have processors, make a +hostgroup (e.g. +prman) that only has as many machines as you have licenses of Renderman, and have all the Renderman jobs submit using that hostgroup. You can set up your Renderman submit script to only allow users to submit to that hostgroup to prevent problems.

   Threaded Rendering  
There are several ways to handle threading in Rush; which you choose depends on your rendering needs. Any one of the following techniques can be used:

    1) Submit jobs with the 'Ram:' value set to secure memory and processors. (Use this if you plan to have a mix of threaded and non-threaded rendering)

    2) Set processors in the rush/etc/hosts file to 1. (Use this if ALL your renders will be threaded)

    3) Use 'rush -reserve' to reserve some of the processors on the machines you need, so you can thread your renders on these machines to use several processors.

1) Securing Ram To Secure Processors

If you have a farm of dual proc machines that all have a gig of ram configured in rush (eg. 'rush -lah' shows 1024 in the Ram column), and you submit a job with the 'ram' value set to '1024', then you will effectively secure both processors from rush. This is because when rush starts your render on a machine, it will subtract the ram value your job requests from the configured ram value for that machine, leaving zero available for other jobs to use.

Also, you will only be able to start rendering on machines that have 1024 available, which means both processors must be unused by rush, otherwise rush will think less than 1024 is available, preventing your job from running on those machines.

If you want to allow other renders to still be able to use the other processors, then submit with your Ram value set just a little bit lower, eg. 1023. You can then submit other renders to these machines using a Ram value of 1, and they'll be able to get on because of the 1MB your job leaves behind; 1024-1023=1MB available.
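
The ram arithmetic above can be checked with a toy model. This is only an illustration of the subtraction described in the text, not rush's actual scheduling code (which also considers processors, priorities and so on); the function name can_start is made up.

```python
def can_start(configured_ram, running_jobs_ram, job_ram):
    """True if job_ram fits in the ram left over by the renders already running."""
    available = configured_ram - sum(running_jobs_ram)
    return job_ram <= available

print(can_start(1024, [], 1024))      # idle machine, job asks for all of the ram
print(can_start(1024, [1024], 1))     # the 1024 job left nothing for others
print(can_start(1024, [1023], 1))     # the 1023 job left 1MB for a ram 1 job
print(can_start(1024, [1], 1024))     # machine already busy: 1024 job can't start
```

The four cases correspond to the situations described above: a ram value of 1024 secures the whole machine, while 1023 leaves room for one more render that asks for 1.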

2) Disabling Processors in the Rush Hosts List

You can tell Rush each machine only has one processor instead of two. Just change the number of cpus in the rush hosts file to 1 for the dual proc machines. Then rush will only assign one render per machine.

Such a change is only recommended if you want to affect all jobs the same way, e.g. if all rendering you run through Rush is multithreaded, and you would never want more than one frame rendering on the dual proc machines you have modified.

3) Reserving Processors

This technique is pretty intuitive; simply use 'rush -reserve' to reserve processors on the machines you want to use, and then submit your job to use those machines.

Set up your render script to first check how many cpus are reserved by your reserve job on the local machine before starting the renderer. If no cpus got reserved (they're busy doing someone else's job), just render with one thread. But if your reserve job has reserved the other cpu, tell the renderer to use two threads.

   Server Rotation  
On large networks, it's useful to distribute the task of job serving to a pool of different machines.

By default, rush submits jobs to the local host the submit script was invoked from. But sometimes workstations are unreliable for job serving, since they are often rebooted.

Rather than have jobs submit to a single job server (aka. Submit Host), you can have them rotate through a pool of hostnames.

The example Rush submit scripts can be modified to automatically choose one out of a list of hostnames by finding the lines that set up the 'rush -submit' command, eg:

        my $cmd;
        if ( $G::iswindows )
        {
            $cmd = "start /min /wait cmd /x /c " .
                    "\"rush -submit $in{SubmitHost} < $submitfile 2> $err > $out\"";
        }
        else
        {
            $cmd = $ENV{RUSH_DIR} . "/bin/rush -submit $in{SubmitHost} ".
                   "< $submitfile 2> $err > $out";
        }
        system($cmd);
    
..and put the following lines /above/ those lines:

    Rush Example Submit Scripts: Auto-select Submit Host
    # NO SUBMITHOST SPECIFIED? CHOOSE ONE AT RANDOM
    if ( $in{SubmitHost} eq "" || $in{SubmitHost} eq "-" )
    {
        my @servers = ( "host1", "host2", "host3" );	# pool of job servers
        $in{SubmitHost} = $servers[ time() % ($#servers+1) ];
    }
    

..change the hostnames in the @servers list (host1, host2, host3) to the hostnames you want to use. You can use any number of hostnames; the script will select one at 'random'.

The above change will cause the script to select a different server at random if the "Submit Host:" field was left blank; this lets the user override the random selection IF they specify a hostname at the "Submit Host" prompt.

You may want to change the comments for the 'Submit Host:' field if you make such a change, indicating if the field is left blank, one of the servers will be chosen at random, instead of the local machine.

* * *

If you have your own custom submit scripts, eg. one written in Csh, you can do something like this:

    Homebrew Submit Scripts: Auto-select Jobserver from a Pool
    #!/bin/csh -f
    set servers = ( host1 host2 host3 )
    set index   = `perl -e "print(time() % $#servers + 1);"`	# random select a server
    set server  = $servers[$index]
    rush -submit $server << EOF
    title MYJOB
    :
    : etc
    :
    EOF
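
For submit scripts written in Python, the same time-based selection can be sketched as below. The hostnames are placeholders as in the examples above, and the function name pick_server is made up for illustration.

```python
import time

def pick_server(servers, now=None):
    """Pick a job server from the pool, based on the current time in seconds."""
    if now is None:
        now = int(time.time())
    return servers[now % len(servers)]

servers = ["host1", "host2", "host3"]       # pool of job servers
print(pick_server(servers))
```

The chosen hostname would then be passed as the argument after 'rush -submit', as in the csh example. Python's random.choice(servers) would work equally well; the modulo form is used here only to match the Perl and csh versions.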
    

   Workstation Rendering  
It's often useful to allow users to render on their own workstations with Rush.

But users will be reluctant to online their machines during the workday if they know other people's jobs might try to use their workstation.

There are at least two approaches you can take to set things up so that only users can use their own workstations for rendering:

    1. Use the MINPRI ('minimum priority') value in the rush/etc/hosts file to set a minimum priority for rendering on the workstations.

    2. Make separate +farm and +work hostgroups; only let users set the Cpus: of their job to +farm and their own workstation's hostname

Regarding approach #1: the systems administrator can configure the MINPRI column in the rush/etc/hosts file for workstations, so that the machine will only accept renders for jobs that request that machine at a high priority (ie. equal to or higher than the MINPRI value).

The systems administrator can set the MINPRI column in the rush/etc/hosts file for users' workstations to eg. 900 to only allow jobs of priority 900 and up to render on them. This way users can submit jobs with their cpus set to eg:

    +any=10@100
    tahoe=1@900

..where 'tahoe' is the name of their workstation. This will cause rush to use 'any 10 available cpus on the network at a priority of 100' (+any=10@100) which will exclude all workstations that have MINPRI set to 900 (because 100 is lower than 900), and will also ask rush to use one cpu on their own workstation at 900 priority.
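
The effect of MINPRI on the two cpus lines above can be illustrated with a toy model. This is not how rush is implemented; it only applies the rule stated above (a machine accepts a request whose priority is equal to or higher than its MINPRI). The hostnames other than 'tahoe', the function names, and the use of 0 for 'no minimum' are assumptions made for the sketch.

```python
def parse_cpus(spec):
    """Split a cpus line such as 'tahoe=1@900' into (target, count, priority)."""
    target, rest = spec.split("=", 1)
    count, priority = rest.split("@", 1)
    return target, int(count), int(priority)

def eligible_hosts(spec, minpri_by_host):
    """Return the hosts whose MINPRI allows the given cpus request."""
    target, count, priority = parse_cpus(spec)
    if target == "+any":
        candidates = sorted(minpri_by_host)         # every known host
    else:
        candidates = [target] if target in minpri_by_host else []
    return [h for h in candidates if priority >= minpri_by_host[h]]

# Two workstations with MINPRI 900, two farm machines with no minimum
hosts = {"tahoe": 900, "erie": 900, "farm1": 0, "farm2": 0}
print(eligible_hosts("+any=10@100", hosts))   # farm machines only
print(eligible_hosts("tahoe=1@900", hosts))   # the user's own workstation
```

Only '+any' and plain hostnames are modeled here; other hostgroups are left out to keep the sketch short.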

Make a rendering policy for the TDs so that 900 priority is only used for rendering on 'your own workstations'. This way users don't have to worry about other people's jobs rendering on their workstation when it's idle.

If they decide to take their machine out of rendering so they can use it for doing jitter-free client playbacks or dailies, they can just hit 'Getoff' in the 'onrush' program to offline their machine from rendering their own job, then they can online it later.

Regarding approach #2: if you make a separate +farm group, and force all users to submit only to +farm, and separately ask for their own machines only, that will work as well, eg. setting their Cpus: to:

    +farm=10@100
    tahoe=1

..which asks for any 10 available farm machines, and one proc on their own workstation (in this case "tahoe").