RUSH RENDER QUEUE
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.40g 08/08/03

Strikeout text indicates features not yet implemented

TD Frequently Asked Questions

How can I use padded frame numbers (0000) in my render script?
My renders are coming up 'FAIL'. How do I figure out what's wrong?
How do I have rush automatically retry frames? How do I set the number of retries?
My job isn't starting renders on my cpus. What's going on?
How do I set up my submit script to only render on certain platforms or operating systems?
How can I render several frames in one process using rush?
My job has its 'k' flag set; why isn't it bumping off other jobs' frames?
Is there an easier way to set the RUSH_JOBID environment variable?
What does 'rush' stand for?
Can my render script detect being 'bumped' by higher priority jobs?
Can I chain separate jobs together, so that one waits for the other to get done?
Is it possible to use negative frame numbers in rush?
Is there a way to see just the cpus busy running my job?
Is there a way to see what jobs a machine is busy rendering?
Is there a way to requeue a busy frame for a host that is down?
How do I list all the machines in a hostgroup?
Why is rush creating files owned by 'ntrush'?
Why are there weird file permissions on my rendered images?
Is there a way to start a job at a particular time?
Under Windows, why are UNC paths better than drive letters?
How do I make UNC paths work the same on Windows and Unix?

How can I use padded frame numbers (0000) in my render script?

Use $RUSH_PADFRAME; rush creates it for you automatically, with 4 digit padding.
To do your own custom frame number padding, use this unix technique:

    set padframe = `perl -e 'printf("%04d",$ENV{RUSH_FRAME});'`

To use different padding widths, just change the '4' (in '%04d') to a different number.
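If your render script is not written in csh, the same padding can be done in the script's own language. Here is a minimal sketch in Python, assuming only that rush has set RUSH_FRAME in the environment as described above. The function name pad_frame is made up for this example.

```python
import os

def pad_frame(frame, width=4):
    """Return the frame number as a zero-padded string, e.g. 7 -> '0007'."""
    # '%0*d' takes the padding width from the argument list.
    return "%0*d" % (width, int(frame))

# RUSH_FRAME is set by rush for each frame; fall back to 1 when run by hand.
print(pad_frame(os.environ.get("RUSH_FRAME", "1")))
```

As with the perl version, a different width is just a different number: pad_frame(frame, 6) gives six digits.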

My renders are coming up 'FAIL'. How do I figure out what's wrong?

The most common problem is a render script that does not return exit codes properly. Make sure your render script returns the appropriate exit code: 0=OK, 1=FAIL, 2=RETRY.
Also, check the frame logs being generated by your render script. Frame logs contain the error messages for each rendered frame which should help you determine the problem. Make sure your submit script has LogDir pointing to a valid directory, which is where your frame logs will be found.
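The following is a sketch of a render script helper that always hands back one of the three exit codes listed above. It is written in Python for illustration only. The function name, the render command, and the decision to RETRY when the renderer cannot be started are all assumptions of this example, not rush requirements.

```python
import subprocess
import sys

# Exit codes rush expects from a render script: 0=OK, 1=FAIL, 2=RETRY.
EXIT_OK, EXIT_FAIL, EXIT_RETRY = 0, 1, 2

def run_render(cmd):
    """Run the render command (a list of arguments) and map the result to a rush exit code."""
    if not cmd:
        sys.stderr.write("no render command given\n")
        return EXIT_FAIL
    try:
        status = subprocess.call(cmd)
    except OSError as err:
        # Assumption: a renderer that cannot even be started might be a
        # temporary problem, so ask for a retry instead of failing.
        sys.stderr.write("cannot start %r: %s\n" % (cmd, err))
        return EXIT_RETRY
    return EXIT_OK if status == 0 else EXIT_FAIL

# A render script would end with something like this (hypothetical command):
#     sys.exit(run_render(["myrenderer", "scene.file"]))
```

Anything written to stderr ends up in the frame log, so the messages above are available when a frame comes up 'FAIL'.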

How do I have rush automatically retry frames? How do I set the number of retries?

See Retrying Frames.

My job isn't starting renders on my cpus. What's going on?

If you know the remote cpus aren't just busy with other jobs, list your cpus with 'rush -lc' and check the 'NOTES' column to see if the system is giving you reasons why your cpus are being rejected.
The job might be in Pause, there might be no more frames to render, none of the available machines might have as much ram as your job needs, etc. Here are some typical situations:

[erco@howland]% rush -lc
CPUSPEC[HOST]        STATE       FRM  PID     JOBTID  ELAPSED  NOTES
placid=3@100k        Idle        -    -       1       00:04:37 Job state is 'Pause'
tahoe=1@1            Idle        -    -       2       00:02:08 No more frames
superior=1@1         Idle        -    -       3       00:02:08 Not enough ram
waccubuc=1@1         Idle        -    -       4       00:02:08 This is a 'neverhost'
ontario=1@1          Idle        -    -       5       00:02:08 Failed 'criteria' check
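When a job has many cpus, it can help to tally the NOTES column instead of reading it line by line. This is a Python sketch that assumes the column layout shown in the report above (six whitespace-separated columns followed by free-form notes). The function name idle_reasons is made up for this example, and the sample text is copied from the report above.

```python
from collections import Counter

def idle_reasons(report):
    """Count the NOTES messages of Idle cpus in 'rush -lc' style output."""
    reasons = Counter()
    for line in report.splitlines():
        # Split into at most 7 fields so the notes text stays in one piece.
        fields = line.split(None, 6)
        # Skip the header, the prompt line, and cpus that show no note.
        if len(fields) < 7 or fields[0].startswith("CPUSPEC"):
            continue
        if fields[1] == "Idle":
            reasons[fields[6].strip()] += 1
    return reasons

SAMPLE = """\
CPUSPEC[HOST]        STATE       FRM  PID     JOBTID  ELAPSED  NOTES
placid=3@100k        Idle        -    -       1       00:04:37 Job state is 'Pause'
tahoe=1@1            Idle        -    -       2       00:02:08 No more frames
superior=1@1         Idle        -    -       3       00:02:08 Not enough ram
"""

for note, count in idle_reasons(SAMPLE).most_common():
    print(count, note)
```

In practice the report text would come from running 'rush -lc' rather than from a sample string.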

How do I set up my submit script to only render on certain platforms or operating systems?

Use the Criteria submit script command.
This command allows you to build a list of platforms, operating systems, or other general criteria to limit which machines will run your renders.
You can see the different criteria names in the output of 'rush -lah'. It is up to your sysadmin to maintain the criteria names.

How can I render several frames in one process using rush?

With clever scripting. See Batching Multiple Frames for how to render several frames at a time.
Sometimes it pays to render several frames at a time rather than one at a time, to decrease the amount of time the renderer spends loading files.
If you have existing script filters which monitor the progress of renders to determine which frames are rendering, you can probably easily modify these scripts to work with rush to reflect changes in the frame list, using either frame notes (rush -notes) or frame state change operations (rush -que/rush -done).

My job has its 'k' flag set; why isn't it bumping off other jobs' frames?

For a job to bump another off a cpu, these things must be true:

A job can only bump other jobs of lower priority (i.e., not the same priority).

A job can't be bumped if its almighty flag ('a') is set.

A job can't be bumped unless its entry in the -tasklist is either in the Avail or Run state.

When a frame is bumped, the bumped frame will show a message in its frame list indicating the job that bumped it, e.g.:
    % rush -lf erie-790
    STAT FRAME TRY HOSTNAME PID   ELAPSED  NOTES
    Run  0100  0   tahoe    10290 00:00:26
    Run  0101  0   tahoe    10291 00:00:26
    Que  0102  1   tahoe    10292 00:00:09 Bumped by ralph's superior-791,KILLER @300ka
    Que  0103  0   -        0     00:00:00
    [..]

Is there an easier way to set the RUSH_JOBID environment variable?

You can use eval `submit` to automatically set it, or a simple alias to set it manually. However, cutting and pasting the 'setenv' command is not so hard.
Some people like to use this alias to make it easy to set new jobid variables:

# Put this in your .cshrc

alias jid 'setenv RUSH_JOBID "\!*"'

Then you can use it on the command line to set one or more jobids:

erco@tahoe % jid tahoe.932 tahoe.933

If you want to have the RUSH_JOBID variable set automatically in your shell whenever you invoke your submit script, then use 'eval':

erco@tahoe % eval `my_submit_script`

...the shell automatically parses the 'setenv RUSH_JOBID' command rush prints on stdout when a job is successfully submitted. Error messages are not affected by 'eval', so you don't have to worry about losing error messages when using this technique.

What does 'rush' stand for?

Rush is not an acronym; it was named after film 'rushes', and for the fact that effects people are usually in a rush to get things done.

Can my render script detect being 'bumped' by higher priority jobs?

Usually the desire to do this stems from wanting to clean up leftover temporary files generated by renders. In most cases, you can avoid leftover files by putting temporary files in $RUSH_TMPDIR, which rush cleans automatically, even after bumps.

Bumps and dumps use SIGKILL to kill the render script and its children. This signal is NOT trappable. There's a reason:

Under many circumstances SIGTERM, the 'trappable' kill, is not effective, especially during heavy rendering. Bumped frames then fail to bump, screwing up unattended use and leaving processors unproductive.
Since bumps can happen just as readily as dumps, both use SIGKILL, which is untrappable and always effective (except in pathological cases where the process is hung).

So do not expect to be able to trap interrupts to detect bumps/dumps.

If you need a way to determine if you are re-rendering a frame that was previously killed mid-execution (i.e., bumped by a higher priority job), you can put some logic into your render script:

    #!/bin/csh -f
    ..
    if ( -e /somewhere/$RUSH_FRAME.busy ) then
	echo We are picking up a frame that was killed.
	echo Do pickup stuff here..
    endif

    # Create a 'busy' file for this frame
    #    If we are bumped, busy file is left behind 
    #    so that the above logic can detect it.
    #
    touch /somewhere/$RUSH_FRAME.busy
    echo Do rendering here..
    rm -f /somewhere/$RUSH_FRAME.busy

Can I chain separate jobs together, so that one waits for the other to get done?

Yes, see the submit script command WaitFor to have a job wait for others to dump before starting.
Also, see DependOn to have a job wait for frames in another job to get done, i.e., rather than wait for the entire job to complete.

Is it possible to use negative frame numbers in rush?

No. You are evil.
If you are trying to include 'handles' and 'slates' by using negative numbers, don't.

Is there a way to see just the cpus busy running my job?

Yes. In Unix:

   rush -lc | grep Busy
   rush -lf | grep Run

...and on WinNT, if you don't have grep(1):

   rush -lc | findstr Busy
   rush -lf | findstr Run

Is there a way to see what jobs a machine is busy rendering?

In Unix:
   rush -tasklist host | grep Busy
   
..and on WinNT, if you don't have grep(1):
   rush -tasklist host | findstr Busy
   

Is there a way to requeue a busy frame for a host that is down?

If a machine goes down while rendering a frame, the frame stays in the Busy state until the machine is rebooted. Once rush realizes the remote machine rebooted, it requeues the frame.
But if the machine never reboots, the frame will stay in the Busy state indefinitely, unless you take the following action.
Assuming you're *sure* the machine is down, and not just 'slow', use the following command:
    % rush -down hosta hostb
    
...where 'hosta' is the name of the machine that is down, and 'hostb' is the name of the machine that's the server for the job(s) with the hung frame(s).
Beware; if the remote machine is not really down, and is still running the frame, doing the above will start the frame running on another machine, and the two frames will overwrite each other.

How do I list all the machines in a hostgroup?

Just grep the output of 'rush -lah', or parse the contents of the $RUSH_DIR/etc/hosts file.
For instance, to print all the hosts in the "+foo" hostgroup:
    rush -lah | grep +foo
    
...or to precisely parse them from the hosts file with awk (which you should be able to cut and paste into a Unix tcsh shell):

    awk 'BEGIN { s="+foo"; } 					\
	 { if (match($0,"^#")) next; n = split($5,arr,",");	\
	   for (i=1; i<=n; i++) { 				\
	       if(arr[i]==s) { print $1; break; } 		\
	   }							\
	 }' < /usr/local/rush/etc/hosts
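The same parsing can be sketched in Python for scripts that are not written in a Unix shell. Like the awk version, it assumes that field 1 is the hostname and that field 5 is a comma-separated list containing the hostgroup names. The function name and the sample lines are made up for this example; the 'x' fields are placeholders, not real hosts file values.

```python
def hosts_in_group(lines, group):
    """Return hostnames whose 5th field lists exactly the given hostgroup."""
    hosts = []
    for line in lines:
        if line.startswith("#"):        # skip comment lines
            continue
        fields = line.split()
        # Compare whole names, so "+foo" does not match "+foobar".
        if len(fields) >= 5 and group in fields[4].split(","):
            hosts.append(fields[0])
    return hosts

# Made-up sample in place of the real hosts file.
SAMPLE = """\
# host   (fields 2-4 are placeholders)   groups
tahoe    x x x +any,+foo
superior x x x +any,+foobar
erie     x x x +foo,+bar
"""

print(hosts_in_group(SAMPLE.splitlines(), "+foo"))
```

To run it against the real file, pass open("/usr/local/rush/etc/hosts") as the first argument instead of the sample lines.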

Why is rush creating files owned by 'ntrush'?

Why are there weird file permissions on my rendered images?

Check your umask; it determines the permissions of the files your renders create.

If the umask is incorrect, make sure your render script is sourcing the $RUSH_DIR/etc/.render file, and make sure the sysadmin configured this file correctly to include the umask appropriate for render jobs. This way everyone will have the correct umask by default, since all render scripts should be sourcing that file.

You can also explicitly specify a umask command in your render script. Just include the command somewhere before your first render commands.

Is there a way to start a job at a particular time?

See WaitFor and 'rush -waitfor'. Examples of time specifications:

    waitfor +8h     -- wait 8 hours from now
    waitfor 7:30pm  -- wait until 7:30pm

Under Windows, why are UNC paths better than drive letters?

UNC paths always work, even on a freshly booted machine, and are not affected by users logging in or out.
There's no setup needed, other than to 'share the drive'.
Drive letters don't always work. They are connections associated with the logged in user, so if no one is logged in, or the user logs out, drive letters 'disappear', causing background renders to suddenly fail.
Drive letters are not portable to other platforms.

How do I make UNC paths work the same on Windows and Unix?

Often, it's just a matter of putting symbolic link(s) in the right place on the unix machines to make UNC paths work under unix.

For instance, on my network, before I created the symlinks, I needed a different path on each platform to access a particular file:

    //tahoe/net/tmp/foo       - UNC path; works on windows
    /mnt/net/tmp/foo          - Unix path; typical mount directory name

Creating this symlink on the unix machines:

    mkdir /tahoe
    ln -s /mnt/net /tahoe/net

..makes the unix path //tahoe/net/tmp/foo resolve to the actual mount directory /mnt/net/tmp/foo, so the same path works on both platforms.

You can find out the UNC pathnames for your drive maps by typing at a DOS prompt:

NET USE

For example:

    C:\>net use
    New connections will be remembered.
    Status       Local     Remote                    Network
    -----------------------------------------------------------------------------
    Connected    Z:        \\tahoe\net               Microsoft Windows Network
                 -----     ---------------
                 Drive     UNC pathname

When referring to the top level of a UNC path, include the trailing slash:

        //tahoe/net         -- BAD: might not present a directory listing
        //tahoe/net/        -- GOOD: more likely will present a listing

Only Microsoft's own tools have problems with frontslashes, namely:

Explorer
Desktop folder browser
Many DOS commands such as COPY and DIR, which use front slashes as argument flags.

In all other cases, frontslashes work fine, esp. in third party software, like CSH, Perl, and all programs written in C or C++. C/C++ programs would have to go out of their way to misinterpret front slashes.
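Since front slashes work in most programs, a script can convert the backslash form printed by NET USE into the front slash form used in the rest of this section. This is a Python sketch; the function name is made up for this example.

```python
def unc_to_frontslash(path):
    r"""Convert backslashes to front slashes: \\tahoe\net -> //tahoe/net"""
    return path.replace("\\", "/")

# The remote name from the NET USE listing above, plus a subdirectory.
print(unc_to_frontslash(r"\\tahoe\net\tmp\foo"))
```

The result should not be handed to Explorer or to DOS commands such as COPY and DIR, for the reasons given above.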