RUSH RENDER QUEUE: FILES
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.31m 03/12/02
Strikeout text indicates features not yet implemented


Configuration File
$RUSH_DIR/etc/rush.conf


rush.conf
  Commands  
AdminUser
AllJobsFmt
AllJobsHeader1
AllJobsHeader2
AllowPush
AppHostCache
ClientPort
CpuAcctPath
DaemonHostCache
DisableFu
DisablePflags
ExpandCpus
ForceGid
ForceUid
GidRange
HourlyConsole
InMaxMsgs
JobFmt
JobHeader1
JobHeader2
JobPassTimeout
JobUpdateThrottle
LogFlags
LogRotateHour
MaxNewTaskMsgs
NtRushGid
NtRushUid
ServerPort
SmtpDebug
SmtpFrom
SmtpPort
SmtpServer
TaskCleanupHours
TaskKeepaliveSecs
TcpSockOpts
TmpDir
UdpMaxRetries
UdpRestTimeOut
UdpTimeout
UidRange
The configuration file should be customized by the systems administrator. Most settings are used only for fine tuning, but some control important security settings (uidrange/gidrange/forceuid/forcegid), and process auditing/logging (cpuacctpath).

The rush.conf file can be updated on the fly. Simply edit a copy, make changes, then rdist(1) the copy to all the machines, and the daemons will pick up your changes within one minute.

(New in 102.31) You can prefix any rush.conf command with 'os=windows' or 'os=unix' to restrict that command to machines running the respective operating system. This way you can maintain one file for both Windows and Unix machines, e.g.:

	os=windows tmpdir C:/TEMP         # WinNT
	os=unix    tmpdir /var/tmp        # Unix
	
..in this case 'tmpdir C:/TEMP' is set on all the windows machines, and 'tmpdir /var/tmp' is set on all the unix machines.
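
The effect of the 'os=' qualifier can be sketched in Python as follows. This is only an illustration, not rush's own parser: the function name select_for_os is hypothetical, and the sketch assumes that '#' starts a comment (as in the example above) and does not handle double-quoted strings such as AllJobsFmt.

```python
def select_for_os(lines, this_os):
    """Return (command, args) pairs that apply to this_os ('windows' or 'unix')."""
    selected = []
    for raw in lines:
        # Assumption: '#' starts a comment, as in the examples above
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        words = text.split()
        if words[0].lower().startswith("os="):
            if words[0][3:].lower() != this_os:
                continue              # qualified for the other operating system
            words = words[1:]         # drop the qualifier, keep the command
        if words:
            selected.append((words[0].lower(), words[1:]))
    return selected

conf = [
    "os=windows tmpdir C:/TEMP         # WinNT",
    "os=unix    tmpdir /var/tmp        # Unix",
    "serverport 696",
]
unix_settings = select_for_os(conf, "unix")
windows_settings = select_for_os(conf, "windows")
print(unix_settings)
print(windows_settings)
```

Unqualified commands (here 'serverport') apply to both lists; qualified commands appear only in the list for the matching operating system.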

To make changes to this file and update this to the network, use these commands.

AdminUser
Sets login name for user allowed to administer the rush daemons. The adminuser can always manipulate jobs, regardless of the value of DisableFu. Also, commands such as 'rush -dexit', 'rush -dlog a' and others are limited to root and 'adminuser'.

Set to 'root' if there is no special rush administrative login.

    Example: adminuser root

AllJobsFmt
(New in 102.31d)
Controls the 'rush -laj' report format, using a printf(3) style format string.

You may change the column width values, and can add or remove truncation specifications. eg. %s can be changed to %12s, %12.12s, %-12.12s, etc.

You may not add or remove '%s' fields, or change the data types (e.g. do not change '%s' to '%d', etc). It is recommended that you do not use control sequences such as '\n', '\r', or ANSI sequences such as '\033[1m', as these will negatively affect irush(1)'s presentations.

This string must be enclosed in double quotes. To represent double quotes in the string, you can use \".

Warnings and Caveats:

  • Making typos or improper modifications to this string can cause the rushd(1) daemon and/or rush(1) to memory fault. Change this string as carefully as you would change a printf() statement in a C program. No type checking is done to your modifications, so make changes to this string carefully, and double check your work.

  • AllJobsFmt, AllJobsHeader1 and AllJobsHeader2 all must agree on the width of columns, or irush(1) will not parse columns correctly. Make sure when you modify column widths that the headings correctly align with your format string.

  • Heading names must not contain spaces. Irush uses the first character of each word in the AllJobsHeader1 field to determine column boundaries for the purposes of sorting.

  • Changing the format string's left-hand versus right-hand justification of any field from the defaults *may* affect irush's ability to parse fields correctly from columns. Make sure irush(1) and other such tools that parse the report all still work correctly after you make modifications. Report white space parsing problems if you encounter them.
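
Since no type checking is done on the format string, it can help to check an edited string before distributing it. The following Python sketch compares the number and types of fields in two printf-style strings. The format strings shown are hypothetical stand-ins, not rush's actual defaults, and the helper names are illustrative only.

```python
import re

# Matches one printf-style conversion: flags, width, precision, then a type letter.
# '%%' (a literal percent sign) is matched too and filtered out below.
CONVERSION = re.compile(r"%([-+ #0]*)(\d+)?(?:\.(\d+))?([a-zA-Z%])")

def conversion_types(fmt):
    """Return the list of conversion type letters in fmt, ignoring '%%'."""
    return [m.group(4) for m in CONVERSION.finditer(fmt) if m.group(4) != "%"]

def same_fields(default_fmt, new_fmt):
    """True if new_fmt has the same number and types of fields as default_fmt."""
    return conversion_types(default_fmt) == conversion_types(new_fmt)

# Hypothetical format strings, NOT rush's actual defaults
default_fmt = "%-10s %-8s %s"
ok_edit = "%-12.12s %8s %s"      # only widths and truncation changed
bad_type = "%-12.12s %d %s"      # a '%s' was changed to '%d'
bad_count = "%-12.12s %s"        # a '%s' field was removed

print(same_fields(default_fmt, ok_edit))
print(same_fields(default_fmt, bad_type))
print(same_fields(default_fmt, bad_count))
```

A check like this only catches added, removed, or retyped fields; it does not verify that the headers still line up with the columns.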

AllJobsHeader1
(New in 102.31d)
Sets the header string for line #1 of 'rush -laj' reports.

Basically this should be a string that correctly identifies the column headings for the 'rush -laj' report.

This setting is printed literally to the screen; no printf() style formatting expansions will be made, ie. \n, \r, \t, %s, will all be printed literally.

See the caveats and warnings for the AllJobsFmt command for other relevant warnings.

Each column heading should be a single word (ie. column heading names should NOT contain spaces), as this will affect the ability of irush to detect the report columns for sorting.

This string must be enclosed in double quotes. To represent double quotes in the string, you must use \".

This heading can be disabled from the reports by setting this field to "-", which means this header line WILL NOT be printed. However, doing so may negatively impact programs such as irush(1), www-rush(1), and other tools that parse the reports, expecting to skip over headers.

AllJobsHeader2
(New in 102.31d)
Sets the header string for line #2 of 'rush -laj' reports.

This is normally the 'dotted line' header that delineates columns for easy viewing. Try to follow the example of the default setting.

This setting is printed literally to the screen; no printf() style formatting expansions will be made, ie. \n, \r, \t, %s, will all be printed literally.

See the caveats and warnings for the AllJobsFmt command for other relevant warnings.

This string must be enclosed in double quotes. To represent double quotes in the string, you must use \".

This heading can be disabled from the reports by setting this field to "-", which means this header line WILL NOT be printed. However, doing so may negatively impact programs such as irush(1), www-rush(1), and other tools that parse the reports, expecting to skip over headers.

AllowPush
Allows the sysadmin to enable (or disable) the 'rush -push' feature. If enabled, the admin can release files like the rush.conf, license.dat, and rush 'hosts' files to an entire network easily and quickly.

    Example: allowpush yes

AppHostCache
rush(1)'s host caching option; affects only how the rush(1) client application caches host information. Can be 'none' or 'demand'.

    Example: apphostcache demand

ClientPort
Obsolete. Remove from all rush.conf files.

CpuAcctPath
Path to cpu accounting file.

Set to '-' to disable generation of cpu accounting data.

Examples:

    os=windows cpuacctpath c:/rush/var/cpu.acct          # WinNT
    os=unix    cpuacctpath /usr/local/rush/var/cpu.acct  # Unix
      

DaemonHostCache
rushd(8)'s hostname caching options. Only affects the way the daemon caches information. 

Options can be none, demand or boot:

  • none  - No caching. Use local OS for lookups.
  • demand - Cache on demand, or whenever new hostlist is reloaded.
  • boot - Cache on boot, or whenever hostlist reloaded.

Example: daemonhostcache boot

DisableFu
Allows administrator to control whether users can use 'rush -fu' and $RUSH_FU to control other people's jobs.

disablefu 1 prevents users from controlling each other's jobs by disabling 'rush -fu' and $RUSH_FU; only root and adminuser can control any jobs.

disablefu 0 allows users to control each other's jobs, as well as root and adminuser.

Normally, users should be able to control each other's jobs, allowing local policies, peer pressure (and auditing daemon logs) to prevent pandemonium.

Example: disablefu 0

DisablePflags
Allows administrator to control whether users can use 'k' or 'a' priorities in their cpu specifications.
	disablepflags -      # allows users to use any priority flags (default).

	disablepflags ka     # disables users from being able to use either the 'k' (kill) 
	                     # or 'a' (almighty) priority flags.

	disablepflags k      # disables users from being able to use the 'k' (kill) 
	                     # priority flag.
	
Examples:
    disablepflags -
    disablepflags ka

ExpandCpus
Allows administrator to control whether hostgroups are expanded to include one or all processors on each host.

  • expandcpus 1 causes hostgroups to expand to all processors on each host.
  • expandcpus 0 causes hostgroups to expand to only one processor for each host.

Example: expandcpus 1

ForceGid

Forces all user processes to run as this GID. Default is -1, allowing user processes to run as the GID of the user who submitted the job.

Examples:

    forcegid -1     # Disabled
    forcegid 100    # Force the gid to 100 for all processes
      

ForceUid

Forces all user processes to run as this UID. Default is -1, allowing user processes to run as the UID of the user who submitted the job.

Examples:

    forceuid -1     # Disabled
    forceuid 100    # Force the UID to 100 for all processes
      

GidRange
Controls gid values the same way UidRange controls uid values.

Example: gidrange 100 65000

HourlyConsole
Controls whether hourly console messages are printed or not. Can be either 'on' or 'off'.

    Example: hourlyconsole on

InMaxMsgs
The maximum number of messages (tcp/udp) that are received and handled at a time before the daemon moves on to other operations, e.g.:
    while ( daemon loop )
    {
	// Read messages from remotes
	for ( t=0; t < inmaxmsgs; t++ )
	{
	    select(.. inbuf ..)	
	    if ( inbuf is empty ) break;
	    HandleMessage(inbuf);
	}

	// Do other stuff
	...
    }

This puts a limit on how many messages to read, so rush doesn't spend too much time reading data from remotes. Since the whole network can potentially generate more traffic than the rush daemon can possibly deal with, a limit must be enforced.

    Example: inmaxmsgs 30

JobFmt
TBD.

JobHeader1
TBD.

JobHeader2
TBD.

JobUpdateThrottle
Prevents the daemon from advertising a job's cpus faster than the 'jobupdatethrottle' value. The daemon will re-advertise cpus that haven't been acknowledged by the remotes at about this rate.

Example: jobupdatethrottle 10

JobPassTimeout
The 'jobpasstimeout' value configures how many seconds the task will remain in the JOBPASS state before re-entering an IDLE state by itself.

When a task on a remote cpu becomes IDLE, it tries to convince a job to use its cpu. If the job 'passes' on this request (no more frames to render, etc), the remote task enters a JOBPASS state, to avoid contacting the job again for a while. After the timeout period, the task re-enters an IDLE state to see if maybe the job had a FAIL frame, and has more frames to render after all.

Example: jobpasstimeout 150
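
The IDLE/JOBPASS behavior described above can be modeled with a small state machine. This Python sketch covers only the two states named in this section; the class and method names are hypothetical, and the real daemon's task handling is certainly more involved.

```python
IDLE, JOBPASS = "IDLE", "JOBPASS"

class Task:
    """Toy model of a remote task's IDLE/JOBPASS cycle (illustrative only)."""

    def __init__(self, jobpasstimeout):
        self.jobpasstimeout = jobpasstimeout   # seconds, as set in rush.conf
        self.state = IDLE
        self.passed_at = None

    def job_passed(self, now):
        # The job declined the cpu; stop asking it for a while
        self.state = JOBPASS
        self.passed_at = now

    def tick(self, now):
        # After the timeout, return to IDLE so the job gets asked again
        if self.state == JOBPASS and now - self.passed_at >= self.jobpasstimeout:
            self.state = IDLE
            self.passed_at = None
        return self.state

task = Task(jobpasstimeout=150)
task.job_passed(now=1000)
state_early = task.tick(now=1100)   # 100 seconds after the pass
state_late = task.tick(now=1150)    # 150 seconds after the pass
print(state_early, state_late)
```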

LogFlags
Configures daemon logging features; not to be confused with submit script LogFlags.

Most flags are debugging flags used to track operation of the system. Flags can be combined to enable multiple debugging features. 

LogFlags affect both the daemon AND user applications. To affect only the daemon, specify flags on daemon's command line, or use 'rush -dlog <flags>'.

See Logging Flags Table for a complete list of all the one letter log flags.

    Example: logflags jE

LogRotateHour
Sets the hour (0-23) that the logs automatically rotate. A value of -1 disables automatic log rotation.

    Example: logrotatehour 0

MaxNewTaskMsgs
Chokes the maximum number of 'newtask' advertisements the job server sends out to the remotes at a time.

Absolutely must be greater than 0.

Should be a value around 30 or so.

Too low causes new jobs to take a while to establish processors. Too high floods remote's input buffers with 'newtask' advertisements.

    Example: maxnewtaskmsgs 20

NtRushGid
The gid used if an NT submitted job is to run on Unix machines.

Since the NT version of rush doesn't know how to map the name 'ntrush' to the equivalent gid value, NtRushGid is used to resolve it. Basically, this value should be the same as the gid value for the Unix user 'ntrush'.

    Example: ntrushgid 100

NtRushUid
The uid used if an NT submitted job is to run on Unix machines.

Since the NT version of rush doesn't know how to map the name 'ntrush' to the equivalent uid value, NtRushUid is used to resolve it. Basically, this value should be the same as the uid value for the Unix user 'ntrush'.

    Example: ntrushuid 100

ServerPort
Set the rushd(1) server daemon's port numbers for UDP/TCP connections. 

Though unnecessary for proper operation of the render queue, you should register the ServerPort value in your /etc/services file, e.g.:

	rushd 696/tcp    # rush render queue
	rushd 696/udp    # rush render queue

Example: serverport 696

SmtpDebug
(New in 102.31d)
Enables debugging of the SMTP mail transfers (eg. 'donemail' deliveries) to the log file $RUSH_DIR/var/mail.log.

This aids in debugging problems with email delivery, so one can see the entire email transaction with the configured SmtpServer.

This is a Windows specific flag.

Examples:

    os=windows smtpdebug -  # Disable SMTP debugging
    os=windows smtpdebug t  # Enable SMTP tcp transactions
      

SmtpFrom
(New in 102.31d)
Sets the 'from' address for all emails sent by rush. This also affects the "Errors-To:", "Reply-To", and "Return-Path" fields of the messages, controlling where bounced email is sent if delivery fails in transit.

NOTE: If rush is unable to deliver the mail to the server, the mail is simply dropped, and an error message will appear in the $RUSH_DIR/var/mail.log.

This is a Windows specific flag.

Example:

    os=windows smtpfrom ntrush@yourdomain.com
      

SmtpPort
(New in 102.31d)
The TCP port rush uses to contact the SmtpServer to deliver mail. Unless your mail server is listening on some other port, leave this at 25.

This is a Windows specific flag.

Example:

    os=windows smtpport 25
      

SmtpServer
(New in 102.31d)
This should be set to the hostname of the SMTP server for your local network. Rush will use this server to deliver DoneMail messages.

If set to '-', mail delivery is disabled. Error messages will appear in the rushd.log whenever a user attempts to use DoneMail.

This is a Windows specific flag. Windows does not have a command line oriented mail delivery agent, so rush uses its own ($RUSH_DIR/bin/rushsendmail).

Example:

    os=windows smtpserver mail.yourdomain.com
      

TaskCleanupHours
Sets up the hour(s) of the day that rush purges orphaned tasks from the 'rush -tasklist'. This should be done once a day during early morning hours.

For example, 'taskcleanuphours 5' means cleanup occurs between 5:00am and 5:45am. Rush disperses the cleanup operation on a per-host basis over a 40 minute period to prevent network load.

    Example: taskcleanuphours 5

TaskKeepaliveSecs
Sets the number of seconds for jobs in the cpu server tasklists to check in with their jobs to make sure the jobs are still active. (A job might 'disappear' due to a shutdown).

Note that this value is a minimum; a random value of up to 40 minutes is added to this value to prevent 'packet storming'.

Normally this is set to 8 hours (28800 seconds), which means that once a job is submitted, every 8 hours (or so) the remote cpu servers will check back in with their job server to ensure the job is still alive.

    Example: taskkeepalivesecs 28800
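
The 'minimum plus random delay' behavior can be sketched in Python as below. The function name is hypothetical, and the use of a uniform distribution is an assumption; the text above only says that a random value of up to 40 minutes is added.

```python
import random

MAX_JITTER_SECS = 40 * 60   # "up to 40 minutes", per the text above

def next_keepalive_delay(taskkeepalivesecs, rng=random):
    """Seconds until the next check-in: the configured minimum plus random jitter.

    Assumption: the jitter is uniformly distributed; rush may choose it differently.
    """
    return taskkeepalivesecs + rng.uniform(0, MAX_JITTER_SECS)

delay = next_keepalive_delay(28800)
print(28800 <= delay <= 28800 + MAX_JITTER_SECS)
```

Spreading the check-ins this way keeps a large number of cpu servers from contacting the same job server at the same instant.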

TcpSockOpts
Allows administrator to set various TCP tuning values for all tcp-based connections (rush -lf/-lj/-log/-ping, etc).

Several instances of 'tcpsockopts' can be specified, to set multiple flags.

Currently, only TCP_NODELAY is recommended. Usage:

tcpsockopts <SO_OPTION> <value>

..where <SO_OPTION> is one of:

    TCP_NODELAY
      Disables the Nagle algorithm, which speeds up connection time for all TCP based connections by a noticeable factor when small amounts of data are involved (rush -ping, etc). See "Nagle" RFC 896, "delayed ACK" RFC 813, and "Hosts Communication Requirements" RFC 1122.

    SO_LINGER
      If specified, it is ALWAYS enabled, argument is the linger 'time'. See setsockopt(2) for more info.

    SO_KEEPALIVE
      Enable connected sockets to be 'kept alive'. <value> must be '1' to enable, 0 to disable.

    SO_DONTROUTE
      Not recommended. If enabled, socket connections bypass normal routing. <value> must be '1' to enable, 0 to disable.

    SO_REUSEADDR
      Not recommended. Enables reuse of port addresses. <value> must be '1' to enable, 0 to disable.

    SO_REUSEPORT
      Not recommended. Enables reuse of port addresses. Some platforms (Redhat 6.2) don't even support this. On those platforms, the option is ignored. <value> must be '1' to enable, 0 to disable.

    SO_SNDBUF
    SO_RCVBUF
      Not recommended. Sets the send/receive buffer size. <value> is the number of bytes in the buffer. Rush may override the value you specify in some cases, as it may know it needs large buffers.

Example: tcpsockopts TCP_NODELAY 1
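
For readers unfamiliar with these options, the following Python sketch shows the underlying setsockopt(2) calls for two of them using the standard socket module. It illustrates what the options mean at the socket level; it is not how rush applies them internally.

```python
import socket

# Create an unconnected TCP socket just to demonstrate the option calls
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY 1: disable the Nagle algorithm on this socket
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# SO_KEEPALIVE 1: enable keepalive probes on this socket
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
keepalive = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

sock.close()

# The values read back are nonzero when enabled; the exact number is platform dependent
print(nodelay != 0, keepalive != 0)
```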

TmpDir
Allows administrator to set where rush creates the $RUSH_TMPDIR for users' jobs.

Be aware that when every frame executes, a subdirectory is created in 'tmpdir', and on completion the subdir is 'rm -rf'ed. Both are done *as the user*, so users must have write permission to this directory.

Examples:

    os=windows tmpdir c:/temp         # WinNT
    os=unix    tmpdir /var/tmp        # Unix
      

UdpMaxRetries
The number of re-transmissions until a 'retry time-out' occurs.

    Example: udpmaxretries 5

UdpRestTimeOut
The number of seconds to rest before recovering from a 'retry time-out'.

    Example: udpresttimeout 40

UdpTimeout
The number of seconds between udp re-transmissions.

    Example: udptimeout 8
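
The three Udp settings combine as simple arithmetic. The Python sketch below works through the example values shown in these sections, under the assumption that each re-transmission waits the full 'udptimeout' before the next one; the function name is hypothetical and the daemon's exact timing may differ.

```python
def retry_cycle_secs(udptimeout, udpmaxretries, udpresttimeout):
    """Return (seconds spent retrying, seconds for retrying plus the rest period).

    Assumption: every re-transmission waits the full udptimeout.
    """
    retrying = udptimeout * udpmaxretries
    return retrying, retrying + udpresttimeout

# Example values from the UdpTimeout, UdpMaxRetries and UdpRestTimeOut sections
retrying, full_cycle = retry_cycle_secs(udptimeout=8, udpmaxretries=5, udpresttimeout=40)
print(retrying, full_cycle)   # 40 80
```

With these values, an unreachable host costs roughly 40 seconds of re-transmissions, followed by a 40 second rest before the daemon tries again.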

UidRange
Prevents the render queue from running processes with a uid outside this range. The first value is the minimum, the second value is the maximum.

When a job is submitted, if the user's uid value is outside the range specified here, an error message is printed and the job will not be submitted.

    Example: uidrange 100 65000
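
The range check for UidRange and GidRange can be sketched in Python as follows. The function name and the error message wording are hypothetical, and treating both bounds as inclusive is an assumption.

```python
def check_id_ranges(uid, gid, uidrange, gidrange):
    """Return a list of error strings; an empty list means the ids are acceptable.

    Assumption: both the minimum and maximum values are inclusive.
    """
    errors = []
    if not (uidrange[0] <= uid <= uidrange[1]):
        errors.append("uid %d is outside uidrange %d %d" % (uid, uidrange[0], uidrange[1]))
    if not (gidrange[0] <= gid <= gidrange[1]):
        errors.append("gid %d is outside gidrange %d %d" % (gid, gidrange[0], gidrange[1]))
    return errors

# With 'uidrange 100 65000' and 'gidrange 100 65000', root (uid 0, gid 0) is refused
root_errors = check_id_ranges(0, 0, (100, 65000), (100, 65000))
user_errors = check_id_ranges(500, 100, (100, 65000), (100, 65000))
print(root_errors)
print(user_errors)
```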




Cpu Accounting File
$RUSH_DIR/var/cpu.acct


The cpu accounting file is configured with the rush.conf file's CpuAcctPath  command. Each time a frame finishes executing, a new entry is created in the Cpu Accounting file, logging the name of the job, how long the frame ran, etc.

Cpu Accounting File Example

u  948242700 53
p  948242783 tahoe.798    WERNER/C33 erco     0106  superior 100k  122  0   0   0 27823
p  948242783 tahoe.798    WERNER/C33 erco     0107  superior 100k  122  0   0   0 27834
p  948242865 tahoe.797    KILLER     erco     0504  superior 200   121  0   0   0 27846
u  948246300 5
u  948249900 0

Process Entries


p  948242783 tahoe.798 WERNER/C33 erco  0106  superior  100k  122  0   0   0 27822
p  948242783 tahoe.798 WERNER/C33 erco  0107  superior  100k  122  0   0   0 27834
p  948242865 tahoe.797 KILLER     erco  0504  superior  200   121  0   0   0 27846
-  --------- --------- ---------- ----  ----  --------  ----  ---  -   -   - -----
|      |         |          |      |     |       |       |     |   |   |   |   |
|      |         |          |      |     |       |       |     |   |   |   |   Pid
|      |         |          |      |     |       |       |     |   |   |   |
|      |         |          |      |     |       |       |     |   |   |   Exit code
|      |         |          |      |     |       |       |     |   |   |
|      |         |          |      |     |       |       |     |   |   #Secs User Time
|      |         |          |      |     |       |       |     |   |                 
|      |         |          |      *Job  |       |       |     |   #Secs System Time
|      |         |          |      Owner |       |       |     |
|      |         |          |            |       |       |     |
|      |         |          Title of job |       |       |     #Secs Wall Clock Time
|      |         Jobid                   |       |       |
|      |                                 |       |       Priority
|      time(2) process started           |       |
|                                        |       Host that ran the process
'p' indicates 'process entry'            |
                                         Frame that ran

* The job owner is not necessarily the owner of the process.
  Such is the case in windows jobs running frames on unix machines,
  or 'forceuid' configured in the rush.conf file.


Utilization Entries


u  948242700 53
u  948246300 5
-  --------- --
|      |      |
|      |      Percent of time processor(s) were busy rendering. (0-100)
|      |
|      time(2) utilization recorded
|
'u' indicates 'utilization entry' 
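
The two entry types above can be read with a short Python sketch like the one below. The names ProcessEntry, UtilEntry and parse_acct_line are hypothetical. The sketch assumes that fields are separated by white space, that job titles contain no spaces, and that the time fields are whole numbers, as in the examples shown here.

```python
from collections import namedtuple

ProcessEntry = namedtuple(
    "ProcessEntry",
    "start jobid title owner frame host priority "
    "wall_secs sys_secs user_secs exit_code pid")
UtilEntry = namedtuple("UtilEntry", "time percent_busy")

def parse_acct_line(line):
    """Parse one cpu.acct line into a ProcessEntry or UtilEntry (None if blank)."""
    fields = line.split()
    if not fields:
        return None
    if fields[0] == "p" and len(fields) == 13:
        (start, jobid, title, owner, frame, host, priority,
         wall, sys_t, user_t, exit_code, pid) = fields[1:]
        # Frame and priority stay as strings to preserve '0106' and '100k'
        return ProcessEntry(int(start), jobid, title, owner, frame, host, priority,
                            int(wall), int(sys_t), int(user_t), int(exit_code), int(pid))
    if fields[0] == "u" and len(fields) == 3:
        return UtilEntry(int(fields[1]), int(fields[2]))
    raise ValueError("unrecognized cpu.acct line: %r" % line)

sample = """\
u  948242700 53
p  948242783 tahoe.798    WERNER/C33 erco     0106  superior 100k  122  0   0   0 27823
p  948242865 tahoe.797    KILLER     erco     0504  superior 200   121  0   0   0 27846
u  948246300 5
"""
entries = [parse_acct_line(line) for line in sample.splitlines()]
total_wall = sum(e.wall_secs for e in entries if isinstance(e, ProcessEntry))
print(total_wall)   # 243
```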

CAVEATS

  • 'Exit code' is normally a positive number representing the actual exit code of the process. This value will be negative if the process was signaled; the value being the signal number. If the value is negative, this usually means the process was killed, segfaulted, or was bumped by a higher priority process. Commonly, the 'Exit code' will be one of:
    
      -15 - process killed with SIGTERM; someone probably manually killed it
       -9 - process killed with SIGKILL; probably bumped in a priority battle
       -2 - process killed with SIGINT; someone sent it a ^C
        0 - process did an exit(0); frame Done
        1 - process did an exit(1); frame Fail
        2 - process did an exit(2); frame Requeue
    

  • Although tempting, it is not recommended to use process execution times for cpu billing purposes. Wall clock time includes time the process may have spent waiting for network load. User and System times report the respective times spent for the Render Script only; not its sub-processes (e.g., the renderer).

    To properly bill for cpu time, you would either need to enable full-on Unix process accounting to obtain accumulated cpu time for all sub-processes in the user's render script, or create wrapper scripts that use programs like timex(1) to monitor the binary execution time of the critical render/compositor processes.

    Tools like timex(1) indicate in their documentation that they must have Unix process accounting enabled to show sub-process totals. This is usually prohibitive on production machines, due to disk resources used by the Unix process accounting system.
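
The 'Exit code' convention described in the first caveat can be decoded with a Python sketch like the following. The function name is hypothetical. Signal names are looked up on the machine running the script, so the sketch assumes it runs on a Unix host whose signal numbering matches the host that wrote the entry.

```python
import signal

# Frame results for the exit codes listed in the caveat above
FRAME_RESULTS = {0: "Done", 1: "Fail", 2: "Requeue"}

def describe_exit_code(code):
    """Turn a cpu.acct 'Exit code' value into a short description."""
    if code < 0:
        # Negative values are signal numbers
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "signal %d" % -code
        return "killed by " + name
    if code in FRAME_RESULTS:
        return "exit(%d); frame %s" % (code, FRAME_RESULTS[code])
    return "exit(%d)" % code

for code in (-15, -9, 0, 1, 2, 3):
    print(code, describe_exit_code(code))
```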