Rush - Hosts File

Rush Render Queue - Hosts File
V 103.08p3 09/01/17
(C) Copyright 2008, 2017 Seriss Corporation. All rights reserved.
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.

Strikeout text indicates features not yet implemented

Hosts File
$RUSH_DIR/etc/hosts

Example
File Format
app-hostcache -- Control rush's hostname caching for command line tools
fastsearch -- Enable/disable fast search hostname lookups (New in 103.03)
daemon-hostcache -- Control rush's hostname caching for the daemon
ipalias -- A way to specify multiple IP addresses for a hostname
negcachesecs -- Control how long rush's 'negative' hostname cache timeout is
Host entries: Hostname, Cpus, Ram, MinPri, HostGroup/Criteria, Cpu Affinity
Negative Cache description

Example Hosts File

The following shows an example rush/etc/hosts file for a network of workstations (+work), render farm machines (+farm), and some miscellaneous machines being evaluated (+eval). Some machines have Maya (+maya) or After Effects (+ae), and 'tahoe' is for administration use only (+admin); its 'Cpus' field is set to '0' to prevent rendering.
Note the use of '+hostgroups' (+work,+farm,+maya,+ae..) to reflect groups of machines users commonly want to submit to, and the use of 'criteria' (linux,irix,sgi,win2k..) for platform names users might want to avoid using as a group.


    # HOST CACHING
    daemon-hostcache full
    app-hostcache    demand
    negcachesecs     900
    fastsearch       1

    #
    # The 'Host' field should contain short names for hosts (aliases are ok),
    # and must be unique.
    #
    # The 'Criteria/Hostgroups' field must *NOT* contain white space, and words are
    # comma delimited. All hosts must contain '+any' in the Criteria/Hostgroups field.  
    #
    #Hostname   Cpus  Ram   MinPri  Criteria/Hostgroups                   Options
    #--------   ----  ----  ------  ------------------------------------  ----------------
    tahoe       0     256   0       +any,+admin,linux
    ontario     1     2048  0       +any,+work,+maya,linux,linux6.0,intel affinity=3
    superior    2     2048  0       +any,+work,+maya,windows,win7         affinity=3
    erie        1     2048  0       +any,+work,+maya,+ae,win2k
    rf1         2     1024  0       +any,+farm,+maya,+ae,win2k
    rf2         2     1024  0       +any,+farm,+maya,+ae,win2k
    rf3         2     1024  0       +any,+farm,+maya,+ae,win2k
    rf4         2     1024  0       +any,+farm,+maya,+ae,win2k
    rf5         2     1024  0       +any,+farm,+maya,+ae,win2k
    eval1       2     1024  0       +any,+eval,mac
    eval2       2     1024  0       +any,+eval,linux
    eval3       2     1024  0       +any,+eval,windows

File Format

The $RUSH_DIR/etc/hosts file must contain the names of all hosts that participate in rendering.

The hosts file can be updated on the fly. Simply edit the file, make changes, then use 'rush -push hosts +any' to copy the local changes to all the machines, and the daemons will pick up your modifications within a minute.
The rushadmin tool makes this easy to do: run rushadmin, click on 'hosts' to edit the file, save when done, and hit 'Send' to send the file to the network. Alternatively, use the 'rush -push hosts +any' command described above.
Rush won't talk to any machine not in the rush hosts file, as a security measure. So be sure all machines (including license servers) are in the rush hosts file on all machines.
The format of the hosts file is single lines of 5 white space separated fields, one line per host:

<Hostname> <#Cpus> <Ram> <Minimum Priority> <Criteria/Hostgroups>

Blank lines and lines starting with '#' are ignored.
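
To make the field layout concrete, here is a minimal Python sketch of how one host line could be split into its fields. It is not Rush's own parser: the names `HostEntry` and `parse_host_line` are hypothetical, and it assumes that setting lines (such as `fastsearch` or `ipalias`) are handled elsewhere and that anything after the fifth field is an option such as `affinity=3`.

```python
from typing import NamedTuple, Optional

# Setting keywords documented on this page; these lines are not host entries.
KEYWORDS = {"app-hostcache", "daemon-hostcache", "fastsearch",
            "ipalias", "negcachesecs"}

class HostEntry(NamedTuple):
    hostname: str
    cpus: int
    ram: int
    minpri: int
    criteria: list   # comma separated words, e.g. ['+any', 'linux']
    options: list    # anything after the 5th field, e.g. ['affinity=3']

def parse_host_line(line: str) -> Optional[HostEntry]:
    """Return a HostEntry, or None for blank, comment, and setting lines."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if fields[0] in KEYWORDS:
        return None
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields: {line!r}")
    name, cpus, ram, minpri, criteria = fields[:5]
    return HostEntry(name, int(cpus), int(ram), int(minpri),
                     criteria.split(","), fields[5:])

entry = parse_host_line(
    "superior 2 2048 0 +any,+work,+maya,windows,win7 affinity=3")
print(entry.hostname, entry.cpus, entry.criteria, entry.options)
```

Because the Criteria/Hostgroups field may not contain white space, a plain whitespace split is enough to separate the columns.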

App-HostCache

Configures caching hostname-to-ip lookups in the rush client application.

Synopsis

app-hostcache [options]

Options

app-hostcache Options
    none     No caching. Use the OS for all lookups.
    demand   Cache on demand (default)
    full     Cache all hostnames first

Examples

app-hostcache Examples
    app-hostcache full     Caches all lookups whenever 'rush' is executed
    app-hostcache demand   Caches lookups on demand
    app-hostcache none     No caching; OS does all hostname lookups

Description

Configures caching of hostname-to-ip lookups in the rush client (rush). Normally this is set to 'demand', so rush(1) caches the lookups on demand, for the duration of its execution (which is usually short).
There are few situations where any setting other than 'demand' would be useful.

Daemon-HostCache

Configures caching hostname-to-ip lookups in the rushd daemon.

Synopsis

daemon-hostcache [options]

Options

daemon-hostcache Options
    none     No caching. Use the OS for all lookups.
    demand   Cache on demand, or whenever new hostlist is reloaded.
    full     Cache on boot, or whenever hostlist reloaded.

Examples

daemon-hostcache Examples
    daemon-hostcache full     Caches hostname lookups on boot
    daemon-hostcache demand   Caches whenever requested
    daemon-hostcache none     Caching disabled

Description

Configures caching hostname-to-ip lookups in the rush daemon (rushd). Normally this is set to 'full', so the daemon caches all the hostname lookups in the rush/etc/hosts file as soon as it starts up. This prevents rushd from having to make further hostname queries from the operating system.
If set to 'none', all hostname-to-ip lookups will be done by the operating system on a per-request basis. This can be useful if IPs are expected to change, but it is not recommended. Disabling caching passes all name lookup load onto the operating system, which is especially costly if each request involves a network transaction to do the lookup (NIS, DNS).
Since rush can potentially make dozens of hostname lookups per second when jobs are active, it is highly recommended to leave this setting at the default, 'full'.

FastSearch

Enables/disables fast search algorithm. (New in 103.03)

Synopsis

fastsearch [0|1]

Examples

fastsearch Examples
    fastsearch 1   Enables fast searches. (default)
    fastsearch 0   Disables fast searches; uses linear lookups.

Description

The fast search feature greatly speeds hostname and IP lookups. It adds some memory footprint to the daemon; for 700 hosts, adds about 1MB to the daemon.
This flag, if specified, must appear at the top of the hosts file, before the first hostname entry.
This feature is new in 103.03, and is on by default. Normally you would only specify 'fastsearch 0' to disable this feature, such as if it's suspected of causing trouble.
Caveats
The debug flag "1" can be used to track the memory use of this feature, and to track lookup time overhead for operations such as "purge invalid jobs" that occur after a host reload ('rush -reload hosts'). Such operations are greatly sped up by this feature when many large jobs and numerous render nodes are present.

ipalias

Tells rush there is more than one IP address associated with a hostname.
So for instance, if you have a machine with multiple interfaces, and machines on multiple networks that see packets from this machine as different IPs, this command will help Rush understand multiple IPs are all for the one machine.
This command can be specified several times for one machine to configure multiple IP aliases.
Such configs can be seen in the output of 'rush -lahf' for the "IpAliases[]:" field.

Synopsis

ipalias <hostname> <alt-ipaddress>

Examples

ipalias Examples

    #Hostname   Cpus  Ram   MinPri  Criteria/Hostgroups
    #--------   ----  ----  ------  ------------------------------------
    tahoe       0     256   0       +any,+admin,linux
    ontario     1     2048  0       +any,+work,+maya,linux,linux6.0,intel
    superior    2     2048  0       +any,+work,+maya,sgi,irix,irix6.2
    [..]
    ipalias tahoe 192.168.5.5

This tells rush that host 'tahoe' may use 192.168.5.5 as its IP address, in addition to whatever other IP address 'tahoe' resolves to.

CAVEATS
It's usually better to configure rush to bind to a particular interface via the '<hostname>:<ipaddress/hostname>' syntax, which forces rush to process all communications on a particular IP. See the Hostname field for more info on the ':' syntax for forcing rush to bind to a named interface.

NegCacheSecs

(New in 102.42)
Sets the amount of time a bad hostname stays in the negative cache.

Synopsis

negcachesecs [seconds]

Examples

    negcachesecs 900     # default 900 seconds (15 minutes)
    negcachesecs 0       # disable all negative caching

Description

Negative caching enables the system to remember bad hostname lookups after the first failed attempt, since bad hostname lookups can cause the rush daemons to be unresponsive while waiting for the OS to determine a hostname is unknown. This is especially a problem on Windows systems using WINS for hostname lookups, instead of DNS. (DNS is better)
With negative caching set to e.g. 900 seconds, a bad hostname lookup will be remembered, causing later lookups for the same bad hostname to fail immediately during the 900 second period.
After the 900 seconds, the negative cache for that hostname is cleared, so that future lookups for the host will again be passed to the OS, in case the host comes back online.
900 is the default. 0 will completely disable negative caching.

Hostname

This is the name of the host, and should be the shortest name possible (e.g., host aliases can be used here).
This is the name that will be used in jobids and other cpu reports, so it is best if short names are used (10 chars or less). Longer names are ok, but will misalign columnar reports. Avoid using FQDN hostnames (e.g., foo.domain.com).
You can optionally specify an alternate network interface other than the default. Just append to the hostname a ':' followed by the name of the interface, e.g.:

tahoe:tahoe-eth

This says 'tahoe' is the actual name of the machine (ie. hostname(1)), but rush should use tahoe's 'tahoe-eth' network interface for all Rush communications.
You can also specify an IP address after the ':' (for instance, if no hostname has been assigned to the other IP address) e.g.:
tahoe:192.168.1.20

This says 'tahoe' should use the 192.168.1.20 interface for all Rush communications.
Such rush/etc/hosts file modifications need to be on all machines for communications to work properly; don't just make the change to the rush/etc/hosts file on a single machine.
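
As an illustration of the ':' syntax, the following Python sketch splits a Hostname field into the machine name and the optional interface. The function name `split_host_interface` is hypothetical, and the sketch assumes the interface part is either a hostname or a dotted IPv4 address, as in the examples above.

```python
def split_host_interface(field):
    """Split '<hostname>:<interface>' into (hostname, interface or None)."""
    name, sep, interface = field.partition(":")
    return name, (interface if sep else None)

print(split_host_interface("tahoe:tahoe-eth"))
print(split_host_interface("tahoe:192.168.1.20"))
print(split_host_interface("tahoe"))
```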

Cpus

This should be the number of cpus the host has. This is how many processes the host will run at the same time. This value can be larger or smaller than the actual number of physical cpus the machine has.
'0' is an acceptable value that effectively disables the machine from participating in rendering, while allowing the host to remain in the hosts file.

Ram

This is the amount of ram rush should think the machine has.
This RAM value can be less or more than the actual ram the machine has. Usually this value takes into account some percentage of the machine's swap space as well.
This value is used by rush to determine if jobs can fit on the machine. When users submit their jobs, they can optionally specify the amount of ram their job needs using the Ram submit command. (In the example submit scripts, this is the "Ram:" prompt) A job that asks to render on a machine wanting more ram than the machine has configured will be turned away.
Rush keeps a running total of the amount of ram the machine has available, so that on multiprocessor machines when rush starts a frame running, it knows how much ram is available for other frames.
At render time, the job's Ram value is compared to the machine's configured RAM value; if the job asks to use more ram than the machine has, the job will not render on this machine. If the job can run on the machine, the job's ram value is subtracted from the machine's total RAM to keep a running total, so rush knows whether it can start other frames running on available CPU slots, until there's either no more ram or cpus available.
On multiprocessor machines, this value is a total from which rendering frames subtract their estimated ram use. For instance, if a 4 cpu machine is configured with a Ram value of 4000, and 2 frames are currently rendering each with their job's ram value requesting 1500, then 3000 will be assumed 'used' by the two frames, leaving only 1000 left over for the other 2 CPU slots. (1000 = 4000 - ( 2 x 1500 ) ).
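
The running total described above can be sketched in a few lines of Python. This is only a model of the bookkeeping, not Rush's implementation; the class `HostSlots` and its method `try_start` are hypothetical names. It replays the 4 cpu, Ram 4000 example with jobs that each request 1500.

```python
class HostSlots:
    """Tracks free cpu slots and free ram for one host (illustrative model)."""

    def __init__(self, cpus, ram):
        self.free_cpus = cpus
        self.free_ram = ram

    def try_start(self, job_ram):
        """Start a frame if a cpu slot and enough ram remain."""
        if self.free_cpus < 1 or job_ram > self.free_ram:
            return False          # frame is turned away
        self.free_cpus -= 1
        self.free_ram -= job_ram  # keep the running total
        return True

host = HostSlots(cpus=4, ram=4000)
started = [host.try_start(1500) for _ in range(3)]
print(started, host.free_cpus, host.free_ram)
```

The third frame is refused even though two cpu slots are still free, because only 1000 of the configured ram remains.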

MinimumPriority

Use this value to set a limit on the minimum priority a job must have to render on this machine.
Useful where you want to prevent people from rendering on workstations unless they are of at least a certain priority, or if you want to allow only the local workstation user to submit to their own workstation using a policy enforced priority value.
A value of '0' allows all jobs. A value of '900' will only allow renders with a priority of 900 or above; renders with less than that will be turned away.

Criteria/Hostgroups

This is a list of comma separated strings that define platform or operating system specific features for the host. These can be arbitrary alpha-numeric strings that may also contain dashes, underbars and periods, but must not contain any whitespace. '+' characters have the special purpose of leading off a Host Group specification.
The <Criteria/Hostgroups> field might be set to:
+any,linux,linux6.1,prman3.7
These strings can then be used in TD's submit scripts to limit which hosts will render their frames. See the Criteria Submit Script command for more info. All hosts should have a criteria entry that at least contains +any.
Changing the '+any' to '+offline' has the useful effect of temporarily removing a machine from showing up in 'rushtop', 'All Cpus' and 'All Jobs' reports, without removing it from the rush/etc/hosts file. This is useful if a machine will be taken down for service for a short time.
Host Group names are configured in this field, too. To add a hostgroup called +servers to the above example:
+any,linux,linux6.1,prman3.7,+servers
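
Since a leading '+' is what distinguishes a Host Group from a plain criteria string, the two kinds of word can be separated mechanically. The Python sketch below shows one way to do that; `split_criteria_field` is a hypothetical helper, not part of Rush.

```python
def split_criteria_field(field):
    """Split a Criteria/Hostgroups field into (hostgroups, criteria)."""
    words = field.split(",")
    hostgroups = [w for w in words if w.startswith("+")]
    criteria = [w for w in words if not w.startswith("+")]
    return hostgroups, criteria

groups, criteria = split_criteria_field("+any,linux,linux6.1,prman3.7,+servers")
print(groups)
print(criteria)
```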

Cpu Affinity

(New in Rush 102.42a9c)
Processor affinity (or CPU Pinning) enables locking processes to use specific cpus. By default (when unspecified), the operating system micromanages the realtime scheduling of processes on a multiprocessor machine, assigning them to available cpus dynamically.
If you set a cpu affinity for a machine, this tells Rush to run renders /only/ on the specified physical processors. This is sometimes useful on e.g. workstations where you want to assign only some of the machine's physical processors for rendering, leaving some available for use by the user.
This is an optional field that can appear to the right of the cpu/criteria in the Rush hosts file, and is of the form:
affinity=#
..where '#' is a hexadecimal value representing the bit field positions of the cpus to lock the processes to. For instance, a value of '1' tells Rush to use cpu #0 only, a value of '3' tells Rush to use cpus #0 and #1, and a value of '7' to use cpus #0, #1 and #2.
When renders run on the machine, you should be able to clearly see your cpu affinity setting in e.g. rushtop(1); the render's cpu use should only show up on the cpus you assigned Rush to use.
Note that not all operating systems support cpu affinity. As of this writing (Oct 2012), only Windows and Linux support cpu affinity; OSX does not.
By default when this field is unspecified, the operating system will control which cpus Rush renders use, and all processors are available to the OS for rendering.
Only use cpu affinity if you want to force renders to only use certain physical processors to ensure other processors are always available for other purposes.
A good example would be an 8 processor workstation, where the user might want 4 processors available for interactive use, while allowing the other 4 cpus for rendering. In such a case you would want 'affinity=f' to lock Rush to using cpus #0, #1, #2 and #3.
Common values you might want to use:
    affinity=1    -- use only one processor (cpu #0)
    affinity=3    -- use 2 processors (cpu #0 and #1)
    affinity=7    -- use 3 processors (cpu #0, #1, #2)
    affinity=f    -- use 4 processors (cpu #0, #1, #2, #3)
    affinity=1f   -- use 5 processors (cpu #0, #1, #2, #3, #4)
    affinity=3f   -- use 6 processors (cpu #0, #1, #2, #3, #4, #5)
    affinity=7f   -- use 7 processors (cpu #0, #1, #2, #3, #4, #5, #6)
    affinity=ff   -- use 8 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7)
    affinity=1ff  -- use 9 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8)
    affinity=3ff  -- use 10 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9)
    affinity=7ff  -- use 11 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10)
    affinity=fff  -- use 12 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11)
    affinity=1fff -- use 13 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12)
    ..etc..
Here's a table of incremental affinity values. Note that the values are bit fields, and not a cpu count. This is so that one can uniquely specify any arrangement of physical cpus to be enabled:

    Affinity  CPU#     Binary  -- Cpu# --   Total
    Value     Enabled  Equiv   0 1 2 3 4 5  Cpus
    --------  -------  ------  - - - - - -  -----
    1         0        00001   X - - - - -  1
    2         1        00010   - X - - - -  1
    3         0,1      00011   X X - - - -  2
    4         2        00100   - - X - - -  1
    5         0,2      00101   X - X - - -  2
    6         1,2      00110   - X X - - -  2
    7         0,1,2    00111   X X X - - -  3
    8         3        01000   - - - X - -  1
    9         0,3      01001   X - - X - -  2
    a         1,3      01010   - X - X - -  2
    b         0,1,3    01011   X X - X - -  3
    :         :        :
    e         1,2,3    01110   - X X X - -  3
    f         0,1,2,3  01111   X X X X - -  4
    10        4        10000   - - - - X -  1
    11        0,4      10001   X - - - X -  2
    :         :        :
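
Because the affinity value is a hexadecimal bit field, it can be computed from a list of cpu numbers rather than looked up. The Python sketch below converts in both directions; the helper names `cpus_to_affinity` and `affinity_to_cpus` are hypothetical and exist only to illustrate the arithmetic.

```python
def cpus_to_affinity(cpus):
    """Return the hex affinity value that enables the given cpu numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu          # cpu #N corresponds to bit N
    return format(mask, "x")

def affinity_to_cpus(value):
    """Return the list of cpu numbers enabled by a hex affinity value."""
    mask = int(value, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

print("affinity=" + cpus_to_affinity([0, 1, 2, 3]))   # first four cpus
print("affinity=" + cpus_to_affinity([4, 5, 6, 7]))   # upper four of eight
print(affinity_to_cpus("11"))
```

Working from cpu numbers makes it easy to pick non-contiguous arrangements, such as reserving cpus #0 through #3 for the user and giving Rush cpus #4 through #7.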

Negative Cache Description

What is a 'negative cache'? (New in Rush 102.42)
Hostname lookups are a big part of rush's operation, and it's important that hostname lookups occur quickly.
When a bad hostname is passed to rush, it quickly determines it's not in the hosts file, and passes the name on to DNS for resolution. DNS lookups are expensive; it can take a while for the DNS server to determine a hostname is unknown. This will cause rush to spend a few seconds waiting for DNS to say it's an unknown hostname.
During this time, the daemon may be unresponsive.
In situations where a misconfigured script or user typo is repeated many times, these repeat bad lookups can effectively end up causing a 'denial of service'.
By using a "negative cache", Rush keeps track of unknown hostnames, so when a bad hostname is requested a second time, the daemon will immediately indicate the hostname is bad, based on its 'negative cache', avoiding the expensive DNS lookups and unresponsiveness. The reasoning being, if DNS just said the hostname was unknown 5 seconds ago, it is likely that will still be the case now, so just save DNS the trouble, and indicate it's unknown immediately.
The only problem with this is if the sysadmin recently added a new host to DNS.. if the bad name is now a good name, the negative cache will prevent the name from being passed to DNS and being resolved.
The solution is to make the negative cache time out automatically after a given amount of time. By default, the first request for a bad hostname (ie. a hostname that fails a DNS lookup) will be 'negative cached' in the daemon for up to 15 minutes. The 15 minute value is configurable in the rush hosts file via the negcachesecs value.
The 'negative cache' for a daemon can be inspected via the 'rush -lah <hostname>' command; the current negative cache will be shown at the bottom of the 'rush -lah' report.
When the 15 minutes are up, the daemon clears that entry from the cache, so the next request will be checked through DNS again, just in case it's now a valid hostname. (if it's not, it will be negative cached again, as above)
Whenever the rush/etc/hosts file is reloaded, rush will automatically clear the negative cache. This will happen within 60 seconds of the hosts file's date stamp changing, or if the sysadmin invokes the 'rush -reload' command. To force the entire network to clear the negative cache, and reload the rush/etc/hosts files without sending out a new hosts file, the sysadmin can use 'rush -reload hosts +any -t 4'.
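
The behavior described in this section can be modeled with a small Python sketch. It is a simplified illustration under stated assumptions, not Rush's code: the class `NegativeCache` and its methods are hypothetical, and the clock is passed in as a parameter so the timeout can be demonstrated without waiting 15 minutes.

```python
import time

class NegativeCache:
    """Remembers bad hostnames until their timeout expires (illustrative)."""

    def __init__(self, negcachesecs=900, clock=time.monotonic):
        self.negcachesecs = negcachesecs
        self.clock = clock
        self.expires = {}             # hostname -> time the entry expires

    def add(self, hostname):
        """Record a failed lookup; a timeout of 0 disables caching."""
        if self.negcachesecs > 0:
            self.expires[hostname] = self.clock() + self.negcachesecs

    def is_bad(self, hostname):
        """True if the name is still negative cached."""
        deadline = self.expires.get(hostname)
        if deadline is None:
            return False
        if self.clock() >= deadline:
            del self.expires[hostname]   # expired: let the OS try again
            return False
        return True

    def clear(self):
        """Forget everything, as on a hosts file reload."""
        self.expires.clear()

now = [0.0]                              # fake clock, in seconds
cache = NegativeCache(negcachesecs=900, clock=lambda: now[0])
cache.add("typo-host")
print(cache.is_bad("typo-host"))         # inside the 900 second window
now[0] = 901.0
print(cache.is_bad("typo-host"))         # after the timeout
```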