Rush Render Queue - Hosts File
V 103.07 05/28/15
(C) Copyright 2008, 2015 Seriss Corporation. All rights reserved.
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.

Strikeout text indicates features not yet implemented




   Example Hosts File  

The following shows an example rush/etc/hosts file for a network of workstations (+work), render farm machines (+farm), and some miscellaneous machines being evaluated (+eval). Some machines have Maya (+maya) or After Effects (+ae), and 'tahoe' is for administration use only (+admin); its 'Cpus' field is set to '0' to prevent rendering.

Note the use of '+hostgroups' (+work,+farm,+maya,+ae..) to reflect groups of machines users commonly want to submit to, and the use of 'criteria' (linux,irix,sgi,win2k..) for platform names users might want to avoid using as a group.

    Example Hosts File
    
        # HOST CACHING
        daemon-hostcache full
        app-hostcache    demand
        negcachesecs     900
    
        #
        # The 'Host' field should contain short names for hosts (aliases are ok),
        # and must be unique.
        #
        # The 'Criteria/Hostgroups' field must *NOT* contain white space, and words are
        # comma delimited. All hosts must contain '+any' in the Criteria/Hostgroups field.  
        #
        #Hostname   Cpus  Ram   MinPri  Criteria/Hostgroups                              Options
        #--------   ----  ----  ------  ------------------------------------             ----------------
        tahoe       0     256   0       +any,+admin,linux
        ontario     1     2048  0       +any,+work,+maya,linux,linux6.0,intel            affinity=3
        superior    2     2048  0       +any,+work,+maya,windows,win7                    affinity=3
        erie        1     2048  0       +any,+work,+maya,+ae,win2k
        rf1         2     1024  0       +any,+farm,+maya,+ae,win2k
        rf2         2     1024  0       +any,+farm,+maya,+ae,win2k
        rf3         2     1024  0       +any,+farm,+maya,+ae,win2k
        rf4         2     1024  0       +any,+farm,+maya,+ae,win2k
        rf5         2     1024  0       +any,+farm,+maya,+ae,win2k
        eval1       2     1024  0       +any,+eval,mac
        eval2       2     1024  0       +any,+eval,linux
        eval3       2     1024  0       +any,+eval,windows
    	

   File Format  

The $RUSH_DIR/etc/hosts file must contain the names of all hosts that participate in rendering.

The hosts file can be updated on the fly. Simply edit the file, make changes, then use 'rush -push hosts +any' to copy the local changes to all the machines, and the daemons will pick up your modifications within a minute.

The rushadmin tool makes this easy to do; just run rushadmin, click on 'hosts' to edit the file, save when done, and hit 'Send' to send the file to the network. Or you can use the 'rush -push hosts +any' command described above.

Rush won't talk to any machine not in the rush hosts file, as a security measure. So be sure all machines (including license servers) are in the rush hosts file on all machines.

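The format of the hosts file is one line per host, each with 5 whitespace-separated fields, optionally followed by options such as 'affinity=#':

    <Hostname> <Cpus> <Ram> <MinimumPriority> <Criteria/Hostgroups> [Options]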

Blank lines and lines starting with '#' are ignored.
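
As a rough illustration of this format, here is a minimal Python sketch of how a script might read host entries. It assumes the field order shown in the example file above. The helper name `parse_host_line` and the `DIRECTIVES` set are hypothetical and not part of Rush; the directive names are simply the ones documented on this page.

```python
# Hypothetical helper, not part of Rush. Directive names are taken from this page.
DIRECTIVES = {"daemon-hostcache", "app-hostcache", "negcachesecs",
              "fastsearch", "ipalias"}

def parse_host_line(line):
    """Return a dict for a host entry, or None for blanks, comments, directives."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None                      # ignored, as described above
    fields = line.split()
    if fields[0] in DIRECTIVES:
        return None                      # a setting, not a host entry
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields: %r" % line)
    name, cpus, ram, minpri, criteria = fields[:5]
    return {
        "host": name,
        "cpus": int(cpus),
        "ram": int(ram),
        "minpri": int(minpri),
        "criteria": criteria.split(","),  # comma delimited, no white space
        "options": fields[5:],            # e.g. ['affinity=3'], may be empty
    }

entry = parse_host_line("ontario 1 2048 0 +any,+work,+maya,linux affinity=3")
print(entry["host"], entry["cpus"], entry["options"])
```

Returning None for directives keeps the sketch short; a fuller parser would handle those lines separately.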

   App-HostCache  

Configures caching hostname-to-ip lookups in the rush client application.

Synopsis

    app-hostcache [options]

Options

    app-hostcache Options
    none     No caching. Use the OS for all lookups.
    demand   Cache on demand (default)
    full     Cache all hostnames first

Examples

    app-hostcache Examples
    app-hostcache full     Caches all lookups whenever 'rush' is executed
    app-hostcache demand   Caches lookups on demand
    app-hostcache none     No caching; OS does all hostname lookups

Description

Configures caching of hostname-to-ip lookups in the rush client (rush). Normally this is set to 'demand', so rush(1) caches lookups on demand for the duration of its execution (which is usually short).

There are few situations where any setting other than 'demand' would be useful.

   Daemon-HostCache  

Configures caching hostname-to-ip lookups in the rushd daemon.

Synopsis

    daemon-hostcache [options]

Options

    daemon-hostcache Options
    none     No caching. Use the OS for all lookups.
    demand   Cache on demand, or whenever new hostlist is reloaded.
    full     Cache on boot, or whenever hostlist reloaded.

Examples

    daemon-hostcache Examples
    daemon-hostcache full     Caches hostname lookups on boot
    daemon-hostcache demand   Caches whenever requested
    daemon-hostcache none     Caching disabled

Description

Configures caching hostname-to-ip lookups in the rush daemon (rushd). Normally this is set to 'full', so the daemon caches all the hostname lookups in the rush/etc/hosts file as soon as it starts up. This prevents rushd from having to make further hostname queries from the operating system.

If set to 'none', all hostname-to-ip lookups will be done by the operating system on a per-request basis. This can be useful if IPs are expected to change, but it is not recommended. Depending on the operating system, disabling caching passes all name lookup load onto the operating system, which is especially costly if each request involves a network transaction to do the lookup (NIS, DNS).

Since rush can potentially make dozens of hostname lookups per second when jobs are active, it is highly recommended to leave this setting at the default, 'full'.
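
To make the difference between the three settings concrete, here is a toy Python model. It is only a sketch of the behavior described above, not how rushd is implemented; the class name `HostCache`, the `fake_resolver` function, and the IP addresses are all made up for illustration. In a real script the resolver could be `socket.gethostbyname`.

```python
class HostCache:
    """Toy model of the 'none' / 'demand' / 'full' settings (not Rush code)."""

    def __init__(self, mode, resolver, hostnames=()):
        if mode not in ("none", "demand", "full"):
            raise ValueError("unknown mode: %s" % mode)
        self.mode = mode
        self.resolver = resolver          # callable: hostname -> ip string
        self.cache = {}
        if mode == "full":
            # Resolve every known host up front, as at daemon startup.
            for name in hostnames:
                self.cache[name] = resolver(name)

    def lookup(self, name):
        if self.mode == "none":
            return self.resolver(name)    # always ask the OS
        if name not in self.cache:
            self.cache[name] = self.resolver(name)
        return self.cache[name]

calls = []                                # records every resolver call

def fake_resolver(name):
    """Stand-in for an OS lookup; the addresses are invented."""
    calls.append(name)
    return {"tahoe": "192.168.1.10", "erie": "192.168.1.11"}[name]

full = HostCache("full", fake_resolver, ["tahoe", "erie"])
full.lookup("tahoe")
full.lookup("erie")
print(len(calls))   # 2: both lookups were served from the cache
```

With 'none', every call to `lookup()` reaches the resolver, which is the per-request load the description warns about.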

   FastSearch  

Enables/disables fast search algorithm. (New in 103.03)

Synopsis

    fastsearch [0|1]

Examples

    fastsearch Examples
    fastsearch 1   Enables fast searches. (default)
    fastsearch 0   Disables fast searches; uses linear lookups.

Description

The fast search feature greatly speeds up hostname and IP lookups. It increases the daemon's memory footprint somewhat: about 1MB for 700 hosts.

This flag, if specified, must appear at the top of the hosts file, before the first hostname entry.

This feature is new in 103.03, and is on by default. Normally you would only specify 'fastsearch 0' to disable this feature, such as if it's suspected of causing trouble.

Caveats

The debug flag "1" can be used to track the memory use of this feature, as well as the lookup time overhead of operations such as "purge invalid jobs" that occur after a host reload ('rush -reload hosts'). This feature greatly speeds up those operations when many large jobs and numerous render nodes are present.

   ipalias  

Tells rush there is more than one IP address associated with a hostname.
So for instance, if you have a machine with multiple interfaces, and machines on multiple networks that see packets from this machine as different IPs, this command will help Rush understand multiple IPs are all for the one machine.

This command can be specified several times for one machine to configure multiple IP aliases.

Such configs can be seen in the output of 'rush -lahf' for the "IpAliases[]:" field.

Synopsis

    ipalias <hostname> <alt-ipaddress>

Examples

    ipalias Examples
    #Hostname   Cpus  Ram   MinPri  Criteria/Hostgroups
    #--------   ----  ----  ------  ------------------------------------
    tahoe       0     256   0       +any,+admin,linux
    ontario     1     2048  0       +any,+work,+maya,linux,linux6.0,intel
    superior    2     2048  0       +any,+work,+maya,sgi,irix,irix6.2
    [..]
    ipalias tahoe 192.168.5.5
    
            

This tells rush that host 'tahoe' may use 192.168.5.5 as its IP address, in addition to whatever other IP address 'tahoe' resolves to.
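
As a small illustration of what these lines express, the Python sketch below collects 'ipalias' lines into a lookup table from alternate IP to hostname. The function name `build_alias_map` is hypothetical, and the second address (10.0.0.5) is invented to show a host with more than one alias.

```python
def build_alias_map(lines):
    """Map each alternate IP to its hostname from 'ipalias <host> <ip>' lines."""
    aliases = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "ipalias":
            hostname, alt_ip = fields[1], fields[2]
            aliases[alt_ip] = hostname
    return aliases

aliases = build_alias_map([
    "tahoe       0     256   0       +any,+admin,linux",   # host entry, skipped
    "ipalias tahoe 192.168.5.5",
    "ipalias tahoe 10.0.0.5",                              # invented second alias
])
print(aliases["192.168.5.5"])   # tahoe
```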

CAVEATS

It's usually better to configure rush to bind to a particular interface via the '<hostname>:<ipaddress/hostname>' syntax, which forces rush to process all communications on a particular IP. See the Hostname field for more info on the ':' syntax for forcing rush to bind to a named interface.

   NegCacheSecs  
(New in 102.42)

Sets the amount of time a bad hostname stays in the negative cache.

Synopsis

    negcachesecs [seconds]
    
Examples
    negcachesecs 900    # default 900 seconds (15 minutes)
    negcachesecs 0      # disable all negative caching
    
Description
Negative caching enables the system to remember bad hostname lookups after the first failed attempt, since bad hostname lookups can cause the rush daemons to be unresponsive while waiting for the OS to determine a hostname is unknown. This is especially a problem on Windows systems using WINS for hostname lookups, instead of DNS. (DNS is better)

With negative caching set to e.g. 900 seconds, a bad hostname lookup will be remembered, causing lookups for the same bad hostname to fail immediately during the 900 second period.

After the 900 seconds, the negative cache for that hostname is cleared, so that future lookups for the host will again be passed to the OS, in case the host comes back online.

900 is the default. 0 will completely disable negative caching.

   Hostname  
This is the name of the host, and should be the shortest name possible (e.g., host aliases can be used here).

This is the name that will be used in jobids and other cpu reports, so it is best if short names are used (10 chars or less). Longer names are ok, but will misalign columnar reports. Avoid using FQDN hostnames (e.g., foo.domain.com).

You can optionally specify an alternate network interface other than the default. Just append to the hostname a ':' followed by the name of the interface, e.g.:

    tahoe:tahoe-eth

This says 'tahoe' is the actual name of the machine (i.e. hostname(1)), but rush should use tahoe's 'tahoe-eth' network interface for all Rush communications.

You can also specify an IP address after the ':' (for instance, if no hostname has been assigned to the other IP address) e.g.:

    tahoe:192.168.1.20

This says 'tahoe' should use the 192.168.1.20 interface for all Rush communications.
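
The two forms above can be told apart mechanically. The following Python sketch splits a Hostname field on the ':' and reports whether the part after it is an IP address or an interface name. The function name `split_host_field` is hypothetical; it uses the standard `ipaddress` module only to recognize IP addresses and says nothing about how Rush itself parses the field.

```python
import ipaddress

def split_host_field(field):
    """Split 'name[:interface]' into (name, interface, interface_is_ip)."""
    name, sep, iface = field.partition(":")
    if not sep:
        return name, None, False          # no alternate interface given
    try:
        ipaddress.ip_address(iface)       # raises ValueError if not an IP
        return name, iface, True
    except ValueError:
        return name, iface, False         # treat as an interface hostname

print(split_host_field("tahoe:tahoe-eth"))      # ('tahoe', 'tahoe-eth', False)
print(split_host_field("tahoe:192.168.1.20"))   # ('tahoe', '192.168.1.20', True)
```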

Such rush/etc/hosts file modifications need to be on all machines for communications to work properly. Don't just make the change to the rush/etc/hosts file on a single machine.

   Cpus  
This is the number of cpus rush should treat the host as having, i.e. how many processes the host will run at the same time. This value can be larger or smaller than the actual number of physical cpus the machine has.

'0' is an acceptable value that effectively disables the machine from participating in rendering, while allowing the host to remain in the hosts file.

   Ram  
This is the amount of ram rush should think the machine has.
This RAM value can be less or more than the actual ram the machine has. Usually this value takes into account some percentage of the machine's swap space as well.

This value is used by rush to determine if jobs can fit on the machine. When users submit their jobs, they can optionally specify the amount of ram their job needs using the Ram submit command. (In the example submit scripts, this is the "Ram:" prompt.) A job that requests more ram than the machine has configured will be turned away.

Rush keeps a running total of the amount of ram the machine has available, so that on multiprocessor machines when rush starts a frame running, it knows how much ram is available for other frames.

At render time, the job's Ram value is compared to the machine's configured RAM value. If the job asks to use more ram than the machine has, the job will not render on this machine. If the job can run on the machine, the job's ram value is subtracted from the machine's total RAM to keep a running total, so rush knows whether it can start other frames running on available CPU slots, until there's either no more ram or cpus available.

On multiprocessor machines, this value is a total from which rendering frames subtract their estimated ram use. For instance, if a 4 cpu machine is configured with a Ram value of 4000, and 2 frames are currently rendering each with their job's ram value requesting 1500, then 3000 will be assumed 'used' by the two frames, leaving only 1000 left over for the other 2 CPU slots. (1000 = 4000 - ( 2 x 1500 ) ).
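
The bookkeeping in this example can be sketched in a few lines of Python. This is a toy model of the description above, not Rush's actual scheduler; the class name `HostSlots` and its methods are hypothetical, and returning the ram when a frame finishes is an assumption the text implies but does not spell out.

```python
class HostSlots:
    """Toy model of the Cpus/Ram bookkeeping described above (not Rush code)."""

    def __init__(self, cpus, ram):
        self.free_cpus = cpus
        self.free_ram = ram

    def try_start(self, job_ram):
        """Start a frame if a cpu slot and enough ram remain; return success."""
        if self.free_cpus < 1 or job_ram > self.free_ram:
            return False
        self.free_cpus -= 1
        self.free_ram -= job_ram
        return True

    def finish(self, job_ram):
        """Assumed behavior: release the cpu slot and ram when a frame completes."""
        self.free_cpus += 1
        self.free_ram += job_ram

host = HostSlots(cpus=4, ram=4000)
host.try_start(1500)
host.try_start(1500)
print(host.free_ram, host.free_cpus)   # 1000 2
print(host.try_start(1500))            # False: only 1000 left
```

Note that two cpu slots remain free in this case, but a third 1500 frame is still turned away because the ram total is exhausted first.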

   MinimumPriority  
Use this value to set a limit on the minimum priority a job must have to render on this machine.

Useful where you want to prevent people from rendering on workstations unless they are of at least a certain priority, or if you want to allow only the local workstation user to submit to their own workstation using a policy enforced priority value.

A value of '0' allows all jobs. A value of '900' will only allow renders with a priority of 900 or above; renders with less than that will be turned away.

   Criteria/Hostgroups  
This is a list of comma separated strings that define platform or operating system specific features for the host. These can be arbitrary alpha-numeric strings that may also contain dashes, underbars and periods, but must not contain any whitespace. '+' characters have the special purpose of leading off a Host Group specification.

The <Criteria/Hostgroups> field might be set to:

    +any,linux,linux6.1,prman3.7

These strings can then be used in TD's submit scripts to limit which hosts will render their frames. See the Criteria Submit Script command for more info. All hosts should have a criteria entry that at least contains +any.

Changing the '+any' to '+offline' has the useful effect of temporarily removing a machine from showing up in 'rushtop', 'All Cpus' and 'All Jobs' reports, without removing it from the rush/etc/hosts file. This is useful if a machine will be taken down for service for a short time.

Host Group names are configured in this field, too. To add a hostgroup called +servers to the above example:

    +any,linux,linux6.1,prman3.7,+servers
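
Since hostgroups and criteria share one field and differ only in the leading '+', separating them is straightforward. The Python sketch below shows one way to do it; the function name `split_criteria` is hypothetical and the code only illustrates the field syntax, not how Rush matches jobs to hosts.

```python
def split_criteria(field):
    """Separate '+hostgroup' names from plain criteria strings."""
    if any(ch.isspace() for ch in field):
        raise ValueError("field must not contain whitespace")
    words = field.split(",")
    hostgroups = [w for w in words if w.startswith("+")]
    criteria = [w for w in words if not w.startswith("+")]
    return hostgroups, criteria

groups, crit = split_criteria("+any,linux,linux6.1,prman3.7,+servers")
print(groups)   # ['+any', '+servers']
print(crit)     # ['linux', 'linux6.1', 'prman3.7']
```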

   Cpu Affinity  
(New in Rush 102.42a9c)

Processor affinity (or CPU Pinning) enables locking processes to use specific cpus. By default (when unspecified), the operating system micromanages the realtime scheduling of processes on a multiprocessor machine, assigning them to available cpus dynamically.

If you set a cpu affinity for a machine, this tells Rush to run renders /only/ on the specified physical processors. This is sometimes useful on e.g. workstations where you want to assign only some of the machine's physical processors for rendering, leaving some available for use by the user.

This is an optional field that can appear to the right of the cpu/criteria in the Rush hosts file, and is of the form:

    affinity=#
    

..where '#' is a hexadecimal value representing the bit field positions of the cpus to lock the processes to. For instance, a value of '1' tells Rush to use cpu #0 only, a value of '3' tells Rush to use cpus #0 and #1, and a value of '7' to use cpus #0, #1 and #2.

When renders run on the machine, you should be able to clearly see your cpu affinity setting in e.g. rushtop(1); the render's cpu use should only show up on the cpus you assigned Rush to use.

Note that not all operating systems support cpu affinity. As of this writing (Oct 2012), only Windows and Linux support cpu affinity; OSX does not.

By default when this field is unspecified, the operating system will control which cpus Rush renders use, and all processors are available to the OS for rendering.

Only use cpu affinity if you want to force renders to only use certain physical processors to ensure other processors are always available for other purposes.

A good example would be an 8 processor workstation, where the user might want 4 processors available for interactive use, while allowing the other 4 cpus for rendering. In such a case you would want 'affinity=f' to lock Rush to using cpus #0, #1, #2 and #3.

Common values you might want to use:

    affinity=1    -- use only one processor (cpu #0)
    affinity=3    -- use 2  processors (cpu #0 and #1)
    affinity=7    -- use 3  processors (cpu #0, #1, #2)
    affinity=f    -- use 4  processors (cpu #0, #1, #2, #3)
    affinity=1f   -- use 5  processors (cpu #0, #1, #2, #3, #4)
    affinity=3f   -- use 6  processors (cpu #0, #1, #2, #3, #4, #5)
    affinity=7f   -- use 7  processors (cpu #0, #1, #2, #3, #4, #5, #6)
    affinity=ff   -- use 8  processors (cpu #0, #1, #2, #3, #4, #5, #6, #7)
    affinity=1ff  -- use 9  processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8)
    affinity=3ff  -- use 10 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9)
    affinity=7ff  -- use 11 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10)
    affinity=fff  -- use 12 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11)
    affinity=1fff -- use 13 processors (cpu #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12)
    ..etc..
    

Here's a table of incremental affinity values. Note that the values are bit fields, and not a cpu count. This is so that one can uniquely specify any arrangement of physical cpus to be enabled:


    Affinity  CPU#      Binary  -- Cpu# --   Total
    Value     Enabled   Equiv   0 1 2 3 4 5  Cpus
    --------  --------  ------  - - - - - -  -----
    1         0         00001   X - - - - -  1
    2         1         00010   - X - - - -  1
    3         0,1       00011   X X - - - -  2
    4         2         00100   - - X - - -  1
    5         0,2       00101   X - X - - -  2
    6         1,2       00110   - X X - - -  2
    7         0,1,2     00111   X X X - - -  3
    8         3         01000   - - - X - -  1
    9         0,3       01001   X - - X - -  2
    a         1,3       01010   - X - X - -  2
    b         0,1,3     01011   X X - X - -  3
    :         :         :
    e         1,2,3     01110   - X X X - -  3
    f         0,1,2,3   01111   X X X X - -  4
    10        4         10000   - - - - X -  1
    11        0,4       10001   X - - - X -  2
    :         :         :
   Negative Cache Description  
What is a 'negative cache'? (New in Rush 102.42)

Hostname lookups are a big part of rush's operation, and it's important that hostname lookups occur quickly.

When a bad hostname is passed to rush, it quickly determines it's not in the hosts file, and passes the name on to DNS for resolution. DNS lookups are expensive; it can take a while for the DNS server to determine a hostname is unknown. This will cause rush to spend a few seconds waiting for DNS to say it's an unknown hostname.

During this time, the daemon may be unresponsive.

In situations where a misconfigured script or user typo is repeated many times, these repeat bad lookups can effectively end up causing a 'denial of service'.

By using a "negative cache", Rush keeps track of unknown hostnames, so when a bad hostname is requested a second time, the daemon will immediately indicate the hostname is bad, based on its 'negative cache', avoiding the expensive DNS lookups and unresponsiveness. The reasoning being, if DNS just said the hostname was unknown 5 seconds ago, it is likely that will still be the case now, so just save DNS the trouble, and indicate it's unknown immediately.

The only problem with this is if the sysadmin recently added a new host to DNS. If the bad name is now a good name, the negative cache will prevent the name from being passed to DNS and being resolved.

The solution is to make the negative cache time out automatically after a given amount of time. By default, the first request for a bad hostname (i.e. a hostname that fails a DNS lookup) will be 'negative cached' in the daemon for up to 15 minutes. The 15 minute value is configurable in the rush hosts file via the negcachesecs value.

The 'negative cache' for a daemon can be inspected via the 'rush -lah <hostname>' command; the current negative cache will be shown at the bottom of the 'rush -lah' report.

When the 15 minutes are up, the daemon clears that entry from the cache, so the next request will be checked through DNS again, just in case it's now a valid hostname. (if it's not, it will be negative cached again, as above)

Whenever the rush/etc/hosts file is reloaded, rush will automatically clear the negative cache. This will happen within 60 seconds of the hosts file's date stamp changing, or if the sysadmin invokes the 'rush -reload' command. To force the entire network to clear the negative cache, and reload the rush/etc/hosts files without sending out a new hosts file, the sysadmin can use 'rush -reload hosts +any -t 4'.
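
The behavior described in this section (remember failures, expire them after a timeout, clear everything on reload) can be modeled in a few lines. The Python sketch below is a toy illustration only, not Rush's implementation; the class name `NegativeCache` and its methods are hypothetical, and the clock is injectable so the timeout can be demonstrated without waiting.

```python
import time

class NegativeCache:
    """Toy negative cache with a timeout (not Rush code)."""

    def __init__(self, secs=900, clock=time.monotonic):
        self.secs = secs          # 0 disables negative caching
        self.clock = clock
        self.bad = {}             # hostname -> time the failure was recorded

    def add(self, name):
        """Record a failed lookup, unless negative caching is disabled."""
        if self.secs > 0:
            self.bad[name] = self.clock()

    def is_bad(self, name):
        """True if the name failed recently enough to skip the DNS lookup."""
        stamp = self.bad.get(name)
        if stamp is None:
            return False
        if self.clock() - stamp >= self.secs:
            del self.bad[name]    # expired: let the next lookup reach DNS
            return False
        return True

    def clear(self):
        """Forget everything, as on a hosts file reload."""
        self.bad.clear()

now = [0.0]                                   # fake clock for the demo
cache = NegativeCache(900, clock=lambda: now[0])
cache.add("typo-host")
print(cache.is_bad("typo-host"))              # True
now[0] = 900.0                                # 15 minutes later
print(cache.is_bad("typo-host"))              # False
```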