From: Luke Cole <luke@(email surpressed).au>
Subject: Re: rush slowness/timeouts
   Date: Mon, 02 Jan 2006 22:39:16 -0500

Msg# 1166

Hi Greg,

OK - I will investigate further and get back to you with more details. In our case the machines are Windows XP hosts, so I'm not sure how much control over process nice-ness that will allow.


Thanks,

Luke

On 03/01/2006, at 1:41 PM, Greg Ercolano wrote:

[posted to rush.general]

Luke Cole wrote:
When I run the rush -tasklist command on loaner16 I get a list with almost 420 entries. Does that sound reasonable?
    Yes, that sounds normal for a dual proc machine, or for two
    tier submissions.
I checked loaner16 - it is reporting 99% of the CPU as being
utilised by mayabatch.exe, so it's definitely running slowly because of that.
    Sounds like you should dig a little deeper, since it's normal
    for renders to use 99% of the cpu; rushd shouldn't act unresponsive
    for 40 seconds unless something very extreme is going on.

    Is the rushd.log file complaining on these machines? I'd expect
    to see errors like 'connection reset by peer' due to the
    unresponsiveness, but I'm wondering if there are other errors that
    might indicate rushd is stuck doing eg. bad hostname lookups, or
    some such. For instance, does running 'rush -lah' on the
    slow-to-respond host show ???'s for any entries in the report?
    That might indicate bad or unresponsive DNS.

    Does the 'rush -lah' report take a long time to come up? (might be
    symptomatic of a slow to respond name lookup system, or if OSX,
    possibly you have the .local Rendezvous/Bonjour Multicast DNS
    disease, in which case make sure you have HOSTNAME=<name_of_host>
    and not HOSTNAME=-AUTOMATIC- in the /etc/hostconfig)

    rushd being unresponsive due to rendering sounds like it /could/ be
    the cause IF the machine's resources are being taxed by the render
    (ie. the renders are using too much ram, or are starting too many
    threads for the number of cpus the machine has)

    If maya is involved, keep in mind that if your machines are dual procs,
    and in the rush/etc/hosts file each box is configured with '2' for CPUS,
    then rush will try to start two invocations of maya per box. In that
    case you'd better make sure the maya renders have the '-n 1' flag set,
    to prevent /each/ instance of maya from trying to use /both/
    processors..! (causing the renders to step on each other, and the rest
    of the machine, including rushd)

    Or, make sure the ram isn't being overused, causing the box to thrash,
    as that will steal cpu from everything, including the renders.
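
To make the '-n 1' point concrete, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration, and it assumes rush starts one render per CPU listed in the hosts file, as described above:

```python
def render_threads(cpus_in_hosts_file, threads_per_render):
    """Total render threads if rush starts one render per configured CPU."""
    return cpus_in_hosts_file * threads_per_render


def is_oversubscribed(physical_cpus, cpus_in_hosts_file, threads_per_render):
    """True if the renders want more threads than the box has processors."""
    return render_threads(cpus_in_hosts_file, threads_per_render) > physical_cpus


# Dual proc box, CPUS=2, each maya trying to use both processors:
print(is_oversubscribed(2, 2, 2))   # 4 threads competing for 2 cpus

# Same box with each maya limited to one processor ('-n 1'):
print(is_oversubscribed(2, 2, 1))   # 2 threads on 2 cpus
```

The first case is the one where the renders step on each other; the second is what the '-n 1' flag is meant to achieve.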

    It's normal for renders to use 99% of the cpu; the unix scheduler
    should still be able to yield cpu to rushd under those conditions.
    The only situations I can think of where rushd wouldn't be getting
    enough cpu would be:
        a) The renders are running at a higher system priority (lower niceness)
           than rushd
        b) The kernel is using some kind of 'decaying' scheduling, giving rushd
           a lower priority than it should. Possibly you can fix this by adjusting
           rushd's system priority with 'renice'
        c) The machine is swapping due to the renders using more ram than they
           should be, causing other processes (like rushd) to swap out
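
One way to check for case (a) on a unix box is to compare nice values in a process listing. The following is only a sketch: it assumes output in the shape produced by something like `ps -eo pid,ni,comm` on Linux, and the sample listing and helper names are hypothetical, not taken from a real machine or from rush itself:

```python
def parse_ps(ps_output):
    """Parse 'PID NI COMMAND' style ps output into a list of dicts."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:   # skip the header line
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, ni, comm = parts
        try:
            rows.append({"pid": int(pid), "ni": int(ni), "comm": comm.strip()})
        except ValueError:
            continue   # some systems print '-' instead of a nice value
    return rows


def higher_priority_than(rows, daemon="rushd"):
    """Return processes whose niceness is lower than the daemon's."""
    daemon_nice = [r["ni"] for r in rows if r["comm"] == daemon]
    if not daemon_nice:
        return []   # daemon not in the listing; nothing to compare against
    threshold = min(daemon_nice)
    return [r for r in rows if r["comm"] != daemon and r["ni"] < threshold]


# Hypothetical listing, for illustration only.
SAMPLE = """\
  PID  NI COMMAND
  101   0 rushd
  202  -5 maya
  303  10 maya
"""

for proc in higher_priority_than(parse_ps(SAMPLE)):
    print(proc["pid"], proc["comm"], proc["ni"])
```

In the sample, only the maya process at niceness -5 is reported, since it would be scheduled ahead of rushd at niceness 0.
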

    Note that Rush's priority values (+any@200) have nothing to do with what I'm
    calling the system priority (PRI column in 'ps -lax'). The rush priority is
    used by rush only to determine which user's render should be started next,
    and has nothing to do with the priority of processes once they're running.

    The only value in rush that affects the system priority of running processes
    is the 'nice' value the job is submitted as, which is the 'niceness' the renders
    will run as.

    Regarding a/b, possibly you might want to experiment with adjusting the
    niceness level of the renders down, so that they don't bog the box down.
    (Won't help if your problem is 'c')

    If the problem is 'c', then you might have to get more ram for your boxes,
    or tell rush not to render more than one render per box, so that only
    one render runs per box. (Or if it's one user's particular job using all
    the ram, have them try to optimize the job so that it doesn't use so much ram)

--
Greg Ercolano, erco@(email surpressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)


---
Luke Cole
Systems Administrator / TD

FUEL International
65 King St., Newtown, Sydney NSW, Australia 2042
