From: Luke Cole <luke@(email suppressed).au>
Subject: Re: rush slowness/timeouts
   Date: Mon, 02 Jan 2006 22:39:16 -0500
Msg# 1166
Hi Greg,

OK - I will investigate further and get back to you with more details - in our case the machines are Windows XP hosts, so I'm not sure how much control over process nice-ness that will allow.

Thanks,

Luke

On 03/01/2006, at 1:41 PM, Greg Ercolano wrote:

[posted to rush.general]

Luke Cole wrote:

When I run the rush -tasklist command on loaner16 I get a list with almost 420 entries. Does that sound reasonable?


    Yes, that sounds normal for a dual proc machine, or for two
    tier submissions.


I checked loaner16 - it is reporting 99% of the CPU as being
utilised by mayabatch.exe, so it's definitely running slowly because of that.


    Sounds like you should dig a little deeper, since it's normal
    for renders to use 99% of the cpu; rushd shouldn't act unresponsive
    for 40 seconds unless something very extreme is going on.

    Is the rushd.log file complaining on these machines? I'd expect
    to see errors like 'connection reset by peer' due to the
    unresponsiveness, but I'm wondering if there are other errors that
    might indicate rushd is stuck doing e.g. bad hostname lookups, or
    some such.

    For instance, does running 'rush -lah' on the slow-to-respond host
    show ???'s for any entries in the report? That might indicate bad
    or unresponsive DNS.

    Does the 'rush -lah' report take a long time to come up? (That
    might be symptomatic of a slow-to-respond name lookup system. Or,
    if OSX, possibly you have the .local Rendezvous/Bonjour Multicast
    DNS disease, in which case make sure you have
    HOSTNAME=<name_of_host> and not HOSTNAME=-AUTOMATIC-
    in the /etc/hostconfig)
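
One way to test the name lookup theory outside of rush is to time the lookups yourself from the slow host. The following is only a rough Python sketch, not anything rush provides: the function names and the 2 second threshold are assumptions made for illustration, and the resolver is passed in as a parameter so the logic can be tried without touching real DNS.

```python
import socket
import time

def time_lookup(name, resolver=socket.gethostbyname):
    """Return (address_or_None, seconds_taken) for one hostname lookup."""
    start = time.monotonic()
    try:
        addr = resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError
        addr = None
    return addr, time.monotonic() - start

def slow_or_failed(names, threshold=2.0, resolver=socket.gethostbyname):
    """List the hostnames whose lookup failed or took longer than threshold."""
    bad = []
    for name in names:
        addr, secs = time_lookup(name, resolver)
        if addr is None or secs > threshold:
            bad.append(name)
    return bad
```

Calling slow_or_failed() with the hostnames from your own hosts file should return an empty list on a healthy network; any names it does return are the ones worth checking against DNS.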

    rushd being unresponsive due to rendering sounds like it /could/ be
    the cause IF the machine's resources are being taxed by the render
    (ie. the renders are using too much ram, or are starting too many
    threads for the number of cpus the machine has)

    If maya is involved: if your machines are dual procs, and each box
    is configured with '2' for CPUS in the rush/etc/hosts file, then
    rush will try to start two invocations of maya per box. In that
    case you'd better make sure the maya renders have the '-n 1' flag
    set, to prevent /each/ instance of maya from trying to use /both/
    processors..! (causing the renders to step on each other, and the
    rest of the machine, including rushd)
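
The arithmetic behind that warning can be written out as a small Python sketch. The function and parameter names are hypothetical and only illustrate the counting; nothing here reads rush's or maya's actual configuration.

```python
def render_threads(cpus_in_rush_hosts, threads_per_render):
    """Total render threads when rush starts one render per configured CPU."""
    return cpus_in_rush_hosts * threads_per_render

def is_oversubscribed(physical_cpus, cpus_in_rush_hosts, threads_per_render):
    """True if the renders want more threads than the box has processors."""
    return render_threads(cpus_in_rush_hosts, threads_per_render) > physical_cpus
```

On a dual proc box with CPUS set to 2, two renders that each use both processors want 4 threads on 2 cpus, whereas with one thread per render the two renders want exactly 2.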

Or, make sure the ram isn't being overused, causing the box to thrash,
    as that will steal cpu from everything, including the renders.

    It's normal for renders to use 99% of the cpu; the unix scheduler
    should still be able to yield cpu to rushd under those conditions.
    The only situations I can think of where rushd wouldn't be getting
    enough cpu would be:

a) The renders are running at a higher system priority (lower niceness)
           than rushd

b) The kernel is using some kind of 'decaying' scheduling, giving rushd
           a lower priority than it should have. Possibly you can fix
           this by adjusting rushd's system priority with 'renice'

c) The machine is swapping due to the renders using more ram than they
           should be, causing other processes (like rushd) to swap out
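
Case (a) can be checked by comparing nice values on a unix host. Below is a minimal Python sketch that assumes you have captured the text of 'ps -eo pid,nice,comm' yourself; the SAMPLE text is made up for illustration and is not output from a real render node, and the function names are hypothetical.

```python
import os

# Made-up sample in the shape of 'ps -eo pid,nice,comm' output.
SAMPLE = """\
  PID  NI COMMAND
  101   0 rushd
  202  -5 maya
  303  10 maya
  404   0 sshd
"""

def parse_nice(ps_text):
    """Parse ps output into {pid: (nice, command_basename)}."""
    procs = {}
    for line in ps_text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, nice, comm = fields
        try:
            procs[int(pid)] = (int(nice), os.path.basename(comm.strip()))
        except ValueError:
            continue  # e.g. a '-' in the nice column
    return procs

def favoured_over(procs, daemon="rushd"):
    """Commands running at a lower nice value (higher priority) than daemon."""
    daemon_nice = [n for n, c in procs.values() if c == daemon]
    if not daemon_nice:
        return []
    floor = min(daemon_nice)
    return sorted({c for n, c in procs.values() if n < floor})
```

With the sample above, favoured_over(parse_nice(SAMPLE)) returns ['maya'], because one maya process has a nice value of -5 while rushd is at 0.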

    Note that Rush's priority values (+any@200) have nothing to do with
    what I'm calling the system priority (PRI column in 'ps -lax').
    The rush priority is used by rush only to determine which user's
    render should be started next, and has nothing to do with the
    priority of processes once they're running.

    The only value in rush that affects the system priority of running
    processes is the 'nice' value the job is submitted with, which is
    the 'niceness' the renders will run at.

    Regarding a/b, you might want to experiment with lowering the
    system priority of the renders (ie. submitting them with a higher
    'nice' value), so that they don't bog the box down.
    (Won't help if your problem is 'c')

    If the problem is 'c', then you might have to get more ram for your
    boxes, or tell rush to run only one render per box. (Or if it's one
    user's particular job using all the ram, have them try to optimize
    the job so that it doesn't use so much ram)
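
A quick back-of-envelope check for case (c), as a Python sketch. All of the names and figures are hypothetical; the ram a render really needs has to be measured on your own jobs.

```python
def max_renders_without_swap(ram_mb, reserved_mb, per_render_mb):
    """How many renders fit in ram after reserving some for the OS and rushd."""
    if per_render_mb <= 0:
        raise ValueError("per_render_mb must be positive")
    usable = max(0, ram_mb - reserved_mb)
    return usable // per_render_mb
```

For example, with 2048 MB of ram, 512 MB reserved and renders that each need about 1024 MB, only one render fits, so running two per box would push the machine into swap.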

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)



---
Luke Cole
Systems Administrator / TD

FUEL International
65 King St., Newtown, Sydney NSW, Australia 2042


