From: Luke Cole <luke@(email surpressed).au>
Subject: Re: rush slowness/timeouts
   Date: Mon, 02 Jan 2006 20:50:17 -0500
Msg# 1163
Hi Greg,

Need some more info: regarding the unresponsive machines 'loaner5'
and 'loaner16', do you think rush is slow because the rush daemon is busy,
or because the machine is thrashing due to rendering?

It's important to determine if the rushd is busy, or if the machine
is busy due to rendering.

Ah yes. It would be getting hammered by rendering; of course, if the machine is maxed out rendering, it will be slow to respond to rush! :)

I checked loaner16 - it is reporting 99% of the CPU as being utilised by mayabatch.exe, so it's definitely running slowly because of that.

When a machine is not responsive to 'rush -ping', try ssh/rsh'ing over to that machine and look at 'top' and/or the output of e.g. 'vmstat 3'. Is rushd using up all the CPU, or is a render? Is the machine swapping due to unavailable RAM? Does rsh/ssh not even respond when trying to connect to the machine? If so, the renders may be using too much RAM, swapping the machine to death.
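If you want to script the "is it swapping?" check instead of eyeballing it, something like the following Python sketch could work. It assumes the Linux procps `vmstat` layout, where the second header row names the columns and `si`/`so` are pages swapped in/out; the function names are made up for illustration. The first data row vmstat prints is an average since boot, so it is skipped.

```python
# Hypothetical helpers: assume Linux procps `vmstat` output, whose second
# header row names the columns (r b swpd free buff cache si so ...).

def swap_samples(vmstat_output: str) -> list[tuple[int, int]]:
    """Return (si, so) pairs, one per data row of `vmstat N` output."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # The column-name row is the first one containing both 'si' and 'so'.
    header = next(cols for cols in rows if "si" in cols and "so" in cols)
    si_col, so_col = header.index("si"), header.index("so")
    samples = []
    for cols in rows:
        # Data rows have one numeric field per column; this skips the
        # 'procs ...' banner and any repeated column-name rows.
        if len(cols) == len(header) and cols[0].isdigit():
            samples.append((int(cols[si_col]), int(cols[so_col])))
    return samples


def is_swapping(vmstat_output: str) -> bool:
    """True if any sample after the first shows pages swapped in or out."""
    # vmstat's first row is an average since boot, so ignore it.
    return any(si > 0 or so > 0 for si, so in swap_samples(vmstat_output)[1:])
```

Sustained non-zero `si`/`so` values while a render is running would point at RAM exhaustion rather than a busy rushd.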

I didn't think to check machine performance, as it hasn't really been a problem before. Most of our dedicated render machines are dual-CPU hosts, though, which could be why they are a bit more responsive (loaner16 is a rented single-CPU host). Thanks for pointing that out; I hadn't thought to check the load on the troublesome hosts themselves.

Or, possibly rush is being kept busy; what is the output of a successful
'rush -tasklist loaner16'? If the list is huge, possibly users are submitting with too many +any specifications. For instance, if there are 250 jobs each asking for:

    +any=3@200 +any=5@150 +any=10@100 +any=20@50

...that will make four entries on each host, multiplying the complexity
to rush by 4 (4 specs per job * 250 jobs = 1000 active tasks).

...consider instead using just a two-tier submission:

    +any=3@200 +any=20@50
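The arithmetic above can be sketched in a few lines of Python. This assumes every job uses the same spec string and that each spec has the form `+any=CPUS@PRIORITY` as in the examples; the function names are hypothetical. It just counts one task entry per spec per job.

```python
import re

# Assumption: each spec has the form +any=CPUS@PRIORITY, as in the examples.
ANY_SPEC = re.compile(r"\+any=(\d+)@(\d+)")


def parse_any_specs(spec_line: str) -> list[tuple[int, int]]:
    """Split a '+any=CPUS@PRIORITY ...' string into (cpus, priority) pairs."""
    return [(int(cpus), int(pri)) for cpus, pri in ANY_SPEC.findall(spec_line)]


def task_entries(spec_line: str, num_jobs: int) -> int:
    """Entries each host tracks: one per +any spec, per job."""
    return len(parse_any_specs(spec_line)) * num_jobs


four_tier = "+any=3@200 +any=5@150 +any=10@100 +any=20@50"
two_tier = "+any=3@200 +any=20@50"
print(task_entries(four_tier, 250))  # 1000
print(task_entries(two_tier, 250))   # 500
```

Dropping from four tiers to two halves the number of entries rush has to juggle for the same 250 jobs.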

When I run the rush -tasklist command on loaner16 I get a list with almost 420 entries. Does that sound reasonable?

Our rush license server is often very heavily loaded up (it also serves files) - could this be a factor?


Not likely, as the rushd daemons only communicate with the license
server on boot.

Unless, that is, your license server is also acting as a job server for jobs (i.e. submitting jobs to the license server, such that jobs have jobids
with the license server's hostname in them).
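To double-check that no jobs are being served by the license host, a quick filter over a list of jobids would do. This Python sketch assumes a jobid takes the form `<host>.<number>`, naming the host that serves the job; that format, the function name, and the hostnames below are assumptions for illustration, and how you collect the jobids is left open.

```python
# Assumption: a jobid has the form <serverhost>.<number> (e.g. the
# hypothetical "licsrv.12"), i.e. it names the host that serves the job.

def jobs_served_by(jobids: list[str], server: str) -> list[str]:
    """Return the jobids whose host part matches `server` (case-insensitive)."""
    matches = []
    for jobid in jobids:
        # Split on the last '.', so a dotted hostname stays intact.
        host, dot, number = jobid.rpartition(".")
        if dot and number.isdigit() and host.lower() == server.lower():
            matches.append(jobid)
    return matches


# Hypothetical hostnames, for illustration only.
print(jobs_served_by(["licsrv.12", "render1.40", "LICSRV.7"], "licsrv"))
```

An empty result would be consistent with the license server acting as a license host only.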

That should be OK then - the server is a license host only - we host the jobs on some other machines.

Thank you for your help!

---
Luke Cole
Systems Administrator / TD

FUEL International
65 King St., Newtown, Sydney NSW, Australia 2042


