From: Luke Cole <luke@(email surpressed).au>
Subject: rush slowness/timeouts
   Date: Mon, 02 Jan 2006 20:11:50 -0500
Msg# 1160
Hi rush.general,

We've noticed that rush appears to slow down as we have increased the number of connected hosts and running jobs. It can take a very long time for rush to contact some of the hosts, and as a result some applications, such as irush, report hosts as being down even when they are up and happily rendering away. For example:

manta:~ lrcole$ time rush -ping loaner5
loaner5: RUSHD 102.42 PID=776 Boot=12/29/05,11:29:52 Online, 0 jobs, 1 procs, 544 tasks, dlog=-, nfd=4

real    0m8.753s
user    0m0.041s
sys     0m0.017s

And another:

manta:~ lrcole$ time rush -ping loaner16
  loaner16: read error: 40 second timeout from loaner16

real    0m40.125s
user    0m0.041s
sys     0m0.018s
manta:~ lrcole$
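
To take the same measurement across many hosts at once, a loop along the following lines could be used. This is only a minimal sketch: the function name ping_times and the PING_CMD variable are made up for illustration, it assumes rush is on the PATH, and it makes no assumption about what exit code rush returns on a timeout (it just reports whatever comes back).

```shell
#!/bin/sh
# Sketch: time a ping command against each host given as an argument.
# PING_CMD is a hypothetical variable; it defaults to "rush -ping",
# the command used in the transcripts above.
PING_CMD=${PING_CMD:-"rush -ping"}

ping_times() {
    for host in "$@"; do
        start=$(date +%s)
        # $PING_CMD is deliberately unquoted so "rush -ping" splits into
        # a command and its flag.
        $PING_CMD "$host" >/dev/null 2>&1
        rc=$?
        end=$(date +%s)
        # One line per host: name, elapsed whole seconds, exit code.
        echo "$host $((end - start)) $rc"
    done
}
```

Calling it as `ping_times loaner5 loaner16 | sort -k2 -n` would list the hosts from fastest to slowest response, with a resolution of one second.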

We presently have 250 running jobs, and I think nearly 100 machines on the farm (some of these are workstations, so not all are rendering all the time).

I imagine that these apps are timing out while querying some of the machines and, as a result, assume that they are unavailable. Has anyone else experienced problems like this, and do you have suggestions on how we could address the issue?

Our rush license server is often very heavily loaded (it also serves files). Could this be a factor?

Thank you,
---
Luke Cole
Systems Administrator / TD

FUEL International
65 King St., Newtown, Sydney NSW, Australia 2042


