Hi rush.general,
We've noticed rush appearing to slow down as we have increased the
number of connected hosts and running jobs.
What we see is that it can take a very long time for rush to contact
some of the hosts. As a result, some applications, such as irush,
report hosts as down even when they are not: they are up and happily
rendering away. For example:
manta:~ lrcole$ time rush -ping loaner5
loaner5: RUSHD 102.42 PID=776 Boot=12/29/05,11:29:52 Online,
0 jobs, 1 procs, 544 tasks, dlog=-, nfd=4
real 0m8.753s
user 0m0.041s
sys 0m0.017s
And another:
manta:~ lrcole$ time rush -ping loaner16
loaner16: read error: 40 second timeout from loaner16
real 0m40.125s
user 0m0.041s
sys 0m0.018s
manta:~ lrcole$
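In case it helps anyone reproduce this, one way to see how widespread the slowness is would be to time `rush -ping` against every host and flag the slow ones. Below is a minimal bash sketch, not a rush feature. It assumes that a healthy reply contains the string `RUSHD` (as in the loaner5 output above) and treats anything else, such as the 40 second timeout, as no reply. The `SLOW_SECS` threshold (5 seconds here) and the `PING_CMD` override are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: time "rush -ping" against each host given on the
# command line and label each one ok, SLOW, or NO-REPLY.
# SLOW_SECS is an assumed threshold, not a rush setting.
SLOW_SECS=${SLOW_SECS:-5}
# PING_CMD is a hypothetical override (handy for testing); it is left
# unquoted below on purpose so "rush -ping" splits into command + flag.
PING_CMD=${PING_CMD:-"rush -ping"}

survey_hosts() {
  local host start elapsed out status
  for host in "$@"; do
    start=$SECONDS
    out=$($PING_CMD "$host" 2>&1)
    elapsed=$(( SECONDS - start ))
    # Assumption: a successful ping prints a line containing "RUSHD".
    case "$out" in
      *RUSHD*) status=ok ;;
      *)       status=NO-REPLY ;;
    esac
    # A host that answered, but only after SLOW_SECS or more, is flagged.
    if [ "$status" = ok ] && [ "$elapsed" -ge "$SLOW_SECS" ]; then
      status=SLOW
    fi
    printf '%s\t%s\t%ss\n' "$host" "$status" "$elapsed"
  done
}

survey_hosts "$@"
```

Running it as, for example, `./survey.sh loaner5 loaner16` (the script name is arbitrary) prints one tab-separated line per host. Hosts are checked one at a time, so each host that times out adds roughly 40 seconds to the run. `$SECONDS` only has one-second resolution, which is enough to separate sub-second replies from the 8 and 40 second cases above.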
We presently have 250 running jobs and, I think, nearly 100 machines
on the farm (some of these are workstations, so not all of them are
rendering all the time).
I imagine that these apps are simply timing out while trying to query
some of the machines and, as a result, assume that those machines are
unavailable. Has anyone else experienced problems like this, and does
anyone have suggestions on how we could address the issue?
Our rush license server is often very heavily loaded (it also serves
files). Could this be a factor?
Thank you,
---
Luke Cole
Systems Administrator / TD
FUEL International
65 King St., Newtown, Sydney NSW, Australia 2042