Dylan Penhale wrote:
[posted to rush.general]
> We have the odd one or two machines that are sometimes slow to respond
> to anything (including rush -pings) when they are under a very heavy
> render load. I know one way to solve this is to set the renders at a low
> run level, probably by starting them from a START /BELOWNORMAL wrapper,
> but I have also been thinking of QOS on the interface.
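For reference, a minimal Python sketch of the kind of low-priority wrapper mentioned above. It is an assumption, not part of rush: `run_below_normal` is a hypothetical helper, and `cmd` is whatever argument list you would normally use to start the render. On Windows it uses `subprocess.BELOW_NORMAL_PRIORITY_CLASS`, the same priority class START /BELOWNORMAL selects; on Linux/macOS it raises the child's nice value instead.

```python
import os
import subprocess
import sys


def run_below_normal(cmd):
    """Run cmd (a list of arguments) at reduced cpu priority; return its exit code."""
    if sys.platform == "win32":
        # Same priority class that START /BELOWNORMAL selects.
        flags = subprocess.BELOW_NORMAL_PRIORITY_CLASS
        return subprocess.run(cmd, creationflags=flags).returncode
    # POSIX: raise the child's nice value by 10 just before it execs.
    return subprocess.run(cmd, preexec_fn=lambda: os.nice(10)).returncode
```

Usage would be something like `run_below_normal(["render", "-f", "1-10", "scene.ma"])`, where the render command is purely hypothetical. Lowering priority only helps when the cpus are contended; it does nothing if the render is overusing memory and swapping.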
Traffic shaping, network throttling, and QOS are all used
to manage network bandwidth issues.
But I don't think network bandwidth is the issue in your case;
it sounds to me like the box is thrashing, and not giving any
cpu to rushd.
Rushd overusing network bandwidth (or even cpu) on a render node
is the /last/ thing I'd expect you to see. Rushd processes on render nodes
don't do very much, especially when the cpus are busy rendering; rushd is
usually just waiting for renders to finish. It has a little activity when
a cpu becomes idle, because it wants to get a new job running on that cpu
asap.
Whether QOS will help or not depends on what's going on
with the box:
1) Does the task manager show the cpus pegged due to render
activity or swapping? Is memory use pegged?
2) Is the desktop temporarily frozen or unresponsive to moving
windows around?
If the answer to any of these is yes, network QOS won't help, because
the machine is thrashing and not giving rushd any cpu. Rushd appears
unresponsive simply because it isn't getting the cpu time it needs to
respond.
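To answer question 1 on a Linux render node (on Windows XP, Task Manager or perfmon show the same information), one rough check is the kernel's cumulative swap counters. A minimal sketch, assuming the `pswpin`/`pswpout` fields of `/proc/vmstat`; the function name and sampling interval are illustrative, not part of rush:

```python
import time


def swap_pages_per_sec(interval=1.0, vmstat_path="/proc/vmstat"):
    """Return pages swapped in plus out per second over `interval` seconds (Linux)."""

    def swapped_pages():
        # /proc/vmstat has one "name value" pair per line; pswpin/pswpout
        # are cumulative counts of pages swapped in and out since boot.
        total = 0
        with open(vmstat_path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
                    total += int(parts[1])
        return total

    before = swapped_pages()
    time.sleep(interval)
    after = swapped_pages()
    return (after - before) / interval
```

A sustained nonzero rate while renders run, together with memory use near 100%, is the thrashing signature described above.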
I believe you indicated in a previous email that the render
process(es) were swapping the box, overusing memory, causing
the machine to thrash.
When a box is thrashing, it won't give the cpu to processes like
rushd, because it's busy swapping pages between memory and disk.
This is why swapping is such a bad thing, and why machines usually
behave so badly when, e.g., a render is overusing memory.
The only situation I can think of where QOS would help is if the
box's network interface is completely saturated with I/O from some
other process (eg. rendering I/O), and you want QOS to give rush's
packets higher priority than the renderer's I/O traffic, so that
rush stays responsive.
But that's a fairly unlikely scenario for renders.. comps maybe,
or realtime video. I'd only expect network bottlenecking on really
slow network interfaces (eg. 10Mbit ethernet on a 1GHz machine,
or 100Mbit ethernet on a dual proc 2GHz box doing high speed I/O).
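For a rough sense of scale at those link speeds: the best-case transfer time is bytes * 8 divided by the link rate in bits per second. The 50 MB file size below is a made-up example, and real throughput is lower once protocol overhead and contention are counted.

```python
def transfer_seconds(num_bytes, link_mbit_per_sec):
    """Best-case seconds to move num_bytes over a link rated in Mbit/s.

    Ignores protocol overhead, so real transfers take longer.
    """
    return num_bytes * 8 / (link_mbit_per_sec * 1_000_000)


frame = 50_000_000  # hypothetical 50 MB image file
print(transfer_seconds(frame, 10))   # 10Mbit ethernet  -> 40.0
print(transfer_seconds(frame, 100))  # 100Mbit ethernet -> 4.0
```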
A machine that's network bound usually has cpus that are /not/ pegged,
because they're all waiting on I/O for the network bottleneck to clear.
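The questions above, plus the network-bound case, can be condensed into a rough triage rule. This is a hypothetical helper with made-up thresholds, not anything rush provides; the inputs would come from whatever monitoring you already use (Task Manager, perfmon, /proc, etc.).

```python
def triage(cpu_busy_pct, mem_used_pct, swap_pages_per_sec, net_util_pct):
    """Rough guess at why a render node is slow to answer rush -ping.

    All thresholds are illustrative assumptions, not tuned values.
    """
    # Memory pegged and heavy swapping: the box is thrashing.
    if mem_used_pct >= 95 and swap_pages_per_sec > 100:
        return "thrashing: QOS won't help; reduce the render's memory use"
    # Network saturated while cpus sit mostly idle waiting on I/O.
    if net_util_pct >= 90 and cpu_busy_pct < 50:
        return "network bound: QOS might help prioritize rush traffic"
    # Cpus pegged by rendering, no swapping.
    if cpu_busy_pct >= 95:
        return "cpu bound: QOS won't help; consider lower render priority"
    return "no obvious bottleneck"
```

Note the ordering: thrashing is checked first because a swapping box can also show pegged cpus.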
> How does rush deal with the QOS Packet Scheduler under Windows XP, if at all?
It deals with QOS the same way it deals with a network that drops
packets. (It's really the same thing.)
Traffic shaping etc. is the applied logic of dropping or delaying
packet delivery to implement network bandwidth control. Kind of like
how a traffic light delays traffic to allow cross traffic through
(packet delay), QOS can delay packet delivery. And when traffic
gets *really* snarled, the QOS steam shovel appears, plowing the
snarled traffic off a cliff (ie. dropping packets so that cross
traffic can flow more smoothly).
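The traffic-light and steam-shovel picture maps onto a classic token-bucket shaper: packets that fit the rate go out, packets that exceed it wait in a short queue (the traffic light), and arrivals beyond the queue limit are dropped (the steam shovel). This is a generic illustration with hypothetical names and an explicit `now` timestamp so it can be stepped by hand; it is not how the Windows XP QOS Packet Scheduler is implemented.

```python
from collections import deque


class Shaper:
    """Token-bucket traffic shaper with a bounded delay queue.

    rate_bps: bytes of credit added per second; burst: bucket size in bytes;
    queue_limit: max packets held back before new arrivals are dropped.
    """

    def __init__(self, rate_bps, burst, queue_limit):
        self.rate = rate_bps
        self.burst = burst
        self.tokens = burst
        self.queue_limit = queue_limit
        self.queue = deque()
        self.last = 0.0

    def _refill(self, now):
        # Accrue credit for elapsed time, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def offer(self, now, size):
        """A packet of `size` bytes arrives; return 'sent', 'delayed' or 'dropped'."""
        self._refill(now)
        if not self.queue and self.tokens >= size:
            self.tokens -= size
            return "sent"
        if len(self.queue) < self.queue_limit:
            self.queue.append(size)  # traffic light: hold it for later
            return "delayed"
        return "dropped"  # steam shovel: queue full, discard

    def drain(self, now):
        """Release queued packets the bucket can now afford; return how many."""
        self._refill(now)
        sent = 0
        while self.queue and self.tokens >= self.queue[0]:
            self.tokens -= self.queue.popleft()
            sent += 1
        return sent
```

From the sender's point of view, a delayed packet just arrives late and a dropped one never arrives, which is why rush can treat QOS like any other lossy network.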
> My understanding is that it doesn't. By default I disable the QOS Packet
> Scheduler in the interface on the Windows machines, thinking it's
> mainly for streaming services, but I wonder if there are any QOS-aware
> applications that use it on render machines?
If a packet rush sends doesn't reach the remote, rush tries again
a few seconds later, same as other network applications.
But if the rushd application isn't getting any cpu, the remote
will just keep trying to contact it until it times out (rush -ping),
or until rushd eventually responds.
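A minimal sketch of that retry-until-timeout pattern, assuming a hypothetical `send_once` callable that sends one request and reports whether a reply came back. The 3-second interval and 30-second timeout are placeholders, not rush's actual values; `clock` and `sleep` are injectable so the loop can be tested without waiting.

```python
import time


def retry_until_timeout(send_once, timeout=30.0, retry_interval=3.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call send_once() until it reports a reply or the timeout runs out.

    Returns (got_reply, attempts).
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if send_once():
            return True, attempts
        # Give up if waiting another interval would pass the deadline.
        if clock() + retry_interval > deadline:
            return False, attempts
        sleep(retry_interval)
```

To the sender, a dropped packet and a rushd starved of cpu look the same: unanswered attempts until either a reply arrives or the deadline passes.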
Ethernet technology is inherently error prone; dropped packets are
'life as usual' on any ethernet. Packet loss is part of the ethernet
design. All applications (including rush) have to deal with it
seamlessly.
Rush throttles back when a network appears to be lossy; this is
what the <backoff_rate> and backoff_min/max values control, to prevent
further saturating an already saturated network. When the network
becomes responsive again, the backoff rates drop back to normal.
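Rush's actual backoff formula isn't spelled out here, so the following is only a hypothetical model of the idea: while the network looks lossy, multiply the retry delay by a rate and clamp it between a minimum and a maximum; once replies come back, return to the minimum. The parameter names echo backoff_rate and backoff_min/max, but the units, defaults, and snap-back behavior are assumptions, not rush.conf semantics.

```python
def next_backoff(current, lossy, backoff_rate=2.0, backoff_min=1.0, backoff_max=30.0):
    """Hypothetical backoff model: return the next retry delay in seconds."""
    if lossy:
        # Network looks saturated: back off multiplicatively, within bounds.
        return min(backoff_max, max(backoff_min, current * backoff_rate))
    # Network responsive again: drop back to the normal (minimum) delay.
    return backoff_min
```

Starting from 1 second on a lossy network, this yields delays of 2, 4, 8, 16, 30, 30, ... and returns to 1 second as soon as the network responds.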
--
Greg Ercolano, erco@(email surpressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)