From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: QOS
   Date: Fri, 24 Feb 2006 01:57:10 -0500
Msg# 1249
Dylan Penhale wrote:
[posted to rush.general]

We have the odd one or two machines that are sometimes slow to respond to anything (including rush -pings) when they are under a very heavy render load. I know one way to solve this is to set the renders at a low run level, probably by starting them from a START /BELOWNORMAL wrapper, but I have also been thinking of QOS on the interface.

	Traffic shaping, network throttling, and QOS are all used
	to manage network bandwidth issues.

	But I believe in your case it's not a network bandwidth issue
	at hand; it sounds to me like the box is thrashing, and not
	giving any cpu to rushd.

	Rushd overusing network bandwidth (or even cpu) on a render node
	is the /last/ thing I'd expect you to see. Rushd processes on render nodes
	don't do very much.. esp when cpus are busy rendering. Rushd is usually
	just waiting for renders to finish. It has a little activity when
	a cpu becomes idle, because it wants to get a new job running on that cpu
	asap.

	Whether QOS will help or not depends on what's going on
	with the box:

		1) Does the task manager show the cpus pegged due to render
		   activity or swapping? Is memory use pegged?

		2) Is the desktop temporarily frozen or unresponsive to moving
		   windows around?

	If the answer to any of these is yes, network QOS won't help,
	because the machine is thrashing and not giving rushd any cpu.
	Rushd then appears unresponsive simply because it isn't getting
	the cpu time it needs to respond.

	I believe you indicated in a previous email that the render
	process(es) were swapping the box, overusing memory, causing
	the machine to thrash.

	When a box is thrashing, it won't yield the cpu to processes like
	rushd, because it gives priority to swapping. This is why swapping
	is such a bad thing, and why machines usually act pretty badly
	when eg. a render is overusing memory.

	The only situation I can think of where QOS would help is if the
	box's network interface is completely saturated with I/O from some
	other process (eg. rendering I/O), and you want to use QOS to increase
	priority to rush's traffic, so that rush packets have a higher priority
	than the renderer's I/O traffic, so as to be more responsive.

	But that's a fairly unlikely scenario for renders.. comps maybe,
	or realtime video. I'd only expect network bottlenecking on really
	slow network interfaces (eg. a 10Mbit ethernet on a 1GHz machine,
	or a 100Mbit ethernet on a dual proc 2GHz box doing high speed I/O).

	A machine that's network bound usually has cpus that are /not/ pegged,
	because they're all waiting on I/O for the network bottleneck to clear.
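
	The checks above can be boiled down to a rough rule of thumb.
	The sketch below is only that: the function name, the inputs, and
	the 90/95 percent thresholds are assumptions picked for illustration,
	not values rush uses or measures.

```python
def diagnose(cpu_pct, mem_pct, swapping, net_pct):
    """Rough guess at why a render node is slow to respond.

    cpu_pct, mem_pct and net_pct are utilization percentages (0-100);
    swapping is True if the box is paging heavily.
    All thresholds are illustrative assumptions.
    """
    # Swapping or pegged memory: the box is thrashing, so QOS won't help.
    if swapping or mem_pct >= 95:
        return "thrashing"
    # Saturated interface with cpus mostly waiting on I/O:
    # the one case where QOS might help.
    if net_pct >= 90 and cpu_pct < 90:
        return "network-bound"
    # Cpus pegged by the render itself: lowering render priority is the fix.
    if cpu_pct >= 90:
        return "cpu-bound"
    return "ok"


if __name__ == "__main__":
    print(diagnose(cpu_pct=100, mem_pct=99, swapping=True, net_pct=10))
    print(diagnose(cpu_pct=30, mem_pct=40, swapping=False, net_pct=95))
```

	Only the "network-bound" result points at a problem that
	prioritizing rush's traffic could improve.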

How does rush deal with QOS Packet Scheduler under windows XP if at all?

	It deals with QOS the same way it deals with a network that drops
	packets. (It's the same thing really)

	Traffic shaping etc is the applied logic of dropping or delaying
	packet delivery to implement network bandwidth control. Kind of like
	how a traffic light delays traffic to allow cross traffic through
	(packet delay), the QOS can delay packet delivery. And when traffic
	gets *really* snarled, the QOS steam shovel appears, plowing the
	snarled traffic off a cliff (ie. drops packets in favor of allowing
	cross traffic to flow more smoothly)
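
	As a rough illustration of the delay-then-drop logic described
	above, here is a minimal token bucket sketch in Python. The class
	name, the rate/burst/max_delay parameters, and the numbers are all
	made up for illustration; this is not how the XP QOS Packet
	Scheduler (or rush) is actually implemented.

```python
class TokenBucket:
    """Toy traffic shaper: send, delay, or drop each packet."""

    def __init__(self, rate, burst, max_delay):
        self.rate = rate            # refill rate, bytes per second
        self.burst = burst          # bucket capacity, bytes
        self.max_delay = max_delay  # longest wait before we drop, seconds
        self.tokens = burst         # start with a full bucket
        self.last = 0.0             # time of the previous offer()

    def offer(self, now, size):
        """Return ('send', 0.0), ('delay', seconds) or ('drop', None)."""
        # Refill for the time elapsed since the last packet.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

        deficit = size - self.tokens
        if deficit <= 0:
            self.tokens -= size
            return ("send", 0.0)

        wait = deficit / self.rate
        if wait > self.max_delay:
            # Traffic is too snarled: drop the packet ("off the cliff").
            return ("drop", None)

        # The "traffic light" case: hold the packet and go into token debt,
        # so packets queued behind it wait even longer.
        self.tokens -= size
        return ("delay", wait)


if __name__ == "__main__":
    tb = TokenBucket(rate=100, burst=100, max_delay=1.0)
    for t, size in [(0.0, 100), (0.0, 50), (0.0, 100), (2.0, 100)]:
        print(t, size, tb.offer(t, size))
```

	The sender of a dropped packet sees exactly what it would see
	on a lossy network: no reply, so it has to retry.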

My understanding is that it doesn't. By default I disable QOS Packet Scheduler in the interface on the windows machines, thinking it's mainly for streaming services, but I wonder if there are any QOS-aware applications that use it on render machines?

	If a packet rush sends doesn't reach the remote, it tries again
	later (on the order of a few seconds), same as other network
	applications.

	But if the rushd application isn't getting any cpu, the remote
	will just keep trying to contact it until it times out (rush -ping),
	or until rushd eventually responds.
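
	The retry-until-timeout behavior described above can be sketched
	generically as follows. This is not rush's code: the function name,
	the send callback, and the timeout/retry values are hypothetical,
	chosen only to show the pattern.

```python
import time


def contact(send, timeout=10.0, retry_every=2.0,
            clock=time.monotonic, sleep=time.sleep):
    """Keep calling send() until it succeeds or the timeout runs out.

    send() is a hypothetical callback that returns True once the remote
    has answered. Returns (succeeded, number_of_attempts).
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if send():
            return True, attempts
        # Give up if another retry would land past the deadline.
        if clock() + retry_every > deadline:
            return False, attempts
        sleep(retry_every)


if __name__ == "__main__":
    # Simulate a remote that is too starved for cpu to answer twice,
    # then finally responds on the third try.
    replies = iter([False, False, True])
    print(contact(lambda: next(replies), timeout=10.0, retry_every=0.1))
```

	From the caller's side, a dropped packet and a remote daemon
	that is starved for cpu look the same: no answer yet, try again.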

	Ethernet technology is inherently error prone; dropped packets are
	'life as usual' on any ethernet. Packet loss is part of the ethernet
	design. All applications (including rush) have to deal with it
	seamlessly.

	Rush throttles back when a network appears to be lossy; this is
	what the <backoff_rate> and backoff_min/max values control, to prevent
	further saturating an already saturated network. When the network
	becomes responsive again, the backoff rates drop back to normal.
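
	To give a feel for the throttling idea, here is a generic backoff
	sketch. The parameter names are borrowed from the settings
	mentioned above, but the multiply-and-clamp formula and the default
	numbers are assumptions made for illustration; check the rush
	documentation for what those settings really do.

```python
def next_delay(delay, lossy, backoff_min=1.0, backoff_max=60.0,
               backoff_rate=2.0):
    """Return the next retry delay in seconds.

    While the network looks lossy, grow the delay by backoff_rate up to
    backoff_max. Once it is responsive again, drop back to backoff_min.
    (Assumed semantics, not rush's documented behavior.)
    """
    if lossy:
        return min(backoff_max, max(backoff_min, delay) * backoff_rate)
    return backoff_min


if __name__ == "__main__":
    delay = 1.0
    # Five lossy rounds followed by one responsive round.
    for lossy in [True, True, True, True, True, False]:
        delay = next_delay(delay, lossy)
        print(lossy, delay)
```

	The point of backing off is that retrying harder on a saturated
	network only adds to the saturation.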

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)
