From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: rush slowness/timeouts
   Date: Mon, 02 Jan 2006 21:41:22 -0500
Msg# 1165
Luke Cole wrote:
When I run the rush -tasklist command on loaner16 I get a list with almost 420 entries. Does that sound reasonable?

	Yes, that sounds normal for a dual proc machine, or for two-tier
	submissions.
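
	If you want a rough count rather than eyeballing the output, standard
	shell plumbing around the same command should do it, run on the host
	in question:

		# Count how many tasks rushd on this box currently knows about
		rush -tasklist | wc -l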

I checked loaner16 - it is reporting 99% of the CPU as being
utilised by mayabatch.exe, so it's definitely running slowly because of that.

	Sounds like you should dig a little deeper, since it's normal
	for renders to use 99% of the cpu; rushd shouldn't act unresponsive
	for 40 seconds unless something very extreme is going on.

	Is the rushd.log file complaining on these machines? I'd expect
	to see errors like 'connection reset by peer' due to the unresponsiveness,
	but I'm wondering if there are other errors that might indicate rushd
	is stuck doing eg. bad hostname lookups, or some such. For instance,
	does running 'rush -lah' on the slow-to-respond host show ???'s for
	any entries in the report? That might indicate bad or unresponsive DNS.
	Does the 'rush -lah' report take a long time to come up? (might be
	symptomatic of a slow-to-respond name lookup system, or if OSX,
	possibly you have the .local Rendezvous/Bonjour Multicast DNS disease,
	in which case make sure you have HOSTNAME=<name_of_host> and not HOSTNAME=-AUTOMATIC-
	in /etc/hostconfig)
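
	For example, a rough pass at those checks from the shell (the log path
	depends on where rush is installed on your boxes):

		# Look for errors in the rushd log on the slow host
		tail -100 rushd.log

		# Time the host report; slow name lookups usually show up here,
		# along with ???'s in the output
		time rush -lah

		# On OSX, make sure the hostname is hardcoded, not -AUTOMATIC-
		grep HOSTNAME /etc/hostconfig
		# ..should show eg. HOSTNAME=loaner16, not HOSTNAME=-AUTOMATIC-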

	rushd being unresponsive due to rendering sounds like it /could/ be
	the cause IF the machine's resources are being taxed by the render
	(ie. the renders are using too much ram, or are starting too many
	threads for the number of cpus the machine has)

	If maya is involved: if your machines are dual procs and each box is
	configured with '2' for CPUS in the rush/etc/hosts file, then rush will
	try to start two invocations of maya per box. In that case you'd better
	make sure the maya renders have the '-n 1' flag set, to prevent /each/
	instance of maya from trying to use /both/ processors..! (which causes the
	renders to step on each other, and the rest of the machine, including rushd)
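
	For instance, the maya command line in the job's render script would
	want the '-n 1' in it; something along these lines, where the scene
	path is just a placeholder and RUSH_FRAME is the frame number rush
	hands the render script:

		# Limit each maya invocation to one processor so two renders per
		# dual proc box don't fight each other (and rushd) for cpu
		Render -n 1 -s $RUSH_FRAME -e $RUSH_FRAME /jobs/myshow/scenes/myscene.mb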

	Or, make sure the ram isn't being overused, causing the box to thrash,
	as that will steal cpu from everything, including the renders.
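
	An easy way to see whether a box is thrashing is to watch swap activity
	while a render is running; on linux, something like:

		# Sustained nonzero si/so (swap in/out) columns while rendering
		# means the box is short on ram and thrashing
		vmstat 5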

	It's normal for renders to use 99% of the cpu; the unix scheduler
	should still be able to yield cpu to rushd under those conditions.
	The only situations I can think of where rushd wouldn't be getting
	enough cpu would be:

		a) The renders are running at a higher system priority (lower niceness)
		   than rushd

		b) The kernel is using some kind of 'decaying' scheduling, giving rushd
		   a lower priority than it should. Possibly you can fix this by adjusting
		   rushd's system priority with 'renice'

		c) The machine is swapping due to the renders using more ram than they
		   should be, causing other processes (like rushd) to swap out

	Note that Rush's priority values (+any@200) have nothing to do with what I'm
	calling the system priority (PRI column in 'ps -lax'). The rush priority is
	used by rush only to determine which user's render should be started next,
	and has nothing to do with the priority of processes once they're running.

	The only value in rush that affects the system priority of running processes
	is the 'nice' value the job is submitted with, which is the niceness the
	renders will run at.
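
	To see what everything actually ended up with, compare the PRI and NI
	columns for rushd and the renders, eg. (the 'maya' pattern here just
	assumes that's what the render processes show up as):

		# Compare rushd's system priority/niceness against the renders'
		ps -lax | egrep 'rushd|maya'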

	Regarding a/b, you might want to experiment with lowering the renders'
	system priority (ie. raising their niceness), so that they don't bog
	the box down. (Won't help if your problem is 'c')
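
	For instance, on a box that's already bogged down you could renice the
	running processes and see if the timeouts go away; the values and process
	name patterns here are just examples, and pgrep / negative values assume
	linux and root respectively:

		# Make the running maya renders 'nicer' so rushd and the rest
		# of the box can get cpu
		renice 10 -p `pgrep -f maya`

		# ..or (as root) give rushd itself a better system priority
		renice -5 `pgrep rushd`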

	If the problem is 'c', then you might have to get more ram for your boxes,
	or tell rush not to start more than one render per box. (Or if it's one
	user's particular job using all the ram, have them try to optimize the job
	so that it doesn't use so much ram)
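
	If you suspect it's one job eating the ram, something like this will show
	the biggest offenders (linux 'ps' shown; flags vary a bit per platform):

		# List the top memory users by resident size (RSS, in kilobytes)
		ps -eo pid,user,rss,comm --sort=-rss | head -15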

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)
