From: Antoine Durr <antoine@(email surpressed)>
Subject: Re: cpu balancing script
   Date: Wed, 06 Jun 2007 14:44:37 -0400
Msg# 1579
On 2007-06-05 18:29:07 -0700, Greg Ercolano <erco@(email surpressed)> said:

Antoine Durr wrote:
So I whipped up something this afternoon.   It's pretty simplistic, and
I'm curious as to where it might fall flat on its nose.

	You might find that 'beats up' rush too much by having it change
	things every 5 seconds.

	Also, try to avoid running 'rush -cp' commands if it's not actually
	changing anything.

	The script I'm writing counts the number of available cpus,
	and splits them up to each user. So if there are 15 procs
	and 3 different users, each user gets 5 cpus assigned to their
	jobs. If a user has multiple jobs, each job gets a few cpus,
	but no more than 5 total for all their jobs.

Yeah, that's really the right way to go, as it should be per-user.
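
As a rough illustration of the per-user split Greg describes (15 procs, 3 users, 5 cpus each), here is a minimal Python sketch. The function names `split_cpus` and `split_among_jobs` are hypothetical and not part of rush; the rule that leftover cpus go to whoever sorts first is an assumption made only to keep the arithmetic deterministic.

```python
def split_cpus(available_cpus, users):
    """Divide available_cpus as evenly as possible among users.

    Assumption: any remainder is handed out one cpu at a time
    in sorted-name order, just so the result is repeatable.
    """
    users = sorted(users)
    if not users:
        return {}
    base, extra = divmod(available_cpus, len(users))
    return {u: base + (1 if i < extra else 0) for i, u in enumerate(users)}


def split_among_jobs(user_share, job_ids):
    """Spread one user's share over their jobs, in submission order.

    The total never exceeds user_share; if there are more jobs
    than cpus, the later jobs get 0 and wait.
    """
    shares = {j: 0 for j in job_ids}
    if not job_ids:
        return shares
    base, extra = divmod(user_share, len(job_ids))
    for i, j in enumerate(job_ids):
        shares[j] = base + (1 if i < extra else 0)
    return shares


if __name__ == "__main__":
    per_user = split_cpus(15, ["alice", "bob", "carol"])
    print(per_user)
    print(split_among_jobs(per_user["alice"], ["job%d" % n for n in range(10)]))
```

With 10 equal jobs and a share of 5, this gives 1 cpu to each of the first 5 jobs and leaves the rest waiting, which is one of the options raised in the next quoted paragraph.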


	I'm getting hung up on the latter condition of what to do
	when a user has 10 jobs with all their priorities equal,
	but only 5 cpus allocated to them; just assign 1 cpu to
	their first 5 jobs, and let the others languish until
	more procs free up or jobs finish?

Well, the problem with changing the # of cpus allocated to a job is that you stomp on information that could be critical to the job, e.g. I'm running a comp, but don't want to run with more than 4 cpus, or I'll flood the IO bandwidth of the drives. Or I've only got two Houdini licenses, thus only run two jobs.

Thus, the thing that should be tweaked is priority, so that a person who is short on cpus gets a higher priority, and therefore a greater chance of picking up the next available proc. Of course, then there's no longer a user-settable priority system. This isn't too bad, as (IMO) all users should have the same priority, and it's up to show management to tweak that.
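
To make the priority idea concrete, here is a small Python sketch of one way it could be computed. Everything in it is an assumption for illustration: the name `balance_priorities`, the base priority of 100, the step of 10 per missing cpu, and the 1 to 999 clamp are not taken from rush or from either script in this thread.

```python
def balance_priorities(fair_share, running, base=100, step=10, lo=1, hi=999):
    """Raise priority for users below their fair share, lower it for users above.

    fair_share: {user: cpus the user should have}
    running:    {user: cpus the user currently holds}
    base, step, lo, hi are illustrative values, not rush defaults.
    """
    priorities = {}
    for user, share in fair_share.items():
        deficit = share - running.get(user, 0)  # positive means short on cpus
        priorities[user] = max(lo, min(hi, base + step * deficit))
    return priorities


if __name__ == "__main__":
    # alice holds 8 of 10 cpus, bob holds 2; each should have 5.
    print(balance_priorities({"alice": 5, "bob": 5}, {"alice": 8, "bob": 2}))
```

The cpu limits on each job are left untouched here, so a job capped at 4 cpus for IO reasons or at 2 for licenses keeps that cap; only the chance of winning the next free proc changes.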


	I'm using 'rush -status +any -c 0 -s 20' to get the job info;
	it will poll forever until killed, and is low bandwidth.
	I'm caching the data, looking for changes in job states
	or jobs either added or removed.

	I count 'available cpus' as cpus that are 'online' and
	don't have RESERVE jobs assigned to them.

	Anyway, it's taking a while to write, to handle the weird
	situations.

	But definitely you must avoid the tendency to keep changing
	things in rush to the point where all it's doing is rescheduling
	things. Only tell rush to change things when there's something
	worth changing, and try not to 'oscillate' (ie. jumping jobs
	up and down one or two procs every iteration, due to rounding)
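
The advice above (only tell rush about real changes, and don't bounce jobs by a proc or two) could be sketched as a simple filter with a dead band. This is a hypothetical helper, not code from either script; `changes_to_apply` and the `deadband` parameter are names invented here, and the sketch stops short of issuing any rush command.

```python
def changes_to_apply(current, desired, deadband=1):
    """Return only the settings worth sending to the queue.

    current: {jobid: value rush has now (cpus or priority)}
    desired: {jobid: value the balancer just computed}
    A job is included if it is new, or if its value moved by more
    than `deadband`; smaller moves are treated as rounding noise.
    """
    return {
        job: want
        for job, want in desired.items()
        if job not in current or abs(want - current[job]) > deadband
    }


if __name__ == "__main__":
    current = {"job1": 5, "job2": 3, "job3": 4}
    desired = {"job1": 5, "job2": 4, "job3": 7, "job4": 2}
    # job1 is unchanged and job2 moved by only 1, so both are skipped.
    print(changes_to_apply(current, desired))
```

If the returned dict is empty, the polling loop has nothing to send for that iteration.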

Ideally, the priority scheduling should be revised on every cpu assignment and every done frame, so that the next assignment makes the distribution more balanced. The challenge then becomes dealing with fast frames, as you then spend an inordinate amount of time rebalancing. Thus, every 5 or 10 seconds should be plenty. However, if a whole slew of cpus can be assigned in that time, the queue could very quickly become out of balance.

This process is, IMO, one of the top requirements of a render queue.

-- Antoine

--
Floq FX Inc.
10839 Washington Blvd.
Culver City, CA 90232
310/430-2473

