From: Antoine Durr <antoine@(email suppressed)>
Subject: Re: cpu balancing script
   Date: Wed, 06 Jun 2007 18:26:03 -0400
Msg# 1586
On 2007-06-06 15:08:41 -0700, Greg Ercolano <erco@(email suppressed)> said:

Antoine Durr wrote:

	Yes.. some companies make license counting 'wrappers' for their
	renders, so that interactive use can be tracked and be 'predicted'
	as part of a larger, 'reservation' oriented system.

	Others use their 'watcher' to interrogate the third party
	license managers, to see how many lics are available, and
	modify the cpu allocations to keep that balanced.

	Honestly though, most folks with large nets just buy or rent
	the licenses they need so they can make use of the whole farm,

Hmm, not sure that's feasible for 1000+ node networks, when the packages run multiple thousands of dollars per seat. A site license would do it, but may not be cost effective either.

	and not have to juggle that stuff, because even when it's done
	right, there are race conditions with the license counting,
	unless you have some kind of reservation system embedded in
	the license system, or a wrapper that does this.

When the process completes, it tails the last line of that
file, and puts it into the per-frame notes field, so that the users see
what kind of memory footprint their job had.
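The reporting step is nothing fancy; roughly like this (a sketch, with a made-up log path, and how the string actually gets into the notes field is up to the wrapper):

    import os

    def last_line(path):
        """Return the last non-empty line of the memory watcher's log."""
        last = ""
        with open(path) as f:
            for line in f:
                if line.strip():
                    last = line.strip()
        return last

    # Hypothetical per-frame log written by the watcher described further down.
    note = last_line(os.path.join(os.environ.get("LOGDIR", "/tmp"), "memwatch.0001.log"))
    print(note)          # hand this string to whatever posts the per-frame note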

	Yes, that is very useful info, and it's best to simply advertise
	it to the user, so they can submit their job with a correct
	'Ram' value based on their impression of the numbers.

	Often the renderers print those numbers in the log for you,
	so you can just grep them out; maya and mental ray both
	do this.

	Trouble is the format of these messages can change from one
	rev of the renderer to another. And, depending on the job,
	sometimes there are several of these messages per frame,
	such as when renders are batching, or worse, when complex jobs
	render multiple images per frame ('levels', 'passes', 'comps' etc).
	So it gets a little tricky to implement that in a way that
	works for all situations.

That's why having the queue monitor what your process is doing is really the only solution. Reports by the software are what *it* thinks it used, not what the OS saw.
	A few weeks ago I implemented a much lower latency 'rush -exitnotes'
	command which handles sending back 'per frame' messages to the
	jobserver in a reasonable manner by connecting to the 'render node'
	instead of the job server, setting things up so the note is passed
	back to the job server as part of the UDP message that delivers the
	exit code back to the server. It'll be in version 102.43.

I'll definitely migrate to that. My "exitnotes" also show cpu efficiency (I run my commands via /usr/bin/time), so that compositors can get a general sense of whether their jobs are cpu or i/o bound, i.e. low efficiency most likely indicates mostly waiting for disk. Low efficiency on a renderer could mean heavy texture map access or undue swapping.
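For what it's worth, the efficiency number is just (user + sys) / real out of /usr/bin/time; a stripped-down sketch of the wrapping (not the actual script, and in a real wrapper you'd keep time's output separate from the command's own stderr):

    import re, subprocess, sys

    def run_timed(cmd):
        """Run cmd under /usr/bin/time -p; return (exit code, cpu efficiency %)."""
        proc = subprocess.run(["/usr/bin/time", "-p"] + cmd,
                              stderr=subprocess.PIPE, text=True)
        # POSIX -p format on stderr:  real N.NN / user N.NN / sys N.NN
        times = dict(re.findall(r"^(real|user|sys)\s+([\d.]+)", proc.stderr, re.M))
        real = float(times.get("real", 0)) or 1e-9
        cpu = float(times.get("user", 0)) + float(times.get("sys", 0))
        return proc.returncode, 100.0 * cpu / real

    if __name__ == "__main__":
        rc, eff = run_timed(sys.argv[1:] or ["sleep", "1"])
        print("exit %d, cpu efficiency %.0f%%" % (rc, eff))   # low => likely i/o bound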


	I'd like to add the 'ram usage indicator' to the submit scripts
	as an option, once the 'rush -exitnotes' is fully released.

This is a pretty critical
feature, IMO, as you *really* want to avoid going into swap on a box!

	Yes, definitely.

	Gets tricky to detect swap though, as often when a box looks like
	it's swapping, it's actually just paging out old junk to make room
	in ram that it should have cleared out long ago.

	Best thing is to just know in advance how much ram the job will
	tend to need, and submit with that ram value set. (e.g. the 'Ram:'
	prompt in the submit forms)

Hence the need for frame notes with what happened!


	'rushtop' is handy for seeing if a job is using a lot of ram.
	Just render your job on a box, and watch the ram use as the
	render runs to get a feel for how nasty it is.. then submit
	with the 'Ram' field set accordingly.

Doesn't really work when all your renderfarm computers are 8-proc nodes.

	My goal was to avoid any features in rush that had too much
	of a 'fuzzy' aspect to them, such as polling ram use.

I can get that. What might be useful is having hooks for a bunch of these things, and users can install them if they feel they're needed. A user (me!) shouldn't really have to go and write a memory checking script on their own.


	Even kernel mailing lists argue endlessly on how free ram
	should be determined, and it often changes from release
	to release. I've had to tweak rushtop several times to take
	into account changes in the different OS's ram calculations.

	It's too bad that most of the OS's (esp unix!) don't let
	a parent program get back the memory use and usr/sys time

Yeah, I was floored by that when I found out just how bad memory accounting is! The structures are there, for Pete's sake!

	of the accumulated process tree. The counters are there
	in the structures for the accumulated times, but they're
	all zeroed out. All you can get back is the ram/cpu time
	of the immediate process reliably, and that's usually just
	the perl script, which is useless. The only way to get unix
	to show process hierarchy data that I've seen is to have
	system-wide process accounting (acct(2)) turned on.
	And whenever that's on, the system bogs down because
	process accounting makes giant logs quickly. But it helps
	the OS tally accumulated process tree info internally.
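What you *can* get back portably is the rusage of the immediate child you wait()ed on; whether any of its descendants' usage gets folded in depends on every intermediate process having waited properly, which is why in practice you usually just see the wrapper's numbers. A quick illustration (Python just for brevity, nothing to do with rush internals):

    import os

    # Spawn a stand-in command without waiting, then collect it with wait4(),
    # which hands back an rusage struct for that immediate child. Grandchildren
    # only show up if each intermediate process wait()ed on its own children.
    pid = os.spawnlp(os.P_NOWAIT, "sleep", "sleep", "1")
    pid, status, ru = os.wait4(pid, 0)

    print("user cpu:", ru.ru_utime)    # seconds of user time
    print("sys  cpu:", ru.ru_stime)    # seconds of system time
    print("max rss :", ru.ru_maxrss)   # peak resident size (kB on Linux)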


Funny, I'm doing the ram-usage polling right now.  I simultaneously
launch a memory watcher script which, given a PID, finds all the child
PIDs, adds up their memory consumption, and writes the total to a file
in the logdir.
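The watcher isn't much more than a /proc walk; a trimmed-down sketch of the idea (Linux /proc layout assumed, not the actual script):

    import os

    def descendants(root_pid):
        """Return root_pid plus every descendant PID, scanned from /proc."""
        children = {}
        for entry in os.listdir("/proc"):
            if not entry.isdigit():
                continue
            try:
                with open("/proc/%s/stat" % entry) as f:
                    data = f.read()
                # ppid is the 2nd field after the ')' that closes the comm field
                ppid = int(data.rsplit(")", 1)[1].split()[1])
                children.setdefault(ppid, []).append(int(entry))
            except OSError:
                pass                                  # process exited mid-scan
        pids, todo = [], [root_pid]
        while todo:
            pid = todo.pop()
            pids.append(pid)
            todo.extend(children.get(pid, []))
        return pids

    def tree_rss_kb(root_pid):
        """Sum resident set size (VmRSS, in kB) across the whole process tree."""
        total = 0
        for pid in descendants(root_pid):
            try:
                with open("/proc/%d/status" % pid) as f:
                    for line in f:
                        if line.startswith("VmRSS:"):
                            total += int(line.split()[1])
            except OSError:
                pass
        return total

Each sample gets appended to the frame's file in the logdir, and the last line of that file is what ends up in the notes.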

	Trouble I've found with snapshotting the proc table (did it at DD)
	is that you run into a few real world problems, enough that it
	can often cause more trouble than it's worth.

	When polling the process hierarchy, you can end up with wild
	snapshots when processes fork, showing double memory use during
	that time. You can try to smooth those out as aberrant data,
	but some renders fork frequently, causing the data to sometimes
	appear valid, throwing the job into a stall.

My script has a ramp-down of polling frequency: for the first 10 seconds, it polls every 2 seconds, then every 5 seconds for another 30, eventually down to once a minute for the life of the process. This has worked pretty darned well, as it captures the fast frames decently. It does miss out on last-second memory surges (Shake tends to do that once in a while, it seems). But for the most part, the frame-to-frame correlation is pretty strong. Oddly enough, I haven't seen the data-doublings due to forks, maybe because the polling is infrequent.
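The ramp-down itself is just a small table of cutoffs; roughly like this (a sketch of the schedule as described above, with the intermediate steps between 5 seconds and a minute elided):

    import os, time

    # (frame age in seconds, poll interval in seconds); last entry is the floor.
    SCHEDULE = [(10, 2), (40, 5), (float("inf"), 60)]

    def poll_interval(age):
        """How long to sleep before the next sample, given the frame's age."""
        for cutoff, interval in SCHEDULE:
            if age < cutoff:
                return interval
        return 60

    def watch(pid, sample):
        """Sample the process tree until the frame's root process goes away."""
        start = time.time()
        while os.path.exists("/proc/%d" % pid):
            sample(pid)
            time.sleep(poll_interval(time.time() - start))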


	Also, sometimes a single frame would go bananas on ram, causing
	the queue to think the job was going into a phase of high memory
	use. Or sometimes a scene will simply go from a black frame to
	a sudden high memory use, enough to swap. An automated mechanism
	that tries to use this wild data to handle scheduling almost always
	stalls the job, causing folks to simply turn off the feature;
	they'd rather have their render crash a few machines on the
	few frames that go bananas instead of having the job completely
	stall in the middle of the night.

	In rushtop I added an experimental 'paging activity' indicator
	(the 'orange bar') which watches for 'excessive' paging activity,
	and bumps the bar when that happens. This limit was determined
	empirically.. when the orange bar appears, chances are you can
	'feel' the slowness if you're on that box in the form of an
	unresponsive mouse, or similar.

That's great stuff.  It would be a useful bit of info on an exitnote.
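For anyone rolling their own node-side check, Linux exposes swap-in/out counters in /proc/vmstat; something along these lines could work (a sketch, not what rushtop actually does, and the threshold is a number you'd have to tune empirically per site):

    import time

    PAGES_PER_INTERVAL = 500          # "excessive" threshold; purely a guess, tune it

    def swap_counters():
        """Pages swapped in/out since boot, read from /proc/vmstat (Linux)."""
        counts = {}
        with open("/proc/vmstat") as f:
            for line in f:
                key, value = line.split()
                counts[key] = int(value)
        return counts.get("pswpin", 0), counts.get("pswpout", 0)

    def paging_is_excessive(interval=5):
        """True if swap traffic over 'interval' seconds exceeds the threshold."""
        in0, out0 = swap_counters()
        time.sleep(interval)
        in1, out1 = swap_counters()
        return (in1 - in0) + (out1 - out0) > PAGES_PER_INTERVAL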

-- Antoine



--
Floq FX Inc.
10839 Washington Blvd.
Culver City, CA 90232
310/430-2473

