From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: How to detect and handle Frame MaxTime failures
   Date: Thu, 03 Nov 2011 18:58:03 -0400

Msg# 2144

On 11/03/11 11:03, Lutz Paelike wrote:
> we sometimes have MaxTime failures in our rush queue, and the frames
> are then killed after MaxTime is reached. This is fine, but still every
> frame is rendered, reaches MaxTime and is finally killed.
> We would like to monitor a job and if more than, let's say, 5 frames
> are killed due to MaxTime, the job (or a series of jobs) should be
> skipped completely and no more frames should be rendered.

	Instead of using MaxTime for this specific set of
	circumstances, I'd suggest putting some logic in your
	render script to implement the more specific behavior
	you want.

	For instance, I could see having your own "Render Max Time:"
	field in the submit form that passes the value to the
	render script, which in turn would take this value,
	fork() the render off as a child, and then monitor
	the execution time of the render.

	This way the script can decide if it should kill the render,
	and if so, implement its own logic to modify the job.
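
	A minimal sketch of that watchdog idea, written in Python
	purely for illustration (the render script could be in any
	language). The function name run_with_max_time and its
	arguments are hypothetical; the render command and the limit
	would come from the submit form. Note it only kills the direct
	child, so a renderer that spawns its own children may need
	extra handling.

```python
import subprocess
import time


def run_with_max_time(cmd, max_secs):
    """Run cmd as a child process and kill it if it exceeds max_secs.

    Returns (exit_code, elapsed_secs, timed_out). Because the script
    itself does the killing, it knows *why* the render stopped.
    """
    start = time.monotonic()
    child = subprocess.Popen(cmd)
    try:
        # Block until the render finishes or the time limit expires.
        exit_code = child.wait(timeout=max_secs)
        timed_out = False
    except subprocess.TimeoutExpired:
        child.kill()              # our own kill, not rush's MaxTime
        exit_code = child.wait()  # reap the child so it is not left a zombie
        timed_out = True
    return exit_code, time.monotonic() - start, timed_out
```

	The script can then branch on timed_out to decide what to do
	with the rest of the job.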

	For instance, I could see logic that adds a job remark
	(rush -jobremark) and frame notes (rush -notes) to tell the user
	what happened, and have the script then either pause the job (rush -pause)
	or have it fail all the Que frames (rush -fail que) so that the job
	simply fails itself quickly.
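
	As a rough sketch of that follow-up logic: only the option
	names (-jobremark, -notes, -pause, -fail que) come from the
	text above. The argument forms shown, in particular the
	"frame:text" form for -notes, and the helper names
	timeout_actions and run_actions are assumptions to be checked
	against the Rush documentation. The default is a dry run that
	only prints the commands.

```python
import subprocess


def timeout_actions(frame, max_secs, fail_remaining=True):
    """Build the rush commands to run after the script kills a render.

    ASSUMPTION: the argument layout of each command is illustrative;
    only the option names are taken from the post.
    """
    msg = f"frame {frame} exceeded the {max_secs}s render limit"
    cmds = [
        ["rush", "-jobremark", msg],           # tell the user at the job level
        ["rush", "-notes", f"{frame}:{msg}"],  # and on the frame itself
    ]
    # Either fail every frame still in the Que state, or just pause the job.
    cmds.append(["rush", "-fail", "que"] if fail_remaining else ["rush", "-pause"])
    return cmds


def run_actions(cmds, dry_run=True):
    """Print the commands (dry run) or actually execute them."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
```

	Failing the Que frames lets dependent jobs proceed, while
	pausing leaves the job for someone to inspect in the morning.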

> Because we usually chain several jobs together with the WaitFor command,
> a single job with 100 frames reaching MaxTime blocks the renderfarm for
> several hours, which is mostly a problem at night when the farm is not
> watched.

	If you used the above technique to 'Fail' all the Que frames,
	then the job would suddenly fail itself, allowing the
	other waitfor jobs to start running.

	Just curious though: are you using 'waitfor' to simulate
	a FIFO queue? If so, did you rule out using rush's FIFO
	scheduling? (e.g. 'sched fifo' in the rush.conf file)
	Perhaps that's not what you need, but since it sounds like
	you want the other jobs to continue if this one keeps hanging,
	I imagine the jobs really shouldn't be dependent on
	each other, and perhaps should just be FIFO scheduled.

> A solution would be to have something like a TimeOutCommand, that
> calls a script that can take appropriate action (This would be on a per
> frame basis), or even better a general StatusCommand that could be
> called for every frame, or for every job and additional information
> could be passed via environment variables.

	I think this kind of thing is best done as logic in the
	script itself; background the command, and monitor its
	execution time.. if it exceeds the max, the script can
	choose what to do.

> Since the killing of the process is initiated by rush, my custom render
> script can not detect that it was killed because it reached MaxTime.

	Right -- a good reason not to use it in this case,
	and use the above instead, I would think.

> The only solution I can think of right now is to go through every job in
> the queue and parse the log files for any MAXTIME entry.

	I once investigated trying to make a 'callback option'
	for maxtime so that when it expires, a script could be
	run to do post-kill logic.. but I soon realized there
	would need to be all kinds of options to do what someone
	would want; run the script BEFORE the kill occurs, or
	AFTER it occurs, or have the script decide whether to
	kill it or not, etc.

	Seemed best to implement such things in the script itself.


-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
