From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Requeing failed frames from a batch
   Date: Fri, 10 Mar 2006 00:52:43 -0500

Msg# 1254

Cool 

I think option A is what we want. Generally we have found that if a batch
is going to fail from a scene file error, it will either not render at all
or start failing at a particular point in the frame range and keep failing
from then on. That being the case, we would just retry the batch from the
failed frame onwards.

The system we have come up with then is as follows. 

	O Rush submits the batch to mayabatch for rendering.

	O Rush waits for mayabatch to exit, then checks the frames
	(and frame sizes in our case) against the log file.

	O If the number of frames written in the log does not equal the
	number of frames in the batch, reset the $opt{sfrm} value to the
	last good frame + 1 and retry. This effectively retries the batch
	from the failed frame to the end of the batch ($opt{efrm}).
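
The log check in the steps above could be sketched roughly as follows.
This is only an illustration in Python, not our actual render script: the
"Wrote frame N (B bytes)" log line format, the function name and the
min_bytes threshold are all assumptions, and a real mayabatch log will
look different.

```python
import re

# Hypothetical log line format; a real mayabatch log will differ.
FRAME_LINE = re.compile(r"^Wrote frame (\d+) \((\d+) bytes\)$")

def next_start_frame(log_text, sfrm, efrm, min_bytes=1):
    """Return the frame to restart from, or None if the whole batch rendered."""
    good = set()
    for line in log_text.splitlines():
        m = FRAME_LINE.match(line.strip())
        # A frame only counts as good if it was written and is big enough.
        if m and int(m.group(2)) >= min_bytes:
            good.add(int(m.group(1)))
    # Walk the batch in order; the first frame missing from the log
    # is "last good frame + 1", i.e. the new start frame.
    for frame in range(sfrm, efrm + 1):
        if frame not in good:
            return frame
    return None
```

For a batch of 20-29 where only 20-24 appear in the log, this returns 25,
which would become the new $opt{sfrm} for the retry.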

I think this is all we need. We currently have the failures going out as
emails to sysadmins and job owners, so I think we could tweak that slightly
so they only get the error once the last retry (the 3rd, or however many
retries are set) has also failed.
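
A rough sketch of that retry-then-notify idea, again in Python and purely
illustrative: render_with_retries, the render callback and the notify hook
are hypothetical names, not part of Rush. The callback is assumed to render
a frame range and return the frame to restart from, or None on success.

```python
def render_with_retries(render, sfrm, efrm, max_retries=3, notify=print):
    """Run one attempt plus up to max_retries retries; notify only at the end."""
    start = sfrm
    for _ in range(max_retries + 1):
        # Each attempt resumes from the first frame that failed last time.
        start = render(start, efrm)
        if start is None:
            return True   # whole batch rendered, nobody gets an email
    # Only reached when the final retry has also failed.
    notify(f"frames {start}-{efrm} still failing after {max_retries} retries")
    return False
```

With this shape, an intermittent failure that clears on the first or second
retry never generates an email at all.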


|-----Original Message-----
|From: Greg Ercolano [mailto:erco@(email surpressed)] 
|Sent: Friday, 10 March 2006 3:43 PM
|To: void@(email surpressed)
|Subject: Re: Requeing failed frames from a batch
|
|[posted to rush.general]
|
|> I need to find a way to re-submit/re-queue failed frames when submitting
|> in batches.
|
|	I might be missing something.
|
|	Let's say after rendering a batch of 20-29, frame 25 fails,
|	and you have a way to detect this by grepping the logs.
|
|	So you'll know for sure that 20-24 are OK, and 25-29 need to be
|	re-rendered.
|
|	If you just exit the script with exit(2), rush will try to re-render
|	the entire batch of 20-29.
|
|	Are you saying that you want it to:
|
|		a) Assume the problem was intermittent (ie. maya crashed due to a bug),
|		   and will probably render 25-29 just fine if we re-invoke maya to
|		   render just that frame range?
|	..or..
|
|		b) Assume the problem was caused by the user (a bad scene file)
|		   and we should just append frames 25-29 to a text file somewhere
|		   that we can later use to submit a job to fix these frames
|		   after the user has intervened to fix the scenefile?
|
|	If a) then yes, just re-invoke maya with the new frame range,
|	and the user won't even have to know there was a failure.
|
|	If b) then I would think you could do any of a number of things:
|
|		o Have the render script create 0000.fix frames in the image dir
|		  that a later 'fix job' could look for, and just render those frames
|
|		o Append the bad frames to a .txt file
|
|		o Have a 'jobdonecommand' that looks for bad frames, and submits
|		  a 'fix job' in the pause state, and emails the user to fix the
|		  problem, then unpause the fix job to run the fix frames..
|
|> It's easy to re-queue the whole batch but if only one frame has failed
|> it's re-rendering all the good frames again.
|
|	If you have the logic in the render script to know which frames
|	need to be re-rendered, and are confident that just re-rendering those
|	is all that's needed to get them to render OK, then just re-invoke maya
|	with just the fix range.
|
|	Or, if you just want to tell the user which frames are bad, you can
|	use 'rush -notes $ENV{RUSH_FRAME}:"BAD FRAMES: $bad_start - $bad_end"'
|	so that the "Frames" report shows a message telling which frames were bad.
|	(This is safe to do when errors occur. Using 'rush -notes' isn't recommended
|	if run on EVERY frame.. that's too much load on the job server if there are
|	100's of render nodes. But logging (uncommon) error conditions is OK..)
|
|--
|Greg Ercolano, erco@(email surpressed)
|Rush Render Queue, http://seriss.com/rush/
|Tel: (Tel# suppressed)
|Cel: (Tel# suppressed)
|Fax: (Tel# suppressed)
|
