From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Jobs not being killed by "Getoff"
   Date: Fri, 04 May 2012 15:09:56 -0400
Msg# 2231
On 05/04/12 11:35, Mr. Daniel Browne wrote:
> We're using a Houdini submit based on your perl version that I rebuilt
> in Python based off of your one for Maya. It seems that when a job is
> removed from a machine, either by issuing a getoff, down, or fail, the
> job restarts on a new host but does not die or get removed from the
> first one properly.

	Sounds like you're saying the script is killed, but the render
	remains running, is that the case?

	If so: are you sure the script isn't somehow 'backgrounding'
	the render, so that it disconnects from the process hierarchy?

	That's the only situation where I'd think this would be possible.

	Rush will kill the entire process hierarchy, but if the render process
	somehow disconnects from the process hierarchy, then rush can't control it.

> Is there something more within a Python render
> script that I have to do to handle job fails?

	The technique I'd recommend for running a render from Python that
	I know works OK:

	On unix:

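	    import os, sys

	    sys.stdout.flush()      # flush our buffered output before the render writes its own
	    sys.stderr.flush()
	    return os.system(cmd)   # (inside your render function) blocks until the render exits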

	On windows:

		import subprocess
		exitcode = subprocess.call(cmd, shell=0)

	I would change the shell=0 to shell=1 if you plan on using
	redirection (<>) or pipes (|) or other shell-special chars
	like &&.
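
	As a minimal sketch of the above in one place: the helper name
	run_render() and its use_shell parameter are made up for illustration,
	and using subprocess.call() on both platforms is a simplification,
	not something Rush requires. The point is only that the script flushes
	its own output, then waits in the foreground for the render to finish.

```python
import subprocess
import sys

def run_render(cmd, use_shell=False):
    """Run cmd in the foreground and return its exit code.

    cmd is a list of arguments when use_shell is False, or a single
    string when use_shell is True (needed for redirection or pipes).
    run_render and use_shell are hypothetical names for this sketch.
    """
    # Flush our own buffered output first so it stays in order with
    # whatever the render prints.
    sys.stdout.flush()
    sys.stderr.flush()
    # subprocess.call() blocks until the child exits, so the render
    # stays a child of this script for its whole lifetime.
    return subprocess.call(cmd, shell=use_shell)
```

	The script would then exit with whatever run_render() returns, so a
	failed render shows up as a failed frame.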

	To advise based on what you're currently using,
	I'd need more info:

	1) What does the process hierarchy look like when a frame
	   is running? For instance, on linux:

% ps fax
[..]
32325 ?        Ss     0:00 /usr/local/rush/bin/rushd
 6627 ?        SNs    0:00  \_ perl /eagle/net/cd/releases/tar/102.42d/irush/../examples/perl/submit-maya.pl -render 5 1 5 1 yes - 3 Fail Licpause+Retry //eag
 6628 ?        SN     0:00      \_ Render -r mr -proj /eagle/net/tmp -s 1 -e 5 -b 1 -v 5 -rt 1 /eagle/net/tmp/scenes/foo.ma
 6629 ?        SN     0:00          \_ /bin/csh -f /usr/autodesk/maya2012-x64/bin/maya -batch -file /eagle/net/tmp/scenes/foo.ma -script /var/tmp/.RUSH_TMP.15
 6648 ?        DN     0:00              \_ /usr/autodesk/maya2012-x64/bin/maya.bin -batch -file /eagle/net/tmp/scenes/foo.ma -script /var/tmp/.RUSH_TMP.15/AST
[..]

	   ..and then, what does the process hierarchy look like AFTER you requeue/getoff/whatever
	   the frame? Is the python script still running? Or is it only the render
	   processes, and if so, which ones?


	2) What python code are you using to invoke Houdini during rendering?
	   My guess is how this is being invoked is what's causing the problem.

		2a) Are you using os.system() or os.popen() or subprocess.call()..?
	            Include any special command flags for these that you might be using.

		2b) What exact command is being invoked?
		    Perhaps there's something about the command that's causing the
		    problem, like the presence of unix '&' or DOS's 'start'.
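
	For 2b, a rough way to spot the two patterns mentioned above in a
	command string before running it. This is only a heuristic sketch:
	looks_backgrounded() is a made-up name, it checks just a trailing
	unix '&' and a leading DOS 'start', and it will miss other ways a
	command can detach its child.

```python
import re

def looks_backgrounded(cmd):
    """Return True if the command string appears to detach its child.

    Hypothetical helper for illustration; checks two patterns only.
    """
    stripped = cmd.strip()
    # A single trailing '&' (but not '&&') backgrounds the command
    # in unix shells.
    if re.search(r"(?<!&)&\s*$", stripped):
        return True
    # A leading 'start' launches a separate process from the DOS/Windows
    # command prompt.
    if re.match(r"(?i)start(\s|$)", stripped):
        return True
    return False
```

	Here "myrender scene.hip &" and "start /b myrender scene.hip" would both
	be flagged ("myrender" being a placeholder command), while
	"myrender a && myrender b" would not.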

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed) ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

