From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: OSX hidden process using 50% sys.
   Date: Thu, 17 Sep 2009 03:56:27 -0400

Msg# 1890

Dylan Penhale wrote:
>>> We are unable to read the log file from the problem render node,
>>> but the file is readable from all other hosts.
>> When you try to read the log, what happens, and with what
>> command/technique are you trying to look at the file?
>> (rush -log, more, type, cat, text editor, etc)
> 
> Cat, more, tail - all return no output.
> 
> As I say, the log file is ok because we can view it from other machines. I
> am curious why this box is even trying to write these files as this node
> isn't even supposed to be rendering this frame. I think this might be the
> clue to the issue.

	This sounds messed up at the file system level;
	I'd suspect the network file system as the problem,
	probably at the client side.

	Hopefully other folks can chime in here if they've seen
	this too.

	What kind of machine is the file server at the other end
	of the NFS mount? Possibly there's an NFS incompatibility
	there; that's the only thing I can think of other than the
	client OS.

	I haven't heard of such a problem in OSX's NFS; in my
	experience it has been pretty much problem-free mounting
	the Linux NFS server here.

	BTW, you mention you're unable to umount the file server.
	Is this true even after you kill the Render process and
	lsof verifies nothing else has files open on the mount? Or is
	the Render process unkillable because it's 'hung accessing the
	mount', which would point all the more to an NFS client fault?

	Did you try 'umount -f'?

>>> We think that the kernel of this render node is stuck trying to access
>>> this file, which in turn is causing the high cpu sys load.

>> Yes, most likely, if the cpu is not attributed to a process.

> The render log looks the same as the others, nothing to suggest an error.
> Log size is 46K.

	Try yanking the network cable to see if it affects the 50% cpu use.
	My guess is it won't, which means the kernel is spinning doing
	'something bad'.. hard to say what.

	Is the whole mount in a bad state (ie can you view other files OK?),
	or is it just this file and all else works fine?
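
	One way to answer that without hanging your own shell is to do
	each read in a child process with a timeout. The following is
	only a rough sketch, not anything rush provides: probe_read and
	probe_many are made-up names, the 10 second timeout is an
	arbitrary choice, and it assumes 'cat' is on the PATH.

```python
import subprocess


def probe_read(path, timeout=10.0):
    """Read `path` with cat in a child process, giving up after `timeout` seconds.

    Returns (status, nbytes), where status is one of:
      "ok"    - cat exited 0 and returned data
      "empty" - cat exited 0 but returned no bytes
      "error" - cat exited non-zero (permission denied, no such file, ...)
      "hung"  - no answer within `timeout`; the read is probably stuck
    """
    proc = subprocess.Popen(
        ["cat", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        data, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in an uninterruptible NFS wait
        # may ignore the kill, so we deliberately do not wait for it.
        proc.kill()
        return ("hung", 0)
    if proc.returncode != 0:
        return ("error", len(data))
    return ("ok", len(data)) if data else ("empty", 0)


def probe_many(paths, timeout=10.0):
    """Probe several files so one bad file can be told apart from a bad mount."""
    return {p: probe_read(p, timeout) for p in paths}


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <file on the mount> [more files...]
    for path, (status, nbytes) in probe_many(sys.argv[1:]).items():
        print(f"{status:5s} {nbytes:8d}  {path}")
```

	Point it at the problem log plus a few other files on the same
	mount. If only the one log comes back "empty" or "hung", it's
	that file; if everything does, the whole mount is in a bad state.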

	All the behavior sounds quite odd from an OS point of view.
	I'd say the client OS is at fault here, with only a slim
	possibility the NFS server might be triggering it.

	Once rush starts the Render process, it's up to the OS and Render
	to do their thing; rush takes no part in the rendering process
	from the time it starts until the process exits; then rush steps
	in and returns the exit code.

	I think if you want to move forward, get out the NFS debugging
	tools: tcpdump, possibly even ktrace (on Leopard, dtrace/dtruss
	took its place). If you suspect the server may be at fault,
	contact the vendor if you're on support.
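
	As a starting point for the tcpdump side, here is a small sketch
	that just builds a capture command for traffic between the client
	and the file server on the standard NFS port (2049). The function
	name, the "fileserver" hostname, the "en0" interface and the
	output file name are all placeholders; substitute your own. It
	does not cover portmapper/mountd traffic, which uses other ports.

```python
import shlex

NFS_PORT = 2049  # standard NFS port


def nfs_capture_command(server, interface="en0", outfile="nfs-trace.pcap"):
    """Build a tcpdump command line that saves NFS traffic to/from `server`.

    -i selects the interface, -s 0 asks for full packets rather than
    truncated ones, and -w writes raw packets to a file for later analysis.
    """
    return [
        "tcpdump",
        "-i", interface,
        "-s", "0",
        "-w", outfile,
        "host", server, "and", "port", str(NFS_PORT),
    ]


if __name__ == "__main__":
    # "fileserver" is a placeholder hostname.
    print(shlex.join(nfs_capture_command("fileserver")))
```

	Run the printed command as root on the problem node while you
	try to read the log, then open the capture in a packet analyzer
	to see whether the client is sending requests at all and how
	the server is answering.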

> RUSHD 102.42a9

	That sounds fine.

	That release is a little over a year old (June 08),
	and there have only been two minor maintenance releases since
	then, but nothing you'd need.

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
