From: Dylan Penhale <dylan@(email surpressed)>
Subject: OSX hidden process using 50% sys.
   Date: Wed, 16 Sep 2009 23:04:54 -0400
Msg# 1887
Hi

We have a tricky little problem here, and I wonder if you have seen anything like it.

We are submitting a Maya mentalray render from a PC to a bunch of PCs and Macs. The job server is running XP. On one Mac render node, and sometimes on more than one, the CPU ends up pegged at 50% system time, seemingly at random. top does not show which process is using that 50%; it appears to be hidden. A little digging reveals two open handles (both from the same Render PID in the lsof output below) that appear to be "holding" a log file open on the rush log server. Interestingly, both point at the same log file, and it is NOT the log for a frame this machine has rendered or been assigned to render (from what we can tell).

We are unable to read the log file from the problem render node, but the file is readable from all other hosts.
We think the kernel on this render node is stuck trying to access this file, which in turn is causing the high system CPU load.

When we run lsof we can see that the command accessing the log file is Render.

Render    19841      netrender    1w      REG     26,10     47549 37261653 /Volumes/atlantic/rushlogs/3d/tjn_se_0210_c004rs_a033as_l014hh_seal_matteSealA.log/0008
Render    19841      netrender    2w      REG     26,10     47549 37261653 /Volumes/atlantic/rushlogs/3d/tjn_se_0210_c004rs_a033as_l014hh_seal_matteSealA.log/0008
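
In case it helps anyone checking for the same pattern, here is a minimal Python sketch that groups lsof lines like the two above by command, PID and path, so it is easy to see which descriptors one process holds on a given file. It assumes the default nine-column lsof layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the function name group_lsof_lines is made up for illustration.

```python
from collections import defaultdict

def group_lsof_lines(lines):
    """Map (command, pid, path) -> list of FD fields such as '1w' or '2w'."""
    grouped = defaultdict(list)
    for line in lines:
        # NAME is the ninth column; maxsplit=8 keeps a path with spaces intact.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[1].isdigit():
            continue  # skip the header row and anything malformed
        command, pid, _user, fd = fields[:4]
        grouped[(command, int(pid), fields[8].strip())].append(fd)
    return dict(grouped)

LOG = ("/Volumes/atlantic/rushlogs/3d/"
       "tjn_se_0210_c004rs_a033as_l014hh_seal_matteSealA.log/0008")
sample = [
    "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME",
    "Render    19841      netrender    1w      REG     26,10     47549 37261653 " + LOG,
    "Render    19841      netrender    2w      REG     26,10     47549 37261653 " + LOG,
]

for (command, pid, path), fds in group_lsof_lines(sample).items():
    print(command, pid, ",".join(fds), path)
```

On the two lines above this reports a single Render PID holding descriptors 1 and 2 open for writing on the same file, which would normally be that process's stdout and stderr.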

Note: This node did not try to render frame 0008 from what we can tell.

Should Render be writing the log file directly, or should the STDERR from Render be passed to Rush to do the writing?

The PID listed (19841) does not show up in the process list (top or ps), so we assume the kernel is experiencing some sort of bug.
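
One more way to ask the kernel whether a PID is still known to it, independent of top or ps, is to send it signal 0. The sketch below is only an illustration under that assumption: pid_exists is a made-up helper name, and I cannot say whether it would agree or disagree with ps on the affected node.

```python
import os

def pid_exists(pid):
    """Return True if the kernel still has a process table entry for pid."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: the process exists but belongs to another user
    return True

if __name__ == "__main__":
    print(os.getpid(), pid_exists(os.getpid()))
```

Note that a zombie (exited but not yet reaped) process still counts as existing for this check.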

We also thought that perhaps this render node was assigned to render this frame and, at the last minute, was informed that another node was already doing it and so was told to abort its render. It’s hard to tell from the cpu.acct file what is happening, as I think it only gets updated at the end of each successful frame.

We are not able to umount the NFS share that the logs are written to, which again suggests the logs are somehow the cause. In fact the only way to get the CPU to behave normally is to restart the box, which can only be done by physically holding the power button in.

In this case the render node is running OSX 10.5.4, and is an Intel Core 2 Duo.

Any ideas?

DYLAN PENHALE
IT MANAGER

FUEL
65 KING STREET
NEWTOWN  SYDNEY
NSW  2042  AUSTRALIA

T. 61 2 9557 7799
F. 61 2 9557 7882
M. 0424 655 320

WWW.FUELVFX.COM

