From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: Re: OSX hidden process using 50% sys.
   Date: Thu, 17 Sep 2009 02:10:51 -0400
Msg# 1889


> From: Greg Ercolano <erco@(email surpressed)>
> Reply-To: <rush decimal general at seriss decimal com>
> Date: 17 Sep 2009 04:31:30 -0000
> To: <void@(email surpressed)>
> Subject: Re: OSX hidden process using 50% sys.
> 
> [posted to rush.general]
> 
> Dylan Penhale wrote:
>> We are submitting a Maya mentalray render from a PC to a bunch of PC's
>> and Macs. The job server is running XP. On one Mac render node,
>> sometimes more than one, we seemingly randomly see the CPU pegged at 50%
>> system. Top does not show what process is using that 50%, it appears to
>> be hidden.
> 
> The kernel can take CPU of its own to process IO, and that
> won't show up associated with any process. For instance, in
> rushtop this would show up in red.
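
To put a number on the kernel ("sys") share described above, the summary line from top can be split into its parts. This is only a rough sketch: the line format shown is an assumption about what `top -l 1` prints on OS X, the figures in `sample` are made up, and `parse_cpu_usage` is a hypothetical helper name.

```python
import re

def parse_cpu_usage(line):
    """Split a 'CPU usage: ...' summary line into {label: percent}."""
    # Assumed format: "CPU usage: 3.27% user, 50.12% sys, 46.59% idle"
    pairs = re.findall(r"([\d.]+)%\s+(\w+)", line)
    return {label: float(value) for value, label in pairs}

# Made-up sample line, not taken from the affected render node.
sample = "CPU usage: 3.27% user, 50.12% sys, 46.59% idle"
usage = parse_cpu_usage(sample)
print("kernel time: %.1f%%" % usage["sys"])
```

A high "sys" figure with no matching process in the per-process list is consistent with the kernel itself doing the work.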

Yes, rushtop shows lots of red :)

> 
>> A little digging reveals that there are two processes that
>> appear to be "holding" log files open on the rush log server.
> 
>> Interestingly these commands are both trying to access the same log file
>> and they are NOT the log file that the machine has rendered, nor been
>> assigned to render (from what we can tell).
> 
>> We are unable to read the log file from the problem render node, but the
>> file is readable from all other hosts.
> 
> When you try to read the log, what happens, and with what
> command/technique are you trying to look at the file?
> (rush -log, more, type, cat, text editor, etc)

Cat, more, tail - all return no output.

As I say, the log file is OK because we can view it from other machines. I
am curious why this box is even trying to write these files, as this node
isn't even supposed to be rendering this frame. I think this might be the
clue to the issue.

> 
>> We think that the kernel of this render node is stuck trying to access
>> this file, which in turn is causing the high cpu sys load.
> 
> Yes, most likely, if the cpu is not attributed to a process.
> 
>> When we lsof we can see that the command that is accessing the log file
>> is Render.
>>
>> Render    19841      netrender    1w      REG     26,10     47549 37261653 /Volumes/atlantic/rushlogs/3d/tjn_se_0210_c004rs_a033as_l014hh_seal_matteSealA.log/0008
>> Render    19841      netrender    2w      REG     26,10     47549 37261653 /Volumes/atlantic/rushlogs/3d/tjn_se_0210_c004rs_a033as_l014hh_seal_matteSealA.log/0008
> 
> The process ids are the same (19841), so it's one process.
> 
> There are two entries in lsof, one for the stdout file descriptor (1w)
> and one for the stderr file descriptor (2w) which are both 'write only' (w).
> 
> This is normal behavior during rendering, because the processes
> are generating output that is being written to the file server.
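
The reading of the lsof output above (one process, two write-only descriptors) can be checked mechanically. The following is a minimal sketch, assuming the usual lsof column order (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); `decode_fd` and `group_by_pid` are hypothetical helper names, and the two sample lines are the ones quoted in this thread.

```python
from collections import defaultdict

STD_NAMES = {0: "stdin", 1: "stdout", 2: "stderr"}
MODES = {"r": "read only", "w": "write only", "u": "read/write"}

def decode_fd(fd):
    """Turn an lsof FD field such as '1w' into ('stdout', 'write only')."""
    digits = fd.rstrip("rwu")
    if not digits.isdigit():
        return fd, None  # e.g. "cwd" or "txt": not a numbered descriptor
    number = int(digits)
    return STD_NAMES.get(number, "fd %d" % number), MODES.get(fd[len(digits):])

def group_by_pid(lines):
    """Group (descriptor, mode, path) entries by process id."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split(None, 8)  # NAME is the 9th field and may contain spaces
        pid, fd, name = int(fields[1]), fields[3], fields[8]
        groups[pid].append(decode_fd(fd) + (name,))
    return dict(groups)

LOG = ("/Volumes/atlantic/rushlogs/3d/"
       "tjn_se_0210_c004rs_a033as_l014hh_seal_matteSealA.log/0008")
lines = [
    "Render 19841 netrender 1w REG 26,10 47549 37261653 " + LOG,
    "Render 19841 netrender 2w REG 26,10 47549 37261653 " + LOG,
]
for pid, entries in group_by_pid(lines).items():
    for descriptor, mode, path in entries:
        print(pid, descriptor, mode, path)
```

Both entries land under the single pid 19841, one as stdout and one as stderr, each opened write only.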
> 
> You might check to see what's going on in the log, as maybe it's generating
> a large amount of output due to a high verbosity setting.
> 

The render log looks the same as the others, nothing to suggest an error.
Log size is 46K.

> Often you can mitigate the cpu/network overhead of render's verbose output
> by setting the 'maxlogsize' to some very high number (like 10000000, ie. 10M)
> so that the render output is piped through 'logtrim', which will line buffer
> the render's output so that verbose output is buffered.
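
To illustrate why line buffering helps here: a renderer that emits output in many small fragments causes one write to the file server per fragment, while a line-buffering filter causes one write per complete line. The sketch below is not Rush's logtrim, whose internals are not described in this thread; `line_buffer` is a hypothetical helper and the fragments are invented.

```python
def line_buffer(chunks):
    """Yield complete lines from an iterable of arbitrary-sized byte chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line + b"\n"
    if pending:
        yield pending  # trailing partial line, flushed at end of input

# Invented fragments standing in for a renderer's unbuffered output.
fragments = [b"fra", b"me 1 ", b"done\nfr", b"ame 2 done\n"]
writes = list(line_buffer(fragments))
print(len(fragments), "fragments ->", len(writes), "writes")
```

With a verbose render the ratio of fragments to lines can be much larger than in this toy input, which is where the saving in network writes would come from.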
> 
> BTW, what version of rush are you running?
> I don't think it matters in this case, as the io has to do with the render
> and likely its verbosity setting. But it may help to know on this end.

RUSHD 102.42a9

> 
> 
> --
> Greg Ercolano, erco@(email surpressed)
> Seriss Corporation
> Rush Render Queue, http://seriss.com/rush/
> Tel: 626-795-5922x23
> Fax: 626-795-5947
> Cel: 310-266-8906


