From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Shake INIT_Processeses problem
   Date: Fri, 04 Aug 2006 03:41:49 -0400
Msg# 1362
I have just noticed that this is happening on some boxes that are not
rendering. We have just started rolling out the 102.42a6 update to a few
machines on the farm today. Do you think this is related?


-----Original Message-----
From: Greg Ercolano [mailto:erco@(email surpressed)] 
Sent: 02 August 2006 14:40
To: void@(email surpressed)
Subject: Re: Shake INIT_Processeses problem

[posted to rush.general]

Dylan Penhale wrote:
> [posted to rush.general]
> 
> Thanks Greg
> 
> If I ssh into the problem machine as the user that submits the job and 
> try to launch shake I get:
> 
> kCGErrorRangeCheck : Window Server communications from outside of
> session allowed for root and console user only
> INIT_Processeses(), could not establish the default connection to the WindowServer.Abort trap
> 
> However, I get that error on other machines that "are" able to render
> out the frame fine.
> 
> This problem is intermittent, too. Sometimes the machine can render;
> occasionally we get this:
> Executing: shake -exec /var/tmp/.RUSH_TMP.42/re_245_330_x005sc_F003.shk -t 26-26 -proxyscale Base -vv -cpus 2
> INIT_Processeses(), could not establish the default connection to the WindowServer.--- shake: terminated by signal 6
> 
> I think the error is linked, though. The user that is having the
> problem has a lower user ID than others (169 compared to the usual 1000+),
> and I do remember reading something about low IDs being a problem for Mac
> machines.
> 
> I will change his ID and report back.
> 
> -----Original Message-----
> From: Greg Ercolano [mailto:erco@(email surpressed)]
> Sent: 25 July 2006 13:16
> To: void@(email surpressed)
> Subject: Re: Shake INIT_Processeses problem
> 
> [posted to rush.general]
> 
>  > INIT_Processeses(), could not establish the default
>  > connection to the WindowServer.--- shake: terminated by signal 6
> 
> Sounds like shake is trying to access the window manager when it 
> shouldn't be.
> 
> The two most common causes of this:
> 
>      1) User error -- the shake file is trying to render
>         to the screen, instead of rendering to a file.
> 
>      2) Bad OS library (e.g., QuickTime) loaded by shake
>         that is trying to manipulate the window manager.
> 
> Regarding #1, try running the same shake command from a terminal to see
> if it opens a GUI. If it does, that's the problem.
> 
> If it doesn't, then it's probably #2, which means some OSX library 
> (that shake is loading) is trying to access the window manager when 
> the library is loaded and initialized.
> 
> In the past I've seen QuickTime libraries cause this, where someone
> either updated the QuickTime libs from Apple with buggy libs causing
> the problem, or did a recent OS re-install from CDs that DIDN'T pick up
> the latest updates from Apple.
> 
>  > This is only happening on 3 machines, the others are fine.
> 
> Check the patch level of the machines (i.e., run 'sw_vers' on each box).
> 
> You can probably replicate this problem by ssh'ing into the same 
> machine that rendered the frame and failed, and logging in as the same 
> user the rush render was running shake as. This user likely doesn't
> match the user logged into the window manager, hence the error
> about being unable to connect to the window manager.
> 
> Shake renders should not be trying to access the window manager unless
> something is wrong, i.e., #1 or #2 above.
> 
> Dylan Penhale wrote:
>> [posted to rush.general]
>>
>> Has anyone seen the following error when trying to render shake jobs 
>> through rush?
>>
>> Executing: shake -exec /var/tmp/.RUSH_TMP.42/re_245_330_x005sc_F003.shk -t 26-26 -proxyscale Base -vv -cpus 2
>> INIT_Processeses(), could not establish the default connection to the WindowServer.--- shake: terminated by signal 6
>>
>> This is only happening on 3 machines, the others are fine.
>> The 3 machines are able to resolve DNS, and get the UID/GID of the 
>> submitting user.
>> Shake runs fine on these boxes.
>>
>> I notice that this may be similar to the AE issue listed here: 
>> http://seriss.com/rush-current/issues-afterfx-6.5/index.html
>>
>> Should I change the shake owner to 0:0 on the problem hosts?

	Doing a chmod 4755; chown 0:0 on the shake executable will surely bypass the problem,
	similar to how that 'fixes' the problem with AfterFx.

	It's not a great solution, of course, as it makes the program
	run as root, and the files it reads/writes are accessed as root too.
	But in a production environment, a sysadmin has to 'do what you gotta do'
	to keep the production locomotive running on the track, permissions
	be damned.
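	A minimal sketch of that workaround as shell functions, assuming it is run
	as root and pointed at the real shake executable (the function names and
	the path in the usage comment are hypothetical). It does the chown before
	the chmod, because on some systems (Linux, for one) changing a file's owner
	clears its setuid bit even when root does it. On many systems the setuid
	bit is also ignored on #! script wrappers, so it has to go on the binary.

```sh
#!/bin/sh
# Sketch of the setuid-root workaround. Function names are hypothetical.
# Trade-off: the program, and every file it reads or writes, runs as root.

# apply_setuid_root BINARY: must itself be run as root.
apply_setuid_root() {
    # chown first: some systems (e.g. Linux) clear the setuid bit on chown,
    # even for root, which would undo a chmod done beforehand.
    chown 0:0 "$1" && chmod 4755 "$1"
}

# is_setuid_root BINARY: succeeds only if BINARY is owned by uid 0
# and has the setuid bit (04000) set.
is_setuid_root() {
    [ -n "$(find "$1" -prune -user 0 -perm -4000)" ]
}

# Usage (path is hypothetical; use the real executable, not a wrapper script):
#   apply_setuid_root /path/to/shake
#   is_setuid_root /path/to/shake && echo "shake will run as root"
# To undo the workaround:
#   chmod 0755 /path/to/shake
```

	Verifying with is_setuid_root afterwards catches the case where the
	ownership change silently stripped the setuid bit.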

> I can't figure out why only some boxes have the problem.

	I'd bet it's a library issue or a plugin issue, or a combo of the two,
	where some machines have different versions of libraries and/or plugins
	than others.
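	One way to narrow down which machines differ, following the earlier
	suggestion to check patch levels with sw_vers: collect one "HOST VALUE"
	line per render node and flag the hosts whose value is not the most common
	one. This is a sketch assuming passwordless ssh to each node; the odd_hosts
	name and the host names are hypothetical. The same filter works for any
	per-host value, such as a plugin's version string.

```sh
#!/bin/sh
# odd_hosts: read "HOST VALUE" lines on stdin and print the hosts whose
# VALUE differs from the most common VALUE (e.g. an OS build number).
odd_hosts() {
    awk '{ host[NR] = $1; val[NR] = $2; count[$2]++ }
         END {
             best = 0
             for (v in count)
                 if (count[v] > best) { best = count[v]; common = v }
             for (i = 1; i <= NR; i++)
                 if (val[i] != common) print host[i], val[i]
         }'
}

# Usage (host names are hypothetical):
#   for h in render01 render02 render03 render04; do
#       echo "$h $(ssh "$h" sw_vers -buildVersion)"
#   done | odd_hosts
```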

	In an ssh session, you might try ktrace'ing the binary to see if you can
	determine /which/ library is being initialized that is causing
	the problem.
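	A rough sketch of that ktrace approach, assuming the Mac OS X 10.4-era
	ktrace/kdump tools: trace shake from the ssh session, then pull the path
	lookups (NAMI records) for library-like files out of the kdump output.
	The filename patterns treated as "libraries" are an assumption, and the
	function name and scene path are hypothetical. The last few new paths
	before the abort are likely suspects to compare against a working box.

```sh
#!/bin/sh
# list_loaded_libs: read kdump output on stdin and print each library or
# plugin path from NAMI (path lookup) records once, in first-seen order.
# The extension list below is an assumption; extend it for your plugins.
list_loaded_libs() {
    sed -n 's/.*NAMI *"\(.*\)".*/\1/p' |
        grep -E '\.(dylib|so)$|\.(framework|bundle|component|plugin)(/|$)' |
        awk '!seen[$0]++'
}

# Usage, from an ssh session as the render user (scene path is hypothetical):
#   ktrace -f /tmp/shake.ktr shake -exec /path/to/scene.shk -t 26-26 -vv
#   kdump -f /tmp/shake.ktr | list_loaded_libs | tail -5
```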

	Sometimes libraries initialize right after they load, giving a
	tell-tale sign as to the problem. If you can figure out which
	lib it is, you might then be able to compare the file size or
	rev number of that lib against the working machines.
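	To compare a suspect library (or a whole plugin directory) between a good
	and a bad machine, a checksum manifest is quicker than eyeballing sizes.
	The sketch below prints the POSIX cksum CRC, byte size, and relative path
	of every file under a directory; run it on both hosts and diff the two
	outputs. The function name, host names, and directory in the usage
	comment are hypothetical examples.

```sh
#!/bin/sh
# lib_manifest DIR: print "CRC SIZE ./relative/path" for every regular file
# under DIR, sorted by path, so manifests from two hosts can be diffed.
lib_manifest() {
    (
        cd "$1" || exit 1
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            cksum "$f"
        done
    )
}

# Usage (directory and host names are hypothetical examples):
#   lib_manifest /Library/QuickTime > "$(hostname).txt"   # on each host
#   diff goodhost.txt badhost.txt
```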


--
Greg Ercolano, erco@(email surpressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

