From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: jobs not picking up spare cpus
   Date: Fri, 25 May 2007 12:40:41 -0400
Msg# 1550
Andrew Kingston wrote:
> ALERT      Ignoring frmarb 'Run': task in unexpected state 'Idle' 
> (expected Start|Busy) msg from ?@lafarm23:33523 * Lots of these showing 
> up for different farm machines
> 
> FAIL/LISTCPUS Fputs[2]: write failed: _SureWrite(): Broken pipe
> 
> ALERT	   Ignoring 'Idle': task in non-applicable state 'Start' for jobid 
> lin2.928 from ?@lafarm19:

	Can you send me some complete logs directly via email
	(i.e. not on the group)? I'm not sure whether these are
	really something to worry about.
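
	To see at a glance which machines the ALERT lines come from,
	something like the rough Python sketch below could tally them per host.
	It assumes each log message sits on a single line and carries the
	"?@host:port" sender tag seen in the lines quoted above. The helper
	name alerts_per_host is made up for illustration, and the sample
	lines are the quoted messages with their line wraps joined.

```python
import re
from collections import Counter

# Matches the "?@host:" sender tag seen in the quoted ALERT lines.
# Assumption: hostnames contain only letters, digits, '.', '_' and '-'.
SENDER_RE = re.compile(r"\?@([A-Za-z0-9_.-]+):")


def alerts_per_host(log_lines):
    """Count ALERT lines per sending host (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        # Only ALERT messages are of interest; skip FAIL and other lines.
        if not line.lstrip().startswith("ALERT"):
            continue
        match = SENDER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: the messages quoted in this thread, one per line.
sample = [
    "ALERT      Ignoring frmarb 'Run': task in unexpected state 'Idle' "
    "(expected Start|Busy) msg from ?@lafarm23:33523",
    "ALERT      Task 'CpuPass1' ignored for non-existant frame -99999 "
    "from ?@lafarm23:33566",
    "ALERT      Ignoring 'Idle': task in non-applicable state 'Start' "
    "for jobid lin2.928 from ?@lafarm19:",
    "FAIL/LISTCPUS Fputs[2]: write failed: _SureWrite(): Broken pipe",
]

for host, count in alerts_per_host(sample).most_common():
    print(host, count)
```

	For a real log, pass the open file in place of the sample list.
	The full logs are still what's needed to diagnose the problem.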

	Regarding jobs not picking up spare cpus, focus on the
	'Cpus' report for the job (check the STATE and NOTES columns)
	and compare it to the 'All Cpus' report to see what's idle vs. in use.
	Send me those two reports if need be.

> lafarm19 & lafarm23 were two of the machines not picking up frames.
> 
> Also I've just checked through the logs this morning & found quite a few 
> of these types of messages on that job server:-
> 
> ALERT      Task 'CpuPass1' ignored for non-existant frame -99999 from ?@lafarm23:33566
>   Prev=lin2            0       lin2.892,091_070_tiles_v04     -99999 100   2048 JobPass    Job state is 'Done'
>   New=lin2             0       lin2.892,091_070_tiles_v04     -99999  100   2048 CpuPass2   Ram unavailable on lafarm23 (2048>0)
> 
> These only appeared between 4 & 4:15 am, and I'm fairly sure no one was 
> here rendering then...

	At 5am rush runs a cleanup operation (see 'taskcleanuphours' in rush.conf),
	but I'm not sure why it would show up at 4am instead of 5am.
	I'd need to see some logs to tell what's up there.

	Is there possibly a mix of different rush versions on the network?
	When sending the above (in a separate email), include the output of:

		rush -ping +any -t 3

-- 
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
