From: "Abraham Schneider" <aschneider@(email surpressed)>
Subject: several strange Rush behaviours
   Date: Thu, 19 Jul 2012 04:51:07 -0400
Msg# 2257
Hi there!

I'd like to ask about two strange behaviours that occur from time to time on our Rush render farm and for which I have no plausible explanation. Our farm is a mix of Macs and Linux machines with Rush 102.42a9c/d installed, rendering Nuke 6.3v8.

1. problem:
At irregular, seemingly random intervals we get a situation like this:

Done     rind10.12202     N_NORM_019_060_comp_v04_as aschneid        %100    %0    0   16:52:52
Done     rind10.12204     N_NORM_002_010_comp_v16_dl dlaubsch        %100    %0    0   16:32:02
Done     rind10.12206     N_LOW_048_060_redLog_v00a_mw mwarlimont      %100    %0    0   16:28:48
Done     rind10.12213     N_NORM_001_050_comp_v38_ts tstern          %100    %0    0   16:06:51
Done     rind10.12215     N_LOW_048_080_redLog_v00a_mw mwarlimont      %100    %0    0   16:05:03
Done     rind10.12218     N_NORM_019_060_comp_v04_as aschneid        %100    %0    0   15:08:02
Fail     rind10.12221     N_NORM_001_010_comp_v01_st stischne         %99    %1    0   00:58:31
Done     rind10.12223     N_NORM_001_010_comp_v01_st stischne        %100    %0    0   00:55:04
Run      rind10.12225     N_NORM_001_050_comp_v38_ts tstern           %81    %0    2   00:41:35
Run      rind10.12226     N_NORM_055_010_comp_v01_mt mwarlimont        %0    %0    0   00:36:58
Run      rind10.12227     N_NORM_100_020_comp_v21_mt mwarlimont        %0    %0    0   00:34:11
Done     rind10.12228     N_NORM_001_010_comp_v01_st stischne        %100    %0    0   00:23:08
Run      rind10.12230     N_NORM_103_010_comp_v14_mt mwarlimont       %36    %0   17   00:19:49
Run      rind10.12231     N_NORM_022_cfd0046_comp_v102 ppoetsch          %6    %0    0   00:19:37

Rush is configured to work "first in, first out". All of these jobs were submitted from inside Nuke via a slightly modified submit_nuke.pl script, all with the same priorities, '+nuke=42@500', so as far as I can see there is no difference in how they were submitted. Nothing changed on the farm: no machines were added or removed, switched online or offline, etc.

Most of the time everything works fine and the jobs are rendered one after the other in order of submission time/job ID. But sometimes something like the above happens: job 12225 starts rendering on all online machines, but halfway through, it either stops or its number of CPUs drops significantly, and all the other machines carry on with a much newer job (in this case job 12228), skipping the unfinished frames of job 12225 as well as the jobs submitted next, 12226 and 12227. Job 12228 was rendered completely, and instead of returning to 12225/12226/12227, most of the machines (except for one machine with two CPUs, which keeps rendering 12225) continued with 12230. I tried pausing 12230 while most of the machines were rendering it; the result was that the machines continued with 12231.

Is there any reason why Rush randomly fails to follow 'first in, first out' from time to time, and is there a solution? The problem is hard to debug because I haven't found a way to reproduce this behaviour.
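
To at least catch the situation when it occurs, a listing like the one above could be checked automatically. The following is only a minimal sketch in Python, not part of Rush: the function name find_skipped_jobs is made up, and it assumes the eight whitespace-separated columns are status, job ID, title, owner, % done, % failed, busy CPUs and elapsed time, which is a reading of the output above rather than something confirmed.

```python
def find_skipped_jobs(listing):
    """Return IDs of unfinished running jobs that a newer job has overtaken.

    A job counts as overtaken when a later job ID is already Done or
    currently holds more busy CPUs (assumed meaning of column 7).
    """
    jobs = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # not a job line in the assumed 8-column layout
        status, jobid, _title, _owner, done, _fail, busy, _elapsed = fields
        try:
            number = int(jobid.rsplit(".", 1)[1])  # "rind10.12225" -> 12225
            jobs.append((number, status, int(done.lstrip("%")), int(busy)))
        except (IndexError, ValueError):
            continue  # skip lines that do not parse as expected

    jobs.sort()  # job IDs are assumed to reflect submission order
    skipped = []
    for i, (number, status, done, busy) in enumerate(jobs):
        if status != "Run" or done >= 100:
            continue
        for _, later_status, _, later_busy in jobs[i + 1:]:
            if later_status == "Done" or later_busy > busy:
                skipped.append(number)
                break
    return skipped
```

Run periodically against a saved listing, a non-empty result would mark the moment at which the farm left submission order, which might help to correlate the problem with other events on the farm.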



2. problem:
Most of the time, when a machine/workstation is switched from offline to online, it takes anywhere from many seconds to several minutes for it to pick up a frame and start rendering. The machine is shown as 'online' instantly, but it just won't start rendering a frame; it is listed as 'online' and 'idle' for several minutes. This happens on all of our machines, regardless of whether they are Macs or Linux.

Any explanation for that?


Thanks, Abraham


Abraham Schneider
Senior VFX Compositor
 

ARRI Film & TV Services GmbH
Tuerkenstr. 89
D-80799 Muenchen / Germany

Phone (Tel# suppressed) 

EMail aschneider@(email surpressed)
www.arri.de/filmtv
________________________________


ARRI Film & TV Services GmbH
Registered office: München, Register court: Amtsgericht München
Commercial register number: HRB 69396
Managing directors: Franz Kraus, Dr. Martin Prillmann, Josef Reidinger
