From: "Dylan Penhale" <dylan@(email surpressed)>
Subject: Staggering render start times.
   Date: Fri, 13 May 2004 03:56:05 -0700
Msg# 585
View Complete Thread (7 articles) | All Threads
How can I stagger the start time of each frame of a render job?
 
Because of the size of some of our renders, setting them all off at once is
putting quite a load on our file server and slowing workstations. Is it
possible to set an advanced submit option to delay each node's frame start
time?
 
Perhaps something along the lines of: rush -waitfor 30 [jobid]? This looks
as if it will only work on jobs not frames.

 
Dylan Penhale
 
Digital Systems Manager
Nexus Visual Effects



   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Staggering render start times.
   Date: Sat, 14 May 2004 01:53:45 -0700
Msg# 602
Dylan Penhale wrote:
> How can I stagger the start time of each frame of a render job?
> Because of the size of some of our renders, setting them all off at once is
> putting quite a load on our file server and slowing workstations. Is it
> possible to set an advanced submit option to delay each node's frame start
> time?

	Trouble is, even if you stagger, eventually once all the
	machines are rendering, they'll all be hitting your server.

	If you've got a network large enough to where it slows your
	server down, you may want to consider some kind of network
	flow control at the switches, so that the workstations have
	higher priority to the network than the farm.

	To stagger start, you could do any one of several things..
	just to give you an idea:

		o Submit with the 'maxcpus' set to '1', then slowly advance
		  the maxcpus value with an external script:

			#!/bin/csh -f
			set jobid = $1
			# Every 10 seconds, raise the maxcpus limit (2, 4, 8, 10); the final 0 removes the limit
			foreach i ( 2 4 8 10 0 )
			    rush -maxcpus $i $jobid -fu
			    sleep 10
			end

		  ..or build this behavior into the submit script.

		o Ramp up sleep times in the render script for the first 20 frames:

			if ( $frame <= 20 ) { sleep($frame*5); }

		o Enter a loop that polls the server's uptime before starting
		  a frame to render, eg. in perl:

			while ( 1 )
			{
			    # Parse output of 'uptime' to get the 1-minute load average.
			    # Matches "load average:" (Linux) and "load averages:" (BSD/OS X).
			    # If load average is high (>8), sleep until it gets lower
			    my $load = `rsh SERVER uptime | sed 's/.*load averages*: //;s/[, ].*//'`;
			    if ( $load > 8.0 ) { sleep(10); next; }
			    last;
			}
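
The first option above (ramping 'maxcpus') could also be driven from Python rather than csh. What follows is only a minimal sketch, not part of Rush: the function name `ramp_maxcpus` and its `run`/`sleep` parameters are made up for illustration, and the only Rush command it issues is the `rush -maxcpus <n> <jobid> -fu` line taken from the csh loop. It assumes, as that script's comment says, that a final value of 0 removes the limit.

```python
import subprocess
import time


def ramp_maxcpus(jobid, steps=(2, 4, 8, 10, 0), delay=10,
                 run=subprocess.call, sleep=time.sleep):
    """Raise a job's maxcpus limit step by step, pausing between steps.

    `run` and `sleep` are injectable (hypothetical parameters) so the ramp
    can be dry-run without touching a real job.
    Returns the list of commands that were issued.
    """
    issued = []
    for cpus in steps:
        # Same command line as the csh loop: rush -maxcpus <n> <jobid> -fu
        cmd = ["rush", "-maxcpus", str(cpus), jobid, "-fu"]
        run(cmd)
        issued.append(cmd)
        sleep(delay)
    return issued
```

A dry run such as `ramp_maxcpus("JOBID", run=print, sleep=lambda s: None)` prints the five commands without calling rush, which is a cheap way to check the schedule before pointing it at a live job.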

   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Staggering render start times.
   Date: Sat, 14 May 2004 02:13:32 -0700
Msg# 603
Greg Ercolano wrote:
    If you've got a network large enough to where it slows your
    server down, you may want to consider some kind of network
    flow control at the switches, so that the workstations have
    higher priority to the network than the farm.

	Many Cisco routers have this, BTW.

	They refer to this as QoS (Quality of Service) prioritization.
	There may be other terms for it.

	Made popular because of VoIP, and other high volume data
	streaming apps.

	I believe some of the more sophisticated switches have
	prioritization as well, where you can set priorities
	at the individual ports, so that workstations and server
	can be given higher priority than the farm.

	I don't know specifics, I just hear rumors ;)

	Has anyone on the group played with this stuff, and has
	any stories to tell?

	I know some of my larger customers have played with that
	stuff with success, but I've never received particular
	hardware or 'best config' recommendations.

   From: Steve Kochak <steve@(email surpressed)>
Subject: Re: Staggering render start times.
   Date: Sat, 14 May 2004 09:33:34 -0700
Msg# 605
I used to use it with my Extreme Networks setup. However, in the end, I just ended up making my storage better. Which OS are you using for your fileserver(s)?





   From: "Dylan Penhale" <dylan@(email surpressed)>
Subject: RE: Staggering render start times.
   Date: Sat, 14 May 2004 09:40:11 -0700
Msg# 606
Gawd, don't ask :)

We are using OSX Xserves. I seem to get nothing but grief from them. I'm
having a real problem with "delayed write failed" errors running the latest
10.3.3 release, and if I roll it back that error is not there, but all sorts
of performance related issues come up. NFS used to be no better on these,
but I may look at that next.

Of course I know the real answer is a NetApp or similar... 



   From: "Dylan Penhale" <dylan@(email surpressed)>
Subject: RE: Staggering render start times.
   Date: Tue, 17 May 2004 01:24:09 -0700
Msg# 613
After I switched the Maya workstations' ports on our main switch to HIGH QOS,
there is certainly a difference. I haven't done any kind of tests, but I
thought I would let you know.

The switch I am using as a "backbone" switch is the Asante GX5-2400W. It's
pretty neat. The only thing I miss is SNMP monitoring: I need to keep the
web interface open to do any monitoring, which makes graphing tricky.





   From: "Dylan Penhale" <dylan@(email surpressed)>
Subject: RE: Staggering render start times.
   Date: Sat, 14 May 2004 02:36:56 -0700
Msg# 604
  
|> Because of the size of some of our renders, setting them all off at
|> once is putting quite a load on our file server and slowing
|> workstations. Is it possible to set an advanced submit option to delay
|> each node's frame start time?
|
|	Trouble is, even if you stagger, eventually once all the
|	machines are rendering, they'll all be hitting your server.

I think the main problem we were seeing with big jobs is that all the
render nodes try to load the same data at exactly the same time, and this is
when the server hits 100%. Even if we could separate their reads by a few
seconds it should help. I agree that this is only likely to help in
situations where actual render times are long compared to server I/O; if a
frame takes only a short time to render, it's likely to have little effect.
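
The idea of separating the nodes' initial reads by a few seconds can be sketched as a per-frame delay computed before the render command is launched. This is a hedged illustration only: `start_delay`, `first_frame`, `ramp_frames` and `seconds_per_frame` are hypothetical names, and the defaults (20 frames, 5 seconds apart) are borrowed from the sleep example quoted in this message. It assumes frames are handed out roughly in order, so that the first frames of the job are the ones that would otherwise start together.

```python
def start_delay(frame, first_frame=1, ramp_frames=20, seconds_per_frame=5):
    """Return how many seconds to sleep before rendering `frame`.

    Only the first `ramp_frames` frames of the job are delayed; each one
    waits `seconds_per_frame` longer than the frame before it.
    """
    index = frame - first_frame          # 0 for the first frame of the job
    if 0 <= index < ramp_frames:
        return index * seconds_per_frame
    return 0                             # later frames start immediately
```

A render script would call something like `time.sleep(start_delay(frame))` before starting the renderer. Passing `first_frame` keeps the ramp working for jobs whose range does not start at 1, for example `start_delay(105, first_frame=101)` gives 20 seconds.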

|
|	If you've got a network large enough to where it slows your
|	server down, you may want to consider some kind of network
|	flow control at the switches, so that the workstations have
|	higher priority to the network than the farm.

I'll look at this, I've been meaning to look at tuning the switches anyway,
and they have got some pretty cool looking features.

|
|	To stagger start, you could do any one of several things..
|	just to give you an idea:
|
|		o Submit with the 'maxcpus' set to '1', then slowly advance
|		  the maxcpus value with an external script:
|
|			#!/bin/csh -f
|			set jobid = $1
|			# Every 10 seconds, raise the maxcpus limit (2, 4, 8, 10); the final 0 removes the limit
|			foreach i ( 2 4 8 10 0 )
|			    rush -maxcpus $i $jobid -fu
|			    sleep 10
|			end
|
|		  ..or build this behavior into the submit script.
|
|		o Ramp up sleep times in the render script for the first 20 frames:
|
|			if ( $frame <= 20 ) { sleep($frame*5); }
|
|		o Enter a loop that polls the server's uptime before starting
|		  a frame to render, eg. in perl:
|
|			while ( 1 )
|			{
|			    # Parse output of 'uptime' to get the 1-minute load average.
|			    # Matches "load average:" (Linux) and "load averages:" (BSD/OS X).
|			    # If load average is high (>8), sleep until it gets lower
|			    my $load = `rsh SERVER uptime | sed 's/.*load averages*: //;s/[, ].*//'`;
|			    if ( $load > 8.0 ) { sleep(10); next; }
|			    last;
|			}
|
This is cool, I would love to get this working off the CPU load.... I'll let
you know how I go, and uptime should report load fast enough.
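
For anyone wanting to try the load-based approach outside Perl, here is a rough Python sketch of the same polling loop. The names `parse_load_average` and `wait_for_low_load` are invented for this example, and the threshold of 8 and the 10 second poll interval are simply the values from the Perl loop quoted above. The parser is an assumption about `uptime` output: it accepts both the "load average:" and "load averages:" spellings and takes the first (1-minute) figure.

```python
import re
import time

_LOAD_RE = re.compile(r"load averages?:\s*([0-9]+(?:[.,][0-9]+)?)")


def parse_load_average(uptime_output):
    """Return the 1-minute load average from `uptime` text, or None."""
    match = _LOAD_RE.search(uptime_output)
    if match is None:
        return None
    # Some locales print a decimal comma; normalise it before converting.
    return float(match.group(1).replace(",", "."))


def wait_for_low_load(get_uptime, threshold=8.0, poll_seconds=10,
                      sleep=time.sleep):
    """Block until the reported load is at or below `threshold`.

    `get_uptime` is any callable returning `uptime` output as a string.
    If the output cannot be parsed, return None at once rather than
    stalling the render forever.
    """
    while True:
        load = parse_load_average(get_uptime())
        if load is None or load <= threshold:
            return load
        sleep(poll_seconds)
```

In a render script, `get_uptime` could be `lambda: subprocess.check_output(["rsh", "SERVER", "uptime"], text=True)`, with SERVER standing in for the file server's hostname as in the Perl loop. Keeping the parser separate from the remote call makes it easy to check against a captured line of `uptime` output from the actual server before relying on it.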