From: "Dylan Penhale" <dylan@(email surpressed)> Subject: Staggering render start times. Date: Fri, 13 May 2004 03:56:05 -0700 |
Msg# 585 View Complete Thread (7 articles) | All Threads Last Next |
How can I stagger the start time of each frame of a render job?

Because of the size of some of our renders, setting them all off at once is putting quite a load on our file server and slowing workstations. Is it possible to set an advanced submit option to delay each node's frame start time?

Perhaps something along the lines of: rush -waitfor 30 [jobid]? This looks as if it will only work on jobs, not frames.

Dylan Penhale
Digital Systems Manager
Nexus Visual Effects
From: Greg Ercolano <erco@(email surpressed)> Subject: Re: Staggering render start times. Date: Sat, 14 May 2004 01:53:45 -0700 |
Msg# 602
Dylan Penhale wrote:
> How can I stagger the start time of each frame of a render job?
> Because of the size of some of our renders, setting them all off at
> once is putting quite a load on our file server and slowing
> workstations. Is it possible to set an advanced submit option to delay
> each node's frame start time?

Trouble is, even if you stagger, eventually once all the machines are rendering, they'll all be hitting your server.

If you've got a network large enough to where it slows your server down, you may want to consider some kind of network flow control at the switches, so that the workstations have higher priority to the network than the farm.

To stagger the start, you could do any one of several things. Just to give you an idea:

    o Submit with the 'maxcpus' set to '1', then slowly advance
      the maxcpus value with an external script:

          #!/bin/csh -f
          set jobid = $1
          # Every 10 seconds, raise maxcpus. Eventually, removes maxcpus limit
          foreach i ( 2 4 8 10 0 )
              rush -maxcpus $i $jobid -fu
              sleep 10
          end

      ..or build this behavior into the submit script.

    o Ramp up sleep times in the render script for the first 20 frames:

          if ( $frame <= 20 ) { sleep($frame*5); }

    o Enter a loop that polls the server's uptime before starting
      a frame to render, e.g. in perl:

          while ( 1 )
          {
              # Parse output of 'uptime' to get load average
              # If load average is high (>8), sleep until it gets lower
              # ('averages*' matches both "load averages:" and "load average:")
              my $load=`rsh SERVER uptime | sed 's/.*load averages*: //;s/ .*//'`;
              if ( $load > 8.0 ) { sleep(10); next; }
              last;
          }
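For readers who would rather not depend on sed, the load-polling idea above can be sketched in Python. This is only an illustration under stated assumptions, not part of Rush: the function names `parse_load`, `remote_uptime` and `wait_for_low_load` are made up for this example, the threshold of 8 and the 10 second poll interval are taken from the perl loop above, and the remote shell command is assumed to be available on the render node.

```python
import re
import subprocess
import time

# Matches "load average: 0.52, 0.41, 0.30" (Linux) as well as
# "load averages: 0.52 0.41 0.30" (BSD / Mac OS X).
_LOAD_RE = re.compile(r"load averages?:\s*([0-9.]+)")


def parse_load(uptime_output):
    """Return the 1-minute load average found in 'uptime' output."""
    match = _LOAD_RE.search(uptime_output)
    if match is None:
        raise ValueError("no load average in: %r" % uptime_output)
    return float(match.group(1))


def remote_uptime(host, remote_shell="rsh"):
    """Run 'uptime' on host (hypothetical helper; rsh as in the thread)."""
    result = subprocess.run([remote_shell, host, "uptime"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_low_load(get_uptime, threshold=8.0, poll_seconds=10,
                      sleep=time.sleep):
    """Block while the reported load is above threshold.

    get_uptime is any callable returning 'uptime' text, so the loop can be
    exercised without a real server. Returns how many times it slept.
    """
    waits = 0
    while parse_load(get_uptime()) > threshold:
        sleep(poll_seconds)
        waits += 1
    return waits
```

In a render script this might be called as `wait_for_low_load(lambda: remote_uptime("SERVER"))` just before the renderer is started, where SERVER stands in for the file server's hostname as it does in the perl example.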
From: Greg Ercolano <erco@(email surpressed)> Subject: Re: Staggering render start times. Date: Sat, 14 May 2004 02:13:32 -0700 |
Msg# 603
Greg Ercolano wrote:
> If you've got a network large enough to where it slows your
> server down, you may want to consider some kind of network
> flow control at the switches, so that the workstations have
> higher priority to the network than the farm.

Many Cisco routers have this, BTW.

They refer to this as QoS (Quality of Service) prioritization. There may be other terms for it.

It was made popular by VoIP and other high-volume data streaming apps.

I believe some of the more sophisticated switches have prioritization as well, where you can set priorities at the individual ports, so that workstations and the server can be given higher priority than the farm.

I don't know specifics, I just hear rumors ;)

Has anyone on the group played with this stuff, and have any stories to tell?

I know some of my larger customers have played with that stuff with success, but I've never received particular hardware or 'best config' recommendations.
From: Steve Kochak <steve@(email surpressed)> Subject: Re: Staggering render start times. Date: Sat, 14 May 2004 09:33:34 -0700 |
Msg# 605
I used to use it with my Extreme Networks setup. However, in the end, I
just ended up making my storage better. Which OS are you using for your
fileserver(s)?

Greg Ercolano wrote:
> Greg Ercolano wrote:
>> If you've got a network large enough to where it slows your
>> server down, you may want to consider some kind of network
>> flow control at the switches, so that the workstations have
>> higher priority to the network than the farm.
>
> Many Cisco routers have this, BTW.
>
> They refer to this as QOS (Quality Of Service) prioritization.
> There may be other terms for it.
>
> Made popular because of VoIP, and other high volume data
> streaming apps.
>
> I believe some of the more sophisticated switches have
> prioritization as well, where you can set priorities
> at the individual ports, so that workstations and server
> can be given higher priority than the farm.
>
> I don't know specifics, I just hear rumors ;)
>
> Has anyone on the group played with this stuff, and has
> any stories to tell?
>
> I know some of my larger customers have played with that
> stuff with success, but I've never received particular
> hardware or 'best config' recommendations.
From: "Dylan Penhale" <dylan@(email surpressed)> Subject: RE: Staggering render start times. Date: Sat, 14 May 2004 09:40:11 -0700 |
Msg# 606
Gawd, don't ask :)

We are using OSX Xserves. I seem to get nothing but grief from them. I'm having a real problem with "delayed write failed" errors running the latest 10.3.3 release, and if I roll it back that error is not there, but all sorts of performance-related issues come up. NFS used to be no better on these, but I may look at that next.

Of course I know the real answer is a NetApp or similar...

|-----Original Message-----
|From: Steve Kochak [mailto:steve@(email surpressed)]
|Sent: 14 May 2004 17:34
|To: void@(email surpressed)
|Subject: Re: Staggering render start times.
|
|[posted to rush.general]
|
|I used to use it with my Extreme Networks setup. However, in
|the end, I just ended up making my storage better. Which OS
|are you using for your fileserver(s)?
|
|Greg Ercolano wrote:
|> Greg Ercolano wrote:
|>
|>> If you've got a network large enough to where it slows your
|>> server down, you may want to consider some kind of network
|>> flow control at the switches, so that the workstations have
|>> higher priority to the network than the farm.
|>
|> Many Cisco routers have this, BTW.
|>
|> They refer to this as QOS (Quality Of Service) prioritization.
|> There may be other terms for it.
|>
|> Made popular because of VoIP, and other high volume data
|> streaming apps.
|>
|> I believe some of the more sophisticated switches have
|> prioritization as well, where you can set priorities
|> at the individual ports, so that workstations and server
|> can be given higher priority than the farm.
|>
|> I don't know specifics, I just hear rumors ;)
|>
|> Has anyone on the group played with this stuff, and has
|> any stories to tell?
|>
|> I know some of my larger customers have played with that
|> stuff with success, but I've never received particular
|> hardware or 'best config' recommendations.
From: "Dylan Penhale" <dylan@(email surpressed)> Subject: RE: Staggering render start times. Date: Tue, 17 May 2004 01:24:09 -0700 |
Msg# 613
After I switched the Maya workstations' ports on our main switch to HIGH QOS, there is certainly a difference. I haven't done any kind of tests, but I thought I would let you know.

The switch I am using as a "backbone" switch is the Asante GX5-2400W. It's pretty neat. The only thing I miss is SNMP monitoring: I need to keep the web interface open to do any monitoring, which makes graphing tricky.

|-----Original Message-----
|From: Greg Ercolano [mailto:erco@(email surpressed)]
|Sent: 14 May 2004 10:14
|To: void@(email surpressed)
|Subject: Re: Staggering render start times.
|
|[posted to rush.general]
|
|Greg Ercolano wrote:
|> If you've got a network large enough to where it slows your
|> server down, you may want to consider some kind of network
|> flow control at the switches, so that the workstations have
|> higher priority to the network than the farm.
|
| Many Cisco routers have this, BTW.
|
| They refer to this as QOS (Quality Of Service) prioritization.
| There may be other terms for it.
|
| Made popular because of VoIP, and other high volume data
| streaming apps.
|
| I believe some of the more sophisticated switches have
| prioritization as well, where you can set priorities
| at the individual ports, so that workstations and server
| can be given higher priority than the farm.
|
| I don't know specifics, I just hear rumors ;)
|
| Has anyone on the group played with this stuff, and has
| any stories to tell?
|
| I know some of my larger customers have played with that
| stuff with success, but I've never received particular
| hardware or 'best config' recommendations.
From: "Dylan Penhale" <dylan@(email surpressed)> Subject: RE: Staggering render start times. Date: Sat, 14 May 2004 02:36:56 -0700 |
Msg# 604
|> Because of the size of some of our renders, setting them all off at
|> once is putting quite a load on our file server and slowing
|> workstations. Is it possible to set an advanced submit option to delay
|> each node's frame start time?
|
| Trouble is, even if you stagger, eventually once all the
| machines are rendering, they'll all be hitting your server.

I think the main problem we were seeing with big jobs is when all the render nodes try to load the same data at exactly the same time; this is when the server hits 100%. Even if we could separate them drawing data by a few seconds it should help. I agree that this is only likely to help in situations where actual render times are long compared to server I/O. If a frame takes only a short time to render, staggering is likely to have little effect.

| If you've got a network large enough to where it slows your
| server down, you may want to consider some kind of network
| flow control at the switches, so that the workstations have
| higher priority to the network than the farm.

I'll look at this. I've been meaning to look at tuning the switches anyway, and they have got some pretty cool looking features.

| To stagger start, you could any one of several things..
| just to give you an idea:
|
|     o Submit with the 'maxcpus' set to '1', then slowly advance
|       the maxcpus value with an external script:
|
|           #!/bin/csh -f
|           set jobid = $1
|           # Every 10 seconds, add two cpus. Eventually, removes maxcpus limit
|           foreach i ( 2 4 8 10 0 )
|               rush -maxcpus $i $jobid -fu
|               sleep 10
|           end
|
|       ..or build this behavior into the submit script.
|
|     o Ramp up sleep times in the render script for first 20 frames:
|
|           if ( $frame <= 20 ) { sleep($frame*5); }
|
|     o Enter a loop that polls the server's uptime before starting
|       a frame to render, eg. in perl:
|
|           while ( 1 )
|           {
|               # Parse output of 'uptime' to get load average
|               # If load average is high (>8), sleep until it gets lower
|               my $load=`rsh SERVER uptime | sed 's/.*load averages: //;s/ .*//'`;
|               if ( $load > 8.0 ) { sleep(10); next; }
|               last;
|           }

This is cool, I would love to get this working off the CPU load.... I'll let you know how I go, and uptime should report load fast enough.
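
The "separate them by a few seconds" idea corresponds to the sleep ramp quoted above. A minimal Python sketch of that ramp follows, assuming the render script knows its own frame number. The function name `start_delay` and its parameters are hypothetical, and unlike the quoted one-liner (which sleeps frame*5 seconds, so frame 1 waits 5 seconds) this version counts from the job's first frame so that the first frame starts immediately.

```python
def start_delay(frame, first_frame=1, ramp_frames=20, step_seconds=5):
    """Seconds a frame should sleep before it starts loading scene data.

    Only the first ramp_frames frames of the job are delayed, each one
    step_seconds later than the frame before it. Later frames start as
    nodes free up, so they are already spread out and get no delay.
    """
    offset = frame - first_frame
    if 0 <= offset < ramp_frames:
        return offset * step_seconds
    return 0
```

The render script would then call something like `time.sleep(start_delay(frame))` before launching the renderer. As noted earlier in the thread, this only spreads out the initial burst of reads; once every node is busy, the server still sees the full load.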