Rush Render Queue - Tutorial
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.42 07/11/05
Strikeout text indicates features not yet implemented
Creating A Submit Script
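Create a submit script by saving the template that 'rush -tss' prints into a file, then make it executable with 'chmod +x submit_me'. When starting with a template, you should at least change the Title, Ram, Command, LogDir, and Cpus fields.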
    % rush -tss > submit_me
    % chmod +x submit_me
    % vi submit_me
    % cat submit_me
    #!/bin/csh -f
    #
    # S U B M I T
    #
    set thisdir = ( `pwd` )
    rush -submit << EOF
    title           TEST
    ram             1
    frames          5000-5500
    command         $thisdir/render_me
    logdir          $thisdir/logs
    cpus            +any=5
    EOF
    exit $status
Creating A Render Script
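Create a render script from the template that 'rush -trs' prints, then make it executable with 'chmod +x render_me'. When starting with a template, you should at least add your render command in the section marked '### YOUR RENDER COMMAND(S) HERE'.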
It is the minimum job of the Render Script to detect whether the render worked, and to exit the script accordingly with one of these rush exit codes:

    exit 0     # DONE  - frame finished
    exit 1     # FAIL  - failed, don't retry
    exit 2     # RETRY - failed, requeue the frame
    % rush -trs > render_me
    % chmod +x render_me
    % vi render_me
    % cat render_me
    #!/bin/csh -f

    ###############################
    # R E N D E R S C R I P T #
    ###############################

    # Source your render environment as needed
    source $RUSH_DIR/etc/.render

    echo "--- Working on frame $RUSH_FRAME - `date`"

    ### YOUR RENDER COMMAND(S) HERE
    render /job/TEXAS/shot17/ribs/$RUSH_PADFRAME.rib
    set err = $status

    # Rush exit codes: 0=DONE 1=FAIL 2=RETRY
    if ( $err ) then
        echo --- FAIL; exit 1
    else
        echo --- DONE; exit 0
    endif
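For readers who prefer Bourne shell, the mapping from a render command's exit status to the three rush exit codes can be sketched as below. This is only an illustration: the function name rush_exit_code and its "retryable" argument are hypothetical, and 'false' stands in for a real render command.

```shell
#!/bin/sh
# rush_exit_code STATUS [RETRYABLE]   (hypothetical helper, not part of rush)
#   STATUS    - exit status of the render command
#   RETRYABLE - "yes" if the failure looks temporary and is worth a retry
# Prints the Rush exit code to use: 0=DONE 1=FAIL 2=RETRY.
rush_exit_code() {
    if [ "$1" -eq 0 ]; then
        echo 0
    elif [ "${2:-no}" = yes ]; then
        echo 2
    else
        echo 1
    fi
}

# 'false' stands in for a render command that failed.
status=0
false || status=$?

code=$(rush_exit_code "$status" yes)
echo "render status $status -> rush exit code $code"
```

A real render script would end with 'exit $code' so that rush sees the result.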
Submitting A Job
When you run the submit script, rush prints a 'setenv RUSH_JOBID' command containing the new job's jobid. Using your mouse, just cut and paste this into your shell to set the variable. If you're using a CSH under Unix, you can submit the job and set the variable all in one operation by invoking the submit script as:

The environment variable setting allows you to control the job with successive rush commands, without having to specify the jobid for each command. Since you can have several jobs running at a time, rush must know which jobid you're controlling at all times.
    % ./submit_me
    setenv RUSH_JOBID va.215
    % setenv RUSH_JOBID va.215
    % rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
    ------ ----------- ------------ -------- ----- ----- ---- -----------
    Run    va.215      TEST         erco     %0    %0    3    00:00:10
    %
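The 'setenv' line is csh syntax, so it cannot be pasted into a Bourne-style shell as is. A minimal sketch of one way to pick the jobid out of that line and export it, assuming the submit output has the shape shown in the transcript; the submit_me function here is a stand-in so the example runs without rush installed:

```shell
#!/bin/sh
# Stand-in for ./submit_me (assumption: the real script prints a line
# of this shape when the job is accepted).
submit_me() {
    echo 'setenv RUSH_JOBID va.215'
}

# Take the third word of the "setenv RUSH_JOBID <jobid>" line.
jobid=$(submit_me | awk '$1 == "setenv" && $2 == "RUSH_JOBID" { print $3 }')

if [ -n "$jobid" ]; then
    RUSH_JOBID=$jobid
    export RUSH_JOBID
    echo "RUSH_JOBID=$RUSH_JOBID"
else
    echo "submit failed: no jobid found" >&2
fi
```

With the variable exported, later rush commands in the same shell would pick up the jobid from the environment.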
Frame List
Specific frames may be targeted for viewing by using typical Unix techniques like piping framelists through grep, e.g.:
    rush -lf | grep Fail
    rush -lf | grep '500[1-9]'
    rush -lf | grep vaio

    % rush -lf | head
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Run  5000  1   vaio     18092 02/26,02:15:40 00:00:21
    Run  5001  1   vaio     18093 02/26,02:15:40 00:00:21
    Run  5002  1   vaio     18100 02/26,02:15:50 00:00:11
    Que  5003  0   -        0     00/00,00:00:00 00:00:00
    Que  5004  0   -        0     00/00,00:00:00 00:00:00
    Que  5005  0   -        0     00/00,00:00:00 00:00:00
    Que  5006  0   -        0     00/00,00:00:00 00:00:00
    Que  5007  0   -        0     00/00,00:00:00 00:00:00
    Que  5008  0   -        0     00/00,00:00:00 00:00:00
    % rush -lf | grep Run
    Run  5000  1   vaio     18092 02/26,02:15:40 00:00:21
    Run  5001  1   vaio     18093 02/26,02:15:40 00:00:21
    Run  5002  1   vaio     18100 02/26,02:15:50 00:00:11
    %
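Note that grep matches anywhere on the line, so a pattern like 'Fail' would also match a frame whose NOTES happen to contain that word. When that matters, matching on the STAT column alone is stricter. A small sketch using awk; the function name frames_in_state is made up for this example, and the sample rows are copied from the frame list report so it runs without rush:

```shell
#!/bin/sh
# frames_in_state STATE   (hypothetical helper)
# Reads a 'rush -lf' style report on stdin and prints the frame numbers
# whose STAT column equals STATE exactly.
frames_in_state() {
    awk -v state="$1" '$1 == state { print $2 }'
}

# Sample rows copied from the frame list report.
report='STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
Run  5000  1   vaio     18092 02/26,02:15:40 00:00:21
Run  5001  1   vaio     18093 02/26,02:15:40 00:00:21
Run  5002  1   vaio     18100 02/26,02:15:50 00:00:11
Que  5003  0   -        0     00/00,00:00:00 00:00:00'

running=$(printf '%s\n' "$report" | frames_in_state Run)
echo "$running"
```

Against a live job, the same function would be fed with 'rush -lf | frames_in_state Run'.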
Frame Logs
You can view the log for a frame with 'rush -log', e.g.:

    rush -log 3

..which would view the frame log for frame 0003. Several frames can be specified at a time, or even frame ranges, e.g.:

    rush -log 1 8 55 342

Or, you can simply more(1) the log files directly; log files use the frame number as the filename. In this example, the user wants to look at the log output for frame #3 and uses 'rush -log 3', which is more or less equivalent to running more(1) on that frame's file in the job's log directory.
    % rush -lf
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Done 0001  1   vaio     19458 02/27,13:36:59 00:08:54
    Done 0002  1   vaio     19461 02/27,13:36:59 00:08:54
    Fail 0003  1   tahoe    1172  02/26,02:15:40 00:00:03
    Run  0004  1   vaio     19593 02/26,02:15:40 00:00:21
    Run  0005  1   vaio     19600 02/26,02:15:50 00:00:11
    Que  0003  0   -        0     00/00,00:00:00 00:00:00
    Que  0004  0   -        0     00/00,00:00:00 00:00:00
    Que  0005  0   -        0     00/00,00:00:00 00:00:00
    Que  0006  0   -        0     00/00,00:00:00 00:00:00
    Que  0007  0   -        0     00/00,00:00:00 00:00:00
    Que  0008  0   -        0     00/00,00:00:00 00:00:00
    % rush -log 3                            # See what happened
    ###                                      __
    ### vaio.215: 0003                         |
    ###                                        |
    -------------------------------------      |
    -- Host: tahoe                             |
    -- Pid: 1172                               |
    -- Jobid: vaio.215                         | Rush Header.
    -- Frame: 3                                | Shows info about this frame.
    -- Tries: 0                                |
    -- Owner: erco (1000/1007)                 |
    -- Nice: 10                                |
    -- Tmpdir: /var/tmp/.RUSH_TMP.412          |
    -- LogFile: /net/job/rush/logs             |
    -- Command: /net/job/rush/render_me        |
    -- Started: Fri Jun 15 00:33:48 2001     __|
    -------------------------------------
                                             __
    Sourcing user environment.. OK             |
    Sourcing BMRT environment.. OK             |
                                               | Render Script Output
    Working on frame 0003                      | The stdout/stderr of your script.
    /net/job: File system is full              |
    FAIL 1                                   __|
Requeue Frames
Sometimes just requeuing the frames will cause them to render successfully on the second try. You can requeue frames with 'rush -que' by specifying particular frames or frame ranges, or even by requeuing all frames in a particular state, e.g.:

    rush -que 1-10
    rush -que fail

Requeuing a "Run" frame will kill it, returning it to the Que state, where it will eventually be restarted, possibly on a different host.
    % rush -lf | head
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Run  5000  1   vaio     18092 02/26,02:15:40 00:00:21
    Run  5001  1   vaio     18093 02/26,02:15:40 00:00:21
    Run  5002  1   vaio     18100 02/26,02:15:50 00:00:11
    Que  5003  0   -        0     00/00,00:00:00 00:00:00
    Que  5004  0   -        0     00/00,00:00:00 00:00:00
    Que  5005  0   -        0     00/00,00:00:00 00:00:00
    Que  5006  0   -        0     00/00,00:00:00 00:00:00
    Que  5007  0   -        0     00/00,00:00:00 00:00:00
    Que  5008  0   -        0     00/00,00:00:00 00:00:00
    % rush -que 5000-5003
    5000: Que [Killed]
    5001: Que [Killed]
    5002: Que [Killed]
    5003: Que
    % rush -lf | head
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Run  5000  2   rotwang  18117 02/26,02:16:14 00:00:05
    Run  5001  2   rotwang  18119 02/26,02:16:14 00:00:05
    Run  5002  2   vaio     18115 02/26,02:16:14 00:00:05
    Que  5003  0   -        0     00/00,00:00:00 00:00:00
    Que  5004  0   -        0     00/00,00:00:00 00:00:00
    Que  5005  0   -        0     00/00,00:00:00 00:00:00
    Que  5006  0   -        0     00/00,00:00:00 00:00:00
    Que  5007  0   -        0     00/00,00:00:00 00:00:00
    Que  5008  0   -        0     00/00,00:00:00 00:00:00
    %
Requeue Failed Frames
You can either requeue the frames individually or requeue all the Fail frames with a single command:

    rush -que fail

The same trick works for all the frame states, e.g.:

    rush -que done     # All 'Done' frames become 'Que'
    rush -que hold     # All 'Hold' frames become 'Que'
    rush -done que     # All 'Que' frames become 'Done'
    (etc.)
    % rush -lf
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Done 0001  1   vaio     19458 02/27,13:36:59 00:03:04
    Done 0002  1   vaio     19461 02/27,13:36:59 00:03:04
    Fail 0003  1   rotwang  19463 02/27,13:36:59 00:00:00
    Done 0004  1   vaio     19467 02/27,13:36:59 00:03:04
    Fail 0005  1   rotwang  19470 02/27,13:36:59 00:00:01
    Done 0006  1   vaio     19472 02/27,13:48:59 00:03:04
    Done 0007  1   vaio     19476 02/27,13:49:00 00:03:04
    Done 0008  1   vaio     19479 02/27,13:49:00 00:03:04
    Fail 0009  1   rotwang  19481 02/27,13:49:00 00:00:00
    Fail 0010  1   rotwang  19485 02/27,13:49:00 00:00:00
    Fail 0011  1   rotwang  19425 02/27,13:49:59 00:00:00
    Done 0012  1   vaio     19428 02/27,13:55:59 00:03:04
    Done 0013  1   vaio     19431 02/27,13:55:59 00:03:04
    Done 0014  1   vaio     19434 02/27,13:55:59 00:03:04
    Done 0015  1   vaio     19437 02/27,13:59:04 00:03:04
    Done 0016  1   vaio     19440 02/27,13:59:04 00:03:04
    Done 0017  1   vaio     19444 02/27,13:59:04 00:03:04
    Fail 0018  1   rotwang  19446 02/27,13:59:45 00:00:00
    Done 0019  1   vaio     19449 02/27,14:09:05 00:03:04
    Done 0020  1   vaio     19452 02/27,14:09:05 00:03:04
    % rush -que fail
    0003: Que
    0005: Que
    0009: Que
    0010: Que
    0011: Que
    0018: Que
    %
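Before requeuing, it can help to see the failed frames as a compact list of frames and ranges rather than one line per frame. The sketch below collapses consecutive frame numbers into ranges. The function name collapse_frames is made up for this example, and the input is the list of Fail frames from the report above, typed in by hand so the example runs without rush:

```shell
#!/bin/sh
# collapse_frames   (hypothetical helper)
# Reads frame numbers on stdin, one per line in ascending order, and
# prints them on one line as frames and ranges.
collapse_frames() {
    awk '
        # Append the pending run (start..prev) to the output string.
        function emit() {
            if (start == "") return
            sep = (out == "") ? "" : " "
            rng = (start == prev) ? start : start "-" prev
            out = out sep rng
        }
        {
            n = $1 + 0                      # drop zero padding: 0009 -> 9
            if (start != "" && n == prev + 1) { prev = n; next }
            emit()
            start = prev = n
        }
        END { emit(); print out }
    '
}

# The six Fail frames from the report above.
ranges=$(printf '%s\n' 0003 0005 0009 0010 0011 0018 | collapse_frames)
echo "$ranges"          # prints: 3 5 9-11 18
```

Whether a mixed list like this can be handed to a single 'rush -que' command is an assumption worth checking on your installation; 'rush -que fail' already covers the common case.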
Holding Frames
You could pause the entire job with:

    rush -pause

...but then ALL frames stop rendering until you continue the job. If you only want certain frames to be skipped, mark them with 'Hold':

    rush -hold 35-50

Those frames will be skipped for rendering, and the system will render all other frames. If there are no more frames to render, the job will remain active (even if the job is set to AutoDump) until the 'Hold' frames are manually marked 'Que' again:

    rush -que 35-50    # either this..
    rush -que hold     # or this.
    % rush -hold 35-50
    0035: Hold
    0036: Hold
    0037: Hold
    0038: Hold
    0039: Hold
    0040: Hold
    0041: Hold
    0042: Hold
    0043: Hold
    0044: Hold
    0045: Hold
    0046: Hold
    0047: Hold
    0048: Hold
    0049: Hold
    0050: Hold
    % rush -lf
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Run  0030  1   vaio     19552 02/27,13:56:40 00:01:09
    Run  0031  1   vaio     19554 02/27,13:56:40 00:01:09
    Run  0032  1   vaio     19556 02/27,13:56:40 00:01:09
    Que  0033  0   -        0     00/00,00:00:00 00:00:00
    Que  0034  0   -        0     00/00,00:00:00 00:00:00
    Hold 0035  0   -        0     00/00,00:00:00 00:00:00
    Hold 0036  0   -        0     00/00,00:00:00 00:00:00
    Hold 0037  0   -        0     00/00,00:00:00 00:00:00
    Hold 0038  0   -        0     00/00,00:00:00 00:00:00
    Hold 0039  0   -        0     00/00,00:00:00 00:00:00
    Hold 0040  0   -        0     00/00,00:00:00 00:00:00
    Hold 0041  0   -        0     00/00,00:00:00 00:00:00
    Hold 0042  0   -        0     00/00,00:00:00 00:00:00
    Hold 0043  0   -        0     00/00,00:00:00 00:00:00
    Hold 0044  0   -        0     00/00,00:00:00 00:00:00
    Hold 0045  0   -        0     00/00,00:00:00 00:00:00
    Hold 0046  0   -        0     00/00,00:00:00 00:00:00
    Hold 0047  0   -        0     00/00,00:00:00 00:00:00
    Hold 0048  0   -        0     00/00,00:00:00 00:00:00
    Hold 0049  0   -        0     00/00,00:00:00 00:00:00
    Hold 0050  0   -        0     00/00,00:00:00 00:00:00
    Que  0051  0   -        0     00/00,00:00:00 00:00:00
    Que  0052  0   -        0     00/00,00:00:00 00:00:00
    Que  0053  0   -        0     00/00,00:00:00 00:00:00
    Que  0054  0   -        0     00/00,00:00:00 00:00:00
    %
Pausing A Job
You can pause the current job at any time with:

    rush -pause

To continue the job again, use:

    rush -cont

Jobs that are paused will show up in 'rush -lj' reports with 'Pause' in the job's STATUS field. You can pause / continue several jobs at a time:

    rush -pause vaio.215 vaio.216 tahoe.353
    rush -cont vaio.215 vaio.216 tahoe.353

If your login name is 'fred', you can pause all your jobs with:

Use pause if you want to leave the job in the queue while making fixes to scene files, then continue the job to pick up where it left off. You can pause a job and kill all running frames using this rush combination:
    [erco@rotwang] % rush -pause
    Job rotwang.52 is now 'Pause'
    [erco@rotwang] % rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
    ------ ----------- ------------ -------- ----- ----- ---- -----------
    Pause  rotwang.52  FOO          erco     %15   %0    2    Job paused.
    [erco@rotwang] % rush -cont
    Job rotwang.52 is now 'Run'
    [erco@rotwang] % rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
    ------ ----------- ------------ -------- ----- ----- ---- -----------
    Run    rotwang.52  FOO          erco     %25   %0    3    00:00:55
Dumping A Job
Dump a job with 'rush -dump' when you want it removed from the queue. All running frames are killed, and the job will remain in 'rush -lj' reports until all frames are confirmed killed. 'rush -lc' will show which frames are being waited for.

If you have DoneMail configured, mail will be sent indicating the job was dumped. If a DoneCommand is configured, it will be executed before the job is removed from the queue.

To dump jobs, you can either specify the jobid(s) on the command line individually or let the RUSH_JOBID variable determine which jobs will be dumped.
    % rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
    ------ ----------- ------------ -------- ----- ----- ---- -----------
    Run    va.15       TEST         erco     %30   %0    3    00:08:10
    Run    va.16       TEST2        erco     %10   %0    8    00:08:12
    % rush -dump va.15 va.16
    Job va.15 is now 'Dump'
    Job va.16 is now 'Dump'
    % rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
    ------ ----------- ------------ -------- ----- ----- ---- -----------
    %
Frame Notes
You can embed 'rush -notes' commands into your render script to alter the 'NOTES' field for the rendering frame, e.g.:
        rush -notes ${RUSH_FRAME}:'Short error msg here'
    endif

Frame notes are cleared each time a frame begins rendering, so there is no need to specify a rush command to clear the frame notes in your render script. In fact, that action is discouraged because of the following warning:
    % cat render_me
    [..]
    echo "--- Working on frame $RUSH_FRAME - `date`"

    ### YOUR RENDER COMMAND(S) HERE
    particle $DATA/files/stars-$RUSH_PADFRAME.par
    set err = $status

    # NOTE: egrep exits 0 when it FINDS a match, so test for status 0

    # CHECK FOR MISSING FILES
    egrep -i no.such.file.or.directory $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) rush -notes ${RUSH_FRAME}:'Missing file'

    # CHECK FOR CORE DUMPS
    egrep -i core.dumped $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) rush -notes ${RUSH_FRAME}:'Core dumped'

    # CHECK FOR LICENSE ERRORS
    egrep -i no.available.licenses $RUSH_LOGFILE > /dev/null
    if ( $status == 0 ) then
        rush -notes ${RUSH_FRAME}:'License error'
        sleep 10
    endif
    [..]
    % rush -lf
    STAT FRAME TRY HOSTNAME       PID   START          ELAPSED  NOTES
    Fail 0030  2   vaio           20338 02/27,14:41:22 00:01:03 Missing file
    Fail 0031  2   vaio           20339 02/27,14:41:22 00:01:03 Missing file
    Fail 0032  2   vaio           20340 02/27,14:41:22 00:01:03 Missing file
    Run  0033  9   vaio           20365 02/27,14:55:25 00:00:45 License error
    Done 0034  9   vaio           20367 02/27,14:41:25 00:01:04 -
    Done 0035  8   vaio           20369 02/27,14:41:25 00:01:04 -
    Done 0036  8   tahoe          20389 02/27,14:41:29 00:01:03 -
    Done 0037  8   tahoe          20394 02/27,14:41:29 00:01:03 -
    Done 0038  8   tahoe          20396 02/27,14:41:29 00:01:03 -
    Done 0039  8   superior       20413 02/27,14:41:32 00:01:03 -
    Done 0040  8   superior       20423 02/27,14:41:32 00:01:03 -
    Done 0041  8   erie           20425 02/27,14:41:32 00:01:03 -
    Done 0042  8   rotwang.erco.c 12662 02/27,14:41:32 00:01:06 -
    Done 0043  8   rotwang.erco.c 12663 02/27,14:41:32 00:01:06 -
    Fail 0044  8   rotwang.erco.c 12664 02/27,14:55:35 00:00:55 Missing file
    Fail 0045  8   ontario        20434 02/27,14:55:35 00:00:55 Missing file
    Fail 0046  8   ontario        20441 02/27,14:55:35 00:00:55 Missing file
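The log checks in a render script all follow one pattern: search the frame's log for a known error string and, if it is found, choose a short note. A Bourne shell sketch of that pattern follows. The function name classify_log is made up for this example, and the demo log contents are invented; in a real render script the log would be $RUSH_LOGFILE and the note would be passed to 'rush -notes'.

```shell
#!/bin/sh
# classify_log LOGFILE   (hypothetical helper)
# Prints a short note for the first known error pattern found in LOGFILE,
# or nothing if none of the patterns match.
classify_log() {
    log=$1
    # grep exits 0 when the pattern IS found, so 'if grep ...' is the
    # "error present" branch.
    if grep -Eqi 'no such file or directory' "$log"; then
        echo 'Missing file'
    elif grep -Eqi 'core dumped' "$log"; then
        echo 'Core dumped'
    elif grep -Eqi 'no available licenses' "$log"; then
        echo 'License error'
    fi
}

# Demo with a throwaway log file (contents invented for illustration).
demo_log=$(mktemp)
printf 'Working on frame 0030\nstars-0030.par: No such file or directory\n' > "$demo_log"
note=$(classify_log "$demo_log")
echo "note: $note"
rm -f "$demo_log"
```

Keeping the patterns in one function makes it easy to add a new error string without touching the rest of the script.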
All Hosts/All Jobs/All Cpus Reports
    % rush -lah
    IP            Hostname  Ram  Cpus MinPri Criteria
    10.100.100.1  tahoe     512  2    0      +any,irix
    10.100.100.2  superior  512  2    0      +any,irix,+dante
    10.100.100.3  erie      512  2    0      +any,irix,+dante
    10.100.100.4  ontario   512  2    0      +any,irix,+dante
    10.100.100.5  vaio      128  1    0      +any,linux,intel
    10.100.100.6  rotwang   128  1    100    +any,linux,intel
    % rush -laj
    STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
    ------ ----------- ------------ -------- ----- ----- ---- -----------
    Run    tahoe.1     BURN/anchor  colby    %100  %0    0    42:11:46
    Run    superior.3  BURN/relic   zev      %80   %0    3    03:22:03
    Run    superior.4  BURN/relic2  zev      %0    %50   1    02:06:13
    Run    erie.2      COMSAT/anten renke    %0    %0    0    19:32:15
    Run    erie.4      COMSAT/retra renke    %0    %100  0    08:59:08
    Run    ontario.6   BURN/lick1   colby    %100  %0    0    43:37:41
    Run    ontario.8   BURN/lick3   colby    %100  %0    0    22:46:14
    Run    ontario.9   BURN/lick4   colby    %75   %25   0    22:31:44
    Run    ontario.10  BURN/lick5   colby    %0    %100  0    22:25:38
    Run    ontario.11  BURN/lick6   colby    %100  %0    0    21:47:35
    Run    ontario.12  BURN/lick7   colby    %99   %1    0    20:56:52
    Run    vaio.11     TEST/r+d     wu       %100  %0    0    24:42:34
    Run    erie.12     TEST/r+d     wu       %100  %0    0    22:10:29
    % rush -lac
    HOST      OWNER  JOBID       TITLE        FRM   PRI   PID    ELAPSED
    tahoe     -      -           -            -     -     -      Online
    tahoe     -      -           -            -     -     -      Online
    superior  zev    superior.3  BURN/relic   0034  100   4026   00:10:01
    superior  zev    superior.3  BURN/relic   0035  100   4027   00:10:02
    erie      zev    superior.3  BURN/relic   0041  900k  14560  00:10:15
    erie      zev    superior.4  BURN/relic2  0038  100k  4038   00:10:02
    ontario   -      -           -            -     -     -      Offline
    ontario   -      -           -            -     -     -      Offline
    vaio      -      -           -            -     -     -      Offline
    *** NO RESPONSE FROM: *** rotwang
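Reports like the 'rush -laj' listing above are plain columns, so they are easy to post-process. As a sketch, the function below lists the jobs whose %FAIL column is above zero. The name failing_jobs is made up for this example, it assumes TITLE and OWNER contain no spaces (true of the sample rows), and the sample rows are copied from the report so the example runs without rush:

```shell
#!/bin/sh
# failing_jobs   (hypothetical helper)
# Reads a 'rush -laj' style report on stdin and prints "JOBID PERCENT"
# for every job whose %FAIL column is above zero.
# Assumes TITLE and OWNER contain no whitespace.
failing_jobs() {
    awk '
        $1 == "STATUS" || $1 ~ /^-+$/ { next }   # skip header and rule lines
        NF >= 6 {
            fail = $6
            sub(/^%/, "", fail)                  # "%50" -> "50"
            if (fail + 0 > 0) print $2, fail
        }
    '
}

# A few rows copied from the 'rush -laj' report.
report='STATUS JOBID       TITLE        OWNER    %DONE %FAIL BUSY NOTES
------ ----------- ------------ -------- ----- ----- ---- -----------
Run    tahoe.1     BURN/anchor  colby    %100  %0    0    42:11:46
Run    superior.3  BURN/relic   zev      %80   %0    3    03:22:03
Run    superior.4  BURN/relic2  zev      %0    %50   1    02:06:13
Run    erie.4      COMSAT/retra renke    %0    %100  0    08:59:08
Run    ontario.12  BURN/lick7   colby    %99   %1    0    20:56:52'

failing=$(printf '%s\n' "$report" | failing_jobs)
echo "$failing"
```

Against a live queue the same function would be fed with 'rush -laj | failing_jobs'.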