RUSH RENDER QUEUE
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.40 09/26/02
Strikeout text indicates features not yet implemented


Introduction



Template Submit Script
    [erco@howland]% rush -tss                  # Template Submit Script - Let's look at it.
    #!/bin/csh -f                              # It's just a csh script.
    #
    #       S U B M I T
    #
    source /usr/tmp/rush/etc/.submit
    rush -submit << EOF
    title           SHOW/SHOT                  # Title of job
    ram             250                        # Amount of RAM job expects to use (MB)
    frames          1-100                      # Frame range(s)
    logdir          $cwd/logs                  # Directory for frame logs
    command         $cwd/render-script         # Path to render script
    donemail        erco                       # Who to send mail to when job is done
    autodump        done                       # Autodump job when done
    cpus            howland=1@100              # cpu(s) to run on

    # Optional
    #notes          This is a test             # Optional free form notes for job
    #state          Pause                      # Optional starting state for job
    EOF
    exit $status

There are other examples of submit scripts written in perl, and even C/C++.




Template Render Script
    [erco@howland]% rush -trs                  # Template Render Script - Let's look at it.
    #!/bin/csh -f                              # It's just a csh script, too.
    ###############################
    #   R E N D E R   S C R I P T #
    ###############################

    # Source your render environment as needed
    source $RUSH_DIR/etc/.render               # System environment settings. (You can add other sources)

    echo "--- Working on frame $RUSH_FRAME - `date`"

    ### YOUR RENDER COMMAND(S) HERE
    time sleep 10                              # Your render command
    set err = $status                          # Keep exit code from your 'render command'

    # Rush exit codes: 0=DONE 1=FAIL 2=RETRY
    if ( $err ) then                           # Translate render command exit code to rush exit codes (0|1|2)
        echo --- FAIL; exit 1
    else
        echo --- DONE; exit 0
    endif
    #NOTREACHED#

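The template only ever exits with 0 (DONE) or 1 (FAIL), although it names 2 (RETRY) as a third Rush exit code. The following is a minimal sketch of how a render script might also use RETRY. It is written in Bourne shell rather than csh so the mapping can live in a function. The helper name rush_exit_code, the RETRY_CODES variable, and the choice of status 75 as a "transient" failure are all assumptions made for illustration; they are not part of Rush.

```shell
#!/bin/sh
# Hypothetical helper: map a render command's exit status onto the
# Rush exit codes named in the template (0=DONE 1=FAIL 2=RETRY).

# ASSUMPTION: a site-specific, space-separated list of exit statuses that
# should be retried instead of failed. The value 75 is only a placeholder.
RETRY_CODES="75"

rush_exit_code() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo 0                      # render succeeded: DONE
        return
    fi
    for code in $RETRY_CODES; do
        if [ "$rc" -eq "$code" ]; then
            echo 2                  # status is in the retry list: RETRY
            return
        fi
    done
    echo 1                          # any other nonzero status: FAIL
}

# Assumed usage at the end of a render script (left commented out here):
#   your-render-command
#   exit `rush_exit_code $?`
```

Which statuses deserve a retry depends entirely on the renderer in use, so the list is kept in one variable that is easy to change.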
Creating Render and Submit Scripts

    [erco@howland]% rush -tss > submit_me      # Create submit script
    [erco@howland]% rush -trs > render_me      # Create render script
    [erco@howland]% chmod +x submit_me render_me
    [erco@howland]% ls -la submit_me render_me
    -rwxrwxr-x    1 erco     stree        443 Jan 24 00:59 render_me
    -rwxrwxr-x    1 erco     stree        359 Jan 24 00:59 submit_me
    [erco@howland]% vi submit_me render_me     # Customize the scripts (see below)
    [..]
    [erco@howland]% cat submit_me
    #!/bin/csh -f
    #
    #       S U B M I T
    #
    source /usr/tmp/rush/etc/.submit
    rush -submit << EOF
    title           VEGA                       # Set our title
    ram             100                        # MB of RAM we expect to use
    frames          1-10                       # Frame range to use (1 thru 10)
    logdir          $cwd/logs
    command         $cwd/render_me             # Our render script
    donemail        erco
    autodump        done
    cpus            howland=3@100              # Use up to 3 cpus on howland at 100 priority
    EOF
    exit $status
    [erco@howland]% cat render_me
    #!/bin/csh -f
    ###############################
    #   R E N D E R   S C R I P T #
    ###############################

    # Source your render environment as needed
    source $RUSH_DIR/etc/.render

    echo "--- Working on frame $RUSH_FRAME - `date`"

    ### YOUR RENDER COMMAND(S) HERE
    render < /job/VEGA/ribs/${RUSH_PADFRAME}.rib    # Command to be rendered
    set err = $status

    # Rush exit codes: 0=DONE 1=FAIL 2=RETRY
    if ( $err ) then
        echo --- FAIL; exit 1
    else
        echo --- DONE; exit 0
    endif
    #NOTREACHED#

Submitting Job
    [erco@howland]% ./submit_me                # Submit job by running submit script
    setenv RUSH_JOBID how.848                  # Our jobid (grab with mouse and paste below)
    [erco@howland]% setenv RUSH_JOBID how-848
    [erco@howland]% rush -lj                   # List Jobs
    STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES
    ------ ----------- ------------ -------- ----- ---- -----------
    Run    how.848     VEGA         erco     %30   3    00:00:07

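The submit script prints its jobid as a csh 'setenv' line, which a Bourne-style shell cannot execute directly. As a minimal sketch, assuming the output contains a line of exactly the form shown above ("setenv RUSH_JOBID <jobid>"), the jobid could be picked out like this. The function name jobid_from_submit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a submit script's output on stdin and print
# the jobid from its "setenv RUSH_JOBID <jobid>" line.
jobid_from_submit() {
    # Print the third field of the first matching line, then stop reading.
    awk '$1 == "setenv" && $2 == "RUSH_JOBID" { print $3; exit }'
}

# Assumed usage (needs a working ./submit_me, so it is left commented out):
#   RUSH_JOBID=`./submit_me | jobid_from_submit`
#   export RUSH_JOBID
```

Nothing is printed when no such line is found, so a caller can test for an empty result before continuing.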
Monitoring Frames
    [erco@howland]% rush -lf                   # List Frames to see how they're doing
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED    _
    Run  0001  1   howland  12499 01/24,01:00:25 00:00:09    |  3 frames running on
    Run  0002  1   howland  12500 01/24,01:00:25 00:00:09    |  howland for last 9 secs.
    Run  0003  1   howland  12501 01/24,01:00:25 00:00:09   _|
    Que  0004  0   -        0     00/00,00:00:00 00:00:00
    Que  0005  0   -        0     00/00,00:00:00 00:00:00
    Que  0006  0   -        0     00/00,00:00:00 00:00:00
    Que  0007  0   -        0     00/00,00:00:00 00:00:00
    Que  0008  0   -        0     00/00,00:00:00 00:00:00
    Que  0009  0   -        0     00/00,00:00:00 00:00:00
    Que  0010  0   -        0     00/00,00:00:00 00:00:00
    [erco@howland]% rush -lfi                  # List Frame Info for brief report
    State Total Perc
    ----- ----- ----
    Que   7     %69                            # %69 to go
    Run   3     %30                            # %30 busy running
    Done  0     %0                             # no frames done yet
    Fail  0     %0                             # no frames failed either
    Hold  0     %0
    [erco@howland]% rush -lc                   # List cpus to see all cpus we submitted
    CPUSPEC[HOST]  STATE FRM  PID   JOBTID ELAPSED  NOTES
    how=3@100      Run   0001 12499 2      00:00:31
    how=3@100      Run   0002 12500 3      00:00:31
    how=3@100      Run   0003 12501 4      00:00:31
    [erco@howland]% rush -lfi
    State Total Perc
    ----- ----- ----
    Que   4     %40
    Run   3     %30
    Done  3     %30                            # Some frames are done; up to %30..
    Fail  0     %0
    Hold  0     %0

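To make the relationship between the frame list and the brief report concrete, here is a rough sketch that tallies frame states from text in the 'rush -lf' layout shown above. It is not how Rush computes its report: the function name frame_summary is hypothetical, the percentages are simple truncated integers, and they may differ by a point from what 'rush -lfi' prints (the report above shows %69 for 7 of 10 frames).

```shell
#!/bin/sh
# Hypothetical helper: read a frame list on stdin (STAT in the first
# column) and print a per-state count with an integer percentage.
frame_summary() {
    awk '
        # Count every non-empty line except the "STAT FRAME ..." header.
        NF > 0 && $1 != "STAT" { count[$1]++; total++ }
        END {
            # Fixed state order, matching the report layout in the text.
            n = split("Que Run Done Fail Hold", states, " ")
            for (i = 1; i <= n; i++) {
                s = states[i]
                pct = total ? int(count[s] * 100 / total) : 0
                printf "%-5s %5d  %%%d\n", s, count[s], pct
            }
        }'
}

# Assumed usage (needs a running job, so it is left commented out):
#   rush -lf | frame_summary
```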
Job Completion: Done Mail
    [erco@howland]% You have new mail.         # Rush sends mail when job is done
    [erco@howland]% Mail                       # Let's read the mail..
    "/usr/mail/erco": 1 message 1 new
    >N  2 erco@erco.com      Mon Jan 24 01:02  [how-848] VEGA (%0 QUE, %100 DONE, %0 FAIL)
    &
    From erco@erco.com Mon Jan 24 01:02:30 2000
    Date: Mon, 24 Jan 2000 01:02:30 -0800
    From: erco@erco.com (Greg Ercolano)
    To: erco@erco.com
    Subject: [how-848] VEGA (QUE=%0, DONE=%100, FAIL=%0)     # Subject shows jobid, title, stats

         Jobid: how-848
         Title: VEGA
      Priority: 1
        LogDir: /usr/var/tmp/rush/logs
           Ram: 100
       Command: /usr/var/tmp/rush/render_me
    ChkCommand:
    EndCommand:
      AutoDump: done
          User: erco (1000/1007)
      DoneMail: erco
     StartDate: Mon Jan 24 01:00:24 2000
       EndDate: -
       Elapsed: 00:02:05
        Frames: 10
          Cpus: how=3@100
      Notes[0]: -
      Criteria:

    STAT FRAME TRY HOSTNAME PID   START          ELAPSED     # Frame list dump
    Done 0001  1   howland  12499 01/24,01:00:25 00:00:32
    Done 0002  1   howland  12500 01/24,01:00:25 00:00:32
    Done 0003  1   howland  12501 01/24,01:00:25 00:00:32
    Done 0004  1   howland  12517 01/24,01:00:56 00:00:31
    Done 0005  1   howland  12519 01/24,01:00:56 00:00:32
    Done 0006  1   howland  12522 01/24,01:00:57 00:00:32
    Done 0007  1   howland  12535 01/24,01:01:28 00:00:32
    Done 0008  1   howland  12537 01/24,01:01:28 00:00:31
    Done 0009  1   howland  12539 01/24,01:01:28 00:00:31
    Done 0010  1   howland  12548 01/24,01:02:00 00:00:30
    & q
    Held 1 message in /usr/mail/erco

Frame Logs
    [erco@howland]% ls logs                    # Frame logs directory
    0001  0003  0005  0007  0009  framelist
    0002  0004  0006  0008  0010
    [erco@howland]% more logs/0001             # Frame 0001's log
    ------------------------------------------   _
    --     Host: howland                          |
    --      Pid: 12499                            |
    --    Jobid: how.848                          |
    --    Frame: 1                                |
    --    Owner: erco (1000/1007)                 |  Rush header
    --   Tmpdir: /usr/var/tmp/RUSH_TMP.12499      |
    --  Logfile: /usr/var/tmp/rush/logs/0001      |
    --  Command: /usr/var/tmp/rush/render_me      |
    --  Started: Mon Jan 24 01:00:26 2000        _|
    ------------------------------------------   _
    --- Working on frame 1 - Mon Jan 24 01:00:26  |
    Writing /job/VEGA/tif/0001.tif                |  Output from render script
    --- DONE                                     _|
    [erco@howland]% more logs/framelist        # Frame List when job completed
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Done 0001  1   howland  12499 01/24,01:00:25 00:00:32
    Done 0002  1   howland  12500 01/24,01:00:25 00:00:32
    Done 0003  1   howland  12501 01/24,01:00:25 00:00:32
    Done 0004  1   howland  12517 01/24,01:00:56 00:00:31
    Done 0005  1   howland  12519 01/24,01:00:56 00:00:32
    Done 0006  1   howland  12522 01/24,01:00:57 00:00:32
    Done 0007  1   howland  12535 01/24,01:01:28 00:00:32
    Done 0008  1   howland  12537 01/24,01:01:28 00:00:31
    Done 0009  1   howland  12539 01/24,01:01:28 00:00:31
    Done 0010  1   howland  12548 01/24,01:02:00 00:00:30

Requeuing Frames
    [erco@howland]% rush -lf
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED    _
    Fail 0100  1   howland  23386 01/25,12:19:09 00:00:22    |
    Fail 0101  1   howland  23387 01/25,12:19:09 00:00:22    |  Failed frames
    Fail 0102  1   howland  23388 01/25,12:19:09 00:00:22   _|
    Done 0103  1   howland  23410 01/25,12:19:30 00:00:22
    Done 0104  1   howland  23412 01/25,12:19:30 00:00:23
    [..]
    [erco@howland]% rush -que 100-102          # Requeue them manually to start again
    0100: Que
    0101: Que
    0102: Que
    [erco@howland]% rush -lf
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED    _
    Que  0100  1   howland  23386 01/25,12:19:09 00:00:22    |
    Que  0101  1   howland  23387 01/25,12:19:09 00:00:22    |  They'll restart shortly
    Que  0102  1   howland  23388 01/25,12:19:09 00:00:22   _|
    Done 0103  1   howland  23410 01/25,12:19:30 00:00:22
    Done 0104  1   howland  23412 01/25,12:19:30 00:00:23
    [..]

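On a long job it can be tedious to pick the failed frames out of the list by eye. A small sketch, assuming frame list text in the layout shown above, prints just the frame numbers whose state is 'Fail'. The function name failed_frames is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a frame list on stdin and print the frame
# number of every line whose first column is "Fail", one per line.
failed_frames() {
    # Adding 0 strips the zero padding (0100 -> 100), matching the
    # unpadded form used with "rush -que 100-102" in the text.
    awk '$1 == "Fail" { print $2 + 0 }'
}

# Assumed usage (needs a running job, so it is left commented out):
#   rush -lf | failed_frames
```

The numbers printed can then be requeued with 'rush -que' as a range, as in the example above. Whether 'rush -que' also accepts a list of separate frame numbers is not shown here, so that is left to the reader to check.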
Pausing A Job
    [erco@howland]% rush -pause                # Pause Job.
    Job how.850 is now 'Pause'                 # No new frames will be started, running frames allowed to finish.
    [erco@howland]% rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES
    ------ ----------- ------------ -------- ----- ---- -----------
    Pause  how.850     WERNER/C33   erco     %55   0    Job paused.
    [erco@howland]% rush -cont                 # Continue Job
    Job how.850 is now 'Run'
    [erco@howland]% rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES
    ------ ----------- ------------ -------- ----- ---- -----------
    Run    how.850     WERNER/C33   erco     %55   0    00:29:20

Advanced: Submit/Monitoring
    [erco@howland]% eval `./submit_me`         # Submit job, sets RUSH_JOBID automatically
    [erco@howland]% rush -ljf                  # List Job Full - All info about job
         Jobid: how-848
         Title: VEGA
      Priority: 1
        LogDir: /usr/var/tmp/rush/logs
           Ram: 100
       Command: /usr/var/tmp/rush/render_me
    ChkCommand:
    EndCommand:
      AutoDump: done
          User: erco (1000/1007)
      DoneMail: erco
     StartDate: Mon Jan 24 01:00:24 2000
       EndDate: -
       Elapsed: 00:00:16
        Frames: 10
          Cpus: how=3@100
      Notes[0]: -
      Criteria:
    [erco@howland]% rush -lff | more           # List Frame Full - all information about frames
        Frame: 1
        State: Run
     NewState: ???
     Hostname: howland
     Priority: 100
          Pid: 12499
          Tid: 1
    TaskSeqID: 2
        Tries: 1
        Notes:
    StartDate: Mon Jan 24 01:00:25 2000
      EndDate: Sun Jan 01 00:00:00 0000
      Elapsed: 00:00:22

        Frame: 2
        State: Run
     NewState: ???
     Hostname: howland
     Priority: 100
          Pid: 12500
          Tid: 2
    TaskSeqID: 3
        Tries: 1
        Notes:
    StartDate: Mon Jan 24 01:00:25 2000
      EndDate: Sun Jan 01 00:00:00 0000
      Elapsed: 00:00:22

        Frame: 3
        State: Run
     NewState: ???
     Hostname: howland
     Priority: 100
          Pid: 12501
    [..]
    [erco@howland]% rush -tasklist howland     # Task List - View all jobs competing
                                               # for howland's cpus.
    TID JOBSID TASKSID PID   JOBID/NAME          FRM  PRI  STATE UTRY NOTES              _
    3   4      4       0     how-848,VEGA        0000 100  Idle  1    Last: 0004 DONE     |
    4   4      5       0     how-848,VEGA        0000 100  Idle  1    Last: 0005 DONE     |  erco's 3 task reservations; one 'Run'ing
    5   4      6       15650 how-848,VEGA        0006 100  Run   1    Elapsed=00:00:55   _|
    6   6      7       15761 how-851,TESLA/MATTE 0502 200k Run   2    Elapsed=00:00:25    |  liza's 2 task reservations; running at
    7   6      8       15763 how-851,TESLA/MATTE 0503 200k Run   2    Elapsed=00:00:25   _|  higher 'killer' priority - both are busy.

In the above example, there are five task reservations on howland: three belong to erco's job and two to liza's. Erco's job runs at priority 100, while liza's job runs at 200k (the 'k' marks a 'killer' priority), which is higher. Howland has only 3 cpus, so liza's higher priority job gets to run both of its tasks, and the one leftover cpu is taken by erco's job.
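The outcome described above can be imitated with a toy ordering: sort the reservations by priority, highest first, and give cpus to the first few. This is only an illustration of the result in the example, not Rush's scheduler; in particular it does not model what a 'killer' priority does to tasks that are already running. The function name pick_tasks and the "priority owner" input format are assumptions made for the sketch.

```shell
#!/bin/sh
# Hypothetical helper: read "priority owner" lines on stdin and print the
# reservations that would get a cpu when only $1 cpus are available.
pick_tasks() {
    cpus=$1
    # Numeric sort on the first field, highest first. A value such as
    # "200k" sorts by its leading number (200). The -s flag keeps the
    # input order among reservations with equal priority.
    sort -s -k1,1nr | head -n "$cpus"
}

# The five reservations from the example, competing for howland's 3 cpus:
#   printf '%s\n' '100 erco' '100 erco' '100 erco' '200k liza' '200k liza' | pick_tasks 3
```

With that input the two liza reservations come out first, followed by a single erco reservation, which matches the task list in the example.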

---                  ---                  ---                  ---                  ---
---                  ---                  ---                  ---                  ---
--- WORK IN PROGRESS --- WORK IN PROGRESS --- WORK IN PROGRESS --- WORK IN PROGRESS --- 
---                  ---                  ---                  ---                  ---
---                  ---                  ---                  ---                  ---