Rush - Job Priority

Rush Render Queue - Job Priorities
V 103.07b 05/11/16
(C) Copyright 2008, 2016 Seriss Corporation. All rights reserved.
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.

Strikeout text indicates features not yet implemented

Rush Priority
Priority Description | Priority Staircasing | FIFO Scheduling

   Priority Description

In general, higher priority values always take precedence.
Priority values are in the range 1-999. Values outside this range cause an error. (In 102.42a9c, the priority min/max can be set via rush.conf with priority.range)
Priority values are generally specified in the 'cpus' command, such as:

cpus tahoe@100
cpus tahoe=1@100

Both of the above are equivalent, asking for one cpu on tahoe at 100 priority. (If the number of cpus is not specified, '1' is the default).
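The equivalence of the two forms above can be sketched with a small parser. This is only an illustration in Python, not Rush's own parser: the grammar HOST[=CPUS]@PRIORITY[FLAGS] is a simplified assumption based on the examples on this page, and the names SPEC_RE, CpuSpec and parse_cpu_spec are hypothetical. The optional 'k' and 'a' flags are the priority flags described further down.

```python
import re
from typing import NamedTuple

# Assumed, simplified grammar: HOST[=CPUS]@PRIORITY[FLAGS]
SPEC_RE = re.compile(
    r"^(?P<host>[^=@\s]+)(?:=(?P<cpus>\d+))?@(?P<pri>\d+)(?P<flags>[ka]*)$"
)

class CpuSpec(NamedTuple):
    host: str
    cpus: int
    priority: int
    flags: str

def parse_cpu_spec(spec: str, lo: int = 1, hi: int = 999) -> CpuSpec:
    """Parse one cpu spec; raise ValueError if malformed or out of range."""
    m = SPEC_RE.match(spec.strip())
    if m is None:
        raise ValueError(f"malformed cpu spec: {spec!r}")
    priority = int(m["pri"])
    if not lo <= priority <= hi:
        raise ValueError(f"priority {priority} outside {lo}-{hi}")
    cpus = int(m["cpus"]) if m["cpus"] else 1  # cpu count defaults to 1
    return CpuSpec(m["host"], cpus, priority, m["flags"])

# Both spellings parse to the same request: one cpu on tahoe at priority 100.
print(parse_cpu_spec("tahoe@100") == parse_cpu_spec("tahoe=1@100"))  # True
```

The lo/hi defaults mirror the 1-999 range stated above; a site that changes the range in rush.conf would pass different bounds.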
Jobs contend for cpus based primarily on priority. When priority values are equal, the system breaks the tie with a round robin scheme by default, so the jobs take turns on the cpu.
Priority values are 'relative'. If all other jobs on the network are 100 but your job is 101, it will always win battles for an idle cpu. (Making your job 200 priority will not make it execute any quicker.)
When priority values differ, cpus are arbitrated using these rules:

Higher Priority Always Wins.
If two jobs contend for a cpu, the higher priority job always wins, which implies the next rule:

Lower Priority Jobs Always Lose.
The lower priority job will always lose.

Equal Priority Jobs Share.
If two jobs of equal priority contend for a cpu, they alternate execution on the cpu.

In addition to the above, 'priority flags' may be appended to the priority values (e.g. 100k, 100a, 100ka). These flags augment the above behavior in the following ways:

Kill Flag ('k').
The job will kill lower priority jobs immediately, rather than wait for them to finish rendering frames already in progress (the default behavior). A killed frame will automatically be re-rendered on the next available cpu. The 'k' flag is only effective against jobs of lower priority. Where the priorities are equal, the 'k' flag has no effect.

Almighty Flag ('a').
Prevents higher priority jobs from killing this job's frames already in progress during priority battles. Basically, this disables other jobs' Kill ('k') flags, causing those jobs to revert to the default 'passive' behavior of waiting for an in-progress frame to finish.

Priority flags are normally used separately, but can be combined (e.g., 100ka) to create the situation known as 'Kick Ass' mode.

Beware: abuse of these flags can be tracked by sysadmins. If you cause trouble by submitting jobs with killer priorities that are not assigned to you, you can be tracked down via the system's auditing logs.

Here are some example situations to demonstrate the above rules.

Priority Scenarios
Example: Passive Higher Priority (Non-Killer)

A 100 priority job is running on a cpu. No other jobs are using the cpu, so the job continues to render on that cpu, one frame after the other.
Suddenly, someone submits a 200 priority job to the same cpu. The 100 priority job will be allowed to finish rendering the current frame, and then the 200 priority job takes over the cpu, rendering all its frames. Once the 200 priority job has completed, the 100 priority job continues to render the remaining frames.

Example: Aggressive Higher Priority (Killer)

Similar to the above, a 100 priority job is running on a cpu. No other jobs are active, so the job continues to render on that cpu, one frame after the other.
But this time, someone submits a 200k priority job (kill flag is set). The 100 priority job's frame is immediately killed, and the 200k priority job takes over the cpu until all its frames are rendered, at which point the 100 priority job resumes on the cpu.

Example: Equal Priority (Round Robin)

Again, a 100 priority job is running, with no other jobs active.
Then someone submits a different job at 100 priority. Both jobs will alternate using the cpu, yielding to each other.
Note: Even if either or both jobs had their 'k' flags set, the behavior would still be the same, since the priority of both jobs is equal (killer jobs will only kill lower priority jobs, not jobs of equal priority).
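The three scenarios above, plus the Almighty flag, can be summarized as one decision function. This is a minimal sketch of the rules as described on this page, not Rush's actual arbitration code; the function name arbitrate and its return strings are made up for illustration.

```python
def arbitrate(running_pri: int, new_pri: int,
              new_flags: str = "", running_flags: str = "") -> str:
    """Describe what happens when a new job contends for a busy cpu."""
    if new_pri < running_pri:
        return "new job waits"      # lower priority always loses
    if new_pri == running_pri:
        return "jobs alternate"     # equal priority: flags have no effect
    # From here on the new job has the higher priority.
    if "k" in new_flags and "a" not in running_flags:
        return "running frame killed and requeued"
    # Passive default, or the running job's 'a' flag cancelled the 'k' flag.
    return "running frame finishes, then new job takes over"

print(arbitrate(100, 200))            # passive higher priority
print(arbitrate(100, 200, "k"))       # killer higher priority
print(arbitrate(100, 100, "k"))       # equal priority, 'k' is ignored
print(arbitrate(100, 200, "k", "a"))  # almighty job resists the kill
```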

   Priority Staircasing

To use rush effectively, assign jobs with some cpus at high priority, and some at low. This is called 'staircasing' the priorities, because when you graph it out, it resembles a staircase:
             ____________________________________________________________________________
            |                                                                            |
            |       +any=5@800k                                                          |
      P 200-|       _________                                                            |
      r     |      | | | | | |                                                           |
      i     |      | | | | | |   +any=10@100                                             |
      o 100-|      | | | | | |_________________                                          |
      r     |      | | | | | | | | | | | | | | |                                         |
      i     |      | | | | | | | | | | | | | | |        +any=20@1                        |
      t   1-|      | | | | | | | | | | | | | | |_______________________________________  |
      y     |      |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| |
            |                                                                            |
            |____________________________________________________________________________|
                   |         |                 |                                       |
                   0         5                 15                                      35
                                T o t a l   C p u s   R e q u e s t e d

      Job Requested:         _
              +any=5@800k     |
              +any=10@100     |-- 35 total
              +any=20@1      _|

Most companies find they only need two specifications per job: a high priority request for cpus (+any=5@800k) and a low priority request (+any=20@1).
This way, if someone else submits with similar priorities, each job will get at least 5 procs (due to the 5 at high priority), as the high priority requests take precedence over the low priority ones.
This ensures everyone gets at least 5 high priority cpus. The only problem is if the network is completely saturated with high priority frames, in which case you probably need more machines. For the 'high' priority requests, a good rule of thumb is to use a cpu cap (=5) that is around 1/10th the number of cpus on your farm.
So if you have a 10 host farm use =1@800, on a 50 host farm use =5@800, on 100 use =10@800, etc. Use smaller cpu cap values if there are more jobs.
If all jobs use the same requests, there should be a balance. If one job needs a few more cpus than others, just make the high priority cpu cap a little higher, so instead of =5, use =8.
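The rule of thumb above can be written down as a small helper that builds the two staircased requests. This is only a sketch: the function staircase_request and its parameters are hypothetical, and integer division is just one reading of "around 1/10th".

```python
from typing import List

def staircase_request(farm_size: int, low_cpus: int, extra: int = 0,
                      high_pri: int = 800, low_pri: int = 1) -> List[str]:
    """Build a high/low pair of cpu requests for one job.

    farm_size -- number of hosts (or cpus) on the farm
    low_cpus  -- cpu cap for the low priority request
    extra     -- optional bump for a job that needs a few more cpus
    """
    # Roughly a tenth of the farm, but never less than one cpu.
    high_cap = max(1, farm_size // 10) + extra
    return [f"+any={high_cap}@{high_pri}", f"+any={low_cpus}@{low_pri}"]

print(staircase_request(50, 20))           # ['+any=5@800', '+any=20@1']
print(staircase_request(50, 20, extra=3))  # ['+any=8@800', '+any=20@1']
```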
There are basically two ways to use "staircased" priorities: 'Passive' and 'Aggressive'.

Staircased Priorities: Passive
If everyone submits with:
+any=2@800 +any=50@1
..then they'll all get at least =2 cpus @800 high priority, and the rest up to =50 procs @1 low priority.
The idea here is each job asks for any 2 processors at high priority, and the rest at low.
But if the network is saturated with jobs, then a new user submitting won't be able to get a frame running until one of the running frames finishes. If the renders are long, that might be a while to wait.
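To see why every job ends up with its two high priority cpus, here is a toy allocation over an idle farm. It is a simplified model, not Rush's scheduler: it hands out cpus from the highest priority downwards, one at a time among requests of equal priority, and it ignores frames that are already running (the saturated case just described). The name allocate and the job names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[int, int]  # (cpu cap, priority)

def allocate(total_cpus: int, jobs: Dict[str, List[Request]]) -> Dict[str, int]:
    """Hand out idle cpus: highest priority first, one at a time among equals."""
    granted = defaultdict(int)
    tiers = defaultdict(list)  # priority -> list of [job name, cpus still wanted]
    for name, requests in jobs.items():
        for cap, pri in requests:
            tiers[pri].append([name, cap])
    free = total_cpus
    for pri in sorted(tiers, reverse=True):
        pending = tiers[pri]
        # Keep cycling through this tier while cpus and requests remain.
        while free > 0 and any(cap > 0 for _, cap in pending):
            for entry in pending:
                if free > 0 and entry[1] > 0:
                    entry[1] -= 1
                    granted[entry[0]] += 1
                    free -= 1
    return dict(granted)

# Three jobs, each submitted with +any=2@800 +any=50@1, on a 10 cpu farm.
request = [(2, 800), (50, 1)]
print(allocate(10, {"A": request, "B": request, "C": request}))
# {'A': 4, 'B': 3, 'C': 3}
```

Each job first receives its 2 cpus at priority 800; only the 4 cpus left over are shared out at priority 1.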

Staircased Priorities: Aggressive

A more aggressive user wouldn't want to wait around for low priority frames to get done. They want the job to take those high priority cpus right away.
This is where the 'k' (Kill) flag becomes useful on the high priority request: it bumps lower priority frames out of the way instead of waiting for them to finish, so the job's high priority frames start without delay.
So instead, if everyone 'that's in a hurry' submits jobs with:
+any=2@800k     -- note the 'k'
+any=50@1
..then that ensures that if the network is mostly saturated with low priority renders, this new submit will bump at least two low priority renders to get a couple at high priority.
Note that the above will only bump for two procs, the rest will wait in round robin.
High priority renders won't bump other high priority renders (as long as everyone uses the same 'high' number, such as the 800k above), since the 'k' flag has no effect between jobs of equal priority.
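The bumping behavior can be sketched as a function that picks which running frames a new request would kill on a fully busy farm. This is an illustrative model only: it ignores the Almighty flag, and the choice to bump the lowest priority frames first is an assumption, not documented Rush behavior. The name frames_to_bump is hypothetical.

```python
from typing import List

def frames_to_bump(running: List[int], cpus: int,
                   priority: int, kill: bool) -> List[int]:
    """Return indexes of busy cpus whose frames a new request would kill.

    running  -- priority of the frame currently on each busy cpu
    cpus     -- cpu cap of the new request (the number after '=')
    priority -- priority of the new request
    kill     -- True if the request carries the 'k' flag
    """
    if not kill:
        return []  # passive request: wait for frames to finish
    # Only strictly lower priority frames can be killed.
    victims = [i for i, pri in enumerate(running) if pri < priority]
    # Assumption: the lowest priority frames are bumped first.
    victims.sort(key=lambda i: running[i])
    return victims[:cpus]

farm = [1, 1, 800, 1, 800]                # priorities of the running frames
print(frames_to_bump(farm, 2, 800, True))   # [0, 1]
print(frames_to_bump(farm, 2, 800, False))  # []
```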
Feel free to ask questions, but first check out the docs above to get an understanding of how the priority stuff works.
You may find you want to increase or decrease the number after the '=' sign as needed.
The priority numbers themselves are relative; there's nothing magic about the numbers '100' or '1', they could just as well be '499' and '500', as long as the entire shop agrees on which numbers are considered 'high priority' and which are considered 'low'.
It is recommended to think of numbers above 99 as 'high', and 99 and lower as 'low'. This is arbitrary from the software's point of view, since values are relative. But using these values leaves some elbow room in both directions.
Many companies reserve 999k priority for 'it must go through' jobs. For instance, submitting a job with '+any=3@999k' will ensure 3 frames start rendering as soon as the job is submitted. And of course, to 'take over' the entire network, this nasty request would do it: +any=100@999k, which would kill everything running (requeuing the frames, of course) and push the current job through the pipe.

   FIFO Scheduling

FIFO (First In/First Out) scheduling is a scheduling option that was added to Rush 102.42a9. "Round Robin" is normally the default scheduling algorithm used by Rush, but the systems administrator can change this with the rush.conf file's "sched fifo" command. FIFO and Round Robin are mutually exclusive scheduling techniques; the entire farm will use either one or the other. These are used to decide which job gets an available cpu when priorities of jobs are otherwise equal.
For instance, when a cpu becomes available, if all jobs being considered are of equal priority, then one of these schemes is used to decide "who's next" to break the tie.
Comparing the two scheduling schemes:

   Round Robin Scheduling

Round Robin Scheduling (default) renders jobs' frames using "fairness": whenever a processor becomes available, each job gets a chance in turn to render a frame on that processor. In the case of three jobs A, B and C, all rendering at the same priority, each time a cpu becomes available, it first works on a frame from job "A", then job "B", then job "C", then back to "A". This repeats until the jobs are completed.
With Round Robin scheduling, jobs of equal priority finish more or less at the same time. Example:

[Screenshot: an example of round robin scheduling.]
Note the PROGRESS field shows the queue is moving through jobs concurrently, more or less from left-to-right.
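The A, B, C rotation described above can be reproduced with a short simulation of a single cpu. This is a toy model of the idea, assuming all jobs have equal priority; round_robin_order is a hypothetical name, not part of Rush.

```python
from collections import deque
from typing import Dict, List

def round_robin_order(frames: Dict[str, int]) -> List[str]:
    """Order in which one cpu picks frames from equal priority jobs.

    frames -- number of frames left in each job, in submission order
    """
    queue = deque((job, n) for job, n in frames.items() if n > 0)
    order = []
    while queue:
        job, left = queue.popleft()
        order.append(job)
        if left > 1:
            queue.append((job, left - 1))  # back of the line for the next turn
    return order

print("".join(round_robin_order({"A": 3, "B": 3, "C": 3})))  # ABCABCABC
```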

   FIFO Scheduling

FIFO Scheduling renders jobs' frames in the order the jobs were submitted, one job at a time: whenever a processor becomes available, it continues to work on the oldest job until that job can render no more frames. Then the processor starts working on the next job. So in the case of three jobs A, B and C, submitted in that order and rendering at the same priority, each time a cpu becomes available, it continues to work on frames from job "A" until that job finishes, then works on frames from job "B", then "C".
With FIFO scheduling, jobs of equal priority finish more or less in the order they were submitted. Example:

[Screenshot: an example of FIFO scheduling.]
Note the PROGRESS field shows the queue moving through jobs from top-to-bottom.

FIFO means the job "first in" gets the cpus it wants, so that job becomes the "first out" (i.e. first done), and all other jobs submitted after that have to wait in line.
FIFO is useful if users don't mind waiting in line, and don't care about not getting /any/ cpus until their job is next in line, where it then "takes over" the entire farm.
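The "finish in submission order" behavior can be illustrated by computing when each job would complete. This is a deliberately simple model, assuming equal priorities and that every frame takes exactly one time unit, which real renders will not; fifo_finish_times is a hypothetical name.

```python
from typing import Dict, List, Tuple

def fifo_finish_times(jobs: List[Tuple[str, int]], cpus: int = 1) -> Dict[str, int]:
    """Time unit at which each job's last frame completes under FIFO.

    jobs -- (job name, frame count) pairs in submission order
    cpus -- number of cpus working through the queue
    """
    finish = {}
    done = 0  # frames completed or queued ahead, counted in FIFO order
    for name, count in jobs:
        done += count
        finish[name] = -(-done // cpus)  # ceiling of done / cpus
    return finish

print(fifo_finish_times([("A", 3), ("B", 3), ("C", 3)]))
# {'A': 3, 'B': 6, 'C': 9}
```

With round robin on one cpu the same three jobs would finish at time units 7, 8 and 9, which is the "more or less at the same time" behavior described earlier.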
Since priority always takes precedence, FIFO scheduling can only be clearly seen when all jobs are at the same priority. However, it is useful to mix priorities and FIFO, to create different FIFO 'tiers'.

Using "Pure" FIFO

The idea of FIFO is to free users from having to be concerned about priority; it's just a 'first in/first out' queue. So for a pure FIFO queue, everyone should submit with the *same priority*, and not use mixes (i.e. NOT use Priority Staircasing).
So if everyone submits their job with e.g.
+any=50@100
..then all jobs will line up in a row (as shown in the above screenshots), each waiting for the job that was submitted in front of them to finish.

Using FIFO With Priority

There are always situations where 'pure FIFO' is too simple; there's always someone whose job suddenly becomes very important, and even though their job is last in the queue, it needs to get *some* cpus right away to preview a few frames, or perhaps it needs *all* cpus because a client is waiting.
In such cases, in addition to the user's +any=50@100 FIFO cpu request, they can add a few cpus at a higher priority, say +any=4@200.
Since all the other jobs are @100, this job will "win" up to 4 cpus @200 priority from the other FIFO jobs because of the higher priority request.
Remember that "higher priority always wins", so those 4 cpus @200 mean they'll get some cpus right away, while their other cpu request for +any=50@100 will continue to wait in FIFO order.
In this way a job can still 'mostly' follow FIFO rules, with just a few higher priority cpus being the exception.
Or, it can ask for *ALL* cpus at high priority to take over the entire network right away. Use this for very critical jobs that need to get done right away, or can get done very quickly.
If more than one job is configured with high priority, they will fight amongst each other using FIFO precedence, before anything is done with the lower priority FIFO queue. This would effectively create two 'tiers', or two FIFO queues.
So with this, you can make multiple FIFO queues. Say there's two projects:
1) "INHOUSE" -- a very low priority project
2) "VIP" -- an important project
The "INHOUSE" project should only get cpus when the higher priority "VIP" project has nothing to be rendered. So if all users on the VIP project submit with +any=50@200, whereas all users on the INHOUSE project submit +any=50@100, then the VIP project will have its own FIFO 'queue', and all of its jobs will /always/ be ahead of the INHOUSE project. It's only when there are no VIP jobs waiting to render that the INHOUSE project will get some cpus.
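The two-tier behavior amounts to ordering jobs by priority first and submission time second. A minimal sketch of that ordering, assuming farm-wide FIFO scheduling; the Job fields and the job names are invented for the example and do not correspond to any Rush data structure.

```python
from typing import List, NamedTuple

class Job(NamedTuple):
    name: str
    priority: int
    submitted: int  # submission order (a timestamp would work the same way)

def dispatch_order(jobs: List[Job]) -> List[str]:
    """Higher priority tiers go first; within a tier, oldest job first."""
    ranked = sorted(jobs, key=lambda j: (-j.priority, j.submitted))
    return [j.name for j in ranked]

queue = [
    Job("inhouse_1", 100, 1),
    Job("vip_1", 200, 2),
    Job("inhouse_2", 100, 3),
    Job("vip_2", 200, 4),
]
print(dispatch_order(queue))
# ['vip_1', 'vip_2', 'inhouse_1', 'inhouse_2']
```

Even though inhouse_1 was submitted first, both VIP jobs are served ahead of it, and each tier keeps its own first in/first out order.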