From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: How do I force a machine with multiple CPUs to use only one CPU
   Date: Mon, 08 May 2006 17:23:26 -0400
Msg# 1286
Jon Herman wrote:
[posted to rush.general]

I would like to be able to create a rush group that would limit the number of CPUs to be used in a specific job.

Here's my problem: I need to be able to use all of the CPUs on a host for rendering with XSI, but I also need to be able to use those same multi-cpu hosts with an application that works using only one CPU.

So, I'd like to retain the multiple processors for XSI jobs, but use a single processor for another type of job, on the same set of hosts. I know I can describe the host's CPUs in the rush.conf file, but is there a way to make rush think the host only has one processor when it's using a specific group?

Hi Jon,

	It sounds like you want a single XSI frame to take up the whole
	machine when it runs, blocking other jobs from using the other cpus,
	and causing rush to only run one instance of XSI.

	And when single thread non-XSI frames are running on the cpus,
	they just take up one cpu, each, such that several frames can
	run on a machine, one per cpu.

	There are a few techniques described here:
	http://www.seriss.com/rush-current/rush/rush-techniques.html#Threading

	None of these approaches are 'pretty'. The issue is that to do it
	properly, when one cpu becomes available, rush would have to hold it
	free until the OTHER cpu also freed up, so that both cpus would be
	free when the XSI frame starts. Currently rush doesn't hold a cpu
	free while waiting for other cpu(s) to free up, and I see no clean
	way to solve this without that feature.

	The 'using ram to reserve cpus' approach (#1 in the above) comes close:
	the job will only start a frame on a machine when both cpus are free.
	You can submit the XSI job with a higher priority than the others,
	using the 'k' flag, so that it bumps other jobs out of the way and
	can secure both processors.
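	To make the ram trick concrete, here is a small shell model of the
	rule it relies on: a frame needs one free cpu AND its requested ram
	free on the host. The helper name frames_that_fit, the min() rule,
	and the ram figure for the busy case are assumptions for illustration,
	not rush's actual scheduler code.

```shell
# frames_that_fit FREE_CPUS FREE_RAM JOB_RAM
# Hypothetical helper: models how many frames of a job could start on one
# host if each frame needs one free cpu AND JOB_RAM of free ram.
# Simplified model for illustration only, not rush's scheduler.
frames_that_fit() {
    free_cpus=$1
    free_ram=$2
    job_ram=$3
    by_ram=$(( free_ram / job_ram ))   # integer division: whole frames only
    if [ "$by_ram" -lt "$free_cpus" ]; then
        echo "$by_ram"
    else
        echo "$free_cpus"
    fi
}

# Idle dual-cpu host with 4096 of ram, job asks for all 4096:
frames_that_fit 2 4096 4096    # prints 1 (the second cpu stays unused)

# Same host with one cpu busy, assuming that frame holds 100 of ram:
frames_that_fit 1 3996 4096    # prints 0 (job waits unless it bumps)
```

	Tying the ram request to the whole machine is what turns "one cpu
	per frame" into "one frame per machine" in this model.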

	For instance, say all the machines on your farm are configured with
	4096 of ram in rush (i.e. the 'RAM' field in the rush/etc/hosts file
	is set to 4096 for every host). Then submitting an XSI job with:

# SUBMIT XSI JOB
rush -submit << EOF
:
ram 4096
cpus +any=5@10k
:
EOF

	..will cause the job to request all the ram on each machine, and to
	ask for 5 cpus at 10k priority.
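	Since the reservation only works while the 'ram' request matches the
	RAM value configured for the hosts, it can help to generate those
	lines from that one number. A minimal sketch, assuming a hypothetical
	helper named xsi_submit_lines that prints the two submit lines shown
	above from the host RAM value, the cpu count and the priority:

```shell
# xsi_submit_lines HOST_RAM NCPUS PRI
# Hypothetical helper: prints the 'ram' and 'cpus' submit lines used above,
# with the ram request equal to the RAM configured for every host, so one
# frame reserves a whole machine.
xsi_submit_lines() {
    printf 'ram %s\n' "$1"
    printf 'cpus +any=%s@%s\n' "$2" "$3"
}

xsi_submit_lines 4096 5 10k
# prints:
# ram 4096
# cpus +any=5@10k
```

	The other lines of the submit script (shown only as ':' in the post)
	are not generated here and still have to be supplied.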

	This way, if both processors on a machine are rendering single-threaded
	maya frames at a lower priority, the above XSI job will bump those two
	maya frames out of the way, because:

                > the 10k (the k=kill) will ensure other jobs are cleared off,
                  because this job will kill off other lower priority jobs
                  to clear up enough ram to run this one

                > the "ram 4096" guarantees all ram will be reserved to this job's
                  frame, preventing other jobs from jumping in, and also preventing
                  this job from using more than one cpu on each machine
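	The bumping described in the two points above can be modeled like
	this. frames_to_bump is a hypothetical helper, the ram held by each
	maya frame is an assumed value, and the bump order is an assumption;
	rush's real rules may differ. It only shows why a job asking for all
	of a host's ram has to clear every lower-priority frame on that host
	before it can start.

```shell
# frames_to_bump HOST_RAM JOB_RAM FRAME_RAM...
# Hypothetical model of the 'k' (kill) priority behaviour: lower-priority
# frames, each holding FRAME_RAM of ram, are bumped one at a time until
# JOB_RAM is free. Prints how many frames get bumped.
frames_to_bump() {
    host_ram=$1
    job_ram=$2
    shift 2
    used=0
    for r in "$@"; do used=$(( used + r )); done
    free=$(( host_ram - used ))
    bumped=0
    for r in "$@"; do
        [ "$free" -ge "$job_ram" ] && break
        free=$(( free + r ))        # bumping this frame releases its ram
        bumped=$(( bumped + 1 ))
    done
    echo "$bumped"
}

# Dual-cpu host, two maya frames assumed to hold 100 of ram each:
frames_to_bump 4096 4096 100 100    # prints 2 (both frames bumped)
```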

	For instance, here's a maya job using both processors of all machines
	on a small network of 4 machines, each with dual procs, running at a
	priority of 5:

[erco@ontario] : rush -lac
HOST            OWNER          JOBID            TITLE                     FRAME   PRI   PID        ELAPSED REMARKS
ontario         erco           ontario.56       MAYA_JOB                  0007    5     7392      00:05:02
ontario         erco           ontario.56       MAYA_JOB                  0008    5     7394      00:05:02

rotwang         erco           ontario.56       MAYA_JOB                  0001    5     29699     00:05:03
rotwang         erco           ontario.56       MAYA_JOB                  0004    5     29701     00:05:03

meade           erco           ontario.56       MAYA_JOB                  0002    5     32204     00:05:03
meade           erco           ontario.56       MAYA_JOB                  0003    5     32206     00:05:03

tower           erco           ontario.56       MAYA_JOB                  0005    5     5062      00:05:03
tower           erco           ontario.56       MAYA_JOB                  0006    5     5063      00:05:03

	Now I submit an XSI job asking for all the ram on each machine (4096)
	and for +any=3@10k:

# SUBMIT XSI JOB
rush -submit << EOF
:
ram 4096
cpus +any=3@10k
:
EOF

	As soon as the job is submitted, 3 of the 4 machines will get their maya
	frames bumped (and requeued), putting the XSI job in their place, one XSI
	frame per machine, leaving the other cpu on each machine unavailable:

[erco@ontario] : rush -lac
HOST            OWNER          JOBID            TITLE                     FRAME   PRI   PID        ELAPSED REMARKS
ontario         erco           ontario.58       XSI                       0002    10k   7461      00:00:09
ontario         -              -                -                         -       -     -           Online

rotwang         erco           ontario.58       XSI                       0001    10k   29712     00:00:10
rotwang         -              -                -                         -       -     -           Online

meade           erco           ontario.58       XSI                       0003    10k   32214     00:00:09
meade           -              -                -                         -       -     -           Online

tower           erco           ontario.56       MAYA_JOB                  0005    5     5062      00:14:36
tower           erco           ontario.56       MAYA_JOB                  0006    5     5063      00:14:36

	When you look at the ram available on the machines running XSI,
	you'll see the XSI job is taking all the ram, leaving none for
	other jobs, preventing other jobs from sneaking in:

[erco@ontario] : rush -ramlist rotwang
STATE   JOBID/TITLE                            PRI    RAMUSE NOTES
Busy    ontario.58,XSI                         10k      4096  <-- asking for all the ram
                                                      ------
                                                        4096
    Total ram on rotwang: 4096
Available ram on rotwang: 0     <-- no ram available for other jobs to use the other cpu
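	To check this across several machines, the 'Available ram' line can be
	pulled out of the -ramlist report. avail_ram is a hypothetical helper
	that depends only on the output layout shown above; the demo feeds it
	that captured report (with the annotations removed) instead of calling
	rush.

```shell
# avail_ram: reads a 'rush -ramlist HOST' report on stdin and prints the
# value on its "Available ram on HOST: N" line.
# Hypothetical helper; relies only on the layout shown in this post.
avail_ram() {
    awk '$1 == "Available" && $2 == "ram" && $3 == "on" { print $5; exit }'
}

# Demo on the report captured above:
avail_ram <<'EOF'
STATE   JOBID/TITLE                            PRI    RAMUSE NOTES
Busy    ontario.58,XSI                         10k      4096
                                                      ------
                                                        4096
    Total ram on rotwang: 4096
Available ram on rotwang: 0
EOF
# prints 0
```

	On a live farm you would pipe the real report into it, e.g.
	'rush -ramlist rotwang | avail_ram', once per host.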

	Note how only "tower" has two MAYA jobs running; the other 3 machines
	are taken over by the XSI job, with only one cpu busy each.
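	The same check can be made from the 'rush -lac' listing by counting
	busy frames per host. busy_per_host is a hypothetical helper that
	assumes the column layout shown above, where an idle cpu shows '-' in
	the OWNER column; the demo uses the second listing from this post.

```shell
# busy_per_host: reads a 'rush -lac' listing on stdin and prints
# "HOST COUNT" for each host, counting rows whose OWNER is not '-'.
# Hypothetical helper; assumes the column layout shown in this post.
busy_per_host() {
    awk 'NF == 0 || $1 == "HOST" || $1 ~ /^\[/ { next }
         { if (!($1 in busy)) busy[$1] = 0
           if ($2 != "-") busy[$1]++ }
         END { for (h in busy) print h, busy[h] }' | sort
}

busy_per_host <<'EOF'
HOST            OWNER          JOBID            TITLE                     FRAME   PRI   PID        ELAPSED REMARKS
ontario         erco           ontario.58       XSI                       0002    10k   7461      00:00:09
ontario         -              -                -                         -       -     -           Online

rotwang         erco           ontario.58       XSI                       0001    10k   29712     00:00:10
rotwang         -              -                -                         -       -     -           Online

meade           erco           ontario.58       XSI                       0003    10k   32214     00:00:09
meade           -              -                -                         -       -     -           Online

tower           erco           ontario.56       MAYA_JOB                  0005    5     5062      00:14:36
tower           erco           ontario.56       MAYA_JOB                  0006    5     5063      00:14:36
EOF
# prints:
# meade 1
# ontario 1
# rotwang 1
# tower 2
```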

	This is not a perfect solution, but it does give you one XSI frame
	per machine, while single-threaded jobs still run one frame per cpu.
	Alternatively, you can use the 'reserve' approach (#3 in the above
	link): make a +xsi group, reserve the extra processors on each of
	its machines with a 'sleep' job, and submit the XSI frames to just
	that +xsi group.

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)
