On 2007-06-06 13:15:34 -0700, Greg Ercolano <erco@(email suppressed)> said:
Antoine Durr wrote:
maxcpus 2
cpus tahoe=1@999 ontario=1@999
Beyond their own machine (which is important), I really dislike the
notion of a user having to know or choose which machines their stuff
lands on.
Agreed; but if you have e.g. node locked licenses,
it at least ensures only those boxes get the renders.
Again, an appropriately named pool would take care of that.
If you have floaters, then yes, you wouldn't specify
hostnames, just a cpu cap.
Thing is, if you submit two jobs that are both houdini
jobs, and you only have two floating lics, then you
need some 'centralized' way to count the lics; this is
where a watcher script might come in, and realize there
are two houdini jobs and limit the procs, or tweak the
priorities to prevent a situation where more than two
are rendering at once.
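A watcher along those lines could be sketched as a pure scheduling step. Everything here is invented for illustration (job ids, license names, the even-split policy); you would feed it the real job list however you enumerate jobs on your farm:

```python
# Hypothetical sketch of the 'watcher' idea: given the active jobs and
# the number of floating seats per license type, decide how many cpus
# each job may use so concurrent renders never exceed the licenses.

def cpu_caps(jobs, seats):
    """jobs: list of (jobid, lictype); seats: {lictype: floating_seats}.
    Returns {jobid: cpucap}, splitting each license's seats evenly
    across the jobs that need it (earlier jobs get any remainder)."""
    by_lic = {}
    for jobid, lic in jobs:
        by_lic.setdefault(lic, []).append(jobid)
    caps = {}
    for lic, jobids in by_lic.items():
        share, extra = divmod(seats.get(lic, 0), len(jobids))
        for i, jobid in enumerate(jobids):
            caps[jobid] = share + (1 if i < extra else 0)
    return caps

# Two houdini jobs contending for 2 floating licenses: 1 cpu each.
print(cpu_caps([("job1", "houdini"), ("job2", "houdini")], {"houdini": 2}))
```

A real watcher would then push those caps back to the queue (e.g. via the submit-form's maxcpus) on each polling pass.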
Jobs should have the notion of "requirements", one of which is a
particular license type. Admittedly, this is tricky because users use
the licenses w/out notifying the queue! So the queue has to figure out
what's left, figure out how many it's using, and only allow so many
more after that.
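Since artists check out seats without telling the queue, the only honest count comes from asking the license server itself. A sketch of that, assuming a FlexLM-style `lmstat -a` summary line (the feature name and counts below are made up for the example):

```python
import re

# Parse FlexLM 'lmstat' summary lines of the form:
#   Users of houdini:  (Total of 10 licenses issued;  Total of 4 licenses in use)
# and report how many floating seats are actually left.

SUMMARY = re.compile(
    r"Users of (\w+):\s+\(Total of (\d+) licenses? issued;\s+"
    r"Total of (\d+) licenses? in use\)")

def seats_left(lmstat_output, feature):
    """Return issued-minus-in-use for one feature, or None if absent."""
    for m in SUMMARY.finditer(lmstat_output):
        name, issued, used = m.group(1), int(m.group(2)), int(m.group(3))
        if name == feature:
            return issued - used
    return None

sample = "Users of houdini:  (Total of 10 licenses issued;  Total of 4 licenses in use)"
print(seats_left(sample, "houdini"))  # -> 6
```

The queue would subtract its own in-flight renders from that number before dispatching more.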
Rush itself doesn't maintain a centralized perspective
of things, so it can't do centralized stuff like counting.
I'm curious as to the different ways that exist. What kinds of things
are important to people?
I've seen a variety of requests too numerous to mention.
I think Esc had the most complex of all the scheduling
algorithms I'd come across for The Matrix III. They had
all kinds of stuff in there; taking ram into account,
render times that change over time, I think they were
even polling ram use as the renders ran. The guy who was
writing it had a lot of high level goals.

Funny, I'm doing the ram-usage polling right now. I simultaneously
launch a memory watcher script which, given a PID, finds all the child
PIDs, adds up their memory consumption, and writes it to a file in the
logdir. When the process completes, it tails the last line of that
file and puts it into the per-frame notes field, so that the users see
what kind of memory footprint their job had. This is a pretty critical
feature, IMO, as you *really* want to avoid going into swap on a box!

Ideally, I should be able to submit with a requirement of a certain
amount of ram, and have the frames only run on machines that have that
much ram left. Yes, that is not failure-free, as doing only
spot-checks doesn't tell you that a particular job on a host might
suddenly chew up another gig. But at least it should try.

-- Antoine
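The RAM-polling watcher described above could look roughly like this on Linux (this /proc walk is my sketch, not Antoine's actual script; a real watcher would loop on an interval and append each sample to the logdir file):

```python
import os

# Sketch of a memory watcher: given a root PID, walk /proc to find all
# descendant PIDs, then sum their resident memory (VmRSS).  Linux-only;
# the PPid/VmRSS field names come from /proc/<pid>/status.

def status_field(pid, field):
    """Return one field from /proc/<pid>/status, or None if unreadable."""
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith(field + ":"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return None

def descendants(root):
    """root plus every PID whose parent chain leads back to root."""
    parent = {}
    for name in os.listdir("/proc"):
        if name.isdigit():
            ppid = status_field(int(name), "PPid")
            if ppid is not None:
                parent[int(name)] = int(ppid)
    pids = {root}
    changed = True
    while changed:  # fixed point: keep pulling in children of members
        changed = False
        for pid, ppid in parent.items():
            if ppid in pids and pid not in pids:
                pids.add(pid)
                changed = True
    return pids

def sample_rss_kb(root):
    """One spot-check: total resident kB for root's process tree."""
    total = 0
    for pid in descendants(root):
        rss = status_field(pid, "VmRSS")   # e.g. "123456 kB"
        if rss:
            total += int(rss.split()[0])
    return total

# One sample of this process's own tree:
print(sample_rss_kb(os.getpid()), "kB")
```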
Trouble with their implementation was they had >500 boxes
on their farm, and were throwing all the jobs submitted
to one box..! I warned that was a bad, bad idea from the
get-go, but they locked into that for some reason. The whole
point of rush's decentralized design is to distribute the job
load, so focusing all jobs on a single box really hinders it
on a large net. This made it tough for them, because
the central box became overloaded fast, in addition to their
watcher constantly rescheduling it.
That was a while ago, 2002/2003 IIRC, when boxes and
networks were slower, and rush has had some optimizations
since then.
--
Floq FX Inc.
10839 Washington Blvd.
Culver City, CA 90232
310/430-2473