Ron Griswold wrote:
> I ran across the cpu.acct file using the 'rush -catlog cpu.acct host'
> command. It has info recorded dating back before I started working
> here! Woo hoo!
Yes, that's the place to look, and yes, you want the wall clock
time (which is the field that immediately follows the priority,
and the priority immediately follows the hostname).
> I think I'm missing something. According to the docs, the 9th column in
> the 'p' report is the "#Secs Wall Clock Time".
Correct, and that is in seconds, and is the number you want.
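For instance, here's a quick sketch of totaling that 9th column
per jobid with awk. (The lines fed in below are just stand-ins in
the same 'p' record format; point it at the real cpu.acct file
instead to run it on actual data.)

```shell
# Sum wall-clock seconds per jobid from cpu.acct 'p' records.
# Field 3 is the jobid, field 9 is the wall-clock seconds.
awk '$1 == "p" { wall[$3] += $9 }
     END { for (j in wall) print j, wall[j] "s" }' <<'EOF'
p 1169282036 ontario.8 foo erco 0001 ontario 1 37 0 0 0 11011
p 1169282037 ontario.8 foo erco 0002 ontario 1 36 0 0 0 11013
EOF
```

Replace the here-document with e.g. /usr/local/rush/var/cpu.acct
as the input file to tally real jobs.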
> Both the "system time" and "user time" are 0.
Yes, usually they are zero, because those are the system/user times
for the immediate child (usually the perl script itself), which
has little or no impact on the CPU.
AFAIK, to this day even modern OSes still do not provide
accumulated system/user time for children (e.g. the render
processes themselves), so rush can't display them either.
You might ask why rush doesn't poll the process tree and
try to accumulate the user/system times itself; I tried, and I can
tell you polling sucks and is prone to error. Only the OS
can do these calculations properly, and from what I've seen,
they do not.
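For what it's worth, a rough sketch of what such polling looks like,
using standard 'ps' (the sleep here is just a hypothetical stand-in
for a render child process). Any child that starts and exits between
two samples is simply never seen, which is why polling under-counts:

```shell
# One polled sample of a child process's cumulative CPU time.
# A real poller would loop over the whole process tree on a timer,
# and still miss anything short-lived between samples.
sleep 5 &                    # stand-in for a render child process
CHILD=$!
ps -o time= -p "$CHILD"      # prints its CPU time so far, e.g. 00:00:00
kill "$CHILD" 2>/dev/null
```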
> The values, however, aren't even remotely close to what I would
> expect. For a render that took around 2h, the time is recorded
> as 79s.
Hmm, unless there's a bug I don't know about, you should
be seeing a value of ~7200 for a 2h render in that field.
Try the test below to check it.
What platform were you on for the cpu.acct files you were
looking at?
> Is there something wrong, or am I just not getting the meaning of this
> value?
I'm certainly hoping you haven't found a bug, and are just
not looking at the right data somehow.
If you have proof of this, post a few excerpted lines
from the cpu.acct log, along with what you know to
be the matching 'rush -lf' report for those entries,
so the discrepancy shows. I'll be able to tell. Make sure you're
not looking at requeued frames.
Here's a quick test I just ran (linux) showing how to check
the consistency between cpu.acct and 'rush -lf':
[erco@ontario] % rush -submit ## SUBMIT TEST JOB
title foo
frames 1-10
command sleep 35 ## EACH FRM SLEEPS 35 SECS
cpus ontario=2
^D
setenv RUSH_JOBID ontario.8
[erco@ontario] % tail -f /usr/local/rush/var/cpu.acct ## WATCH FRMS GET DONE
[..]
Start Jobid Title Owner Frame Hostname Pri Wall Sys Usr Exit PID
p 1169282036 ontario.8 foo erco 0001 ontario 1 37 0 0 0 11011
p 1169282037 ontario.8 foo erco 0002 ontario 1 36 0 0 0 11013
p 1169282074 ontario.8 foo erco 0003 ontario 1 36 0 0 0 11019
p 1169282074 ontario.8 foo erco 0004 ontario 1 36 0 0 0 11021
p 1169282112 ontario.8 foo erco 0005 ontario 1 36 0 0 0 11027
p 1169282112 ontario.8 foo erco 0006 ontario 1 36 0 0 0 11029
p 1169282149 ontario.8 foo erco 0007 ontario 1 36 0 0 0 11085
p 1169282149 ontario.8 foo erco 0008 ontario 1 36 0 0 0 11087
p 1169282186 ontario.8 foo erco 0009 ontario 1 37 0 0 0 11093
p 1169282187 ontario.8 foo erco 0010 ontario 1 36 0 0 0 11095
^C ^
/|\
|
+-----+
|
\|/
[erco@ontario] % rush -lf ontario.8 ## VIEW RUSH FRAMES v
STAT FRAME TRY HOSTNAME PID JOBID START ELAPSED NOTES
Done 0001 1 ontario 11011 ontario.8 01/20,00:33:56 00:00:37
Done 0002 1 ontario 11013 ontario.8 01/20,00:33:57 00:00:36
Done 0003 1 ontario 11019 ontario.8 01/20,00:34:34 00:00:36
Done 0004 1 ontario 11021 ontario.8 01/20,00:34:34 00:00:36
Done 0005 1 ontario 11027 ontario.8 01/20,00:35:12 00:00:37
Done 0006 1 ontario 11029 ontario.8 01/20,00:35:12 00:00:37
Done 0007 1 ontario 11085 ontario.8 01/20,00:35:49 00:00:36
Done 0008 1 ontario 11087 ontario.8 01/20,00:35:49 00:00:36
Done 0009 1 ontario 11093 ontario.8 01/20,00:36:27 00:00:37
Done 0010 1 ontario 11095 ontario.8 01/20,00:36:27 00:00:37
The +/-1 second variance is due to rush's own overhead for
actually switching from the Run to the Done state; 'rush -lf'
includes this overhead for the job, cpu.acct does not.
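If you want to compare the two reports numerically, the HH:MM:SS
ELAPSED values from 'rush -lf' convert to seconds with a small
awk one-liner, e.g.:

```shell
# Convert an HH:MM:SS elapsed value to seconds, for comparison with
# the cpu.acct Wall column: 00:00:37 -> 37, 02:00:00 -> 7200.
echo "02:00:00" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
```

A 2h render should land near 7200 in both reports, give or take
the one-second state-change overhead described above.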
--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/