cpu.acct - Rush Cpu Accounting File

Rush Render Queue - cpu.acct File
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 102.43 06/14/08

Strikeout text indicates features not yet implemented

Cpu Accounting File
$RUSH_DIR/etc/cpu.acct

File Format
Example File
Process Entries
Log Rotation Entries
State Change (Online/Offline)
Midnight Marks

File Format

The location of the cpu accounting file is set with the CpuAcctPath command in the rush.conf file.

Each time a frame finishes executing, a new entry is added to the cpu accounting file, logging the name of the job, how long the frame ran, and so on. The file holds one record per line, with tab-delimited fields in each record.

The first character of each line determines the record type and the fields that follow it. What follows is a list of the different kinds of record entries that can be found.

Cpu Accounting File Example


  r start 940000000 online
  p 940001000 tahoe.798    WERNER/C33 erco     0106  superior 100k  122  0   0	0 27823
  p 940001123 tahoe.797    KILLER     erco     0504  superior 200   121  0   0	0 27846
  s 940007123 offline fred@tahoe[192.17.1.34] -
  m 940100000
  r end 940100000 offline
  d stop 940100005 offline
  d start 940100008 offline
  ^
 /|\
  |_______
          |
         'r' - Log rotation
         'p' - Process completed
         's' - State change of daemon (online/offline)
         'm' - Midnight marker
         'd' - Daemon start/stop
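
As a rough illustration of dispatching on the first character, the sketch below tallies records by type. The names `RECORD_TYPES`, `record_type` and `count_records` are hypothetical helpers, not part of Rush, and the code assumes lines may carry leading whitespace as in the example above.

```python
from collections import Counter

# Record-type letters, taken from the legend above.
RECORD_TYPES = {
    "r": "log rotation",
    "p": "process completed",
    "s": "state change",
    "m": "midnight marker",
    "d": "daemon start/stop",
}

def record_type(line):
    """Return the record-type letter of one line, or None if blank/unknown."""
    fields = line.split(None, 1)
    if not fields:
        return None
    letter = fields[0]
    return letter if letter in RECORD_TYPES else None

def count_records(lines):
    """Tally how many records of each type appear in an iterable of lines."""
    counts = Counter()
    for line in lines:
        kind = record_type(line)
        if kind is not None:
            counts[kind] += 1
    return counts

sample = [
    "r start 940000000 online",
    "p 940001000 tahoe.798 WERNER/C33 erco 0106 superior 100k 122 0 0 0 27823",
    "s 940007123 offline fred@tahoe[192.17.1.34] -",
    "m 940100000",
]
for letter, n in sorted(count_records(sample).items()):
    print(letter, RECORD_TYPES[letter], n)
```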

Process Entries



p  948242783 tahoe.798 WERNER/C33 erco  0106  superior  100k  122  0   0   0 27822
p  948242783 tahoe.798 WERNER/C33 erco  0107  superior  100k  122  0   0   0 27834
p  948242865 tahoe.797 KILLER     erco  0504  superior  200   121  0   0   0 27846
-  --------- --------- ---------- ----  ----  --------  ----  ---  -   -   - -----
|      |         |          |      |     |       |       |     |   |   |   |   |
|      |         |          |      |     |       |       |     |   |   |   |   Pid
|      |         |          |      |     |       |       |     |   |   |   |
|      |         |          |      |     |       |       |     |   |   |   Exit code
|      |         |          |      |     |       |       |     |   |   |
|      |         |          |      |     |       |       |     |   |   #Secs User Time
|      |         |          |      |     |       |       |     |   |                 
|      |         |          |      *Job  |       |       |     |   #Secs System Time
|      |         |          |      Owner |       |       |     |
|      |         |          |            |       |       |     |
|      |         |          Title of job |       |       |     #Secs Wall Clock Time
|      |         Jobid                   |       |       |
|      |                                 |       |       Priority
|      time(2) process started           |       |
|                                        |       Host that ran the process
'p' indicates 'process entry'            |
                                         Frame that ran

* The job owner is not necessarily the owner of the process.
  Such is the case in windows jobs running frames on unix machines,
  or 'forceuid' configured in the rush.conf file.
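
A minimal sketch of reading one process entry into named fields, following the column order in the diagram above. `ProcessEntry` and `parse_process_entry` are hypothetical names. The sketch assumes no field (the job title in particular) contains whitespace, and that the time fields are whole seconds as in the examples.

```python
from typing import NamedTuple

class ProcessEntry(NamedTuple):
    start_time: int  # time(2) the process started
    jobid: str
    title: str
    owner: str       # job owner, not necessarily the process owner
    frame: str       # kept as text to preserve padding, e.g. "0106"
    host: str
    priority: str    # kept as text, e.g. "100k"
    wall_secs: int
    sys_secs: int
    user_secs: int
    exit_code: int
    pid: int

def parse_process_entry(line):
    """Parse one 'p' record; raises ValueError if the line does not fit."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "p":
        raise ValueError("not a 13-field 'p' record: %r" % line)
    return ProcessEntry(
        start_time=int(fields[1]),
        jobid=fields[2],
        title=fields[3],
        owner=fields[4],
        frame=fields[5],
        host=fields[6],
        priority=fields[7],
        wall_secs=int(fields[8]),
        sys_secs=int(fields[9]),
        user_secs=int(fields[10]),
        exit_code=int(fields[11]),
        pid=int(fields[12]),
    )

entry = parse_process_entry(
    "p  948242783 tahoe.798 WERNER/C33 erco  0106  superior  100k  122  0   0   0 27822"
)
print(entry.jobid, entry.frame, entry.wall_secs)
```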

Log Rotation Entries


    Rotation entries help applications determine the start/end range           
    of times a particular log file covers.

r start  948200000 online
r end    948300001 online
- -----  --------- ------
|   |        |       |
|   |        |       (New in 102.42a9)
|   |        |       Indicates 'online' or 'offline' state of daemon
|   |        |
|   |        time(2) file was rotated
|   |
|   'start' indicates the time the new log was created
|   'end' indicates time log was rotated out (Ocpu.acct files only)
|
'r' indicates log file was rotated, either manually or automatically
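
One way an application might use rotation entries is to work out the time range a given log file covers. The sketch below is an assumption about usage rather than anything Rush provides; `log_time_range` is a hypothetical helper.

```python
def log_time_range(lines):
    """Return (start, end) epoch seconds taken from 'r' records.

    Either value is None if the matching record is absent, for example
    'end' in a log that has not been rotated out yet.
    """
    start = end = None
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "r":
            if fields[1] == "start":
                start = int(fields[2])
            elif fields[1] == "end":
                end = int(fields[2])
    return start, end

rotated_log = [
    "r start  948200000 online",
    "p  948242783 tahoe.798 WERNER/C33 erco  0106  superior  100k  122  0   0   0 27822",
    "r end    948300001 online",
]
print(log_time_range(rotated_log))
```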

Daemon Online/Offline State Change


    (New in 102.42a9)
    An 's' entry is logged when someone changes the online/offline state
    of the daemon, indicating what time the change was made, what state
    it was changed to (online or offline), by whom, from which machine
    the change was issued, and optional comments (if any).

s 982330201 online jerry@tahoe[192.17.1.34] -
s 982334241 offline root@meade[192.15.0.177] Offline for maintenance
- -----     ------- ------------------------ ---------------------------
|   |        |            |                          |
|   |        |            |                          Optional remarks ('-' if none)
|   |        |            |
|   |        |            User@host who invoked the online/offline command
|   |        |
|   |        'online' or 'offline'
|   |
|   time(2) state was changed
|
's' indicates daemon's online/offline state was changed
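
Because the remarks field may contain spaces, a plain whitespace split is not enough for 's' records. The following sketch uses a regular expression instead. `STATE_RE` and `parse_state_change` are hypothetical names, and the pattern assumes the `user@host[address]` form shown in the examples above.

```python
import re

# Pattern modelled on the two example lines above (an assumption, not a spec).
STATE_RE = re.compile(
    r"^s\s+(?P<time>\d+)\s+(?P<state>online|offline)\s+"
    r"(?P<user>[^@\s]+)@(?P<host>[^\[\s]+)\[(?P<ip>[^\]]+)\]"
    r"(?:\s+(?P<remarks>.*))?$"
)

def parse_state_change(line):
    """Parse one 's' record into a dict; remarks is None when '-' or absent."""
    match = STATE_RE.match(line.strip())
    if match is None:
        raise ValueError("not an 's' record: %r" % line)
    remarks = (match.group("remarks") or "").strip()
    return {
        "time": int(match.group("time")),
        "state": match.group("state"),
        "user": match.group("user"),
        "host": match.group("host"),
        "ip": match.group("ip"),
        "remarks": None if remarks in ("", "-") else remarks,
    }

change = parse_state_change(
    "s 982334241 offline root@meade[192.15.0.177] Offline for maintenance"
)
print(change["state"], change["user"], change["remarks"])
```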

Midnight Marks


    Midnight marks are useful for applications to determine days that were     
    completely idle, such as when a log isn't rotated for several days.

m 1114326000
- ----------
|     |
|     time(2) mark occurred
|
'm' indicates a midnight time marker.
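
A sketch of how midnight marks could be used to find idle days: any two consecutive 'm' records with no 'p' record logged between them bound a day in which no frame finished. `idle_days` is a hypothetical helper, and treating "no frames finished" as "idle" is an assumption of this example.

```python
def idle_days(lines):
    """Return (mark, next_mark) pairs with no 'p' records logged between them."""
    idle = []
    prev_mark = None
    saw_process = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "p":
            saw_process = True
        elif fields[0] == "m" and len(fields) >= 2:
            mark = int(fields[1])
            if prev_mark is not None and not saw_process:
                idle.append((prev_mark, mark))
            prev_mark = mark
            saw_process = False
    return idle

log = [
    "m 1114326000",
    "p 1114330000 tahoe.798 WERNER/C33 erco 0106 superior 100k 122 0 0 0 27822",
    "m 1114412400",
    "m 1114498800",
]
print(idle_days(log))
```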

Daemon Boot


    Daemon boot messages indicate when the daemon was started/stopped.         

d 1114326000 start "reason"
d 1114326005 stop  "reason"
- ---------- ----- --------
|     |        |      |
|     |        |      (Optional) user supplied reason why daemon was stopped/started
|     |        |
|     |        start|stop
|     |
|     time(2) daemon was started/stopped
|
'd' indicates a daemon boot message
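
The quoted reason can be read with Python's `shlex` module, which honors double quotes. This sketch follows the field order shown in this section (time, then start|stop, then the optional reason); `parse_daemon_boot` is a hypothetical helper.

```python
import shlex

def parse_daemon_boot(line):
    """Parse one 'd' record into (time, action, reason); reason may be None."""
    fields = shlex.split(line)
    if len(fields) < 3 or fields[0] != "d" or fields[2] not in ("start", "stop"):
        raise ValueError("not a 'd' record in the documented layout: %r" % line)
    reason = " ".join(fields[3:]) or None
    return int(fields[1]), fields[2], reason

print(parse_daemon_boot('d 1114326005 stop  "reason"'))
```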

CAVEATS

'Exit code' is normally a non-negative number representing the actual exit code of the process. The value is negative if the process was terminated by a signal, in which case its absolute value is the signal number. A negative value usually means the process was killed, segfaulted, or was bumped by a higher priority process. Commonly, the 'Exit code' will be one of:
```
  -15 - process killed with SIGTERM; someone probably manually killed it
   -9 - process killed with SIGKILL; probably bumped in a priority battle
   -2 - process killed with SIGINT; someone sent it a ^C
    0 - process did an exit(0); frame Done
    1 - process did an exit(1); frame Fail
    2 - process did an exit(2); frame Requeue
    
```
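
The sign convention above can be decoded mechanically. The sketch below uses Python's standard `signal` module to name the signal on the machine where it runs; `describe_exit_code` is a hypothetical helper, and signal numbers other than the common ones can differ between platforms.

```python
import signal

# Frame states for the exit codes listed above.
FRAME_STATES = {0: "Done", 1: "Fail", 2: "Requeue"}

def describe_exit_code(code):
    """Turn a cpu.acct 'Exit code' value into a short description."""
    if code < 0:
        signum = -code
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    state = FRAME_STATES.get(code)
    if state is None:
        return "exit(%d)" % code
    return "exit(%d): frame %s" % (code, state)

for code in (-15, -9, 0, 1, 2):
    print(code, describe_exit_code(code))
```
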
Do /not/ attempt to redirect the cpu.acct log to an NFS server or remote file system; keep the files local. If you want to centralize the data, make a crontab(1) entry that sweeps the data to a central server, using sendmail(8), rcp(1), rdist(1), rsync(1), or some other mechanism more forgiving than NFS.
NFS is the 'kiss of death' for daemons (rush, cron) if the NFS server hangs or goes down; as soon as the daemon tries to touch a hung NFS mount (e.g. rush adding a line to cpu.acct when a frame finishes), the daemon will hang completely. In the case of rush, this not only makes the daemon unresponsive via irush during the outage, it also makes it unkillable if the mounts are 'hard'.
Although tempting, it is not recommended to use process execution times for cpu billing purposes. Wall clock time includes time the process may have spent waiting on a loaded network. User and System times report the respective times spent in the Render Script only, not in its sub-processes (e.g., the renderer).
To properly bill for cpu time, you would either need to enable full-on Unix process accounting to obtain accumulated cpu time for all sub-processes in the user's render script, or create wrapper scripts that use programs like timex(1) to monitor the binary execution time of the critical render/compositor processes.
Tools like timex(1) indicate in their documentation that they must have Unix process accounting enabled to show sub-process totals. This is usually prohibitive on production machines, due to the disk resources used by the Unix process accounting system.
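
As one possible shape for such a wrapper, the sketch below runs a command and reports the user and system cpu time charged to it, using `getrusage(2)` through Python's standard `resource` module (Unix only). `run_and_measure` is a hypothetical helper, not a Rush feature. The totals include only the child and those of its descendants that were waited for, so it is not a substitute for full process accounting.

```python
import resource
import subprocess
import sys

def run_and_measure(argv):
    """Run argv and return (exit status, user cpu secs, system cpu secs).

    Times are the change in RUSAGE_CHILDREN across the run, which covers
    the command and any of its descendants that were waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    status = subprocess.run(argv).returncode
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user_secs = after.ru_utime - before.ru_utime
    sys_secs = after.ru_stime - before.ru_stime
    return status, user_secs, sys_secs

# Stand-in for the real render command: a short Python one-liner.
status, user_secs, sys_secs = run_and_measure(
    [sys.executable, "-c", "sum(range(100000))"]
)
print("status=%d user=%.2fs sys=%.2fs" % (status, user_secs, sys_secs))
```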