Rush Logo Rush Render Queue - cpu.acct File
V 103.07 05/28/15
(C) Copyright 2008, 2015 Seriss Corporation. All rights reserved.
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.

Strikeout text indicates features not yet implemented



   cpu.acct File Format  

The cpu accounting file is configured with the rush.conf file's CpuAcctPath  command.

Each time a frame finishes executing, a new entry is created in the Cpu Accounting file, logging the name of the job, how long the frame ran, etc. The file format is one record per line, with tab delimited fields in each record.

NOTE:
    Do /not/ attempt to symbolically link or mount the rush/var/cpu.acct log file directlry to an NFS server or remote file system; keep these files /local/. NFS is the 'kiss of death' for daemons (rush, cron) if the NFS server hangs or goes down; as soon as the daemon tries to touch a hung NFS (e.g. rush adding a line to cpu.acct when a frame finishes), the daemon will hang up completely. In the case of rush, it will not only make the daemon unresponsive via irush during the outage, it will also be unkillable if the mounts are 'hard'.

    If you want to centralize the data, use the new rush.conf command cpuacct.dbasedir command to set a path to the server, and rush will regularly update the file info to a date stamped directory hierarchy to make it easy to get at the data. Threads are used when writing to this directory to ensure file server hiccups do not affect the main daemon thread.

    Other ways to manage this data are to use a crontab to sweep the data to a central location (more or less what cpuacct.dbasedir does without needing cron to do it), and use some other protocol like rcp, rdist, rsync, etc. which will timeout if the remote server is not responding.

Here's a small example of a cpu.acct file:

    Cpu Accounting File Example
    
          r start 940000000 online
          p 940001000 tahoe.798    WERNER/C33 erco     0106  superior 100k  122  0   0  0 27823         
          p 940001123 tahoe.797    KILLER     erco     0504  superior 200   121  0   0  0 27846
          s 940007123 offline fred@tahoe[192.17.1.34] -
          m 940100000
          r end 940100000 offline
          d stop 940100005 offline
          d start 940100008 offline
         
          Where the first letter in each line is one of:
    
    	    'r' - Log rotation
    	    'p' - Process completed
    	    's' - State change of daemon (online/offline)
    	    'm' - Midnight marker
    	    'd' - Daemon start/stop
    	

The first character of each line in the file determines the type of data found on the rest of the line, and defines what the space delimited fields will contain.

What follows is a description of the different letter codes, and the format of the data lines for each code..

   'p' -- Process Entry  
This entry is added to the file whenever rush finishes rendering a rush frame (process). It includes relevant info about the process; its pid, exit code, how long it ran, etc.

New data for job/host ram and host cpu slots was added in 103.06 to assist in generating more accurate actualization, so that external programs can see when changes were made to the rush/etc/hosts file 'Ram' and 'Cpus' values, and can determine ram utilization based on job requests.

    'p' Process Entries
    
    
        p  948242783 tahoe.798 WERNER/C33 erco  0106  superior  100k  122  0   0   0 27822 50/100 4
        p  948242783 tahoe.798 WERNER/C33 erco  0107  superior  100k  122  0   0   0 27834 50/100 4
        p  948242865 tahoe.797 KILLER     erco  0504  superior  200   121  0   0   0 27846 25/100 4
        -  --------- --------- ---------- ----  ----  --------  ----  ---  -   -   - ----- ------ -
        |      |         |          |      |     |       |       |     |   |   |   |   |      |   |
        |      |         |          |      |     |       |       |     |   |   |   |   |      |   (New in 103.06)
        |      |         |          |      |     |       |       |     |   |   |   |   |      |   #Cpu slots for this machine
        |      |         |          |      |     |       |       |     |   |   |   |   |      |   (from rush/etc/hosts file's 'Cpus' column)
        |      |         |          |      |     |       |       |     |   |   |   |   |      |
        |      |         |          |      |     |       |       |     |   |   |   |   |      (New in 103.06)
        |      |         |          |      |     |       |       |     |   |   |   |   |      Ram job requested / Total Ram for this machine
        |      |         |          |      |     |       |       |     |   |   |   |   |      (Total Ram from rush/etc/hosts file's 'Ram' column)     
        |      |         |          |      |     |       |       |     |   |   |   |   |
        |      |         |          |      |     |       |       |     |   |   |   |   Pid
        |      |         |          |      |     |       |       |     |   |   |   |
        |      |         |          |      |     |       |       |     |   |   |   Exit code
        |      |         |          |      |     |       |       |     |   |   |
        |      |         |          |      |     |       |       |     |   |   #Secs User Time
        |      |         |          |      |     |       |       |     |   |                 
        |      |         |          |      *Job  |       |       |     |   #Secs System Time
        |      |         |          |      Owner |       |       |     |
        |      |         |          |            |       |       |     |
        |      |         |          Title of job |       |       |     #Secs Wall Clock Time
        |      |         Jobid                   |       |       |
        |      |                                 |       |       Priority
        |      time(2) process started           |       |
        |                                        |       Host that ran the process
        'p' indicates 'process entry'            |
                                                 Frame that ran
        
        * The job owner is not necessarily the owner of the process.
          Such is the case in windows jobs running frames on unix machines,
          or 'forceuid' configured in the rush.conf file.
    
        

CAVEATS

  • 'Exit code' is normally a positive number representing the actual exit code of the process. This value will be negative if the process was signaled; the value being the signal number. If the value is negative, this usually means the process killed, segfaulted, or was bumped by a higher priority process. Commonly, the 'Exit code' will be one of:
    
          -15 - process killed with SIGTERM; someone probably manually killed it
           -9 - process killed with SIGKILL; probably bumped in a priority battle
           -3 - process killed with SIGINT; someone sent it a ^C
    	0 - process did an exit(0); frame Done
    	1 - process did an exit(1); frame Fail
    	2 - process did an exit(2); frame Requeue
    	

  • Although tempting, it is not recommend to use process execution times for cpu billing purposes. Wall clock time includes time the process may have spent waiting for network load. User and System times report the respective times spent for the Render Script only; not its sub-processes (e.g., the renderer).

    To properly bill for cpu time, you would either need to enable full-on Unix process accounting to attain accumulated cpu time for all sub-processes in the user's render script, or, create wrapper scripts that use programs like timex(2) to monitor the binary execution time of the critical render/compositor processes.

    Tools like timex(2) indicate in their documentation that they must have Unix process accounting enabled to show sub-process totals. This is usually prohibitive on production machines, due to disk resources used by the Unix process accounting system.

   'r' -- Rotation Entry  
These entries indicate when the account log is rotated out, useful in determining the range of start and end times the data within this file represents.

    'r' Rotation Entries
    
    
        r start  948200000 online
        r end    948300001 online
        - -----  --------- ------
        |   |        |       |
        |   |        |       (New in 102.42a9)
        |   |        |       Indicates 'online' or 'offline' state of daemon
        |   |        |
        |   |        time(1) file was rotated
        |   |
        |   'start' indicates the time the new log created
        |   'end' indicates time log was rotated out (Ocpu.acct files only)          
        |
        'r' indicates log file was rotated, either manually or automatically
        

   's' -- Daemon State Change  
An 's' entry is logged when someone changes the online/offline state of the dameon, indicating what time the change was made, what state it was changed to (online or offline), by whom, from which machine the change was issued, and optional comments (if any).

    's' State Entries
    
        (New in 102.42a9)
    
        s 982330201 online  jerry@tahoe[192.17.1.34] -
        s 982334241 offline root@meade[192.15.0.177] Offline for maintenance
        - --------- ------- ------------------------ -----------------------
        |     |        |              |                        |
        |     |        |              |                        Optional remarks ('-' if none)        
        |     |        |              |
        |     |        |              User@host who invoked the online/offline command
        |     |        |
        |     |        'online' or 'offline'
        |     |
        |     time(1) state was changed
        |
        's' indicates daemon's online/offline state was changed
        

   'm' -- Midnight Mark  
Midnight marks are useful for applications to determine days that were completely idle, such as when a log isn't rotated for several days.

    Midnight Marks
    
    
        m 1114326000
        - ----------
        |     |
        |     time(1) mark occurred
        |
        'm' indicates a midnight time marker.                      
        

   'd' -- Daemon Boot Message  

Daemon boot messages indicate when the daemon was started/stopped.

    Daemon Boot
    
    
        d 1114326000 start "reason"
        d 1114326005 stop  "reason"
        - ---------- ----- --------
        |     |        |      |
        |     |        |      (Optional) user supplied reason why daemon was stop/started         
        |     |        |
        |     |        start|stop
        |     |
        |     time(1) mark occurred
        |
        'd' indicates a daemon boot message