From: Greg Ercolano <erco@(email suppressed)>
Subject: [RUSH 103] How do I use the new centralized accounting feature in
   Date: Fri, 01 Feb 2013 18:13:52 -0500

Msg# 2280

> We see there's the new 'cpuacct.dbasedir' command that lets us
> configure rush to write the accounting data to a file server.
> Can you provide some details on how this works?

	In short, yes: the 103 version lets you specify a path on your file
	server where cpu accounting data is accumulated, using this rush.conf
	command:

cpuacct.dbasedir "//your/server/rush-accounting" 0755 0644
                  -----------------------------  ---- ----
                                 |                 |    |
                                 |                 |    The permissions for created files
                                 |                 |
                                 |                 The permissions for created sub-directories
                                 |
                                 The path to the file server top level directory
                                 into which the accounting data is written.
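	To make the three fields concrete, here's a small Python sketch that
	splits such a line and reads the two permission fields as octal modes
	(the same notation chmod(1) uses). parse_dbasedir() is a hypothetical
	helper for illustration only; it is not part of rush, and it's not how
	rush itself parses rush.conf.

```python
import shlex

def parse_dbasedir(line):
    """Hypothetical helper: split a 'cpuacct.dbasedir PATH DIRPERMS FILEPERMS' line.

    Returns (path, dir_mode, file_mode) with the modes as integers.
    """
    fields = shlex.split(line)  # handles the double-quoted path
    if len(fields) != 4 or fields[0] != "cpuacct.dbasedir":
        raise ValueError("expected: cpuacct.dbasedir PATH DIRPERMS FILEPERMS")
    _, path, dir_perms, file_perms = fields
    # Permission fields are octal, as with chmod(1): "0755" -> 0o755
    return path, int(dir_perms, 8), int(file_perms, 8)


if __name__ == "__main__":
    path, dir_mode, file_mode = parse_dbasedir(
        'cpuacct.dbasedir "//your/server/rush-accounting" 0755 0644')
    print(path, oct(dir_mode), oct(file_mode))
    # → //your/server/rush-accounting 0o755 0o644
```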

	So whenever a render node generates accounting data, using the above
	example, that data would be written as ascii text to a date-oriented
	directory tree of the form:

//your/server/rush-accounting/YYYY/MM/DD/HOSTNAME-cpu.acct
----------------------------- ---- -- -- --------
            |                   |   |  |     |
     The path specified to      |   |  |     Hostname that generated the
     "cpuacct.dbasedir"         |   |  |     accounting data, ie. render node name
                                |   |  |
                                |   |  Day of month (padded to 2 digits)
                                |   Month of year (padded to 2 digits)
                                4 digit year

	So if today is 12/31/2012, and the render node generating the data is 'tahoe',
	then the data would be appended to the file:

//your/server/rush-accounting/2012/12/31/tahoe-cpu.acct
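	The path construction above can be sketched in Python as follows.
	acct_path() is a hypothetical name, and the sketch assumes the hostname
	is used exactly as given (e.g. 'tahoe', not a fully qualified name):

```python
from datetime import date

def acct_path(dbasedir, day, hostname):
    """Hypothetical helper: DBASEDIR/YYYY/MM/DD/HOSTNAME-cpu.acct for one host and day."""
    base = dbasedir.rstrip("/")
    # %Y is the 4 digit year; %m and %d are zero padded to 2 digits
    return f"{base}/{day:%Y}/{day:%m}/{day:%d}/{hostname}-cpu.acct"


if __name__ == "__main__":
    print(acct_path("//your/server/rush-accounting", date(2012, 12, 31), "tahoe"))
    # → //your/server/rush-accounting/2012/12/31/tahoe-cpu.acct
```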

	Each line in the file would be in the usual rush cpu.acct file format:
	http://www.seriss.com/rush.103.00/rush/rush-cpu-acct.html

	The data is swept to the server at regular intervals; the default is
	every 5 minutes, but the interval can be adjusted with the rush.conf
	'cpuacct.dbasedir.sweepsecs' command. (Large networks may need a longer
	sweep interval to avoid putting too much load on the file server.)
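	As a rough way to reason about the sweep interval, here's a
	back-of-the-envelope Python sketch. It assumes each node does about one
	append to the server per sweep, with the nodes' sweeps spread evenly
	over time; that's a simplifying assumption, not a measured figure.

```python
def server_appends_per_sec(num_nodes, sweepsecs=300):
    """Average append rate on the file server.

    Simplifying assumption: each node does one append per sweep, and the
    nodes' sweeps are spread evenly over time. 300 s is the 5 minute default.
    """
    return num_nodes / sweepsecs


if __name__ == "__main__":
    for nodes, secs in [(100, 300), (1000, 300), (1000, 900)]:
        rate = server_appends_per_sec(nodes, secs)
        print(f"{nodes:5d} nodes, sweep every {secs:3d}s: ~{rate:.2f} appends/sec")
```

	Under those assumptions, 1000 nodes going from the 300 second default
	to 900 seconds cuts the average rate from about 3.3 to about 1.1
	appends per second.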

	Included with Rush 103 are python modules that let you load this data
	to compute cpu utilization, project utilization, etc. These are in
	rush/examples/python/lib/RushAcct*.py, and include examples and docs
	that show how to use them.
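	The RushAcct*.py modules and their docs are the place to start for real
	reports. Independently of those, since the tree is just plain files, a
	quick sanity check needs only the Python standard library. This
	hypothetical lines_per_host() sketch counts the non-blank lines each
	host wrote on a given day, treating each line as opaque text rather
	than parsing the cpu.acct fields:

```python
import glob
import os

SUFFIX = "-cpu.acct"

def lines_per_host(dbasedir, year, month, day):
    """Hypothetical helper: count non-blank accounting lines per host for one day.

    Lines are treated as opaque text; no cpu.acct fields are parsed here.
    """
    daydir = os.path.join(dbasedir, f"{year:04d}", f"{month:02d}", f"{day:02d}")
    pattern = os.path.join(glob.escape(daydir), "*" + SUFFIX)
    counts = {}
    for path in sorted(glob.glob(pattern)):
        host = os.path.basename(path)[:-len(SUFFIX)]  # strip "-cpu.acct"
        with open(path) as f:
            counts[host] = sum(1 for line in f if line.strip())
    return counts


if __name__ == "__main__":
    print(lines_per_host("//your/server/rush-accounting", 2012, 12, 31))
```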

	There will be some example python web scripts that generate html graphs
	based on this data, with features added in subsequent releases.

	What follows is a comparison of the old rush accounting data management
	technique for 102.xx vs the newer 103.xx technique:


OLD RUSH ACCOUNTING (RUSH 102)
------------------------------
You're probably familiar with the rush/var/cpu.acct files that rush has always
written locally on each render node.

This accounting file's data format is documented here, and hasn't changed much
over the last 13 years:
http://www.seriss.com/rush.103.00/rush/rush-cpu-acct.html

To collect this data, the intended design was for a single machine to run a
crontab that periodically reached out to all the render nodes and collected the
data with something like the following:

	# ROTATE THE ACCOUNTING LOG ON ALL THE NODES
	rush -rotate rush.acct +any

	# COLLECT THE ACCOUNTING DATA FROM ALL THE NODES
	rush -catlog rush.acct +any >> today.log

...and then take the resulting data, and merge it into a database or archive
for later processing.

The reason it wrote to the local disks of each node first, instead of writing
directly to a file server, was to keep the daemons from ever having to touch
a file server during normal operation, since even a short server outage could
hang the daemon during a file access.

At the time Rush was designed, the operating system concept of 'threads'
did not exist. (If it had, the rush daemons could have used a thread to write
the data, so that a hung write wouldn't hang up the daemon's main thread.)

So at the time, it was better to have the daemons write to local drives,
and have a crontab on the server "pull" the data and write it to the archive.
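For illustration only, here's a minimal Python sketch of the thread idea
mentioned above: the main thread hands lines to a queue and returns
immediately, while a worker thread does the actual file writes, so a hung
file server stalls only the worker. BackgroundWriter is a hypothetical name;
this is not rush's daemon code, and it omits error handling (if an open()
fails, the worker thread dies and later lines just pile up in the queue).

```python
import queue
import threading

class BackgroundWriter:
    """Hypothetical example: main thread enqueues, a worker thread does the file I/O."""

    def __init__(self, path):
        self.path = path
        self.q = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def put(self, line):
        self.q.put(line)          # returns immediately; never touches the file system

    def _run(self):
        while True:
            line = self.q.get()
            if line is None:      # sentinel from close(): stop the worker
                break
            # If this open/write blocks on a hung server, only this thread waits
            with open(self.path, "a") as f:
                f.write(line)

    def close(self):
        self.q.put(None)
        self.worker.join()
```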

NEW RUSH ACCOUNTING (RUSH 103)
------------------------------

The old 102 technique is still available and is the default behavior in 103,
but if you set the cpuacct.dbasedir entry in the rush.conf file, you can automate
the centralization of the cpu accounting data, and you can even disable the local
cpu.acct files if you don't want that data to accumulate locally.

The new 103 technique handles caching the data locally, and then periodically
sweeping the data out to the server, creating a date-oriented hierarchy automatically.

The daemons use threads to keep server outages and NFS hangs from causing problems
for the daemon, and to ensure data doesn't get lost during an outage.
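Here's a minimal Python sketch of the "cache locally, sweep to the server" idea,
again not rush's actual implementation. sweep_once() is a hypothetical name, and
it assumes nothing else appends to the cache file while a sweep is in progress
(e.g. the same thread owns both). The default modes mirror the 0755/0644 example
earlier. The key point is ordering: the local cache is cleared only after the
server write succeeds, so an outage leaves the data in place for the next sweep.

```python
import os

def sweep_once(cache_path, server_path, dir_mode=0o755, file_mode=0o644):
    """Hypothetical helper: move cached accounting lines to the server file.

    Returns the number of bytes swept. If anything on the server side fails,
    the OSError propagates and the local cache is left untouched for a retry.
    """
    try:
        with open(cache_path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return 0                      # nothing cached yet
    if not data:
        return 0
    # Create the YYYY/MM/DD directories as needed; the mode applies to the
    # newly created leaf directory and is subject to the umask
    os.makedirs(os.path.dirname(server_path), mode=dir_mode, exist_ok=True)

    def opener(p, flags):
        # Append mode; a newly created file gets file_mode (minus the umask)
        return os.open(p, flags, file_mode)

    with open(server_path, "ab", opener=opener) as out:
        out.write(data)
    # Only reached if the server write succeeded: now clear the local cache
    with open(cache_path, "wb"):
        pass
    return len(data)
```

The trade-off with this ordering is that a crash between the server write and
the cache truncation can append the same lines twice on the next sweep; the
sketch prefers possible duplicates over lost data.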

And now that the centralized data fits a date-oriented directory hierarchy,
rush can provide tools in the form of python modules to access this data
for reporting purposes, and includes web examples that use this data,
which you can either customize or use for reference.

These library modules for loading cpu accounting data are in rush/examples/python/lib/RushAcct*,
and example web script reports will probably be in rush/examples/python/reports.
