From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: redirects and semicolons in jobdumpcommand
   Date: Sat, 01 Dec 2007 14:31:13 -0500

Msg# 1657

Antoine Durr wrote:
>> 	The command is invoked by exec(), so no shell is involved.
>> 	(';' and '>' being shell characters)
>>
>> 	To use shell specific syntax, it's best if you put your commands
>> 	into a shell script and invoke that.
> 
> Ahh, ok.  Out of curiosity, why not just do a system() call, with shell 
> and all?  From the user's standpoint, it's not obvious that we cannot 
> put arbitrary unix commands into those fields.

    Mmm, gotta know, do ya? ;)

    I actually used to use system(3), but for many mundane reasons
    it caused trouble in odd cases, and I had to switch to exec(2).

    system(3) is actually a complex call that does a LOT of weird stuff,
    depending on the platform. Several times it caused critical problems
    in high load environments.

    For one, system(3) uses fork(), which causes trouble when the daemon
    has a large memory footprint from managing large numbers of large jobs.
    Two customers had unix job servers swapping to near death because
    a user dumped a large number of jobs all at once. That caused the
    jobdonecommand to fork the daemon numerous times within a short period,
    paging their servers for long periods and making them unresponsive.
    I had thought copy-on-write would prevent that from being a problem,
    but unfortunately when destructors are called, silly things like
    ptr = NULL; can trigger copy-on-write for an entire page, and enough
    of these page copies pretty much defeats copy-on-write.
    Rush now uses vfork(2) to prevent this overhead, but the vfork docs
    are explicit about calling exec(2) immediately after vfork, since the
    vfork child typically runs in the parent's own memory until it execs,
    where code like system(3) creates havoc.
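
    A minimal sketch of the vfork-then-exec pattern described above.
    This is not rush's actual code; the helper name spawn_with_vfork()
    is made up for illustration, and the caller is assumed to do its
    own waitpid() on the returned pid.

```c
#define _DEFAULT_SOURCE   /* exposes vfork() in glibc's <unistd.h> */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>     /* for the caller's waitpid() */
#include <unistd.h>

/* Hypothetical helper: start argv[0] as a child without copying the
 * parent's address space. Returns the child's pid, or -1 on failure. */
pid_t spawn_with_vfork(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: it borrows the parent's memory, so do nothing here
         * except exec, or _exit() if the exec fails. */
        execvp(argv[0], argv);
        _exit(127);
    }
    return pid;   /* parent: child's pid, or -1 if vfork() failed */
}
```

    The child branch contains only execvp() and _exit(); anything more
    elaborate (such as what system(3) does internally) would run inside
    the parent's memory.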

    Also, system(3) does its own wait() internally, which for many reasons
    can be undesirable for rush's own job control, where rushd needs to do
    the actual wait()ing on the child process. Shells also like to play
    around with signal handling and exit codes, and get in between rush and
    the user's process when it comes to signal delivery.
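
    To illustrate what "the daemon does the wait()ing itself" can look
    like, here is a small sketch using waitpid() with WNOHANG. The names
    start_child() and poll_child() are hypothetical, plain fork() is used
    for brevity, and reporting a signal death as 128 + signal number is
    an assumption borrowed from shell convention, not rushd's behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] and return at once with its pid
 * (or -1 if fork() failed). Nobody waits on the child here. */
pid_t start_child(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* exec failed */
    }
    return pid;
}

/* Non-blocking check on a child. Returns 1 and fills *exit_code once
 * the child has ended, 0 while it is still running, -1 on error. */
int poll_child(pid_t pid, int *exit_code)
{
    assert(exit_code != NULL);

    int status = 0;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0) return 0;         /* still running */
    if (r < 0)  return -1;        /* no such child, or other error */

    if (WIFEXITED(status))
        *exit_code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        *exit_code = 128 + WTERMSIG(status);   /* assumed convention */
    else
        *exit_code = -1;
    return 1;
}
```

    A daemon's main loop could call poll_child() between other work, so
    the parent sees the child's real exit status first hand instead of
    whatever an intermediate shell or system(3) chooses to report.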

    The code in system(3) is non-trivial, and affects many weird things
    like signals, which caused trouble with rushd's careful management
    of things like SIGPIPE. Within the system() call, there was a window
    of opportunity for the daemon to be susceptible to SIGPIPE, which
    could be triggered by high network traffic happening at the same
    time the system() call was running.
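
    For context, one common way a network daemon protects itself from
    SIGPIPE is to ignore the signal, so a write to a closed connection
    fails with EPIPE instead of killing the process. Whether rushd does
    it this way is an assumption; ignore_sigpipe() and write_to_peer()
    are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE process-wide.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int ignore_sigpipe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: returns the number of bytes written, or -1
 * with errno == EPIPE if the reader has gone away. */
ssize_t write_to_peer(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return write(fd, buf, len);
}
```

    Any code that temporarily changes or restores signal dispositions
    behind the daemon's back can open a window where this protection is
    not in effect, which is the kind of problem described above.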

    There were a few other reasons, but memory isn't serving me.
    Suffice it to say there are 'really good reasons'.

    You might ask why don't I stick with exec, but use 'sh -c' to ensure
    the commands run in a shell. Again, there are technical reasons
    that range from cross platform space/quote handling to resource
    management calls and issues where the intermediate shell can cause
    trouble with certain customer configs and apps.

    Surprisingly, I ran into similar issues on Windows with CreateProcess(),
    which is an even longer story than the above.

    Gotta love systems programming.

    I'll see if I can put a small note in the docs that commands are
    run with exec(), and anything more complex than running a single
    command is best handled by creating a script.
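
    A small sketch of why shell characters do nothing when a command is
    run with exec(): each word is handed to the program as a plain
    argument. The helper run_without_shell() is hypothetical, and it
    waits for the child itself only to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] directly with exec (no shell) and
 * return its exit code. Returns 127 if the exec failed, and -1 if the
 * child could not be started or was killed by a signal. */
int run_without_shell(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);    /* argv is passed through untouched */
        _exit(127);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Called with { "true", ";", "false", NULL }, the ';' and 'false' are
    just two extra arguments to 'true', so the result is 0. Called with
    { "sh", "-c", "true ; false", NULL }, a shell interprets the ';' and
    the result is 1, the exit code of 'false'.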

-- 
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
