Problem Description
-------------------
Specific to Rush on OSX machines (seen on 10.3.9 with 102.42a).
The customer reports that after a reboot, Rush would be stuck for about
15 minutes trying to reach the license server, repeating in the rushd.log:
11/18,15:41:16 LICENSE select() on connect(): Connection refused
11/18,15:41:16 LICENSE no servers could validate license (30 sec retries)
...then finally, after exactly 15 minutes it would suddenly kick in by itself:
11/18,15:56:16 LICENSE validated with server CGISVR1 <--
11/18,15:56:16 LICENSE expires 08/04/2032
11/18,15:56:16 START r120 RUSHD 102.42 PID=377 Boot=11/18/05,15:41:16 Online
11/18,15:56:16 INFO TCP listening on port 696, service 'rushd', sockfd=5
11/18,15:56:16 INFO UDP listening on port 696, service 'rushd', sockfd=6
This 15 minute delay prevents users from being able to submit jobs from that
machine until the daemon kicks back in.
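To check whether a machine is hitting this, the retry messages can be counted
in the log. The following is a minimal sketch, assuming a POSIX shell. The
function name count_license_retries and the log path in the usage comment are
illustrative assumptions, not part of Rush:

```shell
#!/bin/sh
# Hypothetical helper (not part of Rush): count how many times rushd
# logged a failed license validation in the given log file.
count_license_retries() {
    # $1 = path to a rushd.log (pass it explicitly; location may vary)
    grep -c 'LICENSE no servers could validate license' "$1"
}

# Example (the log path is an assumption):
#   count_license_retries /usr/local/rush/var/rushd.log
```

With the 30 second retry interval shown in the log, a 15 minute stall would
show up as roughly 30 such lines after a reboot.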
Cause
-----
It was determined the problem is in the OS. The rushd boot script declares
a dependency on the "Resolver" service, so that rushd does not start
before name lookups are working properly.
What was happening is that OSX would start "lookupd", then tell rush to
start before lookupd was fully working.
Debugging
---------
We were able to verify name lookups were not working yet when the rush
boot script ran, by adding 'ping -c 1 <local_hostname>' commands to the
rush boot script. ping reported 'unknown host', making it clear OSX was
prematurely invoking the rush boot script, regardless of the dependency.
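The check described above can be sketched roughly as follows. This is not the
exact code that was added to the boot script; check_lookup, the PING_CMD
override, and the debug log path in the usage comment are hypothetical names
used for illustration:

```shell
#!/bin/sh
# Hypothetical debugging helper for a boot script: ping the given host
# once and report the result, including ping's own error text
# (e.g. an 'unknown host' message when name lookups are not working).
# PING_CMD is an illustrative override; it defaults to the real ping.
check_lookup() {
    host="$1"
    if out=`${PING_CMD:-ping} -c 1 "$host" 2>&1`; then
        echo "ping OK: $host"
    else
        echo "ping FAILED: $host: $out"
    fi
}

# In the boot script, just before rushd is started (path is an assumption):
#   check_lookup "`hostname`" >> /tmp/rush-boot-debug.log
```

Keeping ping's error text in the message matters, since a failed ping alone
does not distinguish a failed name lookup from other network errors.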
Solution
--------
The customer modified the /usr/local/rush/etc/S99rush script to precede
the starting of the daemon with an 'ipconfig waitall' command, i.e.:
BEFORE:
# Start in background, incase name lookups are slow
( cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &
AFTER:
# Start in background, incase name lookups are slow
( /usr/sbin/ipconfig waitall; cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This causes the boot script to delay starting rush until all network
services have confirmed starting.
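As an additional safeguard, separate from the customer's fix, a boot script
could also poll until the local hostname resolves, with an upper bound so the
boot is never blocked indefinitely. This is only a sketch under assumptions:
wait_for_lookup, the LOOKUP_CMD override, and the default retry count and
delay are hypothetical and not part of Rush or OSX:

```shell
#!/bin/sh
# Hypothetical safeguard (not the customer's fix): retry a lookup check
# until it succeeds or the attempt limit is reached.
# LOOKUP_CMD is an illustrative override; the default is a single ping.
wait_for_lookup() {
    host="$1"
    max_tries="${2:-30}"    # assumed default: 30 attempts
    delay="${3:-2}"         # assumed default: 2 seconds between attempts
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if ${LOOKUP_CMD:-ping -c 1} "$host" >/dev/null 2>&1; then
            return 0        # the check succeeded
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1                # gave up; the caller decides what to do
}

# Possible use in the boot script, after 'ipconfig waitall':
#   ( /usr/sbin/ipconfig waitall; wait_for_lookup "`hostname`"
#     cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &
```

In this usage the return value is ignored, so rushd still starts if the limit
is reached, which is no worse than the original behavior.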
Caveats
-------
10.3.9 machines have the ipconfig command, but no man page for it.
In 10.4.x, a man page is included that clearly documents the 'waitall' option.