From: Greg Ercolano <erco@(email suppressed)>
Subject: OSX 10.3.9 + Rush 102.42a -- problems with rush taking up to 15 mins
   Date: Sat, 19 Nov 2005 10:19:50 -0800
Msg# 1116
Problem Description
-------------------
Specific to Rush on OSX machines (seen on OSX 10.3.9 with Rush 102.42a).

A customer reported that after a reboot, Rush would be stuck for about 15 minutes
trying to reach the license server, repeating these messages in rushd.log:

  11/18,15:41:16 LICENSE    select() on connect(): Connection refused
  11/18,15:41:16 LICENSE    no servers could validate license (30 sec retries)

...then, after exactly 15 minutes, it would suddenly kick in by itself:

  11/18,15:56:16 LICENSE    validated with server CGISVR1  <--
  11/18,15:56:16 LICENSE    expires 08/04/2032
  11/18,15:56:16 START      r120 RUSHD 102.42 PID=377     Boot=11/18/05,15:41:16  Online
  11/18,15:56:16 INFO       TCP listening on port 696, service 'rushd', sockfd=5
  11/18,15:56:16 INFO       UDP listening on port 696, service 'rushd', sockfd=6

This 15 minute delay prevents users from submitting jobs from that machine
until the daemon finally kicks in.

Cause
-----
We determined that the problem is in the OS. At boot time, the rushd boot
script declares a dependency on the "Resolver" service, so that rushd does
not start before name lookups are working.

What was happening is that OSX would start "lookupd", then tell rush to start
before lookupd was fully working.

Debugging
---------
We verified that name lookups were not yet working when the rush boot script
ran by adding 'ping -c 1 <local_hostname>' commands to that script. ping
reported 'unknown host', making it clear that OSX was invoking the rush boot
script prematurely, despite the dependency.
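The debugging check described above might look roughly like this. It is a minimal sketch, not the customer's actual edit: the log path DEBUG_LOG is a hypothetical choice, and `uname -n` is used to get the local hostname. The idea is to paste it near the top of the boot script, reboot, then read the log; an 'unknown host' error from ping there means name lookups were not ready when the script ran.

```shell
#!/bin/sh
# Debugging sketch (hypothetical, not part of the stock S99rush script):
# record whether this machine's own hostname resolves at the moment the
# boot script runs. DEBUG_LOG is an assumed path; adjust as needed.
DEBUG_LOG="${DEBUG_LOG:-/tmp/rush_boot_debug.log}"
LOCAL_HOST="$(uname -n)"

echo "=== $(date): rush boot script running, pinging $LOCAL_HOST" >> "$DEBUG_LOG"

# -c 1 sends a single packet. If the resolver isn't ready yet, ping fails
# with an 'unknown host' style error, which is captured in the log.
if ping -c 1 "$LOCAL_HOST" >> "$DEBUG_LOG" 2>&1; then
    echo "RESULT: name lookup OK" >> "$DEBUG_LOG"
else
    echo "RESULT: ping failed (check log above for 'unknown host')" >> "$DEBUG_LOG"
fi
```

Appending (>>) rather than overwriting keeps the results of several reboots in one file for comparison.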

Solution
--------
The customer modified the /usr/local/rush/etc/S99rush script to run an
'ipconfig waitall' command before starting the daemon, i.e.:

BEFORE:
        # Start in background, in case name lookups are slow
        ( cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &

AFTER:
        # Start in background, in case name lookups are slow
        ( /usr/sbin/ipconfig waitall; cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^

This makes the boot script hold off on starting rushd until ipconfig reports
that all network interfaces have finished being configured.
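For reference, here is a sketch of how the start logic might look with the fix folded into a shell function. It assumes RUSH_DIR defaults to /usr/local/rush (the real S99rush sets this itself); the start_rushd function name, the "start" argument handling, and the -x guard around ipconfig are illustrative additions, not part of the customer's change.

```shell
#!/bin/sh
# Sketch of the S99rush start logic with the 'ipconfig waitall' fix.
# ASSUMPTIONS: the RUSH_DIR default, the start_rushd() name, and the
# ipconfig guard are illustrative; the stock script may differ.
RUSH_DIR="${RUSH_DIR:-/usr/local/rush}"

start_rushd() {
    # Start in background, in case name lookups are slow.
    (
        # On OSX, wait until ipconfig reports the network interfaces are
        # configured. Skipped if ipconfig is not installed.
        if [ -x /usr/sbin/ipconfig ]; then
            /usr/sbin/ipconfig waitall
        fi
        cd "$RUSH_DIR/var" && "$RUSH_DIR/bin/rushd"
    ) &
}

case "$1" in
    start) start_rushd ;;
esac
```

The guard simply skips the wait on systems without /usr/sbin/ipconfig, so the same script still starts rushd there; on OSX it behaves like the one-line AFTER version above. Because the wait happens inside the backgrounded subshell, the rest of the boot sequence is not held up while rushd waits for the network.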

Caveats
-------
10.3.9 machines have the ipconfig command, but no man page for it.
In 10.4.x, Apple included a man page that clearly documents the 'waitall' option.