It usually means one of these things:
1. This is an SGI, and the kernel's NFS is using the port.
2. Something else is using the port.
3. Two or more rushd daemons are running. (Not likely in 102.31+)
4. You recently stopped and restarted the daemon. The problem goes away by itself. (Not likely in 102.31+)
#1 often occurs if you've just installed rush on an SGI
for the first time, and the machine has been up for a while.
'netstat -an' will show a whole slew of UDP listeners
on ports between 512 and 1024 all in sequence, one of them being port
number 696, the one rush has been assigned by IANA. Some rogue kernel
utility is causing this, probably NFS. Usually fuser(1) shows no process
associated with the rogue UDP listeners because it's a kernel process.
The easiest solution is to simply reboot; when rush starts on boot,
it always secures the port it needs well before the kernel gets a
chance to step on it.
#2 Stop the rush daemon, and use 'netstat -an'
to see if some other program is using rush's port (normally port #696;
see the serverport setting in your rush.conf file, in case
your site uses a different port number). Look for open UDP or TCP
connections on that port, in either the Local or Foreign address.
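As a rough aid for reading the 'netstat -an' output, the check above can be sketched in Python. This is only an illustration, not part of rush: the function names are hypothetical, and it assumes the common column layout (protocol, two queue counters, local address, foreign address) with the port following the last '.' or ':' in each address. The netstat layout differs between operating systems, so adjust the field positions for your platform.

```python
import re


def _port_of(addr):
    """Return the trailing port number of an address like '*.696' or '10.0.0.9:696'."""
    match = re.search(r"[.:](\d+)$", addr)
    return int(match.group(1)) if match else None


def find_port_users(netstat_text, port=696):
    """Return (proto, local, foreign, side) for every line that mentions the port.

    'side' is 'local' or 'foreign', depending on which column holds the port.
    Assumes columns: proto, recv-q, send-q, local address, foreign address.
    """
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP or UDP entry.
        if len(fields) < 5 or not fields[0].lower().startswith(("tcp", "udp")):
            continue
        local, foreign = fields[3], fields[4]
        for side, addr in (("local", local), ("foreign", foreign)):
            if _port_of(addr) == port:
                hits.append((fields[0], local, foreign, side))
    return hits
```

To use it, feed it the saved output of 'netstat -an'; pass a different port if your serverport setting is not 696.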
If you see port #696 in the 'Foreign' address of the local machine,
suspect hung clients on the remotes:
- rsh over to the remote machine (i.e. the 'Foreign' host)
- Kill any 'rush' client processes you see, e.g. 'killall rush'
- Back on the local machine, do a 'netstat -an' to verify
the connections are gone or closing.
- Restart the daemon once all connections on port 696 have closed
If the local TCP or UDP port is in use, suspect some system daemon
or other is using the port when it shouldn't. Use fuser(1) or similar
utility to figure out which process is using the port, or simply reboot.
If fuser(1) shows no process and it's an SGI, then see #1.
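With the rush daemon stopped, another way to confirm that the local port is taken is to try binding it yourself. The sketch below is a hypothetical helper, not a rush tool. It assumes IPv4, and because port 696 is below 1024 it normally has to run as root; a permission failure is reported separately so it is not mistaken for a busy port.

```python
import errno
import socket


def probe_port(port, kind=socket.SOCK_DGRAM, host=""):
    """Try to bind the port and report 'free', 'in use', or 'permission denied'.

    kind is socket.SOCK_DGRAM for UDP or socket.SOCK_STREAM for TCP.
    """
    sock = socket.socket(socket.AF_INET, kind)
    try:
        sock.bind((host, port))
        return "free"
    except OSError as err:
        if err.errno == errno.EADDRINUSE:
            return "in use"
        if err.errno in (errno.EACCES, errno.EPERM):
            return "permission denied"
        raise  # Anything else is unexpected; let the caller see it.
    finally:
        sock.close()


if __name__ == "__main__":
    for label, kind in (("udp", socket.SOCK_DGRAM), ("tcp", socket.SOCK_STREAM)):
        print(label, 696, probe_port(696, kind))
```

An "in use" result only says that something holds the port; it does not say what. Identifying the owner is still a job for fuser(1) or a similar utility.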
#3 only happens in older versions of rush (pre-102.31)
where more than one rush might be running. Newer versions of rush
use a lock file that prevents this.
Only one daemon should have a PPID (Parent Process ID) of 1.
If there's more than one with a PPID of 1, kill the one(s) with the higher PID.
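The PPID check can be sketched as a small parser over process-listing output. This is an illustration only: it assumes the text comes from a command such as 'ps -eo pid,ppid,comm' (three columns: PID, PPID, command name), which may not be available in that form on every platform, and the function name is hypothetical.

```python
def surplus_daemons(ps_text, name="rushd"):
    """Return PIDs of daemons with PPID 1 beyond the first (lowest) one.

    Expects lines of the form 'PID PPID COMMAND'; anything else is ignored.
    """
    pids = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, ppid, command = parts
        # The header line fails this numeric check and is skipped.
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if ppid == "1" and command.strip() == name:
            pids.append(int(pid))
    pids.sort()
    # Keep the daemon with the lowest PID; the rest are kill candidates.
    return pids[1:]
```

An empty list means at most one daemon has a PPID of 1. Child processes of the daemon (PPID other than 1) are deliberately left out.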
#4 is common only in the older versions of rush (pre-102.31),
and occurs when you stop and restart the rushd daemon. The OS often keeps
recently closed TCP listeners unavailable to other processes for a
90-second period. Rush keeps retrying to bind to the port, and eventually
succeeds, so the problem fixes itself automatically within 2 minutes.
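The retry behavior described above can be illustrated with a generic bind-and-retry loop. This is not rush's actual code, only a minimal sketch of the technique; the function name, the 120-second timeout, and the 5-second interval are assumptions chosen to match the "within 2 minutes" figure.

```python
import socket
import time


def bind_with_retry(port, host="", timeout=120.0, interval=5.0):
    """Keep trying to bind a TCP socket until it succeeds or the timeout passes.

    Returns the bound socket. Re-raises the last OSError once the deadline
    is reached without a successful bind.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            sock.close()
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)  # Wait for the OS to release the port.
```

A timeout slightly longer than the OS hold period is enough; retrying more often than every few seconds gains little.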