Java: it’s your fault! Connections on AIX


Just a quick one during my lunch hour. I ran into an issue yesterday at my current client that shows once more that when you do not work with a specific OS for a while, you really lose your touch for the small details.

The Saga of WAS, AIX and the damn Java Cache

We installed iFixes yesterday and that all went well. However, syncing the nodes (kicked off from the Dmgr console) took forever, and then one of the app clusters on one of the nodes would not restart (it eventually did, after 4 hours).

To clean the system and get rid of any old temp files we:

  • Stopped all WebSphere servers:  /WAS_Profile/bin/stopServer.sh xxx -user xxx -password ****
  • Stopped the NodeAgent:  /WAS_Profile/bin/stopNode.sh -username xxxx -password ****
  • Cleaned all temp files in /WAS_Profile/temp and /wstemp (everything inside both folders)
  • Ran  /WAS_Profile/bin/osgiCfgInit.sh
  • Ran  /WAS_Profile/bin/clearClassCache.sh

Note: you can also use the command “./stopNode.sh -stopservers -username xxxx -password ****” to shut down the node agent AND the servers at the same time. We wanted to see the individual servers come down, as we had issues with one of them.
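The steps above could be wrapped in a small script along these lines. This is only a sketch, not the script we actually ran: the names WAS_PROFILE, DRY_RUN, run and was_cleanup are made up for illustration, /WAS_Profile is the same placeholder path used above, and I am assuming wstemp lives under the profile directory. It defaults to a dry run that only prints the commands (with the password masked).

```shell
#!/bin/sh
# Sketch of the clean-up sequence. All names here are hypothetical.
WAS_PROFILE="${WAS_PROFILE:-/WAS_Profile}"   # placeholder path, adjust to your profile
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands

# Run a command, or in dry-run mode print it with the password masked.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        line="DRY-RUN:"; mask=0
        for arg in "$@"; do
            if [ "$mask" = "1" ]; then arg="****"; mask=0
            elif [ "$arg" = "-password" ]; then mask=1; fi
            line="$line $arg"
        done
        echo "$line"
    else
        "$@"
    fi
}

# $1 = admin user, $2 = password, remaining arguments = server names
was_cleanup() {
    user="$1"; pass="$2"; shift 2
    for server in "$@"; do
        run "$WAS_PROFILE/bin/stopServer.sh" "$server" -username "$user" -password "$pass" || return 1
    done
    run "$WAS_PROFILE/bin/stopNode.sh" -username "$user" -password "$pass" || return 1
    # Empty the temp folders but keep the folders themselves.
    # Assumption: wstemp sits under the profile directory.
    for dir in "$WAS_PROFILE/temp" "$WAS_PROFILE/wstemp"; do
        run find "$dir/." ! -name . -prune -exec rm -rf {} \;
    done
    run "$WAS_PROFILE/bin/osgiCfgInit.sh"
    run "$WAS_PROFILE/bin/clearClassCache.sh"
}

# Example (dry run): was_cleanup wasadmin 'secret' server1 server2
```

Set DRY_RUN=0 only after checking that the printed commands match your environment.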

We then tried to restart the node agent … and it failed. We found this in the startServer.log for the node agent:

ADMU3011E: Server launched but failed initialization
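If you want a script to catch this message instead of eyeballing the log, a check like the following would do. It is a minimal sketch: the function name nodeagent_init_failed is made up, and the log path in the usage comment is where the node agent log typically lives on our systems, so treat it as an assumption.

```shell
#!/bin/sh
# Return 0 if the given startServer.log contains the ADMU3011E message,
# 1 if it does not, and 2 if the log cannot be read.
nodeagent_init_failed() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read log: $log" >&2; return 2; }
    grep -q "ADMU3011E" "$log"
}

# Example (path is an assumption):
# nodeagent_init_failed /WAS_Profile/logs/nodeagent/startServer.log && echo "node agent failed initialization"
```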

Damn, nothing worked … re-cleaned, checked, cursed, cried ….. and then opened a Sev 1 ticket with IBM support online. (had a REALLY fast response – thanks guys!)

The Cavalry to the Rescue …

The Connections support guy had a look at the logs and brought in a WAS support specialist, who had me repeat the clean-up steps above AND clean this location as well (everything in this folder, but do not delete the folder itself):

/tmp/javasharedresources

The IBM tech thinks we had a corrupted system-level Java shared class cache that was causing the issue. After that, ./startNode.sh worked like a charm and the servers started fine as well.
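Emptying a folder without deleting the folder itself is easy to get wrong with a hasty rm, so here is one way to do it. This is a sketch, not what IBM support gave us: the function name clear_dir_contents is made up, and it assumes every JVM that uses the cache (servers and node agent) has already been stopped.

```shell
#!/bin/sh
# Remove everything inside a directory (including hidden files)
# but keep the directory itself.
clear_dir_contents() {
    dir="$1"
    # Refuse an empty argument, "/" and anything that is not a directory.
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "clear_dir_contents: not a usable directory: '$dir'" >&2
        return 1
    fi
    # "$dir/." is the start point, so "! -name ." skips the folder itself;
    # -prune stops find from descending into entries it is about to delete.
    find "$dir/." ! -name . -prune -exec rm -rf {} \;
}

# Example: clear_dir_contents /tmp/javasharedresources
```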

Total Clean-up

Incidentally, we ended up shutting down each AIX WAS server (including the Dmgr) one by one, so that we would not have a service outage, and ran the above maintenance once more. On the nodes we also ran “./syncNode.sh” with the node agent stopped – just to eliminate any possibility of the nodes being out of sync (thanks for the idea, Stuart).

We will also be going through our automated scripts to test adding some more items to them (email notifications when individual steps are done, adding “/tmp/javasharedresources” to the list of folders to be cleaned, etc.).

Lessons to be learned:

  • When you don’t work with an OS for a while you forget the important SMALL stuff (/tmp/javasharedresources) – I had run into this very issue a few years ago and totally forgot about it. I actually did not remember it until this morning, the day after.
  • When in doubt – call support RIGHT AWAY, if for no other reason than to validate your thought process is correct and you are not barking up the wrong tree. We did not wait very long to call, but sometimes even 5 minutes can mean the difference between failure and success.

 

 
