Saturday, December 3, 2016

Thread Dumps in WebLogic12c Server

Here I describe the different ways of taking Java thread dumps in a WebLogic Server environment.

Thread dumps are very useful for analyzing and troubleshooting performance-related issues such as server hangs, deadlocks, and slow-running, idle or stuck applications.

Different ways to take thread dumps in WebLogic Server 

Always prefer operating system (OS) commands over the Admin Console or Java classes: if the server is hanging, the console may not respond, and you won't be able to connect to it to take thread dumps.

1. OS Commands for Thread Dumps

i) On Windows

Press <Ctrl>+<Break> in the command window where the server is running; the thread dump is written to the server's stdout.

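ii) On Solaris / Linux

First identify the process ID (pid) using   ps -ef | grep java, then run   kill -3 <pid>
kill -3 sends the SIGQUIT signal to the JVM, which writes the thread dump to the server's stdout (wherever that is redirected, for example the <server_name>.out file when the server was started by Node Manager), not to your terminal. The server keeps running.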
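If you take dumps this way often, the PID lookup and the signal can be scripted. Below is a minimal sketch in Python 3 (run it with a normal Python interpreter, not inside WLST). It assumes each server JVM was started with -Dweblogic.Name=<server_name>, as the standard start scripts do, and it sends the same SIGQUIT signal as kill -3, so the dumps still land in each server's stdout. The function names are my own, not part of any WebLogic tool.

```python
import os
import re
import signal
import subprocess

# Assumption: the start scripts pass -Dweblogic.Name=<server_name> to each server JVM.
NAME_RE = re.compile(r"-Dweblogic\.Name=(\S+)")


def find_weblogic_pids(ps_output):
    """Map server name -> PID using `ps -ef` output (PID is the second column)."""
    servers = {}
    for line in ps_output.splitlines():
        match = NAME_RE.search(line)
        fields = line.split()
        if match and len(fields) > 1 and fields[1].isdigit():
            servers[match.group(1)] = int(fields[1])
    return servers


def request_thread_dump(pid):
    """Same as `kill -3 <pid>`: the JVM writes the dump to its own stdout."""
    os.kill(pid, signal.SIGQUIT)


if __name__ == "__main__":
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    for name, pid in find_weblogic_pids(ps.stdout).items():
        print(f"Requesting thread dump from {name} (pid {pid})")
        request_thread_dump(pid)
```

Run it as the same OS user that owns the server processes; otherwise os.kill fails with a permission error.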

2. Using WLST (works only from WLS 9.x onwards)


First set the CLASSPATH by running setDomainEnv.cmd or setDomainEnv.sh (alternatively, use wlst.sh / wlst.cmd from C:\Oracle\Middleware\wlserver_12.1\common\bin, which set up the environment themselves). Then run the following command:

java weblogic.WLST ThreadDumps.py

Save the following code in the ThreadDumps.py file:
connect("<username>","<password>","t3://<url>:<port>")
cd('Servers')
cd('AdminServer')
threadDump()
disconnect()
exit()

The thread dump is written to a file (by default Thread_Dump_<server_name>.txt) in the directory from which you run the script.

3. From the WebLogic Administration Console

Navigate to Environment -> Servers -> <server_name> -> Monitoring -> Threads and click Dump Thread Stacks.

4. From the JRockit Command line

jrcmd <pid> print_threads

On a HotSpot JVM, the equivalent commands are jstack <pid> or jcmd <pid> Thread.print.
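Once you have a dump, a quick first pass is to count thread states and pick out the threads WebLogic has marked as [STUCK]. The Python 3 sketch below assumes the HotSpot dump format (a quoted thread-name line followed by a java.lang.Thread.State: line) and the [STUCK] prefix WebLogic adds to execute thread names; JRockit print_threads output is laid out differently and is not handled. It is a hypothetical triage helper, not a replacement for reading the stacks.

```python
from collections import Counter


def summarize_thread_dump(text):
    """Count thread states and collect [STUCK] threads in a HotSpot-format dump.

    Returns (state_counts, stuck_thread_names, deadlock_reported).
    Threads without a java.lang.Thread.State line (e.g. "VM Thread") are skipped.
    """
    states = Counter()
    stuck = []
    current = None  # name of the thread whose header line we just read
    for raw in text.splitlines():
        if raw.startswith('"'):
            # Header line: the thread name sits between the first two double quotes.
            end = raw.find('"', 1)
            current = raw[1:end] if end > 0 else raw[1:]
            if current.startswith("[STUCK]"):
                stuck.append(current)
            continue
        line = raw.strip()
        if current is not None and line.startswith("java.lang.Thread.State:"):
            # "java.lang.Thread.State: BLOCKED (on object monitor)" -> "BLOCKED"
            parts = line.split(":", 1)[1].split()
            if parts:
                states[parts[0]] += 1
            current = None
    deadlock = "Found one Java-level deadlock" in text
    return states, stuck, deadlock


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, stuck_threads, deadlock = summarize_thread_dump(f.read())
    for state, n in counts.most_common():
        print(f"{state:15} {n}")
    for name in stuck_threads:
        print("STUCK:", name)
    if deadlock:
        print("The JVM reported a Java-level deadlock")
```

Take a few dumps some seconds apart and compare them: a thread that shows the same stack in every dump is a better lead than one that appears stuck only once.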


NodeManager not Reachable: java.io.IOException: Invalid State File Format

Error :
NodeManager not Reachable: java.io.IOException: Invalid State File Format
When this error occurs, Managed Servers cannot be started from the Admin Console, because Node Manager is not reachable from the Admin Server.
Stack trace from the nodemanager.log file:
   
java.io.IOException: Invalid state file format. State file contents: 
at weblogic.nodemanager.common.StateInfo.load(StateInfo.java:135) 
at weblogic.nodemanager.server.ServerMonitor.loadStateInfo(ServerMonitor.java:475) 
at weblogic.nodemanager.server.ServerMonitor.isCleanupAfterCrashNeeded(ServerMonitor.java:139) 
at weblogic.nodemanager.server.ServerManager.recoverServer(ServerManager.java:255) 
at weblogic.nodemanager.server.DomainManager.initialize(DomainManager.java:103) 
at weblogic.nodemanager.server.DomainManager.<init>(DomainManager.java:55) 
at weblogic.nodemanager.server.NMServer.getDomainManager(NMServer.java:257) 
at weblogic.nodemanager.server.Handler.handleDomain(Handler.java:224) 
at weblogic.nodemanager.server.Handler.handleCommand(Handler.java:108) 
at weblogic.nodemanager.server.Handler.run(Handler.java:70) 
at java.lang.Thread.run(Thread.java:619)
Reason :
The managed server's state file is empty or corrupt (note the empty "State file contents:" in the trace), or the server is in the FAILED_NOT_RESTARTABLE state.

Log in to the server, go to Domain_Home, then into the servers directory.
Under each managed server directory there is a data/nodemanager directory containing a state file named <managed_server_name>.state. If this file is empty or corrupt, Node Manager reports the error above.
For example, 

Domain_Home/servers/Managed-server1/data/nodemanager/Managed-server1.state
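To check all managed servers at once, here is a small Python 3 sketch that walks Domain_Home/servers/*/data/nodemanager/ and lists the state files that are empty, which matches the empty "State file contents:" in the trace. It only detects empty (or whitespace-only) files; a file can be corrupt in other ways that this check does not catch. The function name is hypothetical.

```python
import sys
from pathlib import Path


def find_empty_state_files(domain_home):
    """Return <server>.state files under servers/*/data/nodemanager/ that are empty.

    Whitespace-only files are treated as empty too. Other kinds of
    corruption are not detected by this check.
    """
    pattern = "servers/*/data/nodemanager/*.state"
    empty = []
    for state_file in sorted(Path(domain_home).glob(pattern)):
        if not state_file.read_text(errors="replace").strip():
            empty.append(state_file)
    return empty


if __name__ == "__main__":
    # Usage: python3 find_empty_state.py <Domain_Home>
    for path in find_empty_state_files(sys.argv[1]):
        print("Empty state file:", path)
```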

There are a couple of common reasons for the Managed-server1.state file becoming corrupt:
     1) An unexpected physical server reboot
     2) Killing the running Node Manager process

To resolve the issue, first stop any target managed server that is still running, using the command line:

Domain_Home/bin/stopManagedWebLogic.sh <Managed server Name>

Then stop the Admin Server:

Domain_Home/bin/stopWebLogic.sh 

Stop the Node Manager process if it is running.

Finally, verify that no processes related to the domain are still running, and delete the following three files:

Domain_Home/servers/Managed-server1/data/nodemanager/Managed-server1.state
Domain_Home/servers/Managed-server1/data/nodemanager/Managed-server1.lck
Domain_Home/servers/Managed-server1/data/nodemanager/Managed-server1.pid
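If you have to do this for several servers, a script can guard against deleting the files while a server is still up. This is a minimal Python 3 sketch for Unix-like systems; the function names and the dry-run default are my own choices. It reads <server>.pid and refuses to continue if a process with that PID exists. After a reboot the PID may have been reused by an unrelated process, in which case the check refuses conservatively and you should verify by hand.

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Unix check: signal 0 tests whether the PID exists without sending anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clean_nodemanager_files(domain_home, server_name, dry_run=True):
    """Delete <server>.state, .lck and .pid from the server's data/nodemanager dir.

    With dry_run=True (the default) nothing is deleted; the function only
    reports which files it would remove. Returns the affected file names.
    """
    nm_dir = Path(domain_home) / "servers" / server_name / "data" / "nodemanager"
    pid_file = nm_dir / f"{server_name}.pid"
    if pid_file.exists():
        text = pid_file.read_text().strip()
        if text.isdigit() and int(text) > 0 and pid_is_running(int(text)):
            # Can also trigger if the PID was reused after a reboot; check by hand then.
            raise RuntimeError(f"{server_name} may still be running (pid {text}); stop it first")
    affected = []
    for suffix in (".state", ".lck", ".pid"):
        target = nm_dir / f"{server_name}{suffix}"
        if target.exists():
            if not dry_run:
                target.unlink()
            affected.append(target.name)
    return affected


if __name__ == "__main__":
    import sys

    # Usage: python3 clean_nm_files.py <Domain_Home> <server_name> [--delete]
    delete = "--delete" in sys.argv[3:]
    for name in clean_nodemanager_files(sys.argv[1], sys.argv[2], dry_run=not delete):
        print(("Deleted " if delete else "Would delete ") + name)
```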
1.     Start the Admin Server.
2.     Start Node Manager.
3.     Verify the Node Manager status in the WebLogic Admin Console (it should be Reachable).
4.     Start the managed server using the Admin Console.

weblogic.nodemanager.server.NMServer main SEVERE: Fatal error in node manager server java.net.BindException: Address already in use

Error :
Feb 5, 2014 2:45:02 AM weblogic.nodemanager.server.NMServer main SEVERE: Fatal error in node manager server java.net.BindException: Address already in use
Reason :
The message java.net.BindException: Address already in use
shows that the port Node Manager is trying to listen on is already being used by some other process.
Solution :
1 : First, check whether Node Manager is already running on the machine:
For Unix:
ps -ef | grep -i nodemanager
For Windows:
netstat -ao | findstr <Node_manager port>
tasklist | findstr nodemanager
2 : If you don't find a running Node Manager process, check which process is using the port assigned to Node Manager, with lsof or a similar tool.
By default, WebLogic Node Manager listens on port 5556; if yours uses a different port, check the ListenPort entry in nodemanager.properties.
To check which process is using this port, use the commands below.
For Unix:
netstat -an | grep 5556
lsof -i :5556
For Windows:
netstat -ao | findstr 5556
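Step 2 can also be checked from a script. The Python 3 sketch below (hypothetical helper names) reads ListenPort from nodemanager.properties, falling back to the default 5556, and then tries to bind that port the way Node Manager would; if the bind fails, something else already holds the port. It only checks IPv4, and a bind can also fail for other reasons such as permissions, so treat a positive result as a hint to run netstat or lsof.

```python
import re
import socket
import sys
from pathlib import Path

DEFAULT_NM_PORT = 5556
# Matches "ListenPort=5556" or "ListenPort = 5556"; commented-out lines do not match.
LISTEN_PORT_RE = re.compile(r"^\s*ListenPort\s*[=:]\s*(\d+)\s*$")


def read_listen_port(properties_text, default=DEFAULT_NM_PORT):
    """Return the ListenPort value from nodemanager.properties text, or the default."""
    for line in properties_text.splitlines():
        match = LISTEN_PORT_RE.match(line)
        if match:
            return int(match.group(1))
    return default


def port_in_use(port, host=""):
    """Try to bind the port as a server would; a failed bind means it is unavailable.

    host="" binds all IPv4 interfaces. A failure can also come from other
    causes (e.g. missing permissions), so confirm with netstat or lsof.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    # Usage: python3 check_nm_port.py <path/to/nodemanager.properties>
    port = read_listen_port(Path(sys.argv[1]).read_text())
    status = "already in use" if port_in_use(port) else "free"
    print(f"Node Manager ListenPort {port} is {status}")
```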
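3 : If the port is used by some other process, change the Node Manager port number in the file
Middleware_Home/wlserver_10.3/common/nodemanager/nodemanager.properties (in 12c with a per-domain Node Manager, Domain_Home/nodemanager/nodemanager.properties):
ListenPort = XXXX
and update the Node Manager Listen Port of the corresponding Machine in the Admin Console to match.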

Before changing the port, check the process that holds it and look at its WebLogic Server home. If it is a Node Manager from the same home you are trying to start, kill that process and start Node Manager again with the commands below.
4 : Now start Node Manager with:
Middleware_Home/wlserver_10.3/server/bin/startNodeManager.sh or

Middleware_Home/wlserver_10.3/server/bin/startNodeManager.cmd

Error NodeManager BEA-300033 Could not execute command “getVersion” on the node manager. Reason: “Access to domain ‘Base_Domain’ for user ‘DTXezFTI’ denied”.


<Error> <NodeManager> <BEA-300033> <Could not execute command “getVersion” on the node manager. Reason: “Access to domain ‘Test_Domain’ for user ‘DTXezFTI’ denied”.>

Resolution Steps:

By default, the Node Manager log level is INFO.
First, set the log level to FINEST (the LogLevel property in nodemanager.properties); with FINEST logging, you get much more information in the logs.

The log levels available for Node Manager are:

SEVERE          
WARNING
INFO                
CONFIG
FINE
FINER
FINEST             

After making the change, restart Node Manager and check the log file for the error message again.
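If you change nodemanager.properties on several hosts, a small helper can set LogLevel=FINEST without touching the other entries. This is a hypothetical Python 3 sketch: it replaces an existing LogLevel line or appends one, and keeps a .bak copy of the original file. Restart Node Manager afterwards so that it picks up the new level.

```python
import re
import shutil
import sys
from pathlib import Path


def set_property(properties_text, key, value):
    """Return properties_text with `key` set to `value`.

    An existing "key=..." or "key: ..." line is replaced in place; if the key
    is absent (or only commented out), "key=value" is appended at the end.
    """
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*[=:][^\r\n]*", re.MULTILINE)
    new_line = f"{key}={value}"
    new_text, count = pattern.subn(lambda m: m.group(1) + new_line, properties_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += new_line + "\n"
    return new_text


if __name__ == "__main__":
    # Usage: python3 set_nm_loglevel.py <path/to/nodemanager.properties>
    path = Path(sys.argv[1])
    shutil.copyfile(path, path.with_name(path.name + ".bak"))  # keep the original
    path.write_text(set_property(path.read_text(), "LogLevel", "FINEST"))
    print(f"Set LogLevel=FINEST in {path}; restart Node Manager to apply it")
```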

Solution :

1 : Log in to the Admin Console (http://AdminserverIP:Port/console).

2 : Click the domain name (Test_Domain in my case), then the Security tab.

3 : Expand the Advanced options and change the Node Manager username and password values.

For example:

username is : weblogic 

password is : welcome123

4 : Save the changes.

Then edit the Node Manager credentials on each remote host within the domain:

·         Navigate to the folder %DOMAINHOME%\config\nodemanager

·         Edit the file: nm_password.properties with updated credentials

·         username=weblogic

·         password=welcome123

·         Save

·         Restart the Weblogic Node Manager

5 : Then navigate to the Test_Domain/servers/managed_server/data/nodemanager/ directory.

6 : Open the boot.properties file and enter the following values:

username=weblogic
password=welcome123

7 : Restart your Admin Server.

8 : Now run the WLST nmEnroll() command on every machine that hosts a Managed Server:

wlst>connect('username','password','t3://admin_host:admin_port')

online>>nmEnroll('Domain_dir_path','NodeManager_Home_Path')

online>>exit()

Now you can check the Node Manager status from the Admin Console:

Environment >> Machines >> Machine1 >> Monitoring >> Node Manager Status

The status should be Reachable.