Analyze and Fix One Java System.exit Hang Problem


My Solr application is designed to be running at user's laptop, it will fetch content from remote Solr server via proxy application. To reduce the impact to user, we limit the max memory size to 512mb: -Xmx512.
We take several approaches to reduce memory usage: for example use JSON(GSon) Streaming to read Solr document one by one from response.
Please see more details at:
Solr: Use JSON(GSon) Streaming to Reduce Memory Usage
Solr: Use STAX Parser to Read XML Response to Reduce Memory Usage
Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage

But the Solr application may still throw OutOfMemoryError, when this happens, I have to kill the embedded jetty server, another application will detect the jetty server is down (by sending http request to /solr/admin/ping) and will restart it.

But I found that at some point, there are 2 embedded jetty application running: the first one hit OutOfMemoryError, and tried to kill itself, but hang, another was starting but failed with followingerror:
org.apache.solr.common.SolrException: 
Index locked for write for core collection1.
EVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Index locked for write for core collection1
 at org.apache.solr.core.SolrCore.$lt;init$gt;(SolrCore.java:806)
 at org.apache.solr.core.SolrCore.$lt;init$gt;(SolrCore.java:619)
Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core collection1
 at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:484)
 at org.apache.solr.core.SolrCore.$lt;init$gt;(SolrCore.java:730)
This is because the first jetty application didn't actually be killed, so it's still holding the lock to write.lock file. This problem can be fixed by fixing the first issue: why the first jetty application didn't kill itself.
Why System.exit hang?
I reproduced the problem, and get the heap dump and use IBM Java Thread and Dump Analyzer to analyze it.
Quick Conclusion
Don't call System.ext in ServletContextListener.contextDestroyed.

MyTaskThread
This happens first: when the application code detects OutOfMemoryError, it will do some clean up jobs and then call System.exit to kill itself. System.exit will call Shutdown.runHooks, which will run all ShutdownHook threads - added by Runtime.addShutdownHook(Thread hook).
Now it owns the lock of Shutdown instance and wait for jetty ShutdownThread finish.
MyTaskThread State: in Object.wait()
Monitor: Owns Monitor Lock on 0x00000000f0051870
at java.lang.Object.wait(Native Method) 
- waiting on [0x00000000f00a7000] (a org.eclipse.jetty.util.thread.ShutdownThread) 
at java.lang.Thread.join(Unknown Source) 
- locked [0x00000000f00a7000] (a org.eclipse.jetty.util.thread.ShutdownThread) 
at java.lang.Thread.join(Unknown Source) 
at java.lang.ApplicationShutdownHooks.runHooks(Unknown Source) 
at java.lang.ApplicationShutdownHooks$1.run(Unknown Source) 
at java.lang.Shutdown.runHooks(Unknown Source) 
at java.lang.Shutdown.sequence(Unknown Source) 
at java.lang.Shutdown.exit(Unknown Source) 
- locked [0x00000000f0051870] (a java.lang.Class for java.lang.Shutdown) 
at java.lang.Runtime.exit(Unknown Source) 
at java.lang.System.exit(Unknown Source) 
at MyAppUtilUtil.hardCommitLocalCache(MyAppUtilUtil.java:152) 
at MyAppUtilUtil.synchronizeP1Index(MyAppUtilUtil.java:372) 
at MyAppUtil$MyAppUtilUtilTask$MyTaskThread.doRun(MyAppUtilUtil.java:352) 
at MyAppUtil$MyAppUtilUtilTask$MyTaskThread.run(MyAppUtilUtil.java:324
Look at jetty ShutdownThread, it's a ShutdownHook added by jetty.server.Server. It calls MyServletContextListener.contextDestroyed, which will call same method MyAppUtil.shutdown to do clean up which will call System.exit.
Now the cause of the hang is obvious.
ShutdownThread tries to call Shutdown.exit, and tries to get the lock of Shutdown instance to enter synchronized (lock) block. But it can't as the lock is already owned by MyTaskThread thread, so ShutdownThread will wait for ever.

MyTaskThread thread owns the lock of Shutdown instance and wait for jetty ShutdownThread finish. It will also wait for ever. This cause the System.ext hang.
ShutdownThread
Look at jetty ShutdownThread, it's a ShutdownHook added by jetty.server.Server. It calls MyServletContextListener.contextDestroyed, which will call same method MyAppUtil.shutdown to do clean up which will call System.exit.
Now the cause of the hang is obvious.
ShutdownThread tries to call Shutdown.exit, and tries to get the lock of Shutdown instance to enter synchronized (lock) block. But it can't as the lock is already owned by MyTaskThread thread, so ShutdownThread will wait for ever.

MyTaskThread thread owns the lock of Shutdown instance and wait for jetty ShutdownThread finish. It will also wait for ever. This cause the System.ext hang.

State: Waiting on monitor
Monitor: Waiting for Monitor Lock on 0x00000000f0051870
at java.lang.Shutdown.exit(Unknown Source) 
- waiting to lock [0x00000000f0051870] (a java.lang.Class for java.lang.Shutdown) 
at java.lang.Runtime.exit(Unknown Source) 
at java.lang.System.exit(Unknown Source) 
at MyAppUtilUtil.hardCommitLocalCache(MyAppUtilUtil.java:152) 
at MyAppUtil$MyAppUtilUtilTask.shutdown(MyAppUtilUtil.java:126) 
at MyAppUtil.shutdown(MyAppUtilUtil.java:64) 
- locked [0x00000000f04cbfc0] (a MyAppUtil) 
at MyServletContextListener.contextDestroyed(MyServletContextListener.java:18) 
at org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:813) 
at org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:158) 
at org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:504) 
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89) 
- locked [0x00000000f010ce40] (a java.lang.Object) 
at org.eclipse.jetty.server.handler.HandlerCollection.doStop(HandlerCollection.java:250) 
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89) 
- locked [0x00000000f00430c8] (a java.lang.Object) 
at org.eclipse.jetty.server.handler.HandlerWrapper.doStop(HandlerWrapper.java:107) 
at org.eclipse.jetty.server.Server.doStop(Server.java:338) 
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89) 
- locked [0x00000000f00404d8] (a java.lang.Object) 
at org.eclipse.jetty.util.thread.ShutdownThread.run(ShutdownThread.java:131)
To fix the problem, In MyServletContextListener.contextDestroyed, I do all nedded cleanup but don't call System.exit.

Resource

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)