Too many open files in Jetty

Guozheng Ge
3 min read · Aug 29, 2020

We have a set of REST APIs written in Java and deployed on a Jetty server. Recently, as request traffic increased, we hit a mysterious production issue: max and p99 latency at the ELB spiked suddenly, the ELB reported sporadic 5xx errors, and servers restarted at random.

very high ELB latency in seconds!!!
Spikes of 5xx backend server errors on ELB

After looking through the JVM log, we found that the JVM restarted because it ran out of file handles. Once we added a metric to track the number of files opened by the JVM, it was clear that file handles were leaking. Before digging deeper into the root cause, we first added alerts so that we could manually restart a server when its open file count crossed a threshold, before the JVM restarted on its own.

Jetty leaking file handles
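For readers who want a similar metric, here is a minimal sketch of one way to read the JVM's open file count and compare it against a threshold. It is not necessarily how our metric was implemented: the class name `OpenFileGauge`, the helper `shouldAlert`, and the 80% ratio are all hypothetical, and the `com.sun.management.UnixOperatingSystemMXBean` interface is only available on Unix-like platforms.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

final class OpenFileGauge {
    /** Open file descriptors held by this JVM, or -1 if the platform does not expose the count. */
    static long openFileCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    /** Descriptor limit for this JVM process, or -1 if the platform does not expose it. */
    static long maxFileCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    /** Hypothetical alert rule: fire once usage reaches the given fraction of the limit. */
    static boolean shouldAlert(long open, long max, double ratio) {
        return open >= 0 && max > 0 && open >= max * ratio;
    }

    public static void main(String[] args) {
        long open = openFileCount();
        long max = maxFileCount();
        // The 0.8 ratio is an illustrative value, not the threshold we used.
        System.out.println("open=" + open + " max=" + max
                + " alert=" + shouldAlert(open, max, 0.8));
    }
}
```

In practice the two values would be published as gauges to whatever metrics system is in use, with the alert defined there rather than in application code.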

Then we compared a server in the leaking state with a healthy one. On the leaking server, a huge number of TCP connections were stuck in the “CLOSE_WAIT” state. Specifically, these were connections between clients and the Jetty server.
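One way to make this comparison repeatable is to count CLOSE_WAIT sockets directly. The sketch below assumes a Linux host, where `/proc/net/tcp` and `/proc/net/tcp6` list sockets with the state in the fourth column as a hex code (`08` is CLOSE_WAIT). The class name `CloseWaitCounter` is hypothetical, and this is not the tooling we used for the original investigation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class CloseWaitCounter {
    // Hex state code used by the Linux kernel for CLOSE_WAIT in /proc/net/tcp.
    static final String CLOSE_WAIT = "08";

    /** Counts rows whose fourth whitespace-separated column is the CLOSE_WAIT code. */
    static long countCloseWait(List<String> procNetTcpLines) {
        return procNetTcpLines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(fields -> fields.length > 3 && CLOSE_WAIT.equals(fields[3]))
                .count();
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        // A JVM often listens on a dual-stack socket, so check the IPv6 table too.
        for (String name : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path table = Paths.get(name);
            if (Files.isReadable(table)) {
                total += countCloseWait(Files.readAllLines(table));
            }
        }
        System.out.println("CLOSE_WAIT sockets on this host: " + total);
    }
}
```

Note that this counts sockets for the whole host, not just the Jetty process, so it is only a rough signal on machines that run other network services.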

The CLOSE_WAIT state occurs when a client has initiated the TCP connection close but the server has not yet closed its side of the connection. Here is a nice chart that explains the TCP close flow:

When the server receives more requests, Jetty has to open new connections to serve them, so the open file count grows faster than the idle timeout can reclaim file handles. Eventually the JVM hits its open file limit and dies.
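The reason a CLOSE_WAIT connection keeps counting against the open file limit can be reproduced with plain sockets. The following is a minimal sketch using the JDK's `java.net` classes, not Jetty code: after the client closes, the server reads end of stream (`-1`), and its file descriptor stays allocated until the server itself calls `close()`. The class name `HalfCloseDemo` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class HalfCloseDemo {
    /** Returns what the server reads after the client has closed its end of the connection. */
    static int readAfterClientClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, listener.getLocalPort());
            try (Socket accepted = listener.accept()) {
                // The client initiates the close; once its FIN arrives,
                // the accepted socket sits in CLOSE_WAIT.
                client.close();
                // End of stream tells the server that the peer is done sending.
                int result = accepted.getInputStream().read();
                // The descriptor is released only when the server side closes,
                // which try-with-resources does as this block exits.
                return result;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("server read after client close: " + readAfterClientClose());
    }
}
```

If the server never reaches that `close()`, the socket remains in CLOSE_WAIT and the descriptor stays in use until something else, such as an idle timeout, closes it.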

The Jetty connector has a timeout setting that forcibly closes idle connections (idleTimeout, see the Jetty connector configuration page). The default value is 30s, but somehow we had configured it to 30min! As a result, connections stuck in CLOSE_WAIT lingered for a long time before the timeout kicked in and reclaimed their file handles.
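For reference, this is roughly what the setting looks like when Jetty is embedded and configured in code. It is a configuration sketch only: it assumes an embedded `Server` with a single `ServerConnector`, needs the Jetty server library on the classpath, and uses a hypothetical port and class name. A deployment configured through Jetty's XML or ini files would set the same idleTimeout value there instead.

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class JettyIdleTimeout {
    public static void main(String[] args) throws Exception {
        Server server = new Server();

        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080); // hypothetical port, for illustration only
        // The idle timeout is given in milliseconds; 30_000 ms is the 30s default.
        connector.setIdleTimeout(30_000L);
        server.addConnector(connector);

        server.start();
        server.join();
    }
}
```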

We restored the default 30s idleTimeout and also tried shorter values such as 5s. This effectively resolved the open file issue. The latency spikes and 5xx errors went away as a result, and CPU usage dropped as well.

open files with different Jetty idleTimeout values
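A back-of-the-envelope estimate shows why the timeout value matters so much. Assuming stranded connections appear at a roughly constant rate and are reclaimed only by the idle timeout, Little's law says the steady-state number of lingering connections is about the rate multiplied by the timeout. The sketch below uses a hypothetical rate of 2 stranded connections per second, which is an illustrative figure and not a measurement from our servers.

```java
final class LingeringConnections {
    /** Little's law: steady-state count = arrival rate x time each item lingers. */
    static long steadyState(double strandedPerSecond, long idleTimeoutSeconds) {
        return Math.round(strandedPerSecond * idleTimeoutSeconds);
    }

    public static void main(String[] args) {
        double rate = 2.0; // hypothetical stranded connections per second
        long[] timeoutsSeconds = {1800, 30, 5}; // 30min, 30s, 5s
        for (long timeout : timeoutsSeconds) {
            System.out.println("idleTimeout=" + timeout + "s -> ~"
                    + steadyState(rate, timeout) + " lingering connections");
        }
    }
}
```

Under these assumptions a 30min timeout holds about 3600 descriptors, against 60 for 30s and 10 for 5s. The absolute numbers depend entirely on the assumed rate, but the 60x gap between 30min and 30s does not.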

Now, why Jetty cannot close these connections properly is the next thing we need to dig into…

References

https://github.com/eclipse/jetty.project/issues/1473
