Understanding Socket Leaks

Last Updated: 2023-03-27 09:34:58 Monday


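A socket is ultimately a file descriptor. A socket leak means there is a socket on which close was never called; once the peer has closed the connection, such a socket stays in the CLOSE_WAIT state indefinitely and is never released.

It is the counterpart of a memory leak, where free is never called.

A socket leak, or file descriptor leak, is a missing call to close!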
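The "always call close" rule can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `handle` and `serve_once` are hypothetical, and the echo behavior is only there to give the handler something to do.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    # The with-block calls conn.close() on every exit path,
    # including the one where the peer has already closed.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:        # b"" means the peer has sent its FIN
                break
            conn.sendall(data)  # echo the payload back


def serve_once(listener: socket.socket) -> socket.socket:
    # Accept a single connection, serve it, and return the (now closed) socket.
    conn, _ = listener.accept()
    handle(conn)
    return conn
```

Because close runs as soon as recv reports end-of-stream, the server socket leaves CLOSE_WAIT right away instead of lingering there.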

How does a socket leak happen?

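In short: when the client calls close on a TCP connection, it sends a FIN and enters FIN_WAIT_1. The server receives the FIN and replies with an ACK; the client then moves to FIN_WAIT_2, and the server is now in CLOSE_WAIT.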

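While the server is in CLOSE_WAIT, if it never calls close on this socket, the socket remains in CLOSE_WAIT indefinitely. That is a socket leak: the socket is released only when the server process exits or is restarted.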
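The sequence above can be reproduced on the loopback interface. The sketch below is an assumption-laden illustration (the function name `leak_one_socket` is hypothetical): the "server" side accepts a connection, the client closes it, and the server deliberately skips close.

```python
import socket


def leak_one_socket():
    """Return (listener, conn, eof); conn is intentionally left unclosed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    client.close()                      # client sends FIN and enters FIN_WAIT_1
    eof = conn.recv(1)                  # returns b"" once the FIN has arrived
    # Deliberately no conn.close(): the kernel keeps conn in CLOSE_WAIT
    # for as long as this process holds the descriptor.
    return listener, conn, eof
```

While the returned `conn` object is kept alive, running `ss -tn4` in another terminal should show the loopback connection in CLOSE-WAIT.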

The problem is easy to reproduce. The CLOSE-WAIT socket in the output below has been in that state for a long time:

$ ss -tn4
State       Recv-Q  Send-Q   Local Address:Port      Peer Address:Port  Process
ESTAB       0       48         172.16.6.90:18962    49.77.232.115:3203
ESTAB       0       0          172.16.6.90:18962    49.77.232.115:3946
ESTAB       0       0          172.16.6.90:18962    49.77.232.115:3371
ESTAB       0       0          172.16.6.90:18962   117.89.130.240:59409
CLOSE-WAIT  1       0          172.16.6.90:51986    49.77.232.115:3541
ESTAB       0       0          172.16.6.90:18962   117.89.130.240:59412
ESTAB       0       0          172.16.6.90:33856    100.100.30.25:80
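The same check can be scripted. The sketch below parses text in the format of Linux's /proc/net/tcp and lists sockets whose state code is 08 (CLOSE_WAIT). The function names are hypothetical, and the address decoding assumes a little-endian host, which is the usual case on x86 and ARM Linux.

```python
import socket
import struct

CLOSE_WAIT = "08"  # state code for CLOSE_WAIT in /proc/net/tcp


def _decode(addr: str) -> str:
    # Addresses look like "0100007F:1F90": hex IPv4 (host byte order) and hex port.
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))  # assumes little-endian host
    return f"{ip}:{int(port_hex, 16)}"


def close_wait_sockets(proc_net_tcp: str):
    """Return (local, remote) address pairs for every CLOSE_WAIT entry."""
    found = []
    for line in proc_net_tcp.splitlines()[1:]:   # first line is the column header
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            found.append((_decode(fields[1]), _decode(fields[2])))
    return found
```

On a Linux host it could be called as `close_wait_sockets(open("/proc/net/tcp").read())`; an entry that keeps showing up across repeated runs is a likely leak.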

Impact of a socket leak

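On the client side, the socket waits in FIN_WAIT_2 for 60 seconds by default, the same length as the default TIME_WAIT wait. The FIN_WAIT_2 timeout can be changed through net.ipv4.tcp_fin_timeout. When it expires, the kernel forcibly releases the socket.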
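To see which timeout is in effect on a given Linux host, the sysctl can be read from procfs. A minimal sketch, where the function name `fin_timeout_seconds` is hypothetical and the default path is the standard procfs location of net.ipv4.tcp_fin_timeout:

```python
from pathlib import Path


def fin_timeout_seconds(path: str = "/proc/sys/net/ipv4/tcp_fin_timeout") -> int:
    # The file holds a single integer: the FIN_WAIT_2 timeout in seconds.
    return int(Path(path).read_text().strip())
```

The path is a parameter only so the function can be pointed at a different file when no Linux procfs is available.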

When the same client (same IP) later connects to the server again and happens to use the same source port, the two ends disagree about the connection: one side cannot advance from the SYN_SENT state, while the other is stuck in CLOSE_WAIT. The client sends its SYN, never receives a usable reply, and connect eventually gives up with ETIMEDOUT.
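From the client's point of view the failure is just a connect call that times out. The sketch below does not reproduce the stuck state; it only shows, under stated assumptions, how a client might pin its source port and tell a timeout apart from other errors. The function name `try_connect` and its parameters are hypothetical.

```python
import errno
import socket


def try_connect(host, port, source_port=0, timeout=5.0):
    """Return None on success, otherwise a short name for the failure."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", source_port))       # 0 lets the kernel choose the source port
        s.settimeout(timeout)           # give up earlier than the kernel would
        s.connect((host, port))
        return None
    except (socket.timeout, TimeoutError):
        return "ETIMEDOUT"              # no answer to the SYN within the deadline
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        s.close()                       # never leak the client socket either
```

Logging the source port together with the result makes it easier to spot the pattern described above, where only one specific address and port pair keeps timing out.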

So the impact of a socket leak is the odd situation where connections from one particular address may mysteriously fail.

Thoughts

It seems that the design decisions made by the BSD Socket API have unexpected, long-lasting consequences. If you think about it: why exactly can a socket automatically expire out of the FIN_WAIT state, but not move on from CLOSE_WAIT after some grace time? This is very confusing... And it should be! The original TCP specification does not allow an automatic state transition out of FIN_WAIT_2! According to the spec, FIN_WAIT_2 is supposed to persist until the application on the other side cleans up.

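The TCP standard does not allow a socket to leave FIN_WAIT_2 automatically, so this timeout is non-compliant behavior. When you see FIN_WAIT_2 sockets locally, the other end of the TCP connection may already be in trouble.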

Permalink: https://cs.pynote.net/net/tcp/202205181/
