The wait System Call and Zombie Processes

Last Updated: 2023-06-19 09:41:37 Monday


A zombie process (zombie, or defunct) is a child process that has terminated but has not yet been reaped by its parent. Reaping is done with the wait system call.

For example:

xinlin    311016  ...    S+   15:03   0:00 python3 test_dup2.py
xinlin    311017  ...    Z+   15:03   0:00 [python3] <defunct>

311017 is a child of 311016, but 311016 neither calls wait nor exits, so 311017 has unfortunately become a zombie (Z+).
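The situation above can be reproduced with a short Python sketch. It assumes a Linux system with /proc mounted; the helper names `proc_state` and `make_zombie_then_reap` are made up for this illustration. The parent forks a child that exits at once, polls /proc until the child shows state Z, and only then reaps it.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

def make_zombie_then_reap():
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately
    # Parent: do not wait yet; poll until the child is reported as a zombie.
    state = proc_state(pid)
    deadline = time.monotonic() + 5
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    # Reaping the child removes the zombie entry.
    reaped_pid, _status = os.waitpid(pid, 0)
    return state, reaped_pid == pid

if __name__ == "__main__":
    print(make_zombie_then_reap())
```

Between the child's exit and the `os.waitpid` call, `ps` would list the child as `<defunct>`, just like 311017 above.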

// wait, waitpid, waitid - wait for process to change state
// man 2 wait
#include <sys/wait.h>

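The wait interface is fairly simple: it just waits for any one child process to terminate.

pid_t pid = wait(NULL);

This waits for some child to terminate without collecting its status, without choosing which child, and without a timeout (it blocks indefinitely). With several children, wait returns as soon as any one of them terminates, and the program continues.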

A slightly fuller usage:

pid_t wait(int *wstatus);

This gives you both the status of the terminated child (through wstatus) and its PID (the return value).
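As a minimal sketch of the same call from Python: `os.wait()` returns the PID and the raw status together, and the status is decoded with `os.WIFEXITED` / `os.WEXITSTATUS`. The function name `run_child_and_wait` is made up for this example.

```python
import os

def run_child_and_wait(exit_code):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit with the requested code
    # Parent: block until any child terminates; get its PID and raw status.
    waited_pid, status = os.wait()
    if os.WIFEXITED(status):
        return pid, waited_pid, os.WEXITSTATUS(status)
    return pid, waited_pid, None  # child did not exit normally

if __name__ == "__main__":
    print(run_child_and_wait(7))
```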


The child has already finished on its own, so why does the parent still need to wait?

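To answer that, we need to understand zombie processes. Don't let the name scare you; zombies are not frightening at all. The manpage has enough information to understand them: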

All of these system calls are used to wait for state changes in a child of the calling process, and obtain information about the child whose state has changed. A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal.

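Termination, being stopped by a signal, and being resumed by a signal all count as state changes, and all of them can be waited for. (Plain wait only returns for termination; observing stops and resumes takes waitpid with WUNTRACED / WCONTINUED.)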
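The three kinds of state change can be observed from Python as well. The following is a sketch rather than a reference implementation; the function name `observe_state_changes` is made up here. The parent stops, resumes, and finally kills a child, and checks each report with the matching `os.WIF*` helper.

```python
import os
import signal
import time

def observe_state_changes():
    pid = os.fork()
    if pid == 0:
        try:
            while True:  # child: idle until killed
                time.sleep(1)
        finally:
            os._exit(1)
    events = []
    # Stop the child; WUNTRACED makes waitpid report stopped children.
    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    events.append("stopped" if os.WIFSTOPPED(status) else "other")
    # Resume the child; WCONTINUED makes waitpid report resumed children.
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, os.WCONTINUED)
    events.append("continued" if os.WIFCONTINUED(status) else "other")
    # Terminate the child; a plain waitpid reports the termination and reaps it.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    events.append("killed" if os.WIFSIGNALED(status) else "other")
    return events

if __name__ == "__main__":
    print(observe_state_changes())
```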

In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state.

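For a terminated child, the parent's wait call lets the system release the resources still tied to that child. If the parent does not wait, the terminated child stays in the zombie state.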

A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.

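The purpose of wait is to let the parent obtain information about the child (PID, termination status, resource usage). As long as the parent has not called wait, the kernel keeps that information around. A zombie still occupies a process-table entry, so it does consume a small amount of system resources.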
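All three pieces of information the kernel keeps for a zombie can be collected in one call. A small sketch using `os.wait4`, which returns the PID, the status, and a resource-usage structure; the function name `wait_with_rusage` is made up for this example.

```python
import os

def wait_with_rusage():
    pid = os.fork()
    if pid == 0:
        sum(range(100_000))  # child: burn a little CPU so there is usage to report
        os._exit(3)
    # wait4 returns (pid, status, rusage) for the terminated child.
    waited_pid, status, rusage = os.wait4(pid, 0)
    return waited_pid == pid, os.WEXITSTATUS(status), rusage

if __name__ == "__main__":
    same_pid, code, rusage = wait_with_rusage()
    print(same_pid, code, rusage.ru_utime, rusage.ru_stime)
```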

If a parent process terminates, then its "zombie" children (if any) are adopted by init(1), (or by the nearest "subreaper" process as defined through the use of the prctl(2) PR_SET_CHILD_SUBREAPER operation); init(1) automatically performs a wait to remove the zombies.

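If the parent terminates without calling wait, its zombie children become orphans, are adopted by process 1 (or the nearest subreaper), and are then reaped by it.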
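Adoption can be observed with a double fork. In this sketch (the function name `orphan_new_parent` is made up) a middle process forks a grandchild and exits without waiting. The grandchild polls `os.getppid()` until it changes and reports the new value through a pipe. Whether the new parent is PID 1 or a subreaper depends on the environment, so the code only relies on it differing from the middle process.

```python
import os
import time

def orphan_new_parent():
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            os.close(r)
            middle_pid = os.getpid()
            if os.fork() == 0:
                # Grandchild: wait until the middle process has exited and
                # the kernel has re-parented us, then report the new parent.
                deadline = time.monotonic() + 5
                while os.getppid() == middle_pid and time.monotonic() < deadline:
                    time.sleep(0.01)
                os.write(w, str(os.getppid()).encode())
        finally:
            os._exit(0)  # both the middle process and the grandchild end here
    os.close(w)
    os.waitpid(middle, 0)  # reap the middle process
    with os.fdopen(r) as f:
        new_parent = int(f.read())  # read() returns once every writer has exited
    return middle, new_parent

if __name__ == "__main__":
    print(orphan_new_parent())
```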

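A program that keeps creating child processes must make sure to call wait. Otherwise, over a long run, zombies pile up, each holding a process-table slot, and once the table is full no new processes can be created.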
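One common way for a long-running program to stay free of zombies is to reap periodically without blocking. A minimal sketch, assuming the helper names `reap_finished_children` and `spawn_and_reap` (both made up here): `os.waitpid(-1, os.WNOHANG)` collects every child that has already terminated and returns immediately when none is ready.

```python
import os
import time

def reap_finished_children():
    """Reap every already-terminated child without blocking; return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has terminated yet
        reaped.append(pid)
    return reaped

def spawn_and_reap(n):
    spawned = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        spawned.append(pid)
    # The parent keeps working and reaps opportunistically.
    reaped = []
    deadline = time.monotonic() + 5
    while len(reaped) < n and time.monotonic() < deadline:
        reaped.extend(reap_finished_children())
        time.sleep(0.01)
    return spawned, reaped

if __name__ == "__main__":
    print(spawn_and_reap(3))
```

A server would typically call something like `reap_finished_children()` from its main loop or from a SIGCHLD handler instead of polling with a sleep.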


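The waitpid interface is more involved, with more parameters and finer control. The example program at the end of man 2 wait is worth studying carefully. The interfaces in Python's os module are little more than a thin wrapper over these system calls.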

Permalink: https://cs.pynote.net/sf/linux/prog/202207142/
