Zombie Processes

A 'zombie process' in Linux/Unix is a process that has terminated (for example by calling the exit system call) but still has an entry in the process list. When a process exits, its parent process needs to read the child process's exit status via the wait system call. Once the parent has read the exit status, the zombie process is removed from the process table; this is known as 'reaping' the process.
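The life cycle described above can be reproduced in a few lines. The following is a minimal sketch, assuming Linux (it reads /proc) and Python 3.9 or later; the function names read_state and zombie_demo are made up for this illustration.

```python
import os
import time


def read_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with exit status 7

    # Parent: poll (for at most 5 seconds) until the child shows up as 'Z'.
    deadline = time.monotonic() + 5
    state = read_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)

    # Reap the child: this reads its exit status and frees its table entry.
    _, status = os.waitpid(pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)

    # A second wait on the same PID fails because the child is gone.
    try:
        os.waitpid(pid, os.WNOHANG)
        reaped = False
    except ChildProcessError:
        reaped = True
    return state, exit_code, reaped


if __name__ == "__main__":
    print(zombie_demo())
```

The child stays a zombie only for as long as the parent delays its call to waitpid; in this sketch that is the short polling loop.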

Typically, processes are reaped by their parent very quickly. However, if you see zombie processes that are not being cleaned up by the parent, an issue or error may be at play.

A key characteristic of zombie processes is that they cannot be killed with the kill command because they are already dead, which is where they get their name.

You will know you have a zombie process when the process's STAT (state) column shows a 'Z' value. You can review process states with the 'ps' or 'top' command. The following are the possible states you may see (see the ps man page for more details).

               D    uninterruptible sleep (usually IO)
               R    running or runnable (on run queue)
               S    interruptible sleep (waiting for an event to complete)
               T    stopped by job control signal
               t    stopped by debugger during the tracing
               W    paging (not valid since the 2.6.xx kernel)
               X    dead (should never be seen)
               Z    defunct ("zombie") process, terminated but not reaped by its parent
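The same state letter can also be read programmatically. The sketch below is one possible approach, assuming Linux with /proc mounted; parse_stat and list_zombies are hypothetical helper names, not part of any standard tool.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, name, state, ppid)."""
    # The name sits between the first '(' and the last ')' and may itself
    # contain spaces or parentheses, so split around those two markers.
    head, tail = text.rsplit(")", 1)
    pid_str, name = head.split("(", 1)
    fields = tail.split()
    return int(pid_str), name, fields[0], int(fields[1])


def list_zombies():
    """Return a list of (pid, ppid, name) for every process in state 'Z'."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, name, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process vanished or is not readable
        if state == "Z":
            zombies.append((pid, ppid, name))
    return zombies


if __name__ == "__main__":
    for pid, ppid, name in list_zombies():
        print(f"zombie pid={pid} parent={ppid} name={name}")
```

The parent PID is reported alongside each zombie because the parent, not the zombie itself, is the process that has to act.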

The following is example output with a zombie process when you run 'ps aux'.

[Image: Zombie Process Example]

Whenever a child process terminates, is stopped, or resumes after being stopped, the kernel sends the SIGCHLD signal to its parent process. When the child terminates, the kernel deallocates the memory and resources it was using and keeps only its process table entry, which holds the exit status. The parent process can then obtain the child process's exit status by calling the wait system call, typically from its SIGCHLD handler.
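A parent that reaps its children in response to SIGCHLD might look roughly like the following. This is a sketch under assumptions (Python 3.9 or later, run from the main thread, no other child processes in the same program); reap_children and spawn_and_collect are illustrative names.

```python
import os
import signal
import time

reaped = {}  # pid -> exit code, filled in by the SIGCHLD handler


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    # Several children may exit before the handler runs once, so loop
    # until waitpid reports that nothing more is ready.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped[pid] = os.waitstatus_to_exitcode(status)


def spawn_and_collect(count=3, timeout=5.0):
    """Fork `count` children that exit with codes 0..count-1 and reap them."""
    signal.signal(signal.SIGCHLD, reap_children)
    pids = []
    for code in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(code)  # child: exit immediately
        pids.append(pid)

    # Wait until the handler has reaped every child, or give up at the deadline.
    deadline = time.monotonic() + timeout
    while not all(p in reaped for p in pids) and time.monotonic() < deadline:
        time.sleep(0.01)

    signal.signal(signal.SIGCHLD, signal.SIG_DFL)  # restore default handling
    return {pid: reaped.get(pid) for pid in pids}


if __name__ == "__main__":
    print(spawn_and_collect())
```

The loop with WNOHANG matters because standard signals are not queued: one SIGCHLD delivery may stand for several exited children.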

Once an exited process has been reaped, its PID is released and can be reused. However, if the parent fails to call the wait system call, the zombie process is not reaped and is left in the process table, which can lead to a resource leak, since every zombie keeps holding a PID and a process table entry.

There are cases where you may see a zombie process for a short period of time, which may be by design. For example, a parent process may delay reaping one child until it has created another, to ensure that the new child is not allocated the same PID. If the zombie process does not go away, however, it is most likely not by design.

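To remove a zombie process, you can try sending the SIGCHLD signal to the parent process manually to see whether it then reaps the child. To find the parent process, use 'ps faux' to view the process tree. Prefer the signal name over the number: SIGCHLD is signal 17 on x86 and ARM Linux, but the number differs on some other architectures and Unix systems.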

# kill -SIGCHLD [parent PID]
# kill -17 [parent PID]
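The lookup and the signal can also be combined in a script. The following is a rough sketch, assuming Linux with /proc and sufficient permission to signal the parent; parent_of and nudge_parent are hypothetical helper names.

```python
import os
import signal


def parent_of(pid):
    """Return the parent PID of `pid`, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The fields after the last ')' are: state, ppid, ...
    return int(data.rsplit(")", 1)[1].split()[1])


def nudge_parent(zombie_pid):
    """Send SIGCHLD to the parent of `zombie_pid` and return the parent's PID."""
    ppid = parent_of(zombie_pid)
    # Using the named constant picks the right signal number for the platform.
    os.kill(ppid, signal.SIGCHLD)
    return ppid


if __name__ == "__main__":
    import sys
    print("signalled parent", nudge_parent(int(sys.argv[1])))
```

Whether the zombie disappears afterwards depends entirely on the parent: the signal only prompts it, and a parent that never calls wait will ignore the hint.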

If this does not work, there is an issue in the parent process preventing it from reaping its child processes, and you may need to terminate the parent process. When a child process loses its parent, init (PID 1) becomes its new parent; init periodically executes the wait system call to reap any zombies it has inherited.

For more information review the following links:

https://en.wikipedia.org/wiki/Zombie_process