How are segmentation faults reported?

I'm just wondering how segmentation faults get reported.

  • The process will simply die, so obviously it cannot communicate this.
  • The shell cannot know for sure unless the process sends out some signal, which it may not have to do.
  • The OS might do something, but I'm not sure how it does it.

Which of these reports segmentation faults (just as an example), and how?

+4
4 answers

The process will simply die, so obviously it cannot communicate this.

This is actually incorrect. You can install a SIGSEGV handler to replace the default one, which simply dumps core and dies. A preloaded library can do this to catch the segmentation violation and, using the limited set of functions that are safe to call in a signal handler, notify another process running on the system of what happened before exiting.

+3

If you look at the wait() or waitpid() functions, you will find that one of the bits in the exit status indicates a core dump. The POSIX specification mentions WIFSIGNALED and WTERMSIG for getting the signal that terminated the process. The POSIX specification does not mention it, but on Mac OS X (10.7.4), for example, there is a WCOREDUMP() macro to check whether a core file was created.
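This is roughly how a parent such as the shell finds out. A small sketch, assuming a POSIX system; `status_of_child_killed_by` and `report_status` are hypothetical helper names, and `WCOREDUMP` is guarded because it is not available everywhere:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Hypothetical helper: fork a child that kills itself with `signo`, then
// return the raw status waitpid() hands to the parent (or -1 on error).
int status_of_child_killed_by(int signo) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        signal(signo, SIG_DFL);  // make sure the default action applies
        raise(signo);
        _exit(0);  // reached only if the signal did not terminate us
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return status;
}

// Decode a wait status, much as a shell does before it prints a message
// like "Segmentation fault".
void report_status(int status) {
    if (WIFEXITED(status)) {
        std::printf("exited with code %d\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        std::printf("killed by signal %d\n", WTERMSIG(status));
#ifdef WCOREDUMP  // not available on every system, so guard it
        if (WCOREDUMP(status)) std::printf("core dumped\n");
#endif
    }
}
```

Calling `report_status(status_of_child_killed_by(SIGSEGV))` prints the signal number, and whether the "core dumped" line appears depends on the system's core-file limits.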

+2

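You can have some code, such as the following, that invokes GDB to dump the backtrace:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    // Signal handler: fork a child that attaches GDB to the dying process
    // and prints its backtrace, then exit.
    void DumpBacktrace(int) {
      pid_t dying_pid = getpid();
      pid_t child_pid = fork();
      if (child_pid < 0) {
        perror("fork() while collecting backtrace:");
      } else if (child_pid == 0) {
        char buf[1024];
        snprintf(buf, sizeof(buf),
                 "gdb -p %d -batch -ex bt 2>/dev/null | "
                 "sed '0,/<signal handler/d'",
                 (int)dying_pid);
        const char* argv[] = {"sh", "-c", buf, NULL};
        execv("/bin/sh", (char**)argv);  // inherits the environment
        _exit(1);  // only reached if exec failed
      } else {
        waitpid(child_pid, NULL, 0);  // let GDB finish before we exit
      }
      _exit(1);
    }

    // Install DumpBacktrace as the SIGSEGV handler.
    void BacktraceOnSegv() {
      struct sigaction action = {};
      action.sa_handler = DumpBacktrace;
      if (sigaction(SIGSEGV, &action, NULL) < 0) {
        perror("sigaction(SEGV)");
      }
    }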

Here is an implementation that supports more platforms.

+2

OK, to start: a segmentation fault occurs when the CPU tries to access an address that the process is not allowed to access. At the lowest level, the memory-mapping hardware detects this and generally raises an interrupt. The kernel receives this interrupt and has a table of addresses of code routines, each of which is designed to handle a particular interrupt.
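You can watch this from user space: with `SA_SIGINFO`, the kernel hands the faulting address to the handler in `si_addr`. A rough sketch, assuming a POSIX system with `mmap`; `g_report_fd`, `on_fault`, and `fault_address_seen_by_child` are made-up names, and the fault is provoked in a child process so the caller survives:

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t g_report_fd = -1;  // hypothetical report channel

// SA_SIGINFO handler: the kernel passes the faulting address in si_addr.
extern "C" void on_fault(int, siginfo_t* info, void*) {
    void* addr = info->si_addr;
    ssize_t ignored = write(g_report_fd, &addr, sizeof addr);
    (void)ignored;
    _exit(0);
}

// Touches an inaccessible page in a child process and returns the address
// the kernel reported to the child's handler, or nullptr on failure.
// `page` is assumed to point to memory mapped with PROT_NONE.
void* fault_address_seen_by_child(void* page) {
    int fds[2];
    if (pipe(fds) != 0) return nullptr;
    pid_t pid = fork();
    if (pid < 0) return nullptr;
    if (pid == 0) {
        close(fds[0]);
        g_report_fd = fds[1];
        struct sigaction action = {};
        action.sa_sigaction = on_fault;
        action.sa_flags = SA_SIGINFO;
        sigemptyset(&action.sa_mask);
        sigaction(SIGSEGV, &action, nullptr);
        sigaction(SIGBUS, &action, nullptr);  // some systems raise SIGBUS here
        *static_cast<volatile char*>(page) = 1;  // the MMU rejects this write
        _exit(2);  // not reached if the fault was delivered
    }
    close(fds[1]);
    void* reported = nullptr;
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n == static_cast<ssize_t>(sizeof reported) ? reported : nullptr;
}
```

To try it, map one page with `mmap(nullptr, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)` and pass the result in; the returned address should fall inside that page.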

When the kernel receives this interrupt, it translates it into a specific signal value (I'm being vague because the exact details differ with both the hardware architecture and the kernel implementation). SIGSEGV is usually defined as the value 11, but the exact value does not matter; it is defined in signal.h.

At this point, the signal value is used as an index into another table inside the kernel, which contains the addresses of the "signal handlers". One of these handlers is at the offset represented by SIGSEGV. If you haven't done something to change it, this address usually points to a routine that produces a core dump (subject to the applicable resource limits), but you can replace it with the address of your own routine, which can do whatever you like.
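From user code, that table entry is read and overwritten with `sigaction`. A small sketch, assuming a POSIX system; `segv_uses_default_action`, `my_segv_handler`, and `replace_segv_action` are hypothetical names:

```cpp
#include <signal.h>
#include <unistd.h>

// Reads the current SIGSEGV entry without changing it and reports whether
// it is still the default action (terminate, possibly with a core dump).
bool segv_uses_default_action() {
    struct sigaction current = {};
    if (sigaction(SIGSEGV, nullptr, &current) != 0) return false;
    if (current.sa_flags & SA_SIGINFO) return false;  // 3-argument handler set
    return current.sa_handler == SIG_DFL;
}

// Hypothetical replacement routine; a real one could log or notify first.
extern "C" void my_segv_handler(int) { _exit(1); }

// Overwrites the SIGSEGV entry with my_segv_handler and stores the
// previous entry in *previous so it can be restored later.
bool replace_segv_action(struct sigaction* previous) {
    struct sigaction action = {};
    action.sa_handler = my_segv_handler;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, previous) == 0;
}
```

Passing the saved `previous` back to `sigaction(SIGSEGV, &previous, nullptr)` restores whatever was installed before.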

+1