Description
Finto AI's Annif restarted unexpectedly today, logging this:
[ERROR] Worker (pid:54) was sent code 139!
https://www.groundcover.com/kubernetes-troubleshooting/exit-code-139
What is Exit Code 139?
In Kubernetes, exit code 139 is an event that occurs when a container receives the SIGSEGV signal from the operating system on its host node. In Linux and other Unix-like operating systems, SIGSEGV is a type of forced termination signal that tells a process to shut down. The operating system typically generates this signal when it detects a process that is trying to access system memory that either doesn't exist or that the process lacks permission to access – an event known as a segmentation fault (or just segfault, as die-hard Linux geeks like to put it).
If a container receives SIGSEGV, it will usually terminate. That's undesirable, of course, because you typically don't want your containers to shut down unless you decide to shut them down. But the alternative to SIGSEGV is potentially having your entire server crash due to multiple processes trying to use the same memory address – which would be like all of the dogs in the neighborhood rushing into the same yard to brawl. It would be chaos, and everything would stop working because no container would be able to access memory reliably.
So, the operating system sends SIGSEGV in an effort to prevent a much bigger problem.
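As a quick sanity check that "code 139" points to a segfault, here is a minimal sketch (hypothetical helper names, assuming a Linux/POSIX status layout). Under the shell/container convention, 139 is 128 + 11, i.e. "killed by signal 11". Read as a raw `waitpid()` status word, 139 is signal 11 with the core-dump bit (0x80) set. Which of the two conventions produced the number in the gunicorn log above is not established here, but both readings give SIGSEGV.

```python
import os
import signal
from typing import Optional, Tuple


def signal_from_shell_exit_code(code: int) -> Optional[signal.Signals]:
    """Hypothetical helper: shell/container convention, exit status 128 + N means 'died from signal N'."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128)
    except ValueError:  # not a signal number known on this platform
        return None


def decode_wait_status(status: int) -> Optional[Tuple[signal.Signals, bool]]:
    """Hypothetical helper: decode a raw waitpid() status word (POSIX layout, Unix only).

    Low 7 bits hold the terminating signal number; bit 0x80 flags a core dump.
    """
    if not os.WIFSIGNALED(status):
        return None
    return signal.Signals(os.WTERMSIG(status)), os.WCOREDUMP(status)


if __name__ == "__main__":
    print(signal_from_shell_exit_code(139).name)  # SIGSEGV
    sig, core_dumped = decode_wait_status(139)
    print(sig.name, core_dumped)  # SIGSEGV True
```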
I think this also happened once a few weeks ago.
If the cause is TensorFlow, this issue will be resolved if/when the NN ensemble is reimplemented using PyTorch (#895).
Note that crashes with exit code 134 (which have not occurred in a while) were tracked in issue #737.