Did you know that a mutex can hang the system, or at least the program's execution, if it is used inappropriately? If you are new to the mutex concept, or to concurrent programming with mutexes, read the earlier posts on those topics first.
In which scenarios can the system hang?
1. Forgetting to unlock the mutex, or being unable to unlock it
2. Deadlock
3. A function or command whose execution never returns
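Scenario 1 often comes from an early return that skips the unlock. The sketch below is a minimal C/pthreads illustration, not code from the program discussed later. account_lock, balance, withdraw_buggy(), withdraw_fixed() and lock_is_held() are hypothetical names chosen for the example. lock_is_held() uses pthread_mutex_trylock(), which never blocks, so probing the mutex cannot itself hang.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;
static int balance = 100;                   /* protected by account_lock */

/* BUGGY: the early return leaves account_lock held, so the next
   pthread_mutex_lock(&account_lock), from any thread, blocks forever. */
bool withdraw_buggy(int amount)
{
    pthread_mutex_lock(&account_lock);
    if (amount > balance)
        return false;                       /* unlock is skipped */
    balance -= amount;
    pthread_mutex_unlock(&account_lock);
    return true;
}

/* FIXED: a single exit path guarantees the unlock always runs. */
bool withdraw_fixed(int amount)
{
    bool ok = false;
    pthread_mutex_lock(&account_lock);
    if (amount <= balance) {
        balance -= amount;
        ok = true;
    }
    pthread_mutex_unlock(&account_lock);
    return ok;
}

/* Reports whether m is currently locked by any thread. trylock never
   blocks, so this probe cannot hang even when the lock was leaked. */
bool lock_is_held(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    if (rc == 0) {
        pthread_mutex_unlock(m);
        return false;
    }
    assert(rc == EBUSY);                    /* anything else: invalid mutex */
    return true;
}
```

Keeping a single exit path (or, in C++, using std::lock_guard) is the usual way to make sure every lock is paired with an unlock.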
These situations mostly arise in multithreaded programs since, as explained in the posts mentioned above, mutexes are used to synchronize threads.
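Scenario 2 typically appears when two threads take the same two mutexes in opposite orders. The sketch below is a minimal illustration with hypothetical names (lock_a, lock_b, worker_main(), run_inverted_order()). It uses two barriers to force that situation every time. Plain pthread_mutex_lock() would hang forever, so each thread instead uses pthread_mutex_timedlock() with a one-second deadline. The deadlock therefore shows up as ETIMEDOUT in both threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;   /* both threads own their first lock */
static pthread_barrier_t both_tried_second; /* both threads finished the attempt */

struct worker {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
    int result;                 /* return code of the timed lock on 'second' */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    struct timespec deadline;

    pthread_mutex_lock(w->first);
    pthread_barrier_wait(&both_hold_first);

    /* A plain pthread_mutex_lock() here would block forever. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    w->result = pthread_mutex_timedlock(w->second, &deadline);
    if (w->result == 0)
        pthread_mutex_unlock(w->second);

    /* Hold 'first' until the other thread has also tried, so the
       outcome does not depend on scheduling. */
    pthread_barrier_wait(&both_tried_second);
    pthread_mutex_unlock(w->first);
    return NULL;
}

/* Thread 1 locks A then B; thread 2 locks B then A.
   Stores each thread's timed-lock result in *r1 and *r2. */
void run_inverted_order(int *r1, int *r2)
{
    struct worker w1 = { &lock_a, &lock_b, -1 };
    struct worker w2 = { &lock_b, &lock_a, -1 };
    pthread_t t1, t2;

    int rc = pthread_barrier_init(&both_hold_first, NULL, 2);
    rc |= pthread_barrier_init(&both_tried_second, NULL, 2);
    rc |= pthread_create(&t1, NULL, worker_main, &w1);
    rc |= pthread_create(&t2, NULL, worker_main, &w2);
    assert(rc == 0);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);

    *r1 = w1.result;
    *r2 = w2.result;
}
```

The usual fix is to acquire locks in one fixed global order (for example, always lock_a before lock_b). The timeout here is only a diagnostic aid, not a cure.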
How do you find the thread that has locked the mutex?
I recently came across this issue, and finding the root cause in a large system is difficult. The first step is to find the hanging process. A process that is unresponsive and using 100% of a CPU is typically stuck in an endless loop, which you can spot with the top command. Then attach GDB to the process (for example with gdb -p <pid>) and dump the backtraces of all its threads to find the ones that are blocked on a lock:
(gdb) thread apply all bt
From there, we can browse the code to find which thread is blocked on a lock. To list the threads of your process, use the command below.
(gdb) info threads
You might wonder how GDB can work while the system is hung. We attached GDB before the hang occurred, because the issue was consistently reproducible. When the hang hit, we typed 'thread apply all bt' in GDB, but the characters did not appear at the GDB prompt because the system was hung. A few seconds later, when the system recovered from the hang, GDB started dumping the stack traces of all the threads. This greatly helped to narrow down the problem.
To switch to a particular thread, use the command below.
(gdb) thread 3
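The output of info threads and thread apply all bt is easier to read when every thread has a meaningful name. On Linux, GDB shows a name set with pthread_setname_np() next to the thread's LWP number. The sketch below is minimal and assumes glibc. pthread_setname_np() and pthread_getname_np() are GNU extensions, so _GNU_SOURCE is required. The kernel limits names to 15 characters plus the terminating NUL. name_current_thread() is a hypothetical helper that truncates longer names instead of failing with ERANGE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Names the calling thread so GDB's "info threads" can display it.
   Linux allows at most 15 characters plus the terminating NUL, so longer
   names are truncated here instead of failing with ERANGE. */
int name_current_thread(const char *name)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%s", name);
    return pthread_setname_np(pthread_self(), buf);
}
```

Calling, for example, name_current_thread("worker-2") at the top of each thread function makes that thread easy to pick out before switching to it with the thread command.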
The program used here intentionally creates a mutex hang: one code path never unlocks the mutex.
The program has 5 threads. GDB thread 1 is main(), and threads 2 to 5 are the workers it creates. The first worker (thread 2) locks and unlocks the mutex normally. The next one (thread 3) acquires the mutex and gets stuck in a while loop whose condition never becomes false, so it never unlocks. Threads 4 and 5 therefore wait forever to acquire the mutex.
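The sketch below is a hypothetical reconstruction of such a program, based only on the description above. It is not the original code. The body of trythis(), NUM_WORKERS, release_stuck_job and run_demo() are assumptions. Four workers are created. The worker that increments counter to 2 spins while holding the mutex, and every later worker blocks in pthread_mutex_lock(). So that the sketch terminates, the stuck job is released after hang_seconds. Attach GDB during that window to inspect the hang.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_WORKERS 4   /* plus main(): 5 threads in total */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;                     /* protected by lock */
static atomic_bool release_stuck_job = false;

/* Worker: take the lock, bump the counter, release the lock.
   The job that sets counter to 2 spins while still holding the lock,
   so every later worker blocks in pthread_mutex_lock(). */
static void *trythis(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    counter += 1;
    int job = counter;
    printf("Job %d started\n", job);
    if (job == 2) {
        while (!atomic_load(&release_stuck_job))
            ;   /* simulated hang: busy loop with the mutex held */
    }
    printf("Job %d finished\n", job);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the workers, keeps the hang alive for hang_seconds so GDB can be
   attached, then releases the stuck job and joins every worker.
   Returns the final counter value. */
int run_demo(unsigned hang_seconds)
{
    pthread_t tid[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&tid[i], NULL, trythis, NULL);
        assert(rc == 0);
    }

    sleep(hang_seconds);                    /* attach GDB during this window */
    atomic_store(&release_stuck_job, true);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

The job that reaches counter == 2 is the second worker to acquire the mutex. That is usually, but not necessarily, the second worker created. While the hang lasts, top shows the spinning thread using close to 100% of a CPU, whereas the blocked workers sleep inside pthread_mutex_lock().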
Take a look at this series of GDB commands, run while the program is hanging.
Once we find the threads that are waiting for the lock, we can debug further as described above. Here:
- Thread 1 - 32317 - main()
- Thread 2 - 32321 - created, locked the mutex, incremented counter to 1, and unlocked.
- Thread 3 - 32322 - created, locked the mutex, incremented counter to 2, and never unlocked.
- Thread 4 - 32323 - created, blocked waiting for the mutex held by 32322.
- Thread 5 - 32324 - created, blocked waiting for the mutex held by 32322.
How did I determine that 32322 never unlocked the mutex?
The backtrace listed under each thread shows that 32321 unlocked the mutex and returned from the callback function trythis(), whereas 32322 is still inside trythis() and never reaches the unlock.
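Besides reading backtraces, glibc records the owner of a locked mutex inside the mutex itself. The sketch below reads that field. It relies on __data.__owner, an internal glibc detail rather than a public API, so treat it as a debugging aid only. mutex_owner_lwp() and current_lwp() are hypothetical helper names. The stored value is the owner's LWP, the same number GDB prints. Assuming the program's mutex variable is named lock, print lock.__data.__owner in a session like the one above would be expected to show the LWP of the stuck thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel thread ID of the caller: the number GDB prints as "LWP nnnn". */
int current_lwp(void)
{
    return (int)syscall(SYS_gettid);
}

/* LWP of the thread currently holding m, or 0 if m is unlocked.
   glibc-only: __data.__owner is an internal field, not a public API.
   The GDB equivalent is:  print lock.__data.__owner
   (assuming the mutex variable is named lock). */
int mutex_owner_lwp(const pthread_mutex_t *m)
{
#if defined(__GLIBC__)
    return m->__data.__owner;
#else
    (void)m;
    return -1;   /* this libc does not expose the owner this way */
#endif
}
```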
Takeaways
- In the 'thread apply all bt' output, threads appear in reverse order of creation. Thread 1, main(), is created first, so it is listed at the bottom.
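- There are three different numbering systems: GDB's own thread number (1, 2, 3, ...), the pthread_t value shown as Thread 0x..., and the LWP number, which is the kernel thread ID (for example 32317).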
- Threads suspended on the mutex are highlighted in orange. After thread 32323 is created, it waits for 32322 to release the lock, but 32322 is stuck indefinitely and never unlocks the mutex.
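Note: The same thread can lock a mutex multiple times without deadlocking only if the mutex is recursive (PTHREAD_MUTEX_RECURSIVE, or std::recursive_mutex in C++). Relocking a default pthread mutex from the thread that already owns it is undefined behavior and, on Linux, simply deadlocks. A PTHREAD_MUTEX_ERRORCHECK mutex returns EDEADLK instead.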
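Whether a thread may lock a mutex it already holds depends on the mutex type. The sketch below is minimal and uses two hypothetical helpers, init_typed_mutex() and relock_result(). A PTHREAD_MUTEX_RECURSIVE mutex accepts the second lock and needs a matching second unlock. A PTHREAD_MUTEX_ERRORCHECK mutex rejects the second lock with EDEADLK instead of hanging. The default type is deliberately not exercised, because relocking it from the owning thread hangs on Linux.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes m with the requested type, e.g. PTHREAD_MUTEX_RECURSIVE or
   PTHREAD_MUTEX_ERRORCHECK. Returns 0 on success or an error code. */
int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Locks m twice from the calling thread and returns the result of the
   second lock. Do not pass a default (normal) mutex: it would hang here. */
int relock_result(pthread_mutex_t *m)
{
    int first = pthread_mutex_lock(m);
    assert(first == 0);
    int second = pthread_mutex_lock(m);
    if (second == 0)
        pthread_mutex_unlock(m);    /* recursive: undo the extra lock */
    pthread_mutex_unlock(m);        /* undo the first lock */
    return second;
}
```

Using PTHREAD_MUTEX_ERRORCHECK in debug builds turns a silent self-deadlock into an error code you can log.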
That's all, guys. I hope this helps.