Goroutines Hell — A Survival Guide to Panic and Deadlock
This guide covers:
- What triggers panic and deadlock
- How to recognize their symptoms and tell them apart
- Strategies to prevent and handle them
- How to debug these issues
panic
When it happens:
- The current goroutine stops immediately, the panic bubbles up the call stack, and every previously registered defer runs
- Code after the panic, and defers registered after it, never execute
- If no recover() catches it, the whole program crashes
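recover() is mentioned here but never shown; below is a minimal sketch of catching a panic inside a deferred function (safeDivide is a hypothetical helper, not from the original):

```go
package main

import "fmt"

// safeDivide turns a divide-by-zero panic into a zero result
// instead of letting it crash the whole program.
func safeDivide(a, b int) (result int) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
			result = 0 // a named return value lets the deferred func set the result
		}
	}()
	return a / b
}

func main() {
	fmt.Println(safeDivide(10, 2)) // 5
	fmt.Println(safeDivide(1, 0))  // recovers, prints 0
	fmt.Println("program continues")
}
```

Note that recover() only works when called directly inside a deferred function of the panicking goroutine; a panic in one goroutine cannot be recovered from another.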
🚫 Send data to a closed channel
code
func main() {
    defer fmt.Println("This line will work after panic.")
    ch := make(chan int)
    close(ch)
    ch <- 1 // send data to closed channel — panics here
    defer fmt.Println("This line won't work after panic.") // never registered
    fmt.Println("This line won't be executed")
}
// panic: send on closed channel
🚫 Nil pointer dereference
code
type Person struct {
    Name string
}

func main() {
    var p *Person
    fmt.Println(p.Name) // p is nil
}
// panic: runtime error: invalid memory address or nil pointer dereference
🚫 Index out of range
code
func panicExample() {
    defer fmt.Println("panic example") // registered before the panic, so it still runs
    a := []int{1, 2, 3}
    fmt.Println(a[5])
}
// panic: runtime error: index out of range [5] with length 3
🚫 Type Assertion Error
code
func main() {
    var i interface{} = "hello"
    num := i.(int) // panic: interface conversion: interface {} is string, not int
    fmt.Println(num)
}
deadlock
Your logic is bad
- Everyone is waiting on everyone else, like the BLOCKING ISSUE in your office
- The runtime detects only some deadlocks (when every goroutine is asleep); when it does, the program exits with a fatal error
⚠️ Deadlock - Mutex
Each goroutine holds one lock and tries to acquire the other's:
code
func mutexDeadlock() {
    var mutex1, mutex2 sync.Mutex
    go func() {
        mutex1.Lock()
        fmt.Println("Goroutine 1: GOT mutex1")
        time.Sleep(100 * time.Millisecond)
        fmt.Println("Goroutine 1: TRY mutex2")
        mutex2.Lock() // deadlock
        fmt.Println("Goroutine 1: GOT mutex2")
        defer mutex2.Unlock()
        defer mutex1.Unlock()
    }()
    go func() {
        mutex2.Lock()
        fmt.Println("Goroutine 2: GOT mutex2")
        time.Sleep(100 * time.Millisecond)
        fmt.Println("Goroutine 2: TRY mutex1")
        mutex1.Lock() // deadlock
        fmt.Println("Goroutine 2: GOT mutex1")
        defer mutex1.Unlock()
        defer mutex2.Unlock()
    }()
    time.Sleep(time.Second)
}
⚠️ Deadlock - Channel
code
func channelDeadlock() {
    ch := make(chan int) // no buffer
    ch <- 1              // deadlock here: no receiver
    fmt.Println(<-ch)    // never reached
}
// fatal error: all goroutines are asleep - deadlock!
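For contrast, a minimal sketch of two common ways to make the send above succeed: give the send a concurrent receiver, or give the channel a buffer.

```go
package main

import "fmt"

func main() {
	// Fix 1: move the send into its own goroutine, so the
	// main goroutine is free to receive.
	ch := make(chan int)
	go func() { ch <- 1 }()
	fmt.Println(<-ch) // 1

	// Fix 2: a buffered channel lets one send complete
	// without a waiting receiver.
	buf := make(chan int, 1)
	buf <- 1
	fmt.Println(<-buf) // 1
}
```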
Observing Mutex Deadlocks
Unlike channel deadlocks, which produce an immediate fatal error, mutex deadlocks can be subtler and harder to detect. Here are two effective ways to identify them:
Method 1: Goroutine Count Monitoring
By tracking the number of active goroutines, you can detect potential deadlocks.
If the goroutine count remains constant and elevated, it might indicate a deadlock.
code
func main() {
    go func() {
        for {
            fmt.Printf("Number of goroutines: %d\n", runtime.NumGoroutine())
            time.Sleep(time.Second)
        }
    }()
    mutexDeadlock()
    time.Sleep(100 * time.Second) // give enough time to observe
}
Method 2: Runtime Profiling with pprof
For more detailed analysis, Go's pprof tool provides comprehensive insights:
code
import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the pprof handlers on http.DefaultServeMux
    "time"
)

func main() {
    // Start the pprof server
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    mutexDeadlock()
    time.Sleep(100 * time.Second) // give enough time to observe
}
Navigate to http://localhost:6060/debug/pprof/goroutine?debug=1 in your browser to view detailed goroutine states, including:
- Locked goroutines
- Stack traces
- Lock contention points
1 @ 0x1001f7b88 0x10020bd98 0x10020bd75 0x10022bcc8 0x100249554 0x1004004a0 0x10040044d 0x100230284
# 0x10022bcc7 sync.runtime_SemacquireMutex+0x27 /usr/local/go/src/runtime/sema.go:77
# 0x100249553 sync.(*Mutex).lockSlow+0x173 /usr/local/go/src/sync/mutex.go:171
# 0x10040049f sync.(*Mutex).Lock+0x16f /usr/local/go/src/sync/mutex.go:90
# 0x10040044c main.mutexDeadlock.func2+0x11c /Users/jialinhuang/Desktop/go-websocket/main.go:47
1 @ 0x1001f7b88 0x10020bd98 0x10020bd75 0x10022bcc8 0x100249554 0x100400820 0x1004007cd 0x100230284
# 0x10022bcc7 sync.runtime_SemacquireMutex+0x27 /usr/local/go/src/runtime/sema.go:77
# 0x100249553 sync.(*Mutex).lockSlow+0x173 /usr/local/go/src/sync/mutex.go:171
# 0x10040081f sync.(*Mutex).Lock+0x16f /usr/local/go/src/sync/mutex.go:90
# 0x1004007cc main.mutexDeadlock.func1+0x11c /Users/jialinhuang/Desktop/go-websocket/main.go:35
If you want visualization:
brew install graphviz

# while `go run main.go` is still running:
go tool pprof http://localhost:6060/debug/pprof/goroutine
# this writes a compressed profile to
# /Users/{???}/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.004.pb.gz

# serve the profile you just created:
go tool pprof -http=:8080 /Users/jialinhuang/pprof/pprof.main.goroutine.004.pb.gz
# Serving web UI on http://localhost:8080
Deadlock Solutions
Deadlock Mutex Solution - same order
Goroutine 1 acquires lock1 first; goroutine 2 blocks on lock1 because goroutine 1 already holds it, and stays blocked until goroutine 1 releases it. Since both goroutines acquire the locks in the same order, the circular wait never forms.
NOT GOOD ENOUGH: This becomes hard to manage when you have many locks
code
func mutexSolution1() {
    var mutex1, mutex2 sync.Mutex
    var wg sync.WaitGroup
    wg.Add(2)
    // both goroutines lock in the same order: first mutex1, then mutex2
    for i := 0; i < 2; i++ {
        go func(id int) {
            defer wg.Done()
            mutex1.Lock()
            fmt.Printf("Goroutine %d: LOCKING mutex1\n", id)
            // some job
            time.Sleep(100 * time.Millisecond)
            fmt.Printf("Goroutine %d: TRY LOCKING mutex2\n", id)
            mutex2.Lock()
            // fmt.Printf("Goroutine %d: USING...\n", id)
            time.Sleep(50 * time.Millisecond)
            mutex2.Unlock()
            mutex1.Unlock()
        }(i)
    }
    wg.Wait()
}
Deadlock Mutex Solution - lock all
The barbaric one: acquire every lock up front.
NOT GOOD ENOUGH: This brute force approach isn't sustainable long-term
code
func mutexSolution2() {
    locks := OrderedLocks{}
    go func() {
        locks.LockBoth()
        fmt.Println("Goroutine 1: GET ALL LOCKS")
        fmt.Println("Goroutine 1: DONE")
        locks.UnlockBoth()
    }()
    go func() {
        locks.LockBoth()
        fmt.Println("Goroutine 2: GET ALL LOCKS")
        fmt.Println("Goroutine 2: DONE")
        locks.UnlockBoth()
    }()
    time.Sleep(time.Second)
}

type OrderedLocks struct {
    mutex1 sync.Mutex
    mutex2 sync.Mutex
}

func (l *OrderedLocks) LockBoth() {
    l.mutex1.Lock()
    l.mutex2.Lock()
}

func (l *OrderedLocks) UnlockBoth() {
    l.mutex2.Unlock()
    l.mutex1.Unlock()
}
Deadlock Mutex Solution - TryLock
TryLock has been available since Go 1.18.
Here we use a Context for the timeout. Don't overthink it: context is just a tool for time-limited actions.
For simple signal passing, both channels and Context work.
Choose channels for synchronization needs, and Context for time-limited work like HTTP requests.
The idea below is:
- Each goroutine first locks the mutex at its own index
- Then it tries to lock the other one
- It usually can't, because the other lock is already held by the other goroutine, so both give up
- Inside tryLock, shrinking the retry interval from time.Sleep(1 * time.Millisecond) to time.Sleep(100 * time.Microsecond) makes it much more likely that one goroutine eventually gets all the locks
code
func trylockSolution() {
    var mutexes [2]sync.Mutex
    var wg sync.WaitGroup

    tryLock := func(m *sync.Mutex, timeout time.Duration, index int) bool {
        ctx, cancel := context.WithTimeout(context.Background(), timeout)
        defer cancel()
        for {
            select {
            case <-ctx.Done():
                return false
            default:
                if m.TryLock() {
                    fmt.Printf("Goroutine %d: Lock Success\n", index)
                    return true
                }
                time.Sleep(1 * time.Millisecond) // reduce this interval to raise the success rate
            }
        }
    }

    wg.Add(2)
    for i := 0; i < 2; i++ {
        go func(i int) {
            defer wg.Done()
            // Try to lock the first mutex
            if !tryLock(&mutexes[i], 500*time.Millisecond, i+1) {
                fmt.Printf("Goroutine %d: CAN'T GET mutex%d, GIVE UP!\n", i+1, i+1)
                return
            }
            defer mutexes[i].Unlock()
            time.Sleep(100 * time.Millisecond)
            // Try to lock the second mutex (reverse order for the second goroutine)
            secondLock := (i + 1) % 2
            if !tryLock(&mutexes[secondLock], 500*time.Millisecond, i+1) {
                fmt.Printf("Goroutine %d: CAN'T GET mutex%d, GIVE UP!\n", i+1, secondLock+1)
                return
            }
            defer mutexes[secondLock].Unlock()
            fmt.Printf("Goroutine %d: GET ALL LOCKS\n", i+1)
        }(i)
    }
    wg.Wait()
}

/* Typical output:
Goroutine 1: Lock Success
Goroutine 2: Lock Success
Goroutine 1: CAN'T GET mutex2, GIVE UP!
Goroutine 2: CAN'T GET mutex1, GIVE UP!
*/

/* Occasionally:
Goroutine 2: Lock Success
Goroutine 1: Lock Success
Goroutine 2: CAN'T GET mutex1, GIVE UP!
Goroutine 1: Lock Success
Goroutine 1: GET ALL LOCKS
*/
Deadlock Channel Solution
If you prefer Go's CSP style
code
func channelSolution() {
    resource1 := make(chan struct{}, 1)
    resource2 := make(chan struct{}, 1)
    var wg sync.WaitGroup

    // one token per channel: token present = resource free
    resource1 <- struct{}{}
    resource2 <- struct{}{}

    wg.Add(2)
    go func() {
        defer wg.Done()
        select {
        case <-resource1:
            fmt.Println("Goroutine 1: GET RESOURCE 1")
            time.Sleep(100 * time.Millisecond)
            select {
            case <-resource2:
                fmt.Println("Goroutine 1: GET RESOURCE 2")
                time.Sleep(100 * time.Millisecond)
                resource2 <- struct{}{} // release resource
            default:
                fmt.Println("Goroutine 1: CAN'T GET RESOURCE 2")
            }
            resource1 <- struct{}{} // release resource
        default:
            fmt.Println("Goroutine 1: CAN'T GET RESOURCE 1")
        }
    }()
    go func() {
        defer wg.Done()
        select {
        case <-resource2:
            fmt.Println("Goroutine 2: GET RESOURCE 2")
            time.Sleep(100 * time.Millisecond)
            select {
            case <-resource1:
                fmt.Println("Goroutine 2: GET RESOURCE 1")
                time.Sleep(100 * time.Millisecond)
                resource1 <- struct{}{} // release resource
            default:
                fmt.Println("Goroutine 2: CAN'T GET RESOURCE 1")
            }
            resource2 <- struct{}{} // release resource
        default:
            fmt.Println("Goroutine 2: CAN'T GET RESOURCE 2")
        }
    }()
    wg.Wait()
}
Comparison

| Aspect    | Panic                                        | Deadlock                                |
|-----------|----------------------------------------------|-----------------------------------------|
| Timing    | Immediate                                    | May take an indefinite time to surface  |
| Recovery  | Yes, with recover()                          | No; the program must be restarted       |
| Detection | Explicit error message                       | Often requires monitoring tools         |
| Scope     | Affects a specific goroutine                 | Potentially system-wide                 |
| Debugging | Easy: obvious stack trace and error message  | Complex: requires runtime analysis      |
References
https://wangdaming.gitbooks.io/golang/content/tong_bu_lock.html