@jialin.huang

© 2024 jialin00.com

Original content since 2022


Goroutines Hell — A Survival Guide to Panic and Deadlock

This guide covers:

  • What triggers panic and deadlock
  • How to recognize their symptoms and differences
  • Strategies to prevent and handle them
  • How to debug these issues

panic

When it happens:

  • The current goroutine stops immediately, the error bubbles up the call stack, and every previously registered defer runs
    • Code after the panic, and defers registered after it, never execute
  • If no deferred recover() catches the panic, the program crashes

🚫 Send data to a closed channel

  • code
    func main() {
    	defer fmt.Println("This line will work after panic.")
    	ch := make(chan int)
    
    	close(ch)
    
    	ch <- 1 // send data to closed channel
    	defer fmt.Println("This line won't work after panic.")
    	fmt.Println("This line won't be executed")
    }
    
    // panic: send on closed channel

🚫 Nil Pointer Dereference

  • code
    type Person struct {
    	Name string
    }
    
    func main() {
    	var p *Person
    	fmt.Println(p.Name)
    }
    
    // panic: runtime error: invalid memory address or nil pointer dereference

🚫 Index Out of Range

  • code
    func panicExample() {
    	defer fmt.Println("panic example")
    
    	a := []int{1, 2, 3}
    	fmt.Println(a[5])
    }
    
    // panic: runtime error: index out of range [5] with length 3

🚫 Type Assertion Error

  • code
    func main() {
    	var i interface{} = "hello"
    	num := i.(int)
    	fmt.Println((num))
    	//panic: interface conversion: interface {} is string, not int
    }

deadlock

Your logic is bad

  • Everyone is waiting for everyone else, like that BLOCKING ISSUE at your office
  • The runtime detects only some deadlocks (when every goroutine is asleep); when it does, the program exits with a fatal error

⚠️ Deadlock - Mutex

Each goroutine takes one lock, then tries to take the other's

  • code
    func mutexDeadlock() {
    	var mutex1, mutex2 sync.Mutex
    
    	go func() {
    		mutex1.Lock()
    		fmt.Println("Goroutine 1: GOT mutex1")
    		time.Sleep(100 * time.Millisecond)
    
    		fmt.Println("Goroutine 1: TRY mutex2")
    		mutex2.Lock() // deadlock
    		fmt.Println("Goroutine 1: GOT mutex2")
    		defer mutex2.Unlock()
    		defer mutex1.Unlock()
    	}()
    
    	go func() {
    		mutex2.Lock()
    		fmt.Println("Goroutine 2: GOT mutex2")
    		time.Sleep(100 * time.Millisecond)
    
    		fmt.Println("Goroutine 2: TRY mutex1")
    		mutex1.Lock() // deadlock
    		fmt.Println("Goroutine 2: GOT mutex1")
    		defer mutex1.Unlock()
    		defer mutex2.Unlock()
    	}()
    
    	time.Sleep(time.Second)
    }

🚫 Deadlock - Channel

  • code
    func channelDeadlock() {
    	ch := make(chan int) // no buffer
    	// deadlock here
    	ch <- 1 // no receiver
    	fmt.Println(<-ch) // never been here
    }
    // fatal error: all goroutines are asleep - deadlock!

Observing Mutex Deadlocks

Unlike channel deadlocks which produce immediate fatal errors, mutex deadlocks can be more subtle and challenging to detect. Here are two effective methods for identifying mutex deadlocks:

Method 1: Goroutine Count Monitoring

By tracking the number of active goroutines, you can detect potential deadlocks.

If the goroutine count remains constant and elevated, it might indicate a deadlock.

  • code
    func main() {
    	go func() {
    		for {
    			fmt.Printf("Number of goroutines: %d\n", runtime.NumGoroutine())
    			time.Sleep(time.Second)
    		}
    	}()
    	mutexDeadlock()
    	time.Sleep(100 * time.Second) // give enough time to observe
    }

Method 2: Runtime Profiling with pprof

For more detailed analysis, Go's pprof tool provides comprehensive insights:

  • code
    import (
    	"log"
    	"net/http"
    	_ "net/http/pprof" // registers pprof handlers on the default mux
    	// ... plus whatever mutexDeadlock uses (fmt, sync, time)
    )
    
    func main() {
    	// Start pprof server
    	go func() {
    		log.Println(http.ListenAndServe("localhost:6060", nil))
    	}()
    
    	mutexDeadlock()
    	time.Sleep(100 * time.Second) // give enough time to observe
    }

Navigate to http://localhost:6060/debug/pprof/goroutine?debug=1 in your browser to view detailed goroutine states, including:

  • Locked goroutines
  • Stack traces
  • Lock contention points
1 @ 0x1001f7b88 0x10020bd98 0x10020bd75 0x10022bcc8 0x100249554 0x1004004a0 0x10040044d 0x100230284
#	0x10022bcc7	sync.runtime_SemacquireMutex+0x27	/usr/local/go/src/runtime/sema.go:77
#	0x100249553	sync.(*Mutex).lockSlow+0x173		/usr/local/go/src/sync/mutex.go:171
#	0x10040049f	sync.(*Mutex).Lock+0x16f		/usr/local/go/src/sync/mutex.go:90
#	0x10040044c	main.mutexDeadlock.func2+0x11c		/Users/jialinhuang/Desktop/go-websocket/main.go:47

1 @ 0x1001f7b88 0x10020bd98 0x10020bd75 0x10022bcc8 0x100249554 0x100400820 0x1004007cd 0x100230284
#	0x10022bcc7	sync.runtime_SemacquireMutex+0x27	/usr/local/go/src/runtime/sema.go:77
#	0x100249553	sync.(*Mutex).lockSlow+0x173		/usr/local/go/src/sync/mutex.go:171
#	0x10040081f	sync.(*Mutex).Lock+0x16f		/usr/local/go/src/sync/mutex.go:90
#	0x1004007cc	main.mutexDeadlock.func1+0x11c		/Users/jialinhuang/Desktop/go-websocket/main.go:35

If you want visualization:

brew install graphviz

# while go run main.go is still running:
go tool pprof http://localhost:6060/debug/pprof/goroutine
# generates a compressed profile under
# /Users/{???}/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.004.pb.gz

# serve the profile you just created:
go tool pprof -http=:8080 /Users/jialinhuang/pprof/pprof.main.goroutine.004.pb.gz
# Serving web UI on http://localhost:8080

Deadlock solutions

Deadlock Mutex Solution - same order

Goroutine 1 takes mutex1 first; goroutine 2 blocks because mutex1 is already held, and stays blocked until goroutine 1 releases it. Because both goroutines acquire the locks in the same order, the circular wait can never form.

NOT GOOD ENOUGH: this becomes hard to manage when you have many locks.

  • code
    func mutexSolution1() {
    	var mutex1, mutex2 sync.Mutex
    	var wg sync.WaitGroup
    
    	wg.Add(2)
    
    	// both goroutines lock in the same order: first mutex1, then mutex2
    	for i := 0; i < 2; i++ {
    		go func(id int) {
    			defer wg.Done()
    
    			mutex1.Lock()
    			fmt.Printf("Goroutine %d: LOCKING mutex1\n", id)
    
    			// some job
    			time.Sleep(100 * time.Millisecond)
    
    			fmt.Printf("Goroutine %d: TRY LOCKING mutex2\n", id)
    			mutex2.Lock()
    
    			//
    			fmt.Printf("Goroutine %d: USING...\n", id)
    			time.Sleep(50 * time.Millisecond)
    
    			mutex2.Unlock()
    			mutex1.Unlock()
    		}(i)
    	}
    
    	wg.Wait()
    }

Deadlock Mutex Solution - lock all

The barbaric one: grab every lock at once.

NOT GOOD ENOUGH: this brute-force approach isn't sustainable long-term.

  • code
    func mutexSolution2() {
    	locks := OrderedLocks{}
    
    	go func() {
    		locks.LockBoth()
    		fmt.Println("Goroutine 1: GET ALL LOCKS")
    		fmt.Println("Goroutine 1: DONE")
    		locks.UnlockBoth()
    	}()
    
    	go func() {
    		locks.LockBoth()
    		fmt.Println("Goroutine 2: GET ALL LOCKS")
    		fmt.Println("Goroutine 2: DONE")
    		locks.UnlockBoth()
    	}()
    
    	time.Sleep(time.Second)
    }
    
    type OrderedLocks struct {
    	mutex1 sync.Mutex
    	mutex2 sync.Mutex
    }
    
    func (l *OrderedLocks) LockBoth() {
    	l.mutex1.Lock()
    	l.mutex2.Lock()
    }
    
    func (l *OrderedLocks) UnlockBoth() {
    	l.mutex2.Unlock()
    	l.mutex1.Unlock()
    }

Deadlock Mutex Solution - TryLock

TryLock has been available since Go 1.18.

Here we use context for communication. Don't overthink it: context is just a tool for time-limited actions.
For simple signal passing, both channels and context work.
Choose a channel for synchronization needs, and context for time-limited work such as HTTP requests.

The idea below is:

  1. Each goroutine first locks the mutex at its own index
  2. Then tries to lock the other one
  3. Finds it can't, because the other lock is already held by the other goroutine
  4. In tryLock, shrinking the retry interval from time.Sleep(1 * time.Millisecond) to time.Sleep(100 * time.Microsecond) makes a goroutine far more likely to GET ALL LOCKS
  • code
    func trylockSolution() {
    	var mutexes [2]sync.Mutex
    	var wg sync.WaitGroup
    
    	tryLock := func(m *sync.Mutex, timeout time.Duration, index int) bool {
    		ctx, cancel := context.WithTimeout(context.Background(), timeout)
    		defer cancel()
    
    		for {
    			select {
    			case <-ctx.Done():
    				return false
    			default:
    				if m.TryLock() {
    					fmt.Printf("Goroutine %d: Lock Success\n", index)
    					return true
    				}
    				time.Sleep(1 * time.Millisecond) // reduce interval here, 
    			}
    		}
    	}
    
    	wg.Add(2)
    
    	for i := 0; i < 2; i++ {
    		go func(i int) {
    			defer wg.Done()
    
    			// Try to lock the first mutex
    			if !tryLock(&mutexes[i], 500*time.Millisecond, i+1) {
    				fmt.Printf("Goroutine %d: CAN't GET mutex%d, GIVE UP!\n", i+1, i+1)
    				return
    			}
    			defer mutexes[i].Unlock()
    
    			time.Sleep(100 * time.Millisecond)
    
    			// Try to lock the second mutex (in reverse order for the second goroutine)
    			secondLock := (i + 1) % 2
    			if !tryLock(&mutexes[secondLock], 500*time.Millisecond, i+1) {
    				fmt.Printf("Goroutine %d: CAN't GET mutex%d, GIVE UP!\n", i+1, secondLock+1)
    				return
    			}
    			defer mutexes[secondLock].Unlock()
    
    			fmt.Printf("Goroutine %d: GET ALL LOCKS\n", i+1)
    		}(i)
    	}
    
    	wg.Wait()
    }
    
    /*
    
    Goroutine 1: Lock Success
    Goroutine 2: Lock Success
    Goroutine 1: CAN't GET mutex2, GIVE UP!
    Goroutine 2: CAN't GET mutex1, GIVE UP!
    
    */
    
    
    // with a small chance, the output will be
    
    /*
    Iteration 7
    Goroutine 2: Lock Success
    Goroutine 1: Lock Success
    Goroutine 2: CAN't GET mutex1, GIVE UP!
    Goroutine 1: Lock Success
    Goroutine 1: GET ALL LOCKS
    */

Deadlock Channel Solution

If you prefer Go's CSP style

  • code
    func channelSolution() {
    	resource1 := make(chan struct{}, 1)
    	resource2 := make(chan struct{}, 1)
    	var wg sync.WaitGroup
    
    	resource1 <- struct{}{}
    	resource2 <- struct{}{}
    
    	wg.Add(2)
    
    	go func() {
    		defer wg.Done()
    
    		select {
    		case <-resource1:
    			fmt.Println("Goroutine 1: GET RESOURCE 1")
    			time.Sleep(100 * time.Millisecond)
    
    			select {
    			case <-resource2:
    				fmt.Println("Goroutine 1: GET RESOURCE 2")
    				time.Sleep(100 * time.Millisecond)
    				resource2 <- struct{}{} //release resource
    			default:
    				fmt.Println("Goroutine 1: CAN'T GET RESOURCE 2")
    			}
    
    			resource1 <- struct{}{}
    		default:
    			fmt.Println("Goroutine 1: CAN'T GET RESOURCE 1")
    		}
    	}()
    
    	go func() {
    		defer wg.Done()
    
    		select {
    		case <-resource2:
    			fmt.Println("Goroutine 2: GET RESOURCE 2")
    			time.Sleep(100 * time.Millisecond)
    
    			select {
    			case <-resource1:
    				fmt.Println("Goroutine 2: GET RESOURCE 1")
    				time.Sleep(100 * time.Millisecond)
    				resource1 <- struct{}{} // release resource
    			default:
    				fmt.Println("Goroutine 2: CAN'T GET RESOURCE 1")
    			}
    
    			resource2 <- struct{}{}
    		default:
    			fmt.Println("Goroutine 2: CAN'T GET RESOURCE 2")
    		}
    	}()
    
    	wg.Wait()
    }

Comparison

Aspect     | Panic                                  | Deadlock
-----------|----------------------------------------|----------------------------------------
Timing     | Fails immediately                      | May block silently for an unknown time
Recovery   | Yes, with recover()                    | No; restart the program
Detection  | Explicit error message                 | Often requires monitoring tools
Scope      | Affects a specific goroutine           | Potentially system-wide
Debugging  | Obvious stack trace and message; easy  | Complex; requires runtime analysis

References

https://wangdaming.gitbooks.io/golang/content/tong_bu_lock.html

https://golang.design/go-questions/channel/csp/

https://ithelp.ithome.com.tw/articles/10235172
