STM32 hardfault debugging

Programming a microcontroller is a bit different than programming on a PC. Error messages aren't nicely propagated to a terminal or GUI.

Overview

Error states are, however, reported via register values that can be used to find the source of the exception. Keeping it open source, I will be using OpenOCD with Eclipse to demo this.

The general idea is to:

Jump to a specific place in the code when the exception occurs (the HardFault handler)
Preserve the register values at the time of the exception, so we know what happened

This is done in two parts:

Assembly code in the startup file that works out which stack pointer was in use and hands the stacked exception frame, which includes the PC (Program Counter) of the code that caused the exception, to a C handler
C code in the main application that loads the various values into variables that can be easily inspected

Assembly Code

I added this code to the startup file, after the reset handler:

.section  .text.Reset_Handler
.weak  HardFault_Handler
.type  HardFault_Handler, %function
HardFault_Handler:
  movs r0,#4
  movs r1, lr
  tst r0, r1
  beq _MSP
  mrs r0, psp
  b _HALT
_MSP:
  mrs r0, msp
_HALT:
  ldr r1,[r0,#24] /* stacked PC is at byte offset 24 of the frame (offset 20 is the stacked LR) */
  b hard_fault_handler_c
  bkpt #0

.size  HardFault_Handler, .-HardFault_Handler

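This assembly checks which stack pointer was in use, MSP (Main Stack Pointer) or PSP (Process Stack Pointer), by testing bit 2 of LR, then loads the stacked PC into R1. Note that for a precise fault the stacked PC is the address of the offending instruction itself, while for an imprecise bus fault it can be one or more instructions past the offending one, so when you add your breakpoint in the assembly view window, check the preceding instructions too.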
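The stack pointer check can also be expressed in C, which may make the bit test easier to follow. This is only a sketch: frame_is_on_psp and EXC_RETURN_SPSEL_BIT are names made up for illustration, and the value passed in is assumed to be the EXC_RETURN value that LR holds on exception entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of the EXC_RETURN value held in LR on exception entry:
 * 0 = the frame was pushed on MSP, 1 = the frame was pushed on PSP.
 * (Macro name is an assumption for this sketch.) */
#define EXC_RETURN_SPSEL_BIT (1u << 2)

/* Hypothetical helper: true when the stacked frame lives on the process stack. */
static bool frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL_BIT) != 0u;
}
```

For instance, an EXC_RETURN of 0xFFFFFFFD (thread mode using PSP) has bit 2 set, while 0xFFFFFFF9 (thread mode using MSP) does not.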

The assembly hard fault handler then branches to the extended hardfault handler that is defined somewhere in the C code. R0 still holds the address of the stacked frame, so it arrives as the handler's argument.

C Code

void hard_fault_handler_c(unsigned long *hardfault_args){
  volatile unsigned long stacked_r0 ;
  volatile unsigned long stacked_r1 ;
  volatile unsigned long stacked_r2 ;
  volatile unsigned long stacked_r3 ;
  volatile unsigned long stacked_r12 ;
  volatile unsigned long stacked_lr ;
  volatile unsigned long stacked_pc ;
  volatile unsigned long stacked_psr ;
  volatile unsigned long _CFSR ;
  volatile unsigned long _HFSR ;
  volatile unsigned long _DFSR ;
  volatile unsigned long _AFSR ;
  volatile unsigned long _BFAR ;
  volatile unsigned long _MMAR ;

  stacked_r0 = ((unsigned long)hardfault_args[0]) ;
  stacked_r1 = ((unsigned long)hardfault_args[1]) ;
  stacked_r2 = ((unsigned long)hardfault_args[2]) ;
  stacked_r3 = ((unsigned long)hardfault_args[3]) ;
  stacked_r12 = ((unsigned long)hardfault_args[4]) ;
  stacked_lr = ((unsigned long)hardfault_args[5]) ;
  stacked_pc = ((unsigned long)hardfault_args[6]) ;
  stacked_psr = ((unsigned long)hardfault_args[7]) ;

  // Configurable Fault Status Register
  // Consists of MMFSR, BFSR and UFSR
  _CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;

  // Hard Fault Status Register
  _HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;

  // Debug Fault Status Register
  _DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;

  // Auxiliary Fault Status Register
  _AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;

  // Read the Fault Address Registers. These may not contain valid values.
  // Check BFARVALID/MMARVALID to see if they are valid values
  // MemManage Fault Address Register
  _MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
  // Bus Fault Address Register
  _BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;

  __asm("BKPT #0\n") ; // Break into the debugger
}

The idea here is to load all the relevant information into variables that can be easily inspected. The Fault Status and Address registers should give us a good idea of what caused the fault.
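Once _CFSR and _HFSR are captured, a few bit tests tell you which of the other values to trust. The following is a minimal sketch, not part of the original handler: the macro and function names are my own, and only a handful of the status bits are covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected CFSR bits (MMFSR = bits 7:0, BFSR = bits 15:8, UFSR = bits 31:16).
 * Macro names are assumptions chosen for this sketch. */
#define CFSR_MMARVALID   (1u << 7)   /* _MMAR holds a valid fault address */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID   (1u << 15)  /* _BFAR holds a valid fault address */

/* HFSR bit 30: a configurable fault was escalated to a HardFault. */
#define HFSR_FORCED      (1u << 30)

static bool mmar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_MMARVALID) != 0u;
}

static bool bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}

static bool is_imprecise_bus_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0u;
}

static bool is_escalated_fault(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}
```

If is_escalated_fault(_HFSR) is true, the real cause is in _CFSR; if bfar_is_valid(_CFSR) is false, ignore whatever _BFAR contains.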

This application note by Keil describes the register value meanings by bit: http://www.keil.com/appnotes/files/apnt209.pdf

This post is a great example of debugging an imprecise bus access fault, which is "most often caused by a write to an invalid address": http://chmorgan.blogspot.com/2013/06/debugging-imprecise-bus-access-fault-on.html

The stacked_pc variable holds the address of the fault-causing instruction for a precise fault (for an imprecise bus fault it may be one or more instructions later) and can be pasted into the disassembly view in Eclipse.
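The stacked values come from the eight words the core pushes on exception entry, with the PC at word index 6 (byte offset 24). As a sketch, the same layout can be written as a struct; stacked_frame_t and frame_from_words are hypothetical names, and this assumes the basic frame without the extra floating point registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame, lowest address first.
 * Assumes no floating point context was stacked. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* return address of the interrupted function */
    uint32_t pc;   /* address to inspect in the disassembly view */
    uint32_t psr;
} stacked_frame_t;

/* Hypothetical helper: copy the eight stacked words into the named fields. */
static stacked_frame_t frame_from_words(const uint32_t *words)
{
    stacked_frame_t frame = {
        words[0], words[1], words[2], words[3],
        words[4], words[5], words[6], words[7]
    };
    return frame;
}
```

With a struct like this the debugger shows one expandable variable instead of eight separate locals, at the cost of hiding the explicit indices.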

Mad props to Erich Styger over at https://mcuoneclipse.com/2012/11/24/debugging-hard-faults-on-arm-cortex-m/ who first introduced me to hardfault handling.

See my BetaFlight commit for a complete example: https://github.com/borisbstyle/betaflight/commit/d97d4dd544807ac2c6b82553c3cb77f484038eda

What if it's not a hard fault?

For example, you might have an external interrupt handler pointing to the wrong place. You can check which handler was called by tweaking the Default_Handler:

Default_Handler:
  // Load the address of the interrupt control register into r3
  ldr r3, NVIC_INT_CTRL_CONST
  // Load the value of the interrupt control register into r2 from the address held in r3
  ldr r2, [r3, #0]
  // The interrupt number is in the least significant byte - clear all other bits
  uxtb r2, r2
  // break
  bkpt #0
Infinite_Loop:
  b  Infinite_Loop
  .size  Default_Handler, .-Default_Handler
  .align 4
// address of the Interrupt Control and State Register (ICSR)
NVIC_INT_CTRL_CONST: .word 0xe000ed04

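When the breakpoint hits, r2 holds the number of the active exception. External interrupts start at exception number 16, so subtract 16 to get the IRQ number. Then search the manual for 0xe000ed04 to find the register description and the table of exception numbers to types: http://www.st.com/content/ccc/resource/technical/document/programming_manual/6c/3a/cb/e7/e4/ea/44/9b/DM00046982.pdf/files/DM00046982.pdf/jcr:content/translations/en.DM00046982.pdf.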
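The same decoding can be sketched in C if you would rather inspect a variable than a core register. The function and macro names below are assumptions for illustration. Note that the active exception field is 9 bits wide (bits 8:0), whereas the uxtb in the assembly keeps only the low 8, which is enough as long as the exception number stays below 256.

```c
#include <stdint.h>

/* The active exception number sits in bits 8:0 of the register at 0xE000ED04.
 * (Macro name is an assumption for this sketch.) */
#define ICSR_VECTACTIVE_MASK 0x1FFu

/* Hypothetical helper: extract the active exception number from the register value. */
static uint32_t active_exception(uint32_t icsr)
{
    return icsr & ICSR_VECTACTIVE_MASK;
}

/* Hypothetical helper: exception numbers 16 and up are external interrupts,
 * so IRQ n is exception n + 16. System exceptions come out negative,
 * for example HardFault (exception 3) gives -13. */
static int32_t irq_from_exception(uint32_t exception)
{
    return (int32_t)exception - 16;
}
```

A register value whose low bits read 22, for example, means exception 22 is active, which is IRQ 6 in the device's vector table.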

Other resources

The ultimate cortex-m3/m4 manual, well explained material: https://www.eecs.umich.edu/courses/eecs373/labs/refs/M3%20Guide.pdf

Programming manual from STM: http://www.st.com/content/ccc/resource/technical/document/programming_manual/6c/3a/cb/e7/e4/ea/44/9b/DM00046982.pdf/files/DM00046982.pdf/jcr:content/translations/en.DM00046982.pdf

Fault handler example: https://github.com/cvra/arm-cortex-tools/blob/master/fault.c

Great summary of registers: http://www.keil.com/appnotes/files/apnt209.pdf