There are some errors in these tables. See the trace outputs in the Example Analyses section below for details.
The cycle counts shown for instructions in PM0044 section 7 are one less than the actual counts because the first decode cycle of an instruction normally overlaps with the last execution cycle of the preceding instruction.
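The same statement can be written as arithmetic. This assumes straight-line code, ignores stalls and the initial fetch, and uses c_i for the PM0044 count of the i-th of n instructions. Each instruction on its own takes c_i + 1 cycles, and each of the n - 1 boundaries between instructions saves one overlapped cycle:

```latex
T = \sum_{i=1}^{n} (c_i + 1) - (n - 1) = 1 + \sum_{i=1}^{n} c_i
```

For example, under this idealized model two instructions documented as 2 and 1 cycles take 1 + 2 + 1 = 4 cycles.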
If timing is important in your application, stall cycles can be reported as error/warning events:
0> show error
Error: non-classified [on/ON]
[...]
Error: stm8 [off/OFF]
Warning: pipeline [unset/OFF]
Warning: decode_stall [unset/OFF]
Warning: fetch_stall [unset/OFF]
[...]
These are off by default but may be enabled as required, either as a group:
0> set error pipeline
or individually:
0> set error decode_stall on
0> set error fetch_stall on
The simulator can generate detailed analyses of execution, showing the timing of each instruction executed, including pipeline overlaps and stalls. This is controlled via the pipetrace feature of the STM8 CPU module. The output is a self-contained HTML document that can be opened in a browser or incorporated into other documentation.
To generate a pipeline analysis:
0> set hw cpu pipetrace title "..."
0> set hw cpu pipetrace style "url"
0> set hw cpu pipetrace start "path"
0> set hw cpu pipetrace fold [on|off]
0> set hw cpu pipetrace pause
0> set hw cpu pipetrace data "text"
0> set hw cpu pipetrace resume
0> set hw cpu pipetrace stop
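As a rough illustration of how these commands fit together, the Python sketch below assembles them into a list of console commands in the order listed above. The helper name, title, stylesheet URL and output path are hypothetical placeholders. The idea that `data` is issued while tracing is paused (to insert a note into the trace) is an assumption inferred from the listing, not documented behaviour. The commands that actually run the program under test are not generated.

```python
def pipetrace_commands(title, style_url, path, note=None, fold=True):
    """Build uCsim console commands for one pipetrace session (hypothetical helper)."""
    cmds = [
        f'set hw cpu pipetrace title "{title}"',
        f'set hw cpu pipetrace style "{style_url}"',
        f'set hw cpu pipetrace start "{path}"',
        f'set hw cpu pipetrace fold {"on" if fold else "off"}',
    ]
    # (The program under test would be run at this point.)
    if note is not None:
        # Assumption: pause tracing, insert a note, then resume.
        cmds += [
            "set hw cpu pipetrace pause",
            f'set hw cpu pipetrace data "{note}"',
            "set hw cpu pipetrace resume",
        ]
        # (More of the program would be run at this point.)
    cmds.append("set hw cpu pipetrace stop")
    return cmds


for line in pipetrace_commands("DIV timing", "pipetrace.css", "div.html"):
    print(line)
```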
The example analyses here are taken from the examples in ST's “PM0044 Programming Manual”, section “5.3 Pipelined execution examples”, and were generated by the test stm8.src/test/stm8-cycles/test.asm using the “pipetrace” functionality described above.
Note that some of the examples in section 5.3 contain errors. These are noted in the output below, and the differences have been confirmed on hardware.
The DIV instruction is special in that it takes a variable number of cycles and is interruptible.
Other instructions are each run individually, starting from an empty pipeline, to show the overlap with the following instruction.
Actual cycle counts may be obtained from hardware for comparison using a combination of stm8-gdb, openocd and an STLink or other openocd/SWIM-compatible debugger. Set the master and CPU clocks to the same frequency and use one of the target's timers to count cycles.
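Turning a timer reading into CPU cycles is simple arithmetic. The Python sketch below assumes a 16-bit up-counting timer such as TIM2, whose counter clock is f_MASTER divided by 2^PSCR, and a CPU clock of f_MASTER divided by 2^CPUDIV (the CPUDIV field of CLK_CKDIVR). The function name and parameters are illustrative only. With both dividers at zero, as in the gdb example that follows, one timer tick is one CPU cycle.

```python
def timer_to_cpu_cycles(start, end, pscr=0, cpudiv=0):
    """Convert two 16-bit timer readings into elapsed CPU cycles.

    Assumes the timer counts up at f_MASTER / 2**pscr and the CPU runs at
    f_MASTER / 2**cpudiv. Intervals of 2**16 ticks or more cannot be
    distinguished from shorter ones because the counter wraps.
    """
    ticks = (end - start) & 0xFFFF   # elapsed timer ticks modulo 2**16
    master_cycles = ticks << pscr    # each tick spans 2**pscr master clocks
    return master_cycles >> cpudiv   # CPU cycles, rounded down if not exact


print(timer_to_cpu_cycles(0xFFF0, 0x0002))  # counter wrapped once
```

With a non-zero CPUDIV a timer tick is only a fraction of a CPU cycle and the result is rounded down. With a non-zero PSCR each tick covers several cycles and resolution is lost. Keeping both at zero gives exact, cycle-resolution counts.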
For instance:
$ openocd -f interface/stlink.cfg -f target/stm8s003.cfg &
$ stm8-gdb
[...]
(gdb) target extended-remote :3333
(gdb) set $DM_CSR2 = 0x7f99
(gdb) set $DM_ENFCTR = 0x7f9a
(gdb) set $CLK_CKDIVR = 0x50c6
(gdb) set $CLK_PCKENR1 = 0x50c7
(gdb) set $TIM2_CR1 = 0x5300
(gdb) set $TIM2_EGR = 0x5306
(gdb) set $TIM2_CNTR = 0x530c
(gdb) set $TIM2_CNTRH = 0x530c
(gdb) set $TIM2_CNTRL = 0x530d
(gdb) set $TIM2_PSCR = 0x530e
(gdb) define cycles
  dont-repeat
  # Freeze TIM2 when CPU is stalled by DM
  set {unsigned char}$DM_ENFCTR = 0xfd
  # Set HSIDIV = 0, CPUDIV = 0
  set {unsigned char}$CLK_CKDIVR = 0x00
  # Set TIM2 prescaler to 0 so f_CK_CNT matches f_MASTER (and hence f_CPU)
  set {unsigned char}$TIM2_PSCR = 0x00
  # Clear count and update config
  set {unsigned char}$TIM2_EGR = 1
  set {unsigned char}$TIM2_CNTRH = 0xff
  set {unsigned char}$TIM2_CNTRL = 0xff
  # Enable counter
  set {unsigned char}$TIM2_CR1 = 0x01
  # Enable clock gate
  set {unsigned char}$CLK_PCKENR1 = 0x20
  # Set PC
  # N.B. Do not attempt to flush the decoder by writing to DM_CSR2. It upsets
  # openocd which is then unable to set breakpoints.
  set $pc = $arg0
  #set {unsigned char}$DM_CSR2 = 0x81
  # Set a HW breakpoint, run, then clear
  monitor bp $arg1 1 hw
  cont
  monitor rbp $arg1
  # Read the 16-bit counter (TIM2_CNTRH:TIM2_CNTRL)
  set $_tmp = {unsigned short}$TIM2_CNTR
  disass/r $arg0,$arg1
  printf "%u cycles\n", $_tmp
end
(gdb) document cycles
Set PC to the first address, set a HW break at the second address, run and
report how many cycles (as reported by $TIM2_CNTR) it took.
The target is assumed to be halted initially.
end
(gdb) monitor reset halt
target halted due to debug-request, pc: 0x00008000
(gdb) x/3i 0x811c
   0x811c:  ldw X,#0xfc00  ;0xfc00
   0x811f:  ld A,#0x80     ;0x80
   0x8121:  div X,A
(gdb) cycles 0x811c 0x8122
target halted due to debug-request, pc: 0x00008000
breakpoint set at 0x00008122

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00008122 in ?? ()
Dump of assembler code from 0x811c to 0x8122:
   0x0000811c:  ae fc 00  ldw X,#0xfc00  ;0xfc00
   0x0000811f:  a6 80     ld A,#0x80     ;0x80
   0x00008121:  62        div X,A
End of assembler dump.
14 cycles
Don't forget that there will be an initial pipeline fetch cycle before the first instruction can be decoded; that there may be stall cycles; that consecutive instructions (mostly) overlap by one cycle, which is assumed in the timings given by PM0044; and that any interrupts which could fire during the measurement should be disabled.
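When comparing a hardware measurement with the PM0044 figures, it can help to subtract the cycles that are accounted for and look at what remains. The Python sketch below assumes straight-line code with the usual one-cycle overlap between consecutive instructions. Under that assumption, n instructions with documented counts c_1..c_n take 1 + sum(c_i) cycles once decoding starts, plus the initial fetch cycle. Anything left over points to stalls, interrupts or measurement overhead. The function names are hypothetical, and the single fetch cycle is a default taken from the text above rather than a measured figure.

```python
def expected_cycles(documented, initial_fetch=1):
    """Idealized cycle count for straight-line code from PM0044 per-instruction counts."""
    if not documented:
        return 0
    # The first instruction's decode cycle has nothing to overlap with (+1);
    # every later decode overlaps the previous instruction's last execute cycle.
    return initial_fetch + 1 + sum(documented)


def unexplained_cycles(measured, documented, initial_fetch=1):
    """Cycles in a measurement not covered by the idealized model (stalls, interrupts, ...)."""
    return measured - expected_cycles(documented, initial_fetch)


# Hypothetical measurement of two instructions documented as 2 and 1 cycles.
print(unexplained_cycles(7, [2, 1]))
```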