Selecting a Microcontroller: Part 4 – I/O Throughput
Dennis Cecic, P. Eng. (d.cecic@ieee.org)
Senior Member, IEEE Toronto Section
CPU performance is only one aspect of MCU performance. In this article, we will discuss how to get your data on/off the chip, and compare the I/O throughput capability of our 3 hero MCUs (PIC16F19197, PIC24FJ1024GA606, PIC32MZ1024EFH064).
What are Peripherals?
Peripherals are dedicated hardware circuits, separate from the CPU, running independently. They off-load work from the CPU, simplifying the program. Common peripherals include:
- Timers
- Communication Interfaces (SPI, I2C, UART)
- Analog Interfaces (ADC, DAC, Comparators)
- Signal Capture and Generation (Input Capture, Output Compare, PWM)
There are also many specialized peripherals for different applications:
- Displays (Graphical/LCD)
- Cryptographic coprocessors
The Microchip Advanced Part Selector (MAPS) can be used to filter MCU devices based on specific peripheral hardware requirements:
Peripheral Interfacing
The first step in interfacing with a peripheral is to set it up. Your main application code configures the peripheral’s functionality, then enables it. The peripheral then runs on its own without any further program intervention.
Next, your main application needs to exchange data with the peripheral while it’s running. There are 3 methods for doing this:
Polled I/O
This method has your main application code periodically/continuously check (“poll”) the peripheral hardware device to see if it is ready for data transfer. The following example triggers an ADC conversion, then spins/waits for the conversion to be completed before proceeding:
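A minimal sketch of that polling pattern is shown here. The register names (`adc_go`, `adc_done`, `adc_result`) are hypothetical and are simulated as plain variables so the code builds on a host; on a real target they would be the device's special function registers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical ADC registers, simulated as plain variables so the sketch
   builds on a host. On real hardware these would be memory-mapped SFRs. */
volatile bool     adc_go;      /* write 1 to start a conversion        */
volatile bool     adc_done;    /* set by hardware when result is ready */
volatile uint16_t adc_result;  /* conversion result                    */

/* Stand-in for the ADC hardware: completes after a few polls. */
static unsigned sim_polls_left;
static void sim_adc_tick(void)
{
    if (adc_go && sim_polls_left > 0 && --sim_polls_left == 0) {
        adc_result = 0x1A5;    /* arbitrary simulated sample */
        adc_done = true;
        adc_go = false;
    }
}

/* Polled read: trigger a conversion, then spin until the done flag is set.
   The number of loop iterations spent waiting is returned through *polls. */
uint16_t adc_read_polled(unsigned *polls)
{
    *polls = 0;
    adc_done = false;
    sim_polls_left = 5;        /* assumed conversion time, in poll iterations */
    adc_go = true;             /* trigger the conversion */

    while (!adc_done) {        /* the CPU does no useful work in this loop */
        sim_adc_tick();        /* simulation only; real code just tests the flag */
        (*polls)++;
    }
    return adc_result;
}
```

Every iteration counted in `*polls` is CPU time that was not available to the rest of the application.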
This is very inefficient, as any time the CPU spends polling is taken away from your main application. The polling rate is also hard to get right: the code may check far more often than necessary, or not often enough to keep up with the data.
Interrupt-Driven I/O
In this method, the peripheral hardware “interrupts” the CPU when it is ready for data transfer. This is very efficient as the CPU can run your application code while waiting for the data transfer to complete.
Interrupt-generation logic pauses the main execution thread in hardware, saves the return address, and loads the CPU’s program counter (PC) with the address of a special user-written handler function, called an interrupt service routine (ISR). Control then transfers to that function. Once the ISR has finished servicing the peripheral that caused the interrupt, the main execution context is restored and the main thread resumes where it left off.
The following diagram depicts the general flow of interrupt processing:
Refer to the user manual for your specific compiler to review the syntax for writing ISR functions.
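Independent of compiler syntax, the usual structure is a short handler that moves the data and sets a flag, with the main loop consuming the data later. The sketch below assumes a hypothetical UART receive register and interrupt flag, simulated as plain variables; `sim_uart_receive()` stands in for the hardware vectoring to the handler, and the compiler-specific interrupt attribute is deliberately left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical UART receive register and interrupt flag (simulated). */
volatile uint8_t uart_rx_reg;
volatile bool    uart_rx_flag;

/* Data shared between the ISR and the main loop must be volatile. */
volatile uint8_t rx_byte;
volatile bool    rx_ready;

/* Handler body. On a real target this function would carry the
   compiler-specific interrupt attribute or pragma. */
void uart_rx_isr(void)
{
    rx_byte = uart_rx_reg;     /* service the peripheral: read the data */
    uart_rx_flag = false;      /* clear the interrupt request           */
    rx_ready = true;           /* signal the main loop                  */
}

/* Simulation only: the "hardware" receives a byte and vectors to the ISR. */
void sim_uart_receive(uint8_t b)
{
    uart_rx_reg = b;
    uart_rx_flag = true;
    uart_rx_isr();
}

/* Called from the main loop: returns true and stores the byte if one arrived. */
bool main_loop_poll_rx(uint8_t *out)
{
    if (!rx_ready) {
        return false;
    }
    *out = rx_byte;
    rx_ready = false;
    return true;
}
```

Keeping the handler this short limits the time spent away from the main thread; any heavier processing of the byte happens in the main loop.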
Direct Memory Access (DMA) I/O
Using Interrupt-Driven I/O at a high interrupt rate can still use significant CPU time, due to the overhead of saving and restoring the CPU context for every byte transferred in/out of the MCU.
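The cost can be estimated as the interrupt rate multiplied by the cycles spent per interrupt, divided by the instruction frequency (Fcyc). A minimal sketch of that arithmetic follows; the function name and the figures in the usage note are illustrative assumptions, not measurements.

```c
/* Percentage of CPU time consumed by interrupt processing.
   cycles_per_irq is assumed to cover entry latency, the handler body,
   and the context restore on exit. */
double isr_cpu_load_percent(double irq_rate_hz, double cycles_per_irq,
                            double fcyc_hz)
{
    return 100.0 * irq_rate_hz * cycles_per_irq / fcyc_hz;
}
```

For example, with assumed figures of 100,000 interrupts per second (one per byte) and 40 cycles per interrupt at Fcyc = 16 MHz, `isr_cpu_load_percent(100e3, 40, 16e6)` gives 25, i.e. a quarter of the CPU's time goes to interrupt handling.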
There is another approach, called Direct Memory Access (DMA), which reduces the load on the CPU by reducing the number of interrupts for simple block data transfers. A few things DMA can do for us are:
- Transfer data from memory-to-memory
- Transfer data between memory and peripherals
- Automatically calculate the CRC checksum for a transfer
- Stop a transfer on a pattern match (a specific character or word)
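To make the interrupt saving concrete, here is a small software model of one DMA block transfer with pattern-match termination. The `dma_channel_t` descriptor and its field names are hypothetical and do not correspond to any specific device's registers; real DMA hardware performs the copy without CPU involvement.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical DMA channel descriptor (illustrative field names only). */
typedef struct {
    const uint8_t *src;          /* source address                       */
    uint8_t       *dst;          /* destination address                  */
    size_t         length;       /* maximum number of bytes to move      */
    bool           match_enable; /* stop early on a pattern match        */
    uint8_t        match_byte;   /* terminating character                */
    size_t         transferred;  /* out: bytes actually moved            */
    unsigned       irq_count;    /* out: interrupts raised to the CPU    */
} dma_channel_t;

/* Software model of one block transfer. The loop stands in for what the
   DMA hardware does on its own; the CPU sees only the final interrupt. */
void dma_run(dma_channel_t *ch)
{
    ch->transferred = 0;
    for (size_t i = 0; i < ch->length; i++) {
        ch->dst[i] = ch->src[i];
        ch->transferred++;
        if (ch->match_enable && ch->src[i] == ch->match_byte) {
            break;               /* pattern match ends the transfer */
        }
    }
    ch->irq_count++;             /* one block-complete interrupt */
}
```

In this model, moving an 11-byte message raises one interrupt instead of eleven, and enabling a match on a line-feed character ends the transfer as soon as a complete line has been received.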
The following table indicates DMA hardware availability in our 3 hero MCU devices:
Refer to the following documents for specific DMA functionality:
Benchmarking Interrupt Performance
As discussed above, interrupt processing requires hardware and software intervention (and resources) in order to save/restore the CPU context between the main code and the interrupt handler.
As shown in Fig. 5, a key performance attribute for interrupt sub-systems is entry latency – the delay between assertion of an interrupt condition and the execution of the first instruction in the interrupt handler. It consists of a hardware component (the time taken for the interrupt logic to save the return address and point the PC to the ISR) and a software component (the time taken to perform Context Save in the ISR).
High performance interrupt subsystems incorporate the following features to minimize entry latency:
- Dedicated interrupt vector (and associated interrupt handler) for each peripheral
- Dedicated CPU “shadow” register sets, which reduce the need for handler code to stack/unstack CPU registers
- Adjustable interrupt priority
The following table summarizes the key interrupt hardware features and measured entry latency of our 3 hero MCUs:
Comments (Entry Latency value in Fig. 6):
- The “best case” interrupt configuration is used for the Entry Latency measurement, as described below.
- PIC16F Target: Only 1 peripheral interrupt is enabled. Additional interrupts will add to this latency, since your interrupt handler will need to poll each enabled hardware resource to find the one that actually caused the interrupt.
- PIC24F and PIC32MZ Targets: Shadow register usage is enabled to reduce entry latency.
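The polling described for the PIC16F target can be sketched as a single shared handler that tests each enabled source in turn. The enable/flag variables below are hypothetical stand-ins for the device's interrupt enable and flag bits, and the counters exist only to make the behaviour observable.

```c
#include <stdbool.h>

/* Hypothetical interrupt enable (ie) and flag (if) bits, simulated. */
volatile bool tmr_ie,  tmr_if;
volatile bool adc_ie,  adc_if;
volatile bool uart_ie, uart_if;

/* Counters standing in for the real per-peripheral servicing code. */
unsigned tmr_count, adc_count, uart_count;

/* Single shared interrupt vector: every enabled source must be checked,
   so sources tested later in the chain see a longer effective latency. */
void shared_isr(void)
{
    if (tmr_ie && tmr_if) {
        tmr_if = false;          /* clear the request */
        tmr_count++;
    }
    if (adc_ie && adc_if) {
        adc_if = false;
        adc_count++;
    }
    if (uart_ie && uart_if) {
        uart_if = false;
        uart_count++;
    }
}
```

With a dedicated vector per peripheral, as listed among the high-performance features above, this chain of tests is unnecessary because the hardware selects the handler directly.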
Entry Latency Examples (Disassembly Listings)
The following disassembly listings highlight the additional context save overhead of large CPU register sets, such as those found in PIC32 MCUs:
The next disassembly listing shows the entry latency for the PIC24F target:
And finally, this disassembly listing shows the entry latency for the PIC16F1 target:
Observation
Hmmm… a PIC32MZ running at the same instruction frequency (Fcyc) as a PIC16F1 will be slower to respond to interrupts!
Make sure to run PIC32MZ as fast as possible to get the most out of this MCU!
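The reasoning is simple arithmetic: entry latency in time equals entry latency in instruction cycles divided by Fcyc. A minimal sketch follows; the function name is an assumption, and the cycle counts in the usage note are placeholders, not the measured values from Fig. 6.

```c
/* Convert an entry latency in instruction cycles to nanoseconds. */
double latency_ns(unsigned cycles, double fcyc_hz)
{
    return 1e9 * (double)cycles / fcyc_hz;
}
```

With placeholder counts, a 10-cycle entry at Fcyc = 8 MHz takes 1250 ns and a 40-cycle entry at the same Fcyc takes 5000 ns, while the same 40 cycles at an assumed Fcyc of 200 MHz take only 200 ns.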