Running Hard Real-Time Applications and Linux on PolarFire SoC

This article discusses the RISC-V-based SoC FPGA architecture for PolarFire SoC, which allows hard real-time applications and Linux applications to coexist.

Industry Article March 26, 2019 by Tim Morin, Microchip

Real-time Linux is eye-catching, but what exactly does it mean? A real-time system, in its simplest form, is one that executes deterministically on a periodic basis. Determinism is a first-order requirement for real-time systems because they generally control machines. You don’t want your numerically controlled drill press to move from point A to point B in 10 milliseconds (ms) on Tuesday and perform the same operation in 20 ms on Wednesday. Likewise, a pilot’s flight control system should control the flight surfaces in exactly the same way, every time, under all conditions.

Figure 1 illustrates a deterministic system. Periodic interrupts fire and the interrupt service routine (ISR) handles time-critical code. The execution time of that code must be deterministic, lest you end up with a system that behaves as in Figure 2, where updates to hardware occur at unpredictable times.

Figure 1. Example of deterministic execution

Figure 2. Variable Interrupt Service Routine (ISR) execution times
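
One way to put a number on the difference between Figure 1 and Figure 2 is to record how many cycles each ISR invocation takes and look at the spread. The following is a minimal sketch in C, assuming the samples have already been captured by some cycle counter; the names `isr_stats_t` and `isr_stats` are hypothetical and not part of any PolarFire SoC library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of recorded ISR execution times, in CPU cycles (hypothetical type). */
typedef struct {
    uint64_t min_cycles;
    uint64_t max_cycles;
    uint64_t jitter_cycles; /* max - min; 0 means every run took the same time */
} isr_stats_t;

/* Compute min, max, and peak-to-peak jitter over n recorded samples. */
isr_stats_t isr_stats(const uint64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    isr_stats_t s = { samples[0], samples[0], 0 };
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < s.min_cycles) {
            s.min_cycles = samples[i];
        }
        if (samples[i] > s.max_cycles) {
            s.max_cycles = samples[i];
        }
    }
    s.jitter_cycles = s.max_cycles - s.min_cycles;
    return s;
}
```

A system like the one in Figure 1 would report a jitter of zero (or very close to it), while the system in Figure 2 would report a jitter comparable to the ISR execution time itself.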

There is also a need to bring the richness of Linux and all the associated middleware to hardware-controlled systems. Linux requires a Memory Management Unit (MMU) to virtualize physical memory for the application developer. Processors that embed an MMU also include an L1 cache at a minimum, and in most cases an L2 cache. Caches and determinism are at odds with each other, as indicated in Figure 3. Here, we can see that L1 or L2 misses introduce execution jitter by stalling the execution pipeline while cache lines are filled. Larger caches can reduce the frequency of cache misses but do not remove them entirely.

Figure 3. L1 and L2 Cache misses affecting determinism
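
The effect in Figure 3 can be reproduced with a toy model. The sketch below assumes a tiny direct-mapped cache with made-up line size and latencies (none of these numbers are PolarFire SoC values, and all names are hypothetical). It shows that the same sequence of accesses costs a different number of cycles depending on what the cache held beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All sizes and latencies are illustrative, not PolarFire SoC values. */
#define LINE_BYTES  64u
#define NUM_LINES   8u
#define HIT_CYCLES  1u
#define MISS_CYCLES 20u

typedef struct {
    bool     valid[NUM_LINES];
    uint64_t line[NUM_LINES];   /* which memory line each slot currently holds */
} toy_cache_t;

/* Invalidate every slot so the next access to any line misses. */
void toy_cache_reset(toy_cache_t *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) {
        c->valid[i] = false;
    }
}

/* Perform one access, fill the slot on a miss, and return the cycle cost. */
unsigned toy_cache_access(toy_cache_t *c, uint64_t addr)
{
    assert(c != NULL);
    uint64_t line = addr / LINE_BYTES;
    size_t slot = (size_t)(line % NUM_LINES);

    if (c->valid[slot] && c->line[slot] == line) {
        return HIT_CYCLES;
    }
    c->valid[slot] = true;
    c->line[slot] = line;
    return MISS_CYCLES;
}

/* Touch n_lines consecutive lines starting at base; return total cycles. */
unsigned toy_run(toy_cache_t *c, uint64_t base, size_t n_lines)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < n_lines; i++) {
        cycles += toy_cache_access(c, base + i * LINE_BYTES);
    }
    return cycles;
}
```

In this model, a four-line routine costs 80 cycles on a cold cache and 4 cycles when warm. If other code touches addresses that map to the same slots in between, the routine is back to 80 cycles. That swing is the jitter a memory with fixed access latency avoids.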

In processors that can run Linux, an additional source of execution jitter is the branch predictor. Processor cores include a branch predictor to increase application-level performance. Regardless of the implementation, branches are predicted and sometimes missed.

When the miss occurs, the pipeline gets flushed. Misses lead to non-deterministic execution behavior. During an Interrupt Service Routine (ISR), branch history tables used in the predictor have a history of branches that are germane to the execution history of the main application code, not the execution history of the ISR itself. This will result in pipeline flushes within the ISR, leading to variable execution time from ISR to ISR.

Using a processor that allows the user to disable the branch predictor gives the application developer control over where and how determinism is applied in the system. For application-wide determinism, you can disable the branch predictors completely. Of course, branch predictors are put in place to increase performance, so turning them off will reduce performance.

The RISC-V PolarFire SoC FPGA Architecture

There are processors that can run Linux but can’t execute code deterministically, and there are processors that can execute code deterministically but can’t run Linux. Wouldn’t it be nice to have an architecture in your embedded toolkit that can support both? Microchip recently announced a RISC-V-based SoC FPGA architecture for PolarFire SoC that does exactly this.

The architecture shown in Figure 4 contains four 64-bit RV64GC RISC-V cores capable of running Linux, and one core (RV64IMAC) that cannot run Linux. The reason is the MMU: the four RV64GC cores contain one, and the RV64IMAC does not.

Figure 4. PolarFire SoC architecture

The instruction set difference between the RV64IMAC and the RV64GC is simple: the RV64GC adds a double-precision floating-point unit. To increase the level of determinism within the architecture, the user can turn off the branch predictor in any core, either after power-up or during an ISR. Additionally, in-order pipelines were chosen for all five cores to increase determinism and to avoid the Spectre and Meltdown attacks that affect out-of-order machines.

So far, we’ve only discussed determinism as it relates to CPU cores. Code needs to execute from memory, so let’s discuss the memory subsystem in PolarFire SoC. First, the entire memory space in PolarFire SoC is coherent.

Coherency here means that any memory holding multiple copies of data is managed by the coherency manager. Any memory that holds only a single copy of data is coherent by its very nature, as no other copies exist in the memory hierarchy. PolarFire SoC has three memory subsystems: L1, L2, and L3. The L3 memory subsystem integrates a hardened 36-bit LPDDR3/LPDDR4 and DDR3/DDR4 controller. The extra 4 bits add Single Error Correct Double Error Detect (SECDED) protection to the external L3 memory subsystem.

L1 Memory Subsystem

The four RV64GC application cores each have an 8-way set-associative, 32 KB I$TIM and an 8-way set-associative, 32 KB D$TIM. I$ denotes an instruction cache, D$ denotes a data cache, and TIM stands for Tightly Integrated Memory.

The I$TIM and the D$TIM are user configurable, with the requirement that there must always be one cache way for each of the I$TIM and D$TIM. The RV64IMAC Monitor core has a 16 KB two-way set-associative I$TIM and an 8 KB DTIM. The DTIM is a data scratchpad memory that code can execute from. All L1 TIM functionality provides low-latency deterministic access and is Single Error Correct Double Error Detect (SECDED) capable.

L2 Memory Subsystem

The L2 memory subsystem is 2 MB in size with SECDED capability and can be configured into three different modes: a 16-way set-associative cache, a Loosely Integrated Memory (LIM), and a scratchpad memory. LIM memory can be pinned to a processor and can be sized in cache ways - in other words, LIMs can be constructed in 128 KB chunks (ways) and assigned exclusive access to a processor.

Configured as a LIM, the L2 memory subsystem provides deterministic access to the core it is pinned to and is coherent, as no other copies are shared with the L1 and L3 memory subsystems. LIM works well for deterministic code execution in both the main application and ISRs. Figure 5 illustrates a deterministic system when the L2 memory subsystem is configured as a LIM and the L1s are configured as TIMs.

Figure 5. Deterministic execution with LIMs and TIMs

Unfortunately, because branch predictors mispredict, ISR execution time variability still exists even if the L2 is configured as a LIM. Figure 6 shows an application executing when the L1 is configured as a TIM and the L2 is configured as a LIM. The horizontal axis indicates interrupts, and the vertical axis indicates the cycle time within the ISR. As you can see, the execution time of the ISR varies over time.

Figure 6. Branch Predictor effect on determinism

Figure 7 shows the determinism we were after, achieved by turning off the branch predictors.

Figure 7. Deterministic behavior

Like the LIM, scratchpad memory can be configured in 128 KB chunks (ways) and assigned to CPU cores. Scratchpad memory works well as a shared memory resource between the processor executing code from the LIM and processors executing code from the L1/L2 and L3 memory subsystem (typically Linux). If the RV64IMAC application writes data to the scratchpad, and a copy of that memory location exists elsewhere in the L1/L2/L3 memory subsystem, the coherency manager will guarantee coherency. In this way, a real-time application can share data coherently with an application running in user space on Linux.

Figure 8 is one possible configuration of the PolarFire SoC Microprocessor Subsystem. In this configuration, the RV64IMAC serves up the real-time function while the RV64GCs run Linux. If your real-time function needs floating point performance, the RV64GC could serve that purpose because the branch predictors can be turned off, and the L1 memory subsystem can be configured as a TIM.

Figure 8. Coherent message passing

PolarFire SoC Allows Hard Real-Time and Linux Applications to Coexist

Determinism is a crucial requirement for real-time systems. However, the market has many processors that can run Linux but can’t execute code deterministically and others that can execute code deterministically but can’t run Linux. PolarFire SoC has a unique, flexible memory subsystem that enables hard real-time applications and Linux applications to coexist coherently.