Overview

The T-Head XuanTie E907 is a fully synthesizable, high-end, microcontroller-class processor that is compatible with the RISC-V RV32IMA[F][D]C[P] ISA. It delivers considerable integer and enhanced, energy-efficient floating-point compute performance. Key features of the E907 include a single/double-precision FPU, a deeply optimized DSP execution unit with the CSI-DSP lib, and fast interrupt response.

 

Feature                             Description
Architecture                        RV32IMA[F][D]C[P]
Pipeline                            5-stage (integer)
Main interface                      AXI4.0 64-bit master port
Peripheral interface                AHB5 32-bit master port
FPU                                 Architecture suited to double-precision floating point
DSP                                 Enhanced, deeply optimized DSP unit with CSI-DSP lib, compliant with the v0.9.4 P-extension spec
Hybrid Branch Predictor             BHT/BTB/RAS
Instruction cache                   Up to 32KB
Data cache                          Up to 32KB
Interrupts                          Up to 240 interrupts + non-maskable interrupt (NMI)
Hardware Performance Monitor (HPM)  HPM for performance profiling
XuanTie Extensions                  XuanTie MCU Enhanced Extensions, including interrupt acceleration technology to reduce response latency and an enhanced ISA to improve instruction set performance
Sleep modes                         Sleep and deep sleep modes
RISC-V Debug                        Three levels of hardware configuration

 

Processor Overview

The E907 processor adopts a mixed 16/32-bit instruction set and implements a classic five-stage integer pipeline. It can also be configured with single-precision floating point, with both single- and double-precision floating point, or with the DSP ISA. The processor offers high floating-point compute performance, enhanced ISA extensions and extended fast interrupt handling. The E907 is designed for high-end MCU/MPU applications whose target frequency falls in the 400MHz-1GHz range.

 

Floating Point Unit (FPU)

Oriented towards the motor and navigation domains, the E907 processor implements a powerful FPU to accelerate their algorithms. The FPU has the following features:

 

  • Compliant with RISC-V RV32F and RV32D
  • Compliant with the IEEE-754 standard
  • Dedicated double-precision floating-point unit when configured with RV32D; single precision reuses the same pipeline
  • 64-bit data width for double-precision accesses to the data cache

 

DSP

The DSP execution unit in the E907 is compliant with v0.9.4 of the P-extension spec. It includes operations such as 8/16-bit SIMD multiply and multiply-accumulate with 32/64-bit data, which are key to accelerating signal-processing and filter algorithms such as FFT, FIR and IIR, and AI arithmetic such as matrix and vector multiplication.

 

The DSP execution unit can make full use of the 32 integer GPRs in the E907, giving software enough resources for optimization. Further, the E907 optimizes the micro-architecture to reduce execution latency and adopts hybrid branch prediction to lower the misprediction ratio. As a result, the CPI of key DSP lib routines is close to 1. The DSP execution unit also has the following features:

  • Optimized data prefetch mechanism
  • Appropriate read and write GPR ports for Zp64 64-bit arithmetic
  • Tuned CSI-DSP lib resulting from software and hardware co-optimization
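
To illustrate the kind of SIMD multiply-accumulate work described in this section, the following is a minimal Python model of a dual 16-bit signed multiply-accumulate on packed 32-bit words. It is a sketch only: the helper names are hypothetical, and it is not claimed to match the encoding, saturation or rounding behaviour of any specific P-extension instruction.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def lanes16(word):
    """Split a 32-bit word into two signed 16-bit lanes (low lane first)."""
    return [to_signed(word >> shift, 16) for shift in (0, 16)]


def dual_mac16(acc, a, b):
    """Hypothetical helper: acc + a.lo*b.lo + a.hi*b.hi, wrapped to 32 bits."""
    total = acc + sum(x * y for x, y in zip(lanes16(a), lanes16(b)))
    return total & 0xFFFFFFFF


def dot_product16(xs, ys):
    """Dot product of packed 16-bit vectors, two elements per 32-bit word."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = dual_mac16(acc, a, b)
    return to_signed(acc, 32)
```

Each loop iteration consumes two 16-bit elements per operand word, which is why a packed SIMD multiply-accumulate can roughly halve the iteration count of an FIR or dot-product kernel compared with a scalar loop.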

 

Memory Subsystem

The E907 implements an optional instruction cache and data cache. The E907 also supplies two configurations on the master interface: rich and outstanding capability of bus transactions. The “rich” configuration can achieve high bandwidth and accelerate memory accesses such as memory copy and memory set. Both the instruction cache and the data cache have the following features:

  • 2-way set-associative with a 32-byte cache line
  • FIFO cache replacement policy
  • Software invalidate and clear (D-cache only) operations through extended instructions
  • Configurable as 2KB/4KB/8KB/16KB/32KB
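
The geometry in the list above can be explored with a small behavioural model. The sketch below assumes that the set index is taken from the address bits directly above the 32-byte line offset, which is the conventional arrangement but is not stated in this document; the class name `FifoCache` is hypothetical.

```python
LINE_BYTES = 32   # cache line size from the feature list
WAYS = 2          # 2-way set-associative


class FifoCache:
    """Toy model of a 2-way set-associative cache with FIFO replacement."""

    def __init__(self, size_bytes):
        if size_bytes not in (2048, 4096, 8192, 16384, 32768):
            raise ValueError("size must be 2KB, 4KB, 8KB, 16KB or 32KB")
        self.num_sets = size_bytes // (LINE_BYTES * WAYS)
        # Each set holds up to WAYS tags, oldest fill first.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr):
        """Return (tag, set_index, byte_offset) for an address."""
        offset = addr % LINE_BYTES
        line = addr // LINE_BYTES
        return line // self.num_sets, line % self.num_sets, offset

    def access(self, addr):
        """Look up `addr`; on a miss, evict the oldest line in the set."""
        tag, index, _ = self.split(addr)
        ways = self.sets[index]
        if tag in ways:
            self.hits += 1      # FIFO: a hit does not change eviction order
            return True
        self.misses += 1
        if len(ways) == WAYS:
            ways.pop(0)         # evict the line that was filled first
        ways.append(tag)
        return False
```

With a 2KB cache (32 sets), addresses 0, 1024 and 2048 all map to set 0. Accessing 0, 1024, 0, 2048, 0 in that order ends with a miss in this model, because FIFO evicts the line at address 0 even though it was just reused; an LRU policy would have kept it.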

 

Physical Memory Protection (PMP)

The E907 processor has an optional RISC-V PMP, which allows machine and user privilege modes to access different address ranges. Only machine mode has the authority to define memory access permissions. If an unauthorized access is detected, an access fault exception is triggered. The PMP has the following features:

 

  • Up to 16 configurable regions
  • Read/Write/Execute memory protection
  • Minimum 128B address range
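
As a worked example of the 128B minimum range, the sketch below encodes a region using the NAPOT (naturally aligned power-of-two) format defined by the RISC-V privileged specification. This document does not say which PMP address-matching modes the E907 supports, so NAPOT support is an assumption here, and the function names are hypothetical.

```python
MIN_REGION_BYTES = 128  # minimum PMP address range from the feature list


def napot_pmpaddr(base, size):
    """Encode a naturally aligned power-of-two region as a pmpaddr value."""
    if size < MIN_REGION_BYTES or size & (size - 1):
        raise ValueError("size must be a power of two of at least 128 bytes")
    if base % size:
        raise ValueError("base must be aligned to size")
    # pmpaddr holds address bits [33:2]; the trailing ones encode the size.
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << trailing_ones) - 1)) << 2
    return base, size
```

For example, a 128-byte region at the illustrative base address 0x20000000 encodes to 0x0800000F: four trailing ones select a size of 2^(4+3) = 128 bytes.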

 

Core Local Interrupt Controller (CLIC)

The E907 processor implements the RISC-V standard interrupt controllers, the CLIC and the CLINT. The CLIC has the following features:

 

  • Support up to 240 external interrupts
  • Up to 32 priority settings
  • Support level or positive/negative edge interrupt types
  • Support hardware vectored interrupts
  • Memory-mapped control registers
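
The arbitration implied by the priority settings above can be sketched as a small selection function. This is a simplified model, not the E907 register interface: it assumes that a larger priority value wins and that ties go to the higher interrupt ID, and the names `IrqState` and `next_interrupt` are hypothetical.

```python
from dataclasses import dataclass

MAX_INTERRUPTS = 240   # external interrupt count from the feature list
PRIORITY_LEVELS = 32   # number of priority settings from the feature list


@dataclass
class IrqState:
    irq_id: int
    priority: int          # 0..31, larger value wins (assumption)
    pending: bool = False
    enabled: bool = False

    def __post_init__(self):
        if not 0 <= self.irq_id < MAX_INTERRUPTS:
            raise ValueError("irq_id out of range")
        if not 0 <= self.priority < PRIORITY_LEVELS:
            raise ValueError("priority out of range")


def next_interrupt(irqs, current_priority=-1):
    """Return the IrqState to service next, or None if nothing can preempt."""
    candidates = [
        irq for irq in irqs
        if irq.pending and irq.enabled and irq.priority > current_priority
    ]
    if not candidates:
        return None
    # Tie-break on the higher interrupt ID (assumption for this sketch).
    return max(candidates, key=lambda irq: (irq.priority, irq.irq_id))
```

Passing the priority of the handler that is currently running as `current_priority` models preemption: only a strictly higher-priority source is returned.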

 

Debug Components

The E907 processor adopts v0.13.2 of the RISC-V debug spec, with standard JTAG for communication between the host and the E907 debug unit. The debugger and probe have been heavily optimized and achieve a download speed of 800KB/s-900KB/s, 4 times faster than common solutions in the market.

 

The debug unit supports the following operations:

  • Supply multi-level configurations in order to adapt to various needs
  • Support hardware/software breakpoints
  • Support a variety of trigger settings
  • Supply an independent master port to access SoC resources
  • Check and modify CPU register resources
  • Support flexible single-step or multi-step execution

 

Hardware Performance Monitor (HPM)

The E907 processor implements an optional RISC-V standard HPM to let software developers profile performance. The HPM has the following features:

  • Support branch prediction ratio profiling
  • Support cache miss ratio profiling
  • Support profiling of executed instruction counts and CPU cycles
  • Support profiling in machine and user modes
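
The ratios listed above are typically derived from two counter snapshots taken around the code being profiled. The sketch below shows that arithmetic only; the counter names in the dictionaries are hypothetical labels, not E907 event identifiers, and reading the actual counters is outside its scope.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when nothing was counted."""
    return numerator / denominator if denominator else 0.0


def summarize(start, end):
    """Derive profiling metrics from two counter snapshots (dicts)."""
    delta = {name: end[name] - start[name] for name in start}
    return {
        "cpi": ratio(delta["cycles"], delta["instructions"]),
        "branch_mispredict_ratio": ratio(
            delta["branch_mispredicts"], delta["branches"]
        ),
        "dcache_miss_ratio": ratio(
            delta["dcache_misses"], delta["dcache_accesses"]
        ),
    }
```

A CPI close to 1 in the result would correspond to the figure quoted for key DSP lib routines earlier in this document.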

 

Interface

The E907 has a 64-bit AMBA4 AXI master bus and a fast peripheral master bus to communicate with external memory or peripheral IP. Internal requests are allocated to either bus according to their address.

 

The fast peripheral master bus has a 32-bit data width and adopts the AHB5 protocol spec. Transactions are sent out directly through the fast peripheral port after the address calculation, bypassing the data cache pipeline. Thus, the fast peripheral bus can access SRAM or slave IP with low latency.
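
Address-based routing between the two master ports can be pictured as a simple range check. In the sketch below the peripheral window (`PERIPH_BASE`, `PERIPH_SIZE`) is a made-up example; the real range is chosen at SoC integration and is not given in this document.

```python
# Hypothetical address window for the fast peripheral bus; the real range is
# fixed by the SoC integrator and is not given in this document.
PERIPH_BASE = 0x4000_0000
PERIPH_SIZE = 0x1000_0000


def route(addr):
    """Return which master port a load/store address would use."""
    if PERIPH_BASE <= addr < PERIPH_BASE + PERIPH_SIZE:
        return "peripheral-ahb"   # 32-bit port, bypasses the data cache pipeline
    return "main-axi"             # 64-bit port, may go through the caches
```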

 

T-HEAD MCU Enhanced Extensions (XME)

Oriented towards MCU/MPU applications, the XuanTie processor architecture extends the RISC-V spec to improve performance and interrupt response speed.

  • Support fast interrupt handling with a response time of 20 CPU cycles
  • Support tail-chaining for both vector and non-vector interrupts
  • Support hardware interrupt stack swapping
  • Support NMI
  • Support Lockup
  • Support sleep and deep sleep
  • Support soft reset operation
  • Support configurable reset address through top port during integration
  • 56 extended instructions including cache maintenance, bitwise operations, load/store enhancements and interrupt acceleration.
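
To put the 20-cycle interrupt response time in wall-clock terms, the short sketch below converts cycles to nanoseconds across the 400MHz-1GHz target frequency range mentioned earlier. The function name is hypothetical and the calculation assumes the response time is a fixed cycle count independent of frequency.

```python
INTERRUPT_RESPONSE_CYCLES = 20  # response time quoted in the list above


def response_time_ns(freq_hz, cycles=INTERRUPT_RESPONSE_CYCLES):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / freq_hz
```

Under that assumption, the response time works out to 50 ns at 400MHz and 20 ns at 1GHz.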

 

Software Ecosystems
  • Optimized compiler, assembler, linker and binary tools are contributed to GNU and supported officially
  • Enhanced ISA is supported by GCC and LLVM
  • QEMU is contributed and supported officially
  • Deeply optimized CSI-DSP lib
  • Code-size-optimized runtime lib
  • Supply a Keil-like Integrated Development Environment (CDK) and support mainstream IDEs and debug probes such as IAR IDE, OpenOCD, Lauterbach debugger and Segger J-Link
  • Support FreeRTOS, uCos, RT-Thread and AliOS-Things.

 

Fabrication Process for E907 RISC-V

Process is TSMC28 HPCPlus, 9T, RVT

 
