# AMD XILINX

## Host Programming of QSPI Flash

XAPP1372 (v1.1) October 4, 2022

## Summary

Versal<sup>®</sup> devices have a built-in hardware QSPI controller in the platform management controller (PMC). The QSPI controller is routed to the PMC MIO pins when connected to QSPI flash, which is commonly used as a boot device. The QSPI controller can be accessed by both the internal processing system (PS) and an external CPU. This application note provides a reference design in which an external CPU programs the QSPI flash. The external CPU, hereafter referred to as the host, communicates with the QSPI controller in the Versal device through the PCIe<sup>®</sup> bus, the network on chip (NoC), and the PMC, in that order. The PS is not involved at any point in this process. For customers with limited memory resources, this application does not require DDR memory; the programmable logic (PL) block RAM is sufficient.

Download the reference design files for this application note from the Xilinx website. For detailed information about the design files, see Reference Design.

### Features

The reference design has the following features:

- Read QSPI flash ID
- Erase/write QSPI flash
- Verify QSPI flash
- All host operations on the QSPI flash go through PCIe. The hardware PCIe link is configured as Gen2 x1 using the Integrated Block for PCI Express.
- No involvement of the Versal device PS
- QSPI flash DMA read support
- PL block RAM, instead of DDR memory, buffers the flash data read back for verification, so no DDR space is needed







## Introduction

The QSPI controller is a hardened module inside the PMC, which is instantiated in the Vivado<sup>®</sup> Control, Interfaces, and Processing System (CIPS) IP. In the PMC functional block diagram, the QSPI controller is one of the PMC I/O peripherals, which comprise the QSPI controller, OSPI controller, SD/eMMC controllers, I2C controller, and GPIO controller. The following figure shows the NoC interconnect connected to the PMC, through which the PL can access the QSPI controller.







In addition to this access path inside the Versal device, the host can reach the Versal device NoC through the device's PCIe interface. Thus, the complete access path is as follows:

Host: PCIe RC > Versal device: PCIe EP (QDMA) > NoC > QSPI controller inside PMC > QSPI flash

The following figure shows the system architecture.





As shown in the figure, there are three types of datapath in the design:

• Blue Path: Host Accesses QSPI Controller: This path is used to read and write the QSPI controller registers, so it carries traffic in two directions: one for writes and one for reads.

Both the Read Flash ID and the Erase/Write Flash features are completed over this datapath. For Read Flash ID, the host issues a series of QSPI controller register writes and reads until the QSPI flash ID is obtained.

The Erase QSPI Flash and Write QSPI Flash operations each consist of a set of register reads and writes. The main difference between them is the content that is written: Erase QSPI Flash writes zeroes to the entire QSPI flash, whereas Write QSPI Flash writes specific content to the QSPI flash.

• Cyan Path: QSPI Controller DMA Read: After the host issues a DMA read sequence to the QSPI controller via the blue path, the QSPI controller sends the flash data to the block RAM over the cyan path until the DMA data length reaches the defined size.

• Purple Path: Host Reads Back QSPI Flash Data in Block RAM: The host reads the flash data held in block RAM over this path. The Verify QSPI Flash operation uses all three paths. First, the QSPI command is sent through the blue path. Second, the QSPI controller uses DMA to send the data to block RAM via the cyan path. Finally, the host fetches the block RAM data through the purple path and compares it with the data in its local memory.

The reference design accompanying this application note includes a host software design and a Versal ACAP hardware design (see <u>Reference Design</u>). The reference design was verified with an x86 host and a Xilinx VCK190 evaluation board, which uses the Versal XCVC1902 device.

## Versal Device Hardware Design

#### **Block Design Introduction**

It is assumed that customers use PCIe as a control bus to connect the host (such as an x86 or Arm<sup>®</sup> processor) to the Versal device. Because this link normally has low bandwidth requirements, the example design shares one GT Quad between PCIe and another protocol.

In the block design (BD), PCIe Gen2 x1 and Aurora share the same GT Quad. PCIe integrated IP in the PL and QDMA AXI\_bridge mode are used. In the following figure, AI Engine, DDR4, and Aurora are not involved in the host programming of QSPI flash.



#### Figure 3: Design Diagram



Four base address register (BAR) spaces are defined, as shown in the following figure. BAR1 is an AXI bridge master that connects to data storage spaces through the NoC (DDR4) and SmartConnect (block RAM). BAR2 is an AXI bridge master that accesses the AI Engine space through the NoC. BAR3 is an AXI bridge master that connects to the QSPI controller through the NoC. BAR4 is an AXI4-Lite master space used to access block RAM on the Versal device.

QDMA IP customization for `qdma_subsystem/qdma_0`, PCIe : BARs tab (PF0):

| BAR  | Type              | 64 bit | Prefetchable | Size   |
|------|-------------------|--------|--------------|--------|
| BAR1 | AXI Bridge Master | Yes    | Yes          | 256 MB |
| BAR2 | AXI Bridge Master | Yes    | Yes          | 4 GB   |
| BAR3 | AXI Bridge Master | No     | No           | 256 MB |
| BAR4 | AXI Lite Master   | No     | No           | 64 KB  |

#### Figure 4: BAR Spaces

As the following address editor shows, BAR1 connects to DDR, BAR2 connects to the AI Engine, and BAR3 connects to PMC\_SLAVES:

- axi\_noc\_1/S09\_AXI: BAR1 space: The reference design does not access this space.
- ai\_engine\_0/S00\_AXI: BAR2 space: The reference design does not access this space.
- versal\_cips\_0/NOC\_PMC\_AXI\_0/pspmc\_0\_psv\_pmc\_qspi\_0: This is the QSPI controller space inside the PMC space.
- versal\_cips\_0/NOC\_PMC\_AXI\_0/pspmc\_0\_psv\*: The reference design does not access these spaces.


#### Figure 5: Versal Device Address Editor

Master: /qdma_subsystem/qdma_0/M_AXI_BRIDGE (64 address bits : 16E)

| Slave                        | Interface     | Segment                           | Base Address          | Range | High Address          |
|------------------------------|---------------|-----------------------------------|-----------------------|-------|-----------------------|
| /ai_engine_0/S00_AXI         | S00_AXI       | AIE_ARRAY_0                       | 0x0000_0200_0000_0000 | 4G    | 0x0000_0200_FFFF_FFFF |
| /axi_noc_1/S09_AXI           | S09_AXI       | C1_DDR_LOW0                       | 0x0000_0000_0000_0000 | 2G    | 0x0000_0000_7FFF_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_trng              | 0x0000_0001_0123_0000 | 64K   | 0x0000_0001_0123_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_xmpu_0            | 0x0000_0001_012F_0000 | 64K   | 0x0000_0001_012F_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_slave_boot        | 0x0000_0001_0122_0000 | 64K   | 0x0000_0001_0122_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_sha               | 0x0000_0001_0121_0000 | 64K   | 0x0000_0001_0121_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_tmr_inject_0      | 0x0000_0001_0008_3000 | 4K    | 0x0000_0001_0008_3FFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_sysmon_0          | 0x0000_0001_0127_0000 | 192K  | 0x0000_0001_0129_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_slave_boot_stream | 0x0000_0001_0210_0000 | 64K   | 0x0000_0001_0210_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_tmr_manager_0     | 0x0000_0001_0028_3000 | 4K    | 0x0000_0001_0028_3FFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_xppu_0            | 0x0000_0001_0131_0000 | 64K   | 0x0000_0001_0131_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_adma_7                | 0x0000_0000_FFAF_0000 | 64K   | 0x0000_0000_FFAF_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_adma_6                | 0x0000_0000_FFAE_0000 | 64K   | 0x0000_0000_FFAE_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_xppu_npi_0        | 0x0000_0001_0130_0000 | 64K   | 0x0000_0001_0130_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_gpio_0            | 0x0000_0001_0102_0000 | 64K   | 0x0000_0001_0102_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_i2c_0             | 0x0000_0001_0100_0000 | 64K   | 0x0000_0001_0100_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_sd_1              | 0x0000_0001_0105_0000 | 64K   | 0x0000_0001_0105_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_rtc_0             | 0x0000_0001_012A_0000 | 64K   | 0x0000_0001_012A_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_rsa               | 0x0000_0001_0120_0000 | 64K   | 0x0000_0001_0120_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_ram_instr_cntlr   | 0x0000_0001_0020_0000 | 256K  | 0x0000_0001_0023_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_ram               | 0x0000_0001_0200_0000 | 128K  | 0x0000_0001_0201_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_ram_npi           | 0x0000_0001_0600_0000 | 32M   | 0x0000_0001_07FF_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_ram_data_cntlr    | 0x0000_0001_0024_0000 | 128K  | 0x0000_0001_0025_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_qspi_0            | 0x0000_0001_0103_0000 | 64K   | 0x0000_0001_0103_FFFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_ppu1_mdm_0        | 0x0000_0001_0031_0000 | 32K   | 0x0000_0001_0031_7FFF |
| /versal_cips_0/NOC_PMC_AXI_0 | NOC_PMC_AXI_0 | pspmc_0_psv_pmc_iomodule_0        | 0x0000_0001_0028_0000 | 4K    | 0x0000_0001_0028_0FFF |

The following figure shows that BAR4 connects to the block RAM controller. One space in BAR4 can be accessed by the host:

axi\_bram\_ctrl\_0/S\_AXI: This is the block RAM space that the host accesses through PCIe.

#### Figure 6: BAR4 Space

Master: Network 1 > /qdma_subsystem/qdma_0 > /qdma_subsystem/qdma_0/M_AXI_LITE (32 address bits : 4G)

| Slave                  | Interface | Segment | Base Address | Range | High Address |
|------------------------|-----------|---------|--------------|-------|--------------|
| /axi_bram_ctrl_0/S_AXI | S_AXI     | Mem0    | 0xF000_0000  | 8K    | 0xF000_1FFF  |

The block RAM itself is a dual-port memory that stores QSPI flash data. The QSPI controller uses DMA to transfer data from the QSPI flash into the block RAM through one block RAM controller, and a second block RAM controller on the other port is used to read the QSPI flash data back out. The block RAM therefore has two different addresses. If the master is PCIe, the block RAM address is 0xF000\_0000. If the master is the QSPI controller in the PMC, the address is 0x201\_8000\_0000, which needs more than 32 bits of address space. To configure this address in the QSPI controller registers, set DMA\_Dst\_Addr\_L = 0x8000\_0000 and DMA\_Dst\_Addr\_H = 0x0000\_0201.
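
As a minimal sketch of the address split described above, the following C helper divides a 64-bit AXI address into the two 32-bit values for the DMA destination registers. The type and function names (`dma_dst_addr_t`, `split_dma_dst_addr`) are illustrative assumptions and are not taken from the reference design sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two 32-bit register values. */
typedef struct {
    uint32_t lo; /* value for DMA_Dst_Addr_L (address bits 31:0)  */
    uint32_t hi; /* value for DMA_Dst_Addr_H (address bits 63:32) */
} dma_dst_addr_t;

/* Splits a 64-bit AXI address into its lower and upper 32-bit halves. */
static dma_dst_addr_t split_dma_dst_addr(uint64_t axi_addr)
{
    dma_dst_addr_t regs;

    regs.lo = (uint32_t)(axi_addr & 0xFFFFFFFFu);
    regs.hi = (uint32_t)(axi_addr >> 32);

    /* Sanity check: recombining the halves must give back the address. */
    assert((((uint64_t)regs.hi << 32) | regs.lo) == axi_addr);
    return regs;
}
```

For the block RAM address 0x201\_8000\_0000, `split_dma_dst_addr(0x20180000000ull)` yields `lo = 0x80000000` and `hi = 0x00000201`, which matches the register values given above.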

#### Figure 7: BRAM Controller

Master: /versal_cips_0/PMC_NOC_AXI_0 (64 address bits)

| Slave                  | Interface | Segment     | Base Address    | Range | High Address    |
|------------------------|-----------|-------------|-----------------|-------|-----------------|
| /ai_engine_0/S00_AXI   | S00_AXI   | AIE_ARRAY_0 | 0x200_0000_0000 | 4G    | 0x200_FFFF_FFFF |
| /axi_bram_ctrl_1/S_AXI | S_AXI     | Mem0        | 0x201_8000_0000 | 8K    | 0x201_8000_1FFF |

Master: Network 1 > /qdma_subsystem/qdma_0 > /qdma_subsystem/qdma_0/M_AXI_LITE (32 address bits : 4G)

| Slave                  | Interface | Segment | Base Address | Range | High Address |
|------------------------|-----------|---------|--------------|-------|--------------|
| /axi_bram_ctrl_0/S_AXI | S_AXI     | Mem0    | 0xF000_0000  | 8K    | 0xF000_1FFF  |


#### Figure 8: Datapath inside PL Design



The host accesses the QSPI controller space by accessing BAR3. Data is transferred through the QDMA M\_AXI\_BRIDGE port, passes from the NoC slave port to the NoC master port, and arrives at NOC\_PMC\_AXI\_0.

The three colored datapaths are the Versal device hardware portions of the datapaths in the overall system architecture. They are the same datapaths throughout the design.

#### PCIe Debug Accessory

To simplify debugging of the PCIe link status, a PCIe-specific debug hub and an ILA are inserted in the BD.



#### Figure 9: PCIe Debug Hub and ILA


#### Figure 10: PCIe Debug Hub and ILA Path



## **Host Software Design**

#### **Address Mapping**

After setting up the host and VCK190 environment, list all the PCIe BAR spaces using the <code>lspci</code> command on the host computer. There are four PCIe BARs that the host can access, as shown in the following figure.



    2:00.0 Memory controller: Xilinx Corporation Device b021
            Subsystem: Xilinx Corporation Device 0007
            Flags: fast devsel, IRQ 11
            Memory at 381fc0000000 (64-bit, prefetchable) [size=256M]
            Memory at 381180000000 (64-bit, prefetchable) [size=1G]
            Memory at e0000000 (32-bit, non-prefetchable) [size=256M]
            Memory at 20000000 (32-bit, non-prefetchable) [size=64K]

The BAR3 space has a size of 256 MB and spans addresses 0xE000\_0000 to 0xEFFF\_FFFF. This BAR3 space is mapped to Versal device addresses 0x1\_0000\_0000 to 0x1\_0FFF\_FFFF, which include the PMC slave space. The QSPI controller is located in the PMC.

In this application note, the host also accesses the BAR4 space. There is a block RAM segment with a base address of 0xF000\_0000 and size of 8 KB.

#### Table 1: PCIe BARs Address Mapping

| PCIe BAR Space                  | Device Name                          | Versal Device Address                 |
|---------------------------------|--------------------------------------|---------------------------------------|
| BAR3: 0xE103_0000 ~ 0xE103_FFFF | QSPI controller                      | 0x1_0103_0000 ~ 0x1_0103_FFFF (64 KB) |
| BAR4: 0xF000_0000 ~ 0xF000_1FFF | Block RAM Port0 (for read operation) | 0xF000_0000 ~ 0xF000_1FFF (8 KB)      |
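
The BAR3 translation above amounts to a fixed offset between host addresses and Versal device addresses. The following sketch illustrates that arithmetic, assuming the host assigned BAR3 the base address 0xE000\_0000 as in the lspci listing (the assigned base can differ between hosts). The macro and function names are hypothetical and are not taken from the reference design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAR3_HOST_BASE 0xE0000000ull  /* assumption: base assigned by this host  */
#define BAR3_SIZE      0x10000000ull  /* 256 MB                                   */
#define BAR3_AXI_BASE  0x100000000ull /* Versal address that BAR3 translates to   */
#define QSPI_AXI_BASE  0x101030000ull /* pspmc_0_psv_pmc_qspi_0 in address editor */

/*
 * Converts a Versal device address inside the BAR3 window to the host
 * address that reaches it. Returns 0 on success, -1 if the address lies
 * outside the 256 MB window.
 */
static int bar3_host_addr(uint64_t axi_addr, uint64_t *host_addr)
{
    assert(host_addr != NULL);

    if (axi_addr < BAR3_AXI_BASE || axi_addr - BAR3_AXI_BASE >= BAR3_SIZE)
        return -1;

    *host_addr = BAR3_HOST_BASE + (axi_addr - BAR3_AXI_BASE);
    return 0;
}
```

Passing `QSPI_AXI_BASE` returns the host address 0xE103\_0000, the first entry in Table 1. The block RAM write-port address 0x201\_8000\_0000 is rejected because it is not reachable through BAR3.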



This block RAM has independent addresses for its two ports. As described previously, one address is used for host readback and maps to PCIe BAR4. The other address is written by the QSPI controller, so the software tells the QSPI controller that the BOARD\_BRAM\_BASE address is 0x201\_8000\_0000. This access is internal to the Versal device.

#### Table 2: Dual Ports Block RAM Address Mapping

| PCIe BAR4   | Device Name                 | Size (B) | Versal Device Address Base | Host Access                                                                  |
|-------------|-----------------------------|----------|----------------------------|------------------------------------------------------------------------------|
| 0xF000_0000 | Block RAM Port0 (for read)  | 8K       | 0xF000_0000                | Host accesses BAR space                                                      |
| N/A         | Block RAM Port1 (for write) | 8K       | 0x201_8000_0000            | Host configures this 64-bit address to QSPI controller DMA address register |

In the software application, the BOARD\_BRAM\_BASE and BOARD\_BRAM\_SIZE macros are defined in the file xqspipsu.h.

#### Figure 12: Address Mapping of Host Software

    #define BOARD_BRAM_BASE (0x20180000000ull)
    #define BOARD_BRAM_SIZE 8192

#### **Host Application Data Flow**

The following figure shows the overall programming flow. The complete flow consists of these subflows: QSPI Abort and Initialization, Read Flash ID, Erase Flash, Write Flash, and Verify Flash. In the Verify Flash subflow, a DMA read moves flash data to the block RAM, the host then reads the block RAM into its local RAM, and the comparison is done on the host side.
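
The Verify Flash subflow can be sketched as a loop over chunks of at most BOARD\_BRAM\_SIZE bytes. The sketch below is not the reference design code. The two callback types are hypothetical stand-ins for the DMA read into block RAM and the host read of block RAM, and the `sim_*` functions replace the hardware with an in-memory buffer so that the loop can be exercised without a board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOARD_BRAM_SIZE 8192u /* block RAM buffer size, as in xqspipsu.h */

/* Hypothetical transport hooks; both return 0 on success. */
typedef int (*dma_to_bram_fn)(void *ctx, size_t flash_off, size_t len);
typedef int (*read_bram_fn)(void *ctx, uint8_t *dst, size_t len);

/*
 * Compares the flash content with `image`, one block RAM sized chunk at a
 * time. Returns 0 if everything matches, 1 on the first mismatch (its byte
 * offset is stored in *mismatch_off), or -1 if a transport hook fails.
 */
static int verify_flash(const uint8_t *image, size_t image_len,
                        dma_to_bram_fn dma_to_bram, read_bram_fn read_bram,
                        void *ctx, size_t *mismatch_off)
{
    uint8_t chunk[BOARD_BRAM_SIZE];

    assert(image != NULL && mismatch_off != NULL);
    for (size_t off = 0; off < image_len; off += BOARD_BRAM_SIZE) {
        size_t len = image_len - off;
        if (len > BOARD_BRAM_SIZE)
            len = BOARD_BRAM_SIZE;

        /* Step 1: flash -> block RAM. Step 2: block RAM -> host buffer. */
        if (dma_to_bram(ctx, off, len) != 0 || read_bram(ctx, chunk, len) != 0)
            return -1;

        /* Step 3: compare on the host side. */
        for (size_t i = 0; i < len; i++) {
            if (chunk[i] != image[off + i]) {
                *mismatch_off = off + i;
                return 1;
            }
        }
    }
    return 0;
}

/* In-memory stand-in for the board, for illustration only. */
typedef struct {
    const uint8_t *flash;          /* simulated flash content */
    uint8_t bram[BOARD_BRAM_SIZE]; /* simulated block RAM     */
} sim_board_t;

static int sim_dma_to_bram(void *ctx, size_t flash_off, size_t len)
{
    sim_board_t *board = ctx;
    memcpy(board->bram, board->flash + flash_off, len);
    return 0;
}

static int sim_read_bram(void *ctx, uint8_t *dst, size_t len)
{
    sim_board_t *board = ctx;
    memcpy(dst, board->bram, len);
    return 0;
}
```

With an 8 KB block RAM, verifying a 134217728-byte image takes 134217728 / 8192 = 16384 iterations of this loop.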



#### Figure 13: Host Application Data Flow



## Debugging

#### Hardware Environment Setup

1. Some PCs might require **Above 4G Memory Assignment** and **Resizable BAR Support** to be set in the BIOS Configuration tab.

   In the BIOS setup screen, under PCIe Configuration, both Above 4G Memory Assignment and Resizable BAR Support are set to Enabled.



2. Build the VCK190 project (refer to the readme.txt file in the reference design to build the VCK190 project). Download the PDI file through the Vivado hardware manager and check the PCIe link status.



3. On the host side, use the lspci command to get the BAR addresses. Then use the devmem command to confirm that reads from and writes to the BAR spaces succeed.
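
For readers who prefer to run the step 3 check from C rather than with devmem, the following sketch maps a physical address window and performs 32-bit accesses. It assumes a Linux host, root access, and a kernel that permits `/dev/mem` mappings of PCIe BARs. The function names are hypothetical and are not part of the reference design.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Maps `len` bytes starting at physical address `phys_base`.
 * `path` is "/dev/mem" on a real host. `phys_base` must be page-aligned,
 * which PCIe BAR base addresses are. Returns NULL on failure.
 */
static volatile uint32_t *map_window(const char *path, off_t phys_base, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys_base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Reads one 32-bit word at a byte offset inside the mapped window. */
static uint32_t reg_read32(volatile uint32_t *win, size_t byte_off)
{
    assert(byte_off % 4 == 0);
    return win[byte_off / 4];
}

/* Writes one 32-bit word at a byte offset inside the mapped window. */
static void reg_write32(volatile uint32_t *win, size_t byte_off, uint32_t value)
{
    assert(byte_off % 4 == 0);
    win[byte_off / 4] = value;
}
```

Calling `map_window("/dev/mem", base, 8192)` with the BAR4 base address reported by lspci, then writing a test pattern with `reg_write32()` and reading it back with `reg_read32()`, performs the same kind of check as the devmem command.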

#### Software Debugging

To compile the host application, copy the application source code into a working directory. Then compile the application by running the Makefile on an x86 host.

> make

Three objects are generated, as shown in the following table.

#### Table 3: Objects Generated by Running Makefile

| Object       | Description                                                                                                                |
|--------------|----------------------------------------------------------------------------------------------------------------------------|
| main         | This executable file programs and verifies the 128 MB flash.                                                               |
| program_only | This executable file programs the 128 MB flash.                                                                            |
| debug        | This executable file programs the 128 MB flash. At the same time, it opens the debug switch and prints all the debug logs. |

Then run the executable files on the host. The following figure shows the log for the programming and verification of the 128 MB flash.



#### Figure 14: Programming and Verification of 128 MB Flash

```
[root@localhost st]# time ./main
* build date:Jul 25 2022 04:30:48
* build version:V1.0
FlashID=0x20 0xbb 0x21
Flash connection mode : 2
where 0 - Single; 1 - Stacked; 2 - Parallel
CTIndex: 10
Flash PageSize = 512 , PageCount = 262144, total programming size = 134217728(0x8000000)
ReadCmd: 0x6b, WriteCmd: 0x2, StatusCmd: 0x70, FSRFlag: 1
FlashEnterExit4BAddMode...
    --Successful!
Initilizing WriteBuffer...
    --Successful!
Erasing Flash 134217728 Bytes ...
    --Successful!
Writing Flash 134217728 Bytes ...
    --Successful!
Reading and Comparing Flash 134217728 Bytes ...
    --Successful!
Successfully Programming Flash

real    6m34.470s
user    4m6.359s
sys     2m26.532s
```
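The numbers in this log can be cross-checked with a few lines of arithmetic. The sketch below uses only values printed in the log (page size, page count, total size, and the `time` output). The helper name `time_to_seconds` is hypothetical and is not part of the host application.

```python
import re


def time_to_seconds(text):
    """Convert a shell `time` value such as '6m34.470s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the log above.
page_size = 512
page_count = 262144
total_bytes = page_size * page_count

real_s = time_to_seconds("6m34.470s")
user_s = time_to_seconds("4m6.359s")
sys_s = time_to_seconds("2m26.532s")

print(f"total size : {total_bytes} bytes = {hex(total_bytes)} = {total_bytes // 2**20} MB")
print(f"real time  : {real_s:.3f} s")
print(f"user + sys : {user_s + sys_s:.3f} s")
```

The product of page size and page count equals the total programming size reported in the log (134217728 bytes, or 128 MB counted in units of 2^20 bytes), and the elapsed time rounds to the 394 s listed for programming plus verification in Table 4.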

The following figure shows the log from programming the 128 MB flash.

#### Figure 15: Programming of 128 MB Flash



#### Performance

Programming the 128 MB image takes 193 seconds (3 minutes 13 seconds). The results in the following table show that programming time scales linearly with programming size. For example, $48\,\text{s} \times 4 = 192\,\text{s}$, which is almost 193 s.

Thus, programming 128 MB of flash takes about four times as long as programming 32 MB.

Programming plus verification takes about twice as long as programming alone (for example, 99 s versus 48 s for 32 MB). The verification time also increases linearly with the programming size.



#### Table 4: Host Programming Performance

| Operation            | Size (MB) | Time (s) | Description                                 |
|----------------------|-----------|----------|---------------------------------------------|
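| Programming only     | 32        | 48       | Read Flash ID, Erase Flash, and Write Flash |
| Programming only     | 64        | 97       | Read Flash ID, Erase Flash, and Write Flash |
| Programming only     | 128       | 193      | Read Flash ID, Erase Flash, and Write Flash |
| Programming + Verify | 32        | 99       | Read Flash ID, Erase Flash, and Write Flash, then Read Flash content and Compare |
| Programming + Verify | 64        | 197      | Read Flash ID, Erase Flash, and Write Flash, then Read Flash content and Compare |
| Programming + Verify | 128       | 394      | Read Flash ID, Erase Flash, and Write Flash, then Read Flash content and Compare |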
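The linear scaling can be checked directly from the figures in Table 4. The following sketch computes the effective throughput for each row and the time attributable to verification alone. The dictionary layout and variable names are illustrative; only the sizes and times come from the table.

```python
# Times in seconds from Table 4, keyed by image size in MB.
PROGRAM_ONLY = {32: 48, 64: 97, 128: 193}
PROGRAM_VERIFY = {32: 99, 64: 197, 128: 394}

# Effective throughput in MB/s for each operation and size.
program_rate = {size: size / t for size, t in PROGRAM_ONLY.items()}
verify_rate = {size: size / t for size, t in PROGRAM_VERIFY.items()}

# Time spent on the read-back and compare step alone.
verify_only = {size: PROGRAM_VERIFY[size] - PROGRAM_ONLY[size] for size in PROGRAM_ONLY}

for size in sorted(PROGRAM_ONLY):
    print(
        f"{size:>3} MB: program {program_rate[size]:.3f} MB/s, "
        f"program+verify {verify_rate[size]:.3f} MB/s, "
        f"verify step {verify_only[size]} s"
    )
```

The programming throughput stays close to 0.66 MB/s at all three sizes, and the verification step roughly doubles each time the image size doubles, which is what a linear relationship predicts.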

## **Reference Design**

Download the reference design files for this application note from the Xilinx website.

#### **Reference Design Matrix**

The following checklist indicates the procedures used for the provided reference design.

#### Table 5: Reference Design Matrix

| Parameter                                                                                                            | Description                                                                |
|----------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| **General**                                                                                                          |                                                                            |
| Developer name                                                                                                       | Xilinx                                                                     |
| Target devices                                                                                                       | Versal device XCVC1902                                                     |
| Source code provided?                                                                                                | Y                                                                          |
| Source code format (if provided)                                                                                     | Vivado tools Tcl script for hardware block design/C++ for host application |
| Design uses code or IP from existing reference design, application note, 3rd party or Vivado software? If yes, list. | N                                                                          |
| **Simulation**                                                                                                       |                                                                            |
| Functional simulation performed?                                                                                     | N                                                                          |
| Timing simulation performed?                                                                                         | N                                                                          |
| Test bench provided for functional and timing simulation?                                                            | N                                                                          |
| Test bench format                                                                                                    | N/A                                                                        |
| Simulator software and version                                                                                       | N/A                                                                        |
| SPICE/IBIS simulations                                                                                               | N                                                                          |
| **Implementation**                                                                                                   |                                                                            |
| Synthesis software tools/versions used                                                                               | Vivado synthesis                                                           |
| Implementation software tool(s) and version                                                                          | Vivado implementation                                                      |
| Static timing analysis performed?                                                                                    | N                                                                          |
| **Hardware Verification**                                                                                            |                                                                            |
| Hardware verified?                                                                                                   | Y                                                                          |
| Platform used for verification                                                                                       | VCK190 evaluation board                                                    |



## Conclusion

QSPI flash is commonly used as a boot device. Versal devices have an integrated QSPI controller in the PMC that can be accessed both by the internal A72/R5F processors (the usual case) and by an external CPU. This application note provides a framework for programming QSPI flash memory from an external CPU (host PC). The external CPU communicates with the Versal QSPI flash controller through PCIe, without involving the Versal PS. If DDR memory is unavailable, the application does not need it; PL block RAM can be used instead.

## References

These documents provide supplemental material useful with this guide:

1. Versal ACAP Technical Reference Manual (AM011)
2. Versal ACAP Register Reference (AM012)
3. Versal ACAP DMA and Bridge Subsystem for PCI Express Product Guide (PG344)
4. Versal ACAP Transceivers Wizard LogiCORE IP Product Guide (PG331)

## **Revision History**

The following table shows the revision history for this document.

| Section                       | Revision Summary                                                                                                     |
|-------------------------------|----------------------------------------------------------------------------------------------------------------------|
| 10/04/2022 Version 1.1        |                                                                                                                      |
| General updates               | Replaced XDMA with QDMA.                                                                                             |
| Versal Device Hardware Design<br>Host Software Design | Added BAR4 to description of BAR spaces, and updated figures. |
| Debugging                     | <ul><li>Updated with description of 128 MB flash.</li><li>Removed <i>Performance Optimization</i> section.</li></ul> |
| 06/29/2022 Version 1.0        |                                                                                                                      |
| Initial release.              | N/A                                                                                                                  |

## **Please Read: Important Legal Notices**

The information disclosed to you hereunder (the "Materials") is provided solely for the selection and use of Xilinx products. To the maximum extent permitted by applicable law: (1) Materials are made available "AS IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2) Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or to product specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior written consent. Certain products are subject to the terms and conditions of Xilinx's limited warranty, please refer to Xilinx's Terms of Sale which can be viewed at https://www.xilinx.com/legal.htm#tos; IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring fail-safe performance; you assume sole risk and liability for use of Xilinx products in such critical applications, please refer to Xilinx's Terms of Sale which can be viewed at https://www.xilinx.com/legal.htm#tos.

#### **AUTOMOTIVE APPLICATIONS DISCLAIMER**

AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262 AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT LIABILITY.

#### Copyright

© Copyright 2022 Advanced Micro Devices, Inc. Xilinx, the Xilinx logo, Alveo, Artix, Kintex, Kria, Spartan, Versal, Vitis, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. AMBA, AMBA Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of Arm Limited in the EU and other countries. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and used under license. All other trademarks are the property of their respective owners.