Multi-Processor System-on-Chip 2. Liliana Andrade

black – high-throughput variant, and blue – low-energy-consumption variant. We drop red and green from further consideration since they do not achieve the lowest cycle counts and have high-memory register file usage. However, we should keep in mind the lesson learned: our code could produce a kernel that sits in a very unfavorable spot in the design space if we underestimated the complexity layers and their implications.

      1.5.3. Measurements for low-end and high-end use cases

      For the analysis of GFDM kernel performance for the high-end and low-end workloads posed by the standard analysis in section 1.2 and Table 1.1, we have to provide a vDSP architecture on which to execute the GFDM kernel.

      We use the theoretical minimum number of operations as the baseline for a fair, objective and unbiased comparison between the code variants and their utilization of processor resources. We call this metric implementation execution density ρ and define it as the ratio of the theoretical minimum number of operations to the measured cycle count. The theoretical minimum includes a) only general/standard arithmetic or logical operations (not fused/composite operations that combine several into one, with MAC as the only exception), and b) memory accesses. The theoretical minimum of vector operations depends on the implementation variant, i.e. the loop order combination and vectorization.
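
      The definition above can be sketched in a few lines of Python. This is only an illustration, assuming ρ is the plain quotient of the two counts; the function name `execution_density` and the sample numbers are hypothetical and are not measurements from this chapter.

```python
def execution_density(min_ops: int, measured_cycles: int) -> float:
    """Return rho: theoretical minimum operations divided by measured cycles."""
    if measured_cycles <= 0:
        raise ValueError("measured_cycles must be positive")
    return min_ops / measured_cycles


# Hypothetical counts, for illustration only (not taken from the book).
rho = execution_density(min_ops=1200, measured_cycles=1600)
print(f"rho = {rho:.2f}")  # prints "rho = 0.75"
```

      A value of ρ closer to 1 would mean the measured cycle count approaches the theoretical minimum for that code variant.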

Use case / Metric                            Black        Blue

Low-end LTE legacy
  Required budget [MHz]                       1.01        5.39
  Our vDSP processing time [µs]              0.504       2.695
  vDSP utilization [%]                        0.10        0.54
  Min. vDSPs to run [#]                          1           1

CA high-end FR2 (4×CA, µ = 3, 400 MHz)
  Required budget [MHz]                      921.2     5,505.5
  Our vDSP processing time [µs]              57.58      344.09
  vDSP utilization [%]                       92.12      550.55
  Min. vDSPs to run [#]                          1           6

MIMO CA high-end FR2 (8×8, 4×CA, µ = 3, 400 MHz)
  Required budget [GHz]                       7.37       44.04
  Our vDSP processing time [µs]              460.6     2,752.7
  vDSP utilization [%]                      736.95    4,404.38
  Min. vDSPs to run [#]                          8          45

      The results support the discussion in section 1.2: the low-end use cases can run almost effortlessly in parallel with other kernels and tasks on a vDSP. Surprisingly, even the CA high-end black GFDM fits on a single vDSP core and meets the deadline, albeit at a heavy load (92.12% utilization). Since there are several vDSPs on the MPSoC, running this modulation flavor is an option to consider, provided, of course, that the memory bandwidth allows using the black flavor. Finally, as expected, it is more practical to use HW accelerator engines for the MIMO CA high-end use case than many fully loaded vDSP cores.
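
      The utilization and core-count figures in the table can be reproduced with a short calculation. The sketch below rests on two assumptions that the text does not state: a 1 GHz vDSP clock, inferred from the table (a 921.2 MHz budget corresponds to 92.12% utilization), and the MIMO budgets of 7.37 and 44.04 being in GHz, as the listed utilization implies. The names `VDSP_CLOCK_MHZ` and `vdsp_load` are hypothetical.

```python
import math

# Assumption: 1 GHz vDSP clock, inferred from the table rather than stated.
VDSP_CLOCK_MHZ = 1000.0


def vdsp_load(required_budget_mhz: float, clock_mhz: float = VDSP_CLOCK_MHZ):
    """Return (utilization in percent, minimum number of vDSP cores)."""
    utilization_pct = 100.0 * required_budget_mhz / clock_mhz
    # At least one core is needed, however small the budget is.
    min_cores = max(1, math.ceil(required_budget_mhz / clock_mhz))
    return utilization_pct, min_cores


# Required budgets in MHz as (black, blue). The MIMO values are the table's
# rounded 7.37 and 44.04 read as GHz, which is an assumption.
budgets_mhz = {
    "low-end LTE legacy": (1.01, 5.39),
    "CA high-end FR2": (921.2, 5505.5),
    "MIMO CA high-end FR2": (7370.0, 44040.0),
}

for use_case, (black, blue) in budgets_mhz.items():
    for variant, budget in (("black", black), ("blue", blue)):
        util, cores = vdsp_load(budget)
        print(f"{use_case:22s} {variant:5s} {util:9.2f} %  {cores:2d} vDSP(s)")
```

      Because the MIMO budgets are entered in rounded form, their computed utilization differs slightly from the table's 736.95% and 4,404.38%, while the minimum core counts (8 and 45) come out the same.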
