Multi-Processor System-on-Chip 2. Liliana Andrade
Чтение книги онлайн.

Читать онлайн книгу Multi-Processor System-on-Chip 2 - Liliana Andrade страница 18

СКАЧАТЬ conflict-free Turbo code interleavers that enable efficient implementation with a high degree of parallelism, advanced algorithmic and architectural features, such as next-iteration initialization, optimized radix-4 kernel, re-computation and advanced normalization to reduce internal bit widths. We see that microelectronics could not keep up with the increased requirements coming from communication systems. Thus, the design of communication systems is no longer just a matter of spectral efficiency or BER/FER. When it comes to implementation, channel coding requires a cross-layer approach covering information theory, algorithms, parallel hardware architectures and semiconductor technology to achieve excellent communications performance, high throughput, low latency, low power consumption and high energy and area efficiency (Scholl et al. 2016; Kestel et al. 2018a).

Code Dec. Algorithms Parallel vs. Serial Locality Compute Kernels Transfers vs. Compute
Turbo MAP Serial/iterative Low (interleaver) Add-Compare-Select Compute dominated
LDPC Belief propagation Parallel/iterative Low (Tanner graph) Min-Sum/add Transfer dominated
Polar Successive cancelation/list Serial High Min-Sum/add/sorting Balanced

      where ω is a normalized value between 0 and 1, that represents the timing overhead due to, for example, data distribution, interconnect, memory access conflicts, etc. The maximum clock frequency f is determined by the critical path in the compute kernels of the corresponding decoding algorithms and/or delay due to interconnect. Moreover, f is typically upper limited to 1 GHz due to power and design methodology constraints. The overhead ω increases with increasing N and P and is larger for decoding algorithms that have limited locality and are data-transfer dominated. The impact of ω on the throughput can be considered as an effective reduction of the clock frequency f or decrease in P, if additional clock cycles are mandatory, for example, due to memory conflicts, that cannot be hidden. If we are targeting 1 Tbit/s throughput with a frequency limit of 1 GHz, 1,000 bits have to be decoded in one single clock cycle that requires extreme parallelism. To achieve the highest throughput, P has to be maximized, and ω minimized. In the following sections, we will discuss the throughput maximization for the different coding schemes.

      2.3.1. Turbo decoder