The following publication Y. Fang, Y. Bu, P. Chen, F. C. M. Lau and S. A. Otaibi, "Irregular-Mapped Protograph LDPC-Coded Modulation: A Bandwidth-Efficient Solution for 6G-Enabled Mobile Networks," in IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 2, pp. 2060-2073, Feb. 2023 is available at https://dx.doi.org/10.1109/TITS.2021.3122994.

# Irregular-Mapped Protograph LDPC-Coded Modulation: A Bandwidth-Efficient Solution for 6G-Enabled Mobile Networks

Yi Fang, *Member, IEEE*, Yingcheng Bu, Pingping Chen, *Member, IEEE*, Francis C. M. Lau, *Fellow, IEEE*, and Sattam Al Otaibi

Abstract-The huge amount of data produced in the 6G networks not only brings new challenges to the reliability and efficiency of mobile devices but also drives rapid development of new storage techniques. With the benefits of fast access speed and high reliability, NAND flash memory has become a promising storage solution for the 6G networks. In this paper, we investigate a protograph-coded bit-interleaved coded modulation with iterative detection and decoding (BICM-ID) utilizing irregular mapping (IM) in NAND flash-memory systems. First, we propose an enhanced protograph-based extrinsic information transfer (EPEXIT) algorithm to facilitate the analysis of protograph codes in the IM-BICM-ID systems. Using the EPEXIT algorithm, a simple design method is conceived for the construction of a family of high-rate protograph codes, called irregular-mapped accumulate-repeat-accumulate (IMARA) codes, which possess excellent decoding thresholds and the linear-minimum-distance-growth property. Furthermore, motivated by the voltage-region iterative gain characteristics of IM-BICM-ID systems, a novel read-voltage optimization scheme is developed to acquire accurate read-voltage levels, thus minimizing the decoding thresholds (in dB) of protograph codes. Analyses and simulations indicate that the proposed IMARA-aided IM-BICM-ID scheme and read-voltage optimization scheme remarkably improve the convergence and decoding performance of flash-memory systems. Thus, the proposed protograph-coded IM-BICM-ID can be viewed as a reliable and efficient storage solution for the new-generation mobile networks, such as Internet of Vehicles.

*Index Terms*—6G networks, Internet of Vehicles, bit-interleaved coded modulation (BICM), irregular mapping (IM), massive data storage, protograph LDPC codes.

## I. INTRODUCTION

The 6G networks are expected to support three generic services, i.e., enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC), which aim to provide high data rate, ultra-low latency, high reliability and massive connectivity [1]. To meet these requirements, a variety of leading-edge technologies, such as artificial intelligence (AI) and big-data analytics, have been applied in the design of 6G networks to optimize their performance in terms of error rate and spectral efficiency. With the evolution of 6G technology as well as the emergence of data-driven usage scenarios (e.g., autonomous vehicles), a huge amount of data is produced [2]. As such, it is indispensable to devise high-performance data-storage devices to reliably and efficiently store the massive volumes of data generated in 6G mobile networks [3], [4]. However, the conventional data-storage techniques, such as magnetic hard-disk drives (HDDs) and compact discs (CDs), have become incapable of satisfying the fast-access-speed and high-reliability requirements of the 6G-and-beyond networks. With the advantages of large storage capacity, high read-and-write speed, low power consumption and high reliability, the NAND-flash-memory-based solid-state drives (SSDs) [5] have emerged as a competitive and promising alternative for the 6G-enabled vehicular networks.

Y. Fang and Y. Bu are with the School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China, and also with the State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710126, China (e-mail: fangyi@gdut.edu.cn; yingchengbu@126.com).

P. Chen is with the Department of Electronic Information, Fuzhou University, Fuzhou 350116, China (e-mail: ppchen.xm@gmail.com).

F. C. M. Lau is with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: francis-cm.lau@polyu.edu.hk).

S. A. Otaibi is with Innovation and Entrepreneurship Center, College of Engineering, Taif University, Ta'if 26571, Saudi Arabia (e-mail: sro-taibi@tu.edu.sa).

In fact, the NAND flash memory has been recognized as an efficient storage medium, which can be utilized in a myriad of wireless communication systems [6], [7]. Using the multilevel-cell (MLC) technique [8], [9], the flash memory can store two bits in each cell, which leads to a significant growth in storage density and capacity. However, due to the scaling down of flash-memory devices, the flash memory is prone to more severe noise, which deteriorates its reliability. In particular, the repeated program-and-erase (PE) cycles induce severe voltage-level distortions and thus result in a high raw bit-error rate (BER). To ensure the data reliability of the flash memory, error correction codes (ECCs) have been used to compensate for the high raw BER [10]. Unfortunately, the conventional ECCs, such as Bose-Chaudhuri-Hocquenghem (BCH) codes [11], can no longer meet the high-reliability requirement of ultra-high-density flash memory. As a remedy, low-density parity-check (LDPC) codes have emerged as a more promising type of ECC for flash memory [12]–[15]. In particular, LDPC codes can be decoded by the iterative belief-propagation (BP) algorithm [16], in which the log-likelihood ratios (LLRs) are iteratively exchanged between the variable nodes (VNs) and check nodes (CNs) to achieve excellent decoding performance. For this reason, the flash memory requires fine-grained memory-sensing precision to acquire accurate LLR information so as to improve the decoding performance.

Due to the self-interleaving feature of LDPC codes, an MLC flash-memory system with LDPC codes can be considered as a bit-interleaved coded modulation (BICM) system [17] with 4-pulse-amplitude modulation (4PAM), which is widely applied to many wireless communication applications. Since the Gray mapping can achieve the largest BICM capacity, the flash memory generally applies the Gray mapping to minimize its error probability. As an evolution of BICM, BICM with iterative detection (or demodulation) and decoding (i.e., BICM-ID), in which extrinsic-information iterations are performed between the detector and decoder, has been introduced in [18]. Benefiting from the iterative process, a large gain can be achieved to significantly enhance the system performance. Therefore, the ID architecture has been introduced into the flash-memory systems [13], [19]. In these BICM-ID flash-memory systems, however, the Gray mapping cannot attain any performance gain. Consequently, anti-Gray mapping has been considered as a preferable choice in such scenarios [19].

© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Manuscript received July 20, 2021; revised September 20, 2021; accepted October XX, 2021. Date of publication October XX, 2021; date of current version Month XX, 2021. This work was supported by the NSF of China (Nos. 62071131, 61771149, 62171135, U2001203, 61871132), the RGC of the Hong Kong SAR, China (Project No. PolyU 152170/18E), the Open Research Fund of the State Key Laboratory of Integrated Services Networks under Grant ISN22-23, the NSF of Guangdong Province (No. 2019A1515011465), and the Guangdong Innovative Research Team Program (No. 2014ZT05G157). (Corresponding author: Pingping Chen.)

As a class of powerful ECCs, LDPC codes have been extensively applied in a variety of storage and communication systems. Protograph codes [20], [21], i.e., a subclass of LDPC codes, benefit from simple structures and desirable performance. Particularly, the accumulate-repeat-by-4-jagged-accumulate (AR4JA) code [22] is a typical protograph code that enjoys excellent error performance over additive white Gaussian noise (AWGN) channels. The authors in [13] have employed the asymmetric density evolution (DE) to design the LDPC codes in the BICM-ID three-level-cell (TLC) flash-memory systems. To adapt to the variation of PE cycles, the authors in [23] have designed the rate-adaptive protograph LDPC (RAP-LDPC) codes for MLC flash-memory systems without ID architecture. Recently, the authors in [19] have developed a voltage-sensing protograph-based extrinsic information transfer (VS-PEXIT) algorithm to optimize the protograph codes in the BICM-ID flash-memory systems.

To achieve excellent decoding performance in BICM-ID systems, a sufficiently large number of iterations is required, which heavily increases the computational complexity and decoding latency of the receiver. In particular, for flash-memory systems, the total read latency is composed of firmware processing, memory-sensing latency and flash-to-controller data transfer latency, where the decoding latency is included [24], [25]. Thus, the application of BICM-ID would increase the total read latency and consequently deteriorate the system performance. To overcome this disadvantage, irregular mapping (IM), in which different mappings are used within a codeword, has been developed to accelerate the convergence of BICM-ID systems [26]-[29]. Compared with the regular mapping, IM can provide additional convergence improvement and more flexible design for BICM-ID systems. In [27], the authors have adopted two different mappings within the same codeword and found a proper mixing ratio through simulations. Also, an IM design method has been presented in [28], where a new mapping is searched via a modified adaptive binary switch algorithm (ABSA) given a pre-fixed mapping, a channel code and a mixing ratio. The authors in [29] have optimized the mixing ratio of two mappings by maximizing the extrinsic mutual information (MI). Nonetheless, the above-mentioned works only focus on the design of IM for BICM-ID systems over AWGN and fading channels. As far as we know, few studies have touched upon the analysis and protograph-code design for BICM-ID with IM in MLC flash-memory systems.

The detection performance is another important factor that determines the reliability of MLC flash-memory systems. To fully benefit from the LDPC decoder at the receiver, it is desirable to acquire the initial channel LLR information as accurately as possible. Consequently, some read-voltage optimization schemes have been proposed to obtain more accurate LLR information for MLC flash-memory systems [12], [24], [30], in which the ID framework was not considered. In particular, a more precise read-voltage optimization scheme has been provided in [30], which selects the read-voltage levels by maximizing the MI (MMI) between the input and output of a flash-memory channel. Furthermore, a voltage-entropy-based read-voltage optimization scheme has been proposed to minimize the channel error probability in [24]. In this read-voltage optimization scheme, the width of erasure regions is selected through various simulations, which dramatically increases the system complexity. However, how to optimize the read voltages by considering the features of the BICM-ID and IM in MLC flash-memory systems has not been reported in the open literature.

In this paper, we investigate the performance of the protograph-coded IM-BICM-ID in the MLC flash-memory systems. In the IM-BICM-ID systems, we propose to employ two fixed mappings, namely *Gray* mapping and *anti-Gray* mapping, within a codeword. To facilitate the analysis of protograph codes, an enhanced PEXIT (EPEXIT) algorithm is developed, which considers the ID architecture, IM and variation of threshold voltages. Based on the EPEXIT algorithm, we put forward a novel design approach for the construction of a family of high-rate protograph codes, called irregular-mapped accumulate-repeat-accumulate (IMARA) codes. The proposed rate-0.9 IMARA code not only possesses the linear-minimum-distance-growth property, but also benefits from a desirable decoding threshold, which has only a 0.182 dB gap to the capacity limit of an IM-BICM-ID flash-memory channel. In addition, in the proposed IM-BICM-ID systems, one can observe that the MI of the coded bits located in the overlapped region between two adjacent symbols with the largest Hamming distance (i.e., $d_{Ham} = 2$, referred to as the dominant-gain region) can significantly increase after performing the outer iterations, while the MI of those located in other regions cannot. Inspired by the above voltage-region iterative gain characteristics, an EPEXIT-aided read-voltage optimization scheme is introduced to minimize the decoding thresholds of protograph codes. Through simulations, it is shown that the IMARA-based IM-BICM-ID scheme stands out as a superior signal processing framework compared with the conventional BICM-ID and IM-BICM-ID schemes over MLC flash-memory channels in terms of error performance and decoding latency. Besides, it is demonstrated that the proposed read-voltage optimization scheme can not only boost the decoding performance of IM-BICM-ID systems, but also further reduce the decoding latency over MLC flash-memory channels with respect to the state-of-the-art read-voltage optimization schemes.



Fig. 1. Block diagram of a protograph-coded IM-BICM-ID MLC flash-memory system.

Thanks to the aforementioned advantages, the proposed protograph-based IM coded-modulation storage scheme appears to be a promising storage framework for the 6G-enabled mobile communication applications, such as Internet of Vehicles.

## II. SYSTEM MODEL AND PRELIMINARIES

#### A. Transmitter

The block diagram of a protograph-coded IM-BICM-ID MLC flash-memory system is shown in Fig. 1, where $D$ different types of signal constellations $\chi_d$ and mappings $\mu_d$ $(d = 1, 2, \dots, D)$ are adopted within a codeword. Suppose that a protograph has $P$ VNs $v_j$ $(j = 1, 2, \ldots, P)$. A protograph LDPC code can be derived through a "copy-and-permute" operation. The protograph encoder encodes an information-bit sequence $\mathbf{s}$ into a codeword $\mathbf{c}$ (i.e., coded-bit sequence) of length $n$, where $n = lP$ and $l$ is the lifting factor of the protograph code. As can be seen in Fig. 2, due to the structure of a protograph code [20], the coded-bit sequence $\mathbf{c}$ can be grouped into $P$ blocks $\hat{V}_1, \hat{V}_2, \ldots, \hat{V}_P$, and each block $\hat{V}_j$ comprises the $l$ copies of $v_j$. Assuming that $D$ different types of mappings are employed in an IM-BICM-ID system, one can divide each block $\hat{V}_j$ into $D$ sub-blocks $\{\hat{V}_{j,d} : d = 1, 2, \dots, D\}$. The length of $\hat{V}_{j,d}$ equals $\alpha_d l$ $(j = 1, 2, \ldots, P;\ d = 1, 2, \ldots, D)$, where $\alpha_d \in (0, 1)$ and $\sum_{d=1}^{D} \alpha_d = 1$. As a result, the coded-bit sequence $\mathbf{c} = (\hat{V}_1, \hat{V}_2, \ldots, \hat{V}_P)$ can be expressed as $\mathbf{c} = (\hat{V}_{1,1}, \hat{V}_{1,2}, \dots, \hat{V}_{1,D}; \dots; \hat{V}_{P,1}, \hat{V}_{P,2}, \dots, \hat{V}_{P,D})$.

The principle of the bit interleaving process is illustrated in Fig. 2.<sup>1</sup> After being processed by the interleaver $\Pi$, the interleaved coded-bit sub-sequence $\mathbf{c}_d$ of length $\alpha_d lP = \alpha_d n$ is produced. To be specific, the interleaved coded-bit sub-sequence $\mathbf{c}_d$ consists of the sub-blocks $\hat{V}_{1,d}, \hat{V}_{2,d}, \ldots, \hat{V}_{P,d}$, i.e., $\mathbf{c}_d = (\hat{V}_{1,d}, \hat{V}_{2,d}, \ldots, \hat{V}_{P,d})$. Therefore, the interleaved coded-bit sequence $\hat{\mathbf{c}} = (\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_D)$, consisting of $D$ interleaved coded-bit sub-sequences, can be represented as $\hat{\mathbf{c}} = (\hat{V}_{1,1}, \hat{V}_{2,1}, \ldots, \hat{V}_{P,1}; \ldots; \hat{V}_{1,D}, \hat{V}_{2,D}, \ldots, \hat{V}_{P,D})$.
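As an illustration, the regrouping of the codeword blocks into $D$ sub-sequences described above can be sketched in Python as follows. The function name `irregular_interleave`, its argument names and the toy parameters are assumptions made for this sketch only; any additional bit permutation that a practical interleaver $\Pi$ may perform is deliberately omitted.

```python
def irregular_interleave(codeword, num_vns, alphas):
    """Regroup c = (V_1, ..., V_P) into D sub-sequences c_d = (V_{1,d}, ..., V_{P,d}).

    codeword: list of coded bits of length n = l * P
    num_vns:  number of protograph VNs, P
    alphas:   mixing ratios alpha_1..alpha_D (must sum to 1)
    """
    n = len(codeword)
    if n % num_vns != 0:
        raise ValueError("codeword length must be a multiple of P")
    lift = n // num_vns  # lifting factor l
    # Each sub-block length alpha_d * l must be an integer.
    lengths = [round(a * lift) for a in alphas]
    exact = all(abs(a * lift - s) < 1e-9 for a, s in zip(alphas, lengths))
    if sum(lengths) != lift or not exact:
        raise ValueError("alpha_d * l must be integers that sum to l")

    sub_sequences = [[] for _ in alphas]
    for j in range(num_vns):
        block = codeword[j * lift:(j + 1) * lift]  # block V_j (l copies of v_j)
        start = 0
        for d, length in enumerate(lengths):
            # Sub-block V_{j,d} is appended to sub-sequence c_d.
            sub_sequences[d].extend(block[start:start + length])
            start += length
    return sub_sequences


# Toy example: P = 3 VNs, l = 4, two mappings with alpha = (0.5, 0.5).
# Integers are used instead of bits so that positions are easy to track.
toy = irregular_interleave(list(range(12)), num_vns=3, alphas=[0.5, 0.5])
```

In this toy example, each block of four positions contributes its first two positions to $\mathbf{c}_1$ and its last two to $\mathbf{c}_2$, so every sub-sequence contains bits from all $P$ blocks.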

Afterwards, the interleaved coded-bit sub-sequence $\mathbf{c}_d = (c_{d,1}, c_{d,2}, \ldots, c_{d,\alpha_d n})$ is reshaped into another sub-sequence $\mathbf{c}'_d = (c'_{d,1}, c'_{d,2}, \ldots, c'_{d,\alpha_d m})$. Specifically, $\mathbf{c}'_d$ is composed of $\alpha_d m$ $k$-arrays, where $c'_{d,p}$ $(p = 1, 2, \ldots, \alpha_d m)$ represents a $k$-array containing $k = n/m$ bits. Then, a modulated symbol sub-sequence $\mathbf{x}_d$ can be generated based on the sub-sequence $\mathbf{c}'_d$, where $\mathbf{x}_d = (x_{d,1}, x_{d,2}, \ldots, x_{d,\alpha_d m})$ and the $p$-th modulated symbol $x_{d,p}$ is selected from a $2^k$-ary signal constellation $\chi_d$. Here, let $b_t(x_{d,p})$ $(t = 1, 2, \ldots, k)$ denote the $t$-th bit of the label of $x_{d,p}$ and $c_{d,p}^{'t}$ denote the $t$-th bit in $c'_{d,p}$.

Fig. 2. Illustration of the bit interleaving process in an IM-BICM-ID system.

#### B. Flash-Memory Channel Model

This paper considers an MLC flash-memory channel model including the random telegraph noise, programming noise, data retention noise and cell-to-cell interference [24], [25], [31], which can be formulated as:

$$V_{\rm th} = v_w + n_u + n_p + n_w + n_r + n_c, \tag{1}$$

where $V_{\rm th}$ is the threshold voltage of a memory cell and $v_w$ is the write-voltage level; $n_u$ and $n_p$ represent the incremental step pulse programming (ISPP) noise and programming noise, respectively; $n_w$ is the random telegraph noise, which is related to the repeated PE cycles; $n_r$ is the data retention noise caused by charge leakage over the retention time after flash-memory cells are programmed, which is related to the retention time and number of PE cycles; $n_c$ is the cell-to-cell interference, which can be alleviated by applying post-compensation and pre-distortion techniques [24], [31]. According to [24], [25], [31], the four write-voltage levels are assumed as $\{1.4 \text{ V}, 2.6 \text{ V}, 3.2 \text{ V}, 3.93 \text{ V}\}$ and the ISPP size is assumed as 0.3 V in this paper. The standard deviation of programming noise is set to 0.05, which remains unchanged throughout the lifetime of flash memory. The parameters of the remaining noises in the flash-memory systems are set according to [24]. Additionally, the pre-distortion technique can be used to mitigate the effect of cell-to-cell interference.

Fig. 3 shows threshold-voltage distributions of an MLC flash memory employing *Gray* mapping and *anti-Gray* mapping. The four threshold-voltage levels  $S_1$ ,  $S_2$ ,  $S_3$  and  $S_4$  correspond to the data symbols '11', '10', '00' and '01' for *Gray* mapping, respectively, while for *anti-Gray* mapping, the four threshold-voltage levels represent the data symbols '11', '10', '01' and '00', respectively. To obtain accurate LLR information for the protograph decoder, multiple memory-sensing operations should be performed. For instance, the read-voltage levels can be obtained by maximizing the MI (MMI) between the input and output of the flash-memory channel [30].

<sup>1</sup>Although LDPC codes possess an intrinsic interleaving feature, we still need an extra interleaver to re-assign the coded-bit sequence $\mathbf{c}$ into $D$ sub-sequences for modulation, where each sub-sequence is modulated by using a unique mapping. In particular, a carefully designed interleaver can help to improve the error correction performance for the MLC flash-memory IM-BICM-ID system.

Fig. 3. Threshold-voltage distributions of an MLC flash memory employing *Gray* mapping and *anti-Gray* mapping.

Fig. 3 presents an MLC flash memory using 6 read voltages, in which the vertical dashed lines (i.e., $R_1$ to $R_6$) denote the read-voltage levels. With 6 read-voltage levels obtained via the MMI technique [30], the MLC flash-memory channel can be modeled as a 4-input 7-output discrete memoryless channel (DMC) [30],<sup>2</sup> where its input $X \in \{00, 01, 10, 11\}$ and output $Y \in \{00, 01, 10, 11, e_1, e_2, e_3\}$ ($e_1, e_2, e_3$ denote the symbols in three distinct erasure regions $E_1, E_2, E_3$, respectively). Therefore, for a quantized IM-BICM-ID flash-memory channel using six read voltages, its capacity can be computed as (2), where $P_{ur}$ represents the channel transition probability of an equivalent DMC, i.e., $P(Y_r|X_u)$ $(r = 1, 2, \ldots, 7;\ u = 1, 2, 3, 4)$, which is expressed by

$$P(Y_r|X_u) = \int_{R_{r-1}}^{R_r} f_{S_u}(V_{\rm th}) \,\mathrm{d}V_{\rm th}. \tag{3}$$

Here,  $R_r$  is the read-voltage level,  $R_0 = -\infty$ ,  $R_7 = \infty$ , and  $f_{S_u}(V_{\text{th}})$  is the probability density function (PDF) of the threshold-voltage level  $S_u$  [24].
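To make (2) and (3) concrete, the following Python sketch builds the 4-input 7-output transition matrix and evaluates the mutual information under equiprobable inputs. It assumes, purely for illustration, that each state PDF $f_{S_u}$ is Gaussian; the standard deviations and the six read voltages used in the example are hypothetical values and are not taken from the paper. The function names are likewise choices made for this sketch.

```python
import math


def gaussian_cdf(x, mean, std):
    """CDF of N(mean, std^2); math.erf handles x = +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def transition_matrix(means, stds, read_voltages):
    """Row u holds P(Y_r | X_u) for r = 1..7, i.e. Eq. (3) with Gaussian PDFs."""
    edges = [-math.inf] + sorted(read_voltages) + [math.inf]  # R_0 .. R_7
    matrix = []
    for mean, std in zip(means, stds):
        cdf = [gaussian_cdf(edge, mean, std) for edge in edges]
        matrix.append([cdf[r + 1] - cdf[r] for r in range(len(edges) - 1)])
    return matrix


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def dmc_capacity_uniform(matrix):
    """H(Y) - H(Y|X) with equiprobable inputs, as in Eq. (2)."""
    num_inputs = len(matrix)
    num_outputs = len(matrix[0])
    p_y = [sum(row[r] for row in matrix) / num_inputs for r in range(num_outputs)]
    return entropy(p_y) - sum(entropy(row) for row in matrix) / num_inputs


# Write-voltage levels from the text; stds and read voltages are hypothetical.
state_means = [1.4, 2.6, 3.2, 3.93]
hypothetical_reads = [1.9, 2.1, 2.85, 2.95, 3.5, 3.63]
noisy = transition_matrix(state_means, [0.2] * 4, hypothetical_reads)
capacity_bits = dmc_capacity_uniform(noisy)
```

Each pair of adjacent read voltages in this example brackets one erasure region, so the seven output regions alternate between the four hard-decision regions and the three erasure regions.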

#### C. Receiver

At the receiver, iterative detection and decoding are performed. We define the inner iterations as the iterations within the protograph decoder (i.e., between the VNs and CNs), and define the outer iterations as the iterations between the detector and decoder. The LLR sequences for the outer iteration are defined as follows: $\{L_{\rm E,det}\}$ represents the extrinsic LLR sequence of the detector, which consists of $D$ sub-sequences, i.e., $\{L_{\rm E,det}\} = (\{L_{\rm E,det}^1\}, \{L_{\rm E,det}^2\}, \dots, \{L_{\rm E,det}^D\})$, where $\{L_{\rm E,det}^d\}$ contains $P$ blocks $(d = 1, 2, \dots, D)$; $\{L_{\rm A,det}\}$ represents the *a priori* LLR sequence of the detector, i.e., $\{L_{\rm A,det}\} = (\{L_{\rm A,det}^1\}, \{L_{\rm A,det}^2\}, \dots, \{L_{\rm A,det}^D\})$, where $\{L_{\rm A,det}^d\}$ contains $P$ blocks; $\{L_{\rm E,dec}\}$ and $\{L_{\rm A,dec}\}$ are the extrinsic LLR sequence and *a priori* LLR sequence of the decoder, respectively.

As is shown in Fig. 1, the detector utilizes the stored symbol sub-sequence  $\mathbf{y}_d = (y_{d,1}, y_{d,2}, \dots, y_{d,\alpha_d m})$  and its *a priori* LLR sub-sequence  $\{L_{A,det}^d\}$   $(d = 1, 2, \dots, D)$  fed back from the decoder to evaluate the extrinsic LLR sub-sequence  $\{L_{E,det}^d\}$ , which can be given by

$$L_{\mathrm{E,det}}^{d}(c_{d,p}^{'t}) = \ln \frac{\sum\limits_{x_{d,p} \in \chi_{d,0}^{t}} P_{d,p} \exp\left(\sum\limits_{\substack{q=1,q \neq t \\ b_{q}(x_{d,p})=0}}^{k} L_{\mathrm{A,det}}^{d}(c_{d,p}^{'q})\right)}{\sum\limits_{x_{d,p} \in \chi_{d,1}^{t}} P_{d,p} \exp\left(\sum\limits_{\substack{q=1,q \neq t \\ b_{q}(x_{d,p})=0}}^{k} L_{\mathrm{A,det}}^{d}(c_{d,p}^{'q})\right)}, \tag{4}$$

where  $P_{d,p}$  represents the channel transition probability  $P(y_{d,p}|x_{d,p})$ , and  $\chi_{d,b}^t$  denotes the subset of constellation  $\chi_d$  whose label has the value  $b \in \{0,1\}$  in its *t*-th position, i.e.,  $\chi_{d,b}^t = \{x_{d,p} \in \chi_d : b_t(x_{d,p}) = b, b \in \{0,1\}\}.$ 
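For a single stored MLC symbol ($k = 2$), the detector rule in (4) can be sketched in Python as below. The label tables follow the *Gray* and *anti-Gray* assignments of $S_1$ to $S_4$ given in Section II-B, and the LLR convention $L = \ln[P(\text{bit}=0)/P(\text{bit}=1)]$ is the one implied by (4). The function name, the zero-based bit index `t` and the example channel probabilities are assumptions made for this sketch.

```python
import math

GRAY = ["11", "10", "00", "01"]       # labels of S_1..S_4 under Gray mapping
ANTI_GRAY = ["11", "10", "01", "00"]  # labels of S_1..S_4 under anti-Gray mapping


def detector_extrinsic_llr(channel_probs, labels, prior_llrs, t):
    """Extrinsic LLR of bit position t (0-based) for one symbol, following Eq. (4).

    channel_probs[u]: P(y | x = S_{u+1}) for the observed output y
    labels[u]:        bit label of S_{u+1}
    prior_llrs[q]:    a priori LLR of bit q, ln P(bit=0)/P(bit=1)
    """
    sums = {"0": 0.0, "1": 0.0}
    for prob, label in zip(channel_probs, labels):
        # A priori contribution of the other bits whose label value is 0.
        exponent = sum(
            prior_llrs[q] for q in range(len(label)) if q != t and label[q] == "0"
        )
        sums[label[t]] += prob * math.exp(exponent)
    # Note: raises if one of the two sums is exactly zero.
    return math.log(sums["0"] / sums["1"])


# Hypothetical channel probabilities for one observed output region.
example_probs = [0.1, 0.2, 0.3, 0.4]
llr_no_prior = detector_extrinsic_llr(example_probs, GRAY, [0.0, 0.0], t=0)
llr_with_prior = detector_extrinsic_llr(example_probs, GRAY, [0.0, 20.0], t=0)
```

With all-zero *a priori* LLRs the result reduces to the ratio of summed channel probabilities, while a confident *a priori* LLR on the other bit effectively restricts the sums to the symbols consistent with that bit.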

Especially, during the initialization process, the *a priori* LLR sub-sequence of the detector is $\{L_{\rm A,det}^d\} = \{0\}$, so the extrinsic LLR sub-sequence of the detector $\{L_{\rm E,det}^d\}$ is equal to the channel LLR sub-sequence $\{L_{\rm ch}^d\}$. Subsequently, the extrinsic LLR sequence of the detector $\{L_{\rm E,det}\}$ is processed by the deinterleaver and sent to the decoder to serve as its *a priori* LLR sequence $\{L_{\rm A,dec}\}$. The decoder uses the *a priori* LLR sequence $\{L_{\rm A,dec}\}$ to compute its extrinsic LLR sequence $\{L_{\rm E,dec}\}$. Afterwards, the extrinsic LLR sequence of the decoder $\{L_{\rm E,dec}\}$ is processed by an interleaver and fed back to the detector to serve as its *a priori* LLR sequence $\{L_{\rm A,det}\}$.

## III. EPEXIT ALGORITHM FOR IM-BICM-ID MLC FLASH-MEMORY SYSTEMS

With the aim of analyzing and designing protograph codes in the IM-BICM-ID flash-memory systems, an *enhanced PEXIT* (*EPEXIT*) algorithm is proposed, which takes into account the ID architecture, IM and variation of threshold voltages.

$$\begin{aligned}
C &= H(Y) - H(Y|X) \\
&= H\left(\frac{\sum_{u=1}^{4} P_{u1}}{4}, \frac{\sum_{u=1}^{4} P_{u2}}{4}, \frac{\sum_{u=1}^{4} P_{u3}}{4}, \frac{\sum_{u=1}^{4} P_{u4}}{4}, \frac{\sum_{u=1}^{4} P_{u5}}{4}, \frac{\sum_{u=1}^{4} P_{u6}}{4}, \frac{\sum_{u=1}^{4} P_{u7}}{4}\right) \\
&\quad - \frac{1}{4}\left[\sum_{u=1}^{4} H\left(P_{u1}, P_{u2}, P_{u3}, P_{u4}, P_{u5}, P_{u6}, P_{u7}\right)\right]
\end{aligned} \tag{2}$$

<sup>2</sup>Since the cell-to-cell interference can be compensated by using post-compensation or pre-distortion techniques, the flash-memory channel can be regarded as a memoryless channel [24], [30], [31].

Fig. 4. Block diagram of the EPEXIT algorithm in an IM-BICM-ID flash-memory system.

#### A. Equivalent SNR for MLC Flash-Memory Channels

Based on the flash-memory channel model in [23], [24], [31], the threshold voltages of the erased state $S_1$ can be accurately modeled as a Gaussian-distributed random variable, while the threshold voltages of a programmed state $S_u$ $(u = 2, 3, 4)$ approximately follow a Gaussian distribution. Therefore, it makes sense to define the equivalent signal-to-noise ratio (SNR) based on the Gaussian approximation in the analysis of decoding thresholds of protograph codes in the IM-BICM-ID systems.<sup>3</sup> To be specific, we approximate the threshold-voltage distribution of the programmed state $S_u$ $(u = 2, 3, 4)$ by using a Gaussian PDF, denoted as $\tilde{f}_{S_u}(V_{\text{th}})$, i.e.,

$$\tilde{f}_{S_u}(V_{\rm th}) = \frac{1}{\sqrt{2\pi\sigma_u^2}} \exp\left(-\frac{(V_{\rm th} - \mu_u)^2}{2\sigma_u^2}\right),\tag{5}$$

where  $\mu_u$  and  $\sigma_u^2$  are the mean and variance of the approximated threshold-voltage distribution  $V_{\rm th} \sim \mathcal{N}(\mu_u, \sigma_u^2)$ , respectively. When the number of PE cycles and the retention time are given,  $\mu_u$  and  $\sigma_u^2$  can be evaluated based on the real threshold-voltage distribution function  $f_{S_u}(V_{\rm th})$  [24], i.e.,

$$\mu_u = \int_{-\infty}^{+\infty} V_{\rm th} f_{S_u}(V_{\rm th}) \,\mathrm{d}V_{\rm th},\tag{6}$$

$$\sigma_u^2 = \int_{-\infty}^{+\infty} V_{\rm th}^2 f_{S_u}(V_{\rm th}) \,\mathrm{d}V_{\rm th} - \mu_u^2. \tag{7}$$

As a result, the equivalent SNR per information bit, i.e.,  $E_b/N_0$ , can be defined by treating the flash-memory channel model as a 4-PAM-aided AWGN channel model [30], [32], as

$$\frac{E_b}{N_0} = \frac{E_s}{2RN_0} = \frac{\sum_{u=1}^4 w_{S_u} \mu_u^2}{4R \sum_{u=1}^4 w_{S_u} \sigma_u^2} = \frac{\sum_{u=1}^4 \mu_u^2}{4R \sum_{u=1}^4 \sigma_u^2}, \tag{8}$$

where $E_s/N_0$ is the SNR per symbol, $E_b$ is the average energy per information bit, $N_0 = 2 \sum_{u=1}^4 w_{S_u} \sigma_u^2$ is the noise power-spectral density, $R$ is the code rate, $\mu_1$ and $\sigma_1^2$ are the mean and variance of the voltage distribution of the erased state $S_1$, respectively, and $w_{S_u}$ is the probability that the memory cells are written to voltage state $S_u$ during the programming process. The memory cells are written to the different voltage states with equal probability, i.e., $w_{S_u} = 1/4$ for $u = 1, 2, 3, 4$.
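A direct numerical reading of (8) is sketched below in Python. The function name `equivalent_ebn0_db` and the example moments are assumptions for illustration; in practice $\mu_u$ and $\sigma_u^2$ would come from (6) and (7) for a given number of PE cycles and retention time.

```python
import math


def equivalent_ebn0_db(means, variances, rate):
    """Equivalent E_b/N_0 in dB from Eq. (8), assuming w_{S_u} = 1/4 for all states."""
    weight = 1.0 / len(means)
    es = sum(weight * m * m for m in means)      # E_s = sum_u w_{S_u} mu_u^2
    n0 = 2.0 * sum(weight * v for v in variances)  # N_0 = 2 sum_u w_{S_u} sigma_u^2
    ebn0 = es / (2.0 * rate * n0)                # two coded bits per MLC cell
    return 10.0 * math.log10(ebn0)


# Hypothetical moments chosen so that the result is easy to verify by hand:
# E_s = 1, N_0 = 0.5, R = 0.5, hence E_b/N_0 = 2 (about 3.01 dB).
example_db = equivalent_ebn0_db([1.0] * 4, [0.25] * 4, rate=0.5)
```

The intermediate step $E_b/N_0 = E_s/(2RN_0)$ reflects that each MLC symbol carries two coded bits, i.e., $2R$ information bits.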

#### B. Proposed EPEXIT Algorithm

A protograph with $P$ VNs $v_1, v_2, \ldots, v_P$ and $Q$ CNs $c_1, c_2, \ldots, c_Q$ can be represented by a $Q \times P$ base matrix $\mathbf{B} = (b_{i,j})$, in which $b_{i,j}$ represents the number of edges connecting $v_j$ $(j = 1, 2, \ldots, P)$ to $c_i$ $(i = 1, 2, \ldots, Q)$. Based on the size of a protograph or its corresponding base matrix, one can easily obtain the code rate as $R = (P-Q)/(P-N_E)$, where $N_E$ is the number of punctured VNs. After an $l$-time "copy-and-permute" (also known as lifting) of the base matrix $\mathbf{B}$, the parity-check matrix $\mathbf{H}$ of a protograph code, of size $(lQ) \times (lP)$, can be derived. Typically, the copy-and-permute procedure can be implemented by a modified progressive edge growth (PEG) algorithm [33].
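The rate formula and the size of the lifted parity-check matrix can be checked with a few lines of Python. The base matrix below is an illustrative $3 \times 5$ example with one punctured VN, chosen for this sketch rather than taken from the paper, and the function names are hypothetical.

```python
def protograph_code_rate(base_matrix, num_punctured=0):
    """R = (P - Q) / (P - N_E) for a Q x P base matrix with N_E punctured VNs."""
    num_cns = len(base_matrix)     # Q rows, one per check node
    num_vns = len(base_matrix[0])  # P columns, one per variable node
    return (num_vns - num_cns) / (num_vns - num_punctured)


def lifted_size(base_matrix, lift):
    """Dimensions (lQ, lP) of the parity-check matrix after lifting by l."""
    return (lift * len(base_matrix), lift * len(base_matrix[0]))


# Illustrative base matrix: entry b[i][j] counts the edges between c_i and v_j.
example_base = [
    [1, 2, 0, 0, 0],
    [0, 3, 1, 1, 1],
    [0, 1, 2, 2, 1],
]
example_rate = protograph_code_rate(example_base, num_punctured=1)
example_shape = lifted_size(example_base, lift=100)
```

Puncturing removes VNs from the transmitted codeword without changing the number of information bits, which is why $N_E$ appears only in the denominator.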

The EXIT algorithm is an effective tool to predict the convergence behavior of iterative decoders. In particular, the conventional PEXIT algorithm can be employed to predict the asymptotic convergence performance of protograph codes in terms of decoding thresholds [34]. However, the conventional PEXIT algorithm only considers the binary-phase-shift-keying (BPSK) modulation, AWGN, and the MI update between VNs and CNs within the BP decoder, and is therefore not suitable for flash-memory systems with ID architecture and IM scheme. To overcome this disadvantage, an EPEXIT algorithm, considering the iterations between the detector and decoder, IM and variation of threshold voltages, is proposed to trace the MI evolution not only between VNs and CNs, but also between the detector and decoder.

The block diagram of the EPEXIT algorithm in an IM-BICM-ID flash-memory system is illustrated in Fig. 4, where the extrinsic MI output from the detector serves as the *a priori* MI of the decoder, i.e., $I_{\rm E,det} = I_{\rm A,dec}$, and the extrinsic MI output from the decoder serves as the *a priori* MI of the detector, i.e., $I_{\rm E,dec} = I_{\rm A,det}$. It can be verified through Monte-Carlo simulations that the extrinsic LLR sequence of the decoder $\{L_{\rm E,dec}\}$ approximately follows a Gaussian distribution. Therefore, the $J(\cdot)$ function [35] can be utilized to calculate the extrinsic MI of the decoder $I_{\rm E,dec}$, as

$$I_{\rm E,dec} = 1 - \int_{-\infty}^{\infty} \frac{\exp\left(-\frac{(z-\sigma_L^2/2)^2}{2\sigma_L^2}\right)}{\sqrt{2\pi\sigma_L^2}} \times \log_2\left[1 + \exp(-z)\right] \mathrm{d}z, \tag{9}$$

where  $\sigma_L$  denotes the standard deviation of  $\{L_{E,dec}\}$ .
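The integral in (9) has no closed form, so it is usually evaluated numerically or through a fitted approximation. The Python sketch below uses plain trapezoidal integration and inverts $J(\cdot)$ by bisection; the function names, the integration range of $\pm 10\sigma_L$ around the mean and the number of steps are choices made for this sketch, not parameters from the paper.

```python
import math


def j_function(sigma, steps=4000):
    """Numerically evaluate Eq. (9) for LLRs distributed as N(sigma^2/2, sigma^2)."""
    if sigma <= 0.0:
        return 0.0
    mean = sigma * sigma / 2.0
    low, high = mean - 10.0 * sigma, mean + 10.0 * sigma
    step = (high - low) / steps
    norm = math.sqrt(2.0 * math.pi * sigma * sigma)
    total = 0.0
    for i in range(steps + 1):
        z = low + i * step
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        pdf = math.exp(-((z - mean) ** 2) / (2.0 * sigma * sigma)) / norm
        # log2(1 + exp(-z)), written to avoid overflow for large |z|.
        log_term = (max(-z, 0.0) + math.log1p(math.exp(-abs(z)))) / math.log(2.0)
        total += weight * pdf * log_term
    return 1.0 - total * step


def j_inverse(mutual_info, low=0.0, high=50.0, iterations=60):
    """Invert J by bisection, using that J increases monotonically with sigma."""
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if j_function(mid) < mutual_info:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


sigma_for_half = j_inverse(0.5)
```

The inverse computed this way is the quantity needed whenever an MI value has to be mapped back to the standard deviation of a Gaussian LLR model.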

Suppose that a protograph codeword $\mathbf{c}$ is grouped into $P$ blocks $\hat{V}_1, \hat{V}_2, \ldots, \hat{V}_P$, where $\hat{V}_j$ comprises the $l$ copies of $v_j$. According to the principle of IM, the $j$-th block $\hat{V}_{j}$ is further divided into $D$ sub-blocks $\{\hat{V}_{j,d}: d = 1, 2, \ldots, D\}$. At the receiver, the LLR sequence of the detector $\{L_{\phi,\text{det}}\}$ consists of $D$ sub-sequences, i.e., $\{L_{\phi,\text{det}}\} = (\{L_{\phi,\text{det}}^1\}, \{L_{\phi,\text{det}}^2\}, \dots, \{L_{\phi,\text{det}}^D\})$, where $\{L_{\phi,\text{det}}^d\}$ contains $P$ blocks $(d = 1, 2, \dots, D)$, and $\phi \in \{A, E\}$. The above statement also holds for the LLR sequence of the decoder $\{L_{\phi, \text{dec}}\}$. Consequently, let $\{L_{\rm A, det}^{d, j}\}$ denote the $j$-th block of the *a priori* LLR sub-sequence of the detector $\{L_{\rm A,det}^d\}$, and let $\{L_{\rm E,det}^{d,j}\}$ represent the $j$-th block of the extrinsic LLR sub-sequence of the detector $\{L_{\rm E,det}^d\}$, where $d = 1, 2, \ldots, D$. In addition, we assume that the maximum numbers of outer iterations and inner iterations are $K_{\rm out}$ and $K_{\rm in}$, respectively. Based on the above foundations, the proposed EPEXIT algorithm for an IM-BICM-ID flash-memory system is described below.

<sup>3</sup>The Gaussian approximation is only used in the EPEXIT analysis, while the real threshold-voltage distributions are adopted in simulations.

1) **Initialization:** Based on a given number of PE cycles and a given retention time, 6 read-voltage levels are selected via the MMI technique [30] to dynamically adapt to the variation of threshold voltages in MLC flash memory. The information bits are encoded to coded bits $\mathbf{c}$ and then processed by the interleaver $\Pi$ to produce the interleaved coded-bit sequence $\hat{\mathbf{c}}$. Then, the interleaved codeword $\hat{\mathbf{c}}$ is transformed to a modulated symbol sequence $\mathbf{x}$ and passed through the flash-memory channel. Based on the 6 read voltages, the stored symbol sequence $\mathbf{y}$ can be promptly detected. Subsequently, the channel LLR sequence $\{L_{\rm ch}\}$ of $\hat{\mathbf{c}}$, which contains $D$ sub-sequences $\{L_{\rm ch}^1\}, \{L_{\rm ch}^2\}, \ldots, \{L_{\rm ch}^D\}$, is evaluated by

$$L_{\rm ch}^{d}(c_{d,p}^{'t}) = \ln \frac{\sum\limits_{x_{d,p} \in \chi_{d,0}^{t}} P(y_{d,p}|x_{d,p})}{\sum\limits_{x_{d,p} \in \chi_{d,1}^{t}} P(y_{d,p}|x_{d,p})}. \tag{10}$$

2) **Estimation of extrinsic LLR sub-sequence of the detector:** When the *a priori* MI of the detector $I_{\rm A,det}^{j} \in [0,1]$ is given, we can calculate the standard deviation $\sigma_{\rm A,det}^{j}$ of the *a priori* LLR sequence $\{L_{\rm A,det}^{d,j}\}$, given by

$$\sigma_{\rm A,det}^j = J^{-1}(I_{\rm A,det}^j), \tag{11}$$

where $J^{-1}(\cdot)$ is the inverse of the $J(\cdot)$ function [35]. For d = 1, 2, ..., D, we generate the *a priori* LLR block $\{L_{A,det}^{d,j}\}$ of length $\alpha_d l$ following the symmetric Gaussian distribution $\mathcal{N}(\pm (\sigma_{A,det}^j)^2/2, (\sigma_{A,det}^j)^2)$, where *l* is the length of the *j*-th block $\hat{V}_j$. Thus, for d = 1, 2, ..., D, the *a priori* LLR sub-sequence of the detector $\{L_{A,det}^d\}$ can be generated, which is expressed by $\{L_{A,det}^d\} = \{\{L_{A,det}^{d,1}\}, \{L_{A,det}^{d,2}\}, ..., \{L_{A,det}^{d,P}\}\}$. Subsequently, we can compute the extrinsic LLR sub-sequence of the detector $\{L_{E,det}^d\}$ by applying (4).

3) **Computation of channel MI:** Based on the *j*-th block  $\{L_{\rm E,det}^{d,j}\}$  of the extrinsic LLR sub-sequence of the detector  $\{L_{\rm E,det}^d\}$ , the extrinsic MI  $I_{\rm E,det}^{d,j}$  can be obtained by exploiting Monte-Carlo method,<sup>4</sup> which is given by [36]

$$I_{\rm E,det}^{d,j} = 1 - \mathbb{E}\left[\log_2(1 + e^{-L_{\rm E,det}^{d,j}})|c=0\right], \quad (12)$$

where c represents a coded bit. The extrinsic MI of the detector  $I_{E,det}^{j}$ , for j = 1, 2, ..., P, can be measured as

$$I_{\rm E,det}^{j} = \sum_{d=1}^{D} \alpha_d I_{\rm E,det}^{d,j}. \tag{13}$$

Hence, the channel MI of the *j*-th VN $v_j$ (j = 1, 2, ..., P) for the decoder is set to $I_{E,det}^j$, i.e., $I_{ch}^j = I_{E,det}^j$, if $v_j$ is not punctured; otherwise $I_{ch}^j = 0$.

4) **Calculation of *a posteriori* MI of the decoder:** Based on the channel MI, we can update the MI between the VNs and CNs during the inner iterations. Specifically, the *extrinsic-MI* derivation of VN-to-CN and CN-to-VN is available in [34]. We can obtain the *a posteriori* MI $I_{App}^{j}$ after each inner iteration, given by

$$I_{\rm App}^{j} = J\left(\sqrt{\sum_{i} b_{i,j} \left[J^{-1}(I_{\rm Av}^{i,j})\right]^{2} + \left[J^{-1}(I_{\rm ch}^{j})\right]^{2}}\right), (14)$$

where $I_{Av}^{i,j}$ is the *a priori* MI from the CN $c_i$ to the VN $v_j$. Then, the inner iterative process continues until $I_{App}^j = 1$ for all VNs or $K_{in}$ is reached.

5) **Update of *a priori* MI of the detector:** The extrinsic MI of the decoder $I_{E,dec}^{j}$ can be evaluated after $K_{in}$ inner iterations and is given by

$$I_{\rm E,dec}^{j} = J\left(\sqrt{\sum_{i} b_{i,j} \left[J^{-1}(I_{\rm Av}^{i,j})\right]^{2}}\right), \tag{15}$$

where $I_{\rm E,dec}^{j} = 0$ if $v_{j}$ is punctured. Then, $I_{\rm E,dec}^{j}$ is passed to the detector as its *a priori* MI $I_{\rm A,det}^{j}$.

6) **Evaluation of decoding threshold:** Repeat Step 1) to Step 5) until $I_{App}^{j} = 1$ for j = 1, 2, ..., P or $K_{out}$ is reached. As a given number of PE cycles is equivalent to an $E_b/N_0$,<sup>5</sup> the decoding threshold of a protograph code can be viewed as the minimum SNR to guarantee $I_{App}^{j} = 1$ for j = 1, 2, ..., P.
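The decoder-side updates in (11), (14) and (15) can be sketched in Python as follows. This is only an illustration under stated assumptions: the $J(\cdot)$ function is replaced by a widely used closed-form curve fit with constants `H1`, `H2`, `H3`, which is not necessarily the form used in [35], and the helper names `app_mi` and `ext_mi_dec` as well as the numerical inputs are hypothetical.

```python
import math

# Curve-fit constants of a common closed-form approximation of J (an assumption here).
H1, H2, H3 = 0.3073, 0.8935, 1.1064


def J(sigma: float) -> float:
    """Approximate MI carried by a consistent Gaussian LLR with standard deviation sigma."""
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi: float) -> float:
    """Inverse of the approximation above, as needed in (11)."""
    mi = min(max(mi, 0.0), 1.0 - 1e-12)  # keep the logarithm finite
    if mi <= 0.0:
        return 0.0
    return (-(1.0 / H1) * math.log2(1.0 - mi ** (1.0 / H3))) ** (1.0 / (2.0 * H2))


def app_mi(b_col, I_av_col, I_ch):
    """A posteriori MI of one VN, following the structure of (14)."""
    s = sum(b * J_inv(I) ** 2 for b, I in zip(b_col, I_av_col))
    return J(math.sqrt(s + J_inv(I_ch) ** 2))


def ext_mi_dec(b_col, I_av_col, punctured=False):
    """Extrinsic MI fed back to the detector, following (15); zero for a punctured VN."""
    if punctured:
        return 0.0
    s = sum(b * J_inv(I) ** 2 for b, I in zip(b_col, I_av_col))
    return J(math.sqrt(s))


b_col = [0, 1, 2]        # one base-matrix column (a degree-3 VN)
I_av = [0.0, 0.5, 0.5]   # hypothetical CN-to-VN a priori MIs
I_ch = 0.6               # hypothetical channel MI
I_app = app_mi(b_col, I_av, I_ch)
I_ext = ext_mi_dec(b_col, I_av)
print(round(I_app, 3), round(I_ext, 3))
```

The only difference between the two quantities is the channel term, so the a posteriori MI is never smaller than the extrinsic MI for the same VN.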

Remarks:

- The proposed EPEXIT algorithm utilizes the Monte-Carlo method to calculate the extrinsic MI of the detector. Therefore, to guarantee the accuracy of the EPEXIT analysis, the length of the coded bits must be sufficiently large.
- After performing the inner iterations, the extrinsic MIs of the *P* VNs from the decoder may have different values because the degrees of the *P* VNs are probably different. Consequently, the extrinsic MI of the *P* VNs from the decoder should be independently fed back to the detector to ensure the accuracy of the EPEXIT algorithm. In addition, by considering the IM, the extrinsic MI of the *j*-th VN output from the decoder is utilized to produce *D a priori* LLR blocks $\{L_{A,det}^{1,j}\}, \{L_{A,det}^{2,j}\}, \ldots, \{L_{A,det}^{D,j}\}$ for calculating the extrinsic MI of the detector $I_{E,det}^{j}$.
- Although we assume that the MMI method is employed to dynamically adapt to the variation of threshold voltages of MLC flash memory in the proposed EPEXIT algorithm, other read-voltage optimization methods, such as the voltage-entropy-based read-voltage optimization scheme [24], can also be exploited in our proposed algorithm.
- Since the proposed EPEXIT algorithm considers the ID architecture, IM and variation of threshold voltages, the decoding threshold over an MLC flash-memory channel is related to the read-voltage optimization scheme, the mixing ratio and the type of protograph code.
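The Monte-Carlo estimate in (12) and the weighted combination in (13) can be sketched as follows. As an assumption made purely for illustration, the extrinsic LLRs are drawn from a consistent Gaussian model (mean $\sigma^2/2$, variance $\sigma^2$, conditioned on c = 0) rather than produced by an actual detector over the asymmetric flash-memory channel; the standard deviations and the sample size are hypothetical.

```python
import math
import random


def mc_mutual_information(llrs):
    """Sample-mean version of 1 - E[log2(1 + exp(-L)) | c = 0], cf. (12)."""
    acc = 0.0
    for L in llrs:
        # Numerically stable evaluation of ln(1 + exp(-L)).
        acc += max(-L, 0.0) + math.log1p(math.exp(-abs(L)))
    return 1.0 - acc / (len(llrs) * math.log(2.0))


def sample_llrs(sigma, n, rng):
    """Stand-in extrinsic LLRs for c = 0 under a consistent Gaussian assumption."""
    return [rng.gauss(sigma ** 2 / 2.0, sigma) for _ in range(n)]


rng = random.Random(0)
alphas = [0.5, 0.5]   # mixing ratios alpha_1, alpha_2
sigmas = [1.5, 2.5]   # hypothetical LLR standard deviations of the two sub-blocks
I_d = [mc_mutual_information(sample_llrs(s, 20000, rng)) for s in sigmas]
I_j = sum(a * I for a, I in zip(alphas, I_d))  # weighted sum as in (13)
print([round(I, 3) for I in I_d], round(I_j, 3))
```

Rerunning the estimate with fewer samples per sub-block makes the values fluctuate more, which is the practical reason behind the first remark above.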

We set the maximum number of outer iterations  $K_{out}$  to 6 in the EPEXIT algorithm, which coincides with the simulations in Sect. IV and Sect. V. In MLC flash-memory systems, a modulated symbol is composed of 2 bits (i.e., k = 2). In the following, we will employ the *Gray* mapping and *anti-Gray* mapping within a codeword (i.e., D = 2, d = 1, 2) to validate the effectiveness of the EPEXIT algorithm and the superiority of the proposed IM-BICM-ID design. Note that the IM-BICM-ID scheme can also be implemented by utilizing three or more mappings within the same codeword (i.e.,  $D \ge 3$ ). However, we just focus on the IM including two sub-mappings in this paper so as to illustrate the potential benefits of IM-BICM-ID systems in a simple and clear way.

<sup>4</sup>As the MLC flash-memory channel is asymmetric, we assume that a binary codeword including both 0 and 1 bits (instead of an all-zero codeword) is transmitted in the EPEXIT algorithm. Based on this assumption, we can accurately calculate the initial channel MI for an MLC flash-memory channel by using the Monte-Carlo simulations.

<sup>5</sup>According to Sect. III-A, we can easily measure the equivalent SNR for a given number of PE cycles based on the Gaussian approximation.



Fig. 5. Protograph structures of (a) the AR4JA codes and (b) the proposed IMARA codes.

## IV. DESIGN HIGH-RATE PROTOGRAPH CODES FOR IM-BICM-ID MLC FLASH-MEMORY SYSTEMS

Here, we put forward a design approach for the construction of protograph codes with two fixed mappings and a mixing ratio for such scenarios. For the two given mappings, i.e., *Gray* mapping and *anti-Gray* mapping, we select a typical mixing ratio $\alpha_1 = \alpha_2 = 0.5$ without loss of generality. To meet the high-rate demand for flash-memory applications, our goal is to construct a family of high-rate protograph codes with the lowest decoding thresholds and the linear-minimum-distance-growth property [36] for IM-BICM-ID systems. Note that the proposed code design method for IM-BICM-ID systems is also applicable to other mixing ratios.

#### A. Analysis of Existing Protograph Codes

We consider a typical protograph code, namely the AR4JA code, which can achieve near-Shannon-limit performance over AWGN channels [20]. The base matrix of AR4JA codes of size  $3 \times (2n+5)$  with rates of R = (n+1)/(n+2) is given by

$$\mathbf{B}_{\text{AR4JA}} = \begin{bmatrix} 1 & 2 & 0 & 0 & 0 & \overbrace{0 \quad 0 \quad \cdots \quad 0 \quad 0}^{2n} \\ 0 & 3 & 1 & 1 & 1 & 1 \quad 3 \quad \cdots \quad 1 \quad 3 \\ 0 & 1 & 2 & 2 & 1 & 3 \quad 1 \quad \cdots \quad 3 \quad 1 \end{bmatrix}, \tag{16}$$

where n is the number of VN extension patterns, and the sixth and seventh columns are repeated in the last 2n columns. The structure of AR4JA codes is illustrated in Fig. 5(a), where the dark circles represent the transmitted VNs, the white circle represents the punctured VN (corresponding to the second column in (16)), and the plus circles represent the CNs.
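As a quick consistency check of the stated size and rate, the following Python sketch builds the AR4JA base matrix for a given n and evaluates the design rate with one punctured VN. The function names are illustrative, and the rate formula assumes a full-rank base matrix.

```python
from fractions import Fraction


def ar4ja_base_matrix(n):
    """AR4JA base matrix: five fixed columns followed by n copies of the two-column extension."""
    head = [[1, 2, 0, 0, 0],
            [0, 3, 1, 1, 1],
            [0, 1, 2, 2, 1]]
    ext = [[0, 0],
           [1, 3],
           [3, 1]]
    return [head[r] + ext[r] * n for r in range(3)]


def design_rate(B, num_punctured=1):
    """(N - M) / (N - N_punctured) for an M x N base matrix, assuming full rank."""
    m, N = len(B), len(B[0])
    return Fraction(N - m, N - num_punctured)


B = ar4ja_base_matrix(8)
print(len(B), len(B[0]), design_rate(B))  # 3 21 9/10
```

For n = 8 the matrix has 2n + 5 = 21 columns and the rate is (n + 1)/(n + 2) = 9/10, in agreement with the text.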

To evaluate the error performance of the high-rate AR4JA code in the IM-BICM-ID flash-memory systems, we select a regular column-weight-3 (CW-3) LDPC code with a code rate of 0.9 [36] as a benchmark. Note that the performance of the AR4JA code is superior to that of the regular CW-3 code over AWGN channels. As shown in Table I, we observe that the decoding threshold (in dB) of the rate-0.9 regular CW-3 LDPC code is 11.573 dB, which is 0.027 dB lower than that of the AR4JA code (11.600 dB), by applying the proposed EPEXIT algorithm. The results indicate that the AR4JA code is expected to be inferior to the regular CW-3 LDPC code in the IM-BICM-ID flash-memory systems. Hence, to optimize the system performance, we develop a design method for the construction of a family of high-rate protograph codes.

#### B. Design of Protograph Codes for IM-BICM-ID Systems

To meet the high-rate demand for flash-memory applications, we first propose a design method starting from a rate-0.9 protograph code with a base matrix of size $3 \times 21$ (n = 8), which includes a punctured VN. Then, the constructed rate-0.9 protograph code can be extended to higher-rate protograph codes, i.e., R = (n + 1)/(n + 2) (n > 8), via repeatedly adding a VN extension pattern (i.e., two VNs) with the linear-minimum-distance-growth property [36] to the resultant base matrix. To facilitate the design, we impose some constraints on the rate-0.9 protograph code in order to guarantee that its initial base matrix possesses the linear-minimum-distance-growth property and a relatively low decoding threshold, as follows.

- 1) Since a protograph with low decoding threshold generally contains a degree-1 VN, a high-degree punctured VN and some degree-2 VNs [36], we initialize a protograph with a precoding structure (i.e., a degree-1 VN) corresponding to the first column and first row in the base matrix, a highest-degree punctured VN corresponding to the second column with the largest weight, and a degree-2 VN outside the precoding structure (i.e., excluding the first row and first column in the base matrix) [37].
- 2) As the linear-minimum-distance-growth property of a protograph code is sensitive to the number of degree-2 VNs, one should limit the maximum number of degree-2 VNs to be less than the number of CNs outside the precoding structure to preserve this property. Thereby, the proposed protograph can contain at most one degree-2 VN, which is assigned to the fifth column in the base matrix. In other words, the degree of other VNs outside the precoding structure must be no less than 3 to preserve the linear-minimum-distance-growth property.
- 3) Inspired by the structure of high-rate protograph codes enabling the linear-minimum-distance-growth property (e.g., the AR4JA and RJA codes) [22], [36], eight VN extension patterns should be appended to the first five VNs (i.e., a $3 \times 5$ base matrix) to form all the 21 columns of the base matrix corresponding to a rate-0.9 protograph code. To allow the newly added VN extension patterns to enable the linear-minimum-distance-growth property while retaining the lowest complexity (i.e., the encoding and decoding complexity of a protograph code can be partially reflected by the number of edges connecting VNs to CNs), we employ two degree-3 VNs to construct such an extension pattern, denoted by $[0\ 1\ 2,\ 0\ 2\ 1]^{T}$ (where "T" is the transposition operation), in the proposed rate-0.9 protograph.

Based on the above constraints, the base-matrix structure of the proposed rate-0.9 protograph code can be formulated as

$$\mathbf{B}_{0.9} = \begin{bmatrix} 1 & b_{1,2} & b_{1,3} & b_{1,4} & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_{2,2} & b_{2,3} & b_{2,4} & 1 & 1 & 2 & \cdots & 1 & 2 \\ 0 & b_{3,2} & b_{3,3} & b_{3,4} & 1 & 2 & 1 & \cdots & 2 & 1 \end{bmatrix}, \tag{17}$$

where $b_{i,j}$ is the (i, j)-th entry of the base matrix (i = 1, 2, 3; j = 1, 2, ..., 21), the second column corresponds to the punctured VN, and the sixth and seventh columns (i.e., the VN extension pattern) are repeated 8 times to constitute the last 16 columns. To keep the feature of low complexity for the proposed protograph code, we set the maximum value of the entry $b_{i,j}$ to 3, i.e., $b_{i,j} \in \{0, 1, 2, 3\}$.

TABLE I DECODING THRESHOLDS (IN dB) AND CORRESPONDING PE CYCLES (IN times) OF THE RATE-0.9 IMARA CODE, AR4JA CODE, RAP-LDPC CODE, OARA CODE AND REGULAR CW-3 LDPC CODE OVER AN IM-BICM-ID MLC FLASH-MEMORY CHANNEL. THE CAPACITY LIMIT IS EQUAL TO 11.301 dB (PE = 30020).<sup>6</sup>

| Code Type      | $(E_b/N_0)_{\rm th}$ (dB) | $PE_{\rm th}$ (times) |
|----------------|---------------------------|-----------------------|
| Proposed IMARA | 11.483                    | 27310                 |
| AR4JA [22]     | 11.600                    | 25580                 |
| RAP-LDPC [23]  | 11.696                    | 24200                 |
| OARA [19]      | 11.514                    | 26850                 |
| CW-3 LDPC [36] | 11.573                    | 25980                 |

After an exhaustive search using the proposed EPEXIT algorithm, an optimal base matrix with lowest decoding threshold and linear-minimum-distance-growth property, whose corresponding protograph code is referred to as *irregular-mapped accumulate-repeat-accumulate (IMARA) code*, is obtained as

$$\mathbf{B}_{\text{OPT},0.9} = \begin{bmatrix} 1 & 2 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 3 & 1 & 0 & 1 & 1 & 2 & \cdots & 1 & 2 \\ 0 & 1 & 2 & 3 & 1 & 2 & 1 & \cdots & 2 & 1 \end{bmatrix}. \tag{18}$$

Furthermore, one can easily extend the rate-0.9 IMARA code to the higher-rate IMARA codes by repeatedly appending the degree-3 VN pattern to the base matrix (18). It has been demonstrated in [22], [36] that adding degree-3 VNs into a rate-0.9 protograph can maintain the lowest-complexity feature for its corresponding higher-rate counterparts without deteriorating the linear-minimum-distance-growth property. As a result, the generic base matrix of size  $3 \times (2n + 5)$  corresponding to the family of rate-compatible IMARA codes with rates R = (n + 1)/(n + 2)  $(n \ge 8)$  is expressed by

$$\mathbf{B}_{\mathrm{IMARA}} = \begin{bmatrix} 1 & 2 & 0 & 1 & 0 & \overbrace{0 \quad 0 \quad \cdots \quad 0 \quad 0}^{2n} \\ 0 & 3 & 1 & 0 & 1 & 1 \quad 2 \quad \cdots \quad 1 \quad 2 \\ 0 & 1 & 2 & 3 & 1 & 2 \quad 1 \quad \cdots \quad 2 \quad 1 \end{bmatrix}. \tag{19}$$

The protograph structure of the IMARA codes is shown in Fig. 5(b).
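The design constraints of this subsection can be checked mechanically against the IMARA base matrix. The short Python sketch below builds the matrix for a given n and inspects the VN degrees (column sums); the helper names are illustrative only.

```python
def imara_base_matrix(n):
    """IMARA base matrix: five fixed columns followed by n copies of the degree-3 extension pattern."""
    head = [[1, 2, 0, 1, 0],
            [0, 3, 1, 0, 1],
            [0, 1, 2, 3, 1]]
    ext = [[0, 0],
           [1, 2],
           [2, 1]]
    return [head[r] + ext[r] * n for r in range(3)]


def vn_degrees(B):
    """Degree of each VN, i.e., the sum of each base-matrix column."""
    return [sum(row[j] for row in B) for j in range(len(B[0]))]


B = imara_base_matrix(8)
deg = vn_degrees(B)
print(deg[:5], set(deg[5:]), sum(deg))  # [1, 6, 3, 4, 2] {3} 64

# Constraint checks: one degree-1 precoding VN, the punctured VN (second column)
# has the largest degree, exactly one degree-2 VN (fifth column), entries at most 3.
print(deg[0] == 1, deg[1] == max(deg), deg.count(2) == 1, max(max(r) for r in B) <= 3)
```

For n = 8 the protograph has 64 edges in total, of which 48 belong to the sixteen degree-3 extension VNs.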

To verify the convergence performance of the proposed high-rate code-design method, we consider the rate-0.9 IMARA code and show its corresponding decoding threshold in Table I. As shown, the proposed IMARA code possesses a lower decoding threshold (in dB) than the AR4JA code [22], the RAP-LDPC code in [23], the OARA code in [19], and the regular CW-3 LDPC code in [36], indicating that the proposed IMARA code outperforms the other four types of protograph/LDPC codes. In particular, the decoding threshold of the proposed IMARA code has a gap of only 0.182 dB to the capacity limit of an IM-BICM-ID flash-memory channel. Furthermore, by exploiting asymptotic weight distribution (AWD) function [22], [36], we have verified that the typical minimum distance ratios (TMDRs) of IMARA codes exist for all rates equal to and higher than 9/10 (e.g., 9/10 (n = 8), 10/11 (n = 9), ...), which guarantees the linear-minimum-distance-growth property of IMARA codes in the high-SNR region (i.e., low-PE region).<sup>7</sup> The above analysis illustrates that the proposed IMARA ensemble can not only perform best in the high-PE region among the five protograph/LDPC codes, but also exhibit desirable error performance in the low-PE region in the IM-BICM-ID systems.
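The gaps to the capacity limit follow directly from the values in Table I. A small Python sketch of that arithmetic is given below; the dictionary layout is merely an illustrative way of holding the table entries.

```python
# Decoding thresholds (dB) and corresponding PE cycles taken from Table I.
thresholds = {
    "IMARA": (11.483, 27310),
    "AR4JA": (11.600, 25580),
    "RAP-LDPC": (11.696, 24200),
    "OARA": (11.514, 26850),
    "CW-3 LDPC": (11.573, 25980),
}
capacity_db = 11.301  # capacity limit quoted in the caption of Table I

gaps = {name: round(th - capacity_db, 3) for name, (th, _) in thresholds.items()}
best = min(thresholds, key=lambda name: thresholds[name][0])

for name in sorted(gaps, key=gaps.get):
    print(f"{name:10s} gap to capacity: {gaps[name]:.3f} dB")
print("lowest threshold:", best)
```

The IMARA entry gives 11.483 - 11.301 = 0.182 dB, and the regular CW-3 code is 11.600 - 11.573 = 0.027 dB ahead of the AR4JA code, matching the figures quoted in the text.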

*Remark:* We have also applied the EPEXIT algorithm to the five types of protograph/LDPC codes with other code rates (e.g., R = 10/11) and have observed that the IMARA code achieves the lowest decoding threshold (in dB).

#### C. Performance Evaluation

To demonstrate the merit of the proposed IMARA-based IM-BICM-ID scheme, we carry out various simulations of the proposed IM-BICM-ID systems, conventional anti-Gray-mapped BICM-ID systems, and Gray-mapped BICM with non-iterative detection and decoding (BICM-NI) systems in an MLC flash-memory system. In the simulations, we assume that the transmitted codeword length is 4000, and the maximum numbers of outer iterations and inner iterations are 6 and 40, respectively. The simulations are implemented by using MATLAB, which is a well-known mathematical computing environment.

We first compare the BER performance of five different rate-0.9 protograph/LDPC codes in an IM-BICM-ID flash-memory system, which is shown in Fig. 6. As observed, when BER = $10^{-6}$, the IMARA code attains performance gains of about 1100 PE cycles, 1800 PE cycles, 500 PE cycles, and 1000 PE cycles compared to the AR4JA code, RAP-LDPC code, OARA code, and regular CW-3 LDPC code, respectively. Moreover, the above simulated results are reasonably consistent with the asymptotic performance analysis in Sect. IV-B.

Fig. 7 compares the BER performance of the proposed IMARA-based IM-BICM-ID scheme and conventional anti-Gray-mapped BICM-ID schemes with the five different rate-0.9 protograph codes in an MLC flash-memory system. According to this figure, at a BER of $10^{-6}$, the IMARA-based IM-BICM-ID scheme achieves a gain of 1000 PE cycles compared with the OARA-based BICM-ID scheme. Furthermore, the OARA code attains additional improvements of about 750, 1800, 2200 and 700 PE cycles compared to the IMARA code, AR4JA code, RAP-LDPC code and regular CW-3 LDPC code, respectively, in the BICM-ID scenario.

Fig. 8 presents the BER results of the proposed IMARA-based IM-BICM-ID scheme and Gray-mapped BICM-NI schemes with five different rate-0.9 protograph codes in an MLC flash-memory system.<sup>8</sup> As shown, the IMARA-based IM-BICM-ID scheme not only outperforms its Gray-mapped BICM-NI counterpart, but also outperforms the other four Gray-mapped BICM-NI schemes, which use the AR4JA, RAP-LDPC, OARA, and regular CW-3 LDPC codes.

<sup>6</sup>For an IM-BICM-ID flash-memory channel employing 6 read voltages, the achievable rate can be evaluated by applying (2). Through such a method, one can calculate the maximum number of PE cycles or the corresponding minimum SNR (i.e., the capacity limit) to realize reliable storage for any rate-R channel code in such a scenario.

<sup>7</sup>According to [22], [36], the AWD can be used to predict the error performance of LDPC codes in the high-SNR region (i.e., low-PE region).

<sup>8</sup>It is widely recognized that the Gray mapping exhibits the best error performance among all the existing mappings in the BICM-NI scenario.

Fig. 6. BER results of five different rate-0.9 protograph codes in an IM-BICM-ID flash-memory system.

Fig. 7. BER performance of the proposed IMARA-based IM-BICM-ID scheme and conventional anti-Gray-mapped BICM-ID schemes with five different rate-0.9 protograph codes in an MLC flash-memory system.

To verify the decoding-latency advantage of the proposed design, we show the average number of inner iterations required to decode each codeword for the IM-BICM-ID schemes and conventional BICM-ID schemes with five different protograph/LDPC codes versus PE cycles in Fig. 9. We can notice that in comparison with the conventional BICM-ID schemes, e.g., at PE = 22000, the average number of inner iterations required by the IM-BICM-ID scheme with a given protograph/LDPC code is reduced by about  $50\% \sim 60\%$ . Therefore, the decoding latency of the proposed IM-BICM-ID systems can be significantly reduced with respect to the conventional BICM-ID systems. Moreover, the IMARA code requires the lowest iteration number among all the protograph and LDPC codes in the IM-BICM-ID scenario.

To elaborate a little further, utilizing the EPEXIT algorithm, we observe that the initial extrinsic MI of the detector in the IM-BICM-ID flash-memory systems is larger than that in the anti-Gray-mapped BICM-ID systems,<sup>9</sup> indicating that the detector provides more reliable information for the decoder in the IM-BICM-ID flash-memory systems (with respect to the anti-Gray-mapped BICM-ID systems) during the first outer iteration. More importantly, the EPEXIT analysis also indicates that the IMARA code can feed back the largest extrinsic MI to the detector after inner iterations among all the protograph and LDPC codes in the IM-BICM-ID systems, indicating that the detector will provide more reliable information for the decoder with the usage of the IMARA code (with respect to the other four types of protograph/LDPC codes) during the remaining outer iterations. From the MI perspective, the proposed IM scheme and IMARA code significantly accelerate the convergence of the decoding process, and hence effectively reduce the decoding latency of the MLC flash-memory systems. Thereby, it is demonstrated that the proposed IMARA-based IM-BICM-ID scheme can achieve both excellent error performance and low data-transfer latency in the MLC flash-memory systems.

<sup>9</sup>We refer to the extrinsic MI of the detector in the first outer iteration (i.e., the *a priori* MI of the detector equals zero) as the initial extrinsic MI of the detector.

Fig. 8. BER performance of the proposed rate-0.9 IMARA-based IM-BICM-ID scheme and Gray-mapped BICM-NI schemes with five different rate-0.9 protograph/LDPC codes in an MLC flash-memory system.

Fig. 9. Average number of inner iterations required to decode each codeword for the IM-BICM-ID schemes and conventional BICM-ID schemes with five different rate-0.9 protograph/LDPC codes in an MLC flash-memory system.

*Remark*: Although the transmitted codeword length is assumed to be 4000 in the performance comparisons of both Sect. IV-C and Sect. V-C, simulations have also been performed with other transmitted codeword lengths (e.g., 8000) to validate the superiority of the proposed code-design and voltage-optimization schemes. Besides, BER simulations have also been carried out for other code rates (e.g., R = 10/11), which verify that the relative performance among all the protograph and LDPC codes remains the same.

## V. READ-VOLTAGE OPTIMIZATION FOR IM-BICM-ID MLC FLASH-MEMORY SYSTEMS

In this section, we propose a novel *EPEXIT-aided read-voltage optimization scheme* for IM-BICM-ID MLC flash-memory systems, which can substantially exploit the voltage-region iterative gain characteristics of IM-BICM-ID systems to acquire more accurate channel LLRs.

#### A. Voltage-Region Iterative Gain Characteristics

For a given M-ary modulation constellation, the Hamming distance  $d_{Ham} \in \{1, 2, \dots, \log_2 M\}$  between two different labeling symbols is defined as the number of their different component bits. By exploiting the EPEXIT algorithm, one can observe that the MIs of the coded bits located in the overlapped region between the two adjacent symbols with the largest Hamming distance (i.e.,  $d_{Ham} = 2$ , referred to as dominant-gain region) can be significantly increased after outer iterations (i.e., iterations between the detector and decoder) in the proposed IM-BICM-ID flash-memory systems, while those located in other regions cannot do so. For instance, with a 6-level quantization scheme, the threshold voltages of MLC flash memory can be quantized into 7 regions, i.e., 4 data regions  $\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3, \mathcal{D}_4$  and 3 erasure regions  $E_1, E_2, E_3$  (see Fig. 3), in which the performance of the 3 dominant error regions (i.e., erasure regions) largely determines the overall performance of the MLC flash-memory systems. As can be seen in Fig. 3, two adjacent threshold-voltage levels in Gray mapping differ in only one bit, whereas  $S_2$  and  $S_3$  in *anti-Gray* mapping differ in two bits, indicating that the erasure area between  $S_2$  and  $S_3$  is more prone to error than other regions. In Fig. 10, we exploit the EPEXIT algorithm to analyze the variation of the MIs of the rate-0.9 IMARA coded bits located in the seven regions versus the number of outer iterations at PE = 27000. As shown, the coded bits located in the second erasure region  $E_2$  (i.e., the region between  $S_2$  and  $S_3$ ) possess the lowest MI among the 7 regions at the beginning of outer iteration, but they obtain significant MI gains after performing more outer iterations. Nevertheless, the MIs of the coded bits located in the other six regions can only achieve tiny gains in such a scenario. 
Consequently, the accuracy of the LLR information in the second erasure region  $E_2$  is of great importance to guarantee the decoding performance of the IM-BICM-ID systems. The above phenomenon is referred to as voltage-region iterative gain characteristics of the IM-BICM-ID systems.
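The Hamming-distance argument can be made concrete with a few lines of Python. The two labelings below are hypothetical examples chosen only to reproduce the property stated above (adjacent levels differ in one bit under *Gray* mapping, while $S_2$ and $S_3$ differ in two bits under *anti-Gray* mapping); the actual labelings are those of Fig. 3, which may differ.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing component bits between two equal-length labels."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_distances(labels):
    """Hamming distances between the labels of adjacent threshold-voltage levels."""
    return [hamming(labels[i], labels[i + 1]) for i in range(len(labels) - 1)]


# Hypothetical 2-bit labels for the levels S1..S4 (assumptions for illustration).
gray = ["11", "10", "00", "01"]
anti_gray = ["11", "10", "01", "00"]

print(adjacent_distances(gray))       # [1, 1, 1]
print(adjacent_distances(anti_gray))  # [1, 2, 1]
```

With these labels, only the boundary between $S_2$ and $S_3$ under the second labeling has $d_{Ham} = 2$, which corresponds to the erasure region $E_2$ identified as the dominant-gain region.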

Based on the voltage-region iterative gain characteristics of IM-BICM-ID systems, we propose a scheme that can efficiently search for the best voltage entropy by minimizing the decoding threshold (in dB) with the help of the EPEXIT analysis. As mentioned in Sect. III-B, the decoding threshold over an MLC flash-memory channel is related to the read-voltage optimization scheme, the mixing ratio, and code type. Hence, the EPEXIT algorithm can also be utilized to evaluate the quality of an IM-BICM-ID flash-memory channel for a given protograph code and a given mixing ratio.

Fig. 10. Variation of the MIs of the rate-0.9 IMARA coded bits located in the seven regions versus the number of outer iterations at PE = 27000.

#### B. EPEXIT-Aided Read-Voltage Optimization Scheme

According to [12], [24], the entropy of a threshold voltage  $V_{\rm th}$  can be defined as

$$H(V_{\rm th}) = \sum_{u} \left( \frac{f_{S_u}(V_{\rm th})}{\sum_{u} f_{S_u}(V_{\rm th})} \right) \log \left( \frac{\sum_{u} f_{S_u}(V_{\rm th})}{f_{S_u}(V_{\rm th})} \right), \tag{20}$$

where $u \in \{1, 2, 3, 4\}$ and $f_{S_u}(V_{\text{th}})$ is the PDF of threshold-voltage level $S_u$. In the erasure regions, the entropy becomes relatively high when $V_{\text{th}}$ gets close to the hard-decision read-voltage level (i.e., the boundary of two adjacent levels), and thus the regions with large entropies are called *dominating overlapped regions* in [12] or *high-entropy regions* in [24]. Our aim is to optimize the read-voltage levels for the three erasure regions, as a severe BER deterioration is incurred by the dominating overlapped regions in MLC flash memory. Moreover, since each erasure region includes two borders, at least 6 read-voltage levels are required for the three erasure regions.

To facilitate the search of the optimal voltage entropy for the three erasure regions, an entropy parameter  $\theta \in (0, 1)$  is defined. In other words, the read-voltage levels can be obtained by solving

$$H(R_n) = \theta, \tag{21}$$

where n = 1, 2, ..., 6. In particular, the entropy-based read-voltage optimization scheme in [24] is used to obtain an optimal parameter $\theta$ by minimizing the BER via simulations, which significantly increases the system complexity. In addition, another classic read-voltage optimization scheme, i.e., the MMI scheme [30], exploits bisection search or quasi-convex optimization techniques to acquire the read-voltage levels that can maximize the MI (MMI) between the input and quantized output of the flash-memory channel. However, both the MMI and entropy-based read-voltage optimization schemes optimize the read-voltage levels based on a given threshold-voltage distribution in an MLC flash-memory system without the ID framework and IM scheme, and thus cannot achieve the best performance in the proposed IM-BICM-ID systems. More importantly, the above two read-voltage optimization schemes do not take the voltage-region iterative gain characteristics into account, and thus adopt the same memory-sensing precision (i.e., two read-voltage levels) for each erasure region in MLC flash-memory systems, i.e., 6 read-voltage levels in total.
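To make (20) and (21) concrete, the following Python sketch evaluates the voltage entropy for four Gaussian threshold-voltage PDFs and solves $H(R_n) = \theta$ by bisection on each side of every level boundary. The means, standard deviations, the base-2 logarithm and the use of the midpoint between adjacent means as the boundary are all assumptions made for illustration; they are not parameters of the flash-memory channel model considered here.

```python
import math

# Hypothetical Gaussian threshold-voltage model for the levels S1..S4 (assumed values).
MEANS = [1.0, 2.0, 3.0, 4.0]
SIGMAS = [0.2, 0.2, 0.2, 0.2]


def gauss_pdf(v, mu, sigma):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def voltage_entropy(v):
    """Entropy of (20) at threshold voltage v, using a base-2 logarithm (assumption)."""
    f = [gauss_pdf(v, m, s) for m, s in zip(MEANS, SIGMAS)]
    total = sum(f)
    return sum((x / total) * math.log2(total / x) for x in f if x > 0.0)


def solve_level(theta, lo, hi, iters=60):
    """Bisection for H(R) = theta on [lo, hi], where H - theta changes sign."""
    g_lo = voltage_entropy(lo) - theta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = voltage_entropy(mid) - theta
        if (g_mid > 0.0) == (g_lo > 0.0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def read_levels(theta):
    """Two borders per erasure region, i.e., six read-voltage levels in total."""
    levels = []
    for u in range(3):
        boundary = 0.5 * (MEANS[u] + MEANS[u + 1])  # equal sigmas: PDFs cross at the midpoint
        levels.append(solve_level(theta, MEANS[u], boundary))
        levels.append(solve_level(theta, boundary, MEANS[u + 1]))
    return levels


R = read_levels(0.4)
print([round(r, 4) for r in R])
```

A larger $\theta$ pulls the two borders of each erasure region toward the hard-decision boundary, while a smaller $\theta$ widens the region.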

Benefiting from the voltage-region iterative gain characteristics in IM-BICM-ID MLC flash-memory systems, an additional performance gain can be achieved by boosting the memory-sensing precision (i.e., increasing the number of read-voltage levels) for the dominant-gain region (i.e., erasure region $E_2$) in the IM-BICM-ID systems. To boost the memory-sensing precision without introducing excessive memory-sensing latency, one can add only one read-voltage level for the dominant-gain region. In the following, we will exploit EPEXIT analysis to further demonstrate that adding only one read-voltage level for the dominant-gain region can achieve significant decoding-threshold gains with respect to other erasure regions in the IM-BICM-ID systems. Note also that we have conducted a similar analysis by adding two read-voltage levels for the dominant-gain region and have found that larger decoding-threshold gains can be obtained at the price of introducing more memory-sensing latency.

Based on the above discussion, a new EPEXIT-aided read-voltage optimization scheme for IM-BICM-ID systems (where the proposed rate-0.9 IMARA code is employed) is described as follows.

- 1) Acquiring two read-voltage levels for each erasure region as its borders based on EPEXIT analysis: We first set the voltage entropy of the six memory-sensing levels to $\theta_1$, i.e., $H(R_n) = \theta_1$ for $n = 1, 2, \dots, 6$. After an exhaustive search employing the proposed EPEXIT algorithm, the lowest decoding threshold (in dB) is reached at $\theta_1 = 0.4$ and the corresponding decoding threshold is 11.483 dB (27310 PE cycles), which is the same as that measured by the EPEXIT algorithm with the MMI read-voltage optimization scheme (see Table I).
- 2) Setting one additional hard-decision read-voltage level $V_{h_2}$ for the dominant-gain region (i.e., erasure region $E_2$): We can obtain the hard-decision read-voltage level $V_{h_2}$ by finding the intersection of two adjacent threshold-voltage PDFs, i.e., by solving the equation $f_{S_2}(V_{h_2}) = f_{S_3}(V_{h_2})$. Exploiting the proposed EPEXIT algorithm, we observe that adding one additional read-voltage level for different erasure regions can achieve different decoding-threshold improvements, which is shown in Table II, where $V_{h_1}$ and $V_{h_3}$ are the hard-decision read-voltage levels in the erasure regions $E_1$ and $E_3$, respectively. As seen, compared with adding $V_{h_1}$ for erasure region $E_1$ or $V_{h_3}$ for erasure region $E_3$, adding $V_{h_2}$ for erasure region $E_2$ can obtain a more noticeable threshold gain, which further verifies that the LLR information accuracy of the dominant-gain region $E_2$ is most important in determining the decoding performance of the IM-BICM-ID systems. Consequently, a total of 7 read-voltage levels are employed in the proposed IM-BICM-ID systems. In detail, the dominant-gain region $E_2$ contains one hard-decision read-voltage level and two borders, while the erasure regions $E_1$ and $E_3$ each contain only two borders.

TABLE II DECODING THRESHOLDS (IN dB) AND CORRESPONDING PE CYCLES (IN times) OF THE RATE-0.9 IMARA CODE OVER AN IM-BICM-ID MLC FLASH-MEMORY CHANNEL WITH AN ADDITIONAL HARD-DECISION READ-VOLTAGE LEVEL $V_{h_1}$, $V_{h_2}$ AND $V_{h_3}$.

| Hard-decision read voltage | $(E_b/N_0)_{\rm th}$ (dB) | $PE_{\rm th}$ (times) |
|----------------------------|---------------------------|-----------------------|
| $V_{h_1}$                  | 11.469                    | 27510                 |
| $V_{h_2}$                  | 11.430                    | 28090                 |
| $V_{h_3}$                  | 11.464                    | 27590                 |

- 3) Re-optimizing the borders of the dominant-gain region (i.e., $R_3$ and $R_4$) to further lower the decoding threshold of the protograph-coded IM-BICM-ID scheme: To be specific, we set the voltage entropy of $R_3$ and $R_4$ to $\theta_2$, i.e., $H(R_3) = H(R_4) = \theta_2$. After a simple search with the help of the EPEXIT algorithm, the lowest decoding threshold can be obtained when $\theta_2 = 0.25$, which is equal to 11.405 dB (28470 PE cycles). In consequence, the proposed EPEXIT-aided read-voltage optimization scheme benefits from gains of about 0.078 dB (1160 PE cycles) and 0.084 dB (1260 PE cycles) with respect to the MMI read-voltage optimization scheme (11.483 dB, 27310 PE cycles) [30] and the entropy-based read-voltage optimization scheme (11.489 dB, 27210 PE cycles) [24], respectively.
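The hard-decision level used in Step 2) is the point where two adjacent PDFs intersect. A minimal Python sketch of this computation is shown below, assuming Gaussian PDFs with hypothetical means and standard deviations; the bisection runs on log-densities between the two means, where exactly one crossing exists for the values used here.

```python
import math


def log_gauss_pdf(v, mu, sigma):
    """Logarithm of a Gaussian PDF, used to compare two densities without underflow."""
    return -0.5 * ((v - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def hard_decision_level(mu_a, sigma_a, mu_b, sigma_b, iters=80):
    """Solve f_a(V) = f_b(V) for mu_a < V < mu_b by bisection."""
    lo, hi = mu_a, mu_b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_gauss_pdf(mid, mu_a, sigma_a) > log_gauss_pdf(mid, mu_b, sigma_b):
            lo = mid  # still on the side where the lower level dominates
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters for S2 and S3 (assumptions for illustration only).
V_h2 = hard_decision_level(2.0, 0.20, 3.0, 0.25)
V_sym = hard_decision_level(2.0, 0.20, 3.0, 0.20)  # equal spreads give the midpoint
print(round(V_h2, 4), round(V_sym, 4))
```

In the scheme above, this level would be added to the six borders obtained from the entropy constraint, giving 7 read-voltage levels in total.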

*Remark:* Although we consider a rate-0.9 IMARA code in the description of the proposed read-voltage optimization, the method is also applicable to other code rates and other types of protograph/LDPC codes.
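As an illustration of step 2, the following minimal Python sketch locates the intersection of two adjacent threshold-voltage PDFs by bisection, i.e., it solves $f_{S_2}(V) = f_{S_3}(V)$ between the two state means. It assumes, purely for illustration, that both densities are Gaussian; the function names, means, and standard deviations are hypothetical placeholders and are not the channel model or parameters used in this paper.

```python
import math


def gaussian_log_pdf(v, mean, std):
    """Log-density of a Gaussian; an ASSUMED stand-in for a threshold-voltage PDF."""
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def log_pdf_gap(v, mean_lo, std_lo, mean_hi, std_hi):
    """log f_lo(v) - log f_hi(v); zero exactly where the two PDFs intersect."""
    return gaussian_log_pdf(v, mean_lo, std_lo) - gaussian_log_pdf(v, mean_hi, std_hi)


def hard_decision_level(mean_lo, std_lo, mean_hi, std_hi, tol=1e-12):
    """Bisection search for the PDF intersection between the two state means."""
    lo, hi = mean_lo, mean_hi
    gap_lo = log_pdf_gap(lo, mean_lo, std_lo, mean_hi, std_hi)
    gap_hi = log_pdf_gap(hi, mean_lo, std_lo, mean_hi, std_hi)
    if not (gap_lo > 0.0 > gap_hi):
        raise ValueError("no sign change between the two means")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_pdf_gap(mid, mean_lo, std_lo, mean_hi, std_hi) > 0.0:
            lo = mid  # lower-state PDF still dominates: move right
        else:
            hi = mid  # upper-state PDF dominates: move left
    return 0.5 * (lo + hi)


# Hypothetical state parameters (volts); equal spreads put the level at the midpoint.
v_equal = hard_decision_level(2.0, 0.20, 3.0, 0.20)
# Unequal spreads shift the level toward the narrower distribution.
v_unequal = hard_decision_level(2.0, 0.15, 3.0, 0.25)
```

Working with log-densities keeps the comparison numerically stable in the PDF tails. For a non-Gaussian channel model, only `gaussian_log_pdf` would need to be replaced.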

## C. Performance Evaluation

To verify the merit of our design, we compare the error performance of the proposed read-voltage optimization scheme with that of the existing MMI read-voltage optimization scheme [30] and the entropy-based read-voltage optimization scheme [24] in the IM-BICM-ID systems.

Fig. 11(a) plots the BER curves of the proposed read-voltage optimization scheme and the MMI read-voltage optimization scheme with five different rate-0.9 protograph/LDPC codes in an IM-BICM-ID system. As can be seen, at a BER of $10^{-6}$, the proposed EPEXIT-aided read-voltage optimization scheme employing the IMARA code achieves an improvement of about 1000 PE cycles over the MMI scheme. Moreover, for the other four protograph/LDPC codes, i.e., the AR4JA code, RAP-LDPC code, OARA code, and regular CW-3 LDPC code, the EPEXIT-aided read-voltage optimization scheme also achieves an improvement of about 1000 PE cycles over the MMI scheme at a BER of $10^{-6}$, which indicates that the proposed read-voltage optimization scheme is effective across all five protograph/LDPC codes considered.

Fig. 11. BER results of the proposed EPEXIT-aided read-voltage optimization scheme compared with (a) the MMI scheme and (b) the entropy-based scheme, with five different rate-0.9 protograph codes in an IM-BICM-ID flash-memory system.

As a further insight, Fig. 12(a) presents the average number of inner iterations required to decode each codeword versus PE cycles for the proposed read-voltage optimization scheme and the MMI read-voltage optimization scheme with five different rate-0.9 protograph/LDPC codes. It can be observed that, at PE = 24000, the average number of inner iterations required by the proposed read-voltage optimization scheme with a given protograph/LDPC code is about 35% lower than that required by the MMI scheme, meaning that the flash-to-controller decoding latency of the former is significantly reduced. Moreover, the IMARA code achieves the lowest average number of inner iterations among all the protograph and LDPC codes under the proposed read-voltage optimization scheme, which further confirms the superiority of our design in the IM-BICM-ID MLC flash-memory systems.

The error-rate and decoding-latency comparison results of the proposed read-voltage optimization scheme and the entropy-based read-voltage optimization scheme are shown in Fig. 11(b) and Fig. 12(b), respectively. As seen, the error performance and decoding latency of the entropy-based read-voltage optimization scheme are close to those of the MMI read-voltage optimization scheme, and both schemes are clearly inferior to the proposed read-voltage optimization scheme.

Fig. 12. Average number of inner iterations required to decode each codeword for the proposed EPEXIT-aided read-voltage optimization scheme compared with (a) the MMI scheme and (b) the entropy-based scheme, with five different rate-0.9 protograph codes in an IM-BICM-ID flash-memory system.

Overall, the proposed EPEXIT-aided read-voltage optimization scheme not only significantly boosts the performance of the IM-BICM-ID systems but also reduces the decoding latency, at the price of a slight increase in memory-sensing latency (i.e., one additional quantization level), compared with the state-of-the-art MMI and entropy-based read-voltage optimization schemes.

*Remark:* The proposed irregular mapping, code-design method and read-voltage optimization scheme can be applied to TLC flash-memory systems after appropriate modifications.

#### VI. CONCLUSIONS

To overcome the high decoding complexity and long decoding latency of the conventional BICM-ID MLC flash-memory systems, we proposed a new protograph-coded BICM-ID using irregular mapping, referred to as IM-BICM-ID, in the MLC flash-memory systems. To facilitate the design of protograph codes for IM-BICM-ID flash-memory systems, we developed an enhanced PEXIT (EPEXIT) analytical algorithm. Based on the EPEXIT algorithm, a novel design method was conceived for the construction of a family of high-rate protograph codes, named IMARA codes, with the lowest decoding thresholds and the desirable linear-minimum-distance-growth property. Furthermore, motivated by the voltage-region iterative gain characteristics of such systems, an EPEXIT-aided read-voltage optimization scheme was proposed to help minimize the decoding thresholds of protograph codes. Simulation results demonstrated that the proposed IMARA-based IM-BICM-ID scheme and the proposed read-voltage optimization scheme significantly outperform the state-of-the-art counterparts in terms of convergence and error performance. Owing to its high reliability and efficiency, the proposed protograph-coded IM-BICM-ID flash memory can be envisioned as a promising massive-data-storage solution for the 6G-enabled mobile networks, such as vehicular networks.

#### REFERENCES

- [1] K. Gao, C. Xu, P. Zhang, J. Qin, L. Zhong, and G.-M. Muntean, "GCH-MV: Game-enhanced compensation handover scheme for multipath TCP in 6G software defined vehicular networks," *IEEE Trans. Veh. Technol.*, vol. 69, no. 12, pp. 16142–16154, Dec. 2020.
- [2] S. Mumtaz, H. Lundqvist, K. M. S. Huq, J. Rodriguez, and A. Radwan, "Smart direct-LTE communication: An energy saving perspective," *Ad Hoc Netw.*, vol. 13, no. 1, pp. 296–311, Feb. 2014.
- [3] C. Feng, K. Yu, M. Aloqaily, M. Alazab, Z. Lv, and S. Mumtaz, "Attribute-based encryption with parallel outsourced decryption for edge intelligent IoV," *IEEE Trans. Veh. Technol.*, vol. 69, no. 11, pp. 13784–13795, Nov. 2020.
- [4] B. Ji, X. Zhang, S. Mumtaz, C. Han, C. Li, H. Wen, and D. Wang, "Survey on the internet of vehicles: Network architectures and applications," *IEEE Commun. Standards Mag.*, vol. 4, no. 1, pp. 34–41, Mar. 2020.
- [5] Y. Di, L. Shi, C. Gao, Q. Li, C. J. Xue, and K. Wu, "Minimizing retention induced refresh through exploiting process variation of flash memory," *IEEE Trans. Comput.*, vol. 68, no. 1, pp. 83–98, Jan. 2019.
- [6] J. Navarro-Ortiz, P. Romero-Diaz, S. Sendra, P. Ameigeiras, J. J. Ramos-Munoz, and J. M. Lopez-Soler, "A survey on 5G usage scenarios and traffic models," *IEEE Commun. Surveys Tuts.*, vol. 22, no. 2, pp. 905–929, 2nd Quart. 2020.
- [7] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, "6G wireless networks: Vision, requirements, architecture, and key technologies," *IEEE Veh. Technol. Mag.*, vol. 14, no. 3, pp. 28–41, Mar. 2019.
- [8] K. Wei, J. Li, L. Kong, F. Shu, and F. C. M. Lau, "Page-based dynamic partitioning scheduling for LDPC decoding in MLC NAND flash memory," *IEEE Trans. Circuits Syst. II*, vol. 66, no. 12, pp. 2082–2086, Dec. 2019.
- [9] W. Lee, M. Kang, S. Hong, and S. Kim, "Interpage-based endurance-enhancing lower state encoding for MLC and TLC flash memory storages," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 9, pp. 2033–2045, Sep. 2019.
- [10] L. Dolecek and Y. Cassuto, "Channel coding for nonvolatile memory technologies: Theoretical advances and practical considerations," *Proc. IEEE*, vol. 105, no. 9, pp. 1705–1724, Sep. 2017.
- [11] C. Yang, Y. Emre, and C. Chakrabarti, "Product code schemes for error correction in MLC NAND flash memories," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 12, pp. 2302–2314, Dec. 2012.
- [12] G. Dong, N. Xie, and T. Zhang, "On the use of soft-decision error-correction codes in NAND flash memory," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 2, pp. 429–439, Feb. 2011.
- [13] H. Lee, J. Shy, Y. Chen, and Y. Ueng, "LDPC coded modulation for TLC flash memory," in *Proc. 2017 IEEE Inf. Theory Workshop (ITW)*, Nov. 2017, pp. 204–208.
- [14] W. Shao, J. Sha, and C. Zhang, "Dispersed array LDPC codes and decoder architecture for NAND flash memory," *IEEE Trans. Circuits Syst. II*, vol. 65, no. 8, pp. 1014–1018, Aug. 2018.
- [15] A. Hareedy, C. Lanka, N. Guo, and L. Dolecek, "A combinatorial methodology for optimizing non-binary graph-based codes: Theoretical analysis and applications in data storage," *IEEE Trans. Inf. Theory*, vol. 65, no. 4, pp. 2128–2154, Apr. 2019.
- [16] Y. Liao, C. Lin, H. Chang, and S. Lin, "A (21150, 19050) GC-LDPC decoder for NAND flash applications," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 66, no. 3, pp. 1219–1230, Mar. 2019.
- [17] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," *IEEE Trans. Inf. Theory*, vol. 44, no. 3, pp. 927–946, May 1998.
- [18] Y. Fang, G. Zhang, G. Cai, F. C. M. Lau, *et al.*, "Root-protograph-based BICM-ID: A reliable and efficient transmission solution for block-fading channels," *IEEE Trans. Commun.*, vol. 67, no. 9, pp. 1–19, Sep. 2019.
- [19] Y. Bu, Y. Fang, G. Han, S. Mumtaz, and M. Guizani, "Design of protograph-LDPC-based BICM-ID for multi-level-cell (MLC) NAND flash memory," *IEEE Commun. Lett.*, vol. 23, no. 7, pp. 1127–1131, Jul. 2019.

- [20] Y. Fang, G. Han, G. Cai, F. C. M. Lau, *et al.*, "Design guidelines of low-density parity-check codes for magnetic recording systems," *IEEE Commun. Surveys Tuts.*, vol. 20, no. 2, pp. 1574–1606, 2nd Quart. 2018.
- [21] L. Dai, Y. Fang, Z. Yang, P. Chen, and Y. Li, "Protograph LDPC-coded BICM-ID with irregular CSK mapping in visible light communication systems," *IEEE Trans. Veh. Technol.*, vol. 70, no. 10, pp. 11033–11038, Oct. 2021.
- [22] D. Divsalar, S. Dolinar, C. R. Jones, and K. Andrews, "Capacity-approaching protograph codes," *IEEE J. Sel. Areas Commun.*, vol. 27, no. 6, pp. 876–888, Aug. 2009.
- [23] P. Chen, K. Cai, and S. Zheng, "Rate-adaptive protograph LDPC codes for multi-level-cell (MLC) NAND flash memory," *IEEE Commun. Lett.*, vol. 22, no. 6, pp. 1112–1115, Jun. 2018.
- [24] C. A. Aslam, Y. L. Guan, and K. Cai, "Read and write voltage signal optimization for multi-level-cell (MLC) NAND flash memory," *IEEE Trans. Commun.*, vol. 64, no. 4, pp. 1613–1623, Apr. 2016.
- [25] G. Dong, N. Xie, and T. Zhang, "Enabling NAND flash memory use soft-decision error correction codes at minimal read latency overhead," *IEEE Trans. Circuits Syst. I*, vol. 60, no. 9, pp. 2412–2421, Sep. 2013.
- [26] G. Cai, Y. Fang, P. Chen, G. Han, G. Cai, and Y. Song, "Design of an MISO-SWIPT-aided code-index modulated multi-carrier *M*-DCSK system for e-health IoT," *IEEE J. Sel. Areas Commun.*, vol. 39, no. 2, pp. 311–324, Feb. 2021.
- [27] L. Szczecinski, H. Chafnaji, and C. Hermosilla, "Modulation doping for iterative demapping of bit-interleaved coded modulation," *IEEE Commun. Lett.*, vol. 9, no. 12, pp. 1031–1033, Dec. 2005.
- [28] Z. Liu, K. Peng, T. Cheng, and Z. Wang, "Irregular mapping and its application in bit-interleaved LDPC coded modulation with iterative demapping and decoding," *IEEE Trans. Broadcast.*, vol. 57, no. 3, pp. 707–712, Sep. 2011.
- [29] Q. Wang, C. Zhang, and J. Dai, "Irregular mapping design for bit-interleaved coded modulation with low complexity iterative decoding," in *Proc. Int. Conf. Electron. Inf. Emergency Commun.*, Jun. 2016, pp. 42–45.
- [30] J. Wang, K. Vakilinia, T. Chen, T. Courtade, G. Dong, T. Zhang, H. Shankar, and R. Wesel, "Enhanced precision through multiple reads for LDPC decoding in flash memories," *IEEE J. Sel. Areas Commun.*, vol. 32, no. 5, pp. 880–891, May 2014.
- [31] C. A. Aslam, Y. L. Guan, and K. Cai, "Decision-directed retention-failure recovery with channel update for MLC NAND flash memory," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 65, no. 1, pp. 353–365, Jan. 2018.
- [32] M. Asadi, X. Huang, A. Kavcic, and N. P. Santhanam, "Optimal detector for multilevel NAND flash memory channels with intercell interference," *IEEE J. Sel. Areas Commun.*, vol. 32, no. 5, pp. 825–835, May 2014.
- [33] S. Yang, L. Wang, Y. Fang, and P. Chen, "Performance of improved AR3A code over EPR4 channel," in *Proc. Int. Conf. Computer Research and Development (ICCRD)*, vol. 2, Mar. 2011, pp. 60–64.
- [34] G. Liva and M. Chiani, "Protograph LDPC codes design based on EXIT analysis," in *Proc. IEEE Global Commun. Conf.*, Nov. 2007, pp. 3250–3254.
- [35] S. ten Brink, G. Kramer, and A. Ashikhmin, "Design of low-density parity-check codes for modulation and detection," *IEEE Trans. Commun.*, vol. 52, no. 4, pp. 670–678, Apr. 2004.
- [36] Y. Fang, G. Bi, Y. L. Guan, and F. C. M. Lau, "A survey on protograph LDPC codes and their applications," *IEEE Commun. Surveys Tuts.*, vol. 17, no. 4, pp. 1989–2016, Fourth Quarter 2015.
- [37] T. V. Nguyen, A. Nosratinia, and D. Divsalar, "The design of rate-compatible protograph LDPC codes," *IEEE Trans. Commun.*, vol. 60, no. 10, pp. 2841–2850, Oct. 2012.



**Yi Fang** (Member, IEEE) received the Ph.D. degree in communication engineering from Xiamen University, China, in 2013. From May 2012 to July 2012, he was a Research Assistant in electronic and information engineering at Hong Kong Polytechnic University, Hong Kong. From September 2012 to September 2013, he was a Visiting Scholar in electronic and electrical engineering at University College London, UK. From February 2014 to February 2015, he was a Research Fellow at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He is currently a Full Professor and Vice Dean at the School of Information Engineering, Guangdong University of Technology, China. He served as the Publicity Co-Chair of the International Symposium on Turbo Codes and Iterative Information Processing 2018. His current research interests include information and coding theory, spread-spectrum modulation, and cooperative communications.



**Sattam Al Otaibi** is currently the Head of the Innovation and Entrepreneurship Center, Taif University, Saudi Arabia. He is a Researcher and an Academician specializing in electrical engineering and nanotechnology. His practical experience in industry, education, and scientific research has been built through his research work, his mobility among many companies, institutions, and universities, and his active participation in research centers, which has resulted in many refereed scientific publications.



**Yingcheng Bu** received the B.E. degree in communication engineering from Guangdong University of Technology, China, in 2018. He is currently pursuing the master's degree in the Department of Communication Engineering, Guangdong University of Technology, China. His primary research interests include channel coding and signal processing for data storage.



**Pingping Chen** (Member, IEEE) received the Ph.D. degree in electronic engineering from Xiamen University, China, in 2013. In 2012, he was a Research Assistant in electronic and information engineering with The Hong Kong Polytechnic University, Hong Kong. From 2013 to 2015, he was a Post-Doctoral Fellow at the Institute of Network Coding, The Chinese University of Hong Kong, Hong Kong. He is currently a Full Professor with Fuzhou University, China. His primary research interests include channel coding, joint source and channel coding, network coding, and UWB communications.



**Francis C. M. Lau** (Fellow, IEEE) received the BEng (Hons) degree in electrical and electronic engineering and the PhD degree from King's College London, University of London, UK. He is a Professor at the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong. He is also a Fellow of IEEE and a Fellow of IET.

He is a co-author of two research monographs and a co-holder of five US patents and one pending US patent. He has published more than 320 papers. His main research interests include channel coding, chaos-based digital communications, complex networks, cooperative networks, and wireless communications. He is a co-recipient of one Natural Science Award from the Guangdong Provincial Government, China, and eight best/outstanding conference paper awards.

He was the General Co-chair of the International Symposium on Turbo Codes & Iterative Information Processing (2018) and the Chair of the Technical Committee on Nonlinear Circuits and Systems, IEEE Circuits and Systems Society (2012-13). He served as an associate editor for IEEE Transactions on Circuits and Systems II (2004-2005 and 2015-2019), IEEE Transactions on Circuits and Systems I (2006-2007), and IEEE Circuits and Systems Magazine (2012-2015). He has been a guest associate editor of the International Journal of Bifurcation and Chaos since 2010.