# **Performance Evaluation of Index-Less Indexed Flash Codes for Non-Uniform Write Operations**

**Y**UICHI КАЈІ<sup>1,а)</sup>

**Abstract:** A random-walk model is investigated and utilized to analyze the performance of a coding scheme that aims to extend the lifetime of flash memory. Flash memory is widely used in various products today, but the cells that constitute flash memory wear out as they experience many operations. This issue can be mitigated by employing a clever coding scheme that is known as a flash code. The purpose of this study is to establish a well-defined random-walk model of a flash code that is known as an index-less indexed flash code (ILIFC), and clarify the expected performance of ILIFC. Preliminary study has been made by the author for a simplified model of data operation, and the contribution of this study is to extend the model of data operation to more general and practical one. Mathematical properties of the random-walk model is reconsidered, and useful properties are derived that help analyzing the performance of ILIFC both in non-asymptotic and asymptotic scenarios.

Keywords: flash codes, flash memory, random-walk model, index-less indexed flash codes, coding for storage

# 1. Introduction

Flash memory is widely used in a number of electronic products today, but data recording in flash memory is not a simple issue. Flash memory consists of many flash cells that can store electric charge in their floating gates. Fig. 1 illustrates the simplified structure of flash cells[10]. Each cell has its own control gate, and several cells (three cells in Fig. 1, but as many as  $10^{18}$  to 10<sup>20</sup> cells in practical products[8]) share a single *basement* and a back gate. The collection of cells that share a back gate is called an erase block because of the reason we mention later. There are insulators that separate floating gates from control gates and the basement, and therefore floating gates are electrically isolated. Basically, the amount of charge that is stored in the floating gate represents the value of the cell. In this study, we let the cell value be an integer in  $A_q = \{0, \dots, q-1\}$ , where q is the resolution of the quantization of the charge. If we apply positive voltage to a control gate, then, thanks to the quantum tunnel effect, electrons in the basement "tunnels through" the insulator and added to the floating gate. This operation is called a cell programming, and is



<sup>1</sup> Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, Japan

<sup>a)</sup> kaji@is.naist.jp

used to raise the value of a selected cell. The charge in a floating gate remains for a long period of time even if the voltage to the control gate is removed, which makes flash memory nonvolatile. An interesting aspect of flash memory is that the operation to remove charge from a floating gate is not symmetrical to the cell programming; the charge in a floating gate is removed by applying positive voltage to the back gate, but this operation affects all cells in the same erase block. With this *block erase* operation, values of all cells in the erase block are reset to 0. The block erasure is a quite strong electric operation, and it can deteriorate the insulators of the affected cells. The deteriorated insulator cannot prevent the charge in a floating gate from leaking out, and it is said that flash cells that have experienced thousands of block erasures are no more suitable for data recording[8]. In this sense, flash memory has finite lifetime in principle.

A number of efforts have been made to extend the lifetime span of flash memory products. For example, many flash memory products are equipped with the mechanism that is known as *wear-leveling*[9]. The wear-leveling is a kind of scheduling algorithm, and contributes to prevent excessive use of small number of specific cells. Recent operating systems inform flash memory controller of higher-level information of data operation, where the controller makes use of the information to avoid needless operations. The TRIM command of the Windows operating system[12] is an example of this device. We can also investigate a data structure that is suitable for flash memory. For example, the file-system architectures of JFFS and JFFS2 are designed to avoid "in-place" rewritings of journal files that are essential in hierarchical file-systems[14].

The use of *flash codes* can be regarded as one of such attempts to extend the lifetime span of flash memory products. The purpose of flash codes is to give a clever way of data representation and data operations in a flash memory. Historically saying, a flash code can be regarded as a descendant of WOM (write-once memory) codes that was proposed by Rivest in early 1980s[11]. However, today's framework of flash codes owes much to the studies of Jiang et al.[3], [4], in which *K*-bit *data* is stored in one erase block with *N* flash cells, and the *K*-bit data is updated by a *write operation* which randomly selects one of *K*-bits of the data and flips the binary value of the selected bit. A flash code is designed to accommodate as many write operations as possible between two consecutive block erasures. Jiang et al. proposed flash codes which allows more number of write operations than naive coding scheme[3], and extended the idea in [4]. Mahdavifar et al. improved the idea of [4], and proposed *index-less indexed flash codes* (*ILIFC*)[8].

The purpose of this study is to analyze the expected performance of ILIFC. In the traditional discussion of WOM codes and flash codes, the worst-case performance has been regarded as significant. The worst-case performance gives the guarantee of the lifetime of a memory product in the most unfortunate case, which is especially important when the memory is really "write-once". However, as stated previously, flash cells today endure thousands of block erasures. It is quite unlikely that the most unfortunate scenario is repeated for thousands times, and the expected performance should have strong and direct relation to the lifetime of mass-produced flash memory products[2], [7]. In [2], the expected performance is discussed in terms of "cost" of moves of a certain Markov chain model, and a flash code with good expected performance was proposed. The code construction is further improved in [7], but these two studies do not discuss ILIFC. Suzuki considered to improve the expected performance of IL-IFC, and applied the Markov chain formalization of [2] in their analysis[13]. The formalization contributes to estimate the expected performance of ILIFC with small parameters, though, the approach seems not scalable because we need to construct and analyze a Markov model whose size is exponential in N. Kaji modeled the behavior of ILIFC as a multi-token cyclic randomwalk model, and clarified the expected performance of ILIFC in a uniform writing scenario in which it is assumed that K data bits are selected by write operations with an equal probability[6].

The contribution of this study is to relax the assumption in [6], and to extend the discussion in [6] to the non-uniform writing scenario. In many applications of data processing, data to be handled is not homogeneous. In a practical file-systems, for example, some files are updated frequently, while many files remain unchanged for long time. When a user edits an ASCII encoded text file, the most significant bit hardly changes while the other seven bits have great probabilities to be modified. This kind of bias is common in computer system, and the performance of flash codes for such non-uniform writing scenario has practical importance. Fortunately, the random-walk model that was developed in [6] is still effective for this extended scenario. Unfortunately, however, the analysis techniques that was used in [6] is no more available because bits of the data have different statistical characteristics. In this study, we reorganize the mathematical discussion in [6], and refine the analysis technique so that it can be adopted to the more general class of the problem.

# 2. Preliminary

## 2.1 Flash Codes

Flash codes and related notions are briefly reviewed in this section. See [3] for detailed discussion and background issues related to these preliminary. An erase block of a flash memory is an array of N cells, where a cell is an element which stores an integer value in  $A_q = \{0, \dots, q-1\}$  where q is an integer greater than one. A cell is said to be *empty* (resp. *full*) if its value is 0 (resp. q - 1). Cells in an erase block are ordered, and the value of the *i*-th cell with  $0 \le i < N$  is denoted by  $c_i$ . A Tuple  $(c_0, \ldots, c_{N-1}) \in A_a^N$  is used to represent the contents of cells, and called a *state* of the erase block. For two states  $\mathbf{c} = (c_0, \dots, c_{N-1})$ and  $\mathbf{c}' = (c'_0, \ldots, c'_{N-1})$ , we write  $\mathbf{c} \leq \mathbf{c}'$  if  $c_i \leq c'_i$  for all  $0 \leq i < N$ , and  $\mathbf{c} < \mathbf{c}'$  if  $\mathbf{c} \le \mathbf{c}'$  and  $\mathbf{c} \neq \mathbf{c}'$ . The notion of "states" and " $\prec$ " are extended to subsets of cells in a natural manner. In an erase block with *N* cells, we store a value a *K*-bit binary *data*  $(b_0, \ldots, b_{K-1})$ . The value of the data is changed through a write operation, which probabilistically selects one of bits of the data and flips the binary value of the selected bit. This is a simplified model of the data operation for a memory with the striping architecture. In the following discussion, we write  $p_i$  with  $0 \le i < K$  for the probability of the *i*-th bit of the data be selected by a write operation.

The purpose of a *flash code* is to give correspondence between  $\{0, 1\}^K$  (the values of the K-bit data) and  $A_q^N$  (states of N cells), and to determine how to operate cell values for a requested write operation. Indeed, a flash code is defined as a pair of functions  $C = (\mathcal{E}, \mathcal{D})$ . The decoding function  $\mathcal{D}$  is a mapping from  $A_q^N$  to  $\{0,1\}^K$ , and used to translate the state of the block to a K-bit data value. The encoding function  $\mathcal{E}$  is a mapping from  $\{0, \ldots, K-1\} \times A_q^N$  to  $A_q^N \cup \{E\}$ , where E is a special symbol which is called *block erasure*, and determines how to operate cell values. It is required that, if  $\mathbf{c}' = \mathcal{E}(i; \mathbf{c})$  and  $\mathbf{c}' \neq \mathbf{E}$ , then  $\mathbf{c} < \mathbf{c}'$ , and  $\mathcal{D}(\mathbf{c})$ and  $\mathcal{D}(\mathbf{c}')$  differ at the *i*-th bit position only. If there is no  $\mathbf{c}'$  that satisfies the above conditions, then  $\mathcal{E}$  must return E. It is assumed that, at the initial moment, all cells are empty and the K-bit data is  $(0, \ldots, 0)$ . Write operations are then performed repeatedly, and the encoding function  $\mathcal{E}$  is executed for each write operation. Because the state of the block increases monotonically with respect to  $\prec$ , the encoding function eventually returns E. Let T denote the number of write operations that were accommodated before  $\mathcal{E}$  returns E (i.e. E is returned at the T + 1-th call of  $\mathcal{E}$ ). In general, the value of T depends on the bit positions that were selected by the write operations, and hence T is regarded as a random variable. It is understood that  $T \leq N(q-1)$ , and therefore

$$\Delta = N(q-1) - T$$

which is called a *write deficiency*, indicates the "overhead" that was induced by using the flash code. Obviously, a smaller value of  $\Delta$  is more favorable. The maximum  $\Delta$  is called the worst-case write deficiency, and the expected value of  $\Delta$  is called the expected write deficiency.

#### 2.2 Index-Less Indexed Flash Codes

An *index-less indexed flash code (ILIFC)*[8] has two different "stages" in encoding. Asymptotically saying, using both of the

first and the second stages is good to reduce the worst-case write deficiency[8]. However, the asymptotic difference is so small that the use of the second stage gives very little contribution for practical choices of N. Indeed it often happens that the performance is improved by omitting the second stage of ILIFC[6]. For this reason, we consider ILIFC with the first stage only.

In ILIFC, N cells in an erase block are divided into slices with each slice consists of K cells. For simplicity, we assume that N is a multiple of K and there are N/K slices in one erase block. We also assume that K(q-1) is an even number, which is essential in ILIFC. A slice consists of K cells  $\mathbf{s}_m = (c_{mK}, \dots, c_{mK+K-1})$  with  $0 \le m < N/K$ . A slice is *empty* (resp. *full*) if its state is (0, ..., 0)(resp.  $(q-1, \ldots, q-1)$ ). A slice which is neither of empty nor full is said to be active. The weight of a slice is defined as the sum of values of cells in the slice. The key idea of ILIFC is to devise a coding scheme which allows a slice to simultaneously represent the value and the index of one of bits of the data. For integers i and w with  $0 \le i < K$  and  $0 \le w \le K(q-1)$ , let define  $\mathbf{c}_w^{[i]}$  as follows;

- c<sub>0</sub><sup>(0)</sup> = (0,...,0),
   c<sub>w+1</sub><sup>(0)</sup> is obtained from c<sub>w</sub><sup>(0)</sup> by increasing the value of the leftmost nonempty cell in c<sub>w</sub><sup>(0)</sup> by one, and
- $\mathbf{c}_{w}^{[i+1]}$  is obtained from  $\mathbf{c}_{w}^{[i]}$  by cyclically shifting  $\mathbf{c}_{w}^{[i]}$  to the right direction by one position.

For example, if q = 3 and K = 4, then  $\mathbf{c}_0^{[0]}, \dots, \mathbf{c}_{K(q-1)}^{[0]}$  are

# 0000, 1000, 2000, 2100, 2200, 2210, 2220, 2221, 2222,

respectively. States  $\mathbf{c}_0^{[2]}, \ldots, \mathbf{c}_{K(q-1)}^{[2]}$  are obtained by cyclically shifting the above states by two-bits each;

# 0000, 0010, 0020, 0021, 0022, 1022, 2022, 2122, 2222.

If the state of a slice equals to  $\mathbf{c}_{w}^{[i]}$ , then we say that the slice has an index i and a binary value (w mod 2). An update is an operation to modify the state of a slice from  $\mathbf{c}_{w}^{[i]}$  to  $\mathbf{c}_{w+1}^{[i]}$  where  $0 \le i < K$ and  $0 \le w < K(q-1)$ . Note that an update operation increases the weight of a slice by one, and flips the binary-value of the slice. The index of the slice stays unchanged by update operations.

ILIFC manages cell values in such a way that the *i*-th bit of the K-bit data is 1 if and only if there is an active slice whose index is *i* and whose binary value is 1. If there is no active slice with index *i*, or, if there is an active slice with index *i* but its binary value is 0, then the *i*-th bit of the data is interpreted as 0. Consider for example that a value  $(b_0, b_1, b_2, b_3)$  of the K-bit data is recorded as the state in Fig. 2(a), where we assume that N = 24, K = 4 and q = 4. In this case N = 24 cells are divided into six slices with K = 4 cells each as in (b) in the figure. We can determine that the slice  $s_0$  has an index 1 and a binary value of 1, which means that  $b_1 = 1$ . The slices  $s_1$  and  $s_3$  have indexes 0 and 2, respectively, but their binary values are both 0. Consequently we have  $b_0 = b_2 = 0$ . There is no active slice with index 3 in Fig. 2(b), and  $b_3$  is interpreted as 0. Summarizing, the data value is  $(b_0, b_1, b_2, b_3) = (0, 1, 1, 0)$ .

The encoding function  $\mathcal{E}$  operates cell values so that the state is consistent with the current value of the K-bit data. Consider that a write operations requests to flip the *i*-th bit of the data. The



first attempt the encoding function tries is look for an active slice with index *i*, and update the found slice. If there is no active slice with index *i*, then the function chooses one of empty slices, and *activates* the slice to become  $\mathbf{c}_{1}^{[i]}$ . In case there is no empty slice available, then  $\mathcal{E}$  returns E. Again consider that the state of the block is given by Fig. 2)(b). If a write operation requests to filp  $b_2$ , then the encoding function spots and updates  $s_3$  because its index is 2. The encoding results in Fig. 2)(c). With one more write operation of  $b_2$ , the state becomes as in Fig. 2)(d). Note that there is no active slice with index 2 in Fig. 2)(d), but this is not a problem because  $b_2 = 0$  at this moment. If another writ operation is performed for  $b_2$ , then the encoding function activates an empty slice, for example  $s_2$ , and the encoding continues. Consider that we have reached Fig. 2)(f), and a write operation of  $b_2$  is requested. In this case, the erase block has no room to accommodate the request, and a block erasure E is returned.

#### 3. **Cyclic Random-Walk Model**

The purpose of this section is to define a mathematical model that contributes to analyze the performance of ILIFC. The model indeed reprsents the distribution of weights of slices that are associated with the bits of the data. A fundamental property of the model is discussed in this section, which will be is utilized in the next for the performance analysis of ILIFC.

#### 3.1 Definition of the Model

In the following discussion, we let Z = K(q-1) which equals to the weight of a full slice. Consider a structure which consists of Z places  $Q_0, \ldots, Q_{Z-1}$  and K tokens  $\tau_0, \ldots, \tau_{k-1}$ . Fig. 3 illustrates a simple example of such a structure with siz states and four tokens, where tokens are represented by numbered small circles. The places are cyclically connected, and we say that  $Q_{w+1 \mod Z}$  is the *next* place of  $Q_w$ . The tokens are initially put in the place  $Q_0$ , and moved to the next places according the execution of the encoding function  $\mathcal{E}$ ; if  $\mathcal{E}$  is invoked for a bit position *i* with  $0 \le i < K$ , then the *i*-th token  $\tau_i$  is moved to the next place. From the characteristic of the encoding function, we have the relationship that a token  $\tau_i$  is in the place  $Q_w$  with  $1 \le w < Z$  (note that w = 0 is not included here) if and oly if there is an active slice whose index is *i* and whose weight is *w*. The token  $\tau_i$  is in  $Q_0$  if and only if there is no active slice whose index is *i*.

This place-and-token structure can be regarded as a cyclic random-walk model with multiple tokens. At each move of the model, one of tokens is selected according to the probabilities  $p_0, \ldots, p_K$  (note that these are probabilities of the corresponding bits selected by a write operation), and moved to the next place. We are interested in the distribution of the tokens after *t* moves of this model, where *t* is an arbitrary positive integer. For  $0 \le w < Z$  and  $t \ge 0$ , let denote the number of tokens in the place  $Q_w$  after *t* moves by a random variable  $X_{w;p_0,\ldots,p_{K-1}}^{(t)}$ . We would like to obtain a mathematical means that gives clear perspective of the expected value of  $X_{w;p_0,\ldots,p_{K-1}}^{(t)}$ .

# 3.2 Simplification to a Single-Token Model

It is difficult to clarify the distribution of  $X_{w;p_0,...,p_{K-1}}^{(t)}$  precisely because there is complicated correlation among random variables. For example, we must have

$$\sum_{w=0}^{Z-1} X_{w;p_0,\dots,p_{K-1}}^{(t)} = K, \text{ and}$$
$$\sum_{w=0}^{Z-1} w X_{w;p_0,\dots,p_{K-1}}^{(t)} = t.$$

To get around such complicated discussion, we consider to "decompose" the multi-token model into K independet single-token models. We remark that there are subtle problems in the decomposition; the decomposition loses some important aspects of the multi-token model including the correlation amoung tokens, though, it is still helpful as far as our interest is limited to the expected values. We formalize the single-token model in this section, and its relation to the multi-token model is discussed in the next section.

A cyclic random-walk model with a *single* token is a struture that consists of places and only one token. The token stays at the current place with probability 1 - p, and moves to the next place with probability p. We use  $Y_{w;p}^{(t)}$  with  $0 \le w < Z$  and  $t \ge 0$  as the random variable denoting the number of token in the place  $Q_w$  after t moves of the model. Because there is only one token in this model,  $Y_{w;p}^{(t)}$  is either of 0 or 1, and the expected value of the variable  $E[Y_{w;p}^{(t)}]$  equals to the probability that the token is in the place  $Q_w$ . This means that the relation among  $E[Y_{w;p}^{(t)}]$  can be stated by utilizing a recursion that characterizes the probability distribution of the token. For  $t \ge 0$ , let define a vector  $\mathbf{Y}^{(t)} = \left(E[Y_{0;p}^{(t)}], \ldots, E[Y_{Z-1;p}^{(t)}]\right)$ . Because a token must be in  $Q_0$  initially, we have  $\mathbf{Y}^{(0)} = (1, 0, \ldots, 0)$  The token stays at the same place with probability 1 - p, and moves to the next place with probability p. Therefore, we have



Fig. 3 Cyclic Random-Walk Model with Multple Tokens

$$\left(\mathbf{Y}^{(t)}\right)^{\mathrm{T}} = W\left(\mathbf{Y}^{(t-1)}\right)^{\mathrm{T}} = W^{t}\left(\mathbf{Y}^{(0)}\right)^{\mathrm{T}},\tag{1}$$

where T denotes the transposition of a vector and

$$W = \begin{pmatrix} 1-p & 0 & \dots & 0 & p \\ p & 1-p & & & \\ & \ddots & \ddots & & \mathbf{O} \\ \mathbf{O} & \ddots & \ddots & & \\ & & & p & 1-p \end{pmatrix}.$$
 (2)

The change of values of  $\mathbf{Y}^{(t)}$  is sketched in Fig. 4 for p = 0.1 (left), 0.5 (center) and 0.8 (right) with taking Z = 16. Each illustration consists of arrays of square tiles, where the color of the (w, t') component of the tile (the top-left tile is the (0,0) component) represents the value of  $E(Y_{w;p}^{(St')})$ , and therefore one column shows the component values of  $\mathbf{Y}^{(St')}$ . A tile with denser color means greater probability; black is 1, white is 0, and gray color represents a intermediate probability. We can see that the token disperses uniformly as the number *t* of moves increases.

#### 3.3 Relation between Single-Token and Multi-Token Models

The following lemma connects the single-token model and the multi-token model.

**Lemma 3.1** For any  $p_0, \ldots, p_{K-1}$  with  $p_0 + \cdots + p_{K-1} = 1$ ,

$$E(X_{w;p_0,\dots,p_{K-1}}^{(t)}) = \sum_{i=0}^{K-1} E(Y_{w;p_i}^{(t)}).$$

Remind that the positions of tokens in the multi-token model is correlated with each other, which makes the analysis of the model quite complicated. However, Lemma 3.1 implies that, as far as we discuss the expected number of tokens, we can decompose the multi-token model into k single-token models that move independently. This characteristic simplifies the analysis of the multi-token model, and contributes to the performance evaluation of ILIFC as we see in the next section.

For the proof of Lemma 3.1, we first consider non-cyclic variants of the random-walk models, and extend the discussion to the original cyclic models. A *linear random-walk model* with multiple tokens (resp. a single token) is a struture with infinite places  $Q_0, Q_1, \ldots$  and K tokens  $\tau_0, \ldots, \tau_{K-1}$  (resp. one token  $\tau$ ). The move of the token is defined in the same way as the cyclic counterparts of the model. We use  $V_{w;p_0,\ldots,p_{K-1}}^{(t)}$  and  $W_{w;p}^{(t)}$  as random variables of the numbers of tokens in the linear random-walk model with multiple tokens and a single token, respectively. The single-token model defined here is a Markov model of a binomial process, and

$$E\left[W_{w;p}^{(t)}\right] = \binom{t}{w} p^{w} (1-p)^{t-w}$$

for  $w \ge 0$ . The following lemma is the non-cyclic counterpart of Lemma 3.1.

**Lemma 3.2** For any  $p_0, \ldots, p_{K-1}$  with  $p_0 + \cdots + p_{K-1} = 1$ ,

$$E\left[V_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = \sum_{i=0}^{K-1} E\left[W_{w;p_i}^{(t)}\right] = \sum_{i=0}^{K-1} \binom{t}{w} p_i^w (1-p_i)^{t-w}.$$
 (3)

**Proof:** The proof is by induction on the number K of tokens.



Fig. 4 The diffusion of the expected number of tokens

When K = 1, the multi-token and single-token models are identical, and the equation holds obviously. Assume for the induction that the equation holds for any cases with K - 1 or less tokens. Consider that a linear random-walk model with K tokens have made t moves, and t moves contain l moves of the K-th token  $\tau_{K-1}$  where  $0 \le l \le t$ . Note that this case happens with probability

$$r_{l} = {\binom{t}{l}} p_{K-1}^{l} (1 - p_{K-1})^{t-l},$$

because  $\tau_{K-1}$  is selected and moved with probability  $p_{K-1}$ . The remaining t-l moves are used for other K-1 tokens  $\tau_0, \ldots, \tau_{k-2}$ . Consider to restrict this multi-token model by ignoring the token  $\tau_{K-1}$  and l moves of  $\tau_{K-1}$ , then we have another random-walk model with K-1 tokens and t-l moves. In this restricted model, the token  $\tau_i$  with  $0 \le i < K-1$  is characterized by the probability  $q_i = p_i/(1 - p_{K-1})$ . Use the inductive hypothesis, and we have

$$E\left[V_{w;q_0,\ldots,q_{K-2}}^{(t-l)}\right] = \sum_{i=0}^{K-2} \binom{t-l}{w} q_i^w (1-q_i)^{t-l-w}.$$

Now bring back the token  $\tau_{K-1}$  to this restricted model. The token  $\tau_{K-1}$  must be in the place  $Q_l$ , and contribute to the expected number of tokens in  $Q_l$ . Consequently, if  $\tau_{K-1}$  has made *l* moves, then

$$E\left[V_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = E\left[V_{l;q_0,\dots,q_{K-2}}^{(t-l)}\right] + \phi(w,l),\tag{4}$$

where  $\phi(w, l)$  is an indicator function that is 1 if and only if w = l, and 0 otherwise.

Remind that we are now discussing the case of  $\tau_{K-1}$  moved l times, which happens with probability  $r_l$ . The general value of  $E\left[V_{w;p_0,\ldots,p_{K-1}}^{(l)}\right]$  must be obtained by averaging (4) over all possible values of l. Therefore, we have

$$E\left[V_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = \sum_{l=0}^{t} r_l \left( E\left[V_{w;q_0,\dots,q_{K-2}}^{(t-l)}\right] + \phi(w,l) \right)$$
(5)

The first term in the right-hand side of (5) is reduced to

$$\sum_{i=0}^{K-2} \binom{t}{w} p_i^w (1-p_i)^{t-w},$$
(6)

where detailed transformation of the formula is given in Fig. 5. The second term in the right-hand side of (5) is simply reduced to

$$\sum_{l=0}^{t} r_l \phi(w, l) = r_w = {t \choose w} p_{K-1}^w (1 - p_{K-1})^{t-w}.$$
 (7)

Add (6) and (7), and (3) is shown for this K token case. This completes the proof of the lemma.

Lemma 3.2 is for linear random-walk models, but the result can be transformed to the cyclic case by merging places  $Q_{w+Z}, Q_{w+2Z}, \dots$  into  $Q_w$ , where  $0 \le w \le Z - 1$ . This observation implies that

$$E\left[X_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = \sum_{h=0}^{\infty} E\left[V_{w+hZ;p_0,\dots,p_{K-1}}^{(t)}\right], \text{ and}$$
(8)

$$E\left[Y_{w;p_{i}}^{(t)}\right] = \sum_{h=0}^{\infty} E\left[W_{w+hz;p_{i}}^{(t)}\right] \quad (0 \le i < K).$$
(9)

Use (3) of Lemma 3.2 to connect the right-hand sides of (8) and (9), and we have

$$E\left[X_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = \sum_{h=0}^{\infty} E\left[V_{w+hZ;p_0,\dots,p_{K-1}}^{(t)}\right] = \sum_{h=0}^{\infty} \sum_{i=0}^{K-1} E\left[W_{w+hZ;p_i}^{(t)}\right]$$
$$= \sum_{i=0}^{K-1} \sum_{h=0}^{\infty} E\left[W_{w+hZ;p_i}^{(t)}\right] = \sum_{i=0}^{K-1} E\left[Y_{w;p_i}^{(t)}\right],$$

and the proof of Lemma 3.1 completed.

# 4. Expected Write Deficiency of ILIFC

Lemma 3.1 gives the baseline of the performance analysis of ILIFC, but we need to employ different techniques to utilize Lemma 3.1 according to our target. We first consider a *non-asymptotic* scenario in which N is rather small, and the block erasure is returned while the random-walk model is still in the transient behavior. In the second *asymptotic* scenario, we deal with the case that N is sufficiently large and the limit discussion is feasible.

#### 4.1 Non-Asymptotic Discussion

Remind that the encoding function of ILIFC performs either one of two actions; increase the weight of an active slice, or "activate" an empty slice (i.e. choose an empty slice and raise its weight by one). The block erasure is requested when the encoding function tries to perform the N/K + 1-th activation, because the number of slices in the erase block is only N/K. Note that the activation of a slice is made when a write operation tries to flip a data bit, say *i*-th bit, that does not have a corresponding active slice. In this case, the token  $\tau_i$  must be in the place  $Q_0$ due to the correspondence between the bahvior of the encoding function and the random-walk model. Consequently, at the *t*-th write opearation, an activation of a slice takes place with probability  $\sum_{i=0}^{K-1} p_i E \left[ Y_{0;p_i}^{(t-1)} \right]$ . This probability can be also regarded as the expected number of slices that are newly activated at the *t*-th write operation, and the accumulation of this value

$$S_{t} = \sum_{l=1}^{t} \sum_{i=0}^{K-1} p_{i} E\left[Y_{0;p_{i}}^{(l-1)}\right] = \sum_{i=0}^{K-1} \sum_{l=1}^{t} p_{i} E\left[Y_{0;p_{i}}^{(l-1)}\right]$$
(10)

gives the expected number of slices which have been activated

$$\sum_{l=0}^{t} r_{l} E\left[V_{w:q_{0},\dots,q_{K-2}}^{(t-l)}\right] = \sum_{l=0}^{t} {t \choose l} p_{K-1}^{l} (1-p_{K-1})^{t-l} \sum_{i=0}^{K-2} {t-l \choose w} q_{i}^{w} (1-q_{i})^{t-l-w}$$

$$= \sum_{l=0}^{t} \sum_{i=0}^{K-2} \frac{t!}{l!(t-l)!} \frac{(t-l)!}{w!(t-l-w)!} p_{K-1}^{l} (1-p_{K-1})^{t-l} \left(\frac{p_{i}}{1-p_{K-1}}\right)^{w} \left(1-\frac{p_{i}}{1-p_{K-1}}\right)^{t-l-w}$$

$$= \sum_{i=0}^{K-2} \frac{t!}{w!(t-w)!} p_{i}^{w} (1-p_{i})^{t-w} \sum_{l=0}^{t} \frac{(t-w)!}{l!(t-w-l)!} \left(\frac{p_{K-1}}{1-p_{i}}\right)^{l} \left(1-\frac{p_{K-1}}{1-p_{i}}\right)^{t-w-l} = \sum_{i=0}^{K-2} {t \choose w} p_{i}^{w} (1-p_{i})^{t-w}.$$

Fig. 5 Transformation of the first term of (5)



Fig. 6 The accumulation of used slices (single bit)

by the *t* write operations performed so far. Since N/K is the maximum number of slices that we can use, the smallest *t* with  $S_t > N/K$  gives the estimation of the number of write operations that causes the block erasure. The expected write deficiency is therefore estimated as  $\Delta = N(q - 1) - t$  where *t* is the smallest integer with  $S_t > N/K$ .

Fig. 6 shows how the inner summation  $\sum_{l=1}^{t} p_{i} E\left[Y_{0;p_{i}}^{(l-1)}\right]$  in (10) where the value of t is varied from 0 to 80. Four different values  $p_i = 0.1, 0.3, 0.5$  and 0.8 are sketched. Intuitively, this graph shows the expected number of slices that have been used to represent a certain bit of the data. For example, we can read that the values of the curves for p = 0.1 and 0.3 at t = 40 are 0.99 and 1.06, respectively. This suggests that a data bit with probability 0.1 (resp. 0.3) should consume approximately 0.99 (resp. 1.06) slices after t = 40 write operations. The accumulation of these values gives the total number of slices that are used by either bit of the K-bit data. For example, if K = 4 and four bits are selected by write operations with probabilities 0.1, 0.3, 0.3 and 0.3, then, after 40 write operations,  $0.99+1.06\times3 = 4.17$  slices are expected to be in use. We let this example as "case 1". In the "case 2", we consider that four bits are selected by write operations with probabilities 0.1, 0.1, 0.3 and 0.5. In this case, the expected number of slices in use will be  $0.99 \times 2 + 1.06 + 1.87 = 4.91$  after 40 write operations. The expected number of slices for these two cases are illustrated in Fig. 7, again the x-axis is the value of t. The number of slices is expected to exceed 5 at t = 51 for the case 1, and at t = 43 for the case 2. If an erase block contains only four slices (and hence  $4 \times 4 = 16$  cells), then ILIFC can offer 51 - 43 = 8more write operations in the case 1 compared to the case 2. From this example, we can see that the non-uniform nature of write operations affects the performance of ILIFC for relatively small N.



#### 4.2 Asymptotic Discussion

If there are so many cells in the erase block and asymptotic discussion is feasible, then we can take purely analytical approach. In ILIFC, it is easily understood that the sum of weights of slices equals to the number of performed write operations. This means that the write deficiency is determined by subtracting from N(q-1) the sum of weights of all slices at the time of block erasure. Assume that t write operations have been successfully performed, and block erasure occurs at the t + 1-th write operation. At this moment, at most K slices are used to store K-bit data, and the other N/K - K slices are full. The sum of weights of those full slices is (N/K - K)K(q - 1) because one full slice has weight K(q-1). The weights of the remaining K slices are characterized by the random-walk model discussed in Sect 3. Remind that  $X_{w;p_0,\ldots,p_{K-1}}^{(l)}$  with  $1 \le w \le K(q-1) - 1$  equals to the number of data bits whose values are represented by slices with weight *i*, that is, the erase block contains  $X_{w;p_0,...,p_{K-1}}^{(t)}$  slices with weight w. For w = 0 case, we need special care because  $X_{0;p_0,\dots,p_{K-1}}^{(t)}$  is the number of data bits which do not have corresponding active slices. This means that the K slices under investigation contain  $X_{0;p_0,\ldots,p_{K-1}}^{(t)}$  slices which are not active. These non-active slices cannot be empty, because the encoding function is returning a block erasure, and hence those  $X_{0;p_0,...,p_{K-1}}^{(t)}$  slices must be all full. For notational simplicity, let define  $X_{K(q-1);p_0,...,p_{K-1}}^{(t)} = X_{0;p_0,...,p_{K-1}}^{(t)}$ Summarizing the discussion, the write deficiency of this block erasure event is given as follows:

$$W = N(q-1) - (N/K - K)K(q-1) - \sum_{w=1}^{K(q-1)} wX_{w;p_0,\dots,p_{K-1}}^{(t)}$$
$$= K^2(q-1) - \sum_{w=1}^{K(q-1)} wX_{w;p_0,\dots,p_{K-1}}^{(t)}.$$

The expected write deficiency is therefore equal to

$$K^{2}(q-1) - \sum_{w=1}^{K(q-1)} w E\left[X_{w;p_{0},\dots,p_{K-1}}^{(t)}\right].$$
 (11)

The value of  $E\left[X_{w;p_0,\ldots,p_{K-1}}^{(t)}\right]$  is characterized by  $E\left[Y_{w;p_i}^{(t)}\right]$  as in Lemma 3.1. Recall the vector representation  $\mathbf{Y}^{(t)}$  of the expected values, the recursive relation (1), and the transition matrix W in (2). According to the discussion in Chapter XVI of [1], the (a, b) component of  $W^t$  is given as

$$w_{a,b}^{(t)} = \frac{1}{K(q-1)} \sum_{r=1}^{K(q-1)} \theta^{r(a-b)} \left(1 - p(1-\theta^r)\right)^t, \quad (12)$$

where  $\theta = e^{2\pi i/K(q-1)}$  with  $i^2 = -1$  (remark: there is an error in Eq. (2.11) in Chap. XVI of [1], where *r* should range from 1 to  $\rho$ , not to  $\rho - 1$ , with  $\rho = K(q-1)$  in our context).

**Lemma 4.1**  $\lim_{t\to\infty} w_{a,b}^{(t)} = 1/K(q-1).$ 

**Proof:** We show that terms in (12) diminish to zero except for r = K(q - 1). Let define  $t_r = 1 - p(1 - \theta^r)$ . By using the Eular's formula  $e^{ix} = \cos x + i \sin x$ ,  $t_r$  can be written as

$$t_r = 1 - p + p \cos 2\pi r / K(q - 1) + ip \sin 2\pi r / K(q - 1).$$

The L2-norm of  $t_r$  is computed as

$$(1 - p + p \cos 2\pi r/K(q - 1))^2 + (p \sin 2\pi r/K(q - 1))^2$$
  
= 1 - 2p(1 - p)(1 - cos 2\pi r/K(q - 1)),

which is smaller than 1 if  $r \neq K(q-1)$ , and equals to 1 if r = K(q-1). Therefore, all terms in (12) diminish to zero at  $t \rightarrow \infty$  except for r = K(q-1). The lemma holds because  $\theta^{r(a-b)} = \theta^r = 1$  for r = K(q-1).

Since  $(\mathbf{Y}^{(t)})^{\mathrm{T}} = W^t (\mathbf{Y}^{(0)})^{\mathrm{T}} = W^t (1, 0, \dots, 0)^{\mathrm{T}}$ , the above lemma implies that  $E[Y_{w;p}^{(t)}] = 1/K(q-1)$  for any w at  $t \to \infty$ , independent from the probability value of p.

**Lemma 4.2** For very large *N*, the expected write deficiency of ILIFC is K(K(q-1)-1)/2.

**Proof:** From Lemma 3.1, the expected number of active slices with weight *w* is given by  $E\left[X_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = \sum_{i=0}^{K-1} E\left[Y_{w;p_i}^{(t)}\right]$ . If *N* is very large, then a large number of write operations are performed, and  $E\left[Y_{w;p_i}^{(t)}\right]$  approaches to 1/K(q-1). In this case,  $E\left[X_{w;p_0,\dots,p_{K-1}}^{(t)}\right] = 1/(q-1)$ , and (11) is computed as

$$\begin{split} & K^2(q-1) - \sum_{w=1}^{K(q-1)} w E(X^{(t)}_{w;p_0,\dots,p_{K-1}}) \\ & = K^2(q-1) - \sum_{w=1}^{K(q-1)} w/(q-1) \\ & = K(K(q-1)-1)/2, \end{split}$$

and the lemma holds.

We remark that the probabilities  $p_0, \ldots, p_{K-1}$  do not affect the expected write deficiency in this asymptotic scenario, which is an interesting contrast to the non-asymptotic case.

### 5. Conclusion

The expected write deficiency of ILIFC is studied for nonuniform write operation. The write deficiency of ILIFC has strong relation to the weight distribution of active slices. The transition of weight distribution can be modeled as a cyclic random-walk with multiple tokens, for which discussion can be decomposed to simpler problems with just one token. Based on this decomposition, the expected write deficiency is determined for non-asymptotic scenario with the aid of computation. For asymptotic case with very large N, the weight distribution converges to a certain value, and the write deficiency is not affected by the non-uniform nature of write operations. This is an interesting contrast to the non-asymptotic discussion in which the nonuniform nature affects the write deficiency in general.

#### References

- [1] Feller, W., An Introduction to Probability Theory and Its Applications, Third Edition, Wiley, 1968.
- [2] Finucane, H., Liu, Z. and Mitzenmacher, M.: "Designing Floating Codes for Expected Performance," Proc. of 46th Allerton Conf. on Communication, Control and Computing, pp.1389–1396, 2008.
- [3] Jiang, A., Bohossian, V. and Bruck, J.: "Floating Codes for Joint Information Storage in Write Asymmetric Memories," Proc. of 2007 Intl. Symp. on Inf. Theory, pp.1166–1170, 2007.
- [4] Jiang, A. and Bruck, J.: "Joint Coding for Flash Memory Storage," Proc. of 2008 Intl. Symp. on Inf. Theory, pp.1741–1745, 2008.
- [5] Jiang, A., Mateesch, R., Schwartz, M. and Bruck, J.: "Rank Modulation for Flash Memories," IEEE Trans. on Inf. Theory, 55, 6, pp.2659– 2673, June 2009.
- [6] Kaji, Y.: "The Expected Write Deficiency of Index-Less Indexed Flash Codes," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E95-A, 12, pp.2130-2138, 2012.
- [7] Kamabe, H.: "Floating Codes with Good Average Performance," Proc. of 32nd Symp. on Inf. Theory and Its Applications, pp.856–861, 2009.
- [8] Mahdavifar, H., Siegel, P.H., Vardy, A., Wolf, J.K. and Yaakobi, E.: "A Nearly Optimal Construction of Flash Codes," arXiv:0905.1512, 2009.
- [9] MicronTechnology, Inc.: "Wear Leveling in Micron NAND Flash Memory," Technical Note, TN-29-61, 2010 (retrieved on May 12, 2014). http://www.micron.com/-/media/Documents/Products/Technical

 Note/NAND Flash/tn2961\_wear\_leveling\_in\_nand.pdf

 [10]
 Olson, A.R. and Langlois, D.J.: "Solid State Drives Data Reliability

- and Lifetime," Imation Corporation White Paper, 2008 (retrieved on May 12, 2014). http://www.csee.umbc.edu/ squire/images/ssd1.pdf, 2008
- [11] Rivest, R.L. and Shamir, A.: "How to Reuse a 'Write-Once' Memory," Information and Control, 55, pp. 1.19, 1982.
- [12] Shu, R.: "Windows 7 Enhancements for Solid-State Drives," 2008 Windows Hardware Engineering Conference, 2008 (retrieved on May 14, 2014). http://download.microsoft.com/download/F/A/7/FA70E919-8F82-4C4E-8D02-97DB3CF79AD5/COR-T558\_Shu\_Taiwan.pdf
- [13] Suzuki, R. and Wadayama, T.: "Modified Index-less Indexed Flash Codes for Improving Average Performance," IEICE Trans. on Fundamentals (Japanese Edition), J94-A, 12, pp.991–1000, 2011. (in Japanese).
- [14] Woodhouse, D.: "JFFS2: The Journalling Flash File System, version 2," 2003 (retrieved on May 14, 2014). https://www.sourceware.org/iffs2/