Synopsis V1.0 HI SEE Test Report for the Hynix, Micron, and Samsung 4Gbit NAND Flash Memories

Tim Oldham, Perot Systems Government Services, Inc./NASA-GSFC

Christian Poivey, MEI Technology Inc./NASA-GSFC

Steven Buchner, Perot Systems Government Services, Inc./NASA-GSFC

Hak Kim, MEI Technology Inc./NASA-GSFC

Mark Friendlich, MEI Technology Inc./NASA-GSFC

Melanie Berg, MEI Technology Inc./NASA-GSFC

Test Date: 22 May and 12 Aug, 2007

Report Date: 10/29/07

# I. Introduction

This study was undertaken to determine and compare the susceptibilities of 4 Gbit NAND Flash memories from Micron, Hynix and Samsung to destructive and nondestructive single-event effects (SEE). The devices were monitored for SEUs and for destructive events induced by exposing them to a heavy ion beam at the Texas A&M University Cyclotron.

# II. Devices Tested

We tested a total of five Micron parts (part number MT29F4G08AAAWP, Lot Date Code (LDC) 0628), five Hynix parts (part number HY27UF084G2M, LDC 0636), and two Samsung parts (part number K9F4G08U0A, LDC 0625). All were 4G NAND devices. The Hynix and Micron parts had not been tested previously, so they were to be characterized completely. The Samsung parts had been tested for most SEE previously, so fewer parts were needed to complete their testing, which was primarily for SEL (single event latchup). The Samsung parts were also used in an initial, preliminary experiment to look for angular effects. Note that with commercial devices, the same lot date code is no guarantee that the devices are from the same wafer diffusion lot or even from the same fabrication facility.

The device technology is 63 nm minimum feature size CMOS NAND Flash memory. All the parts are single die, SLC (single level cells). The chips came in a 48-pin TSOP package, but the plastic had been dissolved on the topside to expose the chips, allowing the beam to reach the chip surface.

# III. Test Facilities

Facility: Texas A&M University Cyclotron

Flux: $5 \times 10^3$ to $1 \times 10^5$ particles/cm<sup>2</sup>/s.

Fluence: All tests were run to fluences between $1 \times 10^3$ and $1 \times 10^8$ particles/cm<sup>2</sup>, or until destructive or functional events occurred.

| TAMU Ion | Energy/AMU (MeV) | Energy (MeV) | Approx. LET on die (MeV•cm²/mg) | Angle (degrees) | Effective LET (MeV•cm²/mg) |
|----------|------------------|--------------|---------------------------------|-----------------|----------------------------|
| Ne       | 15               | 300          | 2.7                             | 0               | 2.7                        |
| Ar       | 15               | 600          | 8.4                             | 0               | 8.4                        |
| Cu       | 15               | 945          | 21                              | 0               | 21                         |
| Kr       | 15               | 1260         | 30.1                            | 0, 45           | 30.1, 42.1                 |
| Xe       | 15               | 1965         | 54.8                            | 0               | 54.8                       |

Table I: Ions/Energies and LET for this test
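The relationships behind Table I can be sketched in a few lines. The following is a minimal illustration, not the procedure used to build the table: the mass numbers are assumptions chosen so that 15 MeV/amu reproduces the tabulated beam energies, and the effective LET uses the simple cosine law. Under that law, Kr at 45 degrees comes out near 42.6, close to but not identical with the 42.1 listed in Table I; the report does not state how its value was derived.

```python
import math

# Mass numbers are ASSUMED (not stated in the report); they are chosen so
# that 15 MeV/amu reproduces the beam energies listed in Table I.
MASS_NUMBER = {"Ne": 20, "Ar": 40, "Cu": 63, "Kr": 84, "Xe": 131}


def total_energy_mev(ion, energy_per_amu=15.0):
    """Total beam energy in MeV: energy per nucleon times mass number."""
    return energy_per_amu * MASS_NUMBER[ion]


def effective_let(let_normal, angle_deg):
    """Cosine-law effective LET for a beam tilted angle_deg from normal."""
    return let_normal / math.cos(math.radians(angle_deg))


if __name__ == "__main__":
    for ion in MASS_NUMBER:
        print(ion, total_energy_mev(ion), "MeV")
    print("Kr at 45 degrees:", round(effective_let(30.1, 45.0), 1))
```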

# IV. Test Conditions

Test Temperature: Room temperature for SEU, 70° C for SEL

Operating Frequency: 0-30 MHz

Power Supply Voltage: 3.3 V and 3.0 V (3.3 V - 10%) for SEU, 3.6 V (3.3 V + 10%) for SEL

# V. Test Methods

Because Flash technology uses different voltages and circuitry depending on the operation being performed, testing was performed for a variety of test patterns and bias and operating conditions.

Test patterns included all 0's, all 1's, checkerboard and inverse checkerboard. In general, all zeroes is the worst-case condition for single bit errors. For a zero, the floating gate is fully charged with electrons. An ion can have the effect of introducing positive charge, which may be enough to cause a zero-to-one error. However, a checkerboard pattern (AA) was used in most of the testing because errors in the control circuitry can cause errors of both polarities. One-to-zero errors are an indication that the errors are coming from the control circuits. Between exposures, all patterns were used to exercise the DUT, to verify that it was still fully functional. The maximum clock frequency for these devices was 40 MHz, which is also the frequency used in the dynamic testing.
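To make the polarity argument concrete, the sketch below compares a written pattern against read-back data and tallies zero-to-one and one-to-zero flips separately. It is an illustration only, not the test software used here: the byte value 0x55 for the inverse checkerboard is an assumption (the complement of AA), and the read-back data are invented.

```python
# Fill bytes for each test pattern. 0xAA is the checkerboard named in the
# text; 0x55 for the inverse checkerboard is an ASSUMPTION (its complement).
PATTERNS = {
    "all_zeros": 0x00,
    "all_ones": 0xFF,
    "checkerboard": 0xAA,
    "inverse_checkerboard": 0x55,
}


def count_bit_errors(expected, observed):
    """Return (zero_to_one, one_to_zero) flip counts for two byte sequences."""
    zero_to_one = 0
    one_to_zero = 0
    for exp, obs in zip(expected, observed):
        flipped = exp ^ obs  # bits that differ between write and read-back
        zero_to_one += bin(flipped & ~exp & 0xFF).count("1")
        one_to_zero += bin(flipped & exp).count("1")
    return zero_to_one, one_to_zero


if __name__ == "__main__":
    expected = bytes([PATTERNS["checkerboard"]] * 4)
    # Invented read-back: one 0->1 flip (0xAB) and one 1->0 flip (0x2A).
    observed = bytes([0xAA, 0xAB, 0xAA, 0x2A])
    print(count_bit_errors(expected, observed))
```

A nonzero one-to-zero count in such a tally would point toward the control circuits rather than the floating gates, following the reasoning above.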

Bias and operating conditions included:

- 1) Static/Unbiased irradiation, in which a pattern was written and verified, and then bias was removed from the part and the part was irradiated. Once the irradiation reached the desired fluence, it was stopped, bias was restored, and the memory contents were read and errors tallied.
- 2) Static irradiation, which was similar to unbiased irradiation, except that bias was maintained throughout irradiation of the part.

Note that these conditions provide no opportunity to monitor functional or hard failures that may occur during the irradiation.

- 3) Dynamic Read, in which a pattern was written to memory and verified, then subsequently read continuously during irradiation. This condition allows determination of functional, configuration and hard errors, as well as bit errors.
- 4) Dynamic Read/Write, which was similar to the Dynamic Read, except that a write operation was performed on each word found to be in error during the previous Read.
- 5) Dynamic Read/Erase/Write, which again was similar to the Dynamic Read and Read/Write, except that a word in error was first erased and then rewritten. Because the Erase and Write operations use the charge pump, it is expected that the Flash could be more vulnerable to destructive conditions during these operations.
- 6) Latchup testing was conducted at 70° C, and 3.6 V, on parts from all three manufacturers. It was expected that high voltage, dynamic test modes would be most likely to result in latchup, so these were emphasized in the latchup testing, but all test modes were checked at least briefly.
- 7) In this set of experiments, we have included an initial attempt to look at angular effects, which may include multiple bits grazed by the same ion, and other effects due to charge sharing by multiple nodes in the control logic. This test was done only with the Samsung 4G part, using Kr ions at 45 degrees. The original plan was to have the ions incident along four axes, which we refer to as north, east, south and west. But because of limited beam time, only two axes (north and east) were actually done.

The block diagram for control of the DUT is shown in Figure 2. The FPGA-based controller interfaces to the Flash daughter card and to a laptop, allowing control of the FPGA and uploading of new FPGA configurations and instructions for control of the DUT. Power for the Flash is supplied by means of a computer-controlled power supply. The National Instruments LabVIEW interface monitors the power supply for over-current conditions and shuts down power to the DUT if such conditions are detected.



Figure 2. Overall Block Diagram for the testing of the NAND Flash.


Fig. 3. Front and back views of motherboard and daughterboard, with DUT.

# VI. Results

During testing, the DUTs were irradiated with the ions indicated in Table I. The DUT was oriented normal to the incident beam, except as noted. The errors observed in static testing are shown in Fig. 4a for the Micron part, and Fig. 4b for the Hynix part.

Even for the static case, bit errors and Page/Block errors were evident in the patterns of upsets observed. It is likely that the Page/Block errors arise from upsets in configuration registers in the memory array. Because the DUT was not actively exercised during the exposure, we could not determine exactly when a page/block error occurred, so cross sections are approximate for these error modes. Here and in the following discussion, bit errors are taken to be single bits that are flipped, normally from zero to one, as a result of the interaction with incident ions. We do not have the physical-to-logical address mapping, which would allow us to look for multiple bit errors (error clusters) for these parts. However, in the overwhelming majority of cases of bit errors, there is only one error in a page, or one error in an entire block, which makes it extremely unlikely that multiple errors arise from a single ion. This result is consistent with previously published results on the upset mechanism in flash memory: an ion passing through a floating gate creates a dense charge column, which creates a conducting path between the gate and substrate, which allows charge to leak off the floating gate. Since the ion only hits one gate at normal incidence, only one bit is affected. This situation is far different from that in volatile memories, where charge generated in the Si substrate can be shared across multiple nodes. The only apparent multiple bit errors are cases where an entire page or a block (or a large part of one) upsets simultaneously; these page and block errors are attributed to errors in the control logic, rather than to the individual bits. These are counted as SEFIs (Single Event Functional Interrupts). In general, a SEFI is any event where the entire DUT, or a large part of it, stops working, presumably from an interaction with a single ion. As a practical matter, most of the SEFIs recorded here are either page errors or block errors, although some involve multiple pages or multiple blocks. Some are also watchdog errors, where the DUT simply stopped responding to commands.



Fig. 4. Error cross-sections observed in static testing: (a) Micron; (b) Hynix.

The data in Fig. 4 is replotted in Fig. 5, normalized per device, instead of per bit, so that the SEFI effects can be shown on the same scale.
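The two normalizations differ only by the number of bits in the device. As a minimal sketch of the arithmetic, assuming that "4 Gbit" means 4 x 2^30 bits and using invented counts and fluence purely for illustration:

```python
# ASSUMPTION: a "4 Gbit" device holds 4 x 2**30 bits.
BITS_PER_DEVICE = 4 * 2**30


def cross_section_per_device(n_events, fluence):
    """Cross section in cm^2/device: events divided by fluence (particles/cm^2)."""
    return n_events / fluence


def cross_section_per_bit(n_errors, fluence, n_bits=BITS_PER_DEVICE):
    """Cross section in cm^2/bit: the per-device value divided by the bit count."""
    return n_errors / (fluence * n_bits)


if __name__ == "__main__":
    # Invented example: 100 bit errors after a fluence of 1e6 particles/cm^2.
    print(cross_section_per_device(100, 1e6))  # cm^2/device
    print(cross_section_per_bit(100, 1e6))     # cm^2/bit
```

SEFIs affect the device as a whole, so only the per-device form is meaningful for them, which is why the replot is needed to show both error types on one scale.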

For the Dynamic Read condition, the parts exhibited transient read errors in addition to the bit and Page/Block errors, and other SEFIs, which are plotted in Fig. 6. For Ne and Ar ions (LET up to 9.74), no static bit errors were detected after the beam was turned off. There were transient read errors at these LETs, as shown in Fig. 6, which are thought to be due to noise in the read circuit. At higher LETs (Cu, Kr, and Xe ions), SEFIs were observed on all shots for the Micron part, and on many shots for the Hynix parts, which made it difficult to count the transient errors: if the DUT stops responding to commands, it is hard to say what errors were not counted. Moreover, most of the errors that were counted were due to control logic, and not associated with individual bits. After the DUT was restored to operating condition, however, it could still be checked for static bit errors. The transient results are shown in Fig. 6. Static results are not shown in Fig. 6, but they are consistent with results in Figs. 4 and 5. In Fig. 6, we have attempted to count SEFI events, despite the obvious difficulties of doing so. For example, one can count block errors, but it is often unclear whether multiple events are independent or not. We have assumed that block or page errors at widely separated addresses are independent events, and block or page errors at consecutive addresses are one event. Of course, if the DUT stops responding to commands, there may be other events that were missed completely. The number of SEFI events is small in any case. As always, the statistical uncertainty associated with rare events is large.
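The counting rule for SEFI events can be sketched as follows. This is an illustrative reading of the rule, not the analysis code used for Fig. 6: it treats every run of consecutive block (or page) addresses as one event and any gap as a separate event, which is an assumption, since the text does not define how far apart "widely separated" addresses must be. The addresses in the example are invented.

```python
def count_independent_events(addresses):
    """Count runs of consecutive addresses, treating each run as one event.

    ASSUMPTION: any gap between failed addresses marks an independent event.
    """
    events = 0
    previous = None
    for address in sorted(set(addresses)):
        if previous is None or address != previous + 1:
            events += 1  # a gap (or the first address) starts a new event
        previous = address
    return events


if __name__ == "__main__":
    # Invented failed-block addresses: runs 10-12, 500, and 2047-2048.
    print(count_independent_events([10, 11, 12, 500, 2047, 2048]))
```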

Results of the dynamic R/W tests are shown in Fig. 7. Generally these results are unremarkable, because the usual zero-to-one errors are rewritten as they occur. For this reason, there are fewer errors indicated than in Fig. 6. The main reason for including this test was the expectation that the high voltage write operation would contribute to more errors in the control circuits, but this appears not to have happened, at least not on a large scale. Probably, this is because the write operation is performed only when a zero-to-one error is detected. For this circuit, four thousand such errors are still only one part per million of the entire memory, so the write circuit duty cycle is a very small number. Where a static cross section is given, it is based on the number of errors detected after the exposure and resetting of the DUT. The transient cross section is based on errors that were rewritten during the test.
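The duty-cycle estimate is a one-line calculation. The sketch below assumes that "4 Gbit" means 4 x 2^30 bits; with that assumption, four thousand rewritten bits come to a little under one part per million of the array.

```python
# ASSUMPTION: a "4 Gbit" device holds 4 x 2**30 bits.
BITS_PER_DEVICE = 4 * 2**30


def rewritten_fraction(n_errors, n_bits=BITS_PER_DEVICE):
    """Fraction of the memory rewritten when n_errors bits are corrected."""
    return n_errors / n_bits


if __name__ == "__main__":
    print(f"{rewritten_fraction(4000) * 1e6:.2f} ppm")
```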

Results for the dynamic R/E/W tests are shown in Fig. 8, to the extent that they can be determined. For this condition, there were many more SEFIs than in the R/W (without erase) mode, which is probably due to the fact that every block is erased and rewritten on every cycle, so that the duty cycle for high voltage operations is much higher. As a practical matter, there are many page and block errors, which usually appear to be independent, on almost every shot with LET at or above 9.74 (Ar). With many large chunks of the memory completely knocked out, it becomes impossible to determine static or transient errors affecting only single bits.

The Micron and Hynix parts were tested for SEL (single event latchup), along with the Samsung 4G, which had been tested previously for other SEE. Latchup testing was done at 70° C and 3.6 V, which are the maximum rated operating temperature and voltage for all of these parts. Heating was accomplished with a strip heater inserted under the chip package in the test socket. Thermal contact was achieved with conducting grease. No latchup was observed for any part with any incident ion. Power supply current was monitored throughout the test, and some current increase was observed on nearly every shot. Many of these events were bus contention, where the DUT corrected itself during the exposure, without operator intervention. There was no case of a high current condition requiring a power cycle to restore normal operation, which would have been the signature of a true latchup. Representative power histories are shown in Fig. 9.
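The distinction between a self-correcting current increase and a true latchup can be expressed as a simple classification of a recorded current history. The sketch below is hypothetical: the function name, the factor-of-two current limit, and the sample values are all assumptions made for illustration, and it does not reproduce the monitoring software used in the test.

```python
def classify_current_trace(samples_ma, nominal_ma, limit_factor=2.0):
    """Classify a supply-current history (in mA) from one exposure.

    Returns "nominal", "self_recovered", or "latchup_candidate".
    The limit (nominal_ma * limit_factor) is a HYPOTHETICAL threshold.
    """
    limit = nominal_ma * limit_factor
    if not any(sample > limit for sample in samples_ma):
        return "nominal"
    # Current still high at the end of the trace: the part did not recover
    # on its own, so a power cycle would be needed (latchup signature).
    if samples_ma[-1] > limit:
        return "latchup_candidate"
    return "self_recovered"


if __name__ == "__main__":
    # Invented traces, in mA, with an assumed nominal current of 50 mA.
    print(classify_current_trace([50, 52, 180, 60, 51], nominal_ma=50))
    print(classify_current_trace([50, 200, 210, 205], nominal_ma=50))
```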

In the angular experiment, only a few shots were taken, because we were running out of beam time. For this reason, it is not possible to draw any conclusion with a high level of statistical confidence. However, the results suggest that angular effects may be significant. In the static mode, the measured cross-section for 45 degree exposures along the east axis is virtually identical to the cross section for normal incidence. For 45 degree incidence along the north axis, the measured cross-section is about a factor of three lower. There was only one exposure at each test condition, with total fluence in the range of 2000-5000 incident particles/cm<sup>2</sup> on each exposure. There were a total of five exposures at 45 degrees, including dynamic test modes, with no SEFIs observed on any of them. Statistical confidence is limited, but there may be an angular effect on the static bit error rate. There is no indication of an increased SEFI rate at high angles. We note that there was a very good recent paper by Cellere et al. (IEEE TNS, vol. 54, no. 6, pp. 2371-2378, 2007), which does show angular effects in a flash memory array.
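One way to see why a single exposure per condition limits confidence is the Poisson counting uncertainty, which scales as the inverse square root of the number of events. The sketch below is a generic illustration with invented counts; the report does not list the error counts for the angular shots.

```python
import math


def relative_uncertainty(n_events):
    """One-sigma relative uncertainty of a Poisson count: 1 / sqrt(N)."""
    if n_events <= 0:
        raise ValueError("need at least one observed event")
    return 1.0 / math.sqrt(n_events)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    for n in (4, 25, 100):
        print(n, relative_uncertainty(n))
```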

In Table 2, we show Weibull parameters for the static bit error cross-section along with the CREME96 calculated error rate for geosynchronous orbit at solar minimum. Although there is some difference in response between manufacturers, they all have excellent results compared to advanced volatile commercial memories, which rarely achieve error rates better than 1e-9 errors/bit-day. The largest error rate here is for the Hynix part, which corresponds to about three errors per year for a 4 Gbit device. We note that these error rates are obtained without error correction, even though all the manufacturers recommend the use of error correction. With properly implemented error correction, the bit error rate for all three of these parts would be too low to ever measure.

The SEFI (Single Event Functional Interrupt) rate is of greater concern for space applications than the bit error rate, however. Typically, a SEFI occurs when a control circuit malfunctions as a result of a single ion interaction, and the entire memory, or a large part of it, fails. This may mean reloading the entire memory from a backup, or even rebooting the entire system. Although the Micron part had the best bit error response, it was the one most sensitive to SEFIs. For example, there was one point in the test where there were eight consecutive exposures of the Micron part with Ar ions (LET=8.7), which was the lowest LET ion used in the Micron testing. Fluence on these exposures varied from $10^3$ to $10^4$ ions/cm<sup>2</sup>. There was a SEFI on seven of the eight exposures. Fluence was $10^4$ particles/cm<sup>2</sup> on the one shot with no SEFI. Therefore, one can conclude that a SEFI occurs, on the average, every few thousand incident particles at this LET or higher. According to the CREME96 input spectrum, the flux at LET = 9 or greater is $1.9 \times 10^{-5}$ particles/cm<sup>2</sup>-sec. If the interval between SEFIs is about one thousand particles, this is equivalent to a SEFI every one to two years. Probably, the actual interval is a few times greater. This may be a low enough rate to be considered acceptable for a system. But the SEFI rate for the Samsung part is much lower, by at least two orders of magnitude. The Hynix SEFI rate is intermediate.
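The interval estimate follows from dividing the fluence per SEFI by the on-orbit flux. A minimal sketch of that arithmetic, using the flux quoted above and a year of 365.25 days (an assumption about the convention used):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ASSUMED year length


def sefi_interval_years(fluence_per_sefi, flux):
    """Mean time between SEFIs in years.

    fluence_per_sefi: particles/cm^2 needed, on average, to cause one SEFI.
    flux: on-orbit flux above the SEFI threshold, in particles/cm^2/s.
    """
    return fluence_per_sefi / flux / SECONDS_PER_YEAR


if __name__ == "__main__":
    flux = 1.9e-5  # particles/cm^2/s at LET >= 9, as quoted in the text
    print(round(sefi_interval_years(1e3, flux), 2))  # about one thousand particles
    print(round(sefi_interval_years(3e3, flux), 2))  # "a few thousand" (3e3 is illustrative)
```

With one thousand particles per SEFI the result falls between one and two years, and a fluence of a few thousand particles stretches the interval proportionally.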

| Parameter                                | Hynix 4G | Micron 4G | Samsung 4G |
|------------------------------------------|----------|-----------|------------|
| Threshold LET (MeV•cm²/mg)               | 1        | 9         | 3.5        |
| Width (MeV•cm²/mg)                       | 29       | 49        | 30         |
| Exponent                                 | 4        | 5         | 5          |
| Saturation (cm<sup>2</sup>/bit)          | 5e-11    | 5.5e-12   | 4.5e-11    |
| Geosynchronous error rate (errors/bit-day) | 2.3e-12  | 3.6e-17   | 3.7e-13    |

Table 2. Weibull parameters and calculated error rates for geosynchronous orbit at solar minimum
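The parameters in Table 2 can be turned into a cross-section curve and a per-device error count. The sketch below assumes the four-parameter Weibull form commonly used for SEE cross sections, which the report does not write out explicitly, and assumes that a 4 Gbit device holds 4 x 2^30 bits. It does not reproduce the orbital rate calculation itself; it only converts the tabulated rates to errors per device-year.

```python
import math

# Parameters copied from Table 2 (LET in MeV*cm^2/mg, saturation in cm^2/bit).
WEIBULL = {
    "Hynix": {"threshold": 1.0, "width": 29.0, "exponent": 4.0, "saturation": 5e-11},
    "Micron": {"threshold": 9.0, "width": 49.0, "exponent": 5.0, "saturation": 5.5e-12},
    "Samsung": {"threshold": 3.5, "width": 30.0, "exponent": 5.0, "saturation": 4.5e-11},
}
# Geosynchronous error rates from Table 2, in errors/bit-day.
RATE_PER_BIT_DAY = {"Hynix": 2.3e-12, "Micron": 3.6e-17, "Samsung": 3.7e-13}


def weibull_cross_section(let, threshold, width, exponent, saturation):
    """ASSUMED form: sat * (1 - exp(-((LET - L0) / W) ** s)) above threshold."""
    if let <= threshold:
        return 0.0
    return saturation * (1.0 - math.exp(-(((let - threshold) / width) ** exponent)))


def errors_per_year(rate_per_bit_day, n_bits=4 * 2**30, days=365.25):
    """Expected errors per device-year for an uncorrected device."""
    return rate_per_bit_day * n_bits * days


if __name__ == "__main__":
    for name, params in WEIBULL.items():
        sigma = weibull_cross_section(30.1, **params)  # Kr at normal incidence
        yearly = errors_per_year(RATE_PER_BIT_DAY[name])
        print(f"{name}: sigma(30.1) = {sigma:.2e} cm^2/bit, {yearly:.2e} errors/year")
```

For the Hynix part this gives between three and four errors per device-year, in line with the figure of about three quoted in the text.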



Fig. 5. Static upset cross sections: (a) Micron; (b) Hynix



Fig. 6. Dynamic read upset cross section (a) Micron; (b) Hynix.



Fig. 7. Error cross sections observed in dynamic read/write testing (a) Micron; (b) Hynix.



Fig. 8. Error cross sections observed in dynamic Read/Erase/Write testing (a) Micron; (b) Hynix.

# VII. Recommendations

These parts had all been tested previously for total dose (TID) response, with the Samsung part surviving above 100 krad (SiO<sub>2</sub>), Micron surviving above 75 krad (SiO<sub>2</sub>), and Hynix surviving above 30 krad (SiO<sub>2</sub>). These results were all obtained at high dose rate, and the parts would be expected to do even better at the dose rates encountered in space. Obviously, the higher numbers are better, but the parts may all have TID response suitable for NASA applications. The SEU response (single bit upset rate) of all the NAND flash parts is excellent, compared with other commercial memories, especially if error correction is used. Micron had the best response of the parts tested here, for static single bit errors. However, it also had the highest SEFI rate of the parts tested, while Samsung had the lowest SEFI rate, by a wide margin. But even the Micron SEFI rate may be acceptable for some systems. None of the parts suffered from latchup, which is an important advantage. System designers need to consider carefully the system trade-offs involved with any of these parts, but they are all promising enough to consider for NASA applications.

# VIII. Further Test Requirements

This test represents a preliminary characterization of SEE vulnerability of 4G NAND flash products from Samsung, Hynix and Micron. Although the static bit error rate is projected to be very good in space, the SEFI rate is not well quantified. There are many different SEFI modes, and relatively few SEFIs observed, which means the statistics are not very good for any of the modes. Therefore, SEFIs will need to be better understood, and mitigation strategies identified.

In prior TID testing, these devices showed some promise for applications with moderate dose levels. Additional TID testing is recommended to fully characterize TID degradation, especially at low dose rates characteristic of space environments.