Commercial Microelectronics Technologies for Applications in the Satellite Radiation Environment

Kenneth A. LaBel, Michele M. Gates, Amy K. Moran
NASA/GSFC Code 735.1
Greenbelt, MD 20771
301-286-9936
FAX:301-286-1718

Paul W. Marshall
Consultant

Janet Barth, E.G. Stassinopoulos
NASA/GSFC Code 900
Greenbelt, MD 20771

Christina M. Seidleck
Hughes/STX
NASA/GSFC Code 735.1
Greenbelt, MD 20771

Cheryl J. Dale
Naval Research Laboratory
Washington, DC 20375

Abstract-As spacecraft are required to reduce power, weight, volume, and cost while meeting increasing performance requirements, enabling technologies have come to the forefront. We present data and design strategies for using these enabling technologies in spacecraft.

TABLE OF CONTENTS:

1. INTRODUCTION
2. THE SPACE RADIATION HAZARD AND ITS IMPACT ON SATELLITE DESIGN
3. BASIC RADIATION EFFECTS ON ELECTRONICS
4. SATELLITE SYSTEM LEVEL CONCERNS
5. TID LESSONS: DEVICE SCREENING AND MITIGATION
6. SEE LESSONS
7. ILLUSTRATION OF COMMERCIAL TECHNOLOGIES SUCCESSFULLY UTILIZED IN SPACECRAFT
8. CONCLUSIONS
9. REFERENCES
1. INTRODUCTION

Current trends throughout NASA, military and commercial space sectors favor the insertion of commercial off-the-shelf (COTS) technologies for satellite applications. However, there are unique concerns for assuring reliable performance in the presence of ionizing particle environments, which are a factor in all orbits of interest. This paper details these concerns from two important perspectives: premature device failure from total ionizing dose, and single particle effects, which can cause both permanent failure and soft errors.

Background

Spacecraft programs and spacecraft designers are being pushed to utilize enabling or emerging commercial devices in order to meet high science data performance requirements in increasingly smaller, lower power, and lower cost spacecraft. These technologies include, but are not limited to: GaAs ICs (standard and emerging such as Honeywell's C-HIGFET), low power 3.3V CMOS ICs, integrated optoelectronics at 850, 1300, and 1550 nm wavelengths, submicron CMOS, Field Programmable Gate Arrays (FPGAs), solid state power controllers, and high performance microprocessors.

Why are these technologies enabling? The benefits may include: higher gate densities, increased speed/performance, easier system development path using COTS development and test equipment, and in the case of commercial devices, decreased lead times versus rad hard (RH) devices. IC manufacturers are being driven by a commercial market of which the space community is a very small portion. Because of this (and reduced DoD efforts in this area), these technologies must be evaluated to meet performance requirements of spacecraft, especially the smaller satellite programs. However, the radiation characteristics of these technologies may show a susceptibility to the space radiation environment.

Defining the Problem

This paper is based on the need for designers and spacecraft programs to be aware of the potential difficulties that challenge the design for the space radiation environment. Typically, most NASA engineers, project managers, etc. discuss the radiation hardness of their designs and spacecraft in terms of Total Ionizing Dose (TID) only. TID effects encompass those that appear from long-term absorption of radiation. Single Event Effects (SEE), on the other hand, are any effect caused by the passage of a single ionizing particle through a device. SEE has only recently begun to be noticed, and even then, many designers and projects are still ignorant of SEE or do not understand the seriousness of the potential problem.

Project managers desire a single number they can specify for SEE, such as a Linear Energy Transfer (LET) threshold (LETth), just as they specify a TID requirement such as 10 krads(Si). This is not practical. Different areas of the spacecraft have different criticalities for SEE, e.g., a pyro controller would have a much stricter SEE requirement than would a data storage recorder that would utilize error detection and correction (EDAC) codes to correct Single Event Upsets (SEUs) as they occur. Designers face additional complicated issues.

In brief, if a designer has selected a device with some known SEE potential (based on ground test data), analysis of SEE rates for the designer's particular mission needs to be performed. However, this is not straightforward. The radiation environment must be predicted to some degree of accuracy. Then, test data must be known in detail. Questions must be then asked such as: was the device tested in the same operating mode as it will be utilized in flight, are there secondary effects such as multiple bit upsets or stuck bits, does the clock frequency of the device matter, etc.? A designer's knowledge of circuit operations as well as a radiation effects expertise on testing and environment is required to evaluate this hazard.

This is not to say that an SEE-sensitive device is unusable, but rather that the SEE sensitivity of a device must be understood and evaluated, along with, if required, any means of mitigating the SEEs through alternate devices, circuit design, watchdog timers, EDAC, current limiting, etc. The bottom line is to allow usage of these "soft" devices only if proper design rules are used to mitigate the SEEs.

One additional caveat is that SEE rates are often described by projects as Mean-Time-To-Failure (MTTF). This is improper. If an SEE rate is one per five years, it may happen at any time during that five year period with nearly equal probability.
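The distinction between a rate and a time-to-failure can be illustrated with a minimal sketch. It assumes, for illustration only, that SEE arrivals follow a Poisson process with a constant rate; the function name is hypothetical and not part of any tool described in this paper.

```python
import math

def prob_at_least_one_see(rate_per_year: float, years: float) -> float:
    """Probability of one or more SEEs within `years`.

    Assumes a constant-rate Poisson process (an illustrative assumption).
    """
    return 1.0 - math.exp(-rate_per_year * years)

rate = 1.0 / 5.0  # one event per five years, the example rate in the text
for t in (0.5, 1.0, 5.0):
    print(f"P(>=1 SEE within {t} yr) = {prob_at_least_one_see(rate, t):.2f}")
```

Under this assumption the chance of an event in the first year is about 18 percent, and about 63 percent over the full five years, so the rate should not be read as a guarantee of five years of error-free operation.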

2. THE SPACE RADIATION HAZARD AND ITS IMPACT ON SATELLITE DESIGN

The main sources of energetic particles that contribute to TID and SEE are:
1) protons and electrons trapped in the Van Allen belts,
2) cosmic ray protons and heavy ions, and
3) protons and heavy ions from solar flares.
Heavy ions trapped in the magnetosphere do not make a significant contribution to the TID and do not have sufficient energies to penetrate the satellite and generate the ionization necessary to cause SEEs. To calculate TID, contributions from the trapped protons and electrons, secondary bremsstrahlung photons, and solar flare protons must be considered. (The dose due to galactic cosmic ray ions is negligible in the presence of these other sources.) To calculate the level of SEE hazard, the cosmic ray ions, the trapped protons, and the solar flare protons must be analyzed.

The levels of all of these sources are affected by the activity of the sun. The solar cycle is divided into two activity phases: the solar minimum and the solar maximum. An average cycle lasts about eleven years, with approximately four years of solar minimum and seven years of solar maximum.

Protons and Electrons Trapped in the Van Allen Belts-Newer, high density electronic parts can be much more sensitive to protons than they are to heavy ions. In addition, it is difficult to shield against the high energy protons that cause SEE problems and contribute significantly to TID within the weight budget of a spacecraft. As a result, any successful and cost effective SEE mitigation plan must include a careful definition of the trapped proton environment and its variations.

The trapped electron population occupies regions of space known as the inner zone (extending out to about 2.4 earth radii at the equator) and the outer zone (from about 2.8 to 12 earth radii at the equator). The levels of intensities and the actual physical boundaries are dependent on particle energy, and are affected by secular variation in the magnetic field, magnetic perturbations, local time effects, solar cycle variations, and individual solar events. The outer zone population is higher in intensity by about an order of magnitude than the inner zone and extends to higher energies.

The trapped protons cannot be classified into inner and outer zone regions. For energies greater than 1 MeV, the protons occupy a volume of space that varies inversely and monotonically with the proton's energy. The approximate boundary for trapped protons with energies greater than 10 MeV is 3.8 earth radii at the equator. The trapped proton population is also affected by the secular variations in the magnetic field, magnetic perturbations, solar cycle variations, and individual solar events.

Trapped particle levels are calculated using the NASA AP8 and AE8 models. The models come in solar minimum and solar maximum versions, but the models are otherwise static and do not reflect the significant variations due to storms and the geomagnetic field changes. Consequently, the trapped particle fluxes from the models represent omnidirectional, integral intensities that one would expect to accumulate on an average over a six month period of time.

Cosmic Ray Protons and Heavy Ions-Galactic cosmic ray particles originate outside of the solar system. They include ions of all elements from atomic number 1 through atomic number 92. The flux levels of these particles are low but, because they include highly energetic particles (with energies E from 10s of MeV to 100s of GeV) of heavy elements such as iron, they produce intense ionization as they pass through matter. As with the high energy trapped protons, they are difficult to shield against. Therefore, in spite of their low numbers, they constitute a significant hazard to electronics in terms of SEEs.

As with the trapped proton population, the galactic cosmic ray particle population varies with the solar cycle. It is at its peak level during solar minimum and at its lowest level during solar maximum. The earth's magnetic field provides spacecraft with varying degrees of protection from the cosmic rays, depending primarily on the inclination and secondarily on the altitude of the trajectory. The levels of galactic cosmic ray particles also vary with the ionization state of the particle.

Protons and Heavy Ions from Solar Flares-During the solar minimum phase, no significant solar flare events occur; therefore, only the seven active years of the solar cycle are modeled. Large solar flare events can occur several times during each solar maximum phase. Events last from several hours to a few days, and energies may reach a few hundred MeV. As with the galactic cosmic ray particles, the solar flare particles are attenuated by the earth's magnetosphere.

Several models of the cosmic ray and solar flare particle environments are available. Table 1 summarizes the most commonly used and most recent ones.

Table 1 Summary of Radiation Sources
Radiation Source | Models | Effects of Solar Cycle | Variations | Types of Orbits Affected
Trapped Protons | AP8-MIN; AP8-MAX | Solar Min - Higher; Solar Max - Lower | Geomagnetic Field, Solar Flares, Geomagnetic Storms | LEO, HEO, Transfer Orbits
Trapped Electrons | AE8-MIN; AE8-MAX | Solar Min - Lower; Solar Max - Higher | Geomagnetic Field, Solar Flares, Geomagnetic Storms | LEO, GEO, HEO, Transfer Orbits
Galactic Cosmic Ray Ions | CREME; CHIME; Badhwar & O'Neill | Solar Min - Higher; Solar Max - Lower | Ionization Level, Orbit Attenuation | LEO, GEO, HEO, Interplanetary
Solar Flare Protons | KING; JPL92 | During Solar Max Only | Distance from Sun; Outside 1 AU, Orbit Attenuation; Location of Flare on Sun | LEO (I>45°), GEO, HEO, Interplanetary
Solar Flare Heavy Ions | CREME; JPL92; CHIME | During Solar Max Only | Distance from Sun; Outside 1 AU, Orbit Attenuation; Location of Flare on Sun | LEO, GEO, HEO, Interplanetary

Mission-Dependent Environment-There are extremely large variations in the TID and SEE inducing flux levels that a given spacecraft encounters, depending on its trajectory through the radiation sources.

Low Earth Orbits (LEOs)-Satellites in LEOs pass through the particles trapped in the Van Allen belts several times each day. The level of fluxes seen during these passes varies greatly with orbit inclination and altitude. The location of the peak fluxes depends on the energy of the particle. For protons with E > 10 MeV, the peak is at about 3000 km. For normal geomagnetic and solar activity conditions, the flux levels drop rapidly at altitudes over 3000 km. However, high energy protons have been detected in the regions above 3000 km after large geomagnetic storms and solar flare events.

The amount of protection that the geomagnetic field provides a satellite from the cosmic ray and solar flare particles is also dependent on the inclination and to a smaller degree the altitude of the orbit. As altitude increases, the exposure to cosmic ray and solar flare particles gradually increases. However, the effect that the inclination has on the exposure to these particles is much more important. As the inclination increases, the satellite spends more and more of its time in regions accessible to these particles, until in polar regions, it is beyond the geomagnetic field lines and fully exposed to cosmic ray and solar flare particles for a significant portion of the orbit.

Under normal magnetic conditions, satellites with inclinations below 45° will be completely shielded from solar flare protons. During large solar events, the pressure on the magnetosphere will cause the magnetic field lines to be compressed, resulting in solar flare and cosmic ray particles reaching previously inaccessible altitudes and inclinations. The same can be true for cosmic ray particles during large magnetic storms.

Highly Elliptical Orbits (HEOs)-Highly elliptical orbits are similar to LEO orbits in that they pass through the Van Allen belts each day. However, because of their high altitude, they also have long exposures to the cosmic ray and solar flare environments regardless of their inclination. The levels of trapped proton fluxes that HEOs encounter depend on the perigee position of the orbit including altitude, latitude, and longitude. If this position drifts during the course of the mission, the degree of drift must be taken into account when predicting proton flux levels. HEOs also accumulate high TID levels due to both the trapped proton exposure and the electrons in the outer belts where the spacecraft spends a significant amount of time during each apogee pass.

Geostationary Orbits (GEOs)-At geostationary altitudes, the only trapped protons that are present are below energy levels necessary to initiate the nuclear events in materials surrounding the sensitive region of the device that cause SEEs. However, GEOs are almost fully exposed to the galactic cosmic ray and solar flare particles. Protons below about 40-50 MeV are normally geomagnetically attenuated, but this attenuation breaks down during solar flare events and geomagnetic storms. Field lines that are at about 7 earth radii during normal conditions can be compressed down to about 4 earth radii during these events. As a result, particles that were previously deflected have access to much lower latitudes and altitudes. Also, GEO satellites are continuously exposed to trapped electrons, hence, the TID accumulated in GEO orbits can be severe for locations on the satellite with little shielding.

Planetary and Interplanetary-The evaluation of the radiation environment for these missions can be extremely complex depending on the number of times the trajectory passes through the earth's radiation belts, how close the spacecraft passes to the sun, and how well known the radiation environment of the planet is. Each of these factors must be taken very carefully into account for the exact mission trajectory.

Careful analysis is especially important for missions that fly during solar maximum and that have trajectories that fly close to the sun. Guidelines for scaling the intensities of particles of solar origin for spacecraft at distances other than 1 AU have been determined by a panel of experts[1]. They recommend that the 1 AU value be multiplied by a factor of 1/r² for distances less than 1 AU and by a factor of 1/r³ for distances greater than 1 AU, where r is the distance from the sun in AU.
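The scaling guideline as stated above can be written as a short sketch. The function name is hypothetical, and the exponents are taken directly from the preceding paragraph rather than from an independent source.

```python
def solar_particle_scale_factor(r_au: float) -> float:
    """Multiplier applied to the 1 AU solar particle intensity.

    Uses the exponents quoted in the text: 1/r^2 inside 1 AU and
    1/r^3 outside 1 AU, with r in astronomical units.
    """
    if r_au <= 0:
        raise ValueError("distance from the sun must be positive")
    exponent = 2 if r_au < 1.0 else 3
    return 1.0 / r_au ** exponent

# Example: intensity multipliers at 0.5 AU and 2 AU relative to 1 AU.
print(solar_particle_scale_factor(0.5), solar_particle_scale_factor(2.0))
```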

Experience has shown that the most effective means of reducing uncertainty factors and design margins in particle predictions is to define for the mission:
1. when the mission will fly,
2. where the mission will fly,
3. when the systems will be deployed,
4. what systems must operate during worst case environment conditions,
5. what systems are critical to mission success, and
6. the amount of shielding surrounding the SEE sensitive part(s).

Estimates that include only worst case conditions lead to overdesign and should be used only in the concept design phase of a mission when the actual launch date and mission length have not been defined. After the launch date and duration are defined, it is possible to estimate how long the spacecraft will be in each phase of the solar cycle. These estimates should consider the impact of a launch delay of one year. Mission scenario definition is especially important for solar flare particles, where the number of events is highly dependent on the amount of time that the satellite spends in solar maximum conditions.

3. BASIC RADIATION EFFECTS ON ELECTRONICS

Ionizing radiation effects in space vehicle electronics can be separated into two areas: total ionizing dose (TID) and single event effects (SEE). The two effects are distinct, as are the requirements and mitigation techniques.

TID is the long-term degradation of electronics due to the cumulative energy deposited in a material. Effects include parametric failures, or variations in device parameters such as leakage current, threshold voltage, etc., and functional failures. Significant sources of TID exposure in the space environment include trapped electrons, trapped protons, and solar flare protons.

SEEs occur when a single ion strikes the material and deposits sufficient energy in the device to cause an effect. The many types of SEE may be divided into two main categories: soft errors and hard errors. In general, a soft error occurs when a transient pulse or bitflip in the device causes an error detectable at the device output. Therefore, soft errors are entirely device specific, and are best categorized by their impact on the device. Single Event Upset (SEU) is generally a transient pulse or bitflip. In combinatorial logic or an analog-to-digital converter, a transient or spike on the device output would be a potential SEU; in a memory cell or latch, a bitflip would be an SEU. SEUs occurring in the device's control circuitry may also cause other effects. In general, SEUs are corrected by resetting the device or rewriting the data. During Single Event Functional Interrupt (SEFI), the device halts normal operations, often requiring a power reset to recover. SEFI most likely occurs when an SEU in the device's control circuitry places the device into a test mode, or a halt or undefined state. Again, this depends on the device itself.

Hard errors may be - but are not necessarily - physically destructive to the device, and cause permanent functional effects. Single Hard Error (SHE) causes a permanent change to the operation of the device. A common example would be a stuck bit in a memory device. Like SEUs, this is also device dependent. Single Event Latchup (SEL) is a potentially destructive condition involving parasitic circuit elements. During a traditional or destructive SEL, the device current exceeds the maximum specified for the device. Unless power is removed, the device will eventually be destroyed. A Microlatch is a type of SEL where the device current is elevated, but below the device's specified maximum. Again, a power reset is required to recover normal device operation. Single Event Burnout (SEB) is a highly localized destructive burnout of the drain-source in power MOSFETs. Single Event Gate Rupture (SEGR) is the destructive burnout of a gate insulator in a power MOSFET.

The SEE sensitivity of a device is discussed in terms of LET and cross section (σ). LET is a measure of the energy deposited per unit length as an ionizing particle travels through a material. The common unit is MeV*cm²/mg of material (Si for MOS devices). LET threshold (LETth) is the minimum LET to cause an effect at a particle fluence of 1E6 or 1E7 ions/cm². The cross section σ reflects the device area which is sensitive to ionizing radiation. For a specific LET, cross section is calculated as σ = (number of errors)/(particle fluence). The units for cross section are cm² per device or per bit. Sensitive volume refers to the device volume affected by SEE-inducing radiation. The sensitive volume is, in general, much smaller than the actual device volume.
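As a worked illustration of the cross section definition, the sketch below computes σ per device and per bit from an error count and a fluence. The function names and the test numbers (250 errors, a fluence of 1E7 ions/cm², a 1 Mbit device) are hypothetical and are not measured data.

```python
def cross_section_per_device(errors: int, fluence_per_cm2: float) -> float:
    """Cross section in cm^2 per device: number of errors / particle fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence_per_cm2

def cross_section_per_bit(errors: int, fluence_per_cm2: float, bits: int) -> float:
    """Cross section in cm^2 per bit for a device with `bits` storage bits."""
    if bits <= 0:
        raise ValueError("bit count must be positive")
    return cross_section_per_device(errors, fluence_per_cm2) / bits

# Hypothetical test point at one LET value.
sigma_device = cross_section_per_device(250, 1e7)
sigma_bit = cross_section_per_bit(250, 1e7, 1_048_576)
print(f"{sigma_device:.2e} cm^2/device, {sigma_bit:.2e} cm^2/bit")
```

Repeating this calculation at each tested LET gives the points of a cross section versus LET curve.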

4. SATELLITE SYSTEM LEVEL CONCERNS

Device parametric and permanent functional failure are the principal failure modes associated with the TID environment. Since TID is a cumulative effect, total dose tolerances of devices are MTTF numbers, where the time-to-failure is the amount of mission time until the device has encountered enough dose to cause failure. As discussed earlier, the mission orbit, launch date, and mission length determine the external radiation environment. The device exposure to this hazard is determined by the amount of shielding between the device and the external environment. Requirements and design considerations are therefore based on device location on the spacecraft. Effective mitigation tools include device TID hardness, spot-shielding of devices, box shielding, and placing electronic boxes inside the spacecraft and/or closer together. Redundancy with powered-on devices is not effective as mitigation, since these devices will also degrade.

The system-level impact of SEE depends on the type and location of the effect, as well as on the design. Permanent device failure is, of course, of great concern. The effects of propagation of transient SEEs through a circuit, subsystem, and system are also often of particular importance. For example, a device error or failure may have effects propagating to critical mission elements, such as a command error affecting thruster firing. There are also cases where SEEs may have little or no observable effect on a system level. In fact, in most designs, there are specific areas in which SEUs have less system impact than in others. As stated previously, a data storage recorder utilizing EDAC would fit this category. The more critical an SEE is to operational performance, the more strict the requirements should be. Since SEE presents a functional impact to a device, functional analysis enables evaluation of severity. The design is viewed in terms of function, not by box or physical subsystem. Functions are categorized into defined "criticality classes", or categories of differing severity of SEE occurrence. For example, for a project, there might be three criticality groups for SEU: error-functional, error-vulnerable, and error-critical. Functions in the error-functional group are unaffected by SEUs, whether due to an implemented error-correction scheme or to redundancy. Functions in the error-vulnerable group might be those for which the risk of a low-probability SEU is acceptable. Functions in the error-critical group are functions where SEE is unacceptable.

Both the functional impact of an SEE to the system or spacecraft and the probability of its occurrence provide the foundation for setting a design requirement. Unlike TID tolerances, SEE rates are probabilistic, given as a predicted span of time within which an SEE will randomly occur. Requirements are specified for each functional group by specifying the maximum probability of SEE permitted in each category. Optimizing design for SEE tolerance is a trade study in risk, cost, performance, and design complexity. System-level SEE requirements may be fulfilled through a variety of mitigation techniques, including hardware, software, and device tolerance requirements. The most cost efficient approach may be an appropriate combination of SEE-hard devices and other mitigation. However, the availability, power, volume, and performance of radiation-hardened devices may prohibit their use. Hardware or software design also serves as effective mitigation, but design complexity may present a problem. A combination of the two may be the selected option. It is important to note that, in general, shielding is not an effective mitigation tool for SEE, unless a device is soft to attenuable protons.
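The idea of assigning a maximum permitted SEE rate to each criticality class can be sketched as follows. The three class names come from the example in this section; the numerical limits, the function names, and the choice of "events per day" as the unit are hypothetical placeholders that a project would replace with its own requirements.

```python
from enum import Enum

class Criticality(Enum):
    ERROR_FUNCTIONAL = "error-functional"   # unaffected by SEUs (EDAC, redundancy)
    ERROR_VULNERABLE = "error-vulnerable"   # a low-probability SEU is acceptable
    ERROR_CRITICAL = "error-critical"       # SEE is unacceptable

# Hypothetical maximum SEU rates (events per day) permitted in each class.
MAX_RATE_PER_DAY = {
    Criticality.ERROR_FUNCTIONAL: float("inf"),
    Criticality.ERROR_VULNERABLE: 1e-3,
    Criticality.ERROR_CRITICAL: 0.0,
}

def meets_requirement(criticality: Criticality, predicted_rate_per_day: float) -> bool:
    """True if the predicted rate is within the limit for the function's class."""
    return predicted_rate_per_day <= MAX_RATE_PER_DAY[criticality]

# Example: the same predicted device rate evaluated for two different functions.
print(meets_requirement(Criticality.ERROR_VULNERABLE, 1e-4))
print(meets_requirement(Criticality.ERROR_CRITICAL, 1e-4))
```

The same device can therefore be acceptable in one function and unacceptable in another, which is why a single spacecraft-wide SEE number is not practical.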

5. TID LESSONS: DEVICE SCREENING AND MITIGATION

Setting TID Requirements

The prediction of the mission-specific radiation environment in the initial design phase is one of the most important tasks in the radiation effects analysis. Mission-specific TID in the early design phase is calculated using an ideal geometry, such as a solid aluminum sphere. The ideal geometry approximates the total shielding thickness between the space environment and the point of exposure. This TID prediction is used to define spacecraft-level TID requirements for early design efforts and serves as the starting point for TID-tolerant design.

It has been observed that TID can vary by one, and as much as two, orders of magnitude depending on the location in the spacecraft. Therefore, using ideal geometries to provide spacecraft-level requirements can set TID requirements unnecessarily high for some components. The spacecraft, instrument, electronic boxes, and any other material substance can all contribute to shielding. Representing these structures in a three-dimensional radiation model provides the means of calculating TIDs via 3-D ray trace methods at the component level or electronic box level. For critical missions or missions with high radiation environments, it is recommended to schedule a 3-D ray trace prediction close to the beginning of the preliminary design phase, when the spacecraft geometry is reasonably well defined and the boxes are arranged into the structure. With this method, component level and/or box level TID requirements can be set for the design. TID requirements stemming from this effort will be more accurate, and usually lower, than from an ideal geometry calculation, allowing for a more efficient design. Over-specifying tolerance requirements can be avoided with subsequent savings in costs.
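A ray trace dose estimate of this kind can be sketched in a few lines. This is a simplified illustration, not the tool used by the authors: it assumes that each ray subtends an equal solid angle, that shielding along each ray has been converted to an equivalent aluminum thickness, and that a mission dose-depth curve is available. The curve values and function names below are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence

# Hypothetical dose-depth curve: (aluminum thickness in mm, mission dose in krad(Si)).
DOSE_DEPTH = [(1.0, 100.0), (2.0, 30.0), (5.0, 8.0), (10.0, 3.0)]

def dose_at(thickness_mm: float) -> float:
    """Interpolate the curve (log-linear in dose), clamped at both ends."""
    xs = [t for t, _ in DOSE_DEPTH]
    if thickness_mm <= xs[0]:
        return DOSE_DEPTH[0][1]
    if thickness_mm >= xs[-1]:
        return DOSE_DEPTH[-1][1]
    i = bisect_right(xs, thickness_mm)
    (x0, d0), (x1, d1) = DOSE_DEPTH[i - 1], DOSE_DEPTH[i]
    frac = (thickness_mm - x0) / (x1 - x0)
    return math.exp(math.log(d0) + frac * (math.log(d1) - math.log(d0)))

def ray_trace_dose(ray_thicknesses_mm: Sequence[float]) -> float:
    """Average the dose over rays that each subtend an equal solid angle."""
    if not ray_thicknesses_mm:
        raise ValueError("at least one ray is required")
    return sum(dose_at(t) for t in ray_thicknesses_mm) / len(ray_thicknesses_mm)

# A part with thin shielding in one direction only, versus a uniform thin sphere.
rays = [1.0, 5.0, 10.0, 10.0]
print(ray_trace_dose(rays), dose_at(min(rays)))
```

In this made-up case the ray trace result is well below the dose obtained by assuming the thinnest shielding in every direction, which is the kind of reduction in requirement the paragraph above describes.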

Meeting TID Requirements

TID requirements are met through many avenues. Electronic devices may be procured to a hardness level sufficient to meet the box requirement. Some device packaging techniques are designed to increase radiation tolerance. However, these devices are typically costly and have long lead times for procurement. Shielding is an effective TID mitigation tool but can be costly in terms of the added weight to the spacecraft. At a device level, spot shielding offers the least impact on the weight budget. However, for electronic boxes in which large amounts of circuitry must be protected, box-level shielding may be the only practical method of reducing dose through shielding.

Slight redesign at the spacecraft and/or subsystem level can also reduce TID exposure levels without impacting the weight budget. Electronic boxes placed inside a spacecraft structure receive more radiation shielding from the spacecraft than those on the outside of the structure. In addition, electronic boxes placed closer together provide more shielding to each other than boxes further apart. Internal box structures and components also provide shielding. Placing the softest, or least radiation tolerant, devices in the center of the box, with the more radiation tolerant devices in the outer regions, provides still more potential shielding to the least tolerant devices.

Verification of System Hardness and Parts Testing

Verification is the process in which the design is demonstrated to meet requirements. Dynamic verification refers to verification while the design is changing. The product is continually designed to meet requirements. Early in design, initial candidate electronic device lists are gathered from appropriate design engineers. The group of initial parts lists serves as the initial parts database. For projects utilizing design heritage, these heritage design device lists are usually the starting point. The parts list provides for communication between engineers and radiation experts. In addition, the lists should be separate for each box to facilitate later verification with TID box level requirements if necessary.

The parts list is then "scrubbed" for TID tolerance by appropriate experts. This parts list scrubbing compares TID requirements with known tolerances of the candidate devices. Recommendations for design come out of this review and may be in the form of device acceptance, device rejection, better device alternatives, design mitigation, etc. If shielding is added, its effectiveness can be verified by adding the shielding to the 3-D model and recalculating the TID. These recommendations are device specific line items and are fed back to the designers. This input provides design engineers with radiation information and recommendations for implementing or modifying heritage designs. With this valuable input being considered during early stages of design when device selection and box design first begin, heritage use is maximized and identified radiation issues are addressed early on. At periodic intervals in design, modified parts lists are obtained and reviewed for radiation tolerance.

Devices with unknown radiation tolerance characteristics should be replaced by alternates with known tolerance to the part requirement or else tested to qualify them for radiation. Radiation testing of key devices with unknown tolerance during design reduces the risk of schedule and cost impacts of redesign and/or work-arounds. Although device TID tolerance may vary by a factor of two or more from lot to lot, look ahead testing of devices gives insights into their use. In later development phases, testing of the flight lot parts is critical for commercial grade devices to account for the lot to lot variations that may occur as a result of manufacturers' changes in processing.
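The parts list scrubbing step can be illustrated with a small sketch. The part names, tolerance values, and the design margin of 2 are hypothetical; the margin is included only as one possible way to allow for the lot-to-lot variation mentioned above, and a project would set its own value.

```python
from typing import Dict, Optional

def scrub_parts_list(parts: Dict[str, Optional[float]],
                     requirement_krad: float,
                     margin: float = 2.0) -> Dict[str, str]:
    """Compare each part's known TID tolerance with the box-level requirement.

    `parts` maps a part name to its known tolerance in krad(Si),
    or to None when no radiation data exist for the part.
    """
    recommendations = {}
    for name, tolerance_krad in parts.items():
        if tolerance_krad is None:
            recommendations[name] = "test or replace"       # unknown tolerance
        elif tolerance_krad >= margin * requirement_krad:
            recommendations[name] = "accept"
        else:
            recommendations[name] = "mitigate or reject"    # e.g. add shielding
    return recommendations

# Hypothetical box-level parts list against a 10 krad(Si) requirement.
box_parts = {"SRAM_A": 50.0, "FPGA_B": 12.0, "OPAMP_C": None}
for part, action in scrub_parts_list(box_parts, requirement_krad=10.0).items():
    print(part, "->", action)
```

Keeping one such list per box, as recommended above, makes it straightforward to rerun the comparison when box-level requirements change after a ray trace analysis.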

6. SEE LESSONS

Requirements

Flight hardware, in order to be acceptable from an SEE standpoint, must pass several requirements. First and foremost, no SEE may cause permanent damage to a system or subsystem. SEL-immune components, defined as devices having an LETth > 100 MeV*cm²/mg, should therefore be used. For any device that is not immune to SEL or other potentially destructive conditions, protective circuitry must be added to eliminate the possibility of damage, and verified by analysis or test.

Wherever practical, procure SEU-immune devices. In devices which are not SEU-immune, the improper operation caused by an SEU must be reduced to acceptable levels, and may not cause performance anomalies or outages which require ground intervention to correct. Additionally, analysis for SEU rates and effects must take place based on the experimentally determined LETth and σ of the candidate device; if such device test data do not exist, ground testing is required. Error rate predictions are calculated using the mission-specific cosmic ray induced LET spectrum, trapped proton environment spectrum, and solar flare environment spectrum, as seen in Table 2. Systems engineering analysis of circuit design, operating modes, duty cycle, device criticality, etc. shall be used to determine acceptable error levels for that device. Means of gaining acceptable levels include parts selection, error detection and correction schemes, redundancy and voting methods, error tolerant coding, or the acceptance of errors in non-critical areas.

Table 2 Required SEU Analysis
Device LET Threshold (MeV*cm²/mg) | Environment to be Assessed
LETth < 10 | Cosmic Ray, Trapped Protons, Solar Flare
LETth = 10 - 100 | Cosmic Ray
LETth > 100 | No Analysis Required
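The selection rule in Table 2 can be expressed as a small lookup. The function name is hypothetical; the thresholds are those given in the table, with LETth in MeV*cm²/mg.

```python
from typing import List

def environments_to_assess(let_threshold: float) -> List[str]:
    """Return the environments requiring SEU analysis for a given LETth (Table 2)."""
    if let_threshold < 10:
        return ["cosmic ray", "trapped protons", "solar flare"]
    if let_threshold <= 100:
        return ["cosmic ray"]
    return []  # LETth > 100: no analysis required

# Example: a device with a measured LETth of 5 versus one with LETth of 40.
print(environments_to_assess(5))
print(environments_to_assess(40))
```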

Ground Testing

SEE ground testing should ideally be performed in Phase A or B of projects. In order to calculate an accurate error rate prediction, test data must reflect actual flight applications. Therefore, whenever possible, the device under test (DUT) should operate under conditions (clock speed, voltage level, etc.) similar to its potential flight application. Changes in device operating conditions may have a great impact on SEU rates; for example, during testing of the 80486 microprocessor, SEU totals were significantly different between cache-intensive and non-cache-intensive applications. However, if the specifications for a flight application are unknown, devices may be tested under either typical or worst-case conditions. The data may then be scaled up or down to reflect a specific application.

While testing for heavy ion induced events, the error measurement determines the upset cross-section. The measurement is repeated with various particle types and energies which vary in ionization strength, as measured by the particle's LET. The results of a series of such tests are customarily presented in a plot showing the cross-section versus the particle LET. For a given part type, a family of such curves may be measured to quantify the part's upset sensitivity under various operating conditions including static versus dynamic operation, operating voltage, read versus write mode, etc., as appropriate for the planned application. Proton upset measurements follow a similar treatment, though the cross-section dependence is then on the proton energy instead of the LET.
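
As an illustration of how one point on such a curve is obtained, the cross-section is conventionally the number of observed upsets divided by the particle fluence delivered to the device. The sketch below assumes that definition; the run data are hypothetical numbers chosen for the example, not measurements from this work.

```python
def upset_cross_section(upsets, fluence, bits=1):
    """Upset cross-section in cm^2 (per bit when bits > 1).

    upsets  : number of errors counted during the run
    fluence : particles per cm^2 delivered to the device
    """
    if fluence <= 0 or bits <= 0:
        raise ValueError("fluence and bits must be positive")
    return upsets / (fluence * bits)


# Hypothetical test runs: (LET in MeV*cm2/mg, upsets counted, fluence in 1/cm2)
runs = [(37.0, 420, 1.0e6), (3.4, 12, 1.0e7), (11.5, 150, 2.0e6)]

# One (LET, cross-section) point per run, sorted by LET for plotting.
curve = [(let, upset_cross_section(n, f)) for let, n, f in sorted(runs)]
for let, sigma in curve:
    print(f"LET {let:5.1f}  sigma {sigma:.2e} cm^2")
```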

SEU Rate Predictions and Impact Analysis

The basis for on-orbit SEU rate predictions is heavy ion and proton testing under laboratory conditions, as described above, which determines a device's upset sensitivity.

Once the sensitivity to the relevant range of particles is known, the next step in calculating the expected upset rate on orbit involves determination of the expected particle environment and its dependencies on orbital position, solar cycle, solar weather conditions, and other variables. For a given orbit of interest, environment models are exercised to evaluate the fluxes of protons and heavy ions at the location of the device of interest in the satellite. These calculations account for the satellite shielding effects, and the result is an environment assessment indicating the energy distribution and numbers of protons and heavy ions reaching the device. In the case of cosmic rays, the heavy ion particle environment may be combined with estimates of the geometry of the sensitive node in the microcircuit to evaluate the rate of depositing charge packets exceeding the minimum amount required to alter the state of the circuit. The environment estimates from these models are combined mathematically with the circuit sensitivity measurements described above to calculate expected upset rates from the proton and heavy ion environments respectively, and the two results are combined to arrive at an aggregate upset rate. A similar approach is applied to assess upset rates due to solar flare protons.
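
For protons, one common way to combine the two ingredients is to integrate the product of the measured cross-section and the differential flux over proton energy. The sketch below assumes that simple folding with a trapezoidal rule; the function names and all numbers are hypothetical. It does not model the sensitive-volume geometry used for heavy ions, so the heavy ion rate is simply taken as an input.

```python
def trapezoid(xs, ys):
    """Integrate tabulated y(x) with the trapezoidal rule."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def proton_upset_rate(energies, cross_sections, differential_flux):
    """Upsets per second: integral of sigma(E) * dPhi/dE over energy.

    energies          : MeV, ascending
    cross_sections    : cm^2 at each energy (from ground test)
    differential_flux : protons / (cm^2 s MeV) at each energy (from the model)
    """
    integrand = [s * f for s, f in zip(cross_sections, differential_flux)]
    return trapezoid(energies, integrand)


# Hypothetical tabulated inputs, for illustration only.
energies = [20.0, 50.0, 100.0, 200.0]
sigma = [1.0e-12, 5.0e-12, 8.0e-12, 9.0e-12]
flux = [2.0, 1.0, 0.4, 0.1]

proton_rate = proton_upset_rate(energies, sigma, flux)
heavy_ion_rate = 1.0e-10  # hypothetical result of a separate heavy ion calculation
aggregate_rate = proton_rate + heavy_ion_rate
print(f"aggregate upset rate: {aggregate_rate:.3e} per second")
```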

The impact of a given upset rate is a very application dependent issue. As described above, analysis begins with top down impact assessments in the conceptual design phase of the mission, resulting in criticality levels for the respective hardware subsystems along functional boundaries. This process is known as Single Event Effect Criticality Analysis (SEECA), and its successful implementation begins in mission planning and continues as a key design tool through the subsystem design phase.

SEU Mitigation

Digital and analog devices, like SEEs, may be divided into two overlapping categories: memory or data-related devices such as RAMs or ICs used in communication links or data streams, and control-related devices such as microprocessors, logic ICs, and power controllers.

Mitigation of Memories and Data-Related Devices-There are several options for data-related SEU mitigation. First, parity checking is a "detect only" scheme, which counts the number of logic one states occurring in a data set, producing a single parity bit saying whether an odd or even number of ones were in that structure [2]. This scheme will flag an SEU if an odd number of bits are in error, but not if an even number of bits are in error.
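
The detect-only behavior of parity can be seen in a few lines. This is a minimal sketch using even parity on an integer word; the function names are illustrative.

```python
def parity_bit(word):
    """Even-parity bit: 1 when the word holds an odd number of ones."""
    return bin(word).count("1") % 2


def parity_error(word, stored_parity):
    """True when the recomputed parity disagrees with the stored bit."""
    return parity_bit(word) != stored_parity


original = 0b10110010            # four ones, so the parity bit is 0
stored = parity_bit(original)

single_upset = original ^ 0b00000001   # one bit flipped
double_upset = original ^ 0b00000011   # two bits flipped

print(parity_error(single_upset, stored))  # flagged
print(parity_error(double_upset, stored))  # missed: even number of errors
```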

A second option, Hamming code, is known as single bit correct, double bit detect. The use of EDAC schemes such as this, known as scrubbing, is common among current solid-state recorders flying in space [for example, 3,4]. Hamming code schemes encode an entire block of data with a check code; this method will detect the position of a single error, and the existence of more than one error in a data structure [2]. Because the SEU position is known, it is possible to correct this error. This coding method is recommended for systems with low probabilities of multiple errors in a single data structure (e.g., only a single bit in error in a byte of data).
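
To make the single bit correct, double bit detect behavior concrete, the sketch below implements a small extended Hamming code protecting 4 data bits with 4 check bits. The word size is an assumption chosen for brevity and is not the layout used by any recorder cited here. The syndrome gives the position of a single error, and an overall parity bit distinguishes single from double errors.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    The codeword is a list of bits indexed by position: positions 1, 2
    and 4 hold Hamming check bits, 3, 5, 6 and 7 hold data, and
    position 0 holds an overall parity bit.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double-error'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position        # XOR of the positions holding a one
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1             # single error at this position
        status = "corrected"
    else:
        return None, "double-error"     # detected but not correctable
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
word[6] ^= 1                            # simulate one upset
print(decode(word))
```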

Other block error codes provide more powerful error correcting codes (ECCs). Among these, Reed-Solomon (R-S) coding is becoming widespread in its usage [5]. The R-S code is able to detect and correct multiple and consecutive errors in a data structure. An example [6] is what is known as (255,223), or a 255 byte block having 223 bytes of data with 32 bytes of overhead. This particular R-S scheme is able to correct up to 16 consecutive bytes in error, and is available in a single IC designed by the NASA VLSI Design Center [6]. A modified R-S code for an SSR has also been implemented in software [7].
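
The figures quoted for the (255,223) code follow from the general property that a Reed-Solomon code with n - k check symbols can correct up to (n - k)/2 symbols in error. The short sketch below only reproduces that arithmetic; the function name is illustrative and no encoding or decoding is performed.

```python
def rs_parameters(n, k):
    """Basic properties of an (n, k) Reed-Solomon block code."""
    check_symbols = n - k
    return {
        "check_symbols": check_symbols,
        "correctable_symbols": check_symbols // 2,
        "overhead_fraction": check_symbols / n,
    }


params = rs_parameters(255, 223)
print(params)
```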

Convolutional encoding [8] differs from block coding by interleaving the overhead or check bits into the actual data stream rather than being grouped into words. This provides good immunity for mitigating isolated burst noise, and is particularly useful in communication systems.

Mitigation may also be performed at the system level. Typical error detection schemes as described above may be used, and error correction may be accomplished by rewriting or retransmitting data. A combination of EDAC techniques may be most effective.

The above methods provide ways of reducing the effective bit error rate (BER) of data storage areas such as solid-state recorders and communication paths or data interconnects. Table 3 summarizes sample EDAC methods for memory or data devices and systems.

Table 3 Sample EDAC Methods for Memory or Data Devices and Systems
EDAC Method               EDAC Capability
Parity                    Single bit error detect
Hamming Code              Single bit correct, double bit detect
RS Code                   Correct consecutive and multiple bytes in error
Convolutional encoding    Corrects isolated burst noise in a communication stream
Overlying protocol        Specific to each system implementation

Mitigation of Control-related Devices-The above techniques are useful for data SEUs, and may also be applicable to some types of control SEUs as well. Highly integrated devices such as VLSI circuitry or microprocessors leave the system potentially more vulnerable to hazards such as issuing an incorrect command to a subsystem, or functionally interrupting system operations. Additionally, many newer devices, especially microprocessors, have hidden registers not accessible external to the device, which provide internal device control and may affect device or system operation. Microprocessor software tasks or subroutines dubbed Health and Safety (H&S) may provide some SEE mitigation [9]; H&S tasks may include memory scrubbing with parity or other code methods on external devices, or on registers internal to the microprocessor. They also might use internal hardware timers to set watchdog timers (some type of message is sent indicating health of a device or system) or to pass H&S messages between spacecraft systems.
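
A watchdog of the kind mentioned above can be sketched as a timer that a healthy task must reset periodically; if the reset stops arriving, the supervisor treats the device as upset. The class below is a software illustration only, with time passed in explicitly; the names and the 2 second timeout are assumptions, and a flight implementation would rely on a hardware timer.

```python
class Watchdog:
    """Expires if no health message arrives within timeout_s seconds."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_kick = now

    def kick(self, now):
        """Record a health message received at time `now`."""
        self.last_kick = now

    def expired(self, now):
        """True when the monitored task has been silent for too long."""
        return now - self.last_kick > self.timeout_s


dog = Watchdog(timeout_s=2.0)
dog.kick(now=1.0)                 # H&S message arrives on time
print(dog.expired(now=2.5))       # still healthy
print(dog.expired(now=4.0))       # silent for 3 s: recovery action needed
```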

Redundancy between circuits, boxes, systems, etc. provides a potential means of recovery from an SEE on a system. Autonomous or ground-controlled switching from a prime system to a redundant spare may provide system designers an option, depending on spacecraft power and weight restrictions. Alternatively, lockstep operation uses two identical circuits performing identical operations with synchronized clocking, a technique often used with microprocessors [10]. Errors are detected when the processor outputs do not agree, implying that a potential SEU has occurred. The system then has the option of reinitializing, etc. However, for longer spacecraft mission time frames, lockstep circuits using commercial devices may cause TID-induced problems; clock skew with increasing dose may cause false triggers when the lockstep devices respond to the dose differently. Voting takes lockstep systems one step further: with three identical circuits, the output that at least two agree upon is chosen. Katz et al. [11] provide an excellent example: they have proposed and SEU-tested a triple modular redundancy (TMR) voting scheme for FPGAs. FPGAs provide higher gate counts and device logic densities than older LSI circuits; while this reduces the IC count for spacecraft electrical designs, the TMR scheme essentially costs over two-thirds of the available FPGA gates.
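
The difference between lockstep detection and voting can be shown with bitwise operations on the circuit outputs. This is a generic sketch of a two-way compare and a two-of-three majority vote; it is not the FPGA implementation of [11], and the function names are illustrative.

```python
def lockstep_mismatch(out_a, out_b):
    """Lockstep: two copies can only detect that something went wrong."""
    return out_a != out_b


def tmr_vote(out_a, out_b, out_c):
    """Bitwise majority: each output bit takes the value two copies agree on."""
    return (out_a & out_b) | (out_a & out_c) | (out_b & out_c)


good = 0b1010
upset = good ^ 0b1100             # one copy suffers a two-bit upset

print(lockstep_mismatch(good, upset))      # error detected, not corrected
print(bin(tmr_vote(good, good, upset)))    # voter masks the upset copy
```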

Good engineering practices for spacecraft provide other means of mitigation [12]. Utilizing redundant command structures (two commands trigger an event with different data or addresses), signal power margins, etc. may aid an SEU hardening scheme. These and other good engineering practices usually allow designers to be innovative and discover sufficient methods for SEU mitigation as needed. Unknown device or system SEE characteristics provide the greatest risk to a system and conversely, the greatest challenge to an electrical designer.

Treatment of Destructive Conditions and Mitigation-Destructive conditions may or may not be recoverable depending on the individual device. Hardening from the system level is difficult at best, and in most cases, not particularly effective, due to several concerns. First, non-recoverable destructive events such as single event gate rupture (SEGR) or burnout (SEB) require redundant devices or systems to be in place since the devices fail when this occurs. SEL may or may not have this same effect and is very device specific. Microlatch, in particular, is difficult to detect since the current consumption of this condition may be within that of normal device operation. LaBel [13] has demonstrated the use of multiple watchdog timeout conditions as a potential mitigation scheme. A similar concern exists if current limiting is performed on a card or higher integration level: a single device may see SEL at a high enough current to destroy itself, but not at a sufficient current to trigger the overcurrent protection on the card. Current limiting circuits to cycle power on individual devices are often considered, but failure modes of this protection circuit are sometimes worse than finding a less SEL-sensitive device (e.g., infinite loop of power cycling may occur). Hence, SEL should be treated by the designer on a case-by-case basis considering the device's SEL response, circuit design, and protection methods. A risky method of SEL protection on SEL-vulnerable devices involves reading the device's current periodically, and cycling power if the current exceeds a specified limit. This method can use either telemetry points or device calibration parameters to be successful [14].
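
The periodic current check described above might be sketched as follows. The cap on the number of power cycles is an assumption added here to guard against the endless power-cycling failure mode noted earlier; the function name, thresholds, and sample values are all hypothetical.

```python
def sel_monitor(current_samples_ma, limit_ma, max_cycles):
    """Scan periodic current readings and count power cycles commanded.

    Returns (cycles, state). state is 'ok' if the device is left powered,
    or 'latched-off' if the cycle cap was reached and power was removed.
    """
    cycles = 0
    for sample in current_samples_ma:
        if sample > limit_ma:
            if cycles >= max_cycles:
                return cycles, "latched-off"   # give up; leave for ground action
            cycles += 1                        # command one power cycle
    return cycles, "ok"


print(sel_monitor([50, 60, 400, 55], limit_ma=200, max_cycles=3))
print(sel_monitor([400, 400, 400, 400, 400], limit_ma=200, max_cycles=2))
```

Note that a fixed limit cannot detect a microlatch whose current stays within the normal operating range, which is one reason the text treats this method as risky.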

Sample Methods of Improving Designs for SEE Performance-By changing circuit design or parameters, improved SEU performance may be gained. Marshall [15] and LaBel [16] have demonstrated ways of improving a fiber optic link's BER from SEU by choice of diode material (III-V versus Si) resulting in a significantly smaller device sensitive volume, method of received signal detection (edge versus level sensitive) defining a dynamic sensitive time window, and optical power margin (BER decreases with increased margin). These and similar techniques may apply to other designs as well.

Sample Method of Realistic SEE Risks and Usage-Many factors determine whether a device's SEE risk factor makes it usable in spaceflight or not: mission environment, device test data, modes of operation, etc. For example, the SEDS RPP on board the SAMPEX spacecraft uses EEPROMs for its boot and application software storage; these devices have shown a sensitivity to SEUs while being programmed, but not while being read [13]. Also, stuck bits may occur during programming, albeit at LETs above Ni-58 (low probability). Since launch in July 1992, the application software EEPROMs have successfully been reprogrammed in-flight twice, but with certain constraints: programming must occur during relatively flux-free portions of the orbit, and the boot EEPROM is not programmed during flight. Why was the risk taken? The SEDS verifies programmed data prior to loading the new executable software: if an incorrect byte was programmed into the device, this mitigation scheme would catch it; if a stuck bit is discovered, it is possible to memory map around the failed location. Additionally, the time window during programming when the device is susceptible to error is very small; the device sees few, if any, particles capable of causing an anomaly. However, it should be noted that the risk might be unacceptable if continuous programming of the EEPROM were being performed.

7. ILLUSTRATION OF COMMERCIAL TECHNOLOGIES SUCCESSFULLY UTILIZED IN SPACECRAFT

The Total Ozone Mapping Spectrometer (TOMS) is a joint US/USSR scientific experiment launched aboard a USSR Meteor-3 spacecraft. The TOMS/Meteor-3 is the first NASA mission to place a Solid State Recorder (SSR) into orbit as the main data recording device for the instrument. This SSR is a memory-based array of SRAMs utilized for storing science and engineering telemetry on board the spacecraft. The TOMS/Meteor-3 SSR utilizes an array of Hitachi 256 kbit (32k x 8) SRAMs. This spacecraft flies in an 82°, 1200 km orbit.

The Solar Anomalous Magnetospheric Particle Explorer (SAMPEX) is the first in a series of Small Explorer spacecraft being managed by GSFC. In order to meet mission constraints of power, weight, and volume, newer enabling technologies were utilized in the spacecraft's command and data handling subsystem known as the Small Explorer Data System (SEDS). These technologies include an SSR similar to that of TOMS/Meteor-3 in its use of the same Hitachi 256 kbit SRAMs, but also include the SEDS MIL-STD-1773 Fiber Optic Data Bus (or SEDS 1773 bus), the Intel 80386 microprocessor family, surface mount technology, etc. The SEDS 1773 bus, utilizing commercially available fiber optic transmitters and receivers, is the first known utilization of a fiber optic data bus as an in-line spacecraft subsystem. SAMPEX flies in an 82°, 580 x 640 km orbit.

Both SAMPEX and TOMS utilize Hamming code for their error detection and correction (EDAC) schemes. SAMPEX uses a 32-bit data path with 8 bits of Hamming code (32,8). TOMS, on the other hand, employs a (64,8) scheme. TOMS also employs a built-in test (BIT) EDAC feature. This method essentially performs a read of an unused memory location, compares the read value with a known value, and writes back the correct value if the two values differ.

The SEDS 1773 employs a different style of EDAC: a system level protocol method. Among its error control features, this system utilizes two methods of detection: parity checks and detection of non-valid Manchester encoding of data. As stated above, parity is a "detect only" method of mitigation and does not attempt to correct the error that occurs. The second method checks whether the data, which is Manchester-encoded, is in the proper format. Ground testing has shown that Manchester-encoding errors are the prime mechanism for expected SEUs [17]. This military standard has a system level protocol option of retransmitting or retrying a bus transaction up to three times if these error detection methods are triggered. Thus, the error detection schemes are via normal methods, while the error correction is via retransmission.
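
The detect-then-retry pattern described above can be sketched as follows. In Manchester encoding every bit cell contains a mid-cell transition, so a cell with two equal halves is non-valid. The mapping of (1, 0) to logic one, the use of odd parity with the parity bit last, and the function names are assumptions made for this illustration; the sketch is not the SEDS flight implementation.

```python
def manchester_decode(half_bits):
    """Decode a list of half-bit levels; return None if any cell is non-valid."""
    if len(half_bits) % 2:
        return None
    bits = []
    for i in range(0, len(half_bits), 2):
        cell = (half_bits[i], half_bits[i + 1])
        if cell == (1, 0):
            bits.append(1)
        elif cell == (0, 1):
            bits.append(0)
        else:
            return None          # no mid-cell transition: non-valid Manchester
    return bits


def transfer_with_retry(receive, max_retries=3):
    """Call receive() until a word passes both checks or retries run out.

    Returns (data_bits, retries_used), with data_bits None on failure.
    """
    for attempt in range(1 + max_retries):
        bits = manchester_decode(receive())
        if bits is not None and sum(bits) % 2 == 1:   # odd parity, parity bit last
            return bits[:-1], attempt
    return None, max_retries


good = [1, 0, 0, 1, 1, 0, 1, 0]      # data 1,0,1 followed by parity bit 1
bad = [1, 1, 0, 1, 1, 0, 1, 0]       # first cell hit by an upset
frames = iter([bad, good])
print(transfer_with_retry(lambda: next(frames)))
```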

The Hamming code EDAC schemes of both the SAMPEX and TOMS SSRs have performed successfully since their respective launches. The SAMPEX SSR has performed flawlessly: no discernible engineering noise is evident and all data has been captured without loss. Even with its engineering noise, the TOMS SSR and its EDAC have successfully collected 100% of the spacecraft science and engineering telemetry. All SEUs observed on both spacecraft have been single bit errors; no multiple bit errors have been noted.

The SEDS system counts the number of retransmissions (or retries) that occur following an SEU on the 1773 bus. Ground test data [17,18] have shown that all SEUs observed by the system are in the form of non-valid Manchester errors causing a bus retry. For SAMPEX, a single retry is enabled. This is all that has been necessary. Calculations by LaBel et al. [17] have shown that the probability of a failure of a retried message is extremely small. Indeed, all bus retries have been successful. Thus, the effective bit error rate (BER) is zero.

8. CONCLUSIONS

In this paper, we have presented a summary of the concerns of radiation effects on commercial technologies and how their radiation sensitivities may be dealt with in terms of defining and evaluating the hazard and methods of mitigating the radiation effects when required. Additionally, we presented this overview with the intent to aid the designers and program managers in evaluating the usage of these commercial technologies in the space radiation environment.

9. REFERENCES

1. Shea, M.A. et al., "Toward a Descriptive Model of Solar Particles in the Heliosphere, Working Group Reports, Interplanetary Particle Environment," edited by J. Feynman and S. Gabriel, Jet Propulsion Lab., California Institute of Technology, Pasadena, CA, JPL 88-26, 1988.

2. A.B. Carlson, Communication Systems, New York: McGraw Hill Book Company, 1975.

3. K.A. LaBel, S. Way, E.G. Stassinopoulos, C.M. Crabtree, J. Hengemihle, M.M. Gates, "Solid State Tape Recorders: Spaceflight SEU Data for SAMPEX and TOMS/Meteor-3", Workshop Record for the 1993 IEEE Radiation Effects Data Workshop, pp 77-84, 1993.

4. R. Harboe-Sorensen, E.J. Daly, L. Adams, "Observation and Prediction of SEU in Hitachi SRAMS in Low Altitude Polar Orbits", IEEE Trans. Nucl. Sci., vol 40, pp 1498-1504, Dec 1993.

5. W.K. Miller, Private communication, 1995.

6. R-S Encoder Data Sheet, NASA VLSI Design Center, 1994.

7. C.I. Underwood, R. Ecoffet, S. Duzellier, D. Faguere, "Observations of Single-Event Upset and Multiple-Bit Upset in Non-Hardened High-Density SRAMs in the TOPEX/Poseidon Orbit", Workshop Record for the 1993 IEEE Radiation Effects Data Workshop, pp 85-92, 1993.

8. W.L. Pritchard, H.G. Suyderhoud, R.A. Nelson, Satellite Communication Systems Engineering, New Jersey: Prentice Hall, Inc., 1993.

9. R.J. Whitley, Private communication, 1995.

10. J.L. Kaschmitter, D.L. Shaeffer, N.J. Colella, C.L. McKnett, P.G. Coakley, "Operation of Commercial R3000 Processors in the Low Earth Orbit (LEO) Space Environment", IEEE Trans. Nucl. Sci., vol 38, pp 1415-1428, Dec 1991.

11. R. Katz, R. Barto, P. McKerracher, R. Koga, "SEU hardening of Field Programmable Gate Arrays (FPGAs) for Space Applications and Device Characterization", IEEE Trans. Nucl. Sci., vol 41, pp 2179-2186, Dec 1994.

12. Engineering Directorate Electrical Design Guidelines, NASA/GSFC, 1991.

13. K.A. LaBel, E.G. Stassinopoulos, G.J. Brucker, C.A. Stauffer, "SEU Tests of a 80386 Based Flight-Computer/Data-Handling System and Discrete PROM and EEPROM Devices, and SEL Tests of Discrete 80386, 80387, PROM, EEPROM and ASICS", Workshop Record for the 1992 IEEE Radiation Effects Data Workshop, pp 1-11, 1992.

14. S.K. Miller, Private communication, 1994.

15. P.W. Marshall, C.J. Dale, M.A. Carts, K.A. LaBel, "Particle-Induced Bit Errors in High Performance Fiber Optic Data Links for Satellite Data Management", IEEE Trans. Nucl. Sci., vol 41, pp 1958-1965, Dec 1994.

16. K.A. LaBel, D.K. Hawkins, J.A. Cooley, C.M. Seidleck, P.W. Marshall, C.J. Dale, M.M. Gates, H.S. Kim, E.G. Stassinopoulos, "Single Event Effect Ground Test Results for a Fiber Optic Data Interconnect and Associated Electronics", IEEE Trans. Nucl. Sci., vol 41, pp 1999-2004, Dec 1994.

17. LaBel, K. A., Marshall, P., Dale, C., Crabtree, C.M., Stassinopoulos, E.G., Miller, J.T., Gates, M.M., "SEDS MIL-STD-1773 Fiber Optic Data Bus: Proton Irradiation Test Results and Spaceflight SEU Data", IEEE Transactions on Nuclear Science, Vol. 40, No. 6, December 1993, pp. 1638-1644.

18. Crabtree, C. M., LaBel, K. A., et al., "Preliminary SEU Analysis of the SAMPEX MIL-STD-1773 Spaceflight Data", SPIE Proceedings from OE/Aerospace '93: Photonics for the Space Environment, Vol 1953, Orlando, FL, April 1993.