for more information, contact Kenneth A. LaBel
Kenneth A. LaBel, NASA Goddard Space Flight Center
SEU propagation is the art and science of determining the effect and potential impact that the occurrence of an SEU has on the device where the SEU occurs, its associated circuitry, subsystem, system, and spacecraft. That is to say, how an SEU propagates up the ladder of design integration. For example, an SEU occurs in an A-to-D converter causing a single incorrect data sample to be gathered. This "invalid" data sample may provide an incorrect data point such as a star location or a misleading temperature value.
The concept of propagated SEUs is straightforward to the typical electrical engineer. It is similar to what one might perform in a standard mathematical circuit simulation, that is, how a signal pulse, transient, or state will affect a circuit's performance either instantly or in future clock cycles.
Several groups have published information pertaining either to the simulation of SEU effects and their propagation to the circuit and system level or to SEE ground testing of devices in the actual circuit design used in a spacecraft system [1-12].
Newberry, et al. [1-3] have been leaders in the area of SEU propagation. In particular, they have discussed the effects of radiation-induced input/output (I/O) transients or noise spikes on system performance as well as that of VLSIC transients. In essence, traditional bit flips in memory cells are not the only cause of SEUs at the system level; SEU-induced voltage spikes occurring in logic or I/O devices also affect the system SEU rate and effects. This work was also among the first to discuss transients and circuit-specific levels for defining SEUs (i.e., duration and amplitude constraints). For example, a 0.25 V spike of 5 nanoseconds in duration may or may not be observed by the following circuit elements.
Leavy, et al. [4] have described the propagation of events inside of a bulk CMOS microprocessor with SEU-hardened clocked flip-flops. In this instance, SEU-induced transients on the clock lines were shown to be capable of causing upsets to microprocessor operation. As a side note, Leavy, et al. were able to solve this problem through a circuit redesign for their next foundry run of the microprocessor.
LaBel, et al. [5,6] have described the effects of transients in a fiber optic receiver photodiode as well as how these transients affect the system bit error rate (BER), both from the physical link perspective and through higher layers of network protocol. This will be described below.
SEU-induced transients in analog devices have been reported by several organizations [7,8,9]. All of these references point out two facts. First, transients in devices such as a comparator or op amp may propagate to the digital electronics in the surrounding circuitry. Depending on the specific circuit designs, these transients may only corrupt a single telemetry sample or, in a worst case scenario, cause system dysfunction or failure. The second item was pointed out by Newberry [3] as well: the definition of an analog SEU phenomenon is specific to the interface circuitry surrounding the radiation-sensitive device.
Taking this one step further, Turflinger, et al. [10,11] have extensively delved into separating SEUs for conventional analog-to-digital converters (ADCs) into several categories. The two major categories are noise and offset errors that are analogous to Gaussian and non-Gaussian errors. Neither of these errors is fatal to the device itself, but both are capable of causing erroneous telemetry and misinterpretation by or impairment of the surrounding spacecraft systems.
McCarty, et al. [12] also have explored an ADC. However, this ADC was not a conventional successive-approximation register (SAR) or flash ADC, but a complex hybrid delta-sigma averaging ADC susceptible to both noise and offset errors, as well as control errors. These control errors are capable of affecting device operation and calibration. Furthermore, they hinder system performance in a space environment.
At this point, we have emphasized the effects of transient SEUs on system performance. This emphasis is not intended to slight digital SEU effects such as bit flips. These types of SEUs may propagate, for example, from a control or data register inside of a microprocessor into operational performance of the circuit or system. A worst case example may be the false commanding of critical hardware such as a thruster or pyro.
In some instances, what must be known is not which particular area of a device has seen an SEU, but how well the system mitigation design will work. NASA has been among the first to fly a commercial 32-bit microprocessor in a critical space application [13]. The Small Explorer Data System (SEDS) is a spacecraft Command and Data Handling subsystem for the Solar Anomalous Magnetospheric Particle Explorer (SAMPEX) mission at Goddard Space Flight Center (GSFC). Included as a critical portion of the SEDS is the Recorder Processor Packetizer (RPP): an Intel 80386 microprocessor-based flight computer with 26.5 MBytes of solid state data storage.
One of the design features of the SEDS is its built-in fault tolerance and its ability to recover from observed errors. This is accomplished via SEDS hardware watchdog circuitry (at multiple levels: circuit, board, box, etc.) as well as software health and safety tasks. To verify these fault tolerant capabilities, an SEU test was performed on the RPP in which SEUs were induced on the 80386 microprocessor family [13]. The Brookhaven National Laboratory Tandem Van de Graaff accelerator was utilized for this purpose.
To summarize the SEDS ground SEE test results, several different errors were observed including a halting of the RPP's operation and "processor exceptions". All the SEE events were recoverable using planned mitigation techniques by the SEDS.
It should be noted that the SEDS has been performing flawlessly from the SEE mitigation perspective since its launch in July of 1992.
In many ways, SEU propagation analysis is similar to both traditional circuit simulation and failure modes and effects analysis (FMEA). In both instances, the goal is to determine the end effects that an error or failure has on the performance of a device, circuit, or system. To this end, we shall trace the steps an engineer may utilize in determining SEU propagation effects.
This is the lowest level of propagation analysis included herein. Figure 5.1 illustrates this methodology.
Step 1: Is the device sensitive to SEUs?
This is relatively straightforward,
Step 2: Does the device meet mission requirements?
A device that has a known SEU sensitivity might still meet mission requirements. An example would
be a device having an LETth = 45 MeV·cm²/mg when the mission requires
devices with an LETth > 35 MeV·cm²/mg. The device is not insensitive to
SEUs, but is acceptable for this particular mission.
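The comparison in Step 2 can be sketched in a few lines of Python. This is a minimal illustration only: the function name and its parameters are hypothetical, and it assumes the mission requirement is expressed as a strict lower bound on LETth.

```python
def meets_let_requirement(device_let_th, required_let_th):
    """Return True if the device LET threshold exceeds the mission minimum.

    Both values are in MeV*cm2/mg. The strict inequality mirrors the
    "LETth > 35" form of the requirement in the example above.
    """
    return device_let_th > required_let_th


# Values taken from the example in the text.
print(meets_let_requirement(45.0, 35.0))  # True
```

A device that fails this check is not necessarily unusable; it is simply a candidate for the more detailed analysis in the steps that follow.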
Step 3: Determine SEU sensitive device areas.
In analyzing a device, one must determine where and what types of SEUs may occur. Simple
devices such as a memory may have two device areas for discussion, memory cells and
control logic, while complex devices such as microprocessors may have dozens of individual areas.
As one would expect, the more highly integrated a device is, the more sensitive areas may be
associated with it. For simplicity, we shall limit the types of SEUs discussed to two types: bit
flips (state changes) that typically occur in memory cells or flip-flops, and transients, those SEUs
that occur in combinatorial logic or manifest themselves as a "noise" spike on both
analog and digital IC areas. Table 5.1 illustrates several potential ICs and their associated areas.
This list should not be construed as exhaustive, but simply a sampling of device types.
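For bookkeeping during Step 3, the entries of Table 5.1 can be held in a simple lookup structure. The sketch below transcribes the table; the dictionary layout, the constant name, and the helper function are illustrative assumptions rather than part of the original methodology.

```python
# Sensitive areas and SEU types per device type, transcribed from Table 5.1.
# The layout and names here are hypothetical conveniences.
SENSITIVE_AREAS = {
    "Memories": {
        "Memory cells": ["bit flip"],
        # Bit flips if sequential, transients if combinatorial.
        "Control logic": ["bit flip", "transient"],
    },
    "Combinatorial logic": {"Combinatorial logic": ["transient"]},
    "Sequential logic": {"Sequential logic": ["bit flip"]},
    "FPGAs": {
        "Combinatorial logic": ["transient"],
        "Sequential logic": ["bit flip"],
    },
    "Microprocessors": {
        "Registers, cache, sequential control logic": ["bit flip"],
        "Combinatorial control logic": ["transient"],
    },
    "ADCs, DACs": {
        "Analog portion": ["transient"],
        # Depends on the design of the digital portion.
        "Digital portion": ["bit flip", "transient"],
    },
    "Linear ICs": {"Analog area": ["transient"]},
    "Photodiodes": {"Photodiode": ["transient"]},
}


def seu_types_for(device_type):
    """Return the sorted list of SEU types listed for a device type."""
    types = set()
    for area_types in SENSITIVE_AREAS[device_type].values():
        types.update(area_types)
    return sorted(types)


print(seu_types_for("Photodiodes"))  # ['transient']
print(seu_types_for("Memories"))     # ['bit flip', 'transient']
```

As with the table itself, such a structure is only a starting point; a real device analysis would add the areas specific to the part in question.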
Step 4: Determine operational parameters
How a device is being utilized in its specific application may affect its SEU performance
as well. Parameters such as access rates, operational modes, clock frequency, and power
supply voltage have definite impacts not on the occurrence, but on the observed
effect of an SEU. Several examples may help the reader understand this.
An SRAM used for data storage provides a simple example.
SRAMs, again for convenience, have three operating conditions: Read, Write, and Static
(Data Storage) modes. SEU ground testing may show each mode to have a different SEU
sensitivity, i.e., LETth and cell cross-section. In a typical solid state recorder (SSR) application, an
SRAM is written to once between downlink operations to the ground, read once during
downlink playback, and remains in static mode for the remainder of the time (typically >99%).
Because the memory cells in a device are not all written at the same time (i.e., they are written
one byte at a time), SEUs that have an observed effect are those that occur during a write or read
operation and those that occur after the device is written to and prior to downlink. If an SEU
occurs during the time period between downlink and the writing of a memory cell, the SEU
is overwritten during the write operation. Hence, that particular SEU has no observed
effect. This is sometimes known as a benign SEU. Additionally, actual write and read
accesses take on the order of 10 to 200 ns to occur. Thus, the sensitive time window, i.e.,
the time period when an SEU has an observed effect, is very small for these operations.
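The benign versus observed distinction for a single memory cell can be sketched as follows. This is a simplified model under stated assumptions: times are measured from the end of the previous downlink, the write and read accesses are treated as instantaneous, and the function names are hypothetical.

```python
def classify_sram_seu(seu_time, write_time, read_time):
    """Classify an upset in one memory cell during one record/playback cycle.

    Times are measured from the end of the previous downlink. The cell is
    written at write_time and read back (downlinked) at read_time.
    """
    if not 0 <= write_time <= read_time:
        raise ValueError("expected 0 <= write_time <= read_time")
    if not 0 <= seu_time <= read_time:
        raise ValueError("seu_time must fall within the cycle")
    # An upset before the write is overwritten, so it is never observed.
    return "benign" if seu_time < write_time else "observed"


def observed_fraction(write_time, read_time):
    """Fraction of the cycle in which an upset in this cell is observed,
    assuming upsets are equally likely at any time in the cycle."""
    return (read_time - write_time) / read_time


# Hypothetical cycle: cell written at t = 25, downlinked at t = 100.
print(classify_sram_seu(10, 25, 100))  # benign
print(classify_sram_seu(60, 25, 100))  # observed
print(observed_fraction(25, 100))      # 0.75
```

Cells written early in the recording period are exposed for longer than cells written just before downlink, so the observed fraction differs from cell to cell.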
A second sample scenario might involve a microprocessor. As discussed previously, these
types of devices are very complex and have many different areas where an SEU may occur.
Some areas have obvious effects on the device performance: for example, a program counter
(PC) register. If a bit flip occurs in the PC, the microprocessor program flow is disrupted.
However, there may be other device areas such as a status register or an area of the device
not being utilized where the occurrence of an SEU is benign. If, for example, the microprocessor
has a programmable interval timer (PIT) built-in, one must know if and how it is utilized in this
specific design. If the PIT is not used, the SEU would be benign. If the PIT is utilized, one must
analyze what performance effect (i.e., different time period than expected) this has based on
when the SEU occurs. Additionally, one should know the expected operating modes and area
utilization to determine sensitive time windows and non-benign SEU conditions.
Other parameters may affect the device's SEU performance. These include clock frequency
and power supply voltage. One should always ask the "what if" question: what if an
SEU occurred at location A during time period B? Note that the probability of SEU observance
is linked to the sensitive time window for the event as well as to area SEU sensitivity and the
environment.
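The link between the sensitive time window and the probability of observing an SEU can be illustrated with a first-order estimate. The sketch below is not a rate prediction method: it assumes upsets arrive as a Poisson process at a constant rate (which stands in for the combined area sensitivity and environment), and all names and input values are hypothetical.

```python
import math


def probability_observed(upset_rate_per_day, window_fraction, mission_days):
    """Probability of at least one observed SEU over the mission.

    Assumes, for illustration, that upsets arrive as a Poisson process at a
    constant rate and that only upsets landing inside the sensitive time
    window are observed.
    """
    if not 0.0 <= window_fraction <= 1.0:
        raise ValueError("window_fraction must be between 0 and 1")
    expected_observed = upset_rate_per_day * window_fraction * mission_days
    return 1.0 - math.exp(-expected_observed)


# Hypothetical inputs, not taken from any test data.
always_sensitive = probability_observed(1.0e-3, 1.0, 365.0)
rarely_sensitive = probability_observed(1.0e-3, 0.01, 365.0)
print(always_sensitive > rarely_sensitive)  # True
```

The same raw upset rate yields a much lower probability of an observed effect when the device area is sensitive for only a small fraction of the time.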
Step 5: Determine/simulate device performance
Now that we have determined the sensitive device areas and operational effects on observed
SEUs, the determination of what apparent effect the SEU has on device performance must be
explored. Several outcomes may transpire. These include, but are far from limited to:
If one looks at this as a traditional circuit simulation, digital test vectors with errors (SEUs)
could be used to determine the observed effect. At a lower level, SPICE (analog) simulations
with injected transients could be utilized as well. Sample scenarios would include FPGA
simulations of combinatorial and/or sequential logic or a microprocessor PIT sending out a
pulse at an incorrect time. The output of this analysis is a list of potential SEUs for each device.
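The idea of running a digital simulation with an injected error can be sketched with a toy model of a programmable interval timer. The model below is a hypothetical down-counter, not the timer of any particular microprocessor; the function name, the 8-bit width, and the reload behavior are all assumptions made for illustration.

```python
def simulate_pit(reload, cycles, flip_cycle=None, flip_bit=0, width=8):
    """Return the clock cycles at which a down-counting timer emits a pulse.

    The counter starts at `reload`, decrements once per clock, and emits a
    pulse (then reloads) on the cycle where it is found at zero. If
    `flip_cycle` is given, bit `flip_bit` of the counter is inverted at the
    start of that cycle to model an SEU.
    """
    if not 0 <= flip_bit < width:
        raise ValueError("flip_bit must be within the counter width")
    mask = (1 << width) - 1
    count = reload & mask
    pulses = []
    for cycle in range(cycles):
        if cycle == flip_cycle:
            count ^= 1 << flip_bit  # injected bit flip
        if count == 0:
            pulses.append(cycle)
            count = reload & mask
        else:
            count -= 1
    return pulses


golden = simulate_pit(reload=4, cycles=12)
upset = simulate_pit(reload=4, cycles=12, flip_cycle=1, flip_bit=2)
print(golden)  # [4, 9]
print(upset)   # [8]
```

Comparing the upset run against the error-free ("golden") run shows the propagated effect: the first pulse arrives late and the second is pushed outside the simulated interval. Flipping a different bit or choosing a different cycle can instead produce an early pulse.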
Circuit level analysis follows the same steps (3-5) as the device level but with the key now
being the circuit operation and performance. As with device level analysis, once we know
which devices have SEUs and what those SEUs may look like, we then look at the
operational parameters and their impacts on SEU performance. For example, we know
that a bit flip may occur in an SRAM, but the circuit level effects are dependent on what
the SRAM is being used for in this application. Sample propagated effects might include:
One must again be aware of the potential for benign and non-benign SEU effects. A sample
case is as follows. Assume that a bus driver IC that is being used to drive a microprocessor
address bus has an SEU-induced noise spike. Both the time at which this spike occurs and the
transient's amplitudes (duration and voltage level) determine whether this condition is observed
by the surrounding circuitry as an error or not. Again, the concept of a sensitive time window
applies. If the transient occurs on a quiescent bus (i.e., no transactions taking place), the
SEU is most likely benign. If the transient occurs on an active bus, the SEU may or may not
be benign depending on the exact timing of the transaction and the noise spike, as well
as the spike's amplitudes.
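The bus driver case can be reduced to a simple decision sketch. The model below is deliberately crude and rests on assumptions: the receiving circuitry is taken to respond only to spikes above a minimum voltage and minimum duration, and only if the spike overlaps a window in which the bus is being sampled. The function name and the threshold values are hypothetical; only the 0.25 V, 5 ns spike echoes the example given earlier in this section.

```python
def transient_is_observed(amplitude_v, start_ns, duration_ns,
                          sample_windows, min_amplitude_v, min_duration_ns):
    """Decide whether a noise spike is seen as an error by the receiver.

    `sample_windows` is a list of (start_ns, end_ns) intervals during which
    the bus is actively sampled. An empty list models a quiescent bus.
    """
    # Spikes too small or too short for the receiving circuit are benign.
    if amplitude_v < min_amplitude_v or duration_ns < min_duration_ns:
        return False
    end_ns = start_ns + duration_ns
    # The spike matters only if it overlaps a sensitive time window.
    return any(start_ns < w_end and end_ns > w_start
               for w_start, w_end in sample_windows)


# A 0.25 V, 5 ns spike with hypothetical receiver thresholds.
print(transient_is_observed(0.25, 100.0, 5.0, [], 0.2, 2.0))               # False
print(transient_is_observed(0.25, 100.0, 5.0, [(98.0, 102.0)], 0.2, 2.0))  # True
print(transient_is_observed(0.25, 100.0, 5.0, [(98.0, 102.0)], 1.0, 2.0))  # False
```

In practice the thresholds depend on the logic family and the circuit design, which is why a SPICE-level or test-based characterization is usually needed to set them.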
Once the operational analysis is performed, the engineer is again able to perform a circuit
simulation using digital or analog tools. The output of this analysis is a list of the potential
SEUs in a circuit and their effects on circuit operation. We may view this as a "black
box" wherein the internal circuitry doesn't matter, but what is observed by the outside
world (subsystem, system, etc.) is noted.
We may treat subsystem, system, and spacecraft levels of analysis in a single manner.
Each of these levels handles the previous level as a black box, concerning itself not with intimate
details, but only with the higher level effects. We will discuss the subsystem level herein as
a representative analysis layer.
Once the circuit level analysis is complete, we begin the subsystem level analysis. In
essence, we may treat the subsystem exactly like the circuit level, but look for performance
aspects of the SEU-induced anomaly. An example follows.
A Command and Data Handling (CADH) subsystem may be composed of separate circuits
such as those for data storage, spacecraft command processing, attitude control processing,
instrument interfacing, and spacecraft engineering telemetry gathering. Let's say, for
instance, that an SEU occurs in the spacecraft command processing circuitry. To be more
specific, we know by circuit analysis that this SEU causes the spacecraft command
processing circuitry to have a false output. Again looking at operational parameters and
sensitive time windows and amplitudes, we determine if and how this may affect the
surrounding circuits and whether there is an effect on the subsystem performance and its
output on the whole. For example, we determine if the false output propagates through the
instrument interfacing circuit causing an incorrect output on the instrument command interface.
The system level analysis takes this one step further. By continuing with the CADH example,
we observe that this false output again may or may not propagate to another subsystem.
Depending again on sensitive time windows and amplitudes, an incorrect command may or
may not be issued to the instrument.
The spacecraft level of analysis then would take the output of the system level analysis and
determine, in this case, whether the incorrect command would affect the overall spacecraft
operation. For example, we might observe incorrect instrument data being gathered or a
system safing event occurring.
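The black box treatment of successive levels can be sketched as a chain of transfer functions, each mapping the effect seen at its input to the effect seen at its output. Everything in the sketch is hypothetical: the function names, the effect strings, and the choice of which level masks the error loosely follow the CADH example and are not a model of any real subsystem.

```python
def propagate(initial_effect, levels):
    """Pass an effect up through each level, treating each as a black box.

    `levels` is a list of (name, transfer) pairs. Each transfer function
    returns the effect seen at its output, or None when the effect is
    masked (benign) at that level. Propagation stops at the first None.
    """
    trace = []
    effect = initial_effect
    for name, transfer in levels:
        effect = transfer(effect)
        trace.append((name, effect))
        if effect is None:
            break
    return trace


# Hypothetical transfer functions loosely following the CADH example.
def circuit(effect):
    return "false output" if effect == "bit flip" else None


def subsystem(effect):
    if effect == "false output":
        return "incorrect instrument command interface output"
    return None


def system_outside_window(effect):
    # Models a false output that falls outside the sensitive time window,
    # so no incorrect command is issued to the instrument.
    return None


levels = [
    ("circuit", circuit),
    ("subsystem", subsystem),
    ("system", system_outside_window),
    ("spacecraft", lambda effect: effect),
]

for name, effect in propagate("bit flip", levels):
    print(name, "->", effect)
```

In this run the error propagates through the circuit and subsystem levels and is masked at the system level, so the spacecraft level is never reached. Replacing the system level transfer function with one that passes the error through would carry it on to the spacecraft level.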
To provide a little more detailed understanding, we shall discuss a typical ADC. This
(hypothetical) ADC has both digital and analog sections. Let's assume an SEU occurs in a
calibration RAM area of the device. We shall look at how this SEU could propagate to affect
spacecraft performance.
We have presented some methodology in viewing the propagation of SEUs from the device
level to the spacecraft level of integration. Understanding the effect a single bit flip or transient
has on the spacecraft is a key to reducing risk in spacecraft programs.
1. D.M. Newberry, D.H. Kaye, G.A. Soli, "Single Event Induced Transients in I/O
Devices: A Characterization", IEEE Trans. Nucl. Sci., vol 37, pp 1974-1980, Dec 1990.
2. D.M. Newberry, "Single Event Upset Error Propagation Between Interconnected
VLSI Logic Devices", RADECS 91: IEEE Proceedings from, vol 15, pp 471-474, Sep 1991.
3. D.M. Newberry, "Investigation of Single Event Effects at the System Level",
RADECS 93: IEEE Proceedings from, pp 113-120, Sep 1993.
4. J.F. Leavy, L.F. Hoffman, R.W. Shovan, M.T. Johnson, "Upset Due to a Single
Particle Caused Propagated Transient in a Bulk CMOS Microprocessor", IEEE Trans.
Nucl. Sci., vol 38, pp 1493-1499, Dec 1991.
5. K.A. LaBel, E.G. Stassinopoulos, G.J. Brucker, "Transient SEUs in a Fiber Optic
System for Space Applications", IEEE Trans. Nucl. Sci., vol 38, pp 1546-1550, Dec 1991.
6. K.A. LaBel, P.W. Marshall, C.J. Dale, C.M. Crabtree, E.G. Stassinopoulos, M.M. Gates,
"SEDS MIL-STD-1773 Fiber Optic Data Bus: Proton Irradiation Test Results and Spaceflight
SEU Data", IEEE Trans. Nucl. Sci., vol 40, pp 1638-1644, Dec 1993.
7. R. Koga, S.D. Pinkerton, S.C. Moss, D.C. Mayer, S. LaLumondiere, S.J. Hansel, K.B.
Crawford, W.R. Crain, "Observation of Single Event Upsets in Analog Microcircuits",
IEEE Trans. Nucl. Sci., vol 40, pp 1838-1844, Dec 1993.
8. R. Ecoffet, S. Duzellier, P. Tastet, C. Aicardi, M. Labrunee, "Observation of Heavy Ion
Induced Transients in Linear Circuits", Workshop Record for the 1994 IEEE Radiation Effects
Data Workshop, pp 72-77, 1994.
9. K.A. LaBel, A.K. Moran, D.K. Hawkins, J.A. Cooley, C.M. Seidleck, M.M. Gates, B.S. Smith,
E.G. Stassinopoulos, P.W. Marshall, C.J. Dale, "Single Event Effect Proton and Heavy Ion
Test Results for Candidate Spacecraft Electronics", Workshop Record for the 1994 IEEE
Radiation Effects Data Workshop, pp 64-71, 1994.
10. T.L. Turflinger, M.V. Davey, "Transient Radiation Test Techniques for High-Speed
Analog-to-Digital Converters", IEEE Trans. Nucl. Sci., vol 36, pp 2356-2361, Dec 1989.
11. T.L. Turflinger, M.V. Davey, "Single Event Effects in Analog-to-Digital Converters: Device
Performance and System Impact", IEEE Trans. Nucl. Sci., vol 41, pp 2187-2194, Dec 1994.
12. K.P. McCarty, J.R. Coss, D.K. Nichols, G.M. Swift, K.A. LaBel, "Single Event Effects
Testing of the Crystal CS5327 16-Bit ADC", Workshop Record for the 1994 IEEE Radiation
Effects Data Workshop, pp 86-96, 1994.
13. K.A. LaBel, E.G. Stassinopoulos, G.J. Brucker, C.A. Stauffer, "SEU Tests of a 80386
Based Flight-Computer/Data-Handling System and Discrete PROM and EEPROM Devices, and
SEL Tests of Discrete 80386, 80387, PROM, EEPROM and ASICS", Workshop Record
for the 1992 IEEE Radiation Effects Data Workshop, pp 1-11, 1992.
Introduction
Table 5.1: Sample device types, sensitive areas, and SEU types

Device Type          Sensitive Area                              SEU Types
-------------------  ------------------------------------------  ----------------------------------------------------
Memories             Memory cells                                Bit flips
                     Control logic                               Bit flips if sequential, transients if combinatorial
Combinatorial logic  Combinatorial logic                         Transients
Sequential logic     Sequential logic                            Bit flips
FPGAs                Combinatorial logic                         Transients
                     Sequential logic                            Bit flips
Microprocessors      Registers, cache, sequential control logic  Bit flips
                     Combinatorial control logic                 Transients
ADCs, DACs           Analog portion                              Transients
                     Digital portion                             Bit flips or transients depending on design
Linear ICs           Analog area                                 Transients
Photodiodes          Photodiode                                  Transients
5.3.2 Circuit Level Analysis
5.3.3 Higher Level Analysis
5.4 Example
5.5 Summary
5.6 References