SEECA

Single Event Effect Criticality Analysis

Sponsored by NASA Headquarters/Code QW

February 15, 1996

for more information, contact Kenneth A. LaBel

Introduction
1. The SEE Problem
2. Functional Analysis and Criticality
3. Ionizing Radiation Environment Concerns
4. Effects in Electronic Devices and SEE Rates
5. SEU Propagation Analysis: System Level Effects
6. SEE Mitigation: Methods of Reducing SEE Impacts
7. Managing SEEs: System Level Planning
8. SEE Criticality Assessment Case Studies

Section 8


SEE Criticality Assessment Case Studies

Paul Marshall, Consultant

8.1 Design of the AS-1773 Fiber Optic Data Bus for T&C or Payload Applications

This case study illustrates the top-down application of SEECA in system design beginning with the definition of functional requirements and the use of those requirements to establish derived requirements for subsystem performance and hardware specifications. The example shows the expression of the SEE particle environment and its impact on the design from an SEE perspective. It also discusses the trade of various candidate technologies in the design by using SEE criticality, along with other performance gauges, as a guideline. Finally, the verification of the SEE-related performance is discussed along with the test and validation, analysis, and flight demonstration approaches used to prove the design.

Background

Beginning in the late 1980s, NASA Goddard Space Flight Center managers and engineers sought to take advantage of the weight and power savings afforded by fiber optic communications technology for satellite applications. After carefully assessing the feasibility of inserting this technology into NASA programs, they decided to pursue the development of a fiber optic based physical layer for the familiar MIL-STD-1553 bus, which is used extensively for mission-critical telemetry and control (T&C) needs. This would provide immediate savings for planned applications of the MIL-STD-1553 and leverage existing MIL-STD-1553 electronic hardware, while also providing a basis for further development of higher data rate busses using fiber optics.

The effort began around 1989 under the Small Explorer Program, and the bus itself came to be known as the Small Explorer Data System (SEDS). As part of the effort, the standards community was involved, and the SEDS bus gained extended recognition as the MIL-STD-1773 fiber optic data bus. In 1995, the MIL-STD-1773 standard was revised to accommodate data transmission at either 1 Mbps or 20 Mbps, and the new standard, now under the Society of Automotive Engineers, became the AS-1773.

General Requirements Definition Specific to SEEs

The functional requirements of the SAE AS-1773 data bus were largely inherited from the MIL-STD-1553 system. Its intended applications as a high reliability bus for avionics and satellite T&C roles have driven the tradeoff of data throughput capacity in favor of enhanced reliability. For our purposes in this illustration, it is not necessary to describe these features, except to say that the bus protocol described in the SAE AS-1773 data bus standard has extensive error detection capabilities, and when errors are detected in messages, the messages can be automatically retransmitted. Bus availability for data transmission is also addressed in the standard, which requires a dual redundant physical layer with automatic switching to the redundant side on hard error detection.

In short, the bus must be available for the transmission of mission-critical commands and telemetry. It must transmit that information without error and provide a response acknowledging successful transmission. Retries of messages with errors are allowed, but the retries must succeed, and their number must be kept small so as not to affect bus traffic.

The SAE AS-1773 data bus standard is not specific to satellite applications, and there are no explicit requirements for radiation-related issues of any sort. For envisioned NASA applications of the AS-1773, the bus is required to function in compliance with the standard, and it must do so in the presence of ionizing particle environments with the characteristics and effects described in the preceding sections. The bus must also be tolerant to accumulated total ionizing dose, but that is outside the scope of this discussion.

Since the AS-1773 data bus is mission critical, its SEE functional criticality level is error-critical. This means no single particle event can prevent the bus from complying with the AS-1773 standard. Further, message transmissions must be successful (with occasional retransmissions allowed), and the bus must be available when needed. Even though the hardware is dual redundant, the bus is mission-critical, so its availability must not be compromised by a permanently destructive particle event. Therefore, the SEE criticality of components in the bus would be error-critical for destructive SEEs.

A list of functional requirements pertaining to SEEs might include:

[1] The system must be compliant with the SAE AS-1773 data bus standard in the presence of the satellite particle environment.

[2] No single particle effect shall lead to permanent failure of any system component.

This second requirement might be modified in some cases to allow exceedingly improbable occurrences of certain events (e.g. parts which may latch up may be allowed if test and analysis support that decision based on the expected likelihood of the event). For example, if part destruction from latchup were expected to occur about once in 50 missions, then the part could possibly be used on the assumption that failure of one side of a dual redundant system would not result in system failure.
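The arithmetic behind this kind of argument can be sketched as follows. This is a minimal illustration only: it assumes the once-in-50-missions figure applies to each side of the dual redundant bus and that the two sides fail independently. Neither assumption, nor the function name, comes from the AS-1773 analysis itself.

```python
def both_sides_lost_probability(p_side: float) -> float:
    """Per-mission probability that both redundant sides are destroyed.

    Assumes (hypothetically) that each side fails independently with
    the same per-mission probability p_side.
    """
    if not 0.0 <= p_side <= 1.0:
        raise ValueError("p_side must be a probability between 0 and 1")
    return p_side * p_side


# Example figure from the text: destruction expected once in 50 missions.
p_one_side = 1.0 / 50.0
p_both = both_sides_lost_probability(p_one_side)

print(f"one side lost:   {p_one_side:.4f} per mission")
print(f"both sides lost: {p_both:.6f} per mission")
```

Under these assumptions the loss of a single side is tolerated by the redundancy, and only the much less likely loss of both sides removes the bus function.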

Beyond these general performance requirements, there might be specific requirements on system data handling performance. For example, given that single event transients might require data retransmission when certain types of errors occur (system level error tolerance), there may be a requirement on how often this is allowed to happen. AS-1773 messages are transmitted in strings of thirty-two 20 bit words corresponding to 640 bits. When a transient disrupts a single bit, the entire message is retransmitted. Retransmission frequency would then be specified, for example, as only one retransmission in 10^6 messages.
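A message level requirement of this kind can be translated into an equivalent per-bit error budget. The sketch below assumes that bit errors are independent and equally likely for every bit of a 640 bit message; that model and the function name are illustrative assumptions, not part of the AS-1773 standard.

```python
import math

BITS_PER_MESSAGE = 640  # message length quoted in the text


def max_bit_error_probability(retransmit_fraction: float,
                              bits: int = BITS_PER_MESSAGE) -> float:
    """Largest per-bit error probability p that keeps the fraction of
    retransmitted messages at or below retransmit_fraction.

    Solves 1 - (1 - p)**bits = retransmit_fraction for p, assuming
    independent bit errors (an assumption of this sketch).
    """
    if not 0.0 < retransmit_fraction < 1.0:
        raise ValueError("retransmit_fraction must be between 0 and 1")
    # log1p/expm1 keep precision when the probabilities are tiny.
    return -math.expm1(math.log1p(-retransmit_fraction) / bits)


# Example requirement from the text: one retransmission in 10^6 messages.
p_bit = max_bit_error_probability(1e-6)
print(f"allowed bit error probability: {p_bit:.3e}")
```

For small probabilities the result is close to the retransmission fraction divided by the number of bits per message, which is a convenient check on the calculation.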

As was mentioned in the section on criticality, there may be multi-tiered requirements with stipulations, for example, for flare versus nonflare conditions. Also, there may be separate requirements for different functional aspects of the bus. AS-1773 operates to transfer message traffic at two data rates, 1 Mbps and 20 Mbps. Mission critical T&C traffic would likely occur at the 1 Mbps rate, but payload traffic might be handled better at 20 Mbps. If payload information were rated error-vulnerable in criticality, then it would be expected that 20 Mbps traffic could have less demanding functional requirements.
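To see why the two data rates might warrant separate requirements, it helps to convert a per-message retransmission allowance into time. The sketch below uses the 640 bit message length and the example allowance of one retransmission in 10^6 messages, and it assumes a continuously loaded bus with no protocol overhead or gaps between messages. That loading assumption is ours and overstates real traffic.

```python
BITS_PER_MESSAGE = 640
MESSAGES_PER_RETRANSMISSION = 10**6  # example allowance from the text


def seconds_between_retransmissions(data_rate_bps: float) -> float:
    """Average time between allowed retransmissions on a saturated bus.

    Assumes back-to-back messages with no overhead (illustrative only).
    """
    if data_rate_bps <= 0:
        raise ValueError("data_rate_bps must be positive")
    messages_per_second = data_rate_bps / BITS_PER_MESSAGE
    return MESSAGES_PER_RETRANSMISSION / messages_per_second


for rate_bps in (1e6, 20e6):
    interval = seconds_between_retransmissions(rate_bps)
    print(f"{rate_bps / 1e6:g} Mbps: one retransmission every {interval:g} s")
```

The same per-message allowance is consumed twenty times faster at 20 Mbps, which is one reason a per-message figure and a per-unit-time figure are not interchangeable when writing the requirement.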

Environment Description

NASA hardware development efforts often target a broad user base both within and beyond NASA. Like the SEDS MIL-STD-1773 bus, the AS-1773 bus is intended for flight on a number of missions and in a variety of orbits with varying severity of the single event environment. A single bus design, with SEE and total ionizing dose immunity levels for the more demanding missions envisioned, is usually the most appropriate design (as opposed to multiple designs for varying levels of severity, or design for an initial application and retrofit to meet more difficult requirements for later missions).

Therefore, even though the initial application of a subsystem such as the AS-1773 might be for a low earth orbit (LEO), the design might eventually be required to perform in a more severe cosmic ray environment, and AS-1773 development has assumed the cosmic ray environment shown in figure 8.1. This environment was calculated using the CREME models, assuming solar minimum conditions. The orbit is assumed to be at geostationary position, where geomagnetic shielding effects are minimal. These conditions represent a worst case average cosmic ray exposure, but this is not an unrealistically extreme worst case. Conditions at LEO might be reasonably similar, particularly for orbits at high inclination angles.

The requirement specification with respect to proton-induced SEE is a bit more complex, as the proton flux may vary by several orders of magnitude depending on the orbital position and the occurrence of solar particle events. Since the SAE AS-1773 data bus would be expected to perform mission critical functions without interruption, its design has been implemented with the worst case expected proton flux in mind. According to NASA's AP-8 environment models, the proton belt peak integral flux for energies capable of penetrating ~60 mils Al shielding would not be expected to exceed about 5 x 10^4 p/cm^2/s. However, since the bus must also function during short duration solar flare particle events, the design requirement for AS-1773 is somewhat higher at 2 x 10^5 p/cm^2/s. To further specify the proton requirement, the spectral energy composition (along with the assumptions made in establishing it) is also provided.
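The relation between these flux levels and an expected event rate can be sketched with the usual first order estimate, rate = flux x cross section. The two flux values below come from the text. The cross section is a hypothetical placeholder chosen only to make the arithmetic concrete; it is not a measured AS-1773 value, and a real estimate would fold the proton energy spectrum with an energy dependent cross section.

```python
BELT_PEAK_FLUX = 5e4   # p/cm^2/s, AP-8 belt peak quoted in the text
DESIGN_FLUX = 2e5      # p/cm^2/s, AS-1773 design requirement quoted in the text

# Hypothetical device cross section in cm^2 (placeholder, not a measurement).
ASSUMED_CROSS_SECTION_CM2 = 1e-9


def event_rate_per_second(flux: float, cross_section_cm2: float) -> float:
    """First order event rate: flux (p/cm^2/s) times cross section (cm^2)."""
    if flux < 0 or cross_section_cm2 < 0:
        raise ValueError("flux and cross section must be non-negative")
    return flux * cross_section_cm2


flux_ratio = DESIGN_FLUX / BELT_PEAK_FLUX
design_rate = event_rate_per_second(DESIGN_FLUX, ASSUMED_CROSS_SECTION_CM2)

print(f"design flux is {flux_ratio:g} times the belt peak flux")
print(f"event rate at design flux: {design_rate:.1e} per second")
```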

Figure 8.2 shows, for design purposes, the worst case proton flare environment which has been arrived at using the CREME model August 1972 flare as described by King [1]. This particular environment was selected for the AS-1773 development since it is sufficiently large that it would probably not be exceeded during a 10 year mission. Even so, there is some likelihood of a larger flare occurring. Feynman [2] has treated this problem of solar particle event peak fluxes in a probabilistic sense, and this reference would be of interest to anyone tasked with defining a worst case design requirement for solar flare events.

The solar flare conditions depicted in figure 8.2 establish the most demanding proton flux for the AS-1773. This exceeds the proton belt peak levels even for orbits passing through the heart of the belts, and it may exceed low LEO requirements in the SAA by 2-3 orders of magnitude. For a payload data bus or some other noncritical subsystem, this flare condition might be covered with a somewhat relaxed functional requirement, but this is not the case for mission critical subsystems, which must function adequately even in rare stressful events. There is some degree of margin assumed in the choice of the August 1972 flare as a worst case design criterion, since this was an unusually large flare event. A more quantitative treatment of the largest expected fluxes and design margins is given by Feynman [2].

SEE Component Requirements and Design Issues for the AS-1773

The functional requirements for the AS-1773 provide the basis for component SEE requirements and hardware design trades. These issues are guided according to figure 7.1 by recognition of which function is served by the hardware and what its functional criticality rating is. Further, there must be consideration of how a SEE affects the system performance in terms of system availability versus system performance.

Where system availability has an error-critical rating, as with the AS-1773, this dictates a hardware requirement for virtual latchup immunity. In the case of AS-1773, this requirement is met by selection of custom ASICs from foundries with proven capabilities to provide latchup immune microcircuits, and by ASIC procurement with latchup parameters specified. Ultimately, latchup immunity will be confirmed by ion beam testing on flight lot parts, unless the process can be certified to provide immunity to latchup.

Availability might also be compromised by use of ASICs with soft error susceptibilities in microcircuits controlling the subsystem configuration. In particular, the protocol chip and bus transceiver circuits could be adversely affected by upsets in certain locations. Consequently, soft error immune cell libraries and processes are used to control the frequency of soft error occurrence in these circuits. Soft error upset thresholds of > 20 MeV*cm^2/mg assure low upset rates for cosmic rays and immunity to upsets from protons. These levels are met in the case of the AS-1773 by protocol and transceiver chips from United Technologies Microelectronics Center (UTMC) and Honeywell's RICMOS IV line, respectively.

In the case of the transceiver chip, Honeywell's standard cell libraries could not meet some of the functional requirements for circuit performance at clock rates of 200 MHz, and custom design was required. This custom design was accompanied by evaluation of soft error vulnerabilities at the microcircuit level, and SEU hardened registers were applied where necessary. Though less formal in detail, this microcircuit vulnerability analysis and hardening repeats the same theme applied at the system level with SEE failure mode analysis, criticality evaluation, and hardening. Ultimately, the objective is the same in both cases, to attain appropriate levels of SEE immunity without taking unnecessary measures.

The transceiver chip also has other noteworthy features related to SEE. In the qualification of the MIL-STD-1773 hardware for the Small Explorer Data System (SEDS) for the SAMPEX satellite, it was discovered that fiber-based data links can be extremely sensitive to proton strikes in the receiver's photodiode [3]. Further studies showed the severity of the problem could be reduced by changing the optical wavelength of the system from 830 nm light on the SAMPEX generation hardware to 1300 nm on the newer design for AS-1773 [4]. Though this provides substantial improvement by allowing the use of InGaAs photodiodes, analysis indicated that further reductions in the expected SEU rates would be needed. System level trades were performed to determine where mitigation for this specific type of error could best be accomplished, and circuit level hardening against such errors was subsequently included as part of the design requirement for the AS-1773 transceiver chip.

The method for hardening against photodiode proton events involves a certain circuit level technique which is a variation of majority vote logic. The design has been described in reference [5]. Analysis of the temporal characteristics of the proton-induced single event transient revealed that the proton-induced signal was short in duration relative to symbols in the Manchester encoded data in the serial data stream. The transceiver circuit differentiates between the true data and "false" signals from protons by taking advantage of this difference. For each Manchester symbol period, the signal is oversampled at five times the symbol rate, and these results are clocked into a 5-stage serial shift register. Then the 5 outputs (one from each register stage) are majority voted to determine whether at least 3 of the 5 stages held low or high levels. Proton transients would affect only one (or possibly 2) of the 5 results, and would subsequently be rejected in the voted output.
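The voting step described above can be modeled in a few lines of software. This is a behavioral sketch only, not the transceiver circuit from reference [5]: it assumes the oversampled stream is already aligned to symbol boundaries, and the function names are hypothetical.

```python
SAMPLES_PER_SYMBOL = 5  # five samples per Manchester symbol, as in the text


def majority_vote(samples):
    """Return 1 if at least 3 of the 5 samples are high, otherwise 0."""
    if len(samples) != SAMPLES_PER_SYMBOL:
        raise ValueError("expected exactly 5 samples per symbol")
    return 1 if sum(samples) >= 3 else 0


def decode_symbols(oversampled):
    """Vote each aligned group of 5 samples down to one symbol level.

    Symbol alignment is assumed to be known (an assumption of this sketch).
    """
    if len(oversampled) % SAMPLES_PER_SYMBOL != 0:
        raise ValueError("sample count must be a multiple of 5")
    return [
        majority_vote(oversampled[i:i + SAMPLES_PER_SYMBOL])
        for i in range(0, len(oversampled), SAMPLES_PER_SYMBOL)
    ]


# Three symbols (high, low, high), each sampled five times.
clean = [1] * 5 + [0] * 5 + [1] * 5

# A short transient corrupts two adjacent samples of the low symbol.
struck = list(clean)
struck[6] = 1
struck[7] = 1

print(decode_symbols(clean))
print(decode_symbols(struck))
```

Both calls print the same symbol sequence, because a transient spanning one or two samples cannot change a 3-of-5 majority. A disturbance spanning three or more samples of one symbol would defeat the vote in this model and would have to be caught at the system level.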

In this example of proton-induced transients in the optical receiver's photodiode, we see that the ultimate solution of the problem is a combination of several solutions. First, there is the hardware level hardening, which is implemented with the choice of the wavelength and consequently the photodiode material. Next is the circuit level hardening, which in this case follows a novel approach to discriminating the particle effect based on its temporal characteristics. Finally, as a last level of protection, any errors which were not already suppressed before reaching the system level would be recognized as an invalid Manchester symbol, and system level retransmission of the message would assure error-free message traffic.

Not surprisingly, the most efficient means of dealing with errors usually involves dealing with them at or near their point of origin, which in this case involves the receiver photodiode and circuit. The combination of the two approaches offers sufficiently robust tolerance at both 1 and 20 Mbps, though the effectiveness of the approaches differs at the two data rates. Without these two measures, the burden at the third level, system level message retransmission, would be too great during peak proton flux periods (e.g. solar flares).

In this section we have described the process of establishing system level functional requirements for SEE, flowing these into hardware requirements based on SEE criticality, trading component performance versus circuit and system level error mitigation, and arriving at a final design.

SEE Design Verification

In order to gain confidence that the design will meet SEE requirements, it is necessary to engage in a test verification process. This effort takes place on both the component and subsystem levels. Component tests are necessary for two reasons. First, it is often the case that the SEE characteristics of parts desired by the design engineers will not be known, either because of the use of a new vendor or part type or possibly because the vendor has altered something in the design or process. This is particularly true for commercial off-the-shelf (COTS) parts. SEU testing prior to final design is then necessary to accurately assess the expected performance, and the results of testing may indicate whether or not the part may be used in the design.

Once the design is finalized, it is again necessary to test components which have been procured as flight lot parts. This verifies that the actual flight parts perform as expected, and demonstrates their SEE performance at the levels at which they were procured. Of course, not every component in the design will undergo such scrutiny. The SEE failure modes analysis will identify those parts in the design which require this level of test and analysis. In the case of AS-1773, component SEE testing is required for the transceiver and for the protocol ASICs, as well as for the dual port RAMs. The type of testing (e.g. heavy ion SEU, heavy ion latchup, or proton SEU) will also be determined by the failure modes analysis.

The final test phase is carried out in situ on the actual subsystem design. This may be done on an engineering development unit or on a brassboard version built specifically for SEE testing. This level of testing may provide additional information about component response, but its primary function is to evaluate SEE impact at the system level. It validates models for error propagation within the subsystem, and it validates error mitigation schemes. Also, for certain types of errors such as proton strikes on the fiber optic receiver's photodiode, in situ testing is required since component testing outside the system application can be extremely difficult to interpret. In other cases, subsystem level in situ testing may not be necessary, provided component testing and analysis can provide orbital performance estimates with desired accuracy.

AS-1773 system level tests will be carried out to evaluate the heavy ion and proton response in each SEE sensitive component. The testing is usually done as a function of ion energy (or LET), and for a variety of system operating conditions. One main objective of the system tests is to formulate and execute a test plan covering the range of system test vectors and environment variables, in order to refine models for expected flight performance in a specific orbit. Consequently, for each ion energy or LET used in testing, the AS-1773 system will be exercised at both data rates and at a series of incident optical powers. Variation of the optical power will establish performance at beginning of life conditions (with stronger signals) and at end of life with typical or worst case power levels.
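A test plan of this shape is a full cross product of the beam condition, the data rate, and the optical power. The sketch below enumerates such a matrix. The two data rates come from the text; the LET values and optical power levels are hypothetical placeholders, not the values used in AS-1773 testing.

```python
from itertools import product

DATA_RATES_MBPS = (1, 20)  # the two AS-1773 data rates

# Hypothetical placeholders for illustration only.
LET_VALUES = (5.0, 12.0, 28.0, 60.0)        # MeV*cm^2/mg
OPTICAL_POWERS_DBM = (-20.0, -26.0, -32.0)  # strong, typical, weak signal


def build_test_matrix(lets, rates, powers):
    """Return one run description per combination of test conditions."""
    return [
        {"let": let, "rate_mbps": rate, "optical_power_dbm": power}
        for let, rate, power in product(lets, rates, powers)
    ]


test_matrix = build_test_matrix(LET_VALUES, DATA_RATES_MBPS, OPTICAL_POWERS_DBM)

print(f"{len(test_matrix)} runs planned")
print("first run:", test_matrix[0])
```

Listing the runs explicitly makes the beam time cost of each added variable visible, since every new LET or power level multiplies the number of runs.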

The performance of the AS-1773 subsystem is monitored in terms of system retransmissions, and also system availability. Permanent failures (which should not occur) are monitored, as well as switch-offs to the other half of the dual redundant architecture.
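Monitored event counts are commonly reduced to an event cross section, defined as the number of observed events divided by the particle fluence delivered during the run. The sketch below applies that definition to retransmission counts. The run records are invented placeholders that only show the bookkeeping; they are not AS-1773 test results.

```python
def event_cross_section(event_count: int, fluence_per_cm2: float) -> float:
    """Cross section in cm^2: observed events divided by fluence (particles/cm^2)."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    if event_count < 0:
        raise ValueError("event_count must be non-negative")
    return event_count / fluence_per_cm2


# Hypothetical run records (placeholders, not measurements).
runs = [
    {"rate_mbps": 1, "retransmissions": 12, "fluence_per_cm2": 1e10},
    {"rate_mbps": 20, "retransmissions": 90, "fluence_per_cm2": 1e10},
]

cross_sections = [
    event_cross_section(run["retransmissions"], run["fluence_per_cm2"])
    for run in runs
]

for run, sigma in zip(runs, cross_sections):
    print(f"{run['rate_mbps']} Mbps: {sigma:.2e} cm^2 per retransmission event")
```

Cross sections obtained this way for each test condition are the inputs that flight performance models combine with the orbital environment.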

The purpose and goals of the subsystem tests are to verify the absence of permanent failures, parametrically identify system performance to verify design and refine flight performance models, evaluate error tolerance and mitigation schemes, and finally to guard against any surprises which may have been overlooked in the failure modes analysis.

The SEECA process for the AS-1773 described in this section illustrates its use for deriving hardware requirements from functional requirements, and for carrying out the details of an appropriately SEE immune design. The process involves close coordination with radiation environment and radiation effects specialists from the beginning and throughout the design and test phases. We summarize here by reviewing these various roles in the case of AS-1773 which in turn illustrate the process described in the section on Criticality.

SEE functional requirements definition. SEE engineers work with system designers to identify various ways in which SEEs might damage or disrupt system operation and help to identify meaningful ways to specify SEE functional requirements.

SEE environment specification. Radiation environment and effects specialists analyze orbital parameters to generate SEE-relevant charged particle environment descriptions to include in the system specifications.

SEE requirements flowdown in preliminary design. SEE specialists coordinate with system engineers and design engineers to derive SEE hardware requirements from the SEE functional requirements based on criticality of the function. Further consultations follow in the allocation of SEE budget to various segments of the design.

Detailed design. SEE specialists work with system designers to identify appropriate component choices and to perform trades of various candidate error hardening, tolerance, and mitigation schemes.

Test and verification. SEE test engineers perform parts evaluation and screening for candidate components, and after final design, testing is done in situ on the operating subsystem to verify design and derive needed parameters for flight performance prediction.

Flight performance prediction. Based on the test results, SEE specialists predict the performance for specific orbital conditions. Having carried out the above process, the predictions should establish that the performance will meet SEE requirements, and with minimal impact on cost and complexity. In the event that functional requirements are not met, SEECA provides the framework for rectifying the problem.

8.2 Case Study: Retrofit of a DC to DC Power Converter

The discussions in the previous sections and in the AS-1773 case study identify the roles of SEECA in the design and qualification of a system or subsystem, but this is not the only application for SEECA. In many cases satellite missions rely on "heritage" designs which already exist and may already have flight histories. In such cases, the prior experience may not have involved qualification to the radiation and SEE environments necessary for the mission being planned, or as a worst case there may have been no such requirements at all. Also, part lists corresponding to an existing design may include items which are no longer available, and if parts are available, their radiation and SEE characteristics may differ from those qualified for the initial application.

Heritage designs represent a special case for SEECA since it is assumed that the nonrecurring engineering costs have been paid and redesign for SEE or any other reason is a costly proposition. Nevertheless, a SEECA must be executed for the intended application and with the SEE characteristics of available parts in mind. Where requirements cannot be met with existing designs, subsystem engineers must be inventive to find hardening or mitigation approaches which do not involve alteration of the heritage design.

This case study involves the use of a power supply which in turn uses a DC-to-DC converter manufactured by Modular Devices Incorporated (MDI). Not only does the study illustrate the nature of dealing with an existing design, but it also highlights the fact that SEEs are not limited to memories and other digital logic devices.

The subsystem in question is generic in the sense that the portion of interest is a power supply typical of those found throughout many satellite subsystems. The power supply function here involves the conversion from the spacecraft 28 volt supply to regulated ±15 volt supplies. The part in question is the MDI2690R-D15F DC-to-DC converter, which is actually a hybrid composed of many components.

In the course of testing SEE characteristics of power converters, NASA Goddard radiation effects engineers discovered that the MDI hybrid was susceptible to single particle induced resets which dropped the supply output from 15 volts to 0 volts, followed by a spontaneous recovery after about 10 ms. The details of the testing and results have been reported in [6]. By testing with heavy ions incident on various isolated portions of the hybrid part, the problem was traced to an LM139 comparator. The LET threshold of this linear device was sufficiently low to indicate a sensitivity even to protons. The existence of SEEs in linear devices had been reported previously, but the effects are highly application dependent. This example is one of very few cases where subsystems have been shown to be sensitive to "upsets" in linear devices, probably because the spacecraft community has not been fully aware of the potential for these problems and SEE testing of analog parts is not usually done.

These results were sufficiently alarming to engineers who had included the MDI converter in their designs to warrant activity on multiple fronts. In a coordinated effort between NASA Goddard, the Jet Propulsion Laboratory, the Naval Research Laboratory, and MDI engineers, the MDI design was analyzed and a potential solution was suggested which involved the addition of a capacitive filter to suppress the transient resulting from the particle interaction. Subsequent tests with heavy ion beams were conducted, first to verify the initial results and then to validate the efficacy of the proposed solution to "harden" the converter against transients. The details and results of these tests are available on the NASA Goddard Radiation Effects Group home page on the world wide web [7].

In summary, the previous results were reproduced on the unhardened design, and the modified design was shown to have a sufficiently high tolerance that only the most energetic interactions could produce the reset effect. Analyses for the orbit in question indicated that the proposed solution reduced the expected on-orbit rate for the effect by several orders of magnitude, and the problem would occur so infrequently that the MDI converter could be used with acceptable risk. MDI subsequently adopted the minor alteration to the hybrid without impact to the converter's other electrical characteristics, and consequently a costly redesign of all the power supplies using the MDI converter was averted. In the absence of such an elegant solution, it might be necessary for system engineers to abandon the heritage design, or to add external mitigation hardware, or (unless the function is error-critical) to absorb the resulting SEE rates by reallocating more restrictive error budgets to other subsystems.

Though the discovery of the problem and identification of a solution were harrowing, especially for projects that had already purchased flight lot converters, the launch of this potentially catastrophic design flaw was averted. This case study illustrates how a heritage design must be adopted with proper planning using SEECA, how SEE problems may be discovered in unexpected places (e.g. linear parts), and how testing and innovative solutions involving teamwork between suppliers, test engineers, design engineers, and system engineers can turn a serious problem into a successful design with understood and acceptable SEE related risks.

8.3 References

1. We use the CREME model as contained in the software package SPACE RADIATION, Severn Communications Corp., Millersville MD.

2. J. Feynman, T.P. Armstrong, L. Dao-Gibner, and S. Silverman, "New Interplanetary Proton Fluence Model," J. Spacecraft, Vol. 27, No. 4, pp. 403-410, Jul.-Aug. 1990.

3. Kenneth A. LaBel, Paul W. Marshall, Cheryl Dale, Christina Crabtree, E.G. Stassinopoulos, Jay T. Miller, and Michele M. Gates, "SEDS MIL-STD-1773 Fiber Optic Data Bus: Proton Irradiation Test Results and Spaceflight SEU Data," IEEE Trans. Nucl. Sci. NS-40, (6), p. 1638-44 (1993).

4. P.W. Marshall, C.J. Dale, M.A. Carts, and K.A. LaBel, "Particle-Induced Bit Errors in High Performance Data Links for Satellite Data Management," IEEE Trans. Nucl. Sci. NS-41, (6), p. 1958-65 (1994).

5. Don Thelen, Steve Rankin, Paul Marshall, Kenneth A. LaBel, and Michael Krainak, "A Dual Rate MIL-STD-1773 Fiber Optic Transceiver for Satellite Applications," Photonics for Space Environments II, SPIE Proceedings (1994).

6. Kenneth A. LaBel, Richard K. Barry, Karen Castell, Hak S. Kim, and Christina M. Seidleck, "Implications of Single Event Effect Characterization of Hybrid DC-DC Converters and a Solid State Power Controller," IEEE Nuclear and Space Radiation Effects Conference Workshop Proceedings, December 1995.

7. http://flick.gsfc.nasa.gov/radhome.htm
