

# Literature Review for Condition Monitoring For Power Electronics Reliability (Draft)

By

Shaoyong Yang

April 2008

# Contents

| Section |
|---------|
| 1. Introduction |
| 2. Reliability Theory and Requirement for Power Electronics |
| 2.1 Definitions of Reliability |
| 2.2 Basic Concepts of Probability Theory |
| 2.2.1 Definition of Probability & Conditional probability |
| 2.2.2 Definition of a random variable |
| 2.2.3 Functions of one RV, mean and variance of an RV |
| 2.3 Reliability Measures |
| 2.3.1 Relationship between Different Reliability Measures |
| 2.3.2 Normal and lognormal distributions |
| 2.3.3 Weibull distribution |
| 2.4 Reliability Requirements of an Electrical System and a power device |
| 3. General Failure Mechanisms for Electronics |
| 3.1 Failure Types |
| 3.2 Failure Ranking |
| 3.3 Failure mechanisms in capacitors |
| 3.3.1 Reliability for film capacitors |
| 3.3.2 Reliability for ceramic capacitors |
| 3.3.3 Reliability for electrolytic capacitors |
| 3.4 Failures in PCBs |
| 3.4.1 CFF (Conductive Filament Formation) |
| 3.4.2 Delamination, flex crack and PCB fatigue |
| 3.5 Failure Mechanisms in Semiconductor devices |
| 3.5.1 Electrical Stress (In-Circuit) Failures |
| 3.5.2 Gate Oxide Breakdown |
| 3.5.3 Guie Onter Dieukeo with and charge effects |
| 3.5.4 Defects and nining |
| 3.5.5 Migrations |
| 3.5.6 Microcracks and step coverage |
| 3.5.7 Radiation |
| 3.5.8 Package and assembly |
| 3.6 Summary |
| 4. Failure Mechanisms for power devices |
| 4.1 Power diodes |
| 4.2 Bipolar junction transistors (BJTs) |
| 4.3 Power MOSFETs |
| 4.4 IGBTs |
| 4.5 10D 19 |
| 4.6.1 Press pack and power modules |
| 4.6.2 Bond wire lift-off |
| 4.6.3 Solder fatigue and cracking |
| 4.6.4 Temperature measurement issues |
| 4.7 Summary |
| 5. Condition Monitoring for Power Electronics Reliability |
| 5.1 Introduction |
| 5.2 Review of research projects |
| 5.2.1 LESIT |
| 5.2.2 RAPSDRA |
| 5.2.3 Projects in CALCE |
| 5.2.4 Condition Monitoring Projects for Capacitors |
| 5.3 Condition monitoring tools |
| 5.3.1 Reliability Evaluation Tools |
| 5.3.2 Cycling tests and accelerated stress test |
| 5.3.3 Simulation tools |
| 5.4 Summary |
| 6. COMPERE project |
| 6.1 Novelties |
| 6.2 Methodologies |
| 7. Summary |
| References |
| Web links |

| Figure |
|--------|
| Figure 1: Variation of the failure parameters during life |
| Figure 2: Various failure functions for a lognormal failure distribution |
| Figure 3: The relationship between the MRL and R(t) |
| Figure 4: Weibull analysis for $\Delta T_j = 105$ K |
| Figure 5: MOSFET failure rates versus voltage |
| Figure 6: Failure process |
| Figure 7: Failure types |
| Figure 8: Failure rate against time |
| Figure 9: Statistics of electronics failures |
| Figure 10: An electrolytic capacitor structure and equivalent circuit |
| Figure 11: Diagram of the formation of CFF |
| Figure 12: Examples of structures with potential for microcracks |
| Figure 13: Cross section of a moulded plastic package |
| Figure 14: Safe Operating Area of a power MOSFET |
| Figure 15: Plastic IGBT module |
| Figure 16: Press-pack IGBT |
| Figure 17: An example of bond wire lift-off |
| Figure 18: Block diagram of the supervision of the chip temperature and power losses |
| Figure 19: LESIT results for different $\Delta T_j$ |
| Figure 20: CALCE research projects |

## 1. Introduction

Power electronics systems are key elements in many safety-critical, high-reliability electrical systems working in uncertain and harsh environments. Examples include aerospace power supplies and servo motor drives, marine propulsion and traction drives, and offshore renewable energy systems. Converter failures reduce the availability of the overall system and can be costly to repair [COMPERE]. The move towards more-electric aircraft [www1] and vehicles, and the emergence of renewable energy as a promising future generation option, place more emphasis on the reliability of power electronic systems. For example, research on wind energy in Denmark and Germany revealed that the fluctuating mode of operation and the inclusion of power electronic converters in large turbines are the primary causes of poor reliability [Tavner06]. The other target applications face similar problems because they are also intermittent in nature.

The current practice for ensuring power electronic converter reliability is to design the thermal management of semiconductor devices during the development stage. This is reflected in device data sheets as thermal resistance, thermal capacitance and operating limits [Meysenc05]. High reliability of a power electronic system can be achieved by using such data sheets, by de-rating the converter's active devices and by providing redundant components.

However, redundant design is less attractive in today's environment, where space, efficiency and cost are all under pressure. Redundancy can increase the mean time between failures (MTBF) of converters, but it will not prevent a catastrophic failure when an initial device failure triggers cascading insulation and mechanical breakdowns due to electrical transients or pulsating torque or force [Kastha94].

Condition monitoring offers the potential to prevent catastrophic failures, but the concept has seldom been applied to converters. It is often thought that the process leading to failure of an active power device is too short to allow on-line monitoring [COMPERE]. Little is known about the behaviour of the device and the converter system when the device condition starts to deteriorate. There is therefore an urgent need to understand the failure mechanisms and the effect of the fluctuating mode of operation on power electronics system design.

The New & Renewable Energy Group (NAREG) at Durham University and the Power Electronics Applications and Technology in Energy Research (PEATER) group at Warwick University are collaborating on the EPSRC-sponsored project Condition Monitoring for Power Electronics Reliability (COMPERE). Both universities have been investigating the reliability of wind turbines and power devices for some years and have produced a number of publications.

This project aims to prove the feasibility of condition monitoring of power devices in converters and to further understand the failure and deterioration mechanisms, supported by simulation and experimental work. A demonstration system for condition monitoring of power electronics reliability will be built as part of the project.

Basic reliability theory and the reliability requirements for power electronics are reviewed first. This is followed by a review of power electronics failure mechanisms and of the research status of condition monitoring for power electronics reliability. The research novelties and methodologies of the COMPERE project are summarised at the end.

## 2. Reliability Theory and Requirement for Power Electronics

## 2.1 Definitions of Reliability

The terms *quality* and *reliability* are normally defined as follows:

*Quality* is the degree to which products or services satisfy or even exceed requirements and expectations [Gerling07a]. It is inversely related to the proportion of defective or out-of-specification parts at the time of shipment [Grant89].

*Reliability* measures the capability of a device to perform as specified over an extended operational time [Grant89]. *Reliability* covers the following aspects [Gerling07a]:

- (1) Quality during use.
- (2) Intrinsic characteristics of an object:
  - a. To perform as expected by users.
  - b. For the period of time intended by the designer.
- (3) Fitness for use.
- (4) Stability of characteristics at delivery and during subsequent use.

Quality is frequently understood as quality at delivery, excluding reliability.

According to IEC 50, *reliability* is the probability that an item can perform a required function under given conditions for a given time interval [Gerling07a]. Because it is defined as a probability, the reliability of a product must be quantified from statistical data.

## 2.2 Basic Concepts of Probability Theory

A physical system fails when its performance is unable to meet some specification. The time to failure is usually unpredictable even if the operating conditions of the system are deterministic and controlled. It is therefore convenient to treat the time to failure as random, and probability theory is needed to study reliability.

## 2.2.1 Definition of Probability & Conditional probability

*Probability* is a real-valued *set function* that assigns non-negative numbers to subsets of the sample space  $\Omega$ . If  $\Omega$  contains a finite number of elements, then every subset of  $\Omega$  can be assigned a probability value and is said to be an *event*.

However, if $\Omega$ contains an uncountable infinity of elements, it may in certain cases be impossible to assign probabilities to every subset of $\Omega$. In this case, a Borel field *F* is introduced, which contains a class of subsets of $\Omega$ with special properties. The elements of *F* may be assigned probabilities and are called events. The triplet ($\Omega$, *F*, *P*) is called a *probability space* [Amerasekera97].

The whole of probability theory follows from three axioms:

**Axiom 1.** For any event $A \in F$,

$$P(A) \ge 0 \tag{1}$$

**Axiom 2.**

$$P(\Omega) = 1 \tag{2}$$

**Axiom 3.** For any infinite sequence of disjoint events $A_1, A_2, \ldots$,

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i) \tag{3}$$

This axiomatic development of probability theory is relatively recent; it was proposed by Kolmogorov in 1933.

The conditional probability of an event A given an event B, denoted P(A|B), is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{4}$$

when  $P(B) \neq 0$ , otherwise it is undefined.

The conditional probability P(A|B) satisfies all three axioms of probability theory.
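To make definition (4) concrete, the following Python sketch computes conditional probabilities on a finite sample space, assuming (purely for illustration) that all outcomes are equally likely. The die example is hypothetical; exact fractions are used so that the result is not blurred by rounding.

```python
from fractions import Fraction


def prob(event, omega):
    """P(event) on a finite sample space whose outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b, omega):
    """P(A | B) = P(A ∩ B) / P(B); returns None when P(B) = 0 (undefined)."""
    p_b = prob(b, omega)
    if p_b == 0:
        return None
    return prob(a & b, omega) / p_b


omega = set(range(1, 7))   # hypothetical example: one roll of a fair die
a = {2, 4, 6}              # A: the outcome is even
b = {4, 5, 6}              # B: the outcome is at least 4
print(cond_prob(a, b, omega))   # 2/3
```

Conditioning on B simply restricts the sample space to B and renormalises, which is why P(·|B) again satisfies the three axioms.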

## 2.2.2 Definition of a random variable

A random variable (RV) x is a real-valued function defined on the sample space $\Omega$ such that [Amerasekera97]

$$\forall a \in \mathbb{R}, \quad \{ \omega \in \Omega : -\infty < x(\omega) \le a \} \in F \tag{5}$$

Thus every set of outcomes that x maps into an interval $(-\infty, a]$ must be a well-defined event in *F*, with an assigned probability value. The probability of the event $\{x \le a\}$ is

$$P\{x \le a\} = P(\{\omega \in \Omega : -\infty < x(\omega) \le a\}) \tag{6}$$

For a given x, this probability value is a function of  $a \in R$ . It is denoted by

$$F_X(a) = P(x \le a) \tag{7}$$

This function is called the *cumulative distribution function (cdf)*; it is widely used to study the aging of power electronic systems.

The cdf need not be a continuous function, but it is always continuous from the right; that is, $F(a) = F(a^+)$ for every $a$.

For a continuous RV x with cdf F(x), the probability density function (pdf) is the derivative of the cdf:

$$f(x) = \frac{dF(x)}{dx} \tag{8}$$

If F(x) is not differentiable at $x_0$ (for example, where its slope changes abruptly), then f(x) is discontinuous at $x_0$: f has a jump there.

For a discrete RV that takes the values $x_i$ with probabilities $p_i$, the pdf is

$$f(x) = \begin{cases} p_i, & x = x_i; \\ 0, & \text{otherwise} \end{cases} \tag{9}$$
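The sketch below, a hypothetical illustration in Python, builds the cdf of a discrete RV from its probabilities $p_i$ and checks the right-continuity property: evaluating just above a jump point returns $F(a)$, while evaluating just below it misses the mass $p_i$ at that point. The three-point distribution is made up for the example.

```python
from fractions import Fraction


def make_cdf(pmf):
    """Return F(a) = P(x <= a) for a discrete RV given as {x_i: p_i}."""
    def cdf(a):
        # Add up the mass of every value x_i <= a; the non-strict inequality
        # is what makes F continuous from the right.
        return sum((p for x, p in pmf.items() if x <= a), Fraction(0))
    return cdf


pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}   # hypothetical RV
F = make_cdf(pmf)
eps = Fraction(1, 10**6)

print(F(1), F(1 + eps), F(1 - eps))   # 3/4 3/4 1/4
```

The cdf is a staircase: it is flat between the $x_i$ and jumps by exactly $p_i$ at each $x_i$, which is why a discrete RV has no ordinary derivative-based pdf at those points.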

## 2.2.3 Functions of one RV, mean and variance of an RV

The expected value, or mean, of an RV x is denoted E[x] and is defined as follows.

If x is continuous

$$E[x] = \int_{-\infty}^{+\infty} x f(x)\,dx \tag{10}$$

provided the integral exists.

If x is discrete

$$E[x] = \sum_{i} x_i f(x_i) = \sum_{i} x_i p_i \tag{11}$$

provided the summation exists.

If x is an RV with mean $\mu = E[x]$, then the variance of x, denoted Var(x), is defined as

$$\mathrm{Var}(x) = E[(x-\mu)^2] \tag{12}$$

The positive square root of the variance is called the standard deviation of x, and denoted by  $\sigma_x$  or simply  $\sigma$ . Accordingly the variance is also denoted by  $\sigma^2$  [Amerasekera97].
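As a numerical check on (10) to (12), the following Python sketch computes the mean and variance of a discrete RV directly from its probabilities, and of a continuous RV by midpoint-rule integration over a truncated range. The exponential pdf $\lambda e^{-\lambda x}$ with $\lambda = 0.5$ is an assumed example, chosen because its mean $1/\lambda$ and variance $1/\lambda^2$ are known in closed form.

```python
import math


def discrete_moments(pmf):
    """Mean and variance of a discrete RV given as {x_i: p_i}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


def continuous_moments(pdf, lo, hi, n=200_000):
    """Mean and variance of a continuous RV by the midpoint rule.

    The integrals are truncated to [lo, hi], which must contain almost all of the mass.
    """
    h = (hi - lo) / n
    xs = [lo + (k + 0.5) * h for k in range(n)]
    mu = sum(x * pdf(x) for x in xs) * h
    var = sum((x - mu) ** 2 * pdf(x) for x in xs) * h
    return mu, var


lam = 0.5   # illustrative constant failure rate
mu, var = continuous_moments(lambda x: lam * math.exp(-lam * x), 0.0, 80.0)
print(round(mu, 3), round(var, 3))   # close to 1/lam = 2 and 1/lam**2 = 4
```

The standard deviation $\sigma$ is then simply `math.sqrt(var)`.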

## 2.3 Reliability Measures

Since a power electronic system is normally non-repairable, a main concern of reliability is the time to first failure, denoted here by T. The start of a power device's operating life is taken as the time origin. Thus T is a continuous, positive random variable whose cumulative distribution function is [Amerasekera97]

$$F(t) = P\{T \le t\} \tag{13}$$

which represents the probability that a component fails prior to time t.

The probability that a component survives to time *t*, known as the reliability function, is

$$R(t) = 1 - F(t) \tag{14}$$

The probability that the component fails between t and t + dt is f(t)dt, where f(t) is known as the failure probability density function.

## 2.3.1 Relationship between Different Reliability Measures

The probability that the component fails between t and t + dt, given that it has survived to time t, is $\lambda(t)dt$, where $\lambda(t)$ is known as the failure rate [Grant]. By the definition of conditional probability (4), this probability equals $f(t)dt/R(t)$, so that

$$f(t) = \lambda(t) R(t) \tag{15}$$

Other relationships:

$$\int_{0}^{t} f(\tau)\,d\tau = 1 - R(t) = F(t) \tag{16}$$

Differentiating equation (16) shows that

$$f(t) = -\frac{dR(t)}{dt} \tag{17}$$

Substituting (17) into (15) gives:

$$\lambda(t) = -\frac{dR(t)/dt}{R(t)} \tag{18}$$

Integrating (18) from 0 to t, with R(0) = 1, gives

$$R(t) = \exp\left(-\int_0^t \lambda(\tau)\,d\tau\right) \tag{19}$$
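The relations between f(t), R(t) and λ(t) above can be checked numerically. The Python sketch below assumes a two-parameter Weibull failure rate $\lambda(t) = (\beta/\alpha)(t/\alpha)^{\beta-1}$ with hypothetical values α = 1000 h and β = 2 (chosen only for illustration), integrates λ(t) with the trapezoidal rule to obtain R(t) as in equation (19), and compares λ(t)R(t) with a finite-difference estimate of −dR/dt.

```python
import math

ALPHA, BETA = 1000.0, 2.0   # hypothetical Weibull scale (hours) and shape


def failure_rate(t):
    """lambda(t) for a two-parameter Weibull distribution (illustrative choice)."""
    return (BETA / ALPHA) * (t / ALPHA) ** (BETA - 1)


def reliability_from_rate(t, n=10_000):
    """R(t) = exp(-integral_0^t lambda(tau) dtau), using the trapezoidal rule."""
    h = t / n
    integral = sum(failure_rate(k * h) + failure_rate((k + 1) * h) for k in range(n)) * h / 2
    return math.exp(-integral)


def reliability_exact(t):
    """Closed-form Weibull reliability, used as a reference."""
    return math.exp(-((t / ALPHA) ** BETA))


t, dt = 800.0, 1e-3
r = reliability_from_rate(t)
f_from_rate = failure_rate(t) * r                                                   # f = lambda * R
f_from_slope = -(reliability_exact(t + dt) - reliability_exact(t - dt)) / (2 * dt)  # f = -dR/dt
print(r, reliability_exact(t))
print(f_from_rate, f_from_slope)
```

Any non-negative λ(t) could be substituted; only the reference function would need a different closed form.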

The relationship between these parameters for a component that follows the bathtub curve is illustrated in Figure 1.



Figure 1: Variation of the failure parameters during life [Grant]

The failure rate is usually expressed in FITs (failures in time), where one FIT represents one failure per $10^9$ device-hours:

1 FIT = 1 failure in 1,000,000 devices per 1000 hours = 1 failure in 1,000,000,000 devices per hour.

In the early 1970s, IC failure rates of 1000-200 FIT were acceptable. Failure rates were projected to reach 10 FIT around the year 2000. The Semiconductor Research Corporation (SRC) lists among its "Year 2007 Research Goals" a failure rate of 0.1 FIT for ICs [Amerasekera]. The reliability of power electronics is discussed in Section 4.
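Because FIT figures are easy to misread, a small conversion sketch may help. It assumes a constant failure rate (the flat part of the bathtub curve), for which the MTTF is simply $1/\lambda$; the function names are illustrative, and a 365-day year is assumed when converting hours to years.

```python
HOURS_PER_YEAR = 8760   # assumes a 365-day year


def fit_to_rate_per_hour(fit):
    """Convert FIT (failures per 1e9 device-hours) to failures per device-hour."""
    return fit * 1e-9


def mttf_hours(fit):
    """MTTF = 1 / lambda, valid only for a constant failure rate."""
    return 1.0 / fit_to_rate_per_hour(fit)


def expected_failures(fit, devices, hours):
    """Expected number of failures in a population at a constant failure rate."""
    return fit_to_rate_per_hour(fit) * devices * hours


# 1 FIT: about one failure among 1,000,000 devices operated for 1000 hours.
print(expected_failures(1, 1_000_000, 1000))
# 100 FIT corresponds to one failure per 1e7 device-hours, roughly 1141.6 years.
print(mttf_hours(100) / HOURS_PER_YEAR)
```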

## 2.3.2 Normal and lognormal distributions

The mean time to failure (MTTF or MTF) is defined as

$$\langle t \rangle = \int_0^{\infty} t f(t)\,dt \tag{20}$$

The term *mean time between failures* (MTBF) is often used for a system containing many subsystems or components that are repaired or replaced on failure [Grant].

In many instances the cumulative failure function of semiconductor devices follows a *lognormal* distribution. This implies that failure is the random consequence of the product of a number of independent factors. As these factors or events result from the operation of the component, this is essentially a wear-out mechanism. The failure probability density function takes the form [Grant]

$$f(t) = (2\pi)^{-1/2} (ts)^{-1} \exp\left(\frac{-[\ln(t/t_m)]^2}{2s^2}\right) \tag{21}$$

where $t_m$ is the geometric mean of the failure times, that is, the time at which 50% of the components will have failed (the median life).

*s*, sometimes called the dispersion, is the standard deviation of $\ln t$ and represents the spread of lifetimes about their mean value $\langle t \rangle$, which is given by

$$\langle t \rangle = t_m \exp(s^2/2) \tag{22}$$

Figure 2 plots the various failure functions for a typical lognormal failure distribution, ignoring infant failures and the constant failure rate. Figure 2c shows that a graph of f(t) versus ln t gives a *normal* (Gaussian) distribution. The integral of the distribution, given in Figure 2d, can be transformed into the straight line shown in Figure 2e by plotting it on a lognormal probability scale. Recording the times of successive failures in this way establishes whether the lognormal distribution is valid; once the straight line is established, the median lifetime and the dispersion can easily be determined [Grant].

Lifetime tests are sometimes carried out on a small sample from the total population of components, on the assumption that the distribution of the sample is representative of the whole population; however, the error can be very large. Several methods are available to correct for this. One that is quite accurate and easy to apply is known as Bernard's method [Grant]. Further study would be needed to quantify this error.



Figure 2: Various failure functions for a lognormal failure distribution [Grant]
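The straight-line procedure described above can be sketched in Python as a least-squares fit of ln t against the standard normal quantile of each failure's plotting position. For the plotting positions, this sketch uses the median-rank approximation $(i - 0.3)/(n + 0.4)$, commonly attributed to Bernard; we assume this is the method cited from [Grant], but the sketch is not taken from that source. It also assumes complete (uncensored) data, and the failure times are invented for illustration. The intercept gives $\ln t_m$, the slope gives the dispersion s, and (22) then gives the mean life.

```python
import math
from statistics import NormalDist


def bernard_rank(i, n):
    """Median-rank plotting position of the i-th of n ordered failures (Bernard's approximation)."""
    return (i - 0.3) / (n + 0.4)


def fit_lognormal(failure_times):
    """Estimate the median life t_m and the dispersion s from complete failure data.

    On a lognormal probability plot, ln t = ln t_m + s * z with z = Phi^-1(F),
    so a least-squares line through the points (z_i, ln t_i) has intercept
    ln t_m and slope s.
    """
    times = sorted(failure_times)
    n = len(times)
    z = [NormalDist().inv_cdf(bernard_rank(i, n)) for i in range(1, n + 1)]
    y = [math.log(t) for t in times]
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
         / sum((zi - z_bar) ** 2 for zi in z))
    t_m = math.exp(y_bar - s * z_bar)
    return t_m, s


def lognormal_mean(t_m, s):
    """Mean life <t> = t_m * exp(s^2 / 2), equation (22)."""
    return t_m * math.exp(s * s / 2)


hours = [1200, 1850, 2300, 2900, 3400, 4100, 5200, 7000]   # hypothetical test data
t_m, s = fit_lognormal(hours)
print(f"t_m = {t_m:.0f} h, s = {s:.2f}, <t> = {lognormal_mean(t_m, s):.0f} h")
```

With only eight failures, the estimates carry wide confidence limits, which is the small-sample problem noted in the text.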

In general, knowledge of the median lifetime is of little value to an equipment designer. Much more important is to know the likely time to the first failure among the many components used in a given system. Furthermore, depending on the sample size used, confidence limits can be calculated.

Another useful first-order measure is called the *Mean Residual Life (MRL)* at time t, denoted MRL(t), which is defined as the expected value of the remaining life of a system, given that the system is still in operation at time t. Thus the MRL(t) is the conditional mean:

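$$MRL(t) = \int_{0}^{\infty} R(x \mid t)\,dx = \frac{1}{R(t)} \int_{t}^{\infty} R(x)\,dx \tag{23}$$

where $R(x \mid t) = R(t + x)/R(t)$ is the probability of surviving a further time x, given survival to time t.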

The above relations are illustrated in Figure 3.



Figure 3: The relationship between the MRL and R(t) [Amerasekera]

Other second-order measures, such as the *variance* of the time to failure, can be derived from the above first-order measures.
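The MRL, $\frac{1}{R(t)}\int_t^{\infty} R(x)\,dx$, can be evaluated numerically. The Python sketch below integrates R(x) from t to a finite upper limit with the midpoint rule and divides by R(t). Two hypothetical components are compared: one with a constant failure rate, whose MRL should stay at $1/\lambda$ whatever its age, and one with a Weibull wear-out characteristic (β = 3, α = 1000 h, assumed values), whose MRL should shrink with age.

```python
import math


def mean_residual_life(reliability, t, upper, n=50_000):
    """MRL(t) = (1 / R(t)) * integral from t to infinity of R(x) dx (midpoint rule).

    The infinite upper limit is truncated at `upper`, which should be chosen
    where R(x) is negligible.
    """
    h = (upper - t) / n
    integral = sum(reliability(t + (k + 0.5) * h) for k in range(n)) * h
    return integral / reliability(t)


LAM = 1e-3   # hypothetical constant failure rate, per hour


def exp_reliability(x):
    """Constant failure rate: R(x) = exp(-lambda * x)."""
    return math.exp(-LAM * x)


def weibull_reliability(x, alpha=1000.0, beta=3.0):
    """Hypothetical wear-out component: R(x) = exp(-(x / alpha)^beta)."""
    return math.exp(-((x / alpha) ** beta))


for t in (0.0, 500.0, 1000.0):
    print(t,
          round(mean_residual_life(exp_reliability, t, t + 40_000.0), 1),   # stays near 1/LAM
          round(mean_residual_life(weibull_reliability, t, 5_000.0), 1))    # decreases with age
```

At t = 0 the MRL reduces to the MTTF, which links (23) back to (20).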

Besides the normal and lognormal distributions, a number of other distributions have been found useful for describing the statistics of the time to failure; for example, the Weibull distribution is widely used in reliability and failure analysis.

## 2.3.3 Weibull distribution

The general three-parameter Weibull probability density function is:

$$f(t) = \alpha^{-\beta} \beta (t-\gamma)^{\beta-1} e^{-((t-\gamma)/\alpha)^{\beta}}, \quad t \ge \gamma \tag{24}$$

where $\alpha$ is the scale parameter, $\beta$ is the shape parameter (or slope) and $\gamma$ is the location parameter [Amerasekera 97, www2].

An example of a two-parameter Weibull distribution ($\gamma = 0$) is shown in Figure 4. Weibull analysis provides a simple and useful graphical plot: the horizontal scale is a measure of the aging parameter (here, the number of cycles), and the vertical scales are the probability density $f(t,\alpha,\beta)$ and the cumulative distribution function $F(t,\alpha,\beta)$ of the two-parameter Weibull distribution:

$$f(t,\alpha,\beta) = \alpha^{-\beta} \beta t^{\beta-1} e^{-(t/\alpha)^{\beta}} \tag{25}$$

$$F(t,\alpha,\beta) = 1 - e^{-(t/\alpha)^{\beta}} \tag{26}$$



Figure 4: Weibull analysis for  $\Delta T_j=105$  K [Amro04]

The Weibull slope $\beta$ indicates which class of failure is present. A $\beta$ lower than 1 indicates that the system or samples are more likely to fail early and become more reliable as the aging parameter increases. At $\beta = 1$, failures occur independently of the aging parameter (a constant failure rate), and a $\beta$ higher than 1 indicates wear-out failures. The characteristic life $\alpha$ is defined as the age at which 63.2% of the units will have failed [Amro04].
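The role of β can be seen by evaluating the two-parameter Weibull cdf $F(t) = 1 - e^{-(t/\alpha)^{\beta}}$ and the Weibull failure rate $\lambda(t) = f(t)/R(t) = (\beta/\alpha)(t/\alpha)^{\beta-1}$ directly. This Python sketch uses a hypothetical characteristic life of 20,000 cycles; it confirms that $F(\alpha) = 1 - e^{-1} \approx 0.632$ for any β, and classifies the failure-rate trend for β below, equal to and above 1.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Two-parameter Weibull cdf F(t) = 1 - exp(-(t/alpha)^beta)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_rate(t, alpha, beta):
    """Failure rate lambda(t) = f(t)/R(t) = (beta/alpha) * (t/alpha)^(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)


alpha = 2.0e4   # hypothetical characteristic life, in cycles
print(round(weibull_cdf(alpha, alpha, 0.7), 3))   # 0.632, whatever the value of beta

for beta in (0.5, 1.0, 3.0):
    early, late = weibull_rate(5e3, alpha, beta), weibull_rate(3e4, alpha, beta)
    trend = "decreasing" if late < early else "constant" if late == early else "increasing"
    print(beta, trend)
```

The loop prints "decreasing", "constant" and "increasing" in turn, matching the early-failure, random and wear-out classes described above.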

Generally, both the lognormal and Weibull distributions appear to fit most sets of reliability data equally well. However, closer examination reveals that the Weibull distribution provides a better fit to short-time failures, while the lognormal plot is better at predicting longer lifetimes [Ohring98].

Lognormal distributions tend to apply when gradual degradation occurs over time because of diffusion effects, corrosion processes, and chemical reactions. Early p-n-p mesa transistors, bipolar and MOS transistors, light-emitting diodes, and lasers are examples whose failures are modelled by lognormal statistics. On the other hand, Weibull distributions are applied in cases where the weak points or defects propagate to failure. Dielectric breakdown, capacitor failures, and fracture in ceramics are typically described by Weibull distributions [Ohring98].

## 2.4 Reliability Requirements of an Electrical System and a power device

The reliability of a system is affected by its own condition and its environment. For a power electronics system, a variety of factors such as electrical load, mechanical load or torque, and environmental conditions (temperature, humidity, vibration, mechanical shocks etc) may all exert stress on the components.

For example, in power electronic systems for traction applications, special attention is paid to temperature changes, which arise from four sources: 1) seasonal influences; 2) daily changes; 3) changes due to long stopovers at terminal stations or parking lots; 4) power cycles.

An example based on an urban train service illustrates the importance of power cycles for reliability. Table 1 shows that power cycles have the most severe effect because of their large number and high temperature swing compared with the other types [Held97].

|                                      | Type 1 (seasonal) | Type 2 (daily) | Type 3 (stopovers) | Type 4 (power cycles) |
|--------------------------------------|-------------------|----------------|--------------------|-----------------------|
| $\Delta T$ (K)                       | 20-40             | 10-30          | 30-80              | 30-80                 |
| Number of cycles during the lifetime | $<10^{2}$         | $<10^{4}$      | $<10^{5}$          | $<10^{6} - 10^{8}$    |

Table 1: Expected lifetime temperature changes and number of cycles in urban railway applications

Typical operation data of a tram are given as follows [Held97]:

| Parameter                                                        | Value         |
|------------------------------------------------------------------|---------------|
| Expected lifetime of a tram                                      | 30 years      |
| Service time in 30 years (30 years × 300 days/year × 15 h/day)   | 135,000 hours |
| Motion time between stopovers                                    | 30 s          |
| Time at stopover                                                 | 30 s          |
| Number of power cycles                                           | 8,100,000     |
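The tram figures can be reproduced with a few lines of arithmetic. This sketch assumes that each 30 s motion period plus the following 30 s stopover constitutes one power cycle, which is the interpretation that yields the 8,100,000 cycles in the table.

```python
YEARS, DAYS_PER_YEAR, HOURS_PER_DAY = 30, 300, 15   # tram data from the table
MOTION_S, STOPOVER_S = 30, 30                        # seconds in motion / at a stopover

service_hours = YEARS * DAYS_PER_YEAR * HOURS_PER_DAY
cycle_s = MOTION_S + STOPOVER_S          # assumption: one power cycle per motion + stopover
power_cycles = service_hours * 3600 // cycle_s

print(f"{service_hours:,} h, {power_cycles:,} power cycles")   # 135,000 h, 8,100,000 power cycles
```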

Automotive vehicles and aeroplanes have similar expected lifetimes, and the power devices in these applications experience power cycles on a similar scale. Because of the large number of components and power cycles, the failure level per component must be very low to guarantee the reliability of the whole system. Assuming 100 PCBs per car and 100 components per PCB, a component failure rate of 1 ppm means that about 1% of cars will fail; hence the zero-defect concept was proposed [Gerling07b].
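The 1% figure follows from treating the car as a series system that fails if any one of its $100 \times 100 = 10^4$ components fails. Assuming independent components, each failing with probability $p = 10^{-6}$, the exact fraction of failing cars is $1 - (1 - p)^n$, which the sketch below compares with the first-order approximation $np$; the independence assumption is ours, not the cited source's.

```python
def fraction_of_systems_failing(n_components, p_component):
    """P(system fails) for a series system of independent, identical components."""
    return 1.0 - (1.0 - p_component) ** n_components


n = 100 * 100   # 100 PCBs per car x 100 components per PCB
p = 1e-6        # 1 ppm component failure probability

exact = fraction_of_systems_failing(n, p)
approx = n * p  # first-order approximation, valid when n * p << 1
print(f"exact: {exact:.4%}, approximate: {approx:.2%}")
```

The two agree closely because $np = 0.01$ is small; for larger systems the approximation overstates the failing fraction.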

For the utility power system, the rate of successful delivery of demanded energy was 99.99998% in 2004/05, according to data from National Grid [www3]. Considering the vast number of components in the power system, the fault level per component is far below 1 ppm. Distributed generation and wind energy contribute little to the overall system reliability because of the dispersed locations of the former and the intermittency of the latter [Zhu03, Phoon06].

The FIT unit is used to specify the reliability of power components: 1 FIT = 1 failure per $10^9$ device-hours [Gerling07a, w3]. Figure 5 shows that the failure rate of a good power device can be as low as a few FIT. Traction applications require 200 FIT [Lefranc00], which corresponds to one failure in $5 \times 10^6$ component-hours (about 571 years).



Figure 5: MOSFET failure rates versus voltage [Wolfgang07a]

## 3. General Failure Mechanisms for Electronics

A typical failure process is illustrated in Figure 6. A failure occurs when the ability of a power electronic system to perform a required function is terminated. It is normally caused by the physical construction, the electrical load and other stresses, which trigger one or more failure mechanisms (the processes that result in a failure). Failure modes describe how failures manifest themselves, for example as a physical open or short circuit [Gerling07a]. Failure types are reviewed first, followed by the failure mechanisms of power electronics.

Figure 6: Failure process

## 3.1 Failure Types

Failure types may be viewed from different angles according to the effect that the lack of performance has on the overall functional capability of the system and component [Smolens07, Blache94]. Such a classification is illustrated in Figure 7, and more details are given below:

- (1) Intermittent failure: a failure that results in the lack of some function of the component for only a very short period of time, the component reverting to its full operational standard immediately afterwards. This type could also include degradation that can be dealt with during scheduled maintenance. A judgement must be made as to whether such an event is considered a failure; however, if these faults occur very frequently, they must be deemed failures [Blache94].
- (2) Extended failure: a failure that results in the lack of some function until some part of the component is replaced or repaired. Extended failures may be further divided into the following two types:
  - Complete failure: a failure that causes a complete lack of a required function.
  - Partial failure: a failure that leads to the lack of some function, but not to a complete lack of the required function.

Faults initially appear intermittent, depending on specific operating conditions (e.g., voltage, temperature, circuit inputs), but eventually result in permanent defects. Partial failures, for example the failure of a single chip within a multi-chip package, may not be noticeable from external measurements during operation until they develop into a complete failure.

Both complete and partial failures may be further classified according to how suddenly they occur: catastrophic (sudden) failures, caused by a single stress event that exceeds the intrinsic strength of the materials, and degraded (gradual) failures, caused by non-visible internal accumulation of damage. The former cannot be forecast by testing or examination, whereas the latter can.



Figure 7: Failure types [Blache94]

According to the product reliability phases, the failures can be classified into [Gerling07a]:

- Wear-out failures normally happen near the end of the lifetime under a specific load or stress, and are the effect of systemic failure mechanisms inherent in the product construction.
- Early failures happen in the product's infancy and are caused by deficiencies in design, structure or manufacturing processes, or by severe misapplication. Deficiencies not detected during screening may increase the product's sensitivity to stress.
- Random failures are normally caused by external incidental stresses, for example radiation and voltage transients. Their failure rate can be regarded as constant, independent of the system's age.

These two classifications may overlap. Studies show that wear-out faults have a gradual onset, manifesting initially as timing or intermittent faults before eventually leading to hard breakdown [Smolens07]. Both gradual and sudden failures can therefore occur during the wear-out process.

Combined, these three types of failure give the bathtub curve shown in Figure 8. Stimulated mechanisms normally correspond to wear-out failures, since early failures depend on the product design and manufacturing process, while random failures are related to external stress and are independent of system age [Gerling 07a].



Figure 8: Failure rate against time
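The bathtub shape can be illustrated numerically. A common modelling assumption (not taken from [Gerling07a]) is that each failure population follows a Weibull hazard rate, with shape parameter β < 1 for early failures, β = 1 for random failures and β > 1 for wear-out, and that independent competing mechanisms add their hazard rates. The parameter values below are hypothetical and chosen only to make the bathtub visible.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard (failure) rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# Hypothetical (beta, eta in hours) for the three failure populations.
EARLY = (0.5, 5.0e4)     # beta < 1: decreasing rate (infant mortality)
RANDOM = (1.0, 1.0e6)    # beta = 1: constant rate (random failures)
WEAR_OUT = (5.0, 1.5e5)  # beta > 1: increasing rate (wear-out)

def bathtub_hazard(t):
    """Total rate of independent competing populations: the hazards add."""
    return sum(weibull_hazard(t, beta, eta) for beta, eta in (EARLY, RANDOM, WEAR_OUT))

for hours in (10, 1e3, 2e4, 5e4, 1e5, 2e5):
    print(f"{hours:>9.0f} h   failure rate = {bathtub_hazard(hours):.2e} per hour")
```

With these assumed parameters the printed rate falls through the early period, flattens in mid-life where the constant random term is comparable to the other two, and rises sharply once the wear-out term dominates.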

## 3.2 Failure Ranking

A power electronics system includes many parts [Mermet07]:

- Power devices, including IGBTs, MOSFETs, diodes, transistors and so forth.
- Cooling devices including heat sinks, coolants, fans and so forth.
- Connections: high voltage and low voltage connections including bus bars, leads, vias, screws and so forth.
- Control: either separate control parts, including gate circuits, control boards and sensors, or integrated control boards and current sensors.
- Capacitors: decoupling, bypass, filter and other applications.
- Mechanical parts for support and connectors.

It has been found that most standard components and packaging elements can recover after thermal shocks at 200 °C. However, capacitors, wire bonds, eutectic tin-lead solder joints and PCBs degrade seriously at 200 °C [McCluskey98].

Figure 9 gives a classification of failures derived from over 200 failure analyses of products from 80 different companies. Capacitor failures rank top at around 30%, followed by PCB failures at about one quarter. Semiconductor device failures, which may involve crack development in the die or overheating of junctions, account for around 21%. Solder joint and connector failures, normally caused by thermal fatigue and mechanical force, account for 13% and 3% respectively. The remaining failures, around 7%, are of various other types.



Figure 9: Statistics of electronics failures [CALCE cited by Wolfgang 07a]

| Failures | Failure mechanisms | Detection test | Indicators | Impacts |
|---|---|---|---|---|
| Capacitors | Thermal stress | Resistance and ripple | Harmonics | Magnetic stress |
| PCB | About half due to CFF [Wolfgang07a, Turbini, Pecht]; contamination and plating defects [www5] | High-magnification inspection; microsectioning [www5] | Harmonics | Harmonics; stress on the system |
| Semiconductor devices | Trapping of charges (hot carrier effects); surface inversion (ion effect) [Gerling07c] | Bare substrate test [Gerling07c] | Junction temperature; leakage current; voltage rise | |
| Solder | Solder fatigue | Power cycling | Voltage rise | |
| Connectors | Bond wire lift-off | Power cycling | Voltage rise | |
| Others | - | - | - | - |

Table 2: Ranking of failure mechanisms

The failure mechanisms of capacitors and PCBs, which rank top two, are reviewed in the next two subsections. Package-related failures, including semiconductor device, solder and connection failures, total 37%. Because their effects are sometimes intertwined, they are described later in this section.
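The ranking in Figure 9 and the 37% package total quoted above can be reproduced with a short Pareto calculation. The percentages are the approximate values given in the text (they sum to 99% because of rounding), and grouping semiconductor devices, solder joints and connectors as "package" failures follows the paragraph above; the dictionary and function names are illustrative only.

```python
# Approximate shares of electronics failures from Figure 9 (percent).
FAILURE_SHARES = {
    "Capacitors": 30,
    "PCB": 25,
    "Semiconductor devices": 21,
    "Solder joints": 13,
    "Others": 7,
    "Connectors": 3,
}
PACKAGE = ("Semiconductor devices", "Solder joints", "Connectors")

def pareto(shares):
    """Return (category, share, cumulative share) rows sorted by descending share."""
    rows, cumulative = [], 0
    for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        cumulative += share
        rows.append((name, share, cumulative))
    return rows

package_total = sum(FAILURE_SHARES[name] for name in PACKAGE)

for name, share, cumulative in pareto(FAILURE_SHARES):
    print(f"{name:<22} {share:>3}%   cumulative {cumulative:>3}%")
print(f"Package-related total: {package_total}%")
```

The cumulative column makes the Pareto point explicit: capacitors and PCBs together already account for more than half of the analysed failures.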

## 3.3 Failure mechanisms in capacitors

Capacitors can be roughly divided into four groups [www6]:

- (1) Film capacitors, including polyester, polycarbonate, Teflon, polypropylene and polystyrene (Table 3)
- (2) Ceramic capacitors
- (3) Electrolytic capacitors
- (4) Miscellaneous, including glass, mica, porcelain and vacuum.

| Types | Varieties | Capacitance (F) | Voltage (V) | Freq. (Hz) [Sarjean96] | Accuracy | Temp. stability | Leakage |
|---|---|---|---|---|---|---|---|
| Film | Polyester | 1p-50μ | 50-600 | >100 | + | - | + |
| | Polycarbonate | 100p-30μ | 50-800 | | ++ | ++ | + |
| | Teflon | 1000p-2μ | 50-200 | | ++ | +++ | +++ |
| | Polypropylene | 100p-50μ | 100-800 | | ++ | + | ++ |
| | Polystyrene | 10p-2.7μ | 100-600 | | ++ | + | ++ |
| Ceramic | | 10p-1μ | 50-30k | >100k | - | - | Moderate |
| Electrolytic | Aluminium | 0.1μ-1.6 | 3-600 | >100 | | | |
| | Tantalum | 0.1μ-500μ | 6-100 | | - | - | - |
| Miscellaneous | Mica | 1p-0.01μ | 100-600 | >1M | + | / | + |
| | Glass | 10p-1000p | 100-600 | | + | / | ++ |
| | Porcelain | 100p-0.1μ | 50-400 | | + | + | + |
| | Vacuum | 1p-5000p | 2k-36k | | / | / | ++ |

Table 3: Capacitors and their properties [Horowitz89] (+: good; -: poor; /: no comment)

Of the types listed in Table 3, the capacitors used in power electronics are normally the first three (film, ceramic and electrolytic). Among them, electrolytic capacitors are widely used as DC-link capacitors because of their high energy density. Their reliability issues are discussed in this subsection.

### 3.3.1. Reliability for film capacitors

The reliability of metallized film capacitors was assessed from degradation data in [Zhao07]. Because high-reliability capacitors do not normally fail within a reasonable test time, it is difficult to assess their reliability with the traditional time-to-failure analysis method. Instead, the paper proposes a life distribution model whose parameters are estimated from degradation measurements of the capacitors. This model was shown to be accurate and economical in test cost.
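The general idea of degradation-based life assessment can be sketched as follows. This is not the model of [Zhao07]; it is a simple, commonly used "pseudo-failure-time" approach that assumes linear capacitance loss, a hypothetical 5% end-of-life threshold and a lognormal life distribution (Section 2.3.2). The degradation data are invented for illustration.

```python
import math
import statistics

FAILURE_THRESHOLD_PCT = 5.0  # hypothetical end-of-life criterion: 5 % capacitance loss

def pseudo_failure_time(times_h, loss_pct, threshold=FAILURE_THRESHOLD_PCT):
    """Fit loss = slope * t by least squares through the origin, then
    extrapolate to the time at which the loss reaches the threshold."""
    slope = sum(t * y for t, y in zip(times_h, loss_pct)) / sum(t * t for t in times_h)
    return threshold / slope

def fit_lognormal(lifetimes):
    """Estimate the lognormal parameters (mu, sigma) from the logs of the lifetimes."""
    logs = [math.log(x) for x in lifetimes]
    return statistics.mean(logs), statistics.stdev(logs)

# Hypothetical degradation data: capacitance loss (%) of four units at 500, 1000 and 2000 h.
times = [500.0, 1000.0, 2000.0]
units = [
    [0.21, 0.40, 0.83],
    [0.30, 0.62, 1.18],
    [0.15, 0.31, 0.60],
    [0.25, 0.48, 1.02],
]
lifetimes = [pseudo_failure_time(times, loss) for loss in units]
mu, sigma = fit_lognormal(lifetimes)
print("pseudo failure times (h):", [round(x) for x in lifetimes])
print(f"median life ~ {math.exp(mu):.0f} h, sigma = {sigma:.2f}")
```

No unit has actually failed in the 2000 h of (hypothetical) testing, yet a life distribution is obtained; this is the economy in test time that degradation-based methods aim for, at the cost of relying on the assumed degradation path.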

Heat generation, size and shape effects, capacitor cooling, heat removal, empirical test data and thermal rules for capacitors are discussed in [Kampen01]. This paper is quite practical for selecting and operating AC film capacitors.

The reliability of unencapsulated SMD plastic film capacitors is discussed in [Seppl 00]. It was evaluated using standard temperature cycling, humidity storage and high-temperature environmental tests. Solderability and resistance to soldering heat were tested by mounting the test capacitors with the reflow soldering technique, and the electrical properties, including capacitance, insulation resistance and dissipation factor at 1 kHz and 100 kHz, were verified.

The effect of impulse voltages of different magnitudes and repetition rates on the characteristics of metallized polypropylene film capacitors was studied in [Varalakshimi05], where an ageing test method using impulse voltages is proposed.

### 3.3.2. Reliability for ceramic capacitors

The reliability of multilayer ceramic chip capacitors mounted on a hybrid IC was reported in [Kobayashi 78]. Failure modes, failure mechanisms and drift of characteristics were analyzed, and humidity acceleration as well as voltage and temperature acceleration were investigated to estimate the chip capacitor reliability. The reliability proved to be high and adequate for the intended service period in communication applications.
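Voltage and temperature acceleration of ceramic capacitors is often combined in the empirical Prokopowicz-Vaskas form, in which voltage acts through a power law and temperature through an Arrhenius term. The sketch below uses that form as an assumption: it is not necessarily the model used in [Kobayashi 78], it ignores humidity, and the exponent n = 3 and activation energy of 1.0 eV are illustrative defaults rather than measured values.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def pv_acceleration_factor(v_use, v_test, t_use_c, t_test_c, n=3.0, ea_ev=1.0):
    """Life ratio t_use / t_test: (V_test / V_use)**n * exp[(Ea/k)(1/T_use - 1/T_test)].

    n and ea_ev are assumed illustrative defaults, not fitted values.
    """
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    voltage_term = (v_test / v_use) ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))
    return voltage_term * thermal_term

# Hypothetical test at twice the rated voltage and 125 C, use at rated voltage and 55 C.
af = pv_acceleration_factor(v_use=50.0, v_test=100.0, t_use_c=55.0, t_test_c=125.0)
print(f"1000 test hours correspond to about {1000 * af:.3g} field hours")
```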

Miniaturized multilayer ceramic capacitors were studied in [Chan93], where components were subjected to various degrees of thermal shock up to 450 °C. Microstructural and layer-by-layer insulation resistance analyses have clearly identified the physical locations responsible for the electrical leakage of defective capacitors. From these tests and analyses, failure mechanisms are proposed to explain the failure of miniaturized multilayer ceramic capacitors under normal service conditions.

Failure analysis and reliability evaluation of high-voltage ceramic capacitors were presented in [Kim01]. The failure modes and failure mechanisms were studied in two ways to estimate component life and failure rate, and the validity of the results was confirmed by accelerated testing. Delamination between the ceramic and the epoxy, which might cause an electrical short in the underlying circuitry, can occur during curing or thermal cycling. The results can be used to quickly identify defective lots, estimate life and detect major changes. The condition for dielectric breakdown was also investigated to estimate the failure rate using load-strength interference.
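The load-strength interference idea can be sketched for the simplest textbook case, assuming that dielectric strength and applied stress are independent and normally distributed (Section 2.3.2). Failure occurs when the stress exceeds the strength, so the probability is Φ(−(μ_S − μ_L)/√(σ_S² + σ_L²)). The numbers are hypothetical, and this is not the specific analysis of [Kim01].

```python
import math

def interference_failure_probability(mu_strength, sd_strength, mu_load, sd_load):
    """P(load > strength) for independent, normally distributed load and strength."""
    z = (mu_strength - mu_load) / math.hypot(sd_strength, sd_load)  # safety margin in sigmas
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal CDF evaluated at -z

# Hypothetical: breakdown strength 12 +/- 1.5 kV against applied stress 7 +/- 1 kV.
p_fail = interference_failure_probability(12.0, 1.5, 7.0, 1.0)
print(f"probability that stress exceeds strength: {p_fail:.2e}")
```

Widening either distribution lowers the safety margin z and raises the failure probability even when the mean margin is unchanged, which is why the scatter of breakdown strength matters as much as its mean.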

### 3.3.3. Reliability for electrolytic capacitors

#### A. Aluminium electrolytic capacitors

An aluminium electrolytic capacitor consists of an anode foil, a cathode foil and a separator paper that are wound together and impregnated with an electrolyte, as shown in Figure 10. Its high volumetric efficiency compared with other types of capacitors is due to the plate surface area being enlarged by an etching process. Non-solid electrolytes are normally used.

The life of aluminium electrolytic capacitors depends on environmental and electrical factors. Temperature, ambient or internal, is the most critical factor, since increased temperatures accelerate chemical reaction rates. Overvoltage causes a high leakage current that may destroy the capacitor; even if the capacitor survives, it cannot operate in this mode for long because gas is produced in the electrolyte. Ripple current dissipates heat with a power equal to the square of the RMS ripple current times the ESR.

Generally, the ESL has little effect unless the capacitor is operating at high frequencies [Imam07].
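The ripple-heating relation above can be combined with a common manufacturer rule of thumb: electrolytic capacitor life roughly doubles for every 10 °C reduction in core temperature (an Arrhenius-type approximation, assumed here rather than taken from [Imam07]). The thermal resistance, ratings and currents are hypothetical, and ESR is treated as constant although in practice it varies with frequency and temperature.

```python
def ripple_power_w(i_ripple_rms_a, esr_ohm):
    """Heat dissipated by the ripple current: P = I_rms**2 * ESR."""
    return i_ripple_rms_a ** 2 * esr_ohm

def estimated_life_h(rated_life_h, rated_temp_c, core_temp_c):
    """Rule-of-thumb model (assumed): life doubles for every 10 C below the rated temperature."""
    return rated_life_h * 2.0 ** ((rated_temp_c - core_temp_c) / 10.0)

# Hypothetical DC-link capacitor: 3 A rms ripple, 50 mOhm ESR, 10 K/W core-to-ambient
# thermal resistance, 55 C ambient, rated 5000 h at 105 C.
power = ripple_power_w(3.0, 0.05)
core_temp = 55.0 + power * 10.0
life = estimated_life_h(5000.0, 105.0, core_temp)
print(f"P = {power:.2f} W, core = {core_temp:.1f} C, estimated life = {life:.0f} h")
```

Because the ripple term is quadratic in current, doubling the ripple current quadruples the self-heating, which in this model can cost a large fraction of the estimated life.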





Early catastrophic failures caused by misapplication, such as inappropriate ambient conditions, overvoltage, reverse voltage or excessive ripple current, can be avoided with proper circuit design and installation. During the normal useful life, the catastrophic failure rate is lower than that of semiconductors and tantalum capacitors. In the wear-out period, loss of electrolyte causes a decrease in capacitance and an increase in tan $\delta$, where $\delta$ is the loss angle, i.e. the amount by which the phase angle between current and voltage falls short of 90° [Mahalik97]. Factors that increase the capacitor temperature and internal pressure can further accelerate wear-out [Imam07].
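The wear-out signature can be expressed through the series R-C model, in which tan δ = ωC·ESR, so a rising ESR drives tan δ up even while C falls. The end-of-life thresholds used below (a 20% drop in capacitance or a doubling of ESR) are commonly quoted criteria assumed for illustration, not values from [Mahalik97] or [Imam07]; the function names are hypothetical.

```python
import math

def tan_delta(esr_ohm, capacitance_f, freq_hz):
    """Dissipation factor of a series R-C model: tan(delta) = 2*pi*f*C*ESR."""
    return 2.0 * math.pi * freq_hz * capacitance_f * esr_ohm

def end_of_life(c_now, c_initial, esr_now, esr_initial,
                max_c_drop=0.20, max_esr_ratio=2.0):
    """Assumed wear-out criteria: capacitance down by 20 % or ESR doubled."""
    capacitance_worn = c_now <= (1.0 - max_c_drop) * c_initial
    esr_worn = esr_now >= max_esr_ratio * esr_initial
    return capacitance_worn or esr_worn

# Hypothetical 1000 uF capacitor measured at 100 Hz, new and after ageing.
print(f"tan(delta) new  = {tan_delta(0.05, 1000e-6, 100.0):.4f}")
print(f"tan(delta) aged = {tan_delta(0.11, 850e-6, 100.0):.4f}")
print("end of life:", end_of_life(850e-6, 1000e-6, 0.11, 0.05))
```

In this example the capacitance loss alone (15%) would not trigger the assumed criterion, but the ESR increase does; monitoring ESR or tan δ therefore tends to give an earlier warning than monitoring capacitance alone.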

Since capacitors account for the largest share of failures in Figure 9 and the electrolytic capacitor is a key component in power electronics, condition monitoring of electrolytic capacitors will be reviewed in Section 5.2.4.

#### B. Tantalum capacitors

Tantalum capacitors use tantalum metal as the anode. They are superior to aluminium electrolytic capacitors in temperature and frequency characteristics [www7].

Solid tantalum capacitors are generally considered more reliable than aluminium electrolytic capacitors because they do not wear out through drying-out of the electrolyte [Imam07]. However, a short-circuit failure of a tantalum capacitor may be accompanied by fire [Imam07].

Tantalum chip capacitor reliability in high surge and ripple current applications was reported by [Reed 94]. Simple circuits that highlight the fundamental theoretical principles behind transient surge and steady state ripple current applications are analyzed and pertinent reliability issues are discussed. The relationship of device ESR to surge and ripple current robustness and device temperature rise is established theoretically.

## 3.4 Failures in PCBs

The reliability of a PCB depends strongly on the board design and processing, soldering, integration with components and its application. The principal mechanisms of PCB failure are reviewed in the following subsections.

### 3.4.1. CFF (Conductive Filament Formation)

CFF (also known as conductive anodic filament, CAF) consists of copper corrosion by-products that emanate from the anode of a circuit and grow subsurface toward the cathode, frequently along separated fibre-epoxy interfaces. The formation of CFF is illustrated in Figure 11 [Turbini].



Figure 11: diagram of the formation of CFF [Turbini]

The probability of CFF is a function of temperature, moisture content, the voltage bias and other environmental conditions and physical factors [Pecht99, Wolfgang07a, Welsher 80].

Studies on CFF have found that the filament path often runs along the interface between the glass fibre and the epoxy matrix. According to [Rogers01, Pecht99], hollow fibres can accelerate CFF. Although most glass fibres are solid, hollow fibres can be produced if process control during fibre manufacture is insufficient.

The increased risk of CFF failures at higher reflow temperatures must be taken seriously, and mitigation strategies should be examined to minimize this reliability risk [www8]. Manufacturing high-quality, hollow-free glass fibres may be the primary means of maintaining the reliability of laminates used in the electronics industry [Pecht99].

### 3.4.2. Delamination, Flex crack and PCB fatigue

#### Delamination

Fibre/resin interface delamination can occur as a result of stresses generated under thermal cycling due to the CTE mismatch between the glass fibre (5.55 ppm/°C) and the epoxy resin (65 ppm/°C) [Wolfgang07a].
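The size of this CTE mismatch can be put in perspective by computing the free thermal-strain difference, Δα·ΔT, from the two values quoted above. The −40 °C to 125 °C cycle is a hypothetical example, and the result is only an indicator: the actual interface stress also depends on the elastic moduli and on how the laminate constrains each phase.

```python
ALPHA_GLASS = 5.55e-6  # CTE of glass fibre, 1/C (value quoted in the text)
ALPHA_EPOXY = 65e-6    # CTE of epoxy resin, 1/C (value quoted in the text)

def mismatch_strain(delta_t_c, alpha_a=ALPHA_EPOXY, alpha_b=ALPHA_GLASS):
    """Free thermal-strain difference between two bonded materials over a temperature swing."""
    return abs(alpha_a - alpha_b) * delta_t_c

# Hypothetical thermal cycle from -40 C to +125 C.
strain = mismatch_strain(125.0 - (-40.0))
print(f"mismatch strain over the 165 C swing: {strain * 100:.2f} %")
```

A strain difference of the order of 1% imposed on every cycle is large for a brittle glass/epoxy interface, which is consistent with delamination appearing under thermal cycling.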

#### Flex crack

Flex cracks are created when a physical displacement generates sufficient stress within a surface-mounted body, e.g. a ceramic capacitor, to fracture the ceramic material. Lead-free solders are less ductile, which has raised industry concern about their effect on flex performance [Wolfgang07a].

#### PCB fatigue

PCB fatigue was found to be especially severe in halogen-free boards. As high-density interconnection and halogen-free materials become more prevalent in electronics, PCB failure may become a significant factor in the overall reliability of area array assemblies under mechanical bend fatigue [Jonnalagadda 04].

## 3.5 Failure Mechanisms in Semiconductor devices

The mechanisms of semiconductor failure can be classified into three main areas [Amerasekera97]:

- (1) Electrical stress failures.
- (2) Intrinsic failure mechanisms.
- (3) Extrinsic failure mechanisms.

Electrical stress failures are event dependent and are directly related to equipment or design issues, or to static discharge during handling or shipping. Such failures are a major concern to both manufacturers and customers.

Intrinsic mechanisms are those which are material related and associated with the reliability of the semiconductor devices themselves. Such mechanisms include crystal defects and dislocations, processing defects and general wafer fabrication problems, which are related to the 'front-end' of the manufacturing process.

Extrinsic failures are defined as those related to the interconnection (metal and contacts), the passivation and the packaging. The interconnection and passivation are also related to wafer fabrication, while the packaging, which includes bonding, belongs to the 'back-end' of the manufacturing process.

It is difficult to rank failure mechanisms by importance, because some mechanisms dominate only under certain operational and environmental conditions [Amerasekera94]. Electronic devices should be considered to have several failure mechanisms acting simultaneously, each 'competing' with the others to cause an eventual failure [Bernstein06].

All of these are thermally activated processes with a characteristic activation energy, $E_a$. They are enhanced by an increase in temperature according to the Arrhenius law [Grant89, Amerasekera97]:

$$\text{Rate of activation} \propto \exp\left(-E_a/kT\right)$$
(27)

where $k$ is Boltzmann's constant and $T$ is the absolute temperature.

A lower value of $E_a$ implies that the failure mechanism is less temperature sensitive than one with a higher $E_a$. Typical values of $E_a$ range from 0.3 to 1.5 eV. In most cases it is possible to accelerate a failure mechanism by stressing the component at elevated temperatures. However, some failure mechanisms have a negative $E_a$ and are actually inhibited at elevated temperatures [Amerasekera97].
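Equation (27) is normally applied through the acceleration factor between a stress temperature and a use temperature, AF = exp[(E_a/k)(1/T_use − 1/T_stress)]. The sketch evaluates it across the range of typical activation energies quoted above; the 55 °C use and 125 °C stress temperatures are hypothetical. A negative E_a gives AF < 1, i.e. the mechanism slows down at the higher temperature.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor rate(T_stress) / rate(T_use) implied by Eq. (27)."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_stress))

# Hypothetical use at 55 C and stress at 125 C across the typical Ea range.
for ea in (0.3, 0.7, 1.5):
    print(f"Ea = {ea} eV: AF = {arrhenius_af(ea, 55.0, 125.0):.3g}")
```

The spread of the results shows why a temperature-accelerated test must be interpreted with the activation energy of the specific mechanism: the same 70 °C increase accelerates a 1.5 eV mechanism far more than a 0.3 eV one.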

A general description of failure mechanisms in semiconductor devices is given in this subsection, and then their effects on power devices will be analysed.

### 3.5.1 Electrical Stress (In-Circuit) Failures

Damage caused by Electrical Overstress (EOS) and Electrostatic Discharge (ESD) stresses can account for over 50% of field failures [Bloomer89 cited by Amerasekera]. It is generally accepted that an ESD event is of the order of 1ns to 1µs, and EOS events are classified as those that extend into the time domain beyond 1µs. The contribution of ESD to the general set of EOS failure is difficult to define since both damage mechanisms result in similar failure modes [Amerasekera97].

#### A. Electrical Overstress (EOS)

The most important aspect of EOS is that it is related to events that usually occur during normal operation. Voltage and current transients can have many causes, such as power supply switching, relay operation, power grid transients and even lightning surges.

Overstress failure is usually associated with hot-spot development due to high stress currents. As the semiconductor junctions get hotter, more current flows in the hot regions, heating them further until a thermal runaway condition is reached. Unless the package fails first, the device is eventually driven into second breakdown as the temperature approaches the melting point of silicon, 1688 K [Amerasekera97].

#### B. Electrostatic Discharge (ESD)

Static charge build-up in a typical working environment can generate potentials ranging from 100 V to 20 kV [Moss 82]. Low-voltage electrostatic pulses of the order of 100 V can damage the gate oxides of MOS transistors if no protection is provided [Amerasekera 86 cited by Amerasekera 97].
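The ESD time scale and the charge levels above can be related through the human body model (HBM), a series RC discharge commonly represented by 100 pF and 1.5 kΩ. These component values come from common ESD test practice and are assumptions here, not taken from [Moss 82] or [Amerasekera97]; the sketch also neglects the impedance of the device being discharged into.

```python
C_HBM = 100e-12  # F, assumed human-body-model capacitance
R_HBM = 1500.0   # ohm, assumed human-body-model series resistance

def hbm_pulse(v_charge):
    """Peak current (A), decay time constant (s) and stored energy (J) of an RC discharge."""
    peak_current = v_charge / R_HBM
    tau = R_HBM * C_HBM
    energy = 0.5 * C_HBM * v_charge ** 2
    return peak_current, tau, energy

for volts in (100, 2000, 20000):
    i_pk, tau, energy = hbm_pulse(volts)
    print(f"{volts:>6} V: Ipk = {i_pk:.3f} A, tau = {tau * 1e9:.0f} ns, E = {energy * 1e3:.3f} mJ")
```

The 150 ns time constant falls inside the 1 ns to 1 µs window used above to distinguish ESD from EOS, while the energy grows with the square of the charging voltage.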

#### C. Latchup

The presence of the parasitic four-layer pnpn structure makes it possible for the device to latch up by regenerative action. This mode of operation is highly undesirable for a MOS-gated device because the applied gate voltage loses control of the collector current [Baliga 87]. In most cases, latchup is triggered by a voltage or current transient on either a power supply or an input/output pad [Amerasekera97]. Other triggering mechanisms include:

- (1) Avalanche breakdown of the n-well to p-substrate junction.
- (2) Displacement currents due to fast transient switching.
- (3) Minority/majority carrier injection from nearby transistors.
- (4) Photocurrent from ionizing radiation.
- (5) Extreme environment, for example, cryogenic temperature.

If the current drawn through the latchup path is large and lasts long, the resulting heating can cause a catastrophic failure. Even a non-catastrophic latchup is undesirable, since the device must be reset to recover.
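The regenerative action behind latchup can be expressed through the current gains of the two parasitic bipolar transistors that form the pnpn structure. In the idealised case, latchup is sustained when β_npn·β_pnp ≥ 1, which is equivalent to α_npn + α_pnp ≥ 1. This textbook criterion neglects the shunt (well and substrate) resistances of a real structure, which raise the gain product actually required, and the gain values below are hypothetical.

```python
def alpha_from_beta(beta):
    """Common-base current gain alpha from common-emitter gain beta."""
    return beta / (1.0 + beta)

def latchup_possible(beta_npn, beta_pnp):
    """Idealised regenerative condition for the parasitic pnpn structure.

    Shunt resistances are neglected; beta_npn * beta_pnp >= 1 is
    equivalent to alpha_npn + alpha_pnp >= 1.
    """
    return beta_npn * beta_pnp >= 1.0

# Hypothetical parasitic gains.
for beta_npn in (2.0, 5.0):
    alpha_sum = alpha_from_beta(beta_npn) + alpha_from_beta(0.3)
    print(f"beta_npn = {beta_npn}: alpha sum = {alpha_sum:.2f}, "
          f"latchup possible: {latchup_possible(beta_npn, 0.3)}")
```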

### 3.5.2 Gate Oxide Breakdown

The critical  $SiO_2$  film, the smallest fabricated dimension in MOS devices, has been the focus of much study and reliability testing [Ohring98]. As a consequence, Gate Oxide Integrity (GOI) is extremely important to process control. GOI is one of the trade-offs defining the gate oxide thickness in a new process since thinner gate oxides are usually more sensitive to wear-out and damage for a given supply voltage.

The mechanisms related to gate oxide breakdown are very complex and the issues involved are too numerous to cover in full here. Two types of gate oxide damage are possible [Amerasekera97]:

- (1) Catastrophic damage which is usually a result of an over-voltage stress such as EOS or ESD as described in Section 3.5.1.
- (2) Time-dependent dielectric breakdown (TDDB) occurs during operation within the rated conditions of voltage, temperature and power dissipation.

Only the latter category is discussed in detail here. TDDB, the oxide wear-out mechanism, occurs at weak spots in the oxide film caused by defects, and most GOI effort is directed towards reducing these defects [Prendergast95].

Failure rates of MOS devices due to gate oxide breakdown during their operational life are very low because most defective devices can be screened out before they reach the market. Furthermore, the gate oxide thickness for a given technology is defined by the intrinsic breakdown limit of the oxide film, which keeps the range of device operation well within the intrinsic failure threshold. The main concern is therefore defect-related failures. Models have been developed to describe the relationship between the electric field at breakdown and the time to breakdown [Scott96 and Suehle94].
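Two widely used empirical forms for the field dependence of time to breakdown are the thermochemical "E-model", t_BD ∝ exp(−γE_ox), and the "1/E model", t_BD ∝ exp(G/E_ox). The sketch compares the acceleration each predicts between a stress field and a use field. The values γ = 2 cm/MV and G = 350 MV/cm are illustrative assumptions, and it is not implied that [Scott96] or [Suehle94] use these exact parameters.

```python
import math

def e_model_af(gamma_cm_per_mv, e_stress_mv_cm, e_use_mv_cm):
    """Field acceleration of the E-model, t_BD = A * exp(-gamma * E_ox)."""
    return math.exp(gamma_cm_per_mv * (e_stress_mv_cm - e_use_mv_cm))

def one_over_e_model_af(g_mv_cm, e_stress_mv_cm, e_use_mv_cm):
    """Field acceleration of the 1/E model, t_BD = tau0 * exp(G / E_ox)."""
    return math.exp(g_mv_cm * (1.0 / e_use_mv_cm - 1.0 / e_stress_mv_cm))

# Hypothetical accelerated test at 10 MV/cm extrapolated to operation at 5 MV/cm.
print(f"E-model   AF = {e_model_af(2.0, 10.0, 5.0):.3g}")
print(f"1/E model AF = {one_over_e_model_af(350.0, 10.0, 5.0):.3g}")
```

The two assumed models differ by many orders of magnitude when extrapolated over the same field range, which illustrates why the choice of field-acceleration model is central when accelerated TDDB data are used to predict operating life.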

### 3.5.3 Ionic contamination and charge effects

Mobile ions in semiconductor devices can change important device parameters such as threshold voltages, off-state leakage currents and even transistor drive currents; this problem is known as 'ionic contamination'.

Apart from mobile ions, the other failure mechanisms caused by charge effects discussed here are:

- Surface charge effect on isolation: charge movement through the field oxide that isolates active diffusion regions from each other.
- Charge effects and I-V stability: three mechanisms related to the quality of the silicon dioxide, discussed in the following paragraphs.

#### A. Ionic contamination

Ionic contamination is usually observed in gate oxide layers of MOS transistors. The problem of ionic contamination has been steadily reduced as sources of contamination have been identified and removed [Verwey90].

Ionic contamination is rated as both an infant mortality and a wear-out mechanism in the bathtub curve. Na<sup>+</sup>, Cl<sup>-</sup> and K<sup>+</sup> have been identified as the principal causes of failure. Na<sup>+</sup> is the most mobile because of its small radius and is considered to be transported most easily through amorphous SiO<sub>2</sub> [Amerasekera97].

The main sources of mobile ions during the manufacturing process are: processing equipment and materials; packaging materials; a fabrication environment containing dust particles and water vapour; and the human body [Amerasekera97].

In an n-MOS transistor, extra positive charge at the oxide-Si interface induces extra negative charge in the n-channel, which appears as a decrease in threshold voltage. For a p-MOS device, the direction of the electric field in the gate oxide repels positive ions away from the $SiO_2/Si$ interface, so p-MOS devices are less sensitive than n-MOS devices to ionic contamination; this is one reason why the first MOS transistors were p-channel devices [Edwards82].
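The threshold decrease described above can be estimated from the sheet density of positive ions in the oxide, using ΔV_T = −(x/t_ox)·qN/C_ox with C_ox = ε_ox/t_ox, where x is the distance of the charge centroid from the gate. The ion density and oxide thicknesses are hypothetical example values, and the sign convention (negative ΔV_T for positive oxide charge) matches the n-MOS case in the text.

```python
EPS0 = 8.854e-14    # vacuum permittivity, F/cm
EPS_R_SIO2 = 3.9    # relative permittivity of SiO2
Q_E = 1.602e-19     # elementary charge, C

def delta_vt(ions_per_cm2, t_ox_cm, centroid_fraction=1.0):
    """Threshold-voltage shift (V) caused by a sheet of positive ions in the gate oxide.

    centroid_fraction = x / t_ox, with x measured from the gate electrode;
    1.0 places the charge at the SiO2/Si interface, where its effect is largest.
    """
    c_ox = EPS_R_SIO2 * EPS0 / t_ox_cm  # oxide capacitance per unit area, F/cm^2
    return -Q_E * ions_per_cm2 * centroid_fraction / c_ox

# Hypothetical contamination level of 1e11 Na+ ions/cm^2 for two oxide thicknesses.
for t_ox_nm in (100, 10):
    shift = delta_vt(1e11, t_ox_nm * 1e-7)
    print(f"t_ox = {t_ox_nm:>3} nm: delta V_T = {shift:.3f} V")
```

Because the shift scales with t_ox for a given charge density, the same contamination level is far more damaging in thick-oxide devices; the model also shows why ions drifting towards the SiO2/Si interface under bias make the shift grow over time.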

In bipolar devices, the forward current gain is the parameter most affected by changes in the base carrier concentration due to ionic contamination. Sodium ions are driven towards the silicon by the electric field at the collector junction of the npn transistor [Nicollian82].

In both MOS and bipolar devices, ionic contaminants can change the junction avalanche breakdown voltage and leakage currents.

Since ionic mobility is strongly influenced by electric field and temperature, typical screening techniques are high-temperature baking and high-temperature reverse-bias testing [Amerasekera97]. Electrical tests after burn-in must be carefully formulated, since recovery of ion-induced failures after burn-in has been observed [Bell80].

To inhibit the movement of mobile ions through the package and into the die, the chip can be coated with a protective or passivation layer. Materials such as silicon glass (silicon oxide), phosphosilicate glass, silicon nitride, borophosphosilicate glass and polyimide have been used as passivation layers [Amerasekera97].

#### B. Surface charge effect on isolation

Apart from the effects of mobile charge in the gate oxide, charge movement through the field oxide that isolates active diffusion regions from each other is also a reliability problem. Electron charge trapping and interface charge are the main contributors to this degradation, which results in a positive threshold shift and an increased source-to-drain leakage current [Tonti95].

An activation energy,  $E_a$ , of between 0.5 eV and 1.0 eV is generally accepted for this mechanism [Lycoudes80]. The strong temperature dependence means that an effective screen is either a high temperature storage bake at temperatures between 150 and 250 °C or a high temperature reverse-bias test between 125 and 250 °C.

The reliability effects of surface charging are usually observed in the wear-out region, although extreme cases may occur in the infant mortality stage.

#### C. Charge effects and I-V stability

The quality of the silicon dioxide has a significant influence on the reliability of MOS devices. Three failure mechanisms due to the injection of charge into the oxide are considered [Amerasekera97]:

- Slow trapping, or negative-bias temperature instability.
- Hot carrier injection, one of the most important charge-related mechanisms.
- Plasma damage, which is becoming more important with new process equipment.

##### (1) Slow Trapping

Slow trapping occurs either during the program/erase cycles of programmable memories or when high electric fields in the silicon give electrons sufficient energy to cross the $Si-SiO_2$ interface [Woods80]. The presence of permanently trapped charges at the oxide interface affects the threshold voltage of MOS devices. In bipolar transistors, the effect of slow trapping is a monotonic increase in the current gain [Yiqi91]. The mechanism has an activation energy of around 0.9 eV.

The amount of trapped charge can be reduced by high temperature (>150  $^{\circ}$ C) annealing in a hydrogen ambient. However, the only real solution is to limit the number of interface states by controlled growth of the gate oxide.

##### (2) Hot Carriers

Hot carriers are produced when carriers in the source-drain current flowing through the channel gain energies well above the thermal energy of the lattice. Some of these hot carriers gain sufficient energy to be injected into the gate oxide, resulting in charge trapping and interface state generation. These may shift the performance characteristics of the device, e.g. the threshold voltage, transconductance or saturation current, and eventually degrade it [Bernstein06].

Hot carriers are also a concern in advanced bipolar processes. The hot carriers are injected into the oxide in the region of the emitter-base junction which will change the bipolar parameters such as leakage current and gain [Maugain95, Witczak94].

Hot carriers in semiconductor devices cause a distinct wear-out mechanism, hot carrier injection (HCI). The rate of hot carrier degradation depends on the channel length, oxide thickness and operating voltage.

The common solution to hot carriers is to use process techniques that reduce the electric field at the drain junction of MOS transistors. In addition, VLSI circuits can be made more robust against hot carriers by identifying the circuit nodes that are sensitive to hot carrier stress [Mistry94].

##### (3) Plasma Damage

Plasma processes such as etching, deposition, cleaning and ion implantation can all damage the thin oxides in MOS devices. The damage is attributed to charging caused by plasma non-uniformity [Amerasekera97].

The most common effect observed is that of increased oxide leakage current after plasma damage takes place. In MOS devices, degradation of the $I_D$-$V_G$ characteristics is also observed, usually in the form of an increase in the threshold voltage and a decrease in the subthreshold slope [Lin96].

The simplest prevention is to identify the source of the plasma non-uniformity and neutralise its effect, though this is not easy. Advanced technologies increasingly use 'dry' plasma processing rather than 'wet' chemical processing. On-chip protection can be provided by using diodes to clamp the voltages across thin gate oxides [Amerasekera97].

### 3.5.4 Defects and piping

Dislocations and crystalline defects are typical defects in electronic devices. Crystal defects can create current paths, known as pipes, in bipolar and MOS transistors. The reliability issues of these two mechanisms are discussed below.

#### A. Defects

Dislocations are structural defects in the silicon lattice that can become electrically active through the presence of impurities; crystalline defects are of particular concern in epitaxial bipolar transistors. In advanced semiconductor technologies, these issues have become more of a process engineering concern than a reliability issue [Amerasekera97], so they are not discussed in detail here.

#### B. Piping

Typical pipe resistances are 10 k$\Omega$ to 30 k$\Omega$, but they can range from a few hundred ohms to a few M$\Omega$. The pipe resistance is linear at low voltages. Beyond a certain voltage, the resistance increases with current until the junction eventually breaks down [Amerasekera97].

A pipe with a low resistance can pass the output voltage and leakage current specification checks, yet still impair performance in the field. Screening techniques may be used to detect piping caused by crystal defects, while pipes caused by dislocations may be prevented by improved processing at the manufacturing stage [Amerasekera97].

Piping is a time-dependent failure mechanism that is not significantly temperature dependent, so it can be a problem in high-reliability applications [Amerasekera97].

### 3.5.5 Migrations

The previous mechanisms are all associated with the 'front end' of the semiconductor manufacturing process. Hence the relevant failures will principally affect the performance of the transistors. Electromigration, contact and via migration, and stress-induced migration are related to the 'back end' of semiconductor process. Such issues will generally have an effect on the long-term reliability of a circuit.

#### A. Metallization issues: Electromigration

The physics of electromigration (EM) can be explained simply as follows: at high current densities, the force exerted by electrons scattering off the positively charged metal ions becomes stronger than the electrostatic force pulling the ions toward the cathode. The diffusion of the ions is thus biased in the direction of electron flow, leading to electromigration. Its effects are characteristic of the material, so the activation energy for electromigration depends on the material type, the size and orientation of the grains, stress, temperature and even the length of the conductor [Bernstein06].

The degradation of the wire bond joint increases the current density in the bond contact areas that remain intact, thereby accelerating electromigration. Furthermore, bond degradation increases the bond contact resistance, causing higher heat dissipation at the bond. The resulting local temperature rise further enhances electromigration and thermomechanical fatigue, so the bond degradation mechanism is self-accelerating. Eventually the local temperature rise is large enough either to trigger the parasitic bipolar transistor of the DMOS (latch-up) or to initiate self-conduction in the silicon underneath the bond. The device then fails by thermal runaway, which destroys the junctions and results in the observed drain-source breakdown [Glavanovics04].

It is difficult to model the electromigration median time to failure (MTTF) from first principles. Although many competing models exist, none has been universally accepted. Currently, the most widely used method for predicting time to failure is an empirical, statistically based one given by Black's equation:

$$MTTF = A J_{e}^{-n} \exp\left(\frac{E_{a}}{kT}\right) \tag{28}$$

where $J_e$ is the current density, $E_a$ is the EM activation energy, $k$ is Boltzmann's constant and $T$ is the absolute temperature. $A$ is a constant that depends on a number of factors, including grain size, line structure and geometry, test conditions, current density and thermal history. Black found $n = 2$; however, $n$ depends strongly on residual stress and current density [Ohring 98], and its value is highly controversial.
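Because the constant $A$ cancels when two stress conditions are compared, Black's equation is mostly used to extrapolate accelerated-test results to use conditions. The Python sketch below computes the acceleration factor $MTTF_{use}/MTTF_{test}$; the values $n = 2$ and $E_a = 0.7$ eV, and the test and use conditions, are illustrative assumptions rather than data from the cited sources.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant k in eV/K


def black_acceleration_factor(j_test, j_use, t_test_k, t_use_k, ea_ev, n=2.0):
    """Ratio MTTF_use / MTTF_test from Black's equation (A cancels out).

    Current densities may be in any consistent unit; temperatures in kelvin.
    """
    current_term = (j_test / j_use) ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use_k - 1.0 / t_test_k))
    return current_term * thermal_term


# Hypothetical stress test: 2 MA/cm^2 at 200 C versus use at 0.5 MA/cm^2 and 85 C
af = black_acceleration_factor(j_test=2e6, j_use=0.5e6,
                               t_test_k=473.15, t_use_k=358.15, ea_ev=0.7)
print(f"acceleration factor ~ {af:.0f}")
```

A lifetime measured under the test conditions would be multiplied by this factor to estimate the lifetime in use; the result is very sensitive to the assumed $E_a$ and $n$.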

## **B.** Contact Migration

Contact migration is the interdiffusion of the contact metal and silicon under current or temperature stress. The activation energy for the out-diffusion of silicon at a contact is between 0.8 and 0.9 eV, which agrees with the activation energy for the diffusion of silicon in aluminium.

Contact migration is one of the major problems in GaAs devices [Christou94 cited by Amerasekera97]. The use of better alloys should prevent it.

## C. Via Migration

Via migration occurs at metal-to-metal contacts, or vias. Both open circuits and resistance increases have been reported, depending on the direction of electron flow and the type of metal system used [Oates94].

It has been shown that incorporating a thin titanium interlayer between the via and the aluminium metallization significantly improves via reliability [Graas94]. Improved step coverage in the vias, elimination of oxygen residues and silicon nodules, and increased thickness of the upper metal line have all been shown to improve via reliability [Amerasekera97].

## **D.** Stress induced migration

During wafer manufacturing, the deposition of the different layers and their associated shrinkage can produce large mechanical stresses in the individual layers. These stresses provide a high driving force for creep voids, which are a significant reliability problem for narrow lines [Jones87].

The silicon and copper doping content, the substrate temperature, the grain size and the heat treatment need to be optimised together with the passivation technique to control this mechanism [Amerasekera97].

## **3.5.6** Microcracks and step coverage

The topography of the wafer surface can result in large steps, as shown in Figure 12. These can lead to enhanced susceptibility to electromigration due to high current densities, or to open circuits due to cracking of the metallization [Van de Pol 96].

Poor step coverage can therefore cause increased interconnect resistance, open circuits or electromigration-type damage. By controlling the aspect ratio across a wafer and reducing the step size, step coverage close to 100% can be ensured, preventing failures caused by this mechanism.
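A rough illustration of why step coverage matters, assuming the line width is unchanged at the step (so the current density scales inversely with the local metal thickness) and applying Black's equation (28) with $n = 2$ at constant temperature. The line dimensions and current are hypothetical, and the extra Joule heating at the thinned step, which would shorten the lifetime further, is ignored:

```python
def current_density(current_a, width_um, thickness_um):
    """Current density in A/cm^2 for a rectangular line (dimensions in micrometres)."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    return current_a / area_cm2


def relative_em_lifetime(step_coverage, n=2.0):
    """Lifetime at the step relative to the flat line, from Black's equation
    at constant temperature: MTTF ~ J**(-n) and J ~ 1 / coverage."""
    if not 0.0 < step_coverage <= 1.0:
        raise ValueError("step coverage must be in (0, 1]")
    return step_coverage ** n


# Hypothetical 1 um x 0.5 um aluminium line carrying 1 mA
flat_j = current_density(1e-3, width_um=1.0, thickness_um=0.5)
step_j = flat_j / 0.5  # 50 % step coverage halves the metal thickness at the step
print(f"J flat = {flat_j:.2e} A/cm^2, J step = {step_j:.2e} A/cm^2")
print(f"relative lifetime at 50 % coverage: {relative_em_lifetime(0.5):.2f}")
```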



Figure 12: Examples of Structures with potential for Microcracks [Amerasekera 97] (a) Step at the edge of a thick oxide; (b) Undercutting of a thin oxide

## 3.5.7 Radiation

Semiconductor devices may be exposed to two types of radiation [Amerasekera97]:

- (1) External radiation: $\gamma$-rays, cosmic rays or x-rays from the operating environment. Most investigations have used accelerated laboratory tests to simulate radiation effects. This type of radiation is a very important issue for aerospace applications.
- (2) Intrinsic radiation: radioactive impurities, such as thorium or uranium, are present in packaging materials and can emit $\alpha$ particles with energies up to 8 MeV. One solution is to shield the chip against $\alpha$ particles, for example by coating it with polyimide.

## 3.5.8 Package and assembly

The main functions of a package are:

- To enable the chip to be connected to external circuitry;
- To provide a sealed environment that protects the chip against moisture, contamination, particles, radiation and so on;
- To provide a heat sink for the chip.



Figure 13: Cross-section of a moulded plastic package [Amerasekera 97]

A cross-section of a moulded plastic package is shown in Figure 13. The lead frame consists of the pins that make contact with the outside. The die is attached to the lead frame, and very thin bond wires connect the die to the lead frame. Dies are encapsulated in either ceramic or plastic packages.

Packaging issues relevant to semiconductor devices in general are outlined in this subsection; those specific to power electronics are reviewed in Section 4.6. Detailed analyses of packaging and the related reliability of microelectronic components are presented in [Tummala89].

Table 5 lists the electrical signatures and physical failure mechanisms for package and assembly [Tummala89]. Failure mechanisms associated with the packaging process can be separated into electrochemical, thermal and mechanical mechanisms. Specifically, four major failure mechanisms can be identified:

- Die attach failures;
- Bonding failures;
- Delamination and popcorn effect;
- Corrosion.

Each of these is discussed in more detail in the following paragraphs.

| Electrical signature | Failure mechanism |
|----------------------|-------------------|
| Open circuit | Wire bond defect |
| | Kirkendall voids (induced in a diffusion couple between two metals with different interdiffusion coefficients) |
| | Corrosion |
| | Fatigue crack |
| Short circuit | Dendrite formation |
| | Bond wire displacement |
| | Dielectric breakdown |
| | Overheating |
| Noisy circuit | Unstable contacts |
| | Fracture |
| | Variable temperature distribution |

Table 5: Electrical signatures and failure mechanisms for package and assembly [Tummala89, Amerasekera 97]

## A. Die Attach failures – solder fatigue

Failure mechanisms for the die attachment can be separated into two categories:

- (a) Failures related to the integrity of the die attach, including voids in the die attach and cracking due to mechanical stress at high temperature. Voids in the die attach increase the thermal resistance, which can result in cracking during temperature cycling [Koziarz95].
- (b) Corrosion due to contamination and moisture introduced by the lead frame or the adhesive.

The resulting failure modes are burn-out, parameter shifts or corrosion-related failures. Additionally, cracks in the die caused by excessive mechanical or thermal stress are catastrophic. Many die attach failures are observed during thermal cycling and highly accelerated stress testing (HAST) [Amerasekera97].

Detection techniques for poor die attach are highly sophisticated: scanning acoustic microscopy (SAM) and X-ray inspection have both been demonstrated to be very effective in detecting voids [Lee89].

Die attach integrity is strongly affected by temperature because of the thermomechanical stress caused by the difference in CTEs. Values of typical CTEs are given in Table 4. Temperature cycling tests, used in conjunction with mechanical shock or vibration tests, make it possible to screen out devices prone to die attach failures. MIL-STD-883 incorporates these tests in its required screening procedure for high-reliability military components [Amerasekera97].
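A first-order estimate of the cyclic shear strain in a die-attach layer shows how the CTE difference, the temperature swing, the die size and the bondline thickness combine. The sketch assumes approximate textbook CTEs (silicon about 2.6 ppm/K, copper about 17 ppm/K) rather than the values of Table 4, a hypothetical 10 mm square die on a 100 µm solder layer, and a rigid die and substrate, so it overestimates the strain:

```python
def die_attach_shear_strain(cte_die_ppm, cte_substrate_ppm, delta_t_k,
                            dnp_mm, bondline_um):
    """First-order shear strain in a die-attach layer.

    gamma ~ DNP * |delta_alpha| * delta_T / h, where DNP is the distance from
    the neutral point (die centre) to the die corner and h the bondline
    thickness. Ignores the elastic compliance of die and substrate (upper bound).
    """
    delta_alpha = abs(cte_substrate_ppm - cte_die_ppm) * 1e-6  # 1/K
    return dnp_mm * 1e-3 * delta_alpha * delta_t_k / (bondline_um * 1e-6)


# Approximate CTEs (assumed): silicon ~2.6 ppm/K, copper ~17 ppm/K.
# DNP of a 10 mm square die is half its diagonal, ~7.07 mm.
gamma = die_attach_shear_strain(2.6, 17.0, delta_t_k=100.0,
                                dnp_mm=7.07, bondline_um=100.0)
print(f"estimated shear strain per cycle: {gamma:.3f}")
```

The strain grows linearly with the temperature swing and the die size and falls with bondline thickness, which is why large dies and thin solder layers are the most vulnerable to temperature cycling.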

Corrosion issues are addressed by contamination control both in the die attach material and during processing. The cleanliness of the die/leadframe during assembly is important. Techniques which reduce the exposure of the chip to high temperatures prior to moulding will help to minimise leadframe oxidation [Tummala89].

## **B.** Bonding failures

Gold is the preferred metal for bond wires because of its corrosion resistance and ease of bonding. Gold wire may also contain 1% copper to add stiffness. Aluminium wires containing 1% silicon are also used.

The bond between the chip and the package may be made with either a wedge bond or a ball bond. Wedge bonding is usually purely ultrasonic, since no heat is required; however, ultrasonic techniques are more time-consuming and therefore more expensive than thermosonic techniques. Aluminium wires are bonded purely ultrasonically because of the difficulty of forming ball bonds with aluminium. Ball bonding is the most commonly used method in IC packages because of the shorter bonding times required.

There are six main causes of bonding failure:

- (a) Formation of intermetallics due to gold-aluminium interdiffusion. Au<sub>2</sub>Al formation is necessary to ensure a good contact. However, if the alloying is not controlled, purple plague, a gold-aluminium intermetallic (AuAl<sub>2</sub>), forms and causes bond failures [Ritz87]. Voids are also observed to form at the base of the bond wire because of the different interdiffusion rates of Al and Au. These voids undermine the quality of the contact at the base of the bond and can lead to bond lifts [Chang04].
- (b) Bond looping and sagging. If the loop is too tight, the tension in the wire is high and the wire tends to fracture; if it is too loose, the wire is free to move and may short-circuit with adjacent wires.
- (c) Bond integrity. Considerable improvements in bond integrity have been observed with better surface cleaning using, for example, plasma methods [Christou94].
- (d) Whisker growth occurs at the bond pad to relieve compressive stress.
- (e) Wire sweep occurs during the moulding process; limiting the wire length limits the extent of wire sweep.
- (f) Bonding pressure affects bond integrity: low bonding pressure results in low fracture strength at the neck and heel.

#### Effects

The most common failure mode observed is open circuits due to bond lifts. The formation of intermetallics can result in high resistances [Shirley87]. Wire sweep and whisker growth can cause short circuits between adjacent wires. Excessive tension or bonding pressure problems can lead to fractures and consequently open circuits. Although the heel is usually expected to be the first part of the bond to give way, breaks due to tensile stress are also quite common.

Thinning of the bond wire by oxidation, especially with aluminium wire, can cause localised heating, leading to melting and eventually an open circuit [King89].
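Why thinning is self-reinforcing can be seen from the Joule heating per unit length, $P' = I^2 \rho / A$: for a fixed current it grows with the inverse square of the wire diameter. A minimal sketch assuming a hypothetical 300 µm aluminium wire carrying 10 A and an approximate room-temperature resistivity for aluminium:

```python
import math

RHO_AL_OHM_M = 2.65e-8  # approximate resistivity of aluminium at 20 C (assumed)


def joule_heat_per_length(current_a, diameter_um, rho_ohm_m=RHO_AL_OHM_M):
    """Joule heating per metre of wire, P' = I^2 * rho / A, in W/m.

    Ignores the temperature dependence of rho, which in practice makes
    thinning worse (hotter wire -> higher resistance -> more heat).
    """
    area_m2 = math.pi * (diameter_um * 1e-6) ** 2 / 4.0
    return current_a ** 2 * rho_ohm_m / area_m2


nominal = joule_heat_per_length(10.0, 300.0)  # hypothetical 300 um Al wire at 10 A
thinned = joule_heat_per_length(10.0, 270.0)  # 10 % loss of diameter by oxidation
print(f"{nominal:.2f} W/m -> {thinned:.2f} W/m (x{thinned / nominal:.2f})")
```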

Bonding failures are typically wear-out failures on the bathtub curve. Bonding defects that cause infant mortality should be detected by tests before the devices reach the customer.

#### Prevention

Defective bonds are screened by means of a bond pull or bond shear test [Tummala89]. The test is usually performed after a burn-in at 250°C for 48 hours, as described in MIL-STD-883 method 2023. Thermal shock, mechanical shock and excess vibration are also recommended as screens for weak bonds.

The use of bond metals other than gold to eliminate purple plague has been reported. Alloys of aluminium are now viable alternatives to gold in some applications.

Design approaches are also important for limiting bond failures. To ensure good ball-bond strength, military specifications require a minimum ball diameter of three times the wire diameter. Other dimensional requirements, such as preventing the ball bond from overlapping the edge of the bond pad, are also important.

## C. Delamination and the popcorn effect

Delamination is essentially a moisture-related failure mechanism but may also be initiated by thermal mismatch [Moore92, Hu07]. Plastic packages can absorb nearly 0.4% moisture by weight from the air, and additional water can enter through cracks formed by the thermal mismatch between the lead frame, die and encapsulant. During temperature cycling, the moisture at these interfaces can expand and cause various failures, the most typical of which is known as the 'popcorn' effect because of the sound made by the cracking [Moore92].

Delamination may also occur due to mismatch in the CTEs of the die and the encapsulant materials. Such delamination may result in increased susceptibility to moisture ingress and corrosion as well as bond shearing [Amerasekera97].

The best solution is to prevent moisture from reaching the package. This is achieved through moisture sensitivity classification [Moore92], in which devices are assigned to different levels with different application and storage requirements. CTE matching is always an issue. Methods to deter package cracking using improved moulding compounds, polyimide coating on the back side of the die and improved lead frame design have been studied [Omi91].

## **D.** Corrosion

Corrosion requires the presence of moisture and is enhanced by ionic catalysts, for example, flame retardants. Failure is usually due to an increase in the metallization resistance and eventual open circuit, or an increase in the leakage current between adjacent tracks of metallization due to migration or whisker growth. Corrosion is typically a wear-out mechanism with reference to the bathtub curve [Amerasekera97].

## 3.6 Summary

This section has reviewed general failure mechanisms for electronics such as capacitors, PCBs and semiconductor devices. These mechanisms have been studied for many years and provide a solid foundation of experience and knowledge for exploring the failure mechanisms of power electronics, which are reviewed in the next section.

# 4. Failure Mechanisms for power devices

Eight failure mechanisms for semiconductor devices were reviewed above. For power devices, some of them may be less important because of the much larger physical dimensions and the different electrical and thermal stresses of high-power applications. The failure mechanisms of power devices, namely power diodes, bipolar transistors, thyristors (IGCTs and GTOs), MOSFETs and IGBTs, are reviewed first, followed by packaging issues.

## 4.1 Power diodes

Leakage effects are a major concern because the diode is a bistable device whose 'off' and 'on' states are distinguished by the terminal current:

- (1) Ionic contaminants affect the surface conductivity and can cause a gradual increase in leakage current.
- (2) A second issue is depletion-region walk-in and walk-out. Electron injection creates a negative surface charge that reduces the surface electric field, increasing the breakdown voltage (walk-out), whereas a positive charge enhances the surface electric field, leading to walk-in [Pepper73, Brisbin04, Ohring98].

Surface states, traps and ionic contamination affect the diode parameters, so the junction depletion region needs to have a low defect density [Amerasekera97].

The variation of the energy bandgap is directly related to diode performance and can be caused by:

- (1) Mechanical stress may lower the bandgap and reduce the knee voltage;
- (2) Migration of dopant impurities can lead to short circuits and possible burn-out of the junction. Contact migration is a particular problem with the Schottky diodes [Amerasekera97].

Forward-biased diodes have very good ESD capability, limited only by the power dissipated. However, diodes are prone to electrical overstress damage and thermal breakdown, especially clamping diodes such as Zener diodes.

Previous research shows that the leakage current, breakdown voltage and knee voltage can be used as signatures for reliability diagnosis. The leakage current and breakdown voltage can be measured with instruments such as the M3 Semiconductor Analyzer [www11].

## 4.2 Bipolar junction transistors (BJTs)

Bipolar technologies have the following failure mechanisms, most of which are common to all applications [Amerasekera97]:

- (1) Electrical stress (EOS, ESD and latchup).
- (2) Junction degradation and hot carrier injection.
- (3) Contamination
- (4) Corrosion
- (5) Electromigration and contact migration
- (6) Stress relief migration
- (7) Diffusion defects and piping
- (8) Insulating oxide breakdown (similar to passivation defects)
- (9) Radiation
- (10) Microcracks
- (11) Packaging

Nearly all of the failure mechanisms reviewed in Section 3 affect bipolar transistors. Power BJTs have low current gain, especially at higher breakdown voltage ratings. This has led to the development of monolithic Darlington transistors [Mohan95], which introduce further reliability concerns. In some power applications, BJTs have therefore been replaced by other devices, such as MOSFETs and IGBTs.
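The attraction of the Darlington configuration follows from its composite current gain, $\beta = \beta_1\beta_2 + \beta_1 + \beta_2$. A short Python sketch with hypothetical per-stage gains:

```python
def darlington_gain(beta_driver, beta_output):
    """DC current gain of a two-stage Darlington pair.

    The driver's emitter current (beta1 + 1) * Ib drives the output base, so
    Ic = beta1*Ib + beta2*(beta1 + 1)*Ib = (beta1*beta2 + beta1 + beta2) * Ib.
    """
    return beta_driver * beta_output + beta_driver + beta_output


# Hypothetical high-voltage power BJT with a gain of only ~10 per stage
print(darlington_gain(10, 10))  # prints 120
```

Two stages with a gain of 10 each give an overall gain of 120, at the cost of a second transistor on the die and the additional reliability concerns mentioned above.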

## 4.3 Thyristors (SCRs)

Silicon controlled rectifiers (SCRs) are subject to the same failure mechanisms as bipolar devices and PN diodes. SCRs are widely used in electrical power systems and other high-power applications, and are required to [Amerasekera97]:

- (1) Block large forward and reverse voltages with negligible current flow in the off-state.
- (2) Be able to conduct high current without an increase in on-resistance.
- (3) Be able to switch on and off as fast as possible.

These requirements make the following mechanisms important [Amerasekera97]:

- (1) Junction integrity. The requirement for voltage blocking makes junction integrity important. Mechanisms that cause depletion-region walk-in and walk-out affect the off-state current of SCRs under reverse bias. Current gain degradation can also occur due to trapped interface charge or hot carrier injection.
- (2) Fast voltage and current ramps. During switching, a fast di/dt can cause non-uniform current distribution, which increases power dissipation and can eventually cause thermal damage. Second breakdown is therefore a concern for SCRs and defines a safe operating area (SOA) for device operation. A fast-rising voltage ramp (high dv/dt) can cause a displacement current that spuriously triggers the device on [Sze81].
- (3) Latch-up is another major failure mechanism for SCRs. The trigger currents for parasitic thyristor action must be far above the maximum current levels of the device [Shekar91].
- (4) High temperatures increase the current gains, making the device difficult to switch off. The quality of the heat sinking therefore plays an important role in SCR behaviour.
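The dv/dt mechanism in item (2) can be quantified with the displacement current through the junction capacitance, $i = C_j\,dv/dt$. The sketch below compares it with the gate trigger current; the capacitance and trigger current are hypothetical, and treating the whole displacement current as equivalent gate current is a simplifying assumption:

```python
def displacement_current_a(junction_capacitance_f, dv_dt_v_per_us):
    """Displacement current i = C_j * dv/dt through the blocking junction (A)."""
    return junction_capacitance_f * dv_dt_v_per_us * 1e6  # V/us -> V/s


def spurious_turn_on_risk(junction_capacitance_f, dv_dt_v_per_us, gate_trigger_a):
    """Crude check: does the displacement current reach the gate trigger current?"""
    return displacement_current_a(junction_capacitance_f, dv_dt_v_per_us) >= gate_trigger_a


# Hypothetical values: C_j = 100 pF, gate trigger current 50 mA
for rate in (200.0, 1000.0):
    i_d = displacement_current_a(100e-12, rate)
    risk = spurious_turn_on_risk(100e-12, rate, 0.05)
    print(f"{rate:6.0f} V/us -> {i_d * 1e3:5.1f} mA, risk of false triggering: {risk}")
```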

The breakdown voltage and leakage current can be used as signatures to monitor SCR condition. Good circuit design, for example limiting di/dt and dv/dt, is very important in SCR applications to prevent failures caused by fast voltage and current ramps.

## 4.4 Power MOSFETs

Failure mechanisms common to both power MOS devices and low-power MOS devices are gate oxide failures, ionic contamination, electromigration and corrosion. Because the gate oxides and gate lengths are relatively large, hot carrier problems are negligible [Amerasekera97].

#### Gate Oxide Failure

Tests indicate that power devices are affected by time-dependent oxide breakdown. Even when the applied gate voltage does not exceed the maximum intrinsic voltage of the gate oxide, the electric field may exceed 5 MV/cm, which reduces the median lifetime [Amerasekera97]. Particularly vulnerable periods occur during the switching of inductive circuits, when overvoltages or rapid reapplication of voltage (excessive dV/dt) may occur.
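The average oxide field is simply $E = V_{GS}/t_{ox}$, so a moderate overshoot during inductive switching can push a gate oxide past 5 MV/cm. A minimal sketch with a hypothetical 50 nm gate oxide:

```python
def oxide_field_mv_per_cm(gate_voltage_v, oxide_thickness_nm):
    """Average electric field across the gate oxide, E = V / t_ox, in MV/cm."""
    thickness_cm = oxide_thickness_nm * 1e-7
    return gate_voltage_v / thickness_cm / 1e6


# Hypothetical 50 nm gate oxide: nominal 15 V drive and a 30 V switching overshoot
for v_gs in (15.0, 30.0):
    field = oxide_field_mv_per_cm(v_gs, 50.0)
    print(f"V_GS = {v_gs:4.1f} V -> E = {field:.1f} MV/cm, above 5 MV/cm: {field > 5.0}")
```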

#### ESD

Power MOS devices are still sensitive to ESD [Duvvury94, Grant89], especially LDMOS transistors, which have a large spacing between the highly doped drain region and the channel. Gate shorts were a common VDMOS failure, usually caused by excessive voltage applied to the gate either in circuit or during handling. Although power MOS devices are expected to support DC voltages in excess of 100 V and approaching 1000 V, the current levels are not as high as those injected during ESD [Amerasekera97].
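Handling damage to the gate can be illustrated with the human body model (HBM), the standard ESD test model in which a 100 pF capacitor is discharged through 1.5 kΩ. If an unprotected gate absorbs the HBM charge, charge conservation gives the final gate voltage. The sketch assumes a hypothetical 2 nF input capacitance and ignores leakage and any protection clamps:

```python
def gate_voltage_after_hbm_zap(precharge_v, gate_capacitance_f, c_hbm_f=100e-12):
    """Final gate voltage after the HBM capacitor shares its charge with an
    unprotected gate: V = C_hbm * V0 / (C_hbm + C_gate).

    The series 1.5 kOhm resistor only slows the charge transfer; it does not
    change the final voltage. Leakage and protection clamps are ignored.
    """
    return c_hbm_f * precharge_v / (c_hbm_f + gate_capacitance_f)


# 2 kV HBM zap into a hypothetical power MOSFET gate with ~2 nF input capacitance
v_gate = gate_voltage_after_hbm_zap(2000.0, 2e-9)
print(f"gate voltage after charge sharing: {v_gate:.0f} V")
```

Even with the large input capacitance of a power device, a modest 2 kV zap leaves roughly 95 V on the gate, well above the gate voltage ratings of typical power MOSFETs, which is consistent with the gate shorts observed after handling.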

#### SOA

For power devices, the safe operating area (SOA) is an important specification. The forward-biased SOA of a power MOSFET is shown in Figure 14 [Amerasekera97]. SOA failures are normally caused by one or more excessive thermal transients or by poor heat sinking [Grant89].



Figure 14: Safe operating area of a power MOSFET. The boundaries are determined by (A) on-resistance; (B) current; (C) thermal; (D) second breakdown; (E) voltage limits.
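The boundaries of Figure 14 that can be expressed analytically (A, B, C and E) are easy to check in software, for example when screening logged operating points. A minimal sketch with hypothetical device ratings; the second-breakdown boundary (D) requires measured data and is not modelled:

```python
from dataclasses import dataclass


@dataclass
class MosfetSoaLimits:
    """Hypothetical DC forward-biased SOA limits (boundaries A, B, C and E)."""
    r_ds_on_ohm: float      # (A) on-resistance limit: I <= V / R_DS(on)
    i_max_a: float          # (B) current limit
    tj_max_c: float         # (C) thermal limit via P <= (Tj,max - Tc) / Rth
    rth_jc_k_per_w: float
    v_max_v: float          # (E) breakdown voltage limit

    def max_current(self, v_ds, t_case_c):
        """Largest DC drain current allowed at v_ds; second breakdown (D) is not modelled."""
        if v_ds <= 0 or v_ds > self.v_max_v:
            return 0.0
        p_max = (self.tj_max_c - t_case_c) / self.rth_jc_k_per_w
        return min(v_ds / self.r_ds_on_ohm, self.i_max_a, p_max / v_ds)

    def is_safe(self, v_ds, i_d, t_case_c=25.0):
        return i_d <= self.max_current(v_ds, t_case_c)


limits = MosfetSoaLimits(r_ds_on_ohm=0.05, i_max_a=40.0, tj_max_c=150.0,
                         rth_jc_k_per_w=0.5, v_max_v=200.0)
print(limits.max_current(10.0, 25.0))  # thermal limit: 250 W / 10 V = 25 A
print(limits.is_safe(100.0, 5.0))      # 500 W exceeds the 250 W allowed -> False
```

Raising the case temperature shrinks the thermal boundary (C), which is why poor heat sinking appears as an SOA failure.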

#### Leakage current caused by mobile ions

The movement of mobile ions through the passivation can increase junction leakage, causing the transistor to overheat until thermal runaway ensues [Grant89]. The leakage current also reduces the junction breakdown voltage, and avalanche failures occur when excessive avalanche current is drawn during unclamped inductive switching.
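The energy that the device must absorb in avalanche during unclamped inductive switching is commonly estimated as $E_{AS} = \tfrac{1}{2} L I^2 \, V_{BR}/(V_{BR} - V_{DD})$. The sketch below, with a hypothetical load and supply, shows that a degraded (lower) breakdown voltage increases the avalanche energy for the same load:

```python
def uis_avalanche_energy_j(inductance_h, peak_current_a, v_br_v, v_dd_v):
    """Energy dissipated in avalanche during unclamped inductive switching.

    E = 0.5 * L * I^2 * V_BR / (V_BR - V_DD): the stored inductor energy plus
    the energy delivered by the supply while the current ramps down.
    """
    if v_br_v <= v_dd_v:
        raise ValueError("breakdown voltage must exceed the supply voltage")
    return 0.5 * inductance_h * peak_current_a ** 2 * v_br_v / (v_br_v - v_dd_v)


# Hypothetical 1 mH load switched off at 10 A from a 48 V supply
for v_br in (120.0, 100.0):  # nominal versus degraded breakdown voltage
    e_as = uis_avalanche_energy_j(1e-3, 10.0, v_br, 48.0)
    print(f"V_BR = {v_br:.0f} V -> E_AS = {e_as * 1e3:.1f} mJ")
```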

## 4.5 IGBTs

The IGBT is one of the most complicated power devices and can be regarded as a combination of a power MOSFET and a power diode or bipolar transistor. The failure mechanisms discussed above for power MOSFETs, power bipolar transistors and power diodes all affect the reliability of a power IGBT. IGBTs normally operate at medium frequencies, from a few kHz to a few tens of kHz, and at medium power ratings, so they also have their own failure characteristics:

#### **EOS failures:**

(1) Tail current is a characteristic feature of IGBTs, and attention is generally paid to turn-off. The failure mechanisms of IGBTs under short-circuit and clamped inductive stresses are studied in [Trivedi99]. Short-circuit failure, even in a latch-up-free IGBT, is caused by the competing mechanisms of parasitic thyristor latch-up and thermally assisted carrier multiplication.

Heat removal is less efficient under short-circuit stress because the hot spot is located away from the metallic contact, and the presence of the gate oxide further reduces the robustness of the device. Short-circuit operation is the most stressful condition.

(2) In half-bridge and full-bridge modules, a fast turn-on caused by a small gate resistance results in a high dv/dt, which may trigger latch-up of the parasitic thyristor structure. For the same reason, IGBTs are sensitive to deep voltage swings, so the IGBT protection circuit must be optimised to avoid EOS failures. The gate drive circuit affects the switching performance of IGBT modules: a small gate resistance gives high switching speed and low switching loss, but a high surge voltage during switching. The trade-off between switching speed and reliability is therefore very important in IGBT application design [Wu 98].
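The surge-voltage side of this trade-off can be estimated as $V_{peak} \approx V_{DC} + L_\sigma\, di/dt$. In the sketch below, di/dt is approximated as $I/t_f$, and the link between a smaller gate resistance and a shorter fall time is taken as a qualitative assumption; the DC-link voltage, load current and stray inductance are hypothetical:

```python
def turn_off_peak_voltage(v_dc, i_load_a, stray_inductance_h, fall_time_s):
    """Approximate collector-emitter peak at turn-off: V_dc + L_sigma * di/dt.

    di/dt is approximated as I / t_f (linear current fall); tail current,
    diode recovery and snubbers are ignored.
    """
    di_dt = i_load_a / fall_time_s
    return v_dc + stray_inductance_h * di_dt


# Hypothetical half-bridge: 600 V DC link, 300 A, 50 nH commutation loop
for t_f in (200e-9, 100e-9):  # a smaller gate resistance gives a shorter fall time
    v_peak = turn_off_peak_voltage(600.0, 300.0, 50e-9, t_f)
    print(f"t_f = {t_f * 1e9:.0f} ns -> V_peak ~ {v_peak:.0f} V")
```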

#### Partial discharge

Partial discharge (PD) is a partial breakdown of an insulating material; a small void in a ceramic is one example of a PD source [Schutze98]. In IGBT modules, insulation is provided by an AlN (aluminium nitride) layer interposed between the copper baseplate and the copper layer beneath the chips. Some variation in the PD inception voltage was found after power cycling [Fratelli99], which may lead to insulation failure.

#### SOA

The SOA issue of IGBTs is very complicated. It is related to the IGBT structure and can be optimised by improving the p-well layer [Mochizuki02] and the gate driver circuit [Luo98, Wu98].

The thermal destruction of an n-channel punch-through IGBT in the forward-biased SOA has been studied; it was caused by the thermal disappearance of the built-in potential of the p-n junction between the n+ emitter and the p base. Electrical destruction, in contrast, is caused by impact ionisation at the n- drift/n+ buffer and n- drift/p base junctions [Hagino 96]. Similar destruction was reported in [Benmansour07], where thermal runaway was identified as the thermal destruction mechanism.

Package limitations, mainly solder fatigue and bonding, have been widely studied since IGBTs were first produced. IGBT package reliability issues, mainly solder fatigue and bond wire lift-off, are summarised in Section 4.6.

## 4.6 Packaging

The IGBT is a very important type of power device. Both multi-chip plastic modules and press-pack IGBTs are in use; the press-pack package was borrowed from the GTO and IEGT technology used in high-power applications [Omura03]. The reliability of these two packaging technologies is introduced and compared below.

Two main failure mechanisms of IGBT modules are then reviewed. High-power IGBT module technology has some weaknesses. The first identified weakness is bond attach reliability, i.e., bond wire lift-off after thermal or electrical cycling. The second is solder attach reliability, i.e., cracking of the solder layer, also called solder fatigue [Coquery99].

Finally reliability issues caused by the lead-free technology for power modules will be reviewed.

## 4.6.1 Press pack and power modules

There are two main package technologies for the IGBT: the plastic module IGBT (PMI) and the press-pack IGBT (PPI), shown in Figures 15 and 16 respectively. IGBT plastic modules are widely used in all areas of power electronics. Press-pack IGBTs are used in traction [Golland05], motor drives [Bellamy04, Bernard07, Janning07, Jacob07], power system transmission and distribution (T&D) [Gunturi06, Kaufmann02, Eicher04] and pulse applications [Welleman03].

Press-pack IGBTs use the same semiconductor chips as wire-bonded flat-pack IGBT modules, so all the fast switching, protection and control capabilities of the PMI are retained. This package offers a superior set of benefits compared with all other fast-switching semiconductors, including IGCTs [Bellamy04]. Because they have the same style of package, PPIs have been used to refurbish SCR/GTO-based inverters in traction applications [Golland05].

For very high voltage applications such as power transmission systems, IGBTs are connected in series to form a compact stack. If one IGBT in such a stack fails in short-circuit mode, the system can continue to operate in many cases; if it fails in open-circuit mode, the system stops working. Because of its press-contact structure, the PPI basically fails in short-circuit mode, whereas the conventional IGBT module (PMI) basically fails open. The PPI therefore has an advantage in very high voltage series-connected applications and is much more suitable than conventional IGBT modules for high-power and/or high-voltage applications, since it allows the system to be both compact and reliable [Uchida00].
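The advantage of the short-circuit failure mode can be quantified with a simple k-out-of-n model. The sketch assumes independent device failures with a hypothetical probability $p$ over the period considered and a hypothetical stack of 10 devices plus one redundant device; a device failing short is bypassed, whereas a device failing open breaks the series chain:

```python
from math import comb


def stack_survival_probability(n_devices, n_redundant, p_fail, fails_short=True):
    """Probability that a series IGBT stack is still operational.

    Assumes independent failures with probability p_fail per device over the
    period considered. A device that fails short is bypassed, so the stack
    survives up to n_redundant failures; a device that fails open breaks the
    series chain, so any single failure is fatal.
    """
    if not fails_short:
        return (1.0 - p_fail) ** n_devices
    return sum(comb(n_devices, k) * p_fail ** k * (1.0 - p_fail) ** (n_devices - k)
               for k in range(n_redundant + 1))


# Hypothetical stack: 10 devices needed plus 1 redundant, p_fail = 1 %
print(f"short-circuit failure mode: {stack_survival_probability(11, 1, 0.01):.4f}")
print(f"open-circuit failure mode:  {stack_survival_probability(11, 1, 0.01, fails_short=False):.4f}")
```

With these assumptions, a single redundant device raises the survival probability from about 0.90 to above 0.99, but only if failed devices short rather than open.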



Figure 15: Plastic IGBT module. (a) 1.7 kV Dynex IGBT module; (b) Scheme of an IGBT module [Held97]



Figure 16: Press-pack IGBT. (a) 2.5 kV Toshiba press-pack IGBT; (b) Scheme of a press-pack IGBT [Ye02]

Compared with the PMI, the PPI can be double-side cooled, which improves the cooling of the semiconductor chips. This structure brings the following merits [Bellamy04]:

- Enables higher current ratings to be achieved;
- Eliminates the need for the wire bonds used in the PMI;
- Gives a very significantly improved thermal cycling lifetime rating.
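The benefit of double-side cooling can be seen by treating the two heat paths as parallel thermal resistances. A minimal sketch with hypothetical values (lumped, steady-state, equal resistance on each side); halving $R_{th}$ halves the junction temperature rise, or allows roughly twice the losses for the same rise:

```python
def parallel_thermal_resistance(r_paths_k_per_w):
    """Combined junction-to-heatsink resistance of parallel heat paths (K/W)."""
    return 1.0 / sum(1.0 / r for r in r_paths_k_per_w)


# Hypothetical: 2 kW of losses, 0.04 K/W per cooling side
single = parallel_thermal_resistance([0.04])
double = parallel_thermal_resistance([0.04, 0.04])
print(f"junction rise: single-sided {2000 * single:.0f} K, double-sided {2000 * double:.0f} K")
```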

In the press pack, molten Al attacks the Mo during the formation of the conducting alloy and at later stages of operation, forming cracks in the base plate. Over time, the interaction of Mo particles with Si and Al leads to the formation of various intermetallics in the conducting alloy, reducing its conductivity. Ohmic heating increases with the volume fraction of intermetallics, leading to further deterioration and eventually to failure by oxidation of the conducting alloy [Gunturi06].

Solder fatigue and bond wire lift-off are not major problems for the PPI, but they are for the PMI; both are reviewed in the following subsections.

## 4.6.2 Bond wire lift-off

Bond wire lift-off is caused by thermomechanical stress during power and thermal cycling. Since the 1980s, research on thermal fatigue due to these cycles has focused on two main packaging-level failure mechanisms: wire bond lift-off and solder delamination [Wu95, Held97, Hamidi01]. The former is reviewed in this subsection and the latter in the next. An example of bond wire lift-off is shown in Figure 17.



Figure 17: An example for bond wire lift off [Johnson05]

Bond wire lift-off is mainly caused by crack growth induced by the thermomechanical stress that results from temperature swings and the different coefficients of thermal expansion (CTE) of silicon and aluminium [Held97]. In thermal fatigue testing with high $\Delta T_j$, cracks propagated from both ends of Al wire bonds towards the centre along the small grain boundaries of the Al wires because of thermal tensile stress [Onuki00]. The difference in grain size is assumed to be caused by the wedge bonding process. When a crack reaches the centre, the bond wire lifts off, i.e., the mechanical and electrical contact to the metallization is interrupted [Held97, Ciappa00].

Parasitic effects arising from the close coupling between bond wires cause non-uniform current distribution among the bond wires, unbalanced transient currents between paralleled IGBT chips in high-power IGBT modules, and mechanical stress on the connection joints induced by the magnetic fields. These effects may also contribute to wire-bond fatigue [Xing98].

In devices with multiple bond wires, such as IGBTs, bond wire lift-off can be regarded as a domino effect: each lifted wire makes the current distribution on the IGBT chip less homogeneous and therefore raises the local and average temperatures, which accelerates the lift-off of further wires [Held97].
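The domino effect can be illustrated by assuming, as a simplification, that the chip current divides equally among the remaining wires; in practice the current crowds near the lifted wires, so the local effect is worse. The current and number of wires are hypothetical:

```python
def per_wire_stress(total_current_a, n_wires, n_lifted):
    """Current per remaining wire and its Joule heating relative to the intact
    module, assuming the current divides equally among the remaining wires."""
    remaining = n_wires - n_lifted
    if remaining <= 0:
        raise ValueError("all bond wires have lifted off: open circuit")
    i_wire = total_current_a / remaining
    relative_heating = (n_wires / remaining) ** 2  # P ~ I^2 * R per wire
    return i_wire, relative_heating


# Hypothetical chip carrying 100 A through 8 emitter wires
for lifted in range(5):
    i_wire, heat = per_wire_stress(100.0, 8, lifted)
    print(f"{lifted} lifted: {i_wire:5.1f} A per wire, heating x{heat:.2f}")
```

Once half of the wires have lifted, each remaining wire dissipates four times its original heat, so the failure rate of the survivors accelerates.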

A lifetime model has been developed for IGBT modules from experimental data. The model takes into account both the redundancy of the bond wires within an IGBT module and the fatigue damage due to realistic application profiles [Ciappa00].
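The [Ciappa00] model itself is not reproduced here. As a generic illustration of how a cycles-to-failure law and an application profile are combined, the sketch below uses a Coffin-Manson-type law, $N_f = A\,\Delta T_j^{-n}$, with the Palmgren-Miner linear damage rule; $A$, $n$ and the annual mission profile are hypothetical:

```python
def cycles_to_failure(delta_tj_k, a_coeff=1e14, exponent=5.0):
    """Coffin-Manson-type fit N_f = A * dT^(-n); A and n here are hypothetical."""
    return a_coeff * delta_tj_k ** (-exponent)


def miner_damage(cycle_histogram):
    """Palmgren-Miner cumulative damage D = sum(n_i / N_f,i); failure expected at D >= 1."""
    return sum(count / cycles_to_failure(dt) for dt, count in cycle_histogram.items())


# Hypothetical annual mission profile: {junction temperature swing in K: cycles per year}
annual_profile = {20.0: 500_000, 40.0: 50_000, 80.0: 1_000}
damage_per_year = miner_damage(annual_profile)
print(f"damage per year: {damage_per_year:.3f}, estimated life: {1 / damage_per_year:.1f} years")
```

In this example the rare 80 K swings contribute about a third of the yearly damage, which is why realistic application profiles, rather than a single test condition, are needed for lifetime prediction.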

Bond wire lift-off in IGBTs may occur on emitter wires or gate bond wires, so the most useful indicators of this failure mechanism are $V_{CE,sat}$, $V_{GE(th)}$ or the gate current [Fratelli99, Ciappa00]. The cause of gate leakage failures is difficult to establish: it could be mechanical damage to the chip metallization during cycling tests, due to grain reconstruction of the metallization [8], or damage during manufacturing caused by the bonding process [Coquery00].

Electrical detection of bond wire lift-off in power semiconductors has been reported [Lehmann03, Cova98, Glavanovics04]. Note that the signatures of bond wire lift-off are shared by other failure mechanisms, such as solder fatigue and cracking, or aluminium reconstruction [Ciappa07], so microscopy may be needed to exclude other failure mechanisms.

Countermeasures to improve the reliability of the metallization and bond wires, such as polymeric coatings, have been proposed; their effectiveness at high temperatures still needs to be evaluated [Held97, Schütze98].

## 4.6.3 Solder fatigue and cracking

Solder fatigue and cracking are caused by thermal stress and by defects inside the solder; voids, for example, accelerate solder degradation. Some papers treat solder fatigue and cracking as two separate mechanisms [Ciappa07].

As shown in Figure 15, solder is used for the die attach and the DCB attach. The initial solder microstructure, substrate metallization, intermetallic compounds and cooling rate all affect the solder fatigue process [Thébaud00].

Solder fatigue and cracking increase the contact resistance and eventually lead to delamination of the attach layers. Parameters used to track thermal fatigue include the thermal resistance ($R_{th}$), the collector-emitter saturation voltage ($V_{CE,sat}$) and the gate leakage current ($I_{GES}$) [Coquery99].

An experimental set-up to test solder fatigue is given in [Coquery99]:

- 3300 V, 1200 A IGBT modules are tested, with accelerated power cycling tests on both base plate versions (copper and AlSiC) under test conditions kept as simple as possible.
- The junction temperature ($T_j$) is determined electrically from $V_{CE,sat}$, measured by injecting the same low $I_C$ that was used for the initial thermal calibration.
- Test results are plotted as curves showing the variation of $V_{CE,sat}$ and $R_{th}$ (junction to case and junction to water). The failure criteria are: +20% for $R_{th}$; +5% for $V_{CE,sat}$; and $I_{GES}$ > 0.1 mA [Fratelli99, Coquery00].
- The base plate is equipped with thermocouples installed in special holes directly under the chip to measure the case temperature.
- Ultrasonic imaging is used to show the location of failure and the extent of damage.
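The end-of-life criteria in this set-up translate directly into a check that could run after each measurement interval. This is a minimal sketch, assuming each reading is compared with the initial value measured under the same conditions and that reaching a threshold counts as failure; the `Reading` class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    rth_k_per_w: float  # thermal resistance R_th, K/W
    vce_sat_v: float    # collector-emitter saturation voltage V_CE,sat, V
    iges_ma: float      # gate leakage current I_GES, mA

def failure_flags(initial: Reading, current: Reading) -> dict:
    """Apply the criteria quoted from [Fratelli99, Coquery00] (hypothetical helper)."""
    return {
        "R_th": current.rth_k_per_w >= 1.20 * initial.rth_k_per_w,  # +20 %
        "V_CE,sat": current.vce_sat_v >= 1.05 * initial.vce_sat_v,  # +5 %
        "I_GES": current.iges_ma > 0.1,                             # absolute limit
    }

def has_failed(initial: Reading, current: Reading) -> bool:
    """A module is counted as failed as soon as any single criterion is met."""
    return any(failure_flags(initial, current).values())

start = Reading(rth_k_per_w=0.010, vce_sat_v=3.0, iges_ma=0.001)
now = Reading(rth_k_per_w=0.0115, vce_sat_v=3.2, iges_ma=0.002)
print(failure_flags(start, now), has_failed(start, now))
```

Returning the individual flags, rather than only a pass/fail result, keeps the information about which indicator tripped first, which hints at the dominant mechanism.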

According to [Coquery99], bonding failure is directly related to the swing of the junction temperature ($\Delta T_j$), whereas solder fatigue is directly related to the swing of the case temperature ($\Delta T_c$). This classification is reasonable given the thermal distance of each failure location: the bond wires are attached to the chip surface, whereas the base plate solder layer lies close to the case.

Thermal fatigue tests are also described in [Cova97, Thébaud00, Dupont03, and Mermet07]. Such tests are very application-dependent, so a proper definition of the mission profile is very important.

Scanning acoustic microscopy was applied to study solder joint reliability in [Herr97]. Results from temperature cycling tests were combined with results from power cycling tests to predict solder joint reliability over a wide range of temperature excursions.

[Khatir04] presents a numerical tool validated by comparison with experimental measurements. The paper highlights how the position of the IGBT chips on the DCB substrate affects the thermal stress. In particular, a slight shift of the silicon chips may be sufficient to significantly delay the initiation and propagation of cracks, extending the lifetime of the studied module.

Several technologies have been proposed to counteract solder failures. A metal matrix composite base plate instead of copper was proposed to dramatically reduce the thermo-mechanical stress caused by power cycling [Lefranc00]. A low temperature joining technique was recommended as a reliable alternative even under extreme thermal conditions [Amro06].

Void-induced thermal problems in power MOSFETs were studied in [Katsis03, 06]. For IGBT modules, however, voids appear to be a less important issue, and void-free technology for IGBTs has been reported [Onuki00].

## 4.6.4 Temperature measurement issues

Thermal impedance measurements have been used to evaluate damage in the solder joints [Thébaud00]. Deriving the thermal impedance requires online measurement of the junction temperature and calculation of the power loss. Techniques for online estimation of the junction temperature were reported in [Reimann00, Cova97 and Murdock06]. A framework for estimating the chip temperature and power losses is given in Figure 18.



Figure 18: Block diagram of the supervision of the chip temperature and power losses [Reimann00]
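The two ingredients named above, a junction temperature estimate and the power loss, combine into a thermal impedance as sketched below. This is a minimal sketch under two assumptions: (i) $V_{CE}$ at a fixed low sense current varies linearly with temperature, as in the calibration described in Section 4.6.3, and (ii) a step in power loss $P$, so that $Z_{th}(t) = (T_j(t) - T_{ref}(t))/P$ with $T_{ref}$ the case or coolant temperature. The calibration points and function names are hypothetical, and this is not the scheme of [Reimann00].

```python
import numpy as np

def fit_tsep(temps_c, vce_v):
    """Least-squares line V_CE = slope * T + offset at a fixed low sense current."""
    slope, offset = np.polyfit(np.asarray(temps_c, float), np.asarray(vce_v, float), 1)
    return slope, offset

def junction_temperature(vce_v, slope, offset):
    """Invert the calibration line to estimate T_j from measured V_CE."""
    return (np.asarray(vce_v, float) - offset) / slope

def thermal_impedance(tj_c, tref_c, power_w):
    """Z_th = (T_j - T_ref) / P for a step of power loss P."""
    return (np.asarray(tj_c, float) - np.asarray(tref_c, float)) / power_w

# Hypothetical calibration: V_CE falls by 2 mV/K at the sense current.
slope, offset = fit_tsep([25.0, 75.0, 125.0], [0.70, 0.60, 0.50])
tj = junction_temperature([0.62, 0.55], slope, offset)
print(tj, thermal_impedance(tj, [60.0, 60.0], 200.0))
```

A rising $Z_{th}$ at the same time point of repeated identical power steps is the kind of signature that would point to degradation of the solder layers.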

## 4.7 Summary

There are many failure mechanisms for power devices, and several mechanisms often take effect simultaneously, which makes the analysis more complicated.

Although reliability is not a new issue, there is still much room to improve the reliability of power devices. For example, considerable effort has been reported on the packaging limitations of IGBTs, yet packaging issues have never disappeared from the list of reliability concerns, especially with the continuing emergence of novel IGBT technologies. Condition monitoring for power electronics is, therefore, very important.

# 5. Condition Monitoring for Power Electronics Reliability

## 5.1 Introduction

Condition monitoring offers the potential to prevent catastrophic failures. It can serve the following purposes [www12]:

- Reduce or avoid forced outages;
- Improve safety to personnel and the environment;
- Improve equipment or system utilization;
- Improve equipment or system availability;
- Optimize implementation and maintenance costs.

Complete condition monitoring of a large system can be very difficult because of the large number of components, so normally only the most critical and fragile components are monitored. Examples include transformers in electrical power systems [www12, COMET07], key motors in the motor drive industry [Ran98], and wind turbine generators in distributed generation [McMillan07].

As explained above, the most fragile electronic parts are capacitors, PCBs, semiconductor devices, solder joints and connectors. Most condition monitoring projects for power electronics concentrate on these parts; they are reviewed in the next subsection.

The typical flow of condition monitoring can be summarised as follows [www12]:

- Identify failure mechanisms and screen others;
- Identify failure modes;
- Identify failure causes;
- Identify effects of failure modes;
- Identify criticality or risks;
- Select on-line monitoring to match characteristic of developing failure causes.
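For the criticality step, one widely used approach is the risk priority number (RPN) from failure mode and effects analysis (FMEA): the product of severity, occurrence and detectability scores. It is assumed here for illustration and is not prescribed by [www12]; the class, the 1-10 scales and all scores below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (easily detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

def rank_by_risk(modes):
    """Highest-risk failure modes first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)

# Hypothetical failure modes and scores, for illustration only.
modes = [
    FailureMode("mode A (hypothetical)", 8, 4, 6),
    FailureMode("mode B (hypothetical)", 6, 7, 3),
    FailureMode("mode C (hypothetical)", 9, 2, 8),
]
for m in rank_by_risk(modes):
    print(m.name, m.rpn)
```

Modes that rank high, especially those that are hard to detect by other means, are natural candidates for the on-line monitoring selected in the last step.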

The failure modes, causes and relevant effects can be studied by means of target-oriented tests and simulations, which are explained in later subsections.

## 5.2 Review of research projects

Several failure mechanisms and failure indicators have been identified through years of laboratory research in projects such as LESIT (1991-95) and RAPSDRA (EU, 1996-98) [Coquery00]. These two projects, especially LESIT, laid the foundation for later research. They are reviewed first, followed by the projects at CALCE and condition monitoring projects on power electronics.

## 5.2.1 LESIT

LESIT (Leistungselektronik, Systemtechnik und Informationstechnologie) was a research programme in power electronics, systems and information technologies, directly supported by the Swiss parliament. Its main concern was the impact of these technologies on energy consumption [www13].

The development of these technologies was funded with some 50 million SFr. The key modules were as follows [www13]:

Power electronics:

- Module 1: Silicon Power Device Technology
- Module 2: Power Electronic Circuits
- Module 3: Power Electronic Systems

Information technology:

- Module 4: Microsensor Technology
- Module 5: Radio Communications
- Module 6: Microwave and Gigabit Electronics

and:

- Module 8: Dielectric Electronic Materials
- Module 9: Reliability and Electromagnetic Compatibility

LESIT was reported to have been completed with outstanding scientific results and efficient transfer of research to industry [Fichtner02]. Several publications came from researchers involved in the LESIT project, for example [Held97, Jacob95, and Wu95]. These papers concentrate on IGBT thermal stress in traction applications and have been highly regarded by later researchers [Amro04, Coquery00, Mermet07].

During the LESIT project, standard modules with base plates were tested, and by the end of the project a considerable database had been collected. The results for different temperature swings at three mean temperatures are shown in Figure 19.



Figure 19: LESIT results for different  $\Delta$ Tj [Held97, Amro04]

All three lines of the fit in Figure 19 can be expressed by the equation [Held97, Scheuermann02]:

$$N_f = A\Delta T_j^{\alpha} \exp\left(\frac{E_a}{k_B T_m}\right)$$
(29)

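where $N_f$ is the number of power cycles to failure, $\Delta T_j$ the junction temperature swing and $T_m$ the mean junction temperature in K; $k_B$ is the Boltzmann constant, $1.380\times10^{-23}$ J K$^{-1}$; $E_a$ is the activation energy in J; $A$ is a constant, $302500$ K$^{-\alpha}$; and $\alpha$ is an exponent, $-5.039$.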
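Equation (29) can be evaluated directly. The sketch below uses $A = 302500$ K$^{-\alpha}$, the exponent $\alpha = -5.039$ (the usual value for the LESIT fit) and $k_B = 1.380\times10^{-23}$ J/K. The activation energy is not given numerically in this text, so the code assumes $E_a = 9.89\times10^{-20}$ J, the value commonly quoted with the LESIT fit; check it against [Held97] before relying on any numbers. Temperatures must be in kelvin, and the function name is hypothetical.

```python
import math

K_B = 1.380e-23   # Boltzmann constant, J/K (value used in Eq. 29)
A = 302500.0      # constant, K^-alpha
ALPHA = -5.039    # exponent of the temperature swing
E_A = 9.89e-20    # activation energy, J (ASSUMED, see lead-in)

def lesit_cycles_to_failure(delta_tj_k: float, tm_k: float) -> float:
    """N_f = A * dTj^alpha * exp(Ea / (kB * Tm)); dTj in K, Tm in kelvin."""
    if delta_tj_k <= 0 or tm_k <= 0:
        raise ValueError("delta_tj_k and tm_k must be positive")
    return A * delta_tj_k ** ALPHA * math.exp(E_A / (K_B * tm_k))

# Effect of the mean temperature for a fixed 80 K swing.
for tm_c in (60.0, 80.0, 100.0):
    print(tm_c, f"{lesit_cycles_to_failure(80.0, tm_c + 273.15):.3g}")
```

Because $\alpha < 0$ and $E_a > 0$, the model predicts fewer cycles to failure for both a larger swing and a higher mean temperature.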

## 5.2.2 RAPSDRA

RAPSDRA (Reliability of advanced high power semiconductor devices for railway traction applications) was an EU project. The collaborators included semiconductor manufacturers (ABB, Eupec, Mitel, Ansaldo Trasporti, Siemens ZT), railway operators (SNCF, DB, FS), research institutes (INRETS, ETH, Ansaldo Research) and universities (Bordeaux, Dortmund, Cambridge and Parma) [RAPSDRA98].

The failure mechanisms and reliability tests explored in the RAPSDRA project are listed in Table 6.

The project produced many publications, for example [Ciappa00, Cova97, Berg98, Petrarca99, Thébaud00a, Thébaud00b].

| Failures            | Items                                   | Tests                                    |
|---------------------|-----------------------------------------|------------------------------------------|
| Silicon chip        | Local distribution of free carriers     | IV @ 25 & 90°C                           |
|                     | Ion migration                           | Humidity test 85°C/85%RH                 |
|                     | Oxide defects                           | Temperature & voltage accelerated test   |
|                     | Current crowding/filamentation          | Leakage test/switching and power cycling |
|                     | P-n junction                            | Optical beam induced current enhancement |
|                     | Aluminium reconstruction                | Rapid power cycling                      |
|                     | Back metal delamination                 | Power cycling                            |
| Encapsulation       | Cracking of wire bond at chip           | Accelerated power cycling                |
|                     | Cracking of wire bond at substrate      | Accelerated power cycling                |
|                     | Cracking of insulator                   | Temperature cycling                      |
|                     | Partial discharge in metallised ceramic | Temperature cycling                      |
|                     | Partial discharge within gel            | High voltage cycling                     |
|                     | Current sharing                         | Switching operation                      |
|                     | Insulation failure                      | Power cycling                            |
| External impacts    | Cosmic rays                             |                                          |
| Abnormal conditions | At low temperature                      |                                          |
|                     | At high temperature                     |                                          |
|                     | Under short circuit                     |                                          |
|                     | Frequency effect                        |                                          |

Table 6: Failure mechanisms explored in RAPSDRA project [RAPSDRA98]

## 5.2.3 Projects in CALCE

The Centre for Advanced Life Cycle Engineering (CALCE) at the University of Maryland, which according to its website [www14] is the largest electronic products and systems research centre focused on electronics reliability, is dedicated to providing a knowledge and resource base to support the development of competitive electronic components, products and systems. Its structure is illustrated in Figure 20.

The areas of expertise at CALCE can be divided as follows [www14]:

- Lead Free Issues
- Supply Chain Management
- Parts and Components
- Printed Wiring Boards
- Ball Grid Arrays
- Permanent Interconnects
- Photonics and Telecommunications
- Contacts and Connectors
- Accelerated Testing
- Failure Analysis
- Thermal Management
- Electromagnetic Compatibility
- Electronic Systems Cost Modelling

Several of these items are considered important to the reliability of power devices.



Figure 20: CALCE research projects [McCluskey99]

CALCE collaborates closely with the utility and military sectors, and its funding is around \$5M per year [McCluskey99]. It publishes many papers on power electronics reliability every year; typical examples are:

- The fibre effect on PCB reliability [Pecht 99];
- Main concerns in high temperature electronics systems [McCluskey98];
- A new model for predicting flex cracking failures of ceramic capacitors [www14, 2004];
- The prognostics of an electronic system with vibrating loads [Gu07].

## 5.2.4 Condition Monitoring Projects for Capacitors

Traditional capacitor diagnosis methods are based on the signatures of the equivalent series resistance (ESR) [Gasperi96], or of voltage and current [Lahyani98, Venet99].

New techniques for condition monitoring of electrolytic capacitors have also been reported:

(1) A PhD thesis at the Georgia Institute of Technology [Imam07] covers background literature, FFT-based fault prediction, failure prediction based on system modelling, condition monitoring by parameter estimation, and failures due to inrush currents. In particular:

- A failure prediction for electrolytic capacitors was given using system modelling and experimental work.
- Condition monitoring of electrolytic capacitors by parameter estimation was implemented, and simulation and experimental results were compared and discussed.
- Capacitor failures due to inrush current were tested and the relevant failure mechanisms were analysed.

The author has also published other work on capacitor reliability [Imam 05-07].

(2) Condition monitoring of the DC-link electrolytic capacitors in adjustable speed drives [Kim07]. This new sensorless technique for monitoring inverter DC-link aluminium electrolytic capacitors is based on estimating their ESR and capacitance, and is claimed to be a simple and low-cost solution.
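ESR and capacitance estimation can be illustrated with a generic least-squares fit; this is not the sensorless method of [Kim07]. The sketch assumes that the capacitor current $i(t)$ and voltage $v(t)$ are both sampled, and models the capacitor as an ideal capacitance $C$ in series with the ESR, so that $v = v_0 + \mathrm{ESR}\,i + q/C$ with $q = \int i\,dt$. The function name and the synthetic test values are hypothetical.

```python
import numpy as np

def estimate_esr_and_c(t, i, v):
    """Fit v = v0 + ESR*i + q/C by linear least squares; returns (ESR, C)."""
    t, i, v = (np.asarray(x, float) for x in (t, i, v))
    # Charge q(t): cumulative trapezoidal integral of the capacitor current.
    q = np.concatenate(([0.0], np.cumsum(0.5 * (i[1:] + i[:-1]) * np.diff(t))))
    # Unknowns (v0, ESR, 1/C) enter the model linearly.
    a = np.column_stack([np.ones_like(t), i, q])
    (v0, esr, inv_c), *_ = np.linalg.lstsq(a, v, rcond=None)
    return esr, 1.0 / inv_c

# Synthetic check: 100 Hz ripple on a 400 V link, ESR = 50 mOhm, C = 2.2 mF.
t = np.linspace(0.0, 0.02, 2001)
w = 2 * np.pi * 100.0
i = 5.0 * np.sin(w * t)
v = 400.0 + 0.05 * i + (5.0 / w) * (1 - np.cos(w * t)) / 2.2e-3
print(estimate_esr_and_c(t, i, v))
```

Tracking the two estimates over time gives the usual ageing signature of electrolytic capacitors: ESR rising and capacitance falling.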

## 5.3 Condition monitoring tools

Reliability evaluation tools are reviewed first; cycling tests and accelerated stress tests are then described in more detail because of their importance. Finally, simulation tools for condition monitoring are introduced.

## 5.3.1 Reliability Evaluation Tools

The reliability evaluation tools mainly include the following items:

- (1) *Thermal or power cycling* is the most common test, in which units are cycled repeatedly between two temperature extremes [Crowe01]. HAST (Highly Accelerated Stress Test) is normally designed to produce the same effect in a shorter time by applying accelerated stress [Crowe01]. An acceleration factor is then needed to relate the HAST results to normal operating conditions.
- (2) *Ionic contamination* testing is used mainly to monitor solder flux cleaning processes. A sample of board-level assemblies is taken from the manufacturing line, just after the final cleaning and before the next assembly step. These boards are placed in an agitated bath of alcohol and water with a known level of resistance. If the boards have any flux or contaminant left on them that is soluble in water or alcohol, it will decrease the resistance of the bath fluid [Crowe01].
- (3) *Constant acceleration* testing simulates how a part or system reacts to the sustained mechanical acceleration experienced in aircraft, missiles, etc. [Crowe01].
- (4) *Environmental stress testing* is necessary for power electronics systems working in harsh environments [Crowe01]. Stresses may include cosmic radiation, humidity and lighting.
- (5) *Electrical stress testing* is used to define the safe operating area and to study the effects of other electrical-stress failure mechanisms [Trivedi99].
- (6) *Microscopy imaging* is widely used to check the location and extent of failures. Many techniques have been used, such as SAM (scanning acoustic microscopy) [Coquery99, Herr97, Fratelli99], SEM (scanning electron microscopy) [Gunturi06, Thébaud00], and EDX (energy-dispersive X-ray spectroscopy) [Thébaud00].

Because of their importance for reliability studies, cycling tests and HAST are reviewed in more detail below.

## 5.3.2 Cycling tests and accelerated stress tests

Thermal cycling and HAST are the tests most often used to study power electronics reliability. Because a non-accelerated thermal cycling test takes a very long time before failures can be observed, many more projects choose a HAST programme.

Statistical analysis, such as Weibull analysis, is widely used to analyse the results of cycling and accelerated cycling tests, to extract the reliability level and to predict lifetime.
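As an illustration of Weibull analysis of cycling-test results, the sketch below fits the two-parameter Weibull distribution of Section 2.3.3 by median-rank regression, one common estimation method (maximum likelihood is another). It assumes every tested module was run to failure, i.e. no censored units; the function name and the cycle counts in the example are hypothetical.

```python
import numpy as np

def weibull_fit_median_ranks(cycles_to_failure):
    """Estimate Weibull shape beta and scale eta from complete failure data."""
    t = np.sort(np.asarray(cycles_to_failure, float))
    n = t.size
    f = (np.arange(1, n + 1) - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
    x = np.log(t)
    y = np.log(-np.log(1.0 - f))                 # linearised Weibull CDF
    beta, intercept = np.polyfit(x, y, 1)        # y = beta*ln(t) - beta*ln(eta)
    eta = np.exp(-intercept / beta)
    return beta, eta

# Hypothetical cycles to failure from a power cycling test.
cycles = [41_000, 58_000, 66_000, 79_000, 93_000, 110_000]
beta, eta = weibull_fit_median_ranks(cycles)
print(f"beta = {beta:.2f}, eta = {eta:.0f} cycles")
```

A shape parameter $\beta > 1$ indicates wear-out, which is what fatigue-driven failures such as bond wire lift-off and solder cracking are expected to show.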

#### A. Cycling tests

The power cycling lifetime of IGBT modules was reported in [Scheuermann 02]. Power cycling test results for junction temperature swings of 80 K and 110 K were used as a basis to investigate the superposition of both conditions.

The most important result was the finding that $\Delta T_j$ alone is not sufficient to describe the lifetime of modules in power cycling tests, as already suggested in [Held97]. A second parameter, the mean junction temperature $T_m = T_{j,min} + \Delta T_j/2$, has a considerable influence on the test result. The number of cycles to failure can be expressed by

$$N_f = A\Delta T_j^{\gamma} \exp\left(\frac{E_a}{k_B T_m}\right)$$
(30)

where $k_B$ is the Boltzmann constant, $E_a$ is the activation energy, $A$ is a constant and $\gamma$ is the exponent.

For both conditions, i.e., 80 K and 110 K swings, the observed failure mode was bond wire lift-off caused by the reconstruction of the Al metallization of the IGBT. The observation supports the assumption that the detected bond wire lift-off is a secondary failure, initiated by the primary mechanism of solder fatigue.

Superimposed power cycling with interleaved $\Delta T_{j,nom} = 110 \text{K}/80 \text{K}$ was also tested. The number of cycles to failure under the interleaved test conditions corresponded well with the sum of the cycles to failure under each single load condition, which suggests that the effects of the 80 K and 110 K swings are independent.
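One common way to express such independence is the Palmgren-Miner linear damage rule: each cycle under condition $k$ consumes a fraction $1/N_{f,k}$ of the life, and failure is expected when the accumulated damage $D = \sum_k n_k/N_{f,k}$ reaches 1. The sketch below illustrates that rule only; it is not the analysis of [Scheuermann02], and the $N_f$ values and function names are hypothetical.

```python
def miner_damage(cycle_counts, cycles_to_failure):
    """Linear damage D = sum(n_k / N_f,k); failure is expected at D >= 1."""
    if len(cycle_counts) != len(cycles_to_failure):
        raise ValueError("one N_f value is needed per load condition")
    return sum(n / nf for n, nf in zip(cycle_counts, cycles_to_failure))

def interleaved_cycles_to_failure(nf_a, nf_b):
    """Total cycles until D = 1 when conditions A and B alternate one-for-one."""
    # Each A+B pair adds 1/nf_a + 1/nf_b to the damage.
    pairs = 1.0 / (1.0 / nf_a + 1.0 / nf_b)
    return 2.0 * pairs

# Hypothetical single-condition lifetimes (cycles) for 110 K and 80 K swings.
nf_110, nf_80 = 20_000, 80_000
total = interleaved_cycles_to_failure(nf_110, nf_80)
print(total, miner_damage([total / 2, total / 2], [nf_110, nf_80]))
```

If the measured interleaved lifetime deviated strongly from this prediction, it would indicate interaction between the two load conditions, for example one swing accelerating a mechanism driven mainly by the other.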

Note that this test took more than two years; such tests are rare because of the long time span. Most reported tests are accelerated ones, which are reviewed next.

#### B. Fast power cycling tests

The test described above took years to complete, and a real converter may be expected to last for decades, so accelerated or fast power cycling tests are essential for studying power electronics reliability. The key requirement for an accelerated test is that the failure mode in power cycling must be the same as in the real application.

A fast power cycling test for traction IGBTs was developed that can reproduce millions of temperature changes in a short time. A mechanical analysis showed that this fast test produces approximately the same damage and failure mode as slower cycling stress [Held97].

[Coquery00] reported an accelerated power cycling test (PCT) on traction IGBTs lasting over 6 months. It suggested that increases in failure indicators such as $R_{th}$, $V_{CE}$ or $I_{GES}$ must be linked to the interface behaviour (thermal and mechanical) and to the electrical capability to switch on and off in a converter. The paper discussed the methodology and protocol of the accelerated PCT, including turn-off safe operating area (SOA) measurements before and after the reliability tests to evaluate the influence of the drift in $R_{th}$, $V_{CE}$ and $I_{GES}$. PCT and SOA results were presented for a 1200 A/3300 V IGBT module with an AlSiC base plate after a 4000-hour test (376,000 cycles) under very hard conditions.

Similar test work was reported by [Fratelli99, Ciappa00, Thébaud00a, 00b, Dupont03, and Mermet07].

Power cycling is also used to evaluate new techniques. A new joining technique, called low temperature joining, was introduced to replace conventional chip-substrate soldering, and it can considerably increase the power cycling capability of power modules. By also replacing the bond wires and using a double-sided low temperature joining technique, a further significant increase in device lifetime was achieved [Amro05]. The LESIT results were used as a reference to quantify the improvement of the new technology.

The reliability of multi-chip IGBTs was reported by [Lefranc00]. The design of an accelerated test method to identify reliability problems during early phases of product development is discussed in [Baskoro06].

#### C. Accelerated electrical or temperature stress tests

Another type of accelerated stress test studies lifetime under high temperature and high voltage or current stress. To relate HAST results to normal operation, an acceleration factor is introduced.

For example, how much shorter the expected life of a device at 500 K is than that of one at 400 K can be calculated with [Bernstein06]:

$$AF = \frac{\lambda(T_2, V_2)}{\lambda(T_1, V_1)} = AF_T AF_V = \exp\left(\frac{E_a}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right) \exp(\gamma_1(V_2 - V_1))$$
(31)

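where AF is the acceleration factor, $\lambda(T, V)$ is the failure rate at temperature $T$ and voltage $V$, $k$ is the Boltzmann constant and $\gamma_1$ is the voltage acceleration coefficient. The temperature acceleration factor ($AF_T$) and the voltage acceleration factor ($AF_V$) can be calculated separately.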

Supposing both conditions have the same voltage ($V_1 = V_2$, so $AF_V = 1$) and $E_a = 1$ eV, so that $E_a/k \approx 11{,}602$ K, the acceleration factor is $\exp[11{,}602 \times (1/400 - 1/500)] \approx 330$. A lifetime of 1000 hours measured at 500 K therefore leads to a predicted median lifetime of about 330,000 hours for a device run at 400 K [Grant89].
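The acceleration factor of Eq. (31) can be computed directly, which makes the worked example easy to repeat for other conditions. This is a minimal sketch with a hypothetical function name; it uses the standard Boltzmann constant $8.617\times10^{-5}$ eV/K, which gives about 331 rather than exactly 330, consistent with the rounded value above. $T_1, V_1$ denote the use condition and $T_2, V_2$ the stress condition.

```python
import math

K_EV = 8.617e-5  # Boltzmann constant, eV/K

def acceleration_factor(t_use_k, t_stress_k, ea_ev, v_use=0.0, v_stress=0.0, gamma_per_v=0.0):
    """AF = AF_T * AF_V as in Eq. (31); temperatures in kelvin."""
    af_t = math.exp(ea_ev / K_EV * (1.0 / t_use_k - 1.0 / t_stress_k))  # Arrhenius term
    af_v = math.exp(gamma_per_v * (v_stress - v_use))                    # voltage term
    return af_t * af_v

# Worked example from the text: same voltage, Ea = 1 eV, 400 K use vs 500 K stress.
af = acceleration_factor(400.0, 500.0, 1.0)
print(round(af), "->", round(1000 * af), "h predicted at 400 K")
```

Because $E_a$ appears in the exponent, the predicted field lifetime is very sensitive to the assumed activation energy; an error of 0.1 eV changes the factor here by roughly 80%.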

An accelerated test relies on the assumption that the same failure mechanism dominates under both normal operation and accelerated conditions. In cycling tests, the temperature swing is normally kept the same in both cases and the cycling frequency is increased to accelerate the test. Where no single failure mechanism dominates, the overall acceleration factor must account for the contributions of the different mechanisms; such a model is well explained in [Bernstein06].
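For several mechanisms acting at once, one common simplification, assumed here, treats them as independent competing risks with constant failure rates. The total failure rate is then the sum of the individual rates, and the overall acceleration factor is a failure-rate-weighted average of the individual Arrhenius factors. This sketch is in the spirit of [Bernstein06] but does not reproduce its model; the function name, activation energies and rates are hypothetical.

```python
import math

K_EV = 8.617e-5  # Boltzmann constant, eV/K

def overall_af(mechanisms, t_use_k, t_stress_k):
    """mechanisms: list of (use-condition failure rate, activation energy in eV).
    Returns lambda_total(stress) / lambda_total(use) for independent mechanisms."""
    lam_use = sum(lam for lam, _ in mechanisms)
    lam_stress = sum(
        lam * math.exp(ea / K_EV * (1.0 / t_use_k - 1.0 / t_stress_k))
        for lam, ea in mechanisms
    )
    return lam_stress / lam_use

# Two hypothetical mechanisms with equal use-condition rates but different Ea.
mix = [(50.0, 0.5), (50.0, 1.0)]  # rates in FIT, Ea in eV
print(overall_af(mix, 400.0, 500.0))
```

The combined factor lies between the single-mechanism factors, so applying only the highest-$E_a$ factor would overestimate the acceleration and therefore the predicted field lifetime.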

## 5.3.3 Simulation tools

Simulations help guide experimental work, characterise test results and study phenomena and theory beyond what experiments can show, since the required experimental conditions are sometimes unavailable.

Most simulations are based on time-to-failure calculations, since these are straightforward and easy to compare with experimental results. Such simulations are reported in [Palmer03, Dupont03, Coquery99].

A new failure-rate-based SPICE reliability simulation methodology was proposed in [Bernstein 06]. The main advantage claimed in the paper is that it eliminates the need to analyse every node of each circuit in detail. The paper focused on electronic circuits in general, so its suitability for power electronics needs further consideration.

A semi-analytic model for thermal fatigue failure of the die attach in power electronic building blocks was proposed in [Sundararajan98], and the analysis of electrical charge in MOS components was discussed in [Fruchier]. All of this work is valuable when the relevant failure mechanisms are considered.

Cooling and thermal control technology for IGBTs in high power applications, for example automotive traction drives, was studied in [Bharathan03]. The reliability and thermal performance of IGBT plastic modules for the more electric aircraft were reported in [Newcombe03].

To study the system-level effects of the thermal behaviour of power devices, many compact and fast electro-thermal models have been developed [Mawby06, Bryant06, Azar03, and Palmer03]. Such models are a main foundation of this project, Condition Monitoring for Power Electronics Reliability (COMPERE).
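To indicate what a compact thermal model looks like, the sketch below uses a Foster RC network, one common compact representation of a device's thermal impedance, $Z_{th}(t) = \sum_i R_i\,(1 - e^{-t/\tau_i})$. It steps the junction temperature through a piecewise-constant power-loss profile above a fixed reference temperature. It is not one of the cited models [Mawby06, Bryant06, Azar03, Palmer03]; the function names and RC values are hypothetical.

```python
import math

def foster_zth(t_s, r_k_per_w, tau_s):
    """Step-response thermal impedance of a Foster network, K/W."""
    return sum(r * (1.0 - math.exp(-t_s / tau)) for r, tau in zip(r_k_per_w, tau_s))

def simulate_tj(power_w, dt_s, r_k_per_w, tau_s, t_ref_c):
    """Junction temperature for a piecewise-constant loss profile (exact per step)."""
    branch = [0.0] * len(r_k_per_w)  # temperature rise across each RC branch
    tj = []
    for p in power_w:
        for k, (r, tau) in enumerate(zip(r_k_per_w, tau_s)):
            decay = math.exp(-dt_s / tau)
            branch[k] = branch[k] * decay + r * p * (1.0 - decay)
        tj.append(t_ref_c + sum(branch))
    return tj

# Hypothetical 3-branch network (K/W, s) and a 1 s on / 1 s off load cycle.
R = [0.02, 0.05, 0.08]
TAU = [0.001, 0.05, 0.5]
profile = ([500.0] * 100 + [0.0] * 100) * 5  # dt = 10 ms
tj = simulate_tj(profile, 0.01, R, TAU, t_ref_c=60.0)
print(f"Tj max {max(tj):.1f} C, Tj min {min(tj):.1f} C")
```

The resulting $T_j$ trace provides the $\Delta T_j$ and $T_m$ values needed by lifetime models such as Eq. (29), which is how such compact models link converter operation to reliability.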

## 5.4 Summary

The methodologies of condition monitoring projects are quite similar, whether they target electrical power systems or electrolytic capacitors.

Important past research projects were reviewed. The results obtained in LESIT have been highly regarded, cited and used for comparison by later research. The CALCE research group, which claims to be the largest of its kind in the world, has been active in recent years and has contributed several high-quality publications.

The condition monitoring tools, including test tools and simulation tools, were reviewed. As cycling and accelerated cycling tests are especially important, they were reviewed separately.

# 6. COMPERE project

The concept of condition monitoring has seldom been applied to power converters. The only converter components that have been considered are electrolytic capacitors, such as those used in the d.c. link. By checking online signatures, such as ESR, or voltage and current, the condition of these capacitors can be estimated [Venet02, Imam07].

Condition monitoring of a complete converter is much more complicated because of the large number of components and their interactions. COMPERE will concentrate on power device and capacitor failures and on the relevant failure detection and management.

## 6.1 Novelties

The project aims to make the following contributions:

(1) Further failure mechanism exploration

The past projects, LESIT and RAPSDRA, were closely tied to rail traction, so the failure mechanisms they identified may be specific to that application.

The target applications of COMPERE could include traction and propulsion drives, and renewable energy and embedded generation systems interfaced to the grid through power electronic converters [COMPERE]. Different applications may raise different reliability concerns.

This project is a collaboration between two parties, the University of Warwick and the University of Durham: one concentrates on the failure modes of power devices and the other on the system effects caused by these failures. Compared with past research projects, the advantages of this collaboration are:

- Projects carried out by power device researchers may not consider system effects.
- Projects carried out by power electronics system researchers may not be able to study the mechanisms behind failures.

Therefore, with proper management of the project, novel findings on failure mechanisms are anticipated from COMPERE.

(2) Simulation network

Simulation tools for studying the thermal effects of power devices may be too detailed for system analysis. An example is the FEM analysis in [Ye02], which focused on the package itself and gave little consideration to system effects. Meanwhile, ordinary circuit simulators may not be able to capture the thermal effects of power devices.

A simulation network, based on the available compact and fast electro-thermal models, is therefore needed to study reliability and to monitor conditions.

(3) Demo system for condition monitoring

A demo system will be built to prepare for practical condition monitoring and, eventually, commercial products. Efforts from both the power device side and the power converter side will be combined.

## 6.2 Methodologies

Following the flow chart given in [Crowe01], the methodology can be summarised as:

- (1) Define Problem: well explained in [COMPERE].
- (2) Collect Data: a questionnaire survey has been carried out.
- (3) Define Analysis: the conditions to be monitored will be determined based on the literature review and the questionnaire results.
- (4) Execute Plan: simulation and experimental work will be carried out.
- (5) Identify Root Cause: failure analysis to study the failure mechanisms and the feasibility of condition monitoring.
- (6) Characterisation process: the simulation and experimental work will be checked to confirm the effectiveness of the proposed simulation network and demo system.
- (7) Document Database: deliverables to be prepared.

A test rig based on a back-to-back system, together with initial ideas, is shown in Figure 21. Further content and upgrades are likely to be needed to meet the research aims and targets.



Figure 21: Test rig and initial ideas for COMPERE

# 7. Summary

There are many challenging tasks for the COMPERE project, since the physical phenomena are not straightforward and several failure mechanisms may act together. The extraction of signals and the analysis of failures will be key points for the experimental work and characterisation.

Although plenty of work has been carried out in the power electronics reliability area, publications on condition monitoring of power electronic systems are relatively rare, which makes the COMPERE project both challenging and worthwhile.

# References

Amerasekera and Campbell, ESD pulse continuous voltage breakdown in MOS capacitor structures, *Proc. EOS/ESD Symp.* 1986, pp. 208–213

Amerasekera and Najm, Failure Mechanisms in Semiconductor Devices, John Wiley & Sons, 1997

Amro, Lutz, Lindemann, Power cycling with high temperature swing of discrete components based on different technologies, *PESC 04*, 2004, pp. 2593-2598

Amro, Lutz, Rudzki, Sittig, Thoben, Power Cycling at High Temperature Swings of Modules with Low Temperature Joining Technique, *ISPSD 2006* 

Azar, Udrea, Ng, Dawson, Findlay, Waind, Amaratunga, Advanced Electro-thermal SPICE Modelling of Large Power IGBTs, *ISPSD 2003*, pp. 291-294

Baliga, Modern Power Devices, New York: John Wiley & Sons, 1987

Baskoro et al, Developing MESA: An Accelerated Reliability Test, *Proceedings Annual Reliability and Maintainability Symposium*, 2003

Baskoro, The design of an accelerated test method to identify reliability problems during early phases of product development, *PhD thesis to Technische Universiteit Eindhoven*, 2006

Beckedahl, Turky, Scheuermann, Packaging consideration of an integrated inverter Module (IIM) for hybrid vehicle, *PCIM Europe*, 2005

Bell, Recovery Characteristics of Ionic Drift Induced Failures under Time/Temperature Stress, *Proc. 18th IRPS*, 1980, pp. 217-219

Benmansour, Azzopardi, Martin and Woirgard, Trench IGBT failure mechanisms evolution with temperature and gate resistance under various short-circuit conditions, *Microelectronics Reliability* 47 (2007) 1730–1734

Berg, Wolfgang, Advanced IGBT modules for railway traction applications: Reliability testing, *Microelectronics Reliability*, Vol. 38(6), 1998, pp. 1319-1323

Berth, Partial Discharge Behaviour of Power Electronic Packaging Insulation, *Proceedings of 1998 International Symposium on Electrical Insulating Materials*

Bernstein, Gurfinkel, Li et al, Electronic Circuit Reliability Modelling, *Microelectronics and Reliability*, Vol. 46 (12), 2006, pp. 1957-1979

Bharathan, Gawlik, Kramer, www.nrel.gov/vehiclesandfuels/powerelectronics/pdfs/advanced_power_electronics_thermal_mgmt.pdf, 2003

Bloomer et al., "Failure mechanisms in through-hole packages," in *Electronic Materials Handbook*, Vol. I, Packaging, ASM International, Materials Park, Ohio, 1989, pp. 969-981.

Blache and Shrivastava, Defining failure of manufacturing machinery and equipment, *Proceedings Annual Reliability and Maintainability Symposium*, 1994, pp. 69-75

Brisbin, Strachan and Chaparala, PMOS Drain Breakdown Voltage Walk-in: A New Failure Mode in High Power BiCMOS Applications, *IEEE International 42nd Reliability Physics Symposium Proceedings*, 2004, pp. 265-268

Bryant A., Mawby P., Santi E. and Hudgins J., Exploration of Power Device Reliability using Compact Device Models and Fast Electro-thermal Simulation, *41st IAS Annual Meeting*, Vol. 3, 2006, pp. 1465-1472

Bryant A., Palmer P., Santi E., Hudgins J., Simulation and Optimization of Diode and Insulated Gate Bipolar Transistor Interaction in a Chopper Cell using Matlab and Simulink, *IEEE Transactions on Industry Applications*, Vol. 43 (4), 2007, pp. 874-883

Condition Monitoring for Power Electronics Reliability (COMPERE): Case for Support

Chan, Compensating Effects in Time-Dependent Dielectric Breakdown, *IEEE Transactions on Reliability*, Vol. 41, No. 3, September 1992

Chan, Yeung, Failure Mechanisms of Miniaturized Multilayer Ceramic Capacitors under Normal Service Conditions, 43rd Electronic Components and Technology Conference, 1993, pp. 1152-1155

Chang, Hsieh, Martens, Yang, Wire-bond void formation during high temperature aging, *IEEE Transactions on Components and Packaging Technologies*, Vol.(1), 2004, pp. 155-160

Christou, Electromigration and Electronic Device Degradation, New York: John Wiley & Sons, 1994

Ciappa, Wolfgang, Lifetime Prediction of IGBT Modules for Traction Applications, 38<sup>th</sup> Annual International Reliability Physics Symposium, 2000, 210-216

Ciappa, Reliability of High-Power Devices, EPE, 2003.

Ciappa, Reliability of High-Power IGBT Modules for Traction Applications, 45th Annual International Reliability Physics Symposium, Phoenix, 2007

COMET project, An R&D strategy for condition monitoring of T&D plant, *IET Power Convention*, 2007, pp. 1–20

Coquery et al, Reliability Improvement of the Soldering Thermal Fatigue with AlSiC Technology on traction high-power IGBT Modules, *EPE 99*

Coquery and Lallemand, Failure criteria for long term Accelerated Power Cycling Test linked to electrical turn off SOA on IGBT module. A 4000 hours test on 1200A–3300V module with AlSiC base plate, *Microelectronics Reliability*, Vol.40 (8-10) 2000, pp. 1665-1670

Coquery et al, Power module lifetime estimation from chip temperature direct measurement in an automotive traction inverter, *Microelectronics Reliability*, Vol. 41, 2001, pp. 1695–1700

Cova, Ciappa, Franceschini, Malberti, Fantini, Thermal Characterization of IGBT power modules, *Microelectronics Reliability*, Vol. 37, 1997

Cova, Fantini, On the effect of power cycling stress on IGBT modules, *Microelectronics Reliability*, Vol. 38, 1998

Crowe and Feinberg, Design for reliability, CRC Press, 2001.

DeGroot and Schervish, *Probability and Statistics*, 3<sup>rd</sup> ed., Addison-Wesley, 2002

Dupont, Lefebvre, Khatir, Faugières, Power Cycling Test Circuit for thermal Fatigue Resistance Analysis of Solder Joints in IGBT, *EPE 2003* 

Duvvury et al, Device Integration for ESD Robustness of High Voltage Power MOSFETs, *Tech. Dig. IEDM*, 1994, pp. 407 – 411

Edwards, Testing for MOS IC Failure Modes, IEEE Trans. Rel., TR-31, 1982, pp. 9-17

Fichtner, Huang, Kaeslin, Felber, Aemmer, *Research Review 2002*, Integrated Systems Laboratory Microelectronics Design Centre, Eidgenössische Technische Hochschule Zürich

Fruchier, Notingher et al, Applications of the Thermal Step Method to the Characterization of Electric Charge in MOS Components, 42nd IAS Annual Meeting, 2007, pp. 444-451

Fratelli, Giannini, Cascone, Busatto, Reliability Test of Power IGBT's for Railway Traction, EPE 1999

Gasperi, Life prediction model for aluminum electrolytic capacitors, *IAS '96*, Vol. 3, pp. 1347-1351

Gerling07a, Definitions and Characteristics, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Gerling07b, Built-in reliability and zero-defect technology, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Gerling07c, Robustness Validation, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Gerling07d, Physics of failure concepts, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Glavanovics, Detzel, Weber, Impact of thermal overload operation on wirebond and metallization reliability in smart power devices, *Solid-State Device Research conference (ESSDERC)*, 2004

Golland and Wakeman, Application of Press-pack IGBTs in Traction applications, APEC 2005

Gollentz, High power inverter using press pack IGBT for high speed applications, 2007 European Conference on Power Electronics and Applications

Grant and Gowar, Power MOSFETs: Theory and Applications, John Wiley and Sons, 1989

Graas, Lee, McPherson, Havemann, Electromigration Reliability Improvement of W-plug Vias by Titanium Layering, *Proc. 32<sup>nd</sup> IRPS*, pp. 173–177, 1994

Gu, Barker and Pecht, Prognostics Implementation of Electronics under Vibration Loading, *Microelectronics Reliability*, Vol. 47, Issue 12, pp. 1849-1856, Dec. 2007.

Hagino, Yamashita, Uenishi, and Haruguchi, An experimental and numerical study on the forward biased SOA of IGBT's, *IEEE Transactions on Electron Devices*, Vol. 43, No. 3, pp. 490–500, 1996.

Hamidi, Kaufmann, Herr, Increased Lifetime of Wire Bonding Connections for IGBT Power Modules, APEC, 2001

Held, Jacob, Nicoletti, Scacco, Poech, Fast Power Cycling Test for IGBT modules in Traction Application, *International Conference on Power Electronics and Drive Systems*, Vol.1, pp. 425-430, 1997

Herr, Frey, Schlegel, Stuck, Zehringer, Substrate-to-Base Solder Joint Reliability in High Power IGBT Modules, *Microelectronics Reliability*, Vol. 37, 1997, pp. 1719-1722

Hnatek, *Practical reliability of electronic equipment and products*, New York: Marcel Dekker, 2003

Horowitz, Hill, *The Art of Electronics*, Cambridge University Press, 1989

Hu, Yang, Shin, Mechanism and thermal effect of delamination in light-emitting diode packages, *Microelectronics Journal*, Vol 38(2), 2007, pp. 157-163

Imam, Condition monitoring of electrolytic capacitors for power electronics applications, Georgia Institute of Technology, PhD thesis, May 2007

Imam, Divan, Harley, Habetler, Electrolytic Capacitor Failure Mechanism Due to Inrush Current, 42nd IAS Annual Meeting, 2007, pp. 730-736

Imam, Divan, Harley, Habetler, Real-Time Condition Monitoring of the Electrolytic Capacitors for Power Electronics Applications, pp. 1057-1061

Imam, Habetler, Harley, Divan, LMS based condition monitoring of electrolytic capacitor, IECON 2005

Imam, Habetler, Harley, Divan, Condition Monitoring of Electrolytic Capacitor in Power Electronic Circuits using Adaptive Filter Modeling, PESC '05, 2005, pp. 601-607

Imam, Habetler, Harley, Divan, Failure prediction of electrolytic capacitor using DSP methods, APEC 2005, Vol. 2, pp. 965-970

Jacob, Held, Scacco, Wu, Reliability testing and analysis of IGBT power semiconductor modules, *Proceedings of the 20th International Symposium for Testing and Failure Analysis*, 1995, pp. 319-325.

Jacob, 3-Level High Power Converter with Press Pack IGBT, EPE 07

Janning, Mercier, Medium voltage three-level inverter for high speed applications, 2007 European Conference on Power Electronics and Applications

Johnson, Packaging for wide band-gap power electronics, *Workshop on Advanced Semiconductor Materials for Power Electronics (WASMPE 2005)* 

Jones, Line Width Dependence of Stresses in Aluminium Interconnect, Proc. 25th IRPS, pp. 9-14, 1987

Jonnalagadda, Qi, Liu, Mechanical Fatigue reliability of PBGA Assemblies with Lead-free Solder and Halogen-free PCBs, 9th intersociety conference on Thermal and Thermomechanical Phenomena in Electronic Systems, ITHERM '04, Vol. 2 2004, pp. 165-170

Kampen, "Ensure AC Film Capacitor Reliability with Thermal Analysis," PCIM 2001, pp 56-67.

Kastha and Bose, Investigation Of Fault Modes Of Voltage-Fed Inverter System For Induction Motor Drive, *IEEE Trans. on Industry Applications*, Vol. 30, No. 4, 1994, pp.1028-1038

Katsis and Wyk, Void induced thermal impedance in power semiconductor modules: some transient temperature effects, *IEEE Trans. Industry Appl.*, Vol. 39, 2003, pp. 1239–1246.

Katsis and Wyk, A Thermal, Mechanical, and Electrical Study of Voiding in the Solder Die-Attach of Power MOSFETs, *IEEE Transactions On Components And Packaging Technologies*, Vol. 29, No. 1, March 2006, 127-136

Khatir, Lefebvre, Boundary Element Analysis Of Thermal Fatigue Effects On High Power IGBT Modules, *Microelectronics Reliability*, Vol. 44 (6), 2004, pp. 929-938

Kim, Shin, Ryu, Chang, Reliability evaluation and failure analysis for high voltage ceramic capacitor, *Electronic Materials and Packaging*, 2001, pp. 286-295

Kim, Lee, Yon, Lee, Yoo, Condition Monitoring of DC Link Electrolytic Capacitors in Adjustable Speed Drives, 42nd IAS Annual Meeting, 2007, pp. 237 - 243

King, Schaick, Lusk, Electrical Overstress of Non-encapsulated Bond Wires, Proc. 27th IRPS, pp. 141-151, 1989

Kirihata, Takahashi, Wakimoto, Niino, Investigation of Flat-Pack IGBT Reliability, *Thirty-Third IAS Annual Meeting*, 1998, pp. 1016-1021

Kobayashi, Ariyoshi, Masuda, Reliability Evaluation and Failure Analysis for Multilayer Ceramic Chip Capacitors, IEEE Transactions on Components, Hybrids, and Manufacturing Technology, Vol.1 (3), 1978, pp. 316-324

Koziarz, Gilmour, Anomalous Thermal Conductivity in Regions of Non-Uniform Die Attach Integrity, *Proc.* 33<sup>rd</sup> *IRPS*, 1995, pp. 107-111

Krishnaswami, Das, Hull, Ryu, Scofield, Agarwal, Palmour, Gate Oxide Reliability of 4H-SiC MOS Devices, *IEEE International Reliability Physics Symposium*, 2005, pp. 592–593

Lagies, Göhler, Sigg, Turkes, Kraus, Degradation Modelling of Semiconductor Devices and Electrical Circuit, *ICSE'98 Proc.*, 1998

Lahyani, Venet, Grellet and Viverge, Failure prediction of electrolytic capacitors during operation of a switch mode power supply, *IEEE Transaction on Power Electronics*, Vol. 13, Issue 6, pp. 1199-1207.

Lee, Matijasevic, Highly Reliable Die Attachment on Polished GaAs Surfaces Using Gold-Tin Eutectic Alloy, *IEEE Trans. on Components, Packaging, and Manufacturing Technology*, Vol. 12(3), 1989, pp. 406-409

Lehmann, Netzel, Herzer and Pawel, Method for Electrical Detection of Bond Wire Lift-off for Power Semiconductor, *ISPSD*, 2003

Lefranc, Licht, Schultz, Beinert, Mitic, Reliability Testing Of High Power Multi-Chip IGBT Modules, *Microelectronics Reliability*, Vol. 40, pp 1659-1663, 2000

Li, Lai, Xu, Improved Reliability of Ge MOS Capacitor with HfTiON High-k Dielectric by Using Ge Surface Pretreatment in Wet NO, *Microelectronic Engineering*, Vol. 84(9), 2007, pp. 2340-2343

Lin, Perng, Chien, Chiou, Chang et al, Plasma Charging Induced Gate Oxide Damage During Metal Etching and Ashing, *Proc. 1<sup>st</sup> Int. Symp. On Plasma Process-Induced Damage (P2ID)*, 1996, pp. 113-116

Lindqvist, Doksum, Mathematical and statistical methods in reliability, World Scientific, 2003

Luo, Liang, Cho, Design of IGBT gate drive circuit with SOA consideration, Power Electronic Drives and Energy Systems for Industrial Growth, 1998, Vol. 1, pp. 307-311

Lycoudes, Childers, Semiconductor Instability Failure Mechanism Review, *IEEE Trans. Rel.*, Vol 29,1980, pp. 237-247

Mahalik, A Digital Meter for Measuring Dissipation Factor, *Proceedings of the 5th International Conference on Properties and Applications of Dielectric Materials*, May 25-30, 1997, Seoul, Korea

Martin, Reliability of Gate Dielectrics and Metal–Insulator–Metal Capacitors, *Microelectronics and Reliability*, Vol. 45 (5), 2005, pp. 834-840

Maugain, Papadas, Ghibaudo, Gambetta, Mortini, On the Degradation Features of Poly-Emitter NPN BJTs after Hot Carrier Injection, *Proc.34<sup>th</sup> IRPS*, 1995, pp.266-275

Mawby, Bryant, Palmer, Santi, and Hudgins, High Speed Electro-Thermal Models for Inverter Simulations, 25<sup>th</sup> International Conference on Microelectronics, 2006, pp. 166-173

McCluskey, Physics of failures, IEEE Proceedings Aerospace Conference, 1999

McCluskey, Grzybowski et al, Reliability Concerns in High Temperature Electronic System, Engineering Foundation Conference on High Temperature Electronic Materials, Devices and Sensors, 1998

McMillan, Ault, Towards Quantification of Condition Monitoring Benefit for Wind Turbine Generators, *European Wind Energy Conference*, May 2007

Mermet-Guyennet, Perpiñá, Piton, Revisiting Power Cycling Test for Better Life-Time Prediction in Traction, *Microelectronics Reliability*, 2007, pp. 1690-1695

Meysenc, Jylhakallio, Barbosa, Power Electronics Cooling Effectiveness versus Thermal Inertia, *IEEE Tran. on Power Electronics*, vol. 20, No.3, 2005, pp. 687- 693

Mistry, Hokinson, Gieseke, Fox, Preston and Doyle, Voltage Overshoots and N-MOSFET Hot Carrier Robustness in VLSI Circuits, *Proc.* 32<sup>nd</sup> IRPS, 1994, pp. 65-71

Mochizuki, Suekawa, Iura, Satoh, Development of 6.5 kV class IGBT with wide safety operation area, *Power Conversion Conference*, 2002, Vol. 1, pp. 248-252

Moore, Kelsall, The Impact of Delamination on Stress-Induced and Contamination-Related Failure in Surface Mount ICs, *30th IRPS*, 1992, pp. 169-176

Mohan, Undeland, and Robbins, *Power Electronics: converters, applications and design*, 2<sup>nd</sup> ed. John Wiley & Sons, 1995

Morozumi, Yamada, Miyasaka, Seki, Reliability of Power Cycling for IGBT Power Semiconductor Modules, *IAS 2001* 

Moss, Caution-Electrostatic Discharge at Work, *IEEE Transaction on Components, Hybrids, and Manufacturing Technology*, Volume: 5 (4), 1982, pp. 512- 515

Murdock, Torres, Connors, and Lorenz, Active Thermal Control of Power Electronic Modules, *IEEE Transactions On Industry Applications*, Vol. 42, No. 2, March/April 2006

Newcombe, Reliability and Thermal Performance of IGBT Plastic Modules for the More Electric Aircraft, *ISPSD*, 2003

Nicollian, Brews, MOS Physics and Technology, New York: John Wiley & Sons, 1982

Oates, Thin-Film Electromigration: Al Alloy Metallizations for Submicron IC Technologies, *Tutorial Notes, Proc. IRPS*, 1994, pp.2.1-2.23

Ohring, Reliability and failure of electronic materials and devices. San Diego: Academic Press, 1998.

Omi, Fujuta, Tsuda, Maeda, Causes of Cracks in SMD and Type-Specific Remedies, *Proc. Elec. Comp. and Dev. Conf.*, 1991, pp. 776-781

Omura, Electrical and Mechanical Package Design for 4.5kV Ultra High Power IEGT with 6kA Turnoff Capability, *ISPSD 2003* 

Onuki, Koizumi, "Reliability of thick wire bonds in IGBT modules for traction motor drives" ISPSD 95, pp. 428-433

Otsuki, Advanced thin IGBTs with new thermodynamics solution, ISPSD 2003

Palmer, Joyce et al, Circuit Simulator Models for the Diode and IGBT with full temperature dependent features, *IEEE Trans. Power Electronics*, Vol. 18(5), 2003, pp. 1220-1229

Pecht, Hillman, Rogers, Jennings, Conductive filament formation: a potential reliability issue in laminated printed circuit cards with hollow fibers, *IEEE Transactions on Electronics Packaging Manufacturing*, Vol. 22(1), 1999, pp. 80-84

Pepper, Electron injection into SiO<sub>2</sub> from an avalanching p-n junction, *J. Phys. D: Appl. Phys.*, Vol. 6, 1973

Petrarca, Cascone, Fratelli, Vitelli, Partial discharge diagnostics on 3.3 kV, 1.2 kA IGBT modules, 11th International Symposium on High Voltage Engineering, 1999, Vol. 4, pp. 324-327

Papoulis, Probability, random variables and stochastic processes, 2nd ed., McGraw-Hill, 1984.

Pham, Handbook of reliability engineering, Springer, 2003

Phoon, Generation System Reliability Evaluations with Intermittent Renewables, Master thesis, Strathclyde, 2006

Prendergast, Suehle, Chaparala, Murphy and Stephenson, TDDB Characterization of Thin SiO<sub>2</sub> Films with Bimodal Failure Populations, *IEEE Proc. IRPS*, 1995, pp. 124–130.

Ran, Gokani, Clare, et al, Conducted electromagnetic emissions in induction motor drive systems, I. Time domain analysis and identification of dominant modes, *IEEE Transactions on Power Electronics*, Vol. 13(4), 1998

Ran, Gokani, Clare, et al, Conducted electromagnetic emissions in induction motor drive systems, II. Frequency domain models, *IEEE Transactions on Power Electronics*, Vol 13(4), 1998

RAPSDRA: Failure Mechanisms in IGBT Modules for Traction Applications, 1998

Reed, Tantalum chip capacitor reliability in high surge and ripple current applications. In: *Proceedings* of the 15th Capacitor and Resistor Technology Symposium (CARTS), San Diego, California (March 1995), pp. 122–129.

Reimann, Krümer, Franke, Petzoldt, Lorenz, Real Time Calculation of the Chip Temperature of Power Modules in PWM Inverters using a 16Bit Microcontroller, *ISPSD 2000* 

Ritz, Stacy, Broadbent, The Microstructure of Ball Bond Corrosion Failures, *Proc. 25<sup>th</sup> IRPS*, pp. 28-33, 1987

Rogers, Hillman and Pecht, Hollow Fibers Can Accelerate Conductive Filament Formation, ASM International Practical Failure Analysis, Vol. (4), 2001, pp. 57-60

Sarjean, Dollinger, Zirnheld, MacDougall, Goldberg, Capacitors-past, present, and future: a transnational perspective, 22nd International symposium on Power Modulator, 1996, pp. 209-212

Scheuermann and Hecht, Power Cycling Lifetime of Advanced Power Modules for Different Temperature Swings, *Proc. of PCIM 2002*, pp. 59-64

Schutze, Berg, Hierholzer, Further improvements in the reliability of IGBT modules, Industry Applications Conference, 1998 IAS Annual Meeting, Vol. 2, pp. 1022-1025

Scott, Dumin, Hughes, Dumin, Moore, Properties of high-voltage stress generated traps in thin silicon oxide, *IEEE Transactions on Electron Devices*, Vol.43(7), 1996, pp. 1133–1143

Seppl, Saarinen, Ristolainen, Reliability of unencapsulated SMD plastic film capacitors, *Soldering & Surface Mount Technology*, Vol. 2(1), 2000, pp. 15-22

Shekar, Baliga, Nandakumar, Tandon, Reisman, Characteristics of the Emitter-Switched Thyristor, *IEEE Trans. on Electron Devices*, Vol. 39, 1991, pp. 1619–23

Shirley, Blish, Thin-Film Cracking and Wire Ball Shear in Plastic Dips Due to Temperature Cycle and Thermal Shock, 25<sup>th</sup> Reliability Physics Symposium, 1987. pp.238-249

Smolens, Gold, Hoe, Falsafi, and Mai, Detecting Emerging Wearout Faults, *IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE-3)*, 2007

Suehle, Chaparala, Messick, Miller, Boyko, Field and temperature acceleration of time-dependent dielectric breakdown in intrinsic thin SiO2, *Proc.* 32<sup>nd</sup> *IRPS*, 1994, *pp.* 12-125

Sundararajan, McCluskey, and Azarm, Semi Analytic Model for Thermal Fatigue Failure for Die Attach in Power Electronic Building Blocks, *Fourth High Temperature Electronics Conference (HITEC)*, 1998, pp. 94-102

Sze, Physics of Semiconductor Devices, John Wiley & Sons, 1981

Tavner, Xiang and Spinato, Reliability Analysis for Wind Turbines, *Wind Energy*, Vol. 10(1), 2006, pp.1-18

Thébaud<sup>00a</sup>, Woirgard, Zardini, Sommer, Extensive Fatigue Investigation of Solder Joints in IGBT High Power Modules, *Electronic Components and Technology Conference*, 2000, pp. 1436-1442

Thébaud<sup>00b</sup>, High Power IGBT Modules: Thermal Fatigue Resistance Evaluation of the Solder Joints, *IWIPP* 2000, pp 79 - 83

Tonti, Bolam, Hansch, Impact of shallow trench isolation on reliability of buried- and surface-channel sub-µm PFET, *Proc.* 33<sup>rd</sup> IRPS, 1995, pp. 24-29

Tummala and Rymaszewski, *Microelectronics Packaging Handbook*, New York: Van Nostrand Reinhold, 1989

Turbini, Ready and Smith, Conductive Anodic Filament (CAF) Formation: A Potential Reliability Problem for Fine-Line Circuits, http://smaplab.ri.uah.edu/lce/turbini.pdf

Vallon, Rechardeau et al, Converter Topology for Reliability Test Bench Dedicated to PWM Inverters. EPE 2003

Venet, Lahyani, Grellet and Jaco, Influence of aging on electrolytic capacitors function in static converters: Fault prediction method, *European Physical Journal, Applied Physics*, v 5, n 1, 1999, p 71-83.

Venet, Perisse, El-Husseini M.H. and Rojat G., Realization of a smart electrolytic capacitor circuit, IEEE Industry Applications Magazine, Jan/Feb, 2002, pp.16-20

Van der Pol, Koomen, Short Loop Monitoring of Metal Step Coverage by Simple Electrical Measurements, *Proc.* 34<sup>th</sup> IRPS, pp. 148-155, 1996

Verwey, Amerasekera, Bisschop, Physics of SiO<sub>2</sub> layers, Rep. Prog. Phys. 53, 1990, pp. 1297-1331

Varalakshimi, Usa, Udayakumar, Behaviour of MPPF Capacitors under Transient Over Voltages, *Proceedings of International Symposium on Electrical Insulating Materials*, 2005, pp. 390-392

Welsher, Mitchell, Lando, CAF in Composite Printed-Circuit Substrates: Characterization, Modelling and a Resistant Material, 18th annual Reliability Physics Symposium, 1980. pp. 235-237

Whitaker, Electronic systems maintenance handbook, 2nd ed., Boca Raton: CRC Press, 2002.

Witczak, Kosier, Schrimpf, Galloway, Synergetic effects of radiation stress and hot-carrier stress on the current gain of npn bipolar junction transistors, *IEEE Transactions on Nuclear Science*, Vol.41(6), 1994, pp. 2412-2419

Wolfgang 07a, Examples for Failures in Power Electronics Systems, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Wolfgang 07b, Introduction, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Wolfgang 07c, Reliability Risk, *ECPE Tutorial "Reliability of Power Electronic Systems"*, April 2007

Woods, Rossenberg, EPROM Reliability: Part I and II, Electronics, Vol. 53, 1980, pp. 109-115

Xing, Lee, and Boroyevich, Extraction of Parasitics within Wire-Bond IGBT Modules, APEC 1998

Ye, Lin, Basaran, Failure modes and FEM analysis of power electronic packaging, *Finite Elements in Analysis and Design*, Vol. 38, 2002, pp. 601–612

Yiqi, Qing, H<sub>FE</sub> Noise and 1/f Instability in Bipolar Transistor, Proc. 28<sup>th</sup> IRPS, pp. 290–297, 1991

Wu, Held, Jacob, Scacco, Birolini, Investigation on the Long Term Reliability of Power IGBT Modules, *ISPSD 95*, pp. 443-448, 1995

Wu, Wu, Zhang, Dong, Jacob, Held, A study of EOS failures in power IGBT modules, 5th International Conference on Solid-State and Integrated Circuit Technology, 1998, pp. 152-155

Zhao, Liu, Reliability assessment of the metallized film capacitors from degradation data, *Microelectronics Reliability*, Vol. 47, 2007, pp. 434–436

Zhu, Power System Reliability Analysis with Distributed Generators, Master thesis, Virginia Polytechnic Institute and State University, 2003

#### Web links:

[www1] http://seattlepi.nwsource.com/business/174159\_electric20.html

[www2] http://www.weibull.com/basics/lifedata.htm

[www3] http://www.investis.com/ngt/ara\_2005/ofr\_or.html

[www4] http://www.national.com/quality/reliability\_programs.html

[www5] http://www.tyndall.ie/industry/failure\_analysis\_reverse\_engineering.html

[www6] http://my.execpc.com/~endlr/

[www7] http://www.massmind.org/images/www/hobby\_elec/e\_capa.htm

[www8] http://www.calce.umd.edu/whats\_new/2005/CFF.pdf

[www9] http://www.siliconfareast.com/mic.htm

[www10] http://www02.abb.com/global/seitp/seitp202.nsf/0/2b02717d8ba3086cc12573f50030651c/\$file/ACS\_5000\_.pdf

[www11] http://www.m3electronix.com/features.html

[www12] www.ewh.ieee.org/soc/pes/switchgear/Presentations/Bergman.ppt

[www13] http://www.energieanalysen.ethz.ch/bernard/te\_as\_en.htm

[www14] http://www.calce.umd.edu/