The study of component and process reliability is the basis of many efficiency evaluations in the Operations Management discipline. For example, the calculation of the Overall Equipment Effectiveness [OEE] introduced by Nakajima [1] requires an estimate of a crucial parameter called availability, which is strictly related to reliability. Likewise, in the study of service level it is important to know the availability of machines, which again depends on their reliability and maintainability.

Reliability is defined as the probability that a component [or an entire system] will perform its function for a specified period of time when operating in its design environment. The definition therefore requires two elements: an unambiguous criterion for judging whether something is working or not, and an exact definition of the environmental conditions and usage. Reliability can then be defined as the time-dependent probability of correct operation, provided that the component is used for its intended function in its design environment and that we clearly define what we mean by "failure". Because reliability is defined as a probability, any discussion of reliability basics starts with the key concepts of probability.

A broader definition of reliability is that "reliability is the science to predict, analyze, prevent and mitigate failures over time." It is a science, with its own theoretical basis and principles. It also has sub-disciplines, all related in some way to the study and knowledge of faults. Reliability is closely related to mathematics, especially statistics, and to physics, chemistry, mechanics and electronics. Finally, given that the human element is almost always part of the system, it often also involves psychology and psychiatry.

In addition to predicting system durability, reliability also tries to answer other questions. We can derive the availability performance of a system from its reliability, because availability depends on the time between two consecutive failures and on how long it takes to restore the system. Reliability studies can also be used to understand how faults can be avoided: potential failures can be prevented by acting on design, materials and maintenance.

Reliability touches almost every aspect of owning an asset: cost management, customer satisfaction, the proper management of resources, the ability to sell products or services, and the safety and quality of the product.

This chapter presents a discussion of reliability theory, supported by practical examples of interest in operations management. Basic elements of probability theory, such as the sample space, random events and Bayes' theorem, should be reviewed for a deeper understanding.

2. Reliability basics

The period of regular operation of a piece of equipment ends when a chemical-physical phenomenon, called a fault, occurs in one or more of its parts and causes a deviation from its nominal performance. This makes the behavior of the device unacceptable: the equipment passes from the operating state to the non-functioning state.

In Table 1 faults are classified according to their origin. For each failure mode an extended description is given.

| Failure cause | Description |
| --- | --- |
| Stress, shock, fatigue | Function of the temporal and spatial distribution of the load conditions and of the response of the material. The structural characteristics of the component play an important role, and should be assessed in the broadest form possible, incorporating also possible design errors, embodiments, material defects, etc. |
| Temperature | Operational variable that depends mainly on the specific characteristics of the material [thermal inertia], as well as the spatial and temporal distribution of heat sources. |
| Wear | State of physical degradation of the component; it manifests itself as a result of aging phenomena that accompany the normal activities [friction between the materials, exposure to harmful agents, etc.]. |
| Corrosion | Phenomenon that depends on the characteristics of the environment in which the component is operating. These conditions can lead to material degradation or chemical and physical processes that make the component no longer suitable. |

Table 1.

Main causes of failure. The table shows the main cases of failure with a detailed description

To study reliability you need to transform reality into a model that can be analyzed by applying laws and studying its behavior [2]. Reliability models can be divided into static and dynamic ones. Static models assume that a failure does not result in the occurrence of other faults. Dynamic reliability, instead, assumes that some failures, the so-called primary failures, promote the emergence of secondary and tertiary faults, with a cascading effect. In this text we will only deal with static models of reliability.

In the traditional paradigm of static reliability, individual components have a binary status: either working or failed. Systems, in turn, are composed of an integer number $n$ of components, all mutually independent. Depending on how the components are configured in the system and on the operation or failure of the individual components, the system either works or does not work.

Let's consider a generic system $X$ consisting of $n$ elements. Static reliability modeling implies that the operating status of the $i$-th component is represented by the state function $X_i$, defined as:

$$X_i = \begin{cases} 1 & \text{if the } i\text{-th component works} \\ 0 & \text{if the } i\text{-th component fails} \end{cases} \tag{1}$$

The state of operation of the system is modeled by the state function $\Phi(X)$:

$$\Phi(X) = \begin{cases} 1 & \text{if the system works} \\ 0 & \text{if the system fails} \end{cases} \tag{2}$$

The most common configuration of the components is the series system. A series system works if and only if all components work. Therefore, the status of a series system is given by the state function:

$$\Phi(X) = \prod_{i=1}^{n} X_i = \min_{i \in \{1,2,\dots,n\}} X_i \tag{3}$$

where the symbol $\prod$ indicates the product of the arguments.

System configurations are often represented graphically with Reliability Block Diagrams [RBDs], where each component is represented by a block and the connections between blocks express the configuration of the system. The system works if the diagram can be crossed from left to right passing only through working elements. Figure 1 contains the RBD of a four-component series system.

Figure 1.

Reliability block diagram for a four-component [1,2,3,4] series system.

The second most common configuration of the components is the parallel system. A parallel system works if and only if at least one component is working; equivalently, it fails if and only if all components fail. So, if $\bar{\Phi}(X) = 1 - \Phi(X)$ is the function that represents the non-functioning state of the system and $\bar{X}_i = 1 - X_i$ indicates the non-functioning of the $i$-th element, you can write:

$$\bar{\Phi}(X) = \prod_{i=1}^{n} \bar{X}_i \tag{4}$$

Accordingly, the state of a parallel system is given by the state function:

$$\Phi(X) = 1 - \prod_{i=1}^{n} (1 - X_i) = \coprod_{i=1}^{n} X_i = \max_{i \in \{1,2,\dots,n\}} X_i \tag{5}$$

where the symbol $\coprod$ indicates the complement of the product of the complements of the arguments. Figure 2 contains an RBD for a system of four components arranged in parallel.

Figure 2.

Parallel system. The image represents the RBD of a system of four elements [1,2,3,4] arranged in a reliability parallel configuration.
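The series and parallel state functions can be sketched in a few lines of Python. This is only an illustration: the function names `series_state` and `parallel_state` are hypothetical, and component states are assumed to be given as a list of 0/1 values.

```python
from math import prod

def series_state(x):
    """State function of a series system: 1 only if every component works."""
    return prod(x)  # for binary states this equals min(x)

def parallel_state(x):
    """State function of a parallel system: 1 if at least one component works."""
    return 1 - prod(1 - xi for xi in x)  # for binary states this equals max(x)

# Four components, the second one failed.
states = [1, 0, 1, 1]
print(series_state(states))    # 0
print(parallel_state(states))  # 1
```

For binary inputs the product form and the min/max form give the same result, which is why both appear in the state functions of the series and parallel systems.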

Another common configuration of the components is the series-parallel system. In these systems, components are arranged using combinations of series and parallel configurations. An example of such a system is shown in Figure 3.

State functions for series-parallel systems are obtained by decomposition of the system. With this approach, the system is broken down into subsystems or configurations that are in series or in parallel. The state functions of the subsystems are then combined appropriately, depending on how they are configured. A schematic example is shown in Figure 4.

Figure 3.

Series-parallel system. The picture shows the RBD of a series-parallel system made up of 9 elementary units.

Figure 4.

Calculation of the state function of a series-parallel system. Referring to the configuration of Figure 3, the state function of the system is calculated by first computing the state functions of the parallel groups $\{1,2\}$, $\{3,4,5\}$ and $\{6,7,8,9\}$. Then we evaluate the state function of the series of the three groups just obtained.

A particular component configuration, widely recognized and used, is the $k$ out of $n$ parallel system. A $k$ out of $n$ system works if and only if at least $k$ of the $n$ components work. Note that a series system can be seen as an $n$ out of $n$ system and a parallel system as a 1 out of $n$ system. The state function of a $k$ out of $n$ system is given by:

$$\Phi(X) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} X_i \ge k \\ 0 & \text{otherwise} \end{cases} \tag{6}$$
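As a minimal sketch, the $k$ out of $n$ state function can be written as a single comparison. The function name `k_out_of_n_state` is hypothetical, and component states are again assumed to be a list of 0/1 values.

```python
def k_out_of_n_state(x, k):
    """State function of a k-out-of-n system: 1 if at least k components work."""
    return 1 if sum(x) >= k else 0

states = [1, 0, 1, 1]  # three of four components work
n = len(states)

print(k_out_of_n_state(states, 3))  # 1: at least 3 components work
print(k_out_of_n_state(states, n))  # 0: n-out-of-n behaves like a series system
print(k_out_of_n_state(states, 1))  # 1: 1-out-of-n behaves like a parallel system
```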

The RBD for a $k$ out of $n$ system looks identical to the RBD of a parallel system of $n$ components, with the addition of a label "$k$ out of $n$". For other, more complex system configurations, such as the bridge configuration [see Figure 5], we may use more intricate techniques, such as the minimal path set and the minimal cut set, to construct the system state function.

A Minimal Path Set - MPS is a subset of the components of the system such that the operation of all the components in the subset implies the operation of the system. The set is minimal because the removal of any element from the subset eliminates this property. An example is shown in Figure 5.

Figure 5.

Minimal Path Set. The system on the left contains the minimal path sets indicated by the arrows and shown on the right. Each of them represents a minimal subset of the components of the system such that the operation of all the components in the subset implies the operation of the system.

A Minimal Cut Set - MCS is a subset of the components of the system such that the failure of all components in the subset implies the failure of the system. Again, the set is called minimal because the removal of any component from the subset eliminates this property [see Figure 6].

Figure 6.

Minimal Cut Set. The system on the left contains the minimal cut sets, indicated by the dashed lines and shown on the right. Each of them represents a minimal subset of the components of the system such that the failure of all components in the subset implies the failure of the system.

MCS and MPS can be used to build equivalent configurations of more complex systems that cannot be reduced to the simple series-parallel model. The first equivalent configuration is based on the consideration that the operation of all the components of at least one MPS entails the operation of the system. This configuration is therefore constructed by creating a series subsystem for each MPS, using only the components of that set. Then these subsystems are connected in parallel. An example of an equivalent system is shown in Figure 7.

Figure 7.

Equivalent configurations with MPS. You build a series subsystem for each MPS. Then such subsystems are connected in parallel.

The second equivalent configuration is based on the logical principle that the failure of all the components of any MCS implies the failure of the system. This configuration is built by creating a parallel subsystem for each MCS, using only the components of that set. Then these subsystems are connected in series [see Figure 8].

Figure 8.

Equivalent configurations with MCS. You build a subsystem in parallel for each MCS. Then the subsystems are connected in series.
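The two equivalent configurations can be checked against each other in code. The sketch below assumes a hypothetical five-component bridge (components 1 and 2 on the input side, 4 and 5 on the output side, 3 as the bridging element); since the figures are not reproduced here, the path and cut sets are an assumption and not necessarily those of Figures 5 to 8.

```python
from itertools import product

# Assumed minimal path and cut sets of a standard five-component bridge.
MIN_PATH_SETS = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]
MIN_CUT_SETS = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]

def state_from_paths(x, path_sets):
    """Parallel arrangement of series subsystems, one per minimal path set."""
    return max(min(x[i] for i in path) for path in path_sets)

def state_from_cuts(x, cut_sets):
    """Series arrangement of parallel subsystems, one per minimal cut set."""
    return min(max(x[i] for i in cut) for cut in cut_sets)

def all_states(n):
    """Yield every 0/1 assignment for components 1..n as a dict."""
    for bits in product((0, 1), repeat=n):
        yield dict(zip(range(1, n + 1), bits))

# Both equivalent configurations give the same system state in all 32 cases.
agree = all(
    state_from_paths(x, MIN_PATH_SETS) == state_from_cuts(x, MIN_CUT_SETS)
    for x in all_states(5)
)
print(agree)  # True
```

Enumerating all component states is only practical for small systems; it is used here to show that the two constructions describe the same state function.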

After examining the components and the status of the system, the next step in the static modeling of reliability is that of considering the probability of operation of the component and of the system.

The reliability $R_i$ of the $i$-th component is defined by:

$$R_i = P(X_i = 1) \tag{7}$$

while the reliability $R$ of the system is defined as in equation 8:

$$R = P(\Phi(X) = 1) \tag{8}$$

The methodology used to calculate the reliability of the system depends on the configuration of the system itself. For a series system, the reliability of the system is given by the product of the individual reliabilities [law of Lusser, formulated by the German engineer Robert Lusser in the 1950s]:

$$R = \prod_{i=1}^{n} R_i \quad \text{since} \quad R = P\left(\bigcap_{i=1}^{n} (X_i = 1)\right) = \prod_{i=1}^{n} P(X_i = 1) = \prod_{i=1}^{n} R_i \tag{9}$$

For an example, see Figure 9.

Figure 9.

Series system consisting of 4 elements with reliabilities equal to 0.98, 0.99, 0.995 and 0.975. The reliability of the whole system is given by their product: $R = 0.98 \cdot 0.99 \cdot 0.995 \cdot 0.975 = 0.941$.

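For a parallel system, reliability is:

$$R = 1 - \prod_{i=1}^{n} (1 - R_i) = \coprod_{i=1}^{n} R_i \tag{10}$$

In fact, from the definition of system reliability and by the properties of event probabilities, it follows:

$$R = P\left(\bigcup_{i=1}^{n} (X_i = 1)\right) = 1 - P\left(\bigcap_{i=1}^{n} (X_i = 0)\right) = 1 - \prod_{i=1}^{n} P(X_i = 0) = 1 - \prod_{i=1}^{n} \left[1 - P(X_i = 1)\right] = 1 - \prod_{i=1}^{n} (1 - R_i) = \coprod_{i=1}^{n} R_i \tag{11}$$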

In many parallel systems, components are identical. In this case, the reliability of a parallel system with $n$ elements, each with the same reliability $R_i$, is given by:

$$R = 1 - (1 - R_i)^n \tag{12}$$

Figure 10.

A parallel system consisting of 4 elements with the same reliability of 0.85. The system reliability is given by their co-product: $1 - (1 - 0.85)^4 = 0.9995$.

For a series-parallel system, system reliability is determined using the same decomposition approach used to construct the state function for such systems. Consider, for instance, the system drawn in Figure 11, consisting of 9 elements with reliability $R_1 = R_2 = 0.9$; $R_3 = R_4 = R_5 = 0.8$ and $R_6 = R_7 = R_8 = R_9 = 0.7$. Let's calculate the overall reliability of the system.

Figure 11.

The system consists of three groups of blocks arranged in series. Each block is, in turn, formed by elements in parallel. First we calculate $R_{1,2} = 1 - (1 - 0.9)^2 = 0.99$. Then we obtain $R_{3,4,5} = 1 - (1 - 0.8)^3 = 0.992$ and the reliability of the last parallel block, $R_{6,7,8,9} = 1 - (1 - 0.7)^4 = 0.9919$. Finally, we compute the series of the three blocks: $R = R_{1,2} \cdot R_{3,4,5} \cdot R_{6,7,8,9} = 0.974$.
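The three worked examples [series, parallel and series-parallel] can be reproduced with a short Python sketch. The helper names `series_reliability` and `parallel_reliability` are hypothetical, and components are assumed to be mutually independent, as in the static model.

```python
from math import prod

def series_reliability(r):
    """Reliability of independent components in series (product law)."""
    return prod(r)

def parallel_reliability(r):
    """Reliability of independent components in parallel (co-product)."""
    return 1 - prod(1 - ri for ri in r)

# Series system with four components.
r_series = series_reliability([0.98, 0.99, 0.995, 0.975])

# Parallel system with four identical components.
r_parallel = parallel_reliability([0.85] * 4)

# Series-parallel system: three parallel blocks connected in series.
blocks = [[0.9] * 2, [0.8] * 3, [0.7] * 4]
r_series_parallel = series_reliability([parallel_reliability(b) for b in blocks])

print(round(r_series, 3))           # 0.941
print(round(r_parallel, 4))         # 0.9995
print(round(r_series_parallel, 3))  # 0.974
```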

For all other types of systems, which cannot be reduced to a series-parallel scheme, the overall reliability must be calculated with a more intensive approach [3], normally with the aid of special software.

Reliability functions of the system can also be used to calculate measures of reliability importance.

These measures are used to assess which components of a system offer the greatest opportunity to improve the overall reliability. The most widely recognized definition of the reliability importance $I'_i$ of a component is the marginal gain in overall system reliability obtained by a marginal increase of the component reliability:

$$I'_i = \frac{\partial R}{\partial R_i} \tag{13}$$

For other system configurations, an alternative approach facilitates the calculation of the reliability importance of the components. Let $R_{1i}$ be the reliability of the system modified so that $R_i = 1$, and let $R_{0i}$ be the reliability of the system modified so that $R_i = 0$, always keeping the other components unchanged. In this context, the reliability importance $I_i$ is given by:

$$I_i = R_{1i} - R_{0i} \tag{14}$$

In a series system, this formulation is equivalent to writing:

$$I_i = \prod_{j=1,\, j \ne i}^{n} R_j \tag{15}$$

Thus, the most important component [in terms of reliability] in a series system is the least reliable one. For example, consider three elements of reliability $R_1 = 0.9$, $R_2 = 0.8$ and $R_3 = 0.7$. It follows that $I_1 = 0.8 \cdot 0.7 = 0.56$, $I_2 = 0.9 \cdot 0.7 = 0.63$ and $I_3 = 0.9 \cdot 0.8 = 0.72$, which is the highest value.

If the system is arranged in parallel, the reliability importance becomes as follows:

$$I_i = \prod_{j=1,\, j \ne i}^{n} (1 - R_j) \tag{16}$$

It follows that the most important component in a parallel system is the most reliable one. With the same data as the previous example, this time in a parallel arrangement, we can verify Eq. 16 for the first item:

$$I_1 = R_{11} - R_{01} = \left[1 - (1 - 1)(1 - 0.8)(1 - 0.7)\right] - \left[1 - (1 - 0)(1 - 0.8)(1 - 0.7)\right] = 1 - 0 - 1 + (1 - 0.8)(1 - 0.7) = (1 - 0.8)(1 - 0.7)$$
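The definition of reliability importance as the difference between the system reliability with component $i$ perfect and with component $i$ failed lends itself to a direct computation. The sketch below uses hypothetical helper names and the three-component data of the examples above.

```python
from math import prod

def series_reliability(r):
    return prod(r)

def parallel_reliability(r):
    return 1 - prod(1 - ri for ri in r)

def importance(system_reliability, r, i):
    """Reliability importance of component i: R_1i - R_0i."""
    perfect = list(r)
    perfect[i] = 1.0   # system modified so that R_i = 1
    failed = list(r)
    failed[i] = 0.0    # system modified so that R_i = 0
    return system_reliability(perfect) - system_reliability(failed)

r = [0.9, 0.8, 0.7]
series_imp = [importance(series_reliability, r, i) for i in range(len(r))]
parallel_imp = [importance(parallel_reliability, r, i) for i in range(len(r))]

print([round(v, 2) for v in series_imp])    # [0.56, 0.63, 0.72]
print([round(v, 2) for v in parallel_imp])  # [0.06, 0.03, 0.02]
```

In series the least reliable component [the third] has the highest importance, while in parallel the most reliable component [the first] does. Because `importance` takes the system reliability function as an argument, the same sketch applies to any configuration for which such a function is available.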

To calculate the reliability importance of components belonging to complex systems, which cannot be reduced to the simple series-parallel scheme, the reliability of several modified systems must be computed. For this reason the calculation is often done using automated algorithms.

3. Fleet reliability

Suppose you have studied the reliability of a component and found that it is 80% for a mission duration of 3 hours. Knowing that we have 5 identical items simultaneously active, we might be interested in the overall reliability of the group. In other words, we want to know the probability of having a certain number of items functioning at the end of the 3-hour mission. This problem is known as fleet reliability.

Consider a set of $m$ identical and independent systems at the same instant, each having reliability $R$. The group may represent a set of systems in use or a set of devices under test, in both cases independent and identical. A discrete random variable of great interest in reliability is $N$, the number of functioning items. Under the assumptions specified, $N$ is a binomial random variable, which counts the successes of a Bernoulli process. The corresponding probabilistic model is, therefore, the one that describes the extraction of balls from an urn filled with a known number of red and green balls. Suppose that the fraction $R$ of green balls coincides with the reliability after 3 hours. After each extraction, the ball is put back in the urn. The extraction is repeated $m$ times, and we look for the probability of finding $n$ green balls. The sequence of random variables thus obtained is a Bernoulli process in which each extraction is a trial. Since the number $N$ of successes in $m$ extractions with replacement follows the binomial distribution $B(m, R)$, the probability mass function of $N$ is the well-known:

$$P(N = n) = \frac{m!}{n!\,(m - n)!} R^n (1 - R)^{m - n}, \quad n = 0, 1, \dots, m \tag{17}$$

The expected value of $N$ is $E[N] = \mu_N = m \cdot R$ and the standard deviation is $\sigma_N = \sqrt{m \cdot R \cdot (1 - R)}$.

Let's consider, for example, a corporate fleet consisting of 100 independent and identical systems. All systems have the same mission, independent of the other missions. Each system has a mission reliability equal to 90%. We want to calculate the average number of missions completed and the probability that at least 95% of the systems complete their mission. This involves analyzing the distribution of the binomial random variable characterized by $R = 0.90$ and $m = 100$. The expected value is given by $E[N] = \mu_N = 100 \cdot 0.9 = 90$.

The probability that at least 95% of the systems complete their mission is the sum of the probabilities that exactly 95, 96, 97, 98, 99 and 100 elements of the fleet complete their mission:

$$P(N \ge 95) = \sum_{n=95}^{100} \frac{m!}{n!\,(m - n)!} R^n (1 - R)^{m - n} = 0.058 \tag{18}$$
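The fleet example can be checked numerically with the Python standard library. This is a minimal sketch: the function name `binomial_pmf` is hypothetical, and the fleet is assumed to consist of independent, identical systems as stated above.

```python
from math import comb, sqrt

def binomial_pmf(n, m, r):
    """Probability that exactly n of m independent systems complete the mission."""
    return comb(m, n) * r**n * (1 - r) ** (m - n)

m, r = 100, 0.90

mean = m * r                    # expected number of completed missions
std = sqrt(m * r * (1 - r))     # standard deviation of that number

# Probability that at least 95 of the 100 systems complete their mission.
p_at_least_95 = sum(binomial_pmf(n, m, r) for n in range(95, m + 1))

print(round(mean, 1), round(std, 1))  # 90.0 3.0
print(round(p_at_least_95, 3))        # 0.058
```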

4. Time dependent reliability models

When reliability is expressed as a function of time, the continuous, non-negative random variable of interest is $T$, the instant of failure of the device. Let $f(t)$ be the probability density function of $T$, and let $F(t)$ be the cumulative distribution function of $T$. $F(t)$ is also known as the failure function or unreliability function [4].

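In the context of reliability, two additional functions are often used: the reliability function and the hazard function. Let's define the reliability $R(t)$ as the survival function:

$$R(t) = P(T > t) = 1 - F(t) \tag{19}$$

The Mean Time To Failure - MTTF is defined as the expected value of the failure time:

$$MTTF = E[T] = \int_0^{\infty} t \cdot f(t) \, dt \tag{20}$$

Integrating by parts, we can prove the equivalent expression:

$$MTTF = \int_0^{\infty} R(t) \, dt \tag{21}$$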
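The integration by parts can be written out explicitly. As a sketch, assume that $R(t) = 1 - F(t)$, so that $f(t) = -\frac{dR(t)}{dt}$, and that $t \cdot R(t) \to 0$ as $t \to \infty$ [an assumption that holds whenever the MTTF is finite]:

$$\int_0^{\infty} t \cdot f(t) \, dt = \Big[ -t \cdot R(t) \Big]_0^{\infty} + \int_0^{\infty} R(t) \, dt = \int_0^{\infty} R(t) \, dt$$

The boundary term vanishes at both ends: at $t = 0$ because of the factor $t$, and at infinity because of the assumption above.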

5. Hazard function

Another very important function is the hazard function, denoted by $\lambda(t)$, defined as the instantaneous failure rate at time $t$ of an element that has survived up to time $t$. The failure rate is the ratio between the probability of failure in a neighborhood of $t$, conditional on the element being healthy at $t$, and the width of that neighborhood.

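The hazard function $\lambda(t)$ [5] coincides with the intensity function $z(t)$ of a Poisson process. The hazard function is given by:

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T > t)}{\Delta t}$$

The residual reliability $R(t + t_0 \mid t_0)$ is the probability that a device which has already survived a time $t_0$ survives for a further time $t$. By Bayes' theorem:

$$R(t + t_0 \mid t_0) = P(T > t + t_0 \mid T > t_0) = \frac{P(T > t_0 \mid T > t + t_0) \cdot P(T > t + t_0)}{P(T > t_0)} \tag{28}$$

And, given that $P(T > t_0 \mid T > t + t_0) = 1$, we obtain the final expression, which determines the residual reliability:

$$R(t + t_0 \mid t_0) = \frac{R(t + t_0)}{R(t_0)} \tag{29}$$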

The residual Mean Time To Failure - residual MTTF measures the expected value of the residual life of a device that has already survived a time $t_0$:

$$MTTF(t_0) = E[T - t_0 \mid T > t_0] = \int_0^{\infty} R(t + t_0 \mid t_0) \, dt \tag{30}$$

For a device with an increasing failure rate [IFR], the residual reliability and the residual MTTF decrease progressively as the device accumulates hours of operation. This behavior explains the use of preventive actions to avoid failures. For a device with a decreasing failure rate [DFR], both the residual reliability and the residual MTTF increase while the device accumulates hours of operation. This behavior motivates the use of an intense running-in period [burn-in] to avoid failures in the field.

The Mean Time To Failure - MTTF measures the expected value of the life of a device and coincides with the residual MTTF for $t_0 = 0$. In this case we have the following relationship:

$$MTTF = MTTF(0) = E[T \mid T > 0] = \int_0^{\infty} R(t) \, dt \tag{31}$$

The characteristic life of a device is the time $t_C$ corresponding to a reliability $R(t_C)$ equal to $1/e$, that is, the time for which the area under the hazard function is unitary:

$$R(t_C) = e^{-1} = 0.368 \;\rightarrow\; \int_0^{t_C} \lambda(u) \, du = 1 \tag{32}$$

Let us consider a CFR device, that is, a device with a constant failure rate $\lambda$. The time to failure is an exponential random variable. In fact, the probability density function of the failure time is that of an exponential distribution:

$$f(t) = \lambda(t) \cdot e^{-\int_0^t \lambda(u) \, du} = \lambda e^{-\lambda t} \tag{33}$$

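The corresponding cumulative distribution function $F(t)$ is obtained by integrating the density, which is zero for negative times:

$$F(t) = \int_{-\infty}^{t} f(z) \, dz = \int_0^{t} \lambda e^{-\lambda z} \, dz = 1 - e^{-\lambda t} \tag{34}$$

The reliability function $R(t)$ is the survival function:

$$R(t) = 1 - F(t) = e^{-\lambda t} \tag{35}$$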

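For CFR items, the residual reliability and the residual MTTF both remain constant as the device accumulates hours of operation. In fact, from the definition of residual reliability, $\forall t_0 \in [0, \infty)$, we have:

$$R(t + t_0 \mid t_0) = \frac{R(t + t_0)}{R(t_0)} = \frac{e^{-\lambda (t + t_0)}}{e^{-\lambda t_0}} = e^{-\lambda (t + t_0) + \lambda t_0} = e^{-\lambda t} = R(t) \tag{36}$$

Similarly, the residual MTTF does not change with time:

$$MTTF(t_0) = \int_0^{\infty} R(t + t_0 \mid t_0) \, dt = \int_0^{\infty} R(t) \, dt \quad \forall t_0 \in [0, \infty) \tag{37}$$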
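The invariance of the residual reliability and of the residual MTTF for a CFR device can be checked numerically. The sketch below is only illustrative: the function names are hypothetical, and the integral is approximated with the trapezoidal rule on a finite horizon [an assumption that is reasonable because the exponential tail beyond the horizon is negligible].

```python
from math import exp

def reliability(t, lam):
    """Reliability of a CFR device: R(t) = exp(-lambda * t)."""
    return exp(-lam * t)

def residual_reliability(t, t0, lam):
    """R(t + t0 | t0) = R(t + t0) / R(t0)."""
    return reliability(t + t0, lam) / reliability(t0, lam)

def residual_mttf(t0, lam, steps=20_000):
    """Trapezoidal approximation of the integral of R(t + t0 | t0) over t >= 0."""
    horizon = 50.0 / lam  # truncation point; the neglected tail is about exp(-50)
    dt = horizon / steps
    total = 0.5 * (residual_reliability(0.0, t0, lam)
                   + residual_reliability(horizon, t0, lam))
    total += sum(residual_reliability(k * dt, t0, lam) for k in range(1, steps))
    return total * dt

lam = 1.0
for t0 in (0.0, 2.0, 10.0):
    # The residual reliability for t = 1.5 and the residual MTTF do not depend on t0.
    print(t0, round(residual_reliability(1.5, t0, lam), 4), round(residual_mttf(t0, lam), 3))
```

For every value of `t0` the printed residual reliability equals $e^{-1.5}$ and the residual MTTF equals $1/\lambda$, up to the accuracy of the numerical integration.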

This behavior implies that preventive actions and burn-in are useless for CFR devices. Figure 13 shows the trend of the probability density function $f(t) = \lambda e^{-\lambda t}$ and of the cumulative distribution function $F(t) = 1 - e^{-\lambda t}$ for a constant failure rate $\lambda = 1$. In this case, since $\lambda = 1$, the probability density function and the reliability function coincide: $f(t) = R(t) = e^{-t}$.

Figure 13.

Probability density function and cumulative distribution function of an exponential distribution. The figure shows the trend of $f(t) = \lambda e^{-\lambda t}$ and of $F(t) = 1 - e^{-\lambda t}$ with $\lambda = 1$.

The probability that a fault which has not yet occurred at time $t$ occurs in the next $dt$ can be written as follows:

$$P(t < T \le t + dt \mid T > t)$$

Recalling Bayes' theorem, in which we consider the probability of a hypothesis $H$ given the evidence $E$:

$$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$

we can replace the evidence E with the fact that the fault has not yet taken place, from which we obtainP[E]→P[T>t]. We also exchange the hypothesis H with the occurrence of the fault in the neighborhood oft, obtainingPH→ Pt