In a Bayesian network (BN), how a node of interest is affected by the observation at another node is a main concern, especially in backward inference. This challenge necessitates the proposed global sensitivity analysis (GSA) for BN, which calculates the Sobol’ sensitivity index to quantify the contribution of an observation node toward the uncertainty of the node of interest. In backward inference, a low sensitivity index indicates that the observation cannot reduce the uncertainty of the node of interest, so that a more appropriate observation node providing higher sensitivity index should be measured. This GSA for BN confronts two challenges. First, the computation of the Sobol’ index requires a deterministic function while the BN is a stochastic model. This paper uses an auxiliary variable method to convert the path between two nodes in the BN to a deterministic function, thus making the Sobol’ index computation feasible. Second, the computation of the Sobol’ index can be expensive, especially if the model inputs are correlated, which is common in a BN. This paper uses an efficient algorithm proposed by the authors to directly estimate the Sobol’ index from input–output samples of the prior distribution of the BN, thus making the proposed GSA for BN computationally affordable. This paper also extends this algorithm so that the uncertainty reduction of the node of interest at given observation value can be estimated. This estimate purely uses the prior distribution samples, thus providing quantitative guidance for effective observation and updating.

## Introduction

During the past 30 years, the Bayesian network (BN) has become a key method for representation and reasoning under uncertainty in the fields of engineering [1,2], machine learning [3,4], artificial intelligence [5,6], etc. In a BN, random variables are denoted by nodes and their dependence relationships are denoted by directed arcs (arrows). An arc (arrow) between two nodes indicates that the distribution of the downstream node (child node) is conditioned on the value of the upstream node (parent node), i.e., the arc is associated with a conditional probability distribution (CPD). The entire BN represents the joint probability distribution of the random variables. As long as the BN has been established, two tasks can be pursued: (1) calculate the distribution of a downstream node based on the given observation of an upstream node, i.e., forward propagation and (2) estimate the posterior distribution of an upstream node at the given observation of a downstream node, i.e., backward inference. Usually, the purpose of backward inference is to reduce the uncertainty of the variable (node) of interest, as shown in Fig. 1.

We can see that both the forward propagation and the backward inference concern that whether fixing a node would affect the distribution of another node. And this concern actually requires a sensitivity analysis: in forward propagation, if the node of interest has a low sensitivity with respect to an input node (an upstreaming node which causes uncertainty in the node of interest), we can be simply fix this input node to achieve dimension reduction without affecting the resultant distribution of the node of interest; and in backward propagation, if the node of interest has a low sensitivity with respect to an observation node, then this observation node cannot effectively reduce the uncertainty of the node of interest (as shown in Fig. 1(b)); thus, we should switch to observe a more appropriate node leading to higher sensitivity. Note that this sensitivity analysis is desired before conducting the forward propagation and/or the backward inference.

where $x\u2212i$ means all the model inputs other than $xi$, and $V(\u22c5)$ means the variance, and $E(\u22c5)$ means the mean value. For the numerator of the middle term of Eq. (1), $V\Phi l(y)=E\Phi l(Vx\u2212i(y|xi))+V\Phi l(Ex\u2212i(y|xi))$ denotes the mean value of $y$ across different values of $x\u2212i$ at given value of $xi$, and $Vxi(\u22c5)$ means the variance across different values of $xi$. Similarly, for the numerator of the right term of Eq. (1), $Vx\u2212i(y|xi)$ denotes the variance of $y$ across different values of $x\u2212i$ at given value of $xi$, and $Exi(\u22c5)$ means the variance across different values of $xi$. Furthermore, the middle term and the right term are equal due to the law of total variance $V(y)=Vxi(Ex\u2212i(y|xi))+Exi(Vx\u2212i(y|xi))$.

$Si$ quantifies the contribution of input $xi$ by itself to the uncertainty in output $y$. Specifically, the last term of Eq. (1) also indicates that $Si$ is the average ratio of the output variance reduction by fixing $xi$. For example, $Si=0.3$ means that the variance of $y$ will reduce by 30% on average after fixing $xi$.

The proposed GSA for BN aims to calculate the first-order Sobol’ index to quantity the contribution of node $X1$ toward the uncertainty of the node of interest $XN$. (As mentioned earlier, $X1$ denotes an input node in the case of forward propagation, and an observation node in the case backward inference). However, this GSA for a BN confronts two challenges of feasibility and affordability. First, the computation of the Sobol’ index requires a deterministic function [15]. The term “deterministic function” means that if all the inputs are given specific values, the function outputs a single value. It does not matter whether the function is closed-form or implicit. But a unique input to the model gives a unique value of the output, and the variance-based Sobol’ index [16] needs this assumption. In a previous paper, we have discussed the need for a deterministic function in the Sobol index computation [10]. Our paper emphasizes the term deterministic function to contrast from a “stochastic” prediction; in the latter case, the function output is probabilistic (i.e., it has many possible realizations) even if all inputs are fixed. A simple stochastic model is a Gaussian variable $X\u223cN(\mu ,\sigma 2)$ if we consider $\mu $ and $\sigma $ as model inputs and $X$ as model output. The BN is a stochastic model, i.e., the node of interest still has a distribution even at given values of other nodes. And the required deterministic function mapping $X1$ (and some other variables) to the node of interest $XN$ is unestablished. Proof of the existence and the establishment of this deterministic function needs to be solved. Note that usually the Sobol’ index is applied to deterministic function of continuous input/output, so that this paper only considers the continuous BN where all the nodes are continuous variables.

Second, the computation of the Sobol’ index can be expensive even if the deterministic function is established. Based on Eq. (1), computing $Si$ analytically is nontrivial, since $Ex\u2212i(\u22c5)$ in Eq. (1) indicates a multidimensional integral. Computing $Si$ by Monte Carlo simulation (MCS) directly is also expensive. The numerator in Eq. (1) leads to a double-loop MCS [7]: the inner loop $Ex\u2212i(y|xi)$ computes the mean value of $y$ using $Xt\u2208\u211cm$ random samples of $x\u2212i$; and the outer loop computes $Vxi(Ex\u2212i(y|xi))$ by iterating the inner loop $n2$ times at different values of $xi$. Various algorithms have been proposed to reduce the cost in the computation of the first-order Sobol’ index, categorized into analytical methods [17–19] and Monte Carlo methods [16,20–23]. However, all of these algorithms assume uncorrelated model inputs, while the nodes in a BN are usually correlated. Saltelli's paper [23] in 2002 mentioned that there is no alternative to the expensive double-loop MCS to compute $Si$ with correlated model inputs. This hurdle in computational cost is also to be solved.

The rest of the paper is organized as follows: Section 2 uses the auxiliary variable method to convert the path between node $X1$ and node $XN$ to a deterministic function, thus making the Sobol’ index computation feasible for a BN. Section 3.1 introduces an efficient algorithm to directly estimate the Sobol’ index from Monte Carlo samples of the prior distribution of the BN, so that the proposed GSA for the BN is affordable. The resultant index is the averaged variance reduction ratio (VRR) of the node of interest across all possible values of the observation node. If a specific value of the observation node is given, the proposed method is extended to give a better estimate of variance reduction ratio at the given observation value. Section 4 illustrates the proposed method using two examples, including a time-independent static BN and a time-dependent dynamic BN.

Note: the sensitivity analysis in this paper is based on “current knowledge,” which means the joint distribution of the Bayesian network. This paper uses the prior distribution samples to conduct the sensitivity analysis since that is our current knowledge before any observation and updating. Of course, after collecting an observation and subsequent calculation of the posterior distribution, we can also use the samples from the posterior distribution to calculate the posterior sensitivity index.

## Feasibility of Global Sensitivity Analysis for a Bayesian Network

### Auxiliary Variable Method.

The auxiliary variable method was developed by Sankararaman and Mahadevan [24] to quantitatively distinguish the contributions of aleatory natural variability and epistemic distribution parameter uncertainty in a random variable $x$. The distribution of $x$ is conditioned on the value of its distribution parameter $\theta x$. The uncertainty about the value of $\theta x$ is represented by a probability density $\pi (\theta x)$, and this uncertainty is epistemic. The aleatory uncertainty of $x$ is denoted by a conditional distribution $\pi (x|\theta x)$, which changes as the value of $\theta x$ changes.

Equation (2) is a deterministic function $f:(Ux,\theta x)\u2192x$, since a sample of $\theta x$ and a sample of $Ux$ lead to a single value of $x$. At a given value of $\theta x$, the sample of $Ux$ and the sample of $x$ have a one-to-one mapping, i.e., a single value of $x$ is determined once the value of $Ux$ is decided. Thus the natural variability in $x$ is represented by $Ux$. This standard uniform random variable $Ux$, which is the CDF value of $\pi (x|\theta x)$, is named as the auxiliary variable.

Although the use of auxiliary variable is a standard procedure in sampling random variables, generally it is used implicitly and only the resultant samples of the random variables are recorded and utilized. However, if we use this auxiliary variable explicitly, it makes the GSA for BN possible. This will be explained in the remainder of this section.

### Deterministic Function for a Directed Path in Bayesian Network.

The auxiliary variable method have been extended to any variable whose distribution is conditioned on other variables [10,26], i.e., to any CPD in the BN. Assume that the distribution of a random variable $C$ depends on the value of two other random variables $A$ and $B$ by a CPD $\pi (C|A,B)$. Then the variability in $\pi (C|A,B)$ can be captured by a single auxiliary variable $UC$, which is the CDF value of $\pi (C|A,B)$. Thus, the uncertainty in variable $C$ is caused by two components: (1) the uncertainty due to the parent nodes $A,B$; and (2) the uncertainty expressed by the CPD at given values of $A$ and $B$. The introduced auxiliary variable captures the later part. (With reference to Sec. 2.1, $A$ and $B$ are analogous to distribution parameters for the random variable $C$). As shown in Fig. 2 $(A)$, $(B),$ and $(C)$ constitute a simple BN. The introduced auxiliary variable $UC$ converts $C$ to be a deterministic node, which means the value of $C$ is fixed once the value of its parent nodes ${A,B,UC}$ is given. Finally, this auxiliary variable build a deterministic function $C=F\u22121(UC|A,B)$, where $F\u22121(\u22c5)$ is the inverse CDF of the CPD $\pi (C|A,B)$.

The auxiliary variable method can be further extended to a directed path $X1\u2192X2\u2026\u2192XN$ in a BN. Here, $XN$ is the node of interest, and the objective is to compute the sensitivity index of $X1$ to decide whether $XN$ is sensitive to it. An example of the directed path is $A\u2192C\u2192E\u2192G$ in Fig. 3. As shown in Fig. 4, by introducing auxiliary variables to each CPD in this directed path, a deterministic function mapping $X1$ to $XN$ is established

where $F\u22121(UXi|PaXi\u2032,Xi\u22121)$ for $i=2toN$ is the inverse CDF of the CPD $\pi (Xi|PaXi\u2032,Xi\u22121)$, and $UXi$ is the auxiliary variable introduced for this CPD, and $PaXi\u2032$ represents the parent nodes of $Xi$ that are not in this path (Note that another notation $PaV$ is used later, which means all the parents node of $V$, i.e., $PaXi={PaXi\u2032,Xi\u22121}$ in Fig. 4. The inputs of Eq. (3) are ${X1,PXi\u2032,UXi}$ for $i=2toN$, thus Eq. (3) can be also denoted as a deterministic function $f:{X1,PXi\u2032,UXi}\u2192XN$.

### Deterministic Function for a Trail in Bayesian Network.

The directed path from $X1$ to $XN$ in Sec. 2.2 requires that all the arcs are directed toward $XN$, which may NOT be satisfied. It is not usually that some arrows are toward $X1$. In backward inference, the node of interest $XN$ is usually an upstream node (probably root node) while the observation node is usually a downstream node. Thus, the path between them turns to be $X1\u2190X2\u2190\u2026\u2190XN$, where all the arrows are toward $X1$. In this case, the function $f:X1\u2192XN$ will be still missing.

To solve this problem, let consider a general case named “trail.” A trail $X1\u2212X2\u2212\u2026\u2212XN$ (where the arc “ $\u2212$ ” is still directed, either “ $\u2192$ ” or “ $\u2190$ ”) only requires all the adjacent nodes in the path are connected by arcs, regardless of the direction of the arcs. The deterministic function established in Eq. (3) for the directed path can be also extended to the trail based on the theorem of arc reversal [27].

Theorem 1 Arc Reversal. *Given that there is an arc*$(V1,V2)$*from node*$V1$*to node*$V2$*, but no other directed path from*$V1$*to*$V2$*, arc*$(V1,V2)$*can be replaced by arc*$(V2,V1)$*. Afterward, both nodes inherit each other's parent nodes.*

This theorem is illustrated in Fig. 5. Here, $PaV1$ indicates the parent nodes of $V1$, and $PaV2$ indicates the parent nodes of $V2$. In addition, $PaV1\PaV2$ are the nodes which are the parents of $V1$ but not the parents of $V2$, and correspondingly $PaV2\PaV1$ are the nodes which are the parents of $V2$ but not the parents of $V1$; and $PaV1\u2229\u2009PaV2$ are the shared parents of $V1$ and $V2$. Figure 5 shows that after reversing the arc between $V1$ and $V2$, extra arcs $(PaV1\PaV2,V2)$ and $(PaV2\PaV1,V1)$ are also derived based on Ref. [27] and added the new BN to guarantee that the new BN after arc reversal is mathematically equivalent to the original BN. The CPDs also need to be redefined, and the derivation of the new CPDs can be also found in Ref. [27]. However, note that the proposed method in this paper do NOT need to derive these new CPDs. The main focus of this section is to illustrate the possibility of arc reversal and prove the existence of the deterministic function mapping $X1$ to $XN$ even if the path between them is undirected.

With respect to the trail between $X1$ and $XN$, Theorem 1 proves that the arc $(Xi,Xj)$ between two adjacent nodes $Xi$ and $Xj$ ($i=j+1orj\u22121$ so that they are adjacent) can be reversed, as long as there is no other directed path from $Xi$ to $Xj$. If all the arcs toward $X1$ can be reversed, this trail will be converted to a directed path from $X1$ to $XN$ so that a deterministic function mapping $X1$ to $XN$ exists based on Eq. (3). In Fig. 3, the trail $H\u2190E\u2192G$ can be converted to a directed path $H\u2192E\u2192G$ by reserving the arc $(E,H)$; then, a deterministic function mapping $H$ to $G$ can be constructed using auxiliary variables.

The arc reversal makes the sensitivity analysis regarding the backward Bayesian inference possible. In this case, we aim to compute the Sobol’ index of observation node $X1$ regarding an inference node of interest $XN$ so that we can quantify the variance reduction of $XN$ by fixing node $X1$ at an observation value. The existing path in the Bayesian network is $X1\u2190X2\u2190\u2026\u2190XN$, then the arc reversal enables us to reverse all the arcs to build the required deterministic function.

It is arc reversal that makes the sensitivity analysis regarding Bayesian inference possible. In this case, we aim to compute the sensitivity of root node $XN$ with respect to the observation node $X1$ so that we can quantify the uncertainty reduction of $XN$ by fixing node $X1$ at an observation value. We need a directed path from $X1$ to $XN$. However, the existing path in the Bayesian network is always directed from root node $XN$ toward the observation $X1$, so that we must reverse all the arcs to build the required deterministic function.

### Global Sensitivity for a Path in the Bayesian Network.

For two nodes $X1$ and $XN$ in the BN, Secs. 2.1–2.3 explained the possibility to build a deterministic function mapping $X1$ to $XN$ via: (1) the directed path or trail between them, (2) the theorem of arc reversal, and (3) the introduction of auxiliary variables. Thus, we can conduct the GSA on Eq. (3) and compute the first-order Sobol’ index $SX1$ for $X1$. As explained below in Eq. (1), $SX1$ is the average ratio of the reduced variance of $XN$ by fixing $X1$. In backward inference, a low value $SX1$ indicates that measuring $X1$ to reduce the uncertainty of $XN$ is just a waste of effort, and another node of higher sensitivity index would be more appropriate.

However, the computation of $SX1$ is nontrivial. First, building the deterministic function explicitly can be complicated for both directed and trail. For the directed path $X1\u2192\u2026\u2192XN$, the effort to build the deterministic function in Eq. (3) becomes intensive if the path is long so that many nodes and auxiliary variables will be involved. For the trail, more efforts to modify the structure of the BN and derive new CPDs are required. In backward inference, usually the observation node is downstream and the node of interest is the upstream; thus, the corresponding path is $X1\u2190\u2026\u2190XN$. To build the required deterministic function mapping $X1$ to $XN$, all arcs in the path need to be reversed, which brings intensive computational efforts.

Second, even with the deterministic function established, calculating the sensitivity index also needs intensive effort. The inputs of the deterministic function include the nodes in the BN, so the correlation between them is very common. As mentioned in Sec. 1, since current efficient algorithms for Sobol’ index usually require uncorrelated inputs, the expensive double-loop MCS is the only choice.

Actually, the purpose of this section is to mathematically prove the existence of a deterministic function between two nodes (regardless of the computational effort), so that the Sobol’ index, which requires the deterministic function, is applicable to quantify the sensitivity for the Bayesian network. The Sobol’ index computation will be realized by an algorithm proposed by the authors. Details of this algorithm can be found in Ref. [28], and a brief introduction is given in Sec. 3.1. This algorithm directly extracts the first-order Sobol’ index from the input–output samples. Using this algorithm, the Sobol’ index of the observation node quantifying its contribution toward the uncertainty of the node of interest can be computed purely using their samples generated from the joint prior distribution of the BN, where the computation of the inference is NOT needed. In detail, we generate samples of the root nodes from their prior distributions and then generate samples of nonroot nodes based on the samples of their parent nodes and the CPDs. Therefore, explicitly establishing the deterministic function is not necessary, and the expensive double-loop MCS method is also avoided. Section 3.2 extends this algorithm to a better variance reduction estimate at a specific observation value.

## Sobol’ Index Computation and Variance Reduction at Given Observation

### Directly Estimate the First-Order Sobol’ Index From Input–Output Samples.

Consider a deterministic function of $y=f(x)$ where $x={x1,\u2026,xk}$. This algorithm is regarding first-order Sobol’ index expression of $Si=1\u2212Exi(Vx\u2212i(y|xi))/V(y)$, whose numerator $Exi(Vx\u2212i(y|xi))$ implies an expensive double-loop Monte Carlo simulation.

First we illustrate stratified sampling, which generates samples in equal probability intervals to represent the distribution of a random variable $xi$. Figure 6(a) shows one strategy [7] of stratified sampling: (1) divide the CDF of $xi$ into $M$ intervals such that these intervals have the same length and (2) generate one sample $ul$ (the red dots in Fig. 6(a), and $l=1toM$) from each CDF interval and obtain samples of $xi$ (the green dots in Fig. 6) by CDF inversion $xil=P\u22121(ul)$, where $P\u22121(\u22c5)$ is the inverse CDF of $xi$. If we take the bounds of these intervals of the CDF as the inputs of $P\u22121(\u22c5)$, the sampling space of $xi$ is actually divided into $M$ equally probable intervals $\Phi l(l=1toM)$, as shown in Fig. 6, and $xil$ is actually a random sample generated within $\Phi l$. In this paper we also denote $\Phi ={\Phi 1,\Phi 2,\u2026,\Phi M}$.

where $xil#$ is an unknown but existing point in $\Phi l$. Note that now $V\Phi l(y)$ is the variance of $y$ given $xi\u2208\Phi l$ and $V\Phi l(Ex\u2212i(y|xi))$ is the variance of $Ex\u2212i(y|xi)$ given $xi\u2208\Phi l$.

where $\Phi l(l=1toM)$ then Eq. (11) can be realized by the following steps:

- (1)
Generate $n$ random samples of $x$;

- (2)
Obtain corresponding values of $y$ by evaluating $y=f(x)$, and estimate $V(y)$ using all samples of $y$;

- (3)
Divide the domain of $xi$ into $M$ equally probable intervals, as shown in Fig. 6;

- (4)
Assign the samples of $y$ into divided intervals based on one-to-one mapping between the samples of $xi$ and samples of $y$;

- (5)
Estimate $V\Phi l(y)$ as the sampling variance of $y$ in each interval;

- (6)
Estimate $E\Phi (V\Phi l(y))$ as the sampling mean of $t$ in step 5;

- (7)
The first-order index is $Si=1\u2212E\Phi (V\Phi l(y))/V(y)$.

The steps to realize the proposed method are also illustrated in Fig. 7, where samples in different equally probable intervals are represented in different colors. Step 1 in Fig. 7 shows that the proposed algorithm can directly estimate $Si$ from input–output samples without rerunning the underlying model. Moreover, steps 3 and 4 show that the samples of $x\u2212i$ are not used in calculating the index $Si$ for $xi$; therefore, the calculation of $Si$ purely depends on the samples of $xi$ and $y$*,* and can be achieved even if the samples of $x\u2212i$ are missing. In addition, the derivation of Eq. (11) does not assume uncorrelated inputs; thus, this algorithm can handle both correlated and uncorrelated inputs. These three advantages make this algorithm suitable for the GSA of a BN using the samples generated from the joint prior distribution.

One question is how many samples are needed to reach desirable accuracy of the computed index. This question has been addressed in an earlier paper [28] by the authors. A heuristic rule is: (1) at least 50 intervals and (2) each interval $\Phi l$ contains at least 50 samples. To guarantee this condition, we simply need to generate 2500 samples.

### Variance Reduction Assessment by the Proposed Algorithm.

Specifically, the desired first-order index $SX1$ can be calculated by considering the $X1$ as “$xi$” and $XN$ as “$y$” in Fig. 7.

The resultant index $SX1$ is the average ratio of the variance reduction of $XN$ by fixing $X1$ at its observation; and this is an average over all possible values of $X1$. This is an informative estimate before the forward propagation or backward inference if the value of the observation is NOT known.

Compared to Eq. (12) which computes the average VRR of $XN$ over all possible values of $X1$, Eq. (13) estimates the VRR of $XN$ at a specific value of $X1$. The accuracy of Eq. (13) will be higher if: (1) $\Phi \u0302$ is narrower so that $X\u03021$ is closer to $X1#$ and (2) more samples of $XN$ are assigned to $\Phi \u0302$ so that $V\Phi \u0302(XN)$ is a better estimate of $V(XN|X1=X1#)$.

### Summary.

Section 3.1 introduces a new algorithm to efficiently estimate the first-order Sobol’ index from Monte Carlo samples. In the proposed algorithm, the calculation of the first-order Sobol’ index of an input does not need the samples of other inputs. This advantage makes it an ideal algorithm for the GSA of a BN, where generating prior samples are much easier than building the underlying deterministic function in discussed in Sec. 2. The proposed algorithm can also handle correlated inputs, so that it is suitable for a BN where the nodes are generally correlated.

Section 3.2 described the implementation of the proposed algorithm to the BN. Equation (12) can be used to calculate the average ratio of the variance reduction of a node of interest $XN$ over all possible observations of another node $X1$ ; Eq. (13) provides a better estimate of variance reduction ratio if the value of the observation is known.

In Sec. 4, two numerical examples are provided to illustrate the proposed GSA method for the BN. The first example is a time-independent static BN, and the second example is a time-dependent dynamic BN.

## Numerical Examples

### Structural Dynamics Problem.

A structural dynamics problem provided by Sandia National Laboratories is used to illustrate the proposed method, and more details on this problem can be found in Refs. [29–32]. As shown in Fig. 8, the system of interest contains three mass–spring–damper components in series; and these components are mounted on a beam supported by a hinge at one end and a spring at the other end; and a sinusoidal force input $P=3000\u2009sin(350t)$ is applied on the beam.

This system has three model parameters of spring stiffnesses $k=(k1,k2,k3)$ and they are assumed to have unknown true values to be calibrated. The prior distribution of $ki$ is assumed to be Gaussian with a coefficient of variation of 10% and mean values of $\mu k1=5000$, $\mu k2=10,000$, and $\mu k3=9000$.

The quantity to be measured for model calibration is the maximum acceleration $A3$ in the third mass $m3$. A computational model $A3=F(k)$ based on finite element analysis has been provided by Sandia National Laboratories [29].

To improve the computational efficiency, a Gaussian process (GP) [33,34] surrogate model $A3=GP(k)$ is constructed to replace the expensive dynamics computational model. The prediction of the GP model is a Gaussian distribution $N(\mu (k),\sigma 2(k))$; thus, a CPD is given by the GP model.

The observation variable is denoted as $D$ and we have $D=A3+\u03f5m$ where $\u03f5m$ is the measurement error with a zero-mean Gaussian distribution $\u03f5m\u223cN(0,\sigma m2)$. Thus, another CPD is given by the measurement error. In this example, $\sigma m$ is another parameter to be calibrated and we assign a noninformative uniform prior distribution $U(150,250)$ to it.

A BN is established for this model calibration problem, as shown in Fig. 9. In this example, we are interested in: (1) calculating the first-order Sobol’ index of the observation variable $D$ quantifying its contribution toward the uncertainty in each calibration parameters of interest ${k1,k2,k3,\sigma m}$ and (2) predicting the VRR of the calibration parameters at a given observation.

As samples are generated from the joint prior distribution of this network, the first-order Sobol’ indices of ${k1,k2,k3,\sigma m}$ are obtained by considering the calibration parameter as $XN$ and the observation variable $D$ as $X1$ in Eq. (12). The results are listed in Table 1. From this table, we conclude that the variance of $k1$ will reduce by 50% on average due to calibration; the variance reduction of $k3$ is 11% on average; while the variance of $k2$ and $\sigma m$ will not be reduced significantly by calibration. This is a very valuable insight, which is also supported by Fig. 10. Thus, if we want to reduce the uncertainty in $k2$, we need to observe another quantity. In the latter computation of VRR at specific observations of $A3$, we focus on $k1$ and $k3$.

Parameter of interest | $k1$ | $k2$ | $k3$ | $\sigma m$ |
---|---|---|---|---|

First-order Sobol’ index of observation node $D$ | 0.50 | 0.02 | 0.11 | 0.00 |

Parameter of interest | $k1$ | $k2$ | $k3$ | $\sigma m$ |
---|---|---|---|---|

First-order Sobol’ index of observation node $D$ | 0.50 | 0.02 | 0.11 | 0.00 |

Table 1 shows the average VRR. Now assume that the specific observed value of $A3$ is known (a synthetic data point). Based on Eq. (13), we predict the VRR of $k1$ and $k3$ for this specific observation, as shown in Table 2, where the “by inference” method mean we finish the Bayesian inference and compute the VRR based on the posterior distributions. In this example, the Bayesian inference is finished by the rejection sampling algorithm [35] which results in $2\xd7104$ samples from the posterior distributions of the calibration parameters. Figure 10 shows the probability density functions of these posterior distributions at data point of $D=4500$. We recalculate the VRR by comparing the variances of the posterior samples and the prior samples. As shown in Table 2, our earlier predictions are close to the sample-based results. However, the proposed method only uses the samples from the prior distribution, and no actual Bayesian inference effort is required. In a personal computer of Intel Core i7-4710HQ CPU, the proposed method spends less than 1 s to calculate the sensitivity index at a data point, while the other method by inference spends about 20 s.

$k1$ | $k3$ | |||
---|---|---|---|---|

Data point | Proposed method (Bayesian inference NOT needed) (%) | By inference (Bayesian inference needed) (%) | Proposed method (%) | By inference (%) |

3900 | 49.9 | 47.0 | 3.2 | 4.0 |

4000 | 45.2 | 44.2 | 13.6 | 15.4 |

4100 | 38.3 | 42.4 | 23.6 | 19.7 |

4200 | 43.5 | 44.8 | 20.7 | 25.5 |

4300 | 48.2 | 54.2 | 35.2 | 31.9 |

4400 | 60.5 | 63.2 | 41.7 | 38.9 |

4500 | 69.6 | 69.1 | 43.3 | 44.7 |

$k1$ | $k3$ | |||
---|---|---|---|---|

Data point | Proposed method (Bayesian inference NOT needed) (%) | By inference (Bayesian inference needed) (%) | Proposed method (%) | By inference (%) |

3900 | 49.9 | 47.0 | 3.2 | 4.0 |

4000 | 45.2 | 44.2 | 13.6 | 15.4 |

4100 | 38.3 | 42.4 | 23.6 | 19.7 |

4200 | 43.5 | 44.8 | 20.7 | 25.5 |

4300 | 48.2 | 54.2 | 35.2 | 31.9 |

4400 | 60.5 | 63.2 | 41.7 | 38.9 |

4500 | 69.6 | 69.1 | 43.3 | 44.7 |

In summary, this example verified the effectiveness of the proposed method to predict the variance reduction ratio before conducting the Bayesian inference. Thus, the proposed method provides valuable guidance for selecting observation nodes; for example, in the subsequent updating, nodes such as $k2$ and $\sigma m$ cannot be updated by observing $A3$ data.

### Example of a Dynamic Bayesian Network.

This example applies the proposed method to a mathematical example of a dynamic BN, as shown in Fig. 11. The CPDs of this dynamic BN are as follows.

The root node $C0$ has an unknown true value to be calibrated, so that $C0$ is a static node and $C0t=C0t+1$. The prior distribution of $C0$ is $N(2,0.52)$. $C1$ and $C2$ are two dynamic state variables, and their states are to be tracked. At $t=1$ the CPD of the child node $C1$ is $C11\u223cN(C012+10,12)$; at $t>1$ the CPD of $C1$ is $C1t\u223cN(C0t2+0.9C1t\u22121+1,12)$, thus the distribution of $C1$ depends on its previous value and the value of $C0$. $C2$ is the child node of $C1$ and its CPD is $C2t\u223cN(C1t2,52)$. In this problem, the observation node is $C3$ and its CPD is $C3t\u223cN(C2t,(C2t/20)2)$, i.e., the value of $C2t$ plus a measurement error of zero mean Gaussian distribution. This example considers the first 30 steps of this dynamic BN. Assuming the true value of $C0$ is 2.5, the synthetic data of the observation node $C3$ is generated at each step, as shown in Fig. 12.

A widely used particle filter method named sequential importance resampling (SIR) algorithm [36] is applied in this example to track the state variables. Here a particle is a sample from the joint distribution of the state variables. This SIR algorithm propagates the particles of the posterior distribution at time step $t\u22121$ to time step $t$ to obtain the particles of the prior distribution of time step $t$. The likelihoods of these particles are calculated and normalized as the weights for them. Then the particles are resampled based on the weight terms and the resultant new particles represent the posterior joint distribution of the state variables in time step $t$. Details of this SIR algorithm can be found in Ref. [36] and the Appendix.

The number of particles in this example is 50,000. The mean value and 95% bounds of the posterior distributions of the state variables are shown in Fig. 13. The uncertainty of $C0$ reduces and its posterior distribution approximates to its true value 2.5, but this uncertainty reduction is not significant after step 20.

At each step, before the calculation of the likelihoods and the particle resampling, we apply the proposed method using the particles of the prior joint distribution of the state variables. The VRR of each state variable is predicted by the proposed method of Eq. (13) using the prior samples of the state variables. This VRR is also calculated by the prior/posterior samples at each step. Figure 14 shows that the results from these two methods are consistent so that the proposed method is verified. Note that the proposed method uses the prior samples and the observation data, while the sample-based method needs both the prior and posterior sample. In other words, the proposed method can be applied before Bayesian inference, but the sample-based method happens after the Bayesian inference has been done.

In this example, the CPD of $C0$ is $C0t=C0t+1$ so the uncertainty of $C0$ will not be enlarged in the propagation from time step $t\u22121$ to time step $t$. However, its uncertainty is reduced by the updating in each time step. Figure 13 shows that this uncertainty reduction is significant for the first five times steps so that the VRR in Fig. 14 has a large value before time step 5; this uncertainty reduction is negligible after time step 20 so that the value of VRR is closer to zero after time step 20.

In comparison, Fig. 13 shows that the uncertainty in the posterior distributions of $C1$ and $C2$ are increasing, even if their VRR values in Fig. 14 are always significant. The reason is that the uncertainties of $C1$ and $C2$ are enlarged in the propagation from time step $t\u22121$ to time step $t$, so their prior distributions at $t$ have more uncertainty than the posterior distribution at $t\u22121.$ The uncertainty in the prior at $t$ is reduced by the updating, but the posterior uncertainty at $t$ may not be smaller than the posterior uncertainty at $t\u22121$ if the uncertainty reduction by the updating cannot outperform the uncertainty enlargement by the propagation. Note that the VRR in Fig. 14 is the variance reduction with respect to the prior/posterior distribution at the same time step, not the variance reduction for adjacent posterior distributions.

In summary, this example extended the proposed sensitivity analysis to a dynamic BN and verified its validity. The proposed method is capable of predicting the variance reduction of each state variable before updating.

## Conclusion

In a BN, how a node of interest is affected by fixing another node at some value is of prominent interest. The proposed GSA for BN calculates the first-order Sobol’ index to quantity the sensitivity of the node of interest $XN$ with respect to an input/observation node $X1$. In forward propagation where $X1$ is an upstream node, a low index indicates that $XN$ is not sensitive to the value of $X1$ so that we can simply fix $X1$ at a deterministic value. In backward inference where $X1$ is a downstream node, a low sensitivity index indicates that the uncertainty of $XN$ cannot be reduced by observing $X1$; thus, we should observe another node of higher Sobol’ index in order to reduce its uncertainty.

The proposed GSA for BN is developed in two steps. First, an auxiliary variable method is used to convert the path between node $X1$ and node $XN$ to a deterministic function thus making the Sobol’ index computation feasible for a BN. If the path from $X1$ to $XN$ is not a directed path form, the theorem of arc reversal is used to transform it to the desired directed path so that the auxiliary variable method can still be used to build the deterministic function. Second, this paper uses an efficient algorithm to directly estimate the Sobol’ index from the samples of the prior distribution of the BN, so that the proposed GSA for the BN is computationally affordable. The resultant Sobol’ index is the average variance reduction ratio across all possible observations of $X1$. This paper also extend the algorithm to give a better estimate of the uncertainty reduction of the node of interest when the value of the observation is known, thus providing an informative guidance before the updating.

The limitation of the proposed method at present is that currently it only considers a single observation; thus, an extension to the case of multiple observations needs to be addressed in future work.

## Acknowledgment

The authors thank Dr. Zhen Hu for valuable discussions.

## Nomenclature

- $F\u22121(\u22c5)$ =
inverse CDF

- $k$ =
stiffnesses

- $PV1,PV2$ =
parent nodes of $V1,V2$

- $Si$ =
first-order Sobol’ index of $xi$

- $Ux$ =
auxiliary variable of $x$

- $V1,V2$ =
nodes in BN

- $xi$ =
a random variable

- $x\u2212i$ =
all random variables other than $xi$

- $Xj$ =
a node in the BN

- $\theta x$ =
distribution parameter of random variable $x$

- $\pi (\u22c5)$ =
probability density function

- $\Phi l$ =
a equally probably interval

## Funding Data

The Air Force Office of Scientific Research (Grant No. FA9550-15-1-0018).

### Appendix: Introduction to Particle Filter

Particle filter is a general algorithm to track the evolution of the state variables in a dynamic Bayesian network (DBN). In this section, capital letters denote random variables; lower-case letters denote particles, where the superscript $i$ indicates that it is the $i$th particle. The subscripts of letters indicate the time step. Thus, the state variables at time step $t$ are denoted as $Xt$, as shown in the simple DBN of Fig. 15.

where $nt\u2208\u211cn$ is the vector of noise terms in the measurement function.

where $\delta x0:ti$ is a delta function at $x0:ti$.

In other words, the new state $Xti$ of the $i$th particle at time step $t$ is sampled from a distribution which takes the current state $X0:t\u22121i$ and the observation $Z1:t$ as parameters.

In addition, the initial state $X0i$ are sampled from the joint prior distribution of the state variables, and the initial weight $\omega 0i$ for each particle is $1/N$.

In practice, iterations of Eqs. (A4) and (A5) over time step $t$ may lead to particle degeneracy problem, i.e., only a few particles have significant weights but most particles have negligible weights. In that case, we are assigning most computational efforts to the particles of nonsignificant contribution to the posterior distribution. This degeneracy problem can be solved by resampling, i.e., generating a new set of $N$ particles based on the particles of $Xt$. The new particles represent the same posterior distribution as the former particles.

The simplest strategy of resampling is generating new resampled particles based on the discrete approximation shown in Eq. (A3), and the weight of each new particle is set as $1/N$ again. This resampling is bootstrapping process of $N$ iterations, and each iteration selects one particle from current particles with replacement. In an iteration, the probability that a particle is selected is proportional to its weight.