We consider the utilization of a computational model to guide the optimal acquisition of experimental data to inform the stochastic description of model input parameters. Our formulation is based on the recently developed *consistent* Bayesian approach for solving stochastic inverse problems, which seeks a posterior probability density that is consistent with the model and the data in the sense that the push-forward of the posterior (through the computational model) matches the observed density on the observations almost everywhere. Given a set of potential observations, our optimal experimental design (OED) seeks the observation, or set of observations, that maximizes the expected information gain from the prior probability density on the model parameters. We discuss the characterization of the space of observed densities and a computationally efficient approach for rescaling observed densities to satisfy the fundamental assumptions of the consistent Bayesian approach. Numerical results are presented to compare our approach with existing OED methodologies using the classical/statistical Bayesian approach and to demonstrate our OED on a set of representative partial differential equations (PDE)-based models.

## Introduction

Experimental data are often used to infer valuable information about parameters for models of physical systems. However, the collection of experimental data can be costly and time consuming. For example, exploratory drilling can reveal valuable information about subsurface hydrocarbon reservoirs, but each well can cost upward of tens of millions of U.S. dollars. In such situations, we can only afford to gather some limited number of experimental data; however, not all experiments provide the same amount of information about the processes they are helping inform. Consequently, it is important to design experiments in an optimal way, i.e., to choose some limited number of experimental data to maximize the value of each experiment.

The first experimental design methods employed mainly heuristics, based on concepts such as space-filling and blocking, to select field experiments [1–5]. While these methods can perform well in some situations, these methods can be improved upon by incorporating any knowledge of the underlying physical processes being inferred or measured. Using physical models to guide, experiment selection has been shown to drastically improve the cost-effectiveness of experimental designs for a variety of models based on ordinary differential equations [6–8], partial differential equations (PDEs) [9] and differential algebraic equations [10]. When model observables are linear with respect to the model parameters, the alphabetic optimality criteria are often used [11–13]. For example, *A*-optimality to minimize the average variance of parameter estimates, *D*-optimality to maximize the differential Shannon entropy, or *G*-optimality to minimize the maximum variance of model predictions. These criteria have been developed in both Bayesian and non-Bayesian settings [11,12,14–17].

In this paper, we focus attention on Bayesian methods for optimal experimental design (OED) that can be applied to both linear and nonlinear models [18–20]. Specifically, we pursue OEDs, which are optimal for inferring model parameters on finite-dimensional spaces from experimental data observed at a set of sensor locations. In the context of OED for inference, analogs of the alphabetic criterion, for linear models, have also been applied to nonlinear models [9,14,21,22]. In certain situations, for example infinite-dimensional problems (random variables are random fields) or problems with computational expensive models, OED based upon linearizations of the model response, and Laplace (Gaussian) approximations of the posterior distribution have been necessary [14,21]. In other settings, non-Gaussian approximations of the posterior have also been pursued [17–20].

This manuscript presents a new approach for OED based upon *consistent* Bayesian inference, introduced in Ref. [23]. We adopt an approach for OED similar to the approach in Ref. [24] and seek an OED that maximizes the expected information gain from the prior to the posterior over the set of possible observational densities. Although our OED framework is Bayesian in nature, this approach is fundamentally different from the statistical Bayesian methods mentioned previously. The aforementioned Bayesian OED methods use what we will refer to as the classical/statistical Bayesian approach for stochastic inference (e.g., see Ref. [25]) to characterize posterior densities that reflect an assumed error model. In contrast, consistent Bayesian inference assumes a probability density on the observations is given and produces a posterior density that is consistent with the model and the data in the sense that the push-forward of the posterior (through the computational model) matches the observed density almost everywhere. We direct the interested reader to Ref. [23] for a discussion on the differences between the consistent and statistical Bayesian approaches. Consistent Bayesian inference has some connections with measure-theoretic inference [26], which was used for OED in Ref. [27], but the two approaches make different assumptions and therefore typically give different solutions to the stochastic inverse problem.

The consistent Bayesian approach is appealing for OED since it can be used in an offline-online mode. Consistent Bayesian inference requires an estimate of the push-forward of the prior, which although expensive can be computed offline or obtained from archival simulation data. Once the push-forward of the prior is constructed, the posterior density can be approximated cheaply. Moreover, this push-forward of the prior does not depend on the density on the observations, which enables a computationally efficient approach for solving multiple stochastic inverse problems for different densities on the observations. This can significantly reduce the cost of computing the expected information gain if the set of candidate observation is known a priori.

The main objectives in this paper are to derive an OED formulation using the consistent Bayesian framework and to present a computational strategy to estimate the expected information gained for an experimental design. The pursuit of a computationally efficient approach for coupling our OED method with continuous optimization techniques is an intriguing topic that we leave for future work. Here, we consider batch design over a discrete set of possible experiments. Batch design, also known as open-loop design, involves selecting a set of experiments concurrently such that the outcome of any experiment does not affect the selection of the other experiments. Such an approach is often necessary when one cannot wait for the results of one experiment before starting another, but is limited in terms of the number of observations we can consider.

The remainder of this paper is outlined as follows: In Sec. 2, we summarize the consistent Bayesian method for solving stochastic inverse problems. In Sec. 3, we discuss the information content of an experiment and present our OED formulation based upon expected information gain. During the process of defining the expected information gain of a given experimental design, care must be taken to ensure that the model can predict all of our potential observed data. In Sec. 4, we discuss situations for which this assumption is violated and means for avoiding these situations. Numerical examples are presented in Sec. 5 and concluding remarks are provided in Sec. 6.

## A Consistent Bayes Formulation for Stochastic Inverse Problems

We are interested in experimental designs, which are optimal for inferring model parameters from experimental data. Inferring model parameters for a single design and realization of experimental data is a fundamental component of producing such optimal designs. In this section, we summarize the consistent Bayes method for parametric inference, originally presented in Ref. [23]. Although Bayesian in nature, the consistent Bayesian approach differs significantly from its classical Bayesian counterpart [25,28,29], which was used for OED in Refs. [12,14,17,21], [24], and [30–32]. We refer the interested reader to Ref. [23] for a full discussion of these differences.

### Notation, Assumptions, and a Stochastic Inverse Problem.

Let *M*(*Y*, *λ*) denote a deterministic model with solution *Y* (*λ*) that is an implicit function of model parameters $\lambda \u2208\Lambda \u2282\mathbb{R}n$. The set Λ represents the largest physically meaningful domain of parameter values, and, for simplicity, we assume that Λ is compact. In practice, modelers are often only concerned with computing a relatively small set of quantities of interest (QOI), ${Qi(Y)}i=1m$, where each *Q _{i}* is a real-valued functional dependent on the model solution

*Y*. Since

*Y*is a function of parameters

*λ*, so are the QOI and we write

*Q*(

_{i}*λ*) to make this dependence explicit. Given a set of QOI, we define the QOI map $Q(\lambda ):=(Q1(\lambda ),\u2026,Qm(\lambda ))\u22a4:\Lambda \u2192D\u2282\mathbb{R}m$ where $D:=Q(\Lambda )$ denotes the range of the QOI map.

Assume $(\Lambda ,B\Lambda ,\mu \Lambda )$ and $(D,BD,\mu D)$ are measure spaces. We assume $B\Lambda $ and $BD$ are the Borel *σ*-algebras inherited from the metric topologies on $\mathbb{R}n$ and $\mathbb{R}m$, respectively. The measures *μ*_{Λ} and $\mu D$ are volume measures.

*Q*is at least piecewise smooth implying that

*Q*is a measurable map between the measurable spaces $(\Lambda ,B\Lambda )$ and $(D,BD)$. For any $A\u2208BD$, we then have

Furthermore, $B\u2286Q\u22121(Q(B))$ for any $B\u2208B\Lambda $, although in most cases, $B\u2260Q\u22121(Q(B))$ even when *n* = *m*.

*P*

_{Λ}, described as a probability density,

*π*

_{Λ}, such that the push-forward measure agrees with $PDobs$. We use $PDQ(P\Lambda )$ to denote the push-forward of

*P*

_{Λ}through

*Q*(

*λ*), i.e.,

for all $A\u2208BD$. Using this notation, a solution to the stochastic inverse problem is defined formally as follows:

*(Consistency). Given a probability measure*$PDobs$

*on*$(D,BD)$

*that is absolutely continuous with respect to*$\mu D$

*and admits a density*$\pi Dobs$

*, the stochastic inverse problem seeks a probability measure P*$(\Lambda ,B\Lambda )$

_{Λ}on*that is absolutely continuous with respect to μ*

_{Λ}and admits a probability density π_{Λ}such that the subsequent push-forward measure induced by the map, Q(λ), satisfies*for any*$A\u2208BD$*. We refer to any probability measure P _{Λ} that satisfies Eq.*

*(1)*

*as a consistent solution to the stochastic inverse problem.*

Clearly, a consistent solution may not be unique, i.e., there may be multiple probability measures that are consistent in the sense of Definition 1. This is analogous to a deterministic inverse problem where multiple sets of parameters may produce the observed data. A unique solution may be obtained by imposing additional constraints or structure on the stochastic inverse problem. In this paper, such structure is obtained by incorporating prior information to construct a unique Bayesian solution to the stochastic inverse problem.

### A Bayesian Solution to the Stochastic Inverse Problem.

Following the Bayesian philosophy [33], we introduce a prior probability measure $P\Lambda prior$ on $(\Lambda ,B\Lambda )$ that is absolutely continuous with respect to *μ*_{Λ} and admits a probability density $\pi \Lambda prior$. The prior probability measure encapsulates the existing knowledge about the uncertain parameters.

*Q*is at least measurable, then the prior probability measure on Λ, $P\Lambda prior$, and the map,

*Q*, induce a push-forward measure $PDQ(prior)$ on $D$, which is defined for all $A\u2208BD$

We note that if $\pi DQ(prior)=\pi Dobs$, i.e., if the prior solves the stochastic inverse problem, then the posterior density will be equal to the prior density.

It was recently shown in Ref. [23] that the posterior given by Eq. (3) defines a consistent probability measure using a *contour σ*-algebra. When interpreted as a particular iterated integral of Eq. (4), the posterior defines a probability measure on $(\Lambda ,B\Lambda )$ in the sense of Definition 1, i.e., the push-forward of the posterior matches the observed probability density. Approximating the posterior density using the consistent Bayesian approach only requires an approximation of the push-forward of the prior probability on the model parameters, which is fundamentally a forward propagation of uncertainty. While numerous approaches have been developed in recent years to improve the efficiency and accuracy of the forward propagation of uncertainty using computational models, in this paper, we only consider the most basic of methods, namely Monte Carlo sampling, to sample from the prior. We evaluate the computational model for each of the samples from the prior and use a standard Kernel density estimator (KDE) [34] to approximate the push-forward of the prior.

Given the approximation of the push-forward of the prior, we can evaluate the posterior at any point *λ* ∈Λ if we compute *Q*(*λ*). This provides several possibilities for interrogating the posterior. In Sec. 3.2, we compute *Q*(*λ*) on a uniform grid of points to visualize the posterior after we compute the push-forward of the prior. This does require additional model evaluations, but visualizing the posterior is rarely required and only useful for illustrative purposes in 1 or 2 dimensions. More often, we are interested in obtaining samples from the posterior. This is also demonstrated in Sec. 3.2 where the samples from the prior are either accepted or rejected using a standard rejection sampling procedure. For a given *λ*, we compute the ratio $\pi \Lambda post(\lambda )/(M\pi \Lambda prior(\lambda ))$, where *M* is an estimate of the maximum of the ratio over Λ, and compare this value with a sample, *η*, drawn from a uniform distribution on (0,1). If the ratio is larger than *η*, then we accept the sample. We apply the accept/reject algorithm to the samples from the prior and therefore the samples from the posterior are a subset of the samples used to compute the push-forward of the prior. Since we have already computed *Q*(*λ*) for each of these samples, the computational cost to select a subset of the samples for the posterior is minimal. However, in the context of OED, we are primarily interested in computing the information gained from the prior to the posterior, which only involves integrating with respect to the prior (see Sec. 3.1) and does not require additional model evaluations or rejection sampling.

In practice, we prefer to use data that are sensitive to the parameters since otherwise it is difficult to infer useful information about the uncertain parameters. Specifically, if *m* ≤ *n* and the Jacobian of *Q* is defined almost everywhere in Λ and is full rank almost everywhere, then the push-forward volume measure $\mu D$ is absolutely continuous with respect to the Lebesgue measure [23].

For the rest of this work, we maintain the following assumptions needed to produce a unique consistent solution to the stochastic inverse problem:

(A1) We have a mathematical model and a description of our prior knowledge about the model input parameters,

(A2) The data exhibit sensitivity to the parameters almost everywhere in Λ; hence, we use the Lebesgue measure

*μ*as the volume measure on the data space,(A3) The observed density is absolutely continuous with respect to the push-forward of the prior.

The assumption concerning the absolute continuity of the observed density with respect to the prior is essential to define a solution to the stochastic inverse problem [23]. While this assumption may appear rather abstract, it simply assures that the prior and the model can predict, with nonzero probability, any event that we have observed. Since the observed density and the model are assumed to be fixed, this is only an assumption on the prior.

In the remainder of this work, we focus on quantifying the value of these posterior densities. We use the Kullback–Leibler (KL) divergence [35,36] to measure the information gained about the parameters from the prior to the posterior. We compute the expected information gain of a given set of QOI (a given experimental design), and then determine the OED to deploy in the field.

## The Information Content of an Experiment

We are interested in finding the OED for inferring model input parameters. Conceptually, a design is informative if the posterior distribution of the model parameters is significantly different from the prior. To quantify the *information gain* of a design, we use the Kullback–Leibler divergence [36] as a measure of the difference between a prior and posterior distribution. While the KL divergence is by no means the only way to compare two probability densities, it does provide a reasonable measure of the information gained in the sense of Shannon information [37] and is commonly used in Bayesian OED [30]. In this section, we discuss how to compute the KL divergence and define our OED formulation based upon expected information gain over a specific space of possible observed densities.

### Information Gain: Kullback–Leibler Divergence.

*μ*

_{Λ}[23] and admits a probability density, $\pi \Lambda post$. The KL divergence of the posterior from the prior (information gain), denoted

*I*, is given by

_{Q}*I*is simply a function of the posterior

_{Q}*I*as a function of the observed density

_{Q}The observation that *I _{Q}* is a function of only $\pi Dobs$ allows us to define the expected information gain in Sec. 3.3 based on a specific space of observed densities.

*Q*(

*λ*) as follows:

where the second equality comes from a simple substitution using Eq. (4). Given a set of samples from the prior, we only need to compute the push-forward of the prior in the data space to approximate *I _{Q}*. This observation provides an efficient method for approximating

*I*given a high dimensional parameter space and a low dimensional data space. In fact, we found it convenient to use Eq. (8) whenever the prior is not uniform. In the consistent Bayesian formulation, we evaluate the model at the samples generated from the prior to estimate the push-forward of the prior. It is a computational advantage to also use these samples to integrate with respect to the prior rather than integrating with respect to the volume measure, which would require additional model evaluations.

_{Q}### A Motivating Nonlinear System.

The first QOI is the second component, i.e., *Q*_{1}(*λ*) = *x*_{2}(*λ*). The parameter ranges are given by *λ*_{1} ∈ [0.79, 0.99] and $\lambda 2\u2208[1\u22124.50.1,1+4.50.1]$, which are chosen as in Ref. [26] to induce an interesting variation in the QOI. We assume the observed density on *Q*_{1} is a truncated normal distribution with mean 0.3 and standard deviation of 0.01, see Fig. 1 (right).

We generate 40,000 samples from the uniform prior and use a Kernel density estimator to construct an approximation to the resulting push-forward density, see Fig. 1 (right). Then we use Eq. (4) to construct an approximation to the posterior density using the same 40,000 samples, see Fig. 1 (left), and a simple accept/reject algorithm to generate a set of samples from the posterior, see Fig. 1 (middle). We propagate this set of samples from the posterior through the model and approximate the resulting push-forward of the posterior density using a KDE. In Fig. 1 (right) we see that the push-forward of the posterior agrees quite well with the observed density. Notice the support of the posterior lies in a relatively small region of the parameter space. The information gain from this posterior is $IQ1(\pi Dobs)\u22482.015$.

We assume the observed density on *Q*_{2} is a truncated normal distribution with mean 1.015 and standard deviation of 0.01. We approximate the push-forward density and the posterior using the same 40,000 samples and again generate a set of samples from the posterior and propagate these samples through the model to approximate the push-forward of the posterior, see Fig. 2.

Although both *Q*_{1} and *Q*_{2} have the same standard deviation in their observed densities, clearly the two QOI produce very different posterior densities. The posterior corresponding to data from *Q*_{2} has a much larger region of support within the parameter space compared to that of the posterior corresponding to *Q*_{1}. This is quantified with the information gain from this posterior $IQ2(\pi Dobs)\u22480.466$. Given these two maps, *Q*_{1} and *Q*_{2}, and the specified observed data on each of these data spaces, the data *Q*_{1} is more *informative* of the parameters than the data *Q*_{2}.

Next, we consider using the data from both *Q*_{1} and *Q*_{2}, *Q*: Λ → (*Q*_{1}, *Q*_{2}), with the same means and standard deviations as specified earlier. Again, we approximate the push-forward density and the posterior using the same 40,000 samples, see Fig. 3. With the information from both *Q*_{1} and *Q*_{2}, we see a substantial decrease in the support of the posterior density. Intuitively, the support of the posterior using both *Q*_{1} and *Q*_{2} is the support of the posterior using *Q*_{1} intersected with the support of the posterior using *Q*_{2}. This is quantified in the information gain of this posterior $IQ(\pi Dobs)\u22482.98$.

In the scenario in which we can afford to gather data on both *Q*_{1} and *Q*_{2}, we benefit greatly in terms of reducing the uncertainties on the model input parameters. However, suppose we could only afford to gather one of these QOI in the field. Based on the information gained from each posterior, *Q*_{1} is more informative about the parameters than *Q*_{2}. However, consider a scenario in which the observed data have different means in both *Q*_{1} and *Q*_{2}. Due to the nonlinearities of the maps, it is not necessarily true that *Q*_{1} is still more informative than *Q*_{2}. If we do not know the mean of the data for either *Q*_{1} or *Q*_{2}, then we want to determine which of these QOI we *expect* to produce the most informative posterior.

### Expected Information Gain.

Optimal experimental design must select a design before experimental data become available. In the absence of data, we use the simulation model to quantify the *expected* information gain of a given experimental design. Let $O$ denote the space of densities over $D$. We want to define the expected information gain as some kind of average over this density space in a meaningful way. However, this is far too general of a space to use to define the expected information gain. This space includes densities that are unlikely to be observed in reality. Therefore, we restrict $O$ to be a space more representative of densities that may be observed in reality.

where $N\u0302(q,\sigma 2)$ is a truncated Gaussian function with mean *q* and standard deviation *σ*. More details of this definition of $OD$ are addressed in Sec. 4. We can easily generalize our description of $O$. For example, we could also consider the standard deviation of the observed data to be uncertain, in which case we would also average over some interval of possible values for *σ*. However, in this work, we only vary the center of the Gaussian densities.

*Remark 1*. We can restrict $O$ in other ways as well. For example, if we expect the uncertainty in each QOI to be described by a uniform density, then we define the restriction on $O$ accordingly. This choice of characterization of the observed density space is largely dependent on the application. The only limitation is that we require the measure specified on the observed density space to be defined in terms of the push-forward measure, $PDQ(prior)$, as described later. In Sec. 5.2, we describe one approach for defining a restricted observed density space where the observed density of each QOI has a Gaussian profile and the standard deviations are functions of the magnitudes of each QOI.

*Q*, the model informs us that some data are more likely to be observed than other data; this is seen in the plot of $\pi DQ(prior)$ in Fig. 3 (upper left). This implies we do not want to average over $D$ with respect to

*μ*or $\mu D$, but rather with respect to the push-forward of the prior on

*D*, $PDQ(prior)$. This respects the prior knowledge of the parameters and the sensitivity information provided by the model. We define the

*expected information gain*, denoted

*E*(

*I*), as just described

_{Q}*I*itself is defined in terms of an integral. The expanded form for

_{Q}*E*(

*I*) is then an iterated integral

_{Q}where we make explicit that $\pi \Lambda post$ is a function of the observed density and, by our restriction of the space of observed densities in Eq. (9), therefore a function of $q\u2208D$. We utilize Monte Carlo sampling to approximate the integral in Eq. (10) as described in Algorithm 1.

1. Given a set of samples from the prior density: $\lambda (i),\u2009i=1,\u2026,N$; |

2. Given a set of samples from the push-forward density: $q(j)=Q(\lambda (j)),\u2009j=1,\u2026,N$ ; |

3. Construct an observed density centered at each q^{(}^{j}^{)}. |

4. For j = 1,…, M approximate I(_{Q}q^{(}^{j}^{)}) using Eq. (8) |

$IQ(q(j))\u2248\mu \Lambda (\Lambda )N\u2211i=1N\pi Dobs(Q(\lambda (i)))\pi DQ(prior)(Q(\lambda (i)))\u2009log(\pi Dobs(Q(\lambda (i)))\pi DQ(prior)(Q(\lambda (i))))$ |

5. Compute $E(IQ)\u2248(1/M)\u2211j=1MIQ(q(j))$; |

1. Given a set of samples from the prior density: $\lambda (i),\u2009i=1,\u2026,N$; |

2. Given a set of samples from the push-forward density: $q(j)=Q(\lambda (j)),\u2009j=1,\u2026,N$ ; |

3. Construct an observed density centered at each q^{(}^{j}^{)}. |

4. For j = 1,…, M approximate I(_{Q}q^{(}^{j}^{)}) using Eq. (8) |

$IQ(q(j))\u2248\mu \Lambda (\Lambda )N\u2211i=1N\pi Dobs(Q(\lambda (i)))\pi DQ(prior)(Q(\lambda (i)))\u2009log(\pi Dobs(Q(\lambda (i)))\pi DQ(prior)(Q(\lambda (i))))$ |

5. Compute $E(IQ)\u2248(1/M)\u2211j=1MIQ(q(j))$; |

Design location | 50 | 200 | 1000 | 5000 |
---|---|---|---|---|

(0.558, 0.571) | 2.758 | 2.767 | 2.815 | 2.826 |

(0.561, 0.546) | 2.752 | 2.762 | 2.809 | 2.820 |

(0.582, 0.574) | 2.729 | 2.736 | 2.782 | 2.793 |

(0.549, 0.570) | 2.728 | 2.735 | 2.781 | 2.792 |

(0.593, 0.596) | 2.726 | 2.733 | 2.779 | 2.790 |

Design location | 50 | 200 | 1000 | 5000 |
---|---|---|---|---|

(0.558, 0.571) | 2.758 | 2.767 | 2.815 | 2.826 |

(0.561, 0.546) | 2.752 | 2.762 | 2.809 | 2.820 |

(0.582, 0.574) | 2.729 | 2.736 | 2.782 | 2.793 |

(0.549, 0.570) | 2.728 | 2.735 | 2.781 | 2.792 |

(0.593, 0.596) | 2.726 | 2.733 | 2.779 | 2.790 |

Algorithm 1 appears to be a computationally expensive procedure since it requires solving *M* stochastic inverse problems and, as noted in Ref. [23], approximating $\pi DQ(prior)$ can be expensive. In Ref. [23] and in this paper, we use kernel density estimation techniques to approximate $\pi DQ(prior)$, which does not scale well as the dimension of $D$ increases [34]. On the other hand, for a given experimental design, we only need to compute this approximation once, as each *I _{Q}* in step 4 of Algorithm 1 is computed using the same prior and map

*Q*and, therefore, the same $\pi DQ(prior)$. In other words, the fact that the consistent Bayes method only requires approximating the push-forward of the prior implies that this information can be used to approximate posteriors for different observed densities without requiring additional model evaluations. This significantly improves the computational efficiency of the consistent Bayesian approach in the context of OED. We leverage this computational advantage throughout this paper by considering a discrete set of designs, which allows us to compute the push-forward for all of the candidate designs simultaneously. Utilizing a continuous design space might require computing the push-forward for each iteration of the optimization algorithm, since the designs (locations of the observations) are not known a priori. The additional model simulations required to compute the push-forward of the prior at new design points might be intractable if the number of iterations is large; however, the need for new simulations may be avoided if the new observations can be extracted from archived state-space data. For example, if one stores the finite element solutions of a PDE at all samples of the prior at the first iteration of the design optimization, one can evaluate observations at new design locations, which are functionals of this PDE solution, via interpolation using the finite element basis.

### Defining the Optimal Experimental Design.

We are now in a position to define our OED formulation. Recall that our experimental design is defined as the set of QOI computed from the model and we seek the optimal set of QOI to deploy in the field. Given a physics-based model, prior information on the model parameters, a space of potential experimental designs, and a generic description of the uncertainties for each QOI, we define our OED as follows.

*(OED). Let*$Q$

*represent the design space, i.e., the space of all possible experimental designs, and*$Qz\u2208Q$

*be a specific design. Then, the OED is*$Qz\u2208Q$

*that maximizes the expected information gain*

As previously mentioned, the focus in this paper is on the utilization of the consistent Bayesian methodology within the OED framework, so we do not explore different approaches for solving the optimization problem given by Definition 2 and simply find the optimal design over a discrete set of candidate designs.

*Remark 2.* Consistent Bayesian inference is potentially well suited to finding OED in continuous design spaces. Typically, OED based upon statistical Bayesian methods uses Markov chain Monte Carlo methods to characterize the posterior distribution. Markov chain Monte Carlo methods do not provide a functional form for the posterior but rather only provide samples from the posterior. Consequently, gradient-free or stochastic gradient-based optimization methods must be used to find the optimal design. In contrast, consistent Bayesian inference provides a functional form for the posterior, which allows the use of more efficient gradient-based optimizers. Exploring the use of more efficient continuous optimization procedures will be the subject of future work.

## Infeasible Data

The OED procedure proposed in this manuscript is based upon consistent Bayesian inference, which requires that the observed measure is absolutely continuous with respect to the push-forward measure induced by the prior and the model (assumption A3). In other words, any event that we observe with nonzero probability will be predicted using the model and prior with nonzero probability. During the process of computing *E*(*I _{Q}*), it is possible that we violate this assumption. Specifically, depending on the mean and variance of the observational density, we may encounter $\pi Dobs\u2208OD$ such that $\u222bD\pi Dobsd\mu <1$, i.e., support of $\pi Dobs$ extends beyond the range of the map

*Q*, see Fig. 4 (upper right). In this section, we discuss the causes of infeasible data and options for avoiding infeasible data when estimating an optimal experimental design.

### Infeasible Data and Consistent Bayesian Inference.

When inferring model parameters using consistent Bayesian inference, the most common cause for infeasible data is that the model being used to estimate the OED is inadequate. That is, the deviation between the computational model and reality is large enough to prohibit the model from predicting all of the observational data. The deviation between the model prediction and the observational data is often referred to as model structure error and can often be a major source of uncertainty. This is an issue of most if not all inverse parameter estimation problems [29]. Recently, there has been a number of attempts to quantify this error (e.g., see Ref. [28]); however, such approaches are beyond the scope of this paper. In the following, we will assume that the model structure error does not prevent the model from predicting all the observational data.

### Infeasible Data and Optimal Experimental Design.

To estimate an approximate OED, we must quantify the *expected* information gain of a given experimental design (see Sec. 3.3). The expectation is over all possible normal observation densities with mean $q\u2208D$ and variance *σ*, defined by the space (9). When the support of $D$ is bounded, these densities may produce infeasible data. The effect of this violation increases as *q* approaches the boundary of $D$.

*q*and standard deviation

*σ*, and

*C*is the integral of $N\u0302(q,\sigma 2)$ over $D$ with respect to the Lebesgue measure on $D$

_{q}A similar approach for normalizing Gaussian densities over compact domains was taken in Ref. [38].

### A Nonlinear Model With Infeasible Data.

In this section, we use the nonlinear model introduced in Sec. 3.2 to demonstrate that infeasible data can arise from relatively benign assumptions. Suppose the observed density on *Q*_{1} is a truncated normal distribution with mean 0.3 and standard deviation of 0.04. In this one-dimensional (1D) data space, this observed density is absolutely continuous with respect to the push-forward of the prior on *Q*_{1}, see Fig. 5 (left). Next, suppose the observed density on *Q*_{2} is a truncated normal distribution with mean 0.982 and standard deviation of 0.01. Again, in this new one-dimensional data space, this observed density is absolutely continuous with respect to the push-forward of the prior on *Q*_{2}, see Fig. 5 (right). Both of these observe densities are dominated by their corresponding push-forward densities, i.e., the model can reach all of the observed data in each case.

However, consider the data space defined by *both Q*_{1} and *Q*_{2} and the corresponding push-forward and observed densities on this space, see Fig. 4. The nonrectangular shape of the combined data space is induced by the nonlinearity in the model and the correlations between *Q*_{1} and *Q*_{2}. As we see in Fig. 4, the observed density using the product of the one-dimensional Gaussian densities is *not* absolutely continuous with respect to the push-forward density on (*Q*_{1}, *Q*_{2}), i.e., the support of $\pi Dobs$ extends beyond the support of $\pi DQ(prior)$. Referring to Eq. (13), we normalize this observed density over $D$, see Fig. 4 (right). Now that the new observed density obeys the assumptions needed, we could solve the stochastic inverse problem as described in Sec. 2.

### Computational Considerations.

The main computational challenge in the consistent Bayesian approach is the approximation of the push-forward of the prior. Following Ref. [23], we use Monte Carlo sampling for the forward propagation of uncertainty. While the rate of convergence is independent of the number of parameters (dimension of Λ), the accuracy in the statistics for the QOI may be relatively poor unless a large number of samples can be taken. Alternative approaches based on surrogate models can significantly improve the accuracy, but are generally limited to small number of parameters. We also employ kernel density estimation techniques to construct a nonparametric approximation of the push-forward density, but it is well known that these techniques do not scale well with the number of observations (dimension of $D$) [34].

*π*to indicate this function does not integrate to 1 because we have violated (A3)) over Λ. Although Λ may not always be a generalized rectangle, (A1) implies we have a clear definition of Λ and therefore can efficiently integrate $\pi \u0303\Lambda post$ over Λ and then normalize $\pi \u0303\Lambda post$ by

Thus, we can use the values of $\pi Dobs$ and $\pi DQ(prior)$ computed for the samples generated from the prior, which were used to estimate the push-forward of prior, to integrate $\pi Dobs/\pi DQ(prior)$ with respect to the prior.

## Numerical Examples

In this section, we consider several models of physical systems. First, we consider a stationary convection–diffusion model with a single uncertain parameter controlling the magnitude of the source term. Next, we consider a transient transport model with a two-dimensional parameter space determining the location of the source of a contaminant. Then, we consider an inclusion problem in computational mechanics where two uncertain parameters control the shape of the inclusion. Finally, we consider a high-dimensional example of single-phase incompressible flow in porous media where the uncertain permeability field is given by a Karhunen–Loéve expansion [39].

In each example, we have a parameter space Λ, a set of possible QOI, and a specified number of QOI we can afford to gather during the experiment. This in turn defines a design space $Q$ and we let $Qz\u2208Q$ represent a single experimental design and $Dz=Qz(\Lambda )$ the corresponding data space. For each experimental design, we let *σ ^{z}* represent the standard deviations defined by the uncertainties in each QOI that compose

*Q*and $ODz$ representing the observed density space.

^{z}All of these examples have continuous design spaces, so we approximate the OED by selecting the OED from a large set of candidate designs. This approach was chosen because it is much more efficient to perform the forward propagation of uncertainty using random sampling only once and to compute all of the candidate measurements for each of these random samples. Alternatively, one could pursue a continuous optimization formulation, which would require a full forward propagation of uncertainty for each new design. As mentioned in Sec. 3.4, one could limit the number of designs using a gradient-based or Newton-based optimization approach, but this is beyond the scope of this paper.

### Stationary Convection–Diffusion: Uncertain Source Amplitude.

In this section, we consider a convection–diffusion problem with a single uncertain parameter controlling the magnitude of a source term. This example serves to demonstrate that the OED formulation gives intuitive results for simple problems.

#### Problem Setup.

where Ω = [0, 1]^{2}, *u* is the concentration field, the diffusion coefficient *D* = 0.01, the convection vector * v* = [1, 1], and

*S*is a Gaussian source with the following parameters:

*x*

_{src}is the location,

*A*is the amplitude,

*h*is the width. We impose homogeneous Neumann boundary conditions on Γ

*(right and top boundaries) and homogeneous Dirichlet conditions on Γ*

_{N}*(left and bottom boundaries). For this problem, we choose*

_{D}*x*

_{src}= [0.5, 0.5], and

*h*= 0.05. We let

*A*be uncertain within [50, 150]; thus, the parameter space for this problem is Λ = [50, 150]. Hence, our goal is to gather some limited amount of data that provide the best information about the amplitude of the source, i.e., reduces our uncertainty in

*A*. To approximate solutions to the PDE in Eq. (18), given a source amplitude

*A*, we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform (25 × 25) spatial grid.

#### Results.

We assume that we have limited resources for gathering experimental data; specifically, we can only afford to place one sensor in the domain to gather a single concentration measurement. Our goal is to place this single sensor in Ω to maximize the expected information gained about the amplitude of the source. We discretize Ω using 2000 uniform random points, which produces a design space with 2000 possible experimental designs. For this problem, we let the uncertainty in each QOI be described by a truncated Gaussian profile with a fixed standard deviation of 0.1. This produces observed density spaces, $ODz$, as described in Eq. (13).

We generate 5000 uniform samples from the prior and simulate measurements of each QOI for each of these 5000 samples. We consider approximate solutions to the OED problem using subsets of the 5000 samples of size 50, 200, 1000, and 5000. For each experimental design, we calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 6. Notice the expected information gain is greatest near the center of the domain (near the location of the source) and in the direction of the convection vector away from the source. This result matches intuition, as we expect data gathered in regions of the domain that exhibit sensitivity to the parameters to produce high expected information gains.

We note that, for this example, a sufficiently accurate approximation to the design space and the OED is obtained using only 50 samples corresponding to 50 model evaluations. In Table 1, we show the top five experimental designs (computed using the full set of 5000 samples) and corresponding $E(IQz)$ for each set of samples.

### Time-Dependent Diffusion: Uncertain Source Location.

In this section, we compare results from a statistical Bayesian formulation of OED to the formulation described in this paper. Specifically, we consider the model in Ref. [24] where the author uses a classical Bayesian framework for OED to determine the optimal placement of a single sensor that maximizes the expected information about the location of a contaminant source.

#### Problem Setup.

where Ω = [0, 1]^{2}, *u* is the space–time concentration field, we impose homogeneous Neumann boundary conditions along with a zero initial condition, and *S* is a Gaussian source with the following parameters: *x*_{src} is the location, *s* is the intensity, *h* is the width, and *τ* is the shutoff time.

Our goal is to gather some limited amount of data that provide the best information about the location of the source, i.e., reduce our uncertainty in *x*_{src}. For this problem, we choose *s* = 2.0, *h* = 0.05, and *τ* = 0.3 and let *x*_{src} be uncertain within [0, 1]^{2} such that Λ = [0, 1]^{2}. To approximate solutions to the PDE in Eq. (19) given a location of *S*, i.e., a given *x*_{src}, we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform (25 × 25) spatial grid and backward Euler time integration with a step size Δ*t* = 0.004 (100 time steps).

#### Results.

*t*= 0.24. Our goal is to place this single sensor in Ω to maximize the expected information gained about the location of the contaminant source. For simplicity, we discretize Ω using an 11 × 11 regular grid of points, which produces a design space with 121 possible experimental designs. We let the uncertainty in each QOI be described by a Gaussian profile with a standard deviation that is a function of the magnitude of the QOI, i.e.,

*M*is the dimension of the data space. This produces observed density spaces, $ODz$, that consist of truncated Gaussian functions with varying standard deviations

We generate 5000 uniform samples from the prior and simulate measurements of each QOI for each of these 5000 samples. We consider approximate solutions to the OED problem using subsets of the 5000 samples of size 50, 200, 1000, and 5000. For each experimental design, we use these data to calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 7. Notice the expected information gain is greatest near the corners of the domain and smallest near the center; this is consistent with Ref. [24]. In Table 2, we show the top five experimental designs, approximated using the full set of 5000 samples, and corresponding $E(IQz)$ for each set of samples.

Design location | 50 | 200 | 1000 | 5000 |
---|---|---|---|---|

(0, 0) | 0.687 | 0.828 | 0.738 | 0.741 |

(1, 1) | 0.653 | 0.713 | 0.747 | 0.740 |

(0, 0.1) | 0.687 | 0.817 | 0.733 | 0.736 |

(0.1, 0) | 0.687 | 0.810 | 0.728 | 0.735 |

(1, 0.9) | 0.648 | 0.713 | 0.742 | 0.735 |

Design location | 50 | 200 | 1000 | 5000 |
---|---|---|---|---|

(0, 0) | 0.687 | 0.828 | 0.738 | 0.741 |

(1, 1) | 0.653 | 0.713 | 0.747 | 0.740 |

(0, 0.1) | 0.687 | 0.817 | 0.733 | 0.736 |

(0.1, 0) | 0.687 | 0.810 | 0.728 | 0.735 |

(1, 0.9) | 0.648 | 0.713 | 0.742 | 0.735 |

In Fig. 8 we consider three different posteriors computed using data from the OED approximated using 5000 samples, i.e., data gathered by a sensor placed in the bottom left corner of the domain, where each posterior corresponds to a different possible location of the source. We see varying levels of information gain in these three scenarios, reiterating the point that we choose the OED based on the average of these information gains, *E*(*I _{Q}*).

*Remark 3.* Although many of the results in this section seem to match our intuition about which measurement locations should produce high expected information gains, this may not always be the case. In particular, we have found that our results can depend on our choice of the variance in the observed densities *σ*. If *σ* is chosen to be large relative to the range of a data space, then the posteriors produced as we average over $OD$ are all nearly the same and potentially produce unusually high information gains when the observed densities have substantial support over regions of the data space with very small probability (very small values of the push-forward of the prior). Another way to think of this is the push-forward densities have high entropy and because *σ* is large, $\pi Dobs$ is very close to uniform and this produces posterior densities with high information gains. If *σ* is chosen to be small relative to the range of the data space, i.e., if we expect the experiments to be informative, we do not encounter this issue because we are integrating over $D$ with respect to the push-forward measure so most of our potential observed data lie in high probability regions of the data space.

### A Parameterized Inclusion.

In this section, we consider a simple problem in computational mechanics where the precise boundary of an inclusion is uncertain. We parameterize the inclusion and seek to determine the location to place a sensor that will maximize the information gained regarding the shape of the inclusion. We use a linear elastic formulation to model the response of the media to surface forces and measure horizontal stress at each sensor location. We assume that the material properties (Poisson ratio and Young's modulus) are different inside the inclusion and that these properties are known a prior (Fig. 9).

#### Problem Setup.

*λ*and

*μ*, which are related to the Poisson ratio,

*ν*, and Young's modulus,

*E*, via the following expressions:

*x*

_{0}=

*y*

_{0}= 0 and

*α*is uniformly distributed on [0.5, 1] and

*β*is uniformly distributed on [0.25, 0.5]. The material properties are assumed to be known and are given by

These material properties were not chosen to emulate any particular materials, just to demonstrate the proposed OED formulation.

Next, let us impose homogeneous Dirichlet boundary conditions on the bottom boundary and stress-free boundary conditions on the sides, and impose a uniform traction in the *y*-direction along the top boundary ($ttop=(0,\u22121)T$). And finally assume that we can probe the media and measure the horizontal stress at a given sensor location. We do not want to puncture the inclusion, so we only consider sensor locations outside the bounds on the inclusion. Equation (22) was solved using a finite element discretization with piecewise linear basis functions defined on a uniform 400 × 80 mesh resulting in a system with 64,962 degrees-of-freedom. The computational model is implemented using the Trilinos toolkit [40] and each realization of the model requires approximately 1 s using eight processors.

#### Results.

As in previous examples, we assume that we have limited resources for gathering experimental data; specifically, we can only afford to place one sensor in the domain to gather a single stress measurement. Our goal is to place this single sensor to maximize the expected information gained about the shape of the inclusion. We select 2000 random sensor locations (outside the inclusion bounds), which produce a design space with 2000 possible experimental designs. For this problem, we let the probability density for the QOI be described by a truncated Gaussian profile with a fixed standard deviation of 0.001. We generate 1000 uniform samples from the prior and compute the horizontal stress at each sensor location for each of these 1000 samples.

First, we compare the posterior densities for two sensor locations, (3.5294, 1.3049) and (1.3902, 1.2100), under the assumption that we have already gathered data at these sensor locations. The purpose here is to demonstrate that we obtain different posterior densities and therefore gain different information from each sensor. The first sensor is further from the inclusion; so we expect that the data from the second sensor will constrain the posterior more than the data from the first. In Figs. 10 and 11, we plot the samples from the posterior and the corresponding kernel density estimate of the posterior for the first and second sensor locations, respectively. It is clear that measuring the horizontal stress closer to the inclusion increases the information gained from the prior to the posterior.

We consider approximate solutions to the OED problem using subsets of the 1000 samples of size 10, 50, 100, and 1000. For each experimental design, we use this data to calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 12. Notice the expected information gain is greatest near the bottom of the domain near the inclusion and is reasonably symmetric around the inclusion. Also, note that the expected information gain is relatively large in the bottom corners of the domain. This is due to the choice of boundary conditions for the model, which induces a large amount of stress in these corners. In Table 3, we show the top five experimental designs, approximated using the full set of 1000 samples, and corresponding $E(IQz)$ for each set of samples.

### A Higher-Dimensional Porous Media Example With Uncertain Permeability.

In this section, we consider an example of single-phase incompressible flow in porous media with a Karhunen–Loéve expansion of the uncertain permeability field. The purpose of this example is to demonstrate the OED formulation on a problem with a high-dimensional parameter space and more than one sensor.

#### Problem Setup.

*p*is the pressure field and

*K*is the permeability field, which we assume is a scalar field given by a Karhunen–Loéve expansion of the log transformation, $Y=log\u2009K$, with

*ξ*are mutually uncorrelated random variables with zero mean and unit variance [41,42]. The eigenvalues,

_{i}*η*, and eigenfunctions,

_{i}*f*, are computed numerically using the following covariance function:

_{i}where *σ _{Y}* and

*η*denote the variance and correlation length in the

_{i}*i*th spatial direction, respectively. We assume a correlation length of 0.01 in each spatial direction and truncate the expansion at 100 terms. This choice of truncation is purely for the sake of demonstration. In practice, the expansion is truncated once a sufficient fraction of the energy in the eigenvalues is retained [39,41]. This truncation gives 100 uncorrelated random variables,

*ξ*

_{1},…,

*ξ*

_{100}, with zero mean and unit variance which implies $\Lambda =\mathbb{R}100$. To approximate solutions to the PDE in Eq. (23), we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform (50 × 50) spatial grid.

#### Results.

In this section, we present approximate solutions to several different design problems. We begin with the familiar problem of choosing a single sensor location within the physical domain. Then, we consider approximating the optimal location of a second sensor *given* the location of the first sensor. In this way, we solve the *greedy* OED problem and determine the greedy optimal locations of 1–8 sensors within the physical domain. We then consider solving the exhaustive OED problem where we limit the sensors to 25 locations and consider determining the optimal location of five available sensors.

First, assume that we have limited resources for gathering experimental data, specifically, we can only afford to place one sensor in the domain to gather a single pressure measurement. Our goal is to place this single sensor in Ω to maximize the expected information gained about the amplitude of the source. We discretize Ω using 1301 points on a grid, which produces a design space with 1301 possible experimental designs. For this problem, we let the uncertainty in each QOI be described by a truncated Gaussian profile with a fixed standard deviation of 0.01. This produces observed density spaces, $ODz$, as described in Eq. (13).

We generate 10,000 samples from the prior and simulate measurements of each QOI. We consider approximate solutions to the OED problem using subsets of the 10,000 samples of size 50, 100, 1000, and 10,000. For each experimental design, we calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 13. Notice the expected information gain is greatest near the top and bottom of the domain away from the left and right edges. This result matches intuition, as we expect data gathered near the left and right edges to be less informative given the Dirichlet boundary condition imposed on those boundaries. We note that, for this example, a sufficiently accurate approximation to the design space and the OED is obtained using only 1000 samples corresponding to 1000 model evaluations. In Table 4, we show the top five experimental designs (computed using the full set of 10,000 samples) and corresponding $E(IQz)$ for each set of samples.

Next, we consider the greedy OED problem of placing eight sensors within the physical domain. We choose to use all of the available 10,000 samples to solve this problem. In Fig. 14, we see the design space as a function of the previously determined locations of placed sensors. We observe a strong symmetry to this problem, as is expected due to the symmetry of the physical process defined on this domain with the given boundary conditions. In the bottom right of Fig. 14, notice the very small range of the expected information gained indicating the possible values of the expected information gain. This suggests, for this example, we expect there is a limit on the number of useful sensor locations for informing likely parameter values.

Finally, we consider the exhaustive OED problem of placing five sensors within the physical domain and, for computational feasibility, restrict the possible locations of these five sensors to 25 points in the physical domain, see Fig. 15. We choose to use 1000 samples to solve this problem. In Fig. 15, we plot the design space for a single sensor location using these 1000 samples and show the optimal location of 1, 2, 3, 4, and 5 sensors. The results are quite similar to the greedy results previously described.

Design location | 10 | 50 | 100 | 1000 |
---|---|---|---|---|

(−1.001, 0.961) | 2.303 | 3.662 | 4.085 | 4.569 |

(−1.050, 1.035) | 2.303 | 3.391 | 3.811 | 4.261 |

(1.041, 0.959) | 2.303 | 3.437 | 3.839 | 4.256 |

(−1.050, 1.130) | 2.302 | 3.418 | 3.661 | 4.096 |

(1.005, 0.870) | 2.277 | 3.246 | 3.544 | 3.813 |

Design location | 10 | 50 | 100 | 1000 |
---|---|---|---|---|

(−1.001, 0.961) | 2.303 | 3.662 | 4.085 | 4.569 |

(−1.050, 1.035) | 2.303 | 3.391 | 3.811 | 4.261 |

(1.041, 0.959) | 2.303 | 3.437 | 3.839 | 4.256 |

(−1.050, 1.130) | 2.302 | 3.418 | 3.661 | 4.096 |

(1.005, 0.870) | 2.277 | 3.246 | 3.544 | 3.813 |

Design location | 50 | 100 | 1000 | 10,000 |
---|---|---|---|---|

(0.48, 1) | 1.966 | 2.001 | 1.992 | 2.008 |

(0.5, 0.98) | 1.952 | 1.963 | 1.993 | 2.007 |

(0.46, 0.98) | 1.930 | 1.985 | 1.986 | 2.006 |

(0.52, 0) | 1.777 | 1.915 | 1.999 | 2.006 |

(0.56, 0) | 1.751 | 1.863 | 1.996 | 2.006 |

Design location | 50 | 100 | 1000 | 10,000 |
---|---|---|---|---|

(0.48, 1) | 1.966 | 2.001 | 1.992 | 2.008 |

(0.5, 0.98) | 1.952 | 1.963 | 1.993 | 2.007 |

(0.46, 0.98) | 1.930 | 1.985 | 1.986 | 2.006 |

(0.52, 0) | 1.777 | 1.915 | 1.999 | 2.006 |

(0.56, 0) | 1.751 | 1.863 | 1.996 | 2.006 |

## Conclusion

In this manuscript, we developed an OED formulation based on the recently developed *consistent* Bayesian approach for solving stochastic inverse problems. We used the Kullback–Leibler divergence and the posterior obtained using consistent Bayesian inference to measure the information gain of a design and present a discrete optimization procedure for choosing the optimal experimental design that maximizes the expected information gain. The optimization procedure presented in this paper is limited in terms of the number of observations we can consider, but was chosen to focus attention on the definition and approximation of the expected information gained. More efficient strategies, utilizing gradient-based methods on continuous design spaces, will be pursued in a future work. We discussed a characterization of the space of observed densities needed to compute the expected information gain and a computationally efficient approach for rescaling observed densities to satisfy the requirements of the consistent Bayesian approach. Numerical examples were given to highlight the properties and utility of our approach.

## Acknowledgment

J.D. Jakeman's work was supported by DARPA EQUIPS. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under Contract No. DE-NA-0003525.

## Funding Data

Defense Advanced Research Projects Agency (Grant No. EQUiPS).