Abstract
Bayesian optimization (BO) is a sequential optimization strategy that is increasingly employed in a wide range of areas such as materials design. In real-world applications, acquiring high-fidelity (HF) data through physical experiments or HF simulations is the major cost component of BO. To alleviate this bottleneck, multi-fidelity (MF) methods are used to forgo the sole reliance on the expensive HF data and reduce the sampling costs by querying inexpensive low-fidelity (LF) sources whose data are correlated with HF samples. However, existing multi-fidelity BO (MFBO) methods operate under the following two assumptions that rarely hold in practical applications: (1) LF sources provide data that are well correlated with the HF data on a global scale, and (2) a single random process can model the noise in the MF data. These assumptions dramatically reduce the performance of MFBO when LF sources are only locally correlated with the HF source or when the noise variance varies across the data sources. In this paper, we view these two limitations as uncertainty sources and address them by building an emulator that more accurately quantifies uncertainties. Specifically, our emulator (1) learns a separate noise model for each data source, and (2) leverages strictly proper scoring rules in regularizing itself. We illustrate the performance of our method through analytical examples and engineering problems in materials design. The comparative studies indicate that our MFBO method outperforms existing technologies, provides interpretable results, and can leverage LF sources which are only locally correlated with the HF source.
1 Introduction
Bayesian optimization (BO) is a sequential and sample-efficient global optimization technique that is increasingly used in the optimization of expensive-to-evaluate (and typically black-box) functions [1–3]. BO has two main ingredients: an emulator which is typically a Gaussian process (GP) and an acquisition function (AF) [4,5]. The first step in BO is to train an emulator on some initial data. Then, an auxiliary optimization is solved to determine the new sample that should be added to the training data. The objective function of this auxiliary optimization is the AF whose evaluation relies on the emulator. Given the new sample, the training data are updated and the entire emulation-sampling process is repeated until the convergence conditions are met [6–9].
Although BO is a highly efficient technique, the total cost of optimization can be substantial if it solely relies on the accurate but expensive high-fidelity (HF) data source. To mitigate this issue, multi-fidelity (MF) techniques are widely adopted [10–14] where one uses multiple data sources of varying levels of accuracy and cost in BO. The fundamental principle behind MF techniques is to exploit the correlation between low-fidelity (LF) and HF data to decrease the overall sampling costs [11,15,16]. Compared to single-fidelity BO (SFBO), the choice of the emulator is more important in multi-fidelity BO (MFBO) due to the MF nature of the data. In this regard, we note that most works on SFBO or MFBO leverage variations of Gaussian processes (GPs) for emulation since in scarce data regimes other methods such as probabilistic neural networks suffer from overfitting, overconfidence, or long training times [6,17–21].
Over the past two decades, many MFBO strategies have been proposed. However, each of these methods has some major drawbacks that are mainly rooted in their emulation strategy which fails to consider some features of MF data sets. For example, many existing MF techniques require prior knowledge about the hierarchy of LF data sources [22–26] and hence they break down when one does not know the relative accuracy of the LF sources. The MF modeling methods defined in Refs. [26–29] struggle to capture intricate correlations among different fidelities as separate emulators are trained for each data source. Kennedy and O’Hagan’s bi-fidelity approach and its extensions [30–33] are limited to bi-fidelity cases and presume simple bias forms (e.g., an additive function [30,34–36]) for the LF sources. Co-Kriging and its extensions [23,37–42] fail to accurately capture cross-source correlations. Botorch [43], which is a widely popular MFBO package, is sensitive to the sampling costs (where highly inexpensive LF sources are heavily sampled which, in turn, causes numerical and convergence issues) and also requires prior knowledge about the hierarchy of data sources. The above-reviewed methods also fail to directly handle categorical variables which frequently arise in applications such as materials design.
The recent work in Ref. [44] addresses most of the above limitations with two contributions. First, it achieves MF emulation via latent map Gaussian processes (LMGPs) which can simultaneously fuse any number of data sources, do not require any prior knowledge about the hierarchy of the data sources, can handle categorical variables, and do not use any simplification assumptions (e.g., linear correlation among sources, additive biases, etc.) while fusing MF data sets. Second, it quantifies the information value of LF and HF samples differently to consider the MF nature of the data while exploring the search space. The AF used in Ref. [44] is cost-aware in that it considers the sampling cost in quantifying the value of HF and LF data points. Henceforth, we refer to this method as MFBO.
While MFBO performs quite well, it shares two limitations with other MFBO methods. To demonstrate the first one, we consider a simple 1D example in Fig. 1 where each of the two LF sources is more correlated with the HF function in half of the domain. Specifically, LF1 and LF2 are only correlated with the HF source in the left and right regions, respectively, see Fig. 1(a). Before starting BO, the first step in MFBO is excluding highly biased LF sources from BO with the rationale that they can steer the search process in the wrong direction. This decision is made based on the fidelity manifold that LMGP learns using the initial data. In this manifold, each data source is encoded with a point and distances between these points are inversely related to the global correlations between the corresponding data sources, see Fig. 1(b) and Ref. [44]. So, based on Fig. 1(b), the HF source is barely correlated with the LF sources even though they are close to the HF function in half of the domain (since eliminating both sources converts the BO into a single-fidelity process in this example, we assume that MFBO only excludes LF2 and samples from the other two sources as shown in Fig. 1(a)). This is obviously a sub-optimal decision as it precludes the possibility of leveraging an LF source that is valuable in a small portion of the search space which may include the global optimum of the HF source (in the example of Fig. 1(a), MFBO should ideally leverage LF2 but mostly sample from it in the x > 5 region). Hence, the first limitation of existing methods is their inability to leverage LF sources which are only locally correlated with the HF source.
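The locally correlated setting of Fig. 1 can be sketched with a hypothetical toy problem (the functions below are made up for illustration and are not the paper's actual sources): each LF source tracks the HF function closely on one half of the domain and is heavily biased on the other half, so its global correlation with the HF source understates its local value.

```python
import numpy as np

# Hypothetical 1D toy problem in the spirit of Fig. 1 (illustrative only).
def hf(x):
    return np.sin(x) + 0.1 * x               # high-fidelity source

def lf1(x):
    # well correlated with HF for x < 5, heavily biased for x >= 5
    return np.where(x < 5, hf(x) + 0.05, hf(x) + 2.0 + np.cos(3 * x))

def lf2(x):
    # well correlated with HF for x > 5, heavily biased for x <= 5
    return np.where(x > 5, hf(x) - 0.05, hf(x) - 2.0 + np.sin(4 * x))

x = np.linspace(0, 10, 200)
# Global correlation of LF1 with HF is mediocre ...
r1_global = np.corrcoef(hf(x), lf1(x))[0, 1]
# ... yet LF1 is almost perfectly correlated with HF on the left half.
r1_left = np.corrcoef(hf(x[x < 5]), lf1(x[x < 5]))[0, 1]
```

A method that only inspects global correlations (such as r1_global above) would discard both LF sources, even though each is nearly a constant shift of the HF function on half of the domain.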
The second limitation of existing MFBO methods (including MFBO) is that they assume all sources are corrupted with the same noise process (with unknown noise variance). However, MF datasets typically have different levels of noise especially if some sources represent deterministic computer simulations while others are physical experiments [45,46]. In such applications, the emulators used in existing MFBO methods overestimate the uncertainties associated with the noise-free data sources which, in turn, adversely affects the exploration property of BO.
In this paper, we introduce MFBOUQ which addresses the two aforementioned limitations of existing technologies because it (1) never discards an LF source (regardless of the magnitude of its global bias with respect to the HF source) to leverage it in parts of the domain where its samples are locally correlated with the HF data, and (2) estimates a separate noise process for each data source. More specifically, we view these two limitations as uncertainty sources and we address them by reformulating the training process of MFBOUQ’s emulator to improve its uncertainty quantification (UQ) ability. We argue that this improvement in the emulator helps MFBO in balancing exploration and exploitation more effectively. Figure 1(c) schematically demonstrates the advantages of MFBOUQ over MFBO in a 1D example where there are one HF and two LF sources. As can be observed, the LF sources are mostly sampled where they are either well correlated with the HF source or provide attractive function values (e.g., very small values in minimization). The advantages of MFBOUQ over MFBO in finding the optimum of HF (y*) hold over various initializations, see Fig. 1(d).
The rest of the paper is organized as follows. We provide the methodological details in Sec. 2 and then evaluate the performance of MFBOUQ via multiple ablation studies in Sec. 3. In Sec. 3, we also visualize how strategic sampling in MFBOUQ, driven by accurate uncertainty quantification and effective handling of biased data sources, results in its superior performance compared to MFBO. This is demonstrated through two real-world high-dimensional material design examples with noisy and highly biased data sources. We conclude the paper in Sec. 4 by summarizing our contributions and providing future research directions.
2 Methods
In this section, we first provide some background on LMGP and MF modeling with LMGP in Secs. 2.1 and 2.2, respectively. We then propose our efficient mechanism for inversely learning a noise process for each data source in Sec. 2.3. Next, we introduce the cost-aware AF of MFBOUQ in Sec. 2.4. Finally, in Sec. 2.5 we elaborate on our idea that improves the UQ capabilities of LMGPs and, in turn, benefits MFBO.
2.1 Latent Map Gaussian Process.
2.2 Multi-Fidelity Emulation Via LMGP.
The first step to MF emulation with LMGP is to augment the inputs with the additional categorical variable s that indicates the source of a sample, i.e., s = {'1', '2', …, 'ds'} where the jth element corresponds to source j for j = 1, …, ds. Subsequently, the training data from all sources are concatenated and used in LMGP to build an MF emulator. Upon training, to predict the objective value at point x from source j, x is concatenated with the categorical variable s that corresponds to source j and fed into the trained LMGP. We refer the readers to Ref. [44] for more details but note here that in case the input variables already contain some categorical features (see Sec. 3.2 for an example), we endow LMGP with two manifolds where one encodes the fidelity variable s while the other manifold encodes the rest of the categorical variables. While this choice does not noticeably affect the accuracy of LMGP during test time, it increases interpretability. For instance, we use the learned manifold for the categorical variables in Sec. 3.2 to show the trajectory of BO in the design space.
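The input augmentation described above can be sketched as follows (the arrays and the helper function are made up for illustration; LMGP itself is not shown):

```python
import numpy as np

# Minimal sketch of the data preparation for MF emulation: a categorical
# source indicator s is appended to the inputs, and the data from all
# fidelity sources are concatenated into a single training set.
X_hf = np.array([[0.1], [0.4], [0.9]])          # HF inputs (source 1)
y_hf = np.array([1.2, 0.7, 2.1])
X_lf = np.array([[0.2], [0.5], [0.7], [0.8]])   # LF inputs (source 2)
y_lf = np.array([1.0, 0.6, 1.5, 1.9])

def augment(X, source_id):
    """Append the categorical fidelity indicator s as an extra column."""
    s = np.full((X.shape[0], 1), source_id)
    return np.hstack([X, s])

X_train = np.vstack([augment(X_hf, 1), augment(X_lf, 2)])
y_train = np.concatenate([y_hf, y_lf])

# To predict from source j at a new point x, concatenate x with s = j:
x_new = augment(np.array([[0.3]]), 1)   # query the HF source at x = 0.3
```

The trained emulator is then evaluated on such augmented inputs, so switching the queried fidelity amounts to changing the value of the appended column.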
It has been recently shown [48] that LMGPs have the following primary advantages over other MF emulators: (1) they provide a more flexible and accurate mechanism to build MF emulators since they learn the relations between the sources in a nonlinear manifold, (2) they learn all the sources quite accurately rather than just emulating the HF source, and (3) they provide a visualizable global metric for comparing the relative discrepancies/similarities among the data sources.
2.3 Source-Dependent Noise Modeling.
The presence of noise significantly affects the performance of BO and incorrectly modeling it can cause over-exploration or under-exploration of the search space. To mitigate the effects of noise in BO, we reformulate LMGPs to independently model a noise process for each data source. This reformulation improves emulation accuracy and, in turn, improves the search process when LMGP is deployed in MFBO.
To model noise in GPs, the nugget or jitter parameter, δ, is used [49] to replace R with Rδ = R + δI, where I is an n × n identity matrix. With this approach, the estimated stationary noise variance in the data is δσ2, and the mean and variance formulations in Eqs. (6) and (7) are modified by using Rδ instead of R.
Although incorporating this modification in the correlation matrix can enhance the performance of the emulator and BO in single-fidelity (SF) problems, it does not yield the same benefits in MF optimization. This is likely because of the dissimilar nature of the data sources and their corresponding noises. When dealing with multiple sources of data, each source may suffer from different levels and types of noise. Consider a bi-fidelity dataset where the HF data come from an experimental setup and are subject to measurement noise, while the LF data are generated by a deterministic computer code which has a systematic bias due to missing physics. In this case, using only one nugget parameter in LMGP for MF emulation is obviously not an optimum choice.
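The source-dependent alternative can be sketched as replacing the single nugget δI with a diagonal matrix whose entries depend on the source of each training sample (a sketch of the general idea, with made-up values; the exact formulation follows the paper):

```python
import numpy as np

# Sketch of source-dependent noise (Sec. 2.3): instead of one nugget
# delta for all samples, each sample i receives the nugget of its own
# data source, i.e., R is replaced by R + diag(delta_{s_i}).
def add_source_nuggets(R, source_ids, deltas):
    """R: n x n correlation matrix; source_ids: length-n list of source
    labels; deltas: dict mapping source label -> nugget value."""
    nugget_diag = np.array([deltas[s] for s in source_ids])
    return R + np.diag(nugget_diag)

R = np.eye(4)                  # placeholder correlation matrix
source_ids = [1, 1, 2, 2]      # two HF samples, two LF samples
deltas = {1: 1e-2, 2: 1e-6}    # noisy HF source, near-deterministic LF
R_delta = add_source_nuggets(R, source_ids, deltas)
```

In the bi-fidelity scenario described above, this lets the emulator assign a large nugget to the noisy experimental HF data while keeping the deterministic LF simulations nearly interpolating.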
2.4 Multi-Source Cost-Aware Acquisition Function.
2.5 Emulation for Exploration.
The composite AF in Eq. (14) quantifies the information value of LF samples via Eq. (10) whose value scales with the prediction uncertainties, i.e., σ(u). The source-dependent noise modeling of Sec. 2.3 improves LMGP’s ability in learning the uncertainty by introducing a few more hyperparameters. However, the added hyperparameters may result in overfitting and, in turn, deteriorate the predicted uncertainties [51,52]. A related issue is the effect of large local biases of LF sources which can inflate the uncertainty quite substantially and, as a result, increase γLF(u; j). This increase causes MFBO to repeatedly sample from the biased LF sources. Such repeated samplings reduce the efficiency of MFBO and may cause numerical issues (due to ill-conditioning of the covariance matrix) or even convergence to a sub-optimal solution.
To address the above issues simultaneously, we argue that the training process of the emulator should increase the importance of UQ which directly affects the exploration part of MFBO. To this end, we leverage strictly proper scoring rules while training LMGPs.
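One strictly proper scoring rule that can serve this purpose is the interval score of Gneiting and Raftery for a (1 − α) prediction interval; how the score enters LMGP's training objective follows Sec. 2.5, and the penalty weight `lam` below is a made-up illustration rather than the paper's formulation:

```python
import numpy as np

# Interval score for a (1 - alpha) prediction interval [lower, upper]:
# narrow intervals are rewarded, and observations falling outside the
# interval are penalized in proportion to how far they miss it.
def interval_score(y, lower, upper, alpha=0.05):
    width = upper - lower
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)  # y under-covered
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)  # y over-covered
    return np.mean(width + below + above)

# Hypothetical penalized objective: negative log-likelihood plus a
# weighted interval score built from the Gaussian predictive moments.
def penalized_loss(nll, y, mu, sigma, lam=1.0, alpha=0.05):
    z = 1.959963984540054      # 97.5% standard-normal quantile
    return nll + lam * interval_score(y, mu - z * sigma, mu + z * sigma, alpha)
```

Because the interval score is strictly proper, minimizing it pushes the emulator toward intervals that are both calibrated and sharp, which directly supports the exploration term of the AF.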
3 Results and Discussion
We demonstrate the performance of MFBOUQ on two analytic examples (see Table 1 in the Supplementary Information available in the Supplemental Materials on the ASME Digital Collection for details on functional forms, size of initial data, sampling costs, and number of LF sources and their accuracy with respect to the HF source) and two real-world problems. For analytic examples, we compare the results against Botorch, MFBO, and SFBO. MFBOUQ and MFBO use the AFs introduced in Sec. 2.4 while SFBO uses EI as its AF and LMGP as its emulator. Botorch employs single-task multi-fidelity GP and knowledge gradient as its emulator and AF, respectively [58,59]. All the baselines except Botorch are also used for engineering examples. Botorch is not applicable to them since it cannot handle categorical variables and also the values determined by Botorch are not obtained through direct sampling from the available data sets (rather, the samples are obtained by optimizing the learned posterior).
We assume that the cost of querying any of the data sources is much higher than the computational costs of BO (i.e., fitting LMGP and solving the auxiliary optimization problem). Therefore, we compare the methods based on their capability to identify the global optimum of the HF source and the overall data collection cost. By comparing these methods, we aim to demonstrate: (1) the advantages of estimating a separate noise process for each data source, (2) that using IS improves the accuracy of LMGP and, in turn, enhances the convergence of BO (since our defined AFs highly rely on the quality of the prediction), and (3) that deploying IS eliminates the need for excluding highly biased LF sources from BO.
We use the same stopping conditions across all the baselines to clearly demonstrate the benefits of our two contributions. In particular, the optimization is stopped when either of the following happens: (1) the overall sampling cost exceeds a pre-determined maximum budget, or (2) the best HF sample does not change over 50 iterations. The maximum budget for the analytical examples is 40,000 units, while it is 1000 and 1800 for the two real-world examples. These budgets are chosen based on the data collection costs.
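The stopping logic above can be sketched as a simple loop (the `step` callback and its return convention are illustrative placeholders, not the paper's implementation):

```python
# Stop when the cumulative sampling cost exceeds the budget or the best
# HF sample stagnates for `patience` consecutive iterations.
def run_bo(step, max_budget, patience=50):
    """`step()` is a placeholder returning (cost, best_hf_value) for one
    BO iteration; minimization is assumed."""
    total_cost, best, stall = 0.0, float("inf"), 0
    while total_cost <= max_budget and stall < patience:
        cost, y_best = step()
        total_cost += cost
        if y_best < best:          # incumbent improved: reset the counter
            best, stall = y_best, 0
        else:
            stall += 1
    return best, total_cost
```

Using identical budgets and patience across baselines ensures the comparisons in Fig. 3 and Fig. 5 reflect sampling strategy rather than termination settings.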
3.1 Analytical Examples.
We consider two analytical examples, Wing [60] and Borehole [61], whose input dimensionality is 10 and 8, respectively. To challenge the convergence and better illustrate the power of separate noise estimation, we only add noise to the HF data (the noise variance is defined based on the range of each function). The added noise variances for the HF source of Wing and Borehole are 9 and 16, respectively. Both examples are single-response and details regarding their formulation, initialization, and sampling costs are presented in SI A (available in the Supplemental Materials). To assess the robustness of the results and quantify the effect of random initial data, we repeat the optimization process 20 times for each example with each of the baselines (all initial data are generated via Sobol sequence).
In each example, the relative root mean squared error is calculated between LF sources and their corresponding HF source based on 10,000 samples to show the relative accuracy of the LF sources (presented in Table 1 in SI available in the Supplemental Materials). Based on these ground truth numbers (which are not used in BO), in the case of Borehole the source ID, true fidelity level, and sampling costs are not related (e.g., although the first LF source is the most expensive one, it has the least accuracy compared to the HF source). In the case of Wing, however, these numbers match (e.g., LF1 is the most accurate and expensive LF source and is followed by LF2 and then LF3).
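A sketch of such a relative error metric is shown below; the exact normalization used in the paper's tables is not restated here, so scaling the RMSE by the standard deviation of the HF responses is an assumption for illustration:

```python
import numpy as np

# Relative RMSE of an LF source against the HF source on a large common
# sample (10,000 points in the paper). Normalization by the HF standard
# deviation is an illustrative choice.
def relative_rmse(y_hf, y_lf):
    rmse = np.sqrt(np.mean((y_hf - y_lf) ** 2))
    return rmse / np.std(y_hf)

y_hf = np.array([1.0, 2.0, 3.0, 4.0])
y_biased = y_hf + 2.0   # hypothetical LF source with a constant bias of 2
```

Note that these ground-truth errors are only used to characterize the test problems; BO itself never sees them.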
MFBO excludes the highly biased LF sources from BO before any new samples are obtained (also, during BO, the initial samples from highly biased LF sources are not used in emulation). This exclusion is done based on the latent map of the LMGP model that is trained on the initial data. Figure 2 shows the latent maps of the Wing and Borehole examples. As shown in Fig. 2, while all the fidelity sources of Wing are beneficial (since the points encoding the LF sources are very close to the HF point), the first two LF sources of Borehole are not correlated enough with the HF source (their latent positions are distant from that of the HF) and hence are excluded in MFBO. However, MFBOUQ does not require this exclusion because it leverages the biased LF sources merely in the regions that they are correlated with the HF source. We also keep the biased sources in Botorch since there is no explicit requirement to exclude them within the package’s documentation. In this paper, however, we do not exclude the biased sources from MFBO to have a comprehensive comparison with other baseline methods and, most notably, to effectively illustrate the impacts of our contributions.
Figure 3 summarizes the convergence history of each example by depicting the best HF sample (y*) found by each method versus its accumulated sampling cost. We note that the initialization process is identical for all MFBO methods and the reason for observing different starting points for them is that we report y* versus cumulative cost. More specifically, a method may take samples from any of the sources but this action may not improve y* in which case the cumulative cost increases while y* does not. We also note that SFBO has a different initialization since we must use more initial HF samples in SFBO to ensure its starting cost is comparable to the costs of MF methods which use both HF and LF data.
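The construction of such a convergence history can be sketched as follows (the encoding of LF queries as `None` is an illustrative convention): LF queries raise the cumulative cost without updating y*, which is why a curve can stay flat for long stretches.

```python
# Assemble a best-so-far trajectory like Fig. 3: each iteration records
# its sampling cost, and y* is updated only when an HF sample improves
# the incumbent (minimization assumed).
def convergence_history(costs, hf_values):
    """hf_values[i] is the HF objective at iteration i, or None when
    iteration i queried an LF source (illustrative encoding)."""
    cum_cost, y_star, history = 0.0, float("inf"), []
    for c, y in zip(costs, hf_values):
        cum_cost += c
        if y is not None and y < y_star:
            y_star = y
        history.append((cum_cost, y_star))
    return history
```

This also explains the differing starting points in Fig. 3: methods share the same initial data, but their first plotted point depends on the cost accumulated before y* first improves.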
As expected, MFBOUQ and MFBO outperform SFBO in Wing (Fig. 3(a)) by leveraging the inexpensive LF sources that are globally correlated with the HF source. However, the large added noise adversely affects the performance of Botorch in estimating the correlation among sources. This inaccurate correlation estimation combined with large cost differences among the data sources prevents Botorch from leveraging the correlated LF sources and causes convergence to a sub-optimal solution y* = 183.72 while the ground truth is 123.25. The superiority of MFBOUQ is more evident in the Borehole example where there are highly biased LF sources. In Borehole (Fig. 3(b)), all but two of the thin red curves (MFBO) are straight lines. This means that for 18 repetitions, the optimization process fails to improve. The reason behind this failure is that MFBO cannot handle the large bias of the LF sources and samples points that steer the optimization in the wrong direction. Consequently, MFBO cannot find any HF sample with a large enough information value (that justifies its high sampling cost) which results in the lack of improvement in y*. Conversely, all the thin green curves (MFBOUQ) converge to a value very close to the ground truth. Additionally, while efficient sampling from LF sources improves the performance of MFBOUQ, the large added noise to the HF source adversely affects the performance of SFBO and results in a sub-optimal convergence.
As detailed in SI B.1 available in the Supplemental Materials, we note that unlike Botorch, the performance of MFBOUQ is robust to the sampling costs and local correlations. For instance, in Borehole (Fig. 3(b)), Botorch estimates the optimum as while this value is for MFBOUQ (the ground truth is 3.98). The reason behind this inaccuracy is that Botorch fails to find an HF sample whose information value is large enough to justify its high sampling cost and, as a result, cheap LF sources are largely queried. Additionally, due to the strong bias in two of the LF sources, Botorch fails to effectively sample them within the correlated domain. So, LF queries do not improve and Botorch stops without finding the optimum.
3.2 Real-World Datasets.
In this section, we study two materials design problems where the aim is to find the composition that best optimizes the property of interest. We do not add noise to these two examples as they are inherently noisy. The design space of both examples has categorical inputs (denoted by t) and we add one more categorical variable (denoted by s) to enable data fusion as described in Sec. 2.2. We design our LMGP to map the categorical inputs onto two 2D manifolds (one for t and the other for s) to help with the visualization of the exploration–exploitation behavior of BO in the design space. The HF and LF data are obtained via simulations (based on the density functional theory) with different fidelity levels.
The first example is on designing a nanolaminate ternary alloy (NTA) which is used in applications such as high-temperature structural materials [62]. NTA is in the form of M2AX where M is an early transition metal, A is a main group element, and X is either carbon or nitrogen. This problem is bi-fidelity where the goal is to find the member of NTA family with the largest bulk modulus. The HF and LF datasets have 224 samples each and are 10-dimensional (7 quantitative and 3 categorical where the latter have 10, 12, and 2 levels). The cost ratio between the HF and LF sources is 10/1 and we initialize the BO with 30 HF and 30 LF samples (the composition with the largest bulk modulus is never in the initial data). To quantify the sensitivity of the results to the random initial data, we repeat this process 20 times for each BO method.
Our second problem is on designing hybrid organic–inorganic perovskite (HOIP) crystals in the form of ABX3 where B is occupied by a metal cation, A can be an organic or inorganic cation, and X denotes a choice of halide [63]. In this example, our goal is to find the compound with the smallest inter-molecular binding energy. There are three datasets (one from HF and two from LF sources) which have the same dimensionality (1 output and 3 categorical inputs with 10, 3, and 16 levels) but different sizes. The HF dataset has 480 samples while the first and second LF datasets have 179 and 240 samples, respectively. The cost ratio between the three sources is 15/10/5 (where the HF and LF2 sources are the most expensive and cheapest, respectively) and we initialize the BO with (15, 20, 15) samples for the HF and LF sources (the best compound is excluded from the initial data). We repeat the BO process 20 times to assess the sensitivity of the results to the initial data. As mentioned before, the first step in MFBO is to train an LMGP on the initial data in each problem to exclude the highly biased sources. As NTA has categorical variables, LMGP learns two manifolds. Based on Fig. 4(a), the latent points of the fidelity sources of NTA are very close in the learned fidelity manifold which indicates that there is a high correlation between the corresponding two data sources. However, both latent points of the LF sources in HOIP are far from the HF one so they both should be excluded due to their large global bias. Since excluding both LF sources would reduce the MF problem in HOIP to an SF one, we do not exclude the biased LF sources from HOIP to be able to compare the performance of MFBOUQ with MFBO.
A summary of the convergence history of NTA and HOIP is depicted in Fig. 5 by showing the best HF sample (y*) found by each method versus its accumulated sampling cost. The initialization is the same for all the MFBO methods and observing different starting points follows the same rationale mentioned for Fig. 3. In Fig. 5(a), the LF sources are globally correlated with the HF source and hence both MF methods perform better than SFBO by using inexpensive and informative LF data. Additionally, the higher prediction accuracy of the emulator of MFBOUQ results in a more efficient sampling and faster convergence of BO in MFBOUQ compared to MFBO. Regarding the spike in the convergence plot of MFBO in Fig. 5(a), we note that 18 repetitions converge at costs below 500. Consequently, the thick red line (which is the average across the 20 repetitions) becomes highly sensitive to the convergence values after cost exceeds 500 since it is an average of only two values. Specifically, in one of these two repetitions the best sample found is 237 for many iterations until the cost reaches 544 when MFBO suddenly converges to the ground truth (i.e., 255). This sudden convergence results in the spike in the corresponding history and, in turn, the average behavior captured by the thick red line.
The superiority of MFBOUQ is more obvious in HOIP (see Fig. 5(b)) which has two highly biased LF sources. In this example, MFBO expectedly converges to a sub-optimal compound since both LF sources are only locally correlated with the HF source. So, the AFs fail to sample valuable points to improve the optimization as they cannot find the region where the LF sources are beneficial and informative. Additionally, each data source is obtained from a distinct process so it suffers from different types and levels of noise. Therefore, estimating a single noise for all the data sources in MFBO reduces the emulation accuracy and further exacerbates the performance of AFs. MFBOUQ overcomes these issues by focusing more on UQ and estimating separate noise processes; resulting in a better performance compared to SFBO and especially to MFBO.
The 2D manifolds in Figs. 6 and 7 demonstrate the trajectory of BO in the categorical design space of each data source in NTA and HOIP, respectively. The top and bottom rows of these figures correspond to MFBO and MFBOUQ. In these manifolds, each latent point indicates a compound and is color-coded based on the ground truth response value (e.g., the bulk modulus in NTA) from each source. The marker shapes in these manifolds indicate whether a compound is part of the initial data, sampled during BO, or never seen by LMGP. As expected, most markers are triangles which indicates that most combinations are never tested by either MFBO or MFBOUQ. The red arrows next to the legend mark the response ranges in each data set which indicate that, unlike in Fig. 6 for NTA, the response ranges across the three sources are quite different in the HOIP problem.
To benefit any MFBO approach, LF sources should be sampled in two primary regions of their input space: (1) the region that contains their own optima since each data source is analyzed separately in the auxiliary optimization problems (see Sec. 2.4 for details), and (2) the region where the LF sources are correlated with the HF source. These two regions may overlap with each other (as is the case in NTA) or not (as is the case in HOIP or the 1D example in Fig. 1(b) where MFBOUQ only samples LF2 once when x < 5). We note that exploring the correlation region (if it exists!) is crucial for capturing the relationship between the LF and HF sources and as shown below the effectiveness of this exploration highly depends on the accuracy of the emulator in surrogating each source, estimating uncertainties, and identifying the correlation patterns among different data sources.
As shown in Fig. 6, for both MFBO and MFBOUQ, LMGP learns manifolds with very similar structures for the HF and LF data (this was expected per Fig. 4 which indicates that the two sources are highly correlated). For instance, for both LF and HF data, the optimum compound is located at the top-right corner of the manifold and their values are also quite close (255 for HF and 244 for LF). This similarity indicates that MFBO and MFBOUQ are both able to learn about the HF source by sampling the space of the LF source. However, this sampling is more effective in the case of MFBOUQ since its emulator quantifies the uncertainties more accurately. In particular, MFBOUQ correctly samples compounds from the LF source that are mostly encoded in the top-right corner of the manifold (see Fig. 6(d)) while MFBO tests compounds that explore the entire design space (see Fig. 6(b)).
As shown in Fig. 7, for any of the sources and with either MFBO or MFBOUQ, the compounds in the HOIP example are encoded by LMGP into two major clusters where the smaller one contains the optimum design. By examining these two clusters we observe that all the compounds in the smaller cluster have dimethylformamide (DMF) solvent. These observations are quite interesting in that they provide engineers with insights into the most important design variables that affect the materials properties (e.g., DMF solvent which decreases the binding energy in this example).
The initial HF dataset used in either MFBO or MFBOUQ (see Figs. 7(a) and 7(d)) is very small and does not have any compounds from the small cluster that contains the optimum. However, there are some initial samples from LF1 and LF2 in this cluster and so we should expect BO to leverage these samples (and the fact that they have some correlation with the unseen HF compounds) in emulating the HF source and sampling compounds from it that belong to the small cluster. While this expectation is met by MFBOUQ, MFBO fails to explore the (encoded) design space that contains the optimum HF sample. This failure is because (1) both LF sources (especially LF1) provide smaller binding energies than the HF source, and (2) the emulator of MFBO overestimates the uncertainties in LF sources. The combination of these two factors prevents MFBO from finding an HF sample that is valuable enough to be selected in Eq. (15). We refer readers to SI B.2 available in the Supplemental Materials for more analysis on the performance of MFBOUQ in these two examples.
4 Conclusion
In this paper, we develop a novel method to improve the performance of multi-fidelity cost-aware BO techniques. Our method enhances the accuracy and convergence rate of MFBO through two main contributions. First, we enable the emulator to estimate separate noise processes for each source of data. This feature increases the accuracy of the trained model since different data sources may exhibit different types and levels of noise. Second, we define a new objective function penalized by strictly proper scoring rules to (1) improve the prediction, (2) increase the focus on UQ, and (3) forgo the need to exclude highly biased data sources from BO. Our BO method, MFBOUQ, accommodates any number of data sources with any levels of noise, does not require any prior knowledge about the relative accuracy of (or relation between) these sources, and can handle both continuous and categorical variables. In this paper, we illustrate these features via both analytic and engineering problems.
In this work, we use two fixed AFs in each iteration. However, one can also customize the choice of AFs for different iterations using adaptive approaches. Additionally, the examples presented in this paper are limited to single-objective problems and we do not attempt to remove the effect of noise from the final solution (i.e., the best HF sample found is noisy). We intend to study these directions in our future works.
Footnotes
Log-Normal.
Log-Half-Horseshoe with zero lower bound and scale parameter 0.01.
Acknowledgment
We appreciate the support from the National Science Foundation (award number CMMI-2238038), the Early Career Faculty grant from NASA’s Space Technology Research Grants Program (award number 80NSSC21K1809), and the UC National Laboratory Fees Research Program of the University of California (Grant No. L22CR4520).
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.