
Abstract

When performing time-intensive optimization tasks, such as those in topology or shape optimization, researchers have turned to machine-learned inverse design (ID) methods—i.e., predicting the optimized geometry from input conditions—to replace or warm start traditional optimizers. Such methods are often optimized to reduce the mean squared error (MSE) or binary cross entropy between the output and a training dataset of optimized designs. While convenient, we show that this choice may be myopic. Specifically, we compare two methods of optimizing the hyperparameters of easily reproducible machine learning models—random forest, k-nearest neighbor, and deconvolutional neural network models—for predicting the optimal topologies of three topology optimization problems. We show that, both under direct inverse design and when warm starting further topology optimization, using MSE metrics to tune hyperparameters produces less performant models than directly evaluating the objective function, though both produce designs that are almost one order of magnitude better than using the common uniform initialization. We also illustrate how warm starting impacts the convergence time, the type of solutions obtained during optimization, and the final designs. Overall, our initial results portend that researchers may need to revisit common choices for evaluating ID methods that subtly trade off factors in how an ID method will actually be used. We hope our open-source dataset and evaluation environment will spur additional research in those directions.

1 Introduction

Design optimization, such as topology optimization (TO) or shape optimization, frequently requires expensive (in both time and computing resources) iterations to converge. For example, following the implementation of the governing equations and required parameters in a computational environment, TO problems typically require the iterative solving of these equations followed by updates to the problem conditions after each iteration. These computational expenses become significant, and sometimes prohibitive, in cases that require large numbers of calculations per iteration or large numbers of iterations, where computational resources are limited, or where a good solution is needed quickly. In response, researchers have tried to circumvent this iterative process via inverse design (ID)—training a machine learning (ML) model to directly output an optimal design for a new problem, given a dataset of past (typically expensive) physics-based optimizations [1,2]. In cases where such a dataset is available and one needs to evaluate many new input conditions or requirements quickly, ID methods can often provide significant time savings compared to optimizing a design for each bespoke input condition [1,2].

Inverse design models are typically trained to minimize the pointwise mean squared error (PMSE) of how well the ID model predicts the optimized geometry for the input condition. This standard choice results from formulating ID as a supervised learning problem—input conditions in and optimized designs out—and measuring the output's discrepancy with training or test samples. Researchers typically optimize any hyperparameters of such models in a similar fashion.

However, an ID method’s ultimate goal can differ from the above aim. Are we using the predictions to capture, as accurately as possible, the geometry or design itself? Or do we care just about outputting high-performance designs, irrespective of how closely they match the training set? More importantly, are we using the predicted designs as-is, or using them to accelerate further optimization (i.e., warm starting)? Does warm starting with ID methods actually help, and if so, how and when? Is mean squared error (MSE) always the best thing to optimize? This paper addresses some of these questions.

Specifically, we demonstrate, using different and easily reproducible machine learning models (random forests, k-nearest neighbors, and deconvolutional neural networks), how ID predictions impact the topology optimization problems considered in this paper.2 These problems include a classical structural compliance topology optimization problem [3] and both 2D and 3D conduction problems governed by the Poisson equation [4]. We examine multiple measures of ID performance, how the ID predictions modify the optimization process compared to a standard benchmark, and what effects, if any, altering the hyperparameter tuning method has on our results. The overall contributions of this paper are as follows:

  1. We formulate three inverse design problems: the design of two-dimensional and three-dimensional heat conduction components based on the problem described in Ref. [4] and shown in Fig. 1, and a classical cantilever beam problem described in Ref. [3] and shown in Fig. 2. This results in datasets and ID evaluation environments that we make available to the research community, along with performance diagnostics that shed light on how optimizers are affected by the warm start predictions provided by ID methods.

  2. We compare the performance of k-nearest neighbor, random forest, and deconvolutional neural network models on these inverse design problems across multiple metrics, including MSE and objective function value, both for the initial prediction and as a warm start to an adjoint optimizer. We provide both aggregated results (Figs. 3, 5, and 7) as well as illustrative examples (Fig. 9) that shed light on how adjoint optimizers adjust to warm starting by ID methods.

  3. We compare two methods for optimizing the hyperparameters of those ID models: one that minimizes the PMSE and another that minimizes the objective function value of the ID predictions at iteration 0, which we call the prediction objective function minimization method (POFMM). We show in Figs. 4, 6, and 8 and Tables 2–4 that models optimized with POFMM outperform those optimized using PMSE.

Fig. 1
The physical layout of the 2D and 3D TO problems that we will test ID methods on, adapted from Refs. [4,5]. Note that the bar refers to the value of the mass function at a given point.
Fig. 2
The physical layout of the topology optimization problem of a cantilever beam: (left) design domain of a cantilever beam with design parameters of force location (h) and direction (α) on the free side of the beam, (right) topology-optimized beam
Fig. 3
Median normalized optimization trajectories for the tested initialization techniques
Fig. 4
The normalized initial optimality gap of the PMSEM- and POFMM-optimized KNN, RF, and DeCNN models for the 2D heat conduction problem. The lines in the middle of each box plot represent the median, and the dashed line represents the optimal value achieved under uniform initialization (the control condition).
Fig. 5
Median normalized optimization trajectories for the tested initialization techniques
Fig. 6
The normalized initial optimality gap of the PMSEM- and POFMM-optimized KNN, RF, and DeCNN models for the 3D heat conduction problem. The lines in the middle of each box plot represent the median, and the dashed line represents the optimal value achieved under uniform initialization (the control condition).
Fig. 7
(Left) Median normalized optimization trajectories for the tested initialization techniques and (right) a detailed view of initial ID models' warm start trajectories
Fig. 8
The normalized initial optimality gap of the PMSEM- and POFMM-optimized KNN, RF, and DeCNN models for the 2D cantilever beam problem. The lines in the middle of each box plot represent the median, and the dashed line represents the optimal value achieved under uniform initialization (the control condition).
Fig. 9
The evolution of 2D heat conduction designs over the course of the optimization process. Here, the trajectory referenced as "KNN" refers to the trajectory initialized with the prediction of a KNN model optimized using PMSEM. The control trajectory is initialized with a constant distribution set to the volume limit. Note that the mass distributions shown here are plotted in the function form used by the optimizer, and hence display interpolation between points.

2 Background and Related Work

In this section, we provide background on ID problems in general, the specific ID methods we use, and the needed background on the physical problems the paper addresses.

2.1 Inverse Design.

Typical machine learning approaches to accelerating optimization or automating design under new input conditions (e.g., $c \in \mathbb{R}^D$, such as new boundary conditions, altered constraints, etc.) involve training a surrogate model that takes a set of design variables as input (e.g., $x \in \mathbb{R}^N$, such as an airfoil mesh) and produces an estimate of the physical behavior of a device as an output (e.g., $f(x|c)$, such as a resultant flow field). Optimization can then be accelerated by using the cheaper or faster surrogate to do gradient-based or gradient-free optimization. While widely used, these approaches can be expensive to train well on high-dimensional spaces, since they require learning a function that maps all inputs to (possibly multiple) objective functions (e.g., $\mathbb{R}^{N+D} \to \mathbb{R}$ in the case of a single objective).

In contrast, a different approach, which we refer to as ID throughout this paper, attempts to directly predict the optimal design (e.g., $x^*$, such as an optimized airfoil mesh), given target input conditions (e.g., $c$, such as a desired Reynolds or Mach number) [6]. This bypasses the typical need to learn or approximate an entire partial differential equation-based solution field or performance quantity, and instead focuses only on learning the mapping between the input conditions and the final optimized design (i.e., learning the mapping $\mathbb{R}^D \to \mathbb{R}^N$). These predicted designs can either be used in place of an existing optimizer, or they can warm start or accelerate an existing optimizer by providing a high-quality initial guess [6]. While it may seem at first glance to be more difficult to learn this function compared to traditional surrogate models, note that if one has an existing dataset of optimized designs along with their conditions (i.e., $\{x_i^*, c_i\}_{i=1}^{M}$) and if $D \ll N$, then the sample complexity needed to learn this mapping from $\mathbb{R}^D$ can be much smaller than that needed to cover the space of inputs in $\mathbb{R}^{N+D}$ required by surrogate models.

Currently, in the field of mechanical engineering, a large amount of work has been done investigating the utility of ID in efficiently designing and characterizing materials (particularly nanomaterials and metamaterials) and microstructures [1,6–9]. The area of design for electromagnetic wave manipulation (e.g., nanophotonics) is particularly active [10–14]. ID methods have also been applied to problems in areas including molecular discovery [15], additive manufacturing [16], airfoil design [6,17], and imaging [18]. Due to their typically greater sample efficiency relative to surrogate modeling techniques, ID methods have also been explored as a means of accelerating the optimization process, often by providing good initialization points [6,12,19–21].

Inverse design has emerged as a prominent approach to tackle the limitations associated with classical TO, such as costly iterations and susceptibility to local optima due to their gradient-based nature. Recent advancements in the field have showcased innovative methodologies combining optimization and machine learning techniques. For instance, Nie et al. introduced an end-to-end TO framework named TopologyGAN, which exploits various physical fields computed on the original, unoptimized material domain to predict optimal topology domains [22]. In parallel, Wang et al. demonstrated the use of U-net, achieving a substantial reduction in computation cost with minimal impact on the performance of design solutions [23]. Noteworthy contributions include the use of diffusion models for high-performance inverse design in TO problems [24,25]. Regenwetter et al. conducted a comprehensive review and analysis of deep generative machine learning models in engineering design, specifically focusing on TO problems, providing valuable insights for interested readers [26]. Additionally, our group conducted a study that quantitatively addresses the question of when it is worthwhile to incorporate inverse design compared to optimizing designs without machine learning assistance [27].

Beyond applications to inverse design, prior work across a range of fields has studied formal or automated approaches to model selection. This includes work in what is referred to as automated machine learning, for which Ref. [28] provides a recent overview relevant to engineering design. In addition, the surrogate modeling community has pursued methods for model selection in predicting the forward performance model of a design. This includes, as one exemplar, recent work in concurrent surrogate model selection [29], which uses criteria other than the mean squared error to enable more robust model selection. For example, using outlier-insensitive measures of location, such as the median or mode, produced more robust surrogate models than typical MSE-based measures. In contrast to past work that has focused primarily on automating model search for high-accuracy surrogate models, this paper focuses on predicting design variables directly (inverse design), and specifically on understanding the causal effect of optimizing design-centered error measures (MSE) versus optimizing performance measures (the optimality gap).

While researchers have studied inverse design using a wide variety of predictive models, in this paper we chose three main families of non-linear supervised learning approaches, ranging from fairly simple prototype-based methods, such as k-nearest neighbors (KNN), to ensembles, such as random forests (RF), and to adaptive basis function methods, such as deconvolutional neural networks (DeCNN). We chose these models since they provide a readily reproducible benchmark for future research in this area while allowing us to rigorously study the fundamental questions of interest regarding the effects of ID metrics and warm start behavior. We also attempted kernel-based methods, such as Gaussian processes, but these methods could not ultimately scale to the larger problems we show later in the paper, and thus we exclude them here. With this in mind, we now provide some brief background on KNN, RF, and DeCNN models, all of which have been applied to inverse design problems, and in particular to topology optimization problems [30–32]. While length constraints prevent us from providing a thorough description of each algorithm and its myriad variants, we provide pointers to additional literature for interested readers who wish to learn more about them.

2.1.1 K-Nearest Neighbors.

K-nearest neighbors is an algorithm that classifies (or, in the case of regression problems, assigns a value to) an unknown data point by combining (typically through averaging) the values of the training points nearest to it. The Euclidean distance is often used to assess proximity between points, although other distance metrics are frequently used. The selection of the number of neighbors (k) can have a large effect on model performance [33]. It is one of the simplest non-linear ML methods, which makes it an appropriate baseline benchmark, even if it can struggle to extrapolate well beyond nearby training data. Readers interested in further details are directed to Refs. [34,35].
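As a concrete illustration, the following minimal sketch shows how an off-the-shelf KNN regressor can act as an ID model mapping input conditions to flattened topologies. The placeholder arrays and the hyperparameter values here are illustrative assumptions, not the tuned values reported later in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Placeholder data standing in for the datasets of Sec. 3: 361 optimized
# topologies (flattened 71x71 grids) indexed by two input conditions.
rng = np.random.default_rng(0)
conditions_train = rng.uniform(size=(361, 2))     # (volume fraction, adiabatic length)
designs_train = rng.uniform(size=(361, 71 * 71))  # flattened mass distributions

# k and the weighting scheme are the KNN hyperparameters tuned in Sec. 3.3.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(conditions_train, designs_train)

# Predict a flattened mass distribution for one new input condition.
design_pred = knn.predict([[0.45, 0.5]])  # shape (1, 5041)
```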

2.1.2 Random Forests.

The random forest technique is an ensemble method that employs several decision trees to classify (or, in the case of regression, assign a value to) a data point [36]. This method generates multiple decision trees by selecting random features to use for each tree and then constructing the decision tree by computing the optimal “split point” via the Gini-index cost [33]. After constructing several trees, the “forest” makes a prediction by aggregating decisions across those trees. It is one of the most widely used and implemented ensemble methods, and interested readers can learn more in the original paper [36], or via Ref. [34] or Ref. [35].
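A matching sketch for the RF baseline is below, reusing the placeholder arrays from the KNN sketch above. It exposes the two hyperparameters tuned later in Sec. 3.3 (number of estimators and minimum samples per leaf); the specific values here are illustrative.

```python
from sklearn.ensemble import RandomForestRegressor

# n_estimators and min_samples_leaf are the two RF hyperparameters tuned
# via Bayesian optimization in Sec. 3.3; these values are placeholders.
rf = RandomForestRegressor(n_estimators=25, min_samples_leaf=2, random_state=0)
rf.fit(conditions_train, designs_train)
design_pred = rf.predict([[0.45, 0.5]])
```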

2.1.3 Deconvolutional Neural Network.

Deconvolutional neural networks, also known as transposed convolutional neural networks, have emerged as a powerful tool for inverse design, particularly in topology and shape optimization problems [27,37]. These models exhibit a structure similar to that of CNNs, but with reversed operations. In this paper, we explore the potential and limitations of DeCNNs in learning complex patterns and achieving superior performance compared to simpler models. The concept of DeCNNs was first introduced by Zeiler et al., who proposed an innovative approach to learning feature maps in a reverse manner [38]. Their work laid the foundation for the application of DeCNNs to tasks such as image reconstruction and visualization [39,40]. DeCNNs can represent a wide range of functions, allowing them to learn complex patterns and achieve lower test mean squared errors than simpler models. However, DeCNNs have certain inherent limitations, including numerous hyperparameters to optimize and a need for substantial amounts of training data to attain desirable results, which may not make them the most cost-effective option [27].

2.2 Background on Chosen Topology Optimization Problems.

One type of problem that lends itself to inverse design approaches is topology optimization. In topology optimization, one seeks to determine the distribution of material in a space that best satisfies certain performance criteria [41].

A particular subset of topology optimization problems are those in which an optimal distribution of material is sought to facilitate the transfer of heat through a domain while satisfying certain criteria. In more sophisticated cases, this can encompass the optimization of heat exchanger geometry subject to convective heat transfer in turbulent flow [42]. The first two problems we consider are the topology optimization of 2D and 3D heat sinks subject to pure conduction, derived from a demonstration provided in Ref. [4]. In this problem, we seek to minimize the thermal compliance ($C_T$)
$$C_T = \int_{\Omega} f\,T\,\mathrm{d}x + \alpha \int_{\Omega} \nabla a \cdot \nabla a\,\mathrm{d}x \qquad (1)$$
wherein Ω delineates the spatial domain under consideration (the unit square in the 2D case and the unit cube in the 3D case), f is the heat source term indicating the magnitude and distribution of thermal input within the defined domain, T is the temperature, and α is a regularization weight, introducing a controlled constraint to the optimization process and influencing the tradeoff between the terms in the objective function. Here, a(x) is a spatially varying mass distribution function that varies continuously between 0 and 1, where a(x) = 1 represents regions of highly thermally conductive material, a(x) = 0 represents regions of low thermal conductivity, and intermediate values transition non-linearly but continuously between those two conductivities [4]. This non-linear continuous transition between conductivity values is based on the solid isotropic material with penalization (SIMP) thermal conductivity rule, as expressed by the function
$$k(a) = \epsilon + (1 - \epsilon)\,a^{p} \qquad (2)$$
Here k(a) represents the conductivity, a is the spatially varying mass distribution function, ε is a small constant, and p is the penalization parameter. The transition between highly and lowly thermally conductive material is achieved through a power-law relationship involving a raised to the power p. The choice of the penalization parameter p influences the smoothness and rate of the transition. The physical interpretation of this heat sink problem is to determine the optimal material distribution a that minimizes the integral of the temperature while considering limitations on the amount of highly conducting material. This problem is subject to the Poisson equation with mixed Dirichlet–Neumann conditions
$$-\nabla \cdot \left(k(a)\,\nabla T\right) = f \quad \text{in } \Omega \qquad (3)$$
$$T = 0 \quad \text{on } \Gamma_D \qquad (4)$$
$$k(a)\,\nabla T \cdot n = 0 \quad \text{on } \Gamma_N \qquad (5)$$
Here T represents the temperature, k(a) is the conductivity, f is the volumetric heat source, and n is an outward unit normal vector. The heat flux is defined as $q = -k(a)\nabla T$. The physical interpretation of the boundary conditions is that the temperature on $\Gamma_D$ is fixed at T = 0, and the boundary $\Gamma_N$ is insulated (adiabatic). This problem is also subject to the constraints $a(x) \in [0,1]$ and a volume constraint V on the highly thermally conductive material such that
$$\int_{\Omega} a(x)\,\mathrm{d}x \le V \qquad (6)$$
for the entirety of Ω [4]. These problems are described in greater detail in Example 1 of Ref. [5]. Following the example in Ref. [4], we employ an interior point optimization method described in Ref. [43] to generate data for model training and evaluation purposes.
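For reference, the SIMP rule of Eq. (2) is straightforward to implement. The sketch below uses illustrative values of ε and p, which are our assumptions rather than the exact constants used in Ref. [4].

```python
import numpy as np

def simp_conductivity(a, eps=1e-3, p=3.0):
    # Eq. (2): k(a) = eps + (1 - eps) * a**p. The default eps and p are
    # illustrative assumptions, not necessarily the values used in Ref. [4].
    a = np.clip(np.asarray(a, dtype=float), 0.0, 1.0)
    return eps + (1.0 - eps) * a ** p

print(simp_conductivity([0.0, 0.5, 1.0]))  # low, intermediate, high conductivity
```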
The last problem we consider is the cantilever beam problem, a classical problem in structural topology optimization, shown in Fig. 2. The objective of this problem is to find the distribution of material within the beam that minimizes the structural compliance, subject to a constraint on the total amount of material [44]. The specific minimization problem is
$$\begin{aligned} \min_{x}\quad & c(x) = U(x)^{T} K(x)\,U(x) \\ \text{s.t.}\quad & V(x)/V_0 = f \\ & K(x)\,U(x) = F \\ & 0 < x_{\min} \le x \le 1 \end{aligned} \qquad (7)$$
where c(x) is the structural compliance, U(x) is the vector of nodal displacements, F is the vector of applied loads, and K(x) is the stiffness matrix. V(x) and V0 are the material volume and design domain volume, respectively, f is the prescribed volume fraction, and x_min is a vector of minimum relative densities that prevents the singularity condition.

3 Methodology

To address the contributions mentioned in the Introduction, our methodology has the following main steps: (1) defining the heat conduction and cantilever beam topology optimization problems and generating the corresponding datasets, (2) training and optimizing our specific ID methods, and (3) measuring and evaluating the results from the ID methods.

3.1 Dataset Creation Via 2D and 3D Heat Conduction Topology Optimization.

To create a dataset of realistic yet manageable benchmark problems for our iterative design experiments, we built upon a classical thermal compliance example with Poisson equation constraints from Ref. [5], which we further describe in the following.

The optimization problems minimize the thermal compliance of given geometries while adhering to constraints on the volume of highly conductive material used and the presence of an adiabatic region. The adiabatic region refers to a specified length on the bottom side of the 2D problem space or a prescribed asymmetric area on the bottom surface of the 3D problem space (for further details, refer to the above background section and Fig. 1).

To generate our dataset for the 2D/3D problem set, we explored how the optimal design changed as a function of two input parameters: the upper limit of material volume within the unit square/cube (referred to as the volume fraction) and the length/area of the adiabatic region. We selected values for the volume limit ranging from 0.3 to 0.6, as our chosen interior point solver (IPOPT) produced reliable results within that range, and values within that range exhibited sufficient topological variability. As for the adiabatic region length/area, we chose values between 0 (representing the absence of an adiabatic region) and 1 (corresponding to an entire side of the unit square/cube being adiabatic). The adiabatic region for the 3D problem is defined as a square area on the bottom side of the cube with a symmetric distance from the edges. We divided each design input range into 20 segments, resulting in 21 values of interest for each parameter. Consequently, the volume bounds were sampled at 0.3, 0.315, 0.33, …, 0.57, 0.585, and 0.6, while the lengths/areas were sampled at 0.0, 0.05, 0.1, …, 0.9, 0.95, and 1.0. By combining each volume limit value with each adiabatic region length/area value, we generated optimized topologies using the interior point solver, iterating until convergence.
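The following sketch reproduces this design-of-experiments grid; the variable names are ours, but the values follow directly from the sampling described above.

```python
import numpy as np
from itertools import product

# 21 evenly spaced values per input, as described above.
volume_fractions = np.linspace(0.3, 0.6, 21)  # 0.3, 0.315, ..., 0.6
adiabatic_sizes = np.linspace(0.0, 1.0, 21)   # 0.0, 0.05, ..., 1.0

# 441 (volume limit, adiabatic length/area) pairs, each passed to the
# interior point solver to produce one optimized topology.
conditions = list(product(volume_fractions, adiabatic_sizes))
print(len(conditions))  # 441
```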

For the 2D problem, we employed a 70×70 mesh, while for the 3D problem, we used a 50×50×50 mesh as the design domain. These mesh sizes were chosen to provide fine details within the design space, typical of topologically optimized solutions to these problems (as shown in Fig. 1), without unnecessarily increasing the computational running time for the solver.

For each combination of design parameters, we conducted 100 iterations of optimization, with IPOPT terminating when a tolerance of 1.0×10⁻¹⁰⁰ was satisfied. To perform finite element optimization, we used Dolfin-Adjoint [45,46] in conjunction with IPOPT [43].

Upon completion of the optimization run for a particular set of parameters, the resulting distribution was discretized to transform the data into a format suitable for the regression models used in this study. To achieve this, we divided the unit square into a 70×70 grid, extracting the mass function value at each intersection point. This process yielded a total of 71×71=5041 data points per topology in the 2D case. Similarly, for the 3D case, we divided the unit cube into a 50×50×50 grid, capturing the mass function value at each intersection point. This grid-based approach enabled us to capture all the pertinent information present in the output of the optimizer, resulting in a dataset of 51×51×51=132,651 data points per topology.

3.2 Dataset Creation Via Cantilever Beam Topology Optimization.

To test our claim across multiple benchmark problems, we also created a dataset via a classical structural compliance topology optimization problem using the code provided by Andreassen et al. [3], which we detail in the following.

This optimization problem minimizes the structural compliance of a cantilever beam (a rectangular domain with a 2:1 aspect ratio) while satisfying constraints on the volume of material used and the boundary conditions given by the force location and direction. We used a material volume fraction constraint of 0.3 and a force magnitude of 5000 N. The physical layout of this problem is shown in Fig. 2. The parameters h and α represent the design parameters, namely the force location on the free side and the force direction (for further details, refer to the background section and Fig. 2).

To generate the dataset of optimized topologies needed for our experiment, we explored an input space comprising the force location and direction. We selected values for the force location ranging from 0 to 1 and for the force direction ranging from −π/2 to π/2. We divided each parameter range into 20 segments, resulting in 21 values for each parameter. Specifically, the force locations were sampled at 0.0, 0.05, 0.1, …, 0.9, 0.95, and 1.0, while the force directions were sampled at −π/2, −45π/100, −40π/100, …, 40π/100, 45π/100, and π/2. By combining each force location value with each direction value, we generated optimized topologies by running the solver for 1000 iterations or until the optimizer satisfied a tolerance of 0.01 on the displacement.

Upon completion of the optimization run for a particular set of parameters, the resulting distribution was discretized to transform the data into a format suitable for the regression models used in this study. For this problem, we employed an 80×40 mesh as the design domain. This mesh size was chosen to provide fine detail within the design space without unnecessarily increasing the computational running time of the solver. Table 1 summarizes the TO problems studied.

Table 1

Summary of TO problem information

Problem name         Inverse design inputs   Design input range   Mesh size
2D heat conduction   Volume fraction         0.3–0.6              70×70
                     Adiabatic length        0–1.0
3D heat conduction   Volume fraction         0.3–0.6              50×50×50
                     Adiabatic area          0–1.0
2D cantilever beam   Force location          0.0–1.0              80×40
                     Force direction         −π/2 to π/2

3.3 Model Training and Cross-Validation.

Since we discretized each possible input into 21 values, we had to assign some subset of values for training, validation, and test data. To do this, we randomly selected two of the 21 values of each input for each problem to hold out for either validation or testing; the remaining values were used for training. From these two held-out values per input, we generated all possible combinations of input values involving a held-out value and split these combinations 50/50 between validation and testing. This generates mutually exclusive training, validation, and testing datasets. The validation set is specifically used for hyperparameter selection during the training phase. The testing set, on the other hand, is entirely excluded from training and hyperparameter selection, and is used only to evaluate the final, trained models' performance. Since all three problems have two input parameters, this corresponds to training, validation, and testing set sizes of 361, 40, and 40, respectively.
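The sketch below illustrates this splitting scheme. The random seed and the example value ranges are our assumptions; the counts at the end match those reported above.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)  # the seed is an assumption
v1 = np.linspace(0.3, 0.6, 21)  # e.g., volume fraction values
v2 = np.linspace(0.0, 1.0, 21)  # e.g., adiabatic length values
held1 = rng.choice(v1, size=2, replace=False)  # two held-out values per input
held2 = rng.choice(v2, size=2, replace=False)

train, held_out = [], []
for a, b in product(v1, v2):
    # Any combination touching a held-out value goes to validation/testing.
    (held_out if (a in held1 or b in held2) else train).append((a, b))

rng.shuffle(held_out)  # split the held-out combinations 50/50
val, test = held_out[:len(held_out) // 2], held_out[len(held_out) // 2:]
print(len(train), len(val), len(test))  # 361 40 40
```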

To implement our models, we used the KNN and RF implementations available in the Scikit-Learn library [47]. For the KNN model, hyperparameter optimization involved finding the optimal settings for both the weighting scheme and the number of neighbors. Similarly, for the RF model, we optimized the number of estimators and the minimum number of samples required in newly created leaves. To implement the DeCNN model, we used TensorFlow [48]; its hyperparameters were the learning rate and the batch size for each problem. Throughout this work, we used a DeCNN architecture that begins with four dense layers expanding in capacity to 32, 128, 1024, and 4096 nodes, respectively, each followed by batch normalization and a leaky ReLU activation function (with α=0.2). These are followed by five transposed convolutional layers with filter counts decreasing from 128 through 64, 32, and 16 down to 1. A consistent kernel size of (4, 4) and strides of (2, 2) are used in each transposed convolutional layer to upsample the feature map toward the desired output channels. Finally, two fully connected layers with batch normalization and a sigmoid activation function resize the output to the desired resolution.
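A minimal Keras sketch of this architecture for the 2D heat conduction output (a flattened 71×71 grid) follows. The reshape seed shape of (4, 4, 256) and the width of the penultimate dense layer are our assumptions, since the text does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_decnn(n_inputs=2, side=71):
    inp = tf.keras.Input(shape=(n_inputs,))
    x = inp
    # Four dense layers expanding to 32, 128, 1024, and 4096 nodes, each
    # followed by batch normalization and LeakyReLU (alpha = 0.2).
    for units in (32, 128, 1024, 4096):
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    # Reshape to a spatial feature map before upsampling (assumed shape).
    x = layers.Reshape((4, 4, 256))(x)
    # Five transposed convolutions with filters 128 -> 64 -> 32 -> 16 -> 1,
    # kernel (4, 4) and strides (2, 2), upsampling 4x4 to 128x128.
    for filters in (128, 64, 32, 16, 1):
        x = layers.Conv2DTranspose(filters, (4, 4), strides=(2, 2), padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    # Fully connected layers with a sigmoid to resize to the target grid.
    x = layers.Flatten()(x)
    x = layers.Dense(1024)(x)
    x = layers.BatchNormalization()(x)
    out = layers.Dense(side * side, activation="sigmoid")(x)
    return tf.keras.Model(inp, out)

model = build_decnn()
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
```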

To perform the hyperparameter optimization, we employed the Bayesian optimization package in Ref. [49], which approximates the objective function using a Gaussian process. We used gp_hedge as the acquisition function, which probabilistically chooses among the lower confidence bound, negative expected improvement, and negative probability of improvement acquisition functions at every iteration. We ran each hyperparameter Bayesian optimization until the acquisition function converged. Our aim was to cover all plausible settings by carefully selecting the hyperparameter ranges for Bayesian optimization. For the KNN model, we considered the number of neighbors, ranging from 1 to 100 as integer values, and the weighting scheme, which could be either "uniform" or "distance". Similarly, for the RF models, we considered the number of estimators and the minimum samples per leaf, both ranging from 1 to 30 as integer values. In the case of the DeCNN model, we explored the learning rate within the range of 0 to 0.1, and the batch size as an integer between 1 and 362 (except for the 3D heat conduction problem, where the batch size was limited to between 1 and 100 due to computational power limitations).
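The following sketch shows the search loop, assuming the package of Ref. [49] is scikit-optimize, whose gp_minimize exposes the gp_hedge acquisition strategy described above. Here validation_score is a hypothetical stand-in for fitting a model and returning either the PMSE (Sec. 3.3.1) or the iteration-0 objective value (Sec. 3.3.2) on the validation set.

```python
from skopt import gp_minimize
from skopt.space import Integer, Categorical

# Example search space for the KNN model, as described above.
space = [Integer(1, 100, name="n_neighbors"),
         Categorical(["uniform", "distance"], name="weights")]

def validation_score(n_neighbors, weights):
    # Hypothetical placeholder: fit a model with these hyperparameters and
    # return the chosen validation metric (PMSE or iteration-0 objective).
    return float(n_neighbors)  # illustrative only

def objective(params):
    n_neighbors, weights = params
    return validation_score(n_neighbors, weights)

result = gp_minimize(objective, space, acq_func="gp_hedge", n_calls=50)
print(result.x, result.fun)  # best hyperparameters and best score
```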

To select the final optimal hyperparameters for each KNN, RF, or DeCNN model, we had to select which metric to optimize over the cross-validation cases. Herein lies a major difference between the standard way of selecting ID models—picking the model that minimizes the pointwise MSE—and one that evaluates the model directly on the objective function of interest—what we refer to later in the paper as prediction objective function minimization. We describe each approach in turn, and then show in the results section how they impact ID performance. In addition to the MSE, we also separately tested KNN, RF, and DeCNN models optimized using the log-loss (i.e., binary cross entropy), but its results were similar to those of the MSE, and thus we do not include them in the paper for space reasons.

3.3.1 Hyperparameter Optimization: PMSEM.

Our pointwise mean squared error method (PMSEM) for model performance evaluation uses the mean squared error between a model's predictions and the corresponding points of the topologies in the validation set.

Mathematically, PMSEM in a given trial can be described as a minimization of the PMSE error measure E with respect to the model hyperparameters H, where E is defined as
$$E = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(P_{ij} - R_{ij}\right)^{2} \qquad (8)$$
wherein M is the number of topologies used for validation, N is the number of points per topology, i indexes the topology, j indexes the point within a given topology, P_ij is the predicted value of the mass function at that point, and R_ij is the corresponding ground-truth value from the validation set.

Physically, PMSEM compares the similarity of the mass distributions predicted by a model to the ground truth distributions.
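In code, Eq. (8) reduces to a single NumPy expression; the function name is ours.

```python
import numpy as np

def pmse(pred, real):
    # pred, real: (M, N) arrays of predicted and ground-truth mass-function
    # values over the validation set; implements Eq. (8).
    return np.mean((np.asarray(pred) - np.asarray(real)) ** 2)
```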

3.3.2 Hyperparameter Optimization: POFMM.

In contrast to PMSEM, our POFMM takes an alternative approach to model hyperparameter optimization. Since one intention of using a model in an inverse design problem is to produce a prediction that is as close to an optimal design as possible, and is therefore a good initialization point for further iterations, it is desirable to find hyperparameter values that enable the model to yield predictions with good objective function values at iteration 0. In our problems, this means minimizing the objective function value of the model's predictions at iteration 0 (thermal compliance for the 2D and 3D heat conduction problems and structural compliance for the cantilever beam). Rather than comparing the predictions to the corresponding mass distributions in the validation set, the model hyperparameters H are optimized solely with respect to this objective function value F. Note that, unlike with PMSE, a model optimized in this way may not produce designs that are as close in the design space (i.e., have the same geometry) to the training set, yet should in principle still produce results with high performance. In practice, because the primary parameters of the model are still trained via MSE, these differences only affect model choice at the hyperparameter level.
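A sketch of the POFMM criterion follows. Here evaluate_objective is a hypothetical user-supplied callback that runs the finite element evaluation of Eq. (1) (or Eq. (7)) on a predicted design at iteration 0; it stands in for the solver calls described above.

```python
import numpy as np

def pofmm_score(model, val_conditions, evaluate_objective):
    # evaluate_objective(design, condition) -> objective value at iteration 0
    # (hypothetical; in practice this wraps the FEM/adjoint evaluation).
    preds = model.predict(val_conditions)
    return float(np.mean([evaluate_objective(p, c)
                          for p, c in zip(preds, val_conditions)]))
```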

3.4 Evaluating the Model Predictions for Warm Start Optimization.

For a given hyperparameter setting, we can now train the corresponding models via the train/validation/test split scheme described above. We then use the optimized KNN, RF, and DeCNN models to generate predictions for each tested combination of design inputs.

We initialized the solvers (IPOPT for the 2D and 3D conduction problems, and CVXOPT for the 2D structural problem) with the corresponding predicted designs from the KNN, RF, and DeCNN models. The IPOPT optimizer exited each run when it reached a tolerance of 1.0×10⁻¹⁰⁰ or 100 iterations. The CVXOPT solver exited each run when it reached 1000 iterations or a displacement tolerance of less than 0.01.

As a control condition for the 2D and 3D heat conduction problems, we used a uniform initialization with a constant mass distribution equal to the volume fraction, since this is the most common initialization for SIMP-based density TO methods. Similarly, for the cantilever beam topology optimization problem, we used a constant uniform distribution equal to the volume fraction (0.3).

3.4.1 Data Post-Processing.

Following the conclusion of the optimized model evaluation process, the thermal and structural compliance trajectory results were normalized, and their median values calculated. This was done to prevent any individual combination of design inputs from having a disproportionately large or small effect on the compliance trajectory. Specifically, the post-processing procedure is as follows (a code sketch follows the list):

  1. Normalize each value in each objective function trajectory with respect to the optimal (minimum) value obtained in said trajectory.

  2. Find the median of these normalized values at each iteration number for the PMSEM-optimized KNN models, the PMSEM-optimized RF models, PMSEM-optimized DeCNN models, POFMM-optimized KNN models, the POFMM-optimized RF models, and the POFMM-optimized DeCNN models. We chose to report the median value for the optimization trajectories since it is less sensitive to outliers.

  3. Render all trajectories uniform in the number of iterations considered by extending runs that terminate before the maximum number of iterations attained among any runs. This extension is achieved by conservatively extrapolating the final value reached in each run over the remaining iterations.
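A minimal sketch of this procedure, assuming each trajectory is a 1D sequence of objective values per iteration; the function name is ours.

```python
import numpy as np

def median_trajectory(trajectories):
    # Step 1: normalize each trajectory by its own optimal (minimum) value.
    normed = [np.asarray(t, dtype=float) / np.min(t) for t in trajectories]
    # Step 3: extend shorter runs by holding their final value constant so
    # that all runs span the same number of iterations.
    n_max = max(len(t) for t in normed)
    padded = np.stack([np.pad(t, (0, n_max - len(t)), mode="edge")
                       for t in normed])
    # Step 2: take the median across runs at each iteration (outlier-robust).
    return np.median(padded, axis=0)

# Example: three runs of different lengths.
print(median_trajectory([[4.0, 2.0, 1.0], [6.0, 3.0, 1.5, 1.0], [5.0, 1.0]]))
```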

4 Results

Following the above methodology, this section first reviews the optimal models that we found for our specific inverse design problems, then presents the main quantitative results on the impact of the initialization methods on ID performance. Following these, we provide qualitative comparisons of the final designs produced under each method, and an example trajectory that helps shed light on how the ID method influences the warm start behavior of further topology optimization.

Note that in all subsequent plots, shaded regions depict the 95% empirical confidence intervals of the median for their corresponding plotted functions. For example, in the case of Fig. 3, the shading represents the 95% bootstrapped empirical confidence interval on the median at each iteration for each combination of model type and hyperparameter optimization method (with 100 bootstrap resamples).

4.1 Optimal Model Hyperparameters.

We employed Bayesian optimization to tune the hyperparameters under both the PMSEM and POFMM criteria. The resulting optimal hyperparameters are reported in the following paragraphs.

4.1.1 PMSEM.

Using PMSEM for the 2D heat conduction problem, we discovered that the KNN models performed best with 31 neighbors and a "distance" weighting scheme. Similarly, the RF models performed best with 23 estimators and a minimum of 22 samples per leaf. Lastly, the DeCNN performed best with a learning rate of 1×10⁻⁶ and a batch size of 350.

For the 3D heat conduction problem, the KNN models performed best with two neighbors and a "uniform" weighting approach. The RF models performed best with 23 estimators and a minimum of two samples per leaf. Lastly, the DeCNN performed best with a learning rate of 1×10⁻⁴ and a batch size of 100.

For the 2D cantilever beam problem, the KNN models performed best with two neighbors and a "distance" weighting strategy. The RF models performed best with 23 estimators and a minimum of 13 samples per leaf. Lastly, the DeCNN performed best with a learning rate of 1×10⁻⁴ and a batch size of 361.

4.1.2 POFMM.

Using the POFMM metric for the 2D heat conduction problem, we discovered that the KNN models performed best with one neighbor and a "distance" weighting scheme. Similarly, the RF models showed the best results with 30 estimators and a minimum of one sample per leaf. Furthermore, the DeCNN achieved its best performance with a learning rate of 0.0523 and a batch size of 114.

For the 3D heat conduction problem, the KNN models achieved the best outcomes with one neighbor and a “uniform” weighting approach. Similarly, the RF models displayed optimal performance with one estimator and a minimum of one sample per leaf. Additionally, for the DeCNN, we found that a learning rate of 0.0158 and a batch size of 30 yielded the most favorable results.

For the 2D cantilever beam problem, the KNN models performed best with one neighbor and a "uniform" weighting strategy. Likewise, the RF models demonstrated their highest performance with 27 estimators and a minimum of one sample per leaf. Moreover, the DeCNN performed best with a learning rate of 1×10⁻⁴ and a batch size of 150.

4.2 Impact of Different Initialization Methods on Prediction Performance and Trajectory Acceleration for 2D Heat Conduction Problem.

Using these optimized models, we can now compare how they perform at both predicting the optimal geometry as well as how they act as a warm start to further topology optimization, compared to a control (uniform initialization).

We found that, on average, all model types tested with either hyperparameter optimization method (POFMM or PMSEM) produced predictions with thermal compliance values significantly lower than that of the control (Fig. 3). We also found that initializing the IPOPT optimization process with these predictions, on average, offered an acceleration at low evaluation counts, despite the fact that the optimizer increases the thermal compliance in the early iterations of warm starting (Fig. 3)—we show why this occurs later in the paper. Beyond around 20 iterations, the control begins to reach performance comparable to that of the warm-started optimizers.

Specifically, the median prediction of the KNN models optimized using POFMM came, at iteration 0, within 12.3% of the minimum thermal compliance reached in the corresponding control run, whereas the median prediction of the KNN models optimized using PMSEM came within 76.9%. For the RF models, these values were 3.6% and 57.8% using POFMM and PMSEM, respectively. Furthermore, for the DeCNN models, these values were 5.8% and 54.2% using POFMM and PMSEM, respectively. For comparison, a constant mass distribution set to the volume limit (the control for this experiment) had a median gap of 602% (Table 2), almost an order of magnitude larger than that of the ID methods.

Table 2

Median normalized thermal compliance (MNTC) at the zeroth iteration for tested initialization schemes (2D heat conduction problem)

Model type     MNTC at iteration 0
KNN PMSEM      1.769
RF PMSEM       1.578
DeCNN PMSEM    1.542
KNN POFMM      1.123
RF POFMM       1.036
DeCNN POFMM    1.058
Control        7.020

To further illustrate the impact of using ML models for predicting the final design, we have depicted the relationship between the model used and the normalized initial optimality gap (IOG) in Fig. 4. The normalized initial optimality gap measures how closely the initial prediction of the ID model matches the performance of the control (TO) solution before undergoing further optimization.

4.3 Impact of Different Initialization Methods on Prediction Performance and Trajectory Acceleration for 3D Heat Conduction Problem.

As above, for the 3D heat conduction problem, we evaluated how each model's warm start performance was affected by the choice of hyperparameter optimization metric. As before, all models consistently yielded designs with lower final thermal compliance values compared to those obtained through the control method (uniform initialization), regardless of how we optimized the hyperparameters (refer to Fig. 5).

To compare the models in terms of their initial design predictions (prior to warm starting), Fig. 6 plots how the normalized IOG changes across each model and hyperparameter optimization strategy. Our findings reveal that the median predictions of the KNN models optimized using POFMM and PMSEM share the same hyperparameters, and thus both exceed the minimum thermal compliance achieved in the corresponding control run by 138.6% at iteration 0. Similarly, the RF models exceed it by 185.4% and 185.6% using POFMM and PMSEM, respectively. The DeCNN models demonstrate improved performance, exceeding the minimum by 60.6% and 107.3% using POFMM and PMSEM, respectively. In comparison, the control method, represented by a constant mass distribution, exhibits a significantly higher median gap of 690.5% (Table 3), with a normalized compliance nearly five times larger than that of the best ML-based method.

Table 3

MNTC at the zeroth iteration for tested initialization schemes (3D heat conduction problem)

Model type        MNTC at iteration 0
KNN PMSEM-POFMM   2.386
RF PMSEM          2.856
DeCNN PMSEM       2.073
RF POFMM          2.854
DeCNN POFMM       1.606
Control           7.905

After warm starting, we can also observe that hyperparameters optimized through the POFMM method tend to yield lower thermal compliance values (e.g., see the DeCNN POFMM model).

4.4 Impact of Different Initialization Methods on Prediction Performance and Trajectory Acceleration for the Cantilever Beam Problem.

For the cantilever beam problem, we again see in Figs. 7 and 8 that, irrespective of whether the POFMM or PMSEM hyperparameter optimization method was employed, the ML models consistently yield initial structural compliance values significantly lower than those obtained through the control method (refer to Fig. 7). We show in Fig. 8 that the median prediction of the KNN models optimized using POFMM came within only 8.4% of the minimum structural compliance achieved in the corresponding control run at the initial iteration. In contrast, the KNN models optimized using PMSEM came within 52.2%. Similarly, the RF models achieve 11.4% and 56.3% with POFMM and PMSEM optimization, respectively. Lastly, the DeCNN models achieve 22.8% and 33.6% using the POFMM and PMSEM approaches, respectively. In contrast, a constant mass distribution, which served as the control for this experiment, exhibited a median gap of roughly 3200%, nearly two orders of magnitude larger than that of the ID methods (as shown in Table 4).

Table 4

Median normalized structural compliance at the zeroth iteration for tested initialization schemes (2D cantilever beam problem)

Model type     Normalized compliance at iteration 0
KNN PMSEM      1.522
RF PMSEM       1.563
DeCNN PMSEM    1.336
KNN POFMM      1.084
RF POFMM       1.114
DeCNN POFMM    1.228
Control        33.878

When we use the ML predictions to warm start the optimization, the ID models notably accelerate convergence, particularly during the early iterations.

4.5 Why Does Thermal Compliance Increase After Warmstarting?.

A typical normalized optimization trajectory for the warm-started optimizations of the 2D heat sink problem is shown in Fig. 9. In particular, Fig. 9 displays the trajectory taken by the optimization of a mass distribution subject to a volume limit of 0.36, an adiabatic region length of 0.45, and an initialization produced by a KNN model optimized using PMSEM.

In this case, for the KNN-initialized trajectory, the increase in thermal compliance that peaks at iteration 4 can be attributed to a reduction in the specificity of the predicted material distribution. This reduction occurs because the optimizer, at and around this iteration, tends to adjust the values of a toward intermediate levels, deviating from the strict bounds of the SIMP formulation in Eq. (2). In other words, the optimizer is subtracting some of the material predicted by the model.

In contrast, the control trajectory is significantly smoother and lacks this degree of loss of definition in its mass distribution. In another study, we compared the IPOPT and SciPy solvers, the latter utilizing the sequential least squares programming method [27]. The outcomes of this comparison can be found in the Supplemental Materials on the ASME Digital Collection. We observed that the SciPy solver exhibited a gradual and monotonic decrease in compliance. In contrast, the warm start optimization with IPOPT displayed an intriguing behavior, initially showing an increase in compliance. We hypothesize that IPOPT's solution method, which involves numerical approximation of the Hessian matrix during the initial iterations of the solver, may be responsible for this behavior. It is plausible that these early sub-optimal steps taken by IPOPT, stemming from the approximation process, contribute to the initial "bump" witnessed in the convergence trajectory.

4.6 Impact of Different Initializations on Final Optimized Designs.

We also observed qualitative differences between the 2D heat conduction designs produced from optimization processes run with different initializations. This is expected due to the non-convexity of the problem, as differing initializations in such problems often lead a numerical optimizer to different local minima. Nevertheless, we saw that the final designs in each case had similar final objective function values, including the cases presented in Fig. 10.

Fig. 10
There are numerous structural differences between optimized designs that used different initializations. Note that the distributions shown here are direct plots of the mass distribution function inputs and outputs to the initialization process, and hence display interpolation between points.

Figure 10 displays the results of optimization runs using different initializations. All runs used a volume fraction of 0.36 and an adiabatic region length of 0.45. Note that the optimizer exited upon satisfying a tolerance of 1.0×10⁻¹⁰⁰. There are small differences in the structures of the designs that are visible upon inspection. The designs nevertheless retain their dendritic character in this case and share a substantial resemblance to each other.

4.7 Tradeoff Between the Objective Value and Constraints.

One important consideration in constrained inverse design is whether predictions conserve the constraints. To gain insight into the impact of using ML models optimized through either POFMM or PMSEM on deviation from the volume constraint in the 2D heat conduction problem, we plot the deviation of the mass used by the ML models' predictions in this study (refer to Fig. 11). Figure 11 illustrates the disparity in mass used for each model prediction: positive values imply that there is mass remaining before the constraint limit is exceeded, while negative values indicate that mass would have to be removed to meet the constraint limit. Figure 11 makes evident that, on average, ML models optimized using the POFMM method better preserve the constraint limits than PMSEM-optimized models. In both cases, however, there is no guarantee that any ID model will exactly preserve the constraints, and empirically we see that the effect on the volume fraction constraint can vary.

Fig. 11
The initial volume deviation of the PMSEM- and POFMM-optimized KNN, RF, and DeCNN models for the 2D heat conduction problem. The lines in the middle of each box plot represent the median, and the dashed line represents the value at which the mass used for a design equals the constraint volume of the problem.

Nevertheless, this outcome raises intriguing questions for future research into ID models. Namely, to what extent can ID models both predict high-performance designs while simultaneously adhering to constraints?

5 Discussion, Limitations, and Future Work

While the specific ID models we tested were simple, they nevertheless highlight interesting phenomena that may generalize to other problems or ID methods. Here we review possible limitations or areas of future work that may affect our stated contributions.

5.1 Alternative Training and Testing Metrics.

Changing how we optimized the hyperparameters of the ID models (from MSE-driven to objective-function-driven) affected not only quantitative convergence measures (per Fig. 3) but also qualitative predictions (per Fig. 10), even though the underlying primary training measures were identical (i.e., the KNN, RF, and DeCNN models minimize the design's MSE reconstruction error during training).

This raises questions not only about how we evaluate ID methods but also points toward alternative training procedures, such as training over joint losses that include both reconstruction-error-type losses and objective-function-derived losses. Moreover, if we know that the goal of an ID method is to act as a warm start for additional TO, then perhaps predicting the "peak" distribution seen in Fig. 9 may be faster than either the PMSEM or POFMM approach used in this paper. In other words, the best prediction might not be the one that performs best initially, but the one that is most helpful to downstream optimization. Understanding under what conditions and ID applications different methods excel would be a fruitful avenue for future work.

5.2 Variations in ID Methods and Dataset Size and Problems.

Using more advanced ID methods may alter some of our observations, for example, by eliminating the need for an adjoint optimizer to redo portions of the ID prediction (as shown in Fig. 3) if the ID predictions lie sufficiently close to the global optima.

We are currently exploring whether more advanced ID methods—such as diffusion models—might yield further improved performance or change the behavior noted in the trajectory figures (refer to Figs. 3, 5, and 7). That said, the DeCNN model that we tested had a sufficiently large model capacity to be competitive with state-of-the-art methods, so it is not clear whether more advanced models would significantly affect these results.

Likewise, we did not investigate here how ID performance is modulated by the size of the training dataset; uncovering the transition points where ID methods become performant remains an open question worthy of future study, and could be explored using techniques introduced in Ref. [27]. Furthermore, while we obtained better performance using the POFMM method (refer to Figs. 3, 5, and 7) than with the PMSEM method, this does not come without cost. One notable drawback is the computational cost associated with the objective evaluation for each model. Because the POFMM algorithm requires an objective function evaluation for each predicted design during hyperparameter search, the computational time and resources required to evaluate each model can be significant. The return on investment of employing the POFMM method should therefore be carefully considered. While improved performance is desirable, it is crucial to assess whether the benefits outweigh the computational cost. Factors such as the specific application, the available computational resources, and the desired level of accuracy need to be taken into account, and recent work provides some guidance on how to address these tradeoffs [27].

6 Conclusions

We compared several inverse design models (KNN, RF, and DeCNN) and two approaches to model hyperparameter optimization against standard uniform initialization, using SIMP-based TO on three topology optimization problems. We described a benchmark set of environments and datasets and showed how those modeling choices affected both the initial predictions from the ID methods and the downstream acceleration obtained when warm starting optimization.

Our findings indicate that both hyperparameter optimization methods yield KNN, RF, and DeCNN models that can substantially accelerate the optimization process when their predictions initialize an interior point solver, as sketched below. These predictions also tend to have objective values close to the minimum obtained in a corresponding control optimization run, and they significantly outperform initialization with a uniform mass distribution, a common TO initialization method. Furthermore, our study challenges the conventional approach of optimizing ID methods solely on the MSE of reconstructed test-set designs: we demonstrate that selecting models that produce lower objective function values can outperform standard MSE-derived hyperparameter optimization methods.
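For readers who wish to reproduce the warm-starting pattern, a minimal sketch follows, assuming a trained inverse-design model `id_model` and a SIMP TO routine `run_simp_to`; both names are hypothetical stand-ins, not the exact interfaces from our code release.

```python
import numpy as np

def warm_started_run(condition, id_model, run_simp_to,
                     nelx=64, nely=64, volfrac=0.4):
    """Compare an ID warm start against the uniform baseline.
    `id_model` and `run_simp_to` are hypothetical stand-ins for a
    trained inverse-design model and a SIMP topology optimizer."""
    # Direct ID prediction for this input condition, reshaped to the grid.
    x0_id = id_model.predict(np.atleast_2d(condition)).reshape(nely, nelx)
    # Keep densities inside the solver's admissible range.
    x0_id = np.clip(x0_id, 1e-3, 1.0)
    # Common baseline: uniform mass distribution at the volume fraction.
    x0_uniform = np.full((nely, nelx), volfrac)
    # The warm start typically converges in fewer iterations than the
    # uniform initialization (per the trajectory figures).
    return run_simp_to(x0_id), run_simp_to(x0_uniform)
```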

Although we investigated specific physical problems, two model hyperparameter optimization methods, and three ID model types (KNN, RF, and DeCNN), there remains a large space for future work across different physical problems and different computational approaches to modeling and model optimization. Overall, our results highlight the nuances in evaluating ID methods: your end goal in inverse design, whether direct prediction, distribution matching, or warm starting an optimizer, can affect both your evaluation approach and how you optimize your models.

Footnotes

2. Code to reproduce the results in this paper is located at: https://github.com/IDEALLab/JMD_MSE_ID.

3. See Note 2.

Acknowledgment

This research was supported in part by funding from the U.S. Department of Energy’s Advanced Research Projects Agency-Energy (ARPA-E) DIFFERENTIATE funding opportunity through award DE-AR0001216 and from the National Science Foundation through award #1943699.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The data and information that support the findings of this article are freely available online.3

References

1. Lee, X. Y., Balu, A., Stoecklein, D., Ganapathysubramanian, B., and Sarkar, S., 2019, “A Case Study of Deep Reinforcement Learning for Engineering Design: Application to Microfluidic Devices for Flow Sculpting,” ASME J. Mech. Des., 141(11), p. 111401.
2. Shi, X., Qiu, T., Wang, J., Zhao, X., and Qu, S., 2020, “Metasurface Inverse Design Using Machine Learning Approaches,” J. Phys. D: Appl. Phys., 53(27), p. 275105.
3. Andreassen, E., Clausen, A., Schevenels, M., Lazarov, B. S., and Sigmund, O., 2011, “Efficient Topology Optimization in MATLAB Using 88 Lines of Code,” Struct. Multidiscipl. Optim., 43(1), pp. 1–16.
4. Topology Optimisation of Heat Conduction Problems Governed by the Poisson Equation. http://www.dolfin-adjoint.org/en/latest/documentation/poisson-topology/poisson-topology.html. Accessed February 10, 2022.
5. Bendsoe, M. P., and Sigmund, O., 2003, Topology Optimization: Theory, Methods, and Applications, Springer Science & Business Media.
6. Chen, Q., Wang, J., Pope, P., Chen, W. W., and Fuge, M., 2022, “Inverse Design of 2D Airfoils Using Conditional Generative Models and Surrogate Log-Likelihoods,” ASME J. Mech. Des., 144(5), p. 053302.
7. Kim, B., Lee, S., and Kim, J., 2020, “Inverse Design of Porous Materials Using Artificial Neural Networks,” Sci. Adv., 6(1), p. eaax9324.
8. Kim, S., Noh, J., Gu, G. H., Aspuru-Guzik, A., and Jung, Y., 2020, “Generative Adversarial Networks for Crystal Structure Prediction,” ACS Central Sci., 6(8), pp. 1412–1420.
9. Challapalli, A., Patel, D., and Li, G., 2021, “Inverse Machine Learning Framework for Optimizing Lightweight Metamaterials,” Mater. Des., 208, p. 109937.
10. Huang, Z., Liu, X., and Zang, J., 2019, “The Inverse Design of Structural Color Using Machine Learning,” Nanoscale, 11(45), pp. 21748–21758.
11. Liu, Z., Zhu, D., Raju, L., and Cai, W., 2021, “Tackling Photonic Inverse Design With Machine Learning,” Adv. Sci., 8(5), p. 2002923.
12. Wiecha, P. R., Arbouet, A., Girard, C., and Muskens, O. L., 2021, “Deep Learning in Nano-Photonics: Inverse Design and Beyond,” Photon. Res., 9(5), pp. B182–B200.
13. So, S., and Rho, J., 2019, “Designing Nanophotonic Structures Using Conditional Deep Convolutional Generative Adversarial Networks,” Nanophotonics, 8(7), pp. 1255–1261.
14. Jiang, J., and Fan, J. A., 2020, “Simulator-Based Training of Generative Neural Networks for the Inverse Design of Metasurfaces,” Nanophotonics, 9(5), pp. 1059–1069.
15. Sanchez-Lengeling, B., Outeiral, C., Guimaraes, G. L., and Aspuru-Guzik, A., 2017, “Optimizing Distributions Over Molecular Space. An Objective-Reinforced Generative Adversarial Network for Inverse-Design Chemistry (ORGANIC).”
16. Jin, Z., Zhang, Z., Demir, K., and Gu, G. X., 2020, “Machine Learning for Advanced Additive Manufacturing,” Matter, 3(5), pp. 1541–1556.
17. Sekar, V., Zhang, M., Shu, C., and Khoo, B. C., 2019, “Inverse Design of Airfoil Using a Deep Convolutional Neural Network,” AIAA J., 57(3), pp. 993–1003.
18. Ongie, G., Jalal, A., Metzler, C. A., Baraniuk, R. G., Dimakis, A. G., and Willett, R., 2020, “Deep Learning Techniques for Inverse Problems in Imaging,” IEEE J. Sel. Areas Inf. Theory, 1(1), pp. 39–56.
19. Kim, I., Park, S. J., Jeong, C., Shim, M., Kim, D. S., Kim, G.-T., and Seok, J., 2022, “Simulator Acceleration and Inverse Design of Fin Field-Effect Transistors Using Machine Learning,” Sci. Rep., 12(1), pp. 1–9.
20. Hegde, R., 2021, “Sample-Efficient Deep Learning for Accelerating Photonic Inverse Design,” OSA Contin., 4(3), pp. 1019–1033.
21. Klaučo, M., Kalúz, M., and Kvasnica, M., 2019, “Machine Learning-Based Warm Starting of Active Set Methods in Embedded Model Predictive Control,” Eng. Appl. Artif. Intell., 77, pp. 1–8.
22. Nie, Z., Lin, T., Jiang, H., and Kara, L. B., 2021, “TopologyGAN: Topology Optimization Using Generative Adversarial Networks Based on Physical Fields Over the Initial Domain,” ASME J. Mech. Des., 143(3), p. 031715.
23. Wang, D., Xiang, C., Pan, Y., Chen, A., Zhou, X., and Zhang, Y., 2022, “A Deep Convolutional Neural Network for Topology Optimization With Perceptible Generalization Ability,” Eng. Optim., 54(6), pp. 973–988.
24. Mazé, F., and Ahmed, F., 2023, “Diffusion Models Beat GANs on Topology Optimization,” Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Washington, DC, Feb. 7.
25. Giannone, G., Srivastava, A., Winther, O., and Ahmed, F., 2023, “Aligning Optimization Trajectories With Diffusion Models for Constrained Design Generation,” Advances in Neural Information Processing Systems, New Orleans, LA, Feb. 13.
26. Regenwetter, L., Nobari, A. H., and Ahmed, F., 2022, “Deep Generative Models in Engineering Design: A Review,” ASME J. Mech. Des., 144(7), p. 071704.
27. Habibi, M., Wang, J., and Fuge, M., 2023, “When Is It Actually Worth Learning Inverse Design?,” International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 87301, American Society of Mechanical Engineers, p. V03AT03A025.
28. Regenwetter, L., Weaver, C., and Ahmed, F., 2023, “FRAMED: An AutoML Approach for Structural Performance Prediction of Bicycle Frames,” Comput.-Aided Des., 156(C), p. 103446.
29. Mehmani, A., Chowdhury, S., Meinrenken, C., and Messac, A., 2018, “Concurrent Surrogate Model Selection (COSMOS): Optimizing Model Type, Kernel Function, and Hyper-parameters,” Struct. Multidiscipl. Optim., 57(3), pp. 1093–1114.
30. Jiang, X., Wang, H., Li, Y., and Mo, K., 2020, “Machine Learning Based Parameter Tuning Strategy for MMC Based Topology Optimization,” Adv. Eng. Softw., 149, p. 102841.
31. Li, J. K., and Zhang, Y. M., 2011, “Method of Continuum Structural Topology Optimization With Information Functional Materials Based on K Nearest Neighbor,” Adv. Mater. Res., 321, pp. 200–203. www.scientific.net/AMR.321.200
32. Jin, K. H., McCann, M. T., Froustey, E., and Unser, M., 2017, “Deep Convolutional Neural Network for Inverse Problems in Imaging,” IEEE Trans. Image Process., 26(9), pp. 4509–4522.
33. Singh, A., Halgamuge, M. N., and Lakshmiganthan, R., 2017, Impact of Different Data Types on Classifier Performance of Random Forest, Naive Bayes, and k-Nearest Neighbors Algorithms.
34. Murphy, K. P., 2012, Machine Learning: A Probabilistic Perspective, MIT Press.
35. Bishop, C. M., 2006, Pattern Recognition and Machine Learning, Vol. 4, Springer.
36. Breiman, L., 2001, “Random Forests,” Mach. Learn., 45(1), pp. 5–32.
37. Mao, S., Cheng, L., Zhao, C., Khan, F. N., Li, Q., and Fu, H., 2021, “Inverse Design for Silicon Photonics: From Iterative Optimization Algorithms to Deep Neural Networks,” Appl. Sci., 11(9), p. 3822.
38. Zeiler, M. D., Krishnan, D., Taylor, G. W., and Fergus, R., 2010, “Deconvolutional Networks,” 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 13, IEEE, pp. 2528–2535.
39. Mohan, R., 2014, Deep Deconvolutional Networks for Scene Parsing. arXiv preprint arXiv:1411.4101.
40. Fakhry, A., Zeng, T., and Ji, S., 2016, “Residual Deconvolutional Networks for Brain Electron Microscopy Image Segmentation,” IEEE Trans. Med. Imag., 36(2), pp. 447–456.
41. Sigmund, O., and Maute, K., 2013, “Topology Optimization Approaches,” Struct. Multidiscipl. Optim., 48(6), pp. 1031–1055.
42. Dilgen, S. B., Dilgen, C. B., Fuhrman, D. R., Sigmund, O., and Lazarov, B. S., 2018, “Density Based Topology Optimization of Turbulent Flow Heat Transfer Systems,” Struct. Multidiscipl. Optim., 57(5), pp. 1905–1918.
43. Wächter, A., and Biegler, L. T., 2006, “On the Implementation of an Interior-Point Filter Line-Search Algorithm for Large-Scale Nonlinear Programming,” Math. Program., 106(1), pp. 25–57.
44. Sigmund, O., 2001, “A 99 Line Topology Optimization Code Written in MATLAB,” Struct. Multidiscipl. Optim., 21(2), pp. 120–127.
45. Mitusch, S. K., Funke, S. W., and Dokken, J. S., 2019, “dolfin-adjoint 2018.1: Automated Adjoints for FEniCS and Firedrake,” J. Open Sourc. Softw., 4(38), p. 1292.
46. Funke, S. W., and Farrell, P. E., 2013, A Framework for Automated PDE-Constrained Optimisation. arXiv preprint arXiv:1302.3894.
47. Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., and Niculae, V., 2013, “API Design for Machine Learning Software: Experiences From the Scikit-Learn Project,” European Conference on Machine Learning and Principles and Practices of Knowledge Discovery in Databases, Prague, Czech Republic, September, pp. 108–122.
48. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., et al., 2015, TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org.
49. Head, T., Kumar, M., Nahrstaedt, H., Louppe, G., and Shcherbatyi, I., 2020, scikit-optimize/scikit-optimize.
