A Machine Learning Approach to Determine Airport Asphalt Concrete Layer Moduli Using Heavy Weight Deflectometer Data

An integrated approach based on machine learning and data augmentation techniques has been developed in order to predict the stiffness modulus of the asphalt concrete layer of an airport runway, from data acquired with a heavy weight deflectometer (HWD). The predictive model relies on a shallow neural network (SNN) trained with the results of a backcalculation, by means of a data augmentation method and can produce estimations of the stiffness modulus even at runway points not yet sampled. The Bayesian regularization algorithm was used for training of the feedforward backpropagation SNN, and a k-fold cross-validation procedure was implemented for a fair performance evaluation. The testing phase result concerning the stiffness modulus prediction was characterized by a coefficient of correlation equal to 0.9864 demonstrating that the proposed neural approach is fully reliable for performance evaluation of airfield pavements or any other paved area. Such a performance prediction model can play a crucial role in airport pavement management systems (APMS), allowing the maintenance budget to be optimized.


Introduction
Road networks and airport areas are key assets [1] for both developed and developing countries [2]. Maintenance work costs are more significant for developed countries, while first construction work costs are more relevant for developing ones [3]. However, both maintenance and construction work require huge natural resource consumption in terms of aggregates and bituminous binder. In fact, such transport infrastructures have a great impact on energy consumption for the industrial productions involved and the related emissions, given that their service life fixed is in several tens of years [4]. In this regard, the prediction of long-term performance is fundamental, particularly the mechanical strength (stiffness modulus) from repeated investigations over time, in order to properly implement maintenance and rehabilitation (M&R) strategies to achieve sustainable technical, economic, and environmental solutions [5]. It is widely recognized that efficient airport management has positive environmental effects both in reducing major airport congestion and in minimizing land consumption by making a better use of existing infrastructures [6], thus improving the sustainability of airport construction.
The runway is the most important airport infrastructure as it has to ensure adequate operability whatever the traffic and weather conditions. However, there are many causes that can lead to its deterioration such as aging, increased traffic demand, and lack of adequate investment [7]. It is therefore necessary to employ the most suitable and reliable investigation techniques [8] to assess its structural integrity, compliant with the limits of budget constraints [9]. The structural assessment of airport pavements provides valuable information about their expected behavior [10]. It is very useful for estimating their formal theory for their solution [37]. Although ANNs work on the basis of a non-physically based approach, they are characterized by high computational efficiency and small prediction errors [38]. It is also worth pointing out that with such neural models data processing is very fast, allowing the routine F/HWD deflection analysis to be performed even in the field [39]. However, artificial neural networks usually require a large dataset to be successfully trained. For this reason, starting from multiple finite element simulations, synthetic databases are often prepared and subsequently used as input for the neural models in order to allow its proper training [40]. In this study, the goal was to develop a predictive model which can enhance the conventional backcalculation by means of advanced machine learning features. The result is an innovative soft computing tool that, using state-of-the-art data augmentation techniques and the computational effectiveness of artificial neural networks, can predict reliable values of the runway asphalt concrete modulus (E AC ), even in points not yet sampled. Since with the proposed approach, it is possible to predict the stiffness modulus by providing as input the deflections and the spatial distribution of the sampling points only, the functional relationship between such variables is respected. Compared to the traditional use of ANNs in the field of pavement engineering, the proposed approach avoids preliminary long finite element simulations for training the model. Therefore, this fast and reliable approach could be used in APMS as a support tool for the planning of intervention priorities.
By providing a numerical estimation of the modulus in an arbitrary location on the runway, the proposed pavement performance prediction model could allow us to identify the areas that most require maintenance interventions thus reducing the costs of instrumental monitoring and consequently allowing active and efficient management [41].

In Situ Investigation
The experimental campaign took place at the "Falcone e Borsellino" airport of Palermo-Punta Raisi in the period between 9 and 29 February 2012. The Italian international airport is located 35 km west of Palermo. This infrastructure consists of two intersecting runways: the main one named Runway 07/25 and the secondary one named Runway 02/20. On the latter, deflection measurements were collected by means of a heavy weight deflectometer.
Five measurement lines were established: the central axis of the runway (0 m) and other axes ±3 m and ±6 m away from it. The portion of the runway considered, starting from header 02 in the south and arriving at header 20 in the north, was 1800 m long ( Figure 1). The experiments were carried out at regular 100 m intervals, thus obtaining 19 impact points for each longitudinal axis. Since there were 5 measurement lines, the total number of measuring points was 95 ( Figure 2).     The instrumentation used was the Dynatest 8000, a device capable of performing non-destructive tests and widely used to determine physical and mechanical properties of airport pavements. The principle is to induce small surface deflections of the pavement by applying an impulsive load in a controlled manner. This is done in order to better simulate the effects of an aircraft's moving wheel [42]. The load is obtained by dropping a suspended mass from a predetermined height on a 30 cm diameter plate resting on the pavement surface. The magnitude of the impulse load transmitted by the device to the pavement can be varied from 30 kN to 240 kN by changing the weight and height of the fall. A magnitude of around 140 kN was adopted. Together with the plate, a set of accelerometric transducers is placed in contact with the pavement and is able to measure the deflections induced at several distances from the loading axis such as 0 (δ ), 200 (δ ), 300 (δ ), 450 (δ ), 600 (δ ), 900 (δ ), 1200 (δ ), 1500 (δ ), and 1800 mm (δ ) from the center of the load. This method is preferred among other methods because it is simpler, more reliable, and cheaper [43]. In fact, the costs that would be associated with reconstructing the area of operation if destructive testing were adopted are avoided [44]. A preliminary contour map of the δ values measured just below the loading plate ( Figure 3) immediately shows that the highest deflections are located near the central axis of the pavement. This makes it possible to identify the so-called touchdown zones (TDZs), i.e., those portions of a runway, beyond the threshold, where landing airplanes are first intended to meet the runway [45]. In these areas, δ values exceed 900 μm and are more than twice or sometimes three times higher than the deflections measured moving away toward the ends of the runway. This provides a preliminary evaluation of which runway areas could require maintenance interventions. The instrumentation used was the Dynatest 8000, a device capable of performing non-destructive tests and widely used to determine physical and mechanical properties of airport pavements. The principle is to induce small surface deflections of the pavement by applying an impulsive load in a controlled manner. This is done in order to better simulate the effects of an aircraft's moving wheel [42]. The load is obtained by dropping a suspended mass from a predetermined height on a 30 cm diameter plate resting on the pavement surface. The magnitude of the impulse load transmitted by the device to the pavement can be varied from 30 kN to 240 kN by changing the weight and height of the fall. A magnitude of around 140 kN was adopted. Together with the plate, a set of accelerometric transducers is placed in contact with the pavement and is able to measure the deflections induced at several distances from the loading axis such as 0 (δ 0 ), 200 (δ 1 ), 300 (δ 2 ), 450 (δ 3 ), 600 (δ 4 ), 900 (δ 5 ), 1200 (δ 6 ), 1500 (δ 7 ), and 1800 mm (δ 8 ) from the center of the load. This method is preferred among other methods because it is simpler, more reliable, and cheaper [43]. In fact, the costs that would be associated with reconstructing the area of operation if destructive testing were adopted are avoided [44]. A preliminary contour map of the δ 0 values measured just below the loading plate ( Figure 3) immediately shows that the highest deflections are located near the central axis of the pavement. This makes it possible to identify the so-called touchdown zones (TDZs), i.e., those portions of a runway, beyond the threshold, where landing airplanes are first intended to meet the runway [45]. In these areas, δ 0 values exceed 900 µm and are more than twice or sometimes three times higher than the deflections measured moving away toward the ends of the runway. This provides a preliminary evaluation of which runway areas could require maintenance interventions.

Deflection Basin Parameters
Deflection basin parameters (DBPs) are obtained by processing the results produced by a HWD investigation and sometimes can be used to monitor the structural integrity of in-service pavements [46]. Based on a comprehensive literature review, the most widely used are:

Deflection Basin Parameters
Deflection basin parameters (DBPs) are obtained by processing the results produced by a HWD investigation and sometimes can be used to monitor the structural integrity of in-service pavements [46]. Based on a comprehensive literature review, the most widely used are:

•
Surface curvature index (SCI) which provides information on changes in the nearsurface layer's relative strength.
SCI is an accurate indicator of the AC layer conditions and, for certain thicknesses, E AC and SCI tend to show an approximately linear behavior in a log-log scale [47].

•
Deflection ratio (DR) which takes into account the type and quality of materials by relating them to the ratio of two deflections.
• Area under deflection basin curve (AREA) which relates the stiffness of the pavement structure to a shape factor. In fact, it is the partial area under the deflection basin curve normalized with respect to δ 0 using Simpson's rule [48].
This is the definition of the AREA when the deflections are given in millimeters.
• Area under pavement profile (AUPP) [49] which is another indicator sensitive to AC layer properties as found by Thompson and Garg [50].

Backcalculation Process
The Road Moduli Evaluation (RO.M.E.) method was used to determine the elastic moduli of the surface layer impact points. This procedure refers to the multi-layer elastic theory, the Boussinesq-Odemark equation, and the equivalent thickness (E.T.) method. In fact, starting from the thickness measurements of the layers composing the track (obtained from cores and radargrams (Figure 4)) as well as from the FWD results obtained during the experimental campaign, the RO.M.E. method makes it possible to determine the stress/deformation state of each point of the bituminous layer assumed as homogeneous, isotropic, and of semi-infinite thickness. Subsequently, using an iterative procedure, the RO.M.E. method makes the theoretical deflection basin congruent with the experimentally measured one so the pavement layers' moduli can be estimated. For a complete and comprehensive description of RO.M.E. method operations, please refer to the work of Battiato et al. [51].
isotropic, and of semi-infinite thickness. Subsequently, using an iterative procedure, the RO.M.E. method makes the theoretical deflection basin congruent with the experimentally measured one so the pavement layers' moduli can be estimated. For a complete and comprehensive description of RO.M.E. method operations, please refer to the work of Battiato et al. [51]. Moduli thus backcalculated in Figure 5 make it clear that those areas characterized by the highest deflections (TDZs) show lower modulus values. Similarly, lower deflections are related to higher modulus values. Moreover, the prior determination of the different pavement layers' thicknesses allowed the identification of homogeneous sections, i.e., those sections of the runway which show the same stratigraphy. A categorical variable that takes into account the homogeneous sections will hereafter be referred to as HS. Backcalculated moduli together with HS have therefore been associated, for each impact point, with the corresponding measured deflections and DBPs. These data will become part of the Input/Target table needed to implement the following supervised learning strategy.

Neural Modeling
An artificial neural network (ANN) is a soft computing technique that takes inspiration from biological neural networks. In this connection, similar to the biological nervous system, such model is constituted by a group of artificial neurons and more or less reinforced interconnections among them. Thus, the artificial neural network is an adaptive system that progressively adapts according to the internal and/or external information Moduli thus backcalculated in Figure 5 make it clear that those areas characterized by the highest deflections (TDZs) show lower modulus values. Similarly, lower deflections are related to higher modulus values. Moreover, the prior determination of the different pavement layers' thicknesses allowed the identification of homogeneous sections, i.e., those sections of the runway which show the same stratigraphy. A categorical variable that takes into account the homogeneous sections will hereafter be referred to as HS. Backcalculated moduli together with HS have therefore been associated, for each impact point, with the corresponding measured deflections and DBPs. These data will become part of the Input/Target table needed to implement the following supervised learning strategy.
isotropic, and of semi-infinite thickness. Subsequently, using an iterative procedure, the RO.M.E. method makes the theoretical deflection basin congruent with the experimentally measured one so the pavement layers' moduli can be estimated. For a complete and comprehensive description of RO.M.E. method operations, please refer to the work of Battiato et al. [51]. Moduli thus backcalculated in Figure 5 make it clear that those areas characterized by the highest deflections (TDZs) show lower modulus values. Similarly, lower deflections are related to higher modulus values. Moreover, the prior determination of the different pavement layers' thicknesses allowed the identification of homogeneous sections, i.e., those sections of the runway which show the same stratigraphy. A categorical variable that takes into account the homogeneous sections will hereafter be referred to as HS. Backcalculated moduli together with HS have therefore been associated, for each impact point, with the corresponding measured deflections and DBPs. These data will become part of the Input/Target table needed to implement the following supervised learning strategy.

Neural Modeling
An artificial neural network (ANN) is a soft computing technique that takes inspiration from biological neural networks. In this connection, similar to the biological nervous system, such model is constituted by a group of artificial neurons and more or less reinforced interconnections among them. Thus, the artificial neural network is an adaptive system that progressively adapts according to the internal and/or external information

Neural Modeling
An artificial neural network (ANN) is a soft computing technique that takes inspiration from biological neural networks. In this connection, similar to the biological nervous system, such model is constituted by a group of artificial neurons and more or less reinforced interconnections among them. Thus, the artificial neural network is an adaptive system that progressively adapts according to the internal and/or external information learned during the so-called training phase. Its structure, however, does not change over time [52]. Such models are often used to simulate complex relationships between inputs and outputs that other analytic functions cannot represent. Typically, an artificial neural network is organized as a series of layers [53]. Each of them is a collection of neurons and plays a different role. A layer that represents the known data is called the input layer. A layer that Sustainability 2021, 13, 8831 7 of 17 produces the network output is called the output layer. All other layers between the input and the output ones are called hidden layers and it is where the computational processing takes place. Here it is necessary to define the activation function, which modulates the amplitude of the output and defines whether or not it should be transmitted to subsequent layers. It is advisable to add more hidden layers as the complexity of the problem to be investigated increases. However, to solve most input-output fitting problems, a single hidden layer with a sufficient number of neurons is adequate [54]. This kind of structure is often called shallow neural network (SNN).
In the current study, three SNNs were examined. The first presents a 4 − n − 1 architecture, the second an 8 − n − 1, whereas the third a 9 − n − 1. The first number represents the neurons composing the input layer (a neuron for every input feature). n stands for the number of neurons in the hidden layer with n being an integer in the range 1 − 30. Once the first model has been trained and tested (i.e., starting from n = 1), a model with n + 1 neurons in the hidden layer was iteratively generated until the 30-hidden neurons architecture was obtained. The best hidden activation function for every model was also investigated within a group of four different functions: the exponential linear (ELU), the rectified linear (ReLU), the hyperbolic tangent (TanH), and the logistic sigmoid (LogS). Their analytical expressions together with the resulting graphs are shown in Figure 6. This resulted in a grid-search procedure aimed at identifying the best combination of hidden neurons function guaranteeing the best network performances within the search intervals. Finally, the output layer always consisted of 1 neuron, and it was associated with a linear activation function. learned during the so-called training phase. Its structure, however, does not change over time [52]. Such models are often used to simulate complex relationships between inputs and outputs that other analytic functions cannot represent. Typically, an artificial neural network is organized as a series of layers [53]. Each of them is a collection of neurons and plays a different role. A layer that represents the known data is called the input layer. A layer that produces the network output is called the output layer. All other layers between the input and the output ones are called hidden layers and it is where the computational processing takes place. Here it is necessary to define the activation function, which modulates the amplitude of the output and defines whether or not it should be transmitted to subsequent layers. It is advisable to add more hidden layers as the complexity of the problem to be investigated increases. However, to solve most input-output fitting problems, a single hidden layer with a sufficient number of neurons is adequate [54]. This kind of structure is often called shallow neural network (SNN).
In the current study, three SNNs were examined. The first presents a 4 − n − 1 architecture, the second an 8 − n − 1, whereas the third a 9 − n − 1. The first number represents the neurons composing the input layer (a neuron for every input feature). n stands for the number of neurons in the hidden layer with n being an integer in the range 1 − 30. Once the first model has been trained and tested (i.e., starting from n = 1), a model with n + 1 neurons in the hidden layer was iteratively generated until the 30-hidden neurons architecture was obtained. The best hidden activation function for every model was also investigated within a group of four different functions: the exponential linear (ELU), the rectified linear (ReLU), the hyperbolic tangent (TanH), and the logistic sigmoid (LogS). Their analytical expressions together with the resulting graphs are shown in Figure 6. This resulted in a grid-search procedure aimed at identifying the best combination of hidden neurons function guaranteeing the best network performances within the search intervals. Finally, the output layer always consisted of 1 neuron, and it was associated with a linear activation function. Each model presented as its input the homogeneous sections (HS) and spatial coordinates of the impact points along the runway (X, Y). In addition, the first model added to the feature vector the deflections, δ , collected. The second one considered both the δ and all deflections between δ and δ . The third one considered the DBPs defined in Section 2.2 instead of individual deflections. The only output was the elastic modulus of the bituminous layer (E ). Even before being assigned to the network, for both inputs and Each model presented as its input the homogeneous sections (HS) and spatial coordinates of the impact points along the runway (X, Y). In addition, the first model added to the feature vector the deflections, δ 0 , collected. The second one considered both the δ 0 and all deflections between δ 2 and δ 5 . The third one considered the DBPs defined in Section 2.2 instead of individual deflections. The only output was the elastic modulus of the bituminous layer (E AC ). Even before being assigned to the network, for both inputs and outputs a standardization procedure was provided. This means that each data item was subtracted from its respective mean and divided by its respective standard deviation. This was done because the algorithm chosen to train the network (Bayesian regularization) works best when the inputs and targets are scaled so that they fall approximately in the range of [−1,1] [55].

Bayesian Regularization
The training process of a neural network consists of applying the sequential steps required to fine-tune synaptic weights and biases in order to generalize the solutions produced by its outputs [56]. In the current study, it started from the Input/Target table mentioned in Section 2.3 and for this reason it is called supervised. The process involves two fundamental steps: a forward and a backward one [57]. The former consists in assigning the feature vector x (i.e., the set of known data) to the network and computing the corresponding outputŷ. The latter consists in comparing the generated output with the desired target y. The difference betweenŷ and y will define the so-called loss function F ŷ , y used to determine the corrections to the weights and biases matrix W. There are several analytical expressions that define how to update W according to the value assumed by the loss function for a fixed number of iterations E, thus differentiating the several learning algorithms. Usually, the mean squared error (MSE) is used as loss function F. Its analytical expression is presented in Equation (5).
The gradient of the loss function F with respect to W, calculated applying a backpropagation algorithm [58], allows one to update network weights in order to minimize the loss value. Equation (6) shows that the weights at the next iteration e + 1 are calculated starting from those used at the previous one ∀e ∈ {1, . . . , E}: Combining this with the Levenberg-Marquardt (LM) backpropagation algorithm [59] results in: where W is the matrix of weights and biases, J is the Jacobian matrix of the training loss function F with respect to W e , I is the identity matrix, whereas v is the vector of the network errors obtained through the expression: The scalar µ (also known as "learning rate") defines the algorithm convergence rate. As µ is increased, the LM algorithm moves toward a small step in the direction of steepest descent J T (W e )v(W e ). Instead, if µ is decreased, convergence will be faster with the possibility that the algorithm may jump over the minimum. For this reason, during the training phase, the µ value is modified to reach convergence as quickly as possible avoiding undesirable local minima. An initial value of µ is set for the first step. Subsequently it increases (or decreases) being multiplied by a factor µ inc > 1 (or µ dec < 1) if the previous iteration led to worse (or better) results in terms of the loss function F. In this way, the loss function tends to gradually decrease step by step. Finally, it is necessary to set a maximum value of µ (indicated by µ max ) in order to interrupt the training if it is exceeded. When µ max has been reached (or alternatively at the end of the E iterations) the best weights and biases are identified and kept fixed. Then the test feature vector is assigned as input making the network work only in the forward manner thus determining the model's loss index on some data never processed before. However, working this way, it is possible to run into a phenomenon known in machine learning as "overfitting". Overfitting occurs when the model performs very well with the given training set of observations but cannot generalize correctly beyond it and consequently performs very poorly with test data or different sets of observations [60]. This often occurs when the connection weights' values are too high. For this reason, in the current study, it was decided to use a regularization technique considering the matrix W values. This technique consists in no longer rewriting the loss function as MSE but as follows: The · 2 2 operator represents the 2-norm, and it is applied first to the errors v(W e ) = y(W e ) − y and then to network parameters W e . For this reason, the first term is also known as the sum squared error (SSE) while the second term represents a correction that considers the complexity of the network through its weights (also known as "penalty"). In addition, both terms are pre-multiplied by parameters indicated as β and α whose ratio defines the smoothness of the loss function. The larger the ratio α/β is, the smoother the network response will be. This concept was first introduced by Tikhonov [61]. The penalty term (also referred to as "regularization") forces the resulting loss function to be smooth. When the weights are large, the loss function can have large slopes and is therefore more likely to overfit the training data. On the other hand, if the weights are forced to be small, the loss function will smoothly interpolate the training data, thus achieving good performance even beyond the training set. The correct choice of the regularization ratio α/β is essential in producing a network that generalizes properly. In this study, it was decided to use David MacKay's approach [62] to correctly define the regularization terms. This approach involves an early initialization of the α and β parameters (as well as the network weights). Using Bayesian statistical and optimization techniques, α and β values are varied at each iteration thus changing the loss function and obviously its minimum point. Generally, each time a new minimum is set, the regularization parameters are more accurate. Eventually, the precision will be high enough that the objective function will no longer significantly change with every subsequent iteration and convergence will therefore have been achieved. The default values of the hyperparameters implemented in the MATLAB ® Toolbox LM algorithm (µ, µ inc , µ dec , µ max and E) were used, namely the initial µ was set equal to 0.001, µ inc to 10, µ dec to 0.1, µ max to 1 × 10 10 , whereas the maximum number of training epochs E was 1000. As explained in Section 3.1, the size of the hidden layer together with its activation function were identified by performing a grid search. Moreover, one last parameter, w w , was defined. It represented the number of re-trainings performed for each iteration and was set equal to 10. This was done in order to obtain the best network among the 10 fitted networks for every combination of neurons function as a result of a k-fold cross-validation partitioning.

K-Fold Cross-Validation
Together with the standardization procedure explained in Section 3.1, a further data pre-processing step was implemented. Usually, the dataset is simply split into two partitions: one for training and the other for testing. This method, also known as "hold-out", produces errors in the test phase with very variable results depending on which observations have ended up in one partition or another [63]. This undesirable effect is even more pronounced when the dataset is particularly limited so, to overcome this problem, the k-fold cross-validation (CV) was adopted [64]. This technique depends on a single parameter k, an index of the number of partitions into which the dataset must be divided. After randomly mixing the known data, a value of k is established and consequently the dataset is divided into k groups each containing the same number of elements. In this way, k − 1 groups are used to build the model and the left-out sample to validate it Figure 7. This is an iterative procedure that is repeated k-times so that each of the k-folds is successively assigned as validation data [65].

K-Fold Cross-Validation
Together with the standardization procedure explained in Section 3.1, a further data pre-processing step was implemented. Usually, the dataset is simply split into two partitions: one for training and the other for testing. This method, also known as "hold-out", produces errors in the test phase with very variable results depending on which observations have ended up in one partition or another [63]. This undesirable effect is even more pronounced when the dataset is particularly limited so, to overcome this problem, the kfold cross-validation (CV) was adopted [64]. This technique depends on a single parameter k, an index of the number of partitions into which the dataset must be divided. After randomly mixing the known data, a value of k is established and consequently the dataset is divided into k groups each containing the same number of elements. In this way, k − 1 groups are used to build the model and the left-out sample to validate it Figure 7. This is an iterative procedure that is repeated k-times so that each of the k-folds is successively assigned as validation data [65]. For each iteration, a validation score is tracked and recorded before moving on to the next iteration. The average of the obtained k-validation scores is assumed as the general performance of the model [66]: A review of model validation methods recommends k-fold cross-validation, particularly five or ten-fold [67]. The value of k has therefore been set equal to five, to remain consistent with the relevant literature [68]. Because of the five-fold CV, the dataset is split as follows: 80% of the available data for the training set and 20% for the test set. For each iteration, a validation score is tracked and recorded before moving on to the next iteration. The average of the obtained k-validation scores is assumed as the general performance of the model [66]: Val.Score i (10) A review of model validation methods recommends k-fold cross-validation, particularly five or ten-fold [67]. The value of k has therefore been set equal to five, to remain consistent with the relevant literature [68]. Because of the five-fold CV, the dataset is split as follows: 80% of the available data for the training set and 20% for the test set.

Data Augmentation
In the context of machine learning, whenever a certain technique is used to expand the size of the starting dataset, a data augmentation technique is used. This is a generation of synthetic data that comes from data processing and not from a new experimental campaign. In this sense, increasing the amount of data available to the network increases its predictive capabilities by improving its generalization. Data augmentation techniques are particularly popular in the field of image classification because rotating, zooming, cropping, or changing the brightness does not mean changing the information stored in an image. Similarly, in the case of time-series data augmentation it is necessary to use techniques that do not disturb the information collected during the experimental campaign but, at the same time, allow one to increase the sample size. Since collecting new data can be very expensive and time-consuming, multiple interpolation techniques are progressively emerging in data augmentation. Interpolation is a method of estimating unknown values using known data. More precisely, if the function f(x) of the real variable x is unknown and the function value f(x i ) for the value x i (i = 1, 2, . . . , n) of two or more variables with a certain interval is known, estimating a function value for any x in between is called interpolation [69]. As suggested by Oh et al. [70], the known dataset should never be shorter but rather longer than the one obtained through interpolation. For this reason, it was decided to nearly double the initial dataset by interpolating the unknown bidimensional function at a midpoint between two successive impact points of the same measuring line (Figure 8).
image. Similarly, in the case of time-series data augmentation it is necessary to use techniques that do not disturb the information collected during the experimental campaign but, at the same time, allow one to increase the sample size. Since collecting new data can be very expensive and time-consuming, multiple interpolation techniques are progressively emerging in data augmentation. Interpolation is a method of estimating unknown values using known data. More precisely, if the function f(x) of the real variable x is unknown and the function value f(x ) for the value x (i = 1,2, … , n) of two or more variables with a certain interval is known, estimating a function value for any x in between is called interpolation [69]. As suggested by Oh et al. [70], the known dataset should never be shorter but rather longer than the one obtained through interpolation. For this reason, it was decided to nearly double the initial dataset by interpolating the unknown bidimensional function at a midpoint between two successive impact points of the same measuring line (Figure 8). The function chosen as interpolator becomes a kind of model hyperparameter. The proposed method used the modified Akima interpolation, often known as the "makima" technique. In general, the Akima algorithm performs cubic interpolation to produce piecewise polynomials with continuous first-order derivatives [71]. Unlike 3 degree polynomial or cubic spline interpolation, the Akima algorithm avoids excessive local undulations while still being able to deal with oscillatory data. The modified Akima algorithm (i.e., "makima") is an extension of the method just mentioned and it is designed for interpolating values given at points of a rectangular grid in a plane by a smooth bivariate function z = z(x, y). The interpolating function is therefore a bicubic polynomial in each cell of the rectangular grid [72]. The main purpose is still to produce fewer undulations between given grid points. Data obtained through makima interpolation (85 augmented points) were used exclusively during the model training phase and were added to the k-fold partitioning for a total of 161 training points.
Assuming as current state-of-practice (CSP) the ANN model implemented in MATLAB ® Toolbox, the comparison between this simplified model and the one proposed by the authors has been considered. Such comparison allows one to evaluate the improvement of stiffness values estimation given by the proposed model. Moreover, assuming that data augmentation implementation is nearly equivalent to a denser experimental The function chosen as interpolator becomes a kind of model hyperparameter. The proposed method used the modified Akima interpolation, often known as the "makima" technique. In general, the Akima algorithm performs cubic interpolation to produce piecewise polynomials with continuous first-order derivatives [71]. Unlike 3rd degree polynomial or cubic spline interpolation, the Akima algorithm avoids excessive local undulations while still being able to deal with oscillatory data. The modified Akima algorithm (i.e., "makima") is an extension of the method just mentioned and it is designed for interpolating values given at points of a rectangular grid in a plane by a smooth bivariate function z = z(x, y). The interpolating function is therefore a bicubic polynomial in each cell of the rectangular grid [72]. The main purpose is still to produce fewer undulations between given grid points. Data obtained through makima interpolation (85 augmented points) were used exclusively during the model training phase and were added to the k-fold partitioning for a total of 161 training points.
Assuming as current state-of-practice (CSP) the ANN model implemented in MATLAB ® Toolbox, the comparison between this simplified model and the one proposed by the authors has been considered. Such comparison allows one to evaluate the improvement of stiffness values estimation given by the proposed model. Moreover, assuming that data augmentation implementation is nearly equivalent to a denser experimental campaign, it can be understood how the change of the HWD measurements resolution affects the modeling result.

Results and Discussion
Although the starting dataset is highly variable both in terms of measured deflections and backcalculated moduli along the runway (due to the pronounced differences in mechanical behavior), the proposed neural model returns very satisfactory results (Tables 1-3). The performance of the model is expressed in terms of Pearson coefficient R, mean squared error MSE, and adjusted coefficient of determination R 2 adj . The first one expresses, if any, a linear correlation between the backcalculated moduli E AC and those predicted by the neural modelÊ AC : where µ and σ are the mean and the standard deviation of the variables, respectively. Values of R exceeding 0.8 are typical of a satisfactory correlation [73,74]. The second provides an accurate estimation of the model generalization capability by averaging the squares of the differences between backcalculated and predicted moduli: The third one represents in what percentage the input variables can explain a variation of the output variable: is the sum of squared total, n is the number of observations, and p is the number of model inputs. Its formulation allows one to understand whether adding more independent variables will improve the goodness-of-fit for the regression model. In fact, the adjusted R-squared score increases when new terms improve the model fit while decreases when this improvement is not appreciable.
For what concerns the grid search approach, among the possible combinations of hidden neurons-activation function, the one that maximized the R 2 adj was chosen. The intention was to produce a network that would make small errors and, at the same time, make the best use of the parameters provided as input. From this point of view, the model that produced the best results is the third one. The architecture 9 − 23 − 1 with logistic sigmoid activation function (hereafter referred to as LogS-SNN) has produced a R 2 adj value equal to 0.9516, confirming that the DBPs, together with the homogeneous sections and the spatial coordinates of the impact points, are very suitable starting data to make modulus predictions. The graphical trend of R and MSE is shown in Figure 9a for a clearer appreciation of the learning process. It is important to notice how the former increases up to the best value and then settles around constant values. Similarly, the latter decreases down to the best value and then settles around constant values. This is the representation of the happened learning process. Adding more hidden neurons beyond the threshold of the best value would computationally weigh down the model without leading to any benefit in predictive terms. make the best use of the parameters provided as input. From this point of view, the model that produced the best results is the third one. The architecture 9 − 23 − 1 with logistic sigmoid activation function (hereafter referred to as LogS-SNN) has produced a R value equal to 0.9516, confirming that the DBPs, together with the homogeneous sections and the spatial coordinates of the impact points, are very suitable starting data to make modulus predictions. The graphical trend of R and MSE is shown in Figure 9a for a clearer appreciation of the learning process. It is important to notice how the former increases up to the best value and then settles around constant values. Similarly, the latter decreases down to the best value and then settles around constant values. This is the representation of the happened learning process. Adding more hidden neurons beyond the threshold of the best value would computationally weigh down the model without leading to any benefit in predictive terms. As further evidence of the network efficiency from the computational cost point of view, another parameter was kept under observation: the effective number of parameters (ENP). It provides a measure of how many parameters (weights and biases) in the neural model are effectively used in reducing the loss function [59]. The ENP is expressed by: where is the matrix of weights and biases of the i-th SNN model trained during the k-fold cross-validation procedure and γ is the total number of parameters of the same ith SNN. The trace of the Hessian matrix can be computed starting from the Jacobian matrix of the training set errors and the parameters α and β , as explained by Hagan et al. [59]. With a ratio of 185/254, the LogS-SNN ENP shows that, despite the large number of neurons within the hidden layer, more than 70% of the model's parameters are used to As further evidence of the network efficiency from the computational cost point of view, another parameter was kept under observation: the effective number of parameters (ENP). It provides a measure of how many parameters (weights and biases) in the neural model are effectively used in reducing the loss function [59]. The ENP is expressed by: where W i is the matrix of weights and biases of the i-th SNN model trained during the k-fold cross-validation procedure and γ i is the total number of parameters of the same i-th SNN. The trace of the Hessian matrix can be computed starting from the Jacobian matrix of the training set errors and the parameters α i and β i , as explained by Hagan et al. [59]. With a ratio of 185/254, the LogS-SNN ENP shows that, despite the large number of neurons within the hidden layer, more than 70% of the model's parameters are used to reduce the loss function. It is probably the ideal combination of computational efficiency and accuracy in the predictions as also highlighted by the value of mean squared error MSE CV equal to 0.0321. For a complete illustration of the results, the performance of the LogS-SNN model for each of the five folds has been graphically represented Figure 9b. As explained in Section 3.3, averaging the results over the five folds, the predictive capabilities of the model can be evaluated resulting in a R CV equal to 0.9864. Using a VivoBookProN580GD-FI018T with Intel(R) core(TM) i7-8750H CPU @2.20 GHz and 16 GB RAM, this model takes 13 s to process the data. The LogS-SNN model finally made it possible to predict the modulus value at each point of the runway thus generating a contour map (Figure 10). From a comparison with the map in Section 2.3, it can be concluded that the neural modeling procedure followed was successful in fitting the presented experimental data. The contour map provides a quantitative estimation of the pavement mechanical performance at the time of the HWD experimental investigations.
ProN580GD-FI018T with Intel(R) core(TM) i7-8750H CPU @2.20 GHz and 16 GB RAM, this model takes 13 s to process the data. The LogS-SNN model finally made it possible to predict the modulus value at each point of the runway thus generating a contour map (Figure 10). From a comparison with the map in Section 2.3, it can be concluded that the neural modeling procedure followed was successful in fitting the presented experimental data. The contour map provides a quantitative estimation of the pavement mechanical performance at the time of the HWD experimental investigations. It is worth pointing out that a simple interpolation of the backcalculated modulus data can certainly estimate the value of E at any point on the runway, but it would predict the stiffness without any phenomenological relationship being considered. On the other hand, the proposed neural model predicts the AC modulus value at any point on the pavement but requires the measured deflections, their coordinates along the runway, and the stratigraphy as input. This approach preserves the fundamental relationship between the modulus and the mentioned variables, from a logical point of view.

Conclusions
This paper was focused on prediction of the stiffness modulus of a runway pavement AC layer, a critical element in pavement management, by means of an innovative methodological approach based on soft computing techniques, namely machine learning and data augmentation. Specifically, shallow neural networks, characterized by a three-layer architecture, and the modified Akima algorithm were adopted. The results of an experimental trial carried out on Runway 02/20 at Palermo airport by means of HWD, were used to develop three different predictive SNN models.
The spatial coordinates of the impact points and a categorical variable representative of the pavement section type were always included in the model's input vector. In addition to such variables, the first SNN model considered the measure δ . In the second predictive model, all deflections between δ and δ were also included. In the third and last neural model, the deflection measurements were replaced by the deflection basin parameters.
To optimize the number of hidden neurons and to select the best suited type of activation function, a grid search was carried out. Bayesian regularization algorithms and a It is worth pointing out that a simple interpolation of the backcalculated modulus data can certainly estimate the value of E AC at any point on the runway, but it would predict the stiffness without any phenomenological relationship being considered. On the other hand, the proposed neural model predicts the AC modulus value at any point on the pavement but requires the measured deflections, their coordinates along the runway, and the stratigraphy as input. This approach preserves the fundamental relationship between the modulus and the mentioned variables, from a logical point of view.

Conclusions
This paper was focused on prediction of the stiffness modulus of a runway pavement AC layer, a critical element in pavement management, by means of an innovative methodological approach based on soft computing techniques, namely machine learning and data augmentation. Specifically, shallow neural networks, characterized by a three-layer architecture, and the modified Akima algorithm were adopted. The results of an experimental trial carried out on Runway 02/20 at Palermo airport by means of HWD, were used to develop three different predictive SNN models.
The spatial coordinates of the impact points and a categorical variable representative of the pavement section type were always included in the model's input vector. In addition to such variables, the first SNN model considered the measure δ 0 . In the second predictive model, all deflections between δ 2 and δ 5 were also included. In the third and last neural model, the deflection measurements were replaced by the deflection basin parameters.
To optimize the number of hidden neurons and to select the best suited type of activation function, a grid search was carried out. Bayesian regularization algorithms and a k-fold cross-validation procedure was implemented in order to identify the model characterized by the best predictive performance.
The results showed that the best performing SNN model is the third one, characterized by 23 neurons in the hidden layer and a logistic sigmoid activation function. This model can produce very accurate stiffness modulus predictions with R and MSE values equal to 0.9864 and 0.0321, respectively. Moreover, in order to consider both the size of the available dataset and the number of parameters, the R 2 adj evaluation metric has been used; its value, equal to 0.9516, has fully confirmed the good performance of the predictive model.
The elaboration of the data by means of CSP ANN model produced a Pearson correlation coefficient equal to 0.9460. Conversely, thanks to the implementation of a data augmentation technique, the proposed model produces more accurate predictions. The R-value is equal to 0.9864, resulting in a 4% increase in the correlation coefficient.
The proposed soft computing approach, which is proven to be able to predict the stiffness modulus at any point on the runway, can become a crucial element in APMS, thus allowing optimization of the available maintenance budget, in order to improve safety and sustainability of airport infrastructures.
The proposed approach has been developed with respect to the Palermo runway, but it could be easily adopted to analyze any runway or paved area such as parking lots. So far, the study does not consider historical series of deflections: it would be interesting to implement this additional information in order to evaluate if the presented methodology is suitable not only for evaluating the current pavement deterioration state, but for predicting its evolution and scheduling intervention priorities over time as well. Funding: This research received funding from the ERDF Regional Operational Programme Sicily 2014-2020 (POR FESR Sicilia 2014-2020), Action 1.1.5-"SMARTEP".