The prediction of the wind speed at different heights by machine learning methods

In Turkey, many entrepreneurs started to invest in renewable energy systems after new legal regulations and stimulus packages for renewable energy production were introduced. Among the many alternatives, electricity production via wind farms is one of the leading systems. For these systems, the wind speed values measured prior to the establishment of the farms are extremely important both in decision making and in the projection of the investment. However, measuring the wind speed at different heights is a time-consuming and expensive process. For this reason, the success of techniques that predict wind speeds is important for fast and reliable decision making on wind farm investments. In this study, annual wind speed values measured at two different heights in Kutahya, one of the regions in Turkey with wind energy potential, were used: from the speed values at 10 m, the wind speed values at 30 m were predicted by seven different machine learning methods. The results of the analysis were compared with each other. The results show that the support vector machine is a successful technique for predicting wind speed at different heights.


Introduction
Renewable energy technologies such as solar, wind, biomass and geothermal are becoming more important for the future of countries, since they are local resources and inexhaustible sources of energy [1]. Like other countries, Turkey is making progress in the use of renewable energy. Within this scope, the 2015-2019 Strategic Plan of the Ministry of Energy and Natural Resources (2014) intends to increase the share of renewable energy resources in primary energy supply and electricity generation. The plan aims to increase the installed wind power capacity from 5600 MW in 2015 to 10000 MW in 2019 [2]. To achieve these targets, the government introduced new stimulus packages and eased some conditions for renewable energy investments. This is quite encouraging for entrepreneurs who wish to invest in this field.
Wind is available virtually everywhere on earth, although there are wide variations in wind strength. Wind energy is being developed in the industrialized world for environmental reasons, and it is attractive in the developing world because it can be installed quickly in areas where electricity is urgently needed [3]. In Turkey, the production of grid-connected electricity from wind energy started in 1998 and has increased by roughly one fold each year since 2006 (see Figure 1). As seen from Figure 2, in 2015 the total wind power capacity reached 4718.3 MW [4].
Wind speed is one of the most important parameters in determining the wind energy potential of a region. For this reason, in a candidate region, wind speed data are measured hourly and recorded for one year, and these data are used to assess the wind potential of that region. For this purpose, the measurement station is placed at a point that is representative of the field. In the farm field, the height of the measurement station, which is located perpendicular to the direction of the dominating wind, is commonly two-thirds of the height of the wind turbine. The measurements can be performed at different heights of the observation pole, e.g. 10 m, 30 m and 50 m. These measurements are necessary for making an investment decision. However, as they are long-term and expensive, they add extra cost and delay the investment. For this reason, successful wind speed prediction methods for different heights could offer a fast, reliable and cost-effective way to plan the investment well in advance. The wind energy industry depends on wind speed forecasts to help determine facility location, facility layout, and the optimal use of turbines in day-to-day operations [5]. There are physical, statistical, artificial neural network and hybrid methods for the prediction of wind speed. In recent years especially, artificial intelligence techniques such as artificial neural networks (ANN), fuzzy logic and support vector machines (SVM), as well as hybrids of these methods, have been widely used in the prediction of wind speeds. In a review study presenting the previous studies on the prediction of the wind speed and the energy produced, [11].
To the best of our knowledge, this is the first study that handles the wind speed prediction problem via seven different algorithms. The remainder of this paper is organized as follows. In Section 2, the machine learning regression models used in wind speed forecasting are introduced. In Section 3, the data set is described, and in Section 4 the forecasting models are compared and evaluated. Finally, Section 5 presents the main conclusions of the paper.

Machine learning regression methods
Machine learning (ML) regression methods predict an unknown dependency between the inputs and the output from a dataset [12]. WEKA is a comprehensive collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization [13]. In this study the algorithms were applied directly on the WEKA platform; we used three categories of the WEKA 3.6.13 platform: Functions, Lazy-learning, and Tree-based learning algorithms. Functions comprise algorithms that are based on mathematical models. Lazy-learning algorithms defer processing of the training data until a query must be answered: they keep the training data in memory and retrieve the relevant data from the database to answer a specific query. Tree-based learning algorithms make predictions via a tree structure. Leaves of the trees represent classifications and branches represent conjunctions of features. A brief summary of the methods used in this paper is presented in Table 1.

Support vector regression (SVMr)
The foundations of support vector machines (SVMs) were developed by Vapnik [14], and SVMs have been increasingly used in different forecasting problems. Successful forecasting studies have been performed with support vector regression (SVMr) in different fields such as production forecasting [15], traffic flow speed forecasting [16] and financial time series forecasting [17,18]. SVMr has also been used as a predictor of wind speed [8,19]. The SVMr formulation is given below [20,21,22]. The simplest classification problem is the two-class linearly separable case. Assume that there is a training set with $l$ points:

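$$(x_1, y_1), \ldots, (x_l, y_l), \quad y_i \in \{-1, +1\} \tag{1}$$

Suppose that the hyperplanes that separate the two classes can be written as

$$w \cdot x + b = 0 \tag{2}$$

where $w$ is the weight vector, which is normal to the hyperplane, and $b$ is the threshold value. In the simplest linearly separable case, we seek the "largest margin". The margin borders can be formulated as

$$w \cdot x_i + b \ge +1 \ \text{for } y_i = +1, \qquad w \cdot x_i + b \le -1 \ \text{for } y_i = -1 \tag{3}$$

Eq. (3) can be generalized as

$$y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, l \tag{4}$$

The distance between the two margin borders is

$$d = \frac{2}{\|w\|} \tag{5}$$

Here $\|w\|$ is the Euclidean norm of $w$. According to the theory, to determine a unique solution by finding the optimal hyperplane, $d$ must be maximized. To calculate the optimal hyperplane we have to minimize

$$\frac{1}{2}\|w\|^2 \tag{6}$$

subject to Eq. (3). This quadratic optimization problem can be solved with Lagrange multipliers:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] \tag{7}$$

Eq. (7) is a Lagrangian where $w$ and $b$ are the primal variables and $\alpha_i$ are the dual variables. To find the optimal solution of the primal optimization problem (Eq. 7), the Lagrangian has to be minimized with respect to the primal variables $w$ and $b$:

$$\frac{\partial L(w, b, \alpha)}{\partial b} = 0 \tag{8}$$

$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0 \tag{9}$$

Carrying out these differentiations gives Eqs. (10) and (11):

$$\sum_{i=1}^{l} \alpha_i y_i = 0 \tag{10}$$

$$w = \sum_{i=1}^{l} \alpha_i y_i x_i \tag{11}$$

Using the generalized method of Lagrange multipliers known as the Karush-Kuhn-Tucker conditions, we obtain

$$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0, \quad i = 1, \ldots, l \tag{12}$$

where the points with $\alpha_i \neq 0$ satisfy Eq. (4) with equality. Those points, the subset of the training data with non-zero Lagrange multipliers, are called support vectors.

Substituting Eqs. (10) and (11) into the Lagrangian of Eq. (7) eliminates the primal variables $w$ and $b$, so that only the dual variables remain. The problem is now a dual optimization problem, which can be solved as shown below:

$$\text{Maximize} \quad W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{13}$$

subject to Eq. (10) and $\alpha_i \ge 0$.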
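As a small illustration of the dual formulation, the sketch below checks a hand-solved two-point problem. It is not the WEKA implementation used in this study: the toy points, the multiplier values and all function names are assumptions chosen so that the result can be verified by hand. It recovers the weight vector from the Lagrange multipliers, derives the threshold from a support vector, and confirms that the Karush-Kuhn-Tucker conditions hold and that the dual and primal objectives coincide.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weight_vector(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x_i (stationarity of the Lagrangian in w)."""
    dim = len(xs[0])
    return [sum(a * y * x[k] for a, y, x in zip(alphas, ys, xs))
            for k in range(dim)]

def bias_from_support_vector(w, x_sv, y_sv):
    """For a support vector y*(w.x + b) = 1 and y is +1 or -1, so b = y - w.x."""
    return y_sv - dot(w, x_sv)

def dual_objective(alphas, ys, xs):
    """sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(alphas)
    quadratic = sum(alphas[i] * alphas[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
                    for i in range(n) for j in range(n))
    return sum(alphas) - 0.5 * quadratic

# Hypothetical toy data: one point per class, symmetric about the origin.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
# Multipliers obtained by hand: maximizing 2a - 4a^2 gives a = 0.25.
alphas = [0.25, 0.25]

w = weight_vector(alphas, ys, xs)
b = bias_from_support_vector(w, xs[0], ys[0])
margin = 2.0 / math.sqrt(dot(w, w))
kkt_residuals = [a * (y * (dot(w, x) + b) - 1.0)
                 for a, y, x in zip(alphas, ys, xs)]
dual_value = dual_objective(alphas, ys, xs)
primal_value = 0.5 * dot(w, w)

print("w =", w, "b =", b, "margin =", margin)
print("dual =", dual_value, "primal =", primal_value)
```

Both points are support vectors here, so the margin equals the distance between them. For real data the multipliers would come from a quadratic programming solver rather than from a hand calculation.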

Multi layer perceptron (MLP)
The MLP is a feed-forward artificial neural network (ANN) trained with the back-propagation algorithm. It consists of neurons with weighted interconnections, where signals always travel in the direction of the output layer. The network maps sets of input data onto a set of appropriate outputs through hidden layers. The input layer passes the input signals to the hidden layer without performing any operation. The hidden and output layers then multiply their input signals by a set of weights and transform the results, either linearly or non-linearly, into output values. Each connection between units in successive layers has an associated weight. These weights are optimized to achieve reasonable prediction accuracy [23,24]. A typical MLP with one hidden layer can be described mathematically by Eqs. (14)-(18) [25,26,27]. Eq. (14) defines the sum of the products of the inputs ($X_i$) and the weights ($a_{ij}$), plus a bias term of the hidden layer ($a_{0j}$). In Eq. (15), the outputs of the hidden layer ($Z_j$) are obtained by transforming the sum defined in Eq. (14) with the activation function $g$. The most widely used activation function is the sigmoid function [28], which is defined in Eq. (16) for input $x$. The hidden and output layers are based on this sigmoid function. In Eq. (18), the outputs of the output layer ($Y_k$) are obtained by transforming the sum calculated in Eq. (17) with the sigmoid function $g$ defined in Eq. (16).
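The forward pass described above can be sketched in a few lines. This is an illustrative reading of the text, not the WEKA MultilayerPerceptron code: the function names, the weight layout and the example weights are assumptions, and both layers use the sigmoid because that is how the text describes them.

```python
import math

def sigmoid(x):
    """Sigmoid activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Weighted sum plus bias for each unit, followed by the sigmoid.

    weights[i][j] is the weight from input i to unit j (assumed layout).
    """
    outputs = []
    for j, bias in enumerate(biases):
        total = bias + sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(sigmoid(total))
    return outputs

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer: inputs -> hidden outputs Z -> network outputs Y."""
    z = layer_forward(x, hidden_w, hidden_b)
    return layer_forward(z, out_w, out_b)

# Hypothetical network: 1 input, 2 hidden units, 1 output.
hidden_w = [[1.0, -1.0]]      # weights from the single input to 2 hidden units
hidden_b = [0.0, 0.0]
out_w = [[2.0], [-2.0]]       # weights from 2 hidden units to the output
out_b = [0.0]

y_example = mlp_forward([0.0], hidden_w, hidden_b, out_w, out_b)
print(y_example)
```

Because the sigmoid output lies between 0 and 1, a regression target such as wind speed would have to be scaled to that range before training and rescaled afterwards.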

Radial basis function neural networks (RBFNetwork)
An RBFNetwork is a type of feed-forward neural network composed of three layers: input, hidden and output. The computations between the input and hidden layers are nonlinear, whereas those between the hidden and output layers are linear. An RBFNetwork can build both regression and classification models [29]. It differs from an MLP in the way the hidden layer units perform calculations. In an RBFNetwork, inputs from the input layer are mapped to each of the hidden units. The hidden units use radial functions, such as the bell-shaped Gaussian function, for activation. The activation $h(x)$ of the Gaussian function for a given input $x$ decreases monotonically as the distance between $x$ and the center $c$ of the Gaussian function increases. The most general RBFNetwork can be defined mathematically as below [30]:

$$h(x) = \phi\left((x - c)^{T} R^{-1} (x - c)\right) \tag{20}$$

where $c$ is the center, $R$ is the metric and $\phi$ is the function. The metric is often Euclidean, so that $R = r^{2} I$ for some scalar radius $r$, and Eq. (20) simplifies to

$$h(x) = \phi\left(\frac{(x - c)^{T}(x - c)}{r^{2}}\right)$$

A further simplification is a one-dimensional input space, in which case

$$h(x) = \phi\left(\frac{(x - c)^{2}}{r^{2}}\right)$$

The Gaussian function is $\phi(z) = e^{-z}$. Therefore a typical radial function is the Gaussian which, in the case of a scalar input, is

$$h(x) = \exp\left(-\frac{(x - c)^{2}}{r^{2}}\right)$$
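The scalar Gaussian radial function and the linear output layer can be illustrated with a minimal sketch. The centers, radii and output weights below are hypothetical values chosen for the example; in practice they would be fitted to data (for instance, centers obtained by clustering), which this sketch does not attempt.

```python
import math

def gaussian_rbf(x, center, radius):
    """Scalar Gaussian radial function: exp(-(x - c)^2 / r^2)."""
    return math.exp(-((x - center) ** 2) / radius ** 2)

def rbf_network(x, centers, radii, weights, bias=0.0):
    """Linear combination of the hidden-unit activations (linear output layer)."""
    return bias + sum(w * gaussian_rbf(x, c, r)
                      for c, r, w in zip(centers, radii, weights))

# Hypothetical hidden layer with two Gaussian units.
centers = [3.0, 7.0]
radii = [2.0, 2.0]
weights = [4.0, 9.0]

# The activation is 1 at the center and decays monotonically with distance.
for x in (3.0, 4.0, 5.0):
    print(x, gaussian_rbf(x, center=3.0, radius=2.0))

print(rbf_network(3.0, centers, radii, weights))
```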

KStar (K*)
KStar is an instance-based classifier that can also be used for regression problems [31]. It uses an entropic distance measure, based on the probability of transforming one instance into another by randomly selecting among all possible transformations. Using entropy as a measure of distance has several benefits. Handling missing values poses a problem for classifiers. Usually, missing values are treated as a separate value, regarded as maximally different, replaced by the average value, or simply ignored. An entropy-based classifier offers a solution to such issues [32].

Locally weighted learning (LWL)
LWL uses an instance-based algorithm that assigns weights to instances. This algorithm can perform both classification and regression [33].
The basic idea of LWL is that any non-linearity can be approximated by a linear model if the output surface is smooth. Therefore, instead of looking for a complex global model, it is simpler to approximate non-linear functions by using individual local models [34].
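The idea of fitting a local linear model around each query can be sketched as a weighted least-squares fit for a single input variable. This is an illustration of the principle only: the function names, the bandwidth parameter and the toy data are assumptions, and the WEKA LWL implementation may use a different base learner and neighbourhood rule. The linear weighting function gives a training point a weight that falls linearly from 1 at the query to 0 at the bandwidth.

```python
def linear_kernel(distance, bandwidth):
    """Linear weighting: 1 at the query point, 0 at or beyond the bandwidth."""
    return max(0.0, 1.0 - distance / bandwidth)

def lwl_predict(query, xs, ys, bandwidth):
    """Fit a weighted straight line around `query` and evaluate it there."""
    ws = [linear_kernel(abs(x - query), bandwidth) for x in xs]
    total = sum(ws)
    if total == 0.0:
        raise ValueError("no training point lies within the bandwidth")
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    if sxx == 0.0:
        # Only one distinct x has weight: fall back to the weighted mean.
        return mean_y
    slope = sxy / sxx
    return mean_y + slope * (query - mean_x)

# Hypothetical, exactly linear toy data: y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]

print(lwl_predict(2.5, xs, ys, bandwidth=2.0))
```

On exactly linear data every local fit recovers the same line, which makes the sketch easy to check; on curved data each query would get its own slope and intercept.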

DecisionStump
DecisionStump constructs one-level binary decision trees for datasets with a categorical or numeric class, handling missing values by treating them as a separate value and extending a third branch from the stump [35]. Depending on the type of data to be predicted, it performs (i) regression based on mean-squared error or (ii) classification based on entropy [36]. It finds the single attribute that provides the best discrimination between the classes and then bases future predictions on this attribute [37].
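For a numeric target and a single numeric attribute, a regression stump can be sketched as a search for the threshold that minimizes the squared error of the two leaf means. The code below is a minimal sketch under that assumption: the function names and the toy data are hypothetical, and the third branch for missing values described above is omitted.

```python
def mean(values):
    return sum(values) / len(values)

def sum_squared_error(values):
    """Squared error of predicting the mean for every value."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical toy data with an obvious split between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]

stump = fit_stump(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11.5))
```

A stump can output only two distinct values, which helps explain why such a model is a weak predictor for a continuous quantity like wind speed.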

RandomTree
RandomTree is also a regression-based decision tree algorithm. Trees built by RandomTree consider a set of randomly selected attributes at each node. It performs no pruning. It also has an option to allow the estimation of class probabilities based on a hold-out set (backfitting) [35].

The data set
In this study, twelve months of data measured at heights of 10 m and 30 m were used. Measurements were generally taken at heights of 10-30 m from the ground [1]. An annual set of data, collected for a wind farm that is planned to be established in Kutahya, was investigated. Kutahya is a region of Turkey that has wind power potential. Using the wind speed values obtained at 10 m, the wind speed values at 30 m were predicted by the SVMr, MLP (the most commonly used technique in wind speed prediction), RBFNetwork, K*, LWL, DecisionStump and RandomTree techniques. The prediction results were compared with each other and with the actual wind speed.
From the whole data set, the first eleven months of the year were used for the training stage and one month (December) was used to validate the results obtained. Daily averaged data were used. The data collected in the first five days of June were excluded from the calculations because the station was under maintenance in that period. The data are summarized in

Findings
In this paper, a comparative assessment of wind speed prediction has been performed with seven machine learning regression methods. The forecasting results were compared with each other and with the actual data. For wind speed prediction, 331 days of wind speed data measured at 10 m and 30 m were used for training, while the 31 days of December were used for testing. After many different trials for each model, the polynomial kernel was selected for SVMr, with the parameter p and the complexity coefficient C both set to 1. For the MLP, the learning rate was L=0.3, the momentum was M=0.2, the number of training epochs was N=500 and the hidden layer number was H=2. For the RBFNetwork, the minimum standard deviation was 0.1, the random seed was 1 and the number of clusters was 2. For RandomTree, the number of randomly chosen attributes K and the maximum depth were both set to 0 (a maximum depth of 0 means unlimited), the minimum total weight was 1, the random seed was 1, and unclassified instances were not allowed. The global blending parameter was set to 20 for K*, and a linear weighting function was selected for LWL.
The wind speed forecasting comparison results are presented in Figure 3 as a bar chart with lower and upper bounds and in Figure 4 as a cumulative line chart; the numerical descriptors are depicted with the boxplots in Figure 5. The boxes indicate the interquartile ranges, the whiskers show the 5th and 95th percentiles of the observed and predicted data, and the horizontal line within each box indicates the median. Skewness, a description of the asymmetry of the wind speed distribution, is also visible in Figure 5. Typically, wind speed data are positively skewed, placing the mean in the upper half of the data, i.e., they have a long right tail. This means that large positive deviations from the mean wind speed are more frequent than negative deviations of the same magnitude, because wind speed values are bounded on one side. The degree of positive skewness illustrates that wind speed typically occurs as many small events with a few large events that elevate the mean. The predictions fit the observed data well. The MLP and RandomTree did a fairly good job of capturing the observed data; however, the overall performance of SVMr is the best when compared with the patterns of the observed wind speed data. Overall, the results suggest that the SVMr forecasts are more realistic than those of the other six methods.

Result
Within the scope of the study, wind speed predictions were performed with seven machine learning regression methods. To the best of our knowledge, this is the first comparative study comprehensive enough to handle the wind speed prediction problem via seven different algorithms. All seven methods were compared, and it is shown that three of them, SVMr, MLP and RandomTree, are more successful in wind speed forecasting than the other four. When the methods are compared, the correlations between the wind speed at 30 m and the prediction results are very close to each other for these three techniques. As seen from Figure 5, the wind speed predictions of the SVMr, MLP and RandomTree methods were fairly accurate. When statistical criteria are applied, however, the mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE) and root relative squared error (RRSE) values are much smaller for the SVMr technique. Thus, it can be stated that, in this sample study, SVMr performs better than the others.
The study shows that three methods are quite successful in predicting wind speeds and that the predicted values are very close to the real measurements. For this reason, it can be stated that wind speed predictions for different heights made by SVMr, MLP and RandomTree may help in decision making for the establishment of wind farms and in wind farm planning activities. Finding a suitable and accurate wind speed predictor is crucial in wind energy applications. The prediction success of SVMr was found to be more satisfactory than that of the other methods. It is concluded that SVMs can be used effectively by researchers and investors as an alternative method for predicting wind speed.
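The four error measures named above can be computed as in the following sketch. The definitions are the common textbook ones, with the two relative measures normalized by the error of always predicting the mean of the actual values; that baseline is an assumption here, and the WEKA report may use a different baseline and expresses RAE and RRSE as percentages. The numbers in the example are hypothetical and are not taken from the study.

```python
import math

def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline errors of a predictor that always outputs the mean.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_errors) / n,
        "RMSE": math.sqrt(sum(sq_errors) / n),
        "RAE": sum(abs_errors) / base_abs,
        "RRSE": math.sqrt(sum(sq_errors) / base_sq),
    }

# Hypothetical wind speeds in m/s, for illustration only.
actual = [4.0, 6.0, 8.0]
predicted = [5.0, 6.0, 7.0]

metrics = error_metrics(actual, predicted)
print(metrics)
```

RAE and RRSE below 1 indicate that a model does better than the constant mean predictor, which makes them convenient for comparing methods on the same test month.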

Figure 3. Bar chart comparison of actual wind speed with lower and upper bounds.

Figure 4. Cumulative line chart comparison with machine learning regression methods.

Figure 5. Boxplot comparison of actual wind speed with machine learning regression methods.
Lei et al. state that artificial intelligence techniques are more successful than traditional techniques, and that the hybrid models emerging nowadays are, of course, more advanced and have less error than the others [6]. Mohandes et al. carried out a study based on an ANN and an autoregressive (AR) model; the results indicated that the ANN was superior to the AR model [7]. Mohandes et al.

Table 1. Regression methods used in this paper.
Table 2 monthly used in this paper.

Table 3. Comparison of statistical measures.