Daily model simulations provide the meteorological forecasts in the areas where wind farms operate. Specifically, horizontal wind speed outputs from model grid points close to the examined wind farms are converted into power. The conversion uses a statistical regression model with non-polynomial equations fitted to observed wind speed and power data from a previous time interval (Stathopoulos et al., 2013), and it is applied dynamically at each forecasting step.

**Figure 2** Autocorrelation of wind speed and wind power error in time for **(a)** park A and **(b)** park C.

For the generation of the wind power probabilistic forecasts, the error distribution curve of the deterministic forecast from a previous time interval is utilized. The selection of the training period used is based on a prior analysis of the error characteristics. The following paragraphs address the points and requirements taken into consideration for the development of the methodology applied.

**Figure 3** Distribution of error in wind speed and wind power for several samples in park B.

The temporal dependency of the forecasting error (*e*_{i}), defined as the modelled value minus the observed value at the same time, can be quantified by the autocorrelation (*R*_{c}), which measures the similarity between a time series and its lagged version over successive time intervals, given as

At two different locations, the errors in both wind speed and wind power correlate poorly with their past values over long time horizons (Fig. 2). In the first case, the errors in wind speed and wind power retain some consistency for the first nine hours; the correlation then decreases gradually and reaches a minimum at the end of the daily cycle. At the other station, by contrast, the correlation over the same period decreases rapidly within the first three to six hours. The demonstrated cases thus indicate a mainly temporal, and to a lesser extent spatial, dependency in the forecasting errors.
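As an illustration of the autocorrelation analysis described above, the following minimal sketch computes the sample autocorrelation of an hourly error series. It assumes the standard biased estimator normalized by the lag-zero sum of squares, which may differ from the exact form used here; the function name `error_autocorrelation` and its arguments are hypothetical.

```python
import numpy as np

def error_autocorrelation(errors, max_lag):
    """Sample autocorrelation of a forecast-error series for lags 0..max_lag.

    Assumption: standard estimator, normalized by the lag-0 sum of squares.
    """
    e = np.asarray(errors, dtype=float)
    e = e - e.mean()                      # centre the series
    denom = np.sum(e * e)                 # lag-0 sum of squares
    return np.array([np.sum(e[: e.size - k] * e[k:]) / denom
                     for k in range(max_lag + 1)])
```

With hourly errors, `error_autocorrelation(errors, 24)` would give the decay of the correlation over one daily cycle, comparable to the curves in Fig. 2.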

**Figure 4** Probability density of wind power error in the different quantiles of Pn **(b)** and distributions of error for each quantile **(a)**, for park B.

**Figure 5** **(a)** Error distributions for the Prop and Ref methods; experimental and theoretical quantile-quantile error plots for **(b)** the Ref method and **(c)** the Prop method, in park D.

To define the optimum length of the previous period employed, the distributions of wind speed and wind power errors obtained from samples of different length are examined. Figure 3 presents the error distributions resulting from three sample periods: 20 days (480 samples), 60 days (1440 samples) and 100 days (2400 samples). Although the uncertainty in the wind power forecast is related to the quality of the forecasted prevailing wind conditions, the forecasting errors of wind speed and wind power form different distributions. Wind speed errors are normally distributed and cover a wide range of values. In contrast, the deviations in wind power are mainly concentrated within ±20 % of the nominal power (Pn), forming a more leptokurtic distribution. Moreover, the selection of a short historic period might exclude certain types and ranges of errors, which can mislead the construction of the forecasting densities.

Additionally, the shape of the deviations in different quantiles of wind power is assessed in order to examine the associated uncertainty at different power magnitudes. Figure 4 illustrates the frequency of error occurrence in the different quantiles of wind power, together with the distribution of the errors that emerge in each bin. Most of the errors occur in the ranges 0 %–20 % and 90 %–100 % of the nominal power. This suggests an increased uncertainty at low and near-maximum values of the energy yield.

The aforementioned points indicate that specific optimization techniques are necessary for deriving efficient probabilistic forecast methods. Error characteristics differ both spatially and temporally, so a multi-period and spatially diversified training process is required rather than a predetermined and uniform one. Therefore, a sample of forecast errors is considered whose distribution has a standard deviation above 30 % of Pn, a threshold that is chosen arbitrarily. The analysis starts with a period of twenty days, and the sample period is extended until this criterion is met. Moreover, since the uncertainty is enhanced at certain power magnitudes, the process is performed recursively in each range of Pn, using a 10 % interval.
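The training-period selection described above can be sketched as follows. This is only an illustration under stated assumptions: hourly data ordered oldest first, a window that grows by one day per iteration (the extension step is not specified in the text), and bins defined on the forecasted power. The name `training_errors_per_bin` and all of its parameters are hypothetical.

```python
import numpy as np

def training_errors_per_bin(errors, forecasts, pn, start_days=20, step_days=1,
                            std_threshold=0.30, bin_width=0.10,
                            samples_per_day=24):
    """For each power bin, extend the training window back in time until the
    standard deviation of the errors in that bin exceeds std_threshold * pn.

    errors, forecasts: 1-D hourly arrays, oldest first, in the units of pn.
    Returns a dict mapping bin index to the selected error sample.
    """
    e = np.asarray(errors, dtype=float)
    f = np.asarray(forecasts, dtype=float) / pn        # normalize by nominal power
    n_bins = int(round(1.0 / bin_width))
    bin_idx = np.clip(np.floor(f / bin_width).astype(int), 0, n_bins - 1)

    selected = {}
    for b in range(n_bins):
        n = start_days * samples_per_day               # initial twenty-day window
        while True:
            window = e[-n:][bin_idx[-n:] == b]         # recent errors falling in bin b
            met = window.size > 1 and np.std(window) > std_threshold * pn
            if met or n >= e.size:                     # criterion met or data exhausted
                break
            n += step_days * samples_per_day           # extend the sample period
        selected[b] = window
    return selected
```

If the criterion is never met for a bin, the sketch falls back to all available errors in that bin; how such cases are handled in practice is not stated in the text.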

In order to increase the symmetry of the error distribution, a non-linear Kalman filtering method is utilized. The Kalman filter is a dynamic approach that is also used in forecasting to improve the initial model prediction. In this case, however, its ability to isolate the random errors by eliminating the systematic ones is additionally exploited. Detailed descriptions and applications of the applied Kalman filter can be found in Galanis et al. (2006). The main associated processes are stated here. The relation between the model output *f*_{i}, the observed value *o*_{i} and the model error *e*_{i} = *f*_{i} − *o*_{i} at the same time *t*_{i} can be expressed in a non-linear form as:

with *r*_{i} the Gaussian non-systematic error and *a*_{j,i} the coefficients that need to be defined by the process. Once the systematic error is eliminated in the next forecasting interval (e.g. the next 12–48 h), the residual forecasting error is normally distributed with a near-zero mean value.
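A minimal sketch of such a filter is given below for illustration; it is not the implementation of Galanis et al. (2006). It assumes that the systematic error is a polynomial in *f*_{i} with coefficients *a*_{j,i} that evolve as a random walk, and that the noise variances `q` and `r` are fixed. The polynomial order, both variances and the function name `kalman_bias_filter` are hypothetical choices.

```python
import numpy as np

def kalman_bias_filter(forecasts, observations, order=2, q=1e-4, r=1.0):
    """Track polynomial bias coefficients and return bias-corrected forecasts.

    Assumed model: e_i = sum_j a_j * f_i**j + r_i, with a_j following a
    random walk (process variance q) and Gaussian noise r_i (variance r).
    """
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    x = np.zeros(order + 1)                  # coefficient estimates a_j
    P = np.eye(order + 1)                    # covariance of the estimates
    Q = q * np.eye(order + 1)
    predicted_bias = np.zeros_like(f)

    for i in range(f.size):
        H = f[i] ** np.arange(order + 1)     # regressors [1, f, f^2, ...]
        P = P + Q                            # predict step (random-walk state)
        predicted_bias[i] = H @ x            # systematic error expected before o_i is known
        S = H @ P @ H + r                    # innovation variance
        K = P @ H / S                        # Kalman gain
        x = x + K * ((f[i] - o[i]) - predicted_bias[i])   # update with the observed error
        P = P - np.outer(K, H) @ P
    return f - predicted_bias, x
```

The residuals between the corrected forecasts and the observations are the quantity that would then be used to build the error distribution.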

To extract the confidence intervals (CI), the empirical cumulative distribution of the deviation data is computed. The derived curve is approximated by a piecewise non-linear function, from which the CI are estimated. As a result, a nonparametric representation of the examined sample is obtained.
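The extraction of intervals from the empirical cumulative distribution can be illustrated with the sketch below. For simplicity it approximates the curve by piecewise linear interpolation rather than the piecewise non-linear function mentioned above, and it assumes central intervals and mid-point plotting positions. The function names and the `coverage` parameter are hypothetical.

```python
import numpy as np

def empirical_error_bounds(errors, coverage=0.90):
    """Lower and upper error quantiles of a central interval, read from the
    empirical CDF by linear interpolation (a simplification)."""
    e = np.sort(np.asarray(errors, dtype=float))
    p = (np.arange(1, e.size + 1) - 0.5) / e.size     # mid-point plotting positions
    lo, hi = (1.0 - coverage) / 2.0, (1.0 + coverage) / 2.0
    e_lo, e_hi = np.interp([lo, hi], p, e)
    return e_lo, e_hi

def prediction_interval(forecast, errors, coverage=0.90):
    """Bounds on the observation. Since e = f - o, it follows that o = f - e."""
    e_lo, e_hi = empirical_error_bounds(errors, coverage)
    return forecast - e_hi, forecast - e_lo
```

Because the error is defined as forecast minus observation, the upper error quantile sets the lower bound of the interval and vice versa.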

To evaluate the proposed method, termed Prop, other probabilistic forecasting methods are applied for comparison, using the same time interval of past error values. A reference method (Ref) calculates the intervals from the model error without any further processing. In addition, a persistence-based method (Pers) adds and subtracts the standard deviation of the observations from previous hours to and from each upcoming point prediction, and a constant method (Const) performs the same operation with 20 % of the nominal power. This value was selected in order to cover the mean absolute differences between modelled and observed power values, normalized with the nominal power of each park: 17.25 % for Park A, 14.3 % for Park B, 13 % for Park C and 18.1 % for Park D, as calculated for the whole examination period.
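The two simple benchmark intervals described above can be sketched as follows. The number of previous hours used by Pers is not stated, so the recent observations are passed in as an argument; the function names are hypothetical, and no clipping to the physical range of the park is applied.

```python
import numpy as np

def const_interval(forecast, pn, fraction=0.20):
    """Const: point forecast plus and minus a fixed fraction of nominal power."""
    half_width = fraction * pn
    return forecast - half_width, forecast + half_width

def pers_interval(forecast, recent_observations):
    """Pers: point forecast plus and minus the standard deviation of the
    observations from previous hours (window length chosen by the caller)."""
    half_width = np.std(np.asarray(recent_observations, dtype=float))
    return forecast - half_width, forecast + half_width
```

For a park with a nominal power of 100 units and a point forecast of 50, `const_interval(50, 100)` spans 30 to 70.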

The forecast probabilities are evaluated with the reliability (PIr) and the continuous ranked probability score (CRPS). Let the indicator *S*_{i} at time *t*_{i} equal unity if the observed value lies inside the prediction bounds and zero otherwise. The reliability at a level of significance *a* is then the number of successful cases divided by the total number *N* of samples:

$$\mathrm{PIr} = \frac{1}{N}\sum_{i=1}^{N} S_{i}$$

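The continuous ranked probability score combines the reliability and the sharpness of the prediction intervals. It is the integral of the squared difference between the modelled *F*(y) and the observed *F*_{o}(y) cumulative distributions:

$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[F(y) - F_{o}(y)\right]^{2}\,\mathrm{d}y$$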

where *F*_{o}(y) equals unity when the forecast variable *y* is above the observed value and zero otherwise. The score is negatively oriented, so smaller values indicate better predictions.
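Both scores can be illustrated with the short sketch below. It assumes inclusive prediction bounds for the reliability and, for the CRPS, a forecast distribution represented by a finite sample and integrated numerically on a user-supplied grid with the trapezoidal rule. The function names and the `grid` argument are hypothetical.

```python
import numpy as np

def reliability(lower, upper, observed):
    """Fraction of observations that fall inside the prediction bounds."""
    lower, upper, observed = (np.asarray(v, dtype=float)
                              for v in (lower, upper, observed))
    inside = (observed >= lower) & (observed <= upper)    # indicator S_i
    return inside.mean()

def crps_from_samples(samples, observed, grid):
    """Numerically integrate (F(y) - F_o(y))**2 over the grid of y values."""
    s = np.sort(np.asarray(samples, dtype=float))
    y = np.asarray(grid, dtype=float)
    F = np.searchsorted(s, y, side="right") / s.size      # empirical forecast CDF
    F_o = (y >= observed).astype(float)                   # step function at the observation
    sq = (F - F_o) ** 2
    return np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(y))  # trapezoidal rule
```

The grid should extend beyond both the forecast sample and the observation, since the integrand vanishes only outside that range. Averaging `crps_from_samples` over all forecast times would give a single score per method.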