Combining regional rainfall frequency analysis and rainfall-runoff modelling to derive frequency distributions of peak flows in ungauged basins : a proposal for Sicily region ( Italy )

In the present study an attempt is made to provide a general Monte Carlo approach for deriving flood frequency curves in ungauged basins in Sicily region (Italy). The proposed procedure consists of (i) a regional frequency analysis of extreme rainfall series, combined with Huff curves-based synthetic hyetographs, for design storms and (ii) a rainfallrunoff model, based on the Time-Area technique, to generate synthetic hydrographs. Validation of the procedure is carried out on four gauged river basins in Sicily region (Italy), where synthetic peak flow frequency curves, obtained by simulating 1000 flood events, are compared with observed values. Results of the application reveal that the proposed Monte Carlo approach is suitable to reproduce with reasonable accuracy the hydrologic response of the investigated basins. Given its relative simplicity, the developed procedure can be easily extended to poorly gauged or ungauged basins.


Introduction
Derivation of frequency distributions of peak flows is extremely important for many practical applications in engineering, such as: designing hydraulic structures, water resources planning and management, flood risk assessment and floodplain management.Hydrologic modelling in ungauged basins consists in indirect methods based on simplified description of flood formation process.At the beginning of the 90s the Italian National Research Group for the Prevention of Hydro-Geological Disaster (GNDCI), developed uniform procedures to estimate intense rainfall and peak flow at regional level, under the so-called VAPI project.The results of the VAPI project for Sicily region have been published by Cannarozzo et al. (1993Cannarozzo et al. ( , 1995)).In spite of some recent efforts to update the VAPI study for Sicily (Ferro and Porto, 2006;Noto and La Loggia, 2009), currently there is no consensus on a single standard procedure.
To this end, the present study aims to provide a general Monte Carlo approach for deriving peak flow frequency curves in ungauged basins in Sicily.The proposed procedure is a combination of statistical methods for deriving the design storms and simulation techniques, through an event-based rainfall-runoff model, to generate synthetic hydrographs.In particular, a regional analysis of extreme rainfall, based on the index flood-type method (Dalrymple, 1960), is carried out to derive total storm depths of different return periods.Then, such storm depths are distributed in time according to synthetic hyetographs generated by the Huff curves method (Huff, 1967), which provides a probabilistic representation of accumulated storm depths for corresponding accumulated storm durations expressed in dimensionless form.Estimation of effective storm depths is made through the Curve Number (CN) method developed by the US Natural Resources Conservation Service (NRCS), formerly known as the Soil Conservation Service (SCS).Capitalizing on approaches proposed in previous studies (De Michele and Salvadori, 2002;Aronica and Candela, 2007), the Antecedent Moisture Conditions (AMC) in CN method are treated as a random variable with a discrete probability distribution, in order to relax the iso-frequency assumption between the design storms and the resulting hydrographs.Finally, a rainfall-runoff analysis built upon the Time-Area (TA) concept (Clark, 1945) is used as flood routing technique.The proposed methodology is then validated on four gauged river basins in Sicily region, by comparing synthetic versus observed peak flow frequency curves.

Data and methods
Figure 1 illustrates the structure of the proposed Monte Carlo approach for deriving frequency distributions of peak flows in ungauged basins.The procedure consists of a cascade of three different modules, i.e.: a rainfall generator module providing design gross rainfall hyetographs, a module for hydrologic losses computation to transform gross rainfall into excess (net) rainfall hyetographs and a flood routing module whose outputs are synthetic surface runoff hydrographs.Each module is described in details in the following subsections.
It is worth underlining that all the models included in the methodological framework can perform either as lumped or distributed models, by considering respectively uniform or spatially variable excess rainfall and catchment characteristics.In this paper, lumped models are used for the application of the procedure to the illustrative case studies.

Case studies and data
The procedure is validated on four gauged river basins in Sicily region, Italy (see Fig. 2), namely: Belice at Ponte Belice, Imera at Drasi, Salso at Ponte Gagliano and Alcantara at Moio.
As most of the arid or semi-arid Mediterranean river basins, they are characterized by ephemeral stream regimes, with large floods in autumn and winter and low flows in sum-mer, with the only exception of the Alcantara river basin, usually characterized by perennial surface flows, enriched by spring water arising from the big aquifer of the Etna volcano.
The rainfall generator module is calibrated on historical annual maxima rainfall (AMR) series of short duration, i.e. 1, 3, 6, 12 and 24 h, retrieved by a dense network of mechanical recording rain gauges operated by the Water Observatory of Sicily Region (www.osservatorioacque.it)since the end of the 20 s, and a 10 min time-resolution rainfall dataset retrieved by the telemetry network of the Sicilian Agrometeorological Information System (http://www.sias.regione.sicilia.it/), in operation since 2002.Data are regionally pooled with respect to homogeneous regions, as described in Sect.2.2.
Implementation of the TA technique in the flood routing module is based on topographic data derived from 100 m grid resolution Digital Elevation Models (DEM) of the basins, through Geographic Information System (GIS) functions.
The performance of the methodology is assessed by comparing the outputs of Monte Carlo simulation with historical annual maxima peak flow series available at the streamflow gauges located at the catchments' outlet cross-sections (see Table 1).

Rainfall generator module
The rainfall generator module encompasses two submodules.The first sub-module is based on a regional frequency analysis of AMR series for synthetic generation of long series of total storm rainfall depths.It implies the inclusion of regional pooled data samples having similar frequency distributions in the frequency analysis.This allows the implementation of multiparameter probability distributions, usually more suitable in modelling extreme events than traditional two parameter distributions.
This procedure first requires identification of homogeneous regions, namely a set of sites whose frequency distributions can be considered to be approximately the same, apart from a scale factor.In particular, given a generic site i within a homogeneous region, the quantile estimate of a T -year rainfall depth at time t and site i, can be expressed as: where y (t, T ) is a common dimensionless quantile function (i.e.growth curve) and η i (t) is a site specific scale factor, Figure 2. Location of the investigated river basins and homogeneous regions derived from the regional rainfall frequency analysis carried out by Bonaccorso and Aronica (2016).
also called index value, commonly the sample mean or median.In case of ungauged site, the latter can be estimated from maps derived by spatially interpolating available local values.
Several methods of forming regions are available in literature, among which cluster analysis is one of the most widely applied to regional frequency analysis (Muñoz-Diaz and Rodrigo, 2004;Noto and La Loggia, 2009;Yang et al., 2010;Ngongondo et al., 2011).A further step is to test whether the identified regions can be accepted as being homogeneous.To this end the readers may refer to Viglione et al. (2007) and references therein.
Once that homogeneous regions are identified, AMR series of the sites in each region can be pooled together and standardized with respect to the corresponding index values.Standardized data are then used to fit the parameters of different candidate probability distributions by using the L-moment method (Hosking and Wallis, 1997).Application of goodness of fit methods allows selecting the most suitable probability distribution for the available data.Finally, one can derive the corresponding growth curve y (t, T ) as the inverse cumulative distribution function.
Computation of the total rainfall depth of the T -year design storm through Eq. ( 1), requires the definition of a critical duration, namely the duration maximizing the resulting peak flow.In the present study, the time of concentration t c of each river basin is used as the critical rainfall duration.In particular, this latter parameter is calculated according to Giandotti's empirical formula (Ferro, 2002): where A [km 2 ] is the area of the basin, L [km] is the maximum flow travel length and H [m s.m.m.] is the basin mean elevation.
The second sub-module, calibrated on 10 min rainfall data, consists of a simple method to define the temporal pattern of each simulated rainfall event at site i, whose total depth is then y i (t c , T ).For the sake of brevity, the term T is dropped in the following equations.A common approach is to use synthetic hyetograph of fixed pattern of time distribution of rainfall intensities within a period.To this end, we make use of the probabilistic representation of accumulated storm depths for corresponding accumulated storm durations expressed in dimensionless form, namely the mass curve concept developed by Huff (1967) and recalled by Garcia-Guzman and Aranda-Oliver (1993), Bonta (2004) and Candela et al. (2014), among others.Following this approach, rainfall variability within a rainy period can be expressed as: where h(s) is the rainfall intensity at time s, d is a fraction of the total rainfall duration t and V is the total rainfall depth (or total volume).The most important advantage of this procedure is the separation of rainfall intensity from total rainfall In principle, any continuous distribution density function ranging from 0 and 1 could be used to model H(d).In the present study, for each dimensionless time step t j , a uniform sampling of 1000 H(d) values between 5 and 95 % cumulative frequencies of occurrence is carried out.The 1000 isopleths, drawn by connecting dimensionless depths corresponding to identical assigned frequencies by straight lines, represent the storm mass curves originating 1000 synthetic hyetographs.
Prior to characterizing and developing frequency distributions of storm duration and depth data, storms must be identified and separated within rainfall data to form the underlying database for developing a conceptual and mathematical model of within-storm intensities.Several methods exist to identify storms, however, often a minimum dry period is used.Although such minimum dry period can depend on the time of year, often a constant, arbitrary inter-event time is used to separate storms.In the present study, storm events from the available dataset are selected by assuming an interevent time equal to 6 h, as suggested by Huff (1967).
It should be emphasized that from a statistical standpoint it would be advisable only to work with very long sequences.However, in semiarid climates, short duration rains of convective origin are rather frequent.Consequently, a balance between data quality and loss of information has to be found.Since rainfall data used in this module are recorded at fixed time interval of 10 min, we decide to disregard all the storm events lasting less than 100 min, which can provide too biased information about the 10 min rainfall.

Hydrologic losses module
The hydrologic losses module allows to transform gross rainfall into effective rainfall.In particular, the SCS-CN method, adopted by US Natural Resources Conservation Service, is used for this purpose.This method allows to incorporate information on land use changes, as CN is a function of soil type, land use, soil cover condition and degree of saturation of the soil before the start of the storm, ranging from 0 (for fully permeable soil) to 100 (for completely impervious soil).
It is worth reminding that initial soil moisture conditions represent a major parameter in the surface runoff generation, since this condition controls infiltration capacity (Bronstert and Bárdossy, 1999), especially at the beginning of the storm.
In particular, in the SCS-CN method the soil conditions are described through the definition of three antecedent moisture conditions (AMC) classes, dry (AMC I), normal (AMC II) and saturated (AMC III), depending on the total 5 days rainfall previous to the storm event and the season.Hawkins et al. (1985) have found formulas to calculate the CN for AMC I and AMC III from the values of CN corresponding to AMC II.
Since, a rainfall event variable in time is considered, the effective rainfall y e,AMC (t) is calculated in a dynamic form, as a function of the maximum soil potential retention S AMC , according to the following well-known expressions (Chow et al., 1988): where S AMC = 254 • 100 CN AMC − 1 and c • S AMC is the initial abstraction, namely the initial fraction of storm depth after which runoff begins (typically c = 0.2).

Flood routing module
In this study, the Time-Area (TA) rainfall-runoff model is applied as flood routing technique to simulate the hydrologic responses to design storms.According to this well-known conceptual method, largely applied in hydrology, the river basin is schematized as a network of linear channels linking each point to the outlet, so that storage effects are disregarded.In particular, the basin is divided into an arbitrary number of subareas separated by isochrones, i.e. isolines of equal travel time to the outlet (Saghafian et al., 2002).The histogram of consecutive contributing subareas from the outlet in upstream direction, is the unit hydrograph, namely the model transfer function which transforms the excess rainfall in runoff.To construct the Time-Area histogram, the catchment time to equilibrium must be determined.This time is a function of rainfall intensity and catchment characteristics and represents the time for the most hydraulically remote point in the catchment to contribute to the surface runoff at the outlet for infinitely long rainfall duration.In what follows, time to equilibrium is loosely replaced by the time of concentration.This time is divided into a number of equal time intervals, representing the time difference between adjacent isochrones.
In this study, we use a GIS software for deriving the Time-Area histogram based on a grid Digital Elevation Models (DEM) of the basins under investigation.In particular, following an approach proposed by Kull and Feldman (1998), the travel time for each grid cell within the basin is assumed proportional to the time of concentration scaled by the ratio of travel length of the cell over the maximum travel length.In other words, we assume an uniform and constant average velocity of runoff travelling from any point in the basin to the outlet.This approach allows to derive the time of concentra- Once that the Time-Area histogram is known, the runoff hydrograph may be determined through convolution: where Q n is the runoff discharge at the time step n, i e m is the effective rainfall intensity at time m and A n−m+1 is the contributing subarea at the time step n − m + 1.
It should be stressed here that, the effective rainfall intensities in Eq. ( 6) correspond to the effective rainfalls resulting from the hydrologic losses module for each of the three AMC classes.In other words, three hydrographs are derived for each rainfall event.Meanwhile, the frequency distribution of AMC classes is computed based on the total 5 days rainfall records previous to historical storm events.Finally, recalling the Bayes' rule, the runoff hydrograph conditioned by the AMC classes is given by: where ω l , l = 1, 2 ...3, is the frequency of lth AMC class.Following this approach 1000 synthetic hydrographs are generated in response to the generated hyetographs.
3 Application of the proposed Monte Carlo procedure By using a hierarchical clustering approach based on the Ward's minimum variance hierarchical algorithm (Ward, 1963) and the heterogeneity measures proposed by Hosking and Wallis (1997) as homogeneity test, Bonaccorso and Aronica (2016) have identified 5 approximately homogeneous regions for Sicily region.In Fig. 2, the four gauged river basins in Sicily region (Italy) under investigation are superimposed to the identified homogeneous regions.
Testing the goodness of fit of several candidate probability distributions to the regionally pooled standardized AMR series y (t) through the L-moment ratio diagram, they have found out that the generalized extreme value distribution (GEV) is suitable to model extreme rainfall events with duration greater than 1 h and less than 24 h.As the time of concentration of the considered basins ranges between 6.5 and 17.90 h (see Table 2), the GEV is assumed appropriate to model the rainfall depth quantile according to Eq. ( 1).
In particular, the GEV cumulative distribution function can be expressed as: with k, µ and σ , the shape, location and scale parameters respectively, estimated through the regional frequency analysis within the rainfall generator module.
Reminding that the return period T is proportional to the exceedance probability 1 − F y (t) , after some simple algebra, the T -year quantile of GEV distribution is derived as: which represents the growth curve in Eq. ( 1).
With reference to the index value η i (t) in Eq. ( 1), following a traditional empirical approach, in the present study the sample median of the variable under investigation is assumed as a valid estimate (Hampel, 1974;Robson and Reed, 1999).
For a lumped application of the model, a regional index value η R (t) must be considered in place of a site specific scale factor η i (t) in Eq. ( 1).This is equivalent to assume that design storm is uniformly distributed over the basin.Therefore, Eq. ( 1) becomes: In order to assess the regional index value η R (t) for each basin, first a power law relation between local medians of AMR series and the corresponding storm durations (e.g. 1, 3, 6, 12 and 24 h) is determined as: η i (t) = a i • t n i .Then, the local parameters a i and n i are spatially interpolated and averaged over the catchments' areas in GIS environment, so that the regional parameters, a R and n R , are determined for each basin.Finally, the regional index value corresponding to the time of concentration t c is computed as: η R (t c ) = a R •t n R c .Monte Carlo simulations have been carried out to assess the accuracy of the estimated regional growth curves (Eq.9).

B. Bonaccorso et al.: Frequency distributions of peak flows in ungauged basins
In particular, the parameters of GEV distributions have been fitted to the regional averaged L-moment ratios, then a large number N sim of realizations of K samples have been generated.For each realization, GEV distributions have been fitted to the at-site data and growth curves, for specified return periods T, have been estimated.Finally, the relative RMSE (rRMSE) values are calculated as follows: where ŷR,j (t, T ) is the estimated T -year regional growth curve at j th simulation and y i (t, T ) is the true T-year growth curve for site i.For this procedure, the package lmomRFA by Hosking (2015) in R Software has been used.Results suggest that the performance of the regional L moment approach can be considered acceptable, although estimates with return periods larger than T = 100 years should be treated with caution, mainly for region 3 where the regional record length amounts to 565 data (see Table 2 in Bonaccorso and Aronica, 2016).
Therefore, one thousand design storm depths for every basin are randomly drawn by solving Eq. ( 10).Then, each storm depth is distributed in time according to a synthetic hyetograph randomly selected among those ones resulting from the Huff curves procedure, by simply multiplying the time of concentration and the storm depth for the x axis and y axis of the selected hyetograph respectively.
Mean areal CN values for AMC II are derived for each river basin from a CN AMC II digital map at 100 m grid resolution available for the entire Sicily region.Also, mean areal CN values for AMC I and AMC III are obtained by using the formulas by Hawkins et al. (1985).One thousand effective rainfall intensities for each AMC conditions are thus derived through Eq. ( 5).Finally, the synthetic hydrographs are obtained by using Eq. ( 7), where the AMC class frequencies ω l for the considered river basins have been computed by Aronica and Candela (2007).In particular, they derived the AMC frequency distributions only from daily discharge data at catchment outflow, in order to make the procedure applicable also to catchments where the timing of maximum peak discharges is not available.
In Table 2, rainfall generator model parameters are reported for each river basin together with other information, i.e.: belonging homogeneous region, area [km 2 ], time of concentration [hours] and mean areal CN.

Results
Synthetic and observed peak flow values are plotted versus the corresponding return periods T = 1/(1 − F (Q)), where F (Q) is the non-exceedance frequency here computed by the Weibull plotting position formula.The resulting flood frequency curves for all basins are reported in Fig. 3 and compared each other.Flood frequency curves derived by considering CN values for AMC II condition only are also represented by red colored lines.
A fair agreement can be observed for all the considered river basins when all the AMC class frequencies are taken into consideration (black lines), with Salso at Ponte Gagliano performing a little worse.This fact can be related to the limited sample size of Salso at Ponte Gagliano record, which just counts 18 data.As it can be expected, in general the quality of the agreement decays for high values of return period.
To this end, it should be pointed out that the uncertainty of the measurements increases for high values peak discharge (high return periods), due to a lower accuracy of river stagedischarge relationship.In addition, inundation of floodplains in some part of the catchments (more common for higher return periods) could produce a reduction of maximum peak discharge within the river, leading to biased measurements at the end.Finally, it is worth reminding that the adopted flood routing technique does not take into account potential storage effects in the hydrologic response of the catchments, which could be a limitation for large areas.Nevertheless, the results confirm that, overall, the proposed procedure can reproduce observed flood frequency curves with reasonable accuracy.

Conclusions
In the present paper a simple Monte Carlo scheme is proposed to derive peak flow frequency distribution in ungauged or poorly gauged basins.The proposed approach is based on a simplified and parsimonious description of the flood formation process, characterized by a reduced number of parameters in order to ensure a reduced uncertainty in model predictions.Clearly, each module of the procedure shows some advantages and drawbacks.
The regional frequency analysis in the rainfall generator module enables to reduce the uncertainty related to the limited sample size of AMR series (more accurate assessment of long T -year extremes).On the other hand, it relies on the concept of homogeneous region, whose identification is somehow subjective and could change as more and more AMR data are available.The hydrologic losses module based on the CN method is easy to apply and able to include soil type and cover condition, land use and soil moisture conditions, which has been found to be a dominant factor in hydrological response of the catchments under study.However, estimation of the AMC frequency distribution is so far possible for streamflow gauged basins only.The performance of the Time-Area technique in the flood routing module could be improved by considering non-uniform temporal distribution of rainfall as well as spatially variable basin characteristics.Nonetheless, it does not take into account storage effects and is stationary, namely a unique time-invariant trans- fer function is considered for runoff hydrograph computation, regardless of the effective rainfall input.
Following previous remarks, future enhancement of this study could include: (i) the use of meteorological information through synthetic indexes combined with rainfall data analysis to improve the selection of homogeneous regions; (ii) a regional model of antecedent soil moisture conditions and (iii) the implementation of a dynamic flood routing technique for a more flexible and reliable simulation of the flood formation process.
Competing interests.The authors declare that they have no conflict of interest.

Table 1 .
Streamflow gauges where historical annual maxima peak flow series are available.
Figure 1.Sketch of the Monte Carlo procedure for peak flows simulation in ungauged basins.
Bonaccorso et al.:Frequency distributions of peak flows in ungauged basins depth and duration, which simplifies the data analysis and makes it applicable to large areas.In practice, the dimensionless cumulative hyetograph H(d) can be modelled based on the sample of observed dimensionless accumulated storm depths, i.e.: www.adv-geosci.net/44/15/2017/Adv.Geosci., 44, 15-22, 2017 B.

Table 2 .
Rainfall generator model parameters and main characteristics of the investigated river basins.