This document provides an introduction to the weathergen
R package. This package provides functions for generating synthetic weather data using the method described by Steinschneider and Brown (2013):
Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water Resources Research, 49(11), 7205–7220. doi:10.1002/wrcr.20528
The code in this package is based on the original scripts written by Scott Steinschneider, PhD and modified by Jeffrey D Walker, PhD.
In order to run the code below, a few packages must be loaded first.
library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)
library(gridExtra)
library(zoo)
library(moments)
library(weathergen)
theme_set(theme_bw())
# set random seed for reproducibility
set.seed(1)
The weathergen
package includes a dataset of historical climate data for three cities in the Northeast US. These datasets are provided in the climate_cities
dataset, which is a list containing dataframes for 3 cities: Boston, Providence, and Worcester:
data(climate_cities)
str(climate_cities)
## List of 3
## $ boston :'data.frame': 22645 obs. of 5 variables:
## ..$ DATE: Date[1:22645], format: "1949-01-01" ...
## ..$ PRCP: num [1:22645] 0.35 0 0 0.17 16.67 ...
## ..$ TMAX: num [1:22645] 5.99 2.14 3.75 5.46 7.07 14.3 9.21 12.9 9 7.33 ...
## ..$ TMIN: num [1:22645] -0.84 -1.51 -3.15 -1.49 0.61 3.26 1.5 2.21 2.12 4.26 ...
## ..$ WIND: num [1:22645] 10.93 13.11 9.57 7.68 3.31 ...
## $ providence:'data.frame': 22645 obs. of 5 variables:
## ..$ DATE: Date[1:22645], format: "1949-01-01" ...
## ..$ PRCP: num [1:22645] 0 0 0 0.65 12.9 18.4 0 0 0 0 ...
## ..$ TMAX: num [1:22645] 3.68 0.93 4.06 5.44 9.95 ...
## ..$ TMIN: num [1:22645] -1.96 -2.19 -4.68 -3.78 -1.26 2.65 0.66 2.25 0.18 2.87 ...
## ..$ WIND: num [1:22645] 12.32 13.82 10.05 7.52 3.2 ...
## $ worcester :'data.frame': 22645 obs. of 5 variables:
## ..$ DATE: Date[1:22645], format: "1949-01-01" ...
## ..$ PRCP: num [1:22645] 0.47 0 0 0 9.23 ...
## ..$ TMAX: num [1:22645] 6.6 -0.06 3.26 3.82 3.82 ...
## ..$ TMIN: num [1:22645] -3.39 -3.39 -5.62 -8.39 -5.62 -1.73 -3.95 -2.28 -2.28 2.71 ...
## ..$ WIND: num [1:22645] 11.21 13.18 9.58 7.06 3.83 ...
For this vignette, we will use the dataset for Boston, MA.
# extract dataset for boston
obs_day <- climate_cities[['boston']]
# subset complete water years using complete_water_years() function
obs_day <- obs_day[complete_water_years(obs_day$DATE),]
obs_day <- mutate(obs_day,
WYEAR=wyear(DATE, start_month=10), # extract water year
MONTH=month(DATE), # extract month
TEMP=(TMIN+TMAX)/2) %>% # compute mean temperature
select(WYEAR, MONTH, DATE, PRCP, TEMP, TMIN, TMAX, WIND)
# compute annual timeseries by water year
obs_wyr <- group_by(obs_day, WYEAR) %>%
summarise(PRCP=sum(PRCP),
TEMP=mean(TEMP),
TMIN=mean(TMIN),
TMAX=mean(TMAX),
WIND=mean(WIND))
# save the daily and annual timeseries to a list
obs <- list(day=obs_day, wyr=obs_wyr)
# summary of the daily timeseries
summary(obs[['day']])
## WYEAR MONTH DATE PRCP
## Min. :1950 Min. : 1.000 Min. :1949-10-01 Min. : 0.000
## 1st Qu.:1965 1st Qu.: 4.000 1st Qu.:1964-12-30 1st Qu.: 0.000
## Median :1980 Median : 7.000 Median :1980-03-31 Median : 0.000
## Mean :1980 Mean : 6.523 Mean :1980-03-31 Mean : 3.259
## 3rd Qu.:1995 3rd Qu.:10.000 3rd Qu.:1995-07-01 3rd Qu.: 2.120
## Max. :2010 Max. :12.000 Max. :2010-09-30 Max. :185.850
## TEMP TMIN TMAX WIND
## Min. :-18.68 Min. :-25.130 Min. :-14.05 Min. : 0.020
## 1st Qu.: 3.09 1st Qu.: -0.850 1st Qu.: 6.98 1st Qu.: 4.090
## Median : 10.86 Median : 6.070 Median : 15.64 Median : 6.330
## Mean : 10.62 Mean : 5.903 Mean : 15.33 Mean : 6.724
## 3rd Qu.: 18.86 3rd Qu.: 13.810 3rd Qu.: 23.98 3rd Qu.: 8.920
## Max. : 31.99 Max. : 26.240 Max. : 38.15 Max. :23.980
This figure shows the daily timeseries of each climate variable.
gather(obs[['day']], VAR, VALUE, PRCP:WIND) %>%
ggplot(aes(DATE, VALUE)) +
geom_line() +
facet_wrap(~VAR, scales='free_y', ncol=1) +
labs(x='', y='', title='Historical Daily Weather Data for Boston, MA')
This figure shows the annual sum of precipitation, and means of minimum temperature, maximum temperature, and wind speed by water year.
gather(obs[['wyr']], VAR, VALUE, PRCP:WIND) %>%
ggplot(aes(WYEAR, VALUE)) +
geom_line() +
facet_wrap(~VAR, scales='free_y', ncol=1) +
labs(x='', y='', title='Historical Annual Weather Data for Boston, MA')
The primary function for running the daily weather generator on a single site is wgen_daily()
. The function then depends on a number of other functions, which are described in the next section.
The first argument to wgen_daily()
is the daily timeseries of historical climate variables. This argument must be provided as a zoo
object which is designed to store regular and irregular timeseries (see the zoo
package homepage for more details).
This code converts the daily timeseries (currently in obs[['day']]
) to a zoo object. The first argument provides the values of each variable, the second argument provides the associated dates.
zoo_day <- zoo(x = obs[['day']][, c('PRCP', 'TEMP', 'TMIN', 'TMAX', 'WIND')],
order.by = obs[['day']][['DATE']])
class(zoo_day)
## [1] "zoo"
summary(zoo_day)
## Index PRCP TEMP TMIN
## Min. :1949-10-01 Min. : 0.000 Min. :-18.68 Min. :-25.130
## 1st Qu.:1964-12-30 1st Qu.: 0.000 1st Qu.: 3.09 1st Qu.: -0.850
## Median :1980-03-31 Median : 0.000 Median : 10.86 Median : 6.070
## Mean :1980-03-31 Mean : 3.259 Mean : 10.62 Mean : 5.903
## 3rd Qu.:1995-07-01 3rd Qu.: 2.120 3rd Qu.: 18.86 3rd Qu.: 13.810
## Max. :2010-09-30 Max. :185.850 Max. : 31.99 Max. : 26.240
## TMAX WIND
## Min. :-14.05 Min. : 0.020
## 1st Qu.: 6.98 1st Qu.: 4.090
## Median : 15.64 Median : 6.330
## Mean : 15.33 Mean : 6.724
## 3rd Qu.: 23.98 3rd Qu.: 8.920
## Max. : 38.15 Max. :23.980
The daily historical timeseries is then passed as the first argument to the wgen_daily()
function. The other arguments specify a number of parameters for the simulation. These parameters include:
n_year
- number of simulation yearsstart_month
- first month of the water year (e.g. 10=October)start_water_year
- first water year to use for the simulated timeseriesinclude_leap_days
- boolean flag to include or exclude leap days in the simulation timeseries (currently only allows FALSE
)n_knn_annual
- number of years to use in the annual k-nearest neighbors (knn) sampling algorithmdry_wet_threshold
- daily precipitation threshold between the dry
and wet
Markov stateswet_extreme_quantile_threshold
- daily precipitation quantile used as threshold between the wet
and extreme
Markov statesadjust_annual_precip
- boolean flag to adjust simulated daily precipitation amounts to match the simulated annual precipitationannual_precip_adjust_limits
- lower and upper limits for the daily precipitation adjustment factor to match simulated annual precipitation (e.g. c(0.9, 1.1)
will only adjust daily values between 90-110%)dry_spell_changes
- adjustment factor for changing the transition probabilities of dry spellswet_spell_changes
- adjustment factor for changing the transition probabilities of wet spellsprcp_mean_changes
- multiplicative adjustment factor for changing the annual mean precipitationprcp_cv_changes
- multiplicative adjustment factor for changing the coefficient of variation for precipitationtemp_mean_changes
- additive adjustment factor for changing the annual mean temperatureThe last five parameters for assigning adjustment factors can each be either a single number, which is applied to all months, or a vector of length 12 where each value is the change factor for a specific month (starting with January).
sim <- wgen_daily(zoo_day,
n_year=2,
start_month=10,
start_water_year=2000,
include_leap_days=FALSE,
n_knn_annual=100,
dry_wet_threshold=0.3,
wet_extreme_quantile_threshold=0.8,
adjust_annual_precip=TRUE,
annual_precip_adjust_limits=c(0.9, 1.1),
dry_spell_changes=1,
wet_spell_changes=1,
prcp_mean_changes=1,
prcp_cv_changes=1,
temp_mean_changes=0)
The wgen_daily()
function returns a list object with the following elements:
str(sim, max=1)
## List of 6
## $ obs :'data.frame': 22280 obs. of 9 variables:
## $ state_thresholds :'data.frame': 12 obs. of 3 variables:
## $ transition_matrices:List of 12
## $ state_equilibria :List of 12
## $ change_factors :List of 6
## $ out :'data.frame': 730 obs. of 11 variables:
The first element, sim[['obs']]
, is the historical daily timeseries, with additional columns for the water year (WYEAR
), month (MONTH
), and Markov state (d
=Dry, w
=Wet, e
=Extreme).
head(sim[['obs']])
## DATE WYEAR MONTH STATE PRCP TEMP TMIN TMAX WIND
## 1 1949-10-01 1950 10 d 0.00 13.765 8.89 18.64 8.53
## 2 1949-10-02 1950 10 d 0.00 10.225 4.94 15.51 7.31
## 3 1949-10-03 1950 10 d 0.00 12.665 6.06 19.27 3.50
## 4 1949-10-04 1950 10 d 0.00 16.555 11.06 22.05 2.76
## 5 1949-10-05 1950 10 d 0.28 18.970 14.02 23.92 5.59
## 6 1949-10-06 1950 10 d 0.00 14.135 7.84 20.43 6.76
The second element, sim[['state_thresholds']]
, is a dataframe with the monthly precipitation thresholds for each Markov state. For example, in June the Markov state on a given day is dry if the precipitation on that day is less than 0.3 mm/day, wet if it is between 0.3 and 3.33 mm/day, and extreme if it is greater than 3.33 mm/day.
sim[['state_thresholds']]
## MONTH DRY_WET WET_EXTREME
## 1 1 0.3 4.18
## 2 2 0.3 3.80
## 3 3 0.3 4.93
## 4 4 0.3 4.03
## 5 5 0.3 3.60
## 6 6 0.3 3.33
## 7 7 0.3 2.70
## 8 8 0.3 3.12
## 9 9 0.3 2.60
## 10 10 0.3 2.88
## 11 11 0.3 4.75
## 12 12 0.3 4.60
The third element, sim[['transition_matrices']]
, is a list of length 12 where each element is the Markov state transition matrix for each month. For example, the transition matrix for June is:
sim[['transition_matrices']][[6]]
##
## d w e
## d 0.6977169 0.1634703 0.1388128
## w 0.5189189 0.2621622 0.2189189
## e 0.3863014 0.2520548 0.3616438
The fourth element, sim[['state_equilibria
]]`, is a list of length 12 where each element is the state equilibrium of each monthly transition matrix. For example, the equilibrium for June is:
sim[['state_equilibria']][[6]]
## d w e
## 0.5997119 0.2009608 0.1993273
The fifth element, sim[['change_factors']]
, is a list containing the monthly values for each adjustment factor as well as an additional set of adjustments that are computed from the simulated state transitions (ratio_probability_wet
). Each of these elements will each be a vector of length 12 (even if the corresponding parameter to wgen_daily()
was a single value, in which case the same value will be used for all months).
str(sim[['change_factors']])
## List of 6
## $ ratio_probability_wet: num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
## $ dry_spell_changes : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
## $ wet_spell_changes : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
## $ prcp_mean : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
## $ prcp_cv : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
## $ temp_mean : num [1:12] 0 0 0 0 0 0 0 0 0 0 ...
The last element, sim[['out']]
, contains the simulated daily timeseries as a dataframe with the following columns:
SIM_YEAR
- simulation year starting at 1DATE
- simulation date (starting on the first day of the start_month
and start_water_year
parameters, e.g. 1999-10-01 for water year 2000 starting in month 10 (October))MONTH
- simulation monthWDAY
- simulation “water day” which is the julian day starting at the beginning of the water year (e.g. water day 1 = Oct 1)SAMPLE_DATE
- the historical date sampled from the KNN sampling algorithm and used for the each simulation dateSTATE
- simulated Markov statePRCP
, TEMP
, TMIN
, TMAX
, WIND
- simulated climate variablesstr(sim[['out']])
## 'data.frame': 730 obs. of 11 variables:
## $ SIM_YEAR : int 1 1 1 1 1 1 1 1 1 1 ...
## $ DATE : Date, format: "1999-10-01" "1999-10-02" ...
## $ MONTH : num 10 10 10 10 10 10 10 10 10 10 ...
## $ WDAY : int 1 2 3 4 5 6 7 8 9 10 ...
## $ SAMPLE_DATE: Date, format: "2004-10-01" "1985-09-30" ...
## $ STATE : Ord.factor w/ 3 levels "d"<"w"<"e": 1 1 1 1 2 3 3 1 1 1 ...
## $ PRCP : num 0.284 0 0 0 2.25 ...
## $ TEMP : num 15.8 17.3 15.2 12.7 12.5 ...
## $ TMIN : num 12.07 10.11 10.19 7.48 7.84 ...
## $ TMAX : num 19.5 24.6 20.3 17.9 17.1 ...
## $ WIND : num 1.8 3.56 5.44 5.95 7.41 6.29 9.69 7.3 6.37 9.68 ...
This figure shows the simulated timeseries for each variable:
select(sim[['out']], DATE, PRCP, TEMP, TMIN, TMAX, WIND) %>%
gather(VAR, VALUE, -DATE) %>%
ggplot(aes(DATE, VALUE)) +
geom_line() +
facet_wrap(~VAR, ncol=1, scales='free_y') +
labs(x='Simulation Date', y='Simulated Value', title='Simulated Timeseries') +
theme_bw()
The daily weather generator uses the following algorithm:
adjust_annual_precip=TRUE
)the sim_annual_arima()
function is used to simulate annual precipitation using an ARIMA model. The arguments to this function include:
x
- historical annual precipitation as a zoo
objectfunction first fits the ARIMA model to the historical annual precipitation. The ARIMA model is then used to generate a simulation of annual precipitation for length n_year
. The sim_annual_arma()
function returns a list of length 3 containing the ARIMA model object (model
), the historical annual precipitation (observed
), and the simulated annual precipitation (values
).
zoo_precip_wyr <- zoo(obs[['wyr']][['PRCP']], order.by = obs[['wyr']][['WYEAR']])
sim_annual_prcp <- sim_annual_arima(x = zoo_precip_wyr, start_year = 2000, n_year = 40)
This function returns a list of length 3 containing elements:
model
- the ARIMA model object fitted to the historical datax
- the historical annual precipitation used to fit the model as a zoo
objectout
- the simulated annual precipitation as a zoo
objectstr(sim_annual_prcp, max.level=1)
## List of 3
## $ model:List of 17
## ..- attr(*, "class")= chr [1:2] "ARIMA" "Arima"
## $ x :'zoo' series from 1950 to 2010
## Data: num [1:61] 798 1235 1248 1186 1686 ...
## Index: num [1:61] 1950 1951 1952 1953 1954 ...
## $ out :'zoo' series from 2000 to 2039
## Data: num [1:40] 1571 1404 1071 1234 1437 ...
## Index: num [1:40] 2000 2001 2002 2003 2004 ...
The model
element contains an ARIMA model object, which contains the parameters and error statistics of the fitted model:
summary(sim_annual_prcp[['model']])
## Series: x
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## intercept
## 1190.3175
## s.e. 27.0648
##
## sigma^2 estimated as 44683: log likelihood=-413.13
## AIC=830.26 AICc=830.47 BIC=834.48
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 8.013399e-14 211.3826 162.3144 -3.399893 14.55972 0.6760569
## ACF1
## Training set -0.0452353
The out
element is an annual precipitation timeseries as a zoo
object:
sim_annual_prcp[['out']]
## 2000 2001 2002 2003 2004 2005 2006
## 1570.8299 1404.1184 1071.1577 1233.7399 1436.6759 1663.0372 1254.2111
## 2007 2008 2009 2010 2011 2012 2013
## 969.9498 982.4138 1614.2915 752.6348 1836.2482 1135.0726 1094.2667
## 2014 2015 2016 2017 2018 2019 2020
## 1223.6231 1387.6197 1254.3302 776.8215 1265.0490 1285.5294 1329.7351
## 2021 2022 2023 2024 2025 2026 2027
## 972.2932 689.1246 1121.7078 990.7092 1028.4217 988.7052 1106.1863
## 2028 2029 2030 2031 2032 2033 2034
## 1124.5317 1358.5977 1398.8313 1022.3674 1125.0388 1266.7207 1485.9976
## 2035 2036 2037 2038 2039
## 1178.4652 831.2053 1239.3271 1165.1438 1564.9916
This figure plots the simulated annual precipitation and shows the mean annual precipitation based on the historical dataset for reference.
data.frame(WYEAR=time(sim_annual_prcp[['out']]),
PRCP=zoo::coredata(sim_annual_prcp[['out']])) %>%
ggplot(aes(WYEAR, PRCP)) +
geom_line() +
geom_hline(yintercept=mean(obs[['wyr']][['PRCP']]), color='red') +
geom_text(aes(label=TEXT), data=data.frame(WYEAR=2000, PRCP=mean(obs[['wyr']][['PRCP']]), TEXT="Historical Mean"),
hjust=0, vjust=-1, color='red') +
labs(x="Simulation Water Year", y="Annual Precip (mm/yr)", title="Simulated Annual Precipitation")
To determine whether the annual precipitation generator sufficiently reproduces the statistics of the historical timeseries, a Monte Carlo simulation is performed for 50 trials using the same historical timeseries.
# batch simulation to compare mean/sd/skew
sim_wyr_batch <- lapply(1:50, function(i) {
sim_annual_arima(x = obs[['wyr']][['PRCP']], start_year = 2000, n_year = 20)[['out']]
})
names(sim_wyr_batch) <- paste0(seq(1, length(sim_wyr_batch)))
sim_wyr_batch <- do.call(merge, sim_wyr_batch) %>%
apply(MARGIN=2, FUN=function(x) { c(mean=mean(x), sd=sd(x), skew=skewness(x))}) %>%
t
sim_wyr_stats <- rbind(t(colMeans(sim_wyr_batch)),
c(mean=mean(obs[['wyr']][['PRCP']]),
sd=sd(obs[['wyr']][['PRCP']]),
skew=skewness(obs[['wyr']][['PRCP']])))
sim_wyr_stats <- as.data.frame(sim_wyr_stats)
sim_wyr_stats$dataset <- c('obs', 'sim')
sim_wyr_stats <- gather(sim_wyr_stats, stat, value, mean, sd, skew)
The following figure shows the distributions of the mean, standard deviation, and skewness statistics over the simulation trials. The overall mean value of each statistic (blue point) is also compared to the statistic value of the historical timeseries (red point). This figure shows that the annual precipitation generator creates simulated timeseries with similar statistics as the historical timeseries.
as.data.frame(sim_wyr_batch) %>%
mutate(trial=row_number()) %>%
gather(stat, value, mean:skew) %>%
ggplot(aes(x=stat, value)) +
geom_boxplot() +
geom_point(aes(x=stat, y=value, color=dataset), data=sim_wyr_stats, size=3) +
scale_color_manual('', values=c('sim'='deepskyblue', 'obs'='orangered'),
labels=c('sim'='Mean of Simulated Trials', 'obs'='Mean of Historical')) +
labs(x='', y='') +
facet_wrap(~stat, scales='free')
The annual precipitation generated in the previous step is then used to drive a daily simulation for all weather variables. For each year, a k-nearest neighbors algorithm samples n_knn_annual
years from the historical record with replacement. The daily historical timeseries for each year is then extracted and combined into a daily timeseries, where the sequence of years is based on the KNN sampling. A Markov Chain model is then fitted to the sampled historical daily and used to simulate a sequence of states for each simulation year. Finally, the simulated states are used to sample the historical daily values using a daily KNN algorithm.
For each simulation year, a set of n_knn_annual
years are sampled with replacement from the historical record using inverse distance weighted sampling. Historical years with annual precipitation similar to the current simulated annual precipitation are given a higher weight and thus more likely to be sampled than historical years where the annual precipitation was much higher or lower than the simulated precipitation.
n_knn_annual <- 100
# generate vector of sampled years
sampled_years <- knn_annual(prcp=coredata(sim_annual_prcp[['out']])[1],
obs_prcp=zoo_precip_wyr,
n=n_knn_annual)
sampled_years
## [1] 1984 1998 1998 1990 2006 1998 1954 2006 1984 2006 1998 1958 1998 1958
## [15] 1984 1960 1951 1984 1984 1967 1984 1970 1974 1956 1998 1998 1955 1998
## [29] 2004 1955 1984 1998 1960 1998 1994 1952 1958 1998 1998 1998 2010 1996
## [43] 1984 2006 1998 1998 1955 1961 1996 1954 1958 1998 1970 1984 1998 1983
## [57] 2008 1967 1984 2004 1954 1958 2003 1998 1955 2009 1955 1998 1998 1987
## [71] 1998 1998 2008 1996 1998 1998 2006 1976 2010 1998 1984 2004 1998 1998
## [85] 1998 1954 1954 1967 1998 2003 1998 1987 2006 1984 1998 2006 2006 1974
## [99] 1970 1984
This figure shows the historical annual precipitation for each water year. The red line shows the annual precipitation for the first simulated year. The red points show which historical years were included in the KNN sampling. Note that these points tend to be close to the target simulated precipitation.
ggplot() +
geom_point(aes(WYEAR, PRCP, color=SAMPLED), data=obs[['wyr']] %>% mutate(SAMPLED=WYEAR %in% sampled_years)) +
geom_hline(yintercept=sim_annual_prcp[['out']][1], linetype=2, color='red') +
geom_text(aes(x=x, y=y, label=label),
data=data.frame(x=min(obs[['wyr']][['WYEAR']]),
y=sim_annual_prcp[['out']][1],
label='Target Simulated Annual Precip'),
vjust=-0.5, hjust=0) +
scale_color_manual('Sampled in KNN', values=c('TRUE'='orangered', 'FALSE'='grey30')) +
labs(x='Water Year', y='Annual Precipitation (mm/yr)')
For each of the sampled years in sampled_years
, the corresponding daily values are extracted from the historical time series and then combined into a single timeseries. The following figure shows the resulting daily timeseries for the 100 years sampled for the first simulated water year. The points are colored by their historical water year, and thus show years where the historical annual precip is closest to the simulated precip are repeated.
# loop through population years and extract historical daily values
sampled_days <- lapply(sampled_years, function(yr) {
obs[['day']][which(obs[['day']][['WYEAR']]==yr), ]
}) %>%
rbind_all() %>%
mutate(SAMPLE_INDEX=row_number())
# plot combined daily timeseries based on the KNN sampled years
ggplot(sampled_days, aes(SAMPLE_INDEX, PRCP, color=WYEAR)) +
geom_point(size=1) +
labs(x='Index Day', y='Daily Precip (mm/day)', title='Synthetic Daily Timeseries from KNN Sampled Years')
The resulting timeseries of daily values for the sampled years is then used to run the sim_daily()
function. In this function, a Markov Chain simulation is fitted to the synthetic daily timeseries.
The first step is to compute the state thresholds using the mc_state_threshold()
function which takes a vector of daily precipitation, a vector of corresponding months, and the dry_wet_threshold
and wet_extreme_quantile_threshold
.
thresh <- mc_state_threshold(sampled_days[['PRCP']], sampled_days[['MONTH']],
dry_wet_threshold=0.3, wet_extreme_quantile_threshold=0.8)
thresh_df <- do.call(rbind, thresh) %>%
as.data.frame() %>%
mutate(month=row_number()) %>%
select(month, dry_wet, wet_extreme)
thresh_df
## month dry_wet wet_extreme
## 1 1 0.3 4.600
## 2 2 0.3 5.000
## 3 3 0.3 4.970
## 4 4 0.3 5.356
## 5 5 0.3 8.100
## 6 6 0.3 5.970
## 7 7 0.3 3.280
## 8 8 0.3 5.150
## 9 9 0.3 1.800
## 10 10 0.3 3.150
## 11 11 0.3 7.350
## 12 12 0.3 4.694
This figure shows the threshold amounts between the Dry/Wet and Wet/Extreme states for each month.
gather(thresh_df, var, value, -month) %>%
ggplot(aes(factor(month), value, fill=var)) +
geom_bar(stat='identity', position='dodge') +
scale_fill_manual('Threshold', values=c('dry_wet'='orangered', 'wet_extreme'='deepskyblue'),
labels=c('dry_wet'='Dry-Wet', 'wet_extreme'='Wet-Extreme')) +
labs(x='Month', y='Threshold Amount (mm/day)', title='Markov State Thresholds by Month')
After the state thresholds are determined, they are assigned to the synthetic daily timeseries, sampled_days
.
sampled_days$STATE <- mc_assign_states(sampled_days$PRCP, sampled_days$MONTH, c('d', 'w', 'e'), thresh)
head(sampled_days)
## Source: local data frame [6 x 10]
##
## WYEAR MONTH DATE PRCP TEMP TMIN TMAX WIND SAMPLE_INDEX
## (dbl) (dbl) (date) (dbl) (dbl) (dbl) (dbl) (dbl) (int)
## 1 1984 10 1983-10-01 3.15 15.420 11.41 19.43 4.91 1
## 2 1984 10 1983-10-02 5.35 16.630 12.47 20.79 1.33 2
## 3 1984 10 1983-10-03 0.00 19.825 14.59 25.06 5.89 3
## 4 1984 10 1983-10-04 0.00 20.760 13.24 28.28 7.72 4
## 5 1984 10 1983-10-05 0.55 15.110 10.38 19.84 8.08 5
## 6 1984 10 1983-10-06 0.10 16.965 13.49 20.44 6.39 6
## Variables not shown: STATE (fctr)
This figure shows the Markov state for each day in the first 5 simulated years.
filter(sampled_days, SAMPLE_INDEX <= 365*5) %>%
ggplot(aes(SAMPLE_INDEX, PRCP, color=STATE)) +
geom_point(size=2) +
scale_color_manual('Markov State',
values=c('d'='orangered', 'w'='chartreuse3', 'e'='deepskyblue'),
labels=c('d'='Dry', 'w'='Wet', 'e'='Extreme')) +
ylim(0, 50) +
labs(x='Sample Date Index', y='Daily Precipitation (mm/day)', title='Markov State of Synthetic Daily Timeseries')
## Warning: Removed 17 rows containing missing values (geom_point).
After the states have been assigned ot the sampled daily timeseries, the mc_fit()
function can be used to fit a Markov Chain model for each month.
transition_matrices <- mc_fit(states=sampled_days[['STATE']], months=sampled_days[['MONTH']])
For example, the transition matrix for June is shown below. The probability of having a wet day follow a dry day is 0.26, and the probability of two dry days in a row is 0.66.
transition_matrices[[6]]
##
## d w e
## d 0.65725542 0.25607354 0.08667104
## w 0.44909502 0.28167421 0.26923077
## e 0.24957841 0.39797639 0.35244519
The equilibrium state can be determined for each month using the mc_state_equilibrium()
function. This function finds the eigenvector of a transition matrix. The equilibrium state for June is thus:
mc_state_equilibrium(transition_matrices[[6]])
## d w e
## 0.5192105 0.2905118 0.1902777
Once the transition matrices have been fit to the sampled historical daily values, the daily generator can be run.
Given the fitted monthly transition matrices, the daily simulation for each simulation year can be performed using the sim_mc_knn_day()
function. This function simulates a chain of Markov states using the transition matrices, and uses a KNN sampling algorithm to choose the values of each weather variable.
sampled_days <- dplyr::mutate(sampled_days,
WDAY=waterday(DATE, start_month=10),
STATE_PREV=lag(STATE),
PRCP_PREV=lag(PRCP),
TEMP_PREV=lag(TEMP),
TMAX_PREV=lag(TMAX),
TMIN_PREV=lag(TMIN),
WIND_PREV=lag(WIND))
sim_days <- sim_mc_knn_day(x=sampled_days, n_year=1, states=c('d', 'w', 'e'), transitions=transition_matrices,
start_month=10, start_water_year=2000, include_leap_days=FALSE)
str(sim_days, max=1)
## 'data.frame': 365 obs. of 11 variables:
## $ SIM_YEAR : num 1 1 1 1 1 1 1 1 1 1 ...
## $ DATE : Date, format: "1999-10-01" "1999-10-02" ...
## $ MONTH : num 10 10 10 10 10 10 10 10 10 10 ...
## $ WDAY : int 1 2 3 4 5 6 7 8 9 10 ...
## $ SAMPLE_DATE: Date, format: "1957-10-01" "1957-10-02" ...
## $ STATE : Ord.factor w/ 3 levels "d"<"w"<"e": 2 1 1 1 2 1 1 1 1 1 ...
## $ PRCP : num 0.35 0 0 0 1.77 0 0 0 0 0 ...
## $ TEMP : num 19 14.7 12 12.9 12.2 ...
## $ TMIN : num 12.39 9.66 5.89 6.46 6.63 ...
## $ TMAX : num 25.7 19.6 18.1 19.4 17.7 ...
## $ WIND : num 0.4 8.55 9.28 7.65 2.9 ...
This figure shows the simulated weather variables for the current simulation year.
select(sim_days, DATE, PRCP:WIND) %>%
gather(VAR, VALUE, -DATE) %>%
ggplot(aes(DATE, VALUE)) +
geom_line() +
facet_wrap(~VAR, scales='free_y', ncol=1) +
labs(x='Simulate Date', y='Value', title='Daily Simulation for One Year')
Because the daily precipitation values are sampled directly from historical values, the total precipitation for each simulation year will likely differ from the corresponding simulated annual precipitation.
The adjust_daily_to_annual()
function is used to adjust the simulated daily precipitation values so that the sum of these values equals the simulated annual total precipitation.
For example, the current simulated annual precipitation is 1571 mm/yr, but the sum of the simulated daily precipitation is 970.6 mm/yr. To adjust the daily values such that their sum equals the simmulated annual value, the daily values can be all multiplied by a factor of 1.62. The following code block uses the adjust_daily_to_annual()
function to perform this adjustment for the first simulation year. Note that this function can be applied to the entire simulated timeseries containing multiple years, and adjust each individual accordingly.
orig_prcp <- sim_days[['PRCP']]
adj_prcp <- adjust_daily_to_annual(values_day=sim_days[['PRCP']],
years_day=wyear(sim_days[['DATE']]),
values_yr=coredata(sim_annual_prcp[['out']])[1],
years_yr=time(sim_annual_prcp[['out']])[1],
min_ratio=0.5, max_ratio=1.5)
c(Original=sum(orig_prcp), Adjusted=sum(adj_prcp$adjusted), Annual=coredata(sim_annual_prcp[['out']])[1])
## Original Adjusted Annual
## 970.58 1455.87 1570.83
This figure shows the original and adjusted daily precipitation values. Note how the adjusted values (dashed red line) are linearly scaled by a factor of 1.5.
data.frame(DATE=sim_days[['DATE']],
ORIGINAL=orig_prcp,
ADJUSTED=adj_prcp$adjusted) %>%
gather(VAR, VALUE, -DATE) %>%
ggplot(aes(DATE, VALUE, color=VAR, linetype=VAR)) +
geom_line() +
scale_color_manual('', values=c('ORIGINAL'='black', 'ADJUSTED'='orangered')) +
scale_linetype_manual('', values=c('ORIGINAL'=1, 'ADJUSTED'=2)) +
labs(x='Simulation Date', y='Daily Precipitation (mm/day)')
After generating the simulated daily timeseries, the precipitation and temperature variables can be adjusted using the change factor parameters.
For this example, a 5-year simulated timeseries is first generated without using any change factors
sim <- wgen_daily(zoo_day,
n_year=5,
start_month=10,
start_water_year=2000,
include_leap_days=FALSE,
n_knn_annual=100,
dry_wet_threshold=0.3,
wet_extreme_quantile_threshold=0.8,
adjust_annual_precip=TRUE,
annual_precip_adjust_limits=c(0.5, 1.5),
dry_spell_changes=1,
wet_spell_changes=1,
prcp_mean_changes=1,
prcp_cv_changes=1,
temp_mean_changes=0)
The simulated precipitation timeseries can then be adjusted using the adjust_daily_gamma()
function. This function performs a quantile mapping by first fitting the non-zero precipitation values to a gamma distribution, adjusting the scale and shape parameters of the distribution, and replacing the precipitation values by mapping the original values through the adjusted distribution.
The following code block computes three sets of adjusted timeseries by adjusting the mean, the coefficient of variation, or both parameters.
prcp.unadj <- sim[['out']][, c('DATE','MONTH','PRCP')]
prcp.adj.mean <- adjust_daily_gamma(prcp.unadj[['PRCP']], prcp.unadj[['MONTH']],
mean_change=1.2, cv_change=1)
prcp.adj.cv <- adjust_daily_gamma(prcp.unadj[['PRCP']], prcp.unadj[['MONTH']],
mean_change=1, cv_change=1.2)
prcp.adj.both <- adjust_daily_gamma(prcp.unadj[['PRCP']], prcp.unadj[['MONTH']],
mean_change=1.2, cv_change=1.2)
prcp.df <- data.frame(DATE=prcp.unadj[['DATE']],
UNADJESTED=prcp.unadj[['PRCP']],
ADJUST_MEAN=prcp.adj.mean[['adjusted']],
ADJUST_CV=prcp.adj.cv[['adjusted']],
ADJUST_BOTH=prcp.adj.both[['adjusted']]) %>%
gather(GROUP, VALUE, -DATE)
This figure shows the unadjusted and three adjusted daily precipitation timeseries.
ggplot(prcp.df, aes(DATE, VALUE)) +
geom_line() +
facet_wrap(~GROUP) +
labs(x='', y='Daily Precipitation (mm/day)', title='Unadjusted and Adjusted Daily Precipitation')
This figure shows the annual precipitation for the unadjusted and adjusted precipitation timeseries by water year.
mutate(prcp.df, WYEAR=wyear(DATE, start_month=10)) %>%
group_by(WYEAR, GROUP) %>%
summarise(VALUE=sum(VALUE)) %>%
ggplot(aes(factor(WYEAR), VALUE, fill=GROUP)) +
geom_bar(stat='identity', position='dodge') +
scale_fill_discrete('') +
labs(x='Water Year', y='Annual Precipitation (mm/yr)', title='Annual Precipitation With Adjustments')
This figure shows the cumulative distribution frequency of each timeseries. Note that the y-axis has been truncated to a maximum of 60 mm/yr in order to show the lower part of the distribution.
prcp.df %>%
filter(VALUE>0) %>%
group_by(GROUP) %>%
arrange(VALUE) %>%
mutate(ROW=row_number(),
PROB=(ROW-0.5)/n()) %>%
ggplot(aes(PROB, VALUE, color=GROUP)) +
geom_line() +
ylim(0, 60) +
scale_color_discrete('') +
labs(y='Daily Precipitation (mm/day)', x='Non-Exceedence Probability', title='Distributions of Adjusted Precipitation Timeseries')
## Warning: Removed 31 rows containing missing values (geom_path).
The temperature timeseries can also be adjusted using a linear additive factor through the adjust_daily_additive()
function. This function simply adds the adjustment factor to each daily value. In this example, the mean temperature is adjusted by a factor of 5 meaning each daily mean temperature is increased by 5 degrees.
temp.unadj <- sim[['out']][, c('DATE','MONTH','TEMP')]
temp.adj <- adjust_daily_additive(temp.unadj[['TEMP']], temp.unadj[['MONTH']],
mean_change=5)
temp.df <- data.frame(DATE=temp.unadj[['DATE']],
UNADJESTED=temp.unadj[['TEMP']],
ADJUSTED=temp.adj[['adjusted']]) %>%
gather(GROUP, VALUE, -DATE)
This figure shows the unadjusted and adjusted daily mean temperature.
ggplot(temp.df, aes(DATE, VALUE, color=GROUP)) +
geom_line() +
scale_color_discrete('') +
labs(x='', y='Daily Mean Temperature (degC)', title='Unadjusted and Adjusted Daily Mean Temperature')
This figure shows the unadjusted and adjusted annual mean temperature. Note how the adjusted annual mean temperatures are 5 degrees higher than the unadjustd temperatures.
mutate(temp.df, WYEAR=wyear(DATE, start_month = 10)) %>%
group_by(WYEAR, GROUP) %>%
summarise(VALUE=mean(VALUE)) %>%
ggplot(aes(factor(WYEAR), VALUE, fill=GROUP)) +
geom_bar(stat='identity', position='dodge') +
scale_fill_discrete('') +
labs(x='', y='Annual Mean Temperature (degC)', title='Unadjusted and Adjusted Annual Mean Temperature')