This document demonstrates how to use the myrwaR package to export data from the MyRWA Water Quality Database to csv files that can be imported into WQX/STORET.

There are three types of datasets that can be imported to WQX: Projects, Locations, and Results. For each of these three dataset types, myrwaR provides a corresponding wqx_<type>() function that extracts the data from the database, convert it to the format accepted by WQX, and validates the dataset against WQX criteria and allowed values (aka WQX domain values). Note that the WQX domain values are provided in the package source code in the folder inst/extdata/wqx_domain/.

Open Database Connection

In order to extract data from the water quality database, first we need to create a connection to the database:

library(myrwaR)
base_dir <- 'D:/Dropbox/Work/mystic/db'
db_path <- file.path(base_dir, "MysticDB_20160208.accdb")
ch <- db_connect(db_path)

Projects Table

The WQX Projects table is generated using the wqx_projects() function, which requires the database channel (ch) and a list of one or more project IDs (projects). For example, to export the baseline (Project.ID="BASE") project to WQX format.

projects <- wqx_projects(ch, projects='BASE')
str(projects)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ ProjectID         : chr "BASE"
#>  $ ProjectName       : chr "Baseline Mystic River Watershed MA"
#>  $ ProjectDescription: chr "Water quality data collected since 2000 by MyRWA from fifteen sites across the watershed that documents trends in water quality"| __truncated__

This data frame can be saved to a csv file, which can then be imported to WQX:

write.csv(projects, file='projects.csv', row.names=FALSE)

Results Table

The WQX Results table is generated using the wqx_results() function, which requires the database channel (ch) and a list of one or more project IDs (projects). For example, to get the results for the baseline sampling program:

results <- wqx_results(ch, projects='BASE')
str(results)
#> 'data.frame':    21846 obs. of  25 variables:
#>  $ ProjectID                              : chr  "BASE" "BASE" "BASE" "BASE" ...
#>  $ MonitoringLocationID                   : chr  "MYRMMP" "MYRMMP" "MYRMMP" "MYRMMP" ...
#>  $ ActivityID                             : chr  "MYRMMP:200801290806:L:FD" "MYRMMP:200801290806:F:FD" "MYRMMP:200801290806:F:FD" "MYRMMP:200801290806:L:FD" ...
#>  $ ActivityType                           : chr  "Quality Control Sample-Field Replicate" "Quality Control Field Replicate Msr/Obs" "Quality Control Field Replicate Msr/Obs" "Quality Control Sample-Field Replicate" ...
#>  $ ActivityMediaName                      : chr  "Water" "Water" "Water" "Water" ...
#>  $ ActivityStartDate                      : chr  "2008-01-29" "2008-01-29" "2008-01-29" "2008-01-29" ...
#>  $ ActivityStartTime                      : chr  "08:06:00" "08:06:00" "08:06:00" "08:06:00" ...
#>  $ ActivityStartTimeZone                  : chr  "EST" "EST" "EST" "EST" ...
#>  $ CharacteristicName                     : chr  "Nitrite" "Dissolved oxygen (DO)" "Dissolved oxygen (DO)" "Enterococcus" ...
#>  $ MethodSpeciation                       : chr  "as N" NA NA NA ...
#>  $ SampleCollectionMethod                 : chr  "grab" NA NA "grab" ...
#>  $ SampleCollectionEquipmentName          : chr  "Water Bottle" NA NA "Water Bottle" ...
#>  $ ResultDetectionCondition               : chr  "Present Below Quantification Limit" NA NA NA ...
#>  $ ResultValue                            : num  NA 11.41 73.62 46 0.23 ...
#>  $ ResultUnit                             : chr  "mg/l" "mg/l" "%" "cfu/100ml" ...
#>  $ ResultMeasureQualifier                 : chr  NA NA NA NA ...
#>  $ ResultSampleFraction                   : chr  "Dissolved" NA NA NA ...
#>  $ ResultStatusID                         : chr  "Final" "Final" "Final" "Final" ...
#>  $ ResultValueType                        : chr  "Actual" "Actual" "Calculated" "Actual" ...
#>  $ ResultAnalyticalMethodID               : chr  "4500-NO3(F)" "D888(B)" "D888(B)" "1600" ...
#>  $ ResultAnalyticalMethodContext          : chr  "APHA" "ASTM" "ASTM" "USEPA" ...
#>  $ ResultDetectionQuantitationLimitType   : chr  "Lower Quantitation Limit" NA NA NA ...
#>  $ ResultDetectionQuantitationLimitMeasure: num  0.1 NA NA NA NA NA NA NA NA 0.1 ...
#>  $ ResultDetectionQuantitationLimitUnit   : chr  "mg/l" NA NA NA ...
#>  $ ResultComment                          : chr  NA NA NA NA ...

The results table can then be validated using wqx_validate_results(). This function will check for missing columns, missing values (only for columns where missing values are not allowed), and whether any values are not part of the WQX domain dictionary.

wqx_validate_results(results)
#> Checking results for missing columns...OK
#> Checking results for columns with missing values...OK
#> Checking results have units when value is not empty...OK
#> Checking results have detection condition when value is empty...OK
#> Checking results against WQX domain values
#>    ActivityType ...OK
#>    ActivityMediaName ...OK
#>    SampleCollectionEquipmentName ...OK
#>    CharacteristicName ...OK
#>    MethodSpeciation ...OK
#>    ResultMeasureQualifier ...OK
#>    ResultDetectionCondition ...OK
#>    ResultSampleFraction ...OK
#>    ResultStatusID ...OK
#>    ResultValueType ...OK
#>    ResultDetectionQuantitationLimitType ...OK
#>    ResultUnit ...OK
#>    ActivityStartTimeZone ...OK
#> Checking results for invalid methods...OK
#> 
#> Validation Complete (OK)

If any errors occur, they will be reported as a Warning, and the message Validation Failed will be shown at the end. Let’s set an invalid value to one of the columns and re-run the validation to see what an error looks like.

results_invalid <- results
results_invalid[1, "CharacteristicName"] <- "Moose"
wqx_validate_results(results_invalid)
#> Checking results for missing columns...OK
#> Checking results for columns with missing values...OK
#> Checking results have units when value is not empty...OK
#> Checking results have detection condition when value is empty...OK
#> Checking results against WQX domain values
#>    ActivityType ...OK
#>    ActivityMediaName ...OK
#>    SampleCollectionEquipmentName ...OK
#>    CharacteristicName ...FAIL
#> Warning in wqx_validate_results(results_invalid): Invalid value(s) in
#> results column CharacteristicName: "Moose"
#>    MethodSpeciation ...OK
#>    ResultMeasureQualifier ...OK
#>    ResultDetectionCondition ...OK
#>    ResultSampleFraction ...OK
#>    ResultStatusID ...OK
#>    ResultValueType ...OK
#>    ResultDetectionQuantitationLimitType ...OK
#>    ResultUnit ...OK
#>    ActivityStartTimeZone ...OK
#> Checking results for invalid methods...OK
#> 
#> Validation Failed

The valid results table can be saved to a csv file and then imported to WQX (note that the na="" is important, otherwise R will insert “NA” into the file, which WQX will not accept):

write.csv(results, file='results.csv', na="", row.names=FALSE)

Locations Table

The WQX Locations Table is generated using the wqx_locations() function, which also requires the database channel (ch) and a list of location IDs (locations). The location IDs can be extracted using the unique() function on the MonitoringLocationID column of the results data frame.

locations <- wqx_locations(ch, locations=unique(results$MonitoringLocationID))
str(locations)
#> 'data.frame':    16 obs. of  9 variables:
#>  $ MonitoringLocationID                                 : chr  "ABR006" "ABR028" "ABR049" "ALB006" ...
#>  $ MonitoringLocationName                               : chr  "Aberjona River at USGS Gaging Station in Winchester; the bank upstream of weir" "Aberjona River at Washington Street in Winchester; upstream side of the bridge " "Aberjona River at Salem Street in Woburn; downstream side of the bridge" "Alewife Brook at Broadway Bridge in Somerville; downstream of the bridge on the bank" ...
#>  $ MonitoringLocationType                               : chr  "River/Stream" "River/Stream" "River/Stream" "River/Stream" ...
#>  $ MonitoringLocationLatitude                           : num  42.4 42.5 42.5 42.4 42.4 ...
#>  $ MonitoringLocationLongitude                          : num  -71.1 -71.1 -71.1 -71.1 -71 ...
#>  $ MonitoringLocationHorizontalCollectionMethod         : chr  "GPS-Unspecified" "GPS-Unspecified" "GPS-Unspecified" "GPS-Unspecified" ...
#>  $ MonitoringLocationHorizontalCoordinateReferenceSystem: chr  "WGS84" "WGS84" "WGS84" "WGS84" ...
#>  $ MonitoringLocationCountyCode                         : chr  "017" "017" "017" "017" ...
#>  $ MonitoringLocationState                              : chr  "MA" "MA" "MA" "MA" ...

The locations table can be validated using wqx_validate_locations(), which works similarly to wqx_validate_results():

wqx_validate_locations(locations)
#> Checking locations for missing columns...OK
#> Checking locations for columns with missing values...OK
#> Checking locations against WQX domain values
#>    MonitoringLocationType ...OK
#>    MonitoringLocationHorizontalCollectionMethod ...OK
#>    MonitoringLocationHorizontalCoordinateReferenceSystem ...OK
#> 
#> Validation Complete (OK)

The locations table can then be saved to a csv file for importing to WQX (note again the na="" argument):

write.csv(locations, file='locations.csv', na="", row.names=FALSE)