Investigating Air Temperature with HydroClient and RStudio
Introduction
Conceptual Outcomes
Practical Outcomes
Complex graph of Air Temperature in RStudio
Summary Statistics of Air Temperature in RStudio
Time Required
Computing/Data Outputs
Hardware/Software Required
R and RStudio
WaterML and ggplot2 CRAN packages
Any major internet browser
Instructions
Open RStudio. If you do not already have RStudio installed on your computer, you can download a free version here: https://www.rstudio.com/products/rstudio/download/. You will also need to install the WaterML and ggplot2 packages. The WaterML package allows you to retrieve data from the Hydrologic Information System. The ggplot2 package is a plotting system for R that makes it easy to produce complex graphics.
To install the WaterML and ggplot2 packages, use the install.packages() command:
install.packages("WaterML")
install.packages("ggplot2")
Now use the require() command to load both packages:
require(WaterML)
require(ggplot2)
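If you rerun this script often, it can help to check which packages are still missing before installing anything. The following is a minimal sketch of that check; the helper name missing_packages is hypothetical and is not part of WaterML or ggplot2.

```r
## Hypothetical helper: return the names in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

## List whichever of the two tutorial packages still need to be installed
to_install <- missing_packages(c("WaterML", "ggplot2"))
print(to_install)

## Uncomment the next line to install only what is missing:
## if (length(to_install) > 0) install.packages(to_install)
```

An empty result means both packages are already available and only the require() calls are needed.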
Now open HydroClient. Search for Boston, MA, United States, and zoom out twice, as shown below.
Set your search parameters to:
- Date Range: 09/01/2014 – 09/30/2014
- Keyword: Air Temperature
- Data Service: Organization-U.S. Geological Survey, Title-NWIS Daily Values
After clicking Search Map, the data results will appear as below.
Click on the Data tab to access the Data Table. In the Data Table, search for Stony Brook. The time series with the Site Name STONY BROOK RESERVOIR AT DAM NEAR WALTHAM, MA will appear. Scroll right to view the Web Service Description URL, as shown below.
Click on the link. Now copy the URL from the webpage shown below (i.e. http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL). This link will tell R where the data are located on the internet.
Once you have copied the URL, you may close the Hydroportal and HydroClient webpages. You now have the link needed to access the time series data and graph it in R.
In RStudio, set a variable that defines the server location by pasting the URL from the previous step. This defines the CUAHSI HIS service that you are connecting to by giving the URL of that service's WSDL file. This example uses a service from the USGS, so the variable is named USGS:
##Set Variable for data service URL
USGS <- "http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL"
You will use the GetValues command to set the variable for the data. The GetValues command requires three inputs:
- Data Service: The Data Source registered in the HIS Central Catalog, e.g. US Geological Survey (USGS), U.S. Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) SNOTEL Data, National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network (GHCN).
- Site Code: The unique code given by the data publisher for the location of the observation.
- Variable Code: The unique code used by the organization that collects the data.
The Start Date and End Date are two optional inputs that can be inserted in the GetValues command.
## Use the GetValues command to set the variable as "AirTemp" and specify the time series
AirTemp <- GetValues(USGS,"NWISDV:01104480","NWISDV:00020DataType=MEAN",startDate = "2014-09-01", endDate="2014-09-30")
The results are shown below.
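Because GetValues takes several inputs in a fixed order, it can be easier to read the request when each input is labeled. The sketch below collects the same values in a named list and checks the date range before anything is sent. The argument names server, siteCode, variableCode, startDate and endDate are the ones used by the WaterML GetValues function; the list name request and the day count are illustrative additions, not part of the original steps.

```r
## Collect the inputs in a named list so each one is labeled
request <- list(
  server       = "http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL",
  siteCode     = "NWISDV:01104480",
  variableCode = "NWISDV:00020DataType=MEAN",
  startDate    = "2014-09-01",
  endDate      = "2014-09-30"
)

## Check that both dates parse and that the start is not after the end
start <- as.Date(request$startDate)
end <- as.Date(request$endDate)
stopifnot(!is.na(start), !is.na(end), start <= end)

## Number of calendar days covered by the range (both ends included)
expected_days <- as.integer(end - start) + 1L
print(expected_days)

## With WaterML loaded and an internet connection, the request could be sent as:
## AirTemp <- do.call(GetValues, request)
```

For a daily-values service, the day count gives a rough upper bound on how many rows to expect in the result.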
Use the View command to see the time series data in a table format:
View(AirTemp)
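View opens an interactive table, which is not always convenient in a script. As a hedged alternative, the base R functions below print a quick overview in the console. The data frame demo is a stand-in with made-up values; only the column names time and DataValue follow the tutorial, and the date-time type of the time column is an assumption.

```r
## Stand-in for AirTemp: made-up values, column names as used in this tutorial
demo <- data.frame(
  time = as.POSIXct(c("2014-09-01", "2014-09-02", "2014-09-03"), tz = "UTC"),
  DataValue = c(21.5, 22.0, 19.8)
)

str(demo)          # column names and types
head(demo, 2)      # first two rows
nrow(demo)         # number of observations
range(demo$time)   # first and last time stamp
```

Replacing demo with AirTemp runs the same checks on the downloaded time series.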
You can use the plot() command to create a simple plot of Air Temperature over time. R is case-sensitive, so the command must be typed in lowercase. The plot() command requires two inputs:
- X-values, specified by the data table name and the name of the column in which these values are located
- Y-values, specified by the data table name and the name of the column in which these values are located
##Use the plot() command to make a graph of Air Temperature over time
plot(AirTemp$time,AirTemp$DataValue)
The plot is shown below:
You will use the summary command to obtain summary statistics about the time series. The summary command takes a single column of values, which you identify with two parts separated by $:
- Data Table name
- Column name
##Use the summary command to obtain summary statistics on the Air Temperature time series
summary(AirTemp$DataValue)
The results are shown below.
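The summary command reports the minimum, quartiles, mean and maximum together. If you need individual statistics, or the dates on which the extremes occurred, base R functions can compute them one at a time. This is a small sketch on made-up values; demo stands in for AirTemp and only the column names follow the tutorial.

```r
## Stand-in for AirTemp with made-up daily values
demo <- data.frame(
  time = as.Date("2014-09-01") + 0:4,
  DataValue = c(21.5, 22.0, 19.8, 18.4, 20.3)
)

## Individual statistics; na.rm = TRUE skips any missing observations
mean_temp <- mean(demo$DataValue, na.rm = TRUE)
sd_temp <- sd(demo$DataValue, na.rm = TRUE)

## Dates of the warmest and coolest days
warmest <- demo$time[which.max(demo$DataValue)]
coolest <- demo$time[which.min(demo$DataValue)]

print(mean_temp)
print(sd_temp)
print(warmest)
print(coolest)
```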
To create a more complex graph you can use the ggplot2 package. Notice that although the package is named "ggplot2", the actual command is ggplot. You will use ggplot to make a scatterplot of AirTemp with time on the X axis and DataValue on the Y axis. To simplify the following step, you will assign this plot to the variable 'g'. The plot is built from three pieces:
- Data table name
- 'aes' command: used to generate aesthetic mappings that describe variables in the data. The aes command requires the x values and the y values.
- geom_point() command: used to generate points for the scatterplot
## Create plot with ggplot
g <- ggplot(AirTemp, aes(x = time, y = DataValue)) + geom_point()
*To get more information on ggplot go to https://ggplot2.tidyverse.org/
Now that you have assigned the first plot as variable "g" you will make a more complex scatterplot that contains color-coded data points symbolized by air temperature value.
##Add color to existing plot
g + geom_point(aes(color = DataValue))
The graph is shown below.
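The color-coded plot can be taken one step further with axis labels and an explicit color scale. The sketch below uses the ggplot2 functions labs() and scale_color_gradient() on a stand-in data frame with made-up values; the label text and the blue-to-red color choice are illustrative assumptions.

```r
library(ggplot2)

## Stand-in for AirTemp with made-up daily values
demo <- data.frame(
  time = as.Date("2014-09-01") + 0:4,
  DataValue = c(21.5, 22.0, 19.8, 18.4, 20.3)
)

## Scatterplot colored by value, with a blue-to-red gradient and readable labels
g2 <- ggplot(demo, aes(x = time, y = DataValue)) +
  geom_point(aes(color = DataValue)) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(x = "Date", y = "Air temperature", color = "Air temperature",
       title = "Daily air temperature, September 2014")

## Inside a script the plot must be printed explicitly to be drawn
print(g2)
```

Replacing demo with AirTemp applies the same styling to the downloaded time series.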