
Investigating Air Temperature with HydroClient and RStudio

Liza Brazil, Jon Pollak, Consortium of Universities for the Advancement of Hydrologic Science, Inc

Introduction

This step demonstrates how to retrieve a Web Service URL for a specific time series in HydroClient and then view and graph the data set in RStudio. The example uses time series data for Air Temperature measured at Stony Brook, near Boston, MA, between 9/01/2014 and 9/30/2014. The purpose of this step is to combine HydroClient's data discovery abilities with RStudio's data analysis tools. Students will discover data and obtain metadata in HydroClient, then access the data in RStudio.

Conceptual Outcomes

Students will learn how to retrieve, view, and graph time series data discovered in HydroClient with RStudio. Students are also introduced to interpreting summary statistics of time series data in RStudio.

Practical Outcomes

Complex graph of Air Temperature in RStudio

Summary Statistics of Air Temperature in RStudio

Time Required

1 hour 30 minutes to 2 hours

Computing/Data Outputs

Hardware/Software Required

R and RStudio

WaterML and ggplot2 CRAN packages

Any major internet browser

Instructions

Open up RStudio. If you do not already have RStudio installed on your computer, you can download a free version here: https://www.rstudio.com/products/rstudio/download/. You will also need to install WaterML and ggplot2 packages. The WaterML package allows you to retrieve data from the Hydrologic Information System. The ggplot2 package is a plotting system for R that makes it easy to produce complex graphics.

To install the WaterML and ggplot2 packages, use the install.packages() command. R is case-sensitive, so the command must be typed in lowercase:
install.packages("WaterML")
install.packages("ggplot2")

Now use the require() command to load both packages:
require(WaterML)
require(ggplot2)
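
If you rerun your script often, you may prefer to install only the packages that are missing. The lines below are a minimal sketch of that idea, not part of the original step. The helper name find_missing is hypothetical and does not come from either package.

```r
# Hypothetical helper: return the names of packages that are not yet installed.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("WaterML", "ggplot2")
to_install <- find_missing(needed)
print(to_install)  # empty when both packages are already installed

# Uncomment the next line to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the sketch never changes your R library by accident.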

Now open up HydroClient. Search for Boston, MA, United States, and zoom out twice as shown below.



Set your search parameters to:

  • Date Range: 09/01/2014 – 09/30/2014
  • Keyword: Air Temperature
  • Data Service: Organization-U.S. Geological Survey, Title-NWIS Daily Values

After clicking Search Map, the data results will appear as below.


Click on the Data Tab to access the Data Table. In the Data Table, search for Stony Brook. The time series with the Site Name: STONY BROOK RESERVOIR AT DAM NEAR WALTHAM, MA will appear. Use the scroll bar to scroll right to view the Web Service Description URL as shown below. Click here if you are not sure how to search for data on HydroClient.

Click on the link. Now copy the URL from the webpage shown below (i.e. http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL). This link will tell R where the data are located on the internet.


Once you have copied the URL, you may close the Hydroportal and HydroClient webpages. You now have the link needed to access the time series data and graph it in R.

In RStudio, set a variable that defines the server location by pasting the URL from the previous step. This defines the CUAHSI HIS service that you are connecting to by giving the URL to that service's WSDL file. This example uses a service from the USGS.

## Set variable for data service URL

USGS <- "http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL"

You will use the GetValues command to set the variable for the data. The GetValues command requires three inputs:

  1. Data Service: The Data Source registered in the HIS Central Catalog, e.g. US Geological Survey (USGS), U.S. Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) SNOTEL Data, National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network (GHCN).
  2. Site Code: The unique code given by the data publisher for the location of the observation.
  3. Variable Code: The unique code used by the organization that collects the data.

The Start Date and End Date are two optional inputs that can be inserted in the GetValues command.
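
The HydroClient search form shows dates as month/day/year, while the GetValues call in this step passes them as year-month-day text. As a small illustration, the sketch below converts one format to the other using base R only. The helper name to_iso_date is hypothetical and is not part of the WaterML package.

```r
# Convert a month/day/year date, as typed into the HydroClient search,
# to the year-month-day text passed as startDate and endDate.
to_iso_date <- function(us_date) {
  format(as.Date(us_date, format = "%m/%d/%Y"), "%Y-%m-%d")
}

start_date <- to_iso_date("09/01/2014")
end_date <- to_iso_date("09/30/2014")
print(c(start_date, end_date))  # "2014-09-01" "2014-09-30"
```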

## Use the GetValues command to set the variable as "AirTemp" and specify the time series

AirTemp <- GetValues(USGS,"NWISDV:01104480","NWISDV:00020DataType=MEAN",startDate = "2014-09-01", endDate="2014-09-30")

The results are shown below.


Use the View command to see the time series data in a table format:
View(AirTemp)

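
View() opens an interactive table in RStudio. If you want a quick look in the console instead, base R offers head(), str(), and nrow(). The sketch below runs without an internet connection by using a made-up data frame, AirTempDemo, as a stand-in for AirTemp. It assumes the same two column names used in this step (time and DataValue), and it assumes that time holds date-time values. The numbers are illustrative only, not real observations.

```r
# Made-up stand-in for AirTemp: 30 daily values for September 2014.
# These numbers are illustrative only and are not real observations.
AirTempDemo <- data.frame(
  time = seq(as.POSIXct("2014-09-01", tz = "UTC"), by = "day", length.out = 30),
  DataValue = round(18 + 5 * sin(seq_len(30) / 5), 1)
)

print(head(AirTempDemo))        # first six rows
str(AirTempDemo)                # column names and types
print(nrow(AirTempDemo))        # number of observations
print(range(AirTempDemo$time))  # first and last time stamp
```

With the real data, replace AirTempDemo with AirTemp.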

You can use the plot() command to create a simple plot of Air Temperature over time. Note that the command is written in lowercase, because R is case-sensitive. The plot() command requires two inputs:

  1. X-Values, which are specified by the data table name and the name of the column in which these values are located
  2. Y-Values, which are specified by the data table name and the name of the column in which these values are located

## Use the plot() command to make a graph of Air Temperature with time

plot(AirTemp$time, AirTemp$DataValue)

The plot is shown below:

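
The plot() command also accepts optional inputs for axis labels, a title, and the plot type. The sketch below shows one possible way to use them. It relies on a made-up data frame, AirTempDemo, in place of AirTemp so that it runs without an internet connection. The values and the label text are assumptions for illustration.

```r
# Made-up stand-in for AirTemp (illustrative values, not real observations).
AirTempDemo <- data.frame(
  time = seq(as.POSIXct("2014-09-01", tz = "UTC"), by = "day", length.out = 30),
  DataValue = round(18 + 5 * sin(seq_len(30) / 5), 1)
)

# type = "b" draws both points and connecting lines.
plot(AirTempDemo$time, AirTempDemo$DataValue,
     type = "b",
     xlab = "Date",
     ylab = "Air temperature",
     main = "Air temperature, September 2014 (made-up values)")
```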

You will use the summary command to obtain summary statistics about the time series. The summary command takes one input, the column to summarize, which is specified by two parts joined with the $ symbol:

  1. Data Table name
  2. Column name

##Use the summary command to obtain summary statistics on the Air Temperature time series

summary(AirTemp$DataValue)

The results are shown below.

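
To help interpret the output of summary, the sketch below applies it to five made-up values that are easy to check by hand, then computes each statistic separately with base R functions. The values are assumptions chosen for illustration and are not taken from the Stony Brook time series.

```r
# Five made-up temperature values, chosen so the statistics are easy to verify.
demo_values <- c(12, 15, 18, 21, 24)

demo_summary <- summary(demo_values)
print(demo_summary)
# Min. 12, 1st Qu. 15, Median 18, Mean 18, 3rd Qu. 21, Max. 24

# The same statistics computed one at a time:
print(min(demo_values))                      # smallest value
print(quantile(demo_values, c(0.25, 0.75)))  # first and third quartiles
print(median(demo_values))                   # middle value
print(mean(demo_values))                     # average
print(max(demo_values))                      # largest value
```

Half of the values fall between the first and third quartiles, so these two numbers give a sense of the typical range of temperatures during the month.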

To create a more complex graph you can use the ggplot2 package. Notice that although the package is named "ggplot2", the actual command is ggplot. You will use ggplot to make a scatterplot of AirTemp with time on the X axis and DataValue on the Y axis. To simplify the following step, you will assign this plot to the variable 'g'. The ggplot command will use three inputs:

  1. Data table name
  2. 'aes' command: used to generate aesthetic mappings that describe variables in the data. The aes command requires the x values and the y values, given as column names of the data table.
  3. geom_point() command: used to generate points for the scatterplot

## Create plot with ggplot

g <- ggplot(AirTemp, aes(x = time, y = DataValue)) + geom_point()

*To get more information on ggplot go to http://docs.ggplot2.org/current/

Now that you have assigned the first plot as variable "g" you will make a more complex scatterplot that contains color-coded data points symbolized by air temperature value.

##Add color to existing plot

g + geom_point(aes(color = DataValue))

The graph is shown below.

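
A ggplot graph can be extended further by adding more pieces with the + symbol. As one possible variation, the sketch below adds axis labels, a legend title, and a plot title with the labs() command from ggplot2. It uses a made-up data frame, AirTempDemo, in place of AirTemp so that it runs without an internet connection. The values and the label text are assumptions for illustration.

```r
library(ggplot2)

# Made-up stand-in for AirTemp (illustrative values, not real observations).
AirTempDemo <- data.frame(
  time = seq(as.POSIXct("2014-09-01", tz = "UTC"), by = "day", length.out = 30),
  DataValue = round(18 + 5 * sin(seq_len(30) / 5), 1)
)

# Scatterplot with points colored by value, plus readable labels.
g_demo <- ggplot(AirTempDemo, aes(x = time, y = DataValue)) +
  geom_point(aes(color = DataValue)) +
  labs(x = "Date",
       y = "Air temperature",
       color = "Air temperature",
       title = "Air temperature, September 2014 (made-up values)")

print(g_demo)
```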

Additional Activities and Variants

Related Steps