Using EZStats


This EZStats instructional video is part of the EvaluateUR instructional video series.
The information in this guide will help you to make the most of the data collected using EvaluateUR. The EvaluateUR EZStats tab allows you to generate a summary report of your student and mentor data along with a limited number of statistical measures including: gains, percent change, and frequency tables for all outcome components. If data are available for 20 or more student-mentor pairs, paired sample t-tests are provided.

The EZStats report considers all the outcomes assessed in EvaluateUR, any additional program-wide outcomes added by the site administrator and any project-specific outcomes added by the student-mentor pair. Click here for a complete list

To ensure the highest data integrity, EZStats only uses assessment data for student-mentor pairs who have completed all of the EvaluateUR steps. Furthermore, it is essential that the pre-, mid-, and end-of-research steps have been completed at the appropriate time during the research experience. Any significant time lag in completing EvaluateUR steps could compromise the findings.


Descriptive Statistics

This section explains the descriptive statistics provided by EZStats. To illustrate these, Figure 1 is an example for one of EvaluateUR's outcome components. Figure 2 shows the portion of Figure 1 that includes the descriptive statistics.

Sample Size (Highlighted in the orange box in Figure 2)

  • The number of respondents (n=) is provided. It is useful to compare this value to the total number of student-mentor pairs participating in your program. Any difference – that is, when 'n' is less than the number of pairs that you registered and activated – is due to one or more of the student-mentor pairs not completing all the EvaluateUR steps.

Means (Highlighted in the blue boxes in Figure 2)

  • The statistical mean refers to the average for the data and is determined by adding all the assessment score points assigned to a given population of students or mentors and then dividing that total by the total number of students or mentors.
  • EZStats provides means for students and mentors at the pre-, mid-, and end-of-research (labeled as post in Figures 1 and 2).
  • It is not unusual for either a student or a mentor to assign a high score on the first assessment (pre-research) and then adjust the score downward on the mid-research assessment.
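
As a rough illustration of how the sample size and means relate, the sketch below computes n and the pre-, mid-, and post-research means using only records that have all three scores. The scores and the names records and summarize are made up for this example and are not part of EZStats.

```python
from statistics import mean

# Hypothetical scores on the 5-point rubric; None marks a step that was not completed.
records = [
    {"pre": 3, "mid": 2, "post": 4},
    {"pre": 4, "mid": 3, "post": 4},
    {"pre": 2, "mid": 3, "post": 5},
    {"pre": 3, "mid": None, "post": 4},  # incomplete, so it is excluded below
]

STEPS = ("pre", "mid", "post")

def summarize(records):
    """Return n and the pre/mid/post means for records with all three scores."""
    complete = [r for r in records if all(r[step] is not None for step in STEPS)]
    means = {step: mean(r[step] for r in complete) for step in STEPS}
    return len(complete), means

n, means = summarize(records)
print(f"n={n}")
for step in STEPS:
    print(f"{step}: {means[step]:.2f}")
```

Here n is 3 rather than 4 because one record is missing its mid-research score, which mirrors why 'n' can be smaller than the number of registered pairs.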

Standard Deviation (Highlighted in green boxes in Figure 2)

  • The standard deviation (SD) is a measure of the spread of the data around the mean. Figure 3 illustrates the standard deviation in a normal distribution with the mean in the center of the distribution. In a normal distribution, approximately:
    • 68.3% of data are within one standard deviation of the mean
    • 95.5% of data are within two standard deviations of the mean
    • 99.7% of data are within three standard deviations of the mean

Data sets can have the same mean value with different standard deviations. Figure 4 illustrates data with the same statistical mean but with different standard deviations indicating the scatter of the data around the mean. A higher standard deviation (SD=1 and SD=2 in Figure 4) indicates that there is greater variation (more spread) in the data compared to SD=0.5. There is no clear definition for what constitutes a low or high standard deviation as it depends on the sample and the tolerance for variability. For example, you might see a high variation if some of your students are completing their first undergraduate research experience while others are on their second.
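
To make the idea of equal means with different spreads concrete, here is a minimal sketch with two made-up sets of rubric scores. It assumes the sample standard deviation (dividing by n-1); this guide does not state which form EZStats uses, and the helper share_within is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical rubric scores: both groups have a mean of 3 but differ in spread.
tight = [3, 3, 3, 2, 4, 3]
wide = [1, 5, 3, 1, 5, 3]

def share_within(data, k):
    """Fraction of values lying within k sample standard deviations of the mean."""
    m, sd = mean(data), stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * sd) / len(data)

for name, data in (("tight", tight), ("wide", wide)):
    print(f"{name}: mean={mean(data):.2f} SD={stdev(data):.2f} "
          f"within 1 SD={share_within(data, 1):.0%}")
```

With so few values the shares will not match the normal-distribution percentages; the point is only that the SD grows as scores scatter further from the same mean.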

Gains and Percent Change

Figure 5 (an enlarged view of Figure 1) shows the assessment score gains and percent changes (highlighted in purple and red, respectively) between the pre-mid, mid-post, and pre-post means. A gain is the difference between a later mean and an earlier mean; a positive gain indicates that the assessment score increased over time. The percent change expresses that gain relative to the initial (old) value: it is the difference between the final (new) and initial (old) values, divided by the initial value and multiplied by 100.
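
The arithmetic can be sketched as follows, using the standard definitions of gain and percent change. The means are made up, and the function names are chosen for this example; EZStats may round its displayed values differently.

```python
def gain(old_mean, new_mean):
    """Difference between the later (new) mean and the earlier (old) mean."""
    return new_mean - old_mean

def percent_change(old_mean, new_mean):
    """Relative change from the old value to the new value, as a percentage."""
    return (new_mean - old_mean) / old_mean * 100

# Hypothetical means on the 5-point rubric.
pre_mean, mid_mean, post_mean = 2.5, 3.0, 4.0

comparisons = {
    "pre-mid": (pre_mean, mid_mean),
    "mid-post": (mid_mean, post_mean),
    "pre-post": (pre_mean, post_mean),
}
for label, (old, new) in comparisons.items():
    print(f"{label}: gain={gain(old, new):+.2f} "
          f"percent change={percent_change(old, new):+.1f}%")
```

Note that the same gain produces a larger percent change when the starting mean is lower, so the two measures are best read together.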


EZStats provides additional details for each outcome component. To view this information, click the "Show Details" link in the lower left corner of the score report (shown with a red box in Figure 6). Once the details are open, the link changes from "Show Details" to "Hide", as shown in Figure 7. Click "Hide" to collapse the details.

The frequency table provides the number (count) and percent of responses from students and mentors using the 5-point rubric at each of the three assessment times. The last column gives the percent of responses at each rubric score; these percentages total 100%. The frequency distribution tables allow patterns and outliers to be identified.
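
A frequency table of this kind can be sketched in a few lines. The scores below are made up, and frequency_table is a name chosen for this example rather than anything EZStats exposes.

```python
from collections import Counter

def frequency_table(scores, scale=(1, 2, 3, 4, 5)):
    """Return (score, count, percent) rows for every point on the rubric."""
    counts = Counter(scores)
    total = len(scores)
    return [(s, counts[s], 100 * counts[s] / total) for s in scale]

# Hypothetical pre-research scores from ten students.
pre_scores = [3, 4, 2, 3, 3, 5, 4, 3, 2, 3]

table = frequency_table(pre_scores)
for score, count, pct in table:
    print(f"score {score}: count={count:2d} percent={pct:5.1f}%")
```

Listing every rubric score, including those with a count of zero, makes gaps and outliers easier to spot.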

Significance testing (when n is 20 or greater)

When the number of student-mentor pairs (n) is 20 or greater, EZStats includes values for significance testing. These values are shown to the right of the means and gains (Figure 8). The measure of significance used in EZStats is the paired sample t-test, with significance assessed at the 0.05 level. These are explained below.

Paired sample t-test

A paired t-test is used to examine the difference between two variables for the same subject. For EvaluateUR, the two variables in the paired sample t-test are separated by time. Specifically, we are comparing the pre-research assessment scores to the end-of-research (post) assessment scores. The paired t-test is used to determine whether or not the mean difference between two sets of observations is zero. For EvaluateUR, it indicates whether or not the pre-research assessment mean and the post-research assessment mean are the same and therefore their difference is zero.

The t-test tells you whether the difference between the two sets of scores is statistically significant. In other words, it indicates whether a difference in means of this size could plausibly have happened by chance.
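
For readers who want to see the calculation, the following is a minimal sketch of a paired sample t-test using only the Python standard library. The 20 pairs of scores are made up, the function names are chosen for this example, and the p value is approximated by numerical integration; a statistics package would normally be used instead, and this is not a description of how EZStats computes its values.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p value with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1 - 2 * central)

def paired_t_test(pre, post):
    """Return (t, df, p) for the differences post - pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_p(t, df)

# Hypothetical pre- and post-research scores for 20 students.
pre = [3, 2, 3, 4, 2, 3, 3, 2, 4, 3, 2, 3, 3, 4, 2, 3, 2, 3, 4, 3]
post = [4, 3, 4, 4, 3, 5, 4, 3, 5, 4, 3, 4, 3, 5, 4, 4, 3, 4, 5, 4]

t, df, p = paired_t_test(pre, post)
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```

The test works on the differences within each pair, which is why every pre score must be matched with a post score from the same person.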

P value

The p value is the probability of obtaining a difference at least as large as the one observed in the sample if there were no real difference, that is, if chance alone were at work. Probabilities range between 0% and 100%, and p values are written in decimal form. For example, a p value of .05 means that a difference this large would arise by chance about 5% of the time if there were no real difference; for p = .01, about 1% of the time. For EZStats, the test standard (alpha) is .05, an accepted value for educational research. If the p value is equal to or less than .05, it is highlighted in red (Figure 9). If the p value is higher than .05 but still close to it, the result is sometimes described as 'marginally significant' or 'close to significance'. In such cases, consider including the p value in reports.

Low p values indicate that the observed difference is unlikely to be due to chance alone. For example, a p value of 0.01 means that a result at least this extreme would occur only about 1% of the time if chance alone were at work. P values of 0.000 are not an error. Rather, they indicate that the p value is extremely small (less than 0.0005) and has been rounded to 0.000, meaning the result is highly significant. Such a value should be reported as p < .001, not as p = 0.000, because the p value is based on a sample and is small but not zero; a different sample from the same population could yield a value greater than 0.000.
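
The reporting rule above can be sketched as a small helper. The function name format_p and the three-decimal, no-leading-zero style are choices made for this example; check the style guide for your report before adopting them.

```python
def format_p(p, alpha=0.05):
    """Return a report-ready string and whether p meets the alpha standard."""
    if not 0 <= p <= 1:
        raise ValueError("a p value must lie between 0 and 1")
    if p < 0.001:
        text = "p < .001"  # never report p = 0.000
    else:
        text = "p = " + f"{p:.3f}".lstrip("0")
    return text, p <= alpha

for value in (0.0004, 0.012, 0.07):
    text, significant = format_p(value)
    print(f"{text} ({'significant' if significant else 'not significant'} at .05)")
```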

Degrees of freedom

The degrees of freedom (df) are the number of independent pieces of information that went into calculating an estimate. For the paired sample t-test, df is related to the sample size and is calculated as n-1. The degrees of freedom affect the shape of the t-distribution: as df increases (that is, as the sample size increases), the area in the tails of the distribution gets smaller and values concentrate towards the mean. As df approaches infinity, the t-distribution approaches a normal distribution. More information on df can be found here.

Effect Size

Effect size is a statistical concept that measures the strength of the relationship between two variables. It allows us to move beyond statistical significance to practical significance (e.g., 'does it work or not?' versus 'how well does it work?'). The t-value depends essentially on two things: the size of the effect and the size of the sample, so with a large enough sample even a very small effect can be statistically significant. The effect size in EZStats is Cohen's d. Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), large (d = 0.8), and very large (d ≥ 1). This means that if two groups' means don't differ by at least 0.2 standard deviations, the difference is trivial, even if it is statistically significant. If the effect size equals 1, then the two groups' means differ by 1 standard deviation. If the effect size is 0.5, then the groups differ by half of one standard deviation. See Figure 10 for examples. APA recommends reporting the effect size in journal articles even for statistically non-significant findings. More information about effect size can be found here.
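
As a hedged illustration, the sketch below computes one common form of Cohen's d for paired data: the mean of the pre-to-post differences divided by the standard deviation of those differences. This guide does not state which standard deviation EZStats uses in the denominator, so treat that choice, the made-up scores, and the function names as assumptions. The labels treat the benchmark values above as lower cut points, which is also an interpretation.

```python
from statistics import mean, stdev

def cohens_d_paired(pre, post):
    """Mean difference divided by the SD of the differences (an assumed form)."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)

def describe_effect(d):
    """Map |d| onto the benchmark labels, treating each as a lower cut point."""
    size = abs(d)
    if size >= 1.0:
        return "very large"
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "trivial"

# Hypothetical pre- and post-research scores for four students.
pre = [1, 2, 3, 4]
post = [2, 4, 3, 5]

d = cohens_d_paired(pre, post)
print(f"d = {d:.2f} ({describe_effect(d)})")
```

Other variants divide by the pooled or pre-research standard deviation instead, and they can give noticeably different values for the same data.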

References and Further Discussions

Cohen, J. (1969) Statistical Power Analysis for the Behavioral Sciences. NY: Academic Press.

More on Effect size

More on Effect size video

More on Degrees of freedom video

Downloading Data

EvaluateUR provides the raw data that can be used with other statistical programs. Excel has excellent statistical options, and SPSS, SAS, and R provide powerful tools for analyzing your data further. Depending on your background, consider consulting with a colleague with expertise in statistics.

The options for downloading your data are listed below. Because the SERC server will automatically delete all current academic year EvaluateUR data after one year, we recommend that you download all data that you might want to use either for reports or for additional analyses in the future.

The "download all data as a csv" option produces data for an Excel file or statistical package that contains all student and mentor assessment scores, including scores on all optional program-wide and project-specific outcomes. It also includes student and mentor responses to any open-ended questions.
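
Once downloaded, a csv file can be read with most programming languages as well as with Excel. The sketch below shows one way to do this in Python. The column names (pair_id, role, outcome, pre, mid, post), the outcome name, and the values are hypothetical stand-ins; the actual layout of the EvaluateUR download is not described in this guide, so adjust the names to match your file.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-in for the downloaded file; real column names will differ.
sample_csv = """pair_id,role,outcome,pre,mid,post
1,student,Communication,3,3,4
1,mentor,Communication,2,3,4
2,student,Communication,4,3,5
2,mentor,Communication,3,3,4
"""

def mean_by_role(csv_text, column):
    """Average one score column separately for each value of the role column."""
    by_role = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_role.setdefault(row["role"], []).append(float(row[column]))
    return {role: mean(values) for role, values in by_role.items()}

post_means = mean_by_role(sample_csv, "post")
for role, value in post_means.items():
    print(f"{role}: post mean = {value:.2f}")
```

To read a real download, replace io.StringIO(csv_text) with an opened file, for example open(path, newline=""), where path is the location of your saved file.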
