Finding, Evaluating and Applying Baseball Research (Part 3): Guest Blog By Dr. Ed Fehringer and Jordan Rassmann
We presented a series of questions about baseball-related research to a highly regarded Omaha, NE orthopedic surgeon, Dr. Ed Fehringer:
1. Why do we need good research in baseball?
2. Why is it important for a coach/instructor to be able to review, understand and critique research?
3. How do I find relevant research (online search hacks)?
4. What are the parts/design of a typical study?
5. How do I evaluate the quality of a study?
6. What are the dangers of only reading the abstract?
7. What are the typical statistical tools used and what do they mean?
8. How do I evaluate the authors’ methods, discussion and conclusions?
9. What are the most common pitfalls in reading research?
10. How do I begin to apply the results of research to help my players/team?
11. What are the possible consequences of not learning to search out and evaluate quality research?
His responses so far have been outstanding, but today we’re shifting gears a little. For question #7, Dr. Fehringer has deferred to the knowledge of our full-time data analyst, Jordan Rassmann. I’m no statistician, and math isn’t my thing. But when reviewing research, it’s important to have at least a rudimentary understanding of the statistical tools and what they mean. Around here, that’s Jordan’s job. So, take it away, Jordie!
7. What are the typical statistical tools used and what do they mean?
As Dr. Fehringer so eloquently pointed out, it is of paramount importance to understand that perfect research is impossible because of the infinite number of human and other variables present. However, there are definitely some statistical tools that can help researchers quantify the possible importance of a study.
Statistics is defined as a branch of science that deals with the collection, organization, and analysis of data, and with drawing inferences from samples about the whole population. There are two main types of statistics: descriptive and inferential.
Descriptive statistics summarize data in a variety of forms. The mean, which is the sum of all the values divided by the number of values, and the median, which is the middle value in an ordered set (half of the values fall above it and half below), are measures of the center of the data. The standard deviation, which measures how far values typically fall from the mean (the formula is a bit complex, but it gives us a standard way of knowing what is relatively normal in a sample), and the range, which is the difference between the maximum and minimum values in a dataset, are measures of the overall spread of the data. Not every study reports all of these, but they are helpful when trying to understand how values are distributed throughout a dataset.
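To make these four measures concrete, here is a minimal sketch using Python's built-in statistics module. The velocity numbers are made up purely for illustration and do not come from any study.

```python
import statistics

# Hypothetical fastball velocities in mph, invented for illustration only.
velocities = [84, 86, 87, 88, 88, 90, 93]

mean_velo = statistics.mean(velocities)      # sum of the values / number of values
median_velo = statistics.median(velocities)  # middle value of the sorted data
stdev_velo = statistics.stdev(velocities)    # sample standard deviation (spread around the mean)
range_velo = max(velocities) - min(velocities)  # maximum minus minimum

print(f"mean:   {mean_velo:.1f} mph")
print(f"median: {median_velo:.1f} mph")
print(f"stdev:  {stdev_velo:.2f} mph")
print(f"range:  {range_velo} mph")
```

In this made-up sample the mean and median agree, which suggests the data are not heavily skewed. When the two differ a lot, a few extreme values are usually pulling the mean.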
Inferential statistics uses data from a sample to make inferences about the total population. It is important to note that these inferences are made with a certain degree of probability, or likelihood, that the insights drawn from the sample actually apply to the total population.
In research, there is always an underlying hypothesis, which is a proposed explanation for a particular phenomenon (usually the reason you are doing the research in the first place). The alternate hypothesis is the researchers’ statement that there is a relationship between the variables in a study. The null hypothesis (usually denoted as H0), on the other hand, serves as the devil’s advocate: it is the opposite of the alternate hypothesis and always takes the view that there is no significant difference or relationship between the variables in question.
One of the most commonly used statistical tools for checking significance is the p-value. The p-value is the probability of seeing a result at least as extreme as the one observed, purely by chance, if the null hypothesis is true. This sounds confusing, but it isn’t nearly as bad as it sounds. The p-value, like any other probability, ranges from 0 to 1 and is used to decide whether to reject the null hypothesis (the data support the alternate hypothesis) or fail to reject it (the data do not provide enough evidence for the alternate hypothesis). Note that failing to reject the null hypothesis is not the same as proving it true.
If the p-value is at or below an arbitrarily chosen value, which is referred to as the significance level, the null hypothesis is rejected. This is usually the outcome researchers are hoping for. By convention, a p-value at or below 0.05 is deemed statistically significant.
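One way to see what a p-value measures is a permutation test, sketched below. This is just one simple approach, not the method any particular study uses, and the two groups of velocity gains are invented for illustration. The idea: if the null hypothesis is true, the group labels don't matter, so we shuffle the labels in every possible way and count how often the difference is at least as large as the one we observed.

```python
from itertools import combinations

# Hypothetical velocity gains (mph) for two small groups; invented for illustration.
program = [3, 4, 5, 6, 7]
control = [1, 2, 2, 3, 4]


def mean(values):
    return sum(values) / len(values)


observed = mean(program) - mean(control)

# Under the null hypothesis the group labels are interchangeable, so relabel
# the ten players in every possible way and recompute the difference each time.
pooled = program + control
indices = range(len(pooled))
extreme = 0
total = 0
for chosen in combinations(indices, len(program)):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in indices if i not in chosen]
    diff = mean(group_a) - mean(group_b)
    total += 1
    # Count relabelings at least as extreme as the observed result (two-sided).
    # The tiny tolerance guards against floating-point rounding.
    if abs(diff) >= abs(observed) - 1e-9:
        extreme += 1

p_value = extreme / total
alpha = 0.05  # conventional significance level

print(f"observed difference: {observed:.2f} mph")
print(f"p-value: {p_value:.4f}")
if p_value <= alpha:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```

With only five players per group, every possible relabeling can be checked directly. Larger studies typically rely on formula-based tests or on random shuffles instead.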
Among the most powerful and commonly used tools for testing relationships in research are the Pearson r coefficient, linear/multiple regression, t-tests, analysis of variance (ANOVA), the Chi-square test, and logistic regression. It is also important to note that there are underlying conditions that must hold in order to actually use these tests, such as normality of the data and independence of observations, but I won’t go into full detail on these conditions. Another very important concept to understand is that correlation does not imply causation!
The Pearson r coefficient (or R-value) is used to test the relationship between two continuous variables (variables that can take any numeric value within a range, such as velocity or height). R-values range from -1 to 1 and measure the strength and direction of a linear relationship (line of best fit). A value of -1 would indicate a perfect downhill or negative linear relationship, a value of 0 would indicate no linear relationship at all, and a value of 1 would indicate a perfect uphill or positive linear relationship.
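Here is a minimal sketch of how the R-value is calculated, written out by hand so the steps are visible. The data (weekly throwing sessions and velocity gain) are hypothetical, and the function name pearson_r is my own choice.

```python
import math

# Hypothetical data, invented for illustration:
# weekly throwing sessions (x) and velocity gain in mph (y) for five players.
sessions = [1, 2, 3, 4, 5]
gain = [2, 4, 5, 4, 5]


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # How much x and y move together around their means.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable moves on its own.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


r = pearson_r(sessions, gain)
print(f"r = {r:.3f}")
```

A value this far above 0 points to a fairly strong positive linear relationship in this made-up sample, but on its own it says nothing about whether extra sessions cause the gain.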
Similarly, regression tests the relationship between one or more predictor (or independent) variables and one continuous outcome variable. A simple linear regression uses a single predictor, while a multiple regression uses several. This kind of test is useful if you have a bunch of numeric variables and are trying to predict another numeric variable.
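The sketch below fits the simplest case, a straight line with one predictor, using the standard least-squares formulas. The data are hypothetical and the function name fit_line is my own. Multiple regression with several predictors follows the same idea but is normally done with statistical software.

```python
# Hypothetical data, invented for illustration:
# weekly throwing sessions (predictor) and velocity gain in mph (outcome).
sessions = [1, 2, 3, 4, 5]
gain = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sessions, gain)
predicted = intercept + slope * 6  # predicted gain for a player with 6 sessions

print(f"gain = {intercept:.2f} + {slope:.2f} * sessions")
print(f"predicted gain at 6 sessions: {predicted:.2f} mph")
```

The slope reads as "expected change in the outcome for each one-unit increase in the predictor." Predicting beyond the range of the data, as the 6-session example does, should always be treated with caution.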
A t-test is used to test the null hypothesis that there is no difference between the means of two groups. ANOVA is used to test whether there is any significant difference between the means of two or more groups. In ANOVA, we look at two sources of variability: between-group and within-group. Within-group variability (also known as error variance) is variation that cannot be accounted for by the study design and is based on random differences that happen to be present among the subjects in each group. Between-group variability reflects differences between the group means, which may be the result of the actual treatment (ANOVA is often used to study the effects of medical treatments/procedures).
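To show how the two tests relate, the sketch below computes a t statistic (the pooled-variance version, which assumes both groups have similar spread) and an ANOVA F statistic on the same two hypothetical groups. The data and function names are invented for illustration. Turning either statistic into a p-value requires the t or F distribution, which statistical software handles, so that step is left out here.

```python
import math

# Hypothetical velocity gains (mph) for two groups; invented for illustration.
group_a = [3, 4, 5, 6, 7]
group_b = [1, 2, 2, 3, 4]


def mean(values):
    return sum(values) / len(values)


def sum_sq(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def pooled_t(a, b):
    """Two-sample t statistic, assuming equal variances in both groups."""
    pooled_var = (sum_sq(a) + sum_sq(b)) / (len(a) + len(b) - 2)
    std_error = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / std_error


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_values) - len(groups))
    return ms_between / ms_within


t_stat = pooled_t(group_a, group_b)
f_stat = anova_f([group_a, group_b])

print(f"t = {t_stat:.3f}")
print(f"F = {f_stat:.3f}")
print(f"t squared = {t_stat ** 2:.3f}")
```

With exactly two groups, F equals t squared, so the two tests agree. ANOVA earns its keep when there are three or more groups, because anova_f accepts any number of them.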
The Chi-square test compares frequencies (counts). It tests whether or not the observed sample data differ significantly from the data we would expect if there were no differences between the groups. In other words, the expected counts are calculated under the assumption that the null hypothesis is true.
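As a minimal sketch, here is the Chi-square calculation for a 2x2 table of counts. The counts are hypothetical (two training groups, each split by whether players reported soreness), and no continuity correction is applied. The p-value shortcut using math.erfc only works for tables with 1 degree of freedom, such as this 2x2 case.

```python
import math

# Hypothetical counts, invented for illustration.
# Rows: training group A, training group B.
# Columns: reported soreness, did not report soreness.
observed = [
    [10, 20],
    [20, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if group and outcome were unrelated (null hypothesis).
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# A 2x2 table has 1 degree of freedom. For 1 degree of freedom, the chance of
# a chi-square value this large or larger equals erfc(sqrt(chi_square / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}")
print(f"p-value = {p_value:.4f}")
```

The bigger the gap between observed and expected counts, the larger the Chi-square value and the smaller the p-value.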
Logistic regression, like linear/multiple regression, tests the relationship between one or more predictor (or independent) variables and an outcome. However, instead of predicting the value of a numeric variable, it predicts the odds or probability that an event will happen.
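Fitting a logistic regression takes statistical software, but reading the output is simple enough to sketch. The example below assumes a model has already been fitted and shows how its coefficients turn into a probability. The intercept and slope here are hypothetical numbers chosen for illustration, not results from any study.

```python
import math

# Hypothetical coefficients from an imagined fitted model, NOT from real data.
# Outcome: whether a pitcher reports soreness; predictor: pitches thrown.
intercept = -4.0
slope = 0.05  # change in log-odds for each additional pitch


def predicted_probability(pitches):
    """Convert the model's log-odds into a probability between 0 and 1."""
    log_odds = intercept + slope * pitches
    return 1 / (1 + math.exp(-log_odds))


# Each additional pitch multiplies the odds of the event by this factor.
odds_ratio = math.exp(slope)

for pitches in (60, 80, 100):
    print(f"{pitches} pitches: probability = {predicted_probability(pitches):.2f}")
print(f"odds ratio per pitch = {odds_ratio:.3f}")
```

Studies usually report the odds ratio rather than the raw slope. An odds ratio above 1 means the event becomes more likely as the predictor increases, and below 1 means less likely.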
I can’t explain every statistical tool used in conducting and analyzing research, but we covered a good chunk of them. Having a basic understanding of these important statistical tools can help you interpret the results of a research study and judge whether or not the authors’ conclusions are actually justified.
And there you have it. Dr. Fehringer will be back Friday with his answers to questions 8-11.
Randy Sullivan, MPT, CSCS
CEO, The Florida Baseball Ranch.