Flashcards in Hypothesis Testing And Formulas In Excel Deck (18)
One sample t-test
Used for data with a NORMAL DISTRIBUTION
Example: effectiveness of a medicine. The same group of people are tested, one time before treatment and one time after.
The measurements cannot be carried out at the same time and the data are gathered from the same subjects.
Nullhypothesis: there is no significant difference (the medicine is not effective).
Degree of freedom: (n-1)
In this example that would mean the amount of people participating in the study minus one.
P value: T.TEST (array1, array2, 2,1)
Array 1: patients before treatment
Array 2: patients after treatment
T value: T.INV.2T
Appropriate statistical parameter: T-value (p-value, degree of freedom)
Two sample t-test
Used for data with a NORMAL DISTRIBUTION
Used for two individual groups with one variable. Example: is there a significant difference in body height between men and women?
First you have to use a F-test to find out which t-test to use. The null hypothesis of a f-test is that the variances of the two datasets are equal variance/variance = 1.
The f-test/f-value is variance of one dataset (the variance with the biggest value) divided by the variance of the other dataset.
The p-value of the f-test: t-test function.
If the f-test gives a value over 5% the variances are equal, and if the f-test gives a value under 5% the variances are unequal.
Equal variance: t-test type 2
Unequal variance: t-test type 3
Nullhypothesis of the t-test: there is no significant difference.
Degree of freedom: count both datasets (different groups) and subtract 2.
P value: T.TEST (array1, array2,2,2)
Array 1: height of men
Array 2: height of women
Appropriate statistical parameter is T value: T.INV.2T (probability, degree of freedom)
Degree of freedom: count both groups - 2
What is the f-value and when do you need it?
You only need the f-value when you have two different samples.
F-value: variance/variance. The biggest value is always on top, f-value is never below 0. If above 5%, variances are equal. Below, it's unequal.
Variance in excel: Var.S
F.test gives p-value
P-value of two-sided f-test: F.dist.2t
How do you use frequency function?
Create intervals for your data.
Create “bins” for your data. The bins are the last, or highest, number in the intervals.
Intervals: 0-10, 11-20, 21-30, 31-40
Bins: 10, 20, 30, 40
Select empty boxes for the frequencies, remember to add an extra box for at the bottom of the colon.
Press =FREQUENCY(array;bins). Select you data as the array and your bins for bins. Do not directly press enter, but press instead ctrl + shift + enter.
Create a histogram with your frequency, and change the x-axis to the intervals. Under chart design, press “select data” and select the intervals.
The frequency in that “class” or category divided by total number of elements
Frequency divided by the width of the “class” or interval.
Relative frequency density
Frequency density divided by total number of elements
Sum the frequencies as you go down the colons.
Total element number minus cumulative distribution
Array: data you want to find the percentile of
K= how many percent in decimals. (10% = 0,1)
Correlation (Pearsons) test
Used for data with NORMAL DISTRIBUTION
- usually the work correlation is mentioned in question.¨
1 group with 2 variables. Example: is there a correlation between blood potassium conc. and the length of musculus palmaris longus?
Degree of freedom: n - 2
Remember to only count one dataset if the same group is tested.
Correlation coefficient (r): CORREL function. Not important which dataset is in which array. It is always between 1 and 0. If r = 0, there is no correlation.
Determination coefficient (r^2): tells how strong the relationship between the
Appropriate statistical parameter: t-value = (r* root square (degree of freedom)) / (1- r^2)
P-value: T.dist.2t (t-value, degree of freedom)
First column: x, second column: y
Slope: SLOPE (x,y)
Intercept: INTERCEPT(x,y) (intercept is where the y axis is intercepted?)
y: a x + b
If 5 is given as x;
y: a * x + b
If 5 is y;
x: (y - b) / a
Wilcoxon signed rank test
The dataset has NON-NORMAL DISTRIBUTION
1 sample and 1 variable. Example: weight before and after diet
Degree of freedom: count - 1
Standard deviation: =STDEV.S
Standard error: ST DEV/ square root (count)
T-value: (mean/ standard error)
P-value: T.DIST (t-value, degree of freedom)
Wilcoxon value: wilcoxon function (array 1, array 2, 2, 0)
Array 1: weight before diet
Array 2: weight after diet
Tail (?): 2
Mann-whitney U- test
Used for dataset that has a NON-NORMAL DISTRIBUTION
When we have two independent samples. Example: change in headache in subjects given aspirin and subjects given a placebo pill.
Function: Mann-whitney function: (array 1, array 2, 2)
Chi- square test
For datasets that has NON-NORMAL DISTRIBUTION
- two of more groups. e.g. 'frequency of pulmonary cancer amongst smokers and non-smokers'. (No dependency)
- two or more groups:
- frequency data
- probability chart
- observed frequency and expected frequency. e.g, 'frequency of wearing glasses among girls and boys'.
- one group (known distribution), frequency data, testing how well the observed frequencies fit the expected frequencies. e.g, 'testing the normality of frog red blood cells'.
Degree of freedom: (number of rows - 1 )* (number of columns - 1)
Chi-square test: CHISQ.TEST (actual range; expected range)
Actual range: INNER four squares of measured.
Expected range: the calculated expected range of the measured.
Appropriate statistical parameter: Chi-square value: CHISQ.INV.RT (probability; degree of freedom)
Probability: chi-square test
Analysis of variance.
Three or more independent group, separated from each other. One variable has normal distribution.
Kruskall- wallis test
3 or more groups
non-parametric (non-normal distribution)
Ranked data without separating the groups
Example used: sick/healthy people.
Sensitivity - how many are true positive
Specificity - how many are true negative
Prevalence - how many are negative (all the sick true positive and false negative)
Start by making a chart with total set to 2000. Divide 2000 between sick and healthy people. If prevalence is 0,5, 1000 sick and 1000 healthy.
Sensitivity is how many true positive there are. Divide the amount of sick people by 100 and multiply that number with the percentage of sensitivity.
Specificity is how many true negative there are. Divide amount of healthy people by 100 and multiply that number with the percentage of specificity.
Positive predicted value (relevance): True positive / all positive
Negative predicted value (segregation): True negative / all negative
A-prior-probability: all bacterial / total
Correct classification rate/efficiency: all negative / total