Hypothesis Testing And Formulas In Excel Flashcards Preview

Biostatistics > Hypothesis Testing And Formulas In Excel > Flashcards

Flashcards in Hypothesis Testing And Formulas In Excel Deck (18)
Loading flashcards...

One sample t-test

Used for data with a NORMAL DISTRIBUTION
Example: effectiveness of a medicine. The same group of people are tested, one time before treatment and one time after.

The measurements cannot be carried out at the same time and the data are gathered from the same subjects.

Nullhypothesis: there is no significant difference (the medicine is not effective).

Degree of freedom: (n-1)
In this example that would mean the amount of people participating in the study minus one.

P value: T.TEST (array1, array2, 2,1)
Array 1: patients before treatment
Array 2: patients after treatment
Tail: 2
Type: 1

T value: T.INV.2T

Appropriate statistical parameter: T-value (p-value, degree of freedom)


Two sample t-test

Used for data with a NORMAL DISTRIBUTION

Used for two individual groups with one variable. Example: is there a significant difference in body height between men and women?

First you have to use a F-test to find out which t-test to use. The null hypothesis of a f-test is that the variances of the two datasets are equal variance/variance = 1.

The f-test/f-value is variance of one dataset (the variance with the biggest value) divided by the variance of the other dataset.
The p-value of the f-test: t-test function.
If the f-test gives a value over 5% the variances are equal, and if the f-test gives a value under 5% the variances are unequal.

Equal variance: t-test type 2
Unequal variance: t-test type 3

Nullhypothesis of the t-test: there is no significant difference.

Degree of freedom: count both datasets (different groups) and subtract 2.

P value: T.TEST (array1, array2,2,2)
Array 1: height of men
Array 2: height of women
Tails: 2
Type: 2

Appropriate statistical parameter is T value: T.INV.2T (probability, degree of freedom)
Degree of freedom: count both groups - 2


What is the f-value and when do you need it?

You only need the f-value when you have two different samples.

F-value: variance/variance. The biggest value is always on top, f-value is never below 0. If above 5%, variances are equal. Below, it's unequal.

Variance in excel: Var.S

F.test gives p-value
P-value of two-sided f-test: F.dist.2t


How do you use frequency function?

Create intervals for your data.

Create “bins” for your data. The bins are the last, or highest, number in the intervals.


Intervals: 0-10, 11-20, 21-30, 31-40
Bins: 10, 20, 30, 40

Select empty boxes for the frequencies, remember to add an extra box for at the bottom of the colon.
Press =FREQUENCY(array;bins). Select you data as the array and your bins for bins. Do not directly press enter, but press instead ctrl + shift + enter.

Create a histogram with your frequency, and change the x-axis to the intervals. Under chart design, press “select data” and select the intervals.


Relative frequency

The frequency in that “class” or category divided by total number of elements


Frequency density

Frequency divided by the width of the “class” or interval.


Relative frequency density

Frequency density divided by total number of elements


Cumulative distribution

Sum the frequencies as you go down the colons.


Integral discriminator

Total element number minus cumulative distribution



Function: percentile.inc
Array: data you want to find the percentile of
K= how many percent in decimals. (10% = 0,1)


Correlation (Pearsons) test

Used for data with NORMAL DISTRIBUTION
- usually the work correlation is mentioned in question.¨

1 group with 2 variables. Example: is there a correlation between blood potassium conc. and the length of musculus palmaris longus?

Degree of freedom: n - 2
Remember to only count one dataset if the same group is tested.

Correlation coefficient (r): CORREL function. Not important which dataset is in which array. It is always between 1 and 0. If r = 0, there is no correlation.
Determination coefficient (r^2): tells how strong the relationship between the
data is.

Appropriate statistical parameter: t-value = (r* root square (degree of freedom)) / (1- r^2)

P-value: T.dist.2t (t-value, degree of freedom)

First column: x, second column: y

Slope: SLOPE (x,y)
Intercept: INTERCEPT(x,y) (intercept is where the y axis is intercepted?)

y: a x + b

a: slope
b: intercept

If 5 is given as x;
y: a * x + b

If 5 is y;
x: (y - b) / a


Wilcoxon signed rank test


1 sample and 1 variable. Example: weight before and after diet

Degree of freedom: count - 1

Standard deviation: =STDEV.S
Standard error: ST DEV/ square root (count)
T-value: (mean/ standard error)
P-value: T.DIST (t-value, degree of freedom)

Wilcoxon value: wilcoxon function (array 1, array 2, 2, 0)

Array 1: weight before diet
Array 2: weight after diet
Tail (?): 2
Type: 0


Mann-whitney U- test

Used for dataset that has a NON-NORMAL DISTRIBUTION

When we have two independent samples. Example: change in headache in subjects given aspirin and subjects given a placebo pill.

Function: Mann-whitney function: (array 1, array 2, 2)


Chi- square test

For datasets that has NON-NORMAL DISTRIBUTION

For independency:
- two of more groups. e.g. 'frequency of pulmonary cancer amongst smokers and non-smokers'. (No dependency)

For homogenity:
- two or more groups:

- frequency data
- probability chart
- observed frequency and expected frequency. e.g, 'frequency of wearing glasses among girls and boys'.
- one group (known distribution), frequency data, testing how well the observed frequencies fit the expected frequencies. e.g, 'testing the normality of frog red blood cells'.

Degree of freedom: (number of rows - 1 )* (number of columns - 1)

Chi-square test: CHISQ.TEST (actual range; expected range)
Actual range: INNER four squares of measured.
Expected range: the calculated expected range of the measured.

Appropriate statistical parameter: Chi-square value: CHISQ.INV.RT (probability; degree of freedom)
Probability: chi-square test



Analysis of variance.

Three or more independent group, separated from each other. One variable has normal distribution.


Kruskall- wallis test

3 or more groups

non-parametric (non-normal distribution)

Ranked data without separating the groups


Diagnostic tests

Example used: sick/healthy people.

Sensitivity - how many are true positive
Specificity - how many are true negative
Prevalence - how many are negative (all the sick true positive and false negative)

Start by making a chart with total set to 2000. Divide 2000 between sick and healthy people. If prevalence is 0,5, 1000 sick and 1000 healthy.

Sensitivity is how many true positive there are. Divide the amount of sick people by 100 and multiply that number with the percentage of sensitivity.

Specificity is how many true negative there are. Divide amount of healthy people by 100 and multiply that number with the percentage of specificity.

Positive predicted value (relevance): True positive / all positive
Negative predicted value (segregation): True negative / all negative

A-prior-probability: all bacterial / total
Correct classification rate/efficiency: all negative / total


Hemoglobin reference range: 13,7-16,2 g/dL
Hemoglobin measured value: 14,7 g/dL

What is the mean and standard deviation?
How many measured values do we expect to fall out of the reference range is we made 420 measurements?

Mean: max+min / 2

(16,2 + 13,7)/2 = 14,95 g/dL

Standard deviation: 4*standard deviation = max-min
The whole range is the same as 4*standard deviation

(16,2-13,7)/4 = 0,625

How many measured values do we expect to fall out of the reference range when we have 420 measurements?

Reference range is 95%, so we are looking for the 5% who fall out of this range. NOTE that on the test they might ask for which measurements fall above or below the reference range, which is 2,5%.

0,05 * 420 = 21 measurements