Definition of biostatistics
The science of collecting, organizing, analyzing, interpreting, and presenting data in order to make more effective decisions in a clinical context.
4 points on the importance of biostatistics (mnemonic: IDID)
• Identify and develop treatments for disease and estimate their effects
• Design, monitor, analyze, interpret, and report results of clinical studies
• Identify risk factors for diseases
• Develop statistical methodologies to address questions arising from medical/public health data
When do you need biostatistics?
BEFORE you start your study!
difference between a population and a sample
Population: includes all objects of interest
• associated with PARAMETERS (μ, σ)
Sample: only a portion of the population
• associated with STATISTICS (x̄, s)
We compute statistics from the sample and use them to estimate the population parameters.
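A minimal sketch of the idea above, using only the Python standard library. The population (10,000 simulated blood pressure values), the seed, and the sample size of 100 are all made up for illustration. In real work the parameters μ and σ are unknown, and only the sample statistics can be computed.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated systolic blood pressures (mmHg).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameters describe the whole population (usually unknown in practice).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population SD, divides by N

# Statistics are computed from a sample and used to estimate the parameters.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample SD, divides by n - 1

print(f"mu = {mu:.1f}, x_bar = {x_bar:.1f}")
print(f"sigma = {sigma:.1f}, s = {s:.1f}")
```

The sample values x̄ and s should land close to μ and σ, but not exactly on them. That gap is the sampling variability.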
reasons why we don't work with populations
• Populations are usually large, and it is often impossible to get data for every object of study
• Sampling is costly, and the more items surveyed, the larger the cost
Descriptive vs Inferential statistics
Statistics are computed in order to estimate the parameters of a population.
Descriptive Statistics
• first (computational) part of statistical analysis
• procedures used to organize and summarize masses of data
Inferential Statistics
• second (estimation) part of statistical analysis
• used to find out information about a population, based on a sample
define biased sample
A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population.
define random sampling
Each element/item in the population has an equal chance of being selected.
• preferred way, but difficult to execute
• requires a complete list of every element in the population, so it is usually associated with a computer-generated list
define systematic sampling
Elements are counted off and every x-th element is taken.
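The two definitions above can be sketched in a few lines of Python. The sampling frame (IDs 1 to 100), the sample size of 10, and the random start for the systematic sample are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)
population = list(range(1, 101))   # hypothetical sampling frame: IDs 1..100

# Simple random sampling: every element has the same chance of selection.
simple_random = rng.sample(population, 10)

# Systematic sampling: count off and take every k-th element,
# starting from a random position within the first k elements.
k = len(population) // 10          # sampling interval, here 10
start = rng.randrange(k)
systematic = population[start::k]

print(sorted(simple_random))
print(systematic)
```

Note that both methods need the complete list of elements up front, which is why random sampling is described as difficult to execute.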
OTHER TYPES OF SAMPLING
Convenience sampling:
• readily available data is used (e.g. the first people the surveyor runs into)
Cluster sampling:
• divides the population into groups (clusters), usually geographically
• clusters are randomly selected
• every element in the chosen clusters is used
Stratified sampling:
• divides the population into groups called strata
• grouping is by some characteristic (e.g. M/F), not geographically
• a sample is taken from each stratum using random, systematic, or convenience sampling
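A rough sketch contrasting cluster and stratified sampling. The 200 simulated people, the district labels A to E, the helper `group_by`, and the choice of 2 clusters and 10 people per stratum are all hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(1)

# Hypothetical population: 200 people, each with a sex and a district.
people = [
    {"id": i, "sex": rng.choice("MF"), "district": rng.choice("ABCDE")}
    for i in range(200)
]

def group_by(items, key):
    """Group the records by the value stored under `key`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return groups

# Cluster sampling: randomly pick whole clusters, then use every element in them.
clusters = group_by(people, "district")
chosen = rng.sample(sorted(clusters), 2)
cluster_sample = [p for d in chosen for p in clusters[d]]

# Stratified sampling: sample within every stratum (here by simple random sampling).
strata = group_by(people, "sex")
stratified_sample = [
    p for group in strata.values() for p in rng.sample(group, 10)
]

print(chosen, len(cluster_sample), len(stratified_sample))
```

The key difference: cluster sampling uses only some groups but everyone in them, while stratified sampling uses every group but only some people from each.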
3 things that determine a good sample
Random selection
Representativeness by structure
Representativeness by number of cases
types of error
Random error = sampling variability.
Bias (systematic error) = difference between the observed value and the true value due to all causes other than sampling variability.
Absence of error of all kinds = accuracy.
sample size calculation principles
• law of large numbers = as the sample size increases, the margin of error decreases, because the percentage difference between population and sample goes to zero
• the number of experimental units must be justified
• purpose of size calculation = large enough for accurate information but small enough for practicality
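The law of large numbers can be seen in a small simulation. This is only a sketch: the true event rate of 0.30, the sample sizes, and the number of repeats are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(7)
TRUE_RATE = 0.30   # hypothetical event rate in the population

def mean_abs_error(n, repeats=200):
    """Average |sample proportion - true rate| over repeated samples of size n."""
    errors = []
    for _ in range(repeats):
        events = sum(rng.random() < TRUE_RATE for _ in range(n))
        errors.append(abs(events / n - TRUE_RATE))
    return statistics.fmean(errors)

for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 3))
```

The average error shrinks as n grows, but each tenfold increase in n buys a smaller absolute improvement. That is the trade-off between accuracy and practicality.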
factors sample size depends on (mnemonic: APEUS)
Acceptable level of confidence
Power of the study
Expected effect size
Underlying event rate in the population
Standard deviation in the population
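As an illustration of how these factors combine, here is a sketch based on a common normal-approximation formula for comparing two group means, n per group = 2 (z for confidence + z for power)² σ² / δ². The formula and the function name `n_per_group` are not from these notes; they are assumptions for the example. It covers a continuous outcome only, so the underlying event rate (used for proportions) does not appear.

```python
import math
from statistics import NormalDist

def n_per_group(alpha, power, sigma, delta):
    """Approximate sample size per group for comparing two means.

    alpha : two-sided significance level (confidence = 1 - alpha)
    power : desired power of the study
    sigma : standard deviation in the population
    delta : expected effect size (difference between the group means)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical inputs: 95% confidence, 80% power, SD 10, expected difference 5.
print(n_per_group(alpha=0.05, power=0.80, sigma=10, delta=5))
```

Under this formula, higher confidence, higher power, a larger standard deviation, or a smaller expected effect all push the required sample size up.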
stages of biomedical research
Planning and organization
Conducting the investigation
Data processing and analysis of results
8 components of a research programme
Aim: summary and formulation of the research hypothesis.
Object: the event that is going to be studied.
Units of observation: logical (the studied case) and technical (the environment of the logical unit).
Indices of observation: important measurable factors. They are measurable, additive, and self-controlling.
• factorial
• resultative
Place
Time:
• single: studied at single "critical" moments
• continuous: shows the long-term tendency of events
Statistical analyses
Methodology
one vs many measurements
Many measurements on one subject: you get to know that subject quite well but learn nothing about how the response varies across subjects.
One measurement on many subjects: you learn less about each individual, but you get a good sense of how the response varies across subjects.
explain paired and unpaired data
paired: two or more measurements are made on the same observational unit (e.g. subjects, couples)
unpaired: only one type of measurement is made on each unit
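A small sketch of how the two kinds of data are laid out and summarized. All numbers are made up, and the blood pressure before/after setting is only an assumed example.

```python
import statistics

# Hypothetical paired data: blood pressure before and after treatment
# in the same five subjects (position i is the same person in both lists).
before = [150, 142, 160, 155, 148]
after = [140, 138, 151, 150, 147]

# Paired analysis works on the within-subject differences.
differences = [b - a for b, a in zip(before, after)]
mean_difference = statistics.fmean(differences)

# Hypothetical unpaired data: two independent groups,
# no matching between them, and the group sizes may differ.
group_a = [150, 142, 160, 155, 148]
group_b = [139, 151, 146, 144]
difference_of_means = statistics.fmean(group_a) - statistics.fmean(group_b)

print(differences, mean_difference, difference_of_means)
```

With paired data the order of the values matters, because each difference belongs to one unit. With unpaired data only the group summaries can be compared.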
describe the parts of the research plan in planning and organisation
Definition of the team responsible for the study and preliminary training.
Administration and management of the study.
key components of information processing
Data check and correction
Data coding
Data aggregation
• according to data use: primary / secondary
• according to number of indices: simple / complex
benefits of data summary
• become familiar with the data and the characteristics of the sample that you are studying
• identify problems with data collection or errors in the data (data management issues)
• range checks for illogical values
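A range check can be sketched as follows. The variable names, the allowed limits, and the three records are hypothetical, and `range_check` is a helper written for this example rather than a library function.

```python
# Hypothetical plausible ranges for each variable.
LIMITS = {"age_years": (0, 120), "systolic_bp": (50, 300)}

records = [
    {"id": 1, "age_years": 34, "systolic_bp": 118},
    {"id": 2, "age_years": 340, "systolic_bp": 121},   # likely typing error
    {"id": 3, "age_years": 58, "systolic_bp": 12},     # illogical value
]

def range_check(rows, limits):
    """Return (id, variable, value) for every value outside its allowed range."""
    problems = []
    for row in rows:
        for name, (low, high) in limits.items():
            if not low <= row[name] <= high:
                problems.append((row["id"], name, row[name]))
    return problems

flagged = range_check(records, LIMITS)
for problem in flagged:
    print(problem)
```

Flagged values are candidates for the data check and correction step. The check finds them but does not decide what the correct value should be.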
difference between a variable and data
variable: something whose value can vary
data: the values obtained from measuring a variable
categories of variables
Nominal
• Values fall in arbitrary categories
• Order of the categories is completely arbitrary/meaningless
• No units: data have no units of measurement
Ordinal
• Values fall in ordered categories
• Order of the categories is not arbitrary: it is possible to order the categories in a meaningful way
• No units: data have no units of measurement
what are the four levels of measurement
• Nominal is the lowest level. Only names are meaningful here.
   e.g. genotype: there are no numbers, so calculation is meaningless
• Ordinal adds an order to the names.
   e.g. pain score: the order matters, but not the difference between scores
• Interval adds meaningful differences.
   e.g. temperature: the difference between 2 points is meaningful
• Ratio adds a zero so that ratios are meaningful.
   e.g. height: clear definition of 0, so we can look at the ratio of 2 measurements
visual methods to summarize data
• tables
   • frequency table
• graphs / graphical summary
   • bar chart = categorical data
   • histogram = continuous data
   • boxplot = continuous data
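As a small illustration of the first two methods, this sketch builds a frequency table for categorical data and prints a crude text bar chart next to it. The blood group values are made up; a plotting library would normally be used for real graphs.

```python
from collections import Counter

# Hypothetical categorical data: blood groups of 12 patients.
blood_groups = ["A", "O", "B", "O", "A", "AB", "O", "A", "O", "B", "O", "A"]

counts = Counter(blood_groups)
total = len(blood_groups)

# Frequency table: category, count, percentage, and a text bar of length n.
print(f"{'Group':<6}{'n':>3}{'%':>7}")
for group, n in counts.most_common():
    print(f"{group:<6}{n:>3}{100 * n / total:>7.1f}  {'#' * n}")
```

For continuous data the values would first be grouped into intervals, which is what a histogram does.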