what is a variable
a condition/ characteristic that can change or have different values
defining characteristics
 attribute that describes a person/ place/ thing
 value can vary betw/ diff entities
qualitative vs quantitative variables
qualitative: values that are names/ labels
quantitative: numeric variables that measure quantity
discrete vs continuous variables
continuous variable: a variable that can have any value betw/ its min and max values
discrete: a variable that can only have certain separate values (e.g. whole numbers), not any value betw/ min and max
univariate vs bivariate data
Univariate data: when a study consists of only one variable
Bivariate data: when a study examines the relationship betw/ 2 variables
what is a nominal scale
 lowest statistical measurement level
 this scale is given to items that are divided into categories without any order or structure
 e.g.
 gender
 eye colour
 blood type
what is the ordinal scale
consists of variables that have an inherent order to the relationship among diff categories
 a ranking of responses that may have diff meaning among individuals
 shows gross order but not the relative distance betw/ categories, as the distances are not equal
 properties of ordinal scale:
 1) Identity: quality being measured
 2) Magnitude: amount of the quality being measured gives a quantitative distance betw/
what is the interval scale
variables that have constant, equal distances between values but the zero point is arbitrary
properties:
 identity
 magnitude
 equal distance: the difference betw/ adjacent points is the same all along the scale
e.g. IQ score, pain scale w/ no,
what is a ratio scale
top level of measurement with all the properties of an abstract number system but with an absolute zero
properties
 identity
 magnitude
 equal distance
 absolute zero: allows stating how many times greater one case is than another
 allows use of all mathematical operations
 e.g.
 weight
 pulse rate
 respiratory rate
what is a measure of central tendency / central location
a single value that attempts to describe a set of data by identifying the central position within the data set
 mean
 median
 mode
describe the mean
 most familiar measure of central tendency
 acts as a model (typical value) of the data set even though it's often not one of the observed values => min error when predicting any one value
 used w/ discrete and continuous data, latter most common
 sum of all the values divided by the no. of values in the data set
 includes every value of data set
 only measure of central tendency where the sum of deviations of each value from the mean = 0
 sample mean = X bar
 population mean = µ
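A minimal sketch of the calculation above, using made-up scores (not from any real study). It checks the two properties noted: the mean need not be one of the observed values, and the deviations from the mean sum to 0.

```python
from statistics import mean

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 6, 9]

# Mean = sum of all the values divided by the number of values.
x_bar = sum(scores) / len(scores)

# Deviation of each value from the mean; these sum to zero
# (up to floating-point rounding).
deviations = [x - x_bar for x in scores]

print(x_bar)             # 5.0, which is not one of the observed scores
print(sum(deviations))   # 0.0
print(mean(scores))      # the standard-library helper gives the same mean
```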
what is the main disadvantage of the mean
very susceptible to outliers (values unusual compared to the rest of the data set by being very small/ large)
mean can be skewed by these values
if so the median is a better measure of central tendency
when not to use the mean and use the median instead
presence of outliers
skewed distribution: the mean moves away from the centre but the median stays central and is least influenced
 in normal distribution: mean= median=mode
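To see the outlier effect in numbers, here is a small sketch with hypothetical values (made up for illustration). Adding one unusually large value drags the mean a long way, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical values, made up for illustration only.
typical = [30, 32, 35, 36, 37]
with_outlier = typical + [200]   # one unusually large value added

print(mean(typical), median(typical))             # mean 34, median 35
print(mean(with_outlier), median(with_outlier))   # mean jumps above 60, median is 35.5
```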
what is the median
the middle score for a set of data that has been arranged in order of magnitude
 least affected by outliers and skewed data
 order the values and find the middle; if even no. of values, find the mean of the two middle values
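The ordering rule above can be sketched as a short function. The name `median_of` is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def median_of(values):
    """Middle score after sorting; mean of the two middle scores if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

print(median_of([7, 1, 5]))      # 5
print(median_of([7, 1, 5, 3]))   # 4.0
```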
what is the mode
most frequent score in the data set.
 highest bar on histogram
 used for categorical data when the most popular option is sought after
problems with the mode

 non unique: causes problems when 2 values are equally popular
 even more problematic for continuous data, as finding an exact mode is unlikely => mode is rarely used w/ continuous data
 if the mode is far from the rest of the data in the set then it represents the data poorly
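A short sketch of the non-uniqueness problem, using Python's `statistics.multimode` (Python 3.8+) on made-up data. With tied categories it returns more than one mode, and with continuous measurements every value tends to appear once, so every value is returned.

```python
from statistics import multimode

# Hypothetical categorical data: two values are equally popular.
eye_colour = ["brown", "blue", "brown", "green", "blue"]
print(multimode(eye_colour))    # ['brown', 'blue']

# Hypothetical continuous data: no value repeats, so there is no useful mode.
heights_cm = [170.2, 168.9, 181.4, 175.0]
print(multimode(heights_cm))    # all four values come back
```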
which measures of central tendency are best used in normal and skewed distributions
normal distribution
 mean or median can be used but mean is ideal because:
 it has the least amount of error since it includes all values in the data set
 any change in the scores affects the value of the mean but not necessarily the mode/ median
skewed distribution
 the mean is dragged in the direction of the skew so the median is best
 increased skew increases the diff betw/ mean and median
data expected to be normal but shown to be NON NORMAL by normality tests
 median preferred over mean as a rule of thumb, unless there's no real diff betw/ median and mean
match the variable to the type of central tendency preferred
nominal= Mode
Ordinal=Median
interval/ratio(non skew)= Mean
interval/ratio skew=Median
what is a measure of spread / measure of dispersion
describes the variability in a sample/ population
 used alongside measures of central tendency to give a description of the overall data
what is the purpose of measuring a data spread
 shows how well the central tendency represents the data
 large spread suggests large diff betw individual scores and vv for small spread
 consists of
 range
 quartiles
 absolute deviation
 standard deviation
what is the range
the difference between the highest and lowest scores in a data set and is the simplest measure of spread
 range = max value - min value
 sets the boundaries for scores
 useful for measuring critically high or low thresholds
 detects errors when inputting data
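A minimal sketch of the range and of its use as a data-entry check. The scores are hypothetical, and the mistyped value 580 is invented for illustration.

```python
# Hypothetical scores, made up for illustration only.
scores = [52, 47, 61, 58, 49]

# Range = max value - min value.
data_range = max(scores) - min(scores)
print(data_range)       # 14

# A mistyped entry (580 instead of 58) shows up as an implausibly large range.
with_typo = scores + [580]
typo_range = max(with_typo) - min(with_typo)
print(typo_range)       # 533
```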
what are quartile and interquartile ranges
quartiles: breaks data into quarters
even number of scores: find the mean of the 2 scores either side of each quarterly place in the data set
odd number of scores: the values at the 25th, 50th and 75th percentile positions are the quartiles
Q2 = median
benefits of quartiles and what is interquartile range
 less affected by outliers and skewed data like the median so are best choice for measuring the spread of these data sets

interquartile range = the diff betw/ Q3 & Q1, which shows the range of the middle half of the distribution of scores
 Q3 - Q1 = interquartile range
 semi interquartile range: half the interquartile range = (Q3 - Q1)/2
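A sketch of quartiles, interquartile range and semi interquartile range on made-up scores. It assumes the "median of each half" convention (the median itself is left out of both halves when the no. of scores is odd); the function name `quartiles` is hypothetical. Other conventions exist, e.g. `statistics.quantiles` in the standard library, and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (needs >= 2 values)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # scores below the median position
    upper = ordered[half + n % 2:]    # scores above it (median skipped if n is odd)
    return median(lower), median(ordered), median(upper)

# Hypothetical scores, made up for illustration only (n = 8).
scores = [1, 3, 4, 6, 7, 9, 12, 15]
q1, q2, q3 = quartiles(scores)

iqr = q3 - q1          # interquartile range = Q3 - Q1
semi_iqr = iqr / 2     # semi interquartile range = (Q3 - Q1) / 2

print(q1, q2, q3)      # 3.5 6.5 10.5
print(iqr, semi_iqr)   # 7.0 3.5
```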
Drawback of quartiles
they don't take into account every score in the data set
what is the absolute/ variance/ standard deviation
how to calculate absolute & mean absolute deviation
shows the amount of deviation/variation that occurs around the mean score
total variability: addition of the deviation of each score, divided by the number of scores
the choice of absolute deviation, variance and standard deviation depends on the type of statistic
 easiest way to calc deviation = individual score minus mean score
 values above the mean are +ve and below are -ve
 total variability would be 0 because the positive and negative deviations cancel, so the signs are ignored and only absolute values are used = absolute deviation => summed and divided by the no. of scores = mean absolute deviation
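The steps above can be sketched on made-up scores: signed deviations cancel to 0, so the signs are dropped before averaging.

```python
# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 6, 9]
mean_score = sum(scores) / len(scores)          # 5.0

# Deviation = individual score minus mean score (+ve above, -ve below).
deviations = [x - mean_score for x in scores]   # [-3.0, -1.0, -1.0, 1.0, 4.0]
print(sum(deviations))                          # 0.0, positives and negatives cancel

# Ignore the signs, then average: mean absolute deviation.
absolute_deviations = [abs(d) for d in deviations]
mean_absolute_deviation = sum(absolute_deviations) / len(scores)
print(mean_absolute_deviation)                  # 2.0
```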
how to calc variance
achieves positive values of the deviations from the mean by squaring them
addition of the squared deviations gives the sum of squares
the sum of squares is divided by n
 if the values in the data are spread out from the mean then the variance is a large number
 if the values are closer to the mean then the variance is small
 problems with variance
 squaring gives more weight to extreme scores so it is susceptible to outliers
 the units of variance are squared so they differ from the units of the data set so they can't be directly related to data set values
 calculating the standard deviation solves this problem
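A sketch of the variance calculation as described (sum of squares divided by n), on made-up scores. `statistics.pvariance` from the standard library is used only as a cross-check; it also divides by n.

```python
from statistics import pvariance

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 6, 9]
n = len(scores)
mean_score = sum(scores) / n                          # 5.0

# Squaring makes every deviation positive.
squared_deviations = [(x - mean_score) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)              # 28.0

# Variance = sum of squares divided by n (units are squared).
variance = sum_of_squares / n
print(variance)             # 5.6
print(pvariance(scores))    # cross-check with the standard library
```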
what is the standard deviation
a measure of the spread of scores w/in a data set
sample SDs differ from population SDs in their calculation
when to calculate the pop SD
 if data on entire pop is present
 if the sample is all you're interested in and don't want to generalize your result
when to use the sample SD: if you have sample data and wish to generalize to population
NB: the sample SD is not a deviation of the sample itself but an estimate of the pop SD based on sample data
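A sketch of the difference in calculation, on made-up scores. The notes don't spell out the formulas, so this assumes the standard convention: the pop SD divides the sum of squares by n, the sample SD divides by n - 1, and both take the square root. Python's `statistics.pstdev` and `statistics.stdev` follow that convention.

```python
from statistics import pstdev, stdev

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 6, 9]

# Population SD: treat the scores as the entire population (divide by n).
pop_sd = pstdev(scores)

# Sample SD: treat the scores as a sample and estimate the pop SD (divide by n - 1).
sample_sd = stdev(scores)

print(round(pop_sd, 3))      # 2.366
print(round(sample_sd, 3))   # 2.646, slightly larger than the pop SD
```

Taking the square root puts the SD back in the same units as the data, which is why it can be related directly to the data set values.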
which type of data should be used to calculate SD
 SD is used along w/ the mean to summarize continuous data NOT CATEGORICAL DATA
 only appropriate if the data is normally distributed/ non skewed
define the EMPIRICAL RULE
for a normal distribution nearly all of the data will fall within three standard deviations of the mean
what are the three parts of the empirical rule
 68% of data falls w/in 1 SD of the mean: µ ± 1×SD
 95% falls w/in 2 SDs: µ ± 2×SD
 99.7% falls w/in 3 SDs: µ ± 3×SD
aka the 3 sigma rule
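The three percentages can be checked against an ideal normal curve. This sketch uses `statistics.NormalDist` (Python 3.8+) with µ = 0 and SD = 1; real data will only match approximately.

```python
from statistics import NormalDist

# Ideal normal distribution with mean 0 and SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution lying within k SDs of the mean, for k = 1, 2, 3.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The exact shares are roughly 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.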