List the core analytical methods of cost estimating discussed in this module.
These methods are essential for understanding and analyzing cost data.
What does Exploratory Data Analysis involve?
Analyzing and investigating the data set to summarize main characteristics
It marks the first step in developing a cost or risk estimate.
Define univariate data.
Data consisting of a single variable
Examples include cost data for a single element or historical cost growth factors.
What is the difference between bivariate and multivariate data?
Bivariate data has one independent and one dependent variable; multivariate data has several independent variables and one dependent variable
Examples include software development cost (bivariate) and cost of ship supplies based on crew size and hours underway (multivariate).
What is time series data?
A bivariate data set with time as the independent variable
It requires different analytical approaches compared to univariate or multivariate data.
What is the purpose of regression analysis?
Identifying smooth trends in data
On its own, regression does not detect paradigm shifts or cycles in time series data.
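As an illustration of fitting a smooth trend, here is a minimal sketch of a straight-line least-squares fit. The function name `fit_line` and the data are made up for this example; the module may use a different regression form.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: cost observed in five consecutive periods.
periods = [1, 2, 3, 4, 5]
cost = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(periods, cost)
print(slope, intercept)
```

A single fitted line like this summarizes the overall trend only; a step change or a seasonal pattern in the same data would show up in the residuals rather than in the slope.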
What are outliers?
Data points that fall far from the central mass of the data
They can distort both descriptive and inferential statistics.
True or false: Outliers should always be removed from data sets.
FALSE
Outliers must never be removed without justification and documentation.
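One common way to flag candidate outliers for review is the 1.5 x IQR rule. This rule is an assumption made for illustration and is not stated in the module; the function name `flag_outliers` and the data are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up cost data with one value far from the central mass.
costs = [10, 12, 11, 13, 12, 11, 50]
flagged = flag_outliers(costs)
print(flagged)
```

The function only flags points. Whether a flagged point is removed remains a judgment that needs justification and documentation.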
What is the mean?
The arithmetic average of a data set
It is sensitive to outliers, so it may not represent a skewed data set accurately.
How is the median calculated?
Order data from lowest to highest and find the middle value
If there is an even number of data points, average the two middle values.
What is the mode?
The most frequently occurring value in a data set
It is the least used of the three measures of central tendency.
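The three measures of central tendency above can be compared with a short sketch using Python's standard `statistics` module. The data values are made up to show how one large value pulls the mean but not the median or mode.

```python
import statistics

# Made-up cost data; 100 is far from the other values.
costs = [10, 12, 12, 14, 100]

mean = statistics.mean(costs)      # arithmetic average
median = statistics.median(costs)  # middle value of the ordered data
mode = statistics.mode(costs)      # most frequently occurring value

# With an even number of points, the median averages the two middle values.
even_median = statistics.median([10, 12, 14, 16])

print(mean, median, mode, even_median)
```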
What is the Coefficient of Variation (CV)?
A measure of relative dispersion: the ratio of the standard deviation to the mean
It is useful for comparing the degree of variation between different data sets.
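A minimal sketch of the CV calculation, assuming the sample standard deviation is used (the module does not say which form it intends). The function name and both data sets are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Two made-up data sets on very different scales.
small_items = [9, 10, 11]
large_items = [900, 1000, 1100]

cv_small = coefficient_of_variation(small_items)
cv_large = coefficient_of_variation(large_items)
print(cv_small, cv_large)
```

The standard deviations differ by a factor of 100, yet the CVs match, which is why the CV is the more useful figure when comparing data sets with different means.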
What is the significance of standard deviation?
It measures the amount of variation or dispersion in a set of values
It is crucial for understanding the spread of data points around the mean.
What does data validation involve?
Examining descriptive statistics, assessing outliers, and comparing historical results
It ensures the credibility of the data used in analysis.
What is a paradigm shift in data analysis?
A marked change in the nature of the data occurring at some point or over some period
An example includes lower cost growth in programs due to changes in acquisition law.
What are cycles in data analysis?
Repeating periodic trends often found in seasonal data
Examples include higher electricity costs in winter and summer.
What is autocorrelation?
When a variable is correlated with its own past values
It indicates dependencies within the data.
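To make the idea concrete, here is a sketch of one common estimator of the lag-k autocorrelation coefficient. The estimator, the function name `autocorrelation`, and both series are assumptions chosen for illustration.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series with itself shifted by `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series must not be constant")
    # Products of deviations for each value and the value `lag` steps earlier.
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


# Made-up series: a steady trend and a strictly alternating pattern.
trend = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

r_trend = autocorrelation(trend)
r_alternating = autocorrelation(alternating)
print(r_trend, r_alternating)
```

A positive value means each observation tends to resemble the one before it; a negative value means consecutive observations tend to move in opposite directions.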
True or false: The mode is the most commonly used measure of central tendency.
FALSE
The mode is the least used of the three measures of central tendency.
What does a lower variance indicate?
Less dispersion or spread of data
A lower variance means the data points are closer to the mean.
What is the formula for sample variance?
s² = ∑(xi - x̄)² / (n - 1)
Dividing by n - 1 rather than n makes the sample variance an unbiased estimator of the population variance.
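The formula above can be written out directly and checked against `statistics.variance` from Python's standard library, which also divides by n - 1. The function name `sample_variance` and the data are made up for this sketch.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Made-up data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)            # 32 / 7
s2_library = statistics.variance(data)
s = s2 ** 0.5                         # sample standard deviation
print(s2, s2_library, s)
```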
What is the standard deviation?
The square root of the variance
It measures the typical distance of data points from their mean, in the same units as the data.
What does it mean if a distribution is skewed right?
The median is typically lower than the mean
This indicates that the distribution stretches out to the right.
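A small sketch of a right-skewed sample, using a moment-based skewness coefficient (one common definition, assumed here; the module may use another). The function name `skewness` and the data are hypothetical.

```python
import statistics


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Made-up data with a long right tail (one large value).
growth = [1, 2, 2, 3, 3, 3, 4, 20]

mean = statistics.mean(growth)
median = statistics.median(growth)
skew = skewness(growth)
print(mean, median, skew)
```

For this sample the mean sits above the median and the skewness coefficient is positive, both consistent with a distribution that stretches out to the right.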
What is kurtosis?
A measure that describes the shape of a distribution’s tails
It indicates whether the tails contain extreme values.
What does a low Coefficient of Variation (CV) indicate?
Less dispersion in the data
A CV of less than 15% is desired in cost estimating.