Module 6 - Basic Data Analysis Principles Flashcards

Question

What is **data visualization**?

Answer 1

The principle of displaying data sets in an easily understood manner ## Footnote It helps reveal data distribution, patterns, trends, and outliers.

Answer 2

To graph two variables along two axes ## Footnote Scatter plots help detect trends, shifts, or potential outliers in data.

Answer 3

y-axis ## Footnote The independent variables that drive cost are graphed on the x-axis.

Answer 4

The correlation strength between two variables ## Footnote It represents a reduction in variability for predicting the dependent variable.

Answer 5

3 * (Mean - Median) / Standard Deviation ## Footnote This measures the standard deviations separating the mean and median.
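
A minimal sketch of the formula above, assuming the sample standard deviation is intended (the card does not say which one). The function name and the cost figures are hypothetical and only illustrate the arithmetic.

```python
import statistics


def pearson_median_skewness(values):
    """Return 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std_dev = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / std_dev


# Hypothetical cost data: one large value pulls the mean above the median.
costs = [10, 12, 13, 14, 15, 16, 60]
skew = pearson_median_skewness(costs)
print(f"skewness = {skew:.2f}")  # positive, so the data are right-skewed
```

A positive result means the mean sits above the median (right skew); a symmetric data set gives zero.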

Answer 6

It indicates the proportion of variance explained by the model ## Footnote Stakeholders may expect a high R² value, but cost estimation typically does not yield such high values.
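
As a rough illustration of "proportion of variance explained", R² can be computed as 1 minus the ratio of residual to total sum of squares. The function name and the observed and predicted values below are made up for the example.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical actual costs and model predictions.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.3f}")  # SS_res = 26, SS_tot = 500, so R^2 = 0.948
```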

Answer 7

Independent variable (x) ## Footnote The dependent variable (y) is plotted on the vertical axis.

Answer 8

TRUE ## Footnote Linear data allows for straightforward analysis and regression.

Answer 9

Transform the data to make it linear ## Footnote This allows the analyst to perform linear regressions on the transformed data.

Answer 10

y = a + bx ## Footnote Here, 'a' is the y-intercept and 'b' is the slope.
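
One common way to estimate 'a' and 'b' is ordinary least squares. The card does not name a fitting method, so treat this as an assumed sketch; `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b


# Hypothetical points that lie exactly on y = 3 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
a, b = fit_line(xs, ys)
print(a, b)  # 3.0 2.0
```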

Answer 11

* Linear functions * Power functions * Exponential functions ## Footnote Each type has specific characteristics and applications in cost estimation.

Answer 12

y = ax^b ## Footnote This equation can be transformed to show a linear relationship in log space.

Answer 13

ln(y) = ln(a) + b ln(x) ## Footnote In this transformation, 'b' is the slope and 'ln(a)' is the y-intercept.
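
The log-space transformation can be sketched as follows: take natural logs of both variables, fit a straight line, then convert the intercept back with the exponential. The least-squares fit is an assumption, and `fit_power` and the sample data are hypothetical.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b by linear regression on ln(x) and ln(y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_lx) ** 2 for lx in log_x)
    b = sxy / sxx                  # slope in log space
    ln_a = mean_ly - b * mean_lx   # y-intercept in log space
    return math.exp(ln_a), b


# Hypothetical data generated from y = 3 * x**0.5.
xs = [1, 2, 4, 8]
ys = [3 * x ** 0.5 for x in xs]
a, b = fit_power(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The approach requires strictly positive x and y values, since the logarithm is undefined otherwise.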

Answer 14

y = ae^(bx) ## Footnote This can also be expressed as y = ak^x, where k = e^b.
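
A quick numerical check of the two equivalent forms, using arbitrary illustrative values for a and b. The function names are hypothetical.

```python
import math


def exp_form(a, b, x):
    """y = a * e^(b*x)"""
    return a * math.exp(b * x)


def base_form(a, k, x):
    """y = a * k^x"""
    return a * k ** x


a, b = 2.0, 0.3   # arbitrary example parameters
k = math.exp(b)   # k = e^b

for x in [0, 1, 2, 5]:
    # Both forms should agree up to floating-point rounding.
    print(x, exp_form(a, b, x), base_form(a, k, x))

# b > 0 gives k > 1 (growth); b < 0 gives 0 < k < 1 (decay).
print(k > 1, math.exp(-0.3) < 1)
```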

Answer 15

The function is exponentially increasing ## Footnote Conversely, if b is less than zero, the function is exponentially decreasing.

Answer 16

To show the density of univariate data ## Footnote Histograms group data into bins and display the frequency or relative frequency.

Answer 17

The number and size of bins ## Footnote Poor choices can obscure important information about the data distribution.
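
To illustrate how bin choice changes what a histogram shows, here is a small sketch using equal-width bins. The equal-width scheme, the function name, and the data are all assumptions made for the example.

```python
def histogram_counts(values, num_bins):
    """Count values in equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical data with two separate clusters.
data = [1, 2, 2, 3, 3, 3, 8, 9, 9, 10]
two_bins = histogram_counts(data, 2)
five_bins = histogram_counts(data, 5)
print(two_bins)   # [6, 4]
print(five_bins)  # [3, 3, 0, 1, 3]
```

With two bins the gap between the clusters is invisible; with five bins the empty middle bin reveals it.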

Answer 18

Categorical data as rectangles ## Footnote The height or length of each rectangle corresponds to the value for each category.

Answer 19

A dollar-weighted average ## Footnote This is used for comparing descriptive statistics among different groups.

Answer 20

A dollar-weighted average for work performed by different companies ## Footnote Computed from Selected Acquisition Reports (SARs) for developmental programs with an Engineering and Manufacturing Development (EMD) phase.

Answer 21

* Minimum * First quartile * Median * Third quartile * Maximum ## Footnote Box plots summarize the distribution of a data set and also plot outliers.
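
The five values can be computed with Python's standard library, as sketched below. The 1.5 × IQR rule for flagging outliers is a common convention but is an assumption here, since the card does not state which rule is used. Quartile values also depend on the interpolation method chosen.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def iqr_outliers(values):
    """Flag points beyond 1.5 * IQR from the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical data set with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print(summary)   # (2, 4.0, 6.0, 8.0, 30)
print(outliers)  # [30]
```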

Answer 22

TRUE ## Footnote Box plots can demonstrate the same data distribution as histograms, providing a different visual perspective.

Answer 23

To analyze correlation between variables ## Footnote Heat maps depict values for a main variable across two axis variables using a grid of colored squares.

Answer 24

To separate pieces of a stacked bar chart and show changes over time ## Footnote Useful in cost estimating to highlight changes between estimates.

Answer 25

A field of artificial intelligence where algorithms learn from data to make predictions ## Footnote ML involves statistical, mathematical, and numerical techniques.

Answer 26

* Supervised learning * Unsupervised learning * Reinforcement learning ## Footnote These techniques differ in how they analyze data.

Answer 27

Supervised learning analyzes labeled data; unsupervised learning analyzes unlabeled data ## Footnote This distinction is crucial for understanding how each method operates.

Answer 28

* Classification * Regression ## Footnote Classification predicts categorical values, while regression predicts numerical values.

Answer 29

* Naïve Bayes * Decision tree * Random forest * k-Nearest Neighbors * Logistic regression ## Footnote These algorithms are used to predict the label or class of unseen input data.

Answer 30

Joint probability distributions ## Footnote It uses Bayes' theorem to calculate probabilities for predictions.

Answer 31

The probability of A given B ## Footnote Calculated as the probability of B given A times the probability of A divided by the probability of B.
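
The calculation described in the footnote, P(A|B) = P(B|A) × P(A) / P(B), can be sketched directly. The events and probabilities below are invented purely for illustration.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical events: A = "project overruns", B = "requirements changed".
p_b_given_a = 0.6  # P(B|A), made-up value
p_a = 0.3          # P(A), made-up value
p_b = 0.4          # P(B), made-up value

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(A|B) = {posterior:.2f}")  # 0.6 * 0.3 / 0.4 = 0.45
```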

Answer 32

They can indicate unusual data points that may require special treatment ## Footnote Understanding outliers is crucial for accurate data interpretation.

Answer 33

To make inferences about populations based on sample data ## Footnote These tests help in understanding the relationships and differences within data.

Answer 34

Classification ## Footnote It computes probabilities for data points based on their features.

Answer 35

90.83% ## Footnote This indicates a strong likelihood of the point belonging to class A/B.

Answer 36

Classification and regression analysis ## Footnote It creates branches based on yes/no questions to split data.

Answer 37

FALSE ## Footnote Decision trees work well with nonlinear data grouped in various ways.

Answer 38

Maximum number of branch layers ## Footnote Adjusting depth can help prevent underfitting or overfitting.

Answer 39

Ensemble models that combine multiple decision trees ## Footnote They improve prediction accuracy and reduce overfitting.

Answer 40

* Predict future cost based on historical data * Assess risk via probability scores * Rank importance of variables * Detect anomalies * Provide what-if analysis ## Footnote Random forests enhance decision-making in cost analysis.

Answer 41

Number of closest neighbors considered ## Footnote This tuning parameter influences the classification outcome.
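
A small sketch of how k changes a k-Nearest Neighbors classification. Euclidean distance and majority voting are assumed, and the training points, labels, and query are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training data with risk labels.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 3), (8, 8)]
labels = ["low", "low", "low", "low", "high", "high"]
query = (2.8, 2.8)

print(knn_classify(points, labels, query, k=1))  # high
print(knn_classify(points, labels, query, k=5))  # low
```

With k = 1 the single closest point decides the label; with k = 5 the four nearby "low" points outvote it.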

Answer 42

Estimates probability of binary outcomes ## Footnote It is useful for making binary cost decisions and assessing risks.

Answer 43

P = 1 / (1 + e^-(β0 + β1X1 + β2X2 + ... + βnXn)) ## Footnote This equation transforms linear combinations of input features into probabilities.
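
The equation can be sketched as a short function. The function name and the coefficient and feature values are hypothetical, chosen so that the linear combination equals zero in the first call.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """P = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1 / (1 + math.exp(-log_odds))


# Linear combination: -1.0 + 0.5*1.0 + 0.25*2.0 = 0, so P = 0.5.
p_even = logistic_probability(-1.0, [0.5, 0.25], [1.0, 2.0])
# Raising a feature with a positive coefficient pushes P above 0.5.
p_higher = logistic_probability(-1.0, [0.5, 0.25], [3.0, 2.0])
print(p_even, p_higher)
```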

Answer 44

Predict numerical outcomes ## Footnote It can be used for continuous variables like cost.

Answer 45

* Univariate: Single predictor variable * Multivariate: Multiple predictor variables ## Footnote Both can be used for classification by mapping continuous outputs to class labels.

Answer 46

The significance of variables in predicting outcomes ## Footnote It is determined by the coefficients assigned to each variable.

Answer 47

Modify model complexity to fit data better ## Footnote Examples include depth in decision trees and k in kNN.

Answer 48

* Multiple predictor variables * Estimating an outcome * Mapping continuous outputs to class labels ## Footnote Techniques include Linear Discriminant Analysis (LDA) and polynomial regression.

Answer 49

* Making laws * Consists of the House of Representatives and the Senate ## Footnote Together, they form the U.S. Congress.

Answer 50

* Project size * Materials used ## Footnote These predictors can be used in multivariate regression.

Answer 51

* Predicting numerical values * Splitting data into yes or no questions ## Footnote It works well with nonlinear data in groups.

Answer 52

* Combines multiple decision trees * Averages their results ## Footnote Typically uses 100 or more trees for accuracy.

Answer 53

* Continuous values * Based on average or weighted average of nearest neighbors ## Footnote It can also classify projects into categories based on thresholds.

Answer 54

FALSE ## Footnote It analyzes unlabeled data to discover inherent similarities.

Answer 55

* Finding similar data * Creating clusters from a dataset ## Footnote It minimizes variance between centroids and observations.

Answer 56

* Input layer * Hidden layers * Output layer ## Footnote It uses forward and backward propagation for calculations and learning.

Answer 57

* Responding to an environment in real time * Achieving a specified goal ## Footnote Examples include autonomous vehicles and game-playing computers.

Answer 58

* Heuristic and cognitive bias ## Footnote Examples include availability heuristic and optimism bias.

Answer 59

* Predicting project cost risk * Assessing whether a project is high risk (1) or low risk (0) ## Footnote It uses historical data to build a model.

Answer 60

* Higher likelihood of the outcome being 1 ## Footnote Negative coefficients decrease the likelihood of the outcome being 1.

Answer 61

information ## Footnote They consist of interconnected nodes (neurons) that solve complex problems.

Answer 62

* Strength of feature's influence * Direction of feature's influence ## Footnote Positive coefficients increase the likelihood of outcomes being 1, while negative coefficients decrease it.

Answer 63

P = 1 / (1 + e^-(β₀ + β₁X₁ + β₂X₂ + ... + βnXn)) ## Footnote Where β₀ is the intercept, βi are the coefficients, and Xi are the feature values.

Answer 64

Intercept ## Footnote It is the constant term in the logistic regression equation.

Answer 65

Most significant positive influence on cost risk ## Footnote A higher complexity score increases the likelihood of high cost risk.

Answer 66

Significant negative influence ## Footnote Experienced teams reduce the likelihood of high cost risk.

Answer 67

Log-odds = -8.0 + (0.9*4) + (0.5*10) + (-0.8*6) + (0.2*9) + (1.5*6) + (0.05*65) ## Footnote The calculation results in log-odds = 9.85.

Answer 68

P = 0.99995 ## Footnote This indicates a very high likelihood that the project is high risk.
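
The arithmetic for this worked example can be checked in a few lines. The intercept, coefficients, and feature values are the ones in the log-odds expression on the preceding card. The variable names are hypothetical, and the cards do not say which feature each coefficient belongs to.

```python
import math

intercept = -8.0
coefficients = [0.9, 0.5, -0.8, 0.2, 1.5, 0.05]
feature_values = [4, 10, 6, 9, 6, 65]

# Terms: 3.6 + 5.0 - 4.8 + 1.8 + 9.0 + 3.25 = 17.85, plus the -8.0 intercept.
log_odds = intercept + sum(b * x for b, x in zip(coefficients, feature_values))
probability = 1 / (1 + math.exp(-log_odds))

print(round(log_odds, 2))     # 9.85
print(round(probability, 5))  # 0.99995
```

A log-odds of 9.85 is what produces the stated probability of 0.99995.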

Answer 69

TRUE ## Footnote A successful analyst must understand the data and the processes of analysis.

Answer 70

* Scatter plots * Histograms * Bar charts ## Footnote These graphics help in effectively communicating data analysis results.

Answer 71

Properly select the right analytical methods ## Footnote This understanding is crucial for effective data analysis.

Answer 72

Additional layers of data analysis ## Footnote They offer innovative ways of classifying and grouping project data.

(96 cards)