Data Quality Dimensions Flashcards

(37 cards)

1
Q

What is accuracy?

A

How close the data is to the true or correct value

2
Q

What is an example of accuracy?

A

A customer’s birthdate is recorded as 02/15/1990 when it is actually 02/16/1990, so the recorded value is inaccurate
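
One way to turn the birthdate example into a number is to compare recorded values against a trusted reference and report the share that match. The sketch below is only an illustration: the function name `accuracy_rate`, the customer IDs, and the idea that a verified reference exists are all assumptions, not part of the cards.

```python
def accuracy_rate(recorded, reference):
    """Return the percentage of reference entries whose recorded value matches."""
    if not reference:
        raise ValueError("reference must contain at least one entry")
    matches = sum(
        1 for key, true_value in reference.items()
        if recorded.get(key) == true_value
    )
    return 100.0 * matches / len(reference)

# Hypothetical data: customer 1 has the wrong day recorded, customer 2 is correct.
recorded_birthdates = {1: "02/15/1990", 2: "07/04/1985"}
verified_birthdates = {1: "02/16/1990", 2: "07/04/1985"}

print(accuracy_rate(recorded_birthdates, verified_birthdates))  # 50.0
```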

3
Q

What is completeness?

A

Whether all required data is present

4
Q

What is an example of completeness?

A

If 30% of the rows in the “phone number” column are blank, the data is incomplete
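
The 30% figure in this example can be computed directly. The following is a minimal sketch, assuming rows are plain dictionaries and that both empty strings and `None` count as blank; the helper name `percent_missing` and the sample data are hypothetical.

```python
def percent_missing(rows, column):
    """Return the percentage of rows whose `column` is blank or absent."""
    if not rows:
        return 0.0
    missing = sum(
        1 for row in rows
        if row.get(column) is None or str(row[column]).strip() == ""
    )
    return 100.0 * missing / len(rows)

# Hypothetical sample: 3 of 10 phone numbers are blank.
customers = (
    [{"phone number": "555-0100"}] * 7
    + [{"phone number": ""}] * 2
    + [{"phone number": None}]
)

print(percent_missing(customers, "phone number"))  # 30.0
```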

5
Q

What is consistency?

A

Whether the same data is represented the same way across different systems or datasets

6
Q

What is an example of consistency?

A

A customer’s address appears as “123 Main st.” in one table and “123 Main Street” in another
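
A mismatch like this can be detected by normalizing both values before comparing them. The sketch below assumes a tiny, hypothetical suffix map (`SUFFIXES`) and hypothetical helper names; real address standardization needs far more rules than this.

```python
import re

# Hypothetical abbreviation map; a real one would be much longer.
SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize_address(address):
    """Lowercase, drop punctuation, and expand known street suffixes."""
    words = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)

def same_address(a, b):
    """True if two address strings match after normalization."""
    return normalize_address(a) == normalize_address(b)

print("123 Main st." == "123 Main Street")             # False
print(same_address("123 Main st.", "123 Main Street"))  # True
```

A raw string comparison flags the two tables as inconsistent, while the normalized comparison shows they refer to the same address in different formats.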

7
Q

What is continuity?

A

The extent to which both historical and current snapshot data points or records are available

8
Q

What is an example of continuity?

A

Data continues to be acquired and made available in a form that is sufficiently consistent with previous data and compatible with future data

9
Q

What is coverage?

A

The availability and comprehensiveness of data relative to the total universe or population of interest

10
Q

Why is coverage important?

A

It ensures that a dataset or subproduct is fit for purpose for client consumption across entities

11
Q

What is discoverability?

A

The degree to which an end user can find the data

12
Q

How can we make data more discoverable?

A

Ensure the data has the right metadata associated with the content of interest and is indexed and easily searchable for users

13
Q

What is timeliness?

A

Whether data is up to date and available when needed

14
Q

What is an example of timeliness?

A

A stock price feed delayed by 10 minutes isn’t timely enough for real-time trading
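
Whether a delay counts as timely depends on the use case, which can be expressed as a maximum acceptable lag. The sketch below is illustrative only: the function `is_timely` and both lag limits (1 second and 15 minutes) are assumed values, not figures from the cards.

```python
from datetime import datetime, timedelta, timezone

def is_timely(event_time, received_time, max_lag):
    """True if the data arrived within `max_lag` of the event it describes."""
    return (received_time - event_time) <= max_lag

# A quote that arrives 10 minutes after it was generated.
quote_time = datetime(2024, 1, 2, 14, 30, tzinfo=timezone.utc)
received = quote_time + timedelta(minutes=10)

# Assumed limits: 1 second for real-time trading, 15 minutes for a delayed display.
print(is_timely(quote_time, received, timedelta(seconds=1)))   # False
print(is_timely(quote_time, received, timedelta(minutes=15)))  # True
```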

15
Q

What is uniqueness?

A

Whether each record is unique, with no duplicates
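
A basic uniqueness check counts how often each record key occurs. This is a minimal sketch, assuming records are dictionaries and that the caller chooses which fields identify a record; `find_duplicates` and the sample fields are hypothetical names.

```python
from collections import Counter

def find_duplicates(records, key_fields):
    """Return each key that occurs more than once, with its count."""
    keys = [tuple(record[field] for field in key_fields) for record in records]
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample: the first and third records are the same customer.
records = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 1, "email": "a@example.com"},
]

print(find_duplicates(records, ["customer_id", "email"]))
# {(1, 'a@example.com'): 2}
```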

16
Q

What is validity?

A

Whether data follows the correct format, rules, or constraints

17
Q

What is an example of validity?

A

A field expecting a ZIP code contains the value “ABCDE”
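
A format rule like this is often checked with a regular expression. The sketch below assumes US ZIP codes (five digits with an optional four-digit extension); the pattern and the name `is_valid_zip` are illustrative choices, and other postal formats would need different rules.

```python
import re

# Assumed format: 5 digits, optionally followed by a hyphen and 4 more digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def is_valid_zip(value):
    """True if the whole string matches the assumed US ZIP code format."""
    return ZIP_PATTERN.fullmatch(value) is not None

print(is_valid_zip("ABCDE"))       # False
print(is_valid_zip("90210"))       # True
print(is_valid_zip("90210-1234"))  # True
```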

18
Q

What is the metric for accuracy?

19
Q

What is the metric for timeliness?

A

time to publish

20
Q

What is the metric for completeness?

A

% missing data

21
Q

What is the metric for coverage?

A

% in-scope of regulation

22
Q

What is the best in class metric for accuracy?

23
Q

What is the acceptable metric for accuracy?

24
Q

What is the bad metric for accuracy?

25
Q

What is the flight risk metric for accuracy?

A

<90%

26
Q

What is the best in class metric for timeliness?

A

24h

27
Q

What is the acceptable metric for timeliness?

A

3 weeks

28
Q

What is the bad metric for timeliness?

A

5 weeks

29
Q

What is the flight risk metric for timeliness?

A

8+ weeks

30
Q

What is the best in class metric for completeness?

A

10% missing

31
Q

What is the acceptable metric for completeness?

A

15% missing

32
Q

What is the bad metric for completeness?

A

20% missing

33
Q

What is the flight risk metric for completeness?

A

35%+ missing

34
Q

What is the best in class metric for coverage?

A

95%

35
Q

What is the acceptable metric for coverage?

A

90%

36
Q

What is the bad metric for coverage?

A

75%

37
Q

What is the flight risk metric for coverage?

A

50%
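
The coverage figures on the last four cards can be read as a grading scale. The sketch below is one possible interpretation, not a rule stated in the cards: it assumes each figure is a lower bound for its tier and that anything below the "bad" threshold of 75% is treated as flight risk. The names `COVERAGE_TIERS` and `coverage_tier` are hypothetical.

```python
# Coverage reference points from the cards, read as lower bounds (an assumption).
COVERAGE_TIERS = [
    (95.0, "best in class"),
    (90.0, "acceptable"),
    (75.0, "bad"),
]

def coverage_tier(percent_covered):
    """Return the first tier whose lower bound the value meets."""
    for lower_bound, label in COVERAGE_TIERS:
        if percent_covered >= lower_bound:
            return label
    # Assumption: everything below the "bad" threshold counts as flight risk.
    return "flight risk"

for value in (97.0, 92.0, 80.0, 50.0):
    print(value, coverage_tier(value))
```

The same pattern would apply to timeliness and completeness with the comparison reversed, since lower values are better for those metrics.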