03_String_Date_DataCleaning Flashcards

(11 cards)

1
Q

Front

A

Back

2
Q

How do you detect and replace substrings in R with stringr?

A

Use str_detect() to test for a match; str_replace() substitutes the first match and str_replace_all() substitutes every match.

Code:
library(stringr)
str_detect('DataCamp', 'Camp') # TRUE
str_replace('2024-01-01', '-', '/') # '2024/01-01' (first match only)
str_replace_all('a-b-c', '-', '_') # 'a_b_c'

3
Q

How do you extract text with regex in stringr?

A

Use str_extract() for the first match; str_extract_all() for every match.

Code:
library(stringr)
str_extract_all('id: A12; id: B34', '[A-Z][0-9]+') # list of one vector: 'A12' 'B34'
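
Notes:
A minimal sketch contrasting the two functions on a made-up input vector (the strings are hypothetical). str_extract() returns one value per input element; str_extract_all() returns a list with one character vector per input element.

```r
library(stringr)

# Hypothetical input: one element with two ids, one with none
x <- c('id: A12; id: B34', 'no ids here')

# First match per element; NA where nothing matches
first_match <- str_extract(x, '[A-Z][0-9]+')

# Every match per element, returned as a list
all_matches <- str_extract_all(x, '[A-Z][0-9]+')

first_match
all_matches
```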

4
Q

How do you split strings and trim whitespace?

A

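Use str_split() to split on a pattern; str_trim() to strip leading/trailing whitespace; str_squish() to also collapse repeated internal whitespace.

Code:
library(stringr)
str_split('a,b,c', ',') # list of one vector: 'a' 'b' 'c'
str_trim('  tidy data  ') # 'tidy data'
str_squish('  tidy   data  ') # 'tidy data'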

5
Q

How do you parse dates with lubridate? (ymd, dmy, mdy)

A

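Pick the parser whose name matches the order of the components: ymd(), dmy(), or mdy().

Code:
library(lubridate)
d1 <- ymd('2024-07-15')
d2 <- dmy('15-07-2024')
d3 <- mdy('07-15-2024') # all three parse to 2024-07-15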

6
Q

How do you handle datetimes with hours/minutes/seconds?

A

Use the datetime parsers ymd_hms(), ymd_hm(), ymd_h() (and the dmy_/mdy_ variants); set the time zone with tz.

Code:
library(lubridate)
ts <- ymd_hms('2024-07-15 12:30:00', tz = 'UTC')
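
Notes:
Once a datetime carries a time zone, there are two different ways to "change" it. The sketch below reuses the card's timestamp; the target zone 'America/New_York' is only an illustrative choice. with_tz() keeps the instant and changes how it is displayed, while force_tz() keeps the clock time and changes the instant.

```r
library(lubridate)

ts <- ymd_hms('2024-07-15 12:30:00', tz = 'UTC')

# Same instant, displayed in another zone (08:30 in New York in July)
ny_view <- with_tz(ts, 'America/New_York')

# Same clock reading (12:30), reinterpreted in another zone: a different instant
ny_wall <- force_tz(ts, 'America/New_York')

ny_view
ny_wall
```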

7
Q

How do you extract date parts with lubridate?

A

Use year(), month(), day(), wday(), yday().

Code:
library(lubridate)
d <- ymd('2024-07-15')
year(d) # 2024
month(d, label = TRUE) # Jul (ordered factor; label text depends on locale)

8
Q

How do you round/floor/ceiling dates?

A

Use floor_date(), round_date(), ceiling_date().

Code:
library(lubridate)
floor_date(ymd('2024-07-15'), unit = 'month') # 2024-07-01
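
Notes:
The card's code shows only floor_date(). As a small sketch, the same date can be run through all three functions to compare them; the variable names are illustrative.

```r
library(lubridate)

d <- ymd('2024-07-15')

floored <- floor_date(d, unit = 'month')    # start of the current month: 2024-07-01
ceiled  <- ceiling_date(d, unit = 'month')  # start of the next month: 2024-08-01
rounded <- round_date(d, unit = 'month')    # nearest month boundary: 2024-07-01

floored
ceiled
rounded
```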

9
Q

How do you add/subtract time spans? (periods)

A

Use days(), months(), years() with +/-.

Code:
library(lubridate)
ymd('2024-07-15') + months(1) - days(3) # 2024-08-12
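
Notes:
Adding a month period can land on a date that does not exist, in which case lubridate returns NA. A short sketch of that edge case and of the %m+% operator, which rolls back to the last valid day of the month instead:

```r
library(lubridate)

jan31 <- ymd('2024-01-31')

# 'February 31' does not exist, so plain period arithmetic gives NA
naive <- jan31 + months(1)

# %m+% rolls back to the last day of the target month
rolled <- jan31 %m+% months(1)  # 2024-02-29 (2024 is a leap year)

naive
rolled
```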

10
Q

How do you detect inconsistent types in columns?

A

Summarise each column's class with dplyr::summarise() and across(); to target columns of one type, select them with where(is.character) or where(is.numeric).

Code:
library(dplyr)
iris %>% summarise(across(everything(), class))
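
Notes:
A class check only shows that a column is character; it does not show which values stopped it from being numeric. One possible follow-up, sketched in base R on a made-up data frame (raw and its amount column are hypothetical), is to attempt the conversion and list the values that fail:

```r
# Hypothetical data: 'amount' should be numeric but was read as character
raw <- data.frame(
  id = 1:4,
  amount = c('10.5', '7', 'n/a', '3,2'),
  stringsAsFactors = FALSE
)

# First class of each column
col_classes <- vapply(raw, function(col) class(col)[1], character(1))

# Try the conversion; entries that cannot be parsed become NA
parsed <- suppressWarnings(as.numeric(raw$amount))

# Rows that were present in the raw data but failed to parse
bad_rows <- which(is.na(parsed) & !is.na(raw$amount))

col_classes
raw$amount[bad_rows]
```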

11
Q

How do you profile missingness across columns?

A

Use across with is.na to count NAs per column.

Code:
library(dplyr)
df %>% summarise(across(everything(), ~ sum(is.na(.x))))

Notes:
Replace df with your data frame.
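To try the pattern without your own data, here is a small sketch on a made-up data frame (toy_df and its columns are hypothetical). Swapping sum() for mean() gives the share of missing values instead of the count.

```r
library(dplyr)

# Hypothetical data with a few missing values
toy_df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c('x', 'y', NA, 'z'),
  c = 1:4
)

# Number of NAs per column
na_counts <- toy_df %>% summarise(across(everything(), ~ sum(is.na(.x))))

# Proportion of NAs per column
na_share <- toy_df %>% summarise(across(everything(), ~ mean(is.na(.x))))

na_counts
na_share
```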
