Ingestion is the process of bringing application data, streaming data, and batch data into the cloud.
The storage stage focuses on persisting data to an appropriate storage system.
The processing and analyzing stage transforms data into a form suitable for analysis.
The exploring and visualizing stage focuses on testing hypotheses and drawing insights from data.
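The four stages above can be sketched as a simple pipeline. This is a pure-Python illustration; the function names are hypothetical placeholders, not GCP APIs:

```python
# Sketch of the four data-lifecycle stages as composable steps.
# All function names here are hypothetical placeholders.

def ingest(raw_records):
    """Ingestion: bring raw application, streaming, or batch records in."""
    return list(raw_records)

def store(records):
    """Storage: persist records (here, just an in-memory 'store')."""
    return {"records": records}

def process(storage):
    """Processing/analyzing: transform data into an analysis-ready form."""
    return [r.lower().strip() for r in storage["records"]]

def explore(processed):
    """Exploring/visualizing: derive a simple insight (a value count)."""
    return {word: processed.count(word) for word in set(processed)}

insight = explore(process(store(ingest(["Alpha ", "beta", "ALPHA"]))))
```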
A
2
Q
Batch Data
Batch data is ingested in bulk, typically in files.
Examples of batch data ingestion include uploading files of data exported from one application to be processed by another.
Large sets of data that 'pool' up over time.
Low latency is not as important.
Both batch and streaming data can be transformed and processed using Cloud Dataflow.
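The point that one transform can serve both modes can be sketched in pure Python. Cloud Dataflow (built on Apache Beam) generalizes this idea; the code below is an illustration, not Beam API usage:

```python
# Pure-Python sketch: the same transform applied to a batch (a pooled
# file's worth of records) and to a stream (messages arriving one at
# a time).

def transform(record):
    """A transform that works identically for batch and streaming input."""
    return record.strip().upper()

# Batch: records that pooled up in a file, processed in bulk.
batch_file = ["order-1 \n", "order-2\n", "order-3\n"]
batch_results = [transform(line) for line in batch_file]

# Streaming: the same transform applied per message as each arrives.
stream_results = []
for message in iter(["order-4", "order-5"]):
    stream_results.append(transform(message))
```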
A
3
Q
Streaming Data
Streaming data is sent in small messages transmitted continuously from the data source.
Streaming data may be telemetry data, which is generated at regular intervals, or event data, which is generated in response to a particular event.
Stream ingestion services need to deal with potentially late and missing data.
Requires low latency.
Streaming data is often ingested using Cloud Pub/Sub.
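The late-data problem can be sketched with a simple windowing scheme. This is a pure-Python illustration of the idea behind Pub/Sub + Dataflow pipelines; the window size and lateness bound are illustrative assumptions, not service defaults:

```python
# Sketch of windowed stream ingestion with late-data handling.
# WINDOW_SECONDS and ALLOWED_LATENESS are illustrative assumptions.

WINDOW_SECONDS = 60
ALLOWED_LATENESS = 30  # seconds a record may lag the watermark

def assign(windows, late, event_time, watermark, value):
    """Place a record in its time window, or flag it as too late."""
    if watermark - event_time > ALLOWED_LATENESS:
        late.append(value)  # arrived too long after its window closed
        return
    window_start = event_time - (event_time % WINDOW_SECONDS)
    windows.setdefault(window_start, []).append(value)

windows, late = {}, []
# (event_time, current watermark, value); the watermark advances
# with the stream, so a record can arrive well after its event time.
events = [(5, 10, "a"), (70, 75, "b"), (8, 100, "c")]
for event_time, watermark, value in events:
    assign(windows, late, event_time, watermark, value)
```

Record "c" has event time 8 but arrives when the watermark is already at 100, so it is routed to the late list rather than its original window.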
A
4
Q
Data Processing Solutions
A
5
Q
Levels of structure of data
These levels are structured, semi-structured, and unstructured.
Structured data has a fixed schema, such as a relational database table.
Semi-structured data has a schema that can vary; the schema is stored with the data.
Unstructured data does not have a structure used to determine how to store data.
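The three levels can be shown with small illustrative records (the names and values below are made up for the example):

```python
import json

# Structured: fixed schema -- every row has the same columns,
# as in a relational database table.
structured_rows = [
    ("alice", 30),
    ("bob", 42),
]

# Semi-structured: the schema travels with the data and can vary
# per record (here, JSON documents with differing keys).
semi_structured = [
    json.loads('{"name": "alice", "age": 30}'),
    json.loads('{"name": "bob", "email": "bob@example.com"}'),
]

# Unstructured: no schema the storage system can use to decide
# how to store the data (free text, images, audio, ...).
unstructured = "Meeting notes: discussed the ingestion pipeline."
```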