Ingestion is the process of bringing application data, streaming data, and batch data into the cloud.
The storage stage focuses on persisting data to an appropriate storage system.
The processing and analyzing stage transforms data into a form suitable for analysis.
The exploring and visualizing stage focuses on testing hypotheses and drawing insights from data.
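The four stages above can be sketched as a simple pipeline. This is a pure-Python illustration; the function names are hypothetical placeholders, not GCP APIs:

```python
# Sketch of the four data-lifecycle stages as composable steps.
# All function names here are hypothetical placeholders.

def ingest(raw_records):
    """Ingestion: bring raw application, streaming, or batch records in."""
    return list(raw_records)

def store(records):
    """Storage: persist records (here, just an in-memory 'store')."""
    return {"records": records}

def process(storage):
    """Processing/analyzing: transform data into an analysis-ready form."""
    return [r.lower().strip() for r in storage["records"]]

def explore(processed):
    """Exploring/visualizing: derive a simple insight (a value count)."""
    return {word: processed.count(word) for word in set(processed)}

insight = explore(process(store(ingest(["Alpha ", "beta", "ALPHA"]))))
```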
A
2
Q
Batch Data
Batch data is ingested in bulk, typically in files.
Examples of batch data ingestion include uploading files of data exported from one application to be processed by another.
Large sets of data that 'pool' up over time.
Low latency is not as important.
Both batch and streaming data can be transformed and processed using Cloud Dataflow.
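The point that one transform can serve both modes can be sketched in pure Python. Cloud Dataflow (built on Apache Beam) generalizes this idea; the code below is an illustration, not Beam API usage:

```python
# Pure-Python sketch: the same transform applied to a batch (a pooled
# file's worth of records) and to a stream (messages arriving one at
# a time).

def transform(record):
    """A transform that works identically for batch and streaming input."""
    return record.strip().upper()

# Batch: records that pooled up in a file, processed in bulk.
batch_file = ["order-1 \n", "order-2\n", "order-3\n"]
batch_results = [transform(line) for line in batch_file]

# Streaming: the same transform applied per message as each arrives.
stream_results = []
for message in iter(["order-4", "order-5"]):
    stream_results.append(transform(message))
```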
A
3
Q
Streaming Data
Streaming data is sent in small messages transmitted continuously from the data source.
Streaming data may be telemetry data, which is generated at regular intervals, or event data, which is generated in response to a particular event.
Stream ingestion services need to deal with potentially late and missing data.
Requires low latency.
Streaming data is often ingested using Cloud Pub/Sub.
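The late-data problem can be sketched with a simple windowing scheme. This is a pure-Python illustration of the idea behind Pub/Sub + Dataflow pipelines; the window size and lateness bound are illustrative assumptions, not service defaults:

```python
# Sketch of windowed stream ingestion with late-data handling.
# WINDOW_SECONDS and ALLOWED_LATENESS are illustrative assumptions.

WINDOW_SECONDS = 60
ALLOWED_LATENESS = 30  # seconds a record may lag the watermark

def assign(windows, late, event_time, watermark, value):
    """Place a record in its time window, or flag it as too late."""
    if watermark - event_time > ALLOWED_LATENESS:
        late.append(value)  # arrived too long after its window closed
        return
    window_start = event_time - (event_time % WINDOW_SECONDS)
    windows.setdefault(window_start, []).append(value)

windows, late = {}, []
# (event_time, current watermark, value); the watermark advances
# with the stream, so a record can arrive well after its event time.
events = [(5, 10, "a"), (70, 75, "b"), (8, 100, "c")]
for event_time, watermark, value in events:
    assign(windows, late, event_time, watermark, value)
```

Record "c" has event time 8 but arrives when the watermark is already at 100, so it is routed to the late list rather than its original window.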
A
4
Q
Data Processing Solutions
A
5
Q
Levels of structure of data
These levels are structured, semi-structured, and unstructured.
Structured data has a fixed schema, such as a relational database table.
Semi-structured data has a schema that can vary; the schema is stored with the data.
Unstructured data does not have a structure used to determine how to store data.
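The three levels can be shown with small illustrative records (the names and values below are made up for the example):

```python
import json

# Structured: fixed schema -- every row has the same columns,
# as in a relational database table.
structured_rows = [
    ("alice", 30),
    ("bob", 42),
]

# Semi-structured: the schema travels with the data and can vary
# per record (here, JSON documents with differing keys).
semi_structured = [
    json.loads('{"name": "alice", "age": 30}'),
    json.loads('{"name": "bob", "email": "bob@example.com"}'),
]

# Unstructured: no schema the storage system can use to decide
# how to store the data (free text, images, audio, ...).
unstructured = "Meeting notes: discussed the ingestion pipeline."
```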