AWS Redshift Flashcards by Keith Tobin

When you create a cluster, what do you get as a base configuration?

You get two nodes, a leader node and a data node, giving 160 GB of storage.

Do you get to select the disk size for RedShift?

No, you do not get to select the disk size. You select the overall size of the Redshift cluster through a slider in the console or a parameter in the CLI and API. AWS then works out the number of disks in each data node.

I need to add capacity to my Redshift cluster. How can I do this?

You have two options: scale up or scale out. Scaling up means changing the size of the instance; scaling out means adding more nodes.

What interfaces does RedShift support?

ODBC
JDBC
Postgres

What is RedShift built on?

PostgreSQL. AWS separated the storage from the query engine and then replaced the storage engine with a columnar database.

What is RedShift used for?

Data warehouse

- Analytics

I have data in S3, is it possible to query this data from RedShift?

Yes. Redshift has a feature called Redshift Spectrum that queries data in S3 in place. The data must be in a format Spectrum supports, such as CSV or Parquet.

What type of database is Redshift?

It is a columnar database, designed to scan columns of data fast. With columnar data it is easy to sum a column or find its min and max quickly.
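
As a rough illustration of why column scans are cheap, the sketch below stores the same small table row-wise and column-wise in plain Python. The table, its column names, and its values are made up for the example; it models the idea only, not Redshift's actual storage format.

```python
# Hypothetical three-row table (names and values are illustrative only).
rows = [
    {"user_id": 1, "country": "IE", "amount": 10.0},
    {"user_id": 2, "country": "US", "amount": 25.5},
    {"user_id": 3, "country": "IE", "amount": 4.5},
]

# Row layout: an aggregate has to walk every whole record.
row_total = sum(r["amount"] for r in rows)

# Columnar layout: each column is kept as its own contiguous list.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Aggregates touch only the single column they need.
col_total = sum(columns["amount"])
col_min = min(columns["amount"])
col_max = max(columns["amount"])
```

Both layouts give the same total; the columnar one simply never reads `user_id` or `country` to get it.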

What is the architecture of a RedShift cluster?

You have a leader node and data nodes. Data nodes are divided into slices, and these slices are where data is stored and searched.

What is the purpose of the leader node?

The leader node is the query planner: it distributes the query to the data nodes in the cluster and collects the results.
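
The leader/data node split can be pictured as a scatter-gather. The sketch below is a toy model in plain Python, assuming a hypothetical cluster of four slices that each hold part of one column; the function names `slice_sum` and `leader_sum` are invented for the example and are not Redshift APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: 2 data nodes x 2 slices, each slice holds part of a column.
slices = [[3, 8, 1], [7, 2], [9, 4, 6], [5]]


def slice_sum(values):
    """Work done locally on one slice, over its own data only."""
    return sum(values)


def leader_sum(all_slices):
    """Leader role: fan the work out to every slice, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(all_slices)) as pool:
        partials = list(pool.map(slice_sum, all_slices))
    return sum(partials)


total = leader_sum(slices)
```

The leader never scans the data itself; it only merges the per-slice partial results.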

Is RedShift an OLAP or OLTP?

It is OLAP (online analytical processing).

Is Redshift regional or global?

Neither: a Redshift cluster lives in a single subnet in a single AZ. The components need fast communication with each other, which requires keeping them local.

Is data compressed in Redshift?

Yes, data can be compressed in Redshift. This is not blanket compression: it is defined when you create a table, per column.

Is Redshift a service or do you get a cluster of nodes?

You get a cluster of nodes, one leader and the rest are data nodes.

Does Redshift support encryption?

Yes, you can use KMS or CloudHSM. With KMS you can use AWS managed CMKs or your own CMK.

Can I resize a cluster?

You have two options: elastic resize and classic resize. Classic resize makes a new cluster and copies the data from the old cluster to the new one. Elastic resize just adds nodes and rebalances the data.

I want to increase the size of the Redshift cluster nodes. How can I do this?

You have to use classic resize, as it allows the node size to be changed. A new cluster is created and the data is copied over to it.

How is Redshift backed up?

Automatic backups are the default when the cluster is created: snapshots of the Redshift cluster are taken automatically, and you can also take manual snapshots. Snapshot data is stored in S3.

What services can push or load data into Redshift?

Kinesis
S3
DataPipeline

How often does AWS take snapshots of the Redshift cluster?

Every 6 - 8 hrs or after every 5GB of data changes.

Is it possible to take a manual snapshot of the Redshift cluster?

Yes, 100%. You also set how long you want the snapshot to be retained; -1 means forever.

I am concerned about DR for my Redshift, what options do I have?

You can configure snapshots to be replicated to another region; you select the region and the retention period.

If I want to restore a table from a snapshot/backup, is this possible?

Yes, you can select the backup/snapshot and then the database and the table.

I want to be able to restore my cluster in the event of a disaster. What options do I have?

You can have the Redshift cluster take snapshots/backups and then you will be able to restore.

What is the max amount of data that Redshift can manage?

2PB

What type of database is Redshift?

Columnar database

Is Redshift an OLAP or OLTP?

OLAP

I want to increase the amount of data in my Redshift cluster. How can I do this?

Increase the number of nodes, as each node is a compute and storage unit.

What types of nodes do you get in a redshift cluster?

You get a leader node and data nodes

For data nodes, are there different types of nodes?

Yes, you have two instance type options:
- DC2 (SSD)
- DS2 (magnetic)

I have one large file (1TB). What should I do when loading it into Redshift, and why?

You need to split the file into smaller files so that the files are loaded in parallel onto separate nodes in the Redshift cluster.
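
A minimal sketch of the splitting step, assuming a hypothetical cluster of 2 data nodes with 2 slices each and a line-oriented file. The helper name `split_lines` is invented for the example, and sizing the number of parts to the slice count is a choice made here for illustration.

```python
def split_lines(lines, parts):
    """Deal lines round-robin into `parts` roughly equal chunks."""
    chunks = [[] for _ in range(parts)]
    for i, line in enumerate(lines):
        chunks[i % parts].append(line)
    return chunks


# Hypothetical cluster shape: one part per slice so every slice gets work.
nodes, slices_per_node = 2, 2
parts = nodes * slices_per_node

# Stand-in for the big file: ten CSV lines.
lines = [f"{i},sensor-{i}\n" for i in range(10)]
chunks = split_lines(lines, parts)
```

Each chunk would then be written out as its own file under a common S3 prefix, so the load can pick all of them up at once.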

What are the two operations you perform on a Redshift cluster to get data in and out?

Load (COPY) and unload (UNLOAD)

Where is Redshift deployed to?

VPC

Can you purchase reservations?

Yes

I want to be able to store user information and update individual user data fields. Is Redshift suitable? Give a reason.

No. Redshift is an OLAP (columnar) database and is not suitable for OLTP-type data.

Can you make a redshift cluster public?

Yes

How do backups work on Redshift?

You can take snapshots manually and automatically. These are incremental, and you can restore from any snapshot you have retained.

What are the AWS services that can put data into Redshift?

- Data Pipeline
- Kinesis Firehose
- S3

How can I increase the DR capabilities of Redshift?

Ensure snapshots are automatically taken/configured, and enable cross-region snapshots to copy the S3 snapshot to another region.

Can I just restore a Table and not the whole database?

Yes, you have the ability to restore just a table.

What is RedShift?

Redshift is a fully managed, fast and powerful, petabyte-scale data warehouse service

What is the smallest Redshift cluster you can have?

One node; it acts as both the compute node and the leader node.

What is the purpose of the leader node?

It distributes the incoming request to the data nodes and collects the results.

In a Redshift cluster, how many leader nodes will it take to store 1PB of data?

None. Leader nodes do not store data; data nodes store data in Redshift.

How can you query data in a redshift cluster?

Using PostgreSQL-compatible SQL.

Can I select from a T2 micro and a T2 standard instance when creating a Redshift cluster?

No, there are only two instance types supported:
- DC2
- DS2

When we load data, what are we doing?

Putting data in the RedShift cluster.

Where is data stored in the RedShift cluster?

Data is stored in slices in the data nodes; a data node can have either 2 or 16 slices. Each slice queries its own data to get a result.

I am using ODBC and I need to load data into RedShift, do I need a third party product to load that data?

No, ODBC is supported by RedShift

I am using JDBC and I need to load data into RedShift, do I need a third party product to load that data?

No, JDBC is supported by RedShift

As Redshift is a managed service from AWS, I am concerned I will not be able to have Redshift deployed to my VPC. Is this valid?

No, RS can be deployed to your VPC.

I want to have a Redshift cluster public-facing. How can I do this?

Yes, it's an option: you deploy it to a public subnet in your VPC.

By default where is the RS cluster deployed?

To the default VPC

I am architecting a solution. My org does not want any data in transit over the public internet, and I have data in S3 to be loaded into Redshift. How can I architect this solution so no data goes over the internet?

Put the RS cluster in a private VPC with a gateway VPC endpoint for access to the S3 data, so we can load the data without going over the public internet.

I know I am going to be using my RS cluster for the next 3 years and I want to reduce the cost. How can I do this?

You can reserve the nodes in RS, just like EC2 reservations, and this will save you on cost.

My org requires data at rest encryption, how can I implement this in RS?

Just like other AWS products/services, RS supports encryption of data, SSE with customer or AWS managed CMKs

Can I resize RS?

Yes, two options:
- Classic resize: create a new cluster and copy the data.
- Elastic resize: you can just change the number of nodes.

When you resize an RS cluster, is there some disruption?

Yes

When I am using RS to access S3 through a VPC endpoint, what option do I need to enable?

Enhanced VPC routing

How are backups created in RS?

Scheduled and manual snapshots

How long are automatic snapshots retained for RS?

You set the retention period; afterwards the data is deleted.

Can I take manual snapshots for RS?

Yes

How long are manual snapshots retained for RS?

You set the retention period; afterwards the data is deleted.

How can I load data into an RS cluster?

- Data pipeline

I have data in a MySQL table and I want to load it into my RS cluster. How can I do this?

- Data Pipeline has a template to load MySQL data into the RS cluster.

I want to load my MySQL table into my RS cluster each day. How can I do this?

- Data Pipeline has a template for copying data and using a schedule.

I want to copy data from S3 to my RS cluster several times a day. How can I do this?

You can use the Data Pipeline template to copy S3 data to RS and modify it so it uses a schedule.

How can I load and unload data from S3? Would I use Data Pipeline?

No, RS has this ability native.
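
The native path is the SQL `COPY` and `UNLOAD` commands. The sketch below only assembles the statement text in Python; the table name, bucket, prefix, and IAM role ARN are placeholders invented for the example, and the helpers do no quoting or escaping, so they are for illustration rather than production use.

```python
def copy_statement(table, s3_path, iam_role):
    """Build a COPY statement that loads CSV files from an S3 prefix."""
    return f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV;"


def unload_statement(query, s3_prefix, iam_role):
    """Build an UNLOAD statement that writes a query result to an S3 prefix.

    Assumes `query` contains no single quotes (they would need escaping).
    """
    return f"UNLOAD ('{query}') TO '{s3_prefix}' IAM_ROLE '{iam_role}';"


# Placeholder values, not real resources.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

copy_sql = copy_statement("sensor_readings", "s3://example-bucket/readings/", ROLE)
unload_sql = unload_statement(
    "SELECT * FROM sensor_readings", "s3://example-bucket/export/part_", ROLE
)
```

The resulting strings would be run against the cluster over any of the supported SQL connections (ODBC, JDBC).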

When using automated snapshots, when is the snapshot performed?

every 6 - 8 hrs or after 5GB of data changes, whichever comes first.
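
The "whichever comes first" rule is just a logical OR of two thresholds. A small sketch, assuming the upper 8-hour bound and the 5GB figure from the card as defaults; the function name and its parameters are hypothetical, and this models the rule only, not how AWS schedules snapshots internally.

```python
def automated_snapshot_due(hours_since_last, gb_changed, max_hours=8, max_gb=5):
    """True once either threshold is reached, whichever comes first."""
    return hours_since_last >= max_hours or gb_changed >= max_gb


# Busy cluster: the data-change threshold trips well before the time limit.
busy = automated_snapshot_due(hours_since_last=3, gb_changed=6)

# Quiet cluster: nothing is due until the time limit is reached.
quiet_early = automated_snapshot_due(hours_since_last=3, gb_changed=1)
quiet_late = automated_snapshot_due(hours_since_last=8, gb_changed=1)
```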

Where are backups stored?

Snapshot data is stored in S3.

Is it possible to exclude a table from the backup process?

Yes, there is a parameter to set 'no backup' on a per-table basis.

Can I take a manual snapshot?

Yes

If I set a retention period for backup to -1 what will happen?

The backup will never expire or be deleted.

I want to have a DR capability for my RS cluster, how can I do this?

Enable cross-region replication to have a copy of the data in another region.

I have just deleted data from my table in RS, I keep daily backups (snapshots), do I have to do a full restore?

No, you have two options:
- Full restore
- Table restore

From a high availability perspective for RS, is there only one copy of your data on each node?

No, each node also copies its data to another node in the cluster, you do not pay extra for this replicated data.

Do I need to patch the leader and data nodes in my RS cluster?

No, RS is a managed service and patching is performed by AWS.

What is the max retention period for backups?

35 days

What is the min retention period for backups?

1 day
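
The retention rules in these cards (1 to 35 days for backups, and -1 meaning a manual snapshot is kept forever) can be written down as two small checks. This is a sketch of the rules as stated in the cards; the function names and the day-number representation are made up for the example.

```python
FOREVER = -1  # retention value the cards give for "never expire"


def valid_backup_retention(days):
    """Backup retention must fall in the 1 to 35 day range."""
    return 1 <= days <= 35


def manual_snapshot_expiry(taken_on_day, retention_days):
    """Day number on which a manual snapshot expires, or None if kept forever."""
    if retention_days == FOREVER:
        return None
    return taken_on_day + retention_days


week = valid_backup_retention(7)
too_long = valid_backup_retention(36)
never = manual_snapshot_expiry(taken_on_day=100, retention_days=FOREVER)
in_30 = manual_snapshot_expiry(taken_on_day=100, retention_days=30)
```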

How can you scale an RS cluster?

Up or out: you can change the size of the instance or add more instances.

When you change the size of the RS instances, when are changes applied?

- Now, if you select it
- Or in the maintenance window

I want the highest level of security for my RS encryption keys. What options do I have?

You can use CloudHSM for key management.

You have recently joined a startup company building sensors to measure street noise and air quality in urban areas. The company has been running a pilot deployment of around 100 sensors for 3 months. Each sensor uploads 1KB of sensor data every minute to a backend hosted on AWS. During the pilot, you measured a peak of 10 IOPS on the database, and you stored an average of 3GB of sensor data per month in the database. The current deployment consists of a load-balanced auto scaled Ingestion layer using EC2 instances and a PostgreSQL RDS database with 500GB standard storage. The pilot is considered a success and your CEO has managed to get the attention of some potential investors. The business plan requires a deployment of at least 100K sensors, which needs to be supported by the backend. You also need to store sensor data for at least two years to be able to compare year over year Improvements. To secure funding, you have to make sure that the platform meets these requirements and leaves room for further scaling. Which setup will meet the requirements?

- Add an SQS queue to the ingestion layer to buffer writes to the RDS instance (RDS instance will not support data for 2 years)
- Ingest data into a DynamoDB table and move old data to a Redshift cluster (Handle 10K IOPS ingestion and store data into Redshift for analysis)
- Replace the RDS instance with a 6 node Redshift cluster with 96TB of storage (Does not handle the ingestion issue)
- Keep the current architecture but upgrade RDS storage to 3TB and 10K provisioned IOPS (RDS instance will not support data for 2 years)

Does RS provide automatic out of the box backup?

Yes, 100%. RS provides the ability to have your RS data backed up; this happens every 6-8 hrs or when you get a 5GB data change. Retention is settable from 1 to 35 days.

How can I lower cost in RS?

Opt to use reserved instances for RS clusters that run for very long periods.

I want to query my CSV data in S3 from Redshift. How can I do this?

You can use RedShift spectrum.

I have exabytes of CSV data in S3 and an application using ODBC. How can I query it?

You can use Redshift Spectrum. Spectrum queries go through Redshift, so the application can use ODBC just as it would with Redshift and still query the data in S3.

How do I configure and use spot instances with redshift?

You can't. Spot instances can be taken back by AWS at any point and cannot be used for Redshift leader or data nodes, as they require always-running instances.

Where is RedShift deployed?

In a single region and in a single AZ.

Can you select T1, M1 or C1 instances?

No, with Redshift you can select from a limited number of instance types:
- ra3.16xlarge (48 vCPU)
- dc2.large (2 vCPU)
- dc2.8xlarge (32 vCPU)

Can you change the cluster VPC after it is created?

No, you have to create a new cluster.

What storage size do I get?

It depends: each node has an instance size and an amount of storage. There are 3 instance sizes and each has a different amount of storage. You can scale RS up to 8PB of storage by scaling out the number of data nodes to 128; this is 128 nodes with 64TB each.
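
The 8PB figure follows directly from the node count and per-node storage quoted in the card. A quick check of the arithmetic, using 1PB = 1024TB; the function name is made up for the example.

```python
def cluster_storage_tb(data_nodes, tb_per_node):
    """Total cluster storage is the per-node storage times the node count."""
    return data_nodes * tb_per_node


total_tb = cluster_storage_tb(data_nodes=128, tb_per_node=64)  # 8192 TB
total_pb = total_tb / 1024                                     # 8.0 PB
```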

Can you make the RedShift cluster public?

Yes, 100%. It is just in a VPC, so you can add an Internet Gateway and give it an EIP. All of this is done through normal network provisioning.

I want to use encryption and I need to understand what options I have for key management.

You can use:
- KMS with AWS managed CMKs
- KMS with customer managed CMKs
- CloudHSM

What is the backup retention period?

1 - 35 days

My org is concerned about having DR available for the Redshift cluster. We want to ensure that, if needed, we can recover to another location. What would be the best method?

- Use CloudFormation to recreate the RS cluster in another region if needed.
- Use backups with cross-region replication enabled, to ensure backups are offloaded to another region and can be used to recreate the data.

How are backups taken in RedShift?

RS has two types of backups:
- Manual (you take them)
- Automatic (every 8 hrs or 5GB per node of data change)

I want to take snapshots of the Redshift cluster on my own schedule. How can this be done?

You have to create a snapshot schedule. This schedule defines when you want snapshots created for your cluster, and it can be attached to one or more clusters.

Where are snapshots stored?

In S3, managed by Redshift.

When you do a snapshot restore for RedShift, how is the data restored to the cluster?

It is not restored to the cluster, you will get a new cluster.

When restoring a cluster, can I use it straight away or do I have to wait until all the data is streamed over from the snapshot in S3?

You can use the cluster straight away, data is streamed as needed.

Can you monitor the progress of snapshots?

Yes.

Are backups charged extra?

Sort of. You get free backup storage up to the size of the storage on the RS data nodes; once your snapshots go over that, you are charged at the normal rate.

I want to manually copy the RS snapshot to a new region. How do I select it to be copied to the new region?
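
In other words, only the snapshot storage above the cluster's own storage size is billable. A minimal sketch of that rule, with a hypothetical function name and made-up sizes; it does not model actual AWS pricing.

```python
def billable_snapshot_gb(snapshot_gb, cluster_storage_gb):
    """Snapshot storage beyond the cluster's own storage size is charged."""
    return max(0, snapshot_gb - cluster_storage_gb)


# Hypothetical cluster with 320 GB of node storage.
within_allowance = billable_snapshot_gb(snapshot_gb=200, cluster_storage_gb=320)
over_allowance = billable_snapshot_gb(snapshot_gb=500, cluster_storage_gb=320)
```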

RS snapshots are managed by AWS RS and not visible to you in an S3 bucket.

Can you restore just a table from a snapshot?

Yes, you select the snapshot, database and table

I have a VPC. What do I need to create from a network perspective to create my Redshift cluster in my VPC?

You will need a subnet group and security group

AWS Redshift Flashcards

(106 cards)