Databases Flashcards

(89 cards)

1
Q

What is the most common scaling method with typical relational databases?

A

Vertical scaling

2
Q

What type of query operation is not supported by NoSQL databases?

A

Joins

3
Q

Can you do “SUM” or “AVG” aggregations with DynamoDB?

A

No

4
Q

What are the two storage tiers available in DynamoDB?

A

Infrequent access and standard

5
Q

What is an item in DynamoDB?

A

A row

6
Q

What are the two options for choosing a primary key in DynamoDB?

A

Hash only, or Hash + Range (a composite key)

7
Q

The partition key is unique for every ____

A

The partition key is unique for every row

8
Q

What is a good rule of thumb to ensure uniqueness when choosing a partition key?

A

Choose the attribute with the highest cardinality

9
Q

When should you not use DynamoDB?

A

When traditional relational databases would be a better fit, if the use case needs BLOB storage, if you need to do joins or complex transactions

10
Q

What are the 2 DynamoDB capacity modes?

A

Provisioned and on-demand

11
Q

When should you use on-demand mode for DynamoDB?

A

If you have unpredictable application traffic

12
Q

Can you exceed the specified throughput in DynamoDB provisioned mode?

A

Yes, using burst capacity, but if this is all used up then you’ll get an exception and need to retry

13
Q

What is the equivalent of 1 write capacity unit?

A

1 write per second for an item up to 1KB in size

14
Q

What is the equivalent of 1 read capacity unit?

A

1 strongly consistent read per second, or 2 eventually consistent reads per second, for an item up to 4KB in size

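As a sketch of the arithmetic behind the two capacity-unit cards above (assuming the 1 KB write / 4 KB read rounding rules, with sizes always rounded up):

```python
import math

def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs for one write per second: 1 WCU per 1 KB, rounded up."""
    return math.ceil(item_size_bytes / 1024)

def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs for one read per second: 1 RCU per 4 KB (rounded up) for a
    strongly consistent read; an eventually consistent read costs half."""
    units = math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else units / 2

# A 6 KB item needs 6 WCUs to write once per second, 2 RCUs to read it
# strongly consistently, and 1 RCU eventually consistently.
```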
15
Q

What is an eventually consistent read?

A

A read that may return stale data if it occurs immediately after a write, before the change has propagated to all replicas

16
Q

What is a ProvisionedThroughputExceededException often caused by?

A

Exceeding the RCUs or WCUs at the partition level

17
Q

What are 3 ways that you can counteract hot keys/hot partitions/very large items?

A
  • Exponential backoff
  • Distribute partition keys
  • Use DAX if it’s an RCU issue
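A minimal sketch of the exponential backoff idea from the first bullet. The base and cap values here are made up for illustration; AWS SDKs typically add random jitter on top of a schedule like this:

```python
def backoff_delays(base: float = 0.05, cap: float = 2.0, attempts: int = 6):
    """Capped exponential backoff: base * 2^attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

# Usage sketch: retry a throttled DynamoDB call, sleeping between attempts:
#   for delay in backoff_delays():
#       try:
#           ...  # the DynamoDB call
#           break
#       except ProvisionedThroughputExceededException:
#           time.sleep(delay)
```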
18
Q

What is the API call that you would use with DynamoDB to create or replace an item with the same primary key?

A

PutItem

19
Q

What is a conditional write in DynamoDB?

A

The write/update/delete is only accepted if a certain condition is met

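A conditional write might look like the following boto3-style PutItem parameters, shown here as a plain dict so nothing is sent to AWS; the table and attribute names are hypothetical:

```python
# Parameters for a conditional PutItem. "users", "user_id" and "version"
# are hypothetical names used only for illustration.
put_kwargs = {
    "TableName": "users",
    "Item": {
        "user_id": {"S": "u-123"},
        "version": {"N": "2"},
    },
    # Only write if the item doesn't exist yet, or if the stored version
    # is the one we previously read (an optimistic-locking pattern).
    "ConditionExpression": "attribute_not_exists(user_id) OR version = :expected",
    "ExpressionAttributeValues": {":expected": {"N": "1"}},
}
```

If the condition fails, the write is rejected with a ConditionalCheckFailedException rather than silently overwriting the item.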
20
Q

What does GetItem do?

A

Reads a row based on a primary key

21
Q

What type of read does GetItem use by default?

A

Eventually consistent, but can be made to use strongly consistent reads

22
Q

Can FilterExpression be used with key attributes?

A

No

23
Q

What does KeyConditionExpression do?

A

Restricts a Query to items whose key attributes meet a condition: the partition key must equal a given value, and the sort key can optionally be constrained (e.g. to a range)

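To illustrate the split between the two expressions, here are hypothetical boto3-style Query parameters as a plain dict: key attributes live in KeyConditionExpression, while FilterExpression may only reference non-key attributes (and is applied after the items are read, so RCUs are still consumed for filtered-out items):

```python
# "orders", "customer_id" (partition key), "order_date" (sort key) and
# "order_total" (non-key attribute) are hypothetical names.
query_kwargs = {
    "TableName": "orders",
    "KeyConditionExpression": "customer_id = :cid AND order_date >= :since",
    "FilterExpression": "order_total > :min_total",  # non-key attributes only
    "ExpressionAttributeValues": {
        ":cid": {"S": "c-42"},
        ":since": {"S": "2024-01-01"},
        ":min_total": {"N": "100"},
    },
}
```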
24
Q

Are batch operations in DynamoDB parallelised?

A

Yes

25
What is PartiQL?
A SQL compatible query language for DynamoDB. Since it's still on DynamoDB, it cannot do joins.
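A sketch of what a parameterised PartiQL statement might look like, as boto3 execute_statement-style parameters in a plain dict (the table name is hypothetical):

```python
# PartiQL for DynamoDB uses ? placeholders and quoted table names.
# Note the absence of JOIN: a statement targets one table or index.
statement = {
    "Statement": 'SELECT * FROM "orders" WHERE customer_id = ?',
    "Parameters": [{"S": "c-42"}],
}
```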
26
What are 3 key points about a local secondary index?
An alternative sort key for your table; you can have up to 5 per table; they must be defined when you create the table
27
What does a Global Secondary Index do/enable?
Gives an alternative primary key (hash or hash + range) to speed up queries on non-key attributes
28
What is important to remember for a GSI in terms of capacity for performance?
You have to provision RCUs and WCUs for it separately to the table
29
Can you add or modify a GSI after the table's creation?
Yes
30
What is DAX?
DynamoDB Accelerator - a fully managed in-memory cache for DynamoDB
31
Do you have to change application logic for DAX?
No
32
What is DynamoDB streams?
An ordered stream of item-level modifications in a table
33
Why might you use DynamoDB streams?
To react to changes in your DynamoDB table
34
Do you have to provision shards for DynamoDB streams?
No
35
If you want to integrate a Lambda function with your DynamoDB stream, what must it have, and how will it be invoked?
It must have an event source mapping; the mapping polls the stream and invokes the function synchronously with batches of records
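The event source mapping might be configured with parameters like these (boto3-style kwargs for create_event_source_mapping, shown as a plain dict; the ARN and function name are hypothetical):

```python
# The mapping polls the DynamoDB stream and invokes the Lambda function
# with batches of change records.
mapping_kwargs = {
    "EventSourceArn": "arn:aws:dynamodb:us-east-1:123456789012:table/orders/stream/2024-01-01T00:00:00.000",
    "FunctionName": "process-order-changes",
    "StartingPosition": "LATEST",  # only new changes, not stream history
    "BatchSize": 100,
}
```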
36
What destinations can you have for DDB Streams?
Kinesis Client Library, Lambda or Kinesis Data Streams
37
Do TTL deletes consume WCUs?
No
38
How quickly are TTL items deleted after expiring?
Within a few days
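Since the TTL attribute is just an epoch-seconds timestamp stored as a Number on the item, it can be computed like this (the 30-day window is an arbitrary example):

```python
import time

def ttl_attribute(days_from_now: int) -> dict:
    """TTL value for an item: an epoch-seconds Number attribute.
    DynamoDB deletes the item some time (possibly days) after it passes."""
    expiry = int(time.time()) + days_from_now * 24 * 60 * 60
    return {"N": str(expiry)}

item_ttl = ttl_attribute(30)  # expire roughly 30 days from now
```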
39
What are 2 common patterns for which there are integrations between S3 and DynamoDB?
Store large objects in S3 and keep pointers (URLs) to them in DynamoDB; relatedly, use DynamoDB as a metadata store for objects in S3
40
What type of fine-grained access control does DynamoDB support?
Row-based access control
41
Does DynamoDB support point in time recovery?
Yes
42
What does ACID stand for?
Atomic, consistent, isolated, durable
43
What do each of the components of ACID mean?
  • Atomic - if any part of the transaction fails, it all fails
  • Consistent - the transaction must comply with the constraints set upon it
  • Isolated - transactions executing at the same time must not interfere with each other, e.g. concurrent reads and writes to the same place should show either the new data or the old data, never a partial state
  • Durable - once the transaction has been committed it is permanent
44
What are the two database engines that are compatible with Aurora?
PostgreSQL and MySQL
45
What is the point of a lock in RDS?
To prevent people from being able to read/write to a specific row/table at the same time
46
What are the two types of lock in RDS?
  • Shared lock - writes disallowed but reads allowed
  • Exclusive lock - no reads or writes allowed
47
What is important for transactions that have locks on them in RDS?
They need to complete, otherwise the lock might never get lifted
48
What is DocumentDB?
AWS's proprietary MongoDB-compatible NoSQL JSON document database
49
What is Amazon MemoryDB for Redis?
A Redis-compatible, very fast in-memory database that achieves durability by storing transaction logs across multiple AZs
50
Name 3 key facts about Apache Cassandra
  • Uses Cassandra Query Language (CQL)
  • Point-in-time recovery up to 35 days
  • Commonly used for storing IoT and time-series data
51
What are Neptune's 3 query languages?
  • Gremlin (very slightly Java-like)
  • openCypher
  • SPARQL (very slightly SQL-like)
52
What is Neptune?
A fully managed graph database, able to query billions of connections with millisecond latency
53
What is timestream? What does it do with new and old data for efficiency?
A serverless, SQL compatible, time series database that keeps recent data in-memory and older data in cost optimised storage
54
Is Redshift row or column based?
Column-based
55
How is the cluster architecture of Redshift composed?
1 leader node and 1 or more compute nodes; the cluster hosts 1 or more databases
56
What happens if a node fails in Redshift?
Redshift automatically recovers with a new one
57
What are the jobs of the leader node in a Redshift cluster?
To receive queries from clients, parse them, develop execution plans, coordinate parallel execution across the compute nodes, gather the intermediate results, and return the final result to the client
58
What can you do with Redshift Spectrum?
Query exabytes of unstructured data in S3 without transforming it or loading it into Redshift.
59
How does Redshift handle horizontal scaling? What happens in the process?
A new cluster is created while the old one remains available for reads. Once this is deployed the CNAME is flipped and traffic is directed to the new cluster
60
What is the goal of distributing data across nodes and slices?
To make sure that data is moving as little as possible during data query execution
61
What are the 4 different Redshift distribution styles?
AUTO / EVEN / KEY / ALL
62
What is the EVEN Redshift data distribution style?
Rows are distributed across slices in a round-robin, only good if neither KEY nor ALL are clearly preferable
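A toy simulation of the EVEN (round-robin) and KEY (hash a chosen column) styles. This is not Redshift's actual hashing, just an illustration of how rows land on slices and why KEY co-locates rows that share a value:

```python
from collections import defaultdict

def distribute_even(rows, n_slices):
    """EVEN style: rows go to slices in round-robin order."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return dict(slices)

def distribute_key(rows, key, n_slices):
    """KEY style: hash the chosen column, so rows sharing a key value
    land on the same slice (good for joins on that column)."""
    slices = defaultdict(list)
    for row in rows:
        slices[hash(row[key]) % n_slices].append(row)
    return dict(slices)
```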
63
What is the ALL Redshift data distribution style? When is this appropriate?
The entire table is copied across every node. This multiplies the storage requirement significantly, and is only appropriate for infrequently updated tables
64
What is the most efficient command to move data into Redshift?
COPY
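A sketch of composing such a COPY statement loading from S3; the table name, S3 path, IAM role ARN and file format below are all hypothetical:

```python
def copy_from_s3(table: str, s3_path: str, iam_role: str) -> str:
    """Compose a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )
```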
64
What is the KEY Redshift data distribution style?
Rows are distributed based on a 'key' column - this is good if you're querying based on that specific column
65
What command should you use to move data from one RS table to another?
"INSERT INTO...SELECT" or "CREATE TABLE AS"
66
What are 2 things that COPY can do as it moves data?
Compress the data, and decrypt data as it is being loaded FROM S3
67
What is DBLINK and why might you use it?
Allows you to connect your Redshift cluster to a PostgreSQL instance. You might do this to get the best of both row-based and column-based storage
68
What is WLM in Redshift?
Workload management - allows for Redshift to prioritise fast queries over long, slow queries and manages query queues
69
What is concurrency scaling in Redshift WLM? Why is it important to choose which queries get this selectively?
The ability to automatically add cluster capacity to handle increases in concurrent read queries. It is important to choose which queries get this selectively, since the extra capacity costs money
70
What is SQA in Redshift?
Short query acceleration - lets short-running queries be prioritised over longer-running ones. Can be used instead of WLM if accelerating short queries is all you need
71
What is Vacuum in Redshift? What are its 4 versions?
A function used to recover space from deleted rows. Its 4 versions are FULL, DELETE ONLY, SORT ONLY and REINDEX
72
What are the 3 ways to resize on the fly with Redshift?
Elastic resize; classic resize; snapshot, restore and resize
73
Why might you use the snapshot, restore and resize strategy over classic resizing in Redshift?
SRR keeps the cluster available during the resize, whereas classic resizing can take hours or days, during which the cluster is read-only
73
How does elastic resize work in Redshift?
You can add or remove nodes of the same type (it doesn't allow changing node types). The cluster goes down for a few minutes, and some node types are limited to doubling or halving the node count
74
What is special about RA3 nodes in Redshift?
They have decoupled storage and compute, which allows you to scale each independently
75
What does Redshift ML do/enable?
Allows you to create a machine learning model using a SQL command; uses SageMaker Autopilot behind the scenes
76
What commands can you use to give privileges to users in Redshift using SQL?
GRANT or REVOKE
77
What is one drawback for Redshift serverless in terms of updates?
If an update is being rolled out the connection will drop without a warning
78
What is the payment model for Redshift serverless?
You pay for Redshift processing units per second and storage used
79
What is a Redshift materialised view? Why might it be beneficial?
A stored result of a query that you can then query again. Beneficial because it can speed up querying: you are no longer querying the whole dataset each time
80
What is a risk for Redshift materialised views?
It can serve stale data if it is not refreshed after the underlying tables change
81
When might you use materialised views?
For predictable and recurring queries
82
What does Redshift data sharing allow? Why might you do this?
The read-only sharing of data across Redshift clusters. A common reason is workload isolation: if the cluster reading the data gets bogged down, it doesn't affect the cluster where the data is written/updated
83
What is Lambda UDF?
Lambda User Defined Functions - allows you to call Lambda functions within your SQL queries in Redshift
84
What are Redshift Federated Queries?
Read-only queries that let you analyse data across databases, warehouses and data lakes in AWS (e.g. Aurora, RDS)
85
Can you use Redshift Federated Queries to query Redshift from RDS?
No, you can only query RDS from Redshift
86
What is contained in Redshift System Tables and System Views?
Information on how Redshift itself is functioning, e.g. with query and workload usage
87
What is the Redshift Data API used for?
To connect your applications to your Redshift cluster over HTTP - send SQL statements to your Redshift clusters
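The call might look like these boto3-style execute_statement parameters for the redshift-data client (plain dict; the cluster, database and user names are hypothetical). Because the Data API is HTTP-based, no persistent JDBC/ODBC connection is needed:

```python
# Parameters for redshift-data execute_statement; the API returns a
# statement ID that you poll for results rather than holding a connection.
execute_kwargs = {
    "ClusterIdentifier": "analytics-cluster",
    "Database": "dev",
    "DbUser": "analyst",
    "Sql": "SELECT COUNT(*) FROM sales;",
}
```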