Databases Flashcards

(89 cards)

1
Q

What is the most common scaling method with typical relational databases?

A

Vertical scaling

2
Q

What type of query operation is not supported by NoSQL databases?

A

Joins

3
Q

Can you do “SUM” or “AVG” aggregations with DynamoDB?

A

No

4
Q

What are the two storage tiers available in DynamoDB?

A

Infrequent access and standard

5
Q

What is an item in DynamoDB?

A

A row

6
Q

What are the two options for choosing a primary key in DynamoDB?

A

Hash only, or Hash + Range (a composite key)

7
Q

The partition key is unique for every ____

A

The partition key is unique for every row

8
Q

What is a good rule of thumb to ensure uniqueness when choosing a partition key?

A

Choose the attribute with the highest cardinality

9
Q

When should you not use DynamoDB?

A

When traditional relational databases would be a better fit, if the use case needs BLOB storage, if you need to do joins or complex transactions

10
Q

What are the 2 DynamoDB capacity modes?

A

Provisioned and on-demand

11
Q

When should you use on-demand mode for DynamoDB?

A

If you have unpredictable application traffic

12
Q

Can you exceed the specified throughput in DynamoDB provisioned mode?

A

Yes, using burst capacity, but if this is all used up then you’ll get an exception and need to retry

13
Q

What is the equivalent of 1 write capacity unit?

A

1 write per second for an item up to 1KB in size

14
Q

What is the equivalent of 1 read capacity unit?

A

1 strongly consistent read per second, or 2 eventually consistent reads per second, for an item up to 4KB in size

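As a sketch of the arithmetic behind the two capacity-unit cards above (assuming the 1 KB write / 4 KB read rounding rules, with sizes always rounded up):

```python
import math

def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs for one write per second: 1 WCU per 1 KB, rounded up."""
    return math.ceil(item_size_bytes / 1024)

def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs for one read per second: 1 RCU per 4 KB (rounded up) for a
    strongly consistent read; an eventually consistent read costs half."""
    units = math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else units / 2

# A 6 KB item needs 6 WCUs to write once per second, 2 RCUs to read it
# strongly consistently, and 1 RCU eventually consistently.
```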
15
Q

What is an eventually consistent read?

A

A read that may return stale data if it occurs immediately after a write, before the change has propagated to all replicas

16
Q

What is a ProvisionedThroughputExceededException often caused by?

A

Exceeding the RCUs or WCUs at the partition level

17
Q

What are 3 ways that you can counteract hot keys/hot partitions/very large items?

A
  • Exponential backoff
  • Distribute partition keys
  • Use DAX if it’s an RCU issue
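A minimal sketch of the exponential backoff idea from the first bullet. The base and cap values here are made up for illustration; AWS SDKs typically add random jitter on top of a schedule like this:

```python
def backoff_delays(base: float = 0.05, cap: float = 2.0, attempts: int = 6):
    """Capped exponential backoff: base * 2^attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

# Usage sketch: retry a throttled DynamoDB call, sleeping between attempts:
#   for delay in backoff_delays():
#       try:
#           ...  # the DynamoDB call
#           break
#       except ProvisionedThroughputExceededException:
#           time.sleep(delay)
```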
18
Q

What is the API call that you would use with DynamoDB to create or replace an item with the same primary key?

A

PutItem

19
Q

What is a conditional write in DynamoDB?

A

The write/update/delete is only accepted if a certain condition is met

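A conditional write might look like the following boto3-style PutItem parameters, shown here as a plain dict so nothing is sent to AWS; the table and attribute names are hypothetical:

```python
# Parameters for a conditional PutItem. "users", "user_id" and "version"
# are hypothetical names used only for illustration.
put_kwargs = {
    "TableName": "users",
    "Item": {
        "user_id": {"S": "u-123"},
        "version": {"N": "2"},
    },
    # Only write if the item doesn't exist yet, or if the stored version
    # is the one we previously read (an optimistic-locking pattern).
    "ConditionExpression": "attribute_not_exists(user_id) OR version = :expected",
    "ExpressionAttributeValues": {":expected": {"N": "1"}},
}
```

If the condition fails, the write is rejected with a ConditionalCheckFailedException rather than silently overwriting the item.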
20
Q

What does GetItem do?

A

Reads a row based on a primary key

21
Q

What type of read does GetItem use by default?

A

Eventually consistent, but can be made to use strongly consistent reads

22
Q

Can FilterExpression be used with key attributes?

A

No

23
Q

What does KeyConditionExpression do?

A

Restricts a Query to items whose key attributes meet a condition: the partition key must equal a given value, and the sort key can optionally be constrained (e.g. to a range)

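To illustrate the split between the two expressions, here are hypothetical boto3-style Query parameters as a plain dict: key attributes live in KeyConditionExpression, while FilterExpression may only reference non-key attributes (and is applied after the items are read, so RCUs are still consumed for filtered-out items):

```python
# "orders", "customer_id" (partition key), "order_date" (sort key) and
# "order_total" (non-key attribute) are hypothetical names.
query_kwargs = {
    "TableName": "orders",
    "KeyConditionExpression": "customer_id = :cid AND order_date >= :since",
    "FilterExpression": "order_total > :min_total",  # non-key attributes only
    "ExpressionAttributeValues": {
        ":cid": {"S": "c-42"},
        ":since": {"S": "2024-01-01"},
        ":min_total": {"N": "100"},
    },
}
```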
24
Q

Are batch operations in DynamoDB parallelised?

A

Yes

25
What is PartiQL?
A SQL compatible query language for DynamoDB. Since it's still on DynamoDB, it cannot do joins.
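A sketch of what a parameterised PartiQL statement might look like, as boto3 execute_statement-style parameters in a plain dict (the table name is hypothetical):

```python
# PartiQL for DynamoDB uses ? placeholders and quoted table names.
# Note the absence of JOIN: a statement targets one table or index.
statement = {
    "Statement": 'SELECT * FROM "orders" WHERE customer_id = ?',
    "Parameters": [{"S": "c-42"}],
}
```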
26
What are 3 key points about a local secondary index?
An alternative sort key for your table; you can have up to 5 per table; they must be defined when you create the table
27
What does a Global Secondary Index do/enable?
Gives an alternative primary key (hash or hash + range) to speed up queries on non-key attributes
28
What is important to remember for a GSI in terms of capacity for performance?
You have to provision RCUs and WCUs for it separately to the table
29
Can you add or modify a GSI after the table's creation?
Yes
30
What is DAX?
DynamoDB Accelerator - a fully managed in-memory cache for DynamoDB
31
Do you have to change application logic for DAX?
No
32
What is DynamoDB streams?
An ordered stream of item-level modifications in a table
33
Why might you use DynamoDB streams?
To react to changes in your DynamoDB table
34
Do you have to provision shards for DynamoDB streams?
No
35
If you want to integrate a Lambda function with your DynamoDB stream, what must it have, and how will it be invoked?
It must have an event source mapping; the mapping polls the stream and invokes the function synchronously with batches of records
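The event source mapping might be configured with parameters like these (boto3-style kwargs for create_event_source_mapping, shown as a plain dict; the ARN and function name are hypothetical):

```python
# The mapping polls the DynamoDB stream and invokes the Lambda function
# with batches of change records.
mapping_kwargs = {
    "EventSourceArn": "arn:aws:dynamodb:us-east-1:123456789012:table/orders/stream/2024-01-01T00:00:00.000",
    "FunctionName": "process-order-changes",
    "StartingPosition": "LATEST",  # only new changes, not stream history
    "BatchSize": 100,
}
```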
36
What destinations can you have for DDB Streams?
Kinesis Client Library, Lambda or Kinesis Data Streams
37
Do TTL deletes consume WCUs?
No
38
How quickly are TTL items deleted after expiring?
Within a few days
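Since the TTL attribute is just an epoch-seconds timestamp stored as a Number on the item, it can be computed like this (the 30-day window is an arbitrary example):

```python
import time

def ttl_attribute(days_from_now: int) -> dict:
    """TTL value for an item: an epoch-seconds Number attribute.
    DynamoDB deletes the item some time (possibly days) after it passes."""
    expiry = int(time.time()) + days_from_now * 24 * 60 * 60
    return {"N": str(expiry)}

item_ttl = ttl_attribute(30)  # expire roughly 30 days from now
```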
39
What are 2 common patterns for which there are integrations between S3 and DynamoDB?
Store large objects in S3 and keep pointers (URLs) to them in DynamoDB; relatedly, use DynamoDB as a metadata store for objects in S3
40
What type of fine-grained access control does DynamoDB support?
Row-based access control
41
Does DynamoDB support point in time recovery?
Yes
42
What does ACID stand for?
Atomic, consistent, isolated, durable
43
What do each of the components of ACID mean?
  • Atomic - if any part of the transaction fails, it all fails
  • Consistent - the transaction must comply with the constraints set upon it
  • Isolated - transactions executing at the same time must not interfere with each other, e.g. concurrent reads and writes to the same place should show either the new data or the old data, never a partial state
  • Durable - once the transaction has been committed it is permanent
44
What are the two database engines that are compatible with Aurora?
PostgreSQL and MySQL
45
What is the point of a lock in RDS?
To prevent people from being able to read/write to a specific row/table at the same time
46
What are the two types of lock in RDS?
  • Shared lock - writes disallowed but reads allowed
  • Exclusive lock - no reads or writes allowed
47
What is important for transactions that have locks on them in RDS?
They need to complete, otherwise the lock might never get lifted
48
What is DocumentDB?
AWS's proprietary MongoDB-compatible NoSQL JSON document database
49
What is Amazon MemoryDB for Redis?
A Redis-compatible, very fast in-memory database that achieves durability by storing transaction logs across multiple AZs
50
Name 3 key facts about Apache Cassandra
  • Uses Cassandra Query Language (CQL)
  • Point-in-time recovery up to 35 days
  • Commonly used for storing IoT and time-series data
51
What are Neptune's 3 query languages?
  • Gremlin (very slightly Java-like)
  • openCypher
  • SPARQL (very slightly SQL-like)
52
What is Neptune?
A fully managed graph database, able to query billions of connections with millisecond latency
53
What is timestream? What does it do with new and old data for efficiency?
A serverless, SQL compatible, time series database that keeps recent data in-memory and older data in cost optimised storage
54
Is Redshift row or column based?
Column-based
55
How is the cluster architecture of Redshift composed?
1 leader node and 1 or more compute nodes; the cluster hosts 1 or more databases
56
What happens if a node fails in Redshift?
Redshift automatically recovers with a new one
57
What are the jobs of the leader node in a Redshift cluster?
To receive queries from clients, parse them, develop execution plans, coordinate parallel execution across the compute nodes, gather the intermediate results, and return the final result to the client
58
What can you do with Redshift Spectrum?
Query exabytes of unstructured data in S3 without transforming it or loading it into Redshift.
59
How does Redshift handle horizontal scaling? What happens in the process?
A new cluster is created while the old one remains available for reads. Once this is deployed the CNAME is flipped and traffic is directed to the new cluster
60
What is the goal of distributing data across nodes and slices?
To make sure that data is moving as little as possible during data query execution
61
What are the 4 different Redshift distribution styles?
AUTO / EVEN / KEY / ALL
62
What is the EVEN Redshift data distribution style?
Rows are distributed across slices in a round-robin, only good if neither KEY nor ALL are clearly preferable
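A toy simulation of the EVEN (round-robin) and KEY (hash a chosen column) styles. This is not Redshift's actual hashing, just an illustration of how rows land on slices and why KEY co-locates rows that share a value:

```python
from collections import defaultdict

def distribute_even(rows, n_slices):
    """EVEN style: rows go to slices in round-robin order."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return dict(slices)

def distribute_key(rows, key, n_slices):
    """KEY style: hash the chosen column, so rows sharing a key value
    land on the same slice (good for joins on that column)."""
    slices = defaultdict(list)
    for row in rows:
        slices[hash(row[key]) % n_slices].append(row)
    return dict(slices)
```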
63
What is the ALL Redshift data distribution style? When is this appropriate?
The entire table is copied across every node. This multiplies the storage requirement significantly, and is only appropriate for infrequently updated tables
64
What is the most efficient command to move data into Redshift?
COPY
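A sketch of composing such a COPY statement loading from S3; the table name, S3 path, IAM role ARN and file format below are all hypothetical:

```python
def copy_from_s3(table: str, s3_path: str, iam_role: str) -> str:
    """Compose a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )
```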
64
What is the KEY Redshift data distribution style?
Rows are distributed based on a 'key' column - this is good if you're querying based on that specific column
65
What command should you use to move data from one RS table to another?
"INSERT INTO...SELECT" or "CREATE TABLE AS"
66
What are 2 things that COPY can do as it moves data?
Compress the data, and decrypt data as it is being loaded FROM S3
67
What is DBLINK and why might you use it?
Allows you to connect your Redshift cluster to a PostgreSQL instance. You might do this to get the best of both row-based and column-based storage
68
What is WLM in Redshift?
Workload management - allows for Redshift to prioritise fast queries over long, slow queries and manages query queues
69
What is concurrency scaling in Redshift WLM? Why is it important to choose which queries get this selectively?
The ability to automatically add cluster capacity to handle increases in concurrent read queries. It is important to choose which queries get this selectively, since the extra capacity costs money
70
What is SQA in Redshift?
Short query acceleration - lets short-running queries be prioritised over longer-running ones. Can be used instead of WLM if accelerating short queries is all you need
71
What is Vacuum in Redshift? What are its 4 versions?
A function used to recover space from deleted rows. Its 4 versions are FULL, DELETE ONLY, SORT ONLY and REINDEX
72
What are the 3 ways to resize on the fly with Redshift?
Elastic resize; classic resize; snapshot, restore and resize
73
Why might you use the snapshot, restore and resize strategy over classic resizing in Redshift?
SRR keeps the cluster available during the resize, whereas classic resizing can take hours or days, during which the cluster is read-only
73
How does elastic resize work in Redshift?
You can add or remove nodes of the same type (it doesn't allow changing node types). The cluster goes down for a few minutes, and some node types are limited to doubling or halving the node count
74
What is special about RA3 nodes in Redshift?
They have decoupled storage and compute, which allows you to scale each independently
75
What does Redshift ML do/enable?
Allows you to create a machine learning model using a SQL command; uses SageMaker Autopilot behind the scenes
76
What commands can you use to give privileges to users in Redshift using SQL?
GRANT or REVOKE
77
What is one drawback for Redshift serverless in terms of updates?
If an update is being rolled out the connection will drop without a warning
78
What is the payment model for Redshift serverless?
You pay for Redshift processing units per second and storage used
79
What is a Redshift materialised view? Why might it be beneficial?
A stored result of a query that you can then query again. Beneficial because it can speed up querying: you are no longer querying the whole dataset each time
80
What is a risk for Redshift materialised views?
It can serve stale data if it is not refreshed after the underlying tables change
81
When might you use materialised views?
For predictable and recurring queries
82
What does Redshift data sharing allow? Why might you do this?
The read-only sharing of data across Redshift clusters. A common reason is workload isolation: if the cluster reading the data gets bogged down, it doesn't affect the cluster where the data is written/updated
83
What is Lambda UDF?
Lambda User Defined Functions - allows you to call Lambda functions within your SQL queries in Redshift
84
What are Redshift Federated Queries?
Read-only queries that let you analyse data across databases, warehouses and data lakes in AWS (e.g. Aurora, RDS)
85
Can you use Redshift Federated Queries to query Redshift from RDS?
No, you can only query RDS from Redshift
86
What is contained in Redshift System Tables and System Views?
Information on how Redshift itself is functioning, e.g. with query and workload usage
87
What is the Redshift Data API used for?
To connect your applications to your Redshift cluster over HTTP - send SQL statements to your Redshift clusters
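The call might look like these boto3-style execute_statement parameters for the redshift-data client (plain dict; the cluster, database and user names are hypothetical). Because the Data API is HTTP-based, no persistent JDBC/ODBC connection is needed:

```python
# Parameters for redshift-data execute_statement; the API returns a
# statement ID that you poll for results rather than holding a connection.
execute_kwargs = {
    "ClusterIdentifier": "analytics-cluster",
    "Database": "dev",
    "DbUser": "analyst",
    "Sql": "SELECT COUNT(*) FROM sales;",
}
```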