SQL Flashcards

Question

DROP

Answer 1

Deletes database objects permanently. It can remove information from a table. -DROP TABLE users: Removes the users table and its data. -DROP INDEX idx_email ON users: Deletes an index. NOTE: If you delete a column, you will remove all of the data inside that column as well. Syntax: DROP TABLE table_name; Example: DROP TABLE Shippers;

Answer 2

AND BETWEEN IN LIKE NOT OR

Answer 3

CREATE ALTER DROP

Answer 4

No Join Inner Join Left Join Right Join Full Join

Answer 5

Specific, Measurable, Actionable, Relevant, Time-Bound Specific: Do you currently use data to drive decisions in your business? If so, what kind(s) of data do you collect, and how do you use it? Measurable: Do you know what percentage of sales is from your top-selling products? Actionable: Are there business decisions or changes that you would make if you had the right information? For example, if you had information about how umbrella sales change with the weather, how would you use it? Relevant: How often do you review data from your business? Time-bound: Can you describe how data helped you make good decisions for your store(s) this past year?

Answer 6

Software that manages all of the requests that are put through to the database. (Requests may include people using SQL to request info from the database, applications using SQL to request info from the database, SQL requests to create data visuals from the database, etc.)

Answer 7

The hardware. Like a powerful PC, lives 24/7. (physical machine where database(s) live) can host multiple databases (i.e., containers of your data)

Answer 8

Server>DBMS>SQL>Database Server: Physical machine where the database lives DBMS: Software that manages the database SQL: Language used to speak to the database SQL: Language used to speak to the database

Answer 9

* Relational Database: spreadsheets (i.e., columns and rows) w/ relationships linking them to each other. * Key-Value Database: Pairs of keys and value. (Think of it like a dictionary where you have a word (key) and definition (value)). * Column-Based Database: Group data by columns (as opposed to by row). This is an advanced database that handles large quantities of data. Main purpose is to search for data. * Graph Database: The relationship between objects. Main idea: how to connect my data points? Document Database: Data stored as entire documents. Structure of data is less important.

Answer 10

spreadsheets (i.e., columns and rows) w/ relationships linking them to each other. NOTE: The relational database is called a SQL database (e.g., Microsoft SQL Server, MySQL, PostgreSQL)

Answer 11

Pairs of keys and value. (Think of it like a dictionary where you have a word (key) and definition (value)). A key-value database is called a No SQL database. Key-value databases include Redis, Amazon Dynamo DB)

Answer 12

Group data by columns (as opposed to by row). This is an advanced database that handles large quantities of data. Main purpose is to search for data. A column-based database is called a No SQL database. Column-based databases include Apache Cassandra, Amazon Redshift

Answer 13

The relationship between objects. Main idea: how to connect my data points? A graph database is called a No SQL database. Graph databases include Neo4j

Answer 14

Data stored as entire documents. Structure of data is less important. A document database is called a No SQL database. Document databases include MangoDB

Answer 15

* Servers (physical machine where database(s) live) can host multiple databases (i.e., containers of your data) > * Each database can have multiple schemas (i.e., a logical container in which you can group up related objects)> ○ Example: You have hundreds of tables, put all tables that have to do with order in one schema & all orders that have to do with customers in another schema). * Each schema can have multiple objects (e.g., tables). ○ A table organizes data into columns (e.g.., ID, name, score, birthdate, etc.). (Each column may be called a field.) ○ The data is stored in each row. (A row may be called a record.) ○ Primary Key: every table has this. Necessary to have one unique identifier (e.g., each customer has a unique ID like 1 or 2 or 3). ○ Cell: singular value § Data Types: □ Numeric: Integer (1, 2, 30), Decimal (3.14, 100.50) □ String/Text: Char or Varchar ® Char: fixed (e.g., if you define it as 5 characters, it will go and reserve 5 characters from the space) ® Varchar: use when you want things to be more dynamic ® Date & Time: ◊ Date: '2025-10-30' Time: '09:30:00'

Answer 16

used to define the way data is structured in a database. DDL consists of SQL commands that define and modify database structures. It sets up structures like tables and schemas, modifies their design, and removes them when they're no longer needed. When setting up a database, DDL commands define tables, indexes, and relationships. These commands modify structures and remove database objects. ○ Create: Defines new database objects like tables, indexes, or entire databases. § CREATE DATABASE my_database: Creates a new database § CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR (50), email VARCHAR (100): Defines a table with columns and data types. § CREATE INDEX idx_email ON users (email): Generates an index for faster lookups. ○ Alter: Modifies existing database structures without affecting stored data. § ALTER TABLE users ADD COLUMN phone VARCHAR (20): Adds a new column to the users table. § ALTER TABLE users MODIFY COLUMN name VARCHAR (100): Changes a column's data type. ○ Drop: Deletes database objects permanently. § DROP TABLE users: Removes the users table and its data. § DROP INDEX idx_email ON users: Deletes an index. ○ Truncate: Clears all records from a table without removing its structure. § TRUNCATE TABLE users: Empties the users table while keeping its schema intact.

Answer 17

used to manage data within a database. It's a component of SQL used to insert, retrieve, modify, and delete records within a database structure. a. INSERT data into a database/adds new rows to a table (e.g., you have an app that is generating a lot of data, you would use SQL to "insert" that data into the database). b. UPDATE already existing data in table/modifies existing data within a table c. DELETE already existing data from table/removes specific rows from a table. d. SELECT: Retrieves data from one or more tables (e.g., selects records from the customers table)

Answer 18

Responsible for reading/querying data from the database. In SQL, this corresponds to the SELECT

Answer 19

Responsible for the administrative tasks of controlling the database itself (e.g., granting and revoking database permissions for users).

Answer 20

DDL: Data Definition Language. Defines database objects like tables and indexes DML: Data Manipulation Language. Manages data within those structures DQL: Data Query Language. Queries data from the database. DCL: Data Control Language. Responsible for admin tasks related to the database (i.e., granting and revoking database permissions for users)

Answer 21

ORDER BY example: SELECT first_name, last_name FROM actor ORDER BY last_name, first_name; Output Christian Akroyd Debbie Akroyd Cuba Allen Kim Allen Meryl Allen Jim Astaire GROUP BY example: SELECT count(film_actor.actor_id) AS num_of_films_appeared_in, actor.actor_id, actor.first_name, actor.last_name FROM actor INNER JOIN film_actor ON actor.actor_id = film_actor.actor_id GROUP BY film_actor.actor_id; Output: 2 1 Penelope Guiness 25 2 Nick Wahlberg 22 3 Ed Chase GROUP BY & ORDER BY together: SELECT count(film_actor.actor_id) AS num_off_films_appeared_in, actor.actor_id, actor.first_name, actor.last_name FROM actor INNER JOIN film_actor ON actor.actor_rid = film_actor.actor_id GROUP BY film.actor.actor_id ORDER BY num_of_films_appeared_in DESC; Output: # films actor_id first_name last_name 42 107 Gina Degeneres 41 102 Walter Torn 40 198 Mary Keitel 39 181 Matthew Carrey

Answer 22

UNION: Merges all unique rows from two or more SELECT statements, eliminating duplicates. UNION ALL: Merges all rows from two or more SELECT statements, keeping duplicates. INTERSECT: Returns only the rows that appear in both SELECT statements. EXCEPT: Returns rows from the first SELECT statement that don't appear in the second.

Answer 23

A SQL Set Operator Merges all unique rows from two or more SELECT statements, eliminating duplicates. For example, suppose we have two tables, employees and contractors, each with similar columns such as contractors, department, and salary. For learning purposes, let’s consider these two dummy tables: name department salary Alice Marketing 65000 Bob Sales 70000 Carol Engineering 80000 John HR 55000 Employees name department salary David Marketing 60000 Eva Sales 68000 Carol Engineering 75000 We can combine the results from both tables using the following command: -- Using INTERSECT to find common employees SELECT name, department FROM employees INTERSECT SELECT name, department FROM contractors; This query selects the name, department, and salary columns from both the employees and contractors tables and combines them into a single result set. The UNION operator automatically removes duplicate rows from the final result set. name department salary Alice Marketing 65000 Bob Sales 70000 Carol Engineering 80000 John HR 55000 David Marketing 60000 Eva Sales 68000 Notice that Carol, who appears in both tables, is only listed once in the result. If we wanted to keep both instances of Carol (with her different salaries), we would use UNION ALL. NOTE: the UNION operator does not remove NULL values. If a column contains NULL values in one result set and non-NULL values in the corresponding column of another result set, the NULL values will be retained in the final result set produced by the UNION operator. If we want to include NULL values in the result set and prevent their removal by the UNION operator, we can use the UNION ALL operator instead. This operator combines the results of multiple SELECT queries, including all rows from each result set, regardless of whether they are duplicates or contain NULL values.

Answer 24

The INTERSECT operator returns only the rows that appear in both result sets. Think of it as finding the people who belong to both groups. -- Using INTERSECT to find common employees SELECT name, department FROM employees INTERSECT SELECT name, department FROM contractors; Output: name department Carol Engineering Note: The INTERSECT operator handles NULL values based on standard comparison rules, considering NULL values equal when comparing corresponding columns. It also results in an empty set when dealing with empty result sets. In other words, if a NULL value is present in one result set and the corresponding column in the other result set contains a non-NULL value, the rows are not considered equal – they will not be included in the intersection result. Additionally, If one of the result sets provided to the INTERSECT operator is empty (i.e., it contains no rows), the overall result of the INTERSECT operation will also be empty since there are no common rows between an empty set and any other set.

Answer 25

SQL Set Operator The EXCEPT operator retrieves the rows present in the first result set but not in the second. For example, let’s say we execute the following query: -- Using EXCEPT to find employees who are not contractors SELECT name, department, salary FROM employees EXCEPT SELECT name, department, salary FROM contractors; The name, department, and salary columns are selected from the employees table and return only the rows that do not exist in the contractors table. name department salary Alice Marketing 65000 Bob Sales 700 John HR 55000 Note: The EXCEPT operator also follows standard comparison rules for handling NULL values. Its behavior with empty result sets results in an empty set if the first result set is empty or includes all rows from the first result set if the second result set is empty.

Answer 26

Set Operators: Union, Intersect, Except Rule #1: ORDER BY can only be used once Rule #2: Same number of columns Rule #3: Matching data types Rule #4: Same order of columns Rule #5: First query controls aliases Rule #6: Map correct columns

Answer 27

To find and return values with even or odd values: check the remainder when you divide by 2. -If the remainder is 0, it's an even number. -If not, it's an odd number. --Find rows where a specified column has even values (MS SQL Server): SELECT * FROM table_name WHERE column_name % 2 = 0; Find rows where a specified column has odd values (MS SQL Server): SELECT * FROM table_name WHERE column_name % 2 <> 0;

Answer 28

( SELECT CITY, LENGTH(CITY) AS NAME_LENGTH FROM STATION ORDER BY LENGTH(CITY) ASC, CITY ASC LIMIT 1 ) UNION ( SELECT CITY, LENGTH(CITY) AS NAME_LENGTH FROM STATION ORDER BY LENGTH(CITY) DESC, CITY ASC LIMIT 1 ); LENGTH(CITY): gets the number of characters in the city name. The first subquery: -Orders by LENGTH(CITY) ascending (shortest first), then by CITY alphabetically. -LIMIT 1 gets the shortest city name. The second subquery: -Orders by LENGTH(CITY) descending (longest first), then by CITY alphabetically. -LIMIT 1 gets the longest city name. -UNION ensures both rows appear in the result (duplicates are not an issue here since lengths differ).

Answer 29

SELECT DISTINCT CITY FROM STATION WHERE LOWER(LEFT(CITY, 1)) IN ('a', 'e', 'i', 'o', 'u') AND LOWER(RIGHT(CITY, 1)) IN ('a', 'e', 'i', 'o', 'u'); Explanation: DISTINCT: removes duplicates. LOWER(...): ensures the comparison is case-insensitive. LEFT(CITY, 1): gets the first character. RIGHT(CITY, 1): gets the last character. The IN clause checks whether the first and last characters are vowels.

Answer 30

LEFT(column_name, character_number) Example: LEFT(CITY, 1): Gets the first character of the City name Example 2: LEFT(CITY, 1) IN ('a', 'e', 'i', 'o', 'u') Output: Pulls any names starting with a vowel. Example 3: LOWER(LEFT(CITY, 1)) IN ('a', 'e', 'i', 'o', 'u') Output: Pulls any names starting with a vowel after ensuring that the comparison is case sensitive (i.e., makes all characters lowercase so as to not inadvertently exclude pertinent values).

Answer 31

RIGHT(column_name, character_number) Example 1: RIGHT(CITY, 1) Output: Gets the last character of the city name Example 2: RIGHT(CITY, 1) IN ('a', 'e', 'i', 'o', 'u') Output: Pulls any names ending in vowel. Example 3: LOWER(RIGHT(CITY, 1) IN ('a', 'e', 'i', 'o', 'u') ) Output: Pulls any names starting with a vowel after ensuring that the comparison is case sensitive (i.e., makes all characters lowercase so as to not inadvertently exclude pertinent values).

Answer 32

lowercases values Example: LOWER(RIGHT(CITY, 1) IN ('a', 'e', 'i', 'o', 'u') ) Output: Pulls any names starting with a vowel after ensuring that the comparison is case sensitive (i.e., makes all characters lowercase so as to not inadvertently exclude pertinent values).

Answer 33

AND can be too strict. In the below example query, using AND (i.e., AND NOT LOWER) only produces results in which the city name does not start with a vowel or end with a vowel. Whereas, using OR (i.e., OR NOT LOWER) gives an output of city names that do not start with a vowel AND city names that do not end with a vowel. Example: Query the list of CITY names from STATION that either do not start with vowels or do not end with vowels. Your result cannot contain duplicates. Solution: SELECT DISTINCT CITY FROM STATION WHERE NOT LOWER(LEFT(CITY, 1)) IN ('a', 'e', 'i', 'o', 'u') OR NOT LOWER(RIGHT(CITY, 1)) IN ('a', 'e', 'i', 'o', 'u') ORDER BY CITY; SQL Logic Checklist: AND vs OR 1. Rephrase the condition in plain English. "I want cities that do not start with a vowel OR do not end with a vowel." → Clearly calls for OR, not AND. 2. Use a simple Venn diagram in your mind. AND = overlap (both conditions must be true). ✅ Great for narrowing results OR = union (either condition can be true). ✅ Great for widening results

Answer 34

SELECT Name FROM STUDENTS WHERE Marks > 75 ORDER BY LOWER(RIGHT(Name, 3)), ID;

Answer 35

SELECT ROUND(SUM(LAT_N), 2) AS sum_lat_n, ROUND(SUM(LONG_W), 2) AS sum_long_w FROM STATION

Answer 36

The ROUND() function rounds a number to a specified number of decimal places. Syntax: ROUND(number, decimals, operation/function) Parameter Description -number Required. The number to be rounded. -decimals Required. The number of decimal places to round number to. -operation/function Optional. It decides the rounding direction for values exactly in the middle of two numbers. By default, its value is 0. Any other value causes the function to truncate or cut off the input without rounding. Example 1: Round the number to 2 decimal places, and also use the operation parameter: SELECT ROUND(235.415, 2, 1) AS RoundValue; Output: 235.410 Example 2: Round the number to -1 decimal place: SELECT ROUND(235.415, -1) AS RoundValue; Output: 240.000 Example 3: SELECT ROUND(SUM(LAT_N), 2) AS sum_lat_n, ROUND(SUM(LONG_W), 2) AS sum_long_w FROM STATION Output: 42850.04 47381.48

Answer 37

The TRUNCATE() function truncates a number to the specified number of decimal places. Syntax: TRUNCATE(number, decimals) Example: SELECT TRUNCATE(345.156, 0); Output: 345

Answer 38

Option 1 (using aggregate; more concise/cleaner): SELECT TRUNCATE(MAX(LAT_N), 4) AS rounded_lat_n FROM STATION WHERE LAT_N < 137.2345; Option 2: SELECT TRUNCATE(LAT_N, 4) AS rounded_lat_n FROM STATION WHERE LAT_N < 137.2345 ORDER BY LAT_N DESC LIMIT 1;

Answer 39

SELECT ROUND(LAT_N, 4) FROM STATION WHERE LAT_N > 38.7780 ORDER BY LAT_N ASC LIMIT 1;

Answer 40

The ABS() function returns the absolute (positive) value of a number. Why is this needed? Any distance metric (e.g., the Manhattan Distance), must always be non‑negative. Why can't a distance metric be negative? -Distance measures magnitude only (i.e., how much), not direction (i.e., which way). -A negative number would imply "distance traveled backward," which has no meaning in this context. -Without ABS, you’re just summing signed differences, which can cancel out and give zero or even negative "distance," which is mathematically wrong. Syntax: ABS(number) Parameter Description number Required. A numeric value Example: SELECT ABS(-243.5); Output: 243.5

Answer 41

Shortest distance between 2 points. (Based on the grid-like system of Manhattan.) d = (b1- c1) + (b2 - c2) SQL Query to Solve the Provided Prompt: /* x1 or a: MIN(LAT_N) x2 or c: (MAX(LAT_N) y1 or b: MIN(LONG_W) y2 or d: MAX(LONG_W) P1 (x1, y1) P1 (a, b) (MIN(LAT_N), MIN(LONG_W)) AS p1 P2(X2, Y2) P2 (c,d) ((MAX(LAT_N), MAX(LONG_W))) AS p2 Manhattan Distance: (x1 - x2) + (y1 - y2) (MIN(LAT_N) - (MAX(LAT_N)) + (MIN(LONG_W) - MAX(LONG_W)) */ SELECT ROUND((MIN(LAT_N) - MAX(LAT_N)) + (MIN(LONG_W) - MAX(LONG_W)), 4) AS manhattan_distance_p1_p2 FROM STATION;

Answer 42

Euclidean Distance: The distance between two points/the straight line distance. Point A is (x1, y1) on the graph & Point B is (x2, y2) on the graph. Euclidean Distance Formula: d = √(x2 - x1)² + (y2 - y1)² Why we square before adding: Distances in different dimensions may have opposite signs: Example: moving right (positive x) but down (negative y). -Without squaring, differences could cancel out: (5−0)+(0−5)=0, implying zero distance, which is wrong. -Squaring: √(5)² + (-5)² = √25+25= 7.07, which correctly shows the actual straight-line distance. Reminder: -5²=25 (Squaring a negative number always results in a positive number.) Conclusion regarding the Euclidean Distance Formula: -Square → makes all contributions positive and weighted equally. Square root → converts the result back to the original distance scale (not "squared units"). Query for Provided Prompt: /* P1 (a, c) (x1, y1) P2 (b, d) (x2, y2) a or x1: MIN(LAT_N) c or y1: MIN(LONG_W) b or x2: MAX(LAT_N) d or y2: MAX(LONG_W) Euclidean Distance: The distance between two points/the straight line distance. Point A is (x1, y1) on the graph & Point B is (x2, y2) on the graph. Euclidean Distance Formula: d = √(x2 - x1)² + (y2 - y1)² SQRT(number) POWER(base number, exponent) */ SELECT ROUND( SQRT( POWER( MAX(LAT_N) - MIN(LAT_N) , 2) + POWER( MAX(LONG_W) - MIN(LONG_W) , 2) ) , 4) AS euclidian_distance FROM STATION;

Answer 43

SQL, MySQL The SQRT() function returns the square root of a number. Syntax: SQRT(number) Parameter Description number required. A number to calculate the square root of. Must be greater than 0. Example: SELECT SQRT(13)

Answer 44

SQL, MySQL The POWER() function returns the value of a number raised to the power of another number. Note: This function is equal to the POW() function. Both work in MySQL. Syntax: POWER (x, y) Parameter Description x Required. A number (the base). y Required. A number (the exponent). Example: Return 8 raised to the third power. SELECT POWER( 8, 3) AKA 8^3

Answer 45

Perform calculations (e.g., aggregation) on a specific subset of data, without losing the level of details of rows. That is, they keep every row without collapsing the dataset. 5 common windows functions: -Partitioned Aggregate -Running Total/Cumulative Sum -ROW_NUMBER -RANK() vs DENSE_RANK -Row-to Row Comparisons: LAG()/LEAD() Example 1: Dataset: emp_id department salary 1 Sales 5000 2 Sales 6000 3 Sales 7000 4 IT 8000 5 IT 9000 Using a Normal Aggregate (GROUP BY) SELECT department, AVG(salary) AS dept_avg_salary FROM employees GROUP BY department; Result: department dept_avg_salary Sales 6000 IT 8500 ❌ Problem: -You lost individual employee rows. -You only have one row per department. vs. Using a Window Function: SELECT emp_id, department, salary, AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary FROM employees; Result: emp_id department salary dept_avg_salary 1 Sales 5000 6000 2 Sales 6000 6000 3 Sales 7000 6000 4 IT 8000 8500 5 IT 9000 8500 ✅ Benefit: -Each row keeps its details (emp_id, salary) -We add an aggregate column without collapsing rows. Example 2: Same Value for All Rows Example If you omit PARTITION BY: SELECT emp_id, salary, SUM(salary) OVER () AS total_salary FROM employees; Results: emp_id salary total_salary 1 5000 35000 2 6000 35000 3 7000 35000 4 8000 35000 5 9000 35000 💡 This is still useful because now you can compute, for example, each employee's share of the total salary. salary / SUM(salary) OVER () AS pct_of_total Example 3: Running total Starting Data emp_id department salary 1 Sales 5000 2 Sales 6000 3 Sales 7000 4 Sales 8000 2️⃣ Running Total Query SELECT emp_id, salary, SUM(salary) OVER (ORDER BY emp_id) AS running_total FROM employees WHERE department = 'Sales'; 3️⃣ How the Window Works The window grows as we move down the rows (because of ORDER BY emp_id), and the sum is recalculated for each row. Step (Row) Window of Rows Considered SUM(salary) Row 1 (5000) 5000 Row 2 (5000, 6000) 11000 Row 3 (5000, 6000, 7000) 18000 Row 4 (5000, 6000, 7000, 8000) 26000 4️⃣ Final Output emp_id salary running_total 1 5000 5000 2 6000 11000 3 7000 18000 4 8000 26000

Answer 46

Windows function. Adds a group-level metric to every row without collapsing rows. Example: Per Group Average: SELECT emp_id, department, salary, AVG(salary) OVER (PARTITION BY department) AS dept_avg FROM employees; Results (input → output): emp_id dept salary dept_avg 1 Sales 5000 6000 2 Sales 7000 6000 3 IT 8000 8500 4 IT 9000 8500

Answer 47

Windows function. Adds a cumulative sum in a defined order. Example: SELECT order_date, revenue, SUM(revenue) OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_rev FROM sales_by_day; Results: date revenue running_rev 2025‑01‑01 100 100 2025‑01‑02 150 250 2025‑01‑03 200 450

Answer 48

Windows function. Unique sequential number per partition (no ties). Example: SELECT emp_id, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn FROM employees; Results: emp_id dept salary rn 2 Sales 7000 1 1 Sales 5000 2 4 IT 9000 1 3 IT 8000 2 Use for picking “top 1 per group” (filter rn = 1).

Answer 49

Windows function. Rank within a partition; tie handling differs. Example: SELECT emp_id, department, salary, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS drnk FROM employees; Results (ties at 7000): salary RANK DENSE_RANK 9000 1 1 7000 2 2 7000 2 2 5000 4 3 RANK skips numbers after ties; DENSE_RANK does not.

Answer 50

Windows function. Row-to-Row Comparisons. Peek previous/next row’s value (within a partition & order). Example: SELECT order_date, revenue, LAG(revenue, 1) OVER (ORDER BY order_date) AS prev_rev, LEAD(revenue, 1) OVER (ORDER BY order_date) AS next_rev, revenue - LAG(revenue, 1) OVER (ORDER BY order_date) AS delta_from_prev FROM sales_by_day; Results: o_date revenue prev_rev next_rev delta_from_prev 01‑01 100 NULL 150 NULL 01‑02 150 100 200 50 01‑03 200 150 NULL 50

Answer 51

Percent of total: revenue * 1.0 / SUM(revenue) OVER () AS pct_of_total Top‑N per group: use ROW_NUMBER() and filter (WHERE rn <= N). Reset windows by group: add PARTITION BY department (or your group key). Order matters: ORDER BY inside OVER() changes row context; add it for running totals, ranks, LAG/LEAD. Frame clauses: Be explicit for cumulative metrics: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Answer 52

COUNT(expression) SUM(expression) AVG(expression) MIN(expression) MAX(expression)

Answer 53

Window Functions perform calculations within a window. Aggregate Window Functions: COUNT(expression) SUM(expression) AVG(expression) MIN(expression) MAX(expression) Rank Window Functions: *ROW_NUMBER() *RANK() *DENSE_RANK() *CUME_DIST() *PERCENT_RANK() NTILE(number) Value (Analytics) Window Functions: LEAD(expression, offset, default) LAG(expression, offset, default) FIRST_VALUE(expression) NOTES: -All functions with an asterisk * are empty. No number or expression should be entered within its parentheses. -Expression: an argument that you pass to a function. Example of function & expression: AVG(Sales) OVER(PARTITION BY Category OVER OrderDate ROWS UNBOUNDED PRECEEDING) -AVG: Window Function -Sales: Expression NOTES cont'd: A function expression can be: -empty, e.g., RANK() OVER (ORDER BY OrderDate) -use a column header, e.g., AVG(Sales) OVER (ORDER BY OrderDate) -use a number, e.g., NTILE(2) OVER (ORDER BY OrderDate) -have multiple arguments, e.g., LEAD(Sales, 2, 10) OVER (ORDER BY OrderDate) -use conditional logic, e.g., SUM(CASE WHEN Sales>100 THEN 1 ELSE 0 END) OVER (ORDER BY OrderDate) NOTES cont'd: -We use "OVER()" to 1) tell SQL that we're using a window function, otherwise SQL will think we're using a regular GROUP BY function, e.g., AVG(Sales) OVER( & 2) define a window/subset of data. -Everything inside the OVER() function is optional, i.e., PARTITION BY, ORDER BY, the window frame (i.e., ROWS UNBOUNDED PRECEEDING)

Answer 54

All columns in SELECT must be included in GROUP BY So, for example, the below query would result in an error message. Prompt: Find the total sales for each product & provide other details like order ID and order date. SELECT OrderID, OrderDate, ProductID, SUM(Sales) TotalSales FROM Sales.Orders GROUP BY ProductID Solution: Use a Window function instead: SELECT OrderID, OrderDate, ProductID, SUM(Sales) OVER(PARTITION BY ProductID) TotalSalesByProduct FROM Sales.Orders Output: (No header) OrderID OrderDate ProductID TotalSalesByProduct 1 1 25-01-01 101 140 2 3 25-25-01-10 101 140 6 2 25-01-05 102 105

Answer 55

Operates like GROUP BY. It's just the wording that's used for Window functions. Example: Prompt: Find the total sales for each product & provide other details like order ID and order date. SELECT OrderID, OrderDate, ProductID, SUM(Sales) OVER(PARTITION BY ProductID) TotalSalesByProduct FROM Sales.Orders Output: (No header) OrderID OrderDate ProductID TotalSalesByProduct 1 1 25-01-01 101 140 2 3 25-25-01-10 101 140 6 2 25-01-05 102 105 So, PARTITION BY divides the rows into groups, based on the columns. Example: Data set: Month Product Sales Jan Bottle 20 Jan Caps 10 Jan Bottle 30 Feb Gloves 5 Feb Caps 70 Feb Gloves 40 SUM(Sales) OVER(): Calculation is done on the entire dataset. Month Product Sales Sum Jan Bottle 20 175 Jan Caps 10 175 Jan Bottle 30 175 Feb Gloves 5 175 Feb Caps 70 175 Feb Gloves 40 175 vs. SUM(Sales) OVER(PARTITION BY Month) : Calculation is done individually on each window. Month Product Sales Sum Window 1: Jan Bottle 20 175 Jan Caps 10 175 Jan Bottle 30 175 Window 2: Feb Gloves 5 175 Feb Caps 70 175 Feb Gloves 40 175 NOTE: You can partition the data in 3 main ways. 1) Without a PARTITION BY Total sales across all rows (i.e., the entire result set) SUM(Sales) OVER() 2) PARTITION BY a single column Total sales for each product SUM(Sales) OVER (PARTITION BY Product) 3) PARTITION BY combined rows Total sales for each combination of product and order status SUM(Sales) OVER (PARTITION BY Product, OrderStatus) Example of PARTITION BY a single column vs combined rows: PARTITION BY a single column: PARTITION BY Product SELECT OrderID, Product, OrderStatus, Sales, SUM(Sales) OVER (PARTITION BY Product) AS TotalSalesPerProduct FROM Orders; Output: OrderID Product OrderStatus Sales TotalSalesPerProduct 1 Apple Shipped 100 300 2 Apple Pending 50 300 3 Apple Shipped 150 300 4 Banana Shipped 200 400 5 Banana Pending 75 400 6 Banana Pending 125 400 Explanation: Here, we’re grouping rows only by Product: -All "Apple" rows are treated as one partition → total = 100 + 50 + 150 = 300 -All "Banana" rows are treated as one partition → total = 200 + 75 + 125 = 400 PARTITION BY combined rows: PARTITION BY Product, OrderStatus SELECT OrderID, Product, OrderStatus, Sales, SUM(Sales) OVER (PARTITION BY Product, OrderStatus) AS TotalSalesPerProductAndStatus FROM Orders; Output: OrderID Product OrderStatus Sales TotalSalesPerProductAndStatus 1 Apple Shipped 100 250 2 Apple Pending 50 50 3 Apple Shipped 150 250 4 Banana Shipped 200 200 5 Banana Pending 75 200 6 Banana Pending 125 200 Explanation: Now, we’re grouping rows by the combination of Product AND OrderStatus: -Apple + Shipped → 100 + 150 = 250 -Apple + Pending → 50 -Banana + Shipped → 200 -Banana + Pending → 75 + 125 = 200 NOTE: I believe the above example shows a combination of a fact (sales) AND dimensions (order status, product)

Answer 56

GROUP BY: Simple data analysis (aggregations) Window: Advanced data analysis (aggregations + details)

Answer 57

Window Function + Partition Clause AND/OR Order Clause AND/OR Frame clause -Window Function: Calculation used on the window -Partition Clause: Divides the dataset into windows (partitions) -Order Clause: Sorts the data in a window -Frame Clause: Define a subset of rows in a window NOTES: -You must use ORDER BY function for the rank and value window functions, i.e., ---rank window functions: ROW_NUMBER, RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE, ---value window functions: LEAD, LAG, FIRST_VALUE. -You don't need the ORDER BY function for the aggregate window functions (i.e., COUNT, SUM, MIN, MAX). Example: AVG(Sales) OVER (PARTITION BY Category ORER BY OrderDate ROWS UNBOUNDED PRECEDING)

Answer 58

Divides the dataset into windows (partitions) -UNBOUNDED FOLLOWING: the last row within a window. -UNBOUNDED PRECEDING: the first row within a window -N-PRECEDING: The n-th row before the current row. Example frames: ROWS BETWEEN CURRENT ROW n FOLLOWING -n should be a number. -The frame shifts down after each result. ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING -"UNBOUNDED FOLLOWING" is the last row. ROWS BETWEEN n PRECEEDING AND CURRENT ROW -n should be a number. ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW -"UNBOUNDED PRECEEDING" is the first row in the data set. ROWS BETWEEN n PRECEEDING AND n FOLLOWING -n should be a number. ROWS BETWEEN UNBOUNDED PRECEEDING AND UNBOUND FOLLOWING -Includes all data points.

Answer 59

Window Function, one of the rank functions. NOTES: - RANK() expression is always blank, i.e., don't enter anything in the parentheses. -You must use ORDER BY function for the rank and value window functions, i.e., ---rank window functions: ROW_NUMBER, RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE, ---value window functions: LEAD, LAG, FIRST_VALUE. Example 1: RANK() OVER (PARTITION BY Month ORDER BY Sales DESC) Original Dataset: Month Product Sales Jan Bottle 20 Jan Caps 10 Jan Bottle 30 Feb Gloves 5 Feb Caps 70 Feb Gloves 40 Output: Month Product Sales Rank Jan Bottle 30 1 Jan Caps 20 2 Jan Bottle 10 3 ------ Feb Gloves 70 1 Feb Caps 40 2 Feb Gloves 5 3 Example 2: Prompt: Rank each order based on their sales from highest to lowest & provide details such as order ID and order date. SELECT OrderID, OrderDate, Sales RANK() OVER (ORDER BY Sales DESC) RankSales FROM Sales.Orders Output: (Blank) OrderID OrderDate Sales RankSales 1 8 25-02-18 90 1 2 4 25-1-20 60 2 3 10 25-03-15 60 2 5 6 25-02-05 50 4

Answer 60

Defines a subset of rows within each window that is relevant for the chosen calculation. frame, visual with written explanation: { [ () ] } Entire Data>PARTITION BY: Window 1 & Window 2>Frame: frame within window 1 & frame within window 2 {}: Entire Dataset []: Partition BY (): Frame Syntax: AVG(Sales) OVER (PARTITION BY Category ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED PRECEDING) Frame Types: ROWS or RANGE (above syntax example uses ROWS) Frame Boundary (lower value): CURRENT ROW, N PRECEEDING, or UNBOUNDED PRECEDING (The above syntax example uses CURRENT ROW) Frame Boundary (higher value): CURRENT ROW, N FOLLOWING, or UNBOUNDED FOLLOWING (The above syntax example uses UNBOUNDED PRECEDING) NOTES: -You must use ORDER BY if you want to use the frame clause (i.e., ROWS or RANGE) -The lower values must be BEFORE the higher boundary.

Answer 61

Example: SUM(Sales) OVER(ORDER BY Month ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS Result In a window frame like ROWS BETWEEN CURRENT ROW AND …, “CURRENT ROW” means “the row currently being evaluated in the window,” not “the first row of the data set.” A few key points: -It’s relative and re-evaluated per row. For each output row, the “current row” is that row itself. So the frame starts at different places as you move down the result. -Partitions matter. “Current row” is defined within the window’s PARTITION BY (if any). It’s not the first row of the whole table—only the first of its partition when the evaluation is on that first row. -Order matters. The “current row” is with respect to the window’s ORDER BY. Only when you’re evaluating the first ordered row of a partition does “current row” coincide with the first row. -ROWS vs RANGE. With ROWS, CURRENT ROW refers to that single physical row. With RANGE, CURRENT ROW typically expands to include all “peer” rows that tie on the ORDER BY value.

Answer 62

Example: SUM(Sales) OVER(ORDER BY Month ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS Result Original Dataset: Month Sales Jan 20 Feb 10 Mar 30 Apr 5 Jun 70 Step 1: Month Sales Result Current Row: Jan 20 60 Feb 10 2. Following: Mar 30 Apr 5 Jun 70 Step 2: Frame shifts down a row. Month Sales Result Jan 20 60 Current Row: Feb 10 45 Mar 30 2. Following: Apr 5 Jun 70 *Step 3: Frame shifts down another row. Month Sales Result Jan 20 60 Feb 10 45 Current Row: Mar 30 105 Apr 5 2. Following: Jun 70 *Step 4: Frame shifts down another row Month Sales Result Jan 20 60 Feb 10 45 Mar 30 105 Current Row: Apr 5 75 Jun 70 2. Following: *Step 5: Frame shifts down another row Month Sales Result Jan 20 60 Feb 10 45 Mar 30 105 Apr 5 75 Current Row: Jun 70 70 2. Following:

Answer 63

Example: SUM(Sales) OVER (ORDER BY Month ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) Month Sales Result Jan 20 Current Row: Feb 10 115 Mar 30 Apr 5 UNBOUNDED FOLLOWING: Jun 70

Answer 64

Example: SUM(Sales) OVER(ORDER BY Month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -UNBOUNDED PRECEDING: the first row within a window Step 1: Month Sales Result UNBOUND PROCEDING Jan 20 Feb 10 CURRENT ROW: Mar 30 60 Apr 5 Jun 70 Step 2: Frame shifts down. Note that the frame will always be fixed at the first row, because the frame specifies UNBOUNDED PRECEDING. Month Sales Result UNBOUND PRECEDING Jan 20 Feb 10 Mar 30 60 CURRENT ROW: Apr 5 65 Jun 70

Answer 65

Example: SUM(Sales) OVER (ORDER BY Month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) Month Sales Result Jan 20 PRECEDING Feb 10 CURRENT ROW: Mar 30 45 1 FOLLOWING: Apr 5 Jun 70

Answer 66

Example: SUM(Sales) OVER (ORDER BY Month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUND FOLLOWING) UNBOUNDED FOLLOWING: the last row within a window. UNBOUNDED PRECEDING: the first row within a window Month Sales Result UNBOUNDED PRECEDING Jan 20 135 Feb 10 135 CURRENT ROW Mar 30 135 Apr 5 135 UNBOUNDED FOLLOWING Jun 70 135

Answer 67

Output: (Blank) OrderID OrderDate OrderStatus Sales TotalSales 1 1 25-01-01 Delivered 10 55 2 3 21-01-10 Delivered 20 95 3 5 25-02-01 Delivered 25 105 4 6 25-02-05 Delivered 50 80 5 7 25-02-15 Delivered 30 30 6 2 25-01-05 Shipped 15 165 Notes: -Row 4 is just 50+ 30 & Row 5 is just 30, because row 5 is the last record of the current partition.

Answer 68

Longhand: ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING Shorthand: ROWS 2 FOLLOWING NOTES: -Compact frames only work with preceding or unbounded preceding. (It does not work with unbounded following or following.) -Compact frame does not seem to be a standard term in SQL. (Only saw it referenced by Baraa and nowhere else.) Longhand example: SELECT OrderID, OrderDate OrderStatus, Sales, SUM(Sales) OVER (PARTITION BY OrderStatus ORDER BY OrderDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) TotalSales FROM Sales.Orders Shorthand example (with same results as above longhand example): SELECT OrderID, OrderDate, OrderStatus, Sales, SUM(Sales) OVER (PARTITION BY OrderStatus ORDER BY OrderDate ROWS 2 PRECEDING) TotalSales FROM Sales.Orders

Answer 69

SQL uses a default frame if ORDER BY is used without a frame. Note: Even though no frame is specified in the below example, you can tell there is a hidden frame. Example: SELECT OrderID, OrderDate, OrderStatus, Sales SUM(Sales) OVER (PARTITION BY OrderStatus ORDER BY OrderDate) TotalSales FROM Sales.Orders; SQL Background Process: 1. The data is partitioned by OrderStatus (i.e., by Delivered and Shipped). 2. SQL orders the rows in each respective partition by OrderDate ASC 3. Default Frame: UNBOUNDED PRECEDING AND CURRENT ROW. UNBOUNDED PRECEDING: First row in each partition. Output: (Blank) OrderID OrderDate OrderStatus Sales TotalSales Start of Frame1: 1 1 25-01-01 Delivered 10 10 2 3 21-01-10 Delivered 20 30 3 5 25-02-01 Delivered 25 55 4 6 25-02-05 Delivered 50 105 End of Frame1: 5 7 25-02-15 Delivered 30 135 Start of Frame2 6 2 25-01-05 Shipped 15 15 7 4 25-01-20 Shipped 60 75

Answer 70

1) Window functions can only be used in the SELECT and ORDER BY clauses. Example, SELECT clause: SELECT OrderID, OrderDate, OrderStatus, Sales, SUM(Sales) OVER (PARITION BY OrderStatus) TotalSales FROM Sales.Orders Example, ORDER BY clause: SELECT OrderID, OrderDate, OrderStatus, Sales, SUM(Sales) OVER(PARTITION BY OrderStatus) TotalSales FROM Sales.Orders ORDER BY SUM(Sales) OVER (PARTITION BY OrderStatus) Note: Window functions can't be used to filter data. The following results in an error message: WHERE SUM(Sales) OVER (PARTITION BY OrderStatus) > 100 2) Nesting window functions is not allowed. The below example would result in an error message. SELECT OrderID, OrderDate, OrderStatus, Sales, SUM( SUM(Sales) OVER (PARITION BY OrderStatus) )TotalSales FROM Sales.Orders 3) SQL executes window functions AFTER the WHERE clause. So, SQL will execute WHERE ProductID IN (101, 102) BEFORE it executes SUM(Sales) OVER (PARITION BY OrderStatus) TotalSales SELECT OrderID, OrderDate, OrderStatus, Sales, SUM(Sales) OVER (PARTITION BY OrderStatus) TotalSales FROM Sales.Orders WHERE ProductID IN (101, 202) Step 1: SQL will pull all requested columns where product ID is 101 or 102 Step 2: SQL will partition the results by OrderStatus (i.e., Delivered vs Shipped) Step 3: SQL will sum the sales for each partition. Output: (Blank) OrderID OrderDate OrderStatus Sales TotalSales 1 3 25-01-10 Delivered 20 60 2 7 21-02-15 Delivered 30 60 3 1 25-01-01 Delivered 10 60 4 2 25-01-05 Shipped 15 185 5 8 25-02-18 Shipped 90 185 6 9 25-03-10 Shipped 20 185 7 10 25-03-15 Shipped 60 185 4) A Window function can be used together with the GROUP BY in the same query only if the same columns are used. Example: -- Rank customers based on their total sales SELECT CustomerID, SUM(Sales) TotalSales, RANK() OVER(ORDER BY SUM(Sales) DESC) RankCustomers FROM Sales.Orders GROUP BY CustomerID Output: (Blank) ColumnID TotalSales RankCustomers 1 3 125 1 2 1 110 2 3 4 90 3 4 2 55 4

Answer 71

Use GROUP BY for simple aggregations. (Use window functions for complex aggregations.) Adding more columns to the GROUP BY function splits the data into more granular groups, changing the number of rows you get back. You can think of GROUP BY as defining "buckets" into which SQL groups rows BEFORE applying aggregate functions like SUM or COUNT or AVG. More columns = more buckets. Example Dataset: OrderID Customer Product OrderStatus Sales 1 Alice Laptop Shipped 1000 2 Alice Mouse Shipped 20 3 Bob Laptop Pending 1000 4 Bob Mouse Shipped 25 5 Alice Laptop Shipped 900 Query A: Group by one value SELECT Customer, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer; Output Customer TotalSales Alice 1920 Bob 1025 Here, all rows for the same customer are grouped together, ignoring product or order status. Query B: Group by two values SELECT Customer, Product, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer, Product; Output Customer Product TotalSales Alice Laptop 1900 Alice Mouse 20 Bob Laptop 1000 Bob Mouse 25 Now we split the buckets further, grouping by both customer and product. Instead of 2 rows, we get 4 rows because each combination is a separate group. Query C: Group by even more values SELECT Customer, Product, OrderStatus, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer, Product, OrderStatus; Output Customer Product OrderStatus TotalSales Alice Laptop Pending 900 Alice Laptop Shipped 1000 Alice Mouse Shipped 20 Bob Laptop Pending 1000 Bob Mouse Shipped 25 Here, grouping becomes even more granular because we're differentiating based on status too. Key rules to remember: -Only non-aggregated columns need to appear in GROUP BY. -Columns that are wrapped in an aggregate function do not need to be listed there. Key Takeaways -GROUP BY does not filter data, it aggregates rows into groups. -Every column in SELECT that’s not inside an aggregate function must appear in the GROUP BY, otherwise SQL doesn’t know how to collapse multiple rows into one. -Adding more columns in GROUP BY splits the groups into smaller subsets, producing more rows. -If you want to filter groups, you do that with: ---WHERE (filters before grouping) ----HAVING (filters after grouping)

Answer 72

WHERE and HAVING can filter results, but they’re applied at different stages of query processing. -WHERE filters rows before grouping or aggregation happens. -HAVING filters groups after aggregation has been calculated. Tip: Think of WHERE as row-level filtering, while HAVING is group-level filtering. How SQL processes a query (order of execution): 1) FROM -- Identify source tables and joins 2) WHERE -- Filter individual rows (raw data) 3) GROUP BY -- Put the rows in buckets 4) Aggregate functions -- Calculate aggregations (SUM, COUNT, AVG, MIN, & MAX) 5) HAVING -- Filter groups based on aggregated results 6) SELECT -- Return chosen columns or expressions (can now include aggregates) 7) DISTINCT -- Remove duplicate rows from the final result set. 8) ORDER BY -- Sort the results 9) TOP/LIMIT -- Return only a subset of rows 7) ORDER BY – sort the output Example 1 (WHERE) & 2 (HAVING) Dataset: OrderID Customer Product OrderStatus Sales 1 Alice Laptop Shipped 1000 2 Alice Mouse Shipped 20 3 Bob Laptop Pending 1000 4 Bob Mouse Shipped 25 5 Alice Laptop Pending 900 Example 1: WHERE SELECT Customer, Product, SUM(Sales) AS TotalSales FROM Sales.Orders WHERE OrderStatus = 'Shipped' -- row-level filter GROUP BY Customer, Product ORDER BY Customer, Product; Output (only shipped rows are summed): Customer Product TotalSales Alice Laptop 1000 Alice Mouse 20 Bob Mouse 25 Takeaways: -WHERE filters rows before grouping -Only rows where OrderStatus='Shipped' are included in the aggregation. -Alice’s Pending laptop sale (900) is excluded, so her Laptop total is 1000. vs. Example 2: HAVING SELECT Customer, Product, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer, Product HAVING SUM(CASE WHEN OrderStatus = 'Shipped' THEN 1 ELSE 0 END) > 0 -- group-level filter ORDER BY Customer, Product; Output (groups with any shipped row are kept; totals include all rows in the group): Customer Product TotalSales Alice Laptop 1900 Alice Mouse 20 Bob Mouse 25 Takeaways: -HAVING keeps groups that have at least one shipped row -All rows are aggregated; we then keep groups where there is any shipped row. -The totals, however, include all statuses because WHERE did not filter them out. -Because the group (Alice, Laptop) has at least one Shipped row, it’s kept—but the SUM(Sales) adds both the shipped 1000 and the pending 900, yielding 1900. -Bob’s Laptop group has no shipped rows, so that group is removed entirely. Why the WHERE and HAVING queries differ: -WHERE removes non-shipped rows before grouping → shipped-only totals. -HAVING can test for “shipped-ness” after grouping (using a conditional aggregate) → keeps groups that have shipped rows, but totals may still include non-shipped rows unless you also make the SUM conditional. Using WHERE and HAVING together in a query SELECT Customer, Product, SUM(Sales) AS TotalSales FROM Sales.Orders WHERE OrderStatus = 'Shipped' -- filter individual rows GROUP BY Customer, Product HAVING SUM(Sales) > 500; -- filter aggregated groups Output: Customer Product TotalSales Alice Laptop 1000 Alice Mouse 20 Bob Mouse 25 Here: -WHERE ensures only shipped orders are grouped. -HAVING removes any group below a total of 500 (so “Mouse” rows would actually drop if we set HAVING > 500). ✅ Summary Clause Filters data at… Can use aggregates? WHERE Row level (before GROUP) ❌ No HAVING Group level (after GROUP) ✅ Yes

Answer 73

It doesn’t need to be listed in GROUP BY—and in fact, you should not put it there when it’s inside an aggregate function like SUM(). Here’s why: 1️⃣ Two categories of columns in a SELECT with GROUP BY: When you write a SELECT with GROUP BY, every column falls into one of two buckets: -Bucket A: Grouping columns – these appear in GROUP BY because you want one row per unique combination of these values. -Bucket B: Aggregate expressions – these calculate something across each group (e.g., SUM(Sales), COUNT(*), MAX(Date)). SQL says: Every selected column must be either be: -in the GROUP BY (defining the group), or -wrapped in an aggregate function (defining a calculation over that group). Sales is not listed in GROUP BY in the below example because we’re not grouping by individual sales rows, we’re summing them up. If you added Sales to the group, you’d undo the aggregation because you'd group by individual sales values instead of combining them. 2️⃣ Example: What happens if you (incorrectly) add Sales? SELECT Customer, Sales, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer, Sales; Output: Customer Sales TotalSales Alice 1000 1000 Alice 900 900 Alice 20 20 Bob 1000 1000 Bob 25 25 Here, each distinct Sales value becomes its own group. The SUM doesn’t actually combine Alice’s orders into 1920 because you forced SQL to keep them separate. This defeats the purpose of aggregation. ✅ Correct approach: SELECT Customer, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer; Now Sales is aggregated, not grouped by individual amounts. Output: Customer TotalSales Alice 1920 Bob 1025 Key rules to remember: -Only non-aggregated columns need to appear in GROUP BY. Columns that are wrapped in an aggregate function do not (and SHOULD NOT) be listed there.

Answer 74

A simplified logical order of execution: 1) FROM -- Identify source tables and joins 2) WHERE -- Filter individual rows (raw data) 3) GROUP BY -- Put the rows in buckets 4) Aggregate functions -- Calculate aggregations (SUM, COUNT, AVG, MIN, & MAX) 5) HAVING -- Filter groups based on aggregated results 6) SELECT -- Return chosen columns or expressions (can now include aggregates) 7) DISTINCT -- Remove duplicate rows from the final result set. 8) ORDER BY -- Sort the results 9) TOP/LIMIT -- Return only a subset of rows

Answer 75

-Use WHERE when you need to filter individual rows BEFORE aggregation. -Use HAVING when you need to filter AFTER aggregation, especially based on the results of aggregate functions (e.g., SUM, COUNT, AVG, MIN, MAX). Prompt 1: "Show me total sales per customer, but only for orders that were shipped." Answer: Use WHERE Prompt 2: "Show me customers who have more than $2000 in total sales across all their orders." Answer: Use HAVING Prompt 1: "Show me total sales per customer, but only for orders that were shipped." SELECT Customer, SUM(Sales) AS TotalSales FROM Sales.Orders WHERE OrderStatus = 'Shipped' -- filter rows first GROUP BY Customer; Why WHERE? -We know which rows we want before grouping (OrderStatus='Shipped'), so filtering early reduces unnecessary work and avoids counting irrelevant rows. Prompt 2: "Show me customers who have more than $2000 in total sales across all their orders." SELECT Customer, SUM(Sales) AS TotalSales FROM Sales.Orders GROUP BY Customer HAVING SUM(Sales) > 2000; -- filter after aggregation Why HAVING? -We can’t know which customers meet this condition until after we’ve grouped and summed their sales. WHERE cannot filter on SUM(Sales) because that value doesn’t exist until after grouping.

Answer 76

a lot of analysts start with Standard SQL and then adjust the dialect they use based on what database they are working with. Standard SQL works with a majority of databases and requires a small number of syntax changes to adapt to other dialects. As a junior data analyst, it is important to know that there are slight differences between dialects. But by mastering Standard SQL, which is the dialect you will be working with in this program, you will be prepared to use SQL in any database.

Answer 77

A database is a computer program that can store and process large amounts of data. There are many different vendors of databases. PostgreSQL, MySQL, Oracle, and SQL Server are all database products by different vendors. SQL is the programming language used to talk to these databases, and each database product has its own variant of SQL. We call these variants SQL dialects.

Answer 78

As the amount of data in databases grew, NoSQL databases started gaining popularity. MongoDB, Cassandra, and Redis are among the most widely used NoSQL tools. Interestingly, most of these tools also include the capabilities for retrieving data with SQL. For that reason, some people prefer to call it “Not only SQL” rather than “NoSQL.”

Answer 79

Standard SQL. MySQL Microsoft SQL Server/MS SQL/SQL Server PostgreSQL Transact-SQL, often abbreviated “T-SQL,” is the dialect of SQL used in the Microsoft SQL Server database. Simply put, the SQL Server’s version of SQL has its own proper name. No other database has a proper name for their SQL dialect. What about PL/SQL and PL/pgSQL? These are languages with which to write your own functions. When you write a piece of SQL code, you usually use the constructions and functions that are predefined in your database. However, all databases allow you to create your own functions or stored procedures. These are called user-defined functions (UDF) and user-defined stored procedures (USP). The language in PostgreSQL is called PL/pgSQL. PostgreSQL’s version was inspired by Oracle’s version; that’s why the names are similar.

Answer 80

Byte (B): 8 bits Example (with approximate size): 1 character in a string (1 byte) Kilobyte (KB): 1024 bytes Example (with approximate size): A page of text (4 kilobytes) Megabyte (MB): 1024 Kilobytes Example (with approximate size): 1 song in MP3 format (2-3 megabytes) Gigabyte (GB) 1024 Megabytes Example (with approximate size): 300 songs in MP3 format (1 gigabyte) Terabyte (TB): 1024 Gigabytes Example (with approximate size): 500 hours of HD video (1 terabyte) Petabyte (PB): 1024 Terabytes Example (with approximate size): 10 billion Facebook photos (1 petabyte) Exabyte (EB): 1024 Petabytes Example (with approximate size): 500 million hours of HD video (1 exabyte) Zettabyte (ZB): 1024 Exabytes Example (with approximate size): All the data on the internet in 2019 (4.5 ZB)

Answer 81

Anytime an uploaded local dataset name is typed in the FROM section of a BigQuery query, the dataset and data table file path name will always be preceded by the project name. Here is a template of the file path structure: personal project name.customer_data.customer_address Where, your personally customized project name will be inserted before the dataset name. If you don't insert your personal project name, it may cause an error to occur in your query. Keep this technique in mind for future lessons anytime you are typing a local file path name in the FROM section of a query. Example: SELECT DISTINCT fuel_type FROM your project name.cars.car_info LIMIT 1000 FROM clause of the syntax above, you will need to begin the Table ID line with your personalized project name, period, the dataset name, period, and end with the table name.

Answer 82

The SUBSTRING function in SQL is used to extract a portion of a string (a substring) from a larger string. Example: SUBSTR(country, 1, 2) country: the column 1: start with the first letter 2: pull two letters Output: US (not USA)

Answer 83

Can be used to convert anything from one data type to another. Example: CAST(purchase_price AS FLOAT64) BigQuery stores FLOAT as "FLOAT64". That isn't the case in other SQL platforms. (It just means that were casting numbers in the 64-bit system as FLOATs)

Answer 84

A number that contains a decimal

Answer 85

Converting data from one type to another.

Answer 86

Adds strings together to create new text strings that can be used as unique keys.

Answer 87

Can be used to return non-null values in a list. Example: COALESCE(product, product_code) AS product_info product: The first column to check product_code: The column to check if the first column value is null In the above example, we wanted a list of product names, but if the name isn't available, we want the product code.

Answer 88

Adds new data to a database

Answer 89

Change existing data in a database

Answer 90

Remove data from a database

Answer 91

A process to confirm that a data-cleaning effort was well-executed and the resulting data is accurate and reliable.

Answer 92

A file containing a chronologically ordered list of modifications made to a project.

Answer 93

SELECT Customer_id, CASE WHEN first_name = 'Tnoy' THEN 'Tony' ELSE first_name END AS cleaned_name FROM project-id.customer_data.customer_name

Answer 94

A tool that automatically searches for and eliminates duplicate entries from a spreadsheet

Answer 95

Numeric CAST function can convert to: - Integer - Numeric (number) - Big number - Floating integer - String String CAST function can convert to: - Boolean - Integer - Numeric (number) - Big number - Floating integer - String - Bytes - Date - Date time - Time - Timestamp Date CAST function can convert to: - String - Date - Date time - Timestamp

Answer 96

American National Standards Institute

Answer 97

Syntax: CAST(expression AS typename) Example: SELECT CAST(MyCount AS STRING) FROM MyTable

Answer 98

Using the CAST function in a query that fails returns an error in BigQuery. To avoid errors in the event of a failed query, use the SAFE_CAST function instead. The SAFE_CAST function returns a value of Null instead of an error when a query fails. Syntax (same as for CAST): SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable

Answer 99

a method you can use to import data from one table to another is to use the INSERT INTO command together with a SELECT statement. Syntax: INSERT INTO [destination_table_name] SELECT [column names, separated by commas, or * for all columns] FROM [source_table_name] WHERE [condition]

Answer 100

Join together two or more text strings, such as combining street addresses and primary contacts in a business’ vendor database. You can add multiple items by separating them with commas. Syntax: =CONCATENATE(item 1, item 2) Where appropriate, such as when you’re combining a customer’s first and last name, you should add a space between the items you’re combining by typing quotation marks space quotation marks [“ ”] between the items. Separate this information by a comma as well. This would change the formula to: =CONCATENATE(item 1, " ", item 2).

Answer 101

In SQL, use the CONCAT function to join strings together to create new text strings. You might combine data simply to improve the readability of reports (such as combining a customer’s first and last name when generating a customer list). Or, you might combine data to generate a unique identifier for the rows in a table. Syntax: In SQL, use the CONCAT function to join strings together to create new text strings. You might combine data simply to improve the readability of reports (such as combining a customer’s first and last name when generating a customer list). Or, you might combine data to generate a unique identifier for the rows in a table. Example: CONCAT('Google', '.com') Output: Google.com

Answer 102

BigQuery stores numbers in a 64-bit memory system, which is why there's a 64 after integer in this case.

Answer 103

LIMIT function should appear at the end of the query to limit results (e.g., LIMIT 1000 or LIMIT 100 or LIMIT 10) Example: SELECT usertype, CONCAT(start_station_name, " to ", end_station_name) AS route, COUNT(*) AS num_trips, ROUND(AVG(CAST(tripduration AS int64)/60), 2) AS duration_minutes FROM `bigquery-public-data.new_york.citibike_trips` GROUP BY start_station_name, end_station_name, usertype ORDER BY num_trips DESC LIMIT 10

Answer 104

CONCAT: You can only combine two items (e.g., JeffBaker). NOTE: A space would be a third item, so JeffBaker is smooshed together. CONCATENATE: You can combine multiple items (e.g., May 30, 1955)

Answer 105

CONCAT With Separator It's similar to CONCAT, but it includes a separator, such as a space or period, between the strings. Example: SELECT CONCAT_WS('.', 'www', 'your_company', 'com') as website FROM web_data; The output includes a period ('.') between each part of the website URL, ensuring that the URL is in correct, navigable format. Example: CONCAT_WS (' . ', 'www', 'google', 'com') Output: www.google.com

Answer 106

In BigQuery, you can use the || operator to concatenate strings. For instance, if you're working with a dataset of book information and want to create a full title by combining the book's name and its edition, you could use ||, like so: SELECT book_name || ' - ' || edition AS full_book_title FROM library; Ouput: Pride & Prejudice-first edition Note: In some other SQL environments, you cannot use the || operator to concatenate strings. You must use + instead. For example, to concatenate the strings 'Google' and '.com' in Microsoft SQL server, you would use: SELECT 'Google' + '.com' Example: 'Google' || '.com' Output: Google.com

Answer 107

Traditional Relational Database Management Systems (RDBMS): BigQuery, MySQL, Postgre SQL, Microsoft SQL Server, Oracle Database, MariaDB, SQ Lite Cloud Data Warehouses & Analytics Platforms: Snowflake, Amazon Redshift, Azure Synapse Analytics Other Data Platforms with SQL Interfaces: Apache Hive, Presto/Trino

Answer 108

CONCATENATE: Concatenate multiple strings. (more than 2 items) =CONCATENATE(item 1, " ", item 2) CONCAT: Concatenate strings to create new text strings. (2 items max) Example: =CONCAT('Google', '.com') Output: Google.com CONCAT_WS: Concatenate two or more strings together with a separator between each string Example: =CONCAT_WS(' . ', 'www', 'google', 'com') Output: www.google.com ||: Concatenate two or more strings together with the || operator Example: 'Google' || '.com' Output: Google.com

Answer 109

MOD function calculates the remainder of a division operation. Syntax: MOD(x, y) X:The dividend (the number being divided) Y: The divisor (the number by which x is divided Example 1: MOD(10, 3) Output: 1 (10 divided by 3 is 3 with a remainder of 1) Example 2: MOD(19, 4) Output: 3 (19 divided by 4 is 4 with a remainder of 3) Use cases: -Determining Divisibility: checking if a number is divisible by another. -Data Distribution/Bucketing: Distributing data into groups based on a numeric pattern (e.g., MOD(item_id, bucket_count) assigns items to specific buckets). -Pattern Analysis: Identifying and analyzing repeating patterns in data, such as in time series.

Answer 110

The number being divided. For example: 20/2 20 is the dividend.

Answer 111

The number by which a number is divided. For example: 20/2 2 is the divisor.

Answer 112

A programming concept. It's a collection of values. Example: An array named my_fruits has the values 'banana', 'apple' and 'orange' stored inside of it. Each value in the array has a position (called an index) which starts at 0. my_fruits = ['banana', 'apple', 'orange'] Name of the array: my_fruits Values: banana, apple, orange Indexes: 0 (banana), 1 (apple), and 2 (orange) Purpose? Arrays make it easier to work with groups of values compared to using a separate variable for each value. So, instead of creating 3 values (e.g., fruit1 = 'banana' fruit2 = 'apple' fruit3 = 'orange), you care an array (e.g., my_fruits = ['banana', 'apple', 'orange'] With an array, you can: -Store a collection of numbers, words, or objects -Access any value using its index (i.e., its position) -Read, update, insert, or remove any of the array values

Answer 113

A function that returns records with matching values in both tables There must be key values in both tables. The records will only merge if there are matches in both tables.

Answer 114

A function that returns all the records from the left table (first mentioned) and only the matching records from the right table (second mentioned). Each row in the left table will appear in the results, even if there is no corresponding match in the right table.

Answer 115

A function that returns all records from the right table (second mentioned) and only the matching records from the left table (first mentioned). Each row in the right table will appear in the results, even if there is no corresponding match in the right table.

Answer 116

A function that combines the RIGHT JOIN and LEFT JOIN to return all matching records in both tables. It will return all records in both tables.

Answer 117

A SQL clause that is used to combine rows from two or more tables based on a related column. Joins help you join matching or related columns from different tables.

Answer 118

Inner Left Right Outer

Answer 119

Reference columns in which each value is unique. You can only have one primary key per table.

Answer 120

Primary keys in other tables. You can have multiple foreign keys in a table.

Answer 121

In English and SQL, we read from left to right. So, the table mentioned first is left and the table mentioned second is right.

Answer 122

Column Aliases: Use AS Ex// SELECT column_name AS alias_name FROM table_name; If using AS results in an error when running a query because the SQL database you are working with doesn't support it, you can leave it out. In the previous examples, the alternate syntax for aliasing a table: -FROM table_name alias_name -SELECT column_name alias_name Notes: -If the alias contains spaces, you need to put it inside single quotes (or double quotes) like this: column_name AS 'Alias Name' -If a query contains expressions, you can assign column aliases to the expressions. For example: SELECT first_name, last_name, salary * 1.1 AS new_salary FROM employees; Table Aliases: table_name AS table_alias table_name table_alias OR Notes: Assigning an alias to a table does not rename the table permanently. It just gives the table another name temporarily during the execution of a query.

Answer 123

Since you assign aliases to columns in the SELECT clause, you can only reference the aliases in the clauses that are evaluated after the SELECT clause. The following query will result in an error: SELECT first_name, last_name, salary * 1.1 AS new_salary FROM employees WHERE new_salary > 5000 Error: Unknown column 'new_salary' in 'where clause' Code language: JavaScript (javascript) Why? In this SELECT statement, the database evaluates the clauses in the following order: FROM > WHERE > SELECT Code language: SQL (Structured Query Language) (sql) The database evaluates the WHERE clause before the SELECT clause. Therefore, at the time it evaluates the WHERE clause, the database doesn’t have the information of the new_salary column alias. So it issued an error. However, the following query works correctly: SELECT first_name, last_name, salary * 1.1 AS new_salary FROM employees ORDER BY new_salary; Code language: SQL (Structured Query Language) (sql) Try it In this example, the database evaluates the clauses of the query in the following order: FROM > SELECT > ORDER BY Code language: SQL (Structured Query Language) (sql) The database evaluates the SELECT clause before the ORDER BY clause. Therefore, at the time of evaluating the ORDER BY clause, the database has the information of the new_salary alias, which is the alias of the expression salary * 1.1. Hence, it works as expected

Answer 124

A SQL query that is nested inside a larger query. Note: -All of the logic of the outer query depends on the logic of the inner query. -The inner most query always executes first. -Standardly subqueries are nested in FROM or WHERE clauses.

Answer 125

Outer Query/Outer Select Inner Query/Inner Select

Answer 126

Allows you to add a filter to your query instead of the underlying table that can only be used with aggregate functions.

Answer 127

Returns records with your conditions by allowing you to include if/then statements in y our query.

Answer 128

Subqueries must be enclosed within parentheses. A subquery can have one or more columns specified in the SELECT clause. Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator which allows you to specify multiple values in a WHERE clause. (e.g., SELECT * FROM customers WHERE country IN ('Germany', 'France', 'UK'); A subquery can’t be nested in a SET command. The SET command is used with UPDATE to specify which columns (and values) are to be updated in a table (e.g., UPDATE Customers SET ContactName = 'Alfred Schmidt', City = Frankfurt' WHERE CustomerID = 1; To return more than one row, subqueries must contain multiple value operators such as the IN command.

Answer 129

Subqueries help you identify and work with smaller sections of data from a large dataset.

Answer 130

not equal to

Answer 131

Put them in the SELECT statement Example 1: SELECT Date, Region, Total_Bags, Small_Bags, (Small_Bags / Total_Bags) * 100 AS Small_Bags_Percent FROM `vaulted-algebra-458421-a4.avocado_data.avocado_prices` WHERE Total_Bags != 0 Example 2: SELECT Date, Region, Small_Bags, Large_Bags, XLarge_Bags, Total_Bags, Small_Bags + Large_Bags + XLarge_Bags AS Total_Bags_Calc FROM vaulted-algebra-458421-a4.avocado_data.avocado_prices

Answer 132

The extract command lets us pull one part of a given date to use. Example: Get the year from all of the start_times in a column. (The data source contains information about bike routes & rides.) SELECT EXTRACT(YEAR FROM starttime)

Answer 133

When you don't need GROUP BY: -If your query is only aggregates (no plain columns): SELECT SUM(order_total) FROM orders; -- total across all rows -Or if it’s only plain columns (no aggregates): SELECT customer_id, order_total FROM orders; When you need GROUP BY: -When you have both aggregates and plain columns. Why? When you use something like SUM(order_total), you’re no longer talking about one row — you’re asking SQL to look across many rows at once and collapse them down. SELECT SUM(order_total) FROM orders; Totally fine — you asked for one number from the entire table. BUT imagine you ask SQL for both at the same time: SELECT customer_id, SUM(order_total) FROM orders; One part (customer_id) is row-based: “give me the ID from this row.” The other part (SUM(order_total)) is aggregate-based: “look at all the rows and collapse them.” SQL pauses and says: ❓ “Wait — which customer_id should I show alongside that SUM? The first one? The last one? Pick one at random? I need you to be specific.” How GROUP BY resolves the conflict GROUP BY tells SQL: 👉 “Don’t get confused — I want you to calculate the aggregate separately for each distinct value of this column (or set of columns).” So: SELECT customer_id, SUM(order_total) FROM orders GROUP BY customer_id; Mental model Think of SQL rows as post-it notes on a wall. -Row mode → SQL is reading each post-it note/line in the table as-is. -Aggregate mode → you sweep all the notes/lines together and add something up. -Mix them? You need GROUP BY to say “stack notes by customer first, then total each stack.”

Answer 134

A database table that is created and exists temporarily on a database server. They're automatically deleted when you end your session. Useful when you only need a table for a short time to complete analysis tasks like calculations. Use cases: Temporary tables in SQL (TEMP, TEMPORARY, or #tables depending on dialect) are like scratchpads. They exist only for the duration of your session/query and don’t clutter your database afterward. They’re useful when you want to organize intermediate steps in analysis or speed things up without permanently saving data. Here are some concrete business use cases: 1. Complex analysis in stages Say you need to analyze customer churn, but the raw data has multiple joins (transactions, support tickets, logins). Use case: First, create a temp table with “active customers last quarter.” Then, create another temp table with “customers who haven’t logged in since.” Finally, join them to calculate churn rate. This avoids writing one gigantic unreadable query. 2. Performance optimization On large datasets, re-running the same subquery repeatedly is expensive. Example: You want to see daily revenue and daily refund totals separately, then combine them. Instead of calculating sales and refunds in two separate subqueries, you: Store daily transactions in a temp table. Re-use that temp table to calculate both revenue and refunds. 👉 Faster and less costly (especially in BigQuery, where query costs are tied to data scanned). 3. Data quality checks You might need to debug or QA transformations. Example: Before rolling up sales by region, you create a temp table of transactions missing region codes. This lets analysts fix data issues before they affect final reports. 4. Iterative what-if analysis Business analysts often test scenarios. Example: Marketing wants to know: What if we gave a 10% discount to customers who ordered 3+ times last month? Step 1: Create a temp table of “loyal customers.” Step 2: Apply the discount logic in a temp table. Step 3: Compare revenue before/after discount. 5. Breaking down complex logic Temp tables make it easier to communicate analysis to non-technical teammates. Example: Instead of handing off a giant query with 15 nested CTEs, you: Build stepwise temp tables: customers_with_orders, customers_with_refunds, net_revenue_by_customer. Then give a simple final query that selects from the last temp table. This helps when working with finance or marketing teams who need to audit logic. 6. One-off “sandboxing” Sometimes business wants an ad hoc analysis. Example: Sales VP asks: “How many orders came from customers created during last year’s Black Friday campaign?” You build a temp table of “Black Friday customers” once, then slice/dice orders against it in multiple ways without rewriting that filter each time.

Answer 135

A type of temp table that you can query from multiple times. WITH clause approximates a temp table (i.e., it creates something that does the same thing as a temp table). It doesn't add a temp table to a database for others to see, but you can see your results and anyone who needs to review your work can see the code that led to your results. Example: WITH longest_used_bike AS ( SELECT bike_id, SUM(duration_minutes) AS trip_duration FROM bigquery-public-data.austin_bikeshare.bikeshare_trips GROUP BY bike_id ORDER BY trip_duration DESC LIMIT 1 ) After writing the above, you can run as many queries as you want on that temp table. For instance: Find the station where the longest-used bike leaves most often. SELECT trips.start_station_id, COUNT(*) AS trip_ct FROM longest_used_bike AS longest INNER JOIN bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips ON longest.bike_id = trips.bike_id GROUP BY trips.start_station_id ORDER BY trip_ct DESC LIMIT 1 *Note: You need to run both the temp table and the above query together and then you'll get the below output. Output: start_station_id: 3798 trip_ct: 177

Answer 136

Temporary tables, or temp tables, store subsets of data from standard data tables for a certain period of time. Temp tables allow you to run calculations in temporary data tables without needing to make modifications to the primary tables in your database. Because they are temporary, they are automatically deleted at the end of your SQL session.

Answer 137

Using SELECT INTO is a good practice when you want to keep the database uncluttered & other people don't need to use the table. NOTE: BigQuery doesn't recognize SELECT INTO; you have to use it in a different RDBMS (relational database management system). Example: SELECT * INTO AfricaSales FROM GlobalSales WHERE Region = "Africa"

Answer 138

If lots of people are using a table, use the CREATE TABLE statement instead of the SELECT INTO statement. CREATE TABLE adds the table into the database. Example: CREATE TABLE AfricaSales AS ( SELECT * FROM GlobalSales WHERE Region = "Africa" )

Answer 139

If lots of people are using a table, use the CREATE TABLE statement instead of the SELECT INTO statement. Using SELECT INTO is a good practice when you want to keep the database uncluttered & other people don't need to use the table.

Answer 140

Relational Database Management System (RDBMS), is a software system that organizes data into tables with rows and columns to create, manage, and interact with relational databases. It uses the Structured Query Language (SQL) to manipulate data and maintain relationships between different data points through keys, ensuring data integrity, consistency, and security.

Answer 141

Temp Tables can be used as a holding area for storing values if you are making a series of calculations. This is sometimes referred to as pre-processing of the data.

Answer 142

Temp Tables can collect the results of multiple, separate queries. This is sometimes referred to as data staging. Staging is useful if you need to perform a query on the collected data or merge the collected data.

Answer 143

They are automatically deleted from the database when you end your SQL session. They can be used as a holding area for storing values if you are making a series of calculations. This is sometimes referred to as pre-processing of the data. They can collect the results of multiple, separate queries. This is sometimes referred to as data staging. Staging is useful if you need to perform a query on the collected data or merge the collected data. They can store a filtered subset of the database. You don’t need to select and filter the data each time you work with it. In addition, using fewer SQL commands helps to keep your data clean.

Answer 144

Deleting a temporary table removes the rows of the table but leaves the table definition and columns ready to be used again. Dropping a temporary table not only removes the information contained in the rows of the table, but removes the table variable definitions (columns) themselves.

Answer 145

A temporary table is only accessible to the connection that created that temporary table. It is not accessible to other connections. However, we can create temporary tables that are accessible to all the open connections. Such temporary tables are called global temporary tables. The name of the global temporary table starts with a double hash symbol (##). Let’s create a global temporary table that contains records of all female students from the student table. USE schooldb; SELECT name, age, gender INTO ##FemaleStudents FROM student WHERE gender = 'Female'

Answer 146

You should consider table variables as a first choice when working with relatively small data sets. They are easier to work with and they trigger fewer recompiles in the routines in which they’re used, compared to using temporary tables. If you are doing more complex processing on temporary data, or need to use more than reasonably small amounts of data in them, then local temporary tables are likely to be a better choice. Note: there might be situations where you don’t need to create permanent tables. Remember: storing data in SQL costs the company money and resources. If you don’t need a permanent table, you can create temporary tables instead. Temporary tables only exist within your session (or up to 24 hours depending on your SQL platform) and aren’t shareable or viewable by others. Temporary tables exist only for you during your session. Think of temporary tables like a scratch pad where you can scribble out your calculations before putting down your final answer.

Answer 147

1) SELECT INTO statement Example: SELECT select_list INTO temporary_table FROM table_name .... 2) CREATE TABLE statement Example: CREATE TABLE #haro_products ( product_name VARCHAR(MAX), list_price DEC(10,2) ); There are three key things to notice here: 1.If you are running two queries at the same time, which you have to do with temporary tables (only available during that session), then you need a semicolon separating each query. 2.The first query is where you created the temporary table. The second query references this temporary table and its fields. That is why you can access the sum of tickets as total_tickets in the second query. 3.When creating table names, don’t use spaces or the query will return an error. Best practice is to use camelcase capitalization when naming the table you are building. Camelcase capitalization means that you capitalize the start of each word without any spaces in between, just like a camel’s hump. For example, the table TicketsByOccasion uses camelcase capitalization.

Answer 148

The WHERE clause is used to make filters on your table, like a filter for certain date ranges or specific countries. The HAVING clause is used to make filters on your aggregations and has to be paired with a GROUP BY clause.

Answer 149

Here are the three conditions for when you would want to create a permanent table. All three conditions should be met. 1. Complex query containing multiple JOINs 2. Result output is a table 3. You need to run the query frequently or on a regular basis Alternatively, Temporary tables are used to break down complex queries into smaller increments. These complex queries can contain multiple JOINs but don’t necessarily have to. You might want to use temporary tables if one or more of the following conditions apply: ●Slowly running query with multiple JOINs and WHERE statements ●Slowly running query containing GROUP BY and HAVING ●Nested queries (i.e. query within a query) ●If you need to do a calculation on top of a calculation (e.g. take sum per day then average across the day sums) If any of the above conditions are met, then using a temporary table may speed up your query, which will make it easier for you to write the query, as well as make it easier for you to troubleshoot your query if something goes wrong.

SQL Flashcards

(176 cards)