What is Data Modeling?
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another.
Data modeling is the process of creating data models for an information system.
Data modeling can easily translate to database modeling, as this is the essential end state.
Key points about Data Modeling
Intro to Relational Databases
Relational Model: organizes data into one or more tables (or “relations”) of columns and rows, with a unique key identifying each row. Generally, each table represents one “entity type” (such as customer or product).
Relational Database is a digital database based on the relational model of data… a software system used to maintain relational databases is a relational database management system (RDBMS).
SQL (Structured Query Language) is the language used across almost all relational database systems for querying and maintaining the database.
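As a minimal sketch of these ideas, the example below uses Python's built-in sqlite3 module as the RDBMS; the `customers` table and its columns are made up for illustration.

```python
# Minimal sketch of the relational model: one table per entity type,
# a unique key per row, and SQL for querying and maintaining the data.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each table represents one entity type (here, "customer");
# the primary key uniquely identifies each row.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Amanda'), (2, 'Toby')")

# SQL is used both to maintain the database (above) and to query it (below).
cur.execute("SELECT name FROM customers WHERE customer_id = 2")
name = cur.fetchone()[0]
print(name)  # Toby
```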

Relational DB - Basics

ACID Transactions
Properties of database transactions intended to guarantee validity even in the event of errors or power failures.
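A small sketch of the atomicity part of ACID, again with sqlite3 (the `accounts` schema is invented): if any statement in a transaction fails, the whole transaction is rolled back and the database keeps its previous state.

```python
# Atomicity sketch: a failed statement rolls back the whole transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

try:
    # Transfer 200 from account 1 to account 2. The first UPDATE succeeds,
    # but the second violates the CHECK constraint (balance would go negative),
    # so the transfer must not be left half-applied.
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # undo the partial transfer

balances = [row[1] for row in conn.execute("SELECT id, balance FROM accounts ORDER BY id")]
print(balances)  # [100, 50] -- both balances unchanged
```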
When Not to Use a Relational Database
Basics of Apache Cassandra

Importance of Relational Model
Online Transactional Processing (OLTP):
supports the day-to-day operation of a particular system. OLTP is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database contains detailed, current data, and the schema used to store it is the entity model (usually 3NF). Queries access individual records, such as updating your email in a company database.
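A typical OLTP operation, sketched with sqlite3 (the `employees` table and its values are made up): a short transaction that touches a single record.

```python
# OLTP sketch: a short transaction updating one record (e.g. an email change).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO employees VALUES (42, 'old@company.com')")

# OLTP workloads are many fast INSERT/UPDATE/DELETE statements like this one.
conn.execute("UPDATE employees SET email = 'new@company.com' WHERE employee_id = 42")
conn.commit()

email = conn.execute("SELECT email FROM employees WHERE employee_id = 42").fetchone()[0]
print(email)  # new@company.com
```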
Online Analytical Processing (OLAP):
deals with historical or archival data. OLAP is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database contains aggregated, historical data stored in multi-dimensional schemas (usually a star schema). Queries sometimes need to access large amounts of data for management reporting, such as computing your company's profit over the last year.
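By contrast, a typical OLAP query aggregates over many historical rows. A sketch with sqlite3, using an invented `sales` table, for the "profit last year" style of question:

```python
# OLAP sketch: an aggregation scanning a slice of historical data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, profit REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2023-03-01", 120.0), ("2023-07-15", 80.0), ("2024-01-10", 200.0),
])

# "What was the profit in 2023?" -- aggregate over every matching row.
total = conn.execute(
    "SELECT SUM(profit) FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'"
).fetchone()[0]
print(total)  # 200.0
```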
Objectives of Normal Form
Normal Form
The process of normalization is a step by step process:
How to reach First Normal Form (1NF):
Second Normal Form (2NF):
Third Normal Form (3NF):
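As a hedged sketch of what the normalization steps buy you (table names here are invented): a design that repeats `artist_name` on every album row has a transitive dependency (`artist_name` depends on `artist_id`, not on the album key), so 3NF moves the name into its own table, where it is stored exactly once.

```python
# 3NF sketch: store each artist's name once; albums reference it by key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, artist_name TEXT)")
conn.execute("""CREATE TABLE albums (
    album_id INTEGER PRIMARY KEY, title TEXT,
    artist_id INTEGER REFERENCES artists)""")

conn.execute("INSERT INTO artists VALUES (1, 'The Beatles')")
conn.executemany("INSERT INTO albums VALUES (?, ?, ?)",
                 [(10, 'Rubber Soul', 1), (11, 'Let It Be', 1)])

# An artist rename now touches one row instead of every album row.
conn.execute("UPDATE artists SET artist_name = 'Beatles, The' WHERE artist_id = 1")
rows = conn.execute("""SELECT al.title, ar.artist_name
                       FROM albums al JOIN artists ar USING (artist_id)
                       ORDER BY al.album_id""").fetchall()
print(rows)
```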
Denormalization
The process of trying to improve the read performance of a database at the expense of losing some write performance by adding redundant copies of data.
JOINs on the database allow for outstanding flexibility but can be expensive. If your database handles heavy reads, you may want to think about denormalizing your tables. You first get your data into normalized form and then proceed with denormalization; denormalization comes after normalization.
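A sketch of that trade-off with sqlite3 (illustrative schema): a redundant copy of `artist_name` is added to the `albums` table so the heavy read path avoids a JOIN, at the cost of a more expensive write path.

```python
# Denormalization sketch: duplicate a column to remove a JOIN from reads.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, artist_name TEXT)")
# artist_name below is the redundant copy added for read performance.
conn.execute("""CREATE TABLE albums (
    album_id INTEGER PRIMARY KEY, title TEXT,
    artist_id INTEGER, artist_name TEXT)""")

conn.execute("INSERT INTO artists VALUES (1, 'Prince')")
conn.execute("INSERT INTO albums VALUES (10, 'Purple Rain', 1, 'Prince')")

# Read path: no JOIN needed.
rows = conn.execute("SELECT title, artist_name FROM albums").fetchall()
print(rows)

# Write path: renaming the artist must now update BOTH copies of the name.
conn.execute("UPDATE artists SET artist_name = 'Prince Rogers Nelson' WHERE artist_id = 1")
conn.execute("UPDATE albums SET artist_name = 'Prince Rogers Nelson' WHERE artist_id = 1")
conn.commit()
```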
Logical Design Change
Normalization vs Denormalization
Normalization is about trying to increase data integrity by reducing the number of copies of the data. Data that needs to be added or updated will be done in as few places as possible.
Denormalization is trying to increase performance by reducing the number of joins between tables (as joins can be slow). Data integrity will take a bit of a potential hit, as there will be more copies of the data (to reduce JOINS).
Fact and Dimension Tables
Fact table consists of the measurements, metrics or facts of a business process.
Dimension table is a structure that categorizes facts and measures in order to enable users to answer business questions. Dimensions are people, products, place and time.
In the image below, the unique primary key for each Dimension table is included in the Fact table.
In this example, it helps to think about the Dimension tables providing the following information:

Star Schema
Star Schema is the simplest style of data mart schema. The star schema consists of one or more fact tables referencing any number of dimension tables.
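A minimal star schema sketched with sqlite3 (the `fact_sales` / `dim_*` names are illustrative): the fact table carries the measurements and holds the primary keys of the surrounding dimension tables as foreign keys.

```python
# Star schema sketch: one fact table referencing its dimension tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer,
    product_id  INTEGER REFERENCES dim_product,
    amount REAL);                      -- the measurable fact
INSERT INTO dim_customer VALUES (1, 'Amanda');
INSERT INTO dim_product  VALUES (7, 'Widget');
INSERT INTO fact_sales   VALUES (100, 1, 7, 19.99);
""")

# A typical business question joins the fact table to its dimensions.
row = conn.execute("""
    SELECT c.name, p.name, f.amount
    FROM fact_sales f
    JOIN dim_customer c USING (customer_id)
    JOIN dim_product  p USING (product_id)
""").fetchone()
print(row)  # ('Amanda', 'Widget', 19.99)
```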
Benefits
Drawbacks
Snowflake Schema
Logical arrangement of tables in a multidimensional database represented by centralized fact tables which are connected to multiple dimensions.
A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborated into multiple levels of relationships, with child tables having multiple parent tables.

Eventual Consistency
A consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
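A toy illustration of that guarantee in plain Python (no real distributed store; replica names and the replication function are invented): a write lands on one replica first, a read from the other replica may be stale, and once replication catches up with no new writes, all reads return the last updated value.

```python
# Eventual consistency sketch: replicas converge once updates stop.
replicas = {"replica_a": "v1", "replica_b": "v1"}

def write(value):
    # The system stays available: the write is accepted by one replica.
    replicas["replica_a"] = value

def anti_entropy():
    # Background replication eventually copies the latest value everywhere.
    replicas["replica_b"] = replicas["replica_a"]

write("v2")
stale_read = replicas["replica_b"]      # "v1": read before replication finishes
anti_entropy()
converged_read = replicas["replica_b"]  # "v2": with no new writes, reads converge
print(stale_read, converged_read)
```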
CAP Theorem
A theorem in computer science that states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees of consistency, availability, and partition tolerance.
Is Eventual Consistency the opposite of what is promised by SQL databases per the ACID principle?
Much has been written about how Consistency is interpreted in the ACID principle and the CAP theorem. Consistency in the ACID principle refers to the requirement that only transactions that abide by constraints and database rules are written into the database; otherwise the database keeps its previous state. In other words, the data should be correct across all rows and tables. However, consistency in the CAP theorem refers to every read from the database getting either the latest piece of data or an error.
Data Modeling in Apache Cassandra
Denormalization of tables in Apache Cassandra is absolutely critical. The biggest takeaway when doing data modeling in Apache Cassandra is to think about your queries first. There are no JOINs in Apache Cassandra.
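The query-first idea can be sketched conceptually in plain Python (no Cassandra required; the table layouts below are illustrative, not real CQL): because there are no JOINs, each query gets its own table, and the data is duplicated into both.

```python
# Query-first modeling sketch: one denormalized "table" per query.
songs = [
    {"artist": "The Beatles", "year": 1965, "title": "Michelle"},
    {"artist": "The Beatles", "year": 1970, "title": "Let It Be"},
]

# Query 1: "every song in a given year" -> a table partitioned by year.
songs_by_year = {}
for s in songs:
    songs_by_year.setdefault(s["year"], []).append(s)

# Query 2: "every song by a given artist" -> a second, duplicated table.
songs_by_artist = {}
for s in songs:
    songs_by_artist.setdefault(s["artist"], []).append(s)

titles_in_1970 = [s["title"] for s in songs_by_year[1970]]
titles_by_beatles = [s["title"] for s in songs_by_artist["The Beatles"]]
print(titles_in_1970, titles_by_beatles)
```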
