SQL Crash Course

Introduction
Basics
- DataTypes
- SELECT
- EXCEPT
- CREATE
- INSERT
- UPDATE
- DELETE
JOIN
- Relationships
Constrains
ORDER BY
INDEX
Aggregate Clause
GROUP BY
- HAVING
- COALESCE
- STRING_AGG
- ROLLUP
CASE
SUB-QUERIES
VIEWS
TRANSACTIONS
Normalization
Stored Procedures
User Defined Functions
Window Functions
Common table Expressions
Triggers
Order of Execution

This crash course is a quick reference to SQL commands and concepts. It covers the most commonly used SQL commands, their syntax, and examples. It is not exhaustive but should be enough to get you started with SQL. It is a good idea to have a basic understanding of SQL before diving into this crash course. You can find many online resources and tutorials to help you learn SQL.

Some of them are defined by the SQL standard
But a lot of these depends on the underlying DBMS

SELECT a, b from table_name;
 
-- = For the exact match
select a, b from table_name where a = 'hero';
 
-- LIKE: for a coarse search
select a, b from table_name where a like '%h'; -- must end with h
select a, b from table_name where a like 'o%'; -- must start with o
select a, b from table_name where a like '%r%'; -- must contain r
 
-- AS: alias for the name of returned column names
select a, b as x from table_name;
 
-- DISTINCT: returns non-duplicate rows
select distinct a from table_name;
 
-- IN: with a matching row with given list of distinct values
select * from table_name where age in (21, 23, 25);

A set operator in SQL that returns the distinct rows that are present in the result st of the first query but not in the result set of second query
Also known as set difference operator

-- syntax
SELECT column1 [, column2 ] FROM table_name [WHERE condition]
EXCEPT SELECT column1 [, column2 ] FROM tables [WHERE condition]

SQL Except

create table [if not exists] table_name (
	id integer primary key,
	name varchar(255)
);

the if not exists is optional

insert into table_name (a, b, c) values (1, 2, 5);

update table_name set a = 5 where b = 2;

delete from table_name where c = 5;

select a, b from table_1
<inner | left | right | full> join table_2
on table_1.col_name = table_2.col_name;
 
-- Think of the tables as a venn diagram
-- Let the first table be A and the second be B
 
-- INNER JOIN: Intersection of both
-- LEFT JOIN: Everything on A
-- RIGHT JOIN: Everything on B
-- FULL JOIN: A union B

Read more about Joins at https://www.datacamp.com/cheat-sheet/sql-joins-cheat-sheet

SQL Joins

One to One
One to Many
Many to Many

PRIMARY KEY: Ensures uniqueness of column's values
FOREIGN KEY: Maintains referential integrity between two tables
NOT NULL: Ensures a column must have a value
UNIQUE: Ensures unique values in a column
AUTO_INCREMENT: increments the given number column for the next value (if value not already provided)
DEFAULT: Add default values to column (if not provided while insert)

create table if not exists people (
	id integer primary key,
	first_name varchar(255) not null,
	last_name varchar(255),
	age integer default 21,
	gender enum('m', 'f'),
	ssn char(32) unique,
 
	constraint name_constraint unique(first_name, last_name),
	constraint age_constraint check (age >= 0 and age < 200)
 
	foreign key (id) references post(post_id) -- Foreign keys
);

We can also add constaints to already existing tables

alter table people add constraint unique_lastname unique(last_name);

in doing so, we might have conflicts due to previous data not matching the newly created constraint. In those cases, we can use TRUNCATE TABLE table_name

Sorts the data in one of two ways

ASC
DESC

select a, count(b) as cb from table_name
group by a
order by cb ASC;

Database objects used to speed up data retrieval

SUM
AVG
COUNT
MIN
MAX

-- count(*) and count(col_name) are different
-- count(*) counts null values too, while the latter does not

select a, sum(b) from table_name group by a;

When using the group by clause, the select statement must contain either of the two
- Columns used for grouping
- Aggregate functions to calculate values within each group

Similar to where clause but it runs after the group by clause

If any returned value is null, replace it with a default value

select coalesce(name, 'No Name') from employees;
-- in case name is null, use 'No Name' as the default value

Concatenates the rows of strings into a single string separated by the specified separator

string_agg (expression, separator) [<order clause>]
within group (order by <expression_list>) { asc | desc }
-- use `within_group` to work on the concatenated strings
-- ex: it can be used to sort the order with order by ... etc
 
-- example: a user has many phone numbers, show it separated by column
select user_id, string_agg(phone, ',') as phones
from phone_list_table group by user_id;
 
-- example: the above query but with phone numbers sorted
select
	user_id,
	string_agg(phone, ',') within group(order by phone) as phones
from phone_list_table group by user_id;

The group by clause within an aggregate function can only return an aggregate result at a single level of detail
To return aggregated results at multiple levels of details, use rollup with group by clause

Using rollup with a single column returns sum of that column

select state, sum(salary) as total_sal from salaries
group by rollup(state) order by state nulls last;

by default the returned column name is null
we used nulls last just to make sure the aggregated result for rollup comes in the end

Using rollup with multiple columns, it returns aggregation for each data group

select state, job, sum(salary) as total_sal from salaries
group by rollup(state, job) order by state nulls last;

select count(*) as no_employees,
case
	when salary < 5000 then 'Low'
	when salary between 5000 and 10000 then 'Medium'
	else 'High'
end as sal_cat
from employees
group by sal_cat
order by no_employees desc;

Query embedded within another query
Mainly used to retrieve data to be used as a filter or condition in the main query
Context filtering and data retrieval in an efficient manner

select emp_name from employees
where dept_id = (
	select dept_id from departments
	where name = 'HR'
);

Virtual tables created by queries and can be used like regular tables
A way to simplify complex queries and restrict access to sensitive data

-- syntax
create view view_name as select column1, column2, ...
from table_name where condition;

Sequence of one of more SQL operations treated as a single unit.

begin transaction;
-- sql statements (insert, update, delete, seelct etc.) go here
commit; -- to save the changes permanently
rollback; -- to undo the changes

This is a collective term for

Atomicity
Consistency
Isolation
Durability

Organizing data in a database to eliminate redundancy and improve data integrity

Pre-compiled SQL code that can be stored and executed on the database server
Help in improving performance and security

-- syntax for postgres (similar for mysql and others)
create [or replace] procedure procedure_name (
	parameter_name parameter_type ...
) language plpgsql as $$ -- to define a delimiter
begin
-- procedure code here
end;
$$;
 
-- example
create procedure update_emp_salary (id INT, new_sal numeric)
language plpgsql as $$
begin
	update employees set salary = new_sal where emp_id = id;
end;
$$;
call update_emp_salary(2, 30000); -- to call the procedure

Defined by the user to perform specific operations and return a value

-- syntax for postgres (similar for mysql and others)
create [or replace] function function_name (parameters)
return return_type as $$
begin
	-- function body (sql statements)
	return some_value; -- for scalar functions
	-- or "return query" to return the value returned by the query
end;
$$ language plpgsql;
 
-- example
create or replace function emp_with_max_salary_in_dept (
	dept_name varchar
)
return table(emp_id int, name varchar, salary numeric) as $$
begin
	return query
	select e.emp_id, e.name, e.salary from employees e
		where e.dept = dept_name
		and e.salary = (
			select max(emp.salary) from employees emp
			where emp.dept = dept_name;
		);
end;
$$ language plpgsql;
 
select * from emp_with_max_salary_in_dept('HR'); -- call function

Also known as analytic functions
Allows to perform calculations across a set of rows related to the current row
Window functions do not collapse rows while aggregations. You can calculate aggregates while retaining individual row details
Useful in advanced analytics like Running totals, Rank calculations, cumulative distributions etc.
Defined by over() clause

select
	name,
	salary,
	sum(salary) as running_sal over(order by salary)
from employees;

order by aggregates over the entire dataset
partition by works like the group by clause while aggregating
row_number returns row number of that row
lag and lead calculates the aggregated values keeping one row behind or ahead respectively

select name, salary, lag(salary) over() from employees;
 
-- example: getting diff of salary of employees
select
	name,
	salary,
	(salary - lead(salary) over(order by salary desc)) as diff
from employees;

rank and dense_rank to rank the results based on a given criteria

select name, salary rank() over(order by salary desc)
from employees;

In case of duplicate values (on which rank is defined), rank gives both the values the same rank (let say 1, then the next rank starts at 3). It preserves the number of rankings within the same rank
In case of dense_rank, it goes for ranking first. In case of duplicates, the next value gets the next rank. (let say duplicate rank is 1, next rank would be 2 instead of 3, as with rank)

CTE is a temporary result set that you define within a query to simplify complex SQL statements

-- syntax
with cte_name (optional_column_list) as (
	-- CTE query definition
)
-- main query referencing CTE
select ... from cte_name where ...;
 
-- example: calculate avg_salary per dept and then find all employees whose salary is above the average salary of their dept
with avg_sal as (
	select dept, avg(salary) as avg_salary
	from employees group by dept;
)
select e.emp_id, e.name, e.dept, e.salary, a.avg_salary
from employees e
join avg_sal a on e.dept = a.dept
where e.salary > a.avg_salary;

Database objects that automatically execute in response to specific events
or replace keyword is optional
{ before | after | instead of } specifies when the trigger will be activated concerning the triggering event
{ insert | update | delete } specifies the type of action that will activate the trigger
for each row is a clause when you want to fired for each affected row in the table. usually used in the row level triggers

create [or replace] trigger trigger_name
{ before | after  | instead of | } { insert | update | delete }
on table_name
[ for each row ]
begin
-- sql statements to be executed when trigger is activated
end;
 
-- postgres specific trigger syntax
create trigger trigger_name
{ before | after | instead of }
{ insert | update | delete | truncate }
on table_name
for each { row | statement }
execute function trigger_function_name();
 
create or replace function trigger_function_name()
returns trigger as $$
begin
	-- trigger logic here
	return new;
end; $$ language plpgsql;
 
-- example: if negative salary is inserted, trigger a function to set it to 0
create or replace function check_sal()
returns trigger as $$
begin
	if new.salary < 0 then new.salary = 0; end if;
	return new;
end; $$ language plpgsql;
 
create or replace trigger before_update_salary
before update on employees
for each row
execute function check_sal();

FROM / JOIN : Identifying the tables required for this SQL query
WHERE : Filter the rows on the basis of condition
GROUP BY : Group data based on specified condition
HAVING : Similar to where, but applied after grouping the data
SELECT : Select the columns to be included in the final result
DISTINCT : Remove the duplicates from the returned result
ORDER BY : Sorts the result ascending (ASC) or descending (DESC)
LIMIT / OFFSET : Number of rows to be returned, and where to start from