- Introduction
- Basics
- JOIN
- Constrains
- ORDER BY
- INDEX
- Aggregate Clause
- GROUP BY
- CASE
- SUB-QUERIES
- VIEWS
- TRANSACTIONS
- Normalization
- Stored Procedures
- User Defined Functions
- Window Functions
- Common table Expressions
- Triggers
- Order of Execution
This crash course is a quick reference to SQL commands and concepts. It covers the most commonly used SQL commands, their syntax, and examples. It is not exhaustive but should be enough to get you started with SQL. It is a good idea to have a basic understanding of SQL before diving into this crash course. You can find many online resources and tutorials to help you learn SQL.
- Some of them are defined by the SQL standard
- But a lot of these depends on the underlying DBMS
SELECT a, b from table_name;
-- = For the exact match
select a, b from table_name where a = 'hero';
-- LIKE: for a coarse search
select a, b from table_name where a like '%h'; -- must end with h
select a, b from table_name where a like 'o%'; -- must start with o
select a, b from table_name where a like '%r%'; -- must contain r
-- AS: alias for the name of returned column names
select a, b as x from table_name;
-- DISTINCT: returns non-duplicate rows
select distinct a from table_name;
-- IN: with a matching row with given list of distinct values
select * from table_name where age in (21, 23, 25);
- A set operator in SQL that returns the distinct rows that are present in the result st of the first query but not in the result set of second query
- Also known as set difference operator
-- syntax
SELECT column1 [, column2 ] FROM table_name [WHERE condition]
EXCEPT SELECT column1 [, column2 ] FROM tables [WHERE condition]
create table [if not exists] table_name (
id integer primary key,
name varchar(255)
);
- the
if not exists
is optional
insert into table_name (a, b, c) values (1, 2, 5);
update table_name set a = 5 where b = 2;
delete from table_name where c = 5;
select a, b from table_1
<inner | left | right | full> join table_2
on table_1.col_name = table_2.col_name;
-- Think of the tables as a venn diagram
-- Let the first table be A and the second be B
-- INNER JOIN: Intersection of both
-- LEFT JOIN: Everything on A
-- RIGHT JOIN: Everything on B
-- FULL JOIN: A union B
Read more about Joins at https://www.datacamp.com/cheat-sheet/sql-joins-cheat-sheet
- One to One
- One to Many
- Many to Many
- PRIMARY KEY: Ensures uniqueness of column's values
- FOREIGN KEY: Maintains referential integrity between two tables
- NOT NULL: Ensures a column must have a value
- UNIQUE: Ensures unique values in a column
- AUTO_INCREMENT: increments the given number column for the next value (if value not already provided)
- DEFAULT: Add default values to column (if not provided while insert)
create table if not exists people (
id integer primary key,
first_name varchar(255) not null,
last_name varchar(255),
age integer default 21,
gender enum('m', 'f'),
ssn char(32) unique,
constraint name_constraint unique(first_name, last_name),
constraint age_constraint check (age >= 0 and age < 200)
foreign key (id) references post(post_id) -- Foreign keys
);
- We can also add constaints to already existing tables
alter table people add constraint unique_lastname unique(last_name);
- in doing so, we might have conflicts due to previous data not matching the newly created constraint. In those cases, we can use
TRUNCATE TABLE table_name
Sorts the data in one of two ways
- ASC
- DESC
select a, count(b) as cb from table_name
group by a
order by cb ASC;
Database objects used to speed up data retrieval
- SUM
- AVG
- COUNT
- MIN
- MAX
-- count(*) and count(col_name) are different
-- count(*) counts null values too, while the latter does not
select a, sum(b) from table_name group by a;
- When using the
group by
clause, the select statement must contain either of the two- Columns used for grouping
- Aggregate functions to calculate values within each group
- Similar to where clause but it runs after the
group by
clause
If any returned value is null, replace it with a default value
select coalesce(name, 'No Name') from employees;
-- in case name is null, use 'No Name' as the default value
- Concatenates the rows of strings into a single string separated by the specified separator
string_agg (expression, separator) [<order clause>]
within group (order by <expression_list>) { asc | desc }
-- use `within_group` to work on the concatenated strings
-- ex: it can be used to sort the order with order by ... etc
-- example: a user has many phone numbers, show it separated by column
select user_id, string_agg(phone, ',') as phones
from phone_list_table group by user_id;
-- example: the above query but with phone numbers sorted
select
user_id,
string_agg(phone, ',') within group(order by phone) as phones
from phone_list_table group by user_id;
- The
group by
clause within an aggregate function can only return an aggregate result at a single level of detail - To return aggregated results at multiple levels of details, use
rollup
withgroup by
clause
Using rollup
with a single column returns sum of that column
select state, sum(salary) as total_sal from salaries
group by rollup(state) order by state nulls last;
- by default the returned column name is
null
- we used
nulls last
just to make sure the aggregated result forrollup
comes in the end
Using rollup
with multiple columns, it returns aggregation for each data group
select state, job, sum(salary) as total_sal from salaries
group by rollup(state, job) order by state nulls last;
select count(*) as no_employees,
case
when salary < 5000 then 'Low'
when salary between 5000 and 10000 then 'Medium'
else 'High'
end as sal_cat
from employees
group by sal_cat
order by no_employees desc;
- Query embedded within another query
- Mainly used to retrieve data to be used as a filter or condition in the main query
- Context filtering and data retrieval in an efficient manner
select emp_name from employees
where dept_id = (
select dept_id from departments
where name = 'HR'
);
- Virtual tables created by queries and can be used like regular tables
- A way to simplify complex queries and restrict access to sensitive data
-- syntax
create view view_name as select column1, column2, ...
from table_name where condition;
- Sequence of one of more SQL operations treated as a single unit.
begin transaction;
-- sql statements (insert, update, delete, seelct etc.) go here
commit; -- to save the changes permanently
rollback; -- to undo the changes
This is a collective term for
- Atomicity
- Consistency
- Isolation
- Durability
- Organizing data in a database to eliminate redundancy and improve data integrity
- Pre-compiled SQL code that can be stored and executed on the database server
- Help in improving performance and security
-- syntax for postgres (similar for mysql and others)
create [or replace] procedure procedure_name (
parameter_name parameter_type ...
) language plpgsql as $$ -- to define a delimiter
begin
-- procedure code here
end;
$$;
-- example
create procedure update_emp_salary (id INT, new_sal numeric)
language plpgsql as $$
begin
update employees set salary = new_sal where emp_id = id;
end;
$$;
call update_emp_salary(2, 30000); -- to call the procedure
- Defined by the user to perform specific operations and return a value
-- syntax for postgres (similar for mysql and others)
create [or replace] function function_name (parameters)
return return_type as $$
begin
-- function body (sql statements)
return some_value; -- for scalar functions
-- or "return query" to return the value returned by the query
end;
$$ language plpgsql;
-- example
create or replace function emp_with_max_salary_in_dept (
dept_name varchar
)
return table(emp_id int, name varchar, salary numeric) as $$
begin
return query
select e.emp_id, e.name, e.salary from employees e
where e.dept = dept_name
and e.salary = (
select max(emp.salary) from employees emp
where emp.dept = dept_name;
);
end;
$$ language plpgsql;
select * from emp_with_max_salary_in_dept('HR'); -- call function
- Also known as analytic functions
- Allows to perform calculations across a set of rows related to the current row
- Window functions do not collapse rows while aggregations. You can calculate aggregates while retaining individual row details
- Useful in advanced analytics like Running totals, Rank calculations, cumulative distributions etc.
- Defined by
over()
clause
select
name,
salary,
sum(salary) as running_sal over(order by salary)
from employees;
order by
aggregates over the entire datasetpartition by
works like thegroup by
clause while aggregatingrow_number
returns row number of that rowlag
andlead
calculates the aggregated values keeping one row behind or ahead respectively
select name, salary, lag(salary) over() from employees;
-- example: getting diff of salary of employees
select
name,
salary,
(salary - lead(salary) over(order by salary desc)) as diff
from employees;
rank
anddense_rank
to rank the results based on a given criteria
select name, salary rank() over(order by salary desc)
from employees;
- In case of duplicate values (on which rank is defined),
rank
gives both the values the same rank (let say 1, then the next rank starts at 3). It preserves the number of rankings within the same rank - In case of
dense_rank
, it goes for ranking first. In case of duplicates, the next value gets the next rank. (let say duplicate rank is 1, next rank would be 2 instead of 3, as with rank)
- CTE is a temporary result set that you define within a query to simplify complex SQL statements
-- syntax
with cte_name (optional_column_list) as (
-- CTE query definition
)
-- main query referencing CTE
select ... from cte_name where ...;
-- example: calculate avg_salary per dept and then find all employees whose salary is above the average salary of their dept
with avg_sal as (
select dept, avg(salary) as avg_salary
from employees group by dept;
)
select e.emp_id, e.name, e.dept, e.salary, a.avg_salary
from employees e
join avg_sal a on e.dept = a.dept
where e.salary > a.avg_salary;
- Database objects that automatically execute in response to specific events
or replace
keyword is optional{ before | after | instead of }
specifies when the trigger will be activated concerning the triggering event{ insert | update | delete }
specifies the type of action that will activate the triggerfor each row
is a clause when you want to fired for each affected row in the table. usually used in the row level triggers
create [or replace] trigger trigger_name
{ before | after | instead of | } { insert | update | delete }
on table_name
[ for each row ]
begin
-- sql statements to be executed when trigger is activated
end;
-- postgres specific trigger syntax
create trigger trigger_name
{ before | after | instead of }
{ insert | update | delete | truncate }
on table_name
for each { row | statement }
execute function trigger_function_name();
create or replace function trigger_function_name()
returns trigger as $$
begin
-- trigger logic here
return new;
end; $$ language plpgsql;
-- example: if negative salary is inserted, trigger a function to set it to 0
create or replace function check_sal()
returns trigger as $$
begin
if new.salary < 0 then new.salary = 0; end if;
return new;
end; $$ language plpgsql;
create or replace trigger before_update_salary
before update on employees
for each row
execute function check_sal();
- FROM / JOIN : Identifying the tables required for this SQL query
- WHERE : Filter the rows on the basis of condition
- GROUP BY : Group data based on specified condition
- HAVING : Similar to where, but applied after grouping the data
- SELECT : Select the columns to be included in the final result
- DISTINCT : Remove the duplicates from the returned result
- ORDER BY : Sorts the result ascending (ASC) or descending (DESC)
- LIMIT / OFFSET : Number of rows to be returned, and where to start from