SQL basics
Introduction to SQL
Terminology
- table in the database: as the name suggests
- fields: column names, such as
id
- key: the column to match different tables
Sorting ORDER BY
By default, in ascending order (smallest to largest). Or else, use DESC
or ASC
to clarify.
SELECT *
FROM customers
ORDER BY age DESC
Text are ordered in alphabetical.
SELECT *
FROM books
ORDER BY title ASC -- alphabetical
Limiting data LIMIT
, OFFSET
LIMIT
Only returns a few. Used in the end.
SELECT *
FROM customers
LIMIT 3
OFFSET
parameter is used with LIMIT
to skip a number of records
SELECT *
FROM customers
LIMIT 3 OFFSET 2 -- returns the 3, 4, 5th record
Filtering WHERE
SELECT name
FROM sales
WHERE year = 2024 -- country = 'USA'
Aliases AS
Used to create a new variable
SELECT name, price*delivery AS multiplication
FROM sales
-- for strings, can use concat to combine
SELECT CONCAT(first_name, last_name) AS full_name
FROM employees
Pattern matching LIKE %
Wild cards %
and _
.
'%case%'
'A_B8'
, where underscore is one character- mixed:
'A_B%'
SELECT *
FROM comics
WHERE title
LIKE 'The Avengers%' -- used for strings, and case sensitive
Upper and lower cases conversion
/*
Converts strings in the title field to lowercase
*/
SELECT LOWER(title)
FROM movies;
/*
Converts strings in the title field to uppercase
*/
SELECT UPPER(title)
FROM movies;
Can be combined like this for case insensitive search
SELECT title
FROM books
WHERE LOWER(title) LIKE '%cooking%'
Aggregation MAX
, MIN
, COUNT
, SUM
, AVG
SELECT MAX(year)
FROM movies
Grouping GROUP BY
SELECT genre, AVG(budget)
FROM movies
GROUP BY genre;
Typically when combining grouping and filtering with WHERE
, the filtering happens first. Using HAVING
, filter data after grouping
SELECT genre, AVG(budget)
FROM movies
GROUP BY genre
HAVING AVG(budget) > 50;
Case
Those do not match the conditions are left empty, unless else
is used.
SELECT firstname, lastname,
CASE
WHEN age >= 65 THEN 'senior'
WHEN age >= 25 AND age <65 THEN 'adult'
ELSE 'youth' -- all the others
END AS category
FROM customers
Checking data quality
Duplicates
--checking for id duplicates
SELECT id, COUNT(id)
FROM employees
GROUP BY id
HAVING COUNT(id) > 1;
Missing, IS NULL
, IS NOT NULL
SELECT *
FROM movies
WHERE genre IS NULL
More than one table
Select without joining
This uses WHERE
to bind the id
SELECT firstname, lastname, city, number, type
FROM Customers, PhoneNumbers
WHERE Customers.id = PhoneNumbers.customer_id
Join
SELECT ...FROM tab1 JOIN tab2 ON key1 = key2
The example above would become
SELECT firstname, lastname, city, number, type
FROM Customers JOIN PhoneNumbers
ON Customers.id = PhoneNumbers.customer_id
Left join combined with alias
SELECT C.firstname, C.lastname, C.city, PN.number, PN.type
FROM Customers AS C LEFT JOIN PhoneNumbers AS PN
ON C.id = PN.customer_id
Union
Combines the results from two SELECT by row, can be considered as rbind
. The columns need to be the same from two selects.
SELECT firstname, lastname, age FROM Customers
UNION
SELECT firstname, lastname, age FROM Contacts
It removes the duplicates. UNION ALL
keeps the duplicates.
When the columns don’t match, need to create NULL column
SELECT firstname, lastname, age FROM Customers
UNION
SELECT firstname, lastname, NULL FROM Contacts
Union with conditions,
SELECT firstname, lastname, age FROM Customers
WHERE age > 30
UNION
SELECT firstname, lastname, age FROM Contacts
WHERE age < 25
Subquery
Find the average for values computed from a middle-step,
SELECT AVG(count) FROM
-- the following computes the count per person, needed for the average final step
SELECT C.id, COUNT(PN.number) AS count
(FROM Customers AS C
LEFT JOIN PhoneNumbers AS PN
ON C.id = PN.customer_id
GROUP BY C.id)
AS Numbers
Common Table Expressions (CTEs)
WITH mytable1 as (
SELECT FROM WHERE....
)
as (
, mytable2 SELECT FROM WHERE....
)
SELECT b.asd, avg(mmm), avg(yyy)
FROM mytable1 as a
LEFT JOIN mytable2 as a
ON b..... = a....
WHERE ...
GROUP BY .... ORDER BY ...
;
SELECT b.asd, avg(mmm), avg(yyy)
FROM (
SELECT ...
FROM table1
WHERE ...
GROUP BY mmm
as a
) LEFT JOIN (
SELECT ...
FROM table1
WHERE ...
GROUP BY yyy
as a
) ON b..... = a....
WHERE ...
GROUP BY .... ORDER BY ...
Useful functions
String functions
CONCAT
: like paste in R, combines two strings as oneUPPER
,LOWER
SUBSTRING
, takes part of the string (e.g. starting from 1st character, up to next 3)REPLACE
, replaces certain values
SELECT CONCAT(firstname, '', lastname) AS name
FROM contacts;
SELECT UPPER(name) -- or LOWER
FROM contacts;
SELECT SUBSTRING(names, 1, 3) -- 1st, 2nd, 3rd
FROM contacts
SELECT
firstname,
lastnameREPLACE(city, 'new york', ny) -- only replaces where city matches new york, others remain the same
FROM contacts
Can combine functions together
SELECT CONCAT(
1, 1), -- first character
SUBSTRING(firstname, '. ', -- add a dot
UPPER(lastname)) AS name
FROM Customers
Others
Identity
Create a table
CREATE TABLE customers (
id int NOT NULL AUTO_INCREMENT, -- incremental integers
varchar(255),
firstname varchar(255)
lastname )
Add new rows, no need to insert id number
INSERT INTO Customers (firstname, lastname, city, age)
VALUES
'demo', 'demo', 'Paris', 52),
('test', 'test', 'London', 21) (
If need to alter the id, can use ALTER TABLE
Keys
Primary key, uniquely identifies the rows of a table. It must contain unique values, can not have NULL values and one table can only have one primary key.
CREATE TABLE customers (
id int NOT NULL AUTO_INCREMENT, -- incremental integers
varchar(255),
firstname varchar(255),
lastname PRIMARY KEY (id) -- set the key
)
Foreign key is a primary key in another table. There can be multiple foreign keys in a table.
CREATE TABLE phonenumbers (
id int NOT NULL AUTO_INCREMENT, -- incremental integers
int NOT NULL,
customer_id type varchar(255),
PRIMARY KEY (id), -- set the key
FOREIGN KEY (customer_id) REFERENCES
id)
customer( )
Unique
ALTER TABLE customers
ADD UNIQUE (lastname)