3. Correlated Queries, Nested Queries, and Common Table Expressions
3.1. Correlated subqueries
정의 : outer query의 값을 이용해서 결과를 내는 special subquery( 내부 subquery에서 외부 table의 값을 참조하고 싶을 때)
실행 : final dataset의 모든 row마다 다시 실행됨
사용 : advanced joining, filtering, evaluating data 같은 특별한 연산에 사용
예시 : 평균보다 많은 골이 득점된 경기 찾기
SELECT
s.stage,
ROUND(s.avg_goals,2) AS avg_goal,
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season='2012/2013') AS overall_avg
FROM
(SELECT
stage,
AVG(home_goal+away_goal) AS avg_goals
FROM match
WHERE season='2012/2013'
GROUP BY stage) AS s
WHERE s.avg_goals>(SELECT AVG(home_goal+away_goal)
FROM match AS m
WHERE s.stage>m.stage);
SELECT s.stage, ROUND(s.avg_goals,2_ AS avg_goal, (SELECT AVG(home_goal + away_goal) FROM match WHERE season='2012/2013') AS overall_avg FROM (SELECT stage, AVG(home_goal+away_goal) AS avg_goals FROM match WHERE season='2012/2013' GROUP BY stage) AS s WHERE s.avg_goals > (SELECT AVG(home_goal+away_goal) FROM match WHERE season='2012/2013');
simple vs correlated subqueries
simple
main query와 독립적으로 실행될 수 있음
전체 query에서 한번만 실행됨
예시 : 국가별 평균 골
SELECT
c.name AS country
AVG(m.home_goal+m.away_goal)
AS avg_goals
FROM country AS c
LEFT JOIN match AS m
ON c.id=m.country_id
GROUP BY country;
correlated
main query가 실행되어야 실행 가능
생성되는 모든 row마다 실행됨 → 실행시간 증가
SELECT
c.name AS country,
(SELECT
AVG(home_goal+away_goal)
FROM match AS m
WHERE m.country_id=c.id)
AS avg_goals
FROM country AS c
GROUP BY country;
3.2. Nested subqueries
정의 : 다른 subquery에 nested 된 subquery
사용: 여러번의 변환이 필요할 때
예시 : 월별 총 골 수와 월별 평균 골 수의 차이Extract : 날짜에서 월을 가져옴
SELECT EXTRACT(MONTH FROM date) AS month, SUM(m.home_goal+m.away_goal) AS total_goals, SUM(m.home_goal+m.away_goal)- (SELECT AVG(goals) FROM (SELECT EXTRACT(MONTH from date) AS month, SUM(home_goal+away_goal) AS goals FROM match GROUP BY month) AS s) AS diff FROM match AS m GROUP by month;
correlated nested subqueries
SELECT
c.name AS country,
(SELECT AVG(home_goal+away_goal)
FROM match AS m
WHERE m.coutry_id=c.id
AND id IN(
SELECT id
FROM match
WHERE season='2011/2012')) AS avg_goals
FROM country AS c
GROUP BY country;
예시 : 2011/2012 시즌에 각 국가의 평균 골 수
3.3. Common Table Expressions(CTE)
필요성 : 길어지고 읽기 어려운 query
정의 : main query 전에 선언된 table
사용 : WITH
WITH cte AS( SELECT col1,col2 FROM table)
예시
WITH s AS( SELECT country_id,id FROM match WHERE (home_goal+away_goal)>=10 ) SELECT c.name AS country, COUNT(s.id) AS matches, FROM country AS c INNER JOIN s ON c.id=s.country_id GROUP BY country;
multiple CTEs
WITH s1 AS( SELECT country_id,id FROM match WHERE (home_goal+away_goal)>=10), s2 AS( SELECT country_id,id FROM match WHERE (home_goal+away_goal)<=1 )
why?
한번만 실행됨 - 메모리에 저장되어 query 효율이 좋아짐
query를 정리하는 데에 도움
다른 CTE를 reference 할 수 있음
스스로를 reference 할 수 있음(SELF JOIN)
3.4. Deciding on techniques to use
같은 task를 할 수 있는 여러 가지 방법을 배웠음
differentiating Techniques
2+ table 합칠 때
간단한 연산
correlated subqueries
subquery와 table을 match
한 table 당 두 개의 column을 join 할 수 없는 join의 한계 극복
시간 오래 걸림
Join
Multiple/Nested Subqueries
여러 단계의 변환이 필요할 때
Common Table Expressions
다양한 piece of information이 필요할 때
subquery를 순차적으로 정리
다른 CTE를 참조 가능
4. Window Functions
4.1. It’s OVER
필요성
열 기준 연산들 → 행을 기준으로는?
GROUP BY : aggregate value를 non-aggregate data와 비교할 수 없음
Window functions
이미 generate된 result set(window)을 가지고 연산, 비교할 수 있도록
data를 group by 하지 않아도 aggregate calculation을 할 수 있음 - SELECT 안의 subquery처럼
running total, ranking, moving average 등
비교 : 2011/2012 시즌의 득점 수, 득점 평균
SELECT
date,
(home_goal+away_goal) AS goals,
(SELECT AVG(home_goal+away_goal)
FROM match
WHERE season='2011/2012') AS
overall_avg
FROM match
WHERE season='2011/2012';
SELECT
date,
(home_goal+away_goal) AS goals,
AVG(home_goal+away_goal) OVER() AS overall_avg
FROM match
WHERE season='2011/2012';
generated result set을 바로 사용
Generate a RANK
SELECT
date,
(home_goal + away_goal) AS goals,
RANK() OVER(ORDER BY home_goal+away_goal DESC) AS goals_rank
FROM match
WHERE season='2011/2012'
RANK() OVER(ORDER BY {순위를 매기고 싶은 기준)}동점이 있으면 skip 1→3
default : ASC
예시 : 득점 수를 순위 매기기
key differences
ORDER BY를 제외하고 모든 query 이후에 연산되기 때문에, 기존 database가 아닌 result set의 정보를 사용함
SQLite에는 없음
4.2. OVER with a PARTITION
PARTITION BY
서로 다른 카테고리에서 value를 연산할 수 있음
같은 column에서 다른 aggregate value를 연산할 수 있게 함
사용 : AVG(home_goal) OVER(PARTITION BY {카테고리})
SELECT date, (home_goal+away_goal) AS goals, AVG(home_goal+away_goal) OVER(PARTITION BY season) AS season_avg FROM match;
각 match의 득점 수, season의 평균 득점 수
PARTITION BY multiple columns
OVER(PARTITION BY m.season,c.name) AS season_ctry_avg
PARTITION BY considerations
한 개 이상의 column에 대한 data를 partition 할 수 있음
aggregate calculation, rank 등에 대해서도 partition 할 수 있음
4.3. Sliding windows
필요성
aggregate function , rank 외에 dataset의 subsequent row에 따라 변하는 값을 calculate
Sliding windows
현재 row에 대해 상대적으로 연산을 수행하는 window function
1개 이상의 column에 대해 partition 될 수 있음
Sliding window keywords
<start>, <finish>에 들어가는 키워드FOLLOWING : row afterUNBOUNDED FOLLOWING : every row til end
CURRENT ROW : stop at current row
UNBOUNDED PRECEDING : every row since beginning
PRECEDING : row before
예시
SELECT
date,
home_goal,
away_goal,
SUM(home_goal)
OVER(ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM match
WHERE hometown_id=8456 AND season='2011/2012';
ROWS BETWEEN <start> AND <finish>
Sliding window frame
OVER(ORDER BY date ROWS BETWEEN
1 PRECEDING AND CURRENT ROW) AS running_total
4.4. Bringing it all together
case study : Who defeated 맨유 in 2013/2014 season?
get team names with CTEs
get match outcome with CASE
determine how badly they los with window function
-- Set up the home team CTE
WITH home AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome
FROM match AS m
LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
-- Set up the away team CTE
away AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
WHEN m.home_goal < m.away_goal THEN 'MU Win'
ELSE 'Tie' END AS outcome
FROM match AS m
LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
-- Select columns and and rank the matches by goal difference
SELECT DISTINCT
m.date,
home.team_long_name AS home_team,
away.team_long_name AS away_team,
m.home_goal, m.away_goal,
RANK() OVER(ORDER BY ABS(home_goal - away_goal) DESC) as match_rank
-- Join the CTEs onto the match table
FROM match AS m
LEFT JOIN home ON m.id = home.id
LEFT JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
AND ((home.team_long_name = 'Manchester United' AND home.outcome = 'MU Loss')
OR (away.team_long_name = 'Manchester United' AND away.outcome = 'MU Loss'));
PostgreSQL Summary Stats and Window Functions
1. Introduction to window functions
1.1. Introduction
Window functions
현재 row와 관련 있는 연산을 수행
GROUP BY 로 aggregate 연산을 수행하는 것과 비슷하지만, 모든 row가 결과에 포함됨
사용
바로 이전, 이후 row의 값을 fetch
row의 rank를 assign
running total, running average
Anatomy of a window function
OVER 가 window function임을 나타냄
subquery 예 : ORDER BY, PARTITION BY, ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED
FUCTION_NAME() OVER (empty or subquery)
Row numbers
대표적인 window function
ROW_NUMBER결과
SELECT YEAR, Event, Country, ROW_NUMBER() OVER () AS Row_N FROM Summer_Medals WHERE Medal='Gold';
1.2. ORDER BY
ORDER BY as subclauses within OVER clause
예시 : ORDER BY Year DESC : 가장 최근 연도가 1이 됨
SELECT Year, Event, Country, ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N FROM Summer_Medals WHERE Medal='Gold';
Ordering by multiple columns
ROW_NUMBER() OVER (ORDER BY Year DESC, EVENT ASC) AS Row_N
Ordering in-and-outside OVER
in : assign Row_N
out : 결과 dataset의 출력 순서
SELECT
Year, Event, Country,
ROW_NUMBER() OVER (ORDER BY Year DESC, Event ASC) AS Row_N
FROM Summer_Medals
WHERE
Medal='Gold';
ORDER BY Country ASC, Row_N ASC;
결과
실행 순서 : in(OVER)→out
Application : Reigning champion( 현재, 이전 년도를 모두 이긴 champion)
전년도와 이번 년도의 champion이 같은 row의 두 column이 되어 비교할 수 있어야 함
keyword : LAG(column,n)
현재 row에서 같은 columns의 n개 이전 row값을 return
n=1이면 바로 이전 row의 값을 반환
WITH Discus_Gold AS (
SELECT
Year, Country AS Champion
FROM Summer_Medals
WHERE
Year IN (1996, 2000, 2004, 2008, 2012)
AND Gender='Men' AND Medal='Gold'
AND Event='Discus Throw')
SELECT
YEAR, Champion,
LAG(Champion, 1) OVER
(ORDER BY Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Year ASC;
결과
1.3. PARTITION BY
column의 값에 따라 split table into partition
GROUP BY와 역할이 유사하지만 하나의 column으로 합쳐지는 것이 아님
window function을 이용해서 따로따로 연산됨
ROW_NUMBER : 각 partition 마다 reset
LAG : 같은 partition의 previous value만 fetch
예시 : 종목별로 reigning champion 찾기결과
WITH Discus_Gold AS (...) SELECT Year, Event, Champion, LAG(Champion) OVER (PARTITION BY Event ORDER BY Event ASC, YEAR ASC) AS Last_Champion FROM Discus_Gold ORDER BY Event ASC, YEAR ASC;
Partitioning by multiple columns
ROW_NUMBER() OVER (PARTITION BY Year, Country)
2. Fetching, ranking, and paging
2.1. Fetching
Fetching : table의 값을 하나의 column으로 fetch해 오는 것
The four functions
Relative : current row를 기준으로
LAG(column, n) : 현재 row를 기준으로 같은 column의 n row 전의 값을 return
LEAD(column,n) : 현재 row를 기준으로 같은 column의 n row 이후 값을 return
Absolute: current row와 무관
FIRST_VALUE(column) : table 또는 partition의 첫번째 값을 return
LAST_VALUE(column) : table 또는 partition의 마지막 값을 return
LEAD
WITH Hosts AS(
SELECT DISTINCT Year, City
FROM Summer_Medals)
SELECT
Year, City,
LEAD(City,1) OVER (ORDER BY Year ASC)
AS Next_City,
LEAD(City,2) OVER (ORDER BY YEAR ASC)
AS After_Next_City
FROM Hosts
ORDER BY Year ASC;
예시 : 해당 년도 올림픽 개최국, 다음 올림픽 개최국, 그 다음 올림픽 개최국 fetch
FIRST_VALUE and LAST_VALUE
LAST_VALUE는 range를 정해줘야 한다
SELECT
Year, City,
FIRST_VALUE(City) OVER
(ORDER BY YEAR ASC) AS First_City,
LAST_VALUE(City) OVER (
ORDER BY YEAR ASC
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS Last_City
FROM Hosts
ORDER BY Year ASC;
예시 : 첫번째 올림픽 개최국, 마지막 올림픽 개최국
PARTITION BY 와 함께 사용할 수 있다
2.2. Ranking
The ranking functionsROW_NUMBER
unique하게 assign 됨. 값이 같아도 다른 값을 가지고 rank를 매김
마지막 rank가 전체 row의 수와 같다
RANK
같은 값이면 같은 값을 assign 하고 skip 1→3
마지막 rank가 전체 row의 수와 같다
DENSE_RANK
같은 값이면 같은 값을 assign하지만, skip 하지 않음. 1→2
마지막 rank의 값은 unique value의 수
Ranking with partitioning
WITH Country_Medals AS (...)
SELECT
Country, Athlete,
DENSE_RANK()
OVER (PARTITION BY Country
ORDER BY Medals DESC) AS Rank_N
FROM Country_Medals
ORDER BY Country ASC, Medals DESC;
예시 : 국가별로 rank
2.3. Paging
Paging : data를 비슷한 크기의 chunk로 split 하는 것
보내지는 data의 양을 줄이기 위해 page 단위로 return
Performance를 확인하기 위해 quartile이나 third로 나눔
NTILE(n)
n page로 data를 split
예시 : 총 67 row의 data를 15개의 chunk로 나눔결과
WITH Disciplines AS ( SELECT DISTINCT Discipline FROM Summer_Medals) SELECT Discipline, NTILE(15) OVER () AS Page FROM Disciplines ORDER BY PAGE Asc;
Top, middle, and bottom thirds
n=3으로 설정, ORDER BY
예시 : 메달 수를 기준으로 3개의 chunk로 나눔
WITH Country_Medals AS( SELECT Country, COUNT(*) AS Medals FROM Summer_Medals GROUP BY Country), SELECT Country, Medals NTILE(3) OVER (ORDER BY Medals DESC) AS THIRD FROM Country_Medals;
활용 : Thirds averages
예시 : 메달 수를 기준으로 3개의 chunk로 국가를 나눴을 때, 각 chunk 내부에서 메달 개수의 평균 구하기
WITH Country_Medals AS (...), Thirds AS( SELECT Country, Medals, NTILE(3) OVER (ORDER BY Medals DESC) AS Third FROM Country_Medals) SELECT Third, ROUND(AVG(Medals),2) AS Avg_Medals FROM Thirds GROUP BY Third ORDER BY Third ASC;
댓글 영역