WITH brazil_Medals AS(...)
Select
Year, Medals,
MAX(Medals)
Over(order by year asc) AS MAx_medals
From Brazil_medals;
Partitioning with aggregate window functions
OVER( PARTITION BY variable)
Frames
: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론
→ range between unbounded preceding and unbounded following
(범위를 한정해주는 역할을 한다)
Row between (start) and (finish)
n preceding : n row before the current row
current row
n following : n row after current row
ex) row between 3 preceding and current row
max(medals) over(order by year asc rows between current row and 1 following
example Frame
With Russia_medals as (...)
SELECT
Year, Medals,
Max(Medals)
Over(order by year asc) as max_medals,
max(medals)
over(order by year asc
rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc
Moving averages and totals
: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals
→ average of last n periods / total of last n periods
ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)
sum(medals) OVER( )
moving average
WITH US_medals AS(...)
SELECT
Year, Medals,
AVG(Medals) OVER
(ORDER BY Year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM US_medals
ORDER BY Year ASC
RANGE between [start] and [finish]
rows between과 비슷하다
OVER의 order by 똑같이 취급한다
RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다
Pivoting
: what if you want to transform ranking data from a vertical a horizontal structure ?
Transforming tables
pivoted by year: 열이 year의 Unique value로 들어가게 된다
CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;
select * from crosstab($$
select country, year, count(*) :: integer as awards )
FROM Summer_medals
where
country in ('CHN','RUS','USA')
AND Year IN (2008,2012)
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)
ORDER BY Country asc
ROLLUP and CUBE
: SUM aggregate function with windows
→ What if you want to calculate group-level and grand totals
Rollup : Group by subclause
ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
hierarchical
ROLLUP(Country, Medal) include Country - level total
ROLLUP(Medal,Country) include Medal - level total
CUBE : non-hierarchical ROLLUP
generate all possible group-level aggregation
ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
ROLLUP과 CUBE 비교
A survey of useful functions
ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음
Pivot & LAG and LEAD 등등
coalesce(): take a list of values and returns the first non-null value
first non-null value
null value 대체하는 것
Compress data
Rank - order ⇒ redundant
String_AGG(column, separator)
결과물 separator 이용해서 표기하는 방법
What’s in the database
NULL - missing
IS NULL, IS NOT NULL - don’t use ‘=NULL’
count(*) - number of rows
count(column_name) - number of non-NULL values
count(DISTINCT column_name) - number of different non-NULL values
SELECT DISTINCT column_name - distinct values, including NULL
The keys to the database
Foreign keys
reference another row
in different table or same table
via unique ID (primary keys) → unique + non-NULL
value in referenced column + null → if null ⇒ no relationship for that row
select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
return a value from column
select percentile_count(percentile) within group (order by column_name)
interploate between values
ex) val 1,3,4,5
common issues
error codes
example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
missing vale
NA, NaN
0 = missing or 0 ?
outlier
not really a number
zip codes, survey response categories
Creating temporary tables
Create Temporary table
# create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
Insert into table → new row 생성
INSERT INTO top_companies
Select rank, title
from fortune500
where rank between 11 and 20
Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies
Aggregate window functions
단순히 max, sum 함수를 사용한 경우
window max query
WITH brazil_Medals AS(...)
Select
Year, Medals,
MAX(Medals)
Over(order by year asc) AS MAx_medals
From Brazil_medals;
Partitioning with aggregate window functions
OVER( PARTITION BY variable)
Frames
: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론
→ range between unbounded preceding and unbounded following
(범위를 한정해주는 역할을 한다)
Row between (start) and (finish)
n preceding : n row before the current row
current row
n following : n row after current row
ex) row between 3 preceding and current row
max(medals) over(order by year asc rows between current row and 1 following
example Frame
With Russia_medals as (...)
SELECT
Year, Medals,
Max(Medals)
Over(order by year asc) as max_medals,
max(medals)
over(order by year asc
rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc
Moving averages and totals
: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals
→ average of last n periods / total of last n periods
ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)
sum(medals) OVER( )
moving average
WITH US_medals AS(...)
SELECT
Year, Medals,
AVG(Medals) OVER
(ORDER BY Year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM US_medals
ORDER BY Year ASC
RANGE between [start] and [finish]
rows between과 비슷하다
OVER의 order by 똑같이 취급한다
RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다
Pivoting
: what if you want to transform ranking data from a vertical a horizontal structure ?
Transforming tables
pivoted by year: 열이 year의 Unique value로 들어가게 된다
CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;
select * from crosstab($$
select country, year, count(*) :: integer as awards )
FROM Summer_medals
where
country in ('CHN','RUS','USA')
AND Year IN (2008,2012)
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)
ORDER BY Country asc
ROLLUP and CUBE
: SUM aggregate function with windows
→ What if you want to calculate group-level and grand totals
Rollup : Group by subclause
ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
hierarchical
ROLLUP(Country, Medal) include Country - level total
ROLLUP(Medal,Country) include Medal - level total
CUBE : non-hierarchical ROLLUP
generate all possible group-level aggregation
ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
ROLLUP과 CUBE 비교
A survey of useful functions
ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음
Pivot & LAG and LEAD 등등
coalesce(): take a list of values and returns the first non-null value
first non-null value
null value 대체하는 것
Compress data
Rank - order ⇒ redundant
String_AGG(column, separator)
결과물 separator 이용해서 표기하는 방법
What’s in the database
NULL - missing
IS NULL, IS NOT NULL - don’t use ‘=NULL’
count(*) - number of rows
count(column_name) - number of non-NULL values
count(DISTINCT column_name) - number of different non-NULL values
SELECT DISTINCT column_name - distinct values, including NULL
The keys to the database
Foreign keys
reference another row
in different table or same table
via unique ID (primary keys) → unique + non-NULL
value in referenced column + null → if null ⇒ no relationship for that row
select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
return a value from column
select percentile_count(percentile) within group (order by column_name)
interploate between values
ex) val 1,3,4,5
common issues
error codes
example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
missing vale
NA, NaN
0 = missing or 0 ?
outlier
not really a number
zip codes, survey response categories
Creating temporary tables
Create Temporary table
# create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
Insert into table → new row 생성
INSERT INTO top_companies
Select rank, title
from fortune500
where rank between 11 and 20
Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies
Aggregate window functions
단순히 max, sum 함수를 사용한 경우
window max query
WITH brazil_Medals AS(...)
Select
Year, Medals,
MAX(Medals)
Over(order by year asc) AS MAx_medals
From Brazil_medals;
Partitioning with aggregate window functions
OVER( PARTITION BY variable)
Frames
: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론
→ range between unbounded preceding and unbounded following
(범위를 한정해주는 역할을 한다)
Row between (start) and (finish)
n preceding : n row before the current row
current row
n following : n row after current row
ex) row between 3 preceding and current row
max(medals) over(order by year asc rows between current row and 1 following
example Frame
With Russia_medals as (...)
SELECT
Year, Medals,
Max(Medals)
Over(order by year asc) as max_medals,
max(medals)
over(order by year asc
rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc
Moving averages and totals
: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals
→ average of last n periods / total of last n periods
ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)
sum(medals) OVER( )
moving average
WITH US_medals AS(...)
SELECT
Year, Medals,
AVG(Medals) OVER
(ORDER BY Year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM US_medals
ORDER BY Year ASC
RANGE between [start] and [finish]
rows between과 비슷하다
OVER의 order by 똑같이 취급한다
RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다
Pivoting
: what if you want to transform ranking data from a vertical a horizontal structure ?
Transforming tables
pivoted by year: 열이 year의 Unique value로 들어가게 된다
CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;
select * from crosstab($$
select country, year, count(*) :: integer as awards )
FROM Summer_medals
where
country in ('CHN','RUS','USA')
AND Year IN (2008,2012)
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)
ORDER BY Country asc
ROLLUP and CUBE
: SUM aggregate function with windows
→ What if you want to calculate group-level and grand totals
Rollup : Group by subclause
ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
hierarchical
ROLLUP(Country, Medal) include Country - level total
ROLLUP(Medal,Country) include Medal - level total
CUBE : non-hierarchical ROLLUP
generate all possible group-level aggregation
ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
ROLLUP과 CUBE 비교
A survey of useful functions
ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음
Pivot & LAG and LEAD 등등
coalesce(): take a list of values and returns the first non-null value
first non-null value
null value 대체하는 것
Compress data
Rank - order ⇒ redundant
String_AGG(column, separator)
결과물 separator 이용해서 표기하는 방법
What’s in the database
NULL - missing
IS NULL, IS NOT NULL - don’t use ‘=NULL’
count(*) - number of rows
count(column_name) - number of non-NULL values
count(DISTINCT column_name) - number of different non-NULL values
SELECT DISTINCT column_name - distinct values, including NULL
The keys to the database
Foreign keys
reference another row
in different table or same table
via unique ID (primary keys) → unique + non-NULL
value in referenced column + null → if null ⇒ no relationship for that row
select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
return a value from column
select percentile_count(percentile) within group (order by column_name)
interploate between values
ex) val 1,3,4,5
common issues
error codes
example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
missing vale
NA, NaN
0 = missing or 0 ?
outlier
not really a number
zip codes, survey response categories
Creating temporary tables
Create Temporary table
# create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
Insert into table → new row 생성
INSERT INTO top_companies
Select rank, title
from fortune500
where rank between 11 and 20
Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies
Aggregate window functions
단순히 max, sum 함수를 사용한 경우
window max query
WITH brazil_Medals AS(...)
Select
Year, Medals,
MAX(Medals)
Over(order by year asc) AS MAx_medals
From Brazil_medals;
Partitioning with aggregate window functions
OVER( PARTITION BY variable)
Frames
: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론
→ range between unbounded preceding and unbounded following
(범위를 한정해주는 역할을 한다)
Row between (start) and (finish)
n preceding : n row before the current row
current row
n following : n row after current row
ex) row between 3 preceding and current row
max(medals) over(order by year asc rows between current row and 1 following
example Frame
With Russia_medals as (...)
SELECT
Year, Medals,
Max(Medals)
Over(order by year asc) as max_medals,
max(medals)
over(order by year asc
rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc
Moving averages and totals
: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals
→ average of last n periods / total of last n periods
ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)
sum(medals) OVER( )
moving average
WITH US_medals AS(...)
SELECT
Year, Medals,
AVG(Medals) OVER
(ORDER BY Year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM US_medals
ORDER BY Year ASC
RANGE between [start] and [finish]
rows between과 비슷하다
OVER의 order by 똑같이 취급한다
RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다
Pivoting
: what if you want to transform ranking data from a vertical a horizontal structure ?
Transforming tables
pivoted by year: 열이 year의 Unique value로 들어가게 된다
CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;
select * from crosstab($$
select country, year, count(*) :: integer as awards )
FROM Summer_medals
where
country in ('CHN','RUS','USA')
AND Year IN (2008,2012)
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)
ORDER BY Country asc
ROLLUP and CUBE
: SUM aggregate function with windows
→ What if you want to calculate group-level and grand totals
Rollup : Group by subclause
ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
hierarchical
ROLLUP(Country, Medal) include Country - level total
ROLLUP(Medal,Country) include Medal - level total
CUBE : non-hierarchical ROLLUP
generate all possible group-level aggregation
ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
ROLLUP과 CUBE 비교
A survey of useful functions
ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음
Pivot & LAG and LEAD 등등
coalesce(): take a list of values and returns the first non-null value
first non-null value
null value 대체하는 것
Compress data
Rank - order ⇒ redundant
String_AGG(column, separator)
결과물 separator 이용해서 표기하는 방법
What’s in the database
NULL - missing
IS NULL, IS NOT NULL - don’t use ‘=NULL’
count(*) - number of rows
count(column_name) - number of non-NULL values
count(DISTINCT column_name) - number of different non-NULL values
SELECT DISTINCT column_name - distinct values, including NULL
The keys to the database
Foreign keys
reference another row
in different table or same table
via unique ID (primary keys) → unique + non-NULL
value in referenced column + null → if null ⇒ no relationship for that row
select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
return a value from column
select percentile_count(percentile) within group (order by column_name)
interploate between values
ex) val 1,3,4,5
common issues
error codes
example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
missing vale
NA, NaN
0 = missing or 0 ?
outlier
not really a number
zip codes, survey response categories
Creating temporary tables
Create Temporary table
# create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
Insert into table → new row 생성
INSERT INTO top_companies
Select rank, title
from fortune500
where rank between 11 and 20
Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies
Aggregate window functions
단순히 max, sum 함수를 사용한 경우
window max query
WITH brazil_Medals AS(...)
Select
Year, Medals,
MAX(Medals)
Over(order by year asc) AS MAx_medals
From Brazil_medals;
Partitioning with aggregate window functions
OVER( PARTITION BY variable)
Frames
: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론
→ range between unbounded preceding and unbounded following
(범위를 한정해주는 역할을 한다)
Row between (start) and (finish)
n preceding : n row before the current row
current row
n following : n row after current row
ex) row between 3 preceding and current row
max(medals) over(order by year asc rows between current row and 1 following
example Frame
With Russia_medals as (...)
SELECT
Year, Medals,
Max(Medals)
Over(order by year asc) as max_medals,
max(medals)
over(order by year asc
rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc
Moving averages and totals
: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals
→ average of last n periods / total of last n periods
ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)
sum(medals) OVER( )
moving average
WITH US_medals AS(...)
SELECT
Year, Medals,
AVG(Medals) OVER
(ORDER BY Year ASC
ROWS BETWEEN
댓글 영역