상세 컨텐츠

본문 제목

[SQL 스터디_1팀] 5주차_강의노트 (2)

심화 스터디/SQL 스터디

by 라라밤쥬 2023. 5. 2. 10:23

본문

https://stormy-soup-0ee.notion.site/7561c17a9d4f43048d112b092982073a

 

박지우

Aggregate window functions

stormy-soup-0ee.notion.site

- 상세 설명 (이미지 포함) 위의 노션 참고 

Aggregate window functions

  • 단순히 max, sum 함수를 사용한 경우
  • window max query
WITH brazil_Medals AS(...)

Select 
 Year, Medals,
 MAX(Medals)
	Over(order by year asc) AS MAx_medals
From Brazil_medals;
  • Partitioning with aggregate window functions
    • OVER( PARTITION BY variable)

Frames

: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론

→ range between unbounded preceding and unbounded following

(범위를 한정해주는 역할을 한다)

  • Row between (start) and (finish)
    • n preceding : n row before the current row
    • current row
    • n following : n row after current row

ex) row between 3 preceding and current row

max(medals) over(order by year asc rows between current row and 1 following

  • example Frame
With Russia_medals as (...)

SELECT
Year, Medals,
Max(Medals)
	Over(order by year asc) as max_medals,
max(medals)
	over(order by year asc 
				rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc

Moving averages and totals

: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals

→ average of last n periods / total of last n periods

ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)

sum(medals) OVER( )

  • moving average
WITH US_medals AS(...)

SELECT 
Year, Medals,
AVG(Medals) OVER
	(ORDER BY Year ASC
	ROWS BETWEEN 
	2 PRECEDING AND CURRENT ROW) AS Medals_MA 
FROM US_medals
ORDER BY Year ASC
  • RANGE between [start] and [finish]
    • rows between과 비슷하다
    • OVER의 order by 똑같이 취급한다
    • RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다

Pivoting

: what if you want to transform ranking data from a vertical a horizontal structure ?

Transforming tables

  • pivoted by year: 열이 year의 Unique value로 들어가게 된다
  • CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

select * from crosstab($$ 
		select country, year, count(*) :: integer as awards ) 
FROM Summer_medals 
where 
country in ('CHN','RUS','USA')
AND Year IN (2008,2012) 
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)

ORDER BY Country asc   

ROLLUP and CUBE

: SUM aggregate function with windows

→ What if you want to calculate group-level and grand totals

  • Rollup : Group by subclause
    • ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
    • hierarchical
      • ROLLUP(Country, Medal) include Country - level total
      • ROLLUP(Medal,Country) include Medal - level total
  • CUBE : non-hierarchical ROLLUP
    • generate all possible group-level aggregation
      • ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
    • ROLLUP과 CUBE 비교

A survey of useful functions

ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음

Pivot & LAG and LEAD 등등

  • coalesce(): take a list of values and returns the first non-null value
    • first non-null value
    • null value 대체하는 것
  1. Compress data
  • Rank - order ⇒ redundant
  • String_AGG(column, separator)
    • 결과물 separator 이용해서 표기하는 방법

What’s in the database

  • NULL - missing
  • IS NULL, IS NOT NULL - don’t use ‘=NULL’
  • count(*) - number of rows
  • count(column_name) - number of non-NULL values
  • count(DISTINCT column_name) - number of different non-NULL values
  • SELECT DISTINCT column_name - distinct values, including NULL

The keys to the database

  • Foreign keys
    • reference another row
      • in different table or same table
      • via unique ID (primary keys) → unique + non-NULL
      • value in referenced column + null → if null ⇒ no relationship for that row
  • coalesce function
    • coalesce(value_1, value_2 .. )
      • operate row by row
      • return first non-null values

Column types and constraints

  • Foreign key : value in referenced column or NULL
  • Primary key : unique / not NULL
  • Unique : all be different except for NULL
  • Not null : NULL not allowed
  • Check constraints : conditions
    • column1 > 0
    • columnA > columnB
  • Data types
    • Common
      • Numeric
      • Character
      • Date / Time
      • Boolean
    • Special
      • Arrays
      • Monetary
      • Binary
      • Geometric …
  • CAST() : data type 변경하기
    • SELECT CAST( value AS new_type)
      • CAST(3.7 AS integer) → 4
    • CAST() == ::
      • SELECT 3.7 ::integer

Numeric data types and summary functions

  • integer
  • decimal
  • Division
    • select 10/4 → integer
    • select 10/4.0 → numeric
  • min and max
  • mean
  • Variance
    • var_pop → population
    • var_samp → sample / variance = var_samp
  • Standard deviation
    • stddev_samp == stddev
    • stddev_pop
  • Round

Exploring distributions

  • Truncate
    • select trunc(42.1256,2) → 42.12
    • select trunc(12345,-3) → 12000
    • ex) Truncating and grouping
  • Generate series
    • select generate_series(1,10,2) → 1부터 10까지 2개씩 넘어가면서
    • Create bins: query

More summary functions

  • Correlation - relationship between two variables
    • select corr(assets, equity)
  • Median
    • select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
      • return a value from column
    • select percentile_count(percentile) within group (order by column_name)
      • interploate between values
      • ex) val 1,3,4,5
  • common issues
    • error codes
      • example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
    • missing vale
      • NA, NaN
      • 0 = missing or 0 ?
    • outlier
    • not really a number
      • zip codes, survey response categories

Creating temporary tables

  • Create Temporary table
  • # create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
  • Insert into table → new row 생성
INSERT INTO top_companies 
Select rank, title 
from fortune500
where rank between 11 and 20 
  • Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies 

Aggregate window functions

  • 단순히 max, sum 함수를 사용한 경우
  • window max query
WITH brazil_Medals AS(...)

Select 
 Year, Medals,
 MAX(Medals)
	Over(order by year asc) AS MAx_medals
From Brazil_medals;
  • Partitioning with aggregate window functions
    • OVER( PARTITION BY variable)

Frames

: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론

→ range between unbounded preceding and unbounded following

(범위를 한정해주는 역할을 한다)

  • Row between (start) and (finish)
    • n preceding : n row before the current row
    • current row
    • n following : n row after current row

ex) row between 3 preceding and current row

max(medals) over(order by year asc rows between current row and 1 following

  • example Frame
With Russia_medals as (...)

SELECT
Year, Medals,
Max(Medals)
	Over(order by year asc) as max_medals,
max(medals)
	over(order by year asc 
				rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc

Moving averages and totals

: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals

→ average of last n periods / total of last n periods

ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)

sum(medals) OVER( )

  • moving average
WITH US_medals AS(...)

SELECT 
Year, Medals,
AVG(Medals) OVER
	(ORDER BY Year ASC
	ROWS BETWEEN 
	2 PRECEDING AND CURRENT ROW) AS Medals_MA 
FROM US_medals
ORDER BY Year ASC
  • RANGE between [start] and [finish]
    • rows between과 비슷하다
    • OVER의 order by 똑같이 취급한다
    • RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다

Pivoting

: what if you want to transform ranking data from a vertical a horizontal structure ?

Transforming tables

  • pivoted by year: 열이 year의 Unique value로 들어가게 된다
  • CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

select * from crosstab($$ 
		select country, year, count(*) :: integer as awards ) 
FROM Summer_medals 
where 
country in ('CHN','RUS','USA')
AND Year IN (2008,2012) 
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)

ORDER BY Country asc   

ROLLUP and CUBE

: SUM aggregate function with windows

→ What if you want to calculate group-level and grand totals

  • Rollup : Group by subclause
    • ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
    • hierarchical
      • ROLLUP(Country, Medal) include Country - level total
      • ROLLUP(Medal,Country) include Medal - level total
  • CUBE : non-hierarchical ROLLUP
    • generate all possible group-level aggregation
      • ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
    • ROLLUP과 CUBE 비교

A survey of useful functions

ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음

Pivot & LAG and LEAD 등등

  • coalesce(): take a list of values and returns the first non-null value
    • first non-null value
    • null value 대체하는 것
  1. Compress data
  • Rank - order ⇒ redundant
  • String_AGG(column, separator)
    • 결과물 separator 이용해서 표기하는 방법

What’s in the database

  • NULL - missing
  • IS NULL, IS NOT NULL - don’t use ‘=NULL’
  • count(*) - number of rows
  • count(column_name) - number of non-NULL values
  • count(DISTINCT column_name) - number of different non-NULL values
  • SELECT DISTINCT column_name - distinct values, including NULL

The keys to the database

  • Foreign keys
    • reference another row
      • in different table or same table
      • via unique ID (primary keys) → unique + non-NULL
      • value in referenced column + null → if null ⇒ no relationship for that row
  • coalesce function
    • coalesce(value_1, value_2 .. )
      • operate row by row
      • return first non-null values

Column types and constraints

  • Foreign key : value in referenced column or NULL
  • Primary key : unique / not NULL
  • Unique : all be different except for NULL
  • Not null : NULL not allowed
  • Check constraints : conditions
    • column1 > 0
    • columnA > columnB
  • Data types
    • Common
      • Numeric
      • Character
      • Date / Time
      • Boolean
    • Special
      • Arrays
      • Monetary
      • Binary
      • Geometric …
  • CAST() : data type 변경하기
    • SELECT CAST( value AS new_type)
      • CAST(3.7 AS integer) → 4
    • CAST() == ::
      • SELECT 3.7 ::integer

Numeric data types and summary functions

  • integer
  • decimal
  • Division
    • select 10/4 → integer
    • select 10/4.0 → numeric
  • min and max
  • mean
  • Variance
    • var_pop → population
    • var_samp → sample / variance = var_samp
  • Standard deviation
    • stddev_samp == stddev
    • stddev_pop
  • Round

Exploring distributions

  • Truncate
    • select trunc(42.1256,2) → 42.12
    • select trunc(12345,-3) → 12000
    • ex) Truncating and grouping
  • Generate series
    • select generate_series(1,10,2) → 1부터 10까지 2개씩 넘어가면서
    • Create bins: query

More summary functions

  • Correlation - relationship between two variables
    • select corr(assets, equity)
  • Median
    • select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
      • return a value from column
    • select percentile_count(percentile) within group (order by column_name)
      • interploate between values
      • ex) val 1,3,4,5
  • common issues
    • error codes
      • example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
    • missing vale
      • NA, NaN
      • 0 = missing or 0 ?
    • outlier
    • not really a number
      • zip codes, survey response categories

Creating temporary tables

  • Create Temporary table
  • # create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
  • Insert into table → new row 생성
INSERT INTO top_companies 
Select rank, title 
from fortune500
where rank between 11 and 20 
  • Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies 

Aggregate window functions

  • 단순히 max, sum 함수를 사용한 경우
  • window max query
WITH brazil_Medals AS(...)

Select 
 Year, Medals,
 MAX(Medals)
	Over(order by year asc) AS MAx_medals
From Brazil_medals;
  • Partitioning with aggregate window functions
    • OVER( PARTITION BY variable)

Frames

: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론

→ range between unbounded preceding and unbounded following

(범위를 한정해주는 역할을 한다)

  • Row between (start) and (finish)
    • n preceding : n row before the current row
    • current row
    • n following : n row after current row

ex) row between 3 preceding and current row

max(medals) over(order by year asc rows between current row and 1 following

  • example Frame
With Russia_medals as (...)

SELECT
Year, Medals,
Max(Medals)
	Over(order by year asc) as max_medals,
max(medals)
	over(order by year asc 
				rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc

Moving averages and totals

: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals

→ average of last n periods / total of last n periods

ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)

sum(medals) OVER( )

  • moving average
WITH US_medals AS(...)

SELECT 
Year, Medals,
AVG(Medals) OVER
	(ORDER BY Year ASC
	ROWS BETWEEN 
	2 PRECEDING AND CURRENT ROW) AS Medals_MA 
FROM US_medals
ORDER BY Year ASC
  • RANGE between [start] and [finish]
    • rows between과 비슷하다
    • OVER의 order by 똑같이 취급한다
    • RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다

Pivoting

: what if you want to transform ranking data from a vertical a horizontal structure ?

Transforming tables

  • pivoted by year: 열이 year의 Unique value로 들어가게 된다
  • CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

select * from crosstab($$ 
		select country, year, count(*) :: integer as awards ) 
FROM Summer_medals 
where 
country in ('CHN','RUS','USA')
AND Year IN (2008,2012) 
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)

ORDER BY Country asc   

ROLLUP and CUBE

: SUM aggregate function with windows

→ What if you want to calculate group-level and grand totals

  • Rollup : Group by subclause
    • ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
    • hierarchical
      • ROLLUP(Country, Medal) include Country - level total
      • ROLLUP(Medal,Country) include Medal - level total
  • CUBE : non-hierarchical ROLLUP
    • generate all possible group-level aggregation
      • ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
    • ROLLUP과 CUBE 비교

A survey of useful functions

ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음

Pivot & LAG and LEAD 등등

  • coalesce(): take a list of values and returns the first non-null value
    • first non-null value
    • null value 대체하는 것
  1. Compress data
  • Rank - order ⇒ redundant
  • String_AGG(column, separator)
    • 결과물 separator 이용해서 표기하는 방법

What’s in the database

  • NULL - missing
  • IS NULL, IS NOT NULL - don’t use ‘=NULL’
  • count(*) - number of rows
  • count(column_name) - number of non-NULL values
  • count(DISTINCT column_name) - number of different non-NULL values
  • SELECT DISTINCT column_name - distinct values, including NULL

The keys to the database

  • Foreign keys
    • reference another row
      • in different table or same table
      • via unique ID (primary keys) → unique + non-NULL
      • value in referenced column + null → if null ⇒ no relationship for that row
  • coalesce function
    • coalesce(value_1, value_2 .. )
      • operate row by row
      • return first non-null values

Column types and constraints

  • Foreign key : value in referenced column or NULL
  • Primary key : unique / not NULL
  • Unique : all be different except for NULL
  • Not null : NULL not allowed
  • Check constraints : conditions
    • column1 > 0
    • columnA > columnB
  • Data types
    • Common
      • Numeric
      • Character
      • Date / Time
      • Boolean
    • Special
      • Arrays
      • Monetary
      • Binary
      • Geometric …
  • CAST() : data type 변경하기
    • SELECT CAST( value AS new_type)
      • CAST(3.7 AS integer) → 4
    • CAST() == ::
      • SELECT 3.7 ::integer

Numeric data types and summary functions

  • integer
  • decimal
  • Division
    • select 10/4 → integer
    • select 10/4.0 → numeric
  • min and max
  • mean
  • Variance
    • var_pop → population
    • var_samp → sample / variance = var_samp
  • Standard deviation
    • stddev_samp == stddev
    • stddev_pop
  • Round

Exploring distributions

  • Truncate
    • select trunc(42.1256,2) → 42.12
    • select trunc(12345,-3) → 12000
    • ex) Truncating and grouping
  • Generate series
    • select generate_series(1,10,2) → 1부터 10까지 2개씩 넘어가면서
    • Create bins: query

More summary functions

  • Correlation - relationship between two variables
    • select corr(assets, equity)
  • Median
    • select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
      • return a value from column
    • select percentile_count(percentile) within group (order by column_name)
      • interploate between values
      • ex) val 1,3,4,5
  • common issues
    • error codes
      • example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
    • missing vale
      • NA, NaN
      • 0 = missing or 0 ?
    • outlier
    • not really a number
      • zip codes, survey response categories

Creating temporary tables

  • Create Temporary table
  • # create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
  • Insert into table → new row 생성
INSERT INTO top_companies 
Select rank, title 
from fortune500
where rank between 11 and 20 
  • Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies 

Aggregate window functions

  • 단순히 max, sum 함수를 사용한 경우
  • window max query
WITH brazil_Medals AS(...)

Select 
 Year, Medals,
 MAX(Medals)
	Over(order by year asc) AS MAx_medals
From Brazil_medals;
  • Partitioning with aggregate window functions
    • OVER( PARTITION BY variable)

Frames

: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론

→ range between unbounded preceding and unbounded following

(범위를 한정해주는 역할을 한다)

  • Row between (start) and (finish)
    • n preceding : n row before the current row
    • current row
    • n following : n row after current row

ex) row between 3 preceding and current row

max(medals) over(order by year asc rows between current row and 1 following

  • example Frame
With Russia_medals as (...)

SELECT
Year, Medals,
Max(Medals)
	Over(order by year asc) as max_medals,
max(medals)
	over(order by year asc 
				rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc

Moving averages and totals

: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals

→ average of last n periods / total of last n periods

ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)

sum(medals) OVER( )

  • moving average
WITH US_medals AS(...)

SELECT 
Year, Medals,
AVG(Medals) OVER
	(ORDER BY Year ASC
	ROWS BETWEEN 
	2 PRECEDING AND CURRENT ROW) AS Medals_MA 
FROM US_medals
ORDER BY Year ASC
  • RANGE between [start] and [finish]
    • rows between과 비슷하다
    • OVER의 order by 똑같이 취급한다
    • RANGE는 ORDER BY 하위 절의 열에 있는 중복 항목을 단일 엔터티로 처리한다

Pivoting

: what if you want to transform ranking data from a vertical a horizontal structure ?

Transforming tables

  • pivoted by year: 열이 year의 Unique value로 들어가게 된다
  • CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

select * from crosstab($$ 
		select country, year, count(*) :: integer as awards ) 
FROM Summer_medals 
where 
country in ('CHN','RUS','USA')
AND Year IN (2008,2012) 
and Medal ='Gold'
GROUP By country, year
order by country asc, year asc; $$)
AS ct(Country Varchar, "2008" INTEGER ,"2012" INTEGER)

ORDER BY Country asc   

ROLLUP and CUBE

: SUM aggregate function with windows

→ What if you want to calculate group-level and grand totals

  • Rollup : Group by subclause
    • ex) group by country, RollUP(Medal) → count all Country and Medal level totals → count only Country level and fill in Medal with null
    • hierarchical
      • ROLLUP(Country, Medal) include Country - level total
      • ROLLUP(Medal,Country) include Medal - level total
  • CUBE : non-hierarchical ROLLUP
    • generate all possible group-level aggregation
      • ex) CUBE(Country, Medal) → count Country, Medal level and grand totals
    • ROLLUP과 CUBE 비교

A survey of useful functions

ex. ROLLUP CUBE → null 값의 경우가 생김 → 이를 처리하고 싶음

Pivot & LAG and LEAD 등등

  • coalesce(): take a list of values and returns the first non-null value
    • first non-null value
    • null value 대체하는 것
  1. Compress data
  • Rank - order ⇒ redundant
  • String_AGG(column, separator)
    • 결과물 separator 이용해서 표기하는 방법

What’s in the database

  • NULL - missing
  • IS NULL, IS NOT NULL - don’t use ‘=NULL’
  • count(*) - number of rows
  • count(column_name) - number of non-NULL values
  • count(DISTINCT column_name) - number of different non-NULL values
  • SELECT DISTINCT column_name - distinct values, including NULL

The keys to the database

  • Foreign keys
    • reference another row
      • in different table or same table
      • via unique ID (primary keys) → unique + non-NULL
      • value in referenced column + null → if null ⇒ no relationship for that row
  • coalesce function
    • coalesce(value_1, value_2 .. )
      • operate row by row
      • return first non-null values

Column types and constraints

  • Foreign key : value in referenced column or NULL
  • Primary key : unique / not NULL
  • Unique : all be different except for NULL
  • Not null : NULL not allowed
  • Check constraints : conditions
    • column1 > 0
    • columnA > columnB
  • Data types
    • Common
      • Numeric
      • Character
      • Date / Time
      • Boolean
    • Special
      • Arrays
      • Monetary
      • Binary
      • Geometric …
  • CAST() : data type 변경하기
    • SELECT CAST( value AS new_type)
      • CAST(3.7 AS integer) → 4
    • CAST() == ::
      • SELECT 3.7 ::integer

Numeric data types and summary functions

  • integer
  • decimal
  • Division
    • select 10/4 → integer
    • select 10/4.0 → numeric
  • min and max
  • mean
  • Variance
    • var_pop → population
    • var_samp → sample / variance = var_samp
  • Standard deviation
    • stddev_samp == stddev
    • stddev_pop
  • Round

Exploring distributions

  • Truncate
    • select trunc(42.1256,2) → 42.12
    • select trunc(12345,-3) → 12000
    • ex) Truncating and grouping
  • Generate series
    • select generate_series(1,10,2) → 1부터 10까지 2개씩 넘어가면서
    • Create bins: query

More summary functions

  • Correlation - relationship between two variables
    • select corr(assets, equity)
  • Median
    • select percentile_disc(perccentile) within group (order by column_name) → percentile : 0과 1 사이
      • return a value from column
    • select percentile_count(percentile) within group (order by column_name)
      • interploate between values
      • ex) val 1,3,4,5
  • common issues
    • error codes
      • example: 9, 99, -99 → database에 따라 다르지만 이러한 숫자들이 특별한 의미를 가지는 경우가 있다
    • missing vale
      • NA, NaN
      • 0 = missing or 0 ?
    • outlier
    • not really a number
      • zip codes, survey response categories

Creating temporary tables

  • Create Temporary table
  • # create temp table syntax CREATE temp TABLE new_tablename as select column1, column2 from table #selct into syntax -> postgres X select column1, column2 INTO temp TABLE new_tablename From table
  • Insert into table → new row 생성
INSERT INTO top_companies 
Select rank, title 
from fortune500
where rank between 11 and 20 
  • Delete (drop) table
DROP TABLE top_companies
DROP TABLE IF EXISTS top_companies 

Aggregate window functions

  • 단순히 max, sum 함수를 사용한 경우
  • window max query
WITH brazil_Medals AS(...)

Select 
 Year, Medals,
 MAX(Medals)
	Over(order by year asc) AS MAx_medals
From Brazil_medals;
  • Partitioning with aggregate window functions
    • OVER( PARTITION BY variable)

Frames

: Partition, order에 대해서 window function이 작동하는 원리를 바꾸기 위한 하나의 방법론

→ range between unbounded preceding and unbounded following

(범위를 한정해주는 역할을 한다)

  • Row between (start) and (finish)
    • n preceding : n row before the current row
    • current row
    • n following : n row after current row

ex) row between 3 preceding and current row

max(medals) over(order by year asc rows between current row and 1 following

  • example Frame
With Russia_medals as (...)

SELECT
Year, Medals,
Max(Medals)
	Over(order by year asc) as max_medals,
max(medals)
	over(order by year asc 
				rows between 1 preceding and current row)
as max_medals_last
from russia_medals
order by year asc

Moving averages and totals

: 위에서 나온 Frame과 같이 많이 쓰는 the calculation of moving averages and totals

→ average of last n periods / total of last n periods

ex) AVG(medals) OVER(Order by year ASC ROWS BEtween 2 preceding ~~)

sum(medals) OVER( )

  • moving average
WITH US_medals AS(...)

SELECT 
Year, Medals,
AVG(Medals) OVER
	(ORDER BY Year ASC
	ROWS BETWEEN

관련글 더보기

댓글 영역