[SQL Study_Team 1] Week 6_Lecture Notes


by j.hyeon 2023. 5. 17. 22:49

[See the version with images for reference]

https://www.notion.so/ad93357c471d4a1f827533099904e9e6

Character data types and common issues

Character types by length
- character(n) or char(n) _ fixed length n
- character varying(n) or varchar(n) _ maximum length n
- text or varchar _ unlimited length
Types of text data
- categorical: short strings whose values repeat
- unstructured text: longer strings with mostly unique values
  - analysis _ extract features from the text, or create variables that indicate whether a particular characteristic is present
Grouping & counting categorical variables
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical order _ ' ' < 'A' < 'B' < 'a' < 'b'
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC; -- most frequent first
SELECT category, count(*) FROM product GROUP BY category ORDER BY category; -- in order of the categorical variable

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

-- With LIKE, the pattern is matched literally (case sensitive)
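
To see the three approaches side by side, here is a small sketch. The inline rows are hypothetical sample data standing in for the fruit table, which the notes do not show.

```sql
-- Hypothetical sample rows; not the real fruit table.
WITH fruit_demo(fav_fruit) AS (
    VALUES ('Apple'), ('apple'), ('APPLES'), ('banana')
)
SELECT fav_fruit,
       fav_fruit LIKE '%apple%'   AS like_match,    -- case sensitive
       fav_fruit ILIKE '%apple%'  AS ilike_match,   -- case insensitive
       lower(fav_fruit) = 'apple' AS lower_equals   -- exact match after lowercasing
FROM fruit_demo;
-- Apple  | f | t | t
-- apple  | t | t | t
-- APPLES | f | t | f
-- banana | f | f | f
```

Note that ILIKE with wildcards also matches 'APPLES', while the lower() comparison only matches the exact word.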

trimming spaces
- trim(' abc ') (== btrim(' abc ')) = 'abc'
- rtrim(' abc ') = ' abc'
- ltrim(' abc ') = 'abc '
trimming other values
SELECT trim('Wow!', '!'); -- Wow
SELECT trim('WoW!', '!wW'); -- o
combining functions
SELECT trim(lower('Wow!'), '!w'); -- o
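
The second argument is treated as a set of characters, not as a substring. A short sketch with made-up literals:

```sql
SELECT trim('xxhixx', 'x')        AS both_sides,  -- hi
       ltrim('xxhixx', 'x')       AS left_only,   -- hixx
       rtrim('xxhixx', 'x')       AS right_only,  -- xxhi
       trim('abcHELLOcba', 'abc') AS char_set;    -- HELLO (any of a, b, c is stripped from both ends)
```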

Splitting and concatenating text

substring

SELECT left('abcde', 2),    -- ab
       right('abcde', 2);   -- de

SELECT left('abc', 10),            -- abc
       length(left('abc', 10));    -- 3

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); gives the same result
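
A concrete instance of both forms, using a made-up literal. Positions start at 1.

```sql
SELECT substring('abcdef' FROM 2 FOR 3) AS sql_style,   -- bcd
       substr('abcdef', 2, 3)           AS func_style,  -- bcd
       substring('abcdef' FROM 4)       AS to_end;      -- def (FOR omitted: runs to the end)
```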
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
-- bc
```
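
A few more cases, as a sketch with made-up strings: the delimiter can be longer than one character, and asking for a part that does not exist returns an empty string rather than NULL.

```sql
SELECT split_part('a,bc,d', ',', 1)        AS first_part,        -- a
       split_part('a,bc,d', ',', 3)        AS third_part,        -- d
       split_part('a,bc,d', ',', 4) = ''   AS missing_is_empty,  -- t
       split_part('Food: Bakery', ': ', 1) AS major;             -- Food
```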

concatenating text

SELECT concat('a', NULL, 'cc');
-- acc (concat ignores NULL)
SELECT 'a' || NULL || 'cc';
-- NULL (|| returns NULL if any operand is NULL)

SELECT concat('a', 2, 'cc'); -- a2cc
SELECT 'a' || 2 || 'cc'; -- a2cc
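
If you want to keep using || but not lose the whole string to a NULL, one option (not from the lecture, just a common pattern) is to wrap the nullable piece in coalesce:

```sql
SELECT concat('a', NULL, 'cc')           AS with_concat,    -- acc
       'a' || NULL || 'cc'               AS with_pipes,     -- NULL
       'a' || coalesce(NULL, '') || 'cc' AS with_coalesce;  -- acc
```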

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
            WHEN category LIKE '% - %' THEN split_part(category, ' - ', 1)
            ELSE split_part(category, ' | ', 1)
       END AS major_category, sum(business)
FROM naics
GROUP BY major_category;

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original,
                fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=rtrim(standardized, 's'); -- rtrim so that only a trailing 's' is removed

join original data to standardized data

-- original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

-- with recoded values
SELECT standardized, count(*)
FROM fruit
LEFT JOIN recode ON fav_fruit=original
GROUP BY standardized;
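
The whole recoding flow can be tried end to end on a toy table. This is only a sketch: fruit_demo, recode_demo, and the sample values are hypothetical, and rtrim is used in the last update so that only a trailing 's' is stripped.

```sql
-- Hypothetical sample data; the real fruit table is not shown in these notes.
CREATE TEMP TABLE fruit_demo (fav_fruit text);
INSERT INTO fruit_demo VALUES
    ('apple'), (' Apple'), ('APPLES'), ('banana'), ('bannana'), ('Bananas ');

-- 1. One row per distinct original value
CREATE TEMP TABLE recode_demo AS
SELECT DISTINCT fav_fruit AS original,
                fav_fruit AS standardized
FROM fruit_demo;

-- 2. Standardize step by step
UPDATE recode_demo SET standardized = trim(lower(original));
UPDATE recode_demo SET standardized = 'banana' WHERE standardized LIKE '%nn%';
UPDATE recode_demo SET standardized = rtrim(standardized, 's');

-- 3. Join back and count by the cleaned value
SELECT standardized, count(*)
FROM fruit_demo
LEFT JOIN recode_demo ON fav_fruit = original
GROUP BY standardized
ORDER BY standardized;
-- apple  | 3
-- banana | 3
```

Keeping the original column untouched means every cleaning step can be checked against the raw value before the join.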

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- ISO 8601 _ YYYY-MM-DD HH:MM:SS
- timestamp with time zone _ YYYY-MM-DD HH:MM:SS+HH
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01';

SELECT '2010-01-01'::date + 1;
-- 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
-- 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
-- 2019-12-12 00:03:00
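
The result type depends on what is being subtracted. A small sketch with made-up literals: date minus date gives an integer number of days, while timestamp minus timestamp gives an interval.

```sql
SELECT '2018-12-10'::date - '2018-01-01'::date                    AS days_between,    -- 343 (integer)
       '2018-12-10'::date + 7                                     AS one_week_later,  -- 2018-12-17
       '2018-12-10'::timestamp - '2018-12-09 12:00:00'::timestamp AS ts_diff;         -- 12:00:00 (interval)
```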

Date/time components and aggregation

common date/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());
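
Since now() changes on every run, a fixed timestamp makes the fields easier to check. The literal below is just an example value.

```sql
SELECT date_part('year',  '2018-04-23 09:30:00'::timestamp) AS yr,   -- 2018
       date_part('month', '2018-04-23 09:30:00'::timestamp) AS mon,  -- 4
       EXTRACT(DOW  FROM  '2018-04-23 09:30:00'::timestamp) AS dow,  -- 1 (Monday; Sunday is 0)
       EXTRACT(HOUR FROM  '2018-04-23 09:30:00'::timestamp) AS hr;   -- 9
```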

extract to summarize by field

-- individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
      AND date < '2017-01-01';

-- by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc('field', timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;
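
The two monthly queries look alike but group differently: date_part('month', ...) returns only the month number, so March 2016 and March 2017 fall into the same group, while date_trunc('month', ...) keeps the year. A sketch with hypothetical rows (sales_demo is not a real table from the notes):

```sql
-- Hypothetical sample rows standing in for the sales table.
WITH sales_demo(date, amt) AS (
    VALUES ('2016-03-05'::timestamp, 10),
           ('2017-03-20'::timestamp, 20),
           ('2017-04-01'::timestamp, 5)
)
SELECT date,
       date_part('month', date)  AS month_number,  -- 3, 3, 4
       date_trunc('month', date) AS month_start    -- 2016-03-01, 2017-03-01, 2017-04-01 (at 00:00:00)
FROM sales_demo
ORDER BY date;
```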

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval);
-- does not land on month ends: 2018-01-31, 2018-02-28, 2018-03-28, ...

aggregation with series

WITH hour_series AS (
    SELECT generate_series('2018-04-23 09:00:00',
                           '2018-04-23 14:00:00',
                           '1 hour'::interval) AS hours)
SELECT hours, count(date)
FROM hour_series
LEFT JOIN sales
ON hours = date_trunc('hour', date)
GROUP BY hours
ORDER BY hours;
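
The point of joining from the series is that hours with no sales still appear, with a count of 0. A self-contained sketch, assuming a hypothetical sales_demo with a timestamp column named date:

```sql
-- Hypothetical sample rows; explicit ::timestamp casts keep both sides of the join the same type.
WITH sales_demo(date) AS (
    VALUES ('2018-04-23 09:10:00'::timestamp),
           ('2018-04-23 09:45:00'::timestamp),
           ('2018-04-23 11:05:00'::timestamp)
),
hour_series AS (
    SELECT generate_series('2018-04-23 09:00:00'::timestamp,
                           '2018-04-23 11:00:00'::timestamp,
                           '1 hour'::interval) AS hours
)
SELECT hours, count(date) AS n_sales
FROM hour_series
LEFT JOIN sales_demo ON hours = date_trunc('hour', date)
GROUP BY hours
ORDER BY hours;
-- 2018-04-23 09:00:00 | 2
-- 2018-04-23 10:00:00 | 0
-- 2018-04-23 11:00:00 | 1
```

count(date) counts only non-NULL values, which is why the empty hour shows 0; count(*) would report 1 there.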

aggregation with bins

WITH bins AS (
    SELECT generate_series('2018-04-23 09:00:00',
                           '2018-04-23 15:00:00',
                           '3 hour'::interval) AS lower,
           generate_series('2018-04-23 12:00:00',
                           '2018-04-23 18:00:00',
                           '3 hour'::interval) AS upper)
SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
   AND date < upper
GROUP BY lower, upper
ORDER BY lower;

-- month-end dates: generate the first day of each following month, then subtract 1 day
SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and lag

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
       date - lag(date) OVER (ORDER BY date) AS gap
FROM sales;

-- average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
      FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
       amount - lag(amount) OVER (ORDER BY date) AS change
FROM sales;
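
Both patterns on a toy table, as a sketch. sales_demo and its values are hypothetical. The first row has no previous row, so lag() returns NULL there.

```sql
-- Hypothetical sample rows standing in for the sales table.
WITH sales_demo(date, amount) AS (
    VALUES ('2018-04-23'::date, 100),
           ('2018-04-25'::date, 130),
           ('2018-04-30'::date, 90)
)
SELECT date,
       date - lag(date) OVER (ORDER BY date)     AS gap_days,  -- NULL, 2, 5
       amount - lag(amount) OVER (ORDER BY date) AS change     -- NULL, 30, -40
FROM sales_demo
ORDER BY date;
```

Here date is a date column, so the gap is an integer number of days; with timestamps it would be an interval.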

Character data types and common issues

길이에 따른 character type
- character(n) or char(n) _ 길이 n으로 고정
- character varying(n) or varchar(n) _ 최대 길이 n
- text or varchar _ 길이 무제한
text data 유형
- categorical: 값을 가지는 짧은 길이의 string이 반복
- unstructed text: unique 값의 더 긴 string
  - 분석 _ text에서 기능 추출하거나, 특정 특성 존재 여부를 나타내는 변수 생성
categorical variables에 대한 grouping & counting
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical Order _ ‘ ‘ < ‘A’ < ‘B’ < ‘a’ < ‘b’
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC; # 많은 순서대로 ORDER BY category; # categorical variable 순서대로

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

# LIKE 사용할 경우 문자 그대로 값에만 적용

trimming spaces
- trim(’ abc ‘) (== btrim(’ abc ‘)) = ‘abc’
- rtrim(’ abc ‘) = ‘ abc’
- lrim(’ abc ‘) = ‘abc ‘
trimming other values
SELECT trim('Wow!', '!'); # Wow SELECT trim('WoW!', '!wW'); #o
combining functions
SELECT trim(lower('Wow!'), '!w'); # o

Splitting and concatenating text

substring

SELECT left('abcde', 2),
				right('abcde', 2),
				
SELECT left('abc', 10),
				length(left('abc', 10));

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); 결과 동일
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
# bc
```

concatenating text

SELECT concat('a', NULL, 'cc');
# acc
SELECT 'a' || NULL || 'cc';
#

SELECT concat('a', 2, 'cc'); SELECT 'a' || 2 || 'cc'; # a2cc

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
						WHEN category LIKE '% - %' THEN split_part(category, '- ', 1)
						ELSE split_part(category, ' | ', 1)
				END AS major_category, sum(business)
FROM naics
GROUP BY major_category;

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, 
								fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=trim(standardized, 's'));

join original data to standardized data

# original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

# with recoded values SELECT standardized, count(*) FROM fruit LEFT JOIN recode ON fav_fruit=original GROUP BY standardized;

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- timestamp with timezone
- YYYY-MM-DD HH:MM:SS+HH
YYYY-MM-DD HH:MM:SS
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01'

SELECT '2010-01-01'::date + 1;
# 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
# 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
# 2019-12-12 00:00:03

Date/time components and aggregation

common data/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());

extract to summarize by field

# individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
			AND date < '2017-01-01';

# by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc(’field’, timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval)

aggregation with series

WITH hour_series AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 14:00:00',
														'1 hour'::interval) AS hours)

SELECT hours, count(data)
FROM hour_series
LEFT JOIN sales
ON hours=date_Truc('hour', date)
GROUP BY hours
GROUP BY hours

aggregation with bins

WITH bins AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 15:00:00',
														'3 hour'::interval) AS lower,
						generate_series('2018-04-23 12:00:00',
														'2018-04-23 18:00:00',
														'3 hour'::interval) AS upper)

SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
	AND date < upper
GROUP BY lower, upper
ORDER BY lower

SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and log

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
			date - lag(data) OVER (ORDER BY date) AS gap
FROM sales 

# average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
	FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
							amount - lag(amount) OVER (ORDER BY date) AS chane

Character data types and common issues

길이에 따른 character type
- character(n) or char(n) _ 길이 n으로 고정
- character varying(n) or varchar(n) _ 최대 길이 n
- text or varchar _ 길이 무제한
text data 유형
- categorical: 값을 가지는 짧은 길이의 string이 반복
- unstructed text: unique 값의 더 긴 string
  - 분석 _ text에서 기능 추출하거나, 특정 특성 존재 여부를 나타내는 변수 생성
categorical variables에 대한 grouping & counting
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical Order _ ‘ ‘ < ‘A’ < ‘B’ < ‘a’ < ‘b’
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC; # 많은 순서대로 ORDER BY category; # categorical variable 순서대로

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

# LIKE 사용할 경우 문자 그대로 값에만 적용

trimming spaces
- trim(’ abc ‘) (== btrim(’ abc ‘)) = ‘abc’
- rtrim(’ abc ‘) = ‘ abc’
- lrim(’ abc ‘) = ‘abc ‘
trimming other values
SELECT trim('Wow!', '!'); # Wow SELECT trim('WoW!', '!wW'); #o
combining functions
SELECT trim(lower('Wow!'), '!w'); # o

Splitting and concatenating text

substring

SELECT left('abcde', 2),
				right('abcde', 2),
				
SELECT left('abc', 10),
				length(left('abc', 10));

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); 결과 동일
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
# bc
```

concatenating text

SELECT concat('a', NULL, 'cc');
# acc
SELECT 'a' || NULL || 'cc';
#

SELECT concat('a', 2, 'cc'); SELECT 'a' || 2 || 'cc'; # a2cc

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
						WHEN category LIKE '% - %' THEN split_part(category, '- ', 1)
						ELSE split_part(category, ' | ', 1)
				END AS major_category, sum(business)
FROM naics
GROUP BY major_category;

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, 
								fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=trim(standardized, 's'));

join original data to standardized data

# original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

# with recoded values SELECT standardized, count(*) FROM fruit LEFT JOIN recode ON fav_fruit=original GROUP BY standardized;

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- timestamp with timezone
- YYYY-MM-DD HH:MM:SS+HH
YYYY-MM-DD HH:MM:SS
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01'

SELECT '2010-01-01'::date + 1;
# 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
# 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
# 2019-12-12 00:00:03

Date/time components and aggregation

common data/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());

extract to summarize by field

# individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
			AND date < '2017-01-01';

# by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc(’field’, timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval)

aggregation with series

WITH hour_series AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 14:00:00',
														'1 hour'::interval) AS hours)

SELECT hours, count(data)
FROM hour_series
LEFT JOIN sales
ON hours=date_Truc('hour', date)
GROUP BY hours
GROUP BY hours

aggregation with bins

WITH bins AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 15:00:00',
														'3 hour'::interval) AS lower,
						generate_series('2018-04-23 12:00:00',
														'2018-04-23 18:00:00',
														'3 hour'::interval) AS upper)

SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
	AND date < upper
GROUP BY lower, upper
ORDER BY lower

SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and log

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
			date - lag(data) OVER (ORDER BY date) AS gap
FROM sales 

# average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
	FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
							amount - lag(amount) OVER (ORDER BY date) AS chane

Character data types and common issues

길이에 따른 character type
- character(n) or char(n) _ 길이 n으로 고정
- character varying(n) or varchar(n) _ 최대 길이 n
- text or varchar _ 길이 무제한
text data 유형
- categorical: 값을 가지는 짧은 길이의 string이 반복
- unstructed text: unique 값의 더 긴 string
  - 분석 _ text에서 기능 추출하거나, 특정 특성 존재 여부를 나타내는 변수 생성
categorical variables에 대한 grouping & counting
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical Order _ ‘ ‘ < ‘A’ < ‘B’ < ‘a’ < ‘b’
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC; # 많은 순서대로 ORDER BY category; # categorical variable 순서대로

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

# LIKE 사용할 경우 문자 그대로 값에만 적용

trimming spaces
- trim(’ abc ‘) (== btrim(’ abc ‘)) = ‘abc’
- rtrim(’ abc ‘) = ‘ abc’
- lrim(’ abc ‘) = ‘abc ‘
trimming other values
SELECT trim('Wow!', '!'); # Wow SELECT trim('WoW!', '!wW'); #o
combining functions
SELECT trim(lower('Wow!'), '!w'); # o

Splitting and concatenating text

substring

SELECT left('abcde', 2),
				right('abcde', 2),
				
SELECT left('abc', 10),
				length(left('abc', 10));

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); 결과 동일
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
# bc
```

concatenating text

SELECT concat('a', NULL, 'cc');
# acc
SELECT 'a' || NULL || 'cc';
#

SELECT concat('a', 2, 'cc'); SELECT 'a' || 2 || 'cc'; # a2cc

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
						WHEN category LIKE '% - %' THEN split_part(category, '- ', 1)
						ELSE split_part(category, ' | ', 1)
				END AS major_category, sum(business)
FROM naics
GROUP BY major_category;

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, 
								fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=trim(standardized, 's'));

join original data to standardized data

# original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

# with recoded values SELECT standardized, count(*) FROM fruit LEFT JOIN recode ON fav_fruit=original GROUP BY standardized;

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- timestamp with timezone
- YYYY-MM-DD HH:MM:SS+HH
YYYY-MM-DD HH:MM:SS
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01'

SELECT '2010-01-01'::date + 1;
# 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
# 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
# 2019-12-12 00:00:03

Date/time components and aggregation

common data/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());

extract to summarize by field

# individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
			AND date < '2017-01-01';

# by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc(’field’, timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval)

aggregation with series

WITH hour_series AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 14:00:00',
														'1 hour'::interval) AS hours)

SELECT hours, count(data)
FROM hour_series
LEFT JOIN sales
ON hours=date_Truc('hour', date)
GROUP BY hours
GROUP BY hours

aggregation with bins

WITH bins AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 15:00:00',
														'3 hour'::interval) AS lower,
						generate_series('2018-04-23 12:00:00',
														'2018-04-23 18:00:00',
														'3 hour'::interval) AS upper)

SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
	AND date < upper
GROUP BY lower, upper
ORDER BY lower

SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and log

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
			date - lag(data) OVER (ORDER BY date) AS gap
FROM sales 

# average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
	FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
							amount - lag(amount) OVER (ORDER BY date) AS chane

Character data types and common issues

길이에 따른 character type
- character(n) or char(n) _ 길이 n으로 고정
- character varying(n) or varchar(n) _ 최대 길이 n
- text or varchar _ 길이 무제한
text data 유형
- categorical: 값을 가지는 짧은 길이의 string이 반복
- unstructed text: unique 값의 더 긴 string
  - 분석 _ text에서 기능 추출하거나, 특정 특성 존재 여부를 나타내는 변수 생성
categorical variables에 대한 grouping & counting
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical Order _ ‘ ‘ < ‘A’ < ‘B’ < ‘a’ < ‘b’
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC; # 많은 순서대로 ORDER BY category; # categorical variable 순서대로

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

# LIKE 사용할 경우 문자 그대로 값에만 적용

trimming spaces
- trim(’ abc ‘) (== btrim(’ abc ‘)) = ‘abc’
- rtrim(’ abc ‘) = ‘ abc’
- lrim(’ abc ‘) = ‘abc ‘
trimming other values
SELECT trim('Wow!', '!'); # Wow SELECT trim('WoW!', '!wW'); #o
combining functions
SELECT trim(lower('Wow!'), '!w'); # o

Splitting and concatenating text

substring

SELECT left('abcde', 2),
				right('abcde', 2),
				
SELECT left('abc', 10),
				length(left('abc', 10));

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); 결과 동일
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
# bc
```

concatenating text

SELECT concat('a', NULL, 'cc');
# acc
SELECT 'a' || NULL || 'cc';
#

SELECT concat('a', 2, 'cc'); SELECT 'a' || 2 || 'cc'; # a2cc

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
						WHEN category LIKE '% - %' THEN split_part(category, '- ', 1)
						ELSE split_part(category, ' | ', 1)
				END AS major_category, sum(business)
FROM naics
GROUP BY major_category;

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, 
								fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=trim(standardized, 's'));

join original data to standardized data

# original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

# with recoded values SELECT standardized, count(*) FROM fruit LEFT JOIN recode ON fav_fruit=original GROUP BY standardized;

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- timestamp with timezone
- YYYY-MM-DD HH:MM:SS+HH
YYYY-MM-DD HH:MM:SS
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01'

SELECT '2010-01-01'::date + 1;
# 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
# 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
# 2019-12-12 00:00:03

Date/time components and aggregation

common data/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());

extract to summarize by field

# individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
			AND date < '2017-01-01';

# by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc(’field’, timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval)

aggregation with series

WITH hour_series AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 14:00:00',
														'1 hour'::interval) AS hours)

SELECT hours, count(data)
FROM hour_series
LEFT JOIN sales
ON hours=date_Truc('hour', date)
GROUP BY hours
GROUP BY hours

aggregation with bins

WITH bins AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 15:00:00',
														'3 hour'::interval) AS lower,
						generate_series('2018-04-23 12:00:00',
														'2018-04-23 18:00:00',
														'3 hour'::interval) AS upper)

SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
	AND date < upper
GROUP BY lower, upper
ORDER BY lower

SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and log

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
			date - lag(data) OVER (ORDER BY date) AS gap
FROM sales 

# average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
	FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
							amount - lag(amount) OVER (ORDER BY date) AS chane

Character data types and common issues

길이에 따른 character type
- character(n) or char(n) _ 길이 n으로 고정
- character varying(n) or varchar(n) _ 최대 길이 n
- text or varchar _ 길이 무제한
text data 유형
- categorical: 값을 가지는 짧은 길이의 string이 반복
- unstructed text: unique 값의 더 긴 string
  - 분석 _ text에서 기능 추출하거나, 특정 특성 존재 여부를 나타내는 변수 생성
categorical variables에 대한 grouping & counting
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical Order _ ‘ ‘ < ‘A’ < ‘B’ < ‘a’ < ‘b’
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC; # 많은 순서대로 ORDER BY category; # categorical variable 순서대로

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

# LIKE 사용할 경우 문자 그대로 값에만 적용

trimming spaces
- trim(’ abc ‘) (== btrim(’ abc ‘)) = ‘abc’
- rtrim(’ abc ‘) = ‘ abc’
- lrim(’ abc ‘) = ‘abc ‘
trimming other values
SELECT trim('Wow!', '!'); # Wow SELECT trim('WoW!', '!wW'); #o
combining functions
SELECT trim(lower('Wow!'), '!w'); # o

Splitting and concatenating text

substring

SELECT left('abcde', 2),
				right('abcde', 2),
				
SELECT left('abc', 10),
				length(left('abc', 10));

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); 결과 동일
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
# bc
```

concatenating text

SELECT concat('a', NULL, 'cc');
# acc
SELECT 'a' || NULL || 'cc';
#

SELECT concat('a', 2, 'cc'); SELECT 'a' || 2 || 'cc'; # a2cc

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
						WHEN category LIKE '% - %' THEN split_part(category, '- ', 1)
						ELSE split_part(category, ' | ', 1)
				END AS major_category, sum(business)
FROM naics
GROUP BY major_category;

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, 
								fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=trim(standardized, 's'));

join original data to standardized data

# original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

# with recoded values SELECT standardized, count(*) FROM fruit LEFT JOIN recode ON fav_fruit=original GROUP BY standardized;

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- timestamp with timezone
- YYYY-MM-DD HH:MM:SS+HH
YYYY-MM-DD HH:MM:SS
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01'

SELECT '2010-01-01'::date + 1;
# 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
# 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
# 2019-12-12 00:00:03

Date/time components and aggregation

common data/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());

extract to summarize by field

# individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
			AND date < '2017-01-01';

# by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc(’field’, timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval)

aggregation with series

WITH hour_series AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 14:00:00',
														'1 hour'::interval) AS hours)

SELECT hours, count(data)
FROM hour_series
LEFT JOIN sales
ON hours=date_Truc('hour', date)
GROUP BY hours
GROUP BY hours

aggregation with bins

WITH bins AS(
		SELECT generate_series('2018-04-23 09:00:00',
														'2018-04-23 15:00:00',
														'3 hour'::interval) AS lower,
						generate_series('2018-04-23 12:00:00',
														'2018-04-23 18:00:00',
														'3 hour'::interval) AS upper)

SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
	AND date < upper
GROUP BY lower, upper
ORDER BY lower

SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and log

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
			date - lag(data) OVER (ORDER BY date) AS gap
FROM sales 

# average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
	FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
							amount - lag(amount) OVER (ORDER BY date) AS chane

Exploring categorical data and unstructured text

Character data types and common issues

Character types by length
- character(n) or char(n) _ fixed length n
- character varying(n) or varchar(n) _ variable length, at most n characters
- text or varchar (no length given) _ unlimited length
Types of text data
- categorical: a small set of short strings that repeat across rows
- unstructured text: longer strings whose values are mostly unique
  - analysis _ extract features from the text, or create variables that indicate whether a given characteristic is present
Grouping & counting categorical variables
SELECT category, count(*) FROM product GROUP BY category;
Order
- Alphabetical order _ ' ' < 'A' < 'B' < 'a' < 'b'
-- most frequent first
SELECT category, count(*) FROM product GROUP BY category ORDER BY count DESC;
-- in order of the categorical variable
SELECT category, count(*) FROM product GROUP BY category ORDER BY category;
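
A small sketch of why inconsistent values matter when grouping. The inline `product` rows are made up for illustration and are not from the course data; the point is only that GROUP BY treats strings differing in case or trailing spaces as separate categories.

```sql
-- Hypothetical rows standing in for the product table.
WITH product(category) AS (
    VALUES ('fruit'), ('fruit'), ('Fruit'), ('fruit '), ('dairy')
)
SELECT category, count(*)
FROM product
GROUP BY category
ORDER BY count DESC;
-- 'fruit' is counted twice; 'Fruit' and 'fruit ' (trailing space)
-- each end up in their own group with a count of 1.
```

This is the kind of problem the case and space functions in the next part are meant to clean up.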

Cases and Spaces

convert case

SELECT lower('aBc DeFg 7-');

case insensitive comparisons

SELECT *
FROM fruit
WHERE lower(fav_fruit) = 'apple';

case insensitive searches

SELECT *
FROM fruit
WHERE fav_fruit ILIKE '%apple%';

-- LIKE, by contrast, is case-sensitive: it matches the pattern only as literally written
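
To compare the two operators side by side, here is a minimal sketch using made-up `fav_fruit` values (an assumption for illustration only).

```sql
-- Hypothetical values; each row shows whether LIKE and ILIKE match.
WITH fruit(fav_fruit) AS (
    VALUES ('Apple'), ('apple pie'), ('PINEAPPLE'), ('banana')
)
SELECT fav_fruit,
       fav_fruit LIKE  '%apple%' AS like_match,   -- true only for 'apple pie'
       fav_fruit ILIKE '%apple%' AS ilike_match   -- true for all but 'banana'
FROM fruit;
```

Note that ILIKE '%apple%' also matches 'PINEAPPLE', so a wildcard search can pick up more values than intended.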

trimming spaces
- trim(' abc ') (== btrim(' abc ')) = 'abc'
- rtrim(' abc ') = ' abc'
- ltrim(' abc ') = 'abc '
trimming other values
SELECT trim('Wow!', '!'); -- Wow
SELECT trim('WoW!', '!wW'); -- o
combining functions
SELECT trim(lower('Wow!'), '!w'); -- o
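
Case conversion and trimming are most useful together, just before grouping. The sketch below assumes PostgreSQL and uses an inline VALUES list with made-up rows in place of the fruit table; btrim is used as the function form of trim.

```sql
-- Hypothetical messy values standing in for fruit.fav_fruit
WITH fruit(fav_fruit) AS (
    VALUES ('apple'), (' Apple'), ('APPLE '), ('banana'), ('Banana!')
)
SELECT btrim(lower(fav_fruit), ' !') AS cleaned,  -- lowercase, then strip spaces and '!'
       count(*) AS n
FROM fruit
GROUP BY cleaned
ORDER BY n DESC;
-- apple 3, banana 2
```

Grouping on the raw column would instead return five separate groups of one row each.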

Splitting and concatenating text

substring

SELECT left('abcde', 2),
       right('abcde', 2);
-- ab, de

SELECT left('abc', 10),
       length(left('abc', 10));
-- abc, 3

SELECT substring(string FROM start FOR length);
- SELECT substr(string, start, length); gives the same result
delimiters
- splitting _ SELECT split_part(string, delimiter, part);
```
SELECT split_part('a,bc,d', ',', 2);
-- bc
```
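
A few more calls show how substring and split_part behave at the edges. This is a sketch assuming PostgreSQL; the e-mail address is a made-up sample value.

```sql
SELECT split_part('user@example.com', '@', 1)     AS local_part,   -- user
       split_part('user@example.com', '@', 2)     AS domain,       -- example.com
       substring('user@example.com' FROM 1 FOR 4) AS first_four,   -- user
       split_part('a,bc,d', ',', 5)               AS missing_part; -- empty string, not NULL
```

Asking split_part for a part that does not exist returns an empty string, so a later filter should compare against '' rather than test for NULL.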

concatenating text

SELECT concat('a', NULL, 'cc');
-- acc (concat skips NULL)
SELECT 'a' || NULL || 'cc';
-- NULL (|| returns NULL when any operand is NULL)

SELECT concat('a', 2, 'cc'); -- a2cc
SELECT 'a' || 2 || 'cc'; -- a2cc
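
The NULL behaviour matters most when one of the columns being joined is optional. The sketch below assumes PostgreSQL; the people rows and column names are hypothetical, and concat_ws is shown as one way to join with a separator while skipping NULLs.

```sql
-- Hypothetical rows; middle_name is missing for the first person
WITH people(first_name, middle_name, last_name) AS (
    VALUES ('Ada',   NULL,       'Lovelace'),
           ('Grace', 'Brewster', 'Hopper')
)
SELECT first_name || ' ' || middle_name || ' ' || last_name AS with_pipes,      -- NULL for Ada
       concat_ws(' ', first_name, middle_name, last_name)   AS with_concat_ws   -- 'Ada Lovelace'
FROM people;
```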

Strategies for multiple transformations

CASE WHEN

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
            WHEN category LIKE '% - %' THEN split_part(category, ' - ', 1)
            ELSE split_part(category, ' | ', 1)
       END AS major_category, sum(business)
FROM naics
GROUP BY major_category;
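
To see the CASE WHEN approach produce one major category per delimiter style, here is a self-contained sketch assuming PostgreSQL. The rows and the businesses column are made up for illustration and stand in for the naics table.

```sql
-- Hypothetical category strings using three different delimiters
WITH naics(category, businesses) AS (
    VALUES ('Retail: Grocery',   10),
           ('Retail: Clothing',   5),
           ('Services - Legal',   3),
           ('Food | Bakery',      2)
)
SELECT CASE WHEN category LIKE '%: %'  THEN split_part(category, ': ', 1)
            WHEN category LIKE '% - %' THEN split_part(category, ' - ', 1)
            ELSE split_part(category, ' | ', 1)
       END AS major_category,
       sum(businesses) AS total
FROM naics
GROUP BY major_category
ORDER BY total DESC;
-- Retail 15, Services 3, Food 2
```

The delimiter passed to split_part has to match the LIKE pattern exactly, spaces included, or the result keeps a stray trailing space and ends up in its own group.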

recoding table

create temp table with original values

CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, 
								fav_fruit AS standardized
FROM fruit;

update to create standardized values

UPDATE recode
SET standardized=trim(lower(original));

UPDATE recode
SET standardized='banana'
WHERE standardized LIKE '%nn%';

UPDATE recode
SET standardized=rtrim(standardized, 's');

join original data to standardized data

-- original only
SELECT fav_fruit, count(*)
FROM fruit
GROUP BY fav_fruit;

-- with recoded values
SELECT standardized, count(*)
FROM fruit
LEFT JOIN recode
ON fav_fruit=original
GROUP BY standardized;
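
The recoding steps can be run end to end on a tiny table. This is a sketch assuming PostgreSQL; the fruit rows are made up, and rtrim is used for the plural step so that only a trailing 's' is removed.

```sql
-- Hypothetical sample data
CREATE TEMP TABLE fruit (fav_fruit text);
INSERT INTO fruit VALUES ('apple'), (' Apple'), ('apples'), ('bannana'), ('Banana');

-- One row per distinct original value
CREATE TEMP TABLE recode AS
SELECT DISTINCT fav_fruit AS original, fav_fruit AS standardized
FROM fruit;

UPDATE recode SET standardized = trim(lower(original));                    -- case and spaces
UPDATE recode SET standardized = 'banana' WHERE standardized LIKE '%nn%';  -- misspelling
UPDATE recode SET standardized = rtrim(standardized, 's');                 -- plural

SELECT standardized, count(*) AS n
FROM fruit
LEFT JOIN recode ON fav_fruit = original
GROUP BY standardized
ORDER BY n DESC;
-- apple 3, banana 2
```

Keeping the mapping in a separate table leaves the original data untouched and makes each correction easy to review with a plain SELECT on recode.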

Working with dates and timestamps

Date/time types and formats

date _ YYYY-MM-DD
timestamp _ YYYY-MM-DD HH:MM:SS
standards
- timestamp with time zone _ YYYY-MM-DD HH:MM:SS+HH
- timestamp without time zone _ YYYY-MM-DD HH:MM:SS
comparisons _ >, <, =
- current _ now()
subtraction, addition

SELECT now() - '2018-01-01';

SELECT '2010-01-01'::date + 1;
-- 2010-01-02

SELECT '2018-12-10'::date + '1 year'::interval;
-- 2019-12-10 00:00:00

SELECT '2018-12-10'::date + '1 year 2 days 3 minutes'::interval;
-- 2019-12-12 00:03:00
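
The result type of date arithmetic depends on the operand types, which is easy to miss. A short sketch, assuming PostgreSQL, with arbitrary sample dates:

```sql
SELECT '2018-12-10'::date - '2018-12-01'::date        AS days_between,   -- 9 (integer)
       '2018-12-10 12:00'::timestamp
         - '2018-12-01 00:00'::timestamp               AS elapsed,        -- 9 days 12:00:00 (interval)
       '2018-12-10'::date + 7                          AS one_week_later; -- 2018-12-17 (date)
```

Subtracting two dates gives a whole number of days, while subtracting two timestamps gives an interval.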

Date/time components and aggregation

common date/time fields
- century
- decade
- year, month, day
- hour, minute, second
- week
- dow: day of week
extracting fields

date_part('field', timestamp)
EXTRACT(FIELD FROM timestamp)

SELECT date_part('month', now()),
			EXTRACT(MONTH FROM now());

extract to summarize by field

-- individual sales
SELECT *
FROM sales
WHERE date >= '2010-01-01'
  AND date < '2017-01-01';

-- by month
SELECT date_part('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;

truncating dates _ date_trunc('field', timestamp)

SELECT date_trunc('month', date) AS month, sum(amt)
FROM sales
GROUP BY month
ORDER BY month;
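
The difference between date_part and date_trunc shows up as soon as the data spans more than one year. The sketch below assumes PostgreSQL and uses made-up rows in place of the sales table.

```sql
-- Hypothetical sales across two years
WITH sales(date, amt) AS (
    VALUES ('2017-01-15'::timestamp, 10),
           ('2018-01-20'::timestamp, 20),
           ('2018-02-03'::timestamp,  5)
)
SELECT date_part('month', date)  AS month_number,  -- 1, 1, 2: January of both years shares a value
       date_trunc('month', date) AS month_start,   -- 2017-01-01, 2018-01-01, 2018-02-01
       amt
FROM sales
ORDER BY date;
```

Grouping by month_number would add the two Januaries together, while grouping by month_start keeps each calendar month separate.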

Aggregating with date/time series

generate series _ SELECT generate_series(from, to, interval)

from the beginning

-- starting at a month end drifts: 2018-01-31, 2018-02-28, 2018-03-28, ...
SELECT generate_series('2018-01-31', '2018-12-31', '1 month'::interval);

aggregation with series

WITH hour_series AS(
    SELECT generate_series('2018-04-23 09:00:00',
                           '2018-04-23 14:00:00',
                           '1 hour'::interval) AS hours)

SELECT hours, count(date)
FROM hour_series
LEFT JOIN sales
ON hours=date_trunc('hour', date)
GROUP BY hours
ORDER BY hours;

aggregation with bins

WITH bins AS(
    SELECT generate_series('2018-04-23 09:00:00',
                           '2018-04-23 15:00:00',
                           '3 hour'::interval) AS lower,
           generate_series('2018-04-23 12:00:00',
                           '2018-04-23 18:00:00',
                           '3 hour'::interval) AS upper)

SELECT lower, upper, count(date)
FROM bins
LEFT JOIN sales
ON date >= lower
  AND date < upper
GROUP BY lower, upper
ORDER BY lower;
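
The binning query can be tried without a real sales table. This sketch assumes PostgreSQL and swaps in a variation: a single generate_series supplies the lower bound and the upper bound is derived by adding the bin width, which keeps the two in step. The sales rows are made up.

```sql
WITH bins AS (
    -- One series for the lower bound; upper = lower + bin width
    SELECT lower, lower + '3 hours'::interval AS upper
    FROM generate_series('2018-04-23 09:00:00'::timestamp,
                         '2018-04-23 15:00:00'::timestamp,
                         '3 hours'::interval) AS g(lower)
),
sales(date) AS (
    -- Hypothetical sale times
    VALUES ('2018-04-23 09:30:00'::timestamp),
           ('2018-04-23 10:15:00'::timestamp),
           ('2018-04-23 16:00:00'::timestamp)
)
SELECT lower, upper, count(date) AS n
FROM bins
LEFT JOIN sales ON date >= lower AND date < upper
GROUP BY lower, upper
ORDER BY lower;
-- 09:00-12:00: 2, 12:00-15:00: 0, 15:00-18:00: 1
```

The LEFT JOIN together with count(date) rather than count(*) is what lets the empty 12:00-15:00 bin appear with a count of 0.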

-- to get month ends, generate from the beginning of each month and subtract 1 day
SELECT generate_series('2018-02-01', '2019-01-01', '1 month'::interval) - '1 day'::interval;

Time between events

lead and lag

SELECT date,
				lag(date) OVER (ORDER BY date),
				lead(date) OVER (ORDER BY date)
FROM sales;

time between events

SELECT date,
       date - lag(date) OVER (ORDER BY date) AS gap
FROM sales;

-- average time
SELECT avg(gap)
FROM (SELECT date - lag(date) OVER (ORDER BY date) AS gap
      FROM sales) AS gaps;

change in a time series

SELECT date, amount, lag(amount) OVER (ORDER BY date),
       amount - lag(amount) OVER (ORDER BY date) AS change
FROM sales;
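
A small worked example makes the lag results concrete. The sketch assumes PostgreSQL; the three sales rows are made up for illustration.

```sql
-- Hypothetical sales on three dates
WITH sales(date, amount) AS (
    VALUES ('2018-04-01'::date, 100),
           ('2018-04-03'::date, 130),
           ('2018-04-10'::date,  90)
)
SELECT date,
       date - lag(date) OVER (ORDER BY date)     AS gap_days,  -- NULL, 2, 7
       amount - lag(amount) OVER (ORDER BY date) AS change     -- NULL, 30, -40
FROM sales
ORDER BY date;
```

The first row has no previous row, so both differences are NULL there; avg() skips NULLs, which gives an average gap of 4.5 days for this sample.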


KUBIG 2023-1 Activity Blog
