Post

🐘 PostgreSQL κΈ°λ³Έ β…© - 데이터 검사 및 μˆ˜μ •

🐘 PostgreSQL κΈ°λ³Έ β…© - 데이터 검사 및 μˆ˜μ •

πŸ“™ γ€Žμ‹€μš© SQL』을 읽고 μ •λ¦¬ν•œ κΈ€μž…λ‹ˆλ‹€.

윑λ₯˜, κ°€κΈˆλ₯˜ 및 κ³„λž€ 생산업체 데이터 κ°€μ Έμ˜€κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CREATE TABLE meat_poultry_egg_establishments (
	establishment_number text CONSTRAINT est_number_key PRIMARY KEY,
	company text,
	street text,
	city text,
	st text,
	zip text,
	phone text,
	grant_date date,
	activities text,
	dbas text
);

COPY meat_poultry_egg_establishments
FROM 'C:\YourDirectory\MPI_Directory_by_Establishment_Name.csv'
WITH (FORMAT CSV, HEADER);

CREATE INDEX company_idx ON meat_poultry_egg_establishments (company);

데이터셋 μΈν„°λ·°ν•˜κΈ°

1
2
3
4
5
6
7
8
9
SELECT company,
	street,
	city,
	st,
	count(*) AS address_count
FROM meat_poultry_egg_establishments
GROUP BY company, street, city, st
HAVING count(*) > 1
ORDER BY company, street, city, st;
  • meat_poultry_egg_establishments ν…Œμ΄λΈ”μ€ μ‹ν’ˆ 생산업체λ₯Ό μ„€λͺ…ν•œλ‹€.
  • μ–Έλœ» 보면 각 ν–‰μ˜ 각 νšŒμ‚¬λ“€μ΄ μ„œλ‘œ λ‹€λ₯Έ μ£Όμ†Œμ—μ„œ μš΄μ˜λœλ‹€κ³  생각할 수 μžˆμ§€λ§Œ, κ·Έλ ‡μ§€ μ•Šμ„ 수 μžˆμœΌλ―€λ‘œ 같은 μ£Όμ†Œλ₯Ό κ°€μ§„ μ—¬λŸ¬ νšŒμ‚¬λ₯Ό μ‘°νšŒν•΄λ³΄μž.
  • HAVING μ ˆμ„ 톡해 두 개 μ΄μƒμ˜ 행이 μžˆλŠ” 경우만 좜λ ₯ν•œλ‹€.
companystreetcityst
Acre Station Meat Farm17076 Hwy 32 NPinetownNC
Beltex Corporation3801 North Grove StreetFort WorthTX
Cloverleaf Cold Storage111 Imperial DriveSanfordNC
Crete Core Ingredients, LLC2220 County Road ICreteNE
Crider, Inc.1 Plant AvenueStillmoreGA
Dimension Marketing & Sales, Inc.386 West 9400 SouthSandyUT
Foster Poultry Farms, A California Corporation6648 Highway 15 NorthFarmervilleLA
Freezer & Dry Storage, LLC21740 Trolley Industrial DriveTaylorMI
JBS Souderton Inc.249 Allentown RoadSoudertonPA
KB Poultry Processing LLC15024 Sandstone Dr.UticaMN
Lakeside Refrigerated Services2600 Oldmans Creek RoadSwedesboroNJ
Liberty Cold Storage1310 Remington Blvd.BolingbrookIL
M.G. Waldbaum Company120 Tower StreetGaylordMN
Midway International Logistics LLC948 Bradley StreetWatertownNY
Nordic Logistics and Warehousing, LLC220 Nordic WayPoolerGA
OK Foods, Inc.3921 Reed LaneFort SmithAR
Pacific Produce Corporation220 East Harmon Industrial Park RoadTamuningGU
Payless Distribution Center (PDC)370 Mendioka StreetDededoGU
Piatkowski Riteway Meats Inc.3555 Witmer RoadNiagara FallsNY
Preferred Freezer Services2700 Trade StreetChesapeakeVA
THE AMERICAN PIG25 MEADOW ROADASHEVILLENC
The Classic Jerky Company21655 Trolley Industrial DriveTaylorMI
United States Cold Storage Inc.11801 NW 102 RoadMedleyFL
  • 총 23개의 행이 좜λ ₯λ˜μ—ˆλ‹€.
  • νšŒμ‚¬κ°€ λ™μΌν•œ μ£Όμ†Œμ— μ—¬λŸ¬ 번 λ‚˜νƒ€λ‚˜λŠ” 데 νƒ€λ‹Ήν•œ μ΄μœ κ°€ μžˆμ„ 수 μžˆμ§€λ§Œ, 데이터 μž…λ ₯ 였λ₯˜κ°€ 발견된 것일 μˆ˜λ„ μžˆμœΌλ―€λ‘œ 데이터셋에 μ˜μ‘΄ν•˜μ§€ μ•Šκ³  μœ νš¨μ„±μ— λŒ€ν•œ 우렀λ₯Ό μ œκΈ°ν•˜λŠ” 것이 ν˜„λͺ…ν•œ 방법이닀.

κ²°μΈ‘κ°’ ν™•μΈν•˜κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- μ£Ό κ·Έλ£Ήν™”ν•˜κ³  μ„ΈκΈ°
SELECT st,
	count(*) AS st_count
FROM meat_poultry_egg_establishments
GROUP BY st
ORDER BY st;

-- IS NULL을 μ‚¬μš©ν•˜μ—¬ st μ—΄μ—μ„œ κ²°μΈ‘ κ°’ μ°ΎκΈ°
SELECT establishment_number,
	company,
	city,
	st,
	zip
FROM meat_poultry_egg_establishments
WHERE st IS NULL;
  • 첫 번째 μΏΌλ¦¬λŠ” λͺ¨λ“  μ£Ό 열에 값이 μžˆλŠ”μ§€, μžˆλ‹€λ©΄ λͺ‡ κ°œμ”© μžˆλŠ”μ§€ ν™•μΈν•˜λŠ” λͺ…λ Ήμ–΄λ‹€.
  • 싀행해보면, NULL 값을 κ°€μ§„ 행이 3개 μžˆλ‹€λŠ” 사싀을 μ•Œ 수 μžˆλ‹€.
  • 두 번째 쿼리λ₯Ό 톡해 결츑값을 μ‘°νšŒν•œ κ²°κ³ΌλŠ” λ‹€μŒκ³Ό κ°™λ‹€.
establishment_numbercompanycitystzip
V18677AAtlas Inspection, Inc.BlaineΒ 55449
M45319+P45319Hall-Namie Packing Company, IncΒ Β 36671
M263A+P263A+V263AJones Dairy FarmΒ Β 53538
  • 각 μ£Όμ—μ„œ λ³΄μœ ν•œ μ‹œμ„€μ˜ μ •ν™•ν•œ 개수λ₯Ό μ•Œκ³ μž ν•  λ•Œ μœ„μ™€ 같은 결츑값은 잘λͺ»λœ κ²°κ³Όλ₯Ό μ΄ˆλž˜ν•  수 μžˆλ‹€.
  • 이와 같은 였λ₯˜λ₯Ό λ°œμƒν•˜λ©΄ λ‹€μš΄λ‘œλ“œν•œ 원본 νŒŒμΌμ„ 직접 ν™•μΈν•˜μ—¬, 데이터λ₯Ό κ°€μ Έμ˜€λŠ” κ³Όμ •μ—μ„œ λ°œμƒν•œ 것인지, 데이터 μžμ²΄μ— λ‚΄μž¬λœ 였λ₯˜μΈμ§€ 확인해야 ν•œλ‹€.

일관성 μ—†λŠ” 데이터 κ°’ ν™•μΈν•˜κΈ°

1
2
3
4
5
SELECT company,
count(*) AS company_count
FROM meat_poultry_egg_establishments
GROUP BY company
ORDER BY company ASC;
  • count()와 ν•¨κ»˜ GROUP BYλ₯Ό μ‚¬μš©ν•˜μ—¬ μ—΄ λ‚΄μ—μ„œ μΌκ΄€λ˜μ§€ μ•Šκ²Œ μž…λ ₯된 데이터λ₯Ό 확인할 수 μžˆλ‹€.
  • κ°€λ Ή 쑰회 κ²°κ³Ό 쀑 μΌλΆ€λŠ” λ‹€μŒκ³Ό κ°™λ‹€.
companycompany_count
AdvancePierre Foods1
Advance Pierre Foods, Inc2
Advance Pierre Foods, Inc.1
AdvancePierre Foods, Inc1
AdvancePierre Foods, Inc.7
  • λ™μΌν•œ νšŒμ‚¬κ°€ μ†Œμœ ν•˜κ³  μžˆμ„ κ°€λŠ₯성이 높은 12개 μ‹œμ„€μ— λŒ€ν•΄ 적어도 λ‹€μ„― 개 μ΄μƒμ˜ μ„œλ‘œ λ‹€λ₯Έ ν‘œκΈ° 방식이 μ‘΄μž¬ν•œλ‹€.
  • μ§‘κ³„λ˜κ±°λ‚˜ ν•©μ‚°λœ λͺ¨λ“  ν•­λͺ©μ΄ μ μ ˆν•˜κ²Œ κ·Έλ£Ήν™”λ˜λ„λ‘ 이름을 ν‘œμ€€ν™”ν•˜λŠ” 것이 μ’‹λ‹€.

length()λ₯Ό μ‚¬μš©ν•˜μ—¬ 잘λͺ»λœ κ°’ ν™•μΈν•˜κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- length()와 count()λ₯Ό μ‚¬μš©ν•˜μ—¬ zip μ—΄ ν…ŒμŠ€νŠΈν•΄ 보기
SELECT length(zip),
	count(*) AS length_count
FROM meat_poultry_egg_establishments
GROUP BY length(zip)
ORDER BY length(zip) ASC;

-- zip μ—΄μ—μ„œ 짧은 값을 μ°ΎκΈ° μœ„ν•΄ length()둜 ν•„ν„°λ§ν•˜κΈ°
SELECT st,
	count(*) AS st_count
FROM meat_poultry_egg_establishments
WHERE length(zip) < 5
GROUP BY st
ORDER BY st ASC;
  • 사싀 CSV 데이터λ₯Ό κ°€μ Έμ˜€λŠ” κ³Όμ •μ—μ„œ 였λ₯˜λ₯Ό μž¬ν˜„ν•˜κΈ° μœ„ν•΄ 우편번호λ₯Ό ν…μŠ€νŠΈ κ°’ λŒ€μ‹  일반 숫자 ν˜•μ‹μœΌλ‘œ μŠ€ν”„λ ˆλ“œμ‹œνŠΈμ— μ €μž₯ν–ˆλ‹€.
  • μ΄λ ‡κ²Œ ν•˜λ©΄ μ •μˆ˜κ°€ 0으둜 μ‹œμž‘ν•  수 μ—†κΈ° λ•Œλ¬Έμ— 07502처럼 0으둜 μ‹œμž‘λ˜λŠ” μš°νŽΈλ²ˆν˜ΈλŠ” μ„ ν–‰ν•˜λŠ” 0을 μžƒλŠ”λ‹€.
  • μ‹€μ œλ‘œλŠ” μ΄λŸ¬ν•œ 상황을 λ°©μ§€ν•˜κΈ° μœ„ν•΄ ν…μŠ€νŠΈ ν˜•μ‹μ„ μ§€μ •ν•΄μ•Ό ν•˜λŠ” μˆ«μžμ— 주의λ₯Ό κΈ°μšΈμ—¬μ•Ό ν•œλ‹€.
  • μš°νŽΈλ²ˆν˜ΈλŠ” λ‹€μ„― 자리 숫자인데, 첫 번째 쿼리λ₯Ό 톡해 3, 4 자리의 μˆ«μžλ„ μ‘΄μž¬ν•˜λŠ” κ±Έ 확인할 수 μžˆλ‹€.
  • 두 번째 쿼리λ₯Ό 톡해 μ–΄λ–€ 주의 μš°νŽΈλ²ˆν˜Έκ°€ 잘λͺ» μ €μž₯λ˜μ—ˆλŠ”μ§€ λͺ©λ‘μ„ 확인할 수 μžˆλ‹€.
  • μ§€κΈˆκΉŒμ§€ ν™•μΈν•œ 문제 λͺ©λ‘μ€ λ‹€μŒκ³Ό κ°™λ‹€.
  1. st μ—΄μ˜ 3개 행에 λŒ€ν•œ κ²°μΈ‘κ°’
  2. ν•˜λ‚˜ μ΄μƒμ˜ νšŒμ‚¬ 이름 철자 뢈일치
  3. 파일 λ³€ν™˜μœΌλ‘œ 인해 λΆ€μ •ν™•ν•΄μ§„ 우편번호

ν…Œμ΄λΈ”, μ—΄, 데이터 μˆ˜μ •ν•˜κΈ°

  • ν…Œμ΄λΈ”μ—μ„œ μ—΄, 데이터 νƒ€μž…, 값에 이λ₯΄κΈ°κΉŒμ§€ λ°μ΄ν„°λ² μ΄μŠ€μ—μ„œ μƒμ„±λœ ν›„ ꡬ체적으둜 κ³ μ •λ˜λŠ” 것은 거의 μ—†λ‹€.
  • ν•„μš”μ— 따라 ν…Œμ΄λΈ”μ— 열을 μΆ”κ°€ν•˜κ³ , κΈ°μ‘΄ μ—΄μ˜ 데이터 νƒ€μž…μ„ λ³€κ²½ν•˜κ³ , 값을 νŽΈμ§‘ν•˜κ²Œ λœλ‹€.

데이터 μΈν„°λ·°μ—μ„œ λ„ˆλ¬΄ λ§Žμ€ κ²°μΈ‘κ°’ λ˜λŠ” μˆ˜μ²œμ„ μ˜ˆμƒν–ˆλŠ”λ° μˆ˜μ‹­ 얡에 이λ₯΄λŠ” μ‹μœΌλ‘œ 상식에 μ–΄κΈ‹λ‚˜λŠ” 값이 발견되면 κ·Έ λ°μ΄ν„°μ˜ μ‚¬μš©μ„ μž¬ν‰κ°€ν•΄μ•Ό ν•œλ‹€. 많이 μ˜μ‹¬λ˜λŠ” 경우 첫 번째 λ‹¨κ³„λŠ” 원본 데이터 νŒŒμΌμ„ λ‹€μ‹œ ν™•μΈν•˜λŠ” 것이닀. μ˜¬λ°”λ₯΄κ²Œ κ°€μ Έμ™”λŠ”μ§€, λͺ¨λ“  μ—΄μ˜ 데이터가 ν…Œμ΄λΈ”μ˜ λ™μΌν•œ 열에 μžˆλŠ”μ§€ 확인해야 ν•œλ‹€. λ˜λŠ” 데이터λ₯Ό μƒμ„±ν•œ κΈ°κ΄€μ΄λ‚˜ νšŒμ‚¬μ— μ—°λ½ν•˜μ—¬ 보고 μžˆλŠ” λ‚΄μš©μ„ ν™•μΈν•˜κ³  μ„€λͺ…을 μš”μ²­ν•˜λŠ” 것이닀. λ™μΌν•œ 데이터λ₯Ό μ‚¬μš©ν•œ λ‹€λ₯Έ μ‚¬λžŒμ—κ²Œ 쑰언을 κ΅¬ν•˜λŠ” 방법도 μžˆλ‹€.

ALTER TABLE둜 ν…Œμ΄λΈ” μˆ˜μ •ν•˜κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
-- ν…Œμ΄λΈ” μ—΄ μΆ”κ°€
ALTER TABLE table_name ADD COLUMN column_name data_type; 

-- ν…Œμ΄λΈ” μ—΄ 제거
ALTER TABLE table_name DROP COLUMN column_name;

-- ν…Œμ΄λΈ” μ—΄ 데이터 νƒ€μž… λ³€κ²½
ALTER TABLE table_name ALTER COLUMN column_name SET DATA TYPE data_type;

-- ν…Œμ΄λΈ” μ—΄ μ œμ•½ 쑰건 μΆ”κ°€/μ‚­μ œ
ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;
ALTER TABLE table_name ALTER COLUMN column_name DROP NOT NULL;

UPDATE둜 κ°’ μˆ˜μ •ν•˜κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- λͺ¨λ“  ν–‰μ˜ 단일 μ—΄ μˆ˜μ •
UPDATE table_name SET column = value;

-- λͺ¨λ“  ν–‰μ˜ μ—¬λŸ¬ μ—΄ μˆ˜μ •
UPDATE table_name SET column_a = value, column_b = value;

-- νŠΉμ • ν–‰ μˆ˜μ •
UPDATE table_name SET column = value WHERE criteria;

-- ν•œ ν…Œμ΄λΈ”μ„ λ‹€λ₯Έ ν…Œμ΄λΈ”μ˜ κ°’μœΌλ‘œ μˆ˜μ •
UPDATE table_name
SET column = ( SELECT column
	FROM table_b
	WHERE table.column = table_b.column )
WHERE EXISTS ( SELECT column
	FROM table_b
	WHERE table.column = table_b.column );
  • λ§ˆμ§€λ§‰ 쿼리의 경우 WHERE EXISTS 절이 μ—†μœΌλ©΄ μ˜λ„μΉ˜ μ•Šκ²Œ 일뢀 값을 NULL둜 μ„€μ •ν•˜κ²Œ 될 수 μžˆλ‹€.
  • 일뢀 λ°μ΄ν„°λ² μ΄μŠ€ κ΄€λ¦¬μžλŠ” ν…Œμ΄λΈ” κ°„ μ—…λ°μ΄νŠΈλ₯Ό μœ„ν•œ μΆ”κ°€ ꡬ문을 μ œκ³΅ν•œλ‹€.
  • κ°€λ Ή PostgreSQLμ—μ„œλŠ” λ‹€μŒ 쿼리도 κ°€λŠ₯ν•˜λ‹€.
1
2
3
4
UPDATE table
SET column = table_b.column
FROM table_b
WHERE table.column = table_b.column;
  • UPDATE 문을 μ‹€ν–‰ν•  λ•Œ PostgreSQL은 영ν–₯을 λ°›λŠ” ν–‰ μˆ˜μ™€ ν•¨κ»˜ UPDATEλ₯Ό λ‚˜νƒ€λ‚΄λŠ” λ©”μ‹œμ§€λ₯Ό λ°˜ν™˜ν•œλ‹€.

RETURNING으둜 μˆ˜μ •λœ 데이터 보기

  • UPDATE 문에 RETURNING ν‚€μ›Œλ“œλ₯Ό μΆ”κ°€ν•˜λ©΄ λ³„λ„μ˜ 쿼리λ₯Ό μΆ”κ°€ μž…λ ₯ν•  ν•„μš” 없이 μˆ˜μ •λœ 값을 확인할 수 μžˆλ‹€.
1
2
3
UPDATE table_name
SET column_a = value
RETURNING column_a, column_b, column_c;
  • RETURNING은 λ°μ΄ν„°λ² μ΄μŠ€κ°€ μˆ˜μ •λœ ν–‰ 수λ₯Ό ν‘œμ‹œν•˜λŠ” λŒ€μ‹  μˆ˜μ •λœ ν–‰μ—μ„œ μ§€μ •ν•œ 열을 ν‘œμ‹œν•˜λ„λ‘ μ§€μ‹œν•œλ‹€.
  • RETURNING은 PostgreSQL μ „μš© κ΅¬ν˜„μœΌλ‘œ INSERT와 DELETE FROMκ³Ό ν•¨κ»˜ μ‚¬μš©ν•  μˆ˜λ„ μžˆλ‹€.

λ°±μ—… ν…Œμ΄λΈ” μƒμ„±ν•˜κΈ°

1
2
3
4
5
6
7
CREATE TABLE meat_poultry_egg_establishments_backup AS
SELECT * FROM meat_poultry_egg_establishments;

-- λ ˆμ½”λ“œ 수 확인
SELECT
	(SELECT count(*) FROM meat_poultry_egg_establishments) AS original,
	(SELECT count(*) FROM meat_poultry_egg_establishments_backup) AS backup;
  • ν…Œμ΄λΈ”μ„ μˆ˜μ •ν•˜κΈ° 전에 μ°Έμ‘° 및 백업을 μœ„ν•œ 볡사본을 λ§Œλ“œλŠ” 것이 μ’‹λ‹€.
  • 잘 λ§Œλ“€μ–΄μ‘ŒλŠ”μ§€λŠ” 두 번째 쿼리λ₯Ό 톡해 두 ν…Œμ΄λΈ”μ˜ λ ˆμ½”λ“œ 수λ₯Ό λΉ„κ΅ν•˜λ©΄ λœλ‹€.
originalbackup
6,2876,287
  • 참고둜 CREATE TABLE 문을 톡해 ν…Œμ΄λΈ” 백업을 생성할 λ•Œ μΈλ±μŠ€λŠ” λ³΅μ‚¬λ˜μ§€ μ•ŠλŠ”λ‹€.
  • λ°±μ—…μ—μ„œ 쿼리λ₯Ό μ‹€ν–‰ν•˜κΈ°λ‘œ κ²°μ •λœ 경우 λ³„λ„μ˜ 인덱슀λ₯Ό 생성해야 ν•œλ‹€.

λˆ„λ½λœ μ—΄ κ°’ λ³΅μ›ν•˜κΈ°

  • μ•žμ„  데이터 μΈν„°λ·°μ—μ„œ meat_poultry_egg_establishments ν…Œμ΄λΈ”μ˜ μ„Έ 행은 NULL 값을 κ°–κ³  μžˆμŒμ„ μ•Œ 수 μžˆμ—ˆλ‹€.
  • 각 μ£Όμ—μ„œ 전체 μ‹œμ„€ 수λ₯Ό μ–»μœΌλ €λ©΄ UPDATE 문을 μ‚¬μš©ν•˜μ—¬ λˆ„λ½λœ 값을 μ±„μ›Œμ•Ό ν•œλ‹€.
μ—΄ 볡사본 μƒμ„±ν•˜κΈ°
1
2
ALTER TABLE meat_poultry_egg_establishments ADD COLUMN st_copy text;
UPDATE meat_poultry_egg_establishments SET st_copy = "st";
  • μ–΄λ”˜κ°€μ— μ‹¬κ°ν•œ 였λ₯˜κ°€ λ°œμƒν•˜λ”λΌλ„ μ›λž˜ 데이터가 μœ μ§€λ˜λ„λ‘ 주의λ₯Ό 기울이기 μœ„ν•΄μ„œλŠ” μ—΄ 볡사본을 λ§Œλ“œλŠ” 것이 μ’‹λ‹€.
  • λ‹€μŒ 쿼리λ₯Ό 톡해 값이 μ œλŒ€λ‘œ λ³΅μ‚¬λ˜μ—ˆλŠ”μ§€ 확인할 수 μžˆλ‹€.
1
2
3
4
5
SELECT st,
	st_copy
FROM meat_poultry_egg_establishments
WHERE st IS DISTINCT FROM st_copy
ORDER BY st;
  • IS DISTINCT FROM은 두 μ—΄μ˜ 값이 λ‹€λ₯Έμ§€ ν™•μΈν•œλ‹€.
  • λ˜ν•œ NULL을 κ°’μœΌλ‘œ μ·¨κΈ‰ν•˜κΈ° λ•Œλ¬Έμ— 비ꡐ κ²°κ³ΌλŠ” 항상 true, false이닀.
값이 λˆ„λ½λœ ν–‰ μ—…λ°μ΄νŠΈν•˜κΈ°
1
2
3
4
5
6
7
8
9
10
11
12
UPDATE meat_poultry_egg_establishments
SET st = 'MN'
WHERE establishment_number = 'V18677A';

UPDATE meat_poultry_egg_establishments
SET st = 'AL'
WHERE establishment_number = 'M45319+P45319';

UPDATE meat_poultry_egg_establishments
SET st = 'WI'
WHERE establishment_number = 'M263A+P263A+V263A'
RETURNING establishment_number, company, city, st, zip;
  • 각 UPDATE 문이 단일 행에 영ν–₯을 미치길 μ›ν•˜λ―€λ‘œ WHERE μ ˆμ— κΈ°λ³Έ 킀에 λŒ€ν•œ 쑰건을 μΆ”κ°€ν•΄μ•Ό ν•œλ‹€.
μ›λž˜ κ°’ λ³΅μ›ν•˜κΈ°
1
2
3
4
5
6
7
8
9
-- λ°±μ—…ν•œ μ—΄ λ³΅μ›ν•˜κΈ°
UPDATE meat_poultry_egg_establishments
SET st = st_copy;

-- λ°±μ—…ν•œ ν…Œμ΄λΈ” λ³΅μ›ν•˜κΈ°
UPDATE meat_poultry_egg_establishments original
SET st = backup.st
FROM meat_poultry_egg_establishments_backup backup
WHERE original.establishment_number = backup.establishment_number;
  • 첫 번째 μΏΌλ¦¬λŠ” μ—΄ 볡사본을 톡해 데이터λ₯Ό λ³΅μ›ν•˜κ³ , 두 번째 μΏΌλ¦¬λŠ” λ°±μ—…ν•œ ν…Œμ΄λΈ” 톡해 데이터λ₯Ό λ³΅μ›ν•˜λŠ” 것이닀.

일관성을 μœ„ν•œ κ°’ μ—…λ°μ΄νŠΈν•˜κΈ°

1
2
3
4
5
AdvancePierre Foods
Advance Pierre Foods, Inc
Advance Pierre Foods, Inc.
AdvancePierre Foods, Inc
AdvancePierre Foods, Inc.
  • μ•žμ„  데이터 μΈν„°λ·°μ—μ„œ νšŒμ‚¬ 이름이 μΌκ΄€λ˜μ§€ μ•Šκ²Œ μž…λ ₯된 경우λ₯Ό λ°œκ²¬ν–ˆλ‹€.
  • UPDATE 문을 μ‚¬μš©ν•΄ 이 νšŒμ‚¬ μ΄λ¦„μ˜ 철자λ₯Ό ν‘œμ€€ν™”ν•  수 μžˆλ‹€.
  • 데이터λ₯Ό λ³΄ν˜Έν•˜κΈ° μœ„ν•΄ μƒˆ 열을 λ§Œλ“€κ³ , company μ—΄ 값을 μƒˆ 열에 λ³΅μ‚¬ν•˜κ³ μ„œ, λ³΅μ‚¬ν•œ κ°’μœΌλ‘œ μž‘μ—…ν•˜λ©΄ λœλ‹€.
1
2
3
4
ALTER TABLE meat_poultry_egg_establishments ADD COLUMN company_standard text;

UPDATE meat_poultry_egg_establishments
SET company_standard = company;
  • company 열에 μžˆλŠ” νšŒμ‚¬ 이름 쀑 Pierreκ°€ ν¬ν•¨λ˜μ–΄ μžˆλŠ” 경우 Advance-Pierre Foods, Inc.둜 ν‘œμ‹œν•˜κ³ μž ν•œλ‹€.
  • λ‹€μŒκ³Ό 같이 데이터λ₯Ό μˆ˜μ •ν•΄μ•Ό ν•œλ‹€.
1
2
3
4
UPDATE meat_poultry_egg_establishments
SET company_standard = 'Advance-Pierre Foods, Inc.'
WHERE company LIKE '%Pierre%'
RETURNING company, company_standard;
companycompany_standard
Advance Pierre Foods, IncAdvance-Pierre Foods, Inc.
Advance Pierre Foods, IncAdvance-Pierre Foods, Inc.
Advance Pierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre FoodsAdvance-Pierre Foods, Inc.
AdvancePierre Foods, IncAdvance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
AdvancePierre Foods, Inc.Advance-Pierre Foods, Inc.
  • 이제 ν•΄λ‹Ή νšŒμ‚¬ 이름은 μΌκ΄€λœ 철자둜 ν‘œμ€€ν™”λ˜μ—ˆκ³ , μ°Έμ‘°λ₯Ό μœ„ν•΄ κΈ°μ‘΄ company 열은 μœ μ§€ν•œλ‹€.

연결을 μ‚¬μš©ν•˜μ—¬ 우편번호 λ³΅κ΅¬ν•˜κΈ°

1
2
3
4
ALTER TABLE meat_poultry_egg_establishments ADD COLUMN zip_copy text;

UPDATE meat_poultry_egg_establishments
SET zip_copy = zip;
  • μœ„ μ½”λ“œλŠ” μ—΄ 볡사본을 μƒμ„±ν•˜λŠ” 쿼리닀.
1
2
3
4
5
6
7
8
9
-- 두 개의 μ„ ν–‰ 0이 λˆ„λ½λœ zip μ—΄μ˜ μ½”λ“œ μˆ˜μ •ν•˜κΈ°
UPDATE meat_poultry_egg_establishments
SET zip = '00' || zip
WHERE st IN('PR','VI') AND length(zip) = 3;

-- ν•œ 개의 μ„ ν–‰ 0이 λˆ„λ½λœ zip μ—΄μ˜ μ½”λ“œ μˆ˜μ •ν•˜κΈ°
UPDATE meat_poultry_egg_establishments
SET zip = '0' || zip
WHERE st IN('CT','MA','ME','NH','NJ','RI','VT') AND length(zip) = 4;
  • μš°νŽΈλ²ˆν˜Έμ— λˆ„λ½λœ 0을 μΆ”κ°€ν•˜λŠ” 쿼리닀.
  • 데이터λ₯Ό μˆ˜μ •ν•  λ•Œ, 데이터 인터뷰 μ‹œ 발견된 ν–‰ μˆ˜μ™€ μ‹€μ œλ‘œ μˆ˜μ •λœ ν–‰ 수λ₯Ό 확인해야 ν•œλ‹€.

μ—¬λŸ¬ ν…Œμ΄λΈ”μ—μ„œ κ°’ μ—…λ°μ΄νŠΈν•˜κΈ°

1
2
3
4
5
6
7
8
9
-- state_regions ν…Œμ΄λΈ” 생성 및 μ±„μš°κΈ°
CREATE TABLE state_regions (
	st text CONSTRAINT st_key PRIMARY KEY,
	region text NOT NULL
);

COPY state_regions
FROM 'C:\YourDirectory\state_regions.csv'
WITH (FORMAT CSV, HEADER);
  • μš°λ¦¬κ°€ λ‹€λ£¨λŠ” ν…Œμ΄λΈ”μ— μžˆλŠ” 각 νšŒμ‚¬λ“€μ˜ 검사 λ‚ μ§œλ₯Ό μ„€μ •ν•˜κΈ° μœ„ν•΄ ν…Œμ΄λΈ”μ„ μƒμ„±ν•˜κ³  데이터λ₯Ό κ°€μ Έμ˜€μž.
1
2
3
4
5
6
7
8
9
10
-- inspection_deadline μ—΄ μΆ”κ°€ 및 μ—…λ°μ΄νŠΈν•˜κΈ°
ALTER TABLE meat_poultry_egg_establishments
	ADD COLUMN inspection_deadline timestamp with time zone;

UPDATE meat_poultry_egg_establishments establishments
SET inspection_deadline = '2022-12-01 00:00 EST'
WHERE EXISTS (SELECT state_regions.region
	FROM state_regions
	WHERE establishments.st = state_regions.st
		AND state_regions.region = 'New England');
  • ALTER TABLE λ¬Έμ—μ„œ 열을 μΆ”κ°€ν•˜κ³ , 같은 주에 μ†ν•˜λ©΄μ„œ κ·Έ 지역이 λ‰΄μž‰κΈ€λžœλ“œμΈ 경우 검사 λ‚ μ§œ 값을 κ°±μ‹ ν•œλ‹€.
1
2
3
4
SELECT st, inspection_deadline
FROM meat_poultry_egg_establishments
GROUP BY st, inspection_deadline
ORDER BY st;
  • κ²°κ³Όλ₯Ό 확인해보면 λ‰΄μž‰κΈ€λžœλ“œ μ§€μ—­ μ™Έμ—λŠ” 아직 검사 λ‚ μ§œ 값이 NULL둜 μ„€μ •λ˜μ–΄ μžˆλŠ” 것을 확인할 수 μžˆλ‹€.

λΆˆν•„μš”ν•œ 데이터 μ‚­μ œν•˜κΈ°

ν…Œμ΄λΈ”μ—μ„œ ν–‰ μ‚­μ œν•˜κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- λͺ¨λ“  ν–‰ μ‚­μ œ
DELETE FROM table_name;

-- νŠΉμ • ν–‰ μ‚­μ œ
DELETE FROM table_name WHERE expression;

-- 예제 ν…Œμ΄λΈ”μ—μ„œ λ―Έκ΅­ μ˜ν† λ₯Ό μ œμ™Έ
DELETE FROM meat_poultry_egg_establishments
WHERE st IN('AS','GU','MP','PR','VI');

-- ν…Œμ΄λΈ” 데이터 λΉ„μš°κΈ°
TRUNCATE table_name

-- ν…Œμ΄λΈ” μ‹œν€€μŠ€ μž¬μ„€μ •
TRUNCATE table_name RESTART IDENTITY;

ν…Œμ΄λΈ” μ—΄ μ‚­μ œν•˜κΈ°

1
2
3
4
5
-- ν…Œμ΄λΈ” μ—΄ μ‚­μ œ 
ALTER TABLE table_name DROP COLUMN column_name;

-- 예제 ν…Œμ΄λΈ”μ—μ„œ μ—΄ 볡사본 μ‚­μ œ
ALTER TABLE meat_poultry_egg_establishments DROP COLUMN zip_copy;

λ°μ΄ν„°λ² μ΄μŠ€μ—μ„œ ν…Œμ΄λΈ” μ‚­μ œν•˜κΈ°

1
2
3
4
5
-- λ°μ΄ν„°λ² μ΄μŠ€μ—μ„œ ν…Œμ΄λΈ” μ‚­μ œ
DROP TABLE table_name;

-- λ°μ΄ν„°λ² μ΄μŠ€μ—μ„œ 예제 ν…Œμ΄λΈ” μ‚­μ œ
DROP TABLE meat_poultry_egg_establishments_backup;

νŠΈλžœμž­μ…˜μœΌλ‘œ λ³€κ²½ 사항 μ €μž₯ν•˜κΈ° λ˜λŠ” 되돌리기

  • μ§€κΈˆκΉŒμ§€ 배운 데이터 변경은 μ΅œμ’…μ μΈ 변경이닀.
  • 즉 DELETEλ‚˜ UPDATE 쿼리λ₯Ό μ‹€ν–‰ν•œ ν›„ λ³€κ²½ 사항을 μ‹€ν–‰ μ·¨μ†Œν•˜λŠ” μœ μΌν•œ 방법은 λ°±μ—…μ—μ„œ λ³΅μ›ν•˜λŠ” 것이닀.
  • 이 방법듀과 λ‹€λ₯΄κ²Œ λ³€κ²½ 사항을 μ™„λ£Œν•˜κΈ° 전에 ν™•μΈν•˜μ—¬, μ˜λ„ν•œ 변경이 μ•„λ‹ˆλΌλ©΄ κ·Έ λ³€κ²½ 사항을 μ·¨μ†Œν•˜λŠ” 방법도 μžˆλ‹€.
  • 쿼리의 μ‹œμž‘κ³Ό 끝에 λ‹€μŒ ν‚€μ›Œλ“œλ₯Ό μ‚¬μš©ν•΄ μ •μ˜ν•˜λŠ” λͺ…λ Ήλ¬Έ 그룹인 νŠΈλžœμž­μ…˜ 블둝 λ‚΄μ—μ„œ SQL 문으둜 감싸 이λ₯Ό μˆ˜ν–‰ν•œλ‹€.
  1. START TRANSACTION
    • νŠΈλžœμž­μ…˜ λΈ”λ‘μ˜ μ‹œμž‘μ„ μ•Œλ¦°λ‹€.
    • PostgreSQLμ—μ„œλŠ” ν‘œμ€€ ANSI SQL에 μ†ν•˜μ§€ μ•ŠλŠ” BEGIN ν‚€μ›Œλ“œλ₯Ό μ‚¬μš©ν•  μˆ˜λ„ μžˆλ‹€.
  2. COMMIT
    • λΈ”λ‘μ˜ 끝을 μ•Œλ¦¬κ³  λͺ¨λ“  λ³€κ²½ 사항을 μ €μž₯ν•œλ‹€.
  3. ROLLBACK
    • λΈ”λ‘μ˜ 끝을 μ•Œλ¦¬κ³  λͺ¨λ“  λ³€κ²½ 사항을 λ˜λŒλ¦°λ‹€.
  • BEGINκ³Ό COMMIT 사이에 μ—¬λŸ¬ λͺ…령문을 ν¬ν•¨ν•˜μ—¬ λ°μ΄ν„°λ² μ΄μŠ€μ—μ„œ ν•˜λ‚˜μ˜ μž‘μ—… λ‹¨μœ„λ₯Ό μˆ˜ν–‰ν•˜λŠ” 일련의 μž‘μ—…, 즉 νŠΈλžœμž­μ…˜μ„ μ •μ˜ν•  수 μžˆλ‹€.
  • μ΄λ•Œ ν•œ 단계가 μ‹€νŒ¨ν•˜λ©΄ λ‹€λ₯Έ 단계도 λͺ¨λ‘ μ·¨μ†Œλœλ‹€.
  • κ°€λ Ή 우리의 ν…Œμ΄λΈ”μ—μ„œ ARGO Merchants Oakland LLCλΌλŠ” νšŒμ‚¬μ™€ κ΄€λ ¨λœ μ§€μ €λΆ„ν•œ 데이터λ₯Ό μ •λ¦¬ν•œλ‹€κ³  κ°€μ •ν•΄λ³΄μž.
company
ARGO Merchants Oakland LLC
ARGO Merchants Oakland LLC
ARGO Merchants Oakland, LLC
  • μš°λ¦¬λŠ” 이름이 μΌκ΄€λ˜κΈ°λ₯Ό μ›ν•˜λ―€λ‘œ UPDATE 쿼리λ₯Ό μ‚¬μš©ν•˜μ—¬ μ„Έ 번째 행을 μˆ˜μ •ν•˜λ˜, μ΄λ²ˆμ—λŠ” μ΅œμ’… μ—…λ°μ΄νŠΈ 전에 μ—…λ°μ΄νŠΈ κ²°κ³Όλ₯Ό λ¨Όμ € 확인해볼 것이닀.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- νŠΈλžœμž­μ…˜μ„ μ‹œμž‘ν•˜κ³  μ—…λ°μ΄νŠΈν•˜κΈ°
START TRANSACTION;

UPDATE meat_poultry_egg_establishments
SET company = 'AGRO Merchantss Oakland LLC'
WHERE company = 'AGRO Merchants Oakland, LLC';

-- 변경사항 ν™•μΈν•˜κΈ°
SELECT company
FROM meat_poultry_egg_establishments
WHERE company LIKE 'AGRO%'
ORDER BY company;

-- 변경사항 λ¬΄μ‹œν•˜κΈ°
ROLLBACK;
company
AGRO Merchants Oakland LLC
AGRO Merchants Oakland LLC
AGRO Merchantss Oakland LLC
  • κ²°κ³Όλ₯Ό 보면 μœ„μ™€ 같이 μˆ˜μ •μ΄ 잘λͺ» 된 것을 확인할 수 μžˆλ‹€.
  • ROLLBACK λͺ…λ Ήμ–΄λ₯Ό 톡해 λ³€κ²½ 사항을 되돌릴 수 있고, κ·ΈλŒ€λ‘œ μ €μž₯ν•˜λ €λ©΄ COMMIT λͺ…λ Ήμ–΄λ₯Ό μ‚¬μš©ν•˜λ©΄ λœλ‹€.

νŠΈλžœμž­μ…˜μ„ μ‹œμž‘ν•˜λ©΄ COMMIT을 μ‹€ν–‰ν•˜κΈ° μ „κΉŒμ§€ 데이터에 λŒ€ν•œ λ³€κ²½ 사항이 λ‹€λ₯Έ λ°μ΄ν„°λ² μ΄μŠ€ μ‚¬μš©μžμ—κ²Œ ν‘œμ‹œλ˜μ§€ μ•ŠλŠ”λ‹€.

큰 ν…Œμ΄λΈ”μ„ μ—…λ°μ΄νŠΈν•  λ•Œ μ„±λŠ₯ ν–₯μƒν•˜κΈ°

1
2
3
4
5
6
7
8
9
10
11
12
13
-- μƒˆ 열을 μΆ”κ°€ν•˜κ³  μ±„μš°λ©΄μ„œ ν…Œμ΄λΈ” λ°±μ—…ν•˜κΈ°
CREATE TABLE meat_poultry_egg_establishments_backup AS
SELECT *,
	'2023-02-14 00:00 EST'::timestamp with time zone AS reviewed_date
FROM meat_poultry_egg_establishments;

-- ALTER TABLE을 μ‚¬μš©ν•˜μ—¬ ν…Œμ΄λΈ” 이름 λ°”κΎΈκΈ°
ALTER TABLE meat_poultry_egg_establishments
	RENAME TO meat_poultry_egg_establishments_temp;
ALTER TABLE meat_poultry_egg_establishments_backup
	RENAME TO meat_poultry_egg_establishments;
ALTER TABLE meat_poultry_egg_establishments_temp
	RENAME TO meat_poultry_egg_establishments_backup;
  • PostgreSQL이 λ‚΄λΆ€μ μœΌλ‘œ μž‘λ™ν•˜λŠ” 방식 λ•Œλ¬Έμ— ν…Œμ΄λΈ”μ— 열을 μΆ”κ°€ν•˜κ³  κ°’μœΌλ‘œ μ±„μš°λ©΄ ν…Œμ΄λΈ” 크기가 κΈ‰κ²©νžˆ 컀질 수 μžˆλ‹€.
  • κ°€λ Ή 값이 μ—…λ°μ΄νŠΈλ  λ•Œλ§ˆλ‹€ λ°μ΄ν„°λ² μ΄μŠ€κ°€ κΈ°μ‘΄ ν–‰μ˜ μƒˆ 버전을 μƒμ„±ν•˜λ©΄μ„œ 이전 버전 행을 μ‚­μ œν•˜μ§€λŠ” μ•ŠκΈ° λ•Œλ¬Έμ— 본질적으둜 ν…Œμ΄λΈ”μ˜ 크기λ₯Ό 두 배둜 λŠ˜λ¦°λ‹€.
  • μ‚¬μš©ν•˜μ§€ μ•ŠλŠ” λΆˆν•„μš”ν•œ 곡간 μ œκ±°λŠ” 이후에 19μž₯μ—μ„œ ν•™μŠ΅ν•œλ‹€.
  • μž‘μ€ 데이터셋을 닀룬닀면 λ¬΄μ‹œν•  μ •λ„μ§€λ§Œ, μˆ˜μ‹­λ§Œ λ˜λŠ” 수백만 행을 닀룬닀면 μ—…λ°μ΄νŠΈν•˜λŠ”λ° ν•„μš”ν•œ μ‹œκ°„κ³Ό 그에 λ”°λ₯Έ μΆ”κ°€ λ””μŠ€ν¬ μ‚¬μš©λŸ‰μ€ 상당할 수 μžˆλ‹€.
  • 이λ₯Ό ν•΄κ²°ν•˜κΈ° μœ„ν•΄ 전체 ν…Œμ΄λΈ”μ„ λ³΅μ‚¬ν•˜κ³  μž‘μ—… 쀑에 μ±„μ›Œμ§„ 열을 μΆ”κ°€ν•˜λŠ” λ°©μ‹μœΌλ‘œ λ””μŠ€ν¬ 곡간을 μ ˆμ•½ν•  수 μžˆλ‹€.
  • 그런 λ‹€μŒ ν…Œμ΄λΈ” 이름을 λ³€κ²½ν•˜μ—¬ 볡사본이 원본을 λŒ€μ²΄ν•˜κ³  원본이 λ°±μ—…μš©μ΄ λ˜λ„λ‘ ν•œλ‹€.
  • 그러면 μΆ”κ°€λœ κΈ°μ‘΄ ν–‰ 없이 μƒˆλ‘œμš΄ ν…Œμ΄λΈ”μ„ κ°–κ²Œ λœλ‹€.
  • μ—¬κΈ°μ„œλŠ” ν…Œμ΄λΈ” 이름을 μˆ˜μ •ν•˜κΈ° μœ„ν•΄ ALTER TABLE을 RENAME TO와 ν•¨κ»˜ μ‚¬μš©ν•œλ‹€.
  • 이제 원본 ν…Œμ΄λΈ”μ€ meat_poultry_egg_establishments_backup이고, μΆ”κ°€λœ 열이 μžˆλŠ” 볡사본 ν…Œμ΄λΈ”μ€ meat_poultry_egg_establishments이닀.
  • μ΄λŸ¬ν•œ ν”„λ‘œμ„ΈμŠ€λ₯Ό μ‚¬μš©ν•˜λ©΄ ν–‰ μ—…λ°μ΄νŠΈλ₯Ό λ°©μ§€ν•˜μ—¬ ν…Œμ΄λΈ”μ˜ κ³Όλ„ν•œ νŒ½μ°½μ„ ν”Όν•  수 μžˆλ‹€.
This post is licensed under CC BY 4.0 by the author.