쿼리 성능을 튜닝하기 위해 Amazon RDS PostgreSQL 또는 Aurora PostgreSQL에 대한 쿼리의 실행 계획을 기록하려면 어떻게 해야 합니까?

6분 분량

쿼리에 어떤 실행 계획이 사용되는지 확인하고 그에 맞게 적절히 튜닝하고 싶습니다. Amazon Relational Database Service(Amazon RDS) PostgreSQL DB 인스턴스 또는 Amazon Aurora PostgreSQL DB 클러스터의 이러한 쿼리에 대한 실행 계획을 기록하려면 어떻게 해야 합니까?

해결 방법

auto_explain 모듈을 사용하여 쿼리에 대한 실행 계획을 기록할 수 있습니다.

1. shared_preload_libraries 파라미터에 auto_explain을 추가합니다.

2. DB 인스턴스 또는 클러스터를 재부팅합니다.

3. auto_explain을 사용하려면 auto_explain.log_min_duration 파라미터를 -1 이외의 값으로 변경해야 합니다. 이 값은 최소 명령문 실행 시간(밀리초)으로, 전달되어 명령문이 실행될 경우 실행 계획이 기록됩니다. 파라미터를 0으로 설정하면 모든 계획이 기록됩니다.

중요: auto_explain.log_min_duration 파라미터를 0으로 설정하면 성능이 저하되고 스토리지 공간이 대폭 소비됩니다. 이로 인해 인스턴스에 문제가 발생할 수 있습니다.

4. 데이터베이스에서 쿼리를 실행합니다.

5. 실행 계획을 포함하여 PostgreSQL 로그 파일을 보거나 다운로드합니다. 다음 예제 실행 계획에는 실행 시간(4673.182ms)과 쿼리 텍스트가 포함된 실행 계획이 있습니다.

2020-01-02 03:08:44 UTC:27.0.3.156(21356):postgres@postgres:[2040]:LOG: duration: 4673.182 ms plan:
Query Text: SELECT COUNT(*) FROM hoge ORDER BY 1;
Sort (cost=215575.00..215575.00 rows=1 width=8)
Sort Key: (count(*))
-> Aggregate (cost=215574.98..215574.99 rows=1 width=8)
-> Seq Scan on hoge (cost=0.00..190336.18 rows=10095518 width=0)

auto_explain 파라미터를 사용하여 쿼리 성능 튜닝

pg_autoexplain을 활성화하는 것 이외에 auto_explain.log_min_duration 파라미터는 다른 유용한 파라미터를 사용할 수 있게 합니다. 다음 auto_explain 파라미터(재부팅하지 않고 변경할 수 있음)를 사용하여 쿼리 성능을 튜닝할 수 있습니다. 자세한 내용은 각 파라미터에 대해 PostgreSQL 설명서를 참조하세요.

auto_explain.log_analyze

auto_explain.log_analyze 파라미터는 실행 계획이 기록될 때 EXPLAIN ANALYZE 출력(EXPLAIN 출력 대신)을 인쇄합니다. 이 파라미터는 오버헤드로 인해 쿼리 성능이 저하시킬 수 있으며 기본값은 꺼짐입니다. 자세한 내용은 Postgres 설명서의 auto_explain.log_analyze를 참조하세요.

다음 예를 참조하세요.

2020-02-01 07:42:09 UTC:27.0.3.145(29943):master@postgres:[789]:LOG:  duration: 18.658 ms  plan:
 Query Text: SELECT COUNT(*) FROM t1 INNER JOIN t2 USING(id);
 Aggregate  (cost=577.45..577.46 rows=1 width=8) (actual time=18.641..18.641 rows=1 loops=1)
   ->  Hash Join  (cost=269.98..552.45 rows=9999 width=0) (actual time=8.108..16.576 rows=9999 loops=1)
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on t1  (cost=0.00..144.99 rows=9999 width=4) (actual time=0.839..4.151 rows=9999 loops=1)
         ->  Hash  (cost=144.99..144.99 rows=9999 width=4) (actual time=7.186..7.186 rows=9999 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               ->  Seq Scan on t2  (cost=0.00..144.99 rows=9999 width=4) (actual time=0.667..4.289 rows=9999 loops=1)

auto_explain.log_buffers

auto_explain.log_buffers 파라미터를 사용하면 실행 계획을 기록할 때 버퍼 사용량 통계를 인쇄할지 여부를 결정할 수 있습니다. 이 파라미터를 사용하면 설명과 함께 버퍼 옵션을 사용하는 것과 동일한 효과가 있습니다. 반드시 auto_explain.log_buffers와 함께 auto_explain.log_analyze를 활성화해야 합니다. 기본적으로 이 파라미터는 꺼져 있고, 슈퍼 사용자인 경우에만 변경할 수 있습니다. 자세한 내용은 PostgreSQL 설명서의 auto_explain.log_buffers를 참조하세요.

다음 예제 출력을 참조하세요.

2020-02-01 08:02:02 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 14.875 ms  plan:
 Query Text: SELECT COUNT(*) FROM t1 INNER JOIN t2 USING(id);
 Aggregate  (cost=577.45..577.46 rows=1 width=8) (actual time=14.861..14.861 rows=1 loops=1)
   Buffers: shared hit=93
   ->  Hash Join  (cost=269.98..552.45 rows=9999 width=0) (actual time=5.293..12.768 rows=9999 loops=1)
         Hash Cond: (t1.id = t2.id)
         Buffers: shared hit=93
         ->  Seq Scan on t1  (cost=0.00..144.99 rows=9999 width=4) (actual time=0.007..2.385 rows=9999 loops=1)
               Buffers: shared hit=45
         ->  Hash  (cost=144.99..144.99 rows=9999 width=4) (actual time=5.250..5.250 rows=9999 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               Buffers: shared hit=45
               ->  Seq Scan on t2  (cost=0.00..144.99 rows=9999 width=4) (actual time=0.009..2.348 rows=9999 loops=1)
                     Buffers: shared hit=45

auto_explain.log_nested_statements

기록을 위해 중첩된 구문을 고려하려면 auto_explain.log_nested_statements 파라미터를 사용합니다. 이 파라미터를 사용하지 않는 한, 최상위 쿼리 계획이 기본적으로 기록됩니다. 기본값은 꺼짐입니다. 자세한 내용은 PostgreSQL 설명서의 auto_explain.nested_statements를 참조하세요.

다음 예를 참조하세요.

CREATE OR REPLACE FUNCTION count_tables() RETURNS integer AS $$
        DECLARE
         i integer;
         j integer;
        BEGIN
                SELECT COUNT(*) INTO i FROM t1;
                SELECT COUNT(*) INTO j FROM t2;
                RETURN i + j;                
        END;
$$ LANGUAGE plpgsql;

다음 예제 출력을 참조하세요.

2020-02-01 08:15:25 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 0.651 ms  plan:
 Query Text: SELECT COUNT(*)        FROM t1
 Aggregate  (cost=169.99..170.00 rows=1 width=8)
   ->  Seq Scan on t1  (cost=0.00..144.99 rows=9999 width=0)
2020-02-01 08:15:25 UTC:27.0.3.145(27477):master@postgres:[15514]:CONTEXT:  SQL statement "SELECT COUNT(*)        FROM t1"
 PL/pgSQL function count_tables() line 6 at SQL statement
2020-02-01 08:15:25 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 0.637 ms  plan:
 Query Text: SELECT COUNT(*)        FROM t2
 Aggregate  (cost=169.99..170.00 rows=1 width=8)
   ->  Seq Scan on t2  (cost=0.00..144.99 rows=9999 width=0)
2020-02-01 08:15:25 UTC:27.0.3.145(27477):master@postgres:[15514]:CONTEXT:  SQL statement "SELECT COUNT(*)        FROM t2"
 PL/pgSQL function count_tables() line 7 at SQL statement
2020-02-01 08:15:25 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 1.435 ms  plan:
 Query Text: SELECT count_tables();
 Result  (cost=0.00..0.26 rows=1 width=4)

auto_explain.log_timing

auto_explain.log_timing 파라미터를 사용하면 실행 계획을 기록할 때 노드별 타이밍 정보를 인쇄할지 여부를 제어할 수 있습니다. 이 파라미터를 사용하면 설명과 함께 타이밍을 사용하는 것과 유사하게 작동합니다. auto_explain.log_timing을 사용하지 않도록 설정하면 시스템 클록을 반복적으로 읽는 오버헤드를 줄일 수 있습니다. 기본값은 켜짐입니다. 자세한 내용은 PostgreSQL 설명서의 auto_explain.log_timing을 참조하세요.

auto_explain.log_triggers

실행 계획을 기록할 때 트리거 실행 통계를 포함하려면 auto_explain.log_triggers 파라미터를 사용합니다. 또한 auto_explain.log_triggers 사용 시 auto_explain.log_analyze를 활성화해야 합니다. 기본값은 꺼짐입니다.

자세한 내용은 PostgreSQL 설명서의 auto_explain.log_triggers를 참조하세요.

다음 예를 참조하세요.

CREATE FUNCTION emp_log_trigger_func() RETURNS trigger AS $emp_stamp$
    DECLARE
        count integer;
    BEGIN
        SELECT COUNT(*) INTO count FROM emp WHERE empname = NEW.empname;
        INSERT INTO emp_log(created_at, operation, user_name) VALUES(current_timestamp, TG_OP, user);
        RETURN NEW;
    END;
$emp_stamp$ LANGUAGE plpgsql;
CREATE TRIGGER emp_stamp BEFORE INSERT OR UPDATE ON emp
    FOR EACH ROW EXECUTE PROCEDURE emp_log_trigger_func();

다음 예제 출력을 참조하세요.

2020-02-01 08:57:44 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 1.463 ms  plan:
 Query Text: INSERT INTO emp_log(created_at, operation, user_name) VALUES(current_timestamp, TG_OP, user)
 Insert on emp_log  (cost=0.00..0.03 rows=1 width=168) (actual time=1.461..1.461 rows=0 loops=1)
   ->  Result  (cost=0.00..0.03 rows=1 width=168) (actual time=0.009..0.010 rows=1 loops=1)
2020-02-01 08:57:44 UTC:27.0.3.145(27477):master@postgres:[15514]:CONTEXT:  SQL statement "INSERT INTO emp_log(created_at, operation, user_name) VALUES(current_timestamp, TG_OP, user)"
 PL/pgSQL function emp_log_trigger_func() line 3 at SQL statement
2020-02-01 08:57:44 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 1.602 ms  plan:
 Query Text: INSERT INTO emp VALUES('name', 100, current_timestamp, 'hoge');
 Insert on emp  (cost=0.00..0.01 rows=1 width=76) (actual time=1.600..1.600 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=76) (actual time=0.006..0.006 rows=1 loops=1)
 Trigger emp_stamp: time=1.584 calls=1

auto_explain.log_verbose

auto_explain.log_verbose 파라미터를 사용하면 실행 계획을 기록할 때 자세한 정보 표시 세부 정보를 인쇄할지 여부를 결정할 수 있습니다. 이 파라미터를 사용하면 설명과 함께 자세한 정보 표시 옵션을 사용하는 것과 유사하게 작동합니다. 기본값은 꺼짐입니다. 자세한 내용은 PostgreSQL 설명서의 auto_explain.log_verbose를 참조하세요.

다음 예를 참조하세요.

2020-02-01 09:03:20 UTC:27.0.3.145(27477):master@postgres:[15514]:LOG:  duration: 3.492 ms  plan:
 Query Text: SELECT COUNT(*) FROM t1 INNER JOIN t2 USING(id);
 Aggregate  (cost=577.45..577.46 rows=1 width=8)
   Output: count(*)
   ->  Hash Join  (cost=269.98..552.45 rows=9999 width=0)
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on public.t1  (cost=0.00..144.99 rows=9999 width=4)
               Output: t1.id
         ->  Hash  (cost=144.99..144.99 rows=9999 width=4)
               Output: t2.id
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               ->  Seq Scan on public.t2  (cost=0.00..144.99 rows=9999 width=4)
                     Output: t2.id

또는 튜닝을 위해 더 적은 쿼리의 설명 계획을 보려는 경우 해당 쿼리에서 EXPLAIN ANALYZE를 실행할 수 있습니다. 그러면 로깅의 오버헤드가 제거되고 각 단계의 실행 시간과 함께 쿼리 계획이 단계별로 표시됩니다. EXPLAIN ANALYZE 문을 수동으로 호출하면 임시 파일을 생성한 단계, 시퀀스 스캔을 수행하는 단계 등과 같은 이벤트를 식별할 수 있습니다.

postgres=> EXPLAIN ANALYZE SELECT * FROM hoge ORDER BY 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Sort (cost=7343904.60..7444904.60 rows=40400000 width=4) (actual time=37795.033..45133.431 rows=40400000 loops=1)
Sort Key: id
Sort Method: external merge Disk: 553512kB
-> Seq Scan on hoge (cost=0.00..582762.00 rows=40400000 width=4) (actual time=0.579..8184.498 rows=40400000 loops=1)
Planning Time: 0.549 ms
Execution Time: 47161.516 ms
(6 rows)

쿼리 성능을 튜닝하기 위해 Amazon RDS PostgreSQL 또는 Aurora PostgreSQL에 대한 쿼리의 실행 계획을 기록하려면 어떻게 해야 합니까?

해결 방법

auto_explain 파라미터를 사용하여 쿼리 성능 튜닝

관련 정보

관련 비디오

관련 콘텐츠