How to run some tasks without user intervention, at specific times? – part 2

Couple of days earlier I wrote about tools to run tasks in db at scheduled times.

In writing this I missed pg_cron. So let's look into it now.

Just like previously, I'm on Ubuntu 20.04, with PGDB apt repo, so installing pg_cron is trivial:

=$ sudo apt-get install postgresql-13-cron

Afterwards I need to enable it, by adding pgcron to preloaded libraries:

$ SHOW shared_preload_libraries;
 shared_preload_libraries 
 ──────────────────────────
 
(1 ROW)
 
$ ALTER system SET shared_preload_libraries = 'pg_cron';
ALTER SYSTEM

This can be changed only on Pg start, so I have to restart my server:

=$ sudo systemctl restart postgresql@13-main.service

afterwards quick check if it's really enabled:

$ SHOW shared_preload_libraries ;
 shared_preload_libraries 
──────────────────────────
 pg_cron
(1 ROW)

All is well. What's more, we can see that it's loaded from shell:

=$ ps f -u postgres
    PID TTY      STAT   TIME COMMAND
  61445 ?        Ss     0:00 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13/main -c config_file=/etc/postgresql/13/main/postgresql.conf
  61458 ?        Ss     0:00  \_ postgres: 13/main: checkpointer 
  61459 ?        Ss     0:00  \_ postgres: 13/main: background writer 
  61460 ?        Ss     0:00  \_ postgres: 13/main: walwriter 
  61461 ?        Ss     0:00  \_ postgres: 13/main: autovacuum launcher 
  61462 ?        Ss     0:00  \_ postgres: 13/main: stats collector 
  61463 ?        Ss     0:00  \_ postgres: 13/main: pg_cron launcher 
  61464 ?        Ss     0:00  \_ postgres: 13/main: logical replication launcher

As you can see, there is a process, pid 61463, that's actually “pg_cron launcher". This should handle all the things that pg_cron should do. So, let's try to use it.

$ CREATE EXTENSION pg_cron;
CREATE EXTENSION
 
$ CREATE TABLE testit (id serial PRIMARY KEY, inserted_on timestamptz, inserted_by int4);
CREATE TABLE
 
$ SELECT cron.schedule('test-insert', '* * * * *', 'insert into testit (inserted_on, inserted_by) values (now(), pg_backend_pid())');
 schedule 
──────────
        1
(1 ROW)

After two minutes:

$ SELECT * FROM testit;
 id │          inserted_on          │ inserted_by 
────┼───────────────────────────────┼─────────────
  12021-01-28 16:15:00.033298+01 │       63466
  22021-01-28 16:16:00.029691+01 │       63540
(2 ROWS)

Nice. Looks to be working.

What do we have available?

First, tables:

$ \dt cron.
              List OF relations
 Schema │      Name       │ TYPE  │  Owner   
────────┼─────────────────┼───────┼──────────
 cron   │ job             │ TABLE │ postgres
 cron   │ job_run_details │ TABLE │ postgres
(2 ROWS)

Names suggest the content, but let's see it:

$ SELECT * FROM cron.job;
 jobid │ schedule  │                                    command                                     │ nodename  │ nodeport │ DATABASE │ username │ active │   jobname   
───────┼───────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼─────────────
     1* * * * *INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ localhost │     5432 │ postgres │ postgres │ t      │ test-INSERT
(1 ROW)
 
$ SELECT * FROM cron.job_run_details ;
 jobid │ runid │ job_pid │ DATABASE │ username │                                    command                                     │  STATUS   │ return_message │          start_time           │           end_time            
───────┼───────┼─────────┼──────────┼──────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┼────────────────┼───────────────────────────────┼───────────────────────────────
     1163466 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 12021-01-28 16:15:00.033227+01 │ 2021-01-28 16:15:00.036568+01
     1263540 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 12021-01-28 16:16:00.029662+01 │ 2021-01-28 16:16:00.032796+01
     1363614 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 12021-01-28 16:17:00.035651+01 │ 2021-01-28 16:17:00.038934+01
     1463715 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 12021-01-28 16:18:00.021993+01 │ 2021-01-28 16:18:00.025342+01
(4 ROWS)

Pretty cool. Now, let's see what functions we have:

$ \df cron.
                                          List OF functions
 Schema │         Name         │ RESULT DATA TYPE │            Argument DATA types             │ TYPE 
────────┼──────────────────────┼──────────────────┼────────────────────────────────────────────┼──────
 cron   │ job_cache_invalidate │ TRIGGER          │                                            │ func
 cron   │ schedule             │ BIGINT           │ job_name name, schedule text, command text │ func
 cron   │ schedule             │ BIGINT           │ schedule text, command text                │ func
 cron   │ unschedule           │ BOOLEAN          │ job_id BIGINT                              │ func
 cron   │ unschedule           │ BOOLEAN          │ job_name name                              │ func
(5 ROWS)

Well, it looks that we can schedule jobs with job name, and without it, and unschedule based on jobid or job_name, if it is there. That's simple. But what is job_cache_invalidate?

This is actually a function called by trigger on cron.job table, that apparently clears some cache about schedule. Nothing that we need to worry about if we don't do anything crazy.

There are couple of interesting points:

  • pg_cron makes sure only one copy of a given job runs at the same time
  • cronjob definition is lifted from vixie cron, so it should handle things like “3-59/17 * * * *" – which can be great for spreading load.

One side thing – all pg_cron config is in single database, postgres by default, but it looks that you can make cronjobs work in any db. Let's try. First, let's clean the situation:

$ SELECT * FROM cron.unschedule(1);
$ DELETE FROM cron.job_run_details WHERE jobid = 1;

Now, in another database, let's make test table:

=$ psql -U depesz -c "create table testit2 (id serial primary key, inserted_on timestamptz, inserted_by int4);"

Please note that it is different name from previously.

Now, I can add the schedule, in postgres db:

$ SELECT cron.schedule('test2-insert', '* * * * *', 'insert into testit2 (inserted_on, inserted_by) values (now(), pg_backend_pid())');
 schedule 
──────────
        2
(1 ROW)

After a short while, I saw in job_run_details:

$ SELECT * FROM cron.job_run_details \gx
─[ RECORD 1 ]──┬────────────────────────────────────────────────────────────────────────────────
jobid          │ 2
runid          │ 16
job_pid        │ 65402
DATABASE       │ postgres
username       │ postgres
command        │ INSERT INTO testit2 (inserted_on, inserted_by) VALUES (now(), pg_backend_pid())
STATUS         │ failed
return_message │ ERROR:  relation "testit2" does NOT exist                                      ↵
               │ LINE 1: INSERT INTO testit2 (inserted_on, inserted_by) VALUES (now()...        ↵
               │                     ^                                                          ↵
               │ 
start_time     │ 2021-01-28 16:30:00.025378+01
end_time       │ 2021-01-28 16:30:00.026865+01

This failed, because it was ran in database postgres, but the testit2 table is in depesz.

So, let's update the job:

$ UPDATE cron.job SET DATABASE='depesz' WHERE jobid = 2;
UPDATE 1
 
$ SELECT * FROM cron.job \gx
─[ RECORD 1 ]─────────────────────────────────────────────────────────────────────────────
jobid    │ 2
schedule │ * * * * *
command  │ INSERT INTO testit2 (inserted_on, inserted_by) VALUES (now(), pg_backend_pid())
nodename │ localhost
nodeport │ 5432
DATABASE │ depesz
username │ postgres
active   │ t
jobname  │ test2-INSERT

And now, after some time:

$ SELECT * FROM testit2;
 id │          inserted_on          │ inserted_by 
────┼───────────────────────────────┼─────────────
  12021-01-28 16:31:00.017937+01 │       65524
(1 ROW)

Cool. All works. That looks really impressive.

One thought on “How to run some tasks without user intervention, at specific times? – part 2”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.