Recently I spent some time thinking about what can be improved when it comes to helping new users start using PostgreSQL.
One thing that almost immediately jumped to my mind is – how to install PostgreSQL? The task is theoretically simple. But there are always some caveats – which packages to use, what to configure in the beginning, where to find config files and logs.
With that in mind I decided to write a howto based on my ideas on what is right. These do not necessarily mean that these are the best for everybody, but I think this is a good start for anyone wanting to start their adventure with PostgreSQL.
Final note of warning – this post is for installing and setting PostgreSQL on developer workstation. As in: server where user can do anything, and we don't really care about security. Please do not configure production servers using this howto.
To make sure I start in as vanilla configuration as possible, I made myself two virtual servers:
- testubuntu – running Ubuntu 20.10 groovy
- testdebian – Debian GNU/Linux 10 (buster)
Both were updated to newest versions of packages before any work was done.
Now, before I will go any further – I tend to not work from root account, unless I absolutely must. So, I need some way to get to root for things like package installation.
Ubuntu ships with sudo configured in such a way that my user (first created) already belongs to sudo group, thanks to which I can:
depesz@testubuntu:~$ sudo id [sudo] password for depesz: uid=0(root) gid=0(root) groups=0(root)
I need to give my password (password to account “depesz"), but clearly sudo works.
Unfortunately – debian has sudo installed, but my user is not added to sudo group by default.
So, first things I need to do (to be able to effectively use sudo), is:
depesz@testdebian:~$ su -c "/usr/sbin/adduser depesz sudo" Password: Adding user `depesz' to group `sudo' ... Adding user depesz to group sudo Done.
Afterwards, I need to logout and login again (to make sure new group is loaded properly), and afterwards I can:
depesz@testdebian:~$ sudo id [sudo] password for depesz: uid=0(root) gid=0(root) groups=0(root)
Now let's install PostgreSQL. Both ubuntu and debian do provide packaged PostgreSQL:
depesz@testubuntu:~$ apt-cache search ^postgresql-[0-9] | grep -E '^postgresql-[0-9.]+ ' postgresql-12 - object-relational SQL database, version 12 server depesz@testdebian:~$ apt-cache search ^postgresql-[0-9] | grep -E '^postgresql-[0-9.]+ ' postgresql-11 - object-relational SQL database, version 11 server
but, there is only one version available, and as you can quickly check these are not the latest. Latest version is 13 (at the time when I write it).
To make it possible to use the repo we need to execute couple of steps, as described in the docs linked above:
sudo apt-get install curl ca-certificates gnupg
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update
These commands do:
- Installs packages that will be necessary to fetch, and verify, cryptographic signatures for the packages
- Downloads, and installs cryptographic key that is used to sign the packages
- Actually tells apt, by writing to /etc/apt/sources.list.d/pgdg.list file, to download packages from apt.postgresql.org
- Forces apt to fetch new indexes (lists of packages) from all sources, including newly added apt.postgresql.org
It is possible that when running last command you will see something like:
... Fetched 559 kB in 1s (485 kB/s) Reading package lists... Done N: Skipping acquire of configured file 'main/binary-i386/Packages' as repository 'http://apt.postgresql.org/pub/repos/apt groovy-pgdg InRelease' doesn't support architecture 'i386'
While the last line, about i386 packages, looks dangerous, it really isn't.
Now, with this new package repository configured our list of available PostgreSQL versions is much more interesting:
depesz@testubuntu:~$ apt-cache search ^postgresql-[0-9] | grep -E '^postgresql-[0-9.]+ ' postgresql-12 - object-relational SQL database, version 12 server postgresql-10 - object-relational SQL database, version 10 server postgresql-11 - object-relational SQL database, version 11 server postgresql-13 - object-relational SQL database, version 13 server postgresql-9.5 - object-relational SQL database, version 9.5 server postgresql-9.6 - object-relational SQL database, version 9.6 server
and on debian, even more:
depesz@testdebian:~$ apt-cache search ^postgresql-[0-9] | grep -E '^postgresql-[0-9.]+ ' postgresql-11 - object-relational SQL database, version 11 server postgresql-10 - object-relational SQL database, version 10 server postgresql-12 - object-relational SQL database, version 12 server postgresql-13 - object-relational SQL database, version 13 server postgresql-8.2 - object-relational SQL database, version 8.2 server postgresql-8.3 - object-relational SQL database, version 8.3 server postgresql-8.4 - object-relational SQL database, version 8.4 server postgresql-9.0 - object-relational SQL database, version 9.0 server postgresql-9.1 - object-relational SQL database, version 9.1 server postgresql-9.2 - object-relational SQL database, version 9.2 server postgresql-9.3 - object-relational SQL database, version 9.3 server postgresql-9.4 - object-relational SQL database, version 9.4 server postgresql-9.5 - object-relational SQL database, version 9.5 server postgresql-9.6 - object-relational SQL database, version 9.6 server
One, generally, should always use the latest version – it has the most features, and the most bugs removed.
So – let's use postgresql-13:
$ sudo apt-get install postgresql-13
Command is the same on both systems.
After it will finish, you can see that PostgreSQL is running:
$ sudo pg_lsclusters Ver Cluster Port Status Owner Data directory Log file 13 main 5432 online postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
What do all the fields mean? Well, each line is so called “cluster", which is rather unfortunate term, as it's not cluster in the generally used meaning. Rather it basically means “installation". You can have multiple installations of PostgreSQL on the same server – with the same, or different, versions. Meaning of the fields is:
- Ver which PostgreSQL version is this cluster using
- Cluster name of this cluster. It has to be unique within given version. As in – you can have two clusters named main, in different versions, but only one in single version
- Port which port PostgreSQL is listening on. These are both for TCP/IP connections, and connections over Unix Socket.
- Status well, what is the status of this cluster. For now, it's safe to assume you will only see statuses online, down, and potentially suffixed by ,recovery if given cluster is starting now and executing recovery.
- Owner which system user PostgreSQL data belongs to. Generally you will see there postgres, unless you will do some manual changes.
- Data directory where does PostgreSQL keep all the data files related to this cluster, or at least – main data files.
- Log file where you can find logs for this PostgreSQL cluster
All is simple, I hope.
Now. Let's actually use it.
First, let's start with default, always available, command line interface to PostgreSQL – namely psql. If i'll run psql now, I will be greeted with error:
$ psql psql: error: FATAL: role "depesz" does not exist
This requires some explanation.
There are system accounts (like root, postgres, depesz, and others. Defined in /etc/passwd (by default).
These are not so interesting to PostgreSQL.
To be able to login to PostgreSQL, you have to have account within PostgreSQL.
Immediately after installation there is only one account created, named postgres. This account is also so called superuser – which means someone logged using it can do anything to any database within given cluster.
Interestingly, we will not be able to use it to login, at least not immediately:
$ psql -U postgres psql: error: FATAL: Peer authentication failed for user "postgres"
Now, the problem is that, by default, configuration of PostgreSQL will only allow user postgres when logging from system account postgres.
So I can do it, using:
$ sudo -u postgres psql psql (13.1 (Debian 13.1-1.pgdg100+1)) Type "help" for help. postgres=#
How does that work?
Normally, when you do “sudo some_command", sudo logs to root account, and then executed some_command.
But if I'd use “sudo -u postgres some_command" – then sudo will log not to root, but rather to account that I named as -u option.
This means that the psql I started will be running from postgres system account.
Now, a bit more info – if you don't provide -U option to psql (or any other way to provide database user name) – psql assumes that your database user is the same as your system user.
Similarly, if you don't provide -d option (name of database you want to connect to), it will assume that you want to connect to database named the same way as your database user.
Since this howto is about setting Pg for developer, let's make it simpler to log using varying accounts.
In directory /etc/postgresql/13/main you will find file pg_hba.conf (please note that directory depends on version and cluster name, so if you installed different version, the path will be slightly different.
If we'd remove comments, it will look like this:
local all postgres peer local all all peer host all all 127.0.0.1/32 md5 host all all ::1/128 md5 local replication all peer host replication all 127.0.0.1/32 md5 host replication all ::1/128 md5
Comments provide pretty good explanation on what's there, but for now, let's just change all peer and md5 at the end of lines to trust
This is insecure, as it basically tells PostgreSQL that anyone can log using any database account, without checking of any password or other ways of authentication.
Which is of course very important in production, but not so much on developer workstation.
You can do the change using this one-liner:
$ sudo perl -pi -e 's/(peer|md5)$/trust/' /etc/postgresql/13/main/pg_hba.conf
or just use some editor, but please note that you will have to edit it using root or postgres system user.
After the change we need to reload configuration.
This can be done using quite a lot of various methods:
- in shell: sudo pg_ctlcluster 13 main reload
- also in shell: sudo systemctl reload email@example.com
- or, if you have psql open, logged in as superuser, you can simply: select pg_reload_conf();
All will reload pg_hba.conf, and from this moment on, from your normal system account you will be able to run psql using superuser:
$ psql -U postgres psql (13.1 (Ubuntu 13.1-1.pgdg20.10+1)) Type "help" for help. postgres=#
I find it preferable to using sudo, but you might find otherwise.
Now, within the Pg cluster, there are, by default three databases (which you can see by calling psql -l, or by executing \l in open psql):
- template0 – inaccessible to users, used, sometimes, when creating new databases
- template1 – basic template for new database. Whenever you create new database (not cluster, database within cluster), and you don't specify otherwise, template1 will be copied and renamed to your requested name.
- postgres – generally empty database that is used by various tools as “default database to connect to" when user doesn't specify another db. Generally it's not advised to change anything there.
To make it possible/sensible to use PostgreSQL as your own user, you need to create database user.
You can do it using shell:
$ createuser -U postgres depesz
or using SQL, within psql (or other client program, we'll get to it):
$ psql -U postgres ... postgres=# CREATE USER depesz; CREATE ROLE
I had to use -U postgres option to psql or createuser, because otherwise the tools would try to connect to database as user depesz (name of system account I'm currently logged in), and there was no such user before.
Now, I should be able to connect to database as myself:
$ psql psql: error: FATAL: database "depesz" does not exist
Well, it did log in as “depesz" account, but, since I didn't provide database name – it tried to connect to database “depesz" – which still doesn't exist. So let's make it. Again, two ways:
- shell: createdb -U postgres -O depesz depesz
- in SQL, logged as superuser: create database depesz with owner depesz;
Both will do the same – make new database (copy of template1), name it depesz, and make depesz database user owner of this database (which means that it has more privileges in there).
And now, finally:
$ psql ... depesz=> select current_user, current_database(); current_user | current_database --------------+------------------ depesz | depesz (1 row)
please note that “depesz" in psql prompt means database name, not user name. This can be checked quickly by connecting to another database:
$ psql -d postgres ... postgres=> select current_user, current_database(); current_user | current_database --------------+------------------ depesz | postgres (1 row)
With this in place, you can now use psql to do all normal SQL operations. Defining data structures, adding data, modifying data, deleting, selecting. All is accessible.
But – lots of people, apparently, don't really like psql.
$ sudo apt-get install pgadmin4
This will take some time, but eventually will end.
Afterwards, let's launch it:
Startup takes some time – when it will finish, in the console you'll see:
* Serving Flask app "pgadmin" (lazy loading) * Environment: production WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. * Debug mode: off
and your web browser will open pgAdmin page. First thing – it will ask you for a password – this will be used to protect all communication details with actual databases. Since it's devel machine, it doesn't really matter what you will choose, as long as you will remember it.
It should look basically like this:
In the quick links section, there is a button “Add New Server".
After you will click it you will be presented with dialog to define new server:
Please note the tabs at the top – General / Connection / SSL, and so on.
For now, let's put “Personal" as Name in General. Then in “Connection" Tab:
- Host: 127.0.0.1 – as it will connect to localhost only
- Port should be the same as pg_lsclusters showed, but by default it's correct 5432
- Username – in my case it should be “depesz"
I could use username “postgres", but I made myself personal user so I can use it.
After clicking save, I got page looking like:
And in there, you can do whatever you want. Generally you'd first unroll Databases, and select your own database, like here:
How to use pgAdmin – sorry, I just don't know it that well. Started it, looked at couple of things, and disconnected. I'm very old-school, so I use console and psql exclusively.
But I know that others like GUI tools.
Basically, at this moment, you can install any database application, that supports PostgreSQL, have it connect to db server on you machine, using host 127.0.0.1, port 5432, and your own username and database name, and it should work.
If something is wrong – you can see into logs:
=$ sudo tail -n 10 /var/log/postgresql/postgresql-13-main.log [sudo] password for depesz: 2020-12-21 13:09:42.279 CET  LOG: starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit 2020-12-21 13:09:42.279 CET  LOG: listening on IPv4 address "127.0.0.1", port 5432 2020-12-21 13:09:42.284 CET  LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-12-21 13:09:42.291 CET  LOG: database system was shut down at 2020-12-21 13:09:41 CET 2020-12-21 13:09:42.297 CET  LOG: database system is ready to accept connections 2020-12-21 13:17:19.279 CET  LOG: received SIGHUP, reloading configuration files
That should be good enough for development. For production, or multi-user development, we will need some more settings, but this will have to wait for next blogpost.