First of all – I base information in this post on Linux, so if you are using another operating system – it's quite likely that it will not help you.
Second – if you're using Linux, you have probably some package manager – dpkg, rpm, pacman, poldek, whatever. Right? So, it should be easy to install Pg using this package manager?
Well, yes and no. Yes – of course all (known to me) Linux distributions include PostgreSQL, but I do not use their prepackaged version.
Why, and how do I install?
Why is simple – I don't like those packages. I was using (so far) packages on Debian, Ubuntu and Arch. Debian and Ubuntu chose to put config files away from datadir – which has some merit, but it just makes it difficult for newbies for find config (yes, config in etc makes it difficult, I can show irc logs to prove the point).
Additionally – they by default use ident on unix connections, which I don't like. As for Arch – they don't force ident, and they leave config files where they belong, but their postgresql package is somewhat weird – it has plpythonu, but not plperl. It doesn't have contrib modules, and even in separate packages I've seen only tsearch2 – which is even more weird, when you'll consider that tsearch2 is part of core, so making “tsearch2" package for PostgreSQL 8.4 doesn't make any sense.
As for RedHat packages – I don't have enough exposure to them to be able to say anything meaningful, but I heard they also used this “ident by default" thing, which makes me really sad.
You have also to consider one thing – if you do “apt-get install postgresql", or “pacman -S postgresql" or “yum install postgresql" – you will get the software, built as somebody wanted, not necessarily how you want it, and what's more – you're missing quite a bit of knowledge you can learn by actually building and installing it yourself.
I remember, that when I first tried to build Pg – I actually succeeded, thanks to quite nice manual, but since it was my first time, I just configured it “on defaults", and was happy to get it up and running.
So, I intend that this blogpost will be helpful for people that want to start their adventure with PostgreSQL, and would like to know a bit more about how it works, what's what, and how to get it up and running.
So, without further ado, let's move to getting it up and running.
Before we will even start dealing with PostgreSQL – we need to prepare the system. For this blogpost, I prepared 5 machines (well, virtual machines), with following distributions:
- Arch Linux
- CentOS 5.4
- Debian 5.0
- Fedora 12
- Ubuntu 9.10
All of these systems have been update to newest packages in their respective versions/repositories, but otherwise all systems are brand new, with just base packages + ssh.
To install PostgreSQL we will need to install some packages. Depending on distribution it will be:
pacman -S gcc python libxml2 tcl make
yum install gcc readline-devel zlib-devel libxml2-devel tcl-devel python-devel
apt-get install bzip2 build-essential libreadline-dev zlib1g-dev libxml2-dev tcl-dev python-dev
yum install gcc perl-core readline-devel zlib-devel libxml2-devel tcl-devel python-devel
apt-get install build-essential libreadline-dev zlib1g-dev libxml2-dev tcl-dev python-dev libperl-dev
- SuSE (SLES10), information provided by Filip (vide comments):
zypper install gcc make readline-devel zlib-devel openssl-devel libxml2-devel tcl-devel python-devel
Now, that we have all of it installed, we will need system user that PostgreSQL will run from. It is important to understand that PostgreSQL cannot run from root account – it will not start. So we need some shell account to run from. Since in most distributions, system account for PostgreSQL is named postgres we will use something different. Reason is pretty simple – we do not want any interference, in case we'd ever need/want to install packaged version of PostgreSQL, or anything like this.
I generally tend to name the system account pgdba. So, let's create the user:
[root@arch ~]# useradd --home-dir /home/pgdba --create-home --shell /bin/bash --user-group --comment "System account for running PostgreSQL" pgdba [root@arch ~]# id pgdba uid=1000(pgdba) gid=1000(pgdba) groups=1000(pgdba)
[root@centos ~]# useradd --home-dir /home/pgdba --create-home --shell /bin/bash --comment "System account for running PostgreSQL" pgdba [root@centos ~]# id pgdba uid=500(pgdba) gid=500(pgdba) groups=500(pgdba) context=root:system_r:unconfined_t:SystemLow-SystemHigh
debian:~# useradd --home-dir /home/pgdba --create-home --shell /bin/bash --user-group --comment "System account for running PostgreSQL" pgdba debian:~# id pgdba uid=1001(pgdba) gid=1001(pgdba) groups=1001(pgdba)
[root@fedora ~]# useradd --home-dir /home/pgdba --create-home --shell /bin/bash --user-group --comment "System account for running PostgreSQL" pgdba [root@fedora ~]# id pgdba uid=501(pgdba) gid=501(pgdba) groups=501(pgdba)
root@ubuntu:~# useradd --home-dir /home/pgdba --create-home --shell /bin/bash --user-group --comment "System account for running PostgreSQL" pgdba root@ubuntu:~# id pgdba uid=1001(pgdba) gid=1001(pgdba) groups=1001(pgdba)
As you can see only in case of CentOS the command has been different (their version of useradd doesn't handle –user-group, which we want to use).
You might wanted why do we need user-group, or what it is. Generally users belong to groups, and user-group means that for user “pgdba" there will be created group “pgdba". Otherwise user could get attached to group “users" or it could get user-group created anyway – depending on settings on your system.
Now. We have the account, and all packages are installed. All other work should work exactly the same, regardless of which Linux we will use. Isn't it great?
So, we need to switch to pgdba user, which is done by:
[root@arch ~]# su - pgdba [pgdba@arch ~]$
(I'm showing output from arch, but it works the same way on all Linuxes).
Now, we need to get sources of PostgreSQL – this can be easily done by downloading from PostgreSQL site. Let's assume you chose mirror sunsite.icm.edu.pl (which is pretty sane, as long as you're in Poland).
On pgdba account we need to create src directory, and download, and unpack the file there:
[pgdba@arch ~]$ mkdir src [pgdba@arch ~]$ cd src [pgdba@arch src]$ wget ftp://sunsite.icm.edu.pl/site/postgres/source/v8.4.2/postgresql-8.4.2.tar.bz2 ... 2010-02-26 00:43:17 (35.2 MB/s) - postgresql-8.4.2.tar.bz2 saved  [pgdba@arch src]$ tar xjf postgresql-8.4.2.tar.bz2 [pgdba@arch src]$ rm postgresql-8.4.2.tar.bz2 [pgdba@arch src]$ cd postgresql-8.4.2/
As you can see, now we have the sources unpacked in /home/pgdba/src/postgresql-8.4.2.
You might ask – why do we need src directory? It's simple – I like to have $HOME as clean as possible. When your PostgreSQL usage will increase you might want to download and install another tools – Slony, pgbouncer, pgpool or some modules from pgfoundry. In such case having directory designed to keep sources will come in handy.
Now, just like docs say – we need to configure the sources. But, instead of running ./configure –some –options, I usually do:
[pgdba@arch postgresql-8.4.2]$ ./configure --help > ../my.configure.sh [pgdba@arch postgresql-8.4.2]$ vi ../my.configure.sh
(Of course you might want to use another editor than vi).
Reason for this is very simple – if I will ever want to install another version – to play with it, upgrade, or just reinstall current version for whatever reason – I will have ready script that will configure it up to my taste.
So, having this script opened, I can remove lines, edit it all the way I want. My standard my.configure.sh afterwards look like this:
[pgdba@arch postgresql-8.4.2]$ chmod 700 ../my.configure.sh [pgdba@arch postgresql-8.4.2]$ cat ../my.configure.sh #!/bin/bash ./configure \ --prefix=/opt/pgsql-8.4.2 \ --with-pgport=5840 \ --with-tcl \ --with-perl \ --with-python \ --enable-integer-datetimes \ --without-krb5 \ --without-pam \ --without-bonjour \ --without-openssl \ --with-readline \ --with-libxml \ --with-zlib \ --with-gnu-ld
Explanation of my choise of options:
- prefix – it makes all programs, libraries and docs built by compilation to be installed in /opt/pgsql-8.4.2 – I want this directory fully versioned, so I will be able to easily upgrade without removing previous versions of binaries
- pgport – I move PostgreSQL to non standard port to be able to run multiple PostgreSQL instances on the same machine. I generally build the port using following logic:
- first digit – always 5
- 2nd and third digit – major version of PostgreSQL
- 4th digit – number of instance of this version of PostgreSQL on this machine
Usually last digit is 0 – I rarely need more than 1 instance of any given PostgreSQL version, but if I'll need – my port number assignment schema can easily accommodate it. To give final example – when I have 9.0 Postgresql installed, and will have 2 instances, 2nd instance will listen on port 5901
- tcl/perl/python – I like to have choice what I use to write my stored procedures. 99.9% of them are in PL/pgSQL or even plain SQL, but building other PL/* languages makes the installation more versatile
- integer datetimes – it's a must have, and default – I put it in my.configure.sh just to remember never to turn it off
- krb5 (kerberos) and pam – I don't need it. I use PostgreSQL mechanisms for authentication, and it suits me well enough
- bonjour – never really used it – perhaps because I don't own any Mac or Windows, but I generally seem to remember hostnames of database servers, don't need to have “discovery" for them
- openssl – this is potentially controversial. I believe that there are better ways to secure connection – for example using OpenVPN. If you have usecase where you need SSL when connecting to Pg – fine. But since you're reading my startup blogpost – I believe that you can safely omit it.
- readline – well, I like to have tab-completion and history in psql
- xml – it can come in handy if I'd want to use the xml datatype
- zlib – it's on by default, and it's just like integer datetimes – a must have
- gnu ld – well, I'm on Linux, and ld is GNU ld 🙂
Uff. That was long. So, having this script I can:
[pgdba@arch postgresql-8.4.2]$ ../my.configure.sh checking build system type... i686-pc-linux-gnu ... <a lot of lines> ... config.status: linking ./src/makefiles/Makefile.linux to src/Makefile.port
It's configured. If it didn't finish with the linking line – something went wrong.
Now, it's time to actually compile it:
[pgdba@arch postgresql-8.4.2]$ make make -C doc all ... <a lot of lines> ... All of PostgreSQL successfully made. Ready to install.
Isn't it great? But we're not installing. Yet. First we need to build contrib modules. If you're not familiar with the concept – contrib modules are distributed together with PostgreSQL, but are not built by default. But since building them doesn't cost anything, and it's definitely better to build it all now, than to remember how to add it after a week – let's build the contrib:
[pgdba@arch postgresql-8.4.2]$ cd contrib [pgdba@arch contrib]$ make make: Entering directory `/home/pgdba/src/postgresql-8.4.2/contrib/adminpack' ... <a lot of lines> ... make: Leaving directory `/home/pgdba/src/postgresql-8.4.2/contrib/xml2'
YAY. It's built.
Now, we need to install it. And here goes issue – it should be done by root. Reason is very simple – pgdba, that is the user that PostgreSQL will be running on, should not have privileges to alter the binaries. Just in case.
So, we need to go back to root account, and:
[root@arch ~]# cd ~pgdba/src/postgresql-8.4.2/ [root@arch postgresql-8.4.2]# make install make -C doc install ... <a lot of lines> ... PostgreSQL installation complete.
Of course we don't trust above message – we also need:
[root@arch postgresql-8.4.2]# cd contrib [root@arch contrib]# make install make: Entering directory `/home/pgdba/src/postgresql-8.4.2/contrib/adminpack' ... <a lot of lines> ... make: Leaving directory `/home/pgdba/src/postgresql-8.4.2/contrib/xml2'
Now it's really installed.
We will just add one more thing:
[root@arch postgresql-8.4.2]# cd /opt/ [root@arch opt]# ln -s pgsql-8.4.2/ pgsql-8.4 [root@arch opt]# ls -l total 4 lrwxrwxrwx 1 root root 12 Feb 26 01:57 pgsql-8.4 -> pgsql-8.4.2/ drwxr-xr-x 6 root root 4096 Feb 26 01:53 pgsql-8.4.2
Reason for this is pretty simple – when you upgrade your PostgreSQL, but within the same major version (i.e. 8.4.1 -> 8.4.2, or 8.3.5 -> 8.3.9) – you just need to change the binaries, and restart PostgreSQL.
With these symlinks, we can simply change the symlink after upgrade, and restart pg, without any more dealing with version paths.
Now, with all of these, we should add binaries path to $PATH, to avoid having to type
One solution is to simply add:
to your $HOME/.bashrc and/or $HOME/.bash_profile, but I prefer slightly different approach.
Let's write such simple script, and save it as /usr/local/bin/pg-env.sh:
[root@arch ~]# cat /usr/local/bin/pg-env.sh #!/bin/bash VERSION="$1" if [[ -z "$VERSION" ]] then VERSION=$( /bin/ls -dc1 /opt/pgsql-* | sort -t. -nrk 3,3 | sort -t. -nsrk 2,2 | sort -t- -snrk 2,2 | head -n 1 | cut -d- -f2 ) fi STRIPPED_PATH="$( echo $PATH | sed 's#\(^\|:\)/opt/pgsql[^:]*:\?#\1#' )" export PATH="/opt/pgsql-$VERSION/bin:$STRIPPED_PATH" echo "Pg version $VERSION chosen"
This script is intended to run like this:
[pgdba@arch ~]$ echo $PATH /bin:/usr/bin:/sbin:/usr/sbin:/usr/bin/perlbin/site:/usr/bin/perlbin/vendor:/usr/bin/perlbin/core [pgdba@arch ~]$ . /usr/local/bin/pg-env.sh 8.4 Pg version 8.4 chosen [pgdba@arch ~]$ echo $PATH /opt/pgsql-8.4/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/bin/perlbin/site:/usr/bin/perlbin/vendor:/usr/bin/perlbin/core
As you can see, when you run it with version, it sets appropriately PATH.
What's more – thanks to the cryptic VERSION=$(…) line, when you run the script without any arguments, it does:
[pgdba@arch ~]$ . /usr/local/bin/pg-env.sh Pg version 8.4.2 chosen
I.e. it chooses newest installed PostgreSQL.
Now. It's all great that we have Pg installed, but we'd want to get some databases created.
Before we will go to it, let's make sure that pgdba account has correct environment. On this account please make 2 files: .bashrc and .bash_profile, with content as shown below:
[pgdba@arch ~]$ cat .bashrc export PS1="\u@\h:\w\$ " export PATH=/usr/local/bin:/usr/bin:/bin source /usr/local/bin/pg-env.sh > /dev/null export LANG=en_US.UTF-8 export LC_CTYPE="en_US.UTF-8" export LC_NUMERIC="en_US.UTF-8" export LC_TIME="en_US.UTF-8" export LC_COLLATE="en_US.UTF-8" export LC_MONETARY="en_US.UTF-8" export LC_MESSAGES="en_US.UTF-8" export LC_PAPER="en_US.UTF-8" export LC_NAME="en_US.UTF-8" export LC_ADDRESS="en_US.UTF-8" export LC_TELEPHONE="en_US.UTF-8" export LC_MEASUREMENT="en_US.UTF-8" export LC_IDENTIFICATION="en_US.UTF-8" [pgdba@arch ~]$ cat .bash_profile . $HOME/.bashrc
Of course you might want to change the locale to some other, but please make sure that it's UTF-8. Thanks to this you will avoid a lot of potential problems with encodings.
So, now we need to make first PostgreSQL instance. Idea is pretty simple – thanks to above .bashrc, when you relogin to pgdba account you will have proper paths set. So you can:
pgdba@arch:~$ cd /home/pgdba; mkdir data pgdba@arch:~$ initdb -E UTF8 -D /home/pgdba/data/5840 The files belonging to this database system will be owned by user "pgdba". This user must also own the server process. The database cluster will be initialized with locale en_US.UTF-8. The default text search configuration will be set to "english". creating directory /home/pgdba/data/5840 ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 28MB creating configuration files ... ok creating template1 database in /home/pgdba/data/5840/base/1 ... ok initializing pg_authid ... ok initializing dependencies ... ok creating system views ... ok loading system objects' descriptions ... ok creating conversions ... ok creating dictionaries ... ok setting privileges on built-in objects ... ok creating information schema ... ok vacuuming database template1 ... ok copying template1 to template0 ... ok copying template1 to postgres ... ok WARNING: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the -A option the next time you run initdb. Success. You can now start the database server using: postgres -D /home/pgdba/data/5840 or pg_ctl -D /home/pgdba/data/5840 -l logfile start
This is pretty cool. Now we can start PostgreSQL, and see it it works:
pgdba@arch:~$ pg_ctl -D /home/pgdba/DATA/5840 -l logfile START server starting pgdba@arch:~$ psql psql: FATAL: DATABASE "pgdba" does NOT exist pgdba@arch:~$ createdb pgdba@arch:~$ psql -c "select version()" version -------------------------------------------------------------------------------- PostgreSQL 8.4.2 ON i686-pc-linux-gnu, compiled BY GCC gcc (GCC) 4.4.3, 32-bit (1 ROW) pgdba@arch:~$ pg_ctl -D /home/pgdba/DATA/5840 -l logfile stop waiting FOR server TO shut down.... done server stopped
As you can see, I had problem with connecting to database, because of my not-standard username. This is perfectly OK, and I will explain it in a minute – for now, I just created pgdba database (createdb command).
As you can see, everything works well.
Now, let's change some configuration.
First of all, we will change pg_hba.conf, which is in /home/pgdba/data/5840 directory (/home/pgdba/data/5840 is PGDATA for this particular installation/cluster).
At this time, the file contains a lot of comments, and just couple of lines with actual, sensible content:
pgdba@arch:~$ cat /home/pgdba/data/5840/pg_hba.conf | grep -E '^[^#]' local all all trust host all all 127.0.0.1/32 trust host all all ::1/128 trust
This basically means that everybody from localhost, can connect, over UNIX socket, or TCP/IP, without any authentication – PostgreSQL will trust them.
Now, all those lines can/should be commented out, and instead we will keep the file simpler, with less comments, but more interesting:
# TYPE DATABASE USER CIDR-ADDRESS METHOD local all pgdba ident host all pgdba 0.0.0.0/0 reject local all all md5 host all all 0.0.0.0/0 md5
This setup means that:
- database user pgdba, can be used only via Unix socket (no TCP/IP), and only by shell pgdba account.
- When connecting from shell pgdba account to database user pgdba – password will not be needed
- all other users can connect both via Unix socket or via TCP/IP, but will need to provide valid password
Why such setup? It's easy. I will now create “postgres" user in database – since this is the name that most other installation use for their superusers – scripts and dumps will work well. Access to this user will be password protected, so I will be able to connect from any account on the machine (or remotely) as long as I have password.
But, for administrative cronjobs – I have passwordless account pgdba, which is also superuser, and works only from pgdba shell account.
Before I will go to create this (and other) accounts, I will also change postgresql.conf (which also is in /home/pgdba/data/5840 directory).
Since this file is long, I will not show all of it, just list changes from default config:
|#listen_addresses = ‘localhost'||listen_addresses = ‘*'||Thanks to this change I will be able to connect to Pg using tools like pgAdmin (which by default connect via TCP/IP, or from other hosts).|
|#checkpoint_segments = 3||checkpoint_segments = 10||To make large write operations faster. Each segment is 16MB, and you PostgreSQL generally can have up to 2n+1 segments, so with checkpoint_segments = 10, you can use up to 16 MB * ( 2 * 10 + 1 ) = 336 MB|
|#checkpoint_completion_target = 0.5||checkpoint_completion_target = 0.9||To avoid IO spike on checkpoints.|
|#log_destination = ‘stderr'||log_destination = ‘csvlog'||csvlogs are much easier to parse, and just minimally more difficult to read|
|#logging_collector = off||logging_collector = on||Required for csvlogging.|
|#log_rotation_size = 10MB||log_rotation_size = 100MB||I think that 10MB is to small to rotate. 100MB can be easily handled on most modern systems.|
|#log_min_duration_statement = -1||log_min_duration_statement = 100||It will log all queries that took longer than 100ms. If you are using Pg for cases where it's too short – just increase. For me it's a good compromise between logging all queries, and not logging any queries.|
|#log_checkpoints = off||log_checkpoints = on||This information can be helpful, and it doesn't make logs too big, so let's have it.|
|#log_connections = off||log_connections = on||This information can be helpful, and it doesn't make logs too big, so let's have it.|
|#log_disconnections = off||log_disconnections = on||This information can be helpful, and it doesn't make logs too big, so let's have it.|
|#log_lock_waits = off||log_lock_waits = on||This information can be helpful, and it doesn't make logs too big, so let's have it.|
|#log_temp_files = -1||log_temp_files = 0||If temp file had to be created, it means that some query it working on too much data, or too little memory. Every such case should be monitored, as temp files are relatively (in comparison to memory) slow.|
And that would be all. Now I can:
pgdba@arch:~$ pg_ctl -D /home/pgdba/data/5840/ start server starting
Please notice lack of “-l" option.
Now, all logs are in here:
pgdba@arch:~$ ls -l /home/pgdba/data/5840/pg_log/ total 4 -rw------- 1 pgdba pgdba 435 Feb 26 03:05 postgresql-2010-02-26_030557.csv -rw------- 1 pgdba pgdba 0 Feb 26 03:05 postgresql-2010-02-26_030557.log pgdba@arch:~$ cat /home/pgdba/data/5840/pg_log/postgresql-2010-02-26_030557.csv 2010-02-26 03:05:57.798 CET,,,27244,,4b872c85.6a6c,1,,2010-02-26 03:05:57 CET,,0,LOG,00000,"database system was shut down at 2010-02-26 03:05:39 CET",,,,,,,, 2010-02-26 03:05:57.800 CET,,,27240,,4b872c85.6a68,1,,2010-02-26 03:05:57 CET,,0,LOG,00000,"database system is ready to accept connections",,,,,,,, 2010-02-26 03:05:57.800 CET,,,27247,,4b872c85.6a6f,1,,2010-02-26 03:05:57 CET,,0,LOG,00000,"autovacuum launcher started",,,,,,,,
Now, that PostgreSQL started, I will add this “postgres" account, with some password:
pgdba@arch:~$ createuser --superuser --pwprompt --login postgres Enter password for new role: Enter it again:
And now, from any other account on the machine I can:
[root@arch ~]# . /usr/LOCAL/bin/pg-env.sh 8.4 Pg version 8.4 choosen [root@arch ~]# psql -U pgdba psql: FATAL: Ident authentication failed FOR USER "pgdba" [root@arch ~]# psql -U pgdba -h 127.0.0.1 psql: FATAL: no pg_hba.conf entry FOR host "127.0.0.1", USER "pgdba", DATABASE "pgdba" [root@arch ~]# psql -U postgres -c "select version()" Password FOR USER postgres: version -------------------------------------------------------------------------------- PostgreSQL 8.4.2 ON i686-pc-linux-gnu, compiled BY GCC gcc (GCC) 4.4.3, 32-bit (1 ROW) [root@arch ~]# psql -U postgres -h 127.0.0.1 -c "select version()" Password FOR USER postgres: version -------------------------------------------------------------------------------- PostgreSQL 8.4.2 ON i686-pc-linux-gnu, compiled BY GCC gcc (GCC) 4.4.3, 32-bit (1 ROW)
As you can see, I cannot connect to pgdba database account, but I can to postgres.
It took some time, but I hope I made it as clear as possible.
Of course – this text does not show the only way to install. There are ways that are faster, easier, with different configuration – this in here is simply my suggestion.
At the end, let me reiterate over benefits of this situation:
- config files are in datadirs
- $HOME of pgdba user is as tidy as possible (just 2 dirs: data and src)
- logs are in standard place (datadir / pg_log)
- this installation does not conflict in any way with installation from standarized packages from all known to me Linux distributions (different username, ports, paths)
- it gives you passwordless superuser
- it doesn't require you to use shell “su", if you want to connect as superuser to database
- it has an algorithm to choose port number in case of multiple PostgreSQL installations on the same machine – so it's predictable, you don't have to guess
- upgrades do not overwrite previous version of binaries (programs), so rollback from upgrade is usually just one “ln" and one “pg_ctl restart" away
- logs are in csv format which is pretty cool when it comes to parsing them
- switching between versions of postgresql programs, is as simple as running single script – of course you can avoid this script by using full paths
- you have all basic PL/ languages (PL/tcl, PL/tclu, PL/perl, PL/perlu and PL/pythonu) easily accessible, without any additional compilations
- you have all contrib modules ready to use (check /opt/pgsql-<VERSION>/share/contrib/), without any additional compilations
- full html documentation in /opt/pgsql-<VERSION>/share/doc/html, and manuals to all programs working automatically after running pg-env.sh.
And that would be all. Hope you will find it useful.