April 4th, 2014 by depesz | Tags: , , , , , , , , | 7 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

Every now and then someone asks, on irc or mailing lists, some question which shows deep misunerstanding (or lack of understanding) of timestamps – especially the ones with time zones.

Since I got bitten by this before, let me describe what timestamps are, how to work with them, and what are the most common pitfalls that you can encounter.

So, basically we have two datatypes that we can use:

  • timestamp
  • timestamp with time zone (a.k.a. timestamptz)

timestamp contains just date and time, with no more information. So, while it might sound OK (after all – in normal life we are just using date and time, aren't we?) it's absolutely terrible.

Let's assume you have timestamp “2014-04-04 20:00:00″. What does it tell you? Unfortunately – not much. It all depends on where on earth it is. 8pm on 4th of April is different moment in Los Angeles, Chicago, London, Warsaw or Moscow. That's the problem with time zones.

Of course you can assume – I will be always in one time zone, so I don't need any fancy time zone support. In my time zone date and time is perfectly OK to represent point in time – just as it is in “real life".

Is it really true?

Let's assume you have timestamp ‘2013-10-27 02:00:00′, and you know that your application, and all times are stored according to Polish time. Well, you're still out of luck, as it could be either 2am CEST, or (1 hour later) CET. And that's due to daylight savings time.

Long story short – I believe that using timestamps (without time zone) is, in almost every case, a bug. And should be fixed.

This gets only more problematic when you have (for example) entries coming from different time zones. Think – scheduling application with users in various timezones.

So, the solution is to use timestamptz.

First thing – it doesn't take more disc space:

$ select typname, typlen from pg_type where typname ~ '^timestamp';
   typname   | typlen 
-------------+--------
 timestamp   |      8
 timestamptz |      8
(2 rows)

So how does it work? It has to know the timezone – so why isn't it using more disk space?

Well – it doesn't know the timezone. Internally, all values in timestamptz columns are stored in UTC time zone.

UTC has certain great features: it doesn't have “offset" (well, it's the base for offsets for other timezones), and it doesn't have daylight savings time.

So any timestamp in UTC is always guaranteed to point to just one point in time.

But if it's all UTC – how do I get time in my proper timezone?

Well, whenever dealing with timestamptz values, if the timezone is not specified – PostgreSQL uses configured timezone.

And you can configure it in multiple ways:

  • timezone “GUC" in postgresql.conf
  • alter database … set timezone = ‘…'
  • alter user … set timezone = ‘…'
  • SET timezone = ‘…'

First one is used to specify in which timezone your server is. Or – the default timezone that will be used unless something else changes the default.

The next two change defaults for given database and user.

The last one can be used if you want your db connection to work under different setting.

Please note how it changes now() output:

$ select now();
              now              
-------------------------------
 2014-04-04 20:32:59.390583+02
(1 row)
 
$ set timezone = 'America/New_York';
SET
 
$ select now();
              now              
-------------------------------
 2014-04-04 14:33:06.442768-04
(1 row)

So – whenever you access timestamptz values (either read or write) PostgreSQL does, transparently, convert to/from UTC.

This means that values can be easily comparable (all are in the same timezone, it doesn't have daylight savings time, so comparison is always possible).

Now – let's assume you have string ‘2014-04-04 20:00:00′. And you know that it time in Los Angeles, but you want to store it in your database, which runs in different timezone. You can check that the current offset there is -7 hours, and use the value like:

$ select '2014-04-04 20:00:00-07'::timestamptz;
      timestamptz       
------------------------
 2014-04-05 05:00:00+02

What happened? Why it doesn't show 8pm?

Reason is simple – in the query I provided timestamp, in some time zone. This was internally converted to UTC, and then converted again (or maybe even without UTC step, not sure) to my normal timezone, which is:

$ show timezone;
 TimeZone 
----------
 Poland
(1 row)

If I'd have timezone set to LA, result of the query would be:

$ set timezone = 'America/Los_Angeles';
SET
 
$ select '2014-04-04 20:00:00-07'::timestamptz;
      timestamptz       
------------------------
 2014-04-04 20:00:00-07
(1 row)

It is important, though, to understand that the output value always takes into consideration “timezone" setting.

There is also another way to get the 8pm in LA:

$ set timezone = 'Poland';
SET
 
$ select '2014-04-04 20:00:00'::timestamp at time zone 'America/Los_Angeles';
        timezone        
------------------------
 2014-04-05 05:00:00+02
(1 row)

It is important though to add “::timestamp" after the value. Otherwise we'll get something weird:

$ set timezone = 'Poland';
SET
 
$ select '2014-04-04 20:00:00' at time zone 'America/Los_Angeles';
      timezone       
---------------------
 2014-04-04 11:00:00
(1 row)

What happened in here? And why is there 11:00 ?

Well. The value in quotes (2014-04-04 20:00:00) as assumed to be timestamptz – which means 8pm in my local time zone:

select '2014-04-04 20:00:00'::timestamptz;
      timestamptz       
------------------------
 2014-04-04 20:00:00+02
(1 row)

And only after treating the value as in my timezone, Pg read “at time zone …" – which can be used to show what the time was at given timezone.

So – timestamp at time zone gives timestamptz which represents the moment in time when local time at given zone was as given.

and timestamptz at time zone gives timestamp, which shows what the time was in given timezone at given point in time.

This is convoluted, so let me show you some examples:

select '2014-04-04 20:00:00'::timestamptz at time zone 'UTC';
      timezone       
---------------------
 2014-04-04 18:00:00
(1 row)
 
select '2014-04-04 20:00:00'::timestamp at time zone 'UTC';
        timezone        
------------------------
 2014-04-04 22:00:00+02
(1 row)

Interesting thing is that we can use this to convert time from one time zone to another – even if the Pg is in neither of them.

Let's assume we want to know what time it is in Los Angeles, when it's 8am in Moscow.

My local time zone is:

$ show timezone;
 TimeZone 
----------
 Poland
(1 row)

So it doesn't help.

First, we need to get “point in time" (i.e. in timestamptz) that represents Moscow 8am:

$ select '2014-04-04 08:00:00'::timestamp at time zone 'Europe/Moscow';
        timezone        
------------------------
 2014-04-04 06:00:00+02
(1 row)

This tells me that it's 6am in my time zone. But we want Los Angeles. I could do ‘2014-04-04 06:00:00+02′ at time zone ‘LA', but I can also:

$ select ('2014-04-04 08:00:00'::timestamp at time zone 'Europe/Moscow') at time zone 'America/Los_Angeles';
      timezone       
---------------------
 2014-04-03 21:00:00
(1 row)

since the ‘timestamp at time zone ..' expression is timestamptz, we can use another “at time zone " to convert it back to timestamp (without time zone information) that is localized to some other place.

I hope it's clear – it took me a while to get to understand it, but I think I finally get it :)

Interesting side effect of all of this is that you can't have (easily) indexes on functions on timestamptz. For example – you can't have index that would be used to get day of week:

$ create table test (i timestamptz);
CREATE TABLE
 
$ create index q on test (to_char(i, 'Day'));
ERROR:  functions in index expression must be marked IMMUTABLE

The reason is very simple – as we say in the example above – the same point in time can represent different days – depending on time zone. And since to_char() uses current timezone – it can produce different values for the same input, depending on system configuration (i.e. timezone):

$ set timezone = 'Europe/Warsaw';
SET
 
$ insert into test (i) values ('2014-04-04 06:00:00');
INSERT 0 1
 
$ select i, to_char(i, 'Day') from test;
           i            |  to_char  
------------------------+-----------
 2014-04-04 06:00:00+02 | Friday   
(1 row)
 
$ set timezone = 'Europe/Moscow';
SET
 
$ select i, to_char(i, 'Day') from test;
           i            |  to_char  
------------------------+-----------
 2014-04-04 08:00:00+04 | Friday   
(1 row)
 
$ set timezone = 'America/Los_Angeles';
SET
 
$ select i, to_char(i, 'Day') from test;
           i            |  to_char  
------------------------+-----------
 2014-04-03 21:00:00-07 | Thursday 
(1 row)

Same point in time – different days. This could be different months or different years – it only depends on “when".

Timestamp (without time zone) actually “shines" in here – since it doesn't have timezone, it can be safely used to extract information.

But – we know how to convert timestamptz to timestamp. We need to tell it at which time zone.

So, we should be able to:

create index q on test (to_char(i at time zone 'Poland', 'Day'));

Unfortunately it also fails.

The reason is that to_char is a bit too versatile.

Namely – you can use to_char like this:

$ select to_char(now(), 'TMMonth');
 to_char 
---------
 April
(1 row)
 
$ set lc_time = 'pl_PL.UTF-8';
SET
 
$ select to_char(now(), 'TMMonth');
 to_char  
----------
 Kwiecień
(1 row)

So this time – we get different results not because of timezone – but because of localization.

The proper solution to indexing problem is to write your own function, which calls to_char in a perfectly constant “environment", and then index this. Like:

create function day_from_ts(timestamptz) returns text as $$
select to_char( $1 at time zone 'Poland', 'Day' );
$$ language sql immutable;
CREATE FUNCTION

And now we can use it to index:

create index q on test (day_from_ts( i ));
CREATE INDEX

This is safe because the function itself enforces timezone to be “Poland", and it calls to_char in such a way that locale is ignored (i.e. there is no TM prefix in to_char format).

Of course to use this index, all queries would need to use the function too, like:

select * from test where day_from_ts(i) = 'Friday';

One more thing that is important when it comes to time zones is getting “unix time", a.k.a. epoch.

This is generally simple:

$ select extract(epoch from now());
    date_part     
------------------
 1396638868.57491
(1 row)

What's interesting, is that it doesn't depend on timezone:

$ begin;
BEGIN
 
$ show timezone;
 TimeZone 
----------
 Poland
(1 row)
 
$ select now(), extract(epoch from now());
              now              |    date_part     
-------------------------------+------------------
 2014-04-04 21:15:27.834775+02 | 1396638927.83477
(1 row)
 
$ set timezone = 'America/Los_Angeles';
SET
 
$ select now(), extract(epoch from now());
              now              |    date_part     
-------------------------------+------------------
 2014-04-04 12:15:27.834775-07 | 1396638927.83477
(1 row)
 
$ commit;
COMMIT

The reason is that, while not everybody knows it, epoch time is always taken into consideration at UTC time zone.

What it means is that when you're doing extract epoch from timestamp – pg assumes it to be UTC. Which leads to potential problems:

$ select now(), extract(epoch from now());
              now              |    date_part    
-------------------------------+-----------------
 2014-04-04 21:19:01.456205+02 | 1396639141.4562
(1 row)
 
$ select extract(epoch from '2014-04-04 21:19:01.456205'::timestamp);
    date_part    
-----------------
 1396646341.4562
(1 row)

In the first case, Pg gets “point in time", which gets internally converted to UTC (and when displayed – converted to my, +2, timezone).

In the second case – the timestamp is in my timezone, but it is assumed to be UTC (not converted!), and epoch is taken from ‘2014-04-04 21:19:01.456205 UTC' and not ‘2014-04-04 21:19:01.456205+02′.

Tricky.

Long story short – try to avoid timestamps and use timestamptz.

The last thing I wanted to show is not a bug or potential issue, but rather a functionality that many people don't know about.

As you saw PostgreSQL uses timestamps (and timestamptzs) with up to microsecond precision. While I don't like it, some people insist that they want just up to second precision.

Both timestamp and timestamptz (and other time related datatypes too) types can have optional “precision".

Let me show you simple example:

$ select now(), now()::timestamptz(0), now()::timestamptz(1);
              now              |          now           |           now            
-------------------------------+------------------------+--------------------------
 2014-04-04 21:23:42.322315+02 | 2014-04-04 21:23:42+02 | 2014-04-04 21:23:42.3+02
(1 row)

You can, of course, use it in tables too:

$ create table test (i timestamptz(0));
CREATE TABLE
 
$ insert into test(i) values (now());
INSERT 0 1
 
$ select * from test;
           i            
------------------------
 2014-04-04 21:24:16+02
(1 row)

Nice. You don't have to change your “now()" or anything – just add precision to datatype, and it will do the sensible thing.

I mentioned that I don't like it. The reason is very simple – in any sensibly busy system, a second is simply too low precision. What's more – keeping data up to microsecond doesn't cost me anything, and it can get useful.

but then – if the data is up to microsecond – how do I get values to display without subsecond?

It's simple – I use (in select clauses) to_char(), or date_trunc, or even cast to timestamptz(0):

$ select now(),
    to_char(now(), 'YYYY-MM-DD HH24:MI:SS TZ'),
    date_trunc('second', now()),
    now()::timestamptz(0);
              now              |         to_char          |       date_trunc       |          now           
-------------------------------+--------------------------+------------------------+------------------------
 2014-04-04 21:28:20.827763+02 | 2014-04-04 21:28:20 CEST | 2014-04-04 21:28:20+02 | 2014-04-04 21:28:21+02
(1 row)

I hope this blogpost will clear some confusion. If something is still not clear – please comment, and I'll do my best to answer all questions.

  1. 7 comments

  2. # JIm Nasby
    Apr 4, 2014

    Great article, and I agree that timestamp without timezone is pretty guaranteed to be a bug.

    Unfortunately, I don’t think timestamptz goes far enough either.

    The problem is that it’s throwing away useful information; namely: what timezone was this timestamp originally specified in?

    That’s frequently very useful information to have. Things like…

    “What time did the customer think they were actually scheduling this for, based on the timezone they entered?”

    “At what local time was X supposed to happen?”

    “What was the local time when X actually did happen?”

    I think it’s a shame that we don’t actually have a data type that preserves timezone (though, I think there is an add-on one).

    There’s another important point I think needs to be made in any discussion of time.

    Humans apply a magic connotation to time. People don’t see time as just some other datum… it’s a real thing that we experience every day. And this causes huge problems. For example, trying to determine the history of changes to something. Time is actually a HORRIBLE way to do that.

    Think of it this way… when humans talk about time, what we’re actually referring to is the constant *immutable* chain of events that we experience in our lives. If you just read that sentence and now you’re reading this one, that is history. And it can never change. You can never un-read the previous sentence, and you can never say that you read the second one before the first one.

    This immutable stream of events is what we think about when we talk about time. And it has NOTHING to do with clocks, timezones, dates, timestamps or any of that stuff. All of that stuff is about a *measurement*. It’s not even a measurement of what we perceive time to be, because it doesn’t actually chain events together.

    So next time you want the history of something, do yourself a huge favor and *build a chain of events that doesn’t use time*. For example, give each record a previous_event_id that points at the event that came before this one.

  3. Apr 5, 2014

    Personally, I think we lack a data type that preserves all the important information. Both timestamp and timestamptz are seriously flawed.

    What we need is a genuine “timestamp with time zone(s)”.

    This type must be sufficient to identify:

    – A unique instant in UTC; and
    – A distinct local time

    to do this, we’d need to store a timestamp in utc seconds, *and* an unambiguous identifier for a time zone. That means either something like “Australia/Sydney; DST=0″ or some disambiguated form of a short tz code like “EST” / “EDT”.

    All in all, times are a horrible mess, and SQL makes a bigger mess out of times.

    Right now there are plenty of cases where you have to store a ‘timestamptz’ and a separate offset and/or timestamp code (depending on your specific requirements) in order to deal with times correctly. This tends to involve horrible chains of AT TIME ZONE operators, or falling back to client code.

  4. Apr 5, 2014

    Would be interesting if you updated this to include the why not epoch, since a large number of developers seem to think an int is a sane way to store timestamps

  5. Apr 7, 2014

    @Jim, @Craig – if you need additional TZ information stored (I usually don’t) is there anything that stops you from …storing it (either as common name or offset or …a timestamp w/o TZ)? Add a column or create complex data type on your own. It doesn’t differ from complex numbers or anything composed – do you thing PostgreSQL should support quaternions and octonions OOTbaseB? Anyway – how do you want know what the user timezone is!? In the web you can use JS to get offset only, maybe some fancy Java applet that looks for TZ environment variable and parses /etc/localtime, but all the other ways?

    If you think that time representation is a mess, then apparently you are skwed by using windows OS. Time is universal (well, not the physical time, just the administration one on earth), all that differs is it’s representation (not only TZ and DST, but also gregorian/julian/discordian and other calendars). Value stored in DB should reflect exact point in time, just like it reflects exact number – and it’s up to a user whether display it in hex, oct or dec. However, the bits of stored number don’t contain information if it is a float or int – table definition tells you that. Your suggestion to store user TZ in a row is equal to creating new datatype named “some number” with per-row declarable precision. While storing ints and floats (and complex? and simple fractions to save infinite precision?) in a single column might be interesting idea for some specific usecase, it’s not the general-use type.

  6. Apr 7, 2014

    @Caleb:
    actually, in my opinion – it’s not *that* bad. As long as you understand what you’re storing (i.e. epoch timestamp should be based on UTC time zone, and not any local).

    My only major problem with integer based time is that it is less readable. I.e. I prefer to see: now() – ‘1 week'::interval, than now() – 604800. Not to mention whole timestamps – like 1396866292 – it bears almost no meaning for humans.

  7. # Arek
    Apr 7, 2014

    As for epoch timestamp..

    It is bad.. it gives you false impresion that you optimized your table, but loosing at the same time:
    * the whole point of real timestamp with time zones, intervals, leap years.. etc..
    * precision — epoch expressed in seconds is pretty bad precision for nowadays speedy servers.
    * readability — of course :)

  8. # Dan Harris
    Apr 7, 2014

    While it is rare to have a use case for timestamp WITHOUT time zone, I live in one every day. I store archive data from dozens of different systems in many different time zones. There are many different systems with which I have to interface. Very few of these systems store a time zone.

    For most of these systems, I have no clue what the originating UTC time stamp is. Yet, I still must store them in Postgres. An hour of deviation twice per year is fine for these purposes. But, because I can only stuff a date into a field, I have to go with the plain timestamp WITHOUT time zone field to make my system work. The data is only relative to other data within each of those systems. So, as long as this deviation is acceptable, this works.

    It scares me when people state “that must a bug”. These systems I deal with were developed when time zones and databases did not mix (some are even stored as character fields). I NEED these in my database or else my systems would be completely sunk if this feature was deprecated.

    Never assume that you have control over the data you are storing. It’s great that some of those systems are evolving to store it. But as long as I need to support those that don’t, I will rely on timestamp without time zone.

Leave a comment