overprotective developers

i'm recently under impression that pg developers are overprotective. who do they protect?

us, users.

i mean – don't get me wrong – i live with, and (to some extent) thanks to the piece of marvelous technology which is postgresql ordbms.

i use it, i try to help other use it. i don't think there is any better database available now (at the very least – for my purposes).

but, just lately, i some to see some cases where decissions are made in a way to “protect us (dbas) from our faults".

two examples which i personally find important are:

  • lack of hints (which i already ranted about)
  • removing automatic casts int/text in 8.3

in first case – reasoning is simple – it is not good! we will make the planner do better, and you dont need it. using it will harm you, make you loose your ti
me, and so on.

fine. the argument is there. but (for me) – it is nothing more than overprotecting users. if user wants to do something that is potentially dangerous or even
pointless – i think it would be much better to tell him that he can shoot himself in a foot, but, by all means – let him do it!

unix system for longest time permit (to superusers) access to raw disks (think /dev/hda in linux). a lot of people got burned by this (including me). but the
feature is still a feature. and nobody advocates removing it.

even in postgresql – there is “fsync=off" setting which is potentially dangerous, and (to some extent) pointless (with smart raid controllers with battery-back
ed-cache memory).

then – the automatic casts.

“since always" you could do something like this:

CREATE TABLE x (id text PRIMARY KEY);
INSERT INTO x (...);
SELECT * FROM x WHERE id = 123;

and it worked. it was not perfect. i don't think i ever made a table with textual primary key.

but, some people, somewhere did. seen that.

and now, their code will stop working, because it was “discovered" that “text = int" is not actually clearly defined.

so, instead of defining it in any way (for example, the way it always worked) – it was made illegal.

justification? such code is broken anyway.

what will happen is that there will be (there are) some databases which do use something like this. and they will not be “upgradeable".
maintainer doesn't work here anymore, it's too complex, nobody wants to touch it, we've got better things to do than to find and fix every possible mistake in the code – there are even more possible (and, business-wise, valid) reasons.

this will lead to people get “stuck" at 8.2. which is great database. now. but what about “in 5 years"? right now, when somebody asks a question on mailing lists, saying that he's using 7.x, he gets prompt suggestion: upgrade, 8.x is soooo much better.

i bet the same will happen with 8.2. and any other future version.

so again – core team decided that it's better to protect us from ambiguity, than:

  • leaving things the way there were
  • and/or: putting into bible/manual how it exactly works, thus removing all possible ambiguity from the case
  • or making a poll which will tell which way it should work, than make it so.

do you think it's ok? i'm not so sure.

the problem with it is that i never made my databases that way – so i'm not really into advocating against the change. but i can easily feel the pain of people who will be banned from future improvements in postgresql code, but the simple fact that once in their application-life-time, they made a mistake. and then – they didn't have time nor resources to fix it.

what i'd like to advocate though is, that i would love to have zillions of dangerous “knobs and switches" in postgresql. the ones that go deep inside the engine, and change things that are dangerous, but can be used to do something a bit better.

of course – we would need big and bold warning labels: “do not touch this, this will destroy your data!", but – same goes with /dev/hda or fsync=off.

12 thoughts on “overprotective developers”

  1. I think hints are bad, because they prevent performance bug reports reaching hackers.

    I think automatic type casts bad, because they violate the standard.

    Siding with the hackers here.

  2. @depesz:

    * lack of hints (which i already ranted about)

    I think this is a misperception. PostgreSQL _does_ have hints, such as “enable_seqscan=false” and the like.

    PostgreSQL is open to ideas an proposals for declarative hint system, so long as the hints are not in the queries themselves. That’s actually the problem, right there, is that people think of hints as being inside queries, which is absolutely the wrong place. You can do everything you need to do with hints declared by the administrator (not in the queries), and you avoid the serious drawbacks of in-query hints.

    So what we need is a good proposal and good implementation of hints that are declared outside of the queries themselves.

    * removing automatic casts int/text in 8.3

    The removal of implicit casts is a feature for some people, and a misfeature for others. It’s not a question of protecting DBAs from themselves.

    Removing ambiguity and throwing errors are both very important features in a database and will help a lot of people. It’s not backwards compatible, and that’s unfortunate, but they weighed the benefits and decided to remove a lot of implicit casts.

    So, I don’t think this is an example of protecting DBAs from themselves, I think this is about solving problems that DBAs have (and there’s plenty of evidence on the mailing lists).

    You wouldn’t suggest that throwing an error on the date ‘2007-02-31’ is overprotective, right?

  3. A lot of people even in professional situations tend to stick with default settings. I see the point of both sides. The devs are right for not wanting hints and automatic type casting, but existing users are also justified for feeling that they shouldn’t have to forfeit backwards compatibility in order to upgrade. So, for 8.3 set the new behavior as default but document how to migrate from older versions by setting compatibility switches, with full warning that the old behavior is deprecated. Then completely remove the old behavior in 8.4. Or if you want to be conservative, set the old behavior as default with options to use the new behavior, then remove the old deprecated behavior in 8.4.

  4. sw that allows to write bug code is bad. Do you remember old vbscript? But if you need it, no problem.

    postgres=# create function int2text(int) returns text as $$select $1::text$$ language sql;
    CREATE FUNCTION
    postgres=# create function text2int(text) returns integer as $$select $1::int$$ language sql;
    CREATE FUNCTION
    postgres=# create cast (int as text) with function int2text(int) as implicit;
    CREATE CAST
    postgres=# create cast (text as int) with function text2int(text) as implicit;
    CREATE CAST
    postgres=# select 1 = ‘1’;
    ?column?
    ———-
    t
    (1 row)

    postgres=# select ‘1’ = 1;
    ?column?
    ———-
    t
    (1 row)

    postgres=#

  5. @Pavel Stehule:
    > sw that allows to write bug code is bad.

    so, basically every compiler, interperter and virtual machine in the world is bad because it allows writing bad code. otherwise – we wouldn’t have error, would we?

    as for vbscript – no, sorry i don’t. i stopped using windows more or less around version 3.11, and used them since only to run some games (read: total annihilation and quake: ). so – i know that vbscript is something in msoffice, but i never used it.

    and lately – i know that i can add my own casts. even tom showed this on list. this is by far one of the most important reasons i love postgresql – because it is so flexible.

    but i just feel that people shodn’t be “forced” to do own casts to *keep the same* functionality.

  6. To Depesz. I understand, but really problem is in people – programmers. They are in 90% total stupid :). I know about I talk. It’s well on PostgreSQL so you cannot simply do wrong thing. Then you have to learn and understant. Second way is way to hell. And note: there are differences between languages, interprets. Look to languages for safe programming (like ADA, ..). They are really strict. And you can find lot of bugs on compile time.

  7. To depesz:
    i like what you write 🙂

    To Pavel:
    About the casts – Great.
    I have this example that actually goes beyond the restrictions imposed by 8.3 and hits the treatment of a text literal (yes TEXT) as “unknown” in Postgresql.
    Have you ever try to run this?

    select hierarchy_level, sum(a_number)
    from (select ‘Total’ as hierarchy_level, count(1) as a_number) a
    group by hierarchy_level

    If you did you probably got response similar to

    ERROR: failed to find conversion function from “unknown” to text

    If you didn’t you may say that whoever wrote such a SQL is stupid 🙂
    That could be an inconsiderate opinion though…
    Some OLAP tools would generate a query for the higher level of the drill-down hierarchy exactly like that and i do not see why that should be otherwise.
    If you have any idea how to make this work – please share. It would allow me running our DW on Postgresql rather than going humbly back to Oracle…

    I would be happy to accept the fact that someone feels that automatic casts are dangerous provided i have enough flexibility override that opinion in Postgres.
    So far i failed – which may be because of my lack of knowledge about Postgres internals as i only work with it for couple of months…
    Nevertheless at a more general level i do not accept the fact that someone knows what i need better than i do without knowing the context.
    In this case i need to cast “unknown” to text because the idiots who created the best commercial OLAP i know never assumed that
    treating a string literal as a string may be too dangerous… Or bad practice…
    🙂
    You see – i started way back with similar opinion (sw that allows to write bug code is bad) and a friend of mine still takes this approach in his 40s…
    He ended-up doing his life decission based on availability of jobs in Smalltalk because its the only thing pure enough (he is extremely smart guy otherwise).
    Perhaps you will learn over time that we do not live in ideal world…
    I appreciate all efforts to uproot such deviations as “sw that allows to write bug code” but that was ultimately the approach inquisition took 🙂
    Historically different approach works much better – be flexible, open and educate. That way you show the real power.

    Do not take me wrong and please show me the flexibility so i can run the SQL above without explicit casting to text in the SQL…
    🙂

    To Jeff:
    The “hints” currently available in Postgres is a good start. We can argue days about “a hint as a part of query is bad” but then again – i do not live in
    an ideal world where there is only Postgresql. I would like to use Postgres for its flexibility as a database server but i have to live with
    other applications that may never be so open and flexible. I understand (i hope) the reasoning behind “no hints in SQL” but its about the flexibility…
    Btw if it is by a chance about not contaminating the SQL with additional things because there is a legacy code that makes the parser not flexible then it should be said.
    Apart from that…
    There is huge number of things in Postgres i like but it is not always as flexible as i read. Take for example the Query-Rewrite.
    Should it be really flexible i could re-write the query issued completely and process any “hints” i put in the query using the SELECT rules (MY DREAM!).
    Or just to have more SELECT rules would make me happy! Not talking about implementing a REAL materialized view functionality where i can
    redirect the select from huge fact table to a much smaler aggregate table…
    The current behaviour does not allow that and when i track down the “progress” on mailing-lists in the query-rewrite area i find that more restrictions
    were imposed and the flexibility was removed for the sake of idiots like me because we could do a mistake if it is too flexible!
    I suggest be flexible (first) and educate. If you “overprotect” people will go to someone else. Eventually.

    Milan

  8. to Milan:

    select hierarchy_level, sum(a_number)
    from (select ‘Total’ as hierarchy_level, count(1) as a_number) a
    group by hierarchy_level

    it is different bug, that isn’t related to changes in 8.3. Its not works on 8.2 too. It’s simply bug or feature. Please report it.

  9. 😉
    so is it a bug or a feature?
    i actually did some research earlier (mail-list) and the “official” recommendation is “do explicit cast to text” …

  10. Hints:

    pg has hints, though not in the query. I’d like to see more hints available, but I do also like them to not be in the query.

    The code:

    SELECT z.a, y.b from z join y using(c) where z.d

  11. Oops. XSS filter cut everything off at a less-than symbol. I’ll try again with a different example query.

    Hints:

    pg has hints, though not in the query. I’d like to see more hints available, but I do also like them to not be in the query.

    The code:

    SELECT z.a, y.b from z join y using(c) where z.d = y.e

    Describes the retraction of certain data from certain relations that meets certain requirements.

    Putting hints into that and you are no longer describing that, you’re describing how a piece of software works. It’s no longer a query, it’s an instruction.

    If we could have both at the same time, that would be nice.

    Casts:

    I have a major problem with:

    create table x (id text primary key);
    insert into x (…);
    select * from x where id = 123;

    Namely, I don’t know what it’s meant to do.

    I can figure what it’s meant to do for a row where x = ‘123’ and what it’s meant to do where = ‘124’ but not for the following values of x:
    ‘ 123’
    ‘123 ‘
    ‘123.0’
    ‘120 + 3’
    ‘foo’

    I can find out (as it happens it matches the first two, and errors on the others) but it’s not at all clear. In particular, note that select ‘123.0’ = 123 throws an error, while select 123.0 = 123 returns true. Hardly intuitive.

    I don’t entirely agree with Pavel saying that allowing us to write buggy code is bad. But the point of code is to be something that both humans and machines can understand – especially humans. If I can’t see what the code will do, despite knowing the language, then there is a flaw. The problem isn’t that the code is buggy, it’s that it doesn’t express anything – it isn’t good enough to be buggy.

    Putting in explicit casts allows us to turn that from code that doesn’t describe anything to code that describes something. It also allows legacy code to be rapidly ported. Not a problem.

    “I have this example that actually goes beyond the restrictions imposed by 8.3 and hits the treatment of a text literal (yes TEXT) as “unknown” in Postgresql.”

    No you don’t. You have in your example a case where pg hit the treatement of an unknown literal as unknown. Threating “unknown” as “unknown” doesn’t sound like a problem to me. If you want it to be treated as text, then use a text literal, not an unknown literal:

    select hierarchy_level, sum(a_number)
    from (select ‘Total’::text as hierarchy_level, count(1) as a_number) a
    group by hierarchy_level

    Note that I only know that the literal is of type text from your description, not from your code. For all I know from your code it should really be:

    select hierarchy_level, sum(a_number)
    from (select ‘Total’::char(5) as hierarchy_level, count(1) as a_number) a
    group by hierarchy_level

    or even:

    select hierarchy_level, sum(a_number)
    from (select ‘Total’::char(2) as hierarchy_level, count(1) as a_number) a
    group by hierarchy_level

    Your code doesn’t describe your intention. It is failing to express, and is therefore flawed. If a human can’t work out what your code is meant to do, why expect a program to?

    Now, you could perhaps argue for a type for literals of the format of ‘Total’ other than unknown – essentially changing the language to have a different type for plain literals and require explicit typing when unknown is what you want. But that’s a completely different thing than just expecting pg to know what you mean by magic.

    I’d argue against the explicit type of ::unknown though (though it is possible to use it if you want). A large number of the time pg can work out an appropriate type and does so, and having ::unknown as the default is therefore very useful. The minority of cases where using literals of type unknown cause problems is rare, and easily caught once you know about the implicit ::unknown and how it works.

Comments are closed.