My personal PostgreSQL Wish List

During my time with PostgreSQL I have found some things that I love. I have found some things that I hate (very few of them). And there are some things that I would like to have, but, so far, I don't.

Of course, PostgreSQL being Open Source, I could add them myself, if only I knew some (real) C and had the necessary skills. Which I don't. But anyway, I decided to write my personal wish list for PostgreSQL – maybe someone will say "hey, that would be cool, and I know how to write it" 🙂

Number one is the most controversial, so let's start with it and get it out of the way before moving on to things that are less likely to aggravate people:

Hints

I want the ability to force my point of view of the data on PostgreSQL. I know that we have stats. And stats will get better. But stats and the smartness of the planner only go so far. What's more – at any point in time there are LOTS of installations running older versions of PostgreSQL, so they can't reap the benefits of the newest changes to the planner/optimizer. Adding hints would make their lives easier. Is it nice? No. Is it elegant? No. Is it a potential foot-gun? Of course. Do I think that we should have it? YES!
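For illustration, this is roughly what such hinting looks like today with pg_hint_plan, a third-party extension that reads planner hints from special comments (the table and index names below are made-up examples):

/*+ IndexScan(orders orders_created_at_idx) */
SELECT * FROM orders WHERE created_at > now() - interval '1 day';

Something with this kind of expressive power, but built-in and supported, is what I'd like to see.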

Change settings for other backends

Very often I could use the ability to, for example, force logging of the currently running query (or all future queries) for a given backend. Simple case – a backend is misbehaving. I see that it's running a lot of quick queries, but it's also idling a lot. I'd love to be able to:

ALTER BACKEND 1234 SET log_statement = ALL;

Or perhaps change its application_name. Or do anything else that I could do to my own session. Right now I can only cancel its query or kill the backend.

More history

This is somewhat related to the previous point.

Some time ago we had only the currently running query. Now we have the "last running query" and the current state – the distinction becomes obvious when dealing with IDLE in transaction connections.

But I want more.

I want to be able to (dynamically?) change the limit of per-query memory visible in pg_stat_activity. Or have the "last queries" be an array, with a given maximum number of entries, or maximum memory usage.

I usually work on machines with at least 32-64 GB of RAM, and > 100 GB is not unheard of. And when I do the calculation – "last 10 queries, each, let's assume, 10kB, 200 backends – we'd need a whopping 20 megabytes" – and realize that we don't have this, I can only ask: WHY?!

My work is 90% of the time DBA, not DB programmer. So I want more insight into what's happening. This backend – what is it doing? And what has it been doing lately?

Built-in scheduler

I know there is cron. I know there is pgAgent. But I'd prefer to have something built-in. With standardized, DB-accessible logs, configuration, privileges, and so on.

External solutions will always cause more problems, starting with authentication. A built-in solution would be simpler to maintain, easier to explain, and friendlier for users.

Global variables in pl/PgSQL

Something like %_SHARED in pl/Perl. I know I can abuse GUCs. I know I can use temporary (or even normal) tables for variables. But I'd like normal variables that can be stored per session or even absolutely globally. Why? Why not?!
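For the record, the GUC "abuse" looks roughly like this (the myapp prefix is an arbitrary example; on 9.2 and later, custom setting prefixes no longer need to be declared in advance):

-- third argument false means the setting survives for the whole session
SELECT set_config('myapp.current_user_id', '42', false);
-- later, in the same session:
SELECT current_setting('myapp.current_user_id');

It works, but everything comes back as text, which is part of why it feels like abuse.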

Hashes (dictionaries) in pl/PgSQL

This is related to the previous point. And I know that I can use hstore. Or JSON. But it just doesn't look nice. Not to mention that I worry (perhaps I shouldn't) about the performance of doing:

hstore_variable := hstore_variable || hstore( 'some_key', 'some_value' )

A zillion times.
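To make the worry concrete, this is the pattern I mean – each concatenation builds a whole new hstore value instead of updating one in place (assumes the hstore extension is installed):

DO $$
DECLARE
    h hstore := ''::hstore;
BEGIN
    FOR i IN 1..100000 LOOP
        h := h || hstore('key_' || i, 'value_' || i);
    END LOOP;
    RAISE NOTICE 'keys stored: %', array_length(akeys(h), 1);
END;
$$;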

eval/execute function

This is very selfish. I know we have "DO", but for simple/quick tasks I much prefer to have my own execute(), like:

CREATE FUNCTION EXECUTE(text) RETURNS void AS $$
BEGIN
    EXECUTE $1;
END;
$$ LANGUAGE plpgsql;

And then I can: SELECT execute(format(……)) FROM …;

Simple, and to the point. Not to mention that it's clearer (in my personal opinion) than the DO command.


And that's it. I could probably think of some other things (built-in sharding with fault tolerance, or differential textual backups, for example), but the ones I listed seem more important – or at least more important to a larger number of people (or to me personally).

While re-reading the list I noticed that it's really not a lot of things. And that's OK – I am very happy with PostgreSQL, its developers, and the Core Team. I'd just love to get a bit more.

17 thoughts on “My personal PostgreSQL Wish List”

  1. I wish sublinear algorithms to select partial indexes were a built-in feature.
    While table partitioning seems to be on its way to some final deployment, partial indexes have been left behind.
    The query planner should be able to select which partial index(es) among the set should be taken into account.
    And should do it in a sublinear way.
    Yes, this is selfish as well, but a number of my problems would be solved with a very large table (1B+ rows) and a large number of partial indexes (1K+). A kind of index partitioning.
    I know you can push a lot into the application level. But why?

  2. Yes, you are right.

    Should the built-in scheduler be built on background workers?
    I have made a pgAgent-nextGen.

  3. @Tom: that would work, I think. I want the scheduler to bypass normal authentication, so we’d not need scheduler/cron to “know” passwords, just make it possible to execute given queries at certain times, and log output/errors.

  4. My wish list

    *CPU utilization on a per-query basis in pg_stat_statements (one of the most useful bits of info for tuning purposes in other DBs)

    *The ability to pool DB connections between DB users (and ideally databases) without having to break the DB security model (like the pgbouncer approach requires)

  5. @depesz – I meant when you use it to pool DB connections between different DB users. Say you want to pool DB connections between DB user A and DB User B. You could make them both DB user A and push the authentication into pgbouncer.

  6. @Brad:
    I don’t understand how you plan on sharing connections between users. Even disregarding pgbouncer, how would that work? How would Pg know which privileges to apply for any given query?

  7. @depesz – I’m not sure how that would work – I suspect major changes to the back end. That’s probably why I’m not a hacker 🙂

    Point is it’s a wish list 🙂 I have multi-tenant applications that suffer from too many DB connection performance issues due to need to have many different DB users connect. If I could pool between them, it would be a huge performance gain…

  8. I would like to have true parallel query. In a warehousing environment, this is a MUST.

  9. * On-disk bitmap indexes

    * An easier way to get data into inherited tables. Some form of automatic distribution, like Greenplum does

  10. My top 3 items on the wish list would be:
    1. Parallel queries
    2. master-master replication
    3. Memory manager built into Postgres

  11. A built-in scheduler is now relatively trivial to write, now that we have dynamic bgworkers. It’s on my “if I have time and motivation” TODO, as I’ve written a lot of bgworker and SPI code now.

    Session variables would be very, very useful, and are something we’re going to need sooner or later.

    Personally I want to be able to declare secure session variables that’re owned by a given user and can have rights granted to them. So they can be set by a SECURITY DEFINER function and not changed by the user. Particularly useful in combination with another feature I want, ON LOGIN session triggers.

    Other goodies on my wishlist:

    CTEs that aren’t optimisation fences (Grrrr!).

    Better support for anonymous records. Particularly allowing somerecord::sometype, or somerecord AS somealias(id, name).

    Parallel query. PostgreSQL’s limitation to a single CPU per query is becoming crippling for some workloads.

    Smarter autovacuum (always).

    Improved monitoring and performance tuning tooling. The current situation in Pg is pretty limited compared to almost everything else, and users have to know a lot about Pg’s innards. Play with MS SQL’s tools and you’ll cry with jealousy.

    I’d say “WITH ROLLUP” but we have that now 🙂

    Physical or logical replication control on a per-database level, if not per-table.

    Partitioning that doesn’t absolutely suck.

    Getting rid of pg_hba.conf in favour of in-SQL configuration.

    Integrated connection pooling!

    Yeah…. there’s lots of room for improvement. For all that, overall PostgreSQL does a great job of meeting my day to day needs and those of customers I work with. I love how extensible it is, how good the docs are, how generally stable it is, and how most functionality you might want is already just there and ready to use.

  12. Hi Depesz,

    I would be interested in dictionaries too; however, I believe that PostgreSQL’s safety relies heavily on the fact that all objects are immutable, so technically this requires always re-creating the dictionary, rather than extending it, whenever you add an element.
    I’m not sure what introducing mutable objects would do – this could cause havoc, because much of the internals may be simplified significantly by immutability.

    You should also consider this for operators.
    Currently aggregates like sum(complex), for example, take the previous sum object, add the current value to it, and create a third object which will be stored as the sum for the next iteration.

    This requires creating and managing new temporary objects for each iteration. Imagine it were mutable – I once created a test plugin to try it, and I recall it improved overall performance measurably.
    So potentially, this could boost performance of all aggregates, which is pretty useful for analytics and reporting.

    — T

  13. Transparent table partitioning. This is an incredible fault. All PostgreSQL competitors have had this for years!!!

  14. IMHO top missing Postgres features

    1. Stored procedures with rollbacks and commits
    2. Built-in scheduler
    3. Transparent table partitioning
    4. Autonomous transactions

  15. I know it’s a bit late, but:

    – Internal metrics
    – Query plan rewrite (like hints, but on the server side)
    – Block change tracking capability (to be used by autovacuum, and to help do incremental backups)
    – Per session: IO, memory, CPU used
    – A sort of workload repository
    – Modular storage engine (ability to mix row and column storage)
    – Push-down of predicates to foreign tables
