Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

On 2nd of September, Teodor Sigaev committed patch:

Allow usage of huge maintenance_work_mem for GIN build.
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.
Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.
Robert Abraham <robert.abraham86@googlemail.com> with additions by me

To test this, I made a test table:

$ \d test_gin
                            TABLE "public.test_gin"
   COLUMN    |  TYPE   |                       Modifiers                       
 id          | INTEGER | NOT NULL DEFAULT NEXTVAL('test_gin_id_seq'::regclass)
 some_string | text    | 
    "test_gin_pkey" PRIMARY KEY, btree (id)

Data looks like this:

$ SELECT id, substr( some_string, 1, 100 ) FROM test_gin LIMIT 20;
  id  |                                                substr                                                
    1 | organic ricksha's adversaries mail's koala boney hollies mukluk giggle's jab's murder intensively co
    2 | orator's wariness Spencer eccentricities peafowl teething viscounts casinos galoshes irregularity pe
    3 | obsolete annually setter provision raga Luke fairgrounds disdained Alma's dissimulation tanning perp
    4 | codfish's partition's betrothing Nashville telecasters pipe's Frenchwoman's Engels's drawer Ortega's
    5 | sprawls Federalist confused shelter's offset ensembles victimizing skepticism Bushido nothing earmar
    6 | mulattoes delights griffins unofficially waterway's stunted triathlons trickle pimps Silva drag's in
    7 | tacticians averring diereses assault's Clouseau's alined crazier frock vintner humidor's consonants 
    8 | allusively Hapsburg's gigglier acclimatization fiendish gypsum exemplifying thrill artiste wikis New
   42 | slurring Poppins raindrop doggonedest Betelgeuse's Rick flip's geneticist's porthole petard's dyspep
 3062 | Dwight's bicyclists copperheads hypocrites mimicked resembling tourmaline Friday tyrannizes Akbar un
    9 | irk sew straggler's reminisced chooses Ashley dud Pangaea newspaperwoman pantomime's Punjabi golden 
   10 | Miami yearning burden ramps campers obviously conspirator's envies Christians sunned tetanus Gable m
   11 | councilwomen execute louver's unknowns angularity's consulate's clouded infamously anapests infests 
   12 | Cassiopeia leggings unmentionable propelled was hardiness crouching yaws aerosol browbeaten sluggish
   13 | seduces hardball's gaffes reappearance bet unevener infallibility's coloration's busting mislay Janu
   14 | Katie's someday feathering Audrey's thespian Arcadia ductile creosote's tunnels startlingly emulate 
   15 | expurgation baffling industries scapegoat Ginsburg cent's recapped hedonism spitballs Mimosa's hocks
   16 | infomercial adapter rancher's monthly's exactitude's regency mouthed wildcatted pressure's clutter's
   17 | freedoms officiating firearms unplanned rums philanders player's deluding horizon's pedantry's coola
   18 | cesspool's tromped handkerchieves orthopedic explosion's Valerian's commencement's Confucianism y st
(20 ROWS)


FROM test_gin;
  COUNT   | MIN |         avg          | MAX  |    SUM     
 10000000 |  64 | 798.9262166000000000 | 1651 | 7989262166
(1 ROW)

Currently the table is:

$ SELECT pg_size_pretty( pg_table_size( 'test_gin' ) );
 8195 MB
(1 ROW)

Now, I'll run index creation:

CREATE INDEX ti ON test_gin USING gin (to_tsvector( 'pg_catalog.english', some_string ) );

5 times each, with work_mem of 64MB, 512MB, 1GB, and 4GB. Out of these 5 times, I'll pick the fastest (not average!). Results:

maintenance_work_mem index creation time
64MB 1,954,296.659 ms
512MB 2,055,028.899 ms
1024MB 2,173,428.048 ms
4096MB 2,211,913.840 ms

And that's a shocker. Why is it slower with more memory? No idea. The box I was testing it on has 16GB of ram, and had almost no concurrent load. Maybe someone can explain to me what I did wrong in the test?

As for the patch – despite the results looking weird (to me), I think it's a good thing, as it removes one more limitation (or moves it) in PostgreSQL, which is always a good thing. If I could only understand the timing …

4 thoughts on “Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.”

  1. GIN index build accumulates as much entries as fit maintenance_work_mem, then sort them and insert to the index. GIN is logically two-level btree. When you insert into GIN you have to find or insert corresponding entry. If entry already exists then insert to corresponding posting list/tree. Specific of index build is that data are coming ordered by item pointer, the same order as in posting trees. Thus, append to posting tree is very cheap.

    When we accumulate N entries in memory, we have to spend O(N * log(N)) time to sort it. If we have limited number of entries M, then we need O(M * (log(M)) time to find corresponding entries and O(N) to actually insert item pointers. This is quite rough model which doesn’t reflect all the details. But it can predict grow of time spend per value after some N.

    We could solve this issue by trying to find optimal N. But it could be better to rewrite GIN build from scratch. We can have RB-tree or hash on entries in memory and accumulate posting lists in-memory. Then inserting into index by
    larger batches shouldn’t lead to slower index build.

  2. @Alexander:

    it’s all good, but unless I’m missing something, it doesn’t explain why rebuilding the index from scratch, with higher maintenance_work_mem, takes longer than with lower maintenance_work_mem.

  3. I tried to explain technical reason in my previous comment. But in general, it is the issue we need solve.

  4. @Depesz:
    Based on Alexander’s comment it (i.e., sorting) takes longer because M (i.e., entries in memory) is larger. So essentially the more memory you give it the more entries can exist (i.e., bigger M) and the bigger M is the longer sorting and inserting (the O(N) part) takes. Fortunately the approximate run time is O(M * (log(M))) instead of something truly horrid like O(M^2).

Comments are closed.