September 25th, 2007 by depesz

i recently got a new toy for tests – a brand new dell powervault md1000.

what's this, you ask? basically – a rather nice das (direct attached storage) from dell.

the box i got had 15 sas discs, each disc being 72gb, 15krpm.

since this will be used as database storage, i wanted to make some performance tests.

the box was connected to a nice server i had handy (also from dell). specs:

  • 4 dual-core xeon processors (3.4ghz)
  • 32gb ram
  • 2 internal 15krpm 72gb sas discs for the system, in a raid1 setup

for the tests we used bonnie++. the test procedure was quite simple (a sketch of the commands follows the list):

  1. set up a new raid
  2. mke2fs -j (ext3)
  3. mount using the noatime and nodiratime options
  4. run bonnie++ with the parameters “-u nobody:nobody -f -s 65000,8192 -n 0 -x 3” (8192 because this is the page size in postgresql)
  5. average the results of the 3 runs (hence -x 3)
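
put together, one test iteration looked more or less like this (just a sketch – /dev/sdX and /mnt/test are placeholder names, not the actual devices we used):

    # create an ext3 filesystem (ext2 + journal)
    mke2fs -j /dev/sdX

    # mount without access-time updates
    mount -o noatime,nodiratime /dev/sdX /mnt/test

    # run bonnie++ 3 times as nobody; -f skips the slow per-char tests,
    # 65000mb of data in 8192-byte chunks (postgresql page size),
    # -n 0 skips the file-creation tests
    bonnie++ -d /mnt/test -u nobody:nobody -f -s 65000,8192 -n 0 -x 3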

i knew we would be going for raid10, so this is what we tested first.

2 separate series of tests were made:

  1. raid 10, purely hardware, using 2, 4, 6, 8, 10, 12, and 14 discs
  2. raid 10, mixed software/hardware – the controller was used to build 7 separate logical devices, each using 2 discs in raid1; we then combined them with linux software raid0 to create 4-, 6-, 8-, 10-, 12- and 14-disc raid10 setups (mdadm sketch below)

(all tests were done with read-ahead disabled, and with write-back caching backed by the controller's battery-backed memory.)
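
the software half of the mixed setups looked roughly like this (again a sketch – device names like /dev/sdb through /dev/sdh are hypothetical; they depend on how the controller exposes the raid1 pairs):

    # stripe 7 hardware raid1 pairs into one software raid0 (raid10 overall);
    # the smaller setups simply used fewer of the pairs
    mdadm --create /dev/md0 --level=0 --raid-devices=7 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

    # one way to disable os-level read-ahead on the resulting device
    blockdev --setra 0 /dev/md0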

the results are somewhat strange (red line – pure hardware, green line – mixed hardware/software):

[graphs: block write (write.png), rewrite (rewrite.png), block read (read.png) and seeks (seeks.png)]

if you want the results as a table (put_block, rewrite and get_block are in kb/s, seeks in seeks/s, the *_cpu columns in % cpu used):

name put_block put_block_cpu rewrite rewrite_cpu get_block get_block_cpu seeks seeks_cpu
2xraid1 34820 12 25105 6 97459 9 436 1
4xraid10 95427 37 65661 19 246490 23 615 1
6xraid10 100367 39 70955 20 288188 27 672 1
8xraid10 165980 66 98887 29 423983 39 737 1
10xraid10 164195 64 96039 28 394442 36 618 1
12xraid10 185671 72 103271 30 414942 38 686 1
14xraid10 195349 76 104087 30 439088 40 821 2
2s0@2h1 86651 32 61836 18 251109 24 618 1
3s0@2h1 110977 42 79381 24 356231 34 708 2
4s0@2h1 120232 45 91988 28 391041 37 748 2
5s0@2h1 131024 50 92403 28 556601 55 788 2
6s0@2h1 123812 47 93563 28 482090 47 778 2
7s0@2h1 137513 53 100083 31 657221 65 839 2
2s0@6h10 160090 61 104375 32 482106 46 716 2
2s1@2h1 44373 16 25972 6 99071 10 651 1
13xraid5 222225 87 113040 32 392238 36 806 2
14xraid5 222690 87 114142 33 398201 36 809 2

the name column means:

  • (\d+)xraid(\d+) – $1 discs in pure hardware raid$2. for example, 6xraid10 means 6 discs in pure hardware raid10
  • (\d+)s0@2h1 – $1 logical drives (each being 2 discs in hardware raid1) combined using software raid0. for example, 5s0@2h1 means 5 logical drives (each 2 discs in raid1) combined into a mixed hardware/software raid10 over 10 discs in total
  • 2s0@6h10 – 2 logical drives, each being a 6-disc hardware raid10, combined using software raid0
  • 2s1@2h1 – 2 logical drives, each being a 2-disc hardware raid1, combined using software raid1 – effectively giving the capacity of 1 disc while using 4 discs

as you can see, we tested more than i showed in the graphs above, but the rest of the tests were only for the “fun” of testing :).

strange thing – the purely hardware raid shows very visible “steps” in write/rewrite performance. speed was gained only when the total number of used discs was a power of 2. i don't know why, but this is how it worked. with the mixed software/hardware raid there was no such effect.

since we have 15 discs, we decided on the following layout:

  • 1 disc as a global hot spare
  • 2-disc raid1 (hardware) for pg_xlog
  • 8-disc raid10 (hardware) for the primary tablespace
  • 4-disc raid10 (hardware) for the secondary tablespace

theoretically, this layout should give the best results.
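
for completeness – wiring such a layout into postgresql could look like this (a sketch only; the mount points /mnt/xlog and /mnt/ts2 and the tablespace name are made up, and the cluster has to be stopped before moving pg_xlog):

    # move pg_xlog to the dedicated raid1 array and symlink it back
    mv /var/lib/pgsql/data/pg_xlog /mnt/xlog/pg_xlog
    ln -s /mnt/xlog/pg_xlog /var/lib/pgsql/data/pg_xlog

    # create a directory on the 4-disc array and register it as a tablespace
    mkdir -p /mnt/ts2/pgdata
    chown postgres:postgres /mnt/ts2/pgdata
    psql -U postgres -c "CREATE TABLESPACE secondary LOCATION '/mnt/ts2/pgdata'"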

after creating the arrays and running mkfs, i decided to run a concurrent test on all 3 arrays at the same time (sketched below).
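
the concurrent run was simply three bonnie++ instances started in parallel, roughly like this (mount points hypothetical again):

    # one bonnie++ per array, all running at the same time
    bonnie++ -d /mnt/xlog -u nobody:nobody -f -s 65000,8192 -n 0 -x 3 &
    bonnie++ -d /mnt/ts1  -u nobody:nobody -f -s 65000,8192 -n 0 -x 3 &
    bonnie++ -d /mnt/ts2  -u nobody:nobody -f -s 65000,8192 -n 0 -x 3 &
    wait    # block until all three finish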

i will not graph the results, as that is not the point, but the numbers are (same columns and units as the table above):

name put_block put_block_cpu rewrite rewrite_cpu get_block get_block_cpu seeks seeks_cpu
2xraid1 30875 12 15438 5 39867 5 224 1
2xraid1 32489 14 24037 7 99700 9 389 1
2xraid1 35096 13 24708 6 96969 9 383 0
4xraid10 41343 18 31910 11 61798 8 109 0
4xraid10 80630 34 35707 12 137488 18 306 1
4xraid10 40136 17 38388 12 147282 16 255 0
8xraid10 42376 18 37513 13 155740 19 302 1
8xraid10 156044 65 34153 11 177690 22 338 1
8xraid10 146096 61 71307 25 29568 3 154 0

(all 3 bonnie++ runs are shown for each array, for comparison purposes.)

as you can see, the numbers are lower than expected. it has to be noted that the 8xraid10 array finished its tests first, 4xraid10 second and 2xraid1 last.

in general i think the layout we've chosen is the best possible on this array, but i can't help wondering why this controller is so performance-bound to powers of 2.

i hope you'll find the numbers useful for your own purposes.

5 comments:

  1. Sep 25, 2007 (Darcy Buskermolen)

    I think you have the wrong units on your graphs – you can’t tell me that you get 722 GB/sec out of this beast. I’d venture to say you are off by 1 SI unit.

  2. Sep 26, 2007 (depesz)

    @Darcy Buskermolen:
    heh, of course you are right. fixed the png’s 🙂

  3. Sep 26, 2007

    What surprises me most is the performance of the software raid / hardware mirror.
    I thought it would always be slower.

    Does anybody have a benchmark comparing software-only raid vs. hardware raid in a PostgreSQL context?

  4. Oct 5, 2007 (Merlin)

    Here are some more results comparing different ways to build a 10-disk array with two controllers. Something is up with your seeks – they seem low to me (I was able to get 1500 seeks with 10 disks).

    merlin

  5. Oct 6, 2007 (depesz)

    @Merlin:
    your results are better. do you have any idea what might be wrong?
    did you use read-ahead? what was the stripe size?
