October 8th, 2015 by depesz | 3 comments

In case you're not familiar – there is a thing called LVM. It's a layer between physical disks and filesystems, and it allows certain interesting things, like extending, migrating, snapshotting, and others.

On one of the systems I've been dealing with, we stumbled upon a specific requirement – changing an LV into a striped one. It took me a while to figure out, so I'm writing it down so I'll never have to research it again.

First, some “vocabulary", so that everything that follows is clear:

  • PV – Physical volume – generally a disk, or partition on a disk that stores the data
  • LV – Logical volume – what the OS sees as a partition/filesystem, but which can span many physical devices
  • VG – Volume group – single pool of PVs and LVs (neither PV nor LV can be in multiple VGs at a time)

A very simple approach can be: you have one disk, you make it a PV, create a VG using this PV, and then create an LV within this VG, which you can then format (mkfs.*), mount, and use.
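That simple flow, sketched as commands – note that the device name (/dev/sdb) and the my-vg/my-lv names are made up for illustration, and the commands need root and a spare disk, so this sketch only prints them instead of executing them:

```shell
# One-disk LVM setup, printed rather than run (requires root and a real disk).
cat <<'EOF'
pvcreate /dev/sdb                     # label the disk as a PV
vgcreate my-vg /dev/sdb               # create a VG containing that PV
lvcreate -n my-lv -l 100%FREE my-vg   # one LV using every free extent
mkfs.ext4 /dev/my-vg/my-lv            # format it; then mount and use
EOF
```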

A less simple example is a situation where you have multiple disks, each becomes a PV, and then you make on them an LV as large as the sum of all the disks – all while making the OS see it as a single device.

Of course you could also use mirroring (data is written to multiple PVs at once, to provide safety in case physical disk gets damaged).

Anyway – if you have multiple disks (let's say 3), you can just make a single LV on them, with a total size equal to their sizes summed, but LVM will, by default, use them in sequence – once one of them gets full, data is written to the next, and so on.

This means that, at any given time, you only get the performance of a single disk.

What you can do is use striping – like RAID 0. You make the same LV, with the same size, but make LVM spread all operations across all PVs – so you get better performance (each disk only has to write/read 1/3rd of the data).
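To make that concrete, here is the arithmetic behind striping (plain shell arithmetic, nothing LVM-specific; 64 KiB is LVM's default stripe size, which shows up later in this post): consecutive chunks of the LV rotate across the PVs.

```shell
# Which stripe (PV) does a given offset land on? chunk = offset / stripe size,
# and chunks rotate across the PVs round-robin.
stripe_kib=64   # LVM's default stripe size
stripes=3
for offset_kib in 0 64 128 192 256; do
  echo "offset ${offset_kib}KiB -> PV $(( (offset_kib / stripe_kib) % stripes ))"
done
```

Consecutive 64 KiB chunks hit PVs 0, 1, 2, 0, 1, … – so each disk sees roughly a third of the traffic.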

In our case, we had a single PV, and on it a single LV. What we didn't take into account was performance. Without going into details – if we used 3 (or more) disks, each smaller than the one we originally used, and made a striped LV across all of them, it would be faster for the same money (the server was a virtual server in the AWS cloud, and the disks were EBS volumes).

So, let's see how it was, and how to migrate it.

Initial situation was (originally with much larger disks, but size is irrelevant for test case):

root@test:~# df -x tmpfs
Filesystem                    1K-blocks    Used Available Use% Mounted on
/dev/xvda1                      8115168 1044624   6635268  14% /
udev                            1922248      12   1922236   1% /dev
/dev/mapper/test--vg-test--lv   8125880   18420   7671648   1% /test

As you can see, I have a small (8GB) LV mounted as /test.

I can view its details:

root@test:~# lvs
  LV      VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
  test-lv test-vg -wi-ao--- 8.00g

This shows that the test-lv LV is within the test-vg VG and has a size of 8GB.

What about this VG?

root@test:~# vgs
  VG      #PV #LV #SN Attr   VSize VFree   
  test-vg   1   1   0 wz--n- 9.00g 1020.00m

This VG has 1 PV, 1 LV, a total size of 9.00G, and 1020M free (unused by any LV).

And finally PVs:

root@test:~# pvs
  PV         VG      Fmt  Attr PSize PFree   
  /dev/xvdf  test-vg lvm2 a--  9.00g 1020.00m

We see that there is a /dev/xvdf PV, which belongs to the test-vg VG; its size is 9.00 GB, and 1020 MB of it are free.

All fine.

One more piece of information, which will become useful later on:

root@test:~# lvdisplay -m
  --- Logical volume ---
  LV Path                /dev/test-vg/test-lv
  LV Name                test-lv
  VG Name                test-vg
  LV UUID                deR1rm-UUc5-Lyjy-f2ia-A3T0-zyWF-1G4UvD
  LV Write Access        read/write
  LV Creation host, time test.depesz.com, 2015-10-08 18:54:03 +0000
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
 
  --- Segments ---
  Logical extent 0 to 2047:
    Type                linear
    Physical volume     /dev/xvdf
    Physical extents    0 to 2047

Please note the last part: “— Segments —“. Each LV contains a certain number of extents – which are like blocks. This particular LV has 2048 extents, numbered 0 to 2047, and they are mapped linearly to extents 0 to 2047 on the /dev/xvdf PV.

Pretty obvious, but this will be helpful later on.
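The numbers here are easy to cross-check: the LV's size is simply its extent count times the physical extent size (the 4 MiB “PE Size" visible in the pvdisplay output below).

```shell
# Cross-check: LV size = number of logical extents * physical extent size.
le_count=2048    # "Current LE" from lvdisplay
pe_size_mib=4    # "PE Size" from pvdisplay
echo "$(( le_count * pe_size_mib )) MiB"          # 8192 MiB
echo "$(( le_count * pe_size_mib / 1024 )) GiB"   # 8 GiB -- matches "LV Size"
```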

Of course we can also see extent information for PV:

root@test:~# pvdisplay -m
  --- Physical volume ---
  PV Name               /dev/xvdf
  VG Name               test-vg
  PV Size               9.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              2303
  Free PE               255
  Allocated PE          2048
  PV UUID               kKkYv4-LeiX-UZfm-CtB5-0G8w-z28g-kgL3mv
 
  --- Physical Segments ---
  Physical extent 0 to 2047:
    Logical volume      /dev/test-vg/test-lv
    Logical extents     0 to 2047
  Physical extent 2048 to 2302:
    FREE

Here we have “Physical Segments" – extents 0 to 2047 belong to the test-lv LV, and extents 2048 to 2302 are free to use for whatever.

Now – we want to migrate this LV to 3 disks, but keep the size (we could grow it too, if needed, but that's not the subject of this blog post).

First, one thing I have to take care of – striping works by dividing extents equally across all PVs. Since my LV has 2048 extents, which can't be divided equally among 3 PVs, I'll grow it by 1 extent:

root@test:~# lvextend -l +1 /dev/test-vg/test-lv
  Extending logical volume test-lv to 8.00 GiB
  Logical volume test-lv successfully resized
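The +1 isn't magic – it's just the smallest number of extents that makes the total divisible by the stripe count. A quick sanity check, in plain shell arithmetic (not an LVM command):

```shell
# How many extents must be added so the total divides evenly
# by the number of stripes?
extents=2048
stripes=3
pad=$(( (stripes - extents % stripes) % stripes ))
echo "add $pad extent(s)"               # add 1 extent(s)
echo "new total: $(( extents + pad ))"  # new total: 2049, and 2049 / 3 = 683
```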

Since I changed the size of the underlying device, it would be good to grow the ext4 filesystem to match:

root@test:~# df /test/
Filesystem                    1K-blocks  Used Available Use% Mounted on
/dev/mapper/test--vg-test--lv   8125880 18420   7671648   1% /test
 
root@test:~# resize2fs /dev/mapper/test--vg-test--lv 
resize2fs 1.42.9 (4-Feb-2014)
Filesystem at /dev/mapper/test--vg-test--lv is mounted on /test; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/mapper/test--vg-test--lv is now 2098176 blocks long.
 
root@test:~# df /test/
Filesystem                    1K-blocks  Used Available Use% Mounted on
/dev/mapper/test--vg-test--lv   8127920 18420   7673528   1% /test

As you can see, the size of the filesystem has changed from 8125880 to 8127920 blocks. More importantly – we now have a number of extents that is divisible by 3:

root@test:~# lvdisplay | grep Current\ LE
  Current LE             2049

Nice. With this fixed, let's add 3 new volumes – /dev/xvdg, /dev/xvdh, and /dev/xvdi. After creating them and attaching them to the AWS instance, I can see them in my system, and can run:

root@test:~# pvcreate /dev/xvd{g,h,i}
  Physical volume "/dev/xvdg" successfully created
  Physical volume "/dev/xvdh" successfully created
  Physical volume "/dev/xvdi" successfully created

Since they are now PVs, I need to “attach" them to my test-vg:

root@test:~# vgextend test-vg /dev/xvd{g,h,i}
  Volume group "test-vg" successfully extended

Now, let's see how it looks:

root@test:~# pvs
  PV         VG      Fmt  Attr PSize PFree   
  /dev/xvdf  test-vg lvm2 a--  9.00g 1016.00m
  /dev/xvdg  test-vg lvm2 a--  3.00g    3.00g
  /dev/xvdh  test-vg lvm2 a--  3.00g    3.00g
  /dev/xvdi  test-vg lvm2 a--  3.00g    3.00g
 
root@test:~# vgs
  VG      #PV #LV #SN Attr   VSize  VFree
  test-vg   4   1   0 wz--n- 17.98g 9.98g

Please note that the new PVs are 100% free (each is 3GB). And vgs shows that we now have 4 PVs, with a total size of 17.98G and 9.98G free (1016MB on the old PV – 4MB less than before, since we grew the LV by one extent – plus 9GB on the new PVs).

So now I'm ready to actually do the change.

Unfortunately, you can't directly change an LV into a striped one. First you have to turn it into a mirrored one, making the new mirror leg striped.

This is done using this command:

root@test:~# lvconvert --mirrors 1 --stripes 3 /dev/test-vg/test-lv 
  Using default stripesize 64.00 KiB
  test-vg/test-lv: Converted: 0.0%
...
  test-vg/test-lv: Converted: 100.0%

It's not a fast process – for my small test case it took over 4 minutes. But that's irrelevant, as it doesn't lock anything – normal filesystem access works the whole time.

What does this command do, and why? It's simple – it adds a mirror (a 2nd copy) to our LV, which (after the lvconvert finishes) is kept identical to the original LV – so it's a redundant copy. But this new copy is striped across the 3 new volumes.

How does it look now? df is obviously unchanged, but lvs shows something different:

root@test:~# lvs
  LV      VG      Attr      LSize Pool Origin Data%  Move Log          Copy%  Convert
  test-lv test-vg mwi-aom-- 8.00g                         test-lv_mlog 100.00

Please note the “m" letter in Attr – it means that this LV is a mirror. We can dig deeper, too:

root@test:~# lvs -a
  LV                 VG      Attr      LSize Pool Origin Data%  Move Log          Copy%  Convert
  test-lv            test-vg mwi-aom-- 8.00g                         test-lv_mlog 100.00        
  [test-lv_mimage_0] test-vg iwi-aom-- 8.00g                                                    
  [test-lv_mimage_1] test-vg iwi-aom-- 8.00g                                                    
  [test-lv_mlog]     test-vg lwi-aom-- 4.00m

test-lv is a mirror over test-lv_mimage_0 and test-lv_mimage_1 – the two sides of the mirror. There is also a helper, the 4MB test-lv_mlog LV, which is used for internal purposes.

Of course, we don't care about these [test-lv_m*] LVs – they are hidden, and we care only about test-lv.

But let's look at one more thing – lvdisplay with the option to show which extent goes where:

root@test:~# lvdisplay -m -a
  --- Logical volume ---
  LV Path                /dev/test-vg/test-lv
  --- Segments ---
  Logical extent 0 to 2048:
    Type		mirror
    Mirrors		2
    Mirror size		2049
    Mirror log volume	test-lv_mlog
    Mirror region size	512.00 KiB
    Mirror original:
      Logical volume	test-lv_mimage_0
      Logical extents	0 to 2048
    Mirror destinations:
      Logical volume	test-lv_mimage_1
      Logical extents	0 to 2048
 
 
  --- Logical volume ---
  Internal LV Name       test-lv_mlog
  --- Segments ---
  Logical extent 0 to 0:
    Type		linear
    Physical volume	/dev/xvdi
    Physical extents	683 to 683
 
 
  --- Logical volume ---
  Internal LV Name       test-lv_mimage_0
  --- Segments ---
  Logical extent 0 to 2048:
    Type		linear
    Physical volume	/dev/xvdf
    Physical extents	0 to 2048
 
 
  --- Logical volume ---
  Internal LV Name       test-lv_mimage_1
  --- Segments ---
  Logical extent 0 to 2048:
    Type		striped
    Stripes		3
    Stripe size		64.00 KiB
    Stripe 0:
      Physical volume	/dev/xvdg
      Physical extents	0 to 682
    Stripe 1:
      Physical volume	/dev/xvdh
      Physical extents	0 to 682
    Stripe 2:
      Physical volume	/dev/xvdi
      Physical extents	0 to 682

(There was more information there, but I removed it, as it's not all that important.)

Let's see – our “usable" LV is test-lv. According to its Segments map, extents 0 to 2048 form a mirror, with a “Mirror original" – all extents (0 to 2048) on test-lv_mimage_0 – and “Mirror destinations", with one destination, also covering all extents, on test-lv_mimage_1.

So, let's look closer at these internal LVs:

test-lv_mimage_0 is exactly what test-lv was before – a linear mapping to physical extents 0 to 2048 on /dev/xvdf, i.e. the original disk. What LVM did was simply rename the old LV into this.

The new, test-lv_mimage_1, LV is more interesting.

We can see in its Segments that it's striped, with 3 stripes and a stripe size of 64KiB. The stripes go to physical extents 0 to 682 on /dev/xvdg, /dev/xvdh, and /dev/xvdi.
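Those extent ranges check out with simple arithmetic: 2049 extents split evenly over 3 stripes gives 683 extents per PV, numbered 0 to 682.

```shell
# Per-PV share of a 2049-extent LV striped over 3 PVs.
total_le=2049
stripes=3
per_pv=$(( total_le / stripes ))
echo "$per_pv extents per PV"          # 683 extents per PV
echo "extents 0 to $(( per_pv - 1 ))"  # extents 0 to 682
```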

This means that this new test-lv_mimage_1 LV is exactly what we wanted – a single LV striped across 3 PVs. But it's also part of a mirror, which we don't want. So let's make it standalone:

root@test:~# lvconvert --mirrors 0 /dev/test-vg/test-lv /dev/xvdf
  Logical volume test-lv converted.

This command does the magic. It converts test-lv into a mirror-less LV by removing whatever was on /dev/xvdf. Afterwards, all the hidden, internal LVs disappeared:

root@test:~# lvs -a
  LV      VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
  test-lv test-vg -wi-ao--- 8.00g

And the one that remains is properly striped:

root@test:~# lvdisplay -m
  --- Logical volume ---
  LV Path                /dev/test-vg/test-lv
  --- Segments ---
  Logical extent 0 to 2048:
    Type                striped
    Stripes             3
    Stripe size         64.00 KiB
    Stripe 0:
      Physical volume   /dev/xvdg
      Physical extents  0 to 682
    Stripe 1:
      Physical volume   /dev/xvdh
      Physical extents  0 to 682
    Stripe 2:
      Physical volume   /dev/xvdi
      Physical extents  0 to 682

With this done, /dev/xvdf should be 100% free, so let's verify that:

root@test:~# pvs
  PV         VG      Fmt  Attr PSize PFree  
  /dev/xvdf  test-vg lvm2 a--  9.00g   9.00g
  /dev/xvdg  test-vg lvm2 a--  3.00g 336.00m
  /dev/xvdh  test-vg lvm2 a--  3.00g 336.00m
  /dev/xvdi  test-vg lvm2 a--  3.00g 336.00m

Nice. Since it's no longer useful, I can remove it from the VG, and then detach it from the instance using the AWS Console:

root@test:~# vgreduce test-vg /dev/xvdf
  Removed "/dev/xvdf" from volume group "test-vg"
 
root@test:~# pvremove /dev/xvdf
  Labels on physical volume "/dev/xvdf" successfully wiped

Finally, let's see how the VG looks after all the changes:

root@test:~# vgs
  VG      #PV #LV #SN Attr   VSize VFree   
  test-vg   3   1   0 wz--n- 8.99g 1008.00m

Not bad. The best part is that all of these operations happened live.
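For reference, the whole live migration condenses to this short sequence – these are exactly the commands used above, only printed here rather than executed, since they need root and the real devices:

```shell
# Recap of the whole live migration, condensed (same devices as above).
cat <<'EOF'
lvextend -l +1 /dev/test-vg/test-lv                      # extent count divisible by 3
resize2fs /dev/mapper/test--vg-test--lv                  # grow the filesystem to match
pvcreate /dev/xvdg /dev/xvdh /dev/xvdi                   # new disks become PVs
vgextend test-vg /dev/xvdg /dev/xvdh /dev/xvdi           # add them to the VG
lvconvert --mirrors 1 --stripes 3 /dev/test-vg/test-lv   # add a striped mirror leg
lvconvert --mirrors 0 /dev/test-vg/test-lv /dev/xvdf     # drop the old linear leg
vgreduce test-vg /dev/xvdf                               # old disk out of the VG
pvremove /dev/xvdf                                       # wipe its PV label
EOF
```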

To test that striping actually works, I ran:

$ iostat -kx 5 | grep -E 'xvd[ghi]|Device'

and while it was running, I ran:

root@test:/test# time dd if=/dev/zero of=test.file bs=1M count=7500; time sync

Immediately after this started, I saw an increase in traffic on the disks:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdi              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdg              0.00   137.88    0.20  109.98     0.81 14077.39   255.54    75.53  513.49    0.00  514.44   5.21  57.43
xvdh              0.00   137.88    0.00  110.18     0.00 14103.46   256.00    75.95  517.53    0.00  517.53   5.21  57.43
xvdi              0.00   137.88    0.20  114.46     0.81 14650.92   255.56    64.22  442.84    0.00  443.63   5.01  57.43
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdg              0.00    96.90    0.00   97.73     0.00 12509.09   256.00   124.00 1259.75    0.00 1259.75  10.57 103.31
xvdh              0.00    96.90    0.00   98.35     0.00 12549.59   255.21   123.55 1253.64    0.00 1253.64  10.50 103.31
xvdi              0.00    98.35    0.00   97.52     0.00 12449.59   255.32   105.53 1063.20    0.00 1063.20  10.59 103.31

And, as you can see, all 3 disks were used to the same extent – so I got exactly what I wanted. Nice.


  2. # norbi
    Oct 13, 2015

    Actually it’s spelled “striped”/”striping” (single “p”), as it has stripes (and not stripped of something).

  3. Oct 13, 2015

    @norbi:
    thanks, fixed.

  4. # Roy
    Apr 30, 2017

    perfect solution for my environment.
    works like a charm !!!
    thank you
