
ZFS Administration, Part XIV- ZVOLS

Table of Contents

Zpool Administration
0. Install ZFS on Debian GNU/Linux
1. VDEVs
2. RAIDZ
3. The ZFS Intent Log (ZIL)
4. The Adjustable Replacement Cache (ARC)
5. Exporting and Importing Storage Pools
6. Scrub and Resilver
7. Getting and Setting Properties
8. Best Practices and Caveats

ZFS Administration
9. Copy-on-write
10. Creating Filesystems
11. Compression and Deduplication
12. Snapshots and Clones
13. Sending and Receiving Filesystems
14. ZVOLs
15. iSCSI, NFS and Samba
16. Getting and Setting Properties
17. Best Practices and Caveats

Appendices
A. Visualizing The ZFS Intent Log (ZIL)
B. Using USB Drives
C. Why You Should Use ECC RAM
D. The True Cost Of Deduplication

What is a ZVOL?

A ZVOL is a "ZFS volume" that has been exported to the system as a block device. So far, when dealing with the ZFS filesystem, other than creating our pool, we haven't dealt with block devices at all, even when mounting the datasets. It's almost like ZFS is behaving like a userspace application more than a filesystem. I mean, on GNU/Linux, when working with filesystems, you're constantly working with block devices, whether they be full disks, partitions, RAID arrays or logical volumes. Yet somehow, we've managed to escape all that with ZFS. Well, not any longer. Now we get our hands dirty with ZVOLs.

A ZVOL is a ZFS block device that resides in your storage pool. This means that the single block device gets to take advantage of your underlying RAID array, such as mirrors or RAID-Z. It gets to take advantage of the copy-on-write benefits, such as snapshots. It gets to take advantage of online scrubbing, compression and data deduplication. It gets to take advantage of the ZIL and ARC. Because it's a legitimate block device, you can do some very interesting things with your ZVOL. We'll look at three of them here- swap, ext4, and VM storage. First, we need to learn how to create a ZVOL.

Creating a ZVOL

To create a ZVOL, we use the "-V" switch with our "zfs create" command, and give it a size. For example, if I wanted to create a 1 GB ZVOL, I could issue the following command. Notice that a couple of new symlinks appear in /dev/zvol/tank/ and /dev/tank/, both pointing to a new block device in /dev/:

# zfs create -V 1G tank/disk1
# ls -l /dev/zvol/tank/disk1
lrwxrwxrwx 1 root root 11 Dec 20 22:10 /dev/zvol/tank/disk1 -> ../../zd144
# ls -l /dev/tank/disk1
lrwxrwxrwx 1 root root 8 Dec 20 22:10 /dev/tank/disk1 -> ../zd144

Because this is a full-fledged, 100% bona fide block device that is 1 GB in size, we can do anything with it that we would do with any other block device, and we get all the benefits of ZFS underneath. Plus, creating a ZVOL is nearly instantaneous, regardless of size. Of course, I could also create a block device with GNU/Linux from a file on the filesystem. For example, if running ext4, I can create a 1 GB file, then make a block device out of it as follows:

# fallocate -l 1G /tmp/file.img
# losetup /dev/loop0 /tmp/file.img

I now have the block device /dev/loop0 that represents my 1 GB file. Just as with any other block device, I can format it, add it to swap, etc. But it's not as elegant, and it has severe limitations. First, by default you only have 8 loopback devices for your exported block devices (you can raise this number, however), whereas with ZFS you can create 2^64 ZVOLs by default. Second, the loopback approach requires a preallocated image sitting on top of your filesystem, so you are managing three layers of data: the block device, the file, and the blocks on the filesystem. With ZVOLs, the block device is exported right off the storage pool, just like any other dataset.
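Since the ZVOL's size is just a dataset property, you can also inspect it and even grow it on the fly, something the preallocated loopback image can't do as cleanly. A quick sketch, using the "tank/disk1" ZVOL from above:

```shell
# zfs get volsize,volblocksize tank/disk1
# zfs set volsize=2G tank/disk1
```

After growing the ZVOL, any filesystem sitting on it still needs to be resized with its own tools (resize2fs, etc.); ZFS only changes the size of the block device.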

Let's look at some things we can do with this ZVOL.

Swap on a ZVOL

Personally, I'm not a big fan of swap. I understand that it's a physical extension of RAM, but swap is only used when RAM fills and pages spill over to disk. If this is happening regularly and consistently, then you should probably look into getting more RAM. Swap can be part of a healthy system, keeping RAM dedicated to what the kernel actively needs. But when active RAM starts spilling over to swap, you get "the swap of death", as your disks thrash trying to keep up with the demands of the kernel. So, depending on your system and needs, you may or may not need swap.

First, let's create a 1 GB block device for our swap. We'll call the dataset "tank/swap" to make its intention easy to identify. Before we begin, let's check how much swap we currently have on our system with the "free" command:

# free
             total       used       free     shared    buffers     cached
Mem:      12327288    8637124    3690164          0     175264    1276812
-/+ buffers/cache:    7185048    5142240
Swap:            0          0          0

In this case, we do not have any swap enabled. So, let's create 1 GB of swap on a ZVOL, and add it to the kernel:

# zfs create -V 1G tank/swap
# mkswap /dev/zvol/tank/swap
# swapon /dev/zvol/tank/swap
# free
             total       used       free     shared    buffers     cached
Mem:      12327288    8667492    3659796          0     175268    1276804
-/+ buffers/cache:    7215420    5111868
Swap:      1048572          0    1048572

It worked! We have a legitimate Linux kernel swap device on top of ZFS. Sweet. As is typical with swap devices, they don't have a mountpoint. They are either enabled, or disabled, and this swap device is no different.
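If you want the swap device to come back after a reboot, you could add it to /etc/fstab like any other swap device. A sketch, assuming the same "tank/swap" ZVOL:

```shell
# echo '/dev/zvol/tank/swap none swap defaults 0 0' >> /etc/fstab
# swapon -a
```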

Ext4 on a ZVOL

This may sound wacky, but you can put another filesystem on top of a ZVOL and mount it. In other words, you can have an ext4-formatted ZVOL mounted at /mnt. You can even partition your ZVOL, and put multiple filesystems on it. Let's do that!

# zfs create -V 100G tank/ext4
# fdisk /dev/tank/ext4
( follow the prompts to create 2 partitions- the first 1 GB in size, the second to fill the rest )
# fdisk -l /dev/tank/ext4

Disk /dev/tank/ext4: 107.4 GB, 107374182400 bytes
16 heads, 63 sectors/track, 208050 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk identifier: 0x000a0d54

          Device Boot      Start         End      Blocks   Id  System
/dev/tank/ext4p1            2048     2099199     1048576   83  Linux
/dev/tank/ext4p2         2099200   209715199   103808000   83  Linux

Let's create some filesystems, and mount them:

# mkfs.ext4 /dev/zd0p1
# mkfs.ext4 /dev/zd0p2
# mkdir /mnt/zd0p{1,2}
# mount /dev/zd0p1 /mnt/zd0p1
# mount /dev/zd0p2 /mnt/zd0p2

Enable compression on the ZVOL, copy over some data, then take a snapshot:

# zfs set compression=lzjb tank/ext4
# tar -cf /mnt/zd0p1/files.tar /etc/
# tar -cf /mnt/zd0p2/files.tar /etc /var/log/
# zfs snapshot tank/ext4@001

You probably didn't notice, but you just enabled transparent compression and took a snapshot of your ext4 filesystem. These are two things ext4 can't do natively. You also get all the other benefits of ZFS that ext4 normally couldn't give you. So now you can regularly snapshot your data, perform online scrubs, and send it offsite for backup. Most importantly, your data is consistent.
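To see the snapshot pay off, you could roll the whole block device back to the state it was in when the snapshot was taken. A sketch: unmount the ext4 filesystems first, since ZFS has no visibility inside them:

```shell
# umount /mnt/zd0p1 /mnt/zd0p2
# zfs rollback tank/ext4@001
# mount /dev/zd0p1 /mnt/zd0p1
# mount /dev/zd0p2 /mnt/zd0p2
```

Note that if snapshots newer than @001 exist, "zfs rollback" will refuse to run without the "-r" switch, which destroys the intervening snapshots.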

ZVOL storage for VMs

Lastly, you can use these block devices as the backend storage for VMs. It's not uncommon to use logical volume block devices as the backend for VM storage. Once the block device is available to Qemu, you attach it to the virtual machine, and from the VM's perspective it appears as "/dev/vda" or "/dev/sda", depending on the setup.

If using libvirt, you would have a /etc/libvirt/qemu/vm.xml file. In that file, you could have the following, where "/dev/zd0" is the ZVOL block device:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/zd0'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>

At this point, your VM gets all the ZFS benefits underneath, such as snapshots, compression, deduplication, data integrity, drive redundancy, etc.
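If the VM is already defined, you don't necessarily need to edit the XML by hand; virsh can attach the ZVOL directly. A sketch, where the domain name "vm1" and the target "vdb" are hypothetical:

```shell
# virsh attach-disk vm1 /dev/zvol/tank/disk1 vdb --persistent
```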


ZVOLs are a great way to get block devices quickly while taking advantage of all of the underlying ZFS features. Using ZVOLs as VM backing storage is especially attractive. However, I should note that ZVOLs cannot be replicated across a cluster; ZFS is not a clustered filesystem. If you want data replication across a cluster, then you should not use ZVOLs, and should use file images for your VM backing storage instead. Other than that, you get all of the amazing benefits of ZFS that we have been blogging about up to this point, and beyond, for whatever data resides on your ZVOL.
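As a closing example of why ZVOLs shine as VM storage: because a ZVOL is just another dataset, you can snapshot a "golden" template volume and clone it for each new guest almost instantly, with the clones sharing unmodified blocks. A sketch with hypothetical dataset names:

```shell
# zfs snapshot tank/vm-template@gold
# zfs clone tank/vm-template@gold tank/vm1
# zfs clone tank/vm-template@gold tank/vm2
```

Each clone shows up as its own block device under /dev/zvol/tank/ and only consumes space as it diverges from the template.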

{ 14 } Comments

  1. Ahmed Kamal | July 22, 2013 at 2:56 pm | Permalink

    When you put ext4 on top of a ZVOL, and snapshot it .. You say it's "consistent" I guess it's only crash-consistent .. There is no FS/ZVOL integration to ensure better consistency, right

  2. Aaron Toponce | August 7, 2013 at 10:08 am | Permalink

    Not sure what you're saying. When you put ext4 on top of a ZVOL, ext4 is just a standard run-of-the-mill application wishing to store data on ZFS, just as much as anything else. So, the data is pooled into a TXG just as anything else. TXGs are flushed in sequential order to the ZIL. The contents of the ZIL are flushed to disk synchronously. So, the data is always consistent.

    Suppose you have a VM that is using that ZVOL for its storage. Suppose further that your VM crashes. In the worst case, the ext4 journal is not closed. So, at next boot, you will be forced to fsck(8) the disk. What's important to know is that the data on ZFS is still consistent, even if ext4 may have lost some data as a result of the crash. In other words, closing the journal did not happen before the rest of the data blocks were flushed to disk.

  3. Jack Relish | August 26, 2013 at 3:25 pm | Permalink

    Excellent guide. I was wondering if you had any insights on mounting partitions that exist on a ZVOL?

    For example, let's say that I have a ZVOL /dev/tank/vm0, which was used as the root device for a VM. At some point, the VM breaks, or for whatever other reason I want to be able to access the contents of its filesystem. Is it possible to expose the internals of the ZVOL? I'm sure that it could be done manually and tediously by getting the start offset of the partition and then mounting it in the same way you would a raw image file, but if there is a slicker way to do so that would be incredible.

  4. Zyon | March 10, 2014 at 7:17 pm | Permalink

    When you say "you cannot replicate them across a cluster", that means I cannot use DRBD over ZVOL?

  5. Andy | April 12, 2014 at 6:58 am | Permalink


    kpartx is probably what you'd need for getting at partitions within a ZVOL holding a VM's disk. Certainly it works for images taken from entire real disks using dd.


  6. sammand | June 3, 2014 at 9:26 pm | Permalink


    Yes ZVOLs can be replicated using DRBD and we support it in ZBOSS Linux distribution.
    You are free to check it out

  7. cbu | October 20, 2014 at 10:31 am | Permalink

    First, thanks for this incredible tutorial.
    I created a zvol and I am trying to attach it to a VM, but I cannot. Every time I try to create a new block device storage pool from virt-manager I get:

    RuntimeError: Could not start storage pool: internal error: Child process (/usr/bin/mount -t auto /dev/zvol/tank/rhel/disk1 /var/lib/libvirt/images/sss) unexpected exit status 32: mount: /dev/zd0 is write-protected, mounting read-only
    mount: unknown filesystem type '(null)'

    How can I get a block device available for Qemu? Thanks!

    P.S: I am using centOS 7 and zfs-0.6.3-1.1

  8. Mark | February 17, 2015 at 1:43 pm | Permalink

    Since ZFS is so vulnerable to fragmentation, would BTRFS on ZVOLs be a working combination? You'd miss the shared space usage between datasets, but you'd gain the advantages of the now stable BTRFS filesystem, while maintaining the reliability of RAID using ZFS. BTRFS RAID is still unstable, but it can defragment. And one can grow a ZVOL and BTRFS when needed. BTRFS is still evolving, while ZFS is stable, but Oracle will likely never release the new code (goodbye encryption) and the v5000 code isn't evolving much either. Wonder where compression would be more effective though. What's your opinion?

  9. Luca | September 8, 2015 at 5:49 am | Permalink

    Thanks for these very interesting articles on zfs!
    I want to set up some VMs with Xen on ZFS and it is not clear to me which is the best solution for guest disk images: on LVM I use one LV and, when it fills up, I have to extend it. On ZFS I suppose I must create a ZVOL of some size, but what do I do when it is full? What is the smartest way to manage the VMs on ZFS?

  10. John Naggets | October 10, 2015 at 5:57 am | Permalink

    How do you create a ZVOL using 100% of the available size with its -V option?

  11. Ekkehard | April 25, 2016 at 2:55 am | Permalink

    Thanks for your awesome series, it was the only resource I studied before diving into ZFS, and I feel like I have a quite deep understanding now; creating and deploying my first (home) NAS with ZFS (published over Samba and rsync) was a snap.

    I have tried iSCSI as well (I use a FreeBSD-based NAS), it works beautifully to provide partitions to Win7 as "native" NTFS drives.

    Obviously, some of the benefits of datasets do not apply to ZVOLS (i.e., ZFS cannot know which blocks are actually in use and which are not, when the file system has been "cycled" a few times), I wonder whether you have practical experience with how ZVOLs evolve over time.

    For example, say I use a 100GB NTFS-formatted ZVOL to backup a windows partition (using robocopy, some Windows imaging software, or whatever other tool), to be able to keep all the windows permissions that do not survive with rsync- or Samba/CIFS-based copying. I don't know much about NTFS internals, but I assume that NTFS will, sooner than later, have touched every block at least once; at this point, ZFS will see the 100GB as active. Every further block change will directly lead to more use (at least when there are snapshots around). When I delete or "overwrite" files (in the NTFS world), ZFS will not notice, etc..

    Compared to a dataset, where ZFS knows about actual files, the ZVOL will thus have a drawback as long as the dataset is smaller. But if I compare a 100G ZVOL with a dataset that actually *uses* 100G of data, it should pretty much be the same (in terms of using up storage in the pool), no matter what file operations I do, right? (All with deduplication off, of course).

    Regarding compression, there should not be a noticeable difference between a ZVOL and a dataset, right?

    I have an hourly-daily-weekly-monthly snapshot scheme; would you say ZVOLs will eat up space quicker than a comparable dataset when many snapshots are in use?

  12. Mael Strom | February 7, 2017 at 1:20 am | Permalink

    This may be necroposting, but it can make a difference...

    2Ekkehard: in your case you need to use sparse zvols (the -s flag) and configure the iSCSI export to act like an SSD, with rpm=1 and unmap=on, and use Windows 8 or above (XP, Vista and 7 are unable to send the unmap command through iSCSI). That way only the blocks actually in use (or held by a snapshot) are kept; the others are discarded.

  13. Anonimous | February 20, 2017 at 8:26 am | Permalink

    Just as a point, not to cause any discussion.

    I have seen some apps that refuse to work if there is no swap area defined (some also refuse to even start, showing a "no swap" error message, etc.).

    I have also seen some apps that cause the swap area to be used even while there is plenty of free RAM (they are not so common, thankfully); by the way, how can an app force data to go to swap when there is free RAM at the same time?

    But the worst case is when you cannot add more RAM to your motherboard (when I personally buy motherboards I also buy the most RAM they can support); some motherboards (quite old, or not so much) only allow 2 GiB of RAM (talking about PCs, not laptops, etc.).

    And there is also cost to consider: what if adding RAM multiplies the cost of the entire PC by four or five times? Example: a laptop with a touch screen that can be turned (a TabletPC) with 3 GiB of RAM, with a vendor-stated maximum of 4 GiB, though some people have tested it with 8 GiB and with 16 GiB (the real maximum; there are no modules bigger than 8 GiB and it only has two slots). Now the costs: the 4 GiB modules (2x4 GiB = 8 GiB) cost around 300 euros each (the tablet cost 300 euros with 3 GiB of RAM), so putting in 8 GiB triples the cost of the TabletPC; the 8 GiB modules (2x8 GiB = 16 GiB) cost more than one thousand euros each, around 2500 euros for both, so the cost would be more than eight times the cost of the TabletPC... and much, much more than a new computer.

    Sometimes adding more RAM is not an option: some machines cannot hold more than 2 GiB, and for others it is far too expensive.

    So could you explain a little how ZFS would do on a system with 2 GiB of RAM and 4 GiB of swap with only one 500 GiB disk (a laptop)? Assuming no dedup is being used, of course; and the most important point: how to configure it so it is not a pain in terms of speed!

    I mean: ext4 is great (I have had no losses that I noticed), but I cannot trust it against silent file changes... I do not mind if the HDD breaks, I have offline backups...

    Let me explain a little: if I use ext4 for backups on external media, silent changes can occur; if I use ZFS they will be detected (at least most of them). Since I use 3 to 7 external HDDs, only one powered at a time because I am really paranoid about losing my data (tutorials made by me), all with ext4, I can suffer from silent corruption (never seen it yet, but not impossible). ZFS would be great to detect such corruption if it occurs.

    Until I can use ZFS, I may think my method to avoid silent corruption is good enough: I use 7-Zip to compress one directory or file with LZMA2, then I put that 7z file on one external disk, then unplug it, then on another... up to 7 disks. 7-Zip has an internal checksum, but how can I be sure all 7 copies have not suffered silent corruption at the same time? Then I could not recover the data from inside the 7z files (all copies would be bad)... To mitigate that, I check the 7z integrity prior to copying it onto the 7 external disks... but that does not guarantee that no corruption can occur.

    If I could just put ZFS on each of those 7 external disks I would have another level of trust.

    By the way... ext4 has been powered off (frozen) by brute force a lot of times... but I am really lucky, I never lost anything, nor have I seen any such silent changes... but I am paranoid; they can happen, so better to be safe.

    To sum up: how would you configure ZFS for the rootfs (I do not like to create separate partitions for /home, etc., since I am so paranoid I periodically make full clones of the whole system onto external media) for a laptop (only one HDD) with only 2 GiB (3 GiB at most) of RAM and a 500 GiB HDD, but with only 64 GiB for the rootfs and 64 GiB for a data partition, the rest being used by other OSs? Better if you can explain it for the SolidXK distro, thanks; I am hoping for performance similar to ext4 over LUKS over LUKS over LUKS over a logical partition (I hate primary partitions)... and of course with encryption enabled (better if ZFS encryption with a cascade of Twofish and Serpent algorithms, since I collaborate on coding the break of AES-128 up to AES-8192).

    Thanks in advance for any help, and also thanks for your great turorial i am reading with pleasure.

  14. Aurélien DESBRIÈRES | December 6, 2018 at 2:38 am | Permalink

    Impressive works!

{ 3 } Trackbacks

  1. [...] ZVOLs [...]

  2. [...] ZVOLs [...]

  3. […] finally booted I would have to reconfigure the iSCSI LUNs due to the encryption. After the LUNs/zvols were reconfigured they were presented to the ESXi machine via iSCSI as a datastore which contained […]
