Chapter 2. Planning an XFS Filesystem

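This chapter discusses the following:

  • Choosing the Filesystem Block Size

  • Choosing the Filesystem Directory Block Size

  • Choosing the Log Type and Size

  • Choosing Allocation Groups and Stripe Units

  • Repartitioning the Disks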

Choosing the Filesystem Block Size

XFS lets you choose the logical block size for each filesystem by using the -b size= option of the mkfs.xfs command. (Physical disk blocks remain 512 bytes.)

For XFS filesystems on disk partitions and logical volumes and for the data subvolume of filesystems on logical volumes, the block size guidelines are as follows:

  • The minimum block size is 512 bytes, and the filesystem block size must be a power of two. Small block sizes increase allocation overhead, which decreases filesystem performance. In general, a block size of 512 bytes is recommended only for filesystems under 100 MB and for filesystems with many small files.

  • The default block size is 4096 bytes (4 KB). This is the recommended block size for filesystems over 100 MB.

  • The maximum block size is the page size of the kernel, which is 4 KB on x86 systems (both 32-bit and 64-bit) and is configurable on ia64 systems. Because large block sizes can waste space, in general block sizes should not be larger than 4096 bytes (4 KB).

Block sizes are specified in bytes as follows:

  • Decimal (default)

  • Octal (prefixed by 0)

  • Hexadecimal (prefixed by 0x or 0X)

If the number has the suffix “K” it is multiplied by 1024.
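The notation rules above can be sketched in a few lines of Python. This is an illustration of the rules as described in this section, not the parser that mkfs.xfs itself uses; the function names parse_size and is_valid_block_size are hypothetical, and the 4096-byte page size is an assumption that matches x86 systems.

```python
def parse_size(text: str) -> int:
    """Parse a size given as decimal, octal (leading 0), or hex (0x/0X).

    A trailing "K" multiplies the value by 1024, as described in the text.
    """
    multiplier = 1
    if text.endswith("K"):
        multiplier = 1024
        text = text[:-1]
    if text.lower().startswith("0x"):
        value = int(text[2:], 16)      # hexadecimal
    elif len(text) > 1 and text.startswith("0"):
        value = int(text[1:], 8)       # octal
    else:
        value = int(text, 10)          # decimal (default)
    return value * multiplier


def is_valid_block_size(size: int, page_size: int = 4096) -> bool:
    """Check the guidelines: a power of two from 512 bytes up to the page size.

    page_size=4096 is an assumption (x86); pass another value if your kernel differs.
    """
    power_of_two = size > 0 and (size & (size - 1)) == 0
    return power_of_two and 512 <= size <= page_size


# All four spellings below denote the same 4096-byte block size.
for example in ("4096", "010000", "0x1000", "4K"):
    print(example, "->", parse_size(example))
```

A check such as is_valid_block_size(parse_size("4K")) can be run before passing the value to the -b size= option.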

Choosing the Filesystem Directory Block Size

To select a logical block size for the filesystem directory that is greater than the logical block size of the filesystem, use the -n size= option of the mkfs.xfs command. This lets you choose a filesystem block size to match the distribution of data file sizes without adversely affecting the performance of directory operations. Using this option could improve performance for a filesystem with many small files, such as a news or mail filesystem. In this case, the filesystem logical block size could be small (512 bytes, 1 KB, or 2 KB) and the logical block size for the filesystem directory could be large (4 KB or 8 KB). The larger directory block size can improve the performance of directory lookups because the tree that stores the index information has larger blocks and less depth.

You should consider setting a logical block size for a filesystem directory that is greater than the logical block size for the filesystem if you are supporting an application that reads directories (with the readdir(3) library function or the getdents(2) system call) far more often than it creates and removes files. Using a small filesystem block size saves disk space and reduces the amount of I/O needed for the small files.
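As a rough illustration of the small-block, large-directory-block layout described above, the following Python sketch builds (but does not run) a mkfs.xfs command line. The helper name mkfs_args_with_dir_block and the device path /dev/sdX1 are hypothetical placeholders; the -b size= and -n size= options are the ones named in this chapter and in the mkfs.xfs man page.

```python
def mkfs_args_with_dir_block(device: str, block_size: int, dir_block_size: int) -> list:
    """Return a mkfs.xfs argument list with separate data and directory block sizes."""
    for name, value in (("block_size", block_size), ("dir_block_size", dir_block_size)):
        # Both sizes must be powers of two.
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two, got {value}")
    if dir_block_size < block_size:
        raise ValueError("directory block size must not be smaller than the block size")
    return [
        "mkfs.xfs",
        "-b", f"size={block_size}",      # filesystem logical block size
        "-n", f"size={dir_block_size}",  # directory logical block size
        device,
    ]


# Example for a mail-spool style filesystem: 1 KB data blocks, 8 KB directory blocks.
print(" ".join(mkfs_args_with_dir_block("/dev/sdX1", 1024, 8192)))
```

The list form can be passed to subprocess.run() once you have confirmed the device name; creating a filesystem destroys any data already on that device.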

The data needed to perform a readdir operation is segregated from the index information. Directory data blocks can be “read-ahead” in a readdir. Performing read-ahead improves the readdir performance dramatically. Because the data needed for a readdir operation and index information are separate in a directory block, the offset in a directory is limited to 32 bits.

Choosing the Log Type and Size

Each XFS filesystem has a log that contains filesystem journaling records. This log requires dedicated disk space. This disk space does not show up in listings from the df command, nor can you access it with a filename.

The location of the disk space depends on the type of log you choose:

External log: Log records that are maintained in a dedicated log device. To make the XFS filesystem on a logical volume with a log subvolume, use the mkfs.xfs -l option.

You should use an external log in the following circumstances:

  • If you want the data and log records to be on different partitions

  • If you want the data and the log subvolume of a logical volume to be on different partitions or to use different subvolume configurations

  • If you want the log subvolume of a logical volume to be striped independently from the data subvolume

Internal log: Log records that are put into a dedicated portion of the disk partition (or data subvolume) that contains user files. This is used when an XFS filesystem is created on a disk partition or logical volume that does not have a log subvolume. This is the default.

The amount of disk space that should be allocated for the log is a function of how the filesystem is used. The amount of disk space required for log records is proportional to the transaction rate and the size of transactions on the filesystem, not the size of the filesystem. Larger block sizes result in larger transactions. Transactions from directory updates (for example, the mkdir and rmdir commands and the create() and unlink() system calls) cause more log data to be generated.

You can choose the amount of disk space to dedicate to the log (called the log size). The minimum log size for a filesystem is enforced by the size of the largest transaction, which depends on the filesystem and directory block sizes. The maximum log size is 64K blocks or 128 MB, whichever is smaller (this will depend on the block size).
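The dependence of the maximum log size on the block size can be worked out directly from the two limits stated above. The following Python sketch simply evaluates "64K blocks or 128 MB, whichever is smaller" for a given block size; the function name max_log_size is hypothetical, and MB is assumed to mean 2^20 bytes.

```python
MAX_LOG_BLOCKS = 64 * 1024           # 64K filesystem blocks
MAX_LOG_BYTES = 128 * 1024 * 1024    # 128 MB, taken here as 128 * 2**20 bytes


def max_log_size(block_size: int) -> int:
    """Return the maximum log size in bytes for a given filesystem block size."""
    return min(MAX_LOG_BLOCKS * block_size, MAX_LOG_BYTES)


for block_size in (512, 1024, 2048, 4096):
    print(block_size, "->", max_log_size(block_size) // (1024 * 1024), "MB")
```

With 512-byte blocks the block-count limit applies first (32 MB); at block sizes of 2 KB and above the 128 MB limit applies.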

For internal logs, the size of the log is specified with the -l size= option when you create the filesystem with the mkfs.xfs command. The default log size grows with the size of the filesystem up to the maximum log size, 128 MB, on a 1-TB filesystem. The log size is specified in bytes as described in “Choosing the Filesystem Block Size”, or as a multiple of the filesystem block size by using the suffix “b”.

For a filesystem that is contained in a striped logical volume, the default internal log size is rounded up to a multiple of the stripe unit size. In this case, the user-specified size value must be a multiple of the stripe unit size.
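The stripe unit rule for internal logs amounts to simple integer arithmetic. The Python sketch below shows one way to round a size up to the next multiple of the stripe unit and to check a user-specified value; the function names are hypothetical, and the code illustrates the rule rather than the exact calculation that mkfs.xfs performs.

```python
def round_up_to_stripe_unit(size: int, stripe_unit: int) -> int:
    """Round size (bytes) up to the next multiple of stripe_unit (bytes)."""
    if stripe_unit <= 0:
        raise ValueError("stripe unit must be positive")
    return ((size + stripe_unit - 1) // stripe_unit) * stripe_unit


def is_stripe_aligned(size: int, stripe_unit: int) -> bool:
    """Return True if a user-specified log size is a multiple of the stripe unit."""
    return stripe_unit > 0 and size % stripe_unit == 0


KB = 1024
# A 10000 KB log on a volume with a 64 KB stripe unit is rounded up to 10048 KB.
print(round_up_to_stripe_unit(10000 * KB, 64 * KB) // KB)
```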

For external logs, the default size of the log is the same as the size of the log device. You can specify the size of the log with the -l size= option of the mkfs.xfs command, but any additional space in the log device cannot be used. You may find that you need to repartition a disk to create a properly sized log subvolume.
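To make the difference between the two log types concrete, the following Python sketch assembles the -l option for either case. The helper name log_options and the device path /dev/sdX2 are hypothetical; size= and logdev= are suboptions of -l documented in the mkfs.xfs man page. The sketch only builds the argument list and does not run mkfs.xfs.

```python
from typing import Optional


def log_options(size_bytes: Optional[int] = None, logdev: Optional[str] = None) -> list:
    """Build the -l option for mkfs.xfs.

    With logdev set, the log is external (on a dedicated device);
    otherwise it is internal. size_bytes is optional in both cases.
    """
    suboptions = []
    if logdev is not None:
        suboptions.append(f"logdev={logdev}")
    if size_bytes is not None:
        suboptions.append(f"size={size_bytes}")
    if not suboptions:
        return []  # accept the defaults: internal log, default size
    return ["-l", ",".join(suboptions)]


# Internal log of 32 MB:
print(log_options(size_bytes=32 * 1024 * 1024))
# External log on a hypothetical partition, using the whole device:
print(log_options(logdev="/dev/sdX2"))
```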

For filesystems with very high transaction activity, a large log size is recommended. However, you should avoid making the log larger than necessary, because a large log can increase filesystem mount time after a crash.

Choosing Allocation Groups and Stripe Units

The data section of an XFS filesystem is divided into allocation groups. You can select the number of allocation groups when you create an XFS filesystem or, alternatively, you can select the size of an allocation group. The larger the number of allocation groups, the more parallelism can be achieved when allocating blocks and inodes. You should avoid selecting a very large number of allocation groups or an allocation group size that will yield a very large number of allocation groups; a large number of allocation groups causes an unreasonable amount of CPU time to be used when the filesystem is close to full.

The minimum allocation group size is 16 MB; the maximum size is just under 4 GB.

The default number of allocation groups is 8, unless the filesystem is smaller than 128 MB or larger than 8 GB. When the filesystem is smaller than 128 MB, the default number of allocation groups is fewer than 8, since the minimum allocation group size is 16 MB. In this case, the data section, by default, will be divided into as many allocation groups as possible that are at least 16 MB. When the filesystem is larger than 8 GB, but smaller than 64 GB, the default number of allocation groups is greater than 8, with each allocation group approximately 1 GB in size. When the filesystem is larger than 64 GB, the default number of allocation groups is still greater than 8, but the allocation group size is 4 GB.
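The default rules in the preceding paragraph can be summarized as a small function. The Python sketch below is a reading of that paragraph only, not the algorithm that mkfs.xfs implements: the function name default_agcount is hypothetical, MB and GB are assumed to be powers of two, rounding is assumed to be upward for the larger ranges, and a filesystem of exactly 64 GB is assumed to fall in the 4 GB range because the text does not say.

```python
MB = 1024 * 1024
GB = 1024 * MB


def default_agcount(fs_bytes: int) -> int:
    """Approximate the default number of allocation groups described in the text."""
    if fs_bytes < 128 * MB:
        # As many groups as possible that are each at least 16 MB.
        return max(1, fs_bytes // (16 * MB))
    if fs_bytes <= 8 * GB:
        return 8
    if fs_bytes < 64 * GB:
        # Groups of approximately 1 GB each (rounded up: an assumption).
        return -(-fs_bytes // GB)
    # Groups of 4 GB each (rounded up: an assumption).
    return -(-fs_bytes // (4 * GB))


for size in (64 * MB, 1 * GB, 16 * GB, 256 * GB):
    print(size // MB, "MB ->", default_agcount(size), "allocation groups")
```

To override the default, the -d agcount= and -d agsize= suboptions of mkfs.xfs select the number or the size of allocation groups, respectively.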

XFS lets you select the stripe unit for a RAID device or stripe volume. This ensures that data allocations, inode allocations, and the internal log will be aligned along stripe units when the end-of-file is extended and the file size is larger than 512 KB. You specify stripe units in 512-byte block units or in bytes. See the mkfs.xfs(8) man page for information on specifying stripe units.

When you specify a stripe unit, you also specify a stripe width in 512-byte block units or in bytes. The stripe width must be a multiple of the stripe unit. The stripe width will be the preferred I/O size returned in the stat() system call. See the mkfs.xfs(8) man page for information on specifying stripe width.
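The relationship between stripe unit and stripe width can be illustrated with a short calculation. The Python sketch below converts a stripe unit given in bytes, plus a number of data disks, into the 512-byte block units accepted by the -d sunit= and -d swidth= suboptions of mkfs.xfs. The helper name stripe_options is hypothetical, and deriving the width as stripe unit times the number of data disks is a common convention assumed here; it guarantees that the width is a multiple of the unit.

```python
SECTOR = 512  # stripe units and widths are expressed in 512-byte blocks


def stripe_options(stripe_unit_bytes: int, data_disks: int) -> list:
    """Build -d sunit=,swidth= for mkfs.xfs, both in 512-byte block units."""
    if stripe_unit_bytes <= 0 or stripe_unit_bytes % SECTOR:
        raise ValueError("stripe unit must be a positive multiple of 512 bytes")
    if data_disks < 1:
        raise ValueError("at least one data disk is required")
    sunit = stripe_unit_bytes // SECTOR
    swidth = sunit * data_disks  # always a multiple of the stripe unit
    return ["-d", f"sunit={sunit},swidth={swidth}"]


# A 64 KB stripe unit across 4 data disks.
print(stripe_options(64 * 1024, 4))
```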

If you also specify the -b (block size) option of the mkfs.xfs command, you can use the -d su= and -d sw= options to specify the stripe unit and stripe width, respectively, in filesystem blocks.

For a RAID device, the default stripe unit is 0, indicating that the feature is disabled. You should configure the stripe unit and stripe width of RAID devices to avoid unexpected performance anomalies caused by the filesystem performing non-optimal I/O operations to the RAID unit. For example, if a block write is not aligned on a RAID stripe unit boundary and is not a full stripe unit, the RAID will be forced to do a read/modify/write cycle to write the data. This can have a significant performance impact. If you set the stripe unit size properly, XFS avoids unaligned accesses.

For a striped volume, the stripe unit that was specified when the volume was created is provided by default.

Repartitioning the Disks

Many system administrators may find that they want or need to repartition disks when they switch to XFS filesystems and/or logical volumes. Some of the reasons to consider repartitioning are:

  • Repartitioning can result in a larger pool of free space for all of the formerly separate filesystems.

  • If you plan to use logical volumes, you may want to put the XFS log into a small subvolume. This requires disk repartitioning to create a small partition for the log subvolume.

  • If you plan to use logical volumes, you may want to repartition to create disk partitions of equal size that can be striped or plexed.