Hi,
We are running XFS on one of our machines, which is a big store (~3TB) of different data files (mostly images). Quite recently we experienced some performance problems - the machine wasn't able to keep up with updates. After some investigation it turned out that open() syscalls (open for writing) were taking significantly more time than they should, e.g. 15-20ms vs 100-150us.

Some more info about our workload, as I think it's important here: our XFS filesystem is used exclusively as a data store, so we only read and write our data (we mostly write). When a new update comes in, it's written to a temporary file, e.g.

    /mountpoint/some/path/.tmp/file

When the file is completely stored we move it to its final location, e.g.

    /mountpoint/some/path/different/subdir/newname

That means we create lots of files in the /mountpoint/some/path/.tmp directory, but the directory stays empty, as they are moved (rename() syscall) shortly after creation to a different directory on the same filesystem.

The workaround I have found so far is to remove that directory (/mountpoint/some/path/.tmp in our case) together with its content and re-create it. After this operation the open() syscall goes down to 100-150us again. Is this a known problem?

Information regarding our system:
CentOS 5.8 / kernel 2.6.18-308.el5 / kmod-xfs-0.4-2

Let me know if you need to know anything more.

Cheers,
Marcin
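The update path described above is the usual write-to-a-temp-file-then-rename sequence. A minimal sketch of it, with hypothetical paths and only basic error handling (the real code presumably differs):

    /* Write an update to a temp file, then rename() it into place.
     * Paths are hypothetical; a real implementation would handle
     * partial writes and build the names dynamically. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int store_update(const char *buf, size_t len)
    {
        const char *tmp  = "/mountpoint/some/path/.tmp/file";
        const char *dest = "/mountpoint/some/path/different/subdir/newname";

        int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644); /* the slow open() */
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        fsync(fd);                /* data on disk before the rename */
        close(fd);
        return rename(tmp, dest); /* atomic move within the same filesystem */
    }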
On 10/10/2012 3:51 AM, Marcin Deranek wrote:
> Hi,
>
> We are running XFS on one of our machines, which is a big store (~3TB)
> of different data files (mostly images). Quite recently we experienced
> some performance problems - the machine wasn't able to keep up with
> updates. After some investigation it turned out that open() syscalls
> (open for writing) were taking significantly more time than they
> should, e.g. 15-20ms vs 100-150us.
> [ ... ]
> The workaround I have found so far is to remove that directory
> (/mountpoint/some/path/.tmp in our case) together with its content and
> re-create it. After this operation the open() syscall goes down to
> 100-150us again. Is this a known problem?
> Information regarding our system:
> CentOS 5.8 / kernel 2.6.18-308.el5 / kmod-xfs-0.4-2
> Let me know if you need to know anything more.

Hi Marcin,

I'll begin where you ended: kmod-xfs. DO NOT USE THAT. Use the kernel driver. Eric Sandeen can point you to the why. AIUI that XFS module hasn't been supported for many, many years.

Regarding your problem, I can't state some of the following with authority, though it might read that way. I'm making an educated guess based on what I do know of XFS and the behavior you're seeing. Dave will clobber and correct me if I'm wrong here. ;)

XFS filesystems are divided into multiple equal-sized allocation groups (AGs) on the underlying storage device (single disk, RAID, LVM volume, etc.). With inode32, each directory that is created has its files stored in only one AG, with some exceptions, which you appear to be bumping up against. If you're using inode64, the directories, along with their files, go into the AGs round robin.

Educated guessing: when you use rename(2) to move the files, the file contents are not being moved, only the directory entry, as with EXTx etc. Thus the file data is still in the ".tmp" directory's AG, but that AG is no longer its home. Once this temp dir AG gets full of these "phantom" file contents (you can only see them with XFS tools), the AG spills over. At that point XFS starts moving the phantom contents of the rename(2)'d files into the AG which owns the directory of the rename(2) target. I believe this is the source of your additional latency. Each time you do an open(2) call to write a new file, XFS is moving a file's contents (extents) to its new/correct parent AG, causing much additional IO, especially if these are large files.

As you are witnessing, if XFS did the move to the new AG in real time, the performance of rename(2) would be horrible on the front end. I'd guess the developers never imagined that a user would fill an entire AG using rename(2) calls. Your deleting and recreating of the .tmp directory, which fixes the performance, seems to be evidence of this. Each time you delete/create that directory it is put into a different AG in the filesystem, in a round robin fashion. If you do this enough times, you should eventually create the directory in the original AG that's full of the rename(2) file extents, and performance will suffer again.

One of the devs probably has some tricks/tools up his sleeve to force those extents to their new parent AG. You might be able to run a nightly script to do this housekeeping. Or you could always put the .tmp directory on a different filesystem on a scratch disk.

This problem could also be a free space fragmentation issue, but given that recreating the .tmp directory fixes it, I doubt free space fragmentation is the problem.

-- 
Stan
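The "nightly housekeeping" idea mentioned above would presumably amount to rewriting each settled file so that fresh extents get allocated for it near its current parent directory (loosely in the spirit of what xfs_fsr does). A rough sketch, with a hypothetical ".rewrite" suffix and no preservation of ownership, mode or timestamps:

    /* Rewrite a file in place so the allocator places new extents for
     * it.  A real script would preserve ownership/mode/timestamps and
     * skip files that are still being written. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int rewrite_in_place(const char *path)
    {
        char tmp[4096];
        char buf[1 << 16];
        ssize_t n;
        int in, out;

        snprintf(tmp, sizeof(tmp), "%s.rewrite", path);

        in = open(path, O_RDONLY);
        if (in < 0)
            return -1;
        out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (out < 0) {
            close(in);
            return -1;
        }

        while ((n = read(in, buf, sizeof(buf))) > 0) {
            if (write(out, buf, (size_t)n) != n)
                break;
        }
        fsync(out);
        close(in);
        close(out);

        if (n != 0) {             /* read/write error: keep the original */
            unlink(tmp);
            return -1;
        }
        return rename(tmp, path); /* atomically replace the old copy */
    }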
On 10/10/12 8:17 AM, Stan Hoeppner wrote:
> On 10/10/2012 3:51 AM, Marcin Deranek wrote:
>> [ ... ]
>> Information regarding our system:
>> CentOS 5.8 / kernel 2.6.18-308.el5 / kmod-xfs-0.4-2
>> Let me know if you need to know anything more.
>
> Hi Marcin,
>
> I'll begin where you ended: kmod-xfs. DO NOT USE THAT. Use the kernel
> driver. Eric Sandeen can point you to the why. AIUI that XFS module
> hasn't been supported for many, many years.

Yep. Ditch that; it overrides the maintained module that comes with the kernel itself. See if that helps, first, I suppose.

I've been asking CentOS for a while to find some way to deprecate that, but it's like night of the living dead xfs modules.

(modinfo xfs will tell you for sure which xfs.ko is getting loaded, I suppose.)

> Regarding your problem, I can't state some of the following with
> authority, though it might read that way. [ ... ]
>
> XFS filesystems are divided into multiple equal-sized allocation groups
> (AGs) on the underlying storage device (single disk, RAID, LVM volume,
> etc.). With inode32, each directory that is created has its files
> stored in only one AG, with some exceptions, which you appear to be
> bumping up against. If you're using inode64, the directories, along
> with their files, go into the AGs round robin.

Agreed that it would be good to know whether inode64 is in use.

Let's start there (and with a modern xfs.ko) before we speculate further.

> Educated guessing: when you use rename(2) to move the files, the file
> contents are not being moved, only the directory entry, as with EXTx
> etc. [ ... ] Each time you do an open(2) call to write a new file, XFS
> is moving a file's contents (extents) to its new/correct parent AG,
> causing much additional IO, especially if these are large files.

Nope, don't think so ;) Nothing is going to be moving file contents behind your back on a rename.

<snip>

-Eric
On Wed, Oct 10, 2012 at 10:51:42AM +0200, Marcin Deranek wrote:
> Hi,
>
> We are running XFS on one of our machines, which is a big store (~3TB)
> of different data files (mostly images). Quite recently we experienced
> some performance problems - the machine wasn't able to keep up with
> updates. After some investigation it turned out that open() syscalls
> (open for writing) were taking significantly more time than they
> should, e.g. 15-20ms vs 100-150us.

Which is clearly IO latency vs cache hit latency.

> [ ... ] That means we create lots of files in the
> /mountpoint/some/path/.tmp directory, but the directory stays empty, as
> they are moved (rename() syscall) shortly after creation to a different
> directory on the same filesystem.
> The workaround I have found so far is to remove that directory
> (/mountpoint/some/path/.tmp in our case) together with its content and
> re-create it. After this operation the open() syscall goes down to
> 100-150us again. Is this a known problem?

By emptying the directory, you are making it smaller and likely causing it to be cached in memory again as new files are added to it. Over time, blocks will be removed from the cache due to memory pressure, and latencies will be seen again.

> Information regarding our system:
> CentOS 5.8 / kernel 2.6.18-308.el5 / kmod-xfs-0.4-2

Use a more recent distro. I reworked the metadata caching algorithms a couple of years ago to avoid these sorts of problems with memory reclaim.

Cheers,

Dave.
-- 
Dave Chinner
[hidden email]
Hi Eric,
On Wed, 10 Oct 2012 09:31:16 -0500
Eric Sandeen <[hidden email]> wrote:

> Yep. Ditch that; it overrides the maintained module that comes with
> the kernel itself. See if that helps, first, I suppose.

I wasn't aware that the stock kernel comes with an xfs module. From my testing it looks like the stock kernel module is still preferred over kmod-xfs:

# modinfo xfs
filename:       /lib/modules/2.6.18-308.el5/kernel/fs/xfs/xfs.ko
license:        GPL
description:    SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
author:         Silicon Graphics, Inc.
srcversion:     D37A003AFEE1A42BDD4DD56
depends:
vermagic:       2.6.18-308.el5 SMP mod_unload gcc-4.1
module_sig:     883f3504f44471c48d0a1fbae482c4c11225a009e3fa1179850eea96ab882c910d750e88743fec5309d1ca09de3d81add6999f9dedc65f84a0d1e21293

Most likely for historical reasons we still install kmod-xfs on our systems. To be sure, I removed kmod-xfs, unmounted the filesystem, removed the kernel module and then mounted the filesystem again. I'm still seeing the very same behaviour.

> Agreed that it would be good to know whether inode64 is in use.

No, we don't use any special mount options here.

> Let's start there (and with a modern xfs.ko) before we speculate
> further.

I guess the next step would be to use inode64..

Regards,
Marcin
On Thu, 11 Oct 2012 10:37:04 +1100
Dave Chinner <[hidden email]> wrote:

> Use a more recent distro. I reworked the metadata caching algorithms
> a couple of years ago to avoid these sorts of problems with memory
> reclaim.

I can give CentOS 6.x a shot, although that might take some time..

Marcin
On Thu, 11 Oct 2012 10:33:52 +0200
Marcin Deranek <[hidden email]> wrote:

> I guess the next step would be to use inode64..

After mounting XFS with inode64 I see a performance improvement (open() now takes ~3ms vs ~15ms previously), although it's still not what I would expect (~150us). On Dave's suggestion I will give CentOS 6.x a shot and see if that makes any difference, although this needs to be monitored over a longer period of time to reliably tell whether it makes a difference.

Regards,
Marcin
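Latencies in this range can be measured either with strace -T or with a small probe along the following lines (the path and iteration count are made up; link with -lrt on older glibc for clock_gettime):

    /* Tiny harness for timing open(O_CREAT) calls, similar to the
     * numbers quoted in this thread.  Run it against the .tmp
     * directory on the XFS mount. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        char path[256];
        struct timespec t0, t1;
        long us;
        int i, fd;

        for (i = 0; i < 1000; i++) {
            snprintf(path, sizeof(path), "/mountpoint/some/path/.tmp/probe.%d", i);

            clock_gettime(CLOCK_MONOTONIC, &t0);
            fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            if (fd < 0) {
                perror("open");
                return 1;
            }
            close(fd);
            unlink(path);   /* remove the probe file again */

            us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                 (t1.tv_nsec - t0.tv_nsec) / 1000L;
            printf("open() took %ld us\n", us);
        }
        return 0;
    }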
>>> [ ... ] open() syscalls (open for writing) were taking
>>> significantly more time than they should, e.g. 15-20ms vs
>>> 100-150us. [ ... ] That means we create lots of files in the
>>> /mountpoint/some/path/.tmp directory, but the directory stays
>>> empty, as they are moved (rename() syscall) shortly after
>>> creation to a different directory on the same filesystem.
>>> The workaround I have found so far is to remove that
>>> directory (/mountpoint/some/path/.tmp in our case) together
>>> with its content and re-create it. After this operation the
>>> open() syscall goes down to 100-150us again.
>>> Is this a known problem?

Indeed, two problems that have been known for several decades: using filesystems as DBMSes, and using directories as spool queues.

[ ... ]

> After mounting XFS with inode64 I see a performance improvement
> (open() now takes ~3ms vs ~15ms previously), although it's still
> not what I would expect (~150us).

It would be amusing to know why one would ever expect a random metadata access operation to take 150µs on *average* on a storage system that seems to have rotating disks with a 10-15ms *average* access time. The metadata operations may have locality, but unsurprisingly that decreases over time...
