Re: [vserver] hybrid zfs pools as iSCSI targets for vserver

From: John A. Sullivan III <jsullivan_at_opensourcedevel.com>
Date: Sat 06 Aug 2011 - 21:51:44 BST
Message-ID: <1312663904.8151.16.camel@denise.theartistscloset.com>

On Sat, 2011-08-06 at 21:37 +0100, Gordan Bobic wrote:
> On 08/06/2011 09:30 PM, John A. Sullivan III wrote:
> > On Sat, 2011-08-06 at 21:40 +0200, Eugen Leitl wrote:
> >> I've recently figured out how to make low-end hardware (e.g. HP N36L)
> >> work well as zfs hybrid pools. The system (Nexenta Core + napp-it)
> >> exports the zfs pools as CIFS, NFS or iSCSI (Comstar).
> >>
> >> 1) is this a good idea?
> >>
> >> 2) are any of you running vserver guests on iSCSI targets? Happy with it?
> >>
> > Yes, we have been using iSCSI to hold vserver guests for a couple of
> > years now and are generally unhappy with it. Besides our general
> > distress at Nexenta, there is the constraint of the Linux file system.
> >
> > Someone please correct me if I'm wrong because this is a big problem for
> > us. As far as I know, the Linux file system block size cannot exceed the
> > memory page size, so it is limited to no more than 4KB.
>
> I'm pretty sure it is _only_ limited by the memory page size; I seem to
> remember that 8KB blocks were available on SPARC.
Yes. Oracle, for example, can bypass the file system and write directly to
the device, so it works very well with iSCSI when set to use very large
block sizes.
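
A quick way to see the ceiling I am describing (a minimal Python sketch;
the "/" mount point is only an example) is to compare the kernel page size
with the block size a mounted filesystem reports:

import os

# Memory page size reported by the kernel (4096 bytes on our x86 boxes).
page_size = os.sysconf("SC_PAGE_SIZE")

# Block size reported by a mounted filesystem; "/" is only an example path.
fs_block_size = os.statvfs("/").f_bsize

print("page size:     %d bytes" % page_size)
print("fs block size: %d bytes" % fs_block_size)
# In our experience the filesystem block size never exceeds the page size,
# so on x86 both lines print 4096.

As far as I recall, mke2fs will happily create a filesystem with an 8KB
block size, but the x86 kernel then refuses to mount it.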
>
> > iSCSI appears to acknowledge every individual block that is sent, so the
> > most data one can stream without an ACK is 4KB. That means throughput is
> > limited by the latency of the network rather than by its bandwidth.
>
> Hmm, buffering in the FS shouldn't be dependent on the block layer
> immediately acknowledging unless you are issuing fsync()/barriers. What
> FS are you using on top of the iSCSI block device, and is your
> application fsync() heavy?
The application is standard file service and we are not using barriers. We
have tried using the device as a plain disk, as part of LVM, as part of a
RAID device, and as part of a dm-multipath multibus configuration, with
pretty much the same results across the board. We could produce higher
aggregate throughput with RAID and multibus by multiplexing several
individual streams, but even then only after changing from the default CFQ
scheduler to noop (which I suppose makes sense when writing to a SAN).
Individual streams are still limited by latency.
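
To put rough numbers on that latency ceiling (the round-trip times in this
Python sketch are illustrative, not measurements from our SAN): with at
most 4KB in flight per acknowledgement, a single stream cannot move more
than roughly the block size divided by the round-trip time.

# Back-of-the-envelope: single-stream throughput when each 4KB block
# must be acknowledged before the next one is sent.
block_size = 4096  # bytes

for rtt_ms in (0.1, 0.25, 0.5, 1.0):  # illustrative round-trip times
    bytes_per_second = block_size / (rtt_ms / 1000.0)
    print("RTT %.2f ms -> ~%.1f MB/s" % (rtt_ms, bytes_per_second / 1e6))

# Even at 0.1 ms the ceiling is about 41 MB/s, well short of gigabit wire
# speed; at 0.5 ms it drops to about 8 MB/s.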
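
For reference, the noop change is just a sysfs write; here is a rough
Python sketch of what we do (the device name sdb is only a placeholder for
the iSCSI-backed disk, and the available scheduler names vary by kernel):

import sys

# Placeholder device name; substitute the iSCSI-backed disk on your system.
device = sys.argv[1] if len(sys.argv) > 1 else "sdb"
sched_path = "/sys/block/%s/queue/scheduler" % device

# The active scheduler is shown in brackets, e.g. "noop deadline [cfq]".
with open(sched_path) as f:
    print("before:", f.read().strip())

# Writing a scheduler name selects it for that device (needs root).
with open(sched_path, "w") as f:
    f.write("noop")

with open(sched_path) as f:
    print("after: ", f.read().strip())

A one-line echo of "noop" into that same sysfs file from a shell does the
same thing.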

When we are drawing from cache, our systems fly, but anything that touches
disk is painfully slow. - John
>
> Gordan
Received on Sat Aug 6 21:51:58 2011
