Re: [vserver] hybrid zfs pools as iSCSI targets for vserver

From: Herbert Poetzl <herbert_at_13thfloor.at>
Date: Sun 07 Aug 2011 - 01:09:34 BST
Message-ID: <20110807000934.GF12671@MAIL.13thfloor.at>

On Sat, Aug 06, 2011 at 04:30:04PM -0400, John A. Sullivan III wrote:
> On Sat, 2011-08-06 at 21:40 +0200, Eugen Leitl wrote:
>> I've recently figured out how to make low-end hardware (e.g. HP N36L)
>> work well as zfs hybrid pools. The system (Nexenta Core + napp-it)
>> exports the zfs pools as CIFS, NFS or iSCSI (Comstar).

>> 1) is this a good idea?

>> 2) any of you are running vserver guests on iSCSI targets? Happy with it?

> Yes, we have been using iSCSI to hold vserver guests for a
> couple of years now and are generally unhappy with it. Besides
> our general distress at Nexenta, there is the constraint of the
> Linux file system.

> Someone please correct me if I'm wrong because this is a big
> problem for us. As far as I know, Linux file system block size
> cannot exceed the maximum memory page size and is limited to no
> more than 4KB. iSCSI appears to acknowledge every individual
> block that is sent. That means the most data one can stream
> without an ACK is 4KB. That means the throughput is limited by
> the latency of the network rather than the bandwidth.

> Nexenta is built on OpenSolaris and has a significantly higher
> internal network latency than Linux. It is not unusual for
> us to see round trip times from host to Nexenta well upwards
> of 100us (micro-seconds). Let's say it was even as good as
> 100us. One could send up to 10,000 packets per second * 4KB
> = 40MBps maximum throughput for any one iSCSI conversation.
> That's pretty lousy disk throughput.
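
a quick sketch of that model in python, taking the 4KB
block size and the 100us roundtrip you quote at face value:

  # one 4KB block in flight per network roundtrip
  # (the "ACK every block" model described above)
  block_size = 4 * 1024      # bytes, assumed filesystem block size
  rtt        = 100e-6        # seconds, assumed host <-> Nexenta roundtrip
  throughput = block_size / rtt
  print(throughput / 1e6, "MB/s")   # ~41 MB/s, close to the 40MBps above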

well, that largely depends on your networking infrastructure;
with gigabit ethernet, for example, your upper theoretical
limit will be 125MB/s on the wire. now let's take a closer
look at the real thing:

iSCSI is basically SCSI over the network, more precisely over
TCP/IP (at least on ethernet :), so we have a packet size
of 1500 bytes in IP land (unless you utilize jumbo frames),
which is reduced by the IPv4 header (min 20 bytes), the TCP
header (min 20 bytes) and the iSCSI PDU header (min 48
bytes), leaving us with 1412 bytes of payload. on the wire,
ethernet adds an equivalent of another 38 bytes (8 bytes
preamble, 14 bytes header, 4 bytes trailer and an interframe
gap equal to 12 bytes), which means it takes the time of
1538 bytes on the wire to transfer 1412 bytes of payload;
at gigabit speed those 1538 bytes take roughly 12.3
microseconds to transmit.
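
the same per-frame arithmetic as a small python sketch
(standard 1500 byte MTU, IPv4 and minimal headers assumed):

  # per-frame overhead for iSCSI over TCP/IPv4 on gigabit ethernet
  mtu        = 1500                    # IP packet size without jumbo frames
  payload    = mtu - 20 - 20 - 48      # IPv4 + TCP + iSCSI headers -> 1412 bytes
  wire_bytes = mtu + 8 + 14 + 4 + 12   # preamble, header, trailer, gap -> 1538 bytes
  tx_time    = wire_bytes * 8 / 1e9    # ~12.3 microseconds per frame at 1Gbit/s
  print(payload, wire_bytes, tx_time * 1e6)
  print(payload / wire_bytes * 125)    # ~114.8 MB/s of payload at full line rate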

assuming a TCP receive window of 64k (the maximum without
window scaling) and a sender that waits out an optimal
roundtrip time of 0.1 milliseconds per window, we end up
with roughly 750 megabit per second, which means about
93MB/s theoretical maximum for iSCSI over TCP/IP on
gigabit ethernet.
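
again as a python sketch; the pessimistic assumption here
is that the sender pushes out one full 64k window and then
sits idle for a whole roundtrip before continuing:

  # windowed throughput estimate for iSCSI over gigabit ethernet
  window     = 64 * 1024            # bytes, max TCP window without scaling
  payload    = 1412                 # bytes of iSCSI payload per frame (see above)
  wire_bytes = 1538                 # bytes on the wire per frame (see above)
  rtt        = 100e-6               # seconds, the optimal roundtrip assumed above
  window_tx  = (window / payload) * wire_bytes * 8 / 1e9   # ~571 microseconds
  throughput = window / (window_tx + rtt)                  # bytes per second
  print(throughput / 1e6, "MB/s")   # ~98 MB/s, same ballpark as the 93MB/s above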

any latency in target or initiator, any disruption of
the ethernet connection, any delay from a switch or any
other traffic on the network will drastically reduce that
to a more realistic 50MB/s ...

and don't forget, that's the raw block device speed and
does not account for filesystem overhead and possible
fragmentation.

so if you call a disk throughput of 40MB/s pretty lousy,
I assume you are using isolated 10G ethernet on the client
(initiator) side and separate 10G connections on the target
side, together with TCP offloading and a really smart
switch, all utilizing jumbo frames :)

best,
Herbert

> Other than that, iSCSI is fabulous because it appears as a
> local block device. We typically mount a large data volume into
> the VServer host and then mount --rbind it into the guest file
> systems. A magically well working file server without a file
> server or the hassles of a network file system. Our single
> complaint other than about Nexenta themselves is the latency
> constrained throughput.
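
for illustration, the mount-then-rbind setup described
above boils down to something like the following python
sketch (device name and paths are made-up examples, not
the actual configuration):

  # minimal sketch: mount the iSCSI-backed volume on the vserver host,
  # then rbind it into a guest's filesystem tree (run as root)
  import subprocess

  iscsi_dev   = "/dev/sdb1"                   # assumed iSCSI-backed block device
  host_mount  = "/srv/data"                   # assumed mount point on the host
  guest_mount = "/vservers/guest1/srv/data"   # assumed path inside the guest root

  subprocess.run(["mount", iscsi_dev, host_mount], check=True)
  subprocess.run(["mount", "--rbind", host_mount, guest_mount], check=True)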

> Anyone have a way around that? Thanks - John
Received on Sun Aug 7 01:09:47 2011
