While researching iSCSI on VI3, I came across some interesting information about the ESX iSCSI software initiator that applies to many installations, highlighting a potential bottleneck.
The short version is that if you’re using the iSCSI software initiator to connect to a single iSCSI target, multiple uplinks in an ESX network team for the VMkernel iSCSI port will not be used for load balancing.
This can easily be verified by connecting to the service console and running esxtop, then pressing ‘n’ to view the traffic for individual network adapters. Assuming your storage is in use, you should see the iSCSI traffic concentrated on a single physical uplink of the vSwitch handling iSCSI. You can also use resxtop through the RCLI on ESXi.
Why this happens
My understanding is that current ESX software-initiated iSCSI connections have a 1:1 relationship between NIC and iSCSI target. An iSCSI target in this sense is a connection to the IP-based SAN storage, not a LUN target. This limitation applies when the SAN presents a single IP address for connectivity.
VI3 software-initiated iSCSI doesn’t support multipathing, which within ESX leaves only load balancing across the physical uplinks in a team. Unfortunately, that leaves load balancing up to the vSwitch load-balancing policy, and I don’t believe any of the three choices fits most scenarios when connectivity to the iSCSI target is through a single MAC/IP address:
- Route based on the originating virtual switch port ID: hashes on the virtual port ID, of which there is only one (the VMkernel iSCSI port)
- Route based on source MAC hash: hashes on the source MAC address, of which there is only one
- Route based on IP hash: hashes on the layer 3 source/destination IP pair, of which there is only one (VMkernel -> iSCSI virtual address). I don’t think this is a generally recommended load-balancing approach anyway
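To see why none of the three policies helps here, consider a minimal sketch (not ESX code, and the hash function, port IDs, and addresses are all hypothetical): each policy reduces to hashing some key modulo the number of uplinks, and with a single VMkernel port, a single source MAC, and a single IP pair, that key never varies.

```python
# Illustrative sketch only: each vSwitch load-balancing policy hashes a
# key and maps it onto an uplink. With one VMkernel iSCSI port, one
# source MAC, and one IP pair, every policy always picks the same uplink.
from zlib import crc32

UPLINKS = ["vmnic0", "vmnic1"]  # hypothetical two-NIC team

def pick_uplink(key: str) -> str:
    """Hash the policy's key and map it onto one uplink."""
    return UPLINKS[crc32(key.encode()) % len(UPLINKS)]

# Virtual port ID policy: only one VMkernel iSCSI port exists.
port_choice = pick_uplink("vmk-port-16")

# Source MAC hash policy: only one source MAC exists.
mac_choice = pick_uplink("00:50:56:7a:01:02")

# IP hash policy: only one source/destination IP pair exists.
ip_choice = pick_uplink("10.0.0.10->10.0.0.100")

# Whichever uplink each policy selects, it selects it for ALL iSCSI
# traffic, because its key is constant - the other uplink stays idle.
print(port_choice, mac_choice, ip_choice)
```

The point is not which uplink gets chosen, but that the choice is made once and never changes while the key (port ID, MAC, or IP pair) stays constant.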
The VI3 SAN Deploy guide does state that one connection is established to each target. This seems to indicate one connection per LUN target, but the paragraph starts with software iSCSI and switches halfway through to discussing iSCSI HBAs.
I’m still unsure whether software iSCSI establishes multiple TCP sessions, one per target (I don’t believe this is the case). The blog post referenced below, in discussing 802.3ad link aggregation, states that the ESX 3.x software initiator does not support multiple TCP sessions.
However, if multiple TCP sessions were established by the iSCSI software initiator to a single target IP address, this would open the possibility of link aggregation at the physical switch. When using 802.3ad LACP in this IP-to-IP scenario, the switches would have to distribute connections based on a hash of the TCP source/destination ports, rather than just the IP/MAC addresses.
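The difference can be sketched as follows (again illustrative, not switch firmware; the hash function, IPs, and port numbers are hypothetical): an IP-based hash collapses every session between the same two hosts onto one link, while a TCP-port-aware hash can spread multiple sessions across the aggregate.

```python
# Illustrative sketch: with a single IP pair, an IP-based LACP hash maps
# every TCP session to the same link, while a TCP-port-aware hash can
# distribute multiple sessions across the links in the aggregate.
from zlib import crc32

LINKS = 2  # hypothetical two-port aggregate

def ip_hash(src_ip: str, dst_ip: str) -> int:
    """Link choice based only on the layer 3 IP pair."""
    return crc32(f"{src_ip}-{dst_ip}".encode()) % LINKS

def l4_hash(src_ip: str, dst_ip: str, sport: int, dport: int) -> int:
    """Link choice that also includes the TCP source/destination ports."""
    return crc32(f"{src_ip}-{dst_ip}-{sport}-{dport}".encode()) % LINKS

# Three hypothetical iSCSI TCP sessions to the same target (port 3260):
sessions = [(50001, 3260), (50002, 3260), (50003, 3260)]

ip_links = {ip_hash("10.0.0.10", "10.0.0.100") for _ in sessions}
l4_links = {l4_hash("10.0.0.10", "10.0.0.100", s, d) for s, d in sessions}

print(ip_links)  # always a single link
print(l4_links)  # may span both links, since the source ports differ
```

With the IP hash, `ip_links` always contains exactly one link; only a hash that varies per session can use more than one member of the aggregate.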
The following excerpt is from the SAN Deploy guide:
Software iSCSI initiators establish only one connection to each target.
Therefore, storage systems with a single target that contains multiple LUNs have all LUN traffic routed through that one connection. In a system that has two targets, with one LUN each, two connections are established between the ESX host and the two available volumes. For example, when aggregating storage traffic from multiple connections on an ESX host equipped with multiple iSCSI HBAs, traffic for one target can be set to a specific HBA, while traffic for another target uses a different HBA. For more information, see the “Multipathing” section of the iSCSI SAN Configuration Guide. Currently, VMware ESX provides active/passive multipath capability. NIC teaming paths do not appear as multiple paths to storage in ESX host configuration displays, however. NIC teaming is handled entirely by the network layer and must be configured and monitored separately from ESX SCSI storage multipath configuration.
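A toy model of the behavior the excerpt describes may make the distinction clearer (the target names and LUN counts below are made up): the ESX 3.x software initiator opens one connection per iSCSI target, regardless of how many LUNs sit behind each target.

```python
# Toy model of "one connection per target": LUN count has no effect on
# the number of TCP connections; only the number of targets matters.
def connections_for(targets: dict) -> dict:
    """targets: mapping of target IQN -> number of LUNs behind it."""
    return {t: 1 for t in targets}  # always one connection per target

single_target = {"iqn.example:target1": 10}  # ten LUNs, one target
two_targets = {"iqn.example:t1": 1, "iqn.example:t2": 1}

print(sum(connections_for(single_target).values()))  # -> 1
print(sum(connections_for(two_targets).values()))    # -> 2
```

This is why a single-target array with many LUNs funnels everything through one connection, while an array presenting two targets gets two.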
Excerpts from the following blog post indicate that software iSCSI in vSphere changes to support multiple iSCSI sessions, allowing multipathing or link aggregation, which would let separate iSCSI TCP sessions be spread across more than one NIC (depending on how many sessions are established).
The current experience discussed above (all traffic across one NIC per ESX host):
VMware can’t be accused of being unclear about this. Directly in the iSCSI SAN Configuration Guide: ESX Server‐based iSCSI initiators establish only one connection to each target. This means storage systems with a single target containing multiple LUNs have all LUN traffic on that one connection, but in general, in my experience, this is relatively unknown.
This usually means that customers find that for a single iSCSI target (and however many LUNs that may be behind that target – 1 or more), they can’t drive more than 120-160MBps. This shouldn’t make anyone conclude that iSCSI is not a good choice or that 160MBps is a show-stopper. For perspective I was with a VERY big customer recently (more than 4000 VMs on Thursday and Friday two weeks ago) and their comment was that for their case (admittedly light I/O use from each VM) this was working well. Requirements differ for every customer.
The changes in vSphere:
Now, this behavior will be changing in the next major VMware release. Among other improvements, the iSCSI initiator will be able to use multiple iSCSI sessions (hence multiple TCP connections). Looking at our diagram, this corresponds with “multiple purple pipes” for a single target. It won’t support MC/S or “multiple orange pipes per each purple pipe” – but in general this is not a big deal (large scale use of MC/S has shown a marginally higher efficiency than MPIO at very high end 10GbE configurations).
Multiple iSCSI sessions will mean multiple “on-ramps” for MPIO (and multiple “conversations” for Link Aggregation). The next version also brings core multipathing improvements in the vStorage initiative (improving all block storage): NMP round robin, ALUA support, and EMC PowerPath for VMware which integrates into the MPIO framework and further improves multipathing. In the spirit of this post, EMC is working to make PowerPath for VMware as heterogeneous as we can.
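The NMP round-robin policy mentioned above can be sketched as follows (a minimal illustration, assuming two hypothetical iSCSI sessions to the same target; the class and session names are mine, not VMware’s): successive I/Os simply cycle through the available paths, which is what lets aggregate throughput exceed a single link.

```python
# Minimal sketch of round-robin path selection: successive I/Os cycle
# through the active paths/sessions rather than pinning to one.
from itertools import cycle

class RoundRobinSelector:
    def __init__(self, paths):
        self._cycle = cycle(paths)  # endless iterator over the paths

    def next_path(self):
        """Return the path the next I/O should be issued on."""
        return next(self._cycle)

# Two hypothetical iSCSI sessions to the same target:
selector = RoundRobinSelector(["session-A", "session-B"])
issued = [selector.next_path() for _ in range(4)]
print(issued)  # -> ['session-A', 'session-B', 'session-A', 'session-B']
```

With only one session per target (the ESX 3.x situation), the same policy would have nothing to rotate over, which is why multiple sessions are the prerequisite for any of this to help.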
Together – multiple iSCSI sessions per iSCSI target and improved multipathing means aggregate throughput for a single iSCSI target above that 160MBps mark in the next VMware release, as people are playing with now. Obviously we’ll do a follow up post.
Wayne's World of IT (WWoIT), Copyright 2009 Wayne Martin.