Tag Archive: NLB



We have owned a pair of Kemp 2500 network load balancers for some time now. After an update, I started getting alerts from the load balancer telling me that my primary balancer was unresponsive. As you can imagine, no one wants to get this kind of message about a production balancer during peak times. The first time I received it I was very anxious, not knowing what to do. However, there is plenty of information on the internet about how to resolve this issue, and because the Kemp load balancers are built on Linux, those suggestions helped tons. I called support and they helped me increase the values of gc_thresh1, gc_thresh2 and gc_thresh3, the Linux kernel settings that control how large the ARP cache is allowed to grow. That was pretty simple and straightforward, but the problem was far from over.

I won’t make the widely available how-tos more redundant. Instead, I am writing this to describe a scenario where, after these values were increased, the same issue started happening again.

I could not believe it. I thought for sure this was fixed by increasing the values. According to Kemp, they had tested these balancers on a class A network, so how was it that my class B network was throwing everything off and freezing the balancer again with an overflowing ARP table? After running a tcpdump capture of only ARP requests on the balancer for 14 hours, we noticed that each ARP request was being tripled, because 3 of the 4 NICs on the balancer had addresses assigned to them and all went back to a single switch.
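The duplication described above can be spotted from the text output of a capture. The following is a minimal Python sketch, not necessarily how the capture in this post was analyzed. It assumes the capture was saved as plain text in the format tcpdump normally prints for ARP requests, and the function name `count_arp_requests` is made up for illustration. It tallies requests per sender and per (sender, target) pair, so a chatty host or repeated identical requests stand out.

```python
import re
from collections import Counter

# Matches the text tcpdump typically prints for an ARP request, for example:
# "12:00:01.000001 ARP, Request who-has 10.1.0.5 tell 10.1.0.9, length 46"
# The optional "(ff:ff:...)" part covers lines that also show a target MAC.
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+?)(?: \(\S+\))? tell (\S+?),")


def count_arp_requests(lines):
    """Return (requests per sender, requests per (sender, target) pair)."""
    per_sender = Counter()
    per_pair = Counter()
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match is None:
            continue  # replies and non-ARP lines are ignored
        target, sender = match.group(1), match.group(2)
        per_sender[sender] += 1
        per_pair[(sender, target)] += 1
    return per_sender, per_pair
```

Feeding it the lines of a saved capture (for instance the output of `tcpdump -n arp`) and then calling `per_sender.most_common(5)` lists the hosts sending the most requests, while any `per_pair` count far above 1 in a short window suggests the same request is being seen repeatedly.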

Because the network design is flat with no VLANs, every ARP request comes in on every NIC. If the NICs were on separate VLANs the issue would not have happened, but it is very hard to go back and change a network design after it has been in place for several years. So how could this happen? It looked like a broadcast storm or an ARP flood, but we found out that it was actually a utility being run to find every MAC address on the network and its associated IP address. This program, CC Get MAC Address, floods the network with ARP requests. While every server and PC seemed to handle this flood fine, the balancers struggled. I would have thought the balancer would discard the packets if the requests did not pertain to it, but in fact it caches the requests at an alarming rate, causing the table to overflow.
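To see how close a table is to overflowing, the number of cached entries can be compared with the hard limit. The sketch below is a rough Python illustration for a generic Linux host, and the Kemp appliance may not give you shell access to run anything like it. It assumes the standard procfs locations `/proc/net/arp` and `/proc/sys/net/ipv4/neigh/default/gc_thresh3`. The function name `neighbor_table_usage` is hypothetical, and counting rows in `/proc/net/arp` is only an approximation of what the kernel counts against the limit.

```python
from pathlib import Path


def neighbor_table_usage(
    arp_path="/proc/net/arp",
    thresh_path="/proc/sys/net/ipv4/neigh/default/gc_thresh3",
):
    """Return (entries, hard_limit, fraction_used) for the IPv4 ARP table."""
    rows = Path(arp_path).read_text().splitlines()
    # The first row of /proc/net/arp is a column header, so skip it.
    entries = sum(1 for row in rows[1:] if row.strip())
    # gc_thresh3 is the hard maximum number of entries the kernel keeps.
    hard_limit = int(Path(thresh_path).read_text().strip())
    return entries, hard_limit, entries / hard_limit
```

Sampling this every few seconds while a scanning utility is running would show whether the entry count climbs toward the limit.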

So in short, if this is happening to you even after your threshold values have been increased, make sure no utilities are being run that flood the network. It will save you some serious time and headaches.


Many load balancers now let you specify the location of a script that tells the load balancer how busy a particular server is. This is called adaptive load balancing. I highly recommend using the adaptive setting rather than round robin or weighted. The reasoning is that in an array of servers, one server may be performing a more expensive task than the others, so you don’t want the load balancer throwing traffic at it while it is busy. You want the load balancer to decide which server is best able to handle the connection. The ColdFusion script below reads the ColdFusion JVM memory usage and reports a number from 0 to 99 that indicates how busy the server is, 0 being idle and 99 being busy. Numbers above that range, such as 101 and 102, are usually reserved to tell the load balancer that you want a drain stop for that server. So I have an admin page where I select a server to remove from the array, and it writes 102 to a text file that the load script reads and passes on to the load balancer.

Once the load balancer queries that page, it sees the 101 or 102 and begins draining connections from that server and pushing them to another. Note that if you use sticky sessions, it may take a while to drain connections, depending on how long you keep connections persistent.

EX:

<!--- If an admin has flagged this server, report the reserved code stored in the file --->
<cfset downFile = expandPath('\load\adminDown.txt')>
<cfif fileExists(downFile)>
    <cffile action="read" file="#downFile#" variable="fileContents">
    <cfset returnNum = trim(fileContents)>
<cfelse>
    <cfset returnNum = ''>
</cfif>

<cfif returnNum EQ ''>
    <!--- No override: log in to the Admin API and compute JVM memory use as a percentage --->
    <cfinvoke component="cfide.adminapi.administrator" method="login" adminPassword="password"/>
    <cfinvoke component="cfide.adminapi.servermonitoring" method="getHeartbeat" returnVariable="result"/>
    <cfset returnNum = (result["usedMemory"] / (result["usedMemory"] + result["freeMemory"])) * 100>
    <!--- Cap at 99, the top of the 0 to 99 range the load balancer expects --->
    <cfoutput>#min(round(returnNum), 99)#</cfoutput>
<cfelse>
    <!--- Pass the admin override (for example 102) straight through --->
    <cfoutput>#returnNum#</cfoutput>
</cfif>
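On the receiving side, the way a load balancer might interpret the number this page returns can be sketched in Python. This is a simplified illustration under stated assumptions, not Kemp's actual algorithm: the function names `parse_load` and `pick_server` and the server names in the usage note are hypothetical, 101 and 102 are treated as drain codes only because the post names them, and a real balancer would normally adjust weights gradually rather than always choosing the single least busy server.

```python
# Reserved values named in the post; their exact meaning varies by vendor.
DRAIN_CODES = {101, 102}


def parse_load(body):
    """Classify a load page response as ('ok', load), ('drain', None) or ('invalid', None)."""
    try:
        value = int(body.strip())  # the page may emit surrounding whitespace
    except ValueError:
        return ("invalid", None)
    if value in DRAIN_CODES:
        return ("drain", None)
    if 0 <= value <= 99:
        return ("ok", value)
    return ("invalid", None)


def pick_server(reports):
    """Given {server_name: response_body}, return the least busy eligible server, or None."""
    candidates = []
    for name, body in reports.items():
        status, load = parse_load(body)
        if status == "ok":
            candidates.append((load, name))
    if not candidates:
        return None  # every server is draining or returned garbage
    return min(candidates)[1]
```

For example, `pick_server({"web1": "87", "web2": "35", "web3": "102"})` skips web3 because it asked for a drain stop and chooses web2 as the least busy of the rest.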
