Kernel tuning – sysctl

Here are the kernel tuning settings I use (sysctl format, e.g. in /etc/sysctl.conf or a drop-in under /etc/sysctl.d/):

# Tune network memory
net.core.wmem_max = 4194304
net.core.rmem_max = 4194304
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_abort_on_overflow = 1
# Disable IPv6 if it is not in use.
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

# Shorten nf_conntrack timeout values
net.netfilter.nf_conntrack_generic_timeout = 180
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 30
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 40
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 40
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 60
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 60
# Raise the nf_conntrack table size (max tracked connections; the bucket
# count itself is tuned via the nf_conntrack hashsize module parameter)
net.nf_conntrack_max = 200000
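To make these settings survive a reboot, they can go into a drop-in file under /etc/sysctl.d/ and be applied immediately with sysctl. A minimal sketch (the file name is an example, and only two of the settings are shown):

```shell
# Persist a couple of the settings above in a sysctl.d fragment.
mkdir -p /etc/sysctl.d
cat > /etc/sysctl.d/90-tuning.conf <<'EOF'
net.core.somaxconn = 65535
net.ipv4.tcp_tw_reuse = 1
EOF
# Apply without a reboot (skipped quietly if sysctl is unavailable):
command -v sysctl >/dev/null && sysctl -p /etc/sysctl.d/90-tuning.conf || true
```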

Completely disable swap on CentOS 7

Our services are no longer on bare metal; new services run on VPSes instead. Recently we hit a problem on CentOS 7 where a reboot stalled waiting for the swap partition, which we had removed completely to free up space.

Since the hypervisor already manages disk I/O and memory for each VPS, the default I/O scheduler and guest-side memory paging effectively double the I/O. Hypervisors allow memory overcommit and will reclaim free memory from idle VPSes before they start paging to disk. Best practice is therefore to set the I/O scheduler to noop and to disable memory paging by turning swap off and removing the swap partition.

CentOS 7 automatically tunes the I/O scheduler to noop when it detects that it is running under a hypervisor, but it won't turn off swap, and it still allocates a generous amount of disk space for a swap partition.

Turning off swap is simple:

$ sudo swapoff -a
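The swap entry in /etc/fstab should also be removed (or commented out), otherwise systemd will still wait for the device at the next boot. A sed sketch, with a backup taken first:

```shell
# Back up fstab, then comment out any line whose filesystem type is swap.
cp /etc/fstab /etc/fstab.bak
sed -i '/\sswap\s/ s/^/#/' /etc/fstab
```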

Then remove the swap logical volume to free up the space:

$ sudo lvremove -Ay /dev/centos/swap

Of course, reassign the freed space to /dev/centos/root:

$ sudo lvextend -l +100%FREE centos/root
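Growing the logical volume alone is not enough; the filesystem on it also has to be grown into the new space. CentOS 7 defaults to XFS for the root filesystem, so xfs_growfs is the usual tool; resize2fs covers an ext4 root. A sketch that checks which one applies:

```shell
# Grow the root filesystem to fill the enlarged logical volume.
fstype=$(findmnt -no FSTYPE / 2>/dev/null || echo unknown)
case "$fstype" in
  xfs)  xfs_growfs / ;;                  # XFS grows online, mounted
  ext4) resize2fs /dev/centos/root ;;    # ext4 equivalent
  *)    echo "unhandled filesystem: $fstype" ;;
esac
```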

But there is one point we were missing: grub2.cfg needs to be regenerated, and /etc/default/grub must be edited first to drop the rd.lvm.lv=centos/swap entry:

$ sudo vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
##GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap crashkernel=auto rhgb quiet"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root crashkernel=auto rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

$ sudo cp /etc/grub2.cfg /etc/grub2.cfg.bak
$ sudo grub2-mkconfig >/etc/grub2.cfg

Voila! No more waiting for a swap partition on the next reboot!

Configuring LVS-TUN for web service

There are many ways to spread load across a server farm, but I think LVS-TUN does the best job. LVS-TUN forwards requests to the real servers, and they respond to clients directly through their own public interfaces. The real servers can also be located in different data centers, or even in different geographical locations. This prevents uplink flooding on the director, unlike LVS-NAT and LVS-DR, which cannot work outside a single physical network segment.

Our sample configuration is a startup-scale model: two failover load balancers running ldirectord and heartbeat, with LVS-TUN IP-IP tunneling configured on two web nodes in different datacenters.

Requirements

You’ll need a virtual IP address (VIP) that can float between the load balancers and is also configured on every web node. The active load balancer receives traffic on the VIP and forwards each request to a web node, which then replies using the VIP as its source address. Heartbeat determines which load balancer controls the VIP at any one time. The web nodes must be configured to stay silent on the VIP and never answer ARP for it.
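The real-server side of that requirement (IPIP tunnel up, VIP bound, ARP suppressed) can be sketched as a small helper script to run on each web node. The VIP, script path, and sysctl choices below are examples, not part of the original setup:

```shell
# Hypothetical per-web-node setup script for LVS-TUN real servers.
mkdir -p /usr/local/sbin
cat > /usr/local/sbin/lvs-tun-realserver.sh <<'EOF'
#!/bin/sh
VIP=1.2.3.4                      # example VIP
modprobe ipip                    # IPIP tunnel module
ip link set tunl0 up
ip addr add "$VIP/32" dev tunl0  # bind the VIP on the tunnel device
# Stay silent on ARP for the VIP, and relax reverse-path filtering
# on the tunnel so decapsulated packets are accepted:
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.tunl0.rp_filter=0
EOF
chmod +x /usr/local/sbin/lvs-tun-realserver.sh
```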
Install necessary packages

# yum -y install \
heartbeat.i386 \
heartbeat-ldirectord.i386 \
heartbeat-pils.i386 \
heartbeat-stonith.i386 \
ipvsadm.i386

Configuring the load balancers

Copy default config files…

# cp /usr/share/doc/heartbeat-2.1.3/ha.cf /etc/ha.d/
# cp /usr/share/doc/heartbeat-2.1.3/authkeys /etc/ha.d/
# cp /usr/share/doc/heartbeat-2.1.3/haresources /etc/ha.d/
# cp /usr/share/doc/heartbeat-ldirectord-2.1.3/ldirectord.cf /etc/ha.d/

Setup ldirectord

# cat >>/etc/ha.d/ldirectord.cf <<EOF
checktimeout=10
checkinterval=2
autoreload=yes
logfile="local0"
quiescent=yes


virtual=1.2.3.4:80
        real=192.168.0.10:80 ipip
        real=172.19.0.10:80 ipip
        service=http
        request="httpcheck.html"
        receive="OK"
        scheduler=rr
        protocol=tcp
        checktype=negotiate
EOF

Create an HTML file called httpcheck.html whose content is “OK”. Every two seconds ldirectord will GET /httpcheck.html and check that it receives “OK”, and it will drop a real server after 10 seconds without a response. Note that the real IPs are on different subnets, i.e. the real servers don’t need to sit together in one place: one could be in the USA and another in Singapore. The load-balancing scheduler here is simple round-robin.
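Creating the health-check file on each real server is a one-liner; the docroot below is an assumption, so adjust it to wherever your web server actually serves files from:

```shell
# Health-check target for ldirectord (docroot path is an example).
mkdir -p /var/www/html
printf 'OK\n' > /var/www/html/httpcheck.html
```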

Setup Heartbeat