Thursday, April 9, 2009

pmtu

ip_queue_xmit2() is an intermediate routine that runs before the packet is passed
from the IP layer to the packet scheduler. It is called both for locally generated
packets and for forwarded packets. It performs routine checks such as the header
room in the buffer. If the header room is smaller than the hardware (link-layer)
header, the buffer for the packet has to be reallocated. This may happen because
the route for the destination has changed. The size of the IP datagram is also
compared against the current PMTU here. If the datagram exceeds the PMTU, the
packet has to be fragmented. If the don't fragment (DF) bit is set on the IP
datagram, an ICMP message is sent back to the sending TCP by calling icmp_send().
If fragmentation is allowed, the packet is split into fragments by calling
ip_fragment().
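
The size check described above can be sketched roughly as follows. This is only an illustration in Python, not the kernel's C code; the names Action and output_action are made up for the example, and the header-room check is left out.

```python
from enum import Enum


class Action(Enum):
    TRANSMIT = "pass to the packet scheduler"
    FRAGMENT = "split with ip_fragment()"
    ICMP_FRAG_NEEDED = "drop and report with icmp_send()"


def output_action(datagram_len: int, pmtu: int, dont_fragment: bool) -> Action:
    """Decide what happens to a datagram, following the description above.

    Hypothetical helper: it only models the PMTU comparison and the DF bit.
    """
    if datagram_len <= pmtu:
        # The datagram fits, nothing special to do.
        return Action.TRANSMIT
    if dont_fragment:
        # Too big and DF is set: the sender has to be told.
        return Action.ICMP_FRAG_NEEDED
    # Too big but fragmentation is allowed.
    return Action.FRAGMENT


if __name__ == "__main__":
    for length, df in ((1492, True), (1500, True), (1500, False)):
        print(length, df, output_action(length, 1492, df).name)
```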


http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml
The IP protocol was designed for use on a wide variety of transmission links. Although the maximum length of an IP datagram is 64 KB (65,535 bytes), most transmission links enforce a smaller maximum packet length, called the MTU. The value of the MTU depends on the type of the transmission link. The design of IP accommodates MTU differences by allowing routers to fragment IP datagrams as necessary. The receiving station is responsible for reassembling the fragments back into the original full-size IP datagram.
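
As a rough illustration of how a datagram is cut up for a smaller MTU, here is a small Python sketch. It assumes a 20-byte IP header without options and relies on the fact that the IP header stores the fragment offset in 8-byte units, so every fragment except the last carries a multiple of 8 payload bytes. The function name is made up for the example.

```python
def fragment_layout(total_len, mtu, ip_header_len=20):
    """Return (offset_in_bytes, payload_len, more_fragments) per fragment.

    total_len is the size of the original datagram including its IP header.
    """
    payload = total_len - ip_header_len
    # Largest payload per fragment, rounded down to a multiple of 8 bytes.
    max_payload = (mtu - ip_header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more_fragments = offset + size < payload
        fragments.append((offset, size, more_fragments))
        offset += size
    return fragments


if __name__ == "__main__":
    # A full 1500-byte datagram forwarded onto a link with MTU 1492.
    for frag in fragment_layout(1500, 1492):
        print(frag)
```

With these numbers the 1480 payload bytes end up as one fragment of 1472 bytes and a second fragment of only 8 bytes, which is the kind of waste that PMTUD tries to avoid.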


What Is PMTUD?

The TCP MSS takes care of fragmentation at the two endpoints of a TCP connection, but it doesn't handle the case where there is a link with a smaller MTU somewhere in the middle between these two endpoints. PMTUD was developed to avoid fragmentation in the path between the endpoints. It is used to dynamically determine the lowest MTU along the path from a packet's source to its destination.

Note: According to the Cisco paper, PMTUD is supported only by TCP; UDP and other protocols do not support it. If PMTUD is enabled on a host, and it almost always is, all TCP/IP packets from the host will have the DF bit set.

When a host sends a full MSS data packet with the DF bit set, PMTUD works by reducing the send MSS value for the connection if it receives information that the packet would require fragmentation. A host usually "remembers" the MTU value for a destination by creating a "host" (/32) entry in its routing table with this MTU value.

If a router tries to forward an IP datagram, with the DF bit set, onto a link that has a lower MTU than the size of the packet, the router will drop the packet and return an Internet Control Message Protocol (ICMP) "Destination Unreachable" message to the source of this IP datagram, with the code indicating "fragmentation needed and DF set" (type 3, code 4). When the source station receives the ICMP message, it will lower the send MSS, and when TCP retransmits the segment, it will use the smaller segment size.
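
The exchange described above can be simulated in a few lines of Python. This is a simplified sketch: it assumes every router reports the MTU of the link it could not use in the ICMP message (as in the "mtu = 1492" ping output further down), that no ICMP messages get lost, and that the IP and TCP headers are 20 bytes each. The function names and the list of link MTUs are made up for the example.

```python
IP_HEADER = 20
TCP_HEADER = 20


def first_bottleneck(packet_len, link_mtus):
    """Return the MTU of the first link that cannot carry the packet, or None."""
    for mtu in link_mtus:
        if packet_len > mtu:
            return mtu
    return None


def discover_path_mtu(local_mtu, link_mtus):
    """Shrink the packet size after every 'fragmentation needed' error.

    Returns the final path MTU and the list of sizes that were tried.
    """
    pmtu = local_mtu
    tried = [pmtu]
    while True:
        reported = first_bottleneck(pmtu, link_mtus)
        if reported is None:
            # The packet made it across every link.
            return pmtu, tried
        # The router dropped the packet; retry with the MTU it reported.
        pmtu = reported
        tried.append(pmtu)


def send_mss(pmtu):
    """Largest TCP segment that fits into one packet of size pmtu."""
    return pmtu - IP_HEADER - TCP_HEADER


if __name__ == "__main__":
    pmtu, tried = discover_path_mtu(1500, [1500, 1492, 1400])
    print("sizes tried:", tried)
    print("path MTU:", pmtu, "send MSS:", send_mss(pmtu))
```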




http://security.maruhn.com/iptables-tutorial/x10386.html
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --set-mss 1460
The command above sets the MSS in all SYN packets to 1460.
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --clamp-mss-to-pmtu

https://blue-labs.org/howto/mtu-mss.php
Iptables - Allow only ICMP type 3, code 4 to pass through the FORWARD chain and drop all other ICMP. Remember, the FORWARD chain has no effect on packets destined for this firewall, only on packets traveling through it. For packets destined for this firewall, add the same rules to the INPUT chain.
iptables -A FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT
iptables -A FORWARD -p icmp -j DROP
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
iptables -A INPUT -p icmp -j DROP


Scenario:
The default gateway of our Linux router (192.168.101.3) is an ADSL PPPoE modem (192.168.101.1).
The MTU of our Ethernet is 1500 bytes, so for a TCP packet: IP header 20 + TCP header 20 + data = 1500.

An echo request has a 20-byte IP header and an 8-byte ICMP header, so the data can be at most 1500 - 20 - 8 = 1472 bytes.
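
The same arithmetic as a small Python sketch, for both the plain Ethernet MTU and the MTU left over on a PPPoE link. It assumes IP and TCP headers without options and an 8-byte PPPoE overhead; the function names are made up for the example.

```python
IP_HEADER = 20       # IPv4 header without options
TCP_HEADER = 20      # TCP header without options
ICMP_HEADER = 8      # ICMP echo header
PPPOE_OVERHEAD = 8   # bytes taken from each Ethernet frame by PPPoE


def tcp_mss(mtu):
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def ping_payload(mtu):
    """Largest value for 'ping -s' that still fits into one packet."""
    return mtu - IP_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1500 - PPPOE_OVERHEAD):
        print("mtu", mtu, "tcp mss", tcp_mss(mtu), "ping -s", ping_payload(mtu))
```

For an MTU of 1500 this gives an MSS of 1460 and a ping payload of 1472; for 1492 it gives 1452 and 1464.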

ip route flush cache

posta:~/ozan# ping -s 1472 74.125.43.98 -M do -c 1
PING 74.125.43.98 (74.125.43.98) 1472(1500) bytes of data.
From 192.168.101.1 icmp_seq=1 Frag needed and DF set (mtu = 1492)

--- 74.125.43.98 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

posta:~/ozan# ip route show cache
74.125.43.98 via 192.168.101.1 dev eth0 src 192.168.101.3
cache expires 593sec mtu 1492 advmss 1460 hoplimit 64

posta:~/ozan# ping -s 1472 74.125.43.98 -M do -c 1
PING 74.125.43.98 (74.125.43.98) 1472(1500) bytes of data.
From 192.168.101.3 icmp_seq=1 Frag needed and DF set (mtu = 1492)

--- 74.125.43.98 ping statistics ---
0 packets transmitted, 0 received, +1 errors

posta:~/ozan# ip route show cache
74.125.43.98 via 192.168.101.1 dev eth0 src 192.168.101.3
cache expires 574sec mtu 1492 advmss 1460 hoplimit 64

Looking at the output above, the first ICMP message comes from the ADSL modem, which is saying: my MTU is 1492, send accordingly, i.e. make the packet 8 bytes smaller.
These 8 bytes are for the PPPoE header.

The second ICMP message comes from the local host. The reason is that the PMTU toward 74.125.43.98 has been learned by the system and is kept in the kernel route cache. For a TCP connection we can interpret this as follows: until the cache entry times out, the system will not send a packet larger than the recorded MTU to 74.125.43.98; it will send only as much as it learned through PMTU discovery. Our modem is telling us: my MTU is 1492, shrink the packet by 8 bytes and try again.

So let's shrink by 8 bytes.

posta:~/ozan# ping -s 1464 74.125.43.99 -M do -c 1
PING 74.125.43.99 (74.125.43.99) 1464(1492) bytes of data.
64 bytes from 74.125.43.99: icmp_seq=1 ttl=244 (truncated)

16:44:58.626428 IP 192.168.101.3 > 74.125.43.99: ICMP echo request, id 33069, seq 1, length 1472
16:44:58.717041 IP 74.125.43.99 > 192.168.101.3: ICMP echo reply, id 33069, seq 1, length 64


Let's adapt to this situation and set the MTU on the Linux router:
ifconfig eth0 mtu 1492




**************************


To find out whether the gateway of your Windows machine is a black hole:

C:\Documents and Settings\azureus>ping -f -n 1 -l 1472 www.av.com

Pinging rc.fy.b.yahoo.com [206.190.60.37] with 1472 bytes of data:

Reply from 192.168.1.254: Packet needs to be fragmented but DF set.

Ping statistics for 206.190.60.37:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

Our router sent an ICMP message for the 1500-byte packet, so our router is not a black hole.
The MTU of our PPPoE modem is 1492.
As an optimization, we can set the MTU of our Windows XP machine to 1492.
