Newsgroups: comp.protocols.tcp-ip
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!cunixf.cc.columbia.edu!cs.columbia.edu!ji
From: ji@cs.columbia.edu (John Ioannidis)
Subject: Re: Is the data received using recvfrom() in SOCK_RAW fragmented by IP?
Message-ID: <1991Apr13.070012.3565@cs.columbia.edu>
Followup-To: comp.protocols.tcp-ip
Sender: news@cs.columbia.edu (The Daily News)
Reply-To: ji@liberty.columbia.edu (John Ioannidis)
Organization: Columbia University Department of Computer Science
References: <912473B7FC7F400347@nusdiscs.bitnet>
Date: Sat, 13 Apr 91 07:00:12 GMT

In article <912473B7FC7F400347@nusdiscs.bitnet> TAYBENGH@NUSDISCS.BITNET (THE AGEIS) writes:
>
>Hi netlander,
>        In BSD socket, one can implement a new protocol on top of IP using
>SOCK_RAW with the designated protocol number. Then, one can receive the
>data using recvfrom() call. The recvfrom() returns the data with the IP header.

Correct. Remember to set the proper protocol number in the third
argument of the socket(2) call. There is a kernel patch (see, e.g., the
distribution notes for traceroute) so that if you set IPPROTO_RAW,
you place your own IP header in the packet you are sending rather than
rely on the system for it. This can be useful, e.g., for sending your
own options. In 4.3Reno (at least), you can also accomplish that by
setting the RINPF_HDRINCL flag. 

>So far so good. But if the sender sends large amount of data in one single
>send(), and the IP layer needs to fragment the data to a few packets, then
>can we receive the data in one single recvfrom() [Note: this implies the IP
>layer on the receiver side re-assemble all the packets b4 passing the data
>up to us]? 

Yes; the IP layer takes care of both ends. When sending, the IP layer
will fragment the datagram if it is larger than your MTU.
A limitation you may come up against is the send buffer size (2K by
default). You can change that by setting the appropriate socket-level
option, as in the following code:

	...

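	/* needs <sys/types.h>, <sys/socket.h>, <stdio.h>, <stdlib.h> */
	int bufsz = 16384; /* send messages up to 16K in length */
	int sd;

	sd = socket(AF_INET, SOCK_RAW, 42); /* 42 is just an example protocol number */
	if (sd < 0) {
		perror("rawsocket");
		exit(1);
	}

	if (setsockopt(sd, SOL_SOCKET, SO_SNDBUF, (char *)&bufsz, sizeof bufsz) < 0) {
		perror("setting options");
		exit(1);
	}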
	
	....
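
If you want to see what the kernel actually granted, you can read the
option back with getsockopt(2). The following is only a sketch:
set_sndbuf() is a name I made up for illustration, and the value you get
back is system-dependent (some kernels round or cap the request), so
treat it as a sanity check rather than a guarantee.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not a system call): ask for a send buffer of
 * `want` bytes on socket `sd`, then read the option back.
 * Returns the size the kernel reports, or -1 if either call fails.
 */
static int
set_sndbuf(int sd, int want)
{
	int got = 0;
	socklen_t len = sizeof got;

	if (setsockopt(sd, SOL_SOCKET, SO_SNDBUF, (char *)&want, sizeof want) < 0)
		return -1;
	if (getsockopt(sd, SOL_SOCKET, SO_SNDBUF, (char *)&got, &len) < 0)
		return -1;
	return got;
}
```

It works on any socket descriptor, so set_sndbuf(sd, 16384) on a raw
socket is used the same way; the same pattern with SO_RCVBUF applies to
the receive side.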
	
	
Upon receipt, if the receive buffer size is large enough (again,
settable with the SO_RCVBUF option), the IP layer will reassemble your
packet and pass it to you. Remember that when the raw socket interface
passes you a packet back, the ip_len field contains the length of the
*protocol* part of the packet (i.e., total length - IP-header length),
and that all sizes are in host order (network addresses are still in
network order).
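
Since recvfrom() hands you the IP header in front of your data, you have
to step over it yourself. Here is a rough sketch of one way to do that.
ip_payload() is a hypothetical helper of mine, and it deliberately takes
the payload length from the byte count recvfrom() returned rather than
from ip_len, on the assumption that you would rather not depend on how a
particular kernel has adjusted that field.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given `nread` bytes as returned by recvfrom() on
 * a raw socket, return a pointer to the data following the IPv4 header
 * and store its length in *payload_len. Returns NULL if the buffer does
 * not look like a complete IPv4 header.
 */
static const unsigned char *
ip_payload(const unsigned char *buf, size_t nread, size_t *payload_len)
{
	size_t hlen;

	assert(buf != NULL && payload_len != NULL);
	if (nread < 20)			/* shorter than a minimal IP header */
		return NULL;
	if ((buf[0] >> 4) != 4)		/* version nibble must be 4 */
		return NULL;
	hlen = (size_t)(buf[0] & 0x0f) * 4;	/* header length is in 32-bit words */
	if (hlen < 20 || hlen > nread)
		return NULL;
	*payload_len = nread - hlen;
	return buf + hlen;
}
```

A header carrying options is longer than 20 bytes, which is why the
header length is read from the packet instead of being assumed.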

>         Or do we need to take care of the fragments ourself by inspecting
>the IP header and do re-assemble if necessary?

I don't think you can do that even if you want. 

>        Which one is true? Could somebody shed some light on me please?
>        Thanks a lot.
>
>- Beng Hang (email: taybengh@nusdiscs.bitnet)

I hope that's enough light! 

/ji

In-Real-Life: John "Heldenprogrammer" Ioannidis
E-Mail-To: ji@cs.columbia.edu
V-Mail-To: +1 212 854 8120
P-Mail-To: 450 Computer Science \n Columbia University \n New York, NY 10027

