Newsgroups: comp.unix.programmer,comp.unix.answers,comp.answers,news.answers
Path: news1.ucsd.edu!ihnp4.ucsd.edu!munnari.OZ.AU!news.mel.connect.com.au!news.mira.net.au!Germany.EU.net!howland.reston.ans.net!vixen.cso.uiuc.edu!newsfeed.internetmci.com!in1.uu.net!ott.istar!istar.net!infoshare!whome!telly!innuendo.tlug.org!brutus!vic
From: vic@brutus.tlug.org (Vic Metcalfe)
Subject: [comp.unix.programmer] Unix-socket-faq for network programming
Approved: news-answers-request@MIT.EDU
Followup-To: comp.unix.programmer
X-Newsreader: TIN [version 1.2 PL2]
Organization: Zymurgy Systems, Aurora, Ontario, Canada
Message-ID: <1996Jun21.215610.11542@brutus.tlug.org>
Date: Fri, 21 Jun 1996 21:56:10 GMT
Summary: This posting offers answers to frequent questions about
         network programming in the unix environment using sockets.
Lines: 1364
Xref: news1.ucsd.edu comp.unix.programmer:34835 comp.answers:15486 news.answers:61636

Archive-name: unix-faq/socket
Posting-Frequency: monthly
Last-modified: 1996/06/21
URL: http://www.auroraonline.com/sock-faq/

-----------------------------
Programming UNIX Sockets in C
 Frequently Asked Questions
-----------------------------

Part I. General Information and Concepts
  1: About this FAQ
  2: Who is this FAQ for?
  3: What are Sockets?
  4: How do Sockets Work?
  5: Where can I get source code for the book ""?
  6: Where can I get more information?

Part II. Questions regarding both Clients and Servers
  1: How can I tell when a socket is closed on the other end?
  2: What's with the second parameter in bind()?
  3: How do I get the port number for a given service?
  4: If bind() fails, what should I do with the socket descriptor?
  5: How do I properly close a socket?
  6: When should I use shutdown()?
  7: Please explain the TIME_WAIT state.
  8: Why does it take so long to detect that the peer died?
  9: What are the pros/cons of select(), non-blocking I/O and SIGIO?
  10: Why do I get EPROTO from read()?
  11: How can I force a socket to send the data in its buffer?
  12: Where can I get a library for programming sockets?
  13: How come select says there is data, but read returns zero?
  14: What's the difference between select() and poll()?
  15: How do I send [this] over a socket?
  16: How do I use TCP_NODELAY?
  17: What exactly does the Nagle algorithm do?
  18: What is the difference between read() and recv()?
  19: I see that send()/write() can generate SIGPIPE. Is there any
      advantage to handling the signal, rather than just ignoring it
      and checking for the EPIPE error? Are there any useful
      parameters passed to the signal catching function?
  20: I'm writing a sockets program which must be chroot() to a
      particular directory. But after the chroot(), calls to
      socket() are failing with "bad file number". (Solaris 2.4)

Part III. Writing Client Applications
  1: How do I convert a string into an internet address?
  2: How can my client work through a firewall/proxy server?
  3: Why does connect() succeed even before my server did an
     accept()?
  4: Why do I sometimes lose a server's address when using more
     than one server?
  5: How can I set the timeout for the connect() system call?
  6: Should I bind() a port number in my client program, or let the
     system choose one for me on the connect() call?
  7: Why do I get "connection refused" when the server isn't
     running?

Part IV. Writing Server Applications
  1: How come I get "address already in use" from bind?
  2: Why don't my sockets close?
  3: How can I make my server a daemon?
  4: How can I listen on more than one port at a time?
  5: What exactly does SO_REUSEADDR do?
  6: What exactly does SO_LINGER do?
  7: What exactly does SO_KEEPALIVE do?
  8: How can I bind() to a port number < 1024?
  9: How do I get my server to find out the client's address /
     hostname?
  10: How do I use the gethostbyaddr() function?
  11: How should I choose a port number for my server?
  12: What is the difference between SO_REUSEADDR and SO_REUSEPORT?

Appendix A. Sample Source Code

-----------------------------------------
Part I. General Information and Concepts
-----------------------------------------

I.1: About this FAQ
^^^^^^^^^^^^^^^^^^^^

This FAQ is maintained by Vic Metcalfe (vic@brutus.tlug.org), with
lots of assistance from Andrew Gierth (andrewg@microlise.co.uk).
While I am no expert, I do have some knowledge of sockets. I am
depending on the true wizards to fill in the details, and correct my
(no doubt) plentiful mistakes. The code examples in this FAQ are
written to be easy to follow and understand. It is up to the reader
to make them as efficient as required.

After reading comp.unix.programmer for a short time, it became
evident that a FAQ was needed.

The FAQ is available at the following locations:

  Usenet: (Posted on the 21st of each month)
    news.answers, comp.answers, comp.unix.answers,
    comp.unix.programmer

  FTP:
    ftp://rtfm.mit.edu/pub/usenet/news.answers/unix-faq/socket

  WWW:
    http://www.auroraonline.com/sock-faq
    http://kipper.york.ac.uk/~vic/sock-faq

The FAQ itself is mirrored in Japan by Takayuki Fujino on his web
page: http://www.join.ad.jp/tech/faq-ee.html.

Please email me if you would like to correct or clarify an answer. I
would also like to hear from you if you would like me to add a
question to the list. I may not be able to answer it, but I can add
it in the hopes that someone else will submit an answer.

I.2: Who is this FAQ for?
^^^^^^^^^^^^^^^^^^^^^^^^^^

This FAQ is for C programmers in the Unix environment. It is not
intended for WinSock programmers, or for Perl, Java, etc. I have
nothing against Windows or Perl, but I had to limit the scope of the
FAQ for the first draft. In the future, I would really like to
provide examples for Perl, Java, and maybe others. For now though I
will concentrate on correctness and completeness for C.

This version of the FAQ will only cover sockets of the AF_INET
family, since this is their most common use. Coverage of other types
of sockets may be added later.

I.3: What are Sockets?
^^^^^^^^^^^^^^^^^^^^^^^

Sockets are just like "worm holes" in science fiction. When things
go into one end, they (should) come out of the other. Different
kinds of sockets have different properties. Sockets are either
connection-oriented or connectionless. Connection-oriented sockets
allow for data to flow back and forth as needed, while connectionless
sockets (also known as datagram sockets) allow only one message at a
time to be transmitted, without an open connection. There are also
different socket families. The two most common are AF_INET for
internet connections, and AF_UNIX for unix IPC (interprocess
communication). As stated earlier, this FAQ deals only with AF_INET
sockets.

I.4: How do Sockets Work?
^^^^^^^^^^^^^^^^^^^^^^^^^^

The implementation is left up to the vendor of your particular unix,
but from the point of view of the programmer, connection-oriented
sockets work a lot like files, or pipes. The most noticeable
difference, once you have your file descriptor is that read() or
write() calls may actually read or write fewer bytes than requested.
If this happens, then you will have to make a second call for the
rest of the data. There are examples of this in the source code that
accompanies the faq.

I.5: Where can I get source code for the book ""?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is a list of the places I know to get source code for network
programming books. It is very short, so please mail me with any
others you know of.

  Title: Unix Network Programming
  Author: W. Richard Stevens
  Publisher: Prentice Hall, Inc.
  ISBN: 0-13-949876-1
  URL: http://www.noao.edu/~rstevens

  Title: Power Programming with RPC
  Author: John Bloomer
  Publisher: O'Reilly & Associates, Inc.
  ISBN: 0-937175-77-3
  URL: ftp://ftp.uu.net/published/oreilly/nutshell/rpc/rpc.tar.Z

  Recommended by: Lokmanm Merican
  Title: UNIX PROGRAM DEVELOPMENT for IBM PC'S Including OSF/Motif
  Author: Thomas Yager
  Publisher: Addison Wesley, 1991
  ISBN: 0-201-57727-5

I.6: Where can I get more information?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I keep a copy of the resources I know of on my socks page on the web.
I don't remember where I got most of these items, but some day I'll
check out their sources, and provide ftp information here. For now,
you can get them at http://www.auroraonline.com/~vic/sock-faq.

Included is the TCP/IP faq (which is really geared more to sys-admins
than it is programmers), relevant rfc's and standards, as well as Jim
Frost's socket tutorial. All of the source from this FAQ is
available there too.

I fantasize about adding my own socket tutorial to the page, with all
kinds of nifty interactive Java components, but I'll probably never
get around to doing it. (On the other hand, you never do know. I
did manage to put this FAQ together.)

------------------------------------------------------
Part II. Questions regarding both Clients and Servers
------------------------------------------------------

II.1: How can I tell when a socket is closed on the other end?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From Andrew Gierth (andrewg@microlise.co.uk):

> AFAIK:
>
> If the peer calls close() or exits, without having messed with SO_LINGER,
> then our calls to read() should return 0. It is less clear what happens
> to write() calls in this case; I would expect EPIPE, not on the next
> call, but the one after.
>
> If the peer reboots, or sets l_onoff = 1, l_linger = 0 and then closes,
> then we should get ECONNRESET (eventually) from read(), or EPIPE from
> write().
>
> I should also point out that when write() returns EPIPE, it also
> raises the SIGPIPE signal - you never see the EPIPE error unless you
> handle or ignore the signal.
>
> If the peer remains unreachable, we should get some other
> error.
>
> I don't think that write() can legitimately return 0. read() should
> return 0 on receipt of a FIN from the peer, and on all following calls.
>
> So yes, you _must_ expect read() to return 0.
>
> As an example, suppose you are receiving a file down a TCP link; you
> might handle the return from read() like this:
>
>     rc = read(sock,buf,sizeof(buf));
>     if (rc > 0)
>     {
>         write(file,buf,rc);
>         /* error checking on file omitted */
>     }
>     else if (rc == 0)
>     {
>         close(file);
>         close(sock);
>         /* file received successfully */
>     }
>     else /* rc < 0 */
>     {
>         /* close file and delete it, since data is not complete
>            report error, or whatever */
>     }

II.2: What's with the second parameter in bind()?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The man page shows it as "struct sockaddr *my_addr". The sockaddr
struct though is just a place holder for the structure it really
wants. You have to pass different structures depending on what kind
of socket you have. For an AF_INET socket, you need the sockaddr_in
structure. It has three fields of interest:

  sin_family: Set this to AF_INET.

  sin_port: The network byte-ordered 16 bit port number

  sin_addr: The host's ip number. This is a struct in_addr, which
            contains only one field, s_addr which is a u_long.

II.3: How do I get the port number for a given service?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the getservbyname() routine. This will return a pointer to a
servent structure. You are interested in the s_port field, which
contains the port number, with correct byte ordering (so you don't
need to call htons on it). Here is a sample routine:

/* Take a service name, and a service type, and return a port number.
   If the service name is not found, it tries it as a decimal number.
   The number returned is byte ordered for the network.
*/
#include <stdlib.h>      /* strtol() */
#include <netdb.h>       /* getservbyname(), struct servent */
#include <netinet/in.h>  /* htons() */

int atoport(char *service, char *proto)
{
  int port;
  long int lport;
  struct servent *serv;
  char *errpos;

  /* First try to read it from /etc/services */
  serv = getservbyname(service, proto);
  if (serv != NULL)
    port = serv->s_port;
  else { /* Not in services, maybe a number? */
    lport = strtol(service,&errpos,0);
    /* A port is a 16 bit number, so anything from 1 to 65535 is valid */
    if ( (errpos[0] != 0) || (lport < 1) || (lport > 65535) )
      return -1; /* Invalid port address */
    port = htons(lport);
  }
  return port;
}

II.4: If bind() fails, what should I do with the socket descriptor?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are exiting, I have been assured by Andrew that all unixes
will close open file descriptors on exit. If you are not exiting
though, you can just close it with a regular close() call.

II.5: How do I properly close a socket?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This question is usually asked by people who try close(), because
they have seen that that is what they are supposed to do, and then
run netstat and see that their socket is still active. Yes, close()
is the correct method. To read about the TIME_WAIT state, and why it
is important, refer to Part II, question 7.

II.6: When should I use shutdown()?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From Michael Hunter:

Shutdown is useful for delineating when you are done providing a
request to a server using TCP. A typical use is to send a request to
a server followed by a shutdown(sock, 1). The server will read your
request followed by an EOF (read of 0 on most unix implementations).
This tells the server that it has your full request. You then block
in read() on the socket. The server will process your request and
send the necessary data back to you followed by a close. When you
have finished reading all of the response to your request you will
read an EOF thus signifying that you have the whole response. It
should be noted that T/TCP (TCP for Transactions -- see R. Stevens'
home page) provides for a better method of tcp transaction
management.
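
The request/response pattern described in II.6 can be sketched roughly
as follows. This is only an illustration under stated assumptions:
transact() and demo_half_close() are names made up for this example,
and a socketpair() with a forked child stands in for a real TCP server
so that the sketch is self-contained. With an AF_INET socket the
shutdown() call is the same. Interrupted calls (EINTR) are not
handled.

```c
#include <ctype.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

/* Hypothetical helper: send a request, half-close the socket so the
   peer sees EOF, then read the whole reply until the peer closes.
   Returns the reply length, or -1 on error. */
static ssize_t transact(int sock, const char *req, char *reply, size_t max)
{
    size_t len = strlen(req), sent = 0, got = 0;
    ssize_t n = 0;

    if (max == 0)
        return -1;
    while (sent < len) {               /* write() may be partial */
        n = write(sock, req + sent, len - sent);
        if (n < 0)
            return -1;
        sent += (size_t)n;
    }
    if (shutdown(sock, SHUT_WR) < 0)   /* SHUT_WR is 1: no more sends */
        return -1;
    while (got < max - 1) {            /* we can still receive */
        n = read(sock, reply + got, max - 1 - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                     /* EOF: reply is complete */
        got += (size_t)n;
    }
    reply[got] = '\0';
    return (ssize_t)got;
}

/* Demo: the child plays a server that upper-cases the request.
   Returns 0 on success and leaves the reply in 'reply'. */
static int demo_half_close(char *reply, size_t max)
{
    int sv[2];
    pid_t pid;
    ssize_t r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                    /* "server" side */
        char buf[256];
        size_t got = 0, i;
        ssize_t n;

        close(sv[0]);
        /* read() returning 0 means the full request has arrived */
        while (got < sizeof(buf) &&
               (n = read(sv[1], buf + got, sizeof(buf) - got)) > 0)
            got += (size_t)n;
        for (i = 0; i < got; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        if (write(sv[1], buf, got) < 0)
            _exit(1);
        close(sv[1]);                  /* the client now reads EOF */
        _exit(0);
    }
    close(sv[1]);
    r = transact(sv[0], "hello\n", reply, max);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return r < 0 ? -1 : 0;
}
```

Calling demo_half_close(buf, sizeof(buf)) sends "hello\n", and the
child's upper-cased reply is left in buf. The point of shutdown() here
is that the client can still read after it has told the server it has
nothing more to send. A plain close() would end both directions.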
II.7: Please explain the TIME_WAIT state.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Remember that TCP guarantees all data transmitted will be delivered,
if at all possible. When you close a socket, the end that closes
first goes into a TIME_WAIT state, just to be really really sure that
all the data has gone through. When a socket is closed, both sides
agree by sending messages to each other that they will send no more
data. This, it seemed to me, was good enough, and after the
handshaking is done, the socket should be closed. The problem is
two-fold. First, there is no way to be sure that the last ack was
communicated successfully. Second, there may be "wandering
duplicates" left on the net that must be dealt with if they are
delivered.

Andrew Gierth (andrewg@microlise.co.uk) helped to explain the closing
sequence in the following usenet posting:

> Assume that a connection is in ESTABLISHED state, and the client is about
> to do an orderly release. The client's sequence no. is Sc, and the server's
> is Ss. The pipe is empty in both directions.
>
>    Client                                                   Server
>    ======                                                   ======
>    ESTABLISHED                                              ESTABLISHED
>    (client closes)
>    ESTABLISHED                                              ESTABLISHED
>              <CTL=FIN+ACK><SEQ=Sc><ACK=Ss> ------->>
>    FIN_WAIT_1
>              <<-------- <CTL=ACK><SEQ=Ss><ACK=Sc+1>
>    FIN_WAIT_2                                               CLOSE_WAIT
>              <<-------- <CTL=FIN+ACK><SEQ=Ss><ACK=Sc+1>     (server closes)
>                                                             LAST_ACK
>              <CTL=ACK>,<SEQ=Sc+1><ACK=Ss+1> ------->>
>    TIME_WAIT                                                CLOSED
>    (2*msl elapses...)
>    CLOSED
>
> Note: the +1 on the sequence numbers is because the FIN counts as one byte
> of data. (The above diagram is equivalent to fig. 13 from RFC 793).
>
> Now consider what happens if the last of those packets is dropped in the
> network. The client has done with the connection; it has no more data or
> control info to send, and never will have. But the server does not know
> whether the client received all the data correctly; that's what the last
> ACK segment is for.
> Now the server may or may not *care* whether the
> client got the data, but that is not an issue for TCP; TCP is a reliable
> protocol, and *must* distinguish between an orderly connection _close_
> where all data is transferred, and a connection _abort_ where data may
> or may not have been lost.
>
> So, if that last packet is dropped, the server will retransmit it (it is,
> after all, an unacknowledged segment) and will expect to see a suitable
> ACK segment in reply. If the client went straight to CLOSED, the only
> possible response to that retransmit would be a RST, which would indicate
> to the server that data had been lost, when in fact it had not been.
>
> (Bear in mind that the server's FIN segment may, additionally, contain
> data.)
>
> DISCLAIMER: This is my interpretation of the RFCs (I have read all the
> TCP-related ones I could find), but I have not attempted to examine
> implementation source code or trace actual connections in order to
> verify it. I am satisfied that the logic is correct, though.

The second issue was addressed by Richard Stevens (rstevens@noao.edu,
author of Unix Network Programming). I have put together quotes from
some of his postings and email which explain this. I have brought
together paragraphs from different postings, and have made as few
changes as possible.

> If the duration of the T_W state were just to handle TCP's full-duplex
> close, then the time would be much smaller, and it would be some function
> of the current RTO (retransmission timeout), not the MSL (the packet
> lifetime).

> A couple of points about the T_W state.
>
> - The end that sends the first FIN goes into the T_W state, because that
>   is the end that sends the final ACK. If the other end's FIN is lost, or
>   if the final ACK is lost, having the end that sends the first FIN
>   maintain state about the connection guarantees that it has enough
>   information to retransmit the final ACK.
>
> - Realize that TCP sequence numbers wrap around after 2**32 bytes have been
>   transferred. Assume a connection between A.1500 (host A, port 1500) and
>   B.2000. During the connection one segment is lost and
>   retransmitted. But the segment is not really lost, it is held by
>   some intermediate router and then re-injected into the network. (This
>   is called a "wandering duplicate".) But in the time between the
>   packet being lost & retransmitted, and then reappearing, the
>   connection is closed (without any problems) and then another
>   connection is established between the same host, same port (that is,
>   A.1500 and B.2000; this is called another "incarnation" of the
>   connection). But the sequence numbers chosen for the new incarnation
>   just happen to overlap with the sequence number of the wandering
>   duplicate that is about to reappear. (This is indeed possible, given
>   the way sequence numbers are chosen for TCP connections.) Bingo, you
>   are about to deliver the data from the wandering duplicate (the
>   previous incarnation of the connection) to the new incarnation of the
>   connection. To avoid this, you do not allow the same incarnation of
>   the connection to be reestablished until the T_W state terminates.
>
>   Even the T_W state doesn't completely solve the second problem, given
>   what is called T_W assassination. RFC 1337 has more details.
>
> - The reason that the duration of the T_W state is 2*MSL is that the
>   maximum amount of time a packet can wander around a network is
>   assumed to be MSL seconds. The factor of 2 is for the round-trip.
>   The recommended value for MSL is 120 seconds, but Berkeley-derived
>   implementations normally use 30 seconds instead. This means a T_W
>   delay between 1 and 4 minutes. Solaris 2.x does indeed use the
>   recommended MSL of 120 seconds.

> A wandering duplicate is a packet that appeared to be lost and was
> retransmitted. But it wasn't really lost ...
> some router had problems,
> held on to the packet for a while (order of seconds, could be a minute
> if the TTL is large enough) and then re-injects the packet back into
> the network. But by the time it reappears, the application that sent
> it originally has already retransmitted the data contained in that packet.

> Because of these potential problems with T_W assassinations, one should
> *not* avoid the T_W state by setting the SO_LINGER option to send an
> RST instead of the normal TCP connection termination (FIN/ACK/FIN/ACK).
> The T_W state is there for a reason; it's your friend and it's there to
> help you :-)

> I have a long discussion of just this topic in my just-released "TCP/IP
> Illustrated, Volume 3". The T_W state is indeed, one of the most
> misunderstood features of TCP.

> I'm currently rewriting UNP and will include lots more on this topic, as
> it is often confusing and misunderstood.

An additional note from Andrew:

> - closing a socket: if SO_LINGER has not been called on a socket, then
>   close() is not supposed to discard data. This is true on SVR4.2 (and,
>   apparently, on all non-SVR4 systems) but apparently *not* on SVR4; the
>   use of either shutdown() or SO_LINGER seems to be required to
>   guarantee delivery of all data.

II.8: Why does it take so long to detect that the peer died?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

Because by default, no packets are sent on the TCP connection unless
there is data to send or acknowledge.

So, if you are simply waiting for data from the peer, there is no way
to tell if the peer has silently gone away, or just isn't ready to
send any more data yet. This can be a problem (especially if the
peer is a PC, and the user just hits the Big Switch...).

One solution is to use the SO_KEEPALIVE option. This option enables
periodic probing of the connection to ensure that the peer is still
present. BE WARNED: the default timeout for this option is AT LEAST
2 HOURS.
This timeout can often be altered (in a system-dependent fashion) but
not normally on a per-connection basis (AFAIK).

RFC1122 specifies that this timeout (if it exists) must be
configurable. On the majority of Unix variants, this configuration
may only be done globally, affecting all TCP connections which have
keepalive enabled. The method of changing the value, moreover, is
often difficult and/or poorly documented, and in any case is
different for just about every version in existence.

If you must change the value, look for something resembling
tcp_keepidle in your kernel configuration or network options
configuration.

If you're *sending* to the peer, though, you have some better
guarantees; since sending data implies receiving ACKs from the peer,
then you will know after the retransmit timeout whether the peer is
still alive. But the retransmit timeout is designed to allow for
various contingencies, with the intention that TCP connections are
not dropped simply as a result of minor network upsets. So you
should still expect a delay of several minutes before getting
notification of the failure.

The approach taken by most application protocols currently in use on
the Internet (e.g. FTP, SMTP etc.) is to implement read timeouts on
the server end; the server simply gives up on the client if no
requests are received in a given time period (often of the order of
15 minutes). Protocols where the connection is maintained even if
idle for long periods have two choices:

  1) use SO_KEEPALIVE

  2) use a higher-level keepalive mechanism (such as sending a null
     request to the server every so often).

II.9: What are the pros/cons of select(), non-blocking I/O and SIGIO?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

II.10: Why do I get EPROTO from read()?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From Steve Rago (sar@plc.com):

> EPROTO means that the protocol encountered an unrecoverable error
> for that endpoint.
> EPROTO is one of those catch-all error codes
> used by STREAMS-based drivers when a better code isn't available.

II.11: How can I force a socket to send the data in its buffer?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From Richard Stevens (rstevens@noao.edu):

> You can't force it. Period. TCP makes up its own mind as to when
> it can send data. Now, *normally* when you call write() on a TCP
> socket, TCP will indeed send a segment, but there's no guarantee
> and no way to force this. There are *lots* of reasons why TCP
> will not send a segment: a closed window and the Nagle algorithm
> are two things to come immediately to mind.

> Setting this [TCP_NODELAY] only disables one of the many tests, the
> Nagle algorithm. But if the original poster's problem is this, then
> setting this socket option will help.
>
> A quick glance at tcp_output() shows around 11 tests TCP has to make
> as to whether to send a segment or not.

Now from Dr. Charles E. Campbell Jr.:

> As you've surmised, I've never had any problem with disabling Nagle's
> algorithm. It's basically a buffering method; there's a fixed overhead
> for all packets, no matter how small. Hence, Nagle's algorithm
> collects small packets together (no more than .2sec delay) and thereby
> reduces the amount of overhead bytes being transferred. This approach
> works well for rcp, for example: the .2 second delay isn't humanly
> noticeable, and multiple users have their small packets more
> efficiently transferred. Helps in university settings where most folks
> using the network are using standard tools such as rcp and ftp, and
> programs such as telnet may use it, too.
>
> However, Nagle's algorithm is pure havoc for real-time control and not much
> better for keystroke interactive applications (control-C, anyone?). It has
> seemed to me that the types of new programs using sockets that people write
> usually do have problems with small packet delays.
> One way to bypass
> Nagle's algorithm selectively is to use "out-of-band" messaging, but
> that is limited in its content and has other effects (such as a loss of
> sequentiality) (by the way, out-of-band is often used for that ctrl-C,
> too).

So to sum it all up, if you are having trouble and need to flush the
socket, setting the TCP_NODELAY option will usually solve the
problem. If it doesn't, you will have to use out-of-band messaging,
but according to Andrew, "out-of-band data has its own problems, and
I don't think it works well as a solution to buffering delays
(haven't tried it though). It is *not* 'expedited data' in the sense
that exists in some other protocols; it is transmitted in-stream, but
with a pointer to indicate where it is."

I asked Andrew Gierth something to the effect of "What promises does
TCP make about when it will get around to writing data to the
network?" I thought his reply should be put under this question.
Normal lines are Andrew's, lines beginning with >: are also Andrew's,
and lines beginning with just a > are mine. Here it is:

Not many promises, but some.

I'll try and quote chapter and verse on this:

[References:
   RFC 1122, "Requirements for Internet Hosts" (also STD 3)
   RFC 793, "Transmission Control Protocol" (also STD 7)
]

  1. The socket interface does not provide access to the TCP PUSH
     flag.

  2. RFC1122 says (4.2.2.2):

       A TCP MAY implement PUSH flags on SEND calls. If PUSH flags
       are not implemented, then the sending TCP: (1) must not buffer
       data indefinitely, and (2) MUST set the PSH bit in the last
       buffered segment (i.e., when there is no more queued data to
       be sent).

  3. RFC793 says (2.8):

       When a receiving TCP sees the PUSH flag, it must not wait for
       more data from the sending TCP before passing the data to the
       receiving process.

     [RFC1122 supports this statement.]

  4. Therefore, data passed to a write() call must be delivered to
     the peer within a finite time, unless prevented by protocol
     considerations.

  5. There are (according to a post from Stevens quoted in the FAQ)
     about 11 tests made which could delay sending the data. But as
     I see it, there are only 2 that are significant, since things
     like retransmit backoff are a) not under the programmer's
     control and b) must either resolve within a finite time or drop
     the connection.

The first of the interesting cases is:

>: - window closed (ie. there is no buffer space at the receiver;
>:   this can delay data indefinitely, but only if the receiving
>:   process is not actually reading the data that is available)

>OK, it makes sense that if the client isn't reading, the data isn't going
>to make it across the connection. I take it this causes the sender to
>block after the receive queue is filled?

The sender blocks when the socket send buffer is full, so buffers
will be full at both ends.

While the window is closed, the sending TCP sends window probe
packets. This ensures that when the window finally does open again,
the sending TCP detects the fact. [RFC1122, ss 4.2.2.17]

The second interesting case is:

>: - Nagle algorithm (small segments, e.g. keystrokes, are delayed to
>:   form larger segments if ACKs are expected from the peer; this
>:   is what is disabled with TCP_NODELAY)

>Does this mean that my tcpclient sample should set TCP_NODELAY to ensure
>that the end-of-line code is indeed put out onto the network when sent?

No. tcpclient.c is doing the right thing as it stands; trying to
write as much data as possible in as few calls to write() as is
feasible. Since the amount of data is likely to be small relative to
the socket send buffer, then it is likely (since the connection is
idle at that point) that the entire request will require only one
call to write(), and that the TCP layer will immediately dispatch the
request as a single segment (with the PSH flag, see point 2.2 above).

The Nagle algorithm only has an effect when a second write() call is
made while data is still unacknowledged.
In the normal case, this data will be left buffered until either:

  a) there is no unacknowledged data; or

  b) enough data is available to dispatch a full-sized segment.

The delay cannot be indefinite, since condition (a) must become true
within the retransmit timeout or the connection dies.

Since this delay has negative consequences for certain applications,
generally those where a stream of small requests are being sent
without response, e.g. mouse movements, the standards specify that an
option must exist to disable it. [RFC1122, ss 4.2.3.4]

Additional note: RFC1122 also says:

  [DISCUSSION]:
  When the PUSH flag is not implemented on SEND calls, i.e., when the
  application/TCP interface uses a pure streaming model,
  responsibility for aggregating any tiny data fragments to form
  reasonable sized segments is partially borne by the application
  layer.

So programs should avoid calls to write() with small data lengths
(small relative to the MSS, that is); it's better to build up a
request in a buffer and then do one call to sock_write() or
equivalent.

>: The other possible sources of delay in the TCP are not really
>: controllable by the program, but they can only delay the data
>: temporarily.
>
>By temporarily, you mean that the data will go as soon as it can, and I
>won't get stuck in a position where one side is waiting on a response,
>and the other side hasn't received the request? (Or at least I won't
>get stuck forever)

You can only deadlock if you somehow manage to fill up all the
buffers in both directions... not easy.

If it is possible to do this, (can't think of a good example though),
the solution is to use nonblocking mode, especially for writes. Then
you can buffer excess data in the program as necessary.

II.12: Where can I get a library for programming sockets?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is the Simple Sockets Library by Charles E. Campbell, Jr. PhD.
and Terry McRoberts.
The file is called ssl.tar.gz, and you can download it from this
faq's home page. For C++ there is the Socket++ library which is
supposed to be on ftp://ftp.virginia.edu somewhere. There is also
C++ Wrappers, but I can't find this package anywhere. The file is
called C++_wrappers.tar.gz. I have asked the people where it used to
be stored where I can find it now. From
http://www.cs.wustl.edu/~schmidt you should be able to find the ACE
toolkit.

I don't have any experience with any of these libraries, so I can't
recommend one over the other.

II.13: How come select says there is data, but read returns zero?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The data that causes select to return is the EOF because the other
side has closed the connection. This causes read to return zero.
For more information see question II.1.

II.14: What's the difference between select() and poll()?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From Richard Stevens (rstevens@noao.edu):

The basic difference is that select's fd_set is a bit mask and
therefore has some fixed size. It would be possible for the kernel
to not limit this size when the kernel is compiled, allowing the
application to define FD_SETSIZE to whatever it wants (as the
comments in the system header imply today) but it takes more work.
4.4BSD's kernel and the Solaris library function both have this
limit. But I see that BSD/OS 2.1 has now been coded to avoid this
limit, so it's doable, just a small matter of programming. :-)
Someone should file a Solaris bug report on this, and see if it ever
gets fixed.

With poll, however, the user must allocate an array of pollfd
structures, and pass the number of entries in this array, so there's
no fundamental limit. As Casper notes, fewer systems have poll than
select, so the latter is more portable.
Also, with original implementations (SVR3) you could not set the descriptor to -1 to tell the kernel to ignore an entry in the pollfd structure, which made it hard to remove entries from the array; SVR4 gets around this. Personally, I always use select and rarely poll, because I port my code to BSD environments too. Someone could write an implementation of poll that uses select, for these environments, but I've never seen one. Both select and poll are being standardized by POSIX 1003.1g.

II.15: How do I send [this] over a socket?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Anything other than single bytes of data will probably get mangled unless you take care. For integer values you can use htons() and friends, and strings are really just a bunch of single bytes, so those should be OK. Be careful not to send a pointer to a string though, since the pointer will be meaningless on another machine. If you need to send a struct, you should write sendthisstruct() and readthisstruct() functions for it that do all the work of taking the structure apart on one side, and putting it back together on the other. If you need to send floats, you may have a lot of work ahead of you. You should read RFC 1014, which is about portable ways of getting data from one machine to another (thanks to Andrew Gabriel for pointing this out).

II.16: How do I use TCP_NODELAY?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First off, be sure you really want to use it in the first place. It will disable the Nagle algorithm (see II.11), which will cause network traffic to increase, with smaller than needed packets wasting bandwidth. Also, from what I have been able to tell, the speed increase is very small, so you should probably do it without TCP_NODELAY first, and only turn it on if there is a problem.
Here is a code example, with a warning about using it from Andrew Gierth:

  int flag = 1;
  int result = setsockopt(sock,            /* socket affected */
                          IPPROTO_TCP,     /* set option at TCP level */
                          TCP_NODELAY,     /* name of option */
                          (char *) &flag,  /* the cast is historical cruft */
                          sizeof(int));    /* length of option value */
  if (result < 0)
     ... handle the error ...

TCP_NODELAY is for a *specific* purpose; to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required (the canonical example is mouse movements).

II.17: What exactly does the Nagle algorithm do?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It groups together as much data as it can between acks from the other end of the connection. I found this really confusing until Andrew Gierth drew the following diagram, and explained:

This diagram is not intended to be complete, just to illustrate the point better...

Case 1: client writes 1 byte per write() call. The program on host B is tcpserver.c from the FAQ examples.

          CLIENT                                  SERVER
  APP             TCP                     TCP             APP
                  [connection setup omitted]

   "h" --------->          [1 byte]
                      ------------------>
                                                 -----------> "h"
                                          [ack delayed]
   "e" ---------> [Nagle alg.             .
                   now in effect]         .
   "l" ---------> [ditto]                 .
   "l" ---------> [ditto]                 .
   "o" ---------> [ditto]                 .
   "\n"---------> [ditto]                 .
                                          .
                                          .
                         [ack 1 byte]
                      <------------------
                  [send queued
                  data]
                          [5 bytes]
                      ------------------>
                                                 ------------> "ello\n"
                                                 <------------ "HELLO\n"
                      [6 bytes, ack 5 bytes]
                      <------------------
   "HELLO\n" <----
                  [ack delayed]
                  .
                  .
                  .   [ack 6 bytes]
                      ------------------>

Total segments: 5. (If TCP_NODELAY was set, could have been up to 10.) Time for response: 2*RTT, plus ack delay.

Case 2: client writes all data with one write() call.

          CLIENT                                  SERVER
  APP             TCP                     TCP             APP
                  [connection setup omitted]

   "hello\n" ---> [6 bytes]
                      ------------------>
                                                 ------------> "hello\n"
                                                 <------------ "HELLO\n"
                      [6 bytes, ack 6 bytes]
                      <------------------
   "HELLO\n" <----
                  [ack delayed]
                  .
                  .
                  .   [ack 6 bytes]
                      ------------------>

Total segments: 3.

Time for response = RTT (therefore minimum possible).

Hope this makes things a bit clearer...

Note that in case 2, you *don't* want the implementation to gratuitously delay sending the data, since that would add straight onto the response time.

II.18: What is the difference between read() and recv()?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

read() is equivalent to recv() with a flags parameter of 0. Other values for the flags parameter change the behaviour of recv(). Similarly, write() is equivalent to send() with flags == 0.

It is unlikely that send()/recv() would be dropped; perhaps someone with a copy of the POSIX drafts for socket calls can check...

Portability note: non-unix systems may not allow read()/write() on sockets, but recv()/send() are usually ok. This is true on Windows and OS/2, for example.

II.19: I see that send()/write() can generate SIGPIPE. Is there any
advantage to handling the signal, rather than just ignoring it and
checking for the EPIPE error? Are there any useful parameters passed
to the signal catching function?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

In general, the only parameter passed to a signal handler is the signal number that caused it to be invoked. Some systems have optional additional parameters, but they are no use to you in this case.

My advice is to just ignore SIGPIPE as you suggest. That's what I do in just about all of my socket code; errno values are easier to handle than signals (in fact, the first revision of the FAQ failed to mention SIGPIPE in that context; I'd got so used to ignoring it...)

There is one situation where you should *not* ignore SIGPIPE; if you are going to exec() another program with stdout redirected to a socket. In this case it is probably wise to set SIGPIPE to SIG_DFL before doing the exec.

II.20: I'm writing a sockets program which must be chroot() to a
particular directory. But after the chroot(), calls to socket() are
failing with "bad file number". (Solaris 2.4)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

On systems where sockets are implemented on top of Streams (e.g. all SysV-based systems, presumably including Solaris), the socket() function will actually be opening certain special files in /dev. You will need to create a /dev directory under your fake root and populate it with the required device nodes (only).

Your system documentation may or may not specify exactly which device nodes are required; I can't help you there (sorry).

--------------------------------------
Part III. Writing Client Applications
--------------------------------------

III.1: How do I convert a string into an internet address?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are reading a host's address from the command line, you may not know if you have an aaa.bbb.ccc.ddd style address, or a host.domain.com style address. What I do with these is first try to use it as an aaa.bbb.ccc.ddd type address, and if that fails, then do a name lookup on it. Here is an example:

  /* Converts ascii text to in_addr struct.  NULL is returned if the
     address can not be found.  The result points to static storage
     and is overwritten by later calls. */
  struct in_addr *atoaddr(char *address)
  {
    struct hostent *host;
    static struct in_addr saddr;

    /* First try it as aaa.bbb.ccc.ddd. */
    saddr.s_addr = inet_addr(address);
    if (saddr.s_addr != INADDR_NONE) {
      return &saddr;
    }
    /* Not a dotted quad (or it was 255.255.255.255): try a name lookup. */
    host = gethostbyname(address);
    if (host != NULL) {
      return (struct in_addr *) *host->h_addr_list;
    }
    return NULL;
  }

III.2: How can my client work through a firewall/proxy server?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are running through separate proxies for each service, you shouldn't need to do anything. If you are working through sockd, you will need to "socksify" your application. Details for doing this can be found in the package itself, which is available at:

  ftp://ftp.net.com/socks.cstc/socks.cstc.4.2.tar.gz

You can get the socks faq at:

  ftp://coast.cs.purdue.edu/pub/tools/unix/socks/FAQ

III.3: Why does connect() succeed even before my server did an accept()?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

Once you have done a listen() call on your socket, the kernel is primed to accept connections on it. The usual UNIX implementation of this works by *immediately* completing the SYN handshake for any incoming valid SYN segments (connection attempts), creating the socket for the new connection, and keeping this new socket on an internal queue ready for the accept() call. So the socket is fully open *before* the accept is done.

The other factor in this is the 'backlog' parameter for listen(); that defines how many of these completed connections can be queued at one time. If the specified number is exceeded, then new incoming connects are simply ignored (which causes them to be retried).

III.4: Why do I sometimes lose a server's address when using more than
one server?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From andrewg@microlise.co.uk (Andrew Gierth):

Take a careful look at struct hostent. Notice that almost everything in it is a pointer? *All* these pointers will refer to statically allocated data.

For example, if you do:

  struct hostent *host = gethostbyname(hostname);

then (as you should know) a subsequent call to gethostbyname will overwrite the structure pointed to by 'host'.
But if you do:

  struct hostent myhost;
  struct hostent *hostptr = gethostbyname(hostname);
  if (hostptr) myhost = *hostptr;

to make a copy of the hostent before it gets overwritten, then it *still* gets clobbered by a subsequent call to gethostbyname, since although 'myhost' won't get overwritten, all the data it is pointing to will be.

You can get round this by doing a proper 'deep copy' of the hostent structure, but this is tedious. My recommendation would be to extract the needed fields of the hostent and store them in your own way.

III.5: How can I set the timeout for the connect() system call?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From Richard Stevens (rstevens@noao.edu):

> Normally you cannot change this. Solaris does let you do this, on a
> per-kernel basis with the ndd tcp_ip_abort_cinterval parameter.
>
> The easiest way to shorten the connect time is with an alarm around
> the call to connect(). A harder way is to use select, after setting
> the socket nonblocking. Also notice that you can only shorten the
> connect time, there's normally no way to lengthen it.

III.6: Should I bind() a port number in my client program, or let the
system choose one for me on the connect() call?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

** Let the system choose your client's port number **

The exception to this is if the server has been written to be picky about what client ports it will allow connections from. Rlogind and rshd are the classic examples. This is usually part of a Unix-specific (and rather weak) authentication scheme; the intent is that the server allows connections only from processes with root privilege. (The weakness in the scheme is that many O/Ss (e.g. MS-DOS) allow anyone to bind any port.)

The rresvport() routine exists to help out clients that are using this scheme. It basically does the equivalent of socket() + bind(), choosing a port number in the range 512..1023.

If the server is not fussy about the *client's* port number, then don't try and assign it yourself in the client, just let connect() pick it for you.

If, in a client, you use the naive scheme of starting at a fixed port number and calling bind() on consecutive values until it works, then you buy yourself a whole lot of trouble:

The problem is if the server end of your connection does an active close. (e.g. client sends 'QUIT' command to server, server responds by closing the connection). That leaves the client end of the connection in CLOSED state, and the server end in TIME_WAIT state. So after the client exits, there is no trace of the connection on the client end.

Now run the client again. It will pick the same port number, since as far as it can see, it's free. But as soon as it calls connect(), the server finds that you are trying to duplicate an existing connection (although one in TIME_WAIT). It is perfectly entitled to refuse to do this, so you get, I suspect, ECONNREFUSED from connect(). (Some systems may sometimes allow the connection anyway, but you *can't* rely on it.)

This problem is *especially* dangerous because it doesn't show up unless the client and server are on *different* machines. (If they are the same machine, then the client *won't* pick the same port number as before). So you can get bitten well into the development cycle (if you do what I suspect most people do, and test client & server on the same box initially).

Even if your protocol has the client closing first, there are still ways to produce this problem (e.g. kill the server).

III.7: Why do I get "connection refused" when the server isn't running?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The connect() call will only block while it is waiting to establish a connection. When there is no server waiting at the other end, it gets notified that the connection can not be established, and gives up with the error message you see.
This is a good thing, since if it were not the case clients might wait forever for a service which just doesn't exist. Users would think that they were only waiting for the connection to be established, and then after a while give up, muttering something about crummy software under their breath.

-------------------------------------
Part IV. Writing Server Applications
-------------------------------------

IV.1: How come I get "address already in use" from bind?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You get this when the address is already in use. (Oh, you figured that much out?) The most common reason for this is that you have stopped your server, and then re-started it right away. The sockets that were used by the first incarnation of the server are still active. This is further explained in Part II, question 7, and Part IV, question 5.

IV.2: Why don't my sockets close?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When you issue the close() system call, you are closing your interface to the socket, not the socket itself. It is up to the kernel to close the socket. Sometimes, for really technical reasons, the socket is kept alive for a few minutes after you close it. It is normal, for example, for the socket to go into a TIME_WAIT state, on the server side, for a few minutes. People have reported ranges from 20 seconds to 4 minutes to me. The official standard says that it should be 4 minutes. On my Linux system it is about 2 minutes. This is explained in great detail in Part II, question 7.

IV.3: How can I make my server a daemon?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two approaches you can take here. The first is to use inetd to do all the hard work for you. The second is to do all the hard work yourself.

If you use inetd, you simply use stdin, stdout, or stderr for your socket. (These three are all created with dup() from the real socket.) You can use these as you would a socket in your code.
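
As a rough sketch of what the body of an inetd-started service might look like (the function name upcase_copy() and the upper-casing behaviour are invented for this illustration, and no inetd configuration is shown):

```c
#include <ctype.h>
#include <unistd.h>

/* Hypothetical service body: copies everything readable on 'in' to
   'out', upper-casing it on the way.  Under inetd both descriptors
   refer to the same connected socket.
   Returns 0 when the peer closes the connection, -1 on error. */
int upcase_copy(int in, int out)
{
    char buf[512];
    ssize_t n;

    while ((n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t i, done = 0;

        for (i = 0; i < n; i++)
            buf[i] = (char) toupper((unsigned char) buf[i]);

        /* write() may accept fewer bytes than asked for, so loop. */
        while (done < n) {
            ssize_t w = write(out, buf + done, (size_t) (n - done));
            if (w < 0)
                return -1;
            done += w;
        }
    }
    return (n < 0) ? -1 : 0;
}
```

A main() for such a service would need to do little more than call upcase_copy(STDIN_FILENO, STDOUT_FILENO) and exit; there is no socket(), bind(), listen() or accept() in the program at all.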
The inetd process will even close the socket for you when you are done.

If you wish to write your own server, there is a detailed explanation in Unix Network Programming by Richard Stevens. I also picked up this posting from comp.unix.programmer, by Nikhil Nair (nn201@cus.cam.ac.uk).

> I worked all this lot out from the GNU C Library Manual (on-line
> documentation).  Here's some code I wrote - you can adapt it as necessary:
>
> /* The header names were lost from this copy of the posting; these
>    are the ones the code shown here needs. */
> #include <stdio.h>
> #include <stdlib.h>
> #include <signal.h>
> #include <unistd.h>
> #include <sys/types.h>
>
> /* Global variables */
> ...
> volatile sig_atomic_t keep_going = 1; /* controls program termination */
>
> /* Function prototypes: */
> ...
> void termination_handler (int signum); /* clean up before termination */
>
> int
> main (void)
> {
>   ...
>
>   if (chdir (HOME_DIR))    /* change to directory containing data files */
>     {
>       fprintf (stderr, "`%s': ", HOME_DIR);
>       perror (NULL);
>       exit (1);
>     }
>
>   /* Become a daemon: */
>   switch (fork ())
>     {
>     case -1:                    /* can't fork */
>       perror ("fork()");
>       exit (3);
>     case 0:                     /* child, process becomes a daemon: */
>       close (STDIN_FILENO);
>       close (STDOUT_FILENO);
>       close (STDERR_FILENO);
>       if (setsid () == -1)      /* request a new session (job control) */
>         {
>           exit (4);
>         }
>       break;
>     default:                    /* parent returns to calling process: */
>       return 0;
>     }
>
>   /* Establish signal handler to clean up before termination: */
>   if (signal (SIGTERM, termination_handler) == SIG_IGN)
>     signal (SIGTERM, SIG_IGN);
>   signal (SIGINT, SIG_IGN);
>   signal (SIGHUP, SIG_IGN);
>
>   /* Main program loop */
>   while (keep_going)
>     {
>       ...
>     }
>
>   return 0;
> }
>
> void
> termination_handler (int signum)
> {
>   keep_going = 0;
>   signal (signum, termination_handler);
> }

IV.4: How can I listen on more than one port at a time?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The best way to do this is with the select() call. This tells the kernel to let you know when a socket is available for use.
You can have one process do i/o with multiple sockets with this call. If you want to wait for a connect on sockets 4, 6 and 10 you might execute the following code snippet:

--------------------------
fd_set socklist;

FD_ZERO(&socklist); /* Always clear the structure first. */
FD_SET(4, &socklist);
FD_SET(6, &socklist);
FD_SET(10, &socklist);
if (select(11, &socklist, NULL, NULL, NULL) < 0)
  perror("select");
--------------------------

The kernel will notify us as soon as a file descriptor which is less than 11 (the first parameter to select), and is a member of our socklist becomes available for reading. A listening socket counts as readable when a connection is waiting to be accepted, which is why socklist is passed as the read set (the second parameter). See the man page on select for more details.

IV.5: What exactly does SO_REUSEADDR do?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This socket option tells the kernel that even if this port is busy, go ahead and reuse it anyway. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port. You should be aware that if any unexpected data comes in, it may confuse your server, but while this is possible, it is not likely.

It has been pointed out that "A socket is a 5 tuple (proto, local addr, local port, remote addr, remote port). SO_REUSEADDR just says that you can reuse local addresses. The 5 tuple still must be unique!" by Michael Hunter (mphunter@qnx.com). This is true, and this is why it is very unlikely that unexpected data will ever be seen by your server. The danger is that such a 5 tuple is still floating around on the net, and while it is bouncing around, a new connection from the same client, on the same system, happens to get the same remote port. This is explained by Richard Stevens in II.7.

IV.6: What exactly does SO_LINGER do?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On some unixes this does nothing. On others, it instructs the kernel to abort tcp connections instead of closing them properly. This can be dangerous. If you are not clear on this, see Part II, question 7.

IV.7: What exactly does SO_KEEPALIVE do?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

The SO_KEEPALIVE option causes a packet (called a 'keepalive probe') to be sent to the remote system if a long time (by default, more than 2 hours) passes with no other data being sent or received. This packet is designed to provoke an ACK response from the peer. This enables detection of a peer which has become unreachable (e.g. powered off or disconnected from the net). See II.8 for further discussion.

Note that the figure of 2 hours comes from RFC1122, "Requirements for Internet Hosts". The precise value should be configurable, but I've often found this to be difficult. The only implementation I know of that allows the keepalive interval to be set per-connection is SVR4.2.

IV.8: How can I bind() to a port number < 1024?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

The restriction on access to ports < 1024 is part of a (fairly weak) security scheme particular to UNIX. The intention is that servers (for example rlogind, rshd) can check the port number of the client, and if it is < 1024, assume the request has been properly authorised at the client end.

The practical upshot of this is that binding a port number < 1024 is reserved to processes having an effective UID == root.

This can, occasionally, itself present a security problem, e.g. when a server process needs to bind a well-known port, but does *not* itself need root access (news servers, for example). This is often solved by creating a small program which simply binds the socket, then restores the real userid and exec()s the real server. This program can then be made setuid root.

IV.9: How do I get my server to find out the client's address / hostname?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

After accept()ing a connection, use getpeername() to get the address of the client. To get the hostname, see the next question IV.10.
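
A minimal sketch of the getpeername() call, assuming an AF_INET connection. The helper name peer_address() is invented for this example, and socklen_t is the modern type for the length argument (older systems declare it as int):

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: stores the dotted-quad address of the peer of
   connected socket fd in buf, and its port (host byte order) in *port.
   Returns 0 on success, -1 on failure or for non-AF_INET sockets. */
int peer_address(int fd, char *buf, size_t buflen, unsigned short *port)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);   /* must be set before the call */

    if (buflen == 0)
        return -1;
    if (getpeername(fd, (struct sockaddr *) &peer, &len) < 0)
        return -1;
    if (peer.sin_family != AF_INET)
        return -1;

    /* inet_ntoa() returns a static buffer, so copy it out at once. */
    strncpy(buf, inet_ntoa(peer.sin_addr), buflen - 1);
    buf[buflen - 1] = '\0';
    *port = ntohs(peer.sin_port);
    return 0;
}
```

The descriptor passed in would be the one returned by accept(), not the listening socket.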

The client's address is, of course, also returned on the accept(), but it is essential to initialise the address-length parameter before the accept call for this to work.

IV.10: How do I use the gethostbyaddr() function?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(From Andrew:)

Many people are confused by the fact that the address parameter to this function is declared as char*. That *doesn't* mean it's a character string representation of the address! The first parameter should really have been declared as void*, not char*; but the functions probably predate this extension to the C language.

If you are using AF_INET addresses, then you should use a 'struct in_addr *', cast to a 'char*', as in the following example:

  struct sockaddr_in addr;
  struct hostent *host;
  ...
  host = gethostbyaddr((char *) &addr.sin_addr,
                       sizeof(addr.sin_addr),
                       AF_INET);

IV.11: How should I choose a port number for my server?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The list of registered port assignments can be found in STD 2 or RFC 1700. Choose one that isn't already registered, and isn't in /etc/services on your system. It is also a good idea to let users customize the port number in case of conflicts with other un-registered port numbers in other servers. The best way of doing this is hardcoding a service name, and using getservbyname() to look up the actual port number. This method allows users to change the port your server binds to by simply editing the /etc/services file.

IV.12: What is the difference between SO_REUSEADDR and SO_REUSEPORT?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SO_REUSEADDR allows your server to bind to an address which is in a TIME_WAIT state. It does not allow more than one server to bind to the same address. It was mentioned that use of this flag can create a security risk because another server can bind to the same port, by binding to a specific address as opposed to INADDR_ANY.
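
For illustration, setting SO_REUSEADDR on a listening socket might look like the sketch below. The helper name make_listener() and the backlog of 5 are arbitrary choices for this example, not something taken from the FAQ's sample code.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: returns a TCP socket listening on the given
   port of INADDR_ANY with SO_REUSEADDR set, or -1 on failure. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* The option must be set before bind() to affect the bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   (char *) &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this in place, restarting the server while old connections on its port are still in TIME_WAIT should no longer fail with "address already in use".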

The SO_REUSEPORT flag allows multiple processes to bind to the same address provided all of them use the SO_REUSEPORT option.

Richard Stevens explains:

This is a newer flag that appeared in the 4.4BSD multicasting code (although that code was from elsewhere, so I am not sure just who invented the new SO_REUSEPORT flag).

What this flag lets you do is rebind a port that is already in use, but only if all users of the port specify the flag. I believe the intent is for multicasting apps, since if you're running the same app on a host, all need to bind the same port. But the flag may have other uses. For example the following is from a post in February:

+ SO_REUSEPORT is also useful for eliminating the try-10-times-to-bind
+ hack in ftpd's data connection setup routine.  Without SO_REUSEPORT,
+ only one ftpd thread can bind to TCP in
+ preparation for connecting back to the client.  Under conditions of
+ heavy load, there are more threads colliding here than the try-10-times
+ hack can accommodate.  With SO_REUSEPORT, things work nicely and the
+ hack becomes unnecessary.
+
+ Stu Friedberg (stuartf@sequent.com)

I have also heard that DEC OSF supports the flag. Also note that under 4.4BSD, if you are binding a multicast address, then SO_REUSEADDR is considered the same as SO_REUSEPORT (p. 731 of "TCP/IP Illustrated, Volume 2"). I think under Solaris you just replace SO_REUSEPORT with SO_REUSEADDR.

-------------------------------
Appendix A. Sample Source Code
-------------------------------

The sample source code is no longer included in the faq. To get it, please download it from one of the unix-socket-faq www pages:

  http://www.auroraonline.com/sock-faq
  http://kipper.york.ac.uk/~vic/sock-faq

If you don't have web access, you can ftp it with ftpmail by following the instructions below. Please do not use the ftp server if you have access to the web, since computain.com is connected only by a 28.8 modem, and you'd be amazed how much traffic this faq generates.
To get the sample source by mail, send mail to ftpmail@decwrl.dec.com, with no subject line and a body like this:

  reply
  connect ftp.computain.com
  binary
  uuencode
  get pub/sockets/examples.tar.gz
  quit

Save the reply as examples.uu, and type:

  % uudecode examples.uu
  % gunzip examples.tar.gz
  % tar xf examples.tar

This will create a directory called socket-faq-examples which contains the sample code from this faq, plus a sample client and server for both tcp and udp.

Note that this package requires the GNU gzip (gunzip) program to be installed on your system. It is very common, but if you don't have it you can get the source for it from:

  ftp://prep.ai.mit.edu/pub/gnu/gzip-1.2.4.tar

If you don't have ftp access, you can obtain it in a way similar to obtaining the sample source. I'll leave the exact changes to the body of the message as an exercise for the reader.