From nobody@FreeBSD.org  Mon Sep  2 04:49:49 2013
Message-Id: <201309020449.r824nnJq040370@oldred.freebsd.org>
Date: Mon, 2 Sep 2013 04:49:49 GMT
From: Yuri <yuri@rawbw.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [PATCH] Packet loss when 'control' messages are present with large data (sendmsg(2))
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         181741
>Category:       kern
>Synopsis:       [kernel] [patch] Packet loss when 'control' messages are present with large data (sendmsg(2))
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-net
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Sep 02 04:50:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Thu Feb 20 17:30:00 UTC 2014
>Originator:     Yuri
>Release:        10 and 9.1
>Organization:
n/a
>Environment:
>Description:
sendmsg(2) can silently lose data on AF_LOCAL sockets when a large payload is sent together with a control message.

* Problem Description
The socket send buffer has a high-water mark (field sockbuf.sb_hiwat) that is set from net.local.stream.sendspace; the default is 8192 bytes.
When sendmsg(2) sends enough data to hit this limit (8K or more) together with a control message, sosend_generic splits the message data into separate chunks based on 'sbspace' (derived from the above-mentioned sb_hiwat limit), reduced by the size of the control message as sosend_generic sees it. This is how it tries to enforce the sb_hiwat limit.

However, down at the uipc level the control message is modified further in two ways: unp_internalize converts it into an 'internal' form, and unp_addsockcred adds another control message when the peer has requested LOCAL_CREDS. Both functions only increase the control message size beyond the original size seen by sosend_generic. As a result, the first record actually sent (control concatenated with data) is always larger than the 'sbspace' limit that sosend_generic sized the data for.

There is also the function sbappendcontrol_locked. It checks 'sbspace' again and discards the packet when the limit is exceeded. Its return value is essentially ignored in uipc_send. I believe sbappendcontrol_locked shouldn't be checking space at all, since packets are expected to be properly sized to begin with. But removing the check would not be the right fix on its own, since the sizes would still exceed the sbspace limit.

sosend_generic sits one level above the uipc layer and doesn't know what uipc will do with the control message. Therefore it can't know the real adjustment needed for the control message in order to cut the data properly. It wrongly uses the original control size, which makes the first packet too large, so it is discarded by sbappendcontrol_locked.
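
The accounting mismatch described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the helper names data_chunk_len and record_fits are hypothetical and are not kernel functions, and the control sizes used with them are made-up figures. The only number taken from the report is the 8192-byte default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the upper layer: size the first data chunk from the
 * buffer space minus the control length that is visible at that point.
 * (Hypothetical helper, not a kernel function.)
 */
static size_t
data_chunk_len(size_t sbspace, size_t clen_seen, size_t resid)
{
	size_t room;

	assert(clen_seen <= sbspace);
	room = sbspace - clen_seen;
	return (resid < room ? resid : room);
}

/*
 * Toy model of the lower-layer check: the record is accepted only if
 * the final control length plus the data length fits in the space.
 * (Hypothetical helper, not a kernel function.)
 */
static bool
record_fits(size_t sbspace, size_t clen_final, size_t dlen)
{
	return (clen_final + dlen <= sbspace);
}
```

With 8192 bytes of space, an assumed 16 bytes of visible control data and 16384 bytes pending, data_chunk_len() yields 8176. If the control part later grows to an assumed 112 bytes, record_fits() rejects that same chunk, whereas sizing the chunk with the final 112 bytes from the start produces a record that fits.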

* Patch synopsis
- Added a new function to struct pr_usrreqs:
int     (*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol, struct thread *td);
It is called by sosend_generic (and sosend_dgram) to update the control message to its final form before the data is sized.

- Removed the 'sbspace' check from sbappendcontrol_locked. The only place it is called from is uipc_send, and by then all packet sizes already conform.

- Fixed a few wrong error codes relevant to this situation (E2BIG becomes ENOBUFS when creating a control mbuf fails).
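
The optional-hook idea in the synopsis can be sketched in plain C. This is a simplified user-space model, not the kernel code: struct ctl, struct proto_ops, grow_control and space_for_data are hypothetical names, and the 96-byte growth is an arbitrary figure. It only shows the ordering: run the hook if the protocol provides one, then compute the data space from the final control length, failing with EMSGSIZE when the control part alone does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a control mbuf chain; only its length matters. */
struct ctl {
	size_t	len;
};

/* Hypothetical protocol method table with one optional hook. */
struct proto_ops {
	int	(*finalize_control)(struct ctl *c);
};

/* Example hook: pretend that finalizing adds 96 bytes (arbitrary figure). */
static int
grow_control(struct ctl *c)
{
	c->len += 96;
	return (0);
}

/*
 * Call the hook first, if any, then derive the space left for data from
 * the final control length.  Returns 0 and stores the result in *space,
 * or an errno value on failure.
 */
static int
space_for_data(const struct proto_ops *ops, struct ctl *c, size_t sbspace,
    size_t *space)
{
	int error;

	assert(ops != NULL && c != NULL && space != NULL);
	if (ops->finalize_control != NULL &&
	    (error = ops->finalize_control(c)) != 0)
		return (error);
	if (c->len > sbspace)
		return (EMSGSIZE);
	*space = sbspace - c->len;
	return (0);
}
```

A protocol that leaves the hook NULL keeps its previous behaviour in this model, which is presumably why the patch tests the pointer before calling it.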

>How-To-Repeat:
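
One possible way to exercise this path from user space is sketched here. It is not the submitter's reproduction, and the helper names (send_fd_payload, recv_fd_payload, try_large_send_with_fd) are hypothetical. It assumes that LOCAL_CREDS is set at socket level 0 on FreeBSD and compiles that call only where the option is defined. The interesting payload length is the value of net.local.stream.sendspace (8192 by default).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Control buffer with cmsghdr alignment; 1024 bytes is an arbitrary size. */
union ctlbuf {
	struct cmsghdr	hdr;
	char		buf[1024];
};

/* Send len bytes plus one descriptor (SCM_RIGHTS) in a single sendmsg(2). */
static ssize_t
send_fd_payload(int sock, int fd, void *buf, size_t len)
{
	union ctlbuf cu;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&msg, 0, sizeof(msg));
	memset(&cu, 0, sizeof(cu));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cu.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int));
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));
	return (sendmsg(sock, &msg, 0));
}

/*
 * Drain up to len bytes without blocking and remember the first
 * SCM_RIGHTS descriptor seen.  Returns the payload bytes received.
 */
static size_t
recv_fd_payload(int sock, int *fdp, char *buf, size_t len)
{
	union ctlbuf cu;
	struct iovec iov;
	struct msghdr msg;
	struct cmsghdr *cm;
	size_t got = 0;
	ssize_t n;

	*fdp = -1;
	while (got < len) {
		memset(&msg, 0, sizeof(msg));
		iov.iov_base = buf + got;
		iov.iov_len = len - got;
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		msg.msg_control = cu.buf;
		msg.msg_controllen = sizeof(cu.buf);
		if ((n = recvmsg(sock, &msg, MSG_DONTWAIT)) <= 0)
			break;
		got += (size_t)n;
		for (cm = CMSG_FIRSTHDR(&msg); cm != NULL;
		    cm = CMSG_NXTHDR(&msg, cm))
			if (cm->cmsg_level == SOL_SOCKET &&
			    cm->cmsg_type == SCM_RIGHTS && *fdp == -1)
				memcpy(fdp, CMSG_DATA(cm), sizeof(int));
	}
	return (got);
}

/* Returns 0 if the whole payload and the descriptor arrive, -1 otherwise. */
static int
try_large_send_with_fd(size_t len)
{
	int sv[2], p[2], rfd = -1, rv = -1;
	char *buf;

	if ((buf = calloc(1, len)) == NULL)
		return (-1);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
		if (pipe(p) == 0) {
#ifdef LOCAL_CREDS
			int on = 1;	/* receiver asks for sender credentials */
			(void)setsockopt(sv[1], 0, LOCAL_CREDS, &on, sizeof(on));
#endif
			if (send_fd_payload(sv[0], p[0], buf, len) ==
			    (ssize_t)len &&
			    recv_fd_payload(sv[1], &rfd, buf, len) == len &&
			    rfd != -1)
				rv = 0;
			if (rfd != -1)
				close(rfd);
			close(p[0]);
			close(p[1]);
		}
		close(sv[0]);
		close(sv[1]);
	}
	free(buf);
	return (rv);
}
```

On a kernel with the problem described above, try_large_send_with_fd(8192) would be expected to return -1 because part of the payload or the descriptor never arrives; where the record is delivered intact it should return 0.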

>Fix:


Patch attached with submission follows:

Index: kern/uipc_sockbuf.c
===================================================================
--- kern/uipc_sockbuf.c	(revision 255139)
+++ kern/uipc_sockbuf.c	(working copy)
@@ -688,23 +688,16 @@
 	return (retval);
 }
 
-int
+void
 sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
     struct mbuf *control)
 {
-	struct mbuf *m, *n, *mlast;
-	int space;
+	struct mbuf *m, *mlast;
 
 	SOCKBUF_LOCK_ASSERT(sb);
 
-	if (control == 0)
-		panic("sbappendcontrol_locked");
-	space = m_length(control, &n) + m_length(m0, NULL);
+	m_last(control)->m_next = m0;		/* concatenate data to control */
 
-	if (space > sbspace(sb))
-		return (0);
-	n->m_next = m0;			/* concatenate data to control */
-
 	SBLASTRECORDCHK(sb);
 
 	for (m = control; m->m_next; m = m->m_next)
@@ -717,18 +710,14 @@
 	SBLASTMBUFCHK(sb);
 
 	SBLASTRECORDCHK(sb);
-	return (1);
 }
 
-int
+void
 sbappendcontrol(struct sockbuf *sb, struct mbuf *m0, struct mbuf *control)
 {
-	int retval;
-
 	SOCKBUF_LOCK(sb);
-	retval = sbappendcontrol_locked(sb, m0, control);
+	sbappendcontrol_locked(sb, m0, control);
 	SOCKBUF_UNLOCK(sb);
-	return (retval);
 }
 
 /*
Index: kern/uipc_socket.c
===================================================================
--- kern/uipc_socket.c	(revision 255139)
+++ kern/uipc_socket.c	(working copy)
@@ -1102,6 +1102,11 @@
 	KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
 	    ("sosend_dgram: !PR_ATOMIC"));
 
+	if (so->so_proto->pr_usrreqs->pru_finalizecontrol &&
+	    (error = (*so->so_proto->pr_usrreqs->pru_finalizecontrol)(so,
+	        flags, &control, td)))
+		goto out;
+
 	if (uio != NULL)
 		resid = uio->uio_resid;
 	else
@@ -1168,6 +1173,10 @@
 	space = sbspace(&so->so_snd);
 	if (flags & MSG_OOB)
 		space += 1024;
+	if (clen > space) {
+		error = EMSGSIZE;
+		goto out;
+	}
 	space -= clen;
 	SOCKBUF_UNLOCK(&so->so_snd);
 	if (resid > space) {
@@ -1269,6 +1278,11 @@
 	int clen = 0, error, dontroute;
 	int atomic = sosendallatonce(so) || top;
 
+	if (so->so_proto->pr_usrreqs->pru_finalizecontrol &&
+	    (error = (*so->so_proto->pr_usrreqs->pru_finalizecontrol)(so,
+	        flags, &control, td)))
+		goto out;
+
 	if (uio != NULL)
 		resid = uio->uio_resid;
 	else
@@ -1361,6 +1375,10 @@
 			goto restart;
 		}
 		SOCKBUF_UNLOCK(&so->so_snd);
+		if (clen > space) {
+			error = EMSGSIZE;
+			goto release;
+		}
 		space -= clen;
 		do {
 			if (uio == NULL) {
@@ -1426,8 +1444,11 @@
 				so->so_options &= ~SO_DONTROUTE;
 				SOCK_UNLOCK(so);
 			}
-			clen = 0;
-			control = NULL;
+			if (control) {
+				control = NULL;
+				space += clen;
+				clen = 0;
+			}
 			top = NULL;
 			if (error)
 				goto release;
Index: kern/uipc_usrreq.c
===================================================================
--- kern/uipc_usrreq.c	(revision 255139)
+++ kern/uipc_usrreq.c	(working copy)
@@ -290,7 +290,7 @@
 static void	unp_internalize_fp(struct file *);
 static int	unp_externalize(struct mbuf *, struct mbuf **, int);
 static int	unp_externalize_fp(struct file *);
-static struct mbuf	*unp_addsockcred(struct thread *, struct mbuf *);
+static int	unp_addsockcred(struct mbuf **, struct thread *);
 static void	unp_process_defers(void * __unused, int);
 
 /*
@@ -782,6 +782,47 @@
 }
 
 static int
+uipc_finalizecontrol(struct socket *so, int flags, struct mbuf **pcontrol,
+    struct thread *td)
+{
+	struct unpcb *unp, *unp2;
+	int error = 0;
+
+	unp = sotounpcb(so);
+	KASSERT(unp != NULL, ("uipc_finalizecontrol: unp == NULL"));
+
+	UNP_PCB_LOCK(unp);
+	unp2 = unp->unp_conn;
+
+	if (*pcontrol != NULL && (error = unp_internalize(pcontrol, td))) {
+		UNP_PCB_UNLOCK(unp);
+		return (error);
+	}
+
+	/* Lockless read. */
+	if (unp2->unp_flags & UNP_WANTCRED) {
+		switch (so->so_type) {
+		case SOCK_SEQPACKET:
+		case SOCK_STREAM:
+			/* Credentials are passed only once on SOCK_STREAM. */
+			UNP_PCB_LOCK(unp2);
+			if (unp2->unp_flags & UNP_WANTCRED) {
+				unp2->unp_flags &= ~UNP_WANTCRED;
+				error = unp_addsockcred(pcontrol, td);
+			}
+			UNP_PCB_UNLOCK(unp2);
+			break;
+		case SOCK_DGRAM:
+			error = unp_addsockcred(pcontrol, td);
+			break;
+		}
+	}
+
+	UNP_PCB_UNLOCK(unp);
+	return (error);
+}
+
+static int
 uipc_rcvd(struct socket *so, int flags)
 {
 	struct unpcb *unp, *unp2;
@@ -845,8 +886,6 @@
 		error = EOPNOTSUPP;
 		goto release;
 	}
-	if (control != NULL && (error = unp_internalize(&control, td)))
-		goto release;
 	if ((nam != NULL) || (flags & PRUS_EOF))
 		UNP_LINK_WLOCK();
 	else
@@ -880,9 +919,6 @@
 			error = ENOTCONN;
 			break;
 		}
-		/* Lockless read. */
-		if (unp2->unp_flags & UNP_WANTCRED)
-			control = unp_addsockcred(td, control);
 		UNP_PCB_LOCK(unp);
 		if (unp->unp_addr != NULL)
 			from = (struct sockaddr *)unp->unp_addr;
@@ -949,14 +985,6 @@
 		so2 = unp2->unp_socket;
 		UNP_PCB_LOCK(unp2);
 		SOCKBUF_LOCK(&so2->so_rcv);
-		if (unp2->unp_flags & UNP_WANTCRED) {
-			/*
-			 * Credentials are passed only once on SOCK_STREAM
-			 * and SOCK_SEQPACKET.
-			 */
-			unp2->unp_flags &= ~UNP_WANTCRED;
-			control = unp_addsockcred(td, control);
-		}
 		/*
 		 * Send to paired receive port, and then reduce send buffer
 		 * hiwater marks to maintain backpressure.  Wake up readers.
@@ -964,9 +992,8 @@
 		switch (so->so_type) {
 		case SOCK_STREAM:
 			if (control != NULL) {
-				if (sbappendcontrol_locked(&so2->so_rcv, m,
-				    control))
-					control = NULL;
+				sbappendcontrol_locked(&so2->so_rcv, m, control);
+				control = NULL;
 			} else
 				sbappend_locked(&so2->so_rcv, m);
 			break;
@@ -1114,6 +1141,7 @@
 	.pru_disconnect =	uipc_disconnect,
 	.pru_listen =		uipc_listen,
 	.pru_peeraddr =		uipc_peeraddr,
+	.pru_finalizecontrol =	uipc_finalizecontrol,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
 	.pru_sense =		uipc_sense,
@@ -1136,6 +1164,7 @@
 	.pru_disconnect =	uipc_disconnect,
 	.pru_listen =		uipc_listen,
 	.pru_peeraddr =		uipc_peeraddr,
+	.pru_finalizecontrol =	uipc_finalizecontrol,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
 	.pru_sense =		uipc_sense,
@@ -1158,6 +1187,7 @@
 	.pru_disconnect =	uipc_disconnect,
 	.pru_listen =		uipc_listen,
 	.pru_peeraddr =		uipc_peeraddr,
+	.pru_finalizecontrol =	uipc_finalizecontrol,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
 	.pru_sense =		uipc_sense,
@@ -1747,7 +1777,7 @@
 			    SCM_RIGHTS, SOL_SOCKET);
 			if (*controlp == NULL) {
 				FILEDESC_XUNLOCK(fdesc);
-				error = E2BIG;
+				error = ENOBUFS;
 				unp_freerights(fdep, newfds);
 				goto next;
 			}
@@ -1928,7 +1958,7 @@
 			    SCM_RIGHTS, SOL_SOCKET);
 			if (*controlp == NULL) {
 				FILEDESC_SUNLOCK(fdesc);
-				error = E2BIG;
+				error = ENOBUFS;
 				goto out;
 			}
 			fdp = data;
@@ -1992,9 +2022,10 @@
 	return (error);
 }
 
-static struct mbuf *
-unp_addsockcred(struct thread *td, struct mbuf *control)
+static int
+unp_addsockcred(struct mbuf **pcontrol, struct thread *td)
 {
+	struct mbuf *control = *pcontrol;
 	struct mbuf *m, *n, *n_prev;
 	struct sockcred *sc;
 	const struct cmsghdr *cm;
@@ -2004,7 +2035,7 @@
 	ngroups = MIN(td->td_ucred->cr_ngroups, CMGROUP_MAX);
 	m = sbcreatecontrol(NULL, SOCKCREDSIZE(ngroups), SCM_CREDS, SOL_SOCKET);
 	if (m == NULL)
-		return (control);
+		return (ENOBUFS);
 
 	sc = (struct sockcred *) CMSG_DATA(mtod(m, struct cmsghdr *));
 	sc->sc_uid = td->td_ucred->cr_ruid;
@@ -2038,7 +2069,8 @@
 
 	/* Prepend it to the head. */
 	m->m_next = control;
-	return (m);
+	*pcontrol = m;
+	return (0);
 }
 
 static struct unpcb *
Index: sys/protosw.h
===================================================================
--- sys/protosw.h	(revision 255139)
+++ sys/protosw.h	(working copy)
@@ -201,6 +201,8 @@
 	int	(*pru_listen)(struct socket *so, int backlog,
 		    struct thread *td);
 	int	(*pru_peeraddr)(struct socket *so, struct sockaddr **nam);
+	int	(*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol,
+		    struct thread *td);
 	int	(*pru_rcvd)(struct socket *so, int flags);
 	int	(*pru_rcvoob)(struct socket *so, struct mbuf *m, int flags);
 	int	(*pru_send)(struct socket *so, int flags, struct mbuf *m,
Index: sys/sockbuf.h
===================================================================
--- sys/sockbuf.h	(revision 255139)
+++ sys/sockbuf.h	(working copy)
@@ -127,9 +127,9 @@
 	    struct mbuf *m0, struct mbuf *control);
 int	sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa,
 	    struct mbuf *m0, struct mbuf *control);
-int	sbappendcontrol(struct sockbuf *sb, struct mbuf *m0,
+void	sbappendcontrol(struct sockbuf *sb, struct mbuf *m0,
 	    struct mbuf *control);
-int	sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
+void	sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
 	    struct mbuf *control);
 void	sbappendrecord(struct sockbuf *sb, struct mbuf *m0);
 void	sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0);


>Release-Note:
>Audit-Trail:

From: Yuri <yuri@rawbw.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181741: [PATCH] Packet loss when 'control' messages are
 present with large data (sendmsg(2))
Date: Mon, 02 Sep 2013 01:56:33 -0700

 Please hold off from checking this in.
 I will submit the updated patch.
 
 Yuri

From: Yuri <yuri@rawbw.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181741: [PATCH] Packet loss when 'control' messages are
 present with large data (sendmsg(2))
Date: Mon, 02 Sep 2013 23:03:41 -0700

 This is a multi-part message in MIME format.
 --------------060206060307010809000308
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit
 
 Here is the updated patch.
 
 --------------060206060307010809000308
 Content-Type: text/plain; charset=UTF-8;
  name="patch-10-net-control-loss-3.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="patch-10-net-control-loss-3.patch"
 
 Index: kern/uipc_sockbuf.c
 ===================================================================
 --- kern/uipc_sockbuf.c	(revision 255139)
 +++ kern/uipc_sockbuf.c	(working copy)
 @@ -688,23 +688,16 @@
  	return (retval);
  }
  
 -int
 +void
  sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
      struct mbuf *control)
  {
 -	struct mbuf *m, *n, *mlast;
 -	int space;
 +	struct mbuf *m, *mlast;
  
  	SOCKBUF_LOCK_ASSERT(sb);
  
 -	if (control == 0)
 -		panic("sbappendcontrol_locked");
 -	space = m_length(control, &n) + m_length(m0, NULL);
 +	m_last(control)->m_next = m0;		/* concatenate data to control */
  
 -	if (space > sbspace(sb))
 -		return (0);
 -	n->m_next = m0;			/* concatenate data to control */
 -
  	SBLASTRECORDCHK(sb);
  
  	for (m = control; m->m_next; m = m->m_next)
 @@ -717,18 +710,14 @@
  	SBLASTMBUFCHK(sb);
  
  	SBLASTRECORDCHK(sb);
 -	return (1);
  }
  
 -int
 +void
  sbappendcontrol(struct sockbuf *sb, struct mbuf *m0, struct mbuf *control)
  {
 -	int retval;
 -
  	SOCKBUF_LOCK(sb);
 -	retval = sbappendcontrol_locked(sb, m0, control);
 +	sbappendcontrol_locked(sb, m0, control);
  	SOCKBUF_UNLOCK(sb);
 -	return (retval);
  }
  
  /*
 Index: kern/uipc_socket.c
 ===================================================================
 --- kern/uipc_socket.c	(revision 255139)
 +++ kern/uipc_socket.c	(working copy)
 @@ -1102,6 +1102,11 @@
  	KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
  	    ("sosend_dgram: !PR_ATOMIC"));
  
 +	if (so->so_proto->pr_usrreqs->pru_finalizecontrol &&
 +	    (error = (*so->so_proto->pr_usrreqs->pru_finalizecontrol)(so,
 +	        flags, &control, td)))
 +		goto out;
 +
  	if (uio != NULL)
  		resid = uio->uio_resid;
  	else
 @@ -1166,10 +1171,14 @@
  	 * problem and need fixing.
  	 */
  	space = sbspace(&so->so_snd);
 +	SOCKBUF_UNLOCK(&so->so_snd);
  	if (flags & MSG_OOB)
  		space += 1024;
 +	if (clen > space) {
 +		error = EMSGSIZE;
 +		goto out;
 +	}
  	space -= clen;
 -	SOCKBUF_UNLOCK(&so->so_snd);
  	if (resid > space) {
  		error = EMSGSIZE;
  		goto out;
 @@ -1269,6 +1278,11 @@
  	int clen = 0, error, dontroute;
  	int atomic = sosendallatonce(so) || top;
  
 +	if (so->so_proto->pr_usrreqs->pru_finalizecontrol &&
 +	    (error = (*so->so_proto->pr_usrreqs->pru_finalizecontrol)(so,
 +	        flags, &control, td)))
 +		goto out;
 +
  	if (uio != NULL)
  		resid = uio->uio_resid;
  	else
 @@ -1361,6 +1375,10 @@
  			goto restart;
  		}
  		SOCKBUF_UNLOCK(&so->so_snd);
 +		if (clen > space) {
 +			error = EMSGSIZE;
 +			goto release;
 +		}
  		space -= clen;
  		do {
  			if (uio == NULL) {
 Index: kern/uipc_usrreq.c
 ===================================================================
 --- kern/uipc_usrreq.c	(revision 255139)
 +++ kern/uipc_usrreq.c	(working copy)
 @@ -290,7 +290,7 @@
  static void	unp_internalize_fp(struct file *);
  static int	unp_externalize(struct mbuf *, struct mbuf **, int);
  static int	unp_externalize_fp(struct file *);
 -static struct mbuf	*unp_addsockcred(struct thread *, struct mbuf *);
 +static int	unp_addsockcred(struct mbuf **, struct thread *);
  static void	unp_process_defers(void * __unused, int);
  
  /*
 @@ -782,6 +782,43 @@
  }
  
  static int
 +uipc_finalizecontrol(struct socket *so, int flags, struct mbuf **pcontrol,
 +    struct thread *td)
 +{
 +	struct unpcb *unp, *unp2;
 +	int error = 0;
 +
 +	unp = sotounpcb(so);
 +	KASSERT(unp != NULL, ("uipc_finalizecontrol: unp == NULL"));
 +
 +	unp2 = unp->unp_conn;
 +
 +	if (*pcontrol != NULL && (error = unp_internalize(pcontrol, td)))
 +		return (error);
 +
 +	/* Lockless read, ignore when not connected. */
 +	if (unp2 && unp2->unp_flags & UNP_WANTCRED) {
 +		switch (so->so_type) {
 +		case SOCK_SEQPACKET:
 +		case SOCK_STREAM:
 +			/* Credentials are passed only once on streams */
 +			UNP_PCB_LOCK(unp2);
 +			if (unp2->unp_flags & UNP_WANTCRED) {
 +				unp2->unp_flags &= ~UNP_WANTCRED;
 +				error = unp_addsockcred(pcontrol, td);
 +			}
 +			UNP_PCB_UNLOCK(unp2);
 +			break;
 +		case SOCK_DGRAM:
 +			error = unp_addsockcred(pcontrol, td);
 +			break;
 +		}
 +	}
 +
 +	return (error);
 +}
 +
 +static int
  uipc_rcvd(struct socket *so, int flags)
  {
  	struct unpcb *unp, *unp2;
 @@ -845,8 +882,6 @@
  		error = EOPNOTSUPP;
  		goto release;
  	}
 -	if (control != NULL && (error = unp_internalize(&control, td)))
 -		goto release;
  	if ((nam != NULL) || (flags & PRUS_EOF))
  		UNP_LINK_WLOCK();
  	else
 @@ -880,9 +915,6 @@
  			error = ENOTCONN;
  			break;
  		}
 -		/* Lockless read. */
 -		if (unp2->unp_flags & UNP_WANTCRED)
 -			control = unp_addsockcred(td, control);
  		UNP_PCB_LOCK(unp);
  		if (unp->unp_addr != NULL)
  			from = (struct sockaddr *)unp->unp_addr;
 @@ -949,14 +981,6 @@
  		so2 = unp2->unp_socket;
  		UNP_PCB_LOCK(unp2);
  		SOCKBUF_LOCK(&so2->so_rcv);
 -		if (unp2->unp_flags & UNP_WANTCRED) {
 -			/*
 -			 * Credentials are passed only once on SOCK_STREAM
 -			 * and SOCK_SEQPACKET.
 -			 */
 -			unp2->unp_flags &= ~UNP_WANTCRED;
 -			control = unp_addsockcred(td, control);
 -		}
  		/*
  		 * Send to paired receive port, and then reduce send buffer
  		 * hiwater marks to maintain backpressure.  Wake up readers.
 @@ -964,9 +988,9 @@
  		switch (so->so_type) {
  		case SOCK_STREAM:
  			if (control != NULL) {
 -				if (sbappendcontrol_locked(&so2->so_rcv, m,
 -				    control))
 -					control = NULL;
 +				sbappendcontrol_locked(&so2->so_rcv, m,
 +				    control);
 +				control = NULL;
  			} else
  				sbappend_locked(&so2->so_rcv, m);
  			break;
 @@ -981,6 +1005,7 @@
  			break;
  			}
  		}
 +		m = NULL;
  
  		/*
  		 * XXXRW: While fine for SOCK_STREAM, this conflates maximum
 @@ -1004,7 +1029,6 @@
  		SOCKBUF_UNLOCK(&so->so_snd);
  		unp2->unp_cc = sbcc;
  		UNP_PCB_UNLOCK(unp2);
 -		m = NULL;
  		break;
  
  	default:
 @@ -1114,6 +1138,7 @@
  	.pru_disconnect =	uipc_disconnect,
  	.pru_listen =		uipc_listen,
  	.pru_peeraddr =		uipc_peeraddr,
 +	.pru_finalizecontrol =	uipc_finalizecontrol,
  	.pru_rcvd =		uipc_rcvd,
  	.pru_send =		uipc_send,
  	.pru_sense =		uipc_sense,
 @@ -1136,6 +1161,7 @@
  	.pru_disconnect =	uipc_disconnect,
  	.pru_listen =		uipc_listen,
  	.pru_peeraddr =		uipc_peeraddr,
 +	.pru_finalizecontrol =	uipc_finalizecontrol,
  	.pru_rcvd =		uipc_rcvd,
  	.pru_send =		uipc_send,
  	.pru_sense =		uipc_sense,
 @@ -1158,6 +1184,7 @@
  	.pru_disconnect =	uipc_disconnect,
  	.pru_listen =		uipc_listen,
  	.pru_peeraddr =		uipc_peeraddr,
 +	.pru_finalizecontrol =	uipc_finalizecontrol,
  	.pru_rcvd =		uipc_rcvd,
  	.pru_send =		uipc_send,
  	.pru_sense =		uipc_sense,
 @@ -1747,7 +1774,7 @@
  			    SCM_RIGHTS, SOL_SOCKET);
  			if (*controlp == NULL) {
  				FILEDESC_XUNLOCK(fdesc);
 -				error = E2BIG;
 +				error = ENOBUFS;
  				unp_freerights(fdep, newfds);
  				goto next;
  			}
 @@ -1928,7 +1955,7 @@
  			    SCM_RIGHTS, SOL_SOCKET);
  			if (*controlp == NULL) {
  				FILEDESC_SUNLOCK(fdesc);
 -				error = E2BIG;
 +				error = ENOBUFS;
  				goto out;
  			}
  			fdp = data;
 @@ -1992,9 +2019,10 @@
  	return (error);
  }
  
 -static struct mbuf *
 -unp_addsockcred(struct thread *td, struct mbuf *control)
 +static int
 +unp_addsockcred(struct mbuf **pcontrol, struct thread *td)
  {
 +	struct mbuf *control = *pcontrol;
  	struct mbuf *m, *n, *n_prev;
  	struct sockcred *sc;
  	const struct cmsghdr *cm;
 @@ -2004,7 +2032,7 @@
  	ngroups = MIN(td->td_ucred->cr_ngroups, CMGROUP_MAX);
  	m = sbcreatecontrol(NULL, SOCKCREDSIZE(ngroups), SCM_CREDS, SOL_SOCKET);
  	if (m == NULL)
 -		return (control);
 +		return (ENOBUFS);
  
  	sc = (struct sockcred *) CMSG_DATA(mtod(m, struct cmsghdr *));
  	sc->sc_uid = td->td_ucred->cr_ruid;
 @@ -2038,7 +2066,8 @@
  
  	/* Prepend it to the head. */
  	m->m_next = control;
 -	return (m);
 +	*pcontrol = m;
 +	return (0);
  }
  
  static struct unpcb *
 Index: sys/protosw.h
 ===================================================================
 --- sys/protosw.h	(revision 255139)
 +++ sys/protosw.h	(working copy)
 @@ -201,6 +201,8 @@
  	int	(*pru_listen)(struct socket *so, int backlog,
  		    struct thread *td);
  	int	(*pru_peeraddr)(struct socket *so, struct sockaddr **nam);
 +	int	(*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol,
 +		    struct thread *td);
  	int	(*pru_rcvd)(struct socket *so, int flags);
  	int	(*pru_rcvoob)(struct socket *so, struct mbuf *m, int flags);
  	int	(*pru_send)(struct socket *so, int flags, struct mbuf *m,
 Index: sys/sockbuf.h
 ===================================================================
 --- sys/sockbuf.h	(revision 255139)
 +++ sys/sockbuf.h	(working copy)
 @@ -127,9 +127,9 @@
  	    struct mbuf *m0, struct mbuf *control);
  int	sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa,
  	    struct mbuf *m0, struct mbuf *control);
 -int	sbappendcontrol(struct sockbuf *sb, struct mbuf *m0,
 +void	sbappendcontrol(struct sockbuf *sb, struct mbuf *m0,
  	    struct mbuf *control);
 -int	sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
 +void	sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
  	    struct mbuf *control);
  void	sbappendrecord(struct sockbuf *sb, struct mbuf *m0);
  void	sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0);
 
 --------------060206060307010809000308--

From: Yuri <yuri@rawbw.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181741: [PATCH] Packet loss when 'control' messages
 are present with large data (sendmsg(2))
Date: Mon, 02 Sep 2013 23:45:45 -0700

 I originally came across this issue while running tmux (terminal 
 multiplexer) under the linuxulator. For some reason, the Linux build of 
 tmux sends data into the local socket in larger chunks, with a control 
 message included, and this exposed the bug. The FreeBSD version of tmux 
 doesn't expose this problem.
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Sep 10 01:39:52 UTC 2013 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=181741 

From: Gleb Smirnoff <glebius@freebsd.org>
To: Yuri <yuri@rawbw.com>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/181741
Date: Thu, 3 Oct 2013 02:26:49 +0400

 --O5XBE6gyVG5Rl6Rj
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
   I've made a test case for this problem.
   Patch for unix_passfd test from our test suite.
 
 -- 
 Totus tuus, Glebius.
 
 --O5XBE6gyVG5Rl6Rj
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename="unix_passfd.c.diff"
 
 Index: tools/regression/sockets/unix_passfd/unix_passfd.c
 ===================================================================
 --- tools/regression/sockets/unix_passfd/unix_passfd.c	(revision 256008)
 +++ tools/regression/sockets/unix_passfd/unix_passfd.c	(working copy)
 @@ -29,11 +29,14 @@
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <sys/stat.h>
 +#include <sys/sysctl.h>
 +#include <sys/un.h>
  
  #include <err.h>
  #include <fcntl.h>
  #include <limits.h>
  #include <stdio.h>
 +#include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  
 @@ -106,11 +109,10 @@ samefile(const char *test, struct stat *sb1, struc
  }
  
  static void
 -sendfd(const char *test, int sockfd, int sendfd)
 +sendfd_payload(const char *test, int sockfd, int sendfd,
 +    void *payload, size_t paylen)
  {
  	struct iovec iovec;
 -	char ch;
 -
  	char message[CMSG_SPACE(sizeof(int))];
  	struct cmsghdr *cmsghdr;
  	struct msghdr msghdr;
 @@ -118,13 +120,12 @@ static void
  
  	bzero(&msghdr, sizeof(msghdr));
  	bzero(&message, sizeof(message));
 -	ch = 0;
  
  	msghdr.msg_control = message;
  	msghdr.msg_controllen = sizeof(message);
  
 -	iovec.iov_base = &ch;
 -	iovec.iov_len = sizeof(ch);
 +	iovec.iov_base = payload;
 +	iovec.iov_len = paylen;
  
  	msghdr.msg_iov = &iovec;
  	msghdr.msg_iovlen = 1;
 @@ -138,55 +139,71 @@ static void
  	len = sendmsg(sockfd, &msghdr, 0);
  	if (len < 0)
  		err(-1, "%s: sendmsg", test);
 -	if (len != sizeof(ch))
 +	if (len != paylen)
  		errx(-1, "%s: sendmsg: %zd bytes sent", test, len);
  }
  
  static void
 -recvfd(const char *test, int sockfd, int *recvfd)
 +sendfd(const char *test, int sockfd, int sendfd)
  {
 +	char ch;
 +
 +	return (sendfd_payload(test, sockfd, sendfd, &ch, sizeof(ch)));
 +}
 +
 +static void
 +recvfd_payload(const char *test, int sockfd, int *recvfd,
 +    void *buf, size_t buflen)
 +{
  	struct cmsghdr *cmsghdr;
 -	char message[CMSG_SPACE(sizeof(int))];
 +	char message[CMSG_SPACE(SOCKCREDSIZE(CMGROUP_MAX)) + sizeof(int)];
  	struct msghdr msghdr;
  	struct iovec iovec;
  	ssize_t len;
 -	char ch;
  
  	bzero(&msghdr, sizeof(msghdr));
 -	ch = 0;
  
  	msghdr.msg_control = message;
  	msghdr.msg_controllen = sizeof(message);
  
 -	iovec.iov_base = &ch;
 -	iovec.iov_len = sizeof(ch);
 +	iovec.iov_base = buf;
 +	iovec.iov_len = buflen;
  
  	msghdr.msg_iov = &iovec;
  	msghdr.msg_iovlen = 1;
  
 -	iovec.iov_len = sizeof(ch);
 -
 -	msghdr.msg_iov = &iovec;
 -	msghdr.msg_iovlen = 1;
 -
  	len = recvmsg(sockfd, &msghdr, 0);
  	if (len < 0)
  		err(-1, "%s: recvmsg", test);
 -	if (len != sizeof(ch))
 +	if (len != buflen)
  		errx(-1, "%s: recvmsg: %zd bytes received", test, len);
 +
  	cmsghdr = CMSG_FIRSTHDR(&msghdr);
  	if (cmsghdr == NULL)
  		errx(-1, "%s: recvmsg: did not receive control message", test);
 -	if (cmsghdr->cmsg_len != CMSG_LEN(sizeof(int)) ||
 -	    cmsghdr->cmsg_level != SOL_SOCKET ||
 -	    cmsghdr->cmsg_type != SCM_RIGHTS)
 +	*recvfd = -1;
 +	for (; cmsghdr != NULL; cmsghdr = CMSG_NXTHDR(&msghdr, cmsghdr)) {
 +		if (cmsghdr->cmsg_level == SOL_SOCKET &&
 +		    cmsghdr->cmsg_type == SCM_RIGHTS &&
 +		    cmsghdr->cmsg_len == CMSG_LEN(sizeof(int))) {
 +			*recvfd = *(int *)CMSG_DATA(cmsghdr);
 +			if (*recvfd == -1)
 +				errx(-1, "%s: recvmsg: received fd -1", test);
 +		}
 +	}
 +	if (*recvfd == -1)
  		errx(-1, "%s: recvmsg: did not receive single-fd message",
  		    test);
 -	*recvfd = *(int *)CMSG_DATA(cmsghdr);
 -	if (*recvfd == -1)
 -		errx(-1, "%s: recvmsg: received fd -1", test);
  }
  
 +static void
 +recvfd(const char *test, int sockfd, int *recvfd)
 +{
 +	char ch;
 +
 +	return (recvfd_payload(test, sockfd, recvfd, &ch, sizeof(ch)));
 +}
 +
  int
  main(int argc, char *argv[])
  {
 @@ -330,6 +347,43 @@ main(int argc, char *argv[])
  	closesocketpair(fd);
  
  	printf("%s passed\n", test);
 +
 +	/*
 +	 * Test for PR 181741. Receiver sets LOCAL_CREDS, and kernel
 +	 * prepends a control message to the data. Sender sends large
 +	 * payload. Payload + SCM_RIGHTS + LOCAL_CREDS hit socket buffer
 +	 * limit, and receiver receives truncated data.
 +	 */
 +	test = "test8-rigths+creds+payload";
 +	printf("beginning %s\n", test);
 +
 +	{
 +		const int on = 1;
 +		u_long sendspace;
 +		size_t len;
 +		void *buf;
 +
 +		len = sizeof(sendspace);
 +		if (sysctlbyname("net.local.stream.sendspace", &sendspace,
 +		    &len, NULL, 0) < 0)
 +			err(-1, "%s: sysctlbyname(net.local.stream.sendspace)",
 +			    test);
 +
 +		if ((buf = malloc(sendspace)) == NULL)
 +			err(-1, "%s: malloc", test);
 +
 +		domainsocketpair(test, fd);
 +		if (setsockopt(fd[1], 0, LOCAL_CREDS, &on, sizeof(on)) < 0)
 +			err(-1, "%s: setsockopt(LOCAL_CREDS)", test);
 +		tempfile(test, &putfd_1);
 +		sendfd_payload(test, fd[0], putfd_1, buf, sendspace);
 +		recvfd_payload(test, fd[1], &getfd_1, buf, sendspace);
 +		close(putfd_1);
 +		close(getfd_1);
 +		closesocketpair(fd);
 +	}
 +
 +	printf("%s passed\n", test);
  	
  	return (0);
  }
 
 --O5XBE6gyVG5Rl6Rj--

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181741: commit references a PR
Date: Thu,  6 Feb 2014 13:18:18 +0000 (UTC)

 Author: glebius
 Date: Thu Feb  6 13:18:10 2014
 New Revision: 261550
 URL: http://svnweb.freebsd.org/changeset/base/261550
 
 Log:
   Add test case for kern/181741. Right now test fails.
   
   PR:		181741
   Sponsored by:	Nginx, Inc.
 
 Modified:
   head/tools/regression/sockets/unix_passfd/unix_passfd.c
 
 Modified: head/tools/regression/sockets/unix_passfd/unix_passfd.c
 ==============================================================================
 --- head/tools/regression/sockets/unix_passfd/unix_passfd.c	Thu Feb  6 12:43:06 2014	(r261549)
 +++ head/tools/regression/sockets/unix_passfd/unix_passfd.c	Thu Feb  6 13:18:10 2014	(r261550)
 @@ -29,11 +29,14 @@
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <sys/stat.h>
 +#include <sys/sysctl.h>
 +#include <sys/un.h>
  
  #include <err.h>
  #include <fcntl.h>
  #include <limits.h>
  #include <stdio.h>
 +#include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  
 @@ -106,11 +109,10 @@ samefile(const char *test, struct stat *
  }
  
  static void
 -sendfd(const char *test, int sockfd, int sendfd)
 +sendfd_payload(const char *test, int sockfd, int sendfd,
 +    void *payload, size_t paylen)
  {
  	struct iovec iovec;
 -	char ch;
 -
  	char message[CMSG_SPACE(sizeof(int))];
  	struct cmsghdr *cmsghdr;
  	struct msghdr msghdr;
 @@ -118,13 +120,12 @@ sendfd(const char *test, int sockfd, int
  
  	bzero(&msghdr, sizeof(msghdr));
  	bzero(&message, sizeof(message));
 -	ch = 0;
  
  	msghdr.msg_control = message;
  	msghdr.msg_controllen = sizeof(message);
  
 -	iovec.iov_base = &ch;
 -	iovec.iov_len = sizeof(ch);
 +	iovec.iov_base = payload;
 +	iovec.iov_len = paylen;
  
  	msghdr.msg_iov = &iovec;
  	msghdr.msg_iovlen = 1;
 @@ -138,33 +139,35 @@ sendfd(const char *test, int sockfd, int
  	len = sendmsg(sockfd, &msghdr, 0);
  	if (len < 0)
  		err(-1, "%s: sendmsg", test);
 -	if (len != sizeof(ch))
 +	if (len != paylen)
  		errx(-1, "%s: sendmsg: %zd bytes sent", test, len);
  }
  
  static void
 -recvfd(const char *test, int sockfd, int *recvfd)
 +sendfd(const char *test, int sockfd, int sendfd)
 +{
 +	char ch;
 +
 +	return (sendfd_payload(test, sockfd, sendfd, &ch, sizeof(ch)));
 +}
 +
 +static void
 +recvfd_payload(const char *test, int sockfd, int *recvfd,
 +    void *buf, size_t buflen)
  {
  	struct cmsghdr *cmsghdr;
 -	char message[CMSG_SPACE(sizeof(int))];
 +	char message[CMSG_SPACE(SOCKCREDSIZE(CMGROUP_MAX)) + sizeof(int)];
  	struct msghdr msghdr;
  	struct iovec iovec;
  	ssize_t len;
 -	char ch;
  
  	bzero(&msghdr, sizeof(msghdr));
 -	ch = 0;
  
  	msghdr.msg_control = message;
  	msghdr.msg_controllen = sizeof(message);
  
 -	iovec.iov_base = &ch;
 -	iovec.iov_len = sizeof(ch);
 -
 -	msghdr.msg_iov = &iovec;
 -	msghdr.msg_iovlen = 1;
 -
 -	iovec.iov_len = sizeof(ch);
 +	iovec.iov_base = buf;
 +	iovec.iov_len = buflen;
  
  	msghdr.msg_iov = &iovec;
  	msghdr.msg_iovlen = 1;
 @@ -172,19 +175,33 @@ recvfd(const char *test, int sockfd, int
  	len = recvmsg(sockfd, &msghdr, 0);
  	if (len < 0)
  		err(-1, "%s: recvmsg", test);
 -	if (len != sizeof(ch))
 +	if (len != buflen)
  		errx(-1, "%s: recvmsg: %zd bytes received", test, len);
 +
  	cmsghdr = CMSG_FIRSTHDR(&msghdr);
  	if (cmsghdr == NULL)
  		errx(-1, "%s: recvmsg: did not receive control message", test);
 -	if (cmsghdr->cmsg_len != CMSG_LEN(sizeof(int)) ||
 -	    cmsghdr->cmsg_level != SOL_SOCKET ||
 -	    cmsghdr->cmsg_type != SCM_RIGHTS)
 +	*recvfd = -1;
 +	for (; cmsghdr != NULL; cmsghdr = CMSG_NXTHDR(&msghdr, cmsghdr)) {
 +		if (cmsghdr->cmsg_level == SOL_SOCKET &&
 +		    cmsghdr->cmsg_type == SCM_RIGHTS &&
 +		    cmsghdr->cmsg_len == CMSG_LEN(sizeof(int))) {
 +			*recvfd = *(int *)CMSG_DATA(cmsghdr);
 +			if (*recvfd == -1)
 +				errx(-1, "%s: recvmsg: received fd -1", test);
 +		}
 +	}
 +	if (*recvfd == -1)
  		errx(-1, "%s: recvmsg: did not receive single-fd message",
  		    test);
 -	*recvfd = *(int *)CMSG_DATA(cmsghdr);
 -	if (*recvfd == -1)
 -		errx(-1, "%s: recvmsg: received fd -1", test);
 +}
 +
 +static void
 +recvfd(const char *test, int sockfd, int *recvfd)
 +{
 +	char ch;
 +
 +	return (recvfd_payload(test, sockfd, recvfd, &ch, sizeof(ch)));
  }
  
  int
 @@ -330,6 +347,43 @@ main(int argc, char *argv[])
  	closesocketpair(fd);
  
  	printf("%s passed\n", test);
 +
 +	/*
 +	 * Test for PR 181741. Receiver sets LOCAL_CREDS, and kernel
 +	 * prepends a control message to the data. Sender sends large
 +	 * payload. Payload + SCM_RIGHTS + LOCAL_CREDS hit socket buffer
 +	 * limit, and receiver receives truncated data.
 +	 */
 +	test = "test8-rigths+creds+payload";
 +	printf("beginning %s\n", test);
 +
 +	{
 +		const int on = 1;
 +		u_long sendspace;
 +		size_t len;
 +		void *buf;
 +
 +		len = sizeof(sendspace);
 +		if (sysctlbyname("net.local.stream.sendspace", &sendspace,
 +		    &len, NULL, 0) < 0)
 +			err(-1, "%s: sysctlbyname(net.local.stream.sendspace)",
 +			    test);
 +
 +		if ((buf = malloc(sendspace)) == NULL)
 +			err(-1, "%s: malloc", test);
 +
 +		domainsocketpair(test, fd);
 +		if (setsockopt(fd[1], 0, LOCAL_CREDS, &on, sizeof(on)) < 0)
 +			err(-1, "%s: setsockopt(LOCAL_CREDS)", test);
 +		tempfile(test, &putfd_1);
 +		sendfd_payload(test, fd[0], putfd_1, buf, sendspace);
 +		recvfd_payload(test, fd[1], &getfd_1, buf, sendspace);
 +		close(putfd_1);
 +		close(getfd_1);
 +		closesocketpair(fd);
 +	}
 +
 +	printf("%s passed\n", test);
  	
  	return (0);
  }
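
The receive side of the test above walks every control message rather than assuming SCM_RIGHTS comes first, because with LOCAL_CREDS set the kernel prepends a credentials message. Below is a minimal userland sketch of that pattern. The helper names send_one_fd() and recv_one_fd() and the 256-byte control buffer are illustrative assumptions, not part of the commit, and the sketch sends a one-byte payload rather than the large payload needed to trigger the PR.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send one data byte plus one descriptor. */
static int
send_one_fd(int sock, int fd)
{
	char ch = 0;
	struct iovec iov = { .iov_base = &ch, .iov_len = sizeof(ch) };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces suitable alignment */
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));

	return (sendmsg(sock, &msg, 0) == (ssize_t)sizeof(ch) ? 0 : -1);
}

/*
 * Hypothetical helper: receive one data byte and return the passed
 * descriptor, or -1 if no single-fd SCM_RIGHTS message was found.
 */
static int
recv_one_fd(int sock)
{
	char ch;
	struct iovec iov = { .iov_base = &ch, .iov_len = sizeof(ch) };
	union {
		char buf[256];		/* assumed large enough for this sketch */
		struct cmsghdr align;
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) < 0)
		return (-1);

	/* Skip any other control messages (e.g. credentials). */
	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL;
	    cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level == SOL_SOCKET &&
		    cm->cmsg_type == SCM_RIGHTS &&
		    cm->cmsg_len == CMSG_LEN(sizeof(int)))
			memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	}
	return (fd);
}
```

A caller would create a pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call send_one_fd(sv[0], fd) and then recv_one_fd(sv[1]). The returned descriptor is a new number referring to the same open file.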
 

From: Alan Somers <asomers@freebsd.org>
To: bug-followup@FreeBSD.org, yuri@rawbw.com
Cc:  
Subject: Re: kern/181741: [kernel] [patch] Packet loss when 'control'
 messages are present with large data (sendmsg(2))
Date: Thu, 20 Feb 2014 10:29:27 -0700

 I've been working on kern/185813, which is closely related.  My
 comments on your patch:
 
 1) In uipc_socket.c, you twice do "if (clen > space) error =
 EMSGSIZE".  Should the comparison be with sp->so_snd->sb_hiwat instead
 of space?  Space shrinks when the sockbuf is partially full, but
 sb_hiwat is constant.  (Actually, sb_hiwat also shrinks for Unix
 domain sockets, but I am going to fix that as part of kern/185812).
 
 2) In uipc_finalizecontrol(), I think that you need to grab
 UNP_LINK_RLOCK to protect the linkage between unp and unp2.
 
 3) It would be fantastic if you could convert the testcase to ATF
 format.  ATF is the new format that all testcases should use going
 forward.  It's easily automatable, unlike the stuff in
 tools/regression, and it's very flexible too.
 https://wiki.freebsd.org/TestingFreeBSD
 
 4) I think there are some tab/space issues with the patch, but I'm not
 positive because I'm reading it in Firefox.
 
 -Alan
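
To make the distinction in point 1 concrete, here is a toy userland model. It is only a sketch: struct toy_sockbuf and the check_* functions are hypothetical names, not the kernel's struct sockbuf or the code in the patch. It shows that a check against free space depends on how full the buffer currently is, while a check against the high-water mark depends only on the configured limit.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a socket buffer; NOT the kernel's struct sockbuf. */
struct toy_sockbuf {
	size_t hiwat;	/* configured capacity (high-water mark) */
	size_t cc;	/* bytes currently queued */
};

/* Free space left in the toy buffer. */
static size_t
toy_space(const struct toy_sockbuf *sb)
{
	return (sb->cc >= sb->hiwat ? 0 : sb->hiwat - sb->cc);
}

/* Reject control data that does not fit in the space free right now. */
static int
check_against_space(const struct toy_sockbuf *sb, size_t clen)
{
	return (clen > toy_space(sb) ? EMSGSIZE : 0);
}

/* Reject control data only if it could never fit in the buffer. */
static int
check_against_hiwat(const struct toy_sockbuf *sb, size_t clen)
{
	return (clen > sb->hiwat ? EMSGSIZE : 0);
}
```

With hiwat = 8192 and cc = 8000, a 512-byte control message fails the space check with EMSGSIZE but passes the hiwat check. With an empty buffer both checks pass.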
>Unformatted:
