From nobody@FreeBSD.org  Wed Aug 10 21:52:30 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A7403106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 10 Aug 2011 21:52:30 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 9705D8FC16
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 10 Aug 2011 21:52:30 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p7ALqUnw075208
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 10 Aug 2011 21:52:30 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p7ALqUl4075207;
	Wed, 10 Aug 2011 21:52:30 GMT
	(envelope-from nobody)
Message-Id: <201108102152.p7ALqUl4075207@red.freebsd.org>
Date: Wed, 10 Aug 2011 21:52:30 GMT
From: Robert Millan <rmh@debian.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: sockets don't work though nullfs mounts
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         159663
>Category:       kern
>Synopsis:       [socket] [nullfs] sockets don't work through nullfs mounts
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-fs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Aug 10 22:00:23 UTC 2011
>Closed-Date:    Tue Apr 24 19:26:37 UTC 2012
>Last-Modified:  Tue Apr 24 19:26:37 UTC 2012
>Originator:     Robert Millan
>Release:        FreeBSD 8.1
>Organization:
>Environment:
GNU/kFreeBSD thorin 8.1-1-amd64 #0 Wed Aug 10 13:58:08 CEST 2011 x86_64 amd64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ GNU/kFreeBSD

>Description:
When mounting a filesystem that contains a socket with nullfs, a new
instance of the socket is generated. Processes listening on one instance
of this socket do not respond to connect() requests on the other instance.

>How-To-Repeat:
$ mkdir a b
$ ./server a/sock &
[1] 2093
$ ./client a/sock 
MESSAGE FROM CLIENT: hello from a client
MESSAGE FROM SERVER: hello from the server
$ sudo mount -t nullfs a b
$ ./client b/sock
connect() failed
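
For readers who do not want to decode the attachments, the check above can be approximated in a single process. The following is a minimal sketch, not the submitter's server.c/client.c: the helper names fill_addr() and try_connect_via() are made up for illustration, and the code only tests whether connect() succeeds, without exchanging any messages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un from path; returns -1 if the path does not fit. */
static int
fill_addr(struct sockaddr_un *sa, const char *path)
{
	memset(sa, 0, sizeof(*sa));
	sa->sun_family = AF_UNIX;
	if (strlen(path) >= sizeof(sa->sun_path))
		return (-1);
	strcpy(sa->sun_path, path);
	return (0);
}

/*
 * Bind a listening socket at bind_path, then connect() through
 * connect_path.  Returns 0 if connect() succeeds, -1 otherwise.
 * (Hypothetical helper; it removes bind_path before and after use.)
 */
static int
try_connect_via(const char *bind_path, const char *connect_path)
{
	struct sockaddr_un srv, cli;
	int lfd, cfd, rc = -1;

	assert(bind_path != NULL && connect_path != NULL);
	if (fill_addr(&srv, bind_path) != 0 ||
	    fill_addr(&cli, connect_path) != 0)
		return (-1);

	lfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (lfd < 0)
		return (-1);
	cfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (cfd < 0) {
		close(lfd);
		return (-1);
	}

	unlink(bind_path);
	/* A pending connection in the listen queue is enough for connect(). */
	if (bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) == 0 &&
	    listen(lfd, 5) == 0 &&
	    connect(cfd, (struct sockaddr *)&cli, sizeof(cli)) == 0)
		rc = 0;

	close(cfd);
	close(lfd);
	unlink(bind_path);
	return (rc);
}
```

With a nullfs-mounted on b as in the transcript, try_connect_via("a/sock", "b/sock") would be expected to return -1 on an affected kernel and 0 once both paths reach the same socket. Passing the same path twice should succeed either way.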

>Fix:


>Release-Note:
>Audit-Trail:

From: Robert Millan <rmh@debian.org>
To: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Cc:  
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Date: Thu, 11 Aug 2011 00:01:55 +0200

 --20cf307cfecc166ab004aa2dd44c
 Content-Type: text/plain; charset=UTF-8
 
 Attached are the server.c and client.c I used for the test (found on the net).
 
 -- 
 Robert Millan
 
 --20cf307cfecc166ab004aa2dd44c
 Content-Type: text/x-csrc; charset=US-ASCII; name="client.c"
 Content-Disposition: attachment; filename="client.c"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_gr6ugbu00
 
 I2luY2x1ZGUgPHN0ZGlvLmg+CiNpbmNsdWRlIDxzeXMvc29ja2V0Lmg+CiNpbmNsdWRlIDxzeXMv
 dW4uaD4KI2luY2x1ZGUgPHVuaXN0ZC5oPgojaW5jbHVkZSA8c3RyaW5nLmg+CgppbnQKbWFpbiAo
 aW50IGFyZ2MsIGNoYXIgKiphcmd2KQp7CiAgc3RydWN0IHNvY2thZGRyX3VuIGFkZHJlc3M7CiAg
 aW50IHNvY2tldF9mZCwgbmJ5dGVzOwogIGNoYXIgYnVmZmVyWzI1Nl07CgogIHNvY2tldF9mZCA9
 IHNvY2tldCAoUEZfVU5JWCwgU09DS19TVFJFQU0sIDApOwogIGlmIChzb2NrZXRfZmQgPCAwKQog
 ICAgewogICAgICBwcmludGYgKCJzb2NrZXQoKSBmYWlsZWRcbiIpOwogICAgICByZXR1cm4gMTsK
 ICAgIH0KCiAgLyogc3RhcnQgd2l0aCBhIGNsZWFuIGFkZHJlc3Mgc3RydWN0dXJlICovCiAgbWVt
 c2V0ICgmYWRkcmVzcywgMCwgc2l6ZW9mIChzdHJ1Y3Qgc29ja2FkZHJfdW4pKTsKCiAgYWRkcmVz
 cy5zdW5fZmFtaWx5ID0gQUZfVU5JWDsKICBzbnByaW50ZiAoYWRkcmVzcy5zdW5fcGF0aCwgUEFU
 SF9NQVgsIGFyZ3ZbMV0pOwoKICBpZiAoY29ubmVjdCAoc29ja2V0X2ZkLAoJICAgICAgIChzdHJ1
 Y3Qgc29ja2FkZHIgKikgJmFkZHJlc3MsCgkgICAgICAgc2l6ZW9mIChzdHJ1Y3Qgc29ja2FkZHJf
 dW4pKSAhPSAwKQogICAgewogICAgICBwcmludGYgKCJjb25uZWN0KCkgZmFpbGVkXG4iKTsKICAg
 ICAgcmV0dXJuIDE7CiAgICB9CgogIG5ieXRlcyA9IHNucHJpbnRmIChidWZmZXIsIDI1NiwgImhl
 bGxvIGZyb20gYSBjbGllbnQiKTsKICB3cml0ZSAoc29ja2V0X2ZkLCBidWZmZXIsIG5ieXRlcyk7
 CgogIG5ieXRlcyA9IHJlYWQgKHNvY2tldF9mZCwgYnVmZmVyLCAyNTYpOwogIGJ1ZmZlcltuYnl0
 ZXNdID0gMDsKCiAgcHJpbnRmICgiTUVTU0FHRSBGUk9NIFNFUlZFUjogJXNcbiIsIGJ1ZmZlcik7
 CgogIGNsb3NlIChzb2NrZXRfZmQpOwoKICByZXR1cm4gMDsKfQo=
 --20cf307cfecc166ab004aa2dd44c
 Content-Type: text/x-csrc; charset=US-ASCII; name="server.c"
 Content-Disposition: attachment; filename="server.c"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_gr6uggbr1
 
 I2luY2x1ZGUgPHN0ZGlvLmg+CiNpbmNsdWRlIDxzeXMvc29ja2V0Lmg+CiNpbmNsdWRlIDxzeXMv
 dW4uaD4KI2luY2x1ZGUgPHN5cy90eXBlcy5oPgojaW5jbHVkZSA8dW5pc3RkLmg+CiNpbmNsdWRl
 IDxzdHJpbmcuaD4KCmludApjb25uZWN0aW9uX2hhbmRsZXIgKGludCBjb25uZWN0aW9uX2ZkKQp7
 CiAgaW50IG5ieXRlczsKICBjaGFyIGJ1ZmZlclsyNTZdOwoKICBuYnl0ZXMgPSByZWFkIChjb25u
 ZWN0aW9uX2ZkLCBidWZmZXIsIDI1Nik7CiAgYnVmZmVyW25ieXRlc10gPSAwOwoKICBwcmludGYg
 KCJNRVNTQUdFIEZST00gQ0xJRU5UOiAlc1xuIiwgYnVmZmVyKTsKICBuYnl0ZXMgPSBzbnByaW50
 ZiAoYnVmZmVyLCAyNTYsICJoZWxsbyBmcm9tIHRoZSBzZXJ2ZXIiKTsKICB3cml0ZSAoY29ubmVj
 dGlvbl9mZCwgYnVmZmVyLCBuYnl0ZXMpOwoKICBjbG9zZSAoY29ubmVjdGlvbl9mZCk7CiAgcmV0
 dXJuIDA7Cn0KCmludAptYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCnsKICBzdHJ1Y3Qgc29j
 a2FkZHJfdW4gYWRkcmVzczsKICBpbnQgc29ja2V0X2ZkLCBjb25uZWN0aW9uX2ZkOwogIHNvY2ts
 ZW5fdCBhZGRyZXNzX2xlbmd0aDsKICBwaWRfdCBjaGlsZDsKCiAgc29ja2V0X2ZkID0gc29ja2V0
 IChQRl9VTklYLCBTT0NLX1NUUkVBTSwgMCk7CiAgaWYgKHNvY2tldF9mZCA8IDApCiAgICB7CiAg
 ICAgIHByaW50ZiAoInNvY2tldCgpIGZhaWxlZFxuIik7CiAgICAgIHJldHVybiAxOwogICAgfQoK
 ICB1bmxpbmsgKGFyZ3ZbMV0pOwoKICAvKiBzdGFydCB3aXRoIGEgY2xlYW4gYWRkcmVzcyBzdHJ1
 Y3R1cmUgKi8KICBtZW1zZXQgKCZhZGRyZXNzLCAwLCBzaXplb2YgKHN0cnVjdCBzb2NrYWRkcl91
 bikpOwoKICBhZGRyZXNzLnN1bl9mYW1pbHkgPSBBRl9VTklYOwogIHNucHJpbnRmIChhZGRyZXNz
 LnN1bl9wYXRoLCBQQVRIX01BWCwgYXJndlsxXSk7CgogIGlmIChiaW5kIChzb2NrZXRfZmQsCgkg
 ICAgKHN0cnVjdCBzb2NrYWRkciAqKSAmYWRkcmVzcywgc2l6ZW9mIChzdHJ1Y3Qgc29ja2FkZHJf
 dW4pKSAhPSAwKQogICAgewogICAgICBwcmludGYgKCJiaW5kKCkgZmFpbGVkXG4iKTsKICAgICAg
 cmV0dXJuIDE7CiAgICB9CgogIGlmIChsaXN0ZW4gKHNvY2tldF9mZCwgNSkgIT0gMCkKICAgIHsK
 ICAgICAgcHJpbnRmICgibGlzdGVuKCkgZmFpbGVkXG4iKTsKICAgICAgcmV0dXJuIDE7CiAgICB9
 CgogIHdoaWxlICgoY29ubmVjdGlvbl9mZCA9IGFjY2VwdCAoc29ja2V0X2ZkLAoJCQkJICAoc3Ry
 dWN0IHNvY2thZGRyICopICZhZGRyZXNzLAoJCQkJICAmYWRkcmVzc19sZW5ndGgpKSA+IC0xKQog
 ICAgewogICAgICBjaGlsZCA9IGZvcmsgKCk7CiAgICAgIGlmIChjaGlsZCA9PSAwKQoJewoJICAv
 KiBub3cgaW5zaWRlIG5ld2x5IGNyZWF0ZWQgY29ubmVjdGlvbiBoYW5kbGluZyBwcm9jZXNzICov
 CgkgIHJldHVybiBjb25uZWN0aW9uX2hhbmRsZXIgKGNvbm5lY3Rpb25fZmQpOwoJfQoKICAgICAg
 Lyogc3RpbGwgaW5zaWRlIHNlcnZlciBwcm9jZXNzICovCiAgICAgIGNsb3NlIChjb25uZWN0aW9u
 X2ZkKTsKICAgIH0KCiAgY2xvc2UgKHNvY2tldF9mZCk7CiAgdW5saW5rIChhcmd2WzFdKTsKICBy
 ZXR1cm4gMDsKfQo=
 --20cf307cfecc166ab004aa2dd44c--

From: Robert Millan <rmh@freebsd.org>
To: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Cc: Kostik Belousov <kostikbel@gmail.com>, Adrian Chadd <adrian@freebsd.org>, 
	Josef Karthauser <joe@FreeBSD.org>
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Date: Sat, 24 Sep 2011 17:10:29 +0200

 I found a thread from 2007 with further discussion about this problem:
 
 http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html
 
 There's a proposed solution which might cause consistency problems.
 If I find some time, I'll study the code and try to figure out how to
 do this properly.  No promises though :-)

From: Robert Millan <rmh@freebsd.org>
To: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Cc: Kostik Belousov <kostikbel@gmail.com>, Adrian Chadd <adrian@freebsd.org>, 
	Josef Karthauser <joe@freebsd.org>, freebsd-fs@freebsd.org
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Date: Sun, 25 Sep 2011 17:32:27 +0200

 2011/9/24 Robert Millan <rmh@freebsd.org>:
 > I found a thread from 2007 with further discussion about this problem:
 >
 > http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html
 
 Hi,
 
 I've looked at the situation in a bit more detail, for now only with
 sockets in mind (not named pipes).  My understanding is (please
 correct me if I'm wrong):
 
 - nullfs holds reference counts for each vnode, but sockets have their
 own mechanism for reference counting (so_count / soref / sorele).
 vnode reference counting doesn't protect against socket being closed,
 which would leave a stale pointer in the upper nullfs layer.
 
 - Increasing the reference count of the socket itself can't be done in
 null_nodeget() because this function is merely a getter whose call
 doesn't indicate any meaningful event.
 
 - It's not clear to me that there's any event in time where the socket
 reference can be increased.  If mounting a nullfs were that event,
 then all existing sockets would be soref'ed but we wouldn't be
 soref'ing future sockets created in the lower layer after the mount.
 This doesn't seem correct.
 
 - Possible solution: null_nodeget() semantics are replaced with
 something that actually allows vnodes in the upper layer to be created
 and destroyed.
 
 - Possible solution: upper layer has a memory structure to keep track
 of which sockets in the lower layer have been soref'ed.
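
The stale-pointer concern in the first point can be illustrated with a small userland model. This is a sketch only: the toy_* types and functions are invented here and merely imitate so_count and v_socket, and a flag stands in for freeing the socket so that the example stays well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_socket {
	int so_count;	/* socket's own reference count */
	int so_alive;	/* 0 once the last reference is gone */
};

struct toy_vnode {
	struct toy_socket *v_socket;
	struct toy_vnode *v_lower;
};

/*
 * Create the upper node by copying the lower node's socket pointer,
 * in the spirit of "vp->v_un = lowervp->v_un".  No socket reference
 * is taken, so so_count does not change.
 */
static void
toy_nodeget_copy(struct toy_vnode *upper, struct toy_vnode *lower)
{
	assert(upper != NULL && lower != NULL);
	upper->v_lower = lower;
	upper->v_socket = lower->v_socket;
}

/* Drop one socket reference; the last release marks the socket dead. */
static void
toy_sorele(struct toy_socket *so)
{
	if (--so->so_count == 0)
		so->so_alive = 0;
}

/* The server closes its socket: the lower vnode is detached. */
static void
toy_close(struct toy_vnode *lower)
{
	struct toy_socket *so = lower->v_socket;

	lower->v_socket = NULL;
	if (so != NULL)
		toy_sorele(so);
}

/* Returns 1 if the vnode still points at a socket that is gone. */
static int
toy_is_stale(const struct toy_vnode *vp)
{
	return (vp->v_socket != NULL && !vp->v_socket->so_alive);
}
```

In this model the lower vnode is cleaned up when the socket closes, but the upper vnode keeps its copied pointer, which is exactly the situation vnode reference counting alone does not prevent.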

From: Mikolaj Golub <trociny@freebsd.org>
To: Robert Millan <rmh@freebsd.org>
Cc: FreeBSD-gnats-submit@freebsd.org,  freebsd-bugs@freebsd.org,  Kostik Belousov <kostikbel@gmail.com>,  Josef Karthauser <joe@freebsd.org>,  Adrian Chadd <adrian@freebsd.org>,  freebsd-fs@freebsd.org
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Date: Mon, 26 Sep 2011 00:58:03 +0300

 --=-=-=
 
 Hi,
 
 On Sun, 25 Sep 2011 17:32:27 +0200 Robert Millan wrote:
 
  RM> 2011/9/24 Robert Millan <rmh@freebsd.org>:
  >> I found a thread from 2007 with further discussion about this problem:
  >>
  >> http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html
 
  RM> Hi,
 
  RM> I've looked at the situation in a bit more detail, for now only with
  RM> sockets in mind (not named pipes).  My understanding is (please
  RM> correct me if I'm wrong):
 
  RM> - nullfs holds reference counts for each vnode, but sockets have their
  RM> own mechanism for reference counting (so_count / soref / sorele).
  RM> vnode reference counting doesn't protect against socket being closed,
  RM> which would leave a stale pointer in the upper nullfs layer.
 
  RM> - Increasing the reference count of the socket itself can't be done in
  RM> null_nodeget() because this function is merely a getter whose call
  RM> doesn't indicate any meaningful event.
 
  RM> - It's not clear to me that there's any event in time where the socket
  RM> reference can be increased.  If mounting a nullfs were that event,
  RM> then all existing sockets would be soref'ed but we wouldn't be
  RM> soref'ing future sockets created in the lower layer after the mount.
  RM> This doesn't seem correct.
 
  RM> - Possible solution: null_nodeget() semantics are replaced with
  RM> something that actually allows vnodes in the upper layer to be created
  RM> and destroyed.
 
  RM> - Possible solution: upper layer has a memory structure to keep track
  RM> of which sockets in the lower layer have been soref'ed.
 
 It looks like there is no need to set vp->v_un = lowervp->v_un for
 VFIFO. FIFOs work without this modification: vnode operations are bypassed
 to the lower node, so lowervp->v_un is used.
 
 The issue is only with local sockets, because when bind or connect is called
 on a nullfs file the upper v_un is used.
 
 To me the approach "vp->v_un = lowervp->v_un" has many complications. Maybe
 it is much easier to always use only the lower vnode? What we need for this
 is to make bind and connect get the lower vnode when they are called on a
 nullfs file.
 
 As a proof of concept below is a patch that implements it. Currently I am not
 sure that the vrele/vref magic is done properly, but it looks like it works
 for me.
 
 The issues with this approach I see so far:
 
 - we need an additional flag for namei;
 
 - nullfs can be unmounted while a socket file is still open.
 
 -- 
 Mikolaj Golub
 
 
 --=-=-=
 Content-Type: text/x-patch
 Content-Disposition: inline; filename=nullfs.sockets.patch
 
 Index: sys/sys/namei.h
 ===================================================================
 --- sys/sys/namei.h	(revision 225716)
 +++ sys/sys/namei.h	(working copy)
 @@ -149,7 +149,8 @@ struct nameidata {
  #define	AUDITVNODE1	0x04000000 /* audit the looked up vnode information */
  #define	AUDITVNODE2 	0x08000000 /* audit the looked up vnode information */
  #define	TRAILINGSLASH	0x10000000 /* path ended in a slash */
 -#define	PARAMASK	0x1ffffe00 /* mask of parameter descriptors */
 +#define	LOWERVNODE	0x20000000 /* if it is a stackable fs return lower vnode */
 +#define	PARAMASK	0x3ffffe00 /* mask of parameter descriptors */
  
  #define	NDHASGIANT(NDP)	(((NDP)->ni_cnd.cn_flags & GIANTHELD) != 0)
  
 Index: sys/kern/uipc_usrreq.c
 ===================================================================
 --- sys/kern/uipc_usrreq.c	(revision 225716)
 +++ sys/kern/uipc_usrreq.c	(working copy)
 @@ -493,7 +493,7 @@ uipc_bind(struct socket *so, struct sockaddr *nam,
  
  restart:
  	vfslocked = 0;
 -	NDINIT(&nd, CREATE, MPSAFE | NOFOLLOW | LOCKPARENT | SAVENAME,
 +	NDINIT(&nd, CREATE, MPSAFE | NOFOLLOW | LOCKPARENT | SAVENAME | LOWERVNODE,
  	    UIO_SYSSPACE, buf, td);
  /* SHOULD BE ABLE TO ADOPT EXISTING AND wakeup() ALA FIFO's */
  	error = namei(&nd);
 @@ -1268,7 +1268,7 @@ unp_connect(struct socket *so, struct sockaddr *na
  	UNP_PCB_UNLOCK(unp);
  
  	sa = malloc(sizeof(struct sockaddr_un), M_SONAME, M_WAITOK);
 -	NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW | LOCKLEAF, UIO_SYSSPACE, buf,
 +	NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW | LOCKLEAF | LOWERVNODE, UIO_SYSSPACE, buf,
  	    td);
  	error = namei(&nd);
  	if (error)
 Index: sys/fs/nullfs/null_vnops.c
 ===================================================================
 --- sys/fs/nullfs/null_vnops.c	(revision 225756)
 +++ sys/fs/nullfs/null_vnops.c	(working copy)
 @@ -365,16 +365,40 @@ null_lookup(struct vop_lookup_args *ap)
  			vrele(lvp);
  		} else {
  			error = null_nodeget(dvp->v_mount, lvp, &vp);
 -			if (error)
 +			if (error) {
  				vput(lvp);
 -			else
 -				*ap->a_vpp = vp;
 +			} else if ((flags & LOWERVNODE) != 0) {
 +				vref(lvp);
 +				vrele(vp);
 +				*ap->a_vpp =  lvp;
 +			} else {
 +				*ap->a_vpp =  vp;
 +			}
  		}
  	}
  	return (error);
  }
  
  static int
 +null_create(struct vop_create_args *ap)
 +{
 +	struct componentname *cnp = ap->a_cnp;
 +	int flags = cnp->cn_flags;
 +	int retval;
 +	struct vnode *vp, *lvp;
 +
 +	retval = null_bypass(&ap->a_gen);
 +	if (retval == 0 && (flags & LOWERVNODE) != 0) {
 +		vp = *ap->a_vpp;
 +		lvp = NULLVPTOLOWERVP(vp);
 +		vref(lvp);
 +		vrele(vp);
 +		*ap->a_vpp = lvp;
 +	}
 +	return (retval);
 +}
 +
 +static int
  null_open(struct vop_open_args *ap)
  {
  	int retval;
 @@ -826,6 +850,7 @@ struct vop_vector null_vnodeops = {
  	.vop_accessx =		null_accessx,
  	.vop_advlockpurge =	vop_stdadvlockpurge,
  	.vop_bmap =		VOP_EOPNOTSUPP,
 +	.vop_create =		null_create,
  	.vop_getattr =		null_getattr,
  	.vop_getwritemount =	null_getwritemount,
  	.vop_inactive =		null_inactive,
 
 --=-=-=--

From: Robert Millan <rmh@freebsd.org>
To: Mikolaj Golub <trociny@freebsd.org>
Cc: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org, 
	Kostik Belousov <kostikbel@gmail.com>, Josef Karthauser <joe@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, 
	freebsd-fs@freebsd.org
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Date: Tue, 27 Sep 2011 07:36:28 +0200

 2011/9/25 Mikolaj Golub <trociny@freebsd.org>:
 > As a proof of concept below is a patch that implements it.
 
 This works very well, I'm currently using your patch to run X11 over a
 nullfs-mounted /tmp.
 
 > The issues with this approach I see so far:
 >
 > - we need an additional flag for namei;
 
 What does this involve?

From: Mikolaj Golub <trociny@freebsd.org>
To: Robert Millan <rmh@freebsd.org>
Cc: Mikolaj Golub <trociny@freebsd.org>,  FreeBSD-gnats-submit@freebsd.org,  freebsd-bugs@freebsd.org,  Kostik Belousov <kostikbel@gmail.com>,  Josef Karthauser <joe@freebsd.org>,  Adrian Chadd <adrian@freebsd.org>,  freebsd-fs@freebsd.org
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Date: Tue, 27 Sep 2011 09:49:08 +0300

 --=-=-=
 
 
 On Tue, 27 Sep 2011 07:36:28 +0200 Robert Millan wrote:
 
  RM> 2011/9/25 Mikolaj Golub <trociny@freebsd.org>:
  >> As a proof of concept below is a patch that implements it.
 
  RM> This works very well, I'm currently using your patch to run X11 over a
  RM> nullfs-mounted /tmp.
 
  >> The issues with this approach I see so far:
  >>
  >> - we need an additional flag for namei;
 
  RM> What does this involve?
 
 Well, adding yet another flag just to handle this one case might not be a
 very good idea :-)
 
 But actually it is possible to do without the additional flag, with the only
 hack being in the nullfs code: in lookup and create, return the lower vnode
 if it is a socket, as in the patch below. It works for me, but I have not
 tested it much and have not yet checked whether there are use cases where
 this has an undesirable effect.
 
 -- 
 Mikolaj Golub
 
 
 --=-=-=
 Content-Type: text/x-diff
 Content-Disposition: inline; filename=nullfs.VSOCK.patch
 
 Index: sys/fs/nullfs/null_vnops.c
 ===================================================================
 --- sys/fs/nullfs/null_vnops.c	(revision 225757)
 +++ sys/fs/nullfs/null_vnops.c	(working copy)
 @@ -365,16 +365,38 @@ null_lookup(struct vop_lookup_args *ap)
  			vrele(lvp);
  		} else {
  			error = null_nodeget(dvp->v_mount, lvp, &vp);
 -			if (error)
 +			if (error) {
  				vput(lvp);
 -			else
 +			} else if (vp->v_type == VSOCK) {
 +				vref(lvp);
 +				vrele(vp);
 +				*ap->a_vpp =  lvp;
 +			} else {
  				*ap->a_vpp = vp;
 +			}
  		}
  	}
  	return (error);
  }
  
  static int
 +null_create(struct vop_create_args *ap)
 +{
 +	struct vnode *vp, *lvp;
 +	int retval;
 +
 +	retval = null_bypass(&ap->a_gen);
 +	vp = *ap->a_vpp;
 +	if (retval == 0 && vp->v_type == VSOCK) {
 +		lvp = NULLVPTOLOWERVP(vp);
 +		vref(lvp);
 +		vrele(vp);
 +		*ap->a_vpp = lvp;
 +	}
 +	return (retval);
 +}
 +
 +static int
  null_open(struct vop_open_args *ap)
  {
  	int retval;
 @@ -826,6 +848,7 @@ struct vop_vector null_vnodeops = {
  	.vop_accessx =		null_accessx,
  	.vop_advlockpurge =	vop_stdadvlockpurge,
  	.vop_bmap =		VOP_EOPNOTSUPP,
 +	.vop_create =           null_create,
  	.vop_getattr =		null_getattr,
  	.vop_getwritemount =	null_getwritemount,
  	.vop_inactive =		null_inactive,
 
 --=-=-=--
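
The vref()/vrele() swap that the nullfs.VSOCK patch above performs in null_lookup() and null_create() can be modelled in userland to check the reference bookkeeping. This is a sketch under assumptions: the toy_* names are hypothetical, the counters are plain integers, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

enum toy_vtype { TOY_VREG, TOY_VSOCK };

/* Toy vnode: only the fields needed to follow the reference swap. */
struct toy_vnode {
	enum toy_vtype v_type;
	int v_usecount;
	struct toy_vnode *v_lower;	/* NULL for a lower-layer vnode */
};

static void
toy_vref(struct toy_vnode *vp)
{
	vp->v_usecount++;
}

static void
toy_vrele(struct toy_vnode *vp)
{
	vp->v_usecount--;
}

/*
 * Given the upper vnode a lookup produced (already referenced once),
 * return the vnode the caller should see.  For sockets the reference
 * is moved from the upper vnode to the lower one, as in the patch;
 * every other type is returned unchanged.
 */
static struct toy_vnode *
toy_lookup_result(struct toy_vnode *vp)
{
	struct toy_vnode *lvp;

	assert(vp != NULL);
	if (vp->v_type == TOY_VSOCK && vp->v_lower != NULL) {
		lvp = vp->v_lower;
		toy_vref(lvp);	/* caller now holds the lower vnode */
		toy_vrele(vp);	/* and no longer holds the upper one */
		return (lvp);
	}
	return (vp);
}
```

The model shows that the caller ends up with exactly one extra reference on the lower vnode and none on the upper one. It says nothing about the unmount case raised earlier in the thread.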

From: Robert Millan <rmh@freebsd.org>
To: bug-followup@FreeBSD.org, Mikolaj Golub <trociny@freebsd.org>
Cc:  
Subject: Re: kern/159663: [socket] [nullfs] sockets don't work though nullfs mounts
Date: Sun, 27 Nov 2011 18:44:30 +0100

 Hi Mikolaj,
 
 > But actually it is possible to do without the additional flag, with the only
 > hack in nullfs code: in lookup and create return lower vnode if it is a
 > socket, like in the patch below. It works for me but I have not tested much
 > and not checked yet if use cases are possible when this makes undesirable
 > effect.
 
 I've been using your patch (with 8.1 kernel on amd64) for two months
 now, and didn't notice any ill effects.
 
 Do you have plans on committing it?

From: Mikolaj Golub <to.my.trociny@gmail.com>
To: Robert Millan <rmh@freebsd.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/159663: [socket] [nullfs] sockets don't work though nullfs mounts
Date: Mon, 28 Nov 2011 22:49:02 +0200

 On Sun, 27 Nov 2011 18:44:30 +0100 Robert Millan wrote:
 
  RM> Hi Mikolaj,
 
 Hi
 
  >> But actually it is possible to do without the additional flag, with the only
  >> hack in nullfs code: in lookup and create return lower vnode if it is a
  >> socket, like in the patch below. It works for me but I have not tested much
  >> and not checked yet if use cases are possible when this makes undesirable
  >> effect.
 
  RM> I've been using your patch (with 8.1 kernel on amd64) for two months
  RM> now, and didn't notice any ill effects.
 
  RM> Do you have plans on committing it?
 
 Thanks for testing!
 
 I wouldn't like to commit it without trying to find a better solution. After
 getting some suggestions from kib@, I hope I will be able to come up with
 something better than this patch.
 
 -- 
 Mikolaj Golub
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Jan 17 22:13:29 UTC 2012 
Responsible-Changed-Why:  

Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=159663 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/159663: commit references a PR
Date: Wed, 29 Feb 2012 21:38:57 +0000 (UTC)

 Author: trociny
 Date: Wed Feb 29 21:38:31 2012
 New Revision: 232317
 URL: http://svn.freebsd.org/changeset/base/232317
 
 Log:
   Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
   operations for setting and accessing vnode's v_socket field.
   
   The operations are necessary to implement proper unix socket handling
   on layered file systems like nullfs(5).
   
   This change fixes the long standing issue with nullfs(5) being in that
   unix sockets did not work between lower and upper layers: if we bound
   to a socket on the lower layer we could connect only to the lower
   path; if we bound to the upper layer we could connect only to the
   upper path. The new behavior is one can connect to both the lower and
   the upper paths regardless what layer path one binds to.
   
   PR:		kern/51583, kern/159663
   Suggested by:	kib
   Reviewed by:	arch
   MFC after:	2 weeks
 
 Modified:
   head/UPDATING
   head/sys/kern/uipc_usrreq.c
   head/sys/kern/vfs_default.c
   head/sys/kern/vnode_if.src
   head/sys/sys/vnode.h
 
 Modified: head/UPDATING
 ==============================================================================
 --- head/UPDATING	Wed Feb 29 21:11:02 2012	(r232316)
 +++ head/UPDATING	Wed Feb 29 21:38:31 2012	(r232317)
 @@ -22,6 +22,14 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10
  	machines to maximize performance.  (To disable malloc debugging, run
  	ln -s aj /etc/malloc.conf.)
  
 +20120229:
 +	Now unix domain sockets behave "as expected" on	nullfs(5). Previously
 +	nullfs(5) did not pass through all behaviours to the underlying layer,
 +	as a result if we bound to a socket on the lower layer we could connect
 +	only to the lower path; if we bound to the upper layer we could connect
 +	only to	the upper path. The new behavior is one can connect to both the
 +	lower and the upper paths regardless what layer path one binds to.
 +
  20120211:
  	The getifaddrs upgrade path broken with 20111215 has been restored.
  	If you have upgraded in between 20111215 and 20120209 you need to
 
 Modified: head/sys/kern/uipc_usrreq.c
 ==============================================================================
 --- head/sys/kern/uipc_usrreq.c	Wed Feb 29 21:11:02 2012	(r232316)
 +++ head/sys/kern/uipc_usrreq.c	Wed Feb 29 21:38:31 2012	(r232317)
 @@ -542,7 +542,7 @@ restart:
  
  	UNP_LINK_WLOCK();
  	UNP_PCB_LOCK(unp);
 -	vp->v_socket = unp->unp_socket;
 +	VOP_UNP_BIND(vp, unp->unp_socket);
  	unp->unp_vnode = vp;
  	unp->unp_addr = soun;
  	unp->unp_flags &= ~UNP_BINDING;
 @@ -638,7 +638,7 @@ uipc_detach(struct socket *so)
  	 * XXXRW: Should assert vp->v_socket == so.
  	 */
  	if ((vp = unp->unp_vnode) != NULL) {
 -		unp->unp_vnode->v_socket = NULL;
 +		VOP_UNP_DETACH(vp);
  		unp->unp_vnode = NULL;
  	}
  	unp2 = unp->unp_conn;
 @@ -1308,7 +1308,7 @@ unp_connect(struct socket *so, struct so
  	 * and to protect simultaneous locking of multiple pcbs.
  	 */
  	UNP_LINK_WLOCK();
 -	so2 = vp->v_socket;
 +	VOP_UNP_CONNECT(vp, &so2);
  	if (so2 == NULL) {
  		error = ECONNREFUSED;
  		goto bad2;
 @@ -2318,17 +2318,15 @@ vfs_unp_reclaim(struct vnode *vp)
  
  	active = 0;
  	UNP_LINK_WLOCK();
 -	so = vp->v_socket;
 +	VOP_UNP_CONNECT(vp, &so);
  	if (so == NULL)
  		goto done;
  	unp = sotounpcb(so);
  	if (unp == NULL)
  		goto done;
  	UNP_PCB_LOCK(unp);
 -	if (unp->unp_vnode != NULL) {
 -		KASSERT(unp->unp_vnode == vp,
 -		    ("vfs_unp_reclaim: vp != unp->unp_vnode"));
 -		vp->v_socket = NULL;
 +	if (unp->unp_vnode == vp) {
 +		VOP_UNP_DETACH(vp);
  		unp->unp_vnode = NULL;
  		active = 1;
  	}
 
 Modified: head/sys/kern/vfs_default.c
 ==============================================================================
 --- head/sys/kern/vfs_default.c	Wed Feb 29 21:11:02 2012	(r232316)
 +++ head/sys/kern/vfs_default.c	Wed Feb 29 21:38:31 2012	(r232317)
 @@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = {
  	.vop_unlock =		vop_stdunlock,
  	.vop_vptocnp =		vop_stdvptocnp,
  	.vop_vptofh =		vop_stdvptofh,
 +	.vop_unp_bind =		vop_stdunp_bind,
 +	.vop_unp_connect =	vop_stdunp_connect,
 +	.vop_unp_detach =	vop_stdunp_detach,
  };
  
  /*
 @@ -1037,6 +1040,30 @@ vop_stdadvise(struct vop_advise_args *ap
  	return (error);
  }
  
 +int
 +vop_stdunp_bind(struct vop_unp_bind_args *ap)
 +{
 +
 +	ap->a_vp->v_socket = ap->a_socket;
 +	return (0);
 +}
 +
 +int
 +vop_stdunp_connect(struct vop_unp_connect_args *ap)
 +{
 +
 +	*ap->a_socket = ap->a_vp->v_socket;
 +	return (0);
 +}
 +
 +int
 +vop_stdunp_detach(struct vop_unp_detach_args *ap)
 +{
 +
 +	ap->a_vp->v_socket = NULL;
 +	return (0);
 +}
 +
  /*
   * vfs default ops
   * used to fill the vfs function table to get reasonable default return values.
 
 Modified: head/sys/kern/vnode_if.src
 ==============================================================================
 --- head/sys/kern/vnode_if.src	Wed Feb 29 21:11:02 2012	(r232316)
 +++ head/sys/kern/vnode_if.src	Wed Feb 29 21:38:31 2012	(r232317)
 @@ -640,6 +640,26 @@ vop_advise {
  	IN int advice;
  };
  
 +%% unp_bind	vp	E E E
 +
 +vop_unp_bind {
 +	IN struct vnode *vp;
 +	IN struct socket *socket;
 +};
 +
 +%% unp_connect	vp	L L L
 +
 +vop_unp_connect {
 +	IN struct vnode *vp;
 +	OUT struct socket **socket;
 +};
 +
 +%% unp_detach	vp	= = =
 +
 +vop_unp_detach {
 +	IN struct vnode *vp;
 +};
 +
  # The VOPs below are spares at the end of the table to allow new VOPs to be
  # added in stable branches without breaking the KBI.  New VOPs in HEAD should
  # be added above these spares.  When merging a new VOP to a stable branch,
 
 Modified: head/sys/sys/vnode.h
 ==============================================================================
 --- head/sys/sys/vnode.h	Wed Feb 29 21:11:02 2012	(r232316)
 +++ head/sys/sys/vnode.h	Wed Feb 29 21:38:31 2012	(r232317)
 @@ -703,6 +703,9 @@ int	vop_stdpathconf(struct vop_pathconf_
  int	vop_stdpoll(struct vop_poll_args *);
  int	vop_stdvptocnp(struct vop_vptocnp_args *ap);
  int	vop_stdvptofh(struct vop_vptofh_args *ap);
 +int	vop_stdunp_bind(struct vop_unp_bind_args *ap);
 +int	vop_stdunp_connect(struct vop_unp_connect_args *ap);
 +int	vop_stdunp_detach(struct vop_unp_detach_args *ap);
  int	vop_eopnotsupp(struct vop_generic_args *ap);
  int	vop_ebadf(struct vop_generic_args *ap);
  int	vop_einval(struct vop_generic_args *ap);
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
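
As a rough userland model of the idea behind this commit: once v_socket is reached only through per-vnode operations, a layered file system can forward those operations to the lower vnode, so both paths resolve to one socket. Everything named toy_* below is hypothetical. The std_* functions mirror the vop_stdunp_*() defaults in the diff, while the layer_* forwarding functions are an assumption about how a stacking layer such as nullfs would route the calls, not code from the commit.

```c
#include <assert.h>
#include <stddef.h>

struct toy_socket { int id; };
struct toy_vnode;

/* Toy operations vector with the three new operations. */
struct toy_vops {
	void (*unp_bind)(struct toy_vnode *, struct toy_socket *);
	void (*unp_connect)(struct toy_vnode *, struct toy_socket **);
	void (*unp_detach)(struct toy_vnode *);
};

struct toy_vnode {
	const struct toy_vops *v_op;
	struct toy_socket *v_socket;
	struct toy_vnode *v_lower;	/* set only on a layered vnode */
};

/* Defaults: operate on the vnode's own v_socket field. */
static void
std_bind(struct toy_vnode *vp, struct toy_socket *so)
{
	vp->v_socket = so;
}

static void
std_connect(struct toy_vnode *vp, struct toy_socket **sop)
{
	*sop = vp->v_socket;
}

static void
std_detach(struct toy_vnode *vp)
{
	vp->v_socket = NULL;
}

/* Layered variants (assumed): pass the call down to the lower vnode. */
static void
layer_bind(struct toy_vnode *vp, struct toy_socket *so)
{
	assert(vp->v_lower != NULL);
	vp->v_lower->v_op->unp_bind(vp->v_lower, so);
}

static void
layer_connect(struct toy_vnode *vp, struct toy_socket **sop)
{
	assert(vp->v_lower != NULL);
	vp->v_lower->v_op->unp_connect(vp->v_lower, sop);
}

static void
layer_detach(struct toy_vnode *vp)
{
	assert(vp->v_lower != NULL);
	vp->v_lower->v_op->unp_detach(vp->v_lower);
}

static const struct toy_vops toy_std_vops = {
	std_bind, std_connect, std_detach
};
static const struct toy_vops toy_layer_vops = {
	layer_bind, layer_connect, layer_detach
};
```

Binding through the upper vnode stores the socket on the lower vnode only, so a later connect through either vnode finds the same socket, and a detach through either vnode clears it for both.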
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/159663: commit references a PR
Date: Tue, 24 Apr 2012 19:08:51 +0000 (UTC)

 Author: trociny
 Date: Tue Apr 24 19:08:40 2012
 New Revision: 234660
 URL: http://svn.freebsd.org/changeset/base/234660
 
 Log:
   MFC r232317:
   
   Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
   operations for setting and accessing vnode's v_socket field.
   
   The operations are necessary to implement proper unix socket handling
   on layered file systems like nullfs(5).
   
   This change fixes the long standing issue with nullfs(5) being in that
   unix sockets did not work between lower and upper layers: if we bound
   to a socket on the lower layer we could connect only to the lower
   path; if we bound to the upper layer we could connect only to the
   upper path. The new behavior is one can connect to both the lower and
   the upper paths regardless what layer path one binds to.
   
   PR:		kern/51583, kern/159663
   Suggested by:	kib
   Reviewed by:	arch
 
 Modified:
   stable/9/UPDATING   (contents, props changed)
   stable/9/sys/kern/uipc_usrreq.c
   stable/9/sys/kern/vfs_default.c
   stable/9/sys/kern/vnode_if.src
   stable/9/sys/sys/vnode.h
 Directory Properties:
   stable/9/sys/   (props changed)
 
 Modified: stable/9/UPDATING
 ==============================================================================
 --- stable/9/UPDATING	Tue Apr 24 19:00:42 2012	(r234659)
 +++ stable/9/UPDATING	Tue Apr 24 19:08:40 2012	(r234660)
 @@ -9,6 +9,14 @@ handbook.
  Items affecting the ports and packages system can be found in
  /usr/ports/UPDATING.  Please read that file before running portupgrade.
  
 +20120422:
 +	Now unix domain sockets behave "as expected" on	nullfs(5). Previously
 +	nullfs(5) did not pass through all behaviours to the underlying layer,
 +	as a result if we bound to a socket on the lower layer we could connect
 +	only to the lower path; if we bound to the upper layer we could connect
 +	only to	the upper path. The new behavior is one can connect to both the
 +	lower and the upper paths regardless what layer path one binds to.
 +
  20120109:
  	The acpi_wmi(4) status device /dev/wmistat has been renamed to
  	/dev/wmistat0.
 
 Modified: stable/9/sys/kern/uipc_usrreq.c
 ==============================================================================
 --- stable/9/sys/kern/uipc_usrreq.c	Tue Apr 24 19:00:42 2012	(r234659)
 +++ stable/9/sys/kern/uipc_usrreq.c	Tue Apr 24 19:08:40 2012	(r234660)
 @@ -541,7 +541,7 @@ restart:
  
  	UNP_LINK_WLOCK();
  	UNP_PCB_LOCK(unp);
 -	vp->v_socket = unp->unp_socket;
 +	VOP_UNP_BIND(vp, unp->unp_socket);
  	unp->unp_vnode = vp;
  	unp->unp_addr = soun;
  	unp->unp_flags &= ~UNP_BINDING;
 @@ -637,7 +637,7 @@ uipc_detach(struct socket *so)
  	 * XXXRW: Should assert vp->v_socket == so.
  	 */
  	if ((vp = unp->unp_vnode) != NULL) {
 -		unp->unp_vnode->v_socket = NULL;
 +		VOP_UNP_DETACH(vp);
  		unp->unp_vnode = NULL;
  	}
  	unp2 = unp->unp_conn;
 @@ -1307,7 +1307,7 @@ unp_connect(struct socket *so, struct so
  	 * and to protect simultaneous locking of multiple pcbs.
  	 */
  	UNP_LINK_WLOCK();
 -	so2 = vp->v_socket;
 +	VOP_UNP_CONNECT(vp, &so2);
  	if (so2 == NULL) {
  		error = ECONNREFUSED;
  		goto bad2;
 @@ -2317,17 +2317,15 @@ vfs_unp_reclaim(struct vnode *vp)
  
  	active = 0;
  	UNP_LINK_WLOCK();
 -	so = vp->v_socket;
 +	VOP_UNP_CONNECT(vp, &so);
  	if (so == NULL)
  		goto done;
  	unp = sotounpcb(so);
  	if (unp == NULL)
  		goto done;
  	UNP_PCB_LOCK(unp);
 -	if (unp->unp_vnode != NULL) {
 -		KASSERT(unp->unp_vnode == vp,
 -		    ("vfs_unp_reclaim: vp != unp->unp_vnode"));
 -		vp->v_socket = NULL;
 +	if (unp->unp_vnode == vp) {
 +		VOP_UNP_DETACH(vp);
  		unp->unp_vnode = NULL;
  		active = 1;
  	}
 
 Modified: stable/9/sys/kern/vfs_default.c
 ==============================================================================
 --- stable/9/sys/kern/vfs_default.c	Tue Apr 24 19:00:42 2012	(r234659)
 +++ stable/9/sys/kern/vfs_default.c	Tue Apr 24 19:08:40 2012	(r234660)
 @@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = {
  	.vop_unlock =		vop_stdunlock,
  	.vop_vptocnp =		vop_stdvptocnp,
  	.vop_vptofh =		vop_stdvptofh,
 +	.vop_unp_bind =		vop_stdunp_bind,
 +	.vop_unp_connect =	vop_stdunp_connect,
 +	.vop_unp_detach =	vop_stdunp_detach,
  };
  
  /*
 @@ -1037,6 +1040,30 @@ vop_stdadvise(struct vop_advise_args *ap
  	return (error);
  }
  
 +int
 +vop_stdunp_bind(struct vop_unp_bind_args *ap)
 +{
 +
 +	ap->a_vp->v_socket = ap->a_socket;
 +	return (0);
 +}
 +
 +int
 +vop_stdunp_connect(struct vop_unp_connect_args *ap)
 +{
 +
 +	*ap->a_socket = ap->a_vp->v_socket;
 +	return (0);
 +}
 +
 +int
 +vop_stdunp_detach(struct vop_unp_detach_args *ap)
 +{
 +
 +	ap->a_vp->v_socket = NULL;
 +	return (0);
 +}
 +
  /*
   * vfs default ops
   * used to fill the vfs function table to get reasonable default return values.
 
 Modified: stable/9/sys/kern/vnode_if.src
 ==============================================================================
 --- stable/9/sys/kern/vnode_if.src	Tue Apr 24 19:00:42 2012	(r234659)
 +++ stable/9/sys/kern/vnode_if.src	Tue Apr 24 19:08:40 2012	(r234660)
 @@ -640,23 +640,31 @@ vop_advise {
  	IN int advice;
  };
  
 -# The VOPs below are spares at the end of the table to allow new VOPs to be
 -# added in stable branches without breaking the KBI.  New VOPs in HEAD should
 -# be added above these spares.  When merging a new VOP to a stable branch,
 -# the new VOP should replace one of the spares.
 +%% unp_bind	vp	E E E
  
 -vop_spare1 {
 +vop_unp_bind {
  	IN struct vnode *vp;
 +	IN struct socket *socket;
  };
  
 -vop_spare2 {
 +%% unp_connect	vp	L L L
 +
 +vop_unp_connect {
  	IN struct vnode *vp;
 +	OUT struct socket **socket;
  };
  
 -vop_spare3 {
 +%% unp_detach	vp	= = =
 +
 +vop_unp_detach {
  	IN struct vnode *vp;
  };
  
 +# The VOPs below are spares at the end of the table to allow new VOPs to be
 +# added in stable branches without breaking the KBI.  New VOPs in HEAD should
 +# be added above these spares.  When merging a new VOP to a stable branch,
 +# the new VOP should replace one of the spares.
 +
  vop_spare4 {
  	IN struct vnode *vp;
  };
 
 Modified: stable/9/sys/sys/vnode.h
 ==============================================================================
 --- stable/9/sys/sys/vnode.h	Tue Apr 24 19:00:42 2012	(r234659)
 +++ stable/9/sys/sys/vnode.h	Tue Apr 24 19:08:40 2012	(r234660)
 @@ -707,6 +707,9 @@ int	vop_stdpathconf(struct vop_pathconf_
  int	vop_stdpoll(struct vop_poll_args *);
  int	vop_stdvptocnp(struct vop_vptocnp_args *ap);
  int	vop_stdvptofh(struct vop_vptofh_args *ap);
 +int	vop_stdunp_bind(struct vop_unp_bind_args *ap);
 +int	vop_stdunp_connect(struct vop_unp_connect_args *ap);
 +int	vop_stdunp_detach(struct vop_unp_detach_args *ap);
  int	vop_eopnotsupp(struct vop_generic_args *ap);
  int	vop_ebadf(struct vop_generic_args *ap);
  int	vop_einval(struct vop_generic_args *ap);
 
State-Changed-From-To: open->closed 
State-Changed-By: trociny 
State-Changed-When: Tue Apr 24 19:26:09 UTC 2012 
State-Changed-Why:  
The fix was merged to stable/9.

http://www.freebsd.org/cgi/query-pr.cgi?pr=159663 
>Unformatted:
