From nobody@FreeBSD.ORG  Tue Dec 14 00:24:05 1999
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 150DD14A2D; Tue, 14 Dec 1999 00:24:05 -0800 (PST)
Message-Id: <19991214082405.150DD14A2D@hub.freebsd.org>
Date: Tue, 14 Dec 1999 00:24:05 -0800 (PST)
From: str@giganda.komkon.org
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@freebsd.org
Subject: incorrect utmp/wtmp records update upon connection being interrupted
X-Send-Pr-Version: www-1.0

>Number:         15478
>Category:       kern
>Synopsis:       incorrect utmp/wtmp records update upon connection being interrupted
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          suspended
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 14 00:30:00 PST 1999
>Closed-Date:    
>Last-Modified:  Sat Sep 11 22:55:24 GMT 2004
>Originator:     Igor Roshchin
>Release:        3.x-stable
>Organization:
KomKon
>Environment:
FreeBSD 3.3-RELEASE i386
FreeBSD 3.1-STABLE i386
>Description:
1. The record in the wtmp file is not updated properly.
This seems to happen when the connection dies (e.g. "reset by peer"),
say, when a person connected from a PPP-via-dialup host gets kicked
off by the modem and the connection is not closed properly.

2. The utmp record is not updated either.
So, "w" shows a person being logged in, even though there are no processes
running on that tty. (accordingly w shows "-" as a current process).
When somebody else logs in on that tty, the utmp record is updated,
but not the wtmp one.

In most cases, if not all, the affected users use "screen".
I am not sure whether the use of screen is a necessary condition or just
a coincidence.

Additional information:
I've got responses that the problem is also observed in
3.3-STABLE (Nov 16) & 4.0-CURRENT (Sept 29)
and also without screen, but rather due to
WindowMaker unconditionally killing rxvt (from Will Andrews).
(" I have X11 + WindowMaker setup to
run a rxvt w/ top & xtail /var/log whenever it starts up.")
In that case, "w" shows incorrect idle time which might be
even greater than the uptime.

>How-To-Repeat:

I am not sure if it works every time, but..

Log in (via telnet or ssh) from a dialup PPP host and
reattach a running screen session.
Then abruptly disconnect the modem.

Also, suggested by Will Andrews:
(From his e-mail)

=========
 I have X11 + WindowMaker setup to
run a rxvt w/ top & xtail /var/log whenever it starts up. I never kill these
apps, so WindowMaker does the job. Unfortunately, the utmp & wtmp logs are
affected as you say above:

<2 5001-0> (99-12-11 17:02:42) [will@shadow ~]% w
 5:02PM  up 6 days, 19:40, 9 users, load averages: 1.02, 1.08, 1.07
USER             TTY      FROM              LOGIN@  IDLE WHAT
will             v0       -                12:59PM  4:03 xinit /home/will/.xini
will             p0       unix:0           12:59PM 6days top
will             p1       unix:0           12:59PM 27days xtail /var/log

Note that ttyp1's idle time is 27 days whereas my system uptime is only 6 days.
Also note that I've only been running X for 4 hours. Because WindowMaker
unconditionally kills these rxvt's, the utmp & wtmp files are not cleaned up
properly, and I get a result like the above.
==========
>Fix:

Check the utmp/wtmp related functions...
Sorry, I don't have a better clue.

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: mike 
State-Changed-When: Sat Jul 21 11:20:43 PDT 2001 
State-Changed-Why:  

Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=15478 

From: Mike Barcroft <mike@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Mon, 23 Jul 2001 12:11:08 -0400

 Adding to Audit-Trail.
 
 ----- Forwarded message from Igor Roshchin <str@giganda.komkon.org> -----
 
 Delivered-To: mike@freebsd.org
 Date: Mon, 23 Jul 2001 02:06:33 -0400 (EDT)
 From: Igor Roshchin <str@giganda.komkon.org>
 To: freebsd-bugs@FreeBSD.org, mike@FreeBSD.org
 Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
 In-Reply-To: <200107211820.f6LIKsA19064@freefall.freebsd.org>
 
 > From mike@FreeBSD.org Sat Jul 21 14:24:59 2001
 > Date: Sat, 21 Jul 2001 11:20:54 -0700 (PDT)
 > From: <mike@FreeBSD.org>
 > To: str@giganda.komkon.org, mike@FreeBSD.org, freebsd-bugs@FreeBSD.org
 > Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
 >
 > Synopsis: incorrect utmp/wtmp records update upon connection being interrupted
 >
 > State-Changed-From-To: open->feedback
 > State-Changed-By: mike
 > State-Changed-When: Sat Jul 21 11:20:43 PDT 2001
 > State-Changed-Why: 
 >
 > Does this problem still occur in newer versions of FreeBSD,
 > such as 4.3-RELEASE?
 >
 > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=15478
 >
 
 Although I have not been paying close attention to this problem recently,
 a quick look at a wtmp file shows that it still happens on a
 4.3-RELEASE box.
 
 
 Igor
 
 
 ----- End forwarded message -----
State-Changed-From-To: feedback->suspended 
State-Changed-By: mike 
State-Changed-When: Mon Jul 23 16:47:45 PDT 2001 
State-Changed-Why:  

This is still a problem.  Awaiting fix and committer. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=15478 

From: "Yar Tikhiy" <yar@comp.chem.msu.su>
To: <freebsd-gnats-submit@FreeBSD.org>, <str@giganda.komkon.org>
Cc:  
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Thu, 11 Oct 2001 17:57:08 +0400

 As far as I can see, it's hardly a FreeBSD problem. It's the consequence of
 some applications (e.g. xterm, rxvt, screen) modifying utmp bogusly.
 
 Rxvt and xterm just can't clean up if killed unconditionally.
 
 As for screen, it does a Very Bad Thing: It takes a user record out of utmp
 at startup. Of course, /sbin/init then won't add a logout record to wtmp if
 the session gets aborted. If I were in your shoes, I'd report that to the
 author of screen.
 
 The instant cure is to remove the set-uid bit from the programs so they
 won't mess utmp up.
 
 If you don't mind, I'd rather close this PR since it has nothing to do
 with FreeBSD.
 

From: Igor Roshchin <str@giganda.komkon.org>
To: freebsd-gnats-submit@FreeBSD.org, str@giganda.komkon.org,
	yar@comp.chem.msu.su
Cc:  
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Thu, 11 Oct 2001 12:33:27 -0400 (EDT)

 > From yar@comp.chem.msu.su Thu Oct 11 09:57:05 2001
 > From: "Yar Tikhiy" <yar@comp.chem.msu.su>
 > To: <freebsd-gnats-submit@FreeBSD.org>, <str@giganda.komkon.org>
 > Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
 > Date: Thu, 11 Oct 2001 17:57:08 +0400
 >
 > As I can see, it's hardly a FreeBSD problem. It's the consequence of some
 > applications (e.g. xterm, rxvt, screen) modifying utmp bogusly.
 >
 > Rxvt and xterm just can't clean up if killed unconditionally.
 >
 > As for screen, it does a Very Bad Thing: It takes a user record out of utmp
 > at startup. Of course, /sbin/init then won't add a logout record to wtmp if
 > the
 > session gets aborted. If I were in your shoes, I'd report that to the author
 > of screen.
 >
 > The instant cure is to remove the set-uid bit from the programs so they
 > won't
 > mess utmp up.
 >
 > If you don't mind, I'd rather close this PR since it has to do nothing
 > with FreeBSD.
 >
 
 First of all, I am not completely sure whether part 2 of the initial PR
 (the one concerning utmp) still happens on 4.x systems, while part 1
 (concerning wtmp) is definitely still valid.
 I just don't have conclusive evidence that part 2 is gone completely.
 
 If part 2 is still present, i.e. utmp holds a record for a supposedly
 logged-in user while no processes are attached to the tty in question,
 then your comments about screen and its behavior do not describe the
 problem completely.
 
 Removing the set-uid bit is not a good thing. It prevents you from seeing
 the users on the computer with w(1).
 
 Although I see your point, at this moment I am not convinced that this is an
 application-only problem. I believe the system should be able to correct 
 both utmp and wtmp. Thus, I don't think this PR should be closed without
 a proper fix.
 
 From init(8):
      In multi-user operation, init maintains processes for the terminal ports
      found in the file ttys(5). ..
      ... getty opens
      and initializes the tty line and executes the login(1) program.  The
      login program, when a valid user logs in, executes a shell for that user.
      When this shell dies, either because the user logged out or an abnormal
      termination occurred (a signal), the init program wakes up, deletes the
      user from the utmp(5) file of current users and records the logout in the
      wtmp(5) file.  The cycle is then restarted by init executing a new getty
      for the line.
 
 In the initially reported behavior, the utmp record of a user disconnected
 was present until somebody else would log in onto the same tty. Then
 the utmp would get cleared (by init ?), while wtmp record still wouldn't be
 closed (i.e. logout record is not added).
 
 
 Again, the part "2" of the initial PR 
 might've been fixed in the recent releases.
 Regardless whether that part was fixed or not,
 how about some type of check-and-cleanup procedure in init, when it
 regains the ownership of the tty , and checks if there is a record
 in utmp, and if not, just adds a logout record to utmp ?
 
 
 Igor
 
 
 PS. This wtmp-related bug reveals a bug in last(1)
 Interestingly enough, last(1) , depending on the invocation, 
 behaves differently.  Note, that "user" is no longer logged in.
 
 machine: [12:13] [651] ~>last | grep user
 user         ttypi    64.152.168.61    Thu Oct 11 00:38 - 02:23  (01:44)
 user         ttypt    63.210.212.179   Wed Oct 10 23:19   still logged in
 user         ttypm    209.246.81.251   Tue Oct  9 22:00 - 22:48  (00:47)
 user         ttyq3    209.246.91.243   Thu Oct  4 21:33   still logged in
 user         ttypc    64.152.174.71    Wed Oct  3 18:13 - 18:58  (00:44)
 machine: [12:13] [652] ~>
 machine: [12:13] [652] ~>last -10 user
 user         ttypi    64.152.168.61    Thu Oct 11 00:38 - 02:23  (01:44)
 user         ttypt    63.210.212.179   Wed Oct 10 23:19 - 09:18  (09:59)
 user         ttypm    209.246.81.251   Tue Oct  9 22:00 - 22:48  (00:47)
 user         ttyq3    209.246.91.243   Thu Oct  4 21:33 - 17:09  (19:35)
 ^C
 interrupted Thu Oct  4 07:19
 
 In the second case it is reporting the logout
 time of the next user (user2) logged on the same tty later:
 user2        ttypt    130.91.163.208   Thu Oct 11 09:12 - 09:18  (00:06)
 
 This is a bug in last(1)
 

From: Yar Tikhiy <yar@FreeBSD.org>
To: Igor Roshchin <str@giganda.komkon.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Fri, 12 Oct 2001 13:39:57 +0400

 On Thu, Oct 11, 2001 at 12:33:27PM -0400, Igor Roshchin wrote:
 > 
 > In a presence of the part "2", when utmp has a record of a currently logged
 > user when there are no processes related to the tty in question,
 >  your comments about screen and its behavior do not have a complete
 > description of the problem.
 
 Does screen produce such dead utmp records?  I interpreted your message
 that it was rxvt that caused the "zombie" records in utmp.
  
 > Removing the set-uid bit is not a good thing. It prevents you from seeing
 > the users on the computer with w(1).
 
 No it doesn't.  If you remove the suid bit from screen, you'll just see
 a user's initial login instead of his screen sessions.
  
 > Although I see your point, at this moment I am not convinced that this an
 > application-only problem. I believe the system should be able to correct 
 > both utmp and wtmp. Thus, I don't think this PR should be closed without
 > a proper fix.
 
 I don't see a clear way of fixing the wtmp file in such a case.
 Additionally, I believe an operating system cannot have a tool/code
 to fix every breakage that a lame program run with the superuser rights
 may introduce.
  
 > In the initially reported behavior, the utmp record of a user disconnected
 > was present until somebody else would log in onto the same tty. Then
 > the utmp would get cleared (by init ?), while wtmp record still wouldn't be
 > closed (i.e. logout record is not added).
  
 init, sshd, and telnetd effectively move a utmp record
 from the utmp file to the wtmp file.  If there's no record
 in the utmp file for the tty, these programs can do nothing
 about fixing the wtmp file.  And screen removes the utmp
 record for the user when started.  If screen didn't, the
 problem wouldn't appear at all.
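
The "move a utmp record to wtmp" behaviour described above can be modelled
with a small in-memory sketch. This is only an illustration, not the code of
init, sshd or telnetd: the names Record, login and logout are hypothetical,
utmp is modelled as a dict keyed by tty, wtmp as an append-only list, and an
empty user name is assumed to mark a logout record.

```python
from dataclasses import dataclass


@dataclass
class Record:
    line: str  # tty name, e.g. "ttyp1"
    name: str  # user name; an empty name marks a logout record
    time: int  # seconds since the epoch


def login(utmp, wtmp, line, name, now):
    """Record a login in both the current-users table and the log."""
    rec = Record(line, name, now)
    utmp[line] = rec
    wtmp.append(rec)


def logout(utmp, wtmp, line, now):
    """Close the session on `line` only if utmp still has an entry for it."""
    if utmp.pop(line, None) is None:
        return False  # no utmp entry to move, so wtmp is left untouched
    wtmp.append(Record(line, "", now))
    return True


utmp, wtmp = {}, []

# Normal session: the entry is still in utmp when the session ends.
login(utmp, wtmp, "ttyp0", "will", 100)
closed_normally = logout(utmp, wtmp, "ttyp0", 200)

# Session whose utmp entry was deleted by another program before it ended.
login(utmp, wtmp, "ttyp1", "user", 300)
del utmp["ttyp1"]  # stands in for the removal attributed to screen above
closed_after_removal = logout(utmp, wtmp, "ttyp1", 400)

print(closed_normally, closed_after_removal, len(wtmp))
```

In this model the second session leaves a login record in wtmp with no
matching logout record, which is the symptom reported in part 1 of the PR.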
  
 > Again, the part "2" of the initial PR 
 > might've been fixed in the recent releases.
 
 I'd like to repeat once more: The problem hardly can be fixed
 in FreeBSD itself.
 
 > Regardless whether that part was fixed or not,
 > how about some type of check-and-cleanup procedure in init, when it
 > regains the ownership of the tty , and checks if there is a record
 > in utmp, and if not, just adds a logout record to utmp ?
 						    ^^^^ wtmp? 
 
 The actual logic of init (and of sshd and telnetd) is the opposite
 of what you propose: init won't add a logout record to wtmp if
 there is no record in utmp for the tty at the time of a logout.
 That's because programs run by init don't need to record logins
 in utmp--some of them may have nothing to do with logins/logouts.
  
 > PS. This wtmp-related bug reveals a bug in last(1)
 > Interestingly enough, last(1) , depending on the invocation, 
 > behaves differently.  Note, that "user" is no longer logged in.
 > 
 > machine: [12:13] [651] ~>last | grep user
 > user         ttypi    64.152.168.61    Thu Oct 11 00:38 - 02:23  (01:44)
 > user         ttypt    63.210.212.179   Wed Oct 10 23:19   still logged in
 > user         ttypm    209.246.81.251   Tue Oct  9 22:00 - 22:48  (00:47)
 > user         ttyq3    209.246.91.243   Thu Oct  4 21:33   still logged in
 > user         ttypc    64.152.174.71    Wed Oct  3 18:13 - 18:58  (00:44)
 > machine: [12:13] [652] ~>
 > machine: [12:13] [652] ~>last -10 user
 > user         ttypi    64.152.168.61    Thu Oct 11 00:38 - 02:23  (01:44)
 > user         ttypt    63.210.212.179   Wed Oct 10 23:19 - 09:18  (09:59)
 > user         ttypm    209.246.81.251   Tue Oct  9 22:00 - 22:48  (00:47)
 > user         ttyq3    209.246.91.243   Thu Oct  4 21:33 - 17:09  (19:35)
 > ^C
 > interrupted Thu Oct  4 07:19
 > 
 > In the second case it is reporting the logout
 > time of the next user (user2) logged on the same tty later:
 > user2        ttypt    130.91.163.208   Thu Oct 11 09:12 - 09:18  (00:06)
 > 
 > This is a bug in last(1)
 
 Sorry, but I couldn't reproduce that.  Would you mind
 sending me an example of such wtmp in a personal mail?
 
 -- 
 Yar

From: Igor Roshchin <str@giganda.komkon.org>
To: yar@FreeBSD.org
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Mon, 15 Oct 2001 15:06:15 -0400 (EDT)

 > From yar@snark.rinet.ru Fri Oct 12 05:40:00 2001
 > Date: Fri, 12 Oct 2001 13:39:57 +0400
 > From: Yar Tikhiy <yar@FreeBSD.org>
 > To: Igor Roshchin <str@giganda.komkon.org>
 > Cc: freebsd-gnats-submit@FreeBSD.org
 > Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
 >
 > On Thu, Oct 11, 2001 at 12:33:27PM -0400, Igor Roshchin wrote:
 > > 
 > > In a presence of the part "2", when utmp has a record of a currently logged
 > > user when there are no processes related to the tty in question,
 > >  your comments about screen and its behavior do not have a complete
 > > description of the problem.
 >
 > Does screen produce such dead utmp records?  I interpreted your message
 > that it was rxvt that caused the "zombie" records in utmp.
 
 screen used to do that (leave dead utmp records) on 3.x systems. I am not
 sure, but I think it no longer does so on 4.x.
 Note that it was the utmp record for the actual login tty that was left
 behind.
 
 >  
 > > Removing the set-uid bit is not a good thing. It prevents you from seeing
 > > the users on the computer with w(1).
 >
 > No it doesn't.  If you remove the suid bit from screen, you'll just see
 > a user's initial login instead of his screen sessions.
 
 That's true for screen, but not for xterm.
 Besides, I'd prefer to see the actual activity of the people logged on
 as I can do with the suid bit on.
 
 >  
 > > Although I see your point, at this moment I am not convinced that this an
 > > application-only problem. I believe the system should be able to correct 
 > > both utmp and wtmp. Thus, I don't think this PR should be closed without
 > > a proper fix.
 >
 > I don't see a clear way of fixing the wtmp file in such a case.
 > Additionally, I believe an operating system cannot have a tool/code
 > to fix every breakage that a lame program run with the superuser rights
 > may introduce.
 
 In my previous response I suggested what init can do about it.
 You are coming from the position "in the current design of the system
 it can not be done". So, sometimes the design of the system can/should
 be changed.
 
 Again, let me reiterate my suggestion:
 When init regains control of the tty, it should
 1. look up the state of the record corresponding to that tty in wtmp;
 2. if that record does not have a corresponding "closing record",
    check for the existence of the utmp record:
    a) if present - "move" it to wtmp;
    b) if absent - write the "closing record" to wtmp using the current
       time stamp.
 
 This way the system can ensure that no matter what application
 has left a trace behind it, the wtmp and utmp files are coherent.
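
The suggested check-and-cleanup steps can be sketched as follows. This is a
model of the proposal only, not of any existing init code: Record,
session_is_open and cleanup_on_tty_release are hypothetical names, utmp is
modelled as a dict keyed by tty, wtmp as an append-only list, and an empty
user name is assumed to mark a closing record. What should happen when the
session is already closed is not specified in the proposal; the sketch simply
does nothing in that case.

```python
from dataclasses import dataclass


@dataclass
class Record:
    line: str  # tty name
    name: str  # user name; an empty name marks a closing (logout) record
    time: int  # seconds since the epoch


def session_is_open(wtmp, line):
    """Step 1: find the newest wtmp record for `line`; open if it is a login."""
    for rec in reversed(wtmp):
        if rec.line == line:
            return rec.name != ""
    return False  # the tty never appears in the log


def cleanup_on_tty_release(utmp, wtmp, line, now):
    """Steps 2a/2b: make sure an open session on `line` gets a closing record."""
    if not session_is_open(wtmp, line):
        return "already closed"
    had_utmp_entry = utmp.pop(line, None) is not None
    wtmp.append(Record(line, "", now))
    return "moved utmp entry" if had_utmp_entry else "closed without utmp entry"


# A session whose utmp entry has already been removed by some application.
utmp = {}
wtmp = [Record("ttyp1", "user", 300)]
first = cleanup_on_tty_release(utmp, wtmp, "ttyp1", 400)
second = cleanup_on_tty_release(utmp, wtmp, "ttyp1", 500)
print(first, "/", second)
```

Note that step 1 walks the log backwards until it finds the tty, so its cost
grows with the size of wtmp.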
 
 I haven't checked how (if at all) cleaning of a dead utmp record
 was implemented on the way from 3.x to 4.x. That might be a part
 of the procedure for 2a) branch of the above.
 
 
 >  
 > > In the initially reported behavior, the utmp record of a user disconnected
 > > was present until somebody else would log in onto the same tty. Then
 > > the utmp would get cleared (by init ?), while wtmp record still wouldn't be
 > > closed (i.e. logout record is not added).
 >  
 > init, sshd, and telnetd effectively move a utmp record
 > from the utmp file to the wtmp file.  If there's no record
 > in the utmp file for the tty, these programs can do nothing
 > about fixing the wtmp file.  And screen removes the utmp
 > record for the user when started.  If screen didn't, the
 > problem wouldn't appear at all.
 
 As we know, this is not only a screen problem.
 xterm and rxvt (and maybe other programs) also do it.
 That's why I am suggesting a universal fix on the system side.
 
 >
 > I'd like to repeat once more: The problem hardly can be fixed
 > in FreeBSD itself.
 
 I offered a possible scenario.
 
 >
 > > Regardless whether that part was fixed or not,
 > > how about some type of check-and-cleanup procedure in init, when it
 > > regains the ownership of the tty , and checks if there is a record
 > > in utmp, and if not, just adds a logout record to utmp ?
 > 						    ^^^^ wtmp? 
 
 yes, that was a typo.
 
 >
 > The actual init's (and sshd's, and telnetd's) logic is opposite
 > to what you propose: init won't add a logout record to wtmp if
 > there is no record in utmp for the tty by the time of a logout.
 
 It's time to update that logic!
 
 > That's because programs run by init don't need to record logins
 > to utmp--some of them may have to do nothing with logins/logouts.
 
 I believe all programs that take a tty from init(8) (or one of its
 children) should add records to utmp.
 
 
 Igor
 

From: Yar Tikhiy <yar@snark.rinet.ru>
To: Igor Roshchin <str@giganda.komkon.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Tue, 16 Oct 2001 13:52:37 +0400

 On Mon, Oct 15, 2001 at 03:06:15PM -0400, Igor Roshchin wrote:
 > 
 > > > Although I see your point, at this moment I am not convinced that this an
 > > > application-only problem. I believe the system should be able to correct 
 > > > both utmp and wtmp. Thus, I don't think this PR should be closed without
 > > > a proper fix.
 > >
 > > I don't see a clear way of fixing the wtmp file in such a case.
 > > Additionally, I believe an operating system cannot have a tool/code
 > > to fix every breakage that a lame program run with the superuser rights
 > > may introduce.
 > 
 > In my previous response I suggested what init can do about it.
 > You are coming from the position "in the current design of the system
 > it can not be done". So, sometimes the design of the system can/should
 > be changed.
 > 
 > Again, let me reiterate my suggestion:
 > When the init regains control of the tty, it should 
 > 1. lookup what is going on with the record corresponding to that tty in
 > wtmp. 
 > 2. If that record does not have the corresponding "closing record",
 > check for the existence of the utmp record 
 >  a) if present - "move" it to the wtmp
 >  b) if absent - write the "closing record" to wtmp using the current time
 > stamp.
 > 
 > This way the system can insure that no matter what application
 > has left a trace behind it, the wtmp and utmp files are coherent.
 
 Sorry, but I see at least two drawbacks in such a solution:
 
 1. Nowadays few tty sessions are started and closed by init(8).
    Much more often it's sshd, or telnetd, or some sort of XDM
    that manages sessions and the utmp and wtmp files.  That leads
    to the need of fixing all such applications, which are mostly
    third-party software.
 
 2. Since the wtmp file is a log file, each "lookup" will mean
    scanning it sequentially, which will lead to excessive resource
    use and the possibility of DoS attacks if wtmp is large.
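
The cost concern in point 2 can be illustrated with a rough sketch. It
assumes wtmp is modelled as a list of (tty, user, time) tuples in append
order; newest_record_for is a hypothetical helper, not part of any FreeBSD
tool. The scan starts at the newest record and counts how many records it
has to examine.

```python
def newest_record_for(wtmp, line):
    """Scan backwards from the end of the log for the newest record on `line`.

    Returns (record, number_of_records_examined).
    """
    examined = 0
    for rec in reversed(wtmp):
        examined += 1
        if rec[0] == line:
            return rec, examined
    return None, examined


# One old login on ttyq3, followed by many unrelated records on another tty.
wtmp = [("ttyq3", "user", 0)] + [("ttyp0", "other", t) for t in range(1, 10001)]
rec, examined = newest_record_for(wtmp, "ttyq3")
print(rec, examined)
```

When the tty in question was last used long ago, the scan touches every
record in the log before it finds a match.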
 
 -- 
 Yar

From: "Cox SMTP central" <harry685@cox.net>
To: <freebsd-gnats-submit@FreeBSD.org>, <str@giganda.komkon.org>
Cc:  
Subject: Re: kern/15478: incorrect utmp/wtmp records update upon connection being interrupted
Date: Fri, 5 Jul 2002 14:58:50 -0500

 How about using a binary search instead?
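
A binary search needs a sort key. As a sketch under stated assumptions: if
wtmp records are appended in non-decreasing time order (clock changes would
break this) and a lower time bound is known, the starting point of the scan
can be found by bisection on the timestamp. The log is not ordered by tty, so
records after that point still have to be filtered sequentially. The helper
names and the (tty, user, time) tuple layout are hypothetical.

```python
def first_index_at_or_after(wtmp, t):
    """Binary search for the first record whose timestamp is >= t."""
    lo, hi = 0, len(wtmp)
    while lo < hi:
        mid = (lo + hi) // 2
        if wtmp[mid][2] < t:
            lo = mid + 1
        else:
            hi = mid
    return lo


def records_for_since(wtmp, line, t):
    """Records for `line` with timestamp >= t: bisect by time, then filter."""
    start = first_index_at_or_after(wtmp, t)
    return [rec for rec in wtmp[start:] if rec[0] == line]


# 100 records, one every 10 seconds, spread over four ttys.
wtmp = [("ttyp%d" % (i % 4), "user", i * 10) for i in range(100)]
start = first_index_at_or_after(wtmp, 505)
matches = records_for_since(wtmp, "ttyp3", 505)
print(start, len(matches))
```

This only helps when such a time bound is available, for example a login
time taken from a surviving utmp record.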
 
 
 
>Unformatted:
