Newsgroups: comp.os.mach
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!apollo!goykhman_a
From: goykhman_a@apollo.hp.com (Alex Goykhman)
Subject: Re: Efficient copying from task to task
Message-ID: <1991Jun6.175926.8654@apollo.hp.com>
Sender: netnews@apollo.hp.com (USENET posting account)
Nntp-Posting-Host: dzoo.ch.apollo.hp.com
Organization: Hewlett-Packard Company, Apollo Division - Chelmsford, MA
Date: Thu, 6 Jun 1991 17:59:26 GMT

In article <1991Jun3.085647.25107@ecrc.de> veron@ecrc.de (Andre Veron) writes:
>In article <51b7861a.20b6d@apollo.HP.COM>, goykhman_a@apollo.HP.COM
>(Alex Goykhman) writes:
>|>>Because of 3/ and 5/ the eager copying of UNIX fork put UNIX out of
>|>>the game.
>|>>
>|>>We then put some hope in the lazy copying of MACH.  The problem then
>|>>appeared to be that MACH as well as UNIX is not designed for applications
>|>>which need intensive forking of "threads" of computation which have their
>|>>own separate address spaces. ..............................................
>|>
>|>    Why do you think so?
>|>
>|>
>|>>....................... The task in MACH is a coarse grain entity which
>|>>has a whole bunch of available facilities like files, communication
>|>>ports which are not always needed but are always there. A task is
>|>>consequently costly to create, to schedule and to terminate.
>|>
>|>    Files???  Contextwise, a MACH task is (mostly) a collection of VM
>|>    regions.
>|>    Considering the way MACH manages memory (page aliasing), creating a new
>|>    task/context should be relatively cheap.
>|>>
>
>
>It is claimed that Mach is able to fork off tasks in constant time
>(without taking into account the cost of future page faults due to
>copy-on-write copying). It simply cannot be true.
>
>All the pages inherited by the children which have the copy-on-write property
>have to be set to read-only before forking in order to trigger
>the copy-on-write. This implies a scan of the region to modify the pages'
>properties and to invalidate the corresponding TLB entries for these pages.
>This has a LINEAR cost.

    Mach's copy-on-write relies on the hardware's ability to write-protect
    a physical page, and that has to be done only once (and not necessarily
    during a context switch), regardless of how many forks involve the page.
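
    To make the "only once" argument concrete, here is a toy model in C.
    It is not Mach code: struct toy_region, toy_region_init() and toy_fork()
    are names made up for this sketch, and the single all_protected flag is
    an assumption standing in for whatever bookkeeping a real kernel keeps.
    It only counts how many times a page's protection has to be changed.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_PAGES 8

/* Hypothetical copy-inherited region; not a Mach data structure. */
struct toy_region {
    int page_protected[REGION_PAGES]; /* 1 once a page is read-only */
    int all_protected;                /* 1 once every page is read-only */
    int sharers;                      /* tasks currently mapping the region */
    unsigned long protect_ops;        /* protection changes performed so far */
};

void toy_region_init(struct toy_region *r)
{
    size_t i;

    assert(r != NULL);
    for (i = 0; i < REGION_PAGES; i++)
        r->page_protected[i] = 0;
    r->all_protected = 0;
    r->sharers = 1;       /* the parent */
    r->protect_ops = 0;
}

/* Model one fork with copy inheritance.  The linear scan happens only
 * while some page is still writable; later forks just add a sharer. */
void toy_fork(struct toy_region *r)
{
    size_t i;

    assert(r != NULL);
    if (!r->all_protected) {
        for (i = 0; i < REGION_PAGES; i++) {
            if (!r->page_protected[i]) {
                r->page_protected[i] = 1;
                r->protect_ops++;
            }
        }
        r->all_protected = 1;
    }
    r->sharers++;
}
```

    In this model the first toy_fork() changes protection on all
    REGION_PAGES pages and any further toy_fork() changes none.  Write
    faults, which would make a page private and writable again, are left
    out on purpose.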

    If you really need to invalidate TLBs for VM_INHERIT_COPY, you can't blame
    that on Mach but rather on the MMU that you are using.  What you really
    need is a "global" bit associated with every TLB entry and set to '1' only
    for VM_INHERIT_SHARE and VM_INHERIT_COPY pages.  This way, the "private"
    TLB entries could be cheaply purged during a context switch.
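
    A rough sketch of that idea, again as a toy model in C rather than real
    MMU or pmap code.  struct toy_tlb_entry and toy_tlb_switch() are
    hypothetical names, and the model assumes the hardware exposes a
    per-entry "global" bit that software sets for shared and copy-inherited
    pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TLB entry; the "global" bit is the assumption here. */
struct toy_tlb_entry {
    unsigned long vpn; /* virtual page number */
    int valid;         /* 1 if the entry holds a live translation */
    int global;        /* 1 for shared or copy-inherited pages */
};

/* Model a context switch: invalidate only the private (non-global)
 * entries and return how many were purged. */
size_t toy_tlb_switch(struct toy_tlb_entry *tlb, size_t n)
{
    size_t i;
    size_t purged = 0;

    assert(tlb != NULL);
    for (i = 0; i < n; i++) {
        if (tlb[i].valid && !tlb[i].global) {
            tlb[i].valid = 0;
            purged++;
        }
    }
    return purged;
}
```

    Entries marked global survive the switch, so closely related tasks keep
    their translations for the shared pages.  The model ignores what has to
    happen when a copy-on-write page is actually written: its entry would
    have to be invalidated at that point.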
>
>When it is claimed that the cost is constant, I do not conclude
>that the designers are liars but that this linear cost is simply hidden
>by some other CONSTANT costs which are usually much bigger.
>
>Since all I would be interested in task forking is the virtual memory handling
>I conclude that MACH does not fulfill my needs.
>
>|>>                             ------------- : Owned by Thread1
>|>>                             ------------- : Owned by Thread2
>|>>                             -------------
>|>>     |-----Shared space----| Private space |--------Shared space -------|
>|>>                             ------------- : ....
>|>>                             ------------- : ....
>|>>                             ------------- : Owned by ThreadN...
>|>
>|>    I am not sure what you mean by "private", since you also indicate that a 
>|>    "private" region will be read-shared with the parent task.
>|>
>
>In the scheme I want, parent threads/processes are suspended until their
>offspring terminate. Private means that the sibling threads/processes
>do not see their respective private regions.

    If the parent is always suspended until a child terminates, then only
    one process (the "youngest" child) can be running at any given time.  ???
>
>|>    What you are really looking for is a VM_INHERIT_MOVE inheritance
>|>    attribute in addition to the VM_INHERIT_SHARE, VM_INHERIT_COPY and
>|>    VM_INHERIT_NONE already provided by MACH.  While it would be nice to
>|>    have one supported by MACH, you should be able to achieve similar
>|>    results with VM_INHERIT_COPY and vm_deallocate(parent_task).
>|>
>|>> Within a quantum of time allocated
>|>>to a global "task" (not a MACH task any more) context switching between
>|>>these threads/processes is cheap - the cost is that of unmapping
>|>>the pages of one thread/process and mapping those of the next one.
>|>>Since the concept of private region is hardwired in the paradigm
>|>>all virtual memory handling can be done at forking time/context-switching
>|>>time when the system is in kernel mode. No additional and inelegant system
>|>>calls are needed to set up the execution environment of a newly
>|>>created/scheduled thread/process. Moreover the resources (physical pages)
>|>>allocated to a terminated thread can be kept by the "task" and ready to be
>|>>allocated to the next created thread/process.
>|>
>|>    Looks like you just reinvented the MACH :)
>
>
>You are hard with me !! :->.
>If such memory handling is mimicked in MACH with tasks and VM_INHERIT_COPY,
>this does not work because tasks are considered to be independent entities.
>Hence during a task context-switch all the pages of the scheduled-out task
>are removed from the TLB (OK, it may not be that bad if your MMU has context
>information in its TLB lines but still....).
>
>My point is that since I know that these threads/processes are tightly
>connected I do not want to remove the TLB entries other than the ones from
>the private regions.
>
>You can't do that in MACH

    Mach cannot prevent you from selectively purging TLB entries, because
    it knows nothing about the TLBs.  Maintaining them is the job of the
    machine-dependent (pmap) code, which you can always modify to suit your
    needs, assuming that the underlying MMU hardware lets you do it.
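
    As an illustration of the kind of change I mean, here is a toy C model
    of a selective purge that drops only the entries falling inside one
    address range, such as a private region.  It is not taken from any real
    pmap module: struct toy_tlb_line and toy_pmap_purge_range() are invented
    for this sketch, and it assumes the MMU lets software invalidate
    individual TLB lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software view of one TLB line. */
struct toy_tlb_line {
    unsigned long va; /* virtual address the line translates */
    int valid;        /* 1 if the line holds a live translation */
};

/* Invalidate only the lines with start <= va < end and return how many
 * were dropped.  Lines outside the range are left untouched. */
size_t toy_pmap_purge_range(struct toy_tlb_line *tlb, size_t n,
                            unsigned long start, unsigned long end)
{
    size_t i;
    size_t purged = 0;

    assert(tlb != NULL);
    assert(start <= end);
    for (i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].va >= start && tlb[i].va < end) {
            tlb[i].valid = 0;
            purged++;
        }
    }
    return purged;
}
```

    Called at context-switch time with the bounds of the private region,
    this leaves the translations for the shared regions in place.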

>
>BTW: I would need a forking time of less than 1 ms with a private region
>(or VM_INHERIT_COPY region) as big as possible.
>                                                                        


