Date: Thu, 9 Dec 2004 10:26:20 -0800 (PST)
Message-Id: <200412091826.iB9IQKJD066570@xl2.seyza.com>
From: Thomas Lord
To: tbrowder@cox.net, gnu-ach-dev@lists.sezya.com.seyza.com
In-reply-to: <20041209120211.UXZ12874.lakermmtao12.cox.net@nonerjsnum1tkq> (tbrowder@cox.net)
References: <20041209120211.UXZ12874.lakermmtao12.cox.net@nonerjsnum1tkq>
Cc: gnu-arch-users@gnu.org, cduffy@spamcop.net
Subject: [Gnu-arch-users] obtaining delta compression with 1.x (was Re: Arch, CVS, Subversion)

It occurs to me that all the fuss over how to implement binary delta
compression and merging in the face of binaries is mostly a fuss over
nothing.  It is pretty easy to do already, even using only the
features available in `tla-1.2'.  (Some additional hooks would make it
even easier.)  Here is one way to do it:

    > From: "Tom Browder"

    > Um, a typical project has a binary file that grows to about 10
    > Mb.  Snapshots (check ins) are done throughout the day as
    > significant editing is done to the file (mainly for backup and
    > to give a rollback capability in case of errors).  A project
    > will typically take 90+ days to complete, each day with several
    > check ins.  We have over 120 projects archived (but none of the
    > binary files yet).

    > The way we work around the lack of binary diffs now is the
    > binary file has an ASCII equivalent so we convert to ASCII and
    > check it in.  To reconstruct, check out the correct copy and
    > reconvert to binary (a pain, and subject to error).  Ideally we
    > could just deal with the binary form.

I am glad you don't mind *too* much the idea of converting files
between formats before and after ``commit''s and ``get''s.  My
suggestion is that instead of converting to ASCII, you grab `xdelta'
and convert to an already-delta-compressed format.

To illustrate, let's suppose that you have a binary file:

        diagram.jpg

I suggest that in the archived trees, you store that file as:

        diagram.jpg.base
        diagram.jpg.xdelta

After checkout, you can construct `diagram.jpg' with:

        % apply-xdelta diagram.jpg.xdelta diagram.jpg.base > diagram.jpg

Before committing, you can run:

        % make-xdelta diagram.jpg.base diagram.jpg > diagram.jpg.xdelta

There is a catch.  If the resulting size of `diagram.jpg.xdelta' plus
the previous size of `diagram.jpg.xdelta' exceeds the size of
`diagram.jpg', then instead you should run:

        % rm diagram.jpg.base
        % cp diagram.jpg diagram.jpg.base
        % make-xdelta diagram.jpg.base diagram.jpg > diagram.jpg.xdelta

Your arch changesets will, as a result of taking those steps, contain
delta-compressed binaries.

You might consider making `.jpg' files (or whatever your binary files
are) `precious' in arch inventories.

If you want to get fancy, you could implement a system of Emacs-style
numbered backups:

        diagram.jpg.base
        diagram.jpg.xdelta
        diagram.jpg.xdelta.45
        diagram.jpg.xdelta.46
        diagram.jpg.xdelta.47

In that tree, the four most recent versions of the binary file are
kept conveniently on-hand.  After the next commit, the tree will
contain:

        diagram.jpg.base
        diagram.jpg.xdelta
        diagram.jpg.xdelta.46
        diagram.jpg.xdelta.47
        diagram.jpg.xdelta.48

If you later decide that merging would be useful to you, then you can
extend the above practices by "forking" each binary file for each
branch.  You might wind up with:

        diagram.jpg.base
        diagram.jpg.xdelta.official
        diagram.jpg.xdelta.testing
        diagram.jpg.xdelta.alice
        diagram.jpg.xdelta.bob

You don't get automatic merging of `jpg' files that way (of course)
--- but if branches make changes to the binaries, when merging them
back, at least you wind up with both alternative versions of the file
to work with and pick from.

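The basic checkout and commit steps above could be wrapped in a pair
of shell functions, roughly as follows.  This is only a sketch under
some assumptions: `make_xdelta' and `apply_xdelta' are hypothetical
wrappers standing in for the `make-xdelta' and `apply-xdelta' commands
used above, written here on the assumption that the `xdelta3'
command-line tool is installed; `inflate', `deflate' and `size_of' are
names invented for the illustration; and creating a missing `.base'
file on first use is an added convenience, not part of the recipe.

```shell
#!/bin/sh
# Hypothetical wrappers; swap in whatever delta tool you actually use.
make_xdelta() {    # usage: make_xdelta BASE FILE > DELTA
    xdelta3 -e -c -s "$1" "$2"
}

apply_xdelta() {   # usage: apply_xdelta DELTA BASE > FILE
    xdelta3 -d -c -s "$2" "$1"
}

size_of() {        # print the size of a file in bytes
    wc -c < "$1" | tr -d ' '
}

# After checkout: rebuild FILE from FILE.base and FILE.xdelta.
inflate() {
    apply_xdelta "$1.xdelta" "$1.base" > "$1"
}

# Before commit: refresh FILE.xdelta.  If the old delta plus the new
# delta is larger than FILE itself, replace FILE.base first (the
# "catch" described above).
deflate() {
    [ -f "$1.base" ] || cp "$1" "$1.base"   # assumption: first use
    old=0
    [ -f "$1.xdelta" ] && old=$(size_of "$1.xdelta")
    make_xdelta "$1.base" "$1" > "$1.xdelta.new"
    new=$(size_of "$1.xdelta.new")
    if [ $((old + new)) -gt "$(size_of "$1")" ]; then
        cp "$1" "$1.base"                   # start over from a fresh base
        make_xdelta "$1.base" "$1" > "$1.xdelta.new"
    fi
    mv "$1.xdelta.new" "$1.xdelta"
}
```

With those definitions loaded, one would run `inflate diagram.jpg'
after a ``get'' and `deflate diagram.jpg' before a ``commit''.
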
This approach adds a new twist to working with project trees: there is
a resulting "state" that reflects which `.jpg' files have been
inflated from which bases and deltas, and so forth.  So this solution
creates the new problem of how to manage that extra state (hence my
suggestion at the end of the first paragraph about new hooks).

Tom Browder's experience, quoted above, suggests that the new problem
isn't an impractical one, even today.  (He isn't using xdelta, but he
is "inflating" ephemeral source files from archived
delta-compressible files.  And he has to "deflate" the ephemeral
files to get a committable form --- the same procedures can implement
binary delta compression much more directly.)

I think it is a good trade-off to solve the problem of binary delta
compression in arch in exchange for taking on the problem of managing
the inflation/deflation to/from binaries (and other kinds of ephemeral
source tree contents) from archived xdelta files (and other kinds of
archive-friendly file formats).

-t

_______________________________________________
Gnu-arch-users mailing list
Gnu-arch-users@gnu.org
http://lists.gnu.org/mailman/listinfo/gnu-arch-users

GNU arch home page:
http://savannah.gnu.org/projects/gnu-arch/