(PSYC) Version_control

Version control is amazing: you can get so much better at managing information with it. And source code, of course. That's why it is also known as SCM, source code management, but source code shouldn't be the only thing a VCS can handle. I want it to do binary files too, so it can version control websites or even distributed file systems.

Contents

1 For developing PSYC tools?
2 For distributing PSYC tools?
3 Use PSYC for version control?
4 Issues about version control systems
5 Comparison

For developing PSYC tools?

We currently use both shared and decentralized git repositories. Here's a collection of csh aliases to make life with version control systems easier:

http://www.psyced.org/dist/config/versioncontrol.tcsh

The newest copy is in the psyced git in the config directory.

For distributing PSYC tools?

How should we allow for software updates and distribution? So far we used CVS. Version control should allow for public updates in clean deltas, in particular for binary files. Like rsync, but without having to find out which files have changed and how.
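To make the "clean deltas" idea a bit more concrete, here is a minimal Python sketch. It is not anything psyced actually ships. It hashes every file of two release trees and reports which paths were added, changed or removed, so a client would only need to fetch those. The names build_manifest and diff_manifests are made up for illustration, and a real updater would still need per-file binary deltas on top of this.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's relative path to the SHA-256 of its content."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old, new):
    """Return (added, changed, removed) path lists between two manifests."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

A server could publish such a manifest next to each release; the client compares it with its own and fetches only what differs, binary files included.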

Obviously, a distributed VCS cannot be so helpful here: you don't want every user to have a complete version history of the project. So, if a VCS is suitable for this job at all, it would probably be subversion. http://info.wsisiz.edu.pl/~blizinsk/git-bzr.html explains a model where you use git or bazaar locally, then still have a subversion repository at the root of it all. Other approaches emulate a subversion server on top of a distributed VCS, etc.

Currently we provide a git repository to everyone, but it makes the distribution twice as big. Is it worth it?

Use PSYC for version control?

Another interesting thought: should version control updates always be manual pull operations? What about the ability to push changes in a large multicast to all repository clones, so that each owner can use them later on? It actually fits the git model pretty well. See also software project#Events Are Us.

The advantage over mere notification (as we already have it) is the enormous bandwidth savings if large releases are multicast into the Internet just once. This typically matters not for the PSYC tools themselves but for larger things, since you can't BitTorrent a git repository. Also, torrents aren't incremental, nor are they very suited to run forever.

Issues about version control systems

I'm unhappy with diffs that cannot deal with long lines of written text. If just one character has been changed, it is hard to see. Web-based diffs such as Mediawiki's are better at that. I noticed hg has an extension for external diff apps; I suppose most VCS provide something like that.
Integration with bug tracking tools like trac seems to be available for most combinations of VCS and bug tracking.
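As an illustration of what a friendlier diff for long prose lines could look like, here is a small sketch using Python's standard difflib. It compares two versions of a line word by word instead of showing the whole line as changed. The function name word_diff is hypothetical, and the [-old-] {+new+} markers are borrowed from the notation of git diff --word-diff.

```python
import difflib


def word_diff(old, new):
    """Return one line where removed words are [-marked-] and added words {+marked+}."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("the quick brown fox", "the quick red fox"))
```

Note that splitting on whitespace normalizes spacing, so this sketch is only meant for reading changes, not for producing patches.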

Comparison

Okay, I need to write down some pros and cons, so I might as well do it here. Information stems from Wikipedia:Comparison of revision control software, the Berlios comparison and Wheeler's comments. Josh Carter has some interesting experience with large binary files. JavaWorld, preed and sayspy dig deeper into hg versus bzr with varying degrees of accuracy and up-to-date information.

Centralistic systems

CVS

We all know why we don't want to use this. But just as a reminder, CVS has a feature that is hard to find in newer systems: Subdirectory checkouts can exist by themselves. Okay, not that important, as long as I get to have my subdirectory checkins.

subversion

I like the idea of being able to do checkins while I am off the Internet, but svn is server-based. You need to install SVN::Mirror and SVN-Pusher to have a local copy of the repository. Alternatively, you can use git for local work against an svn repository, using git-svn.
Even though the history of the repository is not kept locally, the metadata is said to be quite big anyway.
svn has no nice approach to file renaming. "a rename operation is actually a copy-with-history-and-delete sequence." svn fails at the renaming test.
No explicit support for changesets: "There are implicit changesets that are generated on each commit."
Pro: Subdirectory checkouts are possible as with CVS. (But they are considering dropping that!)
A subversion server is said to be a bit more cumbersome to install (like CVS) than other systems.
Svn is said to be really bad at merging.

Established distributed systems

git

Pro: git seems generally to be the fastest of all VCS.
According to Josh Carter git can handle large binary files in the most space efficient way, if you let it take its time to do git-gc. If you forget about git-gc, git becomes slow and huge.
Personal frustration: git first uploaded several megabytes of patch to the server, only to find out that a colleague had checked in two lines of changes in the meantime. Check-in operations should not suffer from race conditions, but with git this seems to be the case.
Usability: That colleague also complains that git frequently doesn't get merges done and you have to try strategies manually. For me it is worse: several times I have not been able to figure out how to fix a checkout that rejected merging. Or maybe I did, but it was impossible to remember in which order I ran which mystic commands. git is apparently the toughest version control system to get used to.
Also, I don't like how git fakes file renames by delete and add. I want something that handles native file renaming. It does not support cloning of files with history. git fails at the directory renaming and reverting test.
Subdirectory checkouts cannot exist by themselves.
Subdirectory checkins can only be achieved by restricting your submit permissions.
git is said to be able to do things no other system can do... Same goes for darcs and monotone.
git is said to be a bad choice for Windows users. Why are you still on Windows?

Created commit fa03bd1: end of list marker for clients
 1 files changed, 2 insertions(+), 1 deletions(-)
Counting objects: 9, done.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 535 bytes, done.
Total 5 (delta 4), reused 0 (delta 0)
error: insufficient permission for adding an object to repository database ./objects

fatal: failed to write object
error: unpack failed: unpacker exited with error code
To ssh://git.psyced.org/~psyc/psyced.git
 ! [remote rejected] master -> master (n/a (unpacker error))
error: failed to push some refs to 'ssh://git.psyced.org/~psyc/psyced.git'

Here I am trying to give git yet another chance, then it dies on me with a cryptic message. I have write access to all of the local and remote files. I have no idea what it is missing. No google, no strace and no grep in the docs is of any help. I finally resolved it by redoing a clone --bare of the remote repository, which means throwing away 'ssh://git.psyced.org/~psyc/psyced.git' and making a new one. That's not how things should be.
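On the rename complaint in the list above: git does not record renames at all, it guesses them afterwards by comparing the content of deleted and added files. The following Python sketch shows that general idea only. It is not git's actual algorithm; the function name detect_renames and the 0.5 threshold are assumptions chosen for illustration.

```python
import difflib


def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted and added files whose contents are similar enough.

    deleted and added map path -> text content. Returns a list of
    (old_path, new_path, similarity) tuples, best matches first.
    """
    candidates = []
    for old_path, old_text in deleted.items():
        for new_path, new_text in added.items():
            matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
            ratio = matcher.ratio()
            if ratio >= threshold:
                candidates.append((ratio, old_path, new_path))
    # Greedily accept the most similar pairs, using each path at most once.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    renames, used_old, used_new = [], set(), set()
    for ratio, old_path, new_path in candidates:
        if old_path in used_old or new_path in used_new:
            continue
        used_old.add(old_path)
        used_new.add(new_path)
        renames.append((old_path, new_path, ratio))
    return renames
```

The weakness of any such heuristic is visible right away: rename a file and rewrite most of it in the same commit, and the link between old and new is lost, which is what systems with native rename tracking avoid.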

Mercurial (hg)

Pro: http://trac.edgewall.org/wiki/TracMercurial looks better than trac for bzr currently.
"When you use the 'hg rename' command, Mercurial makes a copy of each source file, then deletes it and marks the file as removed." Still, it can normally revert all of that. hg passes the renaming test. It's not as efficient as bzr at it, however: "Renaming in Mercurial is indeed wasteful on disk. I've reorganized some folders with large files and the commit increased my repository size considerably -- I wasn't expecting that. I just tried the same in Git and it's nearly a zero-cost operation, as it should be. That's a compelling advantage for Git over Mercurial." (from Josh Carter)
Pro: The inotify extension is truly to my liking: an event push approach to knowing what the user is doing to her repositories rather than having to figure it out. Unfortunately it is not provided as a Gentoo USE flag yet. See also file monitoring with PSYC, if you like the topic.
Subdirectory checkouts are planned for the future.

Experimental and less popular systems

bazaar (bzr)

Pro: Does intelligent renames. I really like the idea that some new contributor's rename operations can never break my repository, although for psyced it would also be fine to forbid rename operations for code newbies. According to the comments on that page, several other systems deal with the problem okay. Obviously, bzr passes the renaming test.
Does not support cloning of files with history (no bzr cp).
Subdirectory checkouts cannot exist by themselves.
Bazaar does not work with really big files (i.e. 80 MB and up). See https://bugs.launchpad.net/bugs/109114. (from Josh Carter)

darcs

Doesn't support symlinks.
Does intelligent renames and passes the renaming test.
Gentoo: darcs doesn't emerge currently.
Not integrated into trac apparently.
Does not support cloning of files with history.
Subdirectory checkouts cannot exist by themselves.
Scarce popularity and add-ons/GUIs. We'd be the first important project to use it. ;)
Pro: Said to be uniquely flexible in branching, merging, and cherry-picking by means of extra semantical awareness.
Not as mature as other systems, and possibly slow because of Haskell.

monotone

Symlinks and executable bits need to be defined in a metafile called .mt-attrs.
Not clear if it can do intelligent renames, but it passes the rename test.
Subdirectory checkouts cannot exist by themselves.
Uses the programming language lua for hooks and SQLite for data.
Interesting: "Good support for recording status about approvals and disapprovals"