Thursday, September 11, 2008

man-pages goes git (at last!)

[Update: 19 Sep 2008: when I first attempted the git import, I didn't import the subversion tags into git. I've updated and expanded this post to include the required details to do that.]

When I inherited man-pages, there was no version control system (VCS) in use. To help myself keep track of changes, I've been running a private subversion repository since I took over as maintainer (i.e., since man-pages-2.00), but I never got round to hosting it on a public server so that people could pull from it (requests for such a facility were only occasional). Instead, people wanting to send patches would just grab the latest tarball from the downloads directory, patch the required source file, and email me the patch.

Somewhat more frequent requests for a public repository, and the fact that the Linux world is nowadays mostly oriented around the git distributed version control system, have gradually created pressure to change things. So, I'm taking baby steps towards using git for man-pages. Here goes...

Importing from subversion to git

I found the Simplistic Complexity blog's simple instructions on subversion-to-git migration quite useful (though they didn't supply all of the details I needed for importing subversion tags).

My subversion repository had a somewhat non-standard layout, which affected the options (see the git svn init command below) that I needed to do an import that included my subversion tags. (Thanks to various people on the git mailing list who helped me find the right way to do things, especially Björn Steinbrink and Michael Gruber.) The following subversion commands give an idea of the layout:

$ svn list file:///home/mtk/man-pages-rep
branches/
tags/
trunk/
$ svn list file:///home/mtk/man-pages-rep/trunk
man-pages/
$ svn list file:///home/mtk/man-pages-rep/trunk/man-pages
Changes
Changes.old
Makefile
README
...
man7/
man8/
scripts/
$ svn list file:///home/mtk/man-pages-rep/tags
man-pages-2.00/
man-pages-2.01/
...
man-pages-3.08/
man-pages-3.09/
$ svn list file:///home/mtk/man-pages-rep/tags/man-pages-2.00
man-pages
$ svn list file:///home/mtk/man-pages-rep/tags/man-pages-2.01
man-pages
[and so on]
$ svn list file:///home/mtk/man-pages-rep/branches
$
[i.e., no branches, since this has been a linear svn repo.]
Set up an empty, temporary git repository, in the process informing git about the location of the subversion repository from which the import should be done:
$ cd $HOME
$ mkdir man-pages-git-tmp
$ cd man-pages-git-tmp
$ git svn init file:///home/mtk/man-pages-rep/ \
-T trunk/man-pages \
-b branches/*/man-pages \
-t tags/*/man-pages \
--no-metadata
Initialized empty Git repository in .git/
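To make the effect of the -t pattern a bit more concrete, here is a small sketch of how a wildcard such as tags/*/man-pages singles out the tag name from a subversion path. This only mimics the idea with shell pattern matching; it is not how git svn is implemented, and tag_ref_for_path is a made-up helper name.

```shell
#!/bin/sh
# Illustration only: mimic how the pattern "tags/*/man-pages" picks the
# tag name out of a subversion path. This is NOT git-svn's own code, and
# tag_ref_for_path is a hypothetical helper name.
tag_ref_for_path() {
    path=$1
    case $path in
        tags/*/man-pages)
            name=${path#tags/}          # drop the leading "tags/"
            name=${name%/man-pages}     # drop the trailing "/man-pages"
            echo "tags/$name"
            ;;
        *)
            return 1                    # not a tag directory in this layout
            ;;
    esac
}

tag_ref_for_path tags/man-pages-2.00/man-pages   # prints tags/man-pages-2.00
```

The extra man-pages directory below each tag is what makes the layout non-standard, which is why the pattern has to name it explicitly.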
Tell git about the names of users in the subversion commit logs. Because the subversion repository was private (I just took other people's emailed input and made changes as required), I'm the only user. That means that the historical information in the repository will be bogus, suggesting that I'm responsible for all of the nearly 5000 commits to the repository, when it's probably more like 75%. My apologies to everyone else. Something like the true story for man-pages 2.00 through to 3.09 can be found here.
$ cat > ~/users.txt
mtk = Michael Kerrisk
^D
$ git config svn.authorsfile ~/users.txt
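For a repository with more than one committer, typing the authors file by hand gets tedious. The following is a rough sketch, not something I used for this import, that pulls the distinct usernames out of svn log -q output and prints one line per user in the "loginname = Name <email>" form that the git svn documentation describes. The helper name authors_from_log and the example.invalid addresses are placeholders that you would edit afterwards.

```shell
#!/bin/sh
# Sketch: read "svn log -q" output on stdin and print one authors-file
# line per distinct username. authors_from_log is a hypothetical helper,
# and the full name and e-mail address are placeholders to edit by hand.
authors_from_log() {
    awk -F'|' '/^r[0-9]+ / {
        user = $2
        gsub(/^ +| +$/, "", user)       # trim the padding around the field
        print user
    }' | sort -u | while read -r user; do
        printf '%s = %s <%s@example.invalid>\n' "$user" "$user" "$user"
    done
}

# Usage (repository URL as in the post):
#   svn log -q file:///home/mtk/man-pages-rep | authors_from_log > ~/users.txt
```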
Do the import:
$ time git svn fetch
[...]
M Changes
r4917 = 93a6e8f9ee5d0710a084425548775348389e2900 (git-svn)
Checking out files: 100% (1925/1925), done.
Checked out HEAD:
file:///home/mtk/man-pages-rep/trunk/man-pages r4917

real 70m24.850s
user 14m46.571s
sys 22m29.852s
And yes, I definitely need some faster hardware...

At this point, git branch -a showed that I had imported the subversion tags into git. But the tags still need to be turned into proper git tags. This requires a command for each tag, which can be automated in a script. (Thanks to Heikki Orsila who pointed out to me that this step was required, and who supplied the script.)
$ cat git-svn-tags.sh
#!/bin/sh
#
git branch -a | grep tags/ | while read -r tag ; do
    tagname=$(echo "$tag" | cut -d/ -f2)
    commit=$(git rev-parse "$tag")
    # is the tag's commit already part of master's history?
    res=$(git log master | grep "^commit $commit")
    if test -z "$res" ; then
        # take the parent commit for the tag commit (found in master)
        commit=$(git rev-parse "$tag^")
    fi
    echo "$tagname" "$commit"
    git tag -a "$tagname" "$commit" -m "This is $tagname"
done

$ sh ~/git-svn-tags.sh
man-pages-2.00 105a35bc69bd3088d69f5d3c94d22e6595e223d2
man-pages-2.01 972a9aee950d4df97cc1f81147b64fe52007dfc7
...
man-pages-3.08 5c8cbdc1b2c9b5a1ebb8626361ea01cb73965636
man-pages-3.09 81468e1308f643ffd5718809379c074ff75dc311
$ git tag -l
man-pages-2.00
man-pages-2.01
...
man-pages-3.08
man-pages-3.09
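With this many tags, eyeballing the two listings is error-prone. One possible sanity check, sketched below under the assumption that the svn list and git tag -l outputs have each been saved to a file, is to print every subversion tag that has no git tag of the same name. The helper name missing_tags is made up for the example.

```shell
#!/bin/sh
# Sketch: report subversion tags that have no git tag of the same name.
# Both arguments are files with one name per line; missing_tags is a
# hypothetical helper name.
missing_tags() {
    svn_list=$1
    git_list=$2
    tmp_svn=$(mktemp) && tmp_git=$(mktemp) || return 1
    # "svn list" appends "/" to directory names, so strip it first
    sed 's,/$,,' "$svn_list" | sort > "$tmp_svn"
    sort "$git_list" > "$tmp_git"
    comm -23 "$tmp_svn" "$tmp_git"      # lines found only in the svn list
    rm -f "$tmp_svn" "$tmp_git"
}

# Usage:
#   svn list file:///home/mtk/man-pages-rep/tags > /tmp/svn-tags
#   git tag -l > /tmp/git-tags
#   missing_tags /tmp/svn-tags /tmp/git-tags
```

Empty output means every subversion tag has a git counterpart.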
Clone the temporary repository, to produce a clean repository from which any lingering cruft that was used to support git svn has been removed:
$ cd $HOME
$ git clone man-pages-git-tmp man-pages-git
Initialized empty Git repository in /home/mtk/man-pages-git/.git/
0 blocks
And then run a garbage collection in the new repository to clean up any remaining unneeded files, and otherwise improve the efficiency of the repository:
$ cd $HOME/man-pages-git
$ git gc
Counting objects: 36341, done.
Compressing objects: 100% (12059/12059), done.
Writing objects: 100% (36341/36341), done.
Total 36341 (delta 29117), reused 29623 (delta 23595)

Hosting the git repository on kernel.org

The public git repository is to be hosted on kernel.org. Quite a while back the admins there already set up a location for a man-pages git repository. Jeremy Kerr kindly held my hand through the hurdles of the initial set up.

On kernel.org, the admins assigned man-pages a repository location of /pub/scm/docs/man-pages. So, let's create an empty repository there:
$ ssh mtk@master.kernel.org
mtk@master.kernel.org's password:
[mtk@hera man-pages]$ cd /pub/scm/docs/man-pages
[mtk@hera man-pages]$ git init
[mtk@hera man-pages]$ mv .git/ man-pages.git
[mtk@hera man-pages]$ exit
Back on my local machine, modify the git repository's config file so that I can write simple git push commands to push changes onto the kernel.org repository:
$ cat >> ~/man-pages-git/.git/config
[remote "kernel.org"]
url = ssh://master.kernel.org/pub/scm/docs/man-pages/man-pages.git
push = +refs/heads/master:refs/heads/master
^D
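The push line is a refspec, and its three parts are easy to misread. As a reading aid, the sketch below splits such a string into its pieces: a leading + asks git to update the remote ref even when the update is not a fast-forward, the part before the colon names the local ref, and the part after it names the ref on the remote. This is plain string handling for illustration, not git's own parser, and explain_refspec is a made-up name.

```shell
#!/bin/sh
# Illustration only: split a push refspec into its parts using plain
# string operations. explain_refspec is a hypothetical helper name.
explain_refspec() {
    spec=$1
    case $spec in
        +*) force=yes; spec=${spec#+} ;;   # leading "+" allows non-fast-forward updates
        *)  force=no ;;
    esac
    src=${spec%%:*}                        # local ref (left of the colon)
    dst=${spec#*:}                         # remote ref (right of the colon)
    echo "force=$force src=$src dst=$dst"
}

explain_refspec '+refs/heads/master:refs/heads/master'
# prints: force=yes src=refs/heads/master dst=refs/heads/master
```

The same settings can also be written with git remote add and git config remote.kernel.org.push instead of editing the config file by hand.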
And then do the push to kernel.org:
$ cd $HOME/man-pages-git
$ git push kernel.org
mtk@master.kernel.org's password:
Counting objects: 36341, done.
Compressing objects: 100% (12059/12059), done.
Writing objects: 100% (36341/36341), 8.62 MiB 681 KiB/s, done.
Total 36341 (delta 29117), reused 29623 (delta 23595)
To ssh://master.kernel.org/pub/scm/docs/man-pages/man-pages.git
* [new branch] master -> master
Log into kernel.org and give the repository a public description:
$ ssh mtk@master.kernel.org
mtk@master.kernel.org's password:
[mtk@hera man-pages]$ cat > /pub/scm/docs/man-pages/man-pages.git/description
Man pages for Linux kernel and glibc APIs
^D
[mtk@hera man-pages]$ exit
And then we have the repository visible and ready for use at http://git.kernel.org/, and you can clone it using the following command:
$ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
PS: In case it's not clear: only use git to submit changes to man-pages if that's your preference. It's still possible to submit patches the good old-fashioned way, by grabbing the latest release (tarballs will continue to be released as usual every week or two), editing the required source file(s), and sending me a diff -u patch by email.
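For anyone who hasn't made a patch this way before, here is a minimal sketch of the old-fashioned route: keep a pristine copy of the page, edit a second copy, and diff the two. The helper name make_patch and the file names in the usage comment are only examples.

```shell
#!/bin/sh
# Sketch: write a unified diff between a pristine copy of a page and an
# edited copy to stdout. make_patch is a hypothetical helper name.
make_patch() {
    orig=$1
    edited=$2
    diff -u "$orig" "$edited"
    status=$?
    # diff exits 0 for "no differences" and 1 for "differences found";
    # anything greater signals trouble (for example, a missing file).
    [ "$status" -le 1 ]
}

# Usage (file names are examples only):
#   cp man2/open.2 man2/open.2.orig
#   ...edit man2/open.2...
#   make_patch man2/open.2.orig man2/open.2 > open.2.patch
```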

2 comments:

_ said...

Congratulations, finally.

I am hoping that you would find that the switch was worth the hassle. I should have tried much harder to sell git to you when we sat next to each other at OLS a few years ago.

And thanks for keeping the manual pages up to date for all these years.

-jc

Michael Kerrisk said...

Thanks. As it turns out, there are still a few things to work out -- my import didn't manage to bring the svn tags with it, and so far I haven't deduced the right way of doing that. Drop me a mail if you know more about that.