- A new makedev(3) page documents the makedev(), major(), and minor() macros used to manipulate device IDs.
- A new pthread_cleanup_push_defer_np(3) page documents pthread_cleanup_push_defer_np() and pthread_cleanup_pop_restore_np().
- The accept(2) page adds documentation of the new accept4() system call (coming in Linux 2.6.28).
- The fmemopen(3) page adds a description of open_wmemstream(3).
- The tcp(7) page adds a description of the use of the MSG_TRUNC flag for retrieving data from a TCP socket.
- Many updates to the atexit(3) page.
- Updates to many other pages.
Friday, December 5, 2008
Posted by Michael Kerrisk at 11:30 PM
Wednesday, December 3, 2008
Not quite 6 months since I started the Linux Foundation fellowship, it's time to analyze and reflect on what has (or hasn't) been accomplished.
I took over maintainership of man-pages at the start of November 2004, with the first release being man-pages-2.00. From then until the fellowship started in the middle of May this year (a period of 185 weeks), I probably spent between 0 and 2.5 days a week on man-pages, most of it done as private, volunteer work. (For a period of around a year, I probably managed up to about half a day a week as part of my day job; thanks Google!) I'd guess it was a bit better than a day a week on average (let's say 1.25 days), and we could roughly estimate that as the equivalent of 45 working weeks.
Since the fellowship started, I've worked for about 25 weeks on man-pages; that is, somewhat more than half of the estimated time that I spent on man-pages in the preceding 3.5 years. The first release during my tenure of the fellowship was man-pages-2.80, and since then there have been 15 more (man-pages-3.00 through man-pages-3.14).
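As a rough check, the arithmetic behind these estimates can be sketched in a few lines of Python. The five-day working week is an assumption (the figures above imply it but do not state it), and the variable names are purely illustrative.

```python
# Rough reproduction of the time estimates above.
# Assumption (not stated in the post): a full working week is 5 days.
DAYS_PER_WORKING_WEEK = 5

weeks_before_fellowship = 185     # November 2004 to mid-May 2008
avg_days_per_week_before = 1.25   # the guessed pre-fellowship average
fellowship_weeks = 25             # full-time weeks worked so far
estimated_prior_weeks = 45        # the rounded estimate used in the text

# Part-time effort before the fellowship, expressed in full-time weeks.
equivalent_weeks = (
    weeks_before_fellowship * avg_days_per_week_before / DAYS_PER_WORKING_WEEK
)

# Fellowship effort as a fraction of the earlier 3.5 years of effort.
fraction_of_prior_effort = fellowship_weeks / estimated_prior_weeks

# How much faster full-time work is than the earlier average rate.
rate_multiplier = DAYS_PER_WORKING_WEEK / avg_days_per_week_before

print(f"equivalent full-time weeks before the fellowship: {equivalent_weeks:.2f}")
print(f"fellowship effort relative to earlier effort: {fraction_of_prior_effort:.0%}")
print(f"full-time rate relative to earlier rate: {rate_multiplier:.1f}x")
```

Under that assumption the unrounded figure comes out slightly above the 45 weeks quoted, which is well within the precision of the original guess.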
What I'm expecting is that the limiting factor in the progress of man-pages is the availability of my time. If I get to work at around four times the rate I did before, then we should see a corresponding increase in the progress of man-pages. Very roughly, in the last 6 months, progress should have been somewhat more than 50% of what it was in the previous 3.5 years.
So here's a first comparison:
|Period||Number of releases|
Well, that doesn't look so good. But there's no question that there's more work going on for each release nowadays. Here's another simple statistic, derived from the commit logs:
|Period||Number of commits|
Commits in the last 6 months were nearly 50% of the total during the previous 3.5 years. That seems roughly in line with expectations, and supports the theory that there's a lot more work going into each man-pages release nowadays. Of course, commits vary a lot in size, ranging from a spelling fix, to a complete new page, and going through to some of the enormous global formatting fixes that took place in the man-pages-2.* series, so this is a very rough measure. (One of the commits cleaning up source file layout in man-pages-2.47 had a diff size of more than 60000 lines(!). There were many other large formatting commits in the man-pages-2.* series, which is why trying to compare the volume of diffs before and during the fellowship doesn't produce a useful metric.)
Another rough measure is how many man pages were added to the set over time:
|Period||Number of new pages added|
Again, that's roughly in line with expectations, with the number of pages added during the fellowship being somewhat more than 50% of the previous period.
But where did the new pages come from?
|Period||By mtk||By others||By mtk + other(s)||Imports|
"mtk" is me. "Other(s)" is someone else. "Imports" are pages under a free license that I scooped up from some other source (e.g., found on the net, in a distro, or in BSD).
On the negative side, I wrote the vast majority of new pages that have been added so far during the fellowship. On the positive side, Paul Jackson contributed the single biggest page, cpuset(7), which became the fourth largest page in man-pages. (Also worth noting: in the man-pages-2.* releases, a total of 28 pages were deleted, mainly obsolete pages in Section 1.) In fact, I had hoped to be able to get even more pages written, but other tasks, such as testing, API review, and kernel patches have also taken up a significant fraction of my time during the fellowship. When considered as a (calendar) monthly rate, contributions of new pages by others are, unfortunately, essentially unchanged since before the fellowship.
So, progress towards improving contributions by others, at least in terms of new pages, has not been good. However, my gut feeling has been that more people are actually contributing to man-pages than before: the fact that there is a full-time maintainer means people are rather more likely to send bug reports, suggestions, and patches for existing pages. Here's a statistic that bears it out:
This was calculated by summing the number of contributors in each of the change logs in all of the releases over the two periods and then dividing by the number of calendar weeks in each period (185 and 28 respectively). 5.9 contributors per week is still much lower than I'd like, but my feeling is that the rate has increased steadily over the time of the fellowship, so that the current rate is already higher than 5.9, and set to increase further. (Another factor that may also have helped boost the number of reports is that in December 2007 I started adding a COLOPHON to each man page describing how to report bugs, and this change would have filtered into distribution CDs a few months later.)
Timeliness of documentation
Things have definitely got better during the fellowship. Most additions and changes to the kernel-userland interface during the time of the fellowship have been documented in man-pages pretty much as they occur. (This contrasts with earlier times, where interface changes have sometimes been followed only months (or in extreme cases years) later by man page updates.) Most notably, Ulrich Drepper's new system calls in Linux 2.6.27 saw man pages go out a few days after the release of that kernel.
Testing and bug reporting
I've done a fair bit of this over the course of the fellowship. Most new system calls and system call extensions got tested by me before they hit mainline. This uncovered a few bugs which were then fixed. The biggest single piece of work here was for the utimensat(2) system call, producing a test suite (later integrated into LTP), along with patches that fixed the 5 or so bugs in the interface (details here).
Many existing glibc functions also got tested as I updated the man pages for them. Most notably, updates to the man pages produced about 35 bug reports related to error reporting by the math functions. The addition of man pages for various pthreads functions has also been accompanied by a lot of testing, and a half dozen or so bug reports.
API design review
Most new system calls and system call extensions got reviewed before going into mainline. (My record on other kernel interfaces, such as /proc files, was more spotty, though.) Among other things, this resulted in a redesign of the proposed extension of the accept() system call (originally proposed as paccept(), with a signal set argument whose necessity was dubious, later revised to accept4(), which should appear in kernel 2.6.28).
- man-pages moved from a private Subversion repository to a public git repository on kernel.org.
- In general, I'm blogging a bit more actively nowadays, and in addition to posts summarizing releases, there have been longer posts on topics such as: problems with kernel-userland interface design and implementation; the state of error reporting for glibc's math functions; and a few articles describing or summarizing Linux kernel-userland interface changes.
- After my presentation at LPC for the kernel-userland interface track, I finally got round to an idea I'd been considering for a while: creating the linux-api mailing list. The rationale for the list is that all patches that cause API/ABI changes should be CCed to the list, so that the many parties who are interested in API/ABI changes (e.g., man-pages, LSB, libc developers, kernel developers, testers such as the folk at LTP, and of course userland developers) can get an idea of what's going on. Most people still don't read Documentation/SubmitChecklist, and so don't know that they should be using this list. I therefore try to chase people regularly to use it (and some others also help in that regard), and by now at least some people do so without prompting.
- I've helped out LSB on a number of occasions, filing a few bug reports against the spec, but also helping out by writing man pages for functions that they wanted to specify that were currently undocumented (see, e.g., here and here). This resulted in new man pages such as getprotoent_r(3), getservent_r(3), getnetent_r(3), and pthread_getattr_np().
- I continue to respond to many bug reports in the manpages and manpages-dev components of Debian's bug tracking system. This has mutual benefits: on the one hand, although I'm not actually a member of Debian, I'm by far the most active fixer of their bug reports; on the other hand, most Debian bug reports for man pages really apply to the upstream pages (I ignore the ones that don't), and so the reports provide a valuable source of pointers to things that need fixing in man-pages. A big thank you to Debian users, who produce far more (and more useful) man-pages bug reports than all of the other distributions put together!
- Working on man-pages led me to find various deficiencies in POSIX.1 specifications, resulting in around a half dozen bug reports to the Austin group.
Posted by Michael Kerrisk at 11:50 AM
Tuesday, November 25, 2008
- A new CPU_SET(3) page documents the CPU_* macros used for manipulating CPU sets (the cpu_set_t data structure used by sched_setaffinity(2), pthread_attr_setaffinity_np(3), and pthread_setaffinity_np(3)).
- More man pages for POSIX threads library functions: 5 new pages documenting 8 functions
- pthread_attr_setinheritsched(3) (includes documentation of pthread_attr_getinheritsched())
- pthread_cleanup_push(3) (includes documentation of pthread_cleanup_pop())
- pthread_setcancelstate(3) (includes documentation of pthread_setcanceltype())
- clone(2) adds documentation of the CLONE_NEWNET, CLONE_NEWUTS, CLONE_NEWPID, and CLONE_NEWIPC flags.
- mmap(2) adds documentation of the MAP_STACK flag.
- arp(7) adds documentation of a few previously undocumented /proc files.
- icmp(7) adds documentation of a few previously undocumented /proc files.
- tcp(7) adds documentation of many previously undocumented /proc files.
- udp(7) adds documentation of a few previously undocumented /proc files.
- pthreads(7) adds lists of the functions that POSIX.1 specifies must be, and may be, cancellation points.
Posted by Michael Kerrisk at 10:40 PM
Friday, November 7, 2008
- More man pages for POSIX threads library functions: 6 new pages documenting 11 functions.
- pthread_attr_setaffinity_np(3) (includes documentation of pthread_attr_getaffinity_np(3))
- pthread_attr_setschedparam(3) (includes documentation of pthread_attr_getschedparam(3))
- pthread_attr_setschedpolicy(3) (includes documentation of pthread_attr_getschedpolicy(3))
- pthread_setaffinity_np(3) (includes documentation of pthread_getaffinity_np(3))
- pthread_setschedparam(3) (includes documentation of pthread_getschedparam(3))
- Various fixes and improvements to the example program in epoll(7).
Posted by Michael Kerrisk at 2:27 PM
Thursday, October 30, 2008
In recent Linux kernels, especially 2.6.27, a number of system calls have changed, or new versions of existing system calls have been added, to allow more control over the file descriptors created by those system calls. (Most of this work has been done by Ulrich Drepper.) These changes have taken the form of either adding new bits to the flags bit-mask argument of an existing system call, if it had such an argument, or creating a new version of the system call that adds an extra flags argument. In most cases, two new flags have been added: a close-on-exec flag, and a non-blocking flag, which we describe shortly.
The changes are summarized in the table below. In this table, the Kernel column indicates the kernel version where the change occurred, and the Glibc column indicates the version of glibc that adds the corresponding wrapper functions and/or header file definitions. (Note: glibc 2.9 is not yet released.)
|System call||Change||Kernel||Glibc||Notes|
|open(2)||New flag: O_CLOEXEC|| || ||Flag also supported for openat(2). These syscalls already supported O_NONBLOCK.|
|fcntl(2)||New flag: F_DUPFD_CLOEXEC|| || ||Performs a similar task to dup3(2)|
|recvmsg(2)||New flag: MSG_CMSG_CLOEXEC||2.6.23||2.7|| |
|dup3(2)||New syscall, like dup2(2), but adds flags argument (O_CLOEXEC)||2.6.27||2.9||Requires new glibc interface|
|pipe2(2)||New syscall, like pipe(2), but adds flags argument: O_CLOEXEC, O_NONBLOCK||2.6.27|| ||Requires new glibc interface|
|socket(2)||New flags in type argument: SOCK_CLOEXEC, SOCK_NONBLOCK||2.6.27|| || |
|socketpair(2)||New flags in type argument: SOCK_CLOEXEC, SOCK_NONBLOCK||2.6.27|| || |
|epoll_create1(2)||New syscall, like epoll_create(2), but adds flags argument: EPOLL_CLOEXEC; the new system call drops epoll_create()'s obsolete size argument||2.6.27|| ||Requires new glibc interface|
|inotify_init1(2)||New syscall, like inotify_init(2), but adds flags argument: IN_CLOEXEC, IN_NONBLOCK||2.6.27|| ||Requires new glibc interface|
|eventfd2(2)||New syscall, like eventfd(2), but adds flags argument: EFD_CLOEXEC, EFD_NONBLOCK||2.6.27|| ||The glibc eventfd() wrapper already allowed a flags argument, so no new wrapper is required|
|signalfd4(2)||New syscall, like signalfd(2), but adds flags argument: SFD_CLOEXEC, SFD_NONBLOCK||2.6.27|| ||The glibc signalfd() wrapper already allowed a flags argument, so no new wrapper is required|
|timerfd_create(2)||New flags: TFD_CLOEXEC, TFD_NONBLOCK||2.6.27|| || |
A proposed analogous change for accept(2), paccept(), supporting flags SOCK_CLOEXEC and SOCK_NONBLOCK and treatment of a signal mask argument like pselect(2), was debated and then spent some time in limbo, but has recently re-emerged in a somewhat modified form, accept4() (which was in fact the original proposal), that will probably go into Linux 2.6.28 or 2.6.29.
Perhaps one day there might even be an analogous change for mq_notify(3), since (on Linux, but not on most other systems) a message queue descriptor is really just a file descriptor.
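Two of the table's entries can be tried out from user space. The following is a minimal sketch in Python (not taken from the man pages), assuming a Linux system with Python 3.7 or later, where the standard library exposes fcntl.F_DUPFD_CLOEXEC, socket.SOCK_CLOEXEC, and socket.SOCK_NONBLOCK. The helper names has_cloexec() and is_nonblocking() are made up for this illustration.

```python
import fcntl
import os
import socket


def has_cloexec(fd):
    """Return True if the close-on-exec file descriptor flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


# fcntl(2) F_DUPFD_CLOEXEC: duplicate a descriptor and set close-on-exec
# on the duplicate in one call (using the lowest free descriptor >= 0).
read_end, write_end = os.pipe()
dup_fd = fcntl.fcntl(read_end, fcntl.F_DUPFD_CLOEXEC, 0)
dup_is_distinct = dup_fd != read_end
dup_is_cloexec = has_cloexec(dup_fd)

# socket(2): SOCK_CLOEXEC and SOCK_NONBLOCK are ORed into the type argument.
# (Python asks for close-on-exec by default anyway; the flag is spelled out
# here only to mirror the C-level call.)
sock = socket.socket(
    socket.AF_UNIX,
    socket.SOCK_STREAM | socket.SOCK_CLOEXEC | socket.SOCK_NONBLOCK,
)
sock_is_cloexec = has_cloexec(sock.fileno())
sock_is_nonblocking = is_nonblocking(sock.fileno())

print("duplicate has close-on-exec:", dup_is_cloexec)
print("socket has close-on-exec:", sock_is_cloexec)
print("socket is non-blocking:", sock_is_nonblocking)

# Tidy up the descriptors created for the demonstration.
for fd in (read_end, write_end, dup_fd):
    os.close(fd)
sock.close()
```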
The close-on-exec flag (*_CLOEXEC)
The addition of a close-on-exec flag was the primary motivator for the system call changes. Specifying this flag causes the file descriptor created by the system call to automatically have its close-on-exec flag set. (This flag causes the file descriptor to automatically be closed if the process does a successful execve(2).)
Before the existence of this flag, it was already possible to set the close-on-exec flag of a file descriptor after it had been created, using the fcntl(2) F_GETFD and F_SETFD operations. The two additional system calls that this required were less of a problem than the lack of atomicity: because the descriptor was created in one step and flagged in another, multithreaded programs were exposed to a race in which one thread was trying to set a file descriptor's close-on-exec flag at the same time as another thread was performing a fork() plus execve(), so that the descriptor could leak into the newly executed program. Ulrich Drepper explains the resulting security issues in more detail.
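The difference between the two approaches can be sketched as follows. This is an illustrative Python fragment, assuming Linux and the standard os and fcntl modules; os.pipe2() merely stands in for any of the descriptor-creating calls, and the helper names are hypothetical.

```python
import fcntl
import os


def has_cloexec(fd):
    """Return True if the close-on-exec file descriptor flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Old approach: two extra fcntl() calls after the descriptor exists."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


# Old, non-atomic sequence: create the descriptors, then flag them.
old_r, old_w = os.pipe2(0)
# Race window: a fork() plus execve() performed by another thread at this
# point could leak both descriptors into the new program.
set_cloexec(old_r)
set_cloexec(old_w)
old_flags_set = has_cloexec(old_r) and has_cloexec(old_w)

# New, atomic form: the descriptors are created with the flag already set,
# so there is no moment at which they exist without it.
new_r, new_w = os.pipe2(os.O_CLOEXEC)
new_flags_set = has_cloexec(new_r) and has_cloexec(new_w)

print("two-step result:", old_flags_set)
print("atomic result:", new_flags_set)

for fd in (old_r, old_w, new_r, new_w):
    os.close(fd)
```

Both sequences end in the same state; the point is only that the second never passes through an unflagged intermediate state.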
The non-blocking flag (*_NONBLOCK)
The *_NONBLOCK flag causes the non-blocking flag to be set on the open file description associated with the new file descriptor. (For a discussion of the relationship of a file descriptor to an open file description, see the open(2) man page.)
Note that there deliberately is no *_NONBLOCK flag for dup3(2). This would not be sensible, since the new file descriptor shares an open file description with the old file descriptor.
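The sharing described here is easy to observe. The sketch below is a Python illustration under the assumption of a Linux system; is_nonblocking() is a made-up helper. It creates a non-blocking pipe, duplicates the read end, and clears O_NONBLOCK through the duplicate.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


# pipe2(2) with O_NONBLOCK sets the flag on both new open file descriptions.
read_end, write_end = os.pipe2(os.O_NONBLOCK)

# Reading from the empty non-blocking pipe fails at once instead of waiting.
try:
    os.read(read_end, 1)
    read_would_block = False
except BlockingIOError:
    read_would_block = True

# A duplicate refers to the same open file description as the original...
dup_read_end = os.dup(read_end)
flags = fcntl.fcntl(dup_read_end, fcntl.F_GETFL)
fcntl.fcntl(dup_read_end, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)

# ...so clearing the flag through the duplicate also clears it for the
# original descriptor, while the write end (a separate open file
# description) is unaffected.
read_end_nonblocking = is_nonblocking(read_end)
write_end_nonblocking = is_nonblocking(write_end)

print("empty read would have blocked:", read_would_block)
print("read end still non-blocking:", read_end_nonblocking)
print("write end still non-blocking:", write_end_nonblocking)

for fd in (read_end, write_end, dup_read_end):
    os.close(fd)
```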
There is also deliberately no *_NONBLOCK flag for epoll_create1(2), since equivalent functionality can be obtained with a zero timeout.
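As an illustration of the zero-timeout approach, here is a small Python sketch using the standard select.epoll wrapper (Linux only). It is not drawn from the epoll man pages, and the variable names are arbitrary.

```python
import os
import select

ep = select.epoll()
read_end, write_end = os.pipe()
ep.register(read_end, select.EPOLLIN)

# A timeout of zero makes the wait return immediately, whether or not
# any registered descriptor is ready.
events_when_empty = ep.poll(0)

os.write(write_end, b"x")
events_when_readable = ep.poll(0)

print("nothing written yet:", events_when_empty)
print("after writing one byte:", events_when_readable)

ep.close()
os.close(read_end)
os.close(write_end)
```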
The flags argument added for the new system calls allows for other kinds of functionality to be added to these system calls in the future.
Ulrich Drepper has already done some work on getting some of these interface changes into the POSIX.1-2008 standard, which includes specifications of the O_CLOEXEC flag for open() and the F_DUPFD_CLOEXEC operation for fcntl(). In the future, some of the other changes may also make their way into the standard.
The numbers in the names of the new system calls refer to the number of arguments that each system call has. This is an extension of a convention that was used for some existing Unix system calls, notably dup2(2), wait3(2), and wait4(2). Note that while the wrapper function for signalfd(2) has three arguments, the underlying signalfd4() system call really does have four arguments, as described in the man page. (However, this suggests that, in the end, this naming scheme might not have been the best choice.)
Posted by Michael Kerrisk at 4:26 PM
- Documentation is added for a set of new and changed system calls (which will be the subject of a future post) that extend the functionality of existing system calls that work with file descriptors. (The changes occurred in kernel 2.6.27.) The new and modified system calls add flags that allow the close-on-exec file descriptor flag to be set, and the non-blocking file status flag to be set on a file description, as the file is opened. The modified pages are:
- dup(2) adds a description of the new dup3() system call.
- epoll_create(2) adds a description of the new epoll_create1() system call.
- eventfd(2) adds a description of the new eventfd2() system call.
- inotify_init(2) adds a description of the new inotify_init1() system call.
- pipe(2) adds a description of the new pipe2() system call.
- signalfd(2) adds a description of the new signalfd4() system call.
- socket(2) and socketpair(2) add a description of the new SOCK_CLOEXEC and SOCK_NONBLOCK flags.
- timerfd_create(2) adds a description of the new TFD_CLOEXEC and TFD_NONBLOCK flags.
- A start has been made on providing man pages for the POSIX threads library functions: 15 new pages documenting 23 functions.
- pthread_attr_init(3) (includes documentation of pthread_attr_destroy(3))
- pthread_attr_setdetachstate(3) (includes documentation of pthread_attr_getdetachstate(3))
- pthread_attr_setguardsize(3) (includes documentation of pthread_attr_getguardsize(3))
- pthread_attr_setscope(3) (includes documentation of pthread_attr_getscope(3))
- pthread_attr_setstack(3) (includes documentation of pthread_attr_getstack(3))
- pthread_attr_setstackaddr(3) (includes documentation of pthread_attr_getstackaddr(3))
- pthread_attr_setstacksize(3) (includes documentation of pthread_attr_getstacksize(3))
- pthread_tryjoin_np(3) (includes documentation of pthread_timedjoin_np(3))
- Many details were added to, or fixed in, ld.so(8).
Posted by Michael Kerrisk at 3:28 AM
Tuesday, October 7, 2008
- A new umount(2) page has been created by splitting the umount() and umount2() material out of the old mount(2) page.
- The mount(2) page adds a description of per-process namespaces.
- Various fixes and improvements in getdents(2), including the addition of an example program.
- Many improvements and additions in signal(7), the page that provides an overview of signals on Linux.
- Numerous fixes to many other pages.
Posted by Michael Kerrisk at 11:30 AM
Wednesday, September 24, 2008
I've uploaded man-pages-3.10 into the release directory (or view the online pages). This is a fairly light release; conferences, and learning and changing my workflow for git have taken a bit of time lately. Notable changes in man-pages-3.10 are:
Posted by Michael Kerrisk at 11:04 AM
Thursday, September 18, 2008
I updated my previous post, mainly to add the details required to import subversion tags into git. If you cloned the repository that I put up at kernel.org about a week ago, you'll need to re-clone. (Sorry!)
The timing has been good. Primed and ready to start learning more about git, I got to join a sizable crowd at the Linux Plumbers Conference to see Linus giving a highly informative and entertaining tutorial on git.
Posted by Michael Kerrisk at 5:14 PM
Thursday, September 11, 2008
[Update: 19 Sep 2008: when I first attempted the git import, I didn't import the subversion tags into git. I've updated and expanded this post to include the required details to do that.]
When I inherited man-pages, there was no version control system (VCS) in use. To help myself keep track of changes, I've been running a private subversion repository since I took over as maintainer (i.e., since man-pages-2.00), but I never got round to hosting it on a public server so that people could pull from it (requests for such a facility were only occasional). Instead, people wanting to send patches would just grab the latest tarball from the downloads directory, patch the required source file, and email me the patch.
Somewhat more frequent requests for a public repository, and the fact that the Linux world is nowadays mostly oriented around the git distributed version control system, have gradually created a pressure to change things. So, I'm taking baby steps towards using git for man-pages. Here goes...
Importing from subversion to git
I found the Simplistic Complexity blog's simple instructions on subversion-to-git migration quite useful (though it didn't supply all of the details I needed for importing subversion tags).
My subversion repository had a somewhat non-standard layout, which affected the options (see the git svn init command below) that I needed to do an import that included my subversion tags. (Thanks to various people on the git mailing list who helped me find the right way to do things, especially Björn Steinbrink and Michael Gruber.) The following subversion commands give an idea of the layout:
$ svn list file:///home/mtk/man-pages-rep
$ svn list file:///home/mtk/man-pages-rep/trunk
$ svn list file:///home/mtk/man-pages-rep/trunk/man-pages
$ svn list file:///home/mtk/man-pages-rep/tags
$ svn list file:///home/mtk/man-pages-rep/tags
$ svn list file:///home/mtk/man-pages-rep/tags/man-pages-2.00
$ svn list file:///home/mtk/man-pages-rep/tags/man-pages-2.01
[and so on]
$ svn list file:///home/mtk/man-pages-rep/branches
[i.e., no branches, since this has been a linear svn repo.]
Set up an empty, temporary git repository, in the process informing git about the location of the subversion repository from which the import should be done:
$ cd $HOME
$ mkdir man-pages-git-tmp
$ cd man-pages-git-tmp
$ git svn init file:///home/mtk/man-pages-rep/ \
-T trunk/man-pages \
Initialized empty Git repository in .git/
Tell git about the names of users in the subversion commit logs. Because the subversion repository was private (I just took other people's emailed input and made changes as required), I'm the only user. That means that the historical information in the repository will be bogus, suggesting that I'm responsible for all of the nearly 5000 commits to the repository, when it's probably more like 75%. My apologies to everyone else. Something like the true story for man-pages 2.00 through to 3.09 can be found here.
$ cat > ~/users.txt
mtk = Michael Kerrisk
$ git config svn.authorsfile ~/users.txt
Do the import:
$ time git svn fetch
r4917 = 93a6e8f9ee5d0710a084425548775348389e2900 (git-svn)
Checking out files: 100% (1925/1925), done.
Checked out HEAD:
And yes, I definitely need some faster hardware...
At this point, git branch -a showed that I had imported the subversion tags into git. But the tags still need to be turned into proper git tags. This requires a command for each tag, which can be automated in a script. (Thanks to Heikki Orsila who pointed out to me that this step was required, and who supplied the script.)
$ cat git-svn-tags.sh
# Turn each imported subversion tag (a branch under tags/) into an annotated git tag
git branch -a | grep tags/ | while read tag ; do
    tagname=$(echo $tag | cut -d/ -f2)
    commit=$(git rev-parse $tag)
    res=$(git log master | grep "^commit $commit")
    if test -z "$res" ; then
        # take the parent commit for the tag commit (found in master)
        commit=$(git rev-parse $tag^)
    fi
    echo $tagname $commit
    git tag -a $tagname $commit -m "This is $tagname"
done
$ sh ~/git-svn-tags.sh
$ git tag -l
Clone the temporary repository, to produce a clean repository from which any lingering cruft that was used to support git svn has been removed:
$ git clone man-pages-git-tmp man-pages-git
Initialized empty Git repository in /home/mtk/man-pages-git/.git/
And then run a garbage collection to clean up any remaining unneeded files, and otherwise improve the efficiency of the repository:
$ git gc
Counting objects: 36341, done.
Compressing objects: 100% (12059/12059), done.
Writing objects: 100% (36341/36341), done.
Total 36341 (delta 29117), reused 29623 (delta 23595)
Hosting the git repository on kernel.org
The public git repository is to be hosted on kernel.org. Quite a while back the admins there already set up a location for a man-pages git repository. Jeremy Kerr kindly held my hand through the hurdles of the initial set up.
On kernel.org, the admins assigned man-pages a repository location of /pub/scm/docs/man-pages. So, let's create an empty repository there:
$ ssh firstname.lastname@example.org
[mtk@hera man-pages]$ cd /pub/scm/docs/man-pages
[mtk@hera man-pages]$ git init
[mtk@hera man-pages]$ mv .git/ man-pages.git
[mtk@hera man-pages]$ exit
Back on my local machine, modify the git repository's config file so that I can write simple git push commands to push changes onto the kernel.org repository:
$ cat >> ~/man-pages-git/.git/config
url = ssh://master.kernel.org/pub/scm/docs/man-pages/man-pages.git
push = +refs/heads/master:refs/heads/master
And then do the push to kernel.org:
$ cd $HOME/man-pages-git
$ git push kernel.org
Counting objects: 36341, done.
Compressing objects: 100% (12059/12059), done.
Writing objects: 100% (36341/36341), 8.62 MiB 681 KiB/s, done.
Total 36341 (delta 29117), reused 29623 (delta 23595)
* [new branch] master -> master
Log into kernel.org and give the repository a public description:
$ ssh email@example.com
[mtk@hera man-pages]$ cat > /pub/scm/docs/man-pages/man-pages.git/description
Man pages for Linux kernel and glibc APIs
[mtk@hera man-pages]$ exit
And then we have the repository visible and ready for use at http://git.kernel.org/, and you can clone it using the following command:
$ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
PS In case it's not clear: only use git to submit changes to man-pages if that's your preference. It's still possible to submit patches the good old-fashioned way, by grabbing the latest release (tarballs will continue to be released as usual every week or two), editing the required source file(s), and sending me a diff -u patch by email.
Posted by Michael Kerrisk at 10:44 AM
Wednesday, September 10, 2008
- A new fopencookie(3) page documents a library function that allows a custom implementation of a stdio stream.
- A new networks(5) page (adopted from Debian) documents the /etc/networks file.
- Feature test macro requirements were added or fixed for various pages.
- Many parts of the hsearch(3) man page were fixed and rewritten.
Posted by Michael Kerrisk at 3:55 PM
Wednesday, August 27, 2008
- A new numa(7) page gives an overview of the Linux NUMA interfaces.
- A new getnetent_r(3) page documents getnetent_r(), getnetbyname_r(), and getnetbyaddr_r(), the reentrant equivalents of getnetent(), getnetbyname(), and getnetbyaddr().
- A new getprotoent_r(3) page documents getprotoent_r(), getprotobyname_r(), and getprotobynumber_r(), the reentrant equivalents of getprotoent(), getprotobyname(), and getprotobynumber().
- A new getrpcent_r(3) page documents getrpcent_r(), getrpcbyname_r(), and getrpcbynumber_r(), the reentrant equivalents of getrpcent(), getrpcbyname(), and getrpcbynumber().
- A new getservent_r(3) page documents getservent_r(), getservbyname_r(), and getservbyport_r(), the reentrant equivalents of getservent(), getservbyname(), and getservbyport().
- Further updates related to changes in the recently approved POSIX.1-2008 standard.
Posted by Michael Kerrisk at 3:22 PM
Friday, August 15, 2008
Tuesday, August 12, 2008
- A new move_pages(2) page, written by Christoph Lameter, documents the move_pages() system call. This page was formerly part of the numactl package, but has been revised and moved into man-pages (its natural home, since it is a kernel interface).
- A new clock_getcpuclockid(3) page describes the clock_getcpuclockid() library function.
- A new udplite(7) page, written by Gerrit Renker, documents the Linux implementation (since kernel 2.6.20) of the UDP-Lite transport layer protocol.
- The proc(5) man page adds a description of the /proc/PID/numa_maps file.
- Various updates and improvements by Lee Schermerhorn for the mbind(2), get_mempolicy(2), and set_mempolicy(2) pages.
- Following on from last week's big update to the math man pages, there are a few more changes to these pages. Most notably, where error-reporting details have differed (from glibc 2.8) in earlier versions of glibc, the differences have been noted in the man pages. Currently, the details have been extended back until glibc 2.3.2, but I hope to extend them back further, when I can get test results. If you want to help, look here.
- On 24 July, the Governing Board of The Open Group approved the 2008 revisions of POSIX.1. Among other things, POSIX.1-2008 marks some previously specified functions as obsolete, and drops the specifications of some other functions altogether. The manual pages have been updated to reflect all of these changes. (The standard should be published in final form after approval by the IEEE, but in the meantime you can get access to the draft by joining the Austin group.)
- VERSIONS sections have been added to the man pages of many library functions to indicate the glibc version where the function first appeared.
Posted by Michael Kerrisk at 2:33 PM
Friday, August 8, 2008
Lately, I've gotten a few requests for information about how to translate man-pages into other languages.
First off, I should say that I have never translated man-pages. But I have communicated with a few people who do. So these are my current thoughts...
Do you really want to do this? Before you answer this, consider the following:
- man-pages contains the documentation of the Linux and glibc programming APIs (i.e., pages in Sections 2, 3, 4, 5, and 7). Is this the set of man pages that your group of language speakers most need? If, for example, you are more interested in translating pages for end users, then you might want instead to translate pages in the coreutils package. (More generally, if you want to find out which package a particular man page belongs to, take a look here.)
- How big is the target audience? Your target audience is primarily programmers. What proportion of them aren't able to read English well enough to read man pages, and therefore would benefit from a translation? Is that group big enough to warrant the effort of a translation? Or is there perhaps a better place where you can invest your time in working on Linux?
- How much time do you have? There are currently around 850 pages in man-pages, amounting to perhaps 2000 pages of printed text. My guess is that this amounts to one to two person years of translation work. In other words, you'll need to have a team of translators, if you intend to complete the translation in any reasonable time.
- What is your longer term commitment? man-pages is a moving target: starting a couple of months ago, I'm now working full time on man-pages, and I make a release every week or so. The French translator estimates that there is around two days' work for him translating each release. Now, I may not be working full time on man-pages forever, and therefore the required translation effort may decrease some day, but the point remains that there is a significant ongoing effort required to keep a translation up to date and useful.
The size of the translation effort should not be underestimated. It is because it is so large that to date there has been only one complete and up-to-date translation: the French translation. (For a while, there was a fairly full German translation, but it seems to have languished for a few years now.) The state of the French translation has largely been down to the extraordinary work of two people: Christophe Blaess, and more recently, Alain Portal. (In fact, there are nowadays two French translations which cooperate to some extent: the Debian distribution has a team doing a French translation of man-pages.) But nowadays even the French translator(s) have started to feel the strain resulting from the recent increase in my output.
If you decide you really want to do a translation (and think very carefully before you do decide that!), then I have a few thoughts on how you go about it.
Tools: I have no real recommendations here (since I never translated man-pages). But it's worth mentioning that the Debian French translators use po4a, and see it as very beneficial for their work, especially for facilitating the work of a team of translators.
Other than that, I'd say that you need to:
- Estimate the time required to translate the 850 pages in man-pages, and decide if you have the necessary number of translators who have sufficient time to complete the work.
- Divide the work up so that your translators can work independently on translations. I suggest you divide the pages up into small, related parcels. For example, the POSIX message queue pages (mq_*) could be a parcel translated by a single translator, or the math man pages could be a parcel translated by a single translator, etc.
- Come up with a review plan, so that each translation by one member of your team is reviewed by at least one other member.
- Devise a glossary of terminology, so that you all translate English technical terms (e.g., "shared memory segment") into the same terms in the target language.
- Plan for ongoing maintenance, so that as the English man pages are updated, the translated pages are updated too. Don't underestimate the amount of this work!
My suggestion is that if you go forward with a translation project, then:
- Pick a particular man-pages release -- let's say man-pages-3.x -- and translate all of the pages in that version.
- When that is completed, you can then update your translation with all of the changes that have occurred in the English original since release man-pages-3.x. I keep fairly detailed changelogs which should assist you during this phase of the work.
I suggest doing things this way since I estimate that trying to do a translation while simultaneously trying to keep up with changes in already translated pages would just prove too difficult. You might decide otherwise.
And finally... did I mention that you should think long and hard before embarking on a translation of man-pages?
[12 Aug 08: minor updates, to point out exactly which sections are in man-pages, and to suggest more appropriate pages to translate, if targeting end users.]
Posted by Michael Kerrisk at 11:41 AM
Wednesday, August 6, 2008
As I mentioned in my previous post, as of man-pages-3.06, the math man pages now describe the error-reporting behavior of the math functions as at glibc 2.8. I'd like to extend those descriptions to cover differences in older glibc versions. In order to do that, I've written some scripts to check the error-reporting behavior of the math functions, and I'd like to run them on as many different versions of glibc as possible.
If you'd like to help, and you have an x86 system with an older glibc (look at the version number in the first line of output produced by the command /lib/libc.so.6), run the script in this tarball (see the README file inside the tarball for details), and send me the resulting log file (email to mtk.manpages AT gmail.com).
Updated, 2012-03-06: Fix link to tarball
Posted by Michael Kerrisk at 11:00 AM
Math functions are different from most other library functions in the kinds of errors that they report, and in the way that they report errors. Broadly speaking, a math function can fail for one of the following reasons:
- Domain error: an argument to the function was outside the range for which the function was defined. For example, the call sqrt(-1.0) gives a domain error because a negative number does not have a (real) square root. When a domain error occurs, a math function typically returns a NaN (not-a-number).
- Pole error: the function result is an exact infinity. For example, log(0.0) is negative infinity. When a pole error occurs, most math functions return the floating-point representation of positive or negative infinity, as appropriate (i.e., HUGE_VAL or -HUGE_VAL for functions returning a double).
- Range error (overflow): an overflow occurs if the function result is too large to be represented as a floating-point number. For example, exp(1e10) produces a number too large to represent in a double. When an overflow occurs, most math functions return the floating-point representation of positive or negative infinity, as appropriate (i.e., HUGE_VAL or -HUGE_VAL for functions returning a double).
- Range error (underflow): an underflow occurs if the function result is so small that it can't be represented as a (normalized) floating-point number. For example, exp(-1e10) produces a number too small to represent in a double. When an underflow occurs, a math function usually either returns a (signed) zero, or a subnormal value, as appropriate.
(More details can be found in the math_error(7) man page.)
Many library functions report an error by returning a NULL pointer or an integer -1. Neither of these mechanisms would be suitable for math functions: these functions usually return a floating-point value, and -1 is in many cases a valid successful return. For this reason, the C99 and POSIX.1-2001 standards define two other mechanisms by which math functions can report errors.
The first of the error-reporting mechanisms is to use the traditional errno variable. We set errno to zero before the call, and if it has a non-zero value after the call, then an error occurred. On error, errno is set as follows:
- Domain error: EDOM
- Pole error: ERANGE
- Overflow: ERANGE
- Underflow: ERANGE
The other error-reporting mechanism is exceptions. For each of the errors described above, the system raises an exception, and the fetestexcept() library function can be used to check whether an exception occurred. In order to use this mechanism we do the following:
- Call feclearexcept(FE_ALL_EXCEPT) to clear any existing exceptions.
- Call the math library function.
- Call fetestexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW).
If the math function was successful, then fetestexcept() returns 0. If an error occurred while calling the math function, then fetestexcept() returns a bit mask indicating the error. In this bit mask, exactly one of FE_INVALID, FE_DIVBYZERO, FE_OVERFLOW, or FE_UNDERFLOW will be set. The exceptions raised for each error are:
- Domain error: invalid exception (FE_INVALID)
- Pole error: divide-by-zero exception (FE_DIVBYZERO)
- Overflow: overflow exception (FE_OVERFLOW)
- Underflow: underflow exception (FE_UNDERFLOW)
C99 and POSIX.1-2001 require an implementation to support at least one of the error-reporting mechanisms for all math functions, and allow both to be supported. The standards specify an identifier, math_errhandling, that an implementation should set to indicate which mechanisms are supported. If (math_errhandling & MATH_ERRNO) is non-zero, then errno is set to indicate errors. If (math_errhandling & MATH_ERREXCEPT) is non-zero, then exceptions are raised on errors.
The CONFORMANCE file in the glibc sources has long explained that:
Implementing MATH_ERRNO, MATH_ERREXCEPT and math_errhandling in
needs compiler support: see
But to date this support has not arrived. In any case, this support is a somewhat moot point, since it transpires that neither of the mechanisms is supported by all of the math functions in glibc: most (but not all) support exceptions, many support both exceptions and errno, a few support errno but not exceptions, and one or two functions support neither mechanism. To make things even worse, the man pages didn't fully and correctly describe the details for each math function. Since man-pages-3.06, the details should now be accurate, at least for glibc 2.8.
Ideally, all of the glibc math functions would support both mechanisms, so that programs that depend on either mechanism could be happily ported to Linux. With that idea in mind, I went through and tested the error-reporting behavior for each math function, and filed a series of bug reports that document deviations from that ideal.
In order to get an overview, the table below summarizes the situation for all of the math functions as at glibc 2.8. The third and fourth columns indicate whether errno is correctly set and an exception is raised for each error case.
|Function||Expected error||errno set correctly?||Exception correctly raised?||Notes|
|asinh()||-||-||-||No errors occur|
|atan()||-||-||-||No errors occur|
|atan2()||-||-||-||No errors occur|
|atanh(+-1)||pole||n||y||errno is set to EDOM (should be ERANGE)|
|cbrt()||-||-||-||No errors occur|
|ceil()||-||-||-||No errors occur|
|erf(+-small)||underflow||n||y||For subnormal x|
|erfc(x)||underflow||n||y||Result underflows but produces representable (i.e., subnormal) result; e.g., erfc(27)|
|exp10(+large)||overflow||y||y||GNU extension, but inconsistent with exp()|
|exp10(-large)||underflow||n||y||GNU extension, but inconsistent with exp()|
|fabs()||-||-||-||No errors occur|
|fdim()||overflow||n||y||e.g., fdim(DBL_MAX, -DBL_MAX)|
|floor()||-||-||-||No errors occur|
|fma()||domain||n||y||Various causes, e.g., one of x or y is an infinity and the other is 0.|
|fma()||overflow||n||y||e.g., fma(DBL_MAX, DBL_MAX, 0)|
|fma()||underflow||n||y||e.g., fma(DBL_MIN, DBL_MIN, 0)|
|fmax()||-||-||-||No errors occur|
|fmin()||-||-||-||No errors occur|
|hypot()||overflow||y||y||e.g., hypot(DBL_MAX, DBL_MAX)|
|hypot()||underflow||n||y||e.g., if both arguments are small subnormal numbers|
|ilogb(+-inf)||domain||n||n||Does correctly return INT_MAX|
|ilogb(0)||domain||n||y||Does correctly return FP_ILOGB0|
|ilogb(nan)||domain||n||y||Does correctly return FP_ILOGBNAN|
|lgamma()||pole||n||y||Occurs when x is a non-positive integer; errno is set to EDOM (should be ERANGE)|
|llrint()||domain||n||y||x is NaN, infinity, or too large to store in a long long|
|llround()||domain||n||y||x is NaN, infinity, or too large to store in a long long|
|lrint()||domain||n||y||x is NaN, infinity, or too large to store in a long|
|lround()||domain||n||y||x is NaN, infinity, or too large to store in a long|
|nearbyint()||-||-||-||No errors occur|
|nextafter()||overflow||n||y||e.g., nextafter(DBL_MAX, +inf)|
|nextafter()||underflow||n||y||e.g., nextafter(DBL_MIN, 0)|
|nexttoward()||overflow||n||y||e.g., nexttoward(DBL_MAX, +inf)|
|nexttoward()||underflow||n||y||e.g., nexttoward(DBL_MIN, 0)|
|pow(0, -y)||pole (0, neg)||n||y||errno is set to EDOM (should be ERANGE)|
Suitable values to cause overflow (e.g., pow(2, 1e100))
Suitable values to cause underflow (e.g., pow(2, -1e100))
|rint()||-||-||-||No errors occur|
|round()||-||-||-||No errors occur|
|scalb()||overflow||n||y||e.g., scalb(DBL_MAX, 200)|
|scalb()||underflow||n||y||e.g., scalb(DBL_MIN, -200)|
|scalbln()||overflow||n||y||e.g., scalbln(DBL_MAX, 200)|
|scalbln()||underflow||n||y||e.g., scalbln(DBL_MIN, -200)|
|scalbn()||overflow||n||y||e.g., scalbn(DBL_MAX, 200)|
|scalbn()||underflow||n||y||e.g., scalbn(DBL_MIN, -200)|
|tan(pi/2)||overflow||-||-||No test possible, since the best approximation of pi/2 in double precision only yields a tan() value of 1.633e16.|
|tanh()||-||-||-||No errors occur|
|tgamma()||underflow||n||y||Occurs for ranges of x values between negative integers, e.g., tgamma(-10000.5)|
Note the difference from
|tgamma(x<0)||domain||y||y||For finite x|
|trunc()||-||-||-||No errors occur|
|y0()||overflow||-||-||Not possible to overflow with double|
|y0(0)||pole||n||n||errno is set to EDOM (should be ERANGE)|
|y1()||overflow||-||-||Not possible to overflow with double|
|y1(0)||pole||n||n||errno is set to EDOM (should be ERANGE)|
|yn()||overflow||n||y||e.g., yn(1000, DBL_MIN)|
|yn()||underflow||y||n||e.g., yn(10, DBL_MAX)|
|yn(0)||pole||n||n||errno is set to EDOM (should be ERANGE)|
Posted by Michael Kerrisk at 10:45 AM