So I came to write the man pages for utimensat() and futimens(), interfaces that are specified in the upcoming POSIX.1 revision, and recently added to Linux (kernel 2.6.22, glibc 2.6), and it's a familiar story: bugs, and yet more bugs.
This case is a little worse than usual (the gory details are below), but hardly exceptional: it was a similar story with other recent APIs, such as splice(), signalf(), and the timerfd API (the last of which was ultimately redesigned after some pushing by me). In each case, an interface that was already released in a stable kernel turned out to have easy-to-find bugs when I came to document and test it. All that was required was to come up with something like a reasonable written specification (i.e., a man page), and then start testing against that specification.
The following seems (to me) like a reasonable development model for new kernel userland APIs:
- Write a more or less complete design specification (preferably something like a man page, which I'm more than happy to edit and review at this stage in the process), and perhaps even code up an initial version of the interface.
- Get review comments about the design; revise the design if necessary.
- Implement something like a final version of the interface.
- Test the interface. Preferably: test in collaboration with other people who had nothing to do with the implementation (they will think about the interface in different ways from the implementer). If it's sensible to do so (and usually it is), produce a test suite to check the correctness of as many operational cases of the interface as possible (and then send that suite to the Linux Test Project).
- Write the interface documentation (i.e., a man page) for userland programmers, and send it to me for editing and review. (If you do this smart, then you can just recycle the work from the first step.)
- Get the interface accepted into mainline.
- Code up the new interface.
- Post the code to LKML, explaining that "I tested it, and it works", and, maybe, include some brief documentation of the interface.
- Often, someone looks at the code and suggests some fixes. Usually, no one else does any testing.
- Code gets accepted into mainline.
- Usually, some few dot releases later: someone (often me) who had nothing to do with the earlier steps writes the man page.
- Often, some few dot releases later: find and fix all of the bugs (e.g., in the case of utimensat(), released for 2.6.22, the fixes will only make it into 2.6.26 at the earliest).
- Occasionally, some few dot releases later: realize that the interface could have been better designed, but it's too late now to change it, because we can't break the ABI.
- Userland programmers end up dealing with kernel bugs they shouldn't have to deal with. (When this happens often, it can damage the reputation of the Linux kernel-userland interface, and userland programmers may become wary of adopting new interfaces, which can further delay any kind of real-world testing and adoption of the interface.)
- Code that uses new interfaces may have to special case for ABI bugs that were present in the first few kernel releases that contained the bug.
I'll have more time for testing and review nowadays, and hopefully I'll catch more problems before they are released into stable kernels, but there's certainly more than one person can do. And, as described above, there is a culture problem when it comes to adding kernel-userland interfaces on Linux. The question is whether the culture can be changed. (And here its worth repeating a point that I often make: one could perhaps try and blame individual developers for buggy interfaces, but that doesn't really get to the core issue (everyone is going to write bugs): the real problem is the process by which kernel-userland interface changes are accepted into the kernel.)
The Gory Details
For those who are interested, here are the problems I found with utimensat() and futimens(), explained by annotating the current draft of the man page.
SYNOPSIS #include <sys/stat.h> int utimensat(int dirfd, const char *pathname, const struct timespec times[2], int flags); int futimens(int fd, const struct timespec times[2]); DESCRIPTION utimensat() and futimens() update the timestamps of a file with nanosecond precision. This contrasts with the historical utime(2) and utimes(2), which permit only sec- ond and microsecond precision, respectively, when setting file timestamps.
This is the first of the advantages of the new interfaces: greater precision for setting timestamps. (The next revision of POSIX.1 specifies nanosecond file timestamps, and extends stat(2) to allow their retrieval. File system support is needed, and can't necessarily be retrofitted to some older file systems. Currently, Linux supports nanosecond timestamps on XFS, JFS, and ext4.)
With utimensat() the file is specified via the pathname
given in pathname. With futimens() the file whose times-
tamps are to be updated is specified via an open file
descriptor, fd.
For both calls, the new file timestamps are specified in
the array times: times[0] specifies the new "last access
time" (atime); times[1] specifies the new "last modifica-
tion time" (mtime). Each of the elements of times speci-
fies a time in seconds and nanoseconds since the Epoch
(00:00:00, 1 Jan 1970, UTC), in a structure of the fol-
lowing form:
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
If the tv_nsec field of one of the timespec structures
has the special value UTIME_NOW, then the corresponding
file timestamp is set to the current time. If the
tv_nsec field of one of the timespec structures has the
special value UTIME_OMIT, then the corresponding file
timestamp is left unchanged. In both of these cases, the
value of the corresponding tv_sec field is ignored.
These are the other advantages of the new interfaces.
With utime(2) and utimes(2), to change just one of the timestamps, we must make a call to stat(2) to retrieve the current timestamps, use one of the timestamps to initialize the times element that we don't want to change, and then call utimensat() with the desired value for the other timestamp. This can be subject to race conditions: if another process updates the file timestamps between the two calls, then that update will be lost. UTIME_OMIT allows us to avoid this problem.
UTIME_NOW exists mainly as a convenience: it allows us to avoid fetching the current time (using gettimeofday(2) or similar) in order to set a file timestamp to "now".
If times is NULL, then both timestamps are set to the
current time.
Permissions requirements
To set both file timestamps to the current time (i.e.,
times is NULL, or both tv_nsec fields specify UTIME_NOW),
either:
1. the caller must have write access to the file;
2. the caller's effective user ID must match the owner of
the file; or
3. the caller must have appropriate privileges.
To make any change other than setting both timestamps to
the current time (i.e., times is not NULL, and both
tv_nsec fields are not UTIME_NOW and both tv_nsec fields
are not UTIME_OMIT), either condition 2 or 3 above must
apply.
If both tv_nsec fields are specified as UTIME_OMIT, then
no file ownership or permission checks are performed, and
the file timestamps are not modified, but other error
conditions may still be detected.
utimensat() specifics
[Details of dirfd and flags arguments omitted.]
For details of the dirfd argument, see openat(2).
RETURN VALUE On success, utimensat() and futimens() return 0. On error, -1 is returned and errno is set to indicate the error. ERRORS [Various other errors that are irrelevant to this post have been omitted.] EACCES times is NULL, or both tv_nsec values are UTIME_NOW, but the effective effective ID of the caller does not match the owner of the file, the caller does not have write access to the file, and the caller is not privileged (Linux: does not have either the CAP_FOWNER or the CAP_DAC_OVERRIDE capability). EACCES times is NULL, or both tv_nsec values are UTIME_NOW, and the file is marked immutable. EPERM The caller attempted to change one or both times- tamps to a value other than the current time, or to change one of the timestamps to the current time while leaving the other timestamp unchanged, (i.e., times is not NULL, both tv_nsec fields are not UTIME_NOW, and both tv_nsec fields are not UTIME_OMIT) but the caller's effective user ID does not match the owner of file, and the caller is not privileged (Linux: does not have the CAP_FOWNER capability). EPERM times is not NULL, and the file is marked append- only or immutable. NOTES On Linux, futimens() is a library function implemented on top of the utimensat() system call. To support this, the Linux utimensat() system call implements a non-standard feature: if pathname is NULL, then the system call modi- fies the timestamps of the open file referred to by the file descriptor dirfd (which may refer to any type of file). Using this feature, the call futimens(fd, times) is implemented as: utimensat(fd, NULL, times, 0); BUGS Several bugs afflict utimensat() and futimens() on ker- nels before 2.6.??.
Really, there should have been a test suite to go along with the initial implementation. There wasn't, unfortunately, so I wrote one (later revised a little before it went to LTP) whose results can be seen here.
I've posted patches to fix all of the bugs, and Andrew Morton has accepted them into -mm, and from there, they seem to have gone upstream to Al Viro. It's just a question of when they will get pushed into mainline. It'd be nice if they make it into 2.6.26, but maybe they won't, since it is already getting very late in the -rc cycle. (Update, 6 Jul 08: the patches have gone into -rc9.)
These bugs are either non-confor-
mances with the POSIX.1 draft specification or inconsis-
tencies with historical Linux behavior.
* POSIX.1 specifies that if one of the tv_nsec fields has
the value UTIME_NOW or UTIME_OMIT, then the value of
the corresponding tv_sec field should be ignored.
Instead, the value of the tv_sec field is required to
be 0 (or the error EINVAL results).
This was a simple and obvious divergence from the specification.
* Various bugs mean that for the purposes of permission
checking, the case where both tv_nsec fields are set to
UTIME_NOW isn't always treated the same as specifying
times as NULL, and the case where one tv_nsec value is
UTIME_NOW and the other is UTIME_OMIT isn't treated the
same as specifying times as a pointer to a structure
containing arbitrary time values. As a result, in some
cases: a) file timestamps can be updated by a process
that shouldn't have permission to perform updates; b)
file timestamps can't be updated by a process that
should have permission to perform updates; and c) the
wrong errno value is returned in case of an error.
There are multiple error cases here:
- If one of the tv_nsec fields is UTIME_OMIT and the other is UTIME_NOW, then the error EPERM should occur if the process's effective user ID does not match the file owner and the process is not privileged. Instead, the call successfully changes one of the timestamps.
- If the file is not writable by the effective user ID of the process and the process's effective user ID does not match the file owner and the process is not privileged, and times is NULL, then the error EACCES results. This error should also occur if times points to a structure in which both tv_nsec fields are UTIME_NOW. Instead the call succeeds.
- If a file is marked as append-only (see chattr(1)), then Linux traditionally (i.e., utime(2), utimes(2)), permits a NULL times argument to be used in order to update both timestamps to the current time. For consistency, utimensat() and futimens() should also produce the same result when given a times argument that points to a structure in which both tv_nsec fields are UTIME_NOW. Instead, the call fails with the error EPERM.
- If a file is marked as immutable (see chattr(1)), then Linux traditionally (i.e., utime(2), utimes(2)), gives an EACCES error if times is NULL. For consistency, utimensat() and futimens() should also produce the same result when given a times that points to a structure in which both tv_nsec fields are UTIME_NOW. Instead, the call fails with the error EPERM.
* POSIX.1 says that a process that has write access to
the file can make a call with times as NULL, or with
times pointing to a structure in which both tv_nsec
fields are UTIME_NOW, in order to update the both
timestamps to the current time. However, futimens()
instead checks whether the access mode of the file
descriptor allows writing.
This means that a process with a file descriptor that allows writing could change the timestamps of a file for which it does not have write permission; conversely, a process with a read-only file descriptor won't be able to update the timestamps of a file, even if it has write permission on the file.
Updated, 2012-03-06: Changed link to tarball containing tests