While looking at the new prlimit() system call in Linux 2.6.36, I surveyed the various system calls that allow one process to change the operation or attributes of another (arbitrary) process. In general, these system calls require either that the caller is privileged (i.e., has some capability) or that there is a match between the credentials (user or group IDs) of the calling process and the target process.
There's a great deal of inconsistency. As at 2.6.36, here's what we have. In the following, uid means the real UID of the caller, euid means the effective UID, and suid means the saved set-user-ID. The group IDs follow the same convention (gid, egid, sgid), and a "t-" prefix denotes the corresponding credential of the target process:
- setpriority(), sched_setscheduler(), sched_setparam(), sched_setaffinity(): CAP_SYS_NICE || euid == t-uid || euid == t-euid. This is sane: you can make changes to another process if you have the right capability or you own the process. That is, you (here meaning your effective UID) can change the attributes of a process that was originally created by you (euid == t-uid) or of one that has assumed your identity via the set-user-ID mechanism (euid == t-euid). POSIX specifies that the checks for setpriority() are uid == t-euid || euid == t-euid; the Linux semantics are arguably saner (and are consistent with historical BSD behavior). POSIX specifies sched_setscheduler() and sched_setparam() but does not specify their permission-checking semantics.
- ioprio_set(): CAP_SYS_NICE || uid == t-uid || euid == t-uid. The caller is privileged, or the caller's real or effective UID matches the target process's real UID. There's no obvious reason for the inconsistency with setpriority().
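- migrate_pages(), move_pages(): CAP_SYS_NICE || uid == t-uid || uid == t-suid || euid == t-uid || euid == t-suid. Unlike setpriority(), the caller's real UID is considered as well as its effective UID, and either one may match the target's real UID or saved set-user-ID (rather than its effective UID). The UID checks are the same as for kill(), but with a different capability. Again, there's no obvious reason for the inconsistency with setpriority().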
- kill(), killpg(): CAP_KILL || uid == t-uid || uid == t-suid || euid == t-uid || euid == t-suid. The UID-matching semantics are as required by POSIX: the real or effective UID of the caller must match the real UID or saved set-user-ID of the target.
- prlimit(): CAP_SYS_RESOURCE || ((uid == t-uid && uid == t-euid && uid == t-suid) && (gid == t-gid && gid == t-egid && gid == t-sgid)). Now we start to get into strange territory. Using CAP_SYS_RESOURCE makes sense, because CAP_SYS_RESOURCE is used for the privilege checks in the setrlimit() system call. However, requiring that all of the target's UIDs match the caller's real UID is inconsistent with every one of the other APIs. Adding an analogous check for the group IDs compounds the inconsistency further.
One thing to note: the behavior of most of the Linux-specific system calls (i.e., ioprio_set(), move_pages(), migrate_pages(), and prlimit()) was documented only after the implementation, which I'd argue was a contributing factor to the inconsistencies described above.