Wednesday, August 6, 2008

Math functions and error reporting

Math functions are different from most other library functions in the kinds of errors that they report, and in the way that they report errors. Broadly speaking, a math function can fail for one of the following reasons:

  • Domain error: an argument to the function was outside the range for which the function was defined. For example, the call sqrt(-1.0) gives a domain error because a negative number does not have (real) square root. When a domain error occurs, a math function typicall returns a NaN (not-a-number).
  • Pole error: the function result is an exact infinity. For example log(0.0) is negative infinity. When a pole error occurs, most math functions return the floating-point representation of positive or negative infinity, as appropriate (i.e., HUGE_VAL or -HUGE_VAL for functions returning a double).
  • Range error (overflow): an overflow occurs if the function result is too large to be represented as a floating-point number. For example, exp(1e10) produces a number too large to represent in a double. When an overflow occurs, most math functions return the floating-point representation of positive or negative infinity, as appropriate (i.e., HUGE_VAL or -HUGE_VAL for functions returning a double).
  • Range error (underflow): an underflow occurs if the function result is so small that it can't be represented as a (normalized) floating-point number. For example, exp(-1e10) produces a number too large to represent in a double. When an underflow occurs, a math function usually either returns a (signed) zero, or a subnormal value, as appropriate.

(More details can be found in the math_error(7) man page.)

Many library functions report an error by returning a NULL pointer or an integer -1. Neither of these mechanisms would be suitable for math functions: these functions usually return a floating-point value, and -1 is in many cases a valid successful return. For this reasons, the C99 and POSIX.1-2001 standards define two other mechanisms by which math functions can report errors.

The first of the error-reporting mechanisms is to use the traditional errno variable. We set the errno to zero before the call, and if it has a non-zero value after the call, then an error occurred. On error, errno is set as follows:

  • Domain error: EDOM

  • Pole error: ERANGE

  • Overflow: ERANGE

  • Underflow: ERANGE
(These settings do of course make it hard to distinguish the last three types of errors.)

The other error-reporting mechanism is exceptions. For each of the errors described above, the system raises an exception, and the fetestexcept() library function can be used to check whether an exception occurred. In order to use this mechanism we do the following:

  1. Call feclearexcept(FE_ALL_EXCEPT) to clear any existing exceptions.

  2. Call the math library function.


If the math function was successful, then fetestexcept() returns 0. If an error occurred while calling the math function, then fetestexcept() returns a bit mask indicating the error. In this bit mask, exactly one of FE_INVALID, FE_DIVBYZERO, FE_OVERFLOW, or FE_UNDERFLOW will be set. The exceptions raised for each error are:

  • Domain error: invalid exception (FE_INVALID)

  • Pole error: divide-by-zero exception (FE_DIVBYZERO)

  • Overflow: overflow exception (FE_OVERFLOW)

  • Underflow: underflow exception (FE_UNDERFLOW)

C99 and POSIX.1-2001 require an implementation to support at least one of the error-reporting mechanisms for all math functions, and allow both to be supported. The standards specify an identifier, math_errhandling, that an implementation should set to indicate which mechanisms are supported. If (math_errhandling & MATH_ERRNO) is non-zero, then errno is set to indicate errors. If (math_errhandling & MATH_EXCEPT) is non-zero ,then exceptions are raised on errors.

The CONFORMANCE file in the glibc sources has long explained that:

Implementing MATH_ERRNO, MATH_ERREXCEPT and math_errhandling in needs compiler support: see

But to date this support has not arrived. In any case, this support is a somewhat moot point, since it transpires that neither of the mechanisms is supported by all of the math functions in glibc: most (but not all) support exceptions, many support both exceptions and errno, a few support errno but not exceptions, and one or two functions support neither mechanism. To make things even worse, the man pages didn't fully and correctly describe the details for each math function. Since man-pages-3.06, the details should now be accurate, at least for glibc 2.8.

Ideally, all of the glibc math functions would support both mechanisms, so that programs that depend on either mechanism could be happily ported to Linux. With that idea in mind, I went through and tested the error-reporting behavior for each math function, and filed a series of bug reports that document deviations from that ideal.

In order to get an overview, the table below summarizes the situation for all of the math functions as at glibc 2.8. The third and fourth columns indicate whether errno is correctly set and an exception is raised for each error case.

FunctionExpected errorerrno set correctly?Exception correctly raised?Notes
asinh()---No errors occur
atan()---No errors occur
atan2()---No errors occur
atanh(+-1)polenyerrno is set to EDOM (should be ERANGE)
cbrt()---No errors occur
ceil()---No errors occur
cosh(+-large)overflowyye.g., cosh(DBL_MAX)
erf(+-small)underflownyFor subnormal x
erfc(x)underflownyResult underflows but produces representable (i.e., subnormal) result; e.g., erfc(27)
exp10(+large)overflowyyGNU extension, but inconsistent with exp()
exp10(-large)underflownyGNU extension, but inconsistent with exp()
fabs()---No errors occur
fdim()overflownye.g., fdim(DBL_MAX, -DBL_MAX)
floor()---No errors occur
fma()domainnyVarious causes, e.g., one of x or y is an infinity and the other is 0.
fma()overflownye.g., fma(DBL_MAX, DBL_MAX, 0)
fma()underflownye.g., fma(DBL_MIN, DBL_MIN, 0)
fmax()---No errors occur
fmin()---No errors occur
hypot()overflowyye.g., hypot(DBL_MAX, DBL_MAX)
hypot()underflownye.g., if both arguments are small subnormal numbers
ilogb(+-inf)domainnnDoes correctly return INT_MAX
ilogb(0)domainnyDoes correctly return FP_ILOGB0
ilogb(nan)domainnyDoes correctly return FP_ILOGBNAN
j0()underflowyne.g., j0(DBL_MAX)
ldexp(x, +large-exp)overflowyy-
ldexp(x, -large-exp)underflowyy-
lgamma()overflowyye.g., lgamma(DBL_MAX)
lgamma()polenyOccurs when x is a non-positive integer; errno is set to EDOM (should be ERANGE)
llrint()domainnyx is NaN, infinity, or too large to store in a long long
llround()domainnyx is NaN, infinity, or too large to store in a long long
lrint()domainnyx is NaN, infinity, or too large to store in a long
lround()domainnyx is NaN, infinity, or too large to store in a long
nearbyint()---No errors occur
nextafter()overflownye.g., nextafter(DBL_MAX, +inf)
nextafter()underflownye.g., nextafter(DBL_MIN, 0);
nexttoward()overflownye.g., nexttoward(DBL_MAX, +inf)
nexttoward()underflownye.g., nexttoward(DBL_MIN, 0);
pow(0, -y)pole (0, neg)nyerrno is set to EDOM (should be ERANGE)

Suitable values to cause overflow (e.g., pow(2, 1e100))


Suitable values to cause underflow (e.g., pow(2, -1e100)


rint()---No errors occur
round()---No errors occur
scalb()overflownye.g., scalb(DBL_MAX, 200)
scalb()underflownye.g., scalb(DBL_MAX, -200)
scalbln()overflownye.g., scalbln(DBL_MAX, 200)
scalbln()underflownye.g., scalbln(DBL_MAX, -200)
scalbn()overflownye.g., scalbn(DBL_MAX, 200)
scalbn()underflownye.g., scalbn(DBL_MAX, -200)
sinh(+-large)overflowy-e.g., sinh(DBL_MAX)
tan(pi/2)overflow--No test possible, since the best approximation of pi/2 in double precision only yields a tan() value of 1.633e16.
tanh()---No errors occur
tgamma()underflownyOccurs for ranges of x values between negative integers, e.g., tgamma(-10000.5)

Note the difference from

tgamma(x<0)domainyyFor finite x
trunc()---No errors occur
y0()overflow--Not possible to overflow with double
y0()underflowyne.g., y0(DBL_MAX)
y0(0)polennerrno is set to EDOM (should be ERANGE)
y1()overflow--Not possible to overflow with double
y1()underflowyne.g., y1(DBL_MAX)
y1(0)polennerrno is set to EDOM (should be ERANGE)
yn()overflownye.g., yn(1000, DBL_MIN)
yn()underflowyne.g., yn(10, DBL_MAX)
yn(0)polennerrno is set to EDOM (should be ERANGE)

No comments: