## Wednesday, August 6, 2008

### Math functions and error reporting

Math functions are different from most other library functions in the kinds of errors that they report, and in the way that they report errors. Broadly speaking, a math function can fail for one of the following reasons:

• Domain error: an argument to the function was outside the range for which the function was defined. For example, the call sqrt(-1.0) gives a domain error because a negative number does not have (real) square root. When a domain error occurs, a math function typicall returns a NaN (not-a-number).
• Pole error: the function result is an exact infinity. For example log(0.0) is negative infinity. When a pole error occurs, most math functions return the floating-point representation of positive or negative infinity, as appropriate (i.e., HUGE_VAL or -HUGE_VAL for functions returning a double).
• Range error (overflow): an overflow occurs if the function result is too large to be represented as a floating-point number. For example, exp(1e10) produces a number too large to represent in a double. When an overflow occurs, most math functions return the floating-point representation of positive or negative infinity, as appropriate (i.e., HUGE_VAL or -HUGE_VAL for functions returning a double).
• Range error (underflow): an underflow occurs if the function result is so small that it can't be represented as a (normalized) floating-point number. For example, exp(-1e10) produces a number too large to represent in a double. When an underflow occurs, a math function usually either returns a (signed) zero, or a subnormal value, as appropriate.

(More details can be found in the math_error(7) man page.)

Many library functions report an error by returning a NULL pointer or an integer -1. Neither of these mechanisms would be suitable for math functions: these functions usually return a floating-point value, and -1 is in many cases a valid successful return. For this reasons, the C99 and POSIX.1-2001 standards define two other mechanisms by which math functions can report errors.

The first of the error-reporting mechanisms is to use the traditional errno variable. We set the errno to zero before the call, and if it has a non-zero value after the call, then an error occurred. On error, errno is set as follows:

• Domain error: EDOM

• Pole error: ERANGE

• Overflow: ERANGE

• Underflow: ERANGE
(These settings do of course make it hard to distinguish the last three types of errors.)

The other error-reporting mechanism is exceptions. For each of the errors described above, the system raises an exception, and the fetestexcept() library function can be used to check whether an exception occurred. In order to use this mechanism we do the following:

1. Call feclearexcept(FE_ALL_EXCEPT) to clear any existing exceptions.

2. Call the math library function.

3. Call fetestexcept(FE_INVALID FE_DIVBYZERO FE_OVERFLOW FE_UNDERFLOW).

If the math function was successful, then fetestexcept() returns 0. If an error occurred while calling the math function, then fetestexcept() returns a bit mask indicating the error. In this bit mask, exactly one of FE_INVALID, FE_DIVBYZERO, FE_OVERFLOW, or FE_UNDERFLOW will be set. The exceptions raised for each error are:

• Domain error: invalid exception (FE_INVALID)

• Pole error: divide-by-zero exception (FE_DIVBYZERO)

• Overflow: overflow exception (FE_OVERFLOW)

• Underflow: underflow exception (FE_UNDERFLOW)

C99 and POSIX.1-2001 require an implementation to support at least one of the error-reporting mechanisms for all math functions, and allow both to be supported. The standards specify an identifier, math_errhandling, that an implementation should set to indicate which mechanisms are supported. If (math_errhandling & MATH_ERRNO) is non-zero, then errno is set to indicate errors. If (math_errhandling & MATH_EXCEPT) is non-zero ,then exceptions are raised on errors.

The CONFORMANCE file in the glibc sources has long explained that:

Implementing MATH_ERRNO, MATH_ERREXCEPT and math_errhandling in needs compiler support: see
http://sources.redhat.com/ml/libc-hacker/2000-06/msg00008.html
http://sources.redhat.com/ml/libc-hacker/2000-06/msg00014.html
http://sources.redhat.com/ml/libc-hacker/2000-06/msg00015.html

But to date this support has not arrived. In any case, this support is a somewhat moot point, since it transpires that neither of the mechanisms is supported by all of the math functions in glibc: most (but not all) support exceptions, many support both exceptions and errno, a few support errno but not exceptions, and one or two functions support neither mechanism. To make things even worse, the man pages didn't fully and correctly describe the details for each math function. Since man-pages-3.06, the details should now be accurate, at least for glibc 2.8.

Ideally, all of the glibc math functions would support both mechanisms, so that programs that depend on either mechanism could be happily ported to Linux. With that idea in mind, I went through and tested the error-reporting behavior for each math function, and filed a series of bug reports that document deviations from that ideal.

In order to get an overview, the table below summarizes the situation for all of the math functions as at glibc 2.8. The third and fourth columns indicate whether errno is correctly set and an exception is raised for each error case.

 Function Expected error errno set correctly? Exception correctly raised? Notes acos(+-inf) domain y y - acosh(x<1) domain y y - asin(+-inf) domain y y - asinh() - - - No errors occur atan() - - - No errors occur atan2() - - - No errors occur atanh(+-1) pole n y errno is set to EDOM (should be ERANGE) atanh(x>1) domain y y - cbrt() - - - No errors occur ceil() - - - No errors occur cos(+-inf) domain n y - cosh(+-large) overflow y y e.g., cosh(DBL_MAX) erf(+-small) underflow n y For subnormal x erfc(x) underflow n y Result underflows but produces representable (i.e., subnormal) result; e.g., erfc(27) exp(+large) overflow y y - exp(-large) underflow y y - exp10(+large) overflow y y GNU extension, but inconsistent with exp() exp10(-large) underflow n y GNU extension, but inconsistent with exp() exp2(+large) overflow y y - exp2(-large) underflow y y - expm1(+large) overflow n y - fabs() - - - No errors occur fdim() overflow n y e.g., fdim(DBL_MAX, -DBL_MAX) floor() - - - No errors occur fma() domain n y Various causes, e.g., one of x or y is an infinity and the other is 0. fma() overflow n y e.g., fma(DBL_MAX, DBL_MAX, 0) fma() underflow n y e.g., fma(DBL_MIN, DBL_MIN, 0) fmax() - - - No errors occur fmin() - - - No errors occur fmod(+-inf,y) domain n y - fmod(x,0) domain y y - hypot() overflow y y e.g., hypot(DBL_MAX, DBL_MAX) hypot() underflow n y e.g., if both arguments are small subnormal numbers ilogb(+-inf) domain n n Does correctly return INT_MAX ilogb(0) domain n y Does correctly return FP_ILOGB0 ilogb(nan) domain n y Does correctly return FP_ILOGBNAN j0() underflow y n e.g., j0(DBL_MAX) j1() underflow y n - jn() underflow y n - ldexp(x, +large-exp) overflow y y - ldexp(x, -large-exp) underflow y y - lgamma() overflow y y e.g., lgamma(DBL_MAX) lgamma() pole n y Occurs when x is a non-positive integer; errno is set to EDOM (should be ERANGE) llrint() domain n y x is NaN, infinity, or too large to store in a long long llround() domain n y x is NaN, infinity, or too large to store in a long long log(0) pole y y - log(x<0) domain y y - log10(0) pole y y - log10(x<0) domain y y - log1p(-1) pole n y - log1p(x<-1) domain n y - log2(0) pole y y - log2(x<0) domain y y - logb(0) pole n y - lrint() domain n y x is NaN, infinity, or too large to store in a long lround() domain n y x is NaN, infinity, or too large to store in a long nearbyint() - - - No errors occur nextafter() overflow n y e.g., nextafter(DBL_MAX, +inf) nextafter() underflow n y e.g., nextafter(DBL_MIN, 0); nexttoward() overflow n y e.g., nexttoward(DBL_MAX, +inf) nexttoward() underflow n y e.g., nexttoward(DBL_MIN, 0); pow(0, -y) pole (0, neg) n y errno is set to EDOM (should be ERANGE) pow(x,y) overflow y y Suitable values to cause overflow (e.g., pow(2, 1e100)) pow(x,y) underflow y y Suitable values to cause underflow (e.g., pow(2, -1e100) pow(x<0,> domain y y - remainder(+-inf,y) domain n y - remainder(x,0) domain y y - remquo(+-inf,y) domain n y - remquo(x,0) domain n y - rint() - - - No errors occur round() - - - No errors occur scalb() overflow n y e.g., scalb(DBL_MAX, 200) scalb() underflow n y e.g., scalb(DBL_MAX, -200) scalb(0,+inf) domain n y - scalbln() overflow n y e.g., scalbln(DBL_MAX, 200) scalbln() underflow n y e.g., scalbln(DBL_MAX, -200) scalbn() overflow n y e.g., scalbn(DBL_MAX, 200) scalbn() underflow n y e.g., scalbn(DBL_MAX, -200) sin(+-inf) domain n y - sinh(+-large) overflow y - e.g., sinh(DBL_MAX) sqrt(x<0) domain y y - tan(+-inf) domain n y - tan(pi/2) overflow - - No test possible, since the best approximation of pi/2 in double precision only yields a tan() value of 1.633e16. tanh() - - - No errors occur tgamma() underflow n y Occurs for ranges of x values between negative integers, e.g., tgamma(-10000.5) tgamma(+-0) pole y y - tgamma(+large) overflow y y - tgamma(-inf) domain n y Note the difference fromtgamma(x<0) tgamma(x<0) domain y y For finite x trunc() - - - No errors occur y0() overflow - - Not possible to overflow with double y0() underflow y n e.g., y0(DBL_MAX) y0(0) pole n n errno is set to EDOM (should be ERANGE) y0(x<0) domain y y - y1() overflow - - Not possible to overflow with double y1() underflow y n e.g., y1(DBL_MAX) y1(0) pole n n errno is set to EDOM (should be ERANGE) y1(x<0) domain y y - yn() overflow n y e.g., yn(1000, DBL_MIN) yn() underflow y n e.g., yn(10, DBL_MAX) yn(0) pole n n errno is set to EDOM (should be ERANGE) yn(x<0) domain y y -