Behind the Schemes: Exactness


Welcome back to Behind the Schemes, where I discuss items in Schemer that are not readily visible to users (i.e. behind the scenes). In this post I'll be taking a look at numbers and exactness. I'll start by defining numeric exactness, then examine what exactness means for math procedures, and last I'll explain how some math procedures are implemented (and maintain exactness).

In Scheme, exactness refers to how a number can be represented. A number is considered exact if it can be represented as an integer value (e.g. 1, -2, 100); otherwise, the number is considered inexact (e.g. 1.234, -9.54, 88.5). Although Scheme has many numeric types, Schemer only implements integers (exact) and reals (inexact) because they easily map to C++'s 32-bit integer and 32-bit floating point types.

Exactness also comes into play with math procedures. If a math procedure operates on all exact or all inexact numbers, the result is expected to be exact or inexact, respectively; however, there are exceptions to this rule. When a math procedure operates on exact numbers, but cannot produce an exact result (e.g. sqrt(6) ~= 2.4494897), then the result will be inexact. Also, when a math procedure operates on inexact numbers and produces a result that can be interpreted as exact (e.g. 0.5 + 0.5 = 1), then the result will be exact. Finally, when a math procedure operates on both exact and inexact numbers, the exactness of the result is determined by asking:

If the result is truncated to an integer (i.e. everything after the decimal point is removed), will it still have the same value?

If the answer is yes, then the result can be converted to an integer; otherwise, the result must remain a floating point number.

Most Schemer math procedures optimistically assume that their provided arguments are all integer or all floating point (depending on the type of the first numeric argument). When the math procedure encounters an argument whose numeric type is different from the first numeric argument, then the math procedure switches to a more general computation path that can handle mixed numeric types. This approach is mainly an effort to reduce integer and floating point conversions which have an additional cost associated with them. It should also be noted that integer and floating point arithmetic performance is dependent on the CPU, and so the approach described here may not always produce the most optimal results for every CPU (link).

Last, I'll describe how arithmetic math procedures are implemented:

Addition - When all integers or all floating point numbers are passed, the sum is computed and returned using all integer or all floating point addition, respectively (in the case of all floating point numbers, a check is performed to determine whether the final result can be an integer). When mixed numeric arguments are passed, the integers and floating point numbers are summed separately using integer and floating point addition, respectively. After all arguments are summed, if the floating point sum can be converted to an integer, then the integer and floating point results are summed together and the final result is returned as an integer; otherwise, the final result is returned as a floating point number.

Subtraction - This procedure is implemented similar to addition. The caveats are that the number being subtracted from is not included in the summations and at the end of  computation if the number being subtracted from was floating point, we must check whether the resulting subtraction of all quantities yields an integer. In the case where the number being subtracted from is an integer, it's sufficient to check whether the summed floating point result can be converted to an integer in order to determine whether the final result should be an integer. That is, if you're evaluating (- 10.5 2 3.5) = (10.5 - 2 - 3.5) = 5, you must check whether the final result can be an integer or not; whereas if you're evaluating (- 10 2.5 3.5) = (10 - 5.0) = 5, you need only check whether the summed floating point result can be an integer or not prior to performing the final subtraction.

Multiplication - When all integers or all floating point numbers are passed, the product is computed and returned using all integer or all floating point multiplication, respectively (in the case of all floating point numbers, a check is performed to determine whether the final result can be an integer). When mixed arguments are passed, the product is computed as a floating point product of all numbers and then the final result is checked to see whether it can be an integer. The rationale for computing the floating point product (instead of computing the product separately similar to the addition procedure) is in order to reduce the likelihood of precision issues related to multiplying small floating point numbers together.

Division - This procedure is implemented similar to and in terms of multiplication because (/ 2 3 4 5 6) = (2 / 3 / 4 / 5 / 6) =  (2 / (3 * 4 * 5 * 6)) ~= 0.0055. The denominator is first computed and then the final result is computed as the first number divided by the computed denominator. The final result is then checked to see whether or not it can be an integer.

That concludes the discussion of numbers and exactness. I know it was a bit dense, but you made it though and I hope you learned something new! Leave a comment if you have any questions. Take care.

Get Schemer

Download NowName your own price

Leave a comment

Log in with itch.io to leave a comment.