Minimizing Numerical Errors
Numeric errors involving floating-point numbers are inevitable. This section discusses how to minimize such errors through an example.
Listing 4.5 presents an example that sums a series that starts with 0.01 and ends with 1.0. The numbers in the series increment by 0.01, as follows: 0.01 + 0.02 + 0.03 + ... + 1.0. The output of the program appears in Figure 4.7.
Figure 4.7. The program uses a for loop to sum a series from 0.01 to 1.0 in increments of 0.01.
Listing 4.5. TestSum.java
The for loop (lines 9–10) repeatedly adds the control variable i to sum. This variable, which begins with 0.01, is incremented by 0.01 after each iteration. The loop terminates when i exceeds 1.0.
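A minimal sketch consistent with this description is shown below. It assumes that both sum and i are declared as float. The class name FloatSumSketch and the method sumSeries are hypothetical, so the sketch is not necessarily identical to Listing 4.5, and its line numbers differ.

```java
// Hypothetical sketch matching the description of Listing 4.5
// (assumption: sum and the control variable i are both float).
class FloatSumSketch {
  // Adds 0.01f, 0.02f, 0.03f, ... while i <= 1.0f and returns the float total
  static float sumSeries() {
    // Initialize sum
    float sum = 0;

    // Add 0.01, 0.02, ..., 0.99, 1 to sum
    for (float i = 0.01f; i <= 1.0f; i = i + 0.01f)
      sum += i;

    return sum;
  }

  public static void main(String[] args) {
    // Display result
    System.out.println("The sum is " + sumSeries());
  }
}
```

The 0.01f literals are required: without the f suffix, 0.01 is a double, and i + 0.01 could not be assigned back to a float variable.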
The initial action of a for loop can be a variable declaration or a list of comma-separated expressions, but it is most often used to declare and initialize a control variable. From this example, you can see that a control variable can be of the float type. In fact, it can be of any data type.
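To illustrate that the control variable need not be an int or a float, the following small sketch uses a char instead. The class CharLoopSketch and the method letters are hypothetical names chosen for this example.

```java
// Hypothetical example: a for loop whose control variable is a char
class CharLoopSketch {
  // Appends the characters 'a' through 'e' and returns them as a String
  static String letters() {
    StringBuilder sb = new StringBuilder();
    for (char c = 'a'; c <= 'e'; c++) // c++ advances to the next character code
      sb.append(c);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(letters());
  }
}
```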
The exact sum should be 50.50, but the answer is 50.499985. The result is imprecise because computers use a fixed number of bits to represent floating-point numbers and therefore cannot represent some values, such as 0.01, exactly. If you change float in the program to double as follows, you should see a slight improvement in precision, because a double variable takes 64 bits, whereas a float variable takes 32 bits.
// Initialize sum
double sum = 0;

// Add 0.01, 0.02, ..., 0.99, 1 to sum
for (double i = 0.01; i <= 1.0; i = i + 0.01)
  sum += i;
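The following sketch runs the double loop and prints i on each iteration. It also reports how many terms were added and which value of i ended the loop, so the behavior discussed next is easy to check. The class DoubleLoopDiagnostic and its method names are hypothetical.

```java
// Hypothetical diagnostic for the double version of the loop
class DoubleLoopDiagnostic {
  // Same loop as above: returns the double total of the terms actually added
  static double sumSeries() {
    double sum = 0;
    for (double i = 0.01; i <= 1.0; i = i + 0.01)
      sum += i;
    return sum;
  }

  // Counts how many times the loop body runs, i.e., how many terms reach sum
  static int countTerms() {
    int count = 0;
    for (double i = 0.01; i <= 1.0; i = i + 0.01)
      count++;
    return count;
  }

  // Returns the first value of i for which the test i <= 1.0 is false
  static double stoppingValue() {
    double i = 0.01;
    while (i <= 1.0)
      i = i + 0.01;
    return i;
  }

  public static void main(String[] args) {
    // Print every value of i so the accumulated rounding error is visible
    for (double i = 0.01; i <= 1.0; i = i + 0.01)
      System.out.println("i = " + i);

    System.out.println("Terms added: " + countTerms());
    System.out.println("Loop stopped at i = " + stoppingValue());
    System.out.println("The sum is " + sumSeries());
  }
}
```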
However, you will be stunned to see that the result is actually 49.50000000000003. What went wrong? If you print out i for each iteration of the loop, you will see that the final value of i, the one intended to be 1, is slightly larger than 1 (not exactly 1). Because that value fails the test i <= 1.0, the loop ends before it is added to sum. The fundamental problem is that floating-point numbers are represented by approximation, so errors like this commonly occur. There are two ways to fix the problem:
- To minimize errors, add the numbers from 1.0, 0.99, down to 0.01, as follows:

  // Add 1, 0.99, ..., 0.01 to sum
  for (double i = 1.0; i >= 0.01; i = i - 0.01)
    sum += i;

- To ensure that all the items are added to sum, use an integer variable to count the items. Here is the new loop:

  double currentValue = 0.01;
  for (int count = 0; count < 100; count++) {
    sum += currentValue;
    currentValue += 0.01;
  }
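Both fixes can be placed side by side in one program to compare their totals with the exact value 50.5. The following is a minimal sketch; the class SumFixes and its method names are hypothetical.

```java
// Hypothetical comparison of the two fixes described above
class SumFixes {
  // Fix 1: add from 1.0 down toward 0.01 with a double control variable
  static double descendingSum() {
    double sum = 0;
    for (double i = 1.0; i >= 0.01; i = i - 0.01)
      sum += i;
    return sum;
  }

  // Fix 2: an int counter controls the loop, so exactly 100 items are added
  // regardless of rounding in currentValue
  static double countedSum() {
    double sum = 0;
    double currentValue = 0.01;
    for (int count = 0; count < 100; count++) {
      sum += currentValue;
      currentValue += 0.01;
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println("Descending loop: " + descendingSum());
    System.out.println("Counted loop:    " + countedSum());
  }
}
```

The second fix is the more robust of the two, because the number of iterations depends only on integer comparisons. The rounding error in currentValue can still make the total differ slightly from 50.5, but it cannot cause an item to be skipped.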