Friday, November 18, 2011

Java (Minimizing Numerical Errors)


Minimizing Numerical Errors

Numeric errors involving floating-point numbers are inevitable. This section discusses how to minimize such errors through an example.
Listing 4.5 presents an example that sums a series that starts with 0.01 and ends with 1.0. The numbers in the series will increment by 0.01, as follows: 0.01 + 0.02 + 0.03 and so on. The output of the program appears in Figure 4.7.

Figure 4.7. The program uses a for loop to sum a series from 0.01 to 1.0 in increments of 0.01.


Listing 4.5. TestSum.java


 1 import javax.swing.JOptionPane;
 2
 3 public class TestSum {
 4   public static void main(String[] args) {
 5     // Initialize sum
 6     float sum = 0;
 7
 8     // Add 0.01, 0.02, ..., 0.99, 1 to sum
 9     for (float i = 0.01f; i <= 1.0f; i = i + 0.01f)
10       sum += i;
11
12     // Display result
13     JOptionPane.showMessageDialog(null, "The sum is " + sum);
14   }
15 }

The for loop (lines 9–10) repeatedly adds the control variable i to the sum. This variable, which begins with 0.01, is incremented by 0.01 after each iteration. The loop terminates when i exceeds 1.0.
The for loop initial action can be any statement, but it is often used to initialize a control variable. From this example, you can see that a control variable can be a float type. In fact, it can be any data type.
The exact sum should be 50.50, but the answer is 50.499985. The result is not precise because computers use a fixed number of bits to represent floating-point numbers, and thus cannot represent some decimal values, such as 0.01, exactly. If you change float in the program to double as follows, you should see a slight improvement in precision because a double variable takes 64 bits, whereas a float variable takes 32 bits.
// Initialize sum
double sum = 0;

// Add 0.01, 0.02, ..., 0.99, 1 to sum
for (double i = 0.01; i <= 1.0; i = i + 0.01)
  sum += i;

However, you will be stunned to see that the result is actually 49.50000000000003. What went wrong? If you print out i for each iteration in the loop, you will see that the last i is slightly larger than 1 (not exactly 1), so the loop ends before the final term, 1.0, is added to sum. The fundamental problem is that floating-point numbers are represented by approximation, and the small errors accumulate each time i is incremented. There are two ways to fix the problem:
  • Minimizing errors by processing large numbers first.
  • Using an integer count to ensure that all the numbers are processed.
To minimize errors, add numbers from 1.0, 0.99, down to 0.01, as follows:
// Add 1, 0.99, ..., 0.01 to sum
for (double i = 1.0; i >= 0.01; i = i - 0.01)
  sum += i;

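The skipped term in the forward double loop can be observed directly. The following is a minimal sketch, not part of the original listing: the class name DriftDemo and its two helper methods are hypothetical names chosen for illustration. It counts how many terms the forward loop adds and reports the value of i at which the loop condition first fails.

```java
// Hypothetical demo class; DriftDemo and its methods are illustrative
// names and do not appear in the original listing.
class DriftDemo {
  // Runs the forward double loop and returns how many terms it processes.
  static int countTerms() {
    int terms = 0;
    for (double i = 0.01; i <= 1.0; i = i + 0.01)
      terms++;
    return terms;
  }

  // Returns the first value of i for which i <= 1.0 is false.
  static double finalValue() {
    double i = 0.01;
    while (i <= 1.0)
      i = i + 0.01;
    return i;
  }

  public static void main(String[] args) {
    System.out.println("Terms added: " + countTerms());
    System.out.println("Value of i when the loop stops: " + finalValue());
  }
}
```

Running it should report fewer than the intended 100 terms, with a final i just above 1, which is why the 1.0 term never reaches sum.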
To ensure that all the items are added to sum, use an integer variable to count the items. Here is the new loop:
double currentValue = 0.01;

for (int count = 0; count < 100; count++) {
  sum += currentValue;
  currentValue += 0.01;
}

After this loop, sum is 50.50000000000003.
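A variation on the integer-count approach, offered as a sketch rather than as the method from the listing above: instead of accumulating currentValue, each term can be derived from the counter itself, so the rounding error in one term is not carried into the next. The class name CounterSum and the method sumSeries are hypothetical names introduced for illustration.

```java
// Hypothetical example class; CounterSum and sumSeries are illustrative
// names and are not part of the original listing.
class CounterSum {
  // Sums 0.01, 0.02, ..., n/100 by deriving each term from the integer
  // counter instead of repeatedly adding 0.01 to a running value.
  static double sumSeries(int n) {
    double sum = 0;
    for (int count = 1; count <= n; count++)
      sum += count / 100.0; // each term is rounded once, independently
    return sum;
  }

  public static void main(String[] args) {
    System.out.println("The sum is " + sumSeries(100));
  }
}
```

The integer counter guarantees that exactly 100 terms are processed. The sum itself is still a floating-point result, so it may differ from 50.50 in the last few digits.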
