Performance of Schnell_pi

As far as I know, Schnell_pi is presently (August 10, 2001) the fastest PC program in the world for computing Pi, in the range from 1000 to 250 million digits.

Schnell_pi has been designed to be as fast as possible, which is why it uses an AGM algorithm. The number of floating-point operations needed for the complete calculation of N digits of Pi scales as N*(log2(N)^2), where log2(N) is the base-2 logarithm of N (in the present implementation, N has to be a power of 2). With the present implementation of Schnell_pi (version 1.0), the number is close to 12*N*(log2(N)^2), i.e. around 5 G floating-point operations (1 G = 2^30 = 1.073 billion) for N = 2^20, about 1 million digits.
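To make this concrete, here is a minimal Python sketch. It assumes the Gauss-Legendre (Brent-Salamin) iteration as the AGM algorithm, which is only an assumption since the exact variant used by Schnell_pi is not stated here, and it relies on Python's decimal module instead of FFT-based multiplication, so it shows the structure of the method and not its speed. The names agm_pi and estimated_flops are illustrative, not part of Schnell_pi.

```python
import math
from decimal import Decimal, getcontext


def agm_pi(digits):
    """Return Pi truncated to `digits` decimal places, as a string.

    Uses the Gauss-Legendre AGM iteration (assumed variant).
    """
    getcontext().prec = digits + 10          # a few guard digits
    a = Decimal(1)
    b = Decimal(1) / Decimal(2).sqrt()
    t = Decimal(1) / Decimal(4)
    p = Decimal(1)
    # The number of correct digits roughly doubles at each iteration,
    # so about log2(digits) iterations are enough.
    iterations = max(1, math.ceil(math.log2(digits)) + 1)
    for _ in range(iterations):
        a_next = (a + b) / 2                 # arithmetic mean
        b = (a * b).sqrt()                   # geometric mean
        t -= p * (a - a_next) ** 2
        a = a_next
        p *= 2
    pi = (a + b) ** 2 / (4 * t)
    return str(pi)[: digits + 2]             # "3." followed by the digits


def estimated_flops(n_digits, prefactor=12):
    """Operation count model quoted in the text: prefactor * N * log2(N)^2."""
    return prefactor * n_digits * math.log2(n_digits) ** 2
```

Evaluating estimated_flops(2**20) / 2**30 gives about 4.7, which is consistent with the "around 5 G" figure for roughly one million digits.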

The fastest competitors of Schnell_pi are programs using Ramanujan-type formulae (among which the Chudnovsky formula is apparently the fastest) computed using the Binary Splitting method. Although the number of operations for these programs scales as N*(log2(N)^3), i.e. with an additional power of log2(N), the prefactor is so small that they involve fewer floating-point operations up to several billion digits. However, fewer operations do not necessarily imply a faster program: the details of the implementation come into play. Indeed, a careful implementation of the AGM algorithm with FFT-based multiplication requires less data movement between the processor and the memory than a Chudnovsky/Binary Splitting algorithm. As such data movement is the main bottleneck when computing a large number of digits, AGM can compete with Chudnovsky/Binary Splitting, even though its number of floating-point operations is roughly twice as large for 1 million digits. The two fastest programs (for PC) that I know of are PiFast and QuickPi.
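For readers unfamiliar with the competing approach, the Chudnovsky formula evaluated by binary splitting can be sketched as follows. This is a textbook-style illustration using Python's built-in big integers, under the assumption that about 14 digits are gained per series term; it is not the code of any of the programs discussed here, and the names binary_split and chudnovsky_pi are hypothetical.

```python
from math import isqrt

C3_OVER_24 = 640320 ** 3 // 24   # exact, since 640320^3 is divisible by 24


def binary_split(a, b):
    """Return integers (P, Q, T) combining the series terms a .. b-1."""
    if b - a == 1:
        if a == 0:
            p = q = 1
        else:
            p = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
            q = a ** 3 * C3_OVER_24
        t = p * (13591409 + 545140134 * a)
        return p, q, (-t if a % 2 else t)    # the series alternates in sign
    mid = (a + b) // 2
    p1, q1, t1 = binary_split(a, mid)
    p2, q2, t2 = binary_split(mid, b)
    # Merge the two halves; only big-integer multiplications are needed.
    return p1 * p2, q1 * q2, q2 * t1 + p1 * t2


def chudnovsky_pi(digits):
    """Return Pi truncated to `digits` decimal places, as a string."""
    guard = 10                               # extra digits dropped at the end
    n_terms = digits // 14 + 2               # about 14 digits per term
    _, q, t = binary_split(0, n_terms)
    scale = 10 ** (digits + guard)
    sqrt_10005 = isqrt(10005 * scale * scale)    # sqrt(10005) * scale
    pi_scaled = (426880 * sqrt_10005 * q) // t   # Pi * scale
    s = str(pi_scaled // 10 ** guard)
    return s[0] + "." + s[1:]
```

The recursion multiplies ever larger integers as it merges halves, which gives an idea of why the cost of this method depends so strongly on how big-number multiplication and memory traffic are handled.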

I ran systematic tests, for numbers of digits of Pi that are powers of 2 between 65,536 (2^16) and 268,435,456 (2^28), on various computers. Schnell_pi was run in a standard RedHat 7.0 environment (KDE desktop and various daemons running in the background) on a 2.2.17 Linux kernel. Slightly faster timings (by a few percent) can be obtained in single-user mode. PiFast and QuickPi were run in a standard Windows 98 environment.

In all tests but one, Schnell_pi is the fastest program, typically by 10 to 20%. The only exception is 131,072 digits on the Pentium III 800 MHz, where Schnell_pi needs 1.29 seconds and QuickPi only 1.18 seconds. However, Schnell_pi is limited to numbers of digits that are powers of 2, which is not the case for PiFast and QuickPi. For intermediate numbers of digits (for example 1.5 million), the latter programs are faster.
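The cost of the power-of-2 restriction can be illustrated with a small sketch. It assumes that a user who wants an intermediate number of digits simply rounds the request up to the next power of 2 and discards the surplus; this is an illustration only, not a documented Schnell_pi feature, and next_power_of_two is a hypothetical helper.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is >= n (n must be a positive integer)."""
    return 1 << (n - 1).bit_length()


requested = 1_500_000
computed = next_power_of_two(requested)      # 2^21 = 2,097,152 digits
overhead = computed / requested              # fraction of extra work in digits
print(f"{requested} digits requested, {computed} computed "
      f"({100 * (overhead - 1):.0f}% more digits than needed)")
```

For 1.5 million digits the program would have to compute 2^21 digits, about 40% more than requested, which more than cancels a 10 to 20% speed advantage.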

Here are the results:

Pentium II 266 MHz with 128 MB of SDRAM 66 MHz
Number of digits    Schnell_pi (s)    PiFast (s)    QuickPi (s)
64 k                          1.45          2.37           1.85
128 k                         3.54          4.78           3.79
256 k                         8.26         11.21           8.46
512 k                        18.67         24.94          19.29
1 M                          42.02         56.63          47.63
2 M                          94.41        128.63         117.47
4 M                         213.96        300.99         281.41
8 M                          493.3         679.8          719.7

Pentium III 500 MHz with 256 MB of SDRAM 100 MHz
Number of digits    Schnell_pi (s)    PiFast (s)    QuickPi (s)
64 k                          0.83          1.54           1.07
128 k                         1.97          2.86           2.13
256 k                         4.55          6.54           4.78
512 k                        10.43         14.34          11.03
1 M                          23.41         32.46          27.39
2 M                          53.10         73.44          68.01
4 M                         117.16        172.13         162.38
8 M                         270.83        390.57         385.96
16 M                        609.18        879.20         886.05

Pentium III 800 MHz with 1 GB of SDRAM 133 MHz
Number of digits    Schnell_pi (s)    PiFast (s)    QuickPi (s)
64 k                          0.54          0.88           0.61
128 k                         1.29          1.93           1.18
256 k                         2.96          4.17           3.07
512 k                         6.77          9.06           7.10
1 M                          15.59         20.27          18.35
2 M                          37.26         45.09          46.30
4 M                          83.55        107.00         112.75
8 M                         195.17        234.09         269.35
16 M                        439.07        541.79   Wrong result
32 M                       1014.96       1225.83
64 M                       2275.29       2708.26

PiFast should be able to compute 128 M digits with 1 GB of memory, but crashes for some unknown reason.
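The margin between Schnell_pi and its best competitor can be read off the Pentium III 800 MHz table with a few lines of Python. The timings below are copied from that table; the definition of the margin (extra time needed by the fastest competitor, relative to Schnell_pi) is my own choice for this sketch, and margin_percent is a hypothetical helper.

```python
# Timings in seconds from the Pentium III 800 MHz table:
# (digits, Schnell_pi, PiFast, QuickPi); None marks a missing or invalid run.
P3_800 = [
    ("64 k", 0.54, 0.88, 0.61),
    ("128 k", 1.29, 1.93, 1.18),
    ("256 k", 2.96, 4.17, 3.07),
    ("512 k", 6.77, 9.06, 7.10),
    ("1 M", 15.59, 20.27, 18.35),
    ("2 M", 37.26, 45.09, 46.30),
    ("4 M", 83.55, 107.00, 112.75),
    ("8 M", 195.17, 234.09, 269.35),
    ("16 M", 439.07, 541.79, None),      # QuickPi gave a wrong result
    ("32 M", 1014.96, 1225.83, None),
    ("64 M", 2275.29, 2708.26, None),
]


def margin_percent(schnell, *competitors):
    """Extra time (in %) needed by the fastest valid competitor.

    A negative value means that a competitor beat Schnell_pi.
    """
    valid = [t for t in competitors if t is not None]
    return 100.0 * (min(valid) - schnell) / schnell


for label, schnell, pifast, quickpi in P3_800:
    print(f"{label:>6}: {margin_percent(schnell, pifast, quickpi):+6.1f} %")
```

With these numbers, only the 128 k row comes out negative (QuickPi ahead), matching the exception noted earlier.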

I was also able to run tests, with Schnell_pi only, on a bigger machine:

Pentium III 933 MHz with 4 GB of SDRAM 133 MHz
Number of digits    Schnell_pi (s)
128 M                      4749.57
256 M                     10549.46

and on an Athlon machine:

Athlon 1.2 GHz with 1 GB of DDR SDRAM PC2100
Number of digits    Schnell_pi (s)
64 k                          0.28
1 M                           8.80
8 M                          110.6
64 M                          1190
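As a rough cross-check, the Athlon timings can be compared with the N*(log2(N)^2) operation count quoted earlier. The sketch below assumes that "64 k", "1 M", "8 M" and "64 M" mean 2^16, 2^20, 2^23 and 2^26 digits, and that run time is simply proportional to the operation count, which ignores memory effects; model_cost and predicted_ratio are hypothetical names.

```python
import math

# (number of digits, Schnell_pi time in seconds) from the Athlon table.
ATHLON = [(2 ** 16, 0.28), (2 ** 20, 8.80), (2 ** 23, 110.6), (2 ** 26, 1190.0)]


def model_cost(n):
    """Operation count model, up to a constant factor: N * log2(N)^2."""
    return n * math.log2(n) ** 2


def predicted_ratio(n_small, n_large):
    """Time ratio expected if run time is proportional to model_cost."""
    return model_cost(n_large) / model_cost(n_small)


for (n1, t1), (n2, t2) in zip(ATHLON, ATHLON[1:]):
    print(f"2^{n1.bit_length() - 1} -> 2^{n2.bit_length() - 1}: "
          f"measured x{t2 / t1:.2f}, model x{predicted_ratio(n1, n2):.2f}")
```

For these four points the measured ratios (about 31, 12.6 and 10.8) are somewhat larger than the model values (25, 10.6 and 10.2), which may reflect the memory traffic discussed earlier rather than extra arithmetic.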

I am interested in any other timing results.



dominique@delande.nom.fr
Last modified: Fri Aug 10 18:14:19 CEST 2001