
Benchmark Numbers (old)

For our benchmarks we use two simple model systems: an FCC crystal with Lennard-Jones pair interactions and a potential cutoff at 2.7 σ, and a B2 Ni-Al crystal with EAM interactions. The serial benchmarks use about 128k atoms, whereas the parallel benchmarks use 2k, 16k, and 128k atoms per CPU. In other words, the system grows with the number of processes, so that we measure the scale-up (weak-scaling) performance. The fewer atoms each CPU holds, the more the communication performance matters, especially the latency. All systems were run with the neighbor list option nbl.
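The pair interaction of the first model system and the idea behind a neighbor list can be sketched as follows. This is a minimal illustration, not IMD's code. It assumes the standard 12-6 Lennard-Jones form in reduced units (σ = ε = 1) and a plain truncation at 2.7 σ without an energy shift. It also uses a hypothetical list skin of 0.3 σ and ignores periodic boundaries. The list stores every pair within cutoff plus skin, so it can be reused for several steps instead of re-checking all pairs every step.

```python
import math

SIGMA = 1.0            # LJ length unit (reduced units, assumed)
EPSILON = 1.0          # LJ energy unit (reduced units, assumed)
R_CUT = 2.7 * SIGMA    # potential cutoff used in the benchmark
SKIN = 0.3 * SIGMA     # hypothetical neighbor-list skin, not taken from IMD


def lj_energy(r):
    """12-6 Lennard-Jones pair energy, truncated (not shifted) at R_CUT."""
    if r >= R_CUT:
        return 0.0
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (sr6 * sr6 - sr6)


def build_neighbor_list(positions):
    """Brute-force O(N^2) Verlet list: all pairs (i, j), i < j,
    closer than R_CUT + SKIN. No periodic boundaries."""
    r_list = R_CUT + SKIN
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < r_list:
                pairs.append((i, j))
    return pairs


def total_energy(positions, pairs):
    """Sum pair energies over the neighbor list only."""
    return sum(lj_energy(math.dist(positions[i], positions[j])) for i, j in pairs)


atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
pairs = build_neighbor_list(atoms)
print(pairs)                      # [(0, 1)]
print(total_energy(atoms, pairs))
```

A production code would typically build the list with a cell method rather than the O(N²) double loop shown here. It would rebuild the list once some atom has moved more than half the skin.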

Performance is reported as the total computation time per time step and atom, in microseconds. Ideally, this time does not depend on the system size or the number of processors. For the serial version this is largely the case, apart from cache-size effects. For the parallel version, however, the scaling with the number of processors depends on the processor interconnect and also on the system size. The communication overhead is proportional to the surface of the block of material each processor handles, whereas the computational work is proportional to its volume. The relative overhead therefore grows as the number of atoms per processor decreases.
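The metric and the surface argument can be made concrete with a small calculation. This sketch reads "total computation time" as wall-clock time summed over all CPUs, which is what makes the ideal value independent of the processor count in a scale-up run. It also assumes each processor owns a roughly cubic block, so that surface/volume scales as N^(-1/3), where N is the number of atoms per CPU. The function names and the sample timing are hypothetical.

```python
def us_per_step_atom(wall_seconds, steps, atoms_total, n_cpus=1):
    """Benchmark metric: CPU time (wall time x number of CPUs) per time step
    and atom, in microseconds. Assumes 'total computation time' means the
    time summed over all CPUs."""
    return wall_seconds * n_cpus / (steps * atoms_total) * 1e6


def relative_surface(atoms_per_cpu):
    """Surface-to-volume ratio of a cubic block holding atoms_per_cpu atoms,
    up to a constant: surface ~ N**(2/3), volume ~ N, so ratio ~ N**(-1/3)."""
    return atoms_per_cpu ** (-1.0 / 3.0)


# Hypothetical run: 64 CPUs, 16k atoms each, 100 steps in 5 s of wall time.
print(us_per_step_atom(5.0, 100, 64 * 16_000, n_cpus=64))

# Per-atom communication share relative to the 128k-atoms-per-CPU case
# (approximately 4.0, 2.0, 1.0):
for n in (2_000, 16_000, 128_000):
    print(n, relative_surface(n) / relative_surface(128_000))
```

Under the cubic-block assumption, a CPU holding 2k atoms spends about four times as much of its effort per atom on surface communication as one holding 128k atoms. A CPU holding 16k atoms spends about twice as much.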

Most PC-type processors, such as the Pentium 4, Xeon, Opteron, and Athlon, perform very well with IMD and offer an excellent price/performance ratio. For pair and EAM interactions using neighbor lists, Itanium processors are currently the fastest, but achieving this performance required careful tuning of the code for these processors. This tuning has not yet been done for other interactions, so their performance on Itanium can be much worse.

Serial Performance

The serial performance depends very little on the system size. We therefore list performance numbers for a single system size of 128k atoms. The table gives the timings for pair and EAM interactions, in microseconds per time step and atom, for different processor/compiler combinations.

Processor                 Compiler  Pair (µs)  EAM (µs)
Intel Itanium2 1.5 GHz    icc            2.58      5.05
AMD Opteron 2.2 GHz       icc            4.41      6.59
Intel Xeon 3.2 GHz        icc            4.64      7.44
AMD Opteron 2.0 GHz       gcc            5.82      8.76
AMD Athlon MP 2600+       gcc            5.94     11.56
IBM Power4+ 1.7 GHz       xlc            6.93      9.02
UltraSparc III 900 MHz    cc            15.08     25.90
Alpha EV6 667 MHz         cc            16.04     22.99
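Multiplying the serial timings by the number of atoms turns them into wall-clock time per step. The sketch below copies the numbers from the serial timing table. Reading "128k" as 128,000 atoms is an assumption, and the helper names are hypothetical.

```python
# Serial timings from the table: (pair, EAM) in microseconds per step and atom.
SERIAL_US = {
    "Intel Itanium2 1.5 GHz (icc)": (2.58, 5.05),
    "AMD Opteron 2.2 GHz (icc)": (4.41, 6.59),
    "Intel Xeon 3.2 GHz (icc)": (4.64, 7.44),
    "AMD Opteron 2.0 GHz (gcc)": (5.82, 8.76),
    "AMD Athlon MP 2600+ (gcc)": (5.94, 11.56),
    "IBM Power4+ 1.7 GHz (xlc)": (6.93, 9.02),
    "UltraSparc III 900 MHz (cc)": (15.08, 25.90),
    "Alpha EV6 667 MHz (cc)": (16.04, 22.99),
}

N_ATOMS = 128_000  # assumed reading of "128k"


def seconds_per_step(us_per_step_atom, n_atoms=N_ATOMS):
    """Wall-clock seconds for one serial time step of n_atoms atoms."""
    return us_per_step_atom * n_atoms * 1e-6


for cpu, (pair, eam) in SERIAL_US.items():
    print(f"{cpu:30s} pair {seconds_per_step(pair):.3f} s/step  "
          f"EAM {seconds_per_step(eam):.3f} s/step  EAM/pair {eam / pair:.2f}")
```

With these numbers, an EAM step costs between about 1.3 (Power4+) and 2.0 (Itanium2) times as much as a pair-interaction step.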

Parallel Performance

The scaling of the parallel performance with the number of processors depends on the system size. Below we give the performance for three system sizes, with 2k, 16k, and 128k atoms per processor, each with pair and EAM interactions. For large numbers of processors, or for small systems per processor, a low-latency interconnect such as Myrinet or Infiniband is required. For smaller numbers of processors, Gbit Ethernet or even 100 Mbit Ethernet may also work.
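Because the number of atoms per processor is held fixed, the parallel results are weak-scaling measurements. One common way to summarize such data is the efficiency E(P) = t(1)/t(P), where t(P) is the time per step and atom on P processors and E = 1 is ideal. This summary is our assumption, not something this page reports. The timings in the example are hypothetical and were not read off the plots.

```python
def weak_scaling_efficiency(t_ref, t_p):
    """Weak-scaling efficiency from per-step, per-atom times measured with a
    fixed number of atoms per CPU: 1.0 means perfect scale-up."""
    return t_ref / t_p


# Hypothetical times (microseconds per step and atom) for 2k atoms per CPU.
timings = {1: 2.6, 8: 2.9, 64: 3.4}
for p, t in timings.items():
    print(p, round(weak_scaling_efficiency(timings[1], t), 2))
```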

Itanium cluster with Quadrics interconnect

[Figure: scaling on Itanium cluster]

Xeon EM64T cluster with Infiniband interconnect

[Figure: scaling on Xeon cluster]

Opteron cluster with Myrinet interconnect

For the smallest systems, with 2k atoms per CPU, the interconnect latency starts to become noticeable. For larger systems, the scaling remains excellent.
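A simple cost model helps explain why latency shows up first at 2k atoms per CPU. The model is an illustrative assumption and is not fitted to these measurements. Suppose each step costs every CPU a compute time c per atom, plus a fixed number m of messages with latency α each, plus a transfer time proportional to the surface of its block, β N^{2/3}. Dividing by the N atoms per CPU gives the time per step and atom:

```latex
t(N) \approx c + \frac{m\,\alpha}{N} + \frac{\beta}{N^{1/3}}
```

The latency term falls off as 1/N, so going from 128k to 2k atoms per CPU multiplies it by 64, whereas the surface term grows only by a factor of 4.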

[Figure: scaling on Opteron cluster]

IBM Regatta cluster

For the smallest numbers of CPUs, the timings are probably affected by interference from other processes running on the same node.

[Figure: scaling on IBM Regatta system]