This is a package written by my good friend Mark Bellon. I have learned a lot from him, through his knowledge, support and friendship. This package is yet another demonstration of the power that AltiVec and multiprocessing (MP) offer to scientific computing (details available below). The package is very portable, and includes highly optimized ports for OS X and Linux (PPC and x86). If you have questions about GRAVSIM, please write directly to Mark Bellon, at

Download Source: gravsim.tar.gz
Note: Please use gnutar to extract the source from this distribution.

GRAVSIM is a package that contains an ever-evolving and hopefully growing set of gravitational N-body solvers. These solvers attempt to track the motion of a set of bodies moving under the influence of gravity. They are used in many areas of astrophysical study, including planetary motion, star cluster formation and stability, and galactic formation, stability and interactions (e.g. colliding galaxies).

Tracking the motion of a large number of bodies, each with its own mass and velocity, is a daunting task. Meaningful research problems involve anywhere from thousands to millions of bodies. The position, acceleration and velocity of each body (and sometimes other information) must be recorded many times over a period of (simulated) time, all the while ensuring that the basic laws of physics are followed.

Thankfully, the computational complexity of N-body problems can be attacked (optimized) by a variety of methods: specialized mathematical methods, use of parallel computers (multiple processors dividing up the problem and sharing the load) and vectorization (a hardware and software optimization technique available on some processors to increase the efficiency of the computations).

The GRAVSIM package was written to study these optimization techniques. The multiprocessor support is designed for computers that have multiple processors all sharing a common memory (i.e. Multiple Instruction, Multiple Data (MIMD) global shared memory machines). It has been used to study problems on machines with as many as 22 processors.
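As a rough illustration of the shared memory idea described above, the sketch below splits a set of bodies into contiguous blocks, one per worker, and lets every worker write its results into a single shared array. This is not GRAVSIM's code: the names `block_ranges` and `parallel_map_into` are hypothetical, and Python threads are used only to show the structure (CPython's global interpreter lock means this sketch will not show the speedups a native implementation can reach).

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, nearly equal blocks."""
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_map_into(shared_out, func, n_workers):
    """Fill shared_out[i] = func(i); each worker writes only its own block."""
    def work(bounds):
        lo, hi = bounds
        for i in range(lo, hi):
            shared_out[i] = func(i)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(work, block_ranges(len(shared_out), n_workers)))
    return shared_out


if __name__ == "__main__":
    print(block_ranges(10, 3))
    print(parallel_map_into([0] * 10, lambda i: i * i, 3))
```

Because each worker owns a disjoint block of the output array, no locking is needed for the writes; all workers can still read the full input data from the common memory.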

The source code for GRAVSIM is included and provides extensive documentation and explanations about how things are done. Since GRAVSIM is about learning, it is written to clearly present the underlying ideas. Optimizations that would improve performance at the cost of simplicity were not performed. No code rewriting or restructuring tools are necessary: GRAVSIM explicitly controls itself at all levels. Everything from the lowest level mechanism to the highest level algorithm is fully and directly viewable.

Two radically different mathematical methods are currently available: the method of Barnes and Hut and the "classic" Newtonian method. The Newtonian method solves problems in O(N^2) time (increase the number of bodies by a factor of 3 and the time to obtain a solution increases by a factor of 9). The method of Barnes and Hut solves problems in O(N log N) time (much, much better than N^2). In the future, I hope to add the method of Dehnen, which solves problems in O(N) time (increase the number of bodies by a factor of 4 and the time to obtain a solution increases by a factor of 4).
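To make the O(N^2) cost of the Newtonian method concrete, here is a minimal sketch of direct summation: the acceleration on each body is the sum of the pulls from every other body. It is not taken from the GRAVSIM source; the function names, the `G=1.0` default and the optional `softening` parameter are assumptions made for illustration.

```python
import math


def newtonian_accelerations(positions, masses, G=1.0, softening=0.0):
    """Direct summation: acceleration on each body from all other bodies.

    positions: list of (x, y, z) tuples; masses: list of floats.
    G and softening are illustrative parameters, not GRAVSIM's.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body does not attract itself
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dz = positions[j][2] - positions[i][2]
            r2 = dx * dx + dy * dy + dz * dz + softening * softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))  # one square root per pair
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
            acc[i][2] += G * masses[j] * dz * inv_r3
    return acc


def pair_interactions(n):
    """Number of (i, j) pairs with i != j visited by the double loop."""
    return n * (n - 1)


if __name__ == "__main__":
    print(newtonian_accelerations([(0, 0, 0), (1, 0, 0)], [1.0, 2.0]))
    # Tripling the body count multiplies the work by roughly 9.
    print(pair_interactions(3000) / pair_interactions(1000))
```

The double loop visits N(N-1) pairs, which is where the quadratic growth comes from, and each pair costs one square root.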

On common dual-processor desktop machines available today, GRAVSIM can achieve a speedup factor of 1.95 or better. That cuts the time to obtain a solution virtually in half! Have a machine with more than two processors? Expect a speedup factor that increases nearly linearly with the number of processors. I have measured a speedup factor of 18.6 on a machine with 20 processors.
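One way to read these speedup figures is as parallel efficiency (speedup divided by processor count). The sketch below also computes the Karp-Flatt metric, a standard estimate of the serial fraction implied by a measured speedup; neither calculation comes from GRAVSIM itself, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup, processors):
    """Karp-Flatt metric: experimentally determined serial fraction."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # Figures quoted in the text: 1.95 on 2 processors, 18.6 on 20.
    for s, p in [(1.95, 2), (18.6, 20)]:
        eff = parallel_efficiency(s, p)
        ser = karp_flatt_serial_fraction(s, p)
        print(f"{p:2d} processors: efficiency {eff:.3f}, serial fraction {ser:.4f}")
```

With the numbers quoted above, efficiency is about 97% on two processors and about 93% on twenty, which is consistent with the "nearly linearly" description.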

On AltiVec enhanced PowerPC hardware (the G4 and G5 processors) vector optimizations are possible. Here is a summary of what one may expect (approximately):

Method (AltiVec/Scalar Ratio)    G4      G5
Newtonian                        21.5    5.5
Barnes and Hut                   2.2     1.25

The G4/G5 speedup ratios were initially surprising but investigation uncovered two architectural differences that explained virtually all of the differences:
1) The G5 processor has fast hardware square root instructions (single and double precision), which the G4 lacks. On the G4, square roots must be provided by a complex software library routine.
2) The G5 processors are able to exploit more parallelism within a stream of instructions and this makes the G5 processors proportionally better at scalar code than the G4.

Have an AltiVec enhanced multiprocessor PowerPC machine? The speedup factors for multiple processors and AltiVec multiply. Speedup factors of 10-40 are possible on dual-processor machines, and additional processors improve things further.
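The multiplication can be checked with a few lines of arithmetic. The sketch below combines the dual-processor speedup of 1.95 quoted earlier with the approximate AltiVec/scalar ratios from the table; the dictionary layout and the function name `combined_speedup` are illustrative only, and the assumption that the two factors multiply exactly is the text's approximation, not a measurement.

```python
# Approximate AltiVec/scalar ratios from the table above.
ALTIVEC_RATIO = {
    "Newtonian": {"G4": 21.5, "G5": 5.5},
    "Barnes and Hut": {"G4": 2.2, "G5": 1.25},
}

DUAL_PROCESSOR_SPEEDUP = 1.95  # figure quoted for dual-processor desktops


def combined_speedup(mp_speedup, vector_ratio):
    """Overall speedup if the two factors are independent and multiply."""
    return mp_speedup * vector_ratio


if __name__ == "__main__":
    for method, ratios in ALTIVEC_RATIO.items():
        for cpu, ratio in ratios.items():
            total = combined_speedup(DUAL_PROCESSOR_SPEEDUP, ratio)
            print(f"{method:15s} {cpu}: {total:.1f}")
```

Under these assumptions the Newtonian method lands at roughly 11 (G5) to 42 (G4), which matches the 10-40 range; the Barnes and Hut figures would come out lower, at roughly 2.4 to 4.3.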

A simple X Window System based viewer is included so that you can view the generated data, and a fully functional access library is provided so that additional tools may be written to access and analyze your data files.
