Miscellaneous :: ScreenShots
This is a screenshot of my G4 running the scientific computational toolkit, Cactus. It shows the main page of a simulation of two sources of radiation in orbit. The bottom right corner also clearly indicates that the simulation is using 2 processors in parallel. This is an MPI-based parallel computation. As shown, one can observe and even control the simulation using an ordinary web browser. Click on the image for a larger and clearer version.
This is another screenshot of my G4 running the scientific computational toolkit, Cactus. It shows some pictures of the output of the above simulation. Two round sources of radiation are visible, and they seem to be orbiting each other. Click on the image for a larger and clearer version.
This shows the detailed processor usage of the above Cactus simulation. Look at the first 3 items in the table. They indicate that Cactus is using 2 processors based on MPI.
This screenshot shows two windows. The window on top (the one with several "Hello World" lines) demonstrates a popular OpenMP test. It indicates that one can use many "threads" in one's programs and therefore speed things up considerably on multiprocessor machines. The window underneath demonstrates the power of parallel and vector computing on a program called "nzc" (New Zerilli Code, which takes two black holes, merges them, and calculates the gravitational radiation coming from such a phenomenon). The first run shows that the program normally takes about 46 seconds (look at the 3rd number in the 2nd line). After vectorization of the program (using the Velocity Engine of the G4 processor), it takes only 19 seconds (3rd number in the 4th line). Finally, after both parallelization and vectorization (using 2 processors and the Velocity Engine!), it takes just 11 seconds (3rd number in the 6th line). More than 4 times faster than the original!!
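To make the arithmetic behind those numbers explicit, here is a minimal sketch that computes the speedups from the three timings quoted above (46, 19 and 11 seconds). The helper name `speedup` and the labels are my own, purely for illustration. Vectorization alone gives a factor of about 2.4, and adding the second processor brings the total to about 4.2.

```python
# Timings (in seconds) quoted in the screenshot description above.
# The labels and the helper name are illustrative, not part of nzc.
TIMINGS = {
    "original": 46.0,
    "vectorized": 19.0,
    "vectorized + 2 processors": 11.0,
}


def speedup(baseline_s, optimized_s):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


if __name__ == "__main__":
    baseline = TIMINGS["original"]
    for label, seconds in TIMINGS.items():
        print(f"{label:28s} {seconds:5.1f} s   x{speedup(baseline, seconds):.2f}")

    # Gain from the second processor alone, relative to the vectorized run.
    extra = speedup(TIMINGS["vectorized"], TIMINGS["vectorized + 2 processors"])
    print(f"second processor alone: x{extra:.2f}")
```

The second processor alone contributes a factor of roughly 1.7 on top of the vectorized run, a little short of the ideal factor of 2.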
Miscellaneous :: Apple's CHUD Tools
Download: Apple's CHUD Tools
Apple has posted an in-depth tutorial on Shark. Check it out!
Shark: Shark is a system profiler that samples the entire system while one's code is running to see where time is being spent. This is very useful information, since as a programmer you can then focus on optimizing the parts of your code that would have the most impact on performance. It even offers advice on how to improve your code's performance. As an example, I profiled my Mac while it ran my Teukolsky code (written in Fortran) with a source term (which happens to be the most computationally complex part of the code) that models the infall of a small particle into a black hole. To do so, all I did was launch Shark, begin execution of my code (compiled by xlf) and then ask Shark to begin sampling (click Start). In a few seconds, I get a profile of my system as shown below.
As expected, most of the time is spent in the source term subroutine. After that, time is spent in computing the right-hand-side of the PDE and also in the time-step update routine. You can get more information by playing around with the settings, etc. Just open the Help documentation on Shark. As with everything else, Apple's done a super job with documentation on these CHUD tools!
For my purposes, Shark works great. If you'd like to play with more advanced tools, look into Saturn and MONster (see below). The brief information included below is from email exchanges with my very knowledgeable colleague and friend, Mark Bellon.
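To give a feel for what "sampling" means, here is a toy sketch in Python. It is not how Shark is implemented, and none of it comes from my Teukolsky code: the routines `source_term`, `right_hand_side` and `time_step_update` are hypothetical stand-ins with made-up costs. The sampler periodically looks at which function a worker thread is executing and counts the hits; routines that take more time collect more samples.

```python
import collections
import sys
import threading
import time


def source_term(n):
    """Hypothetical stand-in for the most expensive routine."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def right_hand_side(n):
    """Hypothetical stand-in for the right-hand-side computation."""
    total = 0.0
    for i in range(n):
        total += i * 0.25
    return total


def time_step_update(n):
    """Hypothetical stand-in for the time-step update."""
    total = 0.0
    for _ in range(n):
        total += 1.0
    return total


def workload(steps=40):
    """Fake evolution loop; the relative loop sizes are arbitrary."""
    for _ in range(steps):
        source_term(60_000)
        right_hand_side(20_000)
        time_step_update(10_000)


def sample_profile(target, interval=0.001):
    """Run target in a thread and count which function is active at each sample."""
    counts = collections.Counter()
    worker = threading.Thread(target=target)
    worker.start()
    while worker.is_alive():
        # sys._current_frames() is CPython-specific: it maps thread ids to
        # the frame each thread is currently executing.
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)
    worker.join()
    return counts


if __name__ == "__main__":
    counts = sample_profile(workload)
    total = sum(counts.values())
    for name, hits in counts.most_common():
        print(f"{name:20s} {hits:5d} samples  {100.0 * hits / total:5.1f}%")
```

Because the result is statistical, the percentages change a little from run to run, and a routine that runs only briefly may not be sampled at all.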
Saturn and MONster: Saturn is a fairly traditional program profiler - you compile special stubs into your program via a compiler option and a special library that is linked in. You run your program and it spits out a profile output file. Saturn takes that file and analyses it in various useful ways. Saturn provides a detailed calling tree that is completely accurate - the program traces its own flow. MONster traces the flow statistically. Saturn helps you find the most frequently called routines and the routines in which you spend the most time. MONster helps you find the most frequently called routines and the amount of time spent in each routine, AND allows you to see activity down to the instruction level. MONster also analyses the code it sees and describes issues with the code that may affect performance on a given processor (750, 7410, 7450 and 970).
If one were building a new program, one should start with Saturn. Once the program is working and providing correct results, one would optimize things at the algorithm level as well as at the subroutine level. That done, one would have a very good understanding of the program and would be ready to look for detailed improvements using MONster and Shark. Next, one would attack code sequences that could be improved and play with compiler options for specific routines and files. Then one would start again with Saturn and see what, if anything, changed in the overall characteristics of the program. Eventually one would converge on a well-optimized program.
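The "program traces its own flow" idea can be sketched in a few lines of Python. This is only an analogy for the compiled-in stubs, not Saturn's actual mechanism or file format: the class `TraceProfiler`, the helper `build_demo` and the three demo routines are all hypothetical. Each instrumented routine records every call, who called it, and how long it took, so the call counts and the calling tree are exact rather than statistical.

```python
import collections
import functools
import time


class TraceProfiler:
    """Records exact call counts, caller -> callee edges and inclusive time."""

    def __init__(self):
        self.calls = collections.Counter()
        self.edges = collections.Counter()
        self.seconds = collections.defaultdict(float)
        self._stack = ["<root>"]  # names of the currently active routines

    def instrument(self, func):
        """Wrap func with a stub that logs entry and exit."""
        name = func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.calls[name] += 1
            self.edges[(self._stack[-1], name)] += 1
            self._stack.append(name)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Inclusive time: includes time spent in routines called from here.
                self.seconds[name] += time.perf_counter() - start
                self._stack.pop()

        return wrapper


def build_demo(profiler):
    """Build a tiny, made-up evolution code whose routines are instrumented."""

    @profiler.instrument
    def source_term(u):
        return u * u

    @profiler.instrument
    def right_hand_side(u):
        return source_term(u) + 1.0

    @profiler.instrument
    def evolve(steps):
        u = 0.0
        for _ in range(steps):
            u += 1e-3 * right_hand_side(u)
        return u

    return evolve


if __name__ == "__main__":
    profiler = TraceProfiler()
    evolve = build_demo(profiler)
    evolve(1000)
    for (caller, callee), n in sorted(profiler.edges.items()):
        print(f"{caller:16s} -> {callee:16s} {n:6d} calls")
    for name, secs in profiler.seconds.items():
        print(f"{name:16s} {profiler.calls[name]:6d} calls  {secs:.6f} s inclusive")
```

The price of exactness is overhead: every call pays for the bookkeeping, which is why a statistical tool is often the better choice for looking at fine-grained hot spots.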