The macstl gcc rematch

The fight you’ve all been waiting for, after that last battle of the libraries. In the red corner, the latest gcc 3.3 libstdc++, courtesy of Apple’s December 2002 gcc Updater, which replaces the old gcc 3.1 codebase. And in the blue corner, a newly retuned macstl 0.1.2. The ground remains the same: a dual processor Power Macintosh G4. But I have added a couple of new benchmarks to exercise our participants.

Operations per Clock Tick (larger = faster)
operation	gcc 3.3 libstdc++	macstl 0.1.2, Altivec off	macstl 0.1.2, Altivec on
inline arithmetic	807	888	3355
inline transcendental	74	79	1041
outline transcendental	90	96	39
inline scalarization	1474	1481	4329
inline predication	186	595	3448
inline slice	2958	2890	4065
unchunked apply	408	403	406
unchunked shift	1587	1086	1470
unchunked mask	251	154	163
unchunked indirect	429	421	534

New Benchmarks

The inline predication benchmark tests the relational min and max expressions of the form (v1 == v2).min (), which are optimized in macstl 0.1.2. The inline slice benchmark is the same as the old unchunked slice benchmark, but since slicing is now chunked and inlined, the name was changed. The unchunked shift benchmark was in the source code since macstl 0.1, but while it crashed the gcc 3.1 libstdc++, it works fine now in 3.3.
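For readers who have not met the (v1 == v2).min () idiom, here is a minimal sketch of what it computes, written against the standard std::valarray interface rather than macstl itself. The helper names all_equal and any_equal are made up for illustration and are not part of either library, and the float element type is an assumption.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper (not part of macstl or libstdc++).
// (a == b) yields a bool-valued array; its min() is false as soon as any
// pair of elements differs, because false < true. So min() acts as "all".
bool all_equal(const std::valarray<float>& a, const std::valarray<float>& b)
{
    // min() and max() are undefined on empty arrays, so guard against that.
    assert(a.size() == b.size() && a.size() > 0);
    return (a == b).min();
}

// Hypothetical helper: max() of the bool-valued array is true as soon as
// any pair of elements matches, so max() acts as "any".
bool any_equal(const std::valarray<float>& a, const std::valarray<float>& b)
{
    assert(a.size() == b.size() && a.size() > 0);
    return (a == b).max();
}
```

Because the result is a single bool, a library is free to stop comparing early or to fold the whole comparison into vector predicate instructions, which is presumably why this form rewards special treatment.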

The biggest jump in gcc performance is in inline arithmetic: 81% faster than the previous version. However, macstl without Altivec still keeps its lead, at 10% faster. And with Altivec, it speeds away at 2.9x to 18.5x faster than gcc on all inline tests except for slicing.
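The comparisons in this article are ratios of one table column to another. As a rough sketch of that arithmetic, the two helpers below turn a pair of operations-per-tick scores into an "N times" figure and a "percent faster" figure. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: how many times faster the candidate score is
// than the baseline score (both in operations per clock tick).
double speedup(double candidate, double baseline)
{
    assert(baseline > 0.0);
    return candidate / baseline;
}

// Hypothetical helper: the same comparison expressed as a percentage
// gain over the baseline.
double percent_faster(double candidate, double baseline)
{
    return (speedup(candidate, baseline) - 1.0) * 100.0;
}
```

For example, percent_faster(888, 807) comes out at roughly 10, which is the lead macstl holds over gcc in inline arithmetic with Altivec off, and speedup(3448, 186) is roughly 18.5, the inline predication result with Altivec on.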

New Optimizations

macstl 0.1.2 specifically targets chunked relational min and max expressions, using Altivec predicates to gain 18.5x speed over gcc in the inline predication test. It even enhances unchunked bool-valued min and max: with Altivec off, macstl still runs the test 3.2x faster than gcc, and turning Altivec on is worth a further 5.8x over that.

The new slicing algorithms, based on Altivec permutes, also come out on top. The improvements over scalar code are not as dramatic though: the Altivec version is just 37% faster than gcc and 41% faster than macstl without Altivec in the inline slice test.
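The benchmark's own slice expression is not reproduced here, so the following is only a sketch of what a slice operation looks like through the standard std::valarray interface. The helper every_nth and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical helper: copy out the elements at start, start + stride,
// start + 2 * stride, ... for as long as the index stays inside v.
std::valarray<float> every_nth(const std::valarray<float>& v,
                               std::size_t start, std::size_t stride)
{
    assert(stride > 0 && start < v.size());
    // Number of strided elements that fit, rounded up.
    const std::size_t count = (v.size() - start + stride - 1) / stride;
    // std::slice takes (start, length, stride).
    return v[std::slice(start, count, stride)];
}
```

A scalar implementation walks the source one strided element at a time. A vector implementation has to gather those scattered elements into contiguous registers, which is the job the permute instructions do, and that extra shuffling may be part of why the gain here is smaller than in the arithmetic tests.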
