>>182
i am not so sure you know what bitslicing means. ;)
basically it means using a CPU with N bit registers as N CPUs with 1 bit registers.
some operations (like DES) can profit from that a lot.
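to make that concrete, here is a toy sketch in plain C. this is my own illustration, not john's code, and all the names (full_adder64, add4_sliced, slice, unslice, bitslice_demo) are made up for the example. it assumes a 64 bit integer type and uses a 4 bit adder instead of DES s-boxes, but the trick is the same: word i holds bit i of 64 independent problems, so every AND/OR/XOR works on all 64 at once.

```c
#include <assert.h>
#include <stdint.h>

/* One bitsliced full adder. Each uint64_t carries the same bit position
   of 64 independent problems, so one call does 64 one-bit additions. */
void full_adder64(uint64_t a, uint64_t b, uint64_t cin,
                  uint64_t *sum, uint64_t *cout)
{
    uint64_t t = a ^ b;
    *sum  = t ^ cin;
    *cout = (a & b) | (t & cin);
}

/* Add 64 pairs of 4-bit numbers at once.
   x[i] is "bit plane" i: bit j of x[i] is bit i of the j-th number. */
void add4_sliced(const uint64_t x[4], const uint64_t y[4], uint64_t out[5])
{
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++)
        full_adder64(x[i], y[i], carry, &out[i], &carry);
    out[4] = carry; /* fifth plane holds the final carry of each lane */
}

/* Transpose 64 ordinary values into nbits bit planes. */
void slice(const unsigned v[64], int nbits, uint64_t planes[])
{
    assert(nbits > 0 && nbits <= 32);
    for (int i = 0; i < nbits; i++) {
        planes[i] = 0;
        for (int j = 0; j < 64; j++)
            planes[i] |= (uint64_t)((v[j] >> i) & 1u) << j;
    }
}

/* Pull the value of one lane back out of the bit planes. */
unsigned unslice(const uint64_t planes[], int nbits, int lane)
{
    unsigned v = 0;
    for (int i = 0; i < nbits; i++)
        v |= (unsigned)((planes[i] >> lane) & 1u) << i;
    return v;
}

/* Run 64 additions in sliced form and compare each lane against a
   normal a + b. Returns the number of lanes that disagree. */
int bitslice_demo(void)
{
    unsigned a[64], b[64];
    uint64_t x[4], y[4], out[5];
    int bad = 0;

    for (int j = 0; j < 64; j++) {
        a[j] = (unsigned)j & 15u;
        b[j] = ((unsigned)j * 7u) & 15u;
    }
    slice(a, 4, x);
    slice(b, 4, y);
    add4_sliced(x, y, out);

    for (int j = 0; j < 64; j++)
        if (unslice(out, 5, j) != a[j] + b[j])
            bad++;
    return bad;
}
```

call bitslice_demo() from your own main() to check it. note that the slice/unslice transposes are the slow part, which is why this only pays off when you do a lot of work per transpose (like all the DES rounds).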
john's DESBS code does 64 "crypt() equivalent operations" in parallel on MMX, or 128 on altivec.
so porting this to newer extensions like SSE(2) would probably get another performance increase by a factor of 1.5-2 ...
but it's hard to say without trying it since this stuff is getting really non-intuitive already. (direct influence of RAM speed on the performance, or the awful linux 2.4.x SMP scheduler eating 70% of the performance, just to name two cute examples.)