I'm not too familiar with what the "bitslice" and "normal" code is. What I did was take the version of the DES code that generates results that are padded and out-of-order, and transformed those back to the normal format in base64, and used that to make a small utility that searched for specific substrings and output them. Even with the overhead of the conversion, it was nearly twice the speed of any other DES code. If I was trying to crack specific tripcodes, I could do the conversion in the other direction just once and make it even faster, but I didn't find that very interesting.
So, wild discovery, and using the john code in my own program.