The only thing I did not like about the final candidate was that the security margin seemed far too conservative and wasteful (BLAKE-256 went from 10 to 14 rounds). Really, they seemed to be trying to gain the judges' favor by making changes in response to the criticism that Blake was too fast.
When working with FPGAs, the previous 10-round submission was far more practical to implement with the design tools. For Blakecoin I used an 8-round Blake-256 variant, and at the time I estimated the best attack at about 2^192.
I was very happy to see this paper, which shows a best attack against 8 rounds of 2^200. That is better than I had originally thought and more than adequate for how we are using it in the wallet and for mining: http://eprint.iacr.org/2013/852.pdf
A 40% improvement in efficiency, latency or area has turned out to be a good trade-off, and it is still 2^256 for brute force. If you look at the whole round function, each round is 8 G function calls (4 on the columns, 4 on the diagonals), so thanks to its parallel nature 8 rounds is actually 64 G function calls. That can be compared with SHA-256, which normally uses 64 rounds in a linear fashion, so 8-round Blake was another very cool way to do things 😛
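To make the G-call arithmetic concrete, here is a small Python sketch. It lists the state-word indices BLAKE-256's G function works on in the column and diagonal halves of a round, checks that the four calls in each half touch disjoint words (which is why hardware can run them side by side), and counts calls for a given number of rounds. The helper names are my own, the sketch does no actual hashing, and it treats a G call and a SHA-256 round as comparable only in the loose sense used above.

```python
# Hypothetical helpers illustrating BLAKE-256's round structure (no hashing here).
# Each tuple is the (a, b, c, d) state-word indices one G call operates on.
COLUMN_STEP = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONAL_STEP = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def step_is_parallel(step):
    """True if the G calls in one step touch pairwise-disjoint state words,
    so all four can be evaluated at once."""
    words = [w for quad in step for w in quad]
    return len(words) == len(set(words))


def g_calls(rounds):
    """Total G invocations: 4 column + 4 diagonal calls per round."""
    return rounds * (len(COLUMN_STEP) + len(DIAGONAL_STEP))


def parallel_depth(rounds):
    """G calls on the critical path when each step's 4 calls run in parallel."""
    return rounds * 2


if __name__ == "__main__":
    print(g_calls(8), g_calls(14))  # 64 112
    print(parallel_depth(8))        # 16
    print(round(1 - 8 / 14, 2))     # 0.43: fraction of rounds dropped vs 14
```

Dropping from 14 to 8 rounds removes about 43% of the G calls, in line with the roughly 40% figure above; the real hardware saving depends on how the rounds are unrolled.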
For the ASIC paper, they only compare against a single SHA-256, but Bitcoin uses SHA-256d, a length-extension defense designed by Ferguson and Schneier: SHA256d(x) = SHA256(SHA256(x)). Therefore, in silicon, SHA-256d has either twice the latency or twice the area of a single SHA-256, depending on the design.
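For reference, the double hash is a one-liner with Python's standard hashlib. The block-count helper is a hypothetical illustration, assuming SHA-256's standard padding (a 0x80 byte plus an 8-byte length), of why the outer hash costs a full extra compression: a 32-byte digest always fits in exactly one 64-byte block. The 80-byte input is just a zero-filled placeholder the size of a Bitcoin block header.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256: SHA256(SHA256(x))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def sha256_blocks(n_bytes: int) -> int:
    """64-byte compression blocks SHA-256 processes for an n-byte message,
    after padding with one 0x80 byte and an 8-byte length field."""
    return (n_bytes + 9 + 63) // 64


if __name__ == "__main__":
    header = bytes(80)  # placeholder 80-byte input
    print(sha256d(header).hex())
    print(sha256_blocks(80), sha256_blocks(32))  # 2 1
```

Whatever the input length, the outer SHA-256 always adds one more full compression on the critical path, which is the extra latency (or extra pipeline area) a single-SHA-256 comparison leaves out.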