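#include <stdio.h>
unsigned long long PowerOf2[64]; // PowerOf2[k] == 1ULL << k, filled in main
int main(void)
{
    unsigned long long val = 0;
    int i, j;
    for (i = 0; i < 64; i++) PowerOf2[i] = 1ULL << i;
    for (j = 1; j < 1000000000; j++)
    {
        for (i = 0; i < 64; i++)
        {
            // try using shifts
            val = 1ULL << i;
            // try array
            // val = PowerOf2[i];
        }
        printf("%llu\n", val); // intended to keep the optimizer from removing the loops
    }
    return 0;
}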
If the compiler really optimizes, this should execute in essentially zero time. At least, that is what such test attempts always do for me. With gcc -O2 the compiler is clever enough to see that every value assigned to val in the inner loop is overwritten before it is ever used, and it deletes the entire loop! It only generates the code for printf("%llu\n", 1ULL<<63).
By the way, 1ULL<<i would normally be faster than the table lookup, provided your shift unit is not already overloaded. Note that the i7 has only one shift unit, so it can become a bottleneck.
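A benchmark the optimizer cannot delete needs shift counts it cannot know at compile time and a result that is actually used. The sketch below is one way to arrange that, not the original poster's code: the index array is filled at run time by a simple LCG (an arbitrary choice, made only for illustration), both variants sum their results so the work stays live, and the helper names `init_powers`, `fill_indices`, `sum_shift` and `sum_table` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for a shift-vs-table comparison. */
static uint64_t PowerOf2[64];  /* PowerOf2[k] == 1ULL << k */

void init_powers(void)
{
    for (int k = 0; k < 64; k++)
        PowerOf2[k] = 1ULL << k;
}

/* Fill idx[] with shift counts in 0..63 produced by an LCG, so the
   compiler cannot know them at compile time. */
void fill_indices(unsigned char *idx, size_t n, uint32_t seed)
{
    for (size_t k = 0; k < n; k++) {
        seed = seed * 1664525u + 1013904223u;  /* wraps mod 2^32 */
        idx[k] = (unsigned char)(seed >> 26);  /* top 6 bits: 0..63 */
    }
}

/* Sum 1ULL << idx[k]; returning the sum keeps the loop from being dead code. */
uint64_t sum_shift(const unsigned char *idx, size_t n)
{
    uint64_t sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += 1ULL << idx[k];
    return sum;
}

/* Same sum, but each power of two is read from the table. */
uint64_t sum_table(const unsigned char *idx, size_t n)
{
    uint64_t sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += PowerOf2[idx[k]];
    return sum;
}
```

Each sum would be timed separately over a large index array (for example with `clock()` from `<time.h>`), and the sums printed once after timing, so the output neither lands inside the measurement nor lets the compiler discard the work.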
I had told him to look at the asm output (gcc -S). His inner loop is gone. All that is left is the loop with the print, since a function call prevents the compiler from optimizing that loop away unless the function can be inlined and is seen to do nothing...
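A minimal way to check this yourself is to compile a tiny function with `gcc -O2 -S` and read the generated `.s` file. The function `last_power` below is a hypothetical example; whether a given GCC version actually folds its loop into a constant depends on the version and flags, so treat it as something to inspect rather than a guaranteed result.

```c
/* Hypothetical example: only the last value assigned to val is ever used.
   An optimizing compiler may replace the whole loop with the constant
   1ULL << 63; compile with "gcc -O2 -S" and look for a loop in the .s file. */
unsigned long long last_power(void)
{
    unsigned long long val = 0;
    for (int i = 0; i < 64; i++)
        val = 1ULL << i;  /* overwritten on every iteration */
    return val;
}
```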
Indeed. And even if it weren't (i.e. when compiled without any optimization), the execution time should be completely dominated by the printf: his program prints a billion lines. How much time does it take to print one line (and scroll the window) compared to doing 64 shifts or 64 loads...?
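To time the shifts themselves, the printf has to stay out of the timed region. The sketch below assumes a `volatile` global sink (a common trick, not something proposed in the thread) so that every store must be performed even at -O2, and measures the loops with the standard `clock()`; the names `sink` and `time_shifts` are hypothetical.

```c
#include <time.h>

volatile unsigned long long sink;  /* volatile: every store must really happen */

/* Run reps * 64 shifts, storing each result to the volatile sink.
   Returns the elapsed processor time in seconds. No I/O is done here,
   so printing cannot dominate the measurement. */
double time_shifts(long reps)
{
    clock_t start = clock();
    for (long j = 0; j < reps; j++)
        for (int i = 0; i < 64; i++)
            sink = 1ULL << i;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The caller then prints the elapsed time (and `sink`, if desired) once, after `time_shifts` returns.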
Impossible to answer in general. If the table lookup is served from L1, no. If it comes from L2, it might be a toss-up against a multiply, but an add will be faster. If it comes from L3 or even main memory, almost anything you could compute instead is faster...
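One way to see the cache-level effect is to vary the table size, since n entries of 8 bytes each decide which cache level the table can live in (the 64-entry PowerOf2 table is only 512 bytes). The sketch below is an illustration under assumptions: `make_table` and `sum_lookups` are hypothetical names, the table contents are arbitrary, and the LCG is used only to make successive accesses jump around the table.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate and fill a table of n 64-bit entries.
   Returns NULL if the allocation fails; the caller frees the table. */
uint64_t *make_table(size_t n)
{
    uint64_t *t = malloc(n * sizeof *t);
    if (t == NULL)
        return NULL;
    for (size_t k = 0; k < n; k++)
        t[k] = (uint64_t)k * 3u + 1u;  /* arbitrary contents */
    return t;
}

/* Sum `count` lookups at LCG-chosen positions in t[0..n-1].
   Once n * 8 bytes exceeds a cache level, most lookups miss in it. */
uint64_t sum_lookups(const uint64_t *t, size_t n, size_t count, uint32_t seed)
{
    uint64_t sum = 0;
    for (size_t k = 0; k < count; k++) {
        seed = seed * 1664525u + 1013904223u;  /* wraps mod 2^32 */
        sum += t[seed % n];
    }
    return sum;
}
```

Timing `sum_lookups` for growing n, with the same `count`, would show the lookup cost stepping up as the table outgrows each cache level.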