Code, algorithms, languages, construction...
-
H.G.Muller
- Posts: 190
- Joined: Sun Jul 14, 2013 10:00 am
- Real Name: H.G. Muller
Post
by H.G.Muller » Tue Oct 06, 2015 1:52 pm
thevinenator wrote:void BitmapTest()
{
U64 i,j,val;
U64 PowerOf2[64];
for (i = 0; i <= 63; i++)
PowerOf2 = (1ULL << i);
clock_t begin = clock();
for (j = 1; j < 1000000000; j++)
for (i = 0; i < 64; i++)
{
// try using shifts
val = 1ULL << i;
// try array
// val = PowerOf2;
}
printf("%llu\n", val); // needed to keep the optimizer from not doing the loops
}
If the compiler truly optimizes, this should always execute in zero time. At least, that is what such test attempts always do for me. With gcc -O2 the compiler is clever enough to see that the assignment to val is never used within the loop, and deletes the entire loop! It only generates the code for printf("%llu\n", 1ULL<<63);.
Btw, 1ULL<<i would normally be faster, if your shift unit is not overloaded. Note that i7 has only one shift unit, so it can be a bottleneck.
-
hyatt
- Posts: 1242
- Joined: Thu Jun 10, 2010 2:13 am
- Real Name: Bob Hyatt (Robert M. Hyatt)
- Location: University of Alabama at Birmingham
-
Contact:
Post
by hyatt » Tue Oct 06, 2015 7:55 pm
H.G.Muller wrote:thevinenator wrote:void BitmapTest()
{
U64 i,j,val;
U64 PowerOf2[64];
for (i = 0; i <= 63; i++)
PowerOf2 = (1ULL << i);
clock_t begin = clock();
for (j = 1; j < 1000000000; j++)
for (i = 0; i < 64; i++)
{
// try using shifts
val = 1ULL << i;
// try array
// val = PowerOf2;
}
printf("%llu\n", val); // needed to keep the optimizer from not doing the loops
}
If the compiler truly optimizes, this should always execute in zero time. At least, that is what such test attempts always do for me. With gcc -O2 the compiler is clever enough to see that the assignment to val is never used within the loop, and deletes the entire loop! It only generates the code for printf("%llu\n", 1ULL<<63);.
Btw, 1ULL<<i would normally be faster, if your shift unit is not overloaded. Note that i7 has only one shift unit, so it can be a bottleneck.
I had told him to look at the asm output (gcc -s). His loop is gone. All that is left is the loop with the print, since a function call prevents optimizing the loop away unless the function can be inlined and it also does nothing...
-
H.G.Muller
- Posts: 190
- Joined: Sun Jul 14, 2013 10:00 am
- Real Name: H.G. Muller
Post
by H.G.Muller » Wed Oct 07, 2015 9:56 am
Indeed. And even if not (i.e. when compiled without any optimalization) the execution time should be completely dominated by the printf. His program prints a billion lines. How much time would it take to print one line (and scroll the window) compared to doing 64 shifts or loads...?
-
MathAddict
- Posts: 3
- Joined: Mon Oct 26, 2015 2:24 pm
- Real Name: Rian Neogi
Post
by MathAddict » Mon Oct 26, 2015 2:30 pm
On a side note, are addition and multiplication operators faster than array lookups?
-
hyatt
- Posts: 1242
- Joined: Thu Jun 10, 2010 2:13 am
- Real Name: Bob Hyatt (Robert M. Hyatt)
- Location: University of Alabama at Birmingham
-
Contact:
Post
by hyatt » Wed Oct 28, 2015 2:39 am
Impossible to answer. If the table lookup comes from L1, no. If it comes from L2 it might be a tossup for multiply but add will be faster. If it comes from L3 or even main memory most anything is faster...