This doesn't surprise me. In almost every "heavy-computing" project I've done, there has been a heady discussion of the best I/O and CPU mix, to which the right answer has been "test it and see". The latest one I was involved with was multiplying huge integers that don't fit into memory via a disk-based FFT: we had around 128GB of RAM, the integers were maybe 256GB each, and the hard drive capacity was 4TB or more. I think using about 6 of the 24 available CPUs was all we could manage, as using more would make the chunk size too small, and the decimation method would then push more of the I/O onto another stage (I forget the details).

But the current EGTB 8K block size was simply a compromise between I/O bandwidth, CPU utilization, and compression efficiency. In reality, when we were testing, we found that the block size really had a different optimal value for each CPU/disk drive combination.
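For concreteness, the "test it and see" part is nothing fancier than sweeping candidate block sizes on the actual hardware. A minimal sketch in C (the file name and candidate sizes are placeholders; a real test would also include the decompression step and drop the OS page cache between runs):

/* Rough "test it and see" sweep: read the same file with different block
 * sizes and report wall-clock throughput.  Placeholder file name and sizes;
 * decompression of each block would go where the comment indicates. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    const size_t sizes[] = { 4096, 8192, 32768, 65536, 1u << 20 };
    const char *path = "testfile.bin";          /* placeholder input file */

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        FILE *f = fopen(path, "rb");
        if (!f) { perror(path); return 1; }

        char *buf = malloc(sizes[i]);
        if (!buf) { fclose(f); return 1; }

        double t0 = now_sec();
        size_t total = 0, n;
        while ((n = fread(buf, 1, sizes[i], f)) > 0)
            total += n;                         /* decompress block here */
        double secs = now_sec() - t0;

        printf("block %7zu bytes: %.1f MB/s\n", sizes[i],
               secs > 0 ? total / (1024.0 * 1024.0) / secs : 0.0);

        free(buf);
        fclose(f);
    }
    return 0;
}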
I think that Gaviota allows great control over the various decompression schemes used, but the indexing system seems fixed. The block size looks like 32K:

size_t block_mem = 32 * 1024; /* 32k fixed, needed for the compression schemes */
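With a fixed block size the position-to-block mapping is just a divide and a modulo, which is presumably part of the appeal. A sketch of the arithmetic (the names and the one-entry-per-byte assumption are mine, not Gaviota's):

/* With a fixed block size, locating an entry is a divide and a modulo.
 * Names and the one-byte-per-entry assumption are mine, not Gaviota's. */
#include <stddef.h>

#define BLOCK_MEM (32 * 1024)   /* 32k fixed block, as in the Gaviota source */

/* Map a position index to the compressed block that holds it and the
 * offset of the entry inside the decompressed block. */
static void gtb_locate(size_t position_index, size_t *block, size_t *offset)
{
    *block  = position_index / BLOCK_MEM;
    *offset = position_index % BLOCK_MEM;
}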
The RobboTotalBases are "not recommended" to be used in search (there are Shredder-style "RobboTripleBases" for that), but they have 64K blocks using BWT-style compression, though something called "hyperindexing" seems to allow the block size to be 1MB, with an extra 16-way indexing inside the block itself. Any comparison to Nalimov is almost hopeless, given the different constraints.
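I read the "hyperindexing" as a two-level scheme: a coarse index selects the 1MB block, and a 16-entry sub-index inside it means you only have to decompress a sixteenth of the block. A guess at what that lookup might look like (the struct and field names are mine, not the actual RobboBase format):

/* Guess at a two-level ("hyperindexed") lookup: a coarse index selects a
 * 1MB block, and a 16-entry sub-index in the block header selects which
 * sixteenth to decompress.  Struct and field names are mine, not the
 * actual RobboBase format. */
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (1u << 20)               /* 1MB "hyper" block           */
#define SUBBLOCKS  16                       /* 16-way index inside a block */
#define SUB_SIZE   (BLOCK_SIZE / SUBBLOCKS)

struct hyper_block {
    uint32_t sub_offset[SUBBLOCKS];  /* byte offset of each sub-block's
                                        compressed data within the block */
};

/* Map a position index to block, sub-block, and offset within the
 * decompressed sub-block. */
static void hyper_locate(size_t position_index,
                         size_t *block, size_t *sub, size_t *offset)
{
    *block = position_index / BLOCK_SIZE;
    size_t in_block = position_index % BLOCK_SIZE;
    *sub    = in_block / SUB_SIZE;
    *offset = in_block % SUB_SIZE;
}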