Hello,
I am looking at orevein generation performance in GT5U. Currently I am seeing around 100-300 ms to fill an orevein.
I have pulled the logic into a standalone test harness, where it runs fairly fast: about 600 µs to fill a vein (without
any Minecraft-related function calls, just randomization and loop control). I have optimized that a little, shaving off about
60 µs, but that's clearly not where the major culprit is.
I was wondering if anyone has done any performance comparisons on how to step through a chunk. Based on my reading of https://minecraft.gamepedia.com/Chunk_format it seems like doing a full Y layer at a time is the most CPU cache-friendly,
since you wouldn't jump up and down between sections. Also, it would be ideal to order X and Z so that the memory accesses
are linear and don't cause extra cache misses (TMI: http://www.aristeia.com/TalkNotes/ACCU2011_CPUCaches.pdf). The current code
loops as
    for X
        for Z
            for Y
so there might be some benefit to swapping Y to the outermost loop. And possibly a benefit to swapping order of X and Z.
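To put a rough number on the loop-order idea, here is a minimal sketch. It assumes the blocks of one 16x16x16 section sit in a flat array in YZX order (index = y*256 + z*16 + x), which is my reading of the Chunk format page and not something I have checked against the actual Chunk class. LoopOrderSketch and its methods are made-up names. It only adds up how far the array index jumps between consecutive accesses, so it says nothing about real timings.

```java
public class LoopOrderSketch {
    static final int SIZE = 16; // one 16x16x16 section

    // Assumed layout: Y is the slowest-moving coordinate, X the fastest.
    public static int index(int x, int y, int z) {
        return (y << 8) | (z << 4) | x;
    }

    // Total index distance travelled with X outermost, then Z, then Y
    // (the order the current code uses).
    public static long strideXZY() {
        long total = 0;
        int prev = -1;
        for (int x = 0; x < SIZE; x++) {
            for (int z = 0; z < SIZE; z++) {
                for (int y = 0; y < SIZE; y++) {
                    int i = index(x, y, z);
                    if (prev >= 0) total += Math.abs(i - prev);
                    prev = i;
                }
            }
        }
        return total;
    }

    // Total index distance travelled with Y outermost, then Z, then X.
    public static long strideYZX() {
        long total = 0;
        int prev = -1;
        for (int y = 0; y < SIZE; y++) {
            for (int z = 0; z < SIZE; z++) {
                for (int x = 0; x < SIZE; x++) {
                    int i = index(x, y, z);
                    if (prev >= 0) total += Math.abs(i - prev);
                    prev = i;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("X/Z/Y total stride: " + strideXZY());
        System.out.println("Y/Z/X total stride: " + strideYZX());
    }
}
```

Under that layout assumption the Y/Z/X order walks the array strictly front to back, one element at a time, while X/Z/Y jumps 256 elements on every inner step. Whether that shows up in wall-clock time in-game is exactly what I can't tell yet.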
It might all be re-arranging deck chairs on the Titanic though. My suspicion is that the major time sink is turning the victim block into a tile entity. I'm going to try adding some measurement code to see how much of the overall time is spent doing that. If that's the case, there isn't much to do except lower the number of blocks converted (by making veins smaller or reducing their density) or optimize World.setBlock().
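For the measurement code, I am thinking of something along these lines: wrap only the block placement in a timer and compare its total against the time for the whole vein. This is a rough sketch; PhaseTimer and placeBlock are hypothetical names, and placeBlock is a stand-in for the real call that converts the block, not anything from GT5U or Forge.

```java
public class PhaseTimer {
    private long totalNanos; // accumulated time across all start/stop pairs
    private long startNanos;
    private int calls;

    public void start() {
        startNanos = System.nanoTime();
    }

    public void stop() {
        totalNanos += System.nanoTime() - startNanos;
        calls++;
    }

    public long totalNanos() {
        return totalNanos;
    }

    public int calls() {
        return calls;
    }

    // Average time per timed call, in microseconds.
    public double averageMicros() {
        return calls == 0 ? 0.0 : totalNanos / 1000.0 / calls;
    }

    // Hypothetical placeholder for the real block placement.
    static int placeBlock(int x, int y, int z) {
        return (x * 31 + y) * 31 + z;
    }

    public static void main(String[] args) {
        PhaseTimer whole = new PhaseTimer();
        PhaseTimer placement = new PhaseTimer();
        int sink = 0; // keeps the placeholder work from being optimized away

        whole.start();
        for (int x = 0; x < 16; x++) {
            for (int z = 0; z < 16; z++) {
                for (int y = 0; y < 16; y++) {
                    placement.start();
                    sink += placeBlock(x, y, z);
                    placement.stop();
                }
            }
        }
        whole.stop();

        System.out.println("placement calls: " + placement.calls());
        System.out.println("placement total us: " + placement.totalNanos() / 1000.0);
        System.out.println("whole vein us: " + whole.totalNanos() / 1000.0);
        System.out.println("ignore: " + sink);
    }
}
```

One caveat: calling System.nanoTime() twice per block adds overhead of its own, so the placement total will read a bit high when the timed call is very cheap.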
Also, if anyone has a more fleshed-out command-line test harness for worldgen, that would be great (a fake chunk would be awesome). The variance when testing in-game is so high that it's hard to tell whether improvements actually matter. If not, can someone point me to the Chunk class file so I can make a better simulation?
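To show what I mean by a fake chunk, here is the bare minimum I would start from. It is only a sketch: FakeChunk is a made-up class, the YZX array layout is my assumption from the Chunk format page, and it models none of the tile entity, lighting, or neighbor-update work that the real setBlock path may do.

```java
public class FakeChunk {
    public static final int WIDTH = 16;
    public static final int HEIGHT = 256;

    // One flat array for the whole column; assumed YZX ordering.
    private final short[] blocks = new short[WIDTH * HEIGHT * WIDTH];
    private int writes;

    private static boolean inBounds(int x, int y, int z) {
        return x >= 0 && x < WIDTH && z >= 0 && z < WIDTH && y >= 0 && y < HEIGHT;
    }

    private static int index(int x, int y, int z) {
        return (y << 8) | (z << 4) | x;
    }

    // Returns false (and writes nothing) for coordinates outside the chunk.
    public boolean setBlock(int x, int y, int z, short id) {
        if (!inBounds(x, y, z)) return false;
        blocks[index(x, y, z)] = id;
        writes++;
        return true;
    }

    // Returns 0 (treated here as air) for coordinates outside the chunk.
    public short getBlock(int x, int y, int z) {
        return inBounds(x, y, z) ? blocks[index(x, y, z)] : 0;
    }

    public int writes() {
        return writes;
    }
}
```

That would at least let the harness count block writes and touch memory in a chunk-like pattern, but a version closer to the real Chunk class is what I am really after.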
Thanks for any help!