[1.6.4] getBlockTileEntity not as intensive as it seems?

Reika · April 5, 2014

I am trying to optimize code in RotaryCraft (for those not familiar, a large tech mod that has a bit of a reputation for being computationally intensive). One of the things that seems like a good start was the repeated calls of getBlockTileEntity(x,y,z), as I have heard that that function is terribly inefficient from a variety of developers, including KingLemming and LexManos.

So I set up a caching system using a HashMap<ChunkCoordinates, TileEntity>, to store the six adjacent TileEntities, and update the cache with the Block type's onNeighborTileChange() function. It works as intended, except for one thing:

No performance gain. And not because the new system is slow, but because getBlockTileEntity is not. I tested both normal terrain and superflat, and also with a 40x20x40 block of furnaces (to inflate the size of the chunk's loadedTileEntity list). Every time, there was negligible difference between getBlockTileEntity(x,y,z) and fetching the TE from cache.

I took some timing data with System.nanoTime():

Base Delay between nanoTime() calls and print: 293 ns
Additional Delay to call world.getBlockTileEntity once: ~800ns
Additional Delay to call from cache once: ~700ns
Additional Delay to call world.getBlockTileEntity 512 times: ~4.4 microseconds
Additional Delay to call from cache 512 times: ~6.5 microseconds
Additional Delay to call world.getBlockTileEntity 65536 times: ~1.2 milliseconds
Additional Delay to call from cache 65536 times: ~1.2 milliseconds

I also tried to call it 2M times per tick, but the game froze.

Why, however, am I getting these results?

TheGreyGhost · April 5, 2014

Hi

In my experience it's usually difficult to predict in advance what the most efficient way of implementing the code will be. There are some good general rules, and it can be very important to choose the appropriate algorithm, but nine times out of ten I find that the differences are completely negligible when you run in the real game.

So I think what you're doing is the right approach- write your code to be understandable, modular, and easily maintained, measure its performance, and optimise the bits that need it using your "real situation" performance measurements.

-TGG

Reika · April 5, 2014

Hi

In my experience it's usually difficult to predict in advance what the most efficient way of implementing the code will be. There are some good general rules, and it can be very important to choose the appropriate algorithm, but nine times out of ten I find that the differences are completely negligible when you run in the real game.

So I think what you're doing is the right approach- write your code to be understandable, modular, and easily maintained, measure its performance, and optimise the bits that need it using your "real situation" performance measurements.

-TGG

The thing is, there is no reason why iterating over a list 16000 entries long should be as fast as a hashmap get, or why there is such a large disconnect between my observations and "accepted knowledge".

Player · April 5, 2014

The list you are talking about only contains the very recently added TEs, all others are being retrieved through 2 HashMap lookups (one to get the Chunk, one to get the TE in there).

While a HashMap is fairly quick, it'll still take a lot longer than a direct array/field access. You are supposed to use an array like TileEntity[6], one entry for each direction, to cache the neighbor TEs, not another HashMap. Be aware of annoying and random side effects due to out of date caches though.

Abastro · April 5, 2014

How did you use ChunkCoordinates to get the Tile Entity from the Map?

if you used new ChunkCoordinates(x,y,z) or something, it should take time to create the coordinates.

sequituri · April 5, 2014

The Hopper uses a method that looks somewhat reasonable for finding nearby TEs.

world.getEntitiesWithinAABBExcludingEntity((Entity)null, AxisAlignedBB.getAABBPool().getAABB(xPos, yPos, zPos, xPos + 1.0D, yPos + 1.0D, zPos + 1.0D), IEntitySelector.selectInventories);

The first argument is the entity to exclude - don't want to include oneself.

The last one filters by EntityTypes.

It would be nice to know if this is one of the better (faster) methods.

sequituri · April 5, 2014

derp! I just realized that despite the similarity of names, TileEntities do not inherit from Entity in any way.

I also just figured out that an Entity (like EntityMinecartChest) can move but a TileEntity (like TileEntityChest) cannot. I wonder if that is the only difference, since both can have inventories and additional data and NBT's. I would be nice to be able to learn all this on the Wiki...

Reika · April 5, 2014

How did you use ChunkCoordinates to get the Tile Entity from the Map?

if you used new ChunkCoordinates(x,y,z) or something, it should take time to create the coordinates.

Yes, that is what I did, because the alternative - pooling them - would be even slower. That said, it should be negligible.

The list you are talking about only contains the very recently added TEs, all others are being retrieved through 2 HashMap lookups (one to get the Chunk, one to get the TE in there).

That explains a lot.

While a HashMap is fairly quick, it'll still take a lot longer than a direct array/field access. You are supposed to use an array like TileEntity[6], one entry for each direction, to cache the neighbor TEs, not another HashMap.

HashMap.get() is O(log(n)). Should it really be that big a difference? I do not like how messy an array will be.

Be aware of annoying and random side effects due to out of date caches though.

Block.updateNeighborTile() solves that issue nicely, and my only recently having learned it is why I am only now trying caching.

I doubt you can get much performance gain from this. getBlockTileEntity is already pretty much as efficient as it can be.

Then why do so many people in high places - including the developer of the mod I have seen called "the leader in optimization" - say otherwise?

sequituri · April 6, 2014

Did your results involve eliminating the time required for an empty timing loop of x times per tick? For a reasonably accurate answer, the loop should be called X number of times with no load, then that time should be subtracted from the same x loops with the actual code used.

TheGreyGhost · April 6, 2014

I doubt you can get much performance gain from this. getBlockTileEntity is already pretty much as efficient as it can be.

Then why do so many people in high places - including the developer of the mod I have seen called "the leader in optimization" - say otherwise?

Perhaps the code has changed since they last looked at it. Perhaps they never tested it using the same conditions that you're running it under. Perhaps they're just all wrong. I'd say, believe what your testing shows you and don't worry about it too much :-) Time to move on to the next possible source of performance problems I reckon...

BTW do you know about the inbuilt Minecraft profiler?

All those

this.mc.mcProfiler.startSection("level");

sprinkled through the code?

/debug start

/debug stop (creates a dump file)

You can even add your own sections if you want.

Your IDE might also have some powerful profiling capabilities too, I often find the results a surprise and show me performance gains I probably wouldn't have spotted without it.

-TGG

Reika · April 6, 2014

Did your results involve eliminating the time required for an empty timing loop of x times per tick? For a reasonably accurate answer, the loop should be called X number of times with no load, then that time should be subtracted from the same x loops with the actual code used.

Yes, I did account for that. 65536 iterations across an empty loop took 295 ns, and I simply subtracted that off.

I'll say it again: I doubt that this benefits anything. Your caching will hardly be any faster in the long run, especially when you do it this way. Because a) You need to keep track of changes to the world, so that your cache doesn't get out of date and b) you need to create the ChunkCoordinates keys for lookup.

What would help though is, as said above somewhere, cache your neighbor TileEntities, if you use them and refresh them on onNeighborBlockChange. Then don't use a HashMap but an EnumMap<ForgeDirection, TileEntity>.

I am doing it this way, though instead of EnumMap I am using TileEntity[6] and accessing it by dir.ordinal(). Given that it takes me only 3 ns to call a function, I think that will work perfectly.

Testing confirms this; 65536 calls of getBlockTileEntity, 2.4 ms. 65536 calls of getCachedTE(dir), which fetches from the array, takes less than 80 microseconds. This seems to translate roughly into a 20FPS gain in a singleplayer creative superflat, and should only increase from there.

BTW do you know about the inbuilt Minecraft profiler?

All those

this.mc.mcProfiler.startSection("level");

sprinkled through the code?

/debug start

/debug stop (creates a dump file)

You can even add your own sections if you want.

Your IDE might also have some powerful profiling capabilities too, I often find the results a surprise and show me performance gains I probably wouldn't have spotted without it.

-TGG

I have seen a lot of that, but had no idea what it was for. Is it as simple as you are implying?

jabelar · April 7, 2014

This is a cool discussion. The only thing I have to add though is sometimes a simple, brute force approach works sufficiently that you don't need to do all the work of a trickier, over-finessed approach. One thing I've taken advantage of in the past was the difference between human perception and computer execution speed. For example, if you process only half of the instances in your code every other tick, or one third every three ticks, etc. It is usually simple to implement and the human users often don't notice, especially if you're slightly smart about it (like maybe things behind the player or otherwise out of view (or farther) get updated less often.

There are some situations where you need to process everything in same tick for some synchronization purpose, but I've seen a lot more cases where selective, less frequent updates give big perf boost withough much coding pain.

I especially recommend this in the case (perhaps like here) where you know a certain activity is taking a majority of the processing time but you feel that any further true optimization is tough / impossible to squeeze out.

Reika · April 7, 2014

This is a cool discussion. The only thing I have to add though is sometimes a simple, brute force approach works sufficiently that you don't need to do all the work of a trickier, over-finessed approach. One thing I've taken advantage of in the past was the difference between human perception and computer execution speed. For example, if you process only half of the instances in your code every other tick, or one third every three ticks, etc. It is usually simple to implement and the human users often don't notice, especially if you're slightly smart about it (like maybe things behind the player or otherwise out of view (or farther) get updated less often.

There are some situations where you need to process everything in same tick for some synchronization purpose, but I've seen a lot more cases where selective, less frequent updates give big perf boost withough much coding pain.

I especially recommend this in the case (perhaps like here) where you know a certain activity is taking a majority of the processing time but you feel that any further true optimization is tough / impossible to squeeze out.

I do that with some code, but a lot of it is part of a sort of physics engine; if I slowed that down, it would be extremely noticeable.

jabelar · April 7, 2014

This is a cool discussion. The only thing I have to add though is sometimes a simple, brute force approach works sufficiently that you don't need to do all the work of a trickier, over-finessed approach. One thing I've taken advantage of in the past was the difference between human perception and computer execution speed. For example, if you process only half of the instances in your code every other tick, or one third every three ticks, etc. It is usually simple to implement and the human users often don't notice, especially if you're slightly smart about it (like maybe things behind the player or otherwise out of view (or farther) get updated less often.

There are some situations where you need to process everything in same tick for some synchronization purpose, but I've seen a lot more cases where selective, less frequent updates give big perf boost withough much coding pain.

I especially recommend this in the case (perhaps like here) where you know a certain activity is taking a majority of the processing time but you feel that any further true optimization is tough / impossible to squeeze out.

I do that with some code, but a lot of it is part of a sort of physics engine; if I slowed that down, it would be extremely noticeable.

Yeah, I understand some things do need real-time. However, even with some things like physics I have been successful with just scaling the physics itself to keep up with pace of updates. For example something can be processed with twice the velocity every other tick, etc. You of course have to be careful with collisions as you don't want things passing through each other without detecting a collision. In terms of users noticing I think it really depends on how near it is and whether it is within their field of view. For example, if a cog in a machine is not turning smoothly but is behind the player it may not matter.

Anyway, I was just putting it out there because sometimes you're already fairly optimized but still have a perf problem -- at that point you need to break down the processing into manageble, less frequent, activity.

Reika · April 7, 2014

Yeah, I understand some things do need real-time. However, even with some things like physics I have been successful with just scaling the physics itself to keep up with pace of updates. For example something can be processed with twice the velocity every other tick, etc. You of course have to be careful with collisions as you don't want things passing through each other without detecting a collision. In terms of users noticing I think it really depends on how near it is and whether it is within their field of view. For example, if a cog in a machine is not turning smoothly but is behind the player it may not matter.

Anyway, I was just putting it out there because sometimes you're already fairly optimized but still have a perf problem -- at that point you need to break down the processing into manageble, less frequent, activity.

Render code only runs when the player is looking at something, so optimizing it there will have no effect. I have tried distance LOD modeling, but it was both very noticeable and not particularly helpful.

I did do it with thermal and pressure calculations, as that saves a lot of load, but things like the power system (probably one of my biggest load sources), glitches when "stepped".

Sign In

[1.6.4] getBlockTileEntity not as intensive as it seems?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Announcements

Recently Browsing

Posts

Topics

Who's Online (See full list)

Important Information