TrueBrain
I fully agree that my benchmark might be hitting the cache too much, and possibly benefiting from other optimizations Java does. So I changed the benchmark a lot to avoid most of that, I hope.

First, I used reflection to call sin() on the various test cases. As far as I know Java can't optimize reflection, so it should not be possible to turn the array calls into something more direct, such as avoiding the function call. At least, that is my theory.

Second, every time I call sin(), the next thing I do is trash the cache. I did this the way I would in C-world, so I hope it carries over to Java: I read a random array of 1024 * 1024 * 2 Integers (this should be between 8 and 16 MiB of RAM). This should purge anything that was in the cache (most L2 caches are only 2 MiB). Again, at least, that is my theory.

Running the benchmark now indeed gives different results, but still with the same message: MathHelper is faster when the JIT kicks in. I updated the benchmark tool above. Please let me know what you think, or if you have any other ideas / suggestions.

The results of this benchmark (on my machine, JRE 1.6):

class MathHelper65536.sin(): first = 2.731 ms, mean = 74.238 us (CI deltas: -110.049 ns, +137.682 ns), sd = 62.098 us (CI deltas: -13.988 us, +22.066 us)
WARNING: EXECUTION TIMES HAVE EXTREME OUTLIERS, SD VALUES MAY BE INACCURATE

class MathHelper1024.sin(): first = 989.552 us, mean = 73.697 us (CI deltas: -260.063 ns, +368.889 ns), sd = 155.644 us (CI deltas: -34.896 us, +72.242 us)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE

class java.lang.Math.sin(): first = 702.065 us, mean = 187.960 us (CI deltas: -1.187 us, +2.028 us), sd = 544.618 us (CI deltas: -183.904 us, +252.559 us)
WARNING: EXECUTION TIMES HAVE EXTREME OUTLIERS, execution times may have serial correlation, SD VALUES MAY BE INACCURATE

As you can see, MathHelper is now only about 2.5 times as fast.

PS: please do not compare these results with earlier results. The range of the test changed, and therefore so did the average execution time.
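The setup described above can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark tool: the names CacheTrashBench, lookupStatic, callReflectively and trashCache are hypothetical, it only times Math.sin, and it assumes that reading an 8 MiB int array is enough to push earlier data out of a typical 2 MiB L2 cache. Note that HotSpot can turn frequently used reflective calls into generated bytecode and JIT-compile them, so reflection is not a guaranteed barrier against optimization.

```java
import java.lang.reflect.Method;
import java.util.Random;

class CacheTrashBench {
    // 1024 * 1024 * 2 ints = 8 MiB of data, larger than a typical 2 MiB L2 cache.
    static final int TRASH_SIZE = 1024 * 1024 * 2;

    // Looks up a static method taking a single double, e.g. Math.sin.
    static Method lookupStatic(Class<?> owner, String name) {
        try {
            return owner.getMethod(name, double.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(name + "(double) not found", e);
        }
    }

    // Calls a static double -> double method through reflection.
    static double callReflectively(Method m, double x) {
        try {
            return (Double) m.invoke(null, x);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads every element so the buffer is pulled through the caches; the sum is
    // returned so the JIT cannot discard the loop as dead code.
    static long trashCache(int[] buffer) {
        long sum = 0;
        for (int value : buffer) {
            sum += value;
        }
        return sum;
    }

    // Total nanoseconds spent in `rounds` reflective calls, trashing the cache after each one.
    static long timeWithTrashing(Method sin, int[] buffer, Random rnd, int rounds) {
        long totalNanos = 0;
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            double x = rnd.nextDouble() * 2 * Math.PI;
            long start = System.nanoTime();
            double y = callReflectively(sin, x);
            totalNanos += System.nanoTime() - start;
            // Only the sin() call is timed; the cache trashing happens outside the measurement.
            sink += Double.doubleToLongBits(y) ^ trashCache(buffer);
        }
        if (sink == 42) {
            System.out.println("(ignore) " + sink); // keeps `sink` observable
        }
        return totalNanos;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int[] buffer = new int[TRASH_SIZE];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = rnd.nextInt();
        }
        Method mathSin = lookupStatic(Math.class, "sin");
        int rounds = 100;
        long nanos = timeWithTrashing(mathSin, buffer, rnd, rounds);
        System.out.println("Math.sin via reflection: ~" + (nanos / rounds) + " ns per call");
    }
}
```

Timing each call individually with System.nanoTime() is coarse; a framework like the IBM one mentioned in this thread handles warm-up and statistics far better.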
-
Haha, I know those comments about posts being too long far too well. But thanks, nice to know some people do appreciate them.

I also found the IBM benchmark tool, and boy, it is a good one. The latest results: I have been benchmarking this wrong all along. It is very sad, but I have to conclude that FPS++ doesn't really help. Or if it does, the effect is very local.

To explain: I did my benchmarks in the static initializer of a class. Nobody told me you have to be careful with those, and no documentation suggests you should. I am unsure, but it seems that if you call a function of the class from within its static initializer, it imports the class/function again. Anyway, the time goes up non-linearly with it, and the test results are very wrong. Worst of all: the JIT never kicks in. So most of my earlier benchmarks can be put in the bin!

So, let's benchmark outside the static initializer. This gives completely different results. When you do just 1000 loops or so, Math.sin is faster. But when you go beyond that, MathHelper.sin starts to get faster. A lot faster, in fact; so much faster that you would want to use MathHelper.sin.

At OverMindDL1's request, I made a .jar which includes the IBM benchmark, prepared with 3 tests:
- MathHelper.sin, with a table size of 65536.
- MathHelper.sin, with a table size of 1024.
- Math.sin.
It is available here: https://dl.dropbox.com/s/be86maw8ltqg6ed/MathSin.jar (source is included; it takes 5+ minutes to run! It includes the IBM Benchmark Framework; I hope they don't mind me 'distributing' it like this. Awesome framework btw! Really impressive ...)
The result:

MathHelper65536.sin(): first = 19.566 ms, mean = 38.262 us (CI deltas: -84.409 ns, +106.498 ns), sd = 67.597 us (CI deltas: -13.994 us, +18.349 us)
WARNING: EXECUTION TIMES HAVE EXTREME OUTLIERS, SD VALUES MAY BE INACCURATE

MathHelper1024.sin(): first = 1.947 ms, mean = 36.361 us (CI deltas: -53.842 ns, +55.031 ns), sd = 38.650 us (CI deltas: -7.022 us, +10.210 us)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE

Math.sin(): first = 1.173 ms, mean = 684.942 us (CI deltas: -1.077 us, +1.178 us), sd = 201.183 us (CI deltas: -33.190 us, +59.616 us)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE

As you can see, the second is fastest, closely followed by the first, leaving the last far behind. Why? Well, as it turns out, the JIT (the part that converts Java bytecode to native code) only kicks in after a block has been called 1500 times with '-client', and 10,000 times with '-server'. I knew this mechanism was in place, but I never knew it would hit this hard. So, in small benchmarks Math.sin is faster while the code is not yet JIT-compiled, and slower once it is. These results were all measured under JRE 1.6.

This doesn't dismiss the issues found earlier. JRE 1.4 and JRE 1.5 did much worse on Math.sin, so benchmarking there would have shown a win for MathHelper.sin immediately. Here, it was hiding under the control of the JIT. I really don't like languages where I am not in control .. but so be it.

In conclusion: MathHelper.sin is in fact faster, and I cannot find any statistical evidence that FPS++ helps. The only cases where I can see it helping are when the JIT cannot kick in, or on a cold system. Also, decreasing the table size does not seem to get any statistical support.

PS: this is under the assumption that the JIT will kick in sooner or later for this function, which is very, very likely.
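To make the table-size and warm-up points concrete, here is a small sketch assuming a MathHelper-style lookup table. The class names TableSin and JitWarmupDemo are made up for illustration; this is not Minecraft's actual code. The table size must be a power of two so that `& mask` wraps any index, negative ones included, into range. The harness runs from main() rather than from a static initializer, and times the same 1000 calls before and after a warm-up phase. Exact compile thresholds depend on the JVM (modern JVMs use tiered compilation), so "cold" and "warm" below are only likely states, not guarantees.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical MathHelper-style sine table with a configurable size.
final class TableSin {
    private final double[] table;
    private final int mask;
    private final double indexScale; // table entries per radian

    TableSin(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two: " + size);
        }
        table = new double[size];
        for (int i = 0; i < size; i++) {
            table[i] = Math.sin(i * 2.0 * Math.PI / size);
        }
        mask = size - 1;
        indexScale = size / (2.0 * Math.PI);
    }

    // (int) truncates toward zero; `& mask` wraps the index into [0, size),
    // which also handles negative angles.
    double sin(double x) {
        return table[(int) (x * indexScale) & mask];
    }
}

class JitWarmupDemo {
    static volatile double sink; // stops the JIT from removing the timed loop

    // Nanoseconds for `calls` evaluations of f over a sweep of angles.
    static long timeCalls(DoubleUnaryOperator f, int calls) {
        double acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += f.applyAsDouble(i * 0.001);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    // Deliberately run from main(), not from a static initializer.
    public static void main(String[] args) {
        TableSin t65536 = new TableSin(65536);
        TableSin t1024 = new TableSin(1024);
        DoubleUnaryOperator[] fs = { t65536::sin, t1024::sin, Math::sin };
        String[] names = { "table 65536", "table 1024", "Math.sin" };
        for (int k = 0; k < fs.length; k++) {
            long cold = timeCalls(fs[k], 1_000); // likely still interpreted
            for (int w = 0; w < 20; w++) {
                timeCalls(fs[k], 100_000);       // warm-up, past the compile thresholds
            }
            long warm = timeCalls(fs[k], 1_000); // likely JIT-compiled by now
            System.out.println(names[k] + ": cold " + cold + " ns, warm " + warm + " ns per 1000 calls");
        }
    }
}
```

With a table of N entries the lookup error is bounded by roughly 2*PI/N (about 0.006 for 1024 entries, about 0.0001 for 65536), since |sin(a) - sin(b)| <= |a - b|.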
-
Hmm, that would certainly be one way to establish whether it is faster or slower for most users. I am not against doing that, so I will drop by on IRC soon and we will see what we can do.

More importantly, I was puzzled why there is so little documentation on something that impacts performance this much. So .. some more digging. (This reply is mostly intended for the curious, like myself, and for documentation purposes.)

Funnily enough, most of the documentation on this comes from the bug tracker. An interesting read is: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4857011

Basically, they explain this: in JRE 1.3, Math.sin was quick, as it used the x87 fsin instruction. But fsin has limitations. It is only accurate (by Java's standard) between -1/4 PI and 1/4 PI, and only accepts inputs in the range -2^63 .. 2^63. The Java specification requires a much larger range and more accuracy. So in 1.4(.1) they 'fixed' this bug by removing the fsin call and calling StrictMath.sin instead. StrictMath is Java's portable implementation of most of the Math functions (based on the fdlibm algorithms). It uses the FPU, but avoids instructions like fsin, ftan, ... Even in the latest Java you can still call it if you like, and you can be sure that the results from these functions are identical across all platforms and very precise.

So, how wrong is fsin outside its range of accuracy? Well, somewhere around the 6th digit it starts to differ. Nothing huge, but .. enough for applications that care to notice. That is unacceptable by Java's standard, so instead of trying to fix it nicely, they just bypassed it and went for a much slower, but always accurate, solution. As a result, 1.4 (and 1.5) use this dead-slow approach.

If you read the Java source for 1.6 (and 1.7), you still see that Math.sin calls StrictMath.sin. But .. this is just a fallback. What came along is Hotspot, which replaces Math.sin with a native implementation (an intrinsic). For x86 this uses fsin when the argument is in the -1/4 PI .. 1/4 PI range, and falls back to the slow, accurate path (the StrictMath.sin algorithm) when it is not. Sadly, Hotspot contains a lot of CPU-specific assembly, which makes it unusable on platforms it has not been ported to. Along comes IcedTea, which offers an alternative without any assembly, and which restores this correctly. I haven't checked the IcedTea source (I did check Hotspot's), but it should implement Math.sin in a similar way.

What surprises me, and underlines my personal dislike for Java, is that these kinds of changes mostly went undocumented, while they are of huge importance if you care about performance in any way. But all's well that ends well: I am happy recent JREs/JVMs solve this correctly.

So, for a benchmark, several things should be tested / collected:
- JRE version
- JVM version (if possible?)
- Platform (Windows, Linux, ...)
- CPU instruction set (x86, ...)
- Speed of Math.sin
- Speed of StrictMath.sin
- Speed of MathHelper.sin (for reference, I guess)

The difference between Math.sin and StrictMath.sin should indicate whether that JRE/JVM/platform/CPU is accelerated by the x87 fsin instruction.

PS: sorry for derailing this topic from a normal survey into this.
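Most items on that list can be collected from inside the JVM itself. A minimal survey sketch follows, assuming the standard system properties are a good enough stand-in for "JRE/JVM version, platform, CPU". SinSurvey is a hypothetical name, and MathHelper.sin is left out to keep the block self-contained. It also checks that Math.sin and StrictMath.sin agree to within a few ulps, which should hold because Math.sin must be within 1 ulp of the exact result and StrictMath.sin follows fdlibm.

```java
import java.util.Locale;
import java.util.function.DoubleUnaryOperator;

class SinSurvey {
    static volatile double sink; // keeps the timed loops from being optimized away

    // Average nanoseconds per call of f over angles in [0, 2*PI).
    static double nanosPerCall(DoubleUnaryOperator f, int calls) {
        double acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += f.applyAsDouble((i % 6283) * 0.001);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }

    // True if Math.sin and StrictMath.sin differ by at most `ulps` units in the last place.
    static boolean agreeWithinUlps(double x, int ulps) {
        double a = Math.sin(x);
        double b = StrictMath.sin(x);
        return Math.abs(a - b) <= ulps * Math.ulp(Math.max(Math.abs(a), Math.abs(b)));
    }

    public static void main(String[] args) {
        String[] keys = { "java.version", "java.vendor", "java.vm.name",
                          "java.vm.version", "os.name", "os.arch" };
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        int calls = 5_000_000;
        for (int w = 0; w < 5; w++) { // warm-up so both paths are likely JIT-compiled
            nanosPerCall(Math::sin, calls);
            nanosPerCall(StrictMath::sin, calls);
        }
        double fast = nanosPerCall(Math::sin, calls);
        double strict = nanosPerCall(StrictMath::sin, calls);
        System.out.printf(Locale.ROOT, "Math.sin: %.2f ns/call, StrictMath.sin: %.2f ns/call, ratio %.2f%n",
                fast, strict, strict / fast);
    }
}
```

A ratio well above 1 hints that Math.sin takes an accelerated path on that JRE/JVM/platform/CPU; a ratio near 1 suggests it falls back to the StrictMath algorithm. This is a heuristic, not proof.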
-
Owh, you shouldn't post these questions. They make me curious, and make me look into the shit on a very deep level .. not trusting how it shows up in FPS and stuff, no: benchmarking these functions and seeing what they do in Java bytecode. The results? Both surprising (to me) and funny.

DISCLAIMER: I am a C/C++ programmer who recently took an interest in Minecraft modding; I have known Java for years, but never did much in it. I might have made stupid mistakes in my benchmarks or related code. That said, I doubt I did.

First off, I am not the kind of person who believes in claims like '33% more FPS'. Those numbers are used by PR people, and are more often than not complete bullshit. Whose FPS? Under what conditions? And how much of it will I notice? So .. I hope to give you some results of a more precise order, and I hope they help you a bit.

Let's start off by cutting the bullshit away. About the L2 cache part I can be very short: nobody can ever prove that claim, and it is just that: a random, unfounded claim. On modern desktops there are various cores, which might or might not have shared caches. You run various applications, which might or might not be doing data-intensive operations. You, as the programmer of an application, have no control whatsoever over what goes into the cache and what doesn't. This is a good thing: there are mechanisms at work that are much smarter about caching than a programmer can be. So if a change like making an array shorter improves performance consistently, over multiple machines, it is very unlikely you can name the L2 cache as the reason. And even if it were, it would only work on a very small subset of all machines running Minecraft.

But of course he does notice a performance gain. So what is really going on? The first thing I tried was to benchmark MathHelper.sin against Math.sin.
(I added http://pastebin.com/MeskiB1C to MathHelper.java; mind you, I changed 'float' to 'double' in MathHelper.sin. This was purely for benchmarking purposes, to compare Math.sin and MathHelper.sin fairly.)

The performance difference? 10 to 1. Math.sin is 10 times faster than MathHelper.sin. That is a huge difference. So why did Notch make such a caching system if it doesn't help performance? I am sure he would have benchmarked it. That immediately hints to me that there is more at play than the results of my machine and my JRE. So let's dig on.

These days, all x86 FPUs have hardware support for sin/cos (fsin/fcos, added in the 80387, 1987). One thing you can be sure of: it is a damn fast operation. From my understanding only, Sun's JRE 1.5 didn't really use this instruction. Information is a bit fuzzy here, but from what I understand it is not accurate enough for Java's standard outside the range -1/4 PI .. 1/4 PI. In Sun's JRE 1.6 this got solved. As you might understand now, on certain systems with JRE 1.5 you get piss-poor performance from Math.sin. Caching these values is very wise there and gives you a very good performance boost. In JRE 1.6? Not so much. In fact, caching the values makes the performance suck balls.

This made me wonder, as I come from a C background: how can a table cache be slower than an FPU call? FPU calls are in general considered expensive, so .. what is going on? The answer is right in your face: the table cache, and more specifically, array bounds checking. In Java it is very hard to request an element from an array without the system checking whether the index is out of bounds. Even if you know 100% for sure it never will be, it checks 'just in case'. This is one of the principles of Java: never allow buffer overflows. That is great, but not for performance. As a result, the FPU call is in many cases much quicker than a table lookup. Which to me is silly, as in C it would have been the other way around. So .. can you disable array bounds checking in Java?
And would that make MathHelper.sin faster? It seems it is not possible to disable the checking in any clean fashion. You can, however, access memory directly in Java via sun.misc.Unsafe. It is very hard to get this to work, and very ugly, but possible nevertheless. I did some benchmarks on this, cheating along the way to get it to work, as I wanted to know what the difference would be between Math.sin and a cache like MathHelper.sin if the cache didn't do array bounds checking. The results:
- Current MathHelper.sin: 500 'arbitrary units'.
- Current Math.sin: 44 'arbitrary units'.
- MathHelper.sin without bounds checking: 40 'arbitrary units'.
It is a tiny bit faster (10%, call it tiny), but it is not worth the hassle and ugly code that comes with it. Using Math.sin should be more than sufficient, and as a bonus it increases the accuracy too.

Which brings me to the conclusion. Why did Notch implement a cache? I can think of 3 reasons: he was using Sun's JRE 1.5 and noticed a huge performance loss there, or he was reading articles from 1985 (when it was common to cache your sin/cos for a performance gain), or he was thinking in a C-like fashion. However it might be, all benchmarks (and all threads about it on the web) clearly indicate that these days it is better to use Math.sin than any custom-made version. This is often called premature optimization. Trust your compiler to do the right thing; in many cases it will.

To close up: all the benchmarks I ran on the MathHelper.sin function clearly show that using Math.sin with Sun's (Oracle's?) JRE 1.6 increases the speed of that function dramatically. So the author of that quote seems to be spot-on, although an increase of 33% in FPS sounds far-fetched. But I cannot really disprove it. However, such a change does require some extensive testing on other machines / OSes before making it in, for example, MCForge; Notch is not an idiot, so he most likely did it for a good reason. Maybe ask him?
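For the curious, here is roughly what a bounds-check-free table looks like. This is a sketch under assumptions, not the code behind the numbers above: UncheckedSinTable is a hypothetical name, it relies on sun.misc.Unsafe (an unsupported internal API whose memory-access methods are deprecated in recent JDKs), and it keeps an ordinary double[] copy for comparison. A wrong index in sinUnchecked silently reads arbitrary memory instead of throwing ArrayIndexOutOfBoundsException, which is exactly the safety net being given up.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UncheckedSinTable {
    static final int SIZE = 65536;
    static final int MASK = SIZE - 1;
    static final double SCALE = SIZE / (2.0 * Math.PI); // table entries per radian

    private final double[] onHeap = new double[SIZE]; // ordinary array: reads are bounds-checked
    private final Unsafe unsafe;
    private final long base;                           // address of the off-heap copy

    UncheckedSinTable() {
        unsafe = loadUnsafe();
        base = unsafe.allocateMemory((long) SIZE * Double.BYTES);
        for (int i = 0; i < SIZE; i++) {
            double v = Math.sin(i * 2.0 * Math.PI / SIZE);
            onHeap[i] = v;
            unsafe.putDouble(base + (long) i * Double.BYTES, v);
        }
    }

    // The usual route: the JVM checks 0 <= index < length (unless the JIT can prove it).
    double sinChecked(double x) {
        return onHeap[(int) (x * SCALE) & MASK];
    }

    // No bounds check at all; correctness rests entirely on the mask.
    double sinUnchecked(double x) {
        return unsafe.getDouble(base + (long) ((int) (x * SCALE) & MASK) * Double.BYTES);
    }

    // Off-heap memory is not garbage-collected, so it must be released explicitly.
    void free() {
        unsafe.freeMemory(base);
    }

    // Unsafe.getUnsafe() has historically rejected application callers,
    // so the singleton is read from its private field instead.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }
}
```

The two lookups return the same values; the only difference is whether the JVM is allowed to verify the index.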
tl;dr: the author of FPS++ seems to be on to something; with Sun's JRE 1.6, Math.sin is a lot quicker. PS: I could find no evidence that changing the size of the cache table helps or changes anything. It did nothing for me at this level of benchmarking.
-
PacketEntitySpawn sending too new information
TrueBrain replied to TrueBrain's topic in Support & Bug Reports
After a lot of digging, it seems one of these 3 solutions would be best:
1) After sending the spawn packet (par1EntityPlayerMP.playerNetServerHandler.sendPacket(this.getSpawnPacket())), send a Teleport packet. This corrects serverPos[XYZ], and it works. The downside is that the entity will be at an older position until the first sync (so within 64 ticks it will be at the correct position). Basically, it can lag up to 'tickUpdate - 1' ticks behind, depending on the motion etc.
2) After sending the spawn packet (par1EntityPlayerMP.playerNetServerHandler.sendPacket(this.getSpawnPacket())), send a new packet which sets serverPos[XYZ] (by sending encodedPos[XYZ]), and only that, once.
3) Change all the spawn packets to also include encodedPos[XYZ], and use that for serverPos[XYZ] on the client.
To sum it up: 1) is a dirty hack, but patch-wise very small, and does the job (I tested it and everything); 2) is the most elegant, but requires yet another packet (not a big deal, I guess); 3) is the cleanest, but much more invasive.
This should really be fixed in Vanilla MC, and I can only hope they did so in 1.3 .. one can dream, right? I think I will make a pull request for 2) and see how that works out. Any input would be much appreciated.
[Edit] As promised, a Pull Request is up for 2). It works rather nicely, in my test case anyway.
-
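Option 2 from the reply above can be illustrated with a tiny self-contained model. Everything in it is hypothetical: EncodedPosSyncDemo, EncodedPosPacket and ClientEntity are stand-ins, not Forge or Minecraft classes, and this is not the code from the pull request. It assumes fixed-point positions with 32 units per block, and shows that resyncing serverPos[XYZ] to the tracker's encodedPos[XYZ] right after the spawn packet makes the next relative move land on the right spot.

```java
// Hypothetical model of option 2; none of these classes exist in Minecraft or Forge.
class EncodedPosSyncDemo {
    // Assumed fixed-point encoding: 1 block = 32 units.
    static int encode(double pos) {
        return (int) Math.floor(pos * 32.0);
    }

    // Hypothetical extra packet carrying the tracker's encodedPos[XYZ].
    static final class EncodedPosPacket {
        final int x, y, z;

        EncodedPosPacket(int x, int y, int z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    static final class ClientEntity {
        int serverPosX, serverPosY, serverPosZ;

        void onSpawn(double x, double y, double z) {  // spawn packet: current pos[XYZ]
            serverPosX = encode(x);
            serverPosY = encode(y);
            serverPosZ = encode(z);
        }

        void onEncodedPos(EncodedPosPacket p) {       // option 2: resync the base, once
            serverPosX = p.x;
            serverPosY = p.y;
            serverPosZ = p.z;
        }

        void onRelativeMove(int dx, int dy, int dz) {  // like RelEntityMove
            serverPosX += dx;
            serverPosY += dy;
            serverPosZ += dz;
        }
    }

    public static void main(String[] args) {
        int[] encodedPos = { encode(10.0), encode(64.0), encode(5.0) }; // tracker's last update
        double[] pos = { 10.25, 64.0, 5.0 };                          // entity moved since then

        ClientEntity late = new ClientEntity();
        late.onSpawn(pos[0], pos[1], pos[2]);
        late.onEncodedPos(new EncodedPosPacket(encodedPos[0], encodedPos[1], encodedPos[2]));

        // Next tracker update: the relative move is measured from encodedPos.
        pos[0] = 10.5;
        late.onRelativeMove(encode(pos[0]) - encodedPos[0], 0, 0);
        System.out.println(late.serverPosX == encode(pos[0])); // true
    }
}
```

Without the onEncodedPos call, the same relative move would leave the client 8 units (a quarter block) ahead of the server.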
First off: hello all! Awesome work on MinecraftForge (first post, needs to be celebrated, right?)

For the last 3 days I have been working on my own mod, mostly out of boredom. For the last hour I have been tracking an issue that appears to be in Vanilla MC. Normally I would make a patch (or at least attempt to) and put it on the GitHub issue tracker, but I have a hard time finding a solution that is easy and clean. Of course I might also just be plain wrong, so I hope to get some input here. It is rather complex, I am afraid. So here it goes:

I created an Entity which moves over a 'track'. Every time it has to turn, it changes motion[XYZ] and sends it to the clients (by setting isAirBorne to true, which is in fact forceUpdate, but who is counting). The Entity has an ISpawnHandler, as I have some custom fields to fill, and it is registered so that EntityTrackerEntry does the updates, both for the position and for motion[XYZ]. This works very well, much more responsively than I expected. In SSP there is absolutely no issue, which comes as no real surprise. In SMP it works flawlessly for all players, as long as they are all nearby. And here comes the issue.

When the Entity is spawned, it uses pos[XYZ] to tell the client where it is. Next, serverPos[XYZ] is synced with this value, and the Entity is where it should be. Next, the server runs its EntityTrackerEntry update. It uses encodedPos[XYZ] (a better name would be lastUpdatedPos[XYZ], to give you an idea what this value does) and the pos[XYZ] of the Entity. It calculates the offset, and sends this to all the clients with a RelEntityMove. The client adds the diff to serverPos[XYZ], and here is the issue: this works fine for clients which were there to see the Entity spawn when the server spawned it. But when you receive the spawn packet at any later stage, pos[XYZ] != encodedPos[XYZ]. The next relative move you get brings you to a different position than you should be at.
So you always end up further in the direction you are moving, with all the issues and trouble that brings. To me it appears that this is also the reason you sometimes see players/mobs inside stuff or moving 'slightly' wrong. After 400 ticks EntityTrackerEntry sends a Teleport packet, correcting any mistake there might be. At that moment you see the Entity jump to its real position, and from then on all clients track it correctly. I hope this story is somewhat clear.

Now of course the question: how to solve it? A few solutions I have been pondering:

One obvious one would be to store encodedPos[XYZ] in Entity, not EntityTrackerEntry, and send this value with the spawn packets. This is not as trivial as it sounds, as every vanilla object has its own spawn code. But it is just a matter of tracing all those locations and making the correct adjustments. The drawback of this approach is that it should be done only on the server.

Another solution, which is basically the same, would be to tell the Entity about its EntityTrackerEntry object, so it can use encodedPos[XYZ] when needed. This too should be done server-only, and appears to be even dirtier.

The other thing I came up with was to put the new player in a special queue, and first send all the EntityTrackerEntry information before any other packet. But as I am relatively new to this code, I wouldn't even know where to start with that. Hence this post.

I am really hoping I missed something obvious, or that there is a simple fix to avoid this issue. As this problem appears to me, it is rather nasty, and a 'by design' issue. So I hope you guys have some good input for me. Sorry about the wall of text.
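The drift described above can be reproduced with a few lines of arithmetic. This is a minimal model assuming fixed-point positions with 32 units per block; Tracker and Client are hypothetical stand-ins for EntityTrackerEntry and the client's serverPos[XYZ], not the real classes, and only one axis is modelled.

```java
// Hypothetical one-axis model of the spawn/relative-move drift; not Minecraft's real classes.
class TrackerDriftDemo {
    // Assumed fixed-point encoding: 1 block = 32 units.
    static int encode(double pos) {
        return (int) Math.floor(pos * 32.0);
    }

    // Server side, playing the role of EntityTrackerEntry.
    static final class Tracker {
        int encodedPos; // the value the last relative move was based on

        Tracker(double spawnPos) {
            encodedPos = encode(spawnPos);
        }

        // Returns the relative move sent to every client and remembers the new base.
        int relativeMove(double currentPos) {
            int now = encode(currentPos);
            int delta = now - encodedPos;
            encodedPos = now;
            return delta;
        }
    }

    // Client side, playing the role of serverPos[XYZ].
    static final class Client {
        int serverPos;

        Client(int spawnValue) {
            serverPos = spawnValue;
        }

        void apply(int delta) {
            serverPos += delta;
        }
    }

    public static void main(String[] args) {
        double pos = 10.0;
        Tracker tracker = new Tracker(pos);
        Client early = new Client(encode(pos)); // saw the entity spawn

        pos += 0.25;                           // entity moves; no tracker update sent yet
        Client late = new Client(encode(pos)); // spawn packet built from the current pos[XYZ]

        pos += 0.25;
        int delta = tracker.relativeMove(pos); // measured from the old encodedPos
        early.apply(delta);
        late.apply(delta);
        System.out.println("server " + encode(pos) + ", early " + early.serverPos + ", late " + late.serverPos);
    }
}
```

Running main prints `server 336, early 336, late 344`: the late client ends up 8 units (a quarter block) further along the direction of motion, and stays off until the periodic Teleport packet corrects it.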