Owh, you shouldn't post these questions. They make me curious, and make me look into the shit on a very deep level: not trusting how it appears in FPS and stuff, no, but benchmarking these functions and seeing what they do in Java bytecode. The results? Both surprising (to me) and funny.
DISCLAIMER: I am a C/C++ programmer who recently took an interest in Minecraft modding; I have known Java for years, but never did much in it. I might have made stupid mistakes in my benchmarks or related code. That said, I doubt I did.
First off, I am not the kind of person who believes in claims like '33% more FPS'. Those numbers are used by PR people, and are more often than not complete bullshit. Whose FPS? Under what conditions? And how much of this will I notice? So .. I hope I can give you some results of a more precise order, and I hope it helps you a bit.
Let's start off by cutting the bullshit away. About the L2 cache I can be very short: nobody can ever prove that claim, and it is just that: a random unfounded claim. Modern desktops have several cores which might or might not have shared caches. You run several applications, which might or might not be doing data-intensive operations. You, as programmer of an application, have no control whatsoever over what goes into the cache and what does not. This is a good thing. There are things at work that are much smarter than a programmer can be when it comes to deciding what to cache. So if a solution like making an array shorter improves performance, in a consistent manner, over multiple machines, it is very unlikely you can name the L2 cache as the reason. And if it were the reason, it would only work on a very small subset of all machines running Minecraft. But of course he does notice a performance gain. So what is really going on?
What I first tried was to benchmark MathHelper.sin against Math.sin. (I added http://pastebin.com/MeskiB1C to MathHelper.java; mind you, I changed 'float' to 'double' in MathHelper.sin, purely for benchmarking purposes, to compare Math.sin and MathHelper.sin fairly.) The performance difference? 10 to 1. Math.sin is 10 times faster than MathHelper.sin. That is a huge difference. So why did Notch make such a caching system, if it doesn't help performance? I am sure he would have benchmarked it. That immediately hints to me that there is more at play than the results of my machine and my JRE. So let's dig on.
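For anyone who wants to try this at home, here is a minimal sketch of what such a comparison could look like. It is not the pastebin code and not Minecraft's actual MathHelper: the class name, the table size of 65536 and the timing helper are all my own assumptions for illustration. A hand-rolled timing loop like this is also easily fooled by the JIT, so treat the numbers as a rough indication only; a real harness such as JMH is the safer route.

```java
// Hypothetical sketch: a table-based sine next to Math.sin, with a crude timer.
final class SinTableSketch {
    private static final int SIZE = 65536; // assumed size; must be a power of two
    private static final float[] TABLE = new float[SIZE];

    static {
        // Precompute one full period of the sine.
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = (float) Math.sin(i * 2.0 * Math.PI / SIZE);
        }
    }

    /** Table lookup: scale radians to a table index and wrap it with a mask. */
    static double tableSin(double radians) {
        int index = (int) (radians * (SIZE / (2.0 * Math.PI)));
        return TABLE[index & (SIZE - 1)];
    }

    /** Crude timing loop; returns the average nanoseconds per call. */
    static double nanosPerCall(java.util.function.DoubleUnaryOperator f, int calls) {
        double sink = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += f.applyAsDouble(i * 0.001);
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the loop cannot be discarded as dead code.
        if (Double.isNaN(sink)) {
            System.out.println(sink);
        }
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        int calls = 5_000_000;
        // Warm-up runs so the JIT has compiled both paths before measuring.
        nanosPerCall(SinTableSketch::tableSin, calls);
        nanosPerCall(Math::sin, calls);
        System.out.printf("table:    %.2f ns/call%n", nanosPerCall(SinTableSketch::tableSin, calls));
        System.out.printf("Math.sin: %.2f ns/call%n", nanosPerCall(Math::sin, calls));
    }
}
```

Which of the two wins, and by how much, will depend on your JRE and your machine, which is exactly the point of the rest of this post.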
These days, all x86 FPUs have hardware support for sin/cos (fsin/fcos, added in the 80387, 1987). One thing you can be sure of: it is a damn fast operation. As far as I understand, Sun's JRE 1.5 didn't really use this instruction. Information is a bit fuzzy here, but from what I understand it is not accurate enough for Java's standard outside the range of -PI/4 to PI/4.
In Sun's JRE 1.6 this got solved. As you might understand now, on certain systems with JRE 1.5 you get piss-poor performance from Math.sin. There, caching these values is very wise and gives you a very good performance boost. In JRE 1.6? Not so much. In fact, caching values makes the performance suck balls.
This made me wonder, as I come from a C background: how can a table cache be slower than an FPU call? FPU calls are in general considered expensive, so .. what is going on? The answer is right in your face: the table cache, and more specifically: array bounds checking. In Java it is very hard to request an element from an array without the system checking whether the index is out of bounds. Even if you are 100% sure it never will be, it checks 'just in case'. This is one of the principles of Java: never allow buffer overflows. It is great. But not for performance. As a result, the FPU call is in many cases much quicker than a table lookup. Which to me is silly, as in C it would have been the other way around.
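To make the bounds-checking point concrete, here is a tiny sketch; the class and method names are made up for illustration. A masked index can never leave the table, yet the language still guarantees that any out-of-range access is caught instead of reading neighbouring memory as it would in C. Whether the JIT manages to prove the check redundant for the masked case and drop it depends on the JVM.

```java
// Hypothetical sketch of Java's array bounds checking on a small lookup table.
final class BoundsCheckSketch {
    static final float[] TABLE = new float[16];

    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = i; // store the index so lookups are easy to verify
        }
    }

    /** Masked index: always lands in 0..15, even for negative input. */
    static float masked(int i) {
        return TABLE[i & 15];
    }

    /** Returns true if the runtime rejected the unmasked access. */
    static boolean isRejected(int i) {
        try {
            // The table never holds NaN, so an in-range read yields false.
            return Float.isNaN(TABLE[i]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Out-of-range reads end up here instead of touching other memory.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(masked(17));      // wraps around to index 1
        System.out.println(isRejected(15));  // last valid index
        System.out.println(isRejected(16));  // one past the end
    }
}
```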
So .. can you disable array bounds checking in Java? And would that make MathHelper.sin faster? It seems that it is not possible to disable the checking in any clean fashion. You can, however, access memory directly in Java via 'Unsafe' stuff. It is very hard to get this to work, and very ugly, but possible nevertheless. I did some benchmarks on this, cheating along the way to get it to work, as I wanted to know what the difference would be between Math.sin and a cache like MathHelper.sin if it didn't do the array bounds checking. The results:
Current MathHelper.sin: 500 'arbitrary units'.
Current Math.sin: 44 'arbitrary units'.
MathHelper.sin without bounds checking: 40 'arbitrary units'.
It is a tiny bit faster (about 10%; call it tiny), but it is not worth the hassle and ugly code that comes with it. Using Math.sin should be more than sufficient, and as a bonus it increases the accuracy too.
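About that accuracy bonus: a table that simply truncates to the nearest lower entry can be off by roughly one table step. For a 65536-entry table that step is 2*PI/65536, about 9.6e-5. Below is a small sketch that measures this, assuming a truncating float table as described; the table size and sample count are parameters I picked for illustration, not values from Minecraft.

```java
// Hypothetical sketch: worst-case error of a truncating sine table vs Math.sin.
final class TableAccuracySketch {
    /** tableSize must be a power of two; samples are spread evenly over [0, 2*PI). */
    static double maxError(int tableSize, int samples) {
        float[] table = new float[tableSize];
        for (int i = 0; i < tableSize; i++) {
            table[i] = (float) Math.sin(i * 2.0 * Math.PI / tableSize);
        }
        double worst = 0.0;
        for (int s = 0; s < samples; s++) {
            double x = s * 2.0 * Math.PI / samples;
            // Same truncate-and-mask lookup a table-based sin would use.
            int index = (int) (x * (tableSize / (2.0 * Math.PI))) & (tableSize - 1);
            worst = Math.max(worst, Math.abs(table[index] - Math.sin(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max error, 65536 entries: " + maxError(65536, 1_000_000));
        System.out.println("max error,  4096 entries: " + maxError(4096, 1_000_000));
    }
}
```

Whether an error of that size is visible in a game is another matter; this only puts a number on it.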
Which brings me to the conclusion. Why did Notch implement a cache? I can think of 3 reasons: he was either using Sun's JRE 1.5 and noticed a huge performance loss there, or he was reading articles from 1985 (when it was common to cache your sin/cos for a performance gain), or he was thinking in a C-like fashion. However that may be, all benchmarks (and all threads about it on the web) clearly indicate that these days it is better to use Math.sin over any custom-made version. A hand-rolled cache like this is what is often called premature optimization. Trust your compiler to do the right thing; in many cases it will.
To close up: all the benchmarks I ran on the MathHelper.sin function clearly show that using Math.sin with Sun's (Oracle's?) JRE 1.6 increases the speed of that function dramatically. So the author of that quote seems to be spot-on, although an increase of 33% in FPS sounds far-fetched. But I cannot really disprove it.
However, such a change does require extensive testing on other machines / OSes before it goes into, for example, MCForge; Notch is not an idiot, so he most likely did it for a good reason. Maybe ask him?
tl;dr: the author of the FPS++ seems to be on to something; with Sun's JRE 1.6, Math.sin is a lot quicker.
PS: I could find no evidence that changing the size of the cache table helps / changes anything. It did nothing for me on this level of benchmarking.