A couple of weeks ago I set up a modded Minecraft server based on All The Mods 3 (1.12.2) for our group of friends to play on (10 people total, though only 3-5 are online at once). I have allocated it 8G of RAM, with my client getting 5G. It ran fine for the first 3 weeks (the server was running 24/7, and I restarted my computer once a day during that time), with only minor TPS problems when generating new chunks outside the pregenerated area, which is about 500x500 chunks around spawn. In the past few days, however, the server has been lagging rhythmically every 3 seconds, skipping hundreds of ticks each time, even when no one is online. The lag spikes start immediately after restarting the server and do not stop. Task Manager and Resource Monitor both show the server never using the full -Xmx amount of RAM, only reaching the -Xms value and then dropping again.
The list of mods in the mods folder is attached to the post. I'd rather avoid having to individually install each of the 160-ish mods to figure out which one is causing it: not only would that take a long time, I also feel the mods are not the problem, as the server was running fine for a long time. The server has a render distance of 8 chunks, and that has been constant for the past 3 weeks.
The computer that the server is running on has a Ryzen 5 1600, a GTX 1060, and 16GB of RAM. The server is running on an HDD, but the disk activity in Task Manager never gets higher than 50%. I have considered moving the server to my C drive, which is an M.2 SSD, but it is only 120GB and the server is currently 70GB total. I believe I could still do this using symlinks for the larger folders like dynmap and the backups folder, but I'd rather see if I can solve the problem here before I resort to that. The TPS spikes didn't come on gradually, as they would if the folder were getting too big or something; they suddenly appeared a few days ago and I have been unable to fix them.
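For reference, the symlink idea could look roughly like the sketch below: move one large folder (for example dynmap or backups) to the bigger drive and leave a directory symlink in its place, so the server still finds it at the old path. The function name and the paths in the usage note are hypothetical. On Windows, creating symlinks normally needs an elevated prompt or Developer Mode, and the server should be stopped while folders are moved.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(server_dir: Path, folder_name: str, big_drive_dir: Path) -> Path:
    """Move server_dir/folder_name onto big_drive_dir and leave a symlink behind.

    Hypothetical helper for illustration; returns the folder's new real location.
    """
    source = server_dir / folder_name
    target = big_drive_dir / folder_name
    if source.is_symlink():
        # Already relocated on a previous run, nothing to do.
        return target
    big_drive_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move copies and deletes when source and target are on different drives.
    shutil.move(str(source), str(target))
    # The old path becomes a directory symlink pointing at the new location.
    os.symlink(target, source, target_is_directory=True)
    return target
```

Usage would be something like `relocate_with_symlink(Path(r"C:\server"), "dynmap", Path(r"D:\server-bulk"))`, where both paths are made-up examples.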
Console (doesn't show the spikes as every 3 seconds, but Brandon's Core's ticktime command does)
/bcore_ticktime command
Both of those screenshots were taken with no chunks chunk-loaded on the server, with only me online, and while I was in a desert far from any base.
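To put numbers on the console output instead of relying on screenshots, a small script could pull the "Can't keep up!" warnings out of the server log and measure the gaps between them. This is only a sketch: the line format in the regex is an assumption about what a Forge 1.12.2 log looks like, and the function names are made up. As far as I can tell the vanilla server rate-limits this warning (worth verifying), which could be why the console does not show a spike every 3 seconds.

```python
import re
from datetime import datetime

# Assumed line shape (not copied from the actual log):
# [21:14:03] [Server thread/WARN] [minecraft/MinecraftServer]: Can't keep up! ...
#   ... Running 2354ms behind, skipping 47 tick(s)
LAG_LINE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\].*Can't keep up!.*Running (\d+)ms behind, skipping (\d+) tick"
)


def parse_lag_warnings(lines):
    """Return (time, ms_behind, ticks_skipped) for every lag warning line."""
    events = []
    for line in lines:
        match = LAG_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((stamp, int(match.group(2)), int(match.group(3))))
    return events


def gaps_in_seconds(events):
    """Seconds between consecutive warnings (ignores logs that cross midnight)."""
    times = [event[0] for event in events]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Feeding it the lines of a log file (for example `open("logs/latest.log", encoding="utf-8")`, a path that is assumed here) gives a list of gaps that can be compared against what /bcore_ticktime shows.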
The spikes happen with any combination of java arguments I have tried, but the current ones are simply:
java -Xms4G -Xmx8G -jar forge-1.12.2-14.23.5.2838-universal.jar nogui
PAUSE
So far to fix this problem I have tried:
LagGoggles mod to try to determine the source - The mod showed nothing as particularly high, except EnderIO and Refined Storage as event subscribers, despite all player bases currently being unloaded (no bases are chunkloaded and I'm far from any, so I'm not sure why the blocks still show up in LagGoggles). My base has about 500 slime generators from Extra Utilities, and 12 mechanical users, which I believe is what the "TileUse" entities are. They are connected using EnderIO conduits, but again, they were all unchunkloaded when I ran this world scan.
Allocating more or less RAM - Didn't seem to make much of a difference, no matter the combination: high -Xmx with low -Xms, or setting both to the same number of gigabytes as one forum thread suggested.
Various Java arguments - See code block below. These were not all used at once, but they have all been used at some point and didn't have an effect. I had hoped G1GC would help, and it did seem to help at other times when the server would lag, but it has not had an effect on these lag spikes.
-XX:+UseG1GC -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseNUMA -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=15 -XX:MaxGCPauseMillis=30 -XX:GCPauseIntervalMillis=150 -XX:+UseAdaptiveGCBoundary -XX:-UseGCOverheadLimit -XX:+UseBiasedLocking -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15 -Dfml.ignorePatchDiscrepancies=true -XX:+UseFastAccessorMethods -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+AggressiveOpts -XX:ReservedCodeCacheSize=2048m -XX:+UseCodeCacheFlushing -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:ParallelGCThreads=6
JProfiler to see garbage collector activity - The lag spikes looked like garbage collections to me. Several of the Java args above changed the GC behaviour shown in JProfiler, but none had any effect on the lag spikes. This screenshot was taken while using a few Java args that affect GC, but even with default args the GC spikes don't line up with the lag spikes.
Large GC spike here, but it didn't really line up with a TPS lag spike
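The "doesn't line up" impression could be checked more rigorously than by eyeballing the JProfiler graph. The sketch below assumes two lists of timestamps in seconds on the same clock, one for lag spikes and one for GC pauses (how they get extracted from the logs is left open), and reports what fraction of spikes have a GC pause nearby. The function name and the 0.5 second tolerance are arbitrary choices for illustration.

```python
from bisect import bisect_left


def spikes_explained_by_gc(spike_times, gc_times, tolerance=0.5):
    """Fraction of lag spikes that have a GC pause within `tolerance` seconds."""
    if not spike_times:
        return 0.0
    gc_sorted = sorted(gc_times)
    matched = 0
    for spike in spike_times:
        # Only the GC events just before and just after the spike can be nearest.
        index = bisect_left(gc_sorted, spike)
        neighbours = gc_sorted[max(index - 1, 0): index + 1]
        if any(abs(spike - gc) <= tolerance for gc in neighbours):
            matched += 1
    return matched / len(spike_times)
```

A result close to 1.0 would point at garbage collection after all, while a result near 0.0 would support the observation that the spikes have another cause.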
Unloading all player bases and flying far away, then restarting the server - We use FTB Utilities on the server for chunk claiming and loading. I ran the /chunk unload_everything command, which should have unchunkloaded other players' claims too. This did not have any noticeable effect on the lag spikes, but the baseline in /bcore_ticktime when it is not lagging seems to have improved a bit. That isn't really helpful though, because it was already below 50ms per tick when not spiking.
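For anyone unsure why 50ms is the relevant number: the server aims for 20 ticks per second, so each tick has a 1000 / 20 = 50ms budget, and anything under that still gives full TPS. The arithmetic can be sketched as below (function names are made up for illustration):

```python
TARGET_TPS = 20.0
TICK_BUDGET_MS = 1000.0 / TARGET_TPS  # 50 ms per tick at 20 TPS


def tps_from_tick_time(ms_per_tick: float) -> float:
    """Effective TPS for a given average tick time, capped at the 20 TPS target."""
    if ms_per_tick <= 0:
        return TARGET_TPS
    return min(TARGET_TPS, 1000.0 / ms_per_tick)


def ticks_behind(ms_behind: float) -> int:
    """How many whole ticks fit into the time the server has fallen behind."""
    return int(ms_behind // TICK_BUDGET_MS)
```

So a 40ms baseline is still 20 TPS, a 100ms tick means 10 TPS, and skipping 200 ticks corresponds to the server being about 10 seconds behind.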
Giving my poor computer a break for a few hours - I turned off the computer overnight just last night to see if it would have an effect, as it had been running for almost 3 weeks straight without being properly shut down during that time. This didn't have any noticeable effect, however, and the lag is exactly the same as before it was shut down for 8 hours.
I have a few logs that I'll add here. I added a Java arg at one point to log GC activity, and the debug log also talks about leaks in the overworld. I did see a message once in the console mentioning a severe leak, but I haven't seen it since and can't find it in the past logs anymore. The debug log will have to be uploaded to Google Drive or something as it is 18MB; I will wait for permission for that though, as I don't want to get banned. The GC log: https://gist.github.com/ShinyPichu01/a8454c1d36445a1e6b6cb12a4dc9555f
Shortened debug log: https://gist.github.com/ShinyPichu01/2069f92cadc729612b8831d776e2ca9f
There was one time last night when the lag spikes inexplicably stopped. A player joined and they stopped, then started again a short while later, then stopped, and so on. As soon as that player left, the lag spikes came back continuously. That player's base is not chunkloaded, and doesn't contain many machines that could be laggy. I believe the base does have Forge microblocks, however.
Also, to reiterate, the "lag spikes" I talk about are TPS spikes, not FPS spikes. My client runs fine, but the server has been running behind causing everyone to experience server lag regardless of ping.