Operating System: TUXEDO OS
KDE Plasma Version: 6.5.2
KDE Frameworks Version: 6.19.0
Qt Version: 6.9.2
Kernel Version: 6.14.0-123037-tuxedo (64-bit)
Graphics Platform: Wayland
Processors: 4 × Intel Core™ i5-6500T CPU @ 2.50GHz
Memory: 8 GiB of RAM (7.2 GiB usable)
Graphics Processor: Intel HD Graphics 530
Manufacturer: HP
Product Name: HP EliteDesk 800 G2 DM 35W
speedtest
===========================
Version : Stockfish 17.1
Compiled by : g++ (GNUC) 11.4.0 on Linux
Compilation architecture : x86-64-bmi2
Compilation settings : 64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 11.4.0
Large pages : yes
User invocation : speedtest
Filled invocation : speedtest 4 512 150
Available processors : 0-3
Thread count : 4
Thread binding : none
TT size [MiB] : 512
Hash max, avg [per mille] :
single search : 34, 17
single game : 556, 359
Total nodes searched : 350198261
Total search time [s] : 153.572
Nodes/second : 2280352
bench
===========================
Total time (ms) : 2970
Nodes searched : 2030154
Nodes/second : 683553
===========================
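Worth noting: bench is Stockfish's short signature benchmark, and as far as I know it defaults to 16 MiB hash, 1 thread and a fixed depth of 13, which is why its total NPS is much lower than the 4-thread speedtest figures. A sketch of the same run with those defaults spelled out (binary name as appropriate):

./stockfish bench 16 1 13   # hash in MiB, threads, depth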
stockfish-ubuntu-x86-64-bmi2
===========================
Version : Stockfish 18
Compiled by : g++ (GNUC) 11.4.0 on Linux
Compilation architecture : x86-64-bmi2
Compilation settings : 64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 11.4.0
Large pages : yes
User invocation : speedtest
Filled invocation : speedtest 4 512 150
Available processors : 0-3
Thread count : 4
Thread binding : none
TT size [MiB] : 512
Hash max, avg [per mille] :
single search : 35, 18
single game : 555, 362
Total nodes searched : 356634604
Total search time [s] : 149.929
Nodes/second : 2378689
bench
===========================
Total time (ms) : 2967
Nodes searched : 2050811
Nodes/second : 691206
===========================
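For reference, the "Filled invocation" line shows the three positional arguments speedtest accepts: thread count, transposition-table size in MiB, and run time in seconds. So the defaults above correspond to an explicit call like:

./stockfish-ubuntu-x86-64-bmi2 speedtest 4 512 150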
Performance Improvement
Speedtest (Nodes/s) increased by about 4.3%
Bench (Nodes/s) increased by about 1.1%
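Those figures come straight from the NPS ratios:
speedtest: 2378689 / 2280352 ≈ 1.043 → ~4.3%
bench: 691206 / 683553 ≈ 1.011 → ~1.1%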
The tests above used default settings. Stockfish uses huge pages automatically if they are available; in my case it reported "yes" under "Large pages". I then checked my huge page status and made a temporary change:
zen@tuxedo-os:~$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
zen@tuxedo-os:~$ echo 1024 | sudo tee /proc/sys/vm/nr_hugepages
1024
zen@tuxedo-os:~$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 1024
HugePages_Free: 1024
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2097152 kB
A reboot automatically returns everything to default.
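(The 1024 reserved pages are exactly the 2097152 kB shown under Hugetlb: 1024 × 2048 kB ≈ 2 GiB, comfortably enough to back the 512 MiB TT.) For anyone who does want the reservation to survive reboots, a minimal sketch using a sysctl drop-in (the file name here is my own choice):

echo 'vm.nr_hugepages = 1024' | sudo tee /etc/sysctl.d/99-hugepages.conf
sudo sysctl --system   # reload all sysctl configuration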
I ran another Stockfish test, but in my case it made no difference whether those temporary changes were applied or not.
So I then increased the hash to 768 MiB, which had been my known sweet spot in the past, but I never gained any more nodes per second. So there is no need for me to obsess over huge pages on my system; the extra tweak isn't worth it for me. I'll stick to defaults.
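For reference, that 768 MiB run is just a different second argument to speedtest, e.g.:

./stockfish-ubuntu-x86-64-bmi2 speedtest 4 768 150

or, in a normal UCI session, "setoption name Hash value 768" before searching.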
ChatGPT said ... For chess engines like Stockfish:
L3 cache size is very important;
L3 latency & bandwidth is almost as important;
Cores sharing the L3 is critical;
Memory subsystem (DDR4 vs DDR5) is secondary, but relevant;
L3 is one of the top specs to look at, but how it’s shared matters too.
On my CPU the L3 cache is 6 MiB shared across all 4 cores, i.e. 1.5 MiB of L3 per core.
This is the bottleneck that is basically forcing me to stay with default settings in Stockfish.
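You can check the cache layout yourself with standard tools (exact output varies by system):

lscpu | grep -i cache
getconf LEVEL3_CACHE_SIZE    # L3 size in bytes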
According to AI, this is what I should be looking for in a new CPU (chess-engine–focused):
Priority #1: L3 cache per core
L3 per core ≥ 3–4 MB is excellent for Stockfish
≥ 2 MB is good
≤ 1.5 MB gives diminishing returns compared to what I already have (for example, a typical 8-core desktop CPU with 32 MB of shared L3 works out to 4 MB per core).
I did this test too:
zen@tuxedo-os:~/Downloads/stockfish18/stockfish$ sudo sysctl -w kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1
zen@tuxedo-os:~/Downloads/stockfish18/stockfish$ perf stat -e cache-misses,cache-references ./stockfish-ubuntu-x86-64-bmi2 speedtest
Stockfish 18 by the Stockfish developers (see AUTHORS file)
info string Using 4 threads
Warmup position 3/3
Position 258/258
===========================
Version : Stockfish 18
Compiled by : g++ (GNUC) 11.4.0 on Linux
Compilation architecture : x86-64-bmi2
Compilation settings : 64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 11.4.0
Large pages : yes
User invocation : speedtest
Filled invocation : speedtest 4 512 150
Available processors : 0-3
Thread count : 4
Thread binding : none
TT size [MiB] : 512
Hash max, avg [per mille] :
single search : 34, 18
single game : 548, 355
Total nodes searched : 350079097
Total search time [s] : 149.926
Nodes/second : 2335012
Performance counter stats for './stockfish-ubuntu-x86-64-bmi2 speedtest':
34,941,518,572 cache-misses # 14.45% of all cache refs
241,806,561,586 cache-references
154.699932426 seconds time elapsed
608.827550000 seconds user
1.112747000 seconds sys
zen@tuxedo-os:~/Downloads/stockfish18/stockfish$ perf stat -e cache-misses,cache-references ./stockfish-ubuntu-x86-64-bmi2 bench
===========================
Total time (ms) : 3019
Nodes searched : 2050811
Nodes/second : 679301
Performance counter stats for './stockfish-ubuntu-x86-64-bmi2 bench':
156,618,123 cache-misses # 10.95% of all cache refs
1,430,432,594 cache-references
3.804396320 seconds time elapsed
3.452327000 seconds user
0.329884000 seconds sys
zen@tuxedo-os:~/Downloads/stockfish18/stockfish$ sudo sysctl -w kernel.perf_event_paranoid=4
kernel.perf_event_paranoid = 4
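For context: kernel.perf_event_paranoid controls how much access unprivileged users have to the performance counters; lower values are more permissive (1 was enough here to read per-process hardware events without running perf as root), and I restored the stricter value afterwards. The current setting can be read with:

cat /proc/sys/kernel/perf_event_paranoid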
During these final tests I had other programs running, so the nodes-per-second figures are affected (a pinned rerun is sketched after the list):
2 x Browsers
2 x Terminals
1 x File manager open
1 x Kwrite
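Since the speedtest output reported "Thread binding : none", one way to reduce that kind of interference would be to pin the engine to specific cores with taskset, e.g. leaving core 0 to the desktop and running 3 threads on the rest (a sketch, not something I measured):

taskset -c 1-3 ./stockfish-ubuntu-x86-64-bmi2 speedtest 3 512 150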
Conclusion:
Adding more hash beyond what the L3 cache can serve efficiently doesn't help NPS;
L3 is the bottleneck, not my RAM or thread count;
L3 is what limits my nodes per second.