| author | Adam Stylinski <kungfujesus06@gmail.com> | 2024-11-30 09:23:28 -0500 |
|---|---|---|
| committer | Hans Kristian Rosbach <hk-github@circlestorm.org> | 2024-12-10 22:17:14 +0100 |
| commit | 43d74a223b30902b44b01bf4c4888d8deb35e253 (patch) | |
| tree | ef1813e6dfbeee03b01156404456cb81c23fd713 /insert_string_tpl.h | |
| parent | a4e7c34a4ac171ba878eec86bdd2a58c1d03f8e5 (diff) | |
Improve pipelining for AVX512 chunking
For reasons that aren't entirely clear, the masked writes here did not
pipeline well. Either setting up the mask stalled things, or masked
moves have trouble overlapping with regular moves. Simply putting the
masked moves behind a rarely taken branch seemed to do the trick in
improving ILP (instruction-level parallelism). While here, put the
masked loads behind the same branch in case there is ever a hazard
from overreading.
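
The pattern described above can be sketched roughly as follows. This is not the code from the commit: `copy_chunked`, `CHUNK_BYTES`, and `UNLIKELY` are hypothetical names chosen for illustration, and the sketch assumes that most calls have lengths that are a multiple of 64 bytes, so the masked tail is the rare case. The AVX512 path is only compiled when the compiler targets AVX512F and AVX512BW; otherwise a plain `memcpy` fallback with the same behavior is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX512F__) && defined(__AVX512BW__)
#include <immintrin.h>
#define HAVE_AVX512_SKETCH 1
#endif

/* Hypothetical branch hint; expands to the plain condition elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

#define CHUNK_BYTES 64 /* one 512-bit register (illustrative name) */

/* Hypothetical helper: copy len bytes from src to dst (non-overlapping).
 * The hot loop uses only unmasked moves; masked moves sit behind a
 * branch that is assumed to be rarely taken. */
static void copy_chunked(uint8_t *dst, const uint8_t *src, size_t len) {
    while (len >= CHUNK_BYTES) {
#ifdef HAVE_AVX512_SKETCH
        /* Common case: regular full-width load and store, no mask setup. */
        __m512i v = _mm512_loadu_si512((const void *)src);
        _mm512_storeu_si512((void *)dst, v);
#else
        memcpy(dst, src, CHUNK_BYTES);
#endif
        dst += CHUNK_BYTES;
        src += CHUNK_BYTES;
        len -= CHUNK_BYTES;
    }

    if (UNLIKELY(len != 0)) {
#ifdef HAVE_AVX512_SKETCH
        /* Rare case: 1..63 bytes remain. The masked load reads only the
         * selected bytes, so it does not overread past the source, and
         * the masked store writes only those bytes. */
        __mmask64 mask = ((__mmask64)1 << len) - 1;
        __m512i v = _mm512_maskz_loadu_epi8(mask, (const void *)src);
        _mm512_mask_storeu_epi8((void *)dst, mask, v);
#else
        memcpy(dst, src, len);
#endif
    }
}
```

Keeping the mask computation and both masked instructions inside the same branch means the common path never has to set up a mask at all, which is the property the commit message credits for the better overlap.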
Diffstat (limited to 'insert_string_tpl.h')
0 files changed, 0 insertions, 0 deletions
