What is the cache's role when writing to memory?
I have a function that does little reading and a lot of writing to RAM. When I run it multiple times on the same core (the main thread), it runs 5x as fast as if I launch the function on a new thread every run (which doesn't guarantee the same core is used between runs), launching and joining between runs.
This suggests the cache is being used heavily in the write process, but I don't understand how. I thought the cache was only useful for reads.
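For reference, a minimal sketch of the kind of comparison described above (the function name, buffer size, and run count are made up for illustration, not taken from my actual code):

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical write-heavy workload: almost pure store traffic.
static void write_heavy(std::vector<std::uint64_t>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = i;
}

int main() {
    std::vector<std::uint64_t> buf(1 << 22);   // ~32 MiB written per run
    const int runs = 20;

    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r)
        write_heavy(buf);                      // repeatedly on the main thread
    auto t1 = std::chrono::steady_clock::now();

    for (int r = 0; r < runs; ++r) {
        std::thread t(write_heavy, std::ref(buf)); // fresh thread each run,
        t.join();                                  // not pinned to any core
    }
    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](auto d) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };
    std::printf("main thread: %lld ms, thread-per-run: %lld ms\n",
                (long long)ms(t1 - t0), (long long)ms(t2 - t1));
}
```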
Modern processors have write buffers. For this reason, writes are, to a first approximation, pure sinks: the processor doesn't have to wait for a store to reach the coherent memory hierarchy before it executes the next instruction.
(Aside: stores are not quite pure sinks. A later read of the written-to memory location should return the written value, so the processor must snoop the write buffer and either stall the read or forward the written value to it.)
Obviously such buffer(s) are of finite size, so when the buffers are full, the next store in the program can't be executed and stalls until a slot in the buffer is made available by an older store becoming architecturally visible.
Ordinarily, the way a write leaves the buffer is when the value is written to the cache (since a lot of writes are read again quickly; think of the program stack as an example). If the write only sets part of a cacheline, the rest of the cacheline must remain unmodified, and consequently it must be loaded from the memory hierarchy.
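To make both points concrete, a small annotated fragment (assuming 64-byte cachelines; the array and function names are just for illustration):

```cpp
#include <cstddef>
#include <cstdint>

std::uint64_t table[1 << 20];   // large enough that most lines start out cold

std::uint64_t touch(std::size_t i) {
    table[i] = 1;     // store to a cold line: the full 64-byte line is
                      // fetched (write-allocate) so the other 56 bytes
                      // remain unmodified
    return table[i];  // immediate reload of the written location: served by
                      // snooping/forwarding from the write buffer rather
                      // than by waiting for the store to reach the cache
}
```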
There are ways to avoid loading the old cacheline, such as non-temporal stores, write-combining memory, or cacheline-zeroing instructions.
Non-temporal stores and write-combining memory combine adjacent writes to fill a whole cacheline, sending the new cacheline to the memory hierarchy to replace the old one.
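As an illustration, a minimal sketch of a non-temporal fill using x86 intrinsics (assumes AVX is available, the destination is 32-byte aligned, and the length is a multiple of 32; compile with e.g. -mavx):

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Fill dst with streaming (non-temporal) stores. Adjacent stores are
// write-combined into whole cachelines, so the old line contents are
// never read into the cache.
void stream_fill(std::uint8_t* dst, std::size_t len, std::uint8_t value) {
    const __m256i v = _mm256_set1_epi8(static_cast<char>(value));
    for (std::size_t i = 0; i < len; i += 32)
        _mm256_stream_si256(reinterpret_cast<__m256i*>(dst + i), v);
    _mm_sfence();   // order the weakly-ordered streaming stores before later stores
}
```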
POWER has an instruction that zeroes a full cacheline (dcbz), which removes the need to load the old value from memory.
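A sketch of how dcbz might be issued from C++ via inline assembly on a PowerPC/POWER target (an assumption on my part, not a portable API; the cache-block size is implementation-defined, e.g. 128 bytes on recent POWER, and p must be block-aligned):

```cpp
// Zero the cache block containing p without reading its old contents.
static inline void zero_cache_block(void* p) {
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
}
```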
x86 AVX-512 has cacheline-sized registers, which suggests that an aligned zmm-register store could avoid loading the old cacheline (though I do not know whether it does).
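For what it's worth, such a store is straightforward to express with intrinsics (assumes AVX-512F and a 64-byte-aligned destination); whether the hardware actually skips the read of the old line is a microarchitectural detail:

```cpp
#include <immintrin.h>

// One aligned 64-byte (cacheline-sized) store from a zmm register.
void store_zero_line(void* dst /* must be 64-byte aligned */) {
    _mm512_store_si512(dst, _mm512_setzero_si512());
}
```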
Note that many of these techniques are not consistent with the usual memory ordering of the respective processor architectures, so using them may require additional fences/barriers in multi-threaded operation.
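For example, on x86 non-temporal stores are weakly ordered, so a producer publishing data written with them would typically need an sfence before the release store the consumer synchronizes on (a sketch with made-up names, not code from the question):

```cpp
#include <immintrin.h>
#include <atomic>
#include <cstdint>

alignas(16) std::uint32_t payload[16];   // filled with non-temporal stores
std::atomic<bool> ready{false};

void publish() {
    const __m128i v = _mm_set1_epi32(42);
    for (int i = 0; i < 16; i += 4)
        _mm_stream_si128(reinterpret_cast<__m128i*>(&payload[i]), v);
    _mm_sfence();                                  // fence the streaming stores
    ready.store(true, std::memory_order_release);  // then publish the flag
}
```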