Hello Guys:
I was playing around with the Intel Memory Latency Checker and later wanted to write my own version of the memory latency measurement program.
I know that we usually use pointer-chasing for memory latency measurement, but I want to try a simpler method of “flush cacheline –> record time –> mem read addr A –> finish record time”.
I repeat the loop many times. From the results, I found three categories of latency:
- 80-100ns, 98% of the outcomes
- ~150-300ns, 2% of the results
- >> 1us, <0.1% of the results.
80-100ns seems a reasonable result for memory latency. The >>1us ones should mostly be caused by interrupts, page misses, etc.
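For reference, the three categories can be counted from the recorded cycle values with something like the sketch below. `LatencyBuckets` and `classify_samples` are hypothetical names, and the 100 ns and 1 us cut-offs are assumptions chosen to match the ranges listed above. It also assumes `tsc_ghz` is the TSC rate (the CPU flags include `constant_tsc`, so the TSC should tick at a fixed rate rather than at the current core clock).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the original program.
struct LatencyBuckets {
    std::size_t normal = 0;    // up to 100 ns (assumed cut-off)
    std::size_t elevated = 0;  // above 100 ns, up to 1 us (assumed cut-off)
    std::size_t extreme = 0;   // above 1 us
};

// cycles:  per-sample TSC deltas
// tsc_ghz: TSC frequency in GHz, i.e. cycles per nanosecond
LatencyBuckets classify_samples(const std::vector<int32_t>& cycles, double tsc_ghz) {
    LatencyBuckets buckets;
    for (int32_t c : cycles) {
        const double ns = static_cast<double>(c) / tsc_ghz;
        if (ns <= 100.0) {
            ++buckets.normal;
        } else if (ns <= 1000.0) {
            ++buckets.elevated;
        } else {
            ++buckets.extreme;
        }
    }
    return buckets;
}
```

Calling `classify_samples(samples, 2.0)` on the recorded array would give the counts behind the percentages quoted above, provided 2.0 GHz really is the TSC rate on the machine.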
What bothers me is the ones in the 150-300ns range. They seem to happen periodically, weakly aligned to the cacheline size. The latency is too large for the DRAM close/open page policy difference, too small for the DRAM refresh interval, and also too small for any interrupts.
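One way to check whether these really are periodic is to histogram the distance, in loop iterations, between consecutive slow samples. The sketch below is only an illustration: `outlier_gap_histogram` is a hypothetical helper, and the threshold is whatever cycle count separates the 80-100ns group from the slower one.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: maps "gap in iterations between two consecutive
// samples above threshold_cycles" to "how often that gap occurred".
std::map<std::size_t, std::size_t> outlier_gap_histogram(
        const std::vector<int32_t>& cycles, int32_t threshold_cycles) {
    std::map<std::size_t, std::size_t> gaps;
    bool have_prev = false;
    std::size_t prev = 0;
    for (std::size_t i = 0; i < cycles.size(); ++i) {
        if (cycles[i] <= threshold_cycles) {
            continue;  // not an outlier
        }
        if (have_prev) {
            ++gaps[i - prev];  // distance to the previous outlier
        }
        prev = i;
        have_prev = true;
    }
    return gaps;
}
```

If one or two gap values dominate the histogram, the slow samples are tied to something with a fixed period in loop iterations; a flat histogram would point to something asynchronous to the loop instead.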
I suspected that the “latency recording” would generate memory writes that interfere with the DRAM latency. However, even after I remove this portion, the “high_latency_ch0” stat still shows ~2% in the 150-300ns range.
Different machines behave slightly differently.
Here is my core function for measuring:
std::cout << "mem_latency experiment begin" << std::endl;
for (uint64_t i = 0; i < sample_count; i++) cycles_low1 );
int32_t cycle_ch0 = static_cast<int32_t>((end1 - start1) - rdtsc_self_delay);
sample_array_ch0[i] = cycle_ch0; // Comment out to see if latency recording causes DRAM latency interference
high_latency_ch0 += (cycle_ch0 > (100 * cpu_ghz));
// addr = ori_addr + (i*4) % 4096;
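Since the loop body above did not survive in the post, here is a minimal sketch of the same “flush, timestamp, load, timestamp” idea using compiler intrinsics. It is not the original code: `measure_flushed_loads` is a hypothetical name, the fence placement is one common choice rather than the only one, and it assumes GCC or Clang on x86-64 (`<x86intrin.h>`). Unlike the snippet above, it does not subtract the `rdtsc` overhead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <x86intrin.h>  // _mm_clflush, _mm_mfence, _mm_lfence, __rdtsc, __rdtscp

// Hypothetical sketch: time one load per sample after flushing its cache line.
// Returns raw TSC deltas (rdtsc overhead is NOT subtracted here).
std::vector<uint64_t> measure_flushed_loads(const uint64_t* addr, std::size_t sample_count) {
    std::vector<uint64_t> samples(sample_count);
    for (std::size_t i = 0; i < sample_count; ++i) {
        _mm_clflush(addr);  // evict the line holding *addr from the cache hierarchy
        _mm_mfence();       // make sure the flush is ordered before the timed load
        _mm_lfence();

        const uint64_t start = __rdtsc();
        _mm_lfence();       // keep the load from being issued ahead of rdtsc

        // Volatile read so the compiler cannot drop or hoist the load.
        const uint64_t value = *static_cast<const volatile uint64_t*>(addr);
        (void)value;

        unsigned aux = 0;
        const uint64_t end = __rdtscp(&aux);  // waits for the preceding load to complete
        _mm_lfence();

        samples[i] = end - start;  // note: this store is itself memory traffic
    }
    return samples;
}
```

Writing each delta into `samples` has the same potential side effect as `sample_array_ch0` above, so a counter-only variant like `high_latency_ch0` is still useful as a cross-check.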
What’s more interesting is that sometimes you can see some sort of alignment or pattern in the results. [Full csv in attachment]
I’ve tried disabling the Data-Dependent Prefetcher, but it doesn’t seem to be the reason. I also disabled the DCP and L2 Prefetcher in the BIOS, but that doesn’t seem to be related either. [Well, I am not sure if the prefetcher setting in the BIOS is useful….]
Here is my CPU spec:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 14
On-line CPU(s) list: 0-13
Thread(s) per core: 1
Core(s) per socket: 14
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU household: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
Stepping: 1
CPU MHz: 1200.178
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 3999.97
L1d cache: 448 KiB
L1i cache: 448 KiB
L2 cache: 3.5 MiB
L3 cache: 35 MiB
NUMA node0 CPU(s): 0-13
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
I’m running out of ideas already…. I am running it on Debian, with only this user-level program running. Is it possible that background kernel threads cause these….?
Thanks a lot
Jerry