This article explains, first, how to benchmark using the popular criterion crate. It then gives additional information showing how to benchmark across compiler settings. Although each combination of compiler settings requires re-compilation and a separate run, we can still tabulate and analyze results. The article is a companion to the article Nine Rules for SIMD Acceleration of Your Rust Code in Towards Data Science.
We'll apply this technique to the range-set-blaze crate. Our goal is to measure the performance effects of various SIMD (Single Instruction, Multiple Data) settings. We also want to compare performance across different CPUs. This approach is also useful for understanding the benefit of different optimization levels.
In the context of range-set-blaze, we evaluate:
- 3 SIMD extension levels: sse2 (128 bit), avx2 (256 bit), avx512f (512 bit)
- 10 element types: i8, u8, i16, u16, i32, u32, i64, u64, isize, usize
- 5 lane numbers: 4, 8, 16, 32, 64
- 2 CPUs: AMD 7950X with avx512f, Intel i5-8250U with avx2
- 5 algorithms: Regular, Splat0, Splat1, Splat2, Rotate
- 4 input lengths: 1024; 10,240; 102,400; 1,024,000
Of these, we externally adjust the first four variables (SIMD extension level, element type, lane number, CPU). We control the final two variables (algorithm and input length) with loops inside regular Rust benchmark code.
To add benchmarking to your project, add this dev dependency and create a subfolder:
cargo add criterion --dev --features html_reports
mkdir benches
In Cargo.toml add:
[[bench]]
name = "bench"
harness = false
Create a benches/bench.rs. Here is a sample one:
#![feature(portable_simd)]
#![feature(array_chunks)]
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use is_consecutive1::*;

// create a string from the SIMD extension used
const SIMD_SUFFIX: &str = if cfg!(target_feature = "avx512f") {
    "avx512f,512"
} else if cfg!(target_feature = "avx2") {
    "avx2,256"
} else if cfg!(target_feature = "sse2") {
    "sse2,128"
} else {
    "error"
};

type Integer = i32;
const LANES: usize = 64;

// compare against this
#[inline]
pub fn is_consecutive_regular(chunk: &[Integer; LANES]) -> bool {
    for i in 1..LANES {
        if chunk[i - 1].checked_add(1) != Some(chunk[i]) {
            return false;
        }
    }
    true
}

// define a benchmark called "simple"
fn simple(c: &mut Criterion) {
    let mut group = c.benchmark_group("simple");
    group.sample_size(1000);

    // generate about 1 million aligned elements
    let parameter: Integer = 1_024_000;
    let v = (100..parameter + 100).collect::<Vec<_>>();
    let (prefix, simd_chunks, remainder) = v.as_simd::<LANES>(); // find the aligned part
    let v = &v[prefix.len()..v.len() - remainder.len()]; // keep the aligned part

    // The closure bodies were lost in this copy of the article. The "regular"
    // body below is reconstructed from the imports and the function above.
    group.bench_function(format!("regular,{}", SIMD_SUFFIX), |b| {
        b.iter(|| {
            let _: usize = black_box(
                v.array_chunks::<LANES>()
                    .map(|chunk| is_consecutive_regular(chunk) as usize)
                    .sum(),
            );
        });
    });

    group.bench_function(format!("splat1,{}", SIMD_SUFFIX), |b| {
        b.iter(|| {
            // Body lost in this copy: apply the Splat1 check from is_consecutive1
            // to each chunk in simd_chunks and black_box the result.
        });
    });

    group.finish();
}

criterion_group!(benches, simple);
criterion_main!(benches);
If you wish to run this instance, the code is on GitHub.
Run the benchmark with the command cargo bench. A report will appear in target/criterion/simple/report/index.html and includes plots like this one showing Splat1 running many times faster than Regular.
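To make clearer what Criterion automates, here is a rough, std-only sketch that times the Regular check by hand. It is not the article's benchmark: `count_consecutive` and the repeat count are hypothetical, and a single wall-clock average like this has none of Criterion's warm-up, sampling, or outlier analysis.

```rust
use std::convert::TryInto;
use std::hint::black_box;
use std::time::Instant;

const LANES: usize = 64;

// Scalar check from the article: true when each element is one more than the previous.
fn is_consecutive_regular(chunk: &[i32; LANES]) -> bool {
    for i in 1..LANES {
        if chunk[i - 1].checked_add(1) != Some(chunk[i]) {
            return false;
        }
    }
    true
}

// Hypothetical helper: count how many LANES-sized chunks are consecutive.
fn count_consecutive(v: &[i32]) -> usize {
    v.chunks_exact(LANES)
        .filter(|chunk| {
            let chunk: &[i32; LANES] = (*chunk).try_into().expect("chunk has LANES elements");
            is_consecutive_regular(chunk)
        })
        .count()
}

fn main() {
    let v: Vec<i32> = (100..1_024_000 + 100).collect();
    let repeats: u32 = 20; // arbitrary; Criterion chooses iteration counts for you
    let start = Instant::now();
    for _ in 0..repeats {
        // black_box keeps the optimizer from deleting the work being timed
        black_box(count_consecutive(black_box(&v)));
    }
    let mean = start.elapsed() / repeats;
    println!("regular: about {:?} per pass (one crude measurement)", mean);
}
```

Numbers from a loop like this vary from run to run, which is a large part of why the article relies on Criterion's statistics instead.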
We have a problem. We want to benchmark sse2 vs. avx2 vs. avx512f, which (generally) requires multiple compilations and criterion runs.
Here's our approach:
- Use a Bash script to set environment variables and call benchmarking. For example, bench.sh:
#!/bin/bash
SIMD_INTEGER_VALUES=("i64" "i32" "i16" "i8" "isize" "u64" "u32" "u16" "u8" "usize")
SIMD_LANES_VALUES=(64 32 16 8 4)
RUSTFLAGS_VALUES=("-C target-feature=+avx512f" "-C target-feature=+avx2" "")

for simdLanes in "${SIMD_LANES_VALUES[@]}"; do
    for simdInteger in "${SIMD_INTEGER_VALUES[@]}"; do
        for rustFlags in "${RUSTFLAGS_VALUES[@]}"; do
            echo "Running with SIMD_INTEGER=$simdInteger, SIMD_LANES=$simdLanes, RUSTFLAGS=$rustFlags"
            SIMD_LANES=$simdLanes SIMD_INTEGER=$simdInteger RUSTFLAGS="$rustFlags" cargo bench
        done
    done
done
Aside: You can easily use Bash on Windows if you have Git and/or VS Code.
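To see how much work the script schedules, the same three nested loops can be sketched in Rust. This only builds the command lines and does not run them. `command_lines` is a hypothetical helper, not part of the article's code. With 5 lane counts, 10 element types, and 3 flag settings, it yields 5 × 10 × 3 = 150 separate compile-and-run steps per CPU.

```rust
// Settings copied from bench.sh.
const INTEGERS: [&str; 10] = [
    "i64", "i32", "i16", "i8", "isize", "u64", "u32", "u16", "u8", "usize",
];
const LANES: [usize; 5] = [64, 32, 16, 8, 4];
const RUSTFLAGS: [&str; 3] = ["-C target-feature=+avx512f", "-C target-feature=+avx2", ""];

// Hypothetical helper: one shell command line per combination, in the script's loop order.
fn command_lines() -> Vec<String> {
    let mut lines = Vec::new();
    for lanes in LANES.iter() {
        for integer in INTEGERS.iter() {
            for flags in RUSTFLAGS.iter() {
                lines.push(format!(
                    "SIMD_LANES={} SIMD_INTEGER={} RUSTFLAGS=\"{}\" cargo bench",
                    lanes, integer, flags
                ));
            }
        }
    }
    lines
}

fn main() {
    let lines = command_lines();
    println!("{} separate compile-and-run steps", lines.len());
    for line in lines.iter().take(3) {
        println!("{}", line);
    }
}
```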
- Use a build.rs to turn these environment variables into Rust configurations:
use std::env;

fn main() {
    if let Ok(simd_lanes) = env::var("SIMD_LANES") {
        println!("cargo:rustc-cfg=simd_lanes=\"{}\"", simd_lanes);
        println!("cargo:rerun-if-env-changed=SIMD_LANES");
    }
    if let Ok(simd_integer) = env::var("SIMD_INTEGER") {
        println!("cargo:rustc-cfg=simd_integer=\"{}\"", simd_integer);
        println!("cargo:rerun-if-env-changed=SIMD_INTEGER");
    }
}
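One possible refinement, sketched here as an assumption rather than as the article's build script, is to validate the value before emitting the directive. In the benchmark code that consumes these settings, an unrecognized SIMD_LANES value falls through to the default of 64, so a typo would silently benchmark the wrong setting. The helper `cfg_directive` and the allowed list are hypothetical. The sketch also prints `rerun-if-env-changed` unconditionally, so Cargo tracks the variable even when the first build ran without it.

```rust
use std::env;

// Hypothetical helper: build the Cargo directive for one setting,
// or an error message if the value is not in the allowed list.
fn cfg_directive(key: &str, value: &str, allowed: &[&str]) -> Result<String, String> {
    if allowed.contains(&value) {
        Ok(format!("cargo:rustc-cfg={}=\"{}\"", key, value))
    } else {
        Err(format!("unsupported {}={:?}; expected one of {:?}", key, value, allowed))
    }
}

fn main() {
    // Lane counts that the benchmark code knows how to map to a constant.
    const LANES: [&str; 6] = ["2", "4", "8", "16", "32", "64"];

    // Registered even when the variable is unset, so a later change triggers a rerun.
    println!("cargo:rerun-if-env-changed=SIMD_LANES");
    if let Ok(value) = env::var("SIMD_LANES") {
        match cfg_directive("simd_lanes", &value, &LANES) {
            Ok(line) => println!("{}", line),
            // A panic in build.rs fails the build with this message.
            Err(message) => panic!("{}", message),
        }
    }
}
```

The same pattern would apply to SIMD_INTEGER with the ten element type names as the allowed list.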
- In the benchmark code, turn these configurations into Rust constants and types:
const SIMD_SUFFIX: &str = if cfg!(target_feature = "avx512f") {
    "avx512f,512"
} else if cfg!(target_feature = "avx2") {
    "avx2,256"
} else if cfg!(target_feature = "sse2") {
    "sse2,128"
} else {
    "error"
};

#[cfg(simd_integer = "i8")]
type Integer = i8;
#[cfg(simd_integer = "i16")]
type Integer = i16;
#[cfg(simd_integer = "i32")]
type Integer = i32;
#[cfg(simd_integer = "i64")]
type Integer = i64;
#[cfg(simd_integer = "isize")]
type Integer = isize;
#[cfg(simd_integer = "u8")]
type Integer = u8;
#[cfg(simd_integer = "u16")]
type Integer = u16;
#[cfg(simd_integer = "u32")]
type Integer = u32;
#[cfg(simd_integer = "u64")]
type Integer = u64;
#[cfg(simd_integer = "usize")]
type Integer = usize;
// default when SIMD_INTEGER is unset or unrecognized
#[cfg(not(any(
    simd_integer = "i8",
    simd_integer = "i16",
    simd_integer = "i32",
    simd_integer = "i64",
    simd_integer = "isize",
    simd_integer = "u8",
    simd_integer = "u16",
    simd_integer = "u32",
    simd_integer = "u64",
    simd_integer = "usize"
)))]
type Integer = i32;

// default of 64 when SIMD_LANES is unset or unrecognized
const LANES: usize = if cfg!(simd_lanes = "2") {
    2
} else if cfg!(simd_lanes = "4") {
    4
} else if cfg!(simd_lanes = "8") {
    8
} else if cfg!(simd_lanes = "16") {
    16
} else if cfg!(simd_lanes = "32") {
    32
} else {
    64
};
- In benches.rs, create a benchmark id that records the combination of variables you are testing, separated by commas. This can either be a string or a criterion BenchmarkId. I created a BenchmarkId with this call: create_benchmark_id::<Integer>("regular", LANES, *parameter) to this function:
fn create_benchmark_id<T>(name: &str, lanes: usize, parameter: usize) -> BenchmarkId
where
    T: SimdElement,
{
    BenchmarkId::new(
        format!(
            "{},{},{},{},{}",
            name,
            SIMD_SUFFIX,
            type_name::<T>(),
            mem::size_of::<T>() * 8,
            lanes,
        ),
        parameter,
    )
}
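To see what the resulting id looks like without pulling in criterion, here is a std-only sketch. `benchmark_label` is a hypothetical stand-in that returns the comma-separated text as a String instead of a BenchmarkId, and it takes the SIMD suffix as an argument rather than reading the SIMD_SUFFIX constant.

```rust
use std::any::type_name;
use std::mem::size_of;

// Hypothetical stand-in for create_benchmark_id: same field order,
// but the result is a plain String rather than a criterion BenchmarkId.
fn benchmark_label<T>(name: &str, simd_suffix: &str, lanes: usize) -> String {
    format!(
        "{},{},{},{},{}",
        name,
        simd_suffix,
        type_name::<T>(),    // element type, e.g. "i16"
        size_of::<T>() * 8,  // element width in bits
        lanes,
    )
}

fn main() {
    let label = benchmark_label::<i16>("regular", "avx2,256", 16);
    println!("{}", label);
}
```

Because the suffix itself contains a comma, each id contributes six comma-separated fields. That is what lets every variable land in its own column once the results are exported as CSV.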
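To gather the results of every run into a single table, I use cargo-criterion-means, which writes the Criterion means and standard errors as CSV.
Install:
cargo install cargo-criterion-means
Run:
cargo criterion-means > results.csv
Output Example:
Group,Id,Parameter,Mean(ns),StdErr(ns)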
vector,common,avx2,256,i16,16,16,1024,291.47,0.080141
vector,common,avx2,256,i16,16,16,10240,2821.6,3.3949
vector,common,avx2,256,i16,16,16,102400,28224,7.8341
vector,common,avx2,256,i16,16,16,1024000,287220,67.067
# ...
A CSV file is suitable for analysis via spreadsheet pivot tables or data frame tools such as Polars.
For example, here is the top of my 5000-line long Excel data file:
Columns A to J came from the benchmark. Columns K to N are calculated by Excel.
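The ten benchmark columns can also be split in code. The sketch below parses one data row from the example output and derives a throughput figure. The `Row` struct, `parse_row`, and the definition of throughput as input length divided by mean time are assumptions for illustration. The article does not say how its calculated spreadsheet columns are defined.

```rust
// Hypothetical record holding the fields of one CSV row that the analysis needs.
#[derive(Debug, PartialEq)]
struct Row {
    algorithm: String,
    simd: String,
    element: String,
    lanes: usize,
    parameter: f64, // input length
    mean_ns: f64,
}

// Returns None for the header, comments, or any line without exactly ten fields.
fn parse_row(line: &str) -> Option<Row> {
    let f: Vec<&str> = line.split(',').collect();
    if f.len() != 10 {
        return None;
    }
    Some(Row {
        algorithm: f[1].to_string(),
        simd: f[2].to_string(),
        element: f[4].to_string(),
        lanes: f[6].parse().ok()?,
        parameter: f[7].parse().ok()?,
        mean_ns: f[8].parse().ok()?,
    })
}

// Assumed throughput measure: elements processed per nanosecond.
fn elements_per_ns(row: &Row) -> f64 {
    row.parameter / row.mean_ns
}

fn main() {
    let line = "vector,regular,avx2,256,i16,16,16,1024,291.47,0.080141";
    if let Some(row) = parse_row(line) {
        println!("{:?}", row);
        println!("{:.3} elements/ns", elements_per_ns(&row));
    }
}
```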
Here is a pivot table (and chart) based on the data. It shows the effect of varying the number of SIMD lanes on throughput. The chart averages across element type and input length. The chart suggests that for the best algorithms, either 32 or 64 lanes is best.
With this analysis, we can now choose our algorithm and decide how we want to set the LANES parameter.
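For readers who prefer code to a spreadsheet, the pivot step can be approximated with a small group-and-average. This is a sketch only: `mean_throughput` is a hypothetical helper, throughput is again assumed to be input length divided by mean time, and the input is just the four example rows shown earlier. It therefore produces a single group rather than the full chart.

```rust
use std::collections::BTreeMap;

// The four example rows from the article's CSV output.
const ROWS: &str = "\
vector,regular,avx2,256,i16,16,16,1024,291.47,0.080141
vector,regular,avx2,256,i16,16,16,10240,2821.6,3.3949
vector,regular,avx2,256,i16,16,16,102400,28224,7.8341
vector,regular,avx2,256,i16,16,16,1024000,287220,67.067";

// Hypothetical helper: mean throughput (elements per ns) for each
// (algorithm, lanes) pair, averaged over all other columns.
fn mean_throughput(csv: &str) -> BTreeMap<(String, usize), f64> {
    let mut sums: BTreeMap<(String, usize), (f64, usize)> = BTreeMap::new();
    for line in csv.lines() {
        let f: Vec<&str> = line.split(',').collect();
        if f.len() != 10 {
            continue; // skip headers and comments
        }
        let parsed = (f[6].parse::<usize>(), f[7].parse::<f64>(), f[8].parse::<f64>());
        if let (Ok(lanes), Ok(parameter), Ok(mean_ns)) = parsed {
            let entry = sums.entry((f[1].to_string(), lanes)).or_insert((0.0, 0));
            entry.0 += parameter / mean_ns;
            entry.1 += 1;
        }
    }
    sums.into_iter()
        .map(|(key, (sum, n))| (key, sum / n as f64))
        .collect()
}

fn main() {
    for ((algorithm, lanes), throughput) in mean_throughput(ROWS) {
        println!("{} with {} lanes: {:.2} elements/ns", algorithm, lanes, throughput);
    }
}
```

Run over the full results file, a grouping like this would give one row per algorithm and lane count, which is the shape of table the chart summarizes.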
Thank you for joining me for this journey into Criterion benchmarking.
If you've not used Criterion before, I hope this encourages you to try it. If you've used Criterion but couldn't get it to measure everything you cared about, I hope this gives you a path forward. Embracing Criterion in this expanded way can unlock deeper insights into the performance characteristics of your Rust projects.
Please follow Carl on Medium. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.