0x55aa
← Back to Blog

Rust Rayon: I Added One Letter to My Iterator and Got 8x Faster 🦀⚡

7 min read

Rust Rayon: I Added One Letter to My Iterator and Got 8x Faster 🦀⚡

Let me tell you about the best four characters I've ever typed in a programming language.

I was processing a large buffer of raw IQ samples from my RTL-SDR dongle — converting them, applying a filter, computing signal strength across frequency bins. Standard stuff. My Rust code was already fast compared to my old Python pipeline, so I was happy.

Then I read about Rayon. I changed .iter() to .par_iter(). I ran the benchmark again.

Every single CPU core went to 100%. My runtime dropped from 480ms to 61ms. I genuinely sat there for a second wondering if I'd made a mistake.

I had not made a mistake. I had just discovered Rayon. 🎉

The Thing Nobody Tells You About Parallelism 😤

Coming from 7 years of Laravel/Node.js, parallelism was always... a thing I thought about but rarely did. PHP processes are isolated by default so you get concurrency per-request, not within a request. Node.js is single-threaded — you work around it with worker threads, which feel like a punishment. Splitting CPU work across cores required spawning processes, setting up message channels, serializing data, and hoping nothing exploded.

The mental model was: "parallelism is hard, save it for when you really need it."

Rust's approach with Rayon flips this completely: parallelism should be as easy as changing one word.

What Rayon Actually Is 🔍

Rayon is a data-parallelism library for Rust. It gives you parallel versions of the standard iterator operations you already use — map, filter, for_each, fold, sum, the whole gang.

The secret is a work-stealing thread pool. Rayon creates a pool of threads (one per CPU core by default) and automatically splits your data across them. When one thread finishes early, it "steals" work from another thread's queue so nobody sits idle. You don't manage threads. You don't write synchronization code. You don't think about load balancing.

You just call .par_iter() instead of .iter().

use rayon::prelude::*;

let signal_strengths: Vec<f32> = iq_samples
    .par_iter()                    // ← this is the only change
    .map(|sample| sample.magnitude())
    .filter(|&strength| strength > NOISE_FLOOR)
    .collect();

Rayon splits iq_samples across your CPU cores, processes each chunk in parallel, and reassembles the results. The output is in the same order as the input. No race conditions. No data corruption. The borrow checker ensures you can't accidentally share mutable state across threads without proper synchronization.

One word. That's it. 🦀

The SDR Context: Why This Actually Matters 📡

For my RF/SDR hobby projects, I needed to process IQ samples at scale. An RTL-SDR running at 2.4 MSPS generates roughly 9.6 MB of raw data per second (4 bytes per complex sample). When I want to analyze 10 seconds of a recording, I'm working with nearly 100MB of data.

My processing pipeline looks roughly like this: normalize samples → apply a window function → compute FFT magnitude → find signal peaks. Each step is embarrassingly parallel — every sample or sample block is processed independently.

In Python with NumPy, I was leaning on NumPy's vectorized operations (which are themselves parallelized in C under the hood). When I rewrote this in Rust, I initially wrote clean sequential Rust that was already 3x faster than my NumPy pipeline. Adding Rayon made it another 6-8x faster on top of that.

The total improvement over my original Python? Let's just say I can now process signals in real time that I used to have to batch overnight. 📻

When Rayon Works (and When It Doesn't) ⚖️

Rayon shines when your work is CPU-bound and independent per item. Signal processing, image processing, bulk data transformations, number crunching — this is Rayon's home turf.

It won't help with:

  • I/O-bound work — reading files or hitting network endpoints. The bottleneck is the disk/network, not the CPU. Use tokio async here.
  • Work with heavy inter-item dependencies — if item #5 depends on the output of item #4, you can't process them in parallel. Sequential is correct here.
  • Very small collections — the overhead of splitting work and coordinating threads is real. On a 20-element Vec, sequential is faster. Rayon is worth it when you have thousands of items or expensive per-item work.

The good news: if you swap to .par_iter() and it's slower for a small input, that's fine — you can benchmark both and pick the winner. The API is identical either way.

The PHP Developer Brain Explodes 🐘

Coming from PHP, the mental model shift here is significant. In Laravel, if I want to process a collection in parallel, I'd... probably just not? Or reach for a queue with multiple workers? Or a third-party package? There's no built-in way to say "use all my CPU cores on this collection operation right now."

In Rust with Rayon:

// Sequential: uses 1 core
let results: Vec<_> = huge_collection.iter().map(expensive_fn).collect();

// Parallel: uses all cores, same result, same API
let results: Vec<_> = huge_collection.par_iter().map(expensive_fn).collect();

Same code. Same output. One word different. That's the level of ergonomics Rayon achieves. The Laravel developer in me is genuinely a bit jealous of this.

Parallel Sorting: The Thing That Broke My Brain 🔀

Here's one that still surprises me. Sorting a massive Vec in parallel:

use rayon::prelude::*;

let mut signal_peaks: Vec<f32> = find_all_peaks(&fft_data);
signal_peaks.par_sort_unstable_by(|a, b| b.partial_cmp(a).unwrap());

Parallel sort. Two words. Rayon automatically uses a parallel merge sort variant across all cores. On large inputs, this is dramatically faster than sequential sort, and the API is almost identical to the standard sort_unstable_by.

I used to think parallel algorithms were things you read about in a CS textbook and then never actually implemented because the complexity wasn't worth it. Rayon makes them just... normal things you use on a Tuesday afternoon. 🤯

What Excited Me About the Safety Angle 🔒

Here's the thing that matters from a security and correctness standpoint: Rayon can only parallelize operations that are safe to parallelize.

The Rust type system enforces this at compile time. If your iterator items contain a Mutex<T> or Arc<T>, Rayon can work with them. If you try to share mutable references across threads without synchronization, the compiler refuses to compile the code. Not "this might have a race condition at runtime." Just: no, this doesn't compile.

Compare this to threading in most other languages where data races are silent, intermittent, and terrifying to debug. In 7 years of Node.js, I've seen exactly zero data races (because of the event loop) and zero parallelism (same reason). In PHP, processes are isolated so you don't have shared state at all. Neither of these languages taught me to think about data races.

Rust does. And Rayon inherits all of that safety — you get parallel execution with the compiler guaranteeing you haven't created any data races. That combination is genuinely new to me. 🦀🔒

TL;DR 🏁

Rayon is the answer to "how do I actually use all my CPU cores in Rust without losing my mind."

  • Add rayon = "1" to your Cargo.toml
  • use rayon::prelude::*; at the top of your file
  • Change .iter() to .par_iter() (or .into_par_iter() for owned data)
  • That's it

The work-stealing thread pool, the load balancing, the correct output ordering — Rayon handles all of it. The Rust type system ensures you haven't introduced data races. Your CPU cores stop going to waste.

For my SDR signal processing pipeline, this was the difference between "fast enough" and "real-time." For general CPU-bound data processing, it's the highest ROI change I've ever made with a single word.

Coming from 7 years of Laravel and Node.js where parallelism was either impossible or painful — Rayon is one of those Rust features that makes me feel like I've been leaving performance on the table my entire career.

Change one word. Get more speed. Go home early.


Processing signals or large datasets in Rust? Find me on LinkedIn — I'd love to hear what you're working on.

RTL-SDR or other RF projects? Check out GitHub for my hobby project code.

Now go add rayon to your Cargo.toml. Your CPU cores are bored. 🦀⚡📡