Improve runtime performance by adding channels #54
- Use a `rayon::scope` to spawn two tasks: one for filtering reads, one for printing.
- Introduce a `WritableRecord` struct to hold the data needed for printing.
- Move the `write_record` function into the `WritableRecord` implementation.
- Use `BufWriter` to reduce syscall overhead when writing to stdout.
- Move `output_reads_` counting to the writer thread to avoid parallel writes.
- Relax atomic ordering from `SeqCst` to `Relaxed` for read counting.
- Store pre-formatted FASTQ strings instead of `fastq::Record` to leverage parallel processing before sending through the channel.
…ng reads
- Add the crossbeam crate to provide unbounded channels.
- Use the `Select` struct to manage multiple receivers.
- Introduced `get_valid_segment` to encapsulate record validation logic.
- Added `sequential_filter` and `parallel_filter` for more efficient execution.
- Refactored `filter` to delegate to the appropriate function.
- Fixed typo: renamed variable `writter` to `writer`.
- Mark tests that call the `filter` function with `#[ignore]` to prevent them from running by default
- Move `record_to_string` outside `WritableRecord`.
- Refactor `WritableRecord::write_on_buffer` to accept `BufWriter<W: Write>`.
…h` before and after trimming
…t that it returns multiple segments
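As a side note on the commit above that relaxes the atomic ordering: since the read counter is only aggregated after every worker thread has joined, `Relaxed` suffices; no per-increment happens-before relationship with other memory operations is needed. A minimal sketch (the `chunks(2)` work split and the quality values are made up for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts reads passing a quality threshold across threads, using a Relaxed
/// atomic counter. Relaxed is sufficient here because the total is read only
/// after `thread::scope` has joined every worker.
fn count_kept(quals: &[f64], min_q: f64) -> usize {
    let kept = AtomicUsize::new(0);
    let kept_ref = &kept; // shared reference moved into each worker closure
    thread::scope(|s| {
        for chunk in quals.chunks(2) {
            s.spawn(move || {
                for &q in chunk {
                    if q >= min_q {
                        // Relaxed: we only need the count itself, not any
                        // ordering with surrounding memory operations.
                        kept_ref.fetch_add(1, Ordering::Relaxed);
                    }
                }
            });
        }
    });
    kept.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count_kept(&[30.0, 10.0, 25.0, 5.0], 20.0));
}
```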
Hi @MillarCD Wow, that's great. Let me look into all of this :-) Wouter

Awesome! Thanks for contributing!

Glad to contribute, and thanks for merging! 😁

Hellooo!! again @wdecoster,
I was working on improving the performance of Chopper.
I noticed that the major bottleneck occurs due to printing records across all threads.
Although the filtering happens in parallel, printing is sequential: each thread must wait for exclusive access to standard output, which also involves expensive system calls.
To solve this, I separated the `filter` function into two tasks, assigning different roles to the threads: processing records and writing records.

- Writer: A dedicated worker responsible for receiving valid records and writing them to `STDOUT`. It uses a standard buffer to reduce system calls and further improve performance. Only one thread acts as a writer.
- Filter: Threads that validate records based on quality and apply trimming. When a record is valid, they send it to the writer. All remaining threads act as filters.

I used one channel per worker, which slightly improved performance. I also leveraged the `crossbeam-channel` crate to manage simultaneous channel reading efficiently. Finally, I created a separate function that runs sequentially when only one thread is set.
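To make the split concrete, here is a minimal, self-contained sketch of the writer/filter design. It substitutes the standard library's `thread::scope` and `std::sync::mpsc` for `rayon::scope` and `crossbeam-channel` (the real implementation uses one channel per filter worker drained via crossbeam's `Select`; a single shared channel approximates that here), and the record tuples and quality threshold are made-up stand-ins for actual FASTQ records:

```rust
use std::io::Write;
use std::sync::mpsc;
use std::thread;

/// Runs the filter/writer pipeline and returns how many records were written.
/// `records` are (id, sequence, mean quality) triples standing in for reads.
fn run_pipeline(records: Vec<(&str, &str, f64)>, min_qual: f64) -> usize {
    let (tx, rx) = mpsc::channel::<String>();

    thread::scope(|s| {
        // Writer task: the only thread touching the output sink. It drains the
        // channel and writes through a single buffer, so records are never
        // interleaved and syscalls are batched. (A Vec<u8> stands in for
        // `BufWriter::new(io::stdout().lock())`.)
        let writer = s.spawn(move || {
            let mut out: Vec<u8> = Vec::new();
            let mut written = 0usize;
            for formatted in rx {
                out.write_all(formatted.as_bytes()).expect("write failed");
                written += 1; // output counting lives on the writer thread
            }
            written
        });

        // Filter task: validates records (one thread here for brevity) and
        // sends *pre-formatted* strings, keeping formatting work off the
        // writer thread.
        s.spawn(move || {
            for (id, seq, qual) in records {
                if qual >= min_qual {
                    tx.send(format!("@{id}\n{seq}\n+\n")).expect("send failed");
                }
            }
            // `tx` is dropped here, closing the channel so the writer exits.
        });

        writer.join().expect("writer panicked")
    })
}

fn main() {
    let reads = vec![("r1", "ACGT", 30.0), ("r2", "AC", 10.0), ("r3", "ACGTACGT", 25.0)];
    println!("records written: {}", run_pipeline(reads, 20.0));
}
```

With more filter threads, each would either clone the `Sender` or, as in the PR, own its own channel whose receiver the writer polls through `crossbeam_channel::Select`.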
Benchmark
To evaluate performance improvements, I measured the runtime across different numbers of threads and percentages of records retained after filtering.
Based on my hardware specifications, I selected 1, 2, 4, and 6 threads.
For the filtering levels, I tested 0%, 50%, 80% and 100% of records kept (I tested different parameter values to achieve these results).
Each scenario was executed 20 times per Chopper version, and the average runtime was calculated.
The benchmark used a FASTQ file of approximately 9 GB.
Hardware Specifications
Results: Average Runtime per Number of Threads (in seconds)
In summary, the proposed version significantly improves performance in most scenarios,
especially as the number of threads and the percentage of records kept increase.
The only exception occurs when using two threads, where the original version performs slightly better due to lower contention when both threads write directly to `STDOUT`. In the proposed implementation, the dedicated writer thread remains idle at times while waiting for new records, which slightly impacts efficiency in this specific case.
Overall, the new design achieves better scalability and more consistent performance across
multiple thread configurations, reducing I/O bottlenecks and improving throughput for larger workloads.
Refactorings
To achieve this performance, I refactored several functions:
- `filter` function: I initialized the main variables (`Aligner` and `Trimmer`); the function then calls a specific helper depending on whether the workflow is sequential or parallel. I also extracted the main logic that validates quality and applies trimming into a new function called `get_valid_segments`.
- `write_record` function: I created a new struct called `WritableRecord` with a method `write_on_buffer` that writes the valid segments into a buffer. I also added a helper function `record_to_string` to generate the string representation of a record.
- Tests module: I moved all unit tests in `main.rs` into a dedicated module, following Rust recommendations, and added tests for `get_valid_segments` to verify the double validation of minimum length.

Limitations
One of the main drawbacks of the new approach is related to testing.
Because the implementation of the `filter` function uses `Stdout::lock` to manage writes, the unit tests for the `filter` function print all records to the standard output when executed. To avoid this issue, I had to mark these tests with `#[ignore]` momentarily, so they are skipped during regular test runs. A possible improvement would be to test the entire program through an integration test or a GitHub workflow, capturing and validating the output without directly relying on `Stdout`.

Final Comments
Thank you for taking the time to review these changes. I hope they provide a meaningful improvement.