High-performance FASTQ/FASTA processing toolkit with optimized I/O and parallel processing
Based on agordon/fastx_toolkit with significant performance improvements
- Block-Based I/O - Optimized buffered I/O that cut run time by more than 80% in the Windows 11 benchmarks
- OpenMP Parallelization - Multi-threaded processing (requires tuning for optimal performance)
- FASTQ to FASTA Conversion - Fast and reliable format conversion
- Quality Statistics - Comprehensive sequence quality analysis
- Sample Generator - Generate test FASTQ/FASTA samples for validation
All tools use Block-Based I/O by default. The toolkit provides two performance tiers:
- Block-Based I/O (Default) - Optimized buffering with configurable buffer sizes
- OpenMP - Multi-threaded processing leveraging CPU cores (environment-dependent, requires parameter tuning)
- Windows 10/11 or Linux (Ubuntu 20.04+)
- CMake 3.8+
- C++17 compatible compiler
  - Windows: Visual Studio 2019+ with MSBuild
  - Linux: GCC 7+ or Clang 5+
- OpenMP (Optional) - For parallel processing (usually included with compiler)
See the Building from Source section below.
All tools support the -h or --help flag to display usage information.
fastq-to-fasta: Convert FASTQ files to FASTA format with optional filtering and renaming.
Options:
| Option | Description | Default | Range |
|---|---|---|---|
| -h | Print help message | | |
| -i | Input file path | STDIN | |
| -o | Output file path | STDOUT | |
| -n | Keep sequences with unknown (N) nucleotides | false | |
| -r | Rename sequence IDs to sequential numbers | false | |
| --ibufs | Input buffer size (bytes) | 32768 | ≥ MXSL |
| --obufs | Output buffer size (bytes) | 32768 | > 0 |
| --mxsl | Maximum sequence length | 25000 | > 0 |
Example:
# Basic conversion
fastq-to-fasta -i input.fastq -o output.fasta
# Keep sequences containing N and use a larger input buffer
fastq-to-fasta -i input.fastq -o output.fasta -n --ibufs 65536

fastx-qual-stats: Analyze FASTQ quality scores with optimized buffered I/O.
Options:
| Option | Description | Default | Range |
|---|---|---|---|
| -h | Print help message | | |
| -i | Input file path | STDIN | |
| -o | Output file path | STDOUT | |
| --bq | Base quality offset (Phred encoding) | 33 | 0-255 |
| --mnq | Minimum quality score | -15 | BQ + MNQ ≥ 0 |
| --mxq | Maximum quality score | 93 | BQ + MXQ ≤ 255 |
| --ibufs | Input buffer size (bytes) | 32768 | ≥ MXSL |
| --mxsl | Maximum sequence length | 25000 | > 0 |
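The Range column can be checked before starting a long run. The helper below is a minimal POSIX shell sketch of those checks; `check_qual_params` is a hypothetical name, not part of the toolkit, and it assumes the ranges mean exactly what the table states.

```shell
#!/bin/sh
# Illustrative pre-flight check for the documented parameter ranges.
# Not part of the toolkit. Usage: check_qual_params BQ MNQ MXQ IBUFS MXSL
check_qual_params() {
    bq=$1; mnq=$2; mxq=$3; ibufs=$4; mxsl=$5
    [ "$bq" -ge 0 ] && [ "$bq" -le 255 ] || { echo "bq must be in 0-255" >&2; return 1; }
    [ $((bq + mnq)) -ge 0 ]   || { echo "bq + mnq must be >= 0" >&2; return 1; }
    [ $((bq + mxq)) -le 255 ] || { echo "bq + mxq must be <= 255" >&2; return 1; }
    [ "$mxsl" -gt 0 ]         || { echo "mxsl must be > 0" >&2; return 1; }
    [ "$ibufs" -ge "$mxsl" ]  || { echo "ibufs must be >= mxsl" >&2; return 1; }
    return 0
}

# The documented defaults satisfy every constraint.
check_qual_params 33 -15 93 32768 25000 && echo "defaults OK"
```

With the defaults, BQ + MNQ is 18 and BQ + MXQ is 126, so both quality bounds stay inside a single byte.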
Example:
# Analyze quality with default settings
fastx-qual-stats -i input.fastq -o stats.txt
# Larger input buffer with an explicit quality offset
fastx-qual-stats -i input.fastq -o stats.txt --ibufs 131072 --bq 33

fastx-qual-stats-omp: Multi-threaded quality analysis with parallel processing.
Performance Warning: OpenMP performance is highly dependent on your system environment and requires careful parameter tuning. Without proper configuration, the OpenMP version may run slower than the single-threaded Block-Based I/O version. Test with your specific hardware and dataset before production use.
Options:
| Option | Description | Default | Range |
|---|---|---|---|
| -h | Print help message | | |
| -i | Input file path | STDIN | |
| -o | Output file path | STDOUT | |
| --bq | Base quality offset (Phred encoding) | 33 | 0-255 |
| --mnq | Minimum quality score | -15 | BQ + MNQ ≥ 0 |
| --mxq | Maximum quality score | 93 | BQ + MXQ ≤ 255 |
| --ibufs | Input buffer size (bytes) | 32768 | ≥ MXSL |
| --mxsl | Maximum sequence length | 25000 | > 0 |
| --rps | Record pool size | 500 | > 0 |
| --ths | Number of threads | System default | > 0 |
| --dyn | Enable dynamic thread scheduling | false | |
Example:
# Use all available cores
fastx-qual-stats-omp -i input.fastq -o stats.txt
# Specify thread count and pool size
fastx-qual-stats-omp -i input.fastq -o stats.txt --ths 8 --rps 1000

fastx-samp-gen: Generate synthetic FASTQ/FASTA samples for testing and validation.
Options:
| Option | Description | Default | Range |
|---|---|---|---|
| -h | Print help message | | |
| -s, --sf | Sample format (fasta/fastq) | Required | |
| --nr | Number of records to generate | | > 0 |
| --crlf | Use CRLF line endings | LF | |
| -o | Output file path | STDOUT | |
| --bq | Base quality offset | 33 | |
| --mnq | Minimum quality score | -15 | BQ + \|MNQ\| ≥ 0 |
| --mxq | Maximum quality score | 93 | BQ + MXQ ≥ MNQ |
| --mns | Minimum sequence length | 1 | > 0 |
| --mxs | Maximum sequence length | 50 | ≥ MNS |
| --obufs | Output buffer size (bytes) | 32768 | > 0 |
Example:
# Generate 1M FASTQ records
fastx-samp-gen --sf fastq --nr 1000000 -o sample.fastq
# Generate FASTA with custom sequence lengths
fastx-samp-gen --sf fasta --nr 500000 --mns 50 --mxs 200 -o sample.fasta

The Block-Based I/O version provides consistent performance improvements across all environments:
# Reliable performance with optimized buffering
fastx-qual-stats -i input.fastq -o stats.txt

I/O performance is fundamentally dependent on disk block size. Experiment with buffer sizes to find optimal values for your hardware:
# Try different buffer sizes (32KB, 64KB, 128KB)
fastx-qual-stats -i input.fastq --ibufs 32768 # Default
fastx-qual-stats -i input.fastq --ibufs 65536 # 2x larger
fastx-qual-stats -i input.fastq --ibufs 131072 # 4x larger
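The buffer-size experiment above can be scripted so that every candidate is timed the same way. This is a rough sketch, not a toolkit feature: `bench_ibufs` and the `TOOL` variable are hypothetical names, results are discarded by writing to `/dev/null` (an assumption that suits Linux shells), and `date +%s` only resolves whole seconds, so use an input large enough to run for several seconds.

```shell
#!/bin/sh
# Time one run per candidate input-buffer size.
# TOOL can be overridden; fastx-qual-stats and --ibufs come from this README.
TOOL=${TOOL:-fastx-qual-stats}

# Usage: bench_ibufs INPUT SIZE [SIZE...]
bench_ibufs() {
    input=$1; shift
    for size in "$@"; do
        start=$(date +%s)
        "$TOOL" -i "$input" -o /dev/null --ibufs "$size" || return 1
        end=$(date +%s)
        echo "ibufs=$size seconds=$((end - start))"
    done
}
```

For example, `bench_ibufs input.fastq 32768 65536 131072` prints one line per size. Run it more than once, since the first pass may be dominated by a cold file cache.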
Critical: OpenMP performance is environment-specific and requires extensive testing. Without proper tuning, it may perform worse than the single-threaded version.
Before using OpenMP in production:
- Benchmark both versions with your actual data and hardware
- Tune thread count - More threads ≠ better performance
- Adjust record pool size - Balance memory usage and parallelism efficiency
- Test different combinations - Each system has different optimal settings
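The checklist above can be run as a small sweep that times the Block-Based I/O baseline first and then each thread/pool combination. The sketch below is only one way to do it: `sweep_omp`, `BASE_TOOL` and `OMP_TOOL` are hypothetical names, output goes to `/dev/null` (assumed acceptable on Linux), and timing uses whole seconds from `date +%s`.

```shell
#!/bin/sh
# Compare the Block-Based I/O baseline against a grid of OpenMP settings.
# Tool names and the --ths / --rps flags come from this README.
BASE_TOOL=${BASE_TOOL:-fastx-qual-stats}
OMP_TOOL=${OMP_TOOL:-fastx-qual-stats-omp}

# Usage: sweep_omp INPUT "THREAD_COUNTS" "POOL_SIZES"
sweep_omp() {
    input=$1; threads_list=$2; rps_list=$3

    # Baseline: single-threaded Block-Based I/O version.
    start=$(date +%s)
    "$BASE_TOOL" -i "$input" -o /dev/null || return 1
    echo "$(( $(date +%s) - start )) block-based"

    # One timed run per (threads, pool size) combination.
    for ths in $threads_list; do
        for rps in $rps_list; do
            start=$(date +%s)
            "$OMP_TOOL" -i "$input" -o /dev/null --ths "$ths" --rps "$rps" || return 1
            echo "$(( $(date +%s) - start )) omp ths=$ths rps=$rps"
        done
    done
}
```

Each output line starts with the elapsed seconds, so `sweep_omp input.fastq "2 4 8" "100 500 1000" | sort -n` lists the fastest configuration first and shows at a glance whether any OpenMP setting beats the baseline.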
# Example: Test OpenMP with conservative settings
fastx-qual-stats-omp -i input.fastq -o stats.txt --ths 4 --rps 500
# Compare performance with Block-Based I/O
fastx-qual-stats -i input.fastq -o stats.txt --ibufs 65536

OpenMP Tuning Parameters:
# Smaller files: increase pool size, reduce threads
fastx-qual-stats-omp -i small.fastq --ths 2 --rps 1000
# Larger files: balance threads and pool size
fastx-qual-stats-omp -i large.fastq --ths 8 --rps 300
# Low memory: reduce pool size significantly
fastx-qual-stats-omp -i huge.fastq --ths 4 --rps 100

Building from Source:
Windows:
- Visual Studio 2019 or later
- CMake 3.8+
- Windows SDK
Linux:
- GCC 7+ or Clang 5+
- CMake 3.8+
- OpenMP (usually included)
Build on Windows:
# Clone the repository
git clone https://github.com/yourusername/fastx-toolkit.git
cd fastx-toolkit
# Configure (uses CMakePresets.json)
cmake --preset=x64-release
# Build
cmake --build out/build/x64-release --config Release
# Executables will be in: out/build/x64-release/fastx-toolkit/

Build on Linux:
# Clone the repository
git clone https://github.com/yourusername/fastx-toolkit.git
cd fastx-toolkit
# Configure
cmake -B build -DCMAKE_BUILD_TYPE=Release
# Build
cmake --build build --config Release -j$(nproc)
# Install (optional)
sudo cmake --install build

Available CMake presets (Windows):
- x64-debug - 64-bit Debug build
- x64-release - 64-bit Release build (recommended)
- x86-debug - 32-bit Debug build
- x86-release - 32-bit Release build
Test Environment:
- Device: ROG Zephyrus G16 (GA403UI-QS091)
- OS: Windows 11
- Method: Minimum of multiple runs, rounded
| Records | Old (ms) | Block-Based I/O (ms) | OpenMP (ms) | Block-Based time reduction | OpenMP time reduction | OpenMP vs Block-Based |
|---|---|---|---|---|---|---|
| 1M | 5,259 | 865 | 533 | 83.6% | 89.9% | +38.4% |
| 2.5M | 12,974 | 2,123 | 1,247 | 83.7% | 90.4% | +41.0% |
| 5M | 27,106 | 4,528 | 2,370 | 83.3% | 91.2% | +47.7% |
| 10M | 52,969 | 8,256 | 4,601 | 84.4% | 91.3% | +44.2% |
| 30M | 157,420 | 24,489 | 13,411 | 84.4% | 91.5% | +45.3% |
| 50M | 262,089 | 40,974 | 21,974 | 84.4% | 91.6% | +46.4% |
| 100M | 528,681 | 85,367 | 52,954 | 83.8% | 89.9% | +38.0% |
- Block-Based I/O: Consistent ~84% reduction in run time (about 6.2x faster) across all dataset sizes
- OpenMP: Up to 91.6% reduction in run time (about 12x faster), peaking at 50M records and dropping to 89.9% at 100M
- Scaling: OpenMP takes 38-48% less time than Block-Based I/O in these tests
- Windows Performance: Testing on Windows 11 shows the largest improvements (up to 12x faster)
- Linux Performance: On Linux (Ubuntu 24.04), improvements are smaller but still significant (3-5x faster)
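The percentages and the "x faster" factors above are two views of the same timings: the percentage is the share of run time saved, and the factor is the ratio of old time to new time. The awk helpers below are a small sketch for reproducing both from the table; the function names are illustrative.

```shell
#!/bin/sh
# time_reduction OLD_MS NEW_MS -> percentage of run time saved
time_reduction() {
    awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.1f\n", (t_old - t_new) / t_old * 100 }'
}

# speed_factor OLD_MS NEW_MS -> how many times faster the new run is
speed_factor() {
    awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.1f\n", t_old / t_new }'
}

# 1M-record row of the benchmark table
time_reduction 5259 865   # old vs Block-Based I/O: prints 83.6
speed_factor   5259 865   # old vs Block-Based I/O: prints 6.1
time_reduction 865 533    # Block-Based I/O vs OpenMP: prints 38.4
```

Because the table reports rounded timings, recomputing other rows this way can differ from the published percentages by a few tenths of a point.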
| Technology | Purpose | Version |
|---|---|---|
| C++ | Core language | 17 |
| CMake | Build system | 3.8+ |
| OpenMP | Parallel processing | 4.5+ |
| fmt | String formatting | Included |
| args.hxx | Argument parsing | Included |
This project intentionally avoids RAII (Resource Acquisition Is Initialization) patterns in performance-critical sections. Testing revealed that:
- Performance Impact: RAII usage paradoxically caused performance degradation
- Test Environment: Windows 11 with Visual Studio MSBuild
- Cause: Compiler-specific optimization behaviors and destructor overhead in tight loops
- Solution: Manual resource management in hot paths for optimal performance
Note: This is a Windows + Visual Studio MSBuild specific issue. Other compilers and platforms may have different characteristics.
- FASTQ-TO-FASTA - Complete
- FASTX Statistics (Block-Based I/O) - Complete
- FASTX Statistics (OpenMP) - Complete
- FASTX Sample Generator - Complete
Build fails with CMake errors
- Ensure CMake 3.8+ is installed: cmake --version
- On Windows, run from Visual Studio Developer Command Prompt
- Delete the out/ or build/ directory and try again
- Verify C++17 compiler support
OpenMP not found
Windows:
- OpenMP is included with Visual Studio (2019+)
- Ensure you're using the Visual Studio compiler, not MinGW
Linux:
- Install: sudo apt-get install libomp-dev (Ubuntu/Debian)
- Install: sudo yum install libomp-devel (RHEL/CentOS)
Performance slower than expected
- Ensure Release build configuration is used
- Try different buffer sizes with the --ibufs flag
- Start with the Block-Based I/O version first - It provides consistent performance
- If using OpenMP, test whether it is actually faster than Block-Based I/O on your system
- Tune OpenMP parameters (--ths, --rps) - default values may not be optimal
- Check available system resources (CPU, RAM, disk I/O)
- For SSDs, larger buffer sizes (128KB+) may help
- For HDDs, buffer sizes matching disk block size are optimal
Out of memory errors with OpenMP
- Reduce record pool size: --rps 200
- Reduce thread count: --ths 4
- Use the Block-Based I/O version instead for very large files
- Check system RAM availability
Input/output file issues
- Ensure input file exists and is readable
- Check output directory has write permissions
- Verify file format (FASTQ files should have quality scores)
- Use absolute paths if relative paths fail
- Check for special characters in file paths
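For the file-format check in the list above, a quick structural test can rule out a malformed FASTQ file before blaming the tools. The awk sketch below is independent of the toolkit and assumes the common four-line record layout (header, sequence, separator, quality) with no wrapped sequences; `check_fastq` is a hypothetical name.

```shell
#!/bin/sh
# Structural FASTQ check: '@' header, '+' separator, and a quality string
# as long as its sequence. Returns non-zero on the first problem found.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header must start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator must start with +\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit
        }
        END {
            # A line count that is not a multiple of 4 means a cut-off record.
            if (!bad && NR % 4 != 0) { print "truncated record at end of file"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Usage: `check_fastq input.fastq && echo "structure OK"`. It checks layout only and says nothing about whether the quality characters fall inside the range implied by `--bq`, `--mnq` and `--mxq`.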
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE.txt file for details.
- Free to use, modify, and distribute
- Source code must be made available when distributed
- Modifications must also be licensed under AGPL-3.0
- Network use triggers source code disclosure requirements
- agordon/fastx_toolkit - Original FASTX-Toolkit implementation
- fmtlib/fmt - Modern C++ formatting library
- Taywee/args - Argument parsing library
