Improve code quality of audiofile.read() #171

hagenw · 2025-10-16T11:36:32Z

Closes #162

Improves code quality of audiofile.read() by adding extra function for complicated value parsing.

Summary by Sourcery

Refactor audiofile.read() by extracting complex offset and duration parsing into dedicated helper functions and simplifying the main read logic

Enhancements:

Add _parse_time_value to centralize time value conversion and handling of NaN
Introduce _needs_sampling_rate to determine when sampling rate is required
Extract negative and infinite offset/duration handling into _normalize_offset_duration
Add _create_empty_signal to generate correctly shaped empty audio arrays
Simplify read() by replacing inlined parsing logic with helper function calls

sourcery-ai · 2025-10-16T11:36:38Z

Reviewer's Guide

Extracts time parsing and normalization logic from audiofile.read() into dedicated helper functions (_parse_time_value, _needs_sampling_rate, _normalize_offset_duration, _create_empty_signal) and refactors read() to utilize these helpers, reducing inline complexity and improving maintainability.

Class diagram for new helper functions in audiofile.read() refactor

classDiagram
    class read {
        +read(file, offset, duration, always_2d)
    }
    class _parse_time_value {
        +_parse_time_value(value, sampling_rate)
    }
    class _needs_sampling_rate {
        +_needs_sampling_rate(duration, offset)
    }
    class _normalize_offset_duration {
        +_normalize_offset_duration(offset, duration, signal_duration)
    }
    class _create_empty_signal {
        +_create_empty_signal(file, always_2d)
    }
    read --> _parse_time_value
    read --> _needs_sampling_rate
    read --> _normalize_offset_duration
    read --> _create_empty_signal

Flow diagram for refactored time value parsing in audiofile.read()

flowchart TD
    A["read()"] --> B["_needs_sampling_rate(duration, offset)"]
    B -- True --> C["get_sampling_rate(file)"]
    C --> D["_parse_time_value(duration, sampling_rate)"]
    C --> E["_parse_time_value(offset, sampling_rate)"]
    D --> F["Check if normalization needed"]
    E --> F
    F -- Yes --> G["get_duration(file)"]
    G --> H["_normalize_offset_duration(offset, duration, signal_duration)"]
    H --> I["Convert to samples"]
    I --> J["Return signal"]
    F -- No --> I
    I --> J

Flow diagram for empty signal creation in audiofile.read()

flowchart TD
    A["duration == 0"] --> B["_create_empty_signal(file, always_2d)"]
    B --> C["Return empty signal and sampling_rate"]

File-Level Changes

Change	Details	Files
Introduce helper functions for time value parsing, normalization, and empty-signal creation	Add _parse_time_value for unified time-to-seconds conversion Add _needs_sampling_rate to determine when sampling rate is required Add _normalize_offset_duration to handle negative and infinite offsets/durations Add _create_empty_signal to generate correctly shaped empty arrays	`audiofile/core/io.py`
Refactor read() to leverage the new helper functions	Replace inline sampling-rate condition with _needs_sampling_rate Use _parse_time_value for offset and duration parsing Use _normalize_offset_duration instead of manual normalization logic Use _create_empty_signal for zero-duration return case	`audiofile/core/io.py`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#162	Remove redundant conditionals from the codebase to improve clarity and efficiency.	✅
#162	Refactor the function flagged for low code quality (audiofile.read()) to improve its code quality score by reducing method length, cognitive complexity, and working memory usage.	✅

Possibly linked issues

Code Quality Issues: Redundant Conditionals and Low Function Quality #162: The PR refactors the audiofile.read function into smaller functions, addressing the low code quality and nesting issues mentioned.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

The _normalize_offset_duration function has become very large and complex—consider refactoring it into smaller, well-named helper functions or using a table-driven approach to make each case clearer and more maintainable.
The logic in _needs_sampling_rate is redundant (the first duration is not None check already covers string values), so simplifying those conditions would make the intent clearer and reduce unneeded branches.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `_normalize_offset_duration` function has become very large and complex—consider refactoring it into smaller, well-named helper functions or using a table-driven approach to make each case clearer and more maintainable.
- The logic in `_needs_sampling_rate` is redundant (the first `duration is not None` check already covers string values), so simplifying those conditions would make the intent clearer and reduce unneeded branches.

## Individual Comments

### Comment 1
<location> `audiofile/core/io.py:39-48` </location>
<code_context>
+    return parsed
+
+
+def _needs_sampling_rate(
+    duration: float | int | str | np.timedelta64,
+    offset: float | int | str | np.timedelta64,
+) -> bool:
+    """Check if sampling rate is needed for parsing offset/duration.
+
+    Args:
+        duration: duration value
+        offset: offset value
+
+    Returns:
+        True if sampling rate is needed
+
+    """
+    if duration is not None or isinstance(duration, str):
+        return True
+    if offset is not None and isinstance(offset, str):
</code_context>

<issue_to_address>
**issue (bug_risk):** Logic in _needs_sampling_rate may always return True for duration.

The condition will always be True for string values, making the check redundant and potentially causing unnecessary sampling rate retrieval. Please revise the logic to ensure it only returns True when needed.
</issue_to_address>

### Comment 2
<location> `audiofile/core/io.py:62-71` </location>
<code_context>
+def _normalize_offset_duration(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Fallback branch in _normalize_offset_duration may mask logic errors.

Consider raising an exception or logging a warning in this branch to surface potential bugs or unhandled cases, rather than silently normalizing values.

Suggested implementation:

```python
import logging

def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:

```

```python
def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:
    """
    # Example normalization logic (add your actual logic here)
    if offset is not None and offset < 0:
        offset = signal_duration + offset
    if duration is not None and duration < 0:
        duration = signal_duration + duration

    # Fallback branch: if values are still not normalized as expected
    if (offset is not None and (offset < 0 or offset > signal_duration)) or (
        duration is not None and (duration < 0 or duration > signal_duration)
    ):
        logging.warning(
            "Unexpected offset/duration normalization: offset=%s, duration=%s, signal_duration=%s",
            offset, duration, signal_duration
        )
        # Optionally, raise an exception instead of logging
        # raise ValueError(f"Unhandled offset/duration values: offset={offset}, duration={duration}, signal_duration={signal_duration}")

    return offset, duration

```

You may need to adjust the normalization logic to match your actual implementation. 
Decide whether you want to log a warning or raise an exception in the fallback branch. 
If you choose to raise an exception, uncomment the `raise ValueError` line and remove the `logging.warning` line.
</issue_to_address>

### Comment 3
<location> `audiofile/core/io.py:34-36` </location>
<code_context>
def _parse_time_value(
    value: float | int | str | np.timedelta64,
    sampling_rate: int,
) -> float | None:
    """Parse a time value (offset or duration) to seconds.

    Args:
        value: time value to parse
        sampling_rate: sampling rate for conversion

    Returns:
        time value in seconds, or None if NaN

    """
    if value is None:
        return None
    parsed = duration_in_seconds(value, sampling_rate)
    if np.isnan(parsed):
        return None
    return parsed

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
    return None if np.isnan(parsed) else parsed
```
</issue_to_address>

### Comment 4
<location> `audiofile/core/io.py:55-58` </location>
<code_context>
def _needs_sampling_rate(
    duration: float | int | str | np.timedelta64,
    offset: float | int | str | np.timedelta64,
) -> bool:
    """Check if sampling rate is needed for parsing offset/duration.

    Args:
        duration: duration value
        offset: offset value

    Returns:
        True if sampling rate is needed

    """
    if duration is not None or isinstance(duration, str):
        return True
    if offset is not None and isinstance(offset, str):
        return True
    if offset is not None and offset != 0:
        return True
    return False

</code_context>

<issue_to_address>
**suggestion (code-quality):** Hoist a repeated condition into a parent condition ([`hoist-repeated-if-condition`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/hoist-repeated-if-condition/))

```suggestion
    if offset is not None:
        if isinstance(offset, str):
            return True
        if offset != 0:
            return True
```
</issue_to_address>

### Comment 5
<location> `audiofile/core/io.py:62` </location>
<code_context>
def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:
        offset: offset in seconds (can be negative or None)
        duration: duration in seconds (can be negative or None)
        signal_duration: total duration of signal in seconds

    Returns:
        tuple of (normalized_offset, normalized_duration)
        where offset is >= 0 and duration is >= 0 or None

    """
    # Handle: offset=None, duration < 0
    if offset is None and duration is not None and duration < 0:
        return max(0, signal_duration + duration), None

    # Handle: offset=None, duration >= 0
    if offset is None and duration is not None and duration >= 0:
        if np.isinf(duration):
            return 0, None
        return 0, duration

    # Guard: offset is None at this point means both are None
    if offset is None:
        return 0, None

    # Handle: offset >= 0, duration < 0
    if offset >= 0 and duration is not None and duration < 0:
        if np.isinf(offset) and np.isinf(duration):
            return 0, None
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            offset = min(offset, signal_duration)
            duration = np.sign(duration) * signal_duration
        orig_offset = offset
        offset = max(0, offset + duration)
        duration = min(-duration, orig_offset)
        return offset, duration

    # Handle: offset >= 0, duration >= 0
    if offset >= 0 and duration is not None and duration >= 0:
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            return offset, None
        return offset, duration

    # Handle: offset < 0, duration=None
    if offset < 0 and duration is None:
        return max(0, signal_duration + offset), None

    # Handle: offset >= 0, duration=None
    if offset >= 0 and duration is None:
        if np.isinf(offset):
            return 0, 0.0
        return offset, None

    # Handle: offset < 0, duration > 0
    if offset < 0 and duration is not None and duration > 0:
        if np.isinf(offset) and np.isinf(duration):
            return 0, None
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            offset = signal_duration + offset
            offset = max(0, offset)
            return offset, None
        offset = signal_duration + offset
        if offset < 0:
            duration = max(0, duration + offset)
            offset = 0
        else:
            duration = min(duration, signal_duration - offset)
        return offset, duration

    # Handle: offset < 0, duration < 0
    if offset < 0 and duration < 0:
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            duration = -signal_duration
        else:
            orig_offset = offset
            offset = max(0, signal_duration + offset + duration)
            duration = min(-duration, signal_duration + orig_offset)
            duration = max(0, duration)
        return offset, duration

    # Fallback (should not reach here)
    return offset if offset >= 0 else 0, duration

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Hoist a repeated condition into a parent condition ([`hoist-repeated-if-condition`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/hoist-repeated-if-condition/))
- Remove redundant conditional [×4] ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
- Low code quality found in \_normalize\_offset\_duration - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>




The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-10-16T11:37:18Z

audiofile/core/io.py

+def _needs_sampling_rate(
+    duration: float | int | str | np.timedelta64,
+    offset: float | int | str | np.timedelta64,
+) -> bool:
+    """Check if sampling rate is needed for parsing offset/duration.
+
+    Args:
+        duration: duration value
+        offset: offset value
+


issue (bug_risk): Logic in _needs_sampling_rate may always return True for duration.

The condition will always be True for string values, making the check redundant and potentially causing unnecessary sampling rate retrieval. Please revise the logic to ensure it only returns True when needed.

sourcery-ai · 2025-10-16T11:37:18Z

audiofile/core/io.py

+def _normalize_offset_duration(
+    offset: float | None,
+    duration: float | None,
+    signal_duration: float,
+) -> tuple[float, float | None]:
+    """Normalize offset and duration to handle negative values.
+
+    Converts negative offset/duration values (counted from end)
+    to positive values (counted from start).
+


suggestion (bug_risk): Fallback branch in _normalize_offset_duration may mask logic errors.

Consider raising an exception or logging a warning in this branch to surface potential bugs or unhandled cases, rather than silently normalizing values.

Suggested implementation:

import logging def _normalize_offset_duration( offset: float | None, duration: float | None, signal_duration: float, ) -> tuple[float, float | None]: """Normalize offset and duration to handle negative values. Converts negative offset/duration values (counted from end) to positive values (counted from start). Args:

def _normalize_offset_duration( offset: float | None, duration: float | None, signal_duration: float, ) -> tuple[float, float | None]: """Normalize offset and duration to handle negative values. Converts negative offset/duration values (counted from end) to positive values (counted from start). Args: """ # Example normalization logic (add your actual logic here) if offset is not None and offset < 0: offset = signal_duration + offset if duration is not None and duration < 0: duration = signal_duration + duration # Fallback branch: if values are still not normalized as expected if (offset is not None and (offset < 0 or offset > signal_duration)) or ( duration is not None and (duration < 0 or duration > signal_duration) ): logging.warning( "Unexpected offset/duration normalization: offset=%s, duration=%s, signal_duration=%s", offset, duration, signal_duration ) # Optionally, raise an exception instead of logging # raise ValueError(f"Unhandled offset/duration values: offset={offset}, duration={duration}, signal_duration={signal_duration}") return offset, duration

You may need to adjust the normalization logic to match your actual implementation.
Decide whether you want to log a warning or raise an exception in the fallback branch.
If you choose to raise an exception, uncomment the raise ValueError line and remove the logging.warning line.

sourcery-ai · 2025-10-16T11:37:18Z

audiofile/core/io.py

+    if np.isnan(parsed):
+        return None
+    return parsed


suggestion (code-quality): We've found these issues:

Lift code into else after jump in control flow (reintroduce-else)

Replace if statement with if expression (assign-if-exp)

Suggested change

if np.isnan(parsed):

return None

return parsed

return None if np.isnan(parsed) else parsed

sourcery-ai · 2025-10-16T11:37:18Z

audiofile/core/io.py

+    if offset is not None and isinstance(offset, str):
+        return True
+    if offset is not None and offset != 0:
+        return True


suggestion (code-quality): Hoist a repeated condition into a parent condition (hoist-repeated-if-condition)

Suggested change

if offset is not None and isinstance(offset, str):

return True

if offset is not None and offset != 0:

return True

if offset is not None:

if isinstance(offset, str):

return True

if offset != 0:

return True

sourcery-ai · 2025-10-16T11:37:18Z

audiofile/core/io.py

+    return False
+
+
+def _normalize_offset_duration(


issue (code-quality): We've found these issues:

Hoist a repeated condition into a parent condition (hoist-repeated-if-condition)

Remove redundant conditional [×4] (remove-redundant-if)

Low code quality found in _normalize_offset_duration - 24% (low-code-quality)

Explanation

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.

Reduce nesting, perhaps by introducing guard clauses to return early.

Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.

hagenw · 2025-10-16T11:39:24Z

The current suggestion does not have full code coverage yet.

Improve code quality of audiofile.read()

1592c82

sourcery-ai bot reviewed Oct 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve code quality of audiofile.read() #171

Improve code quality of audiofile.read() #171

Uh oh!

hagenw commented Oct 16, 2025 •

edited by sourcery-ai bot

Loading

Uh oh!

sourcery-ai bot commented Oct 16, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Uh oh!

sourcery-ai bot Oct 16, 2025

Uh oh!

sourcery-ai bot Oct 16, 2025

Uh oh!

sourcery-ai bot Oct 16, 2025

Uh oh!

sourcery-ai bot Oct 16, 2025

Uh oh!

sourcery-ai bot Oct 16, 2025

Uh oh!

hagenw commented Oct 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Improve code quality of audiofile.read() #171

Are you sure you want to change the base?

Improve code quality of audiofile.read() #171

Uh oh!

Conversation

hagenw commented Oct 16, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Class diagram for new helper functions in audiofile.read() refactor

Flow diagram for refactored time value parsing in audiofile.read()

Flow diagram for empty signal creation in audiofile.read()

File-Level Changes

Assessment against linked issues

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

hagenw commented Oct 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hagenw commented Oct 16, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Oct 16, 2025 •

edited

Loading