
@hagenw (Member) commented Oct 16, 2025

Closes #162

Improves the code quality of audiofile.read() by moving the complicated offset and duration parsing into separate helper functions.

Summary by Sourcery

Refactor audiofile.read() by extracting complex offset and duration parsing into dedicated helper functions and simplifying the main read logic

Enhancements:

  • Add _parse_time_value to centralize time value conversion and handling of NaN
  • Introduce _needs_sampling_rate to determine when sampling rate is required
  • Extract negative and infinite offset/duration handling into _normalize_offset_duration
  • Add _create_empty_signal to generate correctly shaped empty audio arrays
  • Simplify read() by replacing inlined parsing logic with helper function calls

@sourcery-ai bot commented Oct 16, 2025

Reviewer's Guide

Extracts time parsing and normalization logic from audiofile.read() into dedicated helper functions (_parse_time_value, _needs_sampling_rate, _normalize_offset_duration, _create_empty_signal) and refactors read() to utilize these helpers, reducing inline complexity and improving maintainability.

Class diagram for new helper functions in audiofile.read() refactor

classDiagram
    class read {
        +read(file, offset, duration, always_2d)
    }
    class _parse_time_value {
        +_parse_time_value(value, sampling_rate)
    }
    class _needs_sampling_rate {
        +_needs_sampling_rate(duration, offset)
    }
    class _normalize_offset_duration {
        +_normalize_offset_duration(offset, duration, signal_duration)
    }
    class _create_empty_signal {
        +_create_empty_signal(file, always_2d)
    }
    read --> _parse_time_value
    read --> _needs_sampling_rate
    read --> _normalize_offset_duration
    read --> _create_empty_signal

Flow diagram for refactored time value parsing in audiofile.read()

flowchart TD
    A["read()"] --> B["_needs_sampling_rate(duration, offset)"]
    B -- True --> C["get_sampling_rate(file)"]
    C --> D["_parse_time_value(duration, sampling_rate)"]
    C --> E["_parse_time_value(offset, sampling_rate)"]
    D --> F["Check if normalization needed"]
    E --> F
    F -- Yes --> G["get_duration(file)"]
    G --> H["_normalize_offset_duration(offset, duration, signal_duration)"]
    H --> I["Convert to samples"]
    I --> J["Return signal"]
    F -- No --> I

Flow diagram for empty signal creation in audiofile.read()

flowchart TD
    A["duration == 0"] --> B["_create_empty_signal(file, always_2d)"]
    B --> C["Return empty signal and sampling_rate"]

File-Level Changes

Change: Introduce helper functions for time value parsing, normalization, and empty-signal creation
Details:
  • Add _parse_time_value for unified time-to-seconds conversion
  • Add _needs_sampling_rate to determine when sampling rate is required
  • Add _normalize_offset_duration to handle negative and infinite offsets/durations
  • Add _create_empty_signal to generate correctly shaped empty arrays
Files: audiofile/core/io.py

Change: Refactor read() to leverage the new helper functions
Details:
  • Replace inline sampling-rate condition with _needs_sampling_rate
  • Use _parse_time_value for offset and duration parsing
  • Use _normalize_offset_duration instead of manual normalization logic
  • Use _create_empty_signal for zero-duration return case
Files: audiofile/core/io.py

Assessment against linked issues

Issue #162
  • Objective: Remove redundant conditionals from the codebase to improve clarity and efficiency.
  • Objective: Refactor the function flagged for low code quality (audiofile.read()) to improve its code quality score by reducing method length, cognitive complexity, and working memory usage.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai bot left a comment

Hey there, I've reviewed your changes. Here's some feedback:

  • The _normalize_offset_duration function has become very large and complex—consider refactoring it into smaller, well-named helper functions or using a table-driven approach to make each case clearer and more maintainable.
  • The logic in _needs_sampling_rate is redundant (the first duration is not None check already covers string values), so simplifying those conditions would make the intent clearer and reduce unneeded branches.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_normalize_offset_duration` function has become very large and complex—consider refactoring it into smaller, well-named helper functions or using a table-driven approach to make each case clearer and more maintainable.
- The logic in `_needs_sampling_rate` is redundant (the first `duration is not None` check already covers string values), so simplifying those conditions would make the intent clearer and reduce unneeded branches.

## Individual Comments

### Comment 1
<location> `audiofile/core/io.py:39-48` </location>
<code_context>
+    return parsed
+
+
+def _needs_sampling_rate(
+    duration: float | int | str | np.timedelta64,
+    offset: float | int | str | np.timedelta64,
+) -> bool:
+    """Check if sampling rate is needed for parsing offset/duration.
+
+    Args:
+        duration: duration value
+        offset: offset value
+
+    Returns:
+        True if sampling rate is needed
+
+    """
+    if duration is not None or isinstance(duration, str):
+        return True
+    if offset is not None and isinstance(offset, str):
</code_context>

<issue_to_address>
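**issue (bug_risk):** Logic in _needs_sampling_rate returns True for every non-None duration.

The condition is True whenever `duration is not None`, and `isinstance(duration, str)` is only evaluated when `duration` is None, where it is always False. The string check is therefore redundant, and returning True for every duration may cause unnecessary sampling rate retrieval. Please revise the logic so it only returns True when the sampling rate is actually needed.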
</issue_to_address>

### Comment 2
<location> `audiofile/core/io.py:62-71` </location>
<code_context>
+def _normalize_offset_duration(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Fallback branch in _normalize_offset_duration may mask logic errors.

Consider raising an exception or logging a warning in this branch to surface potential bugs or unhandled cases, rather than silently normalizing values.

Suggested implementation:

```python
import logging

def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:

```

```python
def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:
    """
    # Example normalization logic (add your actual logic here)
    if offset is not None and offset < 0:
        offset = signal_duration + offset
    if duration is not None and duration < 0:
        duration = signal_duration + duration

    # Fallback branch: if values are still not normalized as expected
    if (offset is not None and (offset < 0 or offset > signal_duration)) or (
        duration is not None and (duration < 0 or duration > signal_duration)
    ):
        logging.warning(
            "Unexpected offset/duration normalization: offset=%s, duration=%s, signal_duration=%s",
            offset, duration, signal_duration
        )
        # Optionally, raise an exception instead of logging
        # raise ValueError(f"Unhandled offset/duration values: offset={offset}, duration={duration}, signal_duration={signal_duration}")

    return offset, duration

```

You may need to adjust the normalization logic to match your actual implementation. 
Decide whether you want to log a warning or raise an exception in the fallback branch. 
If you choose to raise an exception, uncomment the `raise ValueError` line and remove the `logging.warning` line.
</issue_to_address>

### Comment 3
<location> `audiofile/core/io.py:34-36` </location>
<code_context>
def _parse_time_value(
    value: float | int | str | np.timedelta64,
    sampling_rate: int,
) -> float | None:
    """Parse a time value (offset or duration) to seconds.

    Args:
        value: time value to parse
        sampling_rate: sampling rate for conversion

    Returns:
        time value in seconds, or None if NaN

    """
    if value is None:
        return None
    parsed = duration_in_seconds(value, sampling_rate)
    if np.isnan(parsed):
        return None
    return parsed

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
    return None if np.isnan(parsed) else parsed
```
</issue_to_address>

### Comment 4
<location> `audiofile/core/io.py:55-58` </location>
<code_context>
def _needs_sampling_rate(
    duration: float | int | str | np.timedelta64,
    offset: float | int | str | np.timedelta64,
) -> bool:
    """Check if sampling rate is needed for parsing offset/duration.

    Args:
        duration: duration value
        offset: offset value

    Returns:
        True if sampling rate is needed

    """
    if duration is not None or isinstance(duration, str):
        return True
    if offset is not None and isinstance(offset, str):
        return True
    if offset is not None and offset != 0:
        return True
    return False

</code_context>

<issue_to_address>
**suggestion (code-quality):** Hoist a repeated condition into a parent condition ([`hoist-repeated-if-condition`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/hoist-repeated-if-condition/))

```suggestion
    if offset is not None:
        if isinstance(offset, str):
            return True
        if offset != 0:
            return True
```
</issue_to_address>

### Comment 5
<location> `audiofile/core/io.py:62` </location>
<code_context>
def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:
        offset: offset in seconds (can be negative or None)
        duration: duration in seconds (can be negative or None)
        signal_duration: total duration of signal in seconds

    Returns:
        tuple of (normalized_offset, normalized_duration)
        where offset is >= 0 and duration is >= 0 or None

    """
    # Handle: offset=None, duration < 0
    if offset is None and duration is not None and duration < 0:
        return max(0, signal_duration + duration), None

    # Handle: offset=None, duration >= 0
    if offset is None and duration is not None and duration >= 0:
        if np.isinf(duration):
            return 0, None
        return 0, duration

    # Guard: offset is None at this point means both are None
    if offset is None:
        return 0, None

    # Handle: offset >= 0, duration < 0
    if offset >= 0 and duration is not None and duration < 0:
        if np.isinf(offset) and np.isinf(duration):
            return 0, None
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            offset = min(offset, signal_duration)
            duration = np.sign(duration) * signal_duration
        orig_offset = offset
        offset = max(0, offset + duration)
        duration = min(-duration, orig_offset)
        return offset, duration

    # Handle: offset >= 0, duration >= 0
    if offset >= 0 and duration is not None and duration >= 0:
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            return offset, None
        return offset, duration

    # Handle: offset < 0, duration=None
    if offset < 0 and duration is None:
        return max(0, signal_duration + offset), None

    # Handle: offset >= 0, duration=None
    if offset >= 0 and duration is None:
        if np.isinf(offset):
            return 0, 0.0
        return offset, None

    # Handle: offset < 0, duration > 0
    if offset < 0 and duration is not None and duration > 0:
        if np.isinf(offset) and np.isinf(duration):
            return 0, None
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            offset = signal_duration + offset
            offset = max(0, offset)
            return offset, None
        offset = signal_duration + offset
        if offset < 0:
            duration = max(0, duration + offset)
            offset = 0
        else:
            duration = min(duration, signal_duration - offset)
        return offset, duration

    # Handle: offset < 0, duration < 0
    if offset < 0 and duration < 0:
        if np.isinf(offset):
            return 0, 0.0
        if np.isinf(duration):
            duration = -signal_duration
        else:
            orig_offset = offset
            offset = max(0, signal_duration + offset + duration)
            duration = min(-duration, signal_duration + orig_offset)
            duration = max(0, duration)
        return offset, duration

    # Fallback (should not reach here)
    return offset if offset >= 0 else 0, duration

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Hoist a repeated condition into a parent condition ([`hoist-repeated-if-condition`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/hoist-repeated-if-condition/))
- Remove redundant conditional [×4] ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
- Low code quality found in \_normalize\_offset\_duration - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>




The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +39 to +48
def _needs_sampling_rate(
    duration: float | int | str | np.timedelta64,
    offset: float | int | str | np.timedelta64,
) -> bool:
    """Check if sampling rate is needed for parsing offset/duration.

    Args:
        duration: duration value
        offset: offset value

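issue (bug_risk): Logic in _needs_sampling_rate returns True for every non-None duration.

The condition is True whenever duration is not None, and isinstance(duration, str) is only evaluated when duration is None, where it is always False. The string check is therefore redundant, and returning True for every duration may cause unnecessary sampling rate retrieval. Please revise the logic so it only returns True when the sampling rate is actually needed.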

Comment on lines +62 to +71
def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

suggestion (bug_risk): Fallback branch in _normalize_offset_duration may mask logic errors.

Consider raising an exception or logging a warning in this branch to surface potential bugs or unhandled cases, rather than silently normalizing values.

Suggested implementation:

import logging

def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:

def _normalize_offset_duration(
    offset: float | None,
    duration: float | None,
    signal_duration: float,
) -> tuple[float, float | None]:
    """Normalize offset and duration to handle negative values.

    Converts negative offset/duration values (counted from end)
    to positive values (counted from start).

    Args:
    """
    # Example normalization logic (add your actual logic here)
    if offset is not None and offset < 0:
        offset = signal_duration + offset
    if duration is not None and duration < 0:
        duration = signal_duration + duration

    # Fallback branch: if values are still not normalized as expected
    if (offset is not None and (offset < 0 or offset > signal_duration)) or (
        duration is not None and (duration < 0 or duration > signal_duration)
    ):
        logging.warning(
            "Unexpected offset/duration normalization: offset=%s, duration=%s, signal_duration=%s",
            offset, duration, signal_duration
        )
        # Optionally, raise an exception instead of logging
        # raise ValueError(f"Unhandled offset/duration values: offset={offset}, duration={duration}, signal_duration={signal_duration}")

    return offset, duration

You may need to adjust the normalization logic to match your actual implementation.
Decide whether you want to log a warning or raise an exception in the fallback branch.
If you choose to raise an exception, uncomment the raise ValueError line and remove the logging.warning line.

Comment on lines +34 to +36
    if np.isnan(parsed):
        return None
    return parsed

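suggestion (code-quality): We've found these issues:

  • Lift code into else after jump in control flow (reintroduce-else)
  • Replace if statement with if expression (assign-if-exp)

Suggested change
-    if np.isnan(parsed):
-        return None
-    return parsed
+    return None if np.isnan(parsed) else parsed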

Comment on lines +55 to +58
    if offset is not None and isinstance(offset, str):
        return True
    if offset is not None and offset != 0:
        return True

suggestion (code-quality): Hoist a repeated condition into a parent condition (hoist-repeated-if-condition)

Suggested change
-    if offset is not None and isinstance(offset, str):
-        return True
-    if offset is not None and offset != 0:
-        return True
+    if offset is not None:
+        if isinstance(offset, str):
+            return True
+        if offset != 0:
+            return True

    return False


def _normalize_offset_duration(

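issue (code-quality): We've found these issues:

  • Hoist a repeated condition into a parent condition (hoist-repeated-if-condition)
  • Remove redundant conditional [×4] (remove-redundant-if)
  • Low code quality found in _normalize_offset_duration: 24% (low-code-quality)
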
Explanation

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.

@hagenw (Member, Author) commented Oct 16, 2025

The current suggestion does not have full code coverage yet.


Development

Successfully merging this pull request may close these issues.

Code Quality Issues: Redundant Conditionals and Low Function Quality
