-
Notifications
You must be signed in to change notification settings - Fork 2
Improve code quality of audiofile.read() #171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideExtracts time parsing and normalization logic from audiofile.read() into dedicated helper functions (_parse_time_value, _needs_sampling_rate, _normalize_offset_duration, _create_empty_signal) and refactors read() to utilize these helpers, reducing inline complexity and improving maintainability. Class diagram for new helper functions in audiofile.read() refactorclassDiagram
class read {
+read(file, offset, duration, always_2d)
}
class _parse_time_value {
+_parse_time_value(value, sampling_rate)
}
class _needs_sampling_rate {
+_needs_sampling_rate(duration, offset)
}
class _normalize_offset_duration {
+_normalize_offset_duration(offset, duration, signal_duration)
}
class _create_empty_signal {
+_create_empty_signal(file, always_2d)
}
read --> _parse_time_value
read --> _needs_sampling_rate
read --> _normalize_offset_duration
read --> _create_empty_signal
Flow diagram for refactored time value parsing in audiofile.read()flowchart TD
A["read()"] --> B["_needs_sampling_rate(duration, offset)"]
B -- True --> C["get_sampling_rate(file)"]
C --> D["_parse_time_value(duration, sampling_rate)"]
C --> E["_parse_time_value(offset, sampling_rate)"]
D --> F["Check if normalization needed"]
E --> F
F -- Yes --> G["get_duration(file)"]
G --> H["_normalize_offset_duration(offset, duration, signal_duration)"]
H --> I["Convert to samples"]
I --> J["Return signal"]
F -- No --> I
I --> J
Flow diagram for empty signal creation in audiofile.read()flowchart TD
A["duration == 0"] --> B["_create_empty_signal(file, always_2d)"]
B --> C["Return empty signal and sampling_rate"]
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The
_normalize_offset_durationfunction has become very large and complex—consider refactoring it into smaller, well-named helper functions or using a table-driven approach to make each case clearer and more maintainable. - The logic in
_needs_sampling_rateis redundant (the firstduration is not Nonecheck already covers string values), so simplifying those conditions would make the intent clearer and reduce unneeded branches.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_normalize_offset_duration` function has become very large and complex—consider refactoring it into smaller, well-named helper functions or using a table-driven approach to make each case clearer and more maintainable.
- The logic in `_needs_sampling_rate` is redundant (the first `duration is not None` check already covers string values), so simplifying those conditions would make the intent clearer and reduce unneeded branches.
## Individual Comments
### Comment 1
<location> `audiofile/core/io.py:39-48` </location>
<code_context>
+ return parsed
+
+
+def _needs_sampling_rate(
+ duration: float | int | str | np.timedelta64,
+ offset: float | int | str | np.timedelta64,
+) -> bool:
+ """Check if sampling rate is needed for parsing offset/duration.
+
+ Args:
+ duration: duration value
+ offset: offset value
+
+ Returns:
+ True if sampling rate is needed
+
+ """
+ if duration is not None or isinstance(duration, str):
+ return True
+ if offset is not None and isinstance(offset, str):
</code_context>
<issue_to_address>
**issue (bug_risk):** Logic in _needs_sampling_rate may always return True for duration.
The condition will always be True for string values, making the check redundant and potentially causing unnecessary sampling rate retrieval. Please revise the logic to ensure it only returns True when needed.
</issue_to_address>
### Comment 2
<location> `audiofile/core/io.py:62-71` </location>
<code_context>
+def _normalize_offset_duration(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Fallback branch in _normalize_offset_duration may mask logic errors.
Consider raising an exception or logging a warning in this branch to surface potential bugs or unhandled cases, rather than silently normalizing values.
Suggested implementation:
```python
import logging
def _normalize_offset_duration(
offset: float | None,
duration: float | None,
signal_duration: float,
) -> tuple[float, float | None]:
"""Normalize offset and duration to handle negative values.
Converts negative offset/duration values (counted from end)
to positive values (counted from start).
Args:
```
```python
def _normalize_offset_duration(
offset: float | None,
duration: float | None,
signal_duration: float,
) -> tuple[float, float | None]:
"""Normalize offset and duration to handle negative values.
Converts negative offset/duration values (counted from end)
to positive values (counted from start).
Args:
"""
# Example normalization logic (add your actual logic here)
if offset is not None and offset < 0:
offset = signal_duration + offset
if duration is not None and duration < 0:
duration = signal_duration + duration
# Fallback branch: if values are still not normalized as expected
if (offset is not None and (offset < 0 or offset > signal_duration)) or (
duration is not None and (duration < 0 or duration > signal_duration)
):
logging.warning(
"Unexpected offset/duration normalization: offset=%s, duration=%s, signal_duration=%s",
offset, duration, signal_duration
)
# Optionally, raise an exception instead of logging
# raise ValueError(f"Unhandled offset/duration values: offset={offset}, duration={duration}, signal_duration={signal_duration}")
return offset, duration
```
You may need to adjust the normalization logic to match your actual implementation.
Decide whether you want to log a warning or raise an exception in the fallback branch.
If you choose to raise an exception, uncomment the `raise ValueError` line and remove the `logging.warning` line.
</issue_to_address>
### Comment 3
<location> `audiofile/core/io.py:34-36` </location>
<code_context>
def _parse_time_value(
value: float | int | str | np.timedelta64,
sampling_rate: int,
) -> float | None:
"""Parse a time value (offset or duration) to seconds.
Args:
value: time value to parse
sampling_rate: sampling rate for conversion
Returns:
time value in seconds, or None if NaN
"""
if value is None:
return None
parsed = duration_in_seconds(value, sampling_rate)
if np.isnan(parsed):
return None
return parsed
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return None if np.isnan(parsed) else parsed
```
</issue_to_address>
### Comment 4
<location> `audiofile/core/io.py:55-58` </location>
<code_context>
def _needs_sampling_rate(
duration: float | int | str | np.timedelta64,
offset: float | int | str | np.timedelta64,
) -> bool:
"""Check if sampling rate is needed for parsing offset/duration.
Args:
duration: duration value
offset: offset value
Returns:
True if sampling rate is needed
"""
if duration is not None or isinstance(duration, str):
return True
if offset is not None and isinstance(offset, str):
return True
if offset is not None and offset != 0:
return True
return False
</code_context>
<issue_to_address>
**suggestion (code-quality):** Hoist a repeated condition into a parent condition ([`hoist-repeated-if-condition`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/hoist-repeated-if-condition/))
```suggestion
if offset is not None:
if isinstance(offset, str):
return True
if offset != 0:
return True
```
</issue_to_address>
### Comment 5
<location> `audiofile/core/io.py:62` </location>
<code_context>
def _normalize_offset_duration(
offset: float | None,
duration: float | None,
signal_duration: float,
) -> tuple[float, float | None]:
"""Normalize offset and duration to handle negative values.
Converts negative offset/duration values (counted from end)
to positive values (counted from start).
Args:
offset: offset in seconds (can be negative or None)
duration: duration in seconds (can be negative or None)
signal_duration: total duration of signal in seconds
Returns:
tuple of (normalized_offset, normalized_duration)
where offset is >= 0 and duration is >= 0 or None
"""
# Handle: offset=None, duration < 0
if offset is None and duration is not None and duration < 0:
return max(0, signal_duration + duration), None
# Handle: offset=None, duration >= 0
if offset is None and duration is not None and duration >= 0:
if np.isinf(duration):
return 0, None
return 0, duration
# Guard: offset is None at this point means both are None
if offset is None:
return 0, None
# Handle: offset >= 0, duration < 0
if offset >= 0 and duration is not None and duration < 0:
if np.isinf(offset) and np.isinf(duration):
return 0, None
if np.isinf(offset):
return 0, 0.0
if np.isinf(duration):
offset = min(offset, signal_duration)
duration = np.sign(duration) * signal_duration
orig_offset = offset
offset = max(0, offset + duration)
duration = min(-duration, orig_offset)
return offset, duration
# Handle: offset >= 0, duration >= 0
if offset >= 0 and duration is not None and duration >= 0:
if np.isinf(offset):
return 0, 0.0
if np.isinf(duration):
return offset, None
return offset, duration
# Handle: offset < 0, duration=None
if offset < 0 and duration is None:
return max(0, signal_duration + offset), None
# Handle: offset >= 0, duration=None
if offset >= 0 and duration is None:
if np.isinf(offset):
return 0, 0.0
return offset, None
# Handle: offset < 0, duration > 0
if offset < 0 and duration is not None and duration > 0:
if np.isinf(offset) and np.isinf(duration):
return 0, None
if np.isinf(offset):
return 0, 0.0
if np.isinf(duration):
offset = signal_duration + offset
offset = max(0, offset)
return offset, None
offset = signal_duration + offset
if offset < 0:
duration = max(0, duration + offset)
offset = 0
else:
duration = min(duration, signal_duration - offset)
return offset, duration
# Handle: offset < 0, duration < 0
if offset < 0 and duration < 0:
if np.isinf(offset):
return 0, 0.0
if np.isinf(duration):
duration = -signal_duration
else:
orig_offset = offset
offset = max(0, signal_duration + offset + duration)
duration = min(-duration, signal_duration + orig_offset)
duration = max(0, duration)
return offset, duration
# Fallback (should not reach here)
return offset if offset >= 0 else 0, duration
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Hoist a repeated condition into a parent condition ([`hoist-repeated-if-condition`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/hoist-repeated-if-condition/))
- Remove redundant conditional [×4] ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
- Low code quality found in \_normalize\_offset\_duration - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def _needs_sampling_rate( | ||
| duration: float | int | str | np.timedelta64, | ||
| offset: float | int | str | np.timedelta64, | ||
| ) -> bool: | ||
| """Check if sampling rate is needed for parsing offset/duration. | ||
| Args: | ||
| duration: duration value | ||
| offset: offset value | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Logic in _needs_sampling_rate may always return True for duration.
The condition will always be True for string values, making the check redundant and potentially causing unnecessary sampling rate retrieval. Please revise the logic to ensure it only returns True when needed.
| def _normalize_offset_duration( | ||
| offset: float | None, | ||
| duration: float | None, | ||
| signal_duration: float, | ||
| ) -> tuple[float, float | None]: | ||
| """Normalize offset and duration to handle negative values. | ||
| Converts negative offset/duration values (counted from end) | ||
| to positive values (counted from start). | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Fallback branch in _normalize_offset_duration may mask logic errors.
Consider raising an exception or logging a warning in this branch to surface potential bugs or unhandled cases, rather than silently normalizing values.
Suggested implementation:
import logging
def _normalize_offset_duration(
offset: float | None,
duration: float | None,
signal_duration: float,
) -> tuple[float, float | None]:
"""Normalize offset and duration to handle negative values.
Converts negative offset/duration values (counted from end)
to positive values (counted from start).
Args:def _normalize_offset_duration(
offset: float | None,
duration: float | None,
signal_duration: float,
) -> tuple[float, float | None]:
"""Normalize offset and duration to handle negative values.
Converts negative offset/duration values (counted from end)
to positive values (counted from start).
Args:
"""
# Example normalization logic (add your actual logic here)
if offset is not None and offset < 0:
offset = signal_duration + offset
if duration is not None and duration < 0:
duration = signal_duration + duration
# Fallback branch: if values are still not normalized as expected
if (offset is not None and (offset < 0 or offset > signal_duration)) or (
duration is not None and (duration < 0 or duration > signal_duration)
):
logging.warning(
"Unexpected offset/duration normalization: offset=%s, duration=%s, signal_duration=%s",
offset, duration, signal_duration
)
# Optionally, raise an exception instead of logging
# raise ValueError(f"Unhandled offset/duration values: offset={offset}, duration={duration}, signal_duration={signal_duration}")
return offset, durationYou may need to adjust the normalization logic to match your actual implementation.
Decide whether you want to log a warning or raise an exception in the fallback branch.
If you choose to raise an exception, uncomment the raise ValueError line and remove the logging.warning line.
| if np.isnan(parsed): | ||
| return None | ||
| return parsed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): We've found these issues:
- Lift code into else after jump in control flow (
reintroduce-else) - Replace if statement with if expression (
assign-if-exp)
| if np.isnan(parsed): | |
| return None | |
| return parsed | |
| return None if np.isnan(parsed) else parsed |
| if offset is not None and isinstance(offset, str): | ||
| return True | ||
| if offset is not None and offset != 0: | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Hoist a repeated condition into a parent condition (hoist-repeated-if-condition)
| if offset is not None and isinstance(offset, str): | |
| return True | |
| if offset is not None and offset != 0: | |
| return True | |
| if offset is not None: | |
| if isinstance(offset, str): | |
| return True | |
| if offset != 0: | |
| return True |
| return False | ||
|
|
||
|
|
||
| def _normalize_offset_duration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Hoist a repeated condition into a parent condition (
hoist-repeated-if-condition) - Remove redundant conditional [×4] (
remove-redundant-if) - Low code quality found in _normalize_offset_duration - 24% (
low-code-quality)
Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines. - Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.
|
The current suggestion does not have full code coverage yet. |
Closes #162
Improves code quality of
audiofile.read()by adding extra function for complicated value parsing.Summary by Sourcery
Refactor audiofile.read() by extracting complex offset and duration parsing into dedicated helper functions and simplifying the main read logic
Enhancements: