Unify counters report #155
Pull Request Overview
This PR refactors the optimization tracking system and improves error handling in the IntelliPerf GPU kernel optimization tool. The changes centralize best-iteration tracking logic into the OptimizationTracker class and enhance process monitoring/timeout handling.
- Extracted duplicate kernel file finding logic into a shared find_kernel_file() method
- Refactored OptimizationTracker to automatically calculate improvement metrics and determine success
- Improved timeout handling with dynamic timeouts based on baseline performance and better error messages for crashes vs timeouts
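For orientation, here is a minimal sketch of the centralized pattern described above. The constructor keywords (`max_iterations`, `primary_metric`, `maximize`) and the helper names appear in the diff and commit messages below, but the class body is an illustrative reconstruction, not the actual formula_base.py code.

```python
class OptimizationTracker:
    """Illustrative sketch of centralized best-iteration tracking."""

    def __init__(self, max_iterations, primary_metric, maximize):
        self.max_iterations = max_iterations
        self.primary_metric = primary_metric
        self.maximize = maximize  # direction of the *raw* metric
        self.best = None          # (improvement, metrics) of the best step so far

    def add_step(self, before, after, metrics):
        # Improvement is a ratio that exceeds 1.0 whenever the step helped,
        # regardless of whether the raw metric should grow or shrink.
        if self.maximize:
            improvement = after / before if before != 0 else 1.0
        else:
            improvement = before / after if after != 0 else 1.0
        if self.best is None or improvement > self.best[0]:
            self.best = (improvement, metrics)
        return improvement

    def is_successful(self):
        return self.best is not None and self.best[0] > 1.0

    def get_best_metrics(self):
        return self.best[1] if self.best else None
```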
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| formula_base.py | Added find_kernel_file() helper, enhanced OptimizationTracker with auto-calculation of improvement metrics, added compiler error filtering, improved Accordo timeout/crash detection |
| memory_access.py | Removed duplicate tracking code, delegated to centralized OptimizationTracker methods, fixed metric names (latency→lat), updated failure metrics to use baseline values |
| bank_conflict.py | Aligned with refactored tracker, changed maximize=False for minimize-type metrics, removed manual best-tracking code |
| atomic_contention.py | Similar refactoring as memory_access and bank_conflict, added terminal access documentation to signature |
| swizzling.py | Used shared find_kernel_file() helper, improved error message handling for iterations |
| communicate.py | Enhanced IPC timeout logic with dynamic timeouts (see the sketch after this table), added process crash detection, reduced debug log spam |
| pyproject.toml | Lowered Python requirement from 3.10 to 3.9 |
| .gitignore | Added .rocprofv3/ to ignored files |
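The dynamic-timeout behavior described for communicate.py can be sketched as follows. The function name and scaling constants are assumptions for illustration; the PR only describes the behavior, not the exact formula.

```python
def dynamic_timeout(baseline_ms, factor=10.0, floor_s=30.0, ceiling_s=600.0):
    """Scale the IPC timeout with baseline kernel runtime (illustrative sketch).

    A kernel with a longer baseline run naturally needs a longer validation
    window; the floor and ceiling keep pathological baselines from producing
    useless timeouts.
    """
    return min(ceiling_s, max(floor_s, (baseline_ms / 1000.0) * factor))
```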
```python
if not self.maximize:
    improvement = before / after if after != 0 else 1.0
else:
    improvement = after / before if after != 0 else 1.0
```
**Copilot AI** · Nov 3, 2025:
The improvement calculation logic is identical for both branches. When maximize=False (minimize raw metric), improvement should be before / after (e.g., 10 conflicts → 5 conflicts = 10/5 = 2x improvement). When maximize=True (maximize raw metric), improvement should be after / before (e.g., 50% → 90% coalescing = 90/50 = 1.8x improvement). However, both branches calculate the same formula, making the conditional redundant. The intended results appear correct, but the code structure suggests the branches should differ.
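As a quick sanity check of the two ratios the comment describes, here are its own example numbers in plain Python (illustrative only):

```python
# Minimize case (bank conflicts): 10 -> 5 conflicts
before, after = 10.0, 5.0
print(before / after)  # 2.0 -> 2x improvement

# Maximize case (memory coalescing): 50% -> 90%
before, after = 50.0, 90.0
print(after / before)  # 1.8 -> 1.8x improvement
```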
```python
# With proper improvement calculation, we always want higher improvement values
if new_val > cur_val:
```
**Copilot AI** · Nov 3, 2025:
The comment 'we always want higher improvement values' contradicts the maximize parameter in the tracker initialization. The code now ignores the maximize flag when selecting the best step, always preferring higher values. This is inconsistent with the tracker's design where maximize=False indicates we're minimizing the raw metric (though improvement ratio should still be maximized). Consider clarifying whether maximize applies to the raw metric or the improvement ratio.
Suggested change:
```diff
- # With proper improvement calculation, we always want higher improvement values
- if new_val > cur_val:
+ # Select best step based on maximize flag: prefer higher values if maximizing, lower if minimizing
+ if (self.maximize and new_val > cur_val) or (not self.maximize and new_val < cur_val):
```
```diff
  max_iterations=self.num_attempts,
  primary_metric="coal_improvement",
- maximize=True,
+ maximize=True,  # True = maximize raw metric (coalescing %)
```
**Copilot AI** · Nov 3, 2025:
The inline comment states 'True = maximize raw metric (coalescing %)', but based on the add_step logic in formula_base.py, the maximize parameter doesn't actually affect the improvement calculation since both branches compute the same formula. This comment may mislead readers about the parameter's effect.
Suggested change:
```diff
- maximize=True,  # True = maximize raw metric (coalescing %)
+ maximize=True,  # Currently has no effect; improvement is always calculated as optimized_coal / unoptimized_coal
```
```diff
  max_iterations=self.num_attempts,
  primary_metric="conflict_improvement",
- maximize=True,
+ maximize=False,  # False = minimize raw metric (conflicts)
```
**Copilot AI** · Nov 3, 2025:
Similar to memory_access.py, this comment suggests maximize=False has an effect on the improvement calculation, but the code in formula_base.py (lines 142-145) shows that both branches calculate the same formula. The comment should clarify that maximize is not currently used in improvement calculations.
Suggested change:
```diff
- maximize=False,  # False = minimize raw metric (conflicts)
+ maximize=False,  # Note: 'maximize' is not currently used in improvement calculations; improvement is always calculated as unoptimized_conflicts / optimized_conflicts.
```
```diff
  max_iterations=self.num_attempts,
  primary_metric="latency_improvement",
- maximize=True,
+ maximize=False,  # False = minimize raw metric (latency)
```
**Copilot AI** · Nov 3, 2025:
Same issue as in bank_conflict.py and memory_access.py - the maximize parameter doesn't affect the improvement calculation as both branches in formula_base.py compute identical formulas. The comment should be updated to reflect the actual behavior.
```python
thread = threading.Thread(target=target, daemon=True)
thread.start()
thread.join(timeout=timeout_seconds)

if thread.is_alive():
    # Thread is still running - timeout occurred
    raise TimeoutError(f"Operation timed out after {timeout_seconds} seconds")
```
**Copilot AI** · Nov 3, 2025:
Using daemon threads with timeout creates a resource leak - if the timeout occurs, the daemon thread continues running in the background and cannot be stopped (the function it's executing may hold resources). Consider using a non-daemon thread with proper cleanup, or document that this is acceptable for the IPC operations being performed.
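One hedged alternative is to move the blocking call into a child process, which, unlike a daemon thread, can actually be terminated on timeout. This is a sketch only: `multiprocessing` requires the target to be picklable, and none of the names below come from the IntelliPerf codebase.

```python
import multiprocessing

def run_with_timeout(target, timeout_seconds, *args):
    """Run `target` in a child process; terminate it on timeout.

    A child process can be killed, so a timed-out operation does not keep
    holding resources the way a lingering daemon thread would. Any state
    the target mutates lives in the child, not the parent.
    """
    proc = multiprocessing.Process(target=target, args=args)
    proc.start()
    proc.join(timeout=timeout_seconds)
    if proc.is_alive():
        proc.terminate()  # reclaim the stuck worker
        proc.join()
        raise TimeoutError(f"Operation timed out after {timeout_seconds} seconds")
    if proc.exitcode != 0:
        # A non-zero exit here usually means the worker crashed rather than timed out
        raise RuntimeError(f"Worker exited with code {proc.exitcode}")
```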
```python
time_after_ms = None
if metrics and "optimized_coal" in metrics:
    # Only use the measured value if it actually improved (maximize=True, so higher is better)
    if metrics["optimized_coal"] > self.baseline_coalesced_pct:
```
**Copilot AI** · Nov 3, 2025:
The condition checks if optimized_coal > baseline_coalesced_pct, but baseline_coalesced_pct is the same as unoptimized_coal (both represent the initial measurement). The comment on line 733 says 'Only use the measured value if it actually improved (maximize=True, so higher is better)', but the comparison should be against metrics['unoptimized_coal'] from the best step, not the baseline, to properly validate improvement in the best optimization attempt.
| if metrics["optimized_coal"] > self.baseline_coalesced_pct: | |
| if metrics["optimized_coal"] > metrics.get("unoptimized_coal", self.baseline_coalesced_pct): |
| if metrics["optimized_conflicts"] < self.baseline_bank_conflicts: | ||
| metric_after = metrics["optimized_conflicts"] |
**Copilot AI** · Nov 3, 2025:
Similar issue to memory_access.py - this compares against baseline_bank_conflicts instead of the unoptimized value from the best step. If there were multiple iterations, the best step's unoptimized_conflicts may differ from the initial baseline, making this comparison incorrect for determining whether the best step actually improved.
```python
time_after_ms = None
if metrics and "optimized_lat" in metrics:
    # Only use the measured value if it actually improved
    if metrics["optimized_lat"] < self.baseline_atomic_latency:
```
**Copilot AI** · Nov 3, 2025:
Same issue as memory_access.py and bank_conflict.py - comparing against baseline instead of the best step's unoptimized value. Should compare metrics['optimized_lat'] < metrics.get('unoptimized_lat', self.baseline_atomic_latency) to properly validate improvement in the best step.
| if metrics["optimized_lat"] < self.baseline_atomic_latency: | |
| if metrics["optimized_lat"] < metrics.get("unoptimized_lat", self.baseline_atomic_latency): |
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
- Add OptimizationTracker helper methods: is_successful(), get_best_code(), get_best_report(), get_best_metrics()
- Add auto-determination of success based on improvement and speedup
- Add _filter_compiler_errors() static method for cleaner compiler output (a behavior sketch follows this list)
- Add process crash detection in Accordo IPC with process_pid
- Add enhanced timeout error messages with baseline context
- Maintain all Accordo API improvements from feature branch
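A hedged sketch of the filtering behavior that bullet describes. The actual _filter_compiler_errors() body is not shown in this PR excerpt, so the pattern list and fallback policy below are assumptions, written as a plain function for self-containment:

```python
import re

def filter_compiler_errors(output: str) -> str:
    """Keep only diagnostic lines from raw compiler output (illustrative sketch)."""
    keep = re.compile(r"\b(error|warning|undefined reference)\b", re.IGNORECASE)
    lines = [line for line in output.splitlines() if keep.search(line)]
    return "\n".join(lines) if lines else output  # nothing matched: keep everything
```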
**New Accordo API (clean abstraction):**
- New Accordo module structure with public API (__init__, config, result, snapshot, exceptions, validator)
- Internal implementation (_internal/codegen, hip_interop, ipc/communication)
- Snapshot-based caching: capture reference once, compare multiple optimized versions (usage sketched after this list)
- AccordoValidator with proper timeout handling and error detection
- KernelArg dataclass for cleaner argument specification
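A hedged usage sketch of the snapshot-caching flow these bullets describe. AccordoValidator and KernelArg are named in the commit message, but every field, constructor, and method name below is an assumption made for illustration:

```python
from dataclasses import dataclass

# Hypothetical field set -- the real KernelArg dataclass may differ.
@dataclass
class KernelArg:
    name: str
    dtype: str
    size: int

def validate_candidates(validator, kernel_args, optimized_sources):
    """Capture the reference once, then reuse it for every candidate."""
    # One reference run (the expensive part)...
    snapshot = validator.capture_reference(kernel_args)  # assumed method name
    # ...amortized over all optimized versions.
    return [validator.compare(snapshot, src) for src in optimized_sources]
```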
**Formula changes (preserves all main improvements):**
- Updated correctness_validation_pass() to use new Accordo API with caching
- Added write_and_log_optimized_code() to centralize iteration logging
- All formulas use write_and_log_optimized_code() for immediate LLM output capture
- find_kernel_file() already in main - formulas use it correctly
**Infrastructure:**
- Logger.save_iteration_code() for centralized iteration file management
- __main__.py passes trace_path to logger
- All examples/*.hip updated with hip_try macro for error handling
- .gitignore: added .rocprofv3/
**Preserved from main ("Unify counters report" #155):**
- OptimizationTracker helper methods (is_successful, get_best_code, etc.)
- Auto-success determination logic
- Proper tie-breaker with speedup
- _filter_compiler_errors() method
- Correct failure metrics (speedup 0.0 for build, 1.0 for correctness)
- Correct performance_validation flow (add_step first, query values, build report; see the ordering sketch below)
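The ordering in the last bullet, as a tiny sketch (helper names taken from the commit text, call signatures assumed):

```python
tracker.add_step(before_val, after_val, metrics)  # 1. record the step first
best_metrics = tracker.get_best_metrics()         # 2. then query the best values
report = tracker.get_best_report()                # 3. then build the report from them
success = tracker.is_successful()
```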
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>