Refactor generate() method for improved maintainability #29

@JacobPEvans

Problem

The generate() method in bin/goatsearch.py spans approximately 200 lines (232-430) and handles multiple responsibilities:

  • Dataset listing (lines 240-265)
  • Job status polling (lines 280-338)
  • Result collection with pagination (lines 345-419)
  • Event transformation (lines 399-414)
  • Search metadata updates (lines 421-425)

While PEP 8 does not specify a maximum function length, the Clean Code in Python guide (TestDriven.io) recommends that functions should:

  • Be comprehensible in 5-10 minutes of review
  • Follow the Single Responsibility Principle
  • Be easily testable

The current structure makes it difficult to:

  • Test individual behaviors in isolation
  • Understand the flow at a glance
  • Modify one aspect without risking others

Proposed Refactoring

Extract logical sections into focused methods:

| New Method | Lines | Responsibility |
| --- | --- | --- |
| _list_datasets() | 240-265 | Handle no-query case, return datasets |
| _poll_job_status() | 280-338 | Poll until running/complete, yield status events |
| _collect_results() | 345-419 | Paginate and yield result events |
| _transform_event() | 399-414 | Convert API response to Splunk event format |
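To illustrate why extracting the transformation step helps testability, here is a minimal sketch of a pure transform function. It is not the code from bin/goatsearch.py: the input field name `timestamp` and the `default_time` parameter are assumptions made for illustration, and the real API response shape may differ. Only `_time` and `_raw` are used on the output side, since those are standard Splunk event fields.

```python
import json
import time


def transform_event(record, default_time=None):
    """Convert one API result record into a Splunk-style event dict.

    Hypothetical sketch: "timestamp" is an assumed input field name,
    not taken from the actual API response.
    """
    if "timestamp" in record:
        event_time = record["timestamp"]
    elif default_time is not None:
        event_time = default_time
    else:
        # Fall back to the current time when the record carries none.
        event_time = time.time()

    return {
        "_time": event_time,
        # Serialize the whole record so no field is lost in the event body.
        "_raw": json.dumps(record, sort_keys=True),
    }
```

Because the function takes a plain dict and returns a plain dict, it can be unit tested without a running job or a network connection, for example by calling `transform_event({"timestamp": 100, "msg": "hi"})` and checking the resulting `_time` and `_raw` values.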

After refactoring, generate() becomes orchestration:

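def generate(self):
    if not self.can_run:
        return

    # No query and no search ID: list the available datasets and stop.
    if not self.query and not self.sid:
        yield from self._list_datasets()
        return

    self._prepare_event_search()

    # No job ID after preparation: emit the logged events and stop.
    if not self.job_id:
        yield from self.event_log
        return

    # Poll the job, then paginate through its results.
    yield from self._poll_and_collect_results()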
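
One possible shape for the polling and collection helpers is sketched below as standalone generator functions. This is an illustration under stated assumptions, not the existing implementation: the callables `get_status` and `fetch_page`, the `"queued"` status used in the usage note, the event dict layout, and the `interval`, `max_polls`, and `page_size` parameters are all hypothetical. Only the "running" and "complete" stop conditions come from the table above.

```python
import time


def poll_job_status(get_status, interval=0.0, max_polls=100):
    """Yield one status event per poll until the job is running or complete."""
    for _ in range(max_polls):
        status = get_status()
        yield {"type": "status", "status": status}
        if status in ("running", "complete"):
            return
        time.sleep(interval)
    # Reached only if the job never became running or complete.
    raise TimeoutError("job did not start within max_polls polls")


def collect_results(fetch_page, page_size=2):
    """Yield result events page by page until a short page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        for record in page:
            yield {"type": "result", "record": record}
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += len(page)


def poll_and_collect_results(get_status, fetch_page):
    """Compose the two helpers; status events come first, then results."""
    yield from poll_job_status(get_status)
    yield from collect_results(fetch_page)
```

With the helpers taking callables instead of reaching into `self`, each one can be exercised with simple stand-ins, such as `lambda: next(statuses)` over `iter(["queued", "running"])` for the status source and `lambda offset, n: data[offset:offset + n]` for the page fetcher.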

Acceptance Criteria

References

Related Code

  • bin/goatsearch.py lines 232-430
