@anup-cloudsufi

Background: partitioned reads on an Oracle table skipped every record whose partitioning column (split-by column) was NULL. This caused data loss and incorrect record counts, because the underlying logic only generated splits for non-null value ranges. The same problem applies to other DB plugins.

Changes involved:

  • overriding DataDrivenETLDBInputFormat.getSplits to check the generated splits; if no split covers all records and no IS NULL split is present, it adds a dedicated InputSplit with the condition [splitByColumn] IS NULL.
  • relevant unit tests
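
For reviewers, the check described in the first bullet can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `NullSplitSketch`, `Split`, and `withNullSplit` are hypothetical names, the split is modelled as a pair of WHERE-clause fragments, and treating a `1=1` / `1=1` pair as the "covers all records" split is an assumption about how the base input format represents that case.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSplitSketch {

  /** Hypothetical stand-in for an input split described by two WHERE-clause fragments. */
  public static final class Split {
    public final String lowerClause;
    public final String upperClause;

    public Split(String lowerClause, String upperClause) {
      this.lowerClause = lowerClause;
      this.upperClause = upperClause;
    }
  }

  /**
   * Returns the generated splits, plus a "[splitByColumn] IS NULL" split when
   * no existing split already reads the NULL rows.
   */
  public static List<Split> withNullSplit(List<Split> generated, String splitByColumn) {
    String isNull = splitByColumn + " IS NULL";
    boolean nullsCovered = false;

    for (Split s : generated) {
      // Assumption: a split whose clauses are both "1=1" reads every row.
      boolean coversAll = "1=1".equals(s.lowerClause) && "1=1".equals(s.upperClause);
      // A split whose clauses are both "<col> IS NULL" already reads the NULL rows.
      boolean nullSplit = isNull.equals(s.lowerClause) && isNull.equals(s.upperClause);
      if (coversAll || nullSplit) {
        nullsCovered = true;
        break;
      }
    }

    List<Split> result = new ArrayList<>(generated);
    if (!nullsCovered) {
      // Range splits never match NULL, so add a dedicated split for those rows.
      result.add(new Split(isNull, isNull));
    }
    return result;
  }
}
```

With two range splits on a column `id`, the sketch returns three splits, the last one being `id IS NULL`; with a single catch-all split or an existing IS NULL split, the list is returned unchanged.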

@anup-cloudsufi anup-cloudsufi marked this pull request as ready for review December 17, 2025 06:28