@eyala eyala commented Nov 19, 2024

A new method for when you want to de-duplicate records, but not lose any "real" data.

For example, suppose a server creates events with an autogenerated event id, and sometimes events are duplicated. You don't want double rows that differ only in their event ids, but if any of the other fields are distinct you want to keep the rows, with their original event ids (otherwise you could just drop the event id column and de-duplicate). Today, keeping at least one event id value for each set of duplicates means tediously listing all the other columns.
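To make the intended behavior concrete, here is a minimal sketch in plain Python. It is not the code from this PR and does not use the DataFu or Spark APIs; the function name `dedup_ignoring_columns` and the sample columns `user` and `action` are hypothetical, chosen only for illustration. Rows are treated as duplicates when every column except the ignored ones matches, and the first row of each group is kept along with its original event id.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def dedup_ignoring_columns(rows: Iterable[Row], ignored: Iterable[str]) -> List[Row]:
    """Keep the first row for each distinct combination of non-ignored columns.

    Hypothetical helper for illustration only; column values must be hashable.
    """
    ignored_set = set(ignored)
    seen = set()
    kept: List[Row] = []
    for row in rows:
        # Build the comparison key from every column that is NOT ignored.
        # Column names are unique within a row, so sorting never compares values.
        key = tuple(sorted((col, val) for col, val in row.items() if col not in ignored_set))
        if key not in seen:
            seen.add(key)
            kept.append(row)  # the surviving row keeps its original ignored columns
    return kept


events = [
    {"event_id": 1, "user": "a", "action": "click"},
    {"event_id": 2, "user": "a", "action": "click"},  # same data, new event id
    {"event_id": 3, "user": "a", "action": "view"},   # genuinely different data
]

result = dedup_ignoring_columns(events, ["event_id"])
print([r["event_id"] for r in result])  # prints [1, 3]
```

The caller names only the column to ignore rather than every column to compare, which is the convenience described above.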

JIRA: https://issues.apache.org/jira/browse/DATAFU-177

@eyala eyala self-assigned this Dec 9, 2024
@eyala eyala merged commit 6fd6fc4 into apache:main Dec 9, 2024
3 checks passed