@turt2live
Member

Reviewable commit-by-commit (recommended - some commit messages have additional context)

Requires element-hq/synapse#19308 on the server.

The ExchangeAPI comes from python-threatexchange, which is the underlying code behind HMA. The interface is a bit awkward at first glance, but the short version is:

  • HMA will inject credentials into our API class for us
  • HMA will also track the checkpoint we reach
  • HMA will call fetch whenever it wants more data. We shouldn't block in that call for longer than necessary, but taking as long as the fetch genuinely requires is fine.
  • Pretty much all of the functions are implementations of abstract methods, with the exception of _hash and _fetch
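
As a rough illustration of that flow (not the code in this PR), the sketch below uses stand-in classes rather than the real python-threatexchange base classes. `SketchExchange`, `Checkpoint.local_from_token`, the injected `transport` callable, the meaning of the boolean passed to `_fetch`, and the local-then-remote ordering are all assumptions made for the example; only `remote_from_token` and the `_fetch(False, remote_token)` call shape come from the diff.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Checkpoint:
    """Resume position that the host (HMA) stores between fetch calls."""

    local_from_token: int = 0   # assumed field name, not taken from the PR
    remote_from_token: int = 0  # field name as it appears in the diff


# (is_local, from_token) -> (MXC URIs, next token). Stands in for the HTTP call.
Transport = Callable[[bool, int], Tuple[List[str], int]]


class SketchExchange:
    """Hypothetical stand-in for the exchange class; not the real base class."""

    def __init__(self, transport: Transport) -> None:
        # The real exchange gets credentials injected by HMA; this sketch
        # injects a transport function instead so it can run offline.
        self._transport = transport

    def _fetch(self, local: bool, from_token: int) -> Tuple[List[str], int]:
        """Fetch one page of media starting at from_token."""
        return self._transport(local, from_token)

    def fetch_iter(
        self, checkpoint: Optional[Checkpoint]
    ) -> Iterator[Tuple[List[str], Checkpoint]]:
        """Yield (MXC URIs, updated checkpoint) for local, then remote media."""
        local_token = checkpoint.local_from_token if checkpoint else 0
        remote_token = checkpoint.remote_from_token if checkpoint else 0

        local_mxcs, local_next = self._fetch(True, local_token)
        yield local_mxcs, Checkpoint(local_next, remote_token)

        # Now process remote media
        remote_mxcs, remote_next = self._fetch(False, remote_token)
        yield remote_mxcs, Checkpoint(local_next, remote_next)


def fake_transport(local: bool, from_token: int) -> Tuple[List[str], int]:
    """Offline transport returning one fixed page and advancing the token."""
    pages = {True: ["mxc://example.org/a"], False: ["mxc://remote.example/b"]}
    return pages[local], from_token + 1
```

Calling `list(SketchExchange(fake_transport).fetch_iter(None))` walks both halves from token 0. The host would persist the last yielded `Checkpoint` and pass it back on the next call.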

Much of the code is inspired by existing exchanges in python-threatexchange.

This has been tested locally. One day we should figure out how to put CI on this.

The significant change is in the Dockerfile, where we install poetry so we can install the new exchange (and possibly future ones as well). We try to keep the number of layers relatively small to make the image easier to download/less bloated.

There's also some indirection via Docker Compose Secrets here for local testing/debugging. Essentially, we're just trying to pass the environment variable from the host machine down to the container, which requires secrets for reasons. Unfortunately, secrets can only be written to files inside the container, so we have to extract the contents of that file back into the environment. That extraction is done in `startup.sh`.
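
The actual extraction lives in `startup.sh` and is not reproduced here. Purely to illustrate the idea, the same file-to-environment step could be sketched in Python as below. The function name, the secret name and the variable name are hypothetical; `/run/secrets` is the default location where Docker Compose mounts secrets.

```python
import os
from pathlib import Path
from typing import MutableMapping


def load_secret_into_env(
    name: str,
    env_var: str,
    secrets_dir: Path = Path("/run/secrets"),
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Copy the contents of a mounted secret file into an environment variable.

    Returns True if the variable was set, False if the secret file is
    missing or empty (in which case the environment is left untouched).
    """
    path = secrets_dir / name
    if not path.is_file():
        return False
    # Secret files often end with a trailing newline; strip it.
    value = path.read_text(encoding="utf-8").strip()
    if not value:
        return False
    env[env_var] = value
    return True
```

For example, `load_secret_into_env("example_token", "EXAMPLE_TOKEN")` (both names made up) would run before the main process starts, so the process sees the value as an ordinary environment variable.
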
@turt2live turt2live marked this pull request as ready for review December 16, 2025 21:35
@turt2live turt2live requested a review from a team as a code owner December 16, 2025 21:35

@H-Shay H-Shay left a comment

in a perfect pythonic world it would be cool to see docstrings but it's not a huge deal

@turt2live turt2live requested a review from H-Shay December 17, 2025 23:40
@turt2live
Member Author

I've added some basic docstrings and fixed the request-calling code - please take another look, @H-Shay


# Now process remote media
remote_token = checkpoint.remote_from_token if checkpoint else 0
remote_media_mxcs, next_batch = self._fetch(False, remote_token)
since _fetch (and _hash) can raise an error, would it make sense to try/catch those calls?

Member Author

there's not much we can do if we do catch it, so we let the fetcher inside HMA catch+log it.

sounds good!
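
For readers following the thread, the division of responsibility agreed on above can be sketched roughly as follows. This is a toy model, not HMA's fetcher: `fetch_iter`, `run_fetch`, and the `fetch_page` callable are hypothetical names. The point is only that the exchange side has no try/except, so the host-side loop is the single place where failures are caught and logged.

```python
import logging
from typing import Callable, Iterator, List

logger = logging.getLogger("sketch.fetcher")


def fetch_iter(
    fetch_page: Callable[[int], List[str]], from_token: int
) -> Iterator[List[str]]:
    """Exchange side: no try/except, so errors from fetch_page propagate."""
    yield fetch_page(from_token)


def run_fetch(fetch_page: Callable[[int], List[str]], from_token: int) -> List[str]:
    """Host side (hypothetical): collect results, log and stop on failure."""
    collected: List[str] = []
    try:
        for page in fetch_iter(fetch_page, from_token):
            collected.extend(page)
    except Exception:
        # Logs the message together with the traceback of the active exception.
        logger.exception("fetch failed")
    return collected
```

With this split, a failing request costs one logged traceback and an empty result for that round, and the exchange code stays free of error handling it could not act on anyway.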

@H-Shay
H-Shay commented Dec 18, 2025

Just one question, otherwise looks good, thanks for the docstrings!

@turt2live turt2live merged commit 4f0b967 into main Dec 18, 2025
@turt2live turt2live deleted the travis/quarantined-importer branch December 18, 2025 17:12