-
Notifications
You must be signed in to change notification settings - Fork 24
feat: optionally split FIND_ITEMS' query in 2 #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
via _bsolm_index_separate = true Closes #118
| if self._bsolm_index_separate and fields is None: | ||
| # Split the get_items query (TODO: get_item_ids could | ||
| # reference solely the index) query in 2: | ||
| # #1: query BsoLastModified directly for bso ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the intended goal of using this 2 query version? I assume its an optimization to avoid the multiple executions that occur otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, see my description in the issue #118
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if this solution looks alright under load testing I'd like to relay those results back to Google, in case they have an alternative or a better explanation for the existing single query's perf (I still hope we won't have to do this..)
| EPOCH = datetime.datetime.utcfromtimestamp(0) | ||
|
|
||
| # a bogus FIND_ITEMS bind param | ||
| BOGUS_ID = object() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used (or planned to be used) anywhere outside of the SpannerStorage.init()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's only used in _find_items as basically a placeholder
rfk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have my head around this 100% yet, but some initial comments...
| del bind_types["offset"] | ||
| # simiarly modified ranges/ttl aren't needed in #2 | ||
| for param in ("newer", "newer_eq", "older", "older_eq", "ttl"): | ||
| params.pop(param, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's quite a big chunk of code in this if block, I wonder if it could be factored out into a helper method of some sort.
| fields = params.get("fields") | ||
| offset_params = params | ||
|
|
||
| if self._bsolm_index_separate and fields is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious why and fields is None here, is it to ensure we only apply this logic in get_items rather than get_item_ids? If so, I wonder if it's worth refactoring _find_items into a couple of helper methods and pushing that logic up into get_items, like:
def get_items(...):
if self._bsolm_index_separate:
(params, ids) = self._do_bsolm_separate_query_to_get_the_ids()
return self._find_items(**params)
def get_item_ids(...):
return self._find_items(...)
Depends how much effort you want to invest in polishing this though, as opposed to just getting it out in stage to loadtest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, to only apply it to get_items. I like the refactor suggestion but it's a little further complicated by encode_next_offset's need for the original params (offset_params).
I will keep it in mind if this is revisited. This is definitely a "get it out to stage and see" patch
|
|
||
| # Setup a 'id IN (:id_1)' bind param for #2 (replacing it | ||
| # below) | ||
| params["ids"] = [BOGUS_ID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What stops us from just putting the returned ids in params["id"], and basically just pretending that the caller had requested those ids specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifically to utilize spanner's UNNEST (what it's replaced with below) for query 2.
I do have a TODO for the FIND_ITEMS/sqlalchemy generation piece to do this for us. params["ids"] from the client and/or this case should both utilize UNNEST. Unfortunately coercing the sqlalchemy level to generate it for us is a bit involved, thus the current str replace.
Another quick solution for this is having FIND_ITEMS handle UNNEST by returning a str with the appropriate replacements (also not ideal, further straying from the original FIND_ITEMS code..) 🤷♂
|
@pjenvey do we need to get this merged before loadtesting, or can we do an initial test from this branch before deciding whether to push ahead with it or not? |
|
Sorry I missed the last comment until just now -- I can certainly tag this from the branch for an initial load test if more is needed to merge this PR (I'm totally fine proving out the load test before a full merge adding to the existing pile of hacks here) |
Whatever seems simplest for you. I'm happy to r+ with followup bugs for refactors, but if we're not sure yet whether we'll even keep this code, deploying from a branch feels like it might be the simpler path forward overall. |
via _bsolm_index_separate = true
Closes #118