Skip to content

Conversation

@pjenvey
Copy link
Member

@pjenvey pjenvey commented May 15, 2019

via _bsolm_index_separate = true

Closes #118

via _bsolm_index_separate = true

Closes #118
if self._bsolm_index_separate and fields is None:
# Split the get_items query (TODO: get_item_ids could
# reference solely the index) query in 2:
# #1: query BsoLastModified directly for bso ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intended goal of using this 2 query version? I assume its an optimization to avoid the multiple executions that occur otherwise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, see my description in the issue #118

Copy link
Member Author

@pjenvey pjenvey May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if this solution looks alright under load testing I'd like to relay those results back to Google, in case they have an alternative or a better explanation for the existing single query's perf (I still hope we won't have to do this..)

EPOCH = datetime.datetime.utcfromtimestamp(0)

# a bogus FIND_ITEMS bind param
BOGUS_ID = object()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used (or planned to be used) anywhere outside of the SpannerStorage.init()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only used in _find_items as basically a placeholder

Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have my head around this 100% yet, but some initial comments...

del bind_types["offset"]
# simiarly modified ranges/ttl aren't needed in #2
for param in ("newer", "newer_eq", "older", "older_eq", "ttl"):
params.pop(param, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's quite a big chunk of code in this if block, I wonder if it could be factored out into a helper method of some sort.

fields = params.get("fields")
offset_params = params

if self._bsolm_index_separate and fields is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious why and fields is None here, is it to ensure we only apply this logic in get_items rather than get_item_ids? If so, I wonder if it's worth refactoring _find_items into a couple of helper methods and pushing that logic up into get_items, like:

def get_items(...):
    if self._bsolm_index_separate:
        (params, ids) = self._do_bsolm_separate_query_to_get_the_ids()
    return self._find_items(**params)

def get_item_ids(...):
    return self._find_items(...)

Depends how much effort you want to invest in polishing this though, as opposed to just getting it out in stage to loadtest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, to only apply it to get_items. I like the refactor suggestion but it's a little further complicated by encode_next_offset's need for the original params (offset_params).

I will keep it in mind if this is revisited. This is definitely a "get it out to stage and see" patch


# Setup a 'id IN (:id_1)' bind param for #2 (replacing it
# below)
params["ids"] = [BOGUS_ID]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What stops us from just putting the returned ids in params["id"], and basically just pretending that the caller had requested those ids specifically?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically to utilize spanner's UNNEST (what it's replaced with below) for query 2.

I do have a TODO for the FIND_ITEMS/sqlalchemy generation piece to do this for us. params["ids"] from the client and/or this case should both utilize UNNEST. Unfortunately coercing the sqlalchemy level to generate it for us is a bit involved, thus the current str replace.

Another quick solution for this is having FIND_ITEMS handle UNNEST by returning a str with the appropriate replacements (also not ideal, further straying from the original FIND_ITEMS code..) 🤷‍♂

@rfk
Copy link
Contributor

rfk commented May 16, 2019

@pjenvey do we need to get this merged before loadtesting, or can we do an initial test from this branch before deciding whether to push ahead with it or not?

@pjenvey
Copy link
Member Author

pjenvey commented May 16, 2019

Sorry I missed the last comment until just now -- I can certainly tag this from the branch for an initial load test if more is needed to merge this PR (I'm totally fine proving out the load test before a full merge adding to the existing pile of hacks here)

@rfk
Copy link
Contributor

rfk commented May 16, 2019

I can certainly tag this from the branch for an initial load test if more is needed to merge this PR

Whatever seems simplest for you. I'm happy to r+ with followup bugs for refactors, but if we're not sure yet whether we'll even keep this code, deploying from a branch feels like it might be the simpler path forward overall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a flag for splitting FIND_ITEMS into multiple queries

5 participants