Skip to content

Lookup submision IDs by strategic metadata#112

Open
jerbaroo wants to merge 7 commits into
masterfrom
jerbaroo-64-lookup-by-strategic-metadata
Open

Lookup submision IDs by strategic metadata#112
jerbaroo wants to merge 7 commits into
masterfrom
jerbaroo-64-lookup-by-strategic-metadata

Conversation

@jerbaroo

@jerbaroo jerbaroo commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

This PR adds the lookup_submission_ids_by_strategic_metadata method to ProducerClient, which will return the submission IDs of in-progress subsmissions that match ALL of the provided strategic metadata key-value pairs.

    def lookup_submission_ids_by_strategic_metadata(
        self, strategic_metadata: dict[str, int]
    ) -> list[SubmissionId]:
        """Attempts to find in-progress submissions where the strategic metadata
        of that submission includes all of the key-value pairs of the given
        'strategic_metadata'. A matching submission must include all of the
        given key-value pairs, but it may also contain other key-value pairs.

        Raises:
        - `LookupIdsWithEmptyStrategicMetadataError` if the provided
          'strategic_metadata' contained no key-value pairs to look for.

        """

@jerbaroo jerbaroo requested a review from Qqwy June 8, 2026 15:58
@jerbaroo jerbaroo force-pushed the jerbaroo-64-lookup-by-strategic-metadata branch from 3b0ca52 to 6f9e1e3 Compare June 8, 2026 15:58
@ReinierMaas

Copy link
Copy Markdown
Contributor

Open question shouldn't an empty StrategicMetadataMap return all submissions instead of raising an error on that?
Also this is an easy way to DDoS OpsQueue if the number of returned submissions is large.

@jerbaroo

Copy link
Copy Markdown
Contributor Author

Open question shouldn't an empty StrategicMetadataMap return all submissions instead of raising an error on that?

It depends on the API contract we want to provide. An argument in favour of throwing an error is that if a caller is using an API to "search for submissions matching some strategic metadata" but they haven't provided any strategic metadata, then they possibly have an issue in their code that they should be alerted to. Also, the SQLite query will need a special case to handle this, more code for something we don't have evidence for that it is a usecase we want to support.

@ReinierMaas ReinierMaas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comment and would like to point out that for the other queries we also assert the query plan. This allows us to spot potentially very bad execution behaviour.

Comment on lines +382 to +384
Raises:
- `LookupIdsWithEmptyStrategicMetadataError` if the provided
'strategic_metadata' contained no key-value pairs to look for.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this potentially also raise:

- `InternalProducerClientError` if there is a low-level internal error.

Comment on lines +374 to +376
def lookup_submission_ids_by_strategic_metadata(
self, strategic_metadata: dict[str, int]
) -> list[SubmissionId]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the other APIs on the producer and they either provide a single element or an iterator so that it can be lazily evaluated and doesn't need to be materialized in memory all at once.

async fn lookup_submission_ids_by_strategic_metadata(
State(state): State<ServerState>,
extract::Json(strategic_metadata): extract::Json<StrategicMetadataMap>,
) -> Result<Json<Vec<SubmissionId>>, ServerError> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our SubmissionIds are close to u64::max in serialized size, which is reasonable for our random (snowflake) identifiers. This issues the following serialized to JSON response size:

  • 1 ([) + elements * 20 + elements (,|]) characters

For ~1 million this is ~20MB. We can put that as an configurable upper bound to return to the client with an explicit error if it goes over that so that people can update the settings and return the larger response knowingly.

Comment on lines +382 to +384
Raises:
- `LookupIdsWithEmptyStrategicMetadataError` if the provided
'strategic_metadata' contained no key-value pairs to look for.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change the interface to raise an explicit error if there are too many matching Submissions, we need that for DoS protection anyway, but allow an empty strategic_metadata set to be processed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support checking whether a submission currently exists in the queue for a particular piece of strategic metadata

2 participants