fix: decimal overflow in enriched packages copy #4305
Signed-off-by: anilb <epipav@gmail.com>
Your PR title doesn't contain a Jira issue key. Consider adding one for better traceability.
Pull request overview
This PR fixes a decimal-overflow failure in the ossPackages_enriched Tinybird COPY pipe. The pipe computes per-package release-cadence signals from the versions datasource, where publishedAt is a Nullable(DateTime64(3)) replicated from Postgres. Out-of-range date values (far past/future) in that column caused DateTime64 (Decimal-backed) overflow during the daily COPY job, breaking the enriched dataset build.
The fix targets the ossPackages_enriched_release node by (1) replacing the arraySort(x -> -toUnixTimestamp(x), ...) descending-sort trick with the simpler arrayReverseSort(...) — removing the toUnixTimestamp conversion — and (2) bounding publishedAt to the sane range [1970-01-01, now()+1 day) so pathological dates no longer flow into the sort and downstream dateDiff calls.
Changes:
- Replace `arrayElement(arraySort(x -> -toUnixTimestamp(x), groupArray(publishedAt)), 2)` with `arrayElement(arrayReverseSort(groupArray(publishedAt)), 2)` to compute `secondPublished` (semantically equivalent: newest-first ordering, second element = second-newest).
- Add WHERE guards filtering `publishedAt` to `>= 1970-01-01` and `< now() + INTERVAL 1 DAY`, excluding out-of-range dates that trigger the overflow.
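The two changes can be modeled outside ClickHouse to check the claimed equivalence. The following is a minimal Python sketch, not the pipe's SQL: the function names (`second_published_old`, `second_published_new`, `in_sane_range`) are hypothetical, and returning `None` for groups with fewer than two dates is a modeling choice that does not reproduce how `arrayElement` handles an out-of-range index.

```python
from datetime import datetime, timedelta, timezone

# Lower bound used by the new WHERE guard (1970-01-01, UTC assumed here).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def second_published_old(dates):
    """Model of the old expression: ascending sort on a negated Unix timestamp."""
    ordered = sorted(dates, key=lambda d: -d.timestamp())
    return ordered[1] if len(ordered) >= 2 else None


def second_published_new(dates):
    """Model of the new expression: plain descending sort, no timestamp conversion."""
    ordered = sorted(dates, reverse=True)
    return ordered[1] if len(ordered) >= 2 else None


def in_sane_range(published_at, now):
    """Model of the WHERE guard: not null, >= 1970-01-01, and < now + 1 day."""
    if published_at is None:
        return False
    return EPOCH <= published_at < now + timedelta(days=1)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    raw = [
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        datetime(1900, 1, 1, tzinfo=timezone.utc),  # before epoch: dropped
        datetime(2024, 3, 1, tzinfo=timezone.utc),
        datetime(9999, 1, 1, tzinfo=timezone.utc),  # far future: dropped
        None,                                       # null: dropped
    ]
    valid = [d for d in raw if in_sane_range(d, now)]
    # Both models agree on the second-newest date once bad values are filtered out.
    assert second_published_old(valid) == second_published_new(valid)
```

For in-range inputs both orderings pick the same second-newest date, which is the sense in which the change is described as semantically equivalent; the guard only changes results for rows whose dates were out of range to begin with.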
Note
Low Risk
Narrow change to one analytics pipe node; behavior is equivalent for valid dates and only excludes corrupt or extreme timestamps.
Overview
Fixes the ossPackages_enriched_release Tinybird pipe so the scheduled COPY into `ossPackages_enriched_ds` no longer fails on decimal overflow.
secondPublished no longer uses `arraySort` with negated `toUnixTimestamp(publishedAt)`; it now takes the second-most-recent date via `arrayReverseSort(groupArray(publishedAt))`, avoiding numeric overflow from timestamp negation.
The versions filter now drops out-of-range `publishedAt` values (before epoch and more than one day in the future) in addition to nulls, so bad dates do not propagate into releaseCadence scoring downstream.
Reviewed by Cursor Bugbot for commit 726fd36.