feat: add region and subregion columns to country_mapping datasources (IN-1188) #4297
Pull request overview
This PR adds two trailing String columns — region and subregion — to the country_mapping_ds and country_mapping_no_flags_ds Tinybird datasources, and regenerates both NDJSON fixtures (242 rows each) so a future Insights pipe can surface continent/sub-continent geography. The region data is derived by joining the existing hardcoded country list in pipes/country_mapping.pipe against a pinned snapshot of mledoze/countries, via a new one-off generator script. No pipe or endpoint consumes the new columns yet — that is explicit follow-up work.
I verified the PR's non-breaking claim: all 14 consumer pipes (activityRelations_*, contributions_with_local_time) select explicit column tuples (country, country_code, timezone_offset) and none use SELECT *, so the trailing additive columns do not affect them. I also confirmed the generator's output key order matches the datasource column order and that the pipe uses '' (doubled-quote) escaping, which the parser handles correctly.
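The "no `SELECT *`" check described above can be approximated mechanically. The following is a minimal sketch, not the reviewer's actual method: `findWildcardSelects` is a hypothetical helper, and the pipe contents in the example are invented for illustration.

```javascript
// Hypothetical helper: returns the names of pipes whose SQL contains a
// wildcard select (SELECT * or SELECT alias.*). `pipes` maps a pipe name to
// its file content, assumed to have been read from disk already.
function findWildcardSelects(pipes) {
  // No `g` flag, so .test() carries no state between calls.
  const wildcard = /\bselect\s+(?:\w+\.)?\*/i;
  return Object.entries(pipes)
    .filter(([, sql]) => wildcard.test(sql))
    .map(([name]) => name);
}

// Invented sample contents, for illustration only.
const pipes = {
  explicit: 'SELECT country, country_code, timezone_offset FROM country_mapping_ds',
  wildcard: 'SELECT * FROM country_mapping_ds',
  aggregate: 'SELECT count(*) FROM country_mapping_ds',
};
console.log(findWildcardSelects(pipes));
```

An empty result across all consumer pipes would support the non-breaking claim. A plain-text scan like this does not understand SQL comments or string literals, so a hit still needs a manual look.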
Changes:
- Added trailing `region`/`subregion` `String` columns to both `country_mapping_*` datasource schemas.
- Regenerated both NDJSON fixtures (242 rows) with the new columns.
- Added a one-off generator script plus a pinned `mledoze/countries` snapshot (cca2/region/subregion only) for reproducible fixture regeneration.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `services/libs/tinybird/datasources/country_mapping_ds.datasource` | Adds trailing region/subregion columns to schema (non-breaking). |
| `services/libs/tinybird/datasources/country_mapping_no_flags_ds.datasource` | Adds trailing region/subregion columns to schema (non-breaking). |
| `services/libs/tinybird/datasources/fixtures/country_mapping_ds.ndjson` | Regenerated fixture (242 rows) including region/subregion. |
| `services/libs/tinybird/datasources/fixtures/country_mapping_no_flags_ds.ndjson` | Regenerated fixture (242 rows) including region/subregion. |
| `services/libs/tinybird/scripts/generate_country_region_fixtures.js` | New one-off generator: parses pipe tuples, joins snapshot, writes both fixtures. |
| `services/libs/tinybird/scripts/mledoze_countries_snapshot.json` | New pinned upstream snapshot trimmed to cca2/region/subregion. |
    const countries = [];
    let match;
    while ((match = tupleRegex.exec(pipeContent)) !== null) {
      const unescape = (s) => s.replace(/''/g, "'");
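The excerpt above stops at the loop header, and the script's real `tupleRegex` is not visible in this view. As a rough illustration of how doubled-quote (`''`) escaping can be handled when parsing `(country, flag, code, offset)` tuples, here is a minimal sketch. The regex, the `parseTuples` name, the integer offset, and the sample input are all assumptions, not the PR's code.

```javascript
// Assumed pattern: three single-quoted strings (which may contain '' as an
// escaped quote) followed by an integer offset. The real script may differ.
const quoted = "'((?:[^']|'')*)'";
const tupleRegex = new RegExp(
  `\\(\\s*${quoted}\\s*,\\s*${quoted}\\s*,\\s*${quoted}\\s*,\\s*(-?\\d+)\\s*\\)`,
  'g'
);

// Same unescape rule as in the excerpt: '' inside a literal means one quote.
const unescape = (s) => s.replace(/''/g, "'");

// Hypothetical helper: extracts every tuple from the pipe's SQL text.
function parseTuples(pipeContent) {
  const countries = [];
  for (const match of pipeContent.matchAll(tupleRegex)) {
    countries.push({
      country: unescape(match[1]),
      flag: unescape(match[2]),
      country_code: match[3],
      timezone_offset: Number(match[4]),
    });
  }
  return countries;
}

// Invented sample literal, for illustration only.
const sample = "('Cote d''Ivoire', '🇨🇮', 'CI', 0), ('Japan', '🇯🇵', 'JP', 9)";
console.log(parseTuples(sample));
```

Matching `''` inside the quoted group, rather than stopping at the first quote, is what keeps names containing an apostrophe from being cut short.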
… (IN-1188) Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
0a42b24 to 900ede7
… (IN-1188) (#4297) Signed-off-by: Gašper Grom <gasper.grom@gmail.com> Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Summary
- Adds `region` and `subregion` String columns (trailing, non-breaking) to the `country_mapping_ds` and `country_mapping_no_flags_ds` Tinybird datasources, plus regenerated NDJSON fixtures, so a future Insights pipe can surface continent/sub-continent geography alongside the existing country/timezone mapping.
- Region data comes from a pinned snapshot of mledoze/countries (commit `09b28e3d03e6ca3fbbac996d716a50d929781e8c`), trimmed down to only the `cca2`, `region`, and `subregion` fields and checked into the repo as `services/libs/tinybird/scripts/mledoze_countries_snapshot.json`. This keeps fixture regeneration reproducible and independent of network access or upstream drift.
- A one-off generator script, `services/libs/tinybird/scripts/generate_country_region_fixtures.js`, parses the existing hardcoded `(country, flag, code, offset)` tuple literal out of `pipes/country_mapping.pipe`, joins each row against the mledoze snapshot by ISO 3166-1 alpha-2 code, and writes the two NDJSON fixtures. It is not part of any build/deploy path; it is a generator, run once to produce the checked-in fixtures.
- `pipes/country_mapping.pipe` itself (the source-of-truth pipe with the hardcoded literal) was not modified. It is a separate resource that happens to share a name prefix with the datasources touched here: easy to confuse, but out of scope.
Changes
| File | Description |
|---|---|
| `services/libs/tinybird/datasources/country_mapping_ds.datasource` | Adds `region String, subregion String` columns to SCHEMA |
| `services/libs/tinybird/datasources/country_mapping_no_flags_ds.datasource` | Adds `region String, subregion String` columns to SCHEMA |
| `services/libs/tinybird/datasources/fixtures/country_mapping_ds.ndjson` | Regenerated with the new columns |
| `services/libs/tinybird/datasources/fixtures/country_mapping_no_flags_ds.ndjson` | Regenerated with the new columns |
| `services/libs/tinybird/scripts/generate_country_region_fixtures.js` | Parses `country_mapping.pipe`'s tuple literal, joins against the mledoze snapshot, writes both fixtures |
| `services/libs/tinybird/scripts/mledoze_countries_snapshot.json` | mledoze/countries snapshot pinned at `09b28e3d03e6ca3fbbac996d716a50d929781e8c`, trimmed to cca2/region/subregion only |
JIRA
IN-1188: https://linuxfoundation.atlassian.net/browse/IN-1188 (Insights project ticket; the underlying Tinybird resources live in crowd.dev's `services/libs/tinybird/`, so the implementation PR is here by design)
Deploy order
No deploy ordering constraints; the change is self-contained to this repo. The new columns are trailing/additive on both datasources, and all 14 existing consumer pipes select explicit column tuples rather than `SELECT *`, so they are unaffected by this change and require no coordinated redeploy.
DB migrations
No DB migrations. This is Tinybird datasource schema + fixture work, not a Postgres migration.
Test plan
A full manual QA plan is saved to Obsidian at `LFX/qa/IN-1188-country-mapping-region-subregion.md`. Summary of what to verify:
- Open both `.datasource` files and confirm `region` and `subregion` appear as trailing `String` columns after `timezone_offset`.
- Using the `tb` CLI (see `services/libs/tinybird/README.md`), load both fixtures into a local Tinybird instance:
  - `tb datasource append country_mapping_ds services/libs/tinybird/datasources/fixtures/country_mapping_ds.ndjson`
  - `tb datasource append country_mapping_no_flags_ds services/libs/tinybird/datasources/fixtures/country_mapping_no_flags_ds.ndjson`
- … `region`; exactly 3 rows have `subregion = "Unknown"`: Antarctica (AQ), French Southern Territories (TF), and South Georgia and the South Sandwich Islands (GS). All three should have `region = "Antarctic"` correctly populated (mledoze itself defines no subregion for Antarctic entries, so "Unknown" is expected and correct here, not a bug).
- … region `Americas`, subregion `North America`; JP → region `Asia`, subregion `Eastern Asia`.
- Run the existing consumer pipes (e.g. `contributions_with_local_time.pipe`) and confirm they still return correct results unchanged. They select explicit tuples like `(country, country_code, timezone_offset)`, never `SELECT *`, so the new trailing columns should not affect them.
- Diff `pipes/country_mapping.pipe` against `main` and confirm it was NOT modified by this PR. It is a separate, unrelated pipe with a hardcoded literal and a confusingly similar name, explicitly out of scope here.
- Search `services/libs/tinybird/pipes/` for `region` or `subregion` and confirm none reference the new columns yet. This ticket intentionally does not expose them via any pipe or endpoint; that is follow-up work.
Checklist
- … `git commit --signoff -S` on every commit
- … `generate_country_region_fixtures.js`)
- … `.datasource` schema tweaks plus the ~80-line generator script)
https://claude.ai/code/session_01Rk1S72wcPkhp1VPtckKVER
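The fixture checks in the test plan (populated `region`, a known set of rows with `subregion = "Unknown"`) could also be scripted instead of eyeballed. This is a minimal sketch under stated assumptions: `summarizeFixture` is a hypothetical helper, the key names `country_code`, `region`, and `subregion` are taken from the PR text, and the sample rows are invented.

```javascript
// Hypothetical helper: summarizes an NDJSON fixture given as a string.
function summarizeFixture(ndjsonText) {
  const rows = ndjsonText
    .split('\n')
    .filter((line) => line.trim() !== '') // tolerate a trailing newline
    .map((line) => JSON.parse(line));
  return {
    rowCount: rows.length,
    // Codes of rows whose region is missing or empty.
    missingRegion: rows.filter((r) => !r.region).map((r) => r.country_code),
    // Codes of rows whose subregion fell back to "Unknown", sorted.
    unknownSubregion: rows
      .filter((r) => r.subregion === 'Unknown')
      .map((r) => r.country_code)
      .sort(),
  };
}

// Invented two-row sample; in practice the text would come from
// fs.readFileSync on one of the checked-in .ndjson fixtures.
const sampleFixture = [
  '{"country":"Japan","country_code":"JP","timezone_offset":9,"region":"Asia","subregion":"Eastern Asia"}',
  '{"country":"Antarctica","country_code":"AQ","timezone_offset":0,"region":"Antarctic","subregion":"Unknown"}',
].join('\n');
console.log(summarizeFixture(sampleFixture));
```

Against the real fixtures, `unknownSubregion` would be compared with the AQ, GS, TF set from the test plan. If the fixture also carries a sentinel row, that row may show up in the list too and would need to be accounted for separately.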
Note
Low Risk
Additive trailing schema and reference-data fixtures only; consumer pipes use explicit column lists so behavior should stay unchanged until follow-up work wires region/subregion.
Overview
Adds `region` and `subregion` as trailing columns on `country_mapping_ds` and `country_mapping_no_flags_ds`, with JSON ingest paths on existing fields and the new geography fields.
Regenerates both NDJSON fixtures (~243 rows) so each country keeps the same name, code, flag, and timezone while gaining continent/sub-continent values from a pinned mledoze/countries snapshot joined by ISO `cca2`. The `XX` Unknown sentinel row is preserved explicitly because it is not in `country_mapping.pipe`.
Introduces `generate_country_region_fixtures.js` (one-off, not on the deploy path) plus checked-in `mledoze_countries_snapshot.json` for reproducible fixture generation. No pipes or API endpoints are updated in this PR; existing consumers still select explicit tuples like `(country, country_code, timezone_offset)`.
Reviewed by Cursor Bugbot for commit 308b4d9.
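The join and sentinel handling summarized above might look roughly like the following. This is a sketch, not the script's actual code: `joinRegions` and `toNdjson` are hypothetical names, the `"Unknown"` fallback for a missing `region` is an assumption (the PR text only confirms it for `subregion`), and the sentinel's field values are placeholders.

```javascript
// Hypothetical join: attach region/subregion from the snapshot by cca2.
function joinRegions(countries, snapshot, sentinel) {
  // Index the snapshot by ISO 3166-1 alpha-2 code for direct lookups.
  const byCode = new Map(snapshot.map((entry) => [entry.cca2, entry]));
  const rows = countries.map((c) => {
    const hit = byCode.get(c.country_code);
    return {
      ...c, // existing keys keep their order; new keys trail them
      region: (hit && hit.region) || 'Unknown', // assumed fallback
      subregion: (hit && hit.subregion) || 'Unknown',
    };
  });
  // The sentinel is not in the pipe literal, so it is appended explicitly.
  return sentinel ? [...rows, sentinel] : rows;
}

// One JSON object per line; the no-flags variant drops the flag key.
function toNdjson(rows, includeFlag) {
  const lines = rows.map((row) => {
    if (includeFlag) return JSON.stringify(row);
    const { flag, ...withoutFlag } = row;
    return JSON.stringify(withoutFlag);
  });
  return lines.join('\n') + '\n';
}

// Invented sample data; the sentinel's values are placeholders.
const countries = [
  { country: 'Japan', flag: '🇯🇵', country_code: 'JP', timezone_offset: 9 },
  { country: 'Antarctica', flag: '🇦🇶', country_code: 'AQ', timezone_offset: 0 },
];
const snapshot = [
  { cca2: 'JP', region: 'Asia', subregion: 'Eastern Asia' },
  { cca2: 'AQ', region: 'Antarctic', subregion: '' },
];
const sentinel = {
  country: 'Unknown', flag: '', country_code: 'XX', timezone_offset: 0,
  region: 'Unknown', subregion: 'Unknown',
};
console.log(toNdjson(joinRegions(countries, snapshot, sentinel), false));
```

Spreading the original row first keeps the new keys trailing, which mirrors the trailing-column placement in the datasource schemas.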