
feat: add region and subregion columns to country_mapping datasources (IN-1188) #4297

Merged. gaspergrom merged 2 commits into main from feat/IN-1188-country-region-subregion on Jul 3, 2026.

gaspergrom (Contributor) commented Jul 2, 2026

Summary

  • Adds region and subregion String columns (trailing, non-breaking) to the country_mapping_ds and country_mapping_no_flags_ds Tinybird datasources, plus regenerated NDJSON fixtures, so a future Insights pipe can surface continent/sub-continent geography alongside the existing country/timezone mapping.
  • The new fixtures are derived from a pinned snapshot of mledoze/countries (commit 09b28e3d03e6ca3fbbac996d716a50d929781e8c), trimmed down to only the cca2, region, and subregion fields and checked into the repo as services/libs/tinybird/scripts/mledoze_countries_snapshot.json — this keeps fixture regeneration reproducible and independent of network access or upstream drift.
  • A new one-off script, services/libs/tinybird/scripts/generate_country_region_fixtures.js, parses the existing hardcoded (country, flag, code, offset) tuple literal out of pipes/country_mapping.pipe, joins each row against the mledoze snapshot by ISO 3166-1 alpha-2 code, and writes the two NDJSON fixtures. It is not part of any build/deploy path — it's a generator, run once to produce the checked-in fixtures.
  • Out of scope, intentionally: this PR does not wire region/subregion into any pipe or endpoint, and does not touch Insights frontend consumption — both are explicit follow-up work per the ticket. pipes/country_mapping.pipe itself (the source-of-truth pipe with the hardcoded literal) was not modified — it's a separate resource that happens to share a name prefix with the datasources touched here, easy to confuse but out of scope.
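The join step described in the summary can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the helper names `joinRegions` and `toNdjson`, the sample rows, and the fixture field names (in particular `flag`) are assumptions. The "Unknown" fallback for a missing subregion mirrors the behaviour the test plan describes for Antarctic entries.

```javascript
// Join parsed pipe rows against a trimmed snapshot by ISO 3166-1 alpha-2 code.
// Field names are hypothetical; the real fixture columns may differ.
function joinRegions(rows, snapshot) {
  const byCode = new Map(snapshot.map((c) => [c.cca2, c]));
  return rows.map((row) => {
    const entry = byCode.get(row.country_code);
    return {
      ...row,
      // Fall back to "Unknown" when the snapshot has no entry or an empty value.
      region: (entry && entry.region) || 'Unknown',
      subregion: (entry && entry.subregion) || 'Unknown',
    };
  });
}

// NDJSON is one JSON object per line, with a trailing newline.
function toNdjson(records) {
  return records.map((r) => JSON.stringify(r)).join('\n') + '\n';
}

// Illustrative sample data only.
const rows = [
  { country: 'Japan', flag: '🇯🇵', country_code: 'JP', timezone_offset: 9 },
  { country: 'Antarctica', flag: '🇦🇶', country_code: 'AQ', timezone_offset: 0 },
];
const snapshot = [
  { cca2: 'JP', region: 'Asia', subregion: 'Eastern Asia' },
  { cca2: 'AQ', region: 'Antarctic', subregion: '' },
];

const joined = joinRegions(rows, snapshot);
console.log(toNdjson(joined));
```

Spreading `row` first keeps the existing columns in their original order, so `region` and `subregion` land last in each emitted object, in line with the trailing-column layout.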

Changes

| File | What changed |
| --- | --- |
| services/libs/tinybird/datasources/country_mapping_ds.datasource | Added trailing region String, subregion String columns to SCHEMA |
| services/libs/tinybird/datasources/country_mapping_no_flags_ds.datasource | Added trailing region String, subregion String columns to SCHEMA |
| services/libs/tinybird/datasources/fixtures/country_mapping_ds.ndjson | Regenerated, 242 rows, now includes region/subregion |
| services/libs/tinybird/datasources/fixtures/country_mapping_no_flags_ds.ndjson | Regenerated, 242 rows, now includes region/subregion |
| services/libs/tinybird/scripts/generate_country_region_fixtures.js | New one-off generator script: parses country_mapping.pipe's tuple literal, joins against the mledoze snapshot, writes both fixtures |
| services/libs/tinybird/scripts/mledoze_countries_snapshot.json | New pinned snapshot of mledoze/countries at commit 09b28e3d03e6ca3fbbac996d716a50d929781e8c, trimmed to cca2/region/subregion only |

JIRA

IN-1188 — https://linuxfoundation.atlassian.net/browse/IN-1188 (Insights project ticket; the underlying Tinybird resources live in crowd.dev's services/libs/tinybird/, so the implementation PR is here by design)

Deploy order

No deploy ordering constraints — self-contained to this repo. The new columns are trailing/additive on both datasources, and all 14 existing consumer pipes select explicit column tuples rather than SELECT *, so they are unaffected by this change and require no coordinated redeploy.
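The non-breaking claim rests on no consumer pipe using SELECT * against these datasources. One way to spot-check that is sketched below, under stated assumptions: `findSelectStarPipes` is a hypothetical helper, the pipe names and SQL strings are made up for illustration, and a plain regex scan is only a heuristic, not a SQL parser.

```javascript
// Flag pipes that both reference a country_mapping datasource and use SELECT *.
// Heuristic only: a regex scan can miss or over-report unusual SQL.
function findSelectStarPipes(pipes) {
  const referencesMapping = /\bcountry_mapping(?:_no_flags)?_ds\b/;
  const selectStar = /\bselect\s+\*/i;
  return Object.entries(pipes)
    .filter(([, sql]) => referencesMapping.test(sql) && selectStar.test(sql))
    .map(([name]) => name);
}

// Hypothetical pipe contents, keyed by file name.
const pipes = {
  'explicit.pipe':
    'SELECT country, country_code, timezone_offset FROM country_mapping_ds',
  'star.pipe': 'SELECT * FROM country_mapping_no_flags_ds',
  'unrelated.pipe': 'SELECT * FROM some_other_ds',
};

const risky = findSelectStarPipes(pipes);
console.log(risky);
```

In a real check the `pipes` object would be built by reading the files under services/libs/tinybird/pipes/; an empty result would be consistent with the claim above.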

DB migrations

No DB migrations. This is Tinybird datasource schema + fixture work, not a Postgres migration.

Test plan

A full manual QA plan is saved to Obsidian at LFX/qa/IN-1188-country-mapping-region-subregion.md. Summary of what to verify:

  1. Open both .datasource files and confirm region and subregion appear as trailing String columns after timezone_offset.
  2. Using the tb CLI (see services/libs/tinybird/README.md), load both fixtures into a local Tinybird instance:
    • tb datasource append country_mapping_ds services/libs/tinybird/datasources/fixtures/country_mapping_ds.ndjson
    • tb datasource append country_mapping_no_flags_ds services/libs/tinybird/datasources/fixtures/country_mapping_no_flags_ds.ndjson
  3. Confirm both datasources report a row count of 242 after loading.
  4. Query both datasources and confirm: 0 rows have an empty-string region; exactly 3 rows have subregion = "Unknown" — Antarctica (AQ), French Southern Territories (TF), and South Georgia and the South Sandwich Islands (GS). All three should have region = "Antarctic" correctly populated (mledoze itself defines no subregion for Antarctic entries, so "Unknown" is expected and correct here, not a bug).
  5. Spot check a handful of countries across different continents, e.g. US → region Americas, subregion North America; JP → region Asia, subregion Eastern Asia.
  6. Push 1–2 of the existing consumer pipes locally (e.g. contributions_with_local_time.pipe) and confirm they still return correct results unchanged — they select explicit tuples like (country, country_code, timezone_offset), never SELECT *, so the new trailing columns should not affect them.
  7. Diff pipes/country_mapping.pipe against main and confirm it was NOT modified by this PR — it's a separate, unrelated pipe with a hardcoded literal and a confusingly similar name, explicitly out of scope here.
  8. Grep all pipes under services/libs/tinybird/pipes/ for region or subregion and confirm none reference the new columns yet — this ticket intentionally does not expose them via any pipe or endpoint; that's follow-up work.
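Steps 3 and 4 of the plan can also be checked directly against the fixture text. The sketch below assumes each NDJSON row carries `country_code`, `region`, and `subregion` keys; `summarizeFixture` is a hypothetical helper and the sample rows are illustrative, not taken from the real fixtures.

```javascript
// Summarise an NDJSON fixture: row count, empty regions, "Unknown" subregions.
function summarizeFixture(ndjsonText) {
  const rows = ndjsonText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
  return {
    rowCount: rows.length,
    emptyRegion: rows.filter((r) => r.region === '').length,
    unknownSubregionCodes: rows
      .filter((r) => r.subregion === 'Unknown')
      .map((r) => r.country_code)
      .sort(),
  };
}

// Illustrative three-row fixture.
const sampleFixture = [
  { country_code: 'US', region: 'Americas', subregion: 'North America' },
  { country_code: 'JP', region: 'Asia', subregion: 'Eastern Asia' },
  { country_code: 'AQ', region: 'Antarctic', subregion: 'Unknown' },
]
  .map((r) => JSON.stringify(r))
  .join('\n');

const summary = summarizeFixture(sampleFixture);
console.log(summary);
```

Run against a real fixture (for example, text loaded with `fs.readFileSync(path, 'utf8')`), the plan above would expect `rowCount` to be 242, `emptyRegion` to be 0, and `unknownSubregionCodes` to list AQ, GS, and TF.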

Checklist

  • git commit --signoff -S on every commit
  • MIT license header on the new source file (generate_country_region_fixtures.js)
  • PR diff under ~1000 lines of hand-authored change (raw diff is ~1821 lines but ~1736 of those are two generated NDJSON fixtures and a checked-in third-party JSON snapshot; hand-written logic is the two .datasource schema tweaks plus the ~80-line generator script)
  • No unrelated changes bundled
  • Cross-repo impact documented — none for this PR; Insights follow-up will be a separate PR once a pipe/endpoint exposes these columns

https://claude.ai/code/session_01Rk1S72wcPkhp1VPtckKVER


Note

Low Risk
Additive trailing schema and reference-data fixtures only; consumer pipes use explicit column lists so behavior should stay unchanged until follow-up work wires region/subregion.

Overview
Adds region and subregion as trailing columns on country_mapping_ds and country_mapping_no_flags_ds, with JSON ingest paths defined for both the existing fields and the new geography fields.

Regenerates both NDJSON fixtures (~243 rows) so each country keeps the same name, code, flag, and timezone while gaining continent/sub-continent values from a pinned mledoze/countries snapshot joined by ISO cca2. The XX Unknown sentinel row is preserved explicitly because it is not in country_mapping.pipe.

Introduces generate_country_region_fixtures.js (one-off, not on the deploy path) plus checked-in mledoze_countries_snapshot.json for reproducible fixture generation. No pipes or API endpoints are updated in this PR; existing consumers still select explicit tuples like (country, country_code, timezone_offset).

Reviewed by Cursor Bugbot for commit 308b4d9.

Copilot AI review requested due to automatic review settings July 2, 2026 20:50

Copilot AI left a comment

Pull request overview

This PR adds two trailing String columns — region and subregion — to the country_mapping_ds and country_mapping_no_flags_ds Tinybird datasources, and regenerates both NDJSON fixtures (242 rows each) so a future Insights pipe can surface continent/sub-continent geography. The region data is derived by joining the existing hardcoded country list in pipes/country_mapping.pipe against a pinned snapshot of mledoze/countries, via a new one-off generator script. No pipe or endpoint consumes the new columns yet — that is explicit follow-up work.

I verified the PR's non-breaking claim: all 14 consumer pipes (activityRelations_*, contributions_with_local_time) select explicit column tuples (country, country_code, timezone_offset) and none use SELECT *, so the trailing additive columns do not affect them. I also confirmed the generator's output key order matches the datasource column order and that the pipe uses '' (doubled-quote) escaping, which the parser handles correctly.
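For readers unfamiliar with the doubled-quote convention mentioned above, here is a minimal sketch of parsing such tuples. It is not the PR's parser: the tuple shape `('Name', 'flag', 'CC', offset)`, the regex `TUPLE_RE`, the helper names, and the output key order are all assumptions made for illustration.

```javascript
// Assumed tuple shape: ('Name', 'flag', 'CC', offset). The real literal may differ.
// Inside a quoted string, '' stands for one literal single quote.
const TUPLE_RE =
  /\(\s*'((?:[^']|'')*)'\s*,\s*'((?:[^']|'')*)'\s*,\s*'((?:[^']|'')*)'\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/g;

// Collapse each doubled quote back into a single quote.
const unescapeSql = (s) => s.replace(/''/g, "'");

function parseTuples(sqlText) {
  const rows = [];
  for (const m of sqlText.matchAll(TUPLE_RE)) {
    rows.push({
      country: unescapeSql(m[1]),
      flag: unescapeSql(m[2]),
      country_code: unescapeSql(m[3]),
      timezone_offset: Number(m[4]),
    });
  }
  return rows;
}

// Illustrative input containing one escaped quote.
const sample = "('Côte d''Ivoire', '🇨🇮', 'CI', 0), ('Japan', '🇯🇵', 'JP', 9)";
const parsed = parseTuples(sample);
console.log(parsed.map((r) => r.country));
```

Because `[^']` and `''` can never match at the same position, the string pattern does not backtrack ambiguously, and a name such as Côte d'Ivoire is captured whole rather than being cut at the escaped quote.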

Changes:

  • Added trailing region/subregion String columns to both country_mapping_* datasource schemas.
  • Regenerated both NDJSON fixtures (242 rows) with the new columns.
  • Added a one-off generator script plus a pinned mledoze/countries snapshot (cca2/region/subregion only) for reproducible fixture regeneration.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/libs/tinybird/datasources/country_mapping_ds.datasource | Adds trailing region/subregion columns to schema (non-breaking). |
| services/libs/tinybird/datasources/country_mapping_no_flags_ds.datasource | Adds trailing region/subregion columns to schema (non-breaking). |
| services/libs/tinybird/datasources/fixtures/country_mapping_ds.ndjson | Regenerated fixture (242 rows) including region/subregion. |
| services/libs/tinybird/datasources/fixtures/country_mapping_no_flags_ds.ndjson | Regenerated fixture (242 rows) including region/subregion. |
| services/libs/tinybird/scripts/generate_country_region_fixtures.js | New one-off generator: parses pipe tuples, joins snapshot, writes both fixtures. |
| services/libs/tinybird/scripts/mledoze_countries_snapshot.json | New pinned upstream snapshot trimmed to cca2/region/subregion. |


const countries = [];
let match;
while ((match = tupleRegex.exec(pipeContent)) !== null) {
  const unescape = (s) => s.replace(/''/g, "'");
… (IN-1188)

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
gaspergrom force-pushed the feat/IN-1188-country-region-subregion branch from 0a42b24 to 900ede7 on July 3, 2026 09:05
gaspergrom merged commit c5a661c into main on Jul 3, 2026
16 checks passed
gaspergrom deleted the feat/IN-1188-country-region-subregion branch on July 3, 2026 09:26
skwowet pushed a commit that referenced this pull request on Jul 3, 2026
… (IN-1188) (#4297)

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>