Add search_dataset and list_dataset_fields tools #142

Merged
meirk-brd merged 9 commits into main from feat/search-dataset-tool on Jun 4, 2026

Conversation

@meirk-brd

Collaborator

Summary

Adds two read-only MCP tools that let an LLM search Bright Data LinkedIn datasets by filter criteria (the fast, Elasticsearch-backed search API), complementing the existing web_data_* tools that fetch a single record by URL.

  • search_dataset(dataset_id, filter, size?, sort?, search_after?): POST /datasets/search/:dataset_id in sync mode; returns {hits, total_hits, took, search_after?} directly (no snapshot/polling). size is capped at 10 (default 10).
  • list_dataset_fields(dataset_id): GET /datasets/:dataset_id/metadata; returns a compact {name, type, description} list of active fields. Field discovery is lazy, so the per-dataset schema costs no tokens until the LLM actually decides to filter.
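
To make the search_dataset call shape concrete, here is a minimal sketch of how the tool arguments might map onto the HTTP request. The helper name build_search_request, the placeholder dataset id, and the exact body layout are assumptions for illustration, not the code in this PR; only the POST path, the argument names, and the size cap of 10 come from the description above. The sketch clamps size, whereas the real tool may reject oversized values at schema-validation time instead.

```javascript
// Hypothetical helper, not the PR's implementation.
const MAX_SIZE = 10; // cap and default stated in the PR description

function build_search_request({dataset_id, filter, size, sort, search_after}) {
    // Fall back to the default when size is missing or not an integer,
    // then clamp into the [1, MAX_SIZE] range.
    const requested = Number.isInteger(size) ? size : MAX_SIZE;
    const body = {filter, size: Math.min(Math.max(requested, 1), MAX_SIZE)};
    // sort and search_after are passed through untouched when provided.
    if (sort !== undefined)
        body.sort = sort;
    if (search_after !== undefined)
        body.search_after = search_after;
    return {
        method: 'POST',
        path: `/datasets/search/${encodeURIComponent(dataset_id)}`,
        body,
    };
}

// Usage with a placeholder dataset id and an assumed leaf-filter shape.
const example_request = build_search_request({
    dataset_id: 'gd_example_dataset_id',
    filter: {name: 'city', operator: '=', value: 'Berlin'},
    size: 50,
});
console.log(example_request.path, example_request.body.size);
```

Because the response is returned in sync mode, a caller would page by feeding the search_after value from one response into the next request, with no snapshot polling in between.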

Supported datasets (enum-validated): LinkedIn people profiles, people profiles (contact-enriched), and company information.

Design choices:

  • Thin pass-through filter — a recursive Zod schema mirrors the API and is capped at nesting depth 3; the API validates semantics.
  • Shared schema + a pure metadata_to_fields transform live in search_dataset_schema.js so they're unit-testable without booting the server.
  • Both tools registered in the social and business tool groups; new module added to package.json files.
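
For readers who want a feel for the depth cap without reading the Zod schema, the sketch below checks nesting depth in plain JavaScript. It is not the recursive Zod schema from search_dataset_schema.js. The filter shape (leaf {name, operator, value}, group {operator, filters: [...]}), the function names, and the counting convention (a leaf counts as depth 1) are all assumptions made for illustration.

```javascript
// Hypothetical depth check; the PR enforces this limit inside a Zod schema.
const MAX_DEPTH = 3; // nesting cap stated in the PR description

// Assumed shapes: leaf = {name, operator, value}, group = {operator, filters: [...]}.
function filter_depth(node) {
    if (node === null || typeof node !== 'object')
        throw new TypeError('filter node must be an object');
    if (!Array.isArray(node.filters))
        return 1; // a leaf filter counts as depth 1 (assumed convention)
    // A group is one level deeper than its deepest child; an empty group counts as 1.
    return 1 + Math.max(0, ...node.filters.map(child => filter_depth(child)));
}

function is_filter_within_depth(filter, max_depth = MAX_DEPTH) {
    return filter_depth(filter) <= max_depth;
}

// Usage: a group of groups of leaves sits exactly at the limit.
const leaf = {name: 'followers', operator: '>', value: 1000};
const and_group = children => ({operator: 'and', filters: children});
console.log(is_filter_within_depth(and_group([and_group([leaf])])));
console.log(is_filter_within_depth(and_group([and_group([and_group([leaf])])])));
```

Only the structure is checked here. As the design note says, whether a given field and operator combination makes sense is left to the API.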

Test Plan

  • Unit tests for the dataset-id enum, operator list, recursive filter schema (depth-3 accepted / depth-4 rejected), and metadata transform — npm test (16/16 pass)
  • node --check on server.js / search_dataset_schema.js / tool_groups.js
  • Live verification against the Bright Data API (both search and metadata endpoints confirmed working)
  • Reviewer: confirm tool grouping behaves as expected with GROUPS=social / GROUPS=business
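
As a rough illustration of what the metadata-transform unit test exercises, here is a sketch of a pure transform together with a tiny check. The function name metadata_to_fields_sketch, the assumed response shape ({fields: {<name>: {type, description, active}}}), and the choice to treat a missing active flag as active are assumptions; the actual metadata_to_fields in this PR may differ.

```javascript
// Hypothetical re-implementation for illustration only.
// Assumed metadata shape: {fields: {<name>: {type, description, active}}}.
function metadata_to_fields_sketch(metadata) {
    const fields = metadata && typeof metadata.fields === 'object' && metadata.fields !== null
        ? metadata.fields
        : {};
    return Object.entries(fields)
        // Keep only active fields; a missing flag is treated as active (assumption).
        .filter(([, def]) => def && def.active !== false)
        // Emit the compact {name, type, description} form described in the summary.
        .map(([name, def]) => ({name, type: def.type, description: def.description ?? ''}));
}

// A small check against made-up sample metadata.
const sample_metadata = {
    fields: {
        name: {type: 'text', description: 'Full name', active: true},
        legacy_id: {type: 'text', description: 'Deprecated', active: false},
        followers: {type: 'number', active: true},
    },
};
const compact_fields = metadata_to_fields_sketch(sample_metadata);
if (compact_fields.length !== 2)
    throw new Error('inactive fields should be dropped');
console.log(compact_fields.map(f => f.name).join(', '));
```

Keeping the transform free of network and server state is what lets a test like this run without booting the MCP server.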

@egoriklok


Public no-secret MCP Buyer-Agent Readiness Snapshot for brightdata-mcp:

  • Public signal: Public MCP candidate brightdata-mcp: A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access. Matched R1 terms: agent, automation, browser, github, mcp, server, tool.
  • R1 fit: agent, automation, browser, github, mcp, server, tool.
  • Readiness status: public evidence review needed before an autonomous buyer-agent should rely on this surface.
  • Blind spot 1: explicit auth scopes and delegated-permission boundary.
  • Blind spot 2: spend/API cost cap plus approval semantics before paid actions.
  • Blind spot 3: receipt, audit-log, revocation, or dispute evidence for safe buyer-agent use.
  • Single next question: For brightdata-mcp, is there already a documented policy for agent spend/auth limits, receipt evidence, and revocation before a buyer-agent can invoke it?

No secrets, invoice, payment link, delivery link, private endpoint, paid call, or wallet signature; this is only a free public snapshot.

@meirk-brd merged commit a62650a into main on Jun 4, 2026
1 check passed

2 participants