```diff
@@ -6,13 +6,14 @@
 
 
 def _args_match(actual_args: dict, expected_args: dict) -> bool:
-    if sorted(actual_args.get("keywords") or []) != sorted(expected_args.get("keywords") or []):
+    # Only keywords and object_types determine semantic correctness.
+    # limit is optional with a server-side default; emit_widget was renamed to
+    # user_requested_search in the tool schema — neither affects search quality.
+    actual_kw = sorted(k.lower() for k in (actual_args.get("keywords") or []))
+    expected_kw = sorted(k.lower() for k in (expected_args.get("keywords") or []))
+    if actual_kw != expected_kw:
         return False
-    if sorted(actual_args.get("object_types") or []) != sorted(expected_args.get("object_types") or []):
-        return False
-    if actual_args.get("limit") != expected_args.get("limit"):
-        return False
-    return actual_args.get("emit_widget") == expected_args.get("emit_widget")
+    return sorted(actual_args.get("object_types") or []) == sorted(expected_args.get("object_types") or [])
```
Comment on lines +12 to +16
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Harden `_args_match` against malformed JSON argument types.

`parsed_arguments()` returns raw model-emitted JSON, so a malformed tool call can crash this comparison: `{"keywords": [1]}` raises `AttributeError` at `.lower()`, and mixed-type `object_types` such as `["metric", 2]` raise `TypeError` inside `sorted(...)`. Either exception aborts the evaluation instead of recording `tool_correctness=False`. Validate both fields before normalizing them, and treat any value that is not a list of strings as a mismatch.
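The two failure modes can be reproduced with the same expressions the diff uses. This is a minimal sketch; `raised` is a hypothetical helper introduced only for the demonstration, and the argument dicts are made-up examples of malformed model output.

```python
from typing import Callable, Optional


def raised(fn: Callable[[], object]) -> Optional[str]:
    """Hypothetical helper: return the name of the exception fn raises, or None."""
    try:
        fn()
    except Exception as exc:  # report whatever exception type escapes
        return type(exc).__name__
    return None


bad_keywords = {"keywords": [1]}
mixed_types = {"object_types": ["metric", 2]}

# Same expression as the keywords line in the diff: int has no .lower().
keyword_error = raised(lambda: sorted(k.lower() for k in (bad_keywords.get("keywords") or [])))

# Same expression as the object_types line: int and str cannot be ordered.
types_error = raised(lambda: sorted(mixed_types.get("object_types") or []))

print(keyword_error, types_error)  # AttributeError TypeError
```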

Proposed fix
```diff
 def _args_match(actual_args: dict, expected_args: dict) -> bool:
     # Only keywords and object_types determine semantic correctness.
     # limit is optional with a server-side default; emit_widget was renamed to
     # user_requested_search in the tool schema — neither affects search quality.
-    actual_kw = sorted(k.lower() for k in (actual_args.get("keywords") or []))
-    expected_kw = sorted(k.lower() for k in (expected_args.get("keywords") or []))
-    if actual_kw != expected_kw:
+    def _normalize_str_list(value: object, *, lowercase: bool = False) -> list[str] | None:
+        # A missing value means "no entries"; anything other than a list of strings is malformed.
+        if value is None:
+            return []
+        if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
+            return None
+        return sorted(item.lower() if lowercase else item for item in value)
+
+    actual_kw = _normalize_str_list(actual_args.get("keywords"), lowercase=True)
+    expected_kw = _normalize_str_list(expected_args.get("keywords"), lowercase=True)
+    if actual_kw is None or actual_kw != expected_kw:
         return False
-    return sorted(actual_args.get("object_types") or []) == sorted(expected_args.get("object_types") or [])
+    actual_types = _normalize_str_list(actual_args.get("object_types"))
+    expected_types = _normalize_str_list(expected_args.get("object_types"))
+    return actual_types is not None and actual_types == expected_types
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/gooddata-eval/src/gooddata_eval/core/evaluators/search_tool.py`
around lines 12-16: the `_args_match` comparison is not defensive enough and can
crash on malformed tool arguments from `parsed_arguments()`. Update `_args_match` to
validate and normalize `actual_args["keywords"]` and `actual_args["object_types"]`
before lowercasing or sorting, treating any non-string or unexpected entry as a
mismatch instead of raising. Keep the existing matching behavior for valid
inputs, but ensure that bad JSON, such as mixed types or numeric keywords, returns
False rather than aborting evaluation.



```diff
 class SearchToolEvaluator:
```