Skip to content

feat: Add elasticsearch instrumentation#882

Open
CagriYonca wants to merge 2 commits into
mainfrom
elasticsearch
Open

feat: Add elasticsearch instrumentation#882
CagriYonca wants to merge 2 commits into
mainfrom
elasticsearch

Conversation

@CagriYonca

Copy link
Copy Markdown
Contributor
  • Added elasticsearch instrumentation.
image

@CagriYonca CagriYonca self-assigned this Jun 18, 2026
@CagriYonca CagriYonca requested a review from a team as a code owner June 18, 2026 09:19
@github-actions

Copy link
Copy Markdown

@CagriYonca the signed-off-by was not found in the following 1 commits:

  • 99fcfab: feat: Add elasticsearch instrumentation

📝 What should I do to fix it?

All proposed commits should include a sign-off in their messages, ideally at the end.

❔ Why it is required

The Developer Certificate of Origin (DCO) is a lightweight way for contributors to certify that they wrote or otherwise have the right to submit the code they are contributing to the project. Here is the full text of the DCO, reformatted for readability:

By making a contribution to this project, I certify that:

a. The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

b. The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

c. The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

d. I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

Contributors sign-off that they adhere to these requirements by adding a Signed-off-by line to commit messages.

This is my commit message

Signed-off-by: Random Developer <randomdeveloper@example.com>

Git even has a -s command line option to append this automatically to your commit message:

$ git commit -s -m 'This is my commit message'

Comment thread src/instana/span/registered_span.py
Comment thread src/instana/span/registered_span.py

# Connection cache to avoid repeated URL parsing and store cluster info
# Structure: {connection_id: {host, port, cluster_name, last_updated}}
_connection_cache: dict[str, dict[str, Any]] = {}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion. You can use defaultdict to create _connection_cache, then you can better manage the usage of the dictionary. But remember, if asking for a missing key _connection_cache, it will return an empty dictionary. WDYT?

Suggested change
_connection_cache: dict[str, dict[str, Any]] = {}
_connection_cache: dict[str, dict[str, Any]] = defaultdict(dict)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

Comment on lines +55 to +63
if connection_id in _connection_cache:
cached = _connection_cache[connection_id]
cluster_name = cached.get("cluster_name")
if (
cluster_name
and (time.time() - cached.get("last_updated", 0))
< CLUSTER_NAME_CACHE_TTL
):
return cluster_name

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using defaultdict as suggested before, you can reduce this code to:

Suggested change
if connection_id in _connection_cache:
cached = _connection_cache[connection_id]
cluster_name = cached.get("cluster_name")
if (
cluster_name
and (time.time() - cached.get("last_updated", 0))
< CLUSTER_NAME_CACHE_TTL
):
return cluster_name
if cached := _connection_cache[connection_id]:
if (cluster_name := cached.get("cluster_name")) and (time.time() - cached.get("last_updated", 0) < CLUSTER_NAME_CACHE_TTL:
return cluster_name

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the walrus operator make it harder to read?

Comment thread src/instana/instrumentation/elasticsearch.py Outdated
Comment on lines +87 to +89
cached = _get_cached_cluster_name(connection_id)
if cached:
return cached

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cached = _get_cached_cluster_name(connection_id)
if cached:
return cached
if cached := _get_cached_cluster_name(connection_id):
return cached

Comment on lines +95 to +96
cluster_name = _extract_cluster_name_from_response(instance.info())
if cluster_name:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cluster_name = _extract_cluster_name_from_response(instance.info())
if cluster_name:
if cluster_name := _extract_cluster_name_from_response(instance.info())

Comment on lines +230 to +243
url_lower = url.lower()

# Multi-operations
if "_msearch" in url_lower:
return "msearch"
if "_mget" in url_lower:
return "mget"
if "_bulk" in url_lower:
return "bulk"

# Search operations
if "_search" in url_lower:
return "search"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SonarCloud reported a Cognitive Complexity in this function due to the many if statements. Why not use something like:

Suggested change
url_lower = url.lower()
# Multi-operations
if "_msearch" in url_lower:
return "msearch"
if "_mget" in url_lower:
return "mget"
if "_bulk" in url_lower:
return "bulk"
# Search operations
if "_search" in url_lower:
return "search"
url_lower = url.lower()
if url_lower.startswith("_"):
return url_lower[1,]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Response status code
if hasattr(response, "meta") and hasattr(response.meta, "status"):
status_code = response.meta.status
span.set_attribute(SpanAttributes.HTTP_STATUS_CODE, status_code)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we set the HTTP status code attribute since this is a DB-type operation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it uses urllib3 to make DB calls but semantically DB span shouldn't have HTTP_STATUS_CODE. It's been removed

Signed-off-by: Cagri Yonca <cagri@ibm.com>
@CagriYonca CagriYonca force-pushed the elasticsearch branch 2 times, most recently from eb46efa to 4b3de8a Compare June 23, 2026 15:20
@CagriYonca CagriYonca requested a review from pvital June 23, 2026 15:23
Signed-off-by: Cagri Yonca <cagri@ibm.com>
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants