Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telemetry] use existed env variable instead of introducing new one #1251

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

louie-tsai
Copy link
Collaborator

@louie-tsai louie-tsai commented Feb 1, 2025

Description

Remove new environment variable and use existed one to disable/enable telemetry feature
PR#1168

Issues

NA

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

no

Tests

manually testing

@louie-tsai louie-tsai force-pushed the otlp_disable_by_default branch 4 times, most recently from 222244b to c6165d6 Compare February 4, 2025 00:20
Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, matches env var used in telemetry code:

GenAIComps/comps$ git grep TELEMETRY | grep py:
__init__.py:if os.getenv("ENABLE_OPEA_TELEMETRY", "false").lower() == "true":
cores/telemetry/opea_telemetry.py:telemetry_endpoint = os.environ.get("TELEMETRY_ENDPOINT", "http://localhost:4318/v1/traces")

Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of enabling/disabling the tracing in comps/init.py, could we do this in the module comps.cores.telemetry.opea_telemetry.opea_telemetry itself.

Doing a simple grep, I found other places just directly import the opea_telemetry.

$ grep -r 'import opea_telemetry' `find -name "*.py"`
./comps/__init__.py:    from comps.cores.telemetry.opea_telemetry import opea_telemetry
./comps/rerankings/src/opea_reranking_microservice.py:from comps.cores.telemetry.opea_telemetry import opea_telemetry
./comps/embeddings/src/opea_embedding_microservice.py:from comps.cores.telemetry.opea_telemetry import opea_telemetry
./comps/cores/mega/orchestrator.py:from ..telemetry.opea_telemetry import opea_telemetry, tracer
./comps/llms/src/text-generation/opea_llm_microservice.py:from comps.cores.telemetry.opea_telemetry import opea_telemetry

@louie-tsai
Copy link
Collaborator Author

Instead of enabling/disabling the tracing in comps/init.py, could we do this in the module comps.cores.telemetry.opea_telemetry.opea_telemetry itself.

Doing a simple grep, I found other places just directly import the opea_telemetry.

$ grep -r 'import opea_telemetry' `find -name "*.py"`
./comps/__init__.py:    from comps.cores.telemetry.opea_telemetry import opea_telemetry
./comps/rerankings/src/opea_reranking_microservice.py:from comps.cores.telemetry.opea_telemetry import opea_telemetry
./comps/embeddings/src/opea_embedding_microservice.py:from comps.cores.telemetry.opea_telemetry import opea_telemetry
./comps/cores/mega/orchestrator.py:from ..telemetry.opea_telemetry import opea_telemetry, tracer
./comps/llms/src/text-generation/opea_llm_microservice.py:from comps.cores.telemetry.opea_telemetry import opea_telemetry

Actually, we probably should remove all Telemetry Endpoint env variables from default compose.yaml and enable it only using an additional telemetry.yaml file.
We used this mechanism in below PR.
opea-project/GenAIExamples#1488
Does it work?

@louie-tsai louie-tsai force-pushed the otlp_disable_by_default branch from c6165d6 to 7568ece Compare February 11, 2025 07:21
@lianhao
Copy link
Collaborator

lianhao commented Feb 11, 2025

Actually, we probably should remove all Telemetry Endpoint env variables from default compose.yaml and enable it only using an additional telemetry.yaml file.
We used this mechanism in below PR.
opea-project/GenAIExamples#1488
Does it work?

I think this PR is not related whether we configure the tracing endpoint through environment variable or command line parameter(I'm fine to any of the option). My point is that we should disable the tracing by default, currently it assumes a tracing endpoint is available at localhost:4318 by default which will result connection refused errors even when users are not set any tracing related configurations:

Below is the excerpt of container logs of the retriever service in latest chatqna example without any tracing configuration enabled.

retriever-redis-server  | The above exception was the direct cause of the following exception:
retriever-redis-server  |
retriever-redis-server  | Traceback (most recent call last):
retriever-redis-server  |   File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
retriever-redis-server  |     resp = conn.urlopen(
retriever-redis-server  |            ^^^^^^^^^^^^^
retriever-redis-server  |   File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retriever-redis-server  |     retries = retries.increment(
retriever-redis-server  |               ^^^^^^^^^^^^^^^^^^
retriever-redis-server  |   File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 519, in increment
retriever-redis-server  |     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
retriever-redis-server  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                   retriever-redis-server  | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError
('<urllib3.connection.HTTPConnection object at 0x7fabf9c97e90>: Failed to establish a new connection: [Errno 111] Connection refused'))  

@xiguiw
Copy link
Collaborator

xiguiw commented Feb 25, 2025

Actually, we probably should remove all Telemetry Endpoint env variables from default compose.yaml and enable it only using an additional telemetry.yaml file.
We used this mechanism in below PR.
opea-project/GenAIExamples#1488
Does it work?

I think this PR is not related whether we configure the tracing endpoint through environment variable or command line parameter(I'm fine to any of the option). My point is that we should disable the tracing by default, currently it assumes a tracing endpoint is available at localhost:4318 by default which will result connection refused errors even when users are not set any tracing related configurations:

Below is the excerpt of container logs of the retriever service in latest chatqna example without any tracing configuration enabled.

retriever-redis-server  | The above exception was the direct cause of the following exception:
retriever-redis-server  |
retriever-redis-server  | Traceback (most recent call last):
retriever-redis-server  |   File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
retriever-redis-server  |     resp = conn.urlopen(
retriever-redis-server  |            ^^^^^^^^^^^^^
retriever-redis-server  |   File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retriever-redis-server  |     retries = retries.increment(
retriever-redis-server  |               ^^^^^^^^^^^^^^^^^^
retriever-redis-server  |   File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 519, in increment
retriever-redis-server  |     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
retriever-redis-server  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                   retriever-redis-server  | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError
('<urllib3.connection.HTTPConnection object at 0x7fabf9c97e90>: Failed to establish a new connection: [Errno 111] Connection refused'))  

There is code confliction. Please help to resolve it.

@louie-tsai
Could you confirm the telemetry is off by default? although it not related to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants