-
Notifications
You must be signed in to change notification settings - Fork 81
Open
Copy link
Labels
needs-triageNeeds TriageNeeds Triage
Description
Is there an existing issue for this?
- I have searched the existing issues
Category of Bug / Issue
Profiler bug
Current Behavior
When I configured the pipeline_config.yml with a sql script that returns no rows, the profiler crashes:
e.g.
- pipeline_config.yml
- name: test_nulloutput
type: sql
extract_source: src/databricks/labs/lakebridge/resources/assessments/oracle/test_nulloutput.sql
mode: overwrite
frequency: once
flag: active
- test_nulloutput.sql (Oracle dialect)
select 'value' as X from dual where 1=2
This crashed the following unit test:
def test_Oracle_profile_execution() -> None:
"""Test successful profiling execution using actual pipeline configuration"""
profiler = Profiler("oracle")
path_prefix = Path(__file__).parent / "../../../"
config_file = path_prefix / "src/databricks/labs/lakebridge/resources/assessments/oracle/pipeline_config.yml"
config = profiler.path_modifier(config_file=config_file, path_prefix=path_prefix)
profiler.profile(pipeline_config=config)
assert Path("/tmp/data/oracle_assessment/profiler_extract.db").exists(), "Profiler extract database should be created"
(See exception details in the related section below)
Expected Behavior
No response
Steps To Reproduce
No response
Relevant log output or Exception details
ERROR [root] SQL execution failed: Invalid Input Error: executemany requires a non-empty list of parameter sets to be provided
10:11:13 INFO [d.l.l.assessments.pipeline] Step 'test_nulloutput' completed with status: ERROR
10:11:13 ERROR [d.l.l.assessments.pipeline] Step test_nulloutput failed with error: SQL execution failed: Invalid Input Error: executemany requires a non-empty list of parameter sets to be provided
10:11:13 ERROR [d.l.l.assessments.pipeline] Pipeline execution failed due to errors in steps: test_nulloutput
10:11:13 ERROR [d.l.l.assessments.profiler] Error executing pipeline for source oracle: Pipeline execution failed due to errors in steps: test_nulloutput
FAILED
tests/integration/assessments/test_profiler.py:24 (test_Oracle_profile_execution)
self = <databricks.labs.lakebridge.assessments.profiler.Profiler object at 0x110255ea0>
platform = 'oracle'
pipeline_config = PipelineConfig(name='oracle_assessment', version='1.0', extract_folder='/tmp/data/oracle_assessment', comment=None, st...sments/oracle/test_nulloutput.sql', mode='overwrite', frequency='once', flag='active', dependencies=[], comment=None)])
extractor = <databricks.labs.lakebridge.connections.database_manager.DatabaseManager object at 0x110288370>
def _execute(self, platform: str, pipeline_config: PipelineConfig, extractor=None) -> None:
try:
if extractor is None:
extractor = Profiler._setup_extractor(platform)
> result = PipelineClass(pipeline_config, extractor).execute()
src/databricks/labs/lakebridge/assessments/profiler.py:74:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <databricks.labs.lakebridge.assessments.pipeline.PipelineClass object at 0x110288670>
def execute(self) -> list[StepExecutionResult]:
logging.info(f"Pipeline initialized with config: {self.config.name}, version: {self.config.version}")
execution_results: list[StepExecutionResult] = []
error_flag = False
for step in self.config.steps:
logger.info(f"Executing step: {step.name}")
result = self._process_step(step)
execution_results.append(result)
logger.info(f"Step '{step.name}' completed with status: {result.status}")
# Check step execution status
if result.status == StepExecutionStatus.ERROR:
logger.error(f"Step {result.step_name} failed with error: {result.error_message}")
error_flag = True
elif result.status == StepExecutionStatus.SKIPPED:
logger.info(f"Step {result.step_name} was skipped.")
else:
logger.info(f"Step {result.step_name} has completed successfully.")
if error_flag:
failed_steps = [r for r in execution_results if r.status == StepExecutionStatus.ERROR]
error_msg = (
f"Pipeline execution failed due to errors in steps: {', '.join(r.step_name for r in failed_steps)}"
)
logger.error(error_msg)
> raise RuntimeError(error_msg)
E RuntimeError: Pipeline execution failed due to errors in steps: test_nulloutput
src/databricks/labs/lakebridge/assessments/pipeline.py:71: RuntimeError
The above exception was the direct cause of the following exception:
def test_Oracle_profile_execution() -> None:
"""Test successful profiling execution using actual pipeline configuration"""
profiler = Profiler("oracle")
path_prefix = Path(__file__).parent / "../../../"
config_file = path_prefix / "src/databricks/labs/lakebridge/resources/assessments/oracle/pipeline_config.yml"
config = profiler.path_modifier(config_file=config_file, path_prefix=path_prefix)
> profiler.profile(pipeline_config=config)
tests/integration/assessments/test_profiler.py:31:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/databricks/labs/lakebridge/assessments/profiler.py:59: in profile
self._execute(platform, pipeline_config, extractor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <databricks.labs.lakebridge.assessments.profiler.Profiler object at 0x110255ea0>
platform = 'oracle'
pipeline_config = PipelineConfig(name='oracle_assessment', version='1.0', extract_folder='/tmp/data/oracle_assessment', comment=None, st...sments/oracle/test_nulloutput.sql', mode='overwrite', frequency='once', flag='active', dependencies=[], comment=None)])
extractor = <databricks.labs.lakebridge.connections.database_manager.DatabaseManager object at 0x110288370>
def _execute(self, platform: str, pipeline_config: PipelineConfig, extractor=None) -> None:
try:
if extractor is None:
extractor = Profiler._setup_extractor(platform)
result = PipelineClass(pipeline_config, extractor).execute()
logger.info(f"Profile execution has completed successfully for {platform} for more info check: {result}.")
except FileNotFoundError as e:
logger.error(f"Configuration file not found for source {platform}: {e}")
raise FileNotFoundError(f"Configuration file not found for source {platform}: {e}") from e
except Exception as e:
logger.error(f"Error executing pipeline for source {platform}: {e}")
> raise RuntimeError(f"Pipeline execution failed for source {platform} : {e}") from e
E RuntimeError: Pipeline execution failed for source oracle : Pipeline execution failed due to errors in steps: test_nulloutput
src/databricks/labs/lakebridge/assessments/profiler.py:81: RuntimeErrorLogs Confirmation
- I ran the command line with
--debug - I have attached the
lsp-server.logunder USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
Operating System
macOS
Version
latest via Databricks CLI
Metadata
Metadata
Assignees
Labels
needs-triageNeeds TriageNeeds Triage