Skip to content

[BUG]: SQL Script execution crashes the profiler where empty output returns #2148

@lolo115

Description

@lolo115

Is there an existing issue for this?

  • I have searched the existing issues

Category of Bug / Issue

Profiler bug

Current Behavior

When I configured the pipeline_config.yml with a sql script that returns no rows, the profiler crashes:

e.g.

  • pipeline_config.yml
- name: test_nulloutput
   type: sql
   extract_source: src/databricks/labs/lakebridge/resources/assessments/oracle/test_nulloutput.sql
   mode: overwrite
   frequency: once
   flag: active
  • test_nulloutput.sql (Oracle dialect)
select 'value' as X from dual where 1=2

This crashed the following unit test:


def test_Oracle_profile_execution() -> None:
    """Test successful profiling execution using actual pipeline configuration"""
    profiler = Profiler("oracle")
    path_prefix = Path(__file__).parent / "../../../"
    config_file = path_prefix / "src/databricks/labs/lakebridge/resources/assessments/oracle/pipeline_config.yml"
    config = profiler.path_modifier(config_file=config_file, path_prefix=path_prefix)
    profiler.profile(pipeline_config=config)
    assert Path("/tmp/data/oracle_assessment/profiler_extract.db").exists(), "Profiler extract database should be created"

(See exception details in the related section below)

Expected Behavior

No response

Steps To Reproduce

No response

Relevant log output or Exception details

ERROR [root] SQL execution failed: Invalid Input Error: executemany requires a non-empty list of parameter sets to be provided
10:11:13     INFO [d.l.l.assessments.pipeline] Step 'test_nulloutput' completed with status: ERROR
10:11:13    ERROR [d.l.l.assessments.pipeline] Step test_nulloutput failed with error: SQL execution failed: Invalid Input Error: executemany requires a non-empty list of parameter sets to be provided
10:11:13    ERROR [d.l.l.assessments.pipeline] Pipeline execution failed due to errors in steps: test_nulloutput
10:11:13    ERROR [d.l.l.assessments.profiler] Error executing pipeline for source oracle: Pipeline execution failed due to errors in steps: test_nulloutput
FAILED
tests/integration/assessments/test_profiler.py:24 (test_Oracle_profile_execution)
self = <databricks.labs.lakebridge.assessments.profiler.Profiler object at 0x110255ea0>
platform = 'oracle'
pipeline_config = PipelineConfig(name='oracle_assessment', version='1.0', extract_folder='/tmp/data/oracle_assessment', comment=None, st...sments/oracle/test_nulloutput.sql', mode='overwrite', frequency='once', flag='active', dependencies=[], comment=None)])
extractor = <databricks.labs.lakebridge.connections.database_manager.DatabaseManager object at 0x110288370>

    def _execute(self, platform: str, pipeline_config: PipelineConfig, extractor=None) -> None:
        try:
            if extractor is None:
                extractor = Profiler._setup_extractor(platform)
    
>           result = PipelineClass(pipeline_config, extractor).execute()

src/databricks/labs/lakebridge/assessments/profiler.py:74: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <databricks.labs.lakebridge.assessments.pipeline.PipelineClass object at 0x110288670>

    def execute(self) -> list[StepExecutionResult]:
        logging.info(f"Pipeline initialized with config: {self.config.name}, version: {self.config.version}")
        execution_results: list[StepExecutionResult] = []
        error_flag = False
        for step in self.config.steps:
            logger.info(f"Executing step: {step.name}")
            result = self._process_step(step)
            execution_results.append(result)
            logger.info(f"Step '{step.name}' completed with status: {result.status}")
    
            # Check step execution status
            if result.status == StepExecutionStatus.ERROR:
                logger.error(f"Step {result.step_name} failed with error: {result.error_message}")
                error_flag = True
            elif result.status == StepExecutionStatus.SKIPPED:
                logger.info(f"Step {result.step_name} was skipped.")
            else:
                logger.info(f"Step {result.step_name} has completed successfully.")
    
        if error_flag:
            failed_steps = [r for r in execution_results if r.status == StepExecutionStatus.ERROR]
            error_msg = (
                f"Pipeline execution failed due to errors in steps: {', '.join(r.step_name for r in failed_steps)}"
            )
            logger.error(error_msg)
>           raise RuntimeError(error_msg)
E           RuntimeError: Pipeline execution failed due to errors in steps: test_nulloutput

src/databricks/labs/lakebridge/assessments/pipeline.py:71: RuntimeError

The above exception was the direct cause of the following exception:

    def test_Oracle_profile_execution() -> None:
        """Test successful profiling execution using actual pipeline configuration"""
        profiler = Profiler("oracle")
        path_prefix = Path(__file__).parent / "../../../"
        config_file = path_prefix / "src/databricks/labs/lakebridge/resources/assessments/oracle/pipeline_config.yml"
        config = profiler.path_modifier(config_file=config_file, path_prefix=path_prefix)
>       profiler.profile(pipeline_config=config)

tests/integration/assessments/test_profiler.py:31: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/databricks/labs/lakebridge/assessments/profiler.py:59: in profile
    self._execute(platform, pipeline_config, extractor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <databricks.labs.lakebridge.assessments.profiler.Profiler object at 0x110255ea0>
platform = 'oracle'
pipeline_config = PipelineConfig(name='oracle_assessment', version='1.0', extract_folder='/tmp/data/oracle_assessment', comment=None, st...sments/oracle/test_nulloutput.sql', mode='overwrite', frequency='once', flag='active', dependencies=[], comment=None)])
extractor = <databricks.labs.lakebridge.connections.database_manager.DatabaseManager object at 0x110288370>

    def _execute(self, platform: str, pipeline_config: PipelineConfig, extractor=None) -> None:
        try:
            if extractor is None:
                extractor = Profiler._setup_extractor(platform)
    
            result = PipelineClass(pipeline_config, extractor).execute()
            logger.info(f"Profile execution has completed successfully for {platform} for more info check: {result}.")
        except FileNotFoundError as e:
            logger.error(f"Configuration file not found for source {platform}: {e}")
            raise FileNotFoundError(f"Configuration file not found for source {platform}: {e}") from e
        except Exception as e:
            logger.error(f"Error executing pipeline for source {platform}: {e}")
>           raise RuntimeError(f"Pipeline execution failed for source {platform} : {e}") from e
E           RuntimeError: Pipeline execution failed for source oracle : Pipeline execution failed due to errors in steps: test_nulloutput

src/databricks/labs/lakebridge/assessments/profiler.py:81: RuntimeError

Logs Confirmation

  • I ran the command line with --debug
  • I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log

Sample Query

Operating System

macOS

Version

latest via Databricks CLI

Metadata

Metadata

Labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions