[BUG]: Lakebridge inability to determine delta tables' datatypes #2182

@arthur-dakota

Description

Is there an existing issue for this?

  • I have searched the existing issues

Category of Bug / Issue

Application crashed

Current Behavior

Hello Lakebridge Team,

I have been trying to use Lakebridge's Reconciliation in our SQL Server to Databricks platform migration and have run into issues with Lakebridge determining the datatypes of the target (Databricks) columns.

Here is sample output from lakebridge_metadata.reconciliation.details:

Image

You can see in the image that the source columns' datatypes are determined, but the target columns' datatypes are not.

Additionally, Lakebridge does not correctly identify mismatched rows; I verified this by checking the rows it reported as mismatched against the source and target tables. I suspected this was caused by the datatype issue above (if the datatypes cannot be matched, then the values will not match when rows are compared between source and target). However, the mismatched-rows problem persists even when I run Lakebridge with ReconcileConfig.report_type = "data".
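To illustrate the suspicion in the parenthetical above, here is a minimal plain-Python sketch of how identical data can compare as different when one side's types are not resolved. It is not a description of Lakebridge's internals; the helper names `row_fingerprint` and `typed_row_fingerprint` and the sample rows are hypothetical.

```python
import hashlib
from decimal import Decimal


def row_fingerprint(values):
    """Hash a row by joining the string form of each value."""
    joined = "|".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def typed_row_fingerprint(values, casts):
    """Cast each value with its column's callable before hashing."""
    return row_fingerprint(cast(v) for cast, v in zip(casts, values))


# Hypothetical rows holding the same data; the target side carries
# generic types, as it would if column datatypes were not resolved.
source_row = (Decimal("1001"), "ACME", 0)
target_row = (1001.0, "ACME", "0")

# One cast per column, standing in for known column datatypes.
casts = (Decimal, str, int)

naive_match = row_fingerprint(source_row) == row_fingerprint(target_row)
typed_match = (
    typed_row_fingerprint(source_row, casts)
    == typed_row_fingerprint(target_row, casts)
)
```

In this sketch the untyped comparison reports a mismatch while the typed one does not, which is the behavior I would expect if the missing target datatypes were the cause.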

In the "Steps To Reproduce" section, I've attached the .ipynb file for the Databricks notebook I am running. Note that the TableRecon.tables attribute has been set; if this attribute is not set, the columns in the source are not determined either.

For further information, please contact me through my email: [email protected]

Expected Behavior

I expect the column datatypes of Databricks Delta tables to be determinable. This would allow column datatypes to be matched correctly and values to be converted correctly from SQL Server to Databricks.

Steps To Reproduce

Source Table Details in SQL Server:
"GL_Accounts"
Image

Target Table Details in Databricks:
"GL_Accounts"

CREATE STREAMING TABLE <catalog>.<schema>.`gl_accounts` (
  GLAcct_Key DECIMAL(15,0),
  GLAcct_GLAcctMaj_Key DECIMAL(15,0),
  GLAcct_BusEnt_Acct STRING,
  GLAcct_Site_Acct STRING,
  GLAcct_PCtr_Acct STRING,
  GLAcct_Disabled SMALLINT,
  GLAcct_Disable_Date TIMESTAMP,
  GLAcct_Disable_User STRING,
  ts BINARY)
TBLPROPERTIES (
  '__cdc_last_validated_schema_version' = '0',
  '__cdc_reactivated_columns_since_schema_version' = '{}',
  '__ingestion_connector_inactive_columns' = '[]',
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5') AS

Note: the table catalog and schema have been redacted.
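As a cross-check that does not depend on Lakebridge, the types declared in the target DDL above can be compared with what Spark reports for the table. This is a minimal sketch, assuming the observed types arrive as (name, type) pairs such as those returned by `spark.table(...).dtypes` in a notebook; the helper name `find_type_problems` is hypothetical.

```python
# Column types declared in the target DDL, lower-cased to match the
# type strings Spark reports (e.g. via `spark.table(name).dtypes`).
EXPECTED_TARGET_TYPES = {
    "GLAcct_Key": "decimal(15,0)",
    "GLAcct_GLAcctMaj_Key": "decimal(15,0)",
    "GLAcct_BusEnt_Acct": "string",
    "GLAcct_Site_Acct": "string",
    "GLAcct_PCtr_Acct": "string",
    "GLAcct_Disabled": "smallint",
    "GLAcct_Disable_Date": "timestamp",
    "GLAcct_Disable_User": "string",
    "ts": "binary",
}


def find_type_problems(observed, expected=EXPECTED_TARGET_TYPES):
    """Return {column: (expected_type, observed_type)} for every column
    whose observed type is missing or differs from the declared one."""
    # Normalise case and spacing so "DECIMAL(15, 0)" equals "decimal(15,0)".
    observed_map = {
        name.lower(): dtype.lower().replace(" ", "")
        for name, dtype in observed
    }
    problems = {}
    for column, expected_type in expected.items():
        observed_type = observed_map.get(column.lower())
        if observed_type != expected_type:
            problems[column] = (expected_type, observed_type)
    return problems
```

An empty result means every declared column was found with the declared type; a value of `None` in the second position means the column's type could not be found at all.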

How Lakebridge is being installed, initialized, and run (note: the table catalog and schema have been redacted):
Lakebridge Testing - Using JDBC Connection - ATOMIC_PDIENT_CND (1).ipynb

Relevant log output or Exception details

Logs Confirmation

  • I ran the command line with --debug
  • I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log

Sample Query

Operating System

Windows

Version

latest via Databricks CLI
