Description
Is there an existing issue for this?
- I have searched the existing issues
Category of Bug / Issue
Reconcile bug
Current Behavior
In the reconciliation process, the hash calculation for the source (T-SQL/Synapse) environment is inconsistent with the hash calculation for the target (Databricks) environment, leading to reconciliation mismatches for certain rows.
- T-SQL/Synapse Source Hashing: The query generator explicitly truncates the concatenated string of column values to 256 characters before calculating the hash. This happens in src/databricks/labs/lakebridge/reconcile/query_builder/expression_generator.py at line 298, where the function string is defined as:
  func="CONVERT(VARCHAR(256), HASHBYTES('SHA2_256', CONVERT(VARCHAR(256),{})), 2)"
  If a row's concatenated string of values exceeds 256 characters, the calculated T-SQL hash is based on only the first 256 characters.
- Databricks Target Hashing: The target system (Databricks) calculates the hash over the full, untruncated concatenated string.
This disparity means that for any record where the concatenated string exceeds 256 characters, the hash calculated in T-SQL will not match the hash calculated in Databricks, even if the row data is identical. This results in the reconciliation process incorrectly reporting these rows as being different.
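The effect can be illustrated with a short Python sketch (hypothetical data; it simulates the two hashing strategies with hashlib rather than invoking T-SQL or Databricks):

```python
import hashlib

def tsql_style_hash(concatenated: str) -> str:
    # Mimics CONVERT(VARCHAR(256), ...) on the hash input:
    # only the first 256 characters are hashed.
    truncated = concatenated[:256]
    return hashlib.sha256(truncated.encode("utf-8")).hexdigest().upper()

def databricks_style_hash(concatenated: str) -> str:
    # Databricks hashes the full, untruncated string.
    return hashlib.sha256(concatenated.encode("utf-8")).hexdigest()

# A short row agrees in both systems (ignoring hex case)...
short_row = "customer_42|abc"
assert tsql_style_hash(short_row).lower() == databricks_style_hash(short_row)

# ...but a row whose concatenated values exceed 256 characters does not,
# because the T-SQL side hashed only the first 256 characters.
long_row = "customer_42|" + "x" * 300
assert tsql_style_hash(long_row).lower() != databricks_style_hash(long_row)
```
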
Expected Behavior
The hash calculation in the T-SQL/Synapse query should use the full, untruncated concatenated string of column values so that the hash uniquely identifies the row data, consistent with the Databricks calculation.
The VARCHAR(256) limit on the inner CONVERT should be removed or significantly increased (e.g., to VARCHAR(8000), or VARCHAR(MAX) where supported) to avoid truncation and ensure consistent hashing across both source and target.
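One possible fix, as a sketch only (not a tested patch): widen just the inner CONVERT in the format string in expression_generator.py, leaving the outer CONVERT (which formats the digest as hex) unchanged. Whether HASHBYTES accepts VARCHAR(MAX) input depends on the SQL Server / Synapse version, so VARCHAR(8000) may be the safer floor on older engines.

```python
# Hypothetical revision: only the inner CONVERT changes, so the value being
# hashed is no longer truncated at 256 characters.
func = "CONVERT(VARCHAR(256), HASHBYTES('SHA2_256', CONVERT(VARCHAR(MAX),{})), 2)"

# Example of how the query builder might interpolate a column expression:
print(func.format("CONCAT_WS('|', col_a, col_b)"))
```
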
Steps To Reproduce
- Define a source table where the concatenation of all column values for at least one row exceeds 256 characters.
- Ensure this data is correctly loaded into both the T-SQL/Synapse source and the Databricks target.
- Execute the Lakebridge reconcile process for this table.
- Observe that the reconciliation process incorrectly reports a mismatch for the row(s) whose concatenated values exceeded 256 characters, due to the T-SQL hash being based on the truncated string.
Relevant log output or Exception details
No explicit exception is thrown. The symptom is a reconciliation failure (mismatch) for rows where the concatenated values exceed 256 characters, despite the row data being identical in both systems.
Logs Confirmation
- I ran the command line with --debug
- I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
Operating System
Windows
Version
latest via Databricks CLI