Skip to content

Error on dbWriteTable with staging_volume and append = TRUE #146

@brianperdomo

Description

@brianperdomo

I am getting an error when trying to append to an existing Databricks table using the staging volume.

con <- DBI::dbConnect(
  brickster::DatabricksSQL(),
  host = brickster::db_host(),
  warehouse_id = "88e5c3346f3e2b20"
)

tbl_name <- "test_catalog.test_schema.temp_penguins"
stg_vol <- "/Volumes/test_catalog/test_schema/staging/"

data_overwrite <- dplyr::sample_n(datasets::penguins, 1e5, replace = TRUE)

DBI::dbWriteTable(
  conn = con,
  name = tbl_name, 
  value = data_overwrite, 
  overwrite = TRUE, 
  staging_volume = stg_vol,
  progress = TRUE
)

#> ✔ Writing files [115ms]
#> ✔ Table created [3.1s]             
#> ✔ Clearing staged files [3.4s]

(dplyr::tbl(con, I(tbl_name)) |> 
  dplyr::count())

#> ✔ Submitting query [111ms]
#> ✔ Executing query [1.2s]
#> ✔ Downloaded 1 rows [311ms]
#> ✔ Processing results [13ms]
#> # Source:   SQL [?? x 1]
#> # Database: DatabricksConnection
#>        n
#>    <int>
#> 1 100000
 
data_append <- dplyr::sample_n(datasets::penguins, 1e5, replace = TRUE)

DBI::dbWriteTable(
  conn = con,
  name = tbl_name, 
  value = data_append, 
  append = TRUE, 
  staging_volume = stg_vol,
  progress = TRUE
)
#> ✔ Writing files [80ms]
#> ✔ Data appended [854ms]  
#> ✔ Clearing staged files [3.6s]
#> Error in `db_sql_exec_and_wait()`:
#> ! [COPY_INTO_SOURCE_SCHEMA_INFERENCE_FAILED] The source directory did not contain any parsable files of type PARQUET. Please
#>   check the contents of '/Volumes/finance/empyrean/staging//brickster_upload_20251212_134538_94812'. The error can be silenced by
#>   setting 'spark.databricks.delta.copyInto.emptySourceCheck.enabled' to 'false'.
#>     ▆
#>  1. ├─DBI::dbWriteTable(...)
#>  2. └─DBI::dbWriteTable(...)
#>  3.   └─brickster (local) .local(conn, name, value, ...)
#>  4.     └─brickster:::db_write_table_volume(...)
#>  5.       └─brickster:::db_sql_exec_and_wait(...)
#>  6.         └─cli::cli_abort(resp$status$error$message)
#>  7.           └─rlang::abort(...)

Created on 2025-12-12 with reprex v2.1.1

Session Info

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.1 (2025-06-13 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       Rgui
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/New_York
#>  date     2025-12-12
#>  pandoc   3.8.2.1 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>  quarto   NA @ c:\\Users\\###\\DOCUME~1\\software\\Positron\\RESOUR~1\\app\\quarto\\bin\\quarto.exe
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>  P arrow         22.0.0  2025-10-29 [?] CRAN (R 4.5.2)
#>  P askpass       1.2.1   2024-10-04 [?] CRAN (R 4.5.0)
#>  P assertthat    0.2.1   2019-03-21 [?] CRAN (R 4.5.0)
#>  P bit           4.6.0   2025-03-06 [?] CRAN (R 4.5.0)
#>  P bit64         4.6.0-1 2025-01-16 [?] CRAN (R 4.5.0)
#>  P blob          1.2.4   2023-03-17 [?] CRAN (R 4.5.0)
#>  P brickster     0.2.9   2025-09-04 [?] CRAN (R 4.5.2)
#>  P callr         3.7.6   2024-03-25 [?] CRAN (R 4.5.0)
#>  P cli           3.6.5   2025-04-23 [?] CRAN (R 4.5.0)
#>  P clipr         0.8.0   2022-02-22 [?] CRAN (R 4.5.0)
#>  P curl          7.0.0   2025-08-19 [?] RSPM
#>  P DBI           1.2.3   2024-06-02 [?] CRAN (R 4.5.0)
#>  P dbplyr        2.5.1   2025-09-10 [?] CRAN (R 4.5.2)
#>  P digest        0.6.39  2025-11-19 [?] CRAN (R 4.5.2)
#>  P dplyr         1.1.4   2023-11-17 [?] CRAN (R 4.5.0)
#>  P evaluate      1.0.5   2025-08-27 [?] CRAN (R 4.5.1)
#>  P fastmap       1.2.0   2024-05-15 [?] CRAN (R 4.5.0)
#>  P fs            1.6.6   2025-04-12 [?] CRAN (R 4.5.0)
#>  P generics      0.1.4   2025-05-09 [?] CRAN (R 4.5.0)
#>  P glue          1.8.0   2024-09-30 [?] CRAN (R 4.5.0)
#>  P hms           1.1.4   2025-10-17 [?] CRAN (R 4.5.2)
#>  P htmltools     0.5.9   2025-12-04 [?] CRAN (R 4.5.2)
#>  P httpuv        1.6.16  2025-04-16 [?] CRAN (R 4.5.0)
#>  P httr2         1.2.2   2025-12-08 [?] CRAN (R 4.5.1)
#>  P jsonlite      2.0.0   2025-03-27 [?] CRAN (R 4.5.0)
#>  P knitr         1.50    2025-03-16 [?] CRAN (R 4.5.0)
#>  P later         1.4.4   2025-08-27 [?] CRAN (R 4.5.1)
#>  P lifecycle     1.0.4   2023-11-07 [?] CRAN (R 4.5.0)
#>  P magrittr      2.0.4   2025-09-12 [?] CRAN (R 4.5.1)
#>  P odbc          1.6.4   2025-12-06 [?] CRAN (R 4.5.1)
#>  P openssl       2.3.4   2025-09-30 [?] CRAN (R 4.5.1)
#>  P otel          0.2.0   2025-08-29 [?] CRAN (R 4.5.2)
#>  P pillar        1.11.1  2025-09-17 [?] CRAN (R 4.5.1)
#>  P pkgconfig     2.0.3   2019-09-22 [?] CRAN (R 4.5.0)
#>  P processx      3.8.6   2025-02-21 [?] CRAN (R 4.5.0)
#>  P promises      1.5.0   2025-11-01 [?] CRAN (R 4.5.2)
#>  P ps            1.9.1   2025-04-12 [?] CRAN (R 4.5.0)
#>  P purrr         1.2.0   2025-11-04 [?] CRAN (R 4.5.2)
#>  P R6            2.6.1   2025-02-15 [?] CRAN (R 4.5.0)
#>  P rappdirs      0.3.3   2021-01-31 [?] CRAN (R 4.5.0)
#>  P Rcpp          1.1.0   2025-07-02 [?] CRAN (R 4.5.1)
#>    renv          1.1.5   2025-07-24 [1] CRAN (R 4.5.1)
#>  P reprex      * 2.1.1   2024-07-06 [?] CRAN (R 4.5.2)
#>  P rlang         1.1.6   2025-04-11 [?] CRAN (R 4.5.0)
#>  P rmarkdown     2.30    2025-09-28 [?] CRAN (R 4.5.1)
#>  P rstudioapi    0.17.1  2024-10-22 [?] CRAN (R 4.5.0)
#>  P sessioninfo   1.2.3   2025-02-05 [?] CRAN (R 4.5.0)
#>  P tibble        3.3.0   2025-06-08 [?] CRAN (R 4.5.0)
#>  P tidyselect    1.2.1   2024-03-11 [?] CRAN (R 4.5.0)
#>  P utf8          1.2.6   2025-06-08 [?] CRAN (R 4.5.0)
#>  P vctrs         0.6.5   2023-12-01 [?] CRAN (R 4.5.0)
#>  P withr         3.0.2   2024-10-28 [?] CRAN (R 4.5.0)
#>  P xfun          0.54    2025-10-30 [?] CRAN (R 4.5.2)
#>  P yaml          2.3.12  2025-12-10 [?] CRAN (R 4.5.1)

Debugging the db_sql_exec_and_wait function in both cases the staging files do appear in the staging volume before the table is created/appended. The only difference I could find was that the overwrite version uses the following query:

CREATE OR REPLACE TABLE `test_catalog`.`test_schema`.`temp_penguins` AS 
SELECT * 
FROM 
  READ_FILES('/Volumes/test_catalog/test_schema/staging//brickster_upload_20251212_135953_85403', format => 'parquet')

vs the append which uses:

COPY INTO `test_catalog`.`test_schema`.`temp_penguins` 
FROM '/Volumes/test_catalog/test_schema/staging//brickster_upload_20251212_140348_83945' 
FILEFORMAT = PARQUET

Obviously, the overwrite and append would use different queries, but maybe there's some issue with using COPY INTO, but I'm not knowledgeable enough about such things.

Thanks in advance for any assistance you can provide.

Metadata

Metadata

Assignees

Labels

investigationFurther information is requested

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions