Update db.in.ogr manual to improve import CSV files #4674

Open
cmbarton opened this issue Nov 8, 2024 · 0 comments
Labels
docs; enhancement (New feature or request); good first issue (Good for newcomers); manual (Documentation related issues)
cmbarton commented Nov 8, 2024

The problem

CSV is one of the most widely used formats for exchanging tabular data across platforms. However, it is difficult to import CSV files into GRASS. Currently, db.in.ogr imports CSV files but converts all columns to text. The workaround suggested in the current manual is to create an accompanying *.csvt file that specifies the data type of each column. This is very cumbersome, especially for files with many columns. However, OGR will automatically recognize column data types if the following open option is set:
-oo AUTODETECT_TYPE=YES
This makes importing CSV files much easier. The option can be entered manually in the gdal_doo argument of db.in.ogr, but this is not mentioned in the current manual.
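For anyone scripting the import, here is a minimal Python sketch of passing the open option through db.in.ogr. It assumes the parameter is named gdal_doo, as in the proposed manual text in this issue, and that the command runs inside an active GRASS session. The helper names db_in_ogr_args and import_csv are hypothetical and not part of GRASS.

```python
import shutil
import subprocess


def db_in_ogr_args(csv_path, table, autodetect=True):
    """Build the db.in.ogr argument list (hypothetical helper, not part of GRASS)."""
    args = ["db.in.ogr", f"input={csv_path}", f"output={table}"]
    if autodetect:
        # GDAL CSV driver open option, passed through the gdal_doo parameter
        # (parameter name taken from the proposed manual text).
        args.append("gdal_doo=AUTODETECT_TYPE=YES")
    return args


def import_csv(csv_path, table, autodetect=True):
    """Run db.in.ogr; assumes an active GRASS session with the tool on PATH."""
    args = db_in_ogr_args(csv_path, table, autodetect)
    if shutil.which(args[0]) is None:
        raise RuntimeError("db.in.ogr not found; start a GRASS session first")
    subprocess.run(args, check=True)
```

Calling import_csv("koeppen_gridcode.csv", "koeppen_gridcode") would then run the same command as the proposed manual example.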

Proposed solution

Ideally, -oo AUTODETECT_TYPE=YES should be the default when importing a CSV file with db.in.ogr (see #4593). Until that change is made, it would be very helpful for the manual to describe how to set this option as a workaround. The information in the current manual about using a *.csvt file should also be kept for finer manual control. I propose that the manual be updated as follows:

Current:
Import CSV file

Limited type recognition can be done for Integer, Real, String, Date, Time and DateTime columns through a descriptive file with same name as the CSV file, but .csvt extension (see details here).

NOTE: create koeppen_gridcode.csvt first for automated type recognition

db.in.ogr input=koeppen_gridcode.csv output=koeppen_gridcode
db.select table=koeppen_gridcode

New:
Import CSV file

db.in.ogr will attempt to detect column data types automatically if the gdal_doo option is set to AUTODETECT_TYPE=YES.

db.in.ogr input=koeppen_gridcode.csv output=koeppen_gridcode gdal_doo=AUTODETECT_TYPE=YES
db.select table=koeppen_gridcode

Users can also specify data types for CSV columns using a type definition file with the same name as the CSV file but with a .csvt extension (see details here). Columns can be defined as Integer, Real, String, Date, Time, and DateTime in this way.

NOTE: create koeppen_gridcode.csvt first for automated type recognition

db.in.ogr input=koeppen_gridcode.csv output=koeppen_gridcode
db.select table=koeppen_gridcode
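
For files with many columns, the .csvt file could be drafted by a script and then corrected by hand. Below is a rough Python sketch under these assumptions: the CSV has a header row, the .csvt file is a single line of double-quoted, comma-separated type names, and only Integer, Real, and String are guessed (Date, Time, and DateTime would need manual edits). The function names guess_type and write_csvt are hypothetical, and the guessing rules are deliberately simplistic; they are not what GDAL does internally.

```python
import csv
from pathlib import Path


def guess_type(values):
    """Guess 'Integer', 'Real', or 'String' from a column's non-empty values."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "String"
    # Try the strictest type first; fall back to String if neither cast works.
    for type_name, cast in (("Integer", int), ("Real", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "String"


def write_csvt(csv_path):
    """Write a .csvt file next to csv_path and return its path."""
    csv_path = Path(csv_path)
    with csv_path.open(newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        raise ValueError("CSV file is empty")
    header, data = rows[0], rows[1:]
    types = [
        guess_type([row[i] for row in data if i < len(row)])
        for i in range(len(header))
    ]
    csvt_path = csv_path.with_suffix(".csvt")
    # One line of quoted type names, e.g. "Integer","String","Real"
    csvt_path.write_text(",".join(f'"{t}"' for t in types) + "\n")
    return csvt_path
```

Running write_csvt("koeppen_gridcode.csv") would create koeppen_gridcode.csvt in the same directory; the result should be reviewed before running db.in.ogr.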
@cmbarton cmbarton added the enhancement New feature or request label Nov 8, 2024
@cmbarton cmbarton added this to the 8.4.1 milestone Nov 8, 2024
@neteler neteler added good first issue Good for newcomers manual Documentation related issues docs labels Nov 9, 2024