MongoDB: Various fixes and improvements #216

Merged
merged 12 commits into from
Aug 15, 2024

Conversation

@amotl (Member) commented Aug 8, 2024

About

The venerable migr8 program has not received much attention to detail so far. This patch catches up on that, so that ctk load table mongodb://... lives up to its promise of better DWIM ("do what I mean") behavior.

Preview

pip install --upgrade 'cratedb-toolkit[mongodb] @ git+https://github.com/crate/cratedb-toolkit.git@mongodb-better-1'

What's Inside

Bugfixes

  • MongoDB: Fix missing output on STDOUT for migr8 export
  • MongoDB: Improve timestamp parsing by using python-dateutil
  • MongoDB: Converge _id input field to id column instead of dropping it
  • MongoDB: Make user interface use stderr, so stdout is for data only
  • MongoDB: Make migr8 extract write to stdout by default
  • MongoDB: Make migr8 translate read from stdin by default
  • MongoDB: Improve user interface messages
  • MongoDB: Strip single leading underscore character from all top-level fields
  • MongoDB: Map OID types to CrateDB TEXT columns
  • MongoDB: Make migr8 extract and migr8 export accept the --limit option
  • MongoDB: Fix indentation in prettified SQL output of migr8 translate
  • MongoDB: Add capability to give type hints and add transformations using --transformation, see Zyp Transformations.
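
The field-name handling listed above (converging _id to id, and stripping a single leading underscore from top-level fields) can be pictured roughly as follows. This is a minimal sketch and not the toolkit's actual implementation: the function name `normalize_top_level_fields` is hypothetical, and the OID is shown as a plain string on the assumption that this mirrors the TEXT column mapping.

```python
def normalize_top_level_fields(document: dict) -> dict:
    """Hypothetical helper: strip one leading underscore from top-level keys."""
    normalized = {}
    for key, value in document.items():
        # Only a single leading underscore is removed, so `_id` becomes `id`
        # and `__v` becomes `_v`. Nested documents are left untouched.
        new_key = key[1:] if key.startswith("_") else key
        normalized[new_key] = value
    return normalized


# Example record; the OID is represented as text (assumption).
record = {
    "_id": "66b4f1c2a1b2c3d4e5f60718",
    "__v": 1,
    "name": {"_first": "Ada"},
}
print(normalize_top_level_fields(record))
```

Note that this sketch does not handle a document that contains both `_id` and `id`; in that case the key seen later would silently win.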

@amotl force-pushed the mongodb-better-1 branch 3 times, most recently from fcb2ab2 to d224db7, on August 8, 2024 19:50
cratedb_toolkit/io/mongodb/core.py (outdated)
Comment on lines 38 to 43
 def date_converter(value):
     if isinstance(value, int):
         return value
-    dt = datetime.strptime(value[:-5], "%Y-%m-%dT%H:%M:%S.%f")
-    iso_match = _TZINFO_RE.match(value[-5:])
-    if iso_match:
-        sign, hours, minutes = iso_match.groups()
-        tzoffset = int(hours) * 3600 + int(minutes) * 60
-        if sign == "-":
-            dt = dt + timedelta(seconds=tzoffset)
-        else:
-            dt = dt - timedelta(seconds=tzoffset)
-    else:
-        raise Exception("Can't parse datetime string {0}".format(value))
+    dt = dateparser.parse(value)
     return calendar.timegm(dt.utctimetuple()) * 1000

@amotl (Member, Author), Aug 8, 2024

This has been trimmed to rely on dateutil.parser completely. Do you think that will be okay, given that the previous code tried to achieve the same thing manually?
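
To illustrate why the manual offset arithmetic can go away, here is a minimal sketch of the final conversion step using only the standard library. It is not the code from the patch: `to_epoch_millis` is a hypothetical name, and `datetime.strptime` with `%z` stands in for `dateutil.parser`, under the assumption that the parser returns a timezone-aware datetime when the input carries an offset.

```python
import calendar
from datetime import datetime


def to_epoch_millis(value):
    """Hypothetical stand-in for date_converter, standard library only."""
    if isinstance(value, int):
        # Integers are passed through unchanged, as in the original function.
        return value
    # Assumption: the input carries fractional seconds and a numeric UTC
    # offset such as +0200. The patch itself uses dateutil for parsing.
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    # utctimetuple() normalizes an aware datetime to UTC, so no manual
    # addition or subtraction of the offset is needed. timegm() works with
    # whole seconds, so the fractional part is not carried over.
    return calendar.timegm(dt.utctimetuple()) * 1000


# Both strings denote the same instant, so both calls yield the same value.
print(to_epoch_millis("2024-08-08T19:50:00.000+0200"))
print(to_epoch_millis("2024-08-08T17:50:00.000+0000"))
```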

@hlcianfagna left a comment

Just a small thing that may be worth mentioning in the docs: the schema translation part takes the name of the MongoDB collection for the CREATE TABLE statement, but the data load part takes the table name from the CRATEDB_SQLALCHEMY_URL, so these two have to match for ctk load table to work.
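
A pre-flight check for this constraint could look roughly like the sketch below. It is only an illustration: the helper names are hypothetical, and it assumes that both URLs carry the collection or table name as the last path segment, which the example URLs here are merely made up to demonstrate.

```python
from urllib.parse import urlparse


def last_path_segment(url: str) -> str:
    """Return the final path component of a URL, ignoring a trailing slash."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def names_match(mongodb_url: str, cratedb_url: str) -> bool:
    """Hypothetical check: collection name and table name must be equal."""
    return last_path_segment(mongodb_url) == last_path_segment(cratedb_url)


# Example URLs; the path layout is an assumption made for this sketch.
mongodb_url = "mongodb://localhost:27017/testdrive/demo"
cratedb_url = "crate://crate@localhost:4200/testdrive/demo"

if not names_match(mongodb_url, cratedb_url):
    raise SystemExit("Collection name and target table name differ")
print("Collection name and target table name match")
```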

@amotl force-pushed the mongodb-better-1 branch 2 times, most recently from 4814bf1 to d055214, on August 14, 2024 00:12
@amotl (Member, Author) commented Aug 14, 2024

Hi Hernan. Thanks for your suggestion. I was sure there were anomalies, but I can't spot them, probably because of operational blindness. May I humbly ask you to submit a corresponding suggestion on how and where to improve the documentation, either by commenting on this patch or by submitting a separate one? Thanks a stack!

@amotl (Member, Author) commented Aug 14, 2024

The MongoDB Table Loader will now also accept Zyp transformations for re-shaping data.

@amotl marked this pull request as ready for review on August 14, 2024 01:31

3 participants