Feature/decimal support #982
base: main
Conversation
Co-authored-by: Steve Suh <[email protected]>
Can we get some traction on this one? It looks like this PR is ready, and we have a lot of code that uses decimals that will be difficult to write tests around without this.
/// </summary>
/// <param name="s">The stream to write</param>
/// <param name="value">The decimal to write</param>
public static void Write(Stream s, decimal value) => Write(s, value.ToString());
Should use ToString(CultureInfo.InvariantCulture) if we are using a string on the wire.
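As a sketch of that suggestion (assuming the existing Write(Stream, string) overload and a using System.Globalization; directive elsewhere in the file):

```csharp
/// <summary>Writes a decimal to the stream as a culture-independent string.</summary>
/// <param name="s">The stream to write</param>
/// <param name="value">The decimal to write</param>
public static void Write(Stream s, decimal value) =>
    // InvariantCulture keeps the wire format stable (always '.' as the decimal
    // separator), regardless of the host machine's regional settings.
    Write(s, value.ToString(CultureInfo.InvariantCulture));
```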
@@ -267,6 +267,9 @@ private ISocketWrapper GetConnection()
case 'd':
    returnValue = SerDe.ReadDouble(inputStream);
    break;
case 'm':
    returnValue = decimal.Parse(SerDe.ReadString(inputStream));
Use decimal.Parse(SerDe.ReadString(inputStream), CultureInfo.InvariantCulture) to ensure we are using invariant culture on the wire.
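For symmetry with the write side, the read path would look roughly like this, inserted into the existing switch (again assuming System.Globalization is imported):

```csharp
case 'm':
    // Parse with the invariant culture so the string produced by the writer
    // round-trips correctly even on machines that use ',' as the decimal separator.
    returnValue = decimal.Parse(
        SerDe.ReadString(inputStream),
        CultureInfo.InvariantCulture);
    break;
```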
Row row = df.Collect().First();
Assert.Equal(decimal.MinValue, row[0]);
Assert.Equal(decimal.MaxValue, row[1]);
Assert.Equal(decimal.Zero, row[2]);
I haven't gotten to dive deep into whether this is an issue yet, but I want to bring it to attention just in case:
There was a time when we were comparing SQL Server output to Spark SQL output while trying to migrate a pipeline to Synapse, and when attempting to diff two tables we found an issue with a double. SQL Server presumably uses C#'s (and JavaScript's, which the Python Notebook table preview in Synapse uses) notion of floats, where -0.0 == 0.0, but the JVM/Spark in some cases compares bit-by-bit and differentiates the two because of the sign bit, so -0.0 != 0.0.
It's resolved in later versions of Spark's DataFrames, and it may not apply in the decimal-as-String case here, so it may not be problematic. Relevant links (a small illustration follows after them):
- https://issues.apache.org/jira/browse/SPARK-26021
- https://www.mail-archive.com/[email protected]/msg283973.html
- https://issues.apache.org/jira/browse/SPARK-32110
- [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++ apache/spark#30673
- [BUG] -0.0 vs 0.0 is a hot mess NVIDIA/spark-rapids#294
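To make the distinction concrete, here is a minimal C# illustration (not Spark-specific, purely to show why an equality check and a bit-level comparison can disagree on the two zeros):

```csharp
using System;

class NegativeZeroDemo
{
    static void Main()
    {
        double positiveZero = 0.0;
        double negativeZero = -0.0;

        // IEEE 754 equality treats the two zeros as equal...
        Console.WriteLine(positiveZero == negativeZero); // True

        // ...but their bit patterns differ by the sign bit, which is what a
        // bit-level comparison (as in the linked Spark issues) sees.
        Console.WriteLine(BitConverter.DoubleToInt64Bits(positiveZero)); // 0
        Console.WriteLine(BitConverter.DoubleToInt64Bits(negativeZero)); // -9223372036854775808 (sign bit set)
    }
}
```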
This is because internally, BigDecimal uses BigInteger, and BigInteger also only has a single concept of zero. A BigInteger behaves as a two's-complement integer, and two's-complement only has a single zero.
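The same single-zero behaviour can be seen with .NET's System.Numerics.BigInteger, used here only as an analogue of java.math.BigInteger:

```csharp
using System;
using System.Numerics;

class BigIntegerZeroDemo
{
    static void Main()
    {
        BigInteger zero = BigInteger.Zero;
        BigInteger negatedZero = BigInteger.Negate(zero);
        BigInteger parsedNegativeZero = BigInteger.Parse("-0");

        // All three are the same single zero value; a two's-complement style
        // integer has no distinct negative zero to preserve.
        Console.WriteLine(zero == negatedZero);        // True
        Console.WriteLine(zero == parsedNegativeZero); // True
        Console.WriteLine(parsedNegativeZero.Sign);    // 0
    }
}
```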
Any updates on this PR?
Hey @GoEddie, are you still working on this?
Hi @AFFogarty, I had really given up, as no one seemed to be reviewing PRs, but I am happy to get it up to date again. Ed
We are excited to review your PR.
So we can do the best job, please check that your description includes "Fixes #nnnn" so that GitHub automatically closes the issue(s) when your PR is merged.
This implements #818.
On the Apache Spark side, decimal is implemented using a java.math.BigDecimal, and the only way to construct one is with an int/long; there is no way to construct it from any value larger than max long unless you use the string constructor. So I pass a string back and forth between .NET and the JVM; hope that is ok.
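A minimal sketch of that round-trip on the .NET side (the names here are illustrative, not the PR's actual SerDe code; the string is assumed to be handed to java.math.BigDecimal's String constructor on the JVM side):

```csharp
using System.Globalization;

static class DecimalWireSketch
{
    // Hypothetical helpers showing the idea: the decimal crosses the
    // .NET <-> JVM boundary as its invariant-culture string form, which
    // preserves values far outside the long range.
    public static string ToWireString(decimal value) =>
        value.ToString(CultureInfo.InvariantCulture);

    public static decimal FromWireString(string payload) =>
        decimal.Parse(payload, CultureInfo.InvariantCulture);
}

// Usage: decimal.MaxValue (79,228,162,514,264,337,593,543,950,335) is far
// larger than long.MaxValue, yet round-trips losslessly as a string:
// decimal roundTripped = DecimalWireSketch.FromWireString(
//     DecimalWireSketch.ToWireString(decimal.MaxValue));
```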