Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] JSON.dump for hash with duplicate key #564

Conversation

maniSHarma7575
Copy link

Motivation / Background

This Pull Request has been created because to fix #563

The output for the hash is inconsistent when using JSON.dump with the duplicate key. Specifically, if we examine the result below, the key 'a' appears twice in the output JSON for the same key 'a'.

For example:

JSON.dump({ 'a' => 1, :a => 2 })

Actual Result:

"{\"a\":1,\"a\":2}"

Expected result:

"{\"a\":2}"

cc: @byroot @hsbt

@maniSHarma7575
Copy link
Author

maniSHarma7575 commented Jan 7, 2024

@hsbt , did you get a chance to check this out?

cc: @byroot

@hsbt
Copy link
Member

hsbt commented Jan 11, 2024

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}

Please let me know if my understanding is wrong.

@maniSHarma7575
Copy link
Author

maniSHarma7575 commented Jan 11, 2024

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}

Please let me know if my understanding is wrong.

@hsbt , if the colon (:) syntax is used to create symbols as keys, the output matches the expected result:

ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.6.3
{"a":2}

ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.7.1
{"a":2}

However, when the rocket (=>) syntax is used to associate keys with values in the hash, the JSON output contains duplicate values.

Therefore, the output should be consistent regardless of the syntax used to create the hash.

@byroot
Copy link
Member

byroot commented Jan 11, 2024

@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.

However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.

@byroot
Copy link
Member

byroot commented Jan 11, 2024

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

@hsbt he's not trying to fix a regression, but do prevent duplicated keys in the output. Arguably it's a minor bug, but it's true that it's not really expected for a JSON library to allow this.

As you know JSON has multiple standards, most don't say anything about that, but RFC 8259 says keys SHOULD be unique.

And more generally it's unclear how parsers will handle it, some may keep the first value, some the last, etc.

@maniSHarma7575
Copy link
Author

@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.

However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.

@byroot , it's functioning well for all three alternative implementations, as I've implemented the fix in lib/json/common.rb.

https://github.com/flori/json/blob/78643245d2b60f770946e2e252890655d61737fc/lib/json/common.rb#L614

For all the 3 implementations I have tested it:

  • For Java
ruby -v
jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f Java HotSpot(TM) 64-Bit Server VM 25.391-b13 on 1.8.0_391-b13 +jit [arm64-darwin]
rake build
gem build json-java.gemspec
  Successfully built RubyGem
  Name: json
  Version: 2.7.2
  File: json-2.7.2-java.gem
mkdir -p pkg
mv json-2.7.2-java.gem pkg

Created a script json.rb

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

  • For C
ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
rake build
Resolving dependencies...
(in /Users/manishsharma/Development/personal/opensource/json)
Writing version information for 2.7.2
gem build json.gemspec
  Successfully built RubyGem
  Name: json
  Version: 2.7.2
  File: json-2.7.2.gem
gem build json_pure.gemspec
  Successfully built RubyGem
  Name: json_pure
  Version: 2.7.2
  File: json_pure-2.7.2.gem
mkdir -p pkg
mv json-2.7.2.gem pkg
mv json_pure-2.7.2.gem pkg
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

  • For Ruby
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json_pure", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json/pure'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

@maniSHarma7575
Copy link
Author

maniSHarma7575 commented Jan 16, 2024

@hsbt and @byroot Waiting for your response.

@maniSHarma7575
Copy link
Author

Hi @hsbt Could you please review this?

@maniSHarma7575
Copy link
Author

@byroot Waiting for the response on this?

cc: @hsbt

@jhawthorn
Copy link
Member

I don't think JSON should perform this validation. It will hurt performance a lot and it's not something we can fully guarantee to users.

This may cover simple cases but not, for example, differently encoded strings:

> JSON.dump({"foo".encode("UTF-16").to_s => 1, "foo".to_s => 2})
=> "{\"foo\":1,\"foo\":2}"

Covering that would have to be quite difficult. I think it's best we leave the responsibility to the user to provide reasonable input.

(As I wrote in the Rails issue) I'd actually prefer { 'a' => 1, :a => 2 } be turned into the questionably valid ("SHOULD" is not quite as strong as "MUST") JSON as "{\"a\":1,\"a\":2}" rather than either "{\"a\":1}" or "{\"a\":2}" as it would allow me to see that there is an issue with my program rather than hiding the error and making an arbitrary choice.

@byroot
Copy link
Member

byroot commented Oct 17, 2024

Thanks for the PR but I agree with John here. This is too costly to protect against this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON.dump inconsistent output for hash
5 participants