
GH-48408: [C++] Enable ULP-based float comparison #49290

Open
andishgar wants to merge 2 commits into apache:main from andishgar:enable_ulp_based_comparison


andishgar (Contributor) commented Feb 16, 2026

Rationale for this change

Enable ULP-based floating-point comparison.

What changes are included in this PR?

Add arrow::EqualOptions::use_ulp_distance and arrow::EqualOptions::ulp_distance.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. The ULP-based comparison method is enabled via arrow::EqualOptions::use_ulp_distance and arrow::EqualOptions::ulp_distance.
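
For readers unfamiliar with the term, the distance in ULPs (units in the last place) between two doubles is the number of representable values separating them. The following is a minimal sketch of one common way to compute it, by mapping bit patterns onto an ordered integer scale. It is not the implementation in this PR; `ToBiased`, `UlpDistance`, and `WithinUlps` are hypothetical names chosen for illustration, and NaNs are simply treated as unequal here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a double's sign-and-magnitude bit pattern onto an unsigned scale on which
// adjacent representable doubles are adjacent integers. +0.0 and -0.0 map to
// the same point, so they are 0 ULPs apart.
inline uint64_t ToBiased(double x) {
  uint64_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  constexpr uint64_t kSignBit = uint64_t{1} << 63;
  const uint64_t magnitude = bits & ~kSignBit;
  return (bits & kSignBit) ? kSignBit - magnitude : kSignBit + magnitude;
}

// Number of representable doubles between a and b (inputs must not be NaN).
inline uint64_t UlpDistance(double a, double b) {
  const uint64_t ba = ToBiased(a);
  const uint64_t bb = ToBiased(b);
  return ba > bb ? ba - bb : bb - ba;
}

// True if a and b are at most max_ulps representable values apart.
// Assumption for this sketch: any NaN operand compares unequal.
inline bool WithinUlps(double a, double b, int32_t max_ulps) {
  assert(max_ulps >= 0);
  if (std::isnan(a) || std::isnan(b)) return false;
  return UlpDistance(a, b) <= static_cast<uint64_t>(max_ulps);
}
```

Unlike an absolute tolerance, this criterion scales with the magnitude of the operands: one ULP near 1.0 is about 2.2e-16, while one ULP near 1e20 is 16384.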

@github-actions

⚠️ GitHub issue #48408 has been automatically assigned in GitHub to PR creator.

andishgar force-pushed the enable_ulp_based_comparison branch from c79f91f to 9cd0147 on February 16, 2026 07:40
andishgar (Contributor, Author) commented

@pitrou, I would appreciate it if you could review this.

pitrou (Member) left a comment

Thanks for submitting this PR, @andishgar. As you'll see in the comments below, I think it would be good to come up with a nicer API for EqualOptions.

bool use_atol_ = false;
bool use_schema_ = true;
bool use_metadata_ = false;
bool use_ulp_distance_ = false;

We could have an enum to simplify those flags slightly, for example:

  enum FloatComparison { Exact, Atol, Ulps };
  FloatComparison float_comparison_ = Exact;

Alternatively, we could use a std::variant to also encompass the numeric parameters:

  struct ExactComparison {};
  struct UlpComparison { int max_ulps; };
  struct AtolComparison { double atol; };

  // Defaults to ExactComparison
  std::variant<ExactComparison, UlpComparison, AtolComparison> float_comparison_ = {};
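
To make the variant idea concrete, here is a minimal, self-contained sketch of how a comparison routine might dispatch on such a variant with `std::visit`. It reuses the three struct names from the suggestion above; the `FloatComparison` alias and the `FloatsEqual` helper are hypothetical, the ULP check is done by stepping with `std::nextafter` purely for brevity, and none of this is taken from the Arrow code base.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>
#include <variant>

struct ExactComparison {};
struct UlpComparison { int max_ulps; };
struct AtolComparison { double atol; };

// A default-constructed variant holds its first alternative, ExactComparison.
using FloatComparison =
    std::variant<ExactComparison, UlpComparison, AtolComparison>;

// Hypothetical helper: compare two doubles according to the selected mode.
inline bool FloatsEqual(double a, double b, const FloatComparison& mode) {
  return std::visit(
      [a, b](const auto& m) -> bool {
        using T = std::decay_t<decltype(m)>;
        if constexpr (std::is_same_v<T, ExactComparison>) {
          return a == b;
        } else if constexpr (std::is_same_v<T, AtolComparison>) {
          assert(m.atol >= 0);
          return std::fabs(a - b) <= m.atol;
        } else {
          static_assert(std::is_same_v<T, UlpComparison>);
          assert(m.max_ulps >= 0);
          // Walk from a toward b one representable value at a time.
          double x = a;
          for (int i = 0; i < m.max_ulps && x != b; ++i) {
            x = std::nextafter(x, b);
          }
          return x == b;
        }
      },
      mode);
}
```

One consequence of this design is that the mode and its parameter travel together, so a state such as "ULP comparison enabled but no distance set" cannot be represented.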

auto res = EqualOptions(*this);
res.ulp_distance_ = v;
return res;
}

This is starting to be a lot of methods just to customize FP comparison. Also, usually you know the number of ULPs you want, so you have to call two methods (use_ulp_distance and ulp_distance) which feels pointlessly complicated.

I wonder if we could have a single method to encompass all these usages, for example:

  auto options = EqualOptions().float_comparison(EqualOptions::Atol(1e-5));
  auto options2 = EqualOptions().float_comparison(EqualOptions::Ulps(2));
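
A rough sketch of what such a single-method API could look like follows. `SketchEqualOptions` is a hypothetical stand-in that models only the float-comparison setting; it is not `arrow::EqualOptions`, and the nested `Exact`, `Atol`, and `Ulps` types with their `value` members are assumptions made for illustration.

```cpp
#include <cassert>
#include <variant>

// Hypothetical stand-in for EqualOptions; only float comparison is modeled.
class SketchEqualOptions {
 public:
  struct Exact {};
  struct Atol {
    explicit Atol(double v) : value(v) { assert(v >= 0); }
    double value;
  };
  struct Ulps {
    explicit Ulps(int v) : value(v) { assert(v >= 0); }
    int value;
  };
  using FloatComparison = std::variant<Exact, Atol, Ulps>;

  /// The currently selected floating-point comparison mode.
  const FloatComparison& float_comparison() const { return float_comparison_; }

  /// Return a new options object with the float comparison mode changed,
  /// following the copy-and-modify style of the existing setters.
  SketchEqualOptions float_comparison(FloatComparison v) const {
    auto res = SketchEqualOptions(*this);
    res.float_comparison_ = v;
    return res;
  }

 private:
  FloatComparison float_comparison_ = Exact{};
};
```

With this shape, `SketchEqualOptions().float_comparison(SketchEqualOptions::Ulps(2))` selects the mode and supplies its parameter in a single call.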

@github-actions github-actions bot added awaiting review, awaiting committer review and removed awaiting review, awaiting committer review labels Feb 25, 2026
andishgar (Contributor, Author) commented Feb 26, 2026

@pitrou, a few notes:
1- As I mentioned here, using both atol and ulp_distance together seems to be a common approach.
2- There are four options for floating-point comparison (NaN handling, signed zeros, ULP distance, and atol), and they can be combined. My proposal is to use a class like the following.

static constexpr double kDefaultAbsoluteTolerance = 1E-5;
static constexpr int32_t kDefaultUlpDistance = 4;

/// A container of options for equality comparisons
class EqualOptions {
 public:
  /// Whether or not NaNs are considered equal.
  bool nans_equal() const { return nans_equal_; }

  /// Return a new EqualOptions object with the "nans_equal" property changed.
  EqualOptions nans_equal(bool v) const {
    auto res = EqualOptions(*this);
    res.nans_equal_ = v;
    return res;
  }

  /// Whether or not zeros with differing signs are considered equal.
  bool signed_zeros_equal() const { return signed_zeros_equal_; }

  /// Return a new EqualOptions object with the "signed_zeros_equal" property changed.
  EqualOptions signed_zeros_equal(bool v) const {
    auto res = EqualOptions(*this);
    res.signed_zeros_equal_ = v;
    return res;
  }

  /// The absolute tolerance for approximate comparisons of floating-point values.
  std::optional<double> atol() const { return atol_; }

  /// Return a new EqualOptions object with the "atol" property changed.
  /// If both "ulp_distance" and "atol" are specified, the comparison
  /// succeeds when either condition is satisfied.
  EqualOptions atol(double v) const {
    auto res = EqualOptions(*this);
    res.atol_ = v;
    return res;
  }

  /// Whether the \ref arrow::Schema property is used in the comparison.
  ///
  /// This option only affects the Equals methods
  /// and has no effect on ApproxEquals methods.
  bool use_schema() const { return use_schema_; }

  /// Return a new EqualOptions object with the "use_schema" property changed.
  ///
  /// Setting this option to false causes the value of \ref EqualOptions::use_metadata
  /// to be ignored.
  EqualOptions use_schema(bool v) const {
    auto res = EqualOptions(*this);
    res.use_schema_ = v;
    return res;
  }

  /// Whether the "metadata" in \ref arrow::Schema is used in the comparison.
  ///
  /// This option only affects the Equals methods
  /// and has no effect on the ApproxEquals methods.
  ///
  /// Note: This option is only considered when \ref arrow::EqualOptions::use_schema is
  /// set to true.
  bool use_metadata() const { return use_metadata_; }

  /// Return a new EqualOptions object with the "use_metadata" property changed.
  EqualOptions use_metadata(bool v) const {
    auto res = EqualOptions(*this);
    res.use_metadata_ = v;
    return res;
  }

  /// The ulp distance for approximate comparisons of floating-point values.
  std::optional<int32_t> ulp_distance() const { return ulp_distance_; }

  /// Return a new EqualOptions object with the "ulp_distance" property changed.
  /// If both "ulp_distance" and "atol" are specified, the comparison
  /// succeeds when either condition is satisfied.
  EqualOptions ulp_distance(int32_t v) const {
    assert(v >= 0);
    auto res = EqualOptions(*this);
    res.ulp_distance_ = v;
    return res;
  }

  /// The ostream to which a diff will be formatted if arrays disagree.
  /// If this is null (the default) no diff will be formatted.
  std::ostream* diff_sink() const { return diff_sink_; }

  /// Return a new EqualOptions object with the "diff_sink" property changed.
  /// This option will be ignored if diff formatting of the types of compared arrays is
  /// not supported.
  EqualOptions diff_sink(std::ostream* diff_sink) const {
    auto res = EqualOptions(*this);
    res.diff_sink_ = diff_sink;
    return res;
  }

  static EqualOptions Defaults() { return {}; }

 protected:
  std::optional<double> atol_;
  std::optional<int32_t> ulp_distance_;
  bool nans_equal_ = false;
  bool signed_zeros_equal_ = true;
  bool use_schema_ = true;
  bool use_metadata_ = false;

  std::ostream* diff_sink_ = NULLPTR;
};
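
The "either condition" rule described in the doc comments above could be sketched as follows. This is an illustration under stated assumptions, not code from the PR: `ApproxEqualEither` is a hypothetical free function, the `nans_equal` and `signed_zeros_equal` options are ignored (NaN never matches, and +0.0 equals -0.0), and the ULP check steps with `std::nextafter` for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Hypothetical helper: two doubles match if they are exactly equal, or if any
// configured tolerance (absolute or ULP-based) accepts them.
inline bool ApproxEqualEither(double a, double b, std::optional<double> atol,
                              std::optional<int32_t> ulp_distance) {
  if (a == b) return true;
  if (std::isnan(a) || std::isnan(b)) return false;
  if (atol.has_value()) {
    assert(*atol >= 0);
    if (std::fabs(a - b) <= *atol) return true;
  }
  if (ulp_distance.has_value()) {
    assert(*ulp_distance >= 0);
    // Walk from a toward b one representable value at a time.
    double x = a;
    for (int32_t i = 0; i < *ulp_distance && x != b; ++i) {
      x = std::nextafter(x, b);
    }
    if (x == b) return true;
  }
  return false;
}
```

The two criteria cover different ranges, which is one reason to combine them: an absolute tolerance accepts small differences near zero, where even tiny values are many ULPs apart, while a ULP bound accepts neighboring values at large magnitudes, where one ULP far exceeds any reasonable atol.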

I have two questions regarding my proposal:
1- Such an API could lead to assertion failures like this. Would that be problematic?
2- Should we consider removing the ApproxEquals methods from Arrow?
