GH-48408: [C++] Enable ULP-based float comparison #49290
andishgar wants to merge 2 commits into apache:main
@pitrou, I would appreciate it if you could review this.
pitrou left a comment
Thanks for submitting this PR, @andishgar. As you'll see in the comments below, I think it would be good to come up with a nicer API for EqualOptions.
    bool use_atol_ = false;
    bool use_schema_ = true;
    bool use_metadata_ = false;
    bool use_ulp_distance_ = false;
We could have an enum to simplify those flags slightly, for example:

    enum FloatComparison { Exact, Atol, Ulps };
    FloatComparison float_comparison_ = Exact;

Alternatively, we could use a std::variant to also encompass the numeric parameters:

    struct ExactComparison {};
    struct UlpComparison { int max_ulps; };
    struct AtolComparison { double atol; };
    // Defaults to ExactComparison
    std::variant<ExactComparison, UlpComparison, AtolComparison> float_comparison_ = {};

      auto res = EqualOptions(*this);
      res.ulp_distance_ = v;
      return res;
    }
This is starting to be a lot of methods just to customize FP comparison. Also, usually you know the number of ULPs you want, so you have to call two methods (use_ulp_distance and ulp_distance) which feels pointlessly complicated.
I wonder if we could have a single method to encompass all these usages, for example:
    auto options = EqualOptions().float_comparison(EqualOptions::Atol(1e-5));
    auto options2 = EqualOptions().float_comparison(EqualOptions::Ulps(2));
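To make the suggestion concrete, here is a minimal sketch of how the std::variant idea and the single float_comparison method could fit together. It is not the Arrow implementation: the `sketch` namespace, the `FloatComparison` alias, and the `Exact`, `Atol`, and `Ulps` factory functions are hypothetical names chosen to match the call sites proposed in the review, and `max_ulps` is typed `int32_t` purely as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

namespace sketch {  // hypothetical namespace, not part of Arrow

// One struct per comparison mode; each carries only the parameter it needs.
struct ExactComparison {};
struct UlpComparison { int32_t max_ulps; };
struct AtolComparison { double atol; };

// Assumed alias name; the first alternative (exact) is the default.
using FloatComparison =
    std::variant<ExactComparison, UlpComparison, AtolComparison>;

class EqualOptions {
 public:
  // Hypothetical factory helpers so call sites read EqualOptions::Atol(1e-5).
  static FloatComparison Exact() { return ExactComparison{}; }
  static FloatComparison Atol(double tolerance) { return AtolComparison{tolerance}; }
  static FloatComparison Ulps(int32_t max_ulps) {
    assert(max_ulps >= 0);  // a negative ULP budget makes no sense
    return UlpComparison{max_ulps};
  }

  // The currently selected floating-point comparison mode.
  const FloatComparison& float_comparison() const { return float_comparison_; }

  // Return a new EqualOptions object with the comparison mode changed,
  // following the copy-and-modify style of the existing setters.
  EqualOptions float_comparison(FloatComparison v) const {
    auto res = EqualOptions(*this);
    res.float_comparison_ = v;
    return res;
  }

 private:
  FloatComparison float_comparison_ = ExactComparison{};
};

}  // namespace sketch
```

With this shape, a single call selects both the mode and its parameter, and the variant cannot hold an Atol and a ULP setting at the same time, so the question of how to combine them does not arise.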
@pitrou several notes

    static constexpr double kDefaultAbsoluteTolerance = 1E-5;
    static constexpr int32_t kDefaultUlpDistance = 4;

    /// A container of options for equality comparisons
    class EqualOptions {
     public:
      /// Whether or not NaNs are considered equal.
      bool nans_equal() const { return nans_equal_; }

      /// Return a new EqualOptions object with the "nans_equal" property changed.
      EqualOptions nans_equal(bool v) const {
        auto res = EqualOptions(*this);
        res.nans_equal_ = v;
        return res;
      }

      /// Whether or not zeros with differing signs are considered equal.
      bool signed_zeros_equal() const { return signed_zeros_equal_; }

      /// Return a new EqualOptions object with the "signed_zeros_equal" property changed.
      EqualOptions signed_zeros_equal(bool v) const {
        auto res = EqualOptions(*this);
        res.signed_zeros_equal_ = v;
        return res;
      }

      /// The absolute tolerance for approximate comparisons of floating-point values.
      std::optional<double> atol() const { return atol_; }

      /// Return a new EqualOptions object with the "atol" property changed.
      /// If both "ulp_distance" and "atol" are specified, the comparison
      /// succeeds when either condition is satisfied.
      EqualOptions atol(double v) const {
        auto res = EqualOptions(*this);
        res.atol_ = v;
        return res;
      }

      /// Whether the \ref arrow::Schema property is used in the comparison.
      ///
      /// This option only affects the Equals methods
      /// and has no effect on ApproxEquals methods.
      bool use_schema() const { return use_schema_; }

      /// Return a new EqualOptions object with the "use_schema" property changed.
      ///
      /// Setting this option to false causes the value of
      /// \ref EqualOptions::use_metadata to be ignored.
      EqualOptions use_schema(bool v) const {
        auto res = EqualOptions(*this);
        res.use_schema_ = v;
        return res;
      }

      /// Whether the "metadata" in \ref arrow::Schema is used in the comparison.
      ///
      /// This option only affects the Equals methods
      /// and has no effect on the ApproxEquals methods.
      ///
      /// Note: This option is only considered when \ref arrow::EqualOptions::use_schema is
      /// set to true.
      bool use_metadata() const { return use_metadata_; }

      /// Return a new EqualOptions object with the "use_metadata" property changed.
      EqualOptions use_metadata(bool v) const {
        auto res = EqualOptions(*this);
        res.use_metadata_ = v;
        return res;
      }

      /// The ulp distance for approximate comparisons of floating-point values.
      std::optional<int32_t> ulp_distance() const { return ulp_distance_; }

      /// Return a new EqualOptions object with the "ulp_distance" property changed.
      /// If both "ulp_distance" and "atol" are specified, the comparison
      /// succeeds when either condition is satisfied.
      EqualOptions ulp_distance(int32_t v) const {
        assert(v >= 0);
        auto res = EqualOptions(*this);
        res.ulp_distance_ = v;
        return res;
      }

      /// The ostream to which a diff will be formatted if arrays disagree.
      /// If this is null (the default) no diff will be formatted.
      std::ostream* diff_sink() const { return diff_sink_; }

      /// Return a new EqualOptions object with the "diff_sink" property changed.
      /// This option will be ignored if diff formatting of the types of compared arrays is
      /// not supported.
      EqualOptions diff_sink(std::ostream* diff_sink) const {
        auto res = EqualOptions(*this);
        res.diff_sink_ = diff_sink;
        return res;
      }

      static EqualOptions Defaults() { return {}; }

     protected:
      std::optional<double> atol_;
      std::optional<int32_t> ulp_distance_;
      bool nans_equal_ = false;
      bool signed_zeros_equal_ = true;
      bool use_schema_ = true;
      bool use_metadata_ = false;
      std::ostream* diff_sink_ = NULLPTR;
    };

There are several questions regarding my proposal:
Rationale for this change
Enable ULP-based floating-point comparison.
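For readers unfamiliar with the term, the following is a minimal sketch of what a ULP-distance check on doubles can look like. It is not the code in this PR: the `sketch` namespace and the `OrderedBits`, `UlpDistance`, and `WithinUlps` helpers are hypothetical, IEEE 754 binary64 doubles are assumed, and the choice to treat NaN as never within range is an assumption of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

namespace sketch {  // hypothetical namespace, not part of Arrow

// Map a double's bit pattern to an integer that grows monotonically with the
// value, so adjacent representable doubles map to adjacent integers.
// Both +0.0 and -0.0 map to 0.
inline int64_t OrderedBits(double x) {
  int64_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // Negative doubles have the sign bit set; flip them to -(magnitude bits).
  return bits < 0 ? std::numeric_limits<int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 when they are equal).
// Precondition: neither argument is NaN.
inline uint64_t UlpDistance(double a, double b) {
  assert(!std::isnan(a) && !std::isnan(b));
  const int64_t ia = OrderedBits(a);
  const int64_t ib = OrderedBits(b);
  const int64_t hi = ia > ib ? ia : ib;
  const int64_t lo = ia > ib ? ib : ia;
  // Subtract as unsigned so the gap between large values of opposite sign
  // cannot overflow a signed integer.
  return static_cast<uint64_t>(hi) - static_cast<uint64_t>(lo);
}

// True when a and b are at most max_ulps representable values apart.
inline bool WithinUlps(double a, double b, uint64_t max_ulps) {
  if (std::isnan(a) || std::isnan(b)) return false;  // assumption of this sketch
  return UlpDistance(a, b) <= max_ulps;
}

}  // namespace sketch
```

Unlike an absolute tolerance, the allowed error here scales with the magnitude of the operands: `sketch::WithinUlps(0.1 + 0.2, 0.3, 4)` holds even though the two values differ under `==`, while `1.0` and `1.0001` are far more than 4 ULPs apart.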
What changes are included in this PR?
Add arrow::EqualOptions::use_ulp_distance and arrow::EqualOptions::ulp_distance.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. The ULP-based comparison method is enabled via arrow::EqualOptions::use_ulp_distance and arrow::EqualOptions::ulp_distance.