[Clang][Sema] Reject array prvalue operands #140702

Open
wants to merge 1 commit into
base: main
from

languagelawyer
of unary + and *, and binary + and - operators

Fixes #54016
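
For readers unfamiliar with the term, the sketch below (not part of the patch) shows one way to observe that `IA{ 1, 2, 3 }` is a prvalue of array type, while a named array is an lvalue. `Category` and `CategoryOf` are hypothetical helper names introduced only for illustration, and `IA` is spelled `int[3]` here rather than the `int[]` used in the test so that the expected type can be written out.

```cpp
#include <type_traits>
#include <utility>

using IA = int[3];

// decltype((e)) yields T& for an lvalue, T&& for an xvalue,
// and plain T for a prvalue. The helper below decodes that.
enum class Category { LValue, XValue, PRValue };

template <class T> struct CategoryOf {
  static constexpr Category value = Category::PRValue;
};
template <class T> struct CategoryOf<T &> {
  static constexpr Category value = Category::LValue;
};
template <class T> struct CategoryOf<T &&> {
  static constexpr Category value = Category::XValue;
};

IA ia = {1, 2, 3};

// A named array is an lvalue; array-to-pointer conversion applies to it.
static_assert(CategoryOf<decltype((ia))>::value == Category::LValue,
              "named array is an lvalue");

// std::move(ia) is an xvalue, which is still a glvalue.
static_assert(CategoryOf<decltype((std::move(ia)))>::value == Category::XValue,
              "std::move(ia) is an xvalue");

// IA{...} is a prvalue whose type is the array type itself.
static_assert(CategoryOf<decltype((IA{1, 2, 3}))>::value == Category::PRValue,
              "IA{...} is a prvalue");
static_assert(std::is_same<decltype(IA{1, 2, 3}), int[3]>::value,
              "the prvalue has array type, not pointer type");
```

With this PR applied, the intent is that only the last kind of operand is rejected by unary `+` and `*` and by binary `+` and `-`.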

@llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels May 20, 2025
@llvmbot
Member

llvmbot commented May 20, 2025

@llvm/pr-subscribers-clang

Author: None (languagelawyer)

Changes

of unary + and *, and binary + and - operators

Fixes #54016


Full diff: https://github.com/llvm/llvm-project/pull/140702.diff

2 Files Affected:

  • (modified) clang/lib/Sema/SemaExpr.cpp (+21)
  • (modified) clang/test/CXX/expr/p8.cpp (+10-2)
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index d1889100c382e..c0fa9bc9e895e 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -11333,6 +11333,11 @@ QualType Sema::CheckAdditionOperands(ExprResult &LHS, ExprResult &RHS,
   if (!IExp->getType()->isIntegerType())
     return InvalidOperands(Loc, LHS, RHS);
 
+  if (OriginalOperand Orig(PExp); Orig.getType()->isArrayType() && Orig.Orig->isPRValue()) {
+    Diag(Loc, diag::err_typecheck_array_prvalue_operand) << PExp->getSourceRange();
+    return QualType();
+  }
+
   // Adding to a null pointer results in undefined behavior.
   if (PExp->IgnoreParenCasts()->isNullPointerConstant(
           Context, Expr::NPC_ValueDependentIsNotNull)) {
@@ -11429,6 +11434,16 @@ QualType Sema::CheckSubtractionOperands(ExprResult &LHS, ExprResult &RHS,
     return compType;
   }
 
+  OriginalOperand OrigLHS(LHS.get()), OrigRHS(RHS.get());
+  bool LHSArrayPRV = OrigLHS.getType()->isArrayType() && OrigLHS.Orig->isPRValue();
+  bool RHSArrayPRV = OrigRHS.getType()->isArrayType() && OrigRHS.Orig->isPRValue();
+  if (LHSArrayPRV || RHSArrayPRV) {
+    auto&& diag = Diag(Loc, diag::err_typecheck_array_prvalue_operand);
+    if (LHSArrayPRV) diag << LHS.get()->getSourceRange();
+    if (RHSArrayPRV) diag << RHS.get()->getSourceRange();
+    return QualType();
+  }
+
   // Either ptr - int   or   ptr - ptr.
   if (LHS.get()->getType()->isAnyPointerType()) {
     QualType lpointee = LHS.get()->getType()->getPointeeType();
@@ -15840,6 +15855,12 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
       InputExpr->getType()->isSpecificBuiltinType(BuiltinType::Dependent)) {
     resultType = Context.DependentTy;
   } else {
+    if (Opc == UO_Deref || Opc == UO_Plus) {
+      if (auto *expr = Input.get(); expr->getType()->isArrayType() && expr->isPRValue()) {
+        Diag(OpLoc, diag::err_typecheck_array_prvalue_operand) << expr->getSourceRange();
+        return ExprError();
+      }
+    }
     switch (Opc) {
     case UO_PreInc:
     case UO_PreDec:
diff --git a/clang/test/CXX/expr/p8.cpp b/clang/test/CXX/expr/p8.cpp
index 471d1c5a30206..f736b88b3db09 100644
--- a/clang/test/CXX/expr/p8.cpp
+++ b/clang/test/CXX/expr/p8.cpp
@@ -1,5 +1,4 @@
-// RUN: %clang_cc1 -fsyntax-only -verify %s
-// expected-no-diagnostics
+// RUN: %clang_cc1 -fsyntax-only -verify %s -std=c++11
 
 int a0;
 const volatile int a1 = 2;
@@ -16,4 +15,13 @@ int main()
   f0(a1);
   f1(a2);
   f2(a3);
+
+  using IA = int[];
+  void(+IA{ 1, 2, 3 }); // expected-error {{array prvalue}}
+  void(*IA{ 1, 2, 3 }); // expected-error {{array prvalue}}
+  void(IA{ 1, 2, 3 } + 0); // expected-error {{array prvalue}}
+  void(IA{ 1, 2, 3 } - 0); // expected-error {{array prvalue}}
+  void(0 + IA{ 1, 2, 3 }); // expected-error {{array prvalue}}
+  void(0 - IA{ 1, 2, 3 }); // expected-error {{array prvalue}}
+  void(IA{ 1, 2, 3 } - IA{ 1, 2, 3 }); // expected-error {{array prvalue}}
 }

github-actions bot commented May 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@zwuis
Contributor

zwuis commented May 20, 2025

This is related to CWG2548. CC @Endilll

@languagelawyer
Author

languagelawyer commented May 20, 2025

This is related to CWG2548

Kinda, but not really. It has never been necessary to create that (non-)issue. Or does someone really believe that it is a C++ defect that IA{ 1, 2, 3 } - IA{ 1, 2, 3 } is not allowed? So, it is just a bug since C++11

@languagelawyer languagelawyer force-pushed the array_prvalues_as_operands branch 2 times, most recently from 3e262aa to f606724 Compare May 20, 2025 11:50
of unary + and *, and binary + and - operators
@languagelawyer languagelawyer force-pushed the array_prvalues_as_operands branch from f606724 to df91056 Compare May 20, 2025 11:55
Collaborator

@AaronBallman AaronBallman left a comment

Is this going to break behavior in C? https://godbolt.org/z/PsPPs8ov1

@languagelawyer
Author

languagelawyer commented May 20, 2025

Is this going to break behavior in C?

Array prvalues are a C++11-and-later-only thing. Compound literals are lvalues in C: https://port70.net/~nsz/c/c11/n1570.html#6.5.2.5p4
Anyway, here is a check:

$ build/bin/clang -fsyntax-only test.c
test.c:2:3: warning: expression result unused [-Wunused-value]
    2 |   *((int []){ 1, 2, 3});
      |   ^~~~~~~~~~~~~~~~~~~~~
test.c:3:24: warning: expression result unused [-Wunused-value]
    3 |   ((int []){ 1, 2, 3}) + 0;
      |   ~~~~~~~~~~~~~~~~~~~~ ^ ~
2 warnings generated.

However, it "breaks" compound literal C++ extension:

$ build/bin/clang++ -fsyntax-only test.cxx 
test.cxx:12:2: error: array prvalue is not permitted
   12 |         *((int []){ 1, 2, 3});
      |         ^~~~~~~~~~~~~~~~~~~~~
test.cxx:13:23: error: array prvalue is not permitted
   13 |         ((int []){ 1, 2, 3}) + 0;
      |         ~~~~~~~~~~~~~~~~~~~~ ^
2 errors generated.

but I guess this is OK/expected

@AaronBallman
Collaborator

Is this going to break behavior in C?

Array prvalues are a C++11-and-later-only thing. Compound literals are lvalues in C

Ah, good point, I wasn't remembering that C and C++ are different in how they handle compound literals. And you can return a pointer to an array in C, but that's still an lvalue.

Anyway, here is a check:

$ build/bin/clang -fsyntax-only test.c
test.c:2:3: warning: expression result unused [-Wunused-value]
    2 |   *((int []){ 1, 2, 3});
      |   ^~~~~~~~~~~~~~~~~~~~~
test.c:3:24: warning: expression result unused [-Wunused-value]
    3 |   ((int []){ 1, 2, 3}) + 0;
      |   ~~~~~~~~~~~~~~~~~~~~ ^ ~
2 warnings generated.

Thank you!

@AaronBallman
Collaborator

However, it "breaks" compound literal C++ extension:

$ build/bin/clang++ -fsyntax-only test.cxx 
test.cxx:12:2: error: array prvalue is not permitted
   12 |         *((int []){ 1, 2, 3});
      |         ^~~~~~~~~~~~~~~~~~~~~
test.cxx:13:23: error: array prvalue is not permitted
   13 |         ((int []){ 1, 2, 3}) + 0;
      |         ~~~~~~~~~~~~~~~~~~~~ ^
2 errors generated.

That probably should continue to work -- we accept it today and so does GCC. It's a bit of an oddity, to be sure. But I don't see why it should be rejected either.

@languagelawyer
Author

That probably should continue to work -- we accept it today and so does GCC

GCC started accepting it from version 14, likely when fixing https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94264 (it is mentioned in #54016 (comment)), but it also started accepting IA{ 1, 2, 3 } + 0, which is clearly wrong https://godbolt.org/z/r7Yc3o1qd @pinskia @jicama

I think that if someone wants 100% C compatibility, they should make compound literals lvalues. If compound literals are prvalues in the C++ extension, then they should be treated like other array prvalues

@AaronBallman
Collaborator

That probably should continue to work -- we accept it today and so does GCC

GCC started accepting it from version 14, likely when fixing https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94264 (it is mentioned in #54016 (comment)), but it also started accepting IA{ 1, 2, 3 } + 0, which is clearly wrong https://godbolt.org/z/r7Yc3o1qd @pinskia @jicama

I think that if someone wants 100% C compatibility, they should make compound literals lvalues. If compound literals are prvalues in the C++ extension, then they should be treated like other array prvalues

Ugh, I was backwards again. They're prvalues in C++ already, not lvalues as in C, so yes, it should be rejected in C++ with your changes. Sorry for the confusion!

@pinskia

pinskia commented May 20, 2025

So, reading https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220: it was decided to treat compound literals exactly the same as prvalue arrays in GCC's C++ mode. Yes, this breaks compatibility with C for the most part.

@languagelawyer
Author

@pinskia the thing is, GCC started accepting ill-formed code right when it was reaffirmed it is ill-formed

@@ -16,4 +15,13 @@ int main()
   f0(a1);
   f1(a2);
   f2(a3);
+
+  using IA = int[];
+  void(+IA{ 1, 2, 3 }); // expected-error {{array prvalue}}
Member

Can you add some tests here to make sure that we don’t complain about array glvalues?
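
A minimal sketch of what such glvalue coverage could exercise, written as small helper functions so it can be compiled and run on its own. The function names are hypothetical; in the actual lit test these would more likely be plain `void(...)` statements with no `expected-error` annotation. Both lvalue and xvalue array operands are glvalues, so array-to-pointer conversion applies and none of these should be diagnosed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

using IA = int[3];

// Lvalue array operands: array-to-pointer conversion applies,
// so all four operators touched by the patch remain valid.
inline int deref_lvalue(IA &a) { return *a; }                    // unary *
inline int *unary_plus_lvalue(IA &a) { return +a; }              // unary +
inline int *add_lvalue(IA &a) { return a + 1; }                  // binary +
inline std::ptrdiff_t sub_lvalue(IA &a) { return (a + 2) - a; }  // binary -

// Xvalue array operands are glvalues as well and should also be accepted.
inline int deref_xvalue(IA &a) { return *std::move(a); }
inline int *add_xvalue(IA &a) { return std::move(a) + 1; }

// Small runtime check exercising the helpers above.
inline void check_glvalue_operands() {
  int a[3] = {1, 2, 3};
  assert(deref_lvalue(a) == 1);
  assert(unary_plus_lvalue(a) == a);
  assert(*add_lvalue(a) == 2);
  assert(sub_lvalue(a) == 2);
  assert(deref_xvalue(a) == 1);
  assert(*add_xvalue(a) == 2);
}
```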

@@ -7639,6 +7639,8 @@ def warn_param_mismatched_alignment : Warning<
 
 def err_objc_object_assignment : Error<
   "cannot assign to class object (%0 invalid)">;
+def err_typecheck_array_prvalue_operand : Error<
+  "array prvalue is not permitted">;
Member

Suggested change
-  "array prvalue is not permitted">;
+  "operand of '%0' cannot be an array prvalue">;

I think it’s a bit clearer if we phrase it like this and also print out the operator rather than just saying ‘X is not permitted’.

Author

@languagelawyer languagelawyer May 21, 2025

But the operator is pointed at by ^, like

test.cxx:13:23: error: array prvalue is not permitted
   13 |         ((int []){ 1, 2, 3}) + 0;
      |         ~~~~~~~~~~~~~~~~~~~~ ^

Member

Hmm, in that case, maybe just ‘operand cannot be an array prvalue’. I would prefer at least including the word ‘operand’ so it’s clear that the problem is that you’re passing an array prvalue to this operator, because at the moment the diagnostic makes it sound like array prvalues aren’t permitted at all.

Contributor

But the operator is pointed at by ^

This part can be removed with the -fno-caret-diagnostics command line option. In that case, the diagnostic reads as follows:

test.cxx:13:23: error: array prvalue is not permitted

Users may get confused.

Author

@zwuis is "operand cannot be an array prvalue" ok? Comparing to an existing error:

	IA{ 1, 2, 3 } + 0.;
	IA{ 1, 2, 3 } + 0;
test.cxx:5:16: error: invalid operands to binary expression ('int[3]' and 'double')
test.cxx:6:16: error: operand cannot be an array prvalue

Collaborator

if you're coming from a C background
What about Python/Java background?

They're not particularly relevant because you don't directly mix Java and C++ code in the same way you do with C and C++ as happens in header files.

IIRC there's no rvalue conversion on the statement expression result so I think that ends up being a prvalue of array type
This is not correct, statement expressions (their "result") undergo array and function pointer decay (independently of the context where they appear), so you're getting a "dangling pointer"

I was incorrectly remembering the behavior from returning a char and not having it promote to an int. I can confirm we decay the array: https://godbolt.org/z/zhW1fnsej but you can hit the same concern I had via a typedef or using, where the type information is actually slightly helpful in understanding the issue: using foo = int[10]; foo{} + 0; (keeping in mind that foo{} could also be behind a macro where it's not easy for the user to spot the {} and realize there's a temporary involved). So I still find a formulation that includes the type information a bit more user-friendly, but that could be included in a new diagnostic that talks about temporary arrays.

Contributor

I agree with @AaronBallman and @Sirraide that using err_typecheck_invalid_operands is the better approach here.
The "prvalue" bit is only relevant as far as array decay is concerned (e.g., array-to-pointer conversion does not apply).
But ultimately, int[] + int is what we should diagnose, not how we got here.

Author

keeping in mind that foo{} could also be behind a macro

I doubt that presenting the exact array type vs. just saying "array" would help much in this case. There is a "note: expanded from macro" note to point out that something comes from a macro expansion. (Not shown for err_typecheck_invalid_operands, BTW!)

err_typecheck_invalid_operands is the better approach here

int main()
{
	using IA = int[];
	IA ia = { 1, 2, 3 };

	ia + 0.; // error: invalid operands to binary expression ('int[3]' and 'double')
	         // Where is the problem? in 'int[3]' or in 'double'?

	ia + 0; // no error means the issue was in 'double'?

	IA{ 1, 2, 3 } + 0; // error: invalid operands to binary expression ('int[3]' and 'int')
	                   // Huh? What about now?
}

But ultimately, int[] + int is what we should diagnose

The types are not the issue, why give misleading diagnostics?

Author

invalid operands to binary expression

BTW, I think it is either "in binary expression" or "to binary operator"

Member

invalid operands to binary expression

BTW, I think it is either "in binary expression" or "to binary operator"

‘an operand to sth.’ is a common turn of phrase that’s been around for a long time (not just in Clang).

@@ -15840,6 +15859,11 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
       InputExpr->getType()->isSpecificBuiltinType(BuiltinType::Dependent)) {
     resultType = Context.DependentTy;
   } else {
+    if (Opc == UO_Deref || Opc == UO_Plus) {
Member

It feels a bit weird to use an if statement here when there’s a switch statement on the same variable right after. I’d probably make this a lambda and then call it in the cases for * and + below (the + case would then also need a [[fallthrough]] annotation).
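
The shape being suggested could look roughly like the toy below. This is a sketch under stated assumptions, not Clang code: `UnaryOp`, `Operand`, and `passesArrayPRValueCheck` are hypothetical stand-ins for `UnaryOperatorKind`, the input expression, and the relevant part of `CreateBuiltinUnaryOp`, and it models only the new check, with all other per-operator validation elided. `[[fallthrough]]` requires C++17.

```cpp
#include <cassert>

// Hypothetical stand-ins for Clang's UnaryOperatorKind and operand expression.
enum class UnaryOp { Plus, Minus, Deref, Not };

struct Operand {
  bool isArray;
  bool isPRValue;
};

// Returns false where the real code would emit the diagnostic
// and return ExprError().
inline bool passesArrayPRValueCheck(UnaryOp op, const Operand &input) {
  // The check lives in one lambda and is invoked only from the
  // cases that need it, instead of an `if` in front of the switch.
  auto isArrayPRValue = [&input] { return input.isArray && input.isPRValue; };

  switch (op) {
  case UnaryOp::Plus:
    if (isArrayPRValue())
      return false;
    [[fallthrough]]; // assume unary + shares its remaining handling with unary -
  case UnaryOp::Minus:
    return true; // arithmetic operand checks elided in this sketch
  case UnaryOp::Deref:
    if (isArrayPRValue())
      return false;
    return true; // indirection checks elided in this sketch
  case UnaryOp::Not:
    return true;
  }
  return true; // unreachable for valid enumerators
}
```

Keeping the check inside the `switch` puts it next to the rest of the handling for each operator, at the cost of one extra annotation on the `+` case.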

@AaronBallman
Collaborator

This is related to CWG2548

Kinda, but not really. It has never been necessary to create that (non-)issue. Or does someone really believe that it is a C++ defect that IA{ 1, 2, 3 } - IA{ 1, 2, 3 } is not allowed? So, it is just a bug since C++11

I think it is CWG2548; that's what made it clear that array-to-pointer decay does not happen here despite that being inconsistent with the rest of the language (cplusplus/papers#1633)

@languagelawyer
Author

it is CWG2548; that's what made it clear that array-to-pointer decay does not happen here

CWG2548 is closed as NAD, with no changes to the wording. The fact that array-to-pointer decay does not happen has always been clear

despite that being inconsistent with the rest of the language

I'd say CWG hallucinated an inconsistency; to me there is none

@AaronBallman
Collaborator

it is CWG2548; that's what made it clear that array-to-pointer decay does not happen here

CWG2548 is closed as NAD, with no changes to the wording.

Correct.

The fact that array-to-pointer decay does not happen has always been clear

Incorrect; the reason the core issue was filed in the first place was because it seemed like an oversight that the language was inconsistent. Clang was implementing the consistent behavior where array to pointer decay happened. That's why this is CWG2548.

despite that being inconsistent with the rest of the language

I'd say CWG hallucinated an inconsistency; to me there is none

int x[10];

(void)(x + 0); // perfectly fine, array to pointer decay on x
(void)((int[10]){} + 0); // Not okay under CWG2548 being closed NAD, no array to pointer decay on the temporary

So definitely inconsistent, and EWG agreed that it was, but liked the accidental behavior that came from it.

@languagelawyer
Author

it seemed like an oversight that the language was inconsistent

Indeed, it only seemed.

Clang was implementing the consistent behavior

Clang is just cutting corners, trying to parse C and C++ with the same code in as few lines as possible, so it applied an incorrect conversion by batching all of the conversions and applying them at once. I think the only motivation for asking for an issue instead of fixing the bug was to keep the implementation simpler.

Not okay under CWG2548

Not okay under [expr.add] and [expr]/8

Imagine binary + and unary * performed temporary materialization:

const auto& r1 = IA{1, 2, 3}[0]; // extends lifetime of the temporary object
const auto& r2 = (IA{1, 2, 3} + 0); // does not extend
const auto& r3 = *(IA{1, 2, 3} + 0); // does not extend
const auto& r4 = *IA{1, 2, 3}; // does not extend

How would this be consistent? Quite the contrary, this would become inconsistent.

Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang++ incorrectly accepts addition with an array prvalue operand
7 participants