Skip to content

[clangd] Improve Markup Rendering #140498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 25, 2025

Conversation

tcottin
Copy link
Contributor

@tcottin tcottin commented May 19, 2025

This is a preparation for fixing clangd/clangd#529.

It changes the Markup rendering to markdown and plaintext.

  • Properly separate paragraphs using an empty line between
  • Dont escape markdown syntax for markdown output except for HTML
  • Dont do any formatting for markdown because the client is handling the actual markdown rendering

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented May 19, 2025

@llvm/pr-subscribers-clangd

@llvm/pr-subscribers-clang-tools-extra

Author: None (tcottin)

Changes

This is a preparation for fixing clangd/clangd#529.

It changes the Markup rendering to markdown and plaintext.

  • Properly separate paragraphs using an empty line between
  • Dont escape markdown syntax for markdown output except for HTML
  • Dont do any formatting for markdown because the client is handling the actual markdown rendering

Patch is 41.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/140498.diff

7 Files Affected:

  • (modified) clang-tools-extra/clangd/Hover.cpp (+13-68)
  • (modified) clang-tools-extra/clangd/support/Markup.cpp (+147-105)
  • (modified) clang-tools-extra/clangd/support/Markup.h (+26-6)
  • (modified) clang-tools-extra/clangd/test/signature-help.test (+2-2)
  • (modified) clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp (+4-4)
  • (modified) clang-tools-extra/clangd/unittests/HoverTests.cpp (+51-24)
  • (modified) clang-tools-extra/clangd/unittests/support/MarkupTests.cpp (+167-47)
diff --git a/clang-tools-extra/clangd/Hover.cpp b/clang-tools-extra/clangd/Hover.cpp
index 3ab3d89030520..88755733aa67c 100644
--- a/clang-tools-extra/clangd/Hover.cpp
+++ b/clang-tools-extra/clangd/Hover.cpp
@@ -960,42 +960,6 @@ std::optional<HoverInfo> getHoverContents(const Attr *A, ParsedAST &AST) {
   return HI;
 }
 
-bool isParagraphBreak(llvm::StringRef Rest) {
-  return Rest.ltrim(" \t").starts_with("\n");
-}
-
-bool punctuationIndicatesLineBreak(llvm::StringRef Line) {
-  constexpr llvm::StringLiteral Punctuation = R"txt(.:,;!?)txt";
-
-  Line = Line.rtrim();
-  return !Line.empty() && Punctuation.contains(Line.back());
-}
-
-bool isHardLineBreakIndicator(llvm::StringRef Rest) {
-  // '-'/'*' md list, '@'/'\' documentation command, '>' md blockquote,
-  // '#' headings, '`' code blocks
-  constexpr llvm::StringLiteral LinebreakIndicators = R"txt(-*@\>#`)txt";
-
-  Rest = Rest.ltrim(" \t");
-  if (Rest.empty())
-    return false;
-
-  if (LinebreakIndicators.contains(Rest.front()))
-    return true;
-
-  if (llvm::isDigit(Rest.front())) {
-    llvm::StringRef AfterDigit = Rest.drop_while(llvm::isDigit);
-    if (AfterDigit.starts_with(".") || AfterDigit.starts_with(")"))
-      return true;
-  }
-  return false;
-}
-
-bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) {
-  // Should we also consider whether Line is short?
-  return punctuationIndicatesLineBreak(Line) || isHardLineBreakIndicator(Rest);
-}
-
 void addLayoutInfo(const NamedDecl &ND, HoverInfo &HI) {
   if (ND.isInvalidDecl())
     return;
@@ -1601,51 +1565,32 @@ std::optional<llvm::StringRef> getBacktickQuoteRange(llvm::StringRef Line,
   return Line.slice(Offset, Next + 1);
 }
 
-void parseDocumentationLine(llvm::StringRef Line, markup::Paragraph &Out) {
+void parseDocumentationParagraph(llvm::StringRef Text, markup::Paragraph &Out) {
   // Probably this is appendText(Line), but scan for something interesting.
-  for (unsigned I = 0; I < Line.size(); ++I) {
-    switch (Line[I]) {
+  for (unsigned I = 0; I < Text.size(); ++I) {
+    switch (Text[I]) {
     case '`':
-      if (auto Range = getBacktickQuoteRange(Line, I)) {
-        Out.appendText(Line.substr(0, I));
+      if (auto Range = getBacktickQuoteRange(Text, I)) {
+        Out.appendText(Text.substr(0, I));
         Out.appendCode(Range->trim("`"), /*Preserve=*/true);
-        return parseDocumentationLine(Line.substr(I + Range->size()), Out);
+        return parseDocumentationParagraph(Text.substr(I + Range->size()), Out);
       }
       break;
     }
   }
-  Out.appendText(Line).appendSpace();
+  Out.appendText(Text);
 }
 
 void parseDocumentation(llvm::StringRef Input, markup::Document &Output) {
-  std::vector<llvm::StringRef> ParagraphLines;
-  auto FlushParagraph = [&] {
-    if (ParagraphLines.empty())
-      return;
-    auto &P = Output.addParagraph();
-    for (llvm::StringRef Line : ParagraphLines)
-      parseDocumentationLine(Line, P);
-    ParagraphLines.clear();
-  };
+  llvm::StringRef Paragraph, Rest;
+  for (std::tie(Paragraph, Rest) = Input.split("\n\n");
+       !(Paragraph.empty() && Rest.empty());
+       std::tie(Paragraph, Rest) = Rest.split("\n\n")) {
 
-  llvm::StringRef Line, Rest;
-  for (std::tie(Line, Rest) = Input.split('\n');
-       !(Line.empty() && Rest.empty());
-       std::tie(Line, Rest) = Rest.split('\n')) {
-
-    // After a linebreak remove spaces to avoid 4 space markdown code blocks.
-    // FIXME: make FlushParagraph handle this.
-    Line = Line.ltrim();
-    if (!Line.empty())
-      ParagraphLines.push_back(Line);
-
-    if (isParagraphBreak(Rest) || isHardLineBreakAfter(Line, Rest)) {
-      FlushParagraph();
-    }
+    if (!Paragraph.empty())
+      parseDocumentationParagraph(Paragraph, Output.addParagraph());
   }
-  FlushParagraph();
 }
-
 llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
                               const HoverInfo::PrintedType &T) {
   OS << T.Type;
diff --git a/clang-tools-extra/clangd/support/Markup.cpp b/clang-tools-extra/clangd/support/Markup.cpp
index 63aff96b02056..b1e6252e473f5 100644
--- a/clang-tools-extra/clangd/support/Markup.cpp
+++ b/clang-tools-extra/clangd/support/Markup.cpp
@@ -11,7 +11,6 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
-#include "llvm/Support/Compiler.h"
 #include "llvm/Support/raw_ostream.h"
 #include <cstddef>
 #include <iterator>
@@ -56,80 +55,28 @@ bool looksLikeTag(llvm::StringRef Contents) {
   return true; // Potentially incomplete tag.
 }
 
-// Tests whether C should be backslash-escaped in markdown.
-// The string being escaped is Before + C + After. This is part of a paragraph.
-// StartsLine indicates whether `Before` is the start of the line.
-// After may not be everything until the end of the line.
-//
-// It's always safe to escape punctuation, but want minimal escaping.
-// The strategy is to escape the first character of anything that might start
-// a markdown grammar construct.
-bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,
-                        bool StartsLine) {
-  assert(Before.take_while(llvm::isSpace).empty());
-  auto RulerLength = [&]() -> /*Length*/ unsigned {
-    if (!StartsLine || !Before.empty())
-      return false;
-    llvm::StringRef A = After.rtrim();
-    return llvm::all_of(A, [C](char D) { return C == D; }) ? 1 + A.size() : 0;
-  };
-  auto IsBullet = [&]() {
-    return StartsLine && Before.empty() &&
-           (After.empty() || After.starts_with(" "));
-  };
-  auto SpaceSurrounds = [&]() {
-    return (After.empty() || llvm::isSpace(After.front())) &&
-           (Before.empty() || llvm::isSpace(Before.back()));
-  };
-  auto WordSurrounds = [&]() {
-    return (!After.empty() && llvm::isAlnum(After.front())) &&
-           (!Before.empty() && llvm::isAlnum(Before.back()));
-  };
-
+/// \brief Tests whether \p C should be backslash-escaped in markdown.
+///
+/// The MarkupContent LSP specification defines that `markdown` content needs to
+/// follow GFM (GitHub Flavored Markdown) rules. And we can assume that markdown
+/// is rendered on the client side. This means we do not need to escape any
+/// markdown constructs.
+/// The only exception is when the client does not support HTML rendering in
+/// markdown. In that case, we need to escape HTML tags and HTML entities.
+///
+/// **FIXME:** handle the case when the client does support HTML rendering in
+/// markdown. For this, the LSP server needs to check the
+/// [supportsHtml capability](https://github.com/microsoft/language-server-protocol/issues/1344)
+/// of the client.
+///
+/// \param C The character to check.
+/// \param After The string that follows \p C . This is used to determine if \p C is
+///             part of a tag or an entity reference.
+/// \returns true if \p C should be escaped, false otherwise.
+bool needsLeadingEscape(char C, llvm::StringRef After) {
   switch (C) {
-  case '\\': // Escaped character.
-    return true;
-  case '`': // Code block or inline code
-    // Any number of backticks can delimit an inline code block that can end
-    // anywhere (including on another line). We must escape them all.
-    return true;
-  case '~': // Code block
-    return StartsLine && Before.empty() && After.starts_with("~~");
-  case '#': { // ATX heading.
-    if (!StartsLine || !Before.empty())
-      return false;
-    llvm::StringRef Rest = After.ltrim(C);
-    return Rest.empty() || Rest.starts_with(" ");
-  }
-  case ']': // Link or link reference.
-    // We escape ] rather than [ here, because it's more constrained:
-    //   ](...) is an in-line link
-    //   ]: is a link reference
-    // The following are only links if the link reference exists:
-    //   ] by itself is a shortcut link
-    //   ][...] is an out-of-line link
-    // Because we never emit link references, we don't need to handle these.
-    return After.starts_with(":") || After.starts_with("(");
-  case '=': // Setex heading.
-    return RulerLength() > 0;
-  case '_': // Horizontal ruler or matched delimiter.
-    if (RulerLength() >= 3)
-      return true;
-    // Not a delimiter if surrounded by space, or inside a word.
-    // (The rules at word boundaries are subtle).
-    return !(SpaceSurrounds() || WordSurrounds());
-  case '-': // Setex heading, horizontal ruler, or bullet.
-    if (RulerLength() > 0)
-      return true;
-    return IsBullet();
-  case '+': // Bullet list.
-    return IsBullet();
-  case '*': // Bullet list, horizontal ruler, or delimiter.
-    return IsBullet() || RulerLength() >= 3 || !SpaceSurrounds();
   case '<': // HTML tag (or autolink, which we choose not to escape)
     return looksLikeTag(After);
-  case '>': // Quote marker. Needs escaping at start of line.
-    return StartsLine && Before.empty();
   case '&': { // HTML entity reference
     auto End = After.find(';');
     if (End == llvm::StringRef::npos)
@@ -142,10 +89,6 @@ bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,
     }
     return llvm::all_of(Content, llvm::isAlpha);
   }
-  case '.': // Numbered list indicator. Escape 12. -> 12\. at start of line.
-  case ')':
-    return StartsLine && !Before.empty() &&
-           llvm::all_of(Before, llvm::isDigit) && After.starts_with(" ");
   default:
     return false;
   }
@@ -156,8 +99,7 @@ bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,
 std::string renderText(llvm::StringRef Input, bool StartsLine) {
   std::string R;
   for (unsigned I = 0; I < Input.size(); ++I) {
-    if (needsLeadingEscape(Input[I], Input.substr(0, I), Input.substr(I + 1),
-                           StartsLine))
+    if (needsLeadingEscape(Input[I], Input.substr(I + 1)))
       R.push_back('\\');
     R.push_back(Input[I]);
   }
@@ -303,11 +245,12 @@ class CodeBlock : public Block {
 std::string indentLines(llvm::StringRef Input) {
   assert(!Input.ends_with("\n") && "Input should've been trimmed.");
   std::string IndentedR;
-  // We'll add 2 spaces after each new line.
+  // We'll add 2 spaces after each new line which is not followed by another new line.
   IndentedR.reserve(Input.size() + Input.count('\n') * 2);
-  for (char C : Input) {
+  for (size_t I = 0; I < Input.size(); ++I) {
+    char C = Input[I];
     IndentedR += C;
-    if (C == '\n')
+    if (C == '\n' && (((I + 1) < Input.size()) && (Input[I + 1] != '\n')))
       IndentedR.append("  ");
   }
   return IndentedR;
@@ -348,20 +291,24 @@ void Paragraph::renderMarkdown(llvm::raw_ostream &OS) const {
     if (C.SpaceBefore || NeedsSpace)
       OS << " ";
     switch (C.Kind) {
-    case Chunk::PlainText:
+    case ChunkKind::PlainText:
       OS << renderText(C.Contents, !HasChunks);
       break;
-    case Chunk::InlineCode:
+    case ChunkKind::InlineCode:
       OS << renderInlineBlock(C.Contents);
       break;
+    case ChunkKind::Bold:
+      OS << "**" << renderText(C.Contents, !HasChunks) << "**";
+      break;
+    case ChunkKind::Emphasized:
+      OS << "*" << renderText(C.Contents, !HasChunks) << "*";
+      break;
     }
     HasChunks = true;
     NeedsSpace = C.SpaceAfter;
   }
-  // Paragraphs are translated into markdown lines, not markdown paragraphs.
-  // Therefore it only has a single linebreak afterwards.
-  // VSCode requires two spaces at the end of line to start a new one.
-  OS << "  \n";
+  // A paragraph in markdown is separated by a blank line.
+  OS << "\n\n";
 }
 
 std::unique_ptr<Block> Paragraph::clone() const {
@@ -370,8 +317,8 @@ std::unique_ptr<Block> Paragraph::clone() const {
 
 /// Choose a marker to delimit `Text` from a prioritized list of options.
 /// This is more readable than escaping for plain-text.
-llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,
-                             llvm::StringRef Text) {
+llvm::StringRef Paragraph::chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,
+                                        llvm::StringRef Text) const {
   // Prefer a delimiter whose characters don't appear in the text.
   for (llvm::StringRef S : Options)
     if (Text.find_first_of(S) == llvm::StringRef::npos)
@@ -379,18 +326,94 @@ llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,
   return Options.front();
 }
 
+bool Paragraph::punctuationIndicatesLineBreak(llvm::StringRef Line) const{
+  constexpr llvm::StringLiteral Punctuation = R"txt(.:,;!?)txt";
+
+  Line = Line.rtrim();
+  return !Line.empty() && Punctuation.contains(Line.back());
+}
+
+bool Paragraph::isHardLineBreakIndicator(llvm::StringRef Rest) const {
+  // '-'/'*' md list, '@'/'\' documentation command, '>' md blockquote,
+  // '#' headings, '`' code blocks, two spaces (markdown force newline)
+  constexpr llvm::StringLiteral LinebreakIndicators = R"txt(-*@\>#`)txt";
+
+  Rest = Rest.ltrim(" \t");
+  if (Rest.empty())
+    return false;
+
+  if (LinebreakIndicators.contains(Rest.front()))
+    return true;
+
+  if (llvm::isDigit(Rest.front())) {
+    llvm::StringRef AfterDigit = Rest.drop_while(llvm::isDigit);
+    if (AfterDigit.starts_with(".") || AfterDigit.starts_with(")"))
+      return true;
+  }
+  return false;
+}
+
+bool Paragraph::isHardLineBreakAfter(llvm::StringRef Line,
+                                     llvm::StringRef Rest) const {
+  // In Markdown, 2 spaces before a line break forces a line break.
+  // Add a line break for plaintext in this case too.
+  // Should we also consider whether Line is short?
+  return Line.ends_with("  ") || punctuationIndicatesLineBreak(Line) ||
+         isHardLineBreakIndicator(Rest);
+}
+
 void Paragraph::renderPlainText(llvm::raw_ostream &OS) const {
   bool NeedsSpace = false;
+  std::string ConcatenatedText;
+  llvm::raw_string_ostream ConcatenatedOS(ConcatenatedText);
+
   for (auto &C : Chunks) {
+
+    if (C.Kind == ChunkKind::PlainText) {
+      if (C.SpaceBefore || NeedsSpace)
+        ConcatenatedOS << ' ';
+
+      ConcatenatedOS << C.Contents;
+      NeedsSpace = llvm::isSpace(C.Contents.back()) || C.SpaceAfter;
+      continue;
+    }
+
     if (C.SpaceBefore || NeedsSpace)
-      OS << " ";
+      ConcatenatedOS << ' ';
     llvm::StringRef Marker = "";
-    if (C.Preserve && C.Kind == Chunk::InlineCode)
+    if (C.Preserve && C.Kind == ChunkKind::InlineCode)
       Marker = chooseMarker({"`", "'", "\""}, C.Contents);
-    OS << Marker << C.Contents << Marker;
+    else if (C.Kind == ChunkKind::Bold)
+      Marker = "**";
+    else if (C.Kind == ChunkKind::Emphasized)
+      Marker = "*";
+    ConcatenatedOS << Marker << C.Contents << Marker;
     NeedsSpace = C.SpaceAfter;
   }
-  OS << '\n';
+
+  // We go through the contents line by line to handle the newlines
+  // and required spacing correctly.
+  llvm::StringRef Line, Rest;
+
+  for (std::tie(Line, Rest) =
+           llvm::StringRef(ConcatenatedText).trim().split('\n');
+       !(Line.empty() && Rest.empty());
+       std::tie(Line, Rest) = Rest.split('\n')) {
+
+    Line = Line.ltrim();
+    if (Line.empty())
+      continue;
+
+    OS << canonicalizeSpaces(Line);
+
+    if (isHardLineBreakAfter(Line, Rest))
+      OS << '\n';
+    else if (!Rest.empty())
+      OS << ' ';
+  }
+
+  // Paragraphs are separated by a blank line.
+  OS << "\n\n";
 }
 
 BulletList::BulletList() = default;
@@ -398,12 +421,13 @@ BulletList::~BulletList() = default;
 
 void BulletList::renderMarkdown(llvm::raw_ostream &OS) const {
   for (auto &D : Items) {
+    std::string M = D.asMarkdown();
     // Instead of doing this we might prefer passing Indent to children to get
     // rid of the copies, if it turns out to be a bottleneck.
-    OS << "- " << indentLines(D.asMarkdown()) << '\n';
+    OS << "- " << indentLines(M) << '\n';
   }
   // We need a new line after list to terminate it in markdown.
-  OS << '\n';
+  OS << "\n\n";
 }
 
 void BulletList::renderPlainText(llvm::raw_ostream &OS) const {
@@ -412,6 +436,7 @@ void BulletList::renderPlainText(llvm::raw_ostream &OS) const {
     // rid of the copies, if it turns out to be a bottleneck.
     OS << "- " << indentLines(D.asPlainText()) << '\n';
   }
+  OS << '\n';
 }
 
 Paragraph &Paragraph::appendSpace() {
@@ -420,29 +445,44 @@ Paragraph &Paragraph::appendSpace() {
   return *this;
 }
 
-Paragraph &Paragraph::appendText(llvm::StringRef Text) {
-  std::string Norm = canonicalizeSpaces(Text);
-  if (Norm.empty())
+Paragraph &Paragraph::appendChunk(llvm::StringRef Contents, ChunkKind K) {
+  if (Contents.empty())
     return *this;
   Chunks.emplace_back();
   Chunk &C = Chunks.back();
-  C.Contents = std::move(Norm);
-  C.Kind = Chunk::PlainText;
-  C.SpaceBefore = llvm::isSpace(Text.front());
-  C.SpaceAfter = llvm::isSpace(Text.back());
+  C.Contents = std::move(Contents);
+  C.Kind = K;
   return *this;
 }
 
+Paragraph &Paragraph::appendText(llvm::StringRef Text) {
+  if (!Chunks.empty() && Chunks.back().Kind == ChunkKind::PlainText) {
+    Chunks.back().Contents += std::move(Text);
+    return *this;
+  }
+
+  return appendChunk(Text, ChunkKind::PlainText);
+}
+
+Paragraph &Paragraph::appendEmphasizedText(llvm::StringRef Text) {
+  return appendChunk(canonicalizeSpaces(std::move(Text)),
+                     ChunkKind::Emphasized);
+}
+
+Paragraph &Paragraph::appendBoldText(llvm::StringRef Text) {
+  return appendChunk(canonicalizeSpaces(std::move(Text)), ChunkKind::Bold);
+}
+
 Paragraph &Paragraph::appendCode(llvm::StringRef Code, bool Preserve) {
   bool AdjacentCode =
-      !Chunks.empty() && Chunks.back().Kind == Chunk::InlineCode;
+      !Chunks.empty() && Chunks.back().Kind == ChunkKind::InlineCode;
   std::string Norm = canonicalizeSpaces(std::move(Code));
   if (Norm.empty())
     return *this;
   Chunks.emplace_back();
   Chunk &C = Chunks.back();
   C.Contents = std::move(Norm);
-  C.Kind = Chunk::InlineCode;
+  C.Kind = ChunkKind::InlineCode;
   C.Preserve = Preserve;
   // Disallow adjacent code spans without spaces, markdown can't render them.
   C.SpaceBefore = AdjacentCode;
@@ -475,7 +515,9 @@ Paragraph &Document::addParagraph() {
   return *static_cast<Paragraph *>(Children.back().get());
 }
 
-void Document::addRuler() { Children.push_back(std::make_unique<Ruler>()); }
+void Document::addRuler() {
+  Children.push_back(std::make_unique<Ruler>());
+}
 
 void Document::addCodeBlock(std::string Code, std::string Language) {
   Children.emplace_back(
diff --git a/clang-tools-extra/clangd/support/Markup.h b/clang-tools-extra/clangd/support/Markup.h
index 3a869c49a2cbb..a74fade13d115 100644
--- a/clang-tools-extra/clangd/support/Markup.h
+++ b/clang-tools-extra/clangd/support/Markup.h
@@ -49,6 +49,12 @@ class Paragraph : public Block {
   /// Append plain text to the end of the string.
   Paragraph &appendText(llvm::StringRef Text);
 
+  /// Append emphasized text, this translates to the * block in markdown.
+  Paragraph &appendEmphasizedText(llvm::StringRef Text);
+
+  /// Append bold text, this translates to the ** block in markdown.
+  Paragraph &appendBoldText(llvm::StringRef Text);
+
   /// Append inline code, this translates to the ` block in markdown.
   /// \p Preserve indicates the code span must be apparent even in plaintext.
   Paragraph &appendCode(llvm::StringRef Code, bool Preserve = false);
@@ -58,11 +64,9 @@ class Paragraph : public Block {
   Paragraph &appendSpace();
 
 private:
+  typedef enum { PlainText, InlineCode, Bold, Emphasized } ChunkKind;
   struct Chunk {
-    enum {
-      PlainText,
-      InlineCode,
-    } Kind = PlainText;
+    ChunkKind Kind = PlainText;
     // Preserve chunk markers in plaintext.
     bool Preserve = false;
     std::string Contents;
@@ -73,6 +77,19 @@ class Paragraph : public Block {
     bool SpaceAfter = false;
   };
   std::vector<Chunk> Chunks;
+
+  Paragraph &appendChunk(llvm::StringRef Contents, ChunkKind K);
+
+  llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,
+                               llvm::StringRef Text) const;
+  bool punctuationIndicatesLineBreak(llvm::StringRef Line) const;
+  bool isHardLineBreakIndicator(llvm::StringRef Rest) const;
+  bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) const;
+};
+
+class ListItemParagraph : public Paragraph {
+public:
+  void renderMarkdown(llvm::raw_ostream &OS) const override;
 };
 
 /// Represents a sequence of one or more documents. Knows how to print them in a
@@ -82,6 +99,9 @@ class BulletList : public Block {
   BulletList();
...
[truncated]

@tcottin
Copy link
Contributor Author

tcottin commented May 30, 2025

@HighCommander4 HighCommander4 requested a review from kadircet May 30, 2025 18:14
Copy link

github-actions bot commented May 30, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@HighCommander4
Copy link
Collaborator

Carrying over review request to @kadircet from the original combined patch at #128591 (comment):

@kadircet, you were involved with earlier rounds of review on the old patch (https://reviews.llvm.org/D143112) and design discussions in the issue (clangd/clangd#529) -- do you by chance have the cycles to pick up this review?

@@ -118,8 +138,8 @@ class Document {
BulletList &addBulletList();

/// Doesn't contain any trailing newlines.
/// We try to make the markdown human-readable, e.g. avoid extra escaping.
/// At least one client (coc.nvim) displays the markdown verbatim!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deleted comment refers to coc.nvims rendering of markdown - would this PR cause a change of behavior for that use case? If I understand correctly, the prior work intentionally targeted clients, like this, with limited markdown rendering support.

If that's the case and we somehow need to find a balance between Markdown capabilities of different clients, could the MarkdownClientCapabilities flags be used to guard the necessary logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch changes 2 things for markdown:

  • The documentation string is taken as is without escaping anything (except for html tags and entity references)
  • Paragraphs are separated with an empty line

My idea was to let the client handle the markdown rendering completely.
Therefore no escaping is needed and even false because if markdown annotations are escaped, the client will not render it as expected.
The same for the empty line between paragraphs: this is how paragraphs are separated in markdown.

I think it is save to assume that the client can handle unescaped markdown.
According to the LSP specification,
if a clients capability is specifying 'markdown' as MarkupKind for it MarkupContent, the "content should follow the GitHub Flavored Markdown Specification".

Now regarding coc.nvim: I dont use it and do not know the current behaviour.
Reading through e.g. clangd/coc-clangd#48 though, I understood that coc.nvim requests MarkupContent with 'markdown' but does not really render markdown.
Hence I would assume that in this case leaving out the escaping would result in just not showing the escaping character.
Regarding the empty newlines between paragraphs I would also expect that the empty line is just shown, creating better visual separations between paragraphs

But: @emaxx-google do you use coc.nvim and could you give this patch a try?
Maybe my assumptions are false and it causes issues I did not see yet.

Regarding the capabilities: I think currently there is only 'plaintext' and 'markdown'.
Additionally there is the option to allow HTML tags when markdown is selected.
Other than that I dont know how we could handle this.
But I think we don't need to actually, because as stated above, if the client specifies to support markdown, it should handle it correctly.
If it can't handle it, 'plaintext' is always an option.

Copy link
Contributor

@emaxx-google emaxx-google Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this PR assumes that the input documentation string is written using Markdown. The current implementation in Clangd, however, is much more conservative in its assumptions - it only recognizes basic syntax features like backticks. Hence when rendering for markdown, everything else is escaped because the assumption is it's actually a plain text, so that all characters that'd (accidentally?) trigger special semantics in the Markdown client-side renderer are mangled to be not treated as such:

// Tests whether C should be backslash-escaped in markdown.
<...>
// It's always safe to escape punctuation, but want minimal escaping.
// The strategy is to escape the first character of anything that might start
// a markdown grammar construct.

A change to pass documentation strings as-is would cause regressions on code bases where comments aren't written with the markdown syntax in mind.

P.S. As for coc-nvim - I personally don't use it, but if we're to do radical changes to how things are output, it's worth it checking we don't regress the UI for this known case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reply. I did not think about it this way.
The issue that you dont want your documentation to be rendered as markdown by the client seems different from the coc.nvim issue.

The old behaviour was to treat all documentation as non-markdown independent of client render capabilities.
Hence to even make this work for clients requesting markdown as the MarkupKind, markdown syntax in the documentation string was escaped.
This way, only information which was added by clangd (like the declaration of the hover point) is rendered in markdown, if the client supports it.

With this change, the documentation is taken as-is which leads to rendered markdown on clients requesting markdown.

If viewed this way, the escaping behaviour is actually not a client decision and independent of the clients capabilities.
Therefore we could introduce a server option to control the escaping behaviour I think.

Just for reference: doxygen supports markdown since 1.8.0. There is a MARKDOWN_SUPPORT configuration option which is set to YES by default.
This means in a doxygen comment, all markdown syntax will be rendered as markdown in the final documentation by default.
Hence the default doxygen behaviour is the same as the behaviour proposed with this change.

I would argue that for the markdown rendering behaviour of clangd it should be the same:
the proposed behaviour should be the default but could be switched off with a server option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should distinguish between the input documentation's syntax and the client's rendering capabilities. The latter defines the syntax in which it expects the data received from the server, and not that all input is forced to be treated as if written in Markdown. As said, we're suspecting that source comments in a lot of projects would be rendered incorrectly if they'd be all interpreted as verbatim Markdown as a blanket decision.

So guarding this "no escaping" behavior by an option is one possibility, yes. Another possible direction could be to decide based on the syntax used in the comments - i.e., recognizing those distinctive comment syntaxes from https://www.doxygen.nl/manual/docblocks.html ? (This latter option would be nice as it'd provide reasonable behavior by default, although if necessary it could still be combined with a server option.)

Copy link
Contributor

@emaxx-google emaxx-google Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good generally! Thanks.

Just a quick question:

Plaintext (default): the old escaping behaviour and no doxygen command parsing. The only thing I would still change is the paragraph termination with 2 newlines instead of 1.

A single blank line between paragraphs sounds OK. Or do you mean adding another blank line in between, or a trailing blank line at the end? - that might look extraneous. In any case, that sounds like a relatively minor implementation detail that we can discuss further as the PR goes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single blank line between paragraphs sounds OK. Or do you mean adding another blank line in between, or a trailing blank line at the end? - that might look extraneous. In any case, that sounds like a relatively minor implementation detail that we can discuss further as the PR goes.

I mean in between.

Multiple newlines in a comment used to be "collapsed" to a single newline, resulting in removed paragraph separation in Markdown:

/**
 * paragraph1 line1
 * paragraph1 line2
 *
 * paragraph2 line1
 */

Hover looked like this:

paragraph1 line1 paragraph1 line2
paragraph2 line1

With the change, paragraphs are now seperated by an empty line (as per Markdown specification).

paragraph1 line1 paragraph1 line2

paragraph2 line1

Trailing newlines are always trimmed. This did not change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preserving the blank line between paragraphs sounds reasonable. Thanks for giving the examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the provided design.
I actually wanted to use the new config from the markup::Document class directly.
But the class is in the clangdSupport lib which does not have access to the config apparently.

Therefore I added a new asEscapedMarkdown function to the class.
This allows the users to determine what is needed based on the config and the client capabilities.

I fixed the missed formatting, but force pushed...
I think this canceled the approval of the CI again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that there were some things missing.

Whitespace is preserved now, therefore also the escaping implementation needed some additional changes.

Note: the whitespace preservation should not affect the client rendering.
Whitespace used to be canonicalized which leads to the markdown string already having spacing as the markdown renderer would show it on the client anyway.

Now whitespace is preserved but the client requests Markdown, hence it should be shown correctly.

@tcottin
Copy link
Contributor Author

tcottin commented Jun 7, 2025

ping

@mccakit
Copy link

mccakit commented Jun 20, 2025

ping @kadircet

@aaronliu0130
Copy link

@mccakit It looks like there's currently a reply from @tcottin to emaxx-google's last comment needed.

@mccakit
Copy link

mccakit commented Jun 21, 2025

Yeah, I saw it. Hope they look into it. This PR is like 2 months old, doxygen parsing at clang have some patches I found online that date back to 2019.

I'm currently using tcottins PR branch right now, because that is the only way of getting doxygen comments on my editor, which is Zed.

@tcottin tcottin force-pushed the improve-markup-rendering branch from bf5a925 to 39b891d Compare July 1, 2025 17:32
@tcottin
Copy link
Contributor Author

tcottin commented Jul 10, 2025

ping @emaxx-google

Copy link
Contributor

@emaxx-google emaxx-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A couple of questions about newline handling, and a few minor nits.

@tcottin tcottin requested a review from emaxx-google July 12, 2025 20:33
@tcottin
Copy link
Contributor Author

tcottin commented Jul 12, 2025

Thanks for the review!

I pushed some fixes.

@tcottin
Copy link
Contributor Author

tcottin commented Jul 21, 2025

ping @emaxx-google

@@ -677,6 +719,8 @@ Paragraph &Paragraph::appendCode(llvm::StringRef Code, bool Preserve) {
C.Preserve = Preserve;
// Disallow adjacent code spans without spaces, markdown can't render them.
C.SpaceBefore = AdjacentCode;

EstimatedStringSize += Norm.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Norm is likely empty here because of the std::move() above. Probably the simplest fix would be to move this size calculation above.

@@ -37,6 +37,14 @@ class Block {

virtual bool isRuler() const { return false; }
virtual ~Block() = default;

protected:
Copy link
Contributor

@emaxx-google emaxx-google Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If the field is Paragraph-specific, should we just move it into that class and make it private?

Copy link
Contributor

@emaxx-google emaxx-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM modulo the use-after-move fix and a nit.

@tcottin
Copy link
Contributor Author

tcottin commented Jul 24, 2025

Thanks for the review and approval @emaxx-google!

Once this is merged I will rebase my changes in https://github.com/tcottin/llvm-project/tree/introduce-doxygen-parsing and create the next PR.

@emaxx-google emaxx-google merged commit 5199489 into llvm:main Jul 25, 2025
9 checks passed
Copy link

@tcottin Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
This is a preparation for fixing clangd/clangd#529.

It changes the Markup rendering to markdown and plaintext.

- Properly separate paragraphs using an empty line between
- Dont escape markdown syntax for markdown output except for HTML
- Dont do any formatting for markdown because the client is handling the
actual markdown rendering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants