@mleleszi
Member

@mleleszi mleleszi commented Dec 12, 2025

#172040

This patch adds the scripts that generate the lookup tables and associated utilities for the wctype classification functions. Not every Unicode property is covered, because not all of them need a lookup table; the rest will be hardcoded. The generated tables are 47.8 KB in size.

@github-actions

github-actions bot commented Dec 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@github-actions

github-actions bot commented Dec 12, 2025

✅ With the latest revision this PR passed the Python code formatter.

@github-actions

github-actions bot commented Dec 12, 2025

🐧 Linux x64 Test Results

✅ The build succeeded and no tests ran. This is expected in some build configurations.

@llvmbot
Member

llvmbot commented Dec 12, 2025

@llvm/pr-subscribers-libc

Author: Marcell Leleszi (mleleszi)

Changes

#172040

This patch adds the scripts that generate the lookup tables and associated utilities for the wctype classification functions. Not every Unicode property is covered, because not all of them need a lookup table; the rest will be hardcoded. The generated tables are 47.8 KB in size.


Patch is 315.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/172042.diff

11 Files Affected:

  • (modified) libc/.gitignore (+3)
  • (modified) libc/cmake/modules/LLVMLibCCompileOptionRules.cmake (+4)
  • (modified) libc/src/__support/CMakeLists.txt (+1)
  • (added) libc/src/__support/wctype/CMakeLists.txt (+11)
  • (added) libc/src/__support/wctype/wctype_classification_utils.h (+3726)
  • (modified) libc/test/src/__support/CMakeLists.txt (+1)
  • (added) libc/test/src/__support/wctype/CMakeLists.txt (+12)
  • (added) libc/test/src/__support/wctype/wctype_classification_utils_test.cpp (+526)
  • (added) libc/utils/wctype_utils/classification/__init__.py (+3)
  • (added) libc/utils/wctype_utils/classification/gen_classification_data.py (+316)
  • (modified) libc/utils/wctype_utils/gen.py (+19)
diff --git a/libc/.gitignore b/libc/.gitignore
index 6a4ce5ac61cd8..36eda57ddaa3a 100644
--- a/libc/.gitignore
+++ b/libc/.gitignore
@@ -1,3 +1,6 @@
 # Sphinx documentation
 docs/_build/
 build/
+
+# Unicode data used for wctype functions
+UnicodeData.txt
diff --git a/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake b/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake
index 619b53f828705..51c39c3edae0e 100644
--- a/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake
+++ b/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake
@@ -119,6 +119,10 @@ function(_get_compile_options_from_config output_var)
     list(APPEND config_options "-DLIBC_TRAP_ON_RAISE_FP_EXCEPT")
   endif()
 
+  if(LIBC_CONF_WCTYPE_MODE)
+    list(APPEND config_options "-DLIBC_CONF_WCTYPE_MODE=${LIBC_CONF_WCTYPE_MODE}")
+  endif()
+
   set(${output_var} ${config_options} PARENT_SCOPE)
 endfunction(_get_compile_options_from_config)
 
diff --git a/libc/src/__support/CMakeLists.txt b/libc/src/__support/CMakeLists.txt
index c7f127d6934a0..df524c25cbd8a 100644
--- a/libc/src/__support/CMakeLists.txt
+++ b/libc/src/__support/CMakeLists.txt
@@ -413,6 +413,7 @@ add_subdirectory(time)
 # Therefore, cannot currently build this on macos in overlay mode
 if(NOT (LIBC_TARGET_OS_IS_DARWIN))
   add_subdirectory(wchar)
+  add_subdirectory(wctype)
 endif()
 
 add_subdirectory(math)
diff --git a/libc/src/__support/wctype/CMakeLists.txt b/libc/src/__support/wctype/CMakeLists.txt
new file mode 100644
index 0000000000000..1c9a14326c5ef
--- /dev/null
+++ b/libc/src/__support/wctype/CMakeLists.txt
@@ -0,0 +1,11 @@
+add_header_library(
+  wctype_classification_utils
+  HDRS
+    wctype_classification_utils.h
+  DEPENDS
+    libc.hdr.types.wchar_t
+    libc.hdr.stdint_proxy
+    libc.src.__support.macros.config
+    libc.src.__support.CPP.limits
+    libc.src.__support.libc_assert
+)
diff --git a/libc/src/__support/wctype/wctype_classification_utils.h b/libc/src/__support/wctype/wctype_classification_utils.h
new file mode 100644
index 0000000000000..f3b0cfffb3cc2
--- /dev/null
+++ b/libc/src/__support/wctype/wctype_classification_utils.h
@@ -0,0 +1,3726 @@
+//===-- Utils for wctype classification functions ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// DO NOT EDIT MANUALLY.
+// This file is generated by libc/utils/wctype_utils scripts.
+                
+#ifndef LLVM_LIBC_SRC___SUPPORT_WCTYPE_WCTYPE_CLASSIFICATION_UTILS_H
+#define LLVM_LIBC_SRC___SUPPORT_WCTYPE_WCTYPE_CLASSIFICATION_UTILS_H
+
+#include "hdr/stdint_proxy.h"
+#include "hdr/types/wchar_t.h"
+#include "src/__support/CPP/limits.h"
+#include "src/__support/libc_assert.h"
+#include "src/__support/macros/config.h"
+
+namespace LIBC_NAMESPACE_DECL {
+                
+// Property flags for Unicode categories
+enum PropertyFlag : uint8_t {
+  UPPER = 1 << 0,
+  LOWER = 1 << 1,
+  ALPHA = 1 << 2,
+  SPACE = 1 << 3,
+  PRINT = 1 << 4,
+  BLANK = 1 << 5,
+  CNTRL = 1 << 6,
+  PUNCT = 1 << 7,
+};
+
+static_assert(4352 <= cpp::numeric_limits<unsigned short>::max());
+static_assert(39168 <= cpp::numeric_limits<unsigned short>::max());
+
+inline constexpr uint16_t LEVEL1_SIZE = 4352;
+inline constexpr uint16_t LEVEL2_SIZE = 39168;
+
+// Level 1 table: indexed by (codepoint >> 8), stores level2 block offsets
+inline constexpr uint16_t level1[LEVEL1_SIZE] = {
+    0,     256,   512,   768,   1024,  1280,  1536,  1792,  2048,  2304,  2560,
+    2816,  3072,  3328,  3584,  3840,  4096,  4352,  4608,  4864,  5120,  4352,
+    5376,  5632,  5888,  6144,  6400,  6656,  6912,  7168,  7424,  7680,  7936,
+    8192,  8448,  8448,  8704,  8448,  8448,  8960,  8448,  8448,  8448,  9216,
+    9472,  9728,  9984,  10240, 10496, 10752, 11008, 8448,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    11264, 4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  11520,
+    4352,  11776, 12032, 12288, 12544, 12800, 13056, 4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  13312, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13824, 13824, 13824, 13824, 13824, 13824, 13824,
+    13824, 13824, 13824, 13824, 13824, 13824, 13824, 13824, 13824, 13824, 13824,
+    13824, 13824, 13824, 13824, 13824, 13824, 13824, 4352,  14080, 14336, 4352,
+    14592, 14848, 15104, 15360, 15616, 15872, 16128, 16384, 16640, 4352,  16896,
+    17152, 17408, 17664, 17920, 18176, 18432, 18688, 18944, 19200, 19456, 19712,
+    19968, 20224, 20480, 20736, 20992, 21248, 21504, 21760, 22016, 22272, 22528,
+    22784, 23040, 4352,  4352,  4352,  23296, 23552, 23808, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 24064, 4352,  4352,  4352,  4352,
+    24320, 4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  24576, 4352,  4352,  24832, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 25088, 13568, 13568, 13568, 13568, 13568, 13568, 4352,  4352,  25344,
+    25600, 13568, 25856, 26112, 26368, 4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    26624, 26880, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 27136, 4352,  27392, 27648, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 27904, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 28160, 8448,
+    28416, 28672, 28928, 29184, 29440, 29696, 29952, 30208, 30464, 30720, 8448,
+    8448,  30976, 13568, 13568, 13568, 13568, 31232, 31488, 31744, 32000, 13568,
+    32256, 32512, 32768, 33024, 33280, 33536, 13568, 13568, 33792, 34048, 34304,
+    13568, 34560, 34816, 35072, 8448,  8448,  8448,  35328, 35584, 35840, 8448,
+    36096, 36352, 13568, 13568, 13568, 13568, 4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  36608, 4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  36864, 4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  37120, 4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  37376,
+    4352,  4352,  37632, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 4352,  4352,  37888, 13568, 13568, 13568, 13568, 13568, 4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  38144, 4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,  4352,
+    4352,  4352,  4352,  4352,  4352,  4352,  38400, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568, 13568,
+    13568, 13568, 13568, 13568, 13568, 13568, ...
[truncated]
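
The comment on `level1` in the diff describes a two-level lookup: `level1` is indexed by `codepoint >> 8` and holds the offset of a 256-entry block inside a second table. A minimal sketch of how such tables could be built and queried is shown below. The names `build_tables` and `lookup`, the deduplication strategy, and the toy input data are assumptions for illustration; they are not taken from `gen_classification_data.py`, whose contents are not visible in the truncated diff.

```python
BLOCK_BITS = 8              # level 1 is indexed by (codepoint >> 8), as in the header comment
BLOCK_SIZE = 1 << BLOCK_BITS
MAX_CODEPOINT = 0x10FFFF

# Flag values copied from the PropertyFlag enum in the diff.
UPPER, LOWER, ALPHA, SPACE = 1 << 0, 1 << 1, 1 << 2, 1 << 3
PRINT, BLANK, CNTRL, PUNCT = 1 << 4, 1 << 5, 1 << 6, 1 << 7


def build_tables(flags):
    """Split a flat per-codepoint flag list into deduplicated 256-entry blocks.

    Hypothetical helper: identical blocks are stored once in level2, and
    level1 records the offset of the block that covers each 256-codepoint range.
    """
    level1, level2, seen = [], [], {}
    for start in range(0, len(flags), BLOCK_SIZE):
        block = tuple(flags[start:start + BLOCK_SIZE])
        if block not in seen:
            seen[block] = len(level2)   # offset of this block inside level2
            level2.extend(block)
        level1.append(seen[block])
    return level1, level2


def lookup(level1, level2, codepoint):
    """Return the flag byte for a codepoint (0 if it is out of range)."""
    if not 0 <= codepoint <= MAX_CODEPOINT:
        return 0
    offset = level1[codepoint >> BLOCK_BITS]
    return level2[offset + (codepoint & (BLOCK_SIZE - 1))]


# Toy input (not real Unicode data): only ASCII letters and the space
# character are classified; everything else has no flags set.
flags = [0] * (MAX_CODEPOINT + 1)
for cp in range(ord("A"), ord("Z") + 1):
    flags[cp] = UPPER | ALPHA | PRINT
for cp in range(ord("a"), ord("z") + 1):
    flags[cp] = LOWER | ALPHA | PRINT
flags[ord(" ")] = SPACE | BLANK | PRINT

level1, level2 = build_tables(flags)
is_upper = bool(lookup(level1, level2, ord("Q")) & UPPER)
```

With this toy input only two distinct blocks exist, so `level2` holds 512 entries while `level1` still has 4352 entries (0x110000 >> 8), matching `LEVEL1_SIZE` in the header. In the real header `LEVEL2_SIZE` is 39168, that is 153 blocks of 256 entries. If level 2 stores one `uint8_t` flag byte per entry (an assumption, since that part of the diff is truncated), the two tables take 4352 × 2 + 39168 = 47872 bytes, which is consistent with the size quoted in the description.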

@mleleszi mleleszi force-pushed the libc-wctype-classification-generation branch 3 times, most recently from d52c622 to dc9f6bf December 12, 2025 18:28
@mleleszi mleleszi marked this pull request as ready for review December 12, 2025 19:15
@mleleszi mleleszi marked this pull request as draft December 13, 2025 16:34
@mleleszi mleleszi marked this pull request as ready for review December 15, 2025 11:47
@mleleszi mleleszi requested a review from vonosmas December 18, 2025 07:52
Integrate into build

Set up unit tests

Add tests

Add more tests

Remove UnicodeData and add to gitignore

Cleanup

Size optimizations and some asserts

Add some more tests

Add some more tests

Cleanup

Add some more tests

Fix stuff

fixes
@mleleszi mleleszi force-pushed the libc-wctype-classification-generation branch from a55b4b4 to ddd0c3d December 18, 2025 08:09
Contributor

@michaelrj-google michaelrj-google left a comment

Overall looks good. I do think that, if we need to, we could compress the classification tables more. There are long runs of the same number, so we could probably get some savings from run-length encoding.
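
To make the run-length encoding suggestion concrete, here is a minimal sketch of encoding and decoding a table with repeated values. The function names and the short sample list are illustrative assumptions; the sample only imitates the style of the `level1` values in the diff and is not the real table, and nothing here is part of the patch.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the flat list."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Short made-up excerpt in the style of the level1 table from the diff.
sample = [0, 256, 512] + [4352] * 5 + [13568] * 8
encoded = rle_encode(sample)
```

Here 16 entries collapse to 5 pairs. The trade-off is that an encoded table can no longer be indexed directly: a lookup has to locate the run containing the index, for example with a binary search over cumulative run starts.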

@SchrodingerZhu
Contributor

Overall looks good. I do think that, if we need to, we could compress the classification tables more. There are long runs of the same number, so we could probably get some savings from run-length encoding.

I guess we could use tools like Z3/CVC5 to search for an optimal encoding if needed, but that would require extra work.

@mleleszi mleleszi force-pushed the libc-wctype-classification-generation branch from 293eb88 to 6984e2d December 30, 2025 17:41
@mleleszi
Member Author

I will look into compression if the header ends up being included in a lot of files in the final implementation and the size increase turns out to be too big.

Contributor

@michaelrj-google michaelrj-google left a comment

LGTM with one change. Feel free to land after fixing.

@mleleszi mleleszi merged commit 9373dbd into llvm:main Jan 6, 2026
26 checks passed
@mleleszi mleleszi deleted the libc-wctype-classification-generation branch January 6, 2026 11:13
@mleleszi mleleszi restored the libc-wctype-classification-generation branch January 6, 2026 11:26
Development

Successfully merging this pull request may close these issues.

[libc][wctype] Create generation script for classification lookup tables

4 participants