241 | 241 | \terminal{\textbackslash U} hex-quad hex-quad
242 | 242 | \end{bnf}
243 | 243 |
244 |     | -The character designated by the \grammarterm{universal-character-name} \tcode{\textbackslash
245 |     | -U00NNNNNN} is that character
246 |     | -that has \tcode{U+NNNNNN} as a code point short identifier;
247 |     | -the character designated by the \grammarterm{universal-character-name}
248 |     | -\tcode{\textbackslash uNNNN} is that character
249 |     | -that has \tcode{U+NNNN} as a code point short identifier.
250 |     | -If a \grammarterm{universal-character-name} does not correspond to
251 |     | -a code point in ISO/IEC 10646 or
252 |     | -if a \grammarterm{universal-character-name} corresponds to
253 |     | -a surrogate code point,
254 |     | -the program is ill-formed. Additionally, if
255 |     | -a \grammarterm{universal-character-name} outside
    | 244 | +A \grammarterm{universal-character-name}
    | 245 | +designates the character in ISO/IEC 10646 (if any)
    | 246 | +whose code point is the hexadecimal number represented by
    | 247 | +the sequence of \grammarterm{hexadecimal-digit}s
    | 248 | +in the \grammarterm{universal-character-name}.
    | 249 | +The program is ill-formed if that number is not a code point
    | 250 | +or if it is a surrogate code point.
    | 251 | +Noncharacter code points and reserved code points
    | 252 | +are considered to designate separate characters distinct from
    | 253 | +any ISO/IEC 10646 character.
    | 254 | +If a \grammarterm{universal-character-name} outside
256 | 255 | the \grammarterm{c-char-sequence}, \grammarterm{s-char-sequence}, or
257 | 256 | \grammarterm{r-char-sequence} of
258 | 257 | a character or
262 | 261 | \grammarterm{r-char-sequence}\iref{lex.string} does not form a
263 | 262 | \grammarterm{universal-character-name}.}
264 | 263 | \begin{note}
265 |     | -ISO/IEC 10646 code points are within the range 0x0-0x10FFFF (inclusive).
266 |     | -A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive).
    | 264 | +ISO/IEC 10646 code points are integers in the range $[0, \mathrm{10FFFF}]$ (hexadecimal).
    | 265 | +A surrogate code point is a value in the range $[\mathrm{D800}, \mathrm{DFFF}]$ (hexadecimal).
267 | 266 | A control character is a character whose code point is
268 |     | -in either of the ranges 0x0-0x1F or 0x7F-0x9F (both inclusive).
    | 267 | +in either of the ranges $[0, \mathrm{1F}]$ or $[\mathrm{7F}, \mathrm{9F}]$ (hexadecimal).
269 | 268 | \end{note}
270 | 269 |
271 | 270 | \pnum
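The new wording reads a universal-character-name as a plain hexadecimal number and then applies two checks: the value must be a code point (at most 10FFFF) and must not be a surrogate. The following is a minimal C++ sketch of that reading, assuming C++17 or later; the helper names `ucn_value`, `is_surrogate`, `ucn_is_well_formed`, and `is_control` are hypothetical illustrations, not part of any compiler or library API. Note that it accepts U+FFFF, a noncharacter code point, in line with the added sentence on noncharacter and reserved code points.

```cpp
// Sketch only: hypothetical helpers, assuming C++17 or later.
#include <cstdint>
#include <optional>
#include <string_view>

// Interprets the hexadecimal-digits after "\u" (hex-quad) or "\U"
// (hex-quad hex-quad) as an integer. Returns std::nullopt for a wrong
// digit count or a character that is not a hexadecimal digit.
constexpr std::optional<std::uint32_t> ucn_value(std::string_view hex) {
    if (hex.size() != 4 && hex.size() != 8) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : hex) {
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9')      digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return std::nullopt;
        value = value * 16 + digit;  // at most 8 digits, so no overflow
    }
    return value;
}

// Surrogate code points: [D800, DFFF].
constexpr bool is_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Ill-formed if the number is not a code point or is a surrogate.
constexpr bool ucn_is_well_formed(std::uint32_t cp) {
    return cp <= 0x10FFFF && !is_surrogate(cp);
}

// Control characters: [0, 1F] or [7F, 9F].
constexpr bool is_control(std::uint32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// Usage: compile-time checks that mirror the rules in the hunk above.
static_assert(*ucn_value("00E9") == 0xE9);
static_assert(*ucn_value("0001F600") == 0x1F600);
static_assert(ucn_is_well_formed(*ucn_value("0001F600")));
static_assert(ucn_is_well_formed(0xFFFF));                   // noncharacter: accepted
static_assert(!ucn_is_well_formed(*ucn_value("D800")));      // surrogate
static_assert(!ucn_is_well_formed(*ucn_value("00110000")));  // beyond 10FFFF
static_assert(is_control(*ucn_value("0085")));
static_assert(!ucn_value("12G4").has_value());
```

Keeping the parse and the validity checks separate follows the structure of the new text, which first defines the number and only then states when the program is ill-formed.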

1219 | 1218 | provided that the code point value
1220 | 1219 | can be encoded as a single UTF-8 code unit.
1221 | 1220 | \begin{note}
1222 |      | -That is, provided the code point value is in the range 0x0-0x7F (inclusive).
     | 1221 | +That is, provided the code point value is in the range $[0, \mathrm{7F}]$ (hexadecimal).
1223 | 1222 | \end{note}
1224 | 1223 | If the value is not representable with a single UTF-8 code unit,
1225 | 1224 | the program is ill-formed.
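The note's range $[0, \mathrm{7F}]$ is exactly the set of code points whose UTF-8 encoding is one code unit. This is a small sketch of that relationship, assuming C++17 or later and a value that has already passed the code-point checks; `utf8_code_units` and `fits_one_utf8_code_unit` are hypothetical names used only for illustration.

```cpp
// Sketch only: hypothetical helpers, assuming C++17 or later and a valid,
// non-surrogate code point in [0, 10FFFF].
#include <cstdint>

// Number of UTF-8 code units needed to encode code point cp.
constexpr int utf8_code_units(std::uint32_t cp) {
    if (cp <= 0x7F) return 1;   // [0, 7F]: the only single-unit range
    if (cp <= 0x7FF) return 2;
    if (cp <= 0xFFFF) return 3;
    return 4;                   // [10000, 10FFFF]
}

// True when the value can be encoded as a single UTF-8 code unit,
// the condition the paragraph above places on the code point value.
constexpr bool fits_one_utf8_code_unit(std::uint32_t cp) {
    return utf8_code_units(cp) == 1;
}

static_assert(fits_one_utf8_code_unit(0x41));   // U+0041 LATIN CAPITAL LETTER A
static_assert(!fits_one_utf8_code_unit(0xE9));  // U+00E9 needs two code units
static_assert(utf8_code_units(0x1F600) == 4);
```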

1238 | 1237 | provided that the code point value is
1239 | 1238 | representable with a single 16-bit code unit.
1240 | 1239 | \begin{note}
1241 |      | -That is, provided the code point value is in the range 0x0-0xFFFF (inclusive).
     | 1240 | +That is, provided the code point value is in the range $[0, \mathrm{FFFF}]$ (hexadecimal).
1242 | 1241 | \end{note}
1243 | 1242 | If the value is not representable
1244 | 1243 | with a single 16-bit code unit, the program is ill-formed.
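Because surrogate code points are already excluded, a valid code point fits in one 16-bit code unit exactly when it is at most FFFF; anything larger needs a surrogate pair in UTF-16. A minimal sketch, assuming C++17 or later; `utf16_code_units` and `fits_one_utf16_code_unit` are hypothetical helper names.

```cpp
// Sketch only: hypothetical helpers, assuming C++17 or later and a valid,
// non-surrogate code point in [0, 10FFFF].
#include <cstdint>

// Number of UTF-16 code units needed for code point cp.
constexpr int utf16_code_units(std::uint32_t cp) {
    return cp <= 0xFFFF ? 1 : 2;  // values above FFFF need a surrogate pair
}

// True when the value is representable with a single 16-bit code unit.
constexpr bool fits_one_utf16_code_unit(std::uint32_t cp) {
    return utf16_code_units(cp) == 1;
}

static_assert(fits_one_utf16_code_unit(0xE9));
static_assert(!fits_one_utf16_code_unit(0x1F600));
static_assert(u'\u00E9' == 0xE9);  // a char16_t literal holding U+00E9
```

A literal such as `u'\U0001F600'` falls outside this range and is the case the paragraph makes ill-formed.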

1771 | 1770 | string literal is the number of code units, not the number of
1772 | 1771 | characters.
1773 | 1772 | \end{note}
1774 |      | -Within \tcode{char32_t} and \tcode{char16_t}
1775 |      | -string literals, any \grammarterm{universal-character-name}{s} shall be within the range
1776 |      | -\tcode{0x0} to \tcode{0x10FFFF}. The size of a narrow string literal is
     | 1773 | +\begin{note}
     | 1774 | +Any \grammarterm{universal-character-name}{s} are required to
     | 1775 | +correspond to a code point in the range
     | 1776 | +$[0, \mathrm{D800})$ or $[\mathrm{E000}, \mathrm{10FFFF}]$ (hexadecimal)\iref{lex.charset}.
     | 1777 | +\end{note}
     | 1778 | +The size of a narrow string literal is
1777 | 1779 | the total number of escape sequences and other characters, plus at least
1778 | 1780 | one for the multibyte encoding of each \grammarterm{universal-character-name}, plus
1779 | 1781 | one for the terminating \tcode{'\textbackslash 0'}.
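The context lines state that a string literal's size counts code units rather than characters. The compile-time checks below illustrate this for UTF-8, UTF-16, and UTF-32 literals, assuming C++17 or later; `code_units` is a hypothetical helper. The narrow-literal check is deliberately only a lower bound, because the multibyte encoding of a universal-character-name in a narrow literal depends on the implementation.

```cpp
// Sketch only: hypothetical helper, assuming C++17 or later.
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

static_assert(code_units(u8"\u00E9") == 2);     // UTF-8: C3 A9
static_assert(code_units(u"\U0001F600") == 2);  // UTF-16: surrogate pair
static_assert(code_units(U"\U0001F600") == 1);  // UTF-32: one code unit
static_assert(sizeof("\u00E9") >= 2);           // narrow: at least one unit plus '\0'
```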