Skip to content

Commit 97c4860

Browse files
khwilliamsonLeont
authored andcommitted
vutil.c: Use improved algorithm for non-P2008 radix
Prior to this commit, the code went through lots of machinations to detect the radix character of an NV in order to handle locales in which the radix is not a dot. This is because the version is always expressed in dot notation, and we had to accommodate sprintf(), which uses the current locale, and that locale might cause the value to be printed with something other than a dot. What was done was to change locales to something known to be a dot locale, so that we would be guaranteed to get a dot. But on non-POSIX-2008 boxes, this has significant drawbacks. On threaded or embedded perls, changing the locale can clash with other instances on platforms that don't have thread-safe locale handling. This can be overcome by using mutexes, as long as the other instances also obey those mutexes, and the perl this is compiled with is of sufficiently modern vintage to have those. A better method is to not change locale, but take the output of the sprintf and change the radix to a dot from whatever it got formatted as. This is feasible because almost all locales in the world have either a dot or a comma radix. So, the code now examines the sprintf output and converts a found comma into a dot. No need to change locales. I am aware of only one other potential radix character, found only in a Pashtun language locale of the 500 locales available on my box. This commit also handles that possibility, again converting it to a dot. Should there be a locale where this doesn't work, the commit leaves the current state as a fallback, subject to its disadvantages. But the chances of this fallback actually getting executed become close to zero. The POSIX 2008 case from before is both trivial and safe, so is left as-is.
1 parent f62b11d commit 97c4860

File tree

1 file changed

+105
-9
lines changed

1 file changed

+105
-9
lines changed

vutil/vutil.c

+105-9
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,9 @@
77

88
#define VERSION_MAX 0x7FFFFFFF
99

10+
#ifndef STRLENs
11+
# define STRLENs(s) (sizeof("" s "") - 1)
12+
#endif
1013
#ifndef POSIX_SETLOCALE_LOCK
1114
# ifdef gwLOCALE_LOCK
1215
# define POSIX_SETLOCALE_LOCK gwLOCALE_LOCK
@@ -662,22 +665,116 @@ Perl_upg_version(pTHX_ SV *ver, bool qv)
662665
goto VER_PV;
663666
}
664667
#endif
665-
#ifdef USE_LOCALE_NUMERIC
666668

667669
{
668-
/* This may or may not be called from code that has switched
669-
* locales without letting perl know, therefore we have to find it
670-
* from first principals. See [perl #121930]. */
671670

672-
# ifdef USE_POSIX_2008_LOCALE
671+
#ifdef USE_POSIX_2008_LOCALE
673672

674673
/* With POSIX 2008, all we have to do is toggle to the C locale
675674
* just long enough to get the value (which should have a dot). */
676675
const locale_t locale_obj_on_entry = uselocale(PL_C_locale_obj);
677676
GET_NUMERIC_VERSION(ver, sv, tbuf, buf, len);
678677
uselocale(locale_obj_on_entry);
678+
#else
679+
/* Without POSIX 2008, it could be that toggling will zap another
680+
* thread's locale. Avoid that if possible by looking at the NV and
681+
* changing a non-dot radix into a dot */
682+
683+
char * radix = NULL;
684+
unsigned int radix_len = 0;
685+
686+
GET_NUMERIC_VERSION(ver, sv, tbuf, buf, len);
687+
688+
# ifndef ARABIC_DECIMAL_SEPARATOR_UTF8
689+
690+
/* This becomes feasible since there are only very few possible
691+
* radix characters in the world. khw knows of just 3 possible
692+
* ones. If we are being compiled on a perl without the very rare
693+
* third one, ARABIC DECIMAL SEPARATOR, just scan for the other
694+
* two: FULL STOP (dot) and COMMA */
695+
radix = strpbrk(buf, ".,");
696+
if (LIKELY(radix)) {
697+
radix_len = 1;
698+
}
679699
# else
680-
const char * locale_name_on_entry;
700+
/* Here, we have information about the third one; since it is
701+
* multi-byte, it becomes a little more work. Scan for the dot,
702+
* comma, or first byte of the arabic one */
703+
radix = strpbrk(buf,
704+
".,"
705+
ARABIC_DECIMAL_SEPARATOR_UTF8_FIRST_BYTE_s);
706+
707+
if (LIKELY(radix)) {
708+
if (LIKELY( (* (U8 *) radix)
709+
!= ARABIC_DECIMAL_SEPARATOR_UTF8_FIRST_BYTE))
710+
{
711+
radix_len = 1; /* Dot and comma are length 1 */
712+
}
713+
else {
714+
715+
/* Make sure that the rest of the bytes are what we expect
716+
* for the remainder of the arabic radix. If not, we
717+
* didn't find the radix. */
718+
radix_len = STRLENs(ARABIC_DECIMAL_SEPARATOR_UTF8);
719+
if ( radix + radix_len >= buf + len
720+
|| memNEs(radix + 1,
721+
STRLENs(ARABIC_DECIMAL_SEPARATOR_UTF8_TAIL),
722+
ARABIC_DECIMAL_SEPARATOR_UTF8_TAIL))
723+
{
724+
radix = NULL;
725+
radix_len = 0;
726+
}
727+
}
728+
}
729+
730+
# endif
731+
732+
/* Now convert any found radix into a dot (if not already). This
733+
* effectively does: ver =~ s/radix/dot/ */
734+
if (radix) {
735+
if (*radix != '.') {
736+
*radix = '.';
737+
738+
if (radix_len > 1) {
739+
Move(radix + radix_len, /* from what follows the radix
740+
*/
741+
radix + 1, /* to just after the new dot */
742+
743+
/* the number of bytes remaining, plus the NUL
744+
* */
745+
len - (radix - buf) - radix_len + 1,
746+
char);
747+
len -= radix_len - 1;
748+
}
749+
}
750+
751+
/* Guard against the very unlikely case that the radix is more
752+
* than a single character, like ".."; that is, make sure the
753+
* radix string we found above is the whole radix, and not just
754+
* the prefix of a longer one. Success is indicated by it
755+
* being at the end of the string, or the next byte should be a
756+
* digit */
757+
if (radix < buf + len && ! inRANGE(radix[1], '0', '9')) {
758+
radix = NULL;
759+
radix_len = 0;
760+
}
761+
}
762+
763+
if (! radix) {
764+
765+
/* If we couldn't find what the radix is, or didn't find it in
766+
* the PV, resort to toggling the locale to one known to have a
767+
* dot radix. This may or may not be called from code that has
768+
* switched locales without letting perl know, therefore we
769+
* have to find it from first principals. See [perl #121930].
770+
* */
771+
772+
# if ! defined(LC_NUMERIC) || ! defined(USE_LOCALE_NUMERIC)
773+
774+
Perl_croak(aTHX_ "panic: Unexpectedly didn't find a dot radix"
775+
" character in '%s'", buf);
776+
# else
777+
const char * locale_name_on_entry = NULL;
681778

682779
/* In windows, or not threaded, or not thread-safe, if it isn't C,
683780
* set it to C. */
@@ -709,11 +806,10 @@ Perl_upg_version(pTHX_ SV *ver, bool qv)
709806

710807
POSIX_SETLOCALE_UNLOCK; /* End critical section */
711808
# endif
712-
809+
}
810+
#endif
713811
}
714812

715-
#endif /* USE_LOCALE_NUMERIC */
716-
717813
/* Strip trailing zero's from the version number */
718814
while (buf[len-1] == '0' && len > 0) len--;
719815

0 commit comments

Comments
 (0)