@philwalk
Contributor

This PR modifies generate-native-image.sh to generate scala-cli.exe with the code page set to 65001 when running on Windows:

  • save current code page
  • set code page to 65001
  • generate scala-cli.exe
  • restore the code page at exit
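
The steps above could be sketched roughly as follows. This is an illustration, not the actual contents of generate-native-image.sh: the helper name `with_utf8_code_page` is hypothetical, and the sketch assumes `chcp.com` is reachable from the bash environment (as it normally is under MSYS2 or Git Bash) and prints the active code page as a number.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's actual code: run a command with the Windows
# console code page set to 65001 (UTF-8), restoring the previous value on exit.
with_utf8_code_page() {
  # chcp.com only exists on Windows; elsewhere just run the command unchanged.
  if ! command -v chcp.com >/dev/null 2>&1; then
    "$@"
    return
  fi
  (
    # chcp.com prints something like "Active code page: 437"; keep the digits.
    saved=$(chcp.com | tr -dc '0-9')
    # The EXIT trap fires when this subshell ends, even if the command fails.
    trap 'chcp.com "$saved" >/dev/null' EXIT
    chcp.com 65001 >/dev/null
    "$@"
  )
}

# Stand-in for the real build command; replace with the native-image invocation.
with_utf8_code_page echo "building with UTF-8 code page"
```

Running the build inside a subshell keeps the trap local, so the caller's own EXIT handlers are left untouched and the command's exit status is passed through.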

Using a test script named /opt/ue/jsrc/løp.sc:

#!/usr/bin/env -S scala-cli shebang
printf("Hello %s\n", scriptPath)
printf("Hello %s\n", sys.props("scala.sources"))
printf("Hello %s\n", sys.props("scala.source.names"))

and with the PATH set to use the generated scala-cli.exe:

PATH="out/cli/3.3.7/base-image/nativeImage.dest:$PATH"

Manual verification that both the bug and the fix depend on the code page that is active during the GraalVM native-image build.

Verify the bug

Create a buggy scala-cli.exe by running generate-native-image.sh, temporarily modified to set code page 437.
Verify that the bug manifests when attempting to execute the script:

# /opt/ue/jsrc/løp.sc
[error]  File not found: C:\opt\ue\jsrc\løp.sc

Verify the fix

Create a fixed scala-cli.exe by running generate-native-image.sh as proposed in this PR (set code page 65001).
Verify that the bug is fixed:

# /opt/ue/jsrc/løp.sc
Hello /opt/ue/jsrc/løp.sc
Hello C:/opt/ue/jsrc/løp.sc
Hello løp.sc

@philwalk
Contributor Author

The error messages for the failing tests are the result of having Windows code page 437 active.
I assume that (at least as of jdk17) we need to set the code page to 65001 in Windows before running this test.

The failure is that a script named TestÅÄÖåäö.sc with contents:

object TestÅÄÖåäö {
  def main(args: Array[String]): Unit = {
    println("Hello from TestÅÄÖåäö")
  }
}

failed with this error message:

[950] Error: Could not find or load main class TestÅÄÖåäö
[950] Caused by: java.lang.ClassNotFoundException: TestÅÄÖåäö
# ./mill integration.test.native 'scala.cli.integration.RunTestsDefault.UTF-8'
[949] Produced artifacts:
[949]  C:\Users\philwalk\workspace\scala-cli-3307\out\cli\3.3.7\base-image\nativeImage.dest\scala-cli.build_artifacts.txt (txt)
[949]  C:\Users\philwalk\workspace\scala-cli-3307\out\cli\3.3.7\base-image\nativeImage.dest\scala-cli.exe (executable)
[949] ========================================================================================================================
[949] Finished generating 'scala-cli' in 3m 49s.
[950/950] integration.test.native
[950] >==== Running 'UTF-8' from RunTestDefinitions
[950] Running warmup testà
[950] Compiling project (Scala 3.7.3, JVM (17))
[950] Compiled project (Scala 3.7.3, JVM (17))
[950] Done running warmup test.
[950] Compiling project (Scala 3.7.3, JVM (17))
[950] Compiled project (Scala 3.7.3, JVM (17))
[950] Error: Could not find or load main class TestÅÄÖåäö
[950] Caused by: java.lang.ClassNotFoundException: TestÅÄÖåäö
[950] X==== Finishing 'UTF-8' from RunTestDefinitions
[950] scala.cli.integration.RunTestsDefault:
[950] ==> X scala.cli.integration.RunTestsDefault.UTF-8  2.017s os.SubprocessException: Result of C:\Users\philwalk\workspace\scala-cli-3307\out\cli\3.3.7\base-image\nativeImage.dest\scala-cli.exeà: 1
[950]
[950]     at os.proc.call(ProcessOps.scala:232)
[950]     at scala.cli.integration.RunTestDefinitions.$anonfun$new$117(RunTestDefinitions.scala:1080)
[950]     at scala.cli.integration.RunTestDefinitions.$anonfun$new$117$adapted(RunTestDefinitions.scala:1072)
[950]     at scala.cli.integration.TestInputs.$anonfun$fromRoot$1(TestInputs.scala:35)
[950]     at scala.cli.integration.TestInputs$.scala$cli$integration$TestInputs$$withTmpDir(TestInputs.scala:95)
[950]     at scala.cli.integration.TestInputs.fromRoot(TestInputs.scala:33)
[950]     at scala.cli.integration.RunTestDefinitions.$anonfun$new$116(RunTestDefinitions.scala:1072)
[950]     at scala.cli.integration.WithWarmUpScalaCliSuite.$anonfun$test$1(WithWarmUpScalaCliSuite.scala:34)
[950]     at scala.cli.integration.WithWarmUpScalaCliSuite.$anonfun$test$2(WithWarmUpScalaCliSuite.scala:37)
[950/950, 1 failed] ============================== integration.test.native scala.cli.integration.RunTestsDefault.UTF-8 ============================== 265s
1 tasks failed
integration.test.native 1 tests failed:
  scala.cli.integration.RunTestsDefault.UTF-8 scala.cli.integration.RunTestsDefault.UTF-8
philwalk@d5 MSYS ~/workspace/scala-cli-3307

@Gedochao
Contributor

I assume that (at least as of jdk17) we need to set the code page to 65001 in Windows before running this test.

Do you maybe know how to do it? Can't say I'm familiar with code pages in Windows, myself.
The fix looks good otherwise, we just need to get the tests fixed.

@Gedochao linked an issue on Oct 27, 2025 that may be closed by this pull request
@philwalk

This comment was marked as outdated.

@Gedochao
Contributor

One possible fix might be to run this one test with jdk18 or later (not sure how to do that, if it's possible).

Depends what "running with JDK18" means here.
If it's the JDK used for user code and/or Bloop, then just check how we do it in other tests, for example here:

test(s"correct JVM is picked up by $launcherString when JAVA_HOME set to $index") {
  TestUtil.retryOnCi() {
    TestInputs(
      os.rel / "check_java_home.sc" ->
        s"""assert(
           | System.getProperty("java.version").startsWith("$javaVersion") ||
           | System.getProperty("java.version").startsWith("1.$javaVersion")
           |)
           |println(System.getProperty("java.home"))""".stripMargin
    ).fromRoot { root =>
      val javaHome =
        os.Path(
          os.proc(TestUtil.cs, "java-home", "--jvm", index).call().out.trim(),
          os.pwd
        )
      withLauncher(root) { launcher =>
        val res = os.proc(launcher, "run", ".", extraOptions)
          .call(cwd = root, env = Map("JAVA_HOME" -> javaHome.toString))
        expect(res.out.trim().contains(javaHome.toString))
      }
    }
  }
}

-Dfile.encoding=UTF-8 should be possible to pass directly to the scala-cli native launcher (before the sub-command):

scala-cli -Dfile.encoding=UTF-8 compile (...)

If, however, this would mean the scala-cli launcher would require JDK18, that's a tougher cookie... I don't think we'd be bumping the required Java for the time being (not before Scala 3 compiler does it, and the upcoming 3.9 LTS will require JDK 17)

@philwalk

This comment was marked as outdated.

@philwalk
Contributor Author

philwalk commented Oct 31, 2025

It turns out that when os.proc(...).call(...) is run from mill, the spawned process inherits two system properties that cause scripts with non-ASCII names to fail. They can be set to UTF-8 in the JVM running the integration test, but the spawned scala-cli script still reports these values:

sun.jnu.encoding = Cp1252
native.encoding = Cp1252

If I duplicate the failing test in a scala-cli script, the bad encoding values are present if the script is run by scala-cli version 1.9.1. But if I run the script with a new GraalVM scala-cli.exe (compiled after setting the code page to 65001) then these values are inherited (apparently from scala-cli.exe) and the test succeeds:

sun.jnu.encoding = UTF-8
native.encoding = UTF-8
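
For comparison, the same properties can be read from any JVM on the PATH with the standard `-XshowSettings:properties` launcher option. The sketch below is only a convenience for inspecting them: the function names are made up for illustration, it reports the JVM started from the current shell rather than the one spawned by mill or scala-cli, and `native.encoding` appears only on JDK 17 and later.

```shell
#!/usr/bin/env bash
# Keep only the encoding-related lines from a "key = value" property listing.
filter_encoding_props() {
  grep -E '^[[:space:]]*(file\.encoding|native\.encoding|sun\.jnu\.encoding)[[:space:]]*=' || true
}

# Print the encoding properties of whichever java is first on the PATH.
show_jvm_encodings() {
  # The settings listing may be written to stderr, hence 2>&1.
  java -XshowSettings:properties -version 2>&1 | filter_encoding_props
}

# Only attempt the check when a java launcher is actually available.
if command -v java >/dev/null 2>&1; then
  show_jvm_encodings
fi
```

Running it once under each code page, in the same shell that starts the test, would show whether these values follow the console setting on a given machine.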

The same is true of the launcher out/cli/3.3.7/standaloneLauncher.dest/launcher.bat, as demonstrated by this bash script. The implication is that the launcher succeeds or fails depending on whether the Windows code page is 65001.

# time ./mill integration.test.jvm 'scala.cli.integration.RunTestsDefault.UTF-8' 
# time ./mill integration.test.native 'scala.cli.integration.RunTestsDefault.UTF-8' 

Here's the demo shell script.
With scala-cli 1.9.1 on the PATH it fails; run it with the scala-cli built from this PR and it succeeds.

#!/bin/bash

cat > "/tmp/testÅÄÖåäö.sc" <<'EOF'
#!/usr/bin/env -S scala-cli shebang
//> using dep com.lihaoyi::os-lib:0.8.1
object TestÅÄÖåäö {
  import java.nio.charset.Charset
  var launcherName = "scala-cli.exe"
  val filename = "testÅÄÖåäö.sc"
  def showEncoding(): Unit = {
    printf("======= jvm encoding configuration:\n")
    import scala.jdk.CollectionConverters.*
    import scala.sys.process.*
    System.err.printf("code-page: [%s]\n", ("chcp.com".!!).trim)
    System.err.printf("JAVA_TOOL_OPTIONS[%s]\n", System.getenv("JAVA_TOOL_OPTIONS"))
    System.err.printf("native.encoding = %s\n",  System.getProperty("native.encoding"))
    System.err.printf("sun.jnu.encoding = %s\n", System.getProperty("sun.jnu.encoding"))
    System.err.printf("file.encoding = %s\n",    System.getProperty("file.encoding"))
    System.err.printf("Charset.defaultCharset = %s\n", java.nio.charset.Charset.defaultCharset())
    System.err.printf("Class name = %s\n", this.getClass.getName)
    printf("fileName[%s]\n", sys.props("scala.source.names"))
    import java.lang.ProcessHandle
    def displayAncestors(ph: ProcessHandle): Unit = {
      import java.util.Optional
      val parentHandle: Optional[ProcessHandle] = ph.parent()
      if (parentHandle.isPresent) {
        val parent = parentHandle.get()
        val command = parent.info().command().orElse("N/A")
        println(s"PID: ${parent.pid()}, Command: $command")
        displayAncestors(parent)
      } else {
        println("--- Reached top of process tree ---")
      }
    }
    println("--- Process Ancestry ---")
    // Display the current process first
    val currentHandle = ProcessHandle.current()
    val currentCommand = currentHandle.info().command().orElse("N/A")
    println(s"PID: ${currentHandle.pid()} (Current Process), Command: $currentCommand")
    displayAncestors(ProcessHandle.current())
  }

  def main(args: Array[String]): Unit = {
    val asciiOnly: Boolean = args.contains("-ascii")

    //val p = if scriptPath.fwd.startsWith("/") then os.Path(scriptPath) else os.Path(scriptPath, pwd)
    showEncoding()
    println(s"Hello from $filename")
    println(s"Hello from $scriptPath")
    printf("launcher: %s\n", launcher)
    val subProcessScriptpath = if asciiOnly then "/tmp/"+filename.replaceAll("[^\\x00-\\x7F]", "") else filename

    printf("subProcessScriptpath [%s]\n", subProcessScriptpath)
  
    val endOfCopy = "=== script cloning ends here ==="
    val p = java.nio.file.Paths.get(scriptPath)
    val osp = osPath(scriptPath)
    val content = os.read(osp, scala.io.Codec.UTF8)
    val topPart = content.split("[\r\n]+").takeWhile(!_.contains(endOfCopy)).filter(!_.contains("launcher")).mkString("\n")+"\n  }\n}"
    os.write.over(osPath(subProcessScriptpath), topPart)

    val extraArgs = Seq("--bloop-startup-timeout", "2min", "--bloop-bsp-timeout", "1min")

    printf("\nos.proc.call cloned copy: subProcessScriptpath [%s]\n", subProcessScriptpath)
    val res = os.proc(
      launcher,
      subProcessScriptpath
    )
      .call()
    val outstr = res.out.text(scala.io.Codec.UTF8).trim
    printf("outstr[%s]\n", outstr)
  }
  import scala.sys.process.*
  lazy val launcher = Seq("where.exe", launcherName).!!.trim.replace('\\', '/').split("[\r\n]+").toSeq.head
  def osPath(path: String): os.Path = {
    import java.nio.file.{Paths, Files}
    val p = Paths.get(path).toAbsolutePath
    os.Path(p.toString)
  }
}
EOF

FNAME=/tmp/testÅÄÖåäö.sc

scala-cli.exe run $FNAME # okay
echo "######################################" 1>&2
java -Xmx512m -Xms128m -jar 'out/cli/3.3.7/standaloneLauncher.dest/launcher.bat' $FNAME

The script in the here-doc is a self-contained scala-cli script that should fail in the same way as the two integration tests, but it is unable to demonstrate the problem.
Just like the integration test, the script launches a script with a non-ASCII filename via os.proc.
It succeeds if the scala-cli.exe on the PATH was built with Windows code page 65001 set.

@philwalk
Contributor Author

philwalk commented Nov 2, 2025

This most recent commit adds extensive logging to the failing integration test for diagnostic purposes.
The final version is intended to be a much simpler and smaller test function.

@Gedochao - are there any restrictions to using code created in collaboration with Copilot?

@Gedochao
Contributor

Gedochao commented Nov 3, 2025

@Gedochao - are there any restrictions to using code created in collaboration with Copilot?

There are none, go ahead. We use a standard Apache 2.0 License and there's no specific policy against code created with LLM-reliant tooling.

@Gedochao marked this pull request as draft on November 3, 2025 09:58
@Gedochao
Contributor

Gedochao commented Nov 3, 2025

@philwalk I converted the PR to draft format. I will keep track of what's happening here, feel free to mark it back as ready to review whenever. thanks for working on this.

@philwalk
Contributor Author

philwalk commented Nov 3, 2025

@philwalk I converted the PR to draft format. I will keep track of what's happening here, feel free to mark it back as ready to review whenever. thanks for working on this.

Thanks. I'm trying to determine whether the two failing Windows tests for the initial PR commit are actually valid in Windows:

integration.test.jvm scala.cli.integration.RunTestsDefault.UTF-8
integration.test.native scala.cli.integration.RunTestsDefault.UTF-8

I'm trying to evolve the code into a valid UTF-8 test in the mill test environment. It seems possible, if not likely, that the tests as originally coded could pass without using UTF-8 encoding, because the non-ASCII string ÅÄÖåäö also has valid encodings in the Windows default Cp1252 as well as in various other encodings. When I force mill to use UTF-8, the setting isn't easily propagated down to the spawned scala-cli environment, although the equivalent test script (above) seems to do so. I haven't yet figured out why it behaves differently in mill.

✅ Encodings with Valid Byte Representations for ÅÄÖåäö

These encodings support the string ÅÄÖåäö with valid byte representations:

| Encoding     | Byte representation (hex) for ÅÄÖåäö | Notes                                                      |
|--------------|--------------------------------------|------------------------------------------------------------|
| UTF-8        | C3 85 C3 84 C3 96 C3 A5 C3 A4 C3 B6  | Fully supported; portable across platforms                 |
| Windows-1252 | C5 C4 D6 E5 E4 F6                    | Common on Windows; compatible with Western European locales |
| ISO-8859-1   | C5 C4 D6 E5 E4 F6                    | Similar to Windows-1252 but lacks some extra characters    |
| ISO-8859-15  | C5 C4 D6 E5 E4 F6                    | Adds euro symbol and other tweaks to ISO-8859-1            |
| MacRoman     | 81 80 85 8C 8A 9A                    | Legacy Mac encoding; not recommended for new systems       |
| Latin-9      | C5 C4 D6 E5 E4 F6                    | Alias for ISO-8859-15                                      |
| CP437        | 8F 8E 99 86 84 94                    | Original IBM PC encoding; limited compatibility            |
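
The byte values listed above can be checked from a bash prompt. This is a minimal sketch that assumes `iconv` and `od` are available and that the local `iconv` recognises the encoding names used here (names and coverage vary between implementations); the function name `hex_in` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the hex bytes of ÅÄÖåäö after converting it from UTF-8 to encoding $1.
hex_in() {
  # The string is written as octal UTF-8 escapes so the result does not depend
  # on how this script file itself is encoded.
  printf '\303\205\303\204\303\226\303\245\303\244\303\266' |
    iconv -f UTF-8 -t "$1" |
    od -An -tx1 |
    tr -d ' \n'
}

# An encoding the local iconv does not know yields an error and an empty column.
for enc in UTF-8 CP1252 ISO-8859-1 CP437; do
  printf '%-12s %s\n' "$enc" "$(hex_in "$enc")"
done
```

The same six characters map to different bytes under Cp1252 and CP437, so a name encoded under one code page and decoded under the other would not round-trip. That is consistent with, though not proof of, the file-not-found and class-not-found failures reported earlier in this thread.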

Development

Successfully merging this pull request may close these issues.

cannot find the path specified with UTF-8 characters on Windows

2 participants