Conversation

Collaborator

@marksvc marksvc commented Sep 30, 2025

It bothers me to see all the error messages.
My workstation can start it in 15 s. GHA can start it in 30 s.




codecov bot commented Sep 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.22%. Comparing base (771e64c) to head (466f7ce).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3483      +/-   ##
==========================================
+ Coverage   82.21%   82.22%   +0.01%     
==========================================
  Files         615      615              
  Lines       37031    37031              
  Branches     6044     6063      +19     
==========================================
+ Hits        30444    30448       +4     
+ Misses       5703     5689      -14     
- Partials      884      894      +10     


@marksvc marksvc marked this pull request as ready for review September 30, 2025 22:17
@pmachapman pmachapman self-assigned this Sep 30, 2025
@pmachapman pmachapman self-requested a review September 30, 2025 22:29
Collaborator

@pmachapman pmachapman left a comment


@pmachapman reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 7 at r1 (raw file):

const pollUrl = 'http://localhost:5000/projects';
const pollInterval = 17000;

Wouldn't it be better to poll every second, but reduce the error logging to only log those failures that occur after the 30 second mark? This would help reduce the error messages without a non-standard interval number (i.e. when someone looks at 17000 in the future, they won't realize it is based on your PC's and the GHA's performance).

Including an exponential backoff for the interval, e.g. 1000, 2000, 4000, 8000, 16000, is also a common practice for polling non-responsive services.

Code quote:

const pollInterval = 17000;
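
For illustration, a minimal sketch of that suggestion could look like the following: poll every second, stay quiet until a threshold, and only then log failures. The names (awaitStartup, quietThresholdMs, timeoutMs) are hypothetical, the global fetch assumes Node 18+, and this is not the code in the PR.

// Sketch only: poll every second, log failures only after a quiet threshold.
// awaitStartup, quietThresholdMs and timeoutMs are assumed names, not from this PR.
const pollUrl = 'http://localhost:5000/projects';
const pollIntervalMs = 1000;
const quietThresholdMs = 30_000;
const timeoutMs = 300_000;

async function awaitStartup(): Promise<void> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    try {
      const response = await fetch(pollUrl);
      if (response.ok) return;
    } catch (error) {
      // Only surface failures once startup is taking longer than expected.
      if (Date.now() - start >= quietThresholdMs) {
        console.error(`Startup check failed: ${error}`);
      }
    }
    await new Promise(resolve => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error(`Failed to start in ${timeoutMs} milliseconds.`);
}

An exponential backoff would only change the sleep at the end of the loop, e.g. doubling pollIntervalMs after each failed attempt up to some cap.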

Collaborator

@Nateowami Nateowami left a comment


Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @marksvc)


-- commits line 4 at r1:
I don't see what's wrong with the current approach. I can't think of a reason to run this locally, and usually if you're looking at it on GHA, it's because the e2e tests are broken, and we're going to have to re-run them. This will often add an unnecessary delay to running the tests, which we'd ideally like to optimize to run much faster.

Collaborator Author

marksvc commented Oct 1, 2025

-- commits line 4 at r1:
One problem is that when I go look at e2e logs, I see

0:00 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)

0:01 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)

0:02 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)

0:03 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)

0:04 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
...

33 times. Once, when I was still learning and checked the e2e logs early on, I incorrectly thought the e2e tests never even started successfully because of all these messages. The little message "Startup check passed. Exiting." at the end is a squeak compared to all the errors :)

The other problem is seeing it on my workstation, which is more often. On my local computer I start stuff all the time: compile job, launch a program, start some tests. It will give an indication of starting up, and eventually it's going. But it doesn't say "Error" a dozen times before successfully starting up :-)

Even printing a "Still waiting" message every second, 33 times, would be an improvement.
I like what Peter suggests: try every second, but don't print the messages until enough time has passed. Then it could try every second for half a minute, but not say Error every time.

I can't think of a reason to run this locally,

I often run it locally. If the GHA job fails, I might be able to reproduce what it is doing more accurately by running the same script.
Another reason I run it locally is that if I use e2e.mts, then

  1. It does stuff in the GUI and sometimes I am concerned that I might have messed it up with my mouse or keyboard.
  2. I need to be more coordinated with SF running locally in the background, since e2e.mts seems to test an already running SF.

I know the e2e.mts script has various configuration etc, but it's been very convenient to just run the bash script that handles it all.

usually if you're looking at it on GHA, it's because the e2e tests are broken, and we're going to have to re-run them

Can you clarify why this might mean the display of 33 error messages at the beginning of the test run isn't undesirable?

This will often add an unnecessary delay to running the tests, which we'd ideally like to optimize to run much faster.

Yes, I see what you're saying here.

Collaborator

@Nateowami Nateowami left a comment


Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


-- commits line 4 at r1:
It seems like the issue has more to do with how the startup ping failures appear in the log than with their frequency or presence. Perhaps the message could be changed to just say it's not up yet. It's verbose because it's intended to run on the CI.

The canonical way of running the tests, defined in the README, is to run e2e.mts. pre_merge_ci.sh is intended as a wrapper around that for the sake of a CI. I'm not even sure if developers on Windows can run it.

  1. It does stuff in the GUI and sometimes I am concerned that I might have messed it up with my mouse or keyboard.

This is a good argument for adding a new preset called headless that's identical to the default, except for being headless. I usually run them in headed mode because I want to be able to easily investigate failures, and headless mode makes it nearly impossible.

  1. I need to be more coordinated with SF running locally in the background, since e2e.mts seems to test an already running SF.

I nearly always have SF running for development purposes, and shutting down the processes to start another is quite slow. Do you not have it running most of the time? I guess I've just assumed it would be difficult to work without SF running.

Can you clarify why this might mean the display of 33 error messages at the beginning of the test run isn't undesirable?

I see them as startup status messages, in a CI script that's intended to be as verbose as possible. Arguably maybe the error messages could be toned down to just say that it can't connect yet instead of the underlying error message.


src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 7 at r1 (raw file):
I don't really want an exponential backoff, since this isn't about a service failure, but just waiting for a service to come up. And the cost of the requests is very low, since it's all on localhost.

reduce the error logging to only log those failures that occur after the 30 second

Taking more than 30 seconds wouldn't be particularly abnormal. In my opinion, showing network failure messages only after an arbitrary time period would make the problem of status messages looking like critical failures worse: since they normally wouldn't show up, when they do show up it would deviate from what's normal.

@marksvc marksvc force-pushed the task/e2e-poll-delay branch from d0a9b97 to 7a48679 Compare October 14, 2025 19:26
@marksvc marksvc requested a review from pmachapman October 14, 2025 19:26
Collaborator Author

@marksvc marksvc left a comment


Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


-- commits line 4 at r1:

  1. I need to be more coordinated with SF running locally in the background, since e2e.mts seems to test an already running SF.

I nearly always have SF running for development purposes, and shutting down the processes to start another is quite slow. Do you not have it running most of the time? I guess I've just assumed it would be difficult to work without SF running.

I'm sorry, what I gave wasn't really a fair point, since using pre_merge_ci.sh has the same coordination issue, which is that I need to make sure SF is not running before I can run pre_merge_ci.sh, or it will have trouble.

I often have SF running. But depending on what I'm doing, I may or may not be keeping SF running or turned off.

maybe the error messages could be toned down to just say that it can't connect yet instead of the underlying error message.

What do you think of this change? On my workstation I see the following now, where a . is shown each second until a threshold:

E2ELauncher: Awaiting application startup before running tests
............
await-application-startup: 0:27 Startup check passed. Exiting.

If a more explicit "waiting" message would be preferable, we could instead do something like the following (i.e. showing Waiting. each second until a threshold):

E2ELauncher: Awaiting application startup before running tests
Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. Waiting. 
await-application-startup: 0:27 Startup check passed. Exiting.
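
For illustration, a minimal sketch of the dot-per-second approach could look like the following (assuming Node 18+ global fetch and node:process; the function name is made up and this is not the exact code in this PR):

import process from 'node:process';

// Sketch of the dot-per-second progress output described above (assumed names).
async function awaitStartupWithDots(pollUrl: string, timeoutMs: number): Promise<void> {
  const start = Date.now();
  process.stdout.write('Awaiting application startup before running tests ');
  while (Date.now() - start < timeoutMs) {
    try {
      const response = await fetch(pollUrl);
      if (response.ok) {
        const elapsedSec = Math.round((Date.now() - start) / 1000);
        process.stdout.write(`\nStartup check passed after ${elapsedSec} s. Exiting.\n`);
        return;
      }
    } catch {
      // Not up yet; fall through and print a dot instead of the connection error.
    }
    process.stdout.write('.');
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  process.stdout.write('\n');
  throw new Error(`Failed to start in ${timeoutMs} milliseconds.`);
}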

src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 7 at r1 (raw file):

In my opinion showing network failure messages after an arbitrary time period would make the problem of status messages looking like critical failures worse, since they normally wouldn't show up, so when they do show up it would deviate from what's normal.

Hmmm. I hear what you're saying. Perhaps another way to think about it is "now something in our job is taking longer than normal / than it used to."

Collaborator

@pmachapman pmachapman left a comment


@pmachapman reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @marksvc)


src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 27 at r2 (raw file):

  const minutes = Math.floor(elapsed / 60_000);
  const seconds = Math.floor((elapsed % 60_000) / 1000);
  return `${minutes}:${seconds < 10 ? "0" + seconds : seconds}`;

NIT: You can use padStart:

return `${minutes}:${seconds.toString().padStart(2, "0")}`;

Code quote:

return `${minutes}:${seconds < 10 ? "0" + seconds : seconds}`;
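
For illustration, the full helper with padStart could look like this (formatTime is an assumed name, not necessarily the one used in the file):

// Formats elapsed milliseconds as m:ss, padding seconds to two digits.
function formatTime(elapsed: number): string {
  const minutes = Math.floor(elapsed / 60_000);
  const seconds = Math.floor((elapsed % 60_000) / 1000);
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// e.g. formatTime(31_000) === "0:31", formatTime(93_000) === "1:33"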

@marksvc marksvc force-pushed the task/e2e-poll-delay branch from 7a48679 to 0ebda7e Compare October 15, 2025 17:08
@marksvc marksvc requested a review from pmachapman October 15, 2025 17:08
Collaborator Author

@marksvc marksvc left a comment


Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)


src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 27 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

NIT: You can use padStart:

return `${minutes}:${seconds.toString().padStart(2, "0")}`;

Thank you. Done.

Collaborator

@Nateowami Nateowami left a comment


Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @pmachapman)


src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 9 at r2 (raw file):

const pollUrl = "http://localhost:5000/projects";
const pollInterval = 1000;
const quietPeriodSec = 50_000;

I don't see the point of having a quiet period, followed by a non-quiet period, when it eventually times out. Why not keep it quiet until the script gives up and times out? Also, I think the variable is misnamed.

Collaborator

@pmachapman pmachapman left a comment


@pmachapman reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @marksvc and @Nateowami)

Collaborator

@Nateowami Nateowami left a comment


Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @marksvc)


-- commits line 4 at r1:

Previously, marksvc wrote…
[…]

What do you think of this change? On my workstation I see the following now, where a . is shown each second until a threshold:

E2ELauncher: Awaiting application startup before running tests
............
await-application-startup: 0:27 Startup check passed. Exiting.

If a more explicit "waiting" message would be preferable, we could instead do something like the following (i.e. showing Waiting. each second until a threshold).

How about adding "..." after "Awaiting application startup before running tests" so it's a tad clearer that it's an ongoing thing. Then I think adding one dot each second is fine.

This hides a number of error messages that are really indications that
we need to continue waiting.
@marksvc marksvc force-pushed the task/e2e-poll-delay branch from 0ebda7e to 466f7ce Compare October 17, 2025 23:03
@marksvc marksvc requested a review from Nateowami October 17, 2025 23:03
Collaborator Author

@marksvc marksvc left a comment


Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Nateowami)


-- commits line 4 at r1:

Previously, Nateowami wrote…

How about adding ... after the Awaiting application startup before running tests so it's a tad clearer that it's an ongoing thing. Then I think adding one dot each second is fine.

How's this?


src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts line 9 at r2 (raw file):

Previously, Nateowami wrote…

I don't see the point of having a quiet period, followed by a non-quiet period, when it eventually times out. Why not keep it quiet until the script gives up and times out? Also, I think the variable is misnamed.

Thank you, that would have been milliseconds, wouldn't it :).

How's this? The failure output looks like:

E2ELauncher: 2025-10-17T15:30:00-06:00 Starting.
E2ELauncher: Logging dotnet output to ./ClientApp/e2e/test_output/ci_e2e_test_results/dotnet.txt
E2ELauncher: Server started with PID 2300229
await-application-startup: Awaiting application startup before running tests ...........
........................................................................................
........................................................................................
........................................................................................
............................
await-application-startup: Error: error sending request for url (http://localhost:5099/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
await-application-startup: Failed to start in 300000 milliseconds. Exiting.
E2ELauncher: Elapsed time: 5m 0s
E2ELauncher: Shutting down server with PID 2300229.
E2ELauncher: Server shut down.

The success output looks like:

E2ELauncher: 2025-10-17T16:51:54-06:00 Starting.
E2ELauncher: Logging dotnet output to ./ClientApp/e2e/test_output/ci_e2e_test_results/dotnet.txt
E2ELauncher: Server started with PID 2434441
await-application-startup: Awaiting application startup before running tests ...................
await-application-startup: 0:31 Startup check passed. Exiting.
[ "pre_merge_ci" ]
Using preset: pre_merge_ci
