Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken pipe on WebSocketHandler #9135

Closed
sardaukar opened this issue Apr 19, 2020 · 19 comments
Closed

Broken pipe on WebSocketHandler #9135

sardaukar opened this issue Apr 19, 2020 · 19 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:files topic:stdlib:networking topic:stdlib

Comments

@sardaukar
Copy link

sardaukar commented Apr 19, 2020

I have a local server for a static site generator I'm working on:

require "http/server"

module Thingy
  class DevelopmentServer
    @reload_channel : Channel(Int32)
    @port : Int32

    getter reload_channel, port

    def initialize(@reload_channel, @port); end

    def run!
      server = HTTP::Server.new([
        HTTP::ErrorHandler.new,
        HTTP::LogHandler.new,
        HTTP::CompressHandler.new,
        HTTP::WebSocketHandler.new do |ws, ctx|
          loop do
            select
            when msg = reload_channel.receive
              ws.send("reload")
            when timeout 1.second
              ws.send("ping")
            end
          end
        end,
        HTTP::StaticFileHandler.new("./output/", directory_listing: false),
      ]) do |context|
        if is_html_index_request?(context)
          context.response.status_code = 301
          context.response.headers["Location"] = "#{context.request.path}index.html"
          puts "\nRedirecting implicit HTML index request..."
        else
          context.response.status_code = 404
          STDERR.puts "\nFile not found!"
        end
      end

      address = server.bind_tcp(port)
      puts "Listening on http://#{address}"

      server.listen
    end

    ###

    private def is_html_index_request?(context)
      context.request.headers["Accept"].index("text/html") &&
        context.request.path.ends_with?("/")
    end
  end
end

Another part of the code watches local Markdown files, and when one of them changes a value is sent to the reload_channel channel, and the WebSocket message sent to the client is changed from ping to reload and the browser reloads.

This all works, but 15/20 seconds after getting the first reload on the browser, and without other values sent to reload_channel I get:

GET / - Unhandled exception:
Error writing to socket: Broken pipe (IO::Error)
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/socket.cr:82:13 in 'unbuffered_write'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/io/buffered.cr:218:5 in 'flush'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/io/buffered.cr:226:5 in 'close'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handlers/websocket_handler.cr:47:9 in 'call'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handlers/compress_handler.cr:12:5 in 'call'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handlers/log_handler.cr:11:30 in 'call'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/handlers/error_handler.cr:15:7 in 'call'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/request_processor.cr:48:11 in 'process'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server/request_processor.cr:22:3 in 'process'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server.cr:498:5 in 'handle_client'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/http/server.cr:464:13 in '->'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/fiber.cr:255:3 in 'run'
  from /home/sardaukar/.asdf/installs/crystal/0.34.0/share/crystal/src/fiber.cr:92:34 in '->'
  from ???

On the browser (Firefox) I just get a skipped ping, and things go back to normal:

image

Since it's not too disruptive, I just wanted to rescue this error but I can't seem to do it even with a begin/rescue on the run! method for the class pasted above.

Any pointers helpful!

My crystal -v output:

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu
@jhass
Copy link
Member

jhass commented Apr 19, 2020

Related: #9065. Potentially fixed by #9115?

Looking at the trace, it might be something we want to generally ignore while flushing in IO::Buffered#close? But then silently discarding data the user has already written...

@rdp
Copy link
Contributor

rdp commented Apr 21, 2020

same problem with 0.33?

@sardaukar
Copy link
Author

@rdp @jhass I have created a repo to easily replicate this issue https://github.com/sardaukar/websocket_issue let me know if you see it too

@sardaukar
Copy link
Author

@rdp and yes, same problem in 0.33 and 0.34 for me

@rdp
Copy link
Contributor

rdp commented Apr 23, 2020 via email

@sardaukar
Copy link
Author

sardaukar commented Apr 23, 2020

Not sure why there are disconnects so often if it's localhost, though... And also, why can't this be rescued is my main gripe :|

@jhass
Copy link
Member

jhass commented Apr 23, 2020

It cannot be rescued because it simply doesn't touch your code. It runs in a fiber and is triggered in standard library code there after all your callbacks already returned.

Edit: sorry for the duplicates, github's downtime caused these :/

@waj

This comment has been minimized.

@waj
Copy link
Member

waj commented Apr 23, 2020

This is being improved by #9115 by not logging as errors the client disconnections

@sardaukar
Copy link
Author

It cannot be rescued because it simply doesn't touch your code. It runs in a fiber and is triggered in standard library code there after all your callbacks already returned.

Edit: sorry for the duplicates, github's downtime caused these :/

Yeah, I've gathered as much. The fact that I can't is my gripe :)

@sardaukar
Copy link
Author

@waj any idea on why there are so many disconnections for localhost endpoints?

@waj
Copy link
Member

waj commented Apr 23, 2020

@sardaukar you mean websockets are being disconnected with no apparent reason?

Oh... from your example I can see the callback for the WebSocketHandler never returns. Actually I'm testing that with Chrome and so far I cannot reproduce the issue. But if that block never returns, the WebSocketHandler cannot run the normal loop for the websocket. The browser might be upset about "ping" requests not being replied and force a disconnection.

I think you need to put that loop into a separate fiber. I'd also collect all the open websockets in an array for example, and broadcast the "reload" to all of them. That way you can make several tabs to reload when there are changes, if I understood correctly the intention. Don't forget to remove the websockets when they close, by handling the ws.on_close event.

@sardaukar
Copy link
Author

@waj thanks, I'll try to work those in! Not sure what you mean by "the callback never returns", though. Still a noob :|

@waj
Copy link
Member

waj commented Apr 23, 2020

@sardaukar I mean the block passed to HTTP::WebSocketHandler.new is meant to setup things and return immediately. In your code you have an infinite loop running.

@sardaukar
Copy link
Author

sardaukar commented Apr 23, 2020

Oh I see. Is there a better way to do it and still be able to react to a value coming in from a channel? I just need a different part of the code to signal the websocket handler to change the message and force a browser reload (with the accompanying JS script)

@asterite
Copy link
Member

@sardaukar maybe put that loop inside a spawn?

@sardaukar
Copy link
Author

sardaukar commented Apr 23, 2020

I fixed it with

def run!
      web_sockets = Set(HTTP::WebSocket).new

      server = HTTP::Server.new([
        HTTP::ErrorHandler.new,
        HTTP::LogHandler.new,
        HTTP::CompressHandler.new,
        HTTP::WebSocketHandler.new do |ws, ctx|
          spawn {
            web_sockets << ws

            ws.on_close do
              web_sockets = web_sockets.delete(ws)
            end

            loop do
              select
              when msg = reload_channel.receive
                web_sockets.each { |web_socket| web_socket.send("reload") }
              when timeout(1.second)
                web_sockets.each { |web_socket| web_socket.send("ping") }
              end
            end
          }
        end,
        HTTP::StaticFileHandler.new("./output/", directory_listing: false),
      ]) do |context|
        if is_html_index_request?(context)
          context.response.status_code = 301
          context.response.headers["Location"] = "#{context.request.path}index.html"
          puts "\nRedirecting implicit HTML index request..." # TODO colorize
        else
          context.response.status_code = 404
          STDERR.puts "\nFile not found!" # TODO colorize
        end
      end

      address = server.bind_tcp(port)
      puts "Listening on http://#{address}"

      server.listen
    end

Using a Set makes me feel like those years in uni were totally worth it :D

@waj
Copy link
Member

waj commented Apr 24, 2020

@sardaukar I'd put the spawn outside the WebSocketHandler, otherwise it's spawning a new fiber every time a new websocket is connected. Just add the websocket to the set and hook to on_close to remove it from the set when it's done.

@sardaukar
Copy link
Author

@waj thanks for that, changed it. I think I'll close this issue since #9115 is on track to be merged. Thanks for all the help, everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:files topic:stdlib:networking topic:stdlib
Projects
None yet
Development

No branches or pull requests

5 participants