WARNING ReactorEpoll::del() (ERRNO 800) #5694

Open
warmbook opened this issue Feb 22, 2025 · 8 comments

warmbook commented Feb 22, 2025

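Symptoms:
On v6.0.1, with max_request and heartbeat detection configured, I start the server and send a large number of requests from the browser console. After roughly 2,000 requests (40-70 s in), the process terminates and prints zend_mm_heap corrupted, and the warning from the title appears in the log file: [2025-02-22 17:00:00 *3543124.1] WARNING ReactorEpoll::del() (ERRNO 800): failed to delete events[fd=27, fd_type=0], it has already been removed. The number of such lines varies; so far I have seen 2 and 3.
In addition, the WorkerExit callback never runs.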

php --ri swoole:

swoole
Swoole => enabled
Author => Swoole Team [email protected]
Version => 6.0.1
Built => Feb 21 2025 13:23:20
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
thread => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
sockets => enabled
openssl => OpenSSL 3.1.4+quic 24 Oct 2023
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
curl-version => 8.4.0
c-ares => 1.24.0
zlib => 1.3.1
brotli => E16777225/D16777225
zstd => 1.5.2
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
mysqlnd => enabled
coroutine_pgsql => enabled
coroutine_odbc => enabled
coroutine_sqlite => enabled
Directive => Local Value => Master Value
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608

uname -a && php -v && gcc -v:

Swoole 6.0.1 (cli) (built: Feb 21 2025 13:25:03) (ZTS)
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/10/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --disable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20200825 (Alibaba 10.2.1-3 2.32) (GCC)

Reproduction code:
PHP code

use Swoole\Thread\Map;
use Swoole\WebSocket\Server;
use Swoole\WebSocket\Frame;

list($connections)=Swoole\Thread::getArguments();
$setting=[
    'hook_flags'=>SWOOLE_HOOK_NATIVE_CURL|SWOOLE_HOOK_FILE|SWOOLE_HOOK_TCP|SWOOLE_HOOK_SLEEP,
    'max_request'=>64, // number of requests handled per worker process
    'heartbeat_check_interval'=>10, // heartbeat interval
    'heartbeat_idle_time'=>30, // disconnect on timeout
    'log_level'=>SWOOLE_LOG_WARNING,
    'log_file'=>'/path/to/logs/swoole.txt',
];
if(!$connections){
    if(!file_exists($setting['log_file'])) fclose(fopen($setting['log_file'],'w+'));
    chmod($setting['log_file'],0777);
    $connections=new Map();
    $setting+=[
        'init_arguments'=>fn()=>[$connections],
        'worker_num'=>swoole_cpu_num()+1, // defaults to the number of CPU cores
    ];
}
$server=new Server('0.0.0.0',80,SWOOLE_THREAD);
$server->set($setting);
$server->on('WorkerStart',function(Server $server,$workerId)use($connections){
    $server->connections=$connections;
    echo '[ ',str_pad((string)microtime(TRUE),15,'0'),' ] WorkerStart:',$workerId,PHP_EOL;
});
$server->on('WorkerExit',function(Server $server,$workerId){
    var_dump($server->connections);
    echo '[ ',str_pad((string)microtime(TRUE),15,'0'),' ] WorkerExit:',$workerId,PHP_EOL;
});
$server->on('request',function($request,$response){
    $response->end(microtime());
});
$server->on('open',function(Server $ws,$request){
    // echo '[ ',str_pad((string)microtime(TRUE),15,'0'),' ] open:',$ws->worker_id,' > ',$request->fd,PHP_EOL;
});
$server->on('message',function(Server $ws,Frame $frame){
    var_dump($frame->data);
    if(!$frame->data){
        /**
         * Heartbeat: ping frames need no handling here, the underlying layer replies with a pong automatically:
         * https://wiki.swoole.com/zh-cn/#/websocket_server?id=open_websocket_ping_frame
         * 
         * $pingFrame=new Frame();
         * if($frame->opcode===WEBSOCKET_OPCODE_PING) $pingFrame->opcode=WEBSOCKET_OPCODE_PONG;
         */
        $ws->push($frame->fd,new Frame());
        return;
    }
});
$server->on('connect',function(Server $ws,$fd){
    // echo '[ ',str_pad((string)microtime(TRUE),15,'0'),' ] connect:',$ws->worker_id,' > ',$fd,PHP_EOL;
});
$server->on('close',function(Server $ws,$fd){
    // echo '[ ',str_pad((string)microtime(TRUE),15,'0'),' ] close:',$ws->worker_id,' > ',$fd,PHP_EOL;
});
$server->start();

JS code that issues the requests

(async ()=>{
  for(let i=0;i!==10000;i++){
    await fetch('/abcd');
    await new Promise(resolve=>{setTimeout(resolve,5)})
    console.log(i)
  }
})();

@NathanFreeman
Member

The heartbeat is probably conflicting with the thread restart.
reload_async must be enabled for the workerExit event to fire.

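@warmbook
Author

@NathanFreeman
Could it be that, after a thread restarts, the connections cached on the heartbeat side are not cleared in time? I tried without max_request and sent kill -USR1 manually: the process did not crash immediately but died a little later, and the log file again contained ReactorEpoll::del() (ERRNO 800).
I also tried adding reload_async to the code above, and WorkerExit is still not triggered. Does the worker need to hold some resource as well? In my full program I create a timer, and there WorkerExit does fire on restart.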

@NathanFreeman
Member

NathanFreeman commented Feb 23, 2025

When a process has no event handles being listened on, the WorkerExit callback is not invoked when the process ends.
https://wiki.swoole.com/zh-cn/#/server/events?id=onworkerexit
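
A minimal sketch of the behaviour described above, not a confirmed fix. It assumes a default process-mode HTTP server on 127.0.0.1:9501; the address, the port, and the use of process mode instead of SWOOLE_THREAD are illustrative assumptions, not taken from the report. With reload_async enabled, a pending timer is one way to leave something registered in the worker's event loop when it is asked to exit, which matches the author's observation that WorkerExit fired once the full program created a timer.

```php
<?php
use Swoole\Http\Server;
use Swoole\Timer;

// Address and port are assumptions for illustration only.
$server = new Server('127.0.0.1', 9501);
$server->set([
    'worker_num'   => 2,
    'max_request'  => 64,   // same value as in the reproduction code
    'reload_async' => true, // required for WorkerExit to be considered at all
]);

$server->on('WorkerStart', function (Server $server, int $workerId) {
    // A repeating no-op timer: while it is pending, the worker's
    // event loop is not empty when the worker is asked to exit.
    Timer::tick(1000, function () {});
});

$server->on('WorkerExit', function (Server $server, int $workerId) {
    // Release whatever keeps the event loop busy so the worker can exit.
    Timer::clearAll();
    echo "WorkerExit: {$workerId}\n";
});

$server->on('request', function ($request, $response) {
    $response->end('ok');
});

$server->start();
```

Removing the Timer::tick() call from this sketch should reproduce the case from the comment above, where the worker has nothing left to wait for and exits without calling WorkerExit.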

@warmbook
Author

@NathanFreeman Thanks. My understanding was too shallow: I assumed that adding reload_async alone counted as listening on an event handle.
The ReactorEpoll::del problem counts as a bug, right?

@NathanFreeman
Member

The heartbeat thread and the worker thread are conflicting. I'll check whether a lock is needed.

@matyhtf
Member

matyhtf commented Feb 24, 2025

Trace the program with Valgrind and paste the error output here:

USE_ZEND_ALLOC=0 valgrind php server.php

After reproducing the problem, look for Invalid read and Invalid write entries in the log.

@warmbook
Author

@matyhtf
I tried several times. The only output was Warning: invalid file descriptor -1 in syscall close(), and the process did not crash, but the log file is the same as before: ReactorEpoll::del() (ERRNO 800) still appears.

@NathanFreeman
Member

This is most likely the conflict between the heartbeat thread and the worker thread.
