ergoCub 1.1 S/N:001 – Left arm suddenly stopped streaming data #1865

Closed
S-Dafarra opened this issue Jul 29, 2024 · 13 comments
Labels
ergoCub 1.1 S/N:001 ergoCub1.1 platform

Comments

@S-Dafarra

Robot Name 🤖

ergoCub 1.1 S/N:001

Request/Failure description

During normal teleoperation, the left arm suddenly stopped streaming data, with errors like:

[ERROR] |yarp.devices.controlBoard_nws_yarp|left_arm-mc_nws_yarp| Encoder timestamps are not consistent! Data will not be published.

Detailed context

Here is a portion of the log:
left_arm_issue.txt
Here is the full log:
left_arm_issue_full.zip

Additional context

No response

How does it affect you?

No response

@github-actions github-actions bot changed the title from "Left arm suddenly stopped streaming data" to "ergoCub 1.1 S/N:001 – Left arm suddenly stopped streaming data" Jul 29, 2024
@github-actions github-actions bot added the ergoCub 1.1 S/N:001 ergoCub1.1 platform label Jul 29, 2024
@S-Dafarra
Author

Rebooting the robot seemed to be enough. Still, it should not stop streaming data.

@S-Dafarra
Author

S-Dafarra commented Jul 31, 2024

Ciao @Gandoo, @maggia80. I imagine that this issue is related to the faulty CAN cable of #1867 (comment), right?

@S-Dafarra
Author

Unfortunately, it is still happening. By editing the error message, we found that the wrist yaw is the first joint whose timestamp is not consistent. This might indicate that it is not possible to communicate with the AMC board.
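
To make the idea of "the first joint whose timestamp is not consistent" concrete, here is a minimal sketch of such a check. It is not the YARP implementation: the function name, the use of the median as the reference, and the 1-second tolerance are all assumptions made for illustration.

```python
from statistics import median
from typing import Optional, Sequence


def first_inconsistent_joint(
    timestamps: Sequence[float], tolerance_s: float = 1.0
) -> Optional[int]:
    """Return the index of the first joint whose timestamp is off, else None.

    A timestamp counts as inconsistent when it differs from the median of all
    joint timestamps by more than `tolerance_s` seconds. The median reference
    and the default tolerance are assumptions, not YARP's actual rule.
    """
    if not timestamps:
        return None
    reference = median(timestamps)
    for joint, stamp in enumerate(timestamps):
        if abs(stamp - reference) > tolerance_s:
            return joint
    return None


# Hypothetical timestamps (seconds) for seven joints: the last three are
# served by a board that stopped updating five seconds ago.
stamps = [100.0, 100.0, 100.0, 100.0, 95.0, 95.0, 95.0]
print(first_inconsistent_joint(stamps))  # prints 4
```

Reporting the index of the first offending joint, rather than only a generic error, is what lets one map the failure back to a specific board.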

@AntonioAzocar also noted a particular LED lighting pattern on the board, and we are not sure it is normal.

cc @MSECode @marcoaccame

@AntonioAzocar

AntonioAzocar commented Aug 2, 2024

Hello everyone,
For completeness, I am attaching a video of the LED:

VID_20240802_071330.mp4

cc @S-Dafarra @MSECode @marcoaccame

@MSECode

MSECode commented Aug 5, 2024

Looking at the logs, I saw that in multiple sections there are the following errors and warnings. They seem to be related to the ETH communication, and they come from the board we are having issues with:

image

Therefore, it might be that the board is not communicating well over ETH. I would suggest doing the following:

  • check whether it is possible to ping it from the head
  • check whether the LED behavior is the same for the AMC on the right arm (which should work fine), to make sure that the board is not doing something weird
  • check the ETH cable and replace it if necessary
  • (maybe we can try to flash the board again to reset its state)

In the meantime, I'll double-check the issue and the logs with @marcoaccame this afternoon to get a better idea of the problem.

@S-Dafarra
Copy link
Author

I did check, and it was possible to ping it. I noticed that those errors appear when trying to close the yarprobotinterface. My hunch is that the board gets blocked somehow, and when we try to close the device the network communication gets blocked as well. In fact, if we want to restart the robot, we also need to restart the motors.

@traversaro
Member

Not directly related to the issue (the root issue is indeed in the EMS communication), but just for reference, this is related to robotology/yarp#2939.

@marcoaccame

Hi all, @MSECode and I will analyze all the available information in more detail later today.
For now, we saw:

  • fw version is not the latest
  • yri (yarprobotinterface) often loses contact with board eb31 for periods ranging from 40 ms to a few seconds.

Long story short: the second point may explain the problem. We need to understand why it happens and whether it happened before.

@marcoaccame

@MSECode and I have done a first analysis:

  • fw version is not the latest

If possible, we advise upgrading to the latest devel of icub-main and icub-firmware-shared and flashing the latest binaries.

  • yri often loses contact with board eb31 for periods ranging from 40 ms to a few seconds.

Long story short: the second point may explain the problem. We need to understand why it happens and whether it happened before.

yri loses contact because the link between eb25 and eb31 continually goes down and up again. See:

[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4108s 545m 416u :  ETH monitor: link goes down  in port ETH output (P3/P12/J5). Application state is unknown.
 .....
 .....
[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4110s 446m 416u :  ETH monitor: link goes up. in port ETH output (P3/P12/J5). Application state is unknown.
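
To quantify how long each interruption lasts, the down/up messages can be paired and their timestamps subtracted. The following is a rough sketch, not an existing tool: the function name is hypothetical, the regular expression is inferred only from the two lines quoted above, and it assumes that `s`, `m` and `u` in the `time=` field mean seconds, milliseconds and microseconds. For simplicity it tracks one link per board IP and ignores the port.

```python
import re
from typing import Dict, List, Tuple

# Pattern inferred from the two log lines quoted in this thread (assumption).
LINK_EVENT = re.compile(
    r"from BOARD (?P<ip>\S+) \((?P<name>[^)]+)\)\s+"
    r"time=(?P<s>\d+)s (?P<ms>\d+)m (?P<us>\d+)u\s*:\s*"
    r"ETH monitor: link goes (?P<state>down|up)"
)


def link_outages(lines: List[str]) -> List[Tuple[str, float, float]]:
    """Pair each 'link goes down' with the next 'link goes up' per board.

    Returns a list of (board_ip, down_time_s, duration_s) tuples.
    """
    down_since: Dict[str, float] = {}
    outages: List[Tuple[str, float, float]] = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if match is None:
            continue  # not a link event
        t = int(match["s"]) + int(match["ms"]) / 1e3 + int(match["us"]) / 1e6
        ip = match["ip"]
        if match["state"] == "down":
            # Keep the earliest unmatched 'down' for this board.
            down_since.setdefault(ip, t)
        elif ip in down_since:
            start = down_since.pop(ip)
            outages.append((ip, start, t - start))
    return outages


log = [
    "[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4108s 545m 416u :  "
    "ETH monitor: link goes down  in port ETH output (P3/P12/J5). Application state is unknown.",
    "[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4110s 446m 416u :  "
    "ETH monitor: link goes up. in port ETH output (P3/P12/J5). Application state is unknown.",
]
for ip, start, duration in link_outages(log):
    print(f"{ip}: link down for {duration:.3f} s")  # prints 10.0.1.25: link down for 1.901 s
```

Under these assumptions, the pair of messages above corresponds to an outage of about 1.9 s, which falls within the reported range of 40 ms to a few seconds.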

we shall try to re-crimp the cable.

@S-Dafarra
Author

> we shall try to re-crimp the cable.

@AntonioAzocar tried to re-crimp the cable some days ago, but the failure happened again afterwards. He also mentioned that the connector on the board is not very firm.

Discussing with @AntonioConsilvio, we also thought that a good test could be to ping the AMC board (10.0.1.31) when the failure happens.
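
A ping test of this kind could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes a Linux iputils-style `ping` (the `-c` and `-W` options, and a summary line containing "% packet loss"). As noted elsewhere in this thread, a short link drop may not show up in a ping at all, so a clean result does not rule out a link problem.

```python
import re
import subprocess
from typing import Optional

# Summary format assumed from Linux iputils ping, e.g. "0% packet loss".
LOSS = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Extract the packet-loss percentage from a ping summary, if present."""
    match = LOSS.search(ping_output)
    return float(match.group(1)) if match else None


def ping_board(host: str = "10.0.1.31", count: int = 5) -> Optional[float]:
    """Ping `host` and return the reported packet loss in percent.

    Returns None if the summary line could not be found in the output.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", "1", host],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code just means packets were lost
    )
    return parse_packet_loss(result.stdout)
```

Calling `ping_board()` from the robot head right when the failure occurs would return the loss percentage for the AMC board at 10.0.1.31.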

cc @CarlottaSartore

@marcoaccame

> we shall try to re-crimp the cable.
>
> @AntonioAzocar tried to re-crimp the cable some days ago, but the failure happened again afterwards. He also mentioned that the connector on the board is not very firm.

The link must not go down and up. If it does, you lose all UDP frames beyond the line interruption. A link-down event is caused by an interruption in one of the four wires of the ETH cable. In this case, it may be due to the movement or the bending of the link.
The faulty element can be the cable or the crimp, but it may also be the connector on the board.

> Discussing with @AntonioConsilvio, we also thought that a good test could be to ping the AMC board (10.0.1.31) when the failure happens.
>
> cc @CarlottaSartore

The ping may say that all is OK even if the link goes down for a short time. Sometimes I see messages reporting that the link was down for just a small amount of time, and the ping may simply report OK with a round-trip time only a few ms longer. The HW check of the link status on the ETH boards, on the other hand, is done every 100 ms and tells whether we have link or not.

@AntonioConsilvio
Contributor

AntonioConsilvio commented Aug 9, 2024

Hi @S-Dafarra! With the help of @marcoaccame and @MSECode, we realised that the problem was with the power supply of the AMC EB31 board, which was rebooting occasionally.

Trying to find the cause, @fgarini noticed that something was touching the back of the board.

In fact, the thumb tendon was broken and touching the back of the board, occasionally shorting out the power supply:

IMG_20240808_170751

I replaced the tendon and, together with @AntonioAzocar, we applied Kapton to the back of the AMC to prevent it from short-circuiting again.

The robot has been tested and works fine now! ✅

Thank you all for your help! 🚀

@AntonioConsilvio
Contributor

Since the problem has not recurred, I am closing the issue! ✅

Projects
Status: Done

6 participants