Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yarplogger may not report all entries #2643

Open
sgiraz opened this issue Jul 6, 2021 · 8 comments · May be fixed by #2789 or #2667
Open

Yarplogger may not report all entries #2643

sgiraz opened this issue Jul 6, 2021 · 8 comments · May be fixed by #2789 or #2667
Labels

Comments

@sgiraz
Copy link
Contributor

sgiraz commented Jul 6, 2021

Bug description

I happened to spot an unexpected behavior concerning the tracing of operations such as yInfo, yWarning, yError etc. within the yarplogger.

If two operations are very close each other in time, I can observe that the yarplogger may not keep track of their invocations. In particular, the lower the time interval the higher the probability of a missed trace.

To Reproduce

In order to reproduce the behavior with the smallest code snippet, I crafted a yInfo-test repository where you can find a simple case study (fully documented).

The yInfo-test application contains a simple for-loop that performs 10k iterations with just one call to the yInfo function per cycle. Note that after each iteration a yarp::os::Time::delay is also called so we can vary the elapsed time between the yInfo calls.

The test scenario foresees the use of yarplogger without the yarprun broker, thus relying on the YARP_FORWARD_LOG_ENABLE=1 env var.

After some quick tests, it came out that there exists a threshold of 0.001s. Below this value (included), the number of missing yInfo logged by the yarplogger dramatically increases.

Please, notice that the for-loop is only aimed to replicate the behavior, whereas two very close logging operations may occur wherever in the user code.

Below, you can see some results comparing what was produced by the yarplogger with the expected results:

🔘 Click to see the result of Test 1
./check_log_file.sh build/yarprunlog_04_07_2021_23_46_12.log
Parsing log file...
Missing value 1707, previous was at [ Time delay ]: 0.001 Counter value: 1706
Done.
🔘 Click to see the result of Test 2
./check_log_file.sh build/yarprunlog_04_07_2021_23_41_24.log
Parsing log file...
Missing value 3224, previous was at [ Time delay ]: 0.0001 Counter value: 3223
Missing value 3512, previous was at [ Time delay ]: 0.0001 Counter value: 3511
Missing value 3513, previous was at
Missing value 3514, previous was at
Missing value 3515, previous was at
Missing value 8594, previous was at [ Time delay ]: 0.0001 Counter value: 8593
Missing value 8595, previous was at
Missing value 8596, previous was at
Done.
🔘 Click to see the result of Test 3
./check_log_file.sh build/yarprunlog_05_07_2021_00_18_19.log
Parsing log file...
Missing value 7, previous was at [ Time delay ]: 1e-05 Counter value:  6
Missing value 9, previous was at [ Time delay ]: 1e-05 Counter value:  8
Missing value 145, previous was at [ Time delay ]: 1e-05 Counter value:  144
Missing value 197, previous was at [ Time delay ]: 1e-05 Counter value:  196
Missing value 199, previous was at [ Time delay ]: 1e-05 Counter value:  198
Missing value 549, previous was at [ Time delay ]: 1e-05 Counter value:  548
Missing value 687, previous was at [ Time delay ]: 1e-05 Counter value:  686
Missing value 1163, previous was at [ Time delay ]: 1e-05 Counter value:  1162
Missing value 1230, previous was at [ Time delay ]: 1e-05 Counter value:  1229
Missing value 1335, previous was at [ Time delay ]: 1e-05 Counter value:  1334
Missing value 1659, previous was at [ Time delay ]: 1e-05 Counter value:  1658
Missing value 1891, previous was at [ Time delay ]: 1e-05 Counter value:  1890
Missing value 2071, previous was at [ Time delay ]: 1e-05 Counter value:  2070
Missing value 2206, previous was at [ Time delay ]: 1e-05 Counter value:  2205
Missing value 3038, previous was at [ Time delay ]: 1e-05 Counter value:  3037
Missing value 3306, previous was at [ Time delay ]: 1e-05 Counter value:  3305
Missing value 3523, previous was at [ Time delay ]: 1e-05 Counter value:  3522
Missing value 3648, previous was at [ Time delay ]: 1e-05 Counter value:  3647
Missing value 3880, previous was at [ Time delay ]: 1e-05 Counter value:  3879
Missing value 3904, previous was at [ Time delay ]: 1e-05 Counter value:  3903
Missing value 4372, previous was at [ Time delay ]: 1e-05 Counter value:  4371
Missing value 4558, previous was at [ Time delay ]: 1e-05 Counter value:  4557
Missing value 4559, previous was at
Missing value 4561, previous was at [ Time delay ]: 1e-05 Counter value:  4560
Missing value 4811, previous was at [ Time delay ]: 1e-05 Counter value:  4810
Missing value 4974, previous was at [ Time delay ]: 1e-05 Counter value:  4973
Missing value 5115, previous was at [ Time delay ]: 1e-05 Counter value:  5114
Missing value 5415, previous was at [ Time delay ]: 1e-05 Counter value:  5414
Missing value 5633, previous was at [ Time delay ]: 1e-05 Counter value:  5632
Missing value 5711, previous was at [ Time delay ]: 1e-05 Counter value:  5710
Missing value 5730, previous was at [ Time delay ]: 1e-05 Counter value:  5729
Missing value 5733, previous was at [ Time delay ]: 1e-05 Counter value:  5732
Missing value 5883, previous was at [ Time delay ]: 1e-05 Counter value:  5882
Missing value 6125, previous was at [ Time delay ]: 1e-05 Counter value:  6124
Missing value 6131, previous was at [ Time delay ]: 1e-05 Counter value:  6130
Missing value 6134, previous was at [ Time delay ]: 1e-05 Counter value:  6133
Missing value 6140, previous was at [ Time delay ]: 1e-05 Counter value:  6139
Missing value 6166, previous was at [ Time delay ]: 1e-05 Counter value:  6165
Missing value 6169, previous was at [ Time delay ]: 1e-05 Counter value:  6168
Missing value 6262, previous was at [ Time delay ]: 1e-05 Counter value:  6261
Missing value 6473, previous was at [ Time delay ]: 1e-05 Counter value:  6472
Missing value 6604, previous was at [ Time delay ]: 1e-05 Counter value:  6603
Missing value 6620, previous was at [ Time delay ]: 1e-05 Counter value:  6619
Missing value 6785, previous was at [ Time delay ]: 1e-05 Counter value:  6784
Missing value 6786, previous was at
Missing value 6787, previous was at
Missing value 6789, previous was at [ Time delay ]: 1e-05 Counter value:  6788
Missing value 6790, previous was at
Missing value 6791, previous was at
Missing value 6792, previous was at
Missing value 6793, previous was at
Missing value 6811, previous was at [ Time delay ]: 1e-05 Counter value:  6810
Missing value 6814, previous was at [ Time delay ]: 1e-05 Counter value:  6813
Missing value 7026, previous was at [ Time delay ]: 1e-05 Counter value:  7025
Missing value 7322, previous was at [ Time delay ]: 1e-05 Counter value:  7321
Missing value 7502, previous was at [ Time delay ]: 1e-05 Counter value:  7501
Missing value 7674, previous was at [ Time delay ]: 1e-05 Counter value:  7673
Missing value 7916, previous was at [ Time delay ]: 1e-05 Counter value:  7915
Missing value 8061, previous was at [ Time delay ]: 1e-05 Counter value:  8060
Missing value 8130, previous was at [ Time delay ]: 1e-05 Counter value:  8129
Missing value 8340, previous was at [ Time delay ]: 1e-05 Counter value:  8339
Missing value 8347, previous was at [ Time delay ]: 1e-05 Counter value:  8346
Missing value 8465, previous was at [ Time delay ]: 1e-05 Counter value:  8464
Missing value 8478, previous was at [ Time delay ]: 1e-05 Counter value:  8477
Missing value 8738, previous was at [ Time delay ]: 1e-05 Counter value:  8737
Missing value 8809, previous was at [ Time delay ]: 1e-05 Counter value:  8808
Missing value 8814, previous was at [ Time delay ]: 1e-05 Counter value:  8813
Missing value 8831, previous was at [ Time delay ]: 1e-05 Counter value:  8830
Missing value 8886, previous was at [ Time delay ]: 1e-05 Counter value:  8885
Missing value 9276, previous was at [ Time delay ]: 1e-05 Counter value:  9275
Missing value 9288, previous was at [ Time delay ]: 1e-05 Counter value:  9287
Missing value 9309, previous was at [ Time delay ]: 1e-05 Counter value:  9308
Missing value 9569, previous was at [ Time delay ]: 1e-05 Counter value:  9568
Missing value 9980, previous was at [ Time delay ]: 1e-05 Counter value:  9979
Done.

Expected behavior
When we call the 2 or more yInfo functions that are near in time, we expect to see their output as many times as they have been called. Unfortunately, if we compare the console output with that shown by the yarplogger, we can observe that in the latter case some calls are missing.

Screenshots

In the following screenshot we can see that yarplogger missed to capture the yInfo with counter value 298.

screen_compare

Configuration (please complete the following information):

  • OS: Windows 10 + WLS2
  • yarp version: 3.4.5 (installed with robotology-superbuild)
  • compiler: GNU Make 4.2.1

Additional context

While I was providing a solution for the following assignments:

I was trying to debug the application using the yInfo functions to inspect the coverage of my code. Putting the yInfo calls before and after some instructions or code blocks, and looking at yarplogger, it seemed that some code blocks were never executed. However, after I had tried to run the code using only the console, I realized that the yarplogger was not showing all the information as it should have done.

As suggested by @pattacini and @traversaro, we did the test also in the configuration where we employ the yarprun as a broker in place of the localhost.

In this case, the results seem better (less traces missed), but the problem persists as the delay time decreases. I also noticed the yWarning and yError seem more rugged and reliable than the yInfo; maybe, this is due to a possible priority mechanism, I suppose.

@drdanz drdanz added Component: GUI - yarplogger Component: Tool - yarplogger-console Issue Type: Bug Involves some intervention from a system administrator labels Jul 7, 2021
@drdanz
Copy link
Member

drdanz commented Jul 7, 2021

I suppose this might be caused by the fact that the logger is using a BufferedPort...

yarp::os::BufferedPort<yarp::os::Bottle> logger_port;

@drdanz
Copy link
Member

drdanz commented Jul 7, 2021

CC @randaz81

@PeterBowman
Copy link
Member

I suppose this might be caused by the fact that the logger is using a BufferedPort...

I was thinking the same, but this buffered port is configured as strict (on read):

log_updater->logger_port.setStrict();

So perhaps the forwarding port (on write) is to be blamed, not the yarplogger's one. The log forwarder enables background writes:

outputPort.enableBackgroundWrite(true);

According to the following inline doc, it does not wait for writes to complete immediately (false), hence I think some of them may be lost in the process:

bool m_waitAfterSend {true}; ///< should we wait for writes to complete immediately after we start them?

@pattacini
Copy link
Member

Any comment on this @randaz81 ?

drdanz added a commit to drdanz/yarp that referenced this issue Jul 27, 2021
drdanz added a commit to drdanz/yarp that referenced this issue Jul 27, 2021
When the `forward` method is called twice, the `Bottle::clear()` method
could be called before the message is sent, hence deleting the bottle
being sent.

This fixes the `LogForwarder` skipping messages when messages are sent
too quickly (Fixes robotology#2643).
@drdanz drdanz linked a pull request Jul 27, 2021 that will close this issue
@drdanz
Copy link
Member

drdanz commented Jul 27, 2021

I believe that @PeterBowman is right, the problem is probably this static bottle:

void yarp::os::impl::LogForwarder::forward(const std::string& message)
{
mutex.lock();
static Bottle b;
b.clear();
std::string port = "[" + outputPort.getName() + "]";
b.addString(port);
b.addString(message);
outputPort.write(b);
mutex.unlock();
}

When the forward method is called twice, the Bottle::clear() method could be called before the message is sent, therefore, if I understand correctly, the empty bottle contains no data and, when serialized as binary (not as string), nothing is sent (which is probably a different bug to be further investigated.

I opened #2667 which fixes the issue for me (at least according to @sgiraz very useful check_log_file.sh script). Does anyone else want to try it before I merge it?

drdanz added a commit to drdanz/yarp that referenced this issue Aug 2, 2021
When the `forward` method is called twice, the `Bottle::clear()` method
could be called before the message is sent, hence deleting the bottle
being sent.

This fixes the `LogForwarder` skipping messages when messages are sent
too quickly (Fixes robotology#2643).
@drdanz
Copy link
Member

drdanz commented Aug 2, 2021

if I understand correctly, the empty bottle contains no data and, when serialized as binary (not as string), nothing is sent (which is probably a different bug to be further investigated.

Just for reference, I was wrong on this, empty Bottles are sent correctly both in binary and text form

@sgiraz
Copy link
Contributor Author

sgiraz commented Oct 13, 2021

Hi guys!
I hope you are well 🙂

I'm reaching you because I would like to know if you have found any solution to this logging issue.
If yes, please, could provide a brief description about how you solved it?

Thanks! 🙏🏻

@drdanz
Copy link
Member

drdanz commented Oct 14, 2021

Nope, sorry.
You can try #2667 that fixes the issue, but blocks the sender thread when there are too many messages, and it is therefore dangerous, hence it cannot be merged in YARP (at least for now). I don't see a solution, unless we implement a "background strict" mode in the port, that makes the buffer bigger when there are messages in queue.

@drdanz drdanz added this to the YARP 3.5.2 milestone Nov 12, 2021
@randaz81 randaz81 linked a pull request Dec 28, 2021 that will close this issue
@drdanz drdanz modified the milestones: YARP 3.5.2, YARP 3.6.1 Dec 28, 2021
@randaz81 randaz81 removed this from the YARP 3.6.1 milestone Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants