Per-node vs. per-packet vs. per-whatever run data #4

Open
simonduq opened this issue May 12, 2018 · 8 comments
Labels: discussion

Comments

@simonduq (Member)

Currently, run data is just an array, with no defined semantics. But we will sometimes need per-node data (e.g., for power consumption), per-packet data (e.g., for latency), or per-time data (to plot timelines).

Should we have the metric itself mandate the per-what granularity of the measurements?

Should we instead leave it open and have the run provide anything it wants to, e.g., {per-node: xyz, per-packet: abc}?

Other ideas?
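
For illustration, the open-ended option could look something like the following sketch (all keys and values are hypothetical, not an agreed schema):

```yaml
# Hypothetical sketch of the "leave it open" option: the run reports
# whatever granularities it has, keyed by scope. Keys are illustrative.
run-data:
  per-node:
    energy:                     # e.g., total energy per node, in mJ
      node-1: 104.2
      node-2: 98.7
  per-packet:
    latency: [12.3, 15.1, 9.8]  # e.g., end-to-end latency, in ms
  per-time:
    power:                      # e.g., samples to plot a timeline
      interval: 1s
      values: [0.8, 1.1, 0.9]
```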

simonduq added the discussion label on May 12, 2018
@romain-jacob (Member) commented May 14, 2018

I think we definitely need some kind of structure here. Otherwise we are going to end up with an 'experiment data dump', which is not bad per se, but would somehow be an underachievement imho.
One should be able to search/sort through the results in the repository, which implies some level of constraints on the data being reported.

I would say that for a given profile, there should be a set of compulsory metrics to report, and how they should be computed. In addition, we can let users report additional metrics whenever relevant, but these should be formally defined in the _metrics folder.
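
A minimal sketch of what such a formal definition in the _metrics folder could look like (file name and field names are assumptions, not an agreed format):

```yaml
# _metrics/latency.yml -- hypothetical sketch; all field names are
# illustrative, not an agreed format.
name: latency
unit: ms
description: End-to-end delay from application-layer send to receive.
computation: >
  Timestamp difference between the send event at the source and the
  receive event at the destination, reported for each delivered packet.
compulsory: false   # becomes compulsory when a profile lists it in requires
```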

@simonduq (Member, Author)

> One should be able to search/sort through the results in the repository, which implies some level of constraints on the data being reported.

From a web hosting standpoint: currently this is only possible if you git clone and work on the source. The day we want to be able to do it on the website, we'll have to move away from GitHub Pages (which only serves statically generated content).

> I would say that for a given profile, there should be a set of compulsory metrics to report

Yes yes, as per the whitepaper (and this repo).

> and how they should be computed

So you're saying we also define, as part of the metric, whether we compute the distribution on a per-node or per-time-unit basis? Maybe a proposal: (1) keep the metric independent from the per-node or per-time notions, i.e., leave it as just what to measure, and (2) have profiles specify "per-node" or "per-time" for each metric in requires. I could even imagine that some profile wants to measure energy both per-node and per-time, for instance.
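
Under proposal (2), a profile's requires section could look something like this sketch (field names are assumptions, not an agreed format):

```yaml
# Hypothetical profile excerpt for proposal (2): the profile, not the
# metric, fixes the granularity. Field names are illustrative.
requires:
  - metric: energy
    per: [node, time]   # this profile wants both granularities
  - metric: latency
    per: [packet]
```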

@romain-jacob (Member)

> One should be able to search/sort through the results in the repository, which implies some level of constraints on the data being reported.
>
> From a web hosting standpoint: currently this is only possible if you git clone and work on the source. The day we want to be able to do it on the website, we'll have to move away from GitHub Pages (which only serves statically generated content).

Hmm, that's a shame. But it should not be a showstopper. This functionality of browsing through the results can be embedded directly in the IoTBench website (I think it would actually be a good thing). The GitHub Pages would remain useful to host the details about how the profiles are defined and how to contribute...

> and how they should be computed

> So you're saying we also define, as part of the metric, whether we compute the distribution on a per-node or per-time-unit basis? Maybe a proposal: (1) keep the metric independent from the per-node or per-time notions, i.e., leave it as just what to measure, and (2) have profiles specify "per-node" or "per-time" for each metric in requires. I could even imagine that some profile wants to measure energy both per-node and per-time, for instance.

Well... There are two different things, as you are saying: (1) the physical parameters being measured, and (2) the metrics you compute based on the collected data.
To me, the value lies in (2). My point was: (in my opinion) the profile needs to describe how to compute (2) based on (1). Furthermore, how (1) is acquired should be reported (if not defined by the profile, though that may become too constraining).

For example (one possible encoding is sketched after the list):
(1): power draw, measured for all nodes individually with a sampling interval of 0.1 µs (or smaller)
(2): (can be many)

  • mean energy consumption, averaged across all nodes for the entire test excluding bootstrapping
  • mean energy consumption, averaged across all nodes for the entire test including bootstrapping
  • max energy consumption, averaged across all nodes for the entire test after bootstrapping
  • etc...
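
One possible way to encode this (1)/(2) split declaratively, as a hedged sketch only (all field names are assumptions):

```yaml
# Hypothetical encoding of the example above. The measurements block
# captures (1), the metrics block captures (2). Names are illustrative.
measurements:
  power-draw:
    scope: per-node            # measured for all nodes individually
    sampling-interval: 0.1us   # or smaller
metrics:
  - name: mean-energy-excl-bootstrap
    input: power-draw
    statistic: mean            # averaged across all nodes
    window: exclude-bootstrapping
  - name: mean-energy-incl-bootstrap
    input: power-draw
    statistic: mean
    window: full-test          # including bootstrapping
  - name: max-energy-after-bootstrap
    input: power-draw
    statistic: max
    window: after-bootstrapping
```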

I feel like we agree, no?

@simonduq (Member, Author)

Yes we agree on the split between (1) what to measure and (2) how to process it into a metric.

Regarding the hosting thing: I believe the GitHub Pages will be sufficient until we get so much data that we need advanced search. Until then, we already have the ability to see all results for a given testbed, protocol, or setup. You can see all runs of a setup (https://iot-benchmark.github.io/setups/setup5) or compare different protocols in a given profile (https://iot-benchmark.github.io/profiles/somepattern-4s).

If we're successful and need more advanced features, we can move to the IoTBench website. But this is significantly more web-dev and maintenance work IMO (management of accounts, database, submission forms, etc.).

@romain-jacob (Member)

100% agree

@romain-jacob (Member)

Thinking about the new data schema: currently, we have in 'runs' the report of the computed metrics (observed and output). I think that makes sense in order to leave flexibility in how the data is extracted from the experiment.

But that also means we don't have access to the raw data... (i.e., the power draw). I feel like we are going around in circles here. I don't see how to progress. :-/

@simonduq (Member, Author)

But why not just also include the raw data in each run (or a link to the raw data)?

@romain-jacob (Member)

Yep, that might be an option to have as an optional field in the 'run' component...
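
For instance, a run entry could carry an optional pointer to the raw traces, along the lines of this sketch (field names and the URL are placeholders, not an agreed schema):

```yaml
# Hypothetical run excerpt: computed metrics as today, plus an
# optional link to the raw data. Field names and URL are placeholders.
run:
  metrics:
    energy:
      per-node: {node-1: 104.2, node-2: 98.7}   # mJ
    latency:
      per-packet: [12.3, 15.1, 9.8]             # ms
  raw-data: https://example.org/run-42/power-traces.tar.gz   # optional
```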
