Support for other dataset formats? #31

Open
SkalskiP opened this issue Jul 25, 2023 · 34 comments

Comments

@SkalskiP

Hi 👋🏻 Looks super interesting. Do you plan to expand to other dataset formats?

@AyushExel
Contributor

@SkalskiP Yes, I'm planning to add all data formats. Bandwidth is a bit of an issue, but I think I'll make some progress over the weekend. The plan is to cover formats in this order: YOLO -> COCO -> classification/image folders -> remote datasets (RF, DVC, etc.), depending on how much time I can dedicate.

@AyushExel
Contributor

AyushExel commented Jul 25, 2023

The other thing is to actually improve the UI. I think I've pushed Streamlit to the limit and cracks are starting to show, but I also don't want to start a frontend v2 from scratch using React :)
And finally, create an embeddings predictor from torchvision models so I can point the ultralytics dependency at the main branch and make a proper PyPI release. Right now I can't do that, as twine won't allow me to build a package that depends on a branch.

@SkalskiP
Author

Awesome @AyushExel! 🔥 Have you considered using supervision for loading datasets? We offer YOLO, COCO, and PASCAL VOC dataset loaders. You should be able to add functionality in one afternoon. I hope! 😅

Take a look at the docs if you think it could be helpful: https://roboflow.github.io/supervision/dataset/core/

@AyushExel
Contributor

Ohh, I didn't know supervision loads COCO too. Yeah, then I think it's a no-brainer to use that. The GUI stuff is almost ready. I'll take a look at supervision tonight.

@SkalskiP
Author

Awesome! Let me know how it goes! It would be so awesome to see supervision power up that feature.

@AyushExel
Contributor

@SkalskiP how about training? The whole idea is to let users get a trainable command for the new dataset they create within minutes, but ultralytics currently doesn't support training on COCO format. So does supervision efficiently convert the dataset to YOLO format?
I think there is another advantage to having an intermediate step where COCO/VOC are converted to YOLO internally (hidden from the user): YOLO format supports txt files that can combine images from various folders, which I think is not possible for COCO? (I might be wrong, I haven't used COCO format a lot, but all images do need to be under one folder, right?)
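
A minimal sketch of the txt-file idea mentioned above, assuming an Ultralytics-style setup where the `train:` entry of the dataset YAML may point at a text file with one image path per line. The names `write_image_list`, `IMAGE_SUFFIXES`, and `train.txt` are hypothetical and only for illustration; this is not code from the project.

```python
from pathlib import Path

# Assumption: these are the image types we care about; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_image_list(image_dirs, list_path):
    """Write one absolute image path per line, gathered from several folders."""
    paths = []
    for image_dir in image_dirs:
        # Sort within each folder so the output is stable between runs.
        for path in sorted(Path(image_dir).iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                paths.append(path.resolve())
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_image_list(["set_a/images", "set_b/images"], "train.txt")` would produce a single list file covering both folders, which is the kind of combined view the comment describes.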

@SkalskiP
Author

@AyushExel
With Supervision you can:

  1. Convert between formats. So you can start from COCO/VOC and convert into YOLO if you wish.
  2. Merge datasets. If the user can define multiple directories with YOLO or COCO format annotations / images, you can merge them into a single dataset if you want.
  3. Split data. Say someone gives you images and annotations for training and you want to get a train and a test set.
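
To make the first and third points concrete, here is a library-free sketch of what such a conversion and split involve. It assumes the usual conventions: a COCO box is `[x_min, y_min, width, height]` in pixels, and a YOLO box is `(x_center, y_center, width, height)` normalized by the image size. The function names `coco_box_to_yolo` and `split_items` are hypothetical; supervision's own implementation may differ.

```python
import random


def coco_box_to_yolo(box, image_width, image_height):
    """Convert a COCO pixel box [x_min, y_min, w, h] to a normalized YOLO box."""
    x_min, y_min, width, height = box
    return (
        (x_min + width / 2) / image_width,    # x_center, 0..1
        (y_min + height / 2) / image_height,  # y_center, 0..1
        width / image_width,                  # box width, 0..1
        height / image_height,                # box height, 0..1
    )


def split_items(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of items with a fixed seed and return (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

A fixed seed keeps the split reproducible, so the same images land in the same subset every time the dataset is regenerated.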

@hardikdava
Contributor

@AyushExel, we are also adding functionality to evaluate models, such as mAP and ConfusionMatrix. It would be needed to compare accuracy. Supervision supports this kind of benchmarking for all supported datasets. It might also be possible to add conversion of intermediate detection results.

Let me know if I can contribute in any way.

@AyushExel
Contributor

Okay got it @SkalskiP
@hardikdava Mmmm comparing accuracy is more of a modeling problem. I was thinking of just handing off the dataset to the user with the command to train, but not do any modeling in the app. That is to reduce the scope of this app. But I'm not sure if that's the correct approach. What do you think?

@hardikdava
Contributor

Got it @AyushExel. You can use the supervision.DetectionDataset API to load the dataset in the user's preferred format, convert it to YOLO format, run the intermediate steps, and train the model at the end. Loading and converting take two lines with supervision. If you want, I can prepare a draft PR for you.
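
A rough sketch of the load-and-convert step described here, assuming the `DetectionDataset.from_coco` and `as_yolo` methods documented in the supervision docs linked earlier in the thread. The wrapper name `coco_to_yolo` and the output layout under `yolo_root` are hypothetical, and parameter names may differ between supervision releases.

```python
def coco_to_yolo(coco_images_dir, coco_annotations_json, yolo_root):
    """Load a COCO detection dataset and write it back out in YOLO format."""
    # Imported here so this module loads even where supervision is not
    # installed; in application code a top-level import is more usual.
    import supervision as sv

    # Line 1 of the "two lines": load the COCO images and annotation file.
    dataset = sv.DetectionDataset.from_coco(
        images_directory_path=coco_images_dir,
        annotations_path=coco_annotations_json,
    )
    # Line 2: export images, label files, and a data.yaml in YOLO layout.
    dataset.as_yolo(
        images_directory_path=f"{yolo_root}/images",
        annotations_directory_path=f"{yolo_root}/labels",
        data_yaml_path=f"{yolo_root}/data.yaml",
    )
    return dataset
```

The resulting `data.yaml` is what a YOLO training command would then be pointed at, which matches the "hand off the dataset with the command to train" goal discussed above.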

@AyushExel
Contributor

Okay, thanks! Hold on for a day. I'll prepare a high-level API design doc or an issue here, and then we can proceed. I don't want you to waste any effort if we end up not including that.

@SkalskiP
Author

@AyushExel and @hardikdava I can help out as well

@AyushExel
Contributor

Didn't get a lot of time today but here's the immediate near-future roadmap - #38

I'll add more and also the API design. Feel free to add more suggestions in the comments there.

@SkalskiP
Author

Would you like us to take a look at it?

@AyushExel
Contributor

Sure

@SkalskiP
Author

Okay. It looks like supervision could support your whole data-loading pipeline. We have:

  • Detection: YOLO, COCO, and PASCAL VOC
  • Segmentation: YOLO and COCO (PASCAL coming with this release)
  • Classification: Directory Structure

What would you like us to contribute first?

@AyushExel
Contributor

There's some parts of the codebase that aren't neat. Would you like to get on call and scope the supervision support? I can show you around and then we can get started

@SkalskiP
Author

> There's some parts of the codebase that aren't neat.

Your codebase or ours?

And yes! Sure! Let’s meet and talk it through.

@AyushExel
Contributor

AyushExel commented Jul 31, 2023

Mine, of course.
Can you suggest a time tomorrow and provide an email?

@SkalskiP
Author

I'm in GMT+2. We could talk between 1:30 PM - 4:00 PM my time tomorrow. My email is [email protected].

@hardikdava
Contributor

@SkalskiP @AyushExel Sorry guys, I have a busy schedule tomorrow and won't be able to join. But @SkalskiP, please update me with a summary if possible. Thanks.

@SkalskiP
Author

@hardikdava will do!

@SkalskiP
Author

@AyushExel, can we make it 30 minutes earlier?

@AyushExel
Contributor

Done @SkalskiP

@SkalskiP
Author

@AyushExel, awesome! See you!

@AyushExel
Contributor

Hey guys! Any updates on this?

@SkalskiP
Author

SkalskiP commented Aug 9, 2023

@AyushExel sorry! I was busy with the latest supervision release. But I’m back tomorrow and can start tackling it.

@SkalskiP
Author

I met with @AyushExel last week, and we agreed on the initial plan. The plan will have several phases.

  1. Rewrite the dataset loading logic found in the Explorer class. Currently, it uses a lot of code from the Ultralytics package. We could replace it with sv.DetectionDataset.from_yolo.
  2. Over time, we will be able to expand the range of supported formats, adding COCO and PASCAL VOC.
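
One way the two phases could fit together is a single loading entry point that starts with YOLO and later grows to the other formats. The sketch below assumes supervision's `DetectionDataset.from_yolo`, `from_coco`, and `from_pascal_voc` loaders; the function `load_detection_dataset` and the format keys are hypothetical and do not reflect the actual Explorer class.

```python
SUPPORTED_FORMATS = ("yolo", "coco", "pascal_voc")


def load_detection_dataset(fmt, images_dir, annotations, data_yaml=None):
    """Load a detection dataset from one of the supported on-disk formats."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    if fmt == "yolo" and data_yaml is None:
        raise ValueError("YOLO datasets need a data.yaml path for class names")

    # Imported after validation so bad arguments fail fast, even where
    # supervision is not installed.
    import supervision as sv

    if fmt == "yolo":
        # Phase 1: replace the current Ultralytics-based loading logic.
        return sv.DetectionDataset.from_yolo(
            images_directory_path=images_dir,
            annotations_directory_path=annotations,
            data_yaml_path=data_yaml,
        )
    if fmt == "coco":
        # Phase 2: `annotations` is the path to the COCO JSON file.
        return sv.DetectionDataset.from_coco(
            images_directory_path=images_dir,
            annotations_path=annotations,
        )
    # Phase 2: `annotations` is the directory of PASCAL VOC XML files.
    return sv.DetectionDataset.from_pascal_voc(
        images_directory_path=images_dir,
        annotations_directory_path=annotations,
    )
```

Because every branch returns the same dataset type, the rest of the app would not need to know which format the user started from.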

@AyushExel is this idea still alive?

@AyushExel
Contributor

@SkalskiP @hardikdava Yes, of course! I've been busy with a few projects here and there, but most likely I'll be back on this one starting next week. I plan to add some features to load datasets/tables directly from S3 and GCS.

@SkalskiP
Author

@hardikdava will you have time to take a look at it?

@hardikdava
Contributor

@SkalskiP Yeah, I can definitely take a look.

@SkalskiP
Author

@hardikdava awesome! Apologies for slowing down the process. I can't wait to see supervision powering that functionality.

@AyushExel
Contributor

@hardikdava I see a notification from you about some error on GitHub mobile, but I'm not able to find it here for some reason. Can you please point me to it again? Thanks!

@hardikdava
Contributor

@AyushExel Thanks but I got it solved.
