Library | Description |
---|---|
OpenCV | Open Source Computer Vision Library |
Pillow | The friendly PIL fork (Python Imaging Library) |
scikit-image | collection of algorithms for image processing |
SciPy | open-source software for mathematics, science, and engineering |
mmcv | OpenMMLab foundational library for computer vision research |
imutils | A series of convenience functions for basic image processing operations |
pgmagick | Python-based wrapper for GraphicsMagick/ImageMagick |
Mahotas | library of fast computer vision algorithms (last updated: 2021) |
SimpleCV | The Open Source Framework for Machine Vision (last updated: 2015) |
Library | Description |
---|---|
PMT | Piotr's Computer Vision Matlab Toolbox |
matlabfns | MATLAB and Octave Functions for Computer Vision and Image Processing, P. Kovesi, University of Western Australia |
VLFeat | open-source library that implements popular computer vision algorithms, A. Vedaldi and B. Fulkerson |
MLV | Mid-level Vision Toolbox (MLVToolbox), BWLab, University of Toronto |
ElencoCode | Loris Nanni's CV functions, University of Padova |
- Performance - Classification
- Confusion Matrix: TP, FP, TN, and FN for each class
- For class balanced datasets:
- Accuracy : (TP+TN)/(TP+FP+TN+FN)
- ROC curve: TPR vs FPR
- For class imbalanced datasets:
- Precision (PR): TP/(TP+FP)
- Recall (RC): TP/(TP+FN)
- F1-Score: 2 × PR × RC / (PR + RC)
- Balanced accuracy: (TPR+TNR)/2
- Weighted-Averaged Precision, Recall, and F1-Score
- PR curve: PR vs RC
- Performance - Detection
- Intersection over Union (IoU)
- mAP: Average AP over all classes
- mAP@0.5: Uses IoU threshold 0.5 (PASCAL VOC)
- mAP@0.5:0.95: Averages AP over multiple IoU thresholds (COCO metric)
- False Positives Per Image (FPPI)
- Precision, Recall, and F1-Score
- Performance - Segmentation
- Intersection over Union (IoU) / Jaccard Index
- Dice Coefficient / F1-Score
- Mean Pixel Accuracy (mPA)
- Boundary IoU (BIoU)
- Hausdorff Distance
- Precision, Recall, and F1-Score
- Performance - Tracking
- Multiple Object Tracking Accuracy (MOTA)
- Multiple Object Tracking Precision (MOTP)
- ID F1-Score (IDF1)
- Identity Switches (IDSW)
- Track Completeness (TC)
- Mostly Tracked (MT) / Mostly Lost (ML)
- Performance - Perceptual Quality (Super-resolution, Denoising, Contrast Enhancement)
- Peak Signal-to-Noise Ratio (PSNR)
- Mean Squared Error (MSE)
- Structural Similarity Index (SSIM)
- Multi-Scale SSIM (MS-SSIM)
- Learned Perceptual Image Patch Similarity (LPIPS)
- Visual Information Fidelity (VIF)
- Kernel Inception Distance (KID)
- Gradient Magnitude Similarity Deviation (GMSD)
- Edge Preservation Index (EPI)
- Natural Image Quality Evaluator (NIQE)
- Performance - Generation (GANs, Diffusion Models)
- Inception Score (IS)
- Fréchet Inception Distance (FID)
- Perceptual Path Length (PPL)
- Computation
- Inference Time - Frames Per Second (FPS)
- Model Size
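
A few of the formulas listed above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`classification_metrics`, `box_iou`, `psnr`) are hypothetical and do not come from any library in this list, boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples, and PSNR assumes 8-bit images (peak value 255) unless told otherwise.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # TPR
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TNR
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def psnr(mse, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB, computed from a given MSE."""
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Imbalanced example: 13 positives, 87 negatives.
    print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(psnr(100.0))
```

On the imbalanced example, accuracy comes out noticeably higher than F1 and balanced accuracy, which is why the list separates metrics for balanced and imbalanced datasets. For real evaluations, an established metrics library is likely a better choice than hand-rolled functions.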
- CORE Rank A:
- ICCV: International Conference on Computer Vision (IEEE) [dblp]
- CVPR: Conference on Computer Vision and Pattern Recognition (IEEE) [dblp]
- ECCV: European Conference on Computer Vision (Springer) [dblp]
- WACV: Winter Conference/Workshop on Applications of Computer Vision (IEEE) [dblp]
- ICASSP: International Conference on Acoustics, Speech, and Signal Processing (IEEE) [dblp]
- MICCAI: Conference on Medical Image Computing and Computer Assisted Intervention (Springer) [dblp]
- IROS: International Conference on Intelligent Robots and Systems (IEEE) [dblp]
- ACMMM: ACM International Conference on Multimedia (ACM) [dblp]
- CORE Rank B
- ACCV: Asian Conference on Computer Vision (Springer) [dblp]
- VCIP: International Conference on Visual Communications and Image Processing (IEEE) [dblp]
- ICIP: International Conference on Image Processing (IEEE) [dblp]
- CAIP: International Conference on Computer Analysis of Images and Patterns (Springer) [dblp]
- VISAPP: International Conference on Vision Theory and Applications (SCITEPRESS) [dblp]
- ICPR: International Conference on Pattern Recognition (IEEE) [dblp]
- ACIVS: Conference on Advanced Concepts for Intelligent Vision Systems (Springer) [dblp]
- EUSIPCO: European Signal Processing Conference (IEEE) [dblp]
- ICRA: International Conference on Robotics and Automation (IEEE) [dblp]
- BMVC: British Machine Vision Conference (organized by BMVA: British Machine Vision Association and Society for Pattern Recognition) [dblp]
- CORE Rank C:
- Unranked but popular
- MIUA: Conference on Medical Image Understanding and Analysis (organized by BMVA: British Machine Vision Association and Society for Pattern Recognition) [dblp]
- EUVIP: European Workshop on Visual Information Processing (IEEE, organized by EURASIP: European Association for Signal Processing) [dblp]
- CIC: Color and Imaging Conference (organized by IS&T: Society for Imaging Science and Technology) [dblp]
- CVCS: Colour and Visual Computing Symposium [dblp]
- DSP: International Conference on Digital Signal Processing [dblp]
- Tier 1
- IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI) [dblp]
- IEEE Transactions on Image Processing (IEEE TIP) [dblp]
- IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT) [dblp]
- Springer International Journal of Computer Vision (Springer IJCV) [dblp]
- Elsevier Pattern Recognition (Elsevier PR) [dblp]
- Elsevier Computer Vision and Image Understanding (Elsevier CVIU) [dblp]
- Elsevier Expert Systems with Applications [dblp]
- Elsevier Neurocomputing [dblp]
- Springer Neural Computing and Applications [dblp]
- Tier 2
- Elsevier Image and Vision Computing (Elsevier IVC) [dblp]
- Elsevier Pattern Recognition Letters (Elsevier PR Letters) [dblp]
- Elsevier Journal of Visual Communication and Image Representation [dblp]
- Springer Journal of Mathematical Imaging and Vision [dblp]
- SPIE Journal of Electronic Imaging [dblp]
- IET Image Processing [dblp]
- Springer Pattern Analysis and Applications (Springer PAA) [dblp]
- Springer Machine Vision and Applications (Springer MVA) [dblp]
- IET Computer Vision [dblp]
- Open Access
- International Computer Vision Summer School (ICVSS) [2007-Present], Sicily, Italy [Website]
- Machine Intelligence and Visual Computing Summer School (VISUM) [2013-2022], Porto, Portugal [Website]
- BMVA British Computer Vision Summer School (CVSS) [2013-2020,2023-Present], UK [Website]
- Object Classification
- [LeNet-5, 1998] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
- [AlexNet, 2012] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012).
- [ZFNet, 2014] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13. Springer International Publishing, 2014.
- [VGG, 2014] Simonyan, Karen and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” CoRR abs/1409.1556 (2014): n. Pag.
- [GoogLeNet, 2015] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
- [ResNet, 2016] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
- [InceptionV3, 2016] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
- [Xception, 2017] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
- [EfficientNet, 2019] Tan, Mingxing, and Quoc Le. "Efficientnet: Rethinking model scaling for convolutional neural networks." International conference on machine learning. PMLR, 2019.
- [ViT, 2020] Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." International Conference on Learning Representations. 2020.
- [ConvNeXt, 2022] Liu, Zhuang et al. “A ConvNet for the 2020s.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 11966-11976.
- Object Classification - Lightweight
- [SqueezeNet, 2016] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360 (2016).
- [MobileNetV2, 2018] Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
- [ShuffleNetV2, 2018] Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European conference on computer vision (ECCV). 2018.
- [MobileNetV3, 2019] Howard, Andrew, et al. "Searching for mobilenetv3." Proceedings of the IEEE/CVF international conference on computer vision. 2019.
- [GhostNetV1, 2020] Han, Kai, et al. "Ghostnet: More features from cheap operations." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
- [MobileViT, 2021] Mehta, Sachin, and Mohammad Rastegari. "Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer." arXiv preprint arXiv:2110.02178 (2021).
- [GhostNetV2, 2022] Tang, Yehui, et al. "GhostNetv2: enhance cheap operation with long-range attention." Advances in Neural Information Processing Systems 35 (2022): 9969-9982.
- [ConvNeXt-Tiny, 2022] Liu, Zhuang et al. “A ConvNet for the 2020s.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 11966-11976.
- [MaxViT-Tiny, 2022] Tu, Zhengzhong, et al. "Maxvit: Multi-axis vision transformer." European conference on computer vision. Cham: Springer Nature Switzerland, 2022.
- [MobileFormer, 2022] Chen, Yinpeng, et al. "Mobile-former: Bridging mobilenet and transformer." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- [ConvNeXtV2-Tiny, 2023] Woo, Sanghyun, et al. "Convnext v2: Co-designing and scaling convnets with masked autoencoders." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Object Detection
- [Faster R-CNN, 2015] Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems 28 (2015).
- [SSD, 2016] Liu, Wei, et al. "Ssd: Single shot multibox detector." Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing, 2016.
- [RetinaNet, 2017] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
- [YOLOV3, 2018] Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
- [YOLOX, 2021] Ge, Zheng, et al. "Yolox: Exceeding yolo series in 2021." arXiv preprint arXiv:2107.08430 (2021).
- [YOLOR, 2021] Wang, Chien-Yao, I-Hau Yeh, and Hong-Yuan Mark Liao. "You only learn one representation: Unified network for multiple tasks." arXiv preprint arXiv:2105.04206 (2021).
- [YOLOV7, 2023] Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Object Segmentation - Semantic / Instance / Panoptic
- Classical: Graph Cut / Normalized Cut, Fuzzy Clustering, Mean-shift / Quick-shift, SLIC, Active Contours (Snakes), Region Growing, K-means Clustering, Watershed, Level Set Methods, Markov Random Fields (MRF), Edge (1st / 2nd derivatives) + filling.
- [U-Net, 2015] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer International Publishing, 2015.
- [DeepLabV3, 2017] Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
- [PSPNet, 2017] Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
- [Mask R-CNN, 2017] He, Kaiming, et al. "Mask r-cnn." Proceedings of the IEEE international conference on computer vision. 2017.
- [U-Net++, 2018] Zhou, Zongwei et al. “UNet++: A Nested U-Net Architecture for Medical Image Segmentation.” Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support : 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain, S... 11045 (2018): 3-11.
- [DeepLabV3+, 2018] Chen, Liang-Chieh et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” European Conference on Computer Vision (2018).
- [MaskFormer, 2021] Cheng, Bowen, Alex Schwing, and Alexander Kirillov. "Per-pixel classification is not all you need for semantic segmentation." Advances in Neural Information Processing Systems 34 (2021): 17864-17875.
- [SegFormer, 2021] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” Advances in neural information processing systems, vol. 34, pp. 12 077–12 090, 2021.
- [SAM, 2023] A. Kirillov, E. Mintun, N. Ravi, et al., “Segment anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
- [SEEM, 2023] Zou, Xueyan, et al. "Segment everything everywhere all at once." Advances in neural information processing systems 36 (2023): 19769-19782.
- Feature Matching
- {Local Features} [Superpoint, 2018] DeTone, Daniel, Tomasz Malisiewicz, and Andrew Rabinovich. "Superpoint: Self-supervised interest point detection and description." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2018.
- {Local Features} [D2-Net, 2019] Dusmanu, Mihai, et al. "D2-net: A trainable cnn for joint detection and description of local features." arXiv preprint arXiv:1905.03561 (2019).
- [R2D2, 2019] Revaud, Jerome, et al. "R2D2: repeatable and reliable detector and descriptor." arXiv preprint arXiv:1906.06195 (2019).
- {Detector-Based Matcher} [SuperGlue, 2020] Sarlin, Paul-Edouard, et al. "Superglue: Learning feature matching with graph neural networks." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
- {Detector-Free Matcher} [DRC-Net, 2020] Li, Xinghui, et al. "Dual-resolution correspondence networks." Advances in Neural Information Processing Systems 33 (2020): 17346-17357.
- {Local Features} [DISK, 2020] Tyszkiewicz, Michał, Pascal Fua, and Eduard Trulls. "DISK: Learning local features with policy gradient." Advances in Neural Information Processing Systems 33 (2020): 14254-14265.
- {Detector-Free Matcher} [LoFTR, 2021] Sun, Jiaming, et al. "LoFTR: Detector-free local feature matching with transformers." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
- {Detector-Free Matcher} [MatchFormer, 2022] Wang, Qing, et al. "Matchformer: Interleaving attention in transformers for feature matching." Proceedings of the Asian Conference on Computer Vision. 2022.
- {Detector-Based Matcher} [LightGlue, 2023] Lindenberger, Philipp, Paul-Edouard Sarlin, and Marc Pollefeys. "LightGlue: Local Feature Matching at Light Speed." arXiv preprint arXiv:2306.13643 (2023).
- {Detector-Based Matcher} [GlueStick, 2023] Pautrat, Rémi, et al. "Gluestick: Robust image matching by sticking points and lines together." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
- {Detector-Free Matcher} [OAMatcher, 2023] Dai, Kun, et al. "OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching." arXiv preprint arXiv:2302.05846 (2023).
- [RoMa, 2023] Edstedt, Johan, et al. "RoMa: Revisiting Robust Losses for Dense Feature Matching." arXiv preprint arXiv:2305.15404 (2023).
- [GIM, 2023] Shen, Xuelun, et al. "GIM: Learning Generalizable Image Matcher From Internet Videos." The Twelfth International Conference on Learning Representations. 2023.
- {Detector-Free Matcher} [DeepMatcher, 2024] Xie, Tao, et al. "Deepmatcher: a deep transformer-based network for robust and accurate local feature matching." Expert Systems with Applications 237 (2024): 121361.
- {Detector-Free Matcher} [XFeat, 2024] Potje, Guilherme, et al. "XFeat: Accelerated Features for Lightweight Image Matching." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024.
- Object Tracking
- [DeepSORT, 2017] Wojke, Nicolai, Alex Bewley, and Dietrich Paulus. "Simple online and realtime tracking with a deep association metric." 2017 IEEE international conference on image processing (ICIP). IEEE, 2017.
- [Tracktor, 2019] Bergmann, Philipp, Tim Meinhardt, and Laura Leal-Taixe. "Tracking without bells and whistles." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
- [FairMOT, 2021] Zhang, Yifu, et al. "Fairmot: On the fairness of detection and re-identification in multiple object tracking." International Journal of Computer Vision 129 (2021): 3069-3087.
- [STARK, 2021] Yan, Bin, et al. "Learning spatio-temporal transformer for visual tracking." Proceedings of the IEEE/CVF international conference on computer vision. 2021.
- [MixFormer, 2022] Cui, Yutao, et al. "Mixformer: End-to-end tracking with iterative mixed attention." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- [ByteTrack, 2022] Zhang, Yifu, et al. "Bytetrack: Multi-object tracking by associating every detection box." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
- Image Generation
- [DCGAN, 2015] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
- [BigGAN, 2018] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).
- [StyleGANv3, 2021] Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021): 852-863.
- [DALL-E, 2021] Ramesh, Aditya, et al. "Zero-shot text-to-image generation." International conference on machine learning. PMLR, 2021.
- [LAFITE, 2021] Zhou, Y., et al. "Lafite: Towards language-free training for text-to-image generation." arXiv preprint arXiv:2111.13792 (2021).
- [CLIP, 2021] Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International conference on machine learning. PMLR, 2021.
- [Imagen, 2022] Saharia, Chitwan, et al. "Photorealistic text-to-image diffusion models with deep language understanding." Advances in neural information processing systems 35 (2022): 36479-36494.
- [GLIDE, 2022] Nichol, Alexander Quinn, et al. "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models." International Conference on Machine Learning. PMLR, 2022.
- [unCLIP, 2022] Ramesh, Aditya, et al. "Hierarchical Text-Conditional Image Generation with CLIP Latents." arXiv preprint arXiv:2204.06125 (2022).
- [LDM / Stable Diffusion (SD), 2022] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
- [DALL-E 2, 2022] Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
- [DALL-E 3, 2023] Betker, James, et al. "Improving image generation with better captions." Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf 2.3 (2023): 8.
- [SDXL, 2023] Podell, Dustin, et al. "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis." The Twelfth International Conference on Learning Representations. 2023.
- Image Retrieval
- [LSMH, 2016] Lu, Xiaoqiang, Xiangtao Zheng, and Xuelong Li. "Latent semantic minimal hashing for image retrieval." IEEE Transactions on Image Processing 26.1 (2016): 355-368.
- [R–GeM, 2018] Radenović, Filip, Giorgos Tolias, and Ondřej Chum. "Fine-tuning CNN image retrieval with no human annotation." IEEE transactions on pattern analysis and machine intelligence 41.7 (2018): 1655-1668.
- [HOW, 2020] Tolias, Giorgos, Tomas Jenicek, and Ondřej Chum. "Learning and aggregating deep local descriptors for instance-level recognition." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer International Publishing, 2020.
- [FIRe, 2021] Weinzaepfel, Philippe, et al. "Learning Super-Features for Image Retrieval." International Conference on Learning Representations. 2021.
- [Token, 2022] Wu, Hui, et al. "Learning token-based representation for image retrieval." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 3. 2022.
- WIP:
- Explainable AI (XAI)
- Video Summarization and Captioning
- Text Recognition
- Data Compression
- Affective Computing
- Image Colorization
- Virtual reality (VR)
- Augmented reality (AR)
- Visual Question Answering (VQA)
- Vision-Language Models (VLMs)
- DeepFake Detection
- 3D Reconstruction
- Image Captioning
- Image Super-Resolution / Image Restoration
- Pose Estimation
- Biometric Analysis
- Depth Estimation
- Meta Learning
- Semi-Supervised Learning - Zero/One/Few shot
Book | Links |
---|---|
Antonio Torralba, Phillip Isola, William T. Freeman. “Foundations of Computer Vision” MIT Press, (2024). | goodreads |
Nixon, Mark, and Alberto Aguado. “Feature extraction and image processing for computer vision” Academic press, (2019). | goodreads |
González, Rafael Corsino and Richard E. Woods. “Digital image processing, 4th Edition” (2018). | goodreads |
E.R. Davies. “Computer Vision: Principles, Algorithms, Applications, Learning” Academic press, (2017). | goodreads |
Prince, Simon. “Computer Vision: Models, Learning, and Inference” (2012). | goodreads |
Forsyth, David Alexander and Jean Ponce. “Computer Vision - A Modern Approach, Second Edition” (2011). | goodreads |
Szeliski, Richard. “Computer Vision - Algorithms and Applications” Texts in Computer Science (2010). | goodreads |
Bishop, Christopher M. “Pattern recognition and machine learning, 5th Edition” Information science and statistics (2007). | goodreads |
Hartley, Richard and Andrew Zisserman. “Multiple view geometry in computer vision (2. ed.)” (2003). | goodreads |
Stockman, George C. and Linda G. Shapiro. “Computer Vision” (2001). | goodreads |
Course | Year | Instructor | Source |
---|---|---|---|
Introduction to Computer Vision | 2025 | James Tompkin | Brown |
Deep Learning for Computer Vision | 2024 | Fei-Fei Li | Stanford |
Advances in Computer Vision | 2023 | William T. Freeman | MIT |
OpenCV for Python Developers | 2023 | Patrick Crawford | LinkedIn Learning |
Computer Vision | 2021 | Andreas Geiger | University of Tübingen |
Computer Vision | 2021 | Yogesh S Rawat / Mubarak Shah | University of Central Florida |
Advanced Computer Vision | 2021 | Mubarak Shah | University of Central Florida |
Deep Learning for Computer Vision | 2020 | Justin Johnson | University of Michigan |
Advanced Deep Learning for Computer Vision | 2020 | Laura Leal-Taixé / Matthias Niessner | Technical University of Munich |
Introduction to Digital Image Processing | 2020 | Ahmadreza Baghaie | New York Institute of Technology |
Quantitative Imaging | 2019 | Kevin Mader | ETH Zurich |
Convolutional Neural Networks for Visual Recognition | 2017 | Fei-Fei Li | Stanford University |
Introduction to Digital Image Processing | 2015 | Rich Radke | Rensselaer Polytechnic Institute |
Machine Learning for Robotics and Computer Vision | 2014 | Rudolph Triebel | Technical University of Munich |
Multiple View Geometry | 2013 | Daniel Cremers | Technical University of Munich |
Variational Methods for Computer Vision | 2013 | Daniel Cremers | Technical University of Munich |
Computer Vision | 2012 | Mubarak Shah | University of Central Florida |
Image and video processing | - | Guillermo Sapiro | Duke University |
Introduction to Computer Vision | - | Aaron Bobick / Irfan Essa | Udacity |
- Tags: Object Classification [ObjCls], Object Detection [ObjDet], Object Segmentation [ObjSeg], General Library [GenLib], Text Reading / Optical Character Recognition [OCR], Action Recognition [ActRec], Object Tracking [ObjTrk], Data Augmentation [DatAug], Simultaneous Localization and Mapping [SLAM], Outlier/Anomaly/Novelty Detection [NvlDet], Content-based Image Retrieval [CBIR], Image Enhancement [ImgEnh], Aesthetic Assessment [AesAss], Explainable Artificial Intelligence [XAI], Text-to-Image Generation [TexImg], Pose Estimation [PosEst], Video Matting [VidMat], Eye Tracking [EyeTrk]
Repo | Tags | Description |
---|---|---|
computervision-recipes | [GenLib] | Microsoft, Best Practices, code samples, and documentation for Computer Vision |
FastAI | [GenLib] | FastAI, Library over PyTorch used for learning and practicing machine learning and deep learning |
pytorch-lightning | [GenLib] | PyTorchLightning, Lightweight PyTorch wrapper for high-performance AI research |
ignite | [GenLib] | PyTorch, High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently |
pytorch_geometric | [GenLib] | Graph Neural Network Library for PyTorch |
kornia | [GenLib] | Open Source Differentiable Computer Vision Library |
ncnn | [GenLib] | Tencent, High-performance neural network inference framework optimized for the mobile platform |
MediaPipe | [ObjDet] [ObjSeg] [ObjTrk] [GenLib] | Google, iOS - Android - C++ - Python - Coral, Face Detection - Face Mesh - Iris - Hands - Pose - Holistic - Hair Segmentation - Object Detection - Box Tracking - Instant Motion Tracking - Objectron - KNIFT (Similar to SIFT) |
PyTorch image models | [ObjCls] | rwightman, PyTorch image classification models, scripts, pretrained weights |
mmclassification | [ObjCls] | OpenMMLab, Image Classification Toolbox and Benchmark |
vit-pytorch | [ObjCls] | SOTA for vision transformers |
face_classification | [ObjCls] [ObjDet] | Real-time face detection and emotion/gender classification |
mmdetection | [ObjDet] | OpenMMLab, Image Detection Toolbox and Benchmark |
detectron2 | [ObjDet] [ObjSeg] | Facebook, FAIR's next-generation platform for object detection, segmentation and other visual recognition tasks |
detr | [ObjDet] | Facebook, End-to-End Object Detection with Transformers |
libfacedetection | [ObjDet] | An open source library for face detection in images, speed: ~1000FPS |
FaceDetection-DSFD | [ObjDet] | Tencent, SOTA face detector |
object-Detection-Metrics | [ObjDet] | Most popular metrics used to evaluate object detection algorithms |
SAHI | [ObjDet] [ObjSeg] | A lightweight vision library for performing large-scale object detection / instance segmentation |
yolov5 | [ObjDet] | ultralytics |
AlexeyAB/darknet, pjreddie/darknet | [ObjDet] | YOLOv4 / Scaled-YOLOv4 / YOLOv3 / YOLOv2 |
U-2-Net | [ObjDet] | U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection |
segmentation_models.pytorch | [ObjSeg] | qubvel, PyTorch segmentation models with pretrained backbones |
mmsegmentation | [ObjSeg] | OpenMMLab, Semantic Segmentation Toolbox and Benchmark |
mmocr | [OCR] | OpenMMLab, Text Detection, Recognition and Understanding Toolbox |
pytesseract | [OCR] | A Python wrapper for Google Tesseract |
EasyOCR | [OCR] | Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc. |
PaddleOCR | [OCR] | Practical ultra-lightweight OCR system; supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices |
PaddleSeg | [ObjSeg] | Easy-to-use image segmentation library with awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc. |
mmtracking | [ObjTrk] | OpenMMLab, Video Perception Toolbox for object detection and tracking |
mmaction | [ActRec] | OpenMMLab, An open-source toolbox for action understanding based on PyTorch |
albumentations | [DatAug] | Fast image augmentation library and an easy-to-use wrapper around other libraries |
ORB_SLAM2 | [SLAM] | Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities |
pyod | [NvlDet] | Python Toolbox for Scalable Outlier Detection (Anomaly Detection) |
imagededup | [CBIR] | Image retrieval, CBIR, Find duplicate images made easy! |
image-match | [CBIR] | Image retrieval, CBIR, Quickly search over billions of images |
Bringing-Old-Photos-Back-to-Life | [ImgEnh] | Microsoft, Bringing Old Photo Back to Life (CVPR 2020 oral) |
image-quality-assessment | [AesAss] | Idealo, Image Aesthetic, NIMA model to predict the aesthetic and technical quality of images |
aesthetics | [AesAss] | Image Aesthetics Toolkit using Fisher Vectors |
pytorch-cnn-visualizations | [XAI] | PyTorch implementation of convolutional neural network visualization techniques |
DALLE2-pytorch | [TexImg] | Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch |
imagen-pytorch | [TexImg] | Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch |
openpose | [PosEst] | OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation |
RobustVideoMatting | [VidMat] | Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML! |
fastdup | [NvlDet] [CBIR] | An unsupervised and free tool for image and video dataset analysis |
Random-Erasing | [DatAug] | Random Erasing Data Augmentation in PyTorch |
CutMix-PyTorch | [DatAug] | Official PyTorch implementation of CutMix regularizer |
keras-cv | [GenLib] | Library of modular computer vision oriented Keras components |
PsychoPy | [EyeTrk] | Library for running psychology and neuroscience experiments |
alibi-detect | [NvlDet] | Algorithms for outlier, adversarial and drift detection |
Captum | [XAI] | built by PyTorch team, Model interpretability and understanding for PyTorch |
Alibi | [XAI] | Algorithms for explaining machine learning models |
iNNvestigate | [XAI] | for TF, A toolbox to iNNvestigate neural networks' predictions |
keras-vis | [XAI] | for Keras, Neural network visualization toolkit |
Keract | [XAI] | for Keras, Layers Outputs and Gradients |
pytorch-grad-cam | [XAI] | for PyTorch, Advanced AI Explainability for computer vision |
SHAP | [XAI] | A game theoretic approach to explain the output of any machine learning model |
TensorWatch | [XAI] | built by Microsoft, Debugging, monitoring and visualization for Python Machine Learning and Data Science |
WeightWatcher | [XAI] | an open-source diagnostic tool for analyzing Deep Neural Networks (DNN), without needing access to training or even test data |
- PyTorch - CV Datasets, Meta
- TensorFlow - CV Datasets, Google
- CVonline: Image Databases, Edinburgh University, Thanks to Robert Fisher!
- Kaggle
- Papers with Code, Meta
- RoboFlow
- VisualData
- CUHK Computer Vision
- VGG - University of Oxford
- labelme, Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
- CVAT, an interactive video and image annotation tool for computer vision.
- VoTT, Microsoft, Visual Object Tagging Tool: an Electron app for building end-to-end object detection models from images and videos.
- labelImg, graphical image annotation tool for labeling object bounding boxes in images.
- VIA, VGG Oxford, HTML-based standalone manual annotation software for image, audio and video.
- FiftyOne, open-source tool for building high-quality datasets and computer vision models.
- makesense.ai, a free-to-use online tool for labeling photos.
- @AurelienGeron [Individual], Aurélien Géron: former lead of YouTube's video classification team and author of the O'Reilly book Hands-On Machine Learning with Scikit-Learn and TensorFlow.
- @howardjeremyp [Individual], Jeremy Howard: former president and chief scientist of Kaggle, and co-founder of fast.ai.
- @PieterAbbeel [Individual], Pieter Abbeel: professor of electrical engineering and computer sciences, University of California, Berkeley.
- @pascalpoupart3507 [Individual], Pascal Poupart: professor in the David R. Cheriton School of Computer Science at the University of Waterloo.
- @MatthiasNiessner [Individual], Matthias Niessner: professor at the Technical University of Munich and head of the Visual Computing Lab.
- @MichaelBronsteinGDL [Individual], Michael Bronstein: DeepMind Professor of AI, University of Oxford / Head of Graph Learning Research, Twitter.
- @DeepFindr [Individual], videos about all kinds of Machine Learning / Data Science topics.
- @deeplizard [Individual], videos about building collective intelligence.
- @YannicKilcher [Individual], Yannic Kilcher: makes videos about machine learning research papers, programming, issues of the AI community, and the broader impact of AI in society.
- @sentdex [Individual], sentdex: provides Python programming tutorials in machine learning, finance, data analysis, robotics, web development, game development and more.
- @bmvabritishmachinevisionas8529 [Conferences], BMVA: British Machine Vision Association.
- @ComputerVisionFoundation [Conferences], Computer Vision Foundation (CVF): co-sponsored conferences on computer vision (e.g. CVPR and ICCV).
- @cvprtum [University], Computer Vision Group at Technical University of Munich.
- @UCFCRCV [University], Center for Research in Computer Vision at University of Central Florida.
- @dynamicvisionandlearninggr1022 [University], Dynamic Vision and Learning research group channel, Technical University of Munich.
- @TubingenML [University], Machine Learning groups at the University of Tübingen.
- @computervisiontalks4659 [Talks], Computer Vision Talks.
- @freecodecamp [Talks], videos to learn how to code.
- @LondonMachineLearningMeetup [Talks], largest machine learning community in Europe.
- @LesHouches-iu6nv [Talks], summer school on Statistical Physics of Machine Learning held in Les Houches, July 4 - 29, 2022.
- @MachineLearningStreetTalk [Talks], top AI podcast on Spotify.
- @WeightsBiases [Talks], Weights and Biases team's conversations with industry experts and researchers.
- @PreserveKnowledge [Talks], Canadian higher-education media organization that focuses on advances in mathematics, computer science, and artificial intelligence.
- @TwoMinutePapers [Papers], Two Minute Papers: explaining AI papers in a few minutes.
- @TheAIEpiphany [Papers], Aleksa Gordić: ex-Google DeepMind, ex-Microsoft engineer explaining AI papers.
- @bycloudAI [Papers], bycloud: covers the latest AI tech/research papers for fun.
- WIP:
- https://www.youtube.com/@AAmini
- https://www.youtube.com/@WhatsAI
- https://www.youtube.com/@mrdbourke
- https://www.youtube.com/@marksaroufim
- https://www.youtube.com/@NicholasRenotte
- https://www.youtube.com/@abhishekkrthakur
- https://www.youtube.com/@AladdinPersson
- https://www.youtube.com/@CodeEmporium
- https://www.youtube.com/@arp_ai
- https://www.youtube.com/@CodeThisCodeThat
- https://www.youtube.com/@connorshorten6311
- https://www.youtube.com/@SmithaKolan
- https://www.youtube.com/@AICoffeeBreak
- https://www.youtube.com/@independentcode
- https://www.youtube.com/@alfcnz
- https://www.youtube.com/@KapilSachdeva
- https://www.youtube.com/@AICoding
- https://www.youtube.com/@mildlyoverfitted
- Vision Science, announcements about industry/academic jobs in computer vision around the world (in English).
- bull-i3, posts about job opportunities in computer vision in France (in French).
- How to build a good poster - [Link1] [Link2] [Link3]
- How to write a good report - [Link1] [Link2]
- The "Python Machine Learning (3rd edition)" book code repository
- Multithreading with OpenCV-Python to improve video processing performance
- Computer Vision Zone - Videos and implementations for computer vision projects
- MadeWithML, Learn how to responsibly deliver value with ML
- d2l-en, Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities
- Writing Pet Peeves, writing guide for correctness, references, and style
- Hitchhiker's Guide to Python, Python best practices guidebook, written for humans
- python-fire, Google, a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
- shotcut, a free, open source, cross-platform video editor.
- PyTorch Computer Vision Cookbook, published by Packt.
- Machine Learning Mastery - Blogs, Blogs written by Jason Brownlee about machine learning.
- PyImageSearch - Blogs, Blogs written by Adrian Rosebrock about computer vision.
- jetson-inference, guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
- Frida de Sigley
- Dan Harvey
- CORE Conference Ranking
- Scimago Journal Ranking
- benthecoder/yt-channels-DS-AI-ML-CS
- anomaly-detection-resources, Anomaly detection related books, papers, videos, and toolboxes
- awesome-satellite-imagery-datasets, List of satellite image training datasets with annotations for computer vision and deep learning
- awesome-Face_Recognition, Computer vision papers about faces.
- the-incredible-pytorch, Curated list of tutorials, papers, projects, communities and more relating to PyTorch