Curated educational list for computer vision

mawady/awesome-cv


Python Libraries

| Library | Description |
| --- | --- |
| OpenCV | Open Source Computer Vision Library |
| Pillow | The friendly PIL fork (Python Imaging Library) |
| scikit-image | Collection of algorithms for image processing |
| SciPy | Open-source software for mathematics, science, and engineering |
| mmcv | OpenMMLab foundational library for computer vision research |
| imutils | A series of convenience functions for basic image processing operations |
| pgmagick | Python-based wrapper for GraphicsMagick/ImageMagick |
| Mahotas | Library of fast computer vision algorithms (last updated: 2021) |
| SimpleCV | The Open Source Framework for Machine Vision (last updated: 2015) |

MATLAB Libraries

| Library | Description |
| --- | --- |
| PMT | Piotr's Computer Vision Matlab Toolbox |
| matlabfns | MATLAB and Octave Functions for Computer Vision and Image Processing, P. Kovesi, University of Western Australia |
| VLFeat | Open-source library implementing popular computer vision algorithms, A. Vedaldi and B. Fulkerson |
| MLV | Mid-level Vision Toolbox (MLVToolbox), BWLab, University of Toronto |
| ElencoCode | Loris Nanni's CV functions, University of Padova |

Evaluation Metrics

  • Performance - Classification
    • Confusion Matrix: TP, FP, TN, and FN for each class
    • For class balanced datasets:
      • Accuracy : (TP+TN)/(TP+FP+TN+FN)
      • ROC curve: TPR vs FPR
    • For class imbalanced datasets:
      • Precision (PR): TP/(TP+FP)
      • Recall (RC): TP/(TP+FN)
      • F1-Score: 2×PR×RC/(PR+RC)
      • Balanced accuracy: (TPR+TNR)/2
      • Weighted-Averaged Precision, Recall, and F1-Score
      • PR curve: PR vs RC
  • Performance - Detection
    • Intersection over Union (IoU)
    • mAP: Average AP over all classes
    • mAP@0.5: Uses IoU threshold 0.5 (PASCAL VOC)
    • mAP@0.5:0.95: Averages AP over multiple IoU thresholds (COCO metric)
    • False Positives Per Image (FPPI)
    • Precision, Recall, and F1-Score
  • Performance - Segmentation
    • Intersection over Union (IoU) / Jaccard Index
    • Dice Coefficient / F1-Score
    • Mean Pixel Accuracy (mPA)
    • Boundary IoU (BIoU)
    • Hausdorff Distance
    • Precision, Recall, and F1-Score
  • Performance - Tracking
    • Multiple Object Tracking Accuracy (MOTA)
    • Multiple Object Tracking Precision (MOTP)
    • ID F1-Score (IDF1)
    • Identity Switches (IDSW)
    • Track Completeness (TC)
    • Mostly Tracked (MT) / Mostly Lost (ML)
  • Performance - Perceptual Quality (Super-resolution, Denoising, Contrast Enhancement)
    • Peak Signal-to-Noise Ratio (PSNR)
    • Mean Squared Error (MSE)
    • Structural Similarity Index (SSIM)
    • Multi-Scale SSIM (MS-SSIM)
    • Learned Perceptual Image Patch Similarity (LPIPS)
    • Visual Information Fidelity (VIF)
    • Kernel Inception Distance (KID)
    • Gradient Magnitude Similarity Deviation (GMSD)
    • Edge Preservation Index (EPI)
    • Natural Image Quality Evaluator (NIQE)
  • Performance - Generation (GANs, Diffusion Models)
    • Inception Score (IS)
    • Fréchet Inception Distance (FID)
    • Perceptual Path Length (PPL)
  • Computation
    • Inference Time - Frames Per Second (FPS)
    • Model Size
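
The classification formulas listed above can be sketched in a few lines of plain Python. This is a minimal illustration for the binary case only; the function name `classification_metrics` and the zero-division convention (returning 0.0) are assumptions made for this example, not part of any library.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the listed metrics from binary confusion-matrix counts."""

    def safe_div(num, den):
        # Assumed convention: return 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # same quantity as the true positive rate (TPR)
    tnr = safe_div(tn, tn + fp)     # true negative rate (TNR)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + tnr) / 2,
    }


# Imbalanced toy example: 10 positive and 90 negative samples.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this toy example accuracy (0.96) is noticeably higher than balanced accuracy (about 0.89), which is why the list separates metrics for balanced and imbalanced datasets.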
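
For the detection metrics, IoU between two axis-aligned boxes is the building block behind mAP@0.5 and mAP@0.5:0.95. The sketch below assumes boxes in `(x1, y1, x2, y2)` corner format; `box_iou` and `is_true_positive` are hypothetical helper names for illustration, and a full mAP computation (ranking by confidence, per-class AP) is not shown.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match when IoU reaches the threshold."""
    return box_iou(pred_box, gt_box) >= threshold


prediction = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
iou = box_iou(prediction, ground_truth)
print(f"IoU = {iou:.3f}, match at 0.5: {is_true_positive(prediction, ground_truth)}")
```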
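
For the segmentation metrics, IoU (Jaccard) and Dice can both be computed from the overlap of two binary masks. The following is a minimal sketch that represents masks as nested lists of 0/1 values; the function name `mask_iou_and_dice` and the choice to return 1.0 when both masks are empty are assumptions for this example.

```python
def mask_iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally sized binary masks (2D lists of 0/1)."""
    inter = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p * t      # pixel is foreground in both masks
            pred_area += p
            target_area += t
    union = pred_area + target_area - inter
    # Assumed convention: two empty masks agree perfectly.
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred_area + target_area) if (pred_area + target_area) else 1.0
    return iou, dice


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
target_mask = [[1, 0, 0],
               [0, 1, 1]]
iou, dice = mask_iou_and_dice(pred_mask, target_mask)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

The two scores are monotonically related (Dice = 2×IoU/(1+IoU)), so they rank predictions on a single image identically, although averages over a dataset can differ.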
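
For the tracking metrics, MOTA combines misses, false positives, and identity switches into one score: MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames. The sketch below assumes the per-frame counts are already available (matching predictions to ground truth is not shown); the `mota` function and the dictionary keys are hypothetical names for illustration.

```python
def mota(frames):
    """MOTA from a list of per-frame counts.

    Each frame is a dict with keys: 'fn' (misses), 'fp' (false positives),
    'idsw' (identity switches), and 'gt' (number of ground-truth objects).
    """
    errors = sum(f["fn"] + f["fp"] + f["idsw"] for f in frames)
    total_gt = sum(f["gt"] for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - errors / total_gt


frames = [
    {"fn": 1, "fp": 0, "idsw": 0, "gt": 5},
    {"fn": 0, "fp": 1, "idsw": 1, "gt": 5},
]
score = mota(frames)
print(f"MOTA = {score:.2f}")
```

Because the error count can exceed the number of ground-truth objects, MOTA can be negative.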
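
For the perceptual quality metrics, MSE and PSNR are simple enough to compute directly: PSNR = 10×log10(MAX²/MSE), where MAX is the largest possible pixel value. This is a minimal sketch for single-channel images stored as nested lists; the 8-bit default `max_value=255.0` is an assumption, and the structural metrics (SSIM, LPIPS, and so on) need dedicated libraries.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    ref_pixels = [v for row in reference for v in row]
    dist_pixels = [v for row in distorted for v in row]
    if len(ref_pixels) != len(dist_pixels) or not ref_pixels:
        raise ValueError("images must be non-empty and the same size")
    return sum((r - d) ** 2 for r, d in zip(ref_pixels, dist_pixels)) / len(ref_pixels)


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


reference = [[0, 255], [128, 64]]
distorted = [[10, 245], [128, 64]]
err = mse(reference, distorted)
quality = psnr(reference, distorted)
print(f"MSE = {err:.1f}, PSNR = {quality:.2f} dB")
```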

Conferences

  • CORE Rank A:
    • ICCV: International Conference on Computer Vision (IEEE) [dblp]
    • CVPR: Conference on Computer Vision and Pattern Recognition (IEEE) [dblp]
    • ECCV: European Conference on Computer Vision (Springer) [dblp]
    • WACV: Winter Conference/Workshop on Applications of Computer Vision (IEEE) [dblp]
    • ICASSP: International Conference on Acoustics, Speech, and Signal Processing (IEEE) [dblp]
    • MICCAI: Conference on Medical Image Computing and Computer Assisted Intervention (Springer) [dblp]
    • IROS: International Conference on Intelligent Robots and Systems (IEEE) [dblp]
    • ACMMM: ACM International Conference on Multimedia (ACM) [dblp]
  • CORE Rank B
    • ACCV: Asian Conference on Computer Vision (Springer) [dblp]
    • VCIP: International Conference on Visual Communications and Image Processing (IEEE) [dblp]
    • ICIP: International Conference on Image Processing (IEEE) [dblp]
    • CAIP: International Conference on Computer Analysis of Images and Patterns (Springer) [dblp]
    • VISAPP: International Conference on Vision Theory and Applications (SCITEPRESS) [dblp]
    • ICPR: International Conference on Pattern Recognition (IEEE) [dblp]
    • ACIVS: Conference on Advanced Concepts for Intelligent Vision Systems (Springer) [dblp]
    • EUSIPCO: European Signal Processing Conference (IEEE) [dblp]
    • ICRA: International Conference on Robotics and Automation (IEEE) [dblp]
    • BMVC: British Machine Vision Conference (organized by BMVA: British Machine Vision Association and Society for Pattern Recognition) [dblp]
  • CORE Rank C:
    • ICISP: International Conference on Image and Signal Processing (Springer) [dblp]
    • ICIAR: International Conference on Image Analysis and Recognition (Springer) [dblp]
    • ICVS: International Conference on Computer Vision Systems (Springer) [dblp]
  • Unranked but popular
    • MIUA: Conference on Medical Image Understanding and Analysis (organized by BMVA: British Machine Vision Association and Society for Pattern Recognition) [dblp]
    • EUVIP: European Workshop on Visual Information Processing (IEEE, organized by EURASIP: European Association for Signal Processing) [dblp]
    • CIC: Color and Imaging Conference (organized by IS&T: Society for Imaging Science and Technology) [dblp]
    • CVCS: Colour and Visual Computing Symposium [dblp]
    • DSP: International Conference on Digital Signal Processing [dblp]

Journals

  • Tier 1
    • IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI) [dblp]
    • IEEE Transactions on Image Processing (IEEE TIP) [dblp]
    • IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT) [dblp]
    • Springer International Journal of Computer Vision (Springer IJCV) [dblp]
    • Elsevier Pattern Recognition (Elsevier PR) [dblp]
    • Elsevier Computer Vision and Image Understanding (Elsevier CVIU) [dblp]
    • Elsevier Expert Systems with Applications [dblp]
    • Elsevier Neurocomputing [dblp]
    • Springer Neural Computing and Applications [dblp]
  • Tier 2
    • Elsevier Image and Vision Computing (Elsevier IVC) [dblp]
    • Elsevier Pattern Recognition Letters (Elsevier PR Letters) [dblp]
    • Elsevier Journal of Visual Communication and Image Representation [dblp]
    • Springer Journal of Mathematical Imaging and Vision [dblp]
    • SPIE Journal of Electronic Imaging [dblp]
    • IET Image Processing [dblp]
    • Springer Pattern Analysis and Applications (Springer PAA) [dblp]
    • Springer Machine Vision and Applications (Springer MVA) [dblp]
    • IET Computer Vision [dblp]
  • Open Access
    • IEEE Access [dblp]
    • MDPI Journal of Imaging [dblp]

Summer Schools

  • International Computer Vision Summer School (ICVSS) [2007-Present], Sicily, Italy [Website]
  • Machine Intelligence and Visual Computing Summer School (VISUM) [2013-2022], Porto, Portugal [Website]
  • BMVA British Computer Vision Summer School (CVSS) [2013-2020,2023-Present], UK [Website]

Popular Articles

  • Object Classification
    • [LeNet-5, 1998] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
    • [AlexNet, 2012] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012).
    • [ZFNet, 2014] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13. Springer International Publishing, 2014.
    • [VGG, 2014] Simonyan, Karen and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” CoRR abs/1409.1556 (2014): n. Pag.
    • [GoogLeNet, 2015] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
    • [ResNet, 2016] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
    • [InceptionV3, 2016] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
    • [Xception, 2017] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
    • [EfficientNet, 2019] Tan, Mingxing, and Quoc Le. "Efficientnet: Rethinking model scaling for convolutional neural networks." International conference on machine learning. PMLR, 2019.
    • [ViT, 2020] Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." International Conference on Learning Representations. 2020.
    • [ConvNeXt, 2022] Liu, Zhuang et al. “A ConvNet for the 2020s.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 11966-11976.
  • Object Classification - Lightweight
    • [SqueezeNet, 2016] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360 (2016).
    • [MobileNetV2, 2018] Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
    • [ShuffleNetV2, 2018] Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European conference on computer vision (ECCV). 2018.
    • [MobileNetV3, 2019] Howard, Andrew, et al. "Searching for mobilenetv3." Proceedings of the IEEE/CVF international conference on computer vision. 2019.
    • [GhostNetV1, 2020] Han, Kai, et al. "Ghostnet: More features from cheap operations." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
    • [MobileViT, 2021] Mehta, Sachin, and Mohammad Rastegari. "Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer." arXiv preprint arXiv:2110.02178 (2021).
    • [GhostNetV2, 2022] Tang, Yehui, et al. "GhostNetv2: enhance cheap operation with long-range attention." Advances in Neural Information Processing Systems 35 (2022): 9969-9982.
    • [ConvNeXt-Tiny, 2022] Liu, Zhuang et al. “A ConvNet for the 2020s.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 11966-11976.
    • [MaxViT-Tiny, 2022] Tu, Zhengzhong, et al. "Maxvit: Multi-axis vision transformer." European conference on computer vision. Cham: Springer Nature Switzerland, 2022.
    • [MobileFormer, 2022] Chen, Yinpeng, et al. "Mobile-former: Bridging mobilenet and transformer." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
    • [ConvNeXtV2-Tiny, 2023] Woo, Sanghyun, et al. "Convnext v2: Co-designing and scaling convnets with masked autoencoders." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
  • Object Detection
    • [Faster R-CNN, 2015] Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems 28 (2015).
    • [SSD, 2016] Liu, Wei, et al. "Ssd: Single shot multibox detector." Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing, 2016.
    • [RetinaNet, 2017] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
    • [YOLOV3, 2018] Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
    • [YOLOX, 2021] Ge, Zheng, et al. "Yolox: Exceeding yolo series in 2021." arXiv preprint arXiv:2107.08430 (2021).
    • [YOLOR, 2021] Wang, Chien-Yao, I-Hau Yeh, and Hong-Yuan Mark Liao. "You only learn one representation: Unified network for multiple tasks." arXiv preprint arXiv:2105.04206 (2021).
    • [YOLOV7, 2023] Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
  • Object Segmentation - Semantic / Instance / Panoptic
    • Classical: Graph Cut / Normalized Cut, Fuzzy Clustering, Mean-shift / Quick-shift, SLIC, Active Contours (Snakes), Region Growing, K-means Clustering, Watershed, Level Set Methods, Markov Random Fields (MRF), Edge (1st / 2nd derivatives) + filling.
    • [U-Net, 2015] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer International Publishing, 2015.
    • [DeepLabV3, 2017] Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
    • [PSPNet, 2017] Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
    • [Mask R-CNN, 2017] He, Kaiming, et al. "Mask r-cnn." Proceedings of the IEEE international conference on computer vision. 2017.
    • [U-Net++, 2018] Zhou, Zongwei et al. “UNet++: A Nested U-Net Architecture for Medical Image Segmentation.” Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support : 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain, S... 11045 (2018): 3-11.
    • [DeepLabV3+, 2018] Chen, Liang-Chieh et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” European Conference on Computer Vision (2018).
    • [MaskFormer, 2021] Cheng, Bowen, Alex Schwing, and Alexander Kirillov. "Per-pixel classification is not all you need for semantic segmentation." Advances in Neural Information Processing Systems 34 (2021): 17864-17875.
    • [SegFormer, 2021] Xie, Enze, et al. "SegFormer: Simple and efficient design for semantic segmentation with transformers." Advances in Neural Information Processing Systems 34 (2021): 12077-12090.
    • [SAM, 2023] Kirillov, Alexander, et al. "Segment anything." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
    • [SEEM, 2023] Zou, Xueyan, et al. "Segment everything everywhere all at once." Advances in neural information processing systems 36 (2023): 19769-19782.
  • Feature Matching
    • {Local Features} [Superpoint, 2018] DeTone, Daniel, Tomasz Malisiewicz, and Andrew Rabinovich. "Superpoint: Self-supervised interest point detection and description." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2018.
    • {Local Features} [D2-Net, 2019] Dusmanu, Mihai, et al. "D2-net: A trainable cnn for joint detection and description of local features." arXiv preprint arXiv:1905.03561 (2019).
    • [R2D2, 2019] Revaud, Jerome, et al. "R2D2: repeatable and reliable detector and descriptor." arXiv preprint arXiv:1906.06195 (2019).
    • {Detector-Based Matcher} [SuperGlue, 2020] Sarlin, Paul-Edouard, et al. "Superglue: Learning feature matching with graph neural networks." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
    • {Detector-Free Matcher} [DRC-Net, 2020] Li, Xinghui, et al. "Dual-resolution correspondence networks." Advances in Neural Information Processing Systems 33 (2020): 17346-17357.
    • {Local Features} [DISK, 2020] Tyszkiewicz, Michał, Pascal Fua, and Eduard Trulls. "DISK: Learning local features with policy gradient." Advances in Neural Information Processing Systems 33 (2020): 14254-14265.
    • {Detector-Free Matcher} [LoFTR, 2021] Sun, Jiaming, et al. "LoFTR: Detector-free local feature matching with transformers." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
    • {Detector-Free Matcher} [MatchFormer, 2022] Wang, Qing, et al. "Matchformer: Interleaving attention in transformers for feature matching." Proceedings of the Asian Conference on Computer Vision. 2022.
    • {Detector-Based Matcher} [LightGlue, 2023] Lindenberger, Philipp, Paul-Edouard Sarlin, and Marc Pollefeys. "LightGlue: Local Feature Matching at Light Speed." arXiv preprint arXiv:2306.13643 (2023).
    • {Detector-Based Matcher} [GlueStick, 2023] Pautrat, Rémi, et al. "Gluestick: Robust image matching by sticking points and lines together." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
    • {Detector-Free Matcher} [OAMatcher, 2023] Dai, Kun, et al. "OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching." arXiv preprint arXiv:2302.05846 (2023).
    • [RoMa, 2023] Edstedt, Johan, et al. "RoMa: Revisiting Robust Losses for Dense Feature Matching." arXiv preprint arXiv:2305.15404 (2023).
    • [GIM, 2023] Shen, Xuelun, et al. "GIM: Learning Generalizable Image Matcher From Internet Videos." The Twelfth International Conference on Learning Representations. 2023.
    • {Detector-Free Matcher} [DeepMatcher, 2024] Xie, Tao, et al. "Deepmatcher: a deep transformer-based network for robust and accurate local feature matching." Expert Systems with Applications 237 (2024): 121361.
    • {Detector-Free Matcher} [XFeat, 2024] Potje, Guilherme, et al. "XFeat: Accelerated Features for Lightweight Image Matching." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024.
  • Object Tracking
    • [DeepSORT, 2017] Wojke, Nicolai, Alex Bewley, and Dietrich Paulus. "Simple online and realtime tracking with a deep association metric." 2017 IEEE international conference on image processing (ICIP). IEEE, 2017.
    • [Tracktor, 2019] Bergmann, Philipp, Tim Meinhardt, and Laura Leal-Taixe. "Tracking without bells and whistles." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
    • [FairMOT, 2021] Zhang, Yifu, et al. "Fairmot: On the fairness of detection and re-identification in multiple object tracking." International Journal of Computer Vision 129 (2021): 3069-3087.
    • [STARK, 2021] Yan, Bin, et al. "Learning spatio-temporal transformer for visual tracking." Proceedings of the IEEE/CVF international conference on computer vision. 2021.
    • [MixFormer, 2022] Cui, Yutao, et al. "Mixformer: End-to-end tracking with iterative mixed attention." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
    • [ByteTrack, 2022] Zhang, Yifu, et al. "Bytetrack: Multi-object tracking by associating every detection box." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
  • Image Generation
    • [DCGAN, 2015] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
    • [BigGAN, 2018] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).
    • [StyleGANv3, 2021] Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021): 852-863.
    • [DALL-E, 2021] Ramesh, Aditya, et al. "Zero-shot text-to-image generation." International conference on machine learning. Pmlr, 2021.
    • [LAFITE, 2021] Zhou, Y., et al. "Lafite: Towards language-free training for text-to-image generation." arXiv preprint arXiv:2111.13792 (2021).
    • [CLIP, 2021] Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International conference on machine learning. PMLR, 2021.
    • [Imagen, 2022] Saharia, Chitwan, et al. "Photorealistic text-to-image diffusion models with deep language understanding." Advances in neural information processing systems 35 (2022): 36479-36494.
    • [GLIDE, 2022] Nichol, Alexander Quinn, et al. "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models." International Conference on Machine Learning. PMLR, 2022.
    • [unCLIP, 2022] Ramesh, Aditya, et al. "Hierarchical Text-Conditional Image Generation with CLIP Latents." arXiv preprint arXiv:2204.06125 (2022).
    • [LDM / Stable Diffusion (SD), 2022] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
    • [DALL-E 2, 2022] Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with CLIP latents." arXiv preprint arXiv:2204.06125 (2022).
    • [DALL-E 3, 2023] Betker, James, et al. "Improving image generation with better captions." OpenAI technical report, https://cdn.openai.com/papers/dall-e-3.pdf (2023).
    • [SDXL, 2023] Podell, Dustin, et al. "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis." The Twelfth International Conference on Learning Representations. 2023.
  • Image Retrieval
    • [LSMH, 2016] Lu, Xiaoqiang, Xiangtao Zheng, and Xuelong Li. "Latent semantic minimal hashing for image retrieval." IEEE Transactions on Image Processing 26.1 (2016): 355-368.
    • [R–GeM, 2018] Radenović, Filip, Giorgos Tolias, and Ondřej Chum. "Fine-tuning CNN image retrieval with no human annotation." IEEE transactions on pattern analysis and machine intelligence 41.7 (2018): 1655-1668.
    • [HOW, 2020] Tolias, Giorgos, Tomas Jenicek, and Ondřej Chum. "Learning and aggregating deep local descriptors for instance-level recognition." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer International Publishing, 2020.
    • [FIRe, 2021] Weinzaepfel, Philippe, et al. "Learning Super-Features for Image Retrieval." International Conference on Learning Representations. 2021.
    • [Token, 2022] Wu, Hui, et al. "Learning token-based representation for image retrieval." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 3. 2022.
  • WIP:
    • Explainable AI (XAI)
    • Video Summarization and Captioning
    • Text Recognition
    • Data Compression
    • Affective Computing
    • Image Colorization
    • Virtual reality (VR)
    • Augmented reality (AR)
    • Visual Question Answering (VQA)
    • Vision-Language Models (VLMs)
    • DeepFake Detection
    • 3D Reconstruction
    • Image Captioning
    • Image Super-Resolution / Image Restoration
    • Pose Estimation
    • Biometric Analysis
    • Depth Estimation
    • Meta Learning
    • Semi-Supervised Learning - Zero/One/Few shot

Reference Books

| Book | Links |
| --- | --- |
| Antonio Torralba, Phillip Isola, William T. Freeman. “Foundations of Computer Vision” MIT Press, (2024). | goodreads |
| Nixon, Mark, and Alberto Aguado. “Feature extraction and image processing for computer vision” Academic press, (2019). | goodreads |
| González, Rafael Corsino and Richard E. Woods. “Digital image processing, 4th Edition” (2018). | goodreads |
| E.R. Davies. “Computer Vision: Principles, Algorithms, Applications, Learning” Academic press, (2017). | goodreads |
| Prince, Simon. “Computer Vision: Models, Learning, and Inference” (2012). | goodreads |
| Forsyth, David Alexander and Jean Ponce. “Computer Vision - A Modern Approach, Second Edition” (2011). | goodreads |
| Szeliski, Richard. “Computer Vision - Algorithms and Applications” Texts in Computer Science (2010). | goodreads |
| Bishop, Christopher M. “Pattern recognition and machine learning, 5th Edition” Information science and statistics (2007). | goodreads |
| Hartley, Richard and Andrew Zisserman. “Multiple view geometry in computer vision (2. ed.)” (2003). | goodreads |
| Stockman, George C. and Linda G. Shapiro. “Computer Vision” (2001). | goodreads |

Courses

| Course | Year | Instructor | Source |
| --- | --- | --- | --- |
| Introduction to Computer Vision | 2025 | James Tompkin | Brown |
| Deep Learning for Computer Vision | 2024 | Fei-Fei Li | Stanford |
| Advances in Computer Vision | 2023 | William T. Freeman | MIT |
| OpenCV for Python Developers | 2023 | Patrick Crawford | LinkedIn Learning |
| Computer Vision | 2021 | Andreas Geiger | University of Tübingen |
| Computer Vision | 2021 | Yogesh S Rawat / Mubarak Shah | University of Central Florida |
| Advanced Computer Vision | 2021 | Mubarak Shah | University of Central Florida |
| Deep Learning for Computer Vision | 2020 | Justin Johnson | University of Michigan |
| Advanced Deep Learning for Computer Vision | 2020 | Laura Leal-Taixé / Matthias Niessner | Technical University of Munich |
| Introduction to Digital Image Processing | 2020 | Ahmadreza Baghaie | New York Institute of Technology |
| Quantitative Imaging | 2019 | Kevin Mader | ETH Zurich |
| Convolutional Neural Networks for Visual Recognition | 2017 | Fei-Fei Li | Stanford University |
| Introduction to Digital Image Processing | 2015 | Rich Radke | Rensselaer Polytechnic Institute |
| Machine Learning for Robotics and Computer Vision | 2014 | Rudolph Triebel | Technical University of Munich |
| Multiple View Geometry | 2013 | Daniel Cremers | Technical University of Munich |
| Variational Methods for Computer Vision | 2013 | Daniel Cremers | Technical University of Munich |
| Computer Vision | 2012 | Mubarak Shah | University of Central Florida |
| Image and video processing | - | Guillermo Sapiro | Duke University |
| Introduction to Computer Vision | - | Aaron Bobick / Irfan Essa | Udacity |

Repos

  • Tags: Object Classification [ObjCls], Object Detection [ObjDet], Object Segmentation [ObjSeg], General Library [GenLib], Text Reading / Optical Character Recognition [OCR], Action Recognition [ActRec], Object Tracking [ObjTrk], Data Augmentation [DatAug], Simultaneous Localization and Mapping [SLAM], Outlier/Anomaly/Novelty Detection [NvlDet], Content-based Image Retrieval [CBIR], Image Enhancement [ImgEnh], Aesthetic Assessment [AesAss], Explainable Artificial Intelligence [XAI], Text-to-Image Generation [TexImg], Pose Estimation [PosEst], Video Matting [VidMat], Eye Tracking [EyeTrk]
| Repo | Tags | Description |
| --- | --- | --- |
| computervision-recipes | [GenLib] | Microsoft, Best Practices, code samples, and documentation for Computer Vision |
| FastAI | [GenLib] | FastAI, Library over PyTorch used for learning and practicing machine learning and deep learning |
| pytorch-lightning | [GenLib] | PyTorchLightning, Lightweight PyTorch wrapper for high-performance AI research |
| ignite | [GenLib] | PyTorch, High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently |
| pytorch_geometric | [GenLib] | Graph Neural Network Library for PyTorch |
| kornia | [GenLib] | Open Source Differentiable Computer Vision Library |
| ncnn | [GenLib] | Tencent, High-performance neural network inference framework optimized for the mobile platform |
| MediaPipe | [ObjDet] [ObjSeg] [ObjTrk] [GenLib] | Google, iOS - Android - C++ - Python - Coral, Face Detection - Face Mesh - Iris - Hands - Pose - Holistic - Hair Segmentation - Object Detection - Box Tracking - Instant Motion Tracking - Objectron - KNIFT (Similar to SIFT) |
| PyTorch image models | [ObjCls] | rwightman, PyTorch image classification models, scripts, pretrained weights |
| mmclassification | [ObjCls] | OpenMMLab, Image Classification Toolbox and Benchmark |
| vit-pytorch | [ObjCls] | SOTA for vision transformers |
| face_classification | [ObjCls] [ObjDet] | Real-time face detection and emotion/gender classification |
| mmdetection | [ObjDet] | OpenMMLab, Image Detection Toolbox and Benchmark |
| detectron2 | [ObjDet] [ObjSeg] | Facebook, FAIR's next-generation platform for object detection, segmentation and other visual recognition tasks |
| detr | [ObjDet] | Facebook, End-to-End Object Detection with Transformers |
| libfacedetection | [ObjDet] | An open source library for face detection in images, speed: ~1000FPS |
| FaceDetection-DSFD | [ObjDet] | Tencent, SOTA face detector |
| object-Detection-Metrics | [ObjDet] | Most popular metrics used to evaluate object detection algorithms |
| SAHI | [ObjDet] [ObjSeg] | A lightweight vision library for performing large scale object detection / instance segmentation |
| yolov5 | [ObjDet] | ultralytics |
| AlexeyAB/darknet / pjreddie/darknet | [ObjDet] | YOLOv4 / Scaled-YOLOv4 / YOLOv3 / YOLOv2 |
| U-2-Net | [ObjDet] | U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection |
| segmentation_models.pytorch | [ObjSeg] | qubvel, PyTorch segmentation models with pretrained backbones |
| mmsegmentation | [ObjSeg] | OpenMMLab, Semantic Segmentation Toolbox and Benchmark |
| mmocr | [OCR] | OpenMMLab, Text Detection, Recognition and Understanding Toolbox |
| pytesseract | [OCR] | A Python wrapper for Google Tesseract |
| EasyOCR | [OCR] | Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc. |
| PaddleOCR | [OCR] | Practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools, supports training and deployment on server, mobile, embedded and IoT devices |
| PaddleSeg | [ObjSeg] | Easy-to-use image segmentation library with awesome pre-trained model zoo, supporting wide-range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc. |
| mmtracking | [ObjTrk] | OpenMMLab, Video Perception Toolbox for object detection and tracking |
| mmaction | [ActRec] | OpenMMLab, An open-source toolbox for action understanding based on PyTorch |
| albumentations | [DatAug] | Fast image augmentation library and an easy-to-use wrapper around other libraries |
| ORB_SLAM2 | [SLAM] | Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities |
| pyod | [NvlDet] | Python Toolbox for Scalable Outlier Detection (Anomaly Detection) |
| imagededup | [CBIR] | Image retrieval, CBIR, Find duplicate images made easy! |
| image-match | [CBIR] | Image retrieval, CBIR, Quickly search over billions of images |
| Bringing-Old-Photos-Back-to-Life | [ImgEnh] | Microsoft, Bringing Old Photo Back to Life (CVPR 2020 oral) |
| image-quality-assessment | [AesAss] | Idealo, Image Aesthetic, NIMA model to predict the aesthetic and technical quality of images |
| aesthetics | [AesAss] | Image Aesthetics Toolkit using Fisher Vectors |
| pytorch-cnn-visualizations | [XAI] | PyTorch implementation of convolutional neural network visualization techniques |
| DALLE2-pytorch | [TexImg] | Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch |
| imagen-pytorch | [TexImg] | Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch |
| openpose | [PosEst] | OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation |
| RobustVideoMatting | [VidMat] | Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML! |
| fastdup | [NvlDet] [CBIR] | An unsupervised and free tool for image and video dataset analysis |
| Random-Erasing | [DatAug] | Random Erasing Data Augmentation in PyTorch |
| CutMix-PyTorch | [DatAug] | Official PyTorch implementation of CutMix regularizer |
| keras-cv | [GenLib] | Library of modular computer vision oriented Keras components |
| PsychoPy | [EyeTrk] | Library for running psychology and neuroscience experiments |
| alibi-detect | [NvlDet] | Algorithms for outlier, adversarial and drift detection |
| Captum | [XAI] | Built by PyTorch team, Model interpretability and understanding for PyTorch |
| Alibi | [XAI] | Algorithms for explaining machine learning models |
| iNNvestigate | [XAI] | For TF, A toolbox to iNNvestigate neural networks' predictions |
| keras-vis | [XAI] | For Keras, Neural network visualization toolkit |
| Keract | [XAI] | For Keras, Layers Outputs and Gradients |
| pytorch-grad-cam | [XAI] | For PyTorch, Advanced AI Explainability for computer vision |
| SHAP | [XAI] | A game theoretic approach to explain the output of any machine learning model |
| TensorWatch | [XAI] | Built by Microsoft, Debugging, monitoring and visualization for Python Machine Learning and Data Science |
| WeightWatcher | [XAI] | An open-source diagnostic tool for analyzing Deep Neural Networks (DNN), without needing access to training or even test data |

Dataset Collections


Annotation Tools

  • labelme, Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
  • CVAT, an interactive video and image annotation tool for computer vision.
  • VoTT, Microsoft, Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.
  • labelImg, Graphical image annotation tool and label object bounding boxes in images.
  • VIA, VGG Oxford, HTML-based standalone manual annotation software for image, audio and video.
  • FiftyOne, open-source tool for building high-quality datasets and computer vision models.
  • makesense.ai, a free-to-use online tool for labeling photos.

YouTube Channels


Mailing Lists

  • Vision Science, announcements about industry/academic jobs in computer vision around the world (in English).
  • bull-i3, posts about job opportunities in computer vision in France (in French).

Misc


Thanks