Dev Resource
Public dataset
- Dataset source
- Require login - https://public.roboflow.com/
- Conll-03
- MNIST
- CIFAR-10 - https://www.cs.toronto.edu/~kriz/cifar.html
- 10 classes labeled images
- Coco dataset - https://cocodataset.org/#home
- LVIS - Large Vocabulary Instance Segmentation
- https://www.lvisdataset.org/
- ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images
- Flower dataset
- CARLA - self-driving car dataset
Package/Library 🔧
- List of all popular ML package for various fields
Small package :
-
deepchecks - https://deepchecks.com/
-
A very easy to use lib to do model & data checking, e.g. train/test set distribution check
-
Very useful seemingly
-
ML core framework :
- Scikit-learn (Based on SciPy)
- https://scikit-learn.org/stable/install.html
- The user guide is too detailed. Best usage if you have a target to search for.
-
Pycaret
- https://pycaret.gitbook.io/docs/
- wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.
-
PyTorch
- Machine Learning lib
- https://pytorch.org/get-started/locally/
- pytorch lightning (Based on Pytorch)
- A high level API allow researcher to build PyTorch model more efficiently
- i.e. More speedy in coding
- https://www.pytorchlightning.ai/
-
Fastai (Based on PyTorch)
- Deep Learning lib
- Allow the control from High, Middle, Low level
- Domains: NLP, CV, Tabular, (?)collaborative filtering
- NOT support Mac, so run it on Cloud/Colab
-
Tensorflow (VS PyTorch)
- Keras (Based on Tensorflow)
- Autokeras (Based on Keras)
- Deep Learning lib
- Similar to scikit but more high level api
- e.g. you can specify a task type, they help to choose the model algorithm
Domain-Specific :
- NLP
- Scikit-crfsuite
- CRF model classifier
- MITIE
- Train NER model
- NCRF++
- FastText - https://fasttext.cc/docs/en/supervised-tutorial.html
- fastText is a library for efficient learning of word representations and sentence classification.****
- Scikit-crfsuite
- Computer vision
- Framework
- OpenMMLab - https://openmmlab.com/codebase
- From China, quite a big framework
- Wrap up a wide range of application.
- Icevision - https://airctic.com/0.12.0/
- YOLOv5 - https://docs.ultralytics.com/
- Quite infamous due to not recognized by original YOLO author
- try YOLOv4 instead
- Detectron
- Mediapipe
- OpenMMLab - https://openmmlab.com/codebase
- Data
- augmentation lib - Albumentations
- Framework
Tool / Software
- Annotation tool:
- LabelImg
- labelstudio - https://labelstud.io/
- can label many types of data(img, audio, text…etc)
- Extremely useful than labelimg (recommened by colleagues)
- CVAT
- makesense.ai
- Labelbox
- Roboflow (not free for business)
Resource List
- State-of-the-Art models list - https://paperswithcode.com/sota