Commit 7bd8f97 — upd readme
dev-rinchin committed Aug 9, 2024 (1 parent: c977bfb)
Showing 11 changed files with 77 additions and 195 deletions.
210 changes: 46 additions & 164 deletions README.md
@@ -1,180 +1,34 @@
<img src=docs/imgs/lightautoml_logo_color.png />

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/lightautoml)](https://pypi.org/project/lightautoml/)
[![PyPI - Version](https://img.shields.io/pypi/v/lightautoml)](https://pypi.org/project/lightautoml)
![pypi - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=green)
[![GitHub License](https://img.shields.io/github/license/sb-ai-lab/LightAutoML)](https://github.com/sb-ai-lab/LightAutoML/blob/master/LICENSE)
[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
<br>
[![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/sb-ai-lab/lightautoml/CI.yml)](https://github.com/sb-ai-lab/lightautoml/actions/workflows/CI.yml?query=branch%3Amain)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
### [Documentation](https://lightautoml.readthedocs.io/) | [Installation](#installation) | [Examples](#resources) | [Telegram chat](https://t.me/joinchat/sp8P7sdAqaU0YmRi) | [Telegram channel](https://t.me/lightautoml)

LightAutoML (LAMA) allows you to create machine learning models with just a few lines of code, or build your own custom pipeline from ready-made blocks. It supports tabular, time series, image and text data.

The framework provides automatic model creation for the following tasks:
- binary classification
- multiclass classification
- multilabel classification
- regression

The current version of the package handles datasets with independent samples in each row, i.e. **each row is an object with its own features and target**.

**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets.


<a name="toc"></a>
# Table of Contents

* [Installation](#installation)
* [Documentation](https://lightautoml.readthedocs.io/)
* [Quick tour](#quicktour)
* [Resources](#examples)
* [Advanced features](#advancedfeatures)
* [Support and feature requests](#support)
* [Contributing to LightAutoML](#contributing)
* [License](#license)

**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it yourself.


<a name="installation"></a>
# Installation
To install the LAMA framework on your machine from PyPI:
```bash
# Base functionality:
pip install -U lightautoml

# For a partial installation, use the corresponding option.
# Extra dependencies: [nlp, cv, report], or use 'all' to install everything
pip install -U lightautoml[nlp]
```

Additionally, run the following commands to enable PDF report generation:

```bash
# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
```
[Back to top](#toc)

<a name="quicktour"></a>
# Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML:
### Use ready preset for tabular data
```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

automl = TabularAutoML(
    task=Task(
        name='binary',
        metric=lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5) * 1)
    )
)
oof_pred = automl.fit_predict(
    df_train,
    roles={'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1
}).to_csv('submit.csv', index=False)
```
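The `(y_pred > 0.5) * 1` expression used in both the custom metric and the submission simply converts class-1 probabilities into hard labels. A self-contained toy sketch of that thresholding and the resulting F1 score (illustrative data, not actual model output):

```python
import numpy as np

# Toy stand-in for the probabilities automl.predict() would return
y_true = np.array([1, 0, 1, 1, 0, 0])
proba = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.7])

# Same thresholding as in the snippet above
labels = (proba > 0.5) * 1
print(labels.tolist())  # [1, 0, 1, 0, 0, 1]

# F1 computed by hand: 2 hits, 1 false positive, 1 miss
tp = int(np.sum((labels == 1) & (y_true == 1)))
fp = int(np.sum((labels == 1) & (y_true == 0)))
fn = int(np.sum((labels == 0) & (y_true == 1)))
f1 = 2 * tp / (2 * tp + fp + fn)
print(round(f1, 4))  # 0.6667
```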
### LightAutoML as a framework: build your own custom pipeline

```python
import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')
N_THREADS = 4

reader = PandasToPandasReader(Task('binary'), cv=5, random_state=42)

# create a feature selector
selector = ImportanceCutoffSelector(
    LGBSimpleFeatures(),
    BoostLGBM(
        default_params={'learning_rate': 0.05, 'num_leaves': 64,
                        'seed': 42, 'num_threads': N_THREADS}
    ),
    ModelBasedImportanceEstimator(),
    cutoff=0
)

# build the first-level pipeline for AutoML
pipeline_lvl1 = MLPipeline(
    [
        # first model, with hyperparameter tuning
        (
            BoostLGBM(
                default_params={'learning_rate': 0.05, 'num_leaves': 128,
                                'seed': 1, 'num_threads': N_THREADS}
            ),
            OptunaTuner(n_trials=20, timeout=30)
        ),
        # second model, without hyperparameter tuning
        BoostLGBM(
            default_params={'learning_rate': 0.025, 'num_leaves': 64,
                            'seed': 2, 'num_threads': N_THREADS}
        )
    ],
    pre_selection=selector,
    features_pipeline=LGBSimpleFeatures(),
    post_selection=None
)

# build the second-level pipeline for AutoML
pipeline_lvl2 = MLPipeline(
    [
        BoostLGBM(
            default_params={'learning_rate': 0.05, 'num_leaves': 64,
                            'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
            freeze_defaults=True
        )
    ],
    pre_selection=None,
    features_pipeline=LGBSimpleFeatures(),
    post_selection=None
)

# assemble the two-level AutoML pipeline
automl = AutoML(
    reader,
    [
        [pipeline_lvl1],
        [pipeline_lvl2],
    ],
    skip_conn=False
)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles={'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1
}).to_csv('submit.csv', index=False)
```

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#resources) section.

<a name="examples"></a>
<a name="resources"></a>
# Resources

### Kaggle kernel examples of LightAutoML usage:
- (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
- (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytic Indian Mag)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)



<a name="advancedfeatures"></a>
# Advanced features
### GPU and Spark pipelines
<a name="contributing"></a>
# Contributing to LightAutoML
If you are interested in contributing to LightAutoML, please read the Contribution Guide.

<a name="support"></a>
# Support and feature requests
- Seek prompt advice at [Telegram group](https://t.me/joinchat/sp8P7sdAqaU0YmRi).
- Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).

<a name="license"></a>
# License
@@ -9,7 +9,7 @@
"source": [
"# Tutorial 10: Relational datasets (with star scheme)\n",
"\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/39cb56feae6766464d39dd2349480b97099d2535/imgs/LightAutoML_logo_big.png)\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/39cb56feae6766464d39dd2349480b97099d2535/docs/imgs/lightautoml_logo_color.png)\n",
"\n"
]
},
@@ -110,7 +110,7 @@
"source": [
"Consider an example of data with a star scheme organization. The dataset contains data on the sale of meals in the restaurant chain, consists of three tables: the main one containing information about completed orders (`train` and `test` parts), and two auxiliary tables containing information about restaurants (`fulfilment_center_info`) and available dishes (`meal_info`). The tables and the scheme of their organization are shown in the image below.\n",
"\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/imgs/Star_scheme_tables.png)"
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/docs/imgs/Star_scheme_tables.png)"
]
},
{
8 changes: 4 additions & 4 deletions examples/tutorials/Tutorial_11_time_series.ipynb
@@ -461,7 +461,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"../../imgs/tutorial_11_general_problem_statement.png\" alt=\"Time series general problem statement\"/>"
"<img src=\"../../docs/imgs/tutorial_11_general_problem_statement.png\" alt=\"Time series general problem statement\"/>"
]
},
{
@@ -475,7 +475,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"../../imgs/tutorial_11_case_problem_statement.png\" alt=\"Time series case problem statement\"/>"
"<img src=\"../../docs/imgs/tutorial_11_case_problem_statement.png\" alt=\"Time series case problem statement\"/>"
]
},
{
@@ -489,7 +489,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"../../imgs/tutorial_11_history_step_params.png\" alt=\"History and step params description\"/>"
"<img src=\"../../docs/imgs/tutorial_11_history_step_params.png\" alt=\"History and step params description\"/>"
]
},
{
@@ -503,7 +503,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"../../imgs/tutorial_11_transformers_params.png\" alt=\"Transformers params description\"/>"
"<img src=\"../../docs/imgs/tutorial_11_transformers_params.png\" alt=\"Transformers params description\"/>"
]
},
{
10 changes: 5 additions & 5 deletions examples/tutorials/Tutorial_1_basics.ipynb
@@ -16,7 +16,7 @@
"id": "35c56a11",
"metadata": {},
"source": [
"<img src=\"../../imgs/LightAutoML_logo_big.png\" alt=\"LightAutoML logo\" style=\"width:100%;\"/>"
"<img src=\"../../docs/imgs/lightautoml_logo_color.png\" alt=\"LightAutoML logo\" style=\"width:100%;\"/>"
]
},
{
@@ -986,7 +986,7 @@
"\n",
"Let's look at how the LightAutoML model is arranged and what it consists in general.\n",
"\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/imgs/tutorial_1_laml_big.png)\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/docs/imgs/tutorial_1_laml_big.png)\n",
"\n",
"#### 1.3.1 Reader object\n",
"\n",
@@ -1008,7 +1008,7 @@
"\n",
"As a result, after analyzing and processing the data, the ```Reader``` object forms and returns a ```LAMA Dataset```. It contains the original data and markup with metainformation. In this dataset it is possible to see the roles defined by the ```Reader``` object, selected features etc. Then ML pipelines are trained on this data. \n",
"\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/imgs/tutorial_1_ml_pipeline.png)\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/docs/imgs/tutorial_1_ml_pipeline.png)\n",
"\n",
"Each such pipeline is one or more machine learning algorithms that share one post-processing block and one validation scheme. Several such pipelines can be trained in parallel on one dataset, and they form a level. Number of levels can be unlimited as possible. List of all levels of AutoML pipeline is available via ```.levels``` attribute of ```AutoML``` instance. Level predictions can be inputs to other models or ML pipelines (i. e. stacking scheme). As inputs for subsequent levels, it is possible to use the original data by setting ```skip_conn``` argument in ```True``` when initializing preset instance. At the last level, if there are several pipelines, blending is used to build a prediction. \n",
"\n",
@@ -1036,11 +1036,11 @@
"\n",
"Here is a default AutoML pipeline for binary classification and regression tasks (```TabularAutoML``` preset):\n",
"\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/imgs/tutorial_blackbox_pipeline.png)\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/docs/imgs/tutorial_blackbox_pipeline.png)\n",
"\n",
"Another example:\n",
"\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/imgs/tutorial_1_pipeline.png)\n",
"![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/docs/imgs/tutorial_1_pipeline.png)\n",
"\n",
"Let's discuss some of the params we can setup:\n",
"- `task` - the type of the ML task (the only **must have** parameter)\n",
10 changes: 5 additions & 5 deletions examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb
@@ -12,7 +12,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"../../imgs/LightAutoML_logo_big.png\" alt=\"LightAutoML logo\" style=\"width:100%;\"/>"
"<img src=\"../../docs/imgs/lightautoml_logo_color.png\" alt=\"LightAutoML logo\" style=\"width:100%;\"/>"
]
},
{
@@ -34,7 +34,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![WB0](../../imgs/tutorial_whitebox_report_1.png)"
"![WB0](../../docs/imgs/tutorial_whitebox_report_1.png)"
]
},
{
@@ -48,7 +48,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![WB1](../../imgs/tutorial_whitebox_report_2.png)"
"![WB1](../../docs/imgs/tutorial_whitebox_report_2.png)"
]
},
{
@@ -62,7 +62,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![WB2](../../imgs/tutorial_whitebox_report_3.png)"
"![WB2](../../docs/imgs/tutorial_whitebox_report_3.png)"
]
},
{
@@ -76,7 +76,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![WB3](../../imgs/tutorial_whitebox_report_4.png)"
"![WB3](../../docs/imgs/tutorial_whitebox_report_4.png)"
]
},
{