AutoML: Accelerate Your Machine Learning Process

kupas data
4 min read · Nov 29, 2023

Introduction

Machine learning is one of the most discussed topics in technology. Many educational institutions now offer dedicated courses or bootcamps on it, and its use has spread to fields such as health, plantations, and transportation.

One of the challenges in building a machine learning model is finding the algorithm that gives the best performance metrics. There are many algorithms to choose from, such as Random Forest, Decision Tree, and XGBoost, and several experiments usually need to be run to determine which one suits the problem at hand.
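To make those manual experiments concrete, here is a minimal sketch that compares a few algorithms with cross-validation in scikit-learn. The iris dataset, the 5-fold setting, and the use of scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, chosen only to keep the sketch fast.
X, y = load_iris(return_X_y=True)

# Candidate algorithms to try by hand (gradient boosting stands in for XGBoost).
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(results, key=results.get)
for name, score in results.items():
    print(f"{name}: {score:.3f}")
print("Best:", best_name)
```

Every new algorithm or setting means another entry in this loop, which is the repetitive work AutoML tools aim to take over.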

What is AutoML?

AutoML is a relatively new and growing subset of machine learning. Its main approach is to limit the involvement of data scientists and let the tool handle the time-consuming steps of machine learning, such as data preprocessing, algorithm selection, and hyperparameter tuning. This saves time when setting up ML models and speeds up their deployment.

AutoML can be seen as an advanced form of machine learning because:

· It automates the process of selecting algorithms when training machine learning models.

· It covers everything from data preprocessing to selecting the most suitable model.

· It handles hyperparameter tuning and model selection, tasks that typically require time and expertise.

· Users without experience in machine learning can train high-performing models with minimal effort.
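The points above can be illustrated with a small hand-written search over both the algorithm and its hyperparameters. This is only a sketch of the idea using scikit-learn's GridSearchCV. The wine dataset, the two candidate models, and the parameter grids are assumptions, and real AutoML tools search far larger spaces with smarter strategies.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Preprocessing and model live in one pipeline so both are tuned together.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Each dict swaps in a different algorithm together with its own hyperparameters.
search_space = [
    {
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "model": [RandomForestClassifier(random_state=0)],
        "model__n_estimators": [25, 50],
        "model__max_depth": [3, None],
    },
]

search = GridSearchCV(pipeline, search_space, cv=3, scoring="accuracy")
search.fit(X, y)

print("Selected algorithm:", type(search.best_params_["model"]).__name__)
print("Cross-validated accuracy:", round(search.best_score_, 3))
```

The search picks the algorithm and its settings in one pass, which is a simplified version of what an AutoML tool does automatically.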

Application

AutoML can be applied to most machine learning use cases, such as classification, regression, forecasting, computer vision, and NLP problems.

Example

There are several AutoML tools available in the market these days.

TPOT

TPOT is a Python AutoML tool that optimizes machine learning pipelines using genetic programming. TPOT is built on top of scikit-learn, so all of the code it generates should look familiar.

To install TPOT, run the following command: pip install tpot

Here is an example of using TPOT with various datasets
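One possible version of such an example, sketched on scikit-learn's digits dataset, is shown here. It assumes the classic TPOT API (releases before 1.0), where TPOTClassifier accepts generations, population_size, and verbosity; newer releases may name these differently. The function name run_tpot_demo and the output file name are hypothetical.

```python
def run_tpot_demo(generations=5, population_size=20, random_state=42):
    """Evolve a pipeline on the digits dataset and return its test accuracy."""
    # Imports sit inside the function so the sketch can be loaded
    # even on a machine where TPOT is not installed.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=random_state
    )

    # Genetic programming: each generation mutates and recombines pipelines.
    tpot = TPOTClassifier(
        generations=generations,
        population_size=population_size,
        verbosity=2,
        random_state=random_state,
    )
    tpot.fit(X_train, y_train)

    # Save the best pipeline found as plain scikit-learn code.
    tpot.export("tpot_digits_pipeline.py")
    return tpot.score(X_test, y_test)
```

Calling run_tpot_demo() can take several minutes, because every generation trains and evaluates many candidate pipelines.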

Auto-sklearn

Auto-sklearn is a Python-based open-source toolkit for AutoML and a drop-in replacement for a scikit-learn estimator. Built around the scikit-learn machine learning library, auto-sklearn automatically searches for the right learning algorithm for a new machine learning dataset and optimizes its hyperparameters.

Auto-sklearn handles classification and regression use cases. Its first version was introduced in 2015.

To install auto-sklearn, run the following command: pip install auto-sklearn

With just five lines of Python code, beginners can get predictions and experts can boost their productivity. Here are some of the main features of auto-sklearn:

· Written in Python, on top of scikit-learn

· Useful for many tasks, such as classification or regression

· Includes preprocessing methods (normalizing data, scaling data, handling missing values)

· Searches for optimal ML pipelines within a large search space

· State-of-the-art results thanks to meta-learning, Bayesian optimization, and ensemble techniques.
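As a rough illustration of how little code is involved, the sketch below wraps a basic auto-sklearn classification run in a function. The digits dataset, the time budgets, and the function name run_autosklearn_demo are assumptions. Auto-sklearn officially targets Linux, so this may not install elsewhere.

```python
def run_autosklearn_demo(time_left_for_this_task=120, per_run_time_limit=30):
    """Fit auto-sklearn on the digits dataset and return its test accuracy."""
    # Imports sit inside the function so the sketch can be loaded
    # even on a machine where auto-sklearn is not installed.
    import autosklearn.classification
    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Both budgets are in seconds: one for the whole search, one per model fit.
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_left_for_this_task,
        per_run_time_limit=per_run_time_limit,
    )
    automl.fit(X_train, y_train)

    # The fitted object predicts like any scikit-learn estimator.
    y_pred = automl.predict(X_test)
    return accuracy_score(y_test, y_pred)
```

The core of the run is the constructor, fit, and predict calls. A larger time budget generally lets the search evaluate more candidate pipelines.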

MLBox

MLBox is a powerful Automated Machine Learning Python library. It provides the following features:

· Fast reading and distributed data preprocessing/cleaning/formatting

· Highly robust feature selection and leak detection

· Accurate hyper-parameter optimization in high-dimensional space

· State-of-the-art predictive models for classification and regression

· Prediction with model interpretation

H2O.ai

H2O AutoML can be used to automate the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time limit.

H2O offers a number of model explainability methods that apply to AutoML objects (groups of models), as well as individual models (e.g. the leader model). Explanations can be generated automatically with a single function call, providing a simple interface for exploring and explaining the AutoML models.

H2O is available for R and Python.
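In Python, a time-limited H2O AutoML run for a classification problem might look like the sketch below. The function name run_h2o_automl_demo, the csv_path and target arguments, the 80/20 split, and the 60-second budget are assumptions for illustration. H2O also needs a Java runtime to start its local cluster.

```python
def run_h2o_automl_demo(csv_path, target, max_runtime_secs=60, explain=False):
    """Train H2O AutoML on a CSV file and return the leader model."""
    # Imports sit inside the function so the sketch can be loaded
    # even on a machine where h2o is not installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts or connects to a local H2O cluster

    frame = h2o.import_file(csv_path)
    # Treat the target as categorical so AutoML runs classification.
    frame[target] = frame[target].asfactor()
    train, test = frame.split_frame(ratios=[0.8], seed=1)
    features = [col for col in frame.columns if col != target]

    # Train and tune many models within the user-specified time limit.
    aml = H2OAutoML(max_runtime_secs=max_runtime_secs, seed=1)
    aml.train(x=features, y=target, training_frame=train)

    # The leaderboard ranks every model trained during the run.
    print(aml.leaderboard.head())

    if explain:
        # Single call that produces explanations for the AutoML models.
        aml.explain(test)
    return aml.leader
```

The returned leader is the top-ranked model on the leaderboard and can be used for predictions on new H2O frames.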

Here is an example of model results from H2O

Thank you everyone

To recap, the question is: will AutoML replace data scientists?

It is a common question that arises whenever the topic of AutoML is brought up. Besides the difficulty of automating many data science tasks, replacement is not really the point of AutoML. Its purpose is to assist data scientists and free them from the burden of repetitive work, so they can invest their time in more challenging tasks.

References

1. TPOT : https://epistasislab.github.io/tpot/

2. Auto-sklearn : https://automl.github.io/auto-sklearn/master/ & https://neptune.ai/blog/a-quickstart-guide-to-auto-sklearn-automl-for-machine-learning-practitioners

3. MLBox : https://github.com/AxeldeRomblay/MLBox

4. H2O.ai : https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
