Machine Learning

Machine Learning is a subfield of Artificial Intelligence where parameters of an algorithm are updated from data inputs or by interacting with an environment.

What is Machine Learning?

Machine Learning is a subfield of Artificial Intelligence where algorithms update their parameters (learn) by interacting with an environment or by going through a training dataset, i.e. a fixed set of data used to tune the algorithm's parameters. Machine Learning is primarily used to make predictions using statistics; applying those statistical methods to business problems is the realm of predictive analytics. Most recent advances in Machine Learning have been driven by research in Deep Learning, the combination of Machine Learning and multi-layer artificial neural networks.

Why use Machine Learning?

The advantage of using Machine Learning instead of traditional Artificial Intelligence algorithms is that models can adapt to different environments or data without being explicitly programmed to do so. In that sense, it is more flexible than rule-based Artificial Intelligence, where humans often have to hard-code expert knowledge about the task at hand. A striking example is AlphaZero beating the best (heavily engineered) chess programs with no prior chess knowledge beyond the rules of the game.

Machine learning is typically applied to two kinds of problem. The first is giving a numeric answer to a question (e.g. "what will be the price of bitcoin tomorrow?"), which we call a regression problem; the simplest example is linear regression, i.e. a straight line fitted through the data. The second is classification, where the objective is to tell which group a given input belongs to. A popular example is determining whether an image contains a cat or a dog. A minimal regression sketch follows.
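To make the regression case concrete, here is a minimal sketch (using NumPy and synthetic, purely illustrative data) that fits a straight line to noisy points with ordinary least squares and then uses it to make a prediction:

```python
# Fit y = w*x + b to noisy synthetic data by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)  # true slope 2.5, intercept 1.0

# Solve the least-squares problem for [w, b]
A = np.stack([x, np.ones_like(x)], axis=1)
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"estimated slope {w:.2f}, intercept {b:.2f}")
print(f"prediction at x=12: {w * 12 + b:.2f}")
```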

Types of Machine Learning

The three most important themes in machine learning are supervised learning, unsupervised learning and reinforcement learning. Data mining falls into the scope of unsupervised learning, where the goal is to extract insights about the data. Traditionally, these approaches made extensive use of decision trees, support-vector machines, Bayesian networks and genetic algorithms (defined below), though in the past five years deep learning has become omnipresent in machine learning algorithms.

  1. Supervised Learning: the model learns from training data with labels in order to predict the labels of test data the algorithm has not seen. This can be, for instance, a classifier able to tell whether an image shows a dog or a cat after being exposed to millions of labelled images of cats and dogs. "Supervised" refers to the model learning from correct questions (training examples) and answers (training labels), as if a teacher were supervising the learning by providing the material. A minimal sketch is given after this list.

  2. Unsupervised Learning: the model does not have access to any labels. One popular application is clustering, where the goal is to identify natural groups of examples in the dataset, for instance to characterise the behaviour of a website's users. Another idea is to find datapoints that lie far away from what we would expect given the rest of the distribution; such a point is called an outlier, or an anomaly in anomaly detection. This is used in computer security to detect suspicious behaviour, for instance non-human-like requests. A clustering sketch is given after this list.

  3. Reinforcement Learning: an agent interacts with an environment by observing states and receiving rewards for its actions. It shares some components of agent-based modeling, such as agents that interact with an environment, where the state is essentially the collection of all the agents' properties. However, in single-agent reinforcement learning we do not consider the rules as something separate from the environment, but as a part of the environment that the agent needs to learn. In Multi-Agent Reinforcement Learning we can either consider each agent as being in its own single-agent reinforcement learning environment, or provide ways for agents to communicate so they can share their plans, states and rewards. For more details about reinforcement learning and how to apply it with neural networks, see the glossary page on deep reinforcement learning. A minimal tabular example is given after this list.

  4. Data Mining: methods used in data mining broadly overlap with machine learning techniques, especially unsupervised learning. However, the main goal is to discover underlying properties of the data, without any requirement of "learning". The discovered properties can then be used by predictive algorithms.

  5. Decision Trees: at each node of the tree a specific variable of the data is assessed, which determines which child node is visited next. This algorithm can be used to reach a decision or to classify examples. Decision trees, coupled with more complex ensemble methods such as "Random Forest", are often enough to win Kaggle competitions. A minimal sketch is given after this list.

  6. Support-vector machines: work for both classification and regression by mapping the inputs into a higher-dimensional space, using what is known as the kernel trick. A minimal sketch is given after this list.

  7. Bayesian Networks: assume the training data is a table of multiple variables whose dependencies can be modeled by a directed acyclic graph. Such a graph, known as a Bayesian network, can be used to compute probability distributions using the chain rule and conditional probabilities. The resulting model is easily interpretable and is therefore sometimes preferred over black-box solutions such as deep neural networks. A minimal sketch is given after this list.

  8. Genetic Algorithms: each agent is defined by some information (its genome), and the survival of its genes depends on how well it performs a certain task. This mimics how evolution produced intelligent behavior: by running a multi-agent simulation over many steps, the expectation is that after repeated crossovers and mutations the correct behaviors (corresponding to the right genes) will spontaneously emerge. A minimal sketch is given after this list.
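For supervised learning (item 1 above), here is a minimal sketch of one of the simplest possible classifiers, a 1-nearest-neighbour model that "learns" labelled training points and predicts the label of the closest one. The data and labels are synthetic and purely illustrative:

```python
# 1-nearest-neighbour classification on a toy labelled dataset.
import numpy as np

# Training data: two labelled clusters ("cat" = 0, "dog" = 1)
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # class 0
                    [4.0, 4.2], [3.8, 4.0], [4.1, 3.9]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

def predict(x):
    # Copy the label of the nearest training example
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict(np.array([1.0, 1.0])))  # -> 0
print(predict(np.array([4.0, 4.0])))  # -> 1
```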
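For unsupervised learning (item 2 above), a common clustering method is k-means, which groups unlabelled points around k centroids. This sketch uses two synthetic, well-separated blobs of points purely for illustration:

```python
# k-means clustering of two unlabelled blobs of points.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (20, 2)),
               rng.normal([5, 5], 0.5, (20, 2))])

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]
for _ in range(10):
    # Assign each point to its nearest centroid, then move centroids to the mean
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
    centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])

print(centroids)  # roughly the centres of the two blobs
```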
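For reinforcement learning (item 3 above), here is a minimal tabular Q-learning sketch on a toy five-state corridor where the agent receives a reward of 1 for reaching the right-hand end. The environment and hyperparameters are illustrative assumptions, not a standard benchmark:

```python
# Tabular Q-learning on a 5-state corridor environment.
import random

n_states = 5                      # states 0..4, reward for reaching state 4
actions = [-1, +1]                # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for _ in range(200):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection (ties broken at random)
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: (Q[(s, act)], random.random()))
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) towards r + gamma * max_a' Q(s', a')
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training the greedy policy should be "always move right"
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])
```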
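For decision trees (item 5 above), this sketch uses scikit-learn's DecisionTreeClassifier on a made-up table of animal measurements; the feature names and values are illustrative assumptions:

```python
# A small decision tree classifying cats vs dogs from two toy features.
from sklearn.tree import DecisionTreeClassifier

# Features: [weight_kg, ear_length_cm]; labels: 0 = cat, 1 = dog
X = [[4.0, 6.0], [3.5, 5.5], [4.2, 6.5],
     [20.0, 10.0], [25.0, 12.0], [18.0, 9.5]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[5.0, 6.0], [22.0, 11.0]]))  # -> [0 1]
```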
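For support-vector machines (item 6 above), this sketch uses scikit-learn's SVC with an RBF kernel on the XOR problem, which is not linearly separable in the original input space but becomes separable after the kernel mapping. The kernel parameters are arbitrary illustrative choices:

```python
# An RBF-kernel SVM solving the (non linearly separable) XOR problem.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR labels

clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict(X))  # -> [0 1 1 0]
```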
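For Bayesian networks (item 7 above), this sketch factorises a three-variable joint distribution with the chain rule, P(R, S, W) = P(R) * P(S | R) * P(W | R, S), and answers a query by summing out a variable. The network structure and probability tables are made up for illustration:

```python
# A tiny Bayesian network: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | R=True)
               False: {True: 0.4, False: 0.6}}    # P(S | R=False)
P_wet = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}  # P(W=True | R, S)

def joint(r, s, w):
    # Chain rule: P(R, S, W) = P(R) * P(S | R) * P(W | R, S)
    pw = P_wet[(r, s)] if w else 1.0 - P_wet[(r, s)]
    return P_rain[r] * P_sprinkler[r][s] * pw

# P(Rain=True | WetGrass=True), summing out the sprinkler variable
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(num / den)
```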
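Finally, for genetic algorithms (item 8 above), this sketch evolves bitstrings towards the all-ones string (the classic "OneMax" toy task) using selection, crossover and mutation. Population size, mutation rate and the fitness function are illustrative choices:

```python
# A genetic algorithm evolving bitstrings towards all ones (OneMax).
import random

GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 60

def fitness(genome):
    return sum(genome)  # number of 1s: higher is better

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # Selection: keep the fitter half of the population as parents
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    # Crossover + mutation to refill the population
    children = []
    while len(parents) + len(children) < POP_SIZE:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, GENOME_LEN)
        child = a[:cut] + b[cut:]                                          # single-point crossover
        child = [bit ^ 1 if random.random() < 0.01 else bit for bit in child]  # mutation
        children.append(child)
    population = parents + children

print(fitness(max(population, key=fitness)), "out of", GENOME_LEN)
```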

Applying Machine Learning in practice

HASH's platform allows users to easily create environments for multi-agent simulations. These can be modified for synchronous multi-agent reinforcement learning problems, where each agent navigates and discovers solutions to arbitrarily complex systems. It is also possible to simulate entire populations of agents and run genetic algorithms.

For supervised learning approaches, datasets can be added to the simulation as standard tables (CSV files) or standardized web formats (JSON files).
