
A: A random forest is a machine-learning method that makes predictions by combining the decisions of many simpler models called decision trees. A decision tree works like an upside-down tree, traversed from the root at the top down to the leaves. At each node, it asks a question about the data, e.g. “is this person’s age greater than 30?”. Depending on the answer, it moves left or right until it reaches a final decision at a ‘leaf’.
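To make this concrete, a single decision tree can be sketched as a nest of if/else questions. The minimal Python sketch below walks one example down a two-level tree; the features, thresholds, and answers are invented purely for illustration, not taken from any real model:

```python
# A hypothetical two-level decision tree for a yes/no prediction.
# Feature names and thresholds are illustrative only.
def predict_one(person):
    if person["age"] > 30:             # root node asks the first question
        if person["income"] > 50_000:  # next node asks a follow-up question
            return "yes"               # leaf: final decision
        return "no"                    # leaf
    return "no"                        # leaf

print(predict_one({"age": 42, "income": 60_000}))  # -> "yes"
```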
While single trees are easier to understand, they can also overfit the data, i.e. they may learn small quirks of the training data that don’t generalise well. A random forest mitigates this issue by building a large number of trees, each trained on a slightly different random sample of the data, often with only a random subset of the features considered at each split.
When asked to make a prediction, every tree gives an answer. For classification problems, the random forest picks the most common answer. For regression problems, it averages the numerical outputs. Because random errors and quirks tend to cancel out when the verdicts of many trees are combined, a random forest is often more accurate than any single tree while still being relatively straightforward to use.
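The following sketch shows the whole bagging-and-voting idea in Python, using scikit-learn’s DecisionTreeClassifier as the base model. The dataset, number of trees, and seeds are arbitrary choices made for the example, assuming a binary classification task:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A toy dataset; sizes and seeds are arbitrary illustration choices.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 50
trees = []
for _ in range(n_trees):
    # Bootstrap: sample training rows with replacement, so each tree
    # sees a slightly different version of the data.
    idx = rng.integers(0, len(X_train), len(X_train))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Classification: every tree votes; the forest returns the majority answer.
def forest_predict(X):
    votes = np.array([t.predict(X) for t in trees])  # (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

accuracy = (forest_predict(X_test) == y_test).mean()
print(f"forest accuracy: {accuracy:.2f}")
```

In practice one would simply use scikit-learn’s ready-made sklearn.ensemble.RandomForestClassifier, which performs this bootstrapping and voting internally; the hand-rolled loop above is only meant to expose the mechanism.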
In a study published on November 17 in the journal PNAS, for instance, scientists trained random forest models to learn the “chemical fingerprints” of fossils, and then used the models to predict whether organic molecules preserved in rocks came from lifeforms or from non-biological processes. On this basis, they reported evidence of photosynthetic microbes from 2.5 billion years ago.
