Causal Inference Makes Sense of AI

Classical machine learning (ML) is remarkably effective at finding patterns and associations in data. It can spot correlations that escape human eyes and minds. Yet the technology suffers from a serious liability: it doesn’t understand why events take place or how one outcome impacts another.

This has huge ramifications in everything from medicine to self-driving vehicles. For example, an AI system might identify a correlation between certain environmental conditions and cancer, but it can’t determine which factor caused the disease. Similarly, a self-driving car might recognize objects—pedestrians, crosswalks, and signals—but it is unable to grasp underlying behaviors and actions.

Causal inference aims to produce AI systems that operate better in the real world. “Causal Inference (CI) methodology provides tools for deciding what additional assumptions are needed to substantiate causal and counterfactual claims from a given type of data,” said Judea Pearl, a professor in the Computer Science Department of the University of California, Los Angeles (UCLA), 2011 ACM A.M. Turing Award laureate, and co-author of The Book of Why.

The framework makes AI systems smarter by injecting real-world logic and cause-and-effect dynamics into decision making. “There’s a core element of causality that’s intertwined with all critical tasks,” said Emre Kiciman, senior principal research manager at Microsoft Research.

Beyond Correlation

If machine learning and classical AI have a superpower, it’s the ability to identify statistical relationships within enormous datasets. This makes these systems ideally suited for tasks such as image recognition, predictive text generation, and language translation. Although AI systems commit occasional errors, they are dependable enough to have gained widespread adoption.

Statistical correlations do not imply causality, however. Because LLMs, predictive analytics, and other forms of AI have a limited grasp of the underlying data, they can go off the rails and produce incorrect and even dangerous conclusions. The problem grows worse as a model becomes more complex and situations become more abstract. “Very often an LLM spits out something that seems convincing because it has simply ‘read’ a lot about the subject. LLMs are only as good as what they have read,” said Ilya Shpitser, John C. Malone Associate Professor of Computer Science at Johns Hopkins University.

The problem is rooted in a basic truth about LLMs. Explained Pearl, “Instead of training themselves on observations obtained directly from the environment, they are trained on linguistic texts written by authors who already have causal models of the world. The programs can then extract information from the text without experiencing any of the underlying data.” The resulting sequence of linguistic extrapolations is, in a weak sense, a reflection of the authors’ causal understanding.

In practical terms, this means that using LLMs and predictive analytics to make decisions without CI lies somewhere between dicey and dangerous—particularly in areas like agriculture, economics, healthcare, and law where “there’s a very low level of assurance that the model will produce good information,” Shpitser said.

Getting to Why

Constructing more advanced Causal AI models is an area of intense focus. For example, Kiciman and fellow researchers found that LLMs generate text corresponding to correct causal arguments about 97% of the time for basic tasks and 86% of the time for more challenging event-driven tasks. However, these numbers aren’t good enough to deploy CI frameworks for critical tasks, where failures could lead to catastrophic results.

The good news? “We now understand how the two paradigms, ML and CI, can work in symbiotic harmony,” Pearl said. Simply put: the CI component identifies the properties needed to answer real-world questions based on qualitative assumptions, while the ML component pinpoints the best estimates of those properties from the data.
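To make that division of labor concrete, the sketch below simulates a single confounded dataset and contrasts a naive associational estimate with a backdoor-adjusted one. The variables, coefficients, and use of scikit-learn are illustrative assumptions, not details drawn from Pearl’s or Kiciman’s work.

```python
# Minimal sketch: CI supplies the identification step (adjust for confounder Z),
# while ML supplies the estimation step (fit a model that includes Z).
# All variables and effect sizes here are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 50_000

z = rng.normal(size=n)                           # confounder (e.g., disease severity)
t = (z + rng.normal(size=n) > 0).astype(float)   # treatment assignment, influenced by Z
y = 2.0 * t - 3.0 * z + rng.normal(size=n)       # outcome; the true treatment effect is +2.0

# Purely associational estimate: biased, because Z drives both T and Y.
naive = y[t == 1].mean() - y[t == 0].mean()

# CI step: the assumed causal graph says Z opens a backdoor path, so adjust for it.
# ML step: estimate E[Y | T, Z] and read off the coefficient on T.
model = LinearRegression().fit(np.column_stack([t, z]), y)
adjusted = model.coef_[0]

print(f"naive difference:  {naive:+.2f}")     # far from +2.0
print(f"adjusted estimate: {adjusted:+.2f}")  # close to +2.0
```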

By combining scientific knowledge and data, Causal AI models can discover valid links that might otherwise go unnoticed. For example, they can fuse data from several sources and identify factors that render a medical treatment harmful to one patient and beneficial to another, previously indistinguishable, patient. Likewise, they can generate explanations based on retrospective or counterfactual thinking, for instance: what if this patient had never smoked?
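The “what if this patient had never smoked?” style of question can be made concrete with the standard abduction-action-prediction recipe for counterfactuals. The toy structural causal model below, including the smoking-tar-risk structure and every coefficient, is invented purely for illustration.

```python
# Minimal sketch of a counterfactual query on a toy structural causal model (SCM).
# The smoking -> tar -> risk structure and all coefficients are invented for
# illustration; this is not a validated medical model.

def structural_equations(smoking, u_tar, u_risk):
    """Each variable is a deterministic function of its parents plus its own noise term."""
    tar = 0.9 * smoking + u_tar
    risk = 0.6 * tar + 0.2 * smoking + u_risk
    return tar, risk

# Observed patient: a smoker with measured tar exposure and risk score.
smoking_obs, tar_obs, risk_obs = 1.0, 1.1, 0.95

# Step 1 (abduction): recover this patient's noise terms from the observation.
u_tar = tar_obs - 0.9 * smoking_obs
u_risk = risk_obs - 0.6 * tar_obs - 0.2 * smoking_obs

# Step 2 (action): intervene on the model by setting smoking to 0 ("had never smoked").
# Step 3 (prediction): re-run the structural equations with the same noise terms.
_, risk_counterfactual = structural_equations(0.0, u_tar, u_risk)

print(f"observed risk score:       {risk_obs:.2f}")
print(f"counterfactual risk score: {risk_counterfactual:.2f}")
```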

Causal AI also delivers a path to explainability within AI and machine learning models. In many cases, it’s possible to interrogate a model and understand how the system arrived at a specific outcome. It’s also possible to adopt a human-managed approach that emphasizes accountability, governance, contestability, and redress. A critical factor in success, Shpitser said, is validating data and ensuring that statistical systems are working effectively. “Otherwise, you wind up with spurious associations.”

Businesses, governments, and others are turning to Causal AI. The World Economic Forum describes the approach as a “revolution.” Already, the technology is contributing to a better understanding of medical data, identifying social influences related to food choices, modeling regulatory decision making and social policy, promoting precision agriculture, and fueling better business decisions. Within a few years, Causal AI could unleash personalized medicine and identify highly effective climate change policies.

Beyond Reason

Researchers continue to explore ways to construct models and algorithms optimized for Causal AI. For instance, “We’ve discovered that large language models actually capture a lot of domain knowledge, and we can tap into that for causal analysis,” Kiciman said.

Kiciman aims to combine the strengths of large language models with human oversight and a better understanding of misleading correlations to forge stronger causal inference frameworks. So far, this “bootstrapping” approach has resulted in a 50% time savings in designing high-quality analysis models. “The LLM takes care of the tedious, obvious parts,” he said.
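As a rough illustration of that workflow, the sketch below has a language model draft pairwise causal judgments that a human expert then reviews. The ask_llm function is a hypothetical placeholder and the variable list is invented; neither reflects the researchers’ actual tooling.

```python
# Hypothetical sketch of the "LLM proposes, human reviews" bootstrapping workflow.
# ask_llm() is a placeholder, not a real API, and the variable list is invented.
from itertools import combinations

def ask_llm(var_a: str, var_b: str) -> str:
    """Placeholder: ask a language model which causal direction, if any, is plausible."""
    prompt = f"Which is more plausible: '{var_a}' causes '{var_b}', the reverse, or neither?"
    # Send `prompt` to the chat model of your choice and parse the reply; a fixed
    # answer is returned here so the sketch runs on its own.
    return "a->b"

variables = ["exercise", "blood pressure", "age"]

proposed_edges = []
for a, b in combinations(variables, 2):
    answer = ask_llm(a, b)  # the LLM drafts the tedious, obvious pairwise judgments
    if answer == "a->b":
        proposed_edges.append((a, b))
    elif answer == "b->a":
        proposed_edges.append((b, a))

# Human-in-the-loop step: a domain expert confirms or rejects each proposed edge
# before the graph is handed to a causal inference library for identification.
for cause, effect in proposed_edges:
    print(f"review proposed edge: {cause} -> {effect}")
```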

Stamping out biases that can skew decision-making is at the center of Shpitser’s research. “There’s an important question that we need to ask: Can we train predictors to avoid discriminatory behavior?” he asked. He believes that it’s vital to scrutinize inference methods, algorithms, and data quality—including considering missing data components and possible measurement errors. There’s also a need to understand that data often holds a mirror to underlying societal problems.

Finally, transparency and validation standards for causal inference must continue to improve, Kiciman said. While Causal AI will likely enjoy a bright future, it’s always important to recognize the risks of handing off critical decisions to machines. “When it comes to critical decision-making scenarios—including or especially those that affect people’s physical or mental well-being—it’s important to proceed cautiously,” he concluded.

Samuel Greengard is an author and journalist based in West Linn, OR, USA.