Can AI Make the Team?

Humans and AI systems increasingly are moving into closer quarters to solve complex problems, and some industry observers say it’s time to plan for human-machine teaming.

This needs to happen “if we are to really scale the gap from just research prototypes to systems that deliver true business value,” according to Kartik Talamadupula, head of artificial intelligence at Wand Synthesis AI, which markets an AI operating system for hybrid workforces.

Working together means cooperative teaming, with humans and AI augmenting each other’s performance, and with one or the other sometimes trying to gain a competitive edge, Talamadupula said, referencing the Turing Test.

Karen Panetta, an IEEE fellow and dean for graduate education at Tufts University, believes it is “essential” for humans to partner with AI to learn new information, while making sure context is captured properly. “You’re not just going to let the AI [make decisions] without humans in the loop,” she said.

However, others aren’t so sure teaming with AI is a good idea. “When we say ‘partner,’ we anthropomorphize AI and that’s a mistake,” said Oren Etzioni, a professor emeritus at the University of Washington. “With humans, you have to take into account feelings. This is an inhuman partner,” so only its answers matter, he said.

We should not treat AI as a partner, but rather as a powerful tool whose input we need to carefully consider, added Etzioni, who is also co-founder of Vercept.com and former CEO of the Allen Institute for AI.

“It is really important to assess the strengths and weaknesses of the technology and have the right view,” Etzioni said. “People have gotten into trouble by being overly accepting of AI.” For example, in the legal system, AI has overly influenced people in sentencing decisions, he said. Etzioni has also seen the opposite occur, where doctors discounted AI’s decisions too much. “You can err in both directions,” he said.

AI and humans can have a symbiotic relationship, but not a partnership, he stressed. “I think the right approach is to treat it appropriately as valuable input, but you don’t want to trust it blindly,” Etzioni said.

A recent paper from the MIT Center for Collective Intelligence found that, on average, humans and AI systems working in tandem don’t outperform the best human-only or AI-only system. Even though humans and computers are working together on some of the most important AI use cases, “On average, human–AI combinations performed significantly worse than the best of humans or AI alone,” the study found.

To ensure a robotic agent is a true contributing team member, it must be given greater autonomy, Talamadupula said, which in turn requires trust and mechanisms to keep the system in check. There are few instances where such agents should be fully autonomous, since humans will need “explainability and observability in how [AI is] making decisions,” Panetta observed.

When AI and humans team, “There will be different levels of knowledge or breadth of knowledge of the AI, so there’s someone who needs to verify it’s making the right decision,” she said.

This includes areas where a person’s life is affected, where there must always be a human in the loop. “It is essential that a human being makes diagnostic decisions and prognoses, as well as treatment,” Panetta stressed. “We don’t want that to ever go away.”

Talamadupula agreed, saying that even “space exploration without humans…is impossible.” However, there are exceptions. In applications involving very harsh environments, such as underwater exploration, “Sometimes AI will have to make all the decisions,” he said.

While he endorses the concept, Talamadupula believes we are still far from humans and robots teaming. Even though large language models (LLMs) have enabled a broad and diverse set of AI use cases, they can’t yet be fully trusted.

“Think of LLMs as savants that can speak more intelligently about any topic under the sun; however, if you happen to be an expert on that topic, you’ll find the depth of what the LLM can talk about is not necessarily there,” Talamadupula noted. “They’re glorified sentence completers in some sense.”

Humans can’t fully trust chatbots yet because they lack depth, he said.

Etzioni echoed that, saying that “you ask it one thing and it will give you phenomenally good answers, but then you ask it [a question] slightly reworded and you could get a terrible answer with egregious errors.”

He cautioned that people need to be “very careful with the explanations [AI] gives. They can fail to reflect the actual process it went through” to come up with an answer.

To foster greater trust and transparency in LLMs for autonomous systems, humans need to focus on hallucinations, which are a very real concern, Etzioni said.

It’s important to consider whether an AI’s recommendations, insights, or actions taken on a person’s behalf can be reversed, such as a charge made to your credit card, he said.

For better or for worse, the can of worms has been opened: much of the world’s software now uses LLMs, yet they still lack reliability, oversight, and explainability, Talamadupula said.

To mitigate the risk that comes from deploying robots and LLMs, Talamadupula advised putting guardrails in place to limit their work to certain sandbox-like settings. For example, “Put a kill switch on autonomous cars and [have] a person [in the car] kill the machine if something catastrophic is about to happen,” he said.

Now is the time to plan for human-machine partnering. There is evidence showing the changes agentic systems have brought to different kinds of work, how good they are at automating repetitive tasks, and the breadth of expertise they pull from the Internet, Talamadupula said.

“What [AI is] not good at is judgment and determining if something is the right thing to do,” he said. “We need to combine the world of human judgment with repetitive work. So, it’s important to think about how humans exist with systems in this brave new world,” and how AI can “augment humans and the human experience.”

Further, “we need to get out of the confines of the chatbot system,” Talamadupula added. “We need to think about vision, touch, sense, movement, motion… There is no playbook for this. It has to be prioritized,” because right now, “we’re increasingly in a world where organizations are just looking for bang for the buck from these large language models—and there is not nearly enough of an incentive to research the shortcomings of these models.”

Esther Shein is a freelance technology and business writer based in the Boston area.