Microsoft’s Bing A.I. made several factual errors in last week’s launch demo


Microsoft CEO Satya Nadella

Jordan Novet | CNBC

During last week’s chatbot hype, with Microsoft and Google attempting to outduel each other in showcasing early versions of artificial intelligence-powered search, more than 1 million people signed up to try Microsoft’s tool in the first 48 hours, the company said.

Microsoft CEO Satya Nadella told CNBC that the technology, which can spit out complete answers that read like they were written by a human, was “perhaps the industrial revolution brought to knowledge work.”

But for those concerned about accuracy, the AI leaves plenty to be desired.

In Microsoft’s demo in front of reporters, the ChatGPT-like technology embedded in the company’s Bing search engine analyzed earnings reports from Gap and Lululemon. A comparison of its answers with the actual reports shows that the chatbot got some numbers wrong. Others appear to have been made up.

“Bing AI got some answers completely wrong during their demo. But no one noticed,” wrote independent search researcher Dmitri Brereton in a Substack post on Monday. “Instead, everyone jumped on the Bing hype train.”

In addition to the financial errors, Brereton identified possible factual issues in the demo’s responses about vacuum cleaner specifications and travel plans to Mexico. He told CNBC he wasn’t initially looking for errors, and only discovered them when he looked more closely to write a comparison of the AI unveilings from Microsoft and Google.

AI experts call the phenomenon “hallucination,” or the propensity of tools based on large language models to simply make stuff up. Last week, Google introduced a competing AI tool that also included factual errors — although the mistakes were quickly called out by viewers.

Both companies are rushing to incorporate new kinds of generative AI into search engines and are eager to show their advancements following the explosion of ChatGPT, which OpenAI introduced to the public in November. OpenAI has raised billions from Microsoft, while competing startups like Stability AI and Hugging Face also have ballooned to billion-dollar valuations in private funding rounds.

While Google has been reluctant to add AI-generated responses into search engines, citing reputational risk and safety concerns, Microsoft, in its announcement last week, stressed the short-term potential of releasing the technology to some of the public.

“I think it’s important not to be in a lab,” Nadella said. “You have to get these things out safely.”

When it came time to demo Bing AI’s response to a query on corporate earnings, there were some problems.

Yusuf Mehdi, a marketing executive at Microsoft, navigated to Gap’s investor relations site, and asked the Bing AI to summarize the “key takeaways” from the retailer’s third-quarter earnings release in November.

“Very cool. A massive time savings,” Mehdi said.

These are screen shots from Microsoft’s demo:

Here are some mistakes in the summary:

  • Gap’s reported gross margin was 37.4%. But after excluding charges related to Yeezy, the adjusted gross margin was 38.7%.
  • Gap’s operating margin was 4.6%, not 5.9%, a number that can’t be found in the company’s report.
  • Adjusted diluted earnings per share was $0.71, not $0.42, a number that’s not in the report. The figure Gap reported included an adjusted income tax benefit of about $0.33.
  • Gap pulled its full-year outlook in August and said in the third-quarter report that “net sales could be down mid-single digits year-over-year in the fourth quarter.” That would imply a decline in revenue for the full year as opposed to “growth in the low double digits.” There is no forecast for operating margin or EPS.

Microsoft said it knows about the errors and that it expects Bing AI to make mistakes.

“We’re aware of this report and have analyzed its findings in our efforts to improve this experience,” a Microsoft spokesperson told CNBC. “We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better.”

Microsoft then asked Bing AI to compare Gap’s earnings with Lululemon’s report. Mehdi wanted Bing to pull the information from the two reports into a table.

“Look how amazing this is,” he said. “Just like that, in one table, I can get an answer to this question. Think how much time that would’ve taken otherwise.”

Here’s what the Bing AI tool returned:

There are several errors in the table, starting with margins.

  • Lululemon’s gross margin was 55.9%, not 58.7%.
  • The company’s operating margin was 19%, not 20.7%.
  • Lululemon reported diluted EPS of $2, and adjusted EPS of $1.62. Bing showed a diluted EPS number of $1.65.
  • Gap had $679 million in cash and cash equivalents, not $1.4 billion.
  • Gap had $3.04 billion in inventory, not $1.9 billion.
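
The comparisons above come down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration of the kind of check involved, not the method Brereton or CNBC used. The figures are the ones listed in this article, while the names `CLAIMS` and `find_mismatches` and the tolerance value are assumptions made for the example.

```python
# Figures as listed in this article: (metric, reported value, value shown by Bing AI).
# Percentages are in percentage points; cash and inventory are in billions of dollars.
# CLAIMS and find_mismatches are hypothetical names used only for this sketch.
CLAIMS = [
    ("Gap operating margin (%)", 4.6, 5.9),
    ("Gap adjusted diluted EPS ($)", 0.71, 0.42),
    ("Lululemon gross margin (%)", 55.9, 58.7),
    ("Lululemon operating margin (%)", 19.0, 20.7),
    ("Gap cash and cash equivalents ($B)", 0.679, 1.4),
    ("Gap inventory ($B)", 3.04, 1.9),
]


def find_mismatches(claims, tolerance=0.005):
    """Return (metric, reported, shown, difference) for every claim whose
    shown value differs from the reported value by more than `tolerance`."""
    mismatches = []
    for metric, reported, shown in claims:
        # Round to avoid floating-point noise in the displayed difference.
        diff = round(shown - reported, 3)
        if abs(diff) > tolerance:
            mismatches.append((metric, reported, shown, diff))
    return mismatches


if __name__ == "__main__":
    for metric, reported, shown, diff in find_mismatches(CLAIMS):
        print(f"{metric}: reported {reported}, Bing showed {shown} ({diff:+})")
```

A check like this only flags numbers that disagree with a source. It cannot tell whether a figure was misread or invented, which still requires reading the original report.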
