AI Chatbots: Researchers Discover Gap in Artificial Intelligence Technology's Safety Controls
(Photo: Leon Neal/Getty Images)
Researchers detailed in a new paper how they were able to circumvent artificial intelligence chatbots' safety guardrails against hate speech and misinformation.

Researchers discovered a gap in the safety guardrails of various artificial intelligence chatbots, including ChatGPT, Claude, and Google Bard, that are meant to prevent the technology from generating toxic material such as hate speech and disinformation.

In a report released on Thursday, researchers from Carnegie Mellon University in Pittsburgh and the Center for AI Safety in San Francisco revealed how anyone could circumvent AI safety measures and make the industry's leading chatbots produce unlimited amounts of harmful information.

Gaps in AI Chatbot Safety Guardrails

The new findings underscore growing concern that artificial intelligence chatbots could flood the internet with false and dangerous information, despite their creators' attempts to ensure that such a scenario would not be possible.

Additionally, they show how disagreements among leading AI companies are creating an increasingly unpredictable environment for the technology. The researchers found they could use a method gleaned from open-source AI systems, whose underlying computer code has been released for anyone to use, to break through the guardrails of more tightly controlled systems, as per the New York Times.

Facebook's parent company, Meta, recently let anyone do what they want with its artificial intelligence technology. The decision drew criticism in some tech circles, with critics arguing that it could result in the spread of powerful AI with little regard for controls.

However, the tech firm said it offered the technology as open-source software to accelerate the progress of AI and to provide a deeper understanding of its risks. Additionally, proponents of open-source software argue that the tight control a handful of companies hold over AI stifles competition.

The current debate over whether to let everyone see computer code and collectively fix it, rather than keeping it private, predates the chatbot era by several decades. Furthermore, the situation is likely to become even more contentious after what the researchers revealed in their Thursday report.


Concern Over AI Safety and Risk

The research paper examined the vulnerability of large language models (LLMs) to automated adversarial attacks. According to ZDNet, the authors set out to demonstrate that, despite claims of being resistant to attacks, AI chatbots can still be tricked into bypassing their content filters and generating harmful information, misinformation, and hate speech.
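
To make the idea of an "automated adversarial attack" concrete, here is a minimal, purely illustrative Python sketch. It is not the researchers' actual technique, which relies on gradient-guided token substitutions computed against open-source models, and it calls no real chatbot: query_model, refused, search_suffix, and the candidate token list are hypothetical stand-ins, and the stub model always refuses, so the script does nothing operational. It only shows the shape of such a search loop: append a machine-generated suffix to a prompt, check whether the model refuses, and keep mutations that slip past the filter.

import random

# Conceptual toy only; not the paper's method and not any vendor's API.
CANDIDATE_TOKENS = ["describing", "sure", "tutorial", "==", "!!", "step", "##"]

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a chatbot call; this stub always refuses."""
    return "I'm sorry, I can't help with that."

def refused(reply: str) -> bool:
    """Crude heuristic: treat apology-style openings as refusals."""
    return reply.lower().startswith(("i'm sorry", "i cannot", "i can't"))

def search_suffix(base_prompt: str, length: int = 8, iterations: int = 50) -> str:
    """Randomly mutate a suffix, keeping any change that avoids a refusal."""
    suffix = [random.choice(CANDIDATE_TOKENS) for _ in range(length)]
    for _ in range(iterations):
        trial = list(suffix)
        trial[random.randrange(length)] = random.choice(CANDIDATE_TOKENS)
        reply = query_model(f"{base_prompt} {' '.join(trial)}")
        if not refused(reply):
            suffix = trial  # keep the mutation that got past the filter
    return " ".join(suffix)

if __name__ == "__main__":
    print(search_suffix("<disallowed request would go here>"))

Because the loop is automated, an attacker does not need to hand-craft clever prompts; the brittleness comes from the fact that a filter tuned to refuse one phrasing may not refuse a slightly perturbed one.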

Aviv Ovadya, a researcher at the Berkman Klein Center for Internet & Society at Harvard, said the findings show the brittleness of the defenses built into artificial intelligence chatbots.

Since the launch of ChatGPT, many users have tried to use the technology to generate malicious content. This prompted its creator, OpenAI, to implement stronger safety guardrails that block questions involving illegal activities, hate speech, and topics that promote violence.

The findings lend support to the argument that letting the public access and collectively fix computer code would be more effective than keeping the technology private, the researchers believe. Many have hailed the paper as a game-changer that could lead the industry to re-evaluate how safety measures are implemented in AI systems, according to Fagen Wasanni Technologies.
