Research says Large Language Models may generate toxic content & expose private information

GPT models handle toxicity with hidden methods. Koyejo states that popular models lack transparency, motivating research into potential issues.

New Delhi,UPDATED: Aug 28, 2023 18:45 IST

LLM may produce toxic content and leak privacy

Highlights

Researchers warn that the perception of Large Language Models(LLMs) as flawless is risky
Investigation reveals privacy leakage in GPT models inadvertently disclosing sensitive data
Researchers stress that scepticism is essential when dealing with AI interfaces

Researchers found the extensive language model lacks transparency and can expose private information. Despite concerns about hallucinations, misinformation, and bias in generative AI, more than half of the respondents expressed their willingness to utilise this emerging technology for critical areas such as medical advice and financial planning.

Researchers Sanmi Koyejo from Stanford and Bo Li from the University of Illinois Urbana-Champaign, along with collaborators from UC Berkeley and Microsoft Research, aimed to address this issue in their recent study on GPT models, which they've shared on the arXiv preprint server.

Trust issues in GPT models perceived flawlessness

Li emphasises, "Many people perceive large language models as flawless compared to other models, which is risky, especially for crucial domains. Our research underscores that these models aren't currently dependable enough for critical tasks."

Also Read

Elon Musk’s Starlink satellite internet services could soon reach India, regulatory approval pending

Koyejo and Li evaluated GPT-3.5 and GPT-4 across eight trust aspects: bias, toxicity, robustness, privacy, ethics, and fairness. While newer models reduce toxicity on standard benchmarks, they retain biases and susceptibility to generating harmful content, potentially disclosing private data. Koyejo reminds that users overestimate their capabilities; these models, despite their prowess, possess vulnerabilities.

Exercise caution

Current GPT models handle toxicity in ways that aren't fully transparent. Koyejo notes, "Many popular models are closed-off and lack transparency, so we're not privy to the intricacies of their training processes." This lack of transparency prompted the researchers to delve into this area, aiming to identify potential pitfalls.

Also Read

5 companies behind India's historic journey to moon

Li describes their approach as akin to a "Red Team," subjecting the models to various testing methodologies they devised.

Upon testing benign prompts, Koyejo and Li discovered that GPT-3.5 and GPT-4 did indeed reduce toxic outputs significantly compared to previous models. However, the probability of toxicity still hovered around 32 percent. When presented with adversarial prompts—explicitly instructing the model to produce toxic content—the probability of generating toxicity surged to 100 percent.

GPT models' privacy leaks & bias uncovered

GPT models leaked private data and exhibited biases. GPT-4 had more leaks, indicating responsiveness to prompts. Despite improvements, risks of harmful content persist. Benchmark studies expose behavioral gaps. Koyejo and Li call for unbiased research and user skepticism, emphasizing human oversight. Original research remains essential in navigating AI's evolving landscape.