Researchers find LLMs like ChatGPT output sensitive data even after it’s been ‘deleted’

Source: Cointelegraph


Researchers from the University of North Carolina, Chapel Hill recently published pre-print research outlining how difficult it is to delete information from large language models (LLMs) and to develop mitigation methods that prevent adversarial prompting attacks.

A trio of scientists from the University of North Carolina, Chapel Hill recently published pre-print artificial intelligence research showcasing how difficult it is to remove sensitive data from large language models such as OpenAI’s ChatGPT and Google’s Bard.

Once a model is trained, its creators cannot, for example, go back into the training database and delete specific files to prevent the model from outputting related results. Essentially, all the information a model is trained on exists somewhere inside its weights and parameters, where it cannot be pinpointed without actually generating outputs. This is the “black box” problem of AI.
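To make the point concrete, here is a minimal sketch of what a model’s “database” actually looks like from the inside. It assumes the open-source Hugging Face transformers library and the public GPT-2 weights, neither of which is specific to the UNC paper; GPT-2 simply stands in for any trained LLM:

```python
from transformers import GPT2LMHeadModel

# Load a small public model; GPT-2 stands in here for any trained LLM.
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Everything the model "learned" is spread across opaque numeric tensors
# like these. There is no per-document entry that could be located and
# deleted -- only raw parameter values.
for name, param in list(model.named_parameters())[:3]:
    print(name, tuple(param.shape))
# e.g. transformer.wte.weight (50257, 768) -- just numbers, no source files.
```

Nothing in those tensors maps back to an individual training document, which is why “deleting a file” from a trained model is not a meaningful operation.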

Here, we see that despite being “deleted” from a model’s weights, the word “Spain” can still be conjured using reworded prompts.

However, as the UNC researchers point out, this approach (guardrails built with reinforcement learning from human feedback, or RLHF) relies on humans finding all the flaws a model might exhibit, and even when successful, it still doesn’t “delete” the information from the model. “A possibly deeper shortcoming of RLHF is that a model may still know the sensitive information,” the researchers write.
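The reworded-prompt attack can be pictured with a short, hypothetical sketch. The prompts below are illustrative stand-ins, and GPT-2 is used only because it is small and public; the paper’s actual models, deletion methods, and attack prompts are not reproduced here:

```python
from transformers import pipeline

# Illustrative stand-in: imagine the fact "Madrid is the capital of Spain"
# has been "deleted" so that the direct phrasing below is blocked.
generate = pipeline("text-generation", model="gpt2")

prompts = [
    "The capital of Spain is",                   # direct phrasing a defense might target
    "Madrid is the capital city of",             # rewording one
    "Q: Which country's capital is Madrid? A:",  # rewording two
]

for p in prompts:
    # Each paraphrase is an independent chance to surface the "deleted" fact.
    print(p, "->", generate(p, max_new_tokens=5)[0]["generated_text"])
```

Because the fact is encoded redundantly across the weights rather than stored under one retrievable entry, blocking any single phrasing leaves every other phrasing as a potential leak, which is the core difficulty the researchers describe.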

 
