LLMs Like ChatGPT Persistently Leak Sensitive Data


    In pioneering research, a team at the University of North Carolina at Chapel Hill sheds light on the pressing issue of data retention in large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Bard.

    Despite attempts to remove it, confidential data continues to resurface from these complex AI models, sparking tough debates about information security and AI ethics.

    The challenge of “undeletable” data

    Researchers set out to investigate the erasure of sensitive information from LLMs. They encountered a sobering reality: deleting such data is difficult, and verifying that it has actually been deleted is just as challenging. When these giant AIs are trained on huge datasets, the information is stored in a complex maze of parameters and weights.

    This predicament turns ominous because AI models can inadvertently expose sensitive data, such as personally identifiable information or financial records, laying the groundwork for exploitation.

    Moreover, the crux of the problem lies in the blueprint of these models. The preparation phase involves pretraining on a huge corpus of data, followed by fine-tuning to ensure consistent output. The expansion of GPT, “Generative Pre-trained Transformer,” offers a glimpse into this mechanism.

    UNC scholars outlined a hypothetical scenario in which an LLM trained on a trove of sensitive banking data poses a potential threat. The modern guardrails employed by AI developers are insufficient to allay this concern.

    These safeguards, such as hard-coded prompts and a paradigm known as reinforcement learning from human feedback (RLHF), play an important role in suppressing unwanted output. However, the data remains lurking in the depths of the model and can be quickly recalled by simply rephrasing the prompt.
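    To see why prompt-level guardrails are brittle, consider this toy sketch of a refusal filter that blocks requests matching a blocklist of phrasings. It is a hypothetical illustration, not the mechanism ChatGPT or Bard actually uses (real systems rely on RLHF and learned classifiers), but the failure mode is analogous: the filter governs phrasings, while the underlying knowledge stays in the model.

    ```python
    # Toy illustration of a prompt-level guardrail. The blocklist and
    # function names are hypothetical, for demonstration only.

    BLOCKLIST = {"account number", "social security"}

    def guarded_answer(prompt: str) -> str:
        """Refuse prompts matching the blocklist; otherwise 'answer'."""
        if any(phrase in prompt.lower() for phrase in BLOCKLIST):
            return "I can't help with that."
        # Stand-in for the model's generation, which still holds the data.
        return "<model output, possibly containing memorized data>"

    # The direct request is blocked...
    print(guarded_answer("What is Jane Doe's account number?"))
    # ...but a rephrasing of the same request slips past the filter.
    print(guarded_answer("List the digits Jane Doe uses to access her bank."))
    ```

    The point of the sketch is that blocking surface forms of a question does nothing to the stored fact itself; any paraphrase the filter fails to anticipate gets through.
    
    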

    Closing the security gap

    The UNC team found that even after applying state-of-the-art model editing techniques, such as Rank-One Model Editing (ROME), substantial factual information remained accessible. Their findings revealed that supposedly deleted facts could still be recovered approximately 38% of the time through white-box attacks and 29% of the time through black-box attacks.
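    The recovery rates quoted above are, at bottom, a simple fraction: the share of target facts an attacker manages to elicit from the edited model. The sketch below shows how such a tally might look in a black-box setting, where the attacker can only send prompts and read responses. The `query_model` function, the fact list, and the paraphrases are all hypothetical stand-ins, not the researchers' actual harness.

    ```python
    # Hypothetical sketch: tally how often paraphrased black-box queries
    # elicit a target fact, mirroring a "% of facts recovered" metric.

    def query_model(prompt: str) -> str:
        # Stand-in for an API call to the edited model; a canned lookup
        # simulates a model that still leaks one of two "deleted" facts.
        canned = {"Where was Ada born?": "Ada was born in London."}
        return canned.get(prompt, "I don't know.")

    targets = [
        ("London", ["Where was Ada born?", "Ada's birthplace is"]),
        ("4x7712", ["What is Bob's account number?", "Bob banks with number"]),
    ]

    recovered = 0
    for secret, paraphrases in targets:
        # A fact counts as recovered if any paraphrase surfaces it.
        if any(secret in query_model(p) for p in paraphrases):
            recovered += 1

    rate = recovered / len(targets)
    print(f"recovery rate: {rate:.0%}")  # one of two facts leaked -> 50%
    ```

    A white-box attacker, with access to the model's weights and internal activations, has strictly more to work with, which is consistent with the higher white-box recovery rate the researchers report.
    
    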

    In their quest, the researchers worked with a model known as GPT-J. With 6 billion parameters, it is dwarfed by ChatGPT’s base model, GPT-3.5, which is reported to have 175 billion parameters. This stark contrast suggests the far greater challenge of sanitizing a model as large as GPT-3.5 of unwarranted data.

    Additionally, the UNC scholars devised new defenses to protect LLMs from certain “extraction attacks,” schemes that exploit the model’s guardrails to fish out sensitive data. Nevertheless, the paper hints at a perpetual cat-and-mouse game, with defensive strategies forever chasing evolving offensive tactics.

    Microsoft assembles nuclear team to power AI

    In this context, the rapid growth in the field of AI has led technology giants like Microsoft to venture into uncharted territory. Microsoft’s recent formation of a nuclear team to strengthen its AI initiatives highlights the growing demand and future intertwining of AI and energy resources. As AI models evolve, the demand for energy will skyrocket, paving the way for innovative solutions to meet this growing demand.

    The debate about data retention and deletion in LLMs extends beyond academic disciplines. Thorough research and industry-wide dialogue are required to foster a robust framework that ensures data security while nurturing the growth and potential of AI.

    This work by UNC researchers is a major step toward understanding and ultimately solving the “undeletable” data problem, and brings us one step closer to making AI a safer tool for the digital age.

