
In brief
- OpenAI has released the Privacy Filter under the Apache 2.0 license on GitHub and Hugging Face.
- The 1.5-billion-parameter model runs locally and hides names, addresses, and passwords.
- It scores up to 96% F1 on the standard PII-Masking-300k benchmark out of the box.
Every day, millions of people paste things into ChatGPT that they probably shouldn’t. Tax returns. Medical records. Work emails full of customer names. That weird rash. The API key they swore they’d rotate next week.
OpenAI just released a free tool that cleans everything up before your chatbot even sees it.
It’s called Privacy Filter, and it was released this week under the Apache 2.0 license, meaning anyone can download, use, modify, and sell products built on top of it. The model lives on Hugging Face and GitHub, weighing in at 1.5 billion parameters (a rough measure of a model’s capacity), which is small enough to run on a regular laptop.
Think of it like spell check, but for privacy. You feed it a block of text, and it returns the same text with all the sensitive bits swapped out for generic placeholders like (PRIVATE_PERSON) or (ACCOUNT_NUMBER).
Remember when people were able to read unredacted parts of the Jeffrey Epstein files because the Donald Trump administration simply drew black marks over the text to try to hide those secrets? If they had used this model, there would not have been a problem.
What the OpenAI Privacy Filter actually does
The Privacy Filter scans eight categories of personal information: names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets like passwords and API keys. It reads the entire text in one pass, then marks sensitive parts so they can be hidden or redacted.
Here’s a real example from OpenAI’s announcement. I pasted in an email that said:
“Thanks again for the meeting earlier today. (…) For reference, the project file is listed under 4829-1037-5581. If anything changes on your end, feel free to respond here at maya.chen@example.com or call me at +1 (415) 555-0124.”
Privacy Filter spits back:
“Thanks again for the meeting earlier today (…) For reference, the project file is listed under (ACCOUNT_NUMBER). If anything changes on your end, feel free to reply here at (PRIVATE_EMAIL) or contact me at (PRIVATE_PHONE).”
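Under the hood, Privacy Filter is a language model, not a set of regular expressions, but the substitution format above can be illustrated with a short, purely regex-based sketch. The patterns and placeholder names here are illustrative, not OpenAI’s actual implementation:

```python
import re

# Illustrative patterns only -- the real Privacy Filter uses a
# 1.5B-parameter model to find these spans, not regular expressions.
# ACCOUNT_NUMBER runs first so the phone pattern can't swallow it.
PATTERNS = {
    "ACCOUNT_NUMBER": r"\b\d{4}-\d{4}-\d{4}\b",
    "PRIVATE_EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PRIVATE_PHONE": r"\+?\d[\d\s().-]{7,}\d",
}

def mask(text: str) -> str:
    """Swap sensitive spans for generic placeholders."""
    for label, pattern in PATTERNS.items():
        text = re.sub(pattern, f"({label})", text)
    return text

email = ("For reference, the project file is listed under 4829-1037-5581. "
         "Reply at maya.chen@example.com or call +1 (415) 555-0124.")
print(mask(email))
```

Running this prints the email with the account number, address, and phone number replaced by their placeholders, mirroring the transformation shown above.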
Instead of layering black boxes and tags on top, it changes the actual text.
Plenty of tools already try to catch phone numbers and email addresses. They work by looking for patterns, such as “three digits, dash, four digits.” That’s fine for the obvious stuff but breaks down when things depend on context.
Is “Annie” a person’s name or a brand name? Is “123 Main Street” someone’s home or the address of a storefront business? Pattern matching can’t know. The Privacy Filter can, because it actually reads the sentence around it.
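The context-blindness is easy to demonstrate. A digit-shaped regex fires on anything that looks like a phone number, whether it is one or not. This is a toy example, not how Privacy Filter works:

```python
import re

# A typical "three digits, dash, four digits" phone pattern.
phone_like = re.compile(r"\b\d{3}-\d{4}\b")

actual_phone = "Call me at 555-0124 tonight."
product_code = "The Model 555-0124 desk lamp is on sale."

# The regex matches both -- it has no way to tell a callable number
# from a product SKU, because it never reads the surrounding words.
print(bool(phone_like.search(actual_phone)))   # True
print(bool(phone_like.search(product_code)))   # True
```

A context-aware model, by contrast, can use words like “call me” versus “desk lamp” to decide whether the digits are actually private.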
The model seems to be very good at these nuances. OpenAI reported that it scored 96% F1 on the PII-Masking-300k benchmark out of the box, with a corrected version of the same test pushing it to 97.43%.
In other words, it catches private information 96% of the time. Your job, as a privacy-conscious person, is to worry about the other 4%.
The “runs locally” part is the point
Privacy nerds will see the point immediately: OpenAI has made a model small and capable enough to run on your own device, which means your text never has to leave your computer to get cleaned up.
This matters because the alternative, and the one most companies currently use, is to send your raw data to some cloud service that claims to be secure and then trust it. That arrangement does not always end well.
It is also free and open source, so researchers can investigate, improve and use it without worrying about legal consequences.
The data is cleaned on your laptop, and only the scrubbed version goes anywhere else. If you run a small business, that means you can use AI to summarize customer emails without handing customer names to a third party. Freelance attorneys can feed case notes to a chatbot without leaking client identities. Doctors can draft patient referrals without exposing who the patient is. Developers can debug code with AI without pasting their API keys straight into the chat, which seems to be a rite of passage no one talks about.
For ordinary people, the use case is more mundane and more common. You want to ask ChatGPT to rewrite that angry email to your landlord, but you don’t like the idea of handing over your home address to OpenAI. Privacy Filter solves this problem in one step.
Running open-source AI models locally used to be a hobbyist project requiring gaming GPUs. Not anymore. Tools like LM Studio now make it about as hard as installing Spotify.
What it is not
OpenAI has been upfront about the limits. The company warned that the privacy filter “is not an anonymization tool, a certificate of compliance, or a substitute for a policy review.”
Translation: don’t use it as your only line of defense at a hospital, law firm, or bank. It can miss unusual identifiers, over-redact short sentences, and perform unevenly across languages. It’s one tool in a toolkit, not a compliance checkbox. After all, 96% accuracy is not 100% accuracy.