Do text embeddings perfectly encode text?

Announcing our new article from Jack Morris!

Article Preview:

… Now imagine you’re a software engineer building a RAG system for your company. You decide to store your vectors in a vector database. You notice that in a vector database, what’s stored are embedding vectors, not the text data itself. The database fills up with rows and rows of random-seeming numbers that represent text data but never ‘sees’ any text data at all. You know that the text corresponds to customer documents that are protected by your company’s privacy policy.

But you’re never actually sending the text off-premises; you only ever send the vectors, which look like random numbers. What if someone hacks into the database and gains access to all your text embedding vectors? Would this be bad? Or if the service provider wanted to sell your data to advertisers, could they? Both scenarios depend on being able to take embedding vectors and somehow invert them back to text.

Recovering text from embeddings is exactly the problem we tackle in our paper Text Embeddings Reveal (Almost) As Much As Text (EMNLP 2023). Are embedding vectors a secure format for storing and communicating information? Put simply: can the input text be recovered from the output embeddings?
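To make the question concrete, here is a toy sketch of why "random-looking numbers" are not automatically safe. It assumes a hypothetical `toy_embed` function (a hashed bag of words, not a real embedding model) and an attacker who can guess a list of candidate texts. It is not the inversion method from the paper; it only shows that a stored vector alone can be enough to single out the text that produced it.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding size for this toy example


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed, L2-normalized bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # Map each word to a bucket with a stable hash, then count occurrences.
        idx = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-norm, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def invert_by_search(target: list[float], candidates: list[str]) -> str:
    """Return the candidate text whose embedding is closest to the leaked vector."""
    return max(candidates, key=lambda c: cosine(toy_embed(c), target))


# The "database" stores only the vector, never the text itself.
stored_vector = toy_embed("customer 4821 requested a refund for order 77")

guesses = [
    "customer 4821 requested a refund for order 77",
    "customer 4821 updated their shipping address",
    "quarterly revenue grew in the northeast region",
]
print(invert_by_search(stored_vector, guesses))
```

Even this crude attack picks out the original sentence, because the vector preserves enough information to distinguish it from the alternatives. The question the paper asks is how far this goes for real embedding models, where the attacker has no candidate list.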

