Zettabytes

The End of Big Data

As much as 80% of business data gets deleted, wiped, overwritten, or destroyed.

John Foley
May 28, 2025

How much data is actually captured and stored? What happens to old data? And what about the data center hardware? -
John Foley

Welcome to Zettabytes, a new Substack newsletter on the hypergrowth of AI-fueled data.

Photo credit: John Foley

Data volumes keep growing, but most data actually disappears. It may be deleted, wiped, overwritten, shredded. I’ve smashed old PC drives with a hammer.

That’s true in the data center, as well.

Kevin Purdy, a former tech reporter with Ars Technica, offers a first-hand look in an article on IT asset disposition, or ITAD, where decommissioned computers are sent for recycling or to be refurbished and resold on places like eBay.

Purdy visited SK TES, an ITAD provider located in northern Virginia, an area known for data centers that serve big customers, including federal agencies.

Purdy takes us through the intake process: Pallets of used equipment arrive, get labeled, tracked, inventoried, examined, disassembled, and tested. Most are refurbished for resale; some get shredded.

The process is designed so that existing data on disk doesn’t accidentally make its way back into circulation. “The biggest risk is data escape,” says SK TES’s chief commercial officer, a concerning prospect as old servers get forklifted.

There’s something to be learned from the comments on the Ars article, as well.

Most business data is deleted or destroyed

I took a photo of the “Data Destruction Day” sign on the campus of Northeastern University in Boston a few years ago. The event was offered as a free service to students. “Bring your electronics and paper items for secure, sustainable, and free destruction,” the sign said.

It’s long been the case that some data gets captured and stored on hard drives, SSDs, memory, flash, appliances, tape, S3, and cloud storage, where it can be accessed and managed. However, according to Anthropic’s Claude chatbot, as much as 80% of business data is eventually deleted or destroyed.

Here’s what Claude said about this.

Prompt: What percentage of business data is stored and retained? And what percentage of business data is deleted or otherwise destroyed?

Claude:

Precise statistics on business data retention vs. destruction are challenging to pin down due to varying industry practices and lack of comprehensive tracking, but here are the general patterns:

Data retention estimates:

  • Approximately 20-30% of business data is actively retained long-term

  • Critical business records (financial, legal, compliance) have higher retention rates

  • Operational data often has shorter lifecycles with lower retention percentages

Data destruction patterns:

  • An estimated 70-80% of business data is eventually deleted or destroyed

  • Much of this occurs through automated processes (log rotation, cache clearing, temporary file cleanup)

  • Planned destruction follows data retention policies, typically ranging from 3-7 years for most business records

Key factors affecting these percentages:

  • Industry regulations - Healthcare, finance, and government retain more data due to compliance requirements

  • Data type - Transactional and customer data retained longer than operational logs

  • Storage costs - Economic pressures drive deletion of less valuable data

  • Business value - Analytics and historical trend data may be retained longer

Modern trends:

  • Cloud storage cost reductions are slightly increasing retention rates

  • Data governance initiatives are making destruction more systematic

  • AI and analytics needs are extending some data lifecycles

  • Privacy regulations (GDPR, CCPA) are enforcing more systematic deletion

The exact percentages vary significantly by industry, with heavily regulated sectors retaining much higher percentages than typical businesses. These figures represent general patterns rather than precise measurements, as comprehensive industry-wide data destruction statistics aren't systematically collected.

- end response -

Net-net: aggregate data keeps growing

Data deletion isn’t new, of course, but it’s something to consider as data volumes continue to spike upwards.

And keep in mind that some data — a “vast amount,” according to Google Gemini — is never stored at all. Gemini refers to this as dark data or ephemeral data. Examples include transient sensor data, lost packets, even unsaved work.

It’s remarkable to think that even as the majority of data is never captured or goes away, the net effect is still fast-growing data volumes on everything from personal laptops to hyperscale data centers. In other words, what we see is the tip of the data iceberg.
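
To see how heavy deletion and net growth can coexist, here is a minimal Python sketch. The inputs are illustrative assumptions, not measurements: 100 units of new data in the first year, creation growing 25% a year, and 80% of each year's new data eventually deleted (the upper end of Claude's estimate above). The function name `retained_by_year` is hypothetical.

```python
def retained_by_year(first_year_created, growth_rate, deleted_fraction, years):
    """Return the cumulative retained data at the end of each year.

    Simplifying assumption: the deleted share of each year's new data
    is removed within that same year, so only the remainder accumulates.
    """
    retained = 0.0
    created = first_year_created
    totals = []
    for _ in range(years):
        # Only the surviving fraction of this year's new data is kept.
        retained += created * (1 - deleted_fraction)
        totals.append(retained)
        # Next year's data creation grows by the assumed rate.
        created *= 1 + growth_rate
    return totals


# Illustrative inputs only (assumptions, not figures from any cited source).
totals = retained_by_year(first_year_created=100, growth_rate=0.25,
                          deleted_fraction=0.80, years=5)
for year, total in enumerate(totals, start=1):
    print(f"Year {year}: {total:.1f} units retained")
```

Under these assumptions the retained total rises every year even though four-fifths of new data is discarded, because the surviving fifth keeps accumulating while creation itself grows.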


