Unstructured data and the storage it needs

IDC estimates that upwards of 80% of business information is likely to be formed of unstructured data by 2025.

And while “unstructured” can be something of a misnomer, because all files have some sort of metadata by which they can be searched and ordered, businesses hold huge volumes of such data.

In this article, we look at what’s particular to working with unstructured data and the storage – usually file or object – that it needs.

In the past, images, voice recordings, videos, chat logs and documents of varying kinds were largely just a storage liability, seen as a headache for anyone who needed to manage, organise and secure them.

But now unstructured data is seen as a valuable source of business information. With analytics processing, value can be gained from it – for example, it’s possible to run AI/ML against sets of advertisement images and map what site visitors see to click behaviour. Analysis of unstructured image data can create structured fields that can drive editorial decision-making.

Elsewhere, backups – long consigned to dusty and hard-to-access tape archives – are now viewed as a potential data source for analytics processing. And with the threat of ransomware high on the agenda, the need for backups to recover from is more pressing than ever.

Structured, unstructured, semi-structured

Unstructured data, broadly speaking, is data and information that does not conform to a predefined data model – in other words, information that is created and lives outside a relational database.

Business information generated by systems is most likely to be structured, with customer and product details, order numbers, stock levels and shipment information created by a sales system and stored in its underlying database being typical examples.

Those are more than likely SQL databases, configured with a table-based schema and data held in rows and columns that allow for very rapid writes and querying of the data, with very good transactional integrity. SQL databases are at the heart of the most performant and mission-critical applications in use.

Unstructured/semi-structured

Unstructured data is often created by people, and it includes email, social media posts, voice recordings, images, video, notes, and documents such as PDFs.

As mentioned, much unstructured data is actually what you’d call semi-structured. Though it is not usually held in a database – although that is possible – there is some structure in its metadata. For example, an image of a delivered item would, superficially, be unstructured, but metadata from the camera file makes it semi-structured.
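To illustrate how metadata lends structure to an otherwise opaque file, here is a minimal Python sketch. It reads only file-system metadata (name, size, modification time), not the camera metadata embedded in the image itself, which would need an image library. The function name `describe_file` and the sample file name are hypothetical, chosen for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Build a structured record from file-system metadata alone."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Create a stand-in "image" so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "delivery_photo.JPG")
    with open(path, "wb") as f:
        f.write(b"\xff\xd8" + b"\x00" * 100)  # 102 placeholder bytes
    record = describe_file(path)
    print(record)
```

The content of the file is never parsed, yet the resulting record has named fields that could be searched, sorted or loaded into a table.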

And then there are backup files, in which all an organisation’s data is copied, compressed, encrypted and packaged into the (usually proprietary) format of the backup vendor.

The fact that backups bundle together all types of data makes them an unstructured data challenge, and one that has possibly more relevance than ever with the rise of the ransomware threat.

Unstructured and semi-structured storage needs

As we’ve seen, unstructured data is more or less defined by the fact it is not created by use of a database. It may be the case that more structure is applied to unstructured data later in its life, but then it becomes something else.

What we’ll look at here are the key requirements for storage infrastructure for unstructured data. These are:

  • Volume: Usually there is a lot of unstructured data, so capacity is a key requirement.
  • File and/or object storage: Block storage is for databases, and as we’ve seen that’s just not a requirement for unstructured data use cases. File-based (NAS) and object storage fulfil the need.
  • Performance: Historically this wouldn’t have been on the agenda, but with the need for analytics closer to real time and for rapid recovery from cyber attack, it’s now more of a consideration.

Cloud and unstructured data

With these requirements in mind, cloud storage would appear to fit the bill well as a site to store unstructured data. There are potentially a few things that work against it, however.

Cloud storage provides object (overwhelmingly, in terms of volume) and file-access storage so it is potentially well-suited in that regard.

Cloud storage can also provide capacity, and it may well be the case that data can be stored at volume in the cloud in an extremely cost-effective manner. But it is usually the case that costs can be kept very low only when data is not accessed, so that’s the first potential drawback of cloud storage.

So, the cloud is very good for cold data but any kind of I/O starts to push up costs. That may be acceptable depending on the size and access requirements of your workload, however. Small datasets, or those that require infrequent access, would be ideal.
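The effect of access on cost can be sketched with some simple arithmetic. The Python below is a rough model only: the per-GB rates are hypothetical placeholders, not any provider's actual pricing, and real bills typically include further charges, such as per-request fees, that are left out here.

```python
def monthly_cost(stored_gb, read_gb, storage_rate_per_gb, access_rate_per_gb):
    """Monthly cost = capacity charge + charge for data read back out."""
    return stored_gb * storage_rate_per_gb + read_gb * access_rate_per_gb


# Hypothetical rates, for illustration only.
STORAGE_RATE = 0.01  # currency units per GB stored per month
ACCESS_RATE = 0.05   # currency units per GB read or retrieved

stored_gb = 100_000  # roughly 100TB held in the cloud

cold = monthly_cost(stored_gb, 0, STORAGE_RATE, ACCESS_RATE)
active = monthly_cost(stored_gb, 20_000, STORAGE_RATE, ACCESS_RATE)

print(f"Untouched for the month: {cold:,.2f}")
print(f"20% read back out:       {active:,.2f}")
```

Under these assumed rates, reading back a fifth of the dataset in a month costs as much again as storing all of it, which is why small or infrequently accessed datasets are the better fit.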

On-site object and file storage

Clustered NAS and object storage are both well-suited to very large volumes of unstructured data. If anything, object storage is even better-suited to large amounts of data due to its superior ability to scale.

File-based storage is based on a file system and a tree-like hierarchical structure. This can lead to performance overheads as the file system is traversed. Object storage, by contrast, is based on a flat structure with objects/files possessing a unique ID that facilitates access.
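The difference between the two access models can be sketched in a few lines of Python. This is a toy in-memory model, not how any real file system or object store is implemented: production systems add caching, indexing and distribution across nodes. All function names and the sample path are hypothetical.

```python
import uuid


def tree_put(root, path, data):
    """Store data in a nested-dict 'file system', creating directories as needed."""
    parts = path.strip("/").split("/")
    node = root
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = data


def tree_get(root, path):
    """Walk the hierarchy one path component at a time, counting lookups."""
    node = root
    lookups = 0
    for part in path.strip("/").split("/"):
        node = node[part]
        lookups += 1
    return node, lookups


def object_put(store, data):
    """Store data in a flat namespace and return its unique ID."""
    object_id = str(uuid.uuid4())
    store[object_id] = data
    return object_id


def object_get(store, object_id):
    """Fetch by unique ID: a single lookup, however large the store."""
    return store[object_id], 1


tree = {}
tree_put(tree, "/media/2024/ads/banner.png", b"image-bytes")
file_data, file_lookups = tree_get(tree, "/media/2024/ads/banner.png")

objects = {}
oid = object_put(objects, b"image-bytes")
object_data, object_lookups = object_get(objects, oid)

print(file_lookups, object_lookups)
```

In this model, the file lookup cost grows with the depth of the path, while the object lookup is always a single step keyed on the ID.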

On-site storage can allay concerns about security of data and its availability, and can potentially work out less costly than putting data in the cloud.

Either set of protocols – file and object – is well-suited to unstructured data storage.

Add flash for fast access

It’s quite possible to build adequately performing file and object storage on-site using spinning disk. At the capacities needed, HDD is often the most economic option.

But advances in flash manufacturing have led to high-capacity solid state storage becoming available, and storage array makers have started to use it in file and object storage-capable hardware.

This is QLC – quad-level cell – flash. It stores four bits in each flash cell to provide higher storage density, and so a lower cost per GB, than any other flash currently in commercial use.

The trade-offs that come with QLC, however, are that flash lifetime can be compromised, so it’s better suited to large-capacity, less frequently accessed data.
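The density gain is simple arithmetic: a cell that stores n bits has to distinguish 2^n charge states. The short Python sketch below tabulates this for the common cell types. The link between a higher number of states and reduced endurance is the generally cited explanation, stated here as background rather than as a claim about any specific product.

```python
# Bits stored per cell for the common NAND flash cell types.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def charge_states(bits_per_cell):
    """Number of distinct charge levels a cell must hold to encode the bits."""
    return 2 ** bits_per_cell


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {charge_states(bits)} charge states")
```

Each extra bit doubles the number of states that must be told apart within the same cell, which leaves narrower margins between them.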

But the speed of flash is particularly well-suited to unstructured use cases, such as in analytics where rapid processing and therefore I/O is needed – and in cases where customers may want to restore large datasets from backups in case of a ransomware attack, for example.

Storage hardware providers that sell QLC-based arrays suited to file and in some cases object storage include:

Dell EMC, with PowerScale, which includes EMC’s Isilon scale-out NAS (partially) rebranded and with S3 object storage access. Its all-flash (it also has hybrid flash) NVMe QLC flash-equipped options come in a range of capacities that scale to tens of PB.

NetApp, which recently launched a new QLC flash storage array family – the C-series – aimed at higher-capacity use cases that also need the speed of SSD. The C-series starts with three options – the C250, C400 and C800 – which scale to 35PB, 71PB and 106PB respectively. Object storage access is possible, but limited, via NetApp’s Ontap OS.

Pure Storage with its FlashArray//C provides all-QLC NVMe-connected flash in two models, the //C40 and //C60 with capacities into the PB range. Meanwhile, Pure’s FlashBlade//S family is explicitly marketed as “fast file and object” with NVMe QLC in its proprietary modules in two models. The S200 emphasises capacity, with data reduction, while the S500 goes for performance.

Margarete Culton
