Unstructured data and the storage it needs

IDC estimates that by 2025, upwards of 80% of business information is likely to be unstructured data.

“Unstructured” can be something of a misnomer, because all files carry some sort of metadata by which they can be searched and ordered. Even so, businesses hold huge volumes of such data.

In this article, we look at what’s particular to working with unstructured data and the storage – usually file or object – that it needs.

In the past, images, voice recordings, videos, chat logs and documents of varying kinds were largely a storage liability, and a headache for anyone who had to manage, organise and secure them.

But now unstructured data is seen as a valuable source of business information. Analytics processing can extract value from it. For example, it’s possible to run AI/ML against sets of advertisement images and map what site visitors see to click behaviour. Analysing unstructured image data in this way creates structured fields that can drive editorial decision-making.
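The following Python sketch roughly illustrates how image analysis turns pictures into structured fields. It assumes an image-recognition step has already tagged each advertisement image with labels. The file names, labels and click log are all hypothetical. The code joins those labels with click events to produce a click-through rate per label. It is not how any particular analytics product works.

```python
from collections import defaultdict

# Hypothetical labels assumed to come from an image classifier run earlier.
IMAGE_LABELS = {
    "ad-001.jpg": ["outdoor", "car"],
    "ad-002.jpg": ["indoor", "food"],
    "ad-003.jpg": ["outdoor", "food"],
}

# Hypothetical impression log: (image shown, whether the visitor clicked).
IMPRESSIONS = [
    ("ad-001.jpg", True),
    ("ad-001.jpg", False),
    ("ad-002.jpg", False),
    ("ad-002.jpg", False),
    ("ad-003.jpg", True),
    ("ad-003.jpg", True),
]


def click_through_by_label(labels, impressions):
    """Combine per-image labels with click events into a structured table
    mapping each label to its click-through rate (clicks / impressions)."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for image, was_clicked in impressions:
        for label in labels.get(image, []):
            shown[label] += 1
            if was_clicked:
                clicked[label] += 1
    return {label: clicked[label] / shown[label] for label in sorted(shown)}


if __name__ == "__main__":
    for label, ctr in click_through_by_label(IMAGE_LABELS, IMPRESSIONS).items():
        print(f"{label:8s} {ctr:.2f}")
```

The output is an ordinary table of labels and rates. A business could store that table in a database and query it like any other structured data.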

Elsewhere, backups – long consigned to dusty, hard-to-access tape archives – are now viewed as a potential data source for analytics processing. And with ransomware high on the agenda, having backups to recover from matters more than ever.

Structured, unstructured, semi-structured

Unstructured data, broadly speaking, is data and information that does not conform to a predefined data model – in other words, information that is created and lives outside a relational database.

Business information generated by systems is most likely to be structured, with customer and product details, order numbers, stock levels and shipment information created by a sales system and stored in its underlying database being typical examples.

These are most likely SQL databases, configured with a table-based schema in which data is held in rows and columns. That design allows very rapid writes and queries, with very good transactional integrity. SQL databases are at the heart of the most performant and mission-critical applications in use.
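The minimal sketch below makes the contrast concrete. It uses Python's built-in sqlite3 module as a stand-in for a sales system's SQL database. The orders table and its columns are hypothetical. Every row follows the same schema, so aggregate queries are simple, and a transaction either applies all of its writes or none of them.

```python
import sqlite3

# An in-memory SQLite database stands in for a sales system's SQL store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer TEXT    NOT NULL,
           product  TEXT    NOT NULL,
           quantity INTEGER NOT NULL CHECK (quantity > 0)
       )"""
)

# "with conn" commits the transaction on success and rolls it back on error.
with conn:
    conn.executemany(
        "INSERT INTO orders (customer, product, quantity) VALUES (?, ?, ?)",
        [("Acme", "widget", 10), ("Acme", "gadget", 2), ("Globex", "widget", 5)],
    )

# The second insert violates the CHECK constraint, so the whole transaction,
# including the valid first insert, is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO orders (customer, product, quantity) "
                     "VALUES ('Initech', 'widget', 3)")
        conn.execute("INSERT INTO orders (customer, product, quantity) "
                     "VALUES ('Initech', 'gadget', 0)")
except sqlite3.IntegrityError:
    pass

# Every row shares the same columns, so aggregation is a one-line query.
totals = conn.execute(
    "SELECT product, SUM(quantity) FROM orders GROUP BY product ORDER BY product"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(totals, row_count)
```

The failed transaction leaves no trace. The table still holds the original three orders, and the per-product totals reflect only those.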

Unstructured/semi-structured

Unstructured data is often created by people, and it includes email, social media posts, voice recordings, images, video, notes, and documents such as PDFs.

As mentioned, most unstructured data could actually be called semi-structured. It is not usually held in a database (although that is possible), but its metadata gives it some structure. For example, an image of a delivered item would superficially be unstructured, but the metadata the camera writes into the file makes it semi-structured.
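Even files that no database knows about carry file system metadata. The sketch below uses only Python's standard library. It indexes a directory tree into records of path, extension, size and modification time, then filters and sorts them. Camera metadata such as EXIF would need a separate image library and is not read here.

```python
import os
from pathlib import Path


def index_files(root):
    """Walk a directory tree and return one metadata record per file.

    No database is involved: the file system already records name, size
    and modification time, which is enough to search and sort on.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            records.append(
                {
                    "path": str(path.relative_to(root)),
                    "suffix": path.suffix.lower(),
                    "size_bytes": info.st_size,
                    "modified": info.st_mtime,
                }
            )
    return records


def find(records, suffix, min_size=0):
    """Filter records by extension and minimum size, largest first."""
    hits = [r for r in records
            if r["suffix"] == suffix and r["size_bytes"] >= min_size]
    return sorted(hits, key=lambda r: r["size_bytes"], reverse=True)
```

As a usage example, `find(index_files("/data/deliveries"), ".jpg", min_size=1_000_000)` would list large JPEGs under a directory. That directory name is hypothetical.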

And then there are backup files, in which all an organisation’s data is copied, compressed, encrypted and packaged into the (usually proprietary) format of the backup vendor.

The fact that backups bundle together all types of data makes them an unstructured data challenge. That challenge is arguably more relevant than ever with the rise of the ransomware threat.
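Backup formats are usually proprietary, but Python's standard tarfile module can sketch the packaging idea. Mixed files are copied and compressed into a single archive. After that, individual items can only be reached by opening the archive and reading through it. This sketch assumes gzip-compressed tar as a stand-in for a vendor format. It does no encryption, which real backup products generally add.

```python
import io
import tarfile


def package_backup(files):
    """Copy and gzip-compress a mix of files into one in-memory tar archive.

    `files` maps member names to their contents as bytes. This stands in for
    a vendor's proprietary format and, unlike most real products, does not
    encrypt anything.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            member = tarfile.TarInfo(name=name)
            member.size = len(data)
            archive.addfile(member, io.BytesIO(data))
    return buffer.getvalue()


def list_members(backup):
    """List what is inside the archive without extracting it."""
    with tarfile.open(fileobj=io.BytesIO(backup), mode="r:gz") as archive:
        return archive.getnames()


def read_member(backup, name):
    """Pull one file's contents out of the archive without a full restore."""
    with tarfile.open(fileobj=io.BytesIO(backup), mode="r:gz") as archive:
        return archive.extractfile(name).read()
```

Reading single members like this is what makes a backup usable as an analytics source. Proprietary formats typically need the vendor's own tools to do the same.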

Unstructured and semi-structured storage needs

As we’ve seen, unstructured data is more or less defined by the fact that it is not created in a database. More structure may be applied to it later in its life, but at that point it becomes something else.

Here we look at the key storage infrastructure requirements for unstructured data. These are:

  • Volume: There is usually a lot of unstructured data, so capacity is a key requirement.
  • File and/or object storage: Block storage mainly serves databases, and as we’ve seen, unstructured data does not live in databases. File-based (NAS) and object storage are the natural fit instead.
  • Performance: Historically this wouldn’t have been on the agenda. But analytics now needs to run closer to real time, and organisations need rapid recovery from cyber attack, so performance is more of a consideration.

Cloud and unstructured data

With these requirements in mind, cloud storage would appear to fit the bill well as a site to store unstructured data. There are potentially a few things that work against it, however.

Cloud storage provides both object (overwhelmingly, in terms of volume) and file-access storage, so it is potentially well-suited in that regard.

Cloud storage can also provide capacity, and data can often be stored at volume in the cloud extremely cost-effectively. But costs usually stay very low only when data is not accessed, and that is the first potential drawback of cloud storage.

So, the cloud is very good for cold data but any kind of I/O starts to push up costs. That may be acceptable depending on the size and access requirements of your workload, however. Small datasets, or those that require infrequent access, would be ideal.
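The trade-off between cold and active data comes down to simple arithmetic. The sketch below models a monthly bill as a capacity charge plus access charges, covering data retrieved and requests made. The prices are placeholders chosen only for illustration and are not any provider's rates. Real bills also include other line items.

```python
def monthly_cloud_cost(stored_gb, retrieved_gb, requests,
                       price_per_gb_stored, price_per_gb_retrieved,
                       price_per_1000_requests):
    """Estimate a monthly bill as a capacity charge plus access charges.

    All prices are supplied by the caller; none are real provider rates.
    """
    capacity_charge = stored_gb * price_per_gb_stored
    access_charge = (retrieved_gb * price_per_gb_retrieved
                     + requests / 1000 * price_per_1000_requests)
    return capacity_charge + access_charge


# Hypothetical placeholder prices, for illustration only.
PRICES = {
    "price_per_gb_stored": 0.01,
    "price_per_gb_retrieved": 0.05,
    "price_per_1000_requests": 0.005,
}

# 100 TB (decimal) kept cold versus the same data read back heavily each month.
cold = monthly_cloud_cost(100_000, 0, 0, **PRICES)
active = monthly_cloud_cost(100_000, 50_000, 10_000_000, **PRICES)
print(f"cold: {cold:,.2f}  active: {active:,.2f}")
```

With these placeholder prices, the cold scenario costs about 1,000 a month and the active one about 3,550. Access charges therefore make up most of the second bill, even though the stored capacity is identical.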

On-site object and file storage

Clustered NAS and object storage are both well-suited to very large volumes of unstructured data. If anything, object storage is even better-suited to large amounts of data due to its superior ability to scale.

File-based storage is built on a file system with a tree-like hierarchical structure. This can lead to performance overheads as the file system is traversed. Object storage, by contrast, uses a flat structure in which each object has a unique ID that facilitates access.
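A toy model shows the structural difference. Nested Python dicts stand in for a directory tree, which is resolved one path component at a time. A flat dict stands in for an object store, where the whole key identifies the object and the slashes are just characters in it, as in S3-style buckets. Real file systems cache directory lookups and object stores have their own internal indexes. So this model shows the shape of the access path, not actual performance.

```python
JPEG_BYTES = b"\xff\xd8\xff\xe0"  # placeholder object contents

# File-style namespace: nested dicts stand in for directories.
tree = {"projects": {"2024": {"images": {"cat.jpg": JPEG_BYTES}}}}


def lookup_path(root, path):
    """Resolve a path one component at a time, as a file system walks directories.

    Returns the file contents and the number of lookups made.
    Raises KeyError if any component is missing.
    """
    node = root
    steps = 0
    for part in path.strip("/").split("/"):
        node = node[part]
        steps += 1
    return node, steps


# Object-style namespace: one flat mapping; "/" is just a character in the key.
bucket = {"projects/2024/images/cat.jpg": JPEG_BYTES}


def get_object(store, key):
    """Fetch an object by its full key in a single lookup."""
    return store[key], 1
```

In this model, the deeper the directory tree, the more lookups a file path needs. The flat key always resolves in one step.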

On-site storage can allay concerns about security of data and its availability, and can potentially work out less costly than putting data in the cloud.

Either set of protocols – file and object – is well-suited to unstructured data storage.

Add flash for fast access

It’s quite possible to build adequately performing file and object storage on-site using spinning disk. At the capacities needed, HDD is often the most economic option.

But advances in flash manufacturing have led to high-capacity solid state storage becoming available, and storage array makers have started to use it in file and object storage-capable hardware.

This is QLC – quad-level cell – flash. QLC stores four bits per cell, so each cell must distinguish 16 charge levels. That gives it higher storage density, and so lower cost per GB, than any other flash currently in commercial use.
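The density gain is simple arithmetic. A cell holding n bits must distinguish 2 to the power of n charge levels, which is 16 for QLC's four bits. The sketch below works this through for the common cell types. It also counts how many cells a given capacity needs, ignoring the spare area and error-correction overhead that real drives reserve.

```python
# Bits stored per cell for common NAND flash cell types.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def charge_levels(bits_per_cell):
    """A cell storing n bits must reliably distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


def cells_needed(capacity_bits, bits_per_cell):
    """Cells required for a raw capacity, ignoring spare area and ECC overhead."""
    return -(-capacity_bits // bits_per_cell)  # integer ceiling division


ONE_TB_BITS = 8 * 10**12  # 1 TB (decimal) expressed in bits

for name, bits in CELL_TYPES.items():
    print(f"{name}: {charge_levels(bits):2d} levels, "
          f"{cells_needed(ONE_TB_BITS, bits):,} cells per TB")
```

QLC needs a quarter as many cells as SLC for the same capacity. The price of that saving is that each cell must resolve eight times as many charge levels.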

The trade-off with QLC, however, is endurance. Its cells tolerate fewer program/erase cycles, so it is better suited to large-capacity data that is written less frequently.
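A common rule of thumb estimates the endurance trade-off. It divides the total NAND writes available (capacity multiplied by rated program/erase cycles) by the NAND writes incurred each day (host writes multiplied by write amplification). The drive size, cycle rating and write-amplification factor below are assumptions for illustration only. In practice, vendors publish endurance as TBW or drive writes per day (DWPD), and those figures should be used instead.

```python
def endurance_days(capacity_tb, pe_cycles, host_writes_tb_per_day,
                   write_amplification):
    """Rough number of days until the rated P/E cycles are used up.

    total NAND writes available = capacity x rated P/E cycles
    NAND writes per day         = host writes x write amplification
    """
    total_nand_writes_tb = capacity_tb * pe_cycles
    nand_writes_per_day_tb = host_writes_tb_per_day * write_amplification
    return total_nand_writes_tb / nand_writes_per_day_tb


# Assumed figures for a hypothetical QLC drive, not a vendor specification.
CAPACITY_TB = 30
PE_CYCLES = 1_000
WRITE_AMPLIFICATION = 2.0

light_writes = endurance_days(CAPACITY_TB, PE_CYCLES, 5, WRITE_AMPLIFICATION)
heavy_writes = endurance_days(CAPACITY_TB, PE_CYCLES, 50, WRITE_AMPLIFICATION)
print(f"5 TB/day: {light_writes:,.0f} days, 50 TB/day: {heavy_writes:,.0f} days")
```

Under these assumptions, a drive taking 5 TB of host writes a day would last about 3,000 days. One absorbing 50 TB a day would wear out in about 300. That is why QLC suits data that is read far more than it is rewritten.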

But the speed of flash suits unstructured data use cases well. Examples include analytics, where rapid processing demands fast I/O, and restoring large datasets from backup after a ransomware attack.
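To a first approximation, restore time is dataset size divided by sustained throughput. The sketch below assumes storage is the bottleneck, although in practice the network or backup software may limit throughput first. The throughput figures are hypothetical, not measurements of any product.

```python
def restore_seconds(dataset_tb, throughput_gb_per_s):
    """Seconds to restore a dataset at a sustained throughput (decimal units)."""
    dataset_bytes = dataset_tb * 10**12
    bytes_per_second = throughput_gb_per_s * 10**9
    return dataset_bytes / bytes_per_second


# Hypothetical sustained restore rates, not measured figures for any product.
SCENARIOS = {"slower tier, 2 GB/s": 2, "faster tier, 20 GB/s": 20}

for label, rate in SCENARIOS.items():
    hours = restore_seconds(100, rate) / 3600
    print(f"100 TB on {label}: {hours:.1f} hours")
```

At these assumed rates, a 100 TB restore takes roughly 13.9 hours on the slower tier and 1.4 hours on the faster one. After a ransomware attack, that difference could decide how long the business is offline.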

Storage hardware providers that sell QLC-based arrays suited to file and in some cases object storage include:

Dell EMC, with PowerScale, a (partial) rebrand of EMC’s Isilon scale-out NAS that adds S3 object storage access. PowerScale comes in hybrid and all-flash versions, and its all-flash NVMe QLC options come in a range of capacities that scale to tens of PB.

NetApp, which recently launched the C-series, a new QLC flash storage array family aimed at higher-capacity use cases that also need the speed of SSD. The C-series starts with three models, the C250, C400 and C800, which scale to 35PB, 71PB and 106PB respectively. Object storage access via the S3 protocol is possible through NetApp’s Ontap OS, but it is limited.

Pure Storage, whose FlashArray//C provides all-QLC, NVMe-connected flash in two models, the //C40 and //C60, with capacities into the PB range. Meanwhile, Pure’s FlashBlade//S family is explicitly marketed as “fast file and object” and uses NVMe QLC in proprietary modules. It comes in two models: the S200 emphasises capacity, with data reduction, while the S500 goes for performance.

Margarete Culton
