{"id":612840,"date":"2023-02-28T08:49:52","date_gmt":"2023-02-28T14:49:52","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/02\/28\/unstructured-data-and-the-storage-it-needs\/"},"modified":"2023-02-28T08:49:52","modified_gmt":"2023-02-28T14:49:52","slug":"unstructured-data-and-the-storage-it-needs","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/02\/28\/unstructured-data-and-the-storage-it-needs\/","title":{"rendered":"Unstructured data and the storage it needs"},"content":{"rendered":"<section id=\"content-body\">\n<p>IDC estimates that upwards of 80% of business information is\u00a0likely to be formed of unstructured data by 2025.<\/p>\n<p>And while \u201cunstructured\u201d can be something of a misnomer, because all files have some sort of metadata by which they can be searched and ordered, for example, there are huge volumes of such data in the hands of businesses.<\/p>\n<p>In this article, we look at what\u2019s particular to working with unstructured data and the storage \u2013 usually file or\u00a0<a href=\"https:\/\/www.computerweekly.com\/resources\/Object-storage\">object<\/a>\u00a0\u2013 that it needs.<\/p>\n<p>In the past, images, voice recordings, videos, chat logs and documents of varying kinds were largely just a storage liability and seen as a headache for anyone who needed to manage, organise and keep it secure.<\/p>\n<p>But now unstructured data is seen as a valuable source of business information. With analytics processing, value can be gained from it \u2013 for example, it\u2019s possible to run\u00a0<a href=\"https:\/\/www.computerweekly.com\/feature\/Storage-requirements-for-AI-ML-and-analytics-in-2022https:\/www.computerweekly.com\/feature\/Putting-artificial-intelligence-and-machine-learning-workloads-in-the-cloud\">AI\/ML<\/a>\u00a0against sets of advertisement images and map what site visitors see to click behaviour. 
Analysis of unstructured image data can create structured fields that can drive editorial decision-making.<\/p>\n<p>Elsewhere, backups \u2013 long consigned to dusty and hard-to-access\u00a0<a href=\"https:\/\/www.computerweekly.com\/feature\/Top-five-ways-to-benefit-from-tape-today\">tape<\/a>\u00a0archives \u2013 are now viewed as a potential data source for analytics processing. And with the threat of\u00a0<a href=\"https:\/\/www.computerweekly.com\/feature\/Ransomware-storage-and-backup-Impacts-limits-and-capabilities\">ransomware<\/a>\u00a0high on the agenda, the need for backups to recover from is more pertinent than ever.<\/p>\n<section data-menu-title=\"Structured, unstructured, semi-structured\">\n<h3><i data-icon=\"1\"><\/i>Structured, unstructured, semi-structured<\/h3>\n<p>Unstructured data, broadly speaking, is data and information that does not conform to a predefined data model \u2013 in other words, information that is created and lives\u00a0<a href=\"https:\/\/www.computerweekly.com\/feature\/Is-object-storage-good-for-databases\">outside a\u00a0relational database<\/a>.<\/p>\n<p>Business information generated by systems is most likely to be\u00a0structured, with customer and product details, order numbers, stock levels and shipment information created by a sales system and stored in its underlying database being typical examples.<\/p>\n<p>Those are more than likely SQL databases, configured with a table-based schema and data held in rows and columns that allow for very rapid writes and querying of the data, with very good transactional integrity. 
SQL\u00a0databases are at the heart of the most performant and mission-critical applications in use.<\/p>\n<\/section>\n<section data-menu-title=\"Unstructured\/semi-structured\">\n<h3><i data-icon=\"1\"><\/i>Unstructured\/semi-structured<\/h3>\n<p>Unstructured data is often created by people, and it includes email, social media posts, voice recordings, images, video, notes, and documents such as PDFs.<\/p>\n<p>As mentioned, most unstructured data is actually what you\u2019d call semi-structured: though it is not usually held in a database \u2013 although that is possible \u2013 there is some structure in its metadata.\u00a0For example, an image of a delivered item would, superficially, be unstructured \u2013 although metadata from the camera files makes it semi-structured.<\/p>\n<p>And then there are backup files, in which all an organisation\u2019s data is copied, compressed, encrypted and packaged into the (usually proprietary) format of the backup vendor.<\/p>\n<p>The fact that backups bundle together all types of data makes them an unstructured data challenge, and one that has possibly more relevance than ever with the rise of the ransomware threat.<\/p>\n<\/section>\n<section data-menu-title=\"Unstructured and semi-structured storage needs\">\n<h3><i data-icon=\"1\"><\/i>Unstructured and semi-structured storage needs<\/h3>\n<p>As we\u2019ve seen, unstructured data is more or less defined by the fact it is not created by use of a database. It may be the case that more structure is applied to unstructured data later in its life, but then it becomes something else.<\/p>\n<p>What we\u2019ll look at here are the key requirements for storage infrastructure for unstructured data.\u00a0These are:<\/p>\n<ul type=\"disc\">\n<li>Volume: Usually there is a lot of unstructured data, so capacity is a key requirement.<\/li>\n<li>File and\/or object storage: Block storage is for databases, and as we\u2019ve seen that\u2019s just not a requirement for unstructured data use cases. 
File-based (NAS) and object storage fulfil the need.<\/li>\n<li>Performance: Historically this wouldn\u2019t have been on the agenda, but with the need for analytics closer to real time and for rapid recovery from cyber attack, it\u2019s now more of a consideration.<\/li>\n<\/ul>\n<\/section>\n<section data-menu-title=\"Cloud and unstructured data\">\n<h3><i data-icon=\"1\"><\/i>Cloud and unstructured data<\/h3>\n<p>With these requirements in mind, cloud storage would appear to fit the bill well as a site to store unstructured data. There are potentially a few things that work against it, however.<\/p>\n<p>Cloud storage provides object (overwhelmingly, in terms of volume) and file-access storage so it is potentially well-suited in that regard.<\/p>\n<p>Cloud storage can also provide capacity, and it may well be the case that data can be stored at volume in the cloud in an extremely cost-effective manner. But it is usually the case that costs can be kept very low only when data is not accessed, so that\u2019s the first potential drawback of cloud storage.<\/p>\n<p>So, the cloud is very good for cold data but any kind of I\/O starts to push up costs.\u00a0That may be acceptable depending on the size and access requirements of your workload, however. Small datasets, or those that require infrequent access, would be ideal.<\/p>\n<\/section>\n<section data-menu-title=\"On-site object and file storage\">\n<h3><i data-icon=\"1\"><\/i>On-site object and file storage<\/h3>\n<p>Clustered NAS and object storage are both well-suited to very large volumes of unstructured data.\u00a0If anything, object storage is even better-suited to large amounts of data due to its superior ability to scale.<\/p>\n<p>File-based storage is based on a file system and a tree-like hierarchical structure. This can lead to performance overheads as the file system is traversed. 
Object storage, by contrast, is based on a flat structure with objects\/files possessing a unique ID that facilitates access.<\/p>\n<p>On-site storage can allay concerns about security of data and its availability, and can potentially work out less costly than putting data in the cloud.<\/p>\n<p>Either set of protocols \u2013 file and object \u2013 is well-suited to unstructured data storage.<\/p>\n<\/section>\n<section data-menu-title=\"Add flash for fast access\">\n<h3><i data-icon=\"1\"><\/i>Add flash for fast access<\/h3>\n<p>It\u2019s quite possible to build adequately performing file and object storage on-site using spinning disk. At the capacities needed, HDD is often the most economic option.<\/p>\n<p>But advances in flash manufacturing have led to high-capacity solid state storage becoming available, and storage array makers have started to use it in file and object storage-capable hardware.<\/p>\n<p>This is QLC \u2013 quad-level cell \u2013 flash. This stores four bits in each flash cell to provide higher storage density, and so lower cost per GB, than any other flash currently in commercial use.<\/p>\n<p>The trade-off that comes with QLC, however, is that flash lifetime can be compromised, so it\u2019s better suited to large-capacity, less frequently accessed data.<\/p>\n<p>But the speed of flash is particularly well-suited to unstructured use cases, such as in analytics where rapid processing and therefore I\/O is needed \u2013 and in cases where customers may want to restore large datasets from backups in case of a ransomware attack, for example.<\/p>\n<p>Storage hardware providers that sell QLC-based arrays suited to file and in some cases object storage include:<\/p>\n<p>Dell EMC, with PowerScale, which includes EMC\u2019s Isilon scale-out NAS (partially) rebranded and with S3 object storage access. 
Its all-flash (it also has hybrid flash) NVMe QLC flash-equipped options come in a range of capacities that scale to tens of PB.<\/p>\n<p>NetApp, which recently launched a new QLC\u00a0flash storage\u00a0array family \u2013 the C-series \u2013 aimed at higher-capacity use cases that also need the speed of SSD.\u00a0The C-series starts with three options \u2013 the C250, C400 and C800 \u2013 which scale to 35PB, 71PB and 106PB respectively. Object storage access is possible but limited using the protocol via NetApp\u2019s Ontap OS.<\/p>\n<p>Pure Storage with its FlashArray\/\/C provides all-QLC NVMe-connected flash in two models, the \/\/C40 and \/\/C60 with capacities into the PB range. Meanwhile, Pure\u2019s FlashBlade\/\/S family is explicitly marketed as \u201cfast file and object\u201d with NVMe QLC in its proprietary modules in two models. The S200 emphasises capacity, with data reduction, while the S500 goes for performance.<\/p>\n<\/section>\n<\/section>\n<p><a href=\"https:\/\/www.computerweekly.com\/feature\/Unstructured-data-and-the-storage-it-needs\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Margarete Culton<\/p>\n","protected":false},"excerpt":{"rendered":"<p>IDC estimates that upwards of 80% of business information is\u00a0likely to be formed of unstructured data by 2025. 
And while \u201cunstructured\u201d can be something of a misnomer, because all files have some sort of metadata by which they can be searched and ordered, for example, there are huge volumes of such data in the hands<\/p>\n","protected":false},"author":1,"featured_media":612841,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23212,46,76879],"tags":[],"class_list":{"0":"post-612840","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-storage","8":"category-technology","9":"category-unstructured"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/612840","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=612840"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/612840\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/612841"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=612840"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=612840"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=612840"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}