{"id":592856,"date":"2023-01-01T05:50:42","date_gmt":"2023-01-01T11:50:42","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/01\/01\/the-importance-of-improving-data-quality-at-source\/"},"modified":"2023-01-01T05:50:42","modified_gmt":"2023-01-01T11:50:42","slug":"the-importance-of-improving-data-quality-at-source","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/01\/01\/the-importance-of-improving-data-quality-at-source\/","title":{"rendered":"The importance of improving data quality at source"},"content":{"rendered":"<div id=\"content-center\">\n<div id=\"contributors-block\">\n<p><img decoding=\"async\" src=\"https:\/\/cdn.ttgtmedia.com\/rms\/computerweekly\/SA-Mathieson-CW-contributor.jpg\" alt=\"SA Mathieson\">\n\t\t\t\t\t<\/p>\n<p><span>By<\/span><\/p>\n<ul>\n<li>\n\t\t\t\t\t<a href=\"https:\/\/www.techtarget.com\/contributor\/SA-Mathieson\">SA Mathieson<\/a>\n\t\t\t\t\t\t<\/li>\n<\/ul>\n<p>\n\tPublished: <span>29 Dec 2022<\/span>\n<\/p>\n<\/div>\n<section id=\"content-body\">\n<p>The UK can blame its bad immigration data on Hungary, one of the eight central and eastern European countries which joined the European Union in 2004. Unlike most existing EU countries, the UK allowed citizens of the new member states to move and work there without restrictions, expecting 5,000 to 13,000 people to arrive each year.
But this was a massive underestimate, prompting accusations that immigration was out of control and arguably contributing to Britain\u2019s exit from the EU.<\/p>\n<p>According to the results of <a href=\"https:\/\/www.computerweekly.com\/news\/252507874\/ONS-publishes-technical-learnings-of-UKs-first-online-census\">the 2021 Census<\/a>, the country which sent the most people to the UK was Poland, followed by Romania. But Hungary is the home of budget airline Wizz Air, which, to keep costs down, tends to use smaller airports such as Luton, Birmingham and Sheffield Doncaster.<\/p>\n<p>Also to keep down costs, the International Passenger Survey run by the Office for National Statistics (ONS) at the time focused on Heathrow, Gatwick and Manchester. As a result, it didn\u2019t notice increasing numbers of eastern Europeans using budget flights run by Wizz Air and others.<\/p>\n<p>Georgina Sturge, a statistician for the House of Commons Library research service, highlights the episode in her new book, <em>Bad Data<\/em>, as an example of how data collection can go awry. The passenger survey had been set up in the 1960s, when far fewer people travelled internationally, more left the UK permanently than arrived, and most people required visas.<\/p>\n<p>\u201cPeople didn\u2019t tend to travel in large droves from Pozna\u0144 to Doncaster in the past,\u201d says Sturge.
\u201cUnfortunately for the statisticians, who hadn\u2019t even stationed anyone there to do the survey at the time, that was exactly what people started to do.\u201d<\/p>\n<p>Sturge says the UK has excellent <a href=\"https:\/\/www.computerweekly.com\/news\/252521424\/Health-data-strategy-to-exorcise-ghosts-of-GPDPR\">official data in some areas, including health<\/a>, traffic accident statistics and much of the ONS\u2019s output. The Office for Statistics Regulation maintains a list of <a href=\"https:\/\/uksa.statisticsauthority.gov.uk\/list-of-national-statistics\/\">approved national statistics<\/a>, which she describes as the gold standard.<\/p>\n<p>\u201cBut ultimately, if we\u2019re asked a question or we need to produce some briefing material on something and there is any data out there which seems remotely reliable, we will pretty much end up using it,\u201d she says of her work for MPs and their staff. \u201cFrom our perspective, it\u2019s about explaining the caveats.\u201d This means thinking about where data comes from, how it is collected and for what purpose, considering the human processes involved rather than just the technical matter of getting hold of it.<\/p>\n<section data-menu-title=\"Replication crisis\">\n<h3><i data-icon=\"1\"><\/i>Replication crisis<\/h3>\n<p>Parliamentarians are not alone in being hungry for data, and not too picky about what they consume.
Recent years have seen several scientific fields threatened by a replication crisis, where the results of research published in peer-reviewed journals cannot be reproduced by others repeating the work, in some cases because the data has errors or is faked.<\/p>\n<p>Researchers who rely on such research data may find their work is undermined, but the risk can be lessened by using services that carry out reliability checks on papers. Healthcare journalist and academic Ivan Oransky co-founded <a href=\"https:\/\/retractionwatch.com\/\">Retraction Watch<\/a>, a database of scientific papers that have been withdrawn. Its data is used by publishers and companies to check references through bibliographic management software including EndNote, Papers and Zotero, as well as digital library service Third Iron. \u201cWe would be happy to work with more, and to have our database integrated into the manuscript management systems that publishers use,\u201d he says.<\/p>\n<p>However, he adds, the bigger problem lies in inaccurate papers and data that have not been retracted, making it worth using <a href=\"https:\/\/pubpeer.com\/\">post-publication review services such as PubPeer<\/a>, of which he is a volunteer director. More generally, he adds that researchers are well-advised to follow the Russian proverb, \u201ctrust, but verify\u201d, adopted by former US president Ronald Reagan in nuclear disarmament talks with the Soviet Union.<\/p>\n<p>Researchers should aim to obtain and analyse the original data before relying on it for a project or further research. \u201cThat may seem inefficient, but it\u2019s far better than being caught unaware when a project is much further along,\u201d says Oransky.<\/p>\n<p>Another approach is to improve the classification of scientific data, particularly that held in text. 
Neal Dunkinson, vice-president of solutions and professional services for semantic analytics company SciBite, says the word \u201chedgehog\u201d in a genetics paper may refer to the sonic hedgehog gene, named after the video game character, which helps control how bodies develop from embryos, or it may refer to the small, spiny mammal.<\/p>\n<p>Cambridge-based SciBite, which was bought by Dutch scientific publisher Elsevier in 2020, has developed a service to automate the tagging of mentions of 40,000 genes with standard identities, making searches of papers, slides and electronic lab notebooks more precise. To do so, it has built lists of acronyms, alternative names and spellings, and common misspellings. As well as applying the service to existing material, SciBite offers a real-time option that prompts researchers to add tags through drop-down lists or the equivalent of a spellchecker.<\/p>\n<p>Dunkinson says that good-quality data in life sciences should be \u201cFAIR\u201d \u2013 findable, accessible, interoperable and reusable. \u201cWe don\u2019t at the moment critique the quality of the information written down \u2013 that\u2019s about repeatability in the experimental process \u2013 but how usable is that information, is it tagged properly, is it stored correctly, do people know where it is, is it in the right formats,\u201d he says.<\/p>\n<\/section>\n<section data-menu-title=\"Dependency chain in financial auditing\">\n<h3><i data-icon=\"1\"><\/i>Dependency chain in financial auditing<\/h3>\n<p>Financial auditing, like much scientific research, relies on other people\u2019s data. Organisations are responsible for their accounts, but auditors have to extract data so they can check its accuracy and integrity.
London-based audit technology company Engine B has worked with the Institute of Chartered Accountants in England and Wales and audit firms to build a common data model for material extracted from widely used enterprise resource planning packages.<\/p>\n<p>The company\u2019s head of audit and ethics, Franki Hackett, says the system uses knowledge of common software and practices to propose a transformation that it thinks will let a file be loaded correctly into this common model, but she says it remains wise to include human checks. \u201cYou can take the human out of the loop, but when you do, you quite often see errors in fidelity, or mistranslation of data or inappropriate transformation and loading,\u201d she says. \u201cKeeping a good balance between the machine and the human being is a critical part of that stage of data quality.\u201d<\/p>\n<p>If it has processed a previous version of a file, Engine B\u2019s system flags any changes in the data\u2019s structure, such as new fields. Hackett says organisations tend to be weak at reviewing data processes after they have been set up, meaning that such changes get missed. \u201cAn \u2018if it ain\u2019t broke, don\u2019t fix it\u2019 mentality can miss that creeping brokenness,\u201d she says.<\/p>\n<p>Auditors working to decide if they can verify the accuracy and completeness of an organisation\u2019s financial records often compare two sets of data recording the same things, such as the general ledger, which details all transactions, and the trial balance, which summarises debits and credits. These should match up, but it\u2019s common to find discrepancies such as different dates for transactions, which can indicate poor controls.
Hackett says she has seen senior financial professionals sticking their usernames and passwords on their monitors for others to use, risking outright fraud but also making mistakes more likely \u2013 and different dates in the two data sets can indicate attempts to fix such mistakes.<\/p>\n<p>Similarly, in academic research on tax transparency, Hackett has found that the country-level data that a European directive requires some large companies to publish often doesn\u2019t tally with global figures. The parameters of the required national data are badly defined, she says: \u201cThey can produce something which is fundamentally kind-of unusable, a nonsense that\u2019s a public relations exercise a lot of the time.\u201d This demonstrates the need to know exactly what questions data collection is trying to answer.<\/p>\n<p>Waseem Ali, chief executive of diversity-focused consultancy and training business Rockborne, previously worked as chief data officer for insurance market Lloyd\u2019s of London and head of analytics for healthcare provider Virgin Care. For insurers, bad data can mean wrongly priced premiums, but in healthcare, it can mean failing to provide potentially life-saving advice.<\/p>\n<p>\u201cThere is a high likelihood that I will have some sort of heart disease, based on my family history and my ethnicity,\u201d says Ali. \u201cHaving the right quality data about me allows healthcare providers to intervene sooner, so they can ensure that someone like Waseem goes to the gym regularly and eats properly.\u201d As well as being in the interest of patients, data-driven predictive work could cut healthcare system costs by reducing the number of major interventions later.<\/p>\n<p>Ali says organisations can seek to improve data quality by understanding its end-to-end journey and focusing on the most business-critical material.
Improvements can be made through simple changes such as standardising how teams calculate the likes of profit margins and customer experience so these can be properly compared. \u201cI\u2019ve been in organisations where the same statistic is reported with different numbers due to the way it is being interpreted,\u201d he says.<\/p>\n<p>Anthony Scriffignano, chief data scientist of Dun &#038; Bradstreet, a Florida-based company that has published data on businesses for two centuries, sees four aspects of data quality: accuracy, completeness, timeliness and veracity. Completeness and timeliness are relatively easy to check, although a blank field can mean the data doesn\u2019t exist rather than that it has been missed \u2013 for example, because a business has no telephone number \u2013 and data collected today may have been created or updated some time previously.<\/p>\n<p>Checking accuracy is harder. In some cases, Dun &#038; Bradstreet can draw on official documents, but if there is no authoritative source, \u201cit becomes a little bit of an art\u201d, says Scriffignano. The company can consider the reliability of the organisation providing information and whether numerical data is within likely ranges, although the latter check needs to be applied with care. It may sound unlikely that a removal and storage provider is more than five centuries old, but as it says on its lorries, Aberdeen\u2019s Shore Porters Society was founded in 1498. The key is to have rigorous checking processes. \u201cYou can\u2019t just wing it,\u201d he says.<\/p>\n<p>The hardest of the four is veracity. Scriffignano points out that \u201cthe truth, the whole truth and nothing but the truth\u201d can be three different things, with the first broken by lying, the second broken by omission and the third only fulfilled through being entirely truthful.<\/p>\n<p>There are ways to check that a set of data satisfies all three, such as statistical analysis of its distribution.
If a graph of a set of data would normally look like a bell curve with a high point in the middle and tapering sides, but instead only includes the high middle, it indicates that some data is being excluded \u2013 the truth but not the whole truth. Dun &#038; Bradstreet saw data on bankruptcies being warped during the Covid-19 pandemic, as smaller ones were missed or not reported.<\/p>\n<p>Despite all the ways that data can be tested, Scriffignano says the biggest problems are caused by organisations unintentionally ingesting data that has unknown issues. \u201cAs a consumer of data, depending on what you\u2019re doing with it, you probably should think about where you\u2019re getting it from and how you know that you trust it,\u201d he says.<\/p>\n<\/section>\n<\/section>\n<\/div>\n<p><a href=\"https:\/\/www.computerweekly.com\/feature\/The-importance-of-improving-data-quality-at-source\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>By SA Mathieson Published: 29 Dec 2022 The UK can blame its bad immigration data on Hungary, one of the eight countries which joined the European Union in 2004.
Unlike most existing EU countries, the UK allowed citizens of the new member states to move and work there without restrictions, expecting 5,000 to 13,000 people to arrive each year.<\/p>\n","protected":false},"author":1,"featured_media":592857,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[33201,28369,46],"tags":[],"class_list":{"0":"post-592856","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-importance","8":"category-improving","9":"category-technology"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/592856","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=592856"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/592856\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/592857"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=592856"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=592856"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=592856"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}