Encord, the platform for data-centric computer vision, has released a data quality assessment tool that automatically finds errors within annotated training data.
Current methods of collating training data for AI are inherently manual, and with millions of data points to analyse, errors are hard to detect. These errors feed into model training and create complications further down the computer vision pipeline. If objects are given the wrong classification, such as dogs being labelled as cats, the model becomes confused and learns “dog-like” qualities for what it believes are cats.
Harnessing the power of automation, Encord’s data quality assessment tool replaces the manual processes that make AI development expensive, time-consuming and difficult to scale. It streamlines the ground truth review process in a fully automated way, requiring no additional client input and removing the manual steps for validation.
Annotations are ranked by their probability of error, enabling AI teams to build computer vision models on high-quality training data.
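To illustrate the idea of ranking annotations by error probability, here is a minimal, hypothetical sketch. Encord has not published its method; this example simply assumes a trained model provides class probabilities for each annotation, and scores an annotation by how much probability mass the model assigns to classes other than the human-assigned label.

```python
# Hypothetical sketch only; Encord's actual algorithm is not public.
# Assumes each annotation carries the model's per-class probabilities.

def rank_by_error_probability(annotations):
    """Sort annotations so the most likely labelling errors come first.

    Each annotation is a dict with the human-assigned `label` and the
    model's `probs` (class -> probability). The error score is the
    probability the model assigns to classes OTHER than the label.
    """
    def error_score(ann):
        return 1.0 - ann["probs"].get(ann["label"], 0.0)

    return sorted(annotations, key=error_score, reverse=True)


annotations = [
    {"id": 1, "label": "cat", "probs": {"cat": 0.95, "dog": 0.05}},
    {"id": 2, "label": "cat", "probs": {"cat": 0.10, "dog": 0.90}},  # likely a mislabelled dog
    {"id": 3, "label": "dog", "probs": {"cat": 0.30, "dog": 0.70}},
]

ranked = rank_by_error_probability(annotations)
# the suspected mislabelled dog (id 2) surfaces first for review
```

Reviewers can then work down the ranked list from the top, so the annotations most likely to be wrong are checked first rather than sampling the dataset at random.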
Backed by CRV, Y Combinator, WndrCo, Crane Venture Partners, Harpoon Ventures and Harvard Management Corporation, Encord has partnered with world-leading healthcare institutions including King’s College London, where its tool increased the efficiency of annotating pre-cancerous polyp videos by an average of 6.4x and automated 97% of labels, making the most expensive clinician 16x more efficient at labelling medical images.
It has also worked with Memorial Sloan Kettering Cancer Center and Stanford Medical Center, where use of the platform reduced experiment duration by 80%.
“With the rise of data-centric AI, the success of an AI initiative is largely dependent on the quality of training data and data pipeline. Even with the most sophisticated models in the world, AI teams cannot create an accurate AI application if there are errors in training data,” said Eric Landau, Co-Founder and CTO at Encord. “There are currently no tools on the market that automate the process of finding errors within training data. Our tool drastically improves computer vision datasets, which will ultimately improve computer vision models.”
The data quality assessment tool will slot into Encord’s existing platform. It provides powerful automation features that enable organisations to check the results of machine learning for accuracy and find ground truths.
“Companies that utilise computer vision models deal with an enormous amount of data. However, there are always errors within the data, and a significant amount of effort goes into building internal manual processes to sift through the data, find errors and clean the data,” said Ashesh Jain, former Head of Autonomy at Lyft.
“A label quality assessment tool can potentially not only reduce cost and save time but, as you’re able to understand what a bad sample is, it also validates and improves the overall model. This is a win-win for anyone building a computer vision model based on annotated training data,” he continued.