Developing the capacity to annotate massive volumes of data while maintaining quality is a part of the model development lifecycle that enterprises often underestimate. It’s resource-intensive and requires specialized expertise.
At the heart of any successful machine learning/artificial intelligence (ML/AI) initiative is a commitment to high-quality training data and a pathway to quality data that is proven and well-defined. Without this quality data pipeline, the initiative is doomed to fail.
Computer vision or data science teams often turn to external partners to develop their training data pipeline, and these partnerships drive model performance.
There is no one definition of quality: “quality data” is completely contingent on the specific computer vision or machine learning project. However, there is a general process all teams can follow when working with an external partner, and this path to quality data can be broken down into four prioritized phases.
Annotation criteria and quality requirements
Training data quality is an evaluation of a data set’s fitness to serve its purpose in a given ML/AI use case.
The computer vision team needs to establish an unambiguous set of rules that describe what quality means in the context of their project. Annotation criteria are the collection of rules that define which objects to annotate, how to annotate them correctly, and what the quality targets are.
Quality targets define the lowest acceptable result for evaluation metrics such as accuracy, recall, precision, and F1 score. Typically, a computer vision team will have quality targets for how accurately objects of interest are classified, how accurately objects are localized, and how accurately relationships between objects are identified.
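To make the idea of a quality target concrete, here is a minimal Python sketch of how a team might score a batch of annotations against minimum thresholds. It is an illustration under stated assumptions, not a method described in this article: the names `Box`, `iou`, `precision_recall_f1`, and `failed_targets` are hypothetical, the threshold values are invented for the example, and intersection over union (IoU) is used here as one common way to measure localization accuracy for bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from true/false positive and
    false negative counts, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def failed_targets(scores: dict, targets: dict) -> list:
    """Return the names of metrics that fall below their minimum target."""
    return sorted(
        name for name, minimum in targets.items()
        if scores.get(name, 0.0) < minimum
    )


# Assumed example values: a reviewed batch with 90 correct labels,
# 10 spurious labels, and 30 missed objects.
scores = precision_recall_f1(tp=90, fp=10, fn=30)

# Assumed quality targets (lowest acceptable value per metric).
targets = {"precision": 0.85, "recall": 0.80, "f1": 0.80}

below_target = failed_targets(scores, targets)

# Localization check for one annotated box against a reference box.
overlap = iou(Box(0, 0, 10, 10), Box(5, 0, 15, 10))
```

In this sketch a batch passes only when `failed_targets` returns an empty list. A team would typically track classification, localization, and relationship targets separately, since a batch can meet one and miss another.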
Workforce training and platform configuration
Platform configuration. Task design and workflow setup require time and expertise, and accurate annotation requires task-specific tools. At this stage, data science teams need a partner with expertise to help them determine how best to configure labeling tools, classification taxonomies, and annotation interfaces for accuracy and throughput.
Worker testing and scoring. To accurately label data, annotators need a well-designed
By: Ben Schneider
Title: Computer vision in AI: The data needed to succeed
Sourced From: www.technologyreview.com/2021/04/29/1023746/computer-vision-in-ai-the-data-needed-to-succeed/
Published Date: Thu, 29 Apr 2021 14:00:00 +0000