Artificial intelligence and machine learning (AI and ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and understand their customers better. AWS helps customers accelerate their AI/ML adoption by delivering powerful compute, high-speed networking, and scalable high-performance storage options on demand for any machine learning project. This lowers the barrier to entry for organizations looking to adopt the cloud to scale their ML applications.

Developers and data scientists are pushing the boundaries of technology and increasingly adopting deep learning, a type of machine learning based on neural network algorithms. Deep learning models keep growing larger and more sophisticated, which raises the cost of running the underlying infrastructure needed to train and deploy them.

To help customers accelerate their AI/ML transformation, AWS is building high-performance, low-cost machine learning chips. AWS Inferentia is the first machine learning chip built from the ground up by AWS, designed for the lowest-cost machine learning inference in the cloud. Amazon EC2 Inf1 instances powered by Inferentia deliver 2.3x higher performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium, the second machine learning chip from AWS, is purpose-built for training deep learning models and will be available in late 2021.
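To see how a throughput gain and a price difference combine into a cost-per-inference figure, the arithmetic can be sketched as follows. This is a minimal illustration, not AWS pricing: the function name, the hourly prices, and the throughput numbers are all hypothetical, and the calculation assumes each instance runs at full utilization.

```python
def cost_per_million_inferences(hourly_price_usd: float, throughput_per_sec: float) -> float:
    """Return the USD cost of serving one million inferences at full utilization."""
    if hourly_price_usd < 0 or throughput_per_sec <= 0:
        raise ValueError("price must be non-negative and throughput must be positive")
    inferences_per_hour = throughput_per_sec * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


# Hypothetical figures chosen only to illustrate the arithmetic.
baseline = cost_per_million_inferences(hourly_price_usd=1.00, throughput_per_sec=1000)
candidate = cost_per_million_inferences(hourly_price_usd=0.80, throughput_per_sec=2000)

# Relative saving of the candidate instance compared with the baseline.
savings = 1 - candidate / baseline

print(f"baseline:  ${baseline:.4f} per million inferences")
print(f"candidate: ${candidate:.4f} per million inferences")
print(f"savings:   {savings:.0%}")
```

The point of the sketch is that cost per inference depends on the ratio of price to throughput, so an instance that is both cheaper per hour and faster compounds the two advantages.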

Customers across industries have deployed their ML applications in production on Inferentia and seen significant performance improvements and cost savings. For example, Airbnb’s customer support platform delivers intelligent, scalable, and exceptional service experiences to its community of millions of hosts and guests across the globe. Airbnb used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that support its chatbots. This led to a 2x improvement in performance out of the box over GPU-based instances.

With these innovations in silicon, AWS is enabling customers to train and execute their deep learning models in production easily with high performance and throughput at significantly lower costs.

Machine learning challenges speed shift to cloud-based infrastructure

Machine learning is an iterative process: teams must build, train, and deploy applications quickly, and retrain and experiment frequently to increase the prediction accuracy of their models. When deploying trained models into their business applications, organizations also need to scale those applications to serve new users across the globe. They must be able to serve many simultaneous requests with near real-time latency to ensure a superior user experience.

Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time series analysis rely on deep learning technology. Deep learning models are increasing exponentially in size and complexity, going from millions of parameters to billions in just a couple of years.
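The jump from millions to billions of parameters translates directly into memory requirements. The following is a rough sketch, assuming only the standard byte widths of common numeric formats; the function name and the example parameter counts are illustrative, and the estimate covers model weights alone, not activations or optimizer state.

```python
# Bytes needed to store a single parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "fp32") -> float:
    """Estimate the memory needed to hold a model's weights, in GiB."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Illustrative model sizes: one in the millions, one in the billions.
for label, num_params in [("100 million", 100_000_000), ("10 billion", 10_000_000_000)]:
    for dtype in ("fp32", "fp16"):
        gib = weight_memory_gib(num_params, dtype)
        print(f"{label} parameters in {dtype}: {gib:.2f} GiB")
```

A hundredfold increase in parameters means a hundredfold increase in weight storage, which is one reason larger models push up infrastructure requirements for both training and inference.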

Training and deploying these complex and sophisticated models translate into significant infrastructure costs. Costs can quickly snowball and become prohibitive as organizations scale their applications to deliver near real-time experiences to their users and customers.

This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networking, and large data storage, seamlessly combined with ML operations and higher-level AI services, so that organizations can get started immediately and scale their AI/ML initiatives.

How AWS is helping customers accelerate their AI/ML transformation

AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers irrespective of experience and organization size. Inferentia’s design is optimized for high performance, throughput, and low latency, which makes it ideal for deploying ML inference at scale.

Each AWS Inferentia chip contains four NeuronCores that implement a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations, such as convolutions and transformer layers. NeuronCores are also



By: Amazon Web Services
Title: High-performance, low-cost machine learning infrastructure is accelerating innovation in the cloud
Sourced From:
Published Date: Mon, 01 Nov 2021 16:05:06 +0000