Raju Gulabani, VP, Databases, Analytics, and AI, AWS said “The combination of better algorithms and broad access to massive amounts of data and cost-effective computing power provided by the cloud is making AI a reality for application developers. We are excited to see how customers use Amazon Lex, Amazon Polly, and Amazon Rekognition to build a new generation of apps that have human-like intelligence and can see, hear, speak, and interact with people and their environments.”
Lex
Lex is the machine learning technology that powers Amazon Alexa; its key components are Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). Developers can use Lex to quickly build chat and voice bots that can be integrated into services and applications. Lex is deployed as a fully managed service and requires little time to set up, manage and scale.
The main concepts used by Amazon Lex. Image: Amazon.
Lex has an integrated development environment in its console. Developers can create bots, test them, and deploy them through the interface. There are some sample bots to start with as well. Bots built using Lex can be used on multiple platforms, and Amazon handles the authentication processes for the different platforms. Lex can connect with Facebook Messenger as of now, and support for Slack and Twilio is being worked on. The Lex service is charged at the rate of $4 per 1,000 speech requests and $0.75 per 1,000 text requests.
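To put the quoted rates in perspective, here is a minimal Python sketch that estimates a bill from request counts. It assumes simple linear pricing with no free tier or volume discounts; the function name and the sample volumes are made up for illustration.

```python
from decimal import Decimal

# Rates quoted in the article: $4 per 1,000 speech requests
# and $0.75 per 1,000 text requests.
SPEECH_RATE_PER_1000 = Decimal("4.00")
TEXT_RATE_PER_1000 = Decimal("0.75")


def estimate_lex_cost(speech_requests: int, text_requests: int) -> Decimal:
    """Return the estimated charge in US dollars, assuming linear pricing."""
    if speech_requests < 0 or text_requests < 0:
        raise ValueError("request counts must be non-negative")
    speech_cost = SPEECH_RATE_PER_1000 * speech_requests / 1000
    text_cost = TEXT_RATE_PER_1000 * text_requests / 1000
    return speech_cost + text_cost


if __name__ == "__main__":
    # Hypothetical monthly volume: 20,000 spoken and 50,000 typed requests.
    print(estimate_lex_cost(20_000, 50_000))
```

At these rates a spoken request costs a little over five times as much as a typed one, so the mix of voice and text traffic matters more than the total number of requests.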
The testing interface for bots. Image: Amazon.
Benjamin Stein, Director of Messaging Products, Twilio said “Developers and businesses use Twilio to build apps that can communicate with customers in virtually every corner of the world. Amazon Lex will provide developers with an easy-to-use modular architecture and comprehensive APIs to enable building and deploying conversational bots on mobile platforms. We look forward to seeing what our customers build using Twilio and Amazon Lex.”
Polly
Polly is a cloud-based text-to-speech service that generates human-like speech from a text string. The audio can be downloaded as an MP3 file for use in applications and services. Speech Synthesis Markup Language (SSML) is supported for advanced functionality, such as mixed-language text: developers can use SSML to indicate to Polly that some words in an English sentence are in French. There is an extensive menu of languages and regions, with support for five regional accents for English, including Indian. There are two alternative accents each for French, Portuguese and Spanish.
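The mixed-language case can be sketched in Python as follows. The example builds an SSML string in which one phrase is marked as French using the standard SSML `lang` element; the helper name and the sample sentence are hypothetical, and the exact set of tags Polly accepts should be checked against its documentation.

```python
from xml.sax.saxutils import escape


def french_phrase_ssml(before: str, french: str, after: str) -> str:
    """Wrap one French phrase inside an otherwise English SSML document."""
    return (
        "<speak>"
        f"{escape(before)}"
        # xml:lang tells the synthesizer which language the enclosed words are in.
        f'<lang xml:lang="fr-FR">{escape(french)}</lang>'
        f"{escape(after)}"
        "</speak>"
    )


if __name__ == "__main__":
    print(french_phrase_ssml("The restaurant is called ", "Chez Nous", "."))
```

The text is escaped before it is embedded so that characters such as `&` or `<` in the input do not produce malformed markup.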
Polly supports plain text or SSML. Image: Amazon.
The SDK or console can be used to send text to Polly, which converts it to speech in the cloud and returns the audio. The service can be integrated into e-book readers, personal assistants, entertainment apps, public service announcement systems, or e-learning platforms. Polly can also handle high volumes of text rapidly, and it can be used from a command line interface.
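As a rough illustration of the SDK route, the Python sketch below sends a string to Polly and saves the returned audio as an MP3. It assumes a client object exposing the `synthesize_speech` call from the AWS SDK for Python (boto3); the helper name, the file path, and the choice of the "Joanna" voice are illustrative assumptions.

```python
def synthesize_to_mp3(polly_client, text: str, out_path: str,
                      voice_id: str = "Joanna") -> int:
    """Send text to Polly and write the returned MP3 audio to out_path.

    polly_client is assumed to behave like boto3.client("polly").
    Returns the number of bytes written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # The audio comes back as a stream that is read once.
    audio = response["AudioStream"].read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With boto3 installed and AWS credentials configured, the client would typically be created with `boto3.client("polly")` and passed in as the first argument. Taking the client as a parameter keeps the helper easy to test without network access.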
Some of the accents and languages available in Polly. Image: Amazon.
Joseph Price, Senior Product Manager, The Washington Post said “We’ve long been interested in providing audio versions of our stories, but have found that existing text-to-speech solutions are not cost-effective for the speech quality they offer. With the arrival of Amazon Polly and its high-quality voices, we look forward to offering readers more rich and versatile ways to experience our content.”
Rekognition
Rekognition is an artificial intelligence service for image analysis. It can be used to recognise faces, objects and scenes in an image. The AI delivers a confidence score for each identification, which is a rating of how likely the identification is to be accurate. These confidence scores can be further processed by an app or a service. There are also advanced facial analysis functions such as face comparison and face search.
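An app might process the confidence scores along these lines. This Python sketch assumes a response shaped like the output of Rekognition's label detection call, with a `Labels` list whose entries carry a `Name` and a `Confidence` between 0 and 100; the sample data and the threshold are invented for illustration.

```python
from typing import List


def confident_labels(response: dict, min_confidence: float = 90.0) -> List[str]:
    """Return the names of labels whose confidence meets the threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]


if __name__ == "__main__":
    # Made-up sample response, not real service output.
    sample = {
        "Labels": [
            {"Name": "Dog", "Confidence": 98.7},
            {"Name": "Pet", "Confidence": 97.2},
            {"Name": "Grass", "Confidence": 61.4},
        ]
    }
    print(confident_labels(sample))
```

The threshold is a design choice: a photo-tagging app may accept lower scores to surface more tags, while a security application would likely demand higher ones.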
Rekognition gives scores on a picture, confidently identifying the image as that of a "dog" and a "pet". Image: Amazon.
Rekognition's capabilities include assessing whether a person's mouth is open or shut, whether they are smiling, whether they are happy, whether they are wearing sunglasses, and whether they have facial hair. Applications for Rekognition include security services, smart marketing implementations that track user engagement, and automatic indexing and tagging for vast image libraries.
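The facial attributes listed above could be condensed for an app roughly as follows. This Python sketch assumes each face record resembles an entry in the `FaceDetails` list of Rekognition's face detection response, where attributes such as `Smile` or `Sunglasses` hold a `Value` and a `Confidence`; the helper name, the threshold, and the sample record are hypothetical.

```python
def describe_face(face: dict, min_confidence: float = 80.0) -> dict:
    """Reduce one face record to simple answers.

    An attribute that is missing or below the threshold is reported as None.
    """
    def flag(name: str):
        attr = face.get(name)
        if attr is None or attr["Confidence"] < min_confidence:
            return None
        return attr["Value"]

    # Pick the emotion with the highest confidence, if any were reported.
    top = max(face.get("Emotions", []),
              key=lambda e: e["Confidence"], default=None)
    return {
        "smiling": flag("Smile"),
        "mouth_open": flag("MouthOpen"),
        "sunglasses": flag("Sunglasses"),
        "beard": flag("Beard"),
        "mustache": flag("Mustache"),
        "top_emotion": top["Type"] if top else None,
    }


if __name__ == "__main__":
    # Made-up sample record, not real service output.
    sample_face = {
        "Smile": {"Value": True, "Confidence": 95.0},
        "MouthOpen": {"Value": False, "Confidence": 60.0},
        "Sunglasses": {"Value": False, "Confidence": 99.0},
        "Beard": {"Value": True, "Confidence": 88.0},
        "Emotions": [
            {"Type": "HAPPY", "Confidence": 92.0},
            {"Type": "CALM", "Confidence": 5.0},
        ],
    }
    print(describe_face(sample_face))
```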
AWS has advanced features for processing images of faces. Image: Amazon.
Don MacAskill, Co-Founder, Chief Executive Officer, and Chief Geek, SmugMug said “SmugMug customers want to spend their time making more memories, not manually managing their photo collection. Amazon Rekognition will allow us to automatically identify the content in customers’ photos, unlocking a host of features that will allow them and their visitors to have more time to focus on enjoying life and celebrating their photos.”
The three new AI services are scalable and cost-effective, with developers paying only for what they use. Amazon has simplified access to neural networks, the data required for training, and expertise in machine learning. Amazon has already done the heavy lifting, with the artificial intelligences trained for a wide variety of scenarios. Developers can start using the AI directly, without building machine learning algorithms, training models, or committing to infrastructure investments up front.