Real-time rendering is helping one studio take virtual production to impossible heights.
In their latest project, the creators at Los Angeles-based company Impossible Objects were tasked with depicting an epic battle between characters from the upcoming video game, Diablo Immortal. But the showdown had to take place on the surface of a Google Pixel phone, set in the living room of a live actor.
The team at Impossible Objects brought this vision to life using accelerated virtual production workflows to blend visual effects with live action. Using Epic Games’ Unreal Engine and NVIDIA RTX A6000-powered Dell Precision 7920 workstations, the team created all the stunning cinematics and graphics, from high-fidelity textures and reflections to realistic camera movement and lighting.
These advanced technologies helped the artists make instant creative decisions as they could view high-quality virtual imagery rendered in real time.
“We can build larger, photorealistic worlds and not worry about relying on outdated creative workflows,” said Joe Sill, founder of Impossible Objects. “With Unreal Engine and NVIDIA RTX-powered Dell Precision workstations, we brought these Diablo Immortal characters to life.”
Real-Time Technologies Deliver Impossible Results
Previously, to tackle a project like this, the Impossible Objects team would look at concept art and storyboards to get an idea of what the visuals were supposed to look like. But with virtual production, the creators can work in a nonlinear way, bridging the gap between imagination and the final high-resolution images faster than before.
For the Diablo Immortal project, Impossible Objects used Unreal Engine for previsualization — where the artists were able to make creative, intentional decisions because they were experiencing high-fidelity images in real time. Moreover, the previsualization happened simultaneously with the virtual art department and layout phases.
The team used NVIDIA RTX A6000-powered Dell Precision 7920 workstations — an advanced combination that allowed the artists to enhance their virtual production and creative workflows. The RTX A6000 GPU delivers 48 gigabytes of VRAM, a crucial spec for offline rendering in Unreal Engine. With more GPU memory, the team had room for more geometry and higher-resolution textures.
“Rendering would not have been possible without the A6000 — we maxed out on its 48 gigs of memory, using all that room for textures, environments and geometry,” said Luc Delamare, head of Technology at Impossible Objects. “We could throw anything at the GPU, and we’d still have plenty of performance for real-time workflows.”
Typically, this project would have taken up to six months to complete. But the nonlinear approach enabled by the real-time pipeline allowed Impossible Objects to cut the production time in half.
The video game characters in the commercial were prebuilt and provided by Blizzard. Impossible Objects used Autodesk Maya to up-res the characters and scale them to perform better in a cinematic setting.
The team often toggled between compositing software, Autodesk Maya and Unreal Engine as they ported animation back and forth between the applications. And as the project started to get bigger, Impossible Objects turned to another solution: NVIDIA Deep Learning Super Sampling, an AI rendering technology that uses a neural network to boost frame rates and produce sharp images.
“NVIDIA DLSS was incredibly important, as we were able to use it in the real-time workflow, even with characters that had high polygon counts,” said Delamare. “This solution became really helpful, especially as the project started to get denser and denser.”
At the animation stage, Unreal Engine and NVIDIA RTX allowed the team to simultaneously update cinematography and lighting in real time. The end result was fewer department handoffs, which saved time and made creative communication more efficient.
With all of these advanced technologies combined, Impossible Objects had the power to create a more efficient, iterative process — one that allowed the team to ditch linear pipelines and instead take on a much more creative, collaborative workflow.
To learn more about the project, watch the video below:
Back in 2017, we began AIY Projects to make do-it-yourself artificial intelligence projects accessible to anybody. Our first project was the AIY Voice Kit, which allows you to build your own intelligent device that responds to voice commands. Then we released the AIY Vision Kit, which can recognize objects seen by its camera using on-device TensorFlow models. We were amazed by the projects people built with these kits and thrilled to see educational programs use them to introduce young engineers to the possibilities of computer science and machine learning (ML). So I’m excited to continue our mission to bring machine learning to everyone with the more powerful and more customizable AIY Maker Kit.
Making ML accessible to all
The Voice Kit and Vision Kit are a lot of fun to put together and they include great programs that demonstrate the possibilities of ML on a small device. However, they don’t provide the tools or procedures to help beginners achieve their own ML project ideas. When we released those kits in 2017, it was actually quite difficult to train an ML model, and getting a model to run on a device like a Raspberry Pi was even more challenging. Nowadays, if you have some experience with ML and know where to look for help, it’s not so surprising that you can train an object detection model in your web browser in less than an hour, or that you can run a pose detection model on a battery-powered device. But if you don’t have any experience, it can be difficult to discover the latest ML tools, let alone get started with them.
We intend to solve that with the Maker Kit. With this kit, we’re not offering any new hardware or ML tools; we’re offering a simplified workflow and a series of tutorials that use the latest tools to train TensorFlow Lite models and execute them on small devices. So it’s all existing technology, but better packaged so beginners can stop searching and start building incredible things right away.
Simplified tools for success
The material we’ve collected and created for the Maker Kit offers an end-to-end experience that’s ideal for educational programs and users who just want to make something with ML as fast as possible.
The hardware setup requires a Raspberry Pi, a Pi Camera, a USB microphone, and a Coral USB Accelerator so you can execute advanced vision models at high speed on the Coral Edge TPU. If you want your hardware in a case, we offer two DIY options: a 3D-printed case design or a cardboard case you can build using materials at home.
Once it’s booted up with our Maker Kit system image, just run some of our code examples and follow our coding tutorials. You’ll quickly discover how easy it is to accomplish amazing things with ML that were recently considered accessible only to experts, including object detection, pose classification, and speech recognition.
Our code examples use some pre-trained models and you can get more models that are accelerated on the Edge TPU from the Coral models library. However, training your own models allows you to explore all new project ideas. So the Maker Kit also offers step-by-step tutorials that show you how to collect your own datasets and train your own vision and audio models.
Last but not least, we want you to spend nearly all your time writing the code that’s unique to your project. So we created a Python library that reduces the amount of code needed to perform an inference down to a tiny part of your project. For example, this is how you can run an object detection model and draw labeled bounding boxes on a live camera feed:
from aiymakerkit import vision
from aiymakerkit import utils
import models

# The kit's models module provides paths to its pre-trained models.
detector = vision.Detector(models.OBJECT_DETECTION_MODEL)
labels = utils.read_labels_from_metadata(models.OBJECT_DETECTION_MODEL)

for frame in vision.get_frames():
    # Detect objects in each camera frame and draw labeled boxes on it.
    objects = detector.get_objects(frame, threshold=0.4)
    vision.draw_objects(frame, objects, labels)
Our intent is to hide the code you don’t absolutely need. You still have access to structured inference results and program flow, but without any boilerplate code to handle the model.
This aiymakerkit library is built upon TensorFlow Lite and it’s available on GitHub, so we invite you to explore the innards and extend the Maker Kit API for your projects.
Getting started
We created the Maker Kit to be fully customizable for your projects. So rather than provide all the materials in a box with a predetermined design, we designed it with hardware that’s already available in stores (listed on our website) and with optional instructions to build your own case.
To get started, visit our website at g.co/aiy/maker, gather the required materials, flash our system image, and follow our programming tutorials to start exploring the possibilities. With this head start toward building smart applications that run entirely on an embedded system, we can’t wait to see what you will create.
Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize natural-sounding human speech. It is used in a variety of use cases, such as contact center systems that deliver conversational user experiences with human-like voices for automated real-time status checks and account and billing inquiries, and by news agencies like The Washington Post to let readers listen to news articles.
As of today, Amazon Polly provides over 60 voices in 30+ language variants. Amazon Polly also uses context to pronounce certain words differently based on verb tense and other contextual information. For example, “read” in “I read a book” (past tense) and “I will read a book” (future tense) is pronounced differently.
However, in some situations you may want to customize the way Amazon Polly pronounces a word. For example, you may need to match the pronunciation with a local dialect or vernacular. Names of things (for example, tomato can be pronounced tom-ah-to or tom-ay-to), people, streets, or places are often pronounced in many different ways.
In this post, we demonstrate how you can leverage lexicons for creating custom pronunciations. You can apply lexicons for use cases such as publishing, education, or call centers.
Customize pronunciation using the <phoneme> SSML tag
Let’s say you stream a popular podcast from Australia and you use the Amazon Polly Australian English (Olivia) voice to convert your script into human-like speech. In one of your scripts, you want to use words that are unknown to the Amazon Polly voice. For example, you want to send Mātariki (Māori New Year) greetings to your New Zealand listeners. For such scenarios, Amazon Polly supports phonetic pronunciation, which you can use to achieve a pronunciation that is close to the correct pronunciation in the foreign language.
You can use the <phoneme> Speech Synthesis Markup Language (SSML) tag to suggest a phonetic pronunciation in the ph attribute. Let me show you how to use the <phoneme> SSML tag.
First, log in to the AWS Management Console and search for Amazon Polly in the search bar at the top. Select Amazon Polly and then choose the Try Polly button.
In the Amazon Polly console, select Australian English from the language dropdown, enter the following text in the Input text box, and then choose Listen to test the pronunciation.
I’m wishing you all a very Happy Mātariki.
Sample speech without applying phonetic pronunciation:
If you listen to the sample speech above, you’ll notice that the pronunciation of Mātariki – a word that isn’t part of Australian English – isn’t quite spot-on. Let’s look at how we can use phonetic pronunciation with the <phoneme> SSML tag to customize the speech produced by Amazon Polly in such scenarios.
To use SSML tags, turn on the SSML option in the Amazon Polly console. Then copy and paste the following SSML script, which contains the phonetic pronunciation for Mātariki specified inside the ph attribute of the <phoneme> tag.
<speak>
I’m wishing you all a very Happy
<phoneme alphabet="x-sampa" ph="mA:.tA:.ri.ki">Mātariki</phoneme>.
</speak>
With the <phoneme> tag, Amazon Polly uses the pronunciation specified by the ph attribute instead of the standard pronunciation associated by default with the language used by the selected voice.
Sample speech after applying phonetic pronunciation:
If you listen to the sample speech, you’ll notice that we opted for a different pronunciation for some of the vowels (for example, ā) to make Amazon Polly synthesize sounds that are closer to the correct pronunciation. Now you might have a question: how do I generate the phonetic transcription “mA:.tA:.ri.ki” for the word Mātariki?
Amazon Polly supports two phonetic alphabets: IPA and X-SAMPA. The benefit of X-SAMPA is that it uses standard ASCII characters, so it’s easier to type phonetic transcriptions with a normal keyboard. You can use either IPA or X-SAMPA to generate your transcriptions, but make sure to stay consistent with your choice, especially when you use a lexicon file, which we cover in the next section.
Each phoneme in the phoneme table represents a speech sound. The bolded letters in the “Example” column of the Phoneme/Viseme table for Australian English represent the part of the word that the “Phoneme” corresponds to. For example, the phoneme /j/ represents the sound that an Australian English speaker makes when pronouncing the letter “y” in “yes.”
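The console is handy for quick experiments, but you can also synthesize the same SSML programmatically. The following is a minimal sketch using the AWS SDK for Python (boto3); the engine choice and output file name are assumptions for this example rather than requirements of the service.

import boto3

polly = boto3.client("polly")

ssml = """<speak>
I'm wishing you all a very Happy
<phoneme alphabet="x-sampa" ph="mA:.tA:.ri.ki">Mātariki</phoneme>.
</speak>"""

# TextType="ssml" tells Amazon Polly to interpret the markup
# instead of reading it out literally.
response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",
    VoiceId="Olivia",     # the Australian English voice used in this post
    Engine="neural",      # Olivia is available on the neural engine
    OutputFormat="mp3",
)

with open("matariki.mp3", "wb") as f:
    f.write(response["AudioStream"].read())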
Customize pronunciation using lexicons
Phoneme tags are suitable for one-off situations and isolated customizations, but they aren’t scalable. If you process huge volumes of text managed by different editors and reviewers, we recommend using lexicons. With lexicons, you can achieve consistent custom pronunciations while reducing the manual effort of inserting phoneme tags into the script.
A good practice is to test a custom pronunciation on the Amazon Polly console using the <phoneme> tag, and then add it to a library of customized pronunciations in a lexicon. Once the lexicon file is uploaded, Amazon Polly automatically applies the phonetic pronunciations specified in it, eliminating the need to manually provide a <phoneme> tag.
Create a lexicon file
A lexicon file contains the mapping between words and their phonetic pronunciations. Pronunciation Lexicon Specification (PLS) is a W3C recommendation for specifying interoperable pronunciation information. The following is an example PLS document:
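A minimal version covering the Mātariki and NZ examples in this post might look like this (the header attributes are standard W3C PLS boilerplate, and the alphabet and language match the Australian English voice and X-SAMPA transcription used earlier):

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="x-sampa"
    xml:lang="en-AU">
  <lexeme>
    <grapheme>Mātariki</grapheme>
    <grapheme>Matariki</grapheme>
    <phoneme>mA:.tA:.ri.ki</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>NZ</grapheme>
    <alias>New Zealand</alias>
  </lexeme>
</lexicon>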
Make sure that you use the correct value for the xml:lang field. Use en-AU if you’re uploading the lexicon file to use with the Amazon Polly Australian English voice. For a complete list of supported languages, refer to Languages Supported by Amazon Polly.
To specify a custom pronunciation, you add a <lexeme> element, which is a container for a lexical entry with one or more <grapheme> elements and pronunciation information provided inside a <phoneme> element.
The <grapheme> element contains the text describing the orthography of the <lexeme> element. You use a <grapheme> element to specify the word whose pronunciation you want to customize, and you can add multiple <grapheme> elements to cover all word variations, for example with or without macrons. The <grapheme> element is case sensitive; during speech synthesis, Amazon Polly string-matches it against the words in the script you’re converting to speech. If a match is found, Amazon Polly uses the <phoneme> element, which describes how the <lexeme> is pronounced, to generate the phonetic transcription.
You can also use <alias> for commonly used abbreviations. In the preceding example of a lexicon file, NZ is used as an alias for New Zealand. This means that whenever Amazon Polly comes across “NZ” (with matching case) in the body of the text, it’ll read those two letters as “New Zealand”.
You can save a lexicon file as a .pls or .xml file before uploading it to Amazon Polly.
Upload and apply the lexicon file
Upload your lexicon file to Amazon Polly using the following instructions:
On the Amazon Polly console, choose Lexicons in the navigation pane.
Choose Upload lexicon.
Enter a name for the lexicon.
Choose the lexicon file to upload.
Choose Upload lexicon.
If a lexicon by the same name (whether a .pls or .xml file) already exists, uploading the lexicon overwrites the existing lexicon.
Now you can apply the lexicon to customize pronunciation.
Choose Text-to-Speech in the navigation pane.
Expand Additional settings.
Turn on Customize pronunciation.
Choose the lexicon from the drop-down menu.
You can also choose Upload lexicon to upload a new lexicon file (or a new version).
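You can also script the upload and application of a lexicon instead of using the console. The following is a minimal sketch using the AWS SDK for Python (boto3); the lexicon name, file names, and engine are assumptions for this example.

import boto3

polly = boto3.client("polly")

# Upload (or overwrite) the lexicon; the name is what you reference at synthesis time.
with open("matariki.pls", "r", encoding="utf-8") as f:
    polly.put_lexicon(Name="matariki", Content=f.read())

# Apply the lexicon during synthesis. No <phoneme> tags are needed in the input text,
# because the lexicon supplies the custom pronunciations.
response = polly.synthesize_speech(
    Text="Wishing all my listeners in NZ, a very Happy Mātariki",
    VoiceId="Olivia",
    Engine="neural",
    OutputFormat="mp3",
    LexiconNames=["matariki"],
)

with open("matariki_lexicon.mp3", "wb") as f:
    f.write(response["AudioStream"].read())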
It’s a good practice to version control the lexicon file in a source code repository. Keeping the custom pronunciations in a lexicon file ensures that you can consistently refer to phonetic pronunciations for certain words across the organization. Also, keep in mind the pronunciation lexicon limits mentioned on Quotas in Amazon Polly page.
Test the pronunciation after applying the lexicon
Let’s perform a quick test using “Wishing all my listeners in NZ, a very Happy Mātariki” as the input text.
We can compare the audio files before and after applying the lexicon.
Before applying the lexicon:
After applying the lexicon:
Conclusion
In this post, we discussed how you can customize the pronunciation of commonly used acronyms or words not found in the selected language in Amazon Polly. You can use the <phoneme> SSML tag, which is great for one-off customizations or testing. We recommend using lexicons to create a consistent set of pronunciations for frequently used words across your organization. This lets your content writers spend their time writing instead of repetitively adding phonetic pronunciations to scripts. You can try this in your AWS account on the Amazon Polly console.
Ratan Kumar is a Solutions Architect based out of Auckland, New Zealand. He works with large enterprise customers helping them design and build secure, cost-effective, and reliable internet scale applications using the AWS cloud. He is passionate about technology and likes sharing knowledge through blog posts and twitch sessions.
Maciek Tegi is a Principal Audio Designer and a Product Manager for Polly Brand Voices. He has worked professionally in the tech industry, movies, commercials, and game localization. In 2013, he was the first audio engineer hired to the Alexa Text-to-Speech team. Maciek was involved in releasing 12 Alexa TTS voices across different countries, over 20 Polly voices, and 4 Alexa celebrity voices. Maciek is a triathlete and an avid acoustic guitar player.
Eyal Ben-Ari just took his first shot on a goal of bringing professional-class analytics to amateur soccer players.
The CEO of startup Track160, in Tel Aviv, has seen his company’s AI-powered sports analytics software tested and used in the big leagues. Now he’s turning his attention to underserved amateurs in the clubs and community teams he says make up “the bigger opportunity” among the world’s 250 million soccer players.
“Almost everyone in professional sports uses data analytics today. Now we’re trying to enable any team at any level to capture their own data and analytics, and the only way to do it is leveraging AI,” he said.
A Kickoff Down Under
In April, the company launched its Coach160 software in Australia, where it’s getting kudos from amateur soccer clubs in Victoria and Queensland. It uses computer vision to let teams automatically generate rich reports and annotated videos with an off-the-shelf camera and a connection to the cloud.
“The analysis and data provided by Track160 will prove a wonderful resource for our coaches and players,” said Vaughn Coveny, a retired pro soccer player now working with multiple youth teams in the region.
Startup With an AI Heritage
Miky Tamir, a serial entrepreneur in sports tech, co-founded Track160 in 2017. The company’s investors include the Deutsche Fussball Liga, Germany’s national soccer league, which contributed annotated datasets from several of its seasons.
“That helped set a baseline, then we applied transfer learning and developed an ever-growing internal database,” said Tamir Anavi, Track160’s CTO.
Using video from a single camera, the company’s software identifies and tracks players as 3D skeletons, then tags events and actions as they move.
“We use deep learning in every step to understand where the camera is, where the pitch is and where the players are on it,” Anavi said.
With that information, the software delivers detailed analytics and more. It constructs a 3D model so players and coaches can view any part of the game from any perspective, providing what Ben-Ari calls “a metaverse experience.”
Software Certified by the Pros
The Coach160 software got high scores for speed and accuracy in a benchmark for electronic tracking systems created by FIFA, soccer’s global governing body with more than 200 national member associations. “We delivered the same performance as others who used six times more cameras,” said Anavi.
One pro league uses the code to get real-time data on game days. It processes 4K video streams with four NVIDIA GPUs and libraries that accelerate the work.
When it comes to AI, Track160 relies on NVIDIA TensorRT to make its models lean so they run fast.
“We couldn’t do inference without it. The work went from being impossible to running smoothly and that got our system from a prototype to production,” said Anavi.
Track160 recently signed on as a member of NVIDIA Metropolis, a program for companies in intelligent video analytics. Ben-Ari says he’ll tap the program’s early access to technology and expertise to accelerate his company’s growth.
Looking Beyond Oz
Australia was a natural first target given its penchant for new technology and large number of amateur soccer players and clubs, said Ben-Ari, who is already planning a launch in the U.S.
Long term, the company plans to train models for other sports, too.
“We see a kind of viral effect where everyone will want to have this,” he said.
“As a dad, I want to know what’s happening when my daughter plays, and even if they’re not pros, people want to know their performance,” said Ben-Ari, who likes to pore over his stats from triathlons.
Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology accelerates creative workflows.
Concept artist Pablo Muñoz Gómez dives In the NVIDIA Studio this week, showcasing artwork that depicts a fantastical myth.
Gómez, a creator based in Australia, is as passionate about helping digital artists, teaching 3D classes and running the ZBrush Guides website, as he is about his creative specialties: concept and character artistry.
“For me, everything starts with a story,” Muñoz Gómez said.
His 3D Forest Creature contains a fascinating myth. “The story of the forest creature is rather simple … a very small fantasy character that lives in the forest and spends his life balancing rocks, the larger the stones he manages to balance and stack on top of each other, the larger he’ll grow and the more invisible he’ll become. Eventually, he’ll reach a colossal size and disappear.”
Gómez begins his journey in a 2D app, Krita, with a preliminary sketch. The idea is to figure out how many 3D assets will be needed while adding a little bit of color as reference for the palette later on.
Next, Gómez moves to ZBrush, where he uses custom brushes to sculpt basic models for the creature, rocks and plants. It’s the first of multiple leaps in his 2D to 3D workflow, detailed in this two-part 3D Forest Creature tutorial.
Gómez then turns to Adobe Substance 3D Painter to apply various colors and materials directly to his 3D models. Here, the benefits of NVIDIA RTX acceleration shine. NVIDIA Iray technology in the viewport enables Gómez to edit in real time and use ray-traced baking for faster rendering speeds — all accelerated by his GeForce RTX 3090 GPU.
Seeking further customization for his background, Gómez downloads and imports a grass asset from the Substance 3D asset library into Substance 3D Sampler, adjusting a few sliders to create a photorealistic material. RTX-exclusive interactive ray tracing lets Gómez apply realistic wear-and-tear effects in real time, powered by his GPU.
3D workflows can be incredibly demanding. As Gómez notes, the right GPU allows him to focus on content creation. “Since I switched to the GeForce RTX 3090, I’m simply able to spend more time in the ‘creative stages’ and testing things to refine my concept when I don’t have to wait for a render or worry about optimizing a scene so I can see it in real time,” he said.
Gómez sets up his scene in Marmoset Toolbag 4, where he switches the denoiser from CPU to GPU by going to Lighting, then Ray Tracing, in the main menu. Doing so unlocks real-time ray tracing and smooth visuals in the viewport while he works.
With the scene in a good place after some edits, Gómez generates his renders.
He handles final composition, lighting and color correction in Adobe Photoshop. With the addition of a new background, the scene is complete.
More 3D to Explore
Gómez has created several tutorials demonstrating 3D content creation techniques to aspiring artists. Check out this one on how to build a 3D scene from scratch.
Part one of the Studio Session, Creating Stunning 3D Crystals, offers an inside look at sketching and concepting in Krita and modeling in ZBrush, while part two focuses on baking in Adobe Substance 3D Painter and texturing in Marmoset Toolbag 4.
Generally, low-polygon models are great to work with on hardware that can’t handle high poly counts. Gómez’s Studio Session, Creating a 3D Low-Poly Floating Island, demonstrates how to build low-poly models like his Floating Island within ZBrush and touch them up in Adobe Photoshop.
However, with the graphics horsepower and AI benefits of NVIDIA RTX and GeForce RTX GPUs, 3D artists can work with high-polygon models quickly and easily.
Learning how to create in 3D takes ingenuity, notes Gómez: “You become more resourceful making your tools work for you in the way you want, even if that means finding a better tool to solve a particular process.” But with enough practice, as seen from the variety of Gómez’s portfolio, the results can be stunning.
No one likes sitting at a red light. But signalized intersections aren’t just a minor nuisance for drivers; vehicles consume fuel and emit greenhouse gases while waiting for the light to change.
What if motorists could time their trips so they arrive at the intersection when the light is green? While that might be just a lucky break for a human driver, it could be achieved more consistently by an autonomous vehicle that uses artificial intelligence to control its speed.
In a new study, MIT researchers demonstrate a machine-learning approach that can learn to control a fleet of autonomous vehicles as they approach and travel through a signalized intersection in a way that keeps traffic flowing smoothly.
Using simulations, they found that their approach reduces fuel consumption and emissions while improving average vehicle speed. The technique gets the best results if all cars on the road are autonomous, but even if only 25 percent use their control algorithm, it still leads to substantial fuel and emissions benefits.
“This is a really interesting place to intervene. No one’s life is better because they were stuck at an intersection. With a lot of other climate change interventions, there is a quality-of-life difference that is expected, so there is a barrier to entry there. Here, the barrier is much lower,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in the Department of Civil and Environmental Engineering and a member of the Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS).
The lead author of the study is Vindula Jayawardana, a graduate student in LIDS and the Department of Electrical Engineering and Computer Science. The research will be presented at the European Control Conference.
Intersection intricacies
While humans may drive past a green light without giving it much thought, intersections can present billions of different scenarios depending on the number of lanes, how the signals operate, the number of vehicles and their speeds, the presence of pedestrians and cyclists, etc.
Typical approaches for tackling intersection control problems use mathematical models to solve one simple, ideal intersection. That looks good on paper, but likely won’t hold up in the real world, where traffic patterns are often about as messy as they come.
Wu and Jayawardana shifted gears and approached the problem using a model-free technique known as deep reinforcement learning. Reinforcement learning is a trial-and-error method where the control algorithm learns to make a sequence of decisions. It is rewarded when it finds a good sequence. With deep reinforcement learning, the algorithm leverages assumptions learned by a neural network to find shortcuts to good sequences, even if there are billions of possibilities.
This is useful for solving a long-horizon problem like this; the control algorithm must issue upwards of 500 acceleration instructions to a vehicle over an extended time period, Wu explains.
“And we have to get the sequence right before we know that we have done a good job of mitigating emissions and getting to the intersection at a good speed,” she adds.
But there’s an additional wrinkle. The researchers want the system to learn a strategy that reduces fuel consumption and limits the impact on travel time. These goals can be conflicting.
“To reduce travel time, we want the car to go fast, but to reduce emissions, we want the car to slow down or not move at all. Those competing rewards can be very confusing to the learning agent,” Wu says.
While it is challenging to solve this problem in its full generality, the researchers employed a workaround using a technique known as reward shaping. With reward shaping, they give the system some domain knowledge it is unable to learn on its own. In this case, they penalized the system whenever the vehicle came to a complete stop, so it would learn to avoid that action.
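As a rough illustration of reward shaping (not the study’s exact formulation), a per-timestep reward for one vehicle could combine the competing objectives and add an explicit penalty for coming to a stop:

def shaped_reward(speed, fuel_rate, is_stopped,
                  w_speed=1.0, w_fuel=1.0, stop_penalty=5.0):
    # Reward forward progress (a proxy for shorter travel time)...
    reward = w_speed * speed
    # ...penalize instantaneous fuel consumption (a proxy for emissions)...
    reward -= w_fuel * fuel_rate
    # ...and inject domain knowledge via shaping: avoid complete stops.
    if is_stopped:
        reward -= stop_penalty
    return reward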
Traffic tests
Once they developed an effective control algorithm, they evaluated it using a traffic simulation platform with a single intersection. The control algorithm is applied to a fleet of connected autonomous vehicles, which can communicate with upcoming traffic lights to receive signal phase and timing information and observe their immediate surroundings. The control algorithm tells each vehicle how to accelerate and decelerate.
Their system didn’t create any stop-and-go traffic as vehicles approached the intersection. (Stop-and-go traffic occurs when cars are forced to come to a complete stop due to stopped traffic ahead.) In simulations, more cars made it through in a single green phase, outperforming a model that simulates human drivers. Compared with other optimization methods also designed to avoid stop-and-go traffic, their technique delivered larger reductions in fuel consumption and emissions. If every vehicle on the road is autonomous, their control system can reduce fuel consumption by 18 percent and carbon dioxide emissions by 25 percent, while boosting travel speeds by 20 percent.
“A single intervention having 20 to 25 percent reduction in fuel or emissions is really incredible. But what I find interesting, and was really hoping to see, is this non-linear scaling. If we only control 25 percent of vehicles, that gives us 50 percent of the benefits in terms of fuel and emissions reduction. That means we don’t have to wait until we get to 100 percent autonomous vehicles to get benefits from this approach,” she says.
Down the road, the researchers want to study interaction effects between multiple intersections. They also plan to explore how different intersection set-ups (number of lanes, signals, timings, etc.) can influence travel time, emissions, and fuel consumption. In addition, they intend to study how their control system could impact safety when autonomous vehicles and human drivers share the road. For instance, even though autonomous vehicles may drive differently than human drivers, slower roadways and roadways with more consistent speeds could improve safety, Wu says.
While this work is still in its early stages, Wu sees this approach as one that could be more feasibly implemented in the near-term.
“The aim in this work is to move the needle in sustainable mobility. We want to dream, as well, but these systems are big monsters of inertia. Identifying points of intervention that are small changes to the system but have significant impact is something that gets me up in the morning,” she says.
This work was supported, in part, by the MIT-IBM Watson AI Lab.
A person’s vernacular is part of the characteristics that make them unique. There are often countless different ways to express one specific idea. When a firm communicates with its customers, it’s critical that the message is delivered in a way that best represents the information they’re trying to convey. This becomes even more important when it comes to professional language translation.
Customers of translation systems and services expect accurate and highly customized outputs. To achieve this, they often reuse previous translation outputs—called translation memory (TM)—and compare them to new input text. In computer-assisted translation, this technique is known as fuzzy matching. The primary function of fuzzy matching is to assist the translator by speeding up the translation process. When an exact match can’t be found in the TM database for the text being translated, translation management systems (TMSs) often have the option to search for a match that is less than exact. Potential matches are provided to the translator as additional input for final translation. Translators who enhance their workflow with machine translation capabilities such as Amazon Translate often expect fuzzy matching data to be used as part of the automated translation solution.
In this post, you learn how to customize output from Amazon Translate according to translation memory fuzzy match quality scores.
Translation Quality Match
The XML Localization Interchange File Format (XLIFF) standard is often used as a data exchange format between TMSs and Amazon Translate. XLIFF files produced by TMSs include source and target text data along with match quality scores based on the available TM. These scores—usually expressed as a percentage—indicate how close the translation memory is to the text being translated.
Some customers with very strict requirements only want machine translation to be used when match quality scores are below a certain threshold. Beyond this threshold, they expect their own translation memory to take precedence. Translators often need to apply these preferences manually, either within their TMS or by altering the text data. This flow is illustrated in the following diagram. The machine translation system processes the translation data—text and fuzzy match scores—which is then reviewed and manually edited by translators, based on their desired quality thresholds. Applying thresholds as part of the machine translation step allows you to remove these manual steps, which improves efficiency and optimizes cost.
Figure 1: Machine Translation Review Flow
The solution presented in this post allows you to enforce rules based on match quality score thresholds to drive whether a given input text should be machine translated by Amazon Translate or not. When not machine translated, the resulting text is left to the discretion of the translators reviewing the final output.
Solution Architecture
The solution architecture illustrated in Figure 2 leverages the following services:
AWS Lambda – Two functions support the flow:
One function preprocesses the quality match threshold configuration files and persists the data into Parameter Store
One function automatically creates the asynchronous translation jobs
Amazon Simple Queue Service – An Amazon SQS queue triggers the translation flow as a result of new files coming into the source bucket
Figure 2: Solution Architecture
You first set up quality thresholds for your translation jobs by editing a configuration file and uploading it into the fuzzy match threshold configuration S3 bucket. The following is a sample configuration in CSV format. We chose CSV for simplicity, although you can use any format. Each line represents a threshold to be applied to either a specific translation job or as a default value to any job.
default, 75
SourceMT-Test, 80
The specifications of the configuration file are as follows:
Column 1 should be populated with the name of the XLIFF file—without extension—provided to the Amazon Translate job as input data.
Column 2 should be populated with the quality match percentage threshold. For any score below this value, machine translation is used.
For all XLIFF files whose name doesn’t match any name listed in the configuration file, the default threshold is used—the line with the keyword default set in Column 1.
When a new file is uploaded, Amazon S3 triggers the Lambda function in charge of processing the parameters. This function reads and stores the threshold parameters in Parameter Store for future use. Using Parameter Store avoids performing redundant Amazon S3 GET requests each time a new translation job is initiated. The sample configuration file produces the parameters shown in the following screenshot.
Figure 3: Auto-generated parameter in Systems Manager Parameter Store
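The deployed function’s code ships with the solution, but its logic can be sketched roughly as follows; the parameter root path is a placeholder and error handling is omitted.

import boto3

s3 = boto3.client("s3")
ssm = boto3.client("ssm")

PARAMETER_ROOT = "/translate/fuzzy-match-thresholds"  # placeholder root path

def handler(event, context):
    # Triggered by Amazon S3 when a threshold configuration file is uploaded.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Each CSV line maps a job name (or "default") to a threshold percentage.
        for line in body.splitlines():
            if not line.strip():
                continue
            name, threshold = [field.strip() for field in line.split(",")]
            ssm.put_parameter(
                Name=f"{PARAMETER_ROOT}/{name}",
                Value=threshold,
                Type="String",
                Overwrite=True,
            )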
The job initialization Lambda function uses these parameters to preprocess the data prior to invoking Amazon Translate. We use an English-to-Spanish translation XLIFF input file, as shown in the following code. It contains the initial text to be translated, broken down into what is referred to as segments, represented in the source tags.
The source text has been pre-matched with the translation memory beforehand. The data contains potential translation alternatives—represented as <alt-trans> tags—alongside a match quality attribute, expressed as a percentage. The business rule is as follows:
Segments received with alternative translations and a match quality below the threshold are left untouched or empty. This signals to Amazon Translate that they must be translated.
Segments received with alternative translations with a match quality above the threshold are pre-populated with the suggested target text. Amazon Translate skips those segments.
Let’s assume the quality match threshold configured for this job is 80%. The first segment with 99% match quality isn’t machine translated, whereas the second segment is, because its match quality is below the defined threshold. In this configuration, Amazon Translate produces the following output:
In the second segment, Amazon Translate overwrites the target text initially suggested (Selección) with a higher quality translation: Visita de selección.
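To make the business rule concrete, here is a simplified sketch of the kind of preprocessing the job initialization function performs; it is illustrative rather than the solution’s actual code, and it assumes XLIFF 1.2 with a match-quality attribute on each <alt-trans> element.

import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

def apply_threshold(xliff_path, threshold):
    # Keep translation memory suggestions at or above the threshold;
    # clear everything else so Amazon Translate machine translates it.
    tree = ET.parse(xliff_path)
    for unit in tree.getroot().iter(f"{{{XLIFF_NS}}}trans-unit"):
        best_quality, best_target = 0.0, None
        for alt in unit.findall("x:alt-trans", NS):
            quality = float(alt.get("match-quality", "0").rstrip("%"))
            if quality > best_quality:
                best_quality = quality
                best_target = alt.find("x:target", NS)
        target = unit.find("x:target", NS)
        if best_quality >= threshold and best_target is not None:
            if target is None:
                target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
            target.text = best_target.text  # pre-populate with the TM suggestion
        elif target is not None:
            target.text = None  # leave empty so the segment gets machine translated
    tree.write(xliff_path, encoding="utf-8", xml_declaration=True)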
One possible extension to this use case could be to reuse the translated output and create our own translation memory. Amazon Translate supports customization of machine translation using translation memory thanks to the parallel data feature. Text segments previously machine translated due to their initial low-quality score could then be reused in new translation projects.
In the following sections, we walk you through the process of deploying and testing this solution. You use AWS CloudFormation scripts and data samples to launch an asynchronous translation job personalized with a configurable quality match threshold.
For ConfigBucketName, enter the S3 bucket containing the threshold configuration files.
For ParameterStoreRoot, enter the root path of the parameters created by the parameters processing Lambda function.
For QueueName, enter the SQS queue that you create to post new file notifications from the source bucket to the job initialization Lambda function. This is the function that reads the configuration file.
For SourceBucketName, enter the S3 bucket containing the XLIFF files to be translated. If you prefer to use a preexisting bucket, you need to change the value of the CreateSourceBucket parameter to No.
For WorkingBucketName, enter the S3 bucket Amazon Translate uses for input and output data.
Choose Next.
Figure 4: CloudFormation stack details
Optionally, on the Stack options page, add keys and values for the tags you may want to assign to the resources about to be created.
Choose Next.
On the Review page, select I acknowledge that this template might cause AWS CloudFormation to create IAM resources.
Review the other settings, then choose Create stack.
AWS CloudFormation takes several minutes to create the resources on your behalf. You can watch the progress on the Events tab on the AWS CloudFormation console. When the stack has been created, you can see a CREATE_COMPLETE message in the Status column on the Overview tab.
There should be two files: an .xlf file in XLIFF format, and a threshold configuration file with .cfg as the extension. The following is an excerpt of the XLIFF file.
Figure 5: English to French sample file extract
On the Amazon S3 console, upload the quality threshold configuration file into the configuration bucket you specified earlier.
The value set for test_En_to_Fr is 75%. You should be able to see the parameters on the Systems Manager console in the Parameter Store section.
Still on the Amazon S3 console, upload the .xlf file into the S3 bucket you configured as source. Make sure the file is under a folder named translate (for example, <my_bucket>/translate/test_En_to_Fr.xlf).
This starts the translation flow.
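If you prefer to upload the file from a script rather than the console, a minimal boto3 sketch looks like this (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")
source_bucket = "my-source-bucket"  # replace with the bucket you configured as source

# The file must sit under the translate/ prefix, as described above.
s3.upload_file("test_En_to_Fr.xlf", source_bucket, "translate/test_En_to_Fr.xlf")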
Open the Amazon Translate console.
A new job should appear with a status of In Progress.
Figure 6: In progress translation jobs on Amazon Translate console
Once the job is complete, choose the job’s link and consult the output. All segments should have been translated.
In the translated XLIFF file, look for segments with additional attributes named lscustom:match-quality, as shown in the following screenshot. These custom attributes identify segments where the suggested translation was retained based on its score.
Figure 7: Custom attributes identifying segments where suggested translation was retained based on score
These were derived from the translation memory according to the quality threshold. All other segments were machine translated.
You have now deployed and tested an automated asynchronous translation job assistant that enforces configurable translation memory match quality thresholds. Great job!
Cleanup
If you deployed the solution into your account, don’t forget to delete the CloudFormation stack to avoid any unexpected cost. You need to empty the S3 buckets manually beforehand.
Conclusion
In this post, you learned how to customize your Amazon Translate translation jobs based on standard XLIFF fuzzy matching quality metrics. With this solution, you can greatly reduce the manual labor involved in reviewing machine translated text while also optimizing your usage of Amazon Translate. You can also extend the solution with data ingestion automation and workflow orchestration capabilities, as described in Speed Up Translation Jobs with a Fully Automated Translation System Assistant.
About the Authors
Narcisse Zekpa is a Solutions Architect based in Boston. He helps customers in the Northeast U.S. accelerate their adoption of the AWS Cloud by providing architectural guidance and designing innovative, scalable solutions. When Narcisse is not building, he enjoys spending time with his family, traveling, cooking, and playing basketball.
Dimitri Restaino is a Solutions Architect at AWS, based out of Brooklyn, New York. He works primarily with Healthcare and Financial Services companies in the North East, helping to design innovative and creative solutions to best serve their customers. Coming from a software development background, he is excited by the new possibilities that serverless technology can bring to the world. Outside of work, he loves to hike and explore the NYC food scene.
Federated learning has become a major area of machine learning (ML) research in recent years due to its versatility in training complex models over massive amounts of data without the need to share that data with a centralized entity. However, despite this flexibility and the amount of research already conducted, it’s difficult to implement due to its many moving parts—a significant deviation from traditional ML pipelines.
The challenges in working with federated learning result from the diversity of local data and end-node hardware, privacy concerns, and optimization constraints. These challenges are compounded by the sheer volume of federated learning clients and their data, and they necessitate a wide skill set, significant interdisciplinary research effort, and major engineering resources to manage. In addition, federated learning applications often need to scale the learning process to millions of clients to simulate a real-world environment. All of these challenges underscore the need for a simulation platform, one that enables researchers and developers to perform proof-of-concept implementations and validate performance before building and deploying their ML models.
There has been a lot of research in the last few years directed at tackling the many challenges in working with federated learning, including setting up learning environments, providing privacy guarantees, implementing model-client updates, and lowering communication costs. FLUTE (Federated Learning Utilities and Tools for Experimentation) addresses many of these while providing enhanced customization and enabling new research on a realistic scale. It also allows developers and researchers to test and experiment with certain scenarios, such as data privacy, communication strategies, and scalability, before implementing their ML model in a production framework.
One of FLUTE’s main benefits is its native integration with Azure ML workspaces, leveraging the platform’s features to manage and track experiments, parameter sweeps, and model snapshots. Its distributed nature is based on Python and PyTorch, and the flexibly designed client-server architecture helps researchers and developers quickly prototype novel approaches to federated learning. However, FLUTE’s key innovation and technological differentiator is the ease it provides in implementing new scenarios for experimentation in core areas of active research in a robust high-performance simulator.
FLUTE offers a platform where all clients are implemented as isolated object instances, as shown in Figure 1. The interface between the server and the remaining workers relies on messages that contain client IDs and training information, with MPI as the main communication protocol. Local data on each client stays within local storage boundaries and is never aggregated with other local sources. Clients only communicate gradients to the central server.
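To make this flow concrete, the following is a minimal PyTorch sketch of a single federated averaging (FedAvg) round; it is purely illustrative and does not use FLUTE’s actual APIs, MPI messaging, or client sampling logic.

import copy
import torch

def local_update(global_model, data_loader, epochs=1, lr=0.01):
    # Each simulated client trains its own copy on local data only;
    # raw data never leaves the client, only the resulting update does.
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model.state_dict()

def fedavg_round(global_model, client_loaders):
    # The server aggregates the clients' updates into a new global model.
    client_states = [local_update(global_model, loader) for loader in client_loaders]
    averaged = {
        key: torch.stack([state[key].float() for state in client_states]).mean(dim=0)
        for key in client_states[0]
    }
    global_model.load_state_dict(averaged)
    return global_model

In FLUTE itself, clients exchange gradients with the server over MPI rather than full weight dictionaries, but the aggregation principle is the same.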
The following features contribute to FLUTE’s versatile framework and enable experimentation with new federated learning approaches:
Scalability: Scale is a critical factor in understanding practical metrics, such as convergence and privacy-utility tradeoffs. Researchers and developers can run large-scale experiments using tens of thousands of clients with a reasonable turnaround time.
Flexibility: FLUTE supports diverse federated learning configurations, including standardized implementations such as DGA and FedAvg.
Versatility: FLUTE’s generic API helps researchers and developers easily implement new models, datasets, metrics, and experimentation features, while its open architecture helps them add new algorithms in such areas as optimization, privacy, and robustness.
Available as an open-source platform
As part of this announcement, we’re making FLUTE available as a versatile open-source platform for rapid prototyping and experimentation. It comes with a set of basic tools to help kickstart experiments. We hope researchers and developers take advantage of this framework by exploring new approaches to federated learning.
Looking ahead
FLUTE’s innovative framework offers a new paradigm for implementing federated learning algorithms at scale, and this is just the beginning. We’re making improvements with a view toward making FLUTE the standard federated learning simulation platform. Future releases will include algorithmic enhancements in optimization and support for additional communication protocols. We’re also adding features that make it easier to set up experiments with tailored features for new tasks, along with the ability to easily incorporate FLUTE as a library into Azure ML pipelines.
Additional resources
Check out this video for a deep dive into FLUTE architecture and a tutorial on how to use it. Our documentation also explains how to implement FLUTE.
You can learn more about the FLUTE project by visiting our project page, and discover more about our current federated learning research as well as other projects related to privacy in AI on our group page.
Matt Taddy, vice president of Amazon’s Private Brands business, is the coauthor of Modern Business Analytics: Practical Data Science for Decision Making, a primer for those who want to gain the skills to use data science to help make decisions in business and beyond.