Abstracts: September 13, 2023

Microsoft Research Podcast - Abstracts

Episode 148 | September 13, 2023

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.  

In the inaugural episode of the series, Dr. Ava Amini and Dr. Kevin K. Yang, both Senior Researchers with Microsoft Health Futures, join host Dr. Gretchen Huizinga to discuss “Protein generation with evolutionary diffusion: Sequence is all you need.” The paper introduces EvoDiff, a suite of models that leverages evolutionary-scale protein data to help design novel proteins more efficiently. Improved protein engineering has the potential to help create new vaccines to prevent disease and new ways to recycle plastics.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract!—of their new and noteworthy papers.

[MUSIC FADES]

Today, I’m talking to Dr. Ava Amini and Dr. Kevin Yang, both senior researchers at Microsoft Health Futures. Ava and Kevin are co-authors of a paper titled “Protein generation with evolutionary diffusion: Sequence is all you need,” and a preprint of the paper is available now on bioRxiv. Ava and Kevin, thanks for joining us on Abstracts.

KEVIN YANG: Thanks for having us. 

AVA AMINI: Thank you so much. 

HUIZINGA: So, Kevin, in just a couple sentences, tell us what problem this research addresses and why people should care.


YANG: Yeah, so proteins are this really big, important family of biomolecules, and they’re responsible for a lot of cellular processes. For example, hemoglobin carries oxygen in your blood, and insulin regulates your blood sugar levels. And people are interested in generating new proteins to do things that people care about—not necessarily in our bodies, but we’re interested in proteins as industrial enzymes so for catalysis and to make new chemicals or for therapeutics to make new drugs. And as a step towards this goal, we train a suite of models that we call EvoDiff that learns to generate realistic but novel proteins. So proteins do a lot of useful things in nature, but we can really expand their repertoire to do things that people care about but that nature may not really care about. One really good historical example of this is that most of our modern laundry detergents contain enzymes that break down things that stain your clothes. And these enzymes were based on natural proteins, but natural proteins don’t work under high heat. They don’t work in detergent. So somebody engineered those to work in the conditions of our washing machine. And they work really well nowadays. Looking forward, we look at some of the challenges facing our world, such as sustainability. So some really big things people are working on now are things like enzymes that can break down plastic and help us recycle plastic or enzymes that can perform photosynthesis more efficiently. And then on the other side, there’s therapeutics, and an obvious example there is vaccine design. So designing vaccines quickly and safely for new diseases as they emerge.  

HUIZINGA: Ava, how does your approach build on or differ from what’s been done previously in this field? 

AMINI: Yeah, so we call our approach EvoDiff, and EvoDiff has two components. The first, Evo, refers to evolutionary, and the second, Diff, refers to this notion of diffusion. And the two things that make our approach cool and powerful is the fact that we are leveraging data about proteins that is at an evolutionary scale in terms of the size and the diversity of the datasets about natural proteins that we use. And specifically, we use that data to build a type of AI model that is called a diffusion model. Now, for a little backstory on this, a few years ago, we in the AI community learned that we can do really well in generating brand-new images by taking natural images, adding small amounts of noise to them, corrupting them, and then training an AI model called a diffusion model to remove that noise. And so what we’ve done in this paper is that we have constructed and trained these diffusion models to do the same kind of process on protein data at evolutionary scale. 

HUIZINGA: Kevin, back to you, let’s go a little deeper on methodology. How did you do this?

YANG: Yeah, so we really wanted to do this in a protein sequence space. So in protein biology, you have sequences of amino acids. So that’s a series of amino acid monomers that form a chain, and then that chain folds oftentimes into a 3D structure. And function is usually mediated by that 3D structure. Unfortunately, it’s difficult and can be slow and expensive to obtain experimental structures for all these proteins. And so previous diffusion models of proteins have really focused on generating a three-dimensional structure. And then you can use some other method to find a sequence that will fold to that structure. But what we really wanted to do was generate proteins directly as sequences because it’s much easier to get sequences than it is to get structure. So there’s many, many more sequences out there than there are structures. And we know that deep learning methods scale really well as you increase the size and quality of the datasets they’re trained on. And so we … and by we, it’s me and Ava but also Nitya Thakkar, who was an undergraduate intern last summer with me and Ava, and then Sarah Alamdari, our data scientist, who also did a lot of the hands-on programming for this. And then we also got a lot of help from Rianne van den Berg, who is at AI4Science, and then Alex Lu and Nicolò Fusi, also here in New England. So we went and got these large, diverse, evolutionary datasets of protein sequences, and then we used a deep learning framework called PyTorch to train these diffusion models. And then we do a lot of computational experiments to see whether they do the things we want them to do, which Ava, I think, will talk about next. 

HUIZINGA: Right. Right. So, Ava, yes, what were your major findings?

AMINI: Yeah, the first question we really asked was, can our method, EvoDiff, generate proteins that are new, that are realistic, and that are diverse, meaning they’re not similar to proteins that exist in nature but still are realistic? And so what we found was that indeed, we can do this, and we can do this really well. In fact, the generated proteins from our method show a better coverage of the whole landscape of structural features, functional features, and features in sequence space that exist amongst proteins in nature. And so that was our first really exciting result, that we could generate proteins that were really of high quality using our method. The second thing we asked was, OK, now if we give some context to the model, a little bit of information, can we guide the generation to fulfill particular properties that we want to see in that protein? And so specifically here, we experimented with two types of experiments where first, we can give a part of the protein to the model, let’s say, a part of the protein that binds to another protein. And we hold that part constant and ask the model to generate the sequence around that. And we see that we can do really well on this task, as well. And why that’s important is because it means we can now design new proteins that meet some criteria that we, the users, want the protein to have. For example, the ability to bind to something else. And finally, the last really exciting result was … one point that we’ve talked about is why we want to do this generation in sequence space rather than structure—because structure is difficult, it’s expensive, and there are particular types of proteins that don’t actually end up folding into a final 3D structure. They’re what we call disordered. And these types of disordered proteins have really, really important roles in biology and in disease. And so what we show is that because we do our generation and design in protein sequence space, we can actually generate these types of disordered proteins that are completely inaccessible to methods that rely on using information about the protein’s 3D shape. 

HUIZINGA: So, Kevin, building on Ava’s description there of the structure and sequence space, how is your work significant in terms of real-world impact? 

YANG: Right, so there’s a lot of interest in designing or generating new proteins that do useful things as therapeutics or as industrial catalysts and for a lot of other things, as well. And what our work really does is it gives us a method that can reliably generate high-quality proteins directly in sequence space. And this is good because now we can leverage evolutionary-scale data to do this on any downstream protein engineering problem without relying on a structure-based design or structure-based data. And we’re hoping that this opens up a lot of possibilities for protein engineering, protein design, and we’re really excited about some new experimental work that we—and we hope others—will use to build on this method.

HUIZINGA: Are you guys the first to move into the evolutionary scale in this? Is that a differentiator for your work? 

YANG: So there have been a few other preprints or papers that talk about applying diffusion to protein sequences. The difference here is that, yes, like I said, we’re the first ones to do this at evolutionary scale. So people will also train these models on small sets of related protein sequences. For example, you might go look for an enzyme family and find all the sequences in nature of that family and train a model to generate new examples of that enzyme. But what we’re doing is we’re looking at data that’s from all different species and all different functional classes of proteins and giving us a model that is hopefully universal or as close to universal as we can get for protein sequence space. 

HUIZINGA: Wow. Ava, if there was one thing you want listeners to take away from this work, what would it be? 

AMINI: If there’s one thing to take away, I think it would be this idea that we can and should do protein generation over sequence because of the generality we’re able to achieve, the scale that we’re able to achieve, and the modularity and that our diffusion framework gives us the ability to do that and also to control how we design these proteins to meet specific functional goals. 

HUIZINGA: So, Kevin, to kind of wrap it up, I wonder if you could address what unanswered questions still remain, or unsolved problems in this area, and what’s next on your research agenda. 

YANG: So there’s kind of two directions we want to see here. One is, we want to test better ideas for conditioner models. And what I mean there is we want to feed in text or a desired chemical reaction or some other function directly and have it generate those things that will then go work in the lab. And that’s a really big step up from just generating sequences that work and are novel. And two is, in biology and in protein engineering, models are really good, but what really matters is, do things work in the lab? So we are actually looking to do some of our own experiments to see if the proteins we generate from EvoDiff work as desired in the lab. 

[MUSIC PLAYS]

HUIZINGA: Ava Amini and Kevin Yang, thanks so much for joining us today, and to our listeners, thanks for tuning in. If you’re interested in learning more about the paper, you can find a link at aka.ms/abstracts or you can find a preprint of the paper on bioRxiv. See you next time on Abstracts!

Unlocking the Language of Genomes and Climates: Anima Anandkumar on Using Generative AI to Tackle Global Challenges

Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research.

Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and Technology.

At the talk, Anandkumar said that generative AI was described as “an inflection point in our lives,” with discussions swirling around how to “harness it to benefit society and humanity through scientific applications.”

On the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Anandkumar on generative AI’s potential to make splashes in the scientific community.

It can, for example, be fed DNA, RNA, viral and bacterial data to craft a model that understands the language of genomes. That model can help predict dangerous coronavirus variants to accelerate drug and vaccine research.

Generative AI can also predict extreme weather events like hurricanes or heat waves. Even with an AI boost, trying to predict natural events is challenging because of the sheer number of variables and unknowns.

“Those are the aspects we’re working on at NVIDIA and Caltech, in collaboration with many other organizations, to say, ‘How do we capture the multitude of scales present in the natural world?’” she said. “With the limited data we have, can we hope to extrapolate to finer scales? Can we hope to embed the right constraints and come up with physically valid predictions that make a big impact?”

Anandkumar adds that to ensure AI models are responsibly and safely used, existing laws must be strengthened to prevent dangerous downstream applications.

She also talks about the AI boom, which is transforming the role of humans across industries, and problems yet to be solved.

“This is the research advice I give to everyone: the most important thing is the question, not the answer,” she said.

World scale inverse reinforcement learning in Google Maps

Routing in Google Maps remains one of our most helpful and frequently used features. Determining the best route from A to B requires making complex trade-offs between factors including the estimated time of arrival (ETA), tolls, directness, surface conditions (e.g., paved, unpaved roads), and user preferences, which vary across transportation mode and local geography. Often, the most natural visibility we have into travelers’ preferences is by analyzing real-world travel patterns.

Learning preferences from observed sequential decision making behavior is a classic application of inverse reinforcement learning (IRL). Given a Markov decision process (MDP) — a formalization of the road network — and a set of demonstration trajectories (the traveled routes), the goal of IRL is to recover the users’ latent reward function. Although past research has created increasingly general IRL solutions, these have not been successfully scaled to world-sized MDPs. Scaling IRL algorithms is challenging because they typically require solving an RL subroutine at every update step. At first glance, even attempting to fit a world-scale MDP into memory to compute a single gradient step appears infeasible due to the large number of road segments and limited high bandwidth memory. When applying IRL to routing, one needs to consider all reasonable routes between each demonstration’s origin and destination. This implies that any attempt to break the world-scale MDP into smaller components cannot consider components smaller than a metropolitan area.
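
To make the scaling bottleneck concrete, the toy, tabular MaxEnt-style IRL sketch below (purely illustrative and unrelated to Google's production system) shows how every gradient step on the reward parameters embeds a full RL solve (here, a soft value iteration) plus a state-visitation computation over the entire MDP, which is exactly what becomes infeasible when the MDP is the world's road network.

import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, gamma=0.99, iters=100):
    # RL subroutine: compute a soft-optimal stochastic policy for reward r.
    # P has shape (A, S, S) with P[a, s, t] = transition probability; r has shape (S,).
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r[:, None] + gamma * np.stack([P[a] @ V for a in range(A)], axis=1)  # (S, A)
        V = logsumexp(Q, axis=1)
    return np.exp(Q - V[:, None])  # stochastic policy pi[s, a]

def expected_state_visitation(P, pi, d0, T=50):
    # Roll the policy forward from the initial state distribution d0 for T steps.
    d, total = d0.copy(), d0.copy()
    for _ in range(T - 1):
        d = np.einsum("s,sa,ast->t", d, pi, P)  # next-step state distribution
        total += d
    return total

def irl_gradient_step(P, Phi, theta, demo_feature_counts, d0, lr=0.1):
    # One update of a linear reward r = Phi @ theta. Note the full RL solve inside.
    r = Phi @ theta
    pi = soft_value_iteration(P, r)
    expected_counts = Phi.T @ expected_state_visitation(P, pi, d0)
    return theta + lr * (demo_feature_counts - expected_counts)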

To this end, in “Massively Scalable Inverse Reinforcement Learning in Google Maps“, we share the result of a multi-year collaboration among Google Research, Maps, and Google DeepMind to surpass this IRL scalability limitation. We revisit classic algorithms in this space, and introduce advances in graph compression and parallelization, along with a new IRL algorithm called Receding Horizon Inverse Planning (RHIP) that provides fine-grained control over performance trade-offs. The final RHIP policy achieves a 16–24% relative improvement in global route match rate, i.e., the percentage of de-identified traveled routes that exactly match the suggested route in Google Maps. To the best of our knowledge, this represents the largest instance of IRL in a real world setting to date.

Google Maps improvements in route match rate relative to the existing baseline, when using the RHIP inverse reinforcement learning policy.

The benefits of IRL

A subtle but crucial detail about the routing problem is that it is goal conditioned, meaning that every destination state induces a slightly different MDP (specifically, the destination is a terminal, zero-reward state). IRL approaches are well suited for these types of problems because the learned reward function transfers across MDPs, and only the destination state is modified. This is in contrast to approaches that directly learn a policy, which typically require an extra factor of S parameters, where S is the number of MDP states.

Once the reward function is learned via IRL, we take advantage of a powerful inference-time trick. First, we evaluate the entire graph’s rewards once in an offline batch setting. This computation is performed entirely on servers without access to individual trips, and operates only over batches of road segments in the graph. Then, we save the results to an in-memory database and use a fast online graph search algorithm to find the highest reward path for routing requests between any origin and destination. This circumvents the need to perform online inference of a deeply parameterized model or policy, and vastly improves serving costs and latency.
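
As a purely hypothetical sketch of that serving path (none of these names are Google Maps internals), once every road segment's reward has been precomputed offline, answering a routing request reduces to a standard best-path search; with non-positive rewards, the highest-reward path is simply the lowest-cost path, so Dijkstra's algorithm suffices.

import heapq

def best_route(graph, segment_reward, origin, destination):
    # graph: dict mapping node -> list of (neighbor, segment_id) edges.
    # segment_reward: dict mapping segment_id -> precomputed reward (assumed <= 0).
    best = {origin: 0.0}
    prev = {}
    frontier = [(0.0, origin)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, segment in graph[node]:
            new_cost = cost - segment_reward[segment]  # cost is the negated reward
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    # Walk the predecessor links back from the destination to recover the route.
    path, node = [destination], destination
    while node != origin:
        node = prev[node]
        path.append(node)
    return list(reversed(path))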

Reward model deployment using batch inference and fast online planners.

Receding Horizon Inverse Planning

To scale IRL to the world MDP, we compress the graph and shard the global MDP using a sparse Mixture of Experts (MoE) based on geographic regions. We then apply classic IRL algorithms to solve the local MDPs, estimate the loss, and send gradients back to the MoE. The worldwide reward graph is computed by decompressing the final MoE reward model. To provide more control over performance characteristics, we introduce a new generalized IRL algorithm called Receding Horizon Inverse Planning (RHIP).

IRL reward model training using MoE parallelization, graph compression, and RHIP.

RHIP is inspired by people’s tendency to perform extensive local planning (“What am I doing for the next hour?”) and approximate long-term planning (“What will my life look like in 5 years?”). To take advantage of this insight, RHIP uses robust yet expensive stochastic policies in the local region surrounding the demonstration path, and switches to cheaper deterministic planners beyond some horizon. Adjusting the horizon H allows controlling computational costs, and often allows the discovery of the performance sweet spot. Interestingly, RHIP generalizes many classic IRL algorithms and provides the novel insight that they can be viewed along a stochastic vs. deterministic spectrum (specifically, for H=∞ it reduces to MaxEnt, for H=1 it reduces to BIRL, and for H=0 it reduces to MMP).

Given a demonstration from s_o to s_d, (1) RHIP follows a robust yet expensive stochastic policy in the local region surrounding the demonstration (blue region). (2) Beyond some horizon H, RHIP switches to following a cheaper deterministic planner (red lines). Adjusting the horizon enables fine-grained control over performance and computational costs.

Routing wins

The RHIP policy provides a 15.9% and 24.1% lift in global route match rate for driving and two-wheelers (e.g., scooters, motorcycles, mopeds) relative to the well-tuned Maps baseline, respectively. We’re especially excited about the benefits to more sustainable transportation modes, where factors beyond journey time play a substantial role. By tuning RHIP’s horizon H, we’re able to achieve a policy that is both more accurate than all other IRL policies and 70% faster than MaxEnt.

Our 360M parameter reward model provides intuitive wins for Google Maps users in live A/B experiments. Examining road segments with a large absolute difference between the learned rewards and the baseline rewards can help improve certain Google Maps routes. For example:

Nottingham, UK. The preferred route (blue) was previously marked as private property due to the presence of a large gate, which indicated to our systems that the road may be closed at times and would not be ideal for drivers. As a result, Google Maps routed drivers through a longer, alternate detour instead (red). However, because real-world driving patterns showed that users regularly take the preferred route without an issue (as the gate is almost never closed), IRL now learns to route drivers along the preferred route by placing a large positive reward on this road segment.

Conclusion

Increasing performance via increased scale – both in terms of dataset size and model complexity – has proven to be a persistent trend in machine learning. Similar gains for inverse reinforcement learning problems have historically remained elusive, largely due to the challenges with handling practically sized MDPs. By introducing scalability advancements to classic IRL algorithms, we’re now able to train reward models on problems with hundreds of millions of states, demonstration trajectories, and model parameters, respectively. To the best of our knowledge, this is the largest instance of IRL in a real-world setting to date. See the paper to learn more about this work.

Acknowledgements

This work is a collaboration across multiple teams at Google. Contributors to the project include Matthew Abueg, Oliver Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier, Shawn O’Banion, Ryan Epp, Renaud Hartert, Rui Song, Thomas Sharp, Rémi Robert, Zoltan Szego, Beth Luan, Brit Larabee and Agnieszka Madurska.

We’d also like to extend our thanks to Arno Eigenwillig, Jacob Moorman, Jonathan Spencer, Remi Munos, Michael Bloesch and Arun Ahuja for valuable discussions and suggestions.

NVIDIA Lends Support to Washington’s Efforts to Ensure AI Safety

In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy.

The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately, NVIDIA founder and CEO Jensen Huang will join other industry leaders in a closed-door meeting on AI Wednesday with the full Senate.

Seven companies including Adobe, IBM, Palantir and Salesforce joined NVIDIA in supporting the eight agreements the Biden-Harris administration released in July with support from Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI.

The commitments are designed to advance common standards and best practices to ensure the safety of generative AI systems until regulations are in place, the White House said. They include:

  • Testing the safety and capabilities of AI products before they’re deployed,
  • Safeguarding AI models against cyber and insider threats, and
  • Using AI to help meet society’s greatest challenges, from cancer to climate change.

Dally Shares NVIDIA’s Experience

In his testimony, Dally told the Senate subcommittee that government and industry should balance encouraging innovation in AI with ensuring models are deployed responsibly.

The subcommittee’s hearing, “Oversight of AI: Rules for Artificial Intelligence,” is among actions from policymakers around the world trying to identify and address potential risks of generative AI.

Earlier this year, the subcommittee heard testimonies from leaders of Anthropic, IBM and OpenAI, as well as academics such as Yoshua Bengio, a University of Montreal professor considered one of the godfathers of AI.

Dally, who leads a global team of more than 300 at NVIDIA Research, shared the witness table on Tuesday with Brad Smith, Microsoft’s president and vice chair. Dally’s testimony briefly encapsulated NVIDIA’s unique role in the evolution of AI over the last two decades.

How Accelerated Computing Sparked AI

He described how NVIDIA invented the GPU in 1999 as a graphics processing unit, then fit it for a broader role in parallel processing in 2006 with the CUDA programming software. Over time, developers across diverse scientific and technical computing fields found this new form of accelerated computing could significantly advance their work.

Along the way, researchers discovered GPUs also were a natural fit for AI’s neural networks, because they require massive parallel processing.

In 2012, the AlexNet model, trained on two NVIDIA GPUs, demonstrated human-like capabilities in image recognition. That result helped spark a decade of rapid advances using GPUs, leading to ChatGPT and other generative AI models used by hundreds of millions worldwide.

Today, accelerated computing and generative AI are showing the potential to transform industries, address global challenges and profoundly benefit society, said Dally, who chaired Stanford University’s computer science department before joining NVIDIA.

AI’s Potential and Limits

In written testimony, Dally provided examples of how AI is empowering professionals to do their jobs better than they might have imagined in fields as diverse as business, healthcare and climate science.

Like any technology, AI products and services have risks and are subject to existing laws and regulations that aim to mitigate those risks.

Industry also has a role to play in deploying AI responsibly. Developers set limits for AI models when they train them and define their outputs.

Dally noted that in April NVIDIA released NeMo Guardrails, open-source software that developers can use to guide generative AI applications in producing accurate, appropriate and secure text responses. He said that NVIDIA also maintains internal risk-management guidelines for AI models.

Eyes on the Horizon

Making sure that new and exceptionally large AI models are accurate and safe is a natural role for regulators, Dally suggested.

Subcommittee chair Sen. Richard Blumenthal (D-CT) welcomed Dally to the hearing.

He said that these “frontier” models are being developed at a gigantic scale. They exceed the capabilities of ChatGPT and other existing models that have already been well-explored by developers and users.

Dally urged the subcommittee to balance thoughtful regulation with the need to encourage innovation in an AI developer community that includes thousands of startups, researchers and enterprises worldwide. AI tools should be widely available to ensure a level playing field, he said.

During questioning, Senator Amy Klobuchar (D-MN) asked Dally why NVIDIA announced in March it’s working with Getty Images.

“At NVIDIA, we believe in respecting people’s intellectual property rights,” Dally replied. “We partnered with Getty to train large language models with a service called Picasso, so people who provided the original content got remunerated.”

In closing, Dally reaffirmed NVIDIA’s dedication to innovating generative AI and accelerated computing in ways that serve the best interests of all.

Mobility Gets Amped: IAA Show Floor Energized by Surge in EV Reveals, Generative AI

Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich.

NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for optimizing manufacturing…to accelerating AV development with simulation…to enhancing retail experiences.

The gen AI message was also shared just ahead of the show in a fireside chat at NVIDIA headquarters with NVIDIA VP of Automotive Ali Kani and Aakash Arora, managing director and partner at Boston Consulting Group, who discussed the rapid pace of innovation, and how genAI will improve in-car experiences and transform the way vehicles are designed, manufactured and sold.

Electric Vehicles Dominate the Show Floor 

The auto industry’s move toward electrification was on full display at IAA, with a number of global automakers showcasing their current and upcoming electric mobility lineup.

Mercedes-Benz took the wraps off its Concept CLA Class, giving visitors insight into the brand’s future vision for the entry-level segment.

Designed on the upcoming Mercedes-Benz Modular Architecture (MMA) platform, the exterior of the Concept CLA Class teases an iconic design and evokes dynamic performance. Its interior provides the ultimate customer experience with exceptional comfort and convenience.

The combination of high performance, sustainability, safety and comfort paired with an outstanding digital experience will help Mercedes-Benz realize its Ambition 2039 vision to be net carbon neutral across its entire fleet of new vehicles by the end of the next decade.

As the first car to be developed on the MMA platform, the Concept CLA Class paves the way for next-gen electric-drive technology, and features Mercedes-Benz’s new operating system, MB.OS, with automated driving capabilities powered by NVIDIA DRIVE. With an anticipated range of more than 466 miles, the CLA Class has an 800V electric architecture to maximize efficiency and performance and rapid charging. Configured for a sporty, rear-wheel drive, its modular design will also be scalable for other vehicle segments.

Lotus conducted test drives at IAA of its Lotus Eletre Hyper-SUV, which features an immersive digital cockpit, a battery range of up to 370 miles and autonomous-driving capabilities powered by the NVIDIA DRIVE Orin system-on-a-chip. With DRIVE at the wheel, the all-electric car offers server-level computing power that can be continuously enhanced during the car’s lifetime through over-the-air updates.

Lotus Eletre Hyper-SUV. Image courtesy of Lotus.

At IAA, U.S.-based Lucid Motors premiered its limited-production Lucid Air Midnight Dream Edition electric sedan, which provides up to 496 miles of range and was created with the European market in mind.

The automaker also showcased other models, including its Lucid Air Pure, Air Touring and Air Grand Touring, which come with the DreamDrive Pro advanced driver-assistance system (ADAS) powered by the high-performance compute of NVIDIA DRIVE for a seamless automated driving experience.

Lucid Air Midnight Dream. Image courtesy of Lucid Motors.

China’s emerging EV makers — which have been quick to embrace the shift to electric powertrains and software-defined strategies — were also in force at IAA as they set their sights on the European market.

Auto giant BYD presented a diverse lineup of five EVs targeting the European market, along with the seven-seater DENZA D9 MPV, or multi-purpose vehicle, which features significant safety, performance and convenience options for drivers and passengers. DENZA is a joint venture brand between BYD and Mercedes-Benz.

The eco-friendly EVs demonstrate the latest in next-gen electric technology and underscore BYD’s position as a leading global car brand.

BYD booth at IAA. Image courtesy of BYD.

LeapMotor unveiled its new model, the C10 SUV, built on its LEAP 3.0 architecture. The vehicle is equipped with 30 high-resolution sensors, including lidar and 8-megapixel high-definition cameras, for accurate surround-perception capabilities. It’s powered by NVIDIA DRIVE Orin, which delivers 254 TOPS of compute to enable safe, high-speed and urban intelligent-driving capabilities.

LeapMotor C10 SUV. Image courtesy of LeapMotor.

XPENG’s inaugural presence at IAA served as the ideal opportunity to introduce its latest models to Europe, including its G9 and P7 EVs, with NVIDIA DRIVE Orin under the hood. Deliveries of the P7 recently commenced, with the vehicles now available in Norway, Sweden, Denmark and the Netherlands. The automaker’s intelligent G6 Coupe SUV, also powered by NVIDIA DRIVE Orin, will be made available to the European market next year.

XPENG G9 and P7. Image courtesy of XPENG.

Ecosystem Partners Paint IAA Show Floor Green

In addition to automakers, NVIDIA ecosystem partners at IAA showcased their latest innovations and developments in the mobility space:

  • DeepRoute.ai showed its Driver 3.0 HD Map-Free solution built on NVIDIA DRIVE Orin and designed to offer a non-geofenced solution for mass-produced ADAS vehicles. The company plans to bring this NVIDIA-powered solution to the European market and expand beyond later next year.
  • DeepScenario showed how it’s using NVIDIA hardware for training and inference on its AI models.
  • dRISK, an NVIDIA DRIVE Sim ecosystem member, demonstrated its full-stack solution for training, testing and validating level 2 through level 5 ADAS/AV/ADS software, preparing autonomous systems to meet regulatory requirements and handle the full complexity of the real world for the next generation of highly effective and commercially viable autonomous solutions.
  • NODAR introduced GridDetect, its latest 3D vision product for level 3 driving. Using off-the-shelf cameras and NVIDIA DRIVE Orin, NODAR’s latest system provides high-resolution, real-time 3D sensing at up to 1,000m and can detect objects as small as 10cm at 150m. GridDetect also provides a comprehensive bird’s-eye view of objects in all conditions — including in challenging scenarios like nighttime, adverse weather and severe fog.
  • SafeAD demonstrated its perception technology for mapless driving, fleet map updates and validation processes.
NODAR GridDetect system for high-resolution, real-time 3D sensing. Image courtesy of NODAR.

FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs 

This research paper was presented at the 28th ACM SIGPLAN International Conference on Functional Programming (opens in new tab) (ICFP), a premier forum for discussing design, implementations, principles, and uses of functional programming.

Functional programming languages offer a host of advantages, such as ensuring memory safety (opens in new tab) and eliminating arbitrary side effects. This enables systematic analysis and compositional program construction, facilitating development of scalable and complex software systems. However, a drawback of functional programming is its tendency to liberally allocate new memory. We believe this characteristic has impeded widespread adoption in performance-critical domains. How can we overcome this limitation and harness the benefits of functional programming while maintaining efficient memory usage? 

To illustrate the issue, let’s examine the well-known functional program to reverse a list in linear time using an accumulating parameter:

(Code figure: the accumulating list-reversal function, written in Koka.)

The reversal function is written in Koka (opens in new tab), a functional language developed at Microsoft that implements the techniques described in this blog post. Here, a list is either empty (as Nil) or non-empty as a Cons(head,tail) node, which contains the first element as the head and the rest of the list as the tail.
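
Since the Koka listing above is shown only as an image, here is a rough Python transliteration of the same idea, for illustration only (Python provides none of Koka's in-place guarantees and does not turn the tail call into a loop):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Cons:
    head: int
    tail: Optional["Cons"]  # None plays the role of Nil

def reverse(xs: Optional[Cons], acc: Optional[Cons] = None) -> Optional[Cons]:
    if xs is None:
        return acc
    return reverse(xs.tail, Cons(xs.head, acc))  # push the head onto acc, recurse on the tail

# Example: reverse the list [1, 2, 3]
print(reverse(Cons(1, Cons(2, Cons(3, None)))))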

In most functional languages, reversing a list this way allocates a fresh result list in the heap. Figure 1 shows the state partway through reversing a list of the integers 1 to 10.

Figure 1: The list [1..5] has already been reversed into acc, but we still must reverse the list [6..10].

As the list xs is non-empty, we add its first element to our accumulating acc parameter before recursing on the rest of the list xx. As shown in Figure 2, this step allocates a new Cons cell but also leaves the Cons cell of xs to be garbage collected. This is rather wasteful.

Figure 2: The lists after one step of recursion. The top Cons cell on the left has become garbage, while the top Cons cell on the right is freshly allocated.

Fully in-place functional programming avoids allocation 

Recent developments have made it possible to avoid such allocations. In particular, by using a compiler-guided reference counting algorithm called Perceus, we can reuse objects in place whenever the objects are uniquely referenced at runtime. With such reuse, the reverse function can reverse a unique input list xs in-place without allocating any fresh Cons nodes, essentially switching the tail pointers of xs in-place. However, the dynamic nature of this form of reuse makes it hard to predict its application at runtime.  

In our paper, “FP2: Fully in-Place Functional Programming (opens in new tab),” which we’re presenting at ICFP 2023 (opens in new tab), we describe the new fip keyword. It statically checks that programs like the accumulating reverse function can execute in-place, that is, using constant stack space without needing any heap allocation as long as the arguments are unique.

Tree traversals and zippers

In fact, many familiar functions and algorithms satisfy our fully in-place criteria. For example, consider a binary tree with all the values at the leaves:

(Code figure: the binary tree type, written in Koka.)

Now, suppose that we want to navigate through this tree, moving up and down in search of a particular element. You might add parent pointers, but in a functional language, there is an alternative solution originally proposed by Gérard Huet known as the zipper (opens in new tab):

(Code figure: the zipper type, written in Koka.)

The zipper stores subtrees along the path from the current node up to the root node. We can define operations on pairs consisting of this type of zipper and the current tree, enabling seamless movement through the tree. For example, the following function uses the zipper to move the focus to the left subtree:

(Code figure: the function that moves the focus to the left subtree, written in Koka.)

Here, we move to the left subtree of the current node (if it exists) and extend the zipper data type accordingly. In his 1997 paper, Huet already observed that such zipper operations could be implemented in place:

Efficient destructive algorithms on binary trees may be programmed with these completely applicative primitives, which all use constant time, since they all reduce to local pointer manipulation.

In Koka, we can now make Huet’s intuition precise, where the fip keyword guarantees that left is in place. On closer examination, this might be surprising. While the list reversal example reused a Cons node, here it seems like we may need to garbage collect a Bin constructor and allocate a new BinL constructor. Nonetheless, because both constructors have two fields, the previous Bin memory location can still be reused (only updating the constructor tag). Our paper provides the analysis details that enable this, rooted in the concept of “reuse credits.”
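
Because the Koka listings above appear only as images, the following illustrative Python sketch spells out one possible shape for the tree, the zipper, and the left (and up) moves; it mirrors only the data flow, not the in-place reuse that fip guarantees in Koka:

from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: int

@dataclass
class Bin:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Bin]

@dataclass
class Done:   # focus is at the root; nothing above us
    pass

@dataclass
class BinL:   # we moved left; remember the zipper above and the right sibling
    up: "Zipper"
    right: Tree

@dataclass
class BinR:   # we moved right; remember the left sibling and the zipper above
    left: Tree
    up: "Zipper"

Zipper = Union[Done, BinL, BinR]

def left(z: Zipper, t: Tree):
    # Move the focus to the left subtree (if it exists), extending the zipper.
    if isinstance(t, Bin):
        return BinL(z, t.right), t.left
    return z, t  # a leaf has no children; stay put

def up(z: Zipper, t: Tree):
    # Move the focus back to the parent, rebuilding the Bin node we descended from.
    if isinstance(z, BinL):
        return z.up, Bin(t, z.right)
    if isinstance(z, BinR):
        return z.up, Bin(z.left, t)
    return z, t  # already at the root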

Now, suppose we want to update all the values stored in a tree. Using a zipper, we can do this fully in place. While traversing, the zipper stores input tree fragments in order, using BinL for unvisited and BinR for visited subtrees. Reusing the zipper nodes allows in-order tree mapping without heap or stack usage. The tree map function starts by descending to the leftmost leaf, accumulating unvisited subtrees in BinL. Once we hit the leftmost leaf, we apply the argument function f and work our way back up, recursively processing any unvisited subtrees, as shown in Figure 3.

(Code figure: the fully in-place tree map, written in Koka.)

The mutually tail-recursive app and down functions are fully in place. Each matched Bin pairs with BinL, and each BinL with BinR, ultimately leading to BinR pairing with Bin. The definition of tmap may seem somewhat complex, but it is much simpler than its iterative imperative counterpart that uses direct pointer reversal.
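
Reusing the toy Tree and Zipper classes from the sketch above, a Python rendering of this traversal might look as follows; it is illustrative only and captures the control flow, not the constant-space, allocation-free behavior the fip version enjoys in Koka:

def tmap(f, tree: Tree) -> Tree:
    # In-order map driven by the zipper: BinL frames hold not-yet-visited right
    # subtrees, BinR frames hold already-visited left subtrees.
    z, cur = Done(), tree
    while True:
        # "down": descend to the leftmost leaf, stacking unvisited right subtrees.
        while isinstance(cur, Bin):
            z, cur = BinL(z, cur.right), cur.left
        cur = Leaf(f(cur.value))          # visit the leaf
        # "app": climb past finished subtrees, rebuilding Bin nodes on the way up.
        while isinstance(z, BinR):
            z, cur = z.up, Bin(z.left, cur)
        if isinstance(z, BinL):           # an unvisited right subtree remains
            z, cur = BinR(cur, z.up), z.right
        else:                             # z is Done: the whole tree has been rebuilt
            return cur

# Example: increment every leaf value.
print(tmap(lambda x: x + 1, Bin(Bin(Leaf(1), Leaf(2)), Leaf(3))))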

Figure 3: The program after visiting the leaf containing f(2) on the given tree. The pointers in the zipper are reversed.

Perspectives and further reading

Koka’s new fip keyword ensures that certain functions do not allocate and only use constant stack space, offering efficient and secure code execution akin to static linear types or Rust’s borrow checker. This introduces a new paradigm for writing programs that are purely functional but can still execute in place. We consider this new technique to be a significant milestone on the path toward using high-level functional programming to develop robust software that delivers both competitive and predictable performance. 

To learn about fully in-place functional programming and the Koka language, start at the Koka homepage (opens in new tab). Koka implements a variety of innovative language features, including algebraic effect handlers and first-class constructor contexts. We encourage readers to continue exploring and experimenting with fully in-place programming. For example, try implementing skew binary heaps (opens in new tab) in Koka. Can you demonstrate fully in-place heap union?

Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users

Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes.

SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, training, and deploying ML models—within a single, integrated visual interface. You also get access to a large collection of models and pre-built solutions that you can deploy with a few clicks.

To use SageMaker Studio or other personal apps such as Amazon SageMaker Canvas, or to collaborate in shared spaces, AWS customers need to first set up a SageMaker domain. A SageMaker domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. When a user is onboarded to a SageMaker domain, they are assigned a user profile that they can use to launch their apps. User authentication can be via AWS IAM Identity Center (successor to AWS Single Sign-On) or AWS Identity and Access Management (IAM).

Setting up a SageMaker domain and associated user profiles requires understanding the concepts of IAM roles, domains, authentication, and VPCs, and going through a number of configuration steps. To complete these configuration steps, data scientists and developers typically work with their IT admin teams who provision SageMaker Studio and set up the right guardrails.

Customers told us that the onboarding process can sometimes be time consuming, delaying data scientists and ML teams from getting started with SageMaker Studio. We listened and simplified the onboarding experience!

Introducing the simplified Quick Studio setup

The new Quick Studio setup experience for SageMaker provides a new onboarding and administration experience that makes it easy for individual users to set up and manage SageMaker Studio. Data scientists and ML admins can set up SageMaker Studio in minutes with a single click. SageMaker takes care of provisioning the SageMaker domain with default presets, including setting up the IAM role, IAM authentication, and public internet mode. ML admins can alter SageMaker Studio settings for the created domain and customize the UI further at any time. Let’s take a look at how it works.

Prerequisites

To use the Quick Studio setup, you need the following:

  • An AWS account
  • An IAM role with permissions to create the resources needed to set up a SageMaker domain

Use the Quick Studio setup option

Let’s discuss a scenario where a new user wants to access SageMaker Studio. The user experience includes the following steps:

  1. In your AWS account, navigate to the SageMaker console and choose Set up for single user.

SageMaker starts preparing the SageMaker domain. This process typically takes a few minutes. The new domain’s name is prefixed with QuickSetupDomain-.

As soon as the SageMaker domain is ready, a notification appears on the screen stating “The SageMaker Domain is ready” and the user profile under the domain is also created successfully.

  2. Choose Launch next to the created user profile and choose Studio.

Because it’s the first time SageMaker Studio is getting launched for this user profile, SageMaker creates a new JupyterServer app, which takes a few minutes.

A few minutes later, the Studio IDE loads and you’re presented with the SageMaker Studio Home page.

Components of the Quick Studio setup

When using the Quick Studio setup, SageMaker creates the following resources:

  • A new IAM role with the appropriate permissions for using SageMaker Studio, Amazon Simple Storage Service (Amazon S3), and SageMaker Canvas. You can modify the permissions of the created IAM role at any time based on your use case or persona-specific requirements.
  • Another IAM role prefixed with AmazonSagemakerCanvasForecastRole-, which enables permissions for the SageMaker Canvas time series forecasting feature.
  • A SageMaker Studio domain and a user profile for the domain with unique names. IAM is used as the authentication mode. The IAM role created is used as the default SageMaker execution role for the domain and user profile. You can launch any of the personal apps available, such as SageMaker Studio and SageMaker Canvas, which are enabled by default.
  • An EFS volume, which serves as the file system for SageMaker Studio. Apart from Amazon EFS, a new S3 bucket with prefix sagemaker-studio- is created for notebook sharing.

SageMaker Studio also uses the default VPC and its associated subnets. If there is no default VPC, or if the default VPC has no subnets, SageMaker selects one of your existing VPCs that has associated subnets. If there is no VPC at all, it prompts you to create one on the Amazon VPC console. The selected VPC and all of its subnets are used to set up Amazon EFS.
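
If you prefer to confirm programmatically what the Quick Studio setup created, a short check with the AWS SDK for Python (Boto3) along the following lines can list the domain and its user profile (illustrative snippet; the fields printed are a subset of what the APIs return):

import boto3

sm = boto3.client("sagemaker")

# Find the domain created by the Quick Studio setup and inspect it.
for domain in sm.list_domains()["Domains"]:
    if domain["DomainName"].startswith("QuickSetupDomain-"):
        details = sm.describe_domain(DomainId=domain["DomainId"])
        print(details["DomainName"], details["AuthMode"], details["Status"])
        # List the user profile(s) created under this domain.
        for profile in sm.list_user_profiles(DomainIdEquals=domain["DomainId"])["UserProfiles"]:
            print("  user profile:", profile["UserProfileName"])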

Conclusion

Now, a single click is all it takes to get started with SageMaker Studio. The Quick Studio setup for individual users is available in all AWS commercial Regions where SageMaker is currently available.

Try out this new feature on the SageMaker console and let us know what you think. We always look forward to your feedback! You can send it through your usual AWS Support contacts or post it on the AWS Forum for SageMaker.


About the authors

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including natural language processing (NLP), MLOps, and low-code/no-code tools.

Unlocking language barriers: Translate application logs with Amazon Translate for seamless support

Application logs are an essential piece of information that provides crucial insights into the inner workings of an application. This includes valuable information such as events, errors, and user interactions that would aid an application developer or an operations support engineer to debug and provide support. However, when these logs are presented in languages other than English, it creates a significant hurdle for developers who can’t read the content, and hinders the support team’s ability to identify and address issues promptly.

In this post, we explore a solution on how you can unlock language barriers using Amazon Translate, a fully managed neural machine translation service for translating text to and from English across a wide range of supported languages. The solution will complement your existing logging workflows by automatically translating all your applications logs in Amazon CloudWatch in real time, which can alleviate the challenges posed by non-English application logs.

Solution overview

This solution shows you how you can use three key services to automate the translation of your application logs in an event-driven manner:

  • CloudWatch Logs is used to monitor, store, and access your log files generated from various sources such as AWS services and your applications
  • Amazon Translate is used to perform the translation of text to and from English
  • AWS Lambda is a compute service that lets you run code to retrieve application logs and translate them through the use of the Amazon Translate SDK

The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

  1. A custom or third-party application is hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance and the generated application logs are uploaded to CloudWatch Logs via the CloudWatch Logs agent.
  2. Each log entry written to CloudWatch Logs triggers the Lambda function subscribed to the CloudWatch log group.
  3. The function processes the contents of the log entry and uses the Amazon Translate SDK’s translate_text operation to translate the log content.
  4. The translated log content is returned to the function.
  5. The function writes the translated log content back to CloudWatch Logs in a different log group.

The entire process happens automatically in real time, and your developers will be able to access the translated application logs from the CloudWatch log groups with no change in how your existing application writes logs to CloudWatch.
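
The CloudFormation template used later in this post wires this subscription up for you, but for reference, the trigger corresponds roughly to the following calls (the function and log group names follow this post; the filter name, statement ID, and ARN placeholders are illustrative and must be replaced with your own values):

import boto3

logs = boto3.client("logs")
lambda_client = boto3.client("lambda")

# Allow CloudWatch Logs to invoke the translation function.
lambda_client.add_permission(
    FunctionName="translate-application-logs",
    StatementId="cloudwatch-logs-trigger",
    Action="lambda:InvokeFunction",
    Principal="logs.amazonaws.com",
)

# Subscribe the function to every log event written to /applicationlogs.
logs.put_subscription_filter(
    logGroupName="/applicationlogs",
    filterName="translate-all-events",
    filterPattern="",  # an empty pattern matches every log event
    destinationArn="arn:aws:lambda:<region>:<account-id>:function:translate-application-logs",
)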

Prerequisites

To follow through the instructions in this solution, you need an AWS account with an AWS Identity and Access Management (IAM) user who has permission to AWS CloudFormation, Amazon Translate, CloudWatch, Lambda, and IAM.

Deploy the solution

To get started, launch the following CloudFormation template to create a Lambda function, two CloudWatch log groups, and IAM role. Proceed to deploy with the default settings. This template takes about 1 minute to complete.

After the stack is created successfully, you can review the Lambda function by navigating to the Lambda console and locating the function translate-application-logs.

You can observe that there is a CloudWatch Logs trigger added to the function.

You can view the details of the trigger configuration by navigating to the Configuration tab and choosing Triggers in the navigation pane.

You can confirm that the trigger has been configured to subscribe to log events from the log group /applicationlogs. This is where your non-English application logs will be written to.

Next, choose Environment variables in the navigation pane.

Two environment variables are provided here:

  • source_language – The original language that the application log is in (for example, ja for Japanese)
  • target_language – The target language to translate the application log to (for example, en for English)

For a list of supported languages, refer to Supported languages and language codes.
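
As a quick sanity check of your language codes, you can call the same API the function uses, for example to translate a Japanese log line into English (illustrative snippet):

import boto3

translate = boto3.client("translate")

result = translate.translate_text(
    Text="アプリケーションの起動に失敗しました",  # "The application failed to start"
    SourceLanguageCode="ja",
    TargetLanguageCode="en",
)
print(result["TranslatedText"])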

Next, go to the Code tab and review the function logic:

import json, boto3, gzip, base64, os

translate = boto3.client(service_name='translate', region_name=os.environ['AWS_REGION'], use_ssl=True)
logs = boto3.client('logs')
    
def lambda_handler(event, context):
    # retrieve log messages
    encoded_zipped_data = event['awslogs']['data']
    zipped_data = base64.b64decode(encoded_zipped_data)
    data = gzip.decompress(zipped_data)
    json_log = json.loads(data)
    logGroup = json_log['logGroup']+'-'+os.environ['target_language']
    logStream = json_log['logStream']
    
    # check  if log group exists, create if not    
    dlg = logs.describe_log_groups(logGroupNamePrefix=logGroup)
    if len(dlg['logGroups']) == 0:
        logs.create_log_group(logGroupName=logGroup)

    # check if log stream exists, create if not    
    dls = logs.describe_log_streams(logGroupName=logGroup, logStreamNamePrefix=logStream)
    if len(dls['logStreams']) == 0:
        logs.create_log_stream(logGroupName=logGroup, logStreamName=logStream)

    # translate log event messages from source language to target language
    for logevent in json_log['logEvents']:
        logevent['message'] = translate.translate_text(Text=logevent['message'], SourceLanguageCode=os.environ['source_language'], TargetLanguageCode=os.environ['target_language']).get('TranslatedText')
        del logevent['id']

    # write translated log events back to a different log group in CloudWatch
    logs.put_log_events(
        logGroupName = logGroup,
        logStreamName = logStream,
        logEvents = json_log['logEvents']
    )
    
    # return success
    return {
        'statusCode': 200,
        'body': 'Translation success!'
    }

Test the solution

Finally, to test the solution, you can create a log message through the CloudWatch console and choose the created log group and log stream.

After creating your log messages, you will be able to see them translated immediately.
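
If you would rather script the test, you can write a sample event directly to the source log group with Boto3 (the log group name follows this post; the stream name is arbitrary):

import time
import boto3

logs = boto3.client("logs")

log_group, log_stream = "/applicationlogs", "test-stream"
logs.create_log_stream(logGroupName=log_group, logStreamName=log_stream)

# Write a Japanese log line; the Lambda function should shortly write the English
# translation to the /applicationlogs-en log group.
logs.put_log_events(
    logGroupName=log_group,
    logStreamName=log_stream,
    logEvents=[{
        "timestamp": int(time.time() * 1000),
        "message": "ユーザー認証に失敗しました",  # "User authentication failed"
    }],
)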

Clean up

To clean up the resources created in this post, delete the CloudFormation stack via the CloudFormation console.

Conclusion

This post addressed the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment. Through this implementation, developers can now seamlessly bridge the language barrier, empowering them to address issues swiftly and effectively.

Try out this implementation and let us know your thoughts in the comments.


About the author

Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan spends most of his free time with his family and documenting his daughter’s growing up journey.

Accelerate client success management through email classification with Hugging Face on Amazon SageMaker

This is a guest post from Scalable Capital, a leading FinTech in Europe that offers digital wealth management and a brokerage platform with a trading flat rate.

As a fast-growing company, Scalable Capital’s goals are to not only build an innovative, robust, and reliable infrastructure, but to also provide the best experiences for our clients, especially when it comes to client services.

Scalable receives hundreds of email inquiries from our clients on a daily basis. By implementing a modern natural language processing (NLP) model, we have made the response process much more efficient, and waiting times for clients have been reduced tremendously. The machine learning (ML) model classifies new incoming customer requests as soon as they arrive and redirects them to predefined queues, which allows our dedicated client success agents to focus on the contents of the emails according to their skills and provide appropriate responses.

In this post, we demonstrate the technical benefits of using Hugging Face transformers deployed with Amazon SageMaker, such as training and experimentation at scale, and increased productivity and cost-efficiency.

Problem statement

Scalable Capital is one of the fastest growing FinTechs in Europe. With the aim of democratizing investment, the company provides its clients with easy access to the financial markets. Clients of Scalable can actively participate in the market through the company’s brokerage trading platform, or use Scalable Wealth Management to invest in an intelligent and automated fashion. In 2021, Scalable Capital experienced a tenfold increase in its client base, from tens of thousands to hundreds of thousands.

To provide our clients with a top-class (and consistent) user experience across products and client service, the company was looking for automated solutions that generate efficiencies at scale while maintaining operational excellence. Scalable Capital’s data science and client service teams identified that one of the largest bottlenecks in servicing our clients was responding to email inquiries. Specifically, the bottleneck was the classification step, in which employees had to read and label request texts on a daily basis. After the emails were routed to their proper queues, the respective specialists quickly engaged and resolved the cases.

To streamline this classification process, the data science team at Scalable built and deployed a multitask NLP model using state-of-the-art transformer architecture, based on the pre-trained distilbert-base-german-cased model published by Hugging Face. distilbert-base-german-cased uses the knowledge distillation method to pretrain a smaller general-purpose language representation model than the original BERT base model. The distilled version achieves comparable performance to the original version, while being smaller and faster. To facilitate our ML lifecycle process, we decided to adopt SageMaker to build, deploy, serve, and monitor our models. In the following section, we introduce our project architecture design.

Solution overview

Scalable Capital’s ML infrastructure consists of two AWS accounts: one as an environment for the development stage and the other one for the production stage.

The following diagram shows the workflow for our email classifier project, but can also be generalized to other data science projects.

Email classification project diagram

The workflow consists of the following components:

  • Model experimentation – Data scientists use Amazon SageMaker Studio to carry out the first steps in the data science lifecycle: exploratory data analysis (EDA), data cleaning and preparation, and building prototype models. When the exploratory phase is complete, we turn to VSCode hosted on a SageMaker notebook instance as our remote development tool to modularize and productionize our code base. To explore different types of models and model configurations, and at the same time keep track of our experiments, we use SageMaker Training and SageMaker Experiments.
  • Model build – After we decide on a model for our production use case, in this case a multi-task distilbert-base-german-cased model fine-tuned from the pretrained model published by Hugging Face, we commit and push our code to the GitHub develop branch. The GitHub merge event triggers our Jenkins CI pipeline, which in turn starts a SageMaker Pipelines job with test data. This acts as a test to make sure that the code runs as expected. A test endpoint is deployed for testing purposes.
  • Model deployment – After making sure that everything is running as expected, data scientists merge the develop branch into the primary branch. This merge event triggers a SageMaker Pipelines job that uses production data for training. Afterwards, model artifacts are produced and stored in an output Amazon Simple Storage Service (Amazon S3) bucket, and a new model version is logged in the SageMaker model registry. Data scientists examine the performance of the new model, then approve it if it’s in line with expectations. The model approval event is captured by Amazon EventBridge, which then deploys the model to a SageMaker endpoint in the production environment.
  • MLOps – Because the SageMaker endpoint is private and can’t be reached by services outside of the VPC, an AWS Lambda function and an Amazon API Gateway public endpoint are required to communicate with CRM. Whenever new emails arrive in the CRM inbox, CRM invokes the API Gateway public endpoint, which in turn triggers the Lambda function to invoke the private SageMaker endpoint. The function then relays the classification back to CRM through the API Gateway public endpoint. To monitor the performance of our deployed model, we implement a feedback loop between CRM and the data scientists to keep track of prediction metrics from the model. On a monthly basis, CRM updates the historical data used for experimentation and model training. We use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a scheduler for our monthly retraining. A minimal sketch of the Lambda relay between CRM and the SageMaker endpoint follows this list.
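The following is a minimal sketch (not Scalable's actual code) of that Lambda relay: API Gateway passes the CRM request body to the function, which forwards it to the private SageMaker endpoint and returns the predictions. The endpoint name is a hypothetical placeholder.

import boto3

runtime = boto3.client('sagemaker-runtime')
ENDPOINT_NAME = 'email-classifier-endpoint'  # hypothetical endpoint name

def lambda_handler(event, context):
    # With an API Gateway proxy integration, the CRM payload arrives in event['body']
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',
        Body=event['body'],
    )
    predictions = response['Body'].read().decode('utf-8')

    # API Gateway relays this response back to CRM
    return {'statusCode': 200, 'body': predictions}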

In the following sections, we break down the data preparation, model experimentation, and model deployment steps in more detail.

Data preparation

Scalable Capital uses a CRM tool for managing and storing email data. Relevant email content consists of the subject, the body, and the custodian bank involved. There are three labels to assign to each email: which line of business the email is from, which queue is appropriate, and the specific topic of the email.

Before we start training any NLP models, we ensure that the input data is clean and the labels are assigned according to expectation.

To retrieve clean inquiry contents from Scalable clients, we remove extra text and symbols from the raw email data, such as email signatures, impressums (legal notices), quotes of previous messages in email chains, CSS symbols, and so on. Otherwise, our future trained models might experience degraded performance.
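As an illustration, the following is a simplified sketch of this kind of cleaning; the patterns are examples rather than Scalable's actual rules.

import re

def clean_email(text: str) -> str:
    text = re.sub(r'(?s)-- \n.*$', '', text)   # drop everything after a conventional signature delimiter
    text = re.sub(r'(?m)^>.*$', '', text)      # remove quoted lines from earlier messages in the chain
    text = re.sub(r'<[^>]+>', ' ', text)       # strip leftover HTML/CSS markup
    text = re.sub(r'\s+', ' ', text)           # collapse repeated whitespace
    return text.strip()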

Labels for emails evolve over time as Scalable client service teams add new ones and refine or remove existing ones to accommodate business needs. To make sure that labels for training data as well as expected classifications for prediction are up to date, the data science team works in close collaboration with the client service team to ensure the correctness of the labels.

Model experimentation

We start our experiment with the readily available pre-trained distilbert-base-german-cased model published by Hugging Face. Because the pre-trained model is a general-purpose language representation model, we can adapt the architecture to perform specific downstream tasks, such as classification and question answering, by attaching appropriate heads to the neural network. In our use case, the downstream task we are interested in is sequence classification. Without modifying the existing architecture, we decide to fine-tune three separate pre-trained models, one for each of our required categories. With the SageMaker Hugging Face Deep Learning Containers (DLCs), starting and managing NLP experiments is made simple by the Hugging Face containers and the SageMaker Experiments API.

The following is a code snippet of train.py:

from transformers import AutoConfig, DistilBertForSequenceClassification, Trainer

config = AutoConfig.from_pretrained("distilbert-base-german-cased")  # load original config
config.num_labels = num_labels  # adapt original config to a specific number of labels (default is 2)
# instantiate a pretrained model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-german-cased", config=config)

trainer = Trainer(
    model=model,  # the instantiated Transformers model to be trained
    args=training_args,  # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset  # evaluation dataset
)
trainer.train()

The following code is the Hugging Face estimator:

from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    hyperparameters = hyperparameters
)
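As a usage sketch, the training job is then launched by calling fit with the input data channels; the S3 locations are hypothetical.

# Each key becomes a channel that train.py can read under /opt/ml/input/data/<channel name>
huggingface_estimator.fit({
    'train': 's3://my-bucket/email-classifier/train',
    'val': 's3://my-bucket/email-classifier/val',
})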

To validate the fine-tuned models, we use the F1-score because of the imbalanced nature of our email dataset, but we also compute other metrics such as accuracy, precision, and recall. For the SageMaker Experiments API to register a training job’s metrics, we first need to log the metrics to the training job’s local console, from which they are picked up by Amazon CloudWatch. Then we define the correct regex format to capture them from the CloudWatch logs. The metric definitions include the name of each metric and the regex for extracting its value from the training job logs:

metric_definitions = [
    {"Name": "train:loss", "Regex": "'loss': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "learning_rate", "Regex": "'learning_rate': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:loss", "Regex": "'eval_loss': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:accuracy", "Regex": "'train_accuracy': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:accuracy", "Regex": "'eval_accuracy': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:precision", "Regex": "'train_precision': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:precision", "Regex": "'eval_precision': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:recall", "Regex": "'train_recall': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:recall", "Regex": "'eval_recall': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:f1", "Regex": "'train_f1': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:f1", "Regex": "'eval_f1': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:runtime", "Regex": "'eval_runtime': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:samples_per_second", "Regex": "'eval_samples_per_second': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9]+(.|e-)[0-9]+),?"},
]
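For reference, the following is a minimal sketch, not Scalable's exact training code, of how metrics such as eval_f1 reach the console logs that these regex definitions parse: the Hugging Face Trainer prints the dictionary returned by a compute_metrics callback during evaluation, prefixed with eval_, and the metric definitions are passed to the estimator.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    return {'accuracy': accuracy_score(labels, preds),
            'precision': precision,
            'recall': recall,
            'f1': f1}

# trainer = Trainer(..., compute_metrics=compute_metrics)

# Pass the metric definitions to the estimator so SageMaker parses the CloudWatch logs
# and registers the values with SageMaker Experiments:
# huggingface_estimator = HuggingFace(..., metric_definitions=metric_definitions)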

As part of the training iteration for the classifier model, we use a confusion matrix and classification report to evaluate the result. The following figure shows the confusion matrix for line of business prediction.

Confusion Matrix

The following screenshot shows an example of the classification report for line of business prediction.

Classification Report

As the next iteration of our experiment, we take advantage of multi-task learning to improve our model. Multi-task learning is a form of training where a model learns to solve multiple tasks simultaneously, because the shared information among tasks can improve learning efficiency. By attaching two more classification heads to the original distilbert architecture, we can carry out multi-task fine-tuning, which attains reasonable metrics for our client service team.
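As an illustration, the following is a minimal sketch, not Scalable's exact implementation, of such a multi-task setup: one shared DistilBERT encoder with three classification heads, one per label category (line of business, queue, and topic).

from torch import nn
from transformers import DistilBertModel

class MultiTaskDistilBert(nn.Module):
    def __init__(self, num_labels_per_task):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-german-cased")
        hidden_size = self.encoder.config.dim
        # One linear classification head per task, e.g. [n_lines_of_business, n_queues, n_topics]
        self.heads = nn.ModuleList([nn.Linear(hidden_size, n) for n in num_labels_per_task])

    def forward(self, input_ids, attention_mask=None):
        hidden_states = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        cls = hidden_states[:, 0]  # use the first token's representation as the shared embedding
        return [head(cls) for head in self.heads]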

Model deployment

In our use case, the email classifier is to be deployed to an endpoint, to which our CRM pipeline can send a batch of unclassified emails and get back predictions. Because we have additional logic, such as input data cleaning and multi-task predictions, in addition to the Hugging Face model inference, we need to write a custom inference script that adheres to the SageMaker standard.

The following is a code snippet of inference.py:

import json

import pandas as pd
import torch

def model_fn(model_dir):
    # load_from_artifact is a project-specific helper that restores the fine-tuned model
    model = load_from_artifact(model_dir)

    return model

def transform_fn(model, input_data, content_type, accept):
    if content_type == "application/json":
        data = json.loads(input_data)
        data = pd.DataFrame(data)
    else:
        raise ValueError(f"Unsupported content type: {content_type}")

    # preprocess and postprocess are project-specific helpers for input cleaning and
    # for mapping raw model outputs back to labels
    data = preprocess(data)

    # Inference
    with torch.no_grad():
        predictions = model(data)

    predictions = postprocess(predictions)

    # return the predictions in the requested response format
    if accept == "application/json":
        return json.dumps(predictions.to_dict(orient="records"))
    else:
        raise NotImplementedError
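As a usage sketch, the custom inference script can then be deployed through the SageMaker Hugging Face container; the artifact location, instance type, and example payload are illustrative.

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data='s3://my-bucket/email-classifier/model.tar.gz',  # hypothetical artifact location
    role=role,  # SageMaker execution role
    entry_point='inference.py',
    source_dir='./scripts',
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
)

# Send a small batch of unclassified emails as JSON records
# ("Frage zu meinem Depot" = "Question about my securities account")
print(predictor.predict([{'subject': 'Frage zu meinem Depot', 'body': '...'}]))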

When everything is up and ready, we use SageMaker Pipelines to manage our training pipeline and attach it to our infrastructure to complete our MLOps setup.

To monitor the performance of the deployed model, we build a feedback loop to enable CRM to provide us with the status of classified emails when cases are closed. Based on this information, we make adjustments to improve the deployed model.

Conclusion

In this post, we shared how SageMaker helps the data science team at Scalable manage the lifecycle of a data science project efficiently, namely the email classifier project. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline, built with SageMaker Pipelines, that is integrated with other AWS services. Thanks to this infrastructure, we are able to iterate and deploy new models more efficiently, and are therefore able to improve existing processes within Scalable as well as our clients’ experiences.

To learn more about Hugging Face and SageMaker, refer to the following resources:


About the Authors

Dr. Sandra Schmid is Head of Data Analytics at Scalable GmbH. She is responsible for data-driven approaches and use cases in the company together with her teams. Her key focus is finding the best combination of machine learning and data science models and business goals in order to gain as much business value and efficiencies out of data as possible.

Huy Dang is a Data Scientist at Scalable GmbH. His responsibilities include data analytics, building and deploying machine learning models, and developing and maintaining infrastructure for the data science team. In his spare time, he enjoys reading, hiking, rock climbing, and staying up to date with the latest machine learning developments.

Mia Chang is an ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads in the cloud, drawing on her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys yoga, board games, and brewing coffee.

Moritz Guertler is an Account Executive in the Digital Native Businesses segment at AWS. He focuses on customers in the FinTech space and supports them in accelerating innovation through secure and scalable cloud infrastructure.
