Technovation Empowers Girls in AI, Making AI Education More Inclusive and Engaging

Tara Chklovski has spent much of her career inspiring young women to take on some of the world’s biggest challenges using technology.

The founder and CEO of education nonprofit Technovation joined the AI Podcast in 2019 to discuss the AI Family Challenge. Now, she returns to explain how inclusive AI makes the world a better and, crucially, less boring place.

In this episode of the NVIDIA AI Podcast, Chklovski and Anshita Saini, a Technovation alumna and member of the technical staff at OpenAI, explore how the nonprofit empowers girls worldwide through technology education.

They discuss the organization’s growth from its early days to its current focus on AI education and real-world problem-solving.

Anshita Saini speaking at the Technovation World Summit event.

In addition, Saini shares her journey from creating an app that helped combat a vaping crisis at her high school, to her first exposure to AI, through to her current role working on ChatGPT. She also talks about Wiser AI, an initiative she recently founded to support women leaders and other underrepresented voices in artificial intelligence.

Technovation is preparing the next generation of female leaders in AI and technology. Learn about the opportunity to mentor a team of girls for the 2025 season.

And learn more about the latest technological advancements by registering for NVIDIA GTC, the conference for the era of AI, taking place March 17-21.

Time Stamps

2:21 – Recognizing AI’s revolutionary potential in 2016.

5:39 – Technovation’s pioneering approach to incorporating ChatGPT in education.

12:17 – Saini builds an app through Technovation that addressed a real problem at her high school.

29:12 – The importance of having women represented on software development teams.

You Might Also Like… 

NVIDIA’s Louis Stewart on How AI Is Shaping Workforce Development

Louis Stewart, head of strategic initiatives for NVIDIA’s global developer ecosystem, discusses why workforce development is crucial for maximizing AI benefits. He emphasizes the importance of AI education, inclusivity and public-private partnerships in preparing the global workforce for the future. Engaging with AI tools and understanding their impact on the workforce landscape is vital to ensuring these changes benefit everyone.

Currents of Change: ITIF’s Daniel Castro on Energy-Efficient AI and Climate Change

AI is everywhere. So, too, are concerns about advanced technology’s environmental impact. Daniel Castro, vice president of the Information Technology and Innovation Foundation and director of its Center for Data Innovation, discusses his AI energy use report that addresses misconceptions about AI’s energy consumption. He also talks about the need for continued development of energy-efficient technology.

How AI Can Enhance Disability Inclusion and Education

U.S. Special Advisor on International Disability Rights at the U.S. Department of State Sara Minkara and Timothy Shriver, chairman of the board of Special Olympics, discuss AI’s potential to enhance disability inclusion and education. They discuss the need to hear voices from the disability community in conversations about AI development and policy. They also cover why building an inclusive future is good for society’s collective cultural, financial and social well-being.

Subscribe to the AI Podcast

Get the AI Podcast through Amazon Music, Apple Podcasts, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, SoundCloud, Spotify, Stitcher and TuneIn.

Read More

Ideas: Building AI for population-scale systems with Akshay Nambi

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, guest host Chris Stetkiewicz talks with Microsoft Principal Researcher Akshay Nambi about his focus on developing AI-driven technology that addresses real-world challenges at scale. Drawing on firsthand experiences, Nambi combines his expertise in electronics and computer science to create systems that enhance road safety, agriculture, and energy infrastructure. He’s currently working on AI-powered tools to improve education, including a digital assistant that can help teachers work more efficiently and create effective lesson plans and solutions to help improve the accuracy of models underpinning AI tutors.

Learn more:

Teachers in India help Microsoft Research design AI tool for creating great classroom content
Microsoft Research Blog, October 2023

HAMS: Harnessing AutoMobiles for Safety
Project homepage

Microsoft Research AI project automates driver’s license tests in India
Microsoft Source Asia Blog

InSight: Monitoring the State of the Driver in Low-Light Using Smartphones
Publication, September 2020

Chanakya: Learning Runtime Decisions for Adaptive Real-Time Perception
Publication, December 2023

ALT: Towards Automating Driver License Testing using Smartphones
Publication, November 2019

Dependable IoT
Project homepage

Vasudha
Project homepage

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

AKSHAY NAMBI: For me, research is just not about pushing the boundaries of the knowledge. It’s about ensuring that these advancements translate to meaningful impact on the ground. So, yes, the big goals that guide most of my work is twofold. One, how do we build technology that’s scaled to benefit large populations? And two, at the same time, I’m motivated by the challenge of tackling complex problems. That provides opportunity to explore, learn, and also create something new, and that’s what keeps me excited.

[TEASER ENDS]

CHRIS STETKIEWICZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I’m your guest host, Chris Stetkiewicz. Today, I’m talking to Akshay Nambi. Akshay is a principal researcher at Microsoft Research. His work lies at the intersection of systems, AI, and machine learning with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems. Akshay’s research extends across education, agriculture, transportation, and energy. He is currently working on enhancing the quality and reliability of AI systems by addressing critical challenges such as reasoning, grounding, and managing complex queries.

Akshay, welcome to the podcast.


AKSHAY NAMBI: Thanks for having me.

STETKIEWICZ: I’d like to begin by asking you to tell us your origin story. How did you get started on your path? Was there a big idea or experience that captured your imagination or motivated you to do what you’re doing today?

NAMBI: If I look back, my journey into research wasn’t a straight line. It was more about discovering my passion through some unexpected opportunities and also finding purpose along the way. So before I started with my undergrad studies, I was very interested in electronics and systems. My passion for electronics, kind of, started when I was in school. I was more like an average student, not a nerd or not too curious, but I was always tinkering around, doing things, building stuff, and playing with gadgets and that, kind of, made me very keen on electronics and putting things together, and that was my passion. But sometimes things don’t go as planned. So I didn’t get into the college which I had hoped to join for electronics, so I ended up pursuing computer science, which wasn’t too bad either. So during my final year of bachelor’s, I had to do a final semester project, which turned out to be a very pivotal moment. And that’s when I got to know this institute called Indian Institute of Science (IISc), which is a top research institute in India and also globally. And I had a chance to work on a project there. And it was my first real exposure to open-ended research, right, so I remember … where we were trying to build a solution that helped to efficiently construct an ontology for a specific domain, which simply means that we were building systems to help users uncover relationships in the data and allow them to query it more efficiently, right. And it was super exciting for me to design and build something new. And that experience made me realize that I wanted to pursue research further. And right after that project, I decided to explore research opportunities, which led me to join Indian Institute of Science again as a research assistant.

STETKIEWICZ: So what made you want to take the skills you were developing and apply them to a research career?

NAMBI: So interestingly when I joined IISc, the professor I worked with specialized in electronics, so things come back, so something I had always been passionate about. And I was the only computer science graduate in the lab at that time with others being electronic engineers, and I didn’t even know how to solder. But the lab environment was super encouraging, collaborative, so I, kind of, caught up very quickly. In that lab, basically, I worked on several projects in the emerging fields of embedded device and energy harvesting systems. Specifically, we were designing systems that could harvest energy from sources like sun, hydro, and even RF (radio frequency) signals. And my role was kind of twofold. One, I designed circuits and systems to make energy harvesting more efficient so that you can store this energy. And then I also wrote programs, software, to ensure that the harvested energy can be used efficiently. For instance, as we harvest some of this energy, you want to have your programs run very quickly so that you are able to sense the data, send it to the server in an efficient way. And one of the most exciting projects I worked during that time was on data-driven agriculture. So this was back in 2008, 2009, right, where we developed an embedded system device with sensors to monitor the agricultural fields, collecting data like soil moisture, soil temperature. And that was sent to the agronomists who were able to analyze this data and provide feedback to farmers. In many remote areas, still access to power is a huge challenge. So we used many of the technologies we were developing in the lab, specifically energy harvesting techniques, to power these sensors and devices in the rural farms, and that’s when I really got to see firsthand how technology could help people’s lives, particularly in rural settings. And that’s what, kind of, stood out in my experience at IISc, right, was that it was [the] end-to-end nature of the work. And it was not just writing code or designing circuits. It was about identifying the real-world problems, solving them efficiently, and deploying solutions in the field. And this cemented my passion for creating technology that solves real-world problems, and that’s what keeps me driving even today.

STETKIEWICZ: And as you’re thinking about those problems that you want to try and solve, where did you look for, for inspiration? It sounds like some of these are happening right there in your home.

NAMBI: That’s right. Growing up and living in India, I’ve been surrounded by these, kind of, many challenges. And these are not distant problems. These are right in front of us. And some of them are quite literally outside the door. So being here in India provides a unique opportunity to tackle some of the pressing real-world challenges in agriculture, education, or in road safety, where even small advancements can create significant impact.

STETKIEWICZ: So how would you describe your research philosophy? Do you have some big goals that guide you?

NAMBI: Right, as I mentioned, right, my research philosophy is mainly rooted in solving real-world problems through end-to-end innovation. For me, research is just not about pushing the boundaries of the knowledge. It’s about ensuring that these advancements translate to meaningful impact on the ground, right. So, yes, the big goals that guide most of my work is twofold. One, how do we build technology that’s scaled to benefit large populations? And two, at the same time, I’m motivated by the challenge of tackling complex problems. That provides opportunity to explore, learn, and also create something new. And that’s what keeps me excited.

STETKIEWICZ: So let’s talk a little bit about your journey at Microsoft Research. I know you began as an intern, and some of the initial work you did was focused on computer vision, road safety, energy efficiency. Tell us about some of those projects.

NAMBI: As I was nearing the completion of my PhD, I was eager to look for opportunities in industrial labs, and Microsoft Research obviously stood out as an exciting opportunity. And additionally, the fact that Microsoft Research India was in my hometown, Bangalore, made it even more appealing. So when I joined as an intern, I worked together with Venkat Padmanabhan, who now leads the lab, and we started this project called HAMS, which stands for Harnessing Automobiles for Safety. As you know, road safety is a major public health issue globally, responsible for almost 1.35 million fatalities annually and with the situation being even more severe in countries like India. For instance, there are estimates that there’s a life lost on the road every four minutes in India. When analyzing the factors which affect road safety, we saw mainly three elements. One, the vehicle. Second, the infrastructure. And then the driver. Among these, the driver plays the most critical role in many incidents, whether it’s over-speeding, driving without seat belts, drowsiness, fatigue, any of these, right. And this realization motivated us to focus on driver monitoring, which led to the development of HAMS. In a nutshell, HAMS is basically a smartphone-based system where you’re mounting your smartphone on a windshield of a vehicle to monitor both the driver and the driving in real time with the goal of improving road safety. Basically, it observes key aspects such as where the driver is looking, whether they are distracted or fatigued[1], while also considering the external driving environment, because we truly believe to improve road safety, we need to understand not just the driver’s action but also the context in which they are driving. For example, if the smartphone’s accelerometer detects sharp braking, the system would automatically check the distance to the vehicle in the front using the rear camera and whether the driver was distracted or fatigued using the front camera. And this holistic approach ensures a more accurate and comprehensive assessment of the driving behavior, enabling a more meaningful feedback.

STETKIEWICZ: So that sounds like a system that’s got several moving parts to it. And I imagine you had some technical challenges you had to deal with there. Can you talk about that?

NAMBI: One of our guiding principles in HAMS was to use commodity, off-the-shelf smartphone devices, right. This should be affordable, in the range of $100 to $200, so that you can just take out regular smartphones and enable this driver and driving monitoring. And that led to handling several technical challenges. For instance, we had to develop efficient computer vision algorithms that could run locally on the device with cheap smartphone processing units while still performing very well at low-light conditions. We wrote multiple papers and developed many of the novel algorithms which we implemented on very low-cost smartphones. And once we had such a monitoring system, right, you can imagine there’s several deployment opportunities, starting from fleet monitoring to even training new drivers, right. However, one application we hadn’t originally envisioned but turned out to be its most impactful use case even today is automated driver’s license testing. As you know, before you get a license, a driver is supposed to pass a test, but what happens in many places, including India, is that licenses are issued with very minimal or no actual testing, leading to unsafe and untrained drivers on the road. At the same time as we were working on HAMS, Indian government were looking at introducing technology to make testing more transparent and also automated. So we worked with the right set of partners, and we demonstrated to the government that HAMS could actually completely automate the entire license testing process. So we first deployed this system in Dehradun RTO (Regional Transport Office)—which is the equivalent of a DMV in the US—in 2019, working very closely with RTO officials to define what should be some of the evaluation criteria, right. Some of these would be very simple like, oh, is it the same candidate who is taking the test who actually registered for the test, right? And whether they are wearing seat belts. Did they scan their mirrors before taking a left turn and how well they performed in tasks like reverse parking and things like that.

STETKIEWICZ: So what’s been the government response to that? Have they embraced it or deployed it in a wider extent?

NAMBI: Yes, yes. So after the deployment in Dehradun in 2019, we actually open sourced the entire HAMS technology and our partners are now working with several state governments and scaled HAMS to several states in India. And as of today, we have around 28 RTOs where HAMS is actually being deployed, and the pass rate of such license test is just 60% as compared to 90-plus percent with manual testing. That’s the extensive rigor the system brings in. And now what excites me is after nearly five years later, we are now taking the next step in this project where we are now evaluating the long-term impact of this intervention on driving behavior and road safety. So we are collaborating with Professor Michael Kremer, who is a Nobel laureate and professor at University of Chicago, and his team to study how this technology has influenced driving patterns and accident rates over time. So this focus on closing the loop and moving beyond just deployment in the field to actually measuring the real impact, right, is something that truly excites me and that makes research at Microsoft is very unique. And that is actually one of the reasons why I joined Microsoft Research as a full-time after my internship, and this unique flexibility to work on real-world problems, develop novel research ideas, and actually collaborate with partners both internally and externally to deploy at scale is something that is very unique here.

STETKIEWICZ: So have you actually received any evidence that the project is working? Is driving getting safer?

NAMBI: Yes, these are very early analysis, and there are very positive insights we are getting from that. Soon we will be releasing a white paper on our study on this long-term impact.

STETKIEWICZ: That’s great. I look forward to that one. So you’ve also done some interesting work involving the Internet of Things, with an emphasis on making it more reliable and practical. So for those in our audience who may not know, the Internet of Things, or IoT, is a network that includes billions of devices and sensors in things like smart thermostats and fitness trackers. So talk a little bit about your work in this area.

NAMBI: Right, so IoT, as you know, is already transforming several industries with billions of sensors being deployed in areas like industrial monitoring, manufacturing, agriculture, smart buildings, and also air pollution monitoring. And if you think about it, these sensors provide critical data that businesses rely for decision making. However, a fundamental challenge is ensuring that the data collected from these sensors is actually reliable. If the data is faulty, it can lead to poor decisions and inefficiencies. And the challenge is that these sensor failures are always not obvious. What I mean by that is when a sensor stops working, it always doesn’t stop sending data, but it often continues to send some data which appear to be normal. And that’s one of the biggest problems, right. So detecting these errors is non-trivial because the faulty sensors can mimic real-world working data, and traditional solutions like deploying redundant sensors or even manually inspecting them are very expensive, labor intensive, and also sometimes infeasible, especially for remote deployments. Our goal in this work was to develop a simple and efficient way to remotely monitor the health of the IoT sensors. So what we did was we hypothesized that most sensor failures occurred due to the electronic malfunctions. It could be either due to short circuits or component degradation or due to environmental factors such as heat, humidity, or pollution. Since these failures originate within the sensor hardware itself, we saw an opportunity to leverage some of the basic electronic principles to create a novel solution. The core idea was to develop a way to automatically generate a fingerprint for each sensor. And by fingerprint, I mean the unique electrical characteristic exhibited by a properly working sensor. We built a system that could devise these fingerprints for different types of sensors, allowing us to detect failures purely based on the sensors internal characteristics, that is the fingerprint, and even without looking at the data it produces. Essentially what it means now is that we were able to tag each sensor data with a reliability score, ensuring verifiability.

STETKIEWICZ: So how does that technology get deployed in the real world? Is there an application where it’s being put to work today?

NAMBI: Yes, this technology, we worked together with Azure IoT and open-sourced it where there were several opportunities and several companies took the solution into their systems, including air pollution monitoring, smart buildings, industrial monitoring. The one which I would like to talk about today is about air pollution monitoring. As you know, air pollution is a major challenge in many parts of the world, especially in India. And traditionally, air quality monitoring relies on these expensive fixed sensors, which provide limited coverage. On the other hand, there is a rich body of work on low-cost sensors, which can offer wider deployment. Like, you can put these sensors on a bus or a vehicle and have it move around the entire city, where you can get much more fine-grained, accurate picture on the ground. But these are often unreliable because these are low-cost sensors and have reliability issues. So we collaborated with several startups who were developing these low-cost air pollution sensors who were finding it very challenging to gain trust because one of the main concerns was the accuracy of the data from low-cost sensors. So our solution seamlessly integrated with these sensors, which enabled verification of the data quality coming out from these low-cost air pollution sensors. So this bridged the trust gap, allowing government agencies to initiate large-scale pilots using low-cost sensors for fine-grain air-quality monitoring.

STETKIEWICZ: So as we’re talking about evolving technology, large language models, or LLMs, are also enabling big changes, and they’re not theoretical. They’re happening today. And you’ve been working on LLMs and their applicability to real-world problems. Can you talk about your work there and some of the latest releases?

NAMBI: So when ChatGPT was first released, I, like many people, was very skeptical. However, I was also curious both of how it worked and, more importantly, whether it could accelerate solutions to real-world problems. That led to the exploration of LLMs in education, where we fundamentally asked this question, can AI help improve educational outcomes? And this was one of the key questions which led to the development of Shiksha copilot, which is a genAI-powered assistant designed to support teachers in their daily work, starting from helping them to create personalized learning experience, design assignments, generate hands-on activities, and even more. Teachers today universally face several challenges, from time management to lesson planning. And our goal with Shiksha was to empower them to significantly reduce the time spent on this task. For instance, lesson planning, which traditionally took about 60 minutes, can now be completed in just five minutes using the Shiksha copilot. And what makes Shiksha unique is that it’s completely grounded in the local curriculum and the learning objectives, ensuring that the AI-generated content aligns very well with the pedagogical best practices. The system actually supports multilingual interactions, multimodal capabilities, and also integration with external knowledge base, making it very highly adaptable for different curriculums. Initially, many teachers were skeptical. Some feared this would limit their creativity. However, as they began starting to use Shiksha, they realized that it didn’t replace their expertise, but rather amplified it, enabling them to do work faster and more efficiently.

STETKIEWICZ: So, Akshay, the last time you and I talked about Shiksha copilot, it was very much in the pilot phase and the teachers were just getting their hands on it. So it sounds like, though, you’ve gotten some pretty good feedback from them since then.

NAMBI: Yes, so when we were discussing, we were doing this six-month pilot with 50-plus teachers where we gathered overwhelming positive feedback on how technologies are helping teachers to reduce time in their lesson planning. And in fact, they were using the system so much that they really enjoyed working with Shiksha copilot where they were able to do more things with much less time, right. And with a lot of feedback from teachers, we have improved Shiksha copilot over the past few months. And starting this academic year, we have already deployed Shiksha to 1,000-plus teachers in Karnataka. This is with close collaboration with our partners in … with the Sikshana Foundation and also with the government of Karnataka. And the response has been already incredibly encouraging. And looking ahead, we are actually focusing on again, closing this loop, right, and measuring the impact on the ground, where we are doing a lot of studies with the teachers to understand not just improving efficiency of the teachers but also measuring how AI-generated content enriched by teachers is actually enhancing student learning objectives. So that’s the study we are conducting, which hopefully will close this loop and understand our original question that, can AI actually help improve educational outcomes?

STETKIEWICZ: And is the deployment primarily in rural areas, or does it include urban centers, or what’s the target?

NAMBI: So the current deployment with 1,000 teachers is a combination of both rural and urban public schools. These are covering both English medium and Kannada medium teaching schools with grades from Class 5 to Class 10.

STETKIEWICZ: Great. So Shiksha was focused on helping teachers and making their jobs easier, but I understand you’re also working on some opportunities to use AI to help students succeed. Can you talk about that?

NAMBI: So as you know, LLMs are still evolving and inherently they are fragile, and deploying them in real-world settings, especially in education, presents a lot of challenges. With Shiksha, if you think about it, teachers remain in control throughout the interaction, making the final decision on whether to use the AI-generated content in the classroom or not. However, when it comes to AI tutors for students, the stakes are slightly higher, where we need to ensure the AI doesn’t produce incorrect answers, misrepresent concepts, or even mislead explanations. Currently, we are developing solutions to enhance accuracy and also the reasoning capabilities of these foundational models, particularly solving math problems. This represents a major step towards building AI systems that’s much more holistic personal tutors, which help student understanding and create more engaging, effective learning experience.

STETKIEWICZ: So you’ve talked about working in computer vision and IoT and LLMs. What do those areas have in common? Is there some thread that weaves through the work that you’re doing?

NAMBI: That’s a great question. As a systems researcher, I’m quite interested in this end-to-end systems development, which means that my focus is not just about improving a particular algorithm but also thinking about the end-to-end system, which means that I, kind of, think about computer vision, IoT, and even LLMs as tools, where we would want to improve them for a particular application. It could be agriculture, education, or road safety. And then how do you think this holistically to come up with the best efficient system that can be deployed at population scale, right. I think that’s the connecting story here, that how do you have this systemic thinking which kind of takes the existing tools, improves them, makes it more efficient, and takes it out from the lab to real world.

STETKIEWICZ: So you’re working on some very powerful technology that is creating tangible benefits for society, which is your goal. At the same time, we’re still in the very early stages of the development of AI and machine learning. Have you ever thought about unintended consequences? Are there some things that could go wrong, even if we get the technology right? And does that kind of thinking ever influence the development process?

NAMBI: Absolutely. Unintended consequences are something I think about deeply. Even the most well-designed technology can have these ripple effects that we may not fully anticipate, especially when we are deploying it at population scale. For me, being proactive is one of the key important aspects. This means not only designing the technology at the lab but actually also carefully deploying them in real world, measuring its impact, and working with the stakeholders to minimize the harm. In most of my work, I try to work very closely with the partner team on the ground to monitor, analyze, how the technology is being used and what are some of the risks and how can we eliminate that. At the same time, I also remain very optimistic. It’s also about responsibility. If we are able to embed societal values, ethics, into the design of the system and involve diverse perspectives, especially from people on the ground, we can remain vigilant as the technology evolves and we can create systems that can truly deliver immense societal benefits while addressing many of the potential risks.

STETKIEWICZ: So we’ve heard a lot of great examples today about building technology to solve real-world problems and your motivation to keep doing that. So as you look ahead, where do you see your research going next? How will people be better off because of the technology you develop and the advances that they support?

NAMBI: Yeah, I’m deeply interested in advancing AI systems that can truly assist anyone in their daily tasks, whether it’s providing personalized guidance to a farmer in a rural village, helping a student get instant 24 by 7 support for their learning doubts, or even empowering professionals to work more efficiently. And to achieve this, my research is focusing on tackling some of the fundamental challenges in AI with respect to reasoning and reliability and also making sure that AI is more context aware and responsive to evolving user needs. And looking ahead, I envision AI as not just an assistant but also as an intelligent and equitable copilot seamlessly integrated into our everyday life, empowering individuals across various domains.

STETKIEWICZ: Great. Well, Akshay, thank you for joining us on Ideas. It’s been a pleasure.

[MUSIC]

NAMBI: Yeah, I really enjoyed talking to you, Chris. Thank you.

STETKIEWICZ: Till next time.

[MUSIC FADES]


[1] To ensure data privacy, all processing is done locally on the smartphone. This approach ensures that driving behavior insights remain private and secure with no personal data stored or shared.

The post Ideas: Building AI for population-scale systems with Akshay Nambi appeared first on Microsoft Research.

Read More

Transforming credit decisions using generative AI with Rich Data Co and AWS

This post is co-written with Gordon Campbell, Charles Guan, and Hendra Suryanto from RDC. 

The mission of Rich Data Co (RDC) is to broaden access to sustainable credit globally. Its software-as-a-service (SaaS) solution empowers leading banks and lenders with deep customer insights and AI-driven decision-making capabilities.

Making credit decisions using AI can be challenging, requiring data science and portfolio teams to synthesize complex subject matter information and collaborate productively. To solve this challenge, RDC used generative AI, enabling teams to use its solution more effectively:

  • Data science assistant – Designed for data science teams, this agent assists teams in developing, building, and deploying AI models within a regulated environment. It aims to boost team efficiency by answering complex technical queries across the machine learning operations (MLOps) lifecycle, drawing from a comprehensive knowledge base that includes environment documentation, AI and data science expertise, and Python code generation.
  • Portfolio assistant – Designed for portfolio managers and analysts, this agent facilitates natural language inquiries about loan portfolios. It provides critical insights on performance, risk exposures, and credit policy alignment, enabling informed commercial decisions without requiring in-depth analysis skills. The assistant is adept at high-level questions (such as identifying high-risk segments or potential growth opportunities) and one-time queries, allowing the portfolio to be diversified.

In this post, we discuss how RDC uses generative AI on Amazon Bedrock to build these assistants and accelerate its overall mission of democratizing access to sustainable credit.

Solution overview: Building a multi-agent generative AI solution

We began with a carefully crafted evaluation set of over 200 prompts, anticipating common user questions. Our initial approach combined prompt engineering and traditional Retrieval Augmented Generation (RAG). However, we encountered a challenge: accuracy fell below 90%, especially for more complex questions.

To overcome the challenge, we adopted an agentic approach, breaking down the problem into specialized use cases. This strategy equipped us to align each task with the most suitable foundation model (FM) and tools. Our multi-agent framework is orchestrated using LangGraph, and it consisted of:

  1. Orchestrator – The orchestrator is responsible for routing user questions to the appropriate agent. In this example, we start with the data science or portfolio agent. However, we envision many more agents in the future. The orchestrator can also use user context, such as the user’s role, to determine routing to the appropriate agent.
  2. Agent – The agent is designed for a specialized task. It’s equipped with the appropriate FM for the task and the necessary tools to perform actions and access knowledge. It can also handle multiturn conversations and orchestrate multiple calls to the FM to reach a solution.
  3. Tools – Tools extend agent capabilities beyond the FM. They provide access to external data and APIs or enable specific actions and computation. To efficiently use the model’s context window, we construct a tool selector that retrieves only the relevant tools based on the information in the agent state. This helps simplify debugging in the case of errors, ultimately making the agent more effective and cost-efficient.

This approach gives us the right tool for the right job. It enhances our ability to handle complex queries efficiently and accurately while providing flexibility for future improvements and agents.
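
The routing logic described above can be expressed compactly in LangGraph. The following is a minimal sketch under assumptions of our own: the node names, state fields, and routing rule are illustrative placeholders rather than RDC's production code, and each agent stub stands in for the FM and tool calls the real agents make.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    user_role: str
    answer: str

def route_question(state: AgentState) -> str:
    # Hypothetical routing rule: use the user's role (in practice, combined with
    # an FM call over the question) to pick the specialized agent.
    if state["user_role"] == "data_scientist":
        return "data_science_agent"
    return "portfolio_agent"

def data_science_agent(state: AgentState) -> dict:
    # Placeholder for the RAG and code-generation agent.
    return {"answer": f"[data science agent] {state['question']}"}

def portfolio_agent(state: AgentState) -> dict:
    # Placeholder for the text-to-SQL portfolio agent.
    return {"answer": f"[portfolio agent] {state['question']}"}

graph = StateGraph(AgentState)
graph.add_node("router", lambda state: state)      # orchestrator entry point
graph.add_node("data_science_agent", data_science_agent)
graph.add_node("portfolio_agent", portfolio_agent)
graph.set_entry_point("router")
graph.add_conditional_edges("router", route_question)
graph.add_edge("data_science_agent", END)
graph.add_edge("portfolio_agent", END)

app = graph.compile()
result = app.invoke({"question": "Which segments show elevated default risk?",
                     "user_role": "portfolio_manager"})
print(result["answer"])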

The following image is a high-level architecture diagram of the solution.
Data science agent: RAG and code generation

To boost productivity of data science teams, we focused on rapid comprehension of advanced knowledge, including industry-specific models from a curated knowledge base. Here, RDC provides an integrated development environment (IDE) for Python coding, catering to various team roles. One role is model validator, who rigorously assesses whether a model aligns with bank or lender policies. To support the assessment process, we designed an agent with two tools:

  1. Content retriever tool – Amazon Bedrock Knowledge Bases powers our intelligent content retrieval through a streamlined RAG implementation (a minimal retrieval sketch follows this list). The service automatically converts text documents to their vector representation using Amazon Titan Text Embeddings and stores them in Amazon OpenSearch Serverless. Because the knowledge is vast, it performs semantic chunking, making sure that the knowledge is organized by topic and can fit within the FM’s context window. When users interact with the agent, Amazon Bedrock Knowledge Bases using OpenSearch Serverless provides fast, in-memory semantic search, enabling the agent to retrieve the most relevant chunks of knowledge for relevant and contextual responses to users.
  2. Code generator tool – With code generation, we selected Anthropic’s Claude model on Amazon Bedrock due to its inherent ability to understand and generate code. This tool is grounded to answer queries related to data science and can generate Python code for quick implementation. It’s also adept at troubleshooting coding errors.
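
As a rough illustration of the content retriever tool, the snippet below calls the Amazon Bedrock Knowledge Bases retrieve API directly through boto3. The knowledge base ID, AWS Region, and query are placeholders of our own; in the deployed agent, the retrieved chunks are handed to the FM as grounding context.

import boto3

# Placeholder knowledge base ID and Region; replace with your own values.
KNOWLEDGE_BASE_ID = "XXXXXXXXXX"

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def retrieve_context(query: str, max_results: int = 5) -> list:
    """Return the most relevant knowledge chunks for a query."""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": max_results}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]

chunks = retrieve_context("What validation checks does the credit policy require for PD models?")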

Portfolio agent: Text-to-SQL and self-correction

To boost the productivity of credit portfolio teams, we focused on two key areas. For portfolio managers, we prioritized high-level commercial insights. For analysts, we enabled deep-dive data exploration. This approach empowered both roles with rapid understanding and actionable insights, streamlining decision-making processes across teams.

Our solution required natural language understanding of structured portfolio data stored in Amazon Aurora. This led us to base our solution on a text-to-SQL model to efficiently bridge the gap between natural language and SQL.

To reduce errors and tackle complex queries beyond the model’s capabilities, we developed three tools using Anthropic’s Claude model on Amazon Bedrock for self-correction:

  1. Check query tool – Verifies and corrects SQL queries, addressing common issues such as data type mismatches or incorrect function usage
  2. Check result tool – Validates query results, providing relevance and prompting retries or user clarification when needed
  3. Retry from user tool – Engages users for additional information when queries are too broad or lack detail, guiding the interaction based on database information and user input

These tools operate in an agentic system, enabling accurate database interactions and improved query results through iterative refinement and user engagement.
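
To make the self-correction pattern concrete, here is a minimal sketch of a check query tool built on the Amazon Bedrock Converse API. The model ID, prompt wording, and schema handling are assumptions for illustration and not RDC's implementation.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # assumed; use a model enabled in your account

def check_query(sql: str, schema: str) -> str:
    """Ask the model to verify a generated SQL query against the schema and return corrected SQL."""
    prompt = (
        "You are a SQL reviewer. Given this database schema:\n"
        f"{schema}\n\n"
        "Check the following query for data type mismatches and incorrect function usage, "
        f"then return only the corrected SQL:\n{sql}"
    )
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]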

To improve accuracy, we tested model fine-tuning, training the model on common queries and context (such as database schemas and their definitions). This approach reduces inference costs and improves response times compared to prompting at each call. Using Amazon SageMaker JumpStart, we fine-tuned Meta’s Llama model by providing a set of anticipated prompts, intended answers, and associated context. Amazon SageMaker Jumpstart offers a cost-effective alternative to third-party models, providing a viable pathway for future applications. However, we didn’t end up deploying the fine-tuned model because we experimentally observed that prompting with Anthropic’s Claude model provided better generalization, especially for complex questions. To reduce operational overhead, we will also evaluate structured data retrieval on Amazon Bedrock Knowledge Bases.
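
For reference, fine-tuning a publicly available Llama model through SageMaker JumpStart follows a pattern like the sketch below. The model ID, hyperparameter names, instance type, and S3 path are illustrative and vary by model version; this is not the exact configuration used for the text-to-SQL experiments.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Illustrative model ID, instance type, and data location; adjust to your account.
estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b-instruct",
    environment={"accept_eula": "true"},   # accept the model EULA before training
    instance_type="ml.g5.12xlarge",
)

# Training data: JSON Lines files containing anticipated prompts, intended SQL
# answers, and schema context, in the model's instruction-tuning format.
estimator.set_hyperparameters(instruction_tuned="True", epoch="3")
estimator.fit({"training": "s3://your-bucket/text-to-sql-train/"})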

Conclusion and next steps with RDC

To expedite development, RDC collaborated with AWS Startups and the AWS Generative AI Innovation Center. Through an iterative approach, RDC rapidly enhanced its generative AI capabilities, deploying the initial version to production in just 3 months. The solution successfully met the stringent security standards required in regulated banking environments, providing both innovation and compliance.

“The integration of generative AI into our solution marks a pivotal moment in our mission to revolutionize credit decision-making. By empowering both data scientists and portfolio managers with AI assistants, we’re not just improving efficiency—we’re transforming how financial institutions approach lending.”

–Gordon Campbell, Co-Founder & Chief Customer Officer at RDC

RDC envisions generative AI playing a significant role in boosting the productivity of the banking and credit industry. By using this technology, RDC can provide key insights to customers, improve solution adoption, accelerate the model lifecycle, and reduce the customer support burden. Looking ahead, RDC plans to further refine and expand its AI capabilities, exploring new use cases and integrations as the industry evolves.

For more information about how to work with RDC and AWS and to understand how we’re supporting banking customers around the world to use AI in credit decisions, contact your AWS Account Manager or visit Rich Data Co.

For more information about generative AI on AWS, refer to the following resources:


About the Authors

Daniel Wirjo is a Solutions Architect at AWS, focused on FinTech and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Iman Abbasnejad is a computer scientist at the Generative AI Innovation Center at Amazon Web Services (AWS) working on Generative AI and complex multi-agents systems.

Gordon Campbell is the Chief Customer Officer and Co-Founder of RDC, where he leverages over 30 years in enterprise software to drive RDC’s leading AI Decisioning platform for business and commercial lenders. With a proven track record in product strategy and development across three global software firms, Gordon is committed to customer success, advocacy, and advancing financial inclusion through data and AI.

Charles Guan is the Chief Technology Officer and Co-founder of RDC. With more than 20 years of experience in data analytics and enterprise applications, he has driven technological innovation across both the public and private sectors. At RDC, Charles leads research, development, and product advancement—collaborating with universities to leverage advanced analytics and AI. He is dedicated to promoting financial inclusion and delivering positive community impact worldwide.

Hendra Suryanto is the Chief Data Scientist at RDC with more than 20 years of experience in data science, big data, and business intelligence. Before joining RDC, he served as a Lead Data Scientist at KPMG, advising clients globally. At RDC, Hendra designs end-to-end analytics solutions within an Agile DevOps framework. He holds a PhD in Artificial Intelligence and has completed postdoctoral research in machine learning.

Read More

Build agentic AI solutions with DeepSeek-R1, CrewAI, and Amazon SageMaker AI

AI agents are rapidly becoming the next frontier in enterprise transformation, with 82% of organizations planning adoption within the next 3 years. According to a Capgemini survey of 1,100 executives at large enterprises, 10% of organizations already use AI agents, and more than half plan to use them in the next year. The recent release of the DeepSeek-R1 models brings state-of-the-art reasoning capabilities to the open source community. Organizations can build agentic applications using these reasoning models to execute complex tasks with advanced decision-making capabilities, enhancing efficiency and adaptability.

In this post, we dive into how organizations can use Amazon SageMaker AI, a fully managed service that allows you to build, train, and deploy ML models at scale, to build AI agents using CrewAI, a popular agentic framework, and open source models like DeepSeek-R1.

Agentic design vs. traditional software design

Agentic systems offer a fundamentally different approach compared to traditional software, particularly in their ability to handle complex, dynamic, and domain-specific challenges. Unlike traditional systems, which rely on rule-based automation and structured data, agentic systems, powered by large language models (LLMs), can operate autonomously, learn from their environment, and make nuanced, context-aware decisions. This is achieved through modular components including reasoning, memory, cognitive skills, and tools, which enable them to perform intricate tasks and adapt to changing scenarios.

Traditional software platforms, though effective for routine tasks and horizontal scaling, often lack the domain-specific intelligence and flexibility that agentic systems provide. For example, in a manufacturing setting, traditional systems might track inventory but lack the ability to anticipate supply chain disruptions or optimize procurement using real-time market insights. In contrast, an agentic system can process live data such as inventory fluctuations, customer preferences, and environmental factors to proactively adjust strategies and reroute supply chains during disruptions.

Enterprises should strategically consider deploying agentic systems in scenarios where adaptability and domain-specific expertise are critical. For instance, consider customer service. Traditional chatbots are limited to preprogrammed responses to expected customer queries, but AI agents can engage with customers using natural language, offer personalized assistance, and resolve queries more efficiently. AI agents can significantly improve productivity by automating repetitive tasks, such as generating reports, emails, and software code. The deployment of agentic systems should focus on well-defined processes with clear success metrics and where there is potential for greater flexibility and less brittleness in process management.

DeepSeek-R1

In this post, we show you how to deploy DeepSeek-R1 on SageMaker, particularly the Llama-70b distilled variant DeepSeek-R1-Distill-Llama-70B to a SageMaker real-time endpoint. DeepSeek-R1 is an advanced LLM developed by the AI startup DeepSeek. It employs reinforcement learning techniques to enhance its reasoning capabilities, enabling it to perform complex tasks such as mathematical problem-solving and coding. To learn more about DeepSeek-R1, refer to DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart and deep dive into the thesis behind building DeepSeek-R1.

Generative AI on SageMaker AI

SageMaker AI, a fully managed service, provides a comprehensive suite of tools designed to deliver high-performance, cost-efficient machine learning (ML) and generative AI solutions for diverse use cases. SageMaker AI empowers you to build, train, deploy, monitor, and govern ML and generative AI models through an extensive range of services, including notebooks, jobs, hosting, experiment tracking, a curated model hub, and MLOps features, all within a unified integrated development environment (IDE).

SageMaker AI simplifies the process for generative AI model builders of all skill levels to work with foundation models (FMs):

  • Amazon SageMaker Canvas enables data scientists to seamlessly use their own datasets alongside FMs to create applications and architectural patterns, such as chatbots and Retrieval Augmented Generation (RAG), in a low-code or no-code environment.
  • Amazon SageMaker JumpStart offers a diverse selection of open and proprietary FMs from providers like Hugging Face, Meta, and Stability AI. You can deploy or fine-tune models through an intuitive UI or APIs, providing flexibility for all skill levels.
  • SageMaker AI features like notebooks, Amazon SageMaker Training, inference, Amazon SageMaker for MLOps, and Partner AI Apps enable advanced model builders to adapt FMs using LoRA, full fine-tuning, or training from scratch. These services support single GPU to HyperPods (cluster of GPUs) for training and include built-in FMOps tools for tracking, debugging, and deployment.

With SageMaker AI, you can build generative AI-powered agentic workflows using a framework of your choice. Some of the key benefits of using SageMaker AI for fine-tuning and hosting LLMs or FMs include:

  • Ease of deployment – SageMaker AI offers access to SageMaker JumpStart, a curated model hub where models with open weights are made available for seamless deployment through a few clicks or API calls. Additionally, for Hugging Face Hub models, SageMaker AI provides pre-optimized containers built on popular open source hosting frameworks such as vLLM, NVIDIA Triton, and Hugging Face Text Generation Inference (TGI). You simply need to specify the model ID, and the model can be deployed quickly.
  • Instance-based deterministic pricing – SageMaker AI hosted models are billed based on instance-hours rather than token usage. This pricing model enables you to more accurately predict and manage generative AI inference costs while scaling resources to accommodate incoming request loads.
  • Deployments with quantization – SageMaker AI enables you to optimize models prior to deployment using advanced strategies such as quantized deployments (such as AWQ, GPTQ, float16, int8, or int4). This flexibility allows you to efficiently deploy large models, such as a 32-billion parameter model, onto smaller instance types like ml.g5.2xlarge with 24 GB of GPU memory, significantly reducing resource requirements while maintaining performance.
  • Inference load balancing and optimized routing – SageMaker endpoints support load balancing and optimized routing with various strategies, providing users with enhanced flexibility and adaptability to accommodate diverse use cases effectively.
  • SageMaker fine-tuning recipes – SageMaker offers ready-to-use recipes for quickly training and fine-tuning publicly available FMs such as Meta’s Llama 3, Mistral, and Mixtral. These recipes use Amazon SageMaker HyperPod (a SageMaker AI service that provides resilient, self-healing clusters optimized for large-scale ML workloads), enabling efficient and resilient training on a GPU cluster for scalable and robust performance.

Solution overview

CrewAI provides a robust framework for developing multi-agent systems that integrate with AWS services, particularly SageMaker AI. CrewAI’s role-based agent architecture and comprehensive performance monitoring capabilities work in tandem with Amazon CloudWatch.

The framework excels in workflow orchestration and maintains enterprise-grade security standards aligned with AWS best practices, making it an effective solution for organizations implementing sophisticated agent-based systems within their AWS infrastructure.

In this post, we demonstrate how to use CrewAI to create a multi-agent research workflow. This workflow creates two agents: a researcher agent that searches the internet for information on a topic, and a writer agent that takes this research and acts as an editor, formatting it into a readable form. Additionally, we guide you through deploying and integrating one or multiple LLMs into structured workflows, using tools for automated actions, and deploying these workflows on SageMaker AI for a production-ready deployment.

The following diagram illustrates the solution architecture.

Prerequisites

To follow along with the code examples in the rest of this post, make sure the following prerequisites are met:

  • Integrated development environment – This includes the following:
    • (Optional) Access to Amazon SageMaker Studio and the JupyterLab IDE – We will use a Python runtime environment to build agentic workflows and deploy LLMs. Having access to a JupyterLab IDE with Python 3.9, 3.10, or 3.11 runtimes is recommended. You can also set up Amazon SageMaker Studio for single users. For more details, see Use quick setup for Amazon SageMaker AI. Create a new SageMaker JupyterLab Space for a quick JupyterLab notebook for experimentation. To learn more, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools.
    • Local IDE – You can also follow along in your local IDE (such as PyCharm or VSCode), provided that your Python runtimes have been configured for connectivity from your site to your AWS VPC (to deploy models on SageMaker AI).
  • Permission to deploy models – Make sure that your user execution role has the necessary permissions to deploy models to a SageMaker real-time endpoint for inference. For more information, refer to Deploy models for inference.
  • Access to Hugging Face Hub – You must have access to Hugging Face Hub’s deepseek-ai/DeepSeek-R1-Distill-Llama-8B model weights from your environment.
  • Access to code – The code used in this post is available in the following GitHub repo.

Simplified LLM hosting on SageMaker AI

Before orchestrating agentic workflows with CrewAI powered by an LLM, the first step is to host and query an LLM using SageMaker real-time inference endpoints. There are two primary methods to host LLMs on SageMaker AI:

  • Deploy from SageMaker JumpStart
  • Deploy from Hugging Face Hub

Deploy DeepSeek from SageMaker JumpStart

SageMaker JumpStart offers access to a diverse array of state-of-the-art FMs for a wide range of tasks, including content writing, code generation, question answering, copywriting, summarization, classification, information retrieval, and more. It simplifies the onboarding and maintenance of publicly available FMs, allowing you to access, customize, and seamlessly integrate them into your ML workflows. Additionally, SageMaker JumpStart provides solution templates that configure infrastructure for common use cases, along with executable example notebooks to streamline ML development with SageMaker AI.

The following screenshot shows an example of available models on SageMaker JumpStart.

To get started, complete the following steps:

  1. Install the latest version of the sagemaker-python-sdk using pip.
  2. Run the following command in a Jupyter cell or the SageMaker Studio terminal:
pip install -U sagemaker
  3. List all available LLMs under the Hugging Face, Meta, or DeepSeek JumpStart hubs. The following code is an example of how to do this programmatically using the SageMaker Python SDK:
from sagemaker.jumpstart.filters import (And, Or)
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# generate a conditional filter to only select LLMs from Hugging Face, Meta, or DeepSeek
filter_value = Or(
    And("task == llm", "framework == huggingface"),
    "framework == meta", "framework == deepseek"
)

# Retrieve all available JumpStart models
all_models = list_jumpstart_models(filter=filter_value)

For example, deploying the deepseek-llm-r1 model directly from SageMaker JumpStart requires only a few lines of code:

from sagemaker.jumpstart.model import JumpStartModel

model_id = "deepseek-llm-r1" 
model_version = "*"

# instantiate a new JumpStart model
model = JumpStartModel(
    model_id=model_id, 
    model_version=model_version
)

# deploy model on a 1 x p5e instance 
predictor = model.deploy(
    accept_eula=True, 
    initial_instance_count=1, 
    # endpoint_name="deepseek-r1-endpoint" # optional endpoint name
)

We recommend deploying your SageMaker endpoints within a VPC and a private subnet with no egress, making sure that the models remain accessible only within your VPC for enhanced security.

We also recommend you integrate with Amazon Bedrock Guardrails for increased safeguards against harmful content. For more details on how to implement Amazon Bedrock Guardrails on a self-hosted LLM, see Implement model-independent safety measures with Amazon Bedrock Guardrails.

Deploy DeepSeek from Hugging Face Hub

Alternatively, you can deploy your preferred model directly from the Hugging Face Hub or the Hugging Face Open LLM Leaderboard to a SageMaker endpoint. Hugging Face LLMs can be hosted on SageMaker using a variety of supported frameworks, such as NVIDIA Triton, vLLM, and Hugging Face TGI. For a comprehensive list of supported deep learning container images, refer to the available Amazon SageMaker Deep Learning Containers. In this post, we create a DeepSeek-R1-Distill-Llama-70B SageMaker endpoint using the TGI container for agentic AI inference. We deploy the model from the Hugging Face Hub using Amazon's optimized TGI container, which provides enhanced performance for LLMs. This container is specifically optimized for text generation tasks and automatically selects the most performant parameters for the given hardware configuration. To deploy from Hugging Face Hub, refer to the GitHub repo or the following code snippet:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# SageMaker session and execution role (adjust if running outside SageMaker Studio)
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Deployment configuration (set these values for your environment)
number_of_gpu = 8                                   # ml.p4d.24xlarge provides 8 GPUs
custom_endpoint_name = "deepseek-r1-dist-llama70b"  # choose your own endpoint name
HUGGING_FACE_HUB_TOKEN = "hf_..."                   # your Hugging Face token (required for gated models)

# Model configuration
hub = {
    'HF_MODEL_ID': 'deepseek-ai/DeepSeek-R1-Distill-Llama-70B',  # or another Hub model ID
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'HF_TOKEN': HUGGING_FACE_HUB_TOKEN,
    'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',  # set to INFO level
    'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True'  # use expandable CUDA memory segments
}

# Create and deploy the model using the Hugging Face TGI container
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.3.1"),
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",
    endpoint_name=custom_endpoint_name,
    container_startup_health_check_timeout=900
)

A new DeepSeek-R1-Distill-Llama-70B endpoint should be InService in under 10 minutes. If you want to change the model from DeepSeek to another model from the hub, simply replace the following parameter or refer to the DeepSeek deploy example in the following GitHub repo. To learn more about deployment parameters that can be reconfigured inside TGI containers at runtime, refer to the following GitHub repo on TGI arguments.

...
"HF_MODEL_ID": "deepseek-ai/...", # replace with any HF hub models
# "HF_TOKEN": "hf_..." # add your token id for gated models
...
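
You can confirm that the deployment has finished by waiting on the endpoint status with the SageMaker API (a small sketch, using the endpoint name supplied at deployment):

import boto3

sagemaker_client = boto3.client("sagemaker")

# Block until the endpoint reaches InService (typically under 10 minutes)
sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=custom_endpoint_name)
print(sagemaker_client.describe_endpoint(EndpointName=custom_endpoint_name)["EndpointStatus"])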

For open-weight models deployed directly from hubs, we strongly recommend placing your SageMaker endpoints within a VPC and a private subnet with no egress, making sure that the models remain accessible only within your VPC for a secure deployment.

Build a simple agent with CrewAI

CrewAI makes it possible to create multi-agent and more complex agentic orchestrations using LLMs from several providers, including SageMaker AI and Amazon Bedrock. In the following steps, we create a simple blocks counting agent to serve as an example.

Create a blocks counting agent

The following code sets up a simple blocks counter workflow using CrewAI with two main components:

  • Agent creation (blocks_counter_agent) – The agent is configured with a specific role, goal, and capabilities. This agent is equipped with a tool called BlocksCounterTool.
  • Task definition (count_task) – This is the task that we want the agent to execute. The task includes a template for counting how many blocks of each color are present, where {color} will be replaced with the actual color of the block. The task is assigned to blocks_counter_agent.
from crewai import Agent, Task
from pydantic import BaseModel, Field

# 1. Configure agent
blocks_counter_agent = Agent(
    role="Blocks Inventory Manager",
    goal="Maintain accurate block counts",
    tools=[BlocksCounterTool],
    verbose=True
)

# 2. Create counting task
count_task = Task(
    description="Count {color} play blocks in storage",
    expected_output="Exact inventory count for specified color",
    agent=blocks_counter_agent
)

As you can see in the preceding code, each agent begins with two essential components: an agent definition that establishes the agent’s core characteristics (including its role, goal, backstory, available tools, LLM model endpoint, and so on), and a task definition that specifies what the agent needs to accomplish, including the detailed description of work, expected outputs, and the tools it can use during execution.

This structured approach makes sure that agents have both a clear identity and purpose (through the agent definition) and a well-defined scope of work (through the task definition), enabling them to operate effectively within their designated responsibilities.
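
To see this pattern end to end, the agent and task can be assembled into a minimal crew and run. The following sketch assumes the BlocksCounterTool defined in the next section and a default LLM configured for the agent:

from crewai import Crew, Process

# Assemble the agent and task into a crew and run it sequentially
blocks_crew = Crew(
    agents=[blocks_counter_agent],
    tasks=[count_task],
    process=Process.sequential
)

result = blocks_crew.kickoff(inputs={"color": "red"})
print(result)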

Tools for agentic AI

Tools are special functions that give AI agents the ability to perform specific actions, like searching the internet or analyzing data. Think of them as apps on a smartphone—each tool serves a specific purpose and extends what the agent can do. In our example, BlocksCounterTool helps the agent count the number of blocks organized by color.

Tools are essential because they let agents do real-world tasks instead of just thinking about them. Without tools, agents would be like smart speakers that can only talk—they could process information but couldn’t take actual actions. By adding tools, we transform agents from simple chat programs into practical assistants that can accomplish real tasks.

Out-of-the-box tools with CrewAI
CrewAI offers a range of tools out of the box for you to use along with your agents and tasks. The following table lists some of the available tools.

Category | Tool | Description
Data Processing Tools | FileReadTool | For reading various file formats
Web Interaction Tools | WebsiteSearchTool | For web content extraction
Media Tools | YoutubeChannelSearchTool | For searching YouTube channels
Document Processing | PDFSearchTool | For searching PDF documents
Development Tools | CodeInterpreterTool | For Python code interpretation
AI Services | DALL-E Tool | For image generation
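
Out-of-the-box tools attach to an agent the same way as custom tools. The following sketch assumes the crewai-tools package is installed and uses FileReadTool as an example; the agent's role, goal, and backstory are illustrative:

from crewai import Agent
from crewai_tools import FileReadTool

file_tool = FileReadTool()  # lets the agent read local files

report_agent = Agent(
    role="Report Reader",
    goal="Summarize the contents of local report files",
    backstory="An assistant that reviews files and produces concise summaries.",
    tools=[file_tool],
    verbose=True
)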

Build custom tools with CrewAI
You can build custom tools in CrewAI in two ways: by subclassing BaseTool or using the @tool decorator. Let’s look at the following BaseTool subclassing option to create the BlocksCounterTool we used earlier:

from crewai.tools import BaseTool

class BlocksCounterTool(BaseTool):
    name: str = "blocks_counter"
    description: str = "Simple tool to count play blocks"

    def _run(self, color: str) -> str:
        return f"There are 10 {color} play blocks available"

Build a multi-agent workflow with CrewAI, DeepSeek-R1, and SageMaker AI

Multi-agent AI systems represent a powerful approach to complex problem-solving, where specialized AI agents work together under coordinated supervision. By combining CrewAI’s workflow orchestration capabilities with SageMaker AI based LLMs, developers can create sophisticated systems where multiple agents collaborate efficiently toward a specific goal. The code used in this post is available in the following GitHub repo.

Let’s build a research agent and writer agent that work together to create a PDF about a topic. We will use a DeepSeek-R1 Distilled Llama 3.3 70B model as a SageMaker endpoint for the LLM inference.

Define your own DeepSeek SageMaker LLM (using LLM base class)

The following code integrates SageMaker hosted LLMs with CrewAI through a custom inference function. The function formats prompts with system instructions for factual responses, uses Boto3 (the core AWS SDK for Python) to call the SageMaker endpoint, and processes the response by separating the model's reasoning (before </think>) from its final answer. This enables CrewAI agents to use deployed models while maintaining structured output patterns.

# Calls SageMaker endpoint for DeepSeek inference
def deepseek_llama_inference(prompt: dict, endpoint_name: str, region: str = "us-east-2") -> dict:
    try:
        # ... Response parsing Code...

    except Exception as e:
        raise RuntimeError(f"Error while calling SageMaker endpoint: {e}")

# CrewAI-compatible LLM implementation for DeepSeek models on SageMaker.
class DeepSeekSageMakerLLM(LLM):
    def __init__(self, endpoint: str):
        # <... Initialize LLM with SageMaker endpoint ...>

    def call(self, prompt: Union[List[Dict[str, str]], str], **kwargs) -> str:
        # <... Format and return the final response ...>
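
The elided inference logic might look roughly like the following sketch. It assumes the TGI response shape (a JSON list with a generated_text field) and that the model emits its reasoning before a </think> marker; adapt the payload construction and parsing to your deployment:

import json
import boto3

def deepseek_llama_inference(prompt: dict, endpoint_name: str, region: str = "us-east-2") -> dict:
    try:
        runtime = boto3.client("sagemaker-runtime", region_name=region)

        # Prepend a simple system instruction to encourage factual answers
        text = prompt.get("inputs", "") if isinstance(prompt, dict) else str(prompt)
        payload = {
            "inputs": f"You are a factual assistant. Answer concisely.\n\n{text}",
            "parameters": {"max_new_tokens": 512, "temperature": 0.6},
        }

        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        body = json.loads(response["Body"].read())
        generated = body[0]["generated_text"] if isinstance(body, list) else body["generated_text"]

        # Separate the reasoning (before </think>) from the final answer
        if "</think>" in generated:
            reasoning, answer = generated.split("</think>", 1)
        else:
            reasoning, answer = "", generated
        return {"reasoning": reasoning.strip(), "final_answer": answer.strip()}

    except Exception as e:
        raise RuntimeError(f"Error while calling SageMaker endpoint: {e}")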

Name the DeepSeek-R1 Distilled endpoint
Set the endpoint name as defined earlier when you deployed DeepSeek from the Hugging Face Hub:

deepseek_endpoint = "deepseek-r1-dist-v3-llama70b-2025-01-22"

Create a DeepSeek inference tool
Just like we created the BlocksCounterTool earlier, let's create a tool that uses the DeepSeek endpoint for our agents. We use the same BaseTool subclassing approach here, but wrap it in the CustomTool class implemented in sage_tools.py in the tools folder. For more information, refer to the GitHub repo.

from crewai import Crew, Agent, Task, Process 

# Create the Tool for LLaMA inference
deepseek_tool = CustomTool(
    name="deepseek_llama_3.3_70B",
    func=lambda inputs: deepseek_llama_inference(
        prompt=inputs,
        endpoint_name=deepseek_endpoint
    ),
    description="A tool to generate text using the DeepSeek LLaMA model deployed on SageMaker."
)

Create a research agent
Just like the simple blocks agent we defined earlier, we follow the same template here to define the research agent. The difference here is that we give more capabilities to this agent. We attach a SageMaker AI based DeepSeek-R1 model as an endpoint for the LLM.

This helps the research agent think critically about information processing by combining the scalable infrastructure of SageMaker with DeepSeek-R1’s advanced reasoning capabilities.

The agent uses the SageMaker hosted LLM to analyze patterns in research data, evaluate source credibility, and synthesize insights from multiple inputs. By using the deepseek_tool, the agent can dynamically adjust its research strategy based on intermediate findings, validate hypotheses through iterative questioning, and maintain context awareness across complex information it gathers.

# Research Agent

research_agent = Agent(
    role="Research Bot",
    goal="Scan sources, extract relevant information, and compile a research summary.",
    backstory="An AI agent skilled in finding relevant information from a variety of sources.",
    tools=[deepseek_tool],
    allow_delegation=True,
    llm=DeepSeekSageMakerLLM(endpoint=deepseek_endpoint),
    verbose=False
)

Create a writer agent
The writer agent is configured as a specialized content editor that takes research data and transforms it into polished content. This agent works as part of a workflow where it takes research from a research agent and acts like an editor by formatting the content into a readable format. The agent is used for writing and formatting, and unlike the research agent, it doesn’t delegate tasks to other agents.

writer_agent = Agent(
    role="Writer Bot",
    goal="Receive research summaries and transform them into structured content.",
    backstory="A talented writer bot capable of producing high-quality, structured content based on research.",
    tools=[deepseek_tool],
    allow_delegation=False,
    llm=DeepSeekSageMakerLLM(endpoint=deepseek_endpoint),
    verbose=False
)

Define tasks for the agents
Tasks in CrewAI define specific operations that agents need to perform. In this example, we have two tasks: a research task that processes queries and gathers information, and a writing task that transforms research data into polished content.

Each task includes a clear description of what needs to be done, the expected output format, and specifies which agent will perform the work. This structured approach makes sure that agents have well-defined responsibilities and clear deliverables.

Together, these tasks create a workflow where one agent researches a topic on the internet, and another agent takes this research and formats it into readable content. The tasks are integrated with the DeepSeek tool for advanced language processing capabilities, enabling a production-ready deployment on SageMaker AI.

research_task = Task(
    description=(
        "Your task is to conduct research based on the following query: {prompt}.\n"
    ),
    expected_output="A comprehensive research summary based on the provided query.",
    agent=research_agent,
    tools=[deepseek_tool]
)

writing_task = Task(
    description=(
        "Your task is to create structured content based on the research provided.\n"
    ),
    expected_output="A well-structured article based on the research summary.",
    agent=writer_agent,
    tools=[deepseek_tool]
)

Define a crew in CrewAI
A crew in CrewAI represents a collaborative group of agents working together to achieve a set of tasks. Each crew defines the strategy for task execution, agent collaboration, and the overall workflow. In this specific example, the sequential process makes sure tasks are executed one after the other, following a linear progression. There are other more complex orchestrations of agents working together, which we will discuss in future blog posts.

This approach is ideal for projects requiring tasks to be completed in a specific order. The workflow creates two agents: a research agent and a writer agent. The research agent researches a topic on the internet, then the writer agent takes this research and acts like an editor by formatting it into a readable format.

Let’s call the crew scribble_bots:

# Define the Crew for the sequential workflow

scribble_bots = Crew(
    agents=[research_agent, writer_agent],
    tasks=[research_task, writing_task],
    process=Process.sequential  # Ensure tasks execute in sequence
)

Use the crew to run a task
We have our endpoint deployed, agents created, and crew defined. Now we’re ready to use the crew to get some work done. Let’s use the following prompt:

result = scribble_bots.kickoff(inputs={"prompt": "What is DeepSeek?"})

Our result is as follows:

**DeepSeek: Pioneering AI Solutions for a Smarter Tomorrow**

In the rapidly evolving landscape of artificial intelligence, 
DeepSeek stands out as a beacon of innovation and practical application. 
As an AI company, DeepSeek is dedicated to advancing the field through cutting-edge research and real-world applications, 
making AI accessible and beneficial across various industries.

**Focus on AI Research and Development**

………………….. ………………….. ………………….. …………………..

Clean up

Complete the following steps to clean up your resources:

  1. Delete your DeepSeek-R1 GPU endpoint:
import boto3

# Create a low-level SageMaker service client.
sagemaker_client = boto3.client('sagemaker', region_name=<region>)

# Delete endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
  1. If you’re using a SageMaker Studio JupyterLab notebook, shut down the JupyterLab notebook instance.

Conclusion

In this post, we demonstrated how you can deploy an LLM such as DeepSeek-R1—or another FM of your choice—from popular model hubs like SageMaker JumpStart or Hugging Face Hub to SageMaker AI for real-time inference. We explored inference frameworks such as Hugging Face TGI, which help streamline deployment while integrating built-in performance optimizations to minimize latency and maximize throughput. Additionally, we showcased how the developer-friendly SageMaker Python SDK simplifies endpoint orchestration, allowing seamless experimentation and scaling of LLM-powered applications.

Beyond deployment, this post provided an in-depth exploration of agentic AI, guiding you through its conceptual foundations, practical design principles using CrewAI, and the seamless integration of state-of-the-art LLMs like DeepSeek-R1 as the intelligent backbone of an autonomous agentic workflow. We outlined a sequential CrewAI workflow design, illustrating how to equip LLM-powered agents with specialized tools that enable autonomous data retrieval, real-time processing, and interaction with complex external systems.

Now, it’s your turn to experiment! Dive into our publicly available code on GitHub, and start building your own DeepSeek-R1-powered agentic AI system on SageMaker. Unlock the next frontier of AI-driven automation—seamlessly scalable, intelligent, and production-ready.

Special thanks to Giuseppe Zappia, Poli Rao, and Siamak Nariman for their support with this blog post.


About the Authors

Surya Kari is a Senior Generative AI Data Scientist at AWS, specializing in developing solutions leveraging state-of-the-art foundation models. He has extensive experience working with advanced language models including DeepSeek-R1, the LLama family, and Qwen, focusing on their fine-tuning and optimization for specific scientific applications. His expertise extends to implementing efficient training pipelines and deployment strategies using AWS SageMaker, enabling the scaling of foundation models from development to production. He collaborates with customers to design and implement generative AI solutions, helping them navigate model selection, fine-tuning approaches, and deployment strategies to achieve optimal performance for their specific use cases.

Bobby Lindsey is a Machine Learning Specialist at Amazon Web Services. He’s been in technology for over a decade, spanning various technologies and multiple roles. He is currently focused on combining his background in software engineering, DevOps, and machine learning to help customers deliver machine learning workflows at scale. In his spare time, he enjoys reading, research, hiking, biking, and trail running.

Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundation model (FM) providers to develop and execute joint go-to-market strategies, enabling customers to effectively train, deploy, and scale FMs to solve industry-specific challenges. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University and a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA candidate at the Haas School of Business at the University of California, Berkeley.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Read More

Automate bulk image editing with Crop.photo and Amazon Rekognition

Automate bulk image editing with Crop.photo and Amazon Rekognition

Evolphin Software, Inc. is a leading provider of digital and media asset management solutions based in Silicon Valley, California. Crop.photo from Evolphin Software is a cloud-based service that offers powerful bulk processing tools for automating image cropping, content resizing, background removal, and listing image analysis.

Crop.photo is tailored for high-end retailers, ecommerce platforms, and sports organizations. The solution has created a unique offering for bulk image editing through its advanced AI-driven solutions. In this post, we explore how Crop.photo uses Amazon Rekognition to provide sophisticated image analysis, enabling automated and precise editing of large volumes of images. This integration streamlines the image editing process for clients, providing speed and accuracy, which is crucial in the fast-paced environments of ecommerce and sports.

Automation: The way out of bulk image editing challenges

Bulk image editing isn’t just about handling a high volume of images, it’s about delivering flawless results with speed at scale. Large retail brands, marketplaces, and sports industries process thousands of images weekly. Each image must be catalog-ready or broadcast-worthy in minutes, not hours.

The challenge lies not just in the quantity but in maintaining high-quality images and brand integrity. Speed and accuracy are non-negotiable. Retailers and sports organizations expect rapid turnaround without compromising image integrity.

This is where Crop.photo’s smart automations come in with an innovative solution for high-volume image processing needs. The platform’s advanced AI algorithms can automatically detect subjects of interest, crop the images, and optimize thousands of images simultaneously while providing consistent quality and brand compliance. By automating repetitive editing tasks, Crop.photo enables enterprises to reduce image processing time from hours to minutes, allowing creative teams to focus on higher-value activities.

Challenges in the ecommerce industry

The ecommerce industry often encounters the following challenges:

  • Inefficiencies and delays in manual image editing – Ecommerce companies rely on manual editing for tasks like resizing, alignment, and background removal. This process can be time-consuming and prone to delays and inconsistencies. A more efficient solution is needed to streamline the editing process, especially during platform migrations or large updates.
  • Maintaining uniformity across diverse image types – Companies work with a variety of image types, from lifestyle shots to product close-ups, across different categories. Maintaining uniformity and professionalism in all image types is essential to meet the diverse needs of marketing, product cataloging, and overall brand presentation.
  • Large-scale migration and platform transition – Transitioning to a new ecommerce platform involves migrating thousands of images, which presents significant logistical challenges. Providing consistency and quality across a diverse range of images during such a large-scale migration is crucial for maintaining brand standards and a seamless user experience.

For a top US retailer, wholesale distribution channels posed a unique challenge: thousands of fashion images must be prepared for the marketplace with less than a day's notice for flash sales. Their director of creative operations said,

“Crop.photo is an essential part of our ecommerce fashion marketplace workflow. With over 3,000 on-model product images to bulk crop each month, we rely on Crop.photo to enable our wholesale team to quickly publish new products on popular online marketplaces such as Macy’s, Nordstrom, and Bloomingdales. By increasing our retouching team’s productivity by over 70%, Crop.photo has been a game changer for us. Bulk crop images used to take days can now be done in a matter of seconds!”

Challenges in the sports industry

The sports industry often contends with the following challenges:

  • Bulk player headshot volume and consistency – Sports organizations face the challenge of bulk cropping and resizing hundreds of player headshots for numerous teams, frequently on short notice. Maintaining consistency and quality across a large volume of images can be difficult without AI.
  • Diverse player facial features – Players have varying facial features, such as different hair lengths, forehead sizes, and face dimensions. Adapting cropping processes to accommodate these differences traditionally requires manual adjustments for each image, which leads to inconsistencies and significant time investment.
  • Editorial time constraints – Tight editorial schedules and resource limitations are common in sports organizations. The time-consuming nature of manual cropping tasks strains editorial teams, particularly during high-volume periods like tournaments, where delays and rushed work can impact quality and timing.

An Imaging Manager at Europe’s Premier Football Organization expressed,

“We recently found ourselves with 40 images from a top flight English premier league club needing to be edited just 2 hours before kick-off. Using the Bulk AI headshot cropping for sports feature from Crop.photo, we had perfectly cropped headshots of the squad in just 5 minutes, making them ready for publishing in our website CMS just in time. We would never have met this deadline using manual processes. This level of speed was unthinkable before, and it’s why we’re actively recommending Crop.photo to other sports leagues.”

Solution overview

Crop.photo uses Amazon Rekognition to power a robust solution for bulk image editing. Amazon Rekognition offers features like object and scene detection, facial analysis, and image labeling, which they use to generate markers that drive a fully automated image editing workflow.

The following diagram presents a high-level architectural data flow highlighting several of the AWS services used in building the solution.

Architecture diagram showing the end-to-end workflow for Crop.photo’s automated bulk image editing using AWS services.

The solution consists of the following key components:

  • User authentication – Amazon Cognito is used for user authentication and user management.
  • Infrastructure deployment – Frontend and backend servers run on Amazon Elastic Container Service (Amazon ECS) for container deployment, orchestration, and scaling.
  • Content delivery and caching – Amazon CloudFront is used to cache content, improving performance and routing traffic efficiently.
  • File uploads – Amazon Simple Storage Service (Amazon S3) enables transfer acceleration for fast, direct uploads to Amazon S3.
  • Media and job storage – Information about uploaded files and job execution is stored in Amazon Aurora.
  • Image processing – AWS Batch processes thousands of images in bulk.
  • Job management – Amazon Simple Queue Service (Amazon SQS) manages and queues jobs for processing, making sure they're run in the correct order by AWS Batch.
  • Media analysis – Amazon Rekognition services analyze media files, including:
    • Face Analysis to generate headshot crops.
    • Moderation to detect and flag profanity and explicit content.
    • Label Detection to provide context for image processing and focus on relevant objects.
    • Custom Labels to identify and verify brand logos and adhere to brand guidelines.
  • Asynchronous job notifications – Amazon Simple Notification Service (Amazon SNS), Amazon EventBridge, and Amazon SQS deliver asynchronous job completion notifications, manage events, and provide reliable and scalable processing.

Amazon Rekognition is an AWS computer vision service that powers Crop.photo’s automated image analysis. It enables object detection, facial recognition, and content moderation capabilities:

  • Face detection – The Amazon Rekognition face detection feature automatically identifies and analyzes faces in product images. You can use this feature for face-based cropping and optimization through adjustable bounding boxes in the interface.
  • Image color analysis – The color analysis feature examines image composition, identifying dominant colors and balance. This integrates with Crop.photo’s brand guidelines checker to provide consistency across product images.
  • Object detection – Object detection automatically identifies key elements in images, enabling smart cropping suggestions. The interface highlights detected objects, allowing you to prioritize specific elements during cropping.
  • Custom label detection – Custom label detection recognizes brand-specific items and assets. Companies can train models for their unique needs, automatically applying brand-specific cropping rules to maintain consistency.
  • Text detection (OCR) – The OCR capabilities of Amazon Rekognition detect and preserve text within images during editing. The system highlights text areas to make sure critical product information remains legible after cropping.
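
As an illustration (not Crop.photo's internal implementation), the face detection that drives headshot cropping can be called directly with the AWS SDK; the bucket and object names below are placeholders:

import boto3

rekognition = boto3.client("rekognition")

# Detect faces in an image stored in Amazon S3
response = rekognition.detect_faces(
    Image={"S3Object": {"Bucket": "my-image-bucket", "Name": "player-headshot.jpg"}},
    Attributes=["DEFAULT"],
)

for face in response["FaceDetails"]:
    box = face["BoundingBox"]  # relative coordinates that can drive a crop
    print(f"Face at left={box['Left']:.2f}, top={box['Top']:.2f}, "
          f"width={box['Width']:.2f}, height={box['Height']:.2f}")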

Within the Crop.photo interface, users can upload videos through the standard interface, and the speech-to-text functionality will automatically transcribe any audio content. This transcribed text can then be used to enrich the metadata and descriptions associated with the product images or videos, improving searchability and accessibility for customers. Additionally, the brand guidelines check feature can be applied to the transcribed text, making sure that the written content aligns with the company’s branding and communication style.

The Crop.photo service follows a transparent pricing model that combines unlimited automations with a flexible image credit system. Users have unrestricted access to create and run as many automation workflows as needed, without any additional charges. The service includes a range of features at no extra cost, such as basic image operations, storage, and behind-the-scenes processing.

For advanced AI-powered image processing tasks, like smart cropping or background removal, users consume image credits. The number of credits required for each operation is clearly specified, allowing users to understand the costs upfront. Crop.photo offers several subscription plans with varying image credit allowances, enabling users to choose the plan that best fits their needs.

Results: Improved speed and precision

The automated image editing capabilities of Crop.photo, integrated with Amazon Rekognition, have increased editing speed, delivering 70% faster image retouching for ecommerce. With a 75% reduction in manual work, the turnaround time for new product images is reduced from 2–3 days to just 1 hour. Similarly, the bulk image editing process has been streamlined, allowing over 100,000 image collections to be processed per day using AWS Fargate. Advanced AI-powered image analysis and editing features provide consistent, high-quality images at scale, eliminating the need for manual review and approval of thousands of product images.

For instance, in the ecommerce industry, this integration facilitates automatic product detection and precise cropping, making sure every image meets specific marketplace and brand standards. In sports, it enables quick identification and cropping of player facial features, including head, eyes, and mouth, adapting to varying backgrounds and maintaining brand consistency.

The following images are before and after pictures for an ecommerce use case.

For a famous wine retailer in the United Kingdom, the integration of Amazon Rekognition with Crop.photo streamlined the processing of over 1,700 product images, achieving a 95% reduction in bulk image editing time, a testament to the efficiency of AI-powered enhancement.

Similarly, a top 10 global specialty retailer experienced a transformative impact on their ecommerce fashion marketplace workflow. By automating the cropping of over 3,000 on-model product images monthly, they boosted their retouching team’s productivity by over 70%, maintaining compliance with the varied image standards of multiple online marketplaces.

Conclusion

These case studies illustrate the tangible benefits of integrating Crop.photo with Amazon Rekognition, demonstrating how automation and AI can revolutionize the bulk image editing landscape for ecommerce and sports industries.

Crop.photo, from AWS Partner Evolphin Software, offers powerful bulk processing tools for automating image cropping, content resizing, and listing image analysis, using advanced AI-driven solutions. Crop.photo is tailored for high-end retailers, ecommerce platforms, and sports organizations. Its integration with Amazon Rekognition aims to streamline the image editing process for clients, providing speed and accuracy in the high-stakes environment of ecommerce and sports. Crop.photo plans additional AI capabilities with Amazon Bedrock generative AI frameworks to adapt to emerging digital imaging trends, so it remains an indispensable tool for its clients.

To learn more about Evolphin Software and Crop.photo, visit their website.

To learn more about Amazon Rekognition, refer to the Amazon Rekognition Developer Guide.


About the Authors

Rahul Bhargava, founder & CTO of Evolphin Software and Crop.photo, is reshaping how brands produce and manage visual content at scale. Through Crop.photo’s AI-powered tools, global names like Lacoste and Urban Outfitters, as well as ambitious Shopify retailers, are rethinking their creative production workflows. By leveraging cutting-edge Generative AI, he’s enabling brands of all sizes to scale their content creation efficiently while maintaining brand consistency.

Vaishnavi Ganesan is a Solutions Architect specializing in Cloud Security at AWS based in the San Francisco Bay Area. As a trusted technical advisor, Vaishnavi helps customers to design secure, scalable and innovative cloud solutions that drive both business value and technical excellence. Outside of work, Vaishnavi enjoys traveling and exploring different artisan coffee roasters.

John Powers is an Account Manager at AWS, who provides guidance to Evolphin Software and other organizations to help accelerate business outcomes leveraging AWS Technologies. John has a degree in Business Administration and Management with a concentration in Finance from Gonzaga University, and enjoys snowboarding in the Sierras in his free time.

Read More

Revolutionizing business processes with Amazon Bedrock and Appian’s generative AI skills

Revolutionizing business processes with Amazon Bedrock and Appian’s generative AI skills

This blog post is co-written with Louis Prensky and Philip Kang from Appian. 

The digital transformation wave has compelled enterprises to seek innovative solutions to streamline operations, enhance efficiency, and maintain a competitive edge. As business processes grow more complex and the demand for automation increases, integrating generative AI skills into business environments has become essential. This strategic move addresses key challenges such as managing vast amounts of unstructured data, adhering to regulatory compliance, and automating repetitive tasks to boost productivity. Using robust infrastructure and advanced language models, these AI-driven tools enhance decision-making by providing valuable insights, improve operational efficiency by automating routine tasks, and help with data privacy through built-in detection and management of sensitive information. For enterprises, this means achieving higher levels of operational excellence, significant cost savings, and scalable solutions that adapt to business growth. For customers, it translates to improved service quality, enhanced data protection, and a more dynamic, responsive service, ultimately driving better experiences and satisfaction.

Appian has led the charge by offering generative AI skills powered by a collaboration with Amazon Bedrock and Anthropic’s Claude large language models (LLMs). This partnership allows organizations to:

  • Enhance decision making with valuable insights
  • Improve operational efficiency by automating tasks
  • Help protect data privacy through built-in detection and management of sensitive information
  • Maintain compliance with HIPAA- and FedRAMP-compliant AI skills

Critically, by placing AI in the context of a wider environment, organizations can operationalize AI in processes that seamlessly integrate with existing software, pass work between digital workers and humans, and help achieve strong security and compliance.

Background

Appian, an AWS Partner with competencies in financial services, healthcare, and life sciences, is a leading provider of low-code automation software to streamline and optimize complex business processes for enterprises. The Appian AI Process Platform includes everything you need to design, automate, and optimize even the most complex processes, from start to finish. The world’s most innovative organizations trust Appian to improve their workflows, unify data, and optimize operations—resulting in accelerated growth and superior customer experiences.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Appian uses the robust infrastructure of Amazon Bedrock and Anthropic’s Claude LLMs to offer fully integrated, pre-built generative AI skills that help developers enhance and automate business processes using low-code development. These use case-driven tools automate common tasks in business processes, making AI-powered applications faster and easier to develop.

This blog post will cover how Appian AI skills build automation into organizations’ mission-critical processes to improve operational excellence, reduce costs, and build scalable solutions. Additionally, we’ll cover real-world examples of processes such as:

  • A mortgage lender that used AI-driven data extraction to reduce mortgage processing times from 16 weeks to 10 weeks.
  • A financial services company that achieved a four-fold reduction in data extraction time from trade-related emails.
  • A legal institution that used AI to reduce attorney time spent on contract review, enabling them to focus on other, high-value work.

Current challenges faced by enterprises

Modern enterprises face numerous challenges, including:

  • Managing vast amounts of unstructured data: Enterprises deal with immense volumes of data generated from various sources such as emails, documents, and customer interactions. Organizing, analyzing, and extracting valuable insights from unstructured data can be overwhelming without advanced AI capabilities.
  • Protecting data privacy and compliance: With increasing regulatory requirements around data privacy and protection, organizations must safeguard sensitive information, such as personally identifiable information (PII). Manual processes for data redaction and compliance checks are often error-prone and resource-intensive.
  • Streamlining repetitive and time-consuming tasks: Routine tasks such as data entry, document processing, and content classification consume significant time and effort. Automating these tasks can lead to substantial productivity gains and allow employees to focus on more strategic activities.
  • Adapting to rapidly changing market conditions: In a fast-paced business environment, organizations need to be agile and responsive. This requires real-time data analysis and decision-making capabilities that traditional systems might not provide. AI helps businesses quickly adapt to industry changes and customer demands.
  • Enhancing decision-making with accurate data insights: Making informed decisions requires access to accurate and timely data. However, extracting meaningful insights from large datasets can be challenging without advanced analytical tools. AI-powered solutions can process and analyze data at scale, providing valuable insights that drive better decision-making.

Appian AI service architecture

The architecture of the generative AI skills integrates both the Amazon Bedrock and Amazon Textract scalable infrastructure with Appian’s process management capabilities. This generative AI architecture is designed with private AI as the foundation and upholds those principles.

If a customer site isn’t located in an AWS Region that supports a feature, customers can send their data to a supported Region, as shown in the following figure.Appian Architecture diagram

The key components of this architecture include:

  1. Appian AI Process Platform instances: The frontend serves as the primary application environment where users interact with the system application to upload documents, initiate workflows, and view processed results.
  2. Appian AI service: This service functions as an intermediary layer between the Appian instances and AWS AI services (Amazon Textract and Amazon Bedrock). This layer encapsulates the logic required to interact with the AWS AI services to manage API calls, data formatting, and error handling.
  3. Amazon Textract: This AWS service is used to automate the extraction of text and structured data from scanned documents and images and provide the extracted data in a structured format.
  4. Amazon Bedrock: This AWS service provides advanced AI capabilities using FMs for tasks such as text summarization, sentiment analysis, and natural language understanding. This helps enhance the extracted data with deeper insights and contextual understanding.
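
To make the data flow concrete, the following is a minimal sketch (not Appian's actual service code) of how an intermediary layer might chain the two AWS AI services: Amazon Textract extracts the document text, and Amazon Bedrock summarizes it with a Claude model. The model ID, bucket, and object key are placeholders.

import boto3

textract = boto3.client("textract")
bedrock_runtime = boto3.client("bedrock-runtime")

# 1. Extract raw text from a scanned document in Amazon S3 (bucket/key are placeholders)
extraction = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-docs-bucket", "Name": "bond-certificate.png"}}
)
text = " ".join(block["Text"] for block in extraction["Blocks"] if block["BlockType"] == "LINE")

# 2. Ask a Claude model on Amazon Bedrock to summarize the extracted text
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": f"Summarize this document:\n{text}"}]}],
)
print(response["output"]["message"]["content"][0]["text"])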

Solution

Appian generative AI skills, powered by Amazon Bedrock with Anthropic’s Claude family of LLMs, are designed to jump-start the use of generative AI in your processes. The following figure showcases the diverse capabilities of Appian’s generative AI skills, demonstrating how they enable enterprises to seamlessly automate complex tasks.

Selecting an AI skill

Appian select AI skills

Editing an AI skill

Edit Appian AI skills

Each new skill provides a pre-populated prompt template tailored to specific tasks, alleviating the need to start from scratch. Businesses can select the desired action and customize the prompt for a perfect fit, enabling the automation of tasks such as:

  • Content analysis and processing: With Appian’s generative AI skills, businesses can automatically generate, summarize, and classify content across various formats. This capability is particularly useful for managing large volumes of customer feedback, generating reports, and creating content summaries, significantly reducing the time and effort required for manual content processing.
  • Text and data extraction: Organizations generate mountains of data and documents. Extracting this information manually can be both burdensome and error-prone. Appian’s AI skills can perform highly accurate text extraction from PDF files and scanned images and pull relevant data from both structured and unstructured data sources such as invoices, forms, and emails. This speeds up data processing and promotes higher accuracy and consistency.
  • PII extraction and redaction: Identifying and managing PII within large datasets is crucial for data governance and compliance. Appian’s AI skills can automatically identify and extract sensitive information from documents and communication channels. Additionally, Appian supports plugins that can redact this content for further privacy. This assists your compliance efforts without extensive manual intervention.
  • Document summarization: Appian’s AI skills can summarize documents to give users an overview before digging into the details. Whether it’s summarizing research papers, legal documents, or internal reports, AI can generate concise summaries, saving time and making sure that critical information is highlighted for quick review.

The following figure shows an example of a prompt-builder skill used to extract unstructured data from a bond certificate.

Create Gen AI Prompt

Each AI skill offers pre-populated prompt templates, allowing you to deploy AI without starting from scratch. Each template caters to specific business needs, making implementation straightforward and efficient. Plus, users can customize these prompts to fit their unique requirements and operational needs.

Key takeaways

In this solution, Appian Cloud seamlessly integrates and customizes Amazon Bedrock and Claude LLMs behind the scenes, abstracting complexity to deliver enterprise-grade AI capabilities tailored to its cloud environment. It provides pre-built, use case specific prompt templates for tasks like text summarization and data extractions, dynamically customized based on user inputs and business context. Using the scalability of the Amazon Bedrock infrastructure, Appian Cloud provides optimal performance and efficient handling of enterprise-scale workflows, all within a fully managed cloud service.

By addressing these complexities, Appian Cloud empowers businesses to focus solely on using AI to achieve operational excellence and business outcomes without the burdens of technical setup, integration challenges, or ongoing maintenance efforts.

Customer success stories

Appian’s AI skills have proven effective across multiple industries. Here are a few real-world examples:

  • Mortgage processing: This organization automated the extraction of over 60 data fields from inconsistent document formats, reducing the process timeline from 16 weeks to 10 weeks and achieving 98.33% accuracy. The implementation of Appian’s generative AI skills allowed the mortgage processor to streamline their workflow, significantly cutting down on processing time and improving data accuracy, which led to faster loan approvals and increased customer satisfaction.
  • Financial services: A financial service company received over 1,000 loosely structured emails about trades. Manually annotating these emails led to significant human errors. With an Appian generative AI skill, the customer revamped the entity tagging process by automatically extracting approximately 40 data fields from unstructured emails. This resulted in a four-fold reduction in extraction time and achieved over 95% accuracy, improving the user experience compared to traditional ML extraction tools. The automated process not only reduced errors but also enhanced the speed and reliability of data extraction, leading to more accurate and timely trading decisions.
  • Legal review: A legal institution had to review negotiated contracts against the original contracts to determine whether the outlined risks had been resolved. This manual process was error prone and labor intensive. By deploying a generative AI skill, they automated the extraction of changes between contracts to find the differences and whether risks had been resolved. This streamlined the attorney review process and provided insights and reasoning into the differences found. The automated solution significantly reduced the time attorneys spent on contract review, allowing them to focus on more strategic tasks and improving the overall efficiency of the legal department.

Conclusion

AWS and Appian’s collaboration marks a significant advancement in business process automation. By using the power of Amazon Bedrock and Anthropic’s Claude models, Appian empowers enterprises to optimize and automate processes for greater efficiency and effectiveness. This partnership sets a new standard for AI-driven business solutions, leading to greater growth and enhanced customer experiences. The ability to quickly deploy and customize AI skills allows businesses to stay agile and responsive in a dynamic environment.

Appian solutions are available as software as a service (SaaS) offerings in AWS Marketplace. Check out the Appian website to learn more about how to use the AI skills.


About the Authors

Sunil Bemarkar is a Senior Partner Solutions Architect at Amazon Web Services. He works with various Independent Software Vendors (ISVs) and Strategic customers across industries to accelerate their digital transformation journey and cloud adoption.

John Klacynski is a Principal Customer Solution Manager within the AWS Independent Software Vendor (ISV) team. In this role, he programmatically helps ISV customers adopt AWS technologies and services to reach their business goals more quickly.

Louis Prensky is a Senior Product Manager at Appian. He is responsible for driving product strategy and feature design for AI Skills within Appian’s Cognitive Automation Group.

Philip Kang is a Principal Solutions Consultant in Partner Technology & Innovation centers with Appian. In this role, he spearheads technical innovation with a focus on AI/ML and cloud solutions.

Read More

Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms

Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms

PyTorch* 2.6 has just been released with a set of exciting new features including torch.compile compatibility with Python 3.13, new security and performance enhancements, and a change in the default parameter for torch.load. PyTorch also announced the deprecation of its official Anaconda channel.

Among the performance features are three that enhance developer productivity on Intel platforms:

  1. Improved Intel GPU availability
  2. FlexAttention optimization on x86 CPU for LLM
  3. FP16 on x86 CPU support for eager and Inductor modes

Improved Intel GPU Availability

To provide developers working in artificial intelligence (AI) with better support for Intel GPUs, the PyTorch user experience on these GPUs has been enhanced. This improvement includes simplified installation steps, a Windows* release binary distribution, and expanded coverage of supported GPU models, including the latest Intel® Arc™ B-Series discrete graphics.

These new features help promote accelerated machine learning workflows within the PyTorch ecosystem, providing a consistent developer experience and support. Application developers and researchers seeking to fine-tune, perform inference, and develop with PyTorch models on Intel® Core™ Ultra AI PCs  and Intel® Arc™ discrete graphics will now be able to install PyTorch directly with binary releases for Windows, Linux*, and Windows Subsystem for Linux 2.

The new features include:

  • Simplified Intel GPU software stack setup to enable one-click installation of the torch-xpu PIP wheels to run deep learning workloads in a ready-to-use fashion, thus eliminating the complexity of installing and activating Intel GPU development software bundles. 
  • Windows binary releases for torch core, torchvision and torchaudio have been made available for Intel GPUs, expanding from Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics and Intel® Arc™ A-Series graphics to the latest GPU hardware Intel® Arc™ B-Series graphics support. 
  • Further enhanced coverage of Aten operators on Intel GPUs with SYCL* kernels for smooth eager mode execution, as well as bug fixes and performance optimizations for torch.compile on Intel GPUs. 

Get a tour of new environment setup, PIP wheels installation, and examples on Intel® Client GPUs and Intel® Data Center GPU Max Series in the Getting Started Guide.
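
For example, once the XPU-enabled PyTorch 2.6 wheels are installed for your platform (see the Getting Started Guide for the exact command), you can verify that the Intel GPU is visible and run a tensor operation on it:

import torch

# Check that the Intel GPU (XPU) backend is available
print(torch.__version__)
print(torch.xpu.is_available())

# Run a simple matrix multiplication on the Intel GPU
x = torch.randn(1024, 1024, device="xpu")
y = torch.randn(1024, 1024, device="xpu")
z = x @ y
print(z.device)  # xpu:0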

FlexAttention Optimization on X86 CPU for LLM

FlexAttention was first introduced in PyTorch 2.5 to address the need to support various attention variants, or even combinations of them. This PyTorch API leverages torch.compile to generate a fused FlashAttention kernel, which eliminates extra memory allocation and achieves performance comparable to handwritten implementations.

Previously, FlexAttention was implemented for CUDA* devices based on the Triton backend. Since PyTorch 2.6, x86 CPU support for FlexAttention has been added through the TorchInductor CPP backend. This new feature leverages and extends the current CPP template abilities to support broad attention variants (for example, PagedAttention, which is critical for LLM inference) based on the existing FlexAttention API, and brings optimized performance on x86 CPUs. With this feature, users can easily use the FlexAttention API to compose their attention solutions on CPU platforms and achieve good performance.
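
As a brief illustration of the API (a sketch, not the optimized configurations used in the benchmarks below), a causal attention variant can be expressed as a score_mod function and compiled so TorchInductor can generate the fused CPU kernel:

import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Mask out future positions by pushing their scores to -inf
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

# Compile so the Inductor CPP backend can generate a fused attention kernel
compiled_flex_attention = torch.compile(flex_attention)

B, H, S, D = 1, 8, 128, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

out = compiled_flex_attention(q, k, v, score_mod=causal)
print(out.shape)  # torch.Size([1, 8, 128, 64])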

Typically, FlexAttention is utilized by popular LLM ecosystem projects, such as Hugging Face transformers and vLLM, in their LLM-related modeling (for example, PagedAttention) to achieve better out-of-the-box performance. Before the official adoption happens, this enabling PR in Hugging Face can help demonstrate the performance benefits that FlexAttention can bring on x86 CPU platforms.

The graph below shows the performance comparison of PyTorch 2.6 (with this feature) and PyTorch 2.5 (without this feature) on typical Llama models. For real-time mode (batch size = 1), there is about a 1.13x-1.42x performance improvement for next-token latency across different input token lengths. For best throughput under a typical SLA (P99 token latency <= 50 ms), PyTorch 2.6 achieves more than 7.83x the performance of PyTorch 2.5: PyTorch 2.6 can run 8 inputs together (batch size = 8) and still meet the SLA, while PyTorch 2.5 can only run 1 input, because the FlexAttention-based PagedAttention in PyTorch 2.6 is more efficient at larger batch sizes.

Figure 1. Performance comparison of PyTorch 2.6 and PyTorch 2.5 on typical Llama models

FP16 on X86 CPU Support for Eager and Inductor Modes

Float16 is a commonly used reduced-precision floating-point type that improves performance in neural network inference and training. CPUs like the recently launched Intel® Xeon® 6 with P-cores support the Float16 datatype with the native AMX accelerator, which greatly improves Float16 performance. Float16 support on x86 CPUs was first introduced in PyTorch 2.5 as a prototype feature. It has now been further improved for both eager mode and torch.compile + Inductor mode, and is promoted to Beta level for broader adoption. This helps deployment on the CPU side without the need to modify the model weights when the model is pre-trained with mixed Float16/Float32 precision. On platforms that support AMX Float16 (that is, the Intel® Xeon® 6 processors with P-cores), Float16 has the same pass rate as Bfloat16 across the typical PyTorch benchmark suites: TorchBench, Hugging Face, and TIMM. It also shows performance comparable to the 16-bit Bfloat16 datatype.
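
The feature can be exercised with a few lines: run a module under float16 autocast on the CPU in eager mode, then wrap it with torch.compile to use the Inductor path (a sketch assuming PyTorch 2.6; without AMX FP16 hardware the code still runs, just without the acceleration):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 256)).eval()
x = torch.randn(8, 1024)

# Eager mode with float16 autocast on CPU
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.float16):
    eager_out = model(x)

# torch.compile + Inductor with the same float16 autocast
compiled_model = torch.compile(model)
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.float16):
    compiled_out = compiled_model(x)

print(eager_out.dtype, compiled_out.dtype)  # torch.float16 torch.float16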

Summary

In this blog, we discussed three features that enhance developer productivity on Intel platforms in PyTorch 2.6. These three features improve Intel GPU availability, optimize FlexAttention on x86 CPUs for large language models (LLMs), and support FP16 on x86 CPUs in both eager and Inductor modes. Get PyTorch 2.6 and try these features for yourself, or access PyTorch 2.6 on the Intel® Tiber™ AI Cloud to take advantage of hosted notebooks that are optimized for Intel hardware and software.

Acknowledgements

The release of PyTorch 2.6 is an exciting milestone for Intel platforms, and it would not have been possible without the deep collaboration and contributions from the community. We extend our heartfelt thanks to Alban, Andrey, Bin, Jason, Jerry and Nikita for sharing their invaluable ideas, meticulously reviewing PRs, and providing insightful feedback on RFCs. Their dedication has driven continuous improvements and pushed the ecosystem forward for Intel platforms.

Product and Performance Information

Measurement on AWS EC2 m7i.metal-48xl using: 2x Intel® Xeon® Platinum 8488C, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB [8], DSA [8], IAA[8], QAT[on CPU, 8], Total Memory 512GB (16x32GB DDR5 4800 MT/s [4400 MT/s]), BIOS Amazon EC2 1.0, microcode 0x2b000603, 1x Elastic Network Adapter (ENA) 1x Amazon Elastic Block Store 800G, Ubuntu 24.04.1 LTS 6.8.0-1018-aws Test by Intel on Jan 15th 2025.

Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.

Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

AI disclaimer:

AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at www.intel.com/AIPC. Results may vary.

Read More

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

*Primary Contributors
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sigmoid attention and conduct an in-depth theoretical and empirical analysis. Theoretically, we prove that transformers with sigmoid attention are universal function approximators and…

Apple Machine Learning Research
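
As a toy illustration of the idea in the abstract (not the paper's implementation), the change relative to standard attention is replacing the row-wise softmax over the score matrix with an elementwise sigmoid, optionally shifted by a normalizing bias:

import torch

def softmax_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def sigmoid_attention(q, k, v, bias=0.0):
    # Elementwise sigmoid replaces the row-wise softmax; bias is a simplified
    # stand-in for the normalization discussed in the paper
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.sigmoid(scores + bias) @ v

q = k = v = torch.randn(2, 4, 16, 32)  # (batch, heads, seq_len, head_dim)
print(softmax_attention(q, k, v).shape, sigmoid_attention(q, k, v).shape)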