Building a board game with the TFLite plugin for Flutter

Posted by Wei Wei, Developer Advocate

In our previous blog posts, Building a board game app with TensorFlow: a new TensorFlow Lite reference app and Building a reinforcement learning agent with JAX, and deploying it on Android with TensorFlow Lite, we demonstrated how to train a reinforcement learning (RL) agent with TensorFlow and TensorFlow Agents, and with JAX, respectively, and then deploy the converted model in an Android app using TensorFlow Lite to play the simple board game ‘Plane Strike’.

While these end-to-end tutorials are helpful for Android developers, we have heard from the Flutter developer community that it would be interesting to make the app cross-platform. Inspired by the recently released official TensorFlow Lite plugin for Flutter, we are writing one last tutorial to port the app to Flutter.
Flow chart illustrating training a Reinforcement Learning (RL) agent with TensorFlow, TensorFlow Agents and JAX, and deploying the converted model in an Android app and a Flutter app using the TensorFlow Lite plugin

Since we already have the model trained with TensorFlow and converted to TFLite, we can just load the model with TFLite interpreter:

void _loadModel() async {
  // Create the interpreter from the .tflite model bundled as a Flutter asset
  // (the asset path is declared in pubspec.yaml).
  _interpreter = await Interpreter.fromAsset(_modelFile);
}

Then we pass in the user board state and help the game agent identify the most promising position to strike next (please refer to our previous blog posts if you need a refresher on the game rules) by running TFLite inference:

int predict(List<List<double>> boardState) {
  var input = [boardState];
  var output = List.filled(_boardSize * _boardSize, 0)
      .reshape([1, _boardSize * _boardSize]);

  // Run inference
  _interpreter.run(input, output);

  // Argmax
  double max = output[0][0];
  int maxIdx = 0;
  for (int i = 1; i < _boardSize * _boardSize; i++) {
    if (max < output[0][i]) {
      maxIdx = i;
      max = output[0][i];
    }
  }

  return maxIdx;
}

That’s it! With some additional Flutter frontend code to render the game boards and track game progress, we can run the game on both Android and iOS right away (currently the plugin supports only these two mobile platforms). You can find the complete code on GitHub.

If you want to dig deeper, there are a couple of things you can try:
  1. Convert the TFAgents-trained model to TFLite and run it with the plugin
  2. Leverage the RL technique we have used and build a new agent for the tic tac toe game in the Flutter Casual Games Toolkit. You will need to create a new RL environment and train the model from scratch before deployment, but the core concept and technique are pretty much the same.

This concludes our mini-series on leveraging TensorFlow and JAX to build games for Android and Flutter. We very much look forward to all the exciting things you build with our tooling, so be sure to share them with @googledevs, @TensorFlow, and your developer communities!

Read More

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

Purina US, a subsidiary of Nestle, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes.

Purina consistently seeks ways to make the Petfinder platform even better for shelters, rescue groups, and pet adopters. One challenge they faced was adequately reflecting the specific breed of animals up for adoption. Because many shelter animals are mixed breed, identifying breeds and attributes correctly in the pet profile required manual effort, which was time consuming. Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale.

This post details how Purina used Amazon Rekognition Custom Labels, AWS Step Functions, and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.

Solution overview

Predicting animal breeds from an image requires a custom ML model. Developing a custom model to analyze images is a significant undertaking that requires time, expertise, and resources, often taking months to complete. Additionally, it often requires thousands or tens of thousands of hand-labeled images to provide the model with enough data to make accurate decisions. Setting up a workflow for auditing or reviewing model predictions to validate adherence to your requirements can further add to the overall complexity.

With Rekognition Custom Labels, which is built on the existing capabilities of Amazon Rekognition, you can identify the objects and scenes in images that are specific to your business needs. It is already trained on tens of millions of images across many categories. Instead of thousands of images, you can upload a small set of training images (typically a few hundred images or less per category) that are specific to your use case.

The solution uses the following services:

  • Amazon API Gateway is a fully managed service that makes it easy for developers to publish, maintain, monitor, and secure APIs at any scale.
  • The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework for defining cloud infrastructure as code with modern programming languages and deploying it through AWS CloudFormation.
  • AWS CodeBuild is a fully managed continuous integration service in the cloud. CodeBuild compiles source code, runs tests, and produces packages that are ready to deploy.
  • Amazon DynamoDB is a fast and flexible nonrelational database service for any scale.
  • AWS Lambda is an event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.
  • Amazon Rekognition offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos. With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs.
  • AWS Step Functions is a fully managed service that makes it easier to coordinate the components of distributed applications and microservices using visual workflows.
  • AWS Systems Manager is a secure end-to-end management solution for resources on AWS and in multicloud and hybrid environments. Parameter Store, a capability of Systems Manager, provides secure, hierarchical storage for configuration data management and secrets management.

Purina’s solution is deployed as an API Gateway HTTP endpoint, which routes the requests to obtain pet attributes. It uses Rekognition Custom Labels to predict the pet breed. The ML model is trained from pet profiles pulled from Purina’s database, assuming the primary breed label is the true label. DynamoDB is used to store the pet attributes. Lambda is used to process the pet attributes request by orchestrating between API Gateway, Amazon Rekognition, and DynamoDB.

The architecture is implemented as follows:

  1. The Petfinder application routes the request to obtain the pet attributes via API Gateway.
  2. API Gateway calls the Lambda function to obtain the pet attributes.
  3. The Lambda function calls the Rekognition Custom Label inference endpoint to predict the pet breed.
  4. The Lambda function uses the predicted pet breed information to perform a pet attributes lookup in the DynamoDB table. It collects the pet attributes and sends them back to the Petfinder application.

The following diagram illustrates the solution workflow.
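
As a rough illustration of steps 2 through 4, a Lambda handler along these lines could sit behind the API. This is a hedged sketch: the payload fields, environment variables, table name, and key schema are assumptions for illustration, not Purina’s actual implementation (the real solution reads the deployed model version from Parameter Store).

import json
import os

import boto3

rekognition = boto3.client("rekognition")
dynamodb = boto3.resource("dynamodb")

# Hypothetical configuration; the real solution reads the model version from Parameter Store.
MODEL_ARN = os.environ["REKOGNITION_MODEL_VERSION_ARN"]
TABLE_NAME = os.environ.get("PET_ATTRIBUTES_TABLE", "PetAttributes")


def handler(event, context):
    """Predict the pet breed from a profile image and look up its attributes."""
    body = json.loads(event["body"])
    # Assumed payload fields: an s3:// path to the image and the number of results wanted.
    bucket, key = body["image_s3_path"].replace("s3://", "").split("/", 1)

    # Step 3: call the Rekognition Custom Labels inference endpoint.
    labels = rekognition.detect_custom_labels(
        ProjectVersionArn=MODEL_ARN,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxResults=int(body.get("max_results", 1)),
    )["CustomLabels"]
    if not labels:
        return {"statusCode": 404, "body": json.dumps({"message": "No breed detected"})}

    # Step 4: look up the pet attributes for the top predicted breed in DynamoDB.
    breed = labels[0]["Name"]
    item = dynamodb.Table(TABLE_NAME).get_item(Key={"breed": breed}).get("Item", {})

    return {
        "statusCode": 200,
        "body": json.dumps({"breed": breed, "attributes": item}, default=str),
    }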

The Petfinder team at Purina wants an automated solution that they can deploy with minimal maintenance. To deliver this, we use Step Functions to create a state machine that trains the models with the latest data, checks their performance on a benchmark set, and redeploys the models if they have improved. Model retraining is triggered by the number of breed corrections made by users submitting profile information.

Model training

Developing a custom model to analyze images is a significant undertaking that requires time, expertise, and resources. Additionally, it often requires thousands or tens of thousands of hand-labeled images to provide the model with enough data to make accurate decisions. This data can take months to gather and requires a large labeling effort before it can be used in machine learning. A technique called transfer learning helps produce higher-quality models by borrowing the parameters of a pre-trained model, and allows models to be trained with fewer images.

Our challenge is that our data is not perfectly labeled: humans who enter the profile data can and do make mistakes. However, we found that for large enough data samples, the mislabeled images accounted for a small enough fraction that model accuracy was affected by no more than 2%.

ML workflow and state machine

The Step Functions state machine was developed to aid in the automatic retraining of the Amazon Rekognition model. Feedback is gathered during profile entry—each time a breed that has been inferred from an image is changed by the user to a different breed, the correction is recorded. The state machine is triggered by a configurable threshold of corrections and additional data points.

The state machine runs through several steps to create a solution; a simplified sketch of the corresponding API calls follows the list:

  1. Create train and test manifest files containing the list of Amazon Simple Storage Service (Amazon S3) image paths and their labels for use by Amazon Rekognition.
  2. Create an Amazon Rekognition dataset using the manifest files.
  3. Train an Amazon Rekognition model version after the dataset is created.
  4. Start the model version when training is complete.
  5. Evaluate the model and produce performance metrics.
  6. If performance metrics are satisfactory, update the model version in Parameter Store.
  7. Wait for the new model version to propagate in the Lambda functions (20 minutes), then stop the previous model.
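
The sketch below shows, in simplified form, the Amazon Rekognition Custom Labels calls behind steps 2 through 4. The project ARN, bucket names, and version name are placeholders, and in the actual solution each call runs as a separate task inside the Step Functions state machine rather than in one script.

import boto3

rekognition = boto3.client("rekognition")

PROJECT_ARN = "arn:aws:rekognition:us-east-1:111122223333:project/pet-breeds/1"  # placeholder

# Step 2: create a dataset from the training manifest in Amazon S3.
rekognition.create_dataset(
    ProjectArn=PROJECT_ARN,
    DatasetType="TRAIN",
    DatasetSource={
        "GroundTruthManifest": {
            "S3Object": {"Bucket": "my-training-bucket", "Name": "manifests/train.manifest"}
        }
    },
)

# Step 3: train a new model version; training artifacts land in the given S3 prefix.
response = rekognition.create_project_version(
    ProjectArn=PROJECT_ARN,
    VersionName="pet-breeds-v2",
    OutputConfig={"S3Bucket": "my-training-bucket", "S3KeyPrefix": "model-output/"},
)
model_arn = response["ProjectVersionArn"]

# Step 4: start the trained model version so it can serve inference requests.
# (In the state machine, a Wait/Choice loop polls describe_project_versions
# until training completes before this call runs.)
rekognition.start_project_version(ProjectVersionArn=model_arn, MinInferenceUnits=1)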

Model evaluation

We use a random 20% holdout set taken from our data sample to validate our model. Because the breeds we detect are configurable, we don’t use a fixed dataset for validation during training, but we do use a manually labeled evaluation set for integration testing. The overlap of the manually labeled set and the model’s detectable breeds is used to compute metrics. If the model’s breed detection accuracy is above a specified threshold, we promote the model to be used in the endpoint.
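
The promotion decision itself reduces to a small piece of logic. The following is a hypothetical sketch that assumes ground-truth and predicted breeds keyed by image and an illustrative accuracy threshold; it is not the team’s exact evaluation code.

def should_promote(ground_truth: dict, predictions: dict, detectable_breeds: set,
                   threshold: float = 0.85) -> bool:
    """Compute accuracy on the overlap between the labeled set and the breeds
    the model can detect, then compare it against the promotion threshold."""
    # Only images whose true breed is detectable by the current model count.
    overlap = {img: breed for img, breed in ground_truth.items() if breed in detectable_breeds}
    if not overlap:
        return False

    correct = sum(1 for img, breed in overlap.items() if predictions.get(img) == breed)
    accuracy = correct / len(overlap)
    return accuracy >= threshold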

The following are a few screenshots of the pet prediction workflow from Rekognition Custom Labels.

Deployment with the AWS CDK

The Step Functions state machine and associated infrastructure (including Lambda functions, CodeBuild projects, and Systems Manager parameters) are deployed with the AWS CDK using Python. The AWS CDK code synthesizes a CloudFormation template, which it uses to deploy all infrastructure for the solution.
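
For context, a pared-down AWS CDK (Python) stack for this kind of deployment might look like the following. The construct names, Lambda asset path, parameter name, and single-step state machine definition are illustrative placeholders rather than the project’s real code.

from aws_cdk import Duration, Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_ssm as ssm
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks
from constructs import Construct


class RetrainingStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Lambda function that kicks off Rekognition Custom Labels training.
        train_fn = _lambda.Function(
            self, "TrainModelFn",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="train.handler",
            code=_lambda.Code.from_asset("lambda/train"),  # illustrative path
            timeout=Duration.minutes(5),
        )

        # Parameter Store entry holding the currently deployed model version ARN.
        ssm.StringParameter(
            self, "ModelVersionParam",
            parameter_name="/petfinder/model-version-arn",
            string_value="placeholder",
        )

        # Single-step state machine sketch; the real workflow chains the
        # training, evaluation, and deployment steps described above.
        sfn.StateMachine(
            self, "RetrainingStateMachine",
            definition_body=sfn.DefinitionBody.from_chainable(
                tasks.LambdaInvoke(self, "TrainModel", lambda_function=train_fn)
            ),
        )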

Integration with the Petfinder application

The Petfinder application accesses the image classification endpoint through the API Gateway endpoint using a POST request containing a JSON payload with fields for the Amazon S3 path to the image and the number of results to be returned.
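
As an illustration only, a client call might resemble the following; the endpoint URL and field names are assumptions, since the actual API contract is internal to Purina.

import requests

payload = {
    "image_s3_path": "s3://petfinder-profile-images/shelter-123/pet-456.jpg",  # assumed field
    "max_results": 3,  # number of breed predictions to return (assumed field)
}

response = requests.post(
    "https://example.execute-api.us-east-1.amazonaws.com/prod/breeds",  # placeholder endpoint
    json=payload,
    timeout=10,
)
print(response.json())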

KPIs to be impacted

To justify the added cost of running the image inference endpoint, we ran experiments to determine the value that the endpoint adds for Petfinder. The use of the endpoint offers two main types of improvement:

  • Reduced effort for pet shelters who are creating the pet profiles
  • More complete pet profiles, which are expected to improve search relevance

Metrics for measuring effort and profile completeness include the number of auto-filled fields that are corrected, the total number of fields filled, and the time to upload a pet profile. Improvements to search relevance are indirectly inferred by measuring key performance indicators related to adoption rates. According to Purina, after the solution went live, the average time for creating a pet profile on the Petfinder application was reduced from 7 minutes to 4 minutes. That is a significant improvement and time savings, given that 4 million pet profiles were uploaded in 2022.

Security

The data that flows through the architecture diagram is encrypted in transit and at rest, in accordance with the AWS Well-Architected best practices. During all AWS engagements, a security expert reviews the solution to ensure a secure implementation is provided.

Conclusion

With their solution based on Rekognition Custom Labels, the Petfinder team is able to accelerate the creation of pet profiles for pet shelters, reducing administrative burden on shelter personnel. The deployment based on the AWS CDK deploys a Step Functions workflow to automate the training and deployment process. To start using Rekognition Custom Labels, refer to Getting Started with Amazon Rekognition Custom Labels. You can also check out some Step Functions examples and get started with the AWS CDK.


About the Authors

Mason Cahill is a Senior DevOps Consultant with AWS Professional Services. He enjoys helping organizations achieve their business goals, and is passionate about building and delivering automated solutions on the AWS Cloud. Outside of work, he loves spending time with his family, hiking, and playing soccer.

Matthew Chasse is a Data Science consultant at Amazon Web Services, where he helps customers build scalable machine learning solutions.  Matthew has a Mathematics PhD and enjoys rock climbing and music in his free time.

Rushikesh Jagtap is a Solutions Architect with 5+ years of experience in AWS Analytics services. He is passionate about helping customers to build scalable and modern data analytics solutions to gain insights from the data. Outside of work, he loves watching Formula1, playing badminton, and racing Go Karts.

Tayo Olajide is a seasoned Cloud Data Engineering generalist with over a decade of experience in architecting and implementing data solutions in cloud environments. With a passion for transforming raw data into valuable insights, Tayo has played a pivotal role in designing and optimizing data pipelines for various industries, including finance, healthcare, and auto industries. As a thought leader in the field, Tayo believes that the power of data lies in its ability to drive informed decision-making and is committed to helping businesses leverage the full potential of their data in the cloud era. When he’s not crafting data pipelines, you can find Tayo exploring the latest trends in technology, hiking in the great outdoors, or tinkering with gadgetry and software.

Read More

Understanding the user: How the Enterprise System Usability Scale aligns with user reality

This position research paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities.

In the business world, measuring success is as critical as selecting the right goals, and metrics act as a guiding compass, shaping organizational objectives. They are instrumental as businesses strategize to develop products that are likely to succeed in specific markets or among certain user groups.  

However, businesses often overlook whether these metrics accurately reflect users’ experiences and behaviors. Do they truly reflect the consumers’ journey and provide a reliable evaluation of the products’ place in the market? Put differently, do these metrics truly capture a product’s effectiveness and value, or are they superficial, overlooking deeper insights that could lead a business toward lasting success?

Challenges in enterprise usability metrics research

In our paper, “A Call to Revisit Classic Measurements for UX Evaluation,” presented at the UX Outcomes Workshop at CSCW 2023, we explore these questions about usability metrics—which evaluate the simplicity and effectiveness of a product, service, or system for its users—and their applicability to enterprise products. These metrics are vital when measuring a product’s health in the market and predicting adoption rates, user engagement, and, by extension, revenue generation. Current usability metrics in the enterprise space often fail to align with the actual user’s reality when using technical enterprise products such as business analytics, data engineering, and data science software. Oftentimes, they lack methodological rigor, calling into question their generalizability and validity.

One example is the System Usability Scale (SUS), the most widely used usability metric. In the context of enterprise products, at least two questions used in SUS do not resonate with users’ actual experiences: “I think I would like to use the system frequently” and “I think I need the support of a technical person to be able to use this product.” Because users of enterprise products are consumers, not necessarily customers, they often do not get to choose which product to use. In some cases, they are IT professionals with no one to turn to for technical assistance. This misalignment highlights the need to refine how we measure usability for enterprise products.

Another concern is the lack of rigorous validation for metrics that reflect a product’s performance. For instance, UMUX-Lite is a popular metric for its simplicity and strong correlation with SUS. However, its scoring methodology requires that researchers use an equation consisting of a regression weight and constant to align the average scores with SUS scores. This lacks a solid theoretical foundation, which raises questions about UMUX-Lite’s ability to generalize to different contexts and respondent samples.

The lack of standardization underscores the need for metrics that are grounded in the user’s reality for the types of products being assessed and based on theoretical and empirical evidence, ensuring that they are generalizable to diverse contexts. This approach will pave the way for more reliable insights into product usability, fostering informed decisions crucial for enhancing the user experience and driving product success.

ESUS: A reality-driven approach to usability metrics

Recognizing this need, we endeavored to create a new usability metric that accurately reflects the experience of enterprise product users, built on solid theory and supported by empirical evidence. Our research combines qualitative and quantitative approaches to devise a tailored usability metric for enterprise products, named the Enterprise System Usability Scale (ESUS). 

ESUS offers a number of benefits over the SUS and UMUX-Lite. It is more concise than the SUS, containing only half the questions and streamlining the evaluation process. It also eliminates the need for practitioners to use a sample-specific weight and constant, as required by UMUX-Lite, providing a more reliable measure of product usability. Moreover, ESUS demonstrates convergent validity, correlating with other usability metrics, such as SUS. Most importantly, through its conciseness and specificity, it was designed with enterprise product users in mind, providing relevant and actionable insights.  

In Table 1 below, we offer ESUS as a step towards more accurate, reliable, and user-focused metrics for enterprise products, which are instrumental in driving well-informed decisions in improving product usability and customer satisfaction.

ESUS Items | 1 | 2 | 3 | 4 | 5
How useful is [this product] to you? | Not at all useful | Slightly useful | Somewhat useful | Mostly useful | Very useful
How easy or hard was [this product] to use for you? | Very hard | Hard | Neutral | Easy | Very easy
How confident were you when using [this product]? | Not at all confident | Slightly confident | Somewhat confident | Mostly confident | Very confident
How well do the functions work together or do not work together in [this product]? | Does not work together at all | Does not work well together | Neutral | Works well together | Works very well together
How easy or hard was it to get started with [this product]? | Very hard | Hard | Neutral | Easy | Very easy
Table 1: Proposed ESUS questionnaire

Looking ahead: Advancing precision in understanding the user

Moving forward, our focus is on rigorously testing and enhancing ESUS. We aim to examine its consistency over time and its effectiveness with small sample sizes. Our goal is to ensure our metrics are as robust and adaptable as the rapidly evolving enterprise product environment requires. We’re committed to continuous improvement, striving for metrics that are not just accurate but also relevant and reliable, offering actionable insights for an ever-improving user experience.

The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.

Read More

NVIDIA Expands Robotics Platform to Meet the Rise of Generative AI

Powerful generative AI models and cloud-native APIs and microservices are coming to the edge.

Generative AI is bringing the power of transformer models and large language models to virtually every industry. That reach now includes areas that touch edge, robotics and logistics systems: defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interactions and more.

NVIDIA today announced major expansions to two frameworks on the NVIDIA Jetson platform for edge AI and robotics: the NVIDIA Isaac ROS robotics framework has entered general availability, and the NVIDIA Metropolis expansion on Jetson is coming next.

To accelerate AI application development and deployments at the edge, NVIDIA has also created a Jetson Generative AI Lab for developers to use with the latest open-source generative AI models.

More than 1.2 million developers and over 10,000 customers have chosen NVIDIA AI and the Jetson platform, including Amazon Web Services, Cisco, John Deere, Medtronic, Pepsico and Siemens.

With the rapidly evolving AI landscape addressing increasingly complicated scenarios, developers are being challenged by longer development cycles to build AI applications for the edge. Reprogramming robots and AI systems on the fly to meet changing environments, manufacturing lines and automation needs of customers is time-consuming and requires expert skills.

Generative AI offers zero-shot learning — the ability of a model to recognize things it has not specifically seen before in training — with a natural language interface to simplify the development, deployment and management of AI at the edge.

Transforming the AI Landscape

Generative AI dramatically improves ease of use by understanding human language prompts to make model changes. Those AI models are more flexible in detecting, segmenting, tracking, searching and even reprogramming — and can outperform traditional convolutional neural network-based models.

Generative AI is expected to add $10.5 billion in revenue for manufacturing operations worldwide by 2033, according to ABI Research.

“Generative AI will significantly accelerate deployments of AI at the edge with better generalization, ease of use and higher accuracy than previously possible,” said Deepu Talla, vice president of embedded and edge computing at NVIDIA. “This largest-ever software expansion of our Metropolis and Isaac frameworks on Jetson, combined with the power of transformer models and generative AI, addresses this need.”

Developing With Generative AI at the Edge

The Jetson Generative AI Lab provides developers access to optimized tools and tutorials for deploying open-source LLMs, diffusion models to generate stunning interactive images, vision language models (VLMs) and vision transformers (ViTs) that combine vision AI and natural language processing to provide comprehensive understanding of the scene.

Developers can also use the NVIDIA TAO Toolkit to create efficient and accurate AI models for edge applications. TAO provides a low-code interface to fine-tune and optimize vision AI models, including ViT and vision foundational models. They can also customize and fine-tune foundational models like NVIDIA NV-DINOv2 or public models like OpenCLIP to create highly accurate vision AI models with very little data. TAO additionally now includes VisualChangeNet, a new transformer-based model for defect inspection.

Harnessing New Metropolis and Isaac Frameworks

NVIDIA Metropolis makes it easier and more cost-effective for enterprises to embrace world-class, vision AI-enabled solutions to improve critical operational efficiency and safety problems. The platform brings a collection of powerful application programming interfaces and microservices for developers to quickly develop complex vision-based applications.

More than 1,000 companies, including BMW Group, Pepsico, Kroger, Tyson Foods, Infosys and Siemens, are using NVIDIA Metropolis developer tools to solve Internet of Things, sensor processing and operational challenges with vision AI — and the rate of adoption is quickening. The tools have now been downloaded over 1 million times by those looking to build vision AI applications.

To help developers quickly build and deploy scalable vision AI applications, an expanded set of Metropolis APIs and microservices on NVIDIA Jetson will be available by year’s end.

Hundreds of customers use the NVIDIA Isaac platform to develop high-performance robotics solutions across diverse domains, including agriculture, warehouse automation, last-mile delivery and service robotics, among others.

At ROSCon 2023, NVIDIA announced major improvements to perception and simulation capabilities with new releases of Isaac ROS and Isaac Sim software. Built on the widely adopted open-source Robot Operating System (ROS), Isaac ROS brings perception to automation, giving eyes and ears to the things that move. By harnessing the power of GPU-accelerated GEMs, including visual odometry, depth perception, 3D scene reconstruction, localization and planning, robotics developers gain the tools needed to swiftly engineer robotic solutions tailored for a diverse range of applications.

Isaac ROS has reached production-ready status with the latest Isaac ROS 2.0 release, enabling developers to create and bring high-performance robotics solutions to market with Jetson.

“ROS continues to grow and evolve to provide open-source software for the whole robotics community,” said Geoff Biggs, CTO of the Open Source Robotics Foundation. “NVIDIA’s new prebuilt ROS 2 packages, launched with this release, will accelerate that growth by making ROS 2 readily available to the vast NVIDIA Jetson developer community.”

Delivering New Reference AI Workflows

Developing a production-ready AI solution entails optimizing the development and training of AI models tailored to specific use cases, implementing robust security features on the platform, orchestrating the application, managing fleets, establishing seamless edge-to-cloud communication and more.

NVIDIA announced a curated collection of AI reference workflows based on Metropolis and Isaac frameworks that enable developers to quickly adopt the entire workflow or selectively integrate individual components, resulting in substantial reductions in both development time and cost. The three distinct AI workflows include: Network Video Recording, Automatic Optical Inspection and Autonomous Mobile Robot.

“NVIDIA Jetson, with its broad and diverse user base and partner ecosystem, has helped drive a revolution in robotics and AI at the edge,” said Jim McGregor, principal analyst at Tirias Research. “As application requirements become increasingly complex, we need a foundational shift to platforms that simplify and accelerate the creation of edge deployments. This significant software expansion by NVIDIA gives developers access to new multi-sensor models and generative AI capabilities.”

More Coming on the Horizon 

NVIDIA announced a collection of system services, which are fundamental capabilities that every developer requires when building edge AI solutions. These services will simplify integration into workflows and spare developers the arduous task of building them from the ground up.

The new NVIDIA JetPack 6, expected to be available by year’s end, will empower AI developers to stay at the cutting edge of computing without the need for a full Jetson Linux upgrade, substantially expediting development timelines and liberating them from Jetson Linux dependencies. JetPack 6 will also build on collaborative efforts with Linux distribution partners to expand the range of Linux-based distribution choices, including Canonical’s Optimized and Certified Ubuntu, Wind River Linux, Concurrent Real-Time’s RedHawk Linux and various Yocto-based distributions.

Partner Ecosystem Benefits From Platform Expansion

The Jetson partner ecosystem provides a wide range of support, from hardware, AI software and application design services to sensors, connectivity and developer tools. These NVIDIA Partner Network innovators play a vital role in providing the building blocks and sub-systems for many products sold on the market.

The latest release allows Jetson partners to accelerate their time to market and expand their customer base by adopting AI with increased performance and capabilities.

Independent software vendor partners will also be able to expand their offerings for Jetson.

Join us Tuesday, Nov. 7, at 9 a.m. PT for the Bringing Generative AI to Life with NVIDIA Jetson webinar, where technical experts will dive deeper into the news announced here, including accelerated APIs and quantization methods for deploying LLMs and VLMs on Jetson, optimizing vision transformers with TensorRT, and more.

Sign up for NVIDIA Metropolis early access here.

Read More

Making Machines Mindful: NYU Professor Talks Responsible AI

Artificial intelligence is now a household term. Responsible AI is hot on its heels.

Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous.

In the latest episode of the NVIDIA AI Podcast, host Noah Kravitz spoke with Stoyanovich about responsible AI, her advocacy efforts and how people can help.

Stoyanovich started her work at the Center for Responsible AI with basic research. She soon realized that what was needed were better guardrails, not just more algorithms.

As AI’s potential has grown, along with the ethical concerns surrounding its use, Stoyanovich clarifies that the “responsibility” lies with people, not AI.

“The responsibility refers to people taking responsibility for the decisions that we make individually and collectively about whether to build an AI system and how to build, test, deploy and keep it in check,” she said.

AI ethics is a related concern, used to refer to “the embedding of moral values and principles into the design, development and use of the AI,” she added.

Lawmakers have taken notice. For example, New York recently implemented a law that makes job candidate screening more transparent.

According to Stoyanovich, “the law is not perfect,” but “we can only learn how to regulate something if we try regulating” and converse openly with the “people at the table being impacted.”

Stoyanovich wants two things: for people to recognize that AI can’t predict human choices, and for AI systems to be transparent and accountable, carrying a “nutritional label.”

That process should include considerations on who is using AI tools, how they’re used to make decisions and who is subjected to those decisions, she said.

Stoyanovich urges people to “start demanding actions and explanations to understand” how AI is used at local, state and federal levels.

“We need to teach ourselves to help others learn about what AI is and why we should care,” she said. “So please get involved in how we govern ourselves, because we live in a democracy. We have to step up.”

You Might Also Like

Jules Anh Tuan Nguyen Explains How AI Lets Amputee Control Prosthetic Hand, Video Games
A postdoctoral researcher at the University of Minnesota discusses his efforts to allow amputees to control their prosthetic limb — right down to the finger motions — with their minds.

Overjet’s Wardah Inam on Bringing AI to Dentistry
Overjet, a member of NVIDIA Inception, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of the company, discusses using AI to improve patient care.

Immunai CTO and Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs
Luis Voloch talks about tackling the challenges of the immune system with a machine learning and data science mindset.

Subscribe to the AI Podcast: Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.

Read More

Into the Omniverse: Marmoset Brings Breakthroughs in Rendering, Extends OpenUSD Support to Enhance 3D Art Production

Editor’s note: This post is part of Into the Omniverse, a series focused on how artists and developers from startups to enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Real-time rendering, animation and texture baking are essential workflows for 3D art production. Using the Marmoset Toolbag software, 3D artists can enhance their creative workflows and build complex 3D models without disruptions to productivity.

The latest release of Marmoset Toolbag, version 4.06, brings increased support for Universal Scene Description, aka OpenUSD, enabling seamless compatibility with NVIDIA Omniverse, a development platform for connecting and building OpenUSD-based tools and applications.

3D creators and technical artists using Marmoset can now enjoy improved interoperability, accelerated rendering, real-time visualization and efficient performance — redefining the possibilities of their creative workflows.

Enhancing Cross-Platform Creativity With OpenUSD

Creators are taking their workflows to the next level with OpenUSD.

Berlin-based Armin Halač works as a principal animator at Wooga, a mobile games development studio known for projects like June’s Journey and Ghost Detective. The nature of his job means Halač is no stranger to 3D workflows — he gets hands-on with animation and character rigging.

For texturing and producing high-quality renders, Marmoset is Halač’s go-to tool, providing a user-friendly interface and powerful features to simplify his workflow. Recently, Halač used Marmoset to create the captivating cover image for his book, A Complete Guide to Character Rigging for Games Using Blender.

Using the added support for USD, Halač can seamlessly send 3D assets from Blender to Marmoset, creating new possibilities for collaboration and improved visuals.

The cover image of Halač’s book.

Nkoro Anselem Ire, a.k.a. askNK, a popular YouTube creator and a media and visual arts professor at a couple of universities, is also seeing workflow benefits from increased USD support.

As a 3D content creator, he uses Marmoset Toolbag for the majority of his PBR workflow — from texture baking and lighting to animation and rendering. Now, with USD, askNK is enjoying newfound levels of creative flexibility as the framework allows him to “collaborate with individuals or team members a lot easier because they can now pick up and drop off processes while working on the same file.”

Halač and askNK recently joined an NVIDIA-hosted livestream where community members and the Omniverse team explored the benefits of a Marmoset- and Omniverse-boosted workflow.

Daniel Bauer is another creator experiencing the benefits of Marmoset, OpenUSD and Omniverse. A SolidWorks mechanical engineer with over 10 years of experience, Bauer works frequently in CAD software environments, where it’s typical to assign different materials to various scene components. The variance can often lead to shading errors and incorrect geometry representation, but using USD, Bauer can avoid errors by easily importing versions of his scene from Blender to Marmoset Toolbag to Omniverse USD Composer.

A Kuka Scara robot simulation with 10 parallel small grippers for sorting and handling pens.

Additionally, 3D artists Gianluca Squillace and Pasquale Scionti are harnessing the collaborative power of Omniverse, Marmoset and OpenUSD to transform their workflows from a convoluted series of exports and imports to a streamlined, real-time, interconnected process.

Squillace crafted a captivating 3D character with Pixologic ZBrush, Autodesk Maya, Adobe Substance 3D Painter and Marmoset Toolbag — aggregating the data from the various tools in Omniverse. With USD, he seamlessly integrated his animations and made real-time adjustments without the need for constant file exports.

Simultaneously, Scionti constructed a stunning glacial environment using Autodesk 3ds Max, Adobe Substance 3D Painter, Quixel and Unreal Engine, uniting the various pieces from his tools in Omniverse. His work showcased the potential of Omniverse to foster real-time collaboration as he was able to seamlessly integrate Squillace’s character into his snowy world.

Advancing Interoperability and Real-Time Rendering

Marmoset Toolbag 4.06 provides significant improvements to interoperability and image fidelity for artists working across platforms and applications. This is achieved through updates to Marmoset’s OpenUSD support, allowing for seamless compatibility and connection with the Omniverse ecosystem.

The improved USD import and export capabilities enhance interoperability with popular content creation apps and creative toolkits like Autodesk Maya and Autodesk 3ds Max, SideFX Houdini and Unreal Engine.

Additionally, Marmoset Toolbag 4.06 brings additional updates, including:

  • RTX-accelerated rendering and baking: Toolbag’s ray-traced renderer and texture baker are accelerated by NVIDIA RTX GPUs, providing up to a 2x improvement in render times and a 4x improvement in bake times.
  • Real-time denoising with OptiX: With NVIDIA RTX devices, creators can enjoy a smooth and interactive ray-tracing experience, enabling real-time navigation of the active viewport without visual artifacts or performance disruptions.
  • High DPI performance with DLSS image upscaling: The viewport now renders at a reduced resolution and uses AI-based technology to upscale images, improving performance while minimizing image-quality reductions.

Download Toolbag 4.06 directly from Marmoset to explore USD support and RTX-accelerated production tools. New users are eligible for a full-featured, 30-day free trial license.

Get Plugged Into the Omniverse 

Learn from industry experts on how OpenUSD is enabling custom 3D pipelines, easing 3D tool development and delivering interoperability between 3D applications in sessions from SIGGRAPH 2023, now available on demand.

Anyone can build their own Omniverse extension or Connector to enhance their 3D workflows and tools. Explore the Omniverse ecosystem’s growing catalog of connections, extensions, foundation applications and third-party tools.

For more resources on OpenUSD, explore the Alliance for OpenUSD forum or visit the AOUSD website.

Share your Marmoset Toolbag and Omniverse work as part of the latest community challenge, #SeasonalArtChallenge. Use the hashtag to submit a spooky or festive scene for a chance to be featured on the @NVIDIAStudio and @NVIDIAOmniverse social channels.

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team.

Developers can check out these Omniverse resources to begin building on the platform. 

Stay up to date on the platform by subscribing to the newsletter and following NVIDIA Omniverse on Instagram, LinkedIn, Medium, Threads and Twitter.

For more, check out our forums, Discord server, Twitch and YouTube channels.

Featured image courtesy of Armin Halač, Christian Nauck and Masuquddin Ahmed.

Read More

Foxconn and NVIDIA Amp Up Electric Vehicle Innovation

NVIDIA founder and CEO Jensen Huang joined Hon Hai (Foxconn) Chairman and CEO Young Liu to unveil the latest in their ongoing partnership to develop the next wave of intelligent electric vehicle (EV) platforms for the global automotive market.

This latest move, announced today at the fourth annual Hon Hai Tech Day in Taiwan, will help Foxconn realize its EV vision with a range of NVIDIA DRIVE solutions — including NVIDIA DRIVE Orin today and its successor, DRIVE Thor, down the road.

In addition, Foxconn will be a contract manufacturer of highly automated and autonomous, AI-rich EVs featuring the upcoming NVIDIA DRIVE Hyperion 9 platform, which includes DRIVE Thor and a state-of-the-art sensor architecture.

Next-Gen EVs With Extraordinary Performance  

The computational requirements for highly automated and fully self-driving vehicles are enormous. NVIDIA offers the most advanced, highest-performing AI car computers for the transportation industry, with DRIVE Orin selected for use by more than 25 global automakers.

Already a tier-one manufacturer of DRIVE Orin-powered electronic control units (ECUs), Foxconn will also manufacture ECUs featuring DRIVE Thor, once available.

The upcoming DRIVE Thor superchip harnesses advanced AI capabilities first deployed in NVIDIA Grace CPUs and Hopper and Ada Lovelace architecture-based GPUs — and is expected to deliver a staggering 2,000 teraflops of high-performance compute to enable functionally safe and secure intelligent driving.

Next-generation NVIDIA DRIVE Thor.

Heightened Senses

Unveiled at GTC last year, DRIVE Hyperion 9 is the latest evolution of NVIDIA’s modular development platform and reference architecture for automated and autonomous vehicles. Set to be powered by DRIVE Thor, it will integrate a qualified sensor architecture for level 3 urban and level 4 highway driving scenarios.

With a diverse and redundant array of high-resolution camera, radar, lidar and ultrasonic sensors, DRIVE Hyperion can process an extraordinary amount of safety-critical data to enable vehicles to deftly navigate their surroundings.

Another advantage of DRIVE Hyperion is its compatibility across generations, as it retains the same compute form factor and NVIDIA DriveWorks application programming interfaces, enabling a seamless transition from DRIVE Orin to DRIVE Thor and beyond.

Plus, DRIVE Hyperion can help speed development times and lower costs for electronics manufacturers like Foxconn, since the sensors available on the platform have cleared NVIDIA’s rigorous qualification processes.

The shift to software-defined vehicles with a centralized electronic architecture will drive the need for high-performance, energy-efficient computing solutions such as DRIVE Thor. By coupling it with the DRIVE Hyperion sensor architecture, Foxconn and its automotive customers will be better equipped to realize a new era of safe and intelligent EVs.

Since its inception, Hon Hai Tech Day has served as a launch pad for Foxconn to showcase its latest endeavors in contract design and manufacturing services and new technologies. These accomplishments span the EV sector and extend to the broader consumer electronics industry.

Catch more on Liu and Huang’s fireside chat at Hon Hai Tech Day.

Read More

Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker

Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in real time. Amazon Pharmacy provides a chat interface where customers (patients and doctors) can talk online with customer care representatives (agents). One challenge that agents face is finding the precise information when answering customers’ questions, because the diversity, volume, and complexity of healthcare’s processes (such as explaining prior authorizations) can be daunting. Finding the right information, summarizing it, and explaining it takes time, slowing down the speed to serve patients.

To tackle this challenge, Amazon Pharmacy built a generative AI question and answering (Q&A) chatbot assistant to empower agents to retrieve information with natural language searches in real time, while preserving the human interaction with customers. The solution is HIPAA compliant, ensuring customer privacy. In addition, agents submit their feedback related to the machine-generated answers back to the Amazon Pharmacy development team, so that it can be used for future model improvements.

In this post, we describe how Amazon Pharmacy implemented its customer care agent assistant chatbot solution using AWS AI products, including foundation models in Amazon SageMaker JumpStart to accelerate its development. We start by highlighting the overall experience of the customer care agent with the addition of the large language model (LLM)-based chatbot. Then we explain how the solution uses the Retrieval Augmented Generation (RAG) pattern for its implementation. Finally, we describe the product architecture. This post demonstrates how generative AI is integrated into an already working application in a complex and highly regulated business, improving the customer care experience for pharmacy patients.

The LLM-based Q&A chatbot

The following figure shows the process flow of a patient contacting Amazon Pharmacy customer care via chat (Step 1). Agents use a separate internal customer care UI to ask questions to the LLM-based Q&A chatbot (Step 2). The customer care UI then sends the request to a service backend hosted on AWS Fargate (Step 3), where the queries are orchestrated through a combination of models and data retrieval processes, collectively known as the RAG process. This process is the heart of the LLM-based chatbot solution and its details are explained in the next section. At the end of this process, the machine-generated response is returned to the agent, who can review the answer before providing it back to the end-customer (Step 4). It should be noted that agents are trained to exercise judgment and use the LLM-based chatbot solution as a tool that augments their work, so they can dedicate their time to personal interactions with the customer. Agents also label the machine-generated response with their feedback (for example, positive or negative). This feedback is then used by the Amazon Pharmacy development team to improve the solution (through fine-tuning or data improvements), forming a continuous cycle of product development with the user (Step 5).

Process flow and high level architecture

The following figure shows an example from a Q&A chatbot and agent interaction. Here, the agent was asking about a claim rejection code. The Q&A chatbot (Agent AI Assistant) answers the question with a clear description of the rejection code. It also provides the link to the original documentation for the agents to follow up, if needed.

Example screenshot from Q&A chatbot

Accelerating the ML model development

In the previous figure depicting the chatbot workflow, we skipped the details of how to train the initial version of the Q&A chatbot models. To do this, the Amazon Pharmacy development team benefited from using SageMaker JumpStart. SageMaker JumpStart allowed the team to experiment quickly with different models, running different benchmarks and tests and failing fast as needed. Failing fast is a practice scientists and developers follow to quickly build a solution that is as realistic as possible and learn from the effort to improve the next iteration. After the team decided on a model and performed any necessary fine-tuning and customization, they used SageMaker hosting to deploy the solution. Reusing the foundation models in SageMaker JumpStart allowed the development team to cut months of work that would otherwise have been needed to train models from scratch.
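
For readers who want to try the same fail-fast loop, deploying a JumpStart foundation model to a SageMaker real-time endpoint takes only a few lines with the SageMaker Python SDK. The model ID, instance type, and test prompt below are arbitrary examples, not necessarily what Amazon Pharmacy selected.

from sagemaker.jumpstart.model import JumpStartModel

# Pick any text-generation foundation model from the JumpStart catalog to benchmark.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")  # example model ID
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Quick smoke test of the hosted endpoint before deeper benchmarking.
# (The exact payload format depends on the chosen model's serving container.)
print(predictor.predict({"inputs": "What is a prior authorization?"}))

# Tear the endpoint down once the experiment is finished.
predictor.delete_endpoint()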

The RAG design pattern

One core part of the solution is the use of the Retrieval Augmented Generation (RAG) design pattern for implementing Q&A solutions. The first step in this pattern is to identify a set of known question and answer pairs, which is the initial ground truth for the solution. The next step is to convert the questions to a better representation for the purpose of similarity and searching, which is called embedding (we embed a higher-dimensional object into a hyperplane with fewer dimensions). This is done through an embedding-specific foundation model. These embeddings are used as indexes to the answers, much like how a database index maps a primary key to a row. We’re now ready to support new queries coming from the customer. As explained previously, customers send their queries to agents, who then interface with the LLM-based chatbot. Within the Q&A chatbot, the query is converted to an embedding and then used as a search key against the index built in the previous step. The matching criteria are based on a similarity model, such as FAISS or Amazon OpenSearch Service (for more details, refer to Amazon OpenSearch Service’s vector database capabilities explained). When there are matches, the top answers are retrieved and used as the prompt context for the generative model. This corresponds to the second step in the RAG pattern—the generative step. In this step, the prompt is sent to the LLM (the generator foundation model), which composes the final machine-generated response to the original question. This response is provided back through the customer care UI to the agent, who validates the answer, edits it if needed, and sends it back to the patient. The following diagram illustrates this process.

RAG flow
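
The retrieval and generation steps described above can be sketched as follows. This is a simplified, hypothetical illustration that assumes a local FAISS index and two SageMaker endpoints; the endpoint names, sample Q&A pairs, and request and response payload shapes are assumptions that depend on the chosen models, not the production implementation.

import json

import boto3
import faiss
import numpy as np

runtime = boto3.client("sagemaker-runtime")

# Placeholder endpoint names; the real names are internal to Amazon Pharmacy.
EMBEDDING_ENDPOINT = "embedding-model-endpoint"
LLM_ENDPOINT = "generator-llm-endpoint"


def embed(texts):
    """Call the embedding foundation model hosted on a SageMaker endpoint."""
    response = runtime.invoke_endpoint(
        EndpointName=EMBEDDING_ENDPOINT,
        ContentType="application/json",
        Body=json.dumps({"text_inputs": texts}),  # payload/response shape depends on the model
    )
    return np.array(json.loads(response["Body"].read())["embedding"], dtype="float32")


# Step 1: index the known questions so each one maps back to its answer.
qa_pairs = [
    ("What does claim rejection code 75 mean?", "Code 75 indicates prior authorization is required ..."),
    ("How do I check a prescription transfer status?", "Open the transfer dashboard and ..."),
]
question_vectors = embed([q for q, _ in qa_pairs])
index = faiss.IndexFlatL2(question_vectors.shape[1])
index.add(question_vectors)


def answer(agent_query, top_k=2):
    # Retrieval step: embed the agent's query and find the closest known questions.
    _, matches = index.search(embed([agent_query]), top_k)
    context = "\n".join(qa_pairs[i][1] for i in matches[0])

    # Generation step: the retrieved answers become the prompt context for the LLM.
    prompt = f"Use only this context to answer.\n{context}\n\nQuestion: {agent_query}\nAnswer:"
    response = runtime.invoke_endpoint(
        EndpointName=LLM_ENDPOINT,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    return json.loads(response["Body"].read())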

Managing the knowledge base

As we learned with the RAG pattern, the first step in performing Q&A consists of retrieving the data (the question and answer pairs) to be used as context for the LLM prompt. This data is referred to as the chatbot’s knowledge base. Examples of this data are Amazon Pharmacy internal standard operating procedures (SOPs) and information available in Amazon Pharmacy Help Center. To facilitate the indexing and the retrieval process (as described previously), it’s often useful to gather all this information, which may be hosted across different solutions such as in wikis, files, and databases, into a single repository. In the particular case of the Amazon Pharmacy chatbot, we use Amazon Simple Storage Service (Amazon S3) for this purpose because of its simplicity and flexibility.

Solution overview

The following figure shows the solution architecture. The customer care application and the LLM-based Q&A chatbot are deployed in their own VPCs for network isolation. The connection between the VPC endpoints is realized through AWS PrivateLink, keeping the traffic private. The Q&A chatbot likewise has its own AWS account for role separation, isolation, and ease of monitoring for security, cost, and compliance purposes. The Q&A chatbot orchestration logic is hosted in Fargate with Amazon Elastic Container Service (Amazon ECS). To set up PrivateLink, a Network Load Balancer proxies the requests to an Application Load Balancer, which terminates the end-client TLS connection and hands requests off to Fargate. The primary storage service is Amazon S3. As mentioned previously, the related input data is imported into the desired format inside the Q&A chatbot account and persisted in S3 buckets.

Solutions architecture

When it comes to the machine learning (ML) infrastructure, Amazon SageMaker is at the center of the architecture. As explained in the previous sections, two models are used, the embedding model and the LLM model, and these are hosted in two separate SageMaker endpoints. By using the SageMaker data capture feature, we can log all inference requests and responses for troubleshooting purposes, with the necessary privacy and security constraints in place. Next, the feedback taken from the agents is stored in a separate S3 bucket.
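
The data capture mentioned above is configured when the endpoints are deployed. The following minimal sketch uses the SageMaker Python SDK; the bucket name and sampling percentage are illustrative, not the production settings.

from sagemaker.model_monitor import DataCaptureConfig

# Log inference requests and responses to S3 for troubleshooting,
# subject to the solution's privacy and security constraints.
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,  # illustrative; tune per compliance and cost needs
    destination_s3_uri="s3://example-chatbot-logs/data-capture/",  # placeholder bucket
)

# Passed when deploying either the embedding model or the LLM, for example:
# predictor = model.deploy(
#     initial_instance_count=1,
#     instance_type="ml.g5.2xlarge",
#     data_capture_config=capture_config,
# )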

The Q&A chatbot is designed to be a multi-tenant solution and support additional health products from Amazon Health Services, such as Amazon Clinic. For example, the solution is deployed with AWS CloudFormation templates for infrastructure as code (IaC), allowing different knowledge bases to be used.

Conclusion

This post presented the technical solution for Amazon Pharmacy generative AI customer care improvements. The solution consists of a question answering chatbot implementing the RAG design pattern on SageMaker and foundation models in SageMaker JumpStart. With this solution, customer care agents can assist patients more quickly, while providing precise, informative, and concise answers.

The architecture uses modular microservices with separate components for knowledge base preparation and loading, chatbot (instruction) logic, embedding indexing and retrieval, LLM content generation, and feedback supervision. The latter is especially important for ongoing model improvements. The foundation models in SageMaker JumpStart are used for fast experimentation, with model serving done through SageMaker endpoints. Finally, the HIPAA-compliant chatbot server is hosted on Fargate.

In summary, we saw how Amazon Pharmacy is using generative AI and AWS to improve customer care while prioritizing responsible AI principles and practices.

You can start experimenting with foundation models in SageMaker JumpStart today to find the right foundation models for your use case and start building your generative AI application on SageMaker.


About the author

Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect located in Boston, MA. He helps global customers adopt AWS technologies and specifically AI/ML solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is passionate about yoga and meditation.

Jangwon Kim is a Sr. Applied Scientist at Amazon Health Store & Tech. He has expertise in LLM, NLP, Speech AI, and Search. Prior to joining Amazon Health, Jangwon was an applied scientist at Amazon Alexa Speech. He is based out of Los Angeles.

Alexandre Alves is a Sr. Principal Engineer at Amazon Health Services, specializing in ML, optimization, and distributed systems. He helps deliver wellness-forward health experiences.

Nirvay Kumar is a Sr. Software Dev Engineer at Amazon Health Services, leading architecture within Pharmacy Operations after many years in Fulfillment Technologies. With expertise in distributed systems, he has cultivated a growing passion for AI’s potential. Nirvay channels his talents into engineering systems that solve real customer needs with creativity, care, security, and a long-term vision. When not hiking the mountains of Washington, he focuses on thoughtful design that anticipates the unexpected. Nirvay aims to build systems that withstand the test of time and serve customers’ evolving needs.

Read More

Keeping an eye on your cattle using AI technology

Keeping an eye on your cattle using AI technology

At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create the most cost-effective and usable prototypes to help them systematically achieve their business goals.

This method is called working backwards at AWS. It means putting aside technology and solutions, starting from the expected results of customers, confirming their value, and then deducing what needs to be done in reverse order before finally implementing a solution. During the implementation phase, we also follow the concept of minimum viable product and strive to quickly form a prototype that can generate value within a few weeks, and then iterate on it.

Today, let’s review a case study in which AWS and New Hope Dairy collaborated to build a smart farm on the cloud. This post gives you a deep understanding of what AWS can provide for building a smart farm and how to build smart farm applications on the cloud with AWS experts.

Project background

Milk is a nutritious beverage. To promote public health, China has been actively encouraging the development of the dairy industry. According to data from Euromonitor International, sales of dairy products in China reached 638.5 billion RMB in 2020 and are expected to reach 810 billion RMB in 2025. In addition, the compound annual growth rate over the past 14 years has reached 10 percent, showing rapid development.

On the other hand, as of 2022, most of the revenue in the Chinese dairy industry still comes from liquid milk. Sixty percent of the raw milk is used for liquid milk and yogurt, and another 20 percent is milk powder—a derivative of liquid milk. Only a very small amount is used for highly processed products such as cheese and cream.

Liquid milk is a lightly processed product, and its output, quality, and cost are closely linked to raw milk. This means that if the dairy industry wants to free up capacity to focus on producing highly processed products, create new products, and conduct more innovative biotechnology research, it must first improve and stabilize the production and quality of raw milk.

As a dairy industry leader, New Hope Dairy has been thinking about how to improve the efficiency of its ranch operations and increase the production and quality of raw milk. New Hope Dairy hopes to use the third-party perspective and technological expertise of AWS to facilitate innovation in the dairy industry. With support and promotion from Liutong Hu, VP and CIO of New Hope Dairy, the AWS customer team began to organize operations and potential innovation points for the dairy farms.

Dairy farm challenges

AWS is an expert in the field of cloud technology, but to implement innovation in the dairy industry, professional advice from dairy subject matter experts is necessary. Therefore, we conducted several in-depth interviews with Liangrong Song, the Deputy Director of Production Technology Center of New Hope Dairy, the ranch management team, and nutritionists to understand some of the issues and challenges facing the farm.

First is taking inventory of reserve cows

The cattle on the ranch are divided into two types: dairy cows and reserve cows. Dairy cows are mature and continuously produce milk, while reserve cows have not yet reached milking age. Large and medium-sized farms usually provide reserve cows with a larger open activity area to create a more comfortable growing environment.

However, both dairy cows and reserve cows are assets of the farm and need to be inventoried monthly. Dairy cows are milked every day, and because they are relatively still during milking, inventory tracking is easy. Reserve cows, however, roam freely in an open space, which makes them inconvenient to inventory. Each time inventory is taken, several workers repeatedly count the reserve cows from different areas and then reconcile the numbers. This process takes several workers one to two days, and often the counts don’t align, or there is uncertainty about whether each cow has been counted.

Significant time can be saved if we have a way to inventory reserve cows quickly and accurately.

Second is identifying lame cattle

Currently, most dairy companies use a breed named Holstein to produce milk. Holsteins are the black and white cows most of us are familiar with. Despite most dairy companies using the same breed, there are still differences in milk production quantity and quality among different companies and ranches. This is because the health of dairy cows directly affects milk production.

However, cows cannot express discomfort on their own like humans can, and it isn’t practical for veterinarians to give thousands of cows physical examinations regularly. Therefore, we have to use external indicators to quickly judge the health status of cows.

smart ranch with aws

The external indicators of a cow’s health include body condition score and lameness degree. Body condition score is largely related to the cow’s body fat percentage and is a long-term indicator, while lameness is a short-term indicator caused by leg problems, foot infections, or other issues that affect the cow’s mood, health, and milk production. Additionally, adult Holstein cows can weigh over 500 kg, which can cause significant harm to their feet when their footing is unstable. Therefore, when lameness occurs, veterinarians should intervene as soon as possible.

According to a 2014 study, the proportion of severely lame cows in China can be as high as 31 percent. Although the situation might have improved since the study, the number of veterinarians on farms is extremely limited, making it difficult to monitor cows regularly. By the time lameness is detected, the situation is often severe: treatment is time-consuming and difficult, and milk production has already been affected.

If we have a way to detect lameness in cows early and prompt veterinarians to intervene at the mild lameness stage, the overall health and milk production of the cows will increase, and the performance of the farm will improve.

Lastly, there is feed cost optimization

Within the livestock industry, feed is the biggest variable cost. To ensure the quality and inventory of feed, farms often need to purchase feed ingredients from domestic and overseas suppliers and deliver them to feed formulation factories for processing. There are many types of modern feed ingredients, including soybean meal, corn, alfalfa, oat grass, and so on, which means that there are many variables at play. Each type of feed ingredient has its own price cycle and price fluctuations. During significant fluctuations, the total cost of feed can fluctuate by more than 15 percent, causing a significant impact.

Feed costs fluctuate, but dairy product prices are relatively stable over the long term. Consequently, under otherwise unchanged conditions, the overall profit can fluctuate significantly purely due to feed cost changes.

To avoid this fluctuation, it’s necessary to consider storing more ingredients when prices are low. But stocking up also requires judging whether the price is genuinely at a trough and deciding what quantity of feed to purchase based on the current consumption rate.

If we have a way to precisely forecast feed consumption and combine it with the overall price trend to suggest the best time and quantity of feed to purchase, we can reduce costs and increase efficiency on the farm.

It’s evident that these issues are directly related to the customer’s goal of improving farm operational efficiency, and the corresponding approaches are, respectively, freeing up labor, increasing production, and reducing costs. Through discussions on the difficulty and value of solving each issue, we chose increasing production as the starting point and prioritized solving the problem of lame cows.

Research

Before discussing technology, research had to be conducted. The research was jointly conducted by the AWS customer team; the AWS Generative AI Innovation Center, which managed the machine learning algorithm models; AWS AI Shanghai Lablet, which provided algorithm consultation based on the latest computer vision research; and the expert farming team from New Hope Dairy. The research was divided into several parts:

  • Understanding the traditional paper-based identification method of lame cows and developing a basic understanding of what lame cows are.
  • Confirming existing solutions, including those used in farms and in the industry.
  • Conducting farm environment research to understand the physical situation and limitations.

Through studying materials and observing on-site videos, the teams gained a basic understanding of lame cows. Readers can also get a basic idea of the posture of lame cows through the animated image below.

Lame Cows

In contrast, here is a relatively healthy cow.

healthy cow

Lame cows have visible differences in posture and gait compared to healthy cows.

Regarding existing solutions, most ranches rely on visual inspection by veterinarians and nutritionists to identify lame cows. In the industry, there are solutions that use wearable pedometers and accelerometers for identification, as well as solutions that use partitioned weighbridges, but both are relatively expensive. For the highly competitive dairy industry, we need to minimize identification costs as well as the cost of, and dependence on, non-generic hardware.

After discussing and analyzing the information with ranch veterinarians and nutritionists, the AWS Generative AI Innovation Center experts decided to use computer vision (CV) for identification, relying only on ordinary hardware: civilian surveillance cameras, which don’t add any additional burden to the cows and reduce costs and usage barriers.

After deciding on this direction, we visited a medium-sized farm with thousands of cows on site, investigated the ranch environment, and determined the location and angle of camera placement.

Initial proposal

Now, for the solution. The core of our CV-based solution consists of the following steps:

  • Cow identification: Identify multiple cows in a single frame of video and mark the position of each cow.
  • Cow tracking: While video is recording, we need to continuously track cows as the frames change and assign a unique number to each cow.
  • Posture marking: Reduce the dimensionality of cow movements by converting cow images to marked points.
  • Anomaly identification: Identify anomalies in the marked points’ dynamics.
  • Lame cow algorithm: Normalize the anomalies to obtain a score to determine the degree of cow lameness.
  • Threshold determination: Obtain a threshold based on expert inputs.

According to the judgment of the AWS Generative AI Innovation Center experts, the first few steps are generic requirements that can be solved using open-source models, while the latter steps require us to use mathematical methods and expert intervention.

Difficulties in the solution

To balance cost and performance, we chose the yolov5l model, a medium-sized pre-trained model for cow recognition, with an input width of 640 pixels, which provides good value for this scene.
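
As a rough sketch (not the production code), per-frame cow detection with a pre-trained YOLOv5l model loaded from torch.hub might look like the following; the video path and confidence threshold are illustrative.

import cv2
import torch

# Load the pre-trained YOLOv5l model from the public Ultralytics hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5l", pretrained=True)
model.conf = 0.4  # confidence threshold (illustrative)

cap = cv2.VideoCapture("barn_walkway.mp4")  # placeholder video file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Run detection at the 640-pixel input width mentioned above.
    results = model(rgb, size=640)
    detections = results.xyxy[0]  # rows of [x1, y1, x2, y2, confidence, class]
    # COCO class 19 is "cow"; keep only cow boxes for downstream tracking.
    cows = detections[detections[:, 5] == 19]
cap.release()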

While YOLOv5 is responsible for recognizing and tagging cows in a single image, in reality, videos consist of multiple images (frames) that change continuously. YOLOv5 cannot identify that cows in different frames belong to the same individual. To track and locate a cow across multiple images, another model called SORT is needed.

SORT stands for simple online and realtime tracking: online means it tracks using only the current and previous frames, without needing future frames, and realtime means it can assign each object’s identity immediately.

After the development of SORT, many engineers implemented and optimized it, leading to OC-SORT, which is observation-centric and more robust to occlusion and non-linear motion; DeepSORT (and its upgraded version, StrongSORT), which adds an appearance model trained mainly on pedestrian data; and ByteTrack, which uses a two-stage association step to take low-confidence detections into account. After testing, we found that for our scene, DeepSORT’s appearance tracking algorithm is more suitable for humans than for cows, and ByteTrack’s tracking accuracy is slightly weaker. As a result, we ultimately chose OC-SORT as our tracking algorithm.
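
The snippet below is a deliberately simplified illustration of the online association idea that SORT-family trackers build on: match the current frame’s detections to existing tracks by intersection-over-union (IoU), greedily, one frame at a time. The real OC-SORT additionally uses a Kalman filter and observation-centric corrections, so treat this only as a mental model.

import itertools

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

_next_id = itertools.count(1)

def update_tracks(tracks, detections, iou_threshold=0.3):
    """tracks: dict of cow_id -> last box; detections: boxes for this frame."""
    unmatched = list(detections)
    for cow_id, last_box in list(tracks.items()):
        if not unmatched:
            break
        # Greedily pick the detection that overlaps this track the most.
        best = max(unmatched, key=lambda box: iou(last_box, box))
        if iou(last_box, best) >= iou_threshold:
            tracks[cow_id] = best
            unmatched.remove(best)
    # Any detection left unmatched starts a new track, in other words a new cow ID.
    for box in unmatched:
        tracks[next(_next_id)] = box
    return tracks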

Next, we use DeepLabCut (DLC for short) to mark the skeletal points of the cows. DLC is a markerless pose estimation tool: although different points, such as the head and limbs, have different meanings to us, they are all just points to DLC, which only requires us to label the points and train the model.

This leads to a new question: how many points should we mark on each cow and where should we mark them? The answer to this question affects the workload of marking, training, and subsequent inference efficiency. To solve this problem, we must first understand how to identify lame cows.

Based on our research and the inputs of our expert clients, lame cows in videos exhibit the following characteristics:

  • An arched back: The neck and back are curved, forming a triangle with the root of the neck bone (arched-back).
  • Frequent nodding: Each step can cause the cow to lose balance or slip, resulting in frequent nodding (head bobbing).
  • Unstable gait: The cow’s gait changes after a few steps, with slight pauses (gait pattern change).

Comparison between healthy cow and lame cow

With regards to neck and back curvature as well as nodding, experts from the AWS Generative AI Innovation Center determined that marking only seven points (one on the head, one at the base of the neck, and five on the back) on each cow results in good identification. Since we now have a framework for identification, we should also be able to recognize unstable gait patterns.
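
With the seven points decided, the DLC workflow is roughly the following sketch, assuming a project has already been created and frames labeled; the paths are placeholders.

import deeplabcut

config_path = "/data/cow-pose/config.yaml"   # placeholder project config
videos = ["/data/videos/barn_walkway.mp4"]   # placeholder video list

# Train the markerless keypoint model on the labeled frames.
deeplabcut.train_network(config_path)

# Run inference on new videos; the per-frame coordinates of the seven points
# are written out (here as CSV) for the downstream lameness analysis.
deeplabcut.analyze_videos(config_path, videos, save_as_csv=True)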

Next, we use mathematical expressions to represent the identification results and form algorithms.

Human identification of these problems isn’t difficult, but precise algorithms are required for computer identification. For example, how does a program know the degree of curvature of a cow’s back given a set of cow back coordinate points? How does it know if a cow is nodding?

In terms of back curvature, we first considered treating the cow’s back as an angle: find the vertex of that angle and then calculate the angle at that vertex. The problem with this method is that the spine might curve in both directions, making the vertex of the angle difficult to identify. This requires switching to other algorithms to solve the problem.
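
As a small sketch of this initial idea (with illustrative keypoint names), the angle at a mid-back vertex can be computed from the neck-base and rear-back points; a flat back gives an angle close to 180 degrees, while a strongly arched back gives a noticeably smaller one.

import numpy as np

def back_angle_degrees(neck_base, mid_back, rear_back):
    """Angle at mid_back formed by the neck-base and rear-back keypoints."""
    a = np.asarray(neck_base, dtype=float) - np.asarray(mid_back, dtype=float)
    b = np.asarray(rear_back, dtype=float) - np.asarray(mid_back, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

print(back_angle_degrees((0, 0), (50, -8), (100, 0)))  # about 162 degrees; a flat back would give 180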

key-points-of-a-cow

In terms of nodding, we first considered using the Fréchet distance to determine whether the cow is nodding by comparing the difference in the curve of the cow’s overall posture. However, the problem is that the cow’s skeletal points might be displaced, causing significant distance between similar curves. To solve this problem, we need to extract the position of the head relative to the recognition box and normalize it.

After normalizing the position of the head, we encountered a new problem. In the image that follows, the graph on the left shows the change in the position of the cow’s head. We can see that due to recognition accuracy issues, the position of the head point will constantly shake slightly. We need to remove these small movements and find the relatively large movement trend of the head. This is where some knowledge of signal processing is needed. By using a Savitzky-Golay filter, we can smooth out a signal and obtain its overall trend, making it easier for us to identify nodding, as shown by the orange curve in the graph on the right.
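
A sketch of that processing step, with an illustrative window length and polynomial order, might look like this:

import numpy as np
from scipy.signal import savgol_filter

def normalized_head_height(head_y, box_top, box_bottom):
    """Vertical head position expressed as a fraction of the cow's bounding box."""
    head_y, box_top, box_bottom = map(np.asarray, (head_y, box_top, box_bottom))
    return (head_y - box_top) / (box_bottom - box_top + 1e-9)

# Per-frame values from detection and keypoint marking (stand-in data here).
head_y = 40 + 5 * np.random.rand(200)
box_top = np.full(200, 20.0)
box_bottom = np.full(200, 120.0)

signal = normalized_head_height(head_y, box_top, box_bottom)
# Smooth out the small jitter in the keypoints to expose the larger nodding trend.
trend = savgol_filter(signal, window_length=31, polyorder=3)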

key points curve

Additionally, after dozens of hours of video recognition, we found that some cows with extremely high back curvature scores actually did not have a hunched back. Further investigation revealed that this was because the cows used to train the DLC model were mostly black or black and white, and there weren’t many that were predominantly white or close to pure white, so the model recognized cows with large white areas on their bodies incorrectly, as shown by the red arrow in the figure below. This can be corrected through further model training.

In addition to solving the preceding problems, there were other generic problems that needed to be solved:

  • There are two paths in the video frame, and cows in the distance might also be recognized, causing problems.
  • The paths in the video also have a certain curvature, and the cow’s body length becomes shorter when the cow is on the sides of the path, making the posture easy to identify incorrectly.
  • Due to the overlap of multiple cows or occlusion from the fence, the same cow might be identified as two cows.
  • Due to tracking parameters and occasional frame skipping by the camera, the cows can’t always be tracked correctly, resulting in ID confusion issues.

In the short term, in line with the agreement with New Hope Dairy to deliver a minimum viable product and then iterate on it, these problems can usually be solved by outlier judgment algorithms combined with confidence filtering; when they can’t be solved, the affected footage becomes invalid data, which requires us to perform additional training and continuously iterate our algorithms and models.

In the long term, AWS AI Shanghai Lablet provided future experiment suggestions to solve the preceding problems based on their object-centric research: Bridging the Gap to Real-World Object-Centric Learning and Self-supervised Amodal Video Object Segmentation. Besides invalidating those outlier data, the issues can also be addressed by developing more precise object-level models for pose estimation, amodal segmentation, and supervised tracking. However, traditional vision pipelines for these tasks typically require extensive labeling. Object-centric learning focuses on tackling the binding problem of pixels to objects without additional supervision. The binding process not only provides information on the location of objects but also results in robust and adaptable object representations for downstream tasks. Because the object-centric pipeline focuses on self-supervised or weakly-supervised settings, we can improve performance without significantly increasing labeling costs for our customers.

After solving a series of problems and combining the scores given by the farm veterinarian and nutritionist, we have obtained a comprehensive lameness score for cows, which helps us identify cows with different degrees of lameness such as severe, moderate, and mild, and can also identify multiple body posture attributes of cows, helping further analysis and judgment.
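
The exact scoring formula belongs to the farm and its experts, but as a purely illustrative sketch, normalizing the individual anomaly measures into one score and bucketing it with expert-provided thresholds could look like this (the weights and thresholds are placeholders, not the values used on the farm):

def lameness_score(back_arch, head_bob, gait_variability,
                   weights=(0.4, 0.35, 0.25)):
    """Each input is assumed to be pre-normalized to the [0, 1] range."""
    w1, w2, w3 = weights
    return w1 * back_arch + w2 * head_bob + w3 * gait_variability

def lameness_grade(score, mild=0.3, moderate=0.55, severe=0.8):
    """Thresholds come from veterinarian and nutritionist scoring."""
    if score >= severe:
        return "severe"
    if score >= moderate:
        return "moderate"
    if score >= mild:
        return "mild"
    return "healthy"

print(lameness_grade(lameness_score(0.7, 0.6, 0.4)))  # -> "moderate"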

Within weeks, we developed an end-to-end solution for identifying lame cows. The hardware camera for this solution cost only 300 RMB, and Amazon SageMaker batch inference on a g4dn.xlarge instance took about 50 hours to process 2 hours of video, also totaling only about 300 RMB. Once in production, if five batches of cows are detected per week (roughly 10 hours of video), then including the rolling storage of videos and data, the monthly detection cost for a medium-sized ranch with several thousand cows is less than 10,000 RMB.

Currently, our machine learning model process is as follows:

  1. Raw video is recorded.
  2. Cows are detected and identified.
  3. Each cow is tracked, and key points are detected.
  4. Each cow’s movement is analyzed.
  5. A lameness score is determined.

identification process

Model deployment

We’ve described the machine learning solution for identifying lame cows. Now we need to deploy these models on SageMaker, as shown in the following figure.

Architecture diagram
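
As a minimal sketch of the batch inference part of this deployment (the model artifact, entry point, content type, and S3 locations are hypothetical placeholders), a SageMaker batch transform job on the g4dn.xlarge instance mentioned earlier can be launched as follows:

import sagemaker
from sagemaker.pytorch import PyTorchModel

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

model = PyTorchModel(
    model_data="s3://example-bucket/lame-cow/model.tar.gz",  # placeholder
    role=role,
    entry_point="inference.py",   # placeholder inference script
    framework_version="1.13",
    py_version="py39",
    sagemaker_session=session,
)

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    output_path="s3://example-bucket/lame-cow/scores/",  # placeholder
)

# Each object under the input prefix (for example, pre-extracted clips or
# frame batches) is processed and the lameness scores are written to S3.
transformer.transform(
    data="s3://example-bucket/lame-cow/videos/",   # placeholder
    content_type="application/octet-stream",       # assumption; depends on packaging
)
transformer.wait()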

Business implementation

Of course, what we’ve discussed so far is just the core of our technical solution. To integrate the entire solution into the business process, we also must address the following issues:

  • Data feedback: For example, we must provide veterinarians with an interface to filter and view lame cows that need to be processed and collect data during this process to use as training data.
  • Cow identification: After a veterinarian sees a lame cow, they also need to know the cow’s identity, such as its number and pen.
  • Cow positioning: In a pen with hundreds of cows, quickly locate the target cow.
  • Data mining: For example, find out how the degree of lameness affects feeding, rumination, rest, and milk production.
  • Data-driven: For example, identify the genetic, physiological, and behavioral characteristics of lame cows to achieve optimal breeding and reproduction.

Only by addressing these issues can the solution truly solve the business problem, and the collected data can generate long-term value. Some of these problems are system integration issues, while others are technology and business integration issues. We will share further information about these issues in future articles.

Summary

In this article, we briefly explained how the AWS Customer Solutions team innovates quickly based on the customer’s business. This mechanism has several characteristics:

  • Business led: Prioritize understanding the customer’s industry and business processes on site and in person before discussing technology, and then delve into the customer’s pain points, challenges, and problems to identify important issues that can be solved with technology.
  • Immediately available: Provide a simple but complete and usable prototype directly to the customer for testing, validation, and rapid iteration within weeks, not months.
  • Minimal cost: Minimize or even eliminate the customer’s costs before the value is truly validated, avoiding concerns about the future. This aligns with the AWS frugality leadership principle.

In our collaborative innovation project with the dairy industry, we not only started from the business perspective to identify specific business problems with business experts, but also conducted on-site investigations at the farm and factory with the customer. We determined the camera placement on site, installed and deployed the cameras, and deployed the video streaming solution. Experts from the AWS Generative AI Innovation Center dissected the customer’s requirements and developed the algorithm, which a solutions architect then engineered into the end-to-end solution.

With each inference, we could obtain thousands of decomposed and tagged cow walking videos, each with the original video ID, cow ID, lameness score, and various detailed scores. The complete calculation logic and raw gait data were also retained for subsequent algorithm optimization.

Lameness data can not only be used for early intervention by veterinarians, but also combined with milking machine data for cross-analysis, providing an additional validation dimension and answering some additional business questions, such as: What are the physical characteristics of cows with the highest milk yield? What is the effect of lameness on milk production in cows? What is the main cause of lame cows, and how can it be prevented? This information will provide new ideas for farm operations.

The story of identifying lame cows ends here, but the story of farm innovation has just begun. In subsequent articles, we will continue to discuss how we work closely with customers to solve other problems.


About the Authors


Hao Huang is an applied scientist at the AWS Generative AI Innovation Center. He specializes in Computer Vision (CV) and Visual-Language Models (VLM). Recently, he has developed a strong interest in generative AI technologies and has already collaborated with customers to apply these cutting-edge technologies to their business. He is also a reviewer for AI conferences such as ICCV and AAAI.


Peiyang He is a senior data scientist at the AWS Generative AI Innovation Center. She works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs leveraging GenAI/ML solutions. In her spare time, she enjoys skiing and traveling.


Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific and Greater China regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.


Tianjun Xiao is a senior applied scientist at the AWS AI Shanghai Lablet, co-leading the computer vision efforts. Presently, his primary focus lies in the realms of multimodal foundation models and object-centric learning. He is actively investigating their potential in diverse applications, including video analysis, 3D vision, and autonomous driving.


Zhang Dai is an AWS senior solutions architect for the China Geo Business Sector. He helps companies of various sizes achieve their business goals by providing consultancy on business processes, user experience, and cloud technology. He is a prolific blog writer and also the author of two books: The Modern Autodidact and Designing Experience.


Jianyu Zeng is a senior customer solutions manager at AWS, whose responsibility is to support customers, such as New Hope Group, during their cloud transition and assist them in realizing business value through cloud-based technology solutions. With a strong interest in artificial intelligence, he is constantly exploring ways to leverage AI to drive innovative changes in our customers’ businesses.


Carol Tong Min is a senior business development manager, responsible for Key Accounts in GCR GEO West, including two important enterprise customers: Jiannanchun Group and New Hope Group. She is customer obsessed, and always passionate about supporting and accelerating customers’ cloud journey.

Nick Jiang is a senior specialist sales representative on the AI/ML SSO team in China. He focuses on bringing innovative AI/ML solutions to customers and helping them build AI-related workloads on AWS.

Read More