Establishing an AI/ML center of excellence

The rapid advancements in artificial intelligence and machine learning (AI/ML) have made these technologies a transformative force across industries. According to a McKinsey study, generative AI is projected to deliver over $400 billion (5% of industry revenue) in productivity benefits across the financial services industry (FSI). According to Gartner, more than 80% of enterprises will have AI deployed by 2026. At Amazon, we believe innovation (rethink and reinvent) drives improved customer experiences and efficient processes, leading to increased productivity. Generative AI is a catalyst for business transformation, making it imperative for FSI organizations to determine where generative AI’s current capabilities could deliver the biggest value for FSI customers.

Organizations across industries face numerous challenges implementing generative AI, such as the lack of a clear business case, difficulty scaling beyond proof of concept, lack of governance, and availability of the right talent. An effective approach that addresses a wide range of observed issues is the establishment of an AI/ML center of excellence (CoE). An AI/ML CoE is a dedicated unit, either centralized or federated, that coordinates and oversees all AI/ML initiatives within an organization, bridging business strategy to value delivery. As observed by Harvard Business Review, an AI/ML CoE is already established in 37% of large companies in the US. For organizations to be successful in their generative AI journey, coordinated collaboration across lines of business and technical teams is increasingly important.

This post, along with the Cloud Adoption Framework for AI/ML and the Well-Architected Machine Learning Lens, serves as a guide for implementing an effective AI/ML CoE with the objective of capturing generative AI’s possibilities. This includes guiding practitioners to define the CoE mission, form a leadership team, integrate ethical guidelines, qualify and prioritize use cases, upskill teams, implement governance, create infrastructure, embed security, and enable operational excellence.

What is an AI/ML CoE?

The AI/ML CoE is responsible for partnering with lines of business and end-users in identifying AI/ML use cases aligned to business and product strategy, recognizing common reusable patterns from different business units (BUs), implementing a company-wide AI/ML vision, and deploying an AI/ML platform and workloads on the most appropriate combination of computing hardware and software. The CoE team synergizes business acumen with profound technical AI/ML proficiency to develop and implement interoperable, scalable solutions throughout the organization. They establish and enforce best practices encompassing design, development, processes, and governance operations, thereby mitigating risks and making sure robust business, technical, and governance frameworks are consistently upheld. For ease of consumption, standardization, scalability, and value delivery, the outputs of an AI/ML CoE can be of two types: guidance such as published guidance, best practices, lessons learned, and tutorials, and capabilities such as people skills, tools, technical solutions, and reusable templates.

The following are benefits of establishing an AI/ML CoE:

  • Faster time to market through a clear path to production
  • Maximized return on investments through delivering on the promise of generative AI business outcomes
  • Optimized risk management
  • Structured upskilling of teams
  • Sustainable scaling with standardized workflows and tooling
  • Better support and prioritization of innovation initiatives

The following figure illustrates the key components for establishing an effective AI/ML CoE.

AI/ML CoE framework

In the following sections, we discuss each numbered component in detail.

1. Sponsorship and mission

The foundational step in setting up an AI/ML CoE is securing sponsorship from senior leadership, defining the CoE’s mission and objectives, and establishing an empowered leadership structure.

Establish sponsorship

Establish clear leadership roles and structure to provide decision-making processes, accountability, and adherence to ethical and legal standards:

  • Executive sponsorship – Secure support from senior leadership to champion AI/ML initiatives
  • Steering committee – Form a committee of key stakeholders to oversee the AI/ML CoE’s activities and strategic direction
  • Ethics board – Create a board to address ethical and responsible AI considerations in AI/ML development and deployment

Define the mission

Making the mission customer- or product-focused and aligned with the organization’s overall strategic goals helps outline the AI/ML CoE’s role in achieving them. This mission, usually set by the executive sponsor in alignment with the heads of business units, serves as a guiding principle for all CoE activities, and contains the following:

  • Mission statement – Clearly articulate the purpose of the CoE in advancing customer and product outcomes by applying AI/ML technologies
  • Strategic objectives – Outline tangible and measurable AI/ML goals that align with the organization’s overall strategic goals
  • Value proposition – Quantify the expected business value through key performance indicators (KPIs) such as cost savings, revenue gains, user satisfaction, time savings, and time to market

2. People

According to a Gartner report, 53% of business, functional, and technical teams rate their technical acumen on generative AI as “Intermediate” and 64% of senior leadership rate their skill as “Novice.” By developing customized solutions tailored to the specific and evolving needs of the business, you can foster a culture of continuous growth and learning and cultivate a deep understanding of AI and ML technologies, including generative AI skill development and enablement.

Training and enablement

To help educate employees on AI/ML concepts, tools, and techniques, the AI/ML CoE can develop training programs, workshops, certification programs, and hackathons. These programs can be tailored to different levels of expertise and designed to help employees understand how to use AI/ML to solve business problems. Additionally, the CoE could provide a mentoring platform to employees who are interested in further enhancing their AI/ML skills, develop certification programs to recognize employees who have achieved a certain level of proficiency in AI/ML, and provide ongoing training to keep the team updated with the latest technologies and methodologies.

Dream team

Cross-functional engagement is essential to achieve well-rounded AI/ML solutions. Having a multidisciplinary AI/ML CoE that combines industry, business, technical, compliance, and operational expertise helps drive innovation and harnesses the full, 360-degree potential of AI in achieving a company’s strategic business goals. Such a diverse team with AI/ML expertise may include roles such as:

  • Product strategists – Make sure all products, features, and experiments are cohesive to the overall transformation strategy
  • AI researchers – Employ experts in the field to drive innovation and explore cutting-edge techniques such as generative AI
  • Data scientists and ML engineers – Develop capabilities for data preprocessing, model training, and validation
  • Domain experts – Collaborate with professionals from business units who understand the specific applications and business needs
  • Operations – Develop KPIs, demonstrate value delivery, and manage machine learning operations (MLOps) pipelines
  • Project managers – Appoint project managers to implement projects efficiently

Knowledge sharing

By fostering collaboration within the CoE, internal stakeholders, business unit teams, and external stakeholders, you can enable knowledge sharing and cross-disciplinary teamwork. Encourage knowledge sharing, establish a knowledge repository, and facilitate cross-functional projects to maximize the impact of AI/ML initiatives. Some example key actions to foster knowledge sharing are:

  • Cross-functional collaborations – Promote teamwork between experts in generative AI and business unit domain-specific professionals to innovate on cross-functional use cases
  • Strategic partnerships – Investigate partnerships with research institutions, universities, and industry leaders specializing in generative AI to take advantage of their collective expertise and insights

3. Governance

Establish governance that enables the organization to scale value delivery from AI/ML initiatives while managing risk, compliance, and security. Additionally, pay special attention to the changing nature of the risk and cost that is associated with the development as well as the scaling of AI.

Responsible AI

Organizations can navigate potential ethical dilemmas associated with generative AI by incorporating considerations such as fairness, explainability, privacy and security, robustness, governance, and transparency. To provide ethical integrity, an AI/ML CoE helps integrate robust guidelines and safeguards across the AI/ML lifecycle in collaboration with stakeholders. By taking a proactive approach, the CoE not only provides ethical compliance but also builds trust, enhances accountability, and mitigates potential risks such as veracity, toxicity, data misuse, and intellectual property concerns.

Standards and best practices

Continuing its stride towards excellence, the CoE helps define common standards, industry-leading practices, and guidelines. These encompass a holistic approach, covering data governance, model development, ethical deployment, and ongoing monitoring, reinforcing the organization’s commitment to responsible and ethical AI/ML practices. Examples of such standards include:

  • Development framework – Establishing standardized frameworks for AI development, deployment, and governance provides consistency across projects, making it easier to adopt and share best practices.
  • Repositories – Centralized code and model repositories facilitate the sharing of best practices and industry standard solutions in coding standards, enabling teams to adhere to consistent coding conventions for better collaboration, reusability, and maintainability.
  • Centralized knowledge hub – A central repository housing datasets and research discoveries to serve as a comprehensive knowledge center.
  • Platform – A central platform such as Amazon SageMaker for creation, training, and deployment. It helps manage and scale central policies and standards.
  • Benchmarking and metrics – Defining standardized metrics and benchmarking to measure and compare the performance of AI models, and the business value derived.

Data governance

Data governance is a crucial function of an AI/ML CoE: making sure data is collected, used, and shared in a responsible and trustworthy manner. Data governance is essential for AI applications, because these applications often use large amounts of data. The quality and integrity of this data are critical to the accuracy and fairness of AI-powered decisions. The AI/ML CoE helps define best practices and guidelines for data preprocessing, model development, training, validation, and deployment. The CoE should make sure that data is accurate, complete, and up-to-date; that data is protected from unauthorized access, use, or disclosure; and that data governance policies demonstrate adherence to regulatory and internal compliance.

Model oversight

Model governance is a framework that determines how a company implements policies, controls access to models, and tracks their activity. The CoE helps make sure that models are developed and deployed in a safe, trustworthy, and ethical fashion. Additionally, it can confirm that model governance policies demonstrate the organization’s commitment to transparency, fostering trust with customers, partners, and regulators. It can also provide safeguards customized to your application requirements and make sure responsible AI policies are implemented using services such as Guardrails for Amazon Bedrock.
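As an illustration, the following minimal sketch shows how a guardrail created in Amazon Bedrock can be attached to a model invocation through the Converse API. The guardrail identifier, version, and model ID are placeholders for values from your own account:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholders: create the guardrail in Amazon Bedrock first, then reference it here
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])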

Value delivery

Manage the AI/ML initiative’s return on investment, platform and services expenses, efficient and effective use of resources, and ongoing optimization. This requires monitoring and analyzing use case-based value KPIs and expenditures related to data storage, model training, and inference. This includes assessing the performance of various AI models and algorithms to identify cost-effective, resource-optimal solutions such as using AWS Inferentia for inference and AWS Trainium for training. Setting KPIs and metrics is pivotal to gauge effectiveness. Some example KPIs are listed below, followed by a small illustrative ROI calculation:

  • Return on investment (ROI) – Evaluating financial returns against investments justifies resource allocation for AI projects
  • Business impact – Measuring tangible business outcomes like revenue uplift or enhanced customer experiences validates AI’s value
  • Project delivery time – Tracking time from project initiation to completion showcases operational efficiency and responsiveness
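To make the ROI KPI concrete, here is a small illustrative calculation; all figures are invented for the example:

# Illustrative ROI calculation for an AI/ML initiative (hypothetical figures)
annual_benefit = 1_200_000  # cost savings plus incremental revenue attributed to the use case
annual_cost = 400_000       # platform, inference, and team costs

roi = (annual_benefit - annual_cost) / annual_cost
print(f"ROI: {roi:.0%}")  # ROI: 200%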

4. Platform

The AI/ML CoE, in collaboration with the business and technology teams, can help build an enterprise-grade and scalable AI platform, enabling organizations to operate AI-enabled services and products across business units. It can also help develop custom AI solutions and help practitioners adapt to change in AI/ML development.

Data and engineering architecture

The AI/ML CoE helps set up the right data flows and engineering infrastructure, in collaboration with the technology teams, to accelerate the adoption and scaling of AI-based solutions:

  • High-performance computing resources – GPU-accelerated Amazon Elastic Compute Cloud (Amazon EC2) instances, such as those powered by the latest NVIDIA H100 Tensor Core GPUs, are essential for training complex models.
  • Data storage and management – Implement robust data storage, processing, and management systems such as AWS Glue and Amazon OpenSearch Service.
  • Platform – Cloud platforms such as SageMaker provide flexibility and scalability for AI/ML projects, with end-to-end ML capabilities across generative AI experimentation, data preparation, model training, deployment, and monitoring. This further helps accelerate generative AI workloads from experimentation to production. Amazon Bedrock is an easier way to build and scale generative AI applications with foundation models (FMs). As a fully managed service, it offers a choice of high-performing FMs from leading AI companies including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon.
  • Development tools and frameworks – Use industry-standard AI/ML frameworks and tools such as Amazon CodeWhisperer, Apache MXNet, PyTorch, and TensorFlow.
  • Version control and collaboration tools – Git repositories, project management tools, and collaboration platforms can facilitate teamwork; examples include AWS CodePipeline and Amazon CodeGuru.
  • Generative AI frameworks – Utilize state-of-the-art foundation models, tools, agents, knowledge bases, and guardrails available on Amazon Bedrock.
  • Experimentation platforms – Deploy platforms for experimentation and model development, allowing for reproducibility and collaboration, such as Amazon SageMaker JumpStart.
  • Documentation – Emphasize the documentation of processes, workflows, and best practices within the platform to facilitate knowledge sharing among practitioners and teams.

Lifecycle management

Within the AI/ML CoE, the emphasis on scalability, availability, reliability, performance, and resilience is fundamental to the success and adaptability of AI/ML initiatives. Implementation and operationalization of a lifecycle management system such as MLOps can help automate deployment and monitoring, resulting in improved reliability, time to market, and observability. Using tools like Amazon SageMaker Pipelines for workflow management, Amazon SageMaker Experiments for managing experiments, and Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration enables adaptable deployment and management of AI/ML applications, fostering scalability and portability across various environments. Similarly, employing serverless architectures such as AWS Lambda empowers automatic scaling based on demand, reducing operational complexity while offering flexibility in resource allocation.
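As a minimal sketch of such lifecycle automation, the following defines a one-step SageMaker pipeline around an existing estimator; the estimator, execution role, and S3 data location are assumptions that would come from your own environment:

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# Assumptions: `estimator` is a configured SageMaker estimator and `role` is an execution role ARN
train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"training": TrainingInput(s3_data="s3://your-bucket/training-data/")},
)

pipeline = Pipeline(name="coe-mlops-pipeline", steps=[train_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # run on demand or from a schedule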

Strategic alliances in AI services

The decision to buy or build solutions involves trade-offs. Buying offers speed and convenience by using pre-built tools, but may lack customization. On the other hand, building provides tailored solutions but demands time and resources. The balance hinges on the project scope, timeline, and long-term needs, achieving optimal alignment with organizational goals and technical requirements. The decision, ideally, can be based on a thorough assessment of the specific problem to be solved, the organization’s internal capabilities, and the area of the business targeted for growth. For example, if the business system helps establish uniqueness, build to differentiate in the market; if the business system supports a standard, commoditized business process, buy to save.

By partnering with third-party AI service providers, such as AWS Generative AI Competency Partners, the CoE can use their expertise and experience to accelerate the adoption and scaling of AI-based solutions. These partnerships can help the CoE stay up to date with the latest AI/ML research and trends, and can provide access to cutting-edge AI/ML tools and technologies. Additionally, third-party AI service providers can help the CoE identify new use cases for AI/ML and can provide guidance on how to implement AI/ML solutions effectively.

5. Security

Emphasize, assess, and implement security and privacy controls across the organization’s data, AI/ML, and generative AI workloads. Integrate security measures across all aspects of AI/ML to identify, classify, remediate, and mitigate vulnerabilities and threats.

Holistic vigilance

Based on how your organization is using generative AI solutions, scope the security efforts, design resiliency of the workloads, and apply relevant security controls. This includes employing encryption techniques, multifactor authentication, threat detection, and regular security audits to make sure data and systems remain protected against unauthorized access and breaches. Regular vulnerability assessments and threat modeling are crucial to address emerging threats. Strategies such as model encryption, using secure environments, and continuous monitoring for anomalies can help protect against adversarial attacks and malicious misuse. To monitor models for threat detection, you can use tools like Amazon GuardDuty. With Amazon Bedrock, you have full control over the data you use to customize the foundation models for your generative AI applications. Data is encrypted in transit and at rest, and user inputs and model outputs are not shared with any model providers, keeping your data and applications secure and private.

End-to-end assurance

Securing the three critical components of any AI system (inputs, model, and outputs) is essential. Establishing clearly defined roles, security policies, standards, and guidelines across the lifecycle can help manage the integrity and confidentiality of the system. This includes implementing industry best practice measures and frameworks, such as NIST, the OWASP LLM and ML Top 10 lists, and MITRE ATLAS. Furthermore, evaluate and implement requirements such as Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) and the European Union’s General Data Protection Regulation (GDPR). You can use tools such as Amazon Macie to discover and protect your sensitive data.

Infrastructure (data and systems)

Given the sensitivity of the data involved, exploring and implementing access and privacy-preserving techniques is vital. This involves techniques such as least privilege access, data lineage, retaining only the data relevant to the use case, and identifying and classifying sensitive data to enable collaboration without compromising individual data privacy. It’s essential to embed these techniques within the AI/ML development lifecycle workflows, maintain a secure data and modeling environment, stay in compliance with privacy regulations, and protect sensitive information. By integrating security-focused measures into the AI/ML CoE’s strategies, the organization can better mitigate risks associated with data breaches, unauthorized access, and adversarial attacks, thereby providing integrity, confidentiality, and availability for its AI assets and sensitive information.

6. Operations

The AI/ML CoE needs to focus on optimizing the efficiency and growth potential of implementing generative AI within the organization’s framework. In this section, we discuss several key aspects aimed at driving successful integration while upholding workload performance.

Performance management

Setting KPIs and metrics is pivotal to gauge effectiveness. Regular assessment of these metrics allows you to track progress, identify trends, and foster a culture of continual improvement within the CoE. Reporting on these insights provides alignment with organizational objectives and informs decision-making processes for enhanced AI/ML practices. Solutions such as Amazon Bedrock’s integration with Amazon CloudWatch help track and manage usage metrics and build customized dashboards for auditing.

An example KPI is model accuracy: assessing models against benchmarks helps ensure reliable and trustworthy AI-generated outcomes.
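For example, a sketch like the following could pull Amazon Bedrock invocation counts from CloudWatch for a simple usage report; the model ID dimension is a placeholder:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Daily Bedrock invocation counts for one model over the past week
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=86400,  # one day
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), int(point["Sum"]))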

Incident management

AI/ML solutions need ongoing control and observation to manage any anomalous activities. This requires establishing processes and systems across the AI/ML platform, ideally automated. A standardized incident response strategy needs to be developed and implemented in alignment with the chosen monitoring solution. This includes elements such as formalized roles and responsibilities, data sources and metrics to be monitored, systems for monitoring, and response actions such as mitigation, escalation, and root cause analysis.

Continuous improvement

Define rigorous processes for generative AI model development, testing, and deployment. Streamline the development of generative AI models by defining and refining robust processes. Regularly evaluate the AI/ML platform performance and enhance generative AI capabilities. This involves incorporating feedback loops from stakeholders and end-users and dedicating resources to exploratory research and innovation in generative AI. These practices drive continual improvement and keep the CoE at the forefront of AI innovation. Furthermore, implement generative AI initiatives seamlessly by adopting agile methodologies, maintaining comprehensive documentation, conducting regular benchmarking, and implementing industry best practices.

7. Business

The AI/ML CoE helps drive business transformation by continuously identifying priority pain points and opportunities across business units. Aligning business challenges and opportunities to customized AI/ML capabilities, the CoE drives rapid development and deployment of high-value solutions. This alignment to real business needs enables step-change value creation through new products, revenue streams, productivity, optimized operations, and customer satisfaction.

Envision an AI strategy

With the objective to drive business outcomes, establish a compelling multi-year vision and strategy on how the adoption of AI/ML and generative AI techniques can transform major facets of the business. This includes quantifying the tangible value at stake from AI/ML in terms of revenues, cost savings, customer satisfaction, productivity, and other vital performance indicators over a defined strategic planning timeline, such as 3–5 years. Additionally, the CoE must secure buy-in from executives across business units by making the case for how embracing AI/ML will create competitive advantages and unlock step-change improvements in key processes or offerings.

Use case management

To identify, qualify, and prioritize the most promising AI/ML use cases, the CoE facilitates an ongoing discovery dialogue with all business units to surface their highest-priority challenges and opportunities. Each complex business issue or opportunity must be articulated by the CoE, in collaboration with business unit leaders, as a well-defined problem and opportunity statement that lends itself to an AI/ML-powered solution. These opportunities establish clear success metrics tied to business KPIs and outline the potential value impact vs. implementation complexity. A prioritized pipeline of high-potential AI/ML use cases can then be created, ranking opportunities based on expected business benefit and feasibility.

Proof of concept

Before undertaking full production development, prototype proposed solutions for high-value use cases through controlled proof of concept (PoC) projects focused on demonstrating initial viability. Rapid feedback loops during these PoC phases allow for iteration and refinement of approaches at a small scale prior to wider deployment. The CoE establishes clear success criteria for PoCs, in alignment with business unit leaders, that map to business metrics and KPIs for ultimate solution impact. Furthermore, the CoE can engage to share expertise, reusable assets, best practices, and standards.

Executive alignment

To provide full transparency, business unit executive stakeholders must be kept aligned with AI/ML initiatives through regular reporting. This way, any challenges that need to be escalated can be quickly resolved by executives who are familiar with the initiatives.

8. Legal

The legal landscape of AI/ML and generative AI is complex and evolving, presenting a myriad of challenges and implications for organizations. Issues such as data privacy, intellectual property, liability, and bias require careful consideration within the AI/ML CoE. As regulations struggle to keep pace with technological advancements, the CoE must partner with the organization’s legal team to navigate this dynamic terrain to enforce compliance and responsible development and deployment of these technologies. The evolving landscape demands that the CoE, working in collaboration with the legal team, develops comprehensive AI/ML governance policies covering the entire AI/ML lifecycle. This process involves business stakeholders in decision-making processes and regular audits and reviews of AI/ML systems to validate compliance with governance policies.

9. Procurement

The AI/ML CoE needs to work with partners, both independent software vendors (ISVs) and system integrators (SIs), to support its buy and build strategies. It needs to partner with the procurement team to develop a selection, onboarding, management, and exit framework. This includes acquiring technologies, algorithms, and datasets (sourcing reliable datasets is crucial for training ML models, and acquiring cutting-edge algorithms and generative AI tools enhances innovation). This helps accelerate the development of capabilities needed by the business. Procurement strategies must prioritize ethical considerations, data security, and ongoing vendor support to provide sustainable, scalable, and responsible AI integration.

10. Human Resources

Partner with Human Resources (HR) on AI/ML talent management and pipeline. This involves cultivating talent to understand, develop, and implement these technologies. HR can help bridge the technical and non-technical divide, fostering interdisciplinary collaboration, building a path for onboarding new talent, and training and growing that talent both professionally and in their skills. HR can also address ethical concerns through compliance training, upskill employees on the latest emerging technologies, and manage the impact on job roles, which is critical for continued success.

11. Regulatory and compliance

The regulatory landscape for AI/ML is rapidly evolving, with governments worldwide racing to establish governance regimes for the increasing adoption of AI applications. The AI/ML CoE needs a focused approach to stay updated, derive actions, and implement regulatory legislation such as Brazil’s General Personal Data Protection Law (LGPD), Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA), and the European Union’s General Data Protection Regulation (GDPR), as well as frameworks such as ISO 31700, ISO 29100, ISO 27701, Federal Information Processing Standards (FIPS), and the NIST Privacy Framework. In the US, regulatory actions include mitigating risks posed by the increased adoption of AI, protecting workers affected by generative AI, and providing stronger consumer protections. The EU AI Act includes new assessment and compliance requirements.

As AI regulations continue to take shape, organizations are advised to establish responsible AI as a C-level priority, set and enforce clear governance policies and processes around AI/ML, and involve diverse stakeholders in decision-making processes. The evolving regulations emphasize the need for comprehensive AI governance policies that cover the entire AI/ML lifecycle, and regular audits and reviews of AI systems to address biases, transparency, and explainability in algorithms. Adherence to standards fosters trust, mitigates risks, and promotes responsible deployment of these advanced technologies.

Conclusion

The journey to establishing a successful AI/ML center of excellence is a multifaceted endeavor that requires dedication and strategic planning, while operating with agility and collaborative spirit. As the landscape of artificial intelligence and machine learning continues to evolve at a rapid pace, the creation of an AI/ML CoE represents a necessary step towards harnessing these technologies for transformative impact. By focusing on the key considerations, from defining a clear mission to fostering innovation and enforcing ethical governance, organizations can lay a solid foundation for AI/ML initiatives that drive value. Moreover, an AI/ML CoE is not just a hub for technological innovation; it’s a beacon for cultural change within the organization, promoting a mindset of continuous learning, ethical responsibility, and cross-functional collaboration.

Stay tuned as we continue to explore the AI/ML CoE topics in our upcoming posts in this series. If you need help establishing an AI/ML Center of Excellence, please reach out to a specialist.


About the Authors

Ankush Chauhan is a Sr. Manager, Customer Solutions at AWS based in New York, US. He helps Capital Markets customers optimize their cloud journey, scale adoption, and realize the transformative value of building and inventing in the cloud. In addition, he is focused on enabling customers on their AI/ML journeys, including generative AI. Beyond work, you can find Ankush running, hiking, or watching soccer.

Ava Kong is a Generative AI Strategist at the AWS Generative AI Innovation Center, specializing in the financial services sector. Based in New York, Ava has worked closely with a variety of financial institutions on a range of use cases, combining the latest in generative AI technology with strategic insights to enhance operational efficiency, drive business outcomes, and demonstrate the broad and impactful application of AI technologies.

Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, US. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.

Rifat Jafreen is a Generative AI Strategist in the AWS Generative AI Innovation Center, where her focus is to help customers realize business value and operational efficiency using generative AI. She has worked in industries across telecom, finance, healthcare, and energy, and has onboarded machine learning workloads for numerous customers. Rifat is also very involved in MLOps, FMOps, and responsible AI.

The authors would like to extend special thanks to Arslan Hussain, David Ping, Jarred Graber, and Raghvender Arni for their support, expertise, and guidance.


‘Honkai: Star Rail’ Blasts Off on GeForce NOW

Gear up, Trailblazers — Honkai: Star Rail lands on GeForce NOW this week, along with an in-game reward for members to celebrate the title’s launch in the cloud.

Stream it today, along with five new games joining the GeForce NOW library of more than 1,900 titles this week.

Five Stars

Take a galactic journey in the cloud with Honkai: Star Rail, a new Cosmic Adventure Strategy role-playing game from HoYoverse, the company behind Genshin Impact. The title seamlessly blends intricate storytelling with immersive gameplay mechanics for an epic journey through the cosmos.

Meet a cast of unique characters and explore diverse planets, each with its own mysteries to uncover. Assemble formidable teams, strategically deploying skills and resources to overcome mighty adversaries and unravel the mysteries of the Honkai phenomenon. Encounter new civilizations and face off against threats that endanger the Astral Express, working together to overcome the struggles caused by Stellarons, powerful artifacts that hold the keys to the universe’s fate.

Begin the trailblazing journey without needing to wait for downloads or game updates with GeForce NOW. Members who’ve opted into GeForce NOW’s Rewards program will receive an email with a code for a Honkai: Star Rail starter kit, containing 30,000 credits, three Refined Aethers and three Traveler’s Guides. All aboard the Astral Express for adventures and thrills!

A Big Cloud for New Games 

Stream Little Kitty, Big City on GeForce MEOW.

Do what cats do best in Little Kitty, Big City, the open-world adventure game from Double Dagger Studios. Explore the city as a curious little kitty with a big personality, make new friends with stray animals, and wear delightful little hats. Create a little bit of chaos finding the way back home throughout the big city.

Here’s the full list of new games this week:

  • Little Kitty, Big City (New release on Steam and Xbox, available on PC Game Pass, May 9)
  • Farmer’s Life (Steam)
  • Honkai: Star Rail (Epic Games Store)
  • Supermarket Simulator (Steam)
  • Tomb Raider: Definitive Edition (Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.


Build a Hugging Face text classification model in Amazon SageMaker JumpStart

Amazon SageMaker JumpStart provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including image, text, and tabular.

This post introduces using the text classification and fill-mask models available on Hugging Face in SageMaker JumpStart for text classification on a custom dataset. We also demonstrate performing real-time and batch inference for these models. This supervised learning algorithm supports transfer learning for all pre-trained models available on Hugging Face. It takes a piece of text as input and outputs the probability for each of the class labels. You can fine-tune these pre-trained models using transfer learning even when a large corpus of text isn’t available. It’s available in the SageMaker JumpStart UI in Amazon SageMaker Studio. You can also use it through the SageMaker Python SDK, as demonstrated in the example notebook Introduction to SageMaker HuggingFace – Text Classification.

Solution overview

Text classification with Hugging Face in SageMaker provides transfer learning on all pre-trained models available on Hugging Face. According to the number of class labels in the training data, a classification layer is attached to the pre-trained Hugging Face model. Then either the whole network, including the pre-trained model, or only the top classification layer can be fine-tuned on the custom training data. In this transfer learning mode, training can be achieved even with a smaller dataset.

In this post, we demonstrate how to do the following:

  • Use the new Hugging Face text classification algorithm
  • Perform inference with the Hugging Face text classification algorithm
  • Fine-tune the pre-trained model on a custom dataset
  • Perform batch inference with the Hugging Face text classification algorithm

Prerequisites

Before you run the notebook, you must complete some initial setup steps. Let’s set up the SageMaker execution role so it has permissions to run AWS services on your behalf:

!pip install sagemaker --upgrade --quiet

import sagemaker, boto3, json
from sagemaker.session import Session
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

Run inference on the pre-trained model

SageMaker JumpStart supports inference for any text classification model available through Hugging Face. The model can be hosted for inference and supports text as the application/x-text content type. This not only allows you to use a set of pre-trained models, but also enables you to choose other classification tasks.

The output contains the probability values, class labels for all classes, and the predicted label corresponding to the class index with the highest probability encoded in JSON format. The model processes a single string per request and outputs only one line. The following is an example of a JSON format response:

accept: application/json;verbose
{"probabilities": [prob_0, prob_1, prob_2, ...],
"labels": [label_0, label_1, label_2, ...],
"predicted_label": predicted_label}

If accept is set to application/json, then the model only outputs probabilities. For more details on training and inference, see the sample notebook.

You can run inference on the text classification model by passing the model_id in the environment variable while creating the object of the Model class. See the following code:

from sagemaker.jumpstart.model import JumpStartModel

# Pass any other HF_MODEL_ID from https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
HF_MODEL_ID = 'distilbert-base-uncased-finetuned-sst-2-english'

hub = {}
hub['HF_MODEL_ID'] = HF_MODEL_ID
hub['HF_TASK'] = 'text-classification'

infer_model_id = 'huggingface-tc-models'  # JumpStart model ID for Hugging Face text classification models
model = JumpStartModel(model_id=infer_model_id, env=hub, enable_network_isolation=False)
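With the model object created, a brief sketch of hosting and querying it might look like the following; the instance type and sample text are illustrative:

# Deploy the model to a real-time endpoint and send a sample request
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # assumption: choose an instance type suited to your model
)

response = predictor.predict("simply the best movie I have seen all year")
print(response)  # JSON response with probabilities, labels, and predicted_label, as described earlier

predictor.delete_endpoint()  # clean up when finished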

Fine-tune the pre-trained model on a custom dataset

You can fine-tune each of the pre-trained fill-mask or text classification models to any given dataset made up of text sentences with any number of classes. The pretrained model attaches a classification layer to the text embedding model and initializes the layer parameters to random values. The output dimension of the classification layer is determined based on the number of classes detected in the input data. The objective is to minimize classification errors on the input data. Then you can deploy the fine-tuned model for inference.

The following are the instructions for how the training data should be formatted for input to the model:

  • Input – A directory containing a data.csv file. Each row of the first column should have an integer class label between 0 and the number of classes minus 1. Each row of the second column should have the corresponding text data.
  • Output – A fine-tuned model that can be deployed for inference or further trained using incremental training.

The following is an example of an input CSV file. The file should not have any header. The file should be hosted in an Amazon Simple Storage Service (Amazon S3) bucket with a path similar to the following: s3://bucket_name/input_directory/. The trailing / is required.

0,"hide new secretions from the parental units"
0,"contains no wit , only labored gags"
1,"that loves its characters and communicates something rather beautiful about human nature"
...
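A small sketch of preparing and uploading such a file might look like the following; the bucket and prefix mirror the example path above and are placeholders:

import boto3
import pandas as pd

# Integer label first, text second, no header (quoting handles commas inside the text)
df = pd.DataFrame([
    (0, "hide new secretions from the parental units"),
    (1, "that loves its characters and communicates something rather beautiful about human nature"),
])
df.to_csv("data.csv", header=False, index=False)

# Upload to the input directory; remember the trailing / in the S3 path passed to training
boto3.client("s3").upload_file("data.csv", "bucket_name", "input_directory/data.csv")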

The algorithm also supports transfer learning for Hugging Face pre-trained models. Each model is identified by a unique model_id. The following example shows how to fine-tune a BERT base model identified by model_id=huggingface-tc-bert-base-cased on a custom training dataset. The pre-trained model tarballs have been pre-downloaded from Hugging Face and saved with the appropriate model signature in S3 buckets, such that the training job runs in network isolation.

For transfer learning on your custom dataset, you might need to change the default values of the training hyperparameters. You can fetch a Python dictionary of these hyperparameters with their default values by calling hyperparameters.retrieve_default, update them as needed, and then pass them to the Estimator class. The hyperparameter train_only_top_layer defines which model parameters change during the fine-tuning process. If train_only_top_layer is True, parameters of the classification layers change and the rest of the parameters remain constant during the fine-tuning process. If train_only_top_layer is False, all parameters of the model are fine-tuned. See the following code:

from sagemaker import hyperparameters

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

For this use case, we provide SST2 as a default dataset for fine-tuning the models. The dataset contains positive and negative movie reviews. It has been downloaded from TensorFlow under the Apache 2.0 License. The following code provides the default training dataset hosted in S3 buckets:

# Sample training data is available in this bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/SST/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

We create an Estimator object by providing the model_id and hyperparameters values as follows:

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Create SageMaker Estimator instance
tc_estimator = JumpStartEstimator(
    hyperparameters=hyperparameters,
    model_id=dropdown.value,  # model ID selected in the notebook's dropdown widget
    instance_type=training_instance_type,
    metric_definitions=training_metric_definitions,
    output_path=s3_output_location,
    enable_network_isolation=False if model_id == "huggingface-tc-models" else True,
)

To launch the SageMaker training job for fine-tuning the model, call .fit on the object of the Estimator class, while passing the S3 location of the training dataset:

# Launch a SageMaker Training job by passing s3 path of the training data
tc_estimator.fit({"training": training_dataset_s3_path}, logs=True)

You can view performance metrics such as training loss and validation accuracy/loss through Amazon CloudWatch while training. You can also fetch these metrics and analyze them using TrainingJobAnalytics:

from sagemaker import TrainingJobAnalytics

# training_job_name is the name of the completed training job (for example, tc_estimator.latest_training_job.name)
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()  # produces a dataframe with different metrics
df.head(10)

The following graph shows different metrics collected from the CloudWatch log using TrainingJobAnalytics.

For more information about how to use the new SageMaker Hugging Face text classification algorithm for transfer learning on a custom dataset, deploy the fine-tuned model, run inference on the deployed model, and deploy the pre-trained model as is without first fine-tuning on a custom dataset, see the following example notebook.

Fine-tune any Hugging Face fill-mask or text classification model

SageMaker JumpStart supports the fine-tuning of any pre-trained fill-mask or text classification Hugging Face model. You can download the required model from the Hugging Face hub and perform the fine-tuning. To use these models, the model_id is provided in the hyperparameters as hub_key. See the following code:

HF_MODEL_ID = "distilbert-base-uncased" # Specify the HF_MODEL_ID here from https://huggingface.co/models?pipeline_tag=fill-mask&sort=downloads or https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
hyperparameters["hub_key"] = HF_MODEL_ID

Now you can construct an object of the Estimator class by passing the updated hyperparameters. You call .fit on the object of the Estimator class while passing the S3 location of the training dataset to perform the SageMaker training job for fine-tuning the model.
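Putting those pieces together, a hedged sketch might look like the following; hyperparameters (with hub_key set), training_instance_type, and training_dataset_s3_path are assumed to be defined as in the earlier snippets:

from sagemaker.jumpstart.estimator import JumpStartEstimator

hf_estimator = JumpStartEstimator(
    model_id="huggingface-tc-models",
    hyperparameters=hyperparameters,  # includes hub_key = HF_MODEL_ID
    instance_type=training_instance_type,
    enable_network_isolation=False,  # needed to download the model from the Hugging Face hub
)

hf_estimator.fit({"training": training_dataset_s3_path}, logs=True)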

Fine-tune a model with automatic model tuning

SageMaker automatic model tuning (AMT), also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose. In the following code, you use a HyperparameterTuner object to interact with SageMaker hyperparameter tuning APIs:

from sagemaker.tuner import ContinuousParameter

# Define the objective metric based on which the best model will be selected
amt_metric_definitions = {
    "metrics": [{"Name": "val_accuracy", "Regex": "'eval_accuracy': ([0-9\.]+)"}],
    "type": "Maximize",
}

# You can select from the hyperparameters supported by the model, and configure ranges of
# values to be searched for training the optimal model
# (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html)
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.00001, 0.0001, scaling_type="Logarithmic")
}

# Increase the total number of training jobs run by AMT, for increased accuracy (and training time)
max_jobs = 6

# Change parallel training jobs run by AMT to reduce total training time, constrained by your account limits
# If max_jobs=max_parallel_jobs, then Bayesian search turns to Random
max_parallel_jobs = 2

After you have defined the arguments for the HyperparameterTuner object, you pass it the Estimator and start the training. This will find the best-performing model.
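A minimal sketch of that step, reusing the estimator and the definitions above, might look like this:

from sagemaker.tuner import HyperparameterTuner

# Assumption: tc_estimator is the JumpStartEstimator created earlier
tuner = HyperparameterTuner(
    estimator=tc_estimator,
    objective_metric_name="val_accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=amt_metric_definitions["metrics"],
    objective_type=amt_metric_definitions["type"],
    max_jobs=max_jobs,
    max_parallel_jobs=max_parallel_jobs,
)

# Launch the tuning job; each trial fine-tunes the model with a sampled configuration
tuner.fit({"training": training_dataset_s3_path})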

Perform batch inference with the Hugging Face text classification algorithm

If the goal of inference is to generate predictions from a trained model on a large dataset where minimizing latency isn’t a concern, then the batch inference functionality may be the most straightforward, scalable, and appropriate option.

Batch inference is useful in the following scenarios:

  • Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset
  • Get inference from large datasets
  • Run inference when you don’t need a persistent endpoint
  • Associate input records with inferences to assist the interpretation of results

For running batch inference in this use case, you first download the SST2 dataset locally. Remove the class label from it and upload it to Amazon S3 for batch inference. You create the object of the Model class without providing the endpoint and create the batch transformer object from it. You use this object to provide batch predictions on the input data. See the following code:

batch_transformer = model.transformer(
    instance_count=1,
    instance_type=inference_instance_type,
    output_path=output_path,
    assemble_with="Line",
    accept="text/csv",
)

batch_transformer.transform(
    input_path, content_type="text/csv", split_type="Line"
)

batch_transformer.wait()

After you run batch inference, you can compare the prediction accuracy on the SST2 dataset.
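As a hedged sketch, that comparison could be computed as follows, assuming the held-out labels and the parsed batch output have been saved as one value per line in the same row order; both file names are placeholders:

import pandas as pd

ground_truth = pd.read_csv("sst2_labels.csv", header=None, names=["label"])
predictions = pd.read_csv("predicted_labels.csv", header=None, names=["predicted_label"])

# Fraction of rows where the batch prediction matches the held-out label
accuracy = (ground_truth["label"] == predictions["predicted_label"]).mean()
print(f"Batch inference accuracy on SST2: {accuracy:.3f}")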

Conclusion

In this post, we discussed the SageMaker Hugging Face text classification algorithm. We provided example code to perform transfer learning on a custom dataset using a pre-trained model in network isolation using this algorithm. We also provided the functionality to use any Hugging Face fill-mask or text classification model for inference and transfer learning. Lastly, we used batch inference to run inference on large datasets. For more information, check out the example notebook.


About the authors

Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his master’s from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.

Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.


‘Get On the Train,’ NVIDIA CEO Says at ServiceNow’s Knowledge 2024

Now’s the time to hop aboard AI, NVIDIA founder and CEO Jensen Huang declared Wednesday as ServiceNow unveiled a demo of futuristic AI avatars together with NVIDIA during a keynote at the Knowledge 24 conference in Las Vegas.

“If something is moving a million times faster every 10 years, what should you do?” Huang asked, citing rapid advancements in AI capabilities. “The first thing you should do is instead of looking at the train, from the side is … get on the train, because on the train, it’s not moving that fast.”

The demo — built on NVIDIA NIM inference microservices and NVIDIA Avatar Cloud Engine, or ACE, speech and animation generative AI technologies, all available with NVIDIA AI Enterprise software — highlighted how AI advancements support cutting-edge digital avatar communications and have the potential to revolutionize customer service interactions.

The demo showed a customer who was struggling with a slow internet connection interacting with a digital avatar. The AI customer service avatar comes to the rescue: it swiftly diagnoses the problem, offers an option for a faster internet connection, confirms the customer’s credit card number, and upgrades the internet connection immediately.

The futuristic demonstration took place in front of thousands of conference attendees who were eager to learn about the latest enterprise generative AI technology advancements, which promise to empower workers across the globe.

“We’ve transitioned from instruction-driven computer coding, which very few people can do, to intention-driven computing, which is connecting with somebody through intention,” Huang said during an on-stage conversation at the conference with ServiceNow Chief Operating Officer Chirantan “CJ” Desai.

The moment is another compelling example of the ongoing collaboration between ServiceNow and NVIDIA to explore more engaging, personal service experiences across various functions, including IT services, human resources, customer support and more.

The demonstration builds upon the companies’ plan to collaborate on robust, generative AI capabilities within enterprise operations and incorporates NVIDIA ACE and NVIDIA NIM microservices.

These avatars are designed to add a human-like touch to digital interactions, improving customer experience by providing empathetic and efficient support.

The NVIDIA ACE technologies used include NVIDIA Riva for automatic speech recognition and text-to-speech, NVIDIA Audio2Face for facial animation, and NVIDIA Omniverse Renderer for high-quality visual output.

ServiceNow and NVIDIA are further exploring the use of AI avatars to provide another communication option for users who prefer visual interactions.


A recording of Huang and Desai presenting the digital avatar demo at the Knowledge 24 keynote is available online.




How Dialog Axiata used Amazon SageMaker to scale ML models in production with AI Factory and reduced customer churn within 3 months

The telecommunications industry is more competitive than ever before. With customers able to easily switch between providers, reducing customer churn is a crucial priority for telecom companies who want to stay ahead. To address this challenge, Dialog Axiata has pioneered a cutting-edge solution called the Home Broadband (HBB) Churn Prediction Model.

This post explores the intricacies of Dialog Axiata’s approach, from the meticulous creation of nearly 100 features across 10 distinct areas to the implementation of two essential models using Amazon SageMaker (see the illustrative sketch after this list):

  • A base model powered by CatBoost, an open source implementation of the Gradient Boosting Decision Tree (GBDT) algorithm
  • An ensemble model, taking advantage of the strengths of multiple machine learning (ML) models
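The post doesn’t share Dialog Axiata’s model code, but a minimal CatBoost sketch of such a base churn model, trained here on synthetic stand-in data, could look like this:

import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the ~100 engineered churn features and binary churn labels
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 100))
y = rng.integers(0, 2, size=5000)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Base churn model: a CatBoost gradient-boosted decision tree classifier
model = CatBoostClassifier(iterations=500, learning_rate=0.05, eval_metric="AUC", verbose=100)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

# Churn probability per customer, used to rank retention campaign targets
churn_probability = model.predict_proba(X_val)[:, 1]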

About Dialog Axiata

Dialog Axiata PLC (part of the Axiata Group Berhad) is one of Sri Lanka’s largest quad-play telecommunications service providers and the country’s largest mobile network operator with 17.1 million subscribers, which amounts to 57% of the Sri Lankan mobile market. Dialog Axiata provides a variety of services, such as fixed-line, home broadband, mobile, television, payment apps, and financial services in Sri Lanka.

In 2022, Dialog Axiata made significant progress in their digital transformation efforts, with AWS playing a key role in this journey. They focused on improving customer service using data with artificial intelligence (AI) and ML and saw positive results, with their Group AI Maturity increasing from 50% to 80%, according to the TM Forum’s AI Maturity Index.

Dialog Axiata runs some of their business-critical telecom workloads on AWS, including Charging Gateway, Payment Gateway, Campaign Management System, SuperApp, and various analytics tasks. They use a variety of AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Kubernetes Service (Amazon EKS) for computing, Amazon Relational Database Service (Amazon RDS) for databases, Amazon Simple Storage Service (Amazon S3) for object storage, Amazon OpenSearch Service for search and analytics, SageMaker for ML, and AWS Glue for data integration. This strategic use of AWS services delivers efficiency and scalability of their operations, as well as the implementation of advanced AI/ML applications.

For more about how Axiata uses AWS services, see Axiata Selects AWS as its Primary Cloud Provider to Drive Innovation in the Telecom Industry.

Challenges with understanding customer churn

The Sri Lankan telecom market has high churn rates due to several factors. Multiple mobile operators provide similar services, making it easy for customers to switch between providers. Prepaid services dominate the market, and multi-SIM usage is widespread. These conditions lead to a lack of customer loyalty and high churn rates.

In addition to its core business of mobile telephony, Dialog Axiata also offers a number of services, including broadband connections and Dialog TV. However, customer churn is a common issue in the telecom industry. Therefore, Dialog Axiata needs to find ways to reduce their churn rate and retain more of their existing home broadband customers. Potential solutions could involve improving customer satisfaction, enhancing value propositions, analyzing reasons for churn, or implementing customer retention initiatives. The key is for Dialog Axiata to gain insights into why customers are leaving and take meaningful actions to increase customer loyalty and satisfaction.

Solution overview

To reduce customer churn, Dialog Axiata used SageMaker to build a predictive model that assigns each customer a churn risk score. The model was trained on demographic, network usage, and network outage data from across the organization. By predicting churn 45 days in advance, Dialog Axiata is able to proactively retain customers and significantly reduce customer churn.

Dialog Axiata’s churn prediction approach is built on a robust architecture involving two distinct pipelines: one dedicated to training the models, and the other for inference or making predictions. The training pipeline is responsible for developing the base model, which is a CatBoost model trained on a comprehensive set of features. To further enhance the predictive capabilities, an ensemble model is also trained to identify potential churn instances that may have been missed by the base model. This ensemble model is designed to capture additional insights and patterns that the base model alone may not have effectively captured.
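To make the two-stage design concrete, the following is a minimal sketch of how a CatBoost base model might be trained and how a churn threshold could be derived from the ROC curve (the methodologies section notes the threshold combines ROC optimization with business requirements). The dataset path, feature names, and hyperparameters are illustrative assumptions, not Dialog Axiata’s actual configuration.

```python
# A minimal sketch, assuming a prepared feature table with a binary
# "churned" label; the file name and hyperparameters are illustrative.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

df = pd.read_parquet("hbb_training_features.parquet")  # hypothetical path
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = CatBoostClassifier(
    iterations=500,
    learning_rate=0.05,
    eval_metric="Precision",  # precision is the stated evaluation parameter
    verbose=100,
)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

# Pick a starting threshold from the ROC curve (maximize TPR - FPR);
# business rules would then adjust this value.
fpr, tpr, thresholds = roc_curve(y_val, model.predict_proba(X_val)[:, 1])
churn_threshold = thresholds[(tpr - fpr).argmax()]
print(f"Churn probability threshold: {churn_threshold:.3f}")
```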

The integration of the ensemble model alongside the base model creates a synergistic effect, resulting in a more comprehensive and accurate inference process. By combining the strengths of both models, Dialog Axiata’s churn prediction system gains an enhanced overall predictive capability, providing a more robust and reliable identification of customers at risk of churning.

Both the training and inference pipelines are run three times per month, aligning with Dialog Axiata’s billing cycle. This regular schedule makes sure that the models are trained and updated with the latest customer data, enabling timely and accurate churn predictions.

In the training process, features are sourced from Amazon SageMaker Feature Store, which houses nearly 100 carefully curated features. Because real-time inference is not a requirement for this specific use case, an offline feature store is used to store and retrieve the necessary features efficiently. This approach allows for batch inference, significantly reducing daily expenses to under $0.50 while processing batch sizes averaging around 100,000 customers within a reasonable runtime of approximately 50 minutes.
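As an illustration of this setup, here is a minimal sketch of reading features from the offline store using the Athena query support built into the SageMaker Python SDK. The feature group name and S3 output location are hypothetical.

```python
# A minimal sketch, assuming an existing feature group; the name and
# output location are hypothetical.
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
feature_group = FeatureGroup(
    name="hbb-churn-features", sagemaker_session=session  # hypothetical name
)

# The offline store is backed by S3 and queryable through Athena.
query = feature_group.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location=f"s3://{session.default_bucket()}/feature-store-queries/",
)
query.wait()
features_df = query.as_dataframe()  # batch of ~100,000 customers per run
```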

Dialog Axiata has meticulously selected instance types to strike a balance between optimal resource utilization and cost-effectiveness. However, should the need arise for faster pipeline runtime, larger instance types can be recommended. This flexibility allows Dialog Axiata to adjust the pipeline’s performance based on specific requirements, while considering the trade-off between speed and cost considerations.

After the predictions are generated separately using both the base model and the ensemble model, Dialog Axiata takes action to retain the customers identified as potential churn risks. The customers predicted to churn by the base model, along with those exclusively identified by the ensemble model, are targeted with personalized retention campaigns. By excluding any overlapping customers between the two models, Dialog Axiata ensures a focused and efficient outreach strategy.
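The targeting logic can be expressed in a few lines. The following sketch, with illustrative column names and toy data, shows how base-model churners are combined with customers flagged only by the ensemble model, without double-counting the overlap:

```python
import pandas as pd

# Toy outputs standing in for the two models' prediction tables.
base = pd.DataFrame({"cx_id": [1, 2, 3], "churn_prob": [0.91, 0.35, 0.78]})
ensemble = pd.DataFrame({"cx_id": [3, 4], "churn": [1, 1]})

threshold = 0.5  # assumed; the real value comes from ROC + business rules
base_churners = set(base.loc[base["churn_prob"] >= threshold, "cx_id"])
ensemble_only = set(ensemble.loc[ensemble["churn"] == 1, "cx_id"]) - base_churners

# Base-model churners plus customers flagged only by the ensemble model.
campaign_targets = sorted(base_churners | ensemble_only)
print(campaign_targets)  # [1, 3, 4]
```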

The following figure illustrates the output predictions and churn probabilities generated by the base model and the ensemble model.

The first table is the output from the base model, which provides valuable insights into each customer’s churn risk. The columns in this table include a customer identifier (Cx), a Churn Reason column that highlights potential reasons for churn, such as Daily Usage or ARPU (Average Revenue Per User) Drop, and a Churn Probability column that quantifies the likelihood of each customer churning.

The second table presents the output from the ensemble model, a complementary approach designed to capture additional churn risks that may have been missed by the base model. This table has two columns: the customer identifier (Cx) and a binary Churn column that indicates whether the customer is predicted to churn (1) or not (0).

The arrows connecting the two tables visually represent the process Dialog Axiata employs to comprehensively identify customers at risk of churning.

The following figure showcases the comprehensive output of this analysis, where customers are meticulously segmented, scored, and classified according to their propensity to churn or discontinue their services. The analysis delves into various factors, such as customer profiles, usage patterns, and behavioral data, to accurately identify those at a higher risk of churning. With this predictive model, Dialog Axiata can pinpoint specific customer segments that require immediate attention and tailored retention efforts.

With this powerful information, Dialog Axiata develops targeted retention strategies and campaigns specifically designed for high-risk customer groups. These campaigns may include personalized offers, as shown in the following figure, incentives, or customized communication aimed at addressing the unique needs and concerns of at-risk customers.

These personalized campaigns, tailored to each customer’s needs and preferences, aim to proactively address their concerns and provide compelling reasons for them to continue their relationship with Dialog Axiata.

Methodologies

This solution uses the following methodologies:

  • Comprehensive analysis of customer data – The foundation of the solution’s success lies in the comprehensive analysis of nearly 100 features spanning demographic, usage, payment, network, package, geographic (location), quad-play, customer experience (CX) status, complaint, and other related data. This meticulous approach allows Dialog Axiata to gain valuable insights into customer behavior, enabling them to predict potential churn events with remarkable accuracy.
  • Dual-model strategy (base and ensemble models) – What sets Dialog Axiata’s approach apart is the use of two essential models. The base model, powered by CatBoost, provides a solid foundation for churn prediction. The threshold probability to define churn is calculated by considering ROC optimization and business requirements. Concurrently, the ensemble model strategically combines the strengths of various algorithms. This combination enhances the robustness and accuracy of the predictions. The models are developed considering precision as the evaluation parameter.
  • Actionable insights shared with business units – The insights derived from the models are not confined to the technical realm. Dialog Axiata ensures that these insights are effectively communicated and put into action by sharing the models separately with the business units. This collaborative approach means that the organization is better equipped to proactively address customer churn.
  • Proactive measures with two action types – Equipped with insights from the models, Dialog Axiata has implemented two main action types: network issue-based and non-network issue-based. During the inference phase, the churn status and churn reason are predicted. The top five features with the highest contribution to the predicted churn reason are selected using SHAP (SHapley Additive exPlanations), as shown in the sketch after this list. The selected features are then classified as either network issue-based or non-network issue-based; if any features relate to network issues, the user is categorized as a network issue-based user. The resultant categorization, along with the predicted churn status for each user, is then transmitted for campaign purposes. This information is valuable in scheduling targeted campaigns based on the identified churn reasons, enhancing the precision and effectiveness of the overall campaign strategy.
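The following is a minimal sketch of the SHAP step referenced above, reusing the model and validation frame from the earlier training sketch. The network-feature grouping is an illustrative assumption.

```python
# A minimal sketch, reusing `model` and `X_val` from the training sketch;
# the network-feature grouping is an illustrative assumption.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)  # (n_samples, n_features)

NETWORK_FEATURES = {"outage_count", "avg_latency"}  # hypothetical names

for i, row in enumerate(shap_values[:3]):
    top5 = list(X_val.columns[abs(row).argsort()[::-1][:5]])
    action = "network" if NETWORK_FEATURES & set(top5) else "non-network"
    print(f"customer {i}: churn reasons={top5}, action type={action}")
```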

Dialog Axiata’s AI Factory

Dialog Axiata built the AI Factory to facilitate running all AI/ML workloads on a single platform with multiple capabilities across various building blocks. To tackle technical aspects and challenges related to continuous integration and continuous delivery (CI/CD) and cost-efficiency, Dialog Axiata turned to the AI Factory framework. Using the power of SageMaker as the platform, they implemented separate SageMaker pipelines for model training and inference, as shown in the following diagram.

A primary advantage lies in cost reduction through the implementation of CI/CD pipelines. Running experiments within these automated pipelines yields significant cost savings and maintains an experiment version tracking system. Additionally, the integration of AI Factory components reduces time to production and overall workload by cutting repetitive tasks through the use of reusable artifacts. The incorporation of an experiment tracking system facilitates the monitoring of performance metrics, enabling a data-driven approach to decision-making.

Furthermore, the deployment of alerting systems enhances the proactive identification of failures, allowing for immediate actions to resolve issues. Data drift and model drift are also monitored. This streamlined process makes sure that any issues are addressed promptly, minimizing downtime and optimizing system reliability. By developing this project under the AI Factory framework, Dialog Axiata could overcome the aforementioned challenges.

In addition, the AI Factory framework provides a robust security framework to govern confidential user data and access permissions. It offers solutions to optimize AWS costs, including lifecycle configurations, alerting systems, and monitoring dashboards. These measures contribute to enhanced data security and cost-effectiveness, aligning with Dialog Axiata’s objectives and resulting in the efficient operation of AI initiatives.

Dialog Axiata’s MLOps process

The following diagram illustrates Dialog Axiata’s MLOps process.

The following key components are used in the process:

  • SageMaker as the ML platform – Dialog Axiata uses SageMaker as their core ML platform to perform feature engineering and to train and deploy models in production.
  • SageMaker Feature Store – By using a centralized repository for ML features, SageMaker Feature Store enhances data consumption and facilitates experimentation with validation data. Instead of directly ingesting data from the data warehouse, the required features for training and inference steps are taken from the feature store. With SageMaker Feature Store, Dialog Axiata reduced feature-creation time by reusing the same features across training and inference.
  • Amazon SageMaker Pipelines – Amazon SageMaker Pipelines is a CI/CD service for ML. These workflow automation components helped the Dialog Axiata team effortlessly scale their ability to build, train, test, and deploy multiple models in production; iterate faster; reduce errors due to manual orchestration; and build repeatable mechanisms. A minimal pipeline definition sketch follows this list.
  • Reusable components – Employing containerized environments, such as Docker images, and custom modules promoted the bring your own code approach within Dialog Axiata’s ML pipelines.
  • Monitoring and alerting – Monitoring tools and alert systems provided ongoing success by keeping track of the model and pipeline status.
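The following is a minimal sketch of how a training pipeline along these lines can be defined with the SageMaker Python SDK. The container image, instance type, S3 paths, and step layout are illustrative assumptions rather than Dialog Axiata’s actual pipeline.

```python
# A minimal sketch, assuming a training container image and S3 data
# location; names and instance types are illustrative.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = sagemaker.get_execution_role()

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder, e.g., a CatBoost container
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{session.default_bucket()}/churn/models/",
    sagemaker_session=session,
)

train_step = TrainingStep(
    name="TrainChurnBaseModel",
    estimator=estimator,
    inputs={"train": TrainingInput(f"s3://{session.default_bucket()}/churn/train/")},
)

pipeline = Pipeline(name="hbb-churn-training", steps=[train_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
pipeline.start()                # run; scheduling aligns with the billing cycle
```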

Business outcomes

The churn prediction solution implemented by Dialog Axiata has yielded remarkable business outcomes, exemplifying the power of data-driven decision-making and strategic deployment of AI/ML technologies. Within a relatively short span of 5 months, the company witnessed a substantial reduction in month-over-month gross churn rates, a testament to the effectiveness of the predictive model and the actionable insights it provides.

This outstanding achievement not only underscores the robustness of the solution, it also highlights its pivotal role in fortifying Dialog Axiata’s position as a leading player in Sri Lanka’s highly competitive telecommunications landscape. By proactively identifying and addressing potential customer churn risks, the company has reinforced its commitment to delivering exceptional service and fostering long-lasting customer relationships.

Conclusion

Dialog Axiata’s journey in overcoming telecom churn challenges showcases the power of innovative solutions and the seamless integration of AI technologies. By using the AI Factory framework and SageMaker, Dialog Axiata not only addressed complex technical challenges, but also achieved tangible business benefits. This success story emphasizes the crucial role of predictive analytics in staying ahead in the competitive telecom industry, demonstrating the transformative impact of advanced AI models.

Thank you for reading this post; we hope you learned something new and useful. Please don’t hesitate to leave your feedback in the comments section.

Thank you to Nilanka S. Weeraman, Sajani Jayathilaka, and Devinda Liyanage for their valuable contributions to this blog post.


About the Authors

Senthilvel (Vel) Palraj is a Senior Solutions Architect at AWS with over 15 years of IT experience. In this role, he helps customers in the telecom and media and entertainment industries across India and SAARC countries transition to the cloud. Before joining AWS India, Vel worked as a Senior DevOps Architect with AWS ProServe North America, supporting major Fortune 500 corporations in the United States. He is passionate about generative AI and AI/ML and leverages his deep knowledge to provide strategic guidance to companies looking to adopt and optimize AWS services. Outside of work, Vel enjoys spending time with his family and mountain biking on rough terrains.

Chamika Ramanayake is the Head of AI Platforms at Dialog Axiata PLC, Sri Lanka’s leading telecommunications company. He leverages his 7 years of experience in the telecommunication industry when leading his team to design and set the foundation to operationalize the end-to-end AI/ML system life cycle in the AWS cloud environment. He holds an MBA from PIM, University of Sri Jayawardenepura, and a B.Sc. Eng (Hons) in Electronics and Telecommunication Engineering from the University of Moratuwa.


Amazon SageMaker now integrates with Amazon DataZone to streamline machine learning governance


Amazon SageMaker is a fully managed machine learning (ML) service that provides a range of tools and features for building, training, and deploying ML models. Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources.

Today, we are excited to announce an integration between Amazon SageMaker and Amazon DataZone to help you set up infrastructure with security controls, collaborate on machine learning (ML) projects, and govern access to data and ML assets.

When solving a business problem with ML, you create ML models from training data and integrate those models with business applications to make predictive decisions. For example, you could use an ML model for loan application processing to make decisions such as approving or denying a loan. When deploying such ML models, effective ML governance helps build trust in ML-powered applications, minimize risks, and promote responsible AI practices.

A comprehensive governance strategy spans across infrastructure, data, and ML. ML governance requires implementing policies, procedures, and tools to identify and mitigate various risks associated with ML use cases. Applying governance practices at every stage of the ML lifecycle is essential for successfully maximizing the value for the organization. For example, when building an ML model for a loan application processing use case, you can align the model development and deployment with your organization’s overall governance policies and controls to create effective loan approval workflows.

However, it might be challenging and time-consuming to apply governance across an ML lifecycle because it typically requires custom workflows and integration of several tools. With the new built-in integration between SageMaker and Amazon DataZone, you can streamline setting up ML governance across infrastructure, collaborate on business initiatives, and govern data and ML assets in just a few clicks.

For governing ML use cases, this new integration offers the following capabilities:

  • Business project management – You can create, edit, and view projects, as well as add users to start collaborating on the shared business objective
  • Infrastructure management – You can create multiple project environments and deploy infrastructure resources with embedded security controls to meet the enterprise needs
  • Asset governance – Users can search, discover, request access, and publish data and ML assets along with business metadata to the enterprise business catalog

In this post, we dive deep into how to set up and govern ML use cases. We discuss the end-to-end journey for setup and configuration of the SageMaker and Amazon DataZone integration. We also discuss how you can use self-service capabilities to discover, subscribe, consume, and publish data and ML assets as you work through your ML lifecycle.

Solution overview

With Amazon DataZone, administrators and data stewards who oversee an organization’s data assets can manage and govern access to data. These controls are designed to enforce access with the right level of privileges and context. Amazon DataZone makes it effortless for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so that they can discover, use, and collaborate to derive data-driven insights. The following diagram illustrates a sample architecture of Amazon DataZone and Amazon SageMaker integration.

With this integration, you can deploy SageMaker infrastructure using blueprints. The new SageMaker blueprint provides a well-architected infrastructure template. With this template, ML administrators can build a SageMaker environment profile with appropriate controls from services such as Amazon Virtual Private Cloud (Amazon VPC), AWS Key Management Service (AWS KMS), and AWS Identity and Access Management (IAM), and enable ML builders to use this environment profile to deploy a SageMaker domain in minutes. When you create a SageMaker environment using the SageMaker environment profile, Amazon DataZone provisions a data and ML asset catalog, Amazon SageMaker Studio, and IAM roles for managing Amazon DataZone project permissions. The following diagram shows how the SageMaker environment fits in with the existing environments in Amazon DataZone projects.

To facilitate data and ML asset governance from SageMaker Studio, we extended SageMaker Studio to incorporate the following components:

  • Asset – A data or ML resource that can be published to a catalog or project inventory, discovered, and shared. Amazon Redshift tables and AWS Glue tables are original Amazon DataZone assets. With this integration, we introduce two more asset types: SageMaker Feature Groups and Model Package Groups.
  • Owned assets – A collection of project inventory assets discoverable only by project members. These are the staging assets in the project inventory that are not available to Amazon DataZone domain users until they are explicitly published to the Amazon DataZone business catalog.
  • Asset catalog – A collection of published assets in the Amazon DataZone business catalog discoverable across your organization with business context, thereby enabling everyone in your organization to find assets quickly for their use case.
  • Subscribed assets – A collection of assets that the subscriber has been approved to access from the Amazon DataZone business catalog. Owners of those assets must approve the request for access before the subscriber can consume them.

The following diagram shows an example lifecycle of an ML asset, such as Customer-Churn-Model, with the described components.

In the following sections, we show you the user experience of the SageMaker and Amazon DataZone integration with an example. We demonstrate how to set up Amazon DataZone, including a domain, project, and SageMaker environment, and how to perform asset management using SageMaker Studio. The following diagram illustrates our workflow.

Set up an Amazon DataZone domain, project, and SageMaker environment

On the Amazon DataZone console, administrators create an Amazon DataZone domain, get access to the Amazon DataZone data portal, and provision a new project with access to specific data and users.

Administrators use the SageMaker blueprint, which has enterprise-level security controls, to set up the SageMaker environment profile. The SageMaker infrastructure, with appropriate organizational boundaries, can then be deployed in minutes so that ML builders can start using it for their ML use cases.

In the Amazon DataZone data portal, ML builders can create or join a project to collaborate on the business problem being solved. To start their ML use case in SageMaker, they use the SageMaker environment profile made by the administrators to create a SageMaker environment or use an existing one.

ML builders can then seamlessly federate into SageMaker Studio from the Amazon DataZone data portal with just a few clicks. The following actions can happen in SageMaker Studio:

  • Subscribe – SageMaker allows you to find, access, and consume the assets in the Amazon DataZone business catalog. When you find an asset in the catalog that you want to access, you need to subscribe to the asset, which creates a subscription request to the asset owner.
  • Publish – SageMaker allows you to publish your assets and their metadata, as the owner of the asset, to the Amazon DataZone business catalog so that others in the organization can subscribe to them and consume them in their own ML use cases.

Perform asset management using SageMaker Studio

In SageMaker Studio, ML builders can search, discover, and subscribe to data and ML assets in their business catalog. They can consume these assets for ML workflows such as data preparation, model training, and feature engineering in SageMaker Studio and SageMaker Canvas. Upon completing the ML tasks, ML builders can publish data, models, and feature groups to the business catalog for governance and discoverability.

Search and discover assets

After ML builders are federated into SageMaker Studio, they can view the Assets option in the navigation pane.

On the Assets page, ML builders can search and discover data assets and ML assets without additional administrator overhead.

The search result displays all the assets corresponding to the search criteria, including a name and description. ML builders can further filter by the type of asset to narrow down their results. The following screenshot is an example of available assets from a search result.

Subscribe to assets

After ML builders discover the asset from their search results, they can choose the asset to see details such as schema or metadata to understand whether the asset is useful for their use case.

To gain access to the asset, choose Subscribe to initiate the request for access from the asset owner. This action allows data governance for the asset owners to determine which members of the organization can access their assets.

The owner of the asset will be able to see the request in the Incoming subscription requests section on the Assets page. The asset owners can approve or reject the request with justifications. ML builders will also be able to see the corresponding action on the Assets page in the Outgoing subscription requests section. The following screenshot shows an example of managing asset requests and the Subscribed assets tab. In the next steps, we demonstrate how a subscribed data asset like mkt_sls_table and an ML asset like Customer-Churn-Model are used within SageMaker.

Consume subscribed assets

After ML builders are approved to access the subscribed assets, they can choose to use Amazon SageMaker Canvas or JupyterLab within SageMaker Studio. In this section, we explore the scenarios in which ML builders can consume the subscribed assets.

Consume a subscribed Model Package Group in SageMaker Studio

ML builders can see all the subscribed Model Package Groups in SageMaker Studio by choosing Open in Model Registry on the asset details page. ML builders are also able to consume the subscribed model by deploying the model to an endpoint for prediction. The following screenshot shows an example of opening a subscribed model asset.
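For example, the following sketch deploys a subscribed model package to a real-time endpoint using the SageMaker Python SDK; the model package ARN and endpoint name are placeholders for the values shown on the asset details page.

```python
# A minimal sketch; the ARN and endpoint name are placeholders.
import sagemaker
from sagemaker import ModelPackage

role = sagemaker.get_execution_role()
model = ModelPackage(
    role=role,
    model_package_arn=(
        "arn:aws:sagemaker:<region>:<account>:model-package/"
        "customer-churn-model/1"  # placeholder ARN from the asset details page
    ),
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="customer-churn-endpoint",  # hypothetical name
)
```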

Consume a subscribed data asset in SageMaker Canvas

When ML builders open the SageMaker Canvas app from SageMaker Studio, they are able to use Amazon SageMaker Data Wrangler and datasets. ML builders can view their subscribed data assets to perform experimentation and build models. As part of this integration, ML builders can view their subscribed assets under sub_db and publish their assets via pub_db. The created models can then be registered in the Amazon SageMaker Model Registry from SageMaker Canvas. The following screenshot is an example of the subscribed asset mkt_sls_table for data preparation in SageMaker Canvas.

Consume a subscribed data asset in JupyterLab notebooks

ML builders can navigate to JupyterLab in SageMaker Studio to open a notebook and start their data experimentation. In JupyterLab notebooks, ML builders are able to see the subscribed data assets to query in their notebook and consume for experimentation and model building. The following screenshot is an example of the subscribed asset mkt_sls_table for data preparation in SageMaker Studio.
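As a minimal sketch, assuming the subscribed asset surfaces as an Athena-queryable AWS Glue table (like mkt_sls_table under sub_db in the screenshots) and that the awswrangler library is available in the notebook environment:

```python
# A minimal sketch, assuming awswrangler is installed and the subscribed
# table is registered in the sub_db Glue database.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT * FROM mkt_sls_table LIMIT 1000",
    database="sub_db",
)
print(df.head())
```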

Publish assets

After experimentation and analysis, ML builders are able to share the assets with the rest of the organization by publishing them to the Amazon DataZone business catalog. They can also make their assets available only to project members by publishing only to the project inventory. ML builders can achieve these tasks by using the SageMaker SDK or by publishing directly from SageMaker Studio.
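As a hedged sketch of the SDK route, the boto3 DataZone client exposes APIs for publishing; the identifiers below are placeholders, and the exact operation and parameters should be verified against the current SDK documentation.

```python
# A hedged sketch; verify the operation and parameters against the
# current boto3 DataZone documentation before use.
import boto3

datazone = boto3.client("datazone")
datazone.create_listing_change_set(
    domainIdentifier="<domain-id>",  # placeholder
    entityIdentifier="<asset-id>",   # placeholder
    entityType="ASSET",
    action="PUBLISH",
)
```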

You can publish ML assets by navigating to the specific asset tab and choosing Publish to asset catalog or Publish to inventory. The following screenshot shows how you can publish a feature group to the asset catalog.

The following screenshot shows how you can publish a model package group to the asset catalog or the project inventory.

On the Assets page, you can use the data source feature to publish data assets like an AWS Glue table or Redshift table.

Conclusion

Governance is a multi-faceted discipline that encompasses controls across infrastructure management, data management, model management, access management, policy management, and more. ML governance plays a key role for organizations to successfully scale their ML usage across a wide range of use cases and also mitigate technical and operational risks.

The new SageMaker and Amazon DataZone integration enables your organization to streamline infrastructure controls and permissions, in addition to data and ML asset governance in ML projects. The provisioned ML environment is secure, scalable, and reliable for your teams to access data and ML assets, and build and train ML models.

We would like to hear from you on how this new capability is helping your ML governance use cases. Be on the lookout for more data and ML governance blog posts. Try out this new SageMaker integration for ML governance and leave your feedback in the comments section.


About the authors

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, digital transformation, and enabling automation to improve overall organizational efficiency and productivity. He has over 7 years of automation experience deploying various technologies. In his spare time, Siamak enjoys exploring the outdoors, long-distance running, and playing sports.

Kareem Syed-Mohammed is a Product Manager at AWS. He is focused on ML Observability and ML Governance. Prior to this, at Amazon QuickSight, he led embedded analytics, and developer experience. In addition to QuickSight, he has been with AWS Marketplace and Amazon retail as a Product Manager. Kareem started his career as a developer for call center technologies, Local Expert and Ads for Expedia, and management consultant at McKinsey.

Dr. Sokratis Kartakis is a Principal Machine Learning and Operations Specialist Solutions Architect at AWS. Sokratis focuses on enabling enterprise customers to industrialize their machine learning (ML) and generative AI solutions by exploiting AWS services and shaping their operating model (MLOps/FMOps/LLMOps foundations) and transformation roadmap using development best practices. He has spent more than 15 years inventing, designing, leading, and implementing innovative end-to-end production-level ML and AI solutions in domains including energy, retail, health, finance, and motorsports.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.


LLM profiling guides KV cache optimization


This research paper was presented at the 12th International Conference on Learning Representations (ICLR 2024), the premier conference dedicated to the advancement of deep learning.


Large language models (LLMs) rely on complex internal mechanisms that require more memory than what is typically available to operate on standard devices. One such mechanism is the key-value (KV) cache, which stores and retrieves previously computed data, helping the model generate responses quickly without needing to recalculate information it has already processed. This mechanism consumes a substantial amount of memory because it keeps a large volume of data readily accessible to enhance the model’s speed and efficiency. Consequently, the KV cache can become prohibitively large as the complexity of the tasks increases, sometimes requiring up to 320 GB for a single operation. To address this, we developed FastGen, a novel method aimed at reducing the memory demands for LLMs.
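To see why the cache grows so large, consider a simplified sketch: each generated token appends one key and one value per layer, so memory scales with layers, heads, head dimension, and sequence length. The model dimensions below are illustrative assumptions, not tied to any specific LLM.

```python
# An illustrative calculation; model dimensions are assumptions, not a
# specific LLM's configuration.
import numpy as np

num_layers, num_heads, head_dim, seq_len = 32, 32, 128, 4096
bytes_per_value = 2  # fp16

# One key and one value per token, per head, per layer.
kv_bytes = 2 * num_layers * num_heads * head_dim * seq_len * bytes_per_value
print(f"{kv_bytes / 1e9:.1f} GB per sequence")  # ~2.1 GB at these settings

# Structurally, the cache is just per-layer tensors that grow with each token:
kv_cache = {layer: {"keys": [], "values": []} for layer in range(num_layers)}
k = v = np.zeros((num_heads, head_dim), dtype=np.float16)  # one token's K/V
kv_cache[0]["keys"].append(k)
kv_cache[0]["values"].append(v)
```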

In our paper, “Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs,” presented at ICLR 2024, we describe how FastGen optimizes the way LLMs store and access data, potentially cutting memory use by half while preserving their efficiency. This approach represents a significant step toward making sophisticated AI tools more accessible and affordable for broader applications. We are honored to share that this paper has been awarded an Honorable Mention for the Outstanding Paper Award.

Observations of the KV cache

The development of FastGen is underpinned by our observations of how the KV cache functions. We first observed that not all the data in the KV cache is needed for LLMs to complete their required tasks, as shown in Figure 1. By giving the KV cache a mechanism to discard unnecessary data, it is possible to significantly cut memory use. For example, some LLM modules don’t require broad contexts to process input; for these, it is possible to construct a KV cache that removes data containing less important long-range contexts, such as several sentences or paragraphs. Also, some LLM modules primarily attend only to special tokens, such as punctuation, for which it is possible to create a KV cache that retains only those tokens. Finally, some LLM modules broadly need all tokens, and for these we can employ the standard KV cache and store all words.

Another key observation in our study is that attention modules in different layers and positions in the LLM behave differently and need different preferences for their KV cache, as shown on the right in Figure 1. 

Figure 1: These graphs depict the different structures of the KV cache. The graph on the left contains common structures. The circle graphs on the right contain compositions of three modules that are in the same layer, but the way they store data is different.

FastGen accounts for the diversity of KV cache structures

Because different KV caches have different structures, they need to be handled differently. We based the development of the FastGen algorithm on our observations, enabling it to categorize and optimize the data that is stored in a given KV cache. FastGen first analyzes the specific behaviors of different modules to understand their structures, a method called profiling. It then uses the results to adjust how data is stored in real-time, making the process more efficient. Our tests show that FastGen can reduce the amount of memory by 50% without sacrificing quality. Additional experiments, discussed in detail in our paper, confirm that the profiling process is crucial and significantly improves the efficiency of the KV cache.  
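The following is a simplified sketch of the profile-then-compress idea: for each attention head, measure how much attention mass cheap cache policies would recover, and keep the cheapest policy that meets a recovery target. This is an illustration of the concept, not the paper’s exact algorithm or thresholds.

```python
# A conceptual sketch; policies, thresholds, and the toy attention map are
# illustrative, not the paper's exact method.
import numpy as np

def choose_policy(attn, special_mask, window=32, target=0.95):
    """Pick the cheapest KV cache policy whose kept entries recover at
    least `target` of the head's attention mass. attn: (q_len, k_len)."""
    q_len, _ = attn.shape
    special_recovery = attn[:, special_mask].sum(axis=1).mean()
    local_recovery = np.mean(
        [attn[i, max(0, i - window): i + 1].sum() for i in range(q_len)]
    )
    if special_recovery >= target:
        return "special_tokens_only"  # cheapest: keep punctuation/BOS etc.
    if local_recovery >= target:
        return "local_window"         # keep only recent tokens
    return "full_cache"               # this head needs everything

rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(128), size=128)  # toy attention weights
special_mask = np.zeros(128, dtype=bool)
special_mask[[0, 20, 64]] = True              # toy special-token positions
print(choose_policy(attn, special_mask))      # likely "full_cache" here
```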

The broader picture

Fueled by unprecedented advances in data handling and computational capabilities, LLM pretraining has emerged as a cornerstone of deep learning, transforming natural language processing tasks and continuously challenging our understanding of learning and cognition.

However, greater capabilities can bring challenges. As models scale larger, customizing them for specific tasks can become more resource-intensive. At Microsoft Research, we are exploring different approaches to more efficient model editing. A critical strategy involves targeted model profiling, which identifies essential components of a model that align with predefined goals. This profiling informs precise model modifications, optimizing resource use and effectiveness.

The two research projects we are presenting at ICLR 2024 support these goals. Both adopt the profile-then-edit paradigm to address different problems. FastGen reduces memory consumption. Our related work, Post-hoc Attention Steering for LLMs (PASTA), focuses on better controllability. These approaches are designed to be resource-efficient, as they do not require tuning or backpropagation. Looking ahead, our goal is to further develop these techniques to improve the resource-efficiency of LLM applications, making them more accessible to a wider audience.



AI Decoded: New DaVinci Resolve Tools Bring RTX-Accelerated Renaissance to Editors


AI tools accelerated by NVIDIA RTX have made it easier than ever to edit and work with video.

Case in point: Blackmagic Design’s DaVinci Resolve 19 recently added AI features that make video editing workflows more streamlined. These new features — along with all its other AI-powered effects — get a big boost from optimization for NVIDIA RTX PCs and workstations.

Editors use Blackmagic Design’s DaVinci Resolve — one of the leading nonlinear video editing platforms — to bring their creative vision to life, incorporating visual effects (VFX), color correction, motion graphics and more to their high-resolution footage and audio clips.

DaVinci Resolve’s new AI tools accelerated by RTX unlock endless possibilities.

Resolve includes a large variety of built-in tools. Some are corrective in nature, letting editors match colors from two sets of footage, reframe footage after the fact or remove objects that weren’t meant to be in a shot. Others give editors the power to manipulate footage and audio in new ways, including smooth slow-motion effects and footage upscaling.

In the past, many of these tools required significant time and effort from users to implement. Resolve now uses AI acceleration to speed up many of these workflows, leaving more time for users to focus on creativity rather than batch processing.

Even better, the entire app is optimized for NVIDIA TensorRT deep learning inference software to get the best performance from GPU-reliant effects and other features, boosting performance by 2x.

New in Resolve 19

The newest release, DaVinci Resolve 19, adds two new AI features that make video editing more efficient: the IntelliTrack AI point tracker for object tracking, stabilization and audio panning, and UltraNR, which uses AI for spatial noise reduction.

IntelliTrack AI makes it easy to stabilize footage during the editing process. It can also be used in Resolve’s Fairlight tool to track on-screen subjects and automatically generate audio panning within a scene by tracking people or objects as they move across 2D and 3D spaces. With AI audio panning to video, editors can quickly pan multiple actors in a scene, moving their audio across the stereo field and controlling their voice positions in the mix environment. All of this can be done by hand, but IntelliTrack’s AI acceleration speeds up the entire process.

UltraNR is an AI-accelerated denoise mode in Resolve’s spatial noise reduction palette. Editors can use it to dramatically reduce digital noise — undesired fluctuations of color or luminance that obscure detail — from a frame while maintaining image clarity. They can also combine the tool with temporal noise reduction for even more effective denoising in images with motion, where fluctuations can be more noticeable.

Both IntelliTrack and UltraNR get a big boost when running on NVIDIA RTX PCs and workstations. TensorRT lets them run up to 3x faster on a GeForce RTX 4090 laptop GPU vs. the MacBook Pro M3 Max.

In fact, all DaVinci Resolve AI effects are accelerated on RTX GPUs by NVIDIA TensorRT. The new Resolve update also includes GPU acceleration for Beauty, Edge Detect and Watercolor effects, doubling their performance on NVIDIA GPUs.

Find out more about DaVinci Resolve 19, and try it yourself for free, at Blackmagic Design.

Learn how AI is supercharging creativity, and how to get the most from your own creative process, with NVIDIA Studio.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.
