NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He'll be joined by Mukesh Ambani, chairman and managing director of Reliance Industries, to share insights on AI's pivotal role in reshaping industries and India's emergence as a global AI leader.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.
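As a quick sanity check on that analogy, the numbers can be worked through directly. This is just a sketch: the 100,000x gain and 280,000 mpg figures are the article's claims, and the implied 2.8 mpg baseline and the average Earth-Moon distance are filled in for illustration.

```python
# Sanity-check the car-efficiency analogy (figures as claimed in the text).
improvement = 100_000          # claimed energy-efficiency gain over a decade
mpg_today = 280_000            # miles per gallon quoted in the article

baseline_mpg = mpg_today / improvement   # implied starting efficiency
moon_distance_miles = 238_855            # average Earth-Moon distance

print(baseline_mpg)                      # implied baseline: 2.8 mpg
assert mpg_today > moon_distance_miles   # one gallon covers the trip to the moon
```

The implied 2.8 mpg starting point is far below a real car's efficiency, which is the point of the analogy: the gain, not the absolute figures, is what's being illustrated.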

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across India hosting over 4,000 employees, NVIDIA plays a central role in the country's rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA's accelerated computing stack, comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools integrate advanced AI capabilities into everything from factory planning to manufacturing and construction, helping build greenfield factories that cut costs while boosting efficiency and sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Read More

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He’ll share his insights on AI’s pivotal role in reshaping industries and how India is emerging as a global AI leader, and be joined by the chairman and managing director of Reliance Industries, Mukesh Ambani.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across the country hosting over 4,000 employees, NVIDIA plays a central role in the country’s rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA’s accelerated computing stack comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools are enhancing everything from factory planning to manufacturing and construction, helping build greenfield factories that are more efficient and sustainable.

These technologies integrate advanced AI capabilities into factory operations, cutting costs while boosting sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Read More

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He’ll share his insights on AI’s pivotal role in reshaping industries and how India is emerging as a global AI leader, and be joined by the chairman and managing director of Reliance Industries, Mukesh Ambani.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across the country hosting over 4,000 employees, NVIDIA plays a central role in the country’s rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA’s accelerated computing stack comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools are enhancing everything from factory planning to manufacturing and construction, helping build greenfield factories that are more efficient and sustainable.

These technologies integrate advanced AI capabilities into factory operations, cutting costs while boosting sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Read More

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He’ll share his insights on AI’s pivotal role in reshaping industries and how India is emerging as a global AI leader, and be joined by the chairman and managing director of Reliance Industries, Mukesh Ambani.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across the country hosting over 4,000 employees, NVIDIA plays a central role in the country’s rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA’s accelerated computing stack comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools are enhancing everything from factory planning to manufacturing and construction, helping build greenfield factories that are more efficient and sustainable.

These technologies integrate advanced AI capabilities into factory operations, cutting costs while boosting sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Read More

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He’ll share his insights on AI’s pivotal role in reshaping industries and how India is emerging as a global AI leader, and be joined by the chairman and managing director of Reliance Industries, Mukesh Ambani.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across the country hosting over 4,000 employees, NVIDIA plays a central role in the country’s rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA’s accelerated computing stack comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools are enhancing everything from factory planning to manufacturing and construction, helping build greenfield factories that are more efficient and sustainable.

These technologies integrate advanced AI capabilities into factory operations, cutting costs while boosting sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Read More

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He’ll share his insights on AI’s pivotal role in reshaping industries and how India is emerging as a global AI leader, and be joined by the chairman and managing director of Reliance Industries, Mukesh Ambani.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across the country hosting over 4,000 employees, NVIDIA plays a central role in the country’s rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA’s accelerated computing stack comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools are enhancing everything from factory planning to manufacturing and construction, helping build greenfield factories that are more efficient and sustainable.

These technologies integrate advanced AI capabilities into factory operations, cutting costs while boosting sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Read More

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

NVIDIA CEO Jensen Huang to Spotlight Innovation at India’s AI Summit

The NVIDIA AI Summit India, taking place Oct. 23–25 at the Jio World Convention Centre in Mumbai, will bring together the brightest minds to explore how India is tackling the world’s grand challenges.

A major highlight: a fireside chat with NVIDIA founder and CEO Jensen Huang on October 24. He’ll share his insights on AI’s pivotal role in reshaping industries and how India is emerging as a global AI leader, and be joined by the chairman and managing director of Reliance Industries, Mukesh Ambani.

Passes for the event are sold out. But don’t worry — audiences can tune in via livestream or watch on-demand sessions at NVIDIA AI Summit.

With over 50 sessions, live demos and hands-on workshops, the event will showcase AI’s transformative impact across industries like robotics, supercomputing and industrial digitalization.

It will explore opportunities both globally and locally in India. Over 70% of the use cases discussed will focus on how AI can address India’s most pressing challenges.

India’s AI Journey

India’s rise to become a global AI leader is powered by its focus on building AI infrastructure and foundational models.

NVIDIA’s accelerated computing platform, now 100,000x more energy-efficient for processing large language models than a decade ago, is driving this progress.

If car efficiency had improved at the same rate, vehicles today would get 280,000 miles per gallon — enough to drive to the moon with a single gallon of gas.

As India solidifies its place in AI leadership, the summit will tackle key topics.

These include building AI infrastructure with NVIDIA’s advanced GPUs, harnessing foundational models for Indian languages, fueling innovation in India’s startup ecosystem and upskilling developers to take the country’s workforce to the AI front office. The momentum is undeniable.

India’s AI Summit: Driving Innovation, Solving Grand Challenges

NVIDIA is at the heart of India’s rise as an AI powerhouse.

With six locations across the country and more than 4,000 employees, NVIDIA plays a central role in India’s rapid progress in AI.

The company works with enterprises, cloud providers and startups to build AI infrastructure powered by NVIDIA’s accelerated computing stack comprising tens of thousands of its most advanced GPUs, high-performance networking, and AI software platforms and tools.

The summit will feature sessions on how this infrastructure empowers sectors like healthcare, agriculture, education and manufacturing.

Jensen Huang’s Fireside Chat

The fireside chat with Huang on October 24 is a must-watch.

He’ll discuss how AI is revolutionizing industries worldwide and India’s increasingly important role as a global AI leader.

To hear his thoughts firsthand, tune in to the livestream or catch the session on demand for insights from one of the most influential figures in AI.

Key Sessions and Speakers

Top industry experts like Niki Parmar (Essential AI), Deepu Talla (NVIDIA) and Abhinav Aggarwal (Fluid AI) will dive into a range of game-changing topics, including:

  • Generative AI and large language models (LLMs): Discover innovations in video synthesis and high-quality data models for large-scale inference.
  • Robotics and industrial efficiency: See how AI-powered robotics tackle automation challenges in manufacturing and warehouse operations.
  • AI in healthcare: Learn how AI transforms diagnostics and treatments, improving outcomes across India’s healthcare system.

These sessions will also introduce cutting-edge NVIDIA AI networking technologies, essential for building next-gen AI data centers.

Workshops and Startup Innovation

India’s vibrant startup ecosystem will be in the spotlight at the summit.

Nearly 2,000 companies in India are part of NVIDIA Inception, a program that supports startups driving innovation in AI and other fields.

Onsite workshops at the AI Summit will offer hands-on experiences with NVIDIA’s advanced AI tools, giving developers and startups practical skills to push the boundaries of innovation.

Meanwhile, Reverse VC Pitches will provide startups with unique insights as venture capital firms pitch their visions for the future, sparking fresh ideas and collaborations.

Industrial AI and Manufacturing Innovation

NVIDIA is also backing India’s industrial expansion by deploying AI technologies like Omniverse and Isaac.

These tools are enhancing everything from factory planning to manufacturing and construction, helping build greenfield factories that are more efficient and sustainable.

These technologies integrate advanced AI capabilities into factory operations, cutting costs while boosting sustainability.

Through hands-on workshops and deep industry insights, participants will see how India is positioning itself to lead the world in AI innovation.

Join the livestream or explore sessions on demand at NVIDIA AI Summit.

Amazon Bedrock Custom Model Import now generally available

Today, we’re pleased to announce the general availability (GA) of Amazon Bedrock Custom Model Import. This feature empowers customers to import and use their customized models alongside existing foundation models (FMs) through a single, unified API. Whether leveraging fine-tuned models like Meta Llama, Mistral, Mixtral, and IBM Granite, or developing proprietary models based on popular open-source architectures, customers can now bring their custom models into Amazon Bedrock without the overhead of managing infrastructure or model lifecycle tasks.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

With Amazon Bedrock Custom Model Import, customers can access their imported custom models on demand in a serverless manner, freeing them from the complexities of deploying and scaling models themselves. They’re able to accelerate generative AI application development by using native Amazon Bedrock tools and features such as Knowledge Bases, Guardrails, Agents, and more—all through a unified and consistent developer experience.

Benefits of Amazon Bedrock Custom Model Import include:

  1. Flexibility to use existing fine-tuned models: Customers can use their prior investments in model customization by importing existing customized models into Amazon Bedrock without the need to recreate or retrain them. This flexibility maximizes the value of previous efforts and accelerates application development.
  2. Integration with Amazon Bedrock Features: Imported custom models can be seamlessly integrated with the native tools and features of Amazon Bedrock, such as Knowledge Bases, Guardrails, Agents, and Model Evaluation. This unified experience enables developers to use the same tooling and workflows across both base FMs and imported custom models.
  3. Serverless: Customers can access their imported custom models in an on-demand and serverless manner. This eliminates the need to manage or scale underlying infrastructure, as Amazon Bedrock handles all those aspects. Customers can focus on developing generative AI applications without worrying about infrastructure management or scalability issues.
  4. Support for popular model architectures: Amazon Bedrock Custom Model Import supports a variety of popular model architectures, including Meta Llama 3.2, Mistral 7B, Mixtral 8x7B, and more. Customers can import custom weights in formats like Hugging Face Safetensors from Amazon SageMaker and Amazon S3. This broad compatibility allows customers to work with models that best suit their specific needs and use cases, allowing for greater flexibility and choice in model selection.
  5. Support for the Amazon Bedrock Converse API: Amazon Bedrock Custom Model Import allows customers to use their supported fine-tuned models with the Amazon Bedrock Converse API, which simplifies and unifies access to models.

Getting started with Custom Model Import

One of the critical requirements from our customers is the ability to customize models with their proprietary data while retaining complete ownership and control over the tuned model artifact and its deployment. Customization could be in the form of domain adaptation or instruction fine-tuning. Customers have a wide range of options for fine-tuning models efficiently and cost-effectively. However, hosting models presents its own unique set of challenges. Customers are looking for some key aspects, namely:

  • Using their existing customization investment, with fine-grained control over customization.
  • Having a unified developer experience when accessing custom models or base models through Amazon Bedrock’s API.
  • Ease of deployment through a fully managed, serverless service.
  • Using pay-as-you-go inference to minimize the costs of their generative AI workloads.
  • Being backed by enterprise-grade security and privacy tooling.

The Amazon Bedrock Custom Model Import feature addresses these concerns. To bring your custom model into the Amazon Bedrock ecosystem, you need to run an import job, which can be invoked using the AWS Management Console or through APIs. In this post, we demonstrate the code for running the import process through APIs. After the model is imported, you can invoke it by using the model’s Amazon Resource Name (ARN).

As of this writing, supported model architectures today include Meta Llama (v.2, 3, 3.1, and 3.2), Mistral 7B, Mixtral 8x7B, Flan and IBM Granite models like Granite 3B-Code, 8B-Code, 20B-Code and 34B-Code.

A few points to be aware of when importing your model:

  • Models must be serialized in Safetensors format.
  • If you have a different format, you can potentially use Llama convert scripts or Mistral convert scripts to convert your model to a supported format.
  • The import process expects at least the following files: .safetensors, config.json, tokenizer_config.json, tokenizer.json, and tokenizer.model.
  • The precision for the model weights supported is FP32, FP16, and BF16.
  • For fine-tuning jobs that create adapters like LoRA-PEFT adapters, the import process expects the adapters to be merged into the main base model weight as described in Model merging.
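As a quick pre-flight check, a short script can confirm that a local model folder contains the files listed above before uploading to Amazon S3. This is a minimal sketch; the helper name and folder path are illustrative, not part of the import API:

```python
import os

# Files the import process expects (see the list above)
REQUIRED_FILES = ["config.json", "tokenizer_config.json", "tokenizer.json", "tokenizer.model"]

def missing_import_files(model_dir):
    """Return the required files absent from model_dir, including a flag
    when no .safetensors weight shard is present."""
    names = set(os.listdir(model_dir))
    missing = [f for f in REQUIRED_FILES if f not in names]
    if not any(n.endswith(".safetensors") for n in names):
        missing.append("*.safetensors")
    return missing
```

Running this against your export directory before the upload step can save a failed import job later.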

Importing a model using the Amazon Bedrock console

  1. Go to the Amazon Bedrock console and choose Foundation models, then Imported models, from the navigation pane on the left side to get to the Models page.
  2. Choose Import model to configure the import process.
  3. Configure the model.
    1. Enter the location of your model weights. These can be in Amazon S3 or point to a SageMaker Model ARN object.
    2. Enter a Job name. We recommend this be suffixed with the version of the model. As of now, you need to manage the generative AI operations aspects outside of this feature.
    3. Configure your AWS Key Management Service (AWS KMS) key for encryption. By default, this will default to a key owned and managed by AWS.
    4. Service access role: You can create a new role or use an existing role that has the necessary permissions to run the import process. The permissions must include access to your Amazon S3 bucket if you’re specifying model weights through S3.
  4. After the Import Model job is complete, you will see the model and the model ARN. Make a note of the ARN to use later.
  5. Test the model using the on-demand feature in the Text playground as you would for any base foundation model.

The import process validates that the model configuration complies with the specified architecture for that model by reading the config.json file and validates the model architecture values such as the maximum sequence length and other relevant details. It also checks that the model weights are in the Safetensors format. This validation verifies that the imported model meets the necessary requirements and is compatible with the system.

Fine-tuning a Meta Llama model on SageMaker

Meta Llama 3.2 offers multi-modal vision and lightweight models, representing Meta’s latest advances in large language models (LLMs). These new models provide enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, the Llama 3.2 models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features to help you build a new generation of AI experiences.

SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This gives you multiple options to discover and use hundreds of models for your use case.

In this section, we’ll show you how to fine-tune the Llama 3.2 3B Instruct model using SageMaker JumpStart. We’ll also share the supported instance types and context for the Llama 3.2 models available in SageMaker JumpStart. Although not highlighted in this post, you can also find other Llama 3.2 Model variants that can be fine-tuned using SageMaker JumpStart.

Instruction fine-tuning

The text generation model can be instruction fine-tuned on any text data, provided that the data is in the expected format. The instruction fine-tuned model can be further deployed for inference. The training data must be formatted in a JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but can be saved in multiple JSON Lines files. The training folder can also contain a template.json file describing the input and output formats.
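The JSON Lines layout described above can be produced with a few lines of standard-library code. This is a sketch; the helper names are illustrative, and the record fields follow the example dataset shown later in this post:

```python
import json

def write_jsonl(records, path):
    """Write one training sample per line, as the fine-tuning job expects."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

A round trip through these two helpers is a convenient way to validate a dataset file before launching a training job.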

Synthetic dataset

For this use case, we’ll use a synthetically generated dataset named amazon10Ksynth.jsonl in an instruction-tuning format. This dataset contains approximately 200 entries designed for training and fine-tuning LLMs in the finance domain.

The following is an example of the data format:

instruction_sample = {
    "question": "What is Amazon's plan for expanding their physical store footprint and how will that impact their overall revenue?",
    "context": "The 10-K report mentions that Amazon is continuing to expand their physical store network, including 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth.",
    "answer": "Amazon is expanding their physical store footprint, with 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth."
}
 
print(instruction_sample)

Prompt template

Next, we create a prompt template for using the data in an instruction input format for the training job (because we are instruction fine-tuning the model in this example), and for inference against the deployed endpoint.

import json

prompt_template = {
  "prompt": "question: {question} context: {context}",
  "completion": "{answer}"
}

with open("prompt_template.json", "w") as f:
    json.dump(prompt_template, f)
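To verify that the template renders as intended, it can be applied to one sample record. The template is redefined here so the snippet is self-contained; the sample values are illustrative:

```python
prompt_template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
}

sample = {
    "question": "What is Amazon's plan for expanding their physical store footprint?",
    "context": "The 10-K report mentions continued physical store expansion.",
    "answer": "Amazon is expanding its physical store footprint."
}

# Render the training prompt and completion for this sample
prompt = prompt_template["prompt"].format(
    question=sample["question"], context=sample["context"]
)
completion = prompt_template["completion"].format(answer=sample["answer"])
```

The rendered `prompt` string is exactly what the training job sees as input for that record, with `completion` as the target output.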

After the prompt template is created, upload the prepared dataset that will be used for fine-tuning to Amazon S3.

from sagemaker.s3 import S3Uploader
import sagemaker
output_bucket = sagemaker.Session().default_bucket()
local_data_file = "amazon10Ksynth.jsonl"
train_data_location = f"s3://{output_bucket}/amazon10Ksynth_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("prompt_template.json", train_data_location)
print(f"Training data: {train_data_location}")

Fine-tuning the Meta Llama 3.2 3B model

Now, we’ll fine-tune the Llama 3.2 3B model on the financial dataset. The fine-tuning scripts are based on the scripts provided by the Llama fine-tuning repository.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# JumpStart identifier for the Llama 3.2 3B model being fine-tuned
# (the version string "*" selects the latest available version)
model_id, model_version = "meta-textgeneration-llama-3-2-3b", "*"

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
    instance_type="ml.g5.12xlarge",
)

# Set the hyperparameters for instruction tuning
estimator.set_hyperparameters(
    instruction_tuned="True", epoch="5", max_input_length="1024"
)

# Fit the model on the training data
estimator.fit({"training": train_data_location})

Importing a custom model from SageMaker to Amazon Bedrock

In this section, we will use the Python SDK to create a model import job, get the imported model ID, and finally generate inferences. You can refer to the console screenshots in the earlier section for how to import a model using the Amazon Bedrock console.

Parameter and helper function set up

First, we’ll create a few helper functions and set up our parameters to create the import job. The import job is responsible for collecting and deploying the model from SageMaker to Amazon Bedrock. This is done by using the create_model_import_job function.

Stored safetensors need to be formatted so that the Amazon S3 location is the top-level folder. The configuration files and safetensors will be stored as shown in the following figure.
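For illustration, an import-ready prefix might look like the following (the file and shard names are examples; actual shard names depend on your model):

```
s3://<bucket>/<model-folder>/
    config.json
    tokenizer_config.json
    tokenizer.json
    tokenizer.model
    model-00001-of-00002.safetensors
    model-00002-of-00002.safetensors
```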

import json
import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client('bedrock', region_name='us-east-1')
job_name = 'fine-tuned-model-import-demo'
sagemaker_model_name = 'meta-textgeneration-llama-3-2-3b-2024-10-12-23-29-57-373'
model_url = {'s3DataSource':
                 {'s3Uri':
                      "s3://sagemaker-{REGION}-{AWS_ACCOUNT}/meta-textgeneration-llama-3-2-3b-2024-10-12-23-19-53-906/output/model/"
                  }
             }

# Service access role with permissions to read the model artifacts in Amazon S3
role_arn = "arn:aws:iam::{AWS_ACCOUNT}:role/<service-access-role>"

# Create the import job that brings the SageMaker model output into Amazon Bedrock
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=sagemaker_model_name,
    roleArn=role_arn,
    modelDataSource=model_url,
)

Check the status and get job ARN from the response:

After a few minutes, the model will be imported, and the status of the job can be checked using get_model_import_job. The job ARN is then used to get the imported model ARN, which we will use to generate inferences.

def get_import_model_from_job(job_identifier):
    # jobIdentifier accepts either the import job name or its ARN
    response = bedrock.get_model_import_job(jobIdentifier=job_identifier)
    return response['importedModelArn']


job_arn = response['jobArn']
import_model_arn = get_import_model_from_job(job_arn)

Generating inferences using the imported custom model

The model can be invoked by using the invoke_model and converse APIs. The following is a helper function that will be used to invoke the model and extract the generated text from the response.

import json
import boto3
from botocore.exceptions import ClientError

client = boto3.client('bedrock-runtime', region_name='us-east-1')

def generate_conversation_with_imported_model(native_request, model_id):
    request = json.dumps(native_request)
    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)
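For the Converse API route mentioned earlier, the request shape can be sketched as follows. This is a hedged example: only the dictionary construction is shown, and the helper name is illustrative; the dictionary would be passed to the bedrock-runtime client’s converse method as keyword arguments:

```python
def build_converse_request(model_arn, user_text, max_tokens=100, temperature=0.0):
    """Build kwargs for client.converse(**request) against an imported model."""
    return {
        "modelId": model_arn,
        "messages": [
            # One user turn; Converse content is a list of content blocks
            {"role": "user", "content": [{"text": user_text}]}
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
```

Using the Converse API this way gives the same unified request shape across base FMs and imported custom models.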

Context set up and model response

Finally, we can use the custom model. First, we format our inquiry to match the fine-tuned prompt structure. This makes sure that the responses generated closely resemble the format used in the fine-tuning phase and are more aligned to our needs. To do this, we use the template that we used to format the fine-tuning data. The context would typically come from your RAG solution, such as Amazon Bedrock Knowledge Bases. For this example, we use a sample context to demonstrate the concept:

input_output_demarkation_key = "\n\n### Response:\n"
question = "Tell me what was the improved inflow value of cash?"

context = "Amazon's free cash flow less principal repayments of finance leases and financing obligations improved to an inflow of $46.1 billion for the trailing twelve months, compared with an outflow of $10.1 billion for the trailing twelve months ended March 31, 2023."

with open("prompt_template.json") as f:
    template = json.load(f)

payload = {
    "prompt": template["prompt"].format(
        question=question,  # user query
        context=context + input_output_demarkation_key,  # RAG context plus response marker
    ),
    "max_tokens": 100,
    "temperature": 0.01
}
generate_conversation_with_imported_model(payload, import_model_arn)

After the model has been fine-tuned and imported into Amazon Bedrock, you can experiment by sending different sets of input questions and context to the model to generate a response, as shown in the following example:

question: """How did Amazon's international segment operating income change 
            in Q4 2022 compared to the prior year?"""
context: """Amazon's international segment reported an operating loss of 
            $1.1 billion in Q4 2022, an improvement from a $1.7 billion 
            operating loss in Q4 2021."""
response:

Some points to note

The examples in this post are meant to demonstrate Custom Model Import and aren’t designed for production use. Because the model has been trained on only 200 samples of synthetically generated data, it’s only useful for testing purposes. You would ideally have more diverse datasets and additional samples, with continuous experimentation using hyperparameter tuning for your use case, steering the model toward more desirable output. For this post, ensure that the model temperature parameter is set to 0 and the max_tokens runtime parameter is set to a lower value, such as 100–150 tokens, so that a succinct response is generated. You can experiment with other parameters to generate a desirable outcome. See Amazon Bedrock Recipes and GitHub for more examples.

Best practices to consider:

This feature brings significant advantages for hosting your fine-tuned models efficiently. As we continue to develop this feature to meet our customers’ needs, there are a few points to be aware of:

  • Define your test suite and acceptance metrics before starting the journey. Automating this will help to save time and effort.
  • Currently, the model weights need to be all-inclusive, including the adapter weights. There are multiple methods for merging the models and we recommend experimenting to determine the right methodology. The Custom Model Import feature lets you test your model on demand.
  • When creating your import jobs, add versioning to the job name to help quickly track your models. Currently, we’re not offering model versioning, and each import is a unique job and creates a unique model.
  • The precision supported for the model weights is FP32, FP16, and BF16. Run tests to validate that these will work for your use case.
  • The maximum concurrency that you can expect for each model will be 16 per account. Higher concurrency requests will cause the service to scale and increase the number of model copies.
  • The number of model copies active at any point in time is available through Amazon CloudWatch. See Import a customized model to Amazon Bedrock for more information.
  • As of this writing, this feature is available in the US-EAST-1 and US-WEST-2 AWS Regions only. We will continue to release to other Regions. Follow Model support by AWS Region for updates.
  • The default import quota for each account is three models. If you need more for your use cases, work with your account teams to increase your account quota.
  • The default throttling limits for this feature for each account will be 100 invocations per second.
  • You can use this sample notebook to performance test models imported through this feature. The notebook is a reference only and isn’t designed to be exhaustive. We recommend running your own full performance testing, along with end-to-end testing that includes functional and evaluation testing.

Now available

Amazon Bedrock Custom Model Import is generally available today in Amazon Bedrock in the US-East-1 (N. Virginia) and US-West-2 (Oregon) AWS Regions. See the full Region list for future updates. To learn more, see the Custom Model Import product page and pricing page.

Give Custom Model Import a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.


About the authors

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors, with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports GSI partners in building strategic industry solutions on AWS.

Claudio Mazzoni is a Sr. GenAI Specialist Solutions Architect at AWS working on world-class applications, guiding customers through their implementation of GenAI to reach their goals and improve their business outcomes. Outside of work, Claudio enjoys spending time with family, working in his garden, and cooking Uruguayan food.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.

Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Deploy a serverless web application to edit images using Amazon Bedrock

Generative AI adoption across industries is revolutionizing many types of applications, including image editing. Image editing is used in various sectors, such as graphic design, marketing, and social media. Users rely on specialized tools for editing images, and building a custom solution for this task can be complex. However, by using various AWS services, you can quickly deploy a serverless solution to edit images. This approach can give your teams access to image editing foundation models (FMs) using Amazon Bedrock.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. Amazon Bedrock is serverless, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

Amazon Titan Image Generator G1 is an AI FM available with Amazon Bedrock that allows you to generate an image from text, or upload and edit your own image. Some of the key features we focus on include inpainting and outpainting.

This post introduces a solution that simplifies the deployment of a web application for image editing using AWS serverless services. We use AWS Amplify, Amazon Cognito, Amazon API Gateway, AWS Lambda, and Amazon Bedrock with the Amazon Titan Image Generator G1 model to build an application to edit images using prompts. We cover the inner workings of the solution to help you understand the function of each service and how they are connected to give you a complete solution. At the time of writing this post, Amazon Titan Image Generator G1 comes in two versions; for this post, we use version 2.

Solution overview

The following diagram provides an overview and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our frontend application. A combination of API Gateway and a Lambda function is used for our backend services, and Amazon Bedrock integrates with the FM model, enabling users to edit the image using prompts.

Solution Overview

Prerequisites

You must have the following in place to complete the solution in this post:

Deploy solution resources using AWS CloudFormation

When you run the AWS CloudFormation template, the following resources are deployed:

  • Amazon Cognito resources:
  • Lambda resources:
    • Function: <Stack name>-ImageEditBackend-<auto-generated>
  • AWS Identity Access Management (IAM) resources:
    • IAM role: <Stack name>-ImageEditBackendRole-<auto-generated>
    • IAM inline policy: AmazonBedrockAccess (this policy allows Lambda to invoke Amazon Bedrock FM amazon.titan-image-generator-v2:0)
  • API Gateway resources:
    • Rest API: ImageEditingAppBackendAPI
    • Methods:
      • OPTIONS – Added header mapping for CORS
      • POST – Lambda integration
    • Authorization: Through Amazon Cognito using CognitoAuthorizer

After you deploy the CloudFormation template, copy the following from the Outputs tab to be used during the deployment of Amplify:

  • userPoolId
  • userPoolClientId
  • invokeUrl

CFN Output

Deploy the Amplify application

You have to manually deploy the Amplify application using the frontend code found on GitHub. Complete the following steps:

  1. Download the frontend code from the GitHub repo.
  2. Unzip the downloaded file and navigate to the folder.
  3. In the js folder, find the config.js file and replace the values of XYZ for userPoolId, userPoolClientId, and invokeUrl with the values you collected from the CloudFormation stack outputs. Set the region value based on the Region where you’re deploying the solution.

The following is an example config.js file:

window._config = {
    cognito: {
        userPoolId: 'XYZ', // e.g. us-west-2_uXboG5pAb
        userPoolClientId: 'XYZ', // e.g. 25ddkmj4v6hfsfvruhpfi7n4hv
        region: 'XYZ' // e.g. us-west-2
    },
    api: {
        invokeUrl: 'XYZ' // e.g. https://rc7nyt4tql.execute-api.us-west-2.amazonaws.com/prod
    }
};

Extract Update Config File

  1. Select all the files and compress them as shown in the following screenshot.

Make sure you zip the contents and not the top-level folder. For example, if your build output generates a folder named AWS-Amplify-Code, navigate into that folder and select all the contents, and then zip the contents.

Create New Zip File

  1. Use the new .zip file to manually deploy the application in Amplify.

After it’s deployed, you will receive a domain that you can use in later steps to access the application.

AWS Amplify Search Create App

  1. Create a test user in the Amazon Cognito user pool.

An email address is required for this user because you will need to mark the email address as verified.

Cognito Create User

  1. Return to the Amplify page and use the domain it automatically generated to access the application.

Use Amazon Cognito for user authentication

Amazon Cognito is an identity platform that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify the user before they can use the image editing application.

Upon accessing the Image Editing Tool URL, you will be prompted to sign in with a previously created test user. For first-time sign-ins, users will be asked to update their password. After this process, the user’s credentials are validated against the records stored in the user pool. If the credentials match, Amazon Cognito will issue a JSON Web Token (JWT). In the API payload to be sent section of the page, you will notice that the Authorization field has been updated with the newly issued JWT.
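The JWT that Amazon Cognito issues consists of three base64url-encoded segments (header, payload, signature) separated by dots. For illustration, the unverified payload can be inspected with standard-library code. This is a sketch only; in a real application the token must be verified against the user pool’s signing keys before any claim is trusted:

```python
import base64
import json

def jwt_payload(token):
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

This is the token that appears in the Authorization field of the API payload shown in the application.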

Use Lambda for backend code and Amazon Bedrock for generative AI function

The backend code is hosted on Lambda and launched by user requests routed through API Gateway. The Lambda function processes the request payload and forwards it to Amazon Bedrock. The reply from Amazon Bedrock follows the same route as the initial request.

Use API Gateway for API management

API Gateway streamlines API management, allowing developers to deploy, maintain, monitor, secure, and scale their APIs effortlessly. In our use case, API Gateway serves as the orchestrator for the application logic and provides throttling to manage the load to the backend. Without API Gateway, you would need to use the JavaScript SDK in the frontend to interact directly with the Amazon Bedrock API, bringing more work to the frontend.

Use Amplify for frontend code

Amplify offers a development environment for building secure, scalable mobile and web applications. It allows developers to focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our frontend code using the method outlined earlier in this post.

Image editing tool walkthrough

Navigate to the URL provided after you created the application in Amplify and sign in. On your first sign-in attempt, you'll be asked to reset your password.

App Login

As you follow the steps for this tool, you will notice the API Payload to be Sent section on the right side updating dynamically, reflecting the details mentioned in the corresponding steps that follow.

Step 1: Create a mask on your image

To create a mask on your image, choose a file (JPEG, JPG, or PNG).

After the image is loaded, the frontend converts the file into base64 and updates the base_image value.

As you select the portion of the image you want to edit, a mask is created and the mask value is updated with a new base64 value. You can also use the stroke size option to adjust the area you're selecting.

You now have the original image and the mask image encoded in base64. (The Amazon Titan Image Generator G1 model requires the inputs to be in base64 encoding.)
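What the frontend does here can be reproduced in a few lines of Python (using placeholder bytes in place of real image files):

```python
import base64

def to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as the base64 string the Titan model expects."""
    return base64.b64encode(image_bytes).decode("utf-8")

# Stand-ins for the loaded image and the mask drawn in the UI
base_image = to_base64(b"\x89PNG...original image bytes...")
mask_image = to_base64(b"\x89PNG...mask image bytes...")

print(base_image[:12])  # base64 text, safe to embed in a JSON payload
```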

Choose File and Create Mask

Step 2: Write a prompt and set your options

Write a prompt that describes what you want to do with the image. For this example, we enter Make the driveway clear and empty. This is reflected in the prompt on the right.

You can choose from the following image editing options: inpainting and outpainting. The value for mode is updated depending on your selection.

  • Use inpainting to remove masked elements and replace them with background pixels
  • Use outpainting to extend the pixels of the masked image to the image boundaries
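Putting the pieces together, a request body for the Titan Image Generator G1 model can be assembled roughly as follows. The field names follow the published Titan G1 schema, but the exact payload the solution sends may differ; treat this as a sketch:

```python
import json

def build_titan_payload(mode: str, prompt: str, base_image: str, mask: str) -> dict:
    """Build an inpainting/outpainting request for Titan Image Generator G1.

    Field names follow the published Titan G1 schema; tune the
    imageGenerationConfig values to your needs.
    """
    task_type = "INPAINTING" if mode == "inpainting" else "OUTPAINTING"
    params_key = "inPaintingParams" if mode == "inpainting" else "outPaintingParams"
    return {
        "taskType": task_type,
        params_key: {
            "text": prompt,       # e.g. "Make the driveway clear and empty"
            "image": base_image,  # base64-encoded original image
            "maskImage": mask,    # base64-encoded mask
        },
        "imageGenerationConfig": {
            "numberOfImages": 2,  # the solution renders two outputs
            "height": 512,
            "width": 512,
        },
    }

payload = build_titan_payload("inpainting", "Make the driveway clear and empty",
                              "<base64 image>", "<base64 mask>")
print(json.dumps(payload)[:60])
```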

Choose Send to API to send the payload to API Gateway. This action invokes the Lambda function, which validates the received payload. If validation succeeds, the Lambda function invokes the Amazon Bedrock API for further processing.

The Amazon Bedrock API generates two image outputs in base64 format, which are transmitted back to the frontend application and rendered as visual images.
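Decoding those outputs is the inverse of the earlier encoding step. Assuming the Titan G1 response shape (an images array of base64 strings), a client could write them to disk like this:

```python
import base64

def save_images(response_body: dict, prefix: str = "result") -> list:
    """Decode each base64 image in a Titan G1 response and write it to disk."""
    paths = []
    for i, image_b64 in enumerate(response_body.get("images", [])):
        path = f"{prefix}_{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(image_b64))
        paths.append(path)
    return paths

# Example with a tiny fake payload in the assumed response shape
fake = {"images": [base64.b64encode(b"fake-png-bytes").decode()]}
print(save_images(fake))  # ['result_0.png']
```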

Prompt

Step 3: View and download the result

The following screenshot shows the results of our test. You can download the results or provide an updated prompt to get a new output.

Download

Testing and troubleshooting

When you initiate the Send to API action, the system performs a validation check. If required information is missing or incorrect, it will display an error notification. For instance, if you attempt to send an image to the API without providing a prompt, an error message will appear on the right side of the interface, alerting you to the missing input, as shown in the following screenshot.

App Error

Clean up

If you decide to discontinue using the Image Editing Tool, you can follow these steps to remove the Image Editing Tool, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

  1. Delete the CloudFormation stack:
    1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
    2. Locate the stack you created during the deployment process (you assigned a name to it).
    3. Select the stack and choose Delete.
  2. Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion

In this post, we explored a sample solution that you can use to deploy an image editing application by using AWS serverless services and generative AI services. We used Amazon Bedrock and an Amazon Titan FM that allows you to edit images by using prompts. By adopting this solution, you gain the advantage of using AWS managed services, so you don’t have to maintain the underlying infrastructure. Get started today by deploying this sample solution.

Additional resources

To learn more about Amazon Bedrock, see the following resources:

To learn more about the Amazon Titan Image Generator G1 model, see the following resources:


About the Authors

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.

Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.

Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Read More

Brilliant words, brilliant writing: Using AWS AI chips to quickly deploy Meta Llama 3-powered applications


Many organizations are building generative AI applications powered by large language models (LLMs) to boost productivity and build differentiated experiences. These LLMs are large and complex and deploying them requires powerful computing resources and results in high inference costs. For businesses and researchers with limited resources, the high inference costs of generative AI models can be a barrier to enter the market, so more efficient and cost-effective solutions are needed. Most generative AI use cases involve human interaction, which requires AI accelerators that can deliver real time response rates with low latency. At the same time, the pace of innovation in generative AI is increasing, and it’s becoming more challenging for developers and researchers to quickly evaluate and adopt new models to keep pace with the market.

One way to get started with LLMs such as Llama and Mistral is by using Amazon Bedrock. However, customers who want to deploy LLMs in their own self-managed workflows for greater control and flexibility over the underlying resources can use these LLMs optimized on top of AWS Inferentia2-powered Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances. In this blog post, we will show how to use an Amazon EC2 Inf2 instance to cost-effectively deploy multiple industry-leading LLMs on AWS Inferentia2, a purpose-built AWS AI chip, helping customers quickly test models and expose an API interface that facilitates performance benchmarking and downstream application calls.

Model introduction

There are many popular open source LLMs to choose from, and for this blog post, we will review three different use cases based on model expertise using Meta-Llama-3-8B-Instruct, Mistral-7B-instruct-v0.2, and CodeLlama-7b-instruct-hf.

Model name | Release company | Number of parameters | Release time | Model capabilities
Meta-Llama-3-8B-Instruct | Meta | 8 billion | April 2024 | Language understanding, translation, code generation, inference, chat
Mistral-7B-Instruct-v0.2 | Mistral AI | 7.3 billion | March 2024 | Language understanding, translation, code generation, inference, chat
CodeLlama-7b-Instruct-hf | Meta | 7 billion | August 2023 | Code generation, code completion, chat

Meta-Llama-3-8B-Instruct is a popular language model released by Meta AI in April 2024. The Llama 3 model has improved pre-training, instruction comprehension, output generation, coding, inference, and math skills. The Meta AI team says that Llama 3 has the potential to be the initiator of a new wave of innovation in AI. The Llama 3 model is available in two publicly released versions, 8B and 70B. At the time of writing, Llama 3.1 instruction-tuned models are available in 8B, 70B, and 405B versions. In this blog post, we will use the Meta-Llama-3-8B-Instruct model, but the same process can be followed for Llama 3.1 models.

Mistral-7B-instruct-v0.2, released by Mistral AI in March 2024, marks a major milestone in the development of the publicly available foundation model. With its impressive performance, efficient architecture, and wide range of features, Mistral 7B v0.2 sets a new standard for user-friendly and powerful AI tools. The model excels at tasks ranging from natural language processing to coding, making it an invaluable resource for researchers, developers, and businesses. In this blog post, we will use the Mistral-7B-instruct-v0.2 model, but the same process can be followed for the Mistral-7B-instruct-v0.3 model.

CodeLlama-7b-instruct-hf is a collection of models published by Meta AI. It is an LLM that uses text prompts to generate code. Code Llama is aimed at code tasks, making developers’ workflow faster and more efficient and lowering the learning threshold for coders. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more powerful and well-documented software.

Solution architecture

The solution uses a client-server architecture. The client uses the HuggingFace Chat UI to provide a chat page that can be accessed on a PC or mobile device. Server-side model inference uses Hugging Face's Text Generation Inference, an efficient LLM inference framework that runs in a Docker container. We pre-compiled the models using Hugging Face's Optimum Neuron and uploaded the compilation results to the Hugging Face Hub. We also added a model switching mechanism to the HuggingFace Chat UI that controls the loading of different models in the Text Generation Inference container through a scheduler.

Solution highlights

  1. All components are deployed on a single-chip Inf2 instance (inf2.xlarge or inf2.8xlarge), so users can experience the effects of multiple models on one instance.
  2. With the client-server architecture, users can flexibly replace either the client or the server side according to their actual needs. For example, the model can be deployed in Amazon SageMaker, and the frontend Chat UI can be deployed on the Node server. To facilitate the demonstration, we deployed both the front and back ends on the same Inf2 server.
  3. Using a publicly available framework, users can customize frontend pages or models according to their own needs.
  4. Using an API interface for Text Generation Inference facilitates quick access for users using the API.
  5. Deployment using AWS CloudFormation, suitable for all types of businesses and developers within the enterprise.

Main components

The following are the main components of the solution.

Hugging Face Optimum Neuron

Optimum Neuron is an interface between the Hugging Face Transformers library and the AWS Neuron SDK. It provides a set of tools for model loading, training, and inference across single- and multi-accelerator setups for different downstream tasks. In this post, we mainly used Optimum Neuron's export interface. To deploy a Hugging Face Transformers model on Neuron devices, the model must be compiled and exported to a serialized format before inference. The export interface performs ahead-of-time (AOT) compilation using the Neuron compiler (neuronx-cc) and converts the model into a serialized and optimized TorchScript module. This is shown in the following figure.

During the compilation process, we introduced a tensor parallelism mechanism to split the weights, data, and computations between the two NeuronCores. For more compilation parameters, see Export a model to Inferentia.

Hugging Face’s Text Generation Inference (TGI)

Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLMs. TGI provides high performance text generation services for the most popular publicly available foundation LLMs. Its main features are:

  • Simple launcher that provides inference services for many LLMs
  • Supports both generate and stream interfaces
  • Token stream using server-sent events (SSE)
  • Supports AWS Inferentia, Trainium, NVIDIA GPUs and other accelerators

HuggingFace Chat UI

HuggingFace Chat UI is an open-source chat tool built with SvelteKit that can be deployed to Cloudflare, Netlify, Node, and so on. It has the following main features:

  • Page can be customized
  • Conversation records can be stored, and chat records are stored in MongoDB
  • Supports operation on PC and mobile terminals
  • The backend can connect to Text Generation Inference and supports API interfaces such as Anthropic, Amazon SageMaker, and Cohere
  • Compatible with various publicly available foundation models (Llama series, Mistral/Mixtral series, Falcon, and so on)

Thanks to the page customization capabilities of the Hugging Chat UI, we’ve added a model switching function, so users can switch between different models on the same EC2 Inf2 instance.

Solution deployment

  1. Before deploying the solution, make sure you have an inf2.xlarge or inf2.8xlarge usage quota in the us-east-1 (Virginia) or us-west-2 (Oregon) AWS Region. See the reference link for how to apply for a quota.
  2. Sign in to the AWS Management Console and switch the Region to us-east-1 (Virginia) or us-west-2 (Oregon) in the upper right corner of the console page.
  3. Enter CloudFormation in the service search box and choose Create stack.
  4. Select Choose an existing template, and then select Amazon S3 URL.
  5. If you plan to use an existing virtual private cloud (VPC), use the steps in a; if you plan to create a new VPC to deploy, use the steps in b.
    1. Use an existing VPC.
      1. Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_default_vpc_ubuntu22.04.yaml in the Amazon S3 URL.
      2. Stack name: Enter the stack name.
      3. InstanceType: Select inf2.xlarge (lower cost) or inf2.8xlarge (better performance).
      4. KeyPairName (optional): if you want to sign in to the Inf2 instance, enter the KeyPairName name.
      5. VpcId: Select VPC.
      6. PublicSubnetId: Select a public subnet.
      7. VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
      8. Choose Next, then Next again. Choose Submit.
    2. Create a new VPC.
      1. Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_new_vpc_ubuntu22.04.yaml in the Amazon S3 URL.
      2. Stack name: Enter the stack name.
      3. InstanceType: Select inf2.xlarge or inf2.8xlarge.
      4. KeyPairName (optional): If you want to sign in to the Inf2 instance, enter the KeyPairName name.
      5. VpcId: Leave as New.
      6. PublicSubnetId: Leave as New.
      7. VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
  6. Choose Next, and then Next again. Then choose Submit.
  7. After creating the stack, wait for the resources to be created and started (about 15 minutes). When the stack status shows CREATE_COMPLETE, choose Outputs and open the URL listed as the value for the web server's public endpoint (close all VPN connections and firewall programs).

User interface

After the solution is deployed, users can access the preceding URL on a PC or mobile phone. On the page, the Llama3-8B model is loaded by default. Users can switch models in the menu settings: select the model name to activate in the model list and choose Activate. Switching models requires reloading the new model into the Inferentia2 accelerator memory, which takes about 1 minute. During this process, users can check the loading status of the new model by choosing Retrieve model status. If the status is Available, the new model has been successfully loaded.

The effects of the different models are shown in the following figure:

The following figure shows the solution in a browser on a PC:

API interface and performance testing

The solution uses a Text Generation Inference server, which supports the /generate and /generate_stream interfaces and uses port 8080 by default. You can make API calls by replacing <IP> in the following commands with the IP address of the instance deployed previously.

The /generate interface returns the complete response to the client at once, after all tokens have been generated on the server side.

curl <IP>:8080/generate \
    -X POST \
    -d '{"inputs": "Calculate the distance from Beijing to Shanghai"}' \
    -H 'Content-Type: application/json'

The /generate_stream interface reduces waiting latency and enhances the user experience by returning tokens one by one, which is especially useful when the model output is long.

curl <IP>:8080/generate_stream \
    -X POST \
    -d '{"inputs": "Write an essay on the mental health of elementary school students with no more than 300 words."}' \
    -H 'Content-Type: application/json'
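The stream arrives as server-sent events, one data: line per token. The helper below shows how a client can reassemble the generated text; it is tested on canned lines in the shape TGI emits rather than against a live endpoint:

```python
import json

def parse_sse_line(line: str):
    """Extract the token text from one TGI server-sent-event line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    event = json.loads(line[len("data:"):].strip())
    return event.get("token", {}).get("text")

# Canned lines in the shape TGI emits over /generate_stream
lines = [
    'data:{"token":{"id":1,"text":"Hello","logprob":-0.1,"special":false}}',
    'data:{"token":{"id":2,"text":" world","logprob":-0.2,"special":false}}',
]
text = "".join(t for t in (parse_sse_line(l) for l in lines) if t)
print(text)  # Hello world
```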

Here is sample code that uses the requests library in Python:

import requests

url = "http://<IP>:8080/generate"
headers = {"Content-Type": "application/json"}
data = {
    "inputs": "Calculate the distance from Beijing to Shanghai",
    "parameters": {"max_new_tokens": 200},  # cap the generated length
}

response = requests.post(url, headers=headers, json=data)
print(response.text)

Summary

In this blog post, we introduced methods and examples of deploying popular LLMs on AWS AI chips, so that users can quickly experience the productivity improvements provided by LLMs. The models deployed on Inf2 instances have been validated by multiple users and scenarios, showing strong performance and wide applicability. AWS is continuously expanding its application scenarios and features to provide users with efficient and economical computing capabilities. See Inf2 Inference Performance to check the types and list of models supported on the Inferentia2 chip. Contact us to give feedback on your needs or ask questions about deploying LLMs on AWS AI chips.

References


About the authors

Zheng Zhang is a technical expert for Amazon Web Services machine learning products, focusing on Amazon Web Services-based accelerated computing and GPU instances. He has extensive experience in large-scale model training and inference acceleration in machine learning.

Bingyang Huang is a Go-To-Market Specialist of Accelerated Computing at the GCR SSO GenAI team. She has experience in deploying AI accelerators in customers' production environments. Outside of work, she enjoys watching films and exploring good food.

Tian Shi is Senior Solution Architect at Amazon Web Services. He has rich experience in cloud computing, data analysis, and machine learning and is currently dedicated to research and practice in the fields of data science, machine learning, and serverless. His translations include Machine Learning as a Service, DevOps Practices Based on Kubernetes, Practical Kubernetes Microservices, Prometheus Monitoring Practice, and CoreDNS Study Guide in the Cloud Native Era.

Chuan Xie is a Senior Solution Architect at Amazon Web Services Generative AI, responsible for the design, implementation, and optimization of generative artificial intelligence solutions based on the Amazon Cloud. River has many years of production and research experience in the communications, ecommerce, internet and other industries, and rich practical experience in data science, recommendation systems, LLM RAG, and others. He has multiple AI-related product technology invention patents.

Read More