How an NVIDIA Engineer Unplugs to Recharge During Free Days

On a weekday afternoon, Ashwini Ashtankar sat on the bank of the Doodhpathri River, in a valley nestled in the Himalayas. Taking a deep breath, she noticed that there was no city noise, no pollution — and no work emails.

Ashtankar, a senior tools development engineer in NVIDIA’s Pune, India, office, took advantage of the company’s free days — two extra days off per quarter when the whole company disconnects from work — to recharge. Free days are fully paid by NVIDIA, not counted as vacation or as personal time off, and are in addition to country-specific holidays and time-away programs.

Free days give employees time to take an adventure, a breather — or both. Ashtankar and her husband, Dipen Sisodia — also an NVIDIAN — spent theirs outdoors, hiking up a mountain, playing in snow and exploring forests and lush green meadows.

“My free days give me time to focus on myself and recharge,” said Ashtankar. “We didn’t take our laptops. We were able to completely disconnect, like all NVIDIANs were doing at the same time.”

Ashtankar returned to work feeling refreshed and recharged, she said. Her team tests software features of NVIDIA products, focusing on GPU display drivers and the GeForce NOW game-streaming service, to make sure bugs are found and addressed before a product reaches customers.

“I take pride in tackling challenges with the highest level of quality and creativity, all in support of delivering the best products to our customers,” she said. “To do that, sometimes the most productive thing we can do is rest and let the soul catch up with the body.”

Ashtankar plans to build her career at NVIDIA for many years to come.

“I’ve never heard of another company that truly cares this much about its employees,” she said.

Learn more about NVIDIA life, culture and careers.

The future of productivity agents with NinjaTech AI and AWS Trainium

This is a guest post by Arash Sadrieh, Tahir Azim, and Tengfei Xue from NinjaTech AI.

NinjaTech AI’s mission is to make everyone more productive by taking care of time-consuming complex tasks with fast and affordable artificial intelligence (AI) agents. We recently launched MyNinja.ai, one of the world’s first multi-agent personal AI assistants, to drive towards our mission. MyNinja.ai is built from the ground up using specialized agents that are capable of completing tasks on your behalf, including scheduling meetings, conducting deep research from the web, generating code, and helping with writing. These agents can break down complicated, multi-step tasks into branched solutions, and are capable of evaluating the generated solutions dynamically while continually learning from past experiences. All of these tasks are accomplished in a fully autonomous and asynchronous manner, freeing you up to continue your day while Ninja works on these tasks in the background, and engaging when your input is required.

Because no single large language model (LLM) is perfect for every task, we knew that building a personal AI assistant would require multiple LLMs optimized specifically for a variety of tasks. In order to deliver the accuracy and capabilities to delight our users, we also knew that we would require these multiple models to work together in tandem. Finally, we needed scalable and cost-effective methods for training these various models—an undertaking that has historically been costly to pursue for most startups. In this post, we describe how we built our cutting-edge productivity agent NinjaLLM, the backbone of MyNinja.ai, using AWS Trainium chips.

Building a dataset

We recognized early that to deliver on the mission of tackling tasks on a user’s behalf, we needed multiple models optimized for specific tasks. Examples include our Deep Researcher, Deep Coder, and Advisor models. After testing available open source models, we felt that their out-of-the-box capabilities and responses were insufficient with prompt engineering alone to meet our needs. Specifically, we wanted to make sure each model was optimized for a ReAct/chain-of-thought style of prompting. We also wanted each model, when deployed as part of a Retrieval Augmented Generation (RAG) system, to accurately cite its sources and to bias towards saying “I don’t know” rather than generating false answers. For that purpose, we chose to fine-tune the models for the various downstream tasks.

In constructing our training dataset, our goal was twofold: adapt each model to its suited downstream task and persona (Researcher, Advisor, Coder, and so on), and adapt the models to follow a specific output structure. To that end, we followed the LIMA approach to fine-tuning: we used a training sample size of roughly 20 million tokens, focusing on the format and tone of the output while keeping the sample diverse but relatively small. To construct our supervised fine-tuning dataset, we began by creating initial seed tasks for each model. With these seed tasks, we generated an initial synthetic dataset using Meta’s Llama 2 model and used it to perform an initial round of fine-tuning. To evaluate the performance of this fine-tuned model, we crowd-sourced user feedback and used it to iteratively create more samples. We also used a series of internal and public benchmarks to assess model performance and continued to iterate.
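
The pipeline below is a simplified sketch of that seed-to-synthetic-data step, not our production code: the prompt template, the persona fields, and the call_llama2 placeholder (standing in for whichever hosted Llama 2 endpoint generates the completions) are illustrative assumptions.

import json
from typing import Callable, Dict, List

PROMPT_TEMPLATE = (
    "You are a {persona} assistant. Answer the task below using a "
    "ReAct (Thought/Action/Observation) structure and cite your sources.\n\n"
    "Task: {task}\n"
)

def build_synthetic_samples(
    seed_tasks: List[Dict], generate: Callable[[str], str]
) -> List[Dict]:
    """Expand seed tasks into (prompt, completion) pairs for supervised fine-tuning."""
    samples = []
    for seed in seed_tasks:
        prompt = PROMPT_TEMPLATE.format(persona=seed["persona"], task=seed["task"])
        completion = generate(prompt)  # teacher model (for example, Llama 2) writes the target output
        samples.append({"prompt": prompt, "completion": completion})
    return samples

if __name__ == "__main__":
    seeds = [{"persona": "Researcher", "task": "Summarize recent work on ReAct prompting."}]

    def call_llama2(prompt: str) -> str:  # placeholder for a real model client
        return "Thought: ...\nAction: search\nObservation: ...\nAnswer: ..."

    with open("sft_seed_batch.jsonl", "w") as f:
        for sample in build_synthetic_samples(seeds, call_llama2):
            f.write(json.dumps(sample) + "\n")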

Fine-tuning on Trainium

We elected to start with the Llama models as our pre-trained base for several reasons: most notably the great out-of-the-box performance, strong ecosystem support from various libraries, and the truly open source and permissive license. At the time, we began with Llama 2, testing across the various sizes (7B, 13B, and 70B). For training, we chose a cluster of trn1.32xlarge instances to take advantage of Trainium chips. We used a cluster of 32 instances in order to efficiently parallelize the training, and we used AWS ParallelCluster to manage cluster orchestration. By using a cluster of Trainium instances, each fine-tuning iteration took less than 3 hours at a cost of less than $1,000. This quick iteration time and low cost allowed us to quickly tune and test our models and improve model accuracy. To achieve the accuracies discussed in the following sections, we only had to spend around $30k, saving hundreds of thousands, if not millions, of dollars compared with training on traditional accelerators.
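
For readers unfamiliar with how code targets Trainium, the fragment below is a minimal, single-device sketch of a training step in PyTorch: the Neuron SDK exposes NeuronCores through torch-xla, so a Hugging Face-style causal language model can be trained with an ordinary loop plus XLA-specific step calls. Our actual jobs used the Neuron Distributed libraries across the 32-instance cluster; the model, data loader, and optimizer here are placeholders.

import torch_xla.core.xla_model as xm

def train_epoch(model, loader, optimizer):
    """One epoch of causal-LM fine-tuning on an XLA device (a NeuronCore on trn1)."""
    device = xm.xla_device()
    model.to(device).train()
    for batch in loader:
        input_ids = batch["input_ids"].to(device)
        labels = batch["labels"].to(device)
        loss = model(input_ids=input_ids, labels=labels).loss  # HF-style model returns .loss
        loss.backward()
        xm.optimizer_step(optimizer)  # all-reduce gradients (if distributed) and step
        optimizer.zero_grad()
        xm.mark_step()                # execute the lazily traced graph on the device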

The following diagram illustrates our training architecture.

After we had established our fine-tuning pipelines on top of Trainium, we were able to fine-tune and refine our models using the Neuron Distributed training libraries. This was exceptionally useful and timely, because leading up to the launch of MyNinja.ai, Meta’s Llama 3 models were released. Llama 3 and Llama 2 share a similar architecture, so we were able to rapidly upgrade to the newer model. This velocity in switching allowed us to take advantage of the inherent gains in model accuracy, quickly run another round of fine-tuning with the Llama 3 weights, and prepare for launch.

Model evaluation

For evaluating the model, there were two objectives: evaluate the model’s ability to answer user questions, and evaluate the system’s ability to answer questions with provided sources, because this is our personal AI assistant’s primary interface. We selected the HotPotQA and Natural Questions (NQ) Open datasets, both of which are a good fit because they are open benchmarking datasets with public leaderboards.

We calculated accuracy by matching the model’s answer to the expected answer, using the top 10 passages retrieved from a Wikipedia corpus. We performed content filtering and ranking using ColBERTv2, a BERT-based retrieval model. We achieved accuracies of 62.22% on the NQ Open dataset and 58.84% on HotPotQA by using our enhanced Llama 3 RAG model, demonstrating notable improvements over other baseline models. The following figure summarizes our results.
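
As a rough illustration of how such an accuracy number can be computed, the sketch below scores predictions by checking whether a normalized gold answer appears in the model output. The retrieve_top_k and rag_answer callables are hypothetical stand-ins for the ColBERTv2 retriever and the fine-tuned RAG model; the real benchmark harness differs in its details.

import re
from typing import Callable, Dict, List

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so answer matching is not affected by formatting."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def evaluate(
    examples: List[Dict],
    retrieve_top_k: Callable[[str, int], List[str]],
    rag_answer: Callable[[str, List[str]], str],
    k: int = 10,
) -> float:
    """Return the fraction of questions whose prediction contains a gold answer."""
    correct = 0
    for ex in examples:
        passages = retrieve_top_k(ex["question"], k)      # e.g. ColBERTv2 over Wikipedia
        prediction = rag_answer(ex["question"], passages)  # fine-tuned model with retrieved context
        if any(normalize(gold) in normalize(prediction) for gold in ex["answers"]):
            correct += 1
    return correct / max(len(examples), 1)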

Future work

Looking ahead, we’re working on several developments to continue improving our model’s performance and user experience. First, we intend to use ORPO (odds ratio preference optimization) to fine-tune our models. ORPO combines traditional fine-tuning with preference alignment, using a single preference-alignment dataset for both. We believe this will allow us to better align our models and achieve better results for users.
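
As a sketch of what that could look like with the open source TRL library (one possible implementation, not our finalized recipe), ORPO training takes a preference dataset with prompt, chosen, and rejected columns; the model ID, file name, and hyperparameters below are placeholders.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference pairs with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

config = ORPOConfig(
    output_dir="orpo-output",
    beta=0.1,                       # weight of the odds-ratio preference term
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
trainer = ORPOTrainer(model=model, args=config, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()

Because ORPO folds the preference signal into the supervised objective, no separate reward or reference model is needed during training.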

Additionally, we intend to build a custom ensemble model from the various models we have fine-tuned thus far. Inspired by Mixture of Experts (MoE) model architectures, we intend to introduce a routing layer to our various models. We believe this will radically simplify our model serving and scaling architecture, while maintaining the quality across the various tasks that our users have come to expect from our personal AI assistant.
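
The toy example below illustrates the routing idea only: a lightweight classifier picks a specialist model and dispatches the request. The keyword-based classify function and the model registry are hypothetical; a production router would itself be a learned model.

from typing import Callable, Dict

# Hypothetical registry of specialist models; real entries would call deployed endpoints.
MODELS: Dict[str, Callable[[str], str]] = {
    "research": lambda q: f"[Deep Researcher] {q}",
    "code": lambda q: f"[Deep Coder] {q}",
    "advice": lambda q: f"[Advisor] {q}",
}

def classify(query: str) -> str:
    """Stand-in for a learned router that assigns each request to a specialist."""
    lowered = query.lower()
    if "code" in lowered or "function" in lowered:
        return "code"
    if "should i" in lowered:
        return "advice"
    return "research"

def route(query: str) -> str:
    return MODELS[classify(query)](query)

print(route("Write a function that parses CSV files"))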

Conclusion

Building next-gen AI agents to make everyone more productive is NinjaTech AI’s pathway to achieving its mission. To democratize access to this transformative technology, it is critical to have access to high-powered compute, open source models, and an ecosystem of tools that make training each new agent affordable and fast. AWS’s purpose-built AI chips, access to the top open source models, and its training architecture make this possible.

To learn more about how we built NinjaTech AI’s multi-agent personal AI, you can read our whitepaper. You can also try these AI agents for free at MyNinja.ai.


About the authors

 Arash Sadrieh is the Co-Founder and Chief Science Officer at Ninjatech.ai. Arash co-founded Ninjatech.ai with a vision to make everyone more productive by taking care of time-consuming tasks with AI agents. This vision was shaped during his tenure as a Senior Applied Scientist at AWS, where he drove key research initiatives that significantly improved infrastructure efficiency over six years, earning him multiple patents for optimizing core infrastructure. His academic background includes a PhD in computer modeling and simulation, with collaborations with esteemed institutions such as Oxford University, Sydney University, and CSIRO. Prior to his industry tenure, Arash had a postdoctoral research tenure marked by publications in high-impact journals, including Nature Communications.

Tahir Azim is a Staff Software Engineer at NinjaTech. Tahir focuses on NinjaTech’s Inf2 and Trn1 based training and inference platforms, its unified gateway for accessing these platforms, and its RAG-based research skill. He previously worked at Amazon as a senior software engineer, building data-driven systems for optimal utilization of Amazon’s global Internet edge infrastructure, driving down cost, congestion and latency. Before moving to industry, Tahir earned an M.S. and Ph.D. in Computer Science from Stanford University, taught for three years as an assistant professor at NUST (Pakistan), and did a postdoc in fast data analytics systems at EPFL. Tahir has authored several publications presented at top-tier conferences such as VLDB, USENIX ATC, MobiCom and MobiHoc.

Tengfei Xue is an Applied Scientist at NinjaTech AI. His current research interests include natural language processing and multimodal learning, particularly using large language models and large multimodal models. Tengfei completed his PhD studies at the School of Computer Science, University of Sydney, where he focused on deep learning for healthcare using various modalities. He was also a visiting PhD candidate at the Laboratory of Mathematics in Imaging (LMI) at Harvard University, where he worked on 3D computer vision for complex geometric data.

Build generative AI applications on Amazon Bedrock — the secure, compliant, and responsible foundation

Generative AI has revolutionized industries by creating content, from text and images to audio and code. Although it can unlock numerous possibilities, integrating generative AI into applications demands meticulous planning. Amazon Bedrock is a fully managed service that provides access to large language models (LLMs) and other foundation models (FMs) from leading AI companies through a single API. It provides a broad set of tools and capabilities to help build generative AI applications.

Starting today, I’ll be writing a blog series to highlight some of the key factors driving customers to choose Amazon Bedrock. One of the most important reasons is that Amazon Bedrock enables customers to build a secure, compliant, and responsible foundation for generative AI applications. In this post, I explore how Amazon Bedrock helps address security and privacy concerns, enables secure model customization, accelerates auditability and incident response, and fosters trust through transparency and responsible AI. Plus, I’ll showcase real-world examples of companies building secure generative AI applications on Amazon Bedrock, demonstrating its practical applications across different industries.

Listening to what our customers are saying

During the past year, my colleague Jeff Barr, VP & Chief Evangelist at AWS, and I have had the opportunity to speak with numerous customers about generative AI. They mention compelling reasons for choosing Amazon Bedrock to build and scale their transformative generative AI applications. Jeff’s video highlights some of the key factors driving customers to choose Amazon Bedrock today.

As you build and operationalize generative AI, it’s important not to lose sight of critically important elements—security, compliance, and responsible AI—particularly for use cases involving sensitive data. The OWASP Top 10 for LLMs outlines the most common vulnerabilities, and addressing them may require additional effort, including stringent access controls, data encryption, prevention of prompt injection attacks, and compliance with policies. You want to make sure your AI applications work reliably as well as securely.

Making data security and privacy a priority

For many organizations starting their generative AI journey, the first concern is making sure the organization’s data remains secure and private when used for model tuning or Retrieval Augmented Generation (RAG). Amazon Bedrock provides a multi-layered approach to address this concern, helping you ensure that your data remains secure and private throughout the entire lifecycle of building generative AI applications:

  • Data isolation and encryption. Any customer content processed by Amazon Bedrock, such as customer inputs and model outputs, is not shared with any third-party model providers, and will not be used to train the underlying FMs. Furthermore, data is encrypted in-transit using TLS 1.2+ and at-rest through AWS Key Management Service (AWS KMS).
  • Secure connectivity options. Customers have flexibility with how they connect to Amazon Bedrock’s API endpoints. You can use public internet gateways, AWS PrivateLink (VPC endpoint) for private connectivity, and even backhaul traffic over AWS Direct Connect from your on-premises networks.
  • Model access controls. Amazon Bedrock provides robust access controls at multiple levels. Model access policies allow you to explicitly allow or deny enabling specific FMs for your account. AWS Identity and Access Management (IAM) policies let you further restrict which provisioned models your applications and roles can invoke, and which APIs on those models can be called.
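
To make these controls concrete, here is a minimal sketch of invoking a model with the AWS SDK for Python (Boto3). The model ID is an example, the commented endpoint URL shows where a PrivateLink (VPC endpoint) address would go, and the IAM policies attached to the calling role determine which models the call is allowed to reach.

import json
import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    # endpoint_url="https://vpce-0123456789abcdef0.bedrock-runtime.us-east-1.vpce.amazonaws.com",
    # Uncomment to send traffic over an AWS PrivateLink VPC endpoint instead of the public endpoint.
)

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",  # example model; IAM policies control which models you can invoke
    body=json.dumps({"inputText": "Summarize the shared responsibility model in two sentences."}),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])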

Druva provides a data security software-as-a-service (SaaS) solution to enable cyber, data, and operational resilience for all businesses. They used Amazon Bedrock to rapidly experiment, evaluate, and implement different LLM components tailored to solve specific customer needs around data protection without worrying about the underlying infrastructure management.

“We built our new service Dru — an AI co-pilot that both IT and business teams can use to access critical information about their protection environments and perform actions in natural language — in Amazon Bedrock because it provides fully managed and secure access to an array of foundation models,”

– David Gildea, Vice President of Product, Generative AI at Druva.

Ensuring secure customization

A critical aspect of generative AI adoption for many organizations is the ability to securely customize the application to align with your specific use cases and requirements, including RAG or fine-tuning FMs. Amazon Bedrock offers a secure approach to model customization, so sensitive data remains protected throughout the entire process:

  • Model customization data security. When fine-tuning a model, Amazon Bedrock uses the encrypted training data from an Amazon Simple Storage Service (Amazon S3) bucket through a private VPC connection. Amazon Bedrock doesn’t use model customization data for any other purpose. Your training data isn’t used to train the base Amazon Titan models or distributed to third parties. Nor is other usage data, such as usage timestamps, logged account IDs, and other information logged by the service, used to train the models. In fact, none of the training or validation data you provide for fine-tuning or continued pre-training is stored by Amazon Bedrock. When the model customization work is complete, the customized model remains isolated and encrypted with your KMS keys.
  • Secure deployment of fine-tuned models. The pre-trained or fine-tuned models are deployed in isolated environments specifically for your account. You can further encrypt these models with your own KMS keys, preventing access without appropriate IAM permissions.
  • Centralized multi-account model access.  AWS Organizations provides you with the ability to centrally manage your environment across multiple accounts. You can create and organize accounts in an organization, consolidate costs, and apply policies for custom environments. For organizations with multiple AWS accounts or a distributed application architecture, Amazon Bedrock supports centralized governance and access to FMs – you can secure your environment, create and share resources, and centrally manage permissions. Using standard AWS cross-account IAM roles, administrators can grant secure access to models across different accounts, enabling controlled and auditable usage while maintaining a centralized point of control.

With seamless access to LLMs in Amazon Bedrock—and with data encrypted in-transit and at-rest—BMW Group securely delivers high-quality connected mobility solutions to motorists around the world.

“Using Amazon Bedrock, we’ve been able to scale our cloud governance, reduce costs and time to market, and provide a better service for our customers. All of this is helping us deliver the secure, first-class digital experiences that people across the world expect from BMW.”

– Dr. Jens Kohl, Head of Offboard Architecture, BMW Group.

Enabling auditability and visibility

In addition to the security controls around data isolation, encryption, and access, Amazon Bedrock provides capabilities to enable auditability and accelerate incident response when needed:

  • Compliance certifications. For customers with stringent regulatory requirements, you can use Amazon Bedrock in compliance with the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and more. In addition, AWS has successfully extended the registration status of Amazon Bedrock in Cloud Infrastructure Service Providers in Europe Data Protection Code of Conduct (CISPE CODE) Public Register. This declaration provides independent verification and an added level of assurance that Amazon Bedrock can be used in compliance with the GDPR. For Federal agencies and public sector organizations, Amazon Bedrock recently announced FedRAMP Moderate, approved for use in our US East and West AWS Regions. Amazon Bedrock is also under JAB review for FedRAMP High authorization in AWS GovCloud (US).
  • Monitoring and logging. Native integrations with Amazon CloudWatch and AWS CloudTrail provide comprehensive monitoring, logging, and visibility into API activity, model usage metrics, token consumption, and other performance data. These capabilities enable continuous monitoring for improvement, optimization, and auditing as needed – something we know is critical from working with customers in the cloud for the last 18 years. Amazon Bedrock allows you to enable detailed logging of all model inputs and outputs, including the IAM invocation role and the metadata associated with all calls performed in your account. These logs help you monitor model responses for adherence to your organization’s AI policies and reputation guidelines. When you enable model invocation logging, you can use AWS KMS to encrypt your log data and use IAM policies to control who can access it. None of this data is stored within Amazon Bedrock; it is only available within your account.
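
As a hedged example of what enabling this looks like programmatically, the snippet below configures model invocation logging with Boto3; the log group, IAM role ARN, and bucket name are placeholders for resources you would create in your own account.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/model-invocations",                    # placeholder log group
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",  # placeholder role
        },
        "s3Config": {
            "bucketName": "my-bedrock-invocation-logs",                      # placeholder bucket
            "keyPrefix": "invocations/",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }
)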

Implementing responsible AI practices

AWS is committed to developing generative AI responsibly, taking a people-centric approach that prioritizes education, science, and our customers, to integrate responsible AI across the full AI lifecycle. With AWS’s comprehensive approach to responsible AI development and governance, Amazon Bedrock empowers you to build trustworthy generative AI systems in line with your responsible AI principles.

We give our customers the tools, guidance, and resources they need to get started with purpose-built services and features, including several in Amazon Bedrock:

  • Safeguard generative AI applications. Guardrails for Amazon Bedrock is the only responsible AI capability offered by a major cloud provider that enables customers to customize and apply safety, privacy, and truthfulness checks for their generative AI applications. Guardrails helps customers block as much as 85% more harmful content than the protection natively provided by some FMs on Amazon Bedrock today. It works with all LLMs in Amazon Bedrock and with fine-tuned models, and it also integrates with Agents and Knowledge Bases for Amazon Bedrock. Customers can define content filters with configurable thresholds to help filter harmful content across hate speech, insults, sexual language, violence, misconduct (including criminal activity), and prompt attacks (prompt injection and jailbreak). Using a short natural language description, Guardrails for Amazon Bedrock allows you to detect and block user inputs and FM responses that fall under restricted topics or sensitive content such as personally identifiable information (PII). You can combine multiple policy types to configure these safeguards for different scenarios and apply them across FMs on Amazon Bedrock, so that your generative AI applications adhere to your organization’s responsible AI policies and provide a consistent and safe user experience. A minimal sketch of defining a guardrail with the AWS SDK appears after this list.
  • Model evaluation. Now available in preview, Model Evaluation on Amazon Bedrock helps customers evaluate, compare, and select the best FMs for their specific use case based on custom metrics, such as accuracy and safety, using either automatic or human evaluations. For automatic evaluations, customers pick criteria such as accuracy or toxicity, and use their own data or public datasets. For evaluations needing human judgment, customers can easily set up workflows for human review with a few clicks. After setup, Amazon Bedrock runs the evaluations and provides a report showing how well the model performed on important safety and accuracy measures. This report helps customers choose the best model for their needs, which is even more important when they are evaluating a migration to a new model in Amazon Bedrock against an existing model used by an application.
  • Watermark detection. All Amazon Titan FMs are built with responsible AI in mind. Amazon Titan Image Generator, a foundation model that lets users create realistic, studio-quality images in large volumes and at low cost using natural language prompts, embeds imperceptible digital watermarks in the images it creates. Watermark detection allows you to identify images generated by Amazon Titan Image Generator, helping increase transparency around AI-generated content by mitigating harmful content generation and reducing the spread of misinformation. It also provides a confidence score, allowing you to assess the reliability of the detection even if the original image has been modified. Simply upload an image in the Amazon Bedrock console, and the API will detect watermarks embedded in images created by Titan Image Generator, including those generated by the base model and any customized versions.
  • AI Service Cards provide transparency and document the intended use cases and fairness considerations for our AWS AI services. Our latest service cards include Amazon Titan Text Premier, Amazon Titan Text Lite, and Amazon Titan Text Express, with more coming soon.
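
Referenced from the Guardrails item above, the following is a minimal sketch of defining a guardrail with Boto3. The filter strengths, the example restricted topic, and the blocked-response messages are illustrative choices, not recommendations.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

guardrail = bedrock.create_guardrail(
    name="demo-guardrail",
    description="Example guardrail with content filters and one restricted topic.",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},  # prompt-attack filters apply to inputs only
        ]
    },
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "FinancialAdvice",
                "definition": "Providing personalized investment recommendations.",
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)
print(guardrail["guardrailId"], guardrail["version"])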

Aha! is a software company that helps more than 1 million people bring their product strategy to life.

“Our customers depend on us every day to set goals, collect customer feedback, and create visual roadmaps. That is why we use Amazon Bedrock to power many of our generative AI capabilities. Amazon Bedrock provides responsible AI features, which enable us to have full control over our information through its data protection and privacy policies, and block harmful content through Guardrails for Bedrock.”

– Dr. Chris Waters, co-founder and Chief Technology Officer at Aha!

Building trust through transparency

By addressing security, compliance, and responsible AI holistically, Amazon Bedrock helps customers unlock generative AI’s transformative potential. As generative AI capabilities continue to evolve rapidly, building trust through transparency is crucial. We are continuously improving Amazon Bedrock to help you develop safe and secure applications and build generative AI responsibly.

The bottom line? Amazon Bedrock makes it effortless for you to unlock sustained growth with generative AI and experience the power of LLMs. Get started today: build AI applications or customize models securely using your data to start your generative AI journey with confidence.

Resources

For more information about generative AI and Amazon Bedrock, explore the following resources:


About the author

Vasi Philomin is VP of Generative AI at AWS. He leads generative AI efforts, including Amazon Bedrock and Amazon Titan.

Build a conversational chatbot using different LLMs within single interface – Part 1

With the advent of generative artificial intelligence (AI), foundation models (FMs) can generate content such as answers to questions, text summaries, and highlights from a source document. However, model selection involves a wide choice of model providers, such as Amazon, Anthropic, AI21 Labs, Cohere, and Meta, coupled with diverse real-world data formats in PDF, Word, text, CSV, image, audio, or video.

Amazon Bedrock is a fully managed service that makes it straightforward to build and scale generative AI applications. Amazon Bedrock offers a choice of high-performing FMs from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, through a single API. It enables you to privately customize FMs with your data using techniques such as fine-tuning, prompt engineering, and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources while complying with security and privacy requirements.

In this post, we show you a solution for building a single interface conversational chatbot that allows end-users to choose between different large language models (LLMs) and inference parameters for varied input data formats. The solution uses Amazon Bedrock to create choice and flexibility to improve the user experience and compare the model outputs from different options.

The entire code base is available in GitHub, along with an AWS CloudFormation template.

What is RAG?

Retrieval Augmented Generation (RAG) can enhance the generation process by using the benefits of retrieval, enabling a natural language generation model to produce more informed and contextually appropriate responses. By incorporating relevant information from retrieval into the generation process, RAG aims to improve the accuracy, coherence, and informativeness of the generated content.

Implementing an effective RAG system requires several key components working in harmony:

  • Foundation models – The foundation of a RAG architecture is a pre-trained language model that handles text generation. Amazon Bedrock encompasses models from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Amazon that possess strong language comprehension and synthesis abilities to engage in conversational dialogue.
  • Vector store – At the heart of the retrieval functionality is a vector store database persisting document embeddings for similarity search. This allows rapid identification of relevant contextual information. AWS offers many services that can meet your vector database requirements.
  • Retriever – The retriever module uses the vector store to efficiently find pertinent documents and passages to augment prompts.
  • Embedder – To populate the vector store, an embedding model encodes source documents into vector representations consumable by the retriever. Models like Amazon Titan Embeddings G1 – Text v1.2 are ideal for this text-to-vector abstraction.
  • Document ingestion – Robust pipelines ingest, preprocess, and tokenize source documents, chunking them into manageable passages for embedding and efficient lookup. For this solution, we use the LangChain framework for document preprocessing. By orchestrating these core components using LangChain, RAG systems empower language models to access vast knowledge for grounded generation.
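
To make the ingestion path above concrete, here is a hedged sketch using LangChain with a Bedrock embedding model and a local FAISS index; the file name, chunk sizes, and the choice of FAISS are example assumptions rather than the exact configuration of this solution.

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load and chunk a source document.
documents = PyPDFLoader("example.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Embed the chunks with an Amazon Titan text embedding model and index them.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
vector_store = FAISS.from_documents(chunks, embeddings)

# Retrieve passages relevant to a question to augment the prompt.
for doc in vector_store.similarity_search("What does the document say about pricing?", k=3):
    print(doc.page_content[:200])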

We offer fully managed support for the end-to-end RAG workflow through Knowledge Bases for Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can give FMs and agents contextual information from your company’s private data sources for RAG to deliver more relevant, accurate, and customized responses.

To equip FMs with up-to-date and proprietary information, organizations use RAG to fetch data from company data sources and enrich the prompt to provide more relevant and accurate responses. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows. Session context management is built in, so your app can readily support multi-turn conversations.
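
For comparison with the self-managed pipeline, the fully managed workflow can be queried in a few lines; the sketch below assumes an existing knowledge base, and the knowledge base ID and model ARN are placeholders.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our travel reimbursement policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "EXAMPLEKBID",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])  # the response also includes citations for the retrieved sources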

Solution overview

This chatbot is built using RAG, enabling it to provide versatile conversational abilities. The following figure illustrates a sample UI of the Q&A interface using Streamlit and the workflow.

This post provides a single UI with multiple choices for the following capabilities:

  • Leading FMs available through Amazon Bedrock
  • Inference parameters for each of these models
  • Source data input formats for RAG:
    • Text (PDF, CSV, Word)
    • Website link
    • YouTube video
    • Audio
    • Scanned image
    • PowerPoint
  • RAG operation using the LLM, inference parameter, and sources:
    • Q&A
    • Summary: summarize, get highlights, extract text

We have used one of LangChain’s many document loaders, YoutubeLoader. The from_youtube_url function helps extract transcripts and metadata from the YouTube video.

The documents contain two attributes:

  • page_content with the transcripts
  • metadata with basic information about the video

Text is extracted from the transcript and, using the LangChain TextLoader, the document is split and chunked; embeddings are then created and stored in the vector store.
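
The snippet below is a small sketch of that YouTube ingestion path; the video URL and chunk sizes are example values.

from langchain_community.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=EXAMPLE_ID",  # example video URL
    add_video_info=True,                           # populates the metadata attribute
)
documents = loader.load()  # page_content holds the transcript

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
print(len(chunks), documents[0].metadata)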

The following diagram illustrates the solution architecture.

Prerequisites

To implement this solution, you should have the following prerequisites:

  • An AWS account with the required permissions to launch the stack using AWS CloudFormation.
  • The Amazon Elastic Compute Cloud (Amazon EC2) instance hosting the application should have internet access to download the necessary OS patches and application-related Python libraries.
  • A basic understanding of Amazon Bedrock and FMs.
  • This solution uses the Amazon Titan Text Embedding model. Make sure this model is enabled for use in Amazon Bedrock. On the Amazon Bedrock console, choose Model access in the navigation pane.
    • If Amazon Titan Text Embeddings is enabled, the access status will state Access granted.
    • If the model is not available, enable access to the model by choosing Manage model access, selecting Titan Embeddings G1 – Text, and choosing Request model access. The model is enabled for use immediately.

Deploy the solution

The CloudFormation template deploys an Amazon Elastic Compute Cloud (Amazon EC2) instance to host the Streamlit application, along with other associated resources like an AWS Identity and Access Management (IAM) role and Amazon Simple Storage Service (Amazon S3) bucket. For more information about Amazon Bedrock and IAM, refer to How Amazon Bedrock Works with IAM.

In this post, we deploy the Streamlit application over an EC2 instance inside a VPC, but you can deploy it as a containerized application using a serverless solution with AWS Fargate. We discuss this in more detail in Part 2.

Complete the following steps to deploy the solution resources using AWS CloudFormation:

  1. Download the CloudFormation template StreamlitAppServer_Cfn.yml from the GitHub repo.
  2. On the AWS CloudFormation console, create a new stack.
  3. For Prepare template, select Template is ready.
  4. In the Specify template section, provide the following information:
    1. For Template source, select Upload a template file.
    2. Select Choose file and upload the template you downloaded.
  5. Choose Next.

  6. For Stack name, enter a name (for this post, StreamlitAppServer).
  7. In the Parameters section, provide the following information:
    1. For Specify the VPC ID where you want your app server deployed, enter the VPC ID where you want to deploy this application server.
    2. For VPCCidr, enter the CIDR of the VPC you’re using.
    3. For SubnetID, enter the subnet ID from the same VPC.
    4. For MYIPCidr, enter the IP address of your computer or workstation so you can open the Streamlit application in your local browser.

You can run the command curl https://api.ipify.org on your local terminal to get your IP address.

  8. Leave the rest of the parameters at their default values.
  9. Choose Next.
  10. In the Capabilities section, select the acknowledgement check box.
  11. Choose Submit.

Wait until you see the stack status show as CREATE_COMPLETE.

  12. Choose the stack’s Resources tab to see the resources you launched as part of the stack deployment.

  13. Choose the link for S3Bucket to be redirected to the Amazon S3 console.
    1. Note the S3 bucket name to update the deployment script later.
    2. Choose Create folder to create a new folder.
    3. For Folder name, enter a name (for this post, gen-ai-qa).

Make sure to follow AWS security best practices for securing data in Amazon S3. For more details, see Top 10 security best practices for securing data in Amazon S3.

  14. Return to the stack Resources tab and choose the link to StreamlitAppServer to be redirected to the Amazon EC2 console.
    1. Select StreamlitApp_Server and choose Connect.

This will open a new page with various ways to connect to the EC2 instance launched.

  15. For this solution, select Connect using EC2 Instance Connect, then choose Connect.

This will open an Amazon EC2 session in your browser.

  16. Run the following command to monitor the progress of all the Python-related libraries being installed as part of the user data:
tail -f /tmp/userData.log
  17. When you see the message Finished running user data..., you can exit the session by pressing Ctrl + C.

This takes about 15 minutes to complete.

  18. Run the following commands to start the application:
cd $HOME/bedrock-qnachatbot
bucket_name=$(aws cloudformation describe-stacks --stack-name StreamlitAppServer --query "Stacks[0].Outputs[?starts_with(OutputKey, 'BucketName')].OutputValue" --output text)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
aws_region_name=$(curl -s http://169.254.169.254/latest/meta-data/placement/region -H "X-aws-ec2-metadata-token: $TOKEN")
sed -i "s/<S3_Bucket_Name>/${bucket_name}/g" $HOME/bedrock-qnachatbot/src/utils.py
sed -i "s/<AWS_Region>/${aws_region_name}/g" $HOME/bedrock-qnachatbot/src/utils.py
export AWS_DEFAULT_REGION=${aws_region_name}
streamlit run src/1_🏠_Home.py

  19. Make a note of the External URL value.
  20. If you exit the session (or the application is stopped), you can restart the application by running the same commands highlighted in Step 18.

Use the chatbot

Use the external URL you copied in the previous step to access the application.

You can upload your file to start using the chatbot for Q&A.

Clean up

To avoid incurring future charges, delete the resources that you created:

  1. Empty the contents of the S3 bucket you created as a part of this post.
  2. Delete the CloudFormation stack you created as part of this post.

Conclusion

In this post, we showed you how to create a Q&A chatbot that can answer questions across an enterprise’s corpus of documents with a choice of FMs available within Amazon Bedrock, all within a single interface.

In Part 2, we show you how to use Knowledge Bases for Amazon Bedrock with enterprise-grade vector databases like OpenSearch Service, Amazon Aurora PostgreSQL, MongoDB Atlas, Weaviate, and Pinecone with your Q&A chatbot.


About the Authors

Anand Mandilwar is an Enterprise Solutions Architect at AWS. He works with enterprise customers, helping them innovate and transform their business on AWS. He is passionate about automation around cloud operations, infrastructure provisioning, and cloud optimization. He also likes Python programming. In his spare time, he enjoys honing his photography skills, especially in portrait and landscape photography.

NagaBharathi Challa is a solutions architect in the US federal civilian team at Amazon Web Services (AWS). She works closely with customers to effectively use AWS services for their mission use cases, providing architectural best practices and guidance on a wide range of services. Outside of work, she enjoys spending time with family and spreading the power of meditation.

GeForce NOW Unleashes High-Stakes Horror With ‘Resident Evil Village’

Get ready to feel some chills, even amid the summer heat. Capcom’s award-winning Resident Evil Village brings a touch of horror to the cloud this GFN Thursday, part of three new games joining GeForce NOW this week.

And a new app update brings a visual enhancement to members, along with new ways to curate their GeForce NOW gaming libraries.

Greetings on GFN
#GreetingsFromGFN by @railbeam.

Members are showcasing their favorite locations to visit in the cloud. Follow along with #GreetingsFromGFN on @NVIDIAGFN social media accounts and share picturesque scenes from the cloud for a chance to be featured.

The Bell Tolls for All

Resident Evil Village on GeForce NOW
The cloud — big enough, even, for Lady Dimitrescu and her towering castle.

Resident Evil Village, the follow-up to Capcom’s critically acclaimed Resident Evil 7 Biohazard, delivers a gripping blend of survival-horror and action. Step into the shoes of Ethan Winters, a desperate father determined to rescue his kidnapped daughter.

Set against the backdrop of a chilling European village teeming with mutant creatures, the game features a captivating cast of characters, including the enigmatic Lady Dimitrescu, who haunts the dimly lit halls of her grand castle. Fend off hordes of enemies, such as lycanthropic villagers and grotesque abominations.

Experience classic survival-horror tactics — such as resource management and exploration — mixed with action featuring intense combat and higher enemy counts.

Ultimate and Priority members can experience the horrors of this dark and twisted world in gruesome, mesmerizing detail with support for ray tracing and high dynamic range (HDR) for the most lifelike shadows and sharp visual fidelity when navigating every eerie hallway. Members can stream it all seamlessly from NVIDIA GeForce RTX-powered servers in the cloud and get a taste of the chills with the Resident Evil Village demo before taking on the towering Lady Dimitrescu in the full game.

I Can See Clearly Now

The latest GeForce NOW app update — version 2.0.64 — adds support for 10-bit color precision. Available for Ultimate members, this feature enhances image quality when streaming on Windows, macOS and NVIDIA SHIELD TV.

SDR10 on GeForce NOW
Rolling out now.

10-bit color precision significantly improves the accuracy and richness of color gradients during streaming. Members will especially notice its effects in scenes with detailed color transitions, such as vibrant skies, dimly lit interiors, and various loading screens and menus. It’s useful for non-HDR displays and non-HDR-supported games. Find the setting in the GeForce NOW app under Streaming Quality > Color Precision, with the recommended default value of 10-bit.

Try it out on the neon-lit streets of Cyberpunk 2077 for smoother color transitions, and traverse the diverse landscapes of Assassin’s Creed Valhalla and other games for a more immersive streaming experience.

The update, rolling out now, also brings bug fixes and new ways to curate a member’s in-app game library. For more information, visit the NVIDIA Knowledgebase.

Lights, Camera, Action: New Games

Beyond Good and Evil 20th Anniversary Edition on GeForce NOW
Uncover the truth.

Join the rebellion as action reporter Jade in Beyond Good & Evil – 20th Anniversary Edition from Ubisoft. Embark on this epic adventure in up to 4K at 60 frames per second with improved graphics and audio, a new speedrun mode, updated achievements and an exclusive anniversary gallery. Enjoy unique new rewards while exploring Hillys, and discover more about Jade’s past in a new treasure hunt across the planet.

Check out the list of new games this week:

  • Beyond Good & Evil – 20th Anniversary Edition (New release on Steam and Ubisoft, June 24)
  • Drug Dealer Simulator 2 (Steam)
  • Resident Evil Village (Steam)
  • Resident Evil Village Demo (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Into the Omniverse: SyncTwin Helps Democratize Industrial Digital Twins With Generative AI, OpenUSD

Editor’s note: This post is part of Into the Omniverse, a series focused on how technical artists, developers and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Efficiency and sustainability are critical for organizations looking to be at the forefront of industrial innovation.

To address the digitalization needs of manufacturing and other industries, SyncTwin GmbH — a company that builds software to optimize production, intralogistics and assembly  — developed a digital twin app using NVIDIA cuOpt, an accelerated optimization engine for solving complex routing problems, and NVIDIA Omniverse, a platform of application programming interfaces, software development kits and services that enable developers to build OpenUSD-based applications.

SyncTwin is harnessing the power of the extensible OpenUSD framework for describing, composing, simulating, and collaborating within 3D worlds to help its customers create physically accurate digital twins of their factories. The digital twins can be used to optimize production and enhance digital precision to meet industrial performance.

OpenUSD’s Role in Modern Manufacturing

Manufacturing workflows are incredibly complex, making effective communication and integration across various domains pivotal to ensuring operational efficiency. The SyncTwin app provides seamless collaboration capabilities for factory plant managers and their teams, enabling them to optimize processes and resources.

The app uses OpenUSD and Omniverse to help make factory planning and operations easier and more accessible by integrating various manufacturing aspects into a cohesive digital twin. Customers can integrate visual data, production details, product catalogs, orders, schedules, resources and production settings all in one place with OpenUSD.

The SyncTwin app creates realistic, virtual environments that facilitate seamless interactions between different sectors of factory operations. This capability enables diverse data — including floorplans from Microsoft PowerPoint and warehouse container data from Excel spreadsheets — to be aggregated in a unified digital twin.

The flexibility of OpenUSD allows for non-destructive editing and composition of complex 3D assets and animations, further enhancing the digital twin.

“OpenUSD is the common language bringing all these different factory domains into a single digital twin,” said Michael Wagner, cofounder and chief technology officer of SyncTwin. “The framework can be instrumental in dismantling data silos and enhancing collaborative efficiency across different factory domains, such as assembly, logistics and infrastructure planning.”

Hear Wagner discuss turning PowerPoint and Excel data into digital twin scenarios using the SyncTwin App in a LinkedIn livestream on July 4 at 11 a.m. CET.

Pioneering Generative AI in Factory Planning

By integrating generative AI into its platform, SyncTwin also provides users with data-driven insights and recommendations, enhancing decision-making processes.

This AI integration automates complex analyses, accelerates operations and reduces the need for manual inputs. Learn more about how SyncTwin and other startups are combining the powers of OpenUSD and generative AI to elevate their technologies in this NVIDIA GTC session.

Hear SyncTwin and NVIDIA experts discuss how digital twins are unlocking new possibilities in this recent community livestream:

By tapping into the power of OpenUSD and NVIDIA’s AI and optimization technologies, SyncTwin is helping set new standards for factory planning and operations, improving operational efficiency and supporting the vision of sustainability and cost reduction across manufacturing.

Get Plugged Into the World of OpenUSD

Learn more about OpenUSD and meet with NVIDIA experts at SIGGRAPH, taking place July 28-Aug. 1 at the Colorado Convention Center and online. Attend these SIGGRAPH highlights:

  • NVIDIA founder and CEO Jensen Huang’s fireside chat on Monday, July 29, covering the latest in generative AI and accelerated computing.
  • OpenUSD Day on Tuesday, July 30, where industry luminaries and developers will showcase how to build 3D pipelines and tools using OpenUSD.
  • Hands-on OpenUSD training for all skill levels.

Check out this video series about how OpenUSD can improve 3D workflows. For more resources on OpenUSD, explore the Alliance for OpenUSD forum and visit the AOUSD website.

Get started with NVIDIA Omniverse by downloading the standard license free, access OpenUSD resources and learn how Omniverse Enterprise can connect teams. Follow Omniverse on Instagram, LinkedIn, Medium and X. For more, join the Omniverse community on the forums, Discord server and YouTube channel. 

Featured image courtesy of SyncTwin GmbH.
