Where are they now? Reconnecting with Meta Research PhD Fellowship alumni
Developer Infrastructure: 2022 mid-year academic review
As part of Meta’s commitment to innovation, members of the Developer Infrastructure (DevInfra) team are often involved in academic papers.
Meet the Omnivore: Christopher Scott Constructs Architectural Designs, Virtual Environments With NVIDIA Omniverse
Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse to accelerate their 3D workflows and create virtual worlds.
Growing up in a military family, Christopher Scott moved more than 30 times, which instilled in him “the ability to be comfortable with, and even motivated by, new environments,” he said.
Today, the environments he explores — and creates — are virtual ones.
As chief technical director for 3D design and visualization services at Infinite-Compute, Scott creates physically accurate virtual environments using familiar architectural products in conjunction with NVIDIA Omniverse Enterprise, a platform for connecting and building custom 3D pipelines.
With a background in leading cutting-edge engineering projects for the U.S. Department of Defense, Scott now creates virtual environments focused on building renovation and visualization for the architecture, engineering, construction and operations (AECO) industry.
These true-to-reality virtual environments — whether of electrical rooms, manufacturing factories, or modern home designs — enable quick, efficient design of products, processes and facilities before bringing them to life in the real world.
They also help companies across AECO and other industries save money, speed project completion and make designs interactive for customers — as will be highlighted at NVIDIA GTC, a global conference on AI and the metaverse, running online Sept. 19-22.
“Physically accurate virtual environments help us deliver client projects faster, while maintaining a high level of quality and performance consistency,” said Scott, who’s now based in Austin, Texas. “The key value we offer clients is the ability to make better decisions with confidence.”
To construct his visualizations, Scott uses Omniverse Create and Omniverse Connectors for several third-party applications: Trimble SketchUp for 3D drawing and design; Autodesk Revit for 3D design and 2D annotation of buildings; and Unreal Engine for creating walkthrough simulations and 3D virtual spaces.
In addition, he uses software like Blender for visual effects, motion graphics and animation, and PlantFactory for modeling 3D vegetation, which gives his virtual spaces a lively and natural aesthetic.
Project Speedups With Omniverse
Within just four years, Scott went from handling 50 projects a year to more than 3,500, he said.
Around 80 of his projects each month include lidar-to-point-cloud work, a complex process that involves transforming spatial data into a collection of coordinates for 3D models for manufacturing and design.
Using Omniverse doubles productivity for this demanding workload, he said, as it offers physically accurate photorealism and rendering in real time, as well as live-sync collaboration across users.
“Previously, members of our team functioned as individual islands of productivity,” Scott said. “Omniverse gave us the integrated collaboration we desired to enhance our effectiveness and efficiency.”
At Omniverse’s core is Universal Scene Description — an open-source, extensible 3D framework and common language for creating virtual worlds.
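For readers who haven’t touched USD, here is a minimal sketch using the open-source pxr Python bindings (installable as the usd-core package). It is only an illustration of the format, not part of Scott’s Omniverse pipeline, and the file and prim names are made up.

```python
# Minimal USD authoring example (illustrative only; not from Scott's pipeline).
# Requires the open-source bindings: pip install usd-core
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("scene.usda")          # create a new USD layer on disk
UsdGeom.Xform.Define(stage, "/World")              # a transform prim as the scene root
cube = UsdGeom.Cube.Define(stage, "/World/Cube")   # a simple geometry prim
cube.GetSizeAttr().Set(2.0)                        # author an attribute value
stage.GetRootLayer().Save()                        # write scene.usda, readable by any USD tool
```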
“Omniverse’s USD standard to integrate outputs from multiple software programs allowed our team to collaborate on a source-of-truth project — letting us work across time zones much faster,” said Scott, who further accelerates his workflow by running it on NVIDIA RTX GPUs, including the RTX A6000 on Infinite-Compute’s on-demand cloud infrastructure.
“It became clear very soon after appreciating the depth and breadth of Omniverse that investing in this pipeline was not just enabling me to improve current operations,” he added. “It provides a platform for future growth — for my team members and my organization as a whole.”
While Scott says his work leans more technical than creative, he sees using Omniverse as a way to bridge these two sides of his brain.
“I’d like to think that adopting technologies like Omniverse to deliver cutting-edge solutions that have a meaningful and measurable impact on my clients’ businesses is, in its own way, a creative exercise, and perhaps even a work of art,” he said.
Join In on the Creation
Creators and developers across the world can download NVIDIA Omniverse for free, and enterprise teams can use the platform for their 3D projects.
Hear about NVIDIA’s latest AI breakthroughs powering graphics and virtual worlds at GTC, running online Sept. 19-22. Register free now and attend the top sessions for 3D creators and developers to learn more about how Omniverse can accelerate workflows.
Join the NVIDIA Omniverse User Group to connect with the growing community and see Scott’s work in Omniverse celebrated.
Check out artwork from other “Omnivores” and submit projects in the gallery. Connect your workflows to Omniverse with software from Adobe, Autodesk, Epic Games, Maxon, Reallusion and more.
Follow NVIDIA Omniverse on Instagram, Twitter, YouTube and Medium for additional resources and inspiration. Check out the Omniverse forums, and join our Discord server and Twitch channel to chat with the community.
PaLI: Scaling Language-Image Learning in 100+ Languages
Advanced language models (e.g., GPT, GLaM, PaLM and T5) have demonstrated diverse capabilities and achieved impressive results across tasks and languages by scaling up their number of parameters. Vision-language (VL) models can benefit from similar scaling to address many tasks, such as image captioning, visual question answering (VQA), object recognition, and in-context optical-character-recognition (OCR). Increasing the success rates for these practical tasks is important for everyday interactions and applications. Furthermore, for a truly universal system, vision-language models should be able to operate in many languages, not just one.
In “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, we introduce a unified language-image model trained to perform many tasks in over 100 languages. These tasks span vision, language, and multimodal image and language applications, such as visual question answering, image captioning, object detection, image classification, OCR, text reasoning, and others. Furthermore, we use a collection of public images that includes automatically collected annotations in 109 languages, which we call the WebLI dataset. The PaLI model pre-trained on WebLI achieves state-of-the-art performance on challenging image and language benchmarks, such as COCO-Captions, TextCaps, VQAv2, OK-VQA, TextVQA and others. It also outperforms prior models on multilingual visual captioning and visual question answering benchmarks.
Overview
One goal of this project is to examine how language and vision models interact at scale and specifically the scalability of language-image models. We explore both per-modality scaling and the resulting cross-modal interactions of scaling. We train our largest model to 17 billion (17B) parameters, where the visual component is scaled up to 4B parameters and the language model to 13B.
The PaLI model architecture is simple, reusable and scalable. It consists of a Transformer encoder that processes the input text, and an auto-regressive Transformer decoder that generates the output text. To process images, the input to the Transformer encoder also includes “visual words” that represent an image processed by a Vision Transformer (ViT). A key component of the PaLI model is reuse, in which we seed the model with weights from previously-trained uni-modal vision and language models, such as mT5-XXL and large ViTs. This reuse not only enables the transfer of capabilities from uni-modal training, but also saves computational cost.
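As a rough illustration of that input layout (not the actual PaLI code), the sketch below concatenates ViT-style “visual words” with text token embeddings to form the encoder input; all shapes, vocabulary sizes, and the random projections are placeholder assumptions.

```python
# Illustrative sketch of the PaLI-style multimodal encoder input: visual words
# from a ViT concatenated with text token embeddings. Shapes are placeholders.
import numpy as np

def vit_visual_words(image, num_patches=16, d_model=64, seed=0):
    """Stand-in for a ViT: map an image to a sequence of patch embeddings."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(num_patches, d_model))

def embed_text(token_ids, vocab_size=32000, d_model=64, seed=1):
    """Stand-in for the language model's token embedding table."""
    table = np.random.default_rng(seed).normal(size=(vocab_size, d_model))
    return table[np.asarray(token_ids)]

image = np.zeros((224, 224, 3))
text_ids = [101, 7592, 102]  # hypothetical token ids for the input prompt
encoder_input = np.concatenate([vit_visual_words(image), embed_text(text_ids)], axis=0)
print(encoder_input.shape)   # (num_patches + num_text_tokens, d_model); the decoder then generates text
```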
Dataset: Language-Image Understanding in 100+ Languages
Scaling studies for deep learning show that larger models require larger datasets to train effectively. To unlock the potential of language-image pretraining, we construct WebLI, a multilingual language-image dataset built from images and text available on the public web.
WebLI scales up the text language from English-only datasets to 109 languages, which enables us to perform downstream tasks in many languages. The data collection process is similar to that employed by other datasets, e.g. ALIGN and LiT, and enabled us to scale the WebLI dataset to 10 billion images and 12 billion alt-texts.
In addition to annotation with web text, we apply the Cloud Vision API to perform OCR on the images, leading to 29 billion image-OCR pairs. We perform near-deduplication of the images against the train, validation and test splits of 68 common vision and vision-language datasets, to avoid leaking data from downstream evaluation tasks, as is standard in the literature. To further improve the data quality, we score image and alt-text pairs based on their cross-modal similarity, and tune the threshold to keep only 10% of the images, for a total of 1 billion images used for training PaLI.
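The similarity-based filtering step can be pictured as a simple quantile threshold over pair scores. The sketch below is a hedged illustration of the idea, not the actual WebLI pipeline, and the similarity scores are assumed to come from a separate scoring model.

```python
# Illustrative filtering of image/alt-text pairs by cross-modal similarity:
# keep only the highest-scoring fraction (about 10% in the WebLI case).
import numpy as np

def filter_by_similarity(pairs, scores, keep_fraction=0.10):
    """pairs: list of (image_id, alt_text); scores: precomputed similarity scores."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, 1.0 - keep_fraction)  # score cutoff for the top fraction
    kept = [pair for pair, score in zip(pairs, scores) if score >= threshold]
    return kept, threshold
```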
Sampled images from WebLI associated with multilingual alt-text and OCR. The second image is by jopradier (original), used under the CC BY-NC-SA 2.0 license. Remaining images are also used with permission.
Statistics of recognized languages from alt-text and OCR in WebLI.
Image-text pair counts of WebLI and other large-scale vision-language datasets, CLIP, ALIGN and LiT.
Training Large Language-Image Models
Vision-language tasks require different capabilities and sometimes have diverging goals. Some tasks inherently require localization of objects to solve the task accurately, whereas some other tasks might need a more global view. Similarly, different tasks might require either long or compact answers. To address all of these objectives, we leverage the richness of the WebLI pre-training data and introduce a mixture of pre-training tasks, which prepare the model for a variety of downstream applications. To accomplish the goal of solving a wide variety of tasks, we enable knowledge-sharing between multiple image and language tasks by casting all tasks into a single generalized API (input: image + text; output: text), which is also shared with the pretraining setup. The objectives used for pre-training are cast into the same API as a weighted mixture aimed at both maintaining the ability of the reused model components and training the model to perform new tasks (e.g., split-captioning for image description, OCR prediction for scene-text comprehension, VQG and VQA prediction).
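The “single generalized API” can be read as: every objective is framed as (image, text) in, text out, and pre-training samples from a weighted mixture of such tasks. The sketch below is only a schematic of that framing; the task names, weights, and stub function are illustrative, not the published training mixture.

```python
# Schematic of the unified (image + text -> text) interface and a weighted
# task mixture for pre-training. Names and weights are illustrative only.
import random

PRETRAIN_MIX = {            # hypothetical mixture weights
    "split_captioning": 0.4,
    "ocr_prediction": 0.3,
    "vqa": 0.2,
    "vqg": 0.1,
}

def sample_task(rng: random.Random) -> str:
    tasks, weights = zip(*PRETRAIN_MIX.items())
    return rng.choices(tasks, weights=weights, k=1)[0]

def model_step(image_bytes: bytes, input_text: str) -> str:
    """Every task goes through the same signature: image + text in, text out."""
    return f"<generated text for prompt {input_text!r}>"   # stub for the real model

rng = random.Random(0)
print(sample_task(rng), model_step(b"", "Answer in en: what is in the image?"))
```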
The model is trained in JAX with Flax using the open-sourced T5X and Flaxformer framework. For the visual component, we introduce and train a large ViT architecture, named ViT-e, with 4B parameters using the open-sourced BigVision framework. ViT-e follows the same recipe as the ViT-G architecture (which has 2B parameters). For the language component, we concatenate the dense token embeddings with the patch embeddings produced by the visual component, together as the input to the multimodal encoder-decoder, which is initialized from mT5-XXL. During the training of PaLI, the weights of this visual component are frozen, and only the weights of the multimodal encoder-decoder are updated.
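Freezing the visual component while updating the multimodal encoder-decoder amounts to skipping the optimizer update for one parameter group. The toy step below shows the idea in plain Python; it is an assumption-level sketch, not the T5X/Flax training code.

```python
# Toy illustration of training with a frozen vision tower: only the
# encoder-decoder parameter group receives gradient updates.
def sgd_step(params, grads, lr=1e-3, frozen=("vit",)):
    new_params = {}
    for group, weights in params.items():
        if group in frozen:                      # visual component stays frozen
            new_params[group] = weights
        else:                                    # multimodal encoder-decoder updates
            new_params[group] = {k: w - lr * grads[group][k] for k, w in weights.items()}
    return new_params

params = {"vit": {"w": 1.0}, "encoder_decoder": {"w": 1.0}}
grads = {"vit": {"w": 0.5}, "encoder_decoder": {"w": 0.5}}
print(sgd_step(params, grads))   # vit unchanged, encoder_decoder updated
```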
Results
We compare PaLI on common vision-language benchmarks that are varied and challenging. The PaLI model achieves state-of-the-art results on these tasks, even outperforming very large models in the literature. For example, it outperforms the Flamingo model, which is several times larger (80B parameters), on several VQA and image-captioning tasks, and it also sustains performance on challenging language-only and vision-only tasks, which were not the main training objective.
PaLI (17B parameters) outperforms the state-of-the-art approaches (including SimVLM, CoCa, GIT2, Flamingo, BEiT3) on multiple vision-and-language tasks. In this plot we show the absolute score differences compared with the previous best model to highlight the relative improvements of PaLI. Comparison is on the official test splits when available. CIDEr score is used for evaluation of the image captioning tasks, whereas VQA tasks are evaluated by VQA Accuracy.
Model Scaling Results
We examine how the image and language model components interact with each other with regards to model scaling and where the model yields the most gains. We conclude that scaling both components jointly results in the best performance, and specifically, scaling the visual component, which requires relatively few parameters, is most essential. Scaling is also critical for better performance across multilingual tasks.
Scaling both the language and the visual components of the PaLI model contributes to improved performance. The plot shows the score differences compared to the PaLI-3B model: CIDEr score is used for evaluation of the image captioning tasks, whereas VQA tasks are evaluated by VQA Accuracy.
Model Introspection: Model Fairness, Biases, and Other Potential Issues
To avoid creating or reinforcing unfair bias within large language and image models, important first steps are to (1) be transparent about the data that were used and how the model used those data, and (2) test for model fairness and conduct responsible data analyses. To address (1), our paper includes a data card and model card. To address (2), the paper includes results of demographic analyses of the dataset. We consider this a first step and know that it will be important to continue to measure and mitigate potential biases as we apply our model to new tasks, in alignment with our AI Principles.
Conclusion
We presented PaLI, a scalable multi-modal and multilingual model designed for solving a variety of vision-language tasks. We demonstrate improved performance across visual-, language- and vision-language tasks. Our work illustrates the importance of scale in both the visual and language parts of the model and the interplay between the two. We see that accomplishing vision and language tasks, especially in multiple languages, actually requires large scale models and data, and will potentially benefit from further scaling. We hope this work inspires further research in multi-modal and multilingual models.
Acknowledgements
We thank all the authors who conducted this research: Soravit (Beer) Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, and Radu Soricut. We also thank Claire Cui, Slav Petrov, Tania Bedrax-Weiss, Joelle Barral, Tom Duerig, Paul Natsev, Fernando Pereira, Jeff Dean, Jeremiah Harmsen, Zoubin Ghahramani, Erica Moreira, Victor Gomes, Sarah Laszlo, Kathy Meier-Hellstern, Susanna Ricco, Rich Lee, Austin Tarango, Emily Denton, Bo Pang, Wei Li, Jihyung Kil, Tomer Levinboim, Julien Amelot, Zhenhai Zhu, Xiangning Chen, Liang Chen, Filip Pavetic, Daniel Keysers, Matthias Minderer, Josip Djolonga, Ibrahim Alabdulmohsin, Mostafa Dehghani, Yi Tay, Elizabeth Adkison, James Cockerille, Eric Ni, Anna Davies, and Maysam Moussalem for their suggestions, improvements and support. We thank Tom Small for providing visualizations for the blog post.
Use Amazon SageMaker Data Wrangler for data preparation and Studio Labs to learn and experiment with ML
Amazon SageMaker Studio Lab is a free machine learning (ML) development environment based on open-source JupyterLab for anyone to learn and experiment with ML using AWS ML compute resources. It’s based on the same architecture and user interface as Amazon SageMaker Studio, but with a subset of Studio capabilities.
When you begin working on ML initiatives, you need to perform exploratory data analysis (EDA) or data preparation before proceeding with model building. Amazon SageMaker Data Wrangler is a capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for ML applications via a visual interface. Data Wrangler reduces the time it takes to aggregate and prepare data for ML from weeks to minutes.
A key accelerator of feature preparation in Data Wrangler is the Data Quality and Insights Report. This report checks data quality and helps detect abnormalities in your data, so that you can perform the required data engineering to fix your dataset. You can use the Data Quality and Insights Report to perform an analysis of your data to gain insights into your dataset such as the number of missing values and number of outliers. If you have issues with your data, such as target leakage or imbalance, the insights report can bring those issues to your attention and help you identify the data preparation steps you need to perform.
Studio Lab users can benefit from Data Wrangler because data quality and feature engineering are critical for the predictive performance of your model. Data Wrangler helps with data quality and feature engineering by giving insights into data quality issues and easily enabling rapid feature iteration and engineering using a low-code UI.
In this post, we show you how to perform exploratory data analysis, prepare and transform data using Data Wrangler, and export the transformed and prepared data to Studio Lab to carry out model building.
Solution overview
The solution includes the following high-level steps:
- Create an AWS account and an admin user (prerequisite).
- Download the dataset churn.csv.
- Load the dataset to Amazon Simple Storage Service (Amazon S3), as sketched in the code example after this list.
- Create a SageMaker Studio domain and launch Data Wrangler.
- Import the dataset into the Data Wrangler flow from Amazon S3.
- Create the Data Quality and Insights Report and draw conclusions on necessary feature engineering.
- Perform the necessary data transforms in Data Wrangler.
- Download the Data Quality and Insights Report and the transformed dataset.
- Upload the data to a Studio Lab project for model training.
The following diagram illustrates this workflow.
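Step 3 of this workflow (loading the dataset into Amazon S3) can also be done programmatically. Here is a minimal boto3 sketch, where the bucket name and object key are placeholders rather than values from this post.

```python
# Minimal sketch of uploading churn.csv to Amazon S3 with boto3.
# The bucket name and object key below are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "sagemaker-us-east-1-111122223333"   # e.g., your SageMaker default bucket
s3.upload_file("churn.csv", bucket, "data-wrangler-demo/churn.csv")
```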
Prerequisites
To use Data Wrangler and Studio Lab, you need the following prerequisites:
- Studio Lab – For onboarding information, refer to Amazon SageMaker Studio Lab, a Free Service to Learn and Experiment with ML.
- An AWS account – If you don’t have an AWS account, you can create and activate a new AWS account.
- An IAM user with SageMaker permissions – For instructions on creating an AWS Identity and Access Management (IAM) user, refer to Set Up Amazon SageMaker Prerequisites.
- A Studio domain – For instructions, refer to Onboard to Amazon SageMaker Domain. For more information about the domain environment, see Amazon SageMaker Machine Learning Environments.
- A dataset – You can bring your own dataset or experiment with Data Wrangler using the churn.csv dataset used in this post. This is a synthetic dataset from a telecommunications mobile phone carrier for customer churn prediction.
- Access to an S3 bucket – You can use the SageMaker default bucket (`sagemaker-{region}-{account_id}`), or create your own S3 bucket.
Build a data preparation workflow with Data Wrangler
To get started, complete the following steps:
- Upload your dataset to Amazon S3.
- On the SageMaker console, under Control panel in the navigation pane, choose Studio.
- On the Launch app menu next to your user profile, choose Studio.
After you successfully log in to Studio, you should see a development environment like the following screenshot.
- To create a new Data Wrangler workflow, on the File menu, choose New, then choose Data Wrangler Flow.
The first step in Data Wrangler is to import your data. You can import data from multiple data sources, such as Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, and Databricks. In this example, we use Amazon S3. If you just want to see how Data Wrangler works, you can always choose Use sample dataset.
- Choose Import data.
- Choose Amazon S3.
- Choose the dataset you uploaded and choose Import.
Data Wrangler enables you to either import the entire dataset or sample a portion of it.
- To quickly get insights on the dataset, choose First K for Sampling and enter 50000 for Sample size.
Understand data quality and get insights
Let’s use the Data Quality and Insights Report to perform an analysis of the data that we imported into Data Wrangler. You can use the report to understand what steps you need to take to clean and process your data. This report provides information such as the number of missing values and the number of outliers. If you have issues with your data, such as target leakage or imbalance, the insights report can bring those issues to your attention.
- Choose the plus sign next to Data types and choose Get data insights.
- For Analysis type, choose Data Quality and Insights Report.
- For Target column, choose Churn?.
- For Problem type, select Classification.
- Choose Create.
You’re presented with a detailed report that you can review and download. The report includes several sections such as quick model, feature summary, feature correlation, and data insights. The following screenshots provide examples of these sections.
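If you want to sanity-check a few of the same statistics outside Data Wrangler, a rough pandas equivalent (shown only as an illustration; the report itself computes much more) might look like this:

```python
# Rough pandas equivalents of a few checks the Data Quality and Insights
# Report automates: missing values, duplicate rows, and feature correlation.
import pandas as pd

df = pd.read_csv("churn.csv")
print(df.isna().sum())                              # missing values per column
print(df.duplicated().sum())                        # duplicate row count
print(df.select_dtypes("number").corr().round(2))   # correlation between numeric features
```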
Observations from the report
From the report, we can make the following observations:
- No duplicate rows were found.
- The `State` column appears to be quite evenly distributed, so the data is balanced in terms of state population.
- The `Phone` column presents too many unique values to be of any practical use, so we can drop it in our transformation.
- Based on the feature correlation section of the report, `Mins` and `Charge` are highly correlated; we can remove one of each pair.
Transformation
Based on our observations, we want to make the following transformations:
- Remove the `Phone` column because it has many unique values.
- We also see several features that essentially have 100% correlation with one another. Including these feature pairs in some ML algorithms can create undesired problems, whereas in others it will only introduce minor redundancy and bias. Let's remove one feature from each of the highly correlated pairs: `Day Charge` from the pair with `Day Mins`, `Night Charge` from the pair with `Night Mins`, and `Intl Charge` from the pair with `Intl Mins`.
- Convert `True` or `False` in the `Churn?` column to a numerical value of 1 or 0.
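For readers following along outside Data Wrangler, a rough pandas equivalent of these transforms is shown below (an illustration only; the column names match the churn.csv dataset used in this post, and the drop list matches the Data Wrangler steps that follow). The next steps apply the same transforms in the Data Wrangler UI.

```python
# Rough pandas equivalent of the transforms above: drop the redundant columns
# and encode the Churn? target as 1/0. Label formatting may vary by dataset.
import pandas as pd

df = pd.read_csv("churn.csv")
df = df.drop(columns=["Phone", "Day Charge", "Eve Charge", "Night Charge", "Intl Charge"])
df["Churn?"] = df["Churn?"].astype(str).str.startswith("True").astype(int)  # True -> 1, False -> 0
```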
- Return to the data flow and choose the plus sign next to Data types.
- Choose Add transform.
- Choose Add step.
- Search for the transform you're looking for (in our case, manage columns).
- Choose Manage columns.
- For Transform, choose Drop column.
- For Columns to drop, choose `Phone`, `Day Charge`, `Eve Charge`, `Night Charge`, and `Intl Charge`.
- Choose Preview, then choose Update.
Let's add another transform to perform a categorical encode on the `Churn?` column.
- Choose the transform Encode categorical.
- For Transform, choose Ordinal encode.
- For Input columns, choose the `Churn?` column.
- For Invalid handling strategy, choose Replace with NaN.
- Choose Preview, then choose Update.
Now `True` and `False` are converted to 1 and 0, respectively.
Now that we have a good understanding of the data and have prepared and transformed it for model building, we can move the data to Studio Lab.
Upload the data to Studio Lab
To start using the data in Studio Lab, complete the following steps:
- Choose Export data to export to an S3 bucket.
- For Amazon S3 location, enter your S3 path.
- Specify the file type.
- Choose Export data.
- After you export the data, you can download the data from the S3 bucket to your local computer.
- Now you can go to Studio Lab and upload the file.
Alternatively, you can connect to Amazon S3 from Studio Lab. For more information, refer to Use external resources in Amazon SageMaker Studio Lab.
- Let's install SageMaker and import Pandas.
- Import all libraries as required.
- Now we can read the CSV file.
- Let's print `churn` to confirm the dataset is correct (a minimal sketch of these notebook cells follows this list).
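A minimal sketch of those notebook cells, assuming the exported file is named churn.csv and sits in the Studio Lab project directory, could look like this:

```python
# Studio Lab notebook cells sketched from the steps above; the file and
# variable names are assumptions based on this post.
# %pip install sagemaker pandas   # run once in the Studio Lab environment

import pandas as pd

churn = pd.read_csv("churn.csv")  # the transformed dataset you uploaded
print(churn.head())               # confirm the dataset looks correct
print(churn.shape)
```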
Now that you have the processed dataset in Studio Lab, you can carry out further steps required for model building.
Data Wrangler pricing
You can perform all the steps in this post for EDA or data preparation within Data Wrangler and pay simple instance, job, and storage pricing based on usage or consumption. No upfront or licensing fees are required.
Clean up
When you’re not using Data Wrangler, it’s important to shut down the instance on which it runs to avoid incurring additional fees. To avoid losing work, save your data flow before shutting Data Wrangler down.
- To save your data flow in Studio, choose File, then choose Save Data Wrangler Flow. Data Wrangler automatically saves your data flow every 60 seconds.
- To shut down the Data Wrangler instance, in Studio, choose Running Instances and Kernels.
- Under RUNNING APPS, choose the shutdown icon next to the `sagemaker-data-wrangler-1.0` app.
- Choose Shut down all to confirm.
Data Wrangler runs on an ml.m5.4xlarge instance. This instance disappears from RUNNING INSTANCES when you shut down the Data Wrangler app.
After you shut down the Data Wrangler app, it has to restart the next time you open a Data Wrangler flow file. This can take a few minutes.
Conclusion
In this post, we saw how you can gain insights into your dataset, perform exploratory data analysis, prepare and transform data using Data Wrangler within Studio, and export the transformed and prepared data to Studio Lab to carry out model building and other steps.
With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization from a single visual interface.
About the authors
Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about the cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.
Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with a passion to design, create and promote human-centered data and analytics experiences. He supports AWS Strategic customers on their transformation toward becoming data-driven organizations.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Scaling multilingual virtual assistants to 1,000 languages
Self-supervised training, distributed training, and knowledge distillation have delivered remarkable results, but they’re just the tip of the iceberg.
Our commitment on using AI to accelerate progress on global development goals
I joined Google earlier this year to lead a new function: Technology & Society. Our aim is to help connect research, people and ideas across Google to shape the future of our technology innovations and their impact on society for the better. A key area of focus is AI, a field I have studied and immersed myself in over the years. I recently met with a team at the Google AI Center in Ghana that is using advanced technology to address an ancient problem: detecting locust outbreaks which threaten food security and livelihoods for millions of people. And in India and Bangladesh, our Crisis Response teams are using our machine-learning-based forecasting to provide over 360 million people with alerts about upcoming floods.
Efforts like these make me optimistic about how AI can contribute to solving societal problems. They also reinforce how high the stakes are for people everywhere, especially as global forces threaten the progress we’ve made on health, prosperity and environmental issues.
AI for the Global Goals
As the United Nations General Assembly begins, the world will come together to discuss issues of global importance, including assessing progress towards the Sustainable Development Goals (SDGs), which provide a roadmap for economic growth, social inclusion and environmental protection. While it’s clear the global community has made significant strides in meeting the 17 interlinked goals since their adoption by 193 countries, challenges persist in every country. Currently, no country is on track to meet all the goals by 2030.
From the launch of the SDGs in 2015, Google has believed in their importance and looked for ways to support progress. We know that advanced technology, such as AI, can be a powerful tool in advancing these goals. Research that I co-led before joining Google found AI could contribute to progress on all the SDGs — a finding confirmed by the UN. In 2018 Google launched AI for Social Good, focusing applied research and grantmaking efforts on some of the most intractable issues. But we know more needs to be done.
So today we’re expanding our efforts with AI for the Global Goals, which will bring together research, technology and funding to accelerate progress on the SDGs. This commitment will include $25 million to support NGOs and social enterprises working with AI to accelerate progress towards these goals. Based on what we’ve learned so far, we believe that with the AI capabilities and financial support we will provide, grantees can cut in half the time or cost to achieve their goals. In addition to funding, where appropriate, we’ll provide Google.org Fellowships, where teams of Google employees work alongside organizations for up to six months. Importantly, projects will be open-sourced so other organizations can build on the work. All of Google’s work and contributions will be guided by our Responsible AI Principles.
Since 2018, we’ve been focusing applied research and grantmaking efforts on some of the most intractable issues with over 50 organizations in countries ranging from Japan to Kenya to Brazil. We’ve supported organizations making progress on emissions monitoring, antimicrobial image analysis and mental health for LGBTQ+ youth. Working side-by-side with these organizations has shown us the creative ways a thriving ecosystem of companies, nonprofits and universities can use AI. We think we can use the same model to help countries make progress on the SDGs.
A critical time for global progress
COVID-19, global conflict, and climate change have set us back. Fewer people have the opportunity to move out of poverty, inequitable access to healthcare and education continues, gender inequality persists, and environmental threats pose immediate and long-term risks. We know that AI and other advanced technology can help tackle these setbacks. For example, in a significant development for biology and human health, DeepMind used AI to predict 200 million protein structures. They open-sourced the structures in partnership with EMBL-EBI, giving over 500,000 biologists tools to accelerate work on drug discovery, treatment and therapies — thereby making it possible to tackle many of the world’s neglected diseases.
As someone who has spent the last several decades working at the nexus of technology and societal good, it matters deeply that progress here will benefit communities everywhere. No single organization alone will develop and deploy all the solutions we’ll need; we all need to do our part. We’re looking forward to continuing to partner with experts around the world and learning what we can accomplish together.
GFN Thursday Delivers Seven New Games This Week
TGIGFNT: thank goodness it’s GFN Thursday. Start your weekend early with seven new games joining the GeForce NOW library of over 1,400 titles.
Whether it’s streaming on an older-than-the-dinosaurs PC, a Mac that normally couldn’t dream of playing PC titles, or mobile devices – it’s all possible to play your way thanks to GeForce NOW.
Get Right Into the Gaming
Test your tactical skills in the new authentic WW1 first-person shooter, Isonzo.
Battle among the scenic peaks, rugged valleys and idyllic towns of northern Italy. Choose from six classes based on historical combat roles and build a loadout from a selection of weapons, equipment and perks linked to that class. Shape a dynamic battlefield by laying sandbags and wire, placing ammo crates, deploying trench periscopes or sniper shields, and more.
Lead the charge to victory in this game and six more this week, including:
- Isonzo (New release on Steam and Epic Games Store)
- Little Orpheus (New release on Steam and Epic Games Store)
- Q.U.B.E. 10th Anniversary (New release on Steam)
- Metal: Hellsinger (New release on Steam, Sept. 15)
- Animal Shelter (Steam)
- Spirit of the North (Epic Games Store)
- Startup Company (Returning to GeForce NOW, Steam)
Members can also discover impressive new prehistoric species with the Jurassic World Evolution 2: Late Cretaceous Pack DLC, available on GeForce NOW this week.
Inspired by the fascinating Late Cretaceous period, this pack includes four captivating species that roamed the land, sea and air over 65 million years ago, from soaring, stealthy hunters of the skies to one of the largest dinosaurs ever discovered.
Finally, kick off the weekend by telling us about a game that you love on Twitter or in the comments below.
what’s a game you didn’t expect to like, but now you love?
— NVIDIA GeForce NOW (@NVIDIAGFN) September 14, 2022
Announcing the winners of the Dynabench Data Collection and Benchmarking Platform request for proposals
Announcing Visual Conversation Builder for Amazon Lex
Amazon Lex is a service for building conversational interfaces using voice and text. Amazon Lex provides high-quality speech recognition and language understanding capabilities. With Amazon Lex, you can add sophisticated, natural language bots to new and existing applications. Amazon Lex reduces multi-platform development efforts, allowing you to easily publish your speech or text chatbots to mobile devices and multiple chat services, like Facebook Messenger, Slack, Kik, or Twilio SMS.
Today, we added a Visual Conversation Builder (VCB) to Amazon Lex—a drag-and-drop conversation builder that allows users to interact and define bot information by manipulating visual objects. These are used to design and edit conversation flows in a no-code environment. There are three main benefits of the VCB:
- It’s easier to collaborate through a single pane of glass
- It simplifies conversational design and testing
- It reduces code complexity
In this post, we introduce the VCB, how to use it, and share customer success stories.
Overview of the Visual Conversation Builder
In addition to the already available menu-based editor and Amazon Lex APIs, the visual builder gives a single view of an entire conversation flow in one location, simplifying bot design and reducing dependency on development teams. Conversational designers, UX designers, and product managers—anyone with an interest in building a conversation on Amazon Lex—can utilize the builder.
Designers and developers can now collaborate and build conversations easily in the VCB without coding the business logic behind the conversation. The visual builder helps accelerate time to market for Amazon Lex-based solutions by providing better collaboration, easier iterations of the conversation design, and reduced code complexity.
With the visual builder, it’s now possible to quickly view the entire conversation flow of the intent at a glance and get visual feedback as changes are made. Changes to your design are instantly reflected in the view, and any effect on dependencies or branching logic is immediately apparent to the designer. You can use the visual builder to make any changes to the intent, such as adding utterances, slots, prompts, or responses. Each block type has its own settings that you can configure to tailor the flow of the conversation.
Previously, complex branching of conversations required implementation of AWS Lambda—a serverless, event-driven compute service—to achieve the desired pathing. The visual builder reduces the need for Lambda integrations, and designers can perform conversation branching without the need for Lambda code, as shown in the following example. This helps to decouple conversation design activities from Lambda business logic and integrations. You can still use the existing intent editor in conjunction with the visual builder, or switch between them at any time when creating and modifying intents.
The VCB is a no-code method of designing complex conversations. For example, you can now add a confirmation prompt in an intent and branch based on a Yes or No response to different paths in the flow without code. Where future Lambda business logic is needed, conversation designers can add placeholder blocks into the flow so developers know what needs to be addressed through code. Code hook blocks with no Lambda functions attached automatically take the Success pathway so testing of the flow can continue until the business logic is completed and implemented. In addition to branching, the visual builder offers designers the ability to go to another intent as part of the conversation flow.
Upon saving, VCB automatically scans the build to detect any errors in the conversation flow. In addition, the VCB auto-detects missing failure paths and provides the capability to auto-add those paths into the flow, as shown in the following example.
Using the Visual Conversation Builder
You can access the VCB via the Amazon Lex console by going to a bot and editing or creating a new intent. On the intent page, you can now switch between the visual builder interface and the traditional intent editor, as shown in the following screenshot.
The visual builder displays existing intents graphically on the canvas, showing what has already been designed in a visual layout. For new intents, you start with a blank canvas: simply drag the components you want to add onto the canvas and connect them to create the conversation flow.
The visual builder has three main components: blocks, ports, and edges. Let’s get into how these are used in conjunction to create a conversation from beginning to end within an intent.
The basic building unit of a conversation flow is called a block. The top menu of the visual builder contains all the blocks you are able to use. To add a block to a conversation flow, drag it from the top menu onto the flow.
Each block has a specific functionality to handle different use cases of a conversation. The currently available block types are as follows:
- Start – The root or first block of the conversation flow that can also be configured to send an initial response
- Get slot value – Tries to elicit a value for a single slot
- Condition – Can contain up to four custom branches (with conditions) and one default branch
- Dialog code hook – Handles invocation of the dialog Lambda function and includes bot responses based on dialog Lambda functions succeeding, failing, or timing out
- Confirmation – Queries the customer prior to fulfillment of the intent and includes bot responses based on the customer saying yes or no to the confirmation prompt
- Fulfillment – Handles fulfillment of the intent and can be configured to invoke Lambda functions and respond with messages if fulfillment succeeds or fails
- Closing response – Allows the bot to respond with a message before ending the conversation
- Wait for user input – Captures input from the customer and switches to another intent based on the utterance
- End conversation – Indicates the end of the conversation flow
Take the Order Flowers bot as an example. The `OrderFlowers` intent, when viewed in the visual builder, uses five blocks: Start, three different Get slot value blocks, and Confirmation.
Each block can contain one or more ports, which are used to connect one block to another. Blocks contain an input port and one or more output ports based on desired paths for states such as success, timeout, and error.
The connection between the output port of one block and the input port of another block is referred to as an edge.
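As a mental model only (this is not an Amazon Lex API or bot definition format), the blocks/ports/edges vocabulary can be pictured as a small graph structure:

```python
# Toy graph model of blocks, ports, and edges; purely illustrative,
# not the Amazon Lex V2 API.
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str                                            # e.g., "Start", "Get slot value: FlowerType"
    output_ports: list = field(default_factory=lambda: ["success", "failure"])

@dataclass
class Edge:
    source: Block
    source_port: str                                     # one of the source block's output ports
    target: Block                                        # edges always enter the target's input port

start = Block("Start", output_ports=["no_response"])
flower_type = Block("Get slot value: FlowerType")
flow = [Edge(start, "no_response", flower_type)]         # Start -> FlowerType, as in OrderFlowers
```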
In the `OrderFlowers` intent, when the conversation starts, the Start output port is connected to the Get slot value: FlowerType input port using an edge. Each Get slot value block is connected using ports and edges to create a sequence in the conversation flow, which ensures the intent has all the slot values it needs to place the order.
Notice that currently there is no edge connected to the failure output port of these blocks, but the builder will automatically add these if you choose Save intent and then choose Confirm in the pop-up Auto add block and edges for failure paths. The visual builder then adds an End conversation block and a Go to intent block, connecting the failure and error output ports to Go to intent and connecting the Yes/No ports of the Confirmation block to End conversation.
After the builder adds the blocks and edges, the intent is saved and the conversation flow can be built and tested. Let's add a Welcome intent to the bot using the visual builder. From the `OrderFlowers` intent visual builder, choose Back to intents list in the navigation pane. On the Intents page, choose Add intent followed by Add empty intent. In the Intent name field, enter `Welcome` and choose Add.
Switch to the Visual builder tab and you will see an empty intent, with only the Start block currently on the canvas. To start, add some utterances to this intent so that the bot will be able to direct users to the Welcome intent. Choose the edit button of the Start block and scroll down to Sample utterances. Add the following utterances to this intent and then close the block:
- Can you help me?
- Hi
- Hello
- I need help
Now let's add a response for the bot to give when it hits this intent. Because the Welcome intent won't be processing any logic, we can drag a Closing response block into the canvas to add this message. After you add the block, choose the edit icon on the block and enter the response message you want the bot to give.
The canvas should now have two blocks, but they aren’t connected to each other. We can connect the ports of these two blocks using an edge.
To connect the two ports, simply click and drag from the No response output port of the Start block to the input port of the Closing response block.
At this point, you can complete the conversation flow in two different ways:
- First, you can manually add the End conversation block and connect it to the Closing response block.
- Alternatively, choose Save intent and then choose Confirm to have the builder create this block and connection for you.
After the intent is saved, choose Build and wait for the build to complete, then choose Test.
The bot will now properly greet the customer if an utterance matches this newly created intent.
Customer stories
NeuraFlash is an Advanced AWS Partner with over 40 collective years of experience in the voice and automation space. With a dedicated team of Conversational Experience Designers, Speech Scientists, and AWS developers, NeuraFlash helps customers take advantage of the power of Amazon Lex in their contact centers.
“One of our key focus areas is helping customers leverage AI capabilities for developing conversational interfaces. These interfaces often require specialized bot configuration skills to build effective flows. With the Visual Conversation Builder, our designers can quickly and easily build conversational interfaces, allowing them to experiment at a faster rate and deliver quality products for our customers without requiring developer skills. The drag-and-drop UI and the visual conversation flow is a game-changer for reinventing the contact center experience.”
The SmartBots ML-powered platform lies at the core of the design, prototyping, testing, validating, and deployment of AI-driven chatbots. This platform supports the development of custom enterprise bots that can easily integrate with any application—even an enterprise’s custom application ecosystem.
“The Visual Conversation Builder’s easy-to-use drag-and-drop interface enables us to easily onboard Amazon Lex, and build complex conversational experiences for our customers’ contact centers. With this new functionality, we can improve Interactive Voice Response (IVR) systems faster and with minimal effort. Implementing new technology can be difficult with a steep learning curve, but we found that the drag-and-drop features were easy to understand, allowing us to realize value immediately.”
Conclusion
The Visual Conversation Builder for Amazon Lex is now generally available, for free, in all AWS Regions where Amazon Lex V2 operates.
Additionally, on August 17, 2022, Amazon Lex V2 released a change to the way conversations are managed with the user. This change gives you more control over the path that the user takes through the conversation. For more information, see Understanding conversation flow management. Note that bots created before August 17, 2022, do not support the VCB for creating conversation flows.
To learn more, see Amazon Lex FAQs and the Amazon Lex V2 Developer Guide. Please send feedback to AWS re:Post for Amazon Lex or through your usual AWS support contacts.
About the authors
Thomas Rindfuss is a Sr. Solutions Architect on the Amazon Lex team. He invents, develops, prototypes, and evangelizes new technical features and solutions for Language AI services that improve the customer experience and ease adoption.
Austin Johnson is a Solutions Architect at AWS, helping customers on their cloud journey. He is passionate about building and utilizing conversational AI platforms to add sophisticated, natural language interfaces to applications.