Amazon Polly NTTS voices now available in Singapore, Tokyo, Frankfurt, and London Regions

Amazon Polly turns text into lifelike speech, allowing you to create voice-enabled applications. We’re excited to announce the general availability of all Neural Text-to-Speech (NTTS) voices in the Asia Pacific (Singapore), Asia Pacific (Tokyo), EU (Frankfurt), and EU (London) Regions. You can now synthesize speech in these Regions with more than 14 NTTS voices, including the Newscaster and Conversational speaking styles. In addition, you can continue to use the more than 60 standard voices available in 29 languages in the Amazon Polly portfolio.

Learn how our customers are using Amazon Polly voices to build new categories of speech-enabled products, including voicing news content, games, eLearning platforms, telephony applications, accessibility applications, Internet of Things (IoT), and more.

Amazon Polly voices are high quality, cost-effective, and fast, making them a viable option for low-latency use cases.

Amazon Polly also supports SSML tags, which give you additional control over your speech output.
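
For illustration, here is a minimal sketch of requesting neural speech with the Newscaster style through the AWS SDK for Python (boto3); the Region, voice, and SSML body are assumptions, not part of the announcement.

```python
# A minimal sketch (assumed Region, voice, and SSML) of requesting an
# NTTS voice with the Newscaster speaking style via boto3.
import boto3

polly = boto3.client("polly", region_name="ap-southeast-1")  # Singapore

ssml = """<speak>
  <amazon:domain name="news">
    Neural voices are now available in four additional Regions.
  </amazon:domain>
</speak>"""

response = polly.synthesize_speech(
    Engine="neural",      # request an NTTS voice rather than a standard one
    VoiceId="Joanna",     # a voice that supports the Newscaster style
    TextType="ssml",      # interpret the input as SSML
    Text=ssml,
    OutputFormat="mp3",
)

with open("announcement.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```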

For more information, see the Amazon Polly Developer Guide and the full list of text-to-speech voices, and log in to the Amazon Polly console to try them out!


About the Author

Ankit Dhawan is a Senior Product Manager for Amazon Polly, a technology enthusiast, and a huge Liverpool FC fan. When not working on delighting our customers, you will find him exploring the Pacific Northwest with his wife and dog. He is an eternal optimist, and loves reading biographies and playing poker. You can indulge him in a conversation on technology, entrepreneurship, or soccer at any time of the day.

 

Read More

Announcing the winners of the Facebook Reality Labs Liquid Crystal research awards

This past May, we launched the Facebook Reality Labs Liquid Crystal (LC) research awards with the goal of encouraging young generations in the LC field and in other cross-disciplinary fields to explore the possibilities of LC technology in the AR/VR field. In partnership with the International Liquid Crystal Society, we invited graduate students and postdocs (who have graduated in the last three years) to apply for this award.
We received applications from 14 research institutions. The submitted applications covered emerging research in the following areas: novel liquid crystal materials, fast response time liquid crystal displays, polarization holograms, Pancharatnam Berry Phase optics, polarization optics, and applications of LC technologies for improving AR/VR optics and display systems.

Taking into consideration the originality and novelty of the resulting research, along with its potential impact in the AR/VR field, we selected six extraordinary researchers from six different institutions. Thank you to everyone who took the time to submit an application, and congratulations to the winners.

Research award recipients

Diamond award

Tilted chiral liquid crystal gratings for efficient large-angle diffraction
Inge Nys, Ghent University

Platinum award

Improving near-eye display resolution by polarization multiplexing; high-resolution additive light field near-eye display by switchable PBP lenses; polarization-independent PBP lens system
Tao Zhan, University of Central Florida

Augmented reality with image registration, vision correction and sunlight readability via liquid crystal devices
Yu-Jen Wang, National Chiao Tung University

Gold award

Polarizing beam splitter cube for circularly and elliptically polarized light
Sawyer Miller, University of Arizona

Reconfigurable and spatially programmable chameleon skin-like material utilizing light responsive covalent adaptable cholesteric liquid crystal elastomers
Alina Martinez, University of Colorado Boulder

Reversible circularly polarized reflection in a self-organized helical superstructure enabled by a visible-light-driven axially chiral molecular switch
Hao Wang, Kent State University

Read More

The Technology Behind our Recent Improvements in Flood Forecasting

Posted by Sella Nevo, Senior Software Engineer, Google Research, Tel Aviv

Flooding is the most common natural disaster on the planet, affecting the lives of hundreds of millions of people around the globe and causing around $10 billion in damages each year. Building on our work in previous years, earlier this week we announced some of our recent efforts to improve flood forecasting in India and Bangladesh, expanding coverage to more than 250 million people, and providing unprecedented lead time, accuracy and clarity.

To enable these breakthroughs, we have devised a new approach for inundation modeling, called a morphological inundation model, which combines physics-based modeling with machine learning (ML) to create more accurate and scalable inundation models in real-world settings. Additionally, our new alert-targeting model allows us to identify areas at risk of flooding at unprecedented scale using end-to-end machine learning models and data that is publicly available globally. In this post, we also describe developments for the next generation of flood forecasting systems, called HydroNets (presented at ICLR AI for Earth Sciences and EGU this year), a new architecture built specifically for hydrologic modeling across multiple basins, while still optimizing for accuracy at each location.

Forecasting Water Levels
The first step in a flood forecasting system is to identify whether a river is expected to flood. Hydrologic models (or gauge-to-gauge models) have long been used by governments and disaster management agencies to improve the accuracy and extend the lead time of their forecasts. These models receive inputs like precipitation or upstream gauge measurements of water level (i.e., the absolute elevation of the water above sea level) and output a forecast for the water level (or discharge) in the river at some time in the future.

The hydrologic model component of the flood forecasting system described in this week’s Keyword post doubled the lead time of flood alerts for areas covering more than 75 million people. These models not only increase lead time, but also provide unprecedented accuracy, achieving an R2 score of more than 99% across all basins we cover, and predicting the water level within a 15 cm error bound more than 90% of the time. Once a river is predicted to reach flood level, the next step in generating actionable warnings is to convert the river level forecast into a prediction for how the floodplain will be affected.
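
For concreteness, both headline metrics can be computed from paired forecasts and observations as in the sketch below; the water-level values are invented purely for illustration.

```python
# Toy illustration (made-up data) of the two metrics quoted above:
# the R2 score and the share of forecasts within a 15 cm error bound.
import numpy as np

observed = np.array([2.10, 2.35, 2.80, 3.40, 3.95])  # water level, meters
forecast = np.array([2.05, 2.40, 2.78, 3.52, 3.90])

ss_res = np.sum((observed - forecast) ** 2)          # residual sum of squares
ss_tot = np.sum((observed - observed.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot

within_15cm = np.mean(np.abs(observed - forecast) <= 0.15)
print(f"R2 = {r2:.3f}, within 15 cm: {within_15cm:.0%}")
```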

Morphological Inundation Modeling
In prior work, we developed high quality elevation maps based on satellite imagery, and ran physics-based models to simulate water flow across these digital terrains, which allowed warnings with unprecedented resolution and accuracy in data-scarce regions. In collaboration with our satellite partners, Airbus, Maxar and Planet, we have now expanded the elevation maps to cover hundreds of millions of square kilometers. However, in order to scale up the coverage to such a large area while still retaining high accuracy, we had to re-invent how we develop inundation models.

Inundation modeling estimates what areas will be flooded and how deep the water will be. This visualization conceptually shows how inundation could be simulated, how risk levels could be defined (represented by red and white colors), and how the model could be used to identify areas that should be warned (green dots).

Inundation modeling at scale suffers from three significant challenges. Due to the large areas involved and the resolution required for such models, they necessarily have high computational complexity. In addition, most global elevation maps don’t include riverbed bathymetry, which is important for accurate modeling. Finally, the errors in existing data, which may include gauge measurement errors, missing features in the elevation maps, and the like, need to be understood and corrected. Correcting such problems may require collecting additional high-quality data or fixing erroneous data manually, neither of which scale well.

Our new approach to inundation modeling, which we call a morphological model, addresses these issues by using several innovative tricks. Instead of modeling the complex behaviors of water flow in real time, we compute modifications to the morphology of the elevation map that allow one to simulate the inundation using simple physical principles, such as those describing hydrostatic systems.

First, we train a pure-ML model (devoid of physics-based information) to estimate the one-dimensional river profile from gauge measurements. The model takes as input the water level at a specific point on the river (the stream gauge) and outputs the river profile, which is the water level at all points in the river. We assume that if the gauge increases, the water level increases monotonically, i.e., the water level at other points in the river increases as well. We also assume that the absolute elevation of the river profile decreases downstream (i.e., the river flows downhill).
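
As a toy illustration of those two constraints (the real system learns the profile with an ML model), one can parameterize the profile with nonnegative downstream drops, so elevation only decreases along the river while the level rises monotonically with the gauge reading:

```python
# Toy sketch of the monotonicity assumptions: the bed profile only drops
# downstream, and the water level rises with the gauge. All values invented.
import numpy as np

def river_profile(gauge_level, base_elevation, drops, scale=0.8):
    """Water level at each downstream point, meters above sea level."""
    bed = base_elevation - np.cumsum(np.abs(drops))  # non-increasing downstream
    return bed + scale * gauge_level                 # monotone in the gauge

drops = np.array([0.0, 0.4, 0.3, 0.7])  # per-segment elevation drops
print(river_profile(gauge_level=2.0, base_elevation=10.0, drops=drops))
```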

We then use this learned model and some heuristics to edit the elevation map so as to approximately “cancel out” the pressure gradient that would exist if that region were flooded. This new synthetic elevation map provides the foundation on which we model the flood behavior using a simple flood-fill algorithm. Finally, we match the resulting flooded map to the satellite-based flood extent that corresponds to the original stream gauge measurement.
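
The flood-fill step itself can be sketched as follows on a small elevation grid; the grid values, seed cell, and water level are invented, and the production system operates on far larger rasters.

```python
# A minimal flood-fill sketch: starting from a river seed cell, mark every
# connected cell whose (modified) elevation lies below the water level.
from collections import deque
import numpy as np

def flood_fill(elevation, seed, water_level):
    """Return a boolean mask of cells reachable from `seed` below `water_level`."""
    rows, cols = elevation.shape
    flooded = np.zeros_like(elevation, dtype=bool)
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # off the grid
        if flooded[r, c] or elevation[r, c] >= water_level:
            continue  # already visited, or ground is above the water line
        flooded[r, c] = True
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return flooded

elevation = np.array([[3.0, 2.0, 4.0],
                      [2.5, 1.0, 2.2],
                      [4.0, 1.5, 5.0]])  # made-up synthetic elevation map
print(flood_fill(elevation, seed=(1, 1), water_level=2.4))
```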

This approach abandons some of the realistic constraints of classical physics-based models, but in data scarce regions where existing methods currently struggle, its flexibility allows the model to automatically learn the correct bathymetry and fix various errors to which physics-based models are sensitive. This morphological model improves accuracy by 3%, which can significantly improve forecasts for large areas, while also allowing for much more rapid model development by reducing the need for manual modeling and correction.

Alert targeting
Many people reside in areas that are not covered by the morphological inundation models, yet access to accurate predictions is still urgently needed. To reach this population and to increase the impact of our flood forecasting models, we designed an end-to-end ML-based approach, using almost exclusively data that is publicly available globally, such as stream gauge measurements, public satellite imagery, and low-resolution elevation maps. We train the model to use the data it receives to directly infer the inundation map in real time.

A direct ML approach from real-time measurements to inundation.

This approach works well “out of the box” when the model only needs to forecast an event that is within the range of events previously observed. Extrapolating to more extreme conditions is much more challenging. Nevertheless, proper use of existing elevation maps and real-time measurements can enable alerts that are more accurate than presently available for those in areas not covered by the more detailed morphological inundation models. Because this model is highly scalable, we were able to launch it across India after only a few months of work, and we hope to roll it out to many more countries soon.

Improving Water Levels Forecasting
In an effort to continue improving flood forecasting, we have developed HydroNets — a specialized deep neural network architecture built specifically for water level forecasting — which allows us to utilize some exciting recent advances in ML-based hydrology in a real-world operational setting. Two prominent features distinguish it from standard hydrologic models. First, it is able to differentiate between model components that generalize well between sites, such as the modeling of rainfall-runoff processes, and those that are specific to a given site, like the rating curve, which converts a predicted discharge volume into an expected water level. This enables the model to generalize well to different sites, while still fine-tuning its performance to each location. Second, HydroNets takes into account the structure of the river network being modeled, by training a large architecture that is actually a web of smaller neural networks, each representing a different location along the river. This allows neural networks that model upstream sites to pass information encoded in embeddings to models of downstream sites, so that every model can know everything it needs without a drastic increase in parameters.

The animation below illustrates the structure and flow of information in HydroNets. The output from the modeling of upstream sub-basins is combined into a single representation of a given basin state. It is then processed by the shared model component, which is informed by all basins in the network, and passed on to the label prediction model, which calculates the water level (and the loss function). The output from this iteration of the network is then passed on to inform downstream models, and so on.

An illustration of the HydroNets architecture.
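
As a rough schematic only (layer shapes and names are assumptions, not the actual HydroNets code), the pattern of site-specific encoders, a shared core, and upstream embeddings feeding downstream models might look like this in PyTorch:

```python
# Schematic sketch of the HydroNets idea: private per-site layers, a core
# shared across all sites, and embeddings flowing downstream. All shapes
# and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

EMB_DIM, N_FEATURES = 16, 8
shared = nn.Sequential(nn.Linear(EMB_DIM, EMB_DIM), nn.ReLU())  # tied weights

class SiteModel(nn.Module):
    """One node in the river graph: private encoder + shared core + head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(N_FEATURES + EMB_DIM, EMB_DIM)  # site-specific
        self.head = nn.Linear(EMB_DIM, 1)  # site-specific (e.g., rating curve)

    def forward(self, x, upstream):
        # Combine upstream embeddings into one representation of basin state.
        up = (torch.stack(upstream).sum(dim=0) if upstream
              else torch.zeros(x.shape[0], EMB_DIM))
        emb = shared(torch.relu(self.encoder(torch.cat([x, up], dim=-1))))
        return emb, self.head(emb)  # pass `emb` downstream; head predicts level

# Two upstream sites feeding one downstream site.
up1, up2, down = SiteModel(), SiteModel(), SiteModel()
x = torch.randn(4, N_FEATURES)          # batch of made-up basin inputs
e1, _ = up1(x, [])
e2, _ = up2(x, [])
_, water_level = down(x, [e1, e2])
print(water_level.shape)                # torch.Size([4, 1])
```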

We’re incredibly excited about this progress, and are working hard on improving our systems further.

Acknowledgements
This work is a collaboration between many research teams at Google, and is part of our AI for Social Good efforts. We’d also like to thank our Geo and Policy teams, as well as Google.org.

Read More

Helping companies prioritize their cybersecurity investments

One reason that cyberattacks have continued to grow in recent years is that we never actually learn all that much about how they happen. Companies fear that reporting attacks will tarnish their public image, and even those who do report them don’t share many details because they worry that their competitors will gain insight into their security practices. 

“It’s really a nice gift that we’ve given to cyber-criminals,” says Taylor Reynolds, technology policy director at MIT’s Internet Policy Research Initiative (IPRI). “In an ideal world, these attacks wouldn’t happen over and over again, because companies would be able to use data from attacks to develop quantitative measurements of the security risk so that we could prevent such incidents in the future.”

In an economy where most industries are tightening their belts, many organizations don’t know which types of attacks lead to the largest financial losses, and therefore how to best deploy scarce security resources. 

But a new platform from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) aims to change that, quantifying companies’ security risk without requiring them to disclose sensitive data about their systems to the research team, much less their competitors.

Developed by Reynolds alongside economist Andrew Lo and cryptographer Vinod Vaikuntanathan, the platform helps companies do multiple things:

  • quantify how secure they are;
  • understand how their security compares to peers; and
  • evaluate whether they’re spending the right amount of money on security, and if and how they should change their particular security priorities.

The team received internal data from seven large companies that averaged 50,000 employees and annual revenues of $24 billion. By securely aggregating 50 different security incidents that took place at the companies, the researchers were able to analyze which specific steps were not taken that could have prevented them. (Their analysis used a well-established set of nearly 200 security actions referred to as the Center for Internet Security Sub-Controls.) 

“We were able to paint a really thorough picture in terms of which security failures were costing companies the most money,” says Reynolds, who co-authored a related paper with professors Lo and Vaikuntanathan, MIT graduate student Leo de Castro, Principal Research Scientist Daniel J. Weitzner, PhD student Fransisca Susan, and graduate student Nicolas Zhang. “If you’re a chief information security officer at one of these organizations, it can be an overwhelming task to try to defend absolutely everything. They need to know where they should direct their attention.”

The team calls their platform “SCRAM,” for “Secure Cyber Risk Aggregation and Measurement.” Among other findings, they determined that the three following security vulnerabilities had the largest total losses, each in excess of $1 million:

Failures in preventing malware attacks

Malware attacks, like the one last month that reportedly forced the wearables company Garmin to pay a $10 million ransom, are still a tried-and-true method of gaining control of valuable consumer data. Reynolds says that companies continue to struggle to prevent such attacks, relying on regularly backing up their data and reminding their employees not to click on suspicious emails. 

Communication over unauthorized ports 

Curiously, the team found that every firm in their study said they had, in fact, implemented the security measure of blocking access to unauthorized ports — the digital equivalent of companies locking all their doors. Even so, attacks that involved gaining access to these ports accounted for a large number of high-cost losses. 

“Losses can arise even when there are defenses that are well-developed and understood,” says Weitzner, who also serves as director of MIT IPRI. “It’s important to recognize that improving common existing defenses should not be neglected in favor of expanding into new areas of defense.”

Failures in log management for security incidents 

Every day companies amass detailed “logs” denoting activity within their systems. Senior security officers often turn to these logs after an attack to audit the incident and see what happened. Reynolds says that there are many ways that companies could be using machine learning and artificial intelligence more efficiently to help understand what’s happening — including, crucially, during or even before a security attack. 

Two other key areas that warrant further analysis include taking inventory of hardware so that only authorized devices are given access, as well as boundary defenses like firewalls and proxies that aim to control the flow of traffic through network borders. 

The team developed their data aggregation platform in conjunction with MIT cryptography experts, using an existing method called multi-party computation (MPC) that allows them to perform calculations on data without themselves being able to read or unlock it. After computing its anonymized findings, the SCRAM system then asks each contributing company to help it unlock only the answer using their own secret cryptographic key.
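
To illustrate the underlying idea (SCRAM’s actual MPC protocol is more elaborate than this), additive secret sharing lets a group compute a sum while keeping each party’s input hidden:

```python
# Illustrative sketch of additive secret sharing: each firm splits its loss
# figure into random shares, so the total is recoverable but no single
# input is. Loss figures here are hypothetical.
import random

MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split `secret` into n random shares that sum to it mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

losses = [1_200_000, 450_000, 3_100_000]  # per-firm losses (made up)
all_shares = [share(v, len(losses)) for v in losses]

# Each party sums the shares it holds; only the combined total is revealed.
partial_sums = [sum(col) % MODULUS for col in zip(*all_shares)]
total = sum(partial_sums) % MODULUS
print(total == sum(losses))  # True
```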

“The power of this platform is that it allows firms to contribute locked data that would otherwise be too sensitive or risky to share with a third party,” says Reynolds.

As a next step, the researchers plan to expand the pool of participating companies, with representation from a range of different sectors that include electricity, finance, and biotech. Reynolds says that if the team can gather data from upwards of 70 or 80 companies, they’ll be able to do something unprecedented: put an actual dollar figure on the risk of particular defenses failing.

The project was a cross-campus effort involving affiliates at IPRI, CSAIL’s Theory of Computation group, and the MIT Sloan School of Management. It was funded by the Hewlett Foundation and CSAIL’s Financial Technology industry initiative (“FinTech@CSAIL”). 

Read More

DIY with AI: GTC to Host NVIDIA Deep Learning Institute Courses for Anyone, Anywhere

The NVIDIA Deep Learning Institute is launching three new courses, which can be taken for the first time ever at the GPU Technology Conference next month. 

The new instructor-led workshops cover fundamentals of deep learning, recommender systems and Transformer-based applications. Anyone connected online can join for a nominal fee, and participants will have access to a fully configured, GPU-accelerated server in the cloud. 

DLI instructor-led trainings consist of hands-on remote learning taught by NVIDIA-certified experts in virtual classrooms. Participants can interact with their instructors and peers in real time. They can whiteboard ideas, tackle interactive coding challenges and earn a DLI certificate of subject competency to support their professional growth.

DLI at GTC is offered globally, with several courses available in Korean, Japanese and Simplified Chinese for attendees in their respective time zones.

New DLI workshops launching at GTC include:

  • Fundamentals of Deep Learning — Build the confidence to take on a deep learning project by learning how to train a model, work with common data types and model architectures, use transfer learning between models, and more.
  • Building Intelligent Recommender Systems — Create different types of recommender systems: content-based, collaborative filtering, hybrid, and more. Learn how to use the open-source cuDF library, Apache Arrow, alternating least squares, CuPy and TensorFlow 2 to do so.
  • Building Transformer-Based Natural Language Processing Applications — Learn about NLP topics like Word2Vec and recurrent neural network-based embeddings, as well as Transformer architecture features and how to improve them. Use pre-trained NLP models for text classification, named-entity recognition and question answering, and deploy refined models for live applications.

Other DLI offerings at GTC will include:

  • Fundamentals of Accelerated Computing with CUDA Python — Dive into how to use Numba to compile NVIDIA CUDA kernels from NumPy universal functions, as well as create and launch custom CUDA kernels, while applying key GPU memory management techniques.
  • Applications of AI for Predictive Maintenance — Leverage predictive maintenance and identify anomalies to manage failures and avoid costly unplanned downtimes, use time-series data to predict outcomes using machine learning classification models with XGBoost, and more.
  • Fundamentals of Accelerated Data Science with RAPIDS — Learn how to use cuDF and Dask to ingest and manipulate large datasets directly on the GPU, applying GPU-accelerated machine learning algorithms including XGBoost, cuGRAPH and cuML to perform data analysis at massive scale.
  • Fundamentals of Accelerated Computing with CUDA C/C++ — Find out how to accelerate CPU-only applications to run their latent parallelism on GPUs, using techniques like essential CUDA memory management to optimize accelerated applications.
  • Fundamentals of Deep Learning for Multi-GPUs — Scale deep learning training to multiple GPUs, significantly shortening the time required to train lots of data and making solving complex problems with deep learning feasible.
  • Applications of AI for Anomaly Detection — Discover how to implement multiple AI-based solutions to identify network intrusions, using accelerated XGBoost, deep learning-based autoencoders and generative adversarial networks.

With more than 2 million registered NVIDIA developers working on technological breakthroughs to solve the world’s toughest problems, the demand for deep learning expertise is greater than ever. The full DLI course catalog includes a variety of topics for anyone interested in learning more about AI, accelerated computing and data science.

Workshops have limited seating, with an early-bird registration deadline of Sept. 25. Register now.

Read More

What Is MLOps?

MLOps may sound like the name of a shaggy, one-eyed monster, but it’s actually an acronym that spells success in enterprise AI.

A shorthand for machine learning operations, MLOps is a set of best practices for businesses to run AI successfully.

MLOps is a relatively new field because commercial use of AI is itself fairly new.

MLOps: Taking Enterprise AI Mainstream

The Big Bang of AI sounded in 2012 when a researcher won an image-recognition contest using deep learning. The ripples expanded quickly.

Today, AI translates web pages and automatically routes customer service calls. It’s helping hospitals read X-rays, banks calculate credit risks and retailers stock shelves to optimize sales.

In short, machine learning, one part of the broad field of AI, is set to become as mainstream as software applications. That’s why the process of running ML needs to be as buttoned down as the job of running IT systems.

Machine Learning Layered on DevOps

MLOps is modeled on the existing discipline of DevOps, the modern practice of efficiently writing, deploying and running enterprise applications. DevOps got its start a decade ago as a way for warring tribes of software developers (the Devs) and IT operations teams (the Ops) to collaborate.

MLOps adds to the team the data scientists, who curate datasets and build AI models that analyze them. It also includes ML engineers, who run those datasets through the models in disciplined, automated ways.

MLOps combines machine learning, applications development and IT operations. Source: Neal Analytics

It’s a big challenge in raw performance as well as management rigor. Datasets are massive and growing, and they can change in real time. AI models require careful tracking through cycles of experiments, tuning and retraining.

So, MLOps needs a powerful AI infrastructure that can scale as companies grow. For this foundation, many companies use NVIDIA DGX systems, CUDA-X and other software components available on NVIDIA’s software hub, NGC.

Lifecycle Tracking for Data Scientists

With an AI infrastructure in place, an enterprise data center can layer on the following elements of an MLOps software stack:

  • Data sources and the datasets created from them
  • A repository of AI models tagged with their histories and attributes
  • An automated ML pipeline that manages datasets, models and experiments through their lifecycles
  • Software containers, typically based on Kubernetes, to simplify running these jobs

It’s a heady set of related jobs to weave into one process.

Data scientists need the freedom to cut and paste datasets together from external sources and internal data lakes. Yet their work and those datasets need to be carefully labeled and tracked.

Likewise, they need to experiment and iterate to craft great models well torqued to the task at hand. So they need flexible sandboxes and rock-solid repositories.

And they need ways to work with the ML engineers who run the datasets and models through prototypes, testing and production. It’s a process that requires automation and attention to detail so models can be easily interpreted and reproduced.

Today, these capabilities are becoming available as part of cloud-computing services. Companies that see machine learning as strategic are creating their own AI centers of excellence using MLOps services or tools from a growing set of vendors.

Gartner’s view of the machine-learning pipeline

Data Science in Production at Scale

In the early days, companies such as Airbnb, Facebook, Google, NVIDIA and Uber had to build these capabilities themselves.

“We tried to use open source code as much as possible, but in many cases there was no solution for what we wanted to do at scale,” said Nicolas Koumchatzky, a director of AI infrastructure at NVIDIA.

“When I first heard the term MLOps, I realized that’s what we’re building now and what I was building before at Twitter,” he added.

Koumchatzky’s team at NVIDIA developed MagLev, the MLOps software that hosts NVIDIA DRIVE, our platform for creating and testing autonomous vehicles. As part of its foundation for MLOps, it uses the NVIDIA Container Runtime and Apollo, a set of components developed at NVIDIA to manage and monitor Kubernetes containers running across huge clusters.

Laying the Foundation for MLOps at NVIDIA

Koumchatzky’s team runs its jobs on NVIDIA’s internal AI infrastructure based on GPU clusters called DGX PODs.  Before the jobs start, the infrastructure crew checks whether they are using best practices.

First, “everything must run in a container — that spares an unbelievable amount of pain later looking for the libraries and runtimes an AI application needs,” said Michael Houston, whose team builds NVIDIA’s AI systems including Selene, a DGX SuperPOD recently ranked the most powerful industrial computer in the U.S.

Among the team’s other checkpoints, jobs must:

  • Launch containers with an approved mechanism
  • Prove the job can run across multiple GPU nodes
  • Show performance data to identify potential bottlenecks
  • Show profiling data to ensure the software has been debugged

The maturity of MLOps practices used in business today varies widely, according to Edwin Webster, a data scientist who started the MLOps consulting practice a year ago for Neal Analytics and wrote an article defining MLOps. At some companies, data scientists still squirrel away models on their personal laptops; others turn to big cloud-service providers for a soup-to-nuts service, he said.

Two MLOps Success Stories

Webster shared success stories from two of his clients.

One involves a large retailer that used MLOps capabilities in a public cloud service to create an AI service that reduced waste by 8-9 percent with daily forecasts of when to restock shelves with perishable goods. A budding team of data scientists at the retailer created datasets and built models; the cloud service packed key elements into containers, then ran and managed the AI jobs.

Another involves a PC maker that developed software using AI to predict when its laptops would need maintenance so it could automatically install software updates. Using established MLOps practices and internal specialists, the OEM wrote and tested its AI models on a fleet of 3,000 notebooks. The PC maker now provides the software to its largest customers.

Many, but not all, Fortune 100 companies are embracing MLOps, said Shubhangi Vashisth, a senior principal analyst following the area at Gartner. “It’s gaining steam, but it’s not mainstream,” she said.

Vashisth co-authored a white paper that lays out three steps for getting started in MLOps: Align stakeholders on the goals, create an organizational structure that defines who owns what, then define responsibilities and roles — Gartner lists a dozen of them.

Gartner refers to the overall MLOps process as the machine learning development lifecycle (MLDLC).

Beware Buzzwords: AIOps, DLOps, DataOps, and More

Don’t get lost in a forest of buzzwords that have grown up along this avenue. The industry has clearly coalesced its energy around MLOps.

By contrast, AIOps is a narrower practice of using machine learning to automate IT functions. One part of AIOps is IT operations analytics, or ITOA. Its job is to examine the data AIOps generate to figure out how to improve IT practices.

Similarly, some have coined the terms DataOps and ModelOps to refer to the people and processes for creating and managing datasets and AI models, respectively. Those are two important pieces of the overall MLOps puzzle.

Interestingly, every month thousands of people search for the meaning of DLOps. They may imagine DLOps are IT operations for deep learning. But the industry uses the term MLOps, not DLOps, because deep learning is a part of the broader field of machine learning.

Despite the many queries, you’d be hard pressed to find anything online about DLOps. By contrast, household names like Google and Microsoft as well as up-and-coming companies like Iguazio and Paperspace have posted detailed white papers on MLOps.

MLOps: An Expanding Software and Services Smorgasbord

Those who prefer to let someone else handle their MLOps have plenty of options.

Major cloud-service providers like Alibaba, AWS and Oracle are among several that offer end-to-end services accessible from the comfort of your keyboard.

For users who spread their work across multiple clouds, Databricks’ MLflow supports MLOps services that work with multiple providers and multiple programming languages, including Python, R and SQL. Other cloud-agnostic alternatives include open source software such as Polyaxon and KubeFlow.
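
As one concrete example, a minimal MLflow tracking run looks like the sketch below; the experiment name, parameters, and metric value are placeholders:

```python
# A small sketch of experiment tracking with MLflow; names and values
# here are placeholders, not a recommended configuration.
import mlflow

mlflow.set_experiment("demand-forecast")

with mlflow.start_run(run_name="xgboost-baseline"):
    mlflow.log_param("model", "xgboost")   # record the experiment's settings
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("val_rmse", 0.142)   # record the outcome for comparison
```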

Companies that believe AI is a strategic resource they want behind their firewall can choose from a growing list of third-party providers of MLOps software. Compared to open-source code, these tools typically add valuable features and are easier to put into use.

NVIDIA has certified products from six of them as part of its DGX-Ready Software program:

  • Allegro AI
  • cnvrg.io
  • Core Scientific
  • Domino Data Lab
  • Iguazio
  • Paperspace

All six vendors provide software to manage datasets and models that can work with Kubernetes and NGC.

It’s still early days for off-the-shelf MLOps software.

Gartner tracks about a dozen vendors offering MLOps tools, including ModelOp and ParallelM (now part of DataRobot), said analyst Vashisth. Beware offerings that don’t cover the entire process, she warns: they force users to import and export data between programs they must stitch together themselves, a tedious and error-prone process.

The edge of the network, especially for partially connected or unconnected nodes, is another underserved area for MLOps so far, said Webster of Neal Analytics.

Koumchatzky, of NVIDIA, puts tools for curating and managing datasets at the top of his wish list for the community.

“It can be hard to label, merge or slice datasets or view parts of them, but there is a growing MLOps ecosystem to address this. NVIDIA has developed these internally, but I think it is still undervalued in the industry,” he said.

Long term, MLOps needs the equivalent of IDEs, the integrated software development environments like Microsoft Visual Studio that apps developers depend on. Meanwhile, Koumchatzky and his team craft their own tools to visualize and debug AI models.

The good news is there are plenty of products for getting started in MLOps.

In addition to software from its partners, NVIDIA provides a suite of mainly open-source tools for managing an AI infrastructure based on its DGX systems, and that’s the foundation for MLOps.

Many of these tools are available on NGC and other open-source repositories. Pulling these ingredients into a recipe for success, NVIDIA provides a reference architecture for creating GPU clusters called DGX PODs.

In the end, each team needs to find the mix of MLOps products and practices that best fits its use cases. They all share a goal of creating an automated way to run AI smoothly as a daily part of a company’s digital life.

 

Read More

KeyPose: Estimating the 3D Pose of Transparent Objects from Stereo

Posted by Kurt Konolige, Software Engineer, Robotics at Google

Estimating the position and orientation of 3D objects is one of the core problems in computer vision applications that involve object-level perception, such as augmented reality and robotic manipulation. In these applications, it is important to know the 3D position of objects in the world, either to directly affect them, or to place simulated objects correctly around them. While there has been much research on this topic using machine learning (ML) techniques, especially Deep Nets, most have relied on the use of depth sensing devices, such as the Kinect, which give direct measurements of the distance to an object. For objects that are shiny or transparent, direct depth sensing does not work well. For example, the figure below includes a number of objects (left), two of which are transparent stars. A depth device does not find good depth values for the stars, and gives a very poor reconstruction of the actual 3D points (right).

Left: RGB image of transparent objects. Right: A four-panel image showing the reconstructed depth for the scene on the left. The top row includes depth images and the bottom row presents the 3D point cloud. The left panels were reconstructed using a depth camera and the right panels are output from the ClearGrasp model. Note that although ClearGrasp inpaints the depth of the stars, it mistakes the actual depth of the rightmost one.

One solution to this problem, such as that proposed by ClearGrasp, is to use a deep neural network to inpaint the corrupted depth map of the transparent objects. Given a single RGB-D image of transparent objects, ClearGrasp uses deep convolutional networks to infer surface normals, masks of transparent surfaces, and occlusion boundaries, which it uses to refine the initial depth estimates for all transparent surfaces in the scene (far right in the figure above). This approach is very promising, and allows scenes with transparent objects to be processed by pose-estimation methods that rely on depth.  But inpainting can be tricky, especially when trained completely with synthetic images, and can still result in errors in depth.

In “KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects”, presented at CVPR 2020 in collaboration with the Stanford AI Lab, we describe an ML system that estimates the depth of transparent objects by directly predicting 3D keypoints. To train the system we gather a large real-world dataset of images of transparent objects in a semi-automated way, and efficiently label their pose using 3D keypoints selected by hand. We then train deep models (called KeyPose) to estimate the 3D keypoints end-to-end from monocular or stereo images, without explicitly computing depth. The models work on objects both seen and unseen during training, for both individual objects and categories of objects. While KeyPose can work with monocular images, the extra information available from stereo images allows it to improve its results by a factor of two over monocular image input, with typical errors from 5 mm to 10 mm, depending on the objects. It substantially improves over state-of-the-art in pose estimation for these objects, even when competing methods are provided with ground truth depth. We are releasing the dataset of keypoint-labeled transparent objects for use by the research community.

Real-World Transparent Object Dataset with 3D Keypoint Labels
To facilitate gathering large quantities of real-world images, we set up a robotic data-gathering system in which a robot arm moves through a trajectory while taking video with two devices, a stereo camera and the Kinect Azure depth camera.

Automated image sequence capture using a robot arm with a stereo camera and an Azure Kinect device.

The AprilTags on the target enable accurate tracing of the pose of the cameras. By hand-labelling only a few images in each video with 2D keypoints, we can extract 3D keypoints for all frames of the video using multi-view geometry, thus increasing the labelling efficiency by a factor of 100.

We captured imagery for 15 different transparent objects in five categories, using 10 different background textures and four different poses for each object, yielding a total of 600 video sequences comprising 48k stereo and depth images. We also captured the same images with an opaque version of the object, to provide accurate ground truth depth images. All the images are labelled with 3D keypoints. We are releasing this dataset of real-world images publicly, complementing the synthetic ClearGrasp dataset with which it shares similar objects.

KeyPose Algorithm Using Early Fusion Stereo
The idea of using stereo images directly for keypoint estimation was developed independently for this project; it has also appeared recently in the context of hand-tracking. The diagram below shows the basic idea: the two images from a stereo camera are cropped around the object and fed to the KeyPose network, which predicts a sparse set of 3D keypoints that represent the 3D pose of the object. The network is trained using supervision from the labelled 3D keypoints.

One of the key aspects of stereo KeyPose is the use of early fusion to intermix the stereo images, and allow the network to implicitly compute disparity, in contrast to late fusion, in which keypoints are predicted for each image separately, and then combined. As shown in the diagram below, the output of KeyPose is a 2D keypoint heatmap in the image plane along with a disparity (i.e., inverse depth) heatmap for each keypoint. The combination of these two heatmaps yields the 3D coordinate of the keypoint, for each keypoint.

KeyPose system diagram. Stereo images are passed to a CNN model to produce a probability heatmap for each keypoint. This heatmap yields 2D image coordinates (U, V) for the keypoint. The CNN model also produces a disparity (inverse depth) heatmap for each keypoint, which, when combined with the U, V coordinates, gives a 3D position (X, Y, Z).

When compared to late fusion or to monocular input, early fusion stereo typically is twice as accurate.
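
The final geometric step, combining a predicted keypoint (U, V) with its disparity to recover a 3D position, is standard stereo back-projection; the camera intrinsics and baseline in the sketch below are assumed values, not those of the actual rig:

```python
# Standard pinhole/stereo geometry for turning a predicted 2D keypoint and
# its disparity into a 3D point. Intrinsics and baseline are assumptions.
import numpy as np

def keypoint_to_3d(u, v, d, fx=600.0, fy=600.0, cx=320.0, cy=240.0, B=0.06):
    """Depth from disparity (Z = fx * B / d), then back-project to 3D."""
    Z = fx * B / d            # d in pixels, baseline B in meters
    X = (u - cx) * Z / fx     # back-project image coords to the camera frame
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])

print(keypoint_to_3d(u=350.0, v=230.0, d=18.0))  # -> [X, Y, Z] in meters
```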

Results
The images below show qualitative results of KeyPose on individual objects. On the left is one of the original stereo images; in the middle are the predicted 3D keypoints projected onto the image. On the right, we visualize points from a 3D model of the bottle, placed at the pose determined by the predicted 3D keypoints. The network is efficient and accurate, predicting keypoints with an MAE of 5.2 mm for the bottle and 10.1 mm for the mug using just 5 ms on a standard GPU.

The following table shows results for KeyPose on category-level estimation. The test set used a background texture not seen by the training set. Note that the MAE varies from 5.8 mm to 9.9 mm, showing the accuracy of the method.

Quantitative comparison of KeyPose with the state-of-the-art DenseFusion system, on category-level data. We provide DenseFusion with two versions of depth, one from the transparent objects, and one from opaque objects. <2cm is the percent of estimates with errors less than 2 cm. MAE is the mean absolute error of the keypoints, in mm.

For a complete accounting of quantitative results, as well as ablation studies, please see the paper, the supplementary materials, and the KeyPose website.

Conclusion
This work shows that it is possible to accurately estimate the 3D pose of transparent objects from RGB images without reliance on depth images. It validates the use of stereo images as input to an early fusion deep net, where the network is trained to extract sparse 3D keypoints directly from the stereo pair. We hope the availability of an extensive, labelled dataset of transparent objects will help to advance the field. Finally, while we used semi-automatic methods to efficiently label the dataset, we hope to employ self-supervision methods in future work to do away with manual labelling.

Acknowledgements
I want to thank my co-authors, Xingyu Liu of Stanford University, and Rico Jonschkowski and Anelia Angelova, as well as the many who helped us through discussions during the project and paper writing, including Andy Zheng, Shuran Song, Vincent Vanhoucke, Pete Florence, and Jonathan Tompson.

Read More

MIT hosts seven distinguished MLK Professors and Scholars for 2020-21

In light of the Covid-19 pandemic, MIT has been charged with reimagining its campus, classes, and programs, including the Dr. Martin Luther King, Jr. (MLK) Visiting Professors and Scholars Program (VPSP).

Founded in 1990, MLK VPSP honors the life and legacy of Martin Luther King, Jr. by increasing the presence of and recognizing the contributions of scholars from underrepresented groups at MIT. MLK Visiting Professors and Scholars enhance their scholarship through intellectual engagement with the MIT community and enrich the cultural, academic, and professional experience of students. The program hosts between four and eight scholars each year. But what does a virtual year mean for a visiting scholar?

Even with the challenge of remote learning and limited in-person contact, MLK VPSP faculty hosts have articulated innovative ways to engage with the MIT community. Moya Bailey, for instance, will be a content contributor for the Program in Women’s and Gender Studies’ website and social media accounts. Charles Senteio will continue to collaborate with the Office of Minority Education on curriculum development that reflects a diverse student population with a focus on health and well-being, and he will also explore remote learning and its impact on curriculum.

With Provost Martin Schmidt’s steadfast institutional support, and with active oversight from Institute Community and Equity Officer John Dozier and Associate Provost Tim Jamison, the MLK VPSP continues to honor King’s legacy and be an institutional priority on campus and online. For Academic Year 2020-2021, MIT is hosting seven accomplished scholars representing different areas of interest from all over the United States and Canada.

2020-2021 MLK Visiting Professors and Scholars

Moya Bailey is an assistant professor at Northeastern University in the Department of Cultures, Societies, and Global Studies and in the program in Women’s, Gender, and Sexuality Studies. In 2010, Bailey coined the term “misogynoir,” widely adopted by scholars, which describes the anti-Black racist misogyny that Black women experience. In the spring, she will teach a course in the MIT Program in Women’s and Gender Studies called Black Feminist Health Science Studies. In April 2021, she will organize and host a daylong Black Feminist Health Science symposium.

Jamie Macbeth joins the program for another year in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) as a valuable member of the Genesis group, a research team mainly focused on building computer systems and computational models of human intelligence based on humans’ capability for understanding natural language. One of Macbeth’s research collaborations involves using computer systems in understanding natural language to detect aggressive language on social media with the eventual goal of violence prevention. He will continue to mentor and collaborate with women and underrepresented groups at the undergraduate, MS, and PhD levels.

Ben McDonald is returning for a second year as a postdoc in the Department of Chemistry. His research focuses on developing designer polymers for chemical warfare-responsive membranes and surfactants to control the function of dynamic, complex soft colloids. His role as a mentor will expand to include both undergraduate and graduate students in the Swager Lab. McDonald will continue to collaborate with Chemistry Alliance for Diversity and Inclusion at MIT to organize and host virtual seminars showcasing the work of underrepresented scholars of color in the fields of chemistry and chemical engineering.

Luis Gilberto Murillo-Urrutia, a research fellow hosted by the Environmental Solutions Initiative (ESI), joins us from the Center for Latin America and Latino Studies at American University. His research focuses on the intersection of peace and security with environmental conservation, particularly in Afro-Colombian territories. During his visit, Murillo-Urrutia will hold mentorship sessions at ESI for students conducting research on environmental planning and policy or with a minor in environment and sustainability.

Thomas Searles, recently promoted to associate professor with tenure, is visiting from the Department of Physics at Howard University. While at MIT, he will pursue numerical studies of topological materials for photonic and quantum technological applications. He will mentor students from his lab, the Black Students Union, National Society of Black Engineers, and the Black Graduate Student Association. Searles plans to meet with the MIT physics graduate admissions committee to formulate recruitment strategies with his home and other historically Black colleges and universities.

Charles Senteio joins the program from Rutgers University School of Communication and Information, where he is an assistant professor in library and information science. As a visiting scholar at the MIT Sloan School of Management, he will collaborate with the Operations Management Group to expand on his community health informatics research and investigate health equity barriers. He recently facilitated a workshop, “Healthcare, Technology, and Social Justice Converge — Applied Equity Research and Why It Matters to All of Us” at the MIT Day of Dialogue event in August.

Patricia Saulis is Wolastoqey (Maliseet) from Wolastoq Negotkuk (Tobique First Nation in New Brunswick, Canada). As an MLK Visiting Scholar, Saulis will collaborate with her faculty host, Professor James Paradis from Comparative Media Studies/Writing, on a course titled, “Transmedia Art, Extraction and Environmental Justice” and engage with MIT Center for Environmental Health Sciences on their EPA Superfund-related work in the Northeastern United States. She will work closely with the American Indian Science and Engineering Society (AISES) and the Native American Students Association in raising awareness of the challenges impacting our Indigenous students. Through dialogue and presentations, she will help promote the understanding of Indigenous Peoples’ culture and help identify strategies to create a more inclusive campus for our Indigenous community. 

Community engagement

This year’s scholars are eager to join our community and embark on a mutually rewarding journey of learning and engagement — wherever in the world we may be.  

MIT community members are invited to join the Institute Community and Equity Office in engaging the MLK Professors and Scholars through a signature monthly speaker series, where each scholar will present their research and hold discussions via Zoom. The first welcome event will be held on Sept. 16 from 12 to 1 p.m. Contact Rachel Ornitz at rornitz@mit.edu for event details.

For more information about this year’s and previous scholars and the program, visit the newly redesigned MLK Visiting Professors and Scholars website.

Read More