From “cheetah-noids” to humanoids

In November 2018, MIT Professor Sangbae Kim brought his mini cheetah robot onto “The Tonight Show’s” Tonight Show-botics segment. Much to the delight of host Jimmy Fallon, the mini cheetah did some yoga, got back up after falling, and executed a perfect backflip. Behind the stage, Benjamin Katz ’16, SM ’18 was remotely controlling the cheetah’s nimble maneuvers.

For Katz, waiting in the wings as the robot performed in front of a national audience was the culmination of nearly five years of work.

As an undergraduate at MIT, Katz studied mechanical engineering, opting for the flexible Course 2A degree program with a concentration in controls, instrumentation, and robotics. Toward the end of his first year, he emailed Kim to see if there were any job opportunities in Kim’s Biomimetic Robotics Lab. He then spent the summer in Kim’s lab as part of the MIT Undergraduate Research Opportunities Program (UROP). For his UROP research and undergraduate thesis, he began to look at how to utilize pieces built for the electronics hobby market in robotics. “You can find really high-performance motors built for things like remote control airplanes and drones. I basically thought you could also use these parts for robots, which is something no one was doing,” recalls Katz.

Kim was immediately impressed by Katz’s abilities as an engineer and designer.

“Ben is an extremely versatile engineer who can cover structure and mechanism design, electric motor dynamics, power electronics, and classical control, a range of expertise usually requiring four-to-five engineers to cover,” says Kim.

After deciding to pursue a master’s degree in mechanical engineering at MIT, Katz continued working in Kim’s lab and developed solutions for actuators in robotics. While working on the third iteration of Kim’s robot, known as Cheetah 3, Katz and his labmates shifted their focus to developing a smaller version of the robot.

“There are a lot of nice things about having a smaller robot: If something breaks you can easily fix it, it’s cheaper, and it’s safe enough for one person to wrangle alone,” says Katz. “Even though a small robot may not always be the most practical for real-world applications, its controllers, software, and research can be trivially ported to a big robot that can carry larger payloads.”

Drawing upon his undergraduate research, Katz and the research team used 12 motors originally designed for drones to build actuators in each joint of the small quadruped robot that would be dubbed the “mini cheetah.”

Armed with this smaller robot, Katz set out to make the mini cheetah more agile and resilient. Alongside then-EECS student Jared Di Carlo ’19, MEng ’20, Katz focused on controls related to locomotion in the mini cheetah. In class 6.832 (Underactuated Robotics), taught by Professor Russ Tedrake, the pair worked on a project that would allow the mini cheetah to safely backflip from a crouched position.

“It was basically a giant offline optimization problem to get the mini cheetah to backflip,” says Katz.

Using offline nonlinear optimization to generate the backflip trajectory, he and Di Carlo were able to program the mini cheetah to crouch and rotate 360 degrees around an axis.

While working on the cheetah, Katz was constantly pursuing other engineering projects as a hobby. This included a very different rotating robot as a pet project. Alongside Di Carlo, Katz utilized the MIT community makerspace known as MITERS to develop a robot that could solve a Rubik’s Cube in a record-breaking 0.38 seconds.

“That project was purely for fun during MIT’s Independent Activities Period,” recalls Katz. “We used custom-built actuators on each of the Rubik’s Cube’s faces alongside webcams to identify the colors and move the blocks accordingly.”

He chronicled his other pet projects on his “build-its” blog, which developed a strong following. Projects included planar magnetic headphones, a desktop Furuta pendulum, and an electric travel ukulele.

“Ben was constantly building and analyzing something along with our lab and class projects during his entire time at MIT,” says Kim. “His incessant desire to learn, build, and analyze is quite remarkable.”

After graduating with his master’s degree in 2018, Katz worked as a technical associate in Kim’s lab before accepting a position at Boston Dynamics in 2019.

As a designer at Boston Dynamics, Katz has transitioned from cheetah robots to humanoids, working on ATLAS, a research platform billed as the “world’s most dynamic humanoid robot.” Much like the mini cheetah, ATLAS can execute incredibly dynamic maneuvers, including backflips and even parkour.

While the mini cheetah holding yoga poses and ATLAS doing parkour may seem like entertainment befitting “The Tonight Show,” Katz is quick to remind others that these robots are fulfilling a real-world need. The robots could someday maneuver in areas that are too dangerous for humans — including burning buildings and disaster zones. They could open new possibilities for lifesaving disaster relief and for first responders in emergencies.

“What we did in Sangbae’s lab is going to help make these machines ubiquitous and actually useful in the real world as viable products,” adds Katz.

Read More

NVIDIA Awards $50,000 Fellowships to Ph.D. Students for GPU Computing Research

For more than two decades, NVIDIA has supported graduate students doing GPU-based work through the NVIDIA Graduate Fellowship Program. Today we’re announcing the latest awards of up to $50,000 each to 10 Ph.D. students involved in GPU computing research.

Selected from a highly competitive applicant pool, the awardees will participate in a summer internship preceding the fellowship year. The work they’re doing puts them at the forefront of GPU computing, with fellows tackling projects in deep learning, robotics, computer vision, computer graphics, architecture, circuits, high performance computing, life sciences and programming systems.

“Our fellowship recipients are among the most talented graduate students in the world,” said NVIDIA Chief Scientist Bill Dally. “They’re working on some of the most important problems in computer science, and we’re delighted to support their research.”

The NVIDIA Graduate Fellowship Program is open to applicants worldwide.

Our 2022-2023 fellowship recipients are:

  • Davis Rempe, Stanford University — Modeling 3D motion to solve pose estimation, shape reconstruction and motion forecasting, which enables intelligent systems that understand dynamic 3D objects, humans and scenes.
  • Hao Chen, University of Texas at Austin — Developing next-generation VLSI physical synthesis tools capable of generating sign-off quality layouts in advanced manufacturing nodes, particularly in analog/mixed-signal circuits.
  • Mohit Shridhar, University of Washington — Connecting language to perception and action for vision-based robotics, where representations of vision and language are learned through embodied interactions rather than from static datasets.
  • Sai Praveen Bangaru, Massachusetts Institute of Technology — Developing algorithms and compilers for the systematic differentiation of numerical integrators, allowing them to mix seamlessly with machine learning components.
  • Shlomi Steinberg, University of California, Santa Barbara — Developing models and computational tools for physical light transport — the computational discipline that studies the simulation of partially coherent light in complex environments.
  • Sneha Goenka, Stanford University — Exploring genomic analysis pipelines through hardware-software co-design to enable the ultra-rapid diagnosis of genetic diseases and accelerate large-scale comparative genomic analysis.
  • Yufei Ye, Carnegie Mellon University — Building agents that can perceive physical interactions among objects, understand the consequences of interactions with the physical world, and even predict the potential effects of specific interactions.
  • Yuke Wang, University of California, Santa Barbara — Exploring novel algorithm- and system-level designs and optimizations to accelerate diverse deep-learning workloads, including deep neural networks and graph neural networks.
  • Yuntian Deng, Harvard University — Developing scalable, controllable and interpretable natural language generation approaches using deep generative models with potential applications in long-form text generation.
  • Zekun Hao, Cornell University — Developing algorithms that learn from real-world visual data and apply that knowledge to help human creators build photorealistic 3D worlds.

We also acknowledge the 2022-2023 fellowship finalists:

  • Enze Xie, University of Hong Kong
  • Gokul Swamy, Carnegie Mellon University
  • Hong-Xing (Koven) Yu, Stanford University
  • Suyeon Choi, Stanford University
  • Yash Sharma, University of Tübingen


Read More

Train and deploy a FairMOT model with Amazon SageMaker

Multi-object tracking (MOT) in video analysis is increasingly in demand in many industries, such as live sports, manufacturing, surveillance, and traffic monitoring. For example, in live sports, MOT can track soccer players in real time to analyze physical performance such as real-time speed and moving distance.

Previously, most methods separated MOT into two tasks: object detection and association. The object detection task detects objects first. The association task then extracts re-identification (re-ID) features from the image region of each detected object and uses those features to link each detection to an existing track or to create a new track. Real-time inference is challenging in scenes with many objects, because the two tasks extract features separately and the association task must run re-ID feature extraction for every detected object. Some one-shot MOT methods add a re-ID branch to the object detection network to perform detection and association simultaneously; this reduces inference time but sacrifices tracking performance.
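To make the association step concrete, the following is a deliberately simplified sketch of matching detections to existing tracks by re-ID feature similarity (real trackers typically combine appearance with motion cues and solve the assignment with the Hungarian algorithm rather than this greedy pass; all names here are illustrative):

import numpy as np

def associate(track_embeddings, detection_embeddings, threshold=0.6):
    """Greedily match detections to tracks by cosine similarity of re-ID features.

    Returns (track_index or None, detection_index) pairs; None means a new track
    should be created for that detection.
    """
    matches, used = [], set()
    for d_idx, d in enumerate(detection_embeddings):
        best, best_sim = None, threshold
        for t_idx, t in enumerate(track_embeddings):
            if t_idx in used:
                continue
            sim = float(np.dot(d, t) / (np.linalg.norm(d) * np.linalg.norm(t) + 1e-8))
            if sim > best_sim:
                best, best_sim = t_idx, sim
        if best is not None:
            used.add(best)
        matches.append((best, d_idx))
    return matches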

FairMOT is a one-shot tracking method with two homogeneous branches for detecting objects and extracting re-ID features. FairMOT has higher performance than the two-step methods—it reaches a speed of about 30 FPS on the MOT challenge datasets. This improvement helps MOT find its way into many industrial scenarios.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy machine learning (ML) models quickly. SageMaker provides several built-in algorithms and container images that you can use to accelerate training and deployment of ML models. Additionally, custom algorithms such as FairMOT can also be supported via custom-built Docker container images.

This post demonstrates how to train and deploy a FairMOT model with SageMaker, optimize it using hyperparameter tuning, and make predictions in real time as well as batch mode.

Overview of the solution

Our solution consists of the following high-level steps:

  1. Set up your resources.
  2. Use SageMaker to train a FairMOT model and tune hyperparameters on the MOT challenge dataset.
  3. Run real-time inference.
  4. Run batch inference.

Prerequisites

Before getting started, complete the following prerequisites:

  1. Create an AWS account or use an existing AWS account.
  2. Make sure that you have a minimum of one ml.p3.16xlarge instance for the training job.
  3. Make sure that you have a minimum of one ml.p3.2xlarge instance for the inference endpoint.
  4. Make sure that you have a minimum of one ml.p3.2xlarge instance for processing jobs.

If this is your first time training a model, deploying a model, or running a processing job on the previously mentioned instance sizes, you must request a service quota increase for the corresponding SageMaker resources.

Set up your resources

After you complete all the prerequisites, you’re ready to deploy the necessary resources.

  1. Create a SageMaker notebook instance. For this task, we recommend the ml.t3.medium instance type. The default volume size is 5 GB; you must increase the volume size to 100 GB. For your AWS Identity and Access Management (IAM) role, choose an existing role or create a new role, and attach the AmazonSageMakerFullAccess and AmazonElasticContainerRegistryPublicFullAccess policies to the role.
  2. Clone the GitHub repo to the notebook you created.
  3. Create a new Amazon Simple Storage Service (Amazon S3) bucket or use an existing bucket.

Train a FairMOT model

To train your FairMOT model, we use the fairmot-training.ipynb notebook. The following diagram outlines the logical flow implemented in this code.

In the Initialize SageMaker section, we define the S3 bucket location and dataset name, and choose either to train on the entire dataset (by setting the half_val parameter to 0) or split it into training and validation (half_val is set to 1). We use the latter mode for hyperparameter tuning.

Next, the prepare-s3-bucket.sh script downloads the dataset from MOT challenge, converts it, and uploads it to the S3 bucket. We tested training the model using the MOT17 and MOT20 datasets, but you can try training with other MOT datasets as well.

In the Build and push SageMaker training image section, we create a custom container image with the FairMOT training algorithm. You can find the definition of the Docker image in the container-dp folder. Because this container image consumes about 13.5 GB of disk space, the prepare-docker.sh script changes the default directory for local temporary Docker images in order to avoid a “no space” error. The build_and_push.sh script does just that—it builds and pushes the container to Amazon Elastic Container Registry (Amazon ECR). You should be able to validate the result on the Amazon ECR console.
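Under the hood, a training job with this custom image can be defined with the SageMaker Python SDK; the following is a minimal sketch (the image URI, hyperparameter names, and channel name are illustrative assumptions, not the exact values used in the notebook):

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
bucket = session.default_bucket()  # or the bucket configured in the Initialize SageMaker section

estimator = Estimator(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/fairmot-training:latest",  # custom image in ECR
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.p3.16xlarge",
    volume_size=100,
    output_path=f"s3://{bucket}/fairmot/output",
    hyperparameters={"half_val": 1},  # train/validation split, as described above
    sagemaker_session=session,
)

# Channel name and S3 prefix are placeholders for the converted MOT dataset
estimator.fit({"train": f"s3://{bucket}/fairmot/mot17"})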

Finally, the Define a training job section initiates the model training. You can observe the model training on the SageMaker console on the Training Jobs page. The model shows an In progress status first and changes to Completed in about 3 hours (if you’re running the notebook as is). You can access corresponding training metrics on the training job details page, as shown in the following screenshot.

Training metrics

The FairMOT model is based on a backbone network with object detection and re-ID branches on top. The object detection branch has three parallel heads to estimate heatmaps, object center offsets, and bounding box sizes. During the training phase, each head has a corresponding loss value: hm_loss for heatmap, offset_loss for center offsets, and wh_loss for bounding box sizes. The re-ID branch has an id_loss for the re-ID feature learning. Based on these four loss values, a total loss named loss is calculated for the entire network. We monitor all loss values on both the training and validation datasets. During hyperparameter tuning, we rely on ObjectiveMetric to select the best-performing model.

When the training job is complete, note the URI of your model in the Output section of the job details page.

Finally, the last section of the notebook demonstrates SageMaker hyperparameter optimization (HPO). The right combination of hyperparameters can improve performance of ML models; however, finding one manually is time-consuming. SageMaker hyperparameter tuning helps automate the process. We simply define the range for each tuning hyperparameter and the objective metric, while HPO does the rest.

To accelerate the process, SageMaker HPO can run multiple training jobs in parallel. In the end, the best training job provides the most optimal hyperparameters for the model, which you can then use for training on the entire dataset.
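In SDK terms, the tuning job can be expressed roughly as follows, reusing the Estimator and bucket from the earlier sketch (the hyperparameter name, range, and metric regex are illustrative assumptions; ObjectiveMetric is the metric mentioned above):

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=estimator,                      # the Estimator defined earlier
    objective_metric_name="ObjectiveMetric",
    metric_definitions=[
        {"Name": "ObjectiveMetric", "Regex": "ObjectiveMetric=([0-9\\.]+)"}  # regex is a placeholder
    ],
    hyperparameter_ranges={"lr": ContinuousParameter(1e-5, 1e-3)},  # example range
    objective_type="Maximize",
    max_jobs=8,
    max_parallel_jobs=2,
)

tuner.fit({"train": f"s3://{bucket}/fairmot/mot17"})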

Perform real-time inference

In this section, we use the fairmot-inference.ipynb notebook. Similar to the training notebook, we begin by initializing SageMaker parameters and building a custom container image. The inference container is then deployed with the model we built earlier. The model is referenced via the s3_model_uri variable—you should double-check to make sure it links to the correct URI (adjust manually if necessary).
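Deployment itself boils down to a few SDK calls; a minimal sketch (the inference image URI and endpoint name are assumptions, while s3_model_uri comes from the notebook as described above) looks like this:

import sagemaker
from sagemaker.model import Model

model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/fairmot-inference:latest",  # custom inference image
    model_data=s3_model_uri,  # model artifact produced by the training job
    role=sagemaker.get_execution_role(),
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    endpoint_name="fairmot-endpoint",  # hypothetical name
)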

The following diagram illustrates the inference flow.

After our custom container is deployed on a SageMaker inference endpoint, we’re ready to test. First, we download a test video from MOT16-03. Next, in our inference loop, we use OpenCV to split the video into individual frames, convert them to base64, and make predictions by calling the deployed inference endpoint.

The following code demonstrates this logic, using the SageMaker runtime client to call the endpoint:

import base64
import json
import os

import boto3

# Client for invoking the endpoint (assumed to be a Boto3 SageMaker runtime client)
client = boto3.client("sagemaker-runtime")

# The following runs once per frame inside the inference loop;
# frame_id, frame_w, frame_h, and endpoint_name are defined earlier in the notebook.
frame_path = ...  # the path of a frame
with open(frame_path, "rb") as image_file:
    img_data = base64.b64encode(image_file.read())
    data = {"frame_id": frame_id}
    data["frame_data"] = img_data.decode("utf-8")
    if frame_id == 0:
        # The first request also carries the video dimensions and batch size
        data["frame_w"] = frame_w
        data["frame_h"] = frame_h
        data["batch_size"] = 1
    body = json.dumps(data).encode("utf-8")

os.remove(frame_path)
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=body,
)

body = response["Body"].read()
The resulting video is stored in {root_directory}/datasets/test.mp4. The following is a sample frame; the same person is enclosed in a bounding box that carries the same unique ID across consecutive frames.

Perform batch inference

Now that we’ve implemented and validated the FairMOT model using a frame-by-frame inference endpoint, we build a container that can process an entire video as a whole. This allows us to use FairMOT as a step in more complex video processing pipelines. We use a SageMaker processing job to achieve this goal, as demonstrated in the fairmot-batch-inference.ipynb notebook.

Once again, we begin with SageMaker initialization and building a custom container image. This time we encapsulate the frame-by-frame inference loop into the container itself (the predict.py script). Our test data is MOT16-03, pre-staged in the S3 bucket. As in the previous steps, make sure that the s3_model_uri variable refers to the correct model URI.

SageMaker processing jobs rely on Amazon S3 for input and output data placement. The following diagram demonstrates our workflow.

In the Run batch inference section, we create an instance of ScriptProcessor and define the path for input and output data, as well as the target model. We then run the processor, and the resulting video is placed into the location defined in the s3_output variable. It looks the same as the resulting video generated in the previous section.
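A condensed sketch of that section is shown below (the image URI, the s3_test_video variable, and the container paths are illustrative assumptions; predict.py, s3_model_uri, and s3_output are the items described above):

import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/fairmot-batch:latest",  # custom batch image
    command=["python3"],
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

processor.run(
    code="predict.py",
    inputs=[
        ProcessingInput(source=s3_test_video, destination="/opt/ml/processing/input"),
        ProcessingInput(source=s3_model_uri, destination="/opt/ml/processing/model"),
    ],
    outputs=[
        ProcessingOutput(source="/opt/ml/processing/output", destination=s3_output),
    ],
)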

Clean up

To avoid unnecessary costs, delete the resources you created as part of this solution, including the inference endpoint.

Conclusion

This post demonstrated how to use SageMaker to train and deploy an object tracking model based on FairMOT. You can use a similar approach to implement other custom algorithms. Although we used public datasets in this example, you can certainly accomplish the same with your own dataset. Amazon SageMaker Ground Truth can help you with the labeling, and SageMaker custom containers simplify implementation.


About the Author

Gordon Wang is a Data Scientist on the Professional Services team at Amazon Web Services. He supports customers in many industries, including media, manufacturing, energy, and healthcare. He is passionate about computer vision, deep learning, and MLOps. In his spare time, he loves running and hiking.

Read More

What Is a Digital Twin?

Step inside an auto assembly plant. See workers ratcheting down nuts to bolts. Hear the whirring of air tools. Watch pristine car bodies gliding along the line and robots rolling up with parts.

Now, fire up its digital twin in 3D online. See animated digital humans at work in an exact, but digital, version of the plant. Drag and drop in robots to move heavy materials, and run simulations for optimizations, taking in real-time factory floor data for improvements. That’s a digital twin.

A digital twin is a virtual representation — a true-to-reality simulation of physics and materials — of a real-world physical asset or system, which is continuously updated.

Digital twins aren’t just for inanimate objects and people. They can be a virtual representation of computer networking architecture used as a sandbox for cyberattack simulations. They can replicate a fulfillment center process to test out human-robot interactions before activating certain robot functions in live environments. The applications are as wide as the imagination.

Digital twins are shaking up operations of businesses. The worldwide market for digital twin platforms is forecast to reach $86 billion by 2028, according to Grand View Research. Its report cites COVID-19 as a catalyst for the adoption of digital twins in specific industries.

What’s Driving Digital Twins? 

The Internet of Things is revving up digital twins.

IoT is helping to enable connected machines and devices to share data with their digital twins and vice versa. That’s because digital twins are always-on, continuously updated, computer-simulated versions of the real-world, IoT-connected physical things or processes they represent.

Digital twins are virtual representations that can capture the physics of structures and changing conditions internally and externally, as measured by myriad connected sensors driven by edge computing. They can also run simulations within the virtualizations to test for problems and seek improvements through service updates.

Robotics development and autonomous vehicles are just two of a growing number of areas where digital twins are used to mimic physical equipment and environments.

“Autonomous vehicles at a very simple level are robots that operate in the open world, striving to avoid contact with anything,” said Rev Lebaredian, vice president of Omniverse and Simulation Technology at NVIDIA. “Eventually we’ll have sophisticated autonomous robots working alongside humans in settings like kitchens — manipulating knives and other dangerous tools. We need digital twins of the worlds they are going to be operating in, so we can teach them safely in the virtual world before transferring their intelligence into the real world.”

Digital Twins in 3D Virtual Environments  

Shared virtual 3D worlds are bringing people together to collaborate on digital twins.

The interactive 3D virtual universe is evident in gaming. Online social games such as Fortnite and the user-generated virtual world of Roblox offer a glimpse of the potential of interactions.

Video conferencing calls in VR, with participants existing as avatars of themselves in a shared virtual conference room, are a step toward realizing the possibilities for the enterprise.

Today, the tools exist to develop these shared virtual worlds on a shared virtual collaboration platform.

Omniverse Replicator for Digital Twin Simulations

At GTC, NVIDIA unveiled Omniverse Replicator to help develop digital twins. It’s a synthetic-data-generation engine that produces physically simulated data for training deep neural networks.

Along with this, the company introduced two implementations of the engine for applications that generate synthetic data: NVIDIA DRIVE Sim, a virtual world for hosting the digital twin of autonomous vehicles, and NVIDIA Isaac Sim, a virtual world for the digital twin of manipulation robots.

Autonomous vehicles and robots developed using this data can master skills across an array of virtual environments before applying them in the real world.

Based on Pixar’s Universal Scene Description and NVIDIA RTX technology, NVIDIA Omniverse is the world’s first scalable, multi-GPU physically accurate world simulation platform.

Omniverse offers users the ability to connect to multiple software ecosystems — including Epic Games Unreal Engine, Reallusion, OnShape, Blender and Adobe — that can assist millions of users.

The reference development platform is modular and can be extended easily. Teams across NVIDIA have enlisted the platform to build core simulation apps such as the previously mentioned NVIDIA Isaac Sim for robotics and synthetic data generation, and NVIDIA DRIVE Sim.

DRIVE Sim recreates real-world driving scenarios in a virtual environment, enabling testing and development of rare and dangerous use cases. In addition, because the simulator has a perfect understanding of the ground truth in any scene, the data from the simulator can be used for training the deep neural networks used in autonomous vehicle perception.

As shown in BMW Group’s factory of the future, Omniverse’s modularity and openness allow it to work with several other NVIDIA platforms, such as the NVIDIA Isaac platform for robotics, NVIDIA Metropolis for intelligent video analytics, and the NVIDIA Aerial software development kit, which brings GPU-accelerated, software-defined 5G wireless radio access networks to these environments. It also works with third-party software, so users and companies can continue to use their own tools.

How Are Digital Twins Coming Online?

When building a digital twin and deploying its features, corralling AI resources is necessary.

NVIDIA Base Command Platform enables enterprises to deploy large-scale AI infrastructure. It optimizes resources for users and teams, and it can monitor the workflow from early development to production.

Base Command was developed to support NVIDIA’s in-house research team with AI resources. It helps manage the available GPU resources and select the databases, workspaces, and container images available.

It manages the lifecycle of AI development, including workload management and resource sharing, and provides both a graphical user interface and a command line interface, as well as integrated monitoring and reporting dashboards. It delivers the latest NVIDIA updates directly into your AI workflows.

Think of it as the compute engine of AI.

How Are Digital Twins Managed?

NVIDIA Fleet Command provides remote AI management.

Implementing AI from digital twins in the real world requires a deployment platform to handle updates to the thousands, or even millions, of machines and devices at the edge.

NVIDIA Fleet Command is a cloud-based service accessible from the NVIDIA NGC hub of GPU-accelerated software to securely deploy, manage and scale AI applications across edge-connected systems and devices.

Fleet Command enables fulfillment centers, manufacturing facilities, retailers and many others to remotely implement AI updates.

How Are Digital Twins Advancing?

Digital twins enable the autonomy of things. They can be used to control a physical counterpart autonomously.

An electric vehicle maker, for example, might use a digital twin of a sedan to run simulations on software updates. And when the simulations show improvements to the car’s performance or solve a problem, those software updates can be dispatched over the air to the physical vehicle.

Siemens Energy is creating digital twins to support predictive maintenance of power plants. A digital twin of this scale promises to reduce downtime and help save utility providers an estimated $1.7 billion a year, according to the company.

Passive Logic, a startup based in Salt Lake City, offers an AI platform to engineer and autonomously operate the IoT components of buildings. Its AI engine understands how building components work together, down to the physics, and can run simulations of building systems.

The platform can take in multiple data points and make control decisions to optimize operations autonomously. It compares this optimal control path to actual sensor data, applies machine learning and learns improvements about operating the building over time.

Trains are on a fast track to autonomy as well, and digital twins are being developed to help get there. They’re being used in simulations for features such as automated braking and collision detection systems, enabled by AI run on NVIDIA GPUs.

What Is the History of Digital Twins?

By many accounts, NASA was the first to introduce the notion of the digital twin. While clearly not connected in the Internet of Things way, NASA’s early twin concept and its usage share many similarities with today’s digital twins.

NASA began with the digital twin idea as early as the 1960s. The space agency illustrated its enormous potential in the Apollo 13 moon mission. NASA had set up simulators of systems on the Apollo 13 spacecraft, which could get updates from the real ship in outer space via telecommunications. This allowed NASA engineers to run situation simulations between astronauts and engineers ahead of departure, and it came in handy when things went awry on that mission in 1970.

Engineers on the ground were able to troubleshoot with the astronauts in space, referring to the models on Earth and saving the mission from disaster.

What Types of Digital Twins Are There?

Smart Cities Sims

Smart cities are popping up everywhere. Using video cameras, edge computing and AI, cities are able to understand everything from parking to traffic flow to crime patterns. Urban planners can study the data to help draw up and improve city designs.

Digital twins of smart cities can enable better planning of construction as well as constant improvements in municipalities. Smart cities are building 3D replicas of themselves to run simulations. These digital twins help optimize traffic flow, parking, street lighting and many other aspects to make life better in cities, and these improvements can be implemented in the real world.

Dassault Systèmes has helped build digital twins around the world. In Hong Kong, the company presented examples for a walkability study, using a 3D simulation of the city for visualization.

NVIDIA Metropolis is an application framework, a set of developer tools and a large ecosystem of specialist partners that help developers and service providers better instrument physical space and build smarter infrastructure and spaces through AI-enabled vision. The platform spans AI training to inference, facilitating edge-to-cloud deployment, and it includes enterprise management tools like Fleet Command to better manage fleets of edge nodes.

Earth Simulation Twins 

Digital twins are even being applied to climate modeling.

NVIDIA CEO Jensen Huang disclosed plans to build the world’s most powerful AI supercomputer dedicated to predicting climate change.

Named Earth-2, or E-2, the system would create a digital twin of Earth in Omniverse.

Separately, the European Union has launched Destination Earth, an effort to build a digital simulation of the planet. The plan is to help scientists accurately map climate development as well as extreme weather.

Supporting an EU mandate for achieving climate neutrality by 2050, the digital twin effort would be rendered at one-kilometer scale and based on continuously updated observational data from climate, atmospheric and meteorological sensors. It also plans to take into account measurements of the environmental impacts of human activities.

It is predicted that the Destination Earth digital twin project would require a system with 20,000 GPUs to operate at full scale, according to a paper published in Nature Computational Science. Simulation insights can enable scientists to develop and test scenarios. This can help inform policy decisions and sustainable development planning.

Such work can help assess drought risk, monitor rising sea levels and track changes in the polar regions. It can also be used for planning on food and water issues, and renewable energy such as wind farms and solar plants. The goal is for the main digital modeling platform to be operating by 2023, with the digital twin live by 2027.

Data Center Networking Simulation

Networking is an area where digital twins are reducing downtime for data centers.

Over time, networks have become more complicated. The scale of networks, the number of nodes and the interoperability between components add to their complexity, affecting preproduction and staging operations.

Network digital twins speed up initial deployments by pretesting routing, security, automation and monitoring in simulation. They also enhance ongoing operations, including validating network change requests in simulation, which reduces maintenance times.

Networking operations have also evolved to more advanced capabilities with the use of APIs and automation. And streaming telemetry — think IoT-connected sensors for devices and machines — allows for constant collection of data and analytics on the network for visibility into problems and issues.

The NVIDIA Air infrastructure simulation platform enables network engineers to host digital twins of data center networks.

Ericsson, a maker of telecommunications equipment, is combining decades of radio network simulation expertise with NVIDIA Omniverse Enterprise.

The Stockholm-based company is building city-scale digital twins in NVIDIA Omniverse to help accurately simulate the interplay between 5G cells and the environment to maximize performance and coverage.

 

 

Automotive Manufacturing Twins

BMW Group, which has 31 factories around the world, is collaborating with NVIDIA on digital twins. The German automaker is relying on NVIDIA Omniverse Enterprise to run factory simulations to optimize its operations.

 

Its factories provide more than 100 options for each car, and more than 40 BMW models, offering 2,100 possible configurations of a new vehicle. Some 99 percent of the vehicles produced in BMW factories are custom configurations, which creates challenges for keeping materials stocked on the assembly line.

To help maintain the flow of materials for its factories, BMW Group is also harnessing the NVIDIA Isaac robotics platform to deploy a fleet of robots for logistics to improve the distribution of materials in its production environment. These human-assisting robots, which are put into simulation scenarios with digital humans in pre-production, enable the company to safely test out robot applications on the factory floor of the digital twin before launching into production.

Virtual simulations also enable the company to optimize the assembly line as well as worker ergonomics and safety. Planning experts from different regions can connect virtually with NVIDIA Omniverse, which lets global 3D design teams work together simultaneously across multiple software suites in a shared virtual space.

NVIDIA Omniverse Enterprise is enabling digital twins for many different industrial applications.

Architecture, Engineering and Construction

Building design teams face a growing demand for efficient collaboration, faster iteration on renderings, and expectations for accurate simulation and photorealism.

These demands can become even more challenging when teams are dispersed worldwide.

Creating digital twins in Omniverse for architects, engineers and construction teams to assess designs together can quicken the pace of development, helping contracts run on time.

Teams on Omniverse can be brought together virtually in a single, interactive platform — even when simultaneously working in different software applications — to rapidly develop architectural models as if they are in the same room and simulate with full physical accuracy and fidelity.

Retail and Fulfillment

Logistics for order fulfillment is a massive industry of moving parts. Fulfillment centers now are aided by robots to help workers avoid injury and boost their efficiency. It’s an environment filled with cameras driven by AI and edge computing to help rapidly pick, pull and pack products. It’s how one-day deliveries arrive at our doors.

The use of digital twins means that much of this can be created in a virtual environment, and simulations can be run to eliminate bottlenecks and other problems.

Kinetic Vision is reinventing intelligent fulfillment and distribution centers with digital twins through digitization and AI. Successfully implementing a network of intelligent stores and fulfillment centers needs robust information, data, and operational technologies to enable innovative edge computing and AI solutions like real-time product recognition. This drives faster, more agile product inspections and order fulfillments.

Energy Industry Twins 

Siemens Energy is relying on the NVIDIA Omniverse platform to create digital twins to support predictive maintenance of power plants.

Using NVIDIA Modulus software frameworks, running on NVIDIA A100 Tensor Core GPUs, Siemens Energy can simulate the corrosive effects of heat, water and other conditions on metal over time to fine-tune maintenance needs.

Hydrocarbon Exploration 

Oil companies face huge risks in seeking to tap new reservoirs or reassess production-stage fields with the least financial and environmental downside. Drilling can cost hundreds of millions of dollars. After locating hydrocarbons, these energy companies need to quickly figure out the most profitable strategies for new or ongoing production.

Digital twins for reservoir simulations can save many millions of dollars and help avoid environmental problems. Using technical software applications, these companies can model how water and hydrocarbons flow underground among wells. This allows them to evaluate potentially problematic situations and virtual production strategies on supercomputers.

Having assessed the risks beforehand, in the digital twins, these exploration companies can minimize losses when committing to new projects. Real-world versions in production can also be optimized for better output based on analytics from their digital doppelganger.

Airport Efficiencies

Digital twins can enable airports to improve customer experiences. For instance, video cameras could monitor Transportation Security Administration (TSA) checkpoints and apply AI to analyze bottlenecks at peak hours. Those bottlenecks could be addressed in digital models, and the fixes then moved into production to reduce missed flights. Baggage handling video can be assessed in the digital environment to find ways to ensure luggage arrives on time.

Airplane turnarounds can benefit, too. Many vendors service arriving planes to get them turned around and back on the runway for departures. Video can help airlines track these vendors to ensure timely turnarounds. Digital twins can also analyze how these services are coordinated, optimizing workflows before anything changes in the real world.

Airlines can then hold their vendors accountable for carrying out services quickly. Caterers, cleaners, refueling, trash and waste removal, and other service providers all have what’s known as service-level agreements with airlines to help keep the planes running on time. All of these activities can be run in simulations in the digital world and then applied to scheduling in production for real-world results that help reduce departure delays.

NVIDIA Metropolis helps to process massive amounts of video from the edge so that airports and other industries can analyze operations in real time and derive insights from analytics.

What’s the Future for Digital Twins?

Digital twin simulations have been simmering for half a century. But the past decade’s advances in GPUs, AI and software platforms are heating up their adoption amid this higher-fidelity era of more immersive experiences.    

Increasing penetration of virtual reality and augmented reality will accelerate this work.

Worldwide sales of VR headsets are expected to increase from roughly 7 million in 2021 to more than 28 million in 2025, according to analyst firm IDC.

That’s a lot more headset-connected, content-consuming eyeballs for virtual environments.

And all those in it will be able to access the NVIDIA Omniverse platform for AI, human and robot interactions, and infinite simulations, driving a wild ride of advances from digital twins.

“There has been talk of virtual worlds and digital twins for years. We’re right at the beginning of this transition into reality, much as AI became viable and created an explosion of possibilities,” said NVIDIA’s Lebaredian.

Buckle up for the adventure.

 

 


Read More

Customizing GPT-3 for Your Application

Developers can now fine-tune GPT-3 on their own data, creating a custom version tailored to their application. Customizing makes GPT-3 reliable for a wider variety of use cases and makes running the model cheaper and faster.

You can use an existing dataset of virtually any shape and size, or incrementally add data based on user feedback. With fine-tuning, one API customer was able to increase correct outputs from 83% to 95%. By adding new data from their product each week, another reduced error rates by 50%.

To get started, just run a single command in the OpenAI command line tool with a file you provide. Your custom version will start training and then be available immediately in our API.

Read documentation


Last year we trained GPT-3 and made it available in our API. With only a few examples, GPT-3 can perform a wide variety of natural language tasks, a concept called few-shot learning or prompt design. Customizing GPT-3 can yield even better results because you can provide many more examples than what’s possible with prompt design.

You can customize GPT-3 for your application with one command and use it immediately in our API:

openai api fine_tunes.create -t <train_file>
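For reference, the training file uses the JSON Lines format, with one prompt-completion pair per line; a minimal sketch of preparing such a file in Python (the example content is invented for illustration):

import json

examples = [
    {"prompt": "¿Dónde está la configuración de red?\n\nEnglish:", "completion": " Where are the network settings?"},
    {"prompt": "No puedo iniciar sesión en mi cuenta.\n\nEnglish:", "completion": " I can't log in to my account."},
]

# Each line of the resulting file is one JSON object with "prompt" and "completion" keys
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")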

It takes less than 100 examples to start seeing the benefits of fine-tuning GPT-3 and performance continues to improve as you add more data. In research published last June, we showed how fine-tuning with less than 100 examples can improve GPT-3’s performance on certain tasks. We’ve also found that each doubling of the number of examples tends to improve quality linearly.

With one of our most challenging research datasets, Grade School Math problems, fine-tuning GPT-3 improves accuracy by 2 to 4x over what’s possible with prompt design.

Two sizes of GPT-3 models, Curie and Davinci, were fine-tuned on 8,000 examples from one of our most challenging research datasets, Grade School Math problems. We compare the models’ ability to solve problems when 10 completions are created.

Customizing GPT-3 improves the reliability of output, offering more consistent results that you can count on for production use-cases. One customer found that customizing GPT-3 reduced the frequency of unreliable outputs from 17% to 5%. Since custom versions of GPT-3 are tailored to your application, the prompt can be much shorter, reducing costs and improving latency.

Whether the task is text generation, summarization, classification, or any other natural language task GPT-3 is capable of performing, customizing GPT-3 will improve performance.

Apps Powered by Customized Versions of GPT-3

Keeper Tax helps independent contractors and freelancers with their taxes. After a customer links their financial accounts, Keeper Tax uses various models to extract text and classify transactions. Using the classified data, Keeper Tax identifies easy-to-miss tax write-offs and helps customers file their taxes directly from the app. By customizing GPT-3, Keeper Tax is able to continuously improve results. Once a week, Keeper Tax adds around 500 new training examples to fine-tune their model, which is leading to about a 1% accuracy improvement each week, increasing accuracy from 85% to 93%.

Viable helps companies get insights from their customer feedback. By customizing GPT-3, Viable is able to transform massive amounts of unstructured data into readable natural language reports, highlighting top customer complaints, compliments, requests, and questions. Customizing GPT-3 has increased the reliability of Viable’s reports. By using a customized version of GPT-3, accuracy in summarizing customer feedback has improved from 66% to 90%. The result is tangible, intuitive information that customers need to inform their product decisions.

Sana Labs is a global leader in the development and application of AI to learning. The Sana learning platform powers personalized learning experiences for businesses by leveraging the latest ML breakthroughs to tailor the content for each individual. By customizing GPT-3 with their data, Sana’s question and content generation went from grammatically correct but general responses to highly accurate outputs. This yielded a 60% improvement, enabling fundamentally more personalized and effective experiences for their learners.

Elicit is an AI research assistant that helps people directly answer research questions using findings from academic papers. The tool finds the most relevant abstracts from a large corpus of research papers, then applies a customized version of GPT-3 to generate the claim (if any) that the paper makes about the question. A custom version of GPT-3 outperformed prompt design across three important measures: results were easier to understand (a 24% improvement), more accurate (a 17% improvement), and better overall (a 33% improvement).

All API customers can customize GPT-3 today. Sign up and get started with the fine-tuning documentation.

How to customize GPT-3 for your application


Set up

  • Install the openai Python-based client from your terminal: pip install --upgrade openai
  • Set your API key as an environment variable: export OPENAI_API_KEY=<api_key>

Train a custom model

  • Fine-tune the Ada model on a demo dataset for translating help messages from Spanish to English.
    openai api fine_tunes.create -m ada --n_epochs 2 \
      -t https://cdn.openai.com/API/train-demo.jsonl


    (Ctrl-C will interrupt the stream, but not cancel the fine-tune)
    [2021-12-08 12:11:30] Created fine-tune: ft-gK9R3N3lDQYQJD0SXqlF8Fnc
    [2021-12-08 12:11:40] Fine-tune costs $0.01
    [2021-12-08 12:11:40] Fine-tune enqueued. Queue number: 0
    [2021-12-08 12:11:45] Fine-tune started
    [2021-12-08 12:12:58] Completed epoch 1/2
    [2021-12-08 12:13:56] Completed epoch 2/2
    [2021-12-08 12:14:26] Uploaded model: ada:ft-org-2021-12-08-20-14-25
    [2021-12-08 12:14:29] Uploaded result file: file-QvY81nzrOhXMenjMS5OlPeBW
    [2021-12-08 12:14:30] Fine-tune succeeded
    Job complete! Status: succeeded 🎉
    Try out your fine-tuned model:
    openai api completions.create -m ada:ft-org-2021-12-08-20-14-25 -p <YOUR_PROMPT>

Use the custom model

  • Ask your customized model for a translation.
    openai api completions.create -m <model_ID> \
      --max-tokens 30 --temperature 0 --stop "###" \
      -p $'Conecte la PS3 y vaya a Configuración>Configuraciones de Red, seleccione la red y escriba sus credenciales.\nEnglish translation:'


    Conecte la PS3 y vaya a Configuración>Configuraciones de Red, seleccione la red y escriba sus credenciales.
    English translation: Connect the PS3 and go to Settings> Accounts Settings, select the network and write your credentials.%


OpenAI

Startup Surge: Utility Feels Power of Computer Vision to Track its Lines 

It was the kind of message Connor McCluskey loves to find in his inbox.

As a member of the product innovation team at FirstEnergy Corp. — an electric utility serving 6 million customers from central Ohio to the New Jersey coast — his job is to find technologies that open new revenue streams or cut costs.

In the email, Chris Ricciuti, the founder of Noteworthy AI, explained his ideas for using edge computing to radically improve how utilities track their assets. For FirstEnergy, those assets include tens of millions of devices mounted on millions of poles across more than 269,000 miles of distribution lines.

Bucket Trucks Become Smart Cameras

Ricciuti said his startup aimed to turn every truck in a utility’s fleet into a smart camera that takes pictures of every pole it passes. What’s more, Noteworthy AI’s software would provide the location of the pole, identify the gear on it and help analyze its condition.

“I saw right away that this could be a game changer, so I called him,” said McCluskey.

In the U.S. alone, utilities own 185 million poles. They spend tens, if not hundreds, of millions of dollars a year trying to track the transformers, fuses and other devices on them, as well as the vegetation growing around them.

Utilities typically send out workers each year to manually inspect a fraction of their distribution lines. It’s an inventory that can take a decade, yet the condition of each device is critical to delivering power safely.

5x More Images in 30 Days

In a pilot test last summer, Noteworthy AI showed how edge computing gets better results.

In 30 days, two FirstEnergy trucks, outfitted with the startup’s smart cameras, collected more than 5,000 high-res images of its poles. That expanded the utility’s database more than fivefold.

“People were astounded at what we could do in such a short time frame,” said McCluskey.

What’s more, the pictures were of higher quality than those in the utility’s database. That would help eliminate wasted trips when actual line conditions vary from what engineers expect to find.

The startup’s camera system can be mounted on a utility truck in less than an hour.

Use Cases Multiply

News of the pilot program spread to other business units.

A team that inspects FirstEnergy’s 880,000 streetlights and another responsible for tracking vegetation growth around its lines wanted to try the technology. Both saw the value of having more and better data.

So, an expanded pilot is in the works with more trucks over a larger area.

It’s too early to estimate the numbers, but McCluskey “felt right away we could find some significant cost savings with this technology — in a couple years I can imagine its use expanded to all our states,” he said.

An Inside Look at Edge Computing

In a unit the size of a small cake box that attaches to a truck with magnets or suction cups, Noteworthy AI packs two cameras and communications gear. It links to a smaller unit inside the cab that processes the images with AI on an NVIDIA Jetson Xavier NX.

“We developed a pretty sophisticated workflow that runs at the edge on Jetson,” Ricciuti said.

It uses seven AI models. One model looks for poles in images taken at 30 frames per second. When it finds one, it triggers a higher-resolution camera to take bursts of 60-megabyte pictures.

Other models identify gear on the poles and determine which images to send to a database in the cloud.
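As a rough illustration of that workflow (not Noteworthy AI’s actual code; every name below is hypothetical, and the real system runs several optimized models on the Jetson), the trigger logic amounts to something like this:

from typing import Callable, Iterable, Sequence

def edge_capture_loop(
    read_preview_frame: Callable[[], object],            # low-res stream at roughly 30 frames/second
    detect_pole_confidences: Callable[[object], Sequence[float]],
    capture_high_res_burst: Callable[[], Iterable[object]],
    enqueue_for_upload: Callable[[Iterable[object]], None],
    threshold: float = 0.8,
) -> None:
    """Watch the preview stream and trigger high-resolution capture when a pole is detected."""
    while True:
        frame = read_preview_frame()
        if any(conf > threshold for conf in detect_pole_confidences(frame)):
            # Downstream models identify gear on the pole and pick which images go to the cloud
            enqueue_for_upload(capture_high_res_burst())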

Designing a fast, resilient camera was even more challenging than implementing AI, said Ricciuti.

“We’re doing all this AI compute at the edge on Jetson, so we don’t have to send all the images to the cloud — it’s a huge cost savings,” Ricciuti said.

“With customer use cases growing, we’ll graduate to products like Jetson AGX Orin in the future — NVIDIA has been awesome in computing at the edge,” he added.

Software, Support Speeds Startup

The startup uses NVIDIA TensorRT, software that optimizes its AI models so they run fast. It also employs the NVIDIA JetPack SDK, with drivers and libraries for computer vision and deep learning, as well as ROS, the Robot Operating System, now accelerated on Jetson.

In addition, Ricciuti ticks off three benefits from being part of NVIDIA Inception, a program designed to nurture cutting-edge startups.

“When we have engineering questions, we get introduced to technical people who unblock us; we meet potential customers when we’re ready to go to market; and we get computer credits for GPUs in the cloud to train our models,” he said.

AI Spells Digital Transformation

The GPUs, software and support help Ricciuti do the work he loves: finding ways AI can transform legacy practices at large, regulated companies.

“We’re just seeing the tip of the iceberg of what we can do as people are being forced to innovate in the face of problems like climate change, and we’re getting a lot of interest from utilities with large distribution networks,” he said.

Learn more about how NVIDIA is accelerating innovation in the energy industry.


Read More

Distributed Mask RCNN training with Amazon SageMakerCV

Computer vision algorithms are at the core of many deep learning applications. Self-driving cars, security systems, healthcare, logistics, and image processing all incorporate various aspects of computer vision. But despite their ubiquity, training computer vision algorithms, like Mask or Cascade RCNN, is hard. These models employ complex architectures, train on large datasets, and require computer clusters, often requiring dozens of GPUs.

Last year at AWS re:Invent we announced record-breaking Mask RCNN training times of 6:45 minutes on PyTorch and 6:12 minutes on TensorFlow, which we achieved through a series of algorithmic, system, and infrastructure improvements. Our model made heavy use of half precision computation, state-of-the-art optimizers and loss functions, the AWS Elastic Fabric Adapter, and a new parameter server distribution approach.

Now, we’re making these optimizations available in Amazon SageMaker in our new SageMakerCV package. SageMakerCV takes all the high performance tools we developed last year and combines them with the convenience features of SageMaker, such as interactive development in SageMaker Studio, Spot training, and streaming data directly from Amazon Simple Storage Service (Amazon S3).

The challenge of training object detection and instance segmentation

Object detection models, like Mask RCNN, have complex architectures. They typically involve a pretrained backbone, such as a ResNet model, a region proposal network, classifiers, and regression heads. Essentially, these models work like a collection of neural networks working on slightly different, but related, tasks. On top of that, developers often need to modify these models for their own use case. For example, along with the classifier, we might want a model that can identify human poses, as part of an autonomous vehicle project, in order to predict movement and behavior. This involves adding an additional network to the model, alongside the classifier and regression heads.

Mask RCNN architecture

The following diagram illustrates the Mask RCNN architecture.

For more information, see our earlier blog posts on Mask RCNN.

Modifying models like this is a time-consuming process. The updated model might train slower, or not converge as well as the previous model. SageMakerCV solves these issues by simplifying both the model modification and optimization process. The modification process is streamlined by modularizing the models, and using the interactive development environment in Studio. At the same time, we can apply all the optimizations we developed for our record training time to the new model.

GPU and algorithmic improvements

Several pieces of Mask RCNN are difficult to optimize for GPUs. For example, as part of the region proposal step, we want to reduce the number of regions using non-max suppression (NMS), the process of removing overlapping boxes. Many implementations of Mask RCNN run NMS on the CPU, which means moving a lot of data off the GPU in the middle of training. Other parts of the model, such as anchor generation and assignment, and ROI align, encounter similar problems.

As part of our Mask RCNN optimizations in 2020, we worked with NVIDIA to develop efficient CUDA implementations of NMS, ROI align, and anchor tools, all of which are built into SageMakerCV. This means data stays on the GPU and models train faster. Options for mixed and half precision training means larger batch sizes, shorter step times, and higher GPU utilization.
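To make the NMS step concrete, the following is a minimal pure-Python sketch of the idea; the SageMakerCV version is a fused CUDA implementation that keeps everything on the GPU:

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression: keep the highest-scoring boxes, drop overlapping ones."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_threshold]
    return keep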

SageMakerCV also includes the same improved optimizers and loss functions we used in our record Mask RCNN training. NovoGrad means you can now train a model on batch sizes as large as 512. GIoU loss boosts both box and mask performance by around 5%. Combined, these improvements make it possible to train Mask RCNN to state-of-the-art levels of performance in under 7 minutes.
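For reference, GIoU augments plain IoU with a penalty based on the smallest box enclosing both boxes; a single-pair sketch (not the optimized SageMakerCV implementation) looks like this:

def giou_loss(a, b):
    """GIoU loss for two (x1, y1, x2, y2) boxes: 1 - GIoU, where GIoU = IoU - |C minus (A U B)| / |C|."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box C
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c_area = cw * ch
    giou = iou - (c_area - union) / c_area if c_area > 0 else iou
    return 1.0 - giou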

The following table summarizes the benchmark training times for Mask RCNN trained to MLPerf convergence levels using SageMakerCV on p4d.24xlarge SageMaker instances. Total time refers to the entire elapsed time, including SageMaker instance setup, Docker and data download, training, and evaluation.

Framework Nodes Total Time Training Time Box mAP Seg mAP
PyTorch 1 1:33:04 1:25:59 37.8 34.1
PyTorch 2 0:57:05 0:50:21 38.0 34.4
PyTorch 4 0:36:27 0:29:40 37.9 34.3
TensorFlow 1 2:23:52 2:18:24 37.7 34.3
TensorFlow 2 1:09:02 1:03:29 37.8 34.5
TensorFlow 4 0:48:55 0:42:33 38.0 34.8

Interactive development

Our goal with SageMakerCV was not only to provide fast training models to our users, but also to make developing new models easier. To that end, we provide a series of template object detection models in a highly modularized format, with a simple registry structure for adding new pieces. We also provide tools to modify and test models directly in Studio, so you can quickly go from prototyping a model to launching a distributed training cluster.

For example, say you want to add a custom keypoint head to Mask RCNN in TensorFlow. You first build your new head using the TensorFlow 2 Keras API, and add the SageMakerCV registry decorator at the top. The registry is a set of dictionaries organized into sections of the model. For example, the HEADS section triggers when the build_detector function is called, and the KeypointHead value from the configuration file tells the build to include the new ROI head. See the following code:

import tensorflow as tf
from sagemakercv.builder import HEADS

@HEADS.register("KeypointHead")
class KeypointHead(tf.keras.Model):
    def __init__(self, cfg):
        ...

Then you can call your new head by adding it to a YAML configuration file:

MODEL:
    RCNN:
        ROI_HEAD: "KeypointHead"

You provide this new configuration when building a model:

from configs.default_config import _C as cfg
from sagemakercv.detection import build_detector

cfg.merge_from_file('keypoint_config.yaml')

model = build_detector(cfg)

We know that building a new model is never as straightforward as we’re describing here, so we provide example notebooks of how to prototype models in Studio. This allows developers to quickly iterate on and debug their ideas.

Distributed training

SageMakerCV uses the distributed training capabilities of SageMaker right out of the box. You can go from prototyping a model on a single GPU to launching training on dozens of GPUs with just a few lines of code. SageMakerCV automatically supports SageMaker Distributed Data Parallel, which uses EFA to provide unmatched multi-node scaling efficiency. We also support PyTorch DDP and Horovod for TensorFlow. By default, SageMakerCV selects the optimal distributed training strategy for the cluster configuration you choose: set your instance type and number of nodes, and SageMakerCV takes care of the rest.
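If you prefer to pin the strategy yourself rather than rely on the automatic selection, the SageMaker SDK's distribution argument is where it goes. A minimal sketch of the two common configurations (values are placeholders to adapt to your cluster):

# Explicit strategy selection via the SageMaker SDK's distribution argument
# (SageMakerCV chooses a strategy automatically if you don't set one).

# SageMaker Distributed Data Parallel, which uses EFA on supported instance types:
smddp_distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}

# Horovod over MPI, the usual choice with the TensorFlow estimator:
horovod_distribution = {"mpi": {"enabled": True, "processes_per_host": 8}}

# Pass one of these as distribution=... when constructing the PyTorch or
# TensorFlow estimator (see the launch example later in this post).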

Distributed training also typically involves huge amounts of data, often on the order of many terabytes. Getting all that data onto the training instances takes time, assuming it even fits. To address this, SageMakerCV provides built-in support for streaming data directly from Amazon S3 with our recently released S3 plugin, reducing startup times and training costs.

Get started

We provide detailed tutorial notebooks that walk you through the entire process, from getting the COCO dataset, to building a model in Studio, to launching a distributed cluster. What follows is a brief overview.

Follow the instructions in Onboard to Amazon SageMaker Studio Using Quick Start. On your Studio instance, open a system terminal and clone the SageMakerCV repo.

git clone https://github.com/aws-samples/amazon-sagemaker-cv

Create a new Studio notebook with the PyTorch DLC, and install SageMakerCV in editable mode:

cd amazon-sagemaker-cv/pytorch
pip install -e .

In your notebook, create a new training configuration:

from configs import cfg

cfg.SOLVER.OPTIMIZER = "NovoGrad"    # large-batch optimizer used in the record runs
cfg.SOLVER.BASE_LR = 0.042           # base learning rate
cfg.SOLVER.LR_SCHEDULE = "COSINE"    # cosine learning rate decay
cfg.SOLVER.IMS_PER_BATCH = 384       # training batch size
cfg.SOLVER.WEIGHT_DECAY = 0.001
cfg.SOLVER.MAX_ITER = 5000           # total training iterations
cfg.OPT_LEVEL = "O1"                 # mixed precision training

Set your data sources by using either channels or an S3 location to stream data during training:

import os
from contextlib import redirect_stdout

S3_DATA_LOCATION = 's3://my-bucket/coco/'
CHANNELS_DIR = '/opt/ml/input/data/'  # on node, set by SageMaker

channels = {'validation': os.path.join(S3_DATA_LOCATION, 'val2017'),
            'weights': S3_WEIGHTS_LOCATION,
            'annotations': os.path.join(S3_DATA_LOCATION, 'annotations')}

# Channel data is downloaded to the training instances by SageMaker
cfg.INPUT.VAL_INPUT_DIR = os.path.join(CHANNELS_DIR, 'validation')
cfg.INPUT.TRAIN_ANNO_DIR = os.path.join(CHANNELS_DIR, 'annotations', 'instances_train2017.json')
cfg.INPUT.VAL_ANNO_DIR = os.path.join(CHANNELS_DIR, 'annotations', 'instances_val2017.json')
cfg.MODEL.WEIGHT = os.path.join(CHANNELS_DIR, 'weights', R50_WEIGHTS)
# Training images stream directly from Amazon S3 during training
cfg.INPUT.TRAIN_INPUT_DIR = os.path.join(S3_DATA_LOCATION, "train2017")
cfg.OUTPUT_DIR = '/opt/ml/checkpoints'  # SageMaker checkpoint output dir

# Save the new configuration file
dist_config_file = "configs/dist-training-config.yaml"
with open(dist_config_file, 'w') as outfile:
    with redirect_stdout(outfile): print(cfg.dump())
    
hyperparameters = {"config": dist_config_file}

Finally, we can launch a distributed training job. For example, we can request four ml.p4d.24xlarge instances and train a model to state-of-the-art convergence in about 45 minutes:

from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
                entry_point='train.py', 
                source_dir='.', 
                py_version='py3',
                framework_version='1.8.1',
                role=get_execution_role(),
                instance_count=4,
                instance_type='ml.p4d.24xlarge',
                distribution={ "smdistributed": { "dataparallel": { "enabled": True } } } ,
                output_path='s3://my-bucket/output/',
                checkpoint_s3_uri='s3://my-bucket/checkpoints/',
                model_dir='s3://my-bucket/model/',
                hyperparameters=hyperparameters,
                volume_size=500,
)

estimator.fit(channels)

Clean up

After training your model, be sure to check that all your training jobs are complete or stopped by opening the SageMaker console and choosing Training Jobs in the navigation pane.

Also, make sure to stop all Studio instances by choosing the Studio session monitor (the square inside a circle icon) on the left side of the page in Studio. Choose the power icon next to any running instances to shut them down. Your files are saved on your Studio EBS volume.

Conclusion

SageMakerCV started life as our project to break training records for computer vision models. In the process, we developed new tools and techniques to boost both training speed and accuracy. Now, we’ve combined those advances with SageMaker’s unified machine learning development experience. By combining the latest algorithmic advances, GPU hardware, EFA, and the ability to stream huge datasets from Amazon S3, SageMakerCV is the ideal place to develop the most advanced computer vision models. We look forward to seeing what new models and applications the machine learning community develops, and welcome any and all contributions. To get started, see our comprehensive tutorial notebooks in PyTorch and TensorFlow on GitHub.


About the Authors

Ben Snyder is an applied scientist with AWS Deep Learning. His research interests include computer vision models, reinforcement learning, and distributed optimization. Outside of work, he enjoys cycling and backcountry camping.

Khaled ElGalaind is the engineering manager for AWS Deep Engine Benchmarking, focusing on performance improvements for AWS Machine Learning customers. Khaled is passionate about democratizing deep learning. Outside of work, he enjoys volunteering with the Boy Scouts, BBQ, and hiking in Yosemite.

Sami Kama is a software engineer in AWS Deep Learning with expertise in performance optimization, HPC/HTC, deep learning frameworks, and distributed computing. Sami aims to reduce the environmental impact of deep learning by increasing computational efficiency. He enjoys spending time with his kids, catching up on science and technology, and the occasional video game.

Read More

Machine learning inference at the edge using Amazon Lookout for Vision and AWS IoT Greengrass

Discrete and continuous manufacturing lines generate a high volume of products at low latency, ranging from milliseconds to a few seconds. To identify defects at the same throughput of production, camera streams of images must be processed at low latency. Additionally, factories may have low network bandwidth or intermittent cloud connectivity. In such scenarios, you may need to run the defect detection system on your on-premises compute infrastructure, and upload the processed results for further development and monitoring purposes to the AWS Cloud. This hybrid approach with both local edge hardware and the cloud can address the low latency requirements and help reduce storage and network transfer costs to the cloud. This may also fulfill your data privacy and other regulatory requirements.

In this post, we show you how to detect defective parts using Amazon Lookout for Vision machine learning (ML) models running on your on-premises edge appliance.

Lookout for Vision is an ML service that helps spot product defects using computer vision to automate the quality inspection process in your manufacturing lines, with no ML expertise required. The fully managed service enables you to build, train, optimize, and deploy models in the AWS Cloud or at the edge. You can use the cloud APIs, or deploy Amazon Lookout for Vision models on any NVIDIA Jetson edge appliance or x86 compute platform running Linux with an NVIDIA GPU accelerator. You can use AWS IoT Greengrass to deploy and manage your edge-compatible customized models on your fleet of devices.
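For reference, invoking a cloud-hosted Lookout for Vision model is a single boto3 call. The following is a minimal sketch; the project name, model version, and image file are placeholders:

import boto3

lookoutvision = boto3.client("lookoutvision")

# Placeholders: substitute your own project, hosted model version, and image
with open("circuit-board.jpg", "rb") as image_file:
    response = lookoutvision.detect_anomalies(
        ProjectName="circuit-board-project",
        ModelVersion="1",
        Body=image_file.read(),
        ContentType="image/jpeg",
    )

result = response["DetectAnomalyResult"]
print(result["IsAnomalous"], result["Confidence"])

The rest of this post focuses on the edge deployment path, where the same kind of inference runs locally over gRPC.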

Solution overview

In this post, we use a printed circuit board dataset composed of normal and defective images such as scratches, solder blobs, and damaged components on the board. We train a Lookout for Vision model in the cloud to identify defective and normal printed circuit boards. We compile the model to a target ARM architecture, package the trained Lookout for Vision model as an AWS IoT Greengrass component, and deploy the model to an NVIDIA Jetson edge device using the AWS IoT Greengrass console. Finally, we demonstrate a Python-based sample application running on the NVIDIA Jetson edge device that sources the printed circuit board image from the edge device file system, runs the inference on the Lookout for Vision model using the gRPC interface, and sends the inference data to an MQTT topic in the AWS Cloud.

The following diagram illustrates the solution architecture.

The solution has the following workflow:

  1. Upload a training dataset to Amazon Simple Storage Service (Amazon S3).
  2. Train a Lookout for Vision model in the cloud.
  3. Compile the model to the target architecture (ARM) and deploy the model to the NVIDIA Jetson edge device using the AWS IoT Greengrass console.
  4. Source images from local disk.
  5. Run inferences on the deployed model via the gRPC interface.
  6. Post the inference results to an MQTT client running on the edge device.
  7. Receive the MQTT message on a topic in AWS IoT Core in the AWS Cloud for further monitoring and visualization.

Steps 4, 5, and 6 are coordinated by the sample Python application; a simplified sketch of that flow follows.
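The sketch below assumes Python stubs (edge_agent_pb2, edge_agent_pb2_grpc) generated from the published edge-agent.proto and uses illustrative message and field names; check them against the proto you generate and against the sample code in the repo:

# Simplified sketch of steps 4-6; not the repo's exact code. Message and field
# names are assumptions based on the published edge-agent.proto.
import json
import grpc
from PIL import Image

import edge_agent_pb2 as pb2
import edge_agent_pb2_grpc as pb2_grpc

# Step 4: source an image from the local disk
image = Image.open("images/circuit-board-01.jpg").convert("RGB")

# Step 5: run inference over the Edge Agent's Unix socket
# (the socket path appears in the component configuration later in this post)
channel = grpc.insecure_channel("unix:///tmp/aws.iot.lookoutvision.EdgeAgent.sock")
stub = pb2_grpc.EdgeAgentStub(channel)
response = stub.DetectAnomalies(
    pb2.DetectAnomaliesRequest(
        model_component="ComponentCircuitBoard",
        bitmap=pb2.Bitmap(
            width=image.width,
            height=image.height,
            byte_data=image.tobytes(),   # packed RGB bytes
        ),
    )
)

# Step 6: hand the result to the MQTT client, which publishes it to a topic
# in AWS IoT Core (the sample uses the AWS IoT Device SDK for this part)
payload = json.dumps({
    "is_anomalous": response.detect_anomaly_result.is_anomalous,
    "confidence": response.detect_anomaly_result.confidence,
})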

Prerequisites

Before you get started, complete the following prerequisites:

  1. Create an AWS account.
  2. On your NVIDIA Jetson edge device, complete the following:
    1. Set up your edge device (we have set IoT THING_NAME = l4vJetsonXavierNx when installing AWS IoT Greengrass V2).
    2. Clone the sample project containing the Python-based sample application (warmup-model.py to load the model, and sample-client-file-mqtt.py to run inferences), and install the required Python modules. See the following code:
git clone https://github.com/aws-samples/ds-peoplecounter-l4v-workshop.git
cd ds-peoplecounter-l4v-workshop 
pip3 install -r requirements.txt
cd lab2/inference_client  
# Replace the ENDPOINT variable in sample-client-file-mqtt.py with the
# value shown on the AWS console under AWS IoT -> Things -> l4vJetsonXavierNx -> Interact,
# under HTTPS. It is of the form <prefix>-ats.iot.<region>.amazonaws.com

Dataset and model training

We use the printed circuit board dataset to demonstrate the solution. The dataset contains normal and anomalous images. Here are a few sample images from the dataset.

The following image shows a normal printed circuit board.

The following image shows a printed circuit board with scratches.

The following image shows a printed circuit board with a soldering defect.

To train a Lookout for Vision model, we follow the steps outlined in Amazon Lookout for Vision – New ML Service Simplifies Defect Detection for Manufacturing. After you complete these steps, you can navigate to the project and the Models page to check the performance of the trained model. You can start the process of exporting the model to the target edge device any time after the model is trained.

Compile and package the model as an AWS IoT Greengrass component

In this section, we walk through the steps to compile the printed circuit board model to our target edge device and package the model as an AWS IoT Greengrass component.

  1. On the Lookout for Vision console, choose your project.
  2. In the navigation pane, choose Edge model packages.
  3. Choose Create model packaging job.

  4. For Job name, enter a name.
  5. For Job description, enter an optional description.
  6. Choose Browse models.

  7. Select the model version (the printed circuit board model built in the previous section).
  8. Choose Choose.

  9. Select Target device and enter the compiler options.

Our target device is on JetPack 4.5.1. See the Amazon Lookout for Vision documentation for additional details on supported platforms. You can find the supported compiler options, such as trt-ver and cuda-ver, in the NVIDIA JetPack 4.5.1 archive.
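For example, for a Jetson Xavier NX running JetPack 4.5.1, the compiler options might look like the following; treat the exact values as an illustration and verify them against your JetPack image and the Lookout for Vision documentation:

{"gpu-code": "sm_72", "trt-ver": "7.1.3", "cuda-ver": "10.2"}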

  10. Enter the details for Component name, Component description (optional), Component version, and Component location.

Amazon Lookout for Vision stores the component recipes and artifacts in this Amazon S3 location.

  11. Choose Create model packaging job.

You can see your job name and status showing as In progress. The model packaging job may take a few minutes to complete.

When the model packaging job is complete, the status shows as Success.

  12. Choose your job name (in our case it’s ComponentCircuitBoard) to see the job details.

The Greengrass component and model artifacts have been created in your AWS account.

  13. Choose Continue deployment to Greengrass to deploy the component to the target edge device.

Deploy the model

In this section, we walk through the steps to deploy the printed circuit board model to the edge device using the AWS IoT Greengrass console.

  1. Choose Deploy to initiate the deployment steps.

  2. Select Core device (because the deployment is to a single device) and enter the target name.

The target name is the same name you used to name the core device during the AWS IoT Greengrass V2 installation process.

  3. Choose your component. In our case, the component name is ComponentCircuitBoard, which contains the circuit board model.
  4. Choose Next.

  5. Configure the component (optional).
  6. Choose Next.

  7. Expand Deployment policies.

  8. For Component update policy, select Notify components.

This allows an already deployed component (a prior version of the component) to defer the update until it is ready.

  9. For Failure handling policy, select Don’t roll back.

In case of a failure, this option allows us to investigate the errors in deployment.

  10. Choose Next.

  11. Review the list of components that will be deployed on the target (edge) device.
  12. Choose Next.

You should see the message Deployment successfully created.

  13. To validate that the model deployment was successful, run the following command on your edge device:
sudo /greengrass/v2/bin/greengrass-cli component list

You should see output similar to the following after the ComponentCircuitBoard lifecycle startup script runs:

 Components currently running in Greengrass:
 
 Component Name: aws.iot.lookoutvision.EdgeAgent
    Version: 0.1.34
    State: RUNNING
    Configuration: {"Socket":"unix:///tmp/aws.iot.lookoutvision.EdgeAgent.sock"}
 Component Name: ComponentCircuitBoard
    Version: 1.0.0
    State: RUNNING
    Configuration: {"Autostart":false}

Run inferences on the model

We’re now ready to run inferences on the model. On your edge device, run the following command to load the model:

# Load the model into the running state
python3 warmup-model.py

To generate inferences, run the following command, passing the path to the source images:

python3 sample-client-file-mqtt.py /path/to/images

The following screenshot shows that the model correctly predicts the image as anomalous (bent pin) with a confidence score of 0.999766.

The following screenshot shows that the model correctly predicts the image as anomalous (solder blob) with a confidence score of 0.7701461.

The following screenshot shows that the model correctly predicts the image as normal with a confidence score of 0.9568462.

The following screenshot shows the inference data posted to an MQTT topic in AWS IoT Core.

Customer Stories

With AWS IoT Greengrass and Amazon Lookout for Vision, you can now automate visual inspection with CV for processes like quality control and defect assessment – all on the edge and in real time. You can proactively identify problems such as parts damage (like dents, scratches, or poor welding), missing product components, or defects with repeating patterns, on the production line itself – saving you time and money. Customers like Tyson and Baxter are discovering the power of Amazon Lookout for Vision to increase quality and reduce operational costs by automating visual inspection.

“Operational excellence is a key priority at Tyson Foods. Predictive maintenance is an essential asset for achieving this objective by continuously improving overall equipment effectiveness (OEE). In 2021, Tyson Foods launched a machine learning based computer vision project to identify failing product carriers during production to prevent them from impacting Team Member safety, operations, or product quality.

The models trained using Amazon Lookout for Vision performed well. The pin detection model achieved 95% accuracy across both classes. The Amazon Lookout for Vision model was tuned to perform at 99.1% accuracy for failing pin detection. By far the most exciting result of this project was the speedup in development time. Although this project utilizes two models and a more complex application code, it took 12% less developer time to complete. This project for monitoring the condition of the product carriers at Tyson Foods was completed in record time using AWS managed services such as Amazon Lookout for Vision.”

Audrey Timmerman, Sr Applications Developer, Tyson Foods.

“We use Amazon Lookout for Vision to automate inspection tasks and solve complex process management problems that can’t be addressed by manual inspection or traditional machine vision alone. Lookout for Vision’s cloud and edge capabilities provide us the ability to leverage computer vision and AI/ML-based solutions at scale in a rapid and agile manner, helping us to drive efficiencies on the manufacturing shop floor and enhance our operator’s productivity and experience.”

K. Karan, Global Senior Director – Digital Transformation, Integrated Supply Chain, Baxter International Inc.

Conclusion

In this post, we described a typical scenario for industrial defect detection at the edge. We walked through the key components of the cloud and edge lifecycle with an end-to-end example using Lookout for Vision and AWS IoT Greengrass. With Lookout for Vision, we trained an anomaly detection model in the cloud using the printed circuit board dataset, compiled the model to a target architecture, and packaged the model as an AWS IoT Greengrass component. With AWS IoT Greengrass, we deployed the model to an edge device. We demonstrated a Python-based sample application that sources printed circuit board images from the edge device local file system, runs the inferences on the Lookout for Vision model at the edge using the gRPC interface, and sends the inference data to an MQTT topic in the AWS Cloud.

In a future post, we will show how to run inferences on a real-time stream of images using a GStreamer media pipeline.

Start your journey towards industrial anomaly detection and identification by visiting the Amazon Lookout for Vision and AWS IoT Greengrass resource pages.


About the Authors

Amit Gupta is an AI Services Solutions Architect at AWS. He is passionate about enabling customers with well-architected machine learning solutions at scale.

 Ryan Vanderwerf is a partner solutions architect at Amazon Web Services. He previously provided Java virtual machine-focused consulting and project development as a software engineer at OCI on the Grails and Micronaut team. He was chief architect/director of products at ReachForce, with a focus on software and system architecture for AWS Cloud SaaS solutions for marketing data management. Ryan has built several SaaS solutions in several domains such as financial, media, telecom, and e-learning companies since 1996.

Prathyusha Cheruku is an AI/ML Computer Vision Product Manager at AWS. She focuses on building powerful, easy-to-use, no code/ low code deep learning-based image and video analysis services for AWS customers.

Read More

Sensing What’s Ahead in 2022: Latest Breakthroughs Pave Way for Year of Autonomous Vehicle Innovation

2021 trends are charging into 2022, heralding a new era of autonomous transportation and opening up business models and services never before dreamed of.

In the next year, software-defined compute architectures, electric powertrains, high-fidelity simulation, AI assistants and autonomous trucking solutions are set to transform the transportation industry.

This past year, key technologies saw significant progress, including centralized high-performance compute, data center solutions, simulation and more. These breakthroughs will give rise to even greater innovation next year, ushering in new technology, improving current offerings and accelerating the deployment of safer, more efficient vehicles.

Following Mercedes-Benz’s announcement in 2020 of its upcoming software-defined fleets built on NVIDIA DRIVE Orin, 2021 saw nearly a dozen companies transition their vehicles to the high-performance, centralized compute platform, including NIO, SAIC, Xpeng and more.

Simulation, a crucial component of the autonomous vehicle development pipeline, further narrowed the gap between the virtual and real worlds, using technologies such as NVIDIA Omniverse and synthetic data generation.

And, as the pandemic continued, increased demand for delivery, as well as the worsening driver shortage, invigorated efforts to deploy autonomous trucking solutions.

An Always-New Set of Wheels

AI is transforming the personal vehicle experience. Vehicle development cycles have traditionally lasted around two years, and the end product is fixed with whatever technology it has when it rolls off the manufacturing line.

A centralized, software-defined vehicle architecture built on high-performance compute, such as NVIDIA DRIVE Orin, is richly programmable, streamlining development, and can continually improve over time.

In the next year, more automakers will move away from their traditional manufacturing practices and architect vehicles with high-performance compute headroom and full-stack software from the start. As a result, next-gen models can benefit from new apps and features, via over-the-air software updates, so the vehicle gets better and safer over time.

These vehicles will also continue to transition to electric powertrains for intelligent transportation that’s also more sustainable. Automakers have already pledged to increase the share of electric vehicles in their fleets, while newcomers begin to roll out cutting-edge production EVs.

Reality Goes Virtual

Autonomous vehicles are born in the data center, and simulation is a key component of this training and validation process.

In the past, simulation platforms have used gaming engines to generate virtual worlds. However, these engines have serious limitations in accurately recreating the physics and vehicle dynamics of a car driving in the real world.

NVIDIA DRIVE Sim is built on our core technologies, including NVIDIA RTX, Omniverse and AI, to create a true digital twin environment of the world. It uses NVIDIA Omniverse Replicator to generate physically based sensor data for camera, radar, lidar and ultrasonics, along with labeled ground truth data to reduce valuable development time and cost.

The combination of these technologies has significantly narrowed the gap between the virtual and physical worlds, delivering a comprehensive AV training, testing and validation platform. Equipped with DRIVE Sim, AV manufacturers can hit the accelerator on deployment plans in 2022.

Truly Personal Transportation

In addition to high-fidelity AV simulation, NVIDIA Omniverse is paving the way for a seamless intelligent assistant experience.

With NVIDIA DRIVE Concierge, vehicle occupants have access to AI services that are always on, using NVIDIA DRIVE IX and NVIDIA Omniverse Avatar for real-time interactions.

Omniverse Avatar connects speech AI, computer vision, natural language understanding, recommendation engines and simulation. Avatars created on the platform are interactive characters with ray-traced 3D graphics that can see, speak and converse on a wide range of subjects, and understand naturally spoken intent.

The technology of Omniverse Avatar enables DRIVE Concierge to serve as everyone’s digital assistant, providing recommendations and alerts, booking reservations and making phone calls. It’s personalized to each driver and passenger, giving every vehicle occupant their own personal concierge. And with Omniverse Avatar technology, these assistants will have incredible intelligence.

Keep on Trucking

As demand for ecommerce goods and freight continues to grow, the industry is increasingly investing in autonomous trucking solutions.

E-commerce orders increased nearly 60 percent year-over-year in 2020, according to last-mile technology vendor Convey Inc., with more than a third of shoppers opting for same-day delivery. At the same time, the trucking industry is experiencing a 90-percent-plus turnover rate, with the American Trucking Association estimating it will be short 160,000 drivers by 2028.

AI-enabled, highly automated and fully autonomous trucks as well as last-mile delivery vehicles, such as those under development by Volvo Autonomous Solutions, Kodiak Robotics, Embark, TuSimple, Plus, Einride and more, are an essential element of our transportation future.

These next-generation vehicles are built on the high-performance, energy-efficient compute of NVIDIA DRIVE to enhance the safety and quality of life for truck drivers and increase productivity and efficiency.

As the industry continues to adopt these transformative technologies, the next year will see rapid growth toward a truly autonomous future.

The post Sensing What’s Ahead in 2022: Latest Breakthroughs Pave Way for Year of Autonomous Vehicle Innovation appeared first on The Official NVIDIA Blog.

Read More