Customize business rules for intelligent document processing with human review and BI visualization

Vast numbers of business documents are processed daily across industries. Many of these documents are paper-based and scanned into your systems as images, or arrive in an unstructured format such as PDF. Each company may apply unique rules, tied to its business context, when processing these documents. Extracting information accurately and processing it flexibly is a challenge many companies face.

Amazon Intelligent Document Processing (IDP) allows you to take advantage of industry-leading machine learning (ML) technology without previous ML experience. This post introduces a solution included in the Amazon IDP workshop showcasing how to process documents to serve flexible business rules using Amazon AI services. You can use the following step-by-step Jupyter notebook to complete the lab.

Amazon Textract helps you easily extract text from various documents, and Amazon Augmented AI (Amazon A2I) allows you to implement a human review of ML predictions. The default Amazon A2I template allows you to build a human review pipeline based on rules, such as when the extraction confidence score is lower than a pre-defined threshold or required keys are missing. But in a production environment, you need the document processing pipeline to support flexible business rules, such as validating the string format, verifying the data type and range, and validating fields across documents. This post shows how you can use Amazon Textract and Amazon A2I to customize a generic document processing pipeline supporting flexible business rules.

Solution overview

For our sample solution, we use the Tax Form 990, a US IRS (Internal Revenue Service) form that provides the public with financial information about a non-profit organization. For this example, we only cover the extraction logic for some of the fields on the first page of the form. You can find more sample documents on the IRS website.

The following diagram illustrates the IDP pipeline that supports customized business rules with human review.

The architecture is composed of three logical stages:

  • Extraction – Extract data from the 990 Tax Form (we use page 1 as an example).

    • Retrieve a sample image stored in an Amazon Simple Storage Service (Amazon S3) bucket.
    • Call the Amazon Textract analyze_document API using the Queries feature to extract text from the page.
  • Validation – Apply flexible business rules with a human-in-the-loop review.

    • Validate the extracted data against business rules, such as validating the length of an ID field.
    • Send the document to Amazon A2I for a human to review if any business rules fail.
    • Reviewers use the Amazon A2I UI (a customizable website) to verify the extraction result.
  • BI visualization – We use Amazon QuickSight to build a business intelligence (BI) dashboard showing the process insights.

Customize business rules

You can define a generic business rule in the following JSON format. In the sample code, we define three rules:

  • The first rule is for the employer ID field. The rule fails if the Amazon Textract confidence score is lower than 99%. For this post, we set the confidence score threshold high, which will fail by design. In a real-world environment, you could lower the threshold to a more reasonable value, such as 90%, to reduce unnecessary human effort.
  • The second rule is for the DLN field (the unique identifier of the tax form), which is required for the downstream processing logic. This rule fails if the DLN field is missing or has an empty value.
  • The third rule is also for the DLN field, but with a different condition category: LengthCheck. The rule fails if the DLN value is not 16 characters long.

The following code shows our business rules in JSON format:

rules = [
    {
        "description": "Employee Id confidence score should greater than 99",
        "field_name": "d.employer_id",
        "field_name_regex": None, # support Regex: "_confidence$",
        "condition_category": "Confidence",
        "condition_type": "ConfidenceThreshold",
        "condition_setting": "99",
    },
    {
        "description": "dln is required",
        "field_name": "dln",
        "condition_category": "Required",
        "condition_type": "Required",
        "condition_setting": None,
    },
    {
        "description": "dln length should be 16",
        "field_name": "dln",
        "condition_category": "LengthCheck",
        "condition_type": "ValueRegex",
        "condition_setting": "^[0-9a-zA-Z]{16}$",
    }
]

You can expand the solution by adding more business rules following the same structure.
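For example, a hypothetical fourth rule (not part of the sample code; the OMB number format and the FormatCheck category name are assumptions for illustration only) could validate the omb_no field using the same ValueRegex condition type:

rules.append(
    {
        "description": "omb_no should match the pattern NNNN-NNNN",
        "field_name": "omb_no",
        "condition_category": "FormatCheck",  # illustrative category name
        "condition_type": "ValueRegex",
        "condition_setting": "^[0-9]{4}-[0-9]{4}$",
    }
)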

Extract text using an Amazon Textract query

In the sample solution, we call the Amazon Textract analyze_document API with the Queries feature to extract fields by asking specific questions. You don’t need to know the structure of the data in the document (table, form, implied field, nested data) or worry about variations across document versions and formats. Queries use a combination of visual, spatial, and language cues to extract the information you seek with high accuracy.

To extract the value of the DLN field, you can send a request with a question in natural language, such as “What is the DLN?” Amazon Textract returns the text, confidence, and other metadata if it finds corresponding information on the image or document. The following is an example of an Amazon Textract query request:

import boto3

textract = boto3.client('textract')

# Ask Amazon Textract a natural language question about the document stored in Amazon S3
response = textract.analyze_document(
    Document={'S3Object': {'Bucket': data_bucket, 'Name': s3_key}},
    FeatureTypes=["QUERIES"],
    QueriesConfig={
        'Queries': [
            {
                'Text': 'What is the DLN?',
                'Alias': 'The DLN number - unique identifier of the form'
            }
        ]
    }
)

Define the data model

The sample solution constructs the data in a structured format to serve the generic business rule evaluation. To keep extracted values, you can define a data model for each document page. The following image shows how the text on page 1 maps to the JSON fields.

Each field represents a piece of text, a check box, or a table/form cell on the document page. The JSON object looks like the following code:

{
    "dln": {
        "value": "93493319020929",
        "confidence": 0.9765, 
        "block": {} 
    },
    "omb_no": {
        "value": "1545-0047",
        "confidence": 0.9435,
        "block": {}
    },
    ...
}

You can find the detailed JSON structure definition in the GitHub repo.
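As a rough sketch of how this data model could be populated, the following code walks the QUERY and QUERY_RESULT blocks returned by the earlier analyze_document call and maps each answer to a field. The build_data_model helper and the alias-to-field mapping are illustrative assumptions, not the repo’s actual code:

ALIAS_TO_FIELD = {"The DLN number - unique identifier of the form": "dln"}  # illustrative mapping

def build_data_model(response, alias_to_field=ALIAS_TO_FIELD):
    """Map Textract Query answers into the page data model shown above (sketch only)."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    data = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        field_name = alias_to_field.get(block.get("Query", {}).get("Alias"))
        if not field_name:
            continue
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "ANSWER":
                continue
            for answer_id in relationship["Ids"]:
                answer = blocks_by_id[answer_id]  # QUERY_RESULT block holding the extracted text
                data[field_name] = {
                    "value": answer.get("Text"),
                    "confidence": answer.get("Confidence", 0) / 100,  # Textract reports 0-100
                    "block": answer,
                }
    return data

data = build_data_model(response)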

Evaluate the data against business rules

The sample solution comes with a Condition class—a generic rules engine that takes the extracted data (as defined in the data model) and the rules (as defined in the customized business rules). It returns two lists with failed and satisfied conditions. We can use the result to decide if we should send the document to Amazon A2I for human review.

The Condition class source code is in the sample GitHub repo. It supports basic validation logic, such as validating a string’s length, value range, and confidence score threshold. You can modify the code to support more condition types and complex validation logic.
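The following is a minimal sketch of how such a rules engine might evaluate the data model against the rules defined earlier. It is simplified for illustration and is not the actual Condition class from the repo:

import re

def check_rules(data, rules):
    """Evaluate the data model against the business rules; return (failed, satisfied) lists."""
    failed, satisfied = [], []
    for rule in rules:
        field = data.get(rule["field_name"], {})
        value = field.get("value")
        confidence = field.get("confidence", 0)
        condition_type = rule["condition_type"]
        setting = rule["condition_setting"]
        if condition_type == "ConfidenceThreshold":
            passed = confidence * 100 >= float(setting)
        elif condition_type == "Required":
            passed = value not in (None, "")
        elif condition_type == "ValueRegex":
            passed = value is not None and re.match(setting, str(value)) is not None
        else:
            passed = True  # unknown condition types are skipped in this sketch
        (satisfied if passed else failed).append(rule)
    return failed, satisfied

failed_conditions, satisfied_conditions = check_rules(data, rules)

If failed_conditions is not empty, the document is routed to Amazon A2I for human review.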

Create a customized Amazon A2I web UI

Amazon A2I allows you to customize the reviewer’s web UI by defining a worker task template. The template is a static webpage in HTML and JavaScript. You can pass data to the customized reviewer page using the Liquid syntax.

In the sample solution, the custom Amazon A2I UI template displays the page on the left and the failure conditions on the right. Reviewers can use it to correct the extraction value and add their comments.
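When any rule fails, the document is routed to this UI by starting an Amazon A2I human loop. The following sketch shows one way to pass the image location and the failed conditions as input content; the flow definition ARN is a placeholder, and the exact input keys the workshop template reads through Liquid may differ:

import json
import uuid
import boto3

a2i_runtime = boto3.client("sagemaker-a2i-runtime")

# Placeholder flow definition ARN - replace with the one created for your workflow
flow_definition_arn = "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/your-workflow"

input_content = {
    "image_s3_uri": f"s3://{data_bucket}/{s3_key}",  # the page under review
    "failed_conditions": failed_conditions,           # rules that did not pass
    "extracted_data": {k: v["value"] for k, v in data.items()},
}

a2i_runtime.start_human_loop(
    HumanLoopName=f"custom-loop-{uuid.uuid4()}",
    FlowDefinitionArn=flow_definition_arn,
    HumanLoopInput={"InputContent": json.dumps(input_content)},
)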

The following screenshot shows our customized Amazon A2I UI. It shows the original image document on the left and the following failed conditions on the right:

  • The DLN number should be 16 characters long. The actual DLN has only 15 characters.
  • The confidence score of employer_id is lower than 99%. The actual confidence score is around 98%.

The reviewers can manually verify these results and add comments in the CHANGE REASON text boxes.

For more information about integrating Amazon A2I into any custom ML workflow, refer to over 60 pre-built worker templates on the GitHub repo and Use Amazon Augmented AI with Custom Task Types.

Process the Amazon A2I output

After the reviewer verifies the result in the customized Amazon A2I UI and chooses Submit, Amazon A2I stores a JSON file in the S3 bucket folder. The JSON file includes the following information on the root level:

  • The Amazon A2I flow definition ARN and human loop name
  • Human answers (the reviewer’s input collected by the customized Amazon A2I UI)
  • Input content (the original data sent to Amazon A2I when starting the human loop task)

The following is a sample JSON generated by Amazon A2I:

{
  "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:711334203977:flow-definition/a2i-custom-ui-demo-workflow",
  "humanAnswers": [
    {
      "acceptanceTime": "2022-08-23T15:23:53.488Z",
      "answerContent": {
        "Change Reason 1": "Missing X at the end.",
        "True Value 1": "93493319020929X",
        "True Value 2": "04-3018996"
      },
      "submissionTime": "2022-08-23T15:24:47.991Z",
      "timeSpentInSeconds": 54.503,
      "workerId": "94de99f1bc6324b8",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_URd6f6sie",
          "sub": "cef8d484-c640-44ea-8369-570cdc132d2d"
        }
      }
    }
  ],
  "humanLoopName": "custom-loop-9b4e67ff-2c9f-40f9-aae5-0e26316c905c",
  "inputContent": {...} # the original input send to A2I when starting the human review task
}

You can implement extract, transform, and load (ETL) logic to parse information from the Amazon A2I output JSON and store it in a file or database. The sample solution comes with a CSV file with processed data. You can use it to build a BI dashboard by following the instructions in the next section.
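As a hedged sketch (the S3 bucket, key, and CSV columns here are illustrative, not necessarily what the workshop produces), such ETL logic could parse the fields shown in the sample JSON above:

import csv
import json
import boto3

s3 = boto3.client("s3")

def a2i_output_to_row(bucket, key):
    """Flatten one Amazon A2I output JSON file into a single record (sketch only)."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    output = json.loads(body)
    answer = output["humanAnswers"][0]
    return {
        "human_loop_name": output["humanLoopName"],
        "worker_id": answer["workerId"],
        "time_spent_seconds": answer["timeSpentInSeconds"],
        "answer_content": json.dumps(answer["answerContent"]),  # reviewer corrections and comments
    }

# Illustrative bucket/key; append the record to a CSV file for the BI dashboard
row = a2i_output_to_row("your-output-bucket", "a2i-results/output.json")
with open("a2i_results.csv", "a", newline="") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=list(row.keys()))
    if csv_file.tell() == 0:
        writer.writeheader()
    writer.writerow(row)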

Create a dashboard in Amazon QuickSight

The sample solution includes a reporting stage with a visualization dashboard served by Amazon QuickSight. The BI dashboard shows key metrics such as the number of documents processed automatically or manually, the fields that most often required human review, and other insights. This dashboard can help you get an overview of the document processing pipeline and analyze the common reasons that trigger human review, so you can optimize the workflow to further reduce human input.

The sample dashboard includes basic metrics. You can expand the solution using Amazon QuickSight to show more insights into the data.

Expand the solution to support more documents and business rules

To expand the solution to support more document pages with corresponding business rules, you need to make the following changes:

  • Create a data model for the new page in JSON structure representing all the values you want to extract out of the pages. Refer to the Define the data model section for a detailed format.
  • Use Amazon Textract to extract text out of the document and populate values to the data model.
  • Add business rules corresponding to the page in JSON format. Refer to the Customize business rules section for the detailed format.

The custom Amazon A2I UI in the solution is generic and doesn’t require changes to support new business rules.

Conclusion

Intelligent document processing is in high demand, and companies need a customized pipeline to support their unique business logic. Amazon A2I offers a built-in template integrated with Amazon Textract to implement common human review use cases, and it also allows you to customize the reviewer page to serve flexible requirements.

This post guided you through a reference solution using Amazon Textract and Amazon A2I to build an IDP pipeline that supports flexible business rules. You can try it out using the Jupyter notebook in the GitHub IDP workshop repo.


About the authors

Lana Zhang is a Sr. Solutions Architect at the AWS WWSO AI Services team with expertise in AI and ML for intelligent document processing and content moderation. She is passionate about promoting AWS AI services and helping customers transform their business solutions.


Sonali Sahu leads the Intelligent Document Processing AI/ML Solutions Architect team at Amazon Web Services. She is a passionate technophile and enjoys working with customers to solve complex problems using innovation. Her core areas of focus are artificial intelligence and machine learning for intelligent document processing.


How AI is helping African communities and businesses

Editor’s note: Last week Google hosted the annual Google For Africa event as part of our commitment to make the internet more useful in Africa, and to support the communities and businesses that will power Africa’s economic growth. This commitment includes our investment in research. Since announcing the Google AI Research Center in Accra, Ghana in 2018, we have made great strides in our mission to use AI for societal impact. In May we made several exciting announcements aimed at expanding these commitments.

Yossi Matias, VP of Engineering and Research, who oversees research in Africa, spoke with Jeff Dean, SVP of Google Research, who championed the opening of the AI Research Center, about the potential of AI in Africa.

Jeff: It’s remarkable how far we’ve come since we opened the center in Accra. I was excited then about the talented pool of researchers in Africa. I believed that by bringing together leading researchers and engineers, and collaborating with universities and the wider research community, we could push the boundaries of AI to solve critical challenges on the continent. It’s great to see progress on many fronts, from healthcare and education to agriculture and the climate crisis.

As part of Google For Africa last week, I spoke with Googlers across the continent about recent research and met several who studied at African universities we partner with. Yossi, from your perspective, how does our Research Center in Accra support the wider research ecosystem and benefit from it?

Yossi: I believe that nurturing local talent and working together with the community are critical to our mission. We’ve signed research agreements with five universities in Africa to conduct joint research, and I was fortunate to participate in the inauguration of the African Master of Machine Intelligence (AMMI) program, of which Google is a founding partner. Many AMMI graduates have continued their studies or taken positions in industry, including at our Accra Research Center where we offer an AI residency program. We’ve had three cohorts of AI residents to date.

Our researchers in Africa, and the partners and organizations we collaborate with, understand the local challenges best and can build and implement solutions that are helpful for their communities.

Jeff: For me, the Open Buildings initiative to map Africa’s built environment is a great example of that kind of collaborative solution. Can you share more about this?

Yossi: Absolutely. The Accra team used satellite imagery and machine learning to detect more than half a billion distinct structures and made the dataset available for public use. UN organizations, governments, non-profits, and startups have used the data for various applications, such as understanding energy needs for urban planning and managing the humanitarian response after a crisis. I’m very proud that we are now scaling this technology to countries outside of Africa as well.

Jeff: That’s a great achievement. It’s important to remember that the solutions we build in Africa can be scalable and useful globally. Africa has the world’s youngest population, so it’s essential that we continue to nurture the next generation of tech talent.

We must also keep working to make information accessible for this growing, diverse population. I’m proud of our efforts to use machine translation breakthroughs to bring more African languages online. Several languages were added to Google Translate this year, including Bambara, Luganda, Oromo and Sepedi, which are spoken by a combined 85 million people. My mom spoke fluent Lugbara from our time living in Uganda when I was five—Lugbara didn’t make the set of languages added in this round, but we’re working on it!

Yossi: That’s just the start. Conversational technologies also have exciting educational applications that could help students and businesses. We recently collaborated with job seekers to build the Interview Warmup Tool, featured at the Google For Africa event, which uses machine learning and large language models to help job seekers prepare for interviews.

Jeff: Yossi, what’s something that your team is focused on now that you believe will have a profound impact on African society going forward?

Yossi: Climate and sustainability is a big focus and technology has a significant role to play. For example, our AI prediction models can accurately forecast floods, one of the deadliest natural disasters. We’re collaborating with several countries and organizations across the continent to scale this technology so that we can alert people in harm’s way.

We’re also working with local partners and startups on sustainability projects including reducing carbon emissions at traffic lights and improving food security by detecting locust outbreaks, which threaten the food supply and livelihoods of millions of people. I look forward to seeing many initiatives scale as more communities and countries get on board.

Jeff: I’m always inspired by the sense of opportunity in Africa. I’d like to thank our teams and partners for their innovation and collaboration. Of course, there’s much more to do, and together we can continue to make a difference.


What Is Green Computing?

Everyone wants green computing.

Mobile users demand maximum performance and battery life. Businesses and governments increasingly require systems that are powerful yet environmentally friendly. And cloud services must respond to global demands without making the grid stutter.

For these reasons and more, green computing has evolved rapidly over the past three decades, and it’s here to stay.

What Is Green Computing?

Green computing, or sustainable computing, is the practice of maximizing energy efficiency and minimizing environmental impact in the ways computer chips, systems and software are designed and used.

Also called green information technology, green IT or sustainable IT, green computing spans concerns across the supply chain, from the raw materials used to make computers to how systems get recycled.

In their working lives, green computers must deliver the most work for the least energy, typically measured by performance per watt.

Why Is Green Computing Important?

Green computing is a significant tool to combat climate change, the existential threat of our time.

Global temperatures have risen about 1.2°C over the last century. As a result, ice caps are melting, causing sea levels to rise about 20 centimeters and increasing the number and severity of extreme weather events.

The rising use of electricity is one of the causes of global warming. Data centers represent a small fraction of total electricity use, about 1% or 200 terawatt-hours per year, but they’re a growing factor that demands attention.

Powerful, energy-efficient computers are part of the solution. They’re advancing science and our quality of life, including the ways we understand and respond to climate change.

What Are the Elements of Green Computing?

Engineers know green computing is a holistic discipline.

“Energy efficiency is a full-stack issue, from the software down to the chips,” said Sachin Idgunji, co-chair of the power working group for the industry’s MLPerf AI benchmark and a distinguished engineer working on performance analysis at NVIDIA.

For example, in one analysis he found NVIDIA DGX A100 systems delivered a nearly 5x improvement in energy efficiency in scale-out AI training benchmarks compared to the prior generation.

“My primary role is analyzing and improving energy efficiency of AI applications at everything from the GPU and the system node to the full data center scale,” he said.

Idgunji’s work is a job description for a growing cadre of engineers building products from smartphones to supercomputers.

What’s the History of Green Computing?

Green computing hit the public spotlight in 1992, when the U.S. Environmental Protection Agency launched Energy Star, a program for identifying consumer electronics that met standards in energy efficiency.

The Energy Star logo is now used across more than three dozen product groups.

A 2017 report found nearly 100 government and industry programs across 22 countries promoting what it called green ICTs, sustainable information and communication technologies.

One such organization, the Green Electronics Council, provides the Electronic Product Environmental Assessment Tool, a registry of systems and their energy-efficiency levels. The council claims it’s saved nearly 400 million megawatt-hours of electricity through use of 1.5 billion green products it’s recommended to date.

Work on green computing continues across the industry at every level.

For example, some large data centers use liquid-cooling while others locate data centers where they can use cool ambient air. Schneider Electric recently released a whitepaper recommending 23 metrics for determining the sustainability level of data centers.

Data centers need to consider energy and water use as well as greenhouse gas emissions and waste to measure their sustainability, according to a Schneider whitepaper.

A Pioneer in Energy Efficiency

Wu Feng, a computer science professor at Virginia Tech, built a career pushing the limits of green computing. It started out of necessity while he was working at the Los Alamos National Laboratory.

A computer cluster for open science research he maintained in an external warehouse had twice as many failures in summers versus winters. So, he built a lower-power system that wouldn’t generate as much heat.

The Green Destiny supercomputer

He demoed the system, dubbed Green Destiny, at the Supercomputing conference in 2001. Covered by the BBC, CNN and the New York Times, among others, it sparked years of talks and debates in the HPC community about the potential reliability as well as efficiency of green computing.

Interest rose as supercomputers and data centers grew, pushing their boundaries in power consumption. In November 2007, after working with some 30 HPC luminaries and gathering community feedback, Feng launched the first Green500 List, the industry’s benchmark for energy-efficient supercomputing.

A Green Computing Benchmark

The Green500 became a rallying point for a community that needed to rein in power consumption while taking performance to new heights.

“Energy efficiency increased exponentially, flops per watt doubled about every year and a half for the greenest supercomputer at the top of the list,” said Feng.

By some measures, the results showed the energy efficiency of the world’s greenest systems increased two orders of magnitude in the last 14 years.

The Green500 showed that heterogeneous systems — those with accelerators like GPUs in addition to CPUs — are consistently the most energy-efficient ones.

Feng attributes the gains mainly to the use of accelerators such as GPUs, now common among the world’s fastest systems.

“Accelerators added the capability to execute code in a massively parallel way without a lot of overhead — they let us run blazingly fast,” he said.

He cited two generations of the Tsubame supercomputers in Japan as early examples. They used NVIDIA Kepler and Pascal architecture GPUs to lead the Green500 list in 2014 and 2017, part of a procession of GPU-accelerated systems on the list.

“Accelerators have had a huge impact throughout the list,” said Feng, who will receive an award for his green supercomputing work at the Supercomputing event in November.

“Notably, NVIDIA was fantastic in its engagement and support of the Green500 by ensuring its energy-efficiency numbers were reported, thus helping energy efficiency become a first-class citizen in how supercomputers are designed today,” he added.

AI and Networking Get More Efficient

Today, GPUs and data processing units (DPUs) are bringing greater energy efficiency to AI and networking tasks, as well as HPC jobs like simulations run on supercomputers and enterprise data centers.

AI, the most powerful technology of our time, will become a part of every business. McKinsey & Co. estimates AI will add a staggering $13 trillion to global GDP by 2030 as deployments grow.

NVIDIA estimates data centers could save a whopping 19 terawatt-hours of electricity a year if all AI, HPC and networking offloads were run on GPU and DPU accelerators (see the charts below). That’s the equivalent of the energy consumption of 2.9 million passenger cars driven for a year.

It’s an eye-popping measure of the potential for energy efficiency with accelerated computing.

An analysis of the potential energy savings of accelerated computing with GPUs and DPUs.

AI Benchmark Measures Efficiency

Because AI represents a growing part of enterprise workloads, the MLPerf industry benchmarks for AI have been measuring performance per watt on submissions for data center and edge inference since February 2021.

“The next frontier for us is to measure energy efficiency for AI on larger distributed systems, for HPC workloads and for AI training — it’s similar to the Green500 work,” said Idgunji, whose power group at MLPerf includes members from six other chip and systems companies.

NVIDIA Jetson modules recently demonstrated significant generation-to-generation leaps in performance per watt in MLPerf benchmarks of AI inference.

The public results motivate participants to make significant improvements with each product generation. They also help engineers and developers understand ways to balance performance and efficiency across the major AI workloads that MLPerf tests.

“Software optimizations are a big part of work because they can lead to large impacts in energy efficiency, and if your system is energy efficient, it’s more reliable, too,” Idgunji said.

Green Computing for Consumers

In PCs and laptops, “we’ve been investing in efficiency for a long time because it’s the right thing to do,” said Narayan Kulshrestha, a GPU power architect at NVIDIA who’s been working in the field nearly two decades.

For example, Dynamic Boost 2.0 uses deep learning to automatically direct power to a CPU, a GPU or a GPU’s memory to increase system efficiency. In addition, NVIDIA created a system-level design for laptops, called Max-Q, to optimize and balance energy efficiency and performance.

Building a Cyclical Economy

When a user replaces a system, the standard practice in green computing is that the old system gets broken down and recycled. But Matt Hull sees better possibilities.

“Our vision is a cyclical economy that enables everyone with AI at a variety of price points,” said Hull, the vice president of sales for data center AI products at NVIDIA.

So he aims to find the system a new home with users in developing countries who find it useful and affordable. It’s a work in progress seeking the right partner and writing a new chapter in an existing lifecycle management process.

Green Computing Fights Climate Change

Energy-efficient computers are among the sharpest tools fighting climate change.

Scientists in government labs and universities have long used GPUs to model climate scenarios and predict weather patterns. Recent advances in AI, driven by NVIDIA GPUs, can now help model weather forecasting 100,000x quicker than traditional models. Watch the following video for details:

In an effort to accelerate climate science, NVIDIA announced plans to build Earth-2, an AI supercomputer dedicated to predicting the impacts of climate change. It will use NVIDIA Omniverse, a 3D design collaboration and simulation platform, to build a digital twin of Earth so scientists can model climates in ultra-high resolution.

In addition, NVIDIA is working with the United Nations Satellite Centre to accelerate climate-disaster management and train data scientists across the globe in using AI to improve flood detection.

Meanwhile, utilities are embracing machine learning to move toward a green, resilient and smart grid. Power plants are using digital twins to predict costly maintenance and model new energy sources, such as fusion-reactor designs.

What’s Ahead in Green Computing?

Feng sees the core technology behind green computing moving forward on multiple fronts.

In the short term, he’s working on what’s called energy proportionality, that is, ways to make sure systems get peak power when they need peak performance and scale gracefully down to zero power as they slow to an idle, like a modern car engine that slows its RPMs and then shuts down at a red light.

Researchers seek to close the gap in energy-proportional computing.

Long term, he’s exploring ways to minimize data movement inside and between computer chips to reduce their energy consumption. And he’s among many researchers studying the promise of quantum computing to deliver new kinds of acceleration.

It’s all part of the ongoing work of green computing, delivering ever more performance at ever greater efficiency.

The post What Is Green Computing? appeared first on NVIDIA Blog.


GeForce RTX 4090 GPU Arrives, Enabling New World-Building Possibilities for 3D Artists This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. In the coming weeks, we’ll deep dive on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Creators can now pick up the GeForce RTX 4090 GPU, available from top add-in card providers including ASUS, Colorful, Gainward, Galaxy, GIGABYTE, INNO3D, MSI, Palit, PNY and ZOTAC, as well as from system integrators and builders worldwide.

Fall has arrived, and with it comes the perfect time to showcase the beautiful, harrowing video, Old Abandoned Haunted Mansion, created by 3D artist and principal lighting expert Pasquale Scionti this week In the NVIDIA Studio.

Artists like Scionti can create at the speed of light with the help of RTX 40 Series GPUs alongside 110 RTX-accelerated apps, the NVIDIA Studio suite of software and dedicated Studio Drivers.

A Quantum Leap in Creative Performance

The new GeForce RTX 4090 GPU brings an extraordinary boost in performance, third-generation RT Cores, fourth-generation Tensor Cores, an eighth-generation NVIDIA Dual AV1 Encoder and 24GB of Micron G6X memory capable of reaching 1TB/s bandwidth.

The new GeForce RTX 4090 GPU.

3D artists can now build scenes in fully ray-traced environments with accurate physics and realistic materials — all in real time, without proxies. DLSS 3 technology uses the AI-powered RTX Tensor Cores and a new Optical Flow Accelerator to generate additional frames and dramatically increase frames per second (FPS). This improves smoothness and speeds up movement in the viewport. NVIDIA is working with popular 3D apps Unity and Unreal Engine 5 to integrate DLSS 3.

DLSS 3 will also benefit workflows in the NVIDIA Omniverse platform for building and connecting custom 3D pipelines. New Omniverse tools such as NVIDIA RTX Remix for modders, which was used to create Portal with RTX, will be game changers for 3D content creation.

Video and live-streaming creative workflows are also turbocharged as the new AV1 encoder delivers 40% increased efficiency, unlocking higher resolution and crisper image quality. Expect AV1 integration in OBS Studio, DaVinci Resolve and Adobe Premiere Pro (through the Voukoder plugin) later this month.

The new dual encoders capture up to 8K resolution at 60 FPS in real time via GeForce Experience and OBS Studio, and cut video export times nearly in half. These encoders will be enabled in popular video-editing apps including Blackmagic Design’s DaVinci Resolve, the Voukoder plugin for Adobe Premiere Pro, and Jianying Pro — China’s top video-editing app — later this month.

State-of-the-art AI technology, like AI image generators and new video-editing tools in DaVinci Resolve, is ushering in the next step in the AI revolution, delivering up to a 2x increase in performance over the previous generation.

To break technological barriers and expand creative possibilities, pick up the GeForce RTX 4090 GPU today. Check out this product finder for retail availability.

Haunted Mansion Origins

The visual impact of Old Abandoned Haunted Mansion is nothing short of remarkable, with photorealistic details for lighting and shadows and stunningly accurate textures.

However, it’s Scionti’s intentional omission of specific detail that allows viewers to construct their own narrative, a staple of his work.

Scionti highlighted additional mysterious features he created within the haunted mansion: a painting with a specter on the stairs, knocked-over furniture, a portrait of a woman who might’ve lived there and a mirror smashed in the middle as if someone struck it.

“Perhaps whatever happened is still in these walls,” mused Scionti. “Abandoned, reclaimed by nature.”

Scionti said he finds inspiration in the works of H.R. Giger, H.P. Lovecraft and Edgar Allan Poe, and often dreams of the worlds he aspires to build before bringing them to life in 3D. He stressed, however, “I don’t have a dark side! It just appears in my work!”

For Old Abandoned Haunted Mansion, the artist began by creating a moodboard featuring abandoned places. He specifically included structures that were reclaimed by nature to create a warm mood with the sun filtering in from windows, doors and broken walls.

Foundational building blocks in Autodesk 3ds Max.

Scionti then modeled the scene’s objects, such as the ceiling lamp, painting frames and staircase, using Autodesk 3ds Max. By using a GeForce RTX 3090 GPU and selecting the default Autodesk Arnold renderer, he deployed RTX-accelerated AI denoising, resulting in interactive renders that were easy to edit while maintaining photorealism.

Modeling in Autodesk 3ds Max.

The versatile Autodesk 3ds Max software supports third-party GPU-accelerated renderers such as V-Ray, OctaneRender and Redshift, giving RTX owners additional options for their creative workflows.

When it comes time to export the renders, Scionti will soon be able to use GeForce RTX 40 Series GPUs to do so up to 80% faster than the previous generation.

Texture application in Adobe Substance 3D Painter.

Scionti imported the models, like the ceiling lamp and various paintings, into Adobe Substance 3D Painter to apply unique textures. The artist used RTX-accelerated light and ambient occlusion to bake his assets in mere seconds.

The curtains, the drape on the armchair and the ghostly figure were modeled using Marvelous Designer, a realistic cloth-making program for 3D artists. On its system-requirements page, the Marvelous Designer team recommends using GeForce RTX 30 Series and other NVIDIA RTX-class GPUs, as well as downloading the latest NVIDIA Studio Driver.

Texturing and material creation in Quixel Mixer.

Additional objects like the wooden ceiling were created using Quixel Mixer, an all-in-one texturing and material-creation tool designed to be intuitive and extremely fast.

Browsing objects in Quixel Megascans.

Scionti then searched Quixel Megascans, the largest and fastest-growing 3D scan library, to acquire the remaining assets and round out the piece.

With the composition in place, Scionti applied final details in Unreal Engine 5.

RTX ON in Unreal Engine 5

Scionti used Unreal Engine 5, activating hardware-accelerated RTX ray tracing for high-fidelity, interactive visualization of 3D designs. He was further aided by NVIDIA DLSS, which uses AI to upscale frames rendered at lower resolution while retaining high-fidelity detail. The artist then constructed the scene rich with beautiful lighting, shadows and textures.

The new GeForce RTX 40 Series GPU lineup will use DLSS 3 — coming soon to UE5 — with AI Frame Generation to further enhance interactivity in the viewport.

Scionti perfected his lighting with Lumen, UE5’s fully dynamic global illumination and reflections system, supported by GeForce RTX GPUs.

Photorealistic details achieved thanks to Unreal Engine 5 and NVIDIA RTX-accelerated ray tracing.

“Nanite meshes were useful to have high polygons for close up details,” noted Scionti. “For lighting, I used the sun and sky, but to add even more light, I inserted rectangular light sources outside each opening, like the windows and the broken wall.”

To complete the video, Scionti added a deliberately paced, instrumental score which consists of a piano, violin, synthesizer and drum. The music injects an unexpected emotional element to the piece.

Scionti reflected on his creative journey, which he considers a relentless pursuit of knowledge and perfecting his craft. “The pride of seeing years of commitment and passion being recognized is incredible, and that drive has led me to where I am today,” he said.

To embark on an Unreal Engine 5-powered creative journey through desert scenes, alien landscapes, abandoned towns, castle ruins and beyond, check out the latest NVIDIA Studio Standout featuring some of the most talented 3D artists, including Scionti.

3D artist and principal lighting expert Pasquale Scionti.

For more, explore Scionti’s Instagram.

Join the #From2Dto3D challenge

Scionti brought Old Abandoned Haunted Mansion from 2D beauty into 3D realism — and the NVIDIA Studio team wants to see more 2D to 3D progress.

Join the #From2Dto3D challenge this month for a chance to be featured on the NVIDIA Studio social media channels, like @juliestrator, whose delightfully cute illustration is elevated in 3D:

Entering is quick and easy. Simply post a 2D piece of art next to a 3D rendition of it on Instagram, Twitter or Facebook. And be sure to tag #From2Dto3D to enter.

Get creativity-inspiring updates directly to your inbox by subscribing to the NVIDIA Studio newsletter.

The post GeForce RTX 4090 GPU Arrives, Enabling New World-Building Possibilities for 3D Artists This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.


Measuring perception in AI models

Perception – the process of experiencing the world through senses – is a significant part of intelligence. And building agents with human-level perceptual understanding of the world is a central but challenging task, which is becoming increasingly important in robotics, self-driving cars, personal assistants, medical imaging, and more. So today, we’re introducing the Perception Test, a multimodal benchmark using real-world videos to help evaluate the perception capabilities of a model.
