How LotteON built a personalized recommendation system using Amazon SageMaker and MLOps

How LotteON built a personalized recommendation system using Amazon SageMaker and MLOps

This post is co-written with HyeKyung Yang, Jieun Lim, and SeungBum Shim from LotteON.

LotteON aims to be a platform that not only sells products, but also provides a personalized recommendation experience tailored to your preferred lifestyle. LotteON operates various specialty stores, including fashion, beauty, luxury, and kids, and strives to provide a personalized shopping experience across all aspects of customers’ lifestyles.

To enhance the shopping experience of LotteON’s customers, the recommendation service development team is continuously improving the recommendation service to provide customers with the products they are looking for or may be interested in at the right time.

In this post, we share how LotteON improved their recommendation service using Amazon SageMaker and machine learning operations (MLOps).

Problem definition

Traditionally, the recommendation service was mainly provided by identifying the relationship between products and providing products that were highly relevant to the product selected by the customer. However, it was necessary to upgrade the recommendation service to analyze each customer’s taste and meet their needs. Therefore, we decided to introduce a deep learning-based recommendation algorithm that can identify not only linear relationships in the data, but also more complex relationships. For this reason, we built the MLOps architecture to manage the created models and provide real-time services.

Another requirement was to build a continuous integration and continuous delivery (CI/CD) pipeline that can be integrated with GitLab, a code repository used by existing recommendation platforms, to add newly developed recommendation models and create a structure that can continuously improve the quality of recommendation services through periodic retraining and redistribution of models.

In the following sections, we introduce the MLOps platform that we built to provide high-quality recommendations to our customers and the overall process of inferring a deep learning-based recommendation algorithm (Neural Collaborative Filtering) in real time and introducing it to LotteON.

Solution architecture

The following diagram illustrates the solution architecture for serving Neural Collaborative Filtering (NCF) algorithm-based recommendation models as MLOps. The main AWS services used are SageMaker, Amazon EMR, AWS CodeBuild, Amazon Simple Storage Service (Amazon S3), Amazon EventBridge, AWS Lambda, and Amazon API Gateway. We’ve combined several AWS services using Amazon SageMaker Pipelines and designed the architecture with the following components in mind:

  • Data preprocessing
  • Automated model training and deployment
  • Real-time inference through model serving
  • CI/CD structure

MLOps Architecture

The preceding architecture shows the MLOps data flow, which consists of three decoupled passes:

  • Code preparation and data preprocessing (blue)
  • Training pipeline and model deployment (green)
  • Real-time recommendation inference (brown)

Code preparation and data preprocessing

The preparation and preprocessing phase consists of the following steps:

  1. The data scientist publishes the deployment code containing the model and the training pipeline to GitLab, which is used by LotteON, and Jenkins uploads the code to Amazon S3.
  2. The EMR preprocessing batch runs through Airflow according to the specified schedule. The preprocessing data is loaded into MongoDB, which is used as a feature store along with Amazon S3.

Training pipeline and model deployment

The model training and deployment phase consists of the following steps:

  1. After the training data is uploaded to Amazon S3, CodeBuild runs based on the rules specified in EventBridge.
  2. The SageMaker pipeline predefined in CodeBuild runs, and sequentially runs steps such as preprocessing including provisioning, model training, and model registration.
  3. When training is complete (through the Lambda step), the deployed model is updated to the SageMaker endpoint.

Real-time recommendation inference

The inference phase consists of the following steps:

  1. The client application makes an inference request to the API gateway.
  2. The API gateway sends the request to Lambda, which makes an inference request to the model in the SageMaker endpoint to request a list of recommendations.
  3. Lambda receives the list of recommendations and provides them to the API gateway.
  4. The API gateway provides the list of recommendations to the client application using the Recommendation API.

Recommendation model using NCF

NCF is an algorithm based on a paper presented at the International World Wide Web Conference in 2017. It is an algorithm that covers the limitations of linear matrix factorization, which is often used in existing recommendation systems, with collaborative filtering based on the neural net. By adding non-linearity through the neural net, the authors were able to model a more complex relationship between users and items. The data for NCF is interaction data where users react to items, and the overall structure of the model is shown in the following figure (source: https://arxiv.org/abs/1708.05031).

NCF Model

Although NCF has a simple model architecture, it has shown a good performance, which is why we chose it to be the prototype for our MLOps platform. For more information about the model, refer to the paper Neural Collaborative Filtering.

In the following sections, we discuss how this solution helped us build the aforementioned MLOps components:

  • Data preprocessing
  • Automating model training and deployment
  • Real-time inference through model serving
  • CI/CD structure

MLOps component 1: Data preprocessing

For NCF, we used user-item interaction data, which requires significant resources to process the raw data collected at the application and transform it into a form suitable for learning. With Amazon EMR, which provides fully managed environments like Apache Hadoop and Spark, we were able to process data faster.

The data preprocessing batches were created by writing a shell script to run Amazon EMR through AWS Command Line Interface (AWS CLI) commands, which we registered to Airflow to run at specific intervals. When the preprocessing batch was complete, the training/test data needed for training was partitioned based on runtime and stored in Amazon S3. The following is an example of the AWS CLI command to run Amazon EMR:

aws emr create-cluster --release-label emr-6.0.0 
    --name "CLUSTER_NAME" 
    --applications Name=Hadoop Name=Hive Name=Spark 
    --tags 'Name=EMR-DATA-PREP' 'Owner=MODEL' 'Service=LOTTEON' 
    --ec2-attributes '{"KeyName":"keyname","InstanceProfile":"DefaultRole","ServiceAccessSecurityGroup":"sg-xxxxxxxxxxxxxx","SubnetId":"subnet- xxxxxxxxxxxxxx ","EmrManagedSlaveSecurityGroup":"sg- xxxxxxxxxxxxxx ","EmrManagedMasterSecurityGroup":"sg-xxxxxxxxxxxxxx "}' 
--instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"r5.xlarge","Name":"Master Instance Group"},{"InstanceCount":2,"InstanceGroupType":"CORE","InstanceType":"r5.xlarge","Name":"Core Instance Group"},{"InstanceCount":2,"BidPrice":"OnDemandPrice","InstanceGroupType":"TASK","InstanceType":"r5.xlarge","Name":"Task Instance Group"}]' 
    --service-role EMR_DefaultRole 
    --region ap-northeast-2 
    --steps Type=CUSTOM_JAR,Name=DATA_PREP,ActionOnFailure=CONTINUE,Jar=s3://ap-northeast-2.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://bucket/prefix/data_prep_batch.sh"]  
    --auto-terminate

MLOps component 2: Automated training and deployment of models

In this section, we discuss the components of the model training and deployment pipeline.

Event-based pipeline automation

After the preprocessing batch was complete and the training/test data was stored in Amazon S3, this event invoked CodeBuild and ran the training pipeline in SageMaker. In the process, the version of the result file of the preprocessing batch was recorded, enabling dynamic control of the version and management of the pipeline run history. We used EventBridge, Lambda, and CodeBuild to connect the data preprocessing steps run by Amazon EMR and the SageMaker learning pipeline on an event-based basis.

EventBridge is a serverless service that implements rules to receive events and direct them to destinations, based on the event patterns and destinations you establish. The initial role of EventBridge in our configuration was to invoke a Lambda function on the S3 object creation event when the preprocessing batch stored the training dataset in Amazon S3. The Lambda function dynamically modified the buildspec.yml file, which is indispensable when CodeBuild runs. These modifications encompassed the path, version, and partition information of the data that needed training, which is crucial for carrying out the training pipeline. The subsequent role of EventBridge was to dispatch events, instigated by the alteration of the buildspec.yml file, leading to running CodeBuild.

CodeBuild was responsible for building the source code where the SageMaker pipeline was defined. Throughout this process, it referred to the buildspec.yml file and ran processes such as cloning the source code and installing the libraries needed to build from the path defined in the file. The Project Build tab on the CodeBuild console allowed us to review the build’s success and failure history, along with a real-time log of the SageMaker pipeline’s performance.

SageMaker pipeline for training

SageMaker Pipelines helps you define the steps required for ML services, such as preprocessing, training, and deployment, using the SDK. Each step is visualized within SageMaker Studio, which is very helpful for managing models, and you can also manage the history of trained models and endpoints that can serve the models. You can also set up steps by attaching conditional statements to the results of the steps, so you can adopt only models with good retraining results or prepare for learning failures. Our pipeline contained the following high-level steps:

  • Model training
  • Model registration
  • Model creation
  • Model deployment

Each step is visualized in the pipeline in Amazon SageMaker Studio, and you can also see the results or progress of each step in real time, as shown in the following screenshot.

SageMaker Pipeline

Let’s walk through the steps from model training to deployment, using some code examples.

Train the model

First, you define a PyTorch Estimator to use for training and a training step. This requires you to have the training code (for example, train.py) ready in advance and pass the location of the code as an argument of the source_dir. The training step runs the training code you pass as an argument of the entry_point. By default, the training is done by launching the container in the instance you specify, so you’ll need to pass in the path to the training Docker image for the training environment you’ve developed. However, if you specify the framework for your estimator here, you can pass in the version of the framework and Python version to use, and it will automatically fetch the version-appropriate container image from Amazon ECR.

When you’re done defining your PyTorch Estimator, you need to define the steps involved in training it. You can do this by passing the PyTorch Estimator you defined earlier as an argument and the location of the input data. When you pass in the location of the input data, the SageMaker training job will download the train and test data to a specific path in the container using the format /opt/ml/input/data/<channel_name> (for example, /opt/ml/input/data/train).

In addition, when defining a PyTorch Estimator, you can use metric definitions to monitor the learning metrics generated while the model is being trained with Amazon CloudWatch. You can also specify the path where the results of the model artifacts after training are stored by specifying estimator_output_path, and you can use the parameters required for model training by specifying model_hyperparameters. See the following code:

from sagemaker.pytorch import PyTorch
metric_definitions=[
        {'Name': 'HR', 'Regex': 'HR=(.*?);'},
        {'Name': 'NDCG', 'Regex': 'NDCG=(.*?);'},
        {'Name': 'Loss', 'Regex': 'Loss=(.*?);'}
    ]
estimator_output_path = f's3://{bucket}/{prefix}'
model_hyperparameter = {'epochs': 10, 
                    'lr': 0.001,
                    'batch_size': 256,
                    'top_k' : 10,
                    'dropout' : 0.3,
                    'factor_num' : 32,
                    'num_layers' : 3
                }  
s3_code_uri = 's3://code_location/source.tar.gz'

host_estimator = PyTorch(
    entry_point="train.py", 
    source_dir = s3_code_uri, 
    output_path = estimator_output_path, 
    role=aws_role,
framework_version='1.8.1',
    py_version='py3',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    session = pipeline_session,
    hyperparameters=model_hyperparameter,
    metric_definitions = metric_definitions
)

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
data_loc = f's3://{bucket}/{prefix}'
step_train = TrainingStep(
    name= "NCF-Training",
    estimator=host_estimator,
    inputs={
        "train": TrainingInput(s3_data=data_loc),
        "test": TrainingInput(s3_data=data_loc),        
    }
)

Create a model package group

The next step is to create a model package group to manage your trained models. By registering trained models in model packages, you can manage them by version, as shown in the following screenshot. This information allows you to reference previous versions of your models at any time. This process only needs to be done one time when you first train a model, and you can continue to add and update models as long as they declare the same group name.

Model Packages

See the following code:

import boto3
model_package_group_name = 'NCF'
sm_client = boto3.client("sagemaker")
model_package_group_input_dict = {
    "ModelPackageGroupName" : model_package_group_name,
    "ModelPackageGroupDescription" : "Model Package Group"
}
response = sm_client.list_model_package_groups(NameContains=model_package_group_name)
if len(response['ModelPackageGroupSummaryList']) == 0:
create_model_pacakge_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)

Add a trained model to a model package group

The next step is to add a trained model to the model package group you created. In the following code, when you declare the Model class, you get the result of the previous model training step, which creates a dependency between the steps. A step with a declared dependency can only be run if the previous step succeeds. However, you can use the DependsOn option to declare a dependency between steps even if the data is not causally related.

After the trained model is registered in the model package group, you can use this information to manage and track future model versions, create a real-time SageMaker endpoint, run a batch transform job, and more.

from sagemaker.workflow.model_step import ModelStep
from sagemaker.model import Model

inference_image_uri = '763104351884.dkr.ecr.ap-northeast-2.amazonaws.com/pytorch-inference:1.8.1-gpu-py3'
model = Model(
    image_uri=inference_image_uri,
    model_data = step_train.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=pipeline_session,
)

register_model_step_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    model_package_group_name=model_package_group_name,
    approval_status='Approved',        
)

step_model_registration = ModelStep(
    name="RegisterModel",
    step_args=register_model_step_args
)

Create a SageMaker model

To create a real-time endpoint, an endpoint configuration and model is required. To create a model, you need two basic elements: an S3 address where the model’s artifacts are stored, and the path to the inference Docker image that will run the model’s artifacts.

When creating a SageMaker model, you must pay attention to the following steps:

  • Provide the result of the model training step, step_train.properties.ModelArtifacts.S3ModelArtifacts, which will be converted to the S3 path where the model artifact is stored, as an argument of the model_data.
  • Because you specified the PyTorchModel class, framework_version, and py_version, you use this information to get the path to the inference Docker image through Amazon ECR. This is the inference Docker image that is used for model deployment. Make sure to enter the same PyTorch framework, Python version, and other details that you used to train the model. This means keeping the same PyTorch and Python versions for training and inference.
  • Provide the inference.py as the entry point script to handle invocations.

This step will set a dependency on the model package registration step you defined via the DependsOn option.

from sagemaker.pytorch.model import PyTorchModel
from sagemaker.workflow.model_step import ModelStep

model_name = 'NCF-MODEL'
s3_code_uri = 's3://code_location/source.tar.gz'

model_inference = PyTorchModel(
        name = model_name,
        model_data = step_train.properties.ModelArtifacts.S3ModelArtifacts, 
image_uri= image_uri,
        role=role,
        entry_point= 'inference.py',
        source_dir = s3_code_uri,
        framework_version='1.8.1',
        py_version='py3',
        model_server_workers=1,
        sagemaker_session=pipeline_session
                            )
step_model_create = ModelStep(
    name="ModelCreation",
    step_args=model_inference.create(instance_type = 'ml.p3.2xlarge'),
    depends_on=step_model_registration
)

Create a SageMaker endpoint

Now you need to define an endpoint configuration based on the created model, which will create an endpoint when deployed. Because the SageMaker Python SDK doesn’t support the step related to deployment (as of this writing), you can use Lambda to register that step. Pass the necessary arguments to Lambda, such as instance_type, and use that information to create the endpoint configuration first. Because you’re calling the endpoint based on endpoint_name, you need to make sure that variable is defined with a unique name. In the following Lambda function code, based on the endpoint_name, you update the model if the endpoint exists, and deploy a new one if it doesn’t:

# lambda_deploy_model.py
import json
import boto3
def lambda_handler(event, context):
    sm_client = boto3.client("sagemaker")
    model_name = event["model_name"]
    endpoint_config_name = event["endpoint_config_name"]
    endpoint_name = event["endpoint_name"]
    instance_type = event["instance_type"]
 
    create_endpoint_config_response = sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "InstanceType": instance_type,
                "InitialVariantWeight": 1,
                "InitialInstanceCount": 1,
                "ModelName": model_name,
                "VariantName": "AllTraffic",
            }
        ],
    )
    print(f"create_endpoint_config_response: {create_endpoint_config_response}")
    existing_endpoints = sm_client.list_endpoints(NameContains=endpoint_name)['Endpoints']
    if len(existing_endpoints["Endpoints"]) > 0:
        sm_client.update_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
        )
    else:
        sm_client.create_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
        )
    return {"statusCode": 200, "body": json.dumps("Endpoint Created Successfully")}

To get the Lambda function into a step in the SageMaker pipeline, you can use the SDK associated with the Lambda function. By passing the location of the Lambda function source as an argument of the function, you can automatically register and use the function. In conjunction with this, you can define LambdaStep and pass it the required arguments. See the following code:

from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.lambda_step import (LambdaStep, LambdaOutput, LambdaOutputTypeEnum)
endpoint_name = 'NCF-ENDPOINT'
endpoint_config_name = 'NCF-CONF'
deploy_script_path = 's3://code_location/lambda_deploy_model.py'
deploy_model_func = Lambda(
    function_name='lambda-deploy-step',
    execution_role_arn=role,
    script=deploy_script_path,
    handler="lambda_deploy_model.lambda_handler"
)
output_param_1 = LambdaOutput(output_name="statusCode", output_type=LambdaOutputTypeEnum.String)
output_param_2 = LambdaOutput(output_name="body", output_type=LambdaOutputTypeEnum.String)

step_deploy_lambda = LambdaStep(
    name="LambdaDeployStep",
    lambda_func=deploy_model_func,
    inputs={
        "model_name": step_model_create.properties.ModelName,
        "endpoint_config_name": endpoint_config_name,
        "endpoint_name": endpoint_name,
        "instance_type": 'ml.p3.2xlarge',       
    },
    outputs=[output_param_1, output_param_2]
)

Create a SageMaker pipeline

Now you can create a pipeline using the steps you defined. You can do this by defining a name for the pipeline and passing in the steps to be used in the pipeline as arguments. After that, you can run the defined pipeline through the start function. See the following code:

from sagemaker.workflow.pipeline import Pipeline
pipeline_name = 'NCF-pipeline'
pipeline = Pipeline(
    name=pipeline_name,
    steps=[step_train, step_model_registration, step_model_create, step_deploy_lambda],
    sagemaker_session=pipeline_session,
)

pipeline.start()

After this process is complete, an endpoint is created with the trained model and is ready for use based on the deep learning-based model.

MLOps component 3: Real-time inference with model serving

Now let’s see how to invoke the model in real time from the created endpoint, which can also be accessed using the SageMaker SDK. The following code is an example of getting real-time inference values for input values from an endpoint deployed via the invoke_endpoint function. The features you pass as arguments to the body are passed as input to the endpoint, which returns the inference results in real time.

import boto3
sagemaker_runtime = boto3.client("sagemaker-runtime")
endpoint_name='NCF-ENDPOINT'
 
response = sagemaker_runtime.invoke_endpoint(
                    EndpointName=endpoint_name, 
                    Body=bytes("'features': '{"user": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "item": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}'}")
)
print(response['Body'].read())

When we configured the inference function, we had it return the items in the order that the user is most likely to like among the items passed in. The preceding example returns items from 1–25 in order of likelihood of being liked by the user at index 0.

We added business logic to the feature, configured it in Lambda, and connected it with an API gateway to implement the API’s ability to return recommended items in real time. We then conducted performance testing of the online service. We load tested it with Locust using five g4dn.2xlarge instances and found that it could be reliably served in an environment with 1,000 TPS.

MLOps component 4: CI/CD structure

A CI/CD structure is a fundamental part of DevOps, and is also an important part of organizing an MLOps environment. AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline collectively provide all the functionality you need for CI/CD, from code shaping to deployment, build, and batch management. The services are not only linked to the same code series, but also to other services such as GitHub and Jenkins, so if you have an existing CI/CD structure, you can use them separately to fill in the gaps. Therefore, we expanded our CI/CD structure by linking only the CodeBuild configuration described earlier to our existing CI/CD pipeline.

We linked our SageMaker notebooks with GitLab for code management, and when we were done, we replicated them to Amazon S3 via Jenkins. After that, we set the S3 path to the default repository path of the NCF CodeBuild project as described earlier, so that we could build the project with CodeBuild.

Conclusion

So far, we’ve seen the end-to-end process of configuring an MLOps environment using AWS services and providing real-time inference services based on deep learning models. By configuring an MLOps environment, we’ve created a foundation for providing high-quality services based on various algorithms to our customers. We’ve also created an environment where we can quickly proceed with prototype development and deployment. The NCF we developed with the prototyping algorithm was also able to achieve good results when it was put into service. In the future, the MLOps platform can help us quickly develop and experiment with models that match LotteON data to provide our customers with a progressively higher-quality recommendation experience.

Using SageMaker in conjunction with various AWS services has given us many advantages in developing and operating our services. As model developers, we didn’t have to worry about configuring the environment settings for frequently used packages and deep learning-related frameworks because the environment settings were configured for each library, and we felt that the connectivity and scalability between AWS services using AWS CLI commands and related SDKs were great. Additionally, as a service operator, it was good to track and monitor the services we were running because CloudWatch connected the logging and monitoring of each service.

You can also check out the NCF and MLOps configuration for hands-on practice on our GitHub repo (Korean).

We hope this post will help you configure your MLOps environment and provide real-time services using AWS services.


About the Authors

SeungBum Shim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team, responsible for discovering ways to use and improve recommendation-related products through LotteON data analysis, and developing MLOps pipelines and ML/DL recommendation models.

HyeKyung Yang is a research engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of developing ML/DL recommendation models by analyzing and utilizing various data and developing a dynamic A/B test environment.

Jieun Lim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of operating LotteON’s personalized recommendation system and developing personalized recommendation models and dynamic A/B test environments.

Jesam Kim is an AWS Solutions Architect and helps enterprise customers adopt and troubleshoot cloud technologies and provides architectural design and technical support to address their business needs and challenges, especially in AIML areas such as recommendation services and generative AI.

Gonsoo Moon is an AWS AI/ML Specialist Solutions Architect and provides AI/ML technical support. His main role is to collaborate with customers to solve their AI/ML problems based on various use cases and production experience in AI/ML.

Read More

Fight for Honor in ‘Men of War II’ on GFN Thursday

Fight for Honor in ‘Men of War II’ on GFN Thursday

Whether looking for new adventures, epic storylines or games to play with a friend, GeForce NOW members are covered.

Start off with the much-anticipated sequel to the Men of War franchise or cozy up with some adorable pals in Palworld, both part of five games GeForce NOW is bringing to the cloud this week.

No Guts, No Glory

Men of War II on GeForce NOW screenshot
For the cloud!

Get transported to the battlefields of World War II with historical accuracy and attention to detail in Men of War II, the newest entry in the real-time strategy series from Fulqrum Publishing.

The game features an extensive roster of units, including tanks, airplanes and infantry. With advanced enemy AI and diverse gameplay modes, Men of War II promises an immersive experience for both history enthusiasts and casual gamers.

Gear up, strategize and prepare to rewrite history. Get an extra fighting chance with a GeForce NOW Ultimate membership, which streams at up to 4K resolution and provides longer gaming sessions and faster access to games over a free membership.

Cloud Pals

Palworld on GeForce NOW
Pal around in the cloud.

Step into a world teeming with enigmatic creatures known as “Pals” in the action-adventure survival game Palworld from Pocketpair. Navigate the wilderness, gather resources and construct a base to capture, tame and train Pals, each with distinct abilities. Explore the world, uncover secrets and forge alliances or rivalries with other survivors in online co-op play mode.

Embark on adventure with these trusty Pals through a GeForce NOW membership. With a Priority membership, enjoy up to six hours of uninterrupted gaming sessions, while Ultimate members can extend their playtime to eight hours.

Master New Games

Die By The Blade on GeForce NOW
More than a one-hit wonder.

Vanquish foes with a single strike in 1v1 weapon-based fighter Die by the Blade from Grindstone. Dive into a samurai punk world and wield a range of traditional Japanese weapons. Take up arms and crush friends in local or online multiplayer, or take on unknown warriors in online ranked matches. Outwit opponents in intense, tactical battles and master the art of the one-hit kill.

Check out the list of new games this week:

  • Men of War II (New release on Steam, May 15)
  • Die by the Blade (New release on Steam, May 16)
  • Colony Survival (Steam)
  • Palworld (Steam)
  • Tomb Raider: Definitive Edition (Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

What’s Your Story: Jacki O’Neill

What’s Your Story: Jacki O’Neill

Circle photo of Jacki O'Neill, director of the Microsoft Africa Research Institute (MARI), with a microphone in the corner on a blue and green gradient background

In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

In this episode, Gehrke is joined by Jacki O’Neill, director of Microsoft Research Africa, Nairobi (formerly the Microsoft Africa Research Institute, or MARI) in Kenya. O’Neill pitched the idea for the lab after seeing an opportunity to expand the Microsoft research portfolio. She shares how a desire to build tech that can have global societal impact and a familial connection to the continent factored into the decision; how a belief that life is meant to be exciting has allowed her to take big personal and professional swings; and how her team in Nairobi is applying their respective expertise in human-computer interaction, machine learning, and data science to pursue globally equitable AI.

To learn more about the global impact of AI, efforts to make AI more equitable, and related topics, register for Microsoft Research Forum (opens in new tab), a series of panel discussions and lightning talks around science and technology research in the era of general AI.

Photos of Jacki O'Neill, director of the Microsoft Africa Research Institute (MARI), throughout her life.

The post What’s Your Story: Jacki O’Neill appeared first on Microsoft Research.

Read More

NVIDIA, Teradyne and Siemens Gather in the ‘City of Robotics’ to Discuss Autonomous Machines and AI

NVIDIA, Teradyne and Siemens Gather in the ‘City of Robotics’ to Discuss Autonomous Machines and AI

Senior executives from NVIDIA, Siemens and Teradyne Robotics gathered this week in Odense, Denmark, to mark the launch of Teradyne’s new headquarters and discuss the massive advances coming to the robotics industry.

One of Denmark’s oldest cities and known as the city of robotics, Odense is home to over 160 robotics companies with 3,700 employees and contributes profoundly to the industry’s progress.

Teradyne Robotics’ new hub there, which includes cobot company Universal Robots (UR) and autonomous mobile robot (AMR) company MiR, is set to help employees maximize collaborative efforts, foster innovation and provide an environment to revolutionize advanced robotics and autonomous machines.

The grand opening showcased the latest AI robotic applications and featured a panel discussion on the future of advanced robotics. Speakers included Ujjwal Kumar, group president at Teradyne Robotics; Rainer Brehm, CEO of Siemens Factory Automation; and Deepu Talla, vice president of robotics and edge computing at NVIDIA.

“The advent of generative AI coupled with simulation and digital twins technology is at a tipping point right now, and that combination is going to change the trajectory of robotics,” commented Talla.

The Power of Partnerships

The discussion comes as the global robotics market continues to grow rapidly. The cobots market in Europe was valued at $286 million in 2022 and is projected to reach $6.7 billion by 2032, at a yearly growth rate of more than 37%.

Panelists discussed why teaming up is key to innovation for any company — whether a startup or an enterprise — and how physical AI is being used across businesses and workplaces, stressing the game-changing impact of advanced robotics.

The alliance between NVIDIA and Teradyne Robotics, which includes an AI-based intra-logistics solution alongside Siemens, showcases the strength of collaboration across the ecosystem. NVIDIA’s prominent role as a physical AI hardware provider is boosting the cobot and AMR sectors with accelerated computing, while its collaboration with Siemens is transforming industrial automation.

“NVIDIA provides all the core AI capabilities that get integrated into the hundreds and thousands of companies building robotic platforms and robots, so our approach is 100% collaboration,” Talla said.

“What excites me most about AI and robots is that collaboration is at the core of solving our customers’ problems,” Kumar added. “No one company has all the technologies needed to address these problems, so we must work together to understand and solve them at a very fast pace.”

Accelerating Innovation With AI 

AI has already made huge strides across industries and plays an important role in enhancing advanced robotics. Leveraging machine learning, computer vision and natural language processing, AI gives robots the cognitive capability to understand, learn and make decisions.

“For humans, we have our senses, but it’s not that easy for a robot, so you have to build these AI capabilities for autonomous navigation,” Talla said. “NVIDIA’s Isaac platform is enabling increased autonomy in robotics with rapid advancements in simulation, generative AI, foundation models and optimized edge computing.”

NVIDIA is working closely with the UR team to infuse AI into UR’s robotics software technology. In the case of autonomous mobile robots that move things from point A to B to C, it’s all about operating in unstructured environments and navigating autonomously.

Brehm emphasized the need to scale AI by industrializing it, allowing for automated deployment, inference and monitoring of models. He spoke about empowering customers to utilize AI effortlessly, even without AI expertise. “We want to enhance automation for more skill-based automation systems in the future,” he said.

As a leading robotics company with one of the largest installed bases of collaborative and AMRs, Teradyne has identified a long list of industry problems and is working closely with NVIDIA to solve them.

“I use the term ‘physical AI’ as opposed to ‘digital AI’ because we are taking AI to a whole new level by applying it in the physical world,” said Kumar. “We see it helping our customers in three ways: adding new capabilities to our robots, making our robots smarter with advanced path planning and navigation, and further enhancing the safety and reliability of our collaborative robots.”

The Impact of Real-World Robotics

Autonomous machines, or AI robots, are already making a noticeable difference in the real world, from industries to our daily lives. Industries such as manufacturing are using advanced robotics to enhance efficiency, accuracy and productivity.

Companies want to produce goods close to where they are consumed, with sustainability being a key driver. But this often means setting up shop in high-cost countries. The challenge is twofold: producing at competitive prices and dealing with shrinking, aging workforces that are less available for factory jobs.

“The problem for large manufacturers is the same as what small and medium manufacturers have always faced: variability,” Kumar said. “High-volume industrial robots don’t suit applications requiring continuous design tweaks. Collaborative robots combined with AI offer solutions to the pain points that small and medium customers have lived with for years, and to the new challenges now faced by large manufacturers.”

Automation isn’t just about making things faster; it’s also about making the most of the workforce. In manufacturing, automation aids smoother processes, ramps up safety, saves time and relieves pressure on employees.

“Automation is crucial and, to get there, AI is a game-changer for solving problems,” Brehm said.

AI and computing technologies are set to redefine the robotics landscape, transforming robots from mere tools to intelligent partners capable of autonomy and adaptability across industries.

Feature image by Steffen Stamp. Left to right: Fleur Nielsen, head of communications at Universal Robots; Deepu Talla, head of robotics at NVIDIA; Rainer Brehm, CEO of Siemens Factory Automation; and Ujjwal Kumar, president of Teradyne Robotics.

Read More