Introduction
Generative foundation models have transformed various domains, creating new paradigms for content generation. Integrating these models with domain-specific data enables industry-specific applications. Microsoft Research has used this approach to develop the large market model (LMM) and the Financial Market Simulation Engine (MarS) for the financial domain. These innovations have the potential to empower financial researchers to customize generative models for diverse scenarios, establishing a new paradigm for applying generative models to downstream tasks in financial markets. This integration may provide enhanced efficiency, more accurate insights, and significant advancements in the financial domain.
Applying generative models to financial markets
In recent years, generative foundation models have achieved notable success in fields like natural language processing and media generation. Their rise has sparked a new wave of research and industrial adoption, reshaping production processes across industries. These models excel due to three essential elements: a large volume of high-quality training data; effective tokenization and serialization of core information (such as semantic information in text); and an auto-regressive training approach that models data comprehensively, enabling implicit reasoning.
Building on years of AI applications across industries, Microsoft researchers recognized that combining generative models with domain-specific data could lead to impactful solutions, particularly in finance. The financial market is a prime example, notably for its vast amount of order data, which are characterized by three key features:
- Fine granularity: Orders, as the atomic data in the financial market, provide a comprehensive and detailed representation of the real market. Combined with matching rules, one can reproduce the entire market operation process.
- Large scale: Electronic trading has resulted in the accumulation of massive trade-order data across global exchanges
- Well-structured: The structured nature of order data makes it ideal for tokenization and sequential modeling
-
Download
MarS
These characteristics position order flow data as a critical foundation for generative modeling in financial markets. To this end, Microsoft Research developed the LMM and the MarS, which financial researchers can use to customize generative models for various applications, thus fostering a new paradigm of generative solutions for all downstream tasks in finance. This has the potential to advance efficiency and insight generation in the financial industry.
Tokenization of order flow information
Order flow data is vital for generative models in finance, reflecting real-time interactions among market participants. It offers two types of value:
- Fine-grained market feedback: Each order, especially large ones, may influence others’ decisions, providing a micro-level view of pricing behavior.
- Macroscopic market dynamics: Collective interactions shape trading dynamics over time, capturing the evolution and resolution of conflicts between market forces.
Researchers at Microsoft developed LMM by modeling both individual orders and entire order sets over time. This two-tiered approach captures both fine-grained feedback and macro-level dynamics of competition. Figure 2 shows the tokenization techniques for these models, enabling high-fidelity simulations of complex market dynamics.
Expansion law of large market model: Unlocking the potential of financial data
The effectiveness of generative models improves significantly with larger training datasets and model parameters. Researchers at Microsoft used two tokenization strategies to design models based on the Transformer architecture, testing them across varying data scales. Figure 3 illustrates the scaling behavior of both the order and order batch models, highlighting insights from historical trading data. This integration enhances the model’s ability to generate order flows with a deep understanding of market intricacies, enabling more accurate time-series modeling.
Microsoft research podcast
Collaborators: Silica in space with Richard Black and Dexter Greene
College freshman Dexter Greene and Microsoft research manager Richard Black discuss how technology that stores data in glass is supporting students as they expand earlier efforts to communicate what it means to be human to extraterrestrials.
MarS based on LMM
A customizable generative model for financial scenarios
Generative models, once trained, can be easily adapted for a range of downstream tasks, often outperforming traditional models tailored for specific scenarios. Building on the development of LMM, researchers further analyzed the needs of various financial scenarios and designed MarS as a versatile financial market simulation engine. MarS not only serves as a general-purpose simulation tool but also introduces a novel framework for applying generative models across diverse financial tasks, from market prediction and risk assessment to trading strategy optimization.
Constructing a unified paradigm for prediction and detection tasks
Traditional financial prediction solutions often require the development of specialized algorithms, which must be frequently adjusted, consuming time and resources. LMM’s capacity to model financial markets in depth allows for periodic updates based on the latest data. MarS creates a virtual exchange to match order flows generated by LMM, simulating trades and deriving simulated market trajectories (see the top right of Figure 4). This approach can effectively address common prediction and detection tasks in financial scenarios, introducing innovative solutions within the generative model framework.
Applications in prediction tasks
Prediction tasks, vital in finance, involve estimating future market metrics. Traditional models require modifications with any change in prediction targets. MarS addresses this by continuously generating future order flows from recent data, which are matched in a virtual exchange, allowing for the simulation of potential future market trajectories. This provides a significant advancement in forecasting capabilities.
Figure 5 demonstrates the use of MarS in forecasting stock-price movements, where its performance significantly outperforms traditional benchmark algorithms. Taking the Order Model (1.02B) for instance, its performance exceeds that of DeepLOB by approximately (0.662/0.583−1=13.5%) at a 1-minute horizon and increases to (0.579/0.473−1=22.4%) at a 5-minute horizon This widening performance gap suggests that the Order Model maintains its predictive accuracy more effectively over longer horizons, highlighting its superior generalization capability compared to baseline, especially as the prediction task becomes more challenging over extended timeframes. This provides an attractive solution for prediction tasks in financial markets, while also highlighting LMM’s capability in modeling stock market dynamics.
Applications in detection tasks
For regulators, detecting systemic risks or market abuse is critical for market stability. LMM models typical market patterns, enabling the identification of anomalies by comparing real market trajectories with those generated by MarS. Figure 6 shows the differences in the spread distribution (i.e., the difference between the best buy and sell prices, which reflects asset liquidity) between simulated and real market trajectories during a confirmed malicious market manipulation incident. This comparison can uncover subtle deviations indicative of unusual activities, offering regulators effective tools for monitoring market integrity.
Defining new FinTech scenarios
Generative models can create tailored content from simple descriptions. In MarS, a mechanism generates specific order flows from natural language descriptions of market conditions. To address extreme conditions, researchers developed a control signal system using a hierarchical diffusion model to generate high-fidelity signals during rare events, such as stock market crashes and circuit breakers. This capability transforms broad descriptions into precise order flow controls.
By integrating controlled order generation with real-time feedback, MarS creates a unified framework for prediction and detection tasks, redefining financial research, applications, and market understanding. Key applications include “What If” analyses and training environments for reinforcement learning algorithms in realistic market conditions.
“What If” analysis for financial research
The question “What would happen if different sizes of trading orders were executed under different market conditions?” is crucial for understanding market behavior. Traditional methods, relying on real orders, experience, and assumptions, are costly and slow. Generative models provide a breakthrough solution.
Figure 7 illustrates how MarS can simulate market impact: the top left shows how buy orders affect asset price trajectories, while the top right presents market impact curves of different strategies, matching traditional patterns. Researchers also used MarS to generate large-scale simulated data, constructing a market impact model using ordinary differential equations (ODE). The bottom left of Figure 7) shows the derived impact formula, and the bottom right demonstrates its interpretability. These advancements highlight MarS’s potential to enhance “What If” research through deep market modeling.
Training environments for reinforcement learning in financial markets
Reinforcement learning (RL) algorithms require controlled environments for testing and optimization. Financial market behaviors often manifest through order flow changes, impacting the market. If the simulation cannot reflect these impacts accurately, an RL algorithm may fail in real-world scenarios.
MarS provides high-fidelity generation and real-time feedback, creating a comprehensive environment for RL in finance. Figure 8 shows the training process of trading agents, highlighting significant improvements in performance over time and demonstrating MarS’s effectiveness as an RL training ground.
Disclaimer: The research mentioned in this article, conducted by Microsoft Research, focuses on scientific exploration, aiming to advance knowledge and provide theoretical and technological support for research and applications in the financial field. All studies adhere to Microsoft’s responsible AI guidelines, ensuring principles such as fairness, inclusiveness, reliability and safety, transparency, privacy, and accountability are maintained. The technologies and methods discussed are still under research and development, not yet forming any commercial products or services, nor constituting any financial solutions. Readers are advised to consult certified financial professionals before making any financial decisions.
The post MarS: A unified financial market simulation engine in the era of generative foundation models appeared first on Microsoft Research.