Tail-Safe Hedging: Explainable Risk-Sensitive Reinforcement Learning with a White-Box CBF--QP Safety Layer in Arbitrage-Free Markets

Author

Jian'an Zhang

Abstract

We introduce Tail-Safe, a deployability-oriented framework for derivatives hedging that unifies distributional, risk-sensitive reinforcement learning with a white-box control-barrier-function (CBF) quadratic-program (QP) safety layer tailored to financial constraints. The learning component combines an IQN-based distributional critic with a CVaR objective (IQN--CVaR--PPO) and a Tail-Coverage Controller that regulates quantile sampling through temperature tilting and tail boosting to stabilize small-$\alpha$ estimation. The safety component enforces discrete-time CBF inequalities together with domain-specific constraints -- ellipsoidal no-trade bands, box and rate limits, and a sign-consistency gate -- solved as a convex QP whose telemetry (active sets, tightness, rate utilization, gate scores, slack, and solver status) forms an auditable trail for governance. We provide guarantees of robust forward invariance of the safe set under bounded model mismatch, a minimal-deviation projection interpretation of the QP, a KL-to-DRO upper bound linking per-state KL regularization to worst-case CVaR, concentration and sample-complexity results for the temperature-tilted CVaR estimator, and a CVaR trust-region improvement inequality under KL limits, together with feasibility persistence under expiry-aware tightening. Empirically, in arbitrage-free, microstructure-aware synthetic markets (SSVI $\to$ Dupire $\to$ VIX with ABIDES/MockLOB execution), Tail-Safe improves left-tail risk without degrading central performance and yields zero hard-constraint violations whenever the QP is feasible with zero slack. Telemetry is mapped to governance dashboards and incident workflows to support explainability and auditability. Limitations include reliance on synthetic data and simplified execution to isolate methodological contributions.
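The minimal-deviation projection reading of the safety layer can be illustrated with a small sketch. It is not the paper's implementation: it keeps only the box and rate limits, for which the QP min ||u - u_rl||^2 reduces to a per-coordinate clip, and it omits the CBF inequalities, ellipsoidal no-trade bands, sign-consistency gate, and slack, which would need a general QP solver. The function name `project_to_limits`, its parameters, and the telemetry field names are hypothetical, chosen for illustration only.

```python
import numpy as np


def project_to_limits(u_rl, u_prev, u_min, u_max, rate_max):
    """Solve min ||u - u_rl||^2 subject to box and rate limits only.

    Assumption: rate_max > 0, and constraints are separable per coordinate,
    so the Euclidean projection has a closed form (no QP solver needed).
    """
    u_rl = np.asarray(u_rl, dtype=float)
    u_prev = np.asarray(u_prev, dtype=float)

    # Box and rate limits are both per-coordinate intervals; intersect them.
    lo = np.maximum(u_min, u_prev - rate_max)
    hi = np.minimum(u_max, u_prev + rate_max)
    if np.any(lo > hi):
        raise ValueError("box and rate limits are jointly infeasible")

    # For interval constraints the minimal-deviation projection is a clip.
    u_safe = np.clip(u_rl, lo, hi)

    # Hypothetical telemetry fields, loosely mirroring the audit trail idea.
    telemetry = {
        "active_lower": np.flatnonzero(u_rl < lo).tolist(),
        "active_upper": np.flatnonzero(u_rl > hi).tolist(),
        "rate_utilization": (np.abs(u_safe - u_prev) / rate_max).tolist(),
        "deviation": float(np.linalg.norm(u_safe - u_rl)),
    }
    return u_safe, telemetry


# Example: the first coordinate asks for a move larger than the rate limit.
u_safe, log = project_to_limits([0.9, -0.2], [0.5, 0.0], -1.0, 1.0, 0.25)
print(u_safe, log)
```

In this simplified setting the proposed action is left unchanged whenever it already satisfies the limits, and the indices in `active_lower` and `active_upper` record which limits actually altered it.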

Suggested Citation

Jian'an Zhang, 2025. "Tail-Safe Hedging: Explainable Risk-Sensitive Reinforcement Learning with a White-Box CBF--QP Safety Layer in Arbitrage-Free Markets," Papers 2510.04555, arXiv.org.

Handle: RePEc:arx:papers:2510.04555

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Jian'an Zhang, 2025. "Tail-Safe Stochastic-Control SPX-VIX Hedging: A White-Box Bridge Between AI Sensitivities and Arbitrage-Free Market Dynamics," Papers 2510.15937, arXiv.org.
Gianbiagio Curato & Jim Gatheral & Fabrizio Lillo, 2014. "Optimal execution with nonlinear transient market impact," Papers 1412.4839, arXiv.org.
Martin D. Gould & Mason A. Porter & Stacy Williams & Mark McDonald & Daniel J. Fenn & Sam D. Howison, 2010. "Limit Order Books," Papers 1012.0349, arXiv.org, revised Apr 2013.
Seungki Min & Costis Maglaras & Ciamac C. Moallemi, 2018. "Cross-Sectional Variation of Intraday Liquidity, Cross-Impact, and their Effect on Portfolio Execution," Papers 1811.05524, arXiv.org.
Olivier Guéant, 2016. "The Financial Mathematics of Market Liquidity: From Optimal Execution to Market Making," Post-Print hal-01393136, HAL.
Emilio Said, 2022. "Market Impact: Empirical Evidence, Theory and Practice," Working Papers hal-03668669, HAL.
Puru Gupta & Saul D. Jacka, 2023. "Portfolio Choice In Dynamic Thin Markets: Merton Meets Cournot," Papers 2309.16047, arXiv.org.
Jian'an Zhang, 2025. "Risk-Sensitive Option Market Making with Arbitrage-Free eSSVI Surfaces: A Constrained RL and Stochastic Control Bridge," Papers 2510.04569, arXiv.org.
A. Sadoghi & J. Vecer, 2015. "Optimum Liquidation Problem Associated with the Poisson Cluster Process," Papers 1507.06514, arXiv.org, revised Dec 2015.
Sadoghi, Amirhossein & Vecer, Jan, 2022. "Optimal liquidation problem in illiquid markets," European Journal of Operational Research, Elsevier, vol. 296(3), pages 1050-1066.
Amirhossein Sadoghi & Jan Vecer, 2022. "Optimal liquidation problem in illiquid markets," Post-Print hal-03696768, HAL.
Antje Fruth & Torsten Schöneborn & Mikhail Urusov, 2014. "Optimal Trade Execution And Price Manipulation In Order Books With Time-Varying Liquidity," Mathematical Finance, Wiley Blackwell, vol. 24(4), pages 651-695, October.
- Antje Fruth & Torsten Schoeneborn & Mikhail Urusov, 2011. "Optimal trade execution and price manipulation in order books with time-varying liquidity," Papers 1109.2631, arXiv.org.
Yuheng Zheng & Zihan Ding, 2024. "Reinforcement Learning in High-frequency Market Making," Papers 2407.21025, arXiv.org, revised Aug 2024.
Nico Achtsis & Dirk Nuyens, 2013. "A Monte Carlo method for optimal portfolio executions," Papers 1312.5919, arXiv.org.
Beomsoo Park & Benjamin Van Roy, 2015. "Adaptive Execution: Exploration and Learning of Price Impact," Operations Research, INFORMS, vol. 63(5), pages 1058-1076, October.
Emilio Said, 2022. "Market Impact: Empirical Evidence, Theory and Practice," Papers 2205.07385, arXiv.org.
Masamitsu Ohnishi & Makoto Shimoshimizu, 2024. "Trade execution games in a Markovian environment," Papers 2405.07184, arXiv.org.
Beomsoo Park & Benjamin Van Roy, 2012. "Adaptive Execution: Exploration and Learning of Price Impact," Papers 1207.6423, arXiv.org.
Gianbiagio Curato & Jim Gatheral & Fabrizio Lillo, 2017. "Optimal execution with non-linear transient market impact," Quantitative Finance, Taylor & Francis Journals, vol. 17(1), pages 41-54, January.
Igor Skachkov, 2013. "Market Impact Paradoxes," Papers 1312.3349, arXiv.org.
