
Smart Data Portfolios: A Governance Framework for AI Training Data

Authors

  • A. Talha Yalta
  • A. Yasemin Yalta

Abstract

Contemporary AI regulation, including the EU Artificial Intelligence Act and related governance frameworks, increasingly requires institutions to justify the training data used in automated decision-making. Yet existing governance regimes provide limited operational methods for selecting, weighting, and explaining data inputs. We introduce the Smart Data Portfolio (SDP) framework, which treats data categories as productive but risk-bearing assets, formalizing input governance as an information-risk trade-off. Within this framework, we define two portfolio-level quantities, Informational Return and Governance-Adjusted Risk, whose interaction characterizes attainable data mixtures and yields a Governance-Efficient Frontier. Regulators shape this frontier through risk caps, admissible categories, and weight bands that translate fairness, privacy, robustness, and provenance requirements into measurable constraints on data allocation while preserving model flexibility. A sectoral illustration shows how different AI services require distinct portfolios within a common governance structure. The framework provides an input-level explanation layer through which institutions can justify governed data use in large-scale AI deployment.
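The abstract does not give the formal definitions of Informational Return or Governance-Adjusted Risk, so the following is only a minimal sketch of the kind of constrained allocation it describes, built by analogy with mean-variance portfolio selection. Everything concrete in it is an assumption made for illustration: the three category names, the per-category return figures, the risk covariance matrix, the weight bands, the linear return and square-root-of-quadratic-form risk, and the helper names `best_portfolio` and `frontier`. It traces an approximate frontier by finding, for each regulator risk cap, the admissible data mixture with the highest return.

```python
from itertools import product

# Hypothetical inputs: none of these values or functional forms come from the paper.
CATEGORIES = ["transactions", "demographics", "web_behaviour"]
INFO_RETURN = [0.08, 0.05, 0.12]  # assumed informational return per category
RISK_COV = [                      # assumed governance-risk covariance matrix
    [0.010, 0.002, 0.004],
    [0.002, 0.020, 0.006],
    [0.004, 0.006, 0.050],
]
WEIGHT_BANDS = [(0.2, 1.0), (0.0, 0.3), (0.0, 0.5)]  # assumed regulator weight bands


def informational_return(w):
    """Assumed linear form: weighted sum of per-category returns."""
    return sum(wi * ri for wi, ri in zip(w, INFO_RETURN))


def governance_adjusted_risk(w):
    """Assumed form: square root of the quadratic form w' * RISK_COV * w."""
    n = len(w)
    variance = sum(w[i] * RISK_COV[i][j] * w[j] for i in range(n) for j in range(n))
    return variance ** 0.5


def admissible_weights(step=0.05):
    """Yield grid points on the simplex (weights sum to 1) inside the weight bands."""
    n = round(1 / step)
    for i, j in product(range(n + 1), repeat=2):
        k = n - i - j
        if k < 0:
            continue
        w = (i / n, j / n, k / n)
        if all(lo - 1e-9 <= wi <= hi + 1e-9 for wi, (lo, hi) in zip(w, WEIGHT_BANDS)):
            yield w


def best_portfolio(risk_cap, step=0.05):
    """Return the highest-return admissible mixture under the risk cap, or None."""
    feasible = [
        w for w in admissible_weights(step)
        if governance_adjusted_risk(w) <= risk_cap
    ]
    if not feasible:
        return None
    return max(feasible, key=informational_return)


def frontier(risk_caps, step=0.05):
    """Trace (risk cap, best return, weights) points for a list of risk caps."""
    points = []
    for cap in risk_caps:
        w = best_portfolio(cap, step)
        if w is not None:
            points.append((cap, informational_return(w), w))
    return points


if __name__ == "__main__":
    for cap, ret, w in frontier([0.10, 0.12, 0.15, 0.20]):
        mix = ", ".join(f"{name}={wi:.2f}" for name, wi in zip(CATEGORIES, w))
        print(f"risk cap {cap:.2f}: return {ret:.4f} with {mix}")
```

A grid search is used here only to keep the sketch dependency-free; a real implementation would more plausibly use a convex solver. With these assumed inputs, a cap loose enough not to bind leaves the weight bands as the only active constraints, so `best_portfolio(0.5)` returns `(0.5, 0.0, 0.5)`, while a cap below the minimum attainable risk returns `None`.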

Suggested Citation

  • A. Talha Yalta & A. Yasemin Yalta, 2025. "Smart Data Portfolios: A Governance Framework for AI Training Data," Papers 2512.16452, arXiv.org, revised Feb 2026.
  • Handle: RePEc:arx:papers:2512.16452

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.16452
    File Function: Latest version
    Download Restriction: no