Printed from https://ideas.repec.org/a/inm/orijds/v4y2025i3p248-264.html

Stochastic Neighbourhood Components Analysis

Author

Listed:
  • Graham Laidler

    (STOR-i Centre for Doctoral Training, Lancaster University, Lancaster LA1 4YW, United Kingdom)

  • Lucy E. Morgan

    (Department of Management Science, Lancaster University, Lancaster LA1 4YW, United Kingdom)

  • Nicos G. Pavlidis

    (Department of Management Science, Lancaster University, Lancaster LA1 4YW, United Kingdom)

  • Barry L. Nelson

    (Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208)

Abstract

Distance metric learning is a fundamental task in data mining and is known to enhance the performance of various distance-based algorithms. In this paper, we consider stochastic training data in which repeated feature vectors can belong to different classes, a scenario in which existing methods of metric learning are known to struggle. This type of data is common in stochastic simulations, where multidimensional, recurrent system states are subject to inherent randomness. Classification models on such high-resolution simulation-generated data play a critical role in real-time decision making across diverse applications. This paper presents and implements a stochastic version of the popular neighbourhood components analysis. We demonstrate its behaviour on stochastic data using simulation models and reveal its advantages when used for nearest neighbour classification. Meanwhile, the assumptions of stochastic labelling and repeated feature vectors extend to data from various domains, suggesting that the method can attain broad impact. For example, beyond its applications to system control and decision making with digital twin simulation, it may enhance the analysis of data from sensor networks, recommender systems, and crowdsourced platforms, where stochasticity and recurring feature patterns are typical.

Suggested Citation

  • Graham Laidler & Lucy E. Morgan & Nicos G. Pavlidis & Barry L. Nelson, 2025. "Stochastic Neighbourhood Components Analysis," INFORMS Journal on Data Science, INFORMS, vol. 4(3), pages 248-264, July.
  • Handle: RePEc:inm:orijds:v:4:y:2025:i:3:p:248-264
    DOI: 10.1287/ijds.2023.0018

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2023.0018
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijds.2023.0018?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Carlos Henrique dos Santos & José Arnaldo Barra Montevechi & José Antônio de Queiroz & Rafael de Carvalho Miranda & Fabiano Leal, 2022. "Decision support in productive processes through DES and ABS in the Digital Twin era: a systematic literature review," International Journal of Production Research, Taylor & Francis Journals, vol. 60(8), pages 2662-2681, April.
    2. Yujing Lin & Barry L. Nelson & Linda Pei, 2019. "Virtual Statistics in Simulation via k Nearest Neighbors," INFORMS Journal on Computing, INFORMS, vol. 31(3), pages 576-592, July.
    3. Haihui Shen & L. Jeff Hong & Xiaowei Zhang, 2021. "Ranking and Selection with Covariates for Personalized Decision Making," INFORMS Journal on Computing, INFORMS, vol. 33(4), pages 1500-1519, October.
    4. Guangxin Jiang & L. Jeff Hong & Barry L. Nelson, 2020. "Online Risk Monitoring Using Offline Simulation," INFORMS Journal on Computing, INFORMS, vol. 32(2), pages 356-375, April.
    5. B L Nelson, 2016. "‘Some tactical problems in digital simulation’ for the next 10 years," Journal of Simulation, Taylor & Francis Journals, vol. 10(1), pages 2-11, February.
    6. L. Jeff Hong & Guangxin Jiang, 2019. "Offline Simulation Online Application: A New Framework of Simulation-Based Decision Making," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 36(06), pages 1-22, December.
    7. Morgan, Lucy E. & Barton, Russell R., 2022. "Fourier trajectory analysis for system discrimination," European Journal of Operational Research, Elsevier, vol. 296(1), pages 203-217.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Dongyang & Chew, Ek Peng & Li, Haobin & Yücesan, Enver & Chen, Chun-Hung, 2026. "Efficient simulation budget allocation for contextual ranking and selection with quadratic models," European Journal of Operational Research, Elsevier, vol. 328(3), pages 862-876.
    2. Guangxin Jiang & L. Jeff Hong & Haihui Shen, 2024. "Real-Time Derivative Pricing and Hedging with Consistent Metamodels," INFORMS Journal on Computing, INFORMS, vol. 36(5), pages 1168-1189, September.
    3. Jiaqi Zhou & Ilya O. Ryzhov, 2025. "Technical Note—A New Rate-Optimal Sampling Allocation for Linear Belief Models," Operations Research, INFORMS, vol. 73(1), pages 239-250, January.
    4. Cheng Li & Siyang Gao & Jianzhong Du, 2023. "Convergence Analysis of Stochastic Kriging-Assisted Simulation with Random Covariates," INFORMS Journal on Computing, INFORMS, vol. 35(2), pages 386-402, March.
    5. Morgan, Lucy E. & Barton, Russell R., 2025. "Statistical process control for queue length trajectories using Fourier analysis," European Journal of Operational Research, Elsevier, vol. 325(2), pages 233-246.
    6. Decui Liang & Fangshun Li & Xinyi Chen, 2024. "Failure mode and effect analysis by exploiting text mining and multi-view group consensus for the defect detection of electric vehicles in social media data," Annals of Operations Research, Springer, vol. 340(1), pages 289-324, September.
    7. Jack P. C. Kleijnen & Wim C. M. van Beers, 2022. "Statistical Tests for Cross-Validation of Kriging Models," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 607-621, January.
    8. Eunji Lim & Peter W. Glynn, 2023. "Simulation-Based Prediction," Operations Research, INFORMS, vol. 71(1), pages 47-60, January.
    9. Xin Yun & Yanyi Ye & Hao Liu & Yi Li & Kin-Keung Lai, 2023. "Stylized Model of Lévy Process in Risk Estimation," Mathematics, MDPI, vol. 11(6), pages 1-14, March.
    10. Haidong Li & Henry Lam & Yijie Peng, 2024. "Efficient Learning for Clustering and Optimizing Context-Dependent Designs," Operations Research, INFORMS, vol. 72(2), pages 617-638, March.
    11. Gregory Keslin & Barry L. Nelson & Bernardo Pagnoncelli & Matthew Plumlee & Hamed Rahimian, 2025. "Ranking and Contextual Selection," Operations Research, INFORMS, vol. 73(5), pages 2695-2707, September.
    12. Yi Zhu & Jing Dong & Henry Lam, 2024. "Uncertainty Quantification and Exploration for Reinforcement Learning," Operations Research, INFORMS, vol. 72(4), pages 1689-1709, July.
    13. Mohammadmahdi Ghasemloo & David J. Eckman, 2026. "An Agglomerative Clustering Algorithm for Simulation Output Distributions Using Regularized Wasserstein Distance," INFORMS Journal on Data Science, INFORMS, vol. 5(1), pages 65-80, January.
    14. Taeho Kim & Kyoung-Kuk Kim & Eunhye Song, 2025. "Selection of the Most Probable Best," Operations Research, INFORMS, vol. 73(6), pages 3199-3218, November.
    15. Wang, Tianxiang & Xu, Jie & Branke, Juergen & Hu, Jian-Qiang & Chen, Chun-Hung, 2025. "Ranking and selection with two-stage decision," European Journal of Operational Research, Elsevier, vol. 322(1), pages 121-132.
    16. L. Jeff Hong & Guangxin Jiang & Ying Zhong, 2022. "Solving Large-Scale Fixed-Budget Ranking and Selection Problems," INFORMS Journal on Computing, INFORMS, vol. 34(6), pages 2930-2949, November.
    17. Yufeng Zheng & Zeyu Zheng & Tingyu Zhu, 2025. "A Doubly Stochastic Simulator with Applications in Arrivals Modeling and Simulation," Operations Research, INFORMS, vol. 73(2), pages 910-926, March.
    18. Jensen, Kimberly L. & Hughes, David L. & DeLong, Karen L. & Trejo-Pech, Carlos O. & Gill, Mackenzie B., 2021. "Factors Influencing Tennessee Adults’ Craft Hard Apple Cidery Visit Expenditures and Travel Distance," Journal of Food Distribution Research, Food Distribution Research Society, vol. 52(2), July.
    19. Du-Yi Wang & Guo Liang & Kun Zhang & Qianwen Zhu, 2026. "Reliable Real-Time Value at Risk Estimation via Quantile Regression Forest with Conformal Calibration," Papers 2602.01912, arXiv.org.
    20. Vahid Roshanaei & Bahman Naderi & Opher Baron & Dmitry Krass, 2024. "An Interactive Spreadsheet Model for Teaching Classification Using Logistic Regression," INFORMS Transactions on Education, INFORMS, vol. 25(1), pages 55-80, September.
