Printed from https://ideas.repec.org/a/gam/jdataj/v8y2023i6p109-d1171226.html

Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant

Authors

Listed:
  • Liliya A. Demidova

    (Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia)

  • Elena G. Andrianova

    (Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia)

  • Peter N. Sovietov

    (Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia)

  • Artyom V. Gorchakov

    (Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia)

Abstract

This paper presents a dataset of automatically collected source code solving unique programming exercises of different types. The exercises were generated automatically by the Digital Teaching Assistant (DTA) system, which automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). The source code of the small programs, grouped by the type of task solved, can be used for benchmarking source code classification and clustering algorithms. The data can also be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and other applications may yet be found. We describe the architecture of the DTA system to show in detail how and why the dataset was collected. We also describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences between programs, and apply hierarchical clustering to automatically discover the high-level concepts students use while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify the approaches students implement.
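
The analysis pipeline named in the abstract (Markov-chain program vectors, pairwise Jensen–Shannon divergences, hierarchical clustering) can be illustrated with a minimal standard-library sketch. It is not the DTA implementation. The function names, the choice of parent-to-child AST node-type transitions flattened into one distribution, the average-linkage merge rule, and the four toy programs are all assumptions made for illustration.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def transition_distribution(source):
    """Relative frequencies of parent -> child AST node-type transitions.

    Assumption: a first-order chain over node types, flattened into a
    single probability distribution over transitions.
    """
    counts = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[(type(parent).__name__, type(child).__name__)] += 1
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    result = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = (pk + qk) / 2
        if pk > 0:
            result += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            result += 0.5 * qk * math.log2(qk / mk)
    return result


def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters.pop(b)
        clusters[a].extend(merged)
    return clusters


# Hypothetical toy submissions: two loop-based, two recursive.
programs = [
    "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n",
    "def g(items):\n    acc = 1\n    for item in items:\n        acc *= item\n    return acc\n",
    "def h(n):\n    return 1 if n == 0 else n * h(n - 1)\n",
    "def k(m):\n    return 0 if m == 0 else m + k(m - 1)\n",
]

vectors = [transition_distribution(src) for src in programs]
dist = [[js_divergence(p, q) for q in vectors] for p in vectors]
clusters = agglomerate(dist, n_clusters=2)
print(clusters)
```

On these toy inputs the loop-based programs and the recursive programs should fall into separate clusters, because each pair shares almost all of its node-type transitions. A production setup would more likely use a library routine for the clustering step and cut the dendrogram at a chosen distance threshold rather than fix the number of clusters in advance.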

Suggested Citation

  • Liliya A. Demidova & Elena G. Andrianova & Peter N. Sovietov & Artyom V. Gorchakov, 2023. "Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant," Data, MDPI, vol. 8(6), pages 1-16, June.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:6:p:109-:d:1171226

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/6/109/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/6/109/
    Download Restriction: no

    Citations

    Cited by:

    1. Artyom V. Gorchakov & Liliya A. Demidova & Peter N. Sovietov, 2023. "Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task," Future Internet, MDPI, vol. 15(9), pages 1-28, September.
    2. Liliya A. Demidova & Peter N. Sovietov & Elena G. Andrianova & Anna A. Demidova, 2023. "Anomaly Detection in Student Activity in Solving Unique Programming Exercises: Motivated Students against Suspicious Ones," Data, MDPI, vol. 8(8), pages 1-23, August.
