
Hands-on training about data clustering with orange data mining toolbox

Author

Listed:
  • Janez Demšar
  • Blaž Zupan

Abstract

Data clustering is a core data science approach widely used and referenced in the scientific literature. Its algorithms are often intuitive and can lead to exciting, insightful results that are easy to interpret. For these reasons, data clustering techniques could be the first method encountered in data science training. This paper proposes a hands-on approach to data clustering training suitable for introductory courses. The educational approach features problem-based training that starts with the data and gradually introduces various data processing and analysis methods, illustrating them through visual representations of data and models. The proposed training is suitable for a general audience, does not require a background in statistics, mathematics, or computer science, and aims to engage the audience through practical examples, an exploratory approach to data analysis with visual analysis, experimentation, and a gentle learning curve. The manuscript details the pedagogical units of the training, motivates them through the sequence of methods introduced, and proposes data sets and data analysis workflows to be explored in the class.

Author summary

The highest satisfaction for any instructor comes from an engaged audience, a motivated class that pays attention, and student questions that open up new avenues for exploring the planned material. Any introduction to data science deserves such an audience, while the burden is on the instructor to prepare an exciting lesson that covers the planned material and keeps students engaged with just the right mix of theory and practice. We could think of no better topic to cover in this way than an introduction to machine learning and no better way to introduce this field than through data clustering, provided, of course, that instructors have the necessary ingredients: use cases to explore, a visual analytics environment to use in the classroom, and a set of problems that intuitively introduce concepts ranging from data representation, similarity scoring, and clustering methods to evaluation and explanation of the resulting models. In the manuscript, we propose the ingredients of such training and offer them in a form ready to be explored by instructors in practical, hands-on courses.

Suggested Citation

  • Janez Demšar & Blaž Zupan, 2024. "Hands-on training about data clustering with orange data mining toolbox," PLOS Computational Biology, Public Library of Science, vol. 20(12), pages 1-9, December.
  • Handle: RePEc:plo:pcbi00:1012574
    DOI: 10.1371/journal.pcbi.1012574

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012574
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012574&type=printable
    Download Restriction: no

