Author
Listed:
- Antoine Bordas
(CGS i3 - Centre de Gestion Scientifique i3 - Mines Paris - PSL (École nationale supérieure des mines de Paris) - PSL - Université Paris Sciences et Lettres - I3 - Institut interdisciplinaire de l’innovation - CNRS - Centre National de la Recherche Scientifique)
- Pascal Le Masson
(CGS i3 - Centre de Gestion Scientifique i3 - Mines Paris - PSL (École nationale supérieure des mines de Paris) - PSL - Université Paris Sciences et Lettres - I3 - Institut interdisciplinaire de l’innovation - CNRS - Centre National de la Recherche Scientifique)
- Benoit Weil
(CGS i3 - Centre de Gestion Scientifique i3 - Mines Paris - PSL (École nationale supérieure des mines de Paris) - PSL - Université Paris Sciences et Lettres - I3 - Institut interdisciplinaire de l’innovation - CNRS - Centre National de la Recherche Scientifique)
Abstract
In the current data-rich environment, valorizing data has become a common task in data science; it requires designing a statistical model that transforms input data into a desirable output. The data science literature on the design of new models is abundant, while in parallel other streams of literature, such as the epistemology of science, have shown the relevance of anomalies in model design processes. Anomalies are to be understood as unexpected observations in data, a historical example being Mercury's famous anomalous perihelion precession. This paper therefore addresses the various design processes in data science and their relationships to anomalies. To do so, we conceptualize what designing a data science model means, and we derive three design processes based on the latest theories in engineering design. This allows us to formulate assumptions regarding the relationships between each design process and anomalies, which we test with several case studies. Notably, three processes for the design of models in data science are identified and, for each of them, the following information is provided: (1) the various knowledge leveraged and generated and (2) the specific relations with anomalies. From a theoretical standpoint, this work is one of the first applications of design methods in data science. It paves the way for more research at the intersection of engineering design and data science, which could enrich both fields.
Suggested Citation
Antoine Bordas & Pascal Le Masson & Benoit Weil, 2024. "Model design in data science: engineering design to uncover design processes and anomalies," Post-Print hal-04790948, HAL.
Handle: RePEc:hal:journl:hal-04790948
DOI: 10.1007/s00163-024-00442-w