IDEAS home Printed from https://ideas.repec.org/a/kap/transp/v50y2023i4d10.1007_s11116-022-10279-8.html

A multi-source data fusion framework for joint population, expenditure, and time use synthesis

Author

Listed:
  • Jason Hawkins

    (University of Nebraska – Lincoln)

  • Khandker Nurul Habib

    (University of Toronto)

Abstract

Data are important components of any research; however, it is often the case that the required data are not readily available. Researchers often fuse multiple datasets to obtain the data required to complete their work. In urban simulation, spatially referencing data is of paramount importance to capture local variations in travel and preferences. Data fusion typically obfuscates the spatial reference by merging records from different locations. Population synthesis is used to match these fused household records to a plausible location based on aggregate sociodemographic statistics. In some cases, researchers must also synthesize the necessary data. This paper outlines a data fusion workflow for a statistically valid synthetic population for use in urban simulation models. We develop the framework for the case of household-level expenditure and individuals’ time use patterns. The Greater Toronto Area (GTA) in Canada is used as a testbed. The results of the data fusion and synthesis are validated against statistics from a large-sample travel survey conducted in the GTA, showing a good fit with the validation dataset. Finally, we outline how the framework could be applied in other contexts where a single dataset is unavailable.

Suggested Citation

  • Jason Hawkins & Khandker Nurul Habib, 2023. "A multi-source data fusion framework for joint population, expenditure, and time use synthesis," Transportation, Springer, vol. 50(4), pages 1323-1346, August.
  • Handle: RePEc:kap:transp:v:50:y:2023:i:4:d:10.1007_s11116-022-10279-8
    DOI: 10.1007/s11116-022-10279-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11116-022-10279-8
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.



