Printed from https://ideas.repec.org/a/eee/transe/v208y2026ics1366554525006751.html

Discriminatory order assignment and payment-setting of on-demand food-delivery platforms: A multi-action and multi-agent reinforcement learning framework

Author

Listed:
  • Zhao, Zijian
  • Li, Sen

Abstract

This paper studies discriminatory order assignment and payment-setting strategies for on-demand food-delivery platforms. We consider a platform that coordinates customers, couriers, and restaurants to maximize its profit. In real time, it determines how to bundle orders, assign orders to couriers, and set courier payments. These decisions are personalized: they depend on historical data collected from each courier, such as order acceptance and rejection rates under distinct order assignment and payment scenarios. A Markov Decision Process is formulated for the courier, capturing the decisions of the platform (including differentiated order assignment/bundling strategies and discriminatory payment-setting decisions) and their dependence on the personalized work-related data of each individual courier. To derive the optimal policies, we propose a novel multi-action and multi-agent deep reinforcement learning framework, in which a double Deep Q-Network is employed to develop discrete order assignment strategies and double Proximal Policy Optimization is used to determine continuous payment decisions. Within this learning framework, we introduce a novel neural network architecture that leverages the Query-Key attention mechanism to reduce the time complexity of order assignment from multiplicative to additive, and we adopt a variable-length Bi-LSTM module that compresses variable-length order sequences into a fixed-dimensional feature space to enhance scalability. The proposed neural network and algorithmic framework were validated in a case study using real-world food-delivery data from Hong Kong.
By comparing the proposed method with a vanilla MLP-based neural network architecture, we find that the proposed architecture significantly enhances platform performance: it increases the number of orders served by 5.25%, reduces platform expenses by 10%, and improves the overall reward of the platform by over 50%. Additionally, our results reveal that couriers with higher order rejection rates receive more orders during peak hours but earn lower wages. This counterintuitive finding is attributed to a strategic approach by the platform to differentiate order allocation: instead of simply allocating fewer orders to couriers with higher rejection rates, the platform preferentially assigns longer-distance trips to couriers with a higher likelihood of order acceptance. These findings expose the implicit biases in the discriminatory algorithms used by the profit-maximizing platform and highlight potential areas for governmental regulatory intervention. The code for this paper is available at https://github.com/RS2002/Discriminatory-Food-Delivery.
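
The abstract does not spell out how the Query-Key mechanism turns a multiplicative cost into an additive one. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is that each courier is encoded once into a query vector and each order once into a key vector, so the encoder runs N + M times instead of once per courier-order pair (N × M); the pairwise scores are then cheap dot products. Every name below (`make_linear`, `encode`, `query_key_scores`), the random linear encoders, and the feature sizes are assumptions made for the example.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a learned encoder (assumption)."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(weights, features, counter):
    """Apply the linear map to one feature vector and count the encoder call."""
    counter["calls"] += 1
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def query_key_scores(couriers, orders, w_q, w_k, counter):
    """Score every courier-order pair with scaled dot products."""
    queries = [encode(w_q, c, counter) for c in couriers]  # N encoder calls
    keys = [encode(w_k, o, counter) for o in orders]       # M encoder calls
    scale = len(w_q) ** 0.5  # square root of the embedding dimension
    return [
        [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        for q in queries
    ]


rng = random.Random(0)
couriers = [[rng.random() for _ in range(4)] for _ in range(3)]  # 3 couriers, 4 features
orders = [[rng.random() for _ in range(5)] for _ in range(6)]    # 6 orders, 5 features
w_q = make_linear(4, 8, rng)  # courier features -> 8-dim query
w_k = make_linear(5, 8, rng)  # order features   -> 8-dim key

counter = {"calls": 0}
scores = query_key_scores(couriers, orders, w_q, w_k, counter)

# Highest-scoring order index per courier; a real assignment step would also
# have to resolve conflicts when two couriers pick the same order.
greedy = [max(range(len(orders)), key=lambda j: row[j]) for row in scores]
```

With 3 couriers and 6 orders the encoder is called 9 times, whereas a network fed each concatenated courier-order pair would need 18 forward passes; the gap widens as either side grows.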

Suggested Citation

  • Zhao, Zijian & Li, Sen, 2026. "Discriminatory order assignment and payment-setting of on-demand food-delivery platforms: A multi-action and multi-agent reinforcement learning framework," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 208(C).
  • Handle: RePEc:eee:transe:v:208:y:2026:i:c:s1366554525006751
    DOI: 10.1016/j.tre.2025.104653

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1366554525006751
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.tre.2025.104653?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

