RL-for-Transportation

Paper list of Reinforcement Learning (RL) applied on transportation

RL-for-Transportation
- Ride-sourcing system
- Intersection control

Ride-sourcing system

Book

Approximate Dynamic Programming: Solving the curses of dimensionality. Powell, W. B. (2007).

Paper

Order dispatching

1. A Taxi Order Dispatch Model based On Combinatorial Optimization. 2017. KDD

predict cancellation probability of vehicle-order pair $p_{ij}$
maximize total success rate:
1. $a_{ij}$: matching decision
2. NP hard combinatorial optimization
  1. HillClimbing Algorithm

2. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. 2018. KDD

bipartite matching + evaluation
1. weight: advantage trick
  1. discounted reward + furture value - current value
2. state: time & space zone (no contextual information)
3. value: policy evaluation (offline)

3. A Deep Value-network Based Approach for Multi-Driver Order Dispatching. 2019. KDD

bipartite matching + policy evaluation
1. weight: advantage value - distance
2. state:
  1. time & space zone + supply-demand info
  2. coarse coding with hierarchical hexagon grid
3. value (offline)
  1. policy evaluation: with supply-demand info
  2. distillation: marginalize time-space value

4. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. 2019. WWW

MARL (on policy)
1. state: contextual information
2. acrion: active order pool
  1. mean action: defined as number of neighbor drivers
3. reward:
  1. order fare
  2. pick distance
  3. destination supply-demand gap

5. Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching. 2019. CIKM

MARL (on policy)
1. state: contextual features
2. TD loss + KL regulizer
  1. regulizer: distribution between order and vehicles

Order delaying

1. Learning to delay in ride-sourcing systems: a multi-agent deep reinforcement learning framework. 2019. TKDE

MARL (on policy)
1. state: contextual features
2. action: {0,1}, match or hold
3. reward: customer waiting time
  1. weighted global + individual reward

2. Optimizing matching time intervals for ride-hailing services using reinforcement learning. 2021. TRC

RL (off policy)
1. state: global grid-based state -> flatten
2. action: {0,1}, match or hold
3. reward:
  1. matching wating time
  2. pickup wiating time

Order pooling

1. Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining. 2018. Big Data

RL (on policy)
1. state: time & space grid
2. action: wait, TK1, TK2
  1. wait: stay current location
  2. TK1: pick orders within max pick time
  3. TK2: TK1 + larger pick time for second order
3. reward: effective distance traveled

2. DeepPool: Distributed Model-free Algorithm for Ride-sharing using Deep Reinforcement Learning. 2019. ITS

RL (on policy)
1. state: global supply-demand profile map
2. action (sequentially): follow shorest path
  1. find another customer
  2. next zone
3. reward
  1. served customer
  2. detour time

3. AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection. 2021.

The same as DeepPool
1. consider the change of MDP （with different models）
2. online Dirichlet change point detection (ODCP) to detect changes

4. An Integrated Decomposition and Approximate Dynamic Programming Approach for On-Demand Ride Pooling. 2018. ITS

ADP
1. decision
  1. routes is determined using the shortest-path strategy
  2. one decision one assignment
  3. linear approximation for value function
  4. linear assignment problem
    1. dual update

5. Neural Approximate Dynamic Programming for On-Demand Ride-Pooling. 2020. AAAI

ADP
1. decision
  1. follow shorest path
  2. generate feasible order combinations
  3. value approximation: individual value
    1. low-dimensional embedding for each location of vehicle
  4. linear assignment
    1. TD update

6. Conditional Expectation Based Value Decomposition For Scalable On-demand Ride Pooling

like NeuralADP
1. average of neighbor value to approximate indivadule value:
2. each neighbor value is the conditional expectation of action
3. final approximation

Order pricing

1. InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms. 2019. ICDM

RL (on-policy)
1. use TD learning to calculate furture value
2. use Contextual Bandit to give price

Vehicle relocation

1. A Cost-Effective Recommender System for Taxi Drivers. 2014. KDD

recommend route for vacant vehicles
1. segment profit
  1. earning:
  2. cost:
2. expected route profits:
3. algorithm
  1. Brute-Force based MNP Recommendation
  2. Recursive Recommendation Strategy
4. multi-driver routes recommendation
  1. recommend the route with the lowest correlationship to the second driver

2. Optimizing Taxi Driver Profit Efficiency: A Spatial Network-based Markov Decision Process Approach. 2015. Big Data

Defined as MDP:
1. calibrate pick probability (discounted by number of taxis)
2. passenger destination probability
solving
1. Rolling Horizon:
2. DP approach
  1. discounting pick prob:

3. Optimal Passenger-Seeking Policies on E-hailing Platforms Using Markov Decision Process and Imitation Learning. 2020. TRC

similar to last one
1. long-horizon
2. discounted prob of competing drivers

4. MOVI: A Model-Free Approach to Dynamic Fleet Management. 2018. INFOCOM

RL (on-policy)
1. state: heatmap + CNN
2. making decision sequentially for each vehicle

5. Credit Assignment For Collective Multiagent RL With Global Rewards. 2018. NIPS

MARL (on policy)
1. centralize critic + decentralized policy
2. conut based global state
3. difference policy gradient
  1. Wonderful Life Utility (WLU)
    1. with and without agent i
  2. Aristrocratic Utility (AU)
    1. fix actions of other agents, marginalize agent i
    2. conterfactual
4. approximating central xritic
  1. linear combination
    1. individual information
    2. f without info of other agents
  2. first-order approximation
    1. those coefficients are evaluated at the overall state-action counts

6. Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning. 2018. KDD

RL (on-policy)
1. state + contextual features
2. action: neighbor girds
  1. sequentially make decision
  2. avoid moving in conflict directions
    1. add collaborative context indicating directions of previous vehicles
  3. avoid moving to low-value grid

7. Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning. 2020. NIPS

Policy evaluation (off-line)
1. time-space value function
  1. dual policy evaluation
    1. conditional value: V(s|b)
    2. marginal value: V(s)
  2. Value-based Policy Search (VPS)
    1. one-step:
    2. two-step:
  3. implementation
    1. step selection: small works well
    2. long search: choose global top points
    3. contextual value:
    4. SD regulizer
      1. add destination supply-demand gap

Joint dispatching and relocation

1. CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms. 2019. CIKM

MARL (on-policy), sequentially decision making
1. hierarchical strucutre
  1. upper level
    1. generate encoding of env using RNN
  2. lower level
    1. using info from upper level, generate prob of different grids
    2. dispatching and relocating
2. reward
  1. gap between manager’s entropy and global average entropy
  2. KL divergence of supplt and demand
3. coordination
  1. using attention to aggregate info of neighbor grids

2. Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms. 2021. KDD

policy evaluation (off-line)
on-line updateing
1. using current transitions
ensemble of offline and online value
dispatching: bipartite matching
relocaintg:

3. An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching. 2021. TNNLS

RL (on-policy)
1. centralized programming model
  1. planning in both dispatching and relocating
2. TD learning for updating value function

4. Path-based dynamic pricing for vehicle allocation in ridesharing systems with fully compliant drivers. 2019. TRB

ADP (marco level)
1. decision
2. path based pricing (market cleaning)
3. routing after distaching (constrained zone choice)
4. order sharing
5. relocation
piece-wise linear approximation of value function

Intersection control

Survey

Dataset

Competition

City Brain Challenge

Paper

Single-agent

1. IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control. 2018. KDD

state:
action: {1,0}
1. change to the next phase
2. keep phase
reward:
1. wighted reward of (queue length, delay, waiting time, light switches, number of vehicles, and travel time)
algorithm:
1. DQN

2. Learning Traffic Signal Control from Demonstrations. 2019. CIKM

1. imitation learning
   1. actor: ![](pic/2021-10-25-19-38-48.png)
   2. critic:![](pic/2021-10-25-19-39-39.png)

3. PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network. 2019. KDD

state:
1. current pahse (one-hot)
2. number of vehicles
action:
1. pre-defined phases
reward:
1. pressure for movement:
2. total pressure:
algorithm:
1. DQN

4. PDLight: A Deep Reinforcement Learning Traffic Light Control Algorithm with Pressure and Dynamic Light Duration. 2020

reward
1. Pressure with Remaining Capacity of Outgoing Lane:

5. Learning Phase Competition for Traffic Signal Control. 2019. CIKM

state
1. current phase (one-hot)
2. number of vehicles
action:
1. pre-defined phases
reward:
1. queue length
invariance:
1. flip and rotation

6. Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. 2020. AAAI

1. FRAP + pressure reward
2. reward:
   1. pressure based on queuing vehicles

7. AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control. 2020. NIPS

state
1. traffic characteristics in each lane
action
1. pre-defined phases
reward
1. pressure
algorithm:
1. PG + MC
invariance
1. topology

8. GeneraLight: Improving Environment Generalization of Traffic Signal Control via Meta Reinforcement Learning. 2020. CIKM

gradient-based meta learning
1. training agent in clusetered environments
2. meta-training

9. MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control. 2020. AAAI

FRAP + gradient-based meta-learning

Multi-agent

1. CoLight: Learning Network-level Cooperation for Traffic Signal Control. 2019. CIKM

information aggregation:
GAT to aggregation neighbor intersection information

2. Multi-agent Reinforcement Learning for Networked System Control. 2020. ICLR

differentiable commnication

3. Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control. 2021

intrinsic reward
1. error of predict neighbor reward and transitions
latent variable policy
1. RNN encoded environment

4. Hierarchically and Cooperatively Learning Traffic Signal Control. 2021. AAAI

Hierarchy
1. select sub-policies with different reward function
weighted local and neighbor reward
1. adaptive weighting mechanism

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
pic		pic
.DS_Store		.DS_Store
LICENSE		LICENSE
README.md		README.md

License

pankajksinha123/RL-for-Transportation

Folders and files

Latest commit

History

Repository files navigation

RL-for-Transportation

Ride-sourcing system

Survey

Dataset

Competition

Book

Paper

Order dispatching

1. A Taxi Order Dispatch Model based On Combinatorial Optimization. 2017. KDD

2. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. 2018. KDD

3. A Deep Value-network Based Approach for Multi-Driver Order Dispatching. 2019. KDD

4. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. 2019. WWW

5. Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching. 2019. CIKM

Order delaying

1. Learning to delay in ride-sourcing systems: a multi-agent deep reinforcement learning framework. 2019. TKDE

2. Optimizing matching time intervals for ride-hailing services using reinforcement learning. 2021. TRC

Order pooling

1. Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining. 2018. Big Data

2. DeepPool: Distributed Model-free Algorithm for Ride-sharing using Deep Reinforcement Learning. 2019. ITS

3. AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection. 2021.

4. An Integrated Decomposition and Approximate Dynamic Programming Approach for On-Demand Ride Pooling. 2018. ITS

5. Neural Approximate Dynamic Programming for On-Demand Ride-Pooling. 2020. AAAI

6. Conditional Expectation Based Value Decomposition For Scalable On-demand Ride Pooling

Order pricing

1. InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms. 2019. ICDM

Vehicle relocation

1. A Cost-Effective Recommender System for Taxi Drivers. 2014. KDD

2. Optimizing Taxi Driver Profit Efficiency: A Spatial Network-based Markov Decision Process Approach. 2015. Big Data

3. Optimal Passenger-Seeking Policies on E-hailing Platforms Using Markov Decision Process and Imitation Learning. 2020. TRC

4. MOVI: A Model-Free Approach to Dynamic Fleet Management. 2018. INFOCOM

5. Credit Assignment For Collective Multiagent RL With Global Rewards. 2018. NIPS

6. Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning. 2018. KDD

7. Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning. 2020. NIPS

Joint dispatching and relocation

1. CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms. 2019. CIKM

2. Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms. 2021. KDD

3. An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching. 2021. TNNLS

4. Path-based dynamic pricing for vehicle allocation in ridesharing systems with fully compliant drivers. 2019. TRB

Intersection control

Survey

Dataset

Competition

Paper

Single-agent

1. IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control. 2018. KDD

2. Learning Traffic Signal Control from Demonstrations. 2019. CIKM

3. PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network. 2019. KDD

4. PDLight: A Deep Reinforcement Learning Traffic Light Control Algorithm with Pressure and Dynamic Light Duration. 2020

5. Learning Phase Competition for Traffic Signal Control. 2019. CIKM

6. Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. 2020. AAAI

7. AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control. 2020. NIPS

8. GeneraLight: Improving Environment Generalization of Traffic Signal Control via Meta Reinforcement Learning. 2020. CIKM

9. MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control. 2020. AAAI

Multi-agent

1. CoLight: Learning Network-level Cooperation for Traffic Signal Control. 2019. CIKM

2. Multi-agent Reinforcement Learning for Networked System Control. 2020. ICLR

3. Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control. 2021

4. Hierarchically and Cooperatively Learning Traffic Signal Control. 2021. AAAI

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages