How do Geek+'s algorithms assign multiple shelf-handling tasks to multiple robots? And how do they calculate the relationship between orders, workstations and inventory?
Previously, these decisions considered only the situation at the moment and relied on manually set rules.
But as Dr. Wenzhe Tan, Vice President of Algorithms at Geek+ and Director of the Geek+ AI Research Center, knows, our engineers are capable of so much more.
During the 2022 Global Logistics Technology Conference, an annual event of logistics technology in Asia, Dr. Tan delivered a keynote speech at the Operations Research Logistics Application Forum, sharing the application of operations research in intelligent logistics. He also shared his experience on the application of operations research for tech businesses with experts and scholars from Alibaba, Huawei and top universities in a roundtable discussion.
The 2022 Global Logistics Technology Conference is organized by the CFLP, a major federation of Logistics and Purchasing in Asia. As a benchmark for cutting-edge technology innovation in the industry, the conference brings together global academic leaders and experts to discuss the latest technology achievements and future R&D direction.
In the Operations Research Forum, Warren B. Powell, Professor Emeritus of Princeton University, and Professor Zhao Lei of Tsinghua University's Department of Industrial Engineering both emphasized that Sequential Decision Models in Operations Research have become an effective modeling and analysis tool in current logistics management. Dr. Tan likewise highlighted the application of Sequential Decision Models in robotic intelligent logistics, along with a data-driven way of making system decisions more intelligent, which effectively helps customers reduce costs and increase efficiency.
Application of Sequential Decision Making at Geek+
The robot-based smart warehouse can be modeled as a typical sequential decision model (Figure 1), in which the system collects information on orders, workstations, inventory and robots at each moment, and makes a series of decisions such as dispatch hit, task assignment, and path planning and scheduling.
▲ Figure 1 Sequential decision problem in smart warehouse
This cycle repeats continuously to carry out the functions of the intelligent storage system. Because the algorithm faces a huge amount of information each time and must make complex decisions involving various systems, Geek+ splits the problem into several key steps and tackles them one by one.
Task assignment and dispatch hit are two of the most typical problems. Task assignment (Figure 2) is the problem of distributing shelf-handling tasks among robots. It aims to establish a mapping between multiple tasks and multiple robots, which is a classic operations optimization problem.
▲ Figure 2 Schematic diagram of task assignment
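The mapping between tasks and robots can be illustrated with a minimal sketch. It is not Geek+'s algorithm: it assumes a hypothetical cost matrix (for example, travel time in seconds), one task per robot, and a problem small enough to enumerate every possible assignment. The function name assign_tasks and all numbers are made up for illustration.

```python
from itertools import permutations


def assign_tasks(cost):
    """Return (total_cost, pairs) for the cheapest one-to-one assignment.

    cost[r][t] is a hypothetical cost for robot r to handle task t.
    Assumes as many tasks as robots and a small instance, because every
    permutation is enumerated.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[r] is the task given to robot r in this candidate assignment.
        total = sum(cost[r][perm[r]] for r in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, [(r, best_perm[r]) for r in range(n)]


# Illustrative costs only (rows: robots, columns: tasks).
cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]
total, pairs = assign_tasks(cost)
print(total, pairs)
```

Enumeration grows factorially with the number of robots, so a production system would need a far more scalable method. The sketch only shows what "establishing a mapping" means.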
The second is the dispatch hit problem in the smart warehouse, which aims to establish the matching between orders, workstations and inventory (Figure 3). Previously, decisions for these two typical problems often considered only the situation at the moment and were based on manually set rules. Dr. Wenzhe Tan wondered: if the impact of a decision on the subsequent process were considered, could there be a better decision outcome? In his talk, Dr. Tan introduced Geek+'s series of explorations in sequential decision making for task assignment.
▲ Figure 3 Schematic diagram of dispatch hit
▲ Figure 4 Task assignment
Exploration 1: Historical Data-Driven Task Assignment
Task assignment is the basis for the operation of Geek+ unmanned warehouses, and the quality of task assignment decisions directly determines order completion time, robot utilization and other on-site efficiency indicators (Figure 4). The algorithm usually considers multiple factors such as site traffic, order priority, and shelf heat when making assignments. Through continuous deployment, Geek+ has kept returning to one question: since a warehouse site faces different uncertainties at all times, should the value of a task differ across warehouses, workstation queuing situations, and even picking speeds?
▲ Figure 5 Using the Bellman equation to model the intelligent warehouse task assignment problem
Drawing on complex scenarios and continuous technical innovation, Geek+ proposed a data-driven global intelligent task assignment model (Figure 5). The algorithm first collects a large amount of historical data from different warehouses, moments and scenarios, then retrospectively mines and analyzes the situation at each of those moments to generate an expected value function. During real-time operation, the algorithm calculates the immediate value of the current moment and combines it with the expected value produced by the expected value function. The assignment that yields the largest total system reward in the final matching is adopted.
Operational data is accumulated into the historical database and the trained value function is updated, forming a closed loop of algorithm optimization. Because the approach is data-driven, implementation staff do not need to pre-set rules for each specific warehouse, which significantly shortens the project implementation cycle.
The new combined value captures the impact of both the current state and future expectation, which helps the algorithm better perceive the task state and the expected future state at different moments. In a sampled scenario, it achieved more than 15% improvement in efficiency under the same conditions.
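The closed loop described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Geek+ model: the state labels ("short_queue", "long_queue"), the discount factor GAMMA, the reward numbers, and the choice to estimate the expected value as a simple average of historical returns are all hypothetical.

```python
from collections import defaultdict

GAMMA = 0.9  # assumed discount factor, not taken from the source


def fit_value_function(history):
    """Estimate V(state) as the mean historical return observed for that state.

    history is a list of (state, observed_return) pairs. Both the state
    encoding and the return definition are hypothetical.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for state, ret in history:
        totals[state] += ret
        counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


def choose(candidates, value_fn):
    """Pick the candidate maximizing immediate reward + GAMMA * V(next_state)."""
    def score(c):
        return c["reward"] + GAMMA * value_fn.get(c["next_state"], 0.0)
    return max(candidates, key=score)


# Step 1: learn the expected value function from (made-up) historical data.
history = [("short_queue", 10.0), ("short_queue", 8.0), ("long_queue", 2.0)]
value_fn = fit_value_function(history)

# Step 2: at run time, combine immediate value with expected value.
candidates = [
    {"task": "A", "reward": 5.0, "next_state": "long_queue"},
    {"task": "B", "reward": 4.0, "next_state": "short_queue"},
]
best = choose(candidates, value_fn)

# Step 3: close the loop by recording the new outcome and refitting.
history.append((best["next_state"], 9.0))
value_fn = fit_value_function(history)
```

In this toy example, task A has the higher immediate reward, but task B wins once the expected value of the state it leads to is taken into account. That is the kind of trade-off a rule based only on the current moment cannot see.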
Exploration 2: Adaptive Value Function-Driven Task Assignment Algorithm
The success of Exploration 1 brought considerable efficiency improvement, and at the same time triggered further thinking at Geek+. Can the value function of a task sense environmental information such as order and robot density, and adapt as the system operates?
To address this question, Geek+ proposed a new adaptive value function-driven task assignment algorithm. The algorithm further considers the impact of path planning in the task assignment process. It uses an Online Reinforcement Learning (ORL) method to mine order demand characteristics and adaptively adjust the task assignment strategy so that it accurately senses global efficiency bottlenecks. This makes the optimization process more targeted and closer to real time, improving the operational efficiency of AMRs.
As shown in Figure 7, the whole algorithm process is divided into four steps: information collection, model training, task selection, and path planning. First, the algorithm collects the spatio-temporal information of pickers, AMRs and shelves (e.g. picking time of pickers, estimated task completion time of shelves, location information of AMRs, etc.). After collection, the adaptive planning module models the spatio-temporal information as a Markov Decision Process (MDP) and trains the value function using the Q-Learning method from reinforcement learning. The algorithm then selects appropriate tasks to assign to each AMR based on the value function, and finally plans the path for the AMR based on the selection.
▲ Figure 7 Complete algorithm flowchart
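For readers unfamiliar with Q-Learning, the model training and task selection steps can be sketched with the standard tabular update rule. This is a generic textbook form, not the Geek+ implementation: the state and task names, the hyperparameters ALPHA, GAMMA and EPSILON, and the reward values are hypothetical, and path planning is left out entirely.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters


def q_update(q, state, action, reward, next_state, next_actions):
    """One tabular Q-Learning step:
    Q(s, a) += ALPHA * (r + GAMMA * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


def select_task(q, state, actions, rng):
    """Epsilon-greedy selection: mostly exploit the value function, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


q = defaultdict(float)  # state-action values, all starting at 0.0
tasks = ["shelf_1", "shelf_2"]  # hypothetical candidate tasks

# Online updates from two observed (made-up) outcomes.
q_update(q, "busy_station", "shelf_1", 1.0, "idle_station", tasks)
q_update(q, "busy_station", "shelf_2", 3.0, "idle_station", tasks)

choice = select_task(q, "busy_station", tasks, random.Random(0))
```

Because each update is applied as soon as an outcome is observed, the value table keeps tracking the current operating conditions, which is the property the online learning approach relies on.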
The state definition in the sequential decision model integrates the work states of shelves and pickers. This modeling method helps the reinforcement learning agent better perceive the efficiency bottlenecks in the whole process from handling to picking. In addition, the online learning approach updates the state-action value function in a timely manner, which further improves the adaptiveness of the algorithm.
▲ Figure 8 Efficiency bottleneck changes with time at different stages
From the beginning, Geek+ has been actively exploring algorithmic innovation in task assignment models, which has brought more than 20% improvement in a digital twin environment of the same scenario. In the future, Geek+ will do what it does best: continue to optimize and improve to better empower intelligent logistics.