alexa Deadlock-Detection via Reinforcement Learning | Open Access Journals
ISSN: 2169-0316
Industrial Engineering & Management
Make the best use of Scientific Research and information from our 700+ peer reviewed, Open Access Journals that operates with the help of 50,000+ Editorial Board Members and esteemed reviewers and 1000+ Scientific associations in Medical, Clinical, Pharmaceutical, Engineering, Technology and Management Fields.
Meet Inspiring Speakers and Experts at our 3000+ Global Conferenceseries Events with over 600+ Conferences, 1200+ Symposiums and 1200+ Workshops on
Medical, Pharma, Engineering, Science, Technology and Business

Deadlock-Detection via Reinforcement Learning

Chen M* and Rabelo L

Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, USA

*Corresponding Author:
Mengmeng Chen
Department of Industrial Engineering and Management Systems
University of Central Florida, Orlando, USA
Tel: 407 823-2204

Received Date: May 02, 2017; Accepted Date: June 01, 2017; Published Date: June 16, 2017

Citation: Chen M, Rabelo L (2017) Deadlock-Detection via Reinforcement Learning. Ind Eng Manage 6: 215. doi:10.4172/2169-0316.1000215

Copyright: © 2017 Chen M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Visit for more related articles at Industrial Engineering & Management


Optimization of makespan in scheduling is a highly desirable research topic, deadlock detection and prevention is one of the fundamental issues. Supported by what learned from this class, a reinforcement learning approach is developed to unravel this optimization difficulty. By evaluating this RL model on forty classical non-buffer benchmarks and compare with other alternative algorithms, we presented a near-optimal result.


Reinforcement learning; Optimization makespan; DL detection


Due to buffer-less setting, deadlock (DL) occurs frequently in resource sharing environment and concurrent computing systems. A deadlock is a state in which each member of a group of actions is waiting for some other member to release a lock. [1] Once this DL state occurred, workflow would stack in a fixed loop and never discharged.

Figure 1 present a typical scheduling problem: 3 jobs need to be operated on 3 different machines following different sequences, each machine can only operate one job each time. How to schedule jobs in specific sequence to minimize total makespan aka processing time without deadlock is a typical optimization problem. Due to limitation of resources, deadlock happened frequently, other than a feasible solution, to find the global optimal deadlock free solution is difficult.


Figure 1: 3 machine 3 job scheduling problem.

There are certain methods to solving deadlock problems: 1. Do nothing, 2. Kill the workflow, 3. Preempt and rollback. Other than kill the workflow, deadlock detection algorithms are more efficient in most cases and additionally, deadlock free scheduler would enable the realtime control for engineering system. Preventing or avoiding deadlock helping maintain system performance aka makespan stay at positive level.

A simple head-tail scheduling example is present in Figure 2. The rectangle stands for resource Ri, symbol Pi stands for jobs, if rectangle is empty them means no job is operating on that resource. Arrow stands for the action of each job. Second level DL has been addressed in the previous articles. Here on Figure 3, a deck lock is presented. Job P3 is moving to resource 3 then resource 1, but once it moved it will be a deadlock because P1 and P2 will be non-moveable. If P2 move to resource 3 first, then P2 and P3 will be a deadlock as they heading to other’s occupation while there is no buffer.


Figure 2: Deadlock I level.


Figure 3: Deadlock II level.

In this paper we present a new reinforcement learning approach solving this scheduling optimization problem. We will test our algorithm’s on the classical buffer-less benchmark and compare with the optimal solution.

Related Work

Local and global deadlock-detection in component-based systems are NP-hard [2]. Wysk et al. [3] developed a deadlock detection via integer programming in finding the optimum makespan. Specifically, they added constraints to ensure agents not release resource unless being assigned the next resource. However, integer programming method can only be used on small size problem as it would take very long time to run. Their integer programming formulation is shown below.

XiK: Completion time of last operation of job i(term K);

Xikj: Completion time of job i operation k on machine j;

Tikj: processing time of job i operation k on machine j;

Yprqaj: {o,1}: 1 if job p operation r follows job q operations on machine j; 0 if otherwise.

H: big positive number,

E: small positive number,

I: set of all jobs {1,2,3..N},

J: set of all machines {1,2,3..M}.



Xikj-Tikj≥ Xi(k-1)l

Xqrj-Xqrj +H(1-Yprqaj) ≥ Tqrj [∀j∈J; ∀(p,q)∈I]

Xqaj-Xqrj+HYprqaj ≥ Tqaj [∀j∈J; ∀(p,q)∈I]

Xp(r-1)1-Xqaj+H(1-Yprqaj) ≥ E [∀j∈J; ∀(p,q)∈I, l∈J]

Xq(r-1)-Xqrj+HYprqaj ≥ E [∀j∈J; ∀(p,q)∈I, w∈J].

In view of the modeling frameworks in the existing literature, three strategies for processing DL and corresponding research work are as follows:

• Deadlock Prevention, which organizes resource usage by each process to ensure that at least one process is always able to get all the resources it needs [4]. Mixed integer programming [5] and region theory [6] are used to solve elementary siphon [7-9] to avoid deadlock. Edsger et al., Gen and Cheng [6,10] developed Banker’s algorithm in 2006. However, these algorithms have many limitations: need fixed processing numbers; no further processing can be started when executing, and also need fixed resources amount.

• Deadlock Avoidance based on the current system state and agents’ future resource request, by restricting the resource allocation to avoid the deadlock. Petri Net model [3,11-14] are complete developed in this area. Di-graph model and Auto-mata model [6,9,12,15,16] were also built to handle avoidance problems. However, current avoidance algorithms are not able to handle high level DL.

• Deadlock Detection and Recovery is more focusing on quickly deadlock issue once detected. This can be completing in a couple ways [6,14], such as aborting certain action or add additional buffer. Still, detect deadlock and schedule a deadlock free path may be more convenient.

Lots of artificial intelligence and operation research effort has been applied to in scheduling problems. Zhang and Dietterich [17] were the first who applied reinforcement learning here. Mahadevan et al. [18] proposed a reinforcement learning algorithm combines different scheduling problem for the optimization of transfer-lines in manufacturing systems. Another maintenance-based approach based on simplified reinforcement learning is suggested by Zeng and Sycara [19].

Research Methodology

First let’s give definition of deadlock on different levels. The first level of a DL is a set of agents that each request collects the resources held by another agent. The second-level DL is a set of agents any action will result in a first-level DL. The high level DL is a set of agents moving any action will result in a second-level DL. Figure 4 deliver a graph of how a high level DL happens.


Figure 4: Deadlock III level.

We would like to use ranking matrix formulate this action system: S=[sij]M×N stand for the state matrix of the system with i∈l={1,2,…, M} and j∈j={1,2,…, N}

Proposition 1

For agents/ jobs A={ai, i=1,…, MA} in the detection system, we have that,

For i=1 then no existence of b∈l, x∈J where equation

For 1<i<MA, then y∈J where equation

For i=MA, then z∈J and equation(Figure 5).


Figure 5: Simple system.

a1 defined as the Last agent of equation and denoted byequation defined as the First agent where equation denoted byequation defined as the head resource of equation denoted by HR(ai)=z. After Pk changing every state, a new matrix will be derivative: if i ≠ k then equation while i=k. using set R={r R={rj, j∈J={1,2,…, M}} stand for M resources, using set P={pi, i∈l={1,2,…,N}} stand for N agents/jobs, all agents follows the predefined working procedure S=[sij]M×N stand for the system state at certain point. sij stand for the position when agent/job pi being processed on resource ri. (Figure 6.1).


Figure 6.1: System with 2nd level deadlock.

If sij>0 means ri not being occupied by agent/job pi yet; if sij=0 then job pi is occupying resource ri ; if sij>0 then job pi finished tasks on ri. Hence, the ranking matrix of state changed from S1 to state S2 while agent/job p1 transferring position from r2 to r4,


Proposition 2

The second level deadlock exist under necessary and sufficient condition as ls⊆l where;

(a) k∈J as for ∀a∈ls, HR(a)=k.

(b) ∀b∈ls as HJ(b)=b there exist equation that equation, and HJ(c) ≠b (Figure 6.2).


Figure 6.2: System with 2nd level deadlock.

If apply Proposition 2 to the previous example in ranking matrix. The state matrix will be:


Proposition 3

The high level deadlock exist under exist necessary and sufficient condition as lT⊆l which:

(a) x,y∈J for

(i) for ∀alT, [HR(a)-x].[ HR(a)-y]=0

(ii) equation

(b) ∀b∈lT as HJ(b)=b

(i) k∈J as sbk=2;

(ii) if c∈lT as sck=0, then HJ(c) ≠b; o.w., (k-x) (k-y)=0, as d∈lT and equation (Figure 7).


Figure 7: System with 3rd level deadlock.

If apply Proposition 3 to the previous example in ranking matrix. The state matrix will be:


1. HR(1)=HR(2)=HR(4)=3. HR(3)=HR(5)=4 so condition (a) satisfied.

2. HJ(2)=2, HJ(3)=3, HJ(4)=4 and s24=s36=s41=2 so condition [b(i)] satisfied.

3. Resource 4 will be required by Agent 2 in two steps and available; also have s25=3, s25=0 and HR(2)=3≠HR(3)=4. Condition [b(ii)] satisfied as s36=2.

3.1 s36=2, s46=0, and HJ(3)=3≠HJ(4)=4.

3.2 s41=2, s11=0, and HJ(4)=4≠HJ(1)=2.

As mentioned above, we consider all collaborative action teams seeking to optimize global rewards, and we assume that we can use our reinforcement learning approach to model the corresponding multiaction stochastic systems and provide a search algorithm. Therefore, there is at least one action sequence that maximizes the expected return of all movements [20-26].

Definition 1

Set Si⊆S be the system state of i, where si={A (π1), A (π2)…A (πi)}, {Ai} denote actions will be executed, πI denotes the policies of action. The action Ai at state I and the reward value equation, represent by:


Cmax denotes the makespan, equation represent the summary of jobs and machines been involved. Here P (π) represents the performance of policy π, and equation represents the local action reward parameter.

Definition 2

Under the set So⊆S, the policy:

equationis defined as the expected local reward equation and set discount factor γ to 1.


There are 40 classical scheduling benchmark problems for testing. The design of these problems adopts complex structure to increase difficulty. Additionally, if these systems are buffer-less, find scheduling will harder. Gantt chart can be drawn based on a DL-free timesheet obtained by each scheduling benchmark [27-31].

As shown in Figures 8 and 9, they present benchmark LA08 (15 × 5) and benchmark LA16 (10 × 10). We test the performance of our algorithm in this 40 benchmarks with backtracking counting’s, we also compare the running time between with and without DL detection. Algorithms is written in Matlab, and the workflow of our RL algorithm is shown in Figure 10.


Figure 8: Optimal schedule of LA08.


Figure 9: Optimal schedule of LA16.


Figure 10: DL detection algorithm via RL.

The results of 40 benchmark testing with our algorithm are given in Table 1. From the results, we found that:

Problem (size) 2LD 3LD Makespan Optimal Makespan
Time Backtrack Time Backtrack
FT 06 <1 sec. 0 <1 sec. 0 512 512
LA 01 (10 × 5) <1 sec. 0 <1 sec. 0 1096 1073
LA 02 (10 × 5) <1 sec. 0 <1 sec. 0 1025 1025
LA 03 (10 × 5) <1 sec. 0 <1 sec. 0 857 817
LA 04 (10 × 5) <1 sec. 0 <1 sec. 0 933 827
LA 05 (10 × 5) <1 sec. 0 <1 sec. 0 879 879
LA 06 (15 × 5) <1 sec. 0 <1 sec. 7 1390 N/A in 48 hours
LA 07 (15 × 5) <1 sec. 0 <1 sec. 13 1337 N/A in 48 hours
LA 08 (15 × 5) <1 sec. 0 <1 sec. 19 1314 N/A in 48 hours
LA 09 (15 × 5) <1 sec. 0 <1 sec. 0 1609 N/A in 48 hours
LA 10 (15 × 5) <1 sec. 0 <1 sec. 8 1525 N/A in 48 hours
LA 11 (20 × 5) <1 sec. 0 <1 sec. 3 1873 N/A in 48 hours
LA 12 (20 × 5) <1 sec. 0 <1 sec. 17 1726 N/A in 48 hours
LA 0 13 (20 × 5) <1 sec. 0 <1 sec. 14 1895 N/A in 48 hours
LA 14 (20 × 5) <1 sec. 0 <1 sec. 0 1901 N/A in 48 hours
LA 15 (20 × 5) <1 sec. 0 <1 sec. 0 2015 N/A in 48 hours
LA 16 (10 × 10) <1 sec. 1 <1 sec. 16 1498 N/A in 48 hours
LA 17 (10 × 10) <1 sec. 0 <1 sec. 113 1187 N/A in 48 hours
LA 18 (10 × 10) <1 sec. 0 <1 sec. 12 1478 N/A in 48 hours
LA 19 (10 × 10) <1 sec. 6 <1 sec. 31 1412 N/A in 48 hours
LA 20 (10 × 10) <1 sec. 0 <1 sec. 0 1514 N/A in 48 hours
LA 21 (15x10) <1 sec. 21 <1 sec. 409 2051 N/A in 48 hours
LA 22 (15 × 10) <1 sec. 29 3 sec. 12053 1811 N/A in 48 hours
LA 23 (15 × 10) <1 sec. 143 <1 sec. 1339 2032 N/A in 48 hours
LA 24 (15 × 10) <1 sec. 22 <1 sec. 888 1934 N/A in 48 hours
LA 25 (15 × 10) <1 sec. 108 <1 sec. 12430 1983 N/A in 48 hours
LA 26 (20 × 10) <1 sec. 38 45 sec. 624349 2666 N/A in 48 hours
LA 27 (20 × 10) <1 sec. 36 <1 sec. 221 2730 N/A in 48 hours
LA 28 (20 × 10) <1 sec. 6 <1 sec. 799 2600 N/A in 48 hours
LA 29 (20 × 10) <1 sec. 196 <1 sec. 8683 2621 N/A in 48 hours
LA 30 (20 × 10) <1 sec. 23 <1 sec. 591 2774 N/A in 48 hours
LA 31 (30 × 10) <1 sec. 33 <1 sec. 3174 3701 N/A in 48 hours
LA 32 (30 × 10) <1 sec. 73 47 sec. 295725 3997 N/A in 48 hours
LA 33 (30 × 10) <1 sec. 71 4 sec. 46982 3791 N/A in 48 hours
LA 34 (30 × 10) <1 sec. 94 4 sec. 26426 3929 N/A in 48 hours
LA 35 (30 × 10) <1 sec. 68 <1 sec. 9705 4076 N/A in 48 hours
LA 36 (15 × 15) <1 sec. 69 Not available in 3hrs   2543 N/A in 48 hours
LA 37 (15 × 15) <1 sec. 239 <1 sec. 1765 2800 N/A in 48 hours
LA 38 (15 × 15) <1 sec. 339 8 min.   2301 N/A in 48 hours
LA 39 (15 × 15) <1 sec. 35 <1 sec. 1922 2386 N/A in 48 hours
LA 40 (15 × 15) <1 sec. 415 15 min. 8518357 2578 N/A in 48 hours

Table 1: Evaluation table.

(1) All 40 benchmarks can be solved via algorithm within acceptable time frame.

(2) Our results are very close to the optimal solution. These shows our policy-based RL approach is effective in reducing time and cost.

(3) Due to the system difficulty, once the system becomes larger, the number of backtracking increases. Backtracking numbers are 0 for first 15 benchmark, and the number increases as the states increases. However, for all benchmark problems, our number of backtracking used is kept at a low level.

(4) Once a DL event occurs, our scheduling algorithm can rearrange and generate a new DL-free timesheet within 1 seconds. Therefore, we can assuming that our DL-free algorithm would be applied to other similar structure systems. Additionally, under more power computation system this algorithms making itself a qualified tool for real-time operation system (Table 1).


Based on the ranking matrix, graph model and reinforcement learning, a new corresponding DL detection algorithm is proposed by us, and using that the author analyzed the general pattern of high-level DL detection problem based on discrete system, using the classical forty benchmark problems. However due to the heavy computation, some work might took very long term, but this can be solved in time while the computation speed is exponential increasing.

This algorithm is developed under the buffer less environmental which is much more difficulty compare to real world. Therefore, it is worth believing that our algorithm should be extended to other resource sharing systems.

Based on this DL detection algorithms, relax some certain constrains new limited buffer DL detection algorithms can be developed and can be widely applied in the mechanical system, parallel computing system, and the future is quite bright.


Select your language of interest to view the total content in your interested language
Post your comment

Share This Article

Relevant Topics

Recommended Conferences

Article Usage

  • Total views: 79
  • [From(publication date):
    August-2017 - Aug 19, 2017]
  • Breakdown by view type
  • HTML page views : 63
  • PDF downloads :16

Post your comment

captcha   Reload  Can't read the image? click here to refresh

Peer Reviewed Journals
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals
International Conferences 2017-18
Meet Inspiring Speakers and Experts at our 3000+ Global Annual Meetings

Contact Us

Agri, Food, Aqua and Veterinary Science Journals

Dr. Krish

1-702-714-7001 Extn: 9040

Clinical and Biochemistry Journals

Datta A

1-702-714-7001Extn: 9037

Business & Management Journals


1-702-714-7001Extn: 9042

Chemical Engineering and Chemistry Journals

Gabriel Shaw

1-702-714-7001 Extn: 9040

Earth & Environmental Sciences

Katie Wilson

1-702-714-7001Extn: 9042

Engineering Journals

James Franklin

1-702-714-7001Extn: 9042

General Science and Health care Journals

Andrea Jason

1-702-714-7001Extn: 9043

Genetics and Molecular Biology Journals

Anna Melissa

1-702-714-7001 Extn: 9006

Immunology & Microbiology Journals

David Gorantl

1-702-714-7001Extn: 9014

Informatics Journals

Stephanie Skinner

1-702-714-7001Extn: 9039

Material Sciences Journals

Rachle Green

1-702-714-7001Extn: 9039

Mathematics and Physics Journals

Jim Willison

1-702-714-7001 Extn: 9042

Medical Journals

Nimmi Anna

1-702-714-7001 Extn: 9038

Neuroscience & Psychology Journals

Nathan T

1-702-714-7001Extn: 9041

Pharmaceutical Sciences Journals

John Behannon

1-702-714-7001Extn: 9007

Social & Political Science Journals

Steve Harry

1-702-714-7001 Extn: 9042

© 2008-2017 OMICS International - Open Access Publisher. Best viewed in Mozilla Firefox | Google Chrome | Above IE 7.0 version