Literature studied:
1. Combining VSIDS and CHB Using Restarts in SAT.
Citation:
@inproceedings{DBLP:conf/cp/CherifHT21,
author = {Mohamed Sami Cherif and
Djamal Habet and
Cyril Terrioux},
editor = {Laurent D. Michel},
title = {Combining {VSIDS} and {CHB} Using Restarts in {SAT}},
booktitle = {27th International Conference on Principles and Practice of Constraint
Programming, {CP} 2021, Montpellier, France (Virtual Conference),
October 25-29, 2021},
series = {LIPIcs},
volume = {210},
pages = {20:1--20:19},
publisher = {Schloss Dagstuhl - Leibniz-Zentrum f{\"{u}}r Informatik},
year = {2021},
url = {https://doi.org/10.4230/LIPIcs.CP.2021.20},
doi = {10.4230/LIPICS.CP.2021.20},
timestamp = {Wed, 21 Aug 2024 22:46:00 +0200},
biburl = {https://dblp.org/rec/conf/cp/CherifHT21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Abstract
Conflict Driven Clause Learning (CDCL) solvers are known to be efficient on structured instances and manage to solve ones with a large number of variables and clauses. An important component in such solvers is the branching heuristic, which picks the next variable to branch on. In this paper, we evaluate different strategies which combine two state-of-the-art heuristics, namely the Variable State Independent Decaying Sum (VSIDS) and the Conflict History-Based (CHB) branching heuristic. These strategies take advantage of the restart mechanism, which helps to deal with the heavy-tailed phenomena in SAT, to switch between these heuristics, thus ensuring a better and more diverse exploration of the search space. Our experimental evaluation shows that combining VSIDS and CHB using restarts achieves competitive results and even significantly outperforms both heuristics for some chosen strategies.
Note: the original source papers for the two heuristics are given below.

VSIDS:
@inproceedings{DBLP:conf/dac/MoskewiczMZZM01,
author = {Matthew W. Moskewicz and
Conor F. Madigan and
Ying Zhao and
Lintao Zhang and
Sharad Malik},
title = {Chaff: Engineering an Efficient {SAT} Solver},
booktitle = {Proceedings of the 38th Design Automation Conference, {DAC} 2001,
Las Vegas, NV, USA, June 18-22, 2001},
pages = {530--535},
publisher = {{ACM}},
year = {2001},
url = {https://doi.org/10.1145/378239.379017},
doi = {10.1145/378239.379017},
timestamp = {Sat, 30 Sep 2023 09:38:31 +0200},
biburl = {https://dblp.org/rec/conf/dac/MoskewiczMZZM01.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
CHB:
@inproceedings{DBLP:conf/aaai/LiangGPC16,
author = {Jia Hui Liang and
Vijay Ganesh and
Pascal Poupart and
Krzysztof Czarnecki},
editor = {Dale Schuurmans and
Michael P. Wellman},
title = {Exponential Recency Weighted Average Branching Heuristic for {SAT}
Solvers},
booktitle = {Proceedings of the Thirtieth {AAAI} Conference on Artificial Intelligence,
February 12-17, 2016, Phoenix, Arizona, {USA}},
pages = {3434--3440},
publisher = {{AAAI} Press},
year = {2016},
url = {https://doi.org/10.1609/aaai.v30i1.10439},
doi = {10.1609/AAAI.V30I1.10439},
timestamp = {Mon, 01 Jul 2024 10:37:52 +0200},
biburl = {https://dblp.org/rec/conf/aaai/LiangGPC16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
1 Introduction
In recent years, combining VSIDS and CHB has shown promising results. For instance, the MapleCOMSPS solver, which won several medals in the 2016 and 2017 SAT competitions, switches from VSIDS to CHB after a set amount of time, or alternates between both heuristics by allocating the same duration of restarts to each one.
Note 1: This passage describes the earliest, relatively simple schemes for deciding when to switch heuristics.
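As a rough illustration of these two simple schemes, here is a minimal sketch; the constants and function names are assumptions made for illustration, not MapleCOMSPS's actual code:

```python
# Illustrative sketch of the two MapleCOMSPS-style schemes described above.

SWITCH_AFTER_SECONDS = 2500.0   # hypothetical one-time switch point
CONFLICTS_PER_TURN = 30000      # hypothetical equal budget per heuristic

def pick_one_time_switch(elapsed_seconds: float) -> str:
    """Run VSIDS first, then switch to CHB for the rest of the solve."""
    return "VSIDS" if elapsed_seconds < SWITCH_AFTER_SECONDS else "CHB"

def pick_equal_duration(total_conflicts: int) -> str:
    """Alternate between the heuristics, giving each the same restart
    duration measured in number of conflicts."""
    turn = total_conflicts // CONFLICTS_PER_TURN
    return "VSIDS" if turn % 2 == 0 else "CHB"
```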
Yet, we still lack a thorough analysis of such strategies in the state of the art, as well as a comparison with promising new machine-learning-based methods in the context of SAT solving. Indeed, recent research has also shown the relevance of machine learning in designing efficient search heuristics for SAT as well as for other decision problems.
One of the main challenges is defining a heuristic which can achieve high performance on any considered instance. Indeed, it is well known that a heuristic may perform remarkably well on one family of instances yet fail badly on another.
To this end, several reinforcement learning techniques can be used, specifically under the Multi-Armed Bandit (MAB) framework, to pick an adequate heuristic between CHB and VSIDS for each instance. These strategies also take advantage of the restart mechanism in modern CDCL algorithms to evaluate each heuristic and choose the best one accordingly. The evaluation is usually achieved by a reward function, which estimates the efficiency of a heuristic by relying on information acquired during the runs between restarts. In this paper, we want to compare these different strategies and, in particular, to know whether strategies which switch between VSIDS and CHB can achieve better results than both heuristics and bring further gains to practical SAT solving.
Note 2: This passage describes the paper's core MAB mechanism: information gathered during the runs between restarts is used to build a reward function that evaluates the efficiency of the candidate heuristics, and the next heuristic is chosen by comparing these rewards. Incidentally, this also helps explain why the early, crude multi-strategy switching schemes already showed some advantage.
2 Preliminaries
3 Related Work
3.1 Branching Heuristics for SAT
3.1.1 VSIDS
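The notes left this subsection empty. As a reminder of the general scheme from the Chaff paper cited above, VSIDS keeps a score per variable, bumps the scores of variables involved in conflicts, and periodically decays all scores so that recent conflicts dominate. A minimal sketch, with illustrative constants:

```python
class VSIDS:
    """Minimal sketch of VSIDS-style scoring; constants are illustrative."""

    def __init__(self, num_vars: int, decay: float = 0.95):
        self.scores = [0.0] * num_vars
        self.decay = decay

    def on_conflict(self, learned_clause_vars):
        # Bump the variables appearing in the newly learned clause.
        for v in learned_clause_vars:
            self.scores[v] += 1.0

    def periodic_decay(self):
        # Decay all scores periodically so recent activity dominates.
        self.scores = [s * self.decay for s in self.scores]

    def pick_branch_var(self, unassigned):
        # Branch on the unassigned variable with the highest score.
        return max(unassigned, key=lambda v: self.scores[v])
```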
3.1.2 CHB
The multiplier is set to 1.0 when branching on, propagating, or asserting the variable that triggered the score update led to a conflict, and to 0.9 otherwise. The idea is to give an extra reward to variables that help produce conflicts.
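For concreteness, a minimal sketch of the CHB score update in the ERWA form described by Liang et al. [29]: the multiplier and the reward shape follow the description above, while the data structures and names here are illustrative assumptions.

```python
def chb_update(Q, last_conflict, v, num_conflicts, alpha, led_to_conflict):
    """One CHB score update for variable v, in ERWA form.

    Q             -- dict: variable -> current CHB score
    last_conflict -- dict: variable -> global conflict count at v's last
                     involvement in a conflict
    alpha         -- ERWA step size (CHB decays it from 0.4 toward 0.06)
    """
    multiplier = 1.0 if led_to_conflict else 0.9
    # Variables updated shortly after their last conflict involvement
    # receive a larger reward.
    reward = multiplier / (num_conflicts - last_conflict.get(v, 0) + 1)
    Q[v] = (1.0 - alpha) * Q.get(v, 0.0) + alpha * reward
    return Q[v]
```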
3.2 Multi-Armed Bandit Problem
A Multi-Armed Bandit (MAB) is a reinforcement learning problem consisting of an agent and a set of candidate arms from which the agent has to choose while maximizing the expected gain. The agent relies on information in the form of rewards, given to each arm and collected through a sequence of trials.
An important dilemma in MAB is the tradeoff between exploitation and exploration: the agent needs to explore underused arms often enough to obtain robust feedback, while also exploiting the good candidates which have the best rewards.
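To make this tradeoff concrete, here is a minimal sketch of UCB1, one classic policy for stochastic MAB; it is shown purely to illustrate the exploitation and exploration terms, not as the policy necessarily used in the paper:

```python
import math

def ucb1_select(counts, means, total_plays):
    """Pick the arm maximizing: empirical mean + exploration bonus.

    counts[i] -- how many times arm i has been played
    means[i]  -- empirical mean reward of arm i
    """
    # Play every arm once before trusting the statistics.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # The bonus grows for rarely played arms (exploration) and shrinks
    # as an arm accumulates plays (exploitation takes over).
    return max(range(len(counts)),
               key=lambda i: means[i]
               + math.sqrt(2 * math.log(total_plays) / counts[i]))
```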
The first MAB model, stochastic MAB, was introduced in [26], and different policies have since been devised for MAB [1, 4, 6, 38, 39]. In recent years, there has been a surge of interest in applying reinforcement learning techniques, specifically those related to MAB, in the context of SAT solving. In particular, CHB [29] and LRB [30] (a variant of CHB) are based on ERWA [38], which is used in non-stationary MAB problems to estimate the average reward of each arm. Furthermore, a new approach, called Bandit Ensemble for parallel SAT Solving (BESS), was devised in [27] to control the cooperation topology in parallel SAT solvers, i.e. the pairs of units able to exchange clauses, by relying on a MAB formalization of the cooperation choices.
MAB frameworks were also extensively used in the context of Constraint Satisfaction Problem (CSP) solving to choose a branching heuristic among a set of candidate ones at each node of the search tree [42] or at each restart [41, 13].
Finally, simple bandit-driven perturbation strategies to incorporate random choices in constraint solving with restarts were also introduced and evaluated in [36].
The MAB framework we introduce in the context of SAT in Section 4.2 is closely related to those introduced in [41, 36], in the sense that we also pick an adequate heuristic at each restart. In particular, our framework is closest to the one in [36] in terms of the number of candidate heuristics and the chosen reward function, yet it differs in that we consider two efficient state-of-the-art heuristics instead of perturbing one through random choices, which may deteriorate the efficiency of highly competitive SAT solvers.
4 Strategies to Combine VSIDS and CHB Using Restarts
In this section, we describe different strategies which take advantage of the restart mechanism in SAT solvers to combine VSIDS and CHB. First, we describe simple strategies which are either static or random. Then, we describe reinforcement learning strategies, in the context of a MAB framework, which rely on information acquired through the search to choose the most relevant heuristic at each restart.
4.1 Static and Random Strategies
Three simple strategies:
RD: switch at every restart (frequent switching, fully random with equal probability for each heuristic).
SS: a single switch at half of the total time budget t.
RR: frequent alternating switching. This strategy alternates between VSIDS and CHB in a round-robin fashion, similar to the strategy used in the latest version of MapleCOMSPS [28]. However, since we want strategies which are independent of the restart policy and which only focus on choosing the heuristic, we do not assign equal restart durations (in terms of number of conflicts) to each heuristic; instead, we let the duration of restarts grow naturally according to the restart policy of the solver.
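A minimal sketch of these three strategies written as per-restart selectors; the names are illustrative, and the elapsed time and restart index would come from the solver:

```python
import random

HEURISTICS = ("VSIDS", "CHB")

def pick_rd(rng: random.Random) -> str:
    """RD: choose uniformly at random at each restart."""
    return rng.choice(HEURISTICS)

def pick_ss(elapsed: float, time_budget: float) -> str:
    """SS: a single switch at half of the total time budget t."""
    return "VSIDS" if elapsed < time_budget / 2 else "CHB"

def pick_rr(restart_index: int) -> str:
    """RR: alternate at every restart; restart durations are left to the
    solver's restart policy rather than being equalized per heuristic."""
    return HEURISTICS[restart_index % 2]
```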
4.2 Multi-Armed Bandit Strategies
To choose an arm, MAB strategies usually rely on a reward function, computed during each run, to estimate the performance of the selected arm.
The reward function plays an important role in the proposed framework and has a direct impact on its efficiency. We choose a reward function that estimates the ability of a heuristic to reach conflicts quickly and efficiently.
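These notes do not reproduce the paper's exact reward definition, so the following is only a plausible stand-in in the spirit described above (conflicts generated per decision during the run between two restarts); all names are hypothetical:

```python
def run_reward(num_conflicts: int, num_decisions: int) -> float:
    """Hypothetical reward for the heuristic used during one run, i.e. the
    period between two consecutive restarts: how quickly it drove the
    search toward conflicts."""
    if num_decisions == 0:
        return 0.0
    return num_conflicts / num_decisions
```

At each restart, this reward would be fed to the bandit policy (e.g. the UCB1 sketch in Section 3.2) for the arm that was just used, before the heuristic for the next run is selected.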