The goal of this post is to showcase how to encode predicates into Linear Program (LP) constraints by solving two real-worldish problems, a simpler and a harder one. Examples of what I mean by predicates are:
$ \begin{aligned} \neg (a \wedge b), \quad a, b \in \{0, 1\} \quad &\text{i.e. solution is invalid if $a$ and $b$ are $1$}\\ a = b, \quad a, b \in \mathbb{Z} \quad &\text{i.e. solution is valid if and only if $a = b$} \end{aligned} $
Encoding such rules into LPs can be extremely useful, allowing us to marry the expressiveness of predicates with the theoretical guarantees and existing high-quality software related to LP optimization (e.g. CVXPY, Gurobi and CPLEX).
Despite their intuitively simple appearance, predicates can sometimes be tricky to encode as LP constraints. While the examples and explanations here could be useful by themselves, I hope that the thought process can also help others connect the overall goals with the actual constraints when doing LP. It can also be useful for understanding related literature.
I provided explanations and working Python code for two examples:
If you find the explanations too verbose, you can skip straight to the code samples. At the end I included some considerations regarding LP performance and some interesting connections with Graph Theory and Machine Learning, but please keep in mind that this is a practical post, I could never hope or dare to cover LP or convex optimization sufficiently.
The general idea in a Linear Program is to solve an optimization objective in the form
$$ \begin{aligned} \min_{x} \quad c^Tx \quad \textrm{s.t.} \quad \begin{cases} Gx \leq h\\ Ax = b \end{cases} \end{aligned} $$
Usually all numbers are real-valued, but in the case of (Mixed-) Integer Linear Programs (ILPs), we have the further constraint that all or some entries of the $x$ vector must be boolean or in $\mathbb{Z}$. This constraint is not a minor detail: it makes the feasible set non-convex and the problem generally much more difficult to solve than its real-valued counterpart (ILPs are NP-complete in the general case). Still, the LP relaxation is convex, and one way of leveraging that is to express such problems as ILPs and feed them to a high-quality solver. In a practical way, we can think of LPs as follows: if we have an objective in the form of a linear combination, and we are able to express all the required constraints as linear (in)equalities, the problem can be cast in the form of an LP.
The problem is therefore defined by the vectors and matrices that form the objective and constraints. See e.g. the docstring for cvxopt.glpk.ilp as an example of an ILP Python interface: we only provide matrices and vectors for the objective and constraints, hit optimize, and the solver does the rest. All we need to know for today is how to encode our higher-level goals into such objectives and constraints!
ilp(...)
Solves a mixed integer linear program using GLPK.
(status, x) = ilp(c, G, h, A, b, I, B)
PURPOSE
Solves the mixed integer linear programming problem
minimize c'*x
subject to G*x <= h
A*x = b
x[k] is integer for k in I
x[k] is binary for k in B
ARGUMENTS
c nx1 dense 'd' matrix with n>=1
G mxn dense or sparse 'd' matrix with m>=1
h mx1 dense 'd' matrix
A pxn dense or sparse 'd' matrix with p>=0
b px1 dense 'd' matrix
I set of indices of integer variables
B set of indices of binary variables
status if status is 'optimal', 'feasible', or 'undefined',
a value of x is returned and the status string
gives the status of x. Other possible values of status are: 'invalid formulation',
'infeasible problem', 'LP relaxation is primal
infeasible', 'LP relaxation is dual infeasible',
'unknown'.
x a (sub-)optimal solution if status is 'optimal',
'feasible', or 'undefined'. None otherwise
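The same pattern appears in other solver interfaces. As a sketch, here is a hypothetical toy ILP solved with SciPy's `linprog` (an alternative to the GLPK interface above; the `integrality` argument and the `"highs"` method require SciPy >= 1.9, and the problem itself is made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Toy ILP: maximize x + y (i.e. minimize -(x + y))
# subject to x + y <= 1, with x, y binary.
c = np.array([-1.0, -1.0])   # objective: min c'x
G = np.array([[1.0, 1.0]])   # inequality LHS: Gx <= h
h = np.array([1.0])
res = linprog(c, A_ub=G, b_ub=h, bounds=[(0, 1), (0, 1)],
              integrality=[1, 1],  # mark both variables as integer
              method="highs")
# res.status == 0 means an optimal solution was found; here the optimum is -1
```

As with GLPK, we only describe the objective and constraints; the search itself is entirely delegated to the solver.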
For more background on LPs and convex optimization, I absolutely recommend the lecture series by Prof. Stephen Boyd at Stanford University, which can be found on YouTube (full video list). Lecture 5 introduces LP:
Details on how exactly (I)LP solvers tackle the optimization are even further out of scope for this practical post. Interested readers may want to take a look at algorithm families like branch-and-cut and techniques like interior-point methods.
The elegance, computational efficiency and theoretical guarantees of LPs, together with the fact that a vast amount of problems can be naturally expressed via linear objectives and constraints, make LPs one of the most popular tools in optimization. Typical examples are:
Examples like the above can be abundantly found online and in the literature. Less documented are constraints like the following ones:
Constraints like these can increase the expressivity of our LP repertoire, e.g. through the encoding of non-linear behaviour. Conversely, solving problems like these via LP can be very advantageous since we can leverage theoretical and practical advantages associated with LPs.
But in many cases, they involve operators like absolute values and some boolean logic that, despite their conceptual simplicity, can be surprisingly unintuitive to implement as LPs. To help the intuition and provide a bit of a systematic approach, I’ve gathered a series of tips in the next section, followed by examples.
As we’ve seen, an encoded ILP has basically three sets of matrices and vectors:
By this convention, the $i^{th}$ column of $G, A, c^T$ corresponds to the variable $x_i$. Then, each row of $G, h$ corresponds to an inequality constraint, and each row of $A, b$ to an equality constraint. When coding, we typically create these objects based on the known size of the problem, populate the contents of $c, G, h, A, b$, and feed everything to the solver. The only step that is really error-prone and challenging is encoding and populating the contents. For that, there are a few tips & tricks that may be helpful (the next sections will exemplify most, if not all, of them):
Encoded constraints can look very different from the original goals. Furthermore, the tiniest mistake in the encoding can easily lead to infeasible or inexact programs. Such bugs can be very difficult to catch! Even if we think all is well, I strongly recommend investing some time in testing the results against the original goals.
Linear constraints can be split by columns, inequality rows, and equality rows. Conventionally, columns correspond to individual variables, and rows to (in)equality constraints.
When designing the columns, make sure that the defined variables are sufficient to encode everything. Existing variables can be composed into new ones to create more complex behaviour. Usually, these auxiliary or control variables are only there to model behaviour and do not contribute to the objective, i.e. the corresponding $c$ entries are zero.
Running the solver at any point is a good way of ensuring we aren’t encoding anything infeasible (that would mean our program doesn’t have any possible solution, so we likely have a bug in our encoding). At the beginning, we may not have enough constraints and the solver may return unbounded results. That’s fine; keep adding constraints until it provides bounded solutions.
Make sure that columns have consistent semantic meaning across all $A, G, c, x$ encodings, e.g. the $10^{th}$ column always represents the $10^{th}$ client and nothing else. This is kind of obvious, but overlooking it can result in hard-to-find bugs.
Remove redundancies in the sets of constraints: sometimes, removing a constraint doesn’t alter the problem space. E.g. if we have a set of nonnegative variables, we don’t need to require that their sum is nonnegative. While some solvers do this automatically, some optimizations are beyond their current capabilities. Coming up with the minimal set of constraints can be a creative and fun task that leads to (sometimes substantial) size reduction of the LPs and subsequent speedups, so it’s always worth looking for!
Plotting the encoded constraints can help tremendously with debugging. In particular, I’d recommend choosing a diverging colormap like the one in the image below, where white corresponds to zero, red to positive values and blue to negative ones. It is particularly helpful to clip the colormap to reasonably small values; otherwise it can saturate, and smaller entries will look too similar to zero. Many times, a simple look at the map reveals bugs and design issues.
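A minimal sketch of such a plot, assuming matplotlib is available (the constraint matrix and the clipping value here are just placeholders):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, works without a display
import matplotlib.pyplot as plt

G = np.random.randn(20, 50) * 100  # placeholder constraint matrix
clip = 3.0                         # clip so small entries stay visible
fig, ax = plt.subplots()
# diverging colormap: white=0, red=positive, blue=negative
im = ax.imshow(np.clip(G, -clip, clip), cmap="bwr", vmin=-clip, vmax=clip)
fig.colorbar(im)
fig.savefig("constraints.png")
```

Plotting $G$ and $A$ side by side with the same colormap makes stray signs, misaligned blocks and forgotten rows stand out immediately.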
More specifically to the thought process of converting predicates into specific (in)equality constraints, I found that the following systematic approach was of great help:
Note that we still need to provide goals that present specific changepoints in a meaningful way, which by itself can already be unintuitive. This approach helps convert them into specific constraints, allowing us to focus on the higher-level modeling part.
Consider the following scenario:
Which subset of the candidates satisfies our objective and the constraints? We can solve this automatically, efficiently and with guarantees via LP. Points 2 to 4 can be directly translated into an LP as follows:
$$ \begin{aligned} c \quad := \quad &(-3, -3.5, -3.1, -3.3, -2.5, -3)\\ x \quad := \quad &(x_1, x_2, x_3, x_4, x_5, x_6)^T \quad \in \quad \{ 0, 1 \}^6\\ G \quad = \quad &\begin{pmatrix} 1000 & 1500 & 1200 & 1350 & 900 & 1300 \end{pmatrix}\\ h \quad = \quad &5000\\ &\min_{x} \quad c^Tx \quad \textrm{s.t.} \quad Gx \leq h \end{aligned} $$
But adding the constraint in point 5 requires a bit more thought, as it doesn’t translate directly into a linear (in)equality like the others. Time to identify, define and enforce!
A relevant changepoint occurs whenever both $x_1$ and $x_2$ are $1$: we want that to be infeasible. If we add them, that case reaches a value of $2$, while all other cases have a lower value. We can then draw a line between our relevant scenario and all others via the $(x_1 + x_2) \leq 1$ constraint:
$x_1$ | $x_2$ | $x_1 + x_2$ | $(x_1 + x_2) \leq 1$ |
---|---|---|---|
$0$ | $0$ | $0$ | feasible |
$0$ | $1$ | $1$ | feasible |
$1$ | $0$ | $1$ | feasible |
$1$ | $1$ | $2$ | infeasible |
This way we’ve achieved our goal of identifying the relevant changepoint. In fact, since this is a simpler problem involving just one changepoint, we can solve it without adding any control variables. The following threshold ensures the infeasibility of $x_1 \wedge x_2$:
$$ x_1 + x_2 \leq 1 $$
In this case, $x_1=x_2=1$ can’t happen, while all other possibilities are allowed, so we’re done. Control variables are only needed when we’re composing multiple constraints and some sort of memory is needed. Still, let’s see how that would play out!
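Following the testing advice from the previous section, we can brute-force the truth table to confirm that $x_1 + x_2 \leq 1$ excludes exactly the $x_1 = x_2 = 1$ case:

```python
from itertools import product

# enumerate all boolean assignments and keep the feasible ones
feasible = [(x1, x2) for x1, x2 in product((0, 1), repeat=2)
            if x1 + x2 <= 1]
# only (1, 1) should be missing from the feasible set
```

This kind of exhaustive check is cheap for small predicates and catches sign or threshold mistakes immediately.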
The next step would be to add an extra boolean variable to our LP, $x_e$, and decide which region is fixed and which one is don’t care. In our model, we’d like $x_1 = x_2 = 1$ to result in $x_e = 1$ being fixed. The rest is $DC$. This can be enforced with:
$$ x_1 + x_2 - x_e \leq 1 $$
It can help to make a table with the $0, 1, DC$ values as follows:
$x_1$ | $x_2$ | $x_e$ with $x_1 + x_2 - x_e \leq 1$ |
---|---|---|
$0$ | $0$ | $DC$ |
$0$ | $1$ | $DC$ |
$1$ | $0$ | $DC$ |
$1$ | $1$ | $1$ |
Now $x_e$ behaves as we want, but we still need to add a further constraint to ensure the infeasibility of $x_e = 1$. In this case it is simple because $x_e$ is boolean. We enforce:
$$ x_e = 0 $$
And we’re done. All the $DC$ combinations will collapse then to $0$, but this won’t harm the solution because all of them are still feasible. We can now encode the complete LP with the control variable $x_e$ as follows:
$$ \begin{aligned} c \quad := \quad &(-3, -3.5, -3.1, -3.3, -2.5, -3, 0)\\ x \quad := \quad &(x_1, x_2, x_3, x_4, x_5, x_6, x_e)^T \quad \in \quad \{ 0, 1 \}^7\\ G \quad = \quad &\begin{pmatrix} 1000 & 1500 & 1200 & 1350 & 900 & 1300 & 0\\ 1 & 1 & 0 & 0 & 0 & 0 & -1 \end{pmatrix}\\ h \quad = \quad &\begin{pmatrix} 5000\\ 1 \end{pmatrix}\\ A \quad = \quad &\begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}\\ b \quad = \quad &0\\ &\min_{x} \quad c^Tx \quad \textrm{s.t.} \begin{cases} Gx \leq h\\ Ax = b \end{cases} \end{aligned} $$
Note that in this case we added 2 extra constraints and 1 extra variable, while in the shorter version we would just add 1 extra constraint and 0 extra variables. Although not always precise, a popular rule of thumb for measuring the size of an LP is number of variables times number of constraints.
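Before handing this to a solver, the control-variable encoding can be sanity-checked by brute force: the constraints $x_1 + x_2 - x_e \leq 1$ and $x_e = 0$ together should forbid exactly $x_1 = x_2 = 1$:

```python
from itertools import product

# all (x1, x2) pairs that survive both constraints for some xe
feasible = [(x1, x2) for x1, x2, xe in product((0, 1), repeat=3)
            if x1 + x2 - xe <= 1 and xe == 0]
```

All $DC$ combinations collapse to $x_e = 0$, and the only excluded pair is $(1, 1)$, matching the shorter encoding.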
Let’s see that in action!
import cvxpy as cp
import numpy as np
FLOAT_DTYPE = np.float64
# objective
x = cp.Variable(7, boolean=True)
c = FLOAT_DTYPE([-3, -3.5, -3.1, -3.3, -2.5, -3, 0])
# inequality constraints
G = FLOAT_DTYPE([[1000, 1500, 1200, 1350, 900, 1300, 0],
[1, 1, 0, 0, 0, 0, -1]])
h = FLOAT_DTYPE([5000, 1])
# equality constraints
A = FLOAT_DTYPE([[0, 0, 0, 0, 0, 0, 1]])
b = FLOAT_DTYPE([0])
# solver
prob = cp.Problem(cp.Minimize(c.T@x),
[G@x <= h, A@x == b])
score = prob.solve()
# solution
print("status:", prob.status) # optimal means solution found
print("story points:", -prob.value) # optimal value for -<c, x>
print("money:", G[0]@x.value)
print("optimal x:", x.value)
At this point I’m really curious about the results:
status: optimal
story points: 12.399999999999999
money: 4850.0
optimal x: [1. 0. 1. 1. 0. 1. 0.]
We see that $x_1, x_3, x_4$ and $x_6$ got the job, providing $12.4$ story points for a cost of $4850$. We want to thank all other candidates for their engagement and wish them success in their future endeavours.
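Since this problem only has $2^6 = 64$ candidate subsets, we can also verify the solver's answer exhaustively, in the spirit of the testing tip from earlier:

```python
from itertools import product

points = [3, 3.5, 3.1, 3.3, 2.5, 3]
costs = [1000, 1500, 1200, 1350, 900, 1300]
# brute-force search over all subsets satisfying budget and NAND constraint
best_score, best_x = max(
    (sum(p * xi for p, xi in zip(points, xs)), xs)
    for xs in product((0, 1), repeat=6)
    if sum(c * xi for c, xi in zip(costs, xs)) <= 5000  # budget
    and xs[0] + xs[1] <= 1)                             # not both x1 and x2
```

The exhaustive optimum agrees with the LP result of $12.4$ story points (there happens to be a tie among subsets here, so the optimal $x$ is not unique).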
As an exercise, consider the following 2 scenarios:
In this example we want to illustrate how to model more complex behaviours:
Imagine the following scenario:
We would really like to find the global maximum, but how could we go about it? In many aspects, this looks like a hardcore combinatorial problem with an extremely high number of combinations for a brute-force search. Can we do better than that?
If we encode each participant as 11 boolean variables (one per position), and each pairwise relationship also as a boolean variable ($1$=active, $0$=inactive in all cases), we would have $11N + M$ boolean variables, and we could (informally):
We see that there is hope, since we can already encode the objective and some of the constraints into an LP. But we haven’t seen yet how to encode “if and only if” relationships. Also, the relationship should only be active if both participants are in the same team. Furthermore, we didn’t enforce that teams have exactly one member per position. How could we achieve any of that through linear constraints?
Let’s start with the simpler issue: the constraint $C_i + C_j - R_{ij} \leq 1$ ensures that if $C_i, C_j$ are both active, then $R_{ij}$ is also active. But otherwise, the variable is a $DC$, and we want it to be fixed to $0$ instead.
The changepoints and the variable $R_{ij}$ are the same; we just need to add an extra constraint to fill the gap:
$C_i$ | $C_j$ | $R_{ij}$ with $C_i + C_j - R_{ij} \leq 1$ | $R_{ij}$ with $-C_i - C_j + 2R_{ij} \leq 0$ |
---|---|---|---|
$0$ | $0$ | $DC$ | $0$ |
$0$ | $1$ | $DC$ | $0$ |
$1$ | $0$ | $DC$ | $0$ |
$1$ | $1$ | $1$ | $DC$ |
Enforcing both constraints at the same time would ensure that $R_{ij}$ is $1$ if and only if $C_i, C_j$ are both $1$. We can then use $R_{ij}$ directly as an optimization variable to be multiplied with the corresponding relationship score.
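We can again brute-force all cases to confirm that the two constraints pin $R_{ij}$ to exactly $C_i \wedge C_j$ (a sketch with scalar variables):

```python
def allowed_r(ci, cj):
    # boolean values of R compatible with both constraints
    return [r for r in (0, 1)
            if ci + cj - r <= 1 and -ci - cj + 2 * r <= 0]
```

For every input pair, exactly one value of $R$ survives, so the pair of inequalities behaves as an equality $R_{ij} = C_i \wedge C_j$.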
But we’re far from done: we still need to address how the teams will get exactly one member per position.
So far, our LP has no way of encoding how many members a team has, even less how many per position. We also don’t know how many teams we will end up having, so we can’t provide placeholders either. We need to re-model the whole problem before we can propose specific changepoints.
A possible solution is to add a non-negative integer to each participant representing the team ID: we can use $0$ as a “no team” catch-all value, and any value above as an arbitrary team ID. Consistency can be ensured as follows, for each participant:
Now that team ID and position states are consistent, we can handle relationships as follows, for each pair of participants:
At this point we have discarded a majority of the infeasible connections, but we still aren’t enforcing teams of 11 members with unique positions. We can make progress by enforcing the following:
In a way, we want participants to be either in a full team or in none. But still, duplicate positions may happen, and with our current model we have no way of keeping track of the team IDs. Luckily, we can exploit a key property of our model: transitivity. If the $(p, q)$ and $(q, r)$ relationships are active, this means that participants $p, q, r$ are in the same team, so we must also enforce the $(p, r)$ relationship to be active. But this can only happen if all 3 participants have distinct positions! Even better: since we are enforcing that every participant in a team must have 10 active relationships (one per teammate), this property extends to all of them: by enforcing all 11 people to be related to each other (in a so-called clique), we will have guaranteed 11 distinct positions covered, because all possible triangles are present, and no single triangle can have repeated positions. Furthermore, we will also have guaranteed a unique ID for every team, because otherwise players from different teams would have to be connected.
At this point we’re done with the higher-level modelling, and ready to convert our goals into LP constraints. Let’s identify the changepoints in bottom-up order:
Now we can enforce the constraints, defining control variables whenever needed.
Goal 1 is straightforward: for each participant’s integer team-ID variable $t$, enforce:
$$ -t \leq 0 $$
Goals 2 and 3 require us to define an if-and-only-if relationship that is true exactly when team ID $t$ is zero. We define an $x_0$ control variable and 2 inequality constraints per participant as follows, with a sufficiently large integer $K$:
$t$ | $x_0$ with $-t - x_0 \leq -1$ | $x_0$ with $t + Kx_0 \leq K$ |
---|---|---|
$\dots$ | $DC$ | $0$ |
$2$ | $DC$ | $0$ |
$1$ | $DC$ | $0$ |
$0$ | $1$ | $DC$ |
This gives us goal 2. Then, for each participant, given $x_0$ and the 11 (boolean) position variables $x_a, \dots, x_k$, we can enforce goal 3 via:
$$ x_0 + x_a + \dots + x_k = 1 $$
That way, if $t=0$, no positions will be active; otherwise, exactly one position will be active.
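The big-$K$ pair of constraints from goal 2 can be checked by brute force (with a placeholder $K$; in practice $K$ must exceed the largest possible team ID):

```python
K = 100  # placeholder; must exceed the largest possible team ID

def allowed_x0(t):
    # boolean values of x0 compatible with both goal-2 constraints
    return [x0 for x0 in (0, 1)
            if -t - x0 <= -1 and t + K * x0 <= K]
```

For $t=0$ only $x_0=1$ survives, and for any $t \geq 1$ only $x_0=0$ does, which is exactly the if-and-only-if behaviour we wanted.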
Goal 4 is also straightforward. Since we know ahead of time which positions each participant chose, we isolate the discarded ones and enforce that they are all inactive:
$$ \sum_{i \in discarded} x_i = 0 $$
Goal 5 requires an integer comparator variable $x_=$ for each pair of participants. Since the difference between 2 team IDs $\Delta_t$ can also be a negative number, this is a bit more involved. Each integer comparator is a combination of 2 further auxiliary variables $x_\gt, x_\lt$, via 4 inequality and 1 equality constraints. All variables are boolean. Given a sufficiently large integer $K$, we enforce $x_\gt$ with the following 2 constraints:
$\Delta_t$ | $x_\gt$ with $\Delta_t - Kx_\gt \leq 0$ | $x_\gt$ with $-\Delta_t + Kx_\gt \leq (K - 1)$ |
---|---|---|
$\dots$ | $1$ | $DC$ |
$1$ | $1$ | $DC$ |
$0$ | $DC$ | $0$ |
$-1$ | $DC$ | $0$ |
$\dots$ | $DC$ | $0$ |
And $x_\lt$ can be enforced analogously. Then, we enforce $x_=$ as follows:
$$ x_\gt + x_\lt + x_= = 1 $$
This way, $x_=$ is $1$ if and only if both $x_\gt = x_\lt = 0$, which happens exactly when $\Delta_t = 0$.
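The full integer comparator can be verified exhaustively over a range of $\Delta_t$ values (again with a placeholder $K$, which must exceed the largest possible $|\Delta_t|$):

```python
K = 10  # placeholder; must exceed the largest possible |delta|

def comparator(delta):
    # all (x>, x<, x=) triples compatible with the 4 inequalities
    # and the equality x> + x< + x= = 1
    return [(xg, xl, xe)
            for xg in (0, 1) for xl in (0, 1) for xe in (0, 1)
            if delta - K * xg <= 0 and -delta + K * xg <= K - 1
            and -delta - K * xl <= 0 and delta + K * xl <= K - 1
            and xg + xl + xe == 1]
```

For each $\Delta_t$ exactly one triple survives: $x_\gt$ flags positive differences, $x_\lt$ negative ones, and $x_=$ fires only for $\Delta_t = 0$.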
Goal 6 requires comparing the 11 boolean position variables between any 2 participants $\alpha$ and $\beta$. We define a control variable $x_p$ that is $1$ if and only if both participants have different positions. This is a variation of the integer comparator from goal 5 and also requires 2 auxiliary variables $x_\gt, x_\lt$. But now we define the difference between both positions as:
$$ \Delta_p = 2^0 \cdot \alpha_1 + 2^1 \cdot \alpha_2 + \dots - 2^0 \cdot \beta_1 - 2^1 \cdot \beta_2 - \dots $$
Since this is basically a binary encoding, we see that $\Delta_p$ can only be zero if both participants have identical positions. So we can feed $\Delta_p$ to an integer comparator like we did with $\Delta_t$, yielding our desired $x_p$. Note that in this case we enforce $x_p=1$ when positions are different. This only requires a minor modification to the enforcer equality constraint. Instead of $x_\gt + x_\lt + x_p = 1$ we just have to flip $x_p$, yielding $x_\gt + x_\lt + (1 - x_p) = 1$, or equivalently:
$$ x_\gt + x_\lt - x_p = 0 $$
This enforces that $x_p$ is zero when $\Delta_p$ is zero, and 1 otherwise.
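A quick numpy sketch of the binary position encoding, confirming that $\Delta_p$ is zero exactly for identical one-hot position vectors (using the 11 positions from the text):

```python
import numpy as np

ENC = 2 ** np.arange(11)  # weights 1, 2, 4, ... one per position

def delta_p(alpha, beta):
    # alpha, beta: 11-dim one-hot (or all-zero) position indicators
    return int(ENC @ (np.array(alpha) - np.array(beta)))

a = np.eye(11, dtype=int)[3]  # participant alpha at position 3
b = np.eye(11, dtype=int)[7]  # participant beta at position 7
```

Since each one-hot vector maps to a distinct power of two, the weighted difference vanishes only when both vectors coincide, which is what makes it a valid input to the integer comparator.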
Goal 7 makes use of $x_=$ and $x_p$. The activity of the relationship $R$ between 2 participants whenever their team ID is nonzero is heterogeneous:
$x_=$ | $x_p$ | $R$ |
---|---|---|
$0$ | $0$ | $0$ |
$0$ | $1$ | $0$ |
$1$ | $0$ | should not be allowed |
$1$ | $1$ | $1$ |
It is not feasible to have 2 participants on the same (nonzero) team and position. Apart from that, the relationship is active if and only if both participants are in the same team and have different positions. Let’s try to identify the relevant changepoints: they can be exposed by the $\Gamma := 2x_= - x_p$ operator:
$x_=$ | $x_p$ | $\Gamma:= 2x_= - x_p$ | $R$ |
---|---|---|---|
$0$ | $1$ | $-1$ | $0$ |
$0$ | $0$ | $0$ | $0$ |
$1$ | $1$ | $1$ | $1$ |
$1$ | $0$ | $2$ | infeasible |
This looks much better now, because we can express it as inequalities: anything above 1 is infeasible, and the rest behaves like an if and only if. So now we can enforce $R$ via just 2 inequality constraints, because we can create a $DC$/fixed/infeasible split with a single constraint:
$\Gamma$ | $R$ with $\Gamma - R \leq 0$ | $R$ with $KR -\Gamma \leq (K-1)$ |
---|---|---|
$-1$ | $DC$ | $0$ |
$0$ | $DC$ | $0$ |
$1$ | $1$ | $DC$ |
$2$ | infeasible | $DC$ |
Combining both inequality constraints gives us the desired behaviour for $R$. And importantly, it is not possible for any pair of participants to have same team and same position: that would imply $\Gamma = 2$, and there is no possible value of $R$ that would support that. Hence, $\Gamma = 2$ can’t be a feasible solution.
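The $\Gamma$-based constraints can also be brute-forced over the four $(x_=, x_p)$ combinations (placeholder $K$ as before):

```python
K = 10  # placeholder big-K constant

def allowed_rel(xe, xp):
    # values of R compatible with both constraints; empty list = infeasible
    gamma = 2 * xe - xp
    return [r for r in (0, 1)
            if gamma - r <= 0 and K * r - gamma <= K - 1]
```

Note how the same-team/same-position case returns no feasible value of $R$ at all: the infeasibility lives in the constraints themselves, not in any particular variable.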
But we must be careful: all participants with no team will end up with the same team ID and position (i.e. both $0$). With these constraints, we are forbidding that to happen, and this leads to an LP that has no solution at all! We must leave some breathing room, i.e. introduce the following extra requirement:
A pair of participants is allowed to have the same team ID and position if and only if the team ID is zero.
We can leverage our already defined $x_0$ variable as the only changepoint we need. We just need to enforce the following rules:
This can be implemented as follows:
$$ \begin{aligned} \Gamma - R &\leq 0 + K(x_0+x_0')\\ KR -\Gamma &\leq (K-1) + K(x_0+x_0') \end{aligned} $$
Note that with this modification the value of $R$ is $DC$ whenever there is any $x_0=1$. This is fine, since our next goal will handle those cases by enforcing $R=0$.
In goal 8, we need to locate all $R_{ij}$ variables for each participant $i$. Then we make use of the $x_0$ variable that equals $1$ if and only if the participant’s team ID is $0$, and encode the goal with the following constraint:
$$ 10x_0 + \sum_{j} R_{ij} = 10 $$
This way, if the participant has no team, $x_0 = 1$ and all relationships will be inactive. If the participant has a team, $x_0=0$ and exactly 10 relationships (one per teammate) must be active.
A literal implementation of goal 9 would be by far the heaviest of all, since it would require two constraints for each pair of relationships, i.e. $2\binom{M}{2} = M^2 - M$ constraints. Given that $M$ is already in $\mathcal{O}(N^2)$, this would yield an order of $\mathcal{O}(N^4)$ constraints.
This is in fact how the literature proposes to solve it. But wait, doesn’t goal 7 *already* enforce transitivity?
If participants $p, q$ have an active $R$, and participants $q, r$ as well, this implies:
The first relationship is indeed transitive, so we are covered. The second relationship is a bit trickier, since “having a different position” is not transitive: if $p$ is at position $a$, and both $q, r$ are at position $b$, having the $(p, q)$ and $(p, r)$ relationships active won’t imply that $(q, r)$ is active as well. Luckily for us, this can’t happen, since the constraints in goal 7 make it infeasible: if $(p, q)$ and $(q, r)$ have an active $R$, all of $p, q, r$ must be in the same team and have different positions. Hence, $(p, r)$ will be active as well!
Through detailed observation and handling of all cases, we were able to reduce the overall size of the LP matrix from $\mathcal{O}(N^6)$ to $\mathcal{O}(N^4)$. Not bad!
So it turned out that this was a bit more convoluted than expected: in fact, this was a multi-state, multi-clique, transitivity-optimized version of the Maximum Edge Weight Clique (MEWC) problem. OK, at this point I’ll admit that our 3-step approach didn’t quite replace all of the non-intuitive tweaks and quirks needed to encode this as an LP, but it was still at the core of the design process, and hopefully removing the complexity there helps focus all the brainwork on the big picture.
A possible implementation would look as follows (complete script with tests and evaluation can be found here):
import random
from collections import defaultdict
#
import numpy as np
import cvxpy as cp
from cvxpy.settings import CPLEX
import networkx as nx
# ############################################################################
# # SYNTHETIC DATASET CREATION
# ############################################################################
N = 61
NUM_POSITIONS = 5
POS_ENCODING = 2 ** np.arange(NUM_POSITIONS) # 1, 2, 4...
PREFERENCE_SCORES = np.array([1, 2/3, 1/3]) * (NUM_POSITIONS - 1)
MIN_NEIGHS_IF_NONZERO = 4 # at most N-1
IGNORED_PENALTY = 1 * (NUM_POSITIONS - 1)
K = max(2 * N, POS_ENCODING.sum()) # K needs to be bigger than any of these 2
# Create graph with random relationships between -1 and 1
g = nx.complete_graph(N)
for ori, dest in g.edges():
    r = (random.random() * 2 - 1) ** 3
    g.edges[ori, dest]["weight"] = r
g_edge_idxs = np.array(g.edges) # shape (M, 2)
# Create random preferences for all users
prefs = np.zeros((N, NUM_POSITIONS))
p_idxs = np.array([random.sample(range(NUM_POSITIONS), len(PREFERENCE_SCORES))
for _ in range(N)])
for i, score in enumerate(PREFERENCE_SCORES):
    prefs[range(N), p_idxs[:, i]] = score
# ############################################################################
# # VARIABLE PARTITION AND OPTIMIZATION OBJECTIVE
# ############################################################################
# column partition in 3 sections: team IDs, indiv. vars, relationship vars
ind_vars = NUM_POSITIONS + 1 # one per position + x0
rel_vars = 7  # [rel_score, x<, x>, x=, z<, z>, zp]
num_rels = len(g.edges)
num_pvars, num_relvars = N*ind_vars, num_rels*rel_vars
beg1, beg2, beg3, num_vars = np.cumsum([0, N, num_pvars, num_relvars])
# indexes for column partitions
t_idxs = np.arange(beg1, beg2)
pos_idxs = np.array([np.arange(i, i+NUM_POSITIONS)
for i in np.arange(beg2, beg3, ind_vars)])
x0_idxs = np.arange(beg2, beg3, ind_vars) + NUM_POSITIONS
rel_idxs = np.arange(beg3, num_vars, rel_vars) + 0
xl_idxs = rel_idxs + 1
xg_idxs = rel_idxs + 2
xe_idxs = rel_idxs + 3
zl_idxs = rel_idxs + 4
zg_idxs = rel_idxs + 5
zp_idxs = rel_idxs + 6
# Create objective vector with 3 sections: team ID, preferences, relationships
c = np.zeros(num_vars)
for i, positions in enumerate(pos_idxs):
    c[positions] = prefs[i]
rel_weights = [g.edges[idx]["weight"] for idx in g_edge_idxs]
c[rel_idxs] = rel_weights
c[x0_idxs] = -IGNORED_PENALTY
# Create variables analogously to objective vector: teamID are int, rest bool
x_int = cp.Variable(N, integer=True)
x_bool = cp.Variable(num_pvars+num_relvars, boolean=True)
x = cp.hstack([x_int, x_bool])
# ############################################################################
# # INEQUALITY CONSTRAINTS
# ############################################################################
ineq_blocksizes = [N, # goal 1
N, N, # goal 2
num_rels, num_rels, num_rels, num_rels, # goal 5
num_rels, num_rels, num_rels, num_rels, # goal 6
num_rels, num_rels, # goal 7
N, N] # goal 8
ineq_begs = np.cumsum([0] + ineq_blocksizes)
total_ineq = ineq_begs[-1]
#
G = np.zeros((total_ineq, num_vars))
h = np.zeros(total_ineq)
# Goal 1: non-negativity of team IDs: -t <= 0
G[ineq_begs[0]:ineq_begs[1], t_idxs] = -np.eye(N)
# Goal 2: x_0 iff t=0 for each n: (-t - x_0 <= -1), (t + K*x_0 <= K)
G[ineq_begs[1]:ineq_begs[2], t_idxs] = -np.eye(N)
G[ineq_begs[1]:ineq_begs[2], x0_idxs] = -np.eye(N)
h[ineq_begs[1]:ineq_begs[2]] = -1
G[ineq_begs[2]:ineq_begs[3], t_idxs] = np.eye(N)
G[ineq_begs[2]:ineq_begs[3], x0_idxs] = np.eye(N) * K
h[ineq_begs[2]:ineq_begs[3]] = K
# Goal 5 ineq: teamID integer comparator for each relationship:
t_ori_idxs = t_idxs[g_edge_idxs[:, 0]]
t_dest_idxs = t_idxs[g_edge_idxs[:, 1]]
# xl constraints: (t1-t2) - K*xl <= 0 (t2-t1) + K*xl <= (K-1)
G[ineq_begs[3]:ineq_begs[4]][range(num_rels), t_ori_idxs] = 1
G[ineq_begs[3]:ineq_begs[4]][range(num_rels), t_dest_idxs] = -1
G[ineq_begs[3]:ineq_begs[4]][range(num_rels), xl_idxs] = -K
#
G[ineq_begs[4]:ineq_begs[5]][range(num_rels), t_ori_idxs] = -1
G[ineq_begs[4]:ineq_begs[5]][range(num_rels), t_dest_idxs] = 1
G[ineq_begs[4]:ineq_begs[5]][range(num_rels), xl_idxs] = K
h[ineq_begs[4]:ineq_begs[5]] = K - 1
# xg constraints: (t2-t1) - K*xg <= 0 (t1-t2) + K*xg <= (K-1)
G[ineq_begs[5]:ineq_begs[6]][range(num_rels), t_ori_idxs] = -1
G[ineq_begs[5]:ineq_begs[6]][range(num_rels), t_dest_idxs] = 1
G[ineq_begs[5]:ineq_begs[6]][range(num_rels), xg_idxs] = -K
#
G[ineq_begs[6]:ineq_begs[7]][range(num_rels), t_ori_idxs] = 1
G[ineq_begs[6]:ineq_begs[7]][range(num_rels), t_dest_idxs] = -1
G[ineq_begs[6]:ineq_begs[7]][range(num_rels), xg_idxs] = K
h[ineq_begs[6]:ineq_begs[7]] = K - 1
# Goal 6 ineq: binary-encoded position comparator for each relationship:
# zl constraint 1: (bin(pos1)-bin(pos2)) - K*zl <= 0
# zl constraint 2: (bin(pos2)-bin(pos1)) + K*zl <= (K-1)
for row_i, (ori, dest) in enumerate(g_edge_idxs):
    ori_positions, dest_positions = pos_idxs[ori], pos_idxs[dest]
    G[ineq_begs[7]+row_i][ori_positions] = POS_ENCODING
    G[ineq_begs[7]+row_i][dest_positions] = -POS_ENCODING
    G[ineq_begs[8]+row_i][ori_positions] = -POS_ENCODING
    G[ineq_begs[8]+row_i][dest_positions] = POS_ENCODING
G[ineq_begs[7]:ineq_begs[8]][range(num_rels), zl_idxs] = -K
G[ineq_begs[8]:ineq_begs[9]][range(num_rels), zl_idxs] = K
h[ineq_begs[8]:ineq_begs[9]] = K - 1
# zg constraint 1: (bin(pos2)-bin(pos1)) - K*zg <= 0
# zg constraint 2: (bin(pos1)-bin(pos2)) + K*zg <= (K-1)
for row_i, (ori, dest) in enumerate(g_edge_idxs):
    ori_positions, dest_positions = pos_idxs[ori], pos_idxs[dest]
    G[ineq_begs[9]+row_i][ori_positions] = -POS_ENCODING
    G[ineq_begs[9]+row_i][dest_positions] = POS_ENCODING
    G[ineq_begs[10]+row_i][ori_positions] = POS_ENCODING
    G[ineq_begs[10]+row_i][dest_positions] = -POS_ENCODING
G[ineq_begs[9]:ineq_begs[10]][range(num_rels), zg_idxs] = -K
G[ineq_begs[10]:ineq_begs[11]][range(num_rels), zg_idxs] = K
h[ineq_begs[10]:ineq_begs[11]] = K - 1
# Goal 7 ineq: feasibility and state of relationships based on team ID and pos
x0_ori_idxs = x0_idxs[g_edge_idxs[:, 0]]
x0_dest_idxs = x0_idxs[g_edge_idxs[:, 1]]
# Constraint 1 for each rel: (2xe-zp) - rel - K*x0 - K*x'0 <= 0
G[ineq_begs[11]:ineq_begs[12]][range(num_rels), xe_idxs] = 2
G[ineq_begs[11]:ineq_begs[12]][range(num_rels), zp_idxs] = -1
G[ineq_begs[11]:ineq_begs[12]][range(num_rels), rel_idxs] = -1
G[ineq_begs[11]:ineq_begs[12]][range(num_rels), x0_ori_idxs] = -K
G[ineq_begs[11]:ineq_begs[12]][range(num_rels), x0_dest_idxs] = -K
# Constraint 2 for each rel: K*rel -(2*xe-zp) - K*x0 - K*x'0 <= (K-1)
G[ineq_begs[12]:ineq_begs[13]][range(num_rels), rel_idxs] = K
G[ineq_begs[12]:ineq_begs[13]][range(num_rels), xe_idxs] = -2
G[ineq_begs[12]:ineq_begs[13]][range(num_rels), zp_idxs] = 1
G[ineq_begs[12]:ineq_begs[13]][range(num_rels), x0_ori_idxs] = -K
G[ineq_begs[12]:ineq_begs[13]][range(num_rels), x0_dest_idxs] = -K
h[ineq_begs[12]:ineq_begs[13]] = K - 1
# Goal 8: Enforce that each individual has either 0 relationships or >=NZ:
# K*x0 + sum(rels) <= K (if x0==1, sum(rels) must be zero)
G[ineq_begs[13]:ineq_begs[14]][range(N), x0_idxs] = K
for i in range(N):
# For individual i, find var indexes of all its relationships
i_edge_idxs = np.where((g_edge_idxs == i).any(axis=1))[0]
all_i_rels = rel_idxs[i_edge_idxs] # (N-1)
# and set them
G[ineq_begs[13]+i, all_i_rels] = 1
h[ineq_begs[13]:ineq_begs[14]] = K
# -NZ*x0 - sum(rels) <= -NZ (if x0==0, sum(rels) must be >=NZ, otherwise DC)
G[ineq_begs[14]:ineq_begs[15]][range(N), x0_idxs] = -MIN_NEIGHS_IF_NONZERO
for i in range(N):
# For individual i, find var indexes of all its relationships
i_edge_idxs = np.where((g_edge_idxs == i).any(axis=1))[0]
all_i_rels = rel_idxs[i_edge_idxs] # (N-1)
# and set them
G[ineq_begs[14]+i, all_i_rels] = -1
h[ineq_begs[14]:ineq_begs[15]] = -MIN_NEIGHS_IF_NONZERO
# ############################################################################
# # EQUALITY CONSTRAINTS
# ############################################################################
eq_blocksizes = [N, N, num_rels, num_rels]
eq_begs = np.cumsum([0] + eq_blocksizes)
total_eq = sum(eq_blocksizes)
#
A = np.zeros((total_eq, num_vars))
b = np.zeros(total_eq)
# Goal 3: x_0 + sum(x_pos) = 1 for each n
for i, row_i in enumerate(range(eq_begs[0], eq_begs[1])):
A[row_i, x0_idxs[i]] = 1
A[row_i, pos_idxs[i]] = 1
b[eq_begs[0]:eq_begs[1]] = 1
# Goal 4: sum(discarded_positions) = 0 for each n
for i, row_i in enumerate(range(eq_begs[1], eq_begs[2])):
discarded_by_i = (prefs[i] == 0)
A[row_i, pos_idxs[i]] = discarded_by_i
# Goal 5 eq: enforce xe via xl+xg+xe = 1, once per relationship
A[eq_begs[2]:eq_begs[3]][range(num_rels), xl_idxs] = 1
A[eq_begs[2]:eq_begs[3]][range(num_rels), xg_idxs] = 1
A[eq_begs[2]:eq_begs[3]][range(num_rels), xe_idxs] = 1
b[eq_begs[2]:eq_begs[3]] = 1
# Goal 6 eq: enforce zp via zl+zg-zp = 0, once per relationship
A[eq_begs[3]:eq_begs[4]][range(num_rels), zl_idxs] = 1
A[eq_begs[3]:eq_begs[4]][range(num_rels), zg_idxs] = 1
A[eq_begs[3]:eq_begs[4]][range(num_rels), zp_idxs] = -1
# ############################################################################
# # SOLVER AND OPTIMIZATION
# ############################################################################
prob = cp.Problem(cp.Maximize(c.T@x), [G@x <= h, A@x == b])
prob.solve(solver=cp.CPLEX, verbose=True)  # CPLEX worked best with no tweaking
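To make the big-M pattern behind the `zl`/`zg` blocks concrete, here is a minimal, self-contained sketch (plain Python, independent of the variables above; `K = 8` is an arbitrary toy bound). It brute-force checks that for positions $x, y \in \{0, \dots, K-1\}$ and a boolean $z$, the pair of inequalities $(x - y) - Kz \leq 0$ and $(y - x) + Kz \leq K - 1$ admits exactly one value of $z$, namely $z = 1$ iff $x > y$:

```python
# Standalone check of the big-M comparison pattern used in the zl/zg blocks:
# for x, y in {0, ..., K-1} and a boolean z, the pair of inequalities
#   (x - y) - K*z <= 0      and      (y - x) + K*z <= K - 1
# is satisfiable for exactly one z, and that z equals [x > y].
K = 8  # big-M: any bound larger than the maximum position difference works

def feasible(x, y, z):
    return (x - y) - K * z <= 0 and (y - x) + K * z <= K - 1

for x in range(K):
    for y in range(K):
        # collect the boolean values of z compatible with both inequalities
        valid_z = [z for z in (0, 1) if feasible(x, y, z)]
        assert valid_z == [1 if x > y else 0]
print("z = [x > y] is the unique feasible assignment for all pairs")
```

This is exactly why the two inequalities appear in pairs in the listing above: one direction forces $z$ up, the other forces it down, pinning the indicator to the predicate's truth value.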
The technique covered here is both fundamental and impactful, being directly connected with a variety of relevant applications. To illustrate this, we consider here a few of those connections and practical implications.
We have seen different ways of implementing the same relation, either with more variables and fewer constraints, or the converse. In this context, one relevant question is: which one should I prefer to get to my solution faster? By the Strong Duality Theorem, we know that for each feasible linear program (called the primal), there is an equivalent dual program that achieves the same optimum. And it turns out that the constraint matrix of the dual program is the transpose of the primal's.
Many modern ILP solvers are able to swap between primal and dual, and usually they identify which one is better. In the linked post, Michael Grant explains the point perfectly:
Commercial solvers like Gurobi analyze the structure of a problem and decide whether the problem is best suited for solving in primal or dual form. As a general rule, users should not worry about this. In fact, it can often be counterproductive […].
So since the solver can end up transposing it anyway, it seems that reformulating the problem so that our matrix has fewer rows (fewer constraints) but more columns (more variables), or vice versa, should only be done if we end up with strictly fewer elements, or for the sake of formulation clarity.
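As a toy numeric illustration of this primal-dual symmetry (the two-variable LP below is made up for this sketch, not taken from the problems above): the dual of $\min c^Tx$ s.t. $Ax \geq b, x \geq 0$ is $\max b^Ty$ s.t. $A^Ty \leq c, y \geq 0$, and both programs reach the same optimum.

```python
# Primal: min c.x  s.t.  A x >= b, x >= 0
# Dual:   max b.y  s.t.  A^T y <= c, y >= 0   (note the transposed matrix)
A = [[1, 2], [2, 1]]
b = [3, 3]
c = [1, 1]

x_opt = [1.0, 1.0]              # primal optimum (found analytically)
y_opt = [1 / 3, 1 / 3]          # dual optimum (found analytically)

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
At = list(map(list, zip(*A)))   # the transpose of A parametrizes the dual

assert all(dot(row, x_opt) >= bi for row, bi in zip(A, b))   # primal feasible
assert all(dot(row, y_opt) <= ci for row, ci in zip(At, c))  # dual feasible
assert abs(dot(c, x_opt) - dot(b, y_opt)) < 1e-9             # same optimum
print("primal optimum = dual optimum =", dot(c, x_opt))
```

Weak duality alone guarantees that any feasible dual objective lower-bounds any feasible primal objective; the fact that the two values coincide here certifies that both candidate points are optimal.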
One interesting connection between this technique and Graph Theory is the problem of finding maximum weight cliques in a graph, which, like ILP, is NP-complete in the general case. Still, in many practical cases, an ILP formulation fed to a good solver finds the optimum in a reasonable amount of time.
An example of this can be seen in constraint 3 of the extended MCPb formulation paper (online version can be found here) by Park, Lee and Park. The constraint $x_i + x_j - y_{ij} \leq 1$ enforces that, if both $x$ nodes are active, the $y$ edge between them must also be active (note that this is a one-sided implication, since an active edge with inactive nodes is not constrained here, but other constraints take care of that).
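A quick way to convince ourselves of this behaviour (standalone snippet; the names `xi`, `xj`, `yij` are just illustrative) is to enumerate all boolean assignments and check which ones the constraint allows:

```python
from itertools import product

def allowed(xi, xj, yij):
    # The clique-ILP constraint discussed above: x_i + x_j - y_ij <= 1
    return xi + xj - yij <= 1

for xi, xj, yij in product((0, 1), repeat=3):
    # The only forbidden case: both nodes active but the edge inactive.
    assert allowed(xi, xj, yij) == (not (xi == 1 and xj == 1 and yij == 0))
print("only (xi=1, xj=1, yij=0) is forbidden")
```

Note that `allowed(0, 0, 1)` holds: an active edge with inactive endpoints passes this constraint, which is exactly the one-sided implication mentioned above.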
More generally, finding maximal cliques on densely connected k-partite graphs like Turán Graphs is a central problem for many signal processing applications, like compression, denoising or object tracking (online version of the paper here).
A 13-4 Turán graph, from Wikipedia
The idea of achieving complex behaviour by linearly combining simpler, piece-wise rules is at the core of many powerful Machine Learning techniques.
Regarding neural networks, the DeepLearningBook by Goodfellow, Bengio and Courville summarizes the point well in its introduction:
The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). […] The function is formed by composing many simpler functions.
More generally, in artificial neural networks, each layer applies a linear transformation, typically followed by a nonlinearity. And, strongly related to this post, the idea that artificial neurons can implement arbitrary boolean functions was one of the first realizations in the field, stemming from the 1943 foundational paper by McCulloch and Pitts (online version can be found here). Much later (1989), Cybenko would formulate the Universal Approximation Theorem, which proves that artificial neural networks, given enough capacity, can approximate a very large family of functions. The following image provides a good intuition: in it, the authors show the expressivity of a deep NN with single-neuron hidden layers (check the paper for more details):
Source: ResNet with one-neuron hidden layers is a Universal Approximator, Lin and Jegelka, NeurIPS 2018 (online version can be found here).
Another tightly related concept is boosting, a meta-algorithm presented in Schapire’s 1990 paper (online version can be found here), which provided an affirmative answer to the 1988 question by Kearns and Valiant: “Can a set of weak learners create a single strong learner?”. In this context, multiple weak predictors are combined to produce a stronger one, reducing both the variance and the bias of the resulting composition. The linear combination of different thresholding mechanisms, like the ones presented in this post, is a popular boosting strategy.
The power and simplicity of boosting make it very effective in practice. Many current setups include it for different applications; e.g. the InstaBoost paper by Fang et al. (online version here) improves instance segmentation via inpainting-based augmentation, advancing the state of the art on several benchmarks.
But maybe the most powerful and compelling bridge that I’ve seen so far between Machine Learning and ILPs is the following paper, that got a spotlight at ICML 2021:
In it, the authors demonstrate the potential of neural networks to solve ILPs. They provide differentiable expressions for both the cost terms and the constraints, resulting in end-to-end trainable architectures that simultaneously extract the features from raw data and learn the constraints that define the combinatorial problem. The image below illustrates one very exciting application of this:
This post covered a practical approach for encoding predicates into linear programs, and showcased it on two different examples inspired by real-world scenarios. On the one hand, the techniques presented here enhance the expressiveness of our LP repertoire; on the other, they allow us to leverage the theoretical and practical advantages of LP to solve hard combinatorial problems. What a nice duality!