Seven Segment Search - Analysis & Constraints
In this post I work out a constraint-based solution to Advent of Code 2021 day 8, "Seven Segment Search". In part 2, I'll implement the solution in F#.
Premise
I won't repeat the full problem premise - you can find the full problem description at this URL: https://adventofcode.com/2021/day/8.
The problem involves seven segment digital displays. Each segment has its own input signal, labeled with the letters a
through g
inclusive. Turning on some signals but not others displays some digit 0 through 9. The signals are supposed to correspond to segments like so:
aaaa
b c
b c
dddd
e f
e f
gggg
In the above case, the signals "a c d f g" would correspond to the digit '3'.
But the signal-to-segment connections have been randomized so that we no longer know which segment each signal letter corresponds to.
The problem input consists of multiple lines; each line contains a list of 10 signal patterns corresponding to the digits 0-9 in an arbitrary order. i.e. we are given 10 unique patterns of letters a-g, but do not know which pattern corresponds to which digit or which letter corresponds to which segment. The randomization is different for each line of input.
The latter part of each line contains 4 additional patterns, each representing a digit, using the same letter to segment mappings as the 10 example patterns.
A single line of input looks like this:
acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab | cdfeb fcadb cdfeb cdbaf
The goal of the problem is to use the example patterns on the left to determine which sets of signals correspond to which digits, so we can decode the 4 displays on the right into a 4-digit integer value.
Notation
Since the problem premise is that the signals 'a' - 'g' no longer match the above display segments, I introduced labels for each segment based on their physical position:
TT
: topTL
: top-leftTR
: top-rightMM
: middleBL
: bottom-leftBR
: bottom-rightBB
: bottom
So the digit 3
would include the segments TT, TR, MM, BR, and BB like so:
TT
■ ■ ■
TL □ ■ TR
□ ■
■ ■ ■ MM
□ ■
BL □ ■ BR
■ ■ ■
BB
Initial Solution
Here are some initial observations about the problem:
- The order of the letters in a pattern does not matter
- Each digit is represented by a unique set of segments and the segments used by a digit are all active simultaneously, so the order of the letters representing the signals is irrelevant:
"abc" == "bca"
- Each digit is represented by a unique set of segments and the segments used by a digit are all active simultaneously, so the order of the letters representing the signals is irrelevant:
- Because the patterns are really sets, we can add (union), subtract, or intersect them to isolate particular segments.
- Some digits have a unique number of segments
- Part 1 of the problem description points out that the digits 1, 4, 7, and 8 each use a unique number of digits, so we can see immediately what patterns correspond to those digits (just not necessarily what letters correspond to what segment)
- We can use segments that different digits have in common to determine the letter for some segments
- the digits '1' and '7' only differ by a single segment. If '1' is represented by the pattern "cf" and '7' is represented by the pattern "acf", subtracting the segments in '1' from the segments in '7' leaves only "a", and we can conclude that "a" corresponds to the top horizontal segment.
Say the set of segments representing digit \(d\) is \(S_d\):
- \(S_1 = \{ TR, BR \}\)
- \(S_7 = \{ TT, TR, BR \}\)
- \(S_7-S_1=\{ TT \}\)
Initial Implementation
For my initial solution, I expanded on the idea that different digits have different segments in common to create a sequence of deductions for each digit. For example:
- The digit '4' is the only pattern with 4 segments: \(|S_4|=4\)
- The digit '7' is the only pattern with 3 segments: \(|S_7|=3\)
- The digits '0', '6', and '9' each use 6 segments: \(|S_0| = |S_6| = |S_9| = 6\)
- The digit '9' uses only one segment that is not also in the digits '4' and '7' combined:
- \(S_4=\{ TL, TR, MM, BR \}\)
- \(S_7=\{ TT, TR, BR \}\)
- \(S_9=\{ TT, TL, TR, MM, BR, BB \}\)
- \(S_4 \cup S_7 = \{ TT, TL, TR, MM, BR \}\)
- \(S_9-(S_4 \cup S_7)=\{ BB \}\)
- This is not true for '0', and '6' - each have two segments not in \(S_4 \cup S_7\)
Therefore the pattern for the digit '9' is a 6-letter pattern that differs from the patterns for '4' and '7' combined by one letter. This allows us to find both which pattern corresponds to the digit '9' and the letter that corresponds to the bottom segment.
You can build a series of deductions like this to determine the identity of all the digits. You can see my F# implementation of this strategy on GitLab. In particular, here is the implementation for finding the digit '9' as described above:
// 9 uses the segments as 1, 4, and 7 plus 1 additional segment at the bottom.
// So look for a 6-segment pattern where if we take out the segments from 1, 4, and 7
// we're left with only one other segment.
let _147 = Set.unionMany [ bindings[1] ; bindings[4] ; bindings[7] ]
let match9 pattern =
match Set.count pattern with
| 6 ->
let diff = Set.difference pattern _147
(Set.count diff) = 1
| _ -> false
set 9 <| List.find match9 unbound
// the additional segment for digit 9 is the bottom segment:
let seg_B = Set.difference bindings[9] _147 |> exactlyOne
Shortcomings
This solution functioned correctly and was sufficiently performant for the size of the problem input. My misgivings with this stragey were mostly stylistic:
- The decoding function is irreducibly complex - the deduction for each digit is unique and cannot re-use code from other deductions.
- The solution is highly specific and could not be generalized to similar problems.
- My implementation relied on mutation, which while allowed is not really in the functional spirit.
As a learning exercise for myself, I wondered if there was a more general way to solve the problem.
Domain Modelling and Brute Force
With our segment labels from earlier, we can list out the set of segments used by each digit:
Digit | Segments |
---|---|
0 | TT, TL, TR, BL, BR, BB |
1 | TR, BR |
2 | TT, TR, MM, BL, BB |
3 | TT, TR, MM, BR, BB |
4 | TL, TR, MM, BR |
5 | TT, TL, MM, BR, BB |
6 | TT, TL, MM, BL, BR, BB |
7 | TT, TR, BR |
8 | TT, TL, TR, MM, BL, BR, BB |
9 | TT, TL, TR, MM, BR, BB |
Another representation of the same information, using a 2D array / membership table:
TT | TL | TR | MM | BL | BR | BB | |
---|---|---|---|---|---|---|---|
0 | ■ | ■ | ■ | - | ■ | ■ | ■ |
1 | - | - | ■ | - | - | ■ | - |
2 | ■ | - | ■ | ■ | ■ | - | ■ |
3 | ■ | - | ■ | ■ | - | ■ | ■ |
4 | - | ■ | ■ | ■ | - | ■ | - |
5 | ■ | ■ | - | ■ | - | ■ | ■ |
6 | ■ | ■ | - | ■ | ■ | ■ | ■ |
7 | ■ | - | ■ | - | - | ■ | - |
8 | ■ | ■ | ■ | ■ | ■ | ■ | ■ |
9 | ■ | ■ | ■ | ■ | - | ■ | ■ |
If we use a similar format for a problem input, we can see something kind of interesting. Using the example input from the Advent of Code page:
acedgfb cdfbe gcdfa fbcad dab
cefabd cdfgeb eafb cagedb ab
We can sort the letters in each string and display them aligned in columns:
a b c d e f g
b c d e f
a c d f g
a b c d f
a b d
a b c d e f
b c d e f g
a b e f
a b c d e g
a b
Or perhaps put them in a 2D array:
a | b | c | d | e | f | g |
---|---|---|---|---|---|---|
■ | ■ | ■ | ■ | ■ | ||
■ | ■ | ■ | ■ | ■ | ||
■ | ■ | ■ | ■ | ■ | ||
■ | ■ | ■ | ||||
■ | ■ | ■ | ■ | ■ | ■ | |
■ | ■ | ■ | ■ | ■ | ■ | |
■ | ■ | ■ | ■ | |||
■ | ■ | ■ | ■ | ■ | ■ | |
■ | ■ |
Each row in this table corresponds to some digit and each column corresponds to some segment. Since we do not know which letter corresponds to which segment, the order of the columns does not match the order of the columns in our earlier table mapping digits to segments.
But some permutation does!
For some permutation of the columns, some permutation of the rows will match our digit-to-segment table exactly.
Is that tractable? The number of permutations for \(n\) distinct objects (with no repitition) is \(n!\)[1], so our number of possibilities when permuting both rows and columns is:
\[ n_d = 10 , n_s = 7 \\ n_d! * n_s! = \\ 3,628,800 * 5,040 = \\ 1,109,282,816 \]
A mere billion possible solutions! A computer should be able to do that easily. However, the idea of "assigning" letters to segments reminded me of another strategy.
Constraint Propagation Problems
To me, this problem felt like something that could be solved with constraint propagation. In particular, it reminded me of Sudoku.
In Sudoku, each square in a 9x9 grid must be assigned a value from 1 to 9 such that no digit repeats within a row, column, or 3x3 box. A puzzle starts with some set of "given" digits - squares that already have a digit assigned. Because of the region-exclusivity rules, given digits constrain the possible values for other cells in the same row, column, and box.
Peter Norvig has a classic essay describing how to solve Sudoku with constraint propagation and search: Solving Every Sudoku Puzzle.
Sudoku is a simple example of a broad class of Constraint Satsifaction Problems for which constraint propagation is one tactic.
Like Sudoku, we can use exclusivity to limit the possible options for different segments. If we start by assuming any segment could be represented by any letter:
Segment | Possible Values |
---|---|
TT | a b c d e f g |
TR | a b c d e f g |
TT | a b c d e f g |
MM | a b c d e f g |
BL | a b c d e f g |
BR | a b c d e f g |
BB | a b c d e f g |
Let's say we had some way of knowing that the segment BL
was represented by the letter c
. We can then remove c
as a possible option for all the other segments:
Segment | Possible Values |
---|---|
TT | a b - d e f g |
TR | a b - d e f g |
TT | a b - d e f g |
MM | a b - d e f g |
BL | - - c - - - - |
BR | a b - d e f g |
BB | a b - d e f g |
Unlike Sudoku, we don't have any "given digits" per-se, only the patterns and our understanding of what segments are used for what digits. We have to find a way to use our problem inputs to further constrain the possible values for each segment.
Identifying Some Constraints
I decided on the following rules after some experimentation. These rules will let us constrain possible letters for each segment using the patterns from our input string, much like the given digits in a sudoku.
It may be possible to simplify or generalize these further! And it's also probably possible to encode our problem even more generically, so that it can fit into a generic CSP solver - but after this I felt I had spent enough time thinking about this one.
1: pattern length = number of segments
- An example string from a line of problem input (a "pattern") that has \(n\) letters must represent a digit with \(n\) segments.
- e.g. since the pattern
abcdf
has 5 letters, it must represent a digit with 5 segments (2, 3, or 5).
- e.g. since the pattern
- A segment that is part of a digit with \(n\) segments, must be represented by a letter from an input string with \(n\) letters.
- e.g. because the segment
TT
is used for the digit '7', and the digit '7' uses 3 segments, there must be a pattern with 3 letters andTT
must be represented by one of those 3 letters. Any letter that isn't one of those 3 cannot correspond toTT
!
- e.g. because the segment
2: common number of segments
Where there are multiple digits that use the same number of segments \(n\), any segment used in all the digits must be represented by a letter appearing in all patterns of length \(n\).
For example:
- The input string
bcdef
represents a digit with 5 segments. - There are 3 digits that use 5 segments: 2, 3, and 5
- Those digits use the following segments:
- \(S_2 = \{ TT, TR, MM, BL, BR \}\)
- \(S_3 = \{ TT, TR, MM, BR, BB \}\)
- \(S_5 = \{ TT, TL, MM, BR, BB \}\)
- These sets have the following segments in common:
- \(S_2 \cap S_3 \cap S_5 = \{ TT, MM, BB \}\)
- Because the string must represent one of the digits 2, 3, or 5, and those digits must include the segments
TT
,MM
, andBB
, we can infer the the segmentsTT
,MM
, andBB
must be represented by one of the letters{ b, c, d, e, f }
. 'a' and 'g' are not possible values for these segments.
Solving with Constraints
We can now define a solution procedure using our rules. Start by mapping each segment to its possible letters, where initially any segment could be any letter:
TT: a b c d e f g
TL: a b c d e f g
TR: a b c d e f g
MM: a b c d e f g
BL: a b c d e f g
BR: a b c d e f g
BB: a b c d e f g
Then, while any segment has more than one possible value:
- Consider the next input pattern, and find the digits that have the same number of segments as the number of letters in the string.
- Get the set of segments used by each of these possible digits.
- Intersect the sets of segments to find segments that are common to all of the possible digits.
- For each of the segments in the intersection, remove as a possibility any letter that is not in the input string.
- If any segment now has only one possible letter, remove that letter as a possibility from all other segments. Repeat if this steps leaves any additional segments with only one possible value.
- Continue at step 1 with the next set of letters from the input.
Because our problem space is so small (assigning 7 letters to 7 segments) and we know each case will have a unique solution (the pattern / set of segments for each digit is unique), this will completely solve the problem without any search or backtracking required.
Worked example
We'll continue to use the example input from the Advent of Code problem page:
acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab
I will sort the letters in each pattern for consistency.
Iteration 0 - abcdefg
abcdefg
happens to be the first input string; because this string contains all segments, we can't use it to constrain the possible values and simply skip it.
Iteration 1 - bcdef
Remember bcdef
from "2: common number of segments"?
Recap
bcdef
represents a digit with 5 segments- 3 digits consist of 5 segments:
- \(S_2= \{TT, TR, MM, BL, BB \}\)
- \(S_3= \{TT, TR, MM, BR, BB \}\)
- \(S_5= \{TT, TL, MM, BR, BB \}\)
- The segments
TT
,MM
, andBB
appear in all 3 possible digits (set intersection) - The segments
TT
,MM
, andBB
must be represented by one of the letters inbcdef
We can remove the letters a
and g
as possiblities for the segments TT, MM, and BB, because all the digits with 5 segments include those segments.
TT: b c d e f
TL: a b c d e f g
TR: a b c d e f g
MM: b c d e f
BL: a b c d e f g
BR: a b c d e f g
BB: b c d e f
Iteration 2 - acdfg
acdfg
represents a digit with 5 segments- 3 digits consist of 5 segments (same as iteration 1):
- \(S_2= \{TT, TR, MM, BL, BB \}\)
- \(S_3= \{TT, TR, MM, BR, BB \}\)
- \(S_5= \{TT, TL, MM, BR, BB \}\)
- The segments
TT
,MM
, andBB
appear in all 3 digits (same as iteration 1) - The segments
TT
,MM
, andBB
must be one of the letters inacdfg
Therefore, we can remove the letters b
and e
as possibilities for the segments TT
, MM
, and BB
:
TT: c d f
TL: a b c d e f g
TR: a b c d e f g
MM: c d f
BL: a b c d e f g
BR: a b c d e f g
BB: c d f
At this point we have a triple: TT
, MM
, and BB
each use the letters c
, d
, and f
. This implies that the other 4 segments can not use those letters.
I've ignored this in order to keep the procedure simple, but see the collapsed notes below if you're interested:
Pairs, Triples, and Disjoint Sets
If a segment has only one possibility, that letter cannot be used for any other segment.
This is also true for pairs, triples, or any subset of size \(N < 7\) where \(N\) segments have the same set of \(N\) possible values.
This is easiest to see with a pair/tuple, where 2 segments have an identical set of 2 possible values. For example, below MM
and BL
(2 segments) each have 2 possibilities, c
and d
:
TT: a b c d e f g
TL: a b c d e f g
TR: a b c d e f g
MM: c d
BL: c d
BR: a b c d e f g
BB: a b c d e f g
If it turns out that \(MM=c\), then \(BL=d\). And if \(MM=d\), then \(BL=c\). Regardless, the letters c
and d
can't be used by any other segments; they must be assigned to either MM
or BL
. We can remove them as possible values for all the other segments!
TT: a b e f g
TL: a b e f g
TR: a b e f g
MM: c d
BL: c d
BR: a b e f g
BB: a b e f g
This is precisely like identifying "naked" pairs or triples in Sudoku. Although it's easy to see visually, implementing this in code could require a lot of scanning and comparisons, so I left it out of my solution.
Iteration 3 - abcdf
abcdf
represents a digit with 5 segments- 3 digits consist of 5 segments (same as iteration 2)
- 3 segments appear in all 3 digits:
TT
,MM
, andBB
(same as iteration 2) TT
,MM
, andBB
must be one of the letters inabcdf
The remaining possibilities for TT
, MM
, and BB
(cdf
) are already a subset of abcdf
, so the possibility table doesn't change.
Iteration 4 - abd
abd
represents a sigit with 3 segments- 1 digit consists of 3 segments:
- \(S_7 = \{TT, TR, BR \}\)
TT
,TR
, andBR
must come from the letters "abd"
Remove any of the letters c
, e
, f
, or g
from the segments TT
, TR
, and BR
:
TT: d
TL: a b c d e f g
TR: a b d
MM: c d f
BL: a b c d e f g
BR: a b d
BB: c d f
The segment TT
now has one remaining possibility. We can "assign" TT = d
and remove d
as a possibility for all other segments:
TT= d
TL: a b c e f g
TR: a b
MM: c f
BL: a b c e f g
BR: a b
BB: c f
We now have 2 pairs! Implementing a pair constraint could remove more possibile values here, but I'll again ignore this for simplicity.
Iteration 5 - abcdef
abcdef
represents a digit with 6 segments- 3 digits have 6 segments:
- \(S_0 = \{ TT, TL, TR, BL, BR, BB \}\)
- \(S_6 = \{ TT, TL, MM, BL, BR, BB \}\)
- \(S_9 = \{ TT, TL, TR, MM, BR, BB \}\)
- 4 segments appear in all 3 digits:
TT
,TL
,BR
, andBB
Remove the letter g
from the segment TL
(TT
is already assigned, and g
has already been removed for BR
and BB
):
TT= d
TL: a b c e f
TR: a b
MM: c f
BL: a b c e f g
BR: a b
BB: c f
Single Position
The letter g
now only appears in one row. This implies that BL = g
!
This is the same as the "single position" technique for Sudoku, also known as spotting "Hidden Singles".
This could be implemented in code by scanning the sets on each iteration, or maintaining a map of letters to possible segments (the inverse of our map from segments to possible letters). Like using pairs and triples, I left this out of my solution for simplicity.
Iteration 6 - bcdefg
bcdefg
represents a digit with 6 segments- This has the same set of shared segments as iteration 5 (
TT
,TL
,BR
, andBB
)
Remove the letter a
from TL
, BR
, and BB
:
TT= d
TL: b c e f
TR: a b
MM: c f
BL: a b c e f g
BR: b
BB: c f
The segment BR
has one remaining possibility:
TT= d
TL: c e f
TR: a
MM: c f
BL: a c e f g
BR= b
BB: c f
Now the segment TR
also has one remaining possibility:
TT= d
TL: c e f
TR= a
MM: c f
BL: c e f g
BR= b
BB: c f
Iteration 7 - abef
abef
represents a digit with 4 segments- 1 digit has 4 segments:
- \(S_4 = \{TL, TR, MM, BR\}\)
TL
,TR
,MM
,BR
must come from the letters inabef
Remove the letters c
and d
from TL
and MM
(TR
and BR
are already assigned):
TT= d
TL: e f
TR= a
MM: f
BL: c e f g
BR= b
BB: c f
Segment MM
has only one possibility:
TT= d
TL: e
TR= a
MM= f
BL: c e g
BR= b
BB: c
Now segment TL
has only one possibility:
TT= d
TL= e
TR= a
MM= f
BL: c g
BR= b
BB: c
Now segment BB
has only one possibility:
TT= d
TL= e
TR= a
MM= f
BL: g
BR= b
BB= c
And segment BL
has only one possibility:
TT= d
TL= e
TR= a
MM= f
BL= g
BR= b
BB= c
At this point, all segments have one possibility:
{
TT = d,
TL = e,
TR = a,
MM = f,
BL = g,
BR = b,
BB = c
}
So we do not need to consider the input strings "cagedb" or "ab".
Implementation
todo... clean up that F# implementation!