Seven Segment Search - Analysis & Constraints

23 Jan 2022

This post is part of a series: "Advent of Code 2021 - Seven Segment Search"

In this post I work out a constraint-based solution to Advent of Code 2021 day 8, "Seven Segment Search". In part 2, I'll implement the solution in F#.

Premise

I won't repeat the full problem premise - you can find the full problem description at this URL: https://adventofcode.com/2021/day/8.

The problem involves seven segment digital displays. Each segment has its own input signal, labeled with the letters a through g inclusive. Turning on some signals but not others displays some digit 0 through 9. The signals are supposed to correspond to segments like so:

aaaa
b c
b c
dddd
e f
e f
gggg

In the above case, the signals "a c d f g" would correspond to the digit '3'.

But the signal-to-segment connections have been randomized so that we no longer know which segment each signal letter corresponds to.

The problem input consists of multiple lines; each line contains a list of 10 signal patterns corresponding to the digits 0-9 in an arbitrary order. i.e. we are given 10 unique patterns of letters a-g, but do not know which pattern corresponds to which digit or which letter corresponds to which segment. The randomization is different for each line of input.

The latter part of each line contains 4 additional patterns, each representing a digit, using the same letter to segment mappings as the 10 example patterns.

A single line of input looks like this:

acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab | cdfeb fcadb cdfeb cdbaf

The goal of the problem is to use the example patterns on the left to determine which sets of signals correspond to which digits, so we can decode the 4 displays on the right into a 4-digit integer value.

Notation

Since the problem premise is that the signals 'a' - 'g' no longer match the above display segments, I introduced labels for each segment based on their physical position:

TT : top
TL : top-left
TR : top-right
MM : middle
BL : bottom-left
BR : bottom-right
BB : bottom

So the digit 3 would include the segments TT, TR, MM, BR, and BB like so:

    TT
    ■ ■ ■
TL □     ■ TR
   □     ■
    ■ ■ ■  MM
   □     ■
BL □     ■ BR
    ■ ■ ■
       BB

Initial Solution

Here are some initial observations about the problem:

The order of the letters in a pattern does not matter
- Each digit is represented by a unique set of segments and the segments used by a digit are all active simultaneously, so the order of the letters representing the signals is irrelevant: "abc" == "bca"
Because the patterns are really sets, we can add (union), subtract, or intersect them to isolate particular segments.
Some digits have a unique number of segments
- Part 1 of the problem description points out that the digits 1, 4, 7, and 8 each use a unique number of digits, so we can see immediately what patterns correspond to those digits (just not necessarily what letters correspond to what segment)
We can use segments that different digits have in common to determine the letter for some segments
- the digits '1' and '7' only differ by a single segment. If '1' is represented by the pattern "cf" and '7' is represented by the pattern "acf", subtracting the segments in '1' from the segments in '7' leaves only "a", and we can conclude that "a" corresponds to the top horizontal segment.

Say the set of segments representing digit is :

Initial Implementation

For my initial solution, I expanded on the idea that different digits have different segments in common to create a sequence of deductions for each digit. For example:

The digit '4' is the only pattern with 4 segments:
The digit '7' is the only pattern with 3 segments:
The digits '0', '6', and '9' each use 6 segments:
The digit '9' uses only one segment that is not also in the digits '4' and '7' combined:
This is not true for '0', and '6' - each have two segments not in

Therefore the pattern for the digit '9' is a 6-letter pattern that differs from the patterns for '4' and '7' combined by one letter. This allows us to find both which pattern corresponds to the digit '9' and the letter that corresponds to the bottom segment.

You can build a series of deductions like this to determine the identity of all the digits. You can see my F# implementation of this strategy on GitLab. In particular, here is the implementation for finding the digit '9' as described above:

    // 9 uses the segments as 1, 4, and 7 plus 1 additional segment at the bottom.
    // So look for a 6-segment pattern where if we take out the segments from 1, 4, and 7
    // we're left with only one other segment.
    let _147 = Set.unionMany [ bindings[1] ; bindings[4] ; bindings[7] ]
    let match9 pattern =
        match Set.count pattern with
        | 6 -> 
            let diff = Set.difference pattern _147
            (Set.count diff) = 1
        | _ -> false

    set 9 <| List.find match9 unbound

    // the additional segment for digit 9 is the bottom segment:
    let seg_B = Set.difference bindings[9] _147 |> exactlyOne

Shortcomings

This solution functioned correctly and was sufficiently performant for the size of the problem input. My misgivings with this stragey were mostly stylistic:

The decoding function is irreducibly complex - the deduction for each digit is unique and cannot re-use code from other deductions.
The solution is highly specific and could not be generalized to similar problems.
My implementation relied on mutation, which while allowed is not really in the functional spirit.

As a learning exercise for myself, I wondered if there was a more general way to solve the problem.

Domain Modelling and Brute Force

With our segment labels from earlier, we can list out the set of segments used by each digit:

Digit	Segments
0	TT, TL, TR, BL, BR, BB
1	TR, BR
2	TT, TR, MM, BL, BB
3	TT, TR, MM, BR, BB
4	TL, TR, MM, BR
5	TT, TL, MM, BR, BB
6	TT, TL, MM, BL, BR, BB
7	TT, TR, BR
8	TT, TL, TR, MM, BL, BR, BB
9	TT, TL, TR, MM, BR, BB

Another representation of the same information, using a 2D array / membership table:

	TT	TL	TR	MM	BL	BR	BB
0	■	■	■	-	■	■	■
1	-	-	■	-	-	■	-
2	■	-	■	■	■	-	■
3	■	-	■	■	-	■	■
4	-	■	■	■	-	■	-
5	■	■	-	■	-	■	■
6	■	■	-	■	■	■	■
7	■	-	■	-	-	■	-
8	■	■	■	■	■	■	■
9	■	■	■	■	-	■	■

If we use a similar format for a problem input, we can see something kind of interesting. Using the example input from the Advent of Code page:

acedgfb cdfbe gcdfa fbcad dab
cefabd cdfgeb eafb cagedb ab

We can sort the letters in each string and display them aligned in columns:

a b c d e f g
b c d e f
a c d f g
a b c d f
a b d
a b c d e f
b c d e f g
a b e f
a b c d e g
a b

Or perhaps put them in a 2D array:

a	b	c	d	e	f	g
	■	■	■	■	■
■		■	■		■	■
■	■	■	■		■
■	■		■
■	■	■	■	■	■
	■	■	■	■	■	■
■	■			■	■
■	■	■	■	■		■
■	■

Each row in this table corresponds to some digit and each column corresponds to some segment. Since we do not know which letter corresponds to which segment, the order of the columns does not match the order of the columns in our earlier table mapping digits to segments.

But some permutation does!

For some permutation of the columns, some permutation of the rows will match our digit-to-segment table exactly.

Is that tractable? The number of permutations for distinct objects (with no repitition) is ^[1], so our number of possibilities when permuting both rows and columns is:

A mere billion possible solutions! A computer should be able to do that easily. However, the idea of "assigning" letters to segments reminded me of another strategy.

Constraint Propagation Problems

To me, this problem felt like something that could be solved with constraint propagation. In particular, it reminded me of Sudoku.

In Sudoku, each square in a 9x9 grid must be assigned a value from 1 to 9 such that no digit repeats within a row, column, or 3x3 box. A puzzle starts with some set of "given" digits - squares that already have a digit assigned. Because of the region-exclusivity rules, given digits constrain the possible values for other cells in the same row, column, and box.

Peter Norvig has a classic essay describing how to solve Sudoku with constraint propagation and search: Solving Every Sudoku Puzzle.

Sudoku is a simple example of a broad class of Constraint Satsifaction Problems for which constraint propagation is one tactic.

Like Sudoku, we can use exclusivity to limit the possible options for different segments. If we start by assuming any segment could be represented by any letter:

Segment	Possible Values
TT	`a b c d e f g`
TR	`a b c d e f g`
TT	`a b c d e f g`
MM	`a b c d e f g`
BL	`a b c d e f g`
BR	`a b c d e f g`
BB	`a b c d e f g`

Let's say we had some way of knowing that the segment BL was represented by the letter c. We can then remove c as a possible option for all the other segments:

Segment	Possible Values
TT	`a b - d e f g`
TR	`a b - d e f g`
TT	`a b - d e f g`
MM	`a b - d e f g`
BL	`- - c - - - -`
BR	`a b - d e f g`
BB	`a b - d e f g`

Unlike Sudoku, we don't have any "given digits" per-se, only the patterns and our understanding of what segments are used for what digits. We have to find a way to use our problem inputs to further constrain the possible values for each segment.

Identifying Some Constraints

I decided on the following rules after some experimentation. These rules will let us constrain possible letters for each segment using the patterns from our input string, much like the given digits in a sudoku.

It may be possible to simplify or generalize these further! And it's also probably possible to encode our problem even more generically, so that it can fit into a generic CSP solver - but after this I felt I had spent enough time thinking about this one.

1: pattern length = number of segments

An example string from a line of problem input (a "pattern") that has n letters must represent a digit with n segments.
- e.g. since the pattern abcdf has 5 letters, it must represent a digit with 5 segments (2, 3, or 5).
A segment that is part of a digit with n segments, must be represented by a letter from an input string with n letters.
- e.g. because the segment TT is used for the digit '7', and the digit '7' uses 3 segments, there must be a pattern with 3 letters and TT must be represented by one of those 3 letters. Any letter that isn't one of those 3 cannot correspond to TT!

2: common number of segments

Where there are multiple digits that use the same number of segments , any segment used in all the digits must be represented by a letter appearing in all patterns of length .

For example:

The input string bcdef represents a digit with 5 segments.
There are 3 digits that use 5 segments: 2, 3, and 5
Those digits use the following segments:
These sets have the following segments in common:
Because the string must represent one of the digits 2, 3, or 5, and those digits must include the segments TT, MM, and BB, we can infer the the segments TT, MM, and BB must be represented by one of the letters { b, c, d, e, f }. 'a' and 'g' are not possible values for these segments.

Solving with Constraints

We can now define a solution procedure using our rules. Start by mapping each segment to its possible letters, where initially any segment could be any letter:

TT: a b c d e f g
TL: a b c d e f g
TR: a b c d e f g
MM: a b c d e f g
BL: a b c d e f g
BR: a b c d e f g
BB: a b c d e f g

Then, while any segment has more than one possible value:

Consider the next input pattern, and find the digits that have the same number of segments as the number of letters in the string.
Get the set of segments used by each of these possible digits.
Intersect the sets of segments to find segments that are common to all of the possible digits.
For each of the segments in the intersection, remove as a possibility any letter that is not in the input string.
If any segment now has only one possible letter, remove that letter as a possibility from all other segments. Repeat if this steps leaves any additional segments with only one possible value.
Continue at step 1 with the next set of letters from the input.

Because our problem space is so small (assigning 7 letters to 7 segments) and we know each case will have a unique solution (the pattern / set of segments for each digit is unique), this will completely solve the problem without any search or backtracking required.

Worked example

We'll continue to use the example input from the Advent of Code problem page:

acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab

I will sort the letters in each pattern for consistency.

Iteration 0 - abcdefg

abcdefg happens to be the first input string; because this string contains all segments, we can't use it to constrain the possible values and simply skip it.

Iteration 1 - bcdef

Remember bcdef from "2: common number of segments"?

Recap

bcdef represents a digit with 5 segments
3 digits consist of 5 segments:
The segments TT, MM, and BB appear in all 3 possible digits (set intersection)
The segments TT, MM, and BB must be represented by one of the letters in bcdef

We can remove the letters a and g as possiblities for the segments TT, MM, and BB, because all the digits with 5 segments include those segments.

TT: b c d e f
TL: a b c d e f g
TR: a b c d e f g
MM: b c d e f
BL: a b c d e f g
BR: a b c d e f g
BB: b c d e f

Iteration 2 - acdfg

acdfg represents a digit with 5 segments
3 digits consist of 5 segments (same as iteration 1):
The segments TT, MM, and BB appear in all 3 digits (same as iteration 1)
The segments TT, MM, and BB must be one of the letters in acdfg

Therefore, we can remove the letters b and e as possibilities for the segments TT, MM, and BB:

TT: c d f
TL: a b c d e f g
TR: a b c d e f g
MM: c d f
BL: a b c d e f g
BR: a b c d e f g
BB: c d f

At this point we have a triple: TT, MM, and BB each use the letters c, d, and f. This implies that the other 4 segments can not use those letters.
I've ignored this in order to keep the procedure simple, but see the collapsed notes below if you're interested:

Pairs, Triples, and Disjoint Sets

If a segment has only one possibility, that letter cannot be used for any other segment.

This is also true for pairs, triples, or any subset of size where segments have the same set of possible values.

This is easiest to see with a pair/tuple, where 2 segments have an identical set of 2 possible values. For example, below MM and BL (2 segments) each have 2 possibilities, c and d:

TT: a b c d e f g
TL: a b c d e f g
TR: a b c d e f g
MM: c d
BL: c d
BR: a b c d e f g
BB: a b c d e f g

If it turns out that , then . And if , then . Regardless, the letters c and d can't be used by any other segments; they must be assigned to either MM or BL. We can remove them as possible values for all the other segments!

TT: a b e f g
TL: a b e f g
TR: a b e f g
MM: c d
BL: c d
BR: a b e f g
BB: a b e f g

This is precisely like identifying "naked" pairs or triples in Sudoku. Although it's easy to see visually, implementing this in code could require a lot of scanning and comparisons, so I left it out of my solution.

Iteration 3 - abcdf

abcdf represents a digit with 5 segments
3 digits consist of 5 segments (same as iteration 2)
3 segments appear in all 3 digits: TT, MM, and BB (same as iteration 2)
TT, MM, and BB must be one of the letters in abcdf

The remaining possibilities for TT, MM, and BB (cdf) are already a subset of abcdf, so the possibility table doesn't change.

Iteration 4 - abd

abd represents a sigit with 3 segments
1 digit consists of 3 segments:
TT, TR, and BR must come from the letters "abd"

Remove any of the letters c, e, f, or g from the segments TT, TR, and BR:

TT: d
TL: a b c d e f g
TR: a b d
MM: c d f
BL: a b c d e f g
BR: a b d
BB: c d f

The segment TT now has one remaining possibility. We can "assign" TT = d and remove d as a possibility for all other segments:

TT= d
TL: a b c e f g
TR: a b
MM: c f
BL: a b c e f g
BR: a b
BB: c f

We now have 2 pairs! Implementing a pair constraint could remove more possibile values here, but I'll again ignore this for simplicity.

Iteration 5 - abcdef

abcdef represents a digit with 6 segments
3 digits have 6 segments:
4 segments appear in all 3 digits: TT, TL, BR, and BB

Remove the letter g from the segment TL (TT is already assigned, and g has already been removed for BR and BB):

TT= d
TL: a b c e f
TR: a b
MM: c f
BL: a b c e f g
BR: a b
BB: c f

Single Position

The letter g now only appears in one row. This implies that BL = g!
This is the same as the "single position" technique for Sudoku, also known as spotting "Hidden Singles".

This could be implemented in code by scanning the sets on each iteration, or maintaining a map of letters to possible segments (the inverse of our map from segments to possible letters). Like using pairs and triples, I left this out of my solution for simplicity.

Iteration 6 - bcdefg

bcdefg represents a digit with 6 segments
This has the same set of shared segments as iteration 5 (TT, TL, BR, and BB)

Remove the letter a from TL, BR, and BB:

TT= d
TL: b c e f
TR: a b
MM: c f
BL: a b c e f g
BR: b
BB: c f

The segment BR has one remaining possibility:

TT= d
TL: c e f
TR: a
MM: c f
BL: a c e f g
BR= b
BB: c f

Now the segment TR also has one remaining possibility:

TT= d
TL: c e f
TR= a
MM: c f
BL: c e f g
BR= b
BB: c f

Iteration 7 - abef

abef represents a digit with 4 segments
1 digit has 4 segments:
TL, TR, MM, BR must come from the letters in abef

Remove the letters c and d from TL and MM (TR and BR are already assigned):

TT= d
TL: e f
TR= a
MM: f
BL: c e f g
BR= b
BB: c f

Segment MM has only one possibility:

TT= d
TL: e
TR= a
MM= f
BL: c e g
BR= b
BB: c

Now segment TL has only one possibility:

TT= d
TL= e
TR= a
MM= f
BL: c g
BR= b
BB: c

Now segment BB has only one possibility:

TT= d
TL= e
TR= a
MM= f
BL: g
BR= b
BB= c

And segment BL has only one possibility:

TT= d
TL= e
TR= a
MM= f
BL= g
BR= b
BB= c

At this point, all segments have one possibility:

{
TT = d,
TL = e,
TR = a,
MM = f,
BL = g,
BR = b,
BB = c
}

So we do not need to consider the input strings "cagedb" or "ab".

Implementation

todo... clean up that F# implementation!

https://brilliant.org/wiki/counting-permutations/#permutations-of-a-set-of-distinct-objects ↩︎

	TT	TL	TR	MM	BL	BR	BB
0	■	■	■	-	■	■	■
1	-	-	■	-	-	■	-
2	■	-	■	■	■	-	■
3	■	-	■	■	-	■	■
4	-	■	■	■	-	■	-
5	■	■	-	■	-	■	■
6	■	■	-	■	■	■	■
7	■	-	■	-	-	■	-
8	■	■	■	■	■	■	■
9	■	■	■	■	-	■	■

a	b	c	d	e	f	g
	■	■	■	■	■
■		■	■		■	■
■	■	■	■		■
■	■		■
■	■	■	■	■	■
	■	■	■	■	■	■
■	■			■	■
■	■	■	■	■		■
■	■

	TT	TL	TR	MM	BL	BR	BB
0	■	■	■	-	■	■	■
1	-	-	■	-	-	■	-
2	■	-	■	■	■	-	■
3	■	-	■	■	-	■	■
4	-	■	■	■	-	■	-
5	■	■	-	■	-	■	■
6	■	■	-	■	■	■	■
7	■	-	■	-	-	■	-
8	■	■	■	■	■	■	■
9	■	■	■	■	-	■	■

a	b	c	d	e	f	g
	■	■	■	■	■
■		■	■		■	■
■	■	■	■		■
■	■		■
■	■	■	■	■	■
	■	■	■	■	■	■
■	■			■	■
■	■	■	■	■		■
■	■

	TT	TL	TR	MM	BL	BR	BB
0	■	■	■	-	■	■	■
1	-	-	■	-	-	■	-
2	■	-	■	■	■	-	■
3	■	-	■	■	-	■	■
4	-	■	■	■	-	■	-
5	■	■	-	■	-	■	■
6	■	■	-	■	■	■	■
7	■	-	■	-	-	■	-
8	■	■	■	■	■	■	■
9	■	■	■	■	-	■	■

a	b	c	d	e	f	g
	■	■	■	■	■
■		■	■		■	■
■	■	■	■		■
■	■		■
■	■	■	■	■	■
	■	■	■	■	■	■
■	■			■	■
■	■	■	■	■		■
■	■