## [Problem](https://adventofcode.com/2023/day/4)
We are given a list of cards, each having a number, winning numbers, and playing numbers. Between those groups of numbers, the first match is worth 1 point and with every additional match it doubles your points. Sum the points from all cards for the answer.
For part two, you win copies of the following cards. So if `Card 10` has 2 matches, you win copies of `Card 11` and `Card 12`. Process all of the original and copied scratchcards until no more scratchcards are won. The total scratchcards is the answer.
> [!Example] Example line of input
> `Card 1: 41 48 83 86 17 | 83 86 6 31 17 9 48 53`
### Input
- 219 lines
- All 3 character card numbers, [left padded](https://en.wikipedia.org/wiki/Npm_left-pad_incident)
- All 2 character numbers, left padded
- 10 winning numbers
- 25 playing numbers
- The fact that they are numbers doesn't have any significance, I never used `int`
## [Solutions](https://github.com/mic-max/advent/blob/master/2023/d04.py)
The basic idea of all of these solutions is the same. Find how many matches a card has, calculate how many points that's worth and sum to our answer.
```python
def frame(L, helper):
res = 0
for line in L:
if matches := helper(line):
res += 2 ** (matches - 1)
return res
```
That's how I ended up writing this framework function for part 1.
### Regex and Set
Process each line by creating a set of all the playing numbers. Then loop through all the winning numbers and increment our match counter by 1 if it's been seen. I used `re.finditer` to find all numbers in the string slices, but any way to split out the strings between spaces would work.
```python
def regex(line):
mine = set()
matches = 0
colon_index = line.index(':')
pipe_index = line.index('|', colon_index)
win_string = line[colon_index + 1:pipe_index]
for m in re.finditer(r'\d+', line[pipe_index + 1:]):
mine.add(m[0])
for m in re.finditer(r'\d+', win_string):
if m[0] in mine:
matches += 1
return matches
```
### String Slices
Ditch the set! We will treat the playing cards string as our data structure. We will iterate through all winning numbers counting matches if its `in` our playing cards string.
To avoid finding the winning number `9` in my playing numbers string that contains `99` I strategically pad both the playing numbers string and the current winning number being searched for with spaces. So it is more like finding `" 9 "` in the string `" 99 "` which of course fails.
A reverse `strip()`, if you will
```python
def dress(s: str) -> str:
return " " + s + " "
```
```python
def slicing(line):
matches = 0
colon_index = line.index(':')
pipe_index = line.index('|', colon_index)
mine_string = line[pipe_index + 1:] + ' '
for part in line[colon_index + 1:pipe_index - 1].split():
if f" {part} " in mine_string: # use dress() if you like
matches += 1
return matches
```
### Aho-Corasick
I used this algorithm in [[Day 01 - Pattern Matching Double Digit Numbers]] so I thought why not try it out here. Although it was not the fastest it does work and is still faster than regex. Building the automaton with the winning numbers was faster than the opposite. I presume because making an automaton from 25 patterns and checking it with 10 strings is slower than making an automaton from 10 patterns and checking it with 25 strings.
```python
def aho_corasick(line):
matches = 0
colon_index = line.index(':')
pipe_index = line.index('|', colon_index)
win_string = line[colon_index + 2:pipe_index - 1]
a1 = ahocorasick.Automaton()
for i, win in enumerate(win_string.split()):
a1.add_word(win, True)
a1.make_automaton()
mine_string = line[pipe_index + 2:]
for x in mine_string.split():
if a1.get(x, False):
matches += 1
return matches
```
### Part 2
To keep track of how many cards will be won I will initialize a list full of 1's to represent my starting cards, since those do count. Then looping through each line I will count the matches using the fastest of my part 1 functions. Then for each match I will add the current total number of cards to its total. I'm adding that amount because if I have 5x Card 3's which has 1 match, then I will be winning 5x Card 4's, for example.
```python
totals = [1] * len(L)
for i, line in enumerate(L):
matches = slicing(line)
for j in range(i + 1, i + matches + 1):
totals[j] += totals[i]
return sum(totals)
```
I've iterated on this function a lot and got it as simple as I want to without stressing myself out. I think I could reduce the space used. I think at least down to the maximum number of matches a card has, which is 10 for my example input. The storage would then be limited to the number of winning numbers on a card instead of the number of cards.
## Performance
| Name | Part | Time (s) | Slower Than Best | Result | Correct |
| ------------------- | ---- | --------- | ---------------- | ------- | ------- |
| slicing_1_winp | 1 | 0.0005359 | 🔥 | 26426 | ✔ |
| slicing_1_pinw | 1 | 0.0008059 | 1.50x 🐌 | 26426 | ✔ |
| aho_corasick_1_pinw | 1 | 0.0017189 | 3.21x 🐌 | 26426 | ✔ |
| aho_corasick_1_winp | 1 | 0.0020292 | 3.79x 🐌 | 26426 | ✔ |
| regex_1_pinw | 1 | 0.0028156 | 5.25x 🐌 | 26426 | ✔ |
| regex_1_winp | 1 | 0.0029252 | 5.46x 🐌 | 26426 | ✔ |
| part_2 | 2 | 0.0006051 | 🔥 | 6227972 | ✔ |
- Take advantage of the fact that every number is 2 characters.
- `itertools.batched` might have been handy to process these fixed size chunks.
- Instead of building a set it was faster to just if the target needle is in the string slice haystack.
- Avoids `set.intersection` as well.
- Pass around only primitive values.
- Utilize offsets and starting and ending points rather than lists, when possible.
- Avoid using split. Use find on the input string to locate beginning and ending of the segment of interest, then use a `range` for loop.
- Performance at small scale can be much different than at large scale. Take care to optimize for the real world data sizes or find the point when one algorithm begins to beat the other.
> [!NOTE] P.S.
> `winp` means that function searched for winners within the playing numbers to find matches and vice-versa for `pinw`.
>
## Conclusion
I see how you can spend time optimising code that is overall inconsequential. Overall it probably doesn't matter and make sure you can justify spending a day optimising a function to take it from `0.3 seconds` to `0.0006 seconds` to an employer.
When trying to improve performance you should focus on the customer perspective.
Contemporary code readability and changeability largely matter more than performance.