Day 03 - Grids and Adjacency

## [Problem](https://adventofcode.com/2023/day/3) We must sum all numbers adjacent (including diagonals) to a symbol. Anything that isn't a period or a digit is considered a symbol. ### Input - 140 x 140 grid: ~20,000 characters - 21% of the characters in the grid belong to a number or are a symbol - 1217 numbers - Majority 3 digits, some single and double - 748 symbols - The symbols are `#$%&*+-/=@` in lexicographical order - 361 `*`'s - `#$%&` and `*+` are neighbouring chunks, all other symbols are on lonely islands. This information might be useless of the beginning of a rabbit hole for everyone's favourite thing, an early micro-optimisation😅 In part 2 we only care about asterisk symbols that are adjacent to exactly two numbers, and the product of those numbers is added to our result. > [!example] Example input > ``` 467..114.. ...*...... ..35..633. ......#... 617*...... .....+.58. ..592..... ......755. ...$.*.... .664.598.. ## [Solution](https://github.com/mic-max/advent/blob/master/2023/d03.py) ### Simple #### Part 1 I made a `namedtuple` Number for readability. It has fields for the int value, y AKA line number and x start and end positions. Iterating through the input lines using `enumerate` to get the index. I perform a regular expression finding all integers. Then I need to check if there exists a symbol within a 1 character border of that integer. If so I will convert the int from its string representation and add to our result. > [!NOTE] Regex Caveat > The token `\d` represents any digit. Which does include `0`. So this code will accept numbers that start with 0 and treat them as decimal numbers (not the typically 0 prefixed octal). Not that any were included in the input 😌 The function to determine if a symbol is adjacent is quite simple. It takes a y position and two x positions which represent the number to put a bounding box around. It first loops y values from one less than the y argument until 1 past it (inclusive), making sure not to overstep the bounds. Same thing for the inner loop for the x positions essentially, and it returns True at the first symbol seen. ```python def symbol_in_bounds(y, x1, x2): for y in range(max(0, y - 1), min(LINE_COUNT, y + 2)): for x in range(max(0, x1 - 1), min(LINE_LENGTH - 1, x2 + 1) + 1): if is_symbol(L[y][x]): return True return False ``` It's really as simple as the following once we have a way to check if a bounding box contained a symbol! ```python res = 0 for y, line in enumerate(L): for m in re.finditer(r'\d+', line): if symbol_in_bounds(y, m.start(), m.end() - 1): res += int(m[0]) return res ``` #### Part 2 For part 2 we will add a data structure to keep track of our numbers. Using a `defaultdict(list)` will cut down on the amount of unnecessary searching since we will organize each `Number` into its numeric row using the y value. As we iterate over each line's numbers we will add it to our numbers collection. Then we can loop through every character of the grid once again and find the asterisks only. We can call a helper function to find all adjacent numbers to this y, x position and if there's exactly two, multiply their values together, after converting them from a string to an int, and add it to our result. ```python def adjacent_nums(y, x): return [n.value for n in (numbers[y - 1] + numbers[y] + numbers[y + 1]) if (n.x1 - 1) <= x <= (n.x2 + 1)] ``` This pythonic one liner list comprehension function isn't so complicated. It is merely taking the numbers from the above, current and below rows (all out of bounds are handled by the defaultdict), checking if they horizontally come within 1 column of the asterisk and if they do map the Number to just its value. ```python res = 0 numbers = collections.defaultdict(list) for y, line in enumerate(L): for m in re.finditer(r'\d+', line): numbers[y].append(Number(m[0], y, m.start(), m.end() - 1)) for y, line in enumerate(L): for x, ch in enumerate(line): if ch == '*': adj = adjacent_nums(y, x) if len(adj) == 2: res += math.prod(map(int, adj)) return res ``` ### Iterative Another way to solve part 1 at least would be to iterate through every line's characters building up the int and keeping track of some state. ### Quadtree This might be the ideal data structure? ## Performance | Part Number | Time | | ----------- | -------- | | Part 1 | 0.003261 | | Part 2 | 0.002694 | It's funny my part 2 is faster than part 1! Stopping as soon as a failure state is reached could improve the performance in part 2 when looking for an asterisk with exactly two adjacent numbers. Currently my code will go onto find all neighbours, it should quit once it has found a third. I tried improving my original code that checked whether a character was a symbol or not. The result was actually a program that took 25 times as long to execute... What's going on here? ```python ch not in '.1234567890' # vs ch != '.' and (ch < '0' or ch > '9') ``` I thought that checking 3 conditions would be faster than checking if a character exists in an 11 character string. It's slows the runtime of part 1 down by about 10%. It was not of any benefit to optimize out checking the characters that belong to the number itself from symbol within a boundary function. It also complicated the code. Take a look. ```python def symbol_in_bounds(y, x1, x2): if x1 > 0 and is_symbol(L[y][x1 - 1]): return True if x2 < LINE_LENGTH - 1 and is_symbol(L[y][x2 + 1]): return True for x in range(max(0, x1 - 1), min(LINE_LENGTH - 1, x2 + 1) + 1): if is_symbol(L[max(0, y - 1)][x]): return True for x in range(max(0, x1 - 1), min(LINE_LENGTH - 1, x2 + 1) + 1): if is_symbol(L[min(LINE_COUNT - 1, y + 1)][x]): return True return False ``` ## Padding Input A strategy to simplify code is to add a border around the grid input of values that we ignore, in this problem that could be the period. Since we are only checking 1 position away in all directions that would be the same size of border we could add. ```python def pad_grid(L, ch = '.', border_size = 1): pad = ch * border_size L = [pad + x + pad for x in L] pad_line = ch * len(L[0]) L = [pad_line] + L + [pad_line] return L ``` This example grid padding function would turn a grid like ``` > ............ 467..114.. > .467..114... ...*...... > ....*....... ..35..633. > ...35..633.. ......#... > .......#.... 617*...... > .617*....... .....+.58. > ......+.58.. ..592..... > ...592...... ......755. > .......755.. ...$.*.... > ....$.*..... .664.598.. > ..664.598... > ............ ``` Keep in mind that if the x and y locations are important that you must now subtract the border size from both before using them. ## Conclusion Refactoring python ranges to list indices is a bit tricky since they are exclusive for the end argument.