Advent of Code 2021: Day 5


The Data


It’s Day 5 and we’re 20% of the way through the advent challenge! See the explanation for today’s challenge here. Today’s challenges will again involve working with indexing numbers in The Matrix a matrix.

We’re given two sets of 2-dimensional coordinates (X,Y), which represent the start and end of a line. We then need to count the number of points at which at least two lines overlap. The data includes an unusual separator (->), so we’ll use read_delim() from the readr package to read in the data. I find the readr functions more powerful than base functions for reading data because they allow for more complex separators. See the example below with base function read.delim that cannot use a separator larger than 1 byte.

read.delim("./data/Day5.txt", sep = " -> ")
## Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1, quiet = TRUE, : invalid 'sep' value: must be one byte
library(readr)

#Read in data where each set of coordinates is a character string
raw_data <- readr::read_delim("./data/Day5.txt",
                              delim = " -> ", col_names = c("start", "end"),
                              show_col_types = FALSE,
                              col_types = list(start = readr::col_character(),
                                               end = readr::col_character()))

head(raw_data)
## # A tibble: 6 × 2
##   start   end    
##   <chr>   <chr>  
## 1 503,977 843,637
## 2 437,518 437,225
## 3 269,250 625,250
## 4 846,751 646,751
## 5 18,731  402,731
## 6 749,923 749,986

We need to be able to access each of the X and Y values separately, so we’ll separate this data out into 4 columns.

library(stringr)

#Convert the characters into a 4col numeric matrix
start_point <- str_split(raw_data$start, pattern = ",", simplify = TRUE)
end_point   <- str_split(raw_data$end, pattern = ",", simplify = TRUE)
all_points  <- cbind(start_point, end_point)
all_points  <- as.numeric(all_points)
dim(all_points) <- c(nrow(raw_data), 4)

head(all_points)
##      [,1] [,2] [,3] [,4]
## [1,]  503  977  843  637
## [2,]  437  518  437  225
## [3,]  269  250  625  250
## [4,]  846  751  646  751
## [5,]   18  731  402  731
## [6,]  749  923  749  986


The Challenges


Challenge 1


For challenge 1 we just focus on horizontal or vertical lines. Just like in our bingo challenge on day 4, we will create an empty matrix (all 0s) on which to map our results. Our matrix is 1000 x 1000 to accomodate all the possible coordinates in our data.

zero_mat <- matrix(0, nrow = 1000, ncol = 1000)

#Look at a section of the matrix
zero_mat[1:10, 1:10]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    0    0    0    0    0    0    0    0    0     0
##  [2,]    0    0    0    0    0    0    0    0    0     0
##  [3,]    0    0    0    0    0    0    0    0    0     0
##  [4,]    0    0    0    0    0    0    0    0    0     0
##  [5,]    0    0    0    0    0    0    0    0    0     0
##  [6,]    0    0    0    0    0    0    0    0    0     0
##  [7,]    0    0    0    0    0    0    0    0    0     0
##  [8,]    0    0    0    0    0    0    0    0    0     0
##  [9,]    0    0    0    0    0    0    0    0    0     0
## [10,]    0    0    0    0    0    0    0    0    0     0

As we’re just focussing on the horizontal or diagonal lines, we filter only those cases where x or y are constant.

#Identify only horizontal or vertical lines
nondiag <- all_points[apply(all_points, MARGIN = 1, FUN = function(x){

  x[1] == x[3] | x[2] == x[4]

}), ]

Now, adding lines to our matrix is fairly straight forward but we need to deal with two small issues before we do. First, the coordinates we’re given start at 0,0; however, in R indexing begins at 1 (a shocking fact for people familiar with other programming languages!). This means we’ll have to add 1 to all our coordinates so they can be properly used for indexing.

The second hurdle is that our coordinates are provided as X,Y, but to index a matrix in R we need to provide coordinates as Y (i.e. rows), X (i.e. columns). In the for loop below I create a sequence of coordinate to map the horizontal and vertical lines and then reverse their order to be used as our index values.

for (i in 1:nrow(nondiag)) {

  line <- nondiag[i, ]
  
  #Make sequence of first coordinates (X)
  #Add one to use for R indexing
  xs   <- (line[1]:line[3]) + 1
  #Make sequence of second coordinates (Y)
  ys   <- (line[2]:line[4]) + 1

  #Create a matrix of index values, BUT we need to write this as Y,X instead of X,Y
  index_values <- cbind(ys, xs)

  zero_mat[index_values] <- zero_mat[index_values] + 1

}

Now we can get our first answer, the number of points at which at least two lines overlap.

sum(zero_mat > 1)
## [1] 5124

Challenge 2


Challenge 2 requires us to include the diagonal lines too. This might seem more complex at first, but we are given the helpful caveat that all diagonal lines are 45 degrees, or, in other words, the slope of all lines will be 1. Because of this, our previous approach that creates a sequence of X and Y values increasing by 1 will still be appropriate. Only this time both the X and Y values will change.

#Find all diagonal lines
diag <- all_points[apply(all_points, MARGIN = 1, FUN = function(x){

  x[1] != x[3] & x[2] != x[4]

}), ]
#Use the same approach to add all diagonal lines to our original matrix
for (i in 1:nrow(diag)) {

  line <- diag[i, ]

  #Make sequence of first coordinates (X)
  #Add one to use for R indexing
  xs   <- (line[1]:line[3]) + 1
  #Make sequence of second coordinates (Y)
  ys   <- (line[2]:line[4]) + 1

  #Create a matrix of index values, BUT we need to write this as Y,X instead of X,Y
  index_values <- cbind(ys, xs)

  zero_mat[index_values] <- zero_mat[index_values] + 1

}

Once we have also added diagonal lines, we can get our second result.

sum(zero_mat > 1)
## [1] 19771


See previous solutions here:

Post-doctoral researcher

Ecologist and data science, interested in using data science techniques for conservation outcomes.