Merge pull request #36 from paxtonfitzpatrick/main

jeremymanning · web-flow · commit 0fad9f15a073 · 2024-07-29T21:57:08.000-04:00
catching up on previous days' problems
diff --git a/problems/1334/paxtonfitzpatrick.md b/problems/1334/paxtonfitzpatrick.md
@@ -0,0 +1,65 @@
+# [Problem 1334: Find the City With the Smallest Number of Neighbors at a Threshold Distance](https://leetcode.com/problems/find-the-city-with-the-smallest-number-of-neighbors-at-a-threshold-distance/description/?envType=daily-question)
+
+## Initial thoughts (stream-of-consciousness)
+
+- I feel like this is one of those problems that's supposed to cue you to some particular algorithm that I'm just not familiar with. I'll probably end up googling for this eventually, but I wanna take a stab at a solution first.
+- okay so to answer the problem I need to find the number of other nodes within `distanceThreshold` of each node. My first thought is I could do a DFS (BFS?) from each node where I stop going down a particular path when I reach the threshold distance, but that seems like it'd be slow...
+- I could reduce the overall number of paths upfront by dropping any edges whose weights are greater than the threshold distance, because I know they'll never contribute to a viable path anyway. This would require an additional $O(E)$ pass (where $E$ is the number of edges) to filter the initial edge list, but if `distanceThreshold` is low, it could easily pay off. There are even a few instances of this in the examples, so I feel like it's something we're meant to notice.
+  - actually, I'll need to iterate through the list of edges at least once to build the graph anyway, so I could just filter out the edges that are too long as I go. So this wouldn't add a whole extra $O(E)$ pass and is definitely worth doing, I think
+- hmmm... another scenario potentially worth accounting for: if a node has no edges to any other nodes, then I know that 0 is the smallest number of neighbors. I'm not even sure whether they'd include a case like this, but it feels conspicuous that we're given `n` in addition to the `edges` list -- if all `n` nodes were necessarily in `edges`, `n` would be redundant.
+- let's switch over to a different part of the problem -- I'll need some way of representing the graph to traverse it. I think this format will likely follow the traversal method I come up with, but my initial thought is to create a dict where keys are the IDs for each node and their values are lists of (neighbor, edge weight) tuples.
+- maybe the DFS approach would be manageable with some sort of caching or memoization? E.g., I could do a normal DFS for the first node I search from (up to `distanceThreshold`), and then store distances from that node to all nodes within `distanceThreshold` of it. Then whenever I DFS from another node and encounter that one, I can check that record instead of going further down that path.
+  - actually I'm not sure this would end up being faster... I'll end up checking more nodes than I would have if I'd just done a normal DFS because some within `distanceThreshold` of the node whose record I check won't be within `distanceThreshold` of the node I'm searching from. Might've helped if this were a binary tree, but since it's a graph I'll also end up checking lots of nodes via both DFS and other nodes' records
+  - then again, if some node is within `distanceThreshold` of the node I'm currently DFSing from via the node whose record I'm checking and via some other route, I'd have ended up checking it twice anyway via 2 DFSes instead of 1 DFS + 1 record check... so maybe the savings of being able to stop the DFS for a certain path when I hit a node with record data would outweigh the extra checks I'd do because of it? I'm not sure...
+- maybe there's a more efficient way to take advantage of work already done? This idea of considering nodes as "waypoints" to compare different paths seems promising -- i.e., "is the path between node `a` and node `b` via node `c` shorter than the path between them via node `d`/the shortest path between them I've found so far?" But to get to a point where I could check that I'd have to already know the distance between node `d` and each other node... which is the problem I'm trying to solve in the first place. So I'm not sure how to get there.
+- what if I represented the minimum distance between each pair of nodes in a 2D array (upper and lower triangles would be duplicates... maybe I can optimize that away?). Then for each node in the graph `i`, for each of its neighbors `j`, for each node in the graph `k`, I could check whether the path from `i` to `k` via `j` is shorter than `distances_matrix[i][k]`, and update it if so.
+  - hmmm though I'm not sure whether this would work for graphs with nodes separated by > 2 degrees... that doesn't show up in any of the example graphs.
+- okay I ended up googling around and it turns out this is basically the "Floyd-Warshall algorithm"! I just needed to make `j` all nodes in the graph instead of just the neighbors of `i`. I'll try implementing that now... though since the algorithm has $O(n^3)$ time complexity, I'm still not sure whether this would actually be more efficient than the BFS/DFS approach...
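+
+For comparison, here's a rough sketch of the per-node search idea from the bullets above. It swaps the plain DFS/BFS for Dijkstra's algorithm with a distance cutoff (a substitution on my part, since edges are weighted), and drops over-threshold edges while building the graph. The function name `count_within_threshold` and its `source` parameter are made up for illustration; this is not the solution submitted below.
+
+```python
+import heapq
+from typing import Dict, List, Tuple
+
+
+def count_within_threshold(n: int, edges: List[List[int]], threshold: int, source: int) -> int:
+    """Count nodes reachable from `source` with total distance <= threshold."""
+    # build adjacency lists, skipping edges that are already too long
+    graph: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(n)}
+    for u, v, w in edges:
+        if w <= threshold:
+            graph[u].append((v, w))
+            graph[v].append((u, w))
+    # Dijkstra from `source`, never expanding past the threshold
+    best = {source: 0}
+    heap = [(0, source)]
+    while heap:
+        dist, node = heapq.heappop(heap)
+        if dist > best[node]:
+            continue  # stale heap entry
+        for neighbor, weight in graph[node]:
+            new_dist = dist + weight
+            if new_dist <= threshold and new_dist < best.get(neighbor, float('inf')):
+                best[neighbor] = new_dist
+                heapq.heappush(heap, (new_dist, neighbor))
+    # everything recorded in `best` is within the threshold; exclude the source
+    return len(best) - 1
+```
+
+Calling this once per node and keeping the highest-numbered node with the smallest count would answer the problem; whether that beats the $O(n^3)$ approach depends on how sparse the graph is.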
+
+
+## Refining the problem, round 2 thoughts
+
+Given the way the Floyd-Warshall algorithm works, I don't think it'll be worth removing edges > `distanceThreshold` from the initial `edges` list because it wouldn't save us any checks during the main processing of the pairwise distances. However, since the algorithm takes $O(n^3)$ time, I *do* think it'll be worth checking upfront whether any nodes have no edges to any other nodes, because that would take comparatively little time.
+
+## Attempted solution(s)
+
+```python
+class Solution:
+    def findTheCity(self, n: int, edges: List[List[int]], distanceThreshold: int) -> int:
+        # initialize min distances matrix
+        min_dists = [[float('inf')] * n for _ in range(n)]
+        # set to keep track of nodes with no edges
+        no_edges = set(range(n))
+        # add weights from edges list, remove nodes with edges from set
+        for from_node, to_node, weight in edges:
+            min_dists[from_node][to_node] = weight
+            min_dists[to_node][from_node] = weight
+            no_edges.discard(from_node)
+            no_edges.discard(to_node)
+        # if any nodes have no edges, return the one with the greatest ID
+        if no_edges:
+            return max(no_edges)
+        # # set diagonal to 0 -- actually, not needed since we skip the diagonal
+        # # in the main loop anyway
+        # for i in range(n):
+        #     min_dists[i][i] = 0
+        # run Floyd-Warshall
+        for via_node in range(n):
+            for from_node in range(n):
+                for to_node in range(n):
+                    if from_node == to_node:
+                        continue
+                    dist_via_intermediate = min_dists[from_node][via_node] + min_dists[via_node][to_node]
+                    if dist_via_intermediate < min_dists[from_node][to_node]:
+                        min_dists[from_node][to_node] = dist_via_intermediate
+        # find highest-numbered node with fewest reachable nodes within distanceThreshold
+        min_reachable = n
+        for node_id, dists in enumerate(min_dists):
+            reachable = sum(1 for dist in dists if dist <= distanceThreshold)
+            if reachable <= min_reachable:
+                min_reachable = reachable
+                min_reachable_node = node_id
+        return min_reachable_node
+```
+
+![](https://github.com/user-attachments/assets/f4fb89c1-536b-454e-a7e7-a89b8b9c4e20)
diff --git a/problems/2045/paxtonfitzpatrick.md b/problems/2045/paxtonfitzpatrick.md
@@ -0,0 +1,18 @@
+# [Problem 2045: Second Minimum Time to Reach Destination](https://leetcode.com/problems/second-minimum-time-to-reach-destination/description/?envType=daily-question)
+
+## Initial thoughts (stream-of-consciousness)
+
+- okay, this one looks tricky. One initial thought I have is that a path that involves revisiting some node will be the second shortest path only if there aren't two paths that *don't* involve revisiting a node. So I think I can ignore that outside of those specific cases.
+- It sounds like we'll need to use some algorithm that finds *all* paths between a source and destination node. I know Dijkstra's algorithm can be modified to terminate early upon encountering a target node, so maybe there's a way to modify it such that it terminates when it encounters that node a second time?
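+
+One possible shape for that modification, as a minimal sketch: instead of stopping at the first visit, keep the two smallest distinct distances found for every node. This is a generic version on a weighted graph that ignores this problem's `time` and `change` rules entirely, so it's only an illustration of the idea, not a solution. The name `second_shortest` and the adjacency-dict input format are assumptions for the example.
+
+```python
+import heapq
+from typing import Dict, List, Tuple
+
+
+def second_shortest(graph: Dict[int, List[Tuple[int, int]]], source: int, target: int) -> float:
+    """Return the second-smallest distinct walk length from source to target.
+
+    `graph` maps every node to a list of (neighbor, weight) pairs.
+    Returns float('inf') if no second length exists.
+    """
+    INF = float('inf')
+    best = {node: INF for node in graph}    # shortest distance so far
+    second = {node: INF for node in graph}  # second-shortest (strictly larger)
+    best[source] = 0
+    heap = [(0, source)]
+    while heap:
+        dist, node = heapq.heappop(heap)
+        if dist > second[node]:
+            continue  # worse than both recorded distances, nothing to gain
+        for neighbor, weight in graph[node]:
+            new_dist = dist + weight
+            if new_dist < best[neighbor]:
+                # new shortest; the old shortest becomes the second-shortest
+                second[neighbor] = best[neighbor]
+                best[neighbor] = new_dist
+                heapq.heappush(heap, (new_dist, neighbor))
+            elif best[neighbor] < new_dist < second[neighbor]:
+                second[neighbor] = new_dist
+                heapq.heappush(heap, (new_dist, neighbor))
+    return second[target]
+```
+
+Note that this allows revisiting nodes, so on a simple chain the second-shortest walk backtracks along an edge.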
+
+## Refining the problem, round 2 thoughts
+
+### Other notes
+
+## Attempted solution(s)
+
+```python
+class Solution:
+    def secondMinimum(self, n: int, edges: List[List[int]], time: int, change: int) -> int:
+        pass  # no solution attempted yet
+```
diff --git a/problems/2976/paxtonfitzpatrick.md b/problems/2976/paxtonfitzpatrick.md
@@ -0,0 +1,94 @@
+# [Problem 2976: Minimum Cost to Convert String I](https://leetcode.com/problems/minimum-cost-to-convert-string-i/description/?envType=daily-question)
+
+## Initial thoughts (stream-of-consciousness)
+
+- okay, so this is going to be another shortest path problem. Letters are nodes, corresponding indices in `original` and `changed` are directed edges, and those same indices in `cost` give their weights.
+- I was originally thinking I'd want to find all min distances between letters using something similar to yesterday's problem (Floyd-Warshall algorithm), but I actually think it'll be more efficient to figure out what letters we need to convert first and then search just for those. So I think this is calling for Dijkstra's algorithm.
+- so I'll loop through `source` and `target`, identify differences, and store source-letter + target-letter pairs.
+  - if a source letter isn't in `original` or a target letter isn't in `changed`, I can immediately `return -1`
+  - actually, I think I'll store the source and target letters as a dict where keys are source letters and values are lists (probably actually sets?) of target letters for that source letter. That way if I need to convert some "a" to a "b" and some other "a" to a "c", I can save time by combining those into a single Dijkstra run.
+- then I'll run Dijkstra's algorithm starting from each source letter and terminate when I've found paths to all target letters for it.
+- I'll write a helper function for Dijkstra's algorithm that takes a source letter and a set of target letters, and returns a list (or some sort of container) of minimum costs to convert that source letter to each of the target letters.
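+
+A minimal sketch of what that helper might look like, assuming the graph is stored as a dict mapping each letter to a list of (next letter, cost) tuples. The name `min_costs_from` and that graph format are hypothetical; the solutions below use Floyd-Warshall instead.
+
+```python
+import heapq
+from typing import Dict, List, Set, Tuple
+
+
+def min_costs_from(
+    source_letter: str,
+    target_letters: Set[str],
+    graph: Dict[str, List[Tuple[str, int]]],
+) -> Dict[str, int]:
+    """Dijkstra from one source letter, stopping once all targets are found.
+
+    Returns a dict of target letter -> min conversion cost. Targets that
+    can't be reached are simply missing from the result.
+    """
+    remaining = set(target_letters)
+    best = {source_letter: 0}
+    found: Dict[str, int] = {}
+    heap = [(0, source_letter)]
+    while heap and remaining:
+        cost, letter = heapq.heappop(heap)
+        if cost > best.get(letter, float('inf')):
+            continue  # stale heap entry
+        if letter in remaining:
+            found[letter] = cost
+            remaining.discard(letter)
+        for next_letter, step_cost in graph.get(letter, []):
+            new_cost = cost + step_cost
+            if new_cost < best.get(next_letter, float('inf')):
+                best[next_letter] = new_cost
+                heapq.heappush(heap, (new_cost, next_letter))
+    return found
+```
+
+The caller would then `return -1` if any needed target letter is missing from the returned dict.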
+
+---
+
+- after thinking through how to implement Dijkstra here a bit, I wonder if Floyd-Warshall might actually be more efficient... Floyd-Warshall's runtime scales with the number of nodes, but since nodes here are letters, we know there will always be 26 of them. So that's essentially fixed. Meanwhile Dijkstra's runtime scales with the number of nodes *and* edges, and since the constraints say there can be up to 2,000 edges, we're likely to have a large number of edges relative to the number of nodes. That also means we're much more likely to duplicate operations during different runs of Dijkstra than we would be if the graph were large and sparse. So I think I'll actually try Floyd-Warshall first.
+
+## Refining the problem, round 2 thoughts
+
+- we could reduce the size of the distance matrix for the Floyd-Warshall algorithm by including only the letters in `original` and `changed` instead of all 26. But I doubt this would be worth it on average, since it'd only sometimes reduce the number of nodes in the graph and always incur overhead costs of converting `original` and `changed` to sets, looping over letters and converting them to indices instead of looping over indices directly, etc.
+  - speaking of which, I'll still have to loop over letters and convert them to indices in order to extract the conversion costs for mismatched letters, and I can think of two ways to do this:
+    - store a letters/indices mapping in a `dict`, i.e. `{let: i for i, let in enumerate('abcdefghijklmnopqrstuvwxyz')}` and index it with each letter
+    - use `ord(letter)` to get the letter's ASCII value and subtract 97 (ASCII value of "a") to get its index in the alphabet
+
+    Both operations would take constant time, but constructing the `dict` will use a little bit of additional memory so I think I'll go with the latter.
+  - hmmm actually, if I can just use a dict as the letter/index mapping, that might make reducing the size of the distance matrix worth it. Maybe I'll try that if my first attempt is slow.
+- hmmm the problem notes that "*there may exist indices `i`, `j` such that `original[j] == original[i]` and `changed[j] == changed[i]`*". But it's not totally clear to me whether they're (A) simply saying that nodes may appear in both the `original` and `changed` lists multiple times because they can have multiple edges, or (B) saying that ***edges*** may be duplicated, potentially with different `cost` values -- i.e., `(original[j], changed[j]) == (original[i], changed[i])` but `cost[j] != cost[i]`. My guess is that it's the latter because the former seems like a sort of trivial point to make note of, so I'll want to account for this when I initialize the distance matrix.
+
+## Attempted solution(s)
+
+```python
+class Solution:
+    def minimumCost(self, source: str, target: str, original: List[str], changed: List[str], cost: List[int]) -> int:
+        # setup min distance/cost matrix
+        INF = float('inf')
+        min_costs = [[INF] * 26 for _ in range(26)]
+        for orig_let, changed_let, c in zip(original, changed, cost):
+            orig_ix, changed_ix = ord(orig_let) - 97, ord(changed_let) - 97
+            if c < min_costs[orig_ix][changed_ix]:
+                min_costs[orig_ix][changed_ix] = c
+        # run Floyd-Warshall
+        for via_ix in range(26):
+            for from_ix in range(26):
+                for to_ix in range(26):
+                    if min_costs[from_ix][via_ix] + min_costs[via_ix][to_ix] < min_costs[from_ix][to_ix]:
+                        min_costs[from_ix][to_ix] = min_costs[from_ix][via_ix] + min_costs[via_ix][to_ix]
+        # compute total cost to convert source to target
+        total_cost = 0
+        for src_let, tgt_let in zip(source, target):
+            if src_let != tgt_let:
+                src_ix, tgt_ix = ord(src_let) - 97, ord(tgt_let) - 97
+                if min_costs[src_ix][tgt_ix] == INF:
+                    return -1
+                total_cost += min_costs[src_ix][tgt_ix]
+        return total_cost
+```
+
+![](https://github.com/user-attachments/assets/2df1bdf7-8f66-4d28-90f8-12998425b3ba)
+
+Not bad. But I'm curious whether creating a graph from only the letters in `original` and `changed` would be faster. It's a quick edit, so I'll try it. Biggest change will be an additional `return -1` condition in the last loop to handle letters in `source` and `target` that can't be mapped to/from anything.
+
+```python
+class Solution:
+    def minimumCost(self, source: str, target: str, original: List[str], changed: List[str], cost: List[int]) -> int:
+        # setup min distance/cost matrix
+        INF = float('inf')
+        letters = set(original) | set(changed)
+        letters_ixs = {let: i for i, let in enumerate(letters)}
+        len_letters = len(letters)
+        min_costs = [[INF] * len_letters for _ in range(len_letters)]
+        for orig_let, changed_let, c in zip(original, changed, cost):
+            if c < min_costs[letters_ixs[orig_let]][letters_ixs[changed_let]]:
+                min_costs[letters_ixs[orig_let]][letters_ixs[changed_let]] = c
+        # run Floyd-Warshall
+        for via_ix in range(len_letters):
+            for from_ix in range(len_letters):
+                for to_ix in range(len_letters):
+                    if min_costs[from_ix][via_ix] + min_costs[via_ix][to_ix] < min_costs[from_ix][to_ix]:
+                        min_costs[from_ix][to_ix] = min_costs[from_ix][via_ix] + min_costs[via_ix][to_ix]
+        # compute total cost to convert source to target
+        total_cost = 0
+        try:
+            for src_let, tgt_let in zip(source, target):
+                if src_let != tgt_let:
+                    if (change_cost := min_costs[letters_ixs[src_let]][letters_ixs[tgt_let]]) == INF:
+                        return -1
+                    total_cost += change_cost
+        except KeyError:
+            return -1
+        return total_cost
+```
+
+![](https://github.com/user-attachments/assets/263ad81c-900d-40d1-8602-ee5012e4b47e)
+
+Wow, that made a much bigger difference than I expected!
diff --git a/problems/912/paxtonfitzpatrick.md b/problems/912/paxtonfitzpatrick.md