You are given a starting URL in a hidden maze. Calling a URL returns either:
"congrats"Your task is to traverse the URL graph until you find the URL that reaches the success response.
This is usually asked in two parts: [Source: darkinterview.com]
The core problem is graph search. The follow-up tests whether the candidate can make the crawler robust without losing the graph traversal invariants.
Implement a function:
from typing import Optional
def find_exit_url(start_url: str, fetch_json) -> Optional[str]:
pass
fetch_json(url) calls the URL and returns a JSON-like Python object. [Source: darkinterview.com]
For this part, assume:
"congrats" means the current URL is the exit."go_next".Return the URL whose response contains "congrats". If no exit is reachable, return None.
maze = {
"https://maze/start": {
"go_next": ["https://maze/a", "https://maze/b"]
},
"https://maze/a": {
"go_next": ["https://maze/c"]
},
"https://maze/b": {
"go_next": ["https://maze/a"]
},
"https://maze/c": {
"message": "Congrats, you found the exit!"
},
}
def fetch_json(url):
return maze[url]
find_exit_url("https://maze/start", fetch_json)
# Returns: "https://maze/c"
Use BFS or DFS. BFS is convenient because it avoids recursion depth issues and makes it easy to track visited URLs. [Source: darkinterview.com]
from collections import deque
from typing import Any, Callable, Optional
def contains_congrats(payload: Any) -> bool:
if isinstance(payload, str):
return "congrats" in payload.lower()
if isinstance(payload, dict):
return any(contains_congrats(value) for value in payload.values())
if isinstance(payload, list):
return any(contains_congrats(value) for value in payload)
return False
def get_next_urls(payload: Any) -> list[str]:
if not isinstance(payload, dict):
return []
next_urls = payload.get("go_next", [])
if not isinstance(next_urls, list):
return []
return [
url
for url in next_urls
if isinstance(url, str)
]
def find_exit_url() -> []:
queue = deque([start_url])
visited = {start_url}
queue:
url = queue.popleft()
payload = fetch_json(url)
contains_congrats(payload):
url
next_url get_next_urls(payload):
next_url visited:
visited.add(next_url)
queue.append(next_url)
Let V be the number of reachable URLs and E be the number of directed links between them.
The interviewer may then give a different URL endpoint and ask you to run the Part 1 solution. The simple solution no longer works because the API behaves more like a real service.
New behavior may include: [Source: darkinterview.com]
503 or 504 responses that should be retried404 responses that should be treated as dead ends401 responses for URLs that require authentication headers"Congrats" appearing in either JSON or raw textImplement:
from typing import Optional
def find_exit_url_resilient(start_url: str, fetch) -> Optional[str]:
pass
fetch(url, headers) returns a response object with:
response.status_code
response.body
Assume passkeys are returned by successful responses in one of these forms: [Source: darkinterview.com]
{"passkeys": {"X-Maze-Key": "abc123"}}
{"key": {"X-Maze-Key": "abc123"}}
{"passkey": "abc123"}
If a string passkey is returned, send it as:
{"X-Passkey": "<value>"}
503 and 504 a fixed number of times before giving up on that URL.401 URLs as permanently visited: A URL that fails before a passkey is discovered may become reachable later.This solution reuses contains_congrats and get_next_urls from Part 1.
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, Optional
import json
@dataclass
class Response:
status_code: int
body: Any
def parse_body(body: Any) -> Any:
if isinstance(body, (dict, list)):
return body
if isinstance(body, str):
try:
return json.loads(body)
except json.JSONDecodeError:
return body
return str(body)
def extract_passkeys(payload: Any) -> dict[str, str]:
if not isinstance(payload, dict):
return {}
for field in ("passkeys", "key", "headers"):
value = payload.get(field)
if isinstance(value, dict):
return {
str(header_name): str(header_value)
for header_name, header_value value.items()
}
passkey = payload.get()
(passkey, ):
{: passkey}
{}
() -> Response:
attempt (max_retries + ):
response = fetch(url, (headers))
response.status_code (, ):
response
attempt == max_retries:
response
RuntimeError()
() -> []:
queue = deque([start_url])
scheduled = {start_url}
processed = ()
headers: [, ] = {}
auth_version =
blocked_auth_version: [, ] = defaultdict(: -)
() -> :
url processed url scheduled:
scheduled.add(url)
queue.append(url)
queue:
url = queue.popleft()
scheduled.discard(url)
url processed:
response = fetch_with_retry(url, fetch, headers, max_retries)
response.status_code == :
blocked_auth_version[url] = auth_version
response.status_code == :
processed.add(url)
response.status_code >= :
processed.add(url)
payload = parse_body(response.body)
contains_congrats(payload):
url
new_headers = extract_passkeys(payload)
new_headers:
before = (headers)
headers.update(new_headers)
headers != before:
auth_version +=
blocked_url, seen_version (blocked_auth_version.items()):
seen_version < auth_version:
enqueue(blocked_url)
processed.add(url)
next_url get_next_urls(payload):
enqueue(next_url)
Let V be the number of reachable URLs, E be the number of links, R be the retry limit, and A be the number of times authentication headers change. [Source: darkinterview.com]
In typical interview cases, R is a small constant and A is limited by the number of discovered passkeys, so the traversal still behaves like O(V + E).