You are given a set of services in a cloud application. Each service may depend on other services, and the dependency graph is a directed acyclic graph.
When the entrypoint service receives one request, it calls all of its dependencies. Each dependency then calls its own dependencies, and so on. If multiple dependency paths reach the same service, that service handles load from each path.
This is usually asked as a two-part graph question: [Source: darkinterview.com]
The question is often described as a Course Schedule-style graph problem, but the goal is not just cycle detection. The key is cumulative load propagation through a DAG.
You are given service definitions in this format:
<service_name>=<dependency_1>,<dependency_2>,...
For example: [Source: darkinterview.com]
A=B
B=C,D
C=D
D=E,F
E=F
F=
The entrypoint is guaranteed to be defined. Dependencies that are not defined in the input should be ignored.
Return every reachable service's load factor in this output format:
<service_name>*<load_factor>
Sort the output lexicographically by service name. [Source: darkinterview.com]
services = [
"A=B",
"B=C,D",
"C=D",
"D=E,F",
"E=F",
"F=",
]
entrypoint = "A"
Dependency graph:
A -> B
B -> C, D
C -> D
D -> E, F
E -> F
Output:
["A*1", "B*1", "C*1", "D*2", "E*2", "F*4"]
Explanation: [Source: darkinterview.com]
A is triggered once.B is triggered once by A.C is triggered once by B.D is triggered twice: once through A -> B -> D and once through A -> B -> C -> D.E is triggered twice because D is triggered twice.F is triggered four times: twice directly from D and twice through D -> E -> F.The simplest solution is to propagate one unit of load through every dependency path.
from collections import defaultdict
def parse_services(services: list[str]) -> dict[str, list[str]]:
graph: dict[str, list[str]] = {}
for row in services:
name, raw_deps = row.split("=", 1)
name = name.strip()
if raw_deps.strip() == "":
graph[name] = []
continue
graph[name] = [
dep.strip()
for dep in raw_deps.split(",")
if dep.strip()
]
defined = set(graph)
return {
service: [dep for dep in deps if dep in defined]
for service, deps in graph.items()
}
def load_factors_dfs(services: list[str], entrypoint: str) -> list[str]:
graph = parse_services(services)
loads: dict[str, int] = defaultdict(int)
def dfs(service: str, incoming_load: int) -> None:
loads[service] += incoming_load
for dependency in graph.get(service, []):
dfs(dependency, incoming_load)
dfs(entrypoint, )
[
service (loads)
]
Let P be the number of dependency paths that are actually explored.
R is the number of reachable servicesThis passes small examples and is easy to reason about. The weakness is that P can be much larger than V + E when many paths merge into the same downstream services. [Source: darkinterview.com]
The interviewer may then ask:
"Can we avoid repeated work when many dependency paths share the same downstream service?"
For example, in the graph above, once you know D has load 2, you should propagate that total load to E and F once. You should not recursively reprocess the entire D subgraph separately for every path into D. [Source: darkinterview.com]
In a DAG, a service's load factor is the sum of the load factors of all reachable parent services that call it.
load[entrypoint] = 1
load[node] = sum(load[parent] for parent in reachable_parents[node])
If we process services in topological order, every parent has already contributed its final load before we process a service.
from collections import deque
def load_factors_topological(services: list[str], entrypoint: str) -> list[str]:
graph = parse_services(services)
if entrypoint not in graph:
return []
reachable: set[str] = set()
stack = [entrypoint]
while stack:
service = stack.pop()
if service in reachable:
continue
reachable.add(service)
for dependency in graph[service]:
stack.append(dependency)
indegree = {service: 0 for service in reachable}
for service in reachable:
for dependency in graph[service]:
if dependency in reachable:
indegree[dependency] += 1
loads = {service: 0 for service in reachable}
loads[entrypoint] = 1
queue = deque(
service
for service in reachable
if indegree[service] == 0
)
while queue:
service = queue.popleft()
for dependency in graph[service]:
if dependency not in reachable:
continue
loads[dependency] += loads[service]
indegree[dependency] -= 1
if indegree[dependency] == :
queue.append(dependency)
[
service (reachable)
]
Let V be the number of reachable services and E be the number of reachable dependency edges. [Source: darkinterview.com]
The V log V term is only for sorted output. If output order does not matter, the graph processing itself is O(V + E).
The dependency graph is acyclic, so at least one reachable node has indegree 0, and Kahn's algorithm can process the reachable graph in topological order.
When a service is popped from the queue, all reachable parents have already been processed. Its current loads[service] is therefore final. Propagating that final value to each dependency exactly models the service calling its dependencies once for every unit of load it receives. [Source: darkinterview.com]
Because each edge is processed once, shared downstream subgraphs no longer cause repeated traversal.
Some interviewers provide dependency edges directly:
edges = [
["A", "B"],
["B", "C"],
["B", "D"],
["C", "D"],
]
where [parent, child] means parent calls child. The algorithm is identical after graph construction. [Source: darkinterview.com]
Instead of returning all reachable services, the interviewer may provide:
targets = {"D", "F"}
Compute the same loads map, then return only target entries. Be careful not to skip non-target services during propagation, because non-target services may sit between the entrypoint and a target.
Another variant asks for the maximum load factor. After computing loads, return: [Source: darkinterview.com]
max(loads.values())
If the interviewer asks for the service as well, return the service name and load, using a deterministic tie-break such as lexicographic order.
One reported variant asks:
For each program, how many unique programs directly or indirectly depend on it? [Source: darkinterview.com]
That is close to this problem, but not identical. The load-factor version counts cumulative path-triggered calls, so the same downstream service can receive multiple units of load through different paths. The unique-dependents version counts distinct ancestors and usually needs sets or reverse-graph reachability.
Clarify which interpretation the interviewer wants before coding.