Router

Hey there! This repo is an experiment to create a well-tuned and speedy URI router. There are only two main goals:

  • It's fast
  • It's simple

Prior to this project, I built the SimpleRouter as a fork of a very cool project called the simplePHPRouter. That router was based on regex for parameterization, and the regex made it pretty painful to maintain if I ever left it alone for too long.

Since working on SimpleRouter, I've played with other languages (primarily Go) and found a few new ways of doing things.

Methodology

Radix trees (tries, but I prefer the normal spelling) are wonderful mathematical constructs; the basic concept is that you have a root node, branches (inner nodes), and leaves (end nodes). When you add a branch, it gets merged with any existing branch that shares its prefix, and the parts that differ stay separate as leaves at the ends.

Take for example these routes:

/api/v1/hello
/api/v1/hi
/api/v1/hello/:param
/api/v2/no
/foo

A radix (and more specifically, a PATRICIA) trie takes the commonalities in these routes and makes them into nodes, or branches. / exists as the root node. api/ is turned into a node, from which v1/ and v2/no branch. hello is taken as another branch with the / and :param child nodes. foo is naturally its own branch from the root.

By splitting these routes up into a trie based on their segments, you're able to iterate far more quickly through the tree to find what you're looking for. If a user then requests /api/v1/hello/sky the router can jump from the root, to api/, to v1/, to hello/, then to the final node much faster than if we had to chop up, say, an associative array and compare against every registered route.
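To make the merging concrete, here is a minimal sketch of a character-level radix trie in Python. It is an illustration only, not the PHP code in this repo; `RadixNode`, `insert`, `contains`, and `dump` are hypothetical names. One thing to note: a character-level trie splits wherever two paths diverge, so the nodes do not always line up with `/` boundaries the way the description above does.

```python
class RadixNode:
    def __init__(self, label=""):
        self.label = label      # edge text leading into this node
        self.children = {}      # first character of a child's label -> child
        self.is_route = False   # True if a registered route ends here


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, path):
    node, rest = root, path
    while rest:
        child = node.children.get(rest[0])
        if child is None:
            # Nothing shares this prefix yet: hang the remainder off as a leaf.
            leaf = RadixNode(rest)
            leaf.is_route = True
            node.children[rest[0]] = leaf
            return
        n = common_prefix_len(child.label, rest)
        if n < len(child.label):
            # Paths diverge inside this edge: split it at the shared prefix.
            mid = RadixNode(child.label[:n])
            child.label = child.label[n:]
            mid.children[child.label[0]] = child
            node.children[rest[0]] = mid
            child = mid
        node, rest = child, rest[n:]
    node.is_route = True


def contains(root, path):
    # Static lookup only; parameters are not handled in this sketch.
    node, rest = root, path
    while rest:
        child = node.children.get(rest[0])
        if child is None or not rest.startswith(child.label):
            return False
        node, rest = child, rest[len(child.label):]
    return node.is_route


def dump(node, depth=0, out=None):
    # One line per node, indented by depth; "*" marks a registered route.
    out = [] if out is None else out
    for key in sorted(node.children):
        child = node.children[key]
        out.append("  " * depth + child.label + (" *" if child.is_route else ""))
        dump(child, depth + 1, out)
    return out


root = RadixNode()
for route in ["/api/v1/hello", "/api/v1/hi", "/api/v1/hello/:param",
              "/api/v2/no", "/foo"]:
    insert(root, route)
```

With the five example routes, `dump(root)` shows the shared prefixes collapsing into the nodes `/`, `api/v`, and `1/h`, with `ello`, `i`, `2/no`, and `foo` hanging off them.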

The nodes can contain any arbitrary information, such as HTTP methods or handlers. From my experience, this method of lookup prefers specificity, and so it will always prefer the edges over the inner structures.

Parameters

One flaw(-ish) of the SimpleRouter implementation (and many other implementations) is the use of regex as a way of identifying and extracting route parameters. Regex adds time, overhead, and complexity to any system that leans on it for every lookup.

In order to circumvent this, we can rely on our node structure; if a node begins with our delimiter : then we can take the related segment from the URI and use that as a parameter, regardless of the value. This means we have extremely low overhead in the logic required to pull parameters from URIs.
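The delimiter check can be sketched without any regex at all. The Python below is a flattened illustration that compares one route pattern against one URI segment by segment; `match_route` and `DELIMITER` are hypothetical names, and the actual routers apply the same idea per node while walking the tree.

```python
DELIMITER = ":"


def match_route(pattern, uri):
    """Return a dict of parameters if uri matches pattern, else None."""
    pattern_segments = pattern.strip("/").split("/")
    uri_segments = uri.strip("/").split("/")
    if len(pattern_segments) != len(uri_segments):
        return None
    params = {}
    for expected, actual in zip(pattern_segments, uri_segments):
        if expected.startswith(DELIMITER):
            # Parameter segment: accept any value and record it by name.
            params[expected[len(DELIMITER):]] = actual
        elif expected != actual:
            # Static segment that does not match: this route is not a hit.
            return None
    return params
```

For example, `match_route("/api/v1/hello/:param", "/api/v1/hello/sky")` returns `{"param": "sky"}`, using nothing more than a string prefix check and an equality comparison per segment.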

Performance

Of course, what good is a router that's slow? We need to be able to look up routes and get the handler as quickly as possible. Now, you may note there are multiple routers here; these are implementations in their experimental phase to find the most memory- and time-efficient lookup operations possible.

For our benchmarks, which you can find in their respective files in tests, we create a single instance of a router, load routes from the .txt files, write their respective arrays to .txt files in storage, then perform three iterations each: 10k, 100k, and 1m requests. In these iterations, we pick a random URI from the full list, and have the router perform the lookup on that randomly selected URI. The test fails only if a 404 or 405 is returned.
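The core of that loop looks roughly like the Python sketch below. It is a stand-in, not the benchmark files in tests: `run_benchmark` is a hypothetical name, the `lookup` callable is assumed to return an HTTP-style status code, and the file loading and storage steps are left out.

```python
import random
import time


def run_benchmark(lookup, uris, iterations, seed=0):
    """Time `iterations` lookups of randomly chosen URIs."""
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    failures = 0
    start = time.perf_counter()
    for _ in range(iterations):
        status = lookup(rng.choice(uris))
        if status in (404, 405):
            failures += 1       # the only outcomes counted as failures
    elapsed = time.perf_counter() - start
    return {
        "iterations": iterations,
        "failures": failures,
        "seconds": elapsed,
        "avg_per_lookup": elapsed / iterations,
    }


# Stand-in router: a set lookup that returns 200 for known URIs, 404 otherwise.
known_uris = ["/api/v1/hello", "/api/v1/hi", "/api/v2/no", "/foo"]
known_set = set(known_uris)
result = run_benchmark(lambda uri: 200 if uri in known_set else 404,
                       known_uris, 10_000)
```

Swapping the lambda for a real router's lookup and raising `iterations` to 100k and 1m gives the same shape of run as the numbers reported below.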

SimpleRouter

This is an old project of mine and the first router I ever tried to write. Foundationally it relies on tokenizing an incoming URI and matching it to regex, then looking through the internal routes array.

// big routes
Running 1000000 iterations
(100000 lookups) M: 1846.2 kb - T: 32.6156370640 s
(200000 lookups) M: 1846.2 kb - T: 63.9784071445 s
(300000 lookups) M: 1846.2 kb - T: 96.9934570789 s
(400000 lookups) M: 1846.2 kb - T: 130.2443051338 s
(500000 lookups) M: 1846.2 kb - T: 161.8348190784 s
(600000 lookups) M: 1846.3 kb - T: 197.4232161045 s
(700000 lookups) M: 1846.1 kb - T: 231.8421580791 s
(800000 lookups) M: 1846 kb - T: 262.8337080479 s
(900000 lookups) M: 1846.2 kb - T: 296.1434569359 s
Time: 330.9394941330 s
Avg/lookup: 0.0003309396 s

Interestingly, it has the lowest memory cost of the current iterations, but the absolute highest total time and time per request. The time issue is likely due to hugely unoptimized tokenization.

TrieRouter

This is my first iteration of a PATRICIA trie router in PHP. I don't think it's currently perfect, as we could probably work on storing nodes as bytes rather than strings, but it's a good proof of concept for a tree-based mechanism.

Running 1000000 iterations
(100000 lookups) M: 4718.3 kb - T: 0.0581219196 s
(200000 lookups) M: 4718.3 kb - T: 0.1310830116 s
(300000 lookups) M: 4718.3 kb - T: 0.1909840107 s
(400000 lookups) M: 4718.3 kb - T: 0.2500770092 s
(500000 lookups) M: 4718.3 kb - T: 0.3067679405 s
(600000 lookups) M: 4718.3 kb - T: 0.3660039902 s
(700000 lookups) M: 4718.3 kb - T: 0.4237358570 s
(800000 lookups) M: 4718.3 kb - T: 0.4837160110 s
(900000 lookups) M: 4718.3 kb - T: 0.5422408581 s
Time: 0.6060788631 s
Avg/lookup: 0.0000006061 s

You can immediately see a huge time difference from SimpleRouter. Average lookups take well under a microsecond rather than roughly a third of a millisecond, but we're using about 2.5x as much memory. From experimentation (and you can see this in the visualization), it looks like the trie method creates a gigantic number of child elements to store the handler for every endpoint.

SegmentRouter

This second iteration is the first to achieve the best of both worlds: lower memory usage than TrieRouter (though still more than SimpleRouter) and the lowest time per request! In order to achieve this, we simply split routes into segments and store each segment as a node. This means that there are no extraneous child elements and navigating to an endpoint requires less effort. The visualization also shows how much simpler the tree is compared to TrieRouter.
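As a rough picture of the one-node-per-segment idea, here is a minimal Python sketch. It is not the PHP SegmentRouter from this repo: the class and method names are hypothetical, handlers are plain strings, and it assumes a static segment always wins over a parameter with no backtracking.

```python
class SegmentNode:
    def __init__(self):
        self.children = {}       # static segment -> SegmentNode
        self.param_child = None  # node reached through a ":name" segment
        self.param_name = None
        self.handlers = {}       # HTTP method -> handler


class SegmentRouter:
    def __init__(self):
        self.root = SegmentNode()

    def add(self, method, path, handler):
        node = self.root
        for segment in filter(None, path.split("/")):
            if segment.startswith(":"):
                if node.param_child is None:
                    node.param_child = SegmentNode()
                    node.param_name = segment[1:]
                node = node.param_child
            else:
                node = node.children.setdefault(segment, SegmentNode())
        node.handlers[method] = handler

    def lookup(self, method, uri):
        node, params = self.root, {}
        for segment in filter(None, uri.split("/")):
            if segment in node.children:
                # Static match is preferred (assumption: no backtracking).
                node = node.children[segment]
            elif node.param_child is not None:
                params[node.param_name] = segment
                node = node.param_child
            else:
                return 404, None, {}
        if not node.handlers:
            return 404, None, {}   # path exists only as a prefix
        if method not in node.handlers:
            return 405, None, {}   # path exists, method does not
        return 200, node.handlers[method], params


router = SegmentRouter()
router.add("GET", "/api/v1/hello", "hello")
router.add("GET", "/api/v1/hello/:param", "hello_param")
router.add("GET", "/foo", "foo")
```

Each lookup costs one dictionary access per URI segment, which is where the "less effort" comes from: `router.lookup("GET", "/api/v1/hello/sky")` walks four nodes and returns `(200, "hello_param", {"param": "sky"})`.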

Running 1000000 iterations
(100000 lookups) M: 2891.8 kb - T: 0.0500328541 s
(200000 lookups) M: 2891.8 kb - T: 0.0995390415 s
(300000 lookups) M: 2891.8 kb - T: 0.1491589546 s
(400000 lookups) M: 2891.8 kb - T: 0.1987509727 s
(500000 lookups) M: 2891.8 kb - T: 0.2471258640 s
(600000 lookups) M: 2891.8 kb - T: 0.2962870598 s
(700000 lookups) M: 2891.8 kb - T: 0.3496289253 s
(800000 lookups) M: 2891.8 kb - T: 0.3990900517 s
(900000 lookups) M: 2891.8 kb - T: 0.4483740330 s
Time: 0.4971950054 s
Avg/lookup: 0.0000004973 s

Truly our most impressive show yet.
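To put the three runs side by side, the short Python sketch below computes relative speed and memory from the totals reported above. The figures are copied from the benchmark output; SimpleRouter's memory fluctuated slightly between checkpoints, so 1846.2 kb is used as a representative value.

```python
# name -> (total seconds for 1,000,000 lookups, memory in kb), from the runs above
results = {
    "SimpleRouter": (330.9394941330, 1846.2),
    "TrieRouter": (0.6060788631, 4718.3),
    "SegmentRouter": (0.4971950054, 2891.8),
}

base_time, base_memory = results["SimpleRouter"]

comparison = {
    name: {
        "speedup": base_time / seconds,       # times faster than SimpleRouter
        "memory_ratio": memory / base_memory, # memory relative to SimpleRouter
    }
    for name, (seconds, memory) in results.items()
}

for name, row in comparison.items():
    print(f"{name}: {row['speedup']:.0f}x faster, "
          f"{row['memory_ratio']:.2f}x memory")
```

On these numbers TrieRouter comes out around 546x faster than SimpleRouter at about 2.56x the memory, and SegmentRouter around 666x faster at about 1.57x the memory.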