Experiments to make the easiest but fastest URI router possible!

Router

Hey there! This repo is an experiment to create a well-tuned and speedy URI router. There are only two main goals:

  • It's fast
  • It's simple

Prior to this project, I built the SimpleRouter as a fork from a very cool project called the simplePHPRouter. That router was based on regex for parameterization, and the regex made it pretty painful to maintain if I ever left it alone for too long.

Since working on SimpleRouter, I've played with other languages (primarily Go) and found a few new ways of doing things.

Methodology

Radix trees (tries, but I prefer the normal spelling) are wonderful mathematical constructs; the basic concept is that you have the root of a tree and branches (nodes) that have leaves (nodes). When you add a branch, it gets merged with existing branches wherever they share a prefix, and the leaves stay at the ends, where the routes diverge.

Take for example these routes:

/api/v1/hello
/api/v1/hi
/api/v1/hello/:param
/api/v2/no
/foo

A radix (and more specifically, a PATRICIA) trie takes the commonalities in these routes and makes them into nodes, or branches. / exists as the root node. api/ is turned into a node, from which v1/ and v2/no branch. hello is taken as another branch with the / and :param child nodes. /foo is naturally its own branch from the root.

By splitting these routes up into a trie based on their segments, you're able to iterate far more quickly through the tree to find what you're looking for. If a user then requests /api/v1/hello/sky, the router can jump from the root, to api/, to v1/, to hello/, then to the final node much faster than if we had to chop up, say, an associative array and compare against every registered route.
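To make the walk concrete, here is a minimal sketch of a segment-per-node tree built from nested arrays. It is not the code in TrieRouter.php or SegmentRouter.php: the function names `addRoute` and `findRoute` and the `children`/`handler` keys are assumptions made for illustration, and it does no prefix compression, so it is simpler than a true PATRICIA trie.

```php
<?php
// Hypothetical helpers for illustration; not the API of any router in this repo.

/** Insert a route into a nested-array tree, one node per path segment. */
function addRoute(array &$root, string $route, string $handler): void
{
    $node = &$root;
    foreach (explode('/', trim($route, '/')) as $segment) {
        if (!isset($node['children'][$segment])) {
            $node['children'][$segment] = ['children' => [], 'handler' => null];
        }
        // Rebind the reference so the next segment is added under this node.
        $node = &$node['children'][$segment];
    }
    $node['handler'] = $handler;
}

/** Walk the tree segment by segment; return the handler, or null if nothing matches. */
function findRoute(array $root, string $uri): ?string
{
    $node = $root;
    foreach (explode('/', trim($uri, '/')) as $segment) {
        if (!isset($node['children'][$segment])) {
            return null;
        }
        $node = $node['children'][$segment];
    }
    return $node['handler'];
}

$root = ['children' => [], 'handler' => null];
addRoute($root, '/api/v1/hello', 'hello');
addRoute($root, '/api/v1/hi', 'hi');
addRoute($root, '/api/v2/no', 'no');
addRoute($root, '/foo', 'foo');

var_dump(findRoute($root, '/api/v1/hi'));   // string(2) "hi"
var_dump(findRoute($root, '/api/v3/nope')); // NULL
```

Each lookup touches one array key per URI segment, so the cost depends on the depth of the URI rather than on the number of registered routes.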

The nodes can contain any arbitrary information, such as HTTP methods or handlers. From my experience, this method of lookup prefers specificity, and so it will always prefer the edges over the inner structures.
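As a rough illustration of a node carrying HTTP methods and handlers, the sketch below resolves a matched node to a status and handler. The `handlers` key, the `resolve` function, and the handler names are all hypothetical; the routers in this repo may store this information differently.

```php
<?php
// Hypothetical node layout: handlers keyed by HTTP method. All names are assumptions.

/** Return [status, handler] for a node found (or not found) by the tree walk. */
function resolve(?array $node, string $method): array
{
    if ($node === null || empty($node['handlers'])) {
        return [404, null]; // nothing is registered at this path
    }
    if (!isset($node['handlers'][$method])) {
        return [405, null]; // the path exists, but not for this method
    }
    return [200, $node['handlers'][$method]];
}

$node = ['handlers' => ['GET' => 'showHello', 'POST' => 'createHello']];

[$status, $handler] = resolve($node, 'GET');
printf("%d %s\n", $status, $handler); // 200 showHello
```

Keeping the methods on the node is what lets a router tell a missing path (404) apart from a known path requested with the wrong method (405).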

Parameters

One flaw(-ish) of the SimpleRouter implementation (and many other implementations) is the use of regex as a way of identifying and extracting route parameters. Regex adds time, overhead, and complexity to any system that leans on it.

In order to circumvent this, we can rely on our node structure; if a node begins with our delimiter : then we can take the related segment from the URI and use that as a parameter, regardless of the value. This means we have extremely low overhead in the logic required to pull parameters from URIs.
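A minimal sketch of that idea, assuming a hand-built nested-array tree with `children` and `handler` keys (hypothetical names, not taken from the routers in this repo). An exact segment match is tried first; otherwise the first child whose key starts with `:` captures the segment. There is no backtracking, which a real router may or may not need.

```php
<?php
// Hypothetical sketch; function name and node keys are assumptions.

/** Walk the tree, capturing any segment that lands on a ":name" node. */
function matchRoute(array $root, string $uri): ?array
{
    $node = $root;
    $params = [];
    foreach (explode('/', trim($uri, '/')) as $segment) {
        $children = $node['children'] ?? [];
        if (isset($children[$segment])) {
            $node = $children[$segment]; // exact (static) match wins
            continue;
        }
        $matched = false;
        foreach ($children as $key => $child) {
            $key = (string) $key; // PHP casts numeric-looking keys to int
            if (strncmp($key, ':', 1) === 0) {
                $params[substr($key, 1)] = $segment; // ":param" => "param"
                $node = $child;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            return null;
        }
    }
    if (!isset($node['handler'])) {
        return null;
    }
    return ['handler' => $node['handler'], 'params' => $params];
}

// Tree for /api/v1/hello and /api/v1/hello/:param, written out by hand.
$root = ['children' => ['api' => ['children' => ['v1' => ['children' => [
    'hello' => [
        'handler' => 'hello',
        'children' => [':param' => ['handler' => 'helloParam', 'children' => []]],
    ],
]]]]]];

print_r(matchRoute($root, '/api/v1/hello/sky'));
```

The only string work per segment is a key lookup and a one-character comparison, with no pattern compilation or matching.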

Performance

Of course, what good is a router that's slow? We need to be able to look up routes and get the handler as quickly as possible. Now, you may note there are multiple routers here; these are implementations in their experimental phase to find the most memory- and time-efficient lookup operations possible.

For our benchmarks, which you can find in their respective files in tests, we create a single instance of a router, load routes from the .txt files, write their respective arrays to .txt files in storage, then perform three runs each: 10k, 100k, and 1m requests. In each run, we pick a random URI from the full list and have the router perform the lookup on that randomly selected URI. The test fails only if a 404 or 405 is returned.
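The core loop of such a benchmark might look roughly like the sketch below. It is not the code in tests: the `benchmark` function is hypothetical, the routes are a small inline array instead of the .txt files, a plain array lookup stands in for a router, and a `null` result stands in for a 404 or 405.

```php
<?php
// Hypothetical harness; $lookup stands in for a router's lookup method.

/** Time $iterations random lookups and report totals in a small array. */
function benchmark(callable $lookup, array $uris, int $iterations): array
{
    $last = count($uris) - 1;
    $failures = 0;
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $uri = $uris[mt_rand(0, $last)]; // pick a random registered URI
        if ($lookup($uri) === null) {
            $failures++;                 // stand-in for a 404/405 result
        }
    }
    $elapsed = microtime(true) - $start;
    return [
        'time' => $elapsed,
        'avg' => $elapsed / $iterations,
        'memory_kb' => memory_get_usage() / 1024,
        'failures' => $failures,
    ];
}

$routes = ['/api/v1/hello' => 'hello', '/api/v1/hi' => 'hi', '/foo' => 'foo'];
$lookup = function (string $uri) use ($routes) {
    return $routes[$uri] ?? null;
};

$result = benchmark($lookup, array_keys($routes), 10000);
printf("M: %.1f kb - T: %.10f s\n", $result['memory_kb'], $result['time']);
printf("Avg/lookup: %.10f s\n", $result['avg']);
```

Because the URI selection happens inside the timed loop, the cost of `mt_rand` is included in every measurement; it is the same for each router, so comparisons between them remain fair.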

Below are the results from our most rigorous tests: performing 1 million lookups on 1000 randomized routes with various lengths and parameters.

SimpleRouter

This is an old project of mine and the first router I ever tried to write. Foundationally it relies on tokenizing an incoming URI and matching it to regex, then looking through the internal routes array.

Running 1000000 iterations
(100000 lookups) M: 1846.2 kb - T: 32.6156370640 s
(200000 lookups) M: 1846.2 kb - T: 63.9784071445 s
(300000 lookups) M: 1846.2 kb - T: 96.9934570789 s
(400000 lookups) M: 1846.2 kb - T: 130.2443051338 s
(500000 lookups) M: 1846.2 kb - T: 161.8348190784 s
(600000 lookups) M: 1846.3 kb - T: 197.4232161045 s
(700000 lookups) M: 1846.1 kb - T: 231.8421580791 s
(800000 lookups) M: 1846 kb - T: 262.8337080479 s
(900000 lookups) M: 1846.2 kb - T: 296.1434569359 s
Time: 330.9394941330 s
Avg/lookup: 0.0003309396 s

Interestingly, it has the lowest memory cost of the current iterations, but the absolute highest total time and time per request. The time issue is likely due to hugely unoptimized tokenization.

TrieRouter

This is my first iteration of a PATRICIA trie router in PHP. I don't think it's currently perfect, as we could probably work on storing nodes as bytes rather than strings, but it's a good proof of concept for a tree-based mechanism.

Running 1000000 iterations
(100000 lookups) M: 4718.3 kb - T: 0.0581219196 s
(200000 lookups) M: 4718.3 kb - T: 0.1310830116 s
(300000 lookups) M: 4718.3 kb - T: 0.1909840107 s
(400000 lookups) M: 4718.3 kb - T: 0.2500770092 s
(500000 lookups) M: 4718.3 kb - T: 0.3067679405 s
(600000 lookups) M: 4718.3 kb - T: 0.3660039902 s
(700000 lookups) M: 4718.3 kb - T: 0.4237358570 s
(800000 lookups) M: 4718.3 kb - T: 0.4837160110 s
(900000 lookups) M: 4718.3 kb - T: 0.5422408581 s
Time: 0.6060788631 s
Avg/lookup: 0.0000006061 s

You can immediately see a huge time difference from SimpleRouter. Lookups average about 0.6 microseconds rather than about 330 microseconds, but we're using roughly 2.5x as much memory. From experimentation (and you can see this in the visualization), the trie method creates a gigantic number of child elements to store the handler for every endpoint.

SegmentRouter

This second iteration is the first to achieve the best of both worlds: lower memory usage and lower time per request than TrieRouter! In order to achieve this, we simply split routes into segments and store each segment as a node. This means that there are no extraneous child elements and navigating to an endpoint requires less effort. The visualization also shows how much simpler the tree is compared to TrieRouter.

Running 1000000 iterations
(100000 lookups) M: 2891.8 kb - T: 0.0500328541 s
(200000 lookups) M: 2891.8 kb - T: 0.0995390415 s
(300000 lookups) M: 2891.8 kb - T: 0.1491589546 s
(400000 lookups) M: 2891.8 kb - T: 0.1987509727 s
(500000 lookups) M: 2891.8 kb - T: 0.2471258640 s
(600000 lookups) M: 2891.8 kb - T: 0.2962870598 s
(700000 lookups) M: 2891.8 kb - T: 0.3496289253 s
(800000 lookups) M: 2891.8 kb - T: 0.3990900517 s
(900000 lookups) M: 2891.8 kb - T: 0.4483740330 s
Time: 0.4971950054 s
Avg/lookup: 0.0000004973 s

Truly our most impressive show yet. By simplifying the structure of our tree and only storing what we need, we can achieve pretty incredible results in under 3 MB of RAM.