# Router
Hey there! This repo is an experiment to create a well-tuned and speedy URI router. There are only two main goals:

- It's fast
- It's simple

Prior to this project, I built [SimpleRouter](https://github.com/splashsky/simplerouter) as a fork of a very cool project called simplePHPRouter. That router was based on regex for parameterization, and the regex made it pretty painful to maintain if I ever left it alone for too long.

Since working on SimpleRouter, I've played with other languages (primarily Go) and found a few new ways of doing things.
## Methodology
Radix trees (tries, but I prefer the normal spelling) are wonderful mathematical constructs; the basic concept is that you have the root of a tree and branches (nodes) that have leaves (nodes). When you add a branch, this branch gets merged with existing branches if they match, and the leaves are still at the ends to be separated.

Take for example these routes:
```
/api/v1/hello
/api/v1/hi
/api/v1/hello/:param
/api/v2/no
/foo
```
A radix (and more specifically, a PATRICIA) trie takes the commonalities in these routes and makes them into nodes, or branches. `/` exists as the root node. `api/` is turned into a node, from which `v1/` and `v2/no` branch. `hello` is taken as another branch with the `/` and `:param` child nodes. `/foo` is naturally its own branch from the root.

By splitting these routes up into a trie based on their segments, you're able to iterate far more quickly through the tree to find what you're looking for. If a user then requests `/api/v1/hello/sky`, the router can jump from the root, to `api/`, to `v1/`, to `hello/`, then to the final node much faster than if we had to chop up, say, an associative array and compare it against every registered route.

The nodes can contain any arbitrary information, such as HTTP methods or handlers. From my experience, this method of lookup prefers specificity, so a literal match at the edges will always win out over the more general inner structures.
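
To make the tree structure concrete, here's a minimal sketch of a segment-based node and an insert routine. The `Node` class and `insert()` helper are illustrative assumptions for this README, not the actual classes used by the routers in this repo:

```php
<?php

// Illustrative sketch only: a segment-based trie node. The real routers in
// this repo use their own structures; the names here are made up for the example.
class Node
{
    /** @var array<string, Node> child nodes keyed by path segment */
    public array $children = [];

    /** @var callable|null handler stored on the node that ends a route */
    public $handler = null;
}

// Insert a route by splitting it into segments; shared prefixes such as
// `api` and `v1` naturally merge into the same branches.
function insert(Node $root, string $route, callable $handler): void
{
    $node = $root;
    foreach (explode('/', trim($route, '/')) as $segment) {
        if ($segment === '') {
            continue;
        }
        $node->children[$segment] ??= new Node();
        $node = $node->children[$segment];
    }
    $node->handler = $handler;
}
```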
## Parameters
One flaw(-ish) of the SimpleRouter implementation (and many other implementations) is the use of regex as a way of identifying and extracting route parameters. As everyone knows, regex adds time, overhead, and complexity to any system.

In order to circumvent this, we can rely on our node structure: if a node begins with our delimiter `:`, then we can take the related segment from the URI and use that as a parameter, regardless of the value. This means we have extremely low overhead in the logic required to pull parameters from URIs.
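
Building on the sketch `Node` above, a lookup with parameter capture might look roughly like this. Again, this is an assumption-laden sketch, not this repo's actual lookup code:

```php
<?php

// Illustrative sketch only: walk the tree segment by segment, preferring
// literal children and falling back to a ':' parameter child when no
// literal match exists.
function lookup(Node $root, string $uri): ?array
{
    $node = $root;
    $params = [];

    foreach (explode('/', trim($uri, '/')) as $segment) {
        if ($segment === '') {
            continue;
        }

        if (isset($node->children[$segment])) {
            // Literal (more specific) branch wins.
            $node = $node->children[$segment];
            continue;
        }

        // Fall back to a parameter branch such as ':param' and capture the
        // segment's value under that name.
        $next = null;
        foreach ($node->children as $key => $child) {
            if (str_starts_with((string) $key, ':')) {
                $params[substr((string) $key, 1)] = $segment;
                $next = $child;
                break;
            }
        }

        if ($next === null) {
            return null; // no branch matches this segment -> 404
        }
        $node = $next;
    }

    return $node->handler === null
        ? null
        : ['handler' => $node->handler, 'params' => $params];
}
```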
## Performance
Of course, what good is a router that's slow? We need to be able to look up routes and get the handler as quickly as possible. Now, you may note there are multiple routers here; these are implementations in their experimental phase to find the most memory- and time-efficient lookup operations possible.

For our benchmarks, which you can find in their respective files in [tests](tests/), we create a single instance of a router, load routes from the `.txt` files, write their respective arrays to `.txt` files in [storage](tests/storage/), then perform three runs each: 10k, 100k, and 1m requests. In each iteration, we pick a random URI from the full list and have the router perform the lookup on that randomly selected URI. The test fails only if a `404` or `405` is returned.
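
The shape of that benchmark loop is roughly as follows. This is a hedged sketch, not the actual harness in [tests](tests/); the `routes.txt` filename, the `$router` variable, and its `lookup()` method are assumptions for illustration:

```php
<?php

// Hypothetical benchmark loop; the real tests live in tests/ and may differ.
// Assumes a $router object exposing lookup(), which returns null on 404/405.
$routes = file('routes.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$iterations = 1_000_000;
$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    // Pick a random registered URI and look it up.
    $uri = $routes[array_rand($routes)];

    if ($router->lookup($uri) === null) {
        throw new RuntimeException("Lookup failed for {$uri}");
    }
}

$elapsed = microtime(true) - $start;
printf("Time: %.10f s\nAvg/lookup: %.10f s\n", $elapsed, $elapsed / $iterations);
```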
Below are the results from our most rigorous tests: performing 1 million lookups on 1000 randomized routes with various lengths and parameters.
### SimpleRouter
This is an old project of mine and the first router I ever tried to write. Foundationally it relies on tokenizing an incoming URI and matching it to regex, then looking through the internal routes array.
```
Running 1000000 iterations
(100000 lookups) M: 1846.2 kb - T: 32.6156370640 s
(200000 lookups) M: 1846.2 kb - T: 63.9784071445 s
(300000 lookups) M: 1846.2 kb - T: 96.9934570789 s
(400000 lookups) M: 1846.2 kb - T: 130.2443051338 s
(500000 lookups) M: 1846.2 kb - T: 161.8348190784 s
(600000 lookups) M: 1846.3 kb - T: 197.4232161045 s
(700000 lookups) M: 1846.1 kb - T: 231.8421580791 s
(800000 lookups) M: 1846 kb - T: 262.8337080479 s
(900000 lookups) M: 1846.2 kb - T: 296.1434569359 s
Time: 330.9394941330 s
Avg/lookup: 0.0003309396 s
```
Interestingly, it has the lowest memory cost of the current iterations, but the absolute highest total time and time per request. The time issue is likely due to hugely unoptimized tokenization.
### TrieRouter
This is my first iteration of a PATRICIA trie router in PHP. I don't think it's currently perfect, as we could probably work on storing nodes as bytes rather than strings, but it's a good proof of concept for a tree-based mechanism.
```
Running 1000000 iterations
(100000 lookups) M: 4718.3 kb - T: 0.0581219196 s
(200000 lookups) M: 4718.3 kb - T: 0.1310830116 s
(300000 lookups) M: 4718.3 kb - T: 0.1909840107 s
(400000 lookups) M: 4718.3 kb - T: 0.2500770092 s
(500000 lookups) M: 4718.3 kb - T: 0.3067679405 s
(600000 lookups) M: 4718.3 kb - T: 0.3660039902 s
(700000 lookups) M: 4718.3 kb - T: 0.4237358570 s
(800000 lookups) M: 4718.3 kb - T: 0.4837160110 s
(900000 lookups) M: 4718.3 kb - T: 0.5422408581 s
Time: 0.6060788631 s
Avg/lookup: 0.0000006061 s
```
You can immediately see a ***huge*** time difference from SimpleRouter. Lookups are in microseconds rather than milliseconds, but we're using roughly 2.5x as much memory. From experimentation (and you can see this in the [visualization](tests/storage/trie/big.txt)), the trie method creates a gigantic number of child nodes just to store the handler for every endpoint.
### SegmentRouter
This second iteration is the first to achieve the best of both worlds: lower memory usage and lower time per request! In order to achieve this, we simply split routes into segments and store each segment as a node. This means that there are no extraneous child elements, and navigating to an endpoint requires less effort. The [visualization](tests/storage/segment/big.txt) also shows how much simpler the tree is compared to TrieRouter.
```
Running 1000000 iterations
(100000 lookups) M: 2891.8 kb - T: 0.0500328541 s
(200000 lookups) M: 2891.8 kb - T: 0.0995390415 s
(300000 lookups) M: 2891.8 kb - T: 0.1491589546 s
(400000 lookups) M: 2891.8 kb - T: 0.1987509727 s
(500000 lookups) M: 2891.8 kb - T: 0.2471258640 s
(600000 lookups) M: 2891.8 kb - T: 0.2962870598 s
(700000 lookups) M: 2891.8 kb - T: 0.3496289253 s
(800000 lookups) M: 2891.8 kb - T: 0.3990900517 s
(900000 lookups) M: 2891.8 kb - T: 0.4483740330 s
Time: 0.4971950054 s
Avg/lookup: 0.0000004973 s
```
Truly our most impressive showing yet. By simplifying the structure of our tree and only storing what we need, we can achieve pretty incredible results in under 3 MB of RAM.
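
To tie the sketches above together, here's a tiny hypothetical usage example of the segment-splitting idea, using the illustrative `insert()` and `lookup()` helpers from the Methodology and Parameters sections (not the actual API of the routers in this repo):

```php
<?php

// Register a few of the example routes, then resolve one with a parameter.
// Purely illustrative; see the routers in this repo for the real APIs.
$root = new Node();

insert($root, '/api/v1/hello', fn () => 'hello');
insert($root, '/api/v1/hello/:param', fn () => 'hello with param');
insert($root, '/foo', fn () => 'foo');

$match = lookup($root, '/api/v1/hello/sky');
// $match['params'] is ['param' => 'sky'], and $match['handler'] is the
// handler registered for /api/v1/hello/:param.
```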