# Router
Hey there! This repo is an experiment to create a well-tuned and speedy URI router. There are only two main goals:

- It's fast
- It's simple

Prior to this project, I built [SimpleRouter](https://github.com/splashsky/simplerouter) as a fork of a very cool project called simplePHPRouter. That router was based on regex for parameterization, and the regex made it pretty painful to maintain if I ever left it alone for too long.

Since working on SimpleRouter, I've played with other languages (primarily Go) and found a few new ways of doing things.
## Methodology
Radix trees (tries, but I prefer the normal spelling) are wonderful mathematical constructs; the basic concept is that you have the root of a tree and branches (nodes) that have leaves (nodes). When you add a branch, this branch gets merged with existing branches if they match, and the leaves are still at the ends to be separated.

Take for example these routes:
```
/api/v1/hello
/api/v1/hi
/api/v1/hello/:param
/api/v2/no
/foo
```
A radix (and more specifically, a PATRICIA) trie takes the commonalities in these routes and makes them into nodes, or branches. `/` exists as the root node. `api/` is turned into a node, from which `v1/` and `v2/no` branch. `hello` is taken as another branch with the `/` and `:param` child nodes. `/foo` is naturally its own branch from the root.

By splitting these routes up into a trie based on their segments, you're able to iterate far more quickly through the tree to find what you're looking for. If a user then requests `/api/v1/hello/sky`, the router can jump from the root, to `api/`, to `v1/`, to `hello/`, then to the final node much faster than if we had to chop up, say, an associative array and compare it against every registered route.

The nodes can contain any arbitrary information, such as HTTP methods or handlers. From my experience, this method of lookup prefers specificity, so a literal match at the edges will always win out over the more general inner structures.
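
To make the tree structure concrete, here's a minimal sketch of a segment-based node and an insert routine. The `Node` class and `insert()` helper are illustrative assumptions for this README, not the actual classes used by the routers in this repo:

```php
<?php

// Illustrative sketch only: a segment-based trie node. The real routers in
// this repo use their own structures; the names here are made up for the example.
class Node
{
    /** @var array<string, Node> child nodes keyed by path segment */
    public array $children = [];

    /** @var callable|null handler stored on the node that ends a route */
    public $handler = null;
}

// Insert a route by splitting it into segments; shared prefixes such as
// `api` and `v1` naturally merge into the same branches.
function insert(Node $root, string $route, callable $handler): void
{
    $node = $root;
    foreach (explode('/', trim($route, '/')) as $segment) {
        if ($segment === '') {
            continue;
        }
        $node->children[$segment] ??= new Node();
        $node = $node->children[$segment];
    }
    $node->handler = $handler;
}
```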
## Parameters
One flaw(-ish) of the SimpleRouter implementation (and many other implementations) is the use of regex as a way of identifying and extracting route parameters. As everyone knows, regex adds time, overhead, and complexity to any system.

In order to circumvent this, we can rely on our node structure: if a node begins with our delimiter `:`, then we can take the related segment from the URI and use that as a parameter, regardless of the value. This means we have extremely low overhead in the logic required to pull parameters from URIs.
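
Building on the sketch `Node` above, a lookup with parameter capture might look roughly like this. Again, this is an assumption-laden sketch, not this repo's actual lookup code:

```php
<?php

// Illustrative sketch only: walk the tree segment by segment, preferring
// literal children and falling back to a ':' parameter child when no
// literal match exists.
function lookup(Node $root, string $uri): ?array
{
    $node = $root;
    $params = [];

    foreach (explode('/', trim($uri, '/')) as $segment) {
        if ($segment === '') {
            continue;
        }

        if (isset($node->children[$segment])) {
            // Literal (more specific) branch wins.
            $node = $node->children[$segment];
            continue;
        }

        // Fall back to a parameter branch such as ':param' and capture the
        // segment's value under that name.
        $next = null;
        foreach ($node->children as $key => $child) {
            if (str_starts_with((string) $key, ':')) {
                $params[substr((string) $key, 1)] = $segment;
                $next = $child;
                break;
            }
        }

        if ($next === null) {
            return null; // no branch matches this segment -> 404
        }
        $node = $next;
    }

    return $node->handler === null
        ? null
        : ['handler' => $node->handler, 'params' => $params];
}
```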
## Performance
Of course, what good is a router that's slow? We need to be able to look up routes and get the handler as quickly as possible. Now, you may note there are multiple routers here; these are implementations in their experimental phase to find the most memory- and time-efficient lookup operations possible.

For our benchmarks, which you can find in their respective files in [tests](tests/), we create a single instance of a router, load routes from the `.txt` files, write their respective arrays to `.txt` files in [storage](tests/storage/), then perform three runs each: 10k, 100k, and 1m requests. In each iteration, we pick a random URI from the full list and have the router perform the lookup on that randomly selected URI. The test fails only if a `404` or `405` is returned.
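
The shape of that benchmark loop is roughly as follows. This is a hedged sketch, not the actual harness in [tests](tests/); the `routes.txt` filename, the `$router` variable, and its `lookup()` method are assumptions for illustration:

```php
<?php

// Hypothetical benchmark loop; the real tests live in tests/ and may differ.
// Assumes a $router object exposing lookup(), which returns null on 404/405.
$routes = file('routes.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$iterations = 1_000_000;
$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    // Pick a random registered URI and look it up.
    $uri = $routes[array_rand($routes)];

    if ($router->lookup($uri) === null) {
        throw new RuntimeException("Lookup failed for {$uri}");
    }
}

$elapsed = microtime(true) - $start;
printf("Time: %.10f s\nAvg/lookup: %.10f s\n", $elapsed, $elapsed / $iterations);
```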
Below are the results from our most rigorous tests: performing 1 million lookups on 1000 randomized routes with various lengths and parameters.
### SimpleRouter
This is an old project of mine and the first router I ever tried to write. Foundationally it relies on tokenizing an incoming URI and matching it to regex, then looking through the internal routes array.
```
Running 1000000 iterations
(100000 lookups) M: 1846.2 kb - T: 32.6156370640 s
(200000 lookups) M: 1846.2 kb - T: 63.9784071445 s
(300000 lookups) M: 1846.2 kb - T: 96.9934570789 s
(400000 lookups) M: 1846.2 kb - T: 130.2443051338 s
(500000 lookups) M: 1846.2 kb - T: 161.8348190784 s
(600000 lookups) M: 1846.3 kb - T: 197.4232161045 s
(700000 lookups) M: 1846.1 kb - T: 231.8421580791 s
(800000 lookups) M: 1846 kb - T: 262.8337080479 s
(900000 lookups) M: 1846.2 kb - T: 296.1434569359 s
Time: 330.9394941330 s
Avg/lookup: 0.0003309396 s
```
Interestingly, it has the lowest memory cost of the current iterations, but the absolute highest total time and time per request. The time issue is likely due to hugely unoptimized tokenization.
### TrieRouter
This is my first iteration of a PATRICIA trie router in PHP. I don't think it's currently perfect, as we could probably work on storing nodes as bytes rather than strings, but it's a good proof of concept for a tree-based mechanism.
```
Running 1000000 iterations
(100000 lookups) M: 4718.3 kb - T: 0.0581219196 s
(200000 lookups) M: 4718.3 kb - T: 0.1310830116 s
(300000 lookups) M: 4718.3 kb - T: 0.1909840107 s
(400000 lookups) M: 4718.3 kb - T: 0.2500770092 s
(500000 lookups) M: 4718.3 kb - T: 0.3067679405 s
(600000 lookups) M: 4718.3 kb - T: 0.3660039902 s
(700000 lookups) M: 4718.3 kb - T: 0.4237358570 s
(800000 lookups) M: 4718.3 kb - T: 0.4837160110 s
(900000 lookups) M: 4718.3 kb - T: 0.5422408581 s
Time: 0.6060788631 s
Avg/lookup: 0.0000006061 s
```
You can immediately see a ***huge*** time difference from SimpleRouter. Lookups are in microseconds rather than milliseconds, but we're using roughly 2.5x as much memory. From experimentation (and you can see this in the [visualization](tests/storage/trie/big.txt)), the trie method creates a gigantic number of child nodes just to store the handler for every endpoint.
### SegmentRouter
This second iteration is the first to achieve the best of both worlds: lower memory usage and lower time per request! In order to achieve this, we simply split routes into segments and store each segment as a node. This means that there are no extraneous child elements, and navigating to an endpoint requires less effort. The [visualization](tests/storage/segment/big.txt) also shows how much simpler the tree is compared to TrieRouter.
```
Running 1000000 iterations
(100000 lookups) M: 2891.8 kb - T: 0.0500328541 s
(200000 lookups) M: 2891.8 kb - T: 0.0995390415 s
(300000 lookups) M: 2891.8 kb - T: 0.1491589546 s
(400000 lookups) M: 2891.8 kb - T: 0.1987509727 s
(500000 lookups) M: 2891.8 kb - T: 0.2471258640 s
(600000 lookups) M: 2891.8 kb - T: 0.2962870598 s
(700000 lookups) M: 2891.8 kb - T: 0.3496289253 s
(800000 lookups) M: 2891.8 kb - T: 0.3990900517 s
(900000 lookups) M: 2891.8 kb - T: 0.4483740330 s
Time: 0.4971950054 s
Avg/lookup: 0.0000004973 s
```
Truly our most impressive showing yet. By simplifying the structure of our tree and only storing what we need, we can achieve pretty incredible results in under 3 MB of RAM.
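
To tie the sketches above together, here's a tiny hypothetical usage example of the segment-splitting idea, using the illustrative `insert()` and `lookup()` helpers from the Methodology and Parameters sections (not the actual API of the routers in this repo):

```php
<?php

// Register a few of the example routes, then resolve one with a parameter.
// Purely illustrative; see the routers in this repo for the real APIs.
$root = new Node();

insert($root, '/api/v1/hello', fn () => 'hello');
insert($root, '/api/v1/hello/:param', fn () => 'hello with param');
insert($root, '/foo', fn () => 'foo');

$match = lookup($root, '/api/v1/hello/sky');
// $match['params'] is ['param' => 'sky'], and $match['handler'] is the
// handler registered for /api/v1/hello/:param.
```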