For today's work, the big task was mostly implementing what I call edges.
If you know what messages in a message queue system like RabbitMQ are, and you know what HTTP requests are, then you have enough to understand what an edge is.
An edge is a superset of messages and requests, generalized out to N steps. Like queued messages, edges are persistent until resolved.
In both messages and requests, each node can pass information to any forward node and receive information from any previous node. We often don’t think of a request as passing information from the first node to the last (itself) because of shared state, but that perspective becomes useful for persistent requests with fault tolerance.
This property is one of the main reasons why messages are preferred over requests for fault-tolerant systems: you can't assume that state will persist in a process for the duration of a request.
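To make that concrete, here's a hypothetical sketch of the shape an edge might take. This is my illustration, not the actual server's data model: a path of service/function hops, a cursor for the current hop, and the messages written so far.

```javascript
// Hypothetical illustration of an edge's shape (not the real wire format).
// A request is the special case where the path loops back to the caller;
// a queue message is the special case of a one-way, single-hop path.
const edge = {
  // Ordered path of "service/function" hops the edge travels through
  path: ['llamaclient/start', 'llama3/prompt', 'llamaclient/receive'],
  cursor: 1,        // which hop is currently being serviced
  messages: [       // persisted until the edge is resolved
    { target: '1', body: 'What is an edge?' }
  ],
  resolved: false
};

console.log(edge.path[edge.path.length - 1]); // 'llamaclient/receive' (back at the caller)
```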
I had already written some code for this a few days ago, but today I made a client-side library and debugged the server’s code (the "server" here is analogous to RabbitMQ).
The client library written for Node.js is kind of cool:

```javascript
const edgeclient = require('edgeclient')({ /* options */ });
// Options can be empty and are mostly used to provide an endpoint for the edge director

edgeclient.createService("llama3", edge => {
  if (edge.function === "prompt") return llama3prompt(edge);
});

async function llama3prompt(edge) {
  var prompt = edge.messages[0]; // Wish Node had pattern matching
  var result = await somelongprocess(prompt);
  edge.write('-1', result);
  edge.end();
}

const edge = await edgeclient.createEdge(["llamaclient/start", "llama3/prompt", "llamaclient/receive"]);
edge.write("1", "Farmer Brown had all brown farm animals. What did he say when he saw his chicken and his cow together in the barn?");
edge.end();

edgeclient.createService("llamaclient", edge => {
  if (edge.function === 'receive') {
    console.log(edge.messages[0]);
    // <= Expect: "Brown chicken brown cow!"
  }
});
```
So you get the gist. The service/function pattern is a little Erlang-ish—and that’s on purpose. Speaking of Erlang, it really makes me wish that Node.js had pattern matching.
One thing you have to consider is that you can have more than one incoming message body, so you need to address whom you are writing to. Here are some shortcuts:

- `*` or an empty string `""`: Message all.
- `"service"`: Message a specific service.
- `"Number"`: Target a specific index.
- `":Number"`: Use a relative index.
- `"-Number"`: Count relative to the end.

I probably didn't use the best practices possible for this example. With multiple messages coming in various formats, pattern matching might be the right technique, but that's tedious in Node.js. There are a few modules on npm, and maybe I should pick one up. Perhaps someone could even make an Erlang client library.
As you can see, we're passing plain objects into `write()`. This isn't HTTP; there's no reason for it to be text-centric.
Now for the second part of the day.
I spent some time making a simple server on my desktop to return an embedding for a piece of text. It’s basically just an edge client wrapper to Ollama.
The next step, which I only got halfway through, is making a RAG server. I'm going to be low-tech and use sqlite3. I created a function in C to perform Euclidean distance comparisons between blobs in sqlite (which are really float32 arrays). I also need to implement cosine distance, but that's almost the same thing.
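The C extension itself isn't shown here, so here's the same comparison sketched in plain JavaScript over `Float32Array` (the JS analogue of those float32 blobs), plus the cosine variant, to show why the two really are almost the same loop:

```javascript
// Euclidean distance between two equal-length float32 vectors:
// accumulate squared differences, then take the square root.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance is nearly the same loop: accumulate a dot product
// and two squared norms instead of squared differences.
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const u = new Float32Array([1, 0, 0]);
const v = new Float32Array([0, 1, 0]);
console.log(euclideanDistance(u, v)); // sqrt(2) ≈ 1.4142
console.log(cosineDistance(u, v));    // 1 (orthogonal vectors)
```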
I was already set up for this because I have the scoring function for GoatMatrix coded in C (as a sqlite3 extension). So I just dropped the function into the code and recompiled. It worked well.
On another note, GoatMatrix might be getting higher-dimensional. How many dimensions should we have? 32? 100,000? It probably doesn’t matter that much.
The hardest part remaining for setting up a RAG server is cutting up documents. I also found a better model for the embeddings: Nomic-embed-text.
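Since the chunking approach is still open, here's one naive possibility, just a greedy paragraph packer; this is my sketch, not what the RAG server ends up using:

```javascript
// Naive chunker: split on blank lines, then greedily pack whole
// paragraphs into chunks of at most maxChars characters.
function chunkDocument(text, maxChars = 1000) {
  const paragraphs = text.split(/\n\s*\n/).map(p => p.trim()).filter(Boolean);
  const chunks = [];
  let current = '';
  for (const p of paragraphs) {
    // +2 accounts for the blank line re-inserted between paragraphs
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + '\n\n' + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkDocument('First paragraph.\n\nSecond paragraph.', 20));
// [ 'First paragraph.', 'Second paragraph.' ]
```

A real chunker would also need to split oversized single paragraphs and probably add overlap between chunks, but this shows the basic shape of the problem.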
One of the nice aspects of the edge system is that you can have more than one service provider for a service. This is a common benefit of message queues. Additionally, each box involved doesn’t need a public IP. If my desktop can’t keep up, I can launch a Runpod for a few minutes to handle the load, or have certain requests run on Runpod if they require more VRAM than my desktop has. Runpod can launch periodically, so I don’t have to rent a cloud GPU 24/7—just queue jobs and spin up a service provider as needed.
Another advantage is that these services can be simpler than those in traditional message queue architectures. Normally, a service needs to message back and know whom to message, which means it needs some knowledge of your overall architecture. If that changes, I’d have to modify my code on Runpod. I just want a “dumb” service that works even if the architecture changes.
Traditional message queues often require you to whiteboard your design and stick to it—code in many places has to adhere to a single topology, which can lead to conflicts and make changes difficult. In contrast, the edge system is defined by a single line of code, so things always remain in self-agreement. Different lines of code can define entirely different topologies without conflict. There’s no need to code those topologies into different services or use branching flags to indicate chaining.
Is this system performant? Hell no. I haven’t invested as much time into it as I have with RabbitMQ. There are complexities that likely won’t be as friendly to performance hacking as the simple job of sending a plain message. Also, since this is still a prototype, I’m polling instead of pushing (but that could be fixed), and I’m implementing this over HTTP (which can also be improved).
For now, it’s really meant for long tasks that could timeout an HTTP request, where eventual resolution is needed and services may have varying levels of uptime (whether intentionally or unintentionally). It also allows publicly inaccessible hardware to contribute.
The last nice thing is that because it is a superset of both a message and a request, it’s easier to decide between the two or switch strategies without needing a completely different tech stack.
And that’s it for today!
This was a test to see if chatgpt could add good markdown formatting to a blog post. To see what I actually wrote myself there is this version: https://goatmatrix.net/c/DevBlog/EaEyj5ZUtH
Do you think this is better or worse?
My big problem with it is the cheesiness. "Edges and RAG Server: A Journey in Node.js"
Just because I happen to code in Node.js doesn't mean everything I do needs to be called a journey in Node.js. One day we'll have AI that doesn't try so hard to be a redditor.
I like this one better. I like edges. I'm emo as fuck.