
For today's work, the big task was mostly implementing what I call edges.

What is an edge? If you know what messages are in a message-queue system like RabbitMQ, and you know what HTTP requests are, then you know enough to understand what an edge is.

An edge is a superset of messages and requests, generalized out to N dimensions. And like message-queue messages, edges are persistent until they resolve.

So a message can be thought of as a directed edge in graph theory: it connects two nodes. But there is another concept in graph theory called a hyper-edge, and when hyper-edges are directed they are just a path through a set of nodes. A request can be thought of as an N=3 hyper-edge where the starting node and the ending node happen to be the same. The edges I created are exactly that, at arbitrary N.
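To make that concrete in the notation the client library uses below, where each hop is a service/function pair (the service names here are illustrative):

// A plain message: a directed edge between two nodes (N = 2).
const message = ["producer/send", "consumer/receive"];

// A request: an N = 3 hyper-edge whose first and last nodes are the same.
const request = ["client/start", "server/handle", "client/receive"];

// A general edge: an arbitrary-length path through services.
const pipeline = ["client/start", "parser/parse", "llama3/prompt", "client/receive"];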

Now, for both messages and requests, each node can pass information to any forward node, and any node can receive information from any previous node. We often don't think of a request as passing information from the first node to the last node (itself), because in most conditions they already share state. But we would think that way if we wanted persistent requests with fault tolerance.

This property is one of the main reasons messages are preferred over requests for fault-tolerant systems: the assumption that state will persist in a process for the duration of a request isn't guaranteed.

I had already written some code for this a few days ago, but today I made a client-side library and debugged the server's code. The "server" here is analogous to RabbitMQ.

The client library, written for Node.js, is kind of cool.

const edgeclient = require('edgeclient')({});
// Options can be empty; they're mostly used to provide an endpoint for the edge director.

// Register a service: the callback runs whenever an edge reaches one of its functions.
edgeclient.createService("llama3", edge => {
  if (edge.function === "prompt") return llama3prompt(edge);
});

async function llama3prompt(edge) {
  const prompt = edge.messages[0]; // Wish Node had pattern matching
  const result = await somelongprocess(prompt);
  edge.write('-1', result); // Address the reply to the last node in the path
  edge.end();
}

// Kick off an edge: start at llamaclient, run llama3/prompt, end back at llamaclient/receive.
(async () => {
  const edge = await edgeclient.createEdge(["llamaclient/start", "llama3/prompt", "llamaclient/receive"]);
  edge.write("1", "Farmer Brown had all brown farm animals. What did he say when he saw his chicken and his cow together in the barn?");
  edge.end();
})();

edgeclient.createService("llamaclient", edge => {
  if (edge.function === 'receive') {
    console.log(edge.messages[0]);
    // <= Expect: "Brown chicken brown cow!"
  }
});

So you get the gist. The service/function pattern is a little Erlang-ish; that is on purpose. Speaking of Erlang, this really makes me wish Node.js had pattern matching.

One thing you have to consider is that you can have more than one incoming body, so you have to address who you are writing a message to. There are some shortcuts: "*" messages all nodes, and an empty "" does the same. "service" messages a specific service by name, "Number" targets a specific index in the path, ":Number" is a relative index, and "-Number" is relative to the end.
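Concretely, reusing the edge from the example above:

const note = { status: "working" }; // any plain object can be a body
edge.write("*", note);        // message every node on the edge
edge.write("", note);         // empty address: also messages all
edge.write("llama3", note);   // address a specific service by name
edge.write("1", note);        // absolute index into the path (llama3/prompt here)
edge.write(":1", note);       // relative index: one node forward from here
edge.write("-1", note);       // relative to the end: the last node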

I probably didn't use the best practices possible for this example. With multiple messages that could arrive in any format, pattern matching is probably the right technique, but that's tedious in Node.js. There are a few modules on npm; maybe I should pick one up.

I suppose someone could make an Erlang client library.

As you can see, we are passing plain objects into write. This is not HTTP; there is no reason for this to be text-centric.
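So a service is free to hand back structured results directly (the field names here are made up):

edge.write("-1", { text: result, model: "llama3", done: true });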


Now for the second part of the day.

I spent it making a simple server on my desktop that returns an embedding for a piece of text. It's basically just an edge-client wrapper around Ollama.
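It looks something like this sketch, assuming Ollama's local embeddings endpoint and Node 18+ for the global fetch (the service and function names are illustrative):

edgeclient.createService("embed", async edge => {
  if (edge.function !== "text") return;
  // Ask the local Ollama instance for an embedding of the incoming text.
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: edge.messages[0] })
  });
  const { embedding } = await res.json(); // a plain array of floats
  edge.write("-1", embedding); // hand the vector to the last node in the path
  edge.end();
});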

The next step, which I only got halfway through, is making a RAG server.

I'm going to be low-tech and just use sqlite3. I made a function in C for doing Euclidean distance comparisons between blobs in SQLite that are really float32 arrays. I also need to implement cosine distance, but that's almost the same thing.
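If you want the same trick without writing C, here's a rough Node equivalent of the idea using better-sqlite3's custom-function hook (the table and column names are invented; my real version is the compiled extension):

const Database = require('better-sqlite3');
const db = new Database('rag.db');

// Euclidean distance between two BLOBs holding little-endian float32 arrays.
db.function('vec_distance', (a, b) => {
  let sum = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i += 4) {
    const d = a.readFloatLE(i) - b.readFloatLE(i);
    sum += d * d;
  }
  return Math.sqrt(sum);
});

// A Float32Array packs straight into a BLOB parameter.
// queryEmbedding: a plain array of floats, e.g. from the embed service above.
const queryBlob = Buffer.from(Float32Array.from(queryEmbedding).buffer);
const rows = db.prepare(
  'SELECT text, vec_distance(embedding, ?) AS dist FROM chunks ORDER BY dist LIMIT 5'
).all(queryBlob);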

I was already set up to do that because I have the scoring function for GoatMatrix coded in C (as a sqlite3 extension). So I just dropped the function into the code I have for that and recompiled. It worked well.

On another note, GoatMatrix might be getting higher-dimensional. How many dimensions should we have? 32? 100,000? It probably doesn't matter that much.

Now, honestly, the hardest part remaining in getting a RAG server set up is cutting up documents.
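Absent something smarter, a naive fixed-size chunker with overlap is the obvious starting point (the sizes are arbitrary):

// Fixed windows with overlap. Real splitting should respect paragraph and
// sentence boundaries, which is exactly the hard part.
function chunkText(text, size = 1000, overlap = 200) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}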

I also found a better model for the embeddings: nomic-embed-text.

One thing that is nice about the edge system is that you can have more than one service provider for a service; this is a common nice thing about message queues. Also, none of the boxes involved need a public IP. So at any point, if my desktop can't keep up, I can launch a RunPod instance for a few minutes and just blow through the backlog. Or I can have certain requests that need more VRAM than my desktop has run only on RunPod, and RunPod can launch once an hour or whatever for a few minutes. Then I don't have to keep a cloud GPU rented 24/7. Queue jobs up and spin up a service provider whenever.

The other nice thing about these edges over message queues is that services can be dumber than what fits into a message-queue architecture. Normally a service needs to message back, which means it needs to know who to message, which means it needs some level of knowledge about your general architecture. If that changes, I'd have to modify my code on RunPod. I just want a dumb service that keeps working even if I change the architecture. The other advantage over message queues is that with that kind of system you have to whiteboard your design and stick to it: code in a lot of places needs to adhere to it. Because your effective architecture depends on what a lot of code is doing in different places, there are chances for conflicts. It also means you can't change the architecture without changing a lot of code, and there can be competing demands for different architectures.

Well, in this case, because each topology is defined by a single line of code, things will always be in self-agreement. And two different lines of code can define entirely different topologies without conflicting. You don't have to code those topologies into the different services or use branching flags to indicate chaining somewhere else.
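For example (inside an async context; the service names besides llama3 are illustrative), two call sites can lay completely different paths over the same services:

// A direct prompt path and a RAG path, each defined entirely where it's used.
const direct = await edgeclient.createEdge(
  ["bot/start", "llama3/prompt", "bot/receive"]);
const viaRag = await edgeclient.createEdge(
  ["bot/start", "embed/text", "rag/lookup", "llama3/prompt", "bot/receive"]);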

Now, I talk a lot of talk. Is this system performant? Hell no. I have not put in the time that has gone into RabbitMQ, and there are complexities here that likely will not be as friendly to performance hacking as the simple job of delivering a plain message. Also, because this is at the prototype stage, I'm polling instead of pushing (but that could be fixed), and I'm implementing it over HTTP (but that can also be fixed).

For now, what it's really for is long tasks that could time out an HTTP request, where you need the task to resolve eventually, where services may have varying levels of uptime (intentionally or not), and where publicly inaccessible hardware should still be able to contribute.

The last nice thing is that because an edge is a superset of a message and a request, it is easier to decide between the two, or to switch. You don't need a completely different strategy or tech stack when picking between them. Of course, if you have a real need to pick message queues over requests, then you have a performance need this won't satisfy in its current implementation. But that could be fixed by backing it with real messages; that's a V2. For now it lets me queue up jobs for AI to do and get the results back onto a server or bot. So I'm happy.

IDK, maybe hyper-request is a cooler name.
