So I've been playing around with RunPod.
I have another series of presentations I'm going to be giving where we might want to do a free coding session after some demonstrations.
I want to give people access to graphics cards. Instead of running a lot of very expensive ones on site, I could spin up like 5 RunPod pods, create a bunch of users on them, hand out a lot of username/password pairs, and give people access for an hour to an hour and a half.
So that would be like 30 people on 5 graphics cards. I think it will be fine.
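The "bunch of users, bunch of password pairs" part is simple enough to sketch. Assuming plain Linux accounts on each pod, something like this could generate the handout list (the pod hostnames, user prefix, and counts here are all made up for illustration):

```python
import secrets
import string


def make_credentials(n_users, pod_host, prefix="workshop"):
    """Generate (username, password, host) triples for one pod.

    Passwords are short random strings -- fine for a 90-minute
    session, not for anything long-lived.
    """
    alphabet = string.ascii_lowercase + string.digits
    creds = []
    for i in range(n_users):
        username = f"{prefix}{i:02d}"
        password = "".join(secrets.choice(alphabet) for _ in range(10))
        creds.append((username, password, pod_host))
    return creds


# e.g. 6 users on each of 5 pods ~= 30 seats; hostnames are placeholders
all_creds = []
for pod in ["pod-a.example.com", "pod-b.example.com"]:
    all_creds.extend(make_credentials(6, pod))

# Print the table that would end up on the web page for the room
for user, pw, host in all_creds:
    print(f"{user}  {pw}  ssh {user}@{host}")
```

On each pod you'd still have to actually create the accounts (e.g. `useradd` + `chpasswd` from these pairs), which needs root inside the container.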
But here is where RunPod sucks, or maybe I suck because I refuse to properly learn Docker.
If I spin a bunch of these up I can pause them, and then restart them. So I could get all of this set up manually and then relaunch it when I need it.
But.... for some reason it's probabilistic whether or not you can re-attach to a GPU from the restarted pod. And the probability is like 90% that you can't.
They say it's because you are provisioned on one physical box that has 4 GPUs on it. If you pause, someone else hops onto the box. That's reasonable. But the new folks are going to get exclusive rights to one of those graphics cards.
Ok... So why not take all of my files off of that box and onto one that has a graphics card available? Sure, it might take longer to restart while the files are moving, or they could do some kind of copy-on-read.
The right way to do it, which is kind of lame IMO, is to make a public Docker image on docker.io and reference it. I guess I could look up how to upload a private one to RunPod; from what I can find, that doesn't exist, but I might be blind. And I would want it to be private, because I want the image to ping a server of mine so I can collect all the usernames and passwords it created and, more importantly, the SSH endpoints for the servers, so that I can generate a web page to display to the room.
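That phone-home step is small. A sketch of what the image's startup script might do, assuming my server exposes some registration endpoint (the URL, payload shape, and the `RUNPOD_PUBLIC_IP` / `RUNPOD_TCP_PORT_22` environment variable names are all assumptions to verify against RunPod's docs):

```python
import json
import os
import urllib.request


def build_registration(creds, ssh_host, ssh_port):
    """Bundle generated credentials and this pod's SSH endpoint into
    a payload for my server. The payload shape is made up."""
    return {
        "ssh_endpoint": f"{ssh_host}:{ssh_port}",
        "users": [{"username": u, "password": p} for u, p in creds],
    }


def register(payload, url):
    """POST the payload to my server. The URL would be baked into the
    image, which is part of why I'd want the image private."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example payload (no network call here; register() would run at pod startup)
payload = build_registration(
    [("workshop00", "s3cret")],  # placeholder credentials
    os.environ.get("RUNPOD_PUBLIC_IP", "203.0.113.5"),
    os.environ.get("RUNPOD_TCP_PORT_22", "10022"),
)
print(json.dumps(payload, indent=2))
```

The server side just collects these payloads and renders them into the page shown to the room.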
I guess the existence of my servers isn't that private, because I can just target goatmatrix and it's not like goatmatrix is a secret. It's just the principle of the matter: I shouldn't have to put all my code somewhere public to make something re-deployable without leaving an expensive server running indefinitely.
Probably I just need to learn how to use RunPod correctly. But long story short, the pause-and-restart feature is basically not really a thing.
Plus, some of these pods that use bigger models like Flux take like 15 minutes to deploy, and you might have media on that server. You don't want to redeploy.
Like, if I wanted to make bots accessible here (a feature you guys would only use every now and again), I'd probably have to point at a SaaS product instead of running something custom. Something that could have been easy to start and stop now only makes sense as something kept up permanently, hence it makes more sense to use the SaaS that is up permanently.
So if I were to set that up and wanted to do custom models, you might have to wait 15 minutes to get a response (or more, which is kind of ok), but then you might as well use it a lot before I shut it down.
The funny thing is, this probably puts more load on their servers than my migration idea would, because initializing these new pods is mostly downloading things from outside of their network.
That sounds super sick. Docker is a mystery wrapped in an enigma to me, so I am no help! lol. But from what it sounds like, you're going to use a cloud GPU service to run the sea of turtles across some models and give us limited access? I know that in the world of stable and generated content, a lot of creatives are renting their 4080s vs buying, which I think is 'smart'. But that pushes the ComfyUI experience into using things like Google Canvas (which I also don't know very well) to do shared workflows n stuff. I dunno, I'd like to look more into it but time and all. I look forward to more write-ups like this!