Reddit r/LocalLLaMA · Community · Published 7d ago

What are people using for multi-model backends? What about swapping configs?

I'm trying to plan and deploy a machine that serves models for coding, Hermes, and whatever else. It's got multiple GPUs in it, and I want the flexibility to run different configurations (e.g., I might want to run two smaller models when I'm using Hermes and doing some less-intensive coding, swap to one big model across multiple GPUs when only Hermes is running and I'm not coding, or swap to one larger model that is better at coding and tool c