
Hybrid AI: local + API mind #27

@Keeper888

Description


I think most people are proud and happy to support Proton; I personally love the page of requests denied from police. It makes me feel safe that they don't just hand data over to governments and authorities. This is the old spirit of what the internet once was; it reminds me of the early-2000s internet and its spirit of curiosity and common shared knowledge.
I pay for 5 business subscriptions to LLMs, and Lumo is the one I use when I'm high and paranoid; for the rest, it doesn't fit my use case, for one reason among many. But I'm here, and maybe I found a solution:

I don't want to sponsor anyone so I didn't put names, just: problem -> technical solution.

Meta connections to use cases

Developers -> CLI LLM.

Product managers -> LLM with good MCP for online research, or research in general.

Researcher -> smoke devil's lettuce -> idea -> research assistant AI

  • You are privacy first: A
  • You need something that doesn't kill the budget: B
  • It has a MASSIVE advantage for the user and costs you nothing: C
  • How do you solve hardware compatibility? D
  • How would you do it? E
  • What are the possible scenarios from a single implementation? F (does F come before E? I'm sure, like I'm sure people are going to love it.)

A privacy:

With this solution you decentralise 2/3 of the data, and the user can be responsible for that data, since it runs in the user's browser.
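As a sketch of what "the user can be responsible for the data" could mean in practice, here is a minimal, hypothetical routing rule (all type and function names are mine, not an existing API): raw material stays with the local agents in the browser, and only distilled summaries or plan requests reach the Lumo server.

```typescript
// Hypothetical routing sketch: decide which payloads stay in the
// browser (local agents) and which go to the Lumo server (the "mind").
type Destination = "local" | "lumo-server";

interface AgentMessage {
  kind: "raw-document" | "scraped-page" | "task-summary" | "plan-request";
  sensitive: boolean; // user-flagged or heuristically detected
}

function route(msg: AgentMessage): Destination {
  if (msg.sensitive) return "local"; // sensitive data never leaves the device
  if (msg.kind === "raw-document" || msg.kind === "scraped-page") {
    return "local"; // raw user data is handled by the two local agents
  }
  return "lumo-server"; // only distilled summaries / plan requests go out
}
```

Under this rule, the bulk of the traffic (raw documents, scraped pages, anything flagged sensitive) stays client-side, which is roughly the "2/3 decentralised" claim above.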

B cost/benefit:

You use the 3-agent system of Manus (public research), keep the mind agent as Lumo, and make the two other agents small local models in the 500M-1B/2B/5B range (it depends on user hardware; for quick plug and play, you run a stress test on the local hardware and then select the right weights).
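The stress-test-then-select-weights step could be as simple as this sketch (the thresholds and tier names are illustrative assumptions, not benchmarked values):

```typescript
// Hypothetical plug-and-play sizing: run a short on-device stress test,
// then pick the largest local model the hardware can sustain.
interface StressResult {
  tokensPerSec: number; // measured on a tiny calibration model
  freeMemMB: number;    // memory the runtime reports as available
}

type ModelTier = "500M" | "1B" | "2B";

// Thresholds below are illustrative, not benchmarked.
function pickLocalModel(r: StressResult): ModelTier {
  if (r.tokensPerSec > 40 && r.freeMemMB > 4096) return "2B";
  if (r.tokensPerSec > 15 && r.freeMemMB > 2048) return "1B";
  return "500M"; // floor: always ship something that runs
}
```

The point of the floor tier is that even the weakest hardware gets a working local agent instead of an error.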

You just created a strong-reasoning, semi-looped agent (not a real loop, just an agent with a todo list).
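A minimal sketch of that semi-loop, with the Lumo planning call stubbed out (the stub is an assumption, not a real Lumo API): the plan is produced once and then executed top to bottom, so there is no open-ended agent loop.

```typescript
// Sketch of the "semi-looped" agent: the mind agent emits a todo list
// once; the executor walks it in a single bounded pass.
type Todo = { task: string; done: boolean };

// Stub standing in for a Lumo planning call (assumption, not a real API).
function planFromLumo(goal: string): Todo[] {
  return [
    { task: `search sources for: ${goal}`, done: false },
    { task: `summarise findings for: ${goal}`, done: false },
  ];
}

function runOnce(goal: string, execute: (task: string) => void): Todo[] {
  const todos = planFromLumo(goal);
  for (const t of todos) { // single pass, not a real loop
    execute(t.task);
    t.done = true;
  }
  return todos;
}
```

Because the executor never re-plans, its behaviour is bounded by the length of the todo list, which is the predictability argument made below.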

C advantages of small models:

Consider that specialised small models hallucinate less, and if you do a nice job with the training data, the RAG, or data quality in general, they perform equal to or better than a large one (in this specific case, do not crucify me: for a todo list and an agent that executes predictable tasks, it works better).

But it still allows advanced multi-step reasoning and execution, for the cost of one week of development. Don't forget you don't pay for the other two models, because you use WebLLM or something similar, and it's open source, so gg: fork and enjoy.
Wait! Are we forgetting privacy first? (A)

Hardware compatibility: it doesn't cost much (time) to create a daemon orchestrator for web/iOS/Android, or just use Camel AI, which is open source.

D Technical implementation:

First of all, I want to say that this issue was written by somebody really high, so appreciate that it's not AI. You can say that I'm high, but you can't deny that this solves a lot of problems.

3 agents (it's public research).
Use Camel if you don't know how to make it / just want an MVP.
I personally would build the orchestrator.

WebLLM is open source, so all good and no need to reinvent the wheel; you can fork it and gg.

Docker? No? OK, I'll skip this, it should be pretty obvious. Kubernetes? Nooo, please, not even the authors probably know how to use it.

Data science and data collection for training/RAG/embeddings, or whatever you like most:

Just pay a human for good, picky data quality; small LLMs perform better when you give them few things, but precise, and that's it. That's the source. Then you want to play with LoRA to make the investors happy? Sure, no problem. OK, I'm joking, I don't know, but probably a good training session plus RAG updates works very well for an MVP.

My time deadlines are a bit unrealistic unless you find somebody who sits in front of a computer for 12h; I think maybe 2 weeks if you scrum this properly. But hey, you just created privacy + decentralisation (the brain is still the Lumo server) + it fits the KPIs. The investors are going to call and say "hey, why do you have more users online but usage still tracks the old predictive data and stays within parameters?", and you say "no, emh, a guy really high proposed a solution, and all that was needed was a pepperoni pizza, 8 colas, 12h, and a hand cream for the butt, which by now should have the shape of the chair (not the opposite)".

I think you can just toggle it, plus the stress test, so you could use models in the range of 500M-4B (from the smallest hardware to high-performance hardware). Don't forget it has to communicate with Lumo, so you need fast stuff; I would cap it at 2B max. Don't forget they are hyper-specialised, otherwise they do hallucinate.
So: execution agents with bounded action spaces. Predictable. We are predictable as humans; just narrow down to the most frequent tasks.
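A bounded action space can be as blunt as a whitelist; this sketch (the action names are hypothetical) rejects anything outside the narrow task set instead of improvising:

```typescript
// Sketch of an execution agent with a bounded action space: anything
// outside the whitelist is rejected rather than improvised, which is
// what keeps a hyper-specialised small model from hallucinating actions.
const ALLOWED_ACTIONS = new Set(["search", "summarise", "save-note"]);

type ExecResult = { ok: true; action: string } | { ok: false; reason: string };

function tryExecute(action: string): ExecResult {
  if (!ALLOWED_ACTIONS.has(action)) {
    return { ok: false, reason: `action "${action}" is out of scope` };
  }
  return { ok: true, action }; // dispatch to a real handler in practice
}
```

The rejection branch is the safety property: a small model that emits an unexpected action gets a refusal back, not a side effect.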
