SmokeyDope
- 29 Posts
- 353 Comments
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Do you quantize models yourself? · English · 2 · 4 days ago
Thank you for deciding to engage with our community here! You’re in good company.
Kobold just released a bunch of quant-making tools you may want to check out.
I haven’t made my own quants. I usually just grab whatever imatrix GGUFs bartowski or the other top makers on HF release.
I too am in the process of upgrading my homelab and opening up my model engine as a semi-public service. The biggest performance gains I’ve found come from using CUDA and loading everything into VRAM. So far I’ve just been working with my old NVIDIA 1070 Ti 8GB card.
I haven’t tried the vLLM engine, just Kobold. I hear good things about vLLM, so it will be something to look into sometime. I’m happy and comfortable with my model engine system since I’ve got everything set up just the way I want it, but I’m always open to performance optimization.
If you haven’t already, try running vLLM with its CPU niceness set to highest priority. If vLLM can use flash attention, try that too.
I’m just enough of a computer nerd to get the gist of technical things and set everything up on the software/networking side. Bought a domain name, set up a web server, and hardened it. Kobold’s web UI didn’t come with HTTPS (SSL/TLS) cert handling, so I needed to get a reverse proxy working to encrypt the connection properly.
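For anyone attempting the same, this is roughly the shape of the config I mean; a minimal Caddy sketch with a placeholder domain (Kobold listens on port 5001 by default), not my exact setup:

```
# Minimal Caddyfile sketch: the domain is a placeholder, the upstream is
# Kobold's default port. Caddy fetches and renews the TLS cert automatically.
example.com {
    reverse_proxy localhost:5001
}
```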
I am really passionate about this even though so much of the technical nitty-gritty under the hood of these models goes over my head. I was inspired enough to buy a Tesla P100 16GB and try shoving it into an old gaming desktop, which is my current homelab project. I don’t have a lot of money, so this was months of saving for the used server-class GPU and the PSU to run it alongside the 1070 Ti 8GB I already have.
The PC/server-building hardware side scares me, but I’m working on it. I’m not used to swapping parts out at all. When I tried to build my own PC a decade ago it didn’t last long before something blew, so there’s a bit of residual trauma there. I’m worried about things not fitting right in the case, destroying something, or the card not working at all.
Those are unhealthy worries when I’m trying to apply myself to this cutting-edge stuff. I’m really trying to work past that anxiety and just try my best to install the stupid GPU. I figure if I fail, I fail; that’s life, and it will be a learning experience either way.
I want to document the upgrade journey on my new self-hosted site. I also want to open my Kobold service to public use by fellow hobbyists. I’m not quite confident sharing my domain on the public web just yet, though; I’m still cooking.
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Any recomendations for a text editor with good AI integration (not code editor) · English · 1 · 6 days ago
Have you by chance checked out kobold.cpp’s Lite web UI? It allows some of what you’re asking for, like RAG for worldbuilding, adding images for the LLM to describe and work into the story, easy editing of input and output, and lots of customization in settings. I have a public instance of the Kobold web UI set up on my website, and I’m cool with fellow hobbyists using my compute to experiment with things. If you’re interested in trying it out to see if it’s more what you’re looking for, feel free to send me a PM and I’ll send you the address and an API key/password.
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Any recomendations for a text editor with good AI integration (not code editor) · English · 1 · 6 days ago
In an ideal world, what exactly would you want an AI-integrated text editor to do? Depending on what needs to happen in your workflow, you can automate copy-pasting and output logging with Python scripts and your engine’s API, as in the sketch below.
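A rough sketch of that kind of automation against a kobold.cpp-style OpenAI-compatible endpoint; the URL, file names, and model name are placeholders:

```python
# Rough sketch: pipe a draft file through a local OpenAI-compatible endpoint
# and append the result to a log. URL, file names, and model are placeholders.
import requests

API_URL = "http://localhost:5001/v1/chat/completions"

draft = open("draft.txt").read()

resp = requests.post(API_URL, json={
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "You are a careful copy editor."},
        {"role": "user", "content": f"Suggest edits for this passage:\n\n{draft}"},
    ],
})
suggestion = resp.json()["choices"][0]["message"]["content"]

# Append each run to a log instead of overwriting it
with open("edits.log", "a") as f:
    f.write(suggestion + "\n---\n")
print(suggestion)
```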
Editing and auditing stories isn’t that much different from auditing codebases. It all boils down to the understanding and correct use of language to convey abstraction. I bet tweaking some agentic personalities and goals in VSCode + Roo could get you somewhere.
SmokeyDope@lemmy.world to Selfhosted@lemmy.world • [SOLVED] ELI5: How to put several servers on one external IP? · English · 3 · 9 days ago
Good to hear you figured it out with router settings. I’m also new to this but got it all figured out this week. As other commenters said, I went with a reverse proxy and configured it. I chose Caddy over nginx for ease of install and config. I documented just about every step of the process. I’m a little scared to share my website on public forums just yet, but PM me and I’ll send you a link if you want to see my infrastructure page where I share the steps and config files.
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Local Voiceover/Audiobook generation · English · 3 · 16 days ago
Nice post Hendrik, thanks for sharing your knowledge and helping people out :)
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Local Voiceover/Audiobook generation · English · 2 · 16 days ago
I once got kobold.cpp working with their TTS model + WavTokenizer system. Here’s the wiki page on it.
It may not be as natural as a commercial voice model, but it may be enough to whet your appetite if other solutions feel overwhelmingly complicated.
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Current best local models for tool use? · English · 1 · 16 days ago
Wow, this is some awesome information Brucethemoose, thanks for sharing!
I hope you don’t mind if I ask some things. Tool calling is one of those things I’m really curious about. Sorry if this is too much; please don’t feel pressured, you don’t need to answer everything or anything at all. Thanks for being here.
I feel like a lot of people, including myself, only vaguely understand tool calling, how it’s supposed to work, and simple practice exercises to use it via scripts and APIs. What’s a dead simple Python script someone could cook up to tool call within the OpenAI-compatible API?
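To make the question concrete, here’s my rough sketch of what I think the request shape looks like; the endpoint, model name, and get_weather tool are all made-up placeholders:

```python
# Rough sketch of a tool call against an OpenAI-compatible endpoint.
# The URL, model name, and get_weather tool are made-up placeholders.
import json
import requests

API_URL = "http://localhost:5001/v1/chat/completions"

# One tool described in the standard OpenAI function-calling schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(API_URL, json={
    "model": "local-model",  # many local engines ignore this field
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
})
msg = resp.json()["choices"][0]["message"]

# If the model chose to call the tool, the arguments arrive as a JSON string
for call in msg.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```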
In your own words, what exactly is tool calling, and how does an absolute beginner tap into it? Could you clarify what you mean by ‘tool calling being built into their tokenizers’?
Would you mind sharing some sources where we can learn more? I’m sure Hugging Face has courses, but maybe you know some harder-to-find sources?
Is TabbyAPI an engine similar to ollama, llama.cpp, etc.?
What is exl2, exl3, etc.?
SmokeyDope@lemmy.world OP to Selfhosted@lemmy.world • Got any security advice for setting up a locally hosted website/external service? · English · 31 · 17 days ago
Pangolin.
SmokeyDope@lemmy.world OP M to LocalLLaMA@sh.itjust.works • MistralAI releases `Magistral`, their first official reasoning models. magistral small 2506 released under apache 2.0 license! · English · 4 · 18 days ago
Yes, it would have been awesome of them to release a bigger one, for sure :( At the end of the day they are still a business that needs a product to sell, and I don’t want to be ungrateful by complaining that they don’t give us everything. I expect some day all these companies will clam up and stop releasing models to the public altogether once the dust settles and monopolies are established. I’m happy to be here in an era where we can look forward to open-license model releases every few months.
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Current best local models for tool use? · English · 3 · 19 days ago
Devstral was released recently, trained specifically with tool calling in mind. I haven’t personally tried it yet, but people say it works well with VSCode + Roo.
SmokeyDope@lemmy.world OP to Selfhosted@lemmy.world • Got any security advice for setting up a locally hosted website/external service? · English · 5 · 19 days ago
Thanks for the input! I do eventually plan on making some scripts and a custom web interface to interact with/expose some local services on my network once I have the basics of HTML covered, as part of a portfolio thing, so I’d like to cover my ass early and not have problems later.
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • I'm excited for dots.llm (142BA14B)! · English · 4 · 20 days ago
Haven’t heard of this one before now. It will be interesting to see how it actually performs. I didn’t see what license the models will be released under; I hope it’s a more permissive one like Apache. Their marketing should try cooking up a catchy name that’s easy to remember. They seem to be a native Western-language company, so I also hope it doesn’t have too many random Chinese characters like Qwen does sometimes.
I’ve never really gotten into MoE models; people say you can get great performance gains with a clever partial-offloading strategy across the various experts. Maybe one of these days!
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • What to Integrate With My AI · English · 3 · 20 days ago
If you’re running into the issue of an app wanting an API key for your local ollama’s OpenAI-compatible API and refusing to work without one, I found that any random characters work. If you port forward your host computer, you should be able to reach the web UI from an external network using your public IP.
Here’s the dead simple Python program I used to send and receive text from the kobold.cpp engine through the web API. I’m not sure how similar ollama is, but AFAIK an OpenAI-compatible API means it should all work close to the same (I think? lol!) if you give it a shot. Make sure to set the .py file as executable and run it from a terminal with ./filename.py to see the output in real time. It writes the response to a text file in the same dir as the program too. Just use your host computer’s local IP if the PC running the script is on the same network.
```python
#!/usr/bin/env python3
import requests

# Configuration
API_URL = "http://10.0.0.xx:5001/api/v1/generate"
PROMPT = "Tell me a short story about a robot learning to dance."
OUTPUT_FILE = "output.txt"

# Define the API request data
data = {
    "prompt": PROMPT,
    "max_length": 200,   # Adjust response length
    "temperature": 0.7,  # Control randomness (0=deterministic, 1=creative)
    "top_p": 0.9,        # Focus on high-probability tokens
}

# Send the request to kobold.cpp
response = requests.post(API_URL, json=data)

if response.status_code == 200:
    # Extract the generated text
    result = response.json()
    generated_text = result["results"][0]["text"]

    # Save to a text file
    with open(OUTPUT_FILE, "w") as f:
        f.write(generated_text)
    print(f"Response saved to {OUTPUT_FILE}!")
else:
    print(f"Error: {response.status_code} - {response.text}")
```
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • What to Integrate With My AI · English · 1 · 20 days ago
VSCode + the Roo plugin seems to be all the hotness for coders leveraging ‘agentic teams’, so I spent a bit of time playing around with it. Most local models don’t do tool calling very well; I need to see if Devstral works better without giving errors. I hear real professionals use the Claude API for that kind of stuff.
I’m only vaguely familiar with getting computers to send, receive, and manipulate data with each other on a local network, so I got a very basic Python script going, pointed at kobold.cpp’s OpenAI-compatible API, to send prompts and receive replies instead of using the default web UI, just to learn how it works under the hood.
One of my next projects will be creating an extremely simple web-based UI for my ereader’s basic web browser to connect to. Kobold has something similar with the /noscript subpage, but even that is too much for my Kobo reader. I intend to somehow leverage a gemtext-to-HTML proxy like duckling or newswaffle to make the page rendering output dead simple.
One of these days I’m going to get a Pi Zero, attach it to a relay, and see if I can get a model to send a signal to turn a light on and off. Those home automation people with smart houses that integrate LLMs into things look so cool.
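A back-of-the-napkin sketch of how that could work, assuming the gpiozero library on the Pi and a tool-calling request like the one above; the pin, endpoint, and tool schema are placeholders, not a tested setup:

```python
# Rough sketch: let a local model toggle a relay on a Pi via tool calling.
# Assumes gpiozero and an OpenAI-compatible endpoint; pin, URL, and tool
# schema are placeholders, not a tested setup.
import json
import requests
from gpiozero import OutputDevice

relay = OutputDevice(17)  # hypothetical GPIO pin wired to the relay

tools = [{
    "type": "function",
    "function": {
        "name": "set_light",
        "description": "Turn the light on or off",
        "parameters": {
            "type": "object",
            "properties": {"on": {"type": "boolean"}},
            "required": ["on"],
        },
    },
}]

resp = requests.post("http://10.0.0.xx:5001/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "It's getting dark in here."}],
    "tools": tools,
})

# If the model chose to call the tool, act on its arguments
for call in resp.json()["choices"][0]["message"].get("tool_calls") or []:
    if call["function"]["name"] == "set_light":
        args = json.loads(call["function"]["arguments"])
        relay.on() if args["on"] else relay.off()
```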
SmokeyDope@lemmy.world M to LocalLLaMA@sh.itjust.works • Noob experience using local LLM as a D&D style DM. · English · 3 · 27 days ago
Thanks for sharing your nice project, ThreeJawedChuck!
I feel like a little bit of prompt engineering would go a long way.
To explain: a model’s base personality tends to be aligned into the “AI chat assistant” archetype. Models are encouraged to be positive yes-men whose goal is assisting the user and pleasing them with pleasantries along the way.
They do not need to be this way, though. Using system prompts, you may directly instruct the model to alter its personality or directly instruct it on how to structure things. In this context, tell it something like:
"You are a dungeon master with the primary goal of weaving an interesting and coherent story in the ‘dungeons and dragons’ universe. Your secondary goal is ensuring game rules are generally followed correctly.
You are not a yes-man. You are dominant and in control of the situation. You may argue with and challenge users as needed when negotiating game actions.
Your players want a believable and grounded setting without falling into the tropes of main character syndrome or becoming Mary Sues. Make sure that their adventures remain grounded and the world their characters live in remains largely indifferent to their existence."
This eats into a little bit of context but should change things up a little.
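If you drive the model through the API rather than a UI, wiring the persona in looks roughly like this (a minimal sketch; the endpoint and sampler values are placeholders):

```python
# Minimal sketch: pass the DM persona as a system prompt through an
# OpenAI-compatible chat endpoint. URL and sampler values are placeholders.
import requests

SYSTEM_PROMPT = "You are a dungeon master..."  # paste the full persona from above

resp = requests.post("http://localhost:5001/v1/chat/completions", json={
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I kick open the tavern door and demand free ale."},
    ],
    "temperature": 0.7,  # sampler settings ride along on the same request
})
print(resp.json()["choices"][0]["message"]["content"])
```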
You may make the model more creative and outlandish or more rigid and predictable by adjusting sampler settings.
Consider finding a PDF or an EPUB of an old D&D manual, converting it to text, and putting it into your engine’s RAG system so it can directly reference D&D rules.
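The conversion step can be short; a minimal sketch assuming the third-party pypdf package (filenames are hypothetical):

```python
# Minimal sketch: dump a PDF rulebook to plain text for a RAG system.
# Assumes `pip install pypdf`; the filenames are hypothetical.
from pypdf import PdfReader

reader = PdfReader("dnd_manual.pdf")
with open("dnd_manual.txt", "w") as f:
    for page in reader.pages:
        f.write((page.extract_text() or "") + "\n")
```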
Be wary of context limits. No matter what model makers tell you, 16-32k is a reasonable limit to expect when it comes to models keeping coherent track of things. A good idea is to keep important information you don’t want the model to forget in a text file and give the model a refresher on relevant context when it starts getting confused about who did what.
Chain-of-thought reasoning models may also give an edge when it comes to thinking more deeply about the story and how its interactions fit together. The downside is that they take extra time and compute to think things through.
I never tried SillyTavern, but I know it’s meant for roleplaying with character cards. I always recommend Kobold since it’s what I know best, but there’s more than one way to do things.
SmokeyDope@lemmy.world OP M to LocalLLaMA@sh.itjust.works • DeepSeek just released updated r1 models with 'deeper and more complex reasoning patterns'. Includes a r1 distilled qwen3 8b model boasting "10% improved performance" over original · English · 21 · 1 month ago
I’m not really into politics enough to say much about the accuracy of labels. I’ve been on Lemmy long enough to see many arguments and debates between political people about what being ‘anarchist’ or ‘leftist’ or ‘socialist’ really means, writing five-paragraph heated essays back and forth over which labels properly define which concepts, and so on.
It seems political bias is one of those things nobody can really agree on, because it boils down to semantic/linguistic arguments over redefining ultimately arbitrarily drawn labels. The arguers also tend to get emotional about it, since it largely deals with subjective beliefs and their own identity politics over which groups they want to be seen as part of, which can cause some level of mental gymnastics.
It’s a whole can of worms that is a nightmare to navigate semantically in a traditional sense, let alone to analyze mathematically by plotting data in matrices and extracting range values. I can’t imagine how much of a headache it would be to come up with a numerical 0-100 rating of ‘political bias’ for a chart like UGI. Politically minded people can’t really agree on what the terms mean, so the data scientists trying to objectively analyze this stuff for LLM benchmarking get shafted when trying to design a concrete measurement system. The 12Axes test they use is kind of interesting to read through in itself.
SmokeyDope@lemmy.world to Selfhosted@lemmy.world • Plex now want to SELL your personal data · English · 32 · 1 month ago
From what I’ve seen in arguments about this, Plex is generally more accessible, with QoL features and an easier-to-understand interface for non-techie people to share with family/friends. Something that’s hard for nerdy people to understand is that average people are perfectly fine paying for digital goods and services. An older, well-off normie has far more money than sense and will happily pay a premium just to not have to rub two brain cells together during setup, or for a nicer quality of experience. If you figure out how to make a very useful plug-and-play service that an end user of average intelligence/domain knowledge can use without stressing about setup, maintenance, and confusing layouts, you’ve created digital gold.
This isn’t the fault of open-source services; you can only expect so much polish from non-profit volunteers. It’s just the nature of consumer laziness, the expectation of professional product standards, and the path/product of least resistance.
SmokeyDope@lemmy.world OP M to LocalLLaMA@sh.itjust.works • DeepSeek just released updated r1 models with 'deeper and more complex reasoning patterns'. Includes a r1 distilled qwen3 8b model boasting "10% improved performance" over original · English · 51 · 1 month ago
I really hope the new R1 CoT reasoning patterns get trained into a Mistral model; they’re the only ones I half count on for decent uncensored base models. Keep an eye on the UGI chart too, if that’s something you care about. The best uncensored model I ever tried is Beepo 22B, IMO.
What does an MCP server do?