• 29 Posts
  • 353 Comments
Joined 2 years ago
Cake day: July 1st, 2023



  • Thank you for deciding to engage with our community here! You’re in good company.

    Kobold just released a bunch of tools for quant making that you may want to check out.

    Kcpp_tools

    I have not made my own quants. I usually just grab whatever imatrix GGUFs bartowski or the other top quant makers on HF release.

    I too am in the process of upgrading my homelab and opening up my model engine as a semi-public service. The biggest performance gains I've found come from using CUDA and loading everything in VRAM. So far I've just been working with my old Nvidia 1070 Ti 8GB card.

    Haven't tried the vllm engine, just kobold. I hear good things about vllm, so it will be something to look into sometime. I'm happy and comfortable with my model engine system since I've got everything set up just the way I want it, but I'm always open to performance optimization.

    If you haven't already, try running vllm with its CPU niceness set to the highest priority. If vllm can use flash attention, try that too.
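
    Something like this is what I mean, as a rough Python sketch (the vllm launch command is a placeholder for however you start yours, and negative nice values need root or CAP_SYS_NICE):

    import os
    import subprocess
    
    # Launch vllm with a lower niceness (= higher CPU priority).
    # os.nice(-10) runs in the child right before exec; it will fail
    # unless you have root or CAP_SYS_NICE.
    cmd = ["vllm", "serve", "your-model-here"]  # placeholder command
    proc = subprocess.Popen(cmd, preexec_fn=lambda: os.nice(-10))
    proc.wait()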

    I'm just enough of a computer nerd to get the gist of technical things and set everything up on the software/networking side. I bought a domain name, set up a web server, and hardened it. Kobold's webui didn't come with HTTPS SSL/TLS cert handling, so I needed to get a reverse proxy working to encrypt the connection properly.

    I am really passionate about this even though so much of the technical nitty-gritty under the hood of these models goes over my head. I was inspired enough to buy a Tesla P100 16GB and try shoving it into an old gaming desktop, which is my current homelab project. I don't have a lot of money, so this was months of saving for the used server-class GPU and a PSU big enough to run it plus the 1070 Ti 8GB later on.

    The PC/server-building hardware side scares me, but I'm working on it. I'm not used to swapping parts out at all. When I tried to build my own PC a decade ago it didn't last long before something blew, so there's a bit of residual trauma there. I'm worried about things not fitting right in the case, destroying something, or the card not working at all.

    Those are unhealthy worries when I'm trying to apply myself to this cutting-edge stuff. I'm really trying to work past that anxiety and just do my best to install the stupid GPU. I figure if I fail, I fail; that's life, and it will be a learning experience either way.

    I want to document the upgrade journey on my new self-hosted site. I also want to open my kobold service to public use by fellow hobbyists. I'm not quite confident in sharing my domain on the public web just yet, though; I'm still cooking.


  • Have you by chance checked out the kobold.cpp lite webUI? It allows some of what you're asking for: RAG for worldbuilding, adding images for the LLM to describe and fold into the story, easy editing of input and output, and lots of customization in settings. I have a public instance of the kobold webui set up on my website, and I'm cool with letting fellow hobbyists use my compute to experiment with things. If you're interested in trying it out to see if it's more what you're looking for, feel free to send me a PM and I'll send you the address and an API key/password.



  • Good to hear you figured it out with router settings. I'm also new to this but got all that figured out this week. As other commenters say, I went with a reverse proxy and configured it; I chose caddy over nginx for ease of install and config. I documented just about every step of the process. I'm a little scared to share my website on public forums just yet, but PM me and I'll send you a link if you want to see my infrastructure page where I share the steps and config files.




  • Wow, this is some awesome information, Brucethemoose. Thanks for sharing!

    I hope you don't mind if I ask some things. Tool calling is one of those things I'm really curious about. Sorry if this is too much; please don't feel pressured, you don't need to answer everything or anything at all. Thanks for being here.

    I feel like a lot of people, including myself, only vaguely understand tool calling, how it's supposed to work, and what simple practice exercises exist to try it out via scripts and APIs. What's a dead simple Python script someone could cook up to tool call within the OpenAI-compatible API?
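
    Just to show the kind of thing I'm imagining (this is my rough sketch, not a working recipe; the endpoint, model name, and get_time tool are all made-up placeholders following the OpenAI-style tools format):

    import requests
    
    API_URL = "http://localhost:5001/v1/chat/completions"  # placeholder endpoint
    
    # Advertise one tool to the model using the OpenAI-style "tools" schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_time",  # made-up example tool
            "description": "Get the current time.",
            "parameters": {"type": "object", "properties": {}},
        },
    }]
    
    payload = {
        "model": "local-model",  # placeholder name
        "messages": [{"role": "user", "content": "What time is it?"}],
        "tools": tools,
    }
    
    response = requests.post(API_URL, json=payload, timeout=60).json()
    message = response["choices"][0]["message"]
    
    # If the model chose to call the tool, the reply carries "tool_calls"
    # instead of plain text content.
    for call in message.get("tool_calls") or []:
        print("model wants:", call["function"]["name"], call["function"]["arguments"])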

    In your own words, what exactly is tool calling, and how does an absolute beginner tap into it? Could you clarify what you mean by 'tool calling being built into their tokenizers'?

    Would you mind sharing some sources where we can learn more? I'm sure huggingface has courses, but maybe you know some harder-to-find sources?

    Is tabbyAPI an engine similar to ollama, llama.cpp, etc.?

    What is exl2, exl3, etc.?








  • Haven't heard of this one before now. It will be interesting to see how it actually performs. I didn't see what license the models will be released under; I hope it's a more permissive one like Apache. Their marketing team should try cooking up a catchy name that's easy to remember. It seems they're a native western-language company, so I also hope it doesn't output too many random Chinese characters like qwen does sometimes.

    I've never really gotten into MoE models; people say you can get great performance gains with a clever partial offloading strategy between the various experts. Maybe one of these days!


  • If you're running into the issue of an app wanting an API key for your local ollama's OpenAI-compatible web API and refusing to work without one, I found that any random characters work. If you port forward your host computer, you should be able to access the webui on an external network using the public IP.
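
    For example, here's a rough sketch of what I mean (assuming ollama's default OpenAI-compatible endpoint on port 11434; swap in whatever model you have pulled):

    import requests
    
    # The key just has to be non-empty; its value doesn't matter.
    API_URL = "http://localhost:11434/v1/chat/completions"
    headers = {"Authorization": "Bearer any-random-characters"}
    
    payload = {
        "model": "llama3",  # whatever model you have pulled
        "messages": [{"role": "user", "content": "Say hello."}],
    }
    
    response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
    print(response.json()["choices"][0]["message"]["content"])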

    Here's the dead simple Python program I used to send and receive text from the kobold.cpp engine through the web API. Not sure how similar ollama is, but afaik an OpenAI-compatible API means it should all work close to the same for compatibility (I think? lol!) if you give it a shot. Make sure to set the .py file you make as executable and run it from a terminal with ./filename.py to see the output in real time. It writes the response to a text file in the same dir as the program too. Just use your host computer's local IP if the PC running the Python script is on the same network.

    spoiler
    #!/usr/bin/env python3
    import requests
    
    # Configuration
    API_URL = "http://10.0.0.xx:5001/api/v1/generate"
    PROMPT = "Tell me a short story about a robot learning to dance."
    OUTPUT_FILE = "output.txt"
    
    # Define the API request data
    data = {
        "prompt": PROMPT,
        "max_length": 200,      # Adjust response length
        "temperature": 0.7,     # Control randomness (0=deterministic, 1=creative)
        "top_p": 0.9,           # Focus on high-probability tokens
    }
    
    # Send the request to kobold.cpp (with a timeout so a hung server doesn't stall forever)
    response = requests.post(API_URL, json=data, timeout=120)
    
    if response.status_code == 200:
        # Extract the generated text
        result = response.json()
        generated_text = result["results"][0]["text"]
        
        # Print the reply and save it to a text file
        print(generated_text)
        with open(OUTPUT_FILE, "w") as f:
            f.write(generated_text)
        print(f"Response saved to {OUTPUT_FILE}!")
    else:
        print(f"Error: {response.status_code} - {response.text}")
    

  • VSCode + the roo plugin seems to be all the hotness for coders leveraging 'agentic teams', so I spent a bit of time playing around with it. Most local models don't do tool calling very well; I need to see if devstral works better without throwing errors. I hear real professionals use the claude API for that kind of stuff.

    I'm only vaguely familiar with getting computers to send, receive, and manipulate data with each other on a local network, so I got a very basic Python script going pointed at kobold.cpp's OpenAI-compatible API to send prompts and receive replies instead of using the default webui app, just to learn how it works under the hood.

    One of my next projects will be creating an extremely simple web-based UI for my ereader's basic web browser to connect to. Kobold has something similar with the /noscript subpage, but even that is too much for my kobo reader. I intend to somehow leverage a gemtext-to-HTML proxy like ducking or newswaffle to make the page rendering output dead simple; a rough sketch of the simple-UI half of the idea is below.
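
    Something like this stdlib-only page is what I'm picturing: one form that proxies to kobold's generate endpoint (the kobold address is a placeholder for wherever yours runs):

    import html
    import json
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse
    
    KOBOLD_URL = "http://10.0.0.xx:5001/api/v1/generate"  # placeholder address
    
    TOP = "<html><body><form action='/' method='get'><input name='q'><input type='submit' value='Ask'></form><pre>"
    BOTTOM = "</pre></body></html>"
    
    class EreaderUI(BaseHTTPRequestHandler):
        def do_GET(self):
            # Pull the prompt out of the ?q= query string, if any.
            prompt = parse_qs(urlparse(self.path).query).get("q", [""])[0]
            reply = ""
            if prompt:
                req = urllib.request.Request(
                    KOBOLD_URL,
                    data=json.dumps({"prompt": prompt, "max_length": 200}).encode(),
                    headers={"Content-Type": "application/json"},
                )
                with urllib.request.urlopen(req) as resp:
                    reply = json.load(resp)["results"][0]["text"]
            # Serve dead-simple HTML: one form, one <pre> with the reply.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write((TOP + html.escape(reply) + BOTTOM).encode())
    
    HTTPServer(("", 8080), EreaderUI).serve_forever()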

    One of these days I'm going to get a Pi Zero, attach it to a relay, and see if I can get a model to send a signal to turn a light on and off. Those home automation people with smart houses that integrate LLMs into things look so cool.




  • Thanks for sharing your nice project, ThreeJawedChuck!

    I feel like a little bit of prompt engineering would go a long way.

    To explain: a model's base personality tends to be aligned into the 'AI chat assistant' archetype. Models are encouraged to be positive yes-men, with the goal of assisting the user with their goals and pleasing them with pleasantries in the process.

    They do not need to be this way, though. Using a system prompt, you may directly instruct the model to alter its personality or tell it how to structure things. In this context, tell it something like:

    "You are a dungeon master with the primary goal of weaving an interesting and coherent story in the ‘dungeons and dragons’ universe. Your secondary goal is ensuring game rules are generally followed correctly.

    You are not a yes-man. You are dominant and in control of the situation. You may argue and challenge users as needed when negotiating game actions.

    Your players want a believable and grounded setting without falling into the tropes of main character syndrome or becoming Mary Sues. Make sure that their adventures remain grounded and the world their characters live in remains largely indifferent to their existence.”

    This eats into a little bit of context but should change things up a little.
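
    If your engine exposes an OpenAI-compatible API, the system prompt just goes in as the first message. A minimal sketch (the endpoint and model name are placeholders):

    import requests
    
    API_URL = "http://localhost:5001/v1/chat/completions"  # placeholder endpoint
    
    payload = {
        "model": "local-model",  # placeholder name
        "messages": [
            {"role": "system", "content": "You are a dungeon master..."},  # the prompt from above
            {"role": "user", "content": "I kick open the tavern door."},
        ],
    }
    
    reply = requests.post(API_URL, json=payload, timeout=120).json()
    print(reply["choices"][0]["message"]["content"])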

    You may make the model more creative and outlandish or more rigid and predictable by adjusting sampler settings.

    Consider finding a PDF or an epub of an old DND manual, converting it to text, and putting it into your engine's RAG system so it can directly reference DND rules.
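
    The conversion step can be as small as this, assuming the pypdf package (pip install pypdf) and a hypothetical dnd_manual.pdf:

    from pypdf import PdfReader
    
    # Dump every page's text into one plain-text file for the RAG system.
    reader = PdfReader("dnd_manual.pdf")  # placeholder filename
    with open("dnd_manual.txt", "w") as f:
        for page in reader.pages:
            f.write((page.extract_text() or "") + "\n")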

    Be wary of context limits. No matter what model makers tell you, 16-32k is a reasonable limit to expect when it comes to models keeping coherent track of things. A good idea is to keep the important information you don't want the model to forget in a text file and give it a refresher on the relevant context when it starts getting a little confused about who did what.

    Chain-of-thought reasoning models may also give an edge when it comes to thinking deeper about the story and how its interactions fit together. The downside is that they take some extra time and compute to think about things.

    I've never tried silly tavern, but I know it's meant for roleplaying with character cards. I always recommend kobold since it's what I know best, but there's more than one way to do things.


  • I'm not really into politics enough to say much about the accuracy of the labels. I've been on lemmy long enough to see many arguments and debates between political people about what being 'anarchist' or 'leftist' or 'socialist' really means, with heated five-paragraph essays going back and forth over which labels properly define which concepts, and so on.

    It seems political bias is one of those things nobody can really agree on, because it boils down to semantic/linguistic arguments over redefining ultimately arbitrarily drawn labels. The arguers also tend to be somewhat emotional about it, since it largely deals with subjective beliefs and with their own identity politics over which groups they want to be seen as part of, which can cause some level of mental gymnastics.

    It's a whole can of worms that is a nightmare to navigate semantically in the traditional sense, let alone to analyze mathematically by plotting data in matrices and extracting ranges of values. I can't imagine how much of a headache it would be to boil 'political bias' down to a 0-100 percentile number in a chart like UGI. Politically minded people can't really agree on what the terms mean, so the data science people trying to objectively analyze this stuff for LLM benchmarking get shafted when trying to come up with a concrete measurement system. The 12Axes test they use is kind of interesting to read through in itself.


  • From what I've seen in arguments about this, Plex is generally more accessible, with QoL features and an easier-to-understand interface for non-techie people to share with family/friends. Something that's hard for nerdy people to understand is that average people are perfectly fine paying for digital goods and services. An older, well-off normie has far more money than sense and will happily pay a premium just to not have to rub two braincells together during setup, or for a nicer quality of experience. If you figure out how to make a very useful plug-and-play service that works without an end user of average intelligence/domain knowledge stressing about how to set it up, maintain it, and navigate confusing layouts, you've created digital gold.

    This isn't the fault of open source services; you can only expect so much polish from non-profit volunteers. It's just the nature of consumer laziness, the expectation of professional product standards, and the path/product of least resistance.