• 18 Posts
  • 312 Comments
Joined 2 years ago
Cake day: July 1st, 2023


  • I have been using DeepHermes daily. I think CoT reasoning is so awesome and such a game changer! It really helps the model give better answers, especially for hard logical problems. But I don’t want it all the time, especially on an already slow model, so being able to turn it on and off without switching models is awesome. The Mistral 24B DeepHermes is relatively uncensored, powerful, and not painfully slow on my hardware. A high quant of the Llama 3.1 8B DeepHermes is able to fit entirely in my 8GB of VRAM.




  • It’s all about RAM and VRAM. You can buy some cheap RAM sticks, get your system to something like 128GB of RAM, and run a low quant of the full DeepSeek. It won’t be fast, but it will work. Now if you want fast, you need to be able to get the model into graphics card VRAM, ideally all of it. That’s where the high-end Nvidia stuff comes in: getting 24GB of VRAM all on the same card at maximum bandwidth. Some people prefer Macs or data center cards. You can use AMD cards too; they’re just not as well supported.

    Localllama users tend to use smaller models than the full DeepSeek R1 that fit on older cards. A 32B partially offloaded between an older graphics card and RAM sticks is around the limit of what a non-dedicated hobbyist can achieve with their already existing home hardware. Most are really happy with the performance of Mistral Small, Qwen QwQ, and the DeepSeek distills; those that want more have the money to burn on multiple Nvidia GPUs and a server rack.
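    Partial offloading is just telling the runtime how many layers go to the GPU and letting the rest run from system RAM. A minimal sketch with the llama-cpp-python bindings; the model file and layer count here are placeholders you’d tune to your own card:

    ```python
    # Minimal sketch: partial GPU offload with llama-cpp-python.
    # The model path and n_gpu_layers are placeholders -- raise the
    # layer count until your card's VRAM is as full as it can get.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./qwen-qwq-32b-Q4_K_M.gguf",  # any GGUF quant you have
        n_gpu_layers=30,  # layers offloaded to VRAM; the rest run from RAM
        n_ctx=4096,       # context window; bigger costs more memory
    )

    out = llm("Explain partial GPU offloading in one paragraph.", max_tokens=200)
    print(out["choices"][0]["text"])
    ```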

    LLM-wise, your phone can run 1-4B models, your laptop 4-8B, and your older gaming desktop with a 4-8GB VRAM card around 8-32B. Beyond that needs the big expensive 24GB cards, and further beyond needs multiples of them.
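    For a rough sense of why those brackets fall where they do: weights take about params × bits-per-weight / 8 bytes, plus some runtime overhead. A back-of-the-envelope sketch (the 20% headroom figure is my own assumption, not a measurement):

    ```python
    # Back-of-the-envelope VRAM estimate for a quantized model.
    # Real usage varies with context length and runtime overhead;
    # the 1.2 fudge factor is an assumption for KV cache etc.
    def approx_gb(params_billions: float, bits_per_weight: float) -> float:
        weight_bytes = params_billions * 1e9 * bits_per_weight / 8
        return weight_bytes * 1.2 / 1e9

    for name, params, bits in [("8B @ ~Q4", 8, 4.5), ("32B @ ~Q4", 32, 4.5)]:
        print(f"{name}: ~{approx_gb(params, bits):.1f} GB")
    # 8B @ ~Q4: ~5.4 GB -> squeezes onto an 8GB card
    # 32B @ ~Q4: ~21.6 GB -> wants a 24GB card (or partial offload)
    ```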

    Stable Diffusion models in my experience are very compute intensive. Quantization degradation is much more apparent, so you should have VRAM, a high-quant model, and a canvas size kept as low as tolerable.
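    For example, with the Hugging Face diffusers library you’d keep the weights in fp16 and the canvas at the model’s native resolution. A minimal sketch (the model id is just a common example, not a recommendation):

    ```python
    # Sketch: Stable Diffusion via diffusers, keeping VRAM use down
    # with fp16 weights and a modest canvas size.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # example model id
        torch_dtype=torch.float16,         # half precision to save VRAM
    )
    pipe = pipe.to("cuda")

    # SD 1.5 was trained at 512x512; pushing the canvas larger eats VRAM fast.
    image = pipe("a lighthouse at dusk", height=512, width=512).images[0]
    image.save("lighthouse.png")
    ```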

    Hopefully we will get cheaper devices meant for AI hosting, like cheaper versions of Strix Halo and DIGITS.


  • If you are asking questions, try out the DeepHermes finetune of Llama 3.1 8B and turn on CoT reasoning with the special system prompt:


    You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

    It really helps the smaller models come up with nicer answers, but takes them a little more time to bake an answer with the thinking part. It’s unreal how far models have come in a year thanks to leveraging reasoning in context space.
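    If you serve it through Ollama, toggling reasoning is just including or omitting that system prompt per request. A minimal sketch with the ollama Python client; the model name is a placeholder for whatever you tagged your DeepHermes import as:

    ```python
    # Sketch: toggling DeepHermes CoT through the ollama Python client.
    # "deephermes-8b" is a placeholder model name.
    import ollama

    # Paste the full DeepHermes system prompt from above here.
    COT_PROMPT = "You are a deep thinking AI, ..."

    def ask(question: str, reasoning: bool = False) -> str:
        messages = []
        if reasoning:
            messages.append({"role": "system", "content": COT_PROMPT})
        messages.append({"role": "user", "content": question})
        resp = ollama.chat(model="deephermes-8b", messages=messages)
        return resp["message"]["content"]

    print(ask("Why is the sky blue?"))                  # quick answer
    print(ask("Why is the sky blue?", reasoning=True))  # <think>...</think> first
    ```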







  • The most useful things my LLM has done are helping me with hobbyist computer coding projects and answering advanced STEM questions. I use my LLM to parse code I’m unfamiliar with and to understand how the functions translate to actual things happening. I give it an example of functioning code and ask it to adapt the logic a certain way to see how it goes about it. I have to parse a large, very old legacy codebase written in many parts by different people of different skill levels, so just being able to understand what block does what is a big win some days. Even if its solutions aren’t copy/paste ready, I usually learn quite a lot just seeing what insights it can glean from the problem. Actually, I prefer when I have to clean it up, because it feels like I still did something to refine and sculpt the logic in a way the LLM can’t.

    I don’t want to be a stereotypical ‘vibe coder’ who copies and pastes without being able to bug fix or understand the code they’re putting in. So I ask plenty of questions and read through its reasoning for thousands of words to understand the thought processes that lead to functioning changes. I try my best to understand the code and clean it up. It is nice to have a second brain help with initial boilerplating and piecing together the general flow of logic.

    I treat it like a teacher and an editor. But it’s got limits like any tool and needs a sweet spot of context, example, and circumstance for it to work out okay.



  • Hi! So here’s the rundown.

    You are going to need to be willing to learn how computer program services send text messages to each other over open ports, how to call an API in a programming script, and slowly piece together how to work with Ollama’s external API and its tool-calling functions. Here’s the documentation.

    Essentially you need to

    1. Learn how the Ollama external API works: how to send it text data from a basic Python program on an open port and receive data back to put into a text file.

    2. Learn how to make that Python program pull weather and time data from OpenWeather.

    3. Learn how to feed that weather and time data into Ollama as part of a tool-calling function. A tool call is a fancy system prompt that tells the model how to interface with the data in a well-defined, parameterized way. You say a keyword like ‘get weather’, it sends a request to your Python program to get data from OpenWeather, and the result comes back in a way the LLM is instructed to process. (There’s a sketch of the whole round trip at the end of this comment.)

    example: https://medium.com/@ismalinggazein/openai-function-calling-integrating-external-weather-api-6935e5d701d3

    Unless you are already a programmer who works with sending and receiving data over the internet to be processed, this is a non-trivial task that requires a lot of experimentation and getting your hands dirty with ports and coding languages. I’m currently getting ready to delve into this myself, so I know it can all feel overwhelming. Hope this helps.
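    To make steps 1-3 concrete, here’s a minimal sketch of the round trip using Python’s requests against Ollama’s /api/chat endpoint and OpenWeather’s current-weather endpoint. The API key and model name are placeholders, and this is a sketch of the flow rather than production code:

    ```python
    # Sketch of steps 1-3: pull weather from OpenWeather, then hand it to
    # a model via Ollama's /api/chat endpoint as a tool-call round trip.
    # OPENWEATHER_KEY and the model name are placeholders.
    import requests

    OPENWEATHER_KEY = "your-api-key-here"

    def get_weather(city: str) -> str:
        r = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": city, "appid": OPENWEATHER_KEY, "units": "metric"},
        )
        d = r.json()
        return f"{d['weather'][0]['description']}, {d['main']['temp']}C"

    # The tool schema tells the model what it can ask your program for.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3.1", "messages": messages, "tools": tools, "stream": False,
    }).json()

    # If the model decided to call the tool, run it and send the result back.
    for call in resp["message"].get("tool_calls", []):
        city = call["function"]["arguments"]["city"]
        messages.append(resp["message"])
        messages.append({"role": "tool", "content": get_weather(city)})
        final = requests.post("http://localhost:11434/api/chat", json={
            "model": "llama3.1", "messages": messages, "stream": False,
        }).json()
        print(final["message"]["content"])
    ```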










  • They are similar and use some of the same underlying technology powered by the Readability library, but Newswaffle gives more options for how to render the article (article mode, link mode, raw mode), it isolates images and gives them their own external URL link you can click on, and it tells you exactly how much cruft it stripped from the original webpage (something about seeing ‘99.x% lighter’ makes my brain tingle good chemicals). It works well with article indexes. You can bookmark a Newswaffle page to get reader view by default instead of clicking a button in the Firefox toolbar. Hope these examples help.