This Guy Has Built an Open Source Search Engine as an Alternative to Google in His Spare Time

noodle (he/him)@lemm.ee · 1 year ago

darkphotonstudio@beehaw.org · 1 year ago

“Sign up for free access to this post”

No.

Chris@feddit.uk · 1 year ago

To save reading the paywalled article, the site is at https://stract.com

I’ve only done a single search but it gave me a summary at the top, and some discussion forums in a different format. I’m impressed so far!

luciole (he/him)@beehaw.org · 1 year ago

It’s a free account, like the one you made so you can write your comment. I’d hardly call it paywalled.

Chris@feddit.uk · 1 year ago

Tbh I just saw it needed a login and scrolled back up to the link without reading further, so was obviously a bit hasty in my assessment of it being a paywall.

Hawk@lemmy.dbzer0.com · 1 year ago

I did a few searches but had terrible results.

Searching for “Tokyo”, I got a summary about some Indonesian food chain. I had to scroll down quite a bit to get info about the city.

It looks interesting, but seems far from ready.

morethanevil@lemmy.fedifriends.social · 1 year ago

Hell no

Here is the source on github

Bloody Harry@feddit.de · 1 year ago

I don’t want any damn ~~vegetables~~ mail accounts.

Chris@feddit.uk · 1 year ago

I’ve just noticed there’s a Fediverse “optic”, so you can restrict it to Fediverse sites.

luciole (he/him)@beehaw.org · 1 year ago

For anyone wondering about how they’ll eventually address financial sustainability if Stract takes off:

Stract is currently not monetized in any way, but its website says it will eventually have contextual ads tied to specific search terms but that it will not track its users, which is similar to the system DuckDuckGo uses. Stract also plans on offering ad-free searches to paying subscribers.

I’d pay for independent, non meta, ad-free search. I bet a more straightforward approach is more energy efficient as well. In the meantime, big tech companies are running a gazillion processes on our data to suck every bit of wealth they can out of our existence through their free (in its littlest sense) products.

Corgana@startrek.website · 1 year ago

I’d pay for independent, non meta, ad-free search.

You might, but not enough people would to make it sustainable. Neeva was really well loved but couldn’t make the math work.

elvith@feddit.de · 1 year ago

I’d pay for independent, non meta, ad-free search.

Haven’t tested it yet, but have seen it mentioned several times here on Lemmy:

https://kagi.com/

luciole (he/him)@beehaw.org · edit-2 1 year ago

Kagi is a meta search engine though. They just do calls to Google, Yandex, Brave, etc. cut the ad rot and sprinkle some secret spice on top.

EDIT: source, https://help.kagi.com/kagi/search-details/search-sources.html

elvith@feddit.de · 1 year ago

Hmmmm, I didn’t know that; none of the comments I read mentioned it. I’m running my own SearXNG instance, and meta engines can be quite powerful, especially when you can adjust them a bit and filter out what you consider “spam” results (e.g. Pinterest).

Melmi@lemmy.blahaj.zone · edit-2 1 year ago

Interestingly, the source you linked says that they do have an in-house web index; they just use it alongside other sources rather than as their only source.

debanqued@beehaw.org · 1 year ago

#DuckDuckGo makes the same claim as well. IMO it’s a great marketing tactic to say “we have our own crawler” to imply to people that they will get some unique results, but I’m not convinced that supplemental crawlers are significant. They are all too happy to rely on the crutch of the search engines they source from.

lemmyreader@lemmy.ml · 1 year ago

Yes, I’ve seen Kagi mentioned quite often here on Lemmy.

Though Kagi seems to be Tor-unfriendly, maybe.

debanqued@beehaw.org · 1 year ago

Indeed. I cannot reach this link from Tor:

https://help.kagi.com/kagi/search-details/search-sources.html

Lionir [he/him]@beehaw.org · 1 year ago

I will say I’m pretty glad to see a search engine which actually is not just a meta search engine. I wish Kagi would attempt this rather than partnering with Brave.

One thing I find odd, though, is why these engines trying to build their own index don’t use the adversarial strategy that Brave Search has used: while using other indexes, collect what people actually click on and use it in your own index. I will note that I do not support Brave.

Steve@communick.news · 1 year ago

It’s not just a meta search. They do have their own index. And Brave is only one of a dozen-ish external indexes they also use.

ioslife@lemmy.ml · 1 year ago

Yes. Kagi doesn’t partner with Brave. They use Brave’s search index.

Admiral Patrick@dubvee.org · edit-2 1 year ago

I found the GitHub for it: https://github.com/StractOrg/stract/tree/main

What I still can’t figure out (in my very shallow dive into the repo) is if it’s a meta search engine like Searx-ng or if it does its own crawling and builds its own search index.

I run Searx-ng and love it, but I’d be interested in a true self-hosted search (though I’d need to devote a lot of resources to build and run such an index).

Anyone know?

Update: Looks like it crawls and maintains its own index. From the credits/thanks at the bottom of the readme (emphases mine):

The commoncrawl organization for crawling the web and making the dataset readily available. Even though we have our own crawler now, commoncrawl has been a huge help in the early stages of development.

dblsaiko@discuss.tchncs.de · 1 year ago

From the readme, it uses its own index:

Fully independent search index.

Also here’s a related discussion: https://github.com/StractOrg/stract/discussions/136

debanqued@beehaw.org · 1 year ago

#YaCy is an open source crawler that you can run and feed Searx with. I recall some searx instances that run their own YaCy. YaCy can also share indexes with other YaCy instances.

Lionir [he/him]@beehaw.org · 1 year ago

For everyone complaining about 404media needing an account for the posts, they explain their reasoning here : https://www.404media.co/why-404-media-needs-your-email-address/

Evkob (they/them)@lemmy.ca · 1 year ago

They’re fully within their rights to restrict access to their content, just as everyone complaining is fully within their rights to not give up their email to access content.

I realize independent media financing is a huge struggle right now, and the quality of journalism has been in a downwards spiral for decades now. Clearly, the current system is unsustainable, I agree with 404media on that much. I wholeheartedly disagree with restricting access to information as a solution, as that seems completely opposed to what journalism should aim to achieve.

Lionir [he/him]@beehaw.org · 1 year ago

For most of its history, journalism has been locked behind a paywall. I think it’s a bit disingenuous to claim that this principle is against the idea of journalism. Journalism, and especially good journalism, is expensive. Under a capitalist system, it’s entirely normal to ask for your work to be valued through monetary means.

That said, I’m most annoyed because no one is actually talking about Stract, just about how 404media decided to lock the article.

firewood010@lemmy.zip · 1 year ago

Just because it worked that way historically doesn’t mean it should continue that way. Also, neighbors and companies tended to share the same newspaper back then.

Writing was also a much rarer skill in the past.

ReallyZen@lemmy.ml · 1 year ago

Newspapers are available in public libraries.

Evkob (they/them)@lemmy.ca · 1 year ago

We don’t live in history anymore, we live in the present. Our relationship to information and journalism is not the same as it was in the past, for better and for worse.

In the past, a typical individual would have access to maybe a handful of news sources. You’d pay for the printing and delivery of a physical newspaper and that was going to be the extent of the journalism you were exposed to. I don’t think it’s realistic to think one should subscribe to every news source they’re likely to encounter online. I’d also counter that radio journalism was one of the main sources of information in the 20th century and had no such paywalls.

That said, I’m most annoyed because no one is actually talking about Stract, just about how 404media decided to lock the article

You know how that could have been avoided? If the link actually contained any useful information about Stract instead of being a sign-up page :P

millie@beehaw.org · 1 year ago

Yeah, that’s an automatic no for me on all of their articles. I hope they eventually see posts like this and realize they’re shooting themselves in the foot.

CyberCatBytes@kbin.social · 1 year ago

Why?

millie@beehaw.org · 1 year ago

This seems cool, and it’s nice to see people creating alternatives to google, but I probably won’t end up using it.

Over the past few months I’ve tried both DuckDuckGo and Kagi. Both are decent for a lot of things, and Kagi has some really nice features, but in practice they’ve just taught me that I actually want my search engine to know a bit about me.

If I’m looking for something in the area on a google search, I can literally just search the thing. Google already knows where I am and knows what context I’m probably looking for, so it gets me to important results faster. While that might not be particularly useful for areas where Kagi’s tools shine (like research), it turns out that a ton of my searches are just basic stuff like looking for store hours and phone numbers. In both cases I found myself getting frustrated with not having google as my default, requiring a bunch of extra typing or a manual switch of search engines.

I’d love to get a viable replacement for google, but realizing how much my searching benefits from their massive pile of data on me, I don’t know that I’ll actually find one without that. It is nice to have an alternative if results get too personalized or if I want to check against like a baseline search, but search is the one place I’ve tried to get away from google where I keep going back.

I definitely am glad I got away from them for email and document storage, though.

🐠 tiago🍍@beehaw.org · 1 year ago

It’s the predicament between choosing convenience or privacy. Apart from local businesses, what other searches have you found are improved by them having your data? For me, it’s money exchange rates.

(What alternatives do you use for email and storage, though?)

millie@beehaw.org · 1 year ago

Searches that require some context are often a lot easier to find. Like, if I’m searching for something D&D related, I rarely have to specify that that’s what I’m looking for. If it’s on wikidot, it’ll come up right away. Even for pretty generic words like ‘web’ or ‘death’, it knows I’m looking for the spell on the one hand and the cleric domain on the other, just because I’ve searched for so much D&D stuff and done so over and over again.

For mail I use Proton, for backup I use iDrive. I’m pretty happy with both.

aldalire@lemmy.dbzer0.com · 1 year ago

I hope they make it self-hostable.

lemmyreader@lemmy.ml · 1 year ago

The open source SearXNG is good enough for me so far. Any reason to switch to Stract?

Lionir [he/him]@beehaw.org · 1 year ago

Stract and SearXNG are two entirely different projects. SearXNG is just using other search engines to power itself - it’s known as a meta search engine. Stract has its own index that does not use other search engines to power itself.
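To make the distinction concrete, here is a rough Python sketch of the two approaches. It is a toy illustration only, not how Stract or SearXNG are actually implemented: the function names, the example URLs, and the stand-in "engines" are all hypothetical, and real engines add crawling, ranking, and network calls on top.

```python
from collections import defaultdict


def build_index(pages):
    """Own-index approach: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search_index(index, query):
    """Return the URLs that contain every word of the query, sorted alphabetically."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


def meta_search(engines, query):
    """Meta-search approach: ask each upstream engine, then merge the results,
    dropping duplicates while keeping first-seen order."""
    merged, seen = [], set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical data: two "crawled" pages and two stand-in upstream engines.
pages = {
    "a.example": "Tokyo city guide",
    "b.example": "Tokyo food chain",
}
index = build_index(pages)
engines = [lambda q: ["x.example", "y.example"], lambda q: ["y.example", "z.example"]]

print(search_index(index, "tokyo city"))
print(meta_search(engines, "tokyo"))
```

The point of the sketch: the first approach can only answer from pages it has crawled itself, while the second owns no data and depends entirely on what the upstream engines return.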

smileyhead@discuss.tchncs.de · 1 year ago

If Stract is any good, it would be a nice combo to make it work in SearXNG.

lemmyreader@lemmy.ml · 1 year ago

Exactly.

Fisch@lemmy.ml · 1 year ago

Another interesting open source search engine is Mwmbl

lemmyreader@lemmy.ml · edit-2 1 year ago

Thanks for sharing.

AVincentInSpace@pawb.social · 1 year ago

OMG YES! I was really bummed after Gigablast died. Here’s hoping it’ll be more useful than the open source search engines I’ve tried in the past.

madkarlsson@beehaw.org · 1 year ago

I find the number of engineers who have built an “alternative to google in their spare time” truly fascinating. Because if you think that is possible, IMO, you have no idea what Google actually built.

It’s not just a search engine, and also, as a search engine, it stopped being a good model to follow a decade ago.

Build on the ideas and build something new instead.

lemmyreader@lemmy.ml · 1 year ago

It’s catching on: top post on #explore on one Mastodon instance: https://hachyderm.io/@molly0xfff/111907956787854308