nonoiseplease is an experiment to try and "enhance" my web search results.
It lets you "index" pages from the web and makes them searchable through a full-text search engine.
Eventually, with an extension, it can show indexed pages directly inside Google results.
Every day I come across corners of the web that are particularly interesting, and I've always tried to save them for later use... in my bookmarks. However, the "later use" never actually came, and I kept thinking, "Damn, there must be some hidden gems buried deep in my browser bookmarks."
So, I decided to create a tool that could scrape and index my own web findings. Then, I realized that this tool could be useful for others as well. After all, I can't be the only one who still uses bookmarks to save everything in an unsorted manner.
Also, Google is sometimes shit and SEO is a bit annoying, which makes retrieving some niche page I found while surfing very difficult.
Also, AI will generate a ton of noise on the internet, and I want to try to increase the "signal" / "shit" ratio of my searches.
(Please, could you read this on the website? I was repeating/fragmenting too much information here, sorry.)
There are 3 main components:
- backend
- Meilisearch
- extension
Let's split this into the actual backend (PocketBase) and the frontend.
God bless PocketBase: the right spot between simple, complex, and extensible... Also, coming from a bit of Python, I really needed types.
- it is still a giant mess (I can't write idiomatic Go)
- I have a set of controllers with their interfaces
- every API call is handled by WebController (NoNoiseInterface) and listed in be.go
- WebController handles everything that is not covered by the API PocketBase exposes
- mainly scraping, search, page categorization, and communication with the extension
- pb_public and extension_template contain the UI
- serve_public handles templating
- luckily, PocketBase has a giant set of hooks that I use, for example, to delete every page a user scraped when their record is deleted (see the sketch below)
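A minimal sketch of what such a cleanup hook can look like, assuming the pre-v0.23 PocketBase Go hook API; the "pages" collection and its "owner" field are made-up names, not the actual schema:

```go
package main

import (
	"log"

	"github.com/pocketbase/dbx"
	"github.com/pocketbase/pocketbase"
	"github.com/pocketbase/pocketbase/core"
)

func main() {
	app := pocketbase.New()

	// When a user record is deleted, remove every page that user scraped.
	app.OnRecordAfterDeleteRequest("users").Add(func(e *core.RecordDeleteEvent) error {
		pages, err := app.Dao().FindRecordsByExpr("pages",
			dbx.HashExp{"owner": e.Record.Id})
		if err != nil {
			return err
		}
		for _, p := range pages {
			if err := app.Dao().DeleteRecord(p); err != nil {
				return err
			}
		}
		return nil
	})

	if err := app.Start(); err != nil {
		log.Fatal(err)
	}
}
```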
I have no idea how to do frontend; it's terrible.
- there are a bunch of pages just for "info" (/why, /index, /alternatives)
- pages with more user interaction are a mix of Go templating and the Alpine framework
- the templated pages are me/account.html and register.html; they are listed in backend/serve_public/template_renderer.go (see the sketch after this list)
- I like templating because it can always be useful; the HTML sent to the extension is also templated (backend/extention_template/search.html)
- most of it is done with Alpine, because the backend exposes some APIs
- authenticated with JWT
- the pages enable all the backend functions:
- user CRUD
- scrape a webpage
- search/categorize scraped web pages
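For the templated side, a rough sketch of what a renderer like backend/serve_public/template_renderer.go could do; the file paths and the data passed in are guesses for illustration, not the real code:

```go
package main

import (
	"html/template"
	"os"
)

func main() {
	// Parse the pages mentioned above; the directory layout is assumed.
	pages := template.Must(template.ParseFiles(
		"pb_public/me/account.html",
		"pb_public/register.html",
	))

	// Placeholder data; the real pages are filled from the user's record.
	data := map[string]any{"Email": "user@example.com"}

	// In the backend this would write to the HTTP response instead of stdout.
	if err := pages.ExecuteTemplate(os.Stdout, "account.html", data); err != nil {
		panic(err)
	}
}
```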
Meilisearch is the engine that provides full-text search.
- it communicates only with the backend
- when a user registers, they get their own index
- the index is kept in sync at every operation
- every index stores the docs as full text, a list of categories, and an id (to link back to the backend); see the sketch below
- it's the only component that stores the scraped full text
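A rough sketch of that per-user index idea using the meilisearch-go client (older NewClient-style API); the index naming scheme and document fields are assumptions:

```go
package main

import (
	"fmt"
	"log"

	"github.com/meilisearch/meilisearch-go"
)

// Document mirrors what each index stores: full text, categories,
// and the id that links back to the backend record.
type Document struct {
	ID         string   `json:"id"`
	FullText   string   `json:"full_text"`
	Categories []string `json:"categories"`
}

func main() {
	client := meilisearch.NewClient(meilisearch.ClientConfig{
		Host:   "http://127.0.0.1:7700",
		APIKey: "masterKey",
	})

	// One index per registered user (the naming scheme is illustrative).
	userIndex := client.Index(fmt.Sprintf("user_%s", "abc123"))

	// Keep the index in sync: push the scraped page on every operation.
	task, err := userIndex.AddDocuments([]Document{{
		ID:         "page_42",
		FullText:   "the scraped plain text of the page...",
		Categories: []string{"go", "search"},
	}})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("queued task:", task.TaskUID)
}
```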
The extension is actually optional; it's just there to avoid scraping from the backend IP and to serve results inside Google. Not much to explain.
- it injects HTML (dangerous) into the Google search page, with results from a search on the backend
- when you search on Google, the extension recognizes it and runs the same search on the backend, which responds with HTML (see the sketch after this list)
- it allows you to upload your current page to the backend
- simply by reading the source
- eventually:
- record the pages you visit locally (in the background script) and send them to the backend if you notice you found something interesting
- upload your bookmarks; they get buffered and patiently (slowly) scraped
- save cookies to remember your options
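On the backend side of that flow, here is a sketch of a handler that takes the forwarded query and answers with HTML rendered from extention_template/search.html, ready to be injected; the route, the skipped auth check, and the fake search results are placeholders:

```go
package main

import (
	"html/template"
	"net/http"
)

var searchTpl = template.Must(template.ParseFiles("extention_template/search.html"))

type Result struct {
	Title string
	URL   string
}

func extensionSearch(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query().Get("q")

	// Placeholder: the real backend validates the JWT and queries the
	// user's Meilisearch index here.
	results := []Result{{Title: "example hit for " + q, URL: "https://example.com"}}

	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	if err := searchTpl.ExecuteTemplate(w, "search.html", results); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func main() {
	http.HandleFunc("/api/ext/search", extensionSearch)
	http.ListenAndServe(":8090", nil)
}
```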
I have:
- some Playwright (and other Python) to test the website; currently I don't test the extension
- some data I test and develop on
- some shell scripts to build, deploy, and roll back almost automatically
But these are definitely not ready to be shared publicly.
Right now, the main limitation is on the number of pages that can be scraped. I have set a limit of 5 pages per month.
This is because I don't want to get my IP blacklisted. As a stopgap, I am currently routing the requests through a proxy, but I am still working on a better solution to this problem. Suggestions are welcome.
Also, what I call "scraping" is a simple GET request to the page (and thus just its text), so if you need to log in to see the content, you can't scrape it (but you can use the extension).
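For reference, the "scraping" just described is roughly the following (a simplified sketch: plain GET, no JS rendering, no login, visible text only):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"

	"golang.org/x/net/html"
)

// scrape does a plain GET and collects the visible text nodes.
// Pages behind a login come back empty or wrong, which is exactly
// the limitation described above.
func scrape(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		return "", err
	}

	var sb strings.Builder
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		// Skip script and style contents; they are not readable text.
		if n.Type == html.ElementNode && (n.Data == "script" || n.Data == "style") {
			return
		}
		if n.Type == html.TextNode {
			if t := strings.TrimSpace(n.Data); t != "" {
				sb.WriteString(t + " ")
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
	return sb.String(), nil
}

func main() {
	text, err := scrape("https://example.com")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}
```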
Well, of course, if I want to index every uploaded web page, I think there is no alternative but to store the clear/plain text (note that this also applies if you index personal data through the extension). If there is a way to protect the indexed text as well, please let me know. Maybe indexing could be optional... dunno.
I ask only for an email and save only 2 cookies (that I know of): a JWT and an id.
There are no analytics.
If you delete your account, it is actually deleted.
The extension is 1.6 kB of JS; you can look it up (just rename the file from .xpi to .zip).
nonoiseplease is still missing some features, I know, but I'm working on it.
- browser extension
- fix the terrible UX and auth
- improve the "scraper"
- to get the transcript of a YouTube video
- find a way to reliably read PDFs
- consider adding something like Puppeteer/Playwright
- make a clear rest API
- you can already try to reverse engineer the JS (it should not be too difficult)
- (in the far future) give you the option to make your personal index searchable by others
- because maybe you are the top expert in a very small niche and you found resources that others may find interesting too
- with a clear distinction of what you are sharing / where your searches are going
- maybe also add a way to "group" pages (like a tag, but with a description)
- I've got to add a local client
- maybe gui (https://mattn.github.io/go-gtk/), maybe not
- in a far away future: write my own browser
- it might be an oversimplified perspective, but I love to have related things close together, in a sort of hierarchical way:
- the internet, with new info, in the cloud
- bookmarks, knowledge I'm interested in, in a browser (kind of like a bridge)
- books and notes, as consolidated knowledge, offline