2011-04-26

Finding the hidden

"if it is not on google it does not exist"
-- Some marketing guy at a company in Sweden who got booted from google.

A proposal for the W3C


I have been looking at different search engines and found that the king of kings is still Google. But there is a problem with Google: no one knows how it works, and it has total control over the web, because "if you cannot be found on Google, you are dead".

This seems OK for most of us, but some see it as a big problem. So do I.

The proposal


I propose that the W3C put together a working group for "the federated search engine".

Why, you ask?

Because it is more ethical to have a distributed web than a static one, and the web we have is so static that it hurts to think about it.

There are already many meta search engines and open p2p engines, but they falter in that they either a) rely on the same old engines that are the "kings" of the web, or b) are unmoderated and prone to manipulation.

What I propose is something radically different: a federated search engine, consisting of web crawler nodes that run on the engines; the nodes can then subscribe to other engines to get updates or query them for answers.
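To make this concrete, here is a minimal sketch of what node-to-node querying and subscribing could look like, assuming a hypothetical HTTP+JSON interface; the endpoint names and fields are invented for illustration and do not come from any existing specification.

```python
import json
from urllib import request, parse

def query_node(node_url, terms):
    """Ask a single federated node for results matching the given terms.

    Assumes a hypothetical /search endpoint returning JSON such as
    {"results": [{"url": ..., "title": ..., "score": ...}]}.
    """
    url = f"{node_url}/search?{parse.urlencode({'q': ' '.join(terms)})}"
    with request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

def subscribe_to_node(node_url, callback_url):
    """Register interest in another node's index updates (illustrative only).

    A real protocol would need authentication and a defined update format.
    """
    payload = json.dumps({"callback": callback_url}).encode()
    req = request.Request(f"{node_url}/subscribe", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=5) as resp:
        return resp.status == 200
```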

The nice thing is that you could build this with simple technology that is already in use, and the engines can blacklist and whitelist results according to patterns or methods.

Each engine could have different methods for crawling and prioritising results, and different nodes can do different things: some only index universities, others only forums, and so on.

And if a site wants to be indexed, it can send an API request to the nodes to announce that it exists. If a node decides that the request is spam or malicious, the site can be blacklisted, but only on that node.
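As a sketch of that announcement step, a site could ping each node it knows about; the /announce endpoint below is an assumption made for illustration, and whether a node indexes or blacklists the site stays entirely up to that node.

```python
import json
from urllib import request

def announce_site(node_urls, site_url):
    """Tell each known node that site_url exists and would like to be crawled.

    The /announce endpoint is hypothetical; every node decides on its own
    whether to accept, ignore, or blacklist the request.
    """
    accepted = []
    for node in node_urls:
        payload = json.dumps({"site": site_url}).encode()
        req = request.Request(f"{node}/announce", data=payload,
                              headers={"Content-Type": "application/json"})
        try:
            with request.urlopen(req, timeout=5) as resp:
                if resp.status == 200:
                    accepted.append(node)
        except OSError:
            pass  # node unreachable or refusing; only that node is affected
    return accepted
```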

Full node search


Now then, how do we do queries without causing mayhem on the web? Answer: full node search.

The full node search will query all the (known) nodes for an update, then run a regex over the data and build a response to the query.

The full node search can (if wanted) send queries to the different nodes so that they pull the latest priority list, count, black/white list, and metadata from the node that requested the full node search update.
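A rough sketch of how the full node search could fan a query out to every known node, merge the answers, and apply the regex step, assuming each node answers with a JSON result list as in the earlier sketch; the endpoint, field names, and scoring are assumptions for illustration.

```python
import json
import re
from urllib import request, parse

def full_node_search(known_nodes, query, pattern=None):
    """Query every known node, optionally filter the merged results with a
    regex, and return them sorted by the nodes' own scores (all assumed)."""
    merged = []
    for node in known_nodes:
        url = f"{node}/search?{parse.urlencode({'q': query})}"
        try:
            with request.urlopen(url, timeout=5) as resp:
                merged.extend(json.load(resp).get("results", []))
        except OSError:
            continue  # skip unreachable nodes instead of failing the whole search
    if pattern:
        rx = re.compile(pattern)
        merged = [r for r in merged
                  if rx.search(r.get("url", "") + " " + r.get("title", ""))]
    return sorted(merged, key=lambda r: r.get("score", 0), reverse=True)
```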

The whitelisting and blacklisting of sites can be done by the users, through flagging and promoting.

The process of flagging should be simple and not too complicated: just hit the [flag] button next to the link and you get asked "why?", with a drop-down list of answers ("offensive" should never be on such lists), then you are asked for your e-mail (this should not be mandatory), and the flag request is off.
To prevent flagging from being exploited, the node should track the flaggings, and if there is a spike of flaggings of one site, flagging of that site is barred and the SysOps are notified to look into the matter.
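A minimal sketch of that spike check, assuming a node keeps recent flag timestamps in memory; the window and threshold values are arbitrary numbers picked for illustration, and notify_sysops stands in for whatever alerting the node uses.

```python
import time
from collections import defaultdict, deque

FLAG_WINDOW = 3600     # look-back window in seconds (assumed value)
FLAG_THRESHOLD = 50    # flags within the window that count as a spike (assumed)

flags = defaultdict(deque)   # site -> timestamps of recent flags
barred = set()               # sites where flagging is temporarily barred

def flag_site(site, notify_sysops):
    """Record a flag for site; bar further flags and alert the SysOps on a spike."""
    if site in barred:
        return False                       # barred until a SysOp has looked into it
    now = time.time()
    q = flags[site]
    q.append(now)
    while q and now - q[0] > FLAG_WINDOW:  # drop flags that fell out of the window
        q.popleft()
    if len(q) >= FLAG_THRESHOLD:
        barred.add(site)
        notify_sysops(f"flag spike on {site}: {len(q)} flags in the last hour")
        return False
    return True
```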

The same goes for whitelisting, or promoting, but here you have to log in with OpenID or similar to show that you are a real person; then you are asked for your e-mail, and an e-mail is sent to confirm the request.
The same spike logic applies here as in the example of flagging.
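For the confirmation e-mail, a node could issue a one-time token and only count the promotion when it comes back; this is a standard-library sketch, with the token store, the confirm URL, and the mail sending left as assumptions.

```python
import secrets

pending = {}   # token -> (site, email) awaiting e-mail confirmation

def request_promotion(site, email, send_mail):
    """Issue a one-time token and mail a confirmation link (sketch)."""
    token = secrets.token_urlsafe(16)
    pending[token] = (site, email)
    send_mail(email, f"Confirm your promotion of {site}: https://node.example/confirm/{token}")
    return token

def confirm_promotion(token, promote):
    """Count the promotion only if the token matches a pending request."""
    entry = pending.pop(token, None)
    if entry is None:
        return False
    site, _email = entry
    promote(site)   # the same spike logic as for flagging would apply in promote()
    return True
```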

I LOVE GOOGLE!

just saying.

They promote FLOSS and played a big role in the fight against OOXML in the ISO. Even if it failed, it was a good effort that we are all grateful for.

The SOC projects are a great initiative and really push FLOSS to become more friendly to the devs.

Thank you, Google.
