Monday, August 06, 2007

What's in Google?

I remember when search engines like AltaVista were rocking a few years back. But slowly I saw Google gain its popularity. Google is not merely moving towards “owning” the Internet; it has also created a crazy dependency. But I truly feel that dependency is worth it. I have wondered many times about various aspects of Google, and the little professor in me has raised many technical questions.

Presently I am developing a website that works more or less like a search engine, but for an altogether different purpose. I have run into many practical difficulties while developing it. For example, each time I look up a huge repository, the Remote Method Invocation (RMI) interface I use is pretty costly for every single hit. I wonder how efficiently the Google team must have worked to make search so fast and return results in a fraction of a second. I was once discussing this with my coworkers and came across Google's page on "PigeonRank", the supposed algorithm behind that admirable efficiency (it turned out to be Google's April Fools' joke for 2002).

http://www.google.com/technology/pigeonrank.html
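
The usual answer to "how can it be so fast" is that a search engine does not look up the whole repository on every hit. It builds an inverted index ahead of time (each term mapped to the documents containing it), so answering a query means intersecting a few precomputed lists. Below is a minimal single-machine sketch in Java, assuming simple AND queries; `InvertedIndex`, `add` and `search` are hypothetical names, and Google's real index is distributed and far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> sorted set of ids of documents containing it.
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Splits text into lower-cased terms on any run of non-letter, non-digit characters.
    private static String[] terms(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Done once, at indexing time: record docId under every term it contains.
    void add(int docId, String text) {
        for (String term : terms(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Done per query: intersect the posting sets of all query terms (AND semantics).
    List<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : terms(query)) {
            if (term.isEmpty()) continue;
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) return List.of();
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "Google search engine");
        idx.add(2, "AltaVista search engine");
        idx.add(3, "Google Maps");
        System.out.println(idx.search("google search")); // [1]
        System.out.println(idx.search("engine"));        // [1, 2]
    }
}
```

The expensive work (tokenizing every document) happens once, at indexing time. In a setup like mine, that could mean building such an index once instead of paying for a remote RMI lookup on every hit.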

I am not quite convinced by the pagination aspect of it. As far as I have learned, pagination means dividing the entire content into several buckets, where each bucket holds a defined number of items. For example, if I have 15 items, I would paginate them as 5 items per page, giving 3 pages.
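
The bucket idea is just integer arithmetic. A minimal Java sketch that reproduces the 15-items, 5-per-page example; the names `Paginator`, `pageCount` and `page` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers: split a result list into fixed-size pages ("buckets").
class Paginator {
    // Pages needed for totalItems at pageSize items per page (integer ceiling division).
    static int pageCount(int totalItems, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        return (totalItems + pageSize - 1) / pageSize;
    }

    // Items on the 1-based page number `page`, or an empty list if that page does not exist.
    static <T> List<T> page(List<T> items, int page, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        if (page < 1) return List.of();
        int from = (page - 1) * pageSize;
        if (from >= items.size()) return List.of();
        int to = Math.min(from + pageSize, items.size());
        return items.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> contents = new ArrayList<>();
        for (int i = 1; i <= 15; i++) contents.add(i);
        System.out.println(pageCount(contents.size(), 5)); // 3
        System.out.println(page(contents, 2, 5));          // [6, 7, 8, 9, 10]
    }
}
```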

Observation from Google:

Imagine I get search results shown as "1-10 of 150,000" in 0.03 seconds. Any layman would assume that he/she could browse all 150,000 links. But when I once tried to traverse them, I found that pagination stops after 50 pages. Even assuming each page displays around 50 results, a maximum of only 2,500 records is shown to the user for a given search (at the default of 10 results per page implied by "1-10", it would be only 500). I don't question why it stops there. I assume it is not very user-friendly to list every last result, and as a matter of fact the user would not have the patience to look through to the last one. Psychologically, after a few page navigations, one eventually starts using a better combination of keywords to narrow down the search.
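
The observation boils down to a cap on the number of pages. The sketch below is my own guess at the arithmetic, not Google's documented behaviour; `CappedPagination` and its parameters are hypothetical, and the numbers are the ones from this post.

```java
// Hypothetical sketch of a search UI that caps how many pages it will serve.
class CappedPagination {
    // Results the user can actually reach: the smaller of the total and maxPages * pageSize.
    static long reachableResults(long totalResults, int pageSize, int maxPages) {
        return Math.min(totalResults, (long) maxPages * pageSize);
    }

    // Pages offered in the navigation bar: ceiling of reachable results / pageSize.
    static long pagesShown(long totalResults, int pageSize, int maxPages) {
        long reachable = reachableResults(totalResults, pageSize, maxPages);
        return (reachable + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        // 150,000 hits, pagination stopping at 50 pages.
        System.out.println(reachableResults(150_000, 50, 50)); // 2500
        System.out.println(reachableResults(150_000, 10, 50)); // 500
        System.out.println(pagesShown(120, 10, 50));           // 12
    }
}
```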

But the little professor in me raised these questions.

Does Google show the most frequently visited links first? If so, does it keep metadata for each link, such as its number of hits, on which it bases the priority? I ask because for a given keyword search, no matter how many times I run it, it returns the results in the same sequence.
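
One well-documented ingredient, as far as I know, is not a hit count at all: the PageRank idea published by Google's founders ranks a page by the links pointing to it, weighted by the rank of the linking pages. The production ranking combines many other signals that are not public, so treat this only as an illustration. Because such a score depends on the link graph rather than on how often a query is run, the same query would keep returning the same order until the index changes, which fits what I observed. A simplified power-iteration sketch in Java (`PageRankSketch` is a hypothetical name; 0.85 is the damping factor suggested in the original PageRank paper):

```java
import java.util.Arrays;

// Simplified PageRank by power iteration (a sketch, not Google's production ranking).
// outLinks[p] lists the pages that page p links to; the scores always sum to 1.
class PageRankSketch {
    static double[] rank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // score held by pages with no outgoing links
            for (int p = 0; p < n; p++) {
                if (outLinks[p].length == 0) {
                    dangling += rank[p];
                } else {
                    double share = rank[p] / outLinks[p].length; // split evenly over out-links
                    for (int q : outLinks[p]) next[q] += share;
                }
            }
            for (int p = 0; p < n; p++) {
                // "random surfer": with probability (1 - damping) jump to a random page;
                // dangling pages are treated as linking to every page
                next[p] = (1 - damping) / n + damping * (next[p] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // page 0 -> 1, page 1 -> 0, page 2 -> 1: page 1 has two inbound links
        double[] r = rank(new int[][] {{1}, {0}, {1}}, 0.85, 100);
        System.out.println(Arrays.toString(r)); // page 1 ranks highest, page 2 lowest
    }
}
```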

The other aspect of Google that fascinates me is how it refreshes parts of a page. I have read that Google uses AJAX (Asynchronous JavaScript and XML) for this.

For those of you who are new to AJAX, it is a web development technique used for creating interactive web applications. The intent is to make web pages feel more responsive by exchanging small amounts of data with the server behind the scenes, so that the entire web page does not have to be reloaded each time the user requests a change. This is intended to increase the web page's interactivity, speed, functionality, and usability.
(Description courtesy: Wikipedia)

AJAX is nothing new. Though the name was coined in 2005, the technology that enabled AJAX started a decade earlier.
I once had a requirement for auto-refresh functionality, wherein a particular action would be called repeatedly at a fixed interval. I initially used JavaScript in JSF, but it was very costly and behaved a little erratically too. Later I found that AJAX tag libraries are available for frameworks like JSF and Struts, and that AJAX tags are easy to use with JSF on any web/application server. My job was pretty simple: configure the AJAX library and use the tags. Polling is built into the tag itself, and it refreshes only a specific part of the page.
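
Stripped of the tag library, the polling feature boils down to a timer that periodically fetches a small fragment from the server and swaps out only one region of the page. In the browser this would be JavaScript; since my project is Java, here is the same idea as a Java sketch, where a `Supplier<String>` stands in for the HTTP request and a map of named regions stands in for the page. `PartialRefreshPoller`, `poll` and `region` are hypothetical names, not the API of any AJAX tag library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of AJAX-style polling: a timer periodically fetches a small
// fragment and replaces only one named region of the "page"; other regions are untouched.
class PartialRefreshPoller {
    private final Map<String, String> regions = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "poller");
                t.setDaemon(true); // do not keep the JVM alive just for polling
                return t;
            });

    PartialRefreshPoller(Map<String, String> initialRegions) {
        regions.putAll(initialRegions);
    }

    // Every periodMillis, call fetchFragment (stand-in for the HTTP request) and swap in the result.
    void poll(String region, Supplier<String> fetchFragment, long periodMillis) {
        timer.scheduleAtFixedRate(() -> {
            try {
                regions.put(region, fetchFragment.get());
            } catch (RuntimeException e) {
                // keep the old content; an uncaught exception would cancel all future polls
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    String region(String name) {
        return regions.get(name);
    }

    // Stop polling and wait briefly for an in-flight refresh to finish.
    void stop() {
        timer.shutdown();
        try {
            timer.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For example, `poller.poll("status", () -> fetchStatusFromServer(), 5000)` would refresh only the "status" region every five seconds; `fetchStatusFromServer` is a placeholder for whatever request the page actually makes.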

Google has lately tied up with many websites, and it is really buying up the best ones (http://en.wikipedia.org/wiki/List_of_Google_acquisitions). As far as I have learned, it is linked to Blogger, Orkut and maybe some more. Apart from that, there is its wide variety of add-ons like Google Maps, Google News, Google Earth and Gmail. I realized that single sign-on kicks in as soon as a user logs into a Google site or any of its associated sites.

Single sign-on is a method of access control that lets a user authenticate once and gain access to the resources of multiple software systems. In plain terms, after I log into Blogger, when I open Gmail in another browser window on my system, I am taken straight to the inbox page. If I log out of my mailbox, I am also logged out of Blogger.
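
How Google wires this together is not described here, but the general pattern can be sketched: one central service authenticates the user and issues a session token (think of a cookie the browser sends to every participating site), and each site asks that service who the token belongs to instead of keeping its own login. A toy Java sketch with hypothetical names `SsoService` and `Site`:

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy single sign-on service: one central login issues a session token.
class SsoService {
    private final Map<String, String> sessions = new ConcurrentHashMap<>(); // token -> user

    // Authenticate once; the returned token is what the browser would store as a cookie.
    String login(String user) {
        String token = UUID.randomUUID().toString();
        sessions.put(token, user);
        return token;
    }

    // Any participating site asks the central service who owns the token.
    Optional<String> whoIs(String token) {
        return token == null ? Optional.empty() : Optional.ofNullable(sessions.get(token));
    }

    // Logging out from any site revokes the single shared session for all of them.
    void logout(String token) {
        if (token != null) sessions.remove(token);
    }
}

// A participating site (e.g. a blog or a mailbox) that keeps no login state of its own.
class Site {
    private final String name;
    private final SsoService sso;

    Site(String name, SsoService sso) {
        this.name = name;
        this.sso = sso;
    }

    String open(String token) {
        return sso.whoIs(token)
                .map(user -> name + ": welcome " + user)
                .orElse(name + ": please sign in");
    }
}
```

With `String cookie = sso.login("me")`, both `new Site("blog", sso).open(cookie)` and `new Site("mail", sso).open(cookie)` greet the same user, and a single `sso.logout(cookie)` signs the user out of both. Note that the service only sees the token, not the person at the keyboard, which is exactly why a shared family computer is a problem.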

Single sign-on is an excellent authentication mechanism in large organizations, where each system is dedicated to one user. But the concept does not hold good, and is not effective, when one personal computer is shared by a whole family. Imagine I log into Blogger to post something, and some time later another family member opens Gmail to check his/her mail: that person lands in my mailbox.
Security is ultimately lost, and the very purpose of signing in is defeated.


I have just written down everything that struck me when I think about Google. I would welcome anyone to share ideas on the questions I have raised.


2 comments:

cm chap said...

Hats off to the little professor in you.. The same little professor in me nagged me for a while and I did similar R&D. Well, it's worth understanding how Google creates the indexes that play a critical role in the results... Will try to talk about it with you sometime...

Anonymous said...

Hi, I found your blog through Google. It's really interesting and I liked this post. When you get a chance, drop by my blog; it's about custom T-shirts and shows step by step how to create a really cool custom T-shirt. If you'd like to link my blog on yours I'd be grateful. See you, and good luck. (If you speak English you can see the English version of Camiseta Personalizada. If it is possible to add my blog to your blogroll, I would be thankful. Bye, friend.)