Understanding Web Scraping


Since the advent of the World Wide Web, the world has become a small village, often referred to as the "Global Village". The impact of the web on personal and commercial life is profound, and it has played a vital role in the rapid growth of information. Today, when we retrieve and share information for various purposes, we prefer the web because it is fast and easily accessible to many people.


Web scraping generally refers to scripts that traverse the web and collect information in a certain format, for instance XML. A real-life example of web scraping is a crawler. Web scraping is another form of retrieving data from the web: a technique through which we can collect useful, meaningful information for whatever purpose we need.

For example, suppose we want to launch a service that reports the latest happenings in town, but we have no interest in investing time and money in news gathering ourselves. We can simply scrape the information we need from the web and run our own service on top of it.
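The extraction half of such a service can be sketched with Python's standard library alone. The sample markup, the `headline` class name, and the headlines themselves are assumptions made up for illustration; a real scraper would first download the page, e.g. with `urllib.request`.

```python
from html.parser import HTMLParser

# Stand-in for a fetched news page (invented for this sketch).
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">Town fair opens this weekend</h2>
  <p>Details about the fair...</p>
  <h2 class="headline">New bridge inaugurated</h2>
</body></html>
"""

class HeadlineScraper(HTMLParser):
    """Collects the text of every <h2 class="headline"> element."""
    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # Enter "collecting" mode when the target element opens.
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True

    def handle_data(self, data):
        if self.in_headline:
            self.headlines.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

scraper = HeadlineScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.headlines)
# → ['Town fair opens this weekend', 'New bridge inaugurated']
```

The same pattern scales up: swap the hard-coded string for a fetched page and the collected list becomes the feed our hypothetical news service serves.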


For a business tracking sales trends in a certain region, it may be easier to scrape information from the web than to generate the data themselves. Similarly, developers who find other companies sharing the same interests may want to scrape some information from those websites and add it to their own listings, making their systems more effective.

For example, a search engine, in addition to crawling the World Wide Web, may run a dedicated crawler over popular search engines to gather more, and more relevant, information for its users.


While scraping is a piece of cake for developers, it can be very unpleasant for the owners of the sites being scraped. To prevent scraping, anti-scraping techniques can be adopted to keep your information private. One such technique is CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart"), which helps prevent bots from accessing private regions of your website.


I must emphasize, in bold, that web scraping without permission can be ILLEGAL and may sometimes get you in trouble. Similarly, if you run a service that provides original information, no one else may scrape your original data. To learn more about your *rights*, *web crawlers*, and the other terms used in this post, stay tuned. :-)
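On the question of permission, one widely used signal (not mentioned above, but closely related) is the site's robots.txt file, which a polite scraper consults before requesting any page. A minimal sketch using Python's standard `urllib.robotparser`, with a hypothetical robots.txt invented for illustration:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, as a site owner might publish it.
# A real scraper would fetch it from https://example.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check each URL before fetching it.
print(rp.can_fetch("MyBot", "https://example.com/news"))        # True
print(rp.can_fetch("MyBot", "https://example.com/private/x"))   # False
```

Respecting robots.txt is a convention, not an enforcement mechanism, which is exactly why site owners combine it with active defenses such as CAPTCHA.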



Utilizing 64-bit Capability



A thought just crossed my mind: how might we fully utilize a 64-bit processor? One such machine may be at Pi-Labs [CTO's PC].

The thought is this: suppose we design a simple application for a 64-bit processor that runs as exactly two instances. Each instance works on a separate 32 bits, one on the lower 32 bits and the other on the upper 32 bits. We could then run 32-bit virtual machine software in each instance, or let each instance host a 32-bit operating system itself.

This way, we would have two operating systems running at the same time, each using a full 32 bits of resources. For instance, one runs Windows 7 for Windows Phone 7 development and the other runs Macintosh for iPhone development, and all we have to do to switch between them is press [Windows + Alt + Tab].
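Whatever the architectural feasibility of running two operating systems this way, the underlying idea of treating one 64-bit word as two independent 32-bit halves can at least be illustrated with ordinary bit operations. A sketch (the example values are arbitrary):

```python
MASK32 = 0xFFFFFFFF  # 32 bits of ones

def pack(lower, upper):
    """Store two independent 32-bit values in one 64-bit word."""
    return ((upper & MASK32) << 32) | (lower & MASK32)

def unpack(word):
    """Recover the lower and upper 32-bit halves."""
    return word & MASK32, (word >> 32) & MASK32

word = pack(0xCAFEBABE, 0xDEADBEEF)
lower, upper = unpack(word)
print(hex(lower), hex(upper))  # → 0xcafebabe 0xdeadbeef
```

Packing data this way is routine; the hard part of the proposal is that registers, caches, interrupts, and the MMU are not partitioned along this boundary, which is precisely the kind of architectural problem worth discussing.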

The important thing here is that no resources are wasted and the hardware is utilized to the best of its capability. The point I want to discuss with readers is to identify possible architectural problems in the aforementioned thought, and to find out whether such a thing already exists in the real world.

Thank you for reading.