PROGRAMMER TUTORIALS
solutions to programmer problems


  Books Spidering Hacks

Rating: 4 out of 5 stars - Good, but needs more variety of languages
Nearly all of the examples were written using Perl, but the few pages written with PHP contained some very useful nuggets!

I especially liked the use of the explode() function to split a table-formatted HTML report into multiple PHP array elements for individual processing. Now, if only the authors had included examples written for ASP, ColdFusion, etc., they could have appealed to a much wider audience!
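To illustrate the splitting idea mentioned above, here is a minimal sketch in Python rather than PHP. It is not the book's code: the sample report markup and the split_report name are invented for illustration, and Python's str.split stands in for PHP's explode(). It assumes a very regular table with bare <td> tags; real-world HTML usually calls for a proper parser.

```python
# Hypothetical table-formatted report; the markup is invented for illustration.
REPORT = (
    "<table>"
    "<tr><td>alpha</td><td>1</td></tr>"
    "<tr><td>beta</td><td>2</td></tr>"
    "</table>"
)


def split_report(html):
    """Split a simple, regular HTML table into a list of rows of cell text."""
    rows = []
    # str.split plays the role of PHP's explode(): cut on a fixed delimiter.
    for chunk in html.split("</tr>"):
        if "<td>" not in chunk:
            continue  # skip pieces with no cells, such as the trailing "</table>"
        cells = []
        for piece in chunk.split("</td>"):
            # Keep only the text after the opening <td>, if this piece has one.
            _, found, text = piece.partition("<td>")
            if found:
                cells.append(text.strip())
        rows.append(cells)
    return rows


ROWS = split_report(REPORT)
```

Each entry of ROWS is one table row, so the cells can then be processed individually, much as the reviewer describes doing with PHP array elements.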



Rating: 4 out of 5 stars - Many examples of how to use spiders
The book has a nice collection of case studies on how to gather data from disparate websites. You might consider this as showing a simple way for you to use Web Services.

Spidering is the way that search engines gather their data. But you do not have to be Altavista or Google to use spiders. Nor do you have to be scanning a large fraction of the Web. The authors demystify spiders. If you can follow their examples, then you get concrete instances of usage that might help your particular application.
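As a rough sketch of what a small spider does (and not an example taken from the book), the Python below collects links from a page and follows them breadth-first while staying on one host. The names LinkCollector and crawl, the max_pages limit, and the injectable fetch function are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl that stays on start_url's host.

    fetch(url) is a caller-supplied function (an assumption of this sketch)
    that returns the page's HTML as a string, or None if it is unavailable.
    Returns the URLs visited, in order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; move on
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Passing fetch in as a parameter keeps the sketch runnable without a network: a dictionary's get method works for experiments, and a real fetcher built on urllib.request could be substituted later.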

Thoughtfully, the examples are mostly written in Perl, with a few in Java. These languages should be familiar to many. Even if you don't know them, the logic of the code can still be useful. (That is, you can treat the code as pseudocode.)

While spiders are probably best known as being used by search engines, they are really only the starting point for the latter. The much harder problems start when you have the data amassed by a spider. Now you have to efficiently find correlations between the various web pages. You should be aware that the book does not discuss these problems in any significant depth. That is not surprising, because they are outside the scope of the book. The examples do show how to use the data found by spiders. But most of these are for web pages that sit in a given domain, so the pages are closely affiliated in content and structure.



Rating: 5 out of 5 stars - Lots of great ideas
Once in a long while you get a book that inspires you with a lot of great small ideas. Spidering Hacks is just that type of book. The web has a wealth of structured and semi-structured data that is just waiting to be mined with automated tools. This book not only teaches you how to get the data out of these sources, but gives you ideas about where to look for information and what to do with it.

This book demonstrates everything I like in a technical book. It not only describes how things are done. It also gives practical examples of how the technology can be useful in the real world, and presents them enthusiastically. It makes you want to go out and implement all of the ideas and to keep on going with some of your own.

Nitpicks I have with the book are minor. The 'Hacks' format seems imposed. For example, hack #8 is about installing CPAN. I don't think that section should be left out, but I don't think it's a hack either. But hey, I don't care that much about the structure as long as it isn't an imposing flaw and the content within the structure is great, as it is with this book.

Have to say, O'Reilly is on a roll with the Hacks series. They have all been fine books.



Rating: 5 out of 5 stars - Example-filled and easy-to-follow
The knowledgeable collaboration of Kevin Hemenway and Tara Calishain, Spidering Hacks: 100 Industrial-Strength Tips & Tools is an extensive, 402-page instructional guidebook and reference to Internet data retrieval through the use of spiders and scrapers. Including information on methodology, philosophies, and ethical considerations, as well as freely available modules, scripts, frameworks, and templates, information on how to build alternative interfaces to online databases, how to keep one's data current and share it in a user-friendly manner, and so much more, Spidering Hacks is an example-filled, easy-to-follow, highly recommended computer shelf resource.



Rating: 4 out of 5 stars - Rich samples, fits your specific needs if you're a Perl lover
If you are a Perl lover looking for a book to help you extract content from the huge, resource-rich Internet, this book fits your needs quite well. Overall it is good: the author shows you how to set up your spidering tools, namely Perl modules. Yes, Perl; if you're a Java person, too bad. He shows you how to use Perl modules for crawling web pages, logging on to systems, extracting specific content, and massaging data to your needs, across 100 different scenarios. Most of them are practical, but they don't cover much detail; you have to read the programs listed in the book, which is quite painful for non-Perl people like me. In addition, it doesn't provide many screenshots of the results of running the sample code. Most importantly, the author tries to avoid copyright questions by pointing readers to URL links for reference. In general, it's still a good tool book in the spidering field.




2000-2006 ProgrammerTutorials.com

