Binding: Paperback
Dewey Decimal Number: 006.312
EAN: 9780596005771
Format: Illustrated
ISBN: 0596005776
Label: O'Reilly Media, Inc.
Manufacturer: O'Reilly Media, Inc.
Number Of Items: 1
Number Of Pages: 424
Publication Date: November 01, 2003
Publisher: O'Reilly Media, Inc.
Sales Rank: 153614
Studio: O'Reilly Media, Inc.
Product Description
The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you. Spidering Hacks takes you to the next level in Internet data retrieval--beyond search engines--by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented--you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you. Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:
Aggregate and associate data from disparate locations, then store and manipulate the data as you like
Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
Integrate third-party data into your own applications or web sites
Make your own site easier to scrape and more usable to others
Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day
Like the other books in O'Reilly's popular Hacks series, Spidering Hacks brings you 100 industrial-strength tips and tools from the experts to help you master this technology. If you're interested in data retrieval of any type, this book provides a wealth of data for finding a wealth of data.
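As a rough illustration of the kind of scraping the description mentions (collecting image files while staying within what a site allows), here is a minimal sketch. It is not taken from the book, whose examples are mostly Perl; it uses only Python's standard library, and the names ImageLinkParser, allowed_image_urls, the "example-bot" user agent, and the example.com URLs are hypothetical. It works on an HTML string and a robots.txt string already in hand, so it makes no network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs of <img> tags found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Resolve each image's src attribute against the page URL.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(urljoin(self.base_url, src))


def allowed_image_urls(html, base_url, robots_txt, user_agent="example-bot"):
    """Return image URLs in html that robots_txt permits user_agent to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return [url for url in parser.image_urls if rules.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical page and robots.txt content, for illustration only.
    page = '<p><img src="/comics/today.png"><img src="/private/draft.png"></p>'
    robots = "User-agent: *\nDisallow: /private/\n"
    print(allowed_image_urls(page, "http://example.com/index.html", robots))
```

A real spider would also fetch the page and the site's robots.txt, pause between requests, and identify itself honestly; those steps are left out to keep the sketch short.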
Customer Reviews
Rating: - Non-Fiction
Definitely a very useful book.
If you pick up one of the books in this series and learn just one thing, or something that saves you some time, it is well worth it.
I use a couple of things from here, or adaptations of them, constantly, and it helped me learn some new techniques.
Definitely recommended.
Rating: - One of My Favorite 'Hacks' Books
I bought this book shortly after it came out, and actually still refer to it from time to time. This is just another book that shows you how powerful Perl can be when in the right hands.
Rating: - Very good book
This book has a strong Perl focus, so make sure you want to use Perl. Otherwise, it's a great book with plenty of examples on integrating website data into your site.
Rating: - Perl-intensive book on web crawler design
A spider (also known as a web crawler or web robot) is a program which browses the World Wide Web in a methodical, automated manner. This book is about how to create programs that perform the functions of a web crawler, with most of the Hacks being written in Perl. Like the rest of the Hacks series, this book presents 100 bite-sized chunks of code or technique to tackle specific activities. In this book these range from the simple - how to download a set of image files - to the complex - cross-referring ...
Rating: - what is in a name?
Well, sometimes a generalizing lie.
IMHO, this book should have been named "(some) Spidering Hacks using Perl".
They could have spared the "100" and "industrial strength" sales pitches from the title as well.
The very little Python and Java code that was mentioned or included as code examples was, I think, a way to pepper the content and apparently make it more appealing to a broader audience.
The book is mostly about Perl scripts (you ...