Web Scraping

The mechanize gem provides a simple way to extract data from web pages in Ruby on Rails.

To use this gem, install it with:

gem install mechanize

Then, in your controller:

require 'mechanize'

agent = Mechanize.new # creates the mechanize object

doc = agent.get("your_url") # pass the URL you want to extract data from

web_title = agent.page.title # this will give you the title of the specified URL

web_url = agent.page.uri.host # this will give you the website name, e.g. www.abc.com

Sometimes the URL starts with https:// rather than http://. Because agent.page.uri is a URI object, uri.host returns the website name in both cases, so no separate handling is needed.

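
The website name can also be read with Ruby's standard URI library, which treats http:// and https:// the same way. The following is only a minimal sketch: host_of is a hypothetical helper name (not part of Mechanize) and the URLs are made-up examples.

```ruby
require 'uri'

# Hypothetical helper: returns the host part of a URL string.
def host_of(url)
  URI.parse(url).host
end

# Both calls print www.abc.com, regardless of the scheme.
puts host_of("http://www.abc.com/products/1")
puts host_of("https://www.abc.com/products/1")
```

Since the page's uri in Mechanize is already a URI object, the same .host call applies there without parsing the string again.
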
html = agent.page.content # this will give you the contents of the whole page
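
The Mechanize steps so far could be gathered into one small method for use in a controller. This is a sketch under stated assumptions, not the gem's own API: page_summary is a hypothetical helper, and the agent argument is assumed to behave like Mechanize.new, meaning that get returns a page responding to title, uri and content.

```ruby
# Hypothetical helper (not part of Mechanize): gathers the title, website
# name and raw HTML for a URL. `agent` is assumed to behave like
# Mechanize.new, i.e. #get returns a page with #title, #uri and #content.
def page_summary(agent, url)
  page = agent.get(url)
  {
    title: page.title,     # title of the fetched page
    host:  page.uri.host,  # website name, e.g. www.abc.com
    html:  page.content    # contents of the whole page
  }
end

# With the gem installed, a call would look like this:
#   require 'mechanize'
#   summary = page_summary(Mechanize.new, "http://www.abc.com/")
#   puts summary[:title]
```

Passing the agent in as an argument keeps the method easy to try out with a stand-in object, without making a real network request.
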

You can parse the whole contents using Hpricot, so don't forget to add this line in your controller: require 'hpricot'

doc2 = Hpricot.parse(html)
(doc2 / :p).each do |link|
  # this loops over all p tags of the specified URL
  puts link.attributes
end

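You can get the inner HTML of the p tags using:
paragraphs = doc2 / :p # named paragraphs so it does not shadow Ruby's built-in p method
puts paragraphs.inner_html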

That's it!!