Fetch all links from a page using Python

August 10, 2020 By globalsqa

Question:

Fetch all links from the Google home page.

Program:

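# Python 2 only: urllib2 and the print statement do not exist in Python 3
import re
import urllib2

# Build a request for the page to scan
req = urllib2.Request('https://www.google.com')

# Connect to the URL
website = urllib2.urlopen(req)

# Read the HTML source
html = website.read()

# Use re.findall to collect every double-quoted http(s)/ftp(s) URL.
# The pattern has two capturing groups, so each match is a (url, scheme) tuple.
links = re.findall(r'"((http|ftp)s?://.*?)"', html)

for link in links:
    print link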

Explanation:

The urllib2 module defines functions and classes that help open URLs (mostly HTTP) and handle basic and digest authentication, redirections, cookies, and more. It exists only in Python 2; in Python 3 its functionality lives in urllib.request and urllib.error.
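For readers on Python 3, the same program can be sketched with urllib.request. This is a rough port under a few assumptions, not the original author's code: the function names (decode_body, fetch_html, find_links), the 10-second timeout, and the UTF-8 fallback are all choices made for illustration. The regular expression is the one from the program above.

```python
import re
import urllib.request

# Same pattern as the Python 2 program: quoted http(s)/ftp(s) URLs.
LINK_PATTERN = re.compile(r'"((http|ftp)s?://.*?)"')


def decode_body(raw, charset=None):
    """Turn the response bytes into text (UTF-8 is an assumed fallback)."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a str."""
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        # In Python 3, read() returns bytes, so decode before matching.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


def find_links(html):
    """Return the (url, scheme) tuples that re.findall produces."""
    return LINK_PATTERN.findall(html)
```

Calling find_links(fetch_html('https://www.google.com')) and printing each item should give tuples in the same shape as the output shown on this page, although the exact links depend on what the server returns.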

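The re module provides regular expression matching operations similar to those found in Perl. Both the pattern and the string being searched can be Unicode strings or 8-bit strings. Because the pattern used here contains two capturing groups, re.findall returns a tuple for every match: the full URL, followed by the text matched by the inner (http|ftp) group. The optional s sits outside that inner group, so the second element is 'http' even for https links.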
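To see how the capturing groups shape the result, here is a small offline sketch in Python 3. The sample HTML string and the example.com / example.org addresses are made up for the demonstration. It compares the original pattern with a variant whose inner group is non-capturing, written (?:...).

```python
import re

# Made-up HTML snippet, used only for this demonstration.
html = '<a href="https://example.com/a">A</a> <a href="ftp://example.org/file">B</a>'

# Original pattern: two capturing groups, so each match is a (url, scheme) tuple.
with_groups = re.findall(r'"((http|ftp)s?://.*?)"', html)

# (?:...) makes the inner group non-capturing, so only the URL is returned.
urls_only = re.findall(r'"((?:http|ftp)s?://.*?)"', html)

print(with_groups)  # [('https://example.com/a', 'http'), ('ftp://example.org/file', 'ftp')]
print(urls_only)    # ['https://example.com/a', 'ftp://example.org/file']
```

If only the URLs are needed, the non-capturing form avoids having to unpack tuples afterwards.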

Output:

('http://schema.org/WebPage', 'http')
('https://www.google.co.in/imghp?hl=en&tab=wi', 'http')
('https://maps.google.co.in/maps?hl=en&tab=wl', 'http')
('https://play.google.com/?hl=en&tab=w8', 'http')
('https://www.youtube.com/?gl=IN&tab=w1', 'http')
('https://news.google.co.in/nwshp?hl=en&tab=wn', 'http')
('https://mail.google.com/mail/?tab=wm', 'http')
('https://drive.google.com/?tab=wo', 'http')
('https://www.google.co.in/intl/en/options/', 'http')
('http://www.google.co.in/history/optout?hl=en', 'http')
('https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.co.in/%3Fgfe_rd%3Dcr%26ei%3D83RYWbugK9WkvwT7j4SwCA', 'http')
('http://www.google.co.in/services/', 'http')
('https://plus.google.com/104205742743787718296', 'http')
('https://www.google.co.in/setprefdomain?prefdom=US&amp;sig=__xO-fFja9LlrL0EjCUtIDcyG3flI%3D', 'http')
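The tuples above carry more than most callers need, and the last one still contains the HTML entity &amp;. One possible clean-up step, sketched in Python 3, keeps only the URL, replaces &amp; with &, and drops repeats. The clean_links helper is hypothetical, and the third sample entry is a repeat added here only to show de-duplication.

```python
def clean_links(matches):
    """Keep the URL from each (url, scheme) tuple, fix &amp;, drop repeats."""
    seen = set()
    cleaned = []
    for url, _scheme in matches:
        # Only the &amp; entity is handled here; other entities are left alone.
        url = url.replace("&amp;", "&")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


matches = [
    ('http://schema.org/WebPage', 'http'),
    ('https://www.google.co.in/setprefdomain?prefdom=US&amp;sig=__xO-fFja9LlrL0EjCUtIDcyG3flI%3D', 'http'),
    ('http://schema.org/WebPage', 'http'),  # hypothetical repeat, not in the output above
]

for link in clean_links(matches):
    print(link)
```

The order of first appearance is preserved, which keeps the result easy to compare with the raw output.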
