Pentesting and Security Personal Blog




A simple web crawler, written with pentesting in mind and some hacks for smart crawling


  • Recursively crawls a given website up to a specified depth, extracting all the hrefs on the same domain (or its subdomains, if specified)

  • Finds all the input and POST forms on the crawled webpages

  • Supports cookies

  • Reduces the total number of requests sent by crawling each unique parameter only once (explained below)

  • Can include or exclude subdomains
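To give an idea of what finding forms on a page involves, here is a minimal sketch using only Python's standard html.parser module. It is not SnCrawler's actual code: the class name FormCollector and the function extract_forms are made up for this illustration, and it only records inputs that sit inside a form.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Hypothetical helper: collects forms and the names of their fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> seen
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                # HTML forms default to GET when no method is given
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "textarea", "select") and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed fields are not submitted, so skip them
                self._current["inputs"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def extract_forms(html):
    """Return a list of {"action", "method", "inputs"} dicts for the page."""
    parser = FormCollector()
    parser.feed(html)
    return parser.forms
```

Filtering the result on method == "post" would leave just the POST forms.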


Of course, there are a lot of other crawlers available on the internet (Burp being the best imo), but they all have the problem of duplicate parameters, which sometimes puts them in a position of infinite crawling.

For example, we have all run into URLs where the "id" parameter can take a huge number of values; it could go up to "id=99999" or more. Other crawlers visit every single one of those pages, treating each of them as a unique URL, which can generate a huge amount of traffic and lead to effectively infinite crawling (slowing down the whole crawl).

This crawler was written for that specific reason: it detects duplicate parameters and visits each unique parameter only once. As a result, it can crawl very large websites in a matter of minutes.
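One way to picture the duplicate-parameter idea is to reduce every URL to a "signature" that ignores parameter values, and to visit only the first URL seen for each signature. The sketch below is an assumption about how this could be done, not the crawler's real implementation; url_signature, filter_unique and the example.test URLs are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def url_signature(url):
    """Reduce a URL to (host, path, sorted parameter names), ignoring values."""
    parts = urlsplit(url)
    names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return (parts.netloc.lower(), parts.path, tuple(names))


def filter_unique(urls):
    """Keep only the first URL seen for each signature."""
    seen = set()
    unique = []
    for url in urls:
        sig = url_signature(url)
        if sig not in seen:
            seen.add(sig)
            unique.append(url)
    return unique


if __name__ == "__main__":
    candidates = [
        "http://example.test/item?id=1",
        "http://example.test/item?id=2",           # same path and parameter name: skipped
        "http://example.test/item?id=3&sort=asc",  # new parameter name: kept
    ]
    print(filter_unique(candidates))
```

With this approach, "id=1" through "id=99999" on the same path collapse into a single request, while a URL that introduces a new parameter name is still visited.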


The main aim of this crawler is to gather as many GET and POST parameters as possible in a short amount of time. It does this by visiting only unique parameters, which greatly reduces the total number of URLs it has to crawl.


Just clone this repo and then install the requirements (or run the following commands):

git clone

cd SnCrawler/

pip install -r requirements.txt


To look at all the available options, just run the file with --help

$ python --help
usage: [-h] [-w ""] [-d DEPTH]
                  [-c "cooke1=val1; cookie2=val2"] [--subdomains]
                  [-e ""] [-v]
                  [-o /home/user/saveLocation.txt]

optional arguments:
  -h, --help            show this help message and exit
  -d DEPTH, --depth DEPTH
                        How many layers deep to crawl(defaults to 3)
  -c "cooke1=val1; cookie2=val2", --cookie "cooke1=val1; cookie2=val2"
                        The cookies to use(if doing authenticated crawling)
  --subdomains          To include subdomains
  -e "", --exclude ""
                        The url to exclude from being crawled(like logout
  -v, --verbose         To display verbose output
  -o /home/user/saveLocation.txt, --output /home/user/saveLocation.txt
                        The output file where you want to write the scraped
                        URL's(in Json)

required arguments:
  -w "http(s)://", --website "http(s)://"
                        The website you want to crawl

Here is how you can do a simple crawl of a website with a depth of 2

python --website "" --depth 2
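To make the depth option more concrete, here is a rough sketch of a depth-limited, breadth-first crawl. It assumes the start page counts as depth 0, which may not match how the tool counts layers, and it takes a get_links callable (a made-up parameter) instead of doing real HTTP requests, so it can be tried on an in-memory link graph.

```python
from collections import deque


def crawl(start_url, get_links, max_depth=3):
    """Breadth-first crawl that follows links at most max_depth layers deep.

    get_links is any callable returning the links found on a given URL.
    """
    visited = {start_url}
    queue = deque([(start_url, 0)])
    order = []  # URLs in the order they were visited
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


if __name__ == "__main__":
    # Tiny fake site: each key links to the pages in its list.
    site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"], "/c": ["/d"]}
    print(crawl("/", lambda url: site.get(url, []), max_depth=2))
```

The visited set is what stops pages that link back to each other from being crawled more than once.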

You can also include subdomains by passing the --subdomains argument

python -w "" --subdomains   #It will also crawl the subdomains now
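The same-domain versus subdomain decision can be sketched as a simple hostname comparison. This is only an illustration of the idea under my own assumptions; in_scope and its parameters are hypothetical names, not taken from the crawler.

```python
from urllib.parse import urlsplit


def in_scope(url, base_url, include_subdomains=False):
    """Return True if url belongs to the target host (or one of its subdomains)."""
    host = (urlsplit(url).hostname or "").lower()
    base = (urlsplit(base_url).hostname or "").lower()
    if not host or not base:
        return False
    if host == base:
        return True
    # Require a leading dot so "notexample.test" does not match "example.test".
    return include_subdomains and host.endswith("." + base)
```

Matching on "." + base rather than on a plain suffix is what keeps lookalike domains out of scope.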

By default, it will display all the scraped URLs and POST parameters on the terminal itself, which can get pretty messy sometimes (especially for larger sites).

To cope with that, there is a -o option, which writes all the scraped URLs to the specified output file as JSON (for easier parsing later on).

python --website "http://domainToCrawl" -o "/home/user/output.txt"  #Will write all URLs to /home/user/output.txt

For cookies, you can specify the -c option. You can copy them directly from Burp (or any other intercepting proxy)

python -w "" -c "cookie1=val1 ; cookie2=val2"  #Will send all requests with the cookie values
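A cookie string copied from a proxy has to be split into name/value pairs before it can be attached to requests. Here is a small sketch of that step; parse_cookie_string is a hypothetical helper, and the crawler may well handle this differently.

```python
def parse_cookie_string(raw):
    """Turn 'cookie1=val1 ; cookie2=val2' into {'cookie1': 'val1', 'cookie2': 'val2'}."""
    cookies = {}
    for pair in raw.split(";"):
        # partition splits on the first "=" only, so values may contain "="
        name, sep, value = pair.strip().partition("=")
        if sep and name.strip():
            cookies[name.strip()] = value.strip()
    return cookies
```

A dictionary like this can then be handed to an HTTP client, for example through the cookies argument of the requests library.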

Sometimes there are URLs you wouldn't want the crawler to visit, like logout pages, which might destroy your session. You can specify them with the -e option. For multiple URLs, you can specify -e multiple times

python -w "" -c "cookie1=val1;" -e "http://domainToCrawl/logout" -e "http://domainToCrawl/destroy"   #It will not send requests to either of these URLs
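For the curious, a repeatable option like -e is what argparse's "append" action gives you. The sketch below is a stripped-down stand-in for the real argument parser, not a copy of it; build_parser and is_excluded are hypothetical, and the trailing-slash handling is my own assumption about what counts as the same URL.

```python
import argparse


def build_parser():
    """Hypothetical, reduced version of the crawler's command line."""
    parser = argparse.ArgumentParser(description="Stand-in for the crawler CLI")
    parser.add_argument("-w", "--website", required=True)
    # action="append" collects every -e occurrence into one list
    parser.add_argument("-e", "--exclude", action="append", default=[])
    return parser


def is_excluded(url, excluded):
    """Return True if url matches an excluded URL, ignoring a trailing slash."""
    normalized = url.rstrip("/")
    return any(normalized == item.rstrip("/") for item in excluded)


if __name__ == "__main__":
    args = build_parser().parse_args([
        "-w", "http://domainToCrawl",
        "-e", "http://domainToCrawl/logout",
        "-e", "http://domainToCrawl/destroy",
    ])
    print(args.exclude)
    print(is_excluded("http://domainToCrawl/logout", args.exclude))
```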

You can use the -v flag for displaying verbose output of every request being sent and data being parsed

python -w "" -v  #will display verbose information about every request being sent

You can find it here:


Thanks for reading!

maninwire =D

Know your enemy!

Written in Tools on Wed 09 August 2017. Tags: hacking, tool, crawler