The RobotsDisallowed project is a harvest of the Disallowed directories from the robots.txt files of the world’s top websites. This list of Disallowed directories is a great way to supplement content discovery during a web security assessment, since the website owner is being kind enough to tell you where they don’t want you going.
In other words, it’s a list of places where you’re likely to find something interesting.

The project:
We took the Alexa Top 1 Million websites, downloaded their robots.txt files, and then performed a good deal of cleanup on them (they are a mess) to make the list practical for web assessments.
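To illustrate the general idea (this is only a rough sketch, not the project’s actual pullrobots.sh; the sites.txt and output filenames are placeholders), the harvesting boils down to fetching each site’s robots.txt and extracting the Disallow paths:

    #!/usr/bin/env bash
    # Rough sketch of the harvesting idea, not the project's actual pullrobots.sh.
    # Assumes sites.txt holds one domain per line (placeholder filename).
    while read -r domain; do
      curl -s --max-time 10 "https://${domain}/robots.txt" \
        | grep -i '^Disallow:' \
        | awk '{print $2}' >> disallowed-raw.txt
    done < sites.txt

    # De-duplicate to produce a usable wordlist.
    sort -u disallowed-raw.txt > DisallowedDirectories.txt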
How to use it:
To use the project, go to the repository root and download the DisallowedDirectories files there. You can then plug them into your favorite web assessment tool or workflow, e.g., Burp Intruder.
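If you prefer the command line over a GUI tool, a minimal sketch along these lines will probe a target with the wordlist; example.com and the DisallowedDirectories.txt filename are placeholders, and it assumes each entry starts with a leading slash:

    # Probe a target with the wordlist and print the HTTP status for each path.
    # example.com and DisallowedDirectories.txt are placeholders; entries are
    # assumed to start with a leading slash, as Disallow paths normally do.
    while read -r path; do
      code=$(curl -s -o /dev/null -w '%{http_code}' "https://example.com${path}")
      echo "${code} ${path}"
    done < DisallowedDirectories.txt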
If you want to see how the output is created, look in the ‘Code’ directory. There you will find the raw Alexa site list, the scripts used to download and manipulate the robots.txt files, and so on.
Installation:
– git clone https://github.com/danielmiessler/RobotsDisallowed
– cd RobotsDisallowed
– cd code
– ./pullrobots.sh (or sudo ./pullrobots.sh if elevated privileges are required)
– Open the robots folder to see the downloaded files; a quick way to inspect them is shown below.
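Once the script finishes, a quick sanity check over the harvested data might look like the following; it assumes the robots folder contains the downloaded robots.txt files, which may differ from the script’s actual output layout:

    # Count the unique Disallow entries, then peek at the first few.
    grep -rhi '^Disallow:' robots/ | sort -u | wc -l
    grep -rhi '^Disallow:' robots/ | sort -u | head -n 20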
Source: https://github.com/danielmiessler