Python Reconnaissance Scanner ebook

Introduction to the Course

Welcome to the course! Let’s say you want to gather information about a particular website for some kind of security analyst job. You can open a terminal and collect it all by hand, but that takes a lot of time: finding the IP address of the website, running an Nmap scan, fetching robots.txt, running whois. That is what I was doing up until a little while ago, and I realized that for every single website I scan, I am doing the same thing over and over again. So why not build a tool in Python that does it all in a single click? You just type the website URL, like “facebook.com”, hit GO, and it grabs all the information for you. Pretty cool. We will be scanning the website and storing the results. Most people probably don’t have MySQL installed on their system, so I will show you how to save the data in simple text files that anyone can read.

General.py

We will create a new file called “general.py”. Inside it I am going to make two really simple functions: one to create a directory and one to write to a file. Then, whenever we start building our little tools, we can save the results easily using this file. Let’s start by importing os!
=> import os

Now I want a function that makes a new directory. Say we have a bunch of targets and are scanning a bunch of websites: I want to store each site’s results in its own directory, so I have one directory for youtube, one for ebay, and so on.
=> def create_dir(directory):

Next we check whether the directory already exists. Say we are scanning a list of 100 websites: some of them may already have been scanned, so their folders already exist.
=>     if not os.path.exists(directory):

So we only create the folder if it has not been created yet. Simple enough!
=>         os.makedirs(directory)

If we pass “ebay”, the function asks: did we create this folder yet? If not, it creates it. If it already exists, it does nothing. That was the easiest function in the world.

The next one just writes a simple file, so I am going to call it write_file.
=> def write_file(path, data):

path is where you want to write and data is what you want to write. First we open the file at path in write mode, then we write the data and close the file.
=>     f = open(path, 'w')
=>     f.write(data)
=>     f.close()

So all this does is take the path where we want to write (which folder, which location) and what we want in the file. That is all we need for “general.py”. In the next tutorial we start with the fun stuff and make the actual tools.

Full source code for general.py
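Putting the two functions together, general.py could look roughly like the sketch below. It is not the course’s original listing: it swaps in os.makedirs with exist_ok=True in place of the explicit os.path.exists check, and a with-block in place of the manual f.close(). Both are standard-library features, and the behaviour for the caller should be the same.

```python
import os


def create_dir(directory):
    """Create the directory (and any missing parents) if it does not exist yet."""
    # exist_ok=True makes this a no-op when the folder is already there,
    # so no separate os.path.exists() check is needed.
    os.makedirs(directory, exist_ok=True)


def write_file(path, data):
    """Write data to path, replacing any previous contents."""
    # The with-block closes the file even if write() raises an error.
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
```

Calling create_dir twice with the same name is safe, which is what we want when the same site is scanned more than once.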

Top Level Domain Name

All right guys, welcome back! I am going to show you how to get the top level domain name for a website. If you don’t know what that is, it’s basically a small part of the URL. Let’s understand it with an example. “https://www.facebook.com/” is the full URL, but when we talk about the top level domain name we only mean “facebook.com”: not the protocol, not the www, not the directory at the end. (Strictly speaking, the top level domain is just the “com” part and “facebook.com” is the registered domain, but in this course “top level domain name” means “facebook.com”.)


The idea is that the user gives us a full URL and we rip off the extra parts that are not needed. For this we are going to use a Python module. First, though, let’s open a terminal and try the “whois” command with the full URL.
=> whois https://www.facebook.com
It will not show the result.

You can see the error “No whois server is known for this kind of object”. whois only works with a top level domain name, so let’s try that.
=> whois facebook.com
Now it shows all the results. So let’s get to work. We will create a new file called “domain_name.py” and, from the “tld” module, import “get_tld”.
=> from tld import get_tld
If you don’t have this module yet, you can install it with pip or do a manual installation. Let’s see how to install pip and then tld. (On current Debian and Ubuntu releases the Python 3 package is called python3-pip.)
=> sudo apt-get install python-pip


That installs pip. Now we install tld using pip.
=> pip install tld

That installs the Python module “tld”. Let’s get back to the domain_name.py file. Now I am going to make a function called “get_domain_name” that takes the url.
=> def get_domain_name(url):

So the user passes in the full URL, and we rip off the extra parts to get the top level domain name.
=>     domain_name = get_tld(url)
get_tld takes the full URL of the website as its parameter, and then we just return the result, i.e. the domain name.
=>     return domain_name
So again: you pass in a URL and this function gives you the plain top level domain name. To verify it, we can run
=> print(get_domain_name('https://www.facebook.com'))
Let’s run this real quick and check it out. We passed in the full URL and it returned the top level domain name. Now the user can pass in any URL and we can extract the top level domain name from it. Looking good, see you guys in the next tutorial.
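If you just want to see the “rip off the extra parts” idea without installing anything, here is a minimal standard-library sketch. It is a stand-in, not a replacement for tld: the function name get_host_name is made up for this example, and it only strips the scheme, port, path and a leading “www.”, so it does not know about other subdomains (for example, “mail.google.com” stays as it is). Also note that, depending on the installed version of tld, get_tld may return only the suffix (“com”). As far as I know, newer releases provide get_fld for the “facebook.com” form, so check the documentation of the version you have.

```python
from urllib.parse import urlparse


def get_host_name(url):
    """Return the host of a URL without scheme, port, path or a leading 'www.'."""
    # urlparse only fills in .hostname when the URL has a scheme, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


print(get_host_name("https://www.facebook.com/"))  # facebook.com
```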

Full code for “domain_name.py”


IP Address

Now that we have the top level domain name of the target, we need to get the IP address of that website. I am pretty sure there is an easier way to do this, but this is how I do it. In the terminal, type the following with facebook.com or any other top level domain name and hit enter:
=> host facebook.com

This returns the IP address, along with other information. We can’t just take this output and store it in a text file, because we only care about the IP address, not the whole output. So I am going to run this command through Python and then extract the IP address from the output. Let’s make a new file “ip_address.py”.

We import os, which lets us make operating system calls and use the command line (the terminal) from Python.
=> import os
=> def get_ip_address(url):
The argument we pass in is the top level domain name. The command we are going to run is:
=>     command = 'host ' + url
To actually run that command and get the results back, we open a new process.
=>     process = os.popen(command)
Think of it like opening a new terminal and running the command there. The variable “process” gives us access to the output. Next we need to remove the extra parts of the output, because we only want the IP address. First we read the output into a string:
=>     results = str(process.read())
Then I make a marker like this:
=>     marker = results.find('has address') + 12

Let’s understand what this does. find looks through the string “results” for ‘has address’ and returns the index of its first character. ‘has address’ is 11 characters long and is followed by a space, so we move 12 characters ahead to reach the start of the IP address.
=>     return results[marker:].splitlines()[0]
The reason for splitlines is that a domain name can have multiple IP addresses, like google.com.
=> host google.com ( inside terminal )

We do not want all the IP addresses, only the top one. splitlines breaks the remaining text into lines and [0] keeps only the first line, which is the first IP address. Now let’s verify that this works.
=> print(get_ip_address('google.com'))
=> print(get_ip_address('facebook.com'))
Let’s run this in the terminal.

We get the IP address of ‘google.com’ and ‘facebook.com’. It does not matter whether the result has one IP address or several: our function only extracts the first one. We can now use this in our other scanning tools. See you in the next section.

Source code for “ip_address.py”
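To see the marker arithmetic in isolation, here is a small sketch that does the parsing on a plain string, without running host at all. The function name first_ip_from_host_output and the sample text are made up for this example. The addresses are reserved documentation addresses, not real output. The sketch also adds one thing the version above does not have: it returns None when ‘has address’ is not found, since find returns -1 in that case.

```python
def first_ip_from_host_output(results):
    """Return the first address after 'has address' in the text, or None."""
    token = "has address"
    index = results.find(token)
    if index == -1:
        # find() returns -1 when the text is missing (e.g. the lookup failed).
        return None
    # Skip the token and the single space after it:
    # len("has address") is 11, plus 1 for the space, gives the 12 used above.
    marker = index + len(token) + 1
    lines = results[marker:].splitlines()
    return lines[0].strip() if lines else None


# Illustrative text only, shaped like the lines the parsing above relies on.
sample = (
    "example.com has address 192.0.2.10\n"
    "example.com has address 192.0.2.11\n"
    "example.com mail is handled by 10 mail.example.com.\n"
)
print(first_ip_from_host_output(sample))       # 192.0.2.10
print(first_ip_from_host_output("not found"))  # None
```

As for the “easy way” mentioned earlier: the standard library’s socket.gethostbyname(name) returns an IPv4 address as a string, so it is worth a look if you do not need the host command itself.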

Nmap Port Scan

Alright guys, now that we have the IP address of a target, I want to show you how to run an Nmap scan from Python. If you don’t know what Nmap is, it is a tool that scans a server and finds out which ports are open and which services are running on them. For example:
=> nmap -F 54.186.250.79 ( terminal )

Now you can see the results. This server is running ssh, http and https. Nmap can also tell you whether the server is running FTP or MySQL (for example, if they have a database running on it) and a bunch of other good information. But we want to run this from Python. Nmap has a bunch of options, so our function takes two arguments: the first is whatever options the user wants to use and the second is the target IP. Let’s make a new file called “nmap.py”.
=> import os
=> def get_nmap(options, ip):
So we pass the options and the IP address to this function as parameters.
=>     command = 'nmap ' + options + ' ' + ip
What if the user doesn’t include any options? That is fine: “options” will be an empty string and the command runs without options. Next, of course, we start a new process.
=>     process = os.popen(command)
=>     results = str(process.read())
Here we start the process and read its output into a string. Now we only have to return those results.
=>     return results
You could parse the results if you want, but there is no special need, so I return the whole output as it is. Let’s verify that it works.
=> print(get_nmap('-F', '54.186.250.79'))

So it runs a scan and returns the result. Later on we are going to save all this to a text file. For now, this is how we run an Nmap scan with Python. I will see you in the next section.

Source code for “nmap.py”
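One thing to keep in mind with “nmap ” + options + “ ” + ip is that the whole string goes through the shell, so whatever ends up in the target string is interpreted by it. A possible alternative, sketched below, is to use the standard library’s subprocess.run with a list of arguments. This swaps os.popen for subprocess, so it is a variation on the course code rather than the course’s own listing, and build_nmap_command is a helper name made up for this example.

```python
import shlex
import subprocess


def build_nmap_command(options, ip):
    """Return the argument list for nmap, e.g. ['nmap', '-F', '192.0.2.10']."""
    # shlex.split turns "-F -Pn" into ['-F', '-Pn'] and an empty string into [].
    return ["nmap"] + shlex.split(options) + [ip]


def get_nmap(options, ip):
    """Run nmap and return its standard output as text."""
    # A list of arguments is passed straight to the program, without a shell.
    completed = subprocess.run(
        build_nmap_command(options, ip),
        capture_output=True,
        text=True,
    )
    return completed.stdout


print(build_nmap_command("-F", "192.0.2.10"))  # ['nmap', '-F', '192.0.2.10']
```

get_nmap raises FileNotFoundError if nmap is not installed, and, as always, only scan machines you are allowed to scan.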

Robots.txt

Alright guys, welcome back. In this video I am going to show you how to build a Python tool that fetches a site’s “robots.txt” file. If you don’t know what a robots.txt file is, here is the idea. Whenever you make a website, search engines like Google and Yahoo crawl it: their crawlers go page by page and store what they find in the search index. That way, when people search for the website name, results pop up for the profile pages, the forum and so on. The problem is that there are some pages you don’t want Google or Yahoo to crawl, for example the admin login page, some sensitive areas or some moderator panels. So you make a special file called robots.txt and upload it to your server. Usually web developers list in it all the paths they do not want search engines to crawl, and well-behaved crawlers then skip them.

The cool thing is that whenever you are analysing a website for security issues, one of the first files you go to is that robots.txt file. Why? Because the developers are saying, “Hey Google, don’t crawl these, people shouldn’t be looking at them”, so the file points us straight at the areas that may be sensitive. Let’s make a new file called “robots_txt.py”.
=> import urllib.request
This lets us make a request to a URL, such as a GET request, so basically it downloads files from the internet. We also import a module called “io”.
=> import io
This is for decoding, so that we get our data as readable text.
=> def get_robots_txt(url):
We pass in a URL.
=>     if url.endswith('/'):
=>         path = url
=>     else:
=>         path = url + '/'
If the URL already ends with a forward slash it stays as it is, and if it does not, we add one. Now we make the request for the file.
=>     req = urllib.request.urlopen(path + 'robots.txt', data=None)
urlopen opens the robots.txt file and we store the response in the variable “req”. Now we make sure our data is decoded properly.
=>     data = io.TextIOWrapper(req, encoding='utf-8')
Now let’s just return whatever came back.
=>     return data.read()
So again, we pass in a URL, fetch the robots.txt file from that website and return its contents. Let’s verify that it works.
=> print(get_robots_txt('https://www.reddit.com/'))


So this is reddit.com’s robots.txt. There you go, looking beautiful. You can play with this for different websites. In the next section I am going to show how to get the whois for a website. See you next time.

Source code for “robots_txt.py”
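Since the interesting part of robots.txt for us is the list of paths the site owner does not want crawled, here is a small sketch that pulls those out of the text once it has been downloaded. Both helper names (robots_url and disallowed_paths) and the sample file are made up for this example. The parsing is deliberately simple: it only looks at “Disallow:” lines and ignores which user agent they belong to.

```python
def robots_url(url):
    """Return the robots.txt address for a site root such as 'https://www.reddit.com'."""
    # rstrip removes any trailing slashes so that exactly one is added back.
    return url.rstrip("/") + "/robots.txt"


def disallowed_paths(robots_text):
    """Return the paths named on 'Disallow:' lines, in file order."""
    paths = []
    for line in robots_text.splitlines():
        # Drop comments, which start with '#'.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line[len("disallow:"):].strip()
            if path:  # an empty Disallow line blocks nothing
                paths.append(path)
    return paths


# Made-up example file, not taken from a real site.
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /login   # staff only
Allow: /
Disallow:
"""
print(disallowed_paths(sample))  # ['/admin/', '/login']
```

You could feed the string returned by get_robots_txt into disallowed_paths to get a quick list of areas worth a closer look.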

Whois

Welcome back. In this section I am going to show you how to get the “whois” of a website’s domain. If you don’t know what “whois” is, it is a tool that gives you information about who registered a domain name. You give it a domain name, i.e. the top level domain name, and it returns the registration details. We already used it a few tutorials back.
=> whois facebook.com ( terminal )

It tells you who registered the domain and which registrar they registered it through. If the owner didn’t choose domain name privacy, it also shows information like their phone number, their address and a bunch of other personal information.

Another thing I want to point out: whenever you are making a website and you buy a domain name, there is always going to be an option that says “Do you want to buy domain name privacy?”. It’s going to be like 10 bucks a year or something, but it’s worth it because you don’t have to show your personal information to the world. So now let’s make a new file “whois.py”.
=> import os
=> def get_whois(url):
=>     command = 'whois ' + url
=>     process = os.popen(command)
=>     results = str(process.read())
=>     return results
This is the simplest tool we have built yet, and you are familiar with everything in it, as we have been doing the same thing for the last 2-3 sections. Let’s see if it’s working.
=> print(get_whois('reddit.com'))


So this is all the information about the domain “reddit.com”. You can try this for any website you want. This is how whois is done using Python, see you in the next section.

Source code for “whois.py”
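By now the same three steps (build a command, start a process, read the output) have shown up in the host, nmap and whois tools. One way to avoid repeating them is a single helper, sketched below with subprocess.run from the standard library. The name run_tool is made up for this example, and returning an empty string when the program is missing is just one possible choice. You may prefer to let the error through.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command given as a list of arguments and return its output as text."""
    try:
        completed = subprocess.run(args, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the program (whois, host, nmap, ...) is not installed.
        return ""
    return completed.stdout


def get_whois(domain):
    """Return the raw whois output for a domain name such as 'reddit.com'."""
    return run_tool(["whois", domain])


# Harmless demonstration that does not need whois: run the Python interpreter itself.
print(run_tool([sys.executable, "-c", "print('ok')"]).strip())  # ok
```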

The Final Program

Alright guys, welcome back. We now have all the individual tools: domain name, IP address, Nmap scan, robots.txt and whois. The last thing we need is a function that runs all of them with a single click, so all the user has to do is give a URL and hit Run, and it gathers all of that information automatically. Let’s first import all of the tools we have created.
=> from general import *
=> from domain_name import *
=> from ip_address import *
=> from nmap import *
=> from robots_txt import *
=> from whois import *
Now I make a new directory to store all of our results.
=> ROOT_DIR = 'companies'
You can name this companies, targets, projects or whatever you want. It is just a separate folder, and every website we scan is stored inside it.
=> create_dir(ROOT_DIR)

Now I make the function that the user is going to call.
=> def gather_info(name, url):
We pass in the name, i.e. the name of the website or the company, and the URL of that website. Then we call all the different tools so that each one runs its scan.
=>     robots_txt = get_robots_txt(url)
=>     domain_name = get_domain_name(url)
=>     whois = get_whois(domain_name)
=>     ip_address = get_ip_address(domain_name)
=>     nmap = get_nmap('-F', ip_address)
So far we have run all the tools and saved their results in different variables. Now we hand those results to one more function, whose only job is to write them to text files.
=>     create_report(name, url, domain_name, nmap, robots_txt, whois)
Python only looks up create_report when gather_info is actually called, so it is fine to define it afterwards. Now let’s create that function.
=> def create_report(name, url, domain_name, nmap, robots_txt, whois):

Remember that each new website you scan is essentially a new project, and you want to save it inside its own folder. So what we can do is:
=>     project_dir = ROOT_DIR + '/' + name
=>     create_dir(project_dir)
We created the function “create_dir” in general.py, in the first section of the course. Now we only have to write the content to each file, using the “write_file” function that is also part of general.py.
=>     write_file(project_dir + '/full_url.txt', url)
=>     write_file(project_dir + '/domain_name.txt', domain_name)
=>     write_file(project_dir + '/nmap.txt', nmap)
=>     write_file(project_dir + '/robots.txt', robots_txt)
=>     write_file(project_dir + '/whois.txt', whois)
And I think this is it. Let’s now call the function.
=> gather_info('google', 'https://www.google.com')
Now let me run it and check whether it’s working. It will take a bit of time to run.


So now we have all the details. We are saving a heck of a lot of time by running this tool with a single click.

Source code for “main.py”
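If you add more tools later, the list of write_file calls grows by one line each time. A possible variation, sketched below, collects the results in a dictionary that maps file names to text and writes them in a loop. This is not how the course’s main.py is written: the root_dir and results parameters are assumptions made for this example, and the sketch uses os.path.join so the paths also work on systems that do not use “/” as a separator.

```python
import os


def create_report(root_dir, name, results):
    """Write each entry of results to its own text file under root_dir/name."""
    project_dir = os.path.join(root_dir, name)
    os.makedirs(project_dir, exist_ok=True)
    for file_name, text in results.items():
        path = os.path.join(project_dir, file_name)
        # The with-block closes each file as soon as it has been written.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    return project_dir
```

With the tools from the earlier sections, a call could look like create_report('companies', 'google', {'full_url.txt': url, 'whois.txt': whois}), with one dictionary entry per result you want to keep.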
