How Web Scraping Google Results Helps Hackers

Web scraping is not a new technique. Marketers, researchers, and hackers alike have long used it to gather information quickly and effectively. In today's era of big data, web scraping has become even more prominent as a way to collect specific information.

There are also more sources to draw from, especially now that search engines like Google index a vast share of the publicly accessible web. You just have to know which keywords to target and how to scrape Google results for different purposes.

This is where Google dorking comes in handy. The right commands can turn a simple Google search into a treasure trove of valuable information.

Google Dorking for Hacking

First, let's get one thing out of the way: Google dorking is generally considered legal. You are simply using search operators the engine already provides to find information that is publicly available on the web.

Google dorking is often associated with Google hacking. Although it may look like hacking at first, tweaking search queries to find specific results does not in itself go against Google's policies. The hacking usually comes later, once sensitive information has been uncovered.

There are some interesting operators to try. For starters, the query intitle:"index of" "debian.cnf" looks for information about Debian servers: it surfaces any debian.cnf file that is publicly accessible.

The same is true of intitle:index.of.?.database, which reveals directory listings containing sensitive files. This query can expose servers whose sensitive directories are openly accessible, usually because of incorrect ownership and permission settings (chown and chmod).

The list goes on. Queries such as site:*/wp-login?redirect_to= intitle:"login" and site:admin.*.*/forgot?username= find pages that might reveal usernames and other important details. The same queries also surface login portals.

Scraping Google Dorking Results

Done manually, Google dorking is a great way to find vulnerabilities on one or two sites: you can try multiple queries to check for exposed directories and login pages that could be exploited. Use it to collect information from hundreds of websites, however, and the task becomes far more complex.

This is where web scraping comes in handy. You can combine Google dorking with a capable web scraping tool to discover hundreds of websites that are vulnerable. You can also use the same combination to search for personal information, configuration files, and other materials. The process is even simple enough for everyday users.

You start by establishing a proxy. An intermediary proxy allows you to safely scrape Google dorking results without getting banned or revealing your original IP address. Additional security measures can be added to further hide your identity. You can, for instance, use a web scraping VPN and automation tools to help simplify the process.

Once the proxy or VPN is set up, the rest can be fully automated. A simple dork can uncover hundreds, if not thousands, of vulnerable sites and pieces of sensitive information in seconds. Think about what an automated scraping tool running on dorking results for a couple of hours could collect.

Don’t forget that web scraping supports advanced information processing too. Rather than collecting a vast amount of data without context, you can set the scraping tool to automatically filter relevant information and details that you can actually use. This makes finding exploits and identifying vulnerable sites that much easier to do.

Simplifying the Process

Google dorking may not be illegal, but it can still be used for hacking. In fact, combining Google dorking (Google search hacks) with web scraping gives any hacker a powerful tool. Imagine the kind of attacks that become possible when you can gather hundreds of .cnf files in minutes: exposed configuration files make it much easier to get past a server's security layers.

You can go as far as finding .ini files for software like MySQL, which can then give you access to even more information. No more jumping through hoops to gain access to a server or collect valuable information about your targets; a simple dorking-and-scraping operation is all you need to get started.

That brings us to the most important point about this trick: the scale at which it can be used. Imagine running multiple web scrapers for multiple dorks, all from behind proxies and VPN servers, and the wealth of information and vulnerabilities you could gather in such a short time.

Whether you are doing penetration testing or trying to steal information from reinforced servers, dorking and scraping are approaches to consider. They are so simple that many system administrators don’t really prepare for them. That makes these approaches even more powerful as tools for hackers.