
How to Detect Proxy Anonymity Level Using PHP


When building apps that involve scraping, downloading data and automation, staying completely anonymous can be a huge concern for developers. Although there are many different proxy checkers out there, most of them seem to deliver inconsistent and unreliable results.

This guide will walk you through three steps and provide clean PHP code to detect exactly how anonymous a specific proxy is.

What are the Different Levels of Proxy Anonymity?

Elite (High-Anonymous)

Your proxy is completely undetectable and your real IP will remain hidden. The server you connect to will have no idea you’re using a proxy. These are the best proxies you will find and the level of anonymity and quality is unprecedented.

Anonymous

Your real IP is still hidden while connected to an anonymous proxy, but some servers and proxy detection scripts will be able to detect that you’re using one. These proxies are still useful for whitehat practices and data mining, but your original IP has a slight chance of exposure.

Transparent

Your original IP will be exposed, and everyone will know you’re using a proxy. Transparent proxies are extremely risky, and you should avoid them entirely while trying to remain anonymous.


Step 1: Create a Proxy Gateway

The first step is to set up a gateway on your server that emulates what any other server sees, via the $_SERVER superglobal, when determining whether you’re using a proxy. Make sure this PHP file is accessible through a public URL (e.g. http://yourdomain.com/gateway.php).

Since $_SERVER is an array, you’ll need to do some formatting. Here is an example of how I formatted the output in gateway.php as a delimited string, so the proxy anonymity tester can easily extract the data:

<?php
// gateway.php: output every non-empty $_SERVER entry as one
// delimited string: KEY--VALUE---KEY--VALUE---...
$output = '';

foreach ($_SERVER as $key => $value) {
	if (!empty($value)) {
		$output .= $key . '--' . $value . '---';
	}
}

// Trim the trailing '---' delimiter.
$output = substr($output, 0, -3);

die($output);
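
For reference, the gateway’s response is a single delimited string. With the separators above, it will look something like this (the values here are made up for illustration):

HTTP_HOST--yourdomain.com---HTTP_USER_AGENT--Mozilla/5.0---REMOTE_ADDR--203.0.113.5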

Step 2: Connect to the Server Gateway and Retrieve Results

Once your gateway is set up, you’re ready to connect to it through your proxy and retrieve the $_SERVER output, which will reveal how anonymous the proxy is. Below is a simple PHP function that uses cURL to access your gateway URL. It tries each protocol in turn, so there’s no need to determine beforehand whether your proxy speaks HTTP, SOCKS4 or SOCKS5.

Note: Make sure the $url variable is set to your gateway URL and the $proxy variable is set to the proxy you’d like to test (in IP:PORT format).

function gatewayResults($url, $proxy) {
	// Proxy protocols to try, in order.
	$types = array(
		CURLPROXY_HTTP,
		CURLPROXY_SOCKS4,
		CURLPROXY_SOCKS5
	);

	$handle = curl_init($url);

	curl_setopt($handle, CURLOPT_PROXY, $proxy);
	curl_setopt($handle, CURLOPT_TIMEOUT, 10);
	curl_setopt($handle, CURLOPT_RETURNTRANSFER, 1);

	$resultsQuery = array();

	foreach ($types as $type) {
		curl_setopt($handle, CURLOPT_PROXYTYPE, $type);

		$response = curl_exec($handle);

		// curl_exec() returns false on failure, so only accept a
		// response that actually contains the gateway's delimiters.
		if ($response !== false && strpos($response, '--') !== false) {
			$resultsQuery = explode('---', $response);
			break;
		}
	}

	curl_close($handle);

	$results = array();

	// Rebuild each KEY--VALUE pair into an associative array.
	foreach ($resultsQuery as $result) {
		$split = explode('--', $result, 2);

		if (!empty($split[1])) {
			$results[$split[0]] = $split[1];
		}
	}

	return $results;
}

Step 3: Check Proxy Anonymity by Using the Gateway Results


After you have the returned server data from the above gatewayResults function, simply pass it to the function below and it will return the proxy anonymity level.

function checkAnonymity($server = array()) {
	// Your real IP; assumes you're loading this checker directly,
	// not through the proxy being tested.
	$realIp = $_SERVER['REMOTE_ADDR'];

	$level = 'transparent';

	// If the gateway never saw your real IP, the proxy is at least anonymous.
	if (!in_array($realIp, $server)) {
		$level = 'anonymous';

		// Headers that commonly betray the presence of a proxy.
		$proxyDetection = array(
			'HTTP_X_REAL_IP',
			'HTTP_X_FORWARDED_FOR',
			'HTTP_X_PROXY_ID',
			'HTTP_VIA',
			'HTTP_FORWARDED_FOR',
			'HTTP_X_FORWARDED',
			'HTTP_FORWARDED',
			'HTTP_CLIENT_IP',
			'HTTP_FORWARDED_FOR_IP',
			'VIA',
			'X_FORWARDED_FOR',
			'FORWARDED_FOR',
			'X_FORWARDED',
			'FORWARDED',
			'CLIENT_IP',
			'FORWARDED_FOR_IP',
			'HTTP_PROXY_CONNECTION',
			'HTTP_XROXY_CONNECTION'
		);

		// If none of the telltale headers reached the gateway, it's elite.
		if (!array_intersect(array_keys($server), $proxyDetection)) {
			$level = 'elite';
		}
	}

	return $level;
}
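
For example, assuming both functions above are declared as plain functions next to your checker code, a full test run might look like this; the gateway URL and proxy below are placeholders to replace with your own:

$server = gatewayResults('http://yourdomain.com/gateway.php', '1.2.3.4:8080');

if (!empty($server)) {
	// Prints 'elite', 'anonymous' or 'transparent'.
	echo checkAnonymity($server);
} else {
	echo 'The proxy failed to connect to the gateway.';
}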

If you have any improvements, suggestions or questions about this tutorial on how to detect proxy anonymity using PHP, please feel free to leave a comment below.


What is an Elite Proxy?


Almost everyone uses a computer at some point in their day, but not everyone understands the tech jargon a typical computer expert is likely to use. In particular, “proxy” is a term an individual is likely to have heard before without necessarily understanding what the word means. In simple terms, a proxy is a hub computer through which other computers’ requests are processed; it’s almost like a parent computer giving out allowances to all the other “baby” computers. But knowing the definition of a thing and truly understanding how it works are two completely different things, so some background knowledge might be needed, especially if you are thinking about specifying the need for what is known as an elite proxy.

How It Works

A proxy is a computer that serves as an in-between for inquiries from clients looking for resources from other servers. The client connects to the proxy server and requests some type of service, like a web page, connection, or file, that lives on another server. The proxy then evaluates the request and, if it can be approved, passes it along to the target server. Proxies were first created to add some organization and structure, and in turn compartmentalize various systems.

What Are Some Common Proxy Uses?

It might sound complicated, but believe it or not, proxies are used in the computer world for a multitude of reasons, many of which take place behind the scenes and might have gone unnoticed by you. One major use is security, as a proxy can help protect anonymity. Proxies also help speed things up, so your computer requests are granted more quickly. Bandwidth is saved, and scans for malware help keep you from downloading infected content before it is too late. Proxies are also often used by employers to scan information shared on company computers.


One of its biggest uses, however, is in enhancing what materials you can access on your computer (within legal limits, of course). You can get around site-blocking software, set up allowances for certain sites you frequent, visit sites in foreign countries, download outside content, bypass security controls, and make web requests to other servers. Basically, it is the proxy that puts the power and control right in your hands, without you ever having to think twice about it. It takes the complication out of web browsing, making the process seamless, effortless and speedy.

Types of Proxies on the Market

For a system that is so complex, it’s no wonder that there is also a complex system of different types of proxies. A proxy server could run right on the user’s home computer, or it may sit between the user’s private computer and a system of proxies across the Internet. As such, there are a number of proxy types you are liable to come up against: transparent, forward, reverse, open, performance-enhancing, anonymous and, of course, elite. These different proxies each have their own sets of advantages and disadvantages, but we are here to talk about elite.

What Defines an Elite Proxy?


An elite proxy server does all of the basic functions of any other proxy you might find, with some unique additions. First, any server you connect to sees a REMOTE_ADDR value: the IP address of whichever machine made the connection, which through a proxy is the proxy’s address. However, most proxies also send out the headers HTTP_X_FORWARDED_FOR and HTTP_VIA, which reveal your real IP and the fact that a proxy is in use (you can find the exact headers that are revealed here). With an elite proxy, this information is hidden, as these extra headers are never sent. This can help you get around software, sites and programs that might not normally accept requests from your computer.
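
To make this concrete, here is a minimal PHP sketch of the kind of check a remote server could perform; the header list is abbreviated and the function name is our own invention, not a standard API:

<?php
// Rough sketch: does the current request carry any headers
// that a non-elite proxy typically adds?
function looksLikeProxy() {
	// Abbreviated list of telltale proxy headers.
	$headers = array('HTTP_X_FORWARDED_FOR', 'HTTP_VIA', 'HTTP_FORWARDED');

	foreach ($headers as $header) {
		if (!empty($_SERVER[$header])) {
			return true; // The proxy revealed itself.
		}
	}

	return false; // No telltale headers; the request looks direct.
}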

The Advantages of Using an Elite Proxy Server

So now you understand a bit about proxies and what an elite proxy is, but why would you actually bother to use one? There are several reasons. First of all, many insiders believe an elite proxy is the best way to stay secure and keep your personal information protected. An elite proxy can also be great if you want access to a foreign program or download, as the foreign server will not know you are not native to that region. The bottom line is that it is an excellent way to keep your computer anonymous.

How to Increase Security Using an Elite Proxy


One great thing about elite proxies is that they can be used in tandem with other platforms to further enhance your computer’s security. A Virtual Private Server, or VPS, is a machine sold by various hosting providers to further protect users. This package is customizable, so you can adapt a system that works best for your own needs. This, combined with the elite proxy, means you can set yourself up to surf the web, download and do business without worrying about who is looking on.

Summing It Up

In reality, understanding the computer world can be incredibly confusing, especially for those who feel a little out of their element when it comes to all things tech. But it does not have to be as overwhelming as it might seem on the surface. Proxies are fundamental parts of any computer system, whether it be a complex business infrastructure or a single person concerned about online security. Elite proxies are simply a sophisticated yet straightforward way to protect your identity when surfing the web. With an elite proxy, you can rest assured that you are safe and protected.

Are Private Proxies Legal in the US?


Private proxies are one of the most popular tools for privacy-savvy web users to remain anonymous online. They route a user’s web traffic through an intermediary server, so the target site sees the proxy’s IP address rather than the user’s. This protects their IP address from the target site, allowing private proxies to serve as protectors of both privacy and security. Most malware exploits will not function through a private proxy, either.

However, recent court rulings out of California have left many asking whether or not private proxies are legal in the United States.


Yes, Private Proxies are Legal in the USA

The bottom line is that, as with any tool, proxies are entirely legal. However, they can be used for unlawful purposes, and recent rulings have significantly changed what may qualify as unlawful.

The History of Proxy Laws in the USA

The current laws concerning private proxies were introduced in 1986. The Computer Fraud and Abuse Act originally served to limit federal jurisdiction with respect to computer-related crimes. Unless a case had escalated to a significant scale or grew to encompass national interests across state lines, federal enforcers simply weren’t to get involved with the particulars of how the crime was committed. Effectively, it only handed them the tools to prosecute the same crimes that had always fallen within their sphere, with the specific inclusion of those crimes when they were committed over a network.

A ruling out of Northern California has changed that. One Judge Breyer has held that circumventing a block placed by a website to forbid a particular user entry is a violation of the CFAA. This is one of the most common uses of a private proxy, and it is a natural consequence of simple use with or without the intent to circumvent a block. This means that simply navigating to a website you have been banned from through a proxy, whether or not you are aware of the ban and whether or not you intend to circumvent it, could constitute a violation of the CFAA.

This is perhaps the scariest element of the whole question, but it isn’t unmanageable. It just requires understanding what you need to be aware of. The following are some examples of when it isn’t okay to use a private proxy.

When is Using a Proxy Considered Illegal?

Spamming & Bypassing IP Bans


This is the most glaring instance of proxies being used to infringe, because accessing a site you ought to be blocked from is very simple. Blocking can only be done by IP address or IP range. You don’t need a private proxy to get around this: resetting your router can achieve the same effect, and your IP address likely cycles fairly often on its own. You can navigate this safely by understanding the terms of service of any site you will be actively engaging with. Understand what kind of access they do or do not allow. If you stay within their parameters, you are likely in the clear with respect to whether or not you can safely access their site via proxy.

It is very common for sites to intend blocks as a way to forbid interaction and prevent abuse, rather than to prevent passive access; most sites draw their income from traffic, so cutting traffic outright with a ban isn’t appealing when they can functionally silence troublemakers instead. If the ruling out of California holds, this approach will likely grow more common as sites try to adapt.

Circumventing International Copyright Laws


A more serious example is rooted in copyright law. An increase in streaming media on the web has made regional copyright status a major hot button. Many entertainment products fall under regional copyright in some countries but not in others. In some cases, this is considered a grey zone by various fans. Laws from overseas nations regarding media that is entirely unlicensed in other nations can’t generally be enforced. It gets more complicated when proxies enter the picture. Streaming media that is expressly blocked in your home country by using a private proxy physically located in another country can be prosecuted in the same fashion as other copyright violations. The only protection for users engaged in this behavior lies in the hands of the owners of the proxy servers themselves, who may be subpoenaed for your home IP address and identity.

Neither of these makes proxies illegal in any fashion, but they do mean that users of private proxies should understand their proxy service and the sites they use it to visit.

How to Stay Legal with your Proxies


Don’t Use a Blackhat Proxy Service

You should look into the goals and inclinations of your private proxy service. Some private proxy services have expressly blackhat aims and are intended for those who plan to deliberately circumvent IP trackers for illicit reasons. Whitehat proxy services will cheerfully comply with regional restrictions and other issues on their own; they’ll effectively prevent you from making any mistakes. Private proxies with bad intentions will not offer you the same peace of mind.

Always Read the Terms & Conditions

Those concerned about the legality of their proxy are best served by looking carefully at the terms and conditions of the sites they value. Many different sites have different takes on what constitutes a blocked user or forbidden access to their site. Many have different ideas about whether or not differing or concealed IP addresses are considered a violation of policy. Some are choosing to enumerate the use of a proxy as explicitly allowed in response to recent questions about this topic. Others are banning proxies outright on general principle. Make it a point to understand how different sites feel about access via proxy.

Under some circumstances, users will be permitted to read or access data from a site through a proxy, but not contribute material. This is a compromise drawn to hold people more accountable for the things they say online while still allowing for secure browsing. This model is likely to catch on with greater force as both privacy and online conduct come into focus as issues to manage.

Use General Common Sense

Of course, common sense should be applied. If you have been banned from any site or service on the web, don’t use private proxies or any other method to circumvent the ban; even passively cycling your router’s IP could already amount to circumvention. It is also a bad idea regardless of what the new legal implications may be. Most of the questions that are raised by this new ruling can be avoided by simply monitoring your own conduct and being respectful to webmasters while using private proxies responsibly and legally.

In conclusion, private proxies are legal in the US as long as they are used appropriately. If you are concerned about unlawful access, you probably aren’t going to find yourself getting into trouble anytime soon. Simply keep yourself educated and be aware of the particular pitfalls that might cause you to break the law unknowingly, and you should do fine.

Bypass Blocked Websites with Private Proxies


Disclaimer: This article does not condone illegal proxy usage. Please abide by all proxy laws and court rulings in your country.


The Internet offers a wide range of content that is normally supposed to be accessible to anyone. However, in recent times, there have been many cases of local governments and Internet providers blocking certain websites or online services. In many countries, adult-oriented or gambling websites are blocked. So are certain communications apps that provide Internet telephony services. Even the world’s most popular websites, like Facebook and Twitter, are blocked in some locations. Such blocks are often put in place in countries with authoritarian regimes such as China and Iran.

But even those living in countries without such drastic state-imposed Internet censorship, like Canada, the USA and the Netherlands, may sometimes find themselves unable to access the content that they want to see. This is mainly due to the fact that some websites and content publishers may choose to make their online content available only to a specific country. For example, certain live television streams from the UK may not be accessible to those who are outside of the country. A few American TV networks may upload their shows to video sharing sites like YouTube, but will restrict viewership to those in the United States.

Bypassing Blocked Websites

If you want to access content that has been blocked in your location, there are a few solutions available to you. One of the most common ones is to use a proxy server. A proxy works by acting as an intermediary between your computer and the website that you are trying to access. As such, all the data is securely tunneled through it. Furthermore, the website that you are accessing will see the IP address of the proxy and not that assigned to your PC. Therefore, if you use a proxy based in the USA, websites will believe that you are in fact based in America, while in reality you could be found anywhere on the globe.


What Type of Proxy Do You Need?

You need private proxies. Some websites have free web-based proxies, or lists of public proxy servers that you can use with your browser. But these public, or shared, proxies come with numerous disadvantages. They may be very slow, which will render them practically useless for websites that have heavy amounts of content on them, such as video streaming sites or Flash game arcades. This is simply due to the fact that a public proxy will be shared by many users at the same time, thus rapidly becoming overloaded.

Proxy servers that can be found through public lists also tend to be quite unreliable. They could be working one day and gone the next. You may have to spend a lot of time finding another proxy that works and provides a decent connection speed, only to see it go down the next day.

Finally, as information on public proxies is widely disseminated online, many websites have resorted to blocking all connections coming from a known public proxy server. In many cases, these blocks have been put in place due to the actions of various miscreants, such as spammers and hackers, who use public proxies to conduct activities that violate the terms of use of the sites they access.

How a Private Proxy Works

[Diagram: How Bypassing with Proxies Works]

Benefits of Using a Paid Proxy for Bypassing

A private, or dedicated, proxy is one that you can rent all for yourself. This means that nobody else will be using the same proxy as you, which will give you a much faster connection when compared to a public server. Private proxies are available from all over the world, so you will have no trouble finding one that will have an IP address from the USA, Europe, Canada, or practically any other country on the planet.

How Can I Use the Private Proxies I’ve Purchased?

Using a private proxy is quite simple. All you need to do is to sign up with a provider that offers them, such as http://ghostproxies.com, and select a server based on your desired geographical region. You will then follow some simple instructions in order to configure your browser to route all of your online traffic through the proxy. Some providers also offer a downloadable app that will do the configuration automatically for you. Depending on the provider, you may also be able to use the proxy for more than just standard web browsing. This could allow you to play online multiplayer games, as well as using communications software like Skype, or file sharing tools such as BitTorrent, all while safeguarding your online anonymity and bypassing any geographical restrictions.

Why Paid Private Proxies are the Best Option

Private proxies are typically quite affordable, with prices starting at just a few dollars per month. Unlike free services, you will not be subject to endless popup ads while you browse the web, or have a frustratingly slow download speed. This makes the small monthly fee for a private proxy very well worth it.

Setting Up Proxies in the Top 5 Web Browsers


Today, using the Internet has never been easier. Now that most places incorporate Wireless Internet, you can just turn on your computer, click “Find A Network” and then you are connected! It is one of the easiest things to do.

Unfortunately, with this comes the ease of losing our privacy. While there are many ways to protect your privacy while using the Internet (such as Firewalls and password protection), more and more people are setting up proxies on their browsers.


What Is A Proxy?

“Proxy” is short for “proxy server”: a setup in which the user routes their Internet communication through another server. It’s similar to using a telephone, where you have one person trying to talk to another person through a third party (the telephone). Using this method protects people’s privacy by giving them anonymity. Just like a physical telephone can hide your identity in the real world, a proxy can hide it on the web. So a web user can continue to browse the Internet like normal, but any person or website on the other end will only have contact with whatever info gets sent to the proxy, and can only see the proxy’s information, not the web user’s.

How Do You Set Up A Proxy On A Browser?

If you decide you want to set up a proxy, it is not a difficult thing to do. However, the process varies depending on what browser is being used. Below are the instructions for setting up a proxy server on the top five most used web browsers:

Proxy Set Up Using Google Chrome


  1. First you want to find the Control Panel under the Start Menu in Windows, then go ahead and select Internet Options.
  2. Once the Internet Properties window opens, locate and select the LAN settings button.
  3. Select the box next to Use a proxy server for your LAN. Once that is done, you can enter proxy IP addresses and ports.
  4. When finished, select OK to save changes.

Proxy Set Up Using Internet Explorer


  1. Open Internet Explorer. Once it is up and running, look for Internet Options under the Tools drop down menu.
  2. Select Connections, and then hit the LAN Settings button (this is found all the way down at the bottom under LAN Settings Section).
  3. Once there, find Use a proxy server and click on it. Then select Advanced.
  4. Under Advanced you can enter the proxy IP address and port in their respective places.
  5. Once that is done, hit Use the same proxy server for all protocols and finish by selecting Ok.

Proxy Set Up Using Firefox


  1. Start by opening Firefox. Once open, go ahead and find Options under the Tools tab.
  2. Once in Options, select Advanced.
  3. Click on the Network tab, and then select Settings.
  4. Under Settings, look for the button called Manual Proxy Configuration.
  5. Once here, type any proxy server’s IP address under the HTTP Proxy field and any proxy port under Port.
  6. Check the box Use this proxy server for all protocols and then click OK.

Proxy Set Up Using Safari


  1. First, open up Safari. Under the tab labeled Safari find and click on Preferences…
  2. Under Preferences, select Advanced. Once in the Advanced menu, look for Proxies then select Change Settings…
  3. Under Change Settings, a separate window will pop up named Internet Properties. Find LAN Settings and select it.
  4. Once you click on LAN Settings, a window will open that says Local Area Network (LAN) Settings. Once here, go ahead and select the box in the Proxy Server block.
  5. Once you do this, Port and Address rows can be altered. Once you’ve made your additions or changes to both, select OK. Close all open windows and Quit Safari.
  6. The next time you open Safari, your new settings will be in place.

Proxy Set Up Using Opera


  1. When using Opera, you can find the Proxy server settings under Tools, then by clicking on Preferences, then the Advanced button, and then Network.
  2. If using Opera on a Mac, open Opera, then select Preferences, then click on the Advanced button, and then Network.
  3. Click on the Proxy servers button to find the Proxy settings.
  4. Once under Proxy Settings, you can select the specific protocols you want to use and then enter the server’s IP address. To make any changes to the port number, look for a box on the right-hand side of the Protocols list.

Conclusion

Now that you know how to set up proxy servers in the more popular web browsers, you can continue to browse the web a little more securely. Using this technique, combined with a few other security measures, should put the average Internet user more at ease when it comes to their identity and privacy.

How Many of The Internet’s Users are Fake


[Infographic: How Many of The Internet’s Users are Fake]

The old metaphor for the Internet was an information superhighway. Real highways have anywhere from two to a half-dozen or so lanes, so you might assume a superhighway has a few more than that. This was back years ago, though, before the web grew into the juggernaut it is today. The modern superhighway might be millions of lanes, all packed with traffic, people going to and fro at high speeds.

Can you imagine the traffic problems on such a highway? Billions of cars traveling at any given time, to and from billions of destinations. One bottleneck, one accident, and you have gridlock for lightyears.

The problem isn’t helped, either, by the incredible number of these metaphorical cars that don’t have drivers. These cars are the fake traffic that spreads around the Internet, coming from a variety of sources. You have your proxy users, people remotely controlling other cars so the metaphorical Internet police can’t locate the real driver. You have your spiders, robot cars with Google Earth cameras strapped to the top, taking surveys of the landmarks they pass. You even have the fake people, cardboard cutouts surrounding the skyscrapers that are Facebook, Twitter, Instagram, and the rest.

Bots Both Good and Bad

The Internet is just packed full of bots of various kinds: fake users riddle social networks and web forums, and bots comment on blog posts anywhere they can. Spiders crawl websites to index them or to look for changes. Malicious spiders look for vulnerabilities or opportunities for spam. “Fake” traffic comes from proxy URLs, redirects to hide the source of the traffic for good or ill.

That’s really the question, isn’t it? With so much bot traffic flying around and so many fake accounts made, how much of it is valuable? How much of it is dangerous? How much of it do people even realize is going on?

Well, we can at least answer the last question. We’ve been hard at work compiling statistics about fake users, proxy traffic and web bots, and we have it here for you in convenient infographic form.

Fake Users

Fake users on websites are admittedly a problem. Facebook, Twitter, Instagram and all the rest fight them by improving detection and removing fake accounts. They try to make it detrimental to accumulate fake users, but there’s not a lot they can do to fully fix the problem.

What about legitimate fake user accounts? Are there ever legitimate uses for them? I can think of a few. For example, you might want a fake account to test how your page looks from a user’s perspective. A Facebook Page Admin, for example, can look at their page, but they still see all of the insights and additional tools available to them. They would need an account without access in order to see how their page truly looks. phpBB, a forum software, solved this issue by giving admins the ability to view the site from the perspective of any user, so they don’t need to create an account specifically for that purpose.

Unfortunately, malicious and spam-centric fake users vastly outnumber the few legitimate uses for sock puppet accounts. It’s for reasons like this that some Asian games and social networks require valid national identity numbers to register.

Proxy Traffic

Proxy traffic, likewise, can be used for good and ill. Some people use proxies to get around IP blocks on forums and in games, or to make it look like their posts – possibly spam posts – come from a different location so they can’t be blocked easily. Proxies are even used to hide the source of hacks and DDoS attacks.

On the other hand, proxy traffic can be useful to test geolocation services, or to get around certain tracking and privacy-invasion software. Privacy is a huge issue on the Internet today.

Web Crawlers

Web spiders, of course, are the bread and butter of many online services. Any search engine either uses its own fleet of web drones or borrows its index from another engine that does. It’s not only a legitimate use, it’s a necessary one, if we want such services to remain functional. The Internet is so large and changes so often that it would be impossible to keep up with it in any other way.

On the other hand, you also have malicious web crawlers. You have crawlers that ignore search engine directives to index pages that otherwise would remain invisible to the public. You have crawlers that search for comment fields or contact lists to submit their advertising through, the front-line soldiers in the Captcha wars. You have similar bots designed to create fake accounts on forums, blogs and social networks. There are bots that steal content, bots that look for outdated software with known vulnerabilities, and more.

No one can say that these malicious bots are anything other than a problem, but there’s no good way to block bot traffic without blocking the beneficial traffic as well, at least not on an Internet-wide basis. Perhaps the first step to a better solution, at least, is an awareness of the scale of the problem.

Which Proxies Should I Use for GSA Search Engine Ranker?


GSA Search Engine Ranker is a powerful tool when used appropriately. When used poorly, it can come back to bite you in the ass, hard. Unfortunately, it’s incredibly easy to use poorly, particularly if you’re not used to the relatively arcane program and its features.

GSA works on a simple premise. One of the most potent of all forms of SEO is the backlink. In order to gain a powerful boost to your SEO, you need a number of valid, useful backlinks. The more the better (generally speaking).

Building these backlinks is a long and time-consuming process that partially involves luck and persistence. It’s also where many young webmasters make their first dangerous mistakes, falling victim to thin schemes on Fiverr or a black hat web forum. They end up with a ton of spam backlinks, which hurt their site overall.

Ideally, you would find a way to automate this process, without risking black hat penalties or linkspam flags. GSA is a tool that does just that. It’s essentially an automated scraper and link submission tool, though using it specifically for website scraping is a terrible idea. There are better scrapers, specifically scrapers that have additional useful features.

Most similar pieces of software require you to build a manual list of sites to submit to, but GSA allows you to build one through the program. More importantly, that database is dynamic and ever-changing. When a site dies or is delisted, making it useless to you, it’s removed from the list. When a scan shows a new site that’s a ripe target for a link, it’s added to the list. You can even rank sites in different value tiers, so you can put your careful and valuable links on the high tier sites.

Proxies Matter


Here’s the thing about GSA Search Engine Ranker, and why it’s risky for inexperienced users: it throws links and content on a wide variety of sites, very quickly. If you’re not using a content spinner – bundled with the software – and a list of proxies, it looks like hundreds or thousands of posts with similar or duplicate content are all coming from the same user. This brings up a number of problems.

First, the lack of content spin means you immediately cause duplicate content penalties, all of which point back to your site. This is an immediate Google penalty, because Google hates duplicate content.

Second, when all of your submissions come from the same IP, it’s easy for that IP to be banned. This temporarily kills the utility of GSA until you’re able to use a different IP, which will quickly run into the same problem. This is where you need proxies.

With a proxy list, GSA is able to post these automated links from a wide range of different geographic locations. Of course, it’s all originating from your computer, but no one but you knows that.

Proxy Quality


However, you can’t use just any old proxy list. Proxies are notoriously fickle, particularly free proxies. For one thing, they’re prone to dying or drying up without warning. They’re also prone to high latency when a lot of people are using the same public proxies at the same time.

Some public proxies also impose advertising or overlays on your content, which can hinder the operation of a tool like GSA.

You also have the issue of public proxy lists ending up on site blacklists. A proxy IP does you no good if it can’t post to the sites you want to post to, according to your scraped list.

This is why you need a private proxy network. You need a large number of proxies, all of which are low-latency, do not impose interstitials or overlays, and are guaranteed to work for submission. When you pay for a proxy list, you’re also paying for upkeep; if a proxy dies, a new one will take its place. This allows you to maintain the efficiency of GSA without issues.

Here’s a good tutorial for configuring a proxy list in GSA, though you do need to actually purchase access to a private proxy list if you want to benefit from it.

There’s one other thing to consider with your proxies; you need enough on your list to handle the volume of links you’re trying to generate. This is where threading comes in. GSA allows you to set up individual processes running in threads on each proxy, automatically of course.

Essentially, this allows you to divide your link creation amongst a variety of proxies. If you want to create 100 links and you only have one proxy, 100 links must be made through one IP; this is bad. If you want to make 100 links and have 50 proxies, you only need to make two links per proxy. This is much better. You don’t need to go all the way down to one link per proxy, though; that just wastes proxy capacity you’re paying for.

An ideal thread ratio is somewhere around two or three. That is, two or three links created per proxy.

A Note on Penguin

Using GSA improperly can and will penalize your site dramatically via the Penguin algorithm. The last thing you want is to fall victim to such a penalty; it can essentially kill your site for months until you disavow or remove all of the links you made with GSA. You need to be very careful to make your links look natural. This means:

  • You need to set up partial match keywords, so you’re not spamming the same anchor text everywhere.
  • You need to set up variations on anchor text and keyword usage so no two links look the same out of a given handful. Occasionally some duplicates are fine, but the more dupes you have, the worse off you are.
  • You need to vary up your content, typically via spinning. This will require some manual labor, though you can pay a freelance writer to create your spun content.

You should also try to avoid putting your links on sites that are way out of your niche, or sites that are clearly spam. The worse a site is, the less valuable a link is for you.

How to Use Multiple Proxies with CURL


When many people think of web proxies, they think of borderline illegal activities. There’s something of a bad reputation generated by spambots, black hat SEOs, and 4chan hackers. When proxies are associated with DDoS attacks and spam, it’s hard to talk about them openly.

There are plenty of legitimate uses for proxies, though, and one of them is data mining. The web is packed full of awesome data resources, but a lot of them are hidden behind gates through which you need to make requests. Some of them, of course, put paywalls in these gates to make money. Others simply use them as a way to filter traffic to prevent excessive access.

These gates are the bane of every data miner. You want a large quantity of data from the publicly accessible database. You try to access it, and the first dozen or so requests work fine. Then you find your page timing out or your requests bouncing due to rate limits and excessive use bans.

Avoiding IP and Account Bans

When a server like this bans a user for excessive use, it’s either an account ban or an IP ban. Account bans are difficult to deal with. After all, you need to make a new account and jump through all of the hoops to verify it. I don’t recommend trying to use the proxy method to get around this, though if you have a script to fill out and verify profiles, you can do so easily enough. Excessive account creation and usage in sequence is likely to lead to an IP ban anyways.

An IP ban, obviously enough, blocks the IP address you’re using to connect to the Internet. If you want to get around it, you need to access the offending website from a different IP address. This is where web proxies come in.

The point of a web proxy, I’m sure you know, is to anonymize your traffic by shuttling it through a third party server. It’s like using a virtual assistant to call for take-out, or a chauffeur to drive you from place to place. You send your traffic request to the proxy, the proxy ferries it on, it gets the results, and delivers them back to you.

Rate Limits and Usage Limits

There are a few problems with using proxies in this manner. For one thing, public proxies tend to have either rate limits or usage limits, typically alongside a bandwidth limit. If you’re making a lot of small requests, or several large requests, your operation is likely to time out partway through the process. This is extremely frustrating, as you might imagine, particularly when trying to complete single large requests that can’t be resumed easily.

Some proxies also render your content in an iframe or use interstitial ads on every third or so request, in order to make some money. Depending on the automatic system you’re using to harvest data, this can add junk fields or break your script.

There are also all of the usual issues with using a proxy server, which is almost always located in a foreign country. This adds latency to the requests, making your operations take longer. Proxies are notoriously fickle and prone to dropping requests, all of which cause issues.

Benefits of Private Proxies (Versus Shared Ones)

Private proxy lists alleviate many of these issues. For one thing, you’re typically paying to access the proxies, so those proxies are clean – that is, they don’t have to add advertising to your data stream to make money. They have lower volume and can support a more consistent throughput. You’ll have fewer timeouts, fewer lost requests, and less latency.

Of course, that only alleviates the issues with the proxy; you still have the issues with the IP bans from the site you’re harvesting. You can jump from proxy to proxy each time one gets banned, but that’s tedious. Data mining requires as much automation as possible, because the real fun begins with analysis rather than the harvest.

Retrieving Data with CURL

This is where CURL and PHP come together to solve the problem. What you do, in this case, is set up a script that rotates through a variety of proxies in a list. You have a list of requests, a list of proxies, and a target site. You make request A from proxy A, then request B from proxy B, and so on down the list. At some point, say request L, you loop back to proxy A because request A has been done for a while.

The idea is to have a rotating list of proxies to handle your requests in a more organic, low rate manner. To the database site, it looks as though a dozen or so users are just making normal requests. Not only does this distribute the load of the requests you’re putting on the database; it also cuts the time it takes to process your requests significantly.

This distributes the load throughout numerous IP addresses, so it’s harder if not impossible to ban them all. If one does receive a ban, you just add another proxy to the list to replace it. In no time, your harvesting will be complete and you can leave the poor database alone to contemplate its role in life.
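
To make the idea concrete, here is a minimal sketch of that rotation in PHP with CURL. The proxy addresses and request URLs below are placeholders, and a real script would add the retry logic and error handling discussed next:

<?php
// Placeholder proxy list and request queue; replace with your own.
$proxies = array('1.2.3.4:8080', '5.6.7.8:8080', '9.10.11.12:8080');
$requests = array(
	'http://example.com/data?page=1',
	'http://example.com/data?page=2',
	'http://example.com/data?page=3'
);

foreach ($requests as $i => $request) {
	$handle = curl_init($request);

	// Rotate: request 0 uses proxy 0, request 1 uses proxy 1,
	// wrapping back around when the proxy list runs out.
	curl_setopt($handle, CURLOPT_PROXY, $proxies[$i % count($proxies)]);
	curl_setopt($handle, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($handle, CURLOPT_TIMEOUT, 15);

	$data = curl_exec($handle);
	curl_close($handle);

	// Process or store $data here.
}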

Example Script to Try

I like this particular script, as I said, in PHP and CURL. The script allows you to fill a file with proxy addresses and submit your requests. It also automatically includes retry logic, so when a request fails, it tries again rather than hanging or skipping to the next request. It’s surprisingly efficient the way it’s programmed, so it doesn’t overload your CPU, either.

If you’re using macOS or Linux – a POSIX system – you can run the script directly. Windows systems will need a PHP interpreter to run it. You can see examples of the specific usage for downloading files in the readme on the GitHub page.

If you want to do something more advanced than downloading simple files, you’ll need to add a small script to run to make that request. You’re on your own for that one.


The Ultimate Guide to Scraping Craigslist Data with Software


Craigslist is a notoriously difficult site to use for data harvesting, because of how they have everything set up. There’s no easy way to scrape data, at all. On most commerce, database, and social sites, the developers provide an API for power users to scrape data and output it in a format they want. For example, look at how much documentation Facebook has for their API. You can pull practically any Insights data from a page you own, and you can pull a bunch of public data from pages you don’t own. It’s all surprisingly simple, even.

Craigslist is a special case. They have an API, but it functions in reverse. Facebook’s API allows you to pull data, but does not allow posting. You need to use apps for that functionality. The Craigslist API allows you to post, in bulk if you want, but it doesn’t allow you to pull read-only data.

It’s quite a backwards implementation, but it makes a certain amount of sense from the Craigslist point of view. They gain a benefit from allowing businesses, particularly real estate managers with large numbers of properties, to post in bulk via a simple API. On the other hand, they gain nothing by allowing third parties to scrape data and, presumably, display it on a non-Craigslist site. Even if all you want to do is run some data analysis, it’s just that much more stress on their servers for which they gain nothing.

Craigslist does have RSS feeds you can subscribe to in various subsections and regions of the site. These are available for personal use, but if you try to use them to harvest data in bulk and use that data elsewhere, you’re likely to have your access blocked. Craigslist spells all of this out, flat out, in their terms of service.

What does this all mean? It’s pretty simple to break down.

  • You can only access Craigslist via a web browser or email client.
  • You can only post to Craigslist using a web browser or their bulk posting API.
  • You cannot scrape data with a spider, crawler, script, or bot of any sort.
  • You cannot harvest user personal data or contact information.

Additionally, of course, there are the basic anti-spam measures as well. In short, the entire focus of this article – scraping Craigslist data using third party software – is against the CL terms of use.

Scraping Legality


Why do I bring this up? Two reasons, primarily. One is obvious enough: we’re a site that provides proxies, and proxies are essential to this process. The other is a basic warning. Anything you do while following these instructions is on you. You now know, going into it, that it’s against the terms of use for the site. You are thus liable for anything that happens, from having your access blocked or your posts removed to having your IP banned. You could potentially even be subject to legal action.

Craigslist has, in the past, even taken that legal action. It all depends on the scale of your scraping, of course, and the usage of the data you harvest. Data analysis is more or less fine. Commercial use, particularly commercial use that steps on CL’s territory, will enrage the beast.

The most notable instance of this was the recently settled legal fight between Craigslist and 3Taps, the company behind the Craigslist-scraping 3Taps API.

Essentially, 3Taps created a Craigslist data harvesting API. They partnered with Padmapper, a company that used the real estate data harvested from Craigslist and overlaid it on a map. This produced a real estate availability map, which is honestly a very useful function, and it’s amazing that Craigslist hasn’t made something of the sort on their own. That’s for the next section, though.

Craigslist obviously didn’t approve of having the data from their site used against their terms of service on a third party site. They started a legal suit against both 3Taps and Padmapper, which began as early as June of 2012, and was only just settled in June of 2015. Both sites were required to stop harvesting data, and 3Taps paid Craigslist a tidy million dollars.

While 3Taps and Padmapper both still exist using data from non-Craigslist sites, the settlement hurt, and it’s just one example of what could happen if you try to scrape CL data and use it commercially.

The primary mistake these businesses made was ignoring when CL sent out a cease and desist letter and banned their IPs. They continued to circumvent those restrictions and scraped data, which in turn led to further legal action. My recommendation? If you get a C&D letter, comply. It’s probably not worth it to you.

Issues With Craigslist

Craigslist is a site with a lot of issues. It debuted in the mid-1990s, but how much has it changed since then? They have had a few major updates over the years, but just compare the current design to an Internet Archive snapshot of the site from its early days. It’s hardly changed at all. It’s centered rather than left-aligned, and it has a bit better coloring and spacing, but it’s largely identical.

The user interface hasn’t changed much, but it has obscured more data than it used to. These days, you see three types of ads posted.

  • Ads with plaintext contact information. These are usually posted by businesses looking to get people to contact them. These businesses have staff to answer the phones, and thus weed out unsavory callers.
  • Ads with obfuscated contact information. These are the people who post personal ads and post their phone numbers with a format like (five,,,,5,,,5) 1two….three-four56’’’’7. They do this so a human can, with a bit of difficulty, parse the phone number, but a bot finds it impossible.
  • Ads with no contact information. If you want to contact the poster of the ad, you need to send an email to the anonymized email address provided by Craigslist as a forwarding address. You see nothing of the poster, but they see your return address and are free to respond in kind.

Beyond that, there are issues with what is and isn’t allowed on CL these days. Post titles are free to include all sorts of Unicode symbols, and in fact using them is often more effective than not, because normal text headlines don’t stand out. This also presents a problem to scrapers, which need to figure out how to parse these special characters, or remove them altogether.
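
If your scraper is code-based, stripping those symbols can be a one-line job. Here is a hedged PHP sketch; the function name is our own, and the pattern simply discards everything outside printable ASCII:

<?php
// Strip everything outside printable ASCII from a scraped title.
// Assumes the input is valid UTF-8.
function cleanTitle($title) {
	return trim(preg_replace('/[^\x20-\x7E]/u', '', $title));
}

echo cleanTitle('★★ Great Apartment!! ★★'); // "Great Apartment!!"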

And, of course, there’s the ongoing problem of spam. This isn’t so much a problem in the more “serious” sections, like real estate, which are somewhat heavily moderated. Rather, it’s a problem in more personal sections, like Free, Jobs, and the entire Personals category.

Oh, CL does have anti-spam measures. Sometimes they require phone verification. They have a posting limit, excepting the bulk post API, which only works in certain sections. They have an automated system to lock out people who break the rules. None of it works.

The worst part is, Craigslist was making moves to improve the flexibility and viability of the site, a few years ago. You could use a lot of HTML to customize your postings, to make the thin site itself look more robust and to provide more information in better ways. In 2013, Craigslist removed these features, returning the site to its basic black and white look. They called it Hurricane Craig, because web monitors and marketers are nothing if not overdramatic.

There’s only one benefit to Hurricane Craig, and that’s the fact that it standardized a lot more of the data in posts. It makes it much easier for a robot to pull data from a browser window, rather than needing to find and parse data in code based on certain criteria. So, good for you, Craigslist; you made it easier for us to do what you don’t want.

Why You Might Scrape Craigslist


What possible reason could you have to scrape Craigslist data? Well, there are a lot of different reasons.

On the analytical front, you could always just want to harvest data to write a report. Investigative journalism still exists, rare as it may be these days. You might want to scrape all of the posts in a given section and analyze things about them, like average prices for products, or frequency of posting, or comparing type of item with how hard it is to contact the user. None of this is profitable, of course; it’s just information for you to use in other ways. Honestly, I think Craigslist would be fine with this, and I think you’d be safe doing it, because they wouldn’t win a court case over it. Of course, I’m not a lawyer, so take that with a chunk of salt.

On the personal front, you could harvest data for information you want to use. If you’re shopping for used cars, for example, you might want to harvest all of the data on used cars to correlate prices, locations, and make/model information about the vehicles so you have one central location to browse through. As useful as Craigslist can be, their browsing and filtering kind of sucks.

On the profitable front, you can scrape data for something you would like to buy and resell. One common target is concert and event tickets; you can monitor events that are sold out, scrape Craigslist to locate tickets for those events being sold, buy up any below a certain price point, and resell them for more elsewhere, like eBay. This does of course rely on a lot of personal effort, but hey, some people will do a lot to make a few bucks.

On the commercial front, you can use it to generate leads. You could scrape the Wanted section for anyone who is searching for a service or item you provide, and then reach out to them to sell your product. It’s probably not a very efficient means of generating leads – possibly no more effective than posting a selling ad in the first place – but it’s there.

Of course, all of this relies on your willingness to violate the Craigslist terms of service. I highly recommend avoiding any overt commercial usages. Going the route of Padmapper opens you up to all the same possible legal damages, and there’s already a legal precedent for the arguments that can and cannot be successful.

Scraping Data from Craigslist

The exact method you use for scraping data will, unfortunately, depend a lot on the tool you decide to use. The general process will look something like this.

Step 1: Pick a Tool

The first step is to pick a tool you would like to use to scrape Craigslist. You can, if you want, develop one yourself. It’s an interesting exercise if you’re a coder. If you’re not, well, there’s no reason to bother making one when so many different tools already exist. Here’s a rundown of a few options, though they are by no means all the options available.

  • Cloud Crawler – This crawler is a web spider working specifically in the cloud, which makes step 2 a little unnecessary. It is, however, quite difficult to use. There’s not much documentation for it. It’s good if you want to experiment with coding but don’t want to develop a scraper from scratch. On the plus side, it’s a free open source project.
  • Visual Web Ripper – Where Cloud Crawler is coding raw HTML in a notepad txt file, Visual Web Ripper is Dreamweaver. It’s a very user-friendly, graphical web ripper that allows you to point at the information you want to scrape, and the program does the rest. It has video demonstrations, it has a fancy website, and everything. It does have limitations, however. The free trial only scrapes up to 100 elements on a website, which can be bogged down by scripts and code, and it’s only available for fifteen days. The full version is very expensive: a license, including lifetime upgrades, runs $350.
  • Craigslist Scraper – This tool is a much smaller scraper, and it’s limited to just scraping personal information – a double whammy of ToS violations. It exports nicely into a CSV, at least. It’s also much cheaper than Visual Web Ripper, at only $50.
  • Python Craigslist Scraper – This is another open source code scraper, but it’s a little easier to use. Free, as with anything on GitHub, it’s coded in one of the easiest languages to learn. It’s possibly the most popular free CL scraper out there.
  • Scrapy – This is, in my opinion, one of the most useful, robust, and legitimate scrapers out there. It’s billed as an all-purpose web crawler, so you can use it for a lot more than just Craigslist. It’s also much less limited, it’s easy to configure, and it’s free. Really, I just saved the best for last.

The best part about Scrapy is the documentation. For example, if you want to scrape Craigslist, you can follow this tutorial which was built around scraping nonprofit jobs in a specific area. It may look a little intimidating, but it’s really not that bad.

Step 2: Use Proxies Whenever Possible

Remember how I mentioned Craigslist is pretty aggressive about stopping scrapers? Proxies are a solution. Their only way to identify a scraper is to notice that the same IP address is accessing page after page, very quickly. They can't even tell what that user is doing; it could be a legitimate crawler, like Google's. I'm sure they have whitelisted Google, but they won't whitelist you.

Proxies work by funneling traffic through a rotating selection of web servers, hiding the origin point from the website. Instead of seeing one IP visit a hundred pages in a row, Craigslist would see 20 different IPs visiting 5 pages each. That's a much more reasonable number, and it's not going to get you restricted.
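Scrapy handles this with Python middleware, but the underlying idea is simple enough to sketch with PHP's cURL, in the same spirit as this blog's other examples. Everything here is hypothetical – the proxy addresses and the page list – and your real parsing logic would replace the final comment:

$proxies = array('203.0.113.10:8080', '203.0.113.11:8080', '203.0.113.12:8080');
$pages = array(); // fill with the URLs you plan to scrape

foreach ($pages as $i => $page) {
	// Cycle through the list so each proxy IP only fetches a few pages.
	$proxy = $proxies[$i % count($proxies)];

	$ch = curl_init($page);

	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 10);

	$html = curl_exec($ch);

	curl_close($ch);

	// ... parse $html here ...
}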

Granted, you need to work out how to filter your scraper through a proxy. Scrapy has some documentation about it, but it’s up to you to vet the code and get it to work with your configuration.

Step 3: Harvest and Collate Data

Once you have your scraper set up and your data ready to be collected, just run it and collect the data. Chances are, it will be output into a CSV file, which can be opened in any spreadsheet program, like Excel or Google Sheets. Go through the data and do with it as you will! I'll caution you again not to make a public commercial use out of it. Craigslist is much more likely to send the C&D lawyers after you if you do. Personal use is a lot safer; the worst they can do is block your IP, which won't matter if you're using a proxy.
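If you'd rather process that CSV in code than in a spreadsheet, PHP's built-in fgetcsv() makes short work of it. A minimal sketch, assuming a hypothetical results.csv with a header row:

$handle = fopen('results.csv', 'r');
$header = fgetcsv($handle);

while (($row = fgetcsv($handle)) !== false) {
	// Pair each value with its column name from the header row.
	$listing = array_combine($header, $row);

	// ... filter, count, or store $listing here ...
}

fclose($handle);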

The post The Ultimate Guide to Scraping Craigslist Data with Software appeared first on GhostProxies Blog.

7 Things to Know Before Scraping Amazon Product Results


Amazon Scraping

There are a lot of reasons you might want to scrape data from Amazon. As a competing retailer, you might want to keep a database of their pricing data, so you can try to match them. You might want to keep an eye on competitors selling through the Amazon Marketplace. Maybe you want to aggregate review scores from around the Internet, and Amazon is one of the sources you’ll want to use. You could even be selling on Amazon yourself, and using the scraper to keep ahead of others doing the same.

I don’t recommend some of the more black hat uses for data scraping. If you’re scraping product descriptions to use for your own site, all you’re doing is shooting yourself in the foot as far as SEO is concerned. You should avoid basing your business model on scraped Amazon data; more on that later.

There are a lot of pieces of software out there designed to help you scrape Amazon data, as well as some that are general use screen scraping tools. You can get a lot of mileage out of them, but always exercise caution. You’ll want to be very sure of the validity of a piece of software before you drop $400 on it. Oh, and if the product you’re researching is primarily marketed through low-view YouTube videos with affiliate links in the descriptions, I recommend staying away.

On the other hand, you can use scripts rather than packaged software. Scripts have the benefit of being infinitely configurable, and since a script is its own source code, it's open by nature. As long as you have some idea of what you're doing with the code, you can read it to make sure there's nothing tricky going on, and you can change it to work exactly the way you want it to. Of course, that relies on you having enough code knowledge to create and change a script. Knowing PHP, cURL, XML, JavaScript, and other web technologies is a good idea.

If you’ve looked into this before, you might ask yourself why you would want to use some third party scraper or a script you barely understand when you could just use Amazon’s API. It’s true that, for some purposes, Amazon’s API is a good alternative. However, it doesn’t provide you with all of the data you might want. The API is primarily designed for affiliates to use it for advertising in a custom way, using some method that isn’t covered by affiliate links or the product box widgets Amazon provides. There’s a lot of data it won’t let you harvest.

Technically, scraping data has been against Amazon's policy for a long time. It wasn't until 2012 that they really started enforcing it, however, so a lot of people got away with scraping data for years. When enforcement finally came, many people considered it an insulting disruption to their business model, never acknowledging that they were never in the right to begin with.

The point I’m trying to make here is that if you’re scraping data from Amazon, what you’re doing is against their terms of service. That means you’re always at risk of numerous penalties. Usually, Amazon just shrugs and bans your IP. However, if you’ve been an especially tenacious pest or are using their data in a way they don’t approve of, they are perfectly within their rights to take you to court over it. This is, obviously, something to be avoided.

All of that said, Amazon seems to have slackened up in recent years. This thread from 2014 indicates that Amazon doesn’t bother with enforcing low-scale scraping blocks. They have automated systems that will slap you with a ban if you cross their path, but they aren’t actively and persistently seeking out and banning all data scrapers. It makes sense; a retailer of their size has so much data to filter through on an hourly basis that it would be impossible to ban every single data scraper.

Before you continue, here are seven things you should know about making Amazon the target of your data scraping. By keeping them in mind, you should be able to keep yourself safe from both automated bans and legal action.

1: Amazon is Very Liberal with IP Bans

Amazon Banned

The first thing to keep in mind if you're going to be harvesting data from Amazon is that Amazon is very liberal with their bans. You won't be harvesting data while logged into an account, at least, not if you're smart. That means the only way you can be banned is through an IP ban.

The nice thing is that IP bans can be circumvented. The bad thing is that an IP ban from Amazon is not a punishment to be taken lightly. As far as I know, such bans are permanent.

There’s a prevailing attitude across the web, at least among tech circles, that IP bans are ineffectual and inefficient. They can also be detrimental to the site that uses them, depending on how broad an IP ban they use. You don’t want to ban a whole IP block; you might be eliminating an entire neighborhood from your potential customer base. At the same time, anyone dedicated to getting around an IP ban can do it with relative ease.

This is, more or less, true. IP bans are easy enough to get around, specifically if you’re expecting and preparing for a ban in the first place. That’s what proxy servers are for.

A proxy server, in case you aren’t aware, is a way to filter your IP address. The website, in this case Amazon, will see your connection as coming from the proxy server rather than your home connection. If they ban you, they ban the proxy, and you can just use a different proxy.

Therefore, in order to maximize your chances of successfully harvesting all of the data you need on an ongoing basis, you’ll need numerous proxy servers. You want to be able to cycle through them to avoid any one IP being flagged for bot-like activity. You also want to have backups in case any of your proxies are banned, so you can keep harvesting without issues.

2: Amazon is Very Good at Detecting Bots

The number one mistake that scrapers make when harvesting data from Amazon, or any other site with a high profile and a plan to ban scrapers, is using their scraper software without configuring it properly.

Think about it. If you were tasked with detecting bots and filtering them out from legitimate traffic, what would you look for? There are the simple things, like the user agent and whether or not it identifies itself as a bot. Those are easily spoofed, though.

A more accurate way of detecting bots is by their behavior. A poorly programmed bot will try to make as many requests as possible as quickly as possible, or will make them on a fixed timer. Bots are, by definition, robotic. They repeat actions, they make the same set of actions in the same order with the same timing again and again.

Amazon is very good at distinguishing between bot actions and human actions. Therefore, to avoid your bots being banned, you need to mimic human behavior as much as possible. Don't be repetitive. Don't be predictable. Vary your actions, your timing, and your IP. It's harder to identify a bot when it only accesses a couple of pages. From your end, you have an unbroken stream of data; from their end, a hundred different users came and performed in a similar way. Safer for you, harder for them to handle.
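In code, "don't be predictable" largely comes down to randomizing your delays. A minimal sketch, assuming a hypothetical fetchPage() helper that performs the actual request:

foreach ($pages as $page) {
	$html = fetchPage($page); // hypothetical cURL wrapper

	// Wait a random 2-9 seconds between requests instead of a fixed timer,
	// so the gap between hits never repeats exactly.
	usleep(mt_rand(2000000, 9000000));
}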

3: You Absolutely Must Follow Laws and Keep a Low Profile

There are some regulations that apply to bots of all sorts, including content scrapers. I hesitate to call them laws, because there’s very little actual legal precedent about all of this. One of the big high profile cases, though, is Craigslist versus Padmapper. Padmapper took Craigslist data for real estate listings and laid it over a Google Maps interface. This is a decent business model, and Padmapper still exists today, but Craigslist took offense to the use of their data in a way that didn’t benefit them.

In the end, Craigslist won that case, though it didn’t go to a court judgment. Instead, it was settled out of court. This is a good lesson to take to heart. If you step on the wrong toes – and don’t comply with cease and desist letters immediately – you can be the subject of legal action, and you’re usually in the wrong. You can read about that case and others here.

Data scrapers on their own do not violate any laws unless you're harvesting private data, or you're harvesting at a rate that disrupts the operation of the site, the way a DDoS attack would. Your scraper must act like a public visitor and cannot access internal data or data that requires an account to access.

Otherwise, all of the restrictions placed upon you are more about the way you use that data, rather than the way you obtain it.

4: Never Sell Scraped Data or Use it to Make a Profit

Selling Scraped Data

I say “don’t use it to make a profit” but what I really mean is don’t use it as a foundation of your business model. Harvesting pricing data so you know what deals exist and what price point you can use to undercut people is fine. Harvesting product data that you use as your own to sell your own products is not.

I mentioned above that you shouldn’t copy product descriptions, because you’ll end up shooting yourself in the foot. This is because of Google’s algorithm, which heavily penalizes copied content. Google knows, obviously, that the product descriptions originated on Amazon. When they see your content, they’ll penalize it, because it’s just low-effort copying from a bigger retailer.

Essentially, there are three core rules you should abide by when using the data you scrape.

  1. Never harvest or use data that’s not normally open to the public without an account.
  2. Never sell harvested data or make some attempt to profit off of it via a third party.
  3. Never base your business model on the data you scrape from any site, Amazon included.

The first rule is very important, because it protects you from the issues that come up with data privacy. That’s the kind of thing that can really get you in trouble if you violate it, and it’s the kind of thing that has actual laws attached.

The second and third are just ways to hide the fact that you scraped Amazon data and to make it less likely that Amazon will target you with any sort of legal action.

5: Always Review Scraping Software Before Using

This is just a general tip for any time you’re getting software from online, particularly in a gray hat or black hat arena. Things like scraping software may not be illegal, but they have a bad reputation, and as such are often the targets of malicious agents.

The first thing to do, before you buy, is investigate. Make sure that there are positive reviews that don't look like they were paid for by the scraper software developers themselves. You should also look into pricing. There are high quality scrapers that cost a ton, but there are also pretty good scrapers for cheap or free.

If you’re getting a script or open-source code, you’ll want to look into the code yourself or pay someone to give it an overview for you, so you know what the script is doing. You don’t want to scrape data and save it to a database only to find that the scraper script is also sending the data to a remote location.

This is even more important if you’re using a scraper that requires you to log in, either with credentials for Amazon or credentials for anything else. Always assume that any password you give to the scraper is stolen, unless you verify yourself that it’s not going to be.

Also, scan the app for viruses. Embedding a virus in a program generally makes antivirus programs block the download, but some will hide it well enough that only a detailed scan will work.

6: Implement a Limit on Queries per Second

This is something else I mentioned earlier, but I’ll go into more detail now. There are two reasons to avoid making too many queries in too short a span.

The first reason is that you’ll be detected as a bot. As I said, it’s very easy to be seen as a bot when you’re making identical requests on different URLs with the same timing, over and over. It’s a sure-fire way to get your proxy banned.

The second reason is to avoid being accused of DDoSing the database you’re harvesting. It’s not very likely that a bot or script you’re using will have the power to even slightly lag Amazon, but that doesn’t mean they won’t filter your requests based on DDoS protection if you make too many too quickly.
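Implementing the limit itself only takes a few lines: track when the last request went out and sleep off the remainder of your minimum interval. A rough PHP sketch, where the two-queries-per-second budget and the $queries list are assumptions for illustration:

$maxPerSecond = 2;
$interval = 1.0 / $maxPerSecond; // minimum seconds between queries
$lastRequest = 0.0;

foreach ($queries as $query) {
	$elapsed = microtime(true) - $lastRequest;

	if ($elapsed < $interval) {
		// Sleep off the rest of the interval before sending the next query.
		usleep((int) (($interval - $elapsed) * 1000000));
	}

	$lastRequest = microtime(true);

	// ... send $query through your scraper here ...
}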

You could also consider your own system requirements to be part of the possible issues. There are only so many operations at a time your computer can handle; a script making too many requests can overload a network card or modem, and saving all of the data could lag your hard drive if it's not sufficiently fast.

Of course, there’s also the issue with proxies. They don’t always allow a high throughput, or they might not have a significant amount of lag. That’s the problem with using most proxies; they’re located overseas, which adds significantly to response times.

7: Rotate Your Proxies to Avoid Detection

Proxy Rotator

This is, again, something I mentioned to a minor degree earlier. You’re going to want a number of different proxies, so as to spread out your requests and make them look much less suspicious. That means buying access to lists of proxies, or a site that will give you access to a lot of them at once.

You should also consider using private proxies rather than public proxies. Private proxies don't have any of the issues that plague public proxies. For one thing, they don't need to interrupt your browsing with ads in order to make a buck. They're much less laggy due to the limited number of users allowed to access them. They're also much less likely to be banned already. Public proxies are the low-quality solution for people who don't know what they're doing. Private proxies are the answer for those who demand higher levels of quality from their scraping.

The post 7 Things to Know Before Scraping Amazon Product Results appeared first on GhostProxies Blog.

The Internet Ran Out of IP Addresses [INFOGRAPHIC]


The Internet Ran Out of IP Addresses

The Internet has reached a crisis point, and no, I’m not talking about America’s fight with Net Neutrality, China’s great firewall, the censorship in Egypt, or any of the other localized crises. I’m talking about the crisis of IP addresses, one you’ve probably not heard about.

The Basics

First, let’s start with the basic information to catch you up to speed, in case you’re not totally aware of the problem already.

What is an IP address? It’s a number that is required to use the Internet. Your computer has one when you connect, and every server you visit – every web page – has one. You never see either of them unless you’re troubleshooting or looking to use it for some reason, though. That’s because of DNS.

DNS is the Domain Name System. It’s essentially a massive database that associates every domain name with an IP address. When you type in www.google.com, DNS looks up what IP address that associates with, and sends your browser to that IP address to look up the server and pull data. Now, Google – and most large websites – has more than one IP address. In fact, they have quite a number of them, which you can see here.
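You can watch DNS do this yourself, for what it's worth: PHP's dns_get_record() returns the A records a hostname currently resolves to (the addresses you get back will vary by time and location):

// Look up the IPv4 (A) records for a hostname.
$records = dns_get_record('www.google.com', DNS_A);

foreach ($records as $record) {
	echo $record['ip'] . "\n";
}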

An IP address is a string of characters. There are currently two different formats for these addresses: IPv4 and IPv6.

IPv4 is the IP address we all know and love. It's a string of four values separated by dots, each ranging from 0 to 255. 0.0.0.0 and 255.255.255.255 are the outer edges, but any combination of numbers in between can exist. 192.168.1.1 is a common IP address that is assigned to many home networking routers, for example. 127.0.0.1 is the IP address assigned to "localhost," which is a loopback to your own computer. No matter what computer you're using, resolving that will access the computer you're using.

There are 4,294,967,296 possible IPv4 addresses – that's 2^32, or four octets of 256 values each. That means there are about 4.3 billion different combinations, after which no more combinations can be made.

Now, there are around one billion websites active right now. A huge number of those are parked domains, but that doesn’t matter; a website online is a website with an IP address. However, that number is not an accurate count of the number of used IP addresses. Remember Google? They have thousands of possible IP addresses at their disposal. Many large websites and corporate web services have equally large swaths of IP addresses allocated to their use. Not all of them will work or will lead to unique pages, of course.

At the same time, if you’ve ever read about shared web hosting, you know that multiple websites can be hosted on the same IP address via virtual servers. So one singular IP address can host multiple websites.

So where’s the crisis? We’ve reached it. In fact, we reached it earlier this year. Every single IPv4 IP address has been assigned to some website, service, or country for their use. They’re done. Gone. Used up. No more IPv4 addresses exist.

If this were the only situation, the only type of IP address available, it would put a finite limit on the number of websites that can be accessible on the Internet. To squeeze out more pages, companies and countries would need to give up some of their allocated IPs for the common rabble.

Thankfully, there’s a new system in place, and it has averted the crisis to a certain extent. However, not all ahs gone as planned.

IPv6

IPv6 is the alternative to IPv4, and once I explain what it is, you’ll understand why. Compare these two numbers:

  • 4,294,967,296
  • 340,282,366,920,938,463,463,374,607,431,768,211,456

One of them, as you may recognize from higher up in this post, is the number of available IPv4 addresses. The other is the number of available IPv6 addresses. Quite a difference, huh? That’s because of the way IPv6 is formatted. Instead of the 000.000.000.000, you have something that looks more like this: 0000:0000:0000:0000:0000:0000:0000:0000.

Another way that IPv6 increases availability is by making each digit a hexadecimal character, rather than just a numeral. Each 0 above can be anything within (0123456789abcdef). There is simply an astonishingly greater number of available IPv6 addresses than IPv4 addresses.

Additional Features of IPv6

IPv6 has a few other differences from IPv4. For one thing, addresses can be truncated by removing leading 0s in each group. Additionally, one run of consecutive zero groups can be replaced with a double colon, ::, once per address. To cite a couple examples from Wikipedia:

  • The address 2001:0db8:0000:0000:0000:ff00:0042:8329 can have leading zeroes removed, to make it
  • 2001:db8:0:0:0:ff00:42:8329, which has a section of consecutive zeroes, which can be replaced with ::, to make
  • 2001:db8::ff00:42:8329, which is a much more manageable address to type.

Additionally, the loopback address – IPv6's version of 127.0.0.1 – is 0000:0000:0000:0000:0000:0000:0000:0001. This can be shortened by removing leading zeroes, turning it into 0:0:0:0:0:0:0:1, and then by collapsing the consecutive zeroes, turning it into ::1. As you can see, this is incredibly useful for usability.
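PHP will apply both of the compression rules above for you if you round-trip an address through its packed binary form; the built-in inet_ntop() always emits the shortened notation:

$address = '2001:0db8:0000:0000:0000:ff00:0042:8329';

// Pack to binary, then format back out in compressed form.
echo inet_ntop(inet_pton($address)) . "\n"; // 2001:db8::ff00:42:8329

echo inet_ntop(inet_pton('0000:0000:0000:0000:0000:0000:0000:0001')) . "\n"; // ::1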

There are a bunch of technical details that make IPv6 a better platform for Internet communications than IPv4 as well. You can read a lot about them on the Wiki page or, if you’re particularly technically minded, any of the ~70 RFC reference pages cited in the article.

In simple language, two benefits are a greater level of security and a greater ease of management. Security is easy. IPv6 includes an element of data integrity checking, which authenticates packets of data and verifies they haven't been tampered with or corrupted in transit. It's a basic level of security that has been missing for the entirety of the Internet-enabled world since day one, and it's more than welcome.

Ease of management is perhaps a greater benefit for large companies, countries, and organizations that need to manage wide arrays of IP addresses. With IPv4 you would often need to use one IP address and filter it through network address translation, mapping ports on the external IP to internal devices. This can be complicated, messy, and hinders communication. IPv6 eliminates the need for that with both built-in management configurations, and with the sheer availability of unique addresses.

Picture a virtual web host server. One IP address leads to four different websites. Essentially, they all have the same IPv4 address, let's say 145.32.1.64. If you were to attempt to resolve that address as it is, you would get nothing. To resolve any of the four addresses, you need additional information, carried via DNS and routed via the virtual web server. With IPv6, all four of those sites can be assigned unique IP addresses, so there's no need to carry additional information.

The IPv6 Adoption Crisis

IPv6 was first conceived as far back as 1998. That was when the Internet Engineering Task Force – yes, it’s a real thing – created IPv6 and all of its rules. 1998. That’s 17 years ago. So why are we still talking about it today, as if it’s new and novel, and why are we writing about the IPv4 crisis? Shouldn’t we have abandoned IPv4 totally by now?

Well, yes. You can see a projected plan for the adoption of IPv6, a “pragmatic projection” from Cisco, here. They thought early adopters would be taking it up as early as 2000, with general ISP adoption happening from 2001 to 2007, consumer adoption following from 2003 to 2008, and enterprise adoption – always the slowest on the uptake, these corporations – starting late 2003 and lasting into 2008 and beyond.

At the time the post I just linked was written, 2009, absolutely none of that had happened. IPv6 remained a curiosity at best, and general adoption of the standard hovered low. In 2008, general adoption was only happening at a 4% rate. Major transit networks, like Sprint and Global Crossing, supported it at a 15% rate. Only .2% of global web traffic happened over IPv6. Two tenths of one percent of global internet traffic, let me reiterate, took place using a communications standard designed ten years previously in an effort to head off a coming, obvious crisis.

Humanity truly is the type to sit on the deck of a sinking ship and not notice until their chair floats away from underneath them and the ship is nowhere to be found.

It is now 2015. Another six years after that post was written. The crisis of IPv4 has come, as predicted. We are out of IPv4 addresses. 100% allocated. None left. So, you would assume there has been a skyrocket in IPv6 adoption, right? We should be getting pretty high up there in terms of traffic and utilization of the new standard.

Compared to two tenths of one percent, well, sure. We are. However, we’re still nowhere near the adoption levels we need to keep the Internet in total operation. We haven’t quite reached 9%, according to traffic statistics monitored by Google.

A Practical Look at the Crisis

None of you out there, very likely, have ever experienced what it actually means to be on the losing end of this crisis. The reason for that is the majority of you are coming from the United States, and that’s where IPv6 adoption is the highest. You can click over to the per-country tab on that Google page and see a map of the world. Countries in green have adopted IPv6, with the darker the green, the more widely deployed it is. The greenest nation in the world is Belgium at 38.2%, followed by Switzerland, at 22.21%. The United States rests at 21.41%, followed by Portugal, at 15.9%. You can see more detailed country statistics here. The percentages may vary from the Google map, due to differing measurement methods.

On the other end of the spectrum are the red nations, and I'm not talking about communism here. Some of South America – notably Peru and Colombia – and some of South East Asia suffer. The hardest hit, by far, is Africa, with the majority of the continent suffering from 0% adoption.

For those of us who have never experienced what it means to be on the 0% adoption side of the coin, what does it mean? In practical terms, it means that it’s much more difficult to access websites that are using IPv6 instead of IPv4. You’re going to be slower to connect – already a problem in some areas of these developing nations – and you’re going to have a harder time connecting in general. Reliability is down, latency is up, and the Internet in general is more difficult to use.

Given that a huge number of websites are hosted in some of the greenest nations, this is going to become an increasingly difficult problem to address. As adoption rises, connectivity outside of those countries gets worse, and those countries are forced to adopt or be left out of the global communications network.

Thankfully, this sort of pressure is exactly what the global community needs to force more widespread adoption. The primary reason why IPv6 wasn’t adopted all the way back in 2003 like it was supposed to be, is because the crisis hadn’t happened. Businesses, individuals, ISPs, websites; they all thought “it isn’t affecting me, so I don’t need to upgrade.” The moment it actually began affecting them – the moment of the crisis – adoption hit an all-time high and has been rising on a daily basis since.

If you’ve ever had a long chat with an old, entrenched IT employee – the kind that still uses a CRT monitor and refuses to upgrade from Windows XP – you probably are familiar with the idea of being resistant to change. Enterprise corporations are the business version of this concept. They’re historically slow to upgrade, even in the face of insurmountable problems. Heck, just look at the United States Internal Revenue Service, which even today is still using Windows XP, over a year after Microsoft stopped even legacy updates. They’ve spent over 30 million dollars for special priority legacy updates, and even those are waning. There have been four major OS updates since then. Just let that sink in.

Some people will go down with the sinking ship, but many will at least be mobilized to head for the third-rate lifeboats when the water reaches their ankles. It’s 2015, nearly 20 years after the debut of IPv6, and adoption is only just picking up. In terms of the Cisco graph, we’re still in the mid-end of the early adoption segment. I expect in the next five years that adoption will pass 50%, and the pressure of a largely-inaccessible Internet will make total adoption that much more necessary to participate in the global community.

How long until total, 100% adoption happens? That remains to be seen. Frankly, even though there are several orders of magnitude more available IP addresses with v6 over v4, I’d expect a lot of them to be gone by the time the late adopters get around to getting up off their deck chairs.

What Happened to IPv5?

If you’re wondering about other versions of the Internet Protocol, like IPv1-2 and IPv5, well, it’s actually quite interesting.

IPv1-3 obviously came about before IPv4. They were, actually, early experimental versions wrapped up in the development of the core TCP/IP protocol used for all internet communications. You can read a lot of technical details about that here.

IPv5 is an interesting case. It debuted in the late '70s as the Internet Stream Protocol (ST), and later revisions were backed by companies like Apple, NeXT, and Sun. Rather than a broad, general Internet Protocol, it was designed for a specific type of service. It wasn't meant to replace all general internet communications like IPv6. Instead, it was designed for certain types of streaming media, a type of communications protocol that guaranteed a constant flow of data rather than the more start-and-stop nature of connections at the time. Remember, this began back in the 70s, when video streaming, internet radio and the like were barely a glint in the eyes of the science fiction writer.

IPv5 was an experiment, and while it was a somewhat successful one, it’s not used today. However, it did lay the groundwork for something we do all use; VOIP. VOIP communications today use a form of communications protocol that makes heavy use of the groundwork laid by IPv5, or ST as it was known.

VOIP is actually an interesting case study for IPv6 adoption. There are several potential issues with constant media streaming over a new protocol. One is the need to upgrade communications endpoints, IP PBXs, IP Phones, and other such hardware. Another is the size of the IPv6 header, which is larger than IPv4's and thus requires more bandwidth. If you want to read more about the issues, Internet Society has been keeping track.

All things considered, a move to IPv6 is inevitable. There's simply no way that legacy IPv4 communications will survive more than another decade or two. It's just a matter of who upgrades when, and how difficult it becomes to use the Internet in the future for those who haven't upgraded.

The post The Internet Ran Out of IP Addresses [INFOGRAPHIC] appeared first on GhostProxies Blog.

How to Use Proxies in China to Bypass Blocks and Filters


The Great Firewall of China

China is something of an oddity amongst first world nations today. They’re one of the most industrialized and advanced nations on Earth in many respects, but there are all sorts of issues with the sustainability, political climate, population, and policies of the country.

Ignoring the environment, ignoring overpopulation, ignoring politics, there’s one issue that stands out above all the rest when it comes to global commerce; the Internet.

The Great Firewall of China

Golden Shield Project

China is notorious for their Great Firewall, known internally as the Golden Shield Project. This firewall is a country-wide surveillance and censorship program that keeps the average Chinese internet user from seeing anything that could be construed as detrimental to China in some way, shape, or form.

This is widely cited as a bad idea by many other nations, and there are a lot of motions and movements to help breach the firewall. Of course, this can be quite dangerous to the people within China. Chinese citizens have been punished in various ways for protests or for illegal breaches of the firewall.

Given that you’re reading this post on a site aimed at providing proxies, you might guess where I stand on the issue. However, I also have another perspective in mind. There are two types of people that can benefit from breaching the Great Firewall; Chinese citizens who want more free and open information, and foreign businesspeople who need unfettered access to keep up with domestic news and information.

Information is what keeps a business afloat and ahead of the game. Doing business in China can be very beneficial to a wide range of businesses, due to the proliferation of cheap labor and the Chinese market being the source for a lot of technology. However, if you’re trying to do business within China, it can be difficult to deal with the restrictions placed on online communications.

Bypassing the Firewall

Great Firewall Bypass

The Great Firewall is actually very sophisticated technology. It’s very impressive, given how long it has survived and how thoroughly it blocks some kinds of content. That said, it’s still possible to bypass.

There are a number of different ways to bypass the Firewall. One of the most popular, but potentially expensive and fickle, options is a VPN. I'm not here to talk about VPNs, though. You can read about them, and the VPNs useful in China, here. Be aware, though, that China has been cracking down on VPNs in recent months, making it harder to use them safely.

A safer, cheaper, and more flexible method is to use simple web proxies. Proxies aren’t completely infallible – sometimes they will suffer from blocks as well – but you can rotate through proxies a lot faster than you can swap VPNs.

Before we begin, I do have to say one thing. I’m not and never will advocate using proxies to bypass legal firewalls, or to commit illegal acts. I know that there are a lot of people who want to use proxies to bypass the Great Firewall, but as long as doing so is illegal and can get you or someone else in trouble, I’m not going to recommend doing it. Consider this information for educational and informational purposes only. Always abide by local laws, so long as those laws are in effect.

It should be noted that proxies aren’t a way of making your web traffic invisible or safe. Depending on how the Firewall tracks your activity – I’m not privy to Chinese state secrets – it’s likely that any traffic you send out to a proxy server will be logged and traced. Therefore, a proxy is only useful for fetching information that would normally be blocked; it’s not going to protect you from the repercussions of doing so. In many ways, it’s very much like the American concept of legally protected free speech.

How Proxies Work

How a Proxy Works

The concept behind a proxy server is quite simple. You pick a proxy server from a list, based on various bits of information about it. The most important information in this case is location, followed by latency. Location means the physical location of the proxy server, which becomes the location your traffic comes from. For example, if you pick a proxy server in Ukraine, and choose to browse Facebook, Facebook will see your request as having come from Ukraine.

This is in essence no different from how the normal Internet works. There’s no direct connection between you and a website, unless you’re hosting that website on the same computer you’re using to browse. Normally you have to go through your home network router, a street level router, a router to an internet pipeline, which itself can lead to a backbone, and then from the backbone down through exchange servers and routers to the server on which the website is hosted. If you run a traceroute through a command prompt, you can even see each jump along the way, and what the latency is for each jump.

The only difference with a proxy server is that you’re intentionally sending your traffic to a specific server that will then lend its IP to your commands. It’s the equivalent of sending a letter through a specific postal service that erases your return address and writes their own on the envelope. When the website responds, it sends the letter back to the return address, which is the post office. The post office has a record of you, and forwards the message on.

Using a Proxy in China

Here's the catch with using a proxy to bypass the Chinese firewall: the proxy server must, by definition, be outside the borders of China. If it were in China, the connection it sent out would be filtered as well, giving you absolutely no benefit.

Proxies have their drawbacks. You’re adding an extra step to the process of sending and receiving data from a website, which means your response times will always be slower than if you had a direct connection. Of course, a lot of latency is better than not being able to access the destination site at all, which is what defeating a firewall is all about.

Proxy servers, at least public free proxy servers, also tend to have a lot of issues. They lace your content with ads or put it in an ad-laden iframe window, so they can make some money off your browsing. They’re often crowded with people using them for all sorts of purposes, meaning they’re slow due to their underpowered hardware. They’re also often located in unsavory parts of the world, so you might fall victim to websites filtering those IPs.

Speaking of proxy IPs, when you filter your traffic through a proxy, you’re taking the IP address of that proxy. That means that anyone else using that proxy also shares that IP address. If you wanted to access, for example, Facebook, that’s fine. However, if someone else who used that proxy server did something to get themselves IP banned, you would encounter that IP ban when you tried to access the site. Essentially, the server is banned, so anyone trying to access the site through the server won’t be able to do so.

All of this can be solved by using paid private proxies (like ours). Most people who want to use proxies don’t want to pay for the service, so they go for the free servers. This means that paid servers have a lot less traffic in general. More importantly, they’re also more likely to weed out the unsavory characters that get proxy servers banned in the first place.

Paid proxies also tend to have better hardware located in better locations. If you want to use a US-centric website, it’s better to go through a proxy located in that country than one located in the Ukraine.

There’s also the security issue. Public proxies in large part just aren’t safe when it comes to user data, privacy, and security. A study from earlier this year indicated that only 21% of public proxies were “safe” for a certain definition of safe, and that 79% of them didn’t even allow HTTPS traffic.

This isn’t a huge problem if all you’re trying to do is check the news, but if you’re trying to use any service that passes credentials or logs you in, well, now your login and password information is compromised. Hope you like changing it! Except you can’t even change it without going through a proxy, due to the firewall.

This issue isn’t unique to proxies. One of the most popular VPN services globally was Hola, which essentially turned any user signed up for the service into a potential exit node for another user. In particular, it sold premium users the ability to use free users as exit nodes while they were idle. This was hugely detrimental, and was often used for criminal activities, up to and include widespread DDoS attacks.

Private proxies don’t have these issues, generally. Sometimes you’ll find a server that doesn’t accept HTTPS, but it’s not an issue to just switch to the next server on the list. They’re also not often used for illegal activity, because the proxy owners don’t want to jeopardize their networks for a few bad apples.

Paying Attention to Details

Abundant USA Proxies Example

When you’re in China and you’re trying to bypass the Firewall with a proxy server, you pretty much have to go with a paid proxy list of some sort. Paid VPNs are all well and good, but again, China has been cracking down on VPNs. Proxies are much more agile; if one gets banned, who cares? You have a list of 200 to choose from.

If you’re going to use a proxy list, you’re going to want something high-tier. This is because of the general locations of the proxies you’ll be using. When you’re browsing from China, you want to route your traffic through one of the most unfiltered and safe locations available. No Middle Eastern proxies, no Eastern European proxies, nothing of the sort. You’ll want proxies located in American or Britain, in general. Very, very little is filtered for these sorts of proxies.

The other primary feature you’re going to want to look for before you buy – because you’ll be paying, one way or another, no doubt about it – is how quick the servers are to respond. The United States may not have the fastest internet connection speeds in many places, but the proxy servers you get will be some of the best, simply because of their location and their hardware.

You’re not going to find top of the line server hardware in a lot of the locations where free proxies are based. You’re finding Ivan’s personal PC on his 3mbps connection.

You might additionally want to pick up a Smart DNS service. A Smart DNS service is a service that, while you route your traffic through proxies, will dynamically adjust your IP address for what you’re trying to reach. There’s no sense in using a US-based proxy when you’re looking up a Chinese website; it’s just as likely to be filtered from the outside as it is to be accessible from the inside. China is trying to be an enclosed ecosystem, not a one-way wall. Instead of having to manually turn your proxy on and off, you can use Smart DNS to strip location data from your connections so you appear as though you’re in China for Chinese sites, and outside of China for foreign sites.

Prize Adaptability

If you’re trying to defeat the Firewall – which, again, I don’t necessarily advocate due to the legalities of doing so – you need to be adaptable to anything that changes. This applies regardless of whether you’re tying to use a VPN, a proxy server, a nested browser like TOR, or anything else. At any moment, the server, the service, or the node you’re using might be shut down or blocked. You need to adjust quickly if you want to keep up your browsing.

Month to month, year to year, China makes their Golden Shield that much more difficult to breach. Any time a solution becomes widespread – VPNs, Astrill, or anything else – China will put their engineers to work finding a way to block it on a widespread basis. Long-time internet users in China will have likely gone through a dozen different means of bypassing the firewall, if they’re intent on doing so. Many simply get discouraged and give up.

Thankfully, adaptability is a prized trait amongst businessmen as much as it is amongst web users. Being able to adapt to the changes in the firewall and remain able to do business, gain information, monitor news, and research data from within China is a great ability.

This, again, is another one of the benefits of using a large private proxy list rather than a VPN or a specific service. A specific service can be banned. A VPN can be detected and filtered. Proxy servers are way too modular, way too variable, to be controlled so easily. Any time one server is used enough to be slapped with a filter, there are hundreds of others to pick through.

Good News on the Horizon

In the last few years, even though China’s firewall has grown more onerous in some scenarios, it has also let up in others. It wasn’t long ago that Wikipedia, for instance, was blocked completely. A short time ago, the secure version accessed via HTTPS was unblocked. Some individual pages are blocked with the unsecure connection, but who in their right mind is using an unsecured connection in the first place? Another recently unblocked site is the English language iteration of the BBC website. Social media sites come and go, with Facebook seeing blocks and unblocks it seems almost every few months.

For those trapped behind the firewall, struggling to keep abreast of world news, trends, and business information, there may be hope on the horizon. Each time a site is unblocked, Chinese citizens are given one more taste of what the rest of the world has, and it’s not unlikely that the taste of unfettered Internet will cause greed to outweigh the desire for censorship.

Of course, who knows how long it might take to break down the firewall in general. I'm not a political analyst. I can't tell you how stable or unstable the Chinese government is. I'm not about to advocate dangerous protests or demonstrations, nor do I intend to promote subversion of a foreign government. All I'm saying is, well, what kind of growth and success could China see if the firewall came down? It might involve radical shifts in policy, but it might also result in China becoming the single most powerful country in the world.

Anyways, all of that is for another day. For now, just remember; if you want to keep up with world affairs, business news, or scientific data, private proxy servers are undoubtedly the way to go. Just don’t use them to do anything illegal. No one needs to deal with that hassle.

The post How to Use Proxies in China to Bypass Blocks and Filters appeared first on GhostProxies Blog.

How People Use Proxies to Buy Nike and Supreme Shoes


Rare Shoes

If you’re not big on shoe culture, you might wonder what I’m even talking about here. If you are into the scene, you’ll know what I mean when I say it’s frustrating to get your hands on even a single pair for anything near actual retail price.

Like any rare good, shoes from Nike and Supreme can bring a hefty price for those willing to buy them and resell them. Nike is, in fact, perhaps one of the best targets for a dedicated reseller. They’re a hugely popular brand, so the quality and consistency of their product is unmatched. They have a very long history, so when they say they aren’t going to release a particular shoe again, people can trust that to be true. They also have the resources to make unique limited-run shoes and release them to commemorate certain events.

Both Nike and Supreme have done this time and again, and it has proven to be a very lucrative business model for those who have the time, resources, and aggressive natures necessary to procure those shoes. They make connections with shoe collectors around the world, who will often pay ten or more times the value of the shoe just to obtain it. Nike sells shoes to retailers, retailers sell them to middlemen, middlemen sell them to foreign dealers, and those dealers sell them to collectors or other, more high-end dealers, and so on up the chain. The price increases every step of the way.

The question is, how do these middlemen obtain these shoes so quickly? Time and again, if you watch, you can see products sell out in seconds. Sometimes products will be sold out before the deal officially goes live on a website. It will go from a release countdown to a sold out banner immediately. I’ve seen this before for a whole range of products, from shoes to video games. It’s not unusual, but it is tricky to pull off.

Most of us don’t know how to game the system to get hold of these items. We try to be among the first to F5 a website and refresh it the instant a deal goes life, and we hope the website doesn’t crash. We become pros at plugging in our payment information, or we rely on lurking on sites that already have that information saved. Amazon is notorious for this with some of their better black Friday and lightning deals.

All of this is a subversion of what Nike expects the retail process to look like. They create a limited run of a shoe – something like 2,000 individual units – and they release them to select stores across the nation. The idea is to build hype about the shoe and push them out to specific retailers, where fans will camp out overnight just to get a chance at them.

Nike Article on WSJ

Nike has taken a very interesting stance on this whole thing, as well. On one hand, they don’t comment on it directly. In a Wall Street Journal article, they simply say that “It’s not underground anymore when you talk about it.” It makes sense; they don’t want to popularize the method, leading to even less availability of the shoes, and more frustrated fans having to buy them from third parties for insane markups.

On the other hand, Nike employees have been known to monitor and keep tabs on these sites and communities, those who harvest and sell these shoes at a markup, to use the data they acquire. The idea is, I guess, to treat these communities as focus groups. They can see when a larger or smaller product run has an effect on the value and popularity of a shoe. They can monitor the reactions to different models, and essentially harvest a lot of free data they can use to improve future product ideas.

At the same time, Nike can and will travel to stores in person to do spot checks on receipts to make sure stores are selling shoes normally, and not in bulk orders to a single person. The rumor is they will cancel a store account and pull Nikes from stores that do this, though it’s unclear whether that has ever happened. Supreme has been known to do something similar.

So, that’s the scenario. The question is, how do they do it? How do these people, these middlemen, pull off buying so many pairs of limited-run shoes for their resale markets? When these shoes can be released all around the world, in extremely limited quantities, it doesn’t make sense for a small two-man operation or what have you to be traveling internationally for a few hundred bucks in profit.

The answer is, as it usually is these days, a matter of the Internet.

Sniping Products Online

When you boil everything down, there are only a few ways to buy a pair of rare shoes, or any product for that matter.

At first, you have one division; you can buy in person or you can buy online. Buying in person is obviously the most limited. You – or your representative – need to be there in person the moment the shoes go on sale, ideally at the front of the pack, with the cash and the incentive to buy as many pairs of the shoes as they can carry. They have to be willing and able to fend off a crowd of irate customers waiting in line behind them, if it comes to that. They have to be able to convince the store to sell them as many as possible. It's all a huge potential hassle, particularly when you start dealing with agents around the world.

Better Nike Robot

Online is much easier, and yet also potentially much more complicated. You need to be able to buy the shoes online, for one thing. Nike doesn’t always offer their limited run shoes online, and they often have directives telling retailers not to list the rare models online at all. It’s meant to drive foot traffic to the store, not to get a quick run online and let the hype die.

Online has a number of issues. For one thing, you have to consider the release date. If a shoe sale goes live at midnight, is that midnight Pacific, midnight Eastern, midnight GMT? Is a retailer going to make an error and let the shoes go live 5/10/20/60 minutes early? Timing is crucial.

You also have to consider the site architecture. If you’re buying from a big site or retailer, like Nike.com or Amazon.com, chances are their site can handle a massive deluge of traffic without issue. It might hiccup in the payment processing phase, though, so you need to watch out. For smaller retailers, small stores listing shoes online, the site can easily be taken down by a flood of people. This is effectively like a hype-driven DDoS; there’s nothing you can do to stop it short of somehow getting in before the pack.

You need to consider the ethics of the store selling the product. A site like Amazon or Nike isn’t going to break street date – that is, the date and time the product is supposed to be available for sale – but a smaller storefront might not care, or might have an IT person in charge of the website that doesn’t care. There’s no sense in logging on at the launch date for a storefront that sold out two days prior.

You need a method of payment that won't be declined, but that's generally not a problem for major credit cards or PayPal. A business account can be expected to be making transactions around the world, and won't be shut down because of it.

There’s also the limits on geographic availability. If a particular rare shoe is, for example, released in both L.A. and New York City, chances are the release date is going to be rolling to account for the time differences. You need to be on point getting into each site right when it becomes available.

Geography can also play a factor in web releases in that sometimes sites will have restrictions on the addresses to which they will ship, or the IP addresses to which they will sell. If you’re trying to buy a rare shoe from a storefront in Tokyo, an IP address from Iowa probably isn’t going to do it.

The Role Proxies Play

Proxy web servers are the perfect tool to solve almost all of these problems. Using one, you can make it look like your traffic is coming from an area more local to the storefront, so you get more priority in placement and you have fewer jumps between "you" and the storefront server. In reality, of course, you have more, because you're going through the proxy server, but it shouldn't slow you down if you get a good local proxy.

Proxies for Nike

A proxy server will make you appear to be coming from the right time zone and geographic location in order to buy a rare, limited product. And, in fact, that's what a lot of these web resellers are using. They buy the use of high quality proxy server connections for their purchasing, and buy up as many shoes as they can. Sometimes, if there's a limited quantity per person or per transaction, they will use as many as a dozen different proxies simultaneously to push through a bunch of different transactions to obtain more units of the product.

So, in order to get a leg up on the competition – who might be other resellers or just interested buyers – you need to use a proxy connection. Typically, you’ll need to use numerous proxies, and test them beforehand. You never know if one particular proxy is going to be blocked from a site due to past abuse, or if it just won’t accept payments properly.

Ideally, you will test any proxy connection you intend to use at least a day before the actual release of the product you’re looking to get. You’re looking for a proxy that connects quickly, that doesn’t have errors when buying a product, and that is secure enough that you trust it with whatever payment option you use. It’s typically a good idea, even when you’re using private proxies, to use temporary payment information like prepaid cards or a limited PayPal account. There are two reasons to make a test purchase; first to test to make sure you can purchase successfully, and second to have your information in place on the site to minimize the amount of time it takes to make a purchase later, when milliseconds can mean the difference between 2 units and 20.
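That test run can be automated. Here's a rough PHP cURL sketch that times a request through each proxy and keeps only the fast, working ones; the proxy list, the store URL, and the two-second cutoff are all hypothetical:

$proxies = array('203.0.113.10:8080', '203.0.113.11:8080');
$store = 'https://store.example.com/';

foreach ($proxies as $proxy) {
	$ch = curl_init($store);

	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 5);

	$start = microtime(true);
	$body = curl_exec($ch);
	$time = microtime(true) - $start;

	$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

	curl_close($ch);

	// Keep proxies that completed an HTTPS request quickly with a 200 response.
	if ($body !== false && $code === 200 && $time < 2.0) {
		echo $proxy . ' OK (' . round($time, 2) . "s)\n";
	} else {
		echo $proxy . " rejected\n";
	}
}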

So, what does a proxy server need to fit your criteria?

Picking the Perfect Proxy

I’ve come up with a bit of a list of what you need in a good proxy if you’re going to be attempting this kind of transaction. I’m not about to make moral judgments upon it; as long as it’s legal, do what you will. Here’s what you should look for in a proxy server.

Privacy. The very first thing you want to look into is how private the proxy is. Public, free proxy servers are almost universally terrible. They'll throw ads into your content, they'll disrupt your usage with more ads, they'll be slow and unreliable, and they'll be used by who knows how many other people. Some of the worst might be monitoring your connection and may scrape any personal information you include, up to and including payment information. It's also possible that a previous user got the IP banned from the storefront you're trying to use.

It can be very dangerous to do anything more than the most basic web browsing with a public proxy. A private proxy list will typically cost you money to access, but it will be used by far fewer people and will be much more reliable. It also won’t need to lace your content with ads, because it’s making money in a different way.

Security. This is sort of an extension of privacy in many ways. You don’t want your traffic monitored or tracked. You don’t want to worry about having your payment information stolen. You don’t want to funnel your traffic unencrypted through who knows how many servers or channels. Private proxies tend to be much better in this regard, though I will always recommend using temporary information and payment methods just in case. No matter how secure a proxy server is, there’s always the slim chance of compromise.

Proxy Refused

Speed. Speed is literally of the essence here. You're talking about transactions where fractions of a second make or break your entire gamble. Public proxies tend to be hosted in Eastern Europe or in weird, slow corners of other countries. Heck, some public proxies are just personal computers acting as nodes in a botnet. Private proxies tend to be better servers in faster locations, with better connections to Internet backbones. You need as fast a connection as possible between yourself and the proxy, and as fast a connection as possible between the proxy and the storefront you're targeting.

Lack of advertising or interstitials. Any time a proxy server spends loading advertising is time you’re not loading a storefront page. If it’s forcing you to operate in an iframe, you’re also at risk of interstitial ads loading. Instead of your payment confirmation page, an ad comes up, and then you’re boned. You can’t just back out of the ad, because you’ll have to refresh and make the order again, and by that time all of the available units are gone.

Nearby geographic location. This is a huge factor in many ways. It’s what determines the time zone the store thinks you’re in, which can get you an advantage if you’re in a zone where the release might be earlier. If the store has filters on geographic allowance for purchasing and shipping, you need to appear to be local. You also, of course, need as few jumps between servers as possible to reach your destination.

If you’ve found a proxy list that meets these criteria – hint hint – you should hang on to it. Good proxies can be hard to find, and it’s well worth keeping them around when you find them.

The post How People Use Proxies to Buy Nike and Supreme Shoes appeared first on GhostProxies Blog.

How to Prevent Your Proxies from Being Banned

Proxy Google Blocked

Pretty much any time you’re using high quality proxies in any significant number, you’re doing it because you want to use some kind of bot. You’re harvesting data, you’re performing bulk search queries, or something of the sort. This is all perfectly legitimate, of course. It’d be a different story if you were trying to use that proxy list and set of bots to DDoS someone, but that’s just not a good idea. For one thing, it’d be a very mediocre, ineffective DDoS. You really need a botnet for that.

Anyways, the point is, you don’t want to get your proxies banned while you’re in the middle of using them to harvest data. Your data will end up incomplete and, in instances where the data changes frequently, you’ll end up with an unusable table. By the time you’ve set up new proxies to harvest the rest, the first chunk may have changed.

That’s not always the case. Still, it’s universally an annoyance at best when a proxy IP gets banned out from under you. It prevents the smooth operation of your task, it drags you out of whatever else you were doing to fix it, and it wastes time. So, why not take steps to avoid getting those IPs banned in the first place?

To understand ban avoidance, you need to know how proxy IPs are detected in the first place. Think about what, to a site like Google or Amazon, ends up looking like a red flag.

  • A bunch of similar queries coming in all at once.
  • A bunch of similar queries coming in from the same identified browser.
  • A bunch of similar queries coming from irrelevant geolocations.
  • A bunch of queries searching using high risk terms.

These are the sorts of actions that get an IP flagged, but they’re also the sorts of actions you might be performing. If you wanted to scrape the top 10 pages of Google search results to analyze the titles of blog posts for a certain search term, all on one website, you’d want to use the site: operator, right? Operators like that may eventually trigger captchas, and failing one can get an IP blocked.

Let’s talk about the various steps you should take to avoid being flagged, shall we?

Set a Unique User Agent for Each IP

A user agent is part of a data string, a header, that accompanies communications from your computer to the server of the website you visit. The user agent includes some anonymous information about your configuration; essentially, just your language and the browser version you’re running. It will often include the Windows version as well, and sometimes other data. Someone using an up-to-date Chrome installation in English will have the same user agent data as someone else using the same software. Someone using the same version of Chrome but in French will have a slightly different user agent.

User Agents

The problem with the user agent is that it’s an identifying piece of information, no matter how anonymous it is. If Google sees 10 search queries performed in the same second, all from the same two-updates-back version of Firefox, all looking for the same sort of information, it can reasonably assume that those 10 queries are part of one query made by 10 bots.

User agent information can vary from connection to connection, and from bot to bot. You can configure each of your proxies to use a different user agent. This further obfuscates the connection between each of them, so it looks more like legitimate traffic. The more you can avoid patterns, the better off you are.

The Electronic Frontier Foundation did an interesting study on how identifying this “anonymous” information can really be. You can see some examples of user agent strings, and what kind of information they convey, in their post here.
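
Putting that into practice is straightforward with cURL in PHP. Here’s a minimal sketch of assigning each proxy its own user agent; the proxy addresses and user agent strings below are placeholders you’d replace with your own:

$proxyAgents = array(
	// placeholder proxies, each paired with a distinct user agent
	'192.0.2.10:8080' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
	'192.0.2.11:8080' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
);

foreach ($proxyAgents as $proxy => $agent) {
	$ch = curl_init('http://www.example.com/');

	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	curl_setopt($ch, CURLOPT_USERAGENT, $agent); // unique identity per proxy
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 10);

	$response = curl_exec($ch);

	curl_close($ch);
}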

Avoid High Risk Geolocations

IP addresses are just that: addresses. They are identifying information about the origin of a connection. I can tell from the IP address alone what country a user is coming from.

Now, a proxy server filters that, obviously. It changes your IP by essentially becoming a middleman in the communications. I can be in California, sending a connection to New York, but if I use a proxy IP in Algeria, that server in New York will see traffic coming from Algeria. They can’t see beyond the proxy server, so they don’t see that I’m actually in California.

Now, Algeria may seem like a strange source for traffic, and it is. Traffic from strange locations is a warning sign of many things, from proxy usage to fraud. If you’ve ever gotten a phone call claiming to be from your bank, but with a Nigerian Prince on the other end, you know how big of an issue this kind of spoofed communication can be.

The solution to this problem is to use high quality proxies in non-high-risk countries. Ditch the Russian, Ukrainian, and Middle Eastern proxies. Instead, opt for proxies that originate from North America or western Europe. Users in those areas are much more likely to be browsing local sites than people from Russia.

Always try to consider the service area of the site you’re targeting. If you’re trying to harvest data from Google, try to avoid using proxies from a location that has its own version of Google. Yes, a lot of people will still use the main .com version of Google rather than the non-American version, but it’s still one more warning sign. This alone won’t get your proxy banned, most of the time, but combined with other signals it can be a deciding factor.

Set a Native Referrer Source

Referrer is a different sort of information, but again, it’s another piece of information you’re giving the site that receives your traffic. As with the above, any information you send can be used to identify what you’re doing.

In this case, a referrer is where the site thinks you came from. If I go to a new tab in my browser and type in www.google.com and hit enter, that shows up as direct traffic with no referrer. That’s fine, but it works best for just homepages like that. If I type in a full search query string, that’s a lot less plausible. Google would expect the people landing on a results page to be coming from their homepage, so showing it as direct traffic is a warning sign.

Likewise, if you’re scraping data from Amazon, they would expect you to be referred by Amazon, not direct traffic.

A worse issue is when your referrer somehow gets set to some other site, or even your own site. Then Google or Amazon or whoever will see a bunch of different queries coming in very quickly, all referred by your site. That makes the traffic painfully easy to identify as bot traffic, and makes it very likely to be blocked.

The solution here is to make sure your referrer is set to be native and sensible to the location you’re querying. If you’re sending a bunch of traffic to various search results pages on Google, you want it to look like your traffic is coming from www.google.com, so that’s what you should set.
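
As a rough illustration, here’s how you might set that referrer with cURL in PHP; the search URL and proxy address are placeholders:

$ch = curl_init('https://www.google.com/search?q=example+query');

curl_setopt($ch, CURLOPT_PROXY, '192.0.2.10:8080');
curl_setopt($ch, CURLOPT_REFERER, 'https://www.google.com/'); // looks like a search made from the homepage
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$response = curl_exec($ch);

curl_close($ch);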

Set a Rate Limit on Requests

Rate limits are perhaps the number one tip to avoiding having your proxies blocked. One of the dirty little secrets about the internet is that websites in general don’t care all that much about bots. There are a lot of bots floating around. Google’s search spiders are bots, as are all the other spiders for all the other search engines out there. There are bots going around searching for common security holes. There are bots just browsing content and clicking links. There are malicious bots and benign bots, and for the most part they just exist.

The only time bots become an issue is when they start to cause trouble. Malicious bots attempting to brute force a web login is one example. That right there illustrates why rate limits are a good thing. Think about it; when a bot is making 10 requests a second, it’s trying to do something either in bulk or very quickly. Legitimate bots like Google’s scrapers don’t need to be in that kind of hurry. Malicious bots can be caught and blocked at any time, so they need to hurry to try to get in first.

With your own data scraping, chances are you’re not trying to be malicious. You want to harvest your data quickly, though, because the longer it takes to harvest the volumes you’re scraping, the longer it takes to complete your project.

By implementing a rate limit, what you’re doing is telling the web server that even if you look like a bot, you’re not trying to do anything malicious. You’re not worth watching. Heck, you might not even be a bot at all. It’s that element of plausible deniability that keeps your proxies safe.
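
A crude rate limit can be as simple as sleeping between requests. Here’s a minimal PHP sketch; the URLs and proxy are placeholders, and the delay is something you’d tune to the site you’re targeting:

$urls = array(
	'http://www.example.com/page1',
	'http://www.example.com/page2',
	'http://www.example.com/page3'
);

foreach ($urls as $url) {
	$ch = curl_init($url);

	curl_setopt($ch, CURLOPT_PROXY, '192.0.2.10:8080');
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

	$response = curl_exec($ch);

	curl_close($ch);

	sleep(2); // wait two seconds before the next request
}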

Run Requests Asynchronously

This is the other tip I would say is tied for #1 most useful. If you have 100 queries to make, and you have 10 bots on 10 proxies to do it, you might think that you would just send 1 query per bot per second, have the whole thing over in as long as it takes for the server to respond, and you’re good to go.

From the perspective of the server, though, that’s 10 nearly identical queries arriving instantly. That’s a huge warning sign, because no legitimate user browses in that fashion. Real people – what you’re trying to imitate, more or less – browse from one item to the next to the next.

If you have 10 bots, then, what you should be doing is staggering them out so there is a 1-2 second delay in between queries. Ten bots should look like ten individual users with different browsing habits, not like ten identical users offset by a second from each other.

The problem here is one of patterns. Every form of anti-fraud and anti-bot in the world is attempting to detect how bots are different from people, and generally that’s patterns. When your bots are operating in a pattern, particularly if it’s a lot happening in a short span of time, it becomes easier to detect. The more volume, the more rigid the pattern, the easier it is. Asynchronous requests, combined with rate limits, stretch out those patterns so they get lost in the noise. Change up identifying information so even that’s not the same, and the patterns can almost disappear.
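
Here’s a minimal sketch of that staggering in PHP, using a randomized one-to-two-second delay so the workers never fire in lockstep; the proxy list is a placeholder:

$proxies = array('192.0.2.10:8080', '192.0.2.11:8080', '192.0.2.12:8080');

foreach ($proxies as $proxy) {
	$ch = curl_init('http://www.example.com/');

	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

	curl_exec($ch);
	curl_close($ch);

	usleep(mt_rand(1000000, 2000000)); // random 1-2 second gap breaks up the pattern
}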

Avoid Red Flag Search Operators

Google has a lot of search operators, but some of them require more caution to use than others. For example, with a normal query you can tab back through pages all day with no issue. Performing site: searches can get you hit with a captcha trap, like what happened to this guy searching for LinkedIn resumes. Searching with the intitle: or inurl: operators is even worse, typically because those operators are used to find pirated material.

If at all possible, try to avoid using search operators when you’re running bot bulk searches via proxies. They can be a red flag that greatly emphasizes the problems other parts of this list bring up. If you can’t avoid using search operators – say you’re searching a specific site or a specific character string in URLs – you will need to take the previous steps and turn them up to 11. Use longer timers, run even more asynchronously, pick better locations, and so forth. In fact, it might even be better to use more proxies, so you can use more user agents and different configurations to harvest your data, further minimizing the risk of getting caught.

Rotate Through a Proxy List

This is another great tip, because it further minimizes patterns. If it’s hard to detect a pattern across five bots on five proxies, use ten. If it’s harder to detect with ten, use 20. Of course, if you’re trying to use 20 all at once, you run into issues with the volume of similar queries establishing a different pattern. So what you do is run, instead of 20 all at once, one set of 5, then another, then another, then another, then back to the first.

A rotating proxy list, if the list is sufficiently long, will minimize the number of duplicates you have, making your traffic even harder to detect. Of course, a sufficiently long proxy list might be expensive, so you have to balance how much you’re willing to pay for access to a high quality list against how much you’re willing to deal with the effects of getting caught.
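
A simple round-robin rotation might look like this in PHP; all addresses and URLs here are placeholders:

$proxies = array(
	'192.0.2.10:8080',
	'192.0.2.11:8080',
	'192.0.2.12:8080',
	'192.0.2.13:8080',
	'192.0.2.14:8080'
);

$urls = array(
	'http://www.example.com/page1',
	'http://www.example.com/page2',
	'http://www.example.com/page3'
);

foreach ($urls as $i => $url) {
	$proxy = $proxies[$i % count($proxies)]; // cycle through the list: 0, 1, 2, 3, 4, 0, ...

	$ch = curl_init($url);

	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

	curl_exec($ch);
	curl_close($ch);
}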

Speaking of getting caught, different sites have different means of dealing with bots. Google, for example, will time you out for 14 minutes unless you fill out a captcha. This might be incentive to use a captcha breaker, and that’s your call. Some of them work, some of them are difficult to get working properly, and some are hit or miss. It’s also difficult to get past Google’s “I am not a robot” check box. Honestly, in many cases it’s best to just wait and watch, keep an eye on your bots, and fix issues when they come up. You can always do the captcha manually and then re-initialize the bot.

Use a Supplier that Replaces Proxies

Some proxy suppliers don’t care if an IP gets blocked, temporarily or permanently. They have disclaimers about the usage of their proxies. You see this a lot with public proxy lists in particular; they are used and abused so much that some sites even go out specifically to harvest proxy IPs and ban them before they can be used against them. This is why it can take ages to find a public proxy that works, and finding enough to harvest data is nearly impossible.

Private proxy lists are the better deal here, for two reasons. One, they don’t have the past usage and prior history. Essentially, you’re not starting on strike 2 like you might be with a public proxy list. Two, many private proxy list managers will offer you a list of X number of proxies and will keep them rotated to keep them fresh. If a proxy is banned, they will replace the proxy, so that there’s always that selection available. In other words, the lists don’t degrade.

Honestly, it’s not all that difficult to harvest data from a site like Google as long as you set things up properly. It’s only when you don’t put thought into it, when you slam traffic into their face that screams bot, that you end up being blocked. Of course, you can always just use APIs to get your data, but sometimes what you want isn’t available.

The post How to Prevent Your Proxies from Being Banned appeared first on GhostProxies Blog.

33+ Ways to Use Proxies to Get More Backlinks

Link Building with Proxies

Link building is a controversial subject.  On one hand, you can use spam link building as a black hat technique to churn and burn sites and rank money sites.  Sure, they disappear eventually, but you keep moving to keep ahead of the penalties.  On the other hand, you can carefully craft an outreach campaign to maximize your beneficial links, minimize your harmful links, and in general build a brand and reputation.  It’s slow, it’s methodical, and it works very well after a while.

I’ve compiled a list of tools and methods you can use to get more backlinks.  Most of these will either require you to use proxy servers to get around rate limits for harvesting data, or will benefit enough from using proxies that it’s better to implement them with proxies than without.  It is, of course, up to you which techniques you’re comfortable with using.

Please note that many of the techniques and applications on this list can be considered black hat or gray hat depending on context.  Misusing them can earn you a search penalty, particularly since Google has decided to consider all non-content-focused link building as a low-level gray hat operation.

Software Options

Scrapebox Link Building

Software can be immensely valuable for saving time and energy on data harvesting and submission.  However, it can also be a vector for spam and low quality link building.  The tools I’ve listed below have a wide range of uses, some of them spammier than others.  As such, I haven’t linked to the applications themselves.  If you can’t locate them based on the name, well, you shouldn’t be straying so close to the metaphorical sun in the first place.

Always exercise caution when using any software application to replace manual effort.  I recommend using high quality proxies in a rotating list so you don’t get your core IP banned or earn a spammer label.

Scrapebox: One of the more generally useful apps in a non-black hat way, Scrapebox allows you to harvest all sorts of data in a dozen different ways.  However, because it’s a data harvesting robot, you really need to make use of proxies to protect yourself from rate limits and IP blocks.  In addition to data harvesting, Scrapebox includes specific promotion methods categorized as either black or white hat, so you know what you’re getting into when you use it.

GSA Search Engine Ranker: One of the most high profile black hat link submission engines around right now, GSA is potent because it allows you to set your own rules and doesn’t operate off its own database of sites.  That makes it dynamic, and ensures you don’t have to keep updating the program to keep it relevant.  It also has no submission limit, though again, too many submissions too quickly will flag you in ways you don’t want to be flagged.

SEO Link Robot: A program with both content creation and publication features.  The promotion includes social bookmark site submission, RSS submission, and curation in article databases.  Content creation – not recommended – includes article spinning, posting, and so forth.  It includes a captcha breaker, and some link pyramid structure settings.  Uses its own database of relevant submission sites.  On the other hand, it’s a little slow to use.

Link Assistant: This is a program that scours the web for sites that are relevant to your niche, as determined by keywords and scans you run on your own.  When it finds these sites, it builds a list and figures out how you can submit links to them, be it through contact messages, blog comments, or other contact methods.  The app also stores more data than just a backlink for submission, so your links are surrounded by relevant and useful information.

SEO SpyGlass: This app does more than just find backlink opportunities; it uses its own internal criteria to determine just how useful those backlinks might be.  You can set it to specifically only target high quality links, to avoid all of the spam and low quality links, if you prefer a more careful and curated route.  It does this via checking domain rank, Alexa, PageRank, anchor text and social signals, among other things.

Money Robot Submitter: A heavy duty automated link submission engine that has a large database of sites to submit to.  These range from blogs and blog comment sections to social networks, social bookmarking sites, and forums.  It also includes Wiki and various wikia sites, and articles where you can insert yourself as a relevant reference.  It’s all focused on speed over outreach, though, so be wary about spam.

GScraper:  This is another of the potent data harvesting tools for both white and black hat uses.  It’s a very fast, very accurate Google scraper, but it absolutely requires a rotating list of proxies unless you want to be blocked for hours at a time on a regular basis.  Seriously, if you have enough proxies behind it, you can scrape a million URLs in the space of 10-15 minutes.  Plus it has built-in rules you can use to clean up and sort the data you harvest without having to futz with a CSV manually.

Ultimate Demon: This is a content submission engine with a huge selection of sites you can send your posts to, automatically.  It supports multi-threading, which is good for mass submissions.  It also allows you to customize what sites you submit to with scripts, so you can set it to do exactly what you want.  It does include article spinning, but again, it’s not recommended that you use it unless you’re really trying for nothing more than a churn and burn.

EDU Backlinks Builder: This is a smaller scale tool meant for automatically building high quality backlinks specifically from EDU domains.  EDU domains do tend to have more value for Google because they’re hard to acquire and difficult to spam.  Links on them are then considered more valuable, which makes tools like this spring up to get them.  Use sparingly, otherwise the abuse will be identified and you’ll lose the value you had gained.

LinkSearching: This is a web-based tool that you can use for free, which is great, because some of the tools on this list are quite expensive.  This one doesn’t really automate submission, but it is a robust scraper engine to identify relevant sites and look for link opportunities that both appear natural and pass a good deal of value.  It does, however, include a database of link footprints you can use as templates to customize the anchors and submissions you do make.

Buzzstream

BuzzStream: This is another web app that is designed as a prospect research and contact information harvesting engine.  You can use it to find all the various places you might be able to submit your content, or even just comments and links.  You can then rank these sites according to your own value metrics, and send out personalized templated emails to perform some outreach.  The only way it could be considered black hat is the intensity of the scraping, which can earn you a bit of a time out from Google’s web search.  Proxies help here.

Integrity: Integrity and Scrutiny are two related apps you can get for various prices.  Integrity for free checks links and allows you to export data on them.  The plus version gives you multisite checking, searching, filtering, and XML sitemap features.  Scrutiny is an upgraded version that includes SEO checks, site monitoring, orphan page detection, and some other SEO factor optimization.  You can also get online versions of the apps for a monthly fee rather than buying the stand-alone program once.

Link Bird: This service aims to replace your need for Excel or another spreadsheet manager in your outreach and link building process.  It allows you to import data and manages it with a robust set of dashboards, complete with various link building tools.  You have a link manager, a rank tracker, keyword research tools, brand monitoring and a lot more.

Kerboo: Formerly known as LinkRisk, this tool is a fairly expensive suite that will cost you around $150 per month.  However, for that, you get a wide range of tools and abilities.  Features include a backlink profile auditor, a site information harvester, link and mention monitors, visual analytics, ranking harvesting and monitoring, and an API to hook all of that into a different app you use.

LinkVana: This is less of a tool and more of a hybrid tool and service.  In addition to being a content manager and analytics system, it includes services by real human content marketers, who perform human outreach and manual link building using modern acceptable standards.  They form relationships for you so you can focus on creating the high quality content necessary to fuel those relationships.

WhoLinksToMe: The name implies that this is a site that will pull your backlink profile for you, and it will.  However, its best use is stalking your competition.  You can point it at any site you want and it will scour the net for as complete as possible a report on their SEO activities.  You can see what those sites are doing, where they’re doing it, how they’re doing it, and how you can supplant them with your own strategies.

Ontolo: This is a broad-scale analysis app that helps you out with SEO and link building.  It can analyze over 4,000 URLs a second, which is an insane number, and virtually requires proxy use to avoid eating a rate limiter.  It’s designed for enterprise-level analysis, but mid-sized businesses can sometimes use it as well.  Unfortunately, it’s overkill for small businesses in almost every instance.

Outreach

Scrape Emails Website

Outreach options tend to be more valuable, because they’re more focused on building relationships and producing high quality content; things Google is very interested in seeing you do.  On the other hand, sometimes you need to stray into gray hat techniques in order to produce your white hat content.  That’s where proxies come in; harvesting data is perfectly legitimate, but doing it too quickly with software will result in bot flags and IP bans.  If you’re harvesting data or performing mass outreach, make sure to protect yourself.

Leaving Blog Comments Automatically: Blog comment spam is a huge problem and has forced many blogs to run anti-spam programs or turn off their comments entirely.  That said, if you’re careful, you can still leave relevant and valuable comments and get backlinks because of them.  The key is to use your scraping to identify relevant blogs with open comments sections, but to create and submit your comments manually.  That, and avoid comments that look like spam.

Making Automatic Forum Posts: Forums are relatively low value these days, so they aren’t that great as a white hat technique.  However, you can often find relevant but abandoned forums with open guest registrations and use them to submit automatic posts.  As long as the forum is visible to the public, those links will be indexed, and you can earn a slightly higher search ranking for posting them.  Just try to target forums that have recent activity; hitting up forums that haven’t seen a post since 2010 is a sure sign of spam.

Scraping Emails for Outreach: By using a data scraper you can harvest the contact information from the about page of a wide range of sites.  If the sites don’t have an about page, or they don’t have an outreach email listed, you can always try a sneakier route and scrape their Whois data to find the contact information for the domain owner.  Sometimes you will have to cross-reference with Facebook to find the right instance, though.  Still, you can find some personal connections this way when those people don’t have open emails.

Scraping Web Search for Contact Pages: This is another form of contact scraping you can do, generally just by using a tool like Gscraper to find sites that use a keyword you specify and a URL or page title that includes the phrase “contact us.”  This will get you a list of relevant sites you can submit links, tips, or messages to in hopes of a link.  It’s up to you to figure out how to get those people to accept your messages, though.

Scraping Write for Us Pages: This is the same thing as the previous one, except looking for pages with “contribute” or “write for us” pages.  You can use these pages to give you guest posting opportunities.  If you want to go black hat, an article spinner works.  If you prefer white hat, make some good high quality content, preferably based on the sorts of subjects that do well on your target site.

Harvesting an Influencer Database: The first step in influencer outreach is gathering a large list of influencers to target.  You need to rotate through this list so you’re able to spread out your outreach; it doesn’t work if you hit up the same person three times a month.  Make everyone you contact feel special even if they’re one of hundreds you touch bases with.

Scraping for Broken Links: Scraping Google or specific sites for broken links is a good way to implement some broken link building.  Identify content that no longer exists and that you can replace, and scrape looking for links to that content.  Proxies help as usual by avoiding bans and rate limits, giving you a faster list so you can act sooner.

Searching for Paid Guest Post Opportunities: This is very similar to the write for us tip, but it’s also a way to make a little money on the side.  A lot of high profile blogs will pay for a guest post, assuming that guest post is of sufficient quality.  This is not a technique you can use with an article spinner!  Anyone paying for a guest post is going to be looking it over pretty carefully before paying you, so make sure you’re submitting high quality content.

Stalking Competitor Outreach: There are a lot of tools, including a couple listed above, that allow you to look up the SEO strategies and link sources your competitors are using.  Ideally, you will be able to find sites that link to your competitors, and supplant their links.  If not, you can still try to get your own links from those sites; do what they do, plus your own additions, to outstrip them.

Scraping Brand Mentions: Brand mentions are implied links, and while Google has talked about giving value to implied links in the past, they haven’t really done it just yet.  So, in the meantime, what you can do is scrape any and all mentions of your brand.  You can do two things with this information.  First, you can pull out the brand mentions that are complaints and work to resolve the situation for a customer service boost.  Second, you can find positive brand mentions and message the site owners to turn those mentions into explicit links.

Submitting Content for Syndication: Content syndication is risky for some sites because, if it’s done poorly, it can result in duplicate content penalties.  If you’re willing to risk it, or you’re submitting content with a link but which isn’t attached to your name, you can use bots and scrapers to find syndication sites to submit your own content to.  This can be very good for links, because many sites that syndicate content have higher rankings than the sites that submit that content.

Scraping Data to Prioritize Link Building Efforts: This is a built-in feature for some of the tools in the first section, but you can do it on your own with API access to some tools like Moz or Majestic.  Compile a list of sites you can submit your link to, and run that list through those APIs to scrape data about them.  Sort by the metrics you care about for backlink quality, and prioritize the best sites.

Scraping Data to Create Compelling Content: This is a more traditional use of data scraping, and it’s what Google thinks of when they put rate limits on their search.  For example, maybe you want to do an infographic on the “top X best” titles of blog posts, looking at the distribution of what number X is.  You can scrape that data using a tool and proxies to keep from getting rate limited and throwing off your harvest.

Scraping New Niche Content to Make Top Lists: Run a regular scraping campaign each week or each month, looking for new content in your niche.  That means setting searches with keywords and a “past week” time filter.  You can then have data for a robust and varied “best content of the week” top list series, giving you great backlink opportunities from every site you feature.  For bonus points, scrape contact information so you can notify sites when you feature them.

Finding Hosts of Competing Content: If your competitors are engaged in scraping-powered backlink campaigns, they might be using more innovative strategies than you are, or targeting keywords you didn’t think to target.  You can scrape data about their links or their name for contributions, and identify sites that accept guest posts for backlinks.  Use them yourself.

Harvesting Whois and Stalking Site Owners: Whois data gives you a lot of information you can use, ranging from infographic fodder about domain registrars of top sites, to contact information you can use for personal outreach.  Be sparing with the personal stuff; a lot of people don’t like an internet detective.

When all else fails and you find you used some of these strategies in a way that got you penalized, you can use some of the same tools to compile a list of bad links to disavow.  This will help you get rid of link-based penalties and tell Google hey, you’re totally on the up and up, it wasn’t your fault.

The post 33+ Ways to Use Proxies to Get More Backlinks appeared first on GhostProxies Blog.


The Ultimate Guide to Proxy Compatibility

Proxy Compatibility

The world of proxy connections can be quite complex when you’re getting into automating software and connections. It’s easy enough to use a proxy for web browsing, but what about using half a dozen proxies to run something like Scrapebox against Google, which is notorious for detecting bots and stopping them in their tracks? You have to worry about a lot more than you might realize. Do you know the difference between SOCKS4 and SOCKS5? Do you know what ports are required to forward proxy traffic? Do you know what a residential IP means? Thankfully, you have me to go over it all for you.

I’m often asked questions about individual pieces of software as well. “Does application X work with your proxies?” Generally, the answer is going to be yes, but I’d rather educate you about why that is than just have you blindly accept it. So, first, I’m going to cover different important factors you might encounter with proxies and what they mean, then I’m going to cover some of the most popular applications you might use with proxies, and what they require.

Things to Check

As I said, the first thing I’m going to do is go over the common aspects you might find when dealing with proxy connections and using them to do your bidding. Some of this might get a little technical, so if all you’re looking for is application compatibility, feel free to skip to the next section.

HTTP vs SOCKS4 vs SOCKS5

This is the first and possibly most important compatibility issue; the type of connection the proxy can use. SOCKS is the default kind of proxy connection. A proxy server using SOCKS sits in the middle, between a client and a server destination. For example, it would sit between you and Google if you were using something like Scrapebox. SOCKS itself stands for SOCKet Secure.

The difference between SOCKS4 and SOCKS5 is that SOCKS5 includes authentication. With a SOCKS4 proxy, there is no ability to use a login and password as a requirement to use it, or to use authentication information at the destination server. In other words, if you’re trying to scrape data on a page you need to log in to access, you need to use a SOCKS5 proxy server.

So what about HTTP? HTTP is more specialized, and thus more limited. You might recognize HTTP as the beginning of the common URL. That’s because it’s the common protocol used for standard web traffic. SOCKS is a protocol used for server to server communication, and has no interpretation of the data; it just passes it from point A through point B and to point C unchanged.

HTTP connections at point B, however, have the chance to interpret and forward the traffic. This is useful to streamline some aspects of scraping. For example, if you were scraping Amazon traffic, an HTTP connection is capable of recognizing and caching common elements, to minimize what your scraper needs to download from Amazon itself.

That said, HTTP connections are limited to just HTTP communications. If you’re trying to access a server that does not allow HTTP connections, but your software requires you to use HTTP connections, you won’t be able to make the connection in the first place.
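
For what it’s worth, switching between these protocols in PHP’s cURL is just a matter of one option, and SOCKS5 is where proxy authentication comes into play. A minimal sketch, with a placeholder address and credentials:

$ch = curl_init('http://www.example.com/');

curl_setopt($ch, CURLOPT_PROXY, '192.0.2.10:1080');
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5); // or CURLPROXY_SOCKS4, CURLPROXY_HTTP
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // only meaningful on proxies that support authentication
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$response = curl_exec($ch);

curl_close($ch);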

Ports Necessary for Communications

Proxy Port Configuration

Ports are another part of internet communication that is both foundational and utterly ignored by most people unless they have a need to mess with them. They essentially act like radio frequencies or TV channels. Another analogy might be an apartment building: it has one street address, which is the IP address, while the port specifies the individual apartment.

Different ports are often used to differentiate the service being used to make the connection. Port 21 is typically used for FTP connections, port 22 is used for SSH connections, and port 53 is used for DNS servicing. Port 80 is almost always used exclusively for HTTP communications, and that’s a limitation on proxies as well. If your proxy only supports HTTP, it will be limited to port 80. If the proxy uses SOCKS, it can typically use any port, so you will have to tailor your port to the destination’s requirements.

Secure Transmission of Data

This is another concern you might have about a proxy server, but is unrelated to the SOCKS and Port factors above. It’s all about how secure the connection is through the proxy. Many public proxies are not secure at all; they are routed through eastern European servers, which inject ads into the traffic or route it through an overlay. You never know what kind of software might be running on that server to snoop on the connections being made and data being sent.

By contrast, private proxies tend to have more security, because the proxy servers themselves are located in more secure locations. They are also designed for more advanced users, users who would take umbrage at their data being snooped. You also might need a secure connection to access some websites, particularly sites accessed through SOCKS5 that require authentication. Always avoid putting sensitive login information into an unsecured proxy.

Anonymous or Not

The issue of anonymity is central to the idea of proxy connections. Many people use proxies for simple web browsing, because they don’t want their home IP address associated with their browsing habits. They might just not want to be tracked by large entities like Facebook, Google, or the big ad networks. Alternatively, they might be doing something borderline illegal – or actually illegal – and want to hide from law enforcement or the NSA. This isn’t always possible, of course. Just look at the Silk Road, and all the users who thought they were safe before the FBI raided the place and arrested the worst offenders. A false sense of security comes from perceived anonymity, which itself comes from the idea that hiding behind a proxy makes you untraceable.

Proxy Anonymity

There are different levels of anonymity with proxies. Some of them forward pretty much all the information you normally send, and don’t actually provide you any anonymity at all. They tell the destination server their own IP address for access, but effectively add, “by the way, my user’s actual IP is <X>, in case that matters.” It won’t, unless someone wants to track you, in which case they can find your real IP right there.

Higher levels of anonymity don’t forward as much information. The next step up is called a distorting proxy, which will not reveal your IP address but WILL reveal that it is a proxy connection. The destination server will know someone is connecting through a proxy, but won’t know the originating IP address.

The highest level of anonymity comes from top tier proxies that emulate real connections. These don’t even reveal that they are proxies, though sometimes user behaviors will give them away.

Capable of Passing Search Engine Blocks

This is a factor people refer to as “Google safe” for proxies, and all it means is that the IP address of the proxy is not known to be a proxy server, and has not been abused in the past. Google has aggressive anti-proxy and anti-bot measures in place, and will time out your connections if it detects abuse and botting.

A proxy being Google safe is not necessarily a factor of the proxy itself; it is often more a matter of user behaviors. If you’re making a lot of similar repeated requests from one IP address, it looks like a bot. If you’re varying the IP address for those requests, and varying the timing on them, it looks more like organic users. This is why you should use proxy lists rather than a singular proxy, and why you should set delays and asynchronous connections.

IP Location

This final factor is just a matter of where the proxy server is coming from. There are two main categories for this.

The first category is geographic. If you’re trying to log into a US-centric website, it’s probably not a good idea to use a proxy server located in Ukraine. A lot of sites commonly targeted by scrapers will block foreign IPs, or re-route them to foreign versions of the site; neither is valuable to your needs.

The other category is usage. Does the IP come from a data center, or does it come from a residential neighborhood? This is possibly the most important factor on this list. Many large entities, like Google and Amazon, will detect when a connection is being made from a data center. It’s one of the ways they can detect proxy and scraper abuse. It’s always better to be coming in from a residential IP location, because it’s more like their typical user behavior.

Apps and Their Compatibility

There are a bunch of common apps or pieces of software you might want to use with proxies. They usually scrape data automatically in some form, though others will submit data in bulk. Usually sites don’t like robots making these kinds of actions, because it’s how spam and fake accounts come about. I’m not here to judge you on your usage; I’m sure you know what you’re doing. I also take no responsibility for how you choose to use proxies. All I’m doing is reviewing common programs and telling you their requirements. As a disclaimer, I do not necessarily support or condone black hat usage of the following applications; what you do is up to you.

SERobot: This is a link building application that submits your content in bulk to various sites, ranging from RSS directories to article aggregators and social networks. It uses multithreading to support numerous connections at once, and it has an optional captcha breaker service. It uses an API that can link to several article spinning applications as well. All of this is designed for automated SEO, useful for gray or black hat sites and blog network building.

  • Supports both public and private proxies.
  • Uses HTTP connections exclusively.

XRumer: This is another link building SEO application that primarily focuses on web forums with some residual value. It also targets blog comments, journal guestbooks, link directories, social networks, social bookmarking sites, and more. It includes captcha bypassing for a number of common systems, including the textual Q&A systems. To avoid spam labels, it tries to customize posts according to the theme of the forum or board being targeted.

  • Supports both HTTP and SOCKS connections.
  • Prefers private proxies to avoid trying to use previously banned IP addresses.

SEnukeX: SEnuke is an older program designed for SEO, which was used as the base for SEnukeX, a more advanced version. This new version was created from the ground up to include more features, including a basic tutorial, a process diagram, and scheduling out for weeks. It strives to stay on the good side of Google, by appearing as natural as possible. The application has a 14-day trial and a 30-day money back guarantee.

  • Requires HTTP connections exclusively.
  • Prefers private proxies to avoid common issues with public proxy servers.

Scrapebox: Possibly one of the most powerful tools usable in both black and white hat operations, this is an incredibly robust data harvester. It’s used equally by black hat SEOs and top Fortune 500 companies. Multithreaded operations support numerous connections, and it is Google-safe as long as you’re using it properly. It is, of course, possible to be banned according to your usage. That’s why you need numerous proxies, asynchronous and varied requests, and delays on submissions. Use with caution.

  • Supports both HTTP and SOCKS connections.
  • Supports both private and public proxies, though private proxies are preferred.
  • Highly recommended that you use a large, rotating list of proxies rather than a short, static list.

Tweet Attacks: Tweet Attacks Pro 4, the current version of Tweet Attacks, is a piece of software made to manage as many as several thousand Twitter accounts at any given time. It allows automatic follows, unfollows, return follows, tweets, retweets, replies, likes, deletes, and really any other action you could want to take through Twitter. It also allows individual customization of those Twitter accounts, to eliminate the “egg” problem when running networks of simulated accounts. Costs vary depending on the tier of program you prefer.

  • Requires HTTP connections exclusively, due to Twitter’s authentication requirements.
  • Supports both private and public proxies, though private proxies are preferred to avoid detection.
  • Recommended that you use numerous proxies to manage your accounts, though you don’t need to go so far as to have a dedicated proxy for every account.

Ticketmaster: This is a general category for any number of different Ticketmaster ticket-buying bots. There are a wide range of them, including one named TicketMaster, the TicketMaster Spinner, and TicketBots. All of these have their requirements in common, because they access the same site with the same goal; buying numerous tickets to shows to then re-sell the tickets for a profit. This sort of ticket scalping is not illegal unless done physically on the premises of the venue. Some states may have stricter laws regarding ticket resales, however.

  • Requires HTTP connections to the Ticketmaster website for authentication and appearance purposes.
  • Residential IP addresses preferred, as Ticketmaster is prone to revoking sales made to data center IPs and other non-native IPs that signal a bot.

Twitter Account Creation: For using bots like the Twitter manager above, you need to mass-create Twitter accounts. There are a number of different bots that allow this, such as Twitter Mass Account Maker or Twitter Account Creator Bot. Like Ticketmaster bots, these all have similar requirements.

  • Requires HTTP connection for authenticity and login authentication to Twitter’s servers.
  • Prefers residential IP addresses, usually private rather than public, though the occasional data center IP is not unexpected due to Twitter’s agency and corporate usage.

Facebook Account Creation: This is the same in many ways as the Twitter bots listed above. Some common Facebook account bots include Facebook Account Creator and FBDevil.

  • Requires HTTP connection for authenticity and login authentication to Facebook’s servers.
  • Prefers residential IP addresses, and typically prefers private over public addresses.

Email Account Creation: Email accounts can be created in bulk much the same way as social profiles, though there are as many different bots as there are email providers. Every provider is different and every bot is different, so make sure the requirements are met before you buy or use a proxy list. Typically the requirements will be the same as the social requirements above: HTTP connections and residential IPs. Some email systems are okay with other connections or with data center IPs, though.

The post The Ultimate Guide to Proxy Compatibility appeared first on GhostProxies Blog.

How to Fix an “Err Tunnel Connection Failed” Proxy Error

Proxy Error Chrome Screenshot

When you’re dealing with web proxies, you’re liable to encounter a wide range of possible errors. This is primarily because the web was not designed with the use of proxies in mind. They are, in a sense, exploits in the way the web works.

Unfortunately, there’s a sort of sliding scale of errors in proxies. Errors come when a piece of software, a server, or a website expects one thing and gets something else. It’s like opening a box labeled “puppies” and getting spiders. You error out of that situation right quick.

The more data the proxy passes, the fewer errors there are. However, if the server passes enough data, it’s no longer even really a proxy, it’s just a referral server and is part of normal web operation. Many people use proxies because they want a more anonymous connection, though, and that’s what causes problems. A proxy server that passes your requests but strips out header data is sending one thing while a server expects another. Most of the time, this doesn’t cause any issues; it just means the server thinks you’re located somewhere else and doesn’t have much data about you to report to analytics. Sometimes, though, the discrepancy between expected data and provided data causes an error.

That’s what the ERR_TUNNEL_CONNECTION_FAILED error is. It’s a particular set of software interacting with a proxy in a particular way such that it causes an issue.

To Err is Human

To forgive is divine, but forgiveness doesn’t help you in this situation. In fact, you may be begging for it if you’ve been fighting this error for a while. The problem is, it’s a pretty specific, narrow situation that doesn’t come up a lot, so you may have been happily using proxies for quite a while without ever encountering the error.

The error is specific to Google Chrome, which may be helpful to know if you’re willing to use another browser. That’s not always possible, though; similar errors will show up on Firefox and – god forbid – IE if you’re using them.

The error itself is caused when you try to access a page that uses SSL, which is going to be more and more pages moving forward ever since Google declared SSL was a search ranking factor.

If you’re interested in making sure that the error you’re getting is the one we’re talking about, and not just a related error, you can replicate the behavior by using a proxy that filters SSL requests and use it to try to access the HTTPS Google page, or Facebook. You’ll end up with the Error 111: ERR_TUNNEL_CONNECTION_FAILED: Unknown Error message. You can read a bit more about the error and why it happens here.

Another possible cause of the error is a broken registry key on Windows machines. If you’re getting the error intermittently, or on a non-Windows machine, this isn’t the problem. However, if it is, you may have to try alternative solutions.

Make Sure IP, Password, and Port Are Correct

This is a simple tip, but it’s an important first one nonetheless. There are three important bits of information when using a proxy to connect to a site. Those are the IP address of the proxy you’re using, the password you need to authenticate if any, and the port you need to connect.

Proxy Configuration Page

IP address is quick and easy to test. Just try using the proxy to reach another site. If it works, your IP works and your issue is with the SSL. If it doesn’t, check to make sure the proxy is still online, and that the IP address you want to use is correct.

Password is a password. Verify that you’re using the right one. You should be able to manage that on your own.

As for port, most proxies by default connect through port 80. This is the most common web port for proxies, but it’s far from the only one. I recommend checking into the system you’re trying to access and see if there are specific ports it uses. For example, some proxy providers will work on a port like 55555. This might be a question you ask your proxy provider, or it might be a question you research about the site you’re trying to connect to.
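
If you want to check all three pieces at once, a quick PHP cURL test will do it; the IP, port, and credentials below are placeholders for your own proxy’s details:

$ch = curl_init('http://www.example.com/');

curl_setopt($ch, CURLOPT_PROXY, '192.0.2.10');       // proxy IP
curl_setopt($ch, CURLOPT_PROXYPORT, 80);             // try 80 first, then any provider-specific port
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass'); // only if your proxy requires authentication
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

$response = curl_exec($ch);

if ($response === false) {
	echo 'Proxy test failed: ' . curl_error($ch); // reports the specific connection problem
} else {
	echo 'Proxy test succeeded';
}

curl_close($ch);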

Make Sure IP Authentication Matches

Some proxies will pass geolocation data, while others do not. Some websites take offense to this. They want to know where you’re coming from. A lot of times, nothing comes of it if you don’t provide that information. You just end up as one of the garbage connections in analytics providing no useful data. Other times, though, the site will have filters in place. They want to make sure that their traffic is coming from areas they service. A local HVAC company in Wisconsin doesn’t want or care to know about people visiting their site from California. A US-only business doesn’t care about visitors from Russia. It’s all the same concept; keeping your visitors in the areas you service.

Some sites take this a step further and will identify and filter connections from high risk countries, or from out of their service area. High risk countries tend to be countries where a lot of scammers reside, like Nigeria, Vietnam, and Bangladesh.

Other sites might not actively block connections coming in from risky countries, but instead might just look for hidden or mismatched IP authentication data. If the site gets a connection claiming to come from Oklahoma but reads an IP coming from Sweden, there’s probably something fishy going on, and that connection may be blocked.

By fixing your authentication data, or choosing a different proxy for the task at hand, you can remove the errors caused by the mismatch.

Change Browser Proxy Settings

In Chrome – the source of this error most of the time – you can change your browser settings by navigating to the Chrome settings URL, chrome://chrome/settings/. There you are presented with a basic settings menu. Click to show advanced settings, then scroll down to the section labeled Network. There will be a “change proxy settings” option there, and the option you’re looking for is “automatically detect.”

Change Proxy Settings in Chrome

When your browser is set to automatically detect proxy settings, it will look for settings about proxies embedded in your operating system. For Windows it looks for configuration used by IE as well. For Mac it pulls the Safari settings. By turning off this automatic detection, you allow Chrome – or whatever application you’re using based on it – to use its own settings, which are likely set by you or the proxy owner to be more effective.

You can read more about Chrome and its handling of proxy information in this developer document. It’s probably overkill if all you needed to know was to change one setting, but it helps you understand what’s going on and why.

Contact Proxy Provider for Support

If all else fails, one thing you can do is hope you have a high quality proxy provider. Just visit their site and send their support staff a message, give them a call, or use whatever other contact information they have provided. A good provider will have a solution on hand, or at the very least will have a trained technician available to help you with your problems.

This isn’t much help if you’re using a free proxy list that doesn’t have active support. Most of the time, these lists are made up of compromised computers and they’re run by a small team of foreign nationals who don’t have any support structure. You’d be hoping they would work with you, while they’re hoping you won’t go out of your way to harm their reputation. Not that they care; they survive without you.

Consider a Software Solution

I put this option at the end, even though it might be the easiest of them all, just because it has the potential to cause more harm than good. This is because a software solution by default can only fix problems on your computer, not on the proxy itself. Therefore, this fix is only valid if the registry key on your Windows PC is the problem.

Fair warning: much of the time, any program that tampers with the registry has the potential to cause problems with other software and normal operation of the machine. Ideally, you will want to find a program that will allow you to selectively fix a single issue, rather than presenting you with a list of 500 it finds and forcing you to fix them all at once.

One possible option is to try the software called RegCure Pro. I have not used it myself, so exercise caution when you’re downloading and installing software from an untrusted source. Scan any files you download for viruses and be wary if it asks you for any personal information.

For reference, here is the registry fix you can make to bypass the error in Internet Explorer. The error is no longer the ERR TUNNEL one we’re familiar with, though; in this specific situation the error is “Error Code: 502 Proxy Error. The Forefront TMG denied the specified Uniform Resource Locator (URL).”

The fix is to dive into your Registry using RegEdit. Check under HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER for the tree Software\Microsoft\Internet Explorer\Main\FeatureControl. You will need to find or create the key FEATURE_SHOW_FAILED_CONNECT_CONTENT_KB942615. In it, create a value named iexplore.exe, set its type to REG_DWORD, and set its value to 0x00000001.

Be aware that making this change has the potential to open you up to a security exploit that has been known since at least 2011, and thus is likely in use by numerous viruses floating around the web. I recommend that you only implement this change when you’re actively needing to access something on the page via proxy and cannot access it any other way. Once you’ve gotten what you need, remove the key so you close the security hole.

Alternatively, ditch IE. If you’re married to using a Microsoft browser, Edge is supposed to be quite good in comparison. Chrome, of course, has a bunch of fixes above you can try.

The Android Issue

Android phones and tablets run Google’s operating system, and Google naturally ships Chrome as the default web browser.

Android Proxies

That means if you’re trying to use a proxy service on a mobile device running Android, you’re stuck with Chrome unless you’ve installed an alternative browser. Also, despite being Chrome, the chrome://chrome/settings URL doesn’t exist on mobile. If you’re encountering the error in mobile Chrome, try these steps:

  1. Try updating your version of Chrome. On an Android device, you need to go to the Play Store and swipe over to the “my apps” tab, and then install any updates that are available for the Chrome app.
  2. Try opening a new incognito tab to do your browsing there. If your issue was caused by the data being passed or a data mismatch, as mentioned in the IP authentication section, incognito should solve it by not passing as much data along. Note that if this solves the problem, you may have a Chrome extension affecting your ability to connect.
  3. Clear your mobile browser cache and cookies. Yes, this is a step you should do even on mobile devices.
  4. Clear your profile. This is something of a nuclear option, but it can remove any data hampering your ability to browse via proxy. Go to your device settings, open the app settings, and select Chrome. Once there, clear the app data. This will remove your profile, clear your local state, and reset any errant flags. Be aware that you may have to log back in, re-enter data, reopen old tabs, and generally restore your sessions manually.

Another alternative is a dedicated proxy manager for your phone. ProxyDroid is one such manager, though it requires a rooted device for full access. This is a problem you’ll encounter a lot with proxy management on mobile: you need root access to make the most of what these apps provide. By default, mobile devices segregate much of their data to make them harder to compromise, which means many of the advanced features we take for granted on PCs are locked out on mobile.

If every desktop and mobile fix fails, maybe the problem isn’t on your end. Contact your proxy provider and work with them to see if the problem is on their side; if so, they can fix it from where they are. If not, it’s also possible that the site you’re trying to access is having issues. The ERR_TUNNEL_CONNECTION_FAILED error can also appear when the site you’re trying to reach is under a DDoS attack, in which case you’ll just have to wait for the issue to resolve itself.

The post How to Fix an “Err Tunnel Connection Failed” Proxy Error appeared first on GhostProxies Blog.

Proxy Servers: Setup and Configuration Guide

Proxy Setting Configuration

A proxy can be defined as anything used to represent something or someone else. A few daytime television shows, for instance, keep a backup cast to replace the main actors when they are sick or unavailable; the backup actors are proxies because they stand in for someone else. In the realm of technology, it is much the same: a person might use a proxy when they do not want to appear as themselves.

It is common knowledge that the Internet, in general, is an enormous machine that constantly takes in and gives out a stream of data. On most occasions that stream includes at least basic information about the person accessing it, such as location and general demographics. In an age of increasing worry about internet security, this is a real concern for people who need to access content freely without their personal information being stored by companies with ill intent.

Reasons for a Proxy

While there are nefarious reasons to use a proxy service to hide internet activity, there are just as many valid ones. There is, after all, the overriding concern for security and privacy as technology and the internet expand into unexplored realms of growth. Many users may simply want to access the internet through a proxy to keep their personal information personal.

Outside of privacy, many organizations use various methods (i.e. firewalls) to limit a person’s access to the internet. Schools use them to ensure students are not browsing sites with games, violence, or other content the district deems inappropriate or overtly distracting. By the same token, workplaces restrict workers from activities that would either put company data at risk or cut into that worker’s productivity.

It is in this light that a proxy shines. Since the person accessing the internet is using the proxy to ‘represent’ them, the rules placed on the connection no longer apply. By running online activity through a proxy, the filters see only the proxy instead of a gaming website, personal email, or whatever other innocent application a student or worker wants to use.

A quick word should be said about illegal activity: people intending to use a proxy to hide it will not be entirely hidden. The Internet Service Provider (ISP) can still see, if it cares to look, that traffic is being routed through a proxy. For the most part, an ISP will not go out of its way to check who is using a proxy or think much of it; since there are valid reasons to use one, the provider may only look when suspicious activity is paired with proxy use.

How to Get a Proxy?

Getting a basic proxy service is usually quite simple. Many websites provide the proxy directly through the site itself (such as this one). This method is quite clever, since specific internet traffic is tunneled through that one website. For instance, say a student wants to play a game in his downtime, but the school has every gaming website he knows about locked down. He can open the proxy website and enter the gaming site’s address to reach his games. All of that traffic is handled on the proxy server and mirrored in the website, which means the school only sees the student accessing the proxy site. Unfortunately, that is also the major downfall of a web-based proxy: it can easily be blocked once someone notices the rules are being bypassed through it.

Check Proxy Settings

Another option is to configure the device’s internet settings to route traffic through the proxy server. This method is not as simple as the previous one, but it tends to work very well once everything is configured properly. Most popular browsers and applications, like Google Chrome, Firefox, and Internet Explorer, support proxy configuration.

First, before any application can be configured, the person looking to bypass rules or hide their activity needs to find a company providing access to a proxy server. Some are free, but the most secure connections, with all the functionality a proxy should have, usually require a small service fee.

A Few Words of Warning

Using a proxy clearly has benefits for keeping information private or bypassing internet restrictions, but there are, unfortunately, a few areas where sacrifices have to be made. One of them is the speed of the internet connection.

This makes sense, since the connection is routed through a third-party system (a middleman) at every point of communication with the servers being accessed. Moreover, a careless choice of proxy server can be counterproductive to the goal of privacy: the third-party server tunnels all of the data streamed through it and can store a cache of commonly accessed pages and information. For both security and speed, anyone looking to use a proxy should find a provider that lists what information it keeps and how that affects the customer’s privacy.

Configuration of the Proxy Server

To start with the very basics of setting up a proxy connection, the new browser for Windows 10, Microsoft Edge, has a simple configuration for proxy settings. Instructions for the two popular Microsoft browsers, Edge and Internet Explorer, are listed first below, followed by Firefox and Chrome.

Microsoft Edge

Edge Proxy Settings

1. Open Microsoft Edge
2. Navigate to the ‘More’ Menu in the top right of the window
3. Click ‘Settings’ at the bottom of the drop down menu
4. Scroll down until ‘View Advanced Settings’ is visible and click on that button
5. A new window will open up with the Automatic Proxy and Manual Proxy setup options
6. The company providing the proxy server will provide the information that needs to be entered here

Internet Explorer

Internet Explorer Proxy Settings

1. Open Internet Explorer
2. Navigate to ‘Settings’ in the top right
3. Scroll down and select ‘Internet Options’
4. Open the Connections tab and click ‘LAN settings’
5. Input the information provided by the proxy provider

Mozilla Firefox

Firefox Proxy Settings

1. Open Mozilla Firefox
2. Navigate to the Menu button in the top right
3. Find the ‘Advanced’ option in the left-hand menu
4. Select the ‘Network’ tab from the top options
5. Open the ‘Settings’ under the Connection section
6. Choose ‘Manual proxy configuration’
7. Enter the information given by the proxy provider

**If a proxy connection was previously set up in Internet Explorer or Microsoft Edge, selecting ‘Use system proxy settings’ will import the same proxy information that was entered there.

Google Chrome

Chrome Proxy Settings

1. Open Google Chrome
2. Navigate to the Menu in the top right corner
3. Go into ‘Settings.’
4. Scroll down until ‘Show advanced settings’ is visible at the bottom of the page
5. Click on that and then go into the ‘change proxy settings’ under the Network section
6. Select ‘Settings’ on the window that pops up.
7. Click on LAN settings
8. Choose ‘use a proxy server for your LAN.’
9. Enter in the information provided by the proxy server

Some Good Things to Know

There are multiple forms of security when using a proxy connection. The company hosting the proxy server will state which security protocols it uses, and will typically include everything the customer needs to know to configure their devices for the service. Listed below are a few terms that may come up in a supplier’s instructions or on their forums.

HTTP – With proxy connections, the HTTP protocol forwards HTTP requests to the correct servers. The proxy recreates the request from the original device and forwards that exact request (or as close as possible) to the desired destination. The communication between server and device is otherwise left open and unencrypted.
SOCKS – This protocol uses a handshake to initiate the connection with the proxy server. The connection can carry both UDP and TCP traffic, and it can work in reverse. SOCKS has at least two commonly used versions, SOCKS4(a) and SOCKS5; each version improves on the last, with SOCKS5 bringing stronger authentication, IPv6 support, and the previously mentioned UDP. (Both protocols appear in the code sketch after this list.)
P2P – This is a general term for peer-to-peer connections, which are used to download torrents and other shared content. Proxy suppliers will state whether they support P2P.
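
To make the HTTP/SOCKS distinction concrete, here is a minimal PHP sketch in the same cURL style used earlier in this blog. The proxy addresses and the target URL are placeholders, and fetchThroughProxy() is just a name I’ve given the helper; CURLPROXY_HTTP and CURLPROXY_SOCKS5 are the relevant cURL constants.

// Fetch a page through a proxy of the given type.
function fetchThroughProxy($targetUrl, $proxy, $type) {
	$ch = curl_init($targetUrl);

	curl_setopt($ch, CURLOPT_PROXY, $proxy);         // proxy in IP:PORT format
	curl_setopt($ch, CURLOPT_PROXYTYPE, $type);      // CURLPROXY_HTTP or CURLPROXY_SOCKS5
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // return the body instead of printing it
	curl_setopt($ch, CURLOPT_TIMEOUT, 10);

	$body = curl_exec($ch);
	curl_close($ch);

	return $body;
}

// Placeholder addresses; substitute the details from your provider.
$viaHttp   = fetchThroughProxy('http://example.com/', '1.2.3.4:8080', CURLPROXY_HTTP);
$viaSocks5 = fetchThroughProxy('http://example.com/', '1.2.3.4:1080', CURLPROXY_SOCKS5);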

Still Need Help?

The majority of companies providing proxy servers and Virtual Private Network (VPN) services give customers extensive instructions on accessing their proxy servers and on setting up specific applications and browsers. These resources should always be checked before setting up the connection and during any troubleshooting.

To check that the proxy server is routing your traffic, navigate to one of the many websites that automatically detect and list the connection’s address; many search engines will also display it when ‘what is my IP address’ is typed into the search bar. The address shown should now be the proxy address you entered rather than your own. Troubleshooting is only needed if there is reason to suspect the proxy is not working (i.e. certain web pages remain blocked, or the address listed is the same as the old one).
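
You can also script this check. The sketch below assumes PHP with the curl extension and uses the public ipify service as an example echo endpoint; the proxy address is a placeholder.

// Ask an IP-echo service what address our traffic appears to come from.
$proxy = '1.2.3.4:8080'; // placeholder; your proxy in IP:PORT format

$ch = curl_init('https://api.ipify.org/');
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

$apparentIp = curl_exec($ch);
curl_close($ch);

// For most dedicated proxies the echoed IP matches the proxy's own address;
// some providers use a different outgoing IP, so compare against what they list.
list($proxyIp) = explode(':', $proxy);
echo ($apparentIp === $proxyIp) ? 'Proxy is routing traffic.' : 'Traffic is not going through the proxy.';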

When troubleshooting, refer to the supplier’s client dashboard and, possibly, their software to help find the problem; these will help check whether the port, address, or security protocol was entered incorrectly. Should that fail, the company supplying the proxy will most often have a support line to contact for additional assistance in getting the proxy up and running!

The post Proxy Servers: Setup and Configuration Guide appeared first on GhostProxies Blog.

The Definitive Tutorial on Setting Up Zennoposter

Zennoposter

Zennoposter is one of the many tools to arrive with the recent trend of automating internet marketing tasks. While the idea of automation itself isn’t new, Zennoposter’s complete lack of coding makes it exceptionally easy to use. Much of what you want to automate is handled in the background, so you don’t need to worry about the technical details.

The program is essentially a macro shell into which you can “code”, without actually coding, the various tasks you want completed. This covers a wide range of work, from email processing and sending to blog commenting, forum registration, and article posting. If you have something you need done routinely and repeatedly in the course of your marketing, you can probably automate it through Zennoposter.

The biggest drawback of the program as a whole is just how free-form it is. There’s a lot going on, and it has a steep learning curve, even without needing to code anything. It’s also a risky tool to use; if you’re not careful, you can go overboard with black hat SEO techniques and earn yourself a penalty. Though, more likely, you’ll eat an IP ban and need to figure out how to keep using the program – or the site you were banned from – in the future.

Zennoposter works with a sort of flowchart model. It’s graphical, hiding the actual code behind the GUI so you don’t need to deal with it. This image shows a good basic example. On the left is the project flowchart, and on the right is an embedded browser window to record the actions Zennoposter will take and iterate. You can see the steps of the project flow already. The project clears cookies to prevent errors with session memory, then visits WordPress.com. Once there, it fills out a website form in the box, then clicks the create website button.

Zennoposter Flow Chart

That alone is a very simple process made with Zennoposter, used for bulk blog creation on WordPress. It’s also not a complete process; it leaves blog creation partway through and doesn’t take advantage of it in any way. This is where you can expand the project or create more projects to take up where this one left off.

If you look at the bottom left of the image, you’ll see a variables section. This allows you to create lists with which you can expand the process. For example, the process above creates one website, but if you list ten different names in the variables section, it will iterate the process ten times, each time using one of the variables.

You can do this on the back of other processes as well. Create a process that registers ten email addresses, then uses those ten email addresses to register ten websites, then check each email address for the verification email to confirm the creation of that website, and so on.
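
Conceptually, what Zennoposter is doing with that variables list is just a loop. As a rough PHP analogy, not Zennoposter’s actual internals, with createBlog() as a hypothetical stand-in for the recorded project steps:

// Hypothetical stand-in for the recorded project steps.
function createBlog($siteName) {
	// clear cookies, visit WordPress.com, fill the form, click create...
}

// The variables list: one full pass through the project per entry.
$siteNames = array('firstblog', 'secondblog', 'thirdblog');

foreach ($siteNames as $siteName) {
	createBlog($siteName);
}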

Account Creation

One of the primary things many people use Zennoposter for is creating accounts for various web services. You can use the program to create fresh email accounts, new websites on free hosts like WordPress.com, accounts for commenting systems, and so forth.

Account Creation on Twitter

Account creation can be a bit of a complex task to automate, and for good reason; many of these companies don’t want people automating account creation, because it’s most often used for spam.

When you go to create an account through Zennoposter, you’ll want to use the action recorder to visit the site and record navigating to the “join us” page. From there, you can use the macro builder to create a name, email address, and password for each relevant field. You can use the variables section as well, if you want to pull from a list rather than generate a random name. Any field can be set manually, filled in from a list, or created dynamically, as the sketch below illustrates.
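
In PHP terms (again as an analogy, not Zennoposter’s internals), the three ways a field can be filled look something like this:

// Set manually:
$username = 'fixedname';

// Filled in from a list (the variables section):
$names = array('alice', 'bob', 'carol');
$username = $names[0];

// Created dynamically:
$username = 'user' . mt_rand(10000, 99999);
$password = substr(md5(uniqid('', true)), 0, 12);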

That’s all you need to create an account, but you can go further. You can select a piece of text from the confirmation page and create a check to make sure the account creation worked, which will throw an error if the process fails for one reason or another.

Often, you’ll need to verify the creation of an account, and to do that, you need email processing.

Zennoposter Email Processing

What is email processing, in the context of Zennoposter? Well, the general idea is that you give the program access to your email inbox, and it looks for emails of a certain type, and performs a certain action on those emails. For example, checking email inboxes for account verification emails and clicking the links in those emails to verify the creation of the account on whatever site it was for.

Confirm Account

Email processing requires POP3 access. POP3, if you don’t know, is one of the protocols used for pulling emails from a service into a client. In other words, if you’re using Gmail for your email, but you want to access it through Zennoposter, you need to give Zennoposter POP3 access to the Gmail email servers. Some webmail services don’t allow POP3, while others do. Gmail, in fact, does allow it, but they catch on very quickly to abuse.

Essentially, the more volume you have going with your automation, the more risky it becomes. Gmail will catch on sooner than others, but other free webmail services will terminate accounts if it looks like they’re being used to automate large-scale spam operations.

The easiest solution to this problem is to use your own email server or dedicated server, where you as sysadmin have control over the filtering rules. You won’t be blocking yourself, and no one else will be using the email system, so you don’t have to apply filters at all.

Using POP3 email through Zennoposter requires the name of the POP3 server, the access port (generally 110), and the username and password of the email account. It’s quite simple to implement as well; just record a macro and fill in the information as requested.
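
If you want to see what that POP3 step looks like outside of Zennoposter, here is a minimal PHP sketch using the imap extension. The server name, credentials, and search subject are all placeholders.

// Connect to a POP3 inbox and pull verification links out of matching emails.
// Requires the PHP imap extension; server and credentials are placeholders.
$inbox = imap_open('{pop.example.com:110/pop3}INBOX', 'user@example.com', 'password');

if ($inbox === false) {
	die('POP3 connection failed: ' . imap_last_error());
}

// Find messages that look like account verification emails.
$messages = imap_search($inbox, 'SUBJECT "verify"');

if (!empty($messages)) {
	foreach ($messages as $number) {
		$body = imap_fetchbody($inbox, $number, 1);

		// Grab the first link in the body; a real project would visit it with cURL.
		if (preg_match('/https?:\/\/\S+/', $body, $match)) {
			echo 'Verification link: ' . $match[0] . "\n";
		}
	}
}

imap_close($inbox);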

For a good demonstration of both account creation and email processing, you can view this video series.

Captcha Busting

Zennoposter is, at its heart, a bot. As such, you will run into issues with bot filtering processes, like with Gmail or with WordPress, as well as all sorts of other systems. Even Google has captchas in place to prevent repetitive bot activity. There’s one way I’ll discuss shortly to help alleviate the issue, but you’re going to run into it sooner or later.

By default, Zennoposter will stop its actions when it encounters a captcha, to prevent running face-first into a wall over and over and getting itself banned. It’s up to you to notice and perform the captcha manually to set the bot in motion again. Unlike other bot programs, Zennoposter doesn’t include a captcha breaker by default. There’s no DeathByCaptcha integration or anything of that sort.

However, Zennolabs has produced a captcha breaker, known as CapMonster, which you can buy to add on to the core program. It allows you to program in captchas as you encounter them, or pull data for common captchas from a central database. It’s not 100% foolproof, but it will handle the majority of the captchas you see in the wild. The downside, of course, is price. CapMonster costs nearly $100 per year, which is a lot if you’re rarely encountering captchas and don’t need a full automated solution.

CapMonster

CapMonster also may or may not work on the Google “I am not a robot” check box captchas. These have been notoriously hard to break consistently, and reports are mixed as to how successful CapMonster is.

Anti-Bot Workarounds

The number one way to work around bot filtering is to use proxy connections. When 10 accounts are made from one IP address, it’s easy for a piece of software to say “hey, this looks like one bot creating a bunch of accounts, better shut that down before it gets too spammy.” If you use a different proxy for each connection, though, the software sees 10 different accounts made from 10 different IP addresses, and as far as it’s concerned there’s nothing wrong with that.
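
In cURL-based PHP terms, the same style as the gateway tester earlier in this blog, rotating proxies is just a matter of assigning a different CURLOPT_PROXY per request. The proxy list and signup URL below are placeholders.

// Placeholder proxy list; each request goes out through a different IP.
$proxies = array(
	'1.2.3.4:8080',
	'5.6.7.8:8080',
	'9.10.11.12:8080'
);

foreach ($proxies as $proxy) {
	$ch = curl_init('http://example.com/signup');

	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 10);

	// To the target site, each request appears to come from an unrelated address.
	$response = curl_exec($ch);
	curl_close($ch);
}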

Email Account Blocked

Obviously, you can avoid most issues with bot filtering by using proxies. However, there are some instances where you want to turn them off. For example, if you’re using one Gmail account to create multiple accounts on other sites, you don’t want to use different proxies to access the same Gmail inbox; Gmail will detect your account logging on from different locations and will block access or lock it down out of concern for user security. Try to make sure you have one IP assigned to one Gmail address.

Now, proxies come in two flavors: public and private. Public proxies, as you might expect, are freely accessible to anyone who wants to use them. They’re often used by casual browsers looking to obfuscate their locations, and they’re geared as such.

Public proxies will work for Zennoposter, but I’m warning you right now; they’ll probably be a headache. See, public proxies come from all around the world. They’re often slow and flooded with traffic already. They’re free, but they often lace your browsing experience with an overlay or ads, which can cause issues with some Zennoposter macros.

  • Public proxies are much slower than private proxies, due to the mass of users already using them.
  • Public proxies can run into geolocation filters on some websites, such as automatic redirects to local versions of sites.
  • Public proxies are prone to dying unexpectedly. Your connection might not go through the first try, or the server might be taken down completely.
  • Public proxies might have anyone behind the wheel; you never know if there’s a man in the middle or a sniffer along the way, siphoning off your data.

All of this makes it a hassle to use public proxies for much of anything where serious automation is concerned. Private proxy lists are much better. They’re faster, they’re well maintained, and they’re often more tightly geolocated in first world nations. They don’t show up on IP filter lists, and they’re typically guaranteed secure. Plus, if one does get taken down, it will be replaced in the list. Of course, you’ll have to pay for private proxy lists, but that’s a small price to pay for the freedom and ease of experience you get just from avoiding the public proxy migraine.

If you’re looking to use Zennoposter on a small scale, like creating 2-3 sites or accounts and just automating the processes involved, I wouldn’t recommend it. It’s like using a sledgehammer to tap in a thumb tack; it’s overkill. It’s more hassle than it’s worth. If you have a medium-scale project, maybe 10-20 sites or accounts, you can probably get away with doing it slowly using public proxies. Even then, you’ll probably have some issues. If you’re going to be using Zennoposter on a large scale, with 20+ accounts, you’re going to want to learn all of the macro options and, more importantly, buy some reliable private proxies.

Seriously, it cannot be overstated how much easier life is when you don’t have to worry about your internet connection sabotaging you. Automation is meant to be automatic; if you have to babysit it, what’s the point?

The post The Definitive Tutorial on Setting Up Zennoposter appeared first on GhostProxies Blog.

How to Use Proxies to Surf Tor Anonymously

Tor and Proxies

Tor is an interesting concept. The idea of layered connection bouncing through an anonymized network with semi-random exit nodes is a good one, when it comes to security. You essentially put a cloud in the way of anyone trying to track your traffic. You enter it and they have no idea where you exit. At least, that’s the theory.

Tor is not without its issues, though. It’s slow, much slower than using the Internet normally, or even through a single proxy server. It’s also not quite as anonymous as people would like to think it is. There have been a lot of big raids from the FBI, the NSA, Europol, and other law enforcement agencies. It’s not really all that secure.

All someone needs to track you down through Tor is the ability to force you to download a large file and the ability to monitor some of your traffic; with those, they can see both the entry and exit points and ignore Tor’s routing completely. You’ve essentially slowed down your internet connection for no practical benefit, and the attacker, be they a hacker or the FBI, can simply put 1 and 1 together to get 2.

Circumventing Tor’s anonymity this way requires two things. First, the connection entering the entry node has to be yours. Second, you have to be visiting a site controlled by a malicious actor, whether an FBI honeypot or a hacker setup designed to harvest data for later sale.

Tor Honey Pot Example

On the normal web, you can just make sure not to browse websites you don’t trust. On the deep web, sites don’t have those reputations, or those reputations can be seeded and falsified. You have much less ability to avoid sites you don’t trust when the entire purpose of what you’re doing is visiting the seedy underbelly of the internet.

If you can’t ensure the quality of the reputation of the site you’re reaching from the exit node, you have to ensure the anonymity of the traffic you’re sending into the entry node. You can do this by layering in a level of proxy connection.

Of course, a proxy server isn’t exactly anonymous either. You have to be just as sure of the proxy you’re using as you would be of the exit site you’re reaching normally. Generally, this means that public proxy lists are out, and you have to be confident in the quality of the private proxy lists you’re using.

Achieving true anonymity is difficult. That’s by design. Governments want to be aware of what communications are happening, to avoid terrorist plots and foil uprisings. It’s no different than any system implemented for the security of something larger. However, some degree of anonymity is necessary to keep governments from getting too oppressive. China’s Great Firewall is security gone too far, and something like a proxied Tor connection can help circumvent that kind of oppression.

I’m saying this to remind you that the issue isn’t black and white. There are terrorists using anonymous internet setups to plot. There are governments seeking them out to protect citizens. There are criminals using Tor to buy drugs and weapons. There are law enforcement agencies preventing crime by tracking them down. There are activists using anonymous traffic to plan rallies and build support. There are oppressive regimes tracing their traffic to stomp them out, arrest them, or even kill them.

I’m not here to judge why you would want to use proxies with Tor to anonymize your traffic. I’m just here to help you do it.

Tor Vs Proxies

If Tor is designed to obfuscate the middle between your computer and a client, and a proxy server is designed to obfuscate the middle between your computer and a client, what’s the difference between them?

A proxy server is a single server set up to relay incoming traffic to its intended destination, stripping it of identifying information and replacing it with its own. This can change your apparent geographic location, web browser, client version, and other details. The problem is that a proxy server is a single server: anyone capable of accessing it can install monitoring software that keeps logs and pieces 1 and 1 together. The proxy server knows who you are, and it can be the subject of a man-in-the-middle attack.

How a Proxy Works

Tor is like using a layer of semi-randomized proxy servers. You have a designated entry node that changes periodically, and a randomly chosen exit node that passes your request to its actual destination. In the middle are a number of hops through servers chosen randomly from a swarm; there might be 50 servers available, and you pass through 4 of them. Each time you connect, you pass through different internal nodes and come out through a different exit node. Along the way, your traffic is encrypted and neither monitored nor modified.

How Tor Works

The idea is that even if one of the servers is a bad server, it just sees a proxy on either end, neither of which pass more information about you. All the monitor would see is that the traffic is encrypted using Tor, and that’s not in itself illegal or punishable. Some agencies may see it as suspicious, but there’s no way for someone at the FBI to differentiate between you using Tor to browse Wikipedia or you using Tor to buy cocaine.

Now, some people worry that Tor has a backdoor built into it. This isn’t true, despite modern attempts by various agencies to force the installation of such backdoors. While Tor isn’t necessarily 100% secure, compromising it takes a more sophisticated attack than “a special key the government can use to get in.” That, or a simpler user error, like using BitTorrent over Tor.

Using a Proxy with Tor

Adding a proxy server before the Tor entry node helps boost your anonymity and security. It helps obfuscate your location in case anyone is monitoring the entry node and tracing traffic.

If you’ve tried to set up a proxy server with your internet connection to route it through Tor in the past, you’ll probably notice that Tor’s icon says it is disabled as soon as you enable the proxy. That’s because Tor expects to be able to route your traffic into an entry node, but you’re forcing the traffic to go to the proxy server instead. You’re circumventing Tor to use the proxy.

What you need to do instead is set up the proxy server from within Tor. In the Tor browser, click Tor Network Settings and choose Configure. The first option asks whether you want to use a proxy; answer yes and you’ll be brought to a proxy configuration screen. Choose HTTP or SOCKS, then enter the IP, the username and password, or whatever other information the proxy requires. Once that’s in, click Next and then Connect. If your information was correct, Tor will now route through your chosen proxy server in addition to the Tor network.

Tor Proxy Settings

You can read a bit more about Tor proxy configuration options, including what sort of connections are required to operate Tor properly, by reading their manual page here.
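
If you run Tor itself rather than the browser bundle, the equivalent settings live in the torrc configuration file. Here is a minimal sketch with placeholder addresses and credentials; consult the manual linked above for the exact options your Tor version supports.

# Route all outgoing Tor traffic through a SOCKS5 proxy first.
Socks5Proxy 1.2.3.4:1080
Socks5ProxyUsername myuser
Socks5ProxyPassword mypassword

# Or, for an HTTPS (CONNECT) proxy instead:
# HTTPSProxy 1.2.3.4:8080
# HTTPSProxyAuthenticator myuser:mypassword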

Why is Everything So Slow?

The number one problem with Tor is the same problem you get on a smaller scale with proxies, and that’s a slow site speed. It’s a simple matter of physics, really, and it’s not a problem that’s going to be solved.

Imagine you are standing on one end of a room and your friend is standing at the other. You have a baseball. You write a question on the baseball and toss it to your friend. They write the answer and toss it back. You get the answer, and it’s fast.

It’s not secure, though. Anyone taking a snapshot of that scene would see the originator of the traffic and the originator of the response. If they catch the baseball in the middle, they can read the traffic as well.

Now imagine you have a circle of ten friends. You write your question on the baseball, and toss it to a random person. They toss it to another, who tosses it to another, who tosses it to another, who tosses it to the friend who has the answer. That friend writes the answer down and tosses it to a random person in the circle, who tosses it to another, who tosses it to another, who tosses it to you.

Now you have your answer, but it was a lot slower. The baseball bounced between six different people, making a bunch of stops, before getting to its destination with the answer. It’s more secure, though; anyone intercepting the baseball might get the data, but they don’t know who you are or who your friend who wrote the answer is. Any snapshot only gets a minimal partial picture.

The internet works in much the same way, where each friend in the circle is a web server that is part of the Tor network. The connection has to be bounced from server to server, and in some cases it may cross overseas multiple times.

Tor actually makes things worse, because the network is small compared to the amount of traffic passing through it. Back to your circle of friends: imagine there are 75 baseballs flying through the air at any given time. Anyone would be overwhelmed trying to catch and return them quickly enough, and several will often back up and take time to relay. That’s what happens to Tor, slowing it further.

Adding a proxy on top will further slow your traffic, but not by much. A good web proxy isn’t going to be that much slower, particularly when it’s from a private proxy provider. Adding it on to Tor is just adding a person between you and the circle.

There will be occasional other issues with Tor. For example, using an adblocker adds a fingerprint that can be used to break your anonymity. Google will also often glitch out; they serve content based on geographic location, so if your Tor exit node is in a foreign country, you will be served content geared for that country. It also may make you solve a captcha to prove you’re not a bot or malicious user.

Alternative Options

One option you have with Tor is, instead of using a proxy, use a VPN. A VPN can be placed either before or after your Tor connection, and can add more encryption and security than a regular proxy server can provide. On the other hand, VPNs can be even slower than proxies and harder to configure.

VPNs can also get around ISP blocking of Tor traffic, which has been known to happen occasionally: the VPN is all the traffic your ISP sees, acting as a barrier between you and them. However, VPNs don’t just cease operation if they break; to keep your browsing alive, they revert to normal traffic, which means your Tor use becomes visible if the VPN drops. A VPN is also much more likely to hand over your records than a proxy server would be. This is where proxies excel.

At the end of the day, there is no truly secure, anonymous way of browsing the Internet. Whether it’s because you’re sending personal information out, or logging into accounts tied to your information through an anonymizer, it can still be compromised. Tor, if you’re truly afraid of surveillance, is not going to be your solution. Then again, neither will a proxy, or a list of proxies. If, on the other hand, you merely want your traffic hidden but aren’t doing anything to break the law, you’re probably not going to be a target regardless.

The post How to Use Proxies to Surf Tor Anonymously appeared first on GhostProxies Blog.
