With growing interest in automated data collection solutions that source from multiple online sources, how will data and web scraping solutions evolve to serve changing sales and marketing needs? Julius Černiauskas, CEO at Oxylabs, shares his thoughts:
______
Welcome to this MarTech chat, Julius. Tell us more about Oxylabs.
Well, our business began back in 2015. Six years ago, there was barely a proxy industry to speak of. While the technology existed, there were few widespread uses for it. However, new use cases were emerging, and that piqued our business interest. It seemed like the perfect opportunity to hop on the train before it left the station.
Like almost all businesses, in the beginning we had to pick one solution as our source of revenue. We had some experience dealing with other businesses, especially in the IT sphere, so picking datacenter proxies (those provided by server owners) as our primary solution seemed like the perfect choice.
Our entrance into the proxy industry was marked by both business acumen and luck. We knew that proxies would become important fairly soon; as it turned out, they became significantly more important a lot faster than we had anticipated.
As time passed, we added more products to our list, such as residential proxies. However, we didn’t want to limit ourselves to resource delivery. Automated public web data collection was, and still is, on the rise as the usefulness of information grows each day.
A few years ago, an opportunity presented itself, thanks to one of our data analysts, to create a solution that would help our clients get the data they want by simply sending a few requests. They would no longer have to build an in-house scraping solution. Thus, one of our most advanced solutions, Real-Time Crawler, was born, allowing us to pivot towards a more comprehensive service.
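To make the "simply sending a few requests" idea concrete, here is a minimal sketch of what calling a hosted scraping service could look like from a client's side. The endpoint, payload fields, and credentials below are hypothetical placeholders for illustration only; they are not the actual Real-Time Crawler API.

```python
# Hypothetical example: asking a hosted scraping service for one public page
# and receiving structured data back, instead of running an in-house scraper.
import requests

payload = {
    "url": "https://example.com/product/123",  # public page to fetch
    "geo_location": "United States",           # assumed targeting option
    "parse": True,                             # ask the service for parsed output
}

response = requests.post(
    "https://scraper-api.example.com/v1/queries",  # placeholder endpoint
    auth=("USERNAME", "PASSWORD"),                 # placeholder credentials
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # structured result rather than raw HTML to parse locally
```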
How are you seeing demand for data scraping solutions evolve in sales and marketing today?
Everyone involved in these industries has been repeating the same mantra: “Data is the new oil.” While it’s getting a little tired now, I must say they aren’t wrong. Interest in automated data collection from various online sources is growing exponentially. As access to the technology gets cheaper over time, more and more businesses are turning to data collection as a way to boost revenue.
Initially, data acquisition needs were few and far between. Only tech giants or data-reliant businesses (e.g. VCs, hedge funds, etc.) were in the game. Financial companies were interested in what is now called alternative data because it could provide additional signals for market movements.
A good study back in 2010 showed that even innocuous things such as overall Twitter sentiment can be used as predictors of financial swings. Businesses usually take a bit of time to catch on to the newest research. By 2015, when we started, automated data collection was already being utilized by a number of companies outside the financial services sphere.
Now, all manner of businesses utilize publicly available data. One thing I have noticed about the demand is that the average number of data sources per business has increased. Marketing, sales, and R&D departments are attempting to assemble a larger, more comprehensive view of the data instead of relying on one particular source. Many of them gather data from secondary sources as well (i.e. aggregators of data about a particular target instead of only the target itself).
Additionally, as data enrichment practices take hold, sales and marketing departments have started looking towards matching professional data with leads. They are attempting to optimize the sales and marketing process by getting information on companies before the first interaction between personnel even begins.
Finally, localized data is becoming more important every day. There are many reasons for that, but the primary goal is to serve increasingly relevant information. Instead of attempting to scrape global data, businesses are turning towards acquiring information based on geolocation settings.
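As a rough illustration of geolocation-based collection, the sketch below fetches the same public page through proxies that exit in different countries so that localized prices or content variants can be compared. The proxy host and the country-selection convention are assumptions made up for this example, not any particular provider's format.

```python
# Hypothetical example: fetching one public page through geo-targeted proxies
# to compare the localized versions served in different countries.
import requests

def fetch_localized(url: str, country: str) -> str:
    # Placeholder proxy address; the "country" label in the username is an
    # assumed convention for choosing the exit location.
    proxy = f"http://user-country-{country}:PASSWORD@proxy.example.com:7777"
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
    resp.raise_for_status()
    return resp.text

for country in ("us", "de", "jp"):
    html = fetch_localized("https://example.com/pricing", country)
    print(country, len(html))  # downstream parsing would extract the localized values
```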
How, in your view, can marketers make better use of data with better web scraping solutions?
Honestly, when I get asked a similar question, I think we’re looking the wrong way right from the get-go. Most businesses don’t need more data. They need either better data or better data management practices.
I would pose a question to every data-driven marketer – how much of all the data acquired by your business from all sources is in regular use? The answer is usually “not that much”. Therefore, the first step should always be to clean up internal processes, and only then look outwards for more data acquisition.
Getting too much data is not as harmless as it may seem. There is no free lunch – businesses are paying for that information one way or another. These costs may be indirect (e.g. employee wages or time), but they always exist.
For those who have already solved data management issues, it’s hard to find a wide-sweeping, all-inclusive recommendation. Marketing is a very wide field. However, a good rule of thumb would be to keep a close eye on the competition through scraping solutions.
Following your competitors closely isn’t about beating them to the punch and releasing something sooner. It’s about finding the information gap. If they are taking a new business approach or pivoting somehow, you can be sure they have data you don’t. Spotting that will reveal gaps in your knowledge that can often be filled in with scraped data.
Can you talk about some of the most interesting web scraping technologies and features (or case studies) you’ve come across from the global marketplace?
Well, the industry is spidering off in so many different directions at once that pointing out something specific can be difficult. However, there have been interesting applications for data acquired in a similar manner. One of those is the scientific research into the correlation between Twitter sentiment and Dow Jones Industrial Average closes that I mentioned previously.
A particularly interesting and useful application of web scraping is in cybersecurity – Open-Source Intelligence. In essence, large-scale scraping is used as an early warning system against malicious actors. Discoveries can range from specific information about malware to discussions about specific cybersecurity companies or experts.
In marketing, SEO specialists and developers have been using web scraping for a wide variety of applications, from customized and localized keyword research to building the tools (now industry giants) that deliver insights about ranking algorithms.
A few predictions that you have for the future of martech as a whole and web scraping technologies?
Outside of the tried-and-true predictions of “every business will eventually use data”, I think there are a few trends currently happening that will have a large effect years down the road.
One of these trends is consumer awareness. While governments have been catching up with legislation on private or personal data, in the US the law is still lagging. However, users have become increasingly aware that staying off the grid completely is nearly impossible and are looking for other avenues to retain privacy.
Data poisoning is one of these avenues. Since the concept itself is simple, some applications are already in development. Instead of trying to leave as little data as possible, users attempt to leave random data points. For example, the AdNauseam extension works as a regular ad blocker; however, it additionally clicks all the ads behind the scenes, costing businesses money and eventually ruining customer profiling.
Businesses will have to find ways to discover intentionally obfuscated profiles and remove them from datasets. Essentially, removing obfuscated profiles delivers a win for both the consumer and the business. The consumer retains their privacy by being removed. The business increases the accuracy of its other predictions by removing corrupted data.
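One rough way such screening could work, sketched below under assumed fields and thresholds: AdNauseam-style poisoning clicks essentially everything, so a profile with an implausibly high click rate spread uniformly across ad categories is a candidate for removal. The profile structure, cutoffs, and helper names here are illustrative, not any specific vendor's method.

```python
# Hypothetical screening of "poisoned" profiles before building customer models.
from math import log2

def interest_entropy(category_clicks: dict) -> float:
    """Shannon entropy (bits) of a profile's clicks across ad categories."""
    total = sum(category_clicks.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in category_clicks.values() if c > 0]
    return -sum(p * log2(p) for p in probs)

def looks_poisoned(profile: dict, ctr_cutoff: float = 0.8, entropy_cutoff: float = 3.0) -> bool:
    # Assumed heuristic: very high click-through rate plus near-uniform interests.
    ctr = profile["clicks"] / max(profile["impressions"], 1)
    return ctr > ctr_cutoff and interest_entropy(profile["category_clicks"]) > entropy_cutoff

profiles = [
    {"clicks": 950, "impressions": 1000,
     "category_clicks": {c: 95 for c in "abcdefghij"}},  # clicks everything: likely poisoned
    {"clicks": 12, "impressions": 1000,
     "category_clicks": {"travel": 9, "books": 3}},      # plausible organic behaviour
]
clean = [p for p in profiles if not looks_poisoned(p)]
print(len(clean))  # poisoned profile dropped; remaining data is more trustworthy
```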
As for web scraping technologies themselves, I firmly believe that they will be increasingly driven by machine learning models. We have pioneered implementing machine-learning-based features. Currently, these features are scattered across service providers; however, we believe the importance of artificial intelligence will skyrocket in the near future.
Today, artificial intelligence plays a relatively small role in the overall process. However, data acquisition and parsing lend themselves well to automation through machine learning. Thus, over time, more pieces of the pipeline will be handled by artificial intelligence.
A few takeaways for marketing leaders and CMOs/CEOs in 2021: top factors they should keep in mind as they plan for the rest of the year?
Primarily, I would say that we should look very closely at the data from all three stages of the COVID-19 pandemic: before it, during it, and after it concludes. There were a lot of predictions about how everything would change permanently – from shopping behaviors to the nature of work.
However, we can see trends going both ways. Not all businesses have decided to go for a fully work-from-home environment. Some intend to return to regular office life with only small changes to flexibility.
Consumers aren’t acting as we had originally anticipated, either. Ecommerce has been booming; however, it is clear that some industries might recover a large portion of their brick-and-mortar sales.
In essence, look at the data carefully. See what has changed without falling into the hype. Then act accordingly – innovate and expand where necessary. And if you don’t have someone to give you a detailed analysis? Hire a data team.
Oxylabs is a leading global provider of premium proxies and data scraping solutions for large-scale web data extraction. The company’s mission is clear: To give every business – whether big or small – the right to access big data. With unmatched hands-on experience in web data harvesting, Oxylabs is in trusted partnerships with dozens of Fortune 500 companies and global businesses, helping them unearth hidden gems of business intelligence data through state-of-the-art products and technological expertise.