A new kind of pub crawl

by Angela Herring

August 23, 2012

Social-networking sites such as Facebook and LinkedIn contain massive amounts of valuable public information. Automated tools called web crawlers sift through these sites, pulling out information on millions of people in order to tailor search results and create targeted ads or other marketable content.

But what happens when “the bad guys” employ web crawlers? According to Engin Kirda, Sy and Laurie Sternberg Interdisciplinary Associate Professor for Information Assurance in the College of Computer and Information Science and the Department of Electrical and Computer Engineering, the crawlers then become tools for spamming, phishing or targeted Internet attacks.

“You want to protect the information,” Kirda said. “You want people to be able to use it, but you don’t want people to be able to automatically download content and abuse it.”

Kirda and his colleagues at the University of California–Santa Barbara have developed new software called PubCrawl to solve this problem. PubCrawl both detects and contains malicious web crawlers without limiting normal browsing. The team joined forces with one of the major social-networking sites to test PubCrawl, which is now being used in the field to protect users’ information.

Kirda and his collaborators presented a paper on their novel approach at the 21st USENIX Security Symposium in early August. The article will be published in the proceedings of the conference this fall.

In the cybersecurity arms race, Kirda explained, malicious web crawlers have become increasingly sophisticated in response to stronger protection strategies. In particular, they have become more coordinated: Instead of utilizing a single computer or IP address to crawl the web for valuable information, efforts are distributed across thousands of machines.

“That becomes a tougher problem to solve because it looks similar to benign user traffic,” Kirda said. “It’s not as straightforward.”

Traditional protection mechanisms such as CAPTCHAs, which operate on an individual basis, are still useful, but their deployment comes at a cost: Users may be annoyed if too many CAPTCHAs are shown. PubCrawl, a nonintrusive alternative, was specifically designed with distributed crawling in mind. By identifying IP addresses with similar behavior patterns, such as connecting at similar intervals and frequencies, PubCrawl detects what it suspects to be distributed web-crawling activity.
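To make the idea of grouping IP addresses by similar behavior concrete, here is a minimal Python sketch. It is not PubCrawl's actual algorithm, which this article does not describe in detail. It assumes one simple notion of similarity: each address's requests are counted per fixed time window, and addresses whose count series are strongly correlated are grouped together. The function names, the window size, the threshold and the sample addresses are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def count_series(timestamps, window, n_windows):
    """Turn request timestamps (in seconds) into per-window request counts."""
    counts = [0] * n_windows
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return counts


def pearson(a, b):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def group_similar_ips(requests, window=60, n_windows=10, threshold=0.9):
    """Group IPs whose request-count series are strongly correlated.

    `requests` maps an IP address to a list of request timestamps.
    The threshold is an illustrative assumption, not a published value.
    """
    series = {ip: count_series(ts, window, n_windows) for ip, ts in requests.items()}
    parent = {ip: ip for ip in series}  # union-find forest over IPs

    def find(ip):
        while parent[ip] != ip:
            parent[ip] = parent[parent[ip]]  # path halving
            ip = parent[ip]
        return ip

    for a, b in combinations(series, 2):
        if pearson(series[a], series[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for ip in series:
        groups[find(ip)].add(ip)
    # Only groups of two or more addresses hint at coordinated activity.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical traffic: two addresses that fetch in lockstep and one irregular visitor.
sample = {
    "192.0.2.1": [1, 2, 121, 122, 241, 242, 361, 362, 481, 482],
    "192.0.2.2": [3, 4, 123, 124, 243, 244, 363, 364, 483, 484],
    "198.51.100.7": [10, 70, 75, 80, 300, 550],
}
suspected = group_similar_ips(sample)
```

In this toy data the two lockstep addresses end up in one group and the irregular visitor is left alone. A real system would need far more robust features and would have to compare addresses at a scale where checking every pair is impractical.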

Once a crawler is detected, the question is whether it is malicious or benign. “You don’t want to block it completely until you know for sure it is malicious,” Kirda explained. “Instead, PubCrawl essentially keeps an eye on it.”

Potentially malicious connections can be rate-limited while a human operator takes a closer look. If the operator decides that the activity is malicious, the IP addresses can also be blocked.
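The contain-then-decide approach can be illustrated with a short Python sketch. It assumes a token-bucket rate limiter, a common general-purpose technique; the article does not say which mechanism PubCrawl uses. The class names, rates and addresses are hypothetical.

```python
class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ContainmentPolicy:
    """Rate-limit suspected crawlers; block only after an operator confirms."""

    def __init__(self, rate=1.0, capacity=5):
        self.rate = rate
        self.capacity = capacity
        self.suspects = {}    # ip -> TokenBucket
        self.blocked = set()  # IPs an operator has judged malicious

    def flag(self, ip):
        """Start rate-limiting an address that looks like a crawler."""
        self.suspects.setdefault(ip, TokenBucket(self.rate, self.capacity))

    def block(self, ip):
        """Record an operator's decision that the address is malicious."""
        self.blocked.add(ip)

    def allow(self, ip, now):
        """Decide whether a request from `ip` at time `now` (seconds) may proceed."""
        if ip in self.blocked:
            return False
        bucket = self.suspects.get(ip)
        return True if bucket is None else bucket.allow(now)


policy = ContainmentPolicy(rate=1.0, capacity=2)
policy.flag("192.0.2.1")
burst = [policy.allow("192.0.2.1", 0.0) for _ in range(3)]
```

Addresses that were never flagged pass through untouched, which mirrors the goal of containing crawlers without getting in the way of normal browsing.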

In order to evaluate the approach, Kirda and his colleagues used it to scan logs from a large-scale social network, which then provided feedback on its success. Then, the social network deployed it in real time, for a more robust evaluation. Currently, the social network is using the tool as a part of its production system. Going forward, the team expects to identify areas where the software could be evaded and make it even stronger.
