
Apophanies or Epiphanies: How Crawlers Can Impact Our Understanding of the Web

Files
Apophanies_Epiphanies_How_Crawlers_Impact_Our_Understanding_Web_2020_EN.pdf (1.870Mb)
Identifiers
URI: http://hdl.handle.net/20.500.12761/777
Author(s)
Ahmad, Syed Suleman; Dar, Muhammad Daniyal; Zaffar, Zareed; Vallina-Rodriguez, Narseo; Nithyanand, Rishab
Date
2020-04-20
Abstract
Data generated by web crawlers has formed the basis for much of our current understanding of the Internet. However, not all crawlers are created equal: crawlers generally trade off computational overhead, developer effort, data accuracy, and completeness. The choice of crawler therefore has a critical impact on the data generated and on the knowledge inferred from it. In this paper, we conduct a systematic study of the trade-offs presented by different crawlers and the impact these can have on various types of measurement studies. We make the following contributions. First, we survey all research published since 2015 in the premier security and Internet measurement venues to identify the crawling methodologies deployed for different problem domains and publication venues, and to verify their repeatability. Next, we conduct a qualitative evaluation of a subset of the crawling tools identified in our survey, which allows us to draw conclusions about the suitability of each tool for specific types of data gathering. Finally, we present a methodology and a measurement framework to empirically highlight the differences between crawlers and how the choice of crawler can impact our understanding of the web.