Automaattisen verkkoharavoinnin menetelmät ja haasteet

Peltomaa, Olli

Abstract

Web scraping is a technique for gathering information from the Internet programmatically, and it can be used for many scientific and commercial purposes. However, web scrapers can encounter a variety of challenges that may force the developer to update the scraper repeatedly. Based on the literature, combining headless browsers with machine learning algorithms produces the scrapers that best tolerate these challenges. The field of web scraping is prone to rapid change, but based on the current literature, machine-learning-based algorithms perhaps offer the most scope for further research.

Main Author

Peltomaa, Olli

Format

Theses: Bachelor thesis

Published

2023

Subjects

web scraping

CAPTCHA

headless browser

web pages

Internet

information technology

information retrieval

The permanent address of the publication

https://urn.fi/URN:NBN:fi:jyu-202305122996

Language

Finnish
