University of Jyväskylä | JYX Digital Repository

High-dimensional Big Data processing with dictionary learning and diffusion maps

Published in
Jyväskylä studies in computing
Authors
Rotbart, Aviv
Date
2015
Discipline
Information technology (Tietotekniikka)

 
Algorithms for modern Big Data analysis must cope with both a massive number of samples and a large number of features (high dimensionality). One way to meet these challenges is to assume and discover the existence of localization in the data by uncovering its intrinsic geometry. This approach suggests that different data segments can be analyzed separately and then unified in order to gain an understanding of the whole phenomenon. Methods that efficiently utilize localized data are attractive for high-dimensional big data analysis because they can be parallelized, so the computational resources they require are realistic and affordable. These methods can also explore local properties, such as intrinsic dimension, that vary among different pieces of data.

This thesis presents two different methods for locally analyzing large datasets for classification, clustering and anomaly detection. The first method localizes dictionary learning based on matrix factorization techniques. We utilize randomized LU decomposition and QR decomposition algorithms to build dictionaries that describe different types of data. These dictionaries are then used to assign new samples to their respective classes. One application in cyber security deals with learning computer files and detecting executable code hidden in PDF files. In a different application, a dictionary learned from the data of a normally behaving computer network is used to detect anomalies in test data, which may indicate a cyber threat. The second method is the localized diffusion process (LDP), which constitutes a coarse-graining of the classic Diffusion Maps algorithm. In LDP, a Markov walk is calculated on small data point clouds instead of on the original data points. This work establishes a theoretical foundation for the Localized Diffusion Folders for hierarchical data analysis. ...
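The dictionary-based classification described in the abstract can be illustrated with a minimal sketch. It is not the thesis's randomized LU or pivoted QR algorithm: as a stand-in, it assumes that each class lies near a low-dimensional subspace, builds each dictionary by orthonormalizing a few random combinations of the training samples (Gram-Schmidt), and assigns a new sample to the class whose dictionary leaves the smallest residual. All names (`learn_dictionary`, `residual`, `classify`), the class labels and the toy data are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def learn_dictionary(samples, size, rng):
    """Build `size` orthonormal atoms from one class's training samples.

    `samples` is a list of feature vectors. `size` must not exceed the
    dimension of the subspace the samples span, or normalization fails.
    """
    atoms = []
    for _ in range(size):
        # Random Gaussian combination of the samples: a sketch of their span.
        weights = [rng.gauss(0.0, 1.0) for _ in samples]
        v = [sum(w * s[i] for w, s in zip(weights, samples))
             for i in range(len(samples[0]))]
        # Gram-Schmidt step: remove what the existing atoms already explain.
        for atom in atoms:
            c = dot(atom, v)
            v = [vi - c * ai for vi, ai in zip(v, atom)]
        norm = math.sqrt(dot(v, v))
        atoms.append([vi / norm for vi in v])
    return atoms


def residual(atoms, x):
    """Norm of the part of x that the dictionary atoms cannot represent."""
    r = list(x)
    for atom in atoms:
        c = dot(atom, r)
        r = [ri - c * ai for ri, ai in zip(r, atom)]
    return math.sqrt(dot(r, r))


def classify(dictionaries, x):
    """Return the label whose dictionary leaves the smallest residual."""
    return min(dictionaries, key=lambda label: residual(dictionaries[label], x))


def random_sample(basis, rng):
    """Random linear combination of the basis vectors (toy data only)."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in basis]
    return [sum(c * b[i] for c, b in zip(coeffs, basis))
            for i in range(len(basis[0]))]


# Hypothetical toy data: two classes, each confined to its own
# 3-dimensional subspace of a 20-dimensional feature space.
rng = random.Random(0)
bases = {label: [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(3)]
         for label in ("benign", "malicious")}
training = {label: [random_sample(basis, rng) for _ in range(50)]
            for label, basis in bases.items()}
dictionaries = {label: learn_dictionary(samples, 3, rng)
                for label, samples in training.items()}

new_sample = random_sample(bases["malicious"], rng)
predicted = classify(dictionaries, new_sample)
```

The same residual could serve as an anomaly score under these assumptions: a test sample whose residual against a dictionary learned from normal data exceeds a chosen threshold would be flagged.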
Publisher
University of Jyväskylä
ISBN
978-951-39-6402-3
ISSN
1456-5390
Contains publications
  • Article I: Guy Wolf, Aviv Rotbart, Gil David, Amir Averbuch. Coarse-grained localized diffusion. Applied and Computational Harmonic Analysis (3):388-400, 2012. DOI: 10.1016/j.acha.2012.02.004
  • Article II: Guy Wolf, Aviv Rotbart, Gil David, Amir Averbuch. Hierarchical data organization, clustering and denoising via Coarse-grained localized diffusion. CJR conference, Yale, 2012.
  • Article III: Aviv Rotbart, Gil Shabat, Yaniv Shmueli, Amir Averbuch. Randomized LU decomposition: An algorithm for dictionaries construction. Submitted to IEEE Transactions on Information Forensics and Security, 2014. Available on arxiv.org.
  • Article IV: Amit Bermanis, Aviv Rotbart, Moshe Salhov, Amir Averbuch. Incomplete Pivoted QR-based Dimensionality Reduction. Submitted, 2015.
Keywords
localized diffusion; dictionary learning; randomized LU; QR factorization; data; big data; analysis methods (analyysimenetelmät); algorithms (algoritmit); machine learning (koneoppiminen); matrix computation (matriisilaskenta)
URI

http://urn.fi/URN:ISBN:978-951-39-6402-3

Collections
  • Dissertations (Väitöskirjat) [3080]

Related items

Showing items with similar title or keywords.

  • Big high-dimensional data analysis with diffusion maps 

    Wolf, Guy (University of Jyväskylä, 2013)
  • Knowledge discovery using diffusion maps 

    Sipola, Tuomo (University of Jyväskylä, 2013)
  • Dimensionality reduction framework for detecting anomalies from network logs 

    Sipola, Tuomo; Juvonen, Antti; Lehtonen, Joel (CRL Publishing, 2012)
    Dynamic web services are vulnerable to a multitude of intrusions that could be previously unknown. Server logs contain vast amounts of information about network traffic, and finding attacks from these logs improves the ...
  • Adaptive framework for network traffic classification using dimensionality reduction and clustering 

    Juvonen, Antti; Sipola, Tuomo (IEEE, 2012)
    Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting ...
  • An Efficient Network Log Anomaly Detection System using Random Projection Dimensionality Reduction 

    Juvonen, Antti; Hämäläinen, Timo (IEEE, 2014)
    Network traffic is increasing all the time and network services are becoming more complex and vulnerable. To protect these networks, intrusion detection systems are used. Signature-based intrusion detection cannot find ...
Unless otherwise specified, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 waiver.