dc.contributor.author | Rotbart, Aviv | |
dc.date.accessioned | 2015-11-26T07:14:00Z | |
dc.date.available | 2015-11-26T07:14:00Z | |
dc.date.issued | 2015 | |
dc.identifier.isbn | 978-951-39-6402-3 | |
dc.identifier.other | oai:jykdok.linneanet.fi:1504613 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/47840 | |
dc.description.abstract | Algorithms for modern Big Data analysis must handle both a massive number of
samples and a large number of features (high dimensionality). One way to cope with
these challenges is to assume and discover the existence of localization in the
data by uncovering its intrinsic geometry. This approach suggests that different
data segments can be analyzed separately and then unified in order to gain an
understanding of the whole phenomenon. Methods that efficiently utilize localized
data are attractive for high-dimensional big data analysis because they can
be parallelized, so the computational resources they require are realistic and
affordable. These methods can also explore local properties, such as intrinsic
dimension, that vary among different pieces of data.
This thesis presents two different methods to locally analyze large datasets
for classification, clustering and anomaly detection. The first method localizes
dictionary learning based on matrix factorization techniques. We utilize randomized
LU decomposition and QR decomposition algorithms to build dictionaries
that describe different types of data. These dictionaries are then used to assign
new samples to their respective classes. One application in cyber security deals
with learning computer files and detecting executable code hidden in PDF
files. In a different application, a dictionary learned from normally behaving
computer network data is used to detect anomalies in test data, which may indicate
a cyber threat.
The second method is the localized diffusion process (LDP), which constitutes a
coarse-graining of the classic Diffusion Maps algorithm. In LDP, a Markov walk
is calculated on small data point clouds instead of the original data points. This
work establishes a theoretical foundation for the Localized Diffusion Folders for
hierarchical data analysis. | |
dc.format.extent | 1 online resource (21, [70] pages) | |
dc.language.iso | eng | |
dc.publisher | University of Jyväskylä | |
dc.relation.ispartofseries | Jyväskylä studies in computing | |
dc.relation.haspart | <b>Article I:</b> Guy Wolf, Aviv Rotbart, Gil David, Amir Averbuch. Coarse-grained localized diffusion. <i>Applied and Computational Harmonic Analysis (3):388-400, 2012.</i> <a href="http://dx.doi.org/10.1016/j.acha.2012.02.004" target="_blank">DOI: 10.1016/j.acha.2012.02.004</a> | |
dc.relation.haspart | <b>Article II:</b> Guy Wolf, Aviv Rotbart, Gil David, Amir Averbuch. Hierarchical data organization, clustering and denoising via coarse-grained localized diffusion. <i>CJR conference, Yale, 2012.</i> | |
dc.relation.haspart | <b>Article III:</b> Aviv Rotbart, Gil Shabat, Yaniv Shmueli, Amir Averbuch. Randomized LU decomposition: An algorithm for dictionaries construction. <i>Submitted to IEEE Transactions on Information Forensics and Security, 2014.</i> <a href="http://arxiv.org/pdf/1502.04824v1.pdf">arxiv.org</a> | |
dc.relation.haspart | <b>Article IV:</b> Amit Bermanis, Aviv Rotbart, Moshe Salhov, Amir Averbuch. Incomplete Pivoted QR-based Dimensionality Reduction. <i>Submitted, 2015.</i> <a href="http://www.cs.tau.ac.il/~amir1/PS/qrDR.pdf" target="_blank">Please see.</a> | |
dc.subject.other | localized diffusion | |
dc.subject.other | dictionary learning | |
dc.subject.other | randomized LU | |
dc.subject.other | QR factorization | |
dc.title | High-dimensional Big Data processing with dictionary learning and diffusion maps | |
dc.type | Diss. | |
dc.identifier.urn | URN:ISBN:978-951-39-6402-3 | |
dc.type.dcmitype | Text | en |
dc.type.ontasot | Väitöskirja | fi |
dc.type.ontasot | Doctoral dissertation | en |
dc.contributor.tiedekunta | Informaatioteknologian tiedekunta | fi |
dc.contributor.yliopisto | University of Jyväskylä | en |
dc.contributor.yliopisto | Jyväskylän yliopisto | fi |
dc.contributor.oppiaine | Tietotekniikka | fi |
dc.relation.issn | 1456-5390 | |
dc.relation.numberinseries | 223 | |
dc.rights.accesslevel | openAccess | fi |
dc.subject.yso | data | |
dc.subject.yso | big data | |
dc.subject.yso | analyysimenetelmät | |
dc.subject.yso | algoritmit | |
dc.subject.yso | koneoppiminen | |
dc.subject.yso | matriisilaskenta | |