Random Projections for Matrix Decomposition and Manifold Learning
This thesis addresses problems related to the behavior of random
variables in high-dimensional spaces. The main motivation is that many
scientific challenges involve large amounts of high-dimensional
data. It is commonly assumed that a small number of “hidden”
parameters encode the “interesting” part of such data. The question is how to
identify and extract these parameters. This thesis focuses on two different
aspects of data analysis: numerical linear algebra and manifold learning.
Numerical linear algebra is a major component of data analysis. It includes
matrix factorization algorithms such as SVD and LU. SVD is considered to be the
single most important algorithm in numerical linear algebra. However, due to
their computational complexity, classical SVD algorithms cannot be applied in
practice to huge datasets. One possible solution to this problem is to use low-rank
methods. These methods rely on the fact that in many cases there are
dependencies and redundancies within the data. The data can therefore be well
approximated by a low-rank representation, and processing this smaller
representation is faster. In this thesis, low-rank SVD and LU approximation
algorithms are presented. They offer a trade-off between accuracy and
computational time, and they improve on the state-of-the-art algorithms for low-rank
SVD and LU approximation. Since matrix factorization algorithms play a
central role in almost any modern computation, this part of the thesis provides
general tools for many modern big data and data analysis challenges.
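As an illustration of the low-rank idea described above, the following is a minimal sketch of a standard randomized SVD based on a Gaussian random projection. It is not the algorithm developed in the thesis; the function name `randomized_svd` and the parameters `rank`, `oversample`, and `seed` are assumptions made for this example.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate the top-`rank` SVD of A via a Gaussian random projection.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)
    # Sample the range of A by multiplying with a random Gaussian matrix.
    Omega = rng.standard_normal((n, k))
    Y = A @ Omega
    # Orthonormal basis Q for the sampled range.
    Q, _ = np.linalg.qr(Y)
    # Project A onto that basis and decompose the small k-by-n matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The expensive decomposition is applied only to the small projected matrix `B`, which is where the trade-off between accuracy and computational time comes from: a larger `oversample` costs more but captures the range of `A` more reliably.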
Understanding high-dimensional data via manifold learning. Many data
analysis problems are formulated in the language of manifold learning. A typical
assumption is that the data lies on (or near) some unknown manifold embedded in
high dimensions, and the goal is to “understand” the structure of this manifold.
The thesis presents two results on this subject. First, a connection between two of
the most classical methods in manifold learning, PCA and least squares, is presented.
Second, a method for regression over manifolds is presented. It makes it possible to
interpolate functions defined on manifolds given only the values of the function
at several sampled points, without knowing the manifold on which the function
is defined. The ability to solve regression problems over manifolds can enable us
to gain new insights from complex sampled data.
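To make the idea of regression over an unknown manifold concrete, the following is a simplified sketch in the spirit of moving least squares. It is not the method from the thesis. It assumes noise-free samples, a known intrinsic dimension `d`, and a hand-picked bandwidth `h`; the function name `mls_predict` is hypothetical.

```python
import numpy as np

def mls_predict(X, y, query, d, h):
    """Estimate f(query) from samples (X, y) lying on an unknown manifold.

    Illustrative sketch: local coordinates come from a weighted PCA around
    the query point, followed by a weighted linear least-squares fit.
    """
    diffs = X - query
    # Gaussian weights that emphasize samples close to the query point.
    w = np.exp(-np.sum(diffs**2, axis=1) / h**2)
    # Weighted local PCA: the top-d eigenvectors approximate the tangent space.
    C = (diffs * w[:, None]).T @ diffs
    _, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    basis = eigvecs[:, -d:]
    # Local coordinates of every sample in the estimated tangent space.
    local = diffs @ basis
    design = np.hstack([np.ones((len(X), 1)), local])
    # Weighted least squares: scale rows by the square root of the weights.
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    # The query sits at local coordinate 0, so the intercept is the estimate.
    return coef[0]

# Usage: samples on the unit circle, with f given by sin(2t) at angle t.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
y = np.sin(2.0 * t)
query = np.array([np.cos(1.0), np.sin(1.0)])
pred = mls_predict(X, y, query, d=1, h=0.1)
```

The function never receives the circle's parametrization: it works only with the ambient coordinates of the samples and the function values observed at them.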
Keywords: Matrix decompositions, Random projections, SVD, LU, manifold learning,
Regression over manifolds
...
Publisher
Jyväskylän yliopisto
ISBN
978-951-39-7965-2
ISSN
2489-9003
Contains publications
- Article I: Shabat, G., Shmueli, Y., Aizenbud, Y., Averbuch, A. (2018). Randomized LU Decomposition. Applied and Computational Harmonic Analysis, 44(2), 246-272. DOI: 10.1016/j.acha.2016.04.006
- Article II: Aizenbud, Y., Averbuch, A. (2018). Matrix Decompositions Using sub-Gaussian Random Matrices. Information and Inference: A Journal of the IMA, 8(3), 445-469. DOI: 10.1093/imaiai/iay017
- Article III: Aizenbud, Y., Sober, B. (2019). Approximating the Span of Principal Components via Iterative Least-Squares. arXiv:1907.12159
- Article IV: Sober, B., Aizenbud, Y., Levin, D. (2021). Approximation of functions over manifolds : A Moving Least-Squares approach. Journal of Computational and Applied Mathematics, 383, 113140. DOI: 10.1016/j.cam.2020.113140
Collections
- JYU Dissertations [852]
- Väitöskirjat [3580]
Related items
Showing items with similar title or keywords.
- Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?
  Linja, Joakim; Hämäläinen, Joonas; Nieminen, Paavo; Kärkkäinen, Tommi (MDPI AG, 2020)
  Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of ...
- Intelligent solutions for real-life data-driven applications
  Ivannikova, Elena (University of Jyväskylä, 2017)
  The subject of this thesis belongs to the topic of machine learning or, specifically, to the development of advanced methods for regression analysis, clustering, and anomaly detection. Industry is constantly seeking ...
- Comparing the forecasting performance of logistic regression and random forest models in criminal recidivism
  Aaltonen, Olli-Pekka (2016)
  In recent years, the criminal sanctions field has developed models for predicting recidivism (Tyni, 2015), which are typically based on register-based measures that capture, among other things, the convicted person's gender, age, criminal background ...
- An Efficient Network Log Anomaly Detection System using Random Projection Dimensionality Reduction
  Juvonen, Antti; Hämäläinen, Timo (IEEE, 2014)
  Network traffic is increasing all the time and network services are becoming more complex and vulnerable. To protect these networks, intrusion detection systems are used. Signature-based intrusion detection cannot find ...
- Approximation of functions over manifolds : A Moving Least-Squares approach
  Sober, Barak; Aizenbud, Yariv; Levin, David (Elsevier BV, 2021)
  We present an algorithm for approximating a function defined over a d-dimensional manifold utilizing only noisy function values at locations sampled from the manifold with noise. To produce the approximation we do not ...