A Review of Dimensionality Reduction Methods and Their Applications

ABSTRACT

In the world we live in today, interest in data reduction has risen greatly. This is because of the numerous benefits that come with working with a smaller, more efficient dataset instead of the original large dataset.

By the same token, dimensionality reduction can be adopted in computer science with an emphasis on reducing memory usage in order to free storage capacity on a machine. An example of this would be to reduce digital images, which are stored as 2D matrices.

Dimensionality reduction is a process whereby, given a collection of data points in a high-dimensional Euclidean space, we project them into a lower-dimensional Euclidean space without suffering great distortion.

The result obtained by working in the lower-dimensional space then becomes a good approximation to the result obtained by working with the original dataset in the high-dimensional space.
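As a minimal sketch of this idea (written in Python/NumPy for illustration; the report's own implementations are in MATLAB), a Gaussian random projection maps points from a 1,000-dimensional space down to 50 dimensions while approximately preserving pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 1000, 50          # 100 points, 1000 dimensions, reduced to 50
X = rng.standard_normal((n, d))  # original high-dimensional data

# Gaussian random projection: i.i.d. normal entries, scaled by 1/sqrt(k)
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                        # reduced data, shape (100, 50)

# Check the distortion of one pairwise distance
orig = np.linalg.norm(X[0] - X[1])
red = np.linalg.norm(Y[0] - Y[1])
print(red / orig)                # close to 1 for most pairs
```

This distance-preservation property is the content of the Johnson-Lindenstrauss lemma, which underlies the random projection technique discussed later in the report.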

Dimensionality reduction techniques fall into two categories. The first category includes those in which each attribute in the reduced set is a linear combination of the attributes in the original dataset; these include Random Projection (RP) and Principal Component Analysis (PCA).

The second category includes those in which the set of attributes in the reduced set is a proper subset of the attributes in the original dataset.

These include the other six techniques I implemented: the New Random Approach, the Variance Approach, the First Novel Approach, the Second Novel Approach, the Third Novel Approach, and the LSA-Transform Approach.

I also compared the techniques mentioned above by how well they preserve images. Furthermore, I looked at the various applications of dimensionality reduction, examples of which include:

image reduction, text data, nearest neighbor search, k-nearest neighbor search, similarity search in a time series, clustering, and classification. The MATLAB programming language is used for all implementations in this project.

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGEMENT
ABSTRACT

CHAPTER 1

1.1 PROJECT OBJECTIVE
1.2 BACKGROUND
1.3 ADVANTAGES

CHAPTER 2

2.0 DIMENSIONALITY REDUCTION TECHNIQUES
2.2 PRINCIPAL COMPONENT ANALYSIS
2.3 NEW RANDOM APPROACH (NRA)
2.4 SINGULAR VALUE DECOMPOSITION (SVD)
2.5 VARIANCE APPROACH
2.6 LATENT SEMANTIC ANALYSIS (LSA)-TRANSFORM
2.7 FIRST NOVEL APPROACH
2.8 SECOND NOVEL APPROACH
2.9 THIRD NOVEL APPROACH

CHAPTER 3

3.0 APPLICATIONS
3.1 TEXT DATA
3.2 NEAREST NEIGHBOR SEARCH
3.3 SIMILARITY SEARCH IN A TIME SERIES
3.4 CLUSTERING
3.5 CLASSIFICATION
3.6 K-NEAREST NEIGHBOR

CHAPTER 4

4.0 IMPLEMENTATION AND RESULTS OF DIMENSIONALITY REDUCTION ON IMAGES
4.1 REDUCTION WITH PRINCIPAL COMPONENT ANALYSIS
4.2 REDUCTION WITH RANDOM PROJECTION
4.3 REDUCTION WITH THE NEW RANDOM APPROACH
4.4 REDUCTION WITH VARIANCE
4.5 REDUCTION WITH THE FIRST NOVEL APPROACH
4.6 REDUCTION WITH THE SECOND NOVEL APPROACH
4.7 REDUCTION WITH THE THIRD NOVEL APPROACH
4.8 REDUCTION WITH LATENT SEMANTIC ANALYSIS

CHAPTER 5

5.1 CONCLUSIONS
5.2 MATLAB CODES USED
REFERENCES

INTRODUCTION

1.1 Project Objective

This project is mainly a survey of dimensionality reduction, discussing different motives for reducing the dimensionality of a dataset and outlining various works done, the methods used, and their applications in different domains of life.

It then looks in depth at several dimensionality reduction methods and ways in which a few of them can be implemented.

Finally, it compares these techniques by the extent to which they preserve images and outlines the various applications of random projection.

1.2 Background

Assume a data set D contains n points in a high-dimensional space; these can be mapped onto a lower-dimensional space with minimal distortion (see Nsang, Novel Approaches to Dimensionality Reduction and Applications). For example, a data set with 30,000 columns will be difficult to inspect.

Evidently, it would be of great assistance to reduce this to, say, 1,500 columns, which makes the data much easier to analyze than the original 30,000-column data set.

The goal is that, after conducting an analysis of the reduced dataset, the result obtained is a good approximation to the result obtained by analyzing the original data set.
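One way to make "good approximation" concrete is reconstruction error. The sketch below (in Python/NumPy for illustration, on a small stand-in matrix rather than a 30,000-column data set) reduces the data with a truncated SVD, a technique covered later in the report, and measures how much of the original matrix the reduced representation recovers:

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((50, 300))  # small stand-in for a wide data set
k = 15                              # reduced number of dimensions

# Truncated SVD: keep only the top-k singular triples
U, S, Vt = np.linalg.svd(D, full_matrices=False)
D_k = (U[:, :k] * S[:k]) @ Vt[:k]   # best rank-k approximation of D

# Relative Frobenius error: fraction of D not captured by the top k factors
rel_err = np.linalg.norm(D - D_k) / np.linalg.norm(D)
print(rel_err)
```

The smaller the relative error, the closer any analysis of the reduced data should come to the analysis of the original.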

REFERENCES

A. Nsang and A. Ralescu. A review of dimensionality reduction methods and their applications. In Proceedings of the Twentieth Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2009), pages 118–123, 2009.
A. Nsang and A. Ralescu. Approaches to dimensionality reduction to a subset of the original dimensions. In Proceedings of the Twenty-First Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2010), 2010.
A. Nsang and A. Ralescu. More approaches to dimensionality reduction to a subset of the original attribute set. In Proceedings of the Twenty-First Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2010), 2010.
D. Achlioptas. Database-friendly random projections. Journal of Computer and System Sciences, 66:671–687, 2003.
L. Yu, J. Ye, and H. Liu. Dimensionality Reduction for Data Mining: Techniques, Applications and Trends. Binghamton University, State University of New York; Arizona State University, n.d.
