Employing Probabilistic Matching Algorithms for Identity Management in the Telecommunication Industry

ABSTRACT

The telecommunication industry holds large volumes of data on households, individuals and devices. Advertisers pay a premium to make sure their advertisements reach their target audience.

To ensure that content is personalized, it is necessary to accurately predict who is using a device in real time.

This work develops and implements a probabilistic matching algorithm that determines an individual's profile from behavioural analytics.

Two datasets, ‘People data’ and ‘Device data’, were linked by matching social behaviours. On one side are the behaviours exhibited by the individuals recorded in the People data. On the other are the behaviours associated with the device addresses recorded in the Device data, which reflect the behaviour of the individuals who use those devices.
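The following minimal Python sketch makes the linking step concrete by showing how record pairs from the two datasets might be compared. It assumes each record carries a few categorical behavioural attributes. The field names (`location`, `active_hours`, `top_app`), the helper functions and the sample records are hypothetical, because this section does not list the attributes the study actually used. Each (person, device) pair is reduced to a binary agreement vector, which is the usual input to a probabilistic matcher.

```python
from itertools import product

# Hypothetical behavioural attributes assumed to exist in both datasets.
FIELDS = ("location", "active_hours", "top_app")

def comparison_vector(person, device, fields=FIELDS):
    """Return a tuple with 1 where the field values agree and 0 where they differ."""
    return tuple(int(person.get(f) == device.get(f)) for f in fields)

def compare_all(people, devices, fields=FIELDS):
    """Compare every person record with every device record (no blocking)."""
    return {
        (p["id"], d["id"]): comparison_vector(p, d, fields)
        for p, d in product(people, devices)
    }

# Illustrative sample records (invented for this sketch).
people = [
    {"id": "P1", "location": "area_a", "active_hours": "evening", "top_app": "football"},
    {"id": "P2", "location": "area_b", "active_hours": "morning", "top_app": "news"},
]
devices = [
    {"id": "D1", "location": "area_a", "active_hours": "evening", "top_app": "football"},
    {"id": "D2", "location": "area_b", "active_hours": "evening", "top_app": "news"},
]

vectors = compare_all(people, devices)
for pair, vec in vectors.items():
    print(pair, vec)
```

Comparing every person with every device grows quadratically with dataset size. In practice, a blocking step (for example, only comparing records that share an area) is usually applied first to reduce the number of candidate pairs.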

A match score was computed for each pair of records drawn from the two datasets. It indicates how likely the two records are to refer to the same individual, that is, whether the pair is a true match or not.
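The Fellegi-Sunter model, which the study's outline names, is one common way to turn field agreements into a match score. The sketch below illustrates it under assumed parameters and is not the study's implementation. The m-probabilities, u-probabilities, field names and the two thresholds are all hypothetical. Each agreeing field adds log2(m/u) to the score, and each disagreeing field adds log2((1 - m)/(1 - u)). The total is then compared against an upper and a lower threshold.

```python
import math

# Hypothetical per-field probabilities (assumed values, not taken from the study).
# m = P(field agrees | records truly match); u = P(field agrees | records do not match).
M_PROBS = {"location": 0.9, "active_hours": 0.8, "top_app": 0.7}
U_PROBS = {"location": 0.1, "active_hours": 0.3, "top_app": 0.2}

def match_score(agreement, m_probs=M_PROBS, u_probs=U_PROBS):
    """Sum the Fellegi-Sunter log2 likelihood-ratio weights over all fields."""
    score = 0.0
    for field, agrees in agreement.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            score += math.log2(m / u)              # agreement weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight
    return score

def classify(score, upper=3.0, lower=0.0):
    """Assumed thresholds: >= upper is a match, <= lower is a non-match, else review."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"

for pattern in (
    {"location": True, "active_hours": True, "top_app": True},
    {"location": False, "active_hours": True, "top_app": True},
    {"location": False, "active_hours": False, "top_app": False},
):
    s = match_score(pattern)
    print(round(s, 2), classify(s))
```

Pairs that fall between the two thresholds are usually sent for clerical review rather than matched automatically. In a full pipeline, the m- and u-probabilities would be estimated from the data (for example, with the EM algorithm) rather than fixed by hand.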

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES

CHAPTER 1

1.1. INTRODUCTION
1.2. BACKGROUND STUDY
1.2.1. On Record Linkage
1.2.2. On Identity Management
1.3. LIMITATIONS OF SOME WORKS
1.4. PROBLEM STATEMENT
1.5. AIM OF STUDY
1.6. OBJECTIVES
1.7. TECHNOLOGIES REQUIRED
1.8. DEFINITION OF TERMS
1.9. GENERAL PROBLEMS OF PROBABILISTIC MATCHING
1.10. PROJECT SCOPE

CHAPTER 2

2.1. RECORD LINKAGE
2.1.1. Deterministic Record Linkage Versus Probabilistic Record Linkage
2.2. IDENTITY MANAGEMENT
2.2.1. Authorization Versus Authentication
2.3. RELATED WORKS

CHAPTER 3

3.1. METHODOLOGY
3.1.1. General Problems of Record Linkage That Require Probabilistic Matching
3.2. PROBABILISTIC MATCHING
3.2.1. Probabilistic Matching
3.2.2. Performing Probabilistic Matching
3.2.3. Probabilistic Matching Algorithm
3.2.4. Mathematical Implications of Probabilistic Matching
3.2.5. Fellegi-Sunter Model
3.2.6. Processes in Probabilistic Matching

CHAPTER 4

4.1. MATCH SCORE AND POSITIVE PREDICTIVE VALUE (PPV)
4.1.1. Manual Calculation of Match Score and PPV
4.1.2. Using the Fuzzywuzzy Library for PPV and Match Score
4.1.3. Comparing Manual Matching, Fuzzywuzzy and EM Algorithms
4.2. STRING COMPARATORS
4.3. THE MATCHED DATASETS

CHAPTER 5

5.1. CONCLUSION
5.2. CHALLENGES
5.3. RECOMMENDATION
5.4. CONTRIBUTIONS
5.5. FUTURE WORKS
5.6. APPLICATIONS OF PROBABILISTIC MATCHING
APPENDIX
REFERENCES

INTRODUCTION

The telecommunication industry is one of the subsectors that make up the Information and Communication Technology (ICT) sector.

The industry includes telephone companies, Internet Service Providers (ISPs), and radio and television companies. It grows wider and more complex as the number of devices involved proliferates.

The telecommunication industry generates very high revenue. Research indicates that, because of the industry's increasing scope, telecommunications service revenue will grow from $2.2 trillion in 2015 to $2.4 trillion in 2019.

Advertising is one route to this growth. Advertisers pay large sums to promote their products and services, so it is important not only to advertise but to advertise to the intended user.

When products and services are advertised to the target audience, companies are more likely to sell and users are more likely to buy.

The rapid increase in the number of available devices means that an individual (or a household, as the case may be) often owns more than one device. There is therefore a need to know who is using a device at a particular time and what that user is interested in.

Identity management addresses this need: identifying who is using a device at a given time so that advertisements reach the target audience.
