Salah Sharieh, Head, Transformation Office at RBC.
Adjunct Professor, Ryerson University; Tech Titan 2019
Over the years, the amount of data produced has increased exponentially, leading to the development of robust infrastructure and massive databases able to cope with the demands of managing information. Data in isolation is meaningless; it must be analyzed in order to be transformed into valuable information. This information is in turn translated into useful knowledge, and the process of transforming data into knowledge is called Knowledge Discovery in Databases (KDD), which uses Data Mining (DM) techniques. DM is the core of the KDD process; it uses algorithms to explore and discover unknown patterns.
DM is defined as the process of extracting useful information, patterns and trends from large quantities of data found in sources such as databases, texts, images and data on the Web. KDD explores, analyses and models masses of data hosted in repositories, identifying useful and novel patterns in complex data sets.
In most DM methods, learning is the process of extracting knowledge. A model is built from training data by a learning algorithm and is then evaluated using a test dataset. Due to a high degree of randomness or limitations in the algorithms, several iterations might be necessary before a satisfactory model is found. The model produces a classification/prediction function which predicts the values of future data. These methods are also known as Supervised Learning methods. The main steps within the KDD process are:
1. Business understanding: Define the goals to be achieved and understand the environment in which knowledge discovery will take place.
2. Data Pre-processing:
a. Identification and selection of a data set which will be used for learning during the data mining process
b. Data cleansing for ensuring the completeness and reliability of data
c. Data transformation for preparing better data to increase the accuracy of the results. Statistical methods such as regression analysis, cluster analysis or decision trees can be used in this task.
3. Evaluation: Interpretation of knowledge patterns through a visualization tool. In this step, the results are compared and evaluated against the original goals set at the beginning of the process.
4. Deployment: Report the results of the data mining study and apply them as convenient.
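The learn-then-evaluate cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the toy data, feature names, and the choice of a simple nearest-neighbour classifier are all assumptions made for the example.

```python
import math

# Toy labelled dataset (illustrative): (feature_1, feature_2) -> class label
training_data = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((4.0, 4.2), "high"), ((4.1, 3.9), "high"), ((3.8, 4.0), "high"),
]
# Held-out test set used only for evaluation, as in the KDD steps above
test_data = [((1.1, 0.9), "low"), ((4.2, 4.1), "high")]

def predict(point, data):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(data, key=lambda item: math.dist(point, item[0]))
    return nearest[1]

# Evaluation step: accuracy on the held-out test set
correct = sum(predict(p, training_data) == label for p, label in test_data)
accuracy = correct / len(test_data)
print(f"test accuracy: {accuracy:.2f}")
```

In practice several such train/evaluate iterations are run, adjusting the data preparation or the algorithm, until a satisfactory model is found.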
The taxonomy of DM paradigms provides an understanding of the methods and their grouping. DM is divided into two types: a) Verification-oriented, which verifies a user's hypothesis (traditional statistics); and b) Discovery-oriented, where the system finds new rules and patterns autonomously. The latter comprises two approaches. Descriptive methods focus on data interpretation and are also known as unsupervised learning. Predictive methods aim to build a behavioral model able to predict values and develop patterns; these are also known as supervised learning.
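The descriptive (unsupervised) side of this taxonomy can be illustrated with a minimal clustering sketch: no labels are given, and the algorithm groups the data on its own. The one-dimensional data, the two-cluster setup, and the fixed iteration count are assumptions made for the example.

```python
# Minimal 1-D k-means (k=2): descriptive/unsupervised learning on toy data
values = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers = [values[0], values[-1]]  # naive initialisation from the data itself

for _ in range(10):  # fixed number of refinement steps for simplicity
    clusters = [[], []]
    for v in values:
        # Assign each point to the nearest current cluster center
        nearest = min((0, 1), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    # Recompute each center as the mean of its assigned points
    centers = [sum(c) / len(c) for c in clusters]

print(centers)
```

A predictive (supervised) method would instead be given class labels for the training points and learn a function that assigns labels to new data.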
Common DM techniques include Association, Classification, Sequential Patterns, and Similar Time Sequences. In Association, the relationship of a particular item in data transactions is used to predict patterns through association rules; each rule carries a confidence factor and a support factor. In Classification, methods learn a function that maps items into a set of predefined classes; given the set of classes, the attributes and the learning set, the class of other, unclassified data can be predicted. Sequential pattern analysis aims to find similar patterns in data transactions over a business period. Lastly, Similar Time Sequences discovers sequences similar to a known sequence over past and present business periods.
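The support and confidence factors of an association rule can be computed directly from transaction data. A minimal sketch follows; the shopping-basket transactions and the rule {bread} → {milk} are illustrative assumptions, not data from the text.

```python
# Illustrative transactions: each is the set of items in one purchase
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Support factor: fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence factor: support of the whole rule over support of its antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Rule {bread} -> {milk}
print("support:", support({"bread", "milk"}, transactions))
print("confidence:", confidence({"bread"}, {"milk"}, transactions))
```

Here the rule holds in 2 of 4 transactions (support 0.5), and of the 3 transactions containing bread, 2 also contain milk (confidence 2/3).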
EA and DM techniques play an important role in supporting Business Intelligence (BI), enabling knowledge extraction that can be used tactically for designing business strategies, product development and market analysis. The objective of BI is to support better business decision making; for that purpose, it uses technology, applications and methods for the analysis of data. DM not only "identifies nuggets of information that can result in profitability" but can also be expanded to retrieve more meaningful data based on behaviours rather than on statistical methods alone. However, it also comes with limitations and disadvantages. For DM algorithms to be effective, a suitable data set from which the system can infer learning is necessary, and Web data mining in particular is complex due to the massive volume of information.
Dr. Salah Sharieh is a senior technical innovator with extensive experience in business, technology, and digital transformation. At RBC, he led the delivery of the first Developer Portal in Canada, enabling the API economy and allowing external developers across industries to collaborate and innovate. As Technical Head with BMO, Salah led his team to deliver several high-profile initiatives, such as the first mobile account opening in Canada, the first biometric Touch ID solution, and the first integrated tablet solution for investment and everyday banking.
Salah is a member of the Yeates School of Graduate Studies at Ryerson University, where he has supervised Ph.D. and Master's students. One of his research areas is the new role of CIOs in the new Digital Economy. In addition, he has taught several courses in areas such as security, algorithms, and networks. Salah holds a Doctor of Philosophy from McMaster University. He has more than forty-five peer-reviewed publications and has contributed to several books.