Data Mining Techniques to Identify Financial Restatements

Data Mining Techniques to Identify Financial Restatements PDF Author: Ila Dutta
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Data mining is a multi-disciplinary field of science and technology widely used in developing predictive models and data visualization in various domains. Although there are numerous data mining algorithms and techniques across multiple fields, there appears to be no consensus on the suitability of a particular model or on how to address data preprocessing issues. Moreover, the effectiveness of data mining techniques depends on the evolving nature of data. In this study, we focus on the suitability and robustness of various data mining models for analyzing real financial data to identify financial restatements. From a data mining perspective, financial restatements are interesting to study for the following reasons: (i) restatement data is highly imbalanced, which requires adequate attention in model building; (ii) many financial and non-financial attributes may affect financial restatement predictive models, which requires careful implementation of data mining techniques to develop parsimonious models; and (iii) the class imbalance issue becomes more complex in a dataset that includes both intentional and unintentional restatement instances. Most previous studies focus on fraudulent (or intentional) restatements, and the literature has largely ignored unintentional restatements. Intentional (i.e. fraudulent) restatement instances are rare and likely to have more distinct features compared to non-restatement cases. However, unintentional cases are comparatively more prevalent and likely to have fewer distinct features that separate them from non-restatement cases. A dataset containing unintentional restatement cases is likely to have more class overlapping issues that may impact the effectiveness of predictive models.
In this study, we developed predictive models based on all restatement cases (both intentional and unintentional restatements) using a real, comprehensive and novel dataset which includes 116 attributes and approximately 1,000 restatement and 19,517 non-restatement instances over the period 2009 to 2014. To the best of our knowledge, no other study has developed predictive models for financial restatements using post-financial crisis events. In order to avoid redundant attributes, we use three feature selection techniques, Correlation based feature subset selection (CfsSubsetEval), Information gain attribute evaluation (InfoGainEval) and Stepwise forward selection (FwSelect), and generate three datasets with reduced attributes. Our restatement dataset is highly skewed towards the non-restatement (majority) class. We applied various algorithms (e.g. random undersampling (RUS), Cluster based undersampling (CUS) (Sobhani et al., 2014), random oversampling (ROS), Synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002), Adaptive synthetic sampling (ADASYN) (He et al., 2008), and Tomek links with SMOTE) to address class imbalance in the financial restatement dataset. We perform classification employing six different classifiers, Decision tree (DT), Artificial neural network (ANN), Naïve Bayes (NB), Random forest (RF), Bayesian belief network (BBN) and Support vector machine (SVM), using 10-fold cross validation and test the efficiency of various predictive models using minority class recall value, minority class F-measure and G-mean. We also experiment with different ensemble methods (bagging and boosting) with the base classifiers and employ other meta-learning algorithms (stacking and cost-sensitive learning) to improve model performance. While applying the cluster-based undersampling technique, we find that various classifiers (e.g. SVM, BBN) show a high success rate in terms of minority class recall value.
For example, the SVM classifier shows a minority recall value of 96%, which is quite encouraging. However, the ability of these classifiers to detect majority class instances is dismal. We find that some variations of synthetic oversampling such as 'Tomek Link + SMOTE' and 'ADASYN' show promising results in terms of both minority recall value and G-mean. Using the InfoGainEval feature selection method, the RF classifier shows minority recall values of 92.6% for 'Tomek Link + SMOTE' and 88.9% for 'ADASYN', respectively. The corresponding G-mean values are 95.2% and 94.2% for these two oversampling techniques, which shows that the RF classifier is quite effective in predicting both minority and majority classes. We find further improvement in results for the RF classifier with a cost-sensitive learning algorithm using the 'Tomek Link + SMOTE' oversampling technique. Subsequently, we develop some decision rules to detect restatement firms based on a subset of important attributes. To the best of our knowledge, only Kim et al. (2016) have performed a data mining study of restatements, and they used only pre-financial crisis restatement data. Kim et al. (2016) employed a matching sample based undersampling technique and used logistic regression, SVM and BBN classifiers to develop financial restatement predictive models. The study's highest reported G-mean is 70%. Our results with clustering based undersampling are similar to the performance measures reported by Kim et al. (2016). However, our synthetic oversampling based results show a better predictive ability. The RF classifier shows a very high degree of predictive capability for minority class instances (97.4%) and a very high G-mean value (95.3%) with cost-sensitive learning. Yet, we recognize that Kim et al. (2016) use a different restatement dataset (with pre-crisis restatement cases) and hence a direct comparison of results may not be fully justified.
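The abstract reports minority class recall and G-mean but does not define them. The following is a minimal Python sketch, assuming the commonly used definition of G-mean as the geometric mean of minority class recall and majority class recall. The function names and the toy labels are illustrative assumptions and are not taken from the study.

```python
import math


def minority_recall_and_gmean(y_true, y_pred, minority=1):
    """Return (minority recall, majority recall, G-mean) for binary labels.

    Assumes G-mean = sqrt(minority recall * majority recall); the abstract
    itself does not state the formula.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    minority_recall = tp / (tp + fn) if (tp + fn) else 0.0
    majority_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return minority_recall, majority_recall, math.sqrt(minority_recall * majority_recall)


def implied_majority_recall(gmean, minority_recall):
    """Back out the majority recall implied by a G-mean and a minority recall."""
    return gmean ** 2 / minority_recall


# Toy data (hypothetical): 4 restatement (1) and 16 non-restatement (0) cases.
y_true = [1] * 4 + [0] * 16

# A classifier that flags everything as a restatement has perfect minority
# recall but detects no majority cases, so its G-mean collapses to zero.
flag_everything = [1] * 20
all_positive_scores = minority_recall_and_gmean(y_true, flag_everything)

# A classifier that is right on 3 of 4 minority and 12 of 16 majority cases.
balanced_pred = [1, 1, 1, 0] + [0] * 12 + [1] * 4
balanced_scores = minority_recall_and_gmean(y_true, balanced_pred)
```

Under this assumed definition, a high minority recall paired with poor majority detection yields a low G-mean, which is why the two figures are reported together. As a rough consistency check, `implied_majority_recall(0.952, 0.926)` gives a majority recall of roughly 98% for the 'Tomek Link + SMOTE' result quoted above.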
Our study contributes to the data mining literature by (i) presenting predictive models for financial restatements with a comprehensive dataset, (ii) focussing on various data mining techniques and presenting a comparative analysis, and (iii) addressing the class imbalance issue by identifying the most effective technique. To the best of our knowledge, we used the most comprehensive dataset to develop our predictive models for identifying financial restatements.
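To make the resampling ideas named in the abstract more concrete, here is a simplified Python sketch of random undersampling (RUS) and of SMOTE-style interpolation between minority points. It is an illustration under stated assumptions, not the study's implementation: the function names, the parameters `k` and `seed`, and the toy points are hypothetical, and the study does not say which software it used for resampling.

```python
import random


def random_undersample(majority, minority, seed=0):
    """RUS: keep every minority row and an equal-sized random subset of majority rows.

    Assumes len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)


def smote_like(minority, n_synthetic, k=2, seed=0):
    """SMOTE-style oversampling on numeric feature tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 20 majority points and 3 minority points in 2-D.
majority_points = [(float(i), float(i)) for i in range(20)]
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

kept_majority, kept_minority = random_undersample(majority_points, minority_points)
new_minority = smote_like(minority_points, n_synthetic=5)
```

Undersampling discards majority information to balance the classes, while the interpolation step creates new minority points instead of duplicating existing ones. Variants such as ADASYN or Tomek-link cleaning add further steps that this sketch does not cover.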

Data mining techniques in financial fraud detection

Data mining techniques in financial fraud detection PDF Author: Rohan Ahmed
Publisher: GRIN Verlag
ISBN: 3668709270
Category : Computers
Languages : en
Pages : 18

Book Description
Seminar paper from the year 2016 in the subject Computer Science - General, grade: 1.7, Heilbronn University, language: English, abstract: This seminar thesis gives an overview of data mining techniques in financial fraud detection. Financial fraud is a major and still growing economic problem. There is therefore great interest in detecting fraud, but the large amounts of data involved make this difficult. For this reason, many data mining techniques are repeatedly used to detect fraudulent activities. The main fraud areas are insurance, banking, health and financial statement fraud. The most widely used data mining techniques are Support Vector Machines (SVM), Decision Trees (DT), Logistic Regression (LR), Naïve Bayes, Bayesian Belief Network, Classification and Regression Tree (CART), etc. These techniques have existed for many years and are used repeatedly to develop fraud detection systems or to analyze fraud.

Comparison of Data Mining Techniques to Correctly Identify Financial Statement Fraud

Comparison of Data Mining Techniques to Correctly Identify Financial Statement Fraud PDF Author: Sterling Panos
Publisher:
ISBN:
Category : Data mining
Languages : en
Pages : 58

Book Description
The results of this study indicate that Support Vector Machine analysis is superior to logistic regression and decision-tree analysis in identifying financial statement fraud, improving overall prediction rates by approximately 8.5%. Support Vector Machine additionally had a 15.5% higher prediction rate than neural networks. The findings suggest that Support Vector Machine analysis is an important new tool in identifying financial statement fraud.

Fraudulent Financial Statement Detection Using Data Mining Techniques

Fraudulent Financial Statement Detection Using Data Mining Techniques PDF Author: 江玟諭
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


The Application of Data Mining Techniques in the Detection of Financial Statement Fraud

The Application of Data Mining Techniques in the Detection of Financial Statement Fraud PDF Author: Zakaria Ouraich
Publisher:
ISBN:
Category : Fraud
Languages : en
Pages : 86

Book Description


ARTIFICIAL NEURAL NETWORK AND OTHER DATA MINING TECHNIQUES

ARTIFICIAL NEURAL NETWORK AND OTHER DATA MINING TECHNIQUES PDF Author: VATSALA A/P VIJAYAN (TP051392)
Publisher:
ISBN:
Category : Artificial neural network
Languages : en
Pages : 63

Book Description
There have been great concerns among stakeholders about how Financial Fraudulent Reporting (FFR) can affect the reputation of public-listed companies (PLCs). FFR has affected many countries around the world, including Malaysia, the focus of the thesis. FFR not only causes significant ethical concern to both individuals and companies but also involves substantial financial losses. Various fraud prediction tools have been developed to detect FFR, including Artificial Neural Network (ANN), Decision Tree and Linear Regression. The current study assesses the reliability of these three tools in detecting FFR committed by Malaysian PLCs. Using time-series analysis of 30 Malaysian PLCs (6 fraudulent PLCs and 24 Financial Distress (FD) PLCs) over an eight-year period (from 2010 to 2017), this research examines seven proxy variables in cases where directors and top management have been charged and prosecuted by the Securities Commission Malaysia (SC) for committing fraudulent reporting and misstatement from 2010 until 2017. Experts claim that ANN technology can outperform standard statistical methods when applied to actual financial data. Therefore, this research compares the three tools as approaches to detecting FFR among Malaysian PLCs. In short, using a quantitative design, this research explores how to detect FFR among Malaysian PLCs using the three data mining techniques. The results indicate that ANN is the most accurate.

Data Mining in Finance

Data Mining in Finance PDF Author: Boris Kovalerchuk
Publisher: Springer Science & Business Media
ISBN: 0792378040
Category : Computers
Languages : en
Pages : 323

Book Description
Data Mining in Finance presents a comprehensive overview of major algorithmic approaches to predictive data mining, including statistical, neural networks, rule-based, decision-tree, and fuzzy-logic methods, and then examines the suitability of these approaches to financial data mining. The book focuses specifically on relational data mining (RDM), which is a learning method able to learn more expressive rules than other symbolic approaches. RDM is thus better suited for financial mining, because it is able to make greater use of underlying domain knowledge. Relational data mining also has a better ability to explain the discovered rules - an ability critical for avoiding spurious patterns which inevitably arise when the number of variables examined is very large. The earlier algorithms for relational data mining, also known as inductive logic programming (ILP), suffer from a relative computational inefficiency and have rather limited tools for processing numerical data. Data Mining in Finance introduces a new approach, combining relational data mining with the analysis of statistical significance of discovered rules. This reduces the search space and speeds up the algorithms. The book also presents interactive and fuzzy-logic tools for 'mining' the knowledge from the experts, further reducing the search space. Data Mining in Finance contains a number of practical examples of forecasting S&P 500, exchange rates, stock directions, and rating stocks for portfolio, allowing interested readers to start building their own models. This book is an excellent reference for researchers and professionals in the fields of artificial intelligence, machine learning, data mining, knowledge discovery, and applied mathematics.

Statistical and Machine-Learning Data Mining

Statistical and Machine-Learning Data Mining PDF Author: Bruce Ratner
Publisher: CRC Press
ISBN: 1466551216
Category : Business & Economics
Languages : en
Pages : 544

Book Description
The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has completely revised, reorganized, and repositioned the original chapters and produced 14 new chapters of creative and useful machine-learning data mining techniques. In sum, the 31 chapters of simple yet insightful quantitative techniques make this book unique in the field of data mining literature. The statistical data mining methods effectively consider big data for identifying structures (variables) with the appropriate predictive power in order to yield reliable and robust large-scale statistical models and analyses. In contrast, the author's own GenIQ Model provides machine-learning solutions to common and virtually unapproachable statistical problems. GenIQ makes this possible — its utilitarian data mining features start where statistical data mining stops. This book contains essays offering detailed background, discussion, and illustration of specific methods for solving the most commonly experienced problems in predictive modeling and analysis of big data. They address each methodology and assign its application to a specific type of problem. To better ground readers, the book provides an in-depth discussion of the basic methodologies of predictive modeling and analysis. While this type of overview has been attempted before, this approach offers a truly nitty-gritty, step-by-step method that both tyros and experts in the field can enjoy playing with.

Data Preparation for Data Mining

Data Preparation for Data Mining PDF Author: Dorian Pyle
Publisher: Morgan Kaufmann
ISBN: 9781558605299
Category : Computers
Languages : en
Pages : 566

Book Description
This book focuses on the importance of clean, well-structured data as the first step to successful data mining. It shows how data should be prepared prior to mining in order to maximize mining performance.

Machine Learning Applications for Accounting Disclosure and Fraud Detection

Machine Learning Applications for Accounting Disclosure and Fraud Detection PDF Author: Papadakis, Stylianos
Publisher: IGI Global
ISBN: 179984806X
Category : Business & Economics
Languages : en
Pages : 270

Book Description
The prediction of the valuation of the “quality” of firm accounting disclosure is an emerging economic problem that has not been adequately analyzed in the relevant economic literature. While there are a plethora of machine learning methods and algorithms that have been implemented in recent years in the field of economics that aim at creating predictive models for detecting business failure, only a small amount of literature is provided towards the prediction of the “actual” financial performance of the business activity. Machine Learning Applications for Accounting Disclosure and Fraud Detection is a crucial reference work that uses machine learning techniques in accounting disclosure and identifies methodological aspects revealing the deployment of fraudulent behavior and fraud detection in the corporate environment. The book applies machine learning models to identify “quality” characteristics in corporate accounting disclosure, proposing specific tools for detecting core business fraud characteristics. Covering topics that include data mining; fraud governance, detection, and prevention; and internal auditing, this book is essential for accountants, auditors, managers, fraud detection experts, forensic accountants, financial accountants, IT specialists, corporate finance experts, business analysts, academicians, researchers, and students.