Databases are everywhere. In the ever-emerging Internet age our daily interactions with databases and cloud storage have increased in ways we can barely imagine and will only continue to increase in the future. With the rise in the collection of raw data, techniques must be developed to sort through and find the usable data. This is where data mining comes in.
Although data mining is still a relatively new technique for many companies, the practice already carries considerable stigma and public fear. A 2015 report from the Annenberg School for Communication at the University of Pennsylvania found that 55% of people disagreed or strongly disagreed with the idea that “it’s okay if a store where I shop uses information it has about me to create a picture of me, that improves the services they provide for me” (Turow, Hennessy, & Draper, 2015). Sensationalist headlines and news reports describe data mining with words like “invasive”, which pushes public perception towards a more negative view of it. But is it something we really need to be worried about?
        
Despite the concerns which surround the use of data mining and the use of consumers’ personal information, the practice itself is essential to sort through the massive amounts of data in order to find the best possible solution to the problem presented. The process, which combines statistics, artificial intelligence and machine learning (SAS, n.d.), has applications across multiple fields ranging from science and research to business management (The Economic Times, n.d.).
Once the initial business problem has been laid out, analysts can assess whether data mining can help solve it (MicroStrategy, n.d.). If it can, the process can be described in three main stages: exploration, model building (or pattern identification), and evaluation and deployment (insideBigData Editorial Team, 2018).
Exploration in data mining involves preparing the data that is to be mined. Data visualization tools (MicroStrategy, n.d.) are used to explore the data and ensure that it is fit for the purpose of solving the initial problem. This exploration can involve anything from choosing predictors for a regression model to applying various graphical and statistical methods. It allows the business to identify the most relevant variables and get an idea of the general nature of the model it should anticipate (insideBigData Editorial Team, 2018). During this stage the data is also cleansed (MicroStrategy, n.d.): the database is checked, and information that is incomplete, incorrect, improperly formatted, duplicated or irrelevant is either removed or updated (blue-pencil, 2018). Cleansing is usually necessary because the information has been collected over several years, which leaves some of it obsolete. It is typically performed on distributed systems, which improves speed and information security whilst reducing the strain on any single computer system (MicroStrategy, n.d.).
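The cleansing step described above can be illustrated with a minimal sketch. The record fields, the sample values and the cutoff date are all hypothetical, and the validity check is deliberately crude; a real pipeline would apply far richer rules.

```python
from datetime import date

# Hypothetical customer records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"id": 1, "email": "Ana@Example.com ", "last_seen": date(2019, 5, 1)},
    {"id": 2, "email": "bob@example.com", "last_seen": date(2011, 3, 9)},   # obsolete
    {"id": 3, "email": None, "last_seen": date(2019, 1, 20)},               # incomplete
    {"id": 4, "email": "ana@example.com", "last_seen": date(2019, 6, 2)},   # duplicate of id 1
    {"id": 5, "email": "not-an-email", "last_seen": date(2018, 11, 30)},    # incorrect
]


def cleanse(records, cutoff):
    """Return records that are complete, well formed, recent and unique by email."""
    seen_emails = set()
    cleaned = []
    for record in records:
        email = record.get("email")
        if not email:                        # incomplete: missing value
            continue
        email = email.strip().lower()        # improperly formatted: normalize
        if "@" not in email:                 # incorrect: crude validity check
            continue
        if record["last_seen"] < cutoff:     # obsolete: older than the cutoff
            continue
        if email in seen_emails:             # duplicated: keep the first occurrence
            continue
        seen_emails.add(email)
        cleaned.append({**record, "email": email})
    return cleaned


cleaned = cleanse(RAW_RECORDS, cutoff=date(2015, 1, 1))
print([record["id"] for record in cleaned])
```

With this sample data only the first record survives, because each of the others trips one of the five checks.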
The second stage in data mining is model building, or pattern identification. In essence, this means examining various data models and then choosing the one that best fits the business problem initially set out (insideBigData Editorial Team, 2018). Data mining can be sorted into two primary types: supervised and unsupervised learning. Which type suits the target problem depends on whether the problem requires a single output variable (supervised) or an understanding of the data in order to discern patterns within it (unsupervised). For example, a spam filter requires supervised learning: thousands of sample emails are run through the model with labels saying whether each email was spam or not, which allows the model to predict which of the unlabeled emails later fed to it are spam. In contrast, an unsupervised learning model is the most common one used to enhance customer experience. An example is a recommendation system, which tracks a user's patterns and then uses the data from other users with similar patterns to make personalized recommendations for that user (MicroStrategy, n.d.).
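The supervised spam filter example can be sketched as follows. This uses a small naive Bayes classifier, which is one common choice and not necessarily what any particular company uses; the four training emails are invented, and a real filter would train on thousands of labeled messages.

```python
import math
from collections import Counter

# Tiny hypothetical training set of (text, label) pairs.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(examples):
    """Count words per label and how many examples carry each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the higher log-probability under naive Bayes."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_examples = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in text.split():
            # Add-one smoothing so unseen words do not zero out the probability.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING)
print(predict("claim your free prize", word_counts, label_counts))
print(predict("agenda for lunch", word_counts, label_counts))
```

The labels in `TRAINING` are what make this supervised: the model only learns what spam looks like because each sample arrives already marked.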
The final stage is the evaluation and deployment of the selected model. The model is tested against the business problem that was established at the beginning of the project to determine whether it should be deployed across the company. If it is deemed successful, the model is implemented into everyday business operations. The model then benefits the business in a variety of ways: automated decision-making, accurate prediction and forecasting to help organize supply and demand, cost reduction through more efficient allocation of resources, and customer insights that are then used to improve customer experience (MicroStrategy, n.d.).
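One way to picture the evaluation step is to compare a model's predictions with known outcomes on held-out cases and deploy only if it clears a bar agreed in advance. The sketch below assumes precision and recall as the measures; the labels, predictions and minimum thresholds are hypothetical, and a real project would choose measures that reflect its own business problem.

```python
def precision_recall(actual, predicted, positive):
    """Compute precision and recall for the `positive` label."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def should_deploy(precision, recall, min_precision=0.9, min_recall=0.7):
    """Deploy only if both measures meet the (hypothetical) minimum thresholds."""
    return precision >= min_precision and recall >= min_recall


# Hypothetical held-out labels and the model's predictions for the same cases.
ACTUAL = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
PREDICTED = ["yes", "yes", "no", "no", "no", "yes", "no", "yes"]

precision, recall = precision_recall(ACTUAL, PREDICTED, positive="yes")
print(precision, recall, should_deploy(precision, recall))
```

Here the model scores 0.75 on both measures, so it would be held back under the default thresholds and sent for further work.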
The benefits of data mining from a consumer perspective can be seen on streaming platforms such as Netflix, which use a simple associative learning model to recommend new TV shows and movies to users based on content that they have previously watched or indicated that they want to watch (insideBigData Editorial Team, 2018).
This has the dual benefit of increasing customer enjoyment of the service and increasing customer retention for the company.
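A recommendation of this kind can be sketched by comparing what different users have watched. The example below is an illustration only and makes no claim about how Netflix works internally: the user names and titles are invented, and Jaccard overlap between watch histories is an assumed similarity measure.

```python
# Hypothetical watch histories; names and titles are invented for illustration.
WATCH_HISTORY = {
    "ana": {"Show A", "Show B", "Show C"},
    "ben": {"Show A", "Show B", "Show D"},
    "cy": {"Show E"},
}


def jaccard(first, second):
    """Share of titles two users have in common, between 0.0 and 1.0."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def recommend(user, history, top_n=3):
    """Rank unwatched titles by the similarity of the users who watched them."""
    watched = history[user]
    scores = {}
    for other, titles in history.items():
        if other == user:
            continue
        similarity = jaccard(watched, titles)
        for title in titles - watched:
            scores[title] = scores.get(title, 0.0) + similarity
    ranked = sorted(scores, key=lambda title: (-scores[title], title))
    # Titles backed only by users with no overlap carry no signal, so drop them.
    return [title for title in ranked if scores[title] > 0][:top_n]


print(recommend("ana", WATCH_HISTORY))
```

For "ana" the sketch suggests "Show D", because "ben" shares two of her three titles, while "cy" has nothing in common with her and contributes no recommendation.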
Data mining can also be used to help prevent bank fraud. With supervised learning, a model is trained on data from a third-party warehouse to recognize what a fraudulent transaction looks like and is then applied to the bank's own database to find internal issues (Rajdeepa & Nandhitha, 2015). The model looks for statistically unusual numbers of cash transfers by customer and bank account, as well as transactions or disbursements of cash which are individually just under regulatory reporting thresholds or which together exceed them (Experfy, n.d.). These data mining models have been applied to the ever-growing number of online shopping transactions (which make up 85% of the […]) and have helped to reduce losses to financial institutions by $36.4 billion in the last year, a benefit not only for customers but also for the Australian banks, which often lose money through the refunds they provide to their customers (Housego, 2019).
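The threshold pattern mentioned above can be expressed as a simple rule, shown here as a sketch. The threshold value, the margin that counts as "just under", the account names and the amounts are all hypothetical, and the supervised models described in the sources would learn such patterns from data; they would not hard-code them.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 10_000  # hypothetical value; real thresholds vary by jurisdiction
NEAR_MARGIN = 1_000           # assumption: "just under" means within 1,000 of the threshold

# Hypothetical (account, cash amount) pairs.
TRANSACTIONS = [
    ("acct-1", 9_500), ("acct-1", 9_800), ("acct-1", 120),
    ("acct-2", 2_000), ("acct-2", 3_500),
    ("acct-3", 9_900),
]


def flag_structuring(transactions, threshold=REPORTING_THRESHOLD, margin=NEAR_MARGIN):
    """Flag accounts with repeated near-threshold amounts that together exceed it."""
    near_by_account = defaultdict(list)
    for account, amount in transactions:
        if threshold - margin <= amount < threshold:   # individually just under
            near_by_account[account].append(amount)
    return sorted(
        account
        for account, amounts in near_by_account.items()
        if len(amounts) >= 2 and sum(amounts) > threshold  # together over
    )


print(flag_structuring(TRANSACTIONS))
```

Only "acct-1" is flagged: it has two near-threshold amounts that together exceed the threshold, whereas "acct-3" has a single such amount and "acct-2" has none.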
However, whilst data mining has clear benefits, serious ethical issues are emerging around the process. As the amount of data that organizations store increases, many companies are forced to turn to cloud storage (MicroStrategy, n.d.), which is far more vulnerable to malicious attack from external sources than a physical storage system. In addition, whilst data mining is an incredibly useful tool for businesses to analyze customer behavior to the benefit of both parties (MicroStrategy, n.d.), the question is: when do the insights provided by this analysis begin to infringe on the privacy of an individual?
An article published by Forbes in 2012 (Hill, 2012) outlines how Target figured out that a teenage girl was pregnant before her father did. This was achieved with the same kind of associative learning algorithm that Netflix uses to recommend new content to users. Target statistician Andrew Pole, in an interview with Charles Duhigg of the New York Times (Duhigg, 2012), spoke of how the model noticed that expectant mothers were buying certain items at certain points in their pregnancies, such as “unscented lotions around the beginning of their second trimester”, and “that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc”. Target used these and other shopping habits to assign each shopper a ‘pregnancy score’. If a shopper’s pregnancy score was sufficiently high, the company began sending them advertising and coupons with a focus on maternity and baby items (Hill, 2012). Not only can customers interpret this use of data mining as an invasion of privacy, but it can also have adverse effects on them. For instance, if a person with a high pregnancy score has had a miscarriage, a constant stream of pregnancy-related material from the company can harm their mental health.
Other kinds of data raise privacy issues as well. Customer location data, for example, can be stored by mobile apps even when the device is not in use, as was provided for in the 2015 privacy policy update of the Uber app (Singer & Isaac, 2015). Many users of these apps are unaware that this tracking technology is included because it is hidden within the pages of the software's terms and conditions (Kukral, 2019).
Databases are everywhere. As the Internet continues to evolve, the collection and analysis of data will continue, for better or for worse. As we move forward, the question will become how we need to adapt our laws to encompass these changes and ensure the continued safety and privacy of people both online and offline.
Bibliography
- Blue-pencil. (2018, July 16). Data Cleansing: What Is It and Why Is It Important. Retrieved August 29, 2019, from blue-pencil Information Security: https://www.blue-pencil.ca/data-cleansing-what-is-it-and-why-is-it-important/
- Duhigg, C. (2012, February 16). How Companies Learn Your Secrets. Retrieved August 29, 2019, from The New York Times Magazine: https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=1&_r=1&hp
- Experfy. (n.d.). Bank Fraud Detection. Retrieved August 29, 2019, from Experfy Website: https://www.experfy.com/fraud-risk/banking-fraud-management
- Hill, K. (2012, February 16). How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did. Retrieved August 7, 2019, from Forbes: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/#15bf67856668
- insideBigData Editorial Team. (2018, November 24). Data Mining and Predictive Analytics: Things We should Care About. Retrieved August 29, 2019, from insideBigData: https://insidebigdata.com/2018/11/24/data-mining-predictive-analytics-things-care/
- Kukral, J. (2019, March 28). The Ethical Dilemma Posed by Data Mining. Retrieved August 29, 2019, from The Carroll News: https://carrollnews.org/1787/opinion/the-ethical-dilemma-posed-by-data-mining/
- MicroStrategy. (n.d.). Data Mining Explained. Retrieved August 7, 2019, from MicroStrategy Website: https://www.microstrategy.com/us/resources/introductory-guides/data-mining-explained
- Rajdeepa, B., & Nandhitha, D. (2015, July). Fraud Detection in Banking Sector using Data mining. International Journal of Science and Research, 4(7), 1822-1825. Retrieved August 29, 2019
- SAS. (n.d.). Data Mining. Retrieved August 7, 2019, from SAS Website: https://www.sas.com/en_au/insights/analytics/data-mining.html
- Singer, N. (2015, June 4). Sharing Data, but Not Happily. Retrieved August 29, 2019, from The New York Times: https://www.nytimes.com/2015/06/05/technology/consumers-conflicted-over-data-mining-policies-report-finds.html
- Singer, N., & Isaac, M. (2015, June 22). Uber Data Collection Changes Should Be Barred, Privacy Group Urges. Retrieved August 29, 2019, from The New York Times: https://www.nytimes.com/2015/06/23/technology/uber-data-collection-changes-should-be-barred-privacy-group-urges.html
- The Economic Times. (n.d.). Definition of 'Data Mining'. Retrieved August 7, 2019, from The Economic Times Website: https://economictimes.indiatimes.com/definition/data-mining
- Turow, J., Hennessy, M., & Draper, N. (2015, June). The Tradeoff Fallacy. Philadelphia: Annenberg School for Communication, University of Pennsylvania. Retrieved August 29, 2019
- wikiHow. (2019, March 29). How to Change Netflix Preferences. Retrieved August 29, 2019, from wikiHow Website: https://www.wikihow.com/Change-Netflix-Preferences