By Ali Azhar

Data mining tools can collect and analyse data in much the same way a human can, but much faster. Learn what data mining is, how it works and how to use it effectively.

Data mining is an important big data management strategy that is gaining steam, especially as organizations realize how many patterns and problems data mining operations can detect across their data sets. In this guide, learn what data mining is, how it operates and why it might be the next data management strategy you need to incorporate into your business.

What is data mining?

Data mining is the process of identifying patterns, correlations and anomalies in large data sets. It helps turn raw data into actionable information that organizations can use to make informed business decisions, predict outcomes and develop business strategies.

Although the term “data mining” wasn’t coined until the 1990s, data mining techniques were used long before that. As the quality and complexity of data increased, software applications were used for data mining. The potential of data mining continues to increase with technological advancements in computing power and the enormous potential of big data.

Benefits of data mining

Data mining helps organizations analyse large amounts of data, deriving useful insights that allow an organization to become more efficient or profitable. As the data available to an organization grows in volume and complexity, data mining provides a semi-automated way to process it.

An organization can make informed decisions and improve its strategic planning by uncovering data patterns, data anomalies and data correlations. Business executives can also use data mining to reduce legal, financial, cybersecurity and other types of risks to the organization.

How data mining operates

Data mining works by exploring and analysing large volumes of data to derive meaningful trends, relationships and patterns. Data mining software solutions are versatile tools that can be used for different objectives and functions like fraud detection, customer sentiment analysis and credit risk management.

Although data mining can be used in various ways, the process includes a few common steps. The first step is to gather and load the data. This step is followed by preparing the data through methods such as data cleansing or data transformation.

Once the data is prepared, it is ready to be mined. Computer applications with data mining algorithms are most frequently used to perform data mining. From there, data mining results are often translated into visual or statistical representations for further analysis.
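The steps above can be sketched in a few lines of Python. The records and field names here are hypothetical, and real projects would use dedicated tooling, but the gather → prepare → mine → summarise flow is the same:

```python
# Gather: raw records as they might arrive from a source system.
raw_records = [
    {"customer": "A", "spend": "120.50"},
    {"customer": "B", "spend": ""},        # missing value to be cleansed
    {"customer": "C", "spend": "75.00"},
    {"customer": "D", "spend": "310.25"},
]

# Prepare: drop incomplete rows and convert types (cleansing/transformation).
prepared = [
    {"customer": r["customer"], "spend": float(r["spend"])}
    for r in raw_records
    if r["spend"] != ""
]

# Mine: a trivial "pattern" -- flag customers whose spend is above the mean.
mean_spend = sum(r["spend"] for r in prepared) / len(prepared)
high_spenders = [r["customer"] for r in prepared if r["spend"] > mean_spend]

# Summarise for further analysis or visualisation.
print(f"mean spend: {mean_spend:.2f}, high spenders: {high_spenders}")
```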

Different types of data mining

There are several types of data mining techniques that businesses can apply to their big data. The right data mining technique to use depends on several factors, including the type of data and the objective of the data mining project. Here are some of the most common types of data mining:

Affinity grouping

Data elements that share the same characteristics are grouped. For example, customers that have the same buyer intent, interests or goals can be grouped. This type of data mining is also known as clustering.
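As a rough illustration of clustering, here is a tiny one-dimensional k-means in plain Python that groups customers by a single hypothetical "monthly spend" feature; production work would normally use a library such as scikit-learn rather than hand-rolled code:

```python
def kmeans_1d(values, k=2, iters=20):
    """Cluster 1-D values into k groups; returns a list of cluster labels."""
    # Seed centroids with values spread across the sorted range.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Recompute each centroid as the mean of its assigned values.
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

spend = [10, 12, 11, 200, 210, 205]   # two natural groups of buyers
labels = kmeans_1d(spend, k=2)
```

The first three customers end up in one cluster and the last three in another, mirroring the "shared characteristics" grouping described above.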

Regression

Predicting data values based on a set of variables. This type of data mining is often used to find relationships between data sets.
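A minimal sketch of the idea, fitting an ordinary least-squares line to hypothetical advertising-spend and sales figures (real projects would typically reach for a statistics or machine learning library):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Hypothetical data: advertising spend vs. sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [2.1, 3.9, 6.1, 8.0]
slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept   # predict sales at a new spend level
```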

Neural networks

Computing systems that are inspired by biological neural networks, such as the human brain. The algorithms in neural networks are useful for recognizing complex patterns in data.
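To make the idea concrete, here is a toy single-neuron example (a perceptron) that learns the logical AND function from examples. It is purely illustrative: practical neural networks stack many such units and are built with libraries such as TensorFlow or PyTorch.

```python
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]   # one weight per input
bias = 0.0
lr = 0.1         # learning rate

for _ in range(20):                      # training epochs
    for (x1, x2), target in samples:
        out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        err = target - out               # perceptron learning rule
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        bias += lr * err

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
```

After training, the neuron outputs 1 only when both inputs are 1, having "recognised" the pattern from the examples alone.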

Association rule

Association rules are established to determine the relationship between data elements. This includes determining co-occurrences and patterns in data.
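A hypothetical market-basket example shows the two standard measures behind association rules, support and confidence, for a rule such as "bread → butter" (real projects often use algorithms like Apriori or FP-Growth via a data mining library):

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"bread", "butter"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / n          # how often bread and butter co-occur overall
confidence = both / bread   # how often butter appears when bread does
```

Here the rule holds in half of all baskets (support 0.5) and in two of the three baskets containing bread (confidence ≈ 0.67).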

Data mining examples

Telecommunications and media

Several industries use data mining, including the telecom and media industries, where it is often used to analyse consumer data. These companies use data mining to map customer behaviour and run highly targeted marketing campaigns.

Insurance

Similarly, data mining is commonly used in the insurance industry, where it helps companies solve complex problems related to compliance, customer attrition and risk management. Health insurance companies use data mining to map the patient’s medical history, examination results and treatment patterns. This helps them develop and execute an efficient health resource management strategy.

Manufacturing

Data mining is also used in the manufacturing industry to align supply chains with sales forecasts and for early detection of future problems. Through data mining, manufacturers are able to anticipate maintenance and predict the depreciation of production assets.

Banking

Finally, the banking industry uses data mining algorithms to detect fraud and other anomalies in their data. Data mining helps banks and other financial institutions achieve optimum ROI on marketing investments, meet compliance requirements and have a better view of market risks.

By Ali Azhar

Ali is a professional writer with diverse experience in content writing, technical writing, social media posts, SEO/SEM website optimization, and other types of projects. Ali has a background in engineering, allowing him to use his analytical skills and attention to detail for his writing projects.

Sourced from TechRepublic

By Kelly Hodgkins

Starting with iOS 14, Apple requires developers to reveal all of the personal data an app can collect. These App Privacy labels may be shocking to users who will be made aware that their iPhone is being used to mine data for advertising and other purposes. Not surprisingly, Google is a principal offender.

When Apple unveiled its new App Privacy labels, Facebook took a swipe at Apple, accusing the company of squashing small companies and putting the free internet at risk. The social network even took full-page advertisements in print newspapers to attack Apple.

After Facebook released its updated Messenger app, Apple’s privacy labels revealed the reasons behind Facebook’s brutal attack.

The company’s Messenger app siphons off a ton of personal data, including search history, browsing history, usage data, and more.

It has four times as many privacy labels as WhatsApp and 30 times as many as iMessage.

Now it is Google’s turn to come under the spotlight. After a short hiatus, the company finally updated its YouTube and Gmail applications.

As with Facebook, the amount of information Google collects is staggering, as noted by BGR. The tech giant mines personal data for third-party advertising, app functionality, analytics, and more.

The most troubling category is the “Other Data,” a catch-all for usages that Google is not ready to disclose.

YouTube gathers more personal information than Gmail, which isn’t surprising. Most of the revenue that YouTube generates comes from advertisements. The company then uses your data for targeted advertising.

Google isn’t providing your data directly to advertisers. Instead, it is organizing your data into categories and allowing advertisers to target specific categories.

Apple isn’t banning Google or even Facebook for mining your data. These new privacy labels are designed to inform you of how your data is being used. You then can decide for yourself if you want to use Google or Facebook, knowing what type of data you are allowing them to access.

Sourced from iDROPNEWS

The application to the advertising industry is so obvious it is like a slap in the face with a wet fish.

By MediaStreet Staff Writers

Lately, social media has been all about heated exchanges and distribution of fake news. And right in the thick of these skirmishes are Twitter bots. They have certainly earned themselves a bad reputation, tweeting on behalf of politicians and driving troll trains through the media landscape with abandon.

But not all bots are bad, according to boffins at USC’s Information Sciences Institute. Computer scientist Emilio Ferrara undertook a large-scale experiment designed to analyse the spread of information on social networks. Ferrara teamed up with Danish boffins from the Technical University of Denmark to deploy a network of “social bots,” programmed to spread positive messages on Twitter.

“We found that bots can be used to run interventions on social media that trigger or foster good behaviours,” says Ferrara, whose previous research focused on the proliferation of bots in the U.S. election campaign.

But it also revealed another intriguing pattern: information is much more likely to go viral when people are exposed to the same piece of information multiple times through multiple different sources. Says Ferrara, “This milestone shatters a long-held belief that ideas spread like an infectious disease, or contagion, with each exposure resulting in the same probability of infection. Now we have seen empirically that when you are exposed to a given piece of information multiple times, your chances of adopting this information increase every time.”

To reach these conclusions, the researchers first developed a dozen positive hashtags, ranging from health tips to fun activities, such as encouraging users to get the flu shot, high-five a stranger and even Photoshop a celebrity’s face onto a turkey at Thanksgiving. Then, they designed a network of 39 bots to deploy these hashtags in a synchronised manner to 25,000 real followers during a four-month period from October to December 2016.

Each bot automatically recorded when a target user retweeted intervention-related content and also each exposure that had taken place prior to retweeting. Several hashtags received more than one hundred retweets and likes. “We also saw that every exposure increased the probability of adoption – there is a cumulative reinforcement effect,” says Ferrara. “It seems there are some cognitive mechanisms that reinforce your likelihood to believe in or adopt a piece of information when it is validated by multiple sources in your social network.”

This mechanism could explain, for example, why you might take one friend’s movie recommendation with a grain of salt. But the probability that you will also see that movie increases cumulatively as each additional friend makes the same recommendation.
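The contrast the researchers describe can be illustrated with a small calculation. This is not the study’s actual model; it simply compares a constant per-exposure adoption probability (“simple contagion”) with one that grows on each additional exposure, using made-up parameters:

```python
def p_adopt_simple(k, p=0.1):
    """Probability of adopting after k exposures at constant probability p."""
    return 1 - (1 - p) ** k

def p_adopt_reinforced(k, p=0.1, boost=0.05):
    """Each successive exposure is more persuasive than the last."""
    prob_none = 1.0
    for i in range(k):
        prob_none *= 1 - min(1.0, p + boost * i)
    return 1 - prob_none

for k in (1, 3, 5):
    print(k, round(p_adopt_simple(k), 3), round(p_adopt_reinforced(k), 3))
```

The two models agree on a single exposure, but the reinforced model pulls ahead with every repeat, which is the cumulative effect Ferrara describes.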

This discovery could improve how positive intervention strategies are deployed in social networks in many scenarios, including public health announcements for disease control or emergency management in the wake of a crisis. The common approach is to have one broadcasting entity with many followers. But this study implies that it would be more effective to have multiple, decentralised bots share synchronised content.

Advertisers, mull this over. Bots can be your very best friend.

Researchers need to be aware of the mistakes that can be made when mining social media data.

By MediaStreet Staff Writers

A growing number of academic researchers are mining social media data to learn about both online and offline human behaviour. In recent years, studies have claimed the ability to predict everything from summer blockbusters to fluctuations in the stock market.

But mounting evidence of flaws in many of these studies points to a need for researchers to be wary of serious pitfalls that arise when working with huge social media data sets, according to computer scientists at McGill University in Montreal and Carnegie Mellon University in Pittsburgh.

Such erroneous results can have huge implications: thousands of research papers each year are now based on data gleaned from social media. “Many of these papers are used to inform and justify decisions and investments among the public and in industry and government,” says Derek Ruths, an assistant professor in McGill’s School of Computer Science.

Ruths and Jürgen Pfeffer of Carnegie Mellon’s Institute for Software Research highlight several issues involved in using social media data sets – along with strategies to address them. Among the challenges:

  • Different social media platforms attract different users – Pinterest, for example, is dominated by females aged 25-34 – yet researchers rarely correct for the distorted picture these populations can produce.
  • Publicly-available data feeds used in social media research don’t always provide an accurate representation of the platform’s overall data – and researchers are generally in the dark about when and how social media providers filter their data streams.
  • The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured. For instance, on Facebook the absence of a “dislike” button makes negative responses to content harder to detect than positive “likes.”
  • Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour.
  • Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are. For instance, efforts to infer political orientation of Twitter users achieve barely 65% accuracy for typical users – even though studies (focusing on politically active users) have claimed 90% accuracy.
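The first pitfall on the list, skewed platform demographics, has a standard fix from survey statistics: reweight group-level estimates to known population shares (post-stratification). The numbers below are hypothetical:

```python
sample = {            # observed on the platform (hypothetical counts)
    "female": {"n": 800, "positive": 480},   # 60% positive sentiment
    "male":   {"n": 200, "positive": 80},    # 40% positive sentiment
}
population_share = {"female": 0.5, "male": 0.5}   # known offline shares

# Naive estimate ignores the platform's skew toward female users.
naive = sum(g["positive"] for g in sample.values()) / sum(
    g["n"] for g in sample.values()
)

# Weighted estimate: each group's rate, weighted by its true population share.
weighted = sum(
    population_share[g] * sample[g]["positive"] / sample[g]["n"] for g in sample
)
```

The naive figure (0.56) overstates positive sentiment relative to the reweighted one (0.50), because the over-represented group is also the more positive one.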

Many of these problems have well-known solutions from other fields such as epidemiology, statistics, and machine learning, Ruths and Pfeffer write. “The common thread in all these issues is the need for researchers to be more acutely aware of what they’re actually analysing when working with social media data,” Ruths says.

Social scientists have honed their techniques and standards to deal with this sort of challenge before.

The infamous ‘Dewey Defeats Truman’ headline of 1948 stemmed from telephone surveys that under-sampled Truman supporters in the general population. Rather than permanently discrediting the practice of polling, that glaring error led to today’s more sophisticated techniques, higher standards, and more accurate polls. Says Ruths, “Now, we’re poised at a similar technological inflection point. By tackling the issues we face, we’ll be able to realise the tremendous potential for good promised by social media-based research.”