The 7 Most Useful Data Analysis Methods and Techniques

Data analytics is the process of analyzing raw data to draw out meaningful insights. These insights are then used to determine the best course of action.

When is the best time to roll out that marketing campaign? Is the current team structure as effective as it could be? Which customer segments are most likely to purchase your new product?

Ultimately, data analytics is a crucial driver of any successful business strategy. But how do data analysts actually turn raw data into something useful? There are a range of methods and techniques that data analysts use depending on the type of data in question and the kinds of insights they want to uncover.

You can get a hands-on introduction to data analytics in this free short course.

In this post, we’ll explore some of the most useful data analysis techniques. By the end, you’ll have a much clearer idea of how you can transform raw data into business intelligence. We’ll cover:

  • What is data analysis and why is it important?
  • What is the difference between qualitative and quantitative data?
  • Regression analysis
  • Monte Carlo simulation
  • Factor analysis
  • Cohort analysis
  • Cluster analysis
  • Time series analysis
  • Sentiment analysis
  • The data analysis process
  • The best tools for data analysis
  • Key takeaways

The first six methods listed are used for quantitative data, while the last technique applies to qualitative data. We briefly explain the difference between quantitative and qualitative data in section two, but if you want to skip straight to a particular analysis technique, just use the clickable menu.

1. What is data analysis and why is it important?

Data analysis is, put simply, the process of discovering useful information by evaluating data. This is done through a process of inspecting, cleaning, transforming, and modeling data using analytical and statistical tools, which we will explore in detail further along in this article.

Why is data analysis important? Analyzing data effectively helps organizations make business decisions. Nowadays, data is collected by businesses constantly: through surveys, online tracking, online marketing analytics, collected subscription and registration data (think newsletters), social media monitoring, among other methods.

These data will appear as different structures, including—but not limited to—the following:

Big data

The concept of big data—data that is so large, fast, or complex that it is difficult or impossible to process using traditional methods—gained momentum in the early 2000s. Industry analyst Doug Laney then articulated what is now known as the mainstream definition of big data as the three Vs: volume, velocity, and variety.

  • Volume: As mentioned earlier, organizations are collecting data constantly. In the not-too-distant past, storing it all would have been a real problem, but nowadays storage is cheap and takes up little physical space.
  • Velocity: Received data needs to be handled in a timely manner. With the growth of the Internet of Things, this can mean these data are coming in constantly, and at an unprecedented speed.
  • Variety: The data being collected and stored by organizations comes in many forms, ranging from structured data—that is, more traditional, numerical data—to unstructured data—think emails, videos, audio, and so on. We’ll cover structured and unstructured data a little further on.

Metadata

This is data that provides information about other data—an image file’s size and type, for example. In everyday life you’ll encounter it by, for example, right-clicking on a file in a folder and selecting “Get Info”, which will show you information such as file size and kind, date of creation, and so on.

Real-time data

This is data that is presented as soon as it is acquired. A good example of this is a stock market ticker, which provides information on the most-active stocks in real time.

Machine data

This is data that is produced wholly by machines, without human instruction. An example of this could be call logs automatically generated by your smartphone.

Quantitative and qualitative data

Quantitative data—otherwise known as structured data—may appear as a “traditional” database, with rows and columns. Qualitative data—otherwise known as unstructured data—covers everything that doesn’t fit into rows and columns, which can include text, images, videos and more. We’ll discuss this further in the next section.

2. What is the difference between quantitative and qualitative data?

How you analyze your data depends on the type of data you’re dealing with—quantitative or qualitative. So what’s the difference?

Quantitative data is anything measurable, comprising specific quantities and numbers. Some examples of quantitative data include sales figures, email click-through rates, number of website visitors, and percentage revenue increase. Quantitative data analysis techniques focus on the statistical, mathematical, or numerical analysis of (usually large) datasets. This includes the manipulation of statistical data using computational techniques and algorithms. Quantitative analysis techniques are often used to explain certain phenomena or to make predictions.

Qualitative data cannot be measured objectively, and is therefore open to more subjective interpretation. Some examples of qualitative data include comments left in response to a survey question, things people have said during interviews, tweets and other social media posts, and the text included in product reviews. With qualitative data analysis, the focus is on making sense of unstructured data (such as written text, or transcripts of spoken conversations). Often, qualitative analysis will organize the data into themes—a process which, fortunately, can be automated.

Data analysts work with both quantitative and qualitative data, so it’s important to be familiar with a variety of analysis methods. Let’s take a look at some of the most useful techniques now.

3. Data analysis techniques

Now that we’re familiar with some of the different types of data, let’s focus on the topic at hand: different methods for analyzing data.

a. Regression analysis

Regression analysis is used to estimate the relationship between a set of variables. When conducting any type of regression analysis, you’re looking to see if there’s a correlation between a dependent variable (that’s the variable or outcome you want to measure or predict) and any number of independent variables (factors which may have an impact on the dependent variable). The aim of regression analysis is to estimate how one or more variables might impact the dependent variable, in order to identify trends and patterns. This is especially useful for making predictions and forecasting future trends.

Let’s imagine you work for an ecommerce company and you want to examine the relationship between: (a) how much money is spent on social media marketing, and (b) sales revenue. In this case, sales revenue is your dependent variable—it’s the factor you’re most interested in predicting and boosting. Social media spend is your independent variable; you want to determine whether or not it has an impact on sales and, ultimately, whether it’s worth increasing, decreasing, or keeping the same. Using regression analysis, you’d be able to see if there’s a relationship between the two variables. A positive correlation would imply that the more you spend on social media marketing, the more sales revenue you make. No correlation at all might suggest that social media marketing has no bearing on your sales. Understanding the relationship between these two variables would help you to make informed decisions about the social media budget going forward.

However, it’s important to note that, on their own, regressions can only be used to determine whether or not there is a relationship between a set of variables—they don’t tell you anything about cause and effect. So, while a positive correlation between social media spend and sales revenue may suggest that one impacts the other, it’s impossible to draw definitive conclusions based on this analysis alone.

There are many different types of regression analysis, and the model you use depends on the type of data you have for the dependent variable. For example, your dependent variable might be continuous (i.e. something that can be measured on a continuous scale, such as sales revenue in USD), in which case you’d use a different type of regression analysis than if your dependent variable was categorical in nature (i.e. comprising values that can be categorized into a number of distinct groups based on a certain characteristic, such as customer location by continent). You can learn more about different types of dependent variables and how to choose the right regression analysis in this guide.
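
To make this concrete, here is a minimal sketch of a simple linear regression in Python. The CSV file and column names (social_spend, sales_revenue) are hypothetical stand-ins for whatever monthly figures you actually have:

```python
# A hedged sketch only: the CSV and column names below are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("monthly_marketing.csv")   # one row per month (hypothetical file)
X = df[["social_spend"]]                    # independent variable
y = df["sales_revenue"]                     # dependent variable

model = LinearRegression().fit(X, y)
print("Revenue change per $1 of social spend:", model.coef_[0])
print("R-squared:", model.score(X, y))      # fit quality, not proof of causation
```

The slope estimates how revenue moves with each extra dollar of spend, and the R-squared value indicates how well the line fits the data, but, as noted above, neither proves causation.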

Regression analysis in action: Investigating the relationship between clothing brand Benetton’s advertising expenditure and sales

b. Monte Carlo simulation

When making decisions or taking certain actions, there are a range of different possible outcomes. If you take the bus, you might get stuck in traffic. If you walk, you might get caught in the rain or bump into your chatty neighbor, potentially delaying your journey. In everyday life, we tend to briefly weigh up the pros and cons before deciding which action to take; however, when the stakes are high, it’s essential to calculate, as thoroughly and accurately as possible, all the potential risks and rewards.

Monte Carlo simulation, otherwise known as the Monte Carlo method, is a computerized technique used to generate models of possible outcomes and their probability distributions. It essentially considers a range of possible outcomes and then calculates how likely it is that each particular outcome will be realized. The Monte Carlo method is used by data analysts to conduct advanced risk analysis, allowing them to better forecast what might happen in the future and make decisions accordingly.

So how does Monte Carlo simulation work, and what can it tell us? To run a Monte Carlo simulation, you’ll start with a mathematical model of your data—such as a spreadsheet. Within your spreadsheet, you’ll have one or several outputs that you’re interested in; profit, for example, or number of sales. You’ll also have a number of inputs; these are variables that may impact your output variable. If you’re looking at profit, relevant inputs might include the number of sales, total marketing spend, and employee salaries. If you knew the exact, definitive values of all your input variables, you’d quite easily be able to calculate what profit you’d be left with at the end. However, when these values are uncertain, a Monte Carlo simulation enables you to calculate all the possible options and their probabilities.

What will your profit be if you make 100,000 sales and hire five new employees on a salary of $50,000 each? What is the likelihood of this outcome? What will your profit be if you only make 12,000 sales and hire five new employees? And so on.

The simulation does this by replacing all uncertain values with functions which generate random samples from distributions determined by you, and then running a series of calculations and recalculations to produce models of all the possible outcomes and their probability distributions. The Monte Carlo method is one of the most popular techniques for calculating the effect of unpredictable variables on a specific output variable, making it ideal for risk analysis.
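
As a rough illustration, the sketch below simulates profit under uncertain sales volume and marketing spend. Every figure and distribution here is an assumption chosen purely for demonstration:

```python
# Illustrative only: all distributions and figures below are assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                                                # simulated scenarios

units_sold = rng.normal(loc=50_000, scale=8_000, size=n)   # uncertain sales volume
price = 25.0                                               # fixed price per unit
marketing = rng.uniform(200_000, 400_000, size=n)          # uncertain marketing spend
salaries = 5 * 50_000                                      # five new hires at $50,000 each

profit = units_sold * price - marketing - salaries
print("Mean profit:", round(profit.mean(), 2))
print("5th-95th percentile:", np.percentile(profit, [5, 95]).round(2))
print("Probability of a loss:", (profit < 0).mean())
```

Reading off the percentiles and the probability of a loss is what turns the raw simulations into a risk assessment.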

Monte Carlo simulation in action: A case study using Monte Carlo simulation for risk analysis

c. Factor analysis

Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It works on the basis that multiple separate, observable variables correlate with each other because they are all associated with an underlying construct. This is useful not only because it condenses large datasets into smaller, more manageable samples, but also because it helps to uncover hidden patterns. This allows you to explore concepts that cannot be easily measured or observed—such as wealth, happiness, fitness, or, for a more business-relevant example, customer loyalty and satisfaction.

Let’s imagine you want to get to know your customers better, so you send out a rather long survey comprising one hundred questions. Some of the questions relate to how they feel about your company and product; for example, “Would you recommend us to a friend?” and “How would you rate the overall customer experience?” Other questions ask things like “What is your yearly household income?” and “How much are you willing to spend on skincare each month?”

Once your survey has been sent out and completed by lots of customers, you end up with a large dataset that essentially tells you one hundred different things about each customer (assuming each customer gives one hundred responses). Instead of looking at each of these responses (or variables) individually, you can use factor analysis to group them into factors that belong together—in other words, to relate them to a single underlying construct. In this example, factor analysis works by finding survey items that are strongly correlated. This is known as covariance. So, if there’s a strong positive correlation between household income and how much they’re willing to spend on skincare each month (i.e. as one increases, so does the other), these items may be grouped together. Together with other variables (survey responses), you may find that they can be reduced to a single factor such as “consumer purchasing power”. Likewise, if a customer experience rating of 10/10 correlates strongly with “yes” responses regarding how likely they are to recommend your product to a friend, these items may be reduced to a single factor such as “customer satisfaction”.

In the end, you have a smaller number of factors rather than hundreds of individual variables. These factors are then taken forward for further analysis, allowing you to learn more about your customers (or any other area you’re interested in exploring).
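
Here is a minimal sketch of how such a reduction might look in Python with scikit-learn’s FactorAnalysis, assuming a hypothetical survey_responses.csv in which each column is a numeric survey item:

```python
# A hedged sketch: survey_responses.csv is a hypothetical file of numeric survey items.
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

survey = pd.read_csv("survey_responses.csv")
scaled = StandardScaler().fit_transform(survey)      # put all items on the same scale

fa = FactorAnalysis(n_components=2, random_state=0)  # ask for two underlying factors
fa.fit(scaled)

# Loadings show how strongly each survey item is associated with each factor.
loadings = pd.DataFrame(fa.components_.T, index=survey.columns,
                        columns=["factor_1", "factor_2"])
print(loadings.round(2))
```

Items that load heavily on the same factor are candidates for a single underlying construct such as “consumer purchasing power” or “customer satisfaction”.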

Factor analysis in action: Using factor analysis to explore customer behavior patterns in Tehran

d. Cohort analysis

Cohort analysis is a data analytics technique that groups users based on a shared characteristic , such as the date they signed up for a service or the product they purchased. Once users are grouped into cohorts, analysts can track their behavior over time to identify trends and patterns.

So what does this mean and why is it useful? Let’s break down the above definition further. A cohort is a group of people who share a common characteristic (or action) during a given time period. Students who enrolled at university in 2020 may be referred to as the 2020 cohort. Customers who purchased something from your online store via the app in the month of December may also be considered a cohort.

With cohort analysis, you’re dividing your customers or users into groups and looking at how these groups behave over time. So, rather than looking at a single, isolated snapshot of all your customers at a given moment in time (with each customer at a different point in their journey), you’re examining your customers’ behavior in the context of the customer lifecycle. As a result, you can start to identify patterns of behavior at various points in the customer journey—say, from their first ever visit to your website, through to email newsletter sign-up, to their first purchase, and so on. As such, cohort analysis is dynamic, allowing you to uncover valuable insights about the customer lifecycle.

This is useful because it allows companies to tailor their service to specific customer segments (or cohorts). Let’s imagine you run a 50% discount campaign in order to attract potential new customers to your website. Once you’ve attracted a group of new customers (a cohort), you’ll want to track whether they actually buy anything and, if they do, whether or not (and how frequently) they make a repeat purchase. With these insights, you’ll start to gain a much better understanding of when this particular cohort might benefit from another discount offer or retargeting ads on social media, for example. Ultimately, cohort analysis allows companies to optimize their service offerings (and marketing) to provide a more targeted, personalized experience. You can learn more about how to run cohort analysis using Google Analytics.
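
A minimal cohort-retention sketch with pandas might look like the following, assuming a hypothetical orders.csv with customer_id and order_date columns:

```python
# A hedged sketch: orders.csv with customer_id and order_date is hypothetical.
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
orders["order_month"] = orders["order_date"].dt.to_period("M")

# Each customer's cohort is the month of their first order.
orders["cohort"] = orders.groupby("customer_id")["order_month"].transform("min")
orders["months_since_first"] = (
    (orders["order_month"].dt.year - orders["cohort"].dt.year) * 12
    + (orders["order_month"].dt.month - orders["cohort"].dt.month)
)

# Distinct active customers per cohort per month, pivoted into a retention table.
counts = (orders.groupby(["cohort", "months_since_first"])["customer_id"]
          .nunique()
          .unstack(fill_value=0))
retention = counts.divide(counts[0], axis=0)    # share of each cohort still active
print(retention.round(2))
```

Each row of the resulting table is a cohort (the month of first purchase), and each column shows the share of that cohort still active a given number of months later.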

Cohort analysis in action: How Ticketmaster used cohort analysis to boost revenue

e. Cluster analysis

Cluster analysis is an exploratory technique that seeks to identify structures within a dataset. The goal of cluster analysis is to sort different data points into groups (or clusters) that are internally homogeneous and externally heterogeneous. This means that data points within a cluster are similar to each other, and dissimilar to data points in another cluster. Clustering is used to gain insight into how data is distributed in a given dataset, or as a preprocessing step for other algorithms.

There are many real-world applications of cluster analysis. In marketing, cluster analysis is commonly used to group a large customer base into distinct segments, allowing for a more targeted approach to advertising and communication. Insurance firms might use cluster analysis to investigate why certain locations are associated with a high number of insurance claims. Another common application is in geology, where experts will use cluster analysis to evaluate which cities are at greatest risk of earthquakes (and thus try to mitigate the risk with protective measures).

It’s important to note that, while cluster analysis may reveal structures within your data, it won’t explain why those structures exist. With that in mind, cluster analysis is a useful starting point for understanding your data and informing further analysis. Clustering algorithms are also used in machine learning—you can learn more about clustering in machine learning in our guide.
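
As a simple illustration, here is a k-means sketch for customer segmentation (k-means is just one of many clustering algorithms; the file and columns are hypothetical):

```python
# A hedged sketch: customer_features.csv is a hypothetical table of numeric columns.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customer_features.csv")
scaled = StandardScaler().fit_transform(customers)   # scale so no feature dominates

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
customers["segment"] = kmeans.labels_

# Average profile of each segment helps with interpretation.
print(customers.groupby("segment").mean().round(1))
```

Comparing each segment’s average profile is how you start interpreting what the clusters mean; remember that the algorithm finds structure but does not explain it.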

Cluster analysis in action: Using cluster analysis for customer segmentation—a telecoms case study example

f. Time series analysis

Time series analysis is a statistical technique used to identify trends and cycles over time. Time series data is a sequence of data points which measure the same variable at different points in time (for example, weekly sales figures or monthly email sign-ups). By looking at time-related trends, analysts are able to forecast how the variable of interest may fluctuate in the future.

When conducting time series analysis, the main patterns you’ll be looking out for in your data are:

  • Trends: Stable, linear increases or decreases over an extended time period.
  • Seasonality: Predictable fluctuations in the data due to seasonal factors over a short period of time. For example, you might see a peak in swimwear sales in summer around the same time every year.
  • Cyclic patterns: Unpredictable cycles where the data fluctuates. Cyclical trends are not due to seasonality, but rather, may occur as a result of economic or industry-related conditions.

As you can imagine, the ability to make informed predictions about the future has immense value for business. Time series analysis and forecasting is used across a variety of industries, most commonly for stock market analysis, economic forecasting, and sales forecasting. There are different types of time series models depending on the data you’re using and the outcomes you want to predict. These models are typically classified into three broad types: autoregressive (AR) models, integrated (I) models, and moving average (MA) models. For an in-depth look at time series analysis, refer to our guide.
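
As an illustration, the sketch below decomposes a hypothetical monthly sales series into trend, seasonal, and residual components using statsmodels (the file and column names are assumptions):

```python
# A hedged sketch: monthly_sales.csv with month and revenue columns is hypothetical.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

sales = pd.read_csv("monthly_sales.csv", parse_dates=["month"], index_col="month")
result = seasonal_decompose(sales["revenue"], model="additive", period=12)

print(result.trend.dropna().tail())       # long-run direction
print(result.seasonal.head(12))           # repeating yearly pattern
print(result.resid.dropna().describe())   # what's left: noise and cyclic effects
```

A decomposition like this is often a first step before fitting a forecasting model such as an ARIMA model.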

Time series analysis in action: Developing a time series model to predict jute yarn demand in Bangladesh

g. Sentiment analysis

When you think of data, your mind probably automatically goes to numbers and spreadsheets.

Many companies overlook the value of qualitative data, but in reality, there are untold insights to be gained from what people (especially customers) write and say about you. So how do you go about analyzing textual data?

One highly useful qualitative technique is sentiment analysis, which belongs to the broader category of text analysis—the (usually automated) process of sorting and understanding textual data.

With sentiment analysis, the goal is to interpret and classify the emotions conveyed within textual data. From a business perspective, this allows you to ascertain how your customers feel about various aspects of your brand, product, or service.

There are several different types of sentiment analysis models, each with a slightly different focus. The three main types include:

Fine-grained sentiment analysis

If you want to focus on opinion polarity (i.e. positive, neutral, or negative) in depth, fine-grained sentiment analysis will allow you to do so.

For example, if you wanted to interpret star ratings given by customers, you might use fine-grained sentiment analysis to categorize the various ratings along a scale ranging from very positive to very negative.

Emotion detection

This model often uses complex machine learning algorithms to pick out various emotions from your textual data.

You might use an emotion detection model to identify words associated with happiness, anger, frustration, and excitement, giving you insight into how your customers feel when writing about you or your product on, say, a product review site.

Aspect-based sentiment analysis

This type of analysis allows you to identify what specific aspects the emotions or opinions relate to, such as a certain product feature or a new ad campaign.

If a customer writes that they “find the new Instagram advert so annoying”, your model should detect not only a negative sentiment, but also the object towards which it’s directed.

In a nutshell, sentiment analysis uses various Natural Language Processing (NLP) algorithms and systems which are trained to associate certain inputs (for example, certain words) with certain outputs.

For example, the input “annoying” would be recognized and tagged as “negative”. Sentiment analysis is crucial to understanding how your customers feel about you and your products, for identifying areas for improvement, and even for averting PR disasters in real time!
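
As a small illustration, the sketch below scores a couple of example comments with NLTK’s VADER, a simple rule-based sentiment model (production systems typically use trained NLP models, but the idea is the same):

```python
# A hedged sketch using NLTK's VADER lexicon (rule-based sentiment scoring).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)        # fetch the lexicon once
analyzer = SentimentIntensityAnalyzer()

comments = [
    "I find the new Instagram advert so annoying",
    "Absolutely love the product, and shipping was fast!",
]
for text in comments:
    scores = analyzer.polarity_scores(text)       # neg / neu / pos / compound
    print(f"{scores['compound']:+.2f}  {text}")
```

Compound scores near +1 indicate strongly positive sentiment; scores near -1 indicate strongly negative sentiment.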

Sentiment analysis in action: 5 Real-world sentiment analysis case studies

4. The data analysis process

In order to gain meaningful insights from data, data analysts will perform a rigorous step-by-step process. We go over this in detail in our step-by-step guide to the data analysis process—but, to briefly summarize, the data analysis process generally consists of the following phases:

Defining the question

The first step for any data analyst will be to define the objective of the analysis, sometimes called a ‘problem statement’. Essentially, you’re asking a question about the business problem you’re trying to solve. Once you’ve defined this, you’ll then need to determine which data sources will help you answer this question.

Collecting the data

Now that you’ve defined your objective, the next step will be to set up a strategy for collecting and aggregating the appropriate data. Will you be using quantitative (numeric) or qualitative (descriptive) data? Will these data be first-party, second-party, or third-party data?

Learn more: Quantitative vs. Qualitative Data: What’s the Difference? 

Cleaning the data

Unfortunately, your collected data isn’t automatically ready for analysis—you’ll have to clean it first. For a data analyst, this phase of the process typically takes up the most time. During the data cleaning process (sketched in the short example after this list), you will likely be:

  • Removing major errors, duplicates, and outliers
  • Removing unwanted data points
  • Structuring the data—that is, fixing typos, layout issues, etc.
  • Filling in major gaps in data
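
Here is the minimal pandas sketch mentioned above; the file and column names are hypothetical stand-ins for whatever dataset you’ve collected:

```python
# A hedged sketch: raw_survey_data.csv and its columns are hypothetical.
import pandas as pd

df = pd.read_csv("raw_survey_data.csv")

df = df.drop_duplicates()                                   # remove duplicate rows
df["country"] = df["country"].str.strip().str.title()       # fix stray whitespace / casing
df = df[df["age"].between(18, 99)]                          # drop implausible outliers
df["income"] = df["income"].fillna(df["income"].median())   # fill major gaps
df = df.drop(columns=["internal_notes"])                    # remove unwanted data points

df.to_csv("clean_survey_data.csv", index=False)
```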

Analyzing the data

Now that we’ve finished cleaning the data, it’s time to analyze it! Many analysis methods have already been described in this article, and it’s up to you to decide which one will best suit your objective. Your analysis may fall under one of the following categories:

  • Descriptive analysis , which identifies what has already happened
  • Diagnostic analysis , which focuses on understanding why something has happened
  • Predictive analysis , which identifies future trends based on historical data
  • Prescriptive analysis , which allows you to make recommendations for the future

Visualizing and sharing your findings

We’re almost at the end of the road! Analyses have been made, insights have been gleaned—all that remains to be done is to share this information with others. This is usually done with a data visualization tool, such as Google Charts or Tableau.

Learn more: 13 of the Most Common Types of Data Visualization


5. The best tools for data analysis

As you can imagine, every phase of the data analysis process requires the data analyst to have a variety of tools under their belt that assist in gaining valuable insights from data. We cover these tools in greater detail in this article, but, in summary, here’s our best-of-the-best list, with links to each product:

The top 9 tools for data analysts

  • Microsoft Excel
  • Jupyter Notebook
  • Apache Spark
  • Microsoft Power BI

6. Key takeaways and further reading

As you can see, there are many different data analysis techniques at your disposal. In order to turn your raw data into actionable insights, it’s important to consider what kind of data you have (is it qualitative or quantitative?) as well as the kinds of insights that will be useful within the given context. In this post, we’ve introduced seven of the most useful data analysis techniques—but there are many more out there to be discovered!

So what now? If you haven’t already, we recommend reading the case studies for each analysis technique discussed in this post (you’ll find a link at the end of each section). For a more hands-on introduction to the kinds of methods and techniques that data analysts use, try out this free introductory data analytics short course. In the meantime, you might also want to read the following:

  • The Best Online Data Analytics Courses for 2024
  • What Is Time Series Data and How Is It Analyzed?
  • What is Spatial Analysis?

Qualitative case study data analysis: an example from practice

Affiliation: School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland. PMID: 25976531. DOI: 10.7748/nr.22.5.8.e1307

Aim: To illustrate an approach to data analysis in qualitative case study methodology.

Background: There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods: Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion: Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion: By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice: This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Keywords: Case study data analysis; case study research methodology; clinical skills research; qualitative case study methodology; qualitative data analysis; qualitative research.

MeSH terms: Case-Control Studies; Data Interpretation, Statistical; Nursing Research/methods; Qualitative Research; Research Design.

Qualitative Data Analysis Methods

In the following, we will discuss basic approaches to analyzing data in all six of the acceptable qualitative designs.

After reviewing the information in this document, you will be able to:

  • Recognize the terms for data analysis methods used in the various acceptable designs.
  • Recognize the data preparation tasks that precede actual analysis in all the designs.
  • Understand the basic analytic methods used by the respective qualitative designs.
  • Identify and apply the methods required by your selected design.

Terms Used in Data Analysis by the Six Designs

Each qualitative research approach or design has its own terms for methods of data analysis:

  • Ethnography—uses modified thematic analysis and life histories.
  • Case study—uses description, categorical aggregation, or direct interpretation.
  • Grounded theory—uses open, axial, and selective coding (although recent writers are proposing variations on those basic analysis methods).
  • Phenomenology—describes textures and structures of the essential meaning of the lived experience of the phenomenon
  • Heuristics—patterns, themes, and creative synthesis along with individual portraits.
  • Generic qualitative inquiry—thematic analysis, which is really a foundation for all the other analytic methods. Thematic analysis is the starting point for the other five, and the endpoint for generic qualitative inquiry. Because it is the basic or foundational method, we'll take it first.

Preliminary Tasks in Analysis in all Methods

In all six approaches—ethnography, case study, grounded theory, phenomenology, heuristics, and generic qualitative inquiry—there are preliminary tasks that must be performed prior to the analysis itself. For each, you will need to:

  • Arrange for secure storage of original materials. Storage should be secure and guaranteed to protect the privacy and confidentiality of the participants' information and identities.
  • Transcribe interviews or otherwise transform raw data into usable formats.
  • Make master copies and working copies of all materials. Master copies should be kept securely with the original data. Working copies will be marked up, torn apart, and used heavily: make plenty.
  • Arrange secure passwords or other protection for all electronic data and copies.
  • When ready to begin, read all the transcripts repeatedly—at least three times—for a sense of the whole. Don't force it—allow the participants' words to speak to you.

These tasks are done in all forms of qualitative analysis. Now let's look specifically at generic qualitative inquiry.

Data Analysis in Generic Qualitative Inquiry: Thematic Analysis

The primary tool for conducting the analysis of data when using the generic qualitative inquiry approach is thematic analysis, a flexible analytic method for deriving the central themes from verbal data. A thematic analysis can also be used to conduct analysis of the qualitative data in some types of case study.

Thematic analysis essentially creates theme-statements for ideas or categories of ideas (codes) that the researcher extracts from the words of the participants.

There are two main types of thematic analysis:

  • Inductive thematic analysis, in which the data are interpreted inductively, that is, without bringing in any preselected theoretical categories.
  • Theoretical thematic analysis, in which the participants' words are interpreted according to categories or constructs from the existing literature.

Analytic Steps in Thematic Analysis: Reading

Remember that the last preliminary task listed above was to read the transcripts for a sense of the whole. In this discussion, we'll assume you're working with transcribed data, usually from interviews. You can apply each step, with changes, to any kind of qualitative data. Now, before you start analyzing, take the first transcript and read it once more, as often as necessary, for a sense of what this participant told you about the topic of your study. If you're using other sources of data, spend time with them holistically.

Thematic Analysis: Steps in the Process

When you have a feel for the data,

  • Underline any passages (phrases, sentences, or paragraphs) that appear meaningful to you. Don't make any interpretations yet! Review the underlined data.
  • Decide if the underlined data are relevant to the research question and cross out or delete all data unrelated to the research question. Some information in the transcript may be interesting but unrelated to the research question.
  • Create a name or "code" for each remaining underlined passage (expression or meaning unit) that focuses on one single idea. The code should be:
  • briefer than the passage,
  • a summary of its meaning, and
  • supported by the meaning unit (the participant's words).
  • Find codes that recur; cluster these together. Now begin the interpretation, but only with the understanding that the codes or patterns may shift and change during the process of analysis.
  • After you have developed the clusters or patterns of codes, name each pattern. The pattern name is a theme. Use language supported by the original data in the language of your discipline and field.
  • Write a brief description of each theme. Use brief direct quotations from the transcript to show the reader how the patterns emerged from the data.
  • Compose a paragraph integrating all the themes you developed from the individual's data.
  • Repeat this process for each participant, the "within-participant" analysis.
  • Finally, integrate all themes from all participants in "across-participants" analysis, showing what general themes are found across all the data.

Some variation of thematic analysis will appear in most of the other forms of qualitative data analysis, but the other methods tend to be more complex. Let's look at them one at a time. If you are already clear as to which approach or design your study will use, you can skip to the appropriate section below.

Ethnographic Data Analysis

Ethnographic data analysis relies on a modified thematic analysis. It is called modified because it combines standard thematic analysis as previously described for interview data with modified thematic methods applied to artifacts, observational notes, and other non-interview data.

Depending on the kinds of data to be interpreted (for instance, pictures and historical documents), ethnographers devise unique ways to find patterns or themes in the data. Finally, the themes must be integrated across all sources and kinds of data to arrive at a composite thematic picture of the culture.

(Adapted from Bogdan and Taylor, 1975; Taylor and Bogdan, 1998; Aronson, 1994.)

Data Analysis in Grounded Theory

Going beyond the descriptive and interpretive goals of many other qualitative models, grounded theory's goal is building a theory. It seeks explanation, not simply description.

It uses a constant comparison method of data analysis that begins as soon as the researcher starts collecting data. Each data collection event (for example, an interview) is analyzed immediately, and later data collection events can be modified to seek more information on emerging themes.

In other words, analysis goes on during each step of the data collection, not merely after data collection.

The heart of the grounded theory analysis is coding, which is analogous to but more rigorous than coding in thematic analysis.

Coding in Grounded Theory Method

There are three different types of coding used in a sequential manner.

  • The first type of coding is open coding, which is like basic coding in thematic analysis. During open coding, the researcher:
  • performs a line-by-line analysis (or sentence or paragraph analysis) of the data,
  • labels and categorizes the dimensions or aspects of the phenomenon being studied, and
  • uses memos to describe the categories that are found.
  • The second type of coding is axial coding, which involves finding links between categories and subcategories found in the open coding.
  • The open codes are examined for their relationships: cause and effect, co-occurrence, and so on.
  • The goal here is to picture how the various dimensions or categories of data interact with one another in time and space.
  • The third type of coding is selective coding, which identifies a core category and relates the categories subsidiary to this core.
  • Selective coding selects the main phenomenon (the core category) around which subsidiary phenomena (all other categories) are grouped, arranges the groupings, studies the results, and rearranges where the data require it.

The Final Stages of Grounded Theory Analysis, after Coding

From selective coding, the grounded theory researcher develops:

  • A model of the process, which is the description of which actions and interactions occur in a sequence or series.
  • A transactional system, which is the description of how the interactions of different events explain the phenomenon being investigated.
  • Finally, a conditional matrix is diagrammed to help consider the conditions and consequences related to the phenomenon under study.

These three elements essentially tell the story of the outcome of the research. In other words, the description of the process by which the phenomenon seems to happen, the transactional system supporting it, and the conditional matrix that pictures the explanation of the phenomenon together constitute the findings of a grounded theory study.

(Adapted from Corbin and Strauss, 2008; Strauss and Corbin, 1990, 1998.)

Data Analysis in Qualitative Case Study: Background

There are a few points to consider in analyzing case study data:

  • Analysis can be:
  • Holistic—the entire case.
  • Embedded—a specific aspect of the case.
  • Multiple sources and kinds of data must be collected and analyzed.
  • Data must be collected, analyzed, and described about both:
  • The contexts of the case (its social, political, economic contexts, its affiliations with other organizations or cases, and so on).
  • The setting of the case (geography, location, physical grounds, or set-up, business organization, etc.).

Qualitative Case Study Data Analysis Methods

Case study data analysis consists of a detailed description of the case along with an analysis of themes. Especially for interview or documentary analysis, thematic analysis can be used (see the section on generic qualitative inquiry). A typical format for data analysis in a case study consists of the following phases:

  • Description: This entails developing a detailed description of each instance of the case and its setting. The words "instance" and "case" can be confusing. Let's say we're conducting a case study of gay and lesbian members of large urban evangelical Christian congregations in the Southeast. The case would be all such people and their congregations. Instances of the case would be any individual person or congregation. In this phase, all the congregations (the settings) and their larger contexts would be described in detail, along with the individuals who are interviewed or observed.
  • Categorical Aggregation: This involves seeking a collection of themes from the data, hoping that relevant meaning about lessons to be learned about the case will emerge. Using our example, a kind of thematic analysis from all the data would be performed, looking for common themes.
  • Direct Interpretation: By looking at the single instance or member of the case and drawing meaning from it without looking for multiple instances, direct interpretation pulls the data apart and puts it together in more meaningful ways. Here, the interviews with all the gay and lesbian congregation members would be subjected to thematic analysis or some other form of analysis for themes.
  • Within-Case Analysis: This would identify the themes that emerge from the data collected from each instance of the case, including connections between or among the themes. These themes would be further developed using verbatim passages and direct quotation to elucidate each theme. This would serve as the summary of the thematic analysis for each individual participant.
  • Cross-Case Analysis: This phase develops a thematic analysis across cases as well as assertions and interpretations of the meaning of the themes emerging from all participants in the study.
  • Interpretive Phase: This final phase is the creation of naturalistic generalizations from the data as a whole and the reporting of lessons learned from the case study.

(Adapted from Creswell, 1998; Stake, 1995.)

Data Analysis in Phenomenological Research

There are a few existing models of phenomenological research, and each proposes slightly different methods of data analysis. They all arrive at the same goal, however: to describe the essence or core structures and textures of some conscious psychological experience. One such model, empirical phenomenology, was developed at Duquesne University. It consists of five essential steps and represents the other variations well. Whichever model is chosen, those wishing to conduct phenomenological research must abide by its procedures. Empirical phenomenology is presented here as an example.

  • Sense of the whole. One reads the entire description in order to get a general sense of the whole statement. This often takes a few readings, which should be approached contemplatively.
  • Discrimination of meaning units. Once the sense of the whole has been grasped, the researcher returns to the beginning and reads through the text once more, delineating each transition in meaning.
  • The researcher adopts a psychological perspective to do this. This means that the researcher looks for shifts in psychological meaning.
  • The researcher focuses on the phenomenon being investigated. This means that the researcher keeps in mind the study's topic and looks for meaningful passages related to it.
  • The researcher next eliminates redundancies and unrelated meaning units.
  • Transformation of subjects' everyday expressions (meaning units) into psychological language. Once meaning units have been delineated,
  • The researcher reflects on each of the meaning units, which are still expressed in the concrete language of the participants, and describes the essence of the statement for the participant.
  • The researcher makes these descriptions in the language of psychological science.
  • Synthesis of transformed meaning units into a consistent statement of the structure of the experience.
  • Using imaginative variation on these transformed meaning units, the researcher discovers what remains unchanged when variations are imaginatively applied, and
  • From this develops a consistent statement regarding the structure of the participant's experience.
  • The researcher completes this process for each transcript in the study.
  • Final synthesis. Finally, the researcher synthesizes all of the statements regarding each participant's experience into one consistent statement that describes and captures the essence of the experience being studied.

(Adapted from Giorgi, 1985, 1997; Giorgi and Giorgi, 2003.)

Data Analysis in Heuristics

Six steps typically characterize the heuristic process of data analysis, consisting of:

  • Initial engagement.
  • Immersion.
  • Incubation.
  • Illumination.
  • Explication.
  • Creative synthesis.

To start, place all the material drawn from one participant before you (recordings, transcriptions, journals, notes, poems, artwork, and so on). This material may either be data gathered by self-search or by interviews with co-researchers.

  • Immerse yourself fully in the material until you are aware of and understand everything that is before you.
  • Incubate the material. Put the material aside for a while. Let it settle in you. Live with it but without particular attention or focus.
  • Return to the immersion process. Make notes where they would enable you to remember or classify the material. Continue this rhythm of working with the data and resting until an illumination or essential configuration emerges.
  • From your core or global sense, list the essential components or patterns and themes that characterize the fundamental nature and meaning of the experience.
  • Reflectively study the patterns and themes, dwell inside them, and develop a full depiction of the experience. The depiction must include the essential components of the experience.
  • Illustrate the depiction of the experience with verbatim samples, poems, stories, or other materials to highlight and accentuate the person's lived experience.
  • Return to the raw material of your co-researcher (participant). Does your depiction of the experience fit the data from which you have developed it? Does it contain all that is essential?
  • Develop a full reflective depiction of the experience, one that characterizes the participant's experience reflecting core meanings for the individuals as a whole. Include in the depiction, verbatim samples, poems, stories, and the like to highlight and accentuate the lived nature of the experience. This depiction will serve as the creative synthesis, which will combine the themes and patterns into a representation of the whole in an aesthetically pleasing way. This synthesis will communicate the essence of the lived experience under inquiry. The synthesis is more than a summary: it is like a chemical reaction, a creation anew.
  • Return to the data and develop a portrait of the person in such a way that the phenomenon and the person emerge as real.

(Adapted from Douglass and Moustakas, 1985; Moustakas, 1990.)

Bogdan, R., & Taylor, S. J. (1975). Introduction to qualitative research methods: A phenomenological approach (3rd ed.). New York, NY: Wiley.

Corbin, J., & Strauss, A. (2008). Basics of qualitative research: Techniques and procedures for developing grounded theory (3rd ed.). Los Angeles, CA: Sage.

Creswell, J. W. (1998). Qualitative inquiry and research design: Choosing among five traditions . Thousand Oaks, CA: Sage.

Douglass, B. G., & Moustakas, C. (1985). Heuristic inquiry: The internal search to know. Journal of Humanistic Psychology , 25(3), 39–55.

Giorgi, A. (Ed.). (1985). Phenomenology and psychological research . Pittsburgh, PA: Duquesne University Press.

Giorgi, A. (1997). The theory, practice and evaluation of phenomenological methods as a qualitative research procedure. Journal of Phenomenological Psychology , 28, 235–260.

Giorgi, A. P., & Giorgi, B. M. (2003). The descriptive phenomenological psychological method. In P. M. Camic, J. E. Rhodes, & L. Yardley (Eds.), Qualitative research in psychology: Expanding perspectives in methodology and design (pp. 243–273). Washington, DC: American Psychological Association.

Moustakas, C. (1990). Heuristic research: Design, methodology, and applications . Newbury Park, CA: Sage.

Stake, R. E. (1995). The art of case study research . Thousand Oaks, CA: Sage.

Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques . Newbury Park, CA: Sage.

Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and theory for developing grounded theory (2nd ed.). Thousand Oaks, CA: Sage.

Taylor, S. J., & Bogdan, R. (1998). Introduction to qualitative research methods: A guidebook and resource (3rd ed.). New York, NY: Wiley.


Case Study Research in Software Engineering: Guidelines and Examples by Per Runeson, Martin Höst, Austen Rainer, Björn Regnell


DATA ANALYSIS AND INTERPRETATION

5.1 Introduction

Once data has been collected, the focus shifts to analysis. In this phase, the data is used to understand what actually happened in the studied case: the researcher works through the details of the case and seeks patterns in the data. Inevitably, some analysis already goes on during the data collection phase, where the data is studied—for example, when data from an interview is transcribed. The understandings reached in those earlier phases are of course also valid and important, but this chapter focuses on the separate phase that starts after the data has been collected.

Data analysis is conducted differently for quantitative and qualitative data. Sections 5.2–5.5 describe how to analyze qualitative data and how to assess the validity of this type of analysis. In Section 5.6, a short introduction to quantitative analysis methods is given. Since quantitative analysis is covered extensively in textbooks on statistical analysis, and case study research to a large extent relies on qualitative data, this section is kept short.

5.2 ANALYSIS OF DATA IN FLEXIBLE RESEARCH

5.2.1 Introduction

As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence. The chain of evidence means that a reader ...




What Is a Case Study? | Definition, Examples & Methods

Published on May 8, 2019 by Shona McCombes. Revised on November 20, 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods , but quantitative methods are sometimes also used. Case studies are good for describing , comparing, evaluating and understanding different aspects of a research problem .

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyze the case
  • Other interesting articles

When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation . They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.


Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Tip: If your research is more practical in nature and aims to simultaneously investigate an issue as you solve it, consider conducting action research instead.

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

Example of an outlying case study: In the 1960s the town of Roseto, Pennsylvania was discovered to have extremely low rates of heart disease compared to the US average. It became an important case study for understanding previously neglected causes of heart disease.

However, you can also choose a more common or representative case to exemplify a particular category, experience or phenomenon.

Example of a representative case study: In the 1920s, two sociologists used Muncie, Indiana as a case study of a typical American city that supposedly exemplified the changing culture of the US at the time.

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework . This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

Example of a mixed methods case study: For a case study of a wind farm development in a rural area, you could collect quantitative data on employment rates and business revenue, collect qualitative data on local people’s perceptions and experiences, and analyze local and national media coverage of the development.

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyze the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis , with separate sections or chapters for the methods , results and discussion .

Others are written in a more narrative style, aiming to explore the case from various angles and analyze its meanings and implications (for example, by using textual analysis or discourse analysis ).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Ecological validity

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

McCombes, S. (2023, November 20). What Is a Case Study? | Definition, Examples & Methods. Scribbr. Retrieved April 4, 2024, from https://www.scribbr.com/methodology/case-study/

Is this article helpful?

Shona McCombes

Shona McCombes

Other students also liked, primary vs. secondary sources | difference & examples, what is a theoretical framework | guide to organizing, what is action research | definition & examples, unlimited academic ai-proofreading.

✔ Document error-free in 5minutes ✔ Unlimited document corrections ✔ Specialized in correcting academic texts

10 Real-World Data Science Case Study Projects with Examples

Top 10 data science case study projects with examples and solutions in Python to inspire your data science learning in 2023.


Data science has been a trending buzzword in recent times. With wide applications in various sectors like healthcare, education, retail, transportation, media, and banking, data science applications are at the core of pretty much every industry out there. The possibilities are endless: fraud analysis in the finance sector or personalized recommendations for eCommerce businesses. We have developed ten exciting data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative personalized products tailored to specific customers.



Table of Contents

  • Data science case studies in retail
  • Data science case study examples in the entertainment industry
  • Data analytics case study examples in the travel industry
  • Case studies for data analytics in social media
  • Real world data science projects in healthcare
  • Data analytics case studies in oil and gas
  • What is a case study in data science?
  • How do you prepare a data science case study?
  • 10 most interesting data science case studies with examples

So, without much ado, let's get started with data science business case studies !

1) Walmart

With humble beginnings as a simple discount retailer, today Walmart operates around 10,500 stores and clubs in 24 countries, along with eCommerce websites, and employs about 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, a growth of $35 billion driven in part by the expansion of its eCommerce business. Walmart is a data-driven company that works on the principle of 'Everyday low cost' for its consumers. To achieve this goal, it depends heavily on the advances of its data science and analytics department for research and development, also known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour. To analyze this humongous amount of data, Walmart created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team heavily invests in building and managing technologies like cloud, data, DevOps, infrastructure, and security.


Walmart is experiencing massive digital growth as the world's largest retailer. Walmart has been leveraging big data and advances in data science to build solutions that enhance, optimize, and customize the shopping experience and serve its customers in a better way. At Walmart Labs, data scientists are focused on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:

i) Personalized Customer Shopping Experience

Walmart analyzes customer preferences and shopping patterns to optimize the stocking and displaying of merchandise in its stores. Analysis of big data also helps Walmart understand new item sales, decide when to discontinue products, and track the performance of brands.

ii) Order Sourcing and On-Time Delivery Promise

Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.


iii) Packing Optimization 

Box recommendation is a daily occurrence in the shipping of items in retail and eCommerce businesses. Whenever the items of an order, or of multiple orders placed by the same customer, are picked from the shelf and are ready for packing, Walmart's recommender system determines the best-sized box to hold all the ordered items with the least in-box space wasted, within a fixed amount of time. This is the classic Bin Packing problem, an NP-hard problem familiar to data scientists.
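To make the idea concrete, here is a minimal first-fit decreasing sketch of the kind of bin packing heuristic that can drive a box-recommendation system. The box capacity, item volumes, and function name are illustrative assumptions, not Walmart's actual system.

```python
# Minimal first-fit decreasing sketch for a box-recommendation problem.
# Box capacity and item volumes are illustrative; a production system would
# also consider item dimensions, weight, and fragility.

def recommend_boxes(item_volumes, box_capacity):
    """Greedily pack item volumes into as few boxes of a given capacity as possible."""
    boxes = []  # each entry is the remaining free volume in that box
    for volume in sorted(item_volumes, reverse=True):  # largest items first
        for i, free in enumerate(boxes):
            if volume <= free:
                boxes[i] -= volume
                break
        else:
            boxes.append(box_capacity - volume)  # open a new box

    return len(boxes), boxes

if __name__ == "__main__":
    n_boxes, leftover = recommend_boxes([4.0, 8.0, 1.5, 4.5, 3.0], box_capacity=10.0)
    print(f"Boxes used: {n_boxes}, wasted space per box: {leftover}")
```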

Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and you must build a model to project the sales for each department in each store. This data science case study aims to create a predictive model to forecast the sales of each product. You can also try your hand at the Inventory Demand Forecasting Data Science Project to develop a machine learning model that forecasts inventory demand accurately based on historical sales data.


2) Amazon

Amazon is an American multinational technology company based in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through its constant innovation in data science and big data, Amazon is always ahead in understanding its customers. Here are a few data analytics case study examples at Amazon:

i) Recommendation Systems

Data science models help Amazon understand customers' needs and recommend products before the customer even searches for them; this model uses collaborative filtering. Amazon uses data from 152 million customer purchases to help users decide what to buy. The company generates 35% of its annual sales using its recommendation-based systems (RBS).

Here is a Recommender System Project to help you build a recommendation system using collaborative filtering. 
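As a complement to that project, here is a minimal item-based collaborative filtering sketch using cosine similarity. The ratings matrix is invented for illustration; Amazon's actual recommendation models are far larger and more sophisticated.

```python
# A minimal item-based collaborative filtering sketch using cosine similarity.
# The ratings matrix is made up for illustration; real systems work with
# millions of sparse user-item interactions.
import numpy as np

# rows = users, columns = items; 0 means "not rated/purchased"
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

n_items = ratings.shape[1]
item_sim = np.array([[cosine_sim(ratings[:, i], ratings[:, j])
                      for j in range(n_items)] for i in range(n_items)])

def score_item(user_ratings, item_idx):
    """Predict a score for an unrated item from the items the user has rated."""
    rated = np.nonzero(user_ratings)[0]
    weights = item_sim[item_idx, rated]
    return (weights @ user_ratings[rated]) / weights.sum() if weights.sum() else 0.0

user = ratings[1]  # this user has not rated items 1 and 2
print({i: round(score_item(user, i), 2) for i in range(n_items) if user[i] == 0})
```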

ii) Retail Price Optimization

Amazon product prices are optimized based on a predictive model that determines the best price so that users do not refuse to buy because of price. The model considers the customer's likelihood of purchasing the product and how the price will affect the customer's future buying patterns. The price for a product is determined according to your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.

Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.
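For intuition, here is a minimal sketch of one common pricing approach: fit a simple demand curve to historical price/sales points and pick the revenue-maximizing price. The data points are synthetic and the linear demand assumption is an illustrative simplification, not Amazon's pricing model.

```python
# A minimal retail price optimization sketch: fit a linear demand curve to
# historical (price, units sold) points and pick the revenue-maximizing price.
# The data is synthetic; real dynamic pricing also models competitor prices,
# inventory, seasonality, and customer behavior.
import numpy as np

prices = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
units  = np.array([180, 160, 135, 118, 95, 80])

# Fit demand(p) ~ a * p + b with least squares
a, b = np.polyfit(prices, units, deg=1)

candidate_prices = np.linspace(prices.min(), prices.max(), 200)
expected_revenue = candidate_prices * (a * candidate_prices + b)
best_price = candidate_prices[np.argmax(expected_revenue)]

print(f"Estimated demand curve: units ~ {a:.1f} * price + {b:.1f}")
print(f"Revenue-maximizing price in the observed range: ${best_price:.2f}")
```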

iii) Fraud Detection

Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order and uses machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has helped the company restrict customers with an excessive number of product returns.

You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.
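Here is a minimal sketch of the general approach: train a classifier on transaction features, with class weighting to handle how rare fraud is. The features and labels are synthetic placeholders, not Amazon's data.

```python
# A minimal fraud detection sketch: logistic regression on synthetic
# transaction features. Feature names and labels are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 5000
X = np.column_stack([
    rng.exponential(50, n),          # transaction amount
    rng.integers(0, 24, n),          # hour of day
    rng.integers(0, 30, n),          # returns in the last year
])
# Synthetic label: fraud is rare and loosely tied to amount and return count
y = (rng.random(n) < 0.02 + 0.001 * (X[:, 0] > 150) + 0.002 * (X[:, 2] > 20)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(class_weight="balanced", max_iter=1000)  # handle class imbalance
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```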


Let us explore data analytics case study examples in the entertainment industry.


3) Netflix

Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Netflix has over 208 million paid subscribers worldwide and, with streaming now supported on thousands of smart devices, around 3 billion hours watched every month. The secret to this massive growth and popularity is Netflix's advanced use of data analytics and recommendation systems to provide personalized and relevant content recommendations to its users. Netflix collects data from over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:

i) Personalized Recommendation System

Netflix uses over 1,300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data Netflix collects from its users includes viewing time, platform searches for keywords, and metadata related to content abandonment, such as pause time, rewinds, and rewatches. Using this data, Netflix can predict what a viewer is likely to watch and give a personalized watchlist to a user. Some of the algorithms used by the Netflix recommendation system are the Personalized Video Ranker, the Trending Now ranker, and the Continue Watching ranker.

ii) Content Development using Data Analytics

Netflix uses data science to analyze the behavior and patterns of its users to recognize themes and categories that the masses prefer to watch. This data is used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. These shows may seem like a huge risk, but they are backed by data analytics that assured Netflix they would succeed with its audience. Data analytics is helping Netflix come up with content that its viewers want to watch even before they know they want to watch it.

iii) Marketing Analytics for Campaigns

Netflix uses data analytics to find the right time to launch shows and ad campaigns for maximum impact on the target audience. Marketing analytics also helps create different trailers and thumbnails for different groups of viewers. For example, the House of Cards Season 5 trailer featuring a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.

Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.
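For a flavor of association rule mining itself, here is a minimal sketch on a handful of invented viewing "baskets": it counts item and pair supports and reports confidence and lift. The transactions, genres, and support threshold are illustrative assumptions; real segmentation work would use a dedicated library and far larger transaction logs.

```python
# A minimal association rule mining sketch: count item and pair supports,
# then report support, confidence, and lift for frequent ordered pairs.
from itertools import combinations
from collections import Counter

transactions = [
    {"thriller", "documentary", "drama"},
    {"thriller", "drama"},
    {"comedy", "drama"},
    {"thriller", "documentary"},
    {"comedy", "documentary", "drama"},
]
n = len(transactions)

item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(frozenset(p) for basket in transactions
                      for p in combinations(sorted(basket), 2))

min_support = 0.4
for pair, count in pair_counts.items():
    support = count / n
    if support < min_support:
        continue
    a, b = tuple(pair)
    confidence = count / item_counts[a]          # P(b | a)
    lift = confidence / (item_counts[b] / n)     # > 1 means positive association
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```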


4) Spotify

In a world where purchasing music is a thing of the past and streaming is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, Amazon Music, etc. The success of Spotify has depended largely on data analytics. By analyzing massive volumes of listener data, Spotify provides real-time and personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of how Spotify uses data analytics to provide enhanced services to its listeners:

i) Personalization of Content using Recommendation Systems

Spotify uses BaRT (Bayesian Additive Regression Trees) to generate music recommendations for its listeners in real time. BaRT ignores any song a user listens to for less than 30 seconds, and the model is retrained every day to provide updated recommendations. A patent granted to Spotify for an AI application is used to identify a user's musical tastes based on audio signals, gender, age, and accent to make better music recommendations.

Spotify creates daily playlists for its listeners based on their taste profiles, called 'Daily Mixes,' which contain songs the user has added to their playlists or songs by artists the user has included in their playlists. They also include new artists and songs that the user might be unfamiliar with but that might fit the playlist. Similar to this is the weekly 'Release Radar' playlist, which contains newly released songs by artists the listener follows or has liked before.

ii) Targeted Marketing through Customer Segmentation

Beyond enhancing personalized song recommendations, Spotify uses this massive dataset for targeted ad campaigns and personalized service recommendations for its users. Spotify uses ML models to analyze listener behavior and group listeners based on music preferences, age, gender, ethnicity, etc. These insights help them create ad campaigns for a specific target audience. One of their well-known ad campaigns was the meme-inspired ads aimed at potential target customers, which were a huge success globally.

iii) CNNs for Classification of Songs and Audio Tracks

Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These models allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses NLP (natural language processing) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights help group and identify similar artists and songs and can be leveraged to build playlists.

Here is a Music Recommender System Project for you to start learning. We have listed another music recommendations dataset for you to use for your projects: Dataset1. You can use this dataset of Spotify metadata to classify songs based on artist, mood, and liveliness. Plot histograms and heatmaps to get a better understanding of the dataset, and use classification algorithms like logistic regression, SVM, and principal component analysis to generate valuable insights from the dataset, as in the sketch below.
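Here is a minimal sketch of such a pipeline: scale the features, reduce dimensionality with PCA, and classify with logistic regression. The audio features and "upbeat" labels below are synthetic stand-ins for real Spotify-style metadata (danceability, energy, valence, tempo, and so on), so treat it as a template rather than a result.

```python
# A minimal scikit-learn pipeline: standardize features, apply PCA, then
# classify with logistic regression. Features and labels are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 8))  # 8 synthetic audio features per track
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)  # "upbeat" vs not

pipeline = make_pipeline(StandardScaler(), PCA(n_components=4), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```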


Below you will find case studies for data analytics in the travel and tourism industry.

5) Airbnb

Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, which have welcomed more than 1 billion guest arrivals in almost every country across the globe. Airbnb is active in every country on the planet except Iran, Sudan, Syria, and North Korea; that is around 97.95% of the world. Using data as the voice of its customers, Airbnb draws on its large volume of customer reviews and host inputs to understand trends across communities, rate user experiences, and make informed decisions to build a better business model. Airbnb's data servers serve approximately 10 million requests a day and process around one million search queries. The data scientists at Airbnb are developing exciting new solutions to boost the business and to offer personalized services by creating a perfect match between guests and hosts for a supreme customer experience.

i) Recommendation Systems and Search Ranking Algorithms

Airbnb helps people find 'local experiences' in a place with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to find homes based on proximity to the searched location and on previous guest reviews. Airbnb uses deep neural networks to build models that take the guest's earlier stays and area information into account to find a perfect match. The search algorithms are optimized based on guest and host preferences, rankings, pricing, and availability to understand users' needs and provide the best match possible.

ii) Natural Language Processing for Review Analysis

Airbnb characterizes data as the voice of its customers. Customer and host reviews give a direct insight into the experience, but star ratings alone are not enough to understand it. Hence, Airbnb uses natural language processing to understand reviews and the sentiments behind them. The NLP models are developed using convolutional neural networks.

Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.
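As a minimal illustration of review sentiment analysis, the sketch below trains a TF-IDF plus logistic regression model on a few invented guest reviews; it is a toy baseline for learning, not the CNN-based approach described above.

```python
# A minimal sentiment analysis sketch for guest reviews: TF-IDF features plus
# logistic regression. The reviews and labels below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "The host was wonderful and the flat was spotless",
    "Great location, easy check-in, would stay again",
    "Dirty bathroom and the host never replied",
    "Terrible experience, the photos were misleading",
    "Cozy place, lovely neighborhood, very quiet",
    "The room smelled and the wifi did not work",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reviews, labels)

print(model.predict(["friendly host and a spotless, quiet flat",
                     "misleading listing and a dirty room"]))
```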

iii) Smart Pricing using Predictive Analytics

Many Airbnb hosts use the service as a source of supplementary income. The vacation homes and guest houses rented to customers raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times as much money as a hotel guest, a significant positive impact on the local neighborhood. Airbnb uses predictive analytics to predict listing prices and help hosts set a competitive and optimal price. The overall profitability of an Airbnb host depends on factors like the time invested by the host and their responsiveness to changing demand across seasons. The factors that impact real-time smart pricing are the location of the listing, proximity to transport options, season, and the amenities available in the neighborhood of the listing.

Here is a Price Prediction Project to help you understand the concept of predictive analysis, which is widely used in case studies for data analytics.
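Here is a minimal predictive-analytics sketch in that spirit: a gradient boosting regressor trained on synthetic listing features (distance to center, transport proximity, season, amenity count). The data, feature names, and coefficients are invented for illustration, not Airbnb's smart pricing model.

```python
# A minimal listing price prediction sketch on synthetic features.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 2000
distance_to_center_km = rng.uniform(0, 20, n)
near_transport = rng.integers(0, 2, n)
peak_season = rng.integers(0, 2, n)
n_amenities = rng.integers(0, 10, n)

# Synthetic nightly price with noise
price = (120 - 3 * distance_to_center_km + 15 * near_transport
         + 25 * peak_season + 4 * n_amenities + rng.normal(0, 10, n))

X = np.column_stack([distance_to_center_km, near_transport, peak_season, n_amenities])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print(f"MAE on held-out listings: {mean_absolute_error(y_test, model.predict(X_test)):.1f}")
```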

6) Uber

Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, and it completes 14 million trips each day. Uber uses data analytics and big data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber is constantly exploring new technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable benefits like ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some of the real-world data science projects used by Uber:

i) Dynamic Pricing for Price Surges and Demand Forecasting

Uber prices change at peak hours based on demand. Uber uses surge pricing to encourage more cab drivers to sign on with the company to meet passenger demand. When prices increase, both the driver and the passenger are informed about the surge. Uber uses a patented predictive model for price surging called 'Geosurge,' which is based on the demand for the ride and the location.

ii) One-Click Chat

Uber has developed a machine learning and natural language processing solution called one-click chat, or OCC, for coordination between drivers and riders. This feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages with the click of just one button. One-click chat is built on Uber's machine learning platform, Michelangelo, to perform NLP on rider chat messages and generate appropriate responses.

iii) Customer Retention

Failure to meet customer demand for cabs could lead to users opting for other services. Uber uses machine learning models to bridge this demand-supply gap. By using models to predict demand in any location, Uber retains its customers. Uber also uses a tier-based reward system, which segments customers into different levels based on usage; the higher the level the user achieves, the better the perks. Uber also provides personalized destination suggestions based on the user's history and their frequently traveled destinations.

You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used for natural language processing. You can also practice building a demand forecasting model with this project using time series analysis, and look at this project, which uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.
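To show what a basic demand forecasting model can look like, here is a minimal Holt-Winters exponential smoothing sketch on a synthetic daily ride-demand series with weekly seasonality. It is a teaching sketch, not Uber's forecasting stack, and the series, trend, and noise levels are all invented.

```python
# A minimal demand forecasting sketch with Holt-Winters exponential smoothing
# from statsmodels on a synthetic daily series with weekly seasonality.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(7)
days = pd.date_range("2023-01-01", periods=120, freq="D")
weekly_pattern = 100 + 30 * np.sin(2 * np.pi * np.arange(120) / 7)
demand = pd.Series(weekly_pattern + np.arange(120) * 0.5 + rng.normal(0, 5, 120), index=days)

train, test = demand[:-14], demand[-14:]  # hold out the last two weeks
model = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=7).fit()
forecast = model.forecast(14)

print(pd.DataFrame({"actual": test.round(1), "forecast": forecast.round(1)}).head())
```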


7) LinkedIn 

LinkedIn is the largest professional social networking site with nearly 800 million members in more than 200 countries worldwide. Almost 40% of the users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights to build strategies, apply algorithms and statistical inferences to optimize engineering solutions, and help the company achieve its goals. Here are some of the real world data science projects at LinkedIn:

i) LinkedIn Recruiter Implement Search Algorithms and Recommendation Systems

LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product works on search and recommendation engines. LinkedIn Recruiter handles complex queries and filters on a constantly growing large dataset, and the results delivered have to be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient boosted decision trees to capture non-linear correlations in the dataset. In addition to these models, LinkedIn Recruiter also uses a generalized linear mixed model to improve prediction and give personalized results.

ii) Recommendation Systems Personalized for News Feed

The LinkedIn news feed is the heart and soul of the professional community. A member's newsfeed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best exchanges to be displayed on the feed by sorting through posts and ranking the most relevant results on top. The algorithms help LinkedIn understand member preferences and help provide personalized news feeds. The algorithms used include logistic regression, gradient boosted decision trees and neural networks for recommendation systems.

iii) CNNs to Detect Inappropriate Content

Providing a professional space where people can trust and express themselves in a safe community has been a critical goal at LinkedIn. LinkedIn has heavily invested in building solutions to detect fake accounts and abusive behavior on its platform. Any form of spam, harassment, or inappropriate content is immediately flagged and taken down; these can range from profanity to advertisements for illegal services. LinkedIn uses a convolutional neural network-based machine learning model. This classifier trains on a dataset containing accounts labeled as either "inappropriate" or "appropriate." The inappropriate list consists of accounts whose content contains "blocklisted" phrases or words, plus a small portion of manually reviewed accounts reported by the user community.

Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.
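Here is a minimal text classification sketch in the spirit of that exercise: bag-of-words features with a Naive Bayes classifier on a few invented comments. A real moderation system needs far more data, careful evaluation, and human review.

```python
# A minimal text classification sketch: bag-of-words + Multinomial Naive Bayes.
# The example comments and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

comments = [
    "Congratulations on the new role, well deserved!",
    "Great article, thanks for sharing your insights",
    "You are a complete idiot and your work is garbage",
    "Nobody wants your spam, stop posting this junk",
    "Happy to connect, I enjoyed your talk last week",
    "This is a scam, click here to get rich quick",
]
labels = ["appropriate", "appropriate", "inappropriate",
          "inappropriate", "appropriate", "inappropriate"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(comments, labels)

print(model.predict(["thanks for sharing, great insights",
                     "stop posting this spam junk"]))
```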


8) Pfizer

Pfizer is a multinational pharmaceutical company headquartered in New York, USA, and one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020 when its COVID-19 vaccine was the first to receive FDA emergency use authorization, and in early November 2021 the CDC approved the Pfizer vaccine for kids aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies by Pfizer:

i) Identifying Patients for Clinical Trials

Artificial intelligence and machine learning are used to streamline and optimize clinical trials and increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials, for example patients with distinct symptoms. They can also help examine interactions of potential trial members' specific biomarkers and predict drug interactions and side effects, which helps avoid complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across their 44,000-candidate COVID-19 clinical trial.

ii) Supply Chain and Manufacturing

Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing the production steps, which helps supply drugs customized to small pools of patients in specific gene pools. Pfizer uses machine learning to predict the maintenance cost of the equipment used. Predictive maintenance using AI is the next big step for pharmaceutical companies to reduce costs.

iii) Drug Development

Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, Watson Health and Pfizer announced a collaboration to utilize IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have recently been used for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been a revolutionary technique for drug discovery, as it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.

You can create a machine learning model to predict molecular activity to help design medicine using this dataset. You may build a CNN or a deep neural network for this data analyst case study project.
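As a starting point, here is a minimal sketch of molecular activity prediction using a small feed-forward neural network on synthetic descriptor vectors. The descriptors, labels, and architecture are illustrative assumptions; real work would use actual assay data and descriptors such as molecular fingerprints.

```python
# A minimal molecular activity prediction sketch: a small feed-forward neural
# network (scikit-learn's MLPClassifier) on synthetic molecular descriptors.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n, n_descriptors = 1500, 32
X = rng.normal(size=(n, n_descriptors))  # stand-in molecular descriptors
active = (X[:, :5].sum(axis=1) + rng.normal(0, 1, n) > 0).astype(int)  # synthetic activity label

X_train, X_test, y_train, y_test = train_test_split(X, active, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```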


9) Shell Data Analyst Case Study Project

Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future. Shell is going through a significant transition, aiming to become a clean energy company by 2050 as the world needs more and cleaner energy solutions, and this requires substantial changes in the way energy is used. Digital technologies, including AI and machine learning, play an essential role in this transformation: more efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI in various phases of the organization will help Shell achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:

i) Precision Drilling

Shell is involved in every stage of the oil and gas supply chain, from mining hydrocarbons to refining the fuel to retailing it to customers. Recently, Shell has used reinforcement learning to control the drilling equipment used in mining. Reinforcement learning works on a reward system based on the outcome of the AI model. The algorithm is designed to guide the drills as they move through the subsurface, based on historical data from drilling records, including information such as the size of drill bits, temperatures, pressures, and knowledge of seismic activity. This model helps the human operator understand the environment better, leading to better and faster results with less damage to the machinery used.

ii) Efficient Charging Terminals

Due to climate change, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions. However, the lack of public charging terminals has deterred people from switching to electric cars. Shell uses AI to monitor and predict the demand for terminals to provide an efficient supply. Multiple vehicles charging from a single terminal may create a considerable grid load, and demand predictions can help make this process more efficient.

iii) Monitoring Service and Charging Stations

Another Shell initiative, trialed in Thailand and Singapore, is the use of computer vision cameras that watch out for potentially hazardous activities, like lighting cigarettes in the vicinity of the pumps while refueling. The model is built to process the content of the captured images and label and classify it, and the algorithm can then alert the staff and hence reduce the risk of fires. The model can be further trained to detect rash driving or theft in the future.

Here is a project to help you understand multiclass image classification. You can also use the Hourly Energy Consumption Dataset to build an energy consumption prediction model, for example using time series features with XGBoost, as sketched below.
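Here is a minimal sketch of that idea: engineer calendar and lag features from an entirely synthetic hourly load series and fit an XGBoost regressor, holding out the final week for evaluation. The series, feature choices, and hyperparameters are illustrative assumptions.

```python
# A minimal time-series forecasting sketch with XGBoost on a synthetic hourly
# energy consumption series: calendar features plus lagged consumption.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(5)
idx = pd.date_range("2023-01-01", periods=24 * 90, freq="h")
load = 500 + 80 * np.sin(2 * np.pi * idx.hour / 24) + 40 * (idx.dayofweek < 5) + rng.normal(0, 15, len(idx))
df = pd.DataFrame({"load": load}, index=idx)

# Feature engineering: calendar features plus lagged consumption
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek
df["lag_24"] = df["load"].shift(24)     # same hour yesterday
df["lag_168"] = df["load"].shift(168)   # same hour last week
df = df.dropna()

features = ["hour", "dayofweek", "lag_24", "lag_168"]
train, test = df.iloc[:-168], df.iloc[-168:]  # hold out the last week

model = XGBRegressor(n_estimators=300, learning_rate=0.05)
model.fit(train[features], train["load"])
pred = model.predict(test[features])
print(f"Mean absolute error on the held-out week: {np.abs(pred - test['load']).mean():.1f}")
```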

10) Zomato Case Study on Data Analytics

Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, and online payments for dining. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 200,000 restaurant partners and around 100,000 delivery partners, and it has closed over 100 million delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analyst case study projects developed by the data scientists at Zomato:

i) Personalized Recommendation System for Homepage

Zomato uses data analytics to create personalized homepages for its users. Zomato uses data science to personalize orders, for example by giving customers recommendations for specific cuisines, locations, price ranges, and brands. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato.

You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.

ii) Analyzing Customer Sentiment

Zomato uses natural language processing and machine learning to understand customer sentiment from social media posts and customer reviews. These help the company gauge the inclination of its customer base towards the brand. Deep learning models analyze the sentiment of brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give the company insights that help build the brand and understand the target audience.

iii) Predicting Food Preparation Time (FPT)

Food preparation time is an essential variable in the estimated delivery time of an order placed by a customer using Zomato. The food preparation time depends on numerous factors, like the number of dishes ordered, the time of day, footfall in the restaurant, and the day of the week. Accurate prediction of the food preparation time helps make a better prediction of the estimated delivery time, making delivery partners less likely to breach it. Zomato uses a bidirectional LSTM-based deep learning model that considers all these features and predicts the food preparation time for each order in real time.
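To illustrate the shape of such a model, here is a minimal bidirectional LSTM regression sketch in Keras trained on synthetic per-dish sequences. The features, padding scheme, layer sizes, and target are illustrative assumptions, not Zomato's production model.

```python
# A minimal bidirectional LSTM regression sketch: each order is a padded
# sequence of per-dish feature vectors, and the model predicts a prep time.
# All data is synthetic.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(11)
n_orders, max_dishes, n_features = 1000, 6, 4

# Synthetic per-dish features (e.g., prep complexity, quantity), zero-padded
X = rng.normal(size=(n_orders, max_dishes, n_features)).astype("float32")
pad = rng.random((n_orders, max_dishes, 1)) < 0.3
X = np.where(pad, 0.0, X)  # zero out some whole dish slots to mimic padding

# Synthetic target: prep time grows with the summed "complexity" feature
y = 10 + 3 * np.clip(X[:, :, 0], 0, None).sum(axis=1) + rng.normal(0, 1, n_orders)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_dishes, n_features)),
    tf.keras.layers.Masking(mask_value=0.0),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))
```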

Data scientists are companies' secret weapons for analyzing customer sentiment and behavior and leveraging it to drive conversion, loyalty, and profits. These 10 data science case study projects with examples and solutions show you how various organizations use data science technologies to succeed and stay at the top of their field. To summarize, data science has not only accelerated the performance of companies but has also made it possible to manage and sustain that performance with ease.

FAQs on Data Analysis Case Studies

What is a case study in data science?

A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

How do you prepare a data science case study?

To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess the data, perform exploratory data analysis, and apply appropriate algorithms for analysis. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.


Data Analysis Case Study: Learn From Humana's Automated Data Analysis Project

Lillian Pierson, P.E.

Got data? Great! Looking for that perfect data analysis case study to help you get started using it? You’re in the right place.

If you’ve ever struggled to decide what to do next with your data projects, to actually find meaning in the data, or even to decide what kind of data to collect, then KEEP READING…

Deep down, you know what needs to happen. You need to initiate and execute a data strategy that really moves the needle for your organization. One that produces seriously awesome business results.

But how? You're in the right place to find out.

As a data strategist who has worked with 10 percent of Fortune 100 companies, today I’m sharing with you a case study that demonstrates just how real businesses are making real wins with data analysis. 

In the post below, we’ll look at:

  • A shining data success story;
  • What went on ‘under-the-hood’ to support that successful data project; and
  • The exact data technologies used by the vendor, to take this project from pure strategy to pure success

If you prefer to watch this information rather than read it, it’s captured in the video below:

Here’s the url too: https://youtu.be/xMwZObIqvLQ

3 Action Items You Need To Take

To actually use the data analysis case study you’re about to get – you need to take 3 main steps. Those are:

  • Reflect upon your organization as it is today (I left you some prompts below – to help you get started)
  • Review winning data case collections (starting with the one I'm sharing here) and identify 5 that seem the most promising for your organization given its current set-up
  • Assess your organization AND those 5 winning case collections. Based on that assessment, select the "QUICK WIN" data use case that offers your organization the most bang for its buck

Step 1: Reflect Upon Your Organization

Whenever you evaluate data case collections to decide if they’re a good fit for your organization, the first thing you need to do is organize your thoughts with respect to your organization as it is today.

Before moving into the data analysis case study, STOP and ANSWER THE FOLLOWING QUESTIONS – just to remind yourself:

  • What is the business vision for our organization?
  • What industries do we primarily support?
  • What data technologies do we already have up and running, that we could use to generate even more value?
  • What team members do we have to support a new data project? And what are their data skillsets like?
  • What type of data are we mostly looking to generate value from? Structured? Semi-Structured? Un-structured? Real-time data? Huge data sets? What are our data resources like?

Jot down some notes while you're here. Then keep them in mind as you read on to find out how one company, Humana, used its data to achieve a 28 percent increase in customer satisfaction, along with a 63 percent increase in employee engagement. (That's a seriously impressive outcome, right?!)

Step 2: Review Data Case Studies

Here we are, already at step 2. It's time for you to start reviewing data analysis case studies (starting with the one I'm sharing below). Identify 5 that seem the most promising for your organization given its current set-up.

Humana’s Automated Data Analysis Case Study

The key thing to note here is that the approach to creating a successful data program varies from industry to industry.

Let’s start with one to demonstrate the kind of value you can glean from these kinds of success stories.

Humana has provided health insurance to Americans for over 50 years. It is a service company focused on fulfilling the needs of its customers. A great deal of Humana’s success as a company rides on customer satisfaction, and the frontline of that battle for customers’ hearts and minds is Humana’s customer service center.

Call centers are hard to get right. A lot of emotions can arise during a customer service call, especially one relating to health and health insurance. Sometimes people are frustrated. At times, they’re upset. Also, there are times the customer service representative becomes aggravated, and the overall tone and progression of the phone call goes downhill. This is of course very bad for customer satisfaction.

Humana wanted to find a way to use artificial intelligence to monitor their phone calls and help their agents do a better job connecting with their customers in order to improve customer satisfaction (and thus, customer retention rates and profits per customer).

In light of their business need, Humana worked with a company called Cogito, which specializes in voice analytics technology.

Cogito offers a piece of AI technology called Cogito Dialogue. It’s been trained to identify certain conversational cues as a way of helping call center representatives and supervisors stay actively engaged in a call with a customer.

The AI listens to cues like the customer’s voice pitch.

If it’s rising, or if the call representative and the customer talk over each other, then the dialogue tool will send out electronic alerts to the agent during the call.

Humana fed the dialogue tool customer service data from 10,000 calls and allowed it to analyze cues such as keywords, interruptions, and pauses, and these cues were then linked with specific outcomes. For example, if the representative is receiving a particular type of cue, they are likely to get a specific customer satisfaction result.
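Purely to illustrate the mechanics, here is a minimal rule-based sketch of how cue-driven alerts like these could be generated in real time. The cue names, thresholds, and messages are hypothetical; Cogito's actual models are proprietary and far more sophisticated.

```python
# A hypothetical rule-based alert sketch for call-audio cues.
# Thresholds and messages are invented for illustration only.
from dataclasses import dataclass

@dataclass
class CallWindow:
    """Aggregated cues for the last few seconds of a call."""
    pitch_slope: float      # positive = rising pitch
    overlap_ratio: float    # fraction of time both parties speak at once
    silence_ratio: float    # fraction of time nobody speaks

def suggest_alerts(window: CallWindow) -> list[str]:
    alerts = []
    if window.pitch_slope > 0.5:
        alerts.append("Tone of voice is getting tense")
    if window.overlap_ratio > 0.2:
        alerts.append("You and the customer are speaking at the same time")
    if window.silence_ratio > 0.4:
        alerts.append("Long silence: check in with the customer")
    return alerts

print(suggest_alerts(CallWindow(pitch_slope=0.8, overlap_ratio=0.25, silence_ratio=0.1)))
```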

The Outcome

Customers were happier, and customer service representatives were more engaged.

This automated solution for data analysis has now been deployed in 200 Humana call centers and the company plans to roll it out to 100 percent of its centers in the future.

The initiative was so successful, Humana has been able to focus on next steps in its data program. The company now plans to begin predicting the type of calls that are likely to go unresolved, so they can send those calls over to management before they become frustrating to the customer and customer service representative alike.

What does this mean for you and your business?

Well, if you’re looking for new ways to generate value by improving the quantity and quality of the decision support that you’re providing to your customer service personnel, then this may be a perfect example of how you can do so.

Humana’s Business Use Cases

Humana’s data analysis case study includes two key business use cases:

  • Analyzing customer sentiment; and
  • Suggesting actions to customer service representatives.

Analyzing Customer Sentiment

First things first, before you go ahead and collect data, you need to ask yourself who and what is involved in making things happen within the business.

In the case of Humana, the actors were:

  • The health insurance system itself
  • The customer, and
  • The customer service representative

The relational aspect here is pretty simple: you have a customer service representative and a customer. They are both producing audio data, and that audio data is being fed into the system.

Humana focused on collecting the key data points from their customer service operations.

By collecting data about speech style, pitch, silence, stress in customers' voices, length of call, speed of customers' speech, intonation, articulation, and the representatives' manner of speaking, Humana was able to analyze customer sentiment and introduce techniques for improved customer satisfaction.

Having strategically defined these data points, the Cogito technology was able to generate reports about customer sentiment during the calls.

Suggesting Actions to Customer Service Representatives

The second use case for the Humana data program follows on from the data gathered in the first case.

In Humana’s case, Cogito generated a host of call analyses and reports about key call issues.

In the second business use case, Cogito was able to suggest actions to customer service representatives, in real-time , to make use of incoming data and help improve customer satisfaction on the spot.

The technology Humana used provided suggestions via text message to the customer service representative, offering the following types of feedback:

  • The tone of voice is too tense
  • The speed of speaking is high
  • The customer representative and customer are speaking at the same time

These alerts allowed the Humana customer service representatives to alter their approach immediately , improving the quality of the interaction and, subsequently, the customer satisfaction.

The preconditions for success in this use case were:

  • The call-related data must be collected and stored
  • The AI models must be in place to generate analysis on the data points that are recorded during the calls

Evidence of success can subsequently be found in a system that offers real-time suggestions for courses of action that the customer service representative can take to improve customer satisfaction.

Thanks to this data-intensive business use case, Humana was able to increase customer satisfaction, improve customer retention rates, and drive profits per customer.

The Technology That Supports This Data Analysis Case Study

I promised to dip into the tech side of things. This is especially for those of you who are interested in the ins and outs of how projects like this one are actually rolled out.

Here’s a little rundown of the main technologies we discovered when we investigated how Cogito runs in support of its clients like Humana.

  • For cloud data management Cogito uses AWS, specifically the Athena product
  • For on-premise big data management, the company used Apache HDFS – the distributed file system for storing big data
  • They utilize MapReduce, for processing their data
  • And Cogito also has traditional systems and relational database management systems such as PostgreSQL
  • In terms of analytics and data visualization tools, Cogito makes use of Tableau
  • And for its machine learning technology, these use cases required people with knowledge in Python, R, and SQL, as well as deep learning (Cogito uses the PyTorch library and the TensorFlow library)

These data science skill sets support the computing, deep learning, and natural language processing applications employed by Humana for this use case.

If you’re looking to hire people to help with your own data initiative, then people with those skills listed above, and with experience in these specific technologies, would be a huge help.

Step 3: Select The "Quick Win" Data Use Case

Still there? Great!

It’s time to close the loop.

Remember those notes you took before you reviewed the study? I want you to STOP here and assess. Does this Humana case study seem applicable and promising as a solution, given your organization’s current set-up…

YES ▶ Excellent!

Earmark it and continue exploring other winning data use cases until you've identified 5 that seem like great fits for your business's needs. Evaluate those against your organization's needs, and select the very best fit to be your "quick win" data use case. Develop your data strategy around that.

NO, Lillian – it's not applicable. ▶ No problem.

Discard the information and continue exploring the winning data use cases we've categorized for you according to business function and industry. Save time by dialing down into the business function you know your business really needs help with now. Identify 5 winning data use cases that seem like great fits for your business's needs. Evaluate those against your organization's needs, and select the very best fit to be your "quick win" data use case. Develop your data strategy around that data use case.


Case Study – Methods, Examples and Guide


Case Study Research

A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation.

It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied. Case studies typically involve multiple sources of data, including interviews, observations, documents, and artifacts, which are analyzed using various techniques, such as content analysis, thematic analysis, and grounded theory. The findings of a case study are often used to develop theories, inform policy or practice, or generate new research questions.

Types of Case Study

Types and Methods of Case Study are as follows:

Single-Case Study

A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail.

For example, a researcher might conduct a single-case study on a particular individual to understand their experiences with a particular health condition, or on a specific organization to explore its management practices. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice.

Multiple-Case Study

A multiple-case study involves the analysis of several cases that are similar in nature. This type of case study is useful when the researcher wants to identify similarities and differences between the cases.

For example, a researcher might conduct a multiple-case study on several companies to explore the factors that contribute to their success or failure. The researcher collects data from each case, compares and contrasts the findings, and uses various techniques to analyze the data, such as comparative analysis or pattern-matching. The findings of a multiple-case study can be used to develop theories, inform policy or practice, or generate new research questions.

Exploratory Case Study

An exploratory case study is used to explore a new or understudied phenomenon. This type of case study is useful when the researcher wants to generate hypotheses or theories about the phenomenon.

For example, a researcher might conduct an exploratory case study on a new technology to understand its potential impact on society. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as grounded theory or content analysis. The findings of an exploratory case study can be used to generate new research questions, develop theories, or inform policy or practice.

Descriptive Case Study

A descriptive case study is used to describe a particular phenomenon in detail. This type of case study is useful when the researcher wants to provide a comprehensive account of the phenomenon.

For example, a researcher might conduct a descriptive case study on a particular community to understand its social and economic characteristics. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a descriptive case study can be used to inform policy or practice or generate new research questions.

Instrumental Case Study

An instrumental case study is used to understand a particular phenomenon that is instrumental in achieving a particular goal. This type of case study is useful when the researcher wants to understand the role of the phenomenon in achieving the goal.

For example, a researcher might conduct an instrumental case study on a particular policy to understand its impact on achieving a particular goal, such as reducing poverty. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of an instrumental case study can be used to inform policy or practice or generate new research questions.

Case Study Data Collection Methods

Here are some common data collection methods for case studies:

Interviews

Interviews involve asking questions to individuals who have knowledge or experience relevant to the case study. Interviews can be structured (where the same questions are asked to all participants) or unstructured (where the interviewer follows up on the responses with further questions). Interviews can be conducted in person, over the phone, or through video conferencing.

Observations

Observations involve watching and recording the behavior and activities of individuals or groups relevant to the case study. Observations can be participant (where the researcher actively participates in the activities) or non-participant (where the researcher observes from a distance). Observations can be recorded using notes, audio or video recordings, or photographs.

Documents

Documents can be used as a source of information for case studies. Documents can include reports, memos, emails, letters, and other written materials related to the case study. Documents can be collected from the case study participants or from public sources.

Surveys

Surveys involve asking a set of questions to a sample of individuals relevant to the case study. Surveys can be administered in person, over the phone, through mail or email, or online. Surveys can be used to gather information on attitudes, opinions, or behaviors related to the case study.

Artifacts

Artifacts are physical objects relevant to the case study. Artifacts can include tools, equipment, products, or other objects that provide insights into the case study phenomenon.

How to Conduct Case Study Research

Conducting case study research involves several steps that need to be followed to ensure the quality and rigor of the study. Here are the steps to conduct case study research:

  • Define the research questions: The first step in conducting a case study research is to define the research questions. The research questions should be specific, measurable, and relevant to the case study phenomenon under investigation.
  • Select the case: The next step is to select the case or cases to be studied. The case should be relevant to the research questions and should provide rich and diverse data that can be used to answer the research questions.
  • Collect data: Data can be collected using various methods, such as interviews, observations, documents, surveys, and artifacts. The data collection method should be selected based on the research questions and the nature of the case study phenomenon.
  • Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory. The analysis should be guided by the research questions and should aim to provide insights and conclusions relevant to the research questions.
  • Draw conclusions: The conclusions drawn from the case study should be based on the data analysis and should be relevant to the research questions. The conclusions should be supported by evidence and should be clearly stated.
  • Validate the findings: The findings of the case study should be validated by reviewing the data and the analysis with participants or other experts in the field. This helps to ensure the validity and reliability of the findings.
  • Write the report: The final step is to write the report of the case study research. The report should provide a clear description of the case study phenomenon, the research questions, the data collection methods, the data analysis, the findings, and the conclusions. The report should be written in a clear and concise manner and should follow the guidelines for academic writing.

Examples of Case Study

Here are some examples of case study research:

  • The Hawthorne Studies: Conducted between 1924 and 1932, the Hawthorne Studies were a series of case studies conducted by Elton Mayo and his colleagues to examine the impact of the work environment on employee productivity. The studies were conducted at the Hawthorne Works plant of the Western Electric Company near Chicago and included interviews, observations, and experiments.
  • The Stanford Prison Experiment: Conducted in 1971, the Stanford Prison Experiment was a case study conducted by Philip Zimbardo to examine the psychological effects of power and authority. The study involved simulating a prison environment and assigning participants to the role of guards or prisoners. The study was controversial due to the ethical issues it raised.
  • The Challenger Disaster: The Challenger Disaster was a case study conducted to examine the causes of the Space Shuttle Challenger explosion in 1986. The study included interviews, observations, and analysis of data to identify the technical, organizational, and cultural factors that contributed to the disaster.
  • The Enron Scandal: The Enron Scandal was a case study conducted to examine the causes of the Enron Corporation’s bankruptcy in 2001. The study included interviews, analysis of financial data, and review of documents to identify the accounting practices, corporate culture, and ethical issues that led to the company’s downfall.
  • The Fukushima Nuclear Disaster: The Fukushima Nuclear Disaster was a case study conducted to examine the causes of the nuclear accident that occurred at the Fukushima Daiichi Nuclear Power Plant in Japan in 2011. The study included interviews, analysis of data, and review of documents to identify the technical, organizational, and cultural factors that contributed to the disaster.

Application of Case Study

Case studies have a wide range of applications across various fields and industries. Here are some examples:

Business and Management

Case studies are widely used in business and management to examine real-life situations and develop problem-solving skills. Case studies can help students and professionals to develop a deep understanding of business concepts, theories, and best practices.

Healthcare

Case studies are used in healthcare to examine patient care, treatment options, and outcomes. Case studies can help healthcare professionals to develop critical thinking skills, diagnose complex medical conditions, and develop effective treatment plans.

Education

Case studies are used in education to examine teaching and learning practices. Case studies can help educators to develop effective teaching strategies, evaluate student progress, and identify areas for improvement.

Social Sciences

Case studies are widely used in social sciences to examine human behavior, social phenomena, and cultural practices. Case studies can help researchers to develop theories, test hypotheses, and gain insights into complex social issues.

Law and Ethics

Case studies are used in law and ethics to examine legal and ethical dilemmas. Case studies can help lawyers, policymakers, and ethics professionals to develop critical thinking skills, analyze complex cases, and make informed decisions.

Purpose of Case Study

The purpose of a case study is to provide a detailed analysis of a specific phenomenon, issue, or problem in its real-life context. A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community.

The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics. Case studies can help researchers to identify and examine the underlying factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and detailed understanding of the case, which can inform future research, practice, or policy.

Case studies can also serve other purposes, including:

  • Illustrating a theory or concept: Case studies can be used to illustrate and explain theoretical concepts and frameworks, providing concrete examples of how they can be applied in real-life situations.
  • Developing hypotheses: Case studies can help to generate hypotheses about the causal relationships between different factors and outcomes, which can be tested through further research.
  • Providing insight into complex issues: Case studies can provide insights into complex and multifaceted issues, which may be difficult to understand through other research methods.
  • Informing practice or policy: Case studies can be used to inform practice or policy by identifying best practices, lessons learned, or areas for improvement.

Advantages of Case Study Research

There are several advantages of case study research, including:

  • In-depth exploration: Case study research allows for a detailed exploration and analysis of a specific phenomenon, issue, or problem in its real-life context. This can provide a comprehensive understanding of the case and its dynamics, which may not be possible through other research methods.
  • Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced understanding of the case and its complexity.
  • Holistic perspective: Case study research allows for a holistic perspective of the case, taking into account the various factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and comprehensive understanding of the case.
  • Theory development: Case study research can help to develop and refine theories and concepts by providing empirical evidence and concrete examples of how they can be applied in real-life situations.
  • Practical application: Case study research can inform practice or policy by identifying best practices, lessons learned, or areas for improvement.
  • Contextualization: Case study research takes into account the specific context in which the case is situated, which can help to understand how the case is influenced by the social, cultural, and historical factors of its environment.

Limitations of Case Study Research

There are several limitations of case study research, including:

  • Limited generalizability: Case studies are typically focused on a single case or a small number of cases, which limits the generalizability of the findings. The unique characteristics of the case may not be applicable to other contexts or populations, which may limit the external validity of the research.
  • Biased sampling: Case studies may rely on purposive or convenience sampling, which can introduce bias into the sample selection process. This may limit the representativeness of the sample and the generalizability of the findings.
  • Subjectivity: Case studies rely on the interpretation of the researcher, which can introduce subjectivity into the analysis. The researcher’s own biases, assumptions, and perspectives may influence the findings, which may limit the objectivity of the research.
  • Limited control: Case studies are typically conducted in naturalistic settings, which limits the control that the researcher has over the environment and the variables being studied. This may limit the ability to establish causal relationships between variables.
  • Time-consuming: Case studies can be time-consuming to conduct, as they typically involve a detailed exploration and analysis of a specific case. This may limit the feasibility of conducting multiple case studies or conducting case studies in a timely manner.
  • Resource-intensive: Case studies may require significant resources, including time, funding, and expertise. This may limit the ability of researchers to conduct case studies in resource-constrained settings.



Your Modern Business Guide To Data Analysis Methods And Techniques


Table of Contents

1) What Is Data Analysis?

2) Why Is Data Analysis Important?

3) What Is The Data Analysis Process?

4) Types Of Data Analysis Methods

5) Top Data Analysis Techniques To Apply

6) Quality Criteria For Data Analysis

7) Data Analysis Limitations & Barriers

8) Data Analysis Skills

9) Data Analysis In The Big Data Environment

In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.

Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery, improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.

With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.

In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis. 

To put all of that into perspective, we will answer a host of important analytical questions and explore analytical methods and techniques, while demonstrating how to perform analysis in the real world with a 17-step blueprint for success.

What Is Data Analysis?

Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

All these various methods are largely based on two core areas: quantitative and qualitative research.


Gaining a better understanding of the different techniques and methods used in quantitative research as well as qualitative insights will give your analysis efforts a more clearly defined direction, so it's worth taking the time to let this knowledge sink in. It will also help you create comprehensive analytical reports that strengthen your analysis.

Apart from qualitative and quantitative categories, there are also other types of data that you should be aware of before diving into complex data analysis processes. These categories include:

  • Big data: Refers to massive data sets that need to be analyzed using advanced software to reveal patterns and trends. It is considered to be one of the best analytical assets as it provides larger volumes of data at a faster rate. 
  • Metadata: Putting it simply, metadata is data that provides insights about other data. It summarizes key information about specific data that makes it easier to find and reuse for later purposes. 
  • Real time data: As its name suggests, real time data is presented as soon as it is acquired. From an organizational perspective, this is the most valuable data as it can help you make important decisions based on the latest developments. Our guide on real time analytics will tell you more about the topic. 
  • Machine data: This is more complex data that is generated solely by machines, such as phones, computers, websites, and embedded systems, without prior human interaction.

Why Is Data Analysis Important?

Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.

  • Informed decision-making: From a management perspective, you can benefit from analyzing your data as it helps you make decisions based on facts and not simple intuition. For instance, you can understand where to invest your capital, detect growth opportunities, predict your income, or tackle uncommon situations before they become problems. Through this, you can extract relevant insights from all areas in your organization and, with the help of dashboard software, present the data in a professional and interactive way to different stakeholders.
  • Reduce costs: Another great benefit is cost reduction. With the help of advanced technologies such as predictive analytics, businesses can spot improvement opportunities, trends, and patterns in their data and plan their strategies accordingly. In time, this will help you save money and resources that would otherwise be spent implementing the wrong strategies. And not just that: by predicting different scenarios such as sales and demand, you can also anticipate production and supply.
  • Target customers better: Customers are arguably the most crucial element in any business. By using analytics to get a 360° view of all aspects related to your customers, you can understand which channels they use to communicate with you, their demographics, interests, habits, purchasing behaviors, and more. In the long run, this will make your marketing strategies more successful, allow you to identify new potential customers, and avoid wasting resources on targeting the wrong people or sending the wrong message. You can also track customer satisfaction by analyzing your clients' reviews or your customer service department's performance.

What Is The Data Analysis Process?

Data analysis process graphic

When we talk about analyzing data, there is a sequence of steps to follow in order to extract meaningful conclusions. The analysis process consists of 5 key stages. We will cover each of them in more detail later in the post, but to provide the context needed to understand what is coming next, here is a rundown of the 5 essential steps of data analysis.

  • Identify: Before you get your hands dirty with data, you first need to identify why you need it in the first place. The identification is the stage in which you establish the questions you will need to answer. For example, what is the customer's perception of our brand? Or what type of packaging is more engaging to our potential customers? Once the questions are outlined you are ready for the next step. 
  • Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of data you will use and how you will use them. The collection of data can come in different forms such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others.  An important note here is that the way you collect the data will be different in a quantitative and qualitative scenario. 
  • Clean: Once you have the necessary data, it is time to clean it and leave it ready for analysis. Not all the data you collect will be useful; when collecting large amounts of data in different formats, it is very likely that you will find yourself with duplicate or badly formatted records. To avoid this, before you start working with your data you need to make sure to erase any white spaces, duplicate records, or formatting errors. This way you avoid hurting your analysis with bad-quality data.
  • Analyze: With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you answer the questions you first thought of in the identify stage. Various technologies in the market assist researchers and average users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others.
  • Interpret: Last but not least you have one of the most important steps: it is time to interpret your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would understand if your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage, you can also find some limitations and work on them. 

Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.

17 Essential Types Of Data Analysis Methods

Before diving into the 17 essential types of methods, it is important that we quickly go over the main analysis categories. Moving from descriptive up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.

a) Descriptive analysis - What happened.

The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question of what happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.

Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. That said, this analysis on its own will not allow you to predict future outcomes or tell you why something happened; it will, however, leave your data organized and ready for further investigation.

b) Exploratory analysis - How to explore data relationships.

As its name suggests, the main aim of exploratory analysis is to explore. Prior to it, there is still no clear notion of the relationships between the data and the variables. Once the data is investigated, exploratory analysis helps you find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.

c) Diagnostic analysis - Why it happened.

Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.

Designed to provide direct and actionable answers to specific questions, this is one of the most important methods in research, and it also plays a key role in organizational functions such as retail analytics.

d) Predictive analysis - What will happen.

The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analysis, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data.

With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.

e) Prescriptive analysis - How to make it happen.

This is another of the most effective types of analysis methods in research. Prescriptive data techniques cross over from predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.

By drilling down into prescriptive analysis, you will play an active role in the data consumption process by taking well-arranged sets of visual data and using them as a powerful fix for emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, logistics analytics, and others.

Top 17 data analysis methods

As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches. 

Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world: 

A. Quantitative Methods 

To put it simply, quantitative analysis refers to all methods that use numerical data, or data that can be turned into numbers (e.g. categorical variables like gender, age, etc.), to extract valuable insights. It is used to draw conclusions about relationships and differences and to test hypotheses. Below we discuss some of the key quantitative methods.

1. Cluster analysis

Cluster analysis is the action of grouping a set of data elements in such a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.

Let's look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best personalized service, but let's face it, with a large customer base, it is practically impossible to do that. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
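
To make this more tangible, here is a minimal sketch of customer clustering in Python using scikit-learn's k-means implementation. The column names and values are invented purely for illustration, and k-means is only one of several clustering algorithms you could choose.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical customer data: annual spend and number of orders per customer
customers = pd.DataFrame({
    "annual_spend":    [120, 150, 900, 950, 3000, 3200, 140, 980, 3100, 130],
    "orders_per_year": [2, 3, 12, 14, 40, 45, 2, 13, 42, 3],
})

# Standardize the features so both variables contribute equally to the distances
X = StandardScaler().fit_transform(customers)

# Group customers into three clusters (e.g. occasional, regular, high-value buyers)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customers["cluster"] = kmeans.fit_predict(X)

# Inspect the average profile of each cluster
print(customers.groupby("cluster").mean())
```

In practice you would experiment with the number of clusters (for example, using the elbow method) before settling on a segmentation.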

2. Cohort analysis

This data analysis approach uses historical data to examine and compare the behavior of a given segment of users, who can then be grouped with others that share similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.

Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.  

A useful tool for getting started with cohort analysis is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide. In the image below, you can see an example of how a cohort is visualized in this tool: the segments (device traffic) are divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.

Cohort analysis chart example from Google Analytics
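
If you prefer to build cohorts directly in code, here is a minimal sketch using pandas. The event log below is entirely hypothetical; in practice you would load your own user activity data.

```python
import pandas as pd

# Hypothetical event log: one row per user activity
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3, 4],
    "date": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-01-20", "2023-03-02",
        "2023-02-01", "2023-02-15", "2023-03-20", "2023-03-05",
    ]),
})

# Assign each user to a cohort based on the month of their first activity
events["activity_month"] = events["date"].dt.to_period("M")
events["cohort_month"] = events.groupby("user_id")["date"].transform("min").dt.to_period("M")

# Count how many users from each cohort were active in each subsequent month
cohort_table = (events.groupby(["cohort_month", "activity_month"])["user_id"]
                      .nunique()
                      .unstack(fill_value=0))
print(cohort_table)
```

Dividing each row by its cohort size (the count in the cohort's first active month) turns these counts into the retention rates that most cohort charts display.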

3. Regression analysis

Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.

Let's break it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed or whether any new ones appeared during 2020. For example, you couldn't sell as much in your physical store due to COVID lockdowns. Therefore, your sales could've either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.

If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.
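
As a hands-on illustration, here is a minimal multiple regression sketch with statsmodels. The data is synthetic, and the variable names (marketing spend, store traffic) are assumptions chosen only to mirror the example above.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic monthly data: two independent variables and a dependent variable (sales)
rng = np.random.default_rng(0)
marketing_spend = rng.uniform(10, 50, 24)   # e.g. in thousands
store_traffic = rng.uniform(1, 5, 24)       # e.g. in thousands of visits
sales = 5 + 2.0 * marketing_spend + 8.0 * store_traffic + rng.normal(0, 5, 24)

# Multiple linear regression: sales ~ marketing_spend + store_traffic
X = sm.add_constant(np.column_stack([marketing_spend, store_traffic]))
model = sm.OLS(sales, X).fit()

print(model.params)     # intercept and estimated effect of each independent variable
print(model.rsquared)   # share of the variation in sales explained by the model
```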

4. Neural networks

Neural networks form the basis of the intelligent algorithms of machine learning. They are a form of analytics that attempts, with minimal human intervention, to mimic how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.
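
For readers who want to experiment outside a BI tool, here is a minimal sketch of a small feed-forward neural network trained with scikit-learn. The data is synthetic, and the network size and parameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Synthetic example: predict a numeric target from two input features
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 2))
y = np.sin(X[:, 0] * 6) + X[:, 1] ** 2 + rng.normal(0, 0.05, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# A small feed-forward network with two hidden layers
net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=1)
net.fit(X_train, y_train)

print("R^2 on held-out data:", round(net.score(X_test, y_test), 3))
```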

A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist. 

Here is an example of how you can use the predictive analysis tool from datapine:

Example on how to use predictive analytics tool from datapine


5. Factor analysis

Factor analysis, also called “dimension reduction”, is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, making it an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. In this case, factor analysis comes into the picture by summarizing all of these variables into homogenous groups, for example, by grouping the variables color, materials, quality, and trends into a broader latent variable of design.

If you want to start analyzing data using factor analysis, we recommend you take a look at this practical guide from UCLA.
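
Below is a minimal sketch of an exploratory factor analysis with scikit-learn. The six product ratings are simulated so that they load on two hidden factors; with real survey data you would simply pass in your response matrix.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated survey: six observed ratings driven by two latent factors
rng = np.random.default_rng(2)
design = rng.normal(size=(200, 1))    # hidden "design" factor
comfort = rng.normal(size=(200, 1))   # hidden "comfort" factor
ratings = np.hstack([
    design + rng.normal(0, 0.3, (200, 1)),   # color
    design + rng.normal(0, 0.3, (200, 1)),   # materials
    design + rng.normal(0, 0.3, (200, 1)),   # trends
    comfort + rng.normal(0, 0.3, (200, 1)),  # wearability
    comfort + rng.normal(0, 0.3, (200, 1)),  # fit
    comfort + rng.normal(0, 0.3, (200, 1)),  # frequency of use
])

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(ratings)

# The loadings show how strongly each observed rating relates to each latent factor
print(fa.components_.round(2))
```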

6. Data mining

Data mining is an umbrella term for methods of analysis that engineer metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge. When considering how to analyze data, adopting a data mining mindset is essential to success - as such, it's an area that is worth exploring in greater detail.

An excellent use case of data mining is datapine's intelligent data alerts. With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you're monitoring supply chain KPIs, you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down deep into the issue and fix it swiftly and effectively.

In the following picture, you can see how the intelligent alarms from datapine work. By setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if the goal was not completed or if it exceeded expectations.

Example on how to use intelligent alerts from datapine
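
As a highly simplified stand-in for such alerts, here is a sketch of a rule-based check in pandas that flags days falling outside an expected range. Real alerting tools layer learned thresholds and notifications on top of this basic idea; the KPI values below are invented.

```python
import pandas as pd

# Hypothetical daily KPI readings
kpis = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5),
    "daily_orders": [120, 118, 45, 130, 260],
})

# Simple rule-based alert: flag days that fall outside an expected range
lower, upper = 80, 200
kpis["alert"] = ~kpis["daily_orders"].between(lower, upper)

print(kpis[kpis["alert"]])  # the days that would trigger a notification
```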

7. Time series analysis

As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Although analysts use this method to monitor data points over a specific interval rather than just intermittently, time series analysis is not only about collecting data over time. Rather, it allows researchers to understand whether variables changed during the course of the study, how the different variables depend on one another, and how the end result was reached.

In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events. 

A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.  
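
One common first step is to decompose a series into trend, seasonal, and residual components. Here is a minimal sketch using statsmodels on synthetic monthly sales data; with real data you would replace the simulated series with your own.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly sales with an upward trend and a yearly seasonal pattern
months = pd.date_range("2019-01-01", periods=48, freq="MS")
rng = np.random.default_rng(3)
sales = (100 + np.arange(48) * 2                          # trend
         + 20 * np.sin(2 * np.pi * np.arange(48) / 12)    # seasonality
         + rng.normal(0, 5, 48))                          # noise
series = pd.Series(sales, index=months)

# Split the series into trend, seasonal, and residual components
result = seasonal_decompose(series, model="additive", period=12)
print(result.seasonal.head(12).round(1))  # the recurring monthly pattern
```

Forecasting models such as exponential smoothing or ARIMA typically build on exactly this kind of decomposition.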

8. Decision Trees 

The decision tree analysis aims to act as a support tool for making smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful for analyzing quantitative data, and they allow for an improved decision-making process by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision that you need to make and branches out based on the different outcomes and consequences of each decision. Each outcome will outline its own consequences, costs, and gains and, at the end of the analysis, you can compare each of them and make the smartest decision.

Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide if you want to update your software app or build a new app entirely.  Here you would compare the total costs, the time needed to be invested, potential revenue, and any other factor that might affect your decision.  In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.
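
Decision trees can also be learned automatically from data. The sketch below trains a small classification tree with scikit-learn and prints the resulting flowchart of if/else rules; the well-known iris dataset simply stands in for whatever business data you have.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# A small, well-known dataset stands in for real business data here
X, y = load_iris(return_X_y=True)

# Limit the depth so the resulting flowchart stays easy to read
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned tree as a set of human-readable if/else rules
print(export_text(tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
```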

9. Conjoint analysis 

Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more feature-focused, and others might care most about sustainability. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more.

A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments. 
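
In a traditional ratings-based conjoint study, part-worth utilities can be estimated with a simple dummy-coded regression. The sketch below uses statsmodels on a tiny invented set of cupcake profiles; the attributes, levels, and ratings are all assumptions for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical conjoint data: each row is one product profile with its average rating
profiles = pd.DataFrame({
    "topping": ["sugary", "healthy", "sugary", "healthy", "sugary", "healthy"],
    "base":    ["regular", "regular", "gluten_free", "gluten_free", "regular", "gluten_free"],
    "price":   [3, 3, 4, 4, 5, 5],
    "rating":  [6.1, 7.4, 5.8, 8.2, 4.9, 7.0],
})

# Dummy-coded linear model: the coefficients are the part-worth utilities
model = smf.ols("rating ~ C(topping) + C(base) + price", data=profiles).fit()
print(model.params.round(2))  # positive coefficients indicate preferred attribute levels
```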

10. Correspondence Analysis

Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic. 

This method starts by calculating an “expected value” for each cell, obtained by multiplying its row total by its column total and dividing by the grand total of the table. The expected value is then subtracted from the observed value, resulting in a “residual”, which is what allows you to extract conclusions about relationships and distribution. The results of this analysis are later displayed using a map that represents the relationships between the different values: the closer two values are on the map, the stronger the relationship. Let's put it into perspective with an example.

Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of. 
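
The first computational step (expected counts and residuals) is easy to reproduce in code. Here is a minimal sketch with pandas and NumPy on an invented brand-by-attribute contingency table; a full correspondence analysis would then apply a singular value decomposition to these residuals to produce the map (dedicated libraries such as prince can handle that part).

```python
import numpy as np
import pandas as pd

# Hypothetical contingency table: how often respondents linked each brand to each attribute
observed = pd.DataFrame(
    [[40, 10, 25],
     [15, 30, 20],
     [20, 25, 35]],
    index=["Brand A", "Brand B", "Brand C"],
    columns=["innovation", "durability", "quality materials"],
)

# Expected counts under independence: (row total x column total) / grand total
grand_total = observed.values.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / grand_total

# Standardized residuals: large positive values mean a brand is strongly
# associated with an attribute, large negative values mean the opposite
residuals = (observed - expected) / np.sqrt(expected)
print(residuals.round(2))
```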

11. Multidimensional Scaling (MDS)

MDS is a method used to observe the similarities or disparities between objects, which can be colors, brands, people, geographical coordinates, and more. The objects are plotted using an “MDS map” that positions similar objects together and disparate ones far apart. The (dis)similarities between objects are represented using one or more dimensions that can be observed using a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you could use 1 for “don't believe in the vaccine at all” and 10 for “firmly believe in the vaccine”, with 2 through 9 capturing the responses in between. When analyzing an MDS map, the only thing that matters is the distance between the objects; the orientation of the dimensions is arbitrary and has no meaning at all.

Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how it is positioned compared to competitors, it can define 2-3 dimensions such as taste, ingredients, or shopping experience, and run a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading.

Another business example is in procurement, when deciding between different suppliers. Decision makers can generate an MDS map to see how the prices, delivery times, technical services, and other attributes of the different suppliers compare, and pick the one that best suits their needs.

A final example comes from a research paper titled "An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data". The researchers used a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews. They took 36 sentiment words and distributed them based on their emotional distance, as we can see in the image below, where the words "outraged" and "sweet" sit on opposite sides of the map, marking the distance between the two emotions very clearly.

Example of multidimensional scaling analysis

Aside from being a valuable technique to analyze dissimilarities, MDS also serves as a dimension-reduction technique for large dimensional data. 
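
Here is a minimal MDS sketch with scikit-learn, using a small invented dissimilarity matrix between four brands. With real data, the matrix would come from survey responses or computed distances.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical dissimilarity matrix between four brands (0 = identical)
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
dissimilarities = np.array([
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
])

# Project the brands onto a 2-D map that preserves the distances as well as possible
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarities)

for brand, (x, y) in zip(brands, coords):
    print(f"{brand}: ({x:.2f}, {y:.2f})")
```

Plotting these coordinates reproduces the kind of perceptual map described above; remember that only the relative distances between points are meaningful.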

B. Qualitative Methods

Qualitative data analysis methods are defined as the analysis of non-numerical data gathered through techniques such as interviews, focus groups, questionnaires, and observation. As opposed to quantitative methods, qualitative data is more subjective, and it is highly valuable for analyzing customer retention and product development.

12. Text analysis

Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.

Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions of a text, for example, whether it's positive, negative, or neutral, and then give it a score depending on certain factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic check out this insightful article.
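
As a small illustration, here is a sketch of rule-based sentiment scoring with NLTK's VADER lexicon. The reviews are invented, and the ±0.05 cut-offs are a common but arbitrary convention; commercial tools typically use more sophisticated, model-based approaches.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-off download of the sentiment lexicon

# Hypothetical customer reviews
reviews = [
    "Absolutely love this product, shipping was fast and support was great!",
    "The packaging was damaged and nobody answered my emails.",
    "It does the job. Nothing special.",
]

analyzer = SentimentIntensityAnalyzer()
for review in reviews:
    # The compound score runs from -1 (very negative) to +1 (very positive)
    score = analyzer.polarity_scores(review)["compound"]
    label = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
    print(f"{label:>8}  {score:+.2f}  {review}")
```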

By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next. 

13. Content Analysis

This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.

There are two types of content analysis. The first is conceptual analysis, which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context.

Content analysis is often used by marketers to measure brand reputation and customer behavior, for example by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential out of this analysis method, it is necessary to have a clearly defined research question.
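
At its simplest, conceptual content analysis is a word or concept count, which takes only a few lines of Python. The reviews and stopword list below are invented for illustration; real projects use proper tokenizers, larger stopword lists, and coding schemes agreed on beforehand.

```python
import re
from collections import Counter

# Hypothetical customer reviews to scan for recurring concepts
reviews = [
    "The battery life is great but the battery takes ages to charge.",
    "Great screen, average battery.",
    "Screen scratches easily; support was helpful.",
]

# Conceptual content analysis: count how often each word (concept) appears
words = re.findall(r"[a-z']+", " ".join(reviews).lower())
stopwords = {"the", "is", "but", "to", "was", "and", "a"}
counts = Counter(word for word in words if word not in stopwords)

print(counts.most_common(5))
```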

14. Thematic Analysis

Very similar to content analysis, thematic analysis also helps in identifying and interpreting patterns in qualitative data, with the main difference being that content analysis can also be applied to quantitative data. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people's views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service.

Thematic analysis is a very subjective technique that relies on the researcher's judgment. Therefore, to avoid biases, it follows six steps: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways and it can be hard to decide which data to emphasize.

15. Narrative Analysis 

A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others. 

From a business perspective, narrative analysis can be useful to analyze customer behaviors and feelings towards a specific product, service, feature, or others. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.  

The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study. 

16. Discourse Analysis

Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It mixes the analysis of languages and situations together. This means that the way the content is constructed and the meaning behind it is significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches you need to consider different context elements such as the politician's background, the current political context of the country, the audience to which the speech is directed, and so on. 

From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice. 

17. Grounded Theory Analysis

Traditionally, researchers decide on a method and hypothesis and start to collect data to prove that hypothesis. Grounded theory is the only method on this list that doesn't require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you can go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it is not necessary to finish collecting the data before starting to analyze it; researchers usually begin to find valuable insights while they are still gathering the data.

All of these elements make grounded theory a very valuable method as theories are fully backed by data instead of initial assumptions. It is a great technique to analyze poorly researched topics or find the causes behind specific company outcomes. For example, product managers and marketers might use the grounded theory to find the causes of high levels of customer churn and look into customer surveys and reviews to develop new theories about the causes. 

How To Analyze Data? Top 17 Data Analysis Techniques To Apply

17 top data analysis techniques by datapine

Now that we've answered the questions “what is data analysis?” and “why is it important?”, and covered the different data analysis types, it's time to dig deeper into how to perform your analysis by working through these 17 essential techniques.

1. Collaborate on your needs

Before you begin analyzing or drilling down into any techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.

2. Establish your questions

Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.

To ensure your data works for you, you have to ask the right data analysis questions.

3. Data democratization

After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.

Data democratization is an action that aims to connect data from various sources efficiently and quickly so that anyone in your organization can access it at any given moment. You can extract data in text, images, videos, numbers, or any other format, and then perform cross-database analysis to achieve more advanced insights to share with the rest of the company interactively.

Once you have decided on your most valuable sources, you need to take all of this into a structured format to start collecting your insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them at your will. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.

data connectors from datapine

4. Think of governance 

When collecting data in a business or research context, you always need to think about security and privacy. With data breaches becoming a topic of concern for businesses, the need to protect your clients' or subjects' sensitive information becomes critical.

To ensure that all this is taken care of, you need to think of a data governance strategy. According to Gartner, this concept refers to “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It ensures that clear roles are in place for who can access the information and how they can access it. In time, this not only ensures that sensitive information is protected but also allows for efficient analysis as a whole.

5. Clean your data

After harvesting from so many sources you will be left with a vast amount of information that can be overwhelming to deal with. At the same time, you can be faced with incorrect data that can be misleading to your analysis. The smartest thing you can do to avoid dealing with this in the future is to clean the data. This is fundamental before visualizing it, as it will ensure that the insights you extract from it are correct.

There are many things that you need to look for in the cleaning process. The most important one is to eliminate any duplicate observations; these usually appear when using multiple internal and external sources of information. You can also add any missing codes, fix empty fields, and eliminate incorrectly formatted data.

Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors. 
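
Most of these routine cleaning steps can be automated. Here is a minimal pandas sketch that removes duplicates, drops rows with missing key fields, trims stray whitespace, and fixes formatting; the raw table is invented to show the typical problems.

```python
import pandas as pd

# Hypothetical raw export with the usual problems: duplicates, stray spaces, bad types
raw = pd.DataFrame({
    "customer": [" Anna ", "Ben", "Ben", "carla", None],
    "country":  ["DE", "us", "us", "ES", "FR"],
    "revenue":  ["100", "250", "250", "  80", "40"],
})

clean = (
    raw.drop_duplicates()                 # remove duplicate records
       .dropna(subset=["customer"])       # drop rows with empty key fields
       .assign(
           customer=lambda d: d["customer"].str.strip().str.title(),  # trim and normalize names
           country=lambda d: d["country"].str.upper(),                # consistent country codes
           revenue=lambda d: pd.to_numeric(d["revenue"]),             # fix formatting to numbers
       )
)
print(clean)
```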

Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.

6. Set your KPIs

Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

KPIs are critical to both qualitative and quantitative analysis research. This is one of the primary methods of data analysis you certainly shouldn’t overlook.

To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI: transportation-related costs. If you want to see more, explore our collection of key performance indicator examples.

Transportation costs logistics KPIs

7. Omit useless data

Having given your data analysis tools and techniques a true purpose and defined your mission, you should explore the raw data you've collected from all sources and use your KPIs as a reference for chopping out any information you deem to be useless.

Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.

8. Build a data management roadmap

While, at this point, this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data governance roadmap will help your data analysis methods and techniques become successful on a more sustainable basis. These roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.

Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – one of the most powerful types of data analysis methods available today.

9. Integrate technology

There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard. A data methodology you can count on.

By integrating the right technology within your data analysis methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

For a look at the power of software for the purpose of analysis and to enhance your methods of analyzing, glance over our selection of dashboard examples.

10. Answer your questions

By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.

11. Visualize your data

Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.

The purpose of analyzing is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard.

An executive dashboard example showcasing high-level marketing KPIs such as cost per lead, MQL, SQL, and cost per customer.

This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMO) an overview of relevant metrics to help them understand if they achieved their monthly goals.

In detail, this example generated with a modern dashboard creator displays interactive charts for monthly revenues, costs, net income, and net income per customer; all of them are compared with the previous month so that you can understand how the data fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to visualize the whole picture and extract relevant insights or trends for your marketing reports.

The CMO dashboard is perfect for c-level management as it can help them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can benefit the company exponentially.

12. Be careful with the interpretation

We already dedicated an entire post to data interpretation as it is a fundamental part of the process of data analysis. It gives meaning to the analytical information and aims to drive a concise conclusion from the analysis results. Since most of the time companies are dealing with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations. 

To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:

  • Correlation vs. causation: The human brain is wired to find patterns. This behavior leads to one of the most common mistakes when performing interpretation: confusing correlation with causation. Although these two aspects can exist simultaneously, it is not correct to assume that because two things happened together, one provoked the other. A piece of advice to avoid falling into this mistake is to never trust intuition alone; trust the data. If there is no objective evidence of causation, then always stick to correlation.
  • Confirmation bias: This phenomenon describes the tendency to select and interpret only the data necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's not done on purpose, confirmation bias can represent a real problem, as excluding relevant information can lead to false conclusions and, therefore, bad business decisions. To avoid it, always try to disprove your hypothesis instead of proving it, share your analysis with other team members, and avoid drawing any conclusions before the entire analytical project is finalized.
  • Statistical significance: Put simply, statistical significance helps analysts understand whether a result is actually reliable or whether it happened because of a sampling error or pure chance. The level of statistical significance needed may depend on the sample size and the industry being analyzed. In any case, ignoring the significance of a result when it might influence decision-making can be a huge mistake. The short sketch after this list shows how a correlation and its significance can be checked in practice.
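
Here is a minimal sketch with SciPy that computes a correlation coefficient and its p-value on synthetic data. The numbers are invented; the point is simply that the p-value quantifies how likely a correlation of this size would be under pure chance, and that even a significant correlation says nothing about causation.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic example: two weakly related metrics observed over 30 days
rng = np.random.default_rng(4)
ad_spend = rng.normal(100, 15, 30)
sales = 0.3 * ad_spend + rng.normal(0, 20, 30)

r, p_value = pearsonr(ad_spend, sales)
print(f"correlation r = {r:.2f}, p-value = {p_value:.3f}")
# A small p-value suggests the correlation is unlikely to be pure chance,
# but even a significant correlation is not evidence of causation.
```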

13. Build a narrative

Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.

The human brain responds incredibly well to strong stories or narratives. Once you’ve cleansed, shaped, and visualized your most valuable data using various BI dashboard tools, you should strive to tell a story - one with a clear-cut beginning, middle, and end.

By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.

14. Consider autonomous technology

Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.

Gartner has predicted that 80% of emerging technologies will be developed with AI foundations. This is a testament to the ever-growing power and value of autonomous technologies.

At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.

15. Share the load

If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.

Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, no matter if you need to monitor recruitment metrics or generate reports that need to be sent across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.

Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.

16. Data analysis tools

In order to perform high-quality data analysis, it is fundamental to use tools and software that will ensure the best results. Here is a brief summary of four fundamental categories of data analysis tools for your organization.

  • Business Intelligence: BI tools allow you to process significant amounts of data from several sources in any format. With them, you can not only analyze and monitor your data to extract relevant insights but also create interactive reports and dashboards to visualize your KPIs and put them to work for your company. datapine is an online BI software focused on delivering powerful analysis features that are accessible to both beginner and advanced users. As such, it offers a full-service solution that includes cutting-edge data analysis, KPI visualization, live dashboards, reporting, and artificial intelligence technologies to predict trends and minimize risk.
  • Statistical analysis: These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool for this type of analysis is RStudio, as it offers powerful data modeling and hypothesis testing features that cover both academic and general data analysis. It is one of the industry favorites thanks to its capabilities for data cleaning, data reduction, and advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach, it can help your company find relevant insights to drive better decisions. SPSS also works as a cloud service that enables you to run it anywhere.
  • SQL Consoles: SQL is a programming language used to handle structured data in relational databases. SQL tools are popular among data scientists because they are extremely effective at unlocking the value of these databases. One of the most widely used SQL tools on the market is MySQL Workbench. It offers several features, such as a visual tool for database modeling and monitoring, SQL optimization, administration tools, and visual performance dashboards to keep track of KPIs.
  • Data Visualization: These tools represent your data through charts, graphs, and maps, helping you find patterns and trends in the data. datapine's already mentioned BI platform also offers a wealth of powerful online data visualization tools with several benefits, including compelling data-driven presentations to share with your entire company, access to your data from any device wherever you are, an interactive dashboard design feature to showcase your results in an understandable way, and online self-service reports that several people can work on simultaneously to enhance team productivity. A minimal open-source sketch of this kind of chart-building follows this list.
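
The platforms above are full products; as a minimal open-source sketch of the chart-building idea referenced at the end of the list (all figures are hypothetical), pandas and matplotlib can produce a simple KPI view:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly marketing KPIs
kpis = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"],
     "leads": [410, 465, 512, 480],
     "cost_per_lead": [32.0, 29.5, 27.8, 30.1]}
).set_index("month")

fig, ax_leads = plt.subplots(figsize=(6, 3))
ax_leads.bar(kpis.index, kpis["leads"], color="lightsteelblue")
ax_leads.set_ylabel("Leads")

# Second y-axis so the cost metric is visible alongside lead volume
ax_cost = ax_leads.twinx()
ax_cost.plot(kpis.index, kpis["cost_per_lead"], color="darkred", marker="o")
ax_cost.set_ylabel("Cost per lead ($)")

fig.suptitle("Marketing KPIs by month (illustrative data)")
fig.tight_layout()
plt.show()
```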

17. Refine your process constantly 

Last is a step that might seem obvious to some people, but it can be easily ignored if you think you are done. Once you have extracted the needed results, you should always take a retrospective look at your project and think about what you can improve. As you saw throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving. 

Quality Criteria For Data Analysis

So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of established scientific quality criteria. Here we move into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these criteria in a business context, as they will allow you to assess the quality of your results in the correct way. Let’s dig in.

  • Internal validity: The results of a survey are internally valid if they measure what they are supposed to measure and thus provide credible results. In other words, internal validity reflects the trustworthiness of the results and how they can be affected by factors such as the research design, operational definitions, how the variables are measured, and more. For instance, imagine you are conducting interviews to ask people whether they brush their teeth twice a day. While most of them will answer yes, you may notice that their answers simply correspond to what is socially acceptable, which is to brush your teeth at least twice a day. In this case, you can’t be 100% sure whether respondents actually brush their teeth twice a day or whether they just say that they do; therefore, the internal validity of this interview is very low.
  • External validity: Essentially, external validity refers to the extent to which the results of your research can be applied to a broader context. It basically aims to prove that the findings of a study can be applied in the real world. If the research can be applied to other settings, individuals, and times, then the external validity is high. 
  • Reliability: If your research is reliable, it means that it can be reproduced. If your measurement were repeated under the same conditions, it would produce similar results. In other words, your measuring instrument produces consistent results. For example, imagine a doctor building a symptoms questionnaire to detect a specific disease in a patient. Then, various other doctors use this questionnaire but end up diagnosing the same patient with a different condition. This means the questionnaire is not reliable in detecting the initial disease. Another important note here is that in order for your research to be reliable, it also needs to be objective. If the results of a study are the same regardless of who assesses or interprets them, the study can be considered reliable. Let’s look at the objectivity criterion in more detail now.
  • Objectivity: In data science, objectivity means that researchers need to stay fully objective in their analysis. The results of a study need to be driven by objective criteria and not by the beliefs, personality, or values of the researcher. Objectivity needs to be ensured when you are gathering the data; for example, when interviewing individuals, the questions need to be asked in a way that doesn't influence the results. Paired with this, objectivity also needs to be considered when interpreting the data. If different researchers reach the same conclusions, the study is objective. For this last point, you can set predefined criteria for interpreting the results to ensure all researchers follow the same steps. A sketch of one common way to quantify agreement between researchers follows this list.
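
One common way to put the reliability and objectivity criteria into numbers is to measure how strongly two researchers agree when coding the same material. The sketch below uses Cohen's kappa from scikit-learn on invented coding labels; the codes and excerpts are purely illustrative:

```python
from sklearn.metrics import cohen_kappa_score

# Invented codes assigned to the same ten interview excerpts by two researchers
rater_1 = ["barrier", "facilitator", "barrier", "neutral", "barrier",
           "facilitator", "neutral", "barrier", "facilitator", "barrier"]
rater_2 = ["barrier", "facilitator", "neutral", "neutral", "barrier",
           "facilitator", "neutral", "barrier", "barrier", "barrier"]

# Kappa corrects raw agreement for agreement expected by chance;
# values near 1 suggest the coding barely depends on who performed it.
kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
```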

The quality criteria discussed above mostly cover potential influences in a quantitative context. Analysis in qualitative research, by default, involves additional subjective influences that must be controlled in a different way. Therefore, other quality criteria apply to this kind of research, such as credibility, transferability, dependability, and confirmability. You can see each of them in more detail in this resource.

Data Analysis Limitations & Barriers

Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization, it doesn't come without limitations. In this section, we discuss some of the main barriers you might encounter when conducting an analysis.

  • Lack of clear goals: No matter how good your data or analysis might be, if you don’t have clear goals or a hypothesis, the process might be worthless. While we mentioned some methods that don’t require a predefined hypothesis, it is always better to enter the analytical process with clear expectations of what you want to get out of it, especially in a business context in which data is used to support important strategic decisions.
  • Objectivity: Arguably one of the biggest barriers when it comes to data analysis in research is to stay objective. When trying to prove a hypothesis, researchers might find themselves, intentionally or unintentionally, directing the results toward an outcome that they want. To avoid this, always question your assumptions and avoid confusing facts with opinions. You can also show your findings to a research partner or external person to confirm that your results are objective. 
  • Data representation: A fundamental part of the analytical procedure is the way you represent your data. You can use various graphs and charts to represent your findings, but not all of them work for all purposes. Choosing the wrong visual can not only damage your analysis but also mislead your audience; therefore, it is important to understand when to use each type of visual depending on your analytical goals. Our complete guide on the types of graphs and charts lists 20 different visuals with examples of when to use them.
  • Flawed correlation: Misleading statistics can significantly damage your research. We’ve already pointed out a few interpretation issues earlier in the post, but this barrier is important enough to address here as well. Flawed correlations occur when two variables appear related to each other but are not. Confusing correlation with causation leads to a wrong interpretation of results, which in turn can produce flawed strategies and wasted resources, so it is very important to recognize and avoid these interpretation mistakes.
  • Sample size: A very common barrier to a reliable and efficient analysis process is the sample size. In order for the results to be trustworthy, the sample size should be representative of what you are analyzing. For example, imagine you have a company of 1,000 employees and you ask the question “do you like working here?” to 50 employees, of whom 48 say yes, which is 96%. Now, imagine you ask the same question to all 1,000 employees and 960 say yes, which is also 96%. Claiming that 96% of employees like working at the company when the sample size was only 50 is not a representative or trustworthy conclusion; the results are far more precise when a larger sample is surveyed. The confidence-interval sketch after this list makes this difference concrete.
  • Privacy concerns: In some cases, data collection is subject to privacy regulations. Businesses gather all kinds of information from their customers, from purchasing behaviors to addresses and phone numbers. If this falls into the wrong hands due to a breach, it can affect the security and confidentiality of your clients. To avoid this issue, collect only the data that is needed for your research and, if you are using sensitive facts, anonymize them so customers are protected. The misuse of customer data can severely damage a business's reputation, so it is important to keep an eye on privacy.
  • Lack of communication between teams: When it comes to performing data analysis on a business level, it is very likely that each department and team will have different goals and strategies. However, they are all working for the same common goal of helping the business run smoothly and keep growing. When teams are not connected and communicating with each other, it can directly affect the way general strategies are built. To avoid these issues, tools such as data dashboards enable teams to stay connected through data in a visually appealing way.
  • Innumeracy: Businesses are working with data more and more every day. While there are many BI tools available to perform effective analysis, data literacy is still a constant barrier. Not all employees know how to apply analysis techniques or extract insights from them. To prevent this from happening, you can implement different training opportunities that will prepare every relevant user to deal with data.
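
To make the sample-size point concrete, here is a minimal sketch that estimates a rough 95% margin of error for the two hypothetical surveys described above, using the normal approximation (a Wilson interval would be more appropriate for proportions this close to 1, so treat the numbers as indicative):

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Rough 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The employee-survey scenario from the list above (hypothetical numbers)
for n, yes in [(50, 48), (1000, 960)]:
    p_hat = yes / n
    print(f"n={n:4d}: {p_hat:.0%} said yes, roughly ±{margin_of_error(p_hat, n):.1%}")
```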

Key Data Analysis Skills

As you've learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skill. That said, thanks to the rise of self-service tools, the process is far more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data; we list the most important ones below.

  • Critical and statistical thinking: To successfully analyze data you need to be creative and think outside the box. That might sound like a strange statement considering that data is often tied to facts. However, a great deal of critical thinking is required to uncover connections, come up with a valuable hypothesis, and draw conclusions that go beyond the surface. This, of course, needs to be complemented by statistical thinking and an understanding of numbers.
  • Data cleaning: Anyone who has ever worked with data will tell you that cleaning and preparation account for around 80% of a data analyst's work, so this skill is fundamental. Beyond that, failing to clean the data adequately can significantly damage the analysis and lead to poor decision-making in a business scenario. While there are multiple tools that automate parts of the cleaning process and reduce the possibility of human error, it is still a valuable skill to master; a minimal cleaning sketch follows this list.
  • Data visualization: Visuals make the information easier to understand and analyze, not only for professional users but especially for non-technical ones. Having the necessary skills to not only choose the right chart type but know when to apply it correctly is key. This also means being able to design visually compelling charts that make the data exploration process more efficient. 
  • SQL: The Structured Query Language or SQL is a programming language used to communicate with databases. It is fundamental knowledge as it enables you to update, manipulate, and organize data from relational databases which are the most common databases used by companies. It is fairly easy to learn and one of the most valuable skills when it comes to data analysis. 
  • Communication skills: This is a skill that is especially valuable in a business environment. Being able to clearly communicate analytical outcomes to colleagues is incredibly important, especially when the information you are trying to convey is complex for non-technical people. This applies to in-person communication as well as written format, for example, when generating a dashboard or report. While this might be considered a “soft” skill compared to the other ones we mentioned, it should not be ignored as you most likely will need to share analytical findings with others no matter the context. 
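
As a minimal sketch of the cleaning steps mentioned in the data cleaning skill above, the snippet below tidies a small, invented customer table with pandas; the column names and values are hypothetical:

```python
import pandas as pd

# Invented raw export with typical quality problems: stray whitespace,
# a missing name, an exact duplicate row, and unparseable values
raw = pd.DataFrame({
    "customer": [" Acme ", "Beta Co", "Beta Co", "Gamma", None],
    "signup_date": ["2024-01-05", "2024-02-11", "2024-02-11", "not a date", "2024-03-01"],
    "monthly_spend": ["120", "85.5", "85.5", "", "240"],
})

clean = (
    raw.assign(customer=raw["customer"].str.strip())  # trim stray whitespace
       .dropna(subset=["customer"])                   # drop rows with no customer name
       .drop_duplicates()                             # remove exact duplicate rows
)

# Coerce types; values that cannot be parsed become NaT/NaN instead of staying as text
clean["signup_date"] = pd.to_datetime(clean["signup_date"], errors="coerce")
clean["monthly_spend"] = pd.to_numeric(clean["monthly_spend"], errors="coerce")

print(clean)
```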

Data Analysis In The Big Data Environment

Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

To inspire your efforts and put the importance of big data into context, here are some insights that you should know:

  • By 2026 the industry of big data is expected to be worth approximately $273.4 billion.
  • 94% of enterprises say that analyzing data is important for their growth and digital transformation. 
  • Companies that exploit the full potential of their data can increase their operating margins by 60% .
  • We have already discussed the benefits of artificial intelligence throughout this article. The industry's financial impact is expected to grow to $40 billion by 2025.

Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.

Key Takeaways From Data Analysis 

As we reach the end of our data analysis journey, here is a brief summary of the main methods and techniques for performing excellent analysis and growing your business.

17 Essential Types of Data Analysis Methods:

  • Cluster analysis
  • Cohort analysis
  • Regression analysis
  • Factor analysis
  • Neural Networks
  • Data Mining
  • Text analysis
  • Time series analysis
  • Decision trees
  • Conjoint analysis 
  • Correspondence Analysis
  • Multidimensional Scaling 
  • Content analysis 
  • Thematic analysis
  • Narrative analysis 
  • Grounded theory analysis
  • Discourse analysis 

Top 17 Data Analysis Techniques:

  • Collaborate your needs
  • Establish your questions
  • Data democratization
  • Think of data governance 
  • Clean your data
  • Set your KPIs
  • Omit useless data
  • Build a data management roadmap
  • Integrate technology
  • Answer your questions
  • Visualize your data
  • Interpretation of data
  • Consider autonomous technology
  • Build a narrative
  • Share the load
  • Data Analysis tools
  • Refine your process constantly 

We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and making your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.

Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting .

And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial .


Qualitative Research in Healthcare: Data Analysis

Hyeran Jung and colleagues, J Prev Med Public Health 2023;56(2), PMC10111102

Affiliations: 1 Department of Preventive Medicine, Ulsan University Hospital, University of Ulsan College of Medicine, Ulsan, Korea; 2 Ulsan Metropolitan City Public Health Policy’s Institute, Ulsan, Korea; 3 Department of Preventive Medicine, University of Ulsan College of Medicine, Seoul, Korea

Qualitative research methodology has been applied with increasing frequency in various fields, including in healthcare research, where quantitative research methodology has traditionally dominated, with an empirically driven approach involving statistical analysis. Drawing upon artifacts and verbal data collected from in-depth interviews or participatory observations, qualitative research examines the comprehensive experiences of research participants who have experienced salient yet unappreciated phenomena. In this study, we review 6 representative qualitative research methodologies in terms of their characteristics and analysis methods: consensual qualitative research, phenomenological research, qualitative case study, grounded theory, photovoice, and content analysis. We mainly focus on specific aspects of data analysis and the description of results, while also providing a brief overview of each methodology’s philosophical background. Furthermore, since quantitative researchers have criticized qualitative research methodology for its perceived lack of validity, we examine various validation methods of qualitative research. This review article intends to assist researchers in employing an ideal qualitative research methodology and in reviewing and evaluating qualitative research with proper standards and criteria.

INTRODUCTION

Researchers should select the research methodology best suited for their study. Quantitative research, which is based on empiricism and positivism, has long been the mainstream research methodology in most scientific fields. In recent years, however, increasing attempts have been made to use qualitative research methodology in various research fields, either combined with quantitative research methodology or as a stand-alone research method. Unlike quantitative research, which performs statistical analyses using the results derived in numerical form through investigations or experiments, qualitative research uses various qualitative analysis methods based on verbal data obtained through participatory observations or in-depth interviews. Qualitative research is advantageous when researching topics that involve research participants’ in-depth experiences and perceptions, topics that are important but have not yet drawn sufficient attention, and topics that should be reviewed from a new perspective.

However, qualitative research remains relatively rare in healthcare research, with quantitative research still predominating as the mainstream research practice [ 1 ]. Consequently, there is a lack of understanding of qualitative research, its characteristics, and its procedures in healthcare research. The low level of awareness of qualitative research can lead to the denigration of its results. Therefore, it is essential not only for researchers conducting qualitative research to have a correct understanding of various qualitative research methods, but also for peer researchers who review research proposals, reports, and papers to properly understand the procedures and advantages/disadvantages of qualitative research.

In our previous review paper, we explored the characteristics of qualitative research in comparison to quantitative research and its usefulness in healthcare research [ 2 ]. Specifically, we conducted an in-depth review of the general qualitative research process, selection of research topics and problems, selection of theoretical frameworks and methods, literature analysis, and selection of research participants and data collection methods [ 2 ]. This review article is dedicated to data analysis and the description of results, which may be considered the core of qualitative research, in different qualitative research methods in greater detail, along with the criteria for evaluating the validity of qualitative research. This review article is expected to offer insights into selecting and implementing the qualitative research methodology best suited for a given research topic and evaluating the quality of research.

IN-DEPTH REVIEW OF QUALITATIVE RESEARCH METHODS

This section is devoted to the in-depth review of 6 qualitative research methodologies (consensual qualitative research, phenomenological research, qualitative case study, grounded theory, photovoice, and content analysis), focusing on their characteristics and concrete analysis processes. Table 1 summarizes the characteristics of each methodology.

Table 1. Characteristics and key analytical approaches of each qualitative research methodology

Consensual Qualitative Research

Consensual qualitative research (CQR) was developed by Professor Clara Hill of the University of Maryland [ 3 ]. It emphasizes consensus within a research team (or analysis team) to address the problem of low objectivity being likely to occur when conducting qualitative research. This method seeks to maintain scientific rigor by deriving analysis results through team consensus, asserting the importance of ethical issues, trust, and the role of culture. In CQR, researchers are required to verify each conclusion whenever it is drawn by checking it against the original data.

Building a solid research team is the first step in conducting CQR. Most importantly, each team member should have resolute initiative and clear motivations for joining the research team. In general, at least 3 main team members are needed for data analysis, with 1 or 2 advisors (or auditors) reviewing their work. Researchers without experience in CQR should first receive prior education and training on its procedures and then team up with team members experienced in CQR. Furthermore, as is the case with other types of qualitative research, CQR attaches great importance to ensuring the objectivity of research by sharing prejudices, pre-understanding, and expectations of the research topic among the team members.

CQR is performed in 4 sequential steps: the initial stage, intra-case analysis stage, cross-analysis stage, and manuscript writing stage [ 4 ]. First, in the initial stage, the pre-formed team of researchers selects a research topic, performs a literature review, develops an interview guideline, and conducts pilot interviews. Research participants who fit the research topic are recruited using inclusion and exclusion criteria for selecting suitable participants. Then, interviews are conducted according to the interview guideline, recorded, and transcribed. The transcripts are sent to the interviewees for review. During this process, researchers could make slight modifications to explore the research topic better.

Second, in the intra-case analysis stage, domains and subdomains are developed based on the initial interview guideline. The initial domains and subdomains are used to analyze 1 or 2 interviews, and afterward, the domains and subdomains are modified to reflect the analysis results. Core ideas are also created through interview analysis and are coded into domains and subdomains. The advisors review the domains, subdomains, and core ideas and provide suggestions for improvement. The remaining interviews are then analyzed according to the revised domains, subdomains, and core ideas.

Third, in the cross-analysis stage, the core ideas from the interview analysis are categorized according to the domains and subdomains. In this process, repeated team discussions are encouraged to revise domains and subdomains and place the core ideas that do not lend themselves well to categorization into a miscellaneous category. The frequency of occurrence of each domain is then calculated for each interview case. In general, a domain is classified as a general category when it appears in all cases, a typical category when it appears in more than half of the cases, and a variant category when it appears in fewer than half of the cases [ 5 ]. However, the criteria for frequency counting may slightly differ from study to study. The advisors should also review the results of the cross-analysis stage, and the main analysis team revises the analysis results based on those comments.
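
Purely as an illustration of the frequency rule just described (and not as part of the CQR protocol itself), the classification can be written as a small function; the domain names and counts below are invented, and individual studies may use slightly different cut-offs:

```python
def classify_domain(cases_with_domain: int, total_cases: int) -> str:
    """Label a CQR domain by how often it appears across interview cases.
    The boundary case of exactly half is handled differently across studies;
    here it falls into the 'variant' category."""
    if cases_with_domain == total_cases:
        return "general"
    if cases_with_domain > total_cases / 2:
        return "typical"
    return "variant"

# Invented frequencies from a hypothetical 12-case study
domain_counts = {"coping strategies": 12, "family support": 8, "financial strain": 3}
for domain, count in domain_counts.items():
    print(f"{domain}: {classify_domain(count, 12)}")
```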

Fourth, the intra-case analysis and cross-analysis results are described in the manuscript writing stage. It is essential to present a clear and convincing narrative to the audience [ 5 ], and it is thus recommended to revise and formulate the manuscript based on team discussions and advisor opinions. However, CQR does not guarantee that different research teams would reach similar conclusions, and the CQR research team dynamics strongly affect conflict-resolution issues during the consensus-building process [ 3 ].

As examined above, despite its limitations, the salient feature of CQR is its rigorous process for ensuring the objectivity of analysis results compared to other qualitative research methods. In addition, it is an accessible method for quantitative researchers because it explains the analysis results in terms of the frequency of domain occurrences. CQR can be a suitable research methodology to persuade researchers who are hesitant to accept the results of qualitative research. Although CQR is still rarely used in healthcare research, some studies have applied it to investigate topics of interest [ 6 , 7 ].

Phenomenological Research

Phenomenological research (PR) is, as its name suggests, qualitative research based on the phenomenological principle. The term “phenomenological principle” is based on Husserlian phenomenology, which seeks the essence (inner core) and the meaning of people’s lived experiences [ 8 ]. According to Husserl, it is necessary to go “back to the things themselves” (in German: zurück zu den Sachen selbst ) and accurately explore the essence of experience. Diverse reflective attitudes based on the phenomenological principle are required to understand “ Sachen ” without expectations and prejudices [ 9 ]. Thus, the purpose of PR using Husserl’s phenomenological principle can be understood as an inquiry into the essence of experience.

The process of PR aiming to fulfill this purpose differs among various schools and scholars. The Husserlian, Heideggerian, and Utrecht schools had major impacts on PR [ 10 ]. Representative Husserlian scholars who further developed the PR process include Amedeo Giorgi and Paul Colaizzi. Giorgi, who pioneered the field of phenomenological psychology, collected data through in-depth interviews and divided the analysis process into 4 steps [ 11 ]. Colaizzi, who was one of Giorgi’s students, proposed a more complex process from data collection to analysis [ 12 , 13 ]. Representative Heideggerian scholars are Patricia Benner, who introduced an interpretive phenomenological qualitative research method to the field of nursing on the subject of clinical placement of nursing students but did not fully clarify its specific procedure [ 14 ], and Nancy Diekelmann [ 15 ] and Nancy Diekelmann and David Allen [ 16 ], who emphasized the role of the team in the analysis process and proposed the 7-step method of analysis. Max Van Manen, a Dutch-born Canadian scholar, is a representative Utrecht School scholar who proposed a 6-step data collection and analysis process and emphasized the importance of phenomenological description [ 8 ]. As a scholar with no affiliation with any specific school, Adrian Van Kaam [ 17 ], an existentialist psychologist, developed an experiential PR method using descriptive texts. Despite differences in data collection and analysis processes, the common denominator of these approaches is a fundamentally phenomenological attitude and the goal of exploring the essence of experience.

In general, the process of phenomenological qualitative analysis can be divided into 5 steps based on the phenomenological attitude [ 18 ]: step 1, reading the data repeatedly to get a sense of the whole and gauge the meanings of the data; step 2, categorizing and clustering the data by meaning unit; step 3, writing analytically by meaning unit in a descriptive, reflective, and hermeneutic manner; step 4, deriving essential factors and thematizing while writing; and step 5, deriving the essential experiential structure by identifying the relationships between essential experiential factors. During the entire process, researchers must embrace the attitudes of “reduction” and “imaginative variation.” The term “reduction” reflects the thought of accepting the meaning of experience in the way it manifests itself [ 19 ]. An attitude of phenomenological reduction is required to recover freshness and curiosity about the research object through non-judgment, bracketing, and epoché, which help minimize the effects of researchers’ preconceptions about the research topic during the analysis process. An attitude of imaginative variation is required to diversify the meanings pertaining to data and view them as diametric opposites.

As described above, PR is characterized more by emphasizing the researcher’s constant reflection and interpretation/recording of the experience, seeking to explore its very essence, than by being conducted according to a concrete procedure. Based on these characteristics, PR in healthcare research has been applied to various topics, including research on the meaning of health behaviors such as drinking and smoking in various cultures since the 1970s [ 20 , 21 ], information and education needs of patients with diabetes [ 22 ], pain in cancer patients [ 23 ], and the experiences of healthcare students and professionals in patient safety activities [ 24 , 25 ].

Qualitative Case Study

Although case studies have long been conducted in various academic fields, in the 1980s [ 26 ], they began to be recognized as a qualitative research method with the case study publications by researchers such as Merriam [ 27 ], Stake [ 28 ], Yin [ 29 ], and Hays [ 30 ]. Case studies include both quantitative and qualitative strategies and can also be used with other qualitative research methods. In general, a qualitative case study (QCS) is a research method adopted to understand the complexity of a case, derive its meaning, and identify the process of change over time [ 27 ]. To achieve these goals, a QCS collects in-depth data using various information sources from rich contexts and explores one or more bounded systems [ 31 ].

A case, which is the core of a case study, has delimitation [ 28 ], contextuality [ 29 ], specificity [ 30 ], complexity [ 32 ], and newness [ 27 ]. The definition of a case study differs among scholars, but they agree that a case to be studied should have boundaries that distinguish it from other cases. Therefore, a case can be a person, a group, a program, or an event and can also be a single or complex case [ 28 ]. The types of QCSs are classified by the scale of the bounded system and the purpose of case analysis. From the latter perspective, Stake [ 28 ] divided case studies into intrinsic and instrumental case studies.

A QCS is conducted in 5 steps [ 33 ]. Stage 1 is the research design stage, where an overall plan is established for case selection, research question setting, research time and cost allocation, and the report format of research outcomes [ 28 ]. Yin [ 33 ] noted that 4 types of case studies could be designed based on the number of cases (single or multiple cases) and the number of analysis units (holistic design for a single unit or embedded design for multiple units). These types are called single holistic design, single embedded design, multiple holistic design, and multiple embedded design. Stage 2 is the preparation stage for data collection. The skills and qualifications required for the researcher are reviewed, prior training of researchers takes place, a protocol is developed, candidate cases are screened, and a pilot case study is conducted. Stage 3 is data collection. Data are collected from the data sources commonly used in case studies, such as documents, archival records, interviews, direct observations, participatory observations, and physical artifacts [ 33 ]. Other data sources for case studies include films, photos, videotapes, and life history studies [ 34 ]. The data collection period may vary depending on the research topic and the need for additional data collection during the analysis process. Stage 4 is the data analysis stage. The case is described in detail based on the collected data, and the data for concrete topics are analyzed [ 28 ]. With no prescribed method related to data collection and analysis for a case study, a general data analysis procedure is followed, and the choice of analysis method differs among researchers. In a multiple-case study, the meaning of the cases is interpreted by performing intra-case and inter-case analyses. The last stage is the interpretation stage, in which the researcher reports the meaning of the case—that is, the lessons learned from the case [ 35 ].

Compared to other qualitative research methods, QCSs have no prescribed procedure, which may prove challenging in the actual research process. However, when the researcher seeks an in-depth understanding of a bound system clearly distinguished from other cases, a QCS can be an appropriate approach. Based on the characteristics mentioned above, QCSs in healthcare research have been mainly conducted on unique cases or cases that should be known in detail, such as the experience of rare diseases [ 36 ], victims of medical malpractice [ 37 ], complications due to home birth [ 38 ], and post-stroke gender awareness of women of childbearing age [ 39 ].

Grounded Theory

Grounded theory (GT) is a research approach to gaining facts about an unfamiliar specific social phenomenon or a new understanding of a particular phenomenon [ 40 ]. GT involves the most systematic research process among all qualitative research methods [ 41 ]. Its most salient feature is generating a theory by collecting various data from research subjects and analyzing the relationship between the central phenomenon and each category through an elaborate analysis process. GT is adequate for understanding social and psychological structural phenomena regarding a specific object or social phenomenon, rather than framework or hypothesis testing [ 42 ].

GT was first introduced in 1967 by Strauss and Glaser. Their views subsequently diverged and each scholar separately developed different GT methods. Glaser’s GT focused on the natural emergence of categories and theories based on positivism [ 40 , 43 ]. Strauss, who was influenced by symbolic interactionism and pragmatism, teamed up with Corbin and systematically presented the techniques and procedures of the GT process [ 44 ]. Since then, various GT techniques have been developed [ 45 ]; Charmaz’s GT is based on constructivism [ 43 ].

Researchers using GT should collect data based on theoretical sampling and theoretical saturation. Theoretical sampling refers to selecting additional data using the theoretical concepts encountered in collecting and analyzing data, and theoretical saturation occurs when no new categories are expected to appear [ 40 ]. Researchers must also possess theoretical sensitivity—that is, the ability to react sensitively to the collected data and gain insight into them [ 40 ]. An analysis is performed through the constant comparative method, wherein researchers constantly compare the collected data and discover similarities and differences to understand the relationships between phenomena, concepts, and categories.

Among the different types of GT research designs, the one proposed by Strauss and Corbin is divided into 3 stages. Stage 1 is open coding; the concepts are derived from the data through a line-by-line data analysis, and the initial categorization occurs. Stage 2 is axial coding; the interrelationships among the categories derived from open coding are schematized in line with the structural framework defined as a paradigm. The major components of the paradigm are causal conditions, context, intervening conditions, action/interaction strategies, and consequences. Stage 3 is selective coding; the core category is first derived, the relationships between subcategories and concepts are identified, and the narrative outline is described. Lastly, the process is presented in a visual mode, whereupon a theoretical model is built and integrated. In contrast, Glaser’s analysis method involves theoretical coding that weaves practical concepts into hypotheses or theories instead of axial coding [ 46 ]. Currently, Strauss and Corbin’s GT method is the most widely used one [ 47 ], and given that different terms are used among scholars, it is crucial to accurately understand the meaning of a term in context instead of solely focusing on the term itself [ 48 ].

The most salient features of GT are that it seeks to generate a new theory from data based on the inductive principle through its analytical framework. This framework enables an understanding of the interaction experience and the structure of its performances [ 40 ]. Furthermore, the above-described characteristics of GT widen the pathway of quantitative researchers to apply GT more than other qualitative research methods [ 43 ], which has resulted in its broader application in healthcare research. GT has been used to explore a wide range of research topics, such as asthma patients’ experiences of disease management [ 48 ], the experiences of cancer patients or their families [ 49 , 50 ], and the experiences of caregivers of patients with cognitive disorders and dementia [ 51 ].

Photovoice

Photovoice, a research methodology initiated by Wang and Burris [ 52 ], has been used to highlight the experiences and perspectives of marginalized people using photos. In other words, photos and their narratives are at the heart of photovoice; this method is designed to make marginalized voices heard. Photovoice, which uses photos to bring to the fore the experiences of participants who have lived a marginalized life, requires the active engagement of the participants. In other research methods, the participants play an essential role in the data collection stage (interview, topic-related materials such as diary and doodle) and the research validation stage (participants’ review). In contrast, in photovoice research, which is classified as participatory action research, participants’ dynamic engagement is essential throughout the study process—from the data collection and analysis procedure to exhibition and policy development [ 53 ].

Specifically, the photovoice research design is as follows [ 54 , 55 ]: First, policymakers or community stakeholders, who will likely bring about practical improvements on the research topic, are recruited. Second, participants with a wealth of experience on a research topic are recruited. In this stage, it should be borne in mind that the drop-out rate is high because participants’ active involvement is required, and the process is relatively time-consuming. Third, the participants are provided with information on the purpose and process of photovoice research, and they are educated on research ethics and the potential risks. Fourth, consent is obtained from the participants for research participation and the use of their photos. Fifth, a brainstorming session is held to create a specific topic within the general research topic. Sixth, researchers select a type of camera and educate the participants on the camera and photo techniques. The characteristics of the camera function (e.g., autofocus and manual focus) should be considered when selecting a camera type (e.g., mobile phone camera, disposable camera, or digital camera). Seventh, participants are given time to take pictures for discussion. Eighth, a discussion is held on the photos provided by the participants. The collected data are managed and analyzed in 3 sub-steps: (1) participants’ photo selection (selecting a photo considered more meaningful or important than other photos); (2) contextualization (analyzing the selected photo and putting the meanings attached to the photo into context); and (3) codifying (categorizing similar photos and meanings among the data collected and summarizing them in writing). In sub-step 2, the “SHOWeD” question skill could be applied to facilitate the discussion [ 56 ]: “What do you See here? What’s really Happening here? How does this relate to Our lives? Why does this situation, concern, or strength Exist? What can we Do about it?” Ninth, the participants’ summarized experiences related to their respective photos are shared and presented. This process is significant because it provides the participants with an opportunity to exhibit their photos and improve the related topics’ conditions. It is recommended that policymakers or community stakeholders join the roundtable to reflect on the outcomes and discuss their potential involvement to improve the related topics.

Based on the characteristics described above, photovoice has been used in healthcare research since the early 2000s to reveal the experiences of marginalized people, such as the lives of Black lesbian, gay, bisexual, transgender and questioning people [ 57 ] and women with acquired immunodeficiency syndrome [ 58 ], and in studies on community health issues, such as the health status of indigenous women living in a remote community [ 59 ], the quality of life of breast cancer survivors living in rural areas [ 60 ], and healthy eating habits of rural youth [ 61 ].

Qualitative Content Analysis

Content analysis is a research method that can use both qualitative and quantitative methods to derive valid inferences from data [ 62 ]. It can use a wide range of data covering a long period and diverse fields [ 63 ]. It helps compare objects, identify a specific person’s characteristics or hidden intentions, or analyze a specific era’s characteristics [ 64 ]. Quantitative content analysis categorizes research data and analyzes the relationships between the derived categories using statistical methods [ 65 ]. In contrast, qualitative content analysis (QCA) uses data coding to identify categories’ extrinsic and intrinsic meanings. The parallelism of these aspects contributes to establishing the validity of conclusions in content analysis [ 63 ].

Historically, mass media, such as newspapers and news programs, played the role of the locomotive for the development of content analysis. As interest in mass media content dealing with particular events and issues increased, content analysis was increasingly used in research analyzing mass media. In particular, it was also used in various forms to analyze propaganda content during World War II. The subsequent emergence of computer technology led to the revival of various types of content analysis research [ 66 ].

QCA is largely divided into conventional, directed, and summative [ 67 ]. First, conventional content analysis is an inductive method for deriving categories from data without using perceived categories. Key concepts are derived via the coding process by repeatedly reading and analyzing the data collected through open-ended questions. Categorization is then performed by sorting the coded data while checking similarities and differences. Second, directed content analysis uses key concepts or categories extracted from existing theories or studies as the initial coding categories. Unlike conventional content analysis, directed content analysis is closer to a deductive method and is anchored in a more structured process. Summative content analysis, the third approach, not only counts the frequency of keywords or content, but also evaluates their contextual usage and provides qualitative interpretations. It is used to understand the context of a word, along with the frequency of its occurrence, and thus to find the range of meanings that a word can have.
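
As a minimal illustration of the counting step in summative content analysis (the contextual interpretation still rests with the researcher), keyword frequencies can be tallied as follows; the transcript fragment and keywords are invented:

```python
import re
from collections import Counter

# Invented transcript fragment and keywords of interest
transcript = (
    "I felt safe discussing my symptoms, but the safety procedures were unclear. "
    "The nurse explained the safety checklist, which made me feel safe again."
)
keywords = ["safe", "safety", "checklist"]

tokens = re.findall(r"[a-z']+", transcript.lower())
counts = Counter(tokens)
for keyword in keywords:
    print(f"{keyword!r}: {counts[keyword]} occurrence(s)")
```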

Since there is no concrete set procedure, the content analysis procedure varies among researchers. Some of the typical processes are a 3-step process (preparation, organizing, reporting) proposed by Elo and Kyngäs [ 68 ], a 4-step process (formulating research questions, sampling, coding, analyzing) presented by White and Marsh [ 69 ], and a 6-step process proposed by Krippendorff [ 66 ].

The 6-step content analysis research process proposed by Krippendorff [ 66 ] is as follows: Step 1, unitizing, is a process in which the researcher selects a scheme for classifying the data of interest for data collection and analysis. Step 2, sampling, involves selecting a conceptually representative sample population. In Step 3, recording/coding, the researcher records materials that are difficult to preserve, such as verbal statements, in a way that allows repeated review. Step 4, reducing, refers to simplifying the data into a manageable format using statistical techniques or summaries. Step 5, abductively inferring, involves inferring a phenomenon in the context of a situation to understand the contextual phenomenon while analyzing the data. In Step 6, narrating, the research outcomes are presented in a narrative accessible to the audience. These 6 steps are not subject to a sequential order and may go through a cyclical or iterative process [ 63 ].

As examined above, content analysis is used in several fields due to its advantages of embracing both qualitative and quantitative aspects and processing comprehensive data [ 62 , 70 ]. In recognition of its research potential, the public health field is also increasingly using content analysis research, as exemplified by suicide-related social media content analysis [ 71 ], an analysis of children’s books in association with breast cancer [ 72 ], and an analysis of patients’ medical records [ 73 ].

VALIDATION OF QUALITATIVE RESEARCH

The validation of qualitative research begins when a researcher attempts to persuade others that the research results are worthy of attention [ 35 ]. Several researchers have advanced their arguments in many different ways, from the reason or justification for existence of the validity used in qualitative research to the assessment terms and their meanings [ 74 ]. We explain the validity of qualitative research, focusing on the argument advanced by Guba and Lincoln [ 75 ]. They emphasized that the evaluation of qualitative research is a socio-political process—namely, a researcher should assume the role of a mediator of the judgment process, not that of the judge [ 75 ]. Specifically, Lincoln and Guba [ 75 ] proposed trustworthiness as a validity criterion: credibility, transferability, dependability, and confirmability.

First, credibility is a concept that corresponds to internal validity in quantitative research. To enhance the credibility of qualitative research, a “member check” is used to directly assess whether the reality of the research participants is well-reflected in the raw data, transcripts, and analysis categories [ 76 , 77 ]. Second, transferability corresponds to external validity or generalizability in quantitative research. To enhance the transferability of qualitative research, researchers must describe the data collection and analysis processes in detail and provide thick data on the overall research process, including research participants and the context and culture of research [ 77 , 78 ]. Transferability can also be enhanced by checking whether the analysis results elicit similar feelings in those who have not participated in the study but share similar experiences. Third, dependability corresponds to reliability in quantitative research and is associated with data stability. To enhance the trustworthiness of qualitative research, it is common for multiple researchers to perform the analysis independently; alternatively, or if one researcher has performed the analysis, another researcher reviews the analysis results. Furthermore, a qualitative researcher must provide a detailed and transparent description of the entire research process so that other researchers, internal or external, can evaluate whether the researcher has adequately proceeded with the overall research process. Fourth, confirmability corresponds to objectivity in quantitative research. Bracketing, a process of disclosing and discussing the researcher’s pre-understanding that may affect the research process from the beginning to the end, is conducted to enhance the confirmability of qualitative research. The results of bracketing should be included in the study results so that readers can also track the possible influence [ 77 ].

However, regarding the validity of a qualitative study, it is necessary to consider the research topic, the target audience, and research costs. Caution should also be applied to the proposed theories because presentation methods vary among scholars and researchers. Apart from the methods discussed above, other methods are used to enhance the validity of qualitative research methods, such as prolonged involvement, persistent observation, triangulation, and peer debriefing. In prolonged involvement, a researcher depicts the core of a phenomenon while staying at the study site for a sufficient time to build rapport with the participants and pose a sufficient amount of questions. In persistent observation, a researcher repeatedly reviews and observes data resources until the factors closest to the research topic are identified, giving depth to the study. Triangulation is used to check whether the same results are drawn by a team of researchers who conduct a study using various resources, including individual interviews, talks, and field notes, and discuss their respective analysis processes and results. Lastly, in peer debriefing, research results are discussed with colleagues who have not participated in the study from the beginning to the end, but are well-informed about the research topic or phenomenon [ 76 , 78 ].

This review article examines the characteristics and analysis processes of 6 different qualitative research methodologies. Additionally, a detailed overview of various validation methods for qualitative research is provided. However, a few limitations should be considered when novice qualitative researchers follow the steps in this article. First, as each qualitative research methodology has extensive and unique research approaches and analysis procedures, it should be kept in mind that the priority of this article was to highlight each methodology’s most exclusive elements that essentially comprise the core of its identity. Its scope unfortunately does not include the inch-by-inch steps of individual methodologies—for this information, it would be necessary to review the references included in the section dedicated to each methodology. Another limitation is that this article does not concentrate on the direct comparison of each methodology, which might benefit novice researchers in the process of selecting an adequate methodology for their research topic. Instead, this review article emphasizes the advantages and limitations of each methodology. Nevertheless, this review article is expected to help researchers considering employing qualitative research methodologies in the field of healthcare select an optimal method and conduct a qualitative study properly. It is sincerely hoped that this review article, along with the previous one, will encourage many researchers in the healthcare domain to use qualitative research methodologies.

Ethics Statement

Approval from the institutional review board was not obtained as this study is a review article.

ACKNOWLEDGEMENTS

CONFLICT OF INTEREST

The authors have no conflicts of interest associated with the material presented in this paper.

AUTHOR CONTRIBUTIONS

Conceptualization: Ock M. Literature review: Im D, Pyo J, Lee H, Jung H, Ock M. Funding acquisition: None. Writing – original draft: Im D, Pyo J, Lee H, Jung H, Ock M. Writing – review & editing: Im D, Pyo J, Lee H, Jung H, Ock M.


Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is summarization and categorization, which together reduce the data and help find patterns and themes that are easy to identify and link. The third is the analysis itself, which researchers perform in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that data analysis and interpretation is the process of applying deductive and inductive logic to the collected research data.

Why analyze data in research?

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes data analysis tells the most unforeseen yet exciting stories that were not expected when initiating the analysis. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes something once a specific value has been assigned to it. For analysis, you need to organize, process, and present these values in a given context to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, and so on all come under this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; however, an item included in the categorical data cannot belong to more than one group. Example: a person responding to a survey by indicating their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data (a short sketch follows this list).
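To make the chi-square idea concrete, here is a minimal R sketch on an invented contingency table of marital status against smoking habit; the counts are made up purely for illustration, and with such tiny cell counts R will warn that the approximation may be inaccurate.

```r
# Hypothetical categorical responses: marital status vs. smoking habit
responses <- data.frame(
  marital_status = c("single", "married", "single", "married", "single",
                     "married", "single", "married", "single", "married"),
  smoker         = c("yes", "no", "no", "no", "yes",
                     "yes", "no", "no", "yes", "no")
)

# Cross-tabulate the two categorical variables
tab <- table(responses$marital_status, responses$smoker)
print(tab)

# Chi-square test of independence between the two groupings
# (expect a small-sample warning with counts this low)
chisq.test(tab)
```

A small p-value from chisq.test() would suggest the two groupings are not independent; with real survey data the cell counts would normally be much larger.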


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insights from such complex information is a challenging process; hence, it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
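A minimal sketch of this word-based approach, using invented responses; a real project would typically also remove stop words before counting.

```r
# Invented open-ended survey responses (for illustration only)
responses <- c(
  "Hunger and lack of food are the biggest problems",
  "Access to clean water and food is difficult",
  "Food prices keep rising and hunger is widespread"
)

# Lowercase, strip punctuation, and split into words
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", responses)), "\\s+"))

# Count and rank word frequencies to surface recurring terms such as "food"
sort(table(words), decreasing = TRUE)
```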


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique; it examines how one piece of text is similar to or different from another.

For example: to find out the “importance of having a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: This is a widely accepted and frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context in which the communication between the researcher and respondent takes place. Discourse analysis also takes the respondent’s lifestyle and day-to-day environment into account when deriving conclusions.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they may alter their explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that raw, nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four different stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interview, that the interviewer asked all the questions devised in the questionnaire. (A minimal completeness check is sketched after this list.)
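As a hedged illustration of the completeness stage, the sketch below counts unanswered questions per respondent in a small, invented survey table; the column names are hypothetical.

```r
# Hypothetical survey responses; NA marks an unanswered question
survey <- data.frame(
  respondent = 1:5,
  q1 = c(4, 5, NA, 3, 4),
  q2 = c(2, NA, NA, 5, 1),
  q3 = c(5, 4, 3, NA, 2)
)

# Completeness check: how many questions did each respondent leave unanswered?
survey$missing_answers <- rowSums(is.na(survey[, c("q1", "q2", "q3")]))

# Flag incomplete responses for follow-up or exclusion
subset(survey, missing_answers > 0)
```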

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They need to conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher will create age brackets to distinguish the respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
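A short sketch of this coding step, grouping a hypothetical column of respondent ages into brackets with base R's cut(); the break points are arbitrary.

```r
# Hypothetical respondent ages from a survey sample
ages <- c(18, 23, 31, 45, 52, 67, 29, 38, 74, 21)

# Code raw ages into brackets so responses can be analyzed per group
age_bracket <- cut(
  ages,
  breaks = c(17, 25, 35, 50, 65, Inf),
  labels = c("18-25", "26-35", "36-50", "51-65", "65+")
)

# Count respondents per bracket
table(age_bracket)
```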


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers can use different research and data analysis methods to derive meaningful insights. Statistical techniques are the most favored for analyzing numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods are classified into two groups: descriptive statistics, which are used to describe the data, and inferential statistics, which help compare and generalize from the data.

Descriptive statistics

This method is used to describe the basic features of the various types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond summarizing the data; the conclusions are still based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • These measures describe the central point of a distribution.
  • Researchers use this method when they want to showcase the most common or average response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • Variance and standard deviation capture how far observed scores fall from the mean.
  • These measures are used to identify the spread of scores by stating intervals.
  • Researchers use this method to showcase how spread out the data is and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count. (A short sketch covering all four families of measures follows this list.)
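The sketch below computes one example from each family of descriptive measures listed above on a small, invented set of scores.

```r
# Invented test scores for a small sample
scores <- c(56, 61, 65, 70, 70, 72, 75, 80, 84, 91)

# Measures of frequency
table(scores > 70)                              # count of scores above 70
mean(scores > 70) * 100                         # percent of scores above 70

# Measures of central tendency
mean(scores); median(scores)
as.numeric(names(which.max(table(scores))))     # mode (most frequent value)

# Measures of dispersion or variation
range(scores); var(scores); sd(scores)

# Measures of position
quantile(scores, probs = c(0.25, 0.5, 0.75))    # quartile ranks
rank(scores) / length(scores) * 100             # approximate percentile ranks
```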

For quantitative research, descriptive analysis often gives absolute numbers, but on its own it is never sufficient to demonstrate the rationale behind those numbers. Nevertheless, it is important to think about which method of research and data analysis best suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare the average votes cast in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games. (A small sketch follows this list.)
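A minimal sketch of both areas using the movie-theater example above: estimating the population proportion with a confidence interval, and testing a hypothesis about it. The numbers are invented.

```r
# Hypothetical sample: 100 moviegoers, 85 say they like the movie
liked <- 85
n     <- 100

# Estimating a parameter: population proportion with a 95% confidence interval
prop.test(liked, n)

# Hypothesis test: is the true proportion greater than 0.75?
prop.test(liked, n, p = 0.75, alternative = "greater")
```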

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns; a two-dimensional cross-tabulation makes data analysis and research seamless by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strength of the relationship between two variables, researchers rarely look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with multiple independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner.
  • Frequency tables: Frequency tables summarize how often each value or category occurs in the data, which makes it easy to spot dominant responses and compare groups before applying more formal tests.
  • Analysis of variance: This statistical procedure is used for testing the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar. (A combined sketch of these methods follows this list.)
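The sketch below runs each of these methods once on a small synthetic dataset; all variable names and values are invented for illustration.

```r
set.seed(42)

# Invented survey-style dataset
df <- data.frame(
  age_group = sample(c("18-30", "31-50", "51+"), 120, replace = TRUE),
  gender    = sample(c("female", "male"), 120, replace = TRUE),
  ad_spend  = runif(120, 1, 10),
  city      = sample(c("A", "B", "C"), 120, replace = TRUE)
)
df$sales <- 3 + 2.5 * df$ad_spend + rnorm(120, sd = 2)

# Correlation between two numeric variables
cor(df$ad_spend, df$sales)

# Cross-tabulation (contingency table) of age group by gender
xtabs(~ age_group + gender, data = df)

# Regression: impact of the independent variable on the dependent variable
summary(lm(sales ~ ad_spend, data = df))

# Analysis of variance: do mean sales differ across cities?
summary(aov(sales ~ city, data = df))
```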
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps design the survey questionnaire, select data collection methods, and choose samples.


  • The primary aim of research data analysis is to derive ultimate insights that are unbiased. Any mistake, or a biased mindset, in collecting data, selecting an analysis method, or choosing an audience sample will lead to a biased inference.
  • No amount of sophistication in research data analysis can rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity can mislead readers, so avoid this practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data altering, data mining , or developing graphical representation.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.



Data Analytics Case Study Guide 2023

by Enterprise DNA Experts | Data Analytics


Data analytics case studies reveal how businesses harness data for informed decisions and growth.

For aspiring data professionals, mastering the case study process will enhance your skills and increase your career prospects.

So, how do you approach a case study?

Use these steps to process a data analytics case study:

Understand the Problem: Grasp the core problem or question addressed in the case study.

Collect Relevant Data: Gather data from diverse sources, ensuring accuracy and completeness.

Apply Analytical Techniques: Use appropriate methods aligned with the problem statement.

Visualize Insights: Utilize visual aids to showcase patterns and key findings.

Derive Actionable Insights: Focus on deriving meaningful actions from the analysis.

This article will give you detailed steps to navigate a case study effectively and understand how it works in real-world situations.

By the end of the article, you will be better equipped to approach a data analytics case study, strengthening your analytical prowess and practical application skills.

Let’s dive in!


What is a Data Analytics Case Study?

A data analytics case study is a real or hypothetical scenario where analytics techniques are applied to solve a specific problem or explore a particular question.

It’s a practical approach that uses data analytics methods, assisting in deciphering data for meaningful insights. This structured method helps individuals or organizations make sense of data effectively.

Additionally, it’s a way to learn by doing, where there’s no single right or wrong answer in how you analyze the data.

So, what are the components of a case study?

Key Components of a Data Analytics Case Study

A data analytics case study comprises essential elements that structure the analytical journey:

Problem Context: A case study begins with a defined problem or question. It provides the context for the data analysis , setting the stage for exploration and investigation.

Data Collection and Sources: It involves gathering relevant data from various sources , ensuring data accuracy, completeness, and relevance to the problem at hand.

Analysis Techniques: Case studies employ different analytical methods, such as statistical analysis, machine learning algorithms, or visualization tools, to derive meaningful conclusions from the collected data.

Insights and Recommendations: The ultimate goal is to extract actionable insights from the analyzed data, offering recommendations or solutions that address the initial problem or question.

Now that you have a better understanding of what a data analytics case study is, let’s talk about why we need and use them.

Why Case Studies are Integral to Data Analytics

Case studies serve as invaluable tools in the realm of data analytics, offering multifaceted benefits that bolster an analyst’s proficiency and impact:

Real-Life Insights and Skill Enhancement: Examining case studies provides practical, real-life examples that expand knowledge and refine skills. These examples offer insights into diverse scenarios, aiding in a data analyst’s growth and expertise development.

Validation and Refinement of Analyses: Case studies demonstrate the effectiveness of data-driven decisions across industries, providing validation for analytical approaches. They showcase how organizations benefit from data analytics, which also helps in refining one’s own methodologies.

Showcasing Data Impact on Business Outcomes: These studies show how data analytics directly affects business results, like increasing revenue, reducing costs, or delivering other measurable advantages. Understanding these impacts helps articulate the value of data analytics to stakeholders and decision-makers.

Learning from Successes and Failures: By exploring a case study, analysts glean insights from others’ successes and failures, acquiring new strategies and best practices. This learning experience facilitates professional growth and the adoption of innovative approaches within their own data analytics work.

Including case studies in a data analyst’s toolkit helps gain more knowledge, improve skills, and understand how data analytics affects different industries.

Using these real-life examples boosts confidence and success, guiding analysts to make better and more impactful decisions in their organizations.

But not all case studies are the same.

Let’s talk about the different types.

Types of Data Analytics Case Studies

Data analytics encompasses various approaches tailored to different analytical goals:

Exploratory Case Study: These involve delving into new datasets to uncover hidden patterns and relationships, often without a predefined hypothesis. They aim to gain insights and generate hypotheses for further investigation.

Predictive Case Study: These utilize historical data to forecast future trends, behaviors, or outcomes. By applying predictive models, they help anticipate potential scenarios or developments.

Diagnostic Case Study: This type focuses on understanding the root causes or reasons behind specific events or trends observed in the data. It digs deep into the data to provide explanations for occurrences.

Prescriptive Case Study: This case study goes beyond analytics; it provides actionable recommendations or strategies derived from the analyzed data. They guide decision-making processes by suggesting optimal courses of action based on insights gained.

Each type has a specific role in using data to find important insights, helping in decision-making, and solving problems in various situations.

Regardless of the type of case study you encounter, here are some steps to help you process them.

Roadmap to Handling a Data Analysis Case Study

Embarking on a data analytics case study requires a systematic approach, step-by-step, to derive valuable insights effectively.

Here are the steps to help you through the process:

Step 1: Understanding the Case Study Context: Immerse yourself in the intricacies of the case study. Delve into the industry context, understanding its nuances, challenges, and opportunities.

Identify the central problem or question the study aims to address. Clarify the objectives and expected outcomes, ensuring a clear understanding before diving into data analytics.

Step 2: Data Collection and Validation: Gather data from diverse sources relevant to the case study. Prioritize accuracy, completeness, and reliability during data collection. Conduct thorough validation processes to rectify inconsistencies, ensuring high-quality and trustworthy data for subsequent analysis.


Step 3: Problem Definition and Scope: Define the problem statement precisely. Articulate the objectives and limitations that shape the scope of your analysis. Identify influential variables and constraints, providing a focused framework to guide your exploration.

Step 4: Exploratory Data Analysis (EDA): Leverage exploratory techniques to gain initial insights. Visualize data distributions, patterns, and correlations, fostering a deeper understanding of the dataset. These explorations serve as a foundation for more nuanced analysis.

Step 5: Data Preprocessing and Transformation: Cleanse and preprocess the data to eliminate noise, handle missing values, and ensure consistency. Transform data formats or scales as required, preparing the dataset for further analysis.
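A compact sketch of Steps 4 and 5 on a hypothetical orders dataset: a quick exploratory look, followed by label harmonization, median imputation of missing values, and a simple outlier flag. The column names and thresholds are assumptions.

```r
library(dplyr)

# Hypothetical case-study dataset with a few quality issues
orders <- data.frame(
  order_value = c(120, 95, NA, 400, 88, 102, 97, 3500, 110, NA),
  region      = c("north", "south", "south", "north", "NORTH",
                  "south", "north", "south", "north", "south")
)

# Step 4: exploratory look at the distribution and missingness
summary(orders$order_value)
hist(na.omit(orders$order_value), main = "Order value distribution", xlab = "value")

# Step 5: preprocessing - harmonize labels, impute missing values, flag outliers
orders_clean <- orders %>%
  mutate(
    region      = tolower(region),
    order_value = ifelse(is.na(order_value),
                         median(order_value, na.rm = TRUE), order_value),
    is_outlier  = order_value > quantile(order_value, 0.99)
  )

summary(orders_clean)
```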


Step 6: Data Modeling and Method Selection: Select analytical models aligning with the case study’s problem, employing statistical techniques, machine learning algorithms, or tailored predictive models.

In this phase, it’s important to develop data modeling skills. This helps create visuals of complex systems using organized data, which helps solve business problems more effectively.

Understand key data modeling concepts, utilize essential tools like SQL for database interaction, and practice building models from real-world scenarios.

Furthermore, strengthen data cleaning skills for accurate datasets, and stay updated with industry trends to ensure relevance.


Step 7: Model Evaluation and Refinement: Evaluate the performance of applied models rigorously. Iterate and refine models to enhance accuracy and reliability, ensuring alignment with the objectives and expected outcomes.
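A minimal sketch of Steps 6 and 7: fitting a simple linear model as a stand-in for whatever model a given case study calls for, then evaluating it on held-out data with RMSE. The data are synthetic and the 80/20 split is an arbitrary choice.

```r
set.seed(7)

# Synthetic case-study data: predict sales from advertising spend
n <- 200
cases <- data.frame(ad_spend = runif(n, 1, 10))
cases$sales <- 5 + 3 * cases$ad_spend + rnorm(n, sd = 2)

# Hold out 20% of the rows for evaluation
test_idx <- sample(n, size = 0.2 * n)
train <- cases[-test_idx, ]
test  <- cases[test_idx, ]

# Step 6: fit a candidate model on the training data
model <- lm(sales ~ ad_spend, data = train)

# Step 7: evaluate on the held-out data and iterate if needed
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$sales - pred)^2))
rmse
```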

Step 8: Deriving Insights and Recommendations: Extract actionable insights from the analyzed data. Develop well-structured recommendations or solutions based on the insights uncovered, addressing the core problem or question effectively.

Step 9: Communicating Results Effectively: Present findings, insights, and recommendations clearly and concisely. Utilize visualizations and storytelling techniques to convey complex information compellingly, ensuring comprehension by stakeholders.


Step 10: Reflection and Iteration: Reflect on the entire analysis process and outcomes. Identify potential improvements and lessons learned. Embrace an iterative approach, refining methodologies for continuous enhancement and future analyses.

This step-by-step roadmap provides a structured framework for thorough and effective handling of a data analytics case study.

Now, after handling data analytics comes a crucial step; presenting the case study.

Presenting Your Data Analytics Case Study

Presenting a data analytics case study is a vital part of the process. When presenting your case study, clarity and organization are paramount.

To achieve this, follow these key steps:

Structuring Your Case Study: Start by outlining relevant and accurate main points. Ensure these points align with the problem addressed and the methodologies used in your analysis.

Crafting a Narrative with Data: Start with a brief overview of the issue, then explain your method and steps, covering data collection, cleaning, stats, and advanced modeling.

Visual Representation for Clarity: Utilize various visual aids—tables, graphs, and charts—to illustrate patterns, trends, and insights. Ensure these visuals are easy to comprehend and seamlessly support your narrative.


Highlighting Key Information: Use bullet points to emphasize essential information, maintaining clarity and allowing the audience to grasp key takeaways effortlessly. Bold key terms or phrases to draw attention and reinforce important points.

Addressing Audience Queries: Anticipate and be ready to answer audience questions regarding methods, assumptions, and results. Demonstrating a profound understanding of your analysis instills confidence in your work.

Integrity and Confidence in Delivery: Maintain a neutral tone and avoid exaggerated claims about findings. Present your case study with integrity, clarity, and confidence to ensure the audience appreciates and comprehends the significance of your work.


By organizing your presentation well, telling a clear story through your analysis, and using visuals wisely, you can effectively share your data analytics case study.

This method helps people understand better, stay engaged, and draw valuable conclusions from your work.

We hope by now, you are feeling very confident processing a case study. But with any process, there are challenges you may encounter.

Key Challenges in Data Analytics Case Studies

A data analytics case study can present various hurdles that necessitate strategic approaches for successful navigation:

Challenge 1: Data Quality and Consistency

Challenge: Inconsistent or poor-quality data can impede analysis, leading to erroneous insights and flawed conclusions.

Solution: Implement rigorous data validation processes, ensuring accuracy, completeness, and reliability. Employ data cleansing techniques to rectify inconsistencies and enhance overall data quality.

Challenge 2: Complexity and Scale of Data

Challenge: Managing vast volumes of data with diverse formats and complexities poses analytical challenges.

Solution: Utilize scalable data processing frameworks and tools capable of handling diverse data types. Implement efficient data storage and retrieval systems to manage large-scale datasets effectively.

Challenge 3: Interpretation and Contextual Understanding

Challenge: Interpreting data without contextual understanding or domain expertise can lead to misinterpretations.

Solution: Collaborate with domain experts to contextualize data and derive relevant insights. Invest in understanding the nuances of the industry or domain under analysis to ensure accurate interpretations.


Challenge 4: Privacy and Ethical Concerns

Challenge: Balancing data access for analysis while respecting privacy and ethical boundaries poses a challenge.

Solution: Implement robust data governance frameworks that prioritize data privacy and ethical considerations. Ensure compliance with regulatory standards and ethical guidelines throughout the analysis process.

Challenge 5: Resource Limitations and Time Constraints

Challenge: Limited resources and time constraints hinder comprehensive analysis and exhaustive data exploration.

Solution: Prioritize key objectives and allocate resources efficiently. Employ agile methodologies to iteratively analyze and derive insights, focusing on the most impactful aspects within the given timeframe.

Recognizing these challenges is key; it helps data analysts adopt proactive strategies to mitigate obstacles. This enhances the effectiveness and reliability of insights derived from a data analytics case study.

Now, let’s talk about the best software tools you should use when working with case studies.

Top 5 Software Tools for Case Studies

In the realm of case studies within data analytics, leveraging the right software tools is essential.

Here are some top-notch options:

Tableau : Renowned for its data visualization prowess, Tableau transforms raw data into interactive, visually compelling representations, ideal for presenting insights within a case study.

Python and R Libraries: These flexible programming languages provide many tools for handling data, doing statistics, and working with machine learning, meeting various needs in case studies.

Microsoft Excel : A staple tool for data analytics, Excel provides a user-friendly interface for basic analytics, making it useful for initial data exploration in a case study.

SQL Databases : Structured Query Language (SQL) databases assist in managing and querying large datasets, essential for organizing case study data effectively.

Statistical Software (e.g., SPSS , SAS ): Specialized statistical software enables in-depth statistical analysis, aiding in deriving precise insights from case study data.

Choosing the best mix of these tools, tailored to each case study’s needs, greatly boosts analytical abilities and results in data analytics.

Final Thoughts

Case studies in data analytics are helpful guides. They give real-world insights, improve skills, and show how data-driven decisions work.

Using case studies helps analysts learn, be creative, and make essential decisions confidently in their data work.


Frequently Asked Questions

What are the key steps to analyzing a data analytics case study?

When analyzing a case study, you should follow these steps:

Clarify the problem : Ensure you thoroughly understand the problem statement and the scope of the analysis.

Make assumptions : Define your assumptions to establish a feasible framework for analyzing the case.

Gather context : Acquire relevant information and context to support your analysis.

Analyze the data : Perform calculations, create visualizations, and conduct statistical analysis on the data.

Provide insights : Draw conclusions and develop actionable insights based on your analysis.

How can you effectively interpret results during a data scientist case study job interview?

During your next data science interview, interpret case study results succinctly and clearly. Utilize visual aids and numerical data to bolster your explanations, ensuring comprehension.

Frame the results in an audience-friendly manner, emphasizing relevance. Concentrate on deriving insights and actionable steps from the outcomes.

How do you showcase your data analyst skills in a project?

To demonstrate your skills effectively, consider these essential steps. Begin by selecting a problem that allows you to exhibit your capacity to handle real-world challenges through analysis.

Methodically document each phase, encompassing data cleaning, visualization, statistical analysis, and the interpretation of findings.

Utilize descriptive analysis techniques and effectively communicate your insights using clear visual aids and straightforward language. Ensure your project code is well-structured, with detailed comments and documentation, showcasing your proficiency in handling data in an organized manner.

Lastly, emphasize your expertise in SQL queries, programming languages, and various analytics tools throughout the project. These steps collectively highlight your competence and proficiency as a skilled data analyst, demonstrating your capabilities within the project.

Can you provide an example of a successful data analytics project using key metrics?

A prime illustration is utilizing analytics in healthcare to forecast hospital readmissions. Analysts leverage electronic health records, patient demographics, and clinical data to identify high-risk individuals.

Implementing preventive measures based on these key metrics helps curtail readmission rates, enhancing patient outcomes and cutting healthcare expenses.

This demonstrates how data analytics, driven by metrics, effectively tackles real-world challenges, yielding impactful solutions.

Why would a company invest in data analytics?

Companies invest in data analytics to gain valuable insights, enabling informed decision-making and strategic planning. This investment helps optimize operations, understand customer behavior, and stay competitive in their industry.

Ultimately, leveraging data analytics empowers companies to make smarter, data-driven choices, leading to enhanced efficiency, innovation, and growth.


How long until building complaints are dispositioned? A survival analysis case study

Learn how to use tidymodels for survival analysis.

Introduction

To use code in this article, you will need to install the following packages: aorsf, censored, glmnet, modeldatatoo, and tidymodels.

Survival analysis is a field of statistics and machine learning for analyzing the time to an event. While it has its roots in medical research, the event of interest can be anything from customer churn to machine failure. Methods from survival analysis take into account that some observations may not yet have experienced the event of interest and are thus censored .

Here we want to predict the time it takes for a complaint to be dispositioned 1 by the Department of Buildings in New York City. We are going to walk through a complete analysis from beginning to end, showing how to analyze time-to-event data.

Let’s start with loading the tidymodels and censored packages (the parsnip extension package for survival analysis models).

The buildings complaints data

The city of New York publishes data on the complaints received by the Department of Buildings. The data includes information on the type of complaint, the date it was entered in their records, the date it was dispositioned, and the location of the building the complaint was about. We are using a subset of the data, available in the modeldatatoo package.

Before we dive into survival analysis, let’s get an impression of how the complaints are distributed across the city. We have complaints in all five boroughs, albeit with a somewhat lower density of complaints in Staten Island.

Building complaints in New York City (closed complaints in purple, active complaints in pink).

In the dataset, we can see the days_to_disposition as well as the status of the complaint. For a complaint with the status "ACTIVE" , the time to disposition is censored, meaning we do know that it has taken at least that long, but not how long for it to be completely resolved.

The standard form for time-to-event data are Surv objects which capture the time as well as the event status. As with all transformations of the response, it is advisable to do this before heading into the model fitting process with tidymodels.
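A sketch of that transformation; only days_to_disposition and status are named in the article, so the data accessor and the event coding below are assumptions based on the prose.

```r
library(tidymodels)
library(censored)  # parsnip extension package for survival models

# Accessor name assumed; the article only says the subset lives in modeldatatoo
building_complaints <- modeldatatoo::data_building_complaints()

building_complaints <- building_complaints %>%
  mutate(
    # "ACTIVE" complaints are right-censored; "CLOSED" complaints are events
    # (coding assumed from the prose)
    disposition_surv = survival::Surv(days_to_disposition, status == "CLOSED"),
    .keep = "unused"
  )
```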

Data splitting and resampling

For our resampling strategy, let’s use a 3-way split into training, validation, and test set.

First, let’s pull out the training data and have a brief look at the response using a Kaplan-Meier curve .
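Continuing the sketch above, the split and the Kaplan-Meier look at the training data might be set up as follows; the seed and the default split proportions are arbitrary choices.

```r
set.seed(403)

# Three-way split: training, validation, and test
complaints_split <- initial_validation_split(building_complaints)
complaints_train <- training(complaints_split)

# Kaplan-Meier curve of the time to disposition on the training data
km_fit <- survival::survfit(disposition_surv ~ 1, data = complaints_train)
plot(km_fit, xlab = "Days to disposition", ylab = "Probability of still being active")
```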

We can see that the majority of complaints is dispositioned relatively quickly, but some complaints are still active after 100 days.

A first model

The censored package includes parametric, semi-parametric, and tree-based models for this type of analysis. To start, we are fitting a parametric survival model with the default of assuming a Weibull distribution on the time to disposition. We’ll explore the more flexible models once we have a sense of how well this more restrictive model performs on this dataset.

We have several missing values in complaint_priority that we are turning into a separate category, "unknown" . We are also combining the less common categories for community_board and unit into an "other" category to reduce the number of levels in the predictors. The complaint category often does not tell us much more than the unit, with several complaint categories being handled by a specific unit only. This can lead to the model being unable to estimate some of the coefficients. Since our goal here is only to get a rough idea of how well the model performs, we are removing the complaint category for now.

We combine the recipe and the model into a workflow. This allows us to easily resample the model because all preprocessing steps are applied to the training set and the validation set for us.
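A sketch of that model, recipe, and workflow; the predictor names (complaint_priority, community_board, unit, complaint_category) follow the prose, and the pooling threshold is an assumption.

```r
# Parametric survival model with a Weibull distribution (the default)
survreg_spec <- survival_reg(dist = "weibull") %>%
  set_engine("survival") %>%
  set_mode("censored regression")

# Preprocessing described in the text: an "unknown" category for missing
# priorities, pooled rare community boards and units, and no complaint category
rec_other <- recipe(disposition_surv ~ ., data = complaints_train) %>%
  step_rm(complaint_category) %>%
  step_unknown(complaint_priority) %>%
  step_novel(community_board, unit) %>%
  step_other(community_board, unit, threshold = 0.02)

survreg_wflow <- workflow() %>%
  add_recipe(rec_other) %>%
  add_model(survreg_spec)
```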

To fit and evaluate the model, we need the training and validation sets. While we can access them each on their own, validation_set() extracts them both, in a manner that emulates a single resample of the data. This enables us to use fit_resamples() and other tuning functions in the same way as if we had used some other resampling scheme (such as cross-validation).

We are calculating several performance metrics: the Brier score, its integrated version, the area under the ROC curve, and the concordance index. Note that all of these are used in a version tailored to survival analysis. The concordance index uses the predicted event time to measure the model’s ability to rank the observations correctly. The Brier score and the ROC curve use the predicted probability of survival at a given time. We evaluate these metrics every 30 days up to 300 days, as provided in the eval_time argument. The Brier score is a measure of the accuracy of the predicted probabilities, while the ROC curve is a measure of the model’s ability to discriminate between events and non-events at the given time point. Because these metrics are defined “at a given time,” they are also referred to as dynamic metrics .
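Putting these pieces together, here is a sketch of the resampling setup with the survival-specific metrics; object names carry over from the previous sketches and the metric choices mirror the prose.

```r
# Emulate a single resample from the 3-way split
complaints_rset <- validation_set(complaints_split)

# Dynamic and summary metrics tailored to survival analysis
survival_metrics <- metric_set(
  brier_survival, brier_survival_integrated,
  roc_auc_survival, concordance_survival
)
evaluation_times <- seq(0, 300, by = 30)  # every 30 days up to 300 days

survreg_res <- fit_resamples(
  survreg_wflow,
  resamples = complaints_rset,
  metrics   = survival_metrics,
  eval_time = evaluation_times,
  control   = control_resamples(save_pred = TRUE)
)

collect_metrics(survreg_res)
```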

For more information see the Dynamic Performance Metrics for Event Time Data article.

The structure of survival model predictions is slightly different from classification and regression model predictions:

The predicted survival time is in the .pred_time column and the predicted survival probabilities are in the .pred list column.

For each observation, .pred contains a tibble with the evaluation time .eval_time and the corresponding survival probability .pred_survival . The column .weight_censored contains the weights used in the calculation of the dynamic performance metrics.

For details on the weights see the Accounting for Censoring in Performance Metrics for Event Time Data article.

Of the metrics we calculated with these predictions, let’s take a look at the AUC ROC first.

We can discriminate between events and non-events reasonably well, especially in the first 30 and 60 days. How about the probabilities that the categorization into event and non-event is based on?

The accuracy of the predicted probabilities is generally good, albeit lowest for evaluation times of 30 and 60 days. The integrated Brier score is a measure of the overall accuracy of the predicted probabilities.

Which metric to optimise for depends on whether separation or calibration is more important in the modeling problem at hand. We’ll go with calibration here. Since we don’t have a particular evaluation time that we want to predict well at, we are going to use the integrated Brier score as our main performance metric.

Try out more models

Lumping factor levels together based on frequencies can lead to a loss of information so let’s also try some different approaches. We can let a random forest model group the factor levels via the tree splits. Alternatively, we can turn the factors into dummy variables and use a regularized model to select relevant factor levels.

First, let’s create the recipes for these two approaches:

Next, let’s create the model specifications and tag several hyperparameters for tuning. For the random forest, we are using the "aorsf" engine for accelerated oblique random survival forests. An oblique tree can split on linear combinations of the predictors, i.e., it provides more flexibility in the splits than a tree which splits on a single predictor. For the regularized model, we are using the "glmnet" engine for a semi-parametric Cox proportional hazards model.

We can tune workflows with any of the tune_*() functions such as tune_grid() for grid search or tune_bayes() for Bayesian optimization. Here we are using grid search for simplicity.
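A sketch of the two tunable specifications and a grid search for the forest; for brevity it reuses the earlier recipe, whereas the article describes recipes tailored to each model (for example, dummy variables for the glmnet Cox model). The grid size is arbitrary.

```r
# Oblique random survival forest via the aorsf engine
rf_spec <- rand_forest(mtry = tune(), min_n = tune()) %>%
  set_engine("aorsf") %>%
  set_mode("censored regression")

# Regularized semi-parametric Cox model via the glmnet engine
coxnet_spec <- proportional_hazards(penalty = tune()) %>%
  set_engine("glmnet") %>%
  set_mode("censored regression")

# Grid search for the forest; the Cox model is tuned analogously with a
# recipe that adds step_dummy(all_nominal_predictors())
rf_res <- tune_grid(
  workflow(rec_other, rf_spec),
  resamples = complaints_rset,
  grid      = 10,
  metrics   = survival_metrics,
  eval_time = evaluation_times
)
```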

So do any of these models perform better than the parametric survival model?

The best regularized Cox model performs a little better than the parametric survival model, with an integrated Brier score of 0.0496 compared to 0.0512 for the parametric model. The random forest performs yet a little better with an integrated Brier score of 0.0468.

The final model

We chose the random forest model as the final model. So let’s finalize the workflow by replacing the tune() placeholders with the best hyperparameters.

We can now fit the final model on the training data and evaluate it on the test data.
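A sketch of the finalization and test-set step, selecting by the integrated Brier score as the article prefers; object names continue from the previous sketches.

```r
# Pick the best hyperparameters from the validation results
best_params <- select_best(rf_res, metric = "brier_survival_integrated")

final_wflow <- workflow(rec_other, rf_spec) %>%
  finalize_workflow(best_params)

# Fit on the training data, then predict disposition times on the test set
final_fit <- fit(final_wflow, data = complaints_train)
predict(final_fit, new_data = testing(complaints_split), type = "time")
```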

The Brier score across the different evaluation time points is also very similar between the validation set and the test set.

To finish, we can extract the fitted workflow to either predict directly on new data or deploy the model.

For more information on survival analysis with tidymodels see the survival analysis tag .

Session information

1. In this context, the term disposition means that there has been a decision or resolution regarding the complaint that is the conclusion of the process.

A case study evolving quality management in Indian civil engineering projects using AI techniques: a framework for automation and enhancement

  • Published: 02 April 2024


  • Kaushal Kumar 1 ,
  • Saurav Dixit 2 ,
  • Umank Mishra 3 &
  • Nikolai Ivanovich Vatin 4 , 5  


The present research examines a wide range of civil engineering projects across India, each providing a distinct platform for investigating quality management, automation techniques, and improvement activities using artificial intelligence (AI) techniques. The study covers projects demonstrating the variety of India’s civil engineering undertakings, from the Smart City Mission to the Mumbai Metro Line 3 and the Chennai-Madurai Expressway. The adoption of quality management techniques, including ISO 9001 certification, Lean Construction, Six Sigma, Building Information Modeling (BIM), and Total Quality Management (TQM), is evaluated in these projects. In this case study, experimental datasets and AI techniques such as Artificial Neural Networks (ANNs) are employed to predict accurate outcomes. It was observed that the regression coefficient (R²) and error (MSE) varied more with 1 to 5 hidden layer nodes, while 6 to 10 hidden layer nodes produced stable outcomes. Of these, the network with 9 hidden layer nodes performed best, with the highest regression coefficient (R² = 99.4%) and the minimum error (MSE = 0.04). The complete investigation of the outcomes indicates that the existing model is suitable for accurately predicting the UCS (unconfined compressive strength). A thorough framework for improving quality management in Indian civil engineering projects is the research’s final product, and it offers insightful information to industry stakeholders.
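The paper's experimental dataset is not public, so the sketch below only mirrors the workflow the abstract describes: a single-hidden-layer ANN with 9 nodes (the best-performing configuration reported), fitted in R with the nnet package to synthetic data, with MSE and R² computed on held-out observations. All variable names, values, and input features are invented.

```r
library(nnet)
set.seed(1)

# Synthetic stand-in for the experimental data: predict UCS from mix inputs
n <- 200
dat <- data.frame(
  cement = runif(n, 200, 400),
  water  = runif(n, 120, 200),
  age    = runif(n, 7, 90)
)
dat$ucs <- 0.08 * dat$cement - 0.1 * dat$water + 0.3 * dat$age + rnorm(n, sd = 2)

train_idx <- sample(n, 150)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Single hidden layer with 9 nodes (the best configuration reported in the study)
fit <- nnet(ucs ~ cement + water + age, data = train,
            size = 9, linout = TRUE, decay = 0.01, maxit = 500, trace = FALSE)

# Evaluate on the held-out rows
pred <- predict(fit, newdata = test)
mse  <- mean((test$ucs - pred)^2)
r2   <- 1 - sum((test$ucs - pred)^2) / sum((test$ucs - mean(test$ucs))^2)
c(MSE = mse, R2 = r2)
```

Repeating the same loop for 1 to 10 hidden nodes would reproduce the kind of comparison the abstract reports.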


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

The authors are thankful to Lovely Professional University, Jalandhar, an autonomous organization in Punjab, India, for providing the basic dataset used in carrying out this study.

This research was also funded by the Ministry of Science and Higher Education of the Russian Federation within the framework of the state assignment No. 075-03-2022-010 dated 14 January 2022 and No. 075– 01568-23-04 dated 28 March 2023(Additional agreement 075-03-2022- 010/10 dated 09 November 2022, Additional agreement 075-03-2023- 004/4 dated 22 May 2023), FSEG-2022-0010.

Author information

Authors and Affiliations

Department of Mechanical Engineering, K. R. Mangalam University, Gurugram, Haryana, 122103, India

Kaushal Kumar

Division of Research and Development, Lovely Professional University, Phagwara, Punjab, 144401, India

Saurav Dixit

Department of Civil Engineering, Shri Shankaracharya Technical Campus, Bhilai, Chhattisgarh, 490020, India

Umank Mishra

Peter The Great St. Petersburg Polytechnic University, Saint Petersburg, 195251, Russia

Nikolai Ivanovich Vatin

Division of Research and Innovation, Uttaranchal University, Dehradun, India


Contributions

Author contributions: K.K. wrote the main manuscript text, K.K. and S.D. provided the methodology, and U.M. and N.V. reviewed the manuscript.

Corresponding author

Correspondence to Kaushal Kumar .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Kumar, K., Dixit, S., Mishra, U. et al. A case study evolving quality management in Indian civil engineering projects using AI techniques: a framework for automation and enhancement. Asian J Civ Eng (2024). https://doi.org/10.1007/s42107-024-01029-5

Download citation

Received : 21 February 2024

Accepted : 06 March 2024

Published : 02 April 2024

DOI : https://doi.org/10.1007/s42107-024-01029-5

Keywords

  • Artificial intelligence (AI)
  • Automation tools
  • Building information modeling (BIM)
  • Enhancement initiatives
  • Indian projects
  • Quality management

Warning sign of an accelerating decline in critically endangered killer whales ( Orcinus orca )

  • Rob Williams   ORCID: orcid.org/0000-0001-7496-453X 1 ,
  • Robert C. Lacy 2 ,
  • Erin Ashe 1 ,
  • Lance Barrett-Lennard 3 ,
  • Tanya M. Brown 4 ,
  • Joseph K. Gaydos   ORCID: orcid.org/0000-0001-6599-8797 5 ,
  • Frances Gulland   ORCID: orcid.org/0000-0002-6416-0156 6 ,
  • Misty MacDuffee 3 ,
  • Benjamin W. Nelson 7 ,
  • Kimberly A. Nielsen   ORCID: orcid.org/0000-0002-6019-2919 1 ,
  • Hendrik Nollens 8 ,
  • Stephen Raverty   ORCID: orcid.org/0000-0003-2879-073X 9 ,
  • Stephanie Reiss 1 ,
  • Peter S. Ross 3 ,
  • Marena Salerno Collins 1 ,
  • Raphaela Stimmelmayr 10 &
  • Paul Paquet   ORCID: orcid.org/0000-0002-4844-2559 11  

Communications Earth & Environment volume 5, Article number: 173 (2024)

  • Ecological modelling

Wildlife species and populations are being driven toward extinction by a combination of historic and emerging stressors (e.g., overexploitation, habitat loss, contaminants, climate change), suggesting that we are in the midst of the planet’s sixth mass extinction. The invisible loss of biodiversity before species have been identified and described in the scientific literature has been termed, memorably, dark extinction. The critically endangered Southern Resident killer whale (Orcinus orca) population illustrates the contrasting process, which we term bright extinction: the noticeable and well-documented precipitous decline of a data-rich population toward extinction. Here we use a population viability analysis to test the sensitivity of this killer whale population to variability in age structure, survival rates, and prey-demography functional relationships. Preventing extinction is still possible but will require greater sacrifices in regional ocean use, urban development, and land use practices than would have been the case had threats been mitigated even a decade earlier.

Introduction

Challenges in conservation biology are generally assigned to either the small-population or the declining-population paradigm 1 . Resource management typically distinguishes between decisions to protect the welfare of individuals and those to promote recovery of populations 2 . Below a critical threshold, populations become sufficiently small that demographic stochasticity (i.e., random fluctuations in birth and death rates) can result in extinction, even when the average population growth rate is positive 3 . Many of these extinction events are taking place undocumented, before a species has even been described scientifically, in a process memorably termed dark extinction 4 . The concept of dark extinction could lead some to conclude falsely that extinction is largely an information-deficit problem. In other words, if only we knew that a population or species were declining toward extinction, we would step in to mitigate anthropogenic stressors and reverse declines. In our experience, many populations and species are declining toward extinction in plain sight. We call this latter process a bright extinction, with thanks to Boehm and colleagues for inspiring the term.

Small populations can persist despite large variability in environmental conditions around some long-term stationary state, whereas a deteriorating trend in environmental conditions increases extinction risk in small populations 5 . Drake and Griffen hypothesize that “environmental degradation may cause a tipping point in population dynamics, corresponding to a bifurcation in the underlying population growth equations, beyond which decline to extinction is almost certain” 5 . In practice, demographic parameters of wild populations are rarely estimated with sufficient precision to detect these early warning signs (a bifurcation in population rates of change) until a decline may be irreversible 6 , 7 . Evidence-based conservation requires knowledge of demographic rates, as well as natural and anthropogenic influences on those rates, to guide timely and effective interventions 5 , 8 , 9 . While improved tools for data analyses to assess conservation status and extinction risks are needed urgently to protect data-poor species and populations 10 , 11 , not all extinctions can be attributed to an information deficit alone 12 , 13 . To complicate matters further, the threats that lead to a legal listing recognizing a population’s endangered status may not represent the same drivers likely to lead to population recovery. Instead, wildlife population dynamics and risk of extinction can be the net result of multiple concurrent, persistent, interacting, and evolving drivers that include both natural ecological and anthropogenic factors 14 , 15 .
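
The role demographic stochasticity plays in small populations can be illustrated with a simple Monte Carlo sketch. The code below is not the authors’ Vortex model; the starting population size and the per-capita birth and death rates are assumed, illustrative values chosen so that expected growth is about +1% per year, the point being that random birth-death noise alone can still drive a small population to zero.

```python
import numpy as np

rng = np.random.default_rng(42)

def extinction_probability(n0=25, birth_rate=0.15, death_rate=0.14,
                           years=100, n_runs=1000):
    """Fraction of simulated trajectories that hit zero within `years`.

    All rates are assumed, illustrative values; expected growth is
    birth_rate - death_rate = +1% per year, yet extinction can still
    occur through demographic stochasticity alone.
    """
    extinct = 0
    for _ in range(n_runs):
        n = n0
        for _ in range(years):
            births = rng.poisson(birth_rate * n)   # integer number of calves born this year
            deaths = rng.binomial(n, death_rate)   # each individual dies with prob death_rate
            n = n + births - deaths
            if n <= 0:
                extinct += 1
                break
    return extinct / n_runs

print(f"Estimated extinction probability: {extinction_probability():.1%}")
```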

Population assessment of Southern Resident killer whales (SRKW, Orcinus orca) is extremely data-rich compared with that of many other wild mammals. These whales represent the smallest (75 individuals 16 ) of four separate, non-interbreeding, behaviorally and culturally distinct, fish-eating ecotypes of killer whales in the eastern North Pacific Ocean. Every individual in the population has been censused annually by the Center for Whale Research and colleagues since the 1970s 17 . Depleted in the 1960s and 1970s by an unsustainable live-capture fishery for aquaria displays, the population has failed to recover due to a combination of sublethal and lethal stressors, including reduced availability and quality of Chinook salmon (Oncorhynchus tshawytscha), its preferred prey; noise, which further reduces foraging efficiency 18 ; contaminant exposure, which is associated with decreased fecundity, increased calf mortality, and other adverse effects 19 , 20 ; and vessel strikes 21 . The whales’ preferred prey, Chinook salmon, are themselves heavily depleted, and the ability of Chinook salmon stocks to support survival, let alone recovery, of SRKW has been in question for over two decades 22 , 23 . Years with low Chinook salmon abundance are temporally associated with low SRKW reproduction and survival 22 , 23 . Ensuring recovery of SRKW and the salmon on which they depend hinges on explicit recognition of the costs and conflicts associated with recovery of predator and prey alike 24 .

Results and discussion

Given observed demographic rates over the last 40 years, the baseline population dynamics model predicts a mean annual population decline of roughly 1% (Fig.  1a, b ). This average decline is characterized by gradual reduction for roughly two generations (~40 years), followed by a stereotypical period of accelerating decline that presages extinction (Fig.  1 ). This baseline model is optimistic, because all evidence suggests that natural and anthropogenic drivers of population status are dynamic, transient, and multifactorial, and many threats are expected to worsen in future.

Figure 1. Population growth rate (r) (a) and number of whales and proportion of current gene diversity (b), projected over 100 years and averaged across 1000 iterations of the Baseline model of the SRKW population. The expected growth rate is in blue, the projected decline is in red, and the horizontal dashed line represents the mean rate. Note the bifurcation around 50 years (two killer whale generations), indicative of an accelerating decline even without accounting for increasing threats 5 . Shading represents the 95% confidence intervals around SRKW abundance (dark blue line) and gene diversity (light blue line).

More recent data indicate that the aforementioned relationships between interannual variability in Chinook salmon and SRKW survival and fecundity 25 are changing enough that we predict prey-mediated changes in SRKW survival and reproduction (Fig. 2a, b) are likely to lead to even more dramatic declines in the coming decades than the prior baseline model suggests (Fig. 3). Our analyses reveal that the population shows lower recovery potential than previously estimated, due to reduced leverage of prey availability on SRKW demography, adverse stochastic effects (e.g., few female offspring in recent years, mortality from vessel strikes), and potentially amplifying effects of inbreeding 20 , 26 , 27 . Ultimately, overexploitation caused the initial decline, but proximate effects of habitat degradation and loss (and possibly destruction) are inhibiting SRKW recovery 19 . The whales are also obligate prey specialists on the largest, fattiest Chinook salmon, which limits their ability to adapt to a changing environment. Accordingly, SRKW epitomize the naturally rare, wide-ranging or broadly distributed species that may be hardest to protect.

Figure 2. Annual survival rates (a) and reproductive rates (the proportion of breeding-age females producing a calf) (b) for SRKW of different age-sex classes (Table 1), predicted from logistic regressions against Chinook salmon prey abundance. Calf survival is in yellow, post-reproductive female is in red, older male is in green, older female is in orange, subadult survival is in dark blue, young female is in light blue, and young male is in blue.
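
The prey-demography relationships behind Figure 2 are logistic regressions of annual survival and reproduction on a Chinook abundance index. A minimal sketch of that kind of fit is shown below; the data are synthetic and the use of scikit-learn is an assumption for illustration, not the authors’ actual statistical workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: one row per whale-year, recording the Chinook abundance
# index for that year and whether the individual survived (1) or died (0).
chinook_index = rng.uniform(0.6, 1.4, size=500)
true_logit = -1.0 + 3.5 * chinook_index                   # assumed "true" relationship
survived = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Fit survival probability as a logistic function of the prey index.
model = LogisticRegression().fit(chinook_index.reshape(-1, 1), survived)

# Predicted survival probability across the observed range of the prey index.
grid = np.linspace(0.6, 1.4, 5).reshape(-1, 1)
for x, p in zip(grid.ravel(), model.predict_proba(grid)[:, 1]):
    print(f"Chinook index {x:.1f}: predicted survival {p:.2f}")
```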

Figure 3. Spider plot showing the relative impacts of the five most influential factors affecting SRKW population growth. The x-axis is scaled for each factor so that the Baseline value is set to 50, and the range scaled from 0 to 100 (see Table 2 for definitions of factors and ranges tested). Chinook abundance expected due to climate change is in red, the Chinook abundance index is in yellow, noise is in light blue, the PCB accumulation rate is in blue, preventable deaths is in orange, and total PCBs plus other contaminants is in dark blue. Other factors listed in Table 2 had lesser impacts on SRKW population growth, and their relative impacts are provided in the Supplementary Notes and Supplementary Fig. S20.

Immediate, multidisciplinary approaches, including supporting Chinook salmon recovery and appropriate veterinary interventions when indicated, will be necessary to stabilize the population (Fig.  4 ). Although no single scenario can help SRKWs reach one stated recovery objective of 2.3% sustained growth over 28 years, concerted efforts can reverse the decline and possibly reach 1% annual recovery. Slowing or halting the population decline might provide opportunities to develop and implement new strategies to mitigate and facilitate recovery of SRKW that are not yet feasible. In a population of 75 individuals, a single birth or death represents an annual population growth or decline of 1.4%, underscoring the value of each individual in preventing the disappearance of a population.

Figure 4. Projections of SRKW population size, averaged across 1000 iterations for six scenarios that range from optimistic to pessimistic: “Road to recovery” (in blue) assumes direct and indirect human impacts on the whales and their habitats are removed (1.5× Chinook, no climate change effects, no noise, human-caused mortalities prevented, no PCBs or other contaminants); “Slow recovery” (in yellow) assumes lesser but still considerable improvements to threats (1.3× Chinook, no climate change, no noise, no human-caused mortalities, environmental PCBs reduced with 25-year half-life); “Persistence” (in light blue) assumes each threat is reduced half as much as in “Slow recovery”; “Current decline” (in orange) is the Baseline; “Decline toward extinction” (in dark blue) adds further threats (8% reduction in prey size, climate change decimating Chinook salmon stocks, total contaminants 1.67× PCB, a low probability of catastrophic oil spills); “Worst case” (in red) adds further plausible increases in threats (0.7× Chinook, noise disturbance 100% of the time, oil spills at higher frequency).

Recovery considerations

Treating individual wild animals to promote population recovery only benefits conservation when individual animals are known and populations are small enough for individual survival to make a considerable difference 28 , such as in the recovery of habituated mountain gorillas 29 , Ethiopian wolves 30 , and Hawaiian monk seals 31 . For SRKWs to attain 1% population growth, non-invasive diagnostic investigations, informed clinical intervention, and ongoing post-treatment monitoring of animals that present with serious morbidity or clinical disease are warranted. This extreme conservation measure enables humans to reduce mortality in high-value animals, such as reproductively active females. When feasible, post-mortem examination of stranded SRKWs is critical to inform future clinical decisions and management options. Interventions should be rank-ordered; injuries attributed to human activities, such as vessel strikes, net, line, or hook entanglement, or potential oil exposure, are priorities that may warrant immediate intervention 21 . Other future interventions may include remote administration of antiparasitic drugs and other treatments 32 for treating disease 33 . It may be time to discuss more drastic measures, including pre-emptive vaccination to protect individuals against pathogens with known high morbidity and mortality rates among cetaceans (e.g., cetacean morbillivirus, Brucella cetorum , Toxoplasma gondii ) 34 . We encourage transboundary and inter-agency discussions to coordinate emergency plans for veterinary intervention, including permits and decision trees, before a high-profile crisis necessitates implementation. Emergency veterinary intervention plans could be modeled on similar bilateral, multi-agency plans to respond to an oil spill in these transboundary waters 35 . With timely and effective management actions (such as mandated reduced vessel speeds near whales to minimize vessel strikes) to reduce human-caused mortality 36 , we estimate that up to 28% of natural mortality could be deferred each year (Fig. 4, Supplementary Notes). Given the delay between medical interventions and demographic changes (e.g., survival, growth, fecundity, abundance), regular evaluation of short-term health benchmarks (e.g., body condition, reproductive potential of the existing population, behavior, pregnancies) is critical to strike a balance between the risk and reward of any particular intervention.

Southern Resident killer whales are known to be among the most contaminated marine mammals in the world, with polychlorinated biphenyl (PCB) concentrations readily exceeding established thresholds for health effects, including growth and development, immune function, and reproductive performance 37 . However, PCBs are an important, but not exclusive, contaminant class found in SRKW. Despite their phase-out under the terms of the international Stockholm Convention on Persistent Organic Pollutants (POPs), the persistence of PCBs in the marine environment and their resistance to metabolic elimination mean that it will take decades before this population is considered to be ‘safe’ from PCB and other legacy contaminant-related health effects 38 . This lag between mitigation and benefits to wildlife, together with the co-occurrence of many other contaminants, suggests that threats attributable to POPs will decline, but the population consequences will linger. This lag time was accounted for using the predicted PCB level trends in this killer whale population 38 , and a 1.75× factor was applied to our previously modeled population impact attributed to PCBs to capture the contribution and associated risk of other POPs, including legacy organochlorine pesticides (OCPs). This 1.75× factor was derived from an endocrine disruption risk-based quotient for local harbor seals ( Phoca vitulina ) 27 , a species that has been used previously as a surrogate to characterize Resident killer whale contaminant levels and risk in the North Pacific 39 . Contaminant mitigation alone will be insufficient to promote population growth, but should be considered as one pillar of a comprehensive, ‘action’-oriented plan to protect at-risk coastal cetaceans 40 , 41 .

Biological resilience is partially determined by genetic diversity. Because of the decline in the SRKW population since the 1960s, the population is currently so small that there are relatively few breeders (especially males), and we anticipate that inbreeding will exacerbate this process and the population decline (Fig. 1). In this way, the continued loss of genetic diversity will likely hamper the population’s ability to adapt to an ever-evolving threatscape. Kardos et al. found that the SRKW population is already partly inbred and that reduced survival further jeopardizes its recovery potential 26 . Recovery is currently more difficult than if effective measures had been initiated a few decades ago, although other small marine mammal populations with low genetic diversity have continued to reproduce effectively 42 .

The time scales needed to detect demographic effects of threats and benefits of mitigation might be too long in this species to be the primary metrics by which we gauge success (Fig.  4 ). Short-term benchmarks (e.g., body condition, growth rate, pregnancy, behavior, etc.) for measuring the success of mitigation measures are critical given the long lifespan, low reproductive rate, and small sample size in this population. In fact, Canada’s Species At Risk Act outlines a recovery goal to: “ensure the long-term viability of Resident Killer Whale populations by achieving and maintaining demographic conditions that preserve their reproductive potential, genetic variation, and cultural continuity 43 .” Environmental degradation may manifest in social network fragmentation and loss of cultural traditions (e.g., resting lines and greeting ceremonies) long before demographic effects become detectable against background fluctuations.

Marine species are neither more nor less vulnerable to extinction than their terrestrial counterparts 44 . Although indiscriminate exploitation and unintentional bycatch tend to be the dominant factors in the decline and extinction of marine taxa, habitat loss is a close second 44 . Predicting when and how a species is likely to go extinct is extremely challenging, but it is a fundamental task of conservation science 45 . Without rich demographic data on wildlife populations, extinction risk due to habitat loss can be modeled in a species-area relationship framework. Species-area approaches can overestimate the proportion of habitat loss that would result in the removal of the last individual from a population. An inverse relationship has been found between species diversity and density, so protecting part of a species’ range, without considering density, habitat use, or sampling effort, can lead to a false sense of confidence about population-level protection 46 . One study found a 53-year average lag between the time of the last sighting of a species and its reported extinction 44 . None of these statistical issues are at play for SRKW, in which clinically ill and lost animals are recognized through ongoing surveys and a census that is conducted annually.
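
For readers unfamiliar with the species-area relationship framework mentioned above, the standard power-law form is S = cA^z, so the fraction of species expected to persist when habitat shrinks from A to A′ is (A′/A)^z. The sketch below works through that arithmetic; the exponent z = 0.25 is a commonly cited textbook value, not an estimate from this study.

```python
# Species-area relationship: S = c * A**z.  The fraction of species expected to
# persist after habitat loss is (A_new / A_old)**z, independent of the constant c.
z = 0.25                      # assumed textbook exponent, not from this study
habitat_remaining = 0.5       # assume half of the original habitat is lost

fraction_persisting = habitat_remaining ** z
print(f"Fraction of species expected to persist: {fraction_persisting:.2f}")   # ~0.84
print(f"Fraction expected to be lost: {1 - fraction_persisting:.2f}")          # ~0.16
```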

Although wildlife censuses are rare in conservation biology, many seemingly irreversible and overt population declines are being witnessed in plain sight, even when the causes of these declines are well known. We use the term bright extinction to refer to data-rich cases where a decline toward extinction has been identified early and the driver(s) of the decline have been well quantified, but the population has declined to a precarious state nonetheless despite interventions. The loss of the baiji (Yangtze River dolphin, Lipotes vexillifer ) illustrates the bright extinction concept well 7 . The species was extirpated from part of its range by the 1950s, and a precipitous decline in its core habitat was well documented between the 1980s and 1990s. In this case, the cause was attributed to fisheries-related mortality. By 2006, the population was declared functionally extinct. Proposals to create an ex situ or semi-natural reserve had been made since the 1980s but were ignored; perhaps policy-makers thought we had more time than we did 47 . A similar bright-extinction process appears to be underway for the vaquita ( Phocoena sinus ) in the northern Gulf of California, Mexico 48 . The species has been declining since the 1990s due to unsustainable bycatch levels in fish and shrimp gillnet fisheries. Although the decline is well documented and the cause well understood, management actions have proven insufficient 49 . By 2018, only nine individuals were thought to be left 50 . Numbering in the low hundreds, North Atlantic right whales ( Eubalaena glacialis ) are also facing unsustainable levels of human-caused mortality due to vessel strikes and entanglement in fishing gear 51 .

Importantly, these select examples represent cases where declines in small and highly vulnerable populations have been detected. The loss of each animal reduces the power to detect decreases in population abundance 52 . If we are unable to implement timely interventions for these high-profile species, what hope do we have of meeting our current and future biodiversity conservation objectives at large?

Preventing bright extinctions: from knowledge to action

Preventing extirpation of small populations may require extraordinary measures, but several terrestrial and marine wildlife populations that have recovered from the brink of extinction offer a useful roadmap for ensuring the survival and recovery of SRKW.

The California condor ( Gymnogyps californianus ) was decimated to 27 individuals by 1987, from a combination of poaching, cyanide and lead poisoning, and habitat degradation. Captive breeding saved the population from extinction. Although infectious disease did not cause the initial decline, the US Fish and Wildlife Service has begun testing avian influenza vaccines in captive condors and is considering capture and vaccination of wild condors in the face of the ongoing multi-year epizootic 53 . Owing to declines in prairie dogs ( Cynomys sp.), their prey species, and prairie dog habitat, the black-footed ferret ( Mustela nigripes ) was once thought to be extinct; however, after the species was rediscovered in Wyoming in 1981, captive breeding and reintroductions, habitat protection, vaccination against canine distemper, and cloning helped restore this species to over 300 free-ranging animals. Like SRKW, black-footed ferrets were largely dependent on a single prey species, prairie dogs. All of those conservation efforts for the black-footed ferret could be undone by a single outbreak of plague in their prey, necessitating management vigilance to prevent a disease outbreak 54 . By the time the whooping crane ( Grus americana ) was listed as endangered in 1967, only 50 birds remained. Whooping cranes remain one of North America’s most threatened birds, but their recovery to an estimated 600 birds today is a testament to the progress that is made possible by acting decisively 55 . Mountain gorillas ( Gorilla beringei beringei ) were predicted to be extinct by the end of the 20th century, but a large population now resides in protected forest in Uganda, Rwanda, and the Democratic Republic of the Congo 29 . Extreme vigilance in the form of veterinary monitoring and intervention is now needed to prevent backsliding and gorilla mortality 56 . Other populations brought back from the brink include the black robin ( Petroica traversi ) 57 and the Eastern barred bandicoot ( Perameles gunnii ) 57 . In both cases, low levels of genetic diversity did not prevent recovery. Brazil’s golden lion tamarin ( Leontopithecus rosalia ) recovered from a few hundred individuals in the 1970s to about 3700 individuals in 2014 after actions were taken to restore habitat, re-establish connectivity via wildlife corridors, release captive animals, and conduct translocations among wild tamarins 58 . A yellow-fever epidemic in 2017–2018 reduced the population to about 2600 individuals, a decline that would have doomed the species had their habitat and population numbers not been previously recovered 58 .

Meanwhile, as Caughley warned 1 , many previously wide-ranging species have declined in plain sight. Boreal woodland caribou ( Rangifer tarandus caribou ) have been extirpated from vast sections of their range due to habitat loss and hunting, with few signs of success following recovery efforts 59 . Because those root causes have not been addressed, predation on calves now appears to be inhibiting population growth 60 . The Karner blue butterfly ( Plebejus samuelis ) is dependent on the native sundial lupine, which has been eliminated from much of its range by habitat loss and by replacement in the northeast by Lupinus polyphyllus , a western species that has been introduced in the east. Conservation of the Karner blue butterfly cannot be assured without aggressive measures to reduce ultimate population stressors and protect the microsites on which large fractions of the population depend 61 . Decades of warnings failed to prevent the functional extinction of the northern white rhino ( Ceratotherium simum cottoni ) due to hunting and poaching, while conservation organizations and governments debated if, when, how, and who should act 62 . There is an ongoing debate over whether radical proposals to clone the northern white rhino may come at the cost of urgently needed measures to prevent the extinction of southern white rhinos 63 .

While many species have been brought back from the brink through interventions such as captive breeding programs, SRKW recovery will require aggressive actions to protect and restore their habitat, which includes mitigating threats to both SRKWs and their primary prey, Chinook salmon. Our analysis showed that the threat with the greatest impact on SRKW population growth is the availability of Chinook salmon (Fig. 3, Supplementary Fig. S20). Salmon recovery is a crucial component of achieving SRKW recovery. Although no salmon recovery scenario alone resulted in a fully recovered SRKW population, all of the successful multi-threat mitigation scenarios included an ambitious salmon recovery scenario.

Vessel noise can reduce the amount of time SRKWs spend foraging 18 , but it can also have a direct impact on the behavior of prey species 64 , limiting the number of salmon available to SRKWs. Efforts to mitigate impacts from vessel noise include a suite of approaches ranging from building quieter ships to designating slowdown areas 65 . Voluntary efforts to slow ships in important feeding areas for SRKWs have been shown to reduce noise levels by nearly half 66 , which in turn results in increased foraging activity by killer whales 18 . While efforts are underway to reduce noise from existing ships, a number of development applications are underway that would increase shipping traffic in the region 67 . It may be necessary to consider ocean noise budgets, caps, or limits that allow killer whales to hunt scarce prey efficiently.

Protecting SRKWs appears to be impossible without restoring diminished populations of Chinook salmon, which in turn requires effective implementation of conservation and precautionary resource management measures. Implementation will require acknowledgement of the potential trade-offs involved between conservation and resource management, including harvest, and open dialog between involved agencies and stakeholders 24 . Both Canada and the USA have produced recovery plans and strategies for SRKW 43 , 68 . Those plans have recognized the need to ensure adequate prey sources for the survival and recovery of SRKW since at least 2008. In a declining population, the longer the lag time between knowledge and mitigation, the more draconian the recovery actions can become, with a larger social cost and a higher risk that harm reduction actions may not work 69 , 70 . Unfortunately, a legal species listing alone is insufficient to ensure the survival and recovery of threatened taxa 63 . In the face of bright extinction, targeted threat reduction measures and community involvement, in addition to monitoring, are needed to reverse declines 71 . Yet the capacity to determine what we can do often outstrips our ability to decide what we will do, a dilemma that leads to delays in threat reduction measures and perpetuates extinction debt, especially in long-lived species 72 .

There are, however, abundant examples of successful rescues of plants, insects, and animals in aquatic, terrestrial, and aerial environments that confirm we can halt the loss of endangered wild species and that extraordinary measures can even recover critically imperiled ones. Unfortunately, these eleventh-hour rescues carry higher environmental and societal costs than earlier actions would have. Preventing extinctions of populations on the brink requires a high degree of planning and coordination by scientists, managers, decision makers, stakeholders, and affected communities, and may require higher levels of threat reduction than would have been the case had actions been taken sooner. The benefits of species recovery may be difficult to define, both in terms of reversing global biodiversity loss and in terms of the long-term resiliency and health of ecosystems. Rising to the challenge of biodiversity conservation requires robust data on species and threats, but also acting on those threats in a timely manner 73 .

Methods

We used program Vortex 10.6.0 to parameterize a population viability analysis (PVA) model (software and manual available at scti.tools/vortex) for Southern Resident killer whales (SRKW) 20 with demographic rates observed from 1976 through 2022. We tested the sensitivity of population growth to variability and uncertainty in fecundity and survival rates (by age class), and in prey-demography functional relationships 22 , 23 . Next, we constructed a PVA that explores the population consequences of the three primary anthropogenic threats to SRKWs identified in Canadian and USA recovery plans, namely prey limitation (Chinook salmon), noise-mediated disruption of foraging, and effects of contaminants (e.g., PCBs).
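
The sensitivity testing described above can be sketched in a few lines: hold the other inputs fixed, perturb one demographic rate across a plausible range, and record the resulting long-run growth rate. The sketch below uses a deliberately simplified, female-only three-stage matrix model with assumed rates; it is not the Vortex parameterization, only an illustration of the sensitivity-analysis idea.

```python
import numpy as np

def annual_growth_rate(calf_survival, adult_survival=0.97, fecundity=0.06,
                       juvenile_survival=0.95, juvenile_years=10):
    """Dominant eigenvalue (lambda) of a simple calf/juvenile/adult stage matrix.

    All rates are assumed, illustrative values, not estimates from this study.
    """
    stay_juvenile = juvenile_survival * (1 - 1 / juvenile_years)
    mature = juvenile_survival * (1 / juvenile_years)
    A = np.array([
        [0.0,           0.0,           fecundity],       # calves produced per adult female
        [calf_survival, stay_juvenile, 0.0],             # calves survive into the juvenile stage
        [0.0,           mature,        adult_survival],  # juveniles mature; adults persist
    ])
    return float(np.max(np.real(np.linalg.eigvals(A))))

# Sweep calf survival while holding everything else fixed.
for s in (0.6, 0.7, 0.8, 0.9):
    lam = annual_growth_rate(calf_survival=s)
    print(f"calf survival {s:.1f} -> annual growth {100 * (lam - 1):+.2f}%")
```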

We ran more speculative scenarios to consider the threat of not only PCBs, but also other POPs including legacy organochlorine pesticides (OCPs), pathways of effects of contaminants on calf survival 74 , 75 , climate-mediated impacts of Chinook salmon on SRKW demography 22 , 23 , climate- and fisheries-related declines in the size of Chinook salmon 76 , and increased oil spill risk related to industrial development applications in the Salish Sea 67 . In addition to modeling population consequences of threats, efforts were made to model the likely population-level benefits of management measures intended to mitigate human-caused impacts to abundance and population structure from fisheries.

Addressing prey needs requires increasing the abundance of large, older Chinook. Increased abundance and quality of prey within SRKW critical habitat can be realized by changing fishing practices. First, moving Pacific Salmon Treaty fisheries in Alaska and BC away from Chinook rearing grounds and migration routes into terminal river and estuarine locations results in an immediate increase of Chinook salmon in critical habitat of up to 25% (Supplementary Table S2). Second, transitioning marine fisheries to terminal (river-based) areas can recover a more archetypical Chinook age structure (that of the early- to mid-20th century). By not harvesting immature fish in marine fisheries, and then allowing large females to pass through terminal fisheries to spawning grounds, a size increase of up to 40% can occur over a 50-year period. Scaling these scenarios to consider both improved value and abundance of mature Chinook salmon in critical habitat results in increases of 35%, 28%, 18%, and 9% at the end of 50 years when scaled for effectiveness at 100%, 75%, 50%, and 25%, respectively (Supplementary Table S2). While not quantified, freshwater habitat restoration and protection would further support recovery of wild Chinook abundance.

The relative importance of each threat, and of the corresponding mitigation opportunities, was explored by projecting population growth across the possible range of each threat. Finally, we used the PVA to explore the degree to which threats would have to be mitigated, alone or in combination, to stop the decline and achieve positive population growth toward recovery 68 .

For the baseline model, parameters for fecundity and survival (for calves; subadults; young, older, and post-reproductive adult females; and young and older adult males) were estimated from 1976–2022 data (Table 1). Prey availability was drawn from the Chinook prey index scaled to its 1976–2022 mean (i.e., impacts scaled such that when Chinook = 1, demographic rates are the means over that time span). The prey-demography relationship linking killer whale breeding rate and the survival of each age class to the Chinook salmon index was drawn from a recent re-analysis 77 . Inputs for noise (disturbance) impacts and their effect on feeding were as in Lacy et al. 20 . We used the model from Hall et al. 74 for PCB accumulation and depuration parameters, and the impact on calf survival was estimated through a comparison between a sympatric killer whale population, Northern Resident killer whales, and SRKWs. A 1.75× factor was applied to our previously modeled population impact attributed to only PCBs to capture the contribution and associated risk of other POPs, including legacy OCPs. The factor was derived from a risk-based quotient for endocrine disruption in local harbor seals ( Phoca vitulina ) 27 . Inputs for variance in male breeding success were sampled from a beta distribution (mean = SD = 0.40). Effects of inbreeding depression were set to 6.29 lethal equivalents per diploid, imposed via reduced calf survival.
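
One concrete detail in this parameterization is sampling variance in male breeding success from a beta distribution with mean = SD = 0.40. Converting a mean and standard deviation into the two beta shape parameters is a standard method-of-moments step; the short sketch below shows the conversion and a few draws. Only the conversion formula is standard; the rest of the Vortex setup is not reproduced here.

```python
import numpy as np

def beta_shape_params(mean, sd):
    """Convert a mean and standard deviation into beta distribution shape parameters.

    Method of moments:
        alpha + beta = mean * (1 - mean) / sd**2 - 1
        alpha        = mean * (alpha + beta)
    """
    total = mean * (1 - mean) / sd**2 - 1
    if total <= 0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * total, (1 - mean) * total

alpha, beta = beta_shape_params(mean=0.40, sd=0.40)       # gives alpha = 0.2, beta = 0.3
draws = np.random.default_rng(1).beta(alpha, beta, size=5)
print(alpha, beta, draws.round(3))
```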

Data availability

Please see additional technical details on methods and results in the Supplementary Notes file. The data needed to replicate the model can be found on Zenodo at: Lacy, Robert C., & Williams, Rob. (2023). Vortex project file for PVA of Southern Resident Killer Whale — manuscript by Williams et al. (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8099710 .

Code availability

The Vortex code needed to replicate the model can also be found on Zenodo at: Lacy, Robert C, & Williams, Rob. (2023). Vortex project file for PVA of Southern Resident Killer Whale — manuscript by Williams et al. (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8099710 .

Caughley, G. Directions in conservation biology. J. Animal Ecol. 63 , 215–244 (1994).

Sekar, N. & Shiller, D. Engage with animal welfare in conservation. Science 369 , 629–630 (2020).

Mace, G. M. et al. Quantification of extinction risk: IUCN’s system for classifying threatened species. Conservation Biol. 22 , 1424–1442 (2008).

Boehm, M. M. A. & Cronk, Q. C. B. Dark extinction: the problem of unknown historical extinctions. Biol. Lett. 17 , 20210007 (2021).

Drake, J. M. & Griffen, B. D. Early warning signals of extinction in deteriorating environments. Nature 467 , 456–459 (2010).

Taylor, B. L., Martinez, M., Gerrodette, T., Barlow, J. & Hrovat, Y. N. Lessons from monitoring trends in abundance of marine mammals. Marine Mammal Sci. 23 , 157–175 (2007).

Turvey, S. L. et al. First Human Caused Extinction of a Cetacean Species? Biol. Lett. 3 , 537–540 (2007).

Conde, D. A. et al. Data gaps and opportunities for comparative and conservation biology. Proc. Natl Acad. Sci. 116 , 9658–9664 (2019).

Sutherland, W. J., Pullin, A. S., Dolman, P. M. & Knight, T. M. The need for evidence-based conservation. Trends Ecol. Evol. 19 , 305–308 (2004).

Kindsvater, H. K. et al. Overcoming the data crisis in biodiversity conservation. Trends Ecol. Evol. 33 , 676–688 (2018).

Ashe, E. et al. Minding the data-gap trap: exploring dynamics of abundant dolphin populations under uncertainty. Front. Marine Sci. 8 , 606932 (2021).

Serrouya, R. et al. Saving endangered species using adaptive management. Proc. Natl Acad. Sci. 116 , 6181–6186 (2019).

Barnosky, A. D. et al. Has the Earth’s sixth mass extinction already arrived? Nature 471 , 51–57 (2011).

Kimmel, K., Clark, M. & Tilman, D. Impact of multiple small and persistent threats on extinction risk. Conserv. Biol. 36 , e13901 (2022).

Maxwell, S. L., Fuller, R. A., Brooks, T. M. & Watson, J. E. M. Biodiversity: The ravages of guns, nets and bulldozers. Nature 536 , 143–145 (2016).

CWR. Southern Resident Orca (SRKW) Population , https://www.whaleresearch.com/orca-population (2023).

Balcomb, K. C. & Bigg, M. A. in Behavioral Biology of Killer Whales Zoo Biology Monographs (eds B. C. Kirkevold & J. S. Lockard) 85-95 (Alan R. Liss Inc., 1986).

Williams, R. et al. Reducing vessel noise increases foraging in endangered killer whales. Marine Pollut. Bull. 173 , 112976 (2021).

Williams, R. et al. Destroying and restoring critical habitats of endangered killer whales. Bioscience 71 , 1117–1120 (2021).

Lacy, R. C. et al. Evaluating anthropogenic threats to endangered killer whales to inform effective recovery plans. Sci. Rep. 7 , 1–12 (2017).

Raverty, S. et al. Pathology findings and correlation with body condition index in stranded killer whales ( Orcinus orca ) in the northeastern Pacific and Hawaii from 2004 to 2013. PloS One 15 , e0242505 (2020).

Ford, J. K. B., Ellis, G. M., Olesiuk, P. F. & Balcomb, K. C. Linking killer whale survival and prey abundance: food limitation in the oceans’ apex predator? Biol. Lett. 6 , 139–142 (2010).

Ward, E. J., Holmes, E. E. & Balcomb, K. C. Quantifying the effects of prey abundance on killer whale reproduction. J. Appl. Ecol. 46 , 632–640 (2009).

Linnell, J. D. C. et al. Confronting the costs and conflicts associated with biodiversity. Animal Conserv. 13 , 429–431 (2010).

Nelson, B. W., Ward, E. J., Linden, D. W., Ashe, E. & Williams, R. Identifying drivers of demographic rates in an at-risk population of marine mammals using integrated population models. Ecosphere 15 , e4773 (2024).

Kardos, M. et al. Inbreeding depression explains killer whale population dynamics. Nat. Ecol. Evol. 7 , 675–686 (2023).

Melbourne, B. A. & Hastings, A. Extinction risk depends strongly on factors contributing to stochasticity. Nature 454 , 100–103 (2008).

Clay, A. S. & Visseren-Hamakers, I. J. Individuals Matter: Dilemmas and Solutions in Conservation and Animal Welfare Practices in Zoos. Animals 12 https://doi.org/10.3390/ani12030398 (2022).

Robbins, M. M. et al. Extreme conservation leads to recovery of the Virunga mountain gorillas. PloS One 6 , e19788 (2011).

Sillero-Zubiri, C. et al. Feasibility and efficacy of oral rabies vaccine SAG2 in endangered Ethiopian wolves. Vaccine 34 , 4792–4798 (2016).

Harting, A. L., Johanos, T. C. & Littnan, C. L. Benefits derived from opportunistic survival-enhancing interventions for the Hawaiian monk seal: the silver BB paradigm. Endangered Species Res. 25 , 89–96 (2014).

Gulland, F. M. D. The role of nematode parasites in Soay sheep ( Ovis aries L. ) mortality during a population crash. Parasitology 105 , 493–503 (1992).

Gobush, K., Baker, J. & Gulland, F. Effectiveness of an antihelminthic treatment in improving the body condition and survival of Hawaiian monk seals. Endangered Species Res. 15 , 29–37 (2011).

Viana, M. et al. Dynamics of a morbillivirus at the domestic–wildlife interface: Canine distemper virus in domestic dogs and lions. Proc. Natl Acad. Sci. 112 , 1464–1469 (2015).

Northwest Area Committee. Northwest Area contingency plan. https://www.rrt10nwac.com/NWACP/Default.aspx (2020).

McHugh, K. A. et al. Staying alive: long-term success of bottlenose dolphin interventions in southwest Florida. Front. Mar. Sci. 7 , 1254 (2021).

Ross, P. S., Ellis, G. M., Ikonomou, M. G., Barrett-Lennard, L. G. & Addison, R. F. High PCB concentrations in free-ranging Pacific killer whales, Orcinus orca : effects of age, sex and dietary preference. Marine Pollut. Bull. 40 , 504–515 (2000).

Hickie, B. E., Ross, P. S., Macdonald, R. W. & Ford, J. K. B. Killer whales ( Orcinus orca ) face protracted health risks associated with lifetime exposure to PCBs. Environ. Sci. Technol. 41 , 6613–6619 (2007).

Mos, L., Cameron, M., Jeffries, S. J., Koop, B. F. & Ross, P. S. Risk‐based analysis of polychlorinated biphenyl toxicity in harbor seals. Integrated Environ. Assess. Manag. 6 , 631–640 (2010).

Ross, P. S. et al. Ten guiding principles for the delineation of priority habitat for endangered small cetaceans. Marine Policy 35 , 483–488 (2011).

Braulik, G. T. et al. Red-list status and extinction risk of the world’s whales, dolphins, and porpoises. Conserv. Biol. 37(5), e14090 (2023).

Morin, P. A. et al. Reference genome and demographic history of the most endangered marine mammal, the vaquita. Mol. Ecol. Resour. 21 , 1008–1020 (2021).

Fisheries and Oceans Canada. Recovery Strategy for the Northern and Southern Resident Killer Whales (Orcinus orca) in Canada. Species at Risk Act Recovery Strategy Series (Fisheries and Oceans Canada, 2018).

Dulvy, N. K., Sadovy, Y. & Reynolds, J. D. Extinction vulnerability in marine populations. Fish and Fisheries 4 , 25–64 (2003).

Purvis, A., Gittleman, J. L., Cowlishaw, G. & Mace, G. M. Predicting extinction risk in declining species. Proc. R. Soc. Lond. B Biol. Sci. 267 , 1947–1952 (2000).

Williams, R. et al. Prioritizing global marine mammal habitats using density maps in place of range maps. Ecography 37 , 212–220 (2014).

Chen, P. & Hua, Y. in Biology and Conservation of the River Dolphins (eds Perrin, W. F., Brownell, Jr. R. L., Kaiya, Z. & Jiankang, L.) 81–85 (IUCN, 1989).

Rojas-Bracho, L., Reeves, R. R. & Jaramillo-Legorreta, A. Conservation of the vaquita Phocoena sinus . Mammal Rev. 36 , 179–216 (2006).

Rojas-Bracho, L. & Reeves, R. R. Vaquitas and gillnets: Mexico’s ultimate cetacean conservation challenge. Endangered Species Res. 21 , 77–87 (2013).

Jaramillo-Legorreta, A. M. et al. Decline towards extinction of Mexico’s vaquita porpoise ( Phocoena sinus ). R. Soc. Open Sci. 6 , 190598 (2019).

Pace, R. M., Corkeron, P. J. & Kraus, S. D. State–space mark–recapture estimates reveal a recent decline in abundance of North Atlantic right whales. Ecol. Evol. 7 , 8730–8741 (2017).

Taylor, B. L. & Gerrodette, T. The uses of statistical power in conservation biology: the vaquita and northern spotted owl. Conserv. Biol. 7 , 489–500 (1993).

Kozlov, M. US will vaccinate birds against avian flu for first time—what researchers think. Nature 618 , 220–221 (2023).

May, R. M. Species conservation: The cautionary tale of the black-footed ferret. Nature 320 , 13–14 (1986).

Mueller, T., O’Hara, R. B., Converse, S. J., Urbanek, R. P. & Fagan, W. F. Social learning of migratory performance. Science 341 , 999–1002 (2013).

Zimmerman, D. M. et al. Projecting the impact of an ebola virus outbreak on endangered mountain gorillas. Sci. Rep. 13 , 5675 (2023).

Ardern, S. & Lambert, D. Is the black robin in genetic peril? Mol. Ecol. 6 , 21–28 (1997).

Ruiz-Miranda, C. R. et al. Estimating population sizes to evaluate progress in conservation of endangered golden lion tamarins ( Leontopithecus rosalia ). Plos One 14 , e0216664 (2019).

McLoughlin, P. D., Dzus, E., Wynes, B. O. B. & Boutin, S. Declines in populations of woodland caribou. J. Wildlife Manag. 67 , 755–761 (2003).

Levy, S. The new top dog. Nature 485 , 296 (2012).

Delach, A. et al. Agency plans are inadequate to conserve US endangered species under climate change. Nat. Clim. Change 9 , 999–1004 (2019).

Hillman-Smith, K., ma Oyisenzoo M., & Smith, F. A last chance to save the northern white rhino? Oryx 20 , 20–26 (1986).

Callaway, E. Geneticists aim to save rare rhino. Nature 533 , 20–21 (2016).

van der Knaap, I. et al. Behavioural responses of wild Pacific salmon and herring to boat noise. Mar Pollut Bull 174 , 113257 (2022).

Williams, R., Veirs, S., Veirs, V., Ashe, E. & Mastick, N. Approaches to reduce noise from ships operating in important killer whale habitats. Marine Pollut. Bull. 139 , 459–469 (2019).

Joy, R. et al. Potential benefits of vessel slowdowns on endangered southern resident killer whales. Front. Marine Sci. 6 , 344 (2019).

Gaydos, J. K., Thixton, S. & Donatuto, J. Evaluating Threats in Multinational Marine Ecosystems: A Coast Salish First Nations and Tribal Perspective. PLoS One 10 , e0144861 (2015).

National Marine Fisheries Service. Recovery Plan for Southern Resident Killer Whales ( Orcinus orca ). 251 (National Marine Fisheries Service, Northwest Region, Seattle, WA, USA, 2008).

Gerber, L. R. Conservation triage or injurious neglect in endangered species recovery. Proc. Natl Acad. Sci. 113 , 3563–3566 (2016).

Martin, T. G. et al. Acting fast helps avoid extinction. Conserv. Lett. 5 , 274–280 (2012).

Jaramillo-Legorreta, A. et al. Saving the Vaquita: Immediate Action, Not More Data. Conserv. Biol. 21 , 1653–1655 (2007).

Kuussaari, M. et al. Extinction debt: a challenge for biodiversity conservation. Trends Ecol. Evol. 24 , 564–571 (2009).

Nature. Is a single target the best way to cut biodiversity loss? Nature 583 , 7 (2020).

Hall, A. J. et al. Individual-Based Model Framework to Assess Population Consequences of Polychlorinated Biphenyl Exposure in Bottlenose Dolphins. Environ. Health Perspect. 114 , 60–64 (2006).

Hall, A. J. et al. Predicting the effects of polychlorinated biphenyls on cetacean populations through impacts on immunity and calf survival. Environ. Pollut. 233 , 407–418 (2018).

Crozier, L. G., Burke, B. J., Chasco, B. E., Widener, D. L. & Zabel, R. W. Climate change threatens Chinook salmon throughout their life cycle. Commun. Biol. 4 , 222 (2021).

Nelson, B. W. Ecosphere-SRKW-modeling. github https://github.com/benjaminnelson/Ecosphere-SRKW-modeling (2023).

Acknowledgements

We are grateful to Puget Sound Partnership for funding this study, with special thanks to Scott Redman. This project was a collaborative, interdisciplinary endeavor. We thank both US and Canadian government, community, and non-profit partners who contributed to and attended a webinar on “Assessing the status, threats, and options for the Southern Resident killer whale”. The Vortex PVA software is made freely available by the Species Conservation Toolkit Initiative and its sponsoring organizations. Open Access page charges were covered by the BC Ministry of Agriculture and Lands and SeaDoc Society, a program of the UC Davis Karen C. Drayer Wildlife Health Center.

Author information

Authors and Affiliations

Oceans Initiative, 117 E. Louisa Street #135, Seattle, WA, 98102, USA

Rob Williams, Erin Ashe, Kimberly A. Nielsen, Stephanie Reiss & Marena Salerno Collins

Chicago Zoological Society, Brookfield, IL, 60513, USA

Robert C. Lacy

Raincoast Conservation Foundation, Sidney, BC, Canada

Lance Barrett-Lennard, Misty MacDuffee & Peter S. Ross

Fisheries and Oceans Canada, 4160 Marine Drive, West Vancouver, BC, V7V 1N6, Canada

Tanya M. Brown

SeaDoc Society, UC Davis Wildlife Health Center – Orcas Island Office, 1020 Deer Harbor Rd., Eastsound, WA, 98245, USA

Joseph K. Gaydos

Wildlife Health Center, School of Veterinary Medicine, University of California, One Shields Avenue, Davis, CA, 95616, USA

Frances Gulland

Independent Consultant, Seattle, WA, 98107, USA

Benjamin W. Nelson

San Diego Zoo Wildlife Alliance, 2920 Zoo Dr, San Diego, CA, 92101, USA

Hendrik Nollens

Animal Health Center, 1767 Angus Campbell Road, Abbotsford, BC, V3G2M3, Canada

Stephen Raverty

Department of Wildlife Management, North Slope Borough, Utqiagvik, AK, 99723, USA

Raphaela Stimmelmayr

Department of Geography, University of Victoria, PO Box 1700 STN CSC, Victoria, BC, V8W 2Y2, Canada

Paul Paquet

Contributions

R.W. conceived of the study, coordinated the project, and contributed to writing. R.C.L. conceived of the study, performed all statistical modeling with custom code, and contributed to writing and editing. E.A. conceived of the study and contributed to writing. L.B.L. contributed to writing. T.M.B. contributed to writing. J.K.G. contributed to writing and editing. F.G. contributed to writing. M.M. contributed to writing and editing. B.W.N. parameterized the prey-demography part of the model. K.A.N. generated figures and contributed to writing and editing. H.N. contributed to writing, particularly related to veterinary interventions. S. Raverty contributed to writing. S. Reiss assisted with project coordination. P.S.R. contributed to writing, particularly related to prey impacts. M.S.C. generated figures and contributed to writing and editing. R.S. contributed to writing. P.P. contributed to writing and editing.

Corresponding author

Correspondence to Rob Williams .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Communications Earth & Environment thanks Robert Harcourt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Clare Davis. A peer review file is available.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Williams, R., Lacy, R.C., Ashe, E. et al. Warning sign of an accelerating decline in critically endangered killer whales ( Orcinus orca ). Commun Earth Environ 5 , 173 (2024). https://doi.org/10.1038/s43247-024-01327-5

Download citation

Received : 01 October 2023

Accepted : 15 March 2024

Published : 02 April 2024

DOI : https://doi.org/10.1038/s43247-024-01327-5

