


Business Insights

Harvard Business School Online's Business Insights Blog provides the career insights you need to achieve your goals and gain confidence in your business skills.


5 Benefits of Learning Through the Case Study Method


  • 28 Nov 2023

While several factors make HBS Online unique, including a global Community and real-world outcomes, active learning through the case study method rises to the top.

In a 2023 City Square Associates survey, 74 percent of HBS Online learners who also took a course from another provider said HBS Online’s case method and real-world examples were better by comparison.

Here’s a primer on the case method, five benefits you could gain, and how to experience it for yourself.


What Is the Harvard Business School Case Study Method?

The case study method, or case method, is a learning technique in which you’re presented with a real-world business challenge and asked how you’d solve it. After working through it yourself and with peers, you’re told how the scenario played out.

HBS pioneered the case method in 1922, shortly after the first case was written in 1921.

“How do you go into an ambiguous situation and get to the bottom of it?” says HBS Professor Jan Rivkin, former senior associate dean and chair of HBS's master of business administration (MBA) program, in a video about the case method. “That skill—the skill of figuring out a course of inquiry to choose a course of action—that skill is as relevant today as it was in 1921.”

Originally developed for the in-person MBA classroom, HBS Online adapted the case method into an engaging, interactive online learning experience in 2014.

In HBS Online courses, you learn about each case from the business professional who experienced it. After reviewing their videos, you’re prompted to take their perspective and explain how you’d handle their situation.

You then get to read peers’ responses, “star” them, and comment to further the discussion. Afterward, you learn how the professional handled it and their key takeaways.

HBS Online’s adaptation of the case method incorporates the famed HBS “cold call,” in which you’re called on at random to make a decision without time to prepare.

“Learning came to life!” said Sheneka Balogun, chief administration officer and chief of staff at LeMoyne-Owen College, of her experience taking the Credential of Readiness (CORe) program. “The videos from the professors, the interactive cold calls where you were randomly selected to participate, and the case studies that enhanced and often captured the essence of objectives and learning goals were all embedded in each module. This made learning fun, engaging, and student-friendly.”

If you’re considering taking a course that leverages the case study method, here are five benefits you could experience.

5 Benefits of Learning Through Case Studies

1. Take New Perspectives

The case method prompts you to consider a scenario from another person’s perspective. To work through the situation and come up with a solution, you must consider their circumstances, limitations, risk tolerance, stakeholders, resources, and potential consequences to assess how to respond.

Taking on new perspectives can help you navigate not only your own challenges but also other people’s. Putting yourself in someone else’s situation to understand their motivations and needs can go a long way when collaborating with stakeholders.

2. Hone Your Decision-Making Skills

Another skill you can build is the ability to make decisions effectively. The case study method forces you to use limited information to decide how to handle a problem—just like in the real world.

Throughout your career, you’ll need to make difficult decisions with incomplete or imperfect information—and sometimes, you won’t feel qualified to do so. Learning through the case method allows you to practice this skill in a low-stakes environment. When facing a real challenge, you’ll be better prepared to think quickly, collaborate with others, and present and defend your solution.

3. Become More Open-Minded

As you collaborate with peers on responses, it becomes clear that not everyone solves problems the same way. Exposing yourself to various approaches and perspectives can help you become a more open-minded professional.

When you’re part of a diverse group of learners from around the world, your experiences, cultures, and backgrounds contribute to a range of opinions on each case.

On the HBS Online course platform, you’re prompted to view and comment on others’ responses, and discussion is encouraged. This practice of considering others’ perspectives can make you more receptive in your career.

“You’d be surprised at how much you can learn from your peers,” said Ratnaditya Jonnalagadda, a software engineer who took CORe.

In addition to interacting with peers in the course platform, Jonnalagadda was part of the HBS Online Community, where he networked with other professionals and continued discussions sparked by course content.

“You get to understand your peers better, and students share examples of businesses implementing a concept from a module you just learned,” Jonnalagadda said. “It’s a very good way to cement the concepts in one's mind.”

4. Enhance Your Curiosity

One byproduct of taking on different perspectives is that it enables you to picture yourself in various roles, industries, and business functions.

“Each case offers an opportunity for students to see what resonates with them, what excites them, what bores them, which role they could imagine inhabiting in their careers,” says former HBS Dean Nitin Nohria in the Harvard Business Review. “Cases stimulate curiosity about the range of opportunities in the world and the many ways that students can make a difference as leaders.”

Through the case method, you can “try on” roles you may not have considered and feel more prepared to change or advance your career.

5. Build Your Self-Confidence

Finally, learning through the case study method can build your confidence. Each time you assume a business leader’s perspective, aim to solve a new challenge, and express and defend your opinions and decisions to peers, you prepare to do the same in your career.

According to a 2022 City Square Associates survey, 84 percent of HBS Online learners report feeling more confident making business decisions after taking a course.

“Self-confidence is difficult to teach or coach, but the case study method seems to instill it in people,” Nohria says in the Harvard Business Review. “There may well be other ways of learning these meta-skills, such as the repeated experience gained through practice or guidance from a gifted coach. However, under the direction of a masterful teacher, the case method can engage students and help them develop powerful meta-skills like no other form of teaching.”


How to Experience the Case Study Method

If the case method seems like a good fit for your learning style, experience it for yourself by taking an HBS Online course. Offerings span seven subject areas, including:

  • Business essentials
  • Leadership and management
  • Entrepreneurship and innovation
  • Finance and accounting
  • Business in society

No matter which course or credential program you choose, you’ll examine case studies from real business professionals, work through their challenges alongside peers, and gain valuable insights to apply to your career.

Are you interested in discovering how HBS Online can help advance your career? Explore our course catalog and download our free guide, complete with interactive workbook sections, to determine if online learning is right for you and which course to take.


Teaching Resources Library

Case Studies

The teaching business case studies available here are narratives that facilitate class discussion about a particular business or management issue. Teaching cases are meant to spur debate among students rather than promote a particular point of view or steer students in a specific direction.  Some of the case studies in this collection highlight the decision-making process in a business or management setting. Other cases are descriptive or demonstrative in nature, showcasing something that has happened or is happening in a particular business or management environment. Whether decision-based or demonstrative, case studies give students the chance to be in the shoes of a protagonist. With the help of context and detailed data, students can analyze what they would and would not do in a particular situation, why, and how.


Decision making and problem solving


Building Consensus Around Difficult Strategic Decisions

  • Scott D. Anthony
  • Natalie Painchaud
  • Andy Parker
  • October 27, 2023


When Should Multinationals Move Back into Venezuela?

  • Pablo González Alonso
  • Alejandro Valerio
  • September 01, 2017

Betting on the Future: The Virtues of Contingent Contracts

  • Max H. Bazerman
  • James J. Gillespie
  • From the September–October 1999 Issue

How Anger Poisons Decision Making

  • Jennifer S. Lerner
  • Katherine Shonk
  • From the September 2010 Issue


Assess Whether You Have a Data Quality Problem

  • Thomas C. Redman
  • July 28, 2016

The Year Ahead: Make Better Decisions

  • Thomas H. Davenport
  • January 05, 2009

Winning in Turbulence: A Downturn Caution–Be Careful What You Cut

  • Darrell Rigby and Hernan Saenz
  • January 14, 2009

Is Jerry Yang’s Bond to Yahoo Too Tight?

  • Barbara Kellerman
  • May 06, 2008


How Managers Can Build a Culture of Experimentation

  • Frank V. Cespedes
  • February 15, 2022


Data Scientists Don't Scale

  • Stuart Frankel
  • May 22, 2015


How Women Manage the Gendered Norms of Leadership

  • Alyson Meister
  • November 28, 2018


A Checklist for Making Faster, Better Decisions

  • Erik Larson
  • March 07, 2016


To Make Better Choices, Look at All Your Options Together

  • Shankha Basu
  • Krishna Savani
  • June 28, 2017

To Make Better Decisions, Combine Datasets

  • Rita Gunther McGrath
  • September 04, 2014

The Ethics of Resume Writing

  • Clinton D. Korver
  • May 19, 2008

Competing Through Manufacturing

  • Steven C. Wheelwright
  • Robert H. Hayes
  • From the January 1985 Issue


The Risks You Can’t Foresee

  • Robert S. Kaplan
  • Herman B. Leonard
  • Anette Mikes
  • From the November–December 2020 Issue

Breakthrough ideas for 2006

  • From the February 2006 Issue

We Don’t Know What We Don’t Know

  • Tony Schwartz
  • July 26, 2011

Don’t Take the Wrong Decision Shortcuts

  • Steve Martin
  • November 10, 2010


The Age of Outrage: How to Lead in a Polarized World

  • Karthik Ramanna
  • October 29, 2024


Andersen Consulting - EMEAI: Reorganization for Revitalization

  • Ashish Nanda
  • Michael Y. Yoshino
  • October 11, 1995

Group Process in the Challenger Launch Decision (A)

  • Amy C. Edmondson
  • Laura R. Feldman
  • October 15, 2002

Utah Symphony and Utah Opera: A Merger Proposal

  • Thomas J. DeLong
  • David L. Ager
  • June 14, 2004

AirTrans Airways, West Coast Service

  • Jose Gomez-Ibanez
  • March 01, 2005

Mount Everest--1996

  • Michael A. Roberto
  • Gina M. Carioggia
  • November 12, 2002

Gentle Electric Co.

  • W. Earl Sasser Jr.
  • August 09, 2002

Extend Fertility

  • Myra M. Hart
  • Sylvia Sensiper
  • December 16, 2004

Aster DM Healthcare: Budgeting for a Crisis

  • V.G. Narayanan
  • Amy Klopfenstein
  • January 25, 2021

Saint Elizabeth: Innovation in Health Care

  • David Barrett
  • Ramasastry Chandrasekhar
  • December 08, 2016


Create Detailed Action Plans

  • Harvard Business Publishing
  • May 16, 2016


Prediction Machines, Updated and Expanded: The Simple Economics of Artificial Intelligence

  • Ajay Agrawal
  • Joshua Gans
  • Avi Goldfarb
  • November 15, 2022

Note on the Technical Aspects of Programming in Nonprofit Organizations

  • David W. Young
  • March 13, 2013


Preparing a Budget

  • Harvard Business Press
  • May 04, 2009


Open Talent: Leveraging the Global Workforce to Solve Your Biggest Challenges

  • John Winsor
  • Jin Hyun Paik
  • January 16, 2024

Hawaii Best, Inc. (A)

  • Steven C. Brandt
  • September 27, 2002

TwinHills Centro: Social Return on Investment

  • Mahrukh Tahir
  • Elizabeth Henderson
  • Irene M. Herremans
  • June 14, 2016

Evan Williams: From Blogger to Odeo (B)

  • Noam Wasserman
  • December 07, 2008

Hola, Bandida: Launching a Beverage Brand with Purpose

  • Violina Rindova
  • Rebecca Castillo
  • Katie Lanfranki
  • Allison Monroe
  • July 07, 2021


Regrets Are Inevitable. Start Learning From Them.

  • Alison Beard
  • March 01, 2022

AquaSafi Purification Systems: Changing the Operating Model, Teaching Note

  • Elizabeth M.A. Grasby
  • June 08, 2017


The Lost Art of Thinking for Yourself

  • June 18, 2020



  • Open access
  • Published: 09 September 2022

Machine learning in project analytics: a data-driven framework and case study

Shahadat Uddin, Stephen Ong & Haohui Lu

Scientific Reports volume 12, Article number: 15252 (2022)


  • Applied mathematics
  • Computational science

The analytic procedures incorporated to facilitate the delivery of projects are often referred to as project analytics. Existing techniques focus on retrospective reporting and understanding the underlying relationships to make informed decisions. Although machine learning algorithms have been widely used in addressing problems within various contexts (e.g., streamlining the design of construction projects), limited studies have evaluated pre-existing machine learning methods within the delivery of construction projects. Due to this, the current research aims to contribute further to this convergence between artificial intelligence and the execution of construction projects through the evaluation of a specific set of machine learning algorithms. This study proposes a machine learning-based data-driven research framework for addressing problems related to project analytics. It then illustrates an example of the application of this framework. In this illustration, existing data from an open-source data repository on construction projects and cost overrun frequencies was studied, and several machine learning models (Python’s Scikit-learn package) were tested and evaluated. The data consisted of 44 independent variables (from materials to labour and contracting) and one dependent variable (project cost overrun frequency), which was categorised for processing under several machine learning models. These models include support vector machine, logistic regression, k-nearest neighbour, random forest, stacking (ensemble) model and artificial neural network. Feature selection and evaluation methods, including univariate feature selection, recursive feature elimination, SelectFromModel and the confusion matrix, were applied to determine the most accurate prediction model. This study also discusses the generalisability of using the proposed research framework in other research contexts within the field of project management. The proposed framework, its illustration in the context of construction projects and its potential to be adopted in different contexts will significantly contribute to project practitioners, stakeholders and academics in addressing many project-related issues.

Introduction

Successful projects require the presence of appropriate information and technology 1 . Project analytics provides an avenue for informed decisions to be made through the lifecycle of a project. Project analytics applies various statistics (e.g., earned value analysis or Monte Carlo simulation) among other models to make evidence-based decisions. They are used to manage risks as well as project execution 2 . There is a tendency for project analytics to be employed due to other additional benefits, including an ability to forecast and make predictions, benchmark with other projects, and determine trends such as those that are time-dependent 3 , 4 , 5 . There has been increasing interest in project analytics and how current technology applications can be incorporated and utilised 6 . Broadly, project analytics can be understood on five levels 4 . The first is descriptive analytics which incorporates retrospective reporting. The second is known as diagnostic analytics , which aims to understand the interrelationships and underlying causes and effects. The third is predictive analytics which seeks to make predictions. Subsequent to this is prescriptive analytics , which prescribes steps following predictions. Finally, cognitive analytics aims to predict future problems. The first three levels can be applied with ease with the help of technology. The fourth and fifth steps require data that is generally more difficult to obtain as they may be less accessible or unstructured. Further, although project key performance indicators can be challenging to define 2 , identifying common measurable features facilitates this 7 . It is anticipated that project analytics will continue to experience development due to its direct benefits to the major baseline measures focused on productivity, profitability, cost, and time 8 . The nature of project management itself is fluid and flexible, and project analytics allows an avenue for which machine learning algorithms can be applied 9 .

Machine learning within the field of project analytics falls into the category of cognitive analytics, which deals with problem prediction. Generally, machine learning explores how computers can improve processes through training or experience 10 . It can also build on the pre-existing capabilities and techniques prevalent within management to accomplish complex tasks 11 . Due to its practical use and broad applicability, recent developments have led to the invention and introduction of newer and more innovative machine learning algorithms and techniques. Artificial intelligence, for instance, allows software to perform computer vision, speech recognition, natural language processing, robot control, and other tasks 10 . Specific to the construction industry, it is now used to monitor construction environments through virtual reality and building information modelling replication 12 or risk prediction 13 . Within other industries, such as consumer services and transport, machine learning is being applied to improve consumer experiences and satisfaction 10 , 14 and reduce the human errors of traffic controllers 15 . Recent applications and developments of machine learning broadly fall into the categories of classification, regression, ranking, clustering, dimensionality reduction and manifold learning 16 . Current learning models include linear predictors, boosting, stochastic gradient descent, kernel methods, and nearest neighbour, among others 11 . Newer applications and learning models are continuously being introduced to improve accessibility and effectiveness.

Specific to the management of construction projects, other studies have also been made to understand how copious amounts of project data can be used 17 , the importance of ontology and semantics throughout the nexus between artificial intelligence and construction projects 18 , 19 as well as novel approaches to the challenges within this integration of fields 20 , 21 , 22 . There have been limited applications of pre-existing machine learning models on construction cost overruns. They have predominantly focussed on applications to streamline the design processes within construction 23 , 24 , 25 , 26 , and those which have investigated project profitability have not incorporated the types and combinations of algorithms used within this study 6 , 27 . Furthermore, existing applications have largely been skewed towards one type or another 28 , 29 .

In addition to the frequently used earned value method (EVM), researchers have been applying many other powerful quantitative methods to address a diverse range of project analytics research problems over time. Examples of those methods include time series analysis, fuzzy logic, simulation, network analytics, and network correlation and regression. Time series analysis uses longitudinal data to forecast an underlying project's future needs, such as time and cost 30 , 31 , 32 . Some other methods are combined with EVM to find better solutions to the underlying research problems. For example, Narbaev and De Marco 33 integrated growth models and EVM for forecasting project cost at completion using data from construction projects. For analysing the ongoing progress of projects having ambiguous or linguistic outcomes, fuzzy logic is often combined with EVM 34 , 35 , 36 . Yu et al. 36 applied fuzzy theory and EVM for schedule management. Ponz-Tienda et al. 35 found that using fuzzy arithmetic on EVM provided more objective results in uncertain environments than the traditional methodology. Bonato et al. 37 integrated EVM with Monte Carlo simulation to predict the final cost of three engineering projects. Batselier and Vanhoucke 38 compared the accuracy of project time and cost forecasting using EVM and simulation, and found that the simulation results supported findings from the EVM. Network methods are primarily used to analyse project stakeholder networks. Yang and Zou 39 developed a social network theory-based model to explore stakeholder-associated risks and their interactions in complex green building projects. Uddin 40 proposed a social network analytics-based framework for analysing stakeholder networks. Ong and Uddin 41 further applied network correlation and regression to examine the co-evolution of stakeholder networks in collaborative healthcare projects.
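As a minimal illustration of the earned value calculations these extensions build on, the core EVM indices can be computed directly from planned value, earned value and actual cost. The budget figures below are hypothetical, not taken from any of the cited studies:

```python
# Core earned value method (EVM) indices for a hypothetical project.
# BAC: budget at completion; PV: planned value; EV: earned value; AC: actual cost.
BAC = 1_000_000   # total approved budget
PV = 400_000      # value of work scheduled to date
EV = 350_000      # value of work actually completed to date
AC = 420_000      # actual cost incurred to date

CPI = EV / AC     # cost performance index (< 1 means over budget)
SPI = EV / PV     # schedule performance index (< 1 means behind schedule)
EAC = BAC / CPI   # estimate at completion, assuming current cost efficiency persists
CV = EV - AC      # cost variance (negative: spending exceeds value earned)
SV = EV - PV      # schedule variance (negative: behind the planned schedule)

print(f"CPI={CPI:.2f}, SPI={SPI:.2f}, EAC={EAC:,.0f}")
```

Methods such as the growth-model and fuzzy extensions above replace or augment these deterministic ratios when the inputs are uncertain or evolve over time.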
Although many other methods have already been used, as evident in the current literature, machine learning methods or models are yet to be adopted for addressing research problems related to project analytics. The current investigation is derived from the cognitive analytics component of project analytics. It proposes an approach for determining hidden information and patterns to assist with project delivery. Figure  1 illustrates a tree diagram showing different levels of project analytics and their associated methods from the literature. It also illustrates existing methods within the cognitive component of project analytics to where the application of machine learning is situated contextually.

Figure 1. A tree diagram of different project analytics methods, showing where the current study sits. Although earned value analysis is commonly used in project analytics, we do not include it in this figure since it is used in the first three levels of project analytics.

Machine learning models have several notable advantages over traditional statistical methods that play a significant role in project analytics 42 . First, machine learning algorithms can quickly identify trends and patterns by simultaneously analysing a large volume of data. Second, they are more capable of continuous improvement: machine learning algorithms can improve their accuracy and efficiency for decision-making through subsequent training on new data. Third, machine learning algorithms efficiently handle multi-dimensional and multi-variety data in dynamic or uncertain environments. Fourth, they are well suited to automating various decision-making tasks; for example, machine learning-based sentiment analysis can easily detect a negative tweet and automatically take further necessary steps. Last but not least, machine learning has been helpful across various industries, from defence to education 43 . Current research has seen the development of several different branches of artificial intelligence (including robotics, automated planning and scheduling, and optimisation) within safety monitoring, risk prediction, cost estimation and so on 44 . This has progressed from the application of regression to project cost overruns 45 to current deep-learning implementations within the construction industry 46 . Despite this, the uses remain largely limited and are still in a developmental state. The benefits of applications, such as optimising and streamlining existing processes, are noted; however, high initial costs form a barrier to accessibility 44 .

The primary goal of this study is to demonstrate the applicability of different machine learning algorithms in addressing problems related to project analytics. Limitations in applying machine learning algorithms within the context of construction projects have been explored previously. However, preceding research has mainly been conducted to improve the design processes specific to construction 23 , 24 , and those investigating project profitability have not incorporated the types and combinations of algorithms used within this study 6 , 27 . For instance, preceding research has incorporated a different combination of machine learning algorithms to predict construction delays 47 . To contribute to the proposed study direction, this study first proposes a machine learning-based data-driven research framework for project analytics and then applies this framework to a case study of construction projects. Although there are three different types of machine learning (supervised, unsupervised and semi-supervised), supervised machine learning models are most commonly used due to their efficiency and effectiveness in addressing many real-world problems 48 . Therefore, we use machine learning to refer to supervised machine learning throughout the rest of this article. The contribution of this study is significant in that it considers the applications of machine learning within project management. Project management is often thought of as being very fluid in nature, and because of this, applications of machine learning are often more difficult 9 , 49 . Further, existing implementations have largely been limited to safety monitoring, risk prediction, cost estimation and so on 44 . Through the evaluation of machine learning applications, this study further demonstrates a case study in which algorithms can be used to consider and model the relationship between project attributes and a project performance measure (i.e., cost overrun frequency).

Machine learning-based framework for project analytics

When and why machine learning for project analytics

Machine learning models are typically used for research problems that involve predicting the classification outcome of a categorical dependent variable. Therefore, they can be applied in the context of project analytics if the underlying objective variable is a categorical one. If that objective variable is non-categorical, it must first be converted into a categorical variable. For example, if the objective or target variable is the project cost, we can convert this variable into a categorical variable by taking only two possible values. The first value would be 0 to indicate a low-cost project, and the second could be 1 for showing a high-cost project. The average or median cost value for all projects under consideration can be considered for splitting project costs into low-cost and high-cost categories.
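The conversion described above can be sketched in a few lines. The cost figures here are hypothetical stand-ins for a real project dataset, with the median as the split point:

```python
import numpy as np

# Hypothetical project costs (in $M); in practice these come from the project dataset.
costs = np.array([1.2, 3.4, 0.8, 5.1, 2.2, 4.7, 1.9, 3.0])

# Split at the median: 0 = low-cost project, 1 = high-cost project.
threshold = np.median(costs)
cost_class = (costs > threshold).astype(int)

print(f"threshold={threshold}, classes={cost_class}")
```

The resulting 0/1 labels can then serve as the categorical dependent variable for any of the classifiers discussed later.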

For data-driven decision-making, machine learning models are advantageous because traditional statistical methods (e.g., ordinary least squares (OLS) regression) make assumptions about the underlying research data in order to produce explicit formulae for the target measures. Unlike these statistical methods, machine learning algorithms learn patterns directly from the data. For instance, for a non-linear but separable dataset, an OLS regression model is not the right choice due to its linearity assumption, whereas a machine learning model can easily separate the dataset into the underlying classes. Figure  2 (a) presents a situation where machine learning models perform better than traditional statistical methods.

figure 2

( a ) An illustration showing the superior performance of machine learning models compared with the traditional statistical models using an abstract dataset with two attributes (X 1 and X 2 ). The data points within this abstract dataset consist of two classes: one represented with a transparent circle and the second class illustrated with a black-filled circle. These data points are non-linear but separable. Traditional statistical models (e.g., ordinary least square regression) will not accurately separate these data points. However, any machine learning model can easily separate them without making errors; and ( b ) Traditional programming versus machine learning.

Similarly, machine learning models are compelling when the underlying research dataset has many attributes or independent measures. Such models can identify the features that contribute most to classification performance regardless of their distributions or collinearity, whereas traditional statistical methods are prone to biased results when independent variables are correlated. Machine learning-based studies specific to project analytics remain limited. Nonetheless, there have been tangential studies on the use of artificial intelligence to improve cost estimation and risk prediction 44, and models have been implemented to optimise existing processes 50.

Machine learning versus traditional programming

Machine learning can be thought of as a process of teaching a machine (i.e., computers) to learn from data and adjust or apply its present knowledge when exposed to new data 42 . It is a type of artificial intelligence that enables computers to learn from examples or experiences. Traditional programming requires some input data and some logic in the form of code (program) to generate the output. Unlike traditional programming, the input data and their corresponding output are fed to an algorithm to create a program in machine learning. This resultant program can capture powerful insights into the data pattern and can be used to predict future outcomes. Figure  2 (b) shows the difference between machine learning and traditional programming.

Proposed machine learning-based framework

Figure  3 illustrates the proposed machine learning-based research framework of this study. The framework starts with breaking the project research dataset into the training and test components. As mentioned in the previous section, the research dataset may have many categorical and/or nominal independent variables, but its single dependent variable must be categorical. Although there is no strict rule for this split, the training data size is generally more than or equal to 50% of the original dataset 48 .

figure 3

The proposed machine learning-based data-driven framework.

Machine learning algorithms can only handle variables with numerical values. So, when one or more of the underlying categorical variables has a textual or string outcome, we must first convert it into corresponding numerical values. Suppose a variable can take only three textual outcomes (low, medium and high); in that case, we could use, for example, 1 to represent low , 2 to represent medium , and 3 to represent high . Other statistical techniques, such as RIDIT (relative to an identified distribution) scoring 51, can also be used to convert ordered categorical measurements into quantitative ones. RIDIT is a parametric approach that uses probabilistic comparison to determine the statistical differences between ordered categorical groups. The remaining components of the proposed framework are briefly described in the following subsections.
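The simple ordinal mapping from the low/medium/high example above can be sketched as follows (the response values are illustrative; RIDIT scoring would replace the hand-coded mapping):

```python
# Map ordered textual outcomes to numeric codes, as described above.
mapping = {"low": 1, "medium": 2, "high": 3}
responses = ["low", "high", "medium", "low", "high"]  # illustrative survey answers
encoded = [mapping[r] for r in responses]
print(encoded)  # [1, 3, 2, 1, 3]
```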

Model-building procedure

The next step of the framework is to follow the model-building procedure to develop the desired machine learning models using the training data. The first step of this procedure is to select suitable machine learning algorithms or models. Among the available machine learning algorithms, the commonly used ones are support vector machine, logistic regression, k -nearest neighbours, artificial neural network, decision tree and random forest 52. One can also select an ensemble machine learning model as the desired algorithm. An ensemble machine learning method uses multiple algorithms, or the same algorithm multiple times, to achieve better predictive performance than could be obtained from any of the constituent learning models alone 52. Three widely used ensemble approaches are bagging, boosting and stacking. In bagging, the research dataset is divided into equal-sized subsets, and the underlying machine learning algorithm is then applied to these subsets for classification. In boosting, random samples of the dataset are fitted and trained sequentially with different models, each compensating for the weaknesses of the model trained immediately before it. Stacking combines different weak machine learning models in a heterogeneous way to improve predictive performance. For example, the random forest algorithm is an ensemble of different decision tree models 42.

Second, each selected machine learning model is processed through the k -fold cross-validation approach to improve predictive efficiency. In k -fold cross-validation, the training data are divided into k folds. In each iteration, k−1 folds are used to train the selected machine learning models, and the remaining fold is used for validation. This process continues until each of the k folds has been used once for validation. The final predictive efficiency of the trained models is based on the average of the outcomes of these iterations. In addition to this average value, researchers use the standard deviation of the results from different iterations as the predictive training efficiency. Supplementary Fig 1 shows an illustration of k -fold cross-validation.
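The k-fold procedure can be sketched with scikit-learn's `cross_val_score`, here on a synthetic stand-in dataset sized like the case study (139 instances, 44 features); the model choice is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the project dataset (hypothetical shape).
X, y = make_classification(n_samples=139, n_features=44, random_state=0)

model = RandomForestClassifier(random_state=0)
# Fivefold cross-validation: each fold takes one turn as the validation set.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

The mean and standard deviation of `scores` correspond to the average value and spread reported in the text above.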

Third, most machine learning algorithms require pre-defined values for several of their parameters, known as hyperparameters; the process of selecting these values is called hyperparameter tuning. The settings of these parameters play a vital role in the performance achieved by the underlying algorithm. For a given machine learning algorithm, the optimal values can differ from one dataset to another, so the same algorithm must be run multiple times with different parameter values to find the optimal ones for a given dataset. Many search methods are available in the literature, such as grid search 53. In grid search, hyperparameters are arranged in a discrete grid, where each grid point represents a specific combination of the underlying model parameters. The parameter values of the point that yields the best performance are the optimal parameter values 53.
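A grid search over a small hyperparameter grid might look like the following; the grid values and the SVM model are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=139, n_features=10, random_state=0)

# Discrete grid of candidate hyperparameter values; each grid point is
# one combination of C and kernel.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
```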

Testing of the developed models and reporting results

Once the desired machine learning models have been developed using the training data, they need to be tested using the test data. The underlying trained model is then applied to predict its dependent variable for each data instance. Therefore, for each data instance, two categorical outcomes will be available for its dependent variable: one predicted using the underlying trained model, and the other is the actual category. These predicted and actual categorical outcome values are used to report the results of the underlying machine learning model.

The fundamental tool for reporting results from machine learning models is the confusion matrix, which consists of four integer values 48. The first value represents the number of positive cases correctly identified as positive by the underlying trained model (true-positive). The second value indicates the number of positive instances incorrectly identified as negative (false-negative). The third value represents the number of negative cases incorrectly identified as positive (false-positive). Finally, the fourth value indicates the number of negative instances correctly identified as negative (true-negative). Researchers also use a few performance measures based on the four values of the confusion matrix to report machine learning results. The most used measure is accuracy, which is the ratio of the number of correct predictions (true-positive + true-negative) to the total number of data instances (the sum of all four values of the confusion matrix). Other measures commonly used to report machine learning results are precision, recall and F1-score. Precision refers to the ratio between true-positives and the total number of positive predictions (i.e., true-positive + false-positive), often used to indicate the quality of a positive prediction made by a model 48. Recall, also known as the true-positive rate, is calculated by dividing true-positive by the number of data instances that should have been predicted as positive (i.e., true-positive + false-negative). F1-score is the harmonic mean of the last two measures, i.e., [(2 × Precision × Recall)/(Precision + Recall)], and the error rate equals (1 − Accuracy).
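The four measures can be computed directly from the confusion-matrix counts; the counts below are illustrative:

```python
# Illustrative confusion-matrix counts: true-positive, false-negative,
# false-positive, true-negative.
tp, fn, fp, tn = 18, 5, 1, 16

accuracy   = (tp + tn) / (tp + fn + fp + tn)   # correct predictions / all instances
precision  = tp / (tp + fp)                    # quality of positive predictions
recall     = tp / (tp + fn)                    # true-positive rate
f1         = 2 * precision * recall / (precision + recall)  # harmonic mean
error_rate = 1 - accuracy

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

With these counts the accuracy is 34/40 = 0.85 and the F1-score works out to 6/7 ≈ 0.857.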

Another essential tool for reporting machine learning results is variable or feature importance, which identifies the independent variables (features) contributing most to classification performance. The importance of a variable refers to how much a given machine learning algorithm uses that variable in making accurate predictions 54. A widely used technique for identifying variable importance is principal component analysis. It reduces the dimensionality of the data while minimising information loss, which increases the interpretability of the underlying machine learning outcome. It further helps in finding the important features in a dataset as well as plotting them in 2D and 3D 54.

Ethical approval

Ethical approval is not required for this study since this study used publicly available data for research investigation purposes. All research was performed in accordance with relevant guidelines/regulations.

Informed consent

Due to the nature of the data sources, informed consent was not required for this study.

Case study: an application of the proposed framework

This section illustrates an application of this study’s proposed framework (Fig.  3 ) in a construction project context. We apply this framework to classify projects into two classes based on their cost overrun experience: projects that rarely experience cost overrun belong to the first class (Rare), and projects that often experience it belong to the second class (Often). In doing so, we consider a list of independent variables or features.

Data source

The research dataset is taken from an open-source data repository, Kaggle 55. This survey-based dataset was collected to explore the causes of project cost overrun in Indian construction projects 45; it consists of 44 independent variables (features) and one dependent variable. The independent variables cover a wide range of cost overrun factors, from materials and labour to contractual issues and the scope of the work. The dependent variable is the frequency of experiencing project cost overrun (rare or often). The dataset contains 139 instances; 65 belong to the rare class, and the remaining 74 are from the often class. We converted each categorical variable with a textual or string outcome into an appropriate numerical value range to prepare the dataset for machine learning analysis. For example, we used 1 and 2 to represent the rare and often classes, respectively. The correlation matrix among the 44 features is presented in Supplementary Fig 2 .

Machine learning algorithms

This study considered six machine learning approaches to explore the causes of project cost overrun using the research dataset mentioned above: support vector machine, logistic regression, k -nearest neighbours, random forest, artificial neural network and a stacking ensemble.

Support vector machine (SVM) is a supervised learning method for classifying data. For instance, if one wants to determine which projects are programmatically successful based on precedent data, SVM provides a practical approach for prediction. SVM functions by assigning labels to objects 56. The comparison attributes are used to separate these objects into different groups or classes by maximising their marginal distances and minimising the classification errors. The attributes are plotted multi-dimensionally, allowing a separating boundary, known as a hyperplane (see Supplementary Fig 3 (a)), to distinguish between the underlying classes or groups 52. Support vectors are the data points that lie closest to the decision boundary on both sides; in Supplementary Fig 3 (a), they are the circles (both transparent and shaded) close to the hyperplane. Support vectors play an essential role in deciding the position and orientation of the hyperplane. Various computational methods, including kernel functions that create derived attributes, are applied to accommodate this process 56. Support vector machines are not limited to binary classes; they can be generalised to a larger variety of classifications through the training of separate SVMs 56.

Logistic regression (LR) builds on the linear regression model and predicts the outcome of a dichotomous variable 57, for example, the presence or absence of an event. It models the relationship between one or more independent variables and the dependent variable (see Supplementary Fig 3 (b)). The LR model fits the data to a sigmoidal curve instead of a straight line; the natural logarithm is used when developing the model. It provides a value between 0 and 1 that is interpreted as the probability of class membership. Best estimates are refined iteratively from approximate estimates until a level of stability is reached 58. Generally, LR offers a straightforward approach for determining and observing interrelationships and is more efficient than ordinary regression 59.

k -nearest neighbours (KNN) plots prior information and applies a specific sample size ( k ) to determine the most likely class 52. The method finds the nearest training examples using a distance measure, and the final classification is made by counting the most common class (votes) within the specified sample. As illustrated in Supplementary Fig 3 (c), the four nearest neighbours within the small circle are three grey squares and one white square; the majority class is grey, so KNN predicts the instance (i.e., Χ ) as grey. If we instead consider the larger circle of the same figure, the nearest neighbours consist of ten white squares and four grey squares; the majority class is white, so KNN classifies the instance as white. KNN’s advantage lies in its ability to produce a simplified result and handle missing data 60. In summary, KNN utilises similarities (as well as differences) and distances between instances when developing models.
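A minimal KNN sketch with k = 4, mirroring the small-circle example above; the synthetic dataset and the choice of k are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class stand-in dataset (hypothetical shape).
X, y = make_classification(n_samples=139, n_features=8, random_state=0)

# k = 4: a new instance is assigned the majority class among its four
# nearest training examples (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=4).fit(X, y)
pred = knn.predict(X[:1])[0]
print(pred)
```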

Random forest (RF) is a machine learning method that consists of many decision trees. A decision tree is a tree-like structure in which each internal node represents a test on an input attribute; it may have multiple internal nodes at different levels, and the leaf or terminal nodes represent the decision outcomes. Each tree produces a classification outcome for the input vector. For continuous outcomes, the forest takes the average of the individual trees’ outputs, and for categorical outcomes, it takes a majority vote 52. Supplementary Fig 3 (d) shows three decision trees to illustrate the function of a random forest: the outcomes from trees 1, 2 and 3 are class B, class A and class A, respectively, so by majority vote the final prediction is class A. Because it considers specific attributes, RF can tend to emphasise some attributes over others, which may result in uneven weighting 52. Advantages of the random forest include its ability to handle multidimensionality and multicollinearity in data, despite its sensitivity to sampling design.

Artificial neural network (ANN) simulates the way human brains work, modelling logical propositions by combining weighted inputs, a transfer function and an output 61 (Supplementary Fig 3 (e)). It is advantageous because it can model non-linear relationships and handle multivariate data 62. ANN learns through three major avenues: error back-propagation (supervised), the Kohonen network (unsupervised) and counter-propagation (supervised) 62. ANN has been used in a myriad of applications ranging from pharmaceuticals 61 to electronic devices 63. It also possesses a high level of fault tolerance 64 and learns by example and through self-organisation 65.

Ensemble techniques are a machine learning methodology in which numerous base classifiers are combined to generate an optimal model 66. An ensemble considers many models and combines them into a single model; the final model compensates for the weaknesses of each individual learner, resulting in a powerful model with improved performance. The stacking model is a general architecture comprising two classifier levels: base classifiers and a meta-learner 67. The base classifiers are trained with the training dataset, and a new dataset is constructed for the meta-learner; this new dataset is then used to train the meta-classifier. This study uses four models (SVM, LR, KNN and RF) as base classifiers and LR as the meta-learner, as illustrated in Supplementary Fig 3 (f).
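The stacking architecture described above (SVM, LR, KNN and RF as base classifiers, LR as meta-learner) can be sketched with scikit-learn's `StackingClassifier`; the synthetic data is a stand-in for the case-study dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=139, n_features=20, random_state=0)

# Base classifiers (level 0) feed their predictions to the LR meta-learner.
base = [
    ("svm", SVC(probability=True, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=0)),
]
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
stack.fit(X, y)
acc = stack.score(X, y)
print(round(acc, 3))
```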

Feature selection

Feature selection is the process of choosing the optimal feature subset that significantly influences the predicted outcomes; it can improve model performance and reduce running time. This study considers three feature selection approaches: univariate feature selection (UFS), recursive feature elimination (RFE) and SelectFromModel (SFM). UFS examines each feature separately to determine the strength of its relationship with the response variable 68. This method is straightforward to use and comprehend and helps acquire a deeper understanding of data; in this study, we calculate the chi-square statistic between each feature and the target. RFE is a type of backwards feature elimination in which the model is first fit using all features in the given dataset and the least important features are then removed one by one 69. The model is refit until the desired number of features, determined by a parameter, is left. SFM chooses effective features based on the feature importances of the best-performing model 70. This approach selects features by establishing a threshold on the feature importances indicated by the model on the training set: features whose importance exceeds the threshold are kept, while the rest are discarded. In this study, we apply SFM after comparing the performance of the machine learning methods, and we then retrain the best-performing model using the features selected by SFM.
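The SFM approach can be sketched with scikit-learn's `SelectFromModel` wrapped around a random forest; the synthetic dataset and the default mean-importance threshold are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in with the case study's shape (139 instances, 44 features).
X, y = make_classification(n_samples=139, n_features=44, n_informative=10,
                           random_state=0)

# Fit RF, then keep only features whose importance exceeds the threshold
# (by default, the mean importance), as in the SFM approach described above.
selector = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)
X_selected = selector.transform(X)
print(X.shape[1], "->", X_selected.shape[1])
```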

Findings from the case study

We split the dataset 70:30 for training and testing of the selected machine learning algorithms, implemented with Python’s Scikit-learn package 70. Using the training data, we first developed six models based on the six algorithms described above. We used fivefold cross-validation, with accuracy as the target performance measure. We then applied these models to the test data, executing all required hyperparameter tuning for each algorithm to obtain the best possible classification outcome. Table 1 shows the performance outcomes for each algorithm during the training and test phases. The hyperparameter settings for each algorithm are listed in Supplementary Table 1 .

As revealed in Table 1 , random forest outperformed the other algorithms in terms of accuracy in both the training and test phases, with accuracies of 78.14% and 77.50%, respectively. The second-best performer in the training phase is k- nearest neighbours (76.98%); in the test phase, the support vector machine, k- nearest neighbours and artificial neural network tie for second place (72.50%).

Since random forest showed the best performance, we explored this algorithm further. We applied the three feature optimisation approaches (UFS, RFE and SFM) to the random forest; the results are presented in Table 2 . SFM shows the best outcome of the three, with an accuracy of 85.00%, whereas the accuracies of UFS and RFE are 77.50% and 72.50%, respectively. As can be seen in Table 2 , the accuracy for the testing phase increases from 77.50% in Table 1 (b) to 85.00% with SFM feature optimisation. Table 3 shows the 19 features selected by SFM: out of 44 features, SFM found that 19 play a significant role in predicting the outcomes.

Further, Fig.  4 illustrates the confusion matrix when the random forest model with the SFM feature optimiser was applied to the test data. There are 18 true-positive, five false-negative, one false-positive and 16 true-negative cases. Therefore, the accuracy for the test phase is (18 + 16)/(18 + 5 + 1 + 16) = 85.00%.

figure 4

Confusion matrix results based on the random forest model with the SFM feature optimiser (1 for the rare class and 2 for the often class).

Figure  5 illustrates the top-10 most important features or variables based on the random forest algorithm with the SFM optimiser. We used feature importance based on the mean decrease in impurity to identify this list of important variables. Mean decrease in impurity computes each feature’s importance as the sum over the number of splits that include the feature, in proportion to the number of samples it splits 71. According to this figure, the delays in decision making attribute contributed most to the classification performance of the random forest algorithm, followed by the cash flow problem and construction cost underestimation attributes. The construction project literature also highlights these top-10 factors as significant contributors to project cost overrun. For example, using construction project data from Jordan, Al-Hazim et al. 72 ranked 20 causes of cost overrun, several of which are similar to those identified here.

figure 5

Feature importance (top-10 out of 19) based on the random forest model with the SFM feature optimiser.

Further, we conduct a sensitivity analysis of the model’s ten most important features (from Fig.  5 ) to explore how a change in each feature affects cost overrun. We utilise the partial dependence plot (PDP), a typical visualisation tool for non-parametric models 73, to display the outcomes of this analysis. A PDP can demonstrate whether the relation between the target and a feature is linear, monotonic or more complex. The result of the sensitivity analysis is presented in Fig.  6 . For the ‘delays in decisions making’ attribute, the PDP shows that the probability stays below 0.4 until the rating value reaches three and increases thereafter; a higher value for this attribute indicates a higher risk of cost overrun. On the other hand, no significant differences can be seen for the remaining nine features as their values change.

figure 6

The result of the sensitivity analysis from the partial dependency plot tool for the ten most important features.

Summary of the case study

We illustrated an application of the proposed machine learning-based research framework in classifying construction projects. RF showed the highest accuracy in predicting the test dataset. For a new data instance with values for the 19 selected features but an unknown class, RF can identify its class ( rare or often ) correctly with a probability of 85.00%. If more data are provided to the machine learning algorithms, in addition to the 139 instances of the case study, their accuracy and efficiency in project classification will improve with subsequent training. For example, 100 additional data instances would provide roughly 70 more training instances under a 70:30 split. This capacity for continuous improvement puts machine learning algorithms in a superior position over traditional methods. In the current literature, some studies explore the factors contributing to project delay or cost overrun; in most cases, they applied factor analysis or other related statistical methods 72,74,75. In addition to identifying important attributes, the proposed machine learning-based framework identified the ranking of factors and how eliminating less important factors affects prediction accuracy when applied to this case study.

We shared the Python software developed to implement the machine learning algorithms considered in this case study on GitHub 76, a software-hosting website. A user-friendly version of this software can be accessed at https://share.streamlit.io/haohuilu/pa/main/app.py . The accuracy findings from this link may differ slightly from one run to another due to the hyperparameter settings of the corresponding machine learning algorithms.

Due to their robust prediction ability, machine learning methods have gained wide acceptance across a wide range of research domains. On the other hand, EVM is the most commonly used method in project analytics due to its simplicity and ease of interpretation 77. Considerable research effort has been made to improve its generalisability over time. For example, Naeni et al. 34 developed a fuzzy approach for earned value analysis to make it suitable for analysing project scenarios with ambiguous or linguistic outcomes, and Acebes 78 integrated Monte Carlo simulation with EVM for project monitoring and control for a similar purpose. Another prominent method frequently used in project analytics is time series analysis, which is compelling for the longitudinal prediction of project time and cost 30. As evident in the current literature, however, little effort has been made to bring machine learning into project analytics to address project management research problems. This research makes a significant attempt to fill this gap.

Our proposed data-driven framework includes only the fundamental model development and application components for machine learning algorithms. It does not include some advanced machine learning methods, which this study intentionally omitted since they are required only in particular designs of machine learning analysis. For example, the framework does not contain any methods or tools to handle the data imbalance issue. Data imbalance refers to a situation where the research dataset has an uneven distribution of the target class 79; for example, a binary target variable causes a data imbalance issue if one of its class labels has a very high number of observations compared with the other. Commonly used techniques to address this issue are undersampling and oversampling: undersampling decreases the size of the majority class, whereas oversampling randomly duplicates the minority class until the class distribution becomes balanced 79. The class distribution of the case study did not produce any data imbalance issues.
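Random oversampling as described above can be sketched in plain Python; the label counts are hypothetical:

```python
import random

# Hypothetical imbalanced labels: nine majority (0) versus three minority (1).
labels = [0] * 9 + [1] * 3
minority = [x for x in labels if x == 1]

# Random oversampling: duplicate minority instances until the classes balance.
while labels.count(1) < labels.count(0):
    labels.append(random.choice(minority))

print(labels.count(0), labels.count(1))  # 9 9
```

Undersampling would instead remove majority-class instances until the counts match.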

This study considered only six fundamental machine learning algorithms for the case study, although many other such algorithms are available in the literature. For example, it did not consider the extreme gradient boosting (XGBoost) algorithm. XGBoost is based on the decision tree algorithm, similar to the random forest algorithm 80. It has become dominant in applied machine learning due to its performance and speed. Naïve Bayes and convolutional neural networks are other popular machine learning algorithms that were not considered when applying the proposed framework to the case study. In addition to the three feature selection methods, multi-view learning can be adopted when applying the proposed framework to the case study. Multi-view learning is another direction in machine learning that considers learning with multiple views of the existing data with the aim of improving predictive performance 81,82. Similarly, although we considered five performance measures, there are other potential candidates; one example is the area under the receiver operating curve, which measures the ability of the underlying classifier to distinguish between classes 48. We leave these as potential application scope when applying our proposed framework in other project contexts in future studies.

Although this study only used one case study for illustration, our proposed research framework can be used in other project analytics contexts. In such an application context, the underlying research goal should be to predict the outcome classes and find attributes playing a significant role in making correct predictions. For example, by considering two types of projects based on the time required to accomplish (e.g., on-time and delayed ), the proposed framework can develop machine learning models that can predict the class of a new data instance and find out attributes contributing mainly to this prediction performance. This framework can also be used at any stage of the project. For example, the framework’s results allow project stakeholders to screen projects for excessive cost overruns and forecast budget loss at bidding and before contracts are signed. In addition, various factors that contribute to project cost overruns can be figured out at an earlier stage. These elements emerge at each stage of a project’s life cycle. The framework’s feature importance helps project managers locate the critical contributor to cost overrun.

This study has made an important contribution to the current project analytics literature by considering the applications of machine learning within project management. Project management is often thought of as being very fluid in nature, and because of this, applications of machine learning are often more difficult. Further, existing implementations have largely been limited to safety monitoring, risk prediction and cost estimation. Through the evaluation of machine learning applications, this study further demonstrates the uses for which algorithms can be used to consider and model the relationship between project attributes and cost overrun frequency.

The applications of machine learning in project analytics are still undergoing constant development. Within construction projects, its applications have been largely limited and focused on profitability or the design of structures themselves. In this regard, our study made a substantial effort by proposing a machine learning-based framework to address research problems related to project analytics. We also illustrated an example of this framework’s application in the context of construction project management.

Like any research, this study has a few limitations that provide scope for future work. First, the framework does not incorporate some advanced machine learning techniques, such as the handling of data imbalance or kernel density estimation. Second, we considered only one case study to illustrate the application of the proposed framework; illustrations using case studies from different project contexts would confirm its robustness. Finally, this study did not consider all machine learning models and performance measures available in the literature; for example, we did not apply the Naïve Bayes model or the precision measure in the case study.
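The omitted model and measure noted above would slot into the same framework with minimal changes. The sketch below (synthetic data, illustrative only) evaluates a Naïve Bayes classifier with the precision measure using scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import precision_score

# Synthetic two-class data standing in for on-time vs delayed projects
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

nb = GaussianNB().fit(X_train, y_train)                  # Naïve Bayes model
precision = precision_score(y_test, nb.predict(X_test))  # precision measure
print(f"precision: {precision:.2f}")
```

Because the framework treats models and metrics as interchangeable components, extending the comparison in this way requires no change to the data preparation or feature-importance steps.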

Data availability

This study obtained its research data from a publicly available online repository, cited accordingly. The data are available at https://www.kaggle.com/datasets/amansaxena/survey-on-road-construction-delay .


Acknowledgements

The authors acknowledge the insightful comments from Prof Jennifer Whyte on an earlier version of this article.

Author information

Authors and Affiliations

School of Project Management, The University of Sydney, Level 2, 21 Ross St, Forest Lodge, NSW, 2037, Australia

Shahadat Uddin, Stephen Ong & Haohui Lu


Contributions

S.U.: Conceptualisation; Data curation; Formal analysis; Methodology; Supervision; and Writing (original draft, review and editing). S.O.: Data curation; and Writing (original draft, review and editing). H.L.: Methodology; and Writing (original draft, review and editing). All authors reviewed the manuscript.

Corresponding author

Correspondence to Shahadat Uddin .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Uddin, S., Ong, S. & Lu, H. Machine learning in project analytics: a data-driven framework and case study. Sci Rep 12 , 15252 (2022). https://doi.org/10.1038/s41598-022-19728-x

Download citation

Received : 13 April 2022

Accepted : 02 September 2022

Published : 09 September 2022

DOI : https://doi.org/10.1038/s41598-022-19728-x


This article is cited by

Evaluation and prediction of time overruns in Jordanian construction projects using coral reefs optimization and deep learning methods

  • Jumana Shihadeh
  • Ghyda Al-Shaibie
  • Hamza Al-Bdour

Asian Journal of Civil Engineering (2024)

Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

  • Shahadat Uddin

Health and Technology (2024)

Prediction of SMEs’ R&D performances by machine learning for project selection

  • Hyoung Sun Yoo
  • Ye Lim Jung
  • Seung-Pyo Jun

Scientific Reports (2023)

A robust and resilience machine learning for forecasting agri-food production

  • Amin Gholamrezaei
  • Kiana Kheiri

Scientific Reports (2022)



How does video case-based learning influence clinical decision-making by midwifery students? An exploratory study

Kana Nunohara

1 Medical Education Development Center, Gifu University, Yanagido 1-1, Gifu, 501-1194 Japan

2 Nursing Department, Gifu College of Nursing, Egira-cho 3047-1, Hashima, Gifu, 501-6295 Japan

Rintaro Imafuku

Takuya Saiki, Susan M. Bridges

3 Faculty of Education, The University of Hong Kong, Pokfulam Road, Pok Fu Lam, Hong Kong

Chihiro Kawakami

Koji Tsunekawa, Masayuki Niwa, Kazuhiko Fujisaki, Yasuyuki Suzuki

Associated Data

The datasets generated and analyzed during the current study are not publicly available since the patients` confidential information are included, but are available from the corresponding author on reasonable request.

Clinical decision-making skills are essential for providing high-quality patient care. To enhance these skills, many institutions worldwide use case-based learning (CBL) as a pre-clinical educational strategy. However, to date, the influence of different learning modalities on students’ clinical decision-making processes has not been fully explored. This study aims to explore the influence of video and paper case modalities on the clinical decision-making process of midwifery students during CBL.

CBL involving a normal pregnant woman was provided for 45 midwifery students. They were divided into 12 groups; six groups received the video modality, and six groups received the paper modality. Group discussions were video-recorded, and focus groups were conducted after the CBL. Transcripts of the group discussions were analysed in terms of their interaction patterns, and focus groups were thematically analysed based on the three-stage model of clinical decision-making, which includes cue acquisition, interpretation, and evaluation/decision-making.

The students in the video groups paid more attention to psychosocial than biomedical aspects and discussed tailored care for the woman and her family members. They refrained from vaginal examinations and electric fetal heart monitoring. Conversely, the students in the paper groups paid more attention to biomedical than psychosocial aspects and discussed when to perform vaginal examinations and electric fetal heart monitoring.

This study clarified that video and paper case modalities have different influences on learners’ clinical decision-making processes. Video case learning encourages midwifery students to have a woman- and family-centred holistic perspective of labour and birth care, which leads to careful consideration of the psychosocial aspects. Paper case learning encourages midwifery students to have a healthcare provider-centred biomedical perspective of labour and childbirth care, which leads to thorough biomedical assessment.

Current midwifery educators are finding it increasingly difficult to provide adequate opportunities for students to learn how to offer effective care for women with normal pregnancies in clinical placements [ 1 , 2 ]. Several factors impact this issue: the increased risk(s) of interventions for pregnant women over the age of 35 [ 3 ], societal awareness of women’s rights and concerns about litigation [ 4 , 5 ], and decreased birth rates [ 6 ]. As health professionals, midwives are responsible for making important clinical decisions regarding positive childbirth experiences for pregnant women. Thus, midwifery students must acquire clinical decision-making skills that effectively support pregnant women and their family members from both physical and psychosocial perspectives.

Due to the sharply declining trend of birth rates in Japan (e.g. 1.42 in 2018), midwifery students have fewer learning opportunities involving normal pregnancies. Ninety-nine percent of Japanese women give birth with assistance from midwives in maternity hospitals in collaboration with obstetricians; only 1% give birth in midwifery homes. Fifty-eight percent of births occur in the presence of the women’s partners for emotional support [ 7 ].

Midwifery students participate in either a one-year program at an independent midwifery school after completing a 3 to 4-year nursing program or a 1 to 1.5-year midwifery course within a 4-year nursing program. In general, while some midwifery students have clinical experience as registered nurses before receiving their midwifery education, most lack any such experience.

Case-based learning (CBL) is an inquiry-based pedagogical approach that prepares students for clinical practice through authentic cases to develop their clinical decision-making skills. It has been implemented in various pre-clinical and clinical settings [ 8 , 9 ]. CBL before clinical placement is a common educational method in Japan, used in more than 70% of midwifery schools [ 10 ]. Successful implementation of CBL can foster students’ learning in terms of knowledge acquisition and application, intrinsic motivation [ 8 ], patient assessment [ 11 ], problem-solving [ 12 ], and critical thinking [ 13 , 14 ]. Moreover, pictorial information in case materials tends to induce intuitive reasoning, whereas objective and quantitative information induces analytical reasoning [ 15 ].

Previous studies have revealed that students tend to prefer video cases since they perceive video modality as authentic [ 16 , 17 ], interesting [ 18 ], motivating [ 19 ], and stimulating [ 20 ]. Video cases elicit students’ attention and emotions [ 21 ], promote empathy [ 22 ], improve memory retention [ 17 ], increase understanding of the cases [ 18 ], and improve students’ patient-centredness [ 23 ]. However, some studies have questioned whether video cases make it difficult to identify relevant information and also hamper information retention [ 24 ] and deep critical thinking [ 20 ]. Two-thirds of students preferred paper cases since video cases impeded their ability to critically review the presented information [ 24 ]. Moreover, some students perceived that the written texts of case materials were more reliable since they could reread the contents and learn at their own pace [ 25 ].

Despite these studies, the influences of case modalities on students’ learning processes for clinical decision-making have yet to be thoroughly explored. In particular, although clinical decision-making skills are essential to providing high-quality patient care, they have not been fully examined in the field of nurse and midwifery education.

This study aims to explore the influences of video and paper case modalities on the clinical decision-making processes of midwifery students during CBL. These two modalities were chosen for their high feasibility and worldwide popularity.

Using a three-stage model of clinical decision-making developed based on the concept of hypothetical deductive reasoning [ 26 , 27 ], we analysed the following characteristics of learning modalities in CBL:

  • Cue acquisition: acquisition and retention of clues necessary to interpret the case information.
  • Interpretation: interpretation and understanding of the obtained clues in the case information.
  • Evaluation/decision-making: selection of an optimum plan and related decision-making based on the understanding of the case information.

Case development

First, we developed a typical paper case regarding a full-term primipara who experienced a normal birth in a maternity hospital based on the ‘Evidence-based Midwifery Guidelines’ published by the Japan Academy of Midwifery [ 28 ]. The case consisted of six common scenes of interventions by midwives from the primipara’s admission to the maternity hospital with her partner to the time that she entered the birthing room (Table  1 ): admission (Scene 1), after the morning rounds by an obstetrician (Scene 2), after lunch and once moderate labour pain occurs every 5 min (Scene 3), once labour pain extends to the perianal region indicating the need for a vaginal examination (VE) (Scene 4), once labour pain and anal compression have increased (Scene 5), and typical features of the second stage (Scene 6). At Scene 5, the midwife should confirm the progress of labour and focus on the psychological status of the woman. This scenario includes common clinical problems that midwifery students might encounter in clinical sites, information regarding the labour progress, and the conversations between the woman and her family members. Facial expressions and emotions of the woman and her family are also described in the paper scenario.

Case overview

A video case was precisely reproduced from the paper scenario. Professional actors played the roles of the woman and her partner under the instruction of midwife teachers using video learning material. The midwife teachers played the roles of the woman’s mother and a midwife in the video. The authenticity and consistency regarding the textual and visual descriptions of the labour processes in the paper and video cases were reviewed by four experts in midwifery, each with a minimum of 8 years of clinical experience.

Participants

Three midwifery schools (A, B, and C) in the areas surrounding the researchers’ institutions were selected as the research sites. Fifty midwifery students were invited to participate, and consent was obtained from 45 students (90.0%). All participants had completed at least 3 years of a nursing program and were taking a one-year, full-time midwifery course. At the time of their participation, they had completed the four-month pre-clinical midwifery program (which focused on childbirth care, midwifery philosophy, and women-centred care) and were about to begin their clinical placements. Nine of 45 students (20.0%) had worked as registered nurses after graduating from nursing schools before entering the midwifery program; the other 36 (80%) had no clinical work experience as registered nurses and were taking the midwifery course directly after completing nursing school.

The participants were assigned to 12 similar small groups based on information from the midwifery teachers, such as their age, previous learning experience, academic achievement, nursing experience, communication skills, and (if possible) their own childbirth experience. The number of students per group was limited to four (maximum five) to maximize student engagement in the CBL. Six groups were assigned to video case learning (V1–V6: three groups from School A, one from School B, and two from School C for a total of 24 students), and six were assigned to paper case learning (P1–P6: four groups from School A, one from School B, and one from School C for a total of 21 students). All students were female with an average age of 24.5 years (21 to 41 years) for the video groups and 24.1 years (21 to 41 years) for the paper groups.

Intervention

The CBLs were conducted as extracurricular activities in Japanese. The researchers (KN, CK, and RI) prepared the classroom setting, recorded and managed timekeeping, and allowed the students to discuss their experiences in Japanese. However, they did not facilitate the students’ discussions in order to gather data in a natural setting. Prior to the CBL activities, the students in both groups were asked to read the background information on the case for 5 min, including the woman’s profile, labour progress, and birth plan. During the CBL activities, the researchers provided a paper scenario or replayed a video clip for each scene in a stepwise manner. After reading or watching each scene with the group members, the students were asked to perform the following tasks: 1) assess the woman and her family members, 2) create a care plan, and 3) decide on an action plan. The times allotted for group discussions of each scene are shown in Table 1. The students received additional information regarding the VE and/or electric fetal heart monitoring (EFM) when they decided to perform VE or EFM in each scene.

Data collection

All group discussions during the CBL activities were video- and audio-recorded. Following such activities, focus groups were conducted by KN, CK and RI in order to elicit reflective data, such as the participants’ perceptions of the learning process, any difficulties that they encountered, and the influence(s) of the case modality (i.e., video or paper) on their learning. The focus groups were conducted in Japanese and then translated into English by the researchers.

Ethical considerations

This research was approved by the Institutional Review Board of the Gifu University Graduate School of Medicine (No. 24–366). The students were asked to participate in this study several weeks before conducting the CBL activities so that they were given enough time to make an informed decision. Moreover, the researcher emphasized that the study was completely voluntary and that the participants could withdraw at any time, without negative consequences. Finally, the participants were asked to mail their consent forms directly and individually to the researchers to preserve their anonymity. None of the faculty members at any of the schools attended the sessions when the researchers explained the project, and none of the researchers were engaged as faculty members at the three schools.

Data analysis

To interpret and describe the influences of the case modalities on participants’ clinical decision-making processes, both focus groups and recorded data from group discussions during CBL were analysed using qualitative content analysis [ 29 , 30 ]. Qualitative content analysis is a systematic and flexible method for describing the meaning of qualitative data through reducing the amount of material [ 31 ]. The text materials for qualitative content analysis include all sorts of recorded communication such as transcripts of interviews, discourses, protocols of observations, and videos [ 29 ]. Adopting this analytical approach, this study developed the categories of coding frame in terms of the students’ perceptions of and experiences with CBL.

Codes were iteratively developed during the coding process in addition to those developed from the three-stage model of clinical decision-making. This theoretical consideration led to developing further categories or rephrasing/revising the categories [ 30 ]. The steps of coding include selecting material, structuring and generating categories based on theory or previous studies, defining categories, and revising and expanding the frame [ 31 ].

Following an independent qualitative content analysis of focus group transcripts by KN, RI, TS, and YS, the overlying themes and subcategories were cross-checked, and the findings were carefully reviewed by all of the researchers at the stages of defining, revising, and expanding the frame. Coding disagreements were resolved through discussion leading to refinement of the coding frame.

The transcripts of the group discussions were also analysed using qualitative content analysis with process coding. In process coding, both simple observable activities (e.g. questioning, answering, agreeing, and acknowledging) and more general conceptual actions (e.g. struggling, negotiating, adapting, and applying) can be coded for each exchange in the conversation [ 32 ]. Employing process coding, four independent researchers (KN, RI, TS, and YS) extracted the distinctive exchange patterns of the video and paper groups. Furthermore, the researchers classified the students’ clinical decision-making processes regarding VE and EFM in each scene into the following categories: ‘should do’, ‘should not do’, and ‘suspended’. The transcription symbols for the recorded data from the group discussions are provided in Additional file 1 in the Supplementary Information section.

Clinical decisions regarding VE and EFM

Table 2 shows the frequencies of VE and EFM. The video groups decided to perform VE 16 times and EFM 20 times, whereas the paper groups performed VE 23 times and EFM 25 times. In addition, the video groups chose not to perform VE eight times and EFM two times, whereas the paper groups decided not to perform VE two times and EFM zero times. The frequencies of VE and EFM in the video groups were below the commonly accepted standards of obstetrics in Japan, whereas the frequencies in the paper groups were above those standards. A qualitative analysis of the group discussions showed that the paper groups tended to decide on VE/EFM more often.
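The frequency comparison above amounts to tallying the category labels assigned during process coding. As a minimal sketch of such a tally (the records below are invented for illustration and are not the study’s data; only the category labels follow the paper):

```python
from collections import Counter

# Hypothetical coded decisions extracted from group-discussion transcripts.
# Each record: (group modality, procedure, decision category).
coded_decisions = [
    ("video", "VE", "should do"),
    ("video", "EFM", "should do"),
    ("video", "VE", "should not do"),
    ("paper", "VE", "should do"),
    ("paper", "VE", "should do"),
    ("paper", "EFM", "should do"),
]

def tally(decisions):
    """Count how often each (modality, procedure, category) combination occurs."""
    return Counter(decisions)

counts = tally(coded_decisions)
print(counts[("paper", "VE", "should do")])  # → 2
```

The same counts could then be compared across modalities per procedure, as in Table 2.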

Decision-making on VE and EFM

[Figure: ● implementation of VE; ■ implementation of EFM; × decided not to perform VE or EFM. VE, vaginal examination; EFM, electronic fetal monitoring. * VE based on general care of low-risk women in Japan; EFM based on the Guideline for Obstetrical Practice in Japan at the time of data collection in this study.]

The students’ reflections on their clinical decision-making processes

This section includes qualitative content analysis and excerpts from the focus groups, especially the students’ perceptions regarding the influences of the case modality during each stage of clinical decision-making (Table  3 ).

Characteristics of students’ clinical decision-making

Cue acquisition

Video groups

Psychosocially oriented information collection.

The students in the video groups found it helpful to imagine the case by grasping the context psychosocially. This also helped them understand the family as a whole, and they empathized with the woman and her family members even if they lacked clinical experience. (In the following excerpts, V indicates a video group, P indicates a paper group, number represents the group number, and the letter indicates the individual student.)

· Through the video, we could see the entire picture, including the family members (V6-A).
· I empathized with the woman in labour, and I wanted to relieve her pain as quickly as possible (V3-C).
· It was easier to get into the situation, since it made me feel like I was playing an important role in it (V4-B).

However, the students in the video groups felt a sense of urgency and had difficulty memorizing the information due to the rapid flow of details in the video. As a result, the discussion was narrowed down to specific and significant events that were somewhat emotionally biased.

· In the video, I was unable to think slowly (V6-A).
· I felt a sense of urgency; I had to hurry if I were to do something, which also made me feel impatient (V6-D).
· Since I could not remember the information (in the previous scene), we simply discussed the problems in the current scene (V2-A).

Paper groups

Biomedically oriented information collection.

In contrast, the students in the paper groups focused more on the biomedical data about the pregnant woman and the fetal heart rate. Since they could read the textual information repeatedly, they found it easy to remember the previous scene systematically.

· I was able to focus on the quantitative data and abnormal data (P5-B).
· It was easy to compare the woman’s current and previous status as well as refer to her basic information, birth plan, and VE results, since all of the data was in writing (P6-C).

However, the students in the paper groups, especially those that lacked experience in observing childbirth, found it difficult to imagine the situation of the woman and her family members. They also had difficulty grasping psychosocial aspects such as the feelings/emotions of the woman and her family members.

· Since I had no experience in observing labour, it was difficult to imagine what it must be like (P3-B).
· Although the conversations were written in the paper case, I did not know how they were feeling. In other words, it was difficult for me to understand the feelings of the woman and her family members (P3-D).

Interpretation

Easy to imagine woman holistically but a fragmented understanding.

In the video groups, the audio-visual information helped the students understand and empathize with the woman and her family members holistically and made them feel like ‘real attending midwives’.

· Since the information was visual instead of textual, I was able to understand the personalities of the woman and her family members in a more detailed manner (V1-A).
· By watching the video, I could imagine myself as a person (in the scene) (V3-A).

However, the students in the video groups felt that their biomedical assessment of the case was insufficient and superficial, which caused their assessment to become fragmented.

· The assessment itself became somewhat superficial (V2-A).
· We skipped the assessment and only discussed the birth plan (V2-D).
· Since the video included a rapid flow of information, I was unsure whether my assessment was sufficient, and I wondered if my judgment was correct (V5-C).

Biomedically sequential understanding but difficult to imagine woman holistically.

The students in the paper groups felt that it was easy to assess the case by analysing the information thoroughly and sequentially. They relied on biomedical and numerical rather than psychosocial information.

· I was able to assess the woman sentence-by-sentence (P1-B).
· Since I was able to understand the numerical information as one criterion for judging the labour progress, I wanted to perform the VE (P4-A).

However, it was difficult for the students to understand the woman holistically and to empathize. As a result, they monitored the woman frequently according to their own anxiety and biomedical viewpoints.

· There was no image at all, so we decided to perform the VE (P1-C).
· We decided to perform EFM every time without considering whether the timing was right, since we were anxious about fetal dysfunction (P2-B).

Evaluation/decision-making

Woman-centred practical and tailored care but restraint from monitoring and delayed decision-making.

The students in the video groups found it easy to consider when and how to administer individually tailored care. The planned care was contextually sensitive, woman-centred, and holistic. However, the students refrained from frequent monitoring.

· It also made it easier to think about what type of care to provide and when and how it should be provided (V3-A).
· If the timing of EFM is poor, then it can be painful for the woman. In other words, we should never conduct EFM without considering the woman’s situation (V6-A).

However, the students in the video groups regretted delaying the timing of VE and EFM. They also felt that they had over-considered the comfort of the woman and her family members and became emotionally involved in the case.

· I over-focused on relieving her pain and caring for her family, so I missed the overall point (V2-D).
· Since VE can be painful, I wondered if I should stop doing it. I decided to perform (VE) later, but I was surprised that the cervix was almost fully dilated (V2-A).

Healthcare provider-centred medical safety-oriented care but general, knowledge-driven care.

The students in the paper groups decided to perform VE and EFM quickly, and they did not focus on the woman’s comfort. The planned care was nonspecific, medical safety-oriented, and driven by textbook-based knowledge. Consequently, monitoring occurred frequently. Moreover, the students had trouble considering how to implement the planned care from a practical viewpoint.

· We decided to perform EFM every time without considering whether the timing was right, since we were anxious about fetal dysfunction (P2-B).
· To be honest, we decided to perform VE and EFM at the same time (P2-A).
· There were various opinions regarding how to alleviate pain, but it was difficult to think about certain priorities such as identifying what was suitable for the woman, the timing, and the situation (P6-D).

Characteristics of clinical decision-making in the group discussions during CBL

The characteristics of the students’ clinical decision-making processes are also clarified in the group discussions, in accordance with the characteristics in Table 3. The representative excerpts from the video and paper groups are included in Additional file 2 in the Supplementary Information section.

Woman- and family-centred care by empathetic midwives

The attitudes, manners of speaking, and remarks differed between the video and paper groups. The students in the video groups exhibited more empathetic behaviours toward the woman. This was supported by their usage of personal pronouns, such as ‘I’ and ‘we’, which implied that they had a sense of ownership as midwives. Conversely, the students in the paper groups used third-person pronouns, such as ‘he’ and ‘they’, in order to describe what occurred in the scenario (see Excerpt 1 in Additional file 2 in the Supplementary Information section).

The students in the video groups were immersed in the case. They discussed woman- and family-centred care with empathetic attitudes as if they were in front of the woman and her partner in real life. In Excerpt 1, they used empathetic phrases to cheer up the anxious woman, to reassure her about the increasing intensity of labour pain, and to explain the likely outcome of her labour. They also used empathetic phrases regarding her partner, such as ‘her partner looks upset’, and considered the partner’s fatigue.

The students in the paper groups enumerated textbook-based general care. However, they rarely discussed specific approaches and the timing of care in the given context. The attitudes were healthcare provider-centred and biomedically oriented. In Excerpt 1, although they were aware of the lack of a practical care plan, they only stated textbook-based principles of human care, such as labour pain relief in accordance with the woman’s request, mental support, and being beside her. The students were able to objectively determine that the woman’s anxiety would adversely affect her labour progress, but they only provided biomedical information, such as ‘we need to tell her the pain will get more intense than it is now’, without empathetic remarks.

Psychosocially oriented practical and tailored care

The video groups employed an approach to practical and tailored care that considered the woman’s psychosocial aspects. The care plan was discussed from a holistic viewpoint. Excerpt 2 in Additional file 2 shows careful consideration of psychosocial aspects, including the woman’s daily life activities, her emotions, her partner’s opinions, and the support from her family members. The students also proposed that the woman take a walk to instigate the labour process. Subsequently, they decided to perform EFM as she was resting.

The paper groups generally focused on the woman’s biomedical aspects and knowledge-driven general care. In Excerpt 2, the paper group decided to perform EFM and a VE rather quickly, after which they requested the related biomedical information. There were fewer discussions regarding when and how to perform EFM and VE. In other words, the care plan in the paper groups was biomedically oriented and abstractive.

Refraining from performing invasive care

The video groups generally refrained from performing VE and EFM in consideration of the woman’s comfort whereas the paper groups performed VE and EFM to obtain objective biomedical information. In Excerpt 3 in Additional file 2 , after confirming the woman’s situation, the students in the video group judged the progression of labour according to the nature of her labour pain and proposed a VE. However, after accusatory statements were made regarding excessive VE, the students determined that the labour progression was slow based on the findings of the previous VE, elapsed time, and the interval between contractions, and the VE decision was cancelled.

Conversely, the paper groups focused more on the early detection of abnormalities. In Excerpt 3, all students in the paper group agreed to perform a VE and EFM immediately after a short discussion. They decided to measure the woman’s blood pressure out of worry over hypertensive disorders of pregnancy because of her father’s history of high blood pressure, and, subsequently, they decided to conduct EFM quickly without careful consideration.

Influence of the case modality on the students’ clinical decision-making processes

This exploratory study found differences between the midwifery students in the video and paper groups across the three stages of the clinical decision-making process, i.e. cue acquisition, interpretation, and evaluation/decision-making. The discussion data also suggested that the video groups chose to perform EFM and VE less frequently than the paper groups.

The cue acquisition stage

The video groups focused on the woman’s psychosocial aspects and showed empathy with her situation, but their information collection was biased. The video modality placed the students in the ‘actual’ labour process and made them feel a sense of authenticity and urgency. These findings are in line with those of previous studies [ 22 , 24 , 33 ]. More specifically, the video clearly visualized the woman’s pain and distress, which, in turn, motivated the students to become more empathetic. In the present study, the video groups had more difficulty retaining information than the paper groups. Previous studies have also highlighted the difficulty of identifying and extracting relevant information from video materials [ 24 , 34 , 35 ]. Although the significant life events shown in the five-minute video used in this study are likely to be retained in learners’ long-term memory, educators should be aware of learners’ limited capacity for memorization and information retention when viewing video materials.

In contrast, the paper groups mainly focused on the woman’s biomedical aspects and might not have sufficiently perceived the woman’s emotional dimension or sense of urgency. The focus groups showed that students with limited clinical experience had difficulty forming an image of her psychosocial aspects. Lower achievement in imaging was also reported when paper patients were used [ 16 ].

The interpretation stage

In the video groups, the students’ interest in and interpretation of the woman’s significant event were psychosocial, and their assessments were often fragmented. They lacked a comprehensive picture of the woman’s labour process. These findings are supported by previous studies in which video cases disrupted learners’ critical thinking [ 20 ]. Fragmented assessments may be caused by psychosocially biased information.

Conversely, in the paper groups, the students’ interest and interpretation were biomedical, and their assessments were thorough and sequential. The paper groups emphasized the importance of discussing and interpreting the biomedical aspects of the woman’s labour process. Since group members had the case information sheet in hand, they were able to refer to such information at their own pace [ 24 ]. Students in the paper group in our study could read and discuss information sheets, which should be advantageous for information gathering and interpretation. However, the use of a paper case would still require detailed descriptions of the woman’s psychosocial aspects and her family’s background to imagine the woman holistically.

The evaluation/decision-making stage

The video groups adopted a woman- and family-centred holistic decision-making approach that considered the woman’s comfort. This approach was reflected in the students’ active discussions regarding how the care could be tailored to the wishes of the woman. These results are in line with previous studies [ 22 , 36 , 37 ]. Another important finding of the present study is that the video groups generally tended to refrain from performing frequent EFM and VE. The video might have influenced the students to focus more on the woman’s comfort and to reconsider the need to perform such procedures excessively.

In contrast, the paper groups adopted a healthcare provider-centred biomedical approach that emphasized medical safety over the women’s comfort. They also viewed the case from an objective perspective. Moreover, the discussion data showed that the students tended to frequently choose to perform EFM and VE to determine whether the woman had any serious problems. Such behaviour might have been partly due to their anxiety caused by the lack of knowledge. Thus, educators should be aware of this possible risk when using paper cases.

Pedagogical implications of using different modalities to improve students’ clinical decision-making skills

This exploratory study revealed that video- and paper-based teaching materials have their own strengths and weaknesses.

Video cases are not only valuable for developing students’ information-gathering skills before entering a clinical setting but also important for planning tailored care that considers the woman’s comfort. However, comprehensive assessments might be insufficient in video cases. The following are suggestions for faculty members’ effective facilitation of learning with video cases.

  • Make video clips short, preferably less than five minutes, considering students’ information-processing capacity.
  • Encourage students to write down information while viewing.
  • Provide sufficient time for students to share information after viewing.
  • Calm students down when they become too emotionally involved.
  • Set learning goals for students to discuss both psychological and biomedical aspects, as well as medical safety.

On the other hand, paper cases can be valuable for training students to make comprehensive patient assessments, especially regarding biomedical information. The following are suggestions for faculty to facilitate learning with paper cases.

  • Describe psychosocial information of the case, such as facial expressions, voice tones, conversations, and emotions.
  • Encourage students to share a common image of a woman/patient.
  • Ask students to formulate both practical and personalized care plans.
  • Encourage students to think about how to communicate with people and gather information in a real clinical setting.

In our study, we chose paper and video cases because of the high feasibility and popularity of these modalities; however, further research is needed to compare newer modalities, such as e-CBL [ 38 , 39 ] and simulation-based scenarios [ 40 ].

Limitations

This study has several limitations. First, the sample was small, and the study used a single case scenario in one country. Although the scenario was carefully created, the possibility of bias remains, limiting the study’s validity. Cultural differences in childbirth and family relations should also be considered. Second, we did not perform a statistical analysis due to the small number of participants, and the frequency of EFM/VE was judged qualitatively from student interactions. Future follow-up studies should examine the long-term effects of case modalities on students’ clinical decision-making processes and on their performance in actual clinical practice.

This study clarified the different influences of video and paper case modalities on the clinical decision-making processes of midwifery students. The students’ perceptions of the case, the cues they recognized, and the emotions they felt might depend on the learning modality. The video case made the students consider the woman and her family from a holistic viewpoint, which fostered their ability to provide tailored care based on the woman’s and family’s comfort. The paper case made the students more healthcare provider-centred, which fostered their ability to make clinical decisions based on thorough biomedical assessments. Educators should be aware of the strengths and weaknesses of video and paper case modalities when supporting students’ learning of clinical decision-making.

Supplementary information

Acknowledgements

The authors thank the midwifery students who participated in this study and their teachers, who cooperated in data collection at the research sites.

Authors’ contributions

All authors contributed to the research design. KN, RI, and CK were substantially involved in data collection, including the recordings of CBL and focus groups. KN, RI, TS, and YS were involved in the main data analysis and developed the study concept in consultation with SB. KT, CK, KF, and MN evaluated the credibility of the data analysis. KN, RI, TS and YS substantially worked on writing the manuscript, and all authors revised it and approved the final version.

Funding

I certify that no funding has been received for the conduct of this study and/or preparation of this manuscript.

Availability of data and materials

Ethics approval and consent to participate

This research was approved by the Institutional Review Board of Gifu University Graduate School of Medicine (No. 24–366). Written informed consent was obtained from the participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rintaro Imafuku is an Editorial Board Member for BMC Medical Education. He had no editorial role in the handling of this manuscript and was blinded to the process.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Kana Nunohara, Email: pj.ca.nc-ufig@arahonun .

Rintaro Imafuku, Email: pj.ca.u-ufig@ukufamir .

Takuya Saiki, Email: pj.ca.u-ufig@katikias .

Susan M. Bridges, Email: kh.ukh@segdirbs .

Chihiro Kawakami, Email: pj.ca.u-ufig@uobiru .

Koji Tsunekawa, Email: pj.ca.u-ufig@tijok .

Masayuki Niwa, Email: pj.ca.u-ufig@awinm .

Kazuhiko Fujisaki, Email: pj.ca.u-ufig@ijufk .

Yasuyuki Suzuki, Email: pj.ca.u-ufig@zusy .

Supplementary information accompanies this paper at 10.1186/s12909-020-1969-0.


Case Method Teaching and Learning

What is the case method? How can the case method be used to engage learners? What are some strategies for getting started? This guide helps instructors answer these questions by providing an overview of the case method while highlighting learner-centered and digitally-enhanced approaches to teaching with the case method. The guide also offers tips to instructors as they get started with the case method and additional references and resources.

On this page:

  • What is Case Method Teaching?
  • Case Method at Columbia
  • Why Use the Case Method?
  • Case Method Teaching Approaches
  • How Do I Get Started?
  • Additional Resources

The CTL is here to help!

For support with implementing a case method approach in your course, email [email protected] to schedule your 1-1 consultation.

Cite this resource: Columbia Center for Teaching and Learning (2019). Case Method Teaching and Learning. Columbia University. Retrieved [today’s date] from https://ctl.columbia.edu/resources-and-technology/resources/case-method/

Case method 1 teaching is an active form of instruction that focuses on a case and involves students learning by doing 2 3 . Cases are real or invented stories 4 that include “an educational message” or recount events, problems, dilemmas, or theoretical or conceptual issues that require analysis and/or decision-making.

Case-based teaching simulates real-world situations and asks students to actively grapple with complex problems 5 6 . This method of instruction is used across disciplines to promote learning, and is common in law, business, and medicine, among other fields. See Table 1 below for a few types of cases and the learning they promote.

Table 1: Types of cases and the learning they promote.

For a more complete list, see Case Types & Teaching Methods: A Classification Scheme from the National Center for Case Study Teaching in Science.

Back to Top

Case Method Teaching and Learning at Columbia

The case method is actively used in classrooms across Columbia, at the Morningside campus in the School of International and Public Affairs (SIPA), the School of Business, Arts and Sciences, among others, and at Columbia University Irving Medical campus.

Faculty Spotlight:

Professor Mary Ann Price on Using Case Study Method to Place Pre-Med Students in Real-Life Scenarios


Professor De Pinho on Using the Case Method in the Mailman Core

Case method teaching has been found to improve student learning, to increase students’ perception of learning gains, and to meet learning objectives 8 9 . Faculty have noted the instructional benefits of cases including greater student engagement in their learning 10 , deeper student understanding of concepts, stronger critical thinking skills, and an ability to make connections across content areas and view an issue from multiple perspectives 11 . 

Through case-based learning, students are the ones asking questions about the case, doing the problem-solving, interacting with and learning from their peers, “unpacking” the case, analyzing the case, and summarizing the case. They learn how to work with limited information and ambiguity, think in professional or disciplinary ways, and ask themselves “what would I do if I were in this specific situation?”

The case method bridges theory and practice, and promotes the development of skills including communication, active listening, critical thinking, decision-making, and metacognitive skills 12 , as students apply course content knowledge, reflect on what they know and on their analytical approach, and make sense of a case.

Though the case method has historical roots as an instructor-centered approach that uses the Socratic dialogue and cold-calling, it is possible to take a more learner-centered approach in which students take on roles and tasks traditionally left to the instructor. 

Cases are often used as “vehicles for classroom discussion” 13 . Students should be encouraged to take ownership of their learning from a case. Discussion-based approaches engage students in thinking and communicating about a case. Instructors can set up a case activity in which students are the ones doing the work of “asking questions, summarizing content, generating hypotheses, proposing theories, or offering critical analyses” 14 . 

The role of the instructor is to share a case or ask students to share or create a case to use in class, set expectations, provide instructions, and assign students roles in the discussion. Student roles in a case discussion can include: 

  • discussion “starters” get the conversation going by posing a question or the questions that their peers came up with;
  • facilitators listen actively, validate the contributions of peers, ask follow-up questions, draw connections, and refocus the conversation as needed;
  • recorders take notes on the main points of the discussion, record them on the board, upload them to CourseWorks, or type and project them on the screen; and
  • discussion “wrappers” lead a summary of the main points of the discussion.

Prior to the case discussion, instructors can model case analysis and the types of questions students should ask, co-create discussion guidelines with students, and ask for students to submit discussion questions. During the discussion, the instructor can keep time, intervene as necessary (however the students should be doing the talking), and pause the discussion for a debrief and to ask students to reflect on what and how they learned from the case activity. 

Note: case discussions can be enhanced using technology. Live discussions can occur via video-conferencing (e.g., using Zoom ) or asynchronous discussions can occur using the Discussions tool in CourseWorks (Canvas) .

Table 2 includes a few interactive case method approaches. Regardless of the approach selected, it is important to create a learning environment in which students feel comfortable participating in a case activity and learning from one another. See the “getting started” section below for tips on supporting students in how to learn from a case, and the Guide for Inclusive Teaching at Columbia for how to create a supportive learning environment.

Table 2. Strategies for Engaging Students in Case-Based Learning

Approaches to case teaching should be informed by course learning objectives, and can be adapted for small, large, hybrid, and online classes. Instructional technology can be used in various ways to deliver, facilitate, and assess the case method. For instance, an online module can be created in CourseWorks (Canvas) to structure the delivery of the case, allow students to work at their own pace, engage all learners, even those reluctant to speak up in class, and assess understanding of a case and student learning. Modules can include text, embedded media (e.g., using Panopto or Mediathread ) curated by the instructor, online discussion, and assessments. Students can be asked to read a case and/or watch a short video, respond to quiz questions and receive immediate feedback, post questions to a discussion, and share resources. 

For more information about options for incorporating educational technology to your course, please contact your Learning Designer .

To ensure that students are learning from the case approach, ask them to pause and reflect on what and how they learned from the case. Time to reflect builds your students’ metacognition, and when these reflections are collected they provide you with insights about the effectiveness of your approach in promoting student learning.

Well designed case-based learning experiences: 1) motivate student involvement, 2) have students doing the work, 3) help students develop knowledge and skills, and 4) have students learning from each other.  

Designing a case-based learning experience should center around the learning objectives for a course. The following points focus on intentional design. 

Identify learning objectives, determine scope, and anticipate challenges. 

  • Why use the case method in your course? How will it promote student learning differently than other approaches? 
  • What are the learning objectives that need to be met by the case method? What knowledge should students apply and skills should they practice? 
  • What is the scope of the case? (a brief activity in a single class session to a semester-long case-based course; if new to case method, start small with a single case). 
  • What challenges do you anticipate (e.g., student preparation and prior experiences with case learning, discomfort with discussion, peer-to-peer learning, managing discussion) and how will you plan for these in your design? 
  • If you are asking students to use transferable skills for the case method (e.g., teamwork, digital literacy) make them explicit. 

Determine how you will know if the learning objectives were met and develop a plan for evaluating the effectiveness of the case method to inform future case teaching. 

  • What assessments and criteria will you use to evaluate student work or participation in case discussion? 
  • How will you evaluate the effectiveness of the case method? What feedback will you collect from students? 
  • How might you leverage technology for assessment purposes? For example, could you quiz students about the case online before class, accept assignment submissions online, or use audience response systems (e.g., PollEverywhere) for formative assessment during class?

Select an existing case, create your own, or encourage students to bring course-relevant cases, and prepare for its delivery

  • Where will the case method fit into the course learning sequence? 
  • Is the case at the appropriate level of complexity? Is it inclusive, culturally relevant, and relatable to students? 
  • What materials and preparation will be needed to present the case to students? (e.g., readings, audiovisual materials, setting up a module in CourseWorks).

Plan for the case discussion and an active role for students

  • What will your role be in facilitating case-based learning? How will you model case analysis for your students? (e.g., present a short case and demo your approach and the process of case learning) (Davis, 2009). 
  • What discussion guidelines will you use that include your students’ input? 
  • How will you encourage students to ask and answer questions, summarize their work, take notes, and debrief the case? 
  • If students will be working in groups, how will groups form? What size will the groups be? What instructions will they be given? How will you ensure that everyone participates? What will they need to submit? Can technology be leveraged for any of these areas? 
  • Have you considered students of varied cognitive and physical abilities and how they might participate in the activities/discussions, including those that involve technology? 

Student preparation and expectations

  • How will you communicate about the case method approach to your students? When will you articulate the purpose of case-based learning and expectations of student engagement? What information about case-based learning and expectations will be included in the syllabus?
  • What preparation and/or assignment(s) will students complete in order to learn from the case? (e.g., read the case prior to class, watch a case video prior to class, post to a CourseWorks discussion, submit a brief memo, complete a short writing assignment to check students’ understanding of a case, take on a specific role, prepare to present a critique during in-class discussion).

Andersen, E. and Schiano, B. (2014). Teaching with Cases: A Practical Guide . Harvard Business Press. 

Bonney, K. M. (2015). Case Study Teaching Method Improves Student Performance and Perceptions of Learning Gains†. Journal of Microbiology & Biology Education , 16 (1), 21–28. https://doi.org/10.1128/jmbe.v16i1.846

Davis, B.G. (2009). Chapter 24: Case Studies. In Tools for Teaching. Second Edition. Jossey-Bass. 

Garvin, D.A. (2003). Making the Case: Professional Education for the world of practice. Harvard Magazine. September-October 2003, Volume 106, Number 1, 56-107.

Golich, V.L. (2000). The ABCs of Case Teaching. International Studies Perspectives. 1, 11-29. 

Golich, V.L.; Boyer, M; Franko, P.; and Lamy, S. (2000). The ABCs of Case Teaching. Pew Case Studies in International Affairs. Institute for the Study of Diplomacy. 

Heath, J. (2015). Teaching & Writing Cases: A Practical Guide. The Case Center, UK. 

Herreid, C.F. (2011). Case Study Teaching. New Directions for Teaching and Learning. No. 128, Winter 2011, 31–40.

Herreid, C.F. (2007). Start with a Story: The Case Study Method of Teaching College Science . National Science Teachers Association. Available as an ebook through Columbia Libraries. 

Herreid, C.F. (2006). “Clicker” Cases: Introducing Case Study Teaching Into Large Classrooms. Journal of College Science Teaching. Oct 2006, 36(2). https://search.proquest.com/docview/200323718?pq-origsite=gscholar  

Krain, M. (2016). Putting the Learning in Case Learning? The Effects of Case-Based Approaches on Student Knowledge, Attitudes, and Engagement. Journal on Excellence in College Teaching. 27(2), 131-153. 

Lundberg, K.O. (Ed.). (2011). Our Digital Future: Boardrooms and Newsrooms. Knight Case Studies Initiative. 

Popil, I. (2011). Promotion of critical thinking by using case studies as teaching method. Nurse Education Today, 31(2), 204–207. https://doi.org/10.1016/j.nedt.2010.06.002

Schiano, B. and Andersen, E. (2017). Teaching with Cases Online . Harvard Business Publishing. 

Thistlethwaite, JE; Davies, D.; Ekeocha, S.; Kidd, J.M.; MacDougall, C.; Matthews, P.; Purkis, J.; Clay D. (2012). The effectiveness of case-based learning in health professional education: A BEME systematic review . Medical Teacher. 2012; 34(6): e421-44. 

Yadav, A.; Lundeberg, M.; DeSchryver, M.; Dirkin, K.; Schiller, N.A.; Maier, K. and Herreid, C.F. (2007). Teaching Science with Case Studies: A National Survey of Faculty Perceptions of the Benefits and Challenges of Using Cases. Journal of College Science Teaching; Sept/Oct 2007; 37(1). 

Weimer, M. (2013). Learner-Centered Teaching: Five Key Changes to Practice. Second Edition. Jossey-Bass.

Additional resources 

Teaching with Cases , Harvard Kennedy School of Government. 

Features a “What is a teaching case?” video that defines a teaching case, documents to help students prepare for case learning, common case teaching challenges and solutions, and tips for teaching with cases.

Teaching by the Case Method , Christensen Center for Teaching & Learning, Harvard Business School. Promotes excellence and innovation in case method teaching.

National Center for Case Study Teaching in Science , University at Buffalo.

A collection of peer-reviewed STEM cases that teach scientific concepts and content and promote process skills and critical thinking. The Center welcomes case submissions. Its classification scheme organizes cases by type and teaching method:

  • Case types: analysis case, dilemma/decision case, directed case, interrupted case, clicker case, flipped case, laboratory case.
  • Teaching methods: problem-based learning, discussion, debate, intimate debate, public hearing, trial, jigsaw, role-play.

Columbia Resources

Resources are available to support your use of the case method. The University hosts a number of case collections, including: the Case Consortium (a collection of free cases in the fields of journalism, public policy, public health, and other disciplines that include teaching and learning resources); SIPA’s Picker Case Collection (audiovisual case studies on public sector innovation, filmed around the world and involving SIPA student teams in producing the cases); and Columbia Business School CaseWorks , which develops teaching cases and materials for use in Columbia Business School classrooms.

Center for Teaching and Learning

The Center for Teaching and Learning (CTL) offers a variety of programs and services for instructors at Columbia. The CTL can provide customized support as you plan and implement the case method approach. Schedule a one-on-one consultation.

Office of the Provost

The Hybrid Learning Course Redesign grant program from the Office of the Provost provides support for faculty who are developing innovative and technology-enhanced pedagogy and learning strategies in the classroom. In addition to funding, faculty awardees receive support from CTL staff as they redesign, deliver, and evaluate their hybrid courses.

The Start Small! Mini-Grant provides support to faculty who are interested in experimenting with one new pedagogical strategy or tool. Faculty awardees receive funds and CTL support for a one-semester period.

Explore our teaching resources.

  • Blended Learning
  • Contemplative Pedagogy
  • Inclusive Teaching Guide
  • FAQ for Teaching Assistants
  • Metacognition

CTL resources and technology for you.

  • Overview of all CTL Resources and Technology
  • The origins of this method can be traced to Harvard University, where in 1870 the Law School began using real court decisions as cases to teach students how to think like lawyers. The Business School followed in 1920 (Garvin, 2003). These professional schools recognized that the lecture mode of instruction was insufficient to teach critical professional skills, and that active learning would better prepare learners for their professional lives.


Using Case Studies to Improve the Critical Thinking Skills of Undergraduate Conservation Biology Students
Ana L. Porzecanski , Adriana Bravo , Martha J. Groom , Liliana M. Dávalos , Nora Bynum , Barbara J. Abraham , John A. Cigliano , Carole Griffiths , David L. Stokes , Michelle Cawthorn , Denny S. Fernandez , Laurie Freeman , Timothy Leslie , Theresa Theodose , Donna Vogler , Eleanor J. Sterling; Using Case Studies to Improve the Critical Thinking Skills of Undergraduate Conservation Biology Students. Case Studies in the Environment 5 February 2021; 5 (1): 1536396. doi: https://doi.org/10.1525/cse.2021.1536396

Critical thinking (CT) underpins the analytical and systems-thinking capacities needed for effective conservation in the 21st century but is seldom adequately fostered in most postsecondary courses and programs. Many instructors fear that devoting time to process skills will detract from content gains and struggle to define CT skills in ways relevant for classroom practice. We tested an approach to develop and assess CT in undergraduate conservation biology courses using case studies to address both challenges. We developed case studies with exercises to support content learning goals and assessment rubrics to evaluate student learning of both content and CT skills. We also developed a midterm intervention to enhance student metacognitive abilities at a light and intensive level and asked whether the level of the intervention impacted student learning. Data from over 200 students from five institutions showed an increase in students’ CT performance over a single term, under both light and intensive interventions, as well as variation depending on the students’ initial performance and on rubric dimension. Our results demonstrate adaptable and scalable means for instructors to improve CT process skills among undergraduate students through the use of case studies and associated exercises, aligned rubrics, and supported reflection on their CT performance.

Educating the next generation of professionals to address complex conservation and environmental challenges involves more than teaching disciplinary principles, concepts, and content—it also requires cultivating core competencies in critical thinking (CT), collaboration, and communication [ 1 , 2 ]. CT skills are key desired outcomes of college and university education [ 3 ] and can strongly influence how students make life decisions [ 4 ]. Unfortunately, college graduates in the United States appear to lack strong CT skills despite several years of instruction [ 5 , 6 , 7 ]. This may be due to overreliance on teaching and assessment approaches that emphasize mastery of large volumes of content and offer few opportunities to think critically while acquiring and using knowledge [ 8 , 9 ]. Courses that take an alternative approach, using active, collaborative, or inquiry-based approaches to learning could contribute to both long-term retention of knowledge and CT skills [ 10 , 11 , 12 ].

Using case studies to support active, inquiry-based approaches can be especially effective [ 13 , 14 ]. Case study pedagogies are well suited to supporting the development of CT skills because of their sustained focus on a theme with applications in a specific setting and the opportunity to emphasize distinct steps in the processes of understanding and analyzing issues that comprise essential CT skills. Creating exercises that foster CT using a case study approach combines strengths from both inquiry-based and case study–based best practices.

While definitions vary, CT is broadly recognized as “a habit of mind characterized by the comprehensive exploration of issues and evidence before accepting or formulating an opinion or conclusion” [ 15 ]. CT involves higher-order thinking skills, as well as a suite of concrete capacities, including the ability to select, analyze, infer, interpret, evaluate, and explain information, as well as draw conclusions based on varied and conflicting evidence [ 15 , 16 , 17 ]. Not confined to specific analytical tasks, strong CT skills support the ability to think in a complex manner and to process and assess diverse inputs in a constantly changing environment [ 17 ]. This capacity is essential to effective decision-making, problem solving, and adaptive management in conservation research and practice, particularly in addressing the tradeoffs and multiplicity of perspectives at the core of environmental concerns.

CT has been a focus of K–12 educational and cognitive researchers, who show that explicit instruction can enhance learning of CT skills [ 18 , 19 , 20 , 21 ]. Unfortunately, adoption of these ideas and practices has been slower in tertiary STEM education [ 22 ]. Educators in undergraduate science classrooms rarely prioritize explicit direct instruction in CT skills or their assessment, fearing compromising the time available for “coverage” of content [ 6 , 23 ]. Thus, many instructors rely on teaching and assessing core content, assuming the CT skills will automatically develop along with deeper disciplinary knowledge [ 17 , 24 ]. Further, educators typically lack training on CT instruction [ 25 ]. Perhaps not surprisingly, on average, very small or empirically nonexistent gains in CT or complex reasoning skills have been found in a large proportion of students over the course of 4-year college programs [ 6 , 7 , 26 ].

Besides the potential to enhance learning outcomes, an emphasis on CT skills through more active and collaborative learning can also promote equal opportunity in STEM and boost completion rates. It has been shown that these approaches to teaching and learning can enhance learning for underrepresented groups in science [ 27 ] and could also boost performance and hence retention in the field in the midst of current high attrition rates in STEM [ 28 ].

Our experience running faculty professional development programs in diverse contexts over several years [ 29 ] has shown that faculty in diverse learning contexts seek and welcome evidence-based guidance on teaching and assessment practices that promote CT. Given only informal preparation on building CT skills and a curricular focus on essential disciplinary concepts, instructors often search for guidelines on how to incorporate these practices while supporting the simultaneous learning of concepts. Case studies can provide a particularly strong way to support the enhancement of CT skills that are adaptable for individual instructors.

To better understand the investment in time and effort needed for conservation students to learn process skills and for faculty to develop efficient teaching tools, we designed a multi-institutional study on three fundamental process skills: oral communication [ 30 ], data analysis [ 31 ], and CT. These different skills were selected to match the diverse interests of the participating faculty and were targeted by different faculty in different “arms” of the study (in different institutions, courses, and groups of students) to allow for comparison among results. Here we report on the results for CT. A key component of this portion of the study was the use of case studies to foster both content and skill development.

Our study design built on evidence showing that case study exercises help reinforce concept knowledge, as well as cognitive skills, and further, that repetition and reflection [ 32 ] support development of higher-order thinking skills. We investigated three questions: (1) Does instructor emphasis on CT skills—providing metacognitive support for reflection on their performance at light and intensive levels—influence the magnitude of individual CT skill gains? (2) Do students show similar responses for the different dimensions of CT learning, or are any of them more challenging than others? and (3) How does our intervention influence students at different initial achievement levels?

To address these questions, we first created and validated instructional materials in the form of case study exercises and assessment rubrics designed to develop and assess four main dimensions of CT skills (see below) and piloted these materials in diverse classroom settings across five institutions. We assessed student learning using a common rubric to score CT performance on two case study–based exercises, and using an independent assessment of CT skills (The Critical Thinking Assessment Test (CAT) [ 33 ]), applied at the start and end of each course. To address the frequent concern of instructors regarding trade-off with content learning, we investigated these questions while also measuring content gains. A key aim of this study was to develop and use approaches for active teaching using case studies that instructors can readily adopt as part of their regular teaching practices.

Developing, Validating, and Implementing Assessment Tools

Between April and July 2011, we created and validated a set of instructional materials based on case studies designed to develop CT skills (Instructional Unit for CT skills). The Instructional Unit consisted of (1) Case Study Exercise 1 on amphibian declines, with a solution file, (2) Case Study Exercise 2 on invasive species, with a solution file, (3) a pre/post content knowledge assessment for each exercise, (4) a student’s pre/post self-assessment of their CT skills, (5) our CT Rubric, and (6) the files associated with the intensive versus light Teaching Intervention, including a third brief Case Study on climate change used in the intensive intervention, with a solution file. The complete Instructional Unit as used in the study, as well as updated versions of the case studies, can be downloaded from the website of the Network of Conservation Educators and Practitioners (NCEP).

Development and Validation of the CT Rubric and Case Study Exercises

To evaluate student CT performance, we developed a rubric based on elements found in existing and available rubrics (e.g., Washington State University’s Guide to Rating Critical & Integrative Thinking from 2006, and Northeastern Illinois University CT Rubric from 2006) and the VALUE Rubric for CT [ 34 ]. The resulting rubric included descriptions of four performance levels (from 1 to 4) for four dimensions of CT: (1) explanation of issues or problems, (2) selection and use of information, (3) evaluation of the influence of context and assumptions, and (4) reaching positions or drawing conclusions. The final rubric drew on broadly validated rubrics and was adapted by a core group of eight participating project faculty at a workshop in 2011. Using a collaborative and participatory approach to rubric development, we sought to validate rubric content, ensure familiarity of faculty participants with the rubric, and minimize scoring differences among project participants.

We then developed two exercises based on real-world case studies, as recommended by the Vision and Change Report [ 1 ]. Case study topics were selected to correspond to core topics that could be incorporated into all courses with minimal syllabus disruption. We developed Case Study Exercise 1 with a focus on threats to biodiversity, specifically on understanding the causes of amphibian declines. We adapted Case Study Exercise 2 on the topic of invasive species, specifically on rusty crayfish in the Eastern United States, from a version previously published by the NCEPs ( http://ncep.amnh.org ). Each case study exercise contained three main parts: (1) a short introduction and instructions to the exercise, (2) the case study, and (3) a section with questions designed to prompt students’ CT skills in relation to the case. Each case study exercise was designed to teach conservation biology content in alignment with the CT skills assessed in the rubric; it included questions and tasks intended to elicit student performance in each of the four CT dimensions described in the rubric.

Implementation of the Case Study Exercises and CT Interventions

Between August 2011 and August 2013, we implemented the Instructional Unit following the experimental design shown in figure 1 in upper-level conservation biology courses given at five U.S. higher education institutions ( table 1 ). Case Study Exercise 1 was administered within the first 2 weeks of class as a preassessment and Case Study Exercise 2 was administered within the last 2 weeks of class in the term as a postassessment (see figure 1 ). To guide and facilitate data collection, we provided each professor with a scoring guide to assign points to answers to each question in the case study exercise and a spreadsheet to enter points. Scores from specific questions were assigned to one of the four CT dimensions. Professors then reported these final scores on each dimension of the rubric to the students. Scores from both case study exercises contributed toward students’ grades.
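The per-question scoring workflow described above can be sketched in a few lines. The question identifiers, point values, and question-to-dimension mapping below are hypothetical illustrations, not the study's actual scoring guide:

```python
# Hypothetical mapping of case study exercise questions to the four
# CT rubric dimensions (the study provided each professor with its own guide)
QUESTION_DIMENSION = {
    "Q1": "explanation_of_issues",
    "Q2": "selection_and_use_of_information",
    "Q3": "context_and_assumptions",
    "Q4": "conclusions",
    "Q5": "selection_and_use_of_information",
}

def dimension_scores(points_per_question):
    """Sum a student's per-question points into one score per rubric dimension."""
    totals = {}
    for question, points in points_per_question.items():
        dimension = QUESTION_DIMENSION[question]
        totals[dimension] = totals.get(dimension, 0) + points
    return totals

# One (hypothetical) student's points on the five questions
student = {"Q1": 3, "Q2": 2, "Q3": 4, "Q4": 3, "Q5": 1}
print(dimension_scores(student))
# → {'explanation_of_issues': 3, 'selection_and_use_of_information': 3,
#    'context_and_assumptions': 4, 'conclusions': 3}
```

The resulting per-dimension totals correspond to the final scores on each rubric dimension that professors reported back to students.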

Figure 1. Experimental design and main questions within and across terms. The discontinuous arrow between Light and Intensive Teaching Interventions (TI) indicates an interchangeable order. Abbreviations are as follows: CAT = Critical Thinking Assessment Test; CE Ex = case study exercise. Not all students completed all components, so total sample size differs in our analyses of these data. N ranged from 52 to 82 for completion of the pre-/postcase study exercise content assessments for the specific case studies, with N = 113 total for the light teaching intervention and N = 103 total for the intensive teaching intervention. For CT skill gains across the case study exercises, N = 216. For the pre- and postcourse student self-assessments, N = 76 for the light and 79 for the intensive teaching intervention. N = 78 for the light and 71 for the intensive teaching interventions for the pre- and postcourse CAT tests.

Table 1. Institution Type, Student Level, Class Size, and Term When the Instructional Unit With the Intensive (ITI) and/or Light (LTI) Teaching Intervention Was Used for Each Participating Course.

a Following the Carnegie Classification of Institutions of Higher Education http://classifications.carnegiefoundation.org/ .

b Class size = average number of students enrolled in ITI and LTI sections of the course.

We evaluated whether students gained CT skills, content knowledge, and self-confidence in their skills in courses that used the Instructional Unit with one of two levels of teaching intervention: light and intensive. The interventions differed in the amount of class time used and the level of reflection required from the students. In the light intervention, students were only given the CT rubric and their scores from the first exercise, while in the intensive intervention, students received the same and also worked in groups with the CT rubric on an additional case over a single class period, followed by individual reflection on how to improve their CT performance. Using both interventions in the same course, but during different academic terms, we investigated whether the intensity of emphasis on CT in a course influences students’ overall CT gains.

In addition, we conducted an independent assessment of CT gains under the two interventions. At the beginning and end of each course, we administered the Critical Thinking Assessment Test (CAT), a published, validated instrument developed by the Center for Assessment & Improvement of Learning at Tennessee Tech University (CAIL at TTU [ 33 ]; see figure 1). The CAT is a 1-h written test consisting of 15 questions that assesses student performance in evaluation and interpretation of graphical and written information, problem solving, identifying logical fallacies or needs for information to evaluate a claim, understanding the limitations of correlational data, and developing alternative explanations for a claim. These CT dimensions were comparable to those we evaluated in our rubric, particularly those under Evidence, Influence of Context and Assumptions, and Conclusions.

Further, to examine whether explicit instruction in CT skills was more influential than explicit instruction in other skills, the CAT assessments were also given in the other two arms of the study that evaluated interventions designed to improve data analysis [ 31 ] and oral communication skills [ 30 ]. Unfortunately, only one instructor in the oral communication study applied the CAT instrument, so we restricted comparison to the data analysis study, where four instructors applied the CAT in their courses.

We scored batches of completed CAT tests in nine full-day scoring sessions, including only tests for which we had both a pre- and postcourse test from the same student ( N = 290 total; CT study, N = 149; data analysis study, N = 141). In each session, we scored a sample of tests from across multiple institutions, study arms, and intervention levels, and each test was assigned a numerical code so that all scoring was blind. Following CAT procedures, scoring rigorously adhered to CAT scoring rubrics and was discussed by the scoring group as needed to ensure inter-scorer reliability. The CAT tests and scores were then sent to Tennessee Tech University for independent assessment, cross-validation, and analysis. For 2 of the 15 questions, the CAT scoring performed by our team was more generous than norms for these assessments performed nationally, but otherwise scores fell within those norms (results not included; analysis performed by CAIL at TTU). However, this did not affect the use of this tool as an independent assessment of CT skill gains because the scoring sessions were internally consistent and applied to both pre- and postcourse scores.

The project received an exemption from the AMNH Institutional Review Board (IRB 09-24-2010) and the Stony Brook University IRB (265533-1), and the other institutions operated under these exemptions.

A total of 217 students from five upper-level Conservation Biology courses completed both case study exercises over one term. We excluded one student who obtained the maximum score on both exercises while using the light intervention because no improvement was possible, leaving us with N = 216 students in this study. To assess CT skills, content knowledge, and self-confidence, we calculated changes in student performance using normalized change values ( c ) [ 35 ] and compared pre- and postassessments with paired Wilcoxon signed-rank tests [ 36 ]. The two teaching intervention groups (light and intensive) were assessed independently. Changes in the proportions of students scoring in a given quartile before and after the interventions were analyzed using χ² tests. We tested for the effect of instructional emphasis using the light versus intensive intervention with a linear mixed-effects model. Online Appendix 1 has additional description of these analyses.
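As a concrete illustration of the normalized change values ( c ), the sketch below follows the standard definition of normalized change: the observed gain (or loss) as a fraction of the maximum gain (or loss) possible given the pre-test score. The study's analyses were performed in R; this is a minimal Python sketch with hypothetical scores, not the authors' code:

```python
def normalized_change(pre, post, max_score=100.0):
    """Normalized change c: the observed gain (or loss) as a fraction of
    the maximum gain (or loss) possible given the pre-test score."""
    if post > pre:
        return (post - pre) / (max_score - pre)  # share of possible gain realized
    if post < pre:
        return (post - pre) / pre                # share of possible loss realized
    return 0.0                                   # no change

# Hypothetical percent scores on Case Study Exercises 1 (pre) and 2 (post)
pre_scores = [55, 60, 70, 40, 80]
post_scores = [70, 75, 68, 60, 90]
c_values = [normalized_change(p, q) for p, q in zip(pre_scores, post_scores)]
print([round(c, 2) for c in c_values])  # one c value per student
```

A positive c indicates a gain; in the study, the paired pre- and postassessment scores were then compared with paired Wilcoxon signed-rank tests.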

Because we found no differences among courses given at the different institutions, and CAT test samples were homoscedastic, a repeated-measures ANOVA was used on data pooled across institutions. This ANOVA tested overall differences across teaching interventions, across instructional units, and effects on gains for specific skills measured in the CAT. All calculations and statistical analyses were performed in R [ 37 ].
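
For a two-group (light vs. intensive), two-occasion (pre/post) design such as this one, the intervention-by-occasion interaction tested by the repeated-measures ANOVA is equivalent to comparing gain scores between the two groups. A minimal Python sketch on simulated data (the study's analyses were done in R, and all numbers below are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated CAT-style total scores for two groups of 40 students each
light_pre = rng.normal(20, 6, 40)
light_post = light_pre + rng.normal(1.2, 3, 40)          # modest average gain
intensive_pre = rng.normal(19, 6, 40)
intensive_post = intensive_pre + rng.normal(3.0, 3, 40)  # larger average gain

# In a 2 x 2 mixed design, the group-by-occasion interaction F equals the
# square of the independent-samples t statistic on the gain scores.
t, p = stats.ttest_ind(light_post - light_pre, intensive_post - intensive_pre)
F_interaction = t**2
```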

Gains in CT Skills as Measured by Performance Over the Instructional Unit

Most students gained CT skills in each term, as measured by their relative CT performance on the two case study exercises ( figure 2 ). In terms where a light intervention was used ( N = 113 students), 81 students (72%) gained CT skills (positive c value), improving their performance, on average, by 34%. With the intensive intervention ( N = 103 students), 79 students (77%) gained in skills, improving by 37% ( table 2 ).

Percent scores for Case Study Exercises 1 and 2, used as pre- and postassessments of critical thinking skills, respectively, under the light and intensive teaching interventions (TI). Asterisks indicate significant differences (p < .001) tested with the paired Wilcoxon signed-rank test. In addition, differences in the percent scores were not equally distributed across quartiles in both the light (N = 113; χ2 = 23.415, p = .0005) and intensive comparisons (N = 103; χ2 = 31.893, p = .0005), and the contingency tables indicated shifts in frequency from the bottom quartile before the intervention to the highest quartile after the intervention.


Overall Average Gains for Conservation Biology Courses That Used the Instructional Unit for Critical Thinking With the Light (LTI) and Intensive Teaching Interventions (ITI).

Notes: n.s. = no significant gains between Case Study Exercises 1 and 2 using a paired Wilcoxon signed-rank test.

a Percentage of students who gained skills shown in parentheses.

b Average normalized gains ± standard error of the mean.

** Highly significant, * significant.

Significant shifts in performance between the first and second case study exercises were evident in both the light and intensive interventions, based on χ 2 analysis indicating shifts in frequency from the bottom quartile before the intervention to the highest quartile after the intervention. We found no significant effect of the level of intervention on mean skill gains ( N = 216 students; F (1,216) = 1.359; p = .18). However, the level of intervention was associated with differential gains when students were grouped by initial level of performance, above or below the median; only in the intensive intervention did those performing above the median also show significant gains ( table 2 ). Under the light intervention, 54 students scored below the median of 66%, and 59 scored equal to or above the median in Case Study Exercise 1. Students below the median had greater gains than students with scores equal to or above the median: they improved their performance by an average of 41%, with 81% of them showing gains ( table 2 ), whereas students equal to or above the median improved their CT skills by an average of 27%, with 63% of them showing gains.
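
The quartile-shift comparison amounts to a χ2 test on a 2 × 4 contingency table of student counts per quartile before and after the intervention. A sketch with invented counts (SciPy; not the study's data):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts of students per score quartile (Q1 = bottom, Q4 = top)
counts = np.array([
    [35, 30, 28, 20],  # before the intervention
    [18, 25, 30, 40],  # after the intervention
])
chi2, p, dof, expected = chi2_contingency(counts)  # dof = (2-1) * (4-1) = 3
```

A small p value here indicates that the distribution of students across quartiles differs between the two occasions, as reported for both interventions.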

Under the intensive intervention, 48 students scored below the median score of 64%, and 55 scored equal to or above the median score in Case Study Exercise 1. Students below the median improved by an average of 44% with 90% of them showing gains, while students equal to or above the median improved their CT skills on average by 29% with 65% of them having gains ( table 2 ).

A detailed analysis shows students improved their levels of performance in most of the four dimensions of CT defined for this study. However, achievement level varied among dimensions ( figure 3 ). Surprisingly, for Explanation of the issues to be considered critically , students decreased their level of performance under both interventions ( v = 1542; p < .0025, with Bonferroni correction). In the case of Evidence and Influence of context and assumptions , students significantly improved regardless of which intervention was used ( v = 524 and 39; p < .0025; see figure 3 ).

Distribution of students’ performance within the four levels of proficiency for critical thinking skills (1 = lowest, 4 = highest) when using the instructional unit with the light (N = 113 students) and intensive (N = 103 students) teaching interventions. Asterisks indicate significant differences (p < .0025) and n.s. indicates no significant differences (p > .0125) between the rubric scores for Exercises 1 (preteaching intervention) and 2 (post teaching intervention), tested with the paired Wilcoxon test, Bonferroni corrected.


Student Content Knowledge, CT Skills, and Self-Confidence

Students gained content knowledge related to the topics of both case study exercises under the light and the intensive intervention, with gains greater than 26% from pre- to postexercise (see table 3 ). Gains in concept knowledge associated with both case studies were greater than 35% for the light teaching intervention and similarly high for the first case study in the intensive teaching intervention group.

Average Gains in Students’ Content Knowledge Measured as the Average Normalized Change ( c ave ) While Using Exercises 1 and 2 of the Instructional Unit With the Light (LTI) and Intensive Teaching Interventions (ITI).

Note: p values are for the paired Wilcoxon signed-rank test on the percentages of the pre- and postcontent scores.

In addition, there was a marginally significant correlation between gains in CT skills and content knowledge ( N = 136 students; ρ = .161; p = .06). Students who showed greater gains in CT skills also showed greater gains in their content knowledge in the topic areas that were the focus of the case studies.

Based on individual self-assessment questionnaires, we found average gains in student self-confidence with CT skills of 21% regardless of intervention. Increases were statistically significant for some of the self-assessment questions, under the intensive intervention only ( figure 4 ). Our results indicate no correlation between gains in CT skills and self-confidence ( N = 155 students; ρ = .049; p = .5).

Frequency distribution of students’ self-assessed confidence levels with their critical thinking skills when using the instructional unit with the light (N = 79 students) and intensive (N = 76 students) teaching interventions. One and two asterisks indicate significant differences between pre- and postassessment scores with p < .0125 and p < .0025, respectively, and n.s. indicates no significant differences (p > .0125) tested with the paired Wilcoxon test, Bonferroni corrected.


Gains in CT Skills as Measured by the CAT Instrument

We also evaluated differences in CT gains as measured by the CAT instrument, both within the CT study arm described here, and the additional arm of the larger study focused on data analysis skills [ 31 ].

Students gained CT skills in both the light and the intensive intervention, with a significant interaction between teaching intervention and test administration: students in the intensive intervention showed greater gains (repeated measures ANOVA: F (1,147) = 4.081, p = .045; figure 5 ). Significant gains were seen with the light intervention for two questions related to the ability to summarize the pattern of results in a graph without making inappropriate inferences and the use of basic mathematical skills to help solve a real-world problem, with effect sizes of 0.28 and 0.35, respectively. Over all 15 questions, CT gains were moderate, with an effect size of 0.19 ( table 4 ). Under the intensive CT intervention, significant gains were seen in five different questions, with effect sizes ranging from 0.32 to 0.38, and overall gains across the 15 questions were large, with an effect size of 0.49 ( table 4 ).
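
The text does not specify which effect-size formula was applied; one common convention for paired pre/post scores is Cohen's d_z, the mean gain divided by the standard deviation of the gains:

```python
import numpy as np

def paired_effect_size(pre, post):
    """Cohen's d_z for paired scores: mean gain / SD of the gains (ddof=1).
    One common convention; the formula used in the study is not stated."""
    gains = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    return gains.mean() / gains.std(ddof=1)
```

For example, gains of 2, 5, and 1 points across three students give d_z of roughly 1.28.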

Comparison of gains in total CAT scores administered pre- versus postcourse by teaching intervention, as measured by the CAT instrument, across all institutions. Scores (SD indicated between parentheses) of the light teaching intervention were 20.00 (5.87) precourse and 21.22 (6.71) postcourse; scores for the intensive intervention were 19.39 (5.74) precourse and 22.37 (6.30) postcourse. A repeated measures ANOVA was conducted to compare the effect of teaching intervention (Light/Intensive) and test administration (pre-/postcourse) on the CAT total score for courses using the CT Unit, showing a significant interaction of Teaching Intervention and pre-/postcourse administration (F(1,147) = 4.081, p = .045). Students in the Intensive Intervention made greater gains on the CAT total score than students in the light intervention.


Specific CT Skills Identified in CAT Questions in Which the Students in This Study Showed Significant Gains.

Notes: The specific CAT question numbers are given, with a brief description of the CT skill addressed by the question. The pre- and postcourse means, probability of difference, and effect sizes are given only for those cases in which there was a significant difference observed.

Students using the CT Instructional Unit showed greater increases in CAT scores than those in the data analysis arm of the study ( F (1,290) = 9.505, p = .002), a pattern driven by the results of the intensive teaching intervention. Students in the Intensive Teaching Intervention treatment of the CT arm of the study had significantly higher gains in CAT scores than those in the data analysis arm of the study ( F (1,148) = 11.861, p < .001). There was no significant difference in student CAT scores for those in the light teaching intervention of the two arms of the study ( F (1,142) = 2.540, p = .113).

Our study adds to recent literature on effective approaches for teaching and learning of CT skills (e.g., [ 18 , 19 , 20 , 21 ]), an essential outcome of college and university education. Our pedagogical intervention hinges on the use of case studies to foster both content knowledge and CT skills, with support of an assessment rubric. We show that educators can foster measurable gains in CT over the course of a single term or semester by giving students an opportunity to practice these skills through case study exercises at least twice and reflect on their performance midway through the term, using a rubric that provided an operational definition of CT.

We chose a case study approach because real-world problem solving involves making decisions embedded in context [ 14 , 38 ]. Learning how to think critically about information available in its context and evaluating evidence through identifying assumptions and gaps to arrive at strong inference is better supported through lessons presented in case studies, rather than as abstract principles alone. A key to our process was to help students identify the steps they are taking—enhancing their metacognition—through naming specific skills in formative rubrics. In this way, we specifically targeted enhancing their CT skills while gaining concept knowledge about conservation.

Does Instructor Emphasis on CT Skill Affect the Magnitude of Individual Skill Gains?

The light and intensive interventions used in our study differed in level of engagement with a rubric specifically designed to promote and assess CT skills—a type of formative rubric use. Rubrics are generally designed with assessment and grading in mind and developed to fit a specific assignment; however, they have great potential to help with process skill development [ 39 ]. In this study, students were given the detailed rubric after completing the first case study exercise and were encouraged to locate their performance on the rubric. In the intensive intervention, students were further tasked with using the rubric to evaluate and improve sample answers to an additional, short case study.

The formative rubric use allowed us to align assignments to the dimensions of a given skill, in line with the principles of backwards design [ 40 ] and constructive alignment [ 41 ], and to identify the components where students struggle the most, as areas to target. Our results lend support to the benefits of rubric use [ 39 , 42 ]. Using a rubric to codify and operationalize a complex skill like CT seems to help both educators and learners. Our results are concordant with those of Abrami et al. [ 18 ] and Cargas et al. [ 19 ], as we show that “corrective feedback on a common rubric” aligned with relevant, authentic tasks supported learning, and that the simple act of sharing a rubric with the students may not be sufficient by itself [ 43 ]. We encourage others to make use of available collections of rubrics, such as those generated by the VALUE initiative [ 15 ].

The rubric allowed us to provide qualitative feedback to students as they practiced—an anchor for student reflection—and to analyze gains quantitatively. Furthermore, the unit as a whole was designed to promote self-reflection, which has been shown to increase students’ ability to monitor their own selection and use of resources and evidence [ 44 ]. Self-reflection was also found to increase oral communication performance in a parallel arm of our study [ 30 ].

Finally, using case study exercises aligned to the rubric but designed to encompass topics relevant to course content, we were able to assess the learning of content while practicing CT skills. Our results support previous findings [ 45 ] that students can experience simultaneous gains in knowledge and skills, even when instructional materials and class time are dedicated to CT skill development. Indeed, we found student content knowledge gains were positively correlated with their CT skill gains, although the correlation was only marginally significant. Taken together, our results suggest that cultivating CT skills not only does not compete with content knowledge gains but may well enhance them.

Do Students Show Similar Responses for the Different Dimensions of CT Learning?

Formative rubric use provided insights into which dimensions of CT are more challenging to students, providing valuable feedback to educators. Our results indicate that some dimensions of CT are more challenging to improve than others. A finer examination of the CT gains shows that the changes driving our results stem from two dimensions in our rubric: selection and use of evidence and recognition of the influence of context and assumptions (see figure 3 ). Several aspects of the CT Instructional Unit are likely to have enhanced outcomes in these dimensions, such as the fact that both exercises were based on case studies where students were asked to explain how a change in context would change their course of action or conclusions. Case studies are considered valuable for science teaching because they can reflect the complexity of problems and professional practice in social-environmental systems [ 14 , 38 ]. Cases present concepts and connections among them in a specific context, therefore highlighting the influence of context and assumptions, and require students to evaluate the information being presented and to select the most useful or relevant evidence for a particular task or decision. Our results support previous studies showing case studies can enhance CT skills and conceptual understanding by design [ 45 , 46 , 47 ].

Student performance did not significantly improve in the remaining dimensions of our CT rubric. In the case of ability to clearly and comprehensively explain the issue , overall students showed a loss ( figure 3A ). This dimension was unique in that students were already high achievers at the outset of the term, and the slight loss may correspond to noise along a dimension in which students were already at maximum performance levels. Alternatively, exercise structure could have played a role. The instructions for Case Study Exercises 1 and 2 were not identical in the questions relating to this dimension. Case Study Exercise 1 scores were derived from three separate questions ( What problem are amphibians facing? Summarize the Climate hypothesis; Summarize the Spread hypothesis ), while Case Study Exercise 2 scores rested on a single answer ( Write a paragraph for your supervisors describing and explaining the problem Bright Lake is facing and why it is important to address it ). Despite the former scores being averaged, having separate questions may have offered more opportunities for achievement in the first exercise and fewer in the second, resulting in an observed loss in this dimension. This was the only rubric dimension for which the number of questions contributing to a dimension’s overall score varied between case study exercises.

Finally, the most challenging dimension for students was the ability to make judgments and reach a position, drawing appropriate conclusions based on the available information, its implications, and consequences. No significant gains and the lowest rates of achievement were observed for this dimension, which maps to higher-order cognitive tasks or higher Bloom’s taxonomy levels, and has also been shown to be the most challenging for students in a broader science context [ 48 , 49 ]. In a review of student writing in biology courses, Schen [ 49 ] observed that students were often adept at formulating simple arguments but showed limited ability to craft alternative explanations or to engage with other more sophisticated uses of available information. Our results mirrored this observation, as students generally only made simple use of information. Becker [ 50 ] found similar patterns in student performance and further showed that explicit instruction in constructing arguments based on evidence resulted in students developing more accurate and more sophisticated conclusions. Again, our results spotlight the importance of explicit instruction in CT. Focusing student attention on how to sift among details presented in case studies to draw inferences and conclusions and on expressing their arguments with clear connection to the evidence within case studies may be necessary steps for students to have significant gains in these more advanced aspects of CT.

Similarly, gains in CAT scores were not randomly distributed throughout questions or dimensions of CT. Students significantly improved their CAT scores in questions measuring the ability to evaluate and interpret information, think creatively, and communicate effectively. Conversely, students did not gain in their capacities to use information critically in drawing conclusions (e.g., identify additional information needed to evaluate a hypothesis , use and apply relevant information to evaluate a problem , or explain how changes in a real world problem situation might affect the solution ). The results of the CAT and our case study assessments were broadly similar, with many significant gains seen in CT, except in those dimensions that required more sophisticated reasoning. Together, these results suggest that more, and perhaps different, instructional attention is needed to help students achieve certain specific dimensions of CT (see also [ 11 ]).

How Does the Intervention Affect Students at Different Achievement Levels?

Students with lower initial performance (i.e., below the median in the first exercise) gained more than those with a higher performance (above the median). These differential CT gains suggest that distinct mechanisms for improvement may be at play. We hypothesize those students who were initially least proficient in CT were assisted by the combination of repeated practice (two case study exercises) and calling attention to the components of CT through the rubric-driven intervention, along with self-reflection. Using similar instructional activities could enhance performance or retention in science courses in general [ 51 ], given links between process skills and risk of failing introductory biology [ 52 ]. We further hypothesize that for higher achieving students, the greater emphasis on metacognition in the intensive intervention may be critical to promote gains in performance. Simply prompting students to reflect on their learning may be insufficient [ 53 ], as many students need support in implementing metacognitive strategies despite being familiar with them, such as purposeful peer interaction [ 54 ]. The combination of repeated practice and reflection through the intervention’s in-class discussion may have helped students engage more effectively with their learning.

Students showed significant gains in CAT scores under both interventions, although significantly higher under the intensive intervention. Importantly, because the students took the CAT at the end of the course, the CAT measured their response to both exercises plus the intervention , which, in the case of the intensive intervention, included practice in improving responses to a short case study exercise in alignment with the CT rubric. This contrasts with the instructional assessment, which measured gains corresponding only to the midterm teaching intervention as measured by improvement in scores for the second case study exercise. Thus, as measured by the CAT, the whole unit improved CT skills among these students over the term in both interventions, while the extensive discussion of CT skills that was part of the intensive intervention improved CT performance even further.

Advancing CT skills has proven to be challenging for many institutions. The CAT test has been used in over 250 institutions around the world [ 55 ], but few have observed gains in CT overall (see [ 11 , 56 ]), although some have found an effect on individual CT dimensions [ 19 , 57 ]. We consider the inclusion of case study–based exercises to be an important factor in activating student learning and fostering strong CT gains among students in our study.

The CAT assessments were also given in another arm of the overall study that evaluated interventions designed to improve data analysis skills [ 31 ], enabling us to compare CT gains when directly targeted (in the CT arm of the study) to when they were not (the data analysis arm of the study). Only students in the CT arm of the study showed notable CT gains under a light intervention, and the gains were greater under the intensive intervention in the CT arm than in the data analysis arm. The intensive intervention was designed particularly to foster student capacity to reflect on their own learning, or metacognition, as this skill has been shown to improve academic performance [ 53 , 58 ]. Thus, the independent CAT assessment shows that explicit instruction in CT, coupled with repeated practice and reflection, is effective in improving student CT (see also [ 57 ]). Importantly, the CAT results imply that by developing CT in a conservation biology context, students are also enhancing their ability to apply that CT skill to other domains of their learning, such as the more general tasks required in the CAT.

Implications for Future Research and Scaling

While the results of this study are promising, our approach could be subjected to further testing. A limitation of the study was the lack of collateral data collection, such as GPA averages or overall course grades, which would have allowed for additional comparison of the student populations in each intervention. Differences in course achievement among classes, however, would not affect our interpretation of the effect of the intervention because the CT gains were observed between exercises in each term and are an internal comparison within the same student population within a given course. Our study did not use a treatment and control design or randomly assign students to the interventions. An approach based on multiple linear regression at the student level [ 59 ] could be a helpful alternative.

Adoption of the approach presented here was successful in a variety of contexts and situations. The institutions in this study varied in size and type, class size, and instructor; they included those ranked as R1, MA-granting and undergraduate only, private and public, a Minority Serving Institution, part-time and residential, and with class sizes between 10 and 60 students (see table 1 for details). Despite this variation, in 9 of the 10 classes, we observed an increase in students’ CT performance over a term, under both light and intensive interventions.

Our study shows educators can foster measurable gains in CT over the course of a single term or semester by giving students an opportunity to practice at least twice and reflect midway using case study exercises aligned to both course content and a rubric that provides an operational definition of CT. Despite the brevity of the interventions, the study has provided valuable new findings on student performance in different dimensions of CT and shows promising results from instructional approaches that can be easily adapted and integrated into a variety of courses and contexts. Importantly, the study design also allowed us to work together as a team with diverse faculty in the design and application of assessment materials, which served as a professional development for faculty that can help “close the loop” between assessment and future teaching practice.

CT underpins the kind of leadership capacity needed in society today, including “ethical behavior, the ability to work with diverse populations, and the ability to think from a systems perspective” [ 17 ]. These skills are essential for conservation biology researchers and professionals because of the multidisciplinary nature of challenges comprising various forms of evidence [ 60 ], the potential for consequences to diverse stakeholders, and the high prevalence of trade-offs among alternative scenarios. Encouraged by the results of this study, we urge educators to explore these and other approaches to target CT explicitly in their learning activities and teaching practice.

ALP, AB, NB, and EJS developed the study framework. ALP, AB, MJG, NB, BJA, JAC, CG, MC, TT, DSF, DV, and EJS contributed to development of the instructional units. ALP, MJG, LMD, BJA, JAC, CG, DLS, MC, DSF, LF, TL, and DV implemented the CT study in their classrooms and collected data for the study. AB, LMD, and ALP performed the data analysis. ALP, MJG, and AB led the writing of the manuscript, with contributions from EJS, LMD, and NB. All authors contributed to CAT scoring sessions and to the discussions that supported writing of the manuscript. The study was made possible by an NSF grant to EJS, ALP, and NB.

We are grateful to all study participants, those who helped score the CAT, and K. Douglas and N. Gazit for key assistance. We thank G. Bowser, A. Gómez, S. Hoskins, K. Landrigan, D. Roon, and J. Singleton for their contributions to the initial design and the original authors of the NCEP materials adapted for this study. The Biology Education Research Group at UW provided helpful input in initial discussions.

The authors have declared that no competing interests exist. Martha J. Groom is an editor at Case Studies in the Environment . She was not involved in the review of this manuscript.

This project was supported by the National Science Foundation (NSF) CCLI/TUES Program (DUE-0942789). Opinions, findings, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect NSF views.

Appendix 1. Supplementary Information on Methods.

Both versions can be downloaded by registering as an educator on http://ncep.amnh.org . To find the original versions used in the study, which also include all instructions given to participating faculty, see “NSF CCLI / TUES Instructional Unit: Critical Thinking.” For classroom-ready, updated versions of the case studies, see “Applying Critical Thinking to an Invasive Species Problem,” and “Applying Critical Thinking to the Amphibian Decline Problem.” These cite more current literature and have been edited for clarity.




Effective Decision-Making: A Case Study

Effective Decision-Making: Leading an Organization Through Timely and Impactful Action

Senior leaders at a top New England insurance provider need to develop the skills and behaviors for better, faster decision-making. This virtually delivered program spans four half-day sessions and includes individual assignments, facilitator-led presentations, and simulation decision-making. Over the past two months, this program touched over 100 leaders, providing them with actionable models and frameworks to use back on the job.

For one of New England’s most iconic insurers, senior leaders are challenged to make timely, effective decisions. These leaders face decisions on three levels: ones they translate to their teams, ones they make themselves, and ones they influence. But in a quickly changing, highly regulated market, risk aversion can lead to slow and ineffective decisions. How can senior leaders practice, in a safe environment, the quick yet informed decision-making the job requires, while simultaneously learning new models and techniques — and without the learning experience burdening their limited time?

The Effective Decision-Making program was artfully designed to immerse senior leaders in 16 hours of hands-on experience, including reflection and feedback activities, applicable exercises, supporting content, and participation in a business simulation to practice the core content of the program. Participants work together in small groups to complete these activities within a limited time frame, replicating the work environment in which these leaders must succeed. Continuous reflection and group discussion around results create real-time learning for leaders. Application exercises then facilitate the simulation experience and their work back on the job. The program employs a variety of learning methodologies, including:

  • Individual assignments that incorporate content and frameworks designed to develop effective decision-making skills.
  • Guided reflection activities to encourage self-awareness and commitments for action.
  • Large group conversations — live discussions focused on peer input around key learning points.
  • Small group activities, including virtual role plays designed to build critical interpersonal and leadership skills.
  • A dynamic business simulation in which participants are charged with translating, making, and influencing difficult decisions.
  • Facilitator-led discussions and presentations.

Learning Objectives

Participants develop and improve skills to:

  • Cultivate a leadership mindset that empowers, inspires, and challenges others.
  • Translate decisions for stronger team alignment and performance.
  • Make better decisions under pressure.
  • Influence individuals across the organization.
  • Better understand how one’s leadership actions impact business results.

Design Highlights

Program Agenda

As a result of the COVID-19 pandemic and the need for social distancing, this program was delivered virtually. However, this didn't preclude the need to give leaders an opportunity to connect with, and learn from, one another. In response to those needs, Insight Experience developed a fully remote, yet highly interactive, offering delivered over four half-day sessions.

Interactive Virtual Learning Format

Effective Decision-Making was designed to promote both individual and group activities and reflection. Participants access the program via a video-conferencing platform that allows them to work together both in large and small groups. Learning content and group discussions are done as one large group, enabling consistency in learning and opportunities to hear from all participants. The business simulation decision-making and reflection activities are conducted in small groups, allowing teams to develop deeper connections and conversations.

Simulation Overview

IIC

Participants assume the role of a General Manager for InfoMaster, a message management provider. Their leadership challenge as the GM is to translate the broader IIC organizational goals into strategy for their business, support that strategy through the development of organizational capabilities and product offerings, manage multiple divisions and stakeholders, and consider their contribution and responsibility to the broader organization of which they are a part.

Success in the simulation is based on how well teams:

  • Understand and translate organizational strategy into goals and plans for their business unit.
  • Align organizational initiatives and product development with broader strategies.
  • Develop employee capabilities required to execute strategic goals.
  • Hold stakeholders accountable to commitments and results.
  • Communicate with stakeholders and involve others in plans and decision-making.
  • Develop their network and their influence within IIC to help support initiatives for the organization.

History and Results

Effective Decision-Making was developed in 2020 as an experience for senior-level leaders. After a successful pilot, the program was then rolled out to two more cohorts in 2021 and 2022. The senior-level leaders who participated in the program then requested we offer the same program to their direct reports. After some small adjustments to make the program more appropriate for director-level leaders, the program was launched in 2022 for approximately 100 directors.

Here is what some participants have said about this program:

  • “One of the better programs we've done here at [our organization]. Pace was very quick but content was excellent and approach made it fun.”
  • “Loved the content and the flow. Very nicely organized and managed. Thank you!”
  • “Really enjoyed the collaborative nature of the simulation.”
  • “It was wonderful and I felt it is a great opportunity. Learnt and reinforced leadership training and what it would take to be successful.”
  • “One of the best I've experienced — especially appreciated how the reality of [our organization] was incorporated and it was with similarly situated peers.”
  • “This program was great! It gave good insight into how to enhance my skills as leader by adopting the leadership mindset.”
  • “Loved the fast pace, having a sim group that had various backgrounds in the company and seeing the results of our decisions at the corporate level.”
  • “Great program — I love the concepts highlighted during these sessions.”



Article • 10 min read

Case Study-Based Learning

Enhancing Learning Through Immediate Application

By the Mind Tools Content Team


If you've ever tried to learn a new concept, you probably appreciate that "knowing" is different from "doing." When you have an opportunity to apply your knowledge, the lesson typically becomes much more real.

Adults often learn differently from children, and we have different motivations for learning. Typically, we learn new skills because we want to. We recognize the need to learn and grow, and we usually need – or want – to apply our newfound knowledge soon after we've learned it.

A popular theory of adult learning is andragogy (the art and science of teaching adults), as opposed to the better-known pedagogy (the art and science of teaching children). Malcolm Knowles, a professor of adult education, is considered the father of andragogy, which is based on four key observations of adult learners:

  • Adults learn best if they know why they're learning something.
  • Adults often learn best through experience.
  • Adults tend to view learning as an opportunity to solve problems.
  • Adults learn best when the topic is relevant to them and immediately applicable.

This means that you'll get the best results with adults when they're fully involved in the learning experience. Give an adult an opportunity to practice and work with a new skill, and you have a solid foundation for high-quality learning that the person will likely retain over time.

So, how can you best use these adult learning principles in your training and development efforts? Case studies provide an excellent way of practicing and applying new concepts. As such, they're very useful tools in adult learning, and it's important to understand how to get the maximum value from them.

What Is a Case Study?

Case studies are a form of problem-based learning, where you present a situation that needs a resolution. A typical business case study is a detailed account, or story, of what happened in a particular company, industry, or project over a set period of time.

The learner is given details about the situation, often in a historical context. The key players are introduced. Objectives and challenges are outlined. This is followed by specific examples and data, which the learner then uses to analyze the situation, determine what happened, and make recommendations.

The depth of a case depends on the lesson being taught. A case study can be two pages, 20 pages, or more. A good case study makes the reader think critically about the information presented, and then develop a thorough assessment of the situation, leading to a well-thought-out solution or recommendation.

Why Use a Case Study?

Case studies are a great way to improve a learning experience, because they get the learner involved, and encourage immediate use of newly acquired skills.

They differ from lectures or assigned readings because they require participation and deliberate application of a broad range of skills. For example, if you study financial analysis through straightforward learning methods, you may have to calculate and understand a long list of financial ratios (don't worry if you don't know what these are). Likewise, you may be given a set of financial statements to complete a ratio analysis. But until you put the exercise into context, you may not really know why you're doing the analysis.

With a case study, however, you might explore whether a bank should provide financing to a borrower, or whether a company is about to make a good acquisition. Suddenly, the act of calculating ratios becomes secondary – it's more important to understand what the ratios tell you. This is how case studies can make the difference between knowing what to do, and knowing how, when, and why to do it.

What really separates case studies from other practical forms of learning – like scenarios and simulations – is the ability to compare the learner's recommendations with what actually happened. When you know what really happened, it's much easier to evaluate the "correctness" of the answers given.

When to Use a Case Study

As you can see, case studies are powerful and effective training tools. They also work best with practical, applied training, so make sure you use them appropriately.

Remember these tips:

  • Case studies tend to focus on why and how to apply a skill or concept, not on remembering facts and details. Use case studies when understanding the concept is more important than memorizing correct responses.
  • Case studies are great team-building opportunities. When a team gets together to solve a case, they'll have to work through different opinions, methods, and perspectives.
  • Use case studies to build problem-solving skills, particularly those that are valuable when applied, but are likely to be used infrequently. This helps people get practice with these skills that they might not otherwise get.
  • Case studies can be used to evaluate past problem solving. People can be asked what they'd do in that situation, and think about what could have been done differently.

Ensuring Maximum Value From Case Studies

The first thing to remember is that you already need to have enough theoretical knowledge to handle the questions and challenges in the case study. Otherwise, it can be like trying to solve a puzzle with some of the pieces missing.

Here are some additional tips for how to approach a case study. Depending on the exact nature of the case, some tips will be more relevant than others.

  • Read the case at least three times before you start any analysis. Case studies usually have lots of details, and it's easy to miss something in your first, or even second, reading.
  • Once you're thoroughly familiar with the case, note the facts. Identify which are relevant to the tasks you've been assigned. In a good case study, there are often many more facts than you need for your analysis.
  • If the case contains large amounts of data, analyze this data for relevant trends. For example, have sales dropped steadily, or was there an unexpected high or low point?
  • If the case involves a description of a company's history, find the key events, and consider how they may have impacted the current situation.
  • Consider using techniques like SWOT analysis and Porter's Five Forces Analysis to understand the organization's strategic position.
  • Stay with the facts when you draw conclusions. These include facts given in the case as well as established facts about the environmental context. Don't rely on personal opinions when you put together your answers.

Writing a Case Study

You may have to write a case study yourself. These are complex documents that take a while to research and compile. The quality of the case study influences the quality of the analysis. Here are some tips if you want to write your own:

  • Write your case study as a structured story. The goal is to capture an interesting situation or challenge and then bring it to life with words and information. You want the reader to feel a part of what's happening.
  • Present information so that a "right" answer isn't obvious. The goal is to develop the learner's ability to analyze and assess, not necessarily to make the same decision as the people in the actual case.
  • Do background research to fully understand what happened and why. You may need to talk to key stakeholders to get their perspectives as well.
  • Determine the key challenge. What needs to be resolved? The case study should focus on one main question or issue.
  • Define the context. Talk about significant events leading up to the situation. What organizational factors are important for understanding the problem and assessing what should be done? Include cultural factors where possible.
  • Identify key decision makers and stakeholders. Describe their roles and perspectives, as well as their motivations and interests.
  • Make sure that you provide the right data to allow people to reach appropriate conclusions.
  • Make sure that you have permission to use any information you include.

A typical case study structure includes these elements:

  • Executive summary. Define the objective, and state the key challenge.
  • Opening paragraph. Capture the reader's interest.
  • Scope. Describe the background, context, approach, and issues involved.
  • Presentation of facts. Develop an objective picture of what's happening.
  • Description of key issues. Present viewpoints, decisions, and interests of key parties.

Because case studies have proved to be such effective teaching tools, many are already written. Some excellent sources of free cases are The Times 100, CasePlace.org, and Schroeder & Schroeder Inc. You can often search for cases by topic or industry. These cases are expertly prepared, based mostly on real situations, and used extensively in business schools to teach management concepts.

Case studies are a great way to improve learning and training. They provide learners with an opportunity to solve a problem by applying what they know.

There are no unpleasant consequences for getting it "wrong," and cases give learners a much better understanding of what they really know and what they need to practice.

Case studies can be used in many ways, as team-building tools, and for skill development. You can write your own case study, but a large number are already prepared. Given the enormous benefits of practical learning applications like this, case studies are definitely something to consider adding to your next training session.

Knowles, M. (1973). 'The Adult Learner: A Neglected Species.'



Brought to you by:

Harvard Business School

Decision-Making Exercise (A)

By: David A. Garvin, Michael A. Roberto

  • Length: 5 page(s)
  • Publication Date: Aug 16, 1996
  • Discipline: General Management
  • Product #: 397031-PDF-ENG

What's included:

  • Teaching Note
  • Educator Copy

  • $4.95 per student (degree-granting course)
  • $8.95 per student (non-degree-granting course)


Provides questionnaires so students can compare their experiences with different decision-making processes. Students read "Growing Pains," a Harvard Business Review (HBR) case study, and then work in teams to come up with recommendations using a consensus approach to decision making. The next day, using Decision-Making Exercise (B) and (C) and "Case of the Unhealthy Hospital," another HBR case study, the same teams apply either a dialectical inquiry or a devil's advocacy approach to decision making.

Learning Objectives

To introduce students to different types of decision-making processes, approaches to conflict, and ways that general managers can effectively direct and shape decision making.

Aug 16, 1996 (Revised: Feb 23, 2000)

Discipline:

General Management

Harvard Business School

397031-PDF-ENG



  • Open access
  • Published: 13 February 2024

Development of a predictive machine learning model for pathogen profiles in patients with secondary immunodeficiency

Qianning Liu, Yifan Chen, Peng Xie, Ying Luo, Buxuan Wang, Yuanxi Meng, Jiaqian Zhong, Jiaqi Mei & Wei Zou

BMC Medical Informatics and Decision Making, volume 24, Article number: 48 (2024)


Background

Secondary immunodeficiency can arise from various clinical conditions, including HIV infection, chronic diseases, malignancy and long-term use of immunosuppressives, which make the affected patients susceptible to all types of pathogenic infections. Other than in HIV infection, the possible pathogen profiles in secondary immunodeficiency induced by other aetiologies are largely unknown.

Methods

Medical records of patients with secondary immunodeficiency caused by various aetiologies were collected from the First Affiliated Hospital of Nanchang University, China. Based on these records, machine learning models were developed to predict the potential infectious pathogens that may afflict patients whose secondary immunodeficiency is caused by disease conditions other than HIV infection.

Results

Several metrics were used to evaluate the models’ performance, and all of them supported the same conclusion: the Gradient Boosting Machine performed best, with the highest accuracy at 91.01%, exceeding the other models by 13.48, 7.14, and 4.49 percentage points, respectively.

Conclusions

The models developed in our study enable prediction of the potential infectious pathogens that may affect patients with secondary immunodeficiency caused by aetiologies other than HIV infection, which will help clinicians make timely decisions on antibiotic use before microorganism culture results return.

Peer Review reports

Introduction

The human immune system plays a crucial role against all kinds of pathogens [1]. A defect in any component of the immune system leads to immunodeficiency. Depending on the underlying mechanisms, immunodeficiency is categorized as primary or secondary [2]. Primary immunodeficiency often involves genetic mutations in components of the immune system and includes antibody deficiency, complement deficiency, phagocytic deficiency and combined immunodeficiency, among others [3, 4, 5]. Secondary immunodeficiency, the topic of our current study, can occur in the circumstances of human immunodeficiency virus (HIV) infection, long-term use of immunosuppressives, severe burns, chronic kidney diseases and malignant tumors, among others [6, 7, 8].

In HIV-infected patients the spectrum of pathogens causing opportunistic infections is well documented, with CD4+ T cell counts known to influence susceptibility to various pathogens [9, 10, 11, 12, 13]. However, the pathogen profiles responsible for opportunistic infections in individuals with other forms of secondary immunodeficiency remain largely unexplored [14, 15]. While culture of blood or other sterile body fluids remains the gold standard for diagnosing infections [16, 17, 18], it has limitations, such as being time-consuming and having a low positive rate [19]. Therefore, developing other methods capable of indicating the potential pathogens in patients with secondary immunodeficiency is crucial for timely clinical decision-making, which sometimes is life-saving.

Mathematical models are valuable tools for infectious disease research [20], and one of their most important applications is predicting disease occurrence. With suitable models, postoperative infection in elderly patients with spinal fractures [21], infection with high-risk types of human papillomavirus [22] and the disease outcome of septic shock [23] can be predicted. However, mathematical models have not yet been applied to predicting the pathogen profiles commonly seen in secondary immunodeficiency other than that caused by HIV infection. This is an important question considering the huge size of this patient population and the time it takes for standard microbiological methods to report positive results. Therefore, to facilitate timely diagnosis of infections in patients with secondary immunodeficiency caused by aetiologies other than HIV, in the current study we constructed mathematical models based on lab test results, including complete blood count (CBC), C-reactive protein (CRP), procalcitonin (PCT) and erythrocyte sedimentation rate (ESR), together with culture results from various body fluids, to help clinicians predict the most likely pathogens and rapidly start empirical antibiotics before the culture results return.

Data collection

Our study was approved by the ethical committee of The First Affiliated Hospital of Nanchang University with reference number (2022) CDYFYYLK (10–010). Data were collected from the medical records of patients in the departments of hematology, transplantation ICU, autoimmune diseases, oncology, intensive care unit (ICU), nephrology and burns from 2012 to 2022.

Based on the definition of the non-HIV-infected immunodeficient population in the “Clinical practice guideline on early detection for pulmonary tuberculosis in general hospitals” published in China in late 2023 [24], all included patients had infections of the blood, abdominal cavity or other body sites that were secondary to immunodeficiency caused by either their original diseases or the treatments they received. Specimens were collected for culture from the infection sites once infection was suspected.

The patient inclusion criteria of the current study were: 1. Patients with hematological malignancies undergoing chemotherapy and/or radiotherapy; 2. Patients with solid malignant tumors undergoing chemotherapy, radiotherapy and/or surgery, with bone marrow suppression for more than 2 weeks; 3. Patients with rheumatoid autoimmune diseases on long-term use of glucocorticoids (defined as prednisone ≥ 30 mg/d, or an equivalent dosage of prednisone at 0.5 mg/kg/day, for more than 2 weeks) and/or cytotoxic drugs; 4. Patients with hypoproteinemia caused by organ dysfunction, such as cirrhosis and liver failure, or by protein loss due to chronic kidney disease; 5. Patients on immunosuppressants after organ transplantation; 6. Patients with severe burns; 7. Patients in the ICU with possible secondary immunodeficiency. The exclusion criteria were: 1. Short-term use of immunosuppressants (less than 2 weeks); 2. Primary immunodeficiency; 3. People living with HIV; 4. Patients with secondary immunodeficiency whose body fluid cultures turned out to be negative; 5. Patients with secondary immunodeficiency whose culture results were suspicious for contamination.

Data preprocessing

The data, containing 2024 observations and 42 features, were collected from the medical records of the First Affiliated Hospital of Nanchang University in China from 2012 to 2022. Of these records, 1053 were from the department of hematology, 13 from the transplantation ICU, 114 from the department of autoimmune diseases, 104 from the department of oncology, 109 from the ICU, 213 from the department of nephrology and 418 from the department of burns. The original dataset contained a great amount of missing information, and the data matrix was sparse, with a lot of unstructured text, so it needed to be preprocessed.

To ensure the models’ effectiveness, we deleted the records with missing values while retaining features such as PCT and CRP that were significantly associated with culture results. After preprocessing, the data matrix had 443 observations and 13 columns.

Finally, we had to deal with the unstructured data. The column of culture results, as the response variable, first needed to be classified manually to prevent the loss of model efficiency caused by too many categories. The number of manual classes should be neither too small nor too large. On one hand, with only two classes, Gram-positive and Gram-negative bacteria, the study would have little medical meaning. On the other hand, with too many categories — for example, 50 small classes — the model’s performance would foreseeably be poor. Therefore, we categorized the culture results into seven classes based on the genus of the bacteria.

However, other text features, like “Diagnostic Results” and “Infection Sites”, could not be dealt with in this way, because they could not easily be classified into discrete categories. For example, a patient might be diagnosed with both leukemia and diabetes, but we could not create a class called “Leukemia and Diabetes”, as the classes would become too miscellaneous. A better approach was to transform these two features into dummy variables. We used the tidyverse package in R to pull out the critical information in these two columns and create the dummy variables.
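In Python/pandas terms, this dummy-variable step might look like the following sketch (the study itself used R’s tidyverse; the diagnoses, condition list, and column names here are invented for illustration):

```python
import pandas as pd

# Hypothetical free-text diagnoses, standing in for the "Diagnostic Results" column.
records = pd.DataFrame({
    "Diagnostic Results": ["leukemia and diabetes", "leukemia", "cirrhosis"],
})

# One 0/1 indicator column per condition of interest.
for condition in ["leukemia", "diabetes", "cirrhosis"]:
    records[condition] = (
        records["Diagnostic Results"].str.contains(condition).astype(int)
    )

print(records[["leukemia", "diabetes", "cirrhosis"]].values.tolist())
# → [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

A patient with two conditions simply gets a 1 in two indicator columns, avoiding a combinatorial explosion of combined classes.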

Imbalanced classes

However, one problem still remained: the seven response classes were highly imbalanced, which can cause serious problems. For instance, a classifier may tend to assign samples from a small class to a large one while still showing acceptable overall performance. Although under-sampling and oversampling are the most widely used approaches to the imbalance problem, they are also likely to cause underfitting and overfitting, respectively. In our study we chose multinomial distribution sampling, which has proved feasible in a related application [ 25 ], to tackle the imbalance. The seven classes were resampled according to a multinomial distribution with probabilities \(\{q_i\}_{i=1\ldots 7}\), where

\(q_i = \dfrac{p_i^{\alpha}}{\sum_{j=1}^{7} p_j^{\alpha}}, \qquad p_i = \dfrac{n_i}{\sum_{k=1}^{7} n_k},\)

and \(n_i\) denotes the number of samples in class \(i\). We employed \(\alpha = 0.5\). Sampling with this distribution increases the share of the small classes and decreases that of the big classes without causing severe information loss or overfitting.
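A small numerical sketch of these sampling probabilities (the class counts below are hypothetical):

```python
import numpy as np

# Hypothetical counts for the seven classes, from most to least frequent.
counts = np.array([500.0, 120.0, 80.0, 60.0, 40.0, 30.0, 20.0])
alpha = 0.5  # smoothing exponent used in the study

p = counts / counts.sum()          # empirical class frequencies p_i
q = p**alpha / (p**alpha).sum()    # smoothed sampling probabilities q_i

# With alpha < 1, rare classes get a larger sampling share than their raw
# frequency and frequent classes a smaller one.
```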

Feature selection

After transforming the “Diagnostic Results” and “Infection Sites” columns into dummy variables, there were 88 features in total. Too many variables in the model, however, cause multicollinearity and overfitting. Accordingly, to simplify the model while retaining its performance, we reduced the dimensionality. We chose Recursive Feature Elimination (RFE) for feature selection and kept the top 20 most important variables in the model. Among these, PCT level, absolute numbers of lymphocytes and leukocytes, percentage of neutrocytes, age, number of neutrocytes, CRP, bone marrow suppression, lung infection, gender, septicemia and leukemia were the top 12 most important variables.
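A sketch of Recursive Feature Elimination with scikit-learn, using synthetic data as a stand-in for the 88-feature design matrix; the estimator inside RFE is an assumption, since the paper does not state which one it used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the 88-feature dataset.
X, y = make_classification(n_samples=200, n_features=88, n_informative=20,
                           random_state=0)

# Recursively refit the model and drop the least important features
# (4 per round here, for speed) until 20 remain.
selector = RFE(RandomForestClassifier(n_estimators=25, random_state=0),
               n_features_to_select=20, step=4)
selector.fit(X, y)
X_top20 = selector.transform(X)  # reduced design matrix, 20 columns
```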

Our primary goal was to predict the cultivation results. We therefore applied several machine learning models to the data and compared their performance. After the initial preprocessing steps (missing-data handling, feature extraction, feature selection and outcome weighting), the dataset was split into training and testing sets at an 80:20 ratio. The training set was used to develop and tune the prediction models, while the testing set was held out solely for final model evaluation.

Our first model, K-Nearest Neighbors (KNN), used Euclidean distance. Its only hyperparameter is the number of nearest neighbors, k. To select the best value, we used grid search to iterate k from 1 to 20.
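A sketch of this search with scikit-learn on synthetic data (the paper's mtry/nIter naming suggests its own implementation used R's caret framework, so this is an analogue, not the authors' code):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training set.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# 10-fold cross-validated grid search over k = 1..20,
# mirroring the study's search range.
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": list(range(1, 21))},
                    cv=10)
grid.fit(X, y)
best_k = grid.best_params_["n_neighbors"]
```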

Boosted logistic regression also has a single hyperparameter, nIter, the number of boosting iterations. We performed a grid search over nIter from 1 to 30.

Random Forest, trained with the bagging method, is an ensemble of decision trees. It also has a single hyperparameter, mtry, the number of randomly selected predictors considered at each split. We grid-searched mtry over the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50}.

Gradient Boosting Machine is an ensemble of weak learners with several hyperparameters. Since more than one parameter was involved, we used grid search to explore all possible combinations until we reached the set with the best performance. The grid covered maximum tree depth {1, 5, 10, 15, 20} and number of trees {50, 100, 150, 200, …, 450, 500}.
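The combination enumeration underlying this two-dimensional grid can be sketched as:

```python
from itertools import product

# Hyperparameter axes searched in the study.
max_depths = [1, 5, 10, 15, 20]
n_trees = list(range(50, 501, 50))  # 50, 100, ..., 500

# Grid search evaluates every (depth, number-of-trees) combination.
param_grid = list(product(max_depths, n_trees))
# 5 depths x 10 tree counts = 50 candidate settings; the reported
# optimum (depth 15, 450 trees) is one of them.
```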

Model selection

We used grid search to find the hyperparameters giving the best model performance. Grid search is a popular model selection method that iterates through every possible parameter combination to find the best one; the process is mechanical but effective.

In Fig.  1 , panels A-D depict the results of a 10-fold cross-validation grid search on the training dataset. Panel A details the KNN model, where the optimal number of neighbors (k) is 1. Panel B illustrates the Boosted Logistic Regression model, with the best performance at 30 boosting iterations (nIter). Panel C shows the Random Forest model, achieving highest accuracy with 6 randomly selected predictors (mtry). Finally, Panel D presents the Gradient Boosting Machine, where a maximum tree depth of 15 and 450 trees resulted in the highest accuracy. Notably, the optimal parameter for KNN, k = 1, raises concerns regarding potential overfitting.

figure 1

Hyperparameter Optimization Across Different Machine Learning Models. A - D  Performance metrics obtained from a 10-fold cross-validation grid search. A K-Nearest Neighbors (KNN) model accuracy as a function of the number of neighbors, with optimal performance at k = 1. B Boosted Logistic Regression model accuracy across boosting iterations, peaking at nIter = 30. C Random Forest model accuracy in relation to the number of randomly selected predictors, optimal at mtry = 6. D Gradient Boosting Machine model accuracy influenced by the number of boosting iterations and maximum tree depth, with the highest accuracy achieved at a depth of 15 and 450 trees. The selection of k = 1 for the KNN model suggests a potential overfitting issue that warrants further evaluation

Adjustment on multiclass ROC curve

Receiver Operating Characteristic Curve, also known as ROC Curve, is widely used for assessing binary classification problems. Our problem, however, is related to multiclass classification. Thus, the ROC curve needs some adjustments. There are two ways to address this problem.

Let n be the number of samples in the testing set and \(\mathcal{P}\) the number of classes. After training the model on the training set, we generated a probability matrix with n rows and \(\mathcal{P}\) columns whose ( i ,  j ) entry is the probability that the i th sample belongs to the j th class. For each class we can plot a ROC curve, giving \(\mathcal{P}\) curves in total; averaging them yields the final ROC curve.

The second approach creates a label matrix with the same shape as the probability matrix, where each row is the one-hot encoding of that sample's class label. Flattening both matrices column-wise into vectors yields a pair that can be treated as the scores and labels of a single binary classification problem, from which the ROC curve follows directly.

The above two approaches correspond to the 'macro' and 'micro' settings of the sklearn.metrics.roc_auc_score() function. Their results should be close to each other, so we report both in Fig.  2 to confirm the robustness of the models.
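A minimal sketch of the two averaging schemes with scikit-learn, using a hypothetical three-class example:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted class probabilities (3 classes).
y_true = np.array([0, 1, 2, 1, 0, 2])
proba = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.5, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])

# Macro: average the per-class one-vs-rest ROC AUCs.
macro = roc_auc_score(y_true, proba, multi_class="ovr", average="macro")

# Micro: flatten the one-hot label matrix and the probability matrix,
# then treat the result as one large binary problem.
onehot = np.eye(3)[y_true]
micro = roc_auc_score(onehot.ravel(), proba.ravel())
```

In this toy example every class is ranked cleanly, so both averages are near 1; on real data the two values differ slightly but should stay close.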

figure 2

Receiver Operating Characteristic (ROC) Curves for Machine Learning Models. A - D Comparison of model performance through ROC analysis. A K-Nearest Neighbors (KNN) ROC curves for various classes, highlighting the trade-off between true positive rate and false positive rate with an area under the curve (AUC) for the micro-average at 0.87 and macro-average at 0.91. B Boosted Logistic Regression ROC curves, showing improved performance with an AUC for the micro-average at 0.93 and macro-average at 0.92. C Random Forest ROC curves, indicating superior performance with an AUC for the micro-average at 0.98 and macro-average at 0.98. D Gradient Boosting Machine ROC curves, exhibiting exceptional discriminative power with an AUC for the micro-average at 0.98 and macro-average at 0.97

Figure  2 A-D display ROC curves for K-Nearest Neighbors, Boosted Logistic Regression, Random Forest, and Gradient Boosting Machine models, delineating their discriminative performance in pathogen classification, with the latter two models showing notably superior performance as evidenced by their near-perfect micro-average and macro-average AUCs of 0.98.

Confusion matrix

The confusion matrix is a table layout that visualizes the performance of a classification model. In our matrices, each column represents the true class and each row the predicted class. In general, the larger the values on the diagonal, the better the model performs.

Figure  3 A-D depict confusion matrices for the K-Nearest Neighbors, Boosted Logistic Regression, Random Forest, and Gradient Boosting Machine models, respectively, illustrating their classification accuracy across various pathogens. Each matrix shows the number of correct and incorrect predictions, with darker shades indicating higher counts. The Gradient Boosting Machine demonstrates the highest number of correct predictions for Enterobacteriaceae, while all models show good performance in correctly identifying the majority of pathogens, albeit with some false positives and negatives, as is typical in predictive modeling.

figure 3

Confusion Matrices for Model Performance Evaluation. This figure presents the confusion matrices of four machine learning models, allowing for a detailed assessment of prediction accuracy for various pathogen classes. Values along the diagonal represent correct classifications, while non-diagonal values indicate misclassifications. A The K-Nearest Neighbors (KNN) model matrix, with counts of true vs. predicted labels, showing a specific number for true positive rates in pathogen detection. B The Boosted Logistic Regression model matrix, detailing the true positives along the diagonal and misclassifications off-diagonal, reflecting the model’s predictive power and misclassification patterns. C The Random Forest model matrix, which illustrates a higher concentration of true positives along the diagonal, indicative of a model with strong predictive accuracy. D The Gradient Boosting Machine model matrix, showing high true positive rates, especially for certain pathogens, suggesting a high degree of model precision

Accuracy, recall, precision, brier score and F1 score

Accuracy is the most popular metric for classification problems, but because of the residual class imbalance it may not be the most appropriate one here. We therefore also used recall, precision, Brier score and F1 score. Note that these metrics differ slightly between multiclass and binary classification. The true positives (TP) for a class are, as usual, the diagonal entry counting samples correctly assigned to that class. However, with columns representing true classes and rows representing predictions, the false negatives (FN) for a class are the sum of its column excluding the diagonal entry, and the false positives (FP) are the sum of its row excluding the diagonal entry. Apart from that, all the definitions match those of binary classification.
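These multiclass definitions can be computed directly from a confusion matrix; the matrix below is hypothetical, with columns as true classes and rows as predictions:

```python
import numpy as np

# Hypothetical 3-class confusion matrix: columns = true, rows = predicted.
cm = np.array([[50,  3,  2],
               [ 4, 40,  5],
               [ 1,  2, 30]])

tp = np.diag(cm)          # correctly predicted samples per class
fn = cm.sum(axis=0) - tp  # rest of each class's column: truly that class, predicted otherwise
fp = cm.sum(axis=1) - tp  # rest of each class's row: predicted that class, truly otherwise

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```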

Descriptive statistics

Demographic characteristics and infection-associated laboratory test results of the studied patients from the departments of hematology, rheumatology, oncology, ICU, nephrology and burn are shown in Table  1 . Except in the rheumatology department, male patients outnumbered female patients in every department. The ages of male and female patients were comparable within each department. ICU patients were the oldest, followed in order by the oncology, nephrology, rheumatology, hematology and burn departments. Only the hematology, oncology and burn departments had collected enough CRP and PCT data for analysis; as seen in Table 1, CRP and PCT in patients from these three departments were dramatically elevated. Except for the patients from the hematology and rheumatology departments and the female patients from the oncology department, all patients had elevated blood leukocyte counts. The normal to slightly elevated leukocyte counts in these groups may be partly related to other treatments they received, such as immunosuppressive therapy and chemotherapy. The absolute number of peripheral neutrocytes was elevated in all patients. Interestingly, the absolute number of peripheral lymphocytes was dramatically elevated in patients from the hematology department but normal in all other departments, which is probably related to the underlying diseases of the hematology patients.

Further medically relevant information can be extracted from Table 1. The white blood cell counts of male patients include more extreme values than those of females: several male patients had leukocyte counts well above 100 × 10⁹/L, while the highest value in females was 99.38 × 10⁹/L, which is probably associated with biological differences between men and women. Regarding infection location, bloodstream infection has a very distinctive profile compared with other infection sites. First, the mean PCT of blood infection patients is 78.94 ng/mL and the mean lymphocyte count 45.03 × 10⁹/L, both much higher than at other infection sites. This observation is consistent with the known clinical significance of PCT: in general, PCT increases slightly in patients with local bacterial infection but markedly in those with invasive infection such as bloodstream infection causing systemic inflammatory response syndrome (SIRS), sepsis and septic shock [ 19 ]. Second, the mean neutrocyte count in males with blood infection is 40.71 × 10⁹/L, much higher than in the other groups. Finally, the mean CRP of blood infection patients is around 0, far lower than elsewhere, indicating that its specificity and sensitivity may be lower than those of PCT.

Model evaluation

As seen in Fig. 2 , KNN and Boosted Logistic Regression did not perform as well as Random Forest and Gradient Boosting Machine. This is partial evidence that ensemble algorithms are more appropriate for this dataset, although further evidence is needed to verify it.

As shown in Fig.  3 , KNN misclassifies substantially. Although the confusion matrix provides detailed information about each model's performance on the test set, it is not a straightforward, quantified metric: it neither summarizes a model's performance in a single number nor allows easy comparison between models. We therefore employed quantified metrics to characterize each model's performance and compare the models.

The ensemble algorithms, Random Forest and Gradient Boosting Machine, exceeded the others in performance. Analyzing this table alongside the confusion matrices revealed some interesting patterns. KNN and Random Forest tended to misclassify Enterobacteriaceae as other bacteria, including Pseudomonas and Staphylococcus. Moreover, all four classifiers appeared weak at predicting Staphylococcus and tended to misclassify other bacteria as Staphylococcus, as all had a relatively low precision score for this class.

Table  2 presents a comprehensive overview of the performance metrics for four different machine learning models used to predict the presence of various pathogens in a clinical setting. The performance of each model is evaluated in terms of Accuracy, Brier Score, Precision, Recall, and F1 Score across different bacteria, as well as fungal infections, which provides a robust set of criteria for assessing the predictive capabilities.

The KNN model shows consistent performance, especially for the Enterobacteriaceae genus, with an accuracy of 77.53%, complemented by a moderate Brier Score of 0.449, and a high F1 Score of 0.779 due to its perfect precision. It also performs commendably well with fungi, achieving an F1 Score of 0.923 with a Brier Score that indicates reliable probabilistic predictions.

The Boosted Logistic Regression model overall appears to provide strong results, especially with the Streptococcus genus for which it achieves perfect scores in all three metrics, along with a Brier Score that supports the reliability of its probabilistic assessments. Furthermore, the model maintains high predictive power for both Enterobacteriaceae and fungi, with accuracy rates surpassing 83% and Brier Scores that reflect the consistency of the model’s predictions.

The Random Forest model exhibits outstanding performance, with exceptional results for the Streptococcus and Enterococcus genera, along with fungi – each achieving perfect accuracy, precision, and F1 Scores of 1.0000. These results are further corroborated by the model’s low Brier Scores, particularly noteworthy being the 0.221 for Enterobacteriaceae, indicating the model’s robust ability to correctly classify and balance the presence of these pathogens within the dataset with reliable probability estimates.

The Gradient Boosting Machine (GBM) model demonstrates exemplary performance overall, particularly with the Enterobacteriaceae genus, boasting an accuracy of 91.01% and a corresponding F1 Score of 0.944. The Brier Score of 0.214 for this model indicates a high level of precision in the probabilistic predictions, complementing its exceptional ability to balance precision and recall. It also performs flawlessly for Streptococcus, other Bacilli, Enterococcus, and fungi, as reflected by a perfect F1 Score of 1.0000 and supportive Brier Scores indicating accurate probability estimates.

Confidence intervals for accuracy, along with Brier scores, are provided for each genus within the models, allowing a comprehensive statistical understanding of the accuracy and reliability of the predictions. Notably, for pathogens with higher prevalence, the models show tight confidence intervals and lower Brier scores, indicating robust predictive accuracy.

In conclusion, this comparative analysis illustrates that advanced machine learning techniques can be highly effective for pathogen prediction in patients with secondary immunodeficiency.

Many non-infectious diseases have secondary infections as a common complication, most often in the context of long-term use of immunosuppressives, severe trauma, malignancy and other chronic diseases [ 26 , 27 ]. In this patient population, the spectra of infectious pathogens have largely not been systematically studied. Knowledge of these spectra would help clinicians make timely, evidence-based decisions on antibiotic use before culture results return, which can sometimes be life-saving [ 28 , 29 , 30 , 31 ].

Using advanced machine learning techniques and 13 parameters, in the current study we successfully developed four types of mathematical models to predict potential pathogens in patients with secondary immunodeficiency. Among these models, Gradient Boosting Machine was found to perform best. To our knowledge, this is the first study to systematically explore the pathogen profile in patients with secondary immunodeficiency using mathematical models. Considering the many types of pathogens and practical clinical needs, we categorized the cultivated pathogens into seven classes, which we believe should be instructive for the empirical start of antibiotic therapy before culture results return.

While our study has provided valuable insights, it is important to acknowledge certain limitations in our methodological approach. In line with the PROBAST (Prediction model Risk Of Bias Assessment Tool) guidelines designed for evaluating the risk of bias and applicability of prediction model studies, we recognize that our use of 10-fold cross-validation for internal validation, though robust, may impart limitations in the context of external validation. Specifically, our model’s predictive performance could be overly optimistic if the training data are not representative of the broader population or future patients. Furthermore, despite the randomization process in cross-validation, unmeasured confounders and unknown biases inherent to the initial sample collection could still influence the results. PROBAST emphasizes the importance of validating prediction models in external datasets, originating from different settings or time periods compared to the data used for model development, to ensure applicability and generalizability. We have not yet had the opportunity to test our model on a completely independent external dataset, and such an external validation remains part of our future work. Until then, the generalizability of our findings should be considered with caution, as the chance of model overfitting to the idiosyncrasies of our dataset cannot be completely ruled out. We aim to address this by planning prospective studies that will challenge our model with diverse datasets across different clinical settings.

In addition, this was a single-center study, and most of the data came from the hematology department. The prediction models established here therefore need further verification in more patients with different forms of secondary immunodeficiency from various geographic areas. We also think secondary immunodeficiency needs a more precise definition, although it is genuinely hard to define since so many diseases can cause it, which poses challenges for standardization.

In conclusion, despite these limitations, our study presents models that we believe can assist clinicians, especially those from non-infectious-disease departments, in making timely evidence-based decisions regarding antibiotic use while awaiting culture results.

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Sattler S. The role of the immune system beyond the fight against infection. Adv Exp Med Biol. 2017;1003:3–14.


Justiz Vaillant AA, Qurie A. Immunodeficiency. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2023.


Allegra A, et al. Secondary immunodeficiency in hematological malignancies: focus on multiple myeloma and chronic lymphocytic leukemia. Front Immunol. 2021;12:738915.


Picard C, et al. Primary immunodeficiency diseases: an update on the classification from the International Union of Immunological Societies Expert Committee for primary immunodeficiency 2015. J Clin Immunol. 2015;35(8):696–726.

Devonshire AL, Makhija M. Approach to primary immunodeficiency. Allergy Asthma Proc. 2019;40(6):465–9.

Šedivá A, et al. Medical algorithm: diagnosis and management of antibody immunodeficiencies. Allergy. 2021;76(12):3841–4.


Tuano KS, Seth N, Chinen J. Secondary immunodeficiencies: an overview. Ann Allergy Asthma Immunol. 2021;127(6):617–26.

Vargas-Camaño ME, et al. Cancer as secondary immunodeficiency. Rev Alerg Mex. 2016;63(2):169–79.

Tangye SG, Palendira U, Edwards ES. Human immunity against EBV-lessons from the clinic. J Exp Med. 2017;214(2):269–83.


Mortaz E, et al. Cancers related to Immunodeficiencies: update and perspectives. Front Immunol. 2016;7:365.

Hatherill M, White RG, Hawn TR. Clinical development of new TB vaccines: recent advances and next steps. Front Microbiol. 2019;10:3154.

Buchacher A, Iberer G. Purification of intravenous immunoglobulin G from human plasma--aspects of yield and virus safety. Biotechnol J. 2006;1(2):148–63.

Bose S, Grammer LC, Peters AT. Infectious chronic Rhinosinusitis. J Allergy Clin Immunol Pract. 2016;4(4):584–9.

José RJ, Periselneris JN, Brown JS. Opportunistic bacterial, viral and fungal infections of the lung. Medicine (Abingdon). 2020;48(6):366–72.


Pimentel R, et al. Spontaneous bacterial peritonitis in cirrhotic patients: a shift in the microbial pattern? A retrospective analysis. GE Port J Gastroenterol. 2022;29(4):256–66.

Yamane N. Blood culture: gold standard for definitive diagnosis of bacterial and fungal infections--from the laboratory aspect. Rinsho Byori. 1998;46(9):887–92.


Stefani S. Diagnostic techniques in bloodstream infections: where are we going? Int J Antimicrob Agents. 2009;34(Suppl 4):S9-12.

Aronson MD, Bor DH. Blood cultures. Ann Intern Med. 1987;106(2):246–53.

Xu HG, Tian M, Pan SY. Clinical utility of procalcitonin and its association with pathogenic microorganisms. Crit Rev Clin Lab Sci. 2022;59(2):93–111.

Rao DW, et al. Partnership dynamics in mathematical models and implications for representation of sexually transmitted infections: a review. Ann Epidemiol. 2021;59:72–80.

Wang H, et al. A mathematical prediction model for postoperative infection based on logistic multiple regression analysis in the assessment of surgical outcome and prediction of infection in elderly spinal fractures. Altern Ther Health Med. 2023:AT9287. Online ahead of print. 

Zhang J, Wang K. Mathematical modeling and computational prediction of high-risk types of human papillomaviruses. Comput Math Methods Med. 2022;2022:1515810.


Yamanaka Y, et al. Mathematical modeling of septic shock based on clinical data. Theor Biol Med Model. 2019;16(1):5.

Li Y, et al. Clinical practice guideline for early detection of pulmonary tuberculosis in comprehensive medical institutions. Chin J Antituberculosis. 2023. https://doi.org/10.19982/j.issn.1000-6621.20230428 .

Conneau A, Lample G. Cross-lingual language model pretraining. Adv Neural Inf Process Syst. 2019;32.

Furman CD, Rayner AV, Tobin EP. Pneumonia in older residents of long-term care facilities. Am Fam Physician. 2004;70(8):1495–500.

Cillóniz C, et al. Impact of age and comorbidity on cause and outcome in community-acquired pneumonia. Chest. 2013;144(3):999–1007.

Di Pasquale MF, et al. Prevalence and etiology of community-acquired pneumonia in Immunocompromised patients. Clin Infect Dis. 2019;68(9):1482–93.

Mandell LA, et al. Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults. Clin Infect Dis. 2007;44(Suppl 2):S27–72.

Lim WS, et al. BTS guidelines for the management of community acquired pneumonia in adults: update 2009. Thorax. 2009;64(Suppl 3):iii1-ii55.

Woodhead M, et al. Guidelines for the management of adult lower respiratory tract infections--full version. Clin Microbiol Infect. 2011;17(Suppl 6):E1-59.


Acknowledgements

Not applicable.

This work was supported by the National Natural Science Foundation of China (grant No.: 82360391), Jiangxi Department of Science and Technology (grant No.: 20202BAB206023) and the Double Thousand Talents Plan of Jiangxi Province to Wei Zou.

Author information

Qianning Liu, Yifan Chen, Peng Xie and Ying Luo are the co-first authors.

Authors and Affiliations

School of Statistics, Jiangxi University of Finance and Economics, Nanchang, 330013, Jiangxi, China

Qianning Liu, Yifan Chen & Buxuan Wang

Department of Infectious Diseases, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, Jiangxi, China

Peng Xie, Ying Luo & Wei Zou

The First Clinical Medical College,Jiangxi Medical College, Nanchang University, Nanchang, 330006, Jiangxi, China

Yuanxi Meng, Jiaqian Zhong & Jiaqi Mei

Department of Infectious Diseases, Third People’s Hospital of Jiujiang, Jiujiang, 332000, Jiangxi, China


Contributions

Q. L: Supervision, Writing – Review & Editing, Y.C: Methodology, Software, Formal Analysis, Visualization, Writing – Original Draft. P.X: Conceptualization, Writing – Original Draft. Y.L: Resource, Writing – Original Draft. B.W: Software, Investigation, Methodology. Y.M: Data Curation, Validation. J.Z and J.M: Resource, Validation. W.Z: Supervision, Project administration, Writing – Review & Editing, Funding acquisition.

Corresponding author

Correspondence to Wei Zou .

Ethics declarations

Ethics approval and consent to participate.

Our study was performed in accordance with the Declaration of Helsinki and approved by the ethical committee of The First Affiliated Hospital of Nanchang University with reference number (2022) CDYFYYLK(10–010). Written informed consent was obtained from individual or guardian participants.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Liu, Q., Chen, Y., Xie, P. et al. Development of a predictive machine learning model for pathogen profiles in patients with secondary immunodeficiency. BMC Med Inform Decis Mak 24 , 48 (2024). https://doi.org/10.1186/s12911-024-02447-w


Received : 16 October 2023

Accepted : 30 January 2024

Published : 13 February 2024



  • Secondary immunodeficiency
  • Imbalanced data
  • K nearest neighbour
  • Boosted logistic regression
  • Random forest
  • Gradient boosting machine

BMC Medical Informatics and Decision Making

ISSN: 1472-6947


Classification of inertial sensor-based gait patterns of orthopaedic conditions using machine learning: A pilot study

Affiliations.

  • 1 Department of Orthopaedics and Traumatology, University Hospital Basel, Basel, Switzerland.
  • 2 Department of Psychology and Sport Science, University of Bielefeld, Bielefeld, Germany.
  • 3 Department of Biomedical Engineering, University of Basel, Basel, Switzerland.
  • 4 Department of Clinical Research, University of Basel, Basel, Switzerland.
  • 5 Department of Spine Surgery, University Hospital Basel, Basel, Switzerland.
  • 6 Institute for Biomechanics, ETH Zürich, Zürich, Switzerland.
  • PMID: 38341759
  • DOI: 10.1002/jor.25797

Elderly patients often have more than one disease that affects walking behavior. An objective tool to identify which disease is the main cause of functional limitations may aid clinical decision making. Therefore, we investigated whether gait patterns could be used to identify degenerative diseases using machine learning. Data were extracted from a clinical database that included sagittal joint angles and spatiotemporal parameters measured using seven inertial sensors, and anthropometric data of patients with unilateral knee or hip osteoarthritis, lumbar or cervical spinal stenosis, and healthy controls. Various classification models were explored using the MATLAB Classification Learner app, and the optimizable Support Vector Machine was chosen as the best performing model. The accuracy of discrimination between healthy and pathologic gait was 82.3%, indicating that it is possible to distinguish pathological from healthy gait. The accuracy of discrimination between the different degenerative diseases was 51.4%, indicating the similarities in gait patterns between diseases need to be further explored. Overall, the differences between pathologic and healthy gait are distinct enough to classify using a classical machine learning model; however, routinely recorded gait characteristics and anthropometric data are not sufficient for successful discrimination of the degenerative diseases.

Keywords: classification; gait pattern; inertial measurement units; machine learning; osteoarthritis.

© 2024 The Authors. Journal of Orthopaedic Research® published by Wiley Periodicals LLC on behalf of Orthopaedic Research Society.

Grants and funding

  • Merian Iselin Foundation
  • Deutsche Arthrose-Hilfe e.V.
  • Schweizerische Gesellschaft für Orthopädie und Traumatologie
  • Open access
  • Published: 09 February 2024

Evaluation of cost-effectiveness of single-credit traffic safety course based on Kirkpatrick model: a case study of Iran

  • Mina Golestani 1 ,
  • Homayoun Sadeghi-bazargani 1 , 2 ,
  • Sepideh Harzand-Jadidi 1 &
  • Hamid Soori 3  

BMC Medical Education volume 24, Article number: 128 (2024)

Training plays a role in reducing traffic accidents, and evaluating the effectiveness of training programs is important for managers' decisions about whether to continue them. Thus, the present study aimed to evaluate the cost-effectiveness of a single-credit traffic safety course, based on the four levels of the Kirkpatrick model, in all Iranian universities.

This interventional study aimed to evaluate the cost-effectiveness of a single-credit traffic safety course based on the Kirkpatrick model from 2016 to 2020 in Iran. The data were collected in three stages: (1) calculating the costs of offering traffic safety courses, (2) determining the effectiveness of providing such courses based on the levels of the Kirkpatrick model, and (3) evaluating the cost-effectiveness of administering traffic safety courses. Data were collected through researcher-made and standardized questionnaires. The research population included traffic safety course instructors and university students who could take this course. Finally, the data were analyzed with SPSS v. 23, and the incremental cost-effectiveness ratio (ICER), which indicates the cost-effectiveness of providing the single-credit course, was calculated.

The students' reaction-level score for the traffic safety course was 41.8% before the course and an estimated 67% after it. At the learning level, students' knowledge was 43.6% before the training course and reached 73% after it. At the behavior level, the state of students' desirable traffic behaviors was 54% before the course and 66.1% after it. The educational effectiveness at the results level was 58.2% before and 74.8% after the course. Assuming constant weights across all model levels, the cost of a 1% increase in overall educational effectiveness, measured with the Kirkpatrick model and compared with not providing the course (no intervention), was 486.46 USD.

The results demonstrated the effectiveness of the traffic safety course at all four levels of the Kirkpatrick model. Therefore, policy-makers and officials in charge of delivering this program should strengthen it and resolve its deficiencies to realize all its educational goals at the highest level.

Introduction

Traffic accidents are the leading cause of injuries and the second leading cause of death in Iran. The examination of human factors involved in accidents in Iran shows that most of these factors can be resolved, to some extent, through training. Sustainable traffic safety training in the Netherlands, Spain, and Germany partly explains these countries' success in improving traffic behavior and reducing traffic accidents [ 1 , 2 ]. The role of training in reducing traffic accidents is therefore undeniable, but traffic safety training is not self-evidently beneficial [ 2 , 3 ]. A training program justifies its value only when it provides reliable and valid evidence that the training improves learners' behavior and performance. It is therefore necessary to evaluate educational programs [ 4 , 5 ].

Educational program evaluation informs educational planners and staff about the quality of education. It also helps them become aware of the positive and negative aspects of an educational program and, thus, make educational programs and activities more effective [ 6 ]. Various approaches have been proposed to evaluate the effectiveness of educational programs, including the Phillips, Kirkpatrick, and Sullivan models [ 7 ]. The Kirkpatrick model is one of the most comprehensive and practical evaluation models for educational programs; it defines evaluation as determining the effectiveness of an educational program [ 8 ].

In this model, four levels of educational evaluation are proposed: (1) the reaction level (how learners feel about all the factors affecting the delivery of the educational program), (2) the learning level (the degree to which participants acquire the skills, techniques, and facts taught in the course, which can be measured before, during, and after participation), (3) the behavior level (the type and degree of changes in participants' behavior as a result of attending the training course), and (4) the results level (the degree of achievement of goals directly related to the organization) [ 9 , 10 ]. Numerous studies have evaluated educational programs with the Kirkpatrick model. Campbell et al. examined the effect of online cancer training on the knowledge and efficiency of nurses based on the Kirkpatrick model. Lillo-Crespo et al. used this model to study the effect of advanced healthcare training on students' knowledge. Akbari et al. explored the impact of an in-service cardiopulmonary resuscitation training course on the knowledge of 80 nurses based on the same model [ 11 , 12 , 13 ]. This model can therefore be adopted to evaluate educational courses in different fields.

In various organizations, vast sums of money are spent on training employees in specific skills, while in most cases the effectiveness of the training is not measured and proper feedback from learners is not obtained. By evaluating courses, it is possible to judge to what extent educational programs have performed as desired, where they should be improved, how effective a given program was, and whether the money spent on training was economically justified. Unfortunately, in the Iranian educational system, an effectiveness evaluation system in many cases either does not exist or is disorderly and rudimentary [ 6 , 14 ]. Considering the importance of evaluating the effectiveness of educational programs for managers' decisions about continuing training, the present study aimed to evaluate the cost-effectiveness of a single-credit traffic safety course based on the four levels of the Kirkpatrick model. Using the results of this evaluation, the necessary reforms can be applied to the planning and delivery of this course, and practical solutions can be proposed to improve the quality of its delivery. If the desired outcomes are achieved, the final syllabus of the single-credit traffic safety course will be developed and presented to the Supreme Council of Cultural Revolution to be taught as a suggested model in all Iranian universities.

Materials and methods

To promote knowledge related to road traffic accidents (RTAs) in Iran, the task of developing traffic knowledge was assigned to the 2nd region of the Territorial Agenda, in line with the needs announced by the Ministry of Health, Treatment and Medical Education and with the region's research potential and suitable background in this field. After signing a joint memorandum with the Ministry of Health, the specialized authority for the development of traffic knowledge and road accident prevention officially began its activity, centered at Tabriz University of Medical Sciences. It was therefore decided to offer a single-credit traffic safety course, for the first time, as a compulsory pilot course in volunteering universities. All experts in the field of traffic knowledge were invited to participate in designing the course curriculum; after the curriculum was approved by the Ministry of Health and Medical Education, the course was offered for 5 years in the universities of medical sciences across the country.

The course was offered to students of all academic levels. Eight sessions, each lasting two hours, were held during one academic semester. The classes combined theory and practice, especially for the first-aid component, and the training was delivered in a hybrid format. Because the program's scope made it difficult to recruit enough traffic experts as instructors, four training-of-trainers (TOT) programs were conducted. For students, three-fourths of the classes were fully face-to-face and one-fourth were held online.

This interventional study aimed to determine the cost-effectiveness of a single-credit traffic safety course in Iran over five years (2016 to 2020). The research population consisted of two groups: (1) a total of 2066 students of 12 Iranian universities of medical sciences across the country and (2) professors of these universities. The inclusion criteria for students were passing the single-credit traffic safety course and willingness to participate; for professors, they were experience teaching this course and willingness to participate. Students and professors who did not wish to continue participating in the study for any reason, or whose information was incompletely recorded, were excluded.

To attain the primary goal of the study (cost-effectiveness evaluation), the costs of each component of the educational program were first determined; the cost-effectiveness of the course was then evaluated by determining the effectiveness at each level of the Kirkpatrick model:

Calculating the total costs of offering a single-credit traffic safety course in Iran

To analyze the costs of holding the course relative to its effectiveness, all the costs of delivering the traffic safety training program (including the costs of developing and improving human resources, providing infrastructure, preparing, producing, and presenting educational materials, and the teaching-learning process) were first calculated by the step-down method.

Investigating the educational effectiveness of the traffic safety course based on the four levels of the Kirkpatrick model

The Kirkpatrick model was employed to evaluate the effectiveness of the course; that is, it measured the degree to which the course's educational goals were realized and the effectiveness of the measures taken to achieve them. For this purpose, the effectiveness of the traffic safety course was explored at four levels, once before the implementation of the course and again 6 months after its implementation.

The students were surveyed before and after the traffic safety course to check the reaction level. In this survey, 2066 students from 12 universities participated and filled out a researcher-made questionnaire with 9 questions. This instrument examined the participants’ feelings about participating in the course and their degree of satisfaction with the course content. The validity of the questionnaire was confirmed, with a content validity index (CVI) of 0.84, and its reliability was confirmed with an intra-class correlation coefficient (ICC) of 0.81.

The test-retest method was used to check the learning level. To this end, the students' traffic knowledge was assessed in the first session of the course using a researcher-made questionnaire. The validity of the questionnaire was confirmed with the content validity index (CVI) and content validity ratio (CVR), both above 80%, and its reliability was confirmed with the intra-class correlation coefficient (ICC).

Then, after 6 months, in the last session, the same test was re-administered to the same students. A sample of 1056 students was considered for assessing the impact of the course; some of these students were evaluated only once and had incomplete information, so 293 students were excluded, and the data of the remaining 763 students were analyzed. The instrument designed to evaluate the impact of the training course included 19 questions about different areas of traffic. Of these, two descriptive questions dealt with the symptoms of sleep apnea and key points about environmentally friendly driving; 17 test items examined the students' traffic knowledge in 5 general domains: safety improvement and epidemiology (4 items), pedestrian safety (2 items), first aid (2 items), vehicle safety standards (3 items), and road and special traffic issues (6 items).

In order to check the level of behavior (i.e., the manner and degree of changes in traffic behavior) before and six months after the course, the Pedestrian Traffic Behavior Questionnaire designed by Haghighi et al. [ 15 ] was administered to the 515 students under study. This questionnaire included 29 items that measured pedestrian behaviors in five dimensions (following rules, violations, positive behaviors, distractions, and aggressive behaviors) on a 5-point Likert scale. Higher scores indicated safer behavior.

In examining the level of results, the views of 25 instructors about the impacts of the traffic safety course and the results of its presentation were examined in interviews before and after the course (6 months after the implementation of the program). The instrument used for these interviews was a researcher-made questionnaire that examined the level of achievement of the goals of the Ministry of Health and Medical Education and the expert authority on traffic knowledge development.

After data collection at each level, the data were entered into SPSS v. 23. Data analysis in the effectiveness section was performed with descriptive statistics (mean, standard deviation, frequency, and percentage) and statistical tests such as the paired t-test.

Evaluating the educational cost-effectiveness of the traffic safety course based on the Kirkpatrick model

The cost-effectiveness analysis (CEA) technique for educational interventions was used to evaluate the cost-effectiveness of the traffic safety course. CEA measures the relationship between a project's total inputs, or costs, and its outputs, or objective results; both the cost and effectiveness dimensions must be determined quantitatively. CEA can be performed in two ways: (a) comparing different strategies for reaching a goal and determining the best strategy by simultaneously considering the costs and outcomes of each; in education, this can be applied by comparing educational centers, types of education, or teaching methods; or (b) comparing two or more faculties, universities, or even instructors that operate with almost the same costs but different educational effectiveness, and determining the most effective educational institution or instructor. In both cases, the incremental cost-effectiveness ratio (ICER) must be determined, measured, and calculated for each strategy or educational institution/instructor. This ratio gives the additional cost of achieving a higher output with one strategy compared to the others.

The following equations were used for the CEA: CER = C/E.

Average cost-effectiveness ratio = the cost of offering the course at each level of the model / the educational effectiveness at that level

ICER = (costs before administering the educational course − costs after administering the educational course) / (educational effectiveness before the course − educational effectiveness after the course)
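In code, the two ratios above reduce to simple divisions. The sketch below is purely illustrative: the function names are my own, and the cost figure is hypothetical rather than taken from the study's tables (only the 41.8% → 67.0% reaction-level effectiveness values come from the study).

```python
def average_cer(cost_usd: float, effectiveness_pct: float) -> float:
    """Average cost-effectiveness ratio: cost per 1% of effectiveness."""
    return cost_usd / effectiveness_pct

def icer(cost_before: float, cost_after: float,
         eff_before: float, eff_after: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra 1%
    of effectiveness gained by the intervention."""
    return (cost_after - cost_before) / (eff_after - eff_before)

# Hypothetical figures: suppose offering the course costs 12,600 USD more
# than not offering it, and raises effectiveness from 41.8% to 67.0%
# (the reaction-level percentages reported in this study).
extra_cost_per_pct = icer(0.0, 12_600.0, 41.8, 67.0)
print(f"{extra_cost_per_pct:.2f} USD per 1% effectiveness gained")
```

Note that flipping both differences (before − after in numerator and denominator, as written above) leaves the ratio unchanged, so the two orderings are equivalent.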

Results

Overall costs of presenting the single-credit traffic safety course at the national level

Based on Table  1 , the greatest share of the costs of course presentation belonged to the development and improvement of human resources, the teaching-learning process, and the development of teaching materials, respectively.

Educational effectiveness of the traffic safety course based on the four levels of the Kirkpatrick model

Reaction level

Out of the 2066 students participating in this section, most belonged to Sabzevar (36.93%) and Tabriz (32.28%) universities of medical sciences. Table  2 shows the results of the students' survey about the traffic safety course for each item; most students reported moderate or moderate-to-high satisfaction on all items. Based on the reaction-level results, the students' reaction-level score for the traffic safety course was 41.8% before the course and an estimated 67% after it.

Learning level

A total of 763 students from 25 fields of medical sciences participated in this part, most of whom belonged to allied health sciences (37.35%). More than half of the participants were undergraduate students (56.23%), and most were from Sabzevar (39.29%) and Tabriz (26.08%) universities of medical sciences. According to Table  3 , which presents the students' traffic knowledge before and after the course, total knowledge and knowledge in the individual traffic domains increased strongly and significantly after the course ( p  < 0.001). Based on the learning level of the Kirkpatrick model, the students' knowledge level was 43.6% before the course and an estimated 73% after it.

Behavior level

Based on the results of examining the educational effectiveness at the behavior level, the state of students' desirable traffic behaviors was 54% before the course and 66.1% after it. This behavior change was 64% in following the rules, 58% in violations, 61% in positive behaviors, 65% in distraction, and 90% in aggressive behaviors.

Results level

Based on the interviews with traffic safety course instructors, most instructors (75%) believed that offering this course greatly impacted the students' social behavior and would continue to exert its effects in the future. Most instructors (84%) believed that the course greatly influenced the students' personal lives. About 80% believed that the course content greatly helped improve the students' safety, and about 71% held that offering the course could effectively shape students' expectations of officials when demanding civil rights. In addition, 64% of instructors said the course motivated students to learn more about traffic. Overall, the educational effectiveness at the level of results was 58.2% before the course and 74.8% after it.
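The before/after effectiveness percentages reported across the four levels can be tabulated in a short sketch. The values are the ones reported above; the tabulation and variable names are only illustrative:

```python
# Before/after effectiveness (%) at each Kirkpatrick level, as reported
# in this study.
levels = {
    "reaction": (41.8, 67.0),
    "learning": (43.6, 73.0),
    "behavior": (54.0, 66.1),
    "results":  (58.2, 74.8),
}

# Absolute change in percentage points at each level.
changes = {name: round(after - before, 1)
           for name, (before, after) in levels.items()}

for name, delta in changes.items():
    print(f"{name:8s} +{delta} pts")
```

The behavior level shows the smallest gain of the four, which matches the study's own observation that behavior changed less than the other levels.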

The cost-effectiveness of offering a traffic safety course based on the Kirkpatrick model

Table  4 shows the average educational cost-effectiveness ratio (CER) based on Kirkpatrick model levels. According to the results, the average cost per 1% educational effectiveness is 293.56 USD at the reaction level, 354.31 USD at the learning level, 436.79 USD at the behavior level, and 61.52 USD at the results level.

Table  5 presents the educational incremental cost-effectiveness ratio (ICER) of offering the traffic safety course based on the Kirkpatrick model. Based on the results, the cost per 1% increase in educational effectiveness is 662.89 USD at the reaction level, 576.02 USD at the learning level, 1,256.31 USD at the behavior level, and 2,742.95 USD at the results level. The cost per 1% increase in overall educational effectiveness obtained using the Kirkpatrick model, compared with not offering the course (no intervention), is 486.46 USD, assuming that the weights of all model levels remain constant.

Discussion

This study determined the cost-effectiveness of a traffic safety course based on the Kirkpatrick model. The results revealed that offering this single-credit course to students is a cost-effective educational program. The costs imposed on society by accidents and the resulting injuries amount to about 7% of gross domestic product (GDP), and the disability-adjusted life years (DALYs) due to traffic accidents in Iran were estimated at about 1738 years of life lost to premature death and disability per 100,000 people in 2016. Based on the calculated cost of increasing the effectiveness of this course, the cost per 1% increase in overall educational effectiveness obtained using the Kirkpatrick model, compared with not offering the course (no intervention), is 486.46 USD (assuming constant weights across all levels of the model). This cost is not high relative to the effectiveness achieved at the levels of the Kirkpatrick model, especially at the level of results. Likewise, the estimated cost of delivering the traffic safety training course (about 4,742.28 USD of health-system costs per 1% improvement in effectiveness) is not significant compared with the high costs of traffic accidents and injuries. Although providing such programs entails costs, it is an investment in human resources. Especially in promoting a safety culture and developing traffic knowledge, such courses can reduce the high costs of traffic accidents and injuries and lead to considerable economic savings. They also increase the productivity of a labor force whose members lose their lives or become disabled daily due to traffic accidents.

The study by Bazarafkan et al., which evaluated the effectiveness of training courses for health volunteers according to the Kirkpatrick model, showed that the training program for health volunteers was effective [ 16 ]. Myall's study likewise confirmed the effectiveness of an internship mentorship program for nursing students [ 17 ].

At the reaction level, the cost per 1% increase in educational effectiveness was 662.89 USD. At this level, most participants had moderate-to-high satisfaction with the traffic safety course and found its content practical and useful. The educational effectiveness at this level was 41.8% before the course and 67% after it. This significant increase in effectiveness at the reaction level may reflect the impact of the course content on the students' personal and social lives and behaviors. The results at this level align with the findings of previous studies based on the Kirkpatrick model. Nezamianpour et al. found nurses' reactions to a training course on working with electroshock equipment to be favorable [ 18 ]. In Mohan's study, most participants were highly satisfied with the course at the reaction level [ 19 ]. In the study by Yoon et al., which explored a training program for the professional development of physicians, the learners were satisfied with the course [ 20 ]. Akbari et al. also evaluated the reaction of nurses and paramedics to a cardiopulmonary resuscitation course [ 11 ].

Based on the study's results, the cost per 1% increase in educational effectiveness was 576.02 USD at the learning level. At this level, students' knowledge was 43.6% before the traffic safety course and an estimated 73% after it. According to the results of the present study, passing the traffic safety course significantly increased students' traffic knowledge in the different traffic domains, with the greatest increase in vehicle and road safety standards. This significant increase may be explained by the neglect of traffic safety topics in the Iranian education system and the consequently low baseline knowledge among people, including students. Providing practical training can therefore play a key role in improving students' traffic knowledge, and students can in turn improve society's knowledge of traffic safety by transferring what they learn to family and friends. The results at this level were aligned with those of Bazarafkan et al.: in the cited study, there was a significant difference between the participants' average scores on the post-test compared to the pre-test, indicating that their awareness, knowledge, and skills increased after the health volunteer program [ 16 ]. Heidari et al. also measured the impact of a training workshop on new teaching methods for healthcare workers and showed a significant difference between the participants' learning scores before and after the course, demonstrating increased learning and knowledge [ 21 ]. Hojjati et al. assessed the effectiveness of in-service training courses for nurses based on the Kirkpatrick model, and their results were in line with those of the present study regarding the learning level [ 14 ]. Le et al. evaluated a training program for physicians in Vietnam with the Kirkpatrick model and reported a positive improvement in the participants' learning and skills [ 22 ].

At the behavior level, the cost per 1% increase in educational effectiveness was 1,256.31 USD. Based on the results at this level, the state of desirable traffic behaviors among students was 54% before the course and 66.1% after it, with changes observed across the different aspects of traffic behavior, especially aggressive behaviors. However, compared with the other levels of the Kirkpatrick model, the change in behavior after the course was smaller (12%). Continuing to provide traffic safety courses, increasing the number of units and hours, and resolving the course's deficiencies would likely have a greater impact on changing learners' behavior. Mollakazemi et al. showed that participating in occupational medicine retraining courses can positively change the behavior of general practitioners [ 4 ]. Nega et al.'s study of an educational program at a faculty of medicine and pharmacy evaluated the third level of the Kirkpatrick model positively [ 22 ]. In the study by Ranjdoust et al., trained students' behavioral attitudes were more strongly influenced by what they learned than those of untrained students [ 23 ].

At the level of results, the cost per 1% increase in educational effectiveness was 2,742.95 USD. At this level, the effectiveness was 58.2% before and 74.8% after the course, a significant change. The instructors believed that the traffic safety course had a positive effect on various aspects of the students' lives, including their personal and social lives and their traffic behaviors, and that continuing the course could exert positive impacts in the future as well. The results at this level were in line with the study by Jamaledini et al., who showed that crisis management training courses were effective [ 24 ]. In the study by Shayan et al., the reduction of pneumonia infection in the ICU of Taleghani Hospital, Tehran, demonstrated the effectiveness of staff training programs [ 25 ]. In the study by Dehghani et al., the results at the behavior and results levels likewise showed that most of the training course objectives were attained [ 2 , 26 ].

Strengths and limitations

The present study is the first in Iran to evaluate the cost-effectiveness of a single-credit traffic safety course based on the Kirkpatrick model. Administrative bureaucracy was a limitation that affected the program's effective administration; we tried to solve these problems by attracting the support of relevant officials and holding numerous meetings. Another limitation was the impossibility of accessing all the students who had passed the course. Additionally, the self-reported approach used to evaluate pedestrian behavior at the behavior level was a further limitation of the current study.

Conclusion

The current study's findings revealed the effectiveness of the traffic safety course at all four levels of the Kirkpatrick model. The majority of learners were satisfied with the course, which improved the participants' traffic knowledge and behavior, and the course content was useful in their personal and social lives. Therefore, policy-makers and authorities in charge of delivering this program should improve the course and resolve its deficiencies so that all its educational goals can be realized at the highest level.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

Arasteh H, Behrangi M, Naveebrahim A, Rafiee H. Developing traffic order & safety: examining traffic training in six European countries. 2011.

Bakhtari Aghdam F, Sadeghi-Bazargani H, Azami-Aghdash S, Esmaeili A, Panahi H, Khazaee-Pool M, et al. Developing a national road traffic safety education program in Iran. BMC Public Health. 2020;20:1–13.

Ebrahimipour H, Emamian H. Performance evaluation of Bardaskan city health network: using the model of the European Foundation for Quality Management (EFQM). J Health Promotion Manage. 2014;3(4):27–36.

Laleh MA, Mollakazemi M, Seyedmehdi SM. Assessment of occupational medicine retraining course on general practitioners’ efficacy using Kirkpatrick’s model. J Health Field. 2018;6(2).

Yazdani S, Akbarilakeh M. Which health cares are related to the family physician? A critical interpretive synthesis of literature. Iran J Public Health. 2017;46(5):585.

Gazerani A, Karimi Moonaghi H. Using Kirk Patrick evaluation method in the effectiveness of nursing programs: a review article. Navid No. 2022;25(82):81–90.

Ghorbandoost R, Zeinabadi H, Shabani Shafiabadi M, Mohammadi Z. Evaluation of in-service training course of nurses and midwives (neonatal resuscitation) using kirkpatrick’s model. Res Med Educ. 2020;12(3):4–11.

Smidt A, Balandin S, Sigafoos J, Reed VA. The Kirkpatrick model: a useful tool for evaluating training outcomes. J Intellect Dev Disabil. 2009;34(3):266–74.

Bates R. A critical analysis of evaluation practice: the Kirkpatrick model and the principle of beneficence. Eval Program Plan. 2004;27(3):341–7.

Mohamed R, Alias AAS. Evaluating the effectiveness of a training program using the four level Kirkpatrick model in the banking sector in Malaysia. 2012.

Akbari M, Dorri S, Mahvar T. The effectiveness of in-service training on cardiopulmonary resuscitation: report of first and second levels of Kirkpatrick’s model. Dev Strategies Med Educ. 2016;3(1):67–72.

Campbell K, Taylor V, Douglas S. Effectiveness of online cancer education for nurses and allied health professionals; a systematic review using Kirkpatrick evaluation framework. J Cancer Educ. 2019;34:339–56.

Lillo-Crespo M, Sierras-Davo MC, MacRae R, Rooney K. Developing a framework for evaluating the impact of Healthcare Improvement Science Education across Europe: a qualitative study. J Educational Evaluation Health Professions. 2017;14.

Hojjati H, Mehralizadeh Yl, Farhadirad H, Alostany S, Aghamolaei M. Assessing the effectiveness of training outcome based on Kirkpatrick model: case study. Q J Nurs Manage. 2013;2(3):35–42.

Bazargan HS, Haghighi M, Heydari ST, Soori H, Shahkolai FR, Motevalian SA, et al. Developing and validating a measurement tool to self-report pedestrian safety-related behavior: the pedestrian behavior questionnaire (PBQ). Bull Emerg Trauma. 2020;8(4):229.

Khanipoor F, Cheraghi A, Bazrafkan L. Evaluation of Training Program of Health Volunteers and covered households of Urban areas of Mamasani City using Kirk Patrick Model at 2020. J Health Res Community. 2022;8(2):12–24.

Myall M, Levett-Jones T, Lathlean J. Mentorship in contemporary practice: the experiences of nursing students and practice mentors. J Clin Nurs. 2008;17(14):1834–42.

Nezamian Pourjahromi ZN, Ghafarian Shirazi H, Ghaedi H, Momeninejad M, Mohamadi Baghmolaee M, Abasi A, et al. The effectiveness of training courses on how to work with DC Shock device for nurses, based on Kirkpatrick Model. Iran J Med Educ. 2012;11(8):896–902.

Mohan DR, Prasad MV, Kumar KS. Impact of training on bio medical waste management–A study and analysis. EXCEL Int J Multidisciplinary Manage Stud. 2012;2(6):69–80.

Yoon HB, Shin J-S, Bouphavanh K, Kang YM. Evaluation of a continuing professional development training program for physicians and physician assistants in hospitals in Laos based on the Kirkpatrick model. J Educational Evaluation Health Professions. 2016;13.

Heydari MR, Taghva F, Amini M, Delavari S. Using Kirkpatrick’s model to measure the effect of a new teaching and learning methods workshop for health care staff. BMC Res Notes. 2019;12(1):1–5.

Nga LTQ, Aya G, Trung TT, Vinh NQ, Khue NT. Capacity building toward evidence-based medicine among healthcare professionals at the university of medicine and pharmacy, ho chi minh city, and its related institutes. Japan Med Association Journal: JMAJ. 2014;57(1):49.

Kazemi M, Mojallal Choboghloo M. Evaluation of the effectiveness of Tabliz University of Medical Sciences journal students’ education based on Donald Krakpatrick’s model. Educ Strategies Med Sci. 2022;15(5):459–70.

Jamaledini SH, Sharifi Sedeh M, Narenji Thani F, Hadavandi M, Biranvandmanesh F, Salehi A. Evaluating the effectiveness of basic courses of crisis management training in Red Crescent society based on Kirkpatrick’s model. Q Sci J Rescue Relief. 2017;8(4):0.

Shayan S, Nowroozi Rad N. Evaluation of the effectiveness of staff in-service training system, Tehran Taleghani Hospital with Kirickpatrik approach. J Med Spiritual Cultivation. 2019;28(2):10–23.

Sadeghi-Bazargani H, Sharifian S, Khorasani-Zavareh D, Zakeri R, Sadigh M, Golestani M, et al. Road safety data collection systems in Iran: a comparison based on relevant organizations. Chin J Traumatol. 2020;23(05):265–70.

Article   PubMed   PubMed Central   Google Scholar  

Download references

Acknowledgements

The authors express their gratitude to the Vice-Chancellor of Education of Tabriz University of Medical Sciences and the staff of the Road Traffic Injury Research Center for their cooperation.

Funding

Not applicable.

Author information

Authors and Affiliations

Road Traffic Injury Research Centre, Tabriz University of Medical Sciences, Tabriz, Iran

Mina Golestani, Homayoun Sadeghi-bazargani & Sepideh Harzand-Jadidi

Department of Epidemiology and Biostatistics, School of Health, Tabriz University of Medical Sciences, Tabriz, Iran

Homayoun Sadeghi-bazargani

Safety Promotion and Injury Prevention Research Center, Department of Epidemiology, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Hamid Soori


Contributions

MG designed the study, managed the data collection, and drafted the manuscript. HSB conceived the main idea of the study and contributed to writing the manuscript. SHJ collaborated on data collection, data analysis, and manuscript writing, and HS collaborated on study design, data analysis, and manuscript writing.

Corresponding author

Correspondence to Mina Golestani .

Ethics declarations

Ethics approval and consent to participate

In compliance with the principles of research ethics, the participants were assured that the questionnaires would be anonymous and that all their information would remain confidential. Written informed consent was obtained from participants before enrollment. The study received ethical approval from the Ethics Committee of Tabriz University of Medical Sciences (code of ethics IR.TBZMED.REC.1397.438).

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Golestani, M., Sadeghi-bazargani, H., Harzand-Jadidi, S. et al. Evaluation of cost-effectiveness of single-credit traffic safety course based on Kirkpatrick model: a case study of Iran. BMC Med Educ 24 , 128 (2024). https://doi.org/10.1186/s12909-024-05122-w


Received : 08 December 2023

Accepted : 01 February 2024

Published : 09 February 2024

DOI : https://doi.org/10.1186/s12909-024-05122-w


Keywords

  • Cost-effectiveness
  • Kirkpatrick
  • Road traffic accidents
  • Traffic safety

BMC Medical Education

ISSN: 1472-6920


Original Research Article

Efficient data-driven machine learning models for scour depth predictions at sloping sea defences


  • 1 UCD School of Civil Engineering, UCD Dooge Centre for Water Resources Research and UCD Earth Institute, University College Dublin, Dublin, Ireland
  • 2 School of Engineering, University of Warwick, Coventry, United Kingdom

Seawalls are critical defence infrastructures in coastal zones that protect hinterland areas from storm surges, wave overtopping and soil erosion hazards. Scouring at the toe of sea defences, caused by wave-induced accretion and erosion of bed material, poses a significant threat to the structural integrity of coastal infrastructures. Accurate prediction of scour depths is essential for the appropriate and efficient design and maintenance of coastal structures, which serve to mitigate risks of structural failure through toe scouring. However, limited guidance and predictive tools are available for estimating toe scouring at sloping structures. In recent years, Artificial Intelligence and Machine Learning (ML) algorithms have gained interest, and although they underpin robust predictive models for many coastal engineering applications, such models have yet to be applied to scour prediction. Here we develop and present ML-based models for predicting toe scour depths at sloping seawalls. Four ML algorithms, namely Random Forest (RF), Gradient Boosted Decision Trees (GBDT), Artificial Neural Networks (ANNs), and Support Vector Machine Regression (SVMR), are utilised. Comprehensive physical modelling measurement data are utilised to develop and validate the predictive models. A novel framework for feature selection, feature importance, and hyperparameter tuning is adopted for the pre- and post-processing steps of the ML-based models. In-depth statistical analyses are proposed to evaluate the predictive performance of the proposed models. The results indicate a minimum of 80% prediction accuracy across all the algorithms tested in this study; overall, the SVMR produced the most accurate predictions, with a Coefficient of Determination (r²) of 0.74 and a Mean Absolute Error (MAE) of 0.17. The SVMR algorithm also offered the most computationally efficient performance among the algorithms tested. The methodological framework proposed in this study can be applied to scouring datasets for rapid assessment of scour at coastal defence structures, facilitating model-informed decision-making.

1 Introduction

Scouring is the process of gradual erosion and removal of bed materials in the vicinity of coastal structures caused by hydrodynamic forces from waves and tidal currents. In addition to the hydrodynamic forces from tides and waves, which can be compounded by climate change influences, critical infrastructures including underwater pipelines, coastal defence structures, and coastal zone management processes such as dredging can contribute to conditions that are favourable to increased seabed scouring through the disruption of natural sediment transport processes and the alteration of the prevailing hydrodynamic environment in the nearshore region. Scouring at the toes of critical coastal defence structures (e.g., sloping and vertical seawalls) can result in the loss of structural integrity ( Salauddin and Pearson, 2019a ; Salauddin and Pearson, 2019b ; Tseng et al., 2022 ) and ultimate failure, and is particularly critical in the management of coastal flood risks. Toe scouring can elevate wave overtopping discharge at defences, by increasing water depth at the defence and causing the formation of larger waves at the structure ( Peng et al., 2023 ). The sedimentation and scouring in the vicinity of coastal structures can alter the bottom topography and bed slope, which in turn can influence wave shoaling and breaking processes and alters the turbulent kinetic energy budget of waves and their potential to overtop defences ( Peng et al., 2023 ). Given that extreme events in coastal regions are predicted to increase in intensity and frequency under climate change scenarios, increased exposure to toe scouring at coastal defences is likely to be an increasing issue in the coming years ( Fitri et al., 2019 ; Salauddin and O’Sullivan, 2022 ). The availability of accurate methods to predict toe scour depths is, therefore, critical for mitigating scour related risks.

Reliable prediction of scour depths at coastal defences is challenging, being influenced by complex wave-structure interactions and a range of nearshore hydrodynamic and morphological processes. The prediction of scour depth therefore involves the consideration of parameters that reflect these diverse processes. These relate to wave and current conditions, tide and wave approach angles, sediment and bathymetric characteristics and features, and water depth at the structure ( Müller et al., 2008 ; Pourzangbar et al., 2017a ). For example, the scouring patterns observed in fine- and coarse-grained bed material are distinctly different ( Pourzangbar et al., 2017b ). Previous studies have also highlighted that scour depths from regular waves are generally larger than those observed for irregular waves.

A significant number of beaches globally are coarse-grained shingle beaches, often with man-made coastal defences such as vertical seawalls or sloping structures ( Powell and Lowe 1994 ; Salauddin and Pearson, 2018 ; Salauddin and Pearson, 2020 ). Although the literature (e.g., Pourzangbar et al., 2017a ; Pourzangbar et al., 2017b ) has demonstrated the robust performance of ML algorithms in predicting scour depth at sandy beaches, the capabilities of ML techniques for predicting scour in shingle foreshores are much less reported. The recent study by Salauddin et al. (2023) focussed on evaluating the effectiveness of ML algorithms for predicting scour depths at vertical seawalls and showed that ML models were able to predict scour depths with good accuracy for experimental data. Nevertheless, there remains scope to apply such algorithms to other structure types, such as a sloping structure on a permeable shingle bed, and to investigate their performance in predicting scour depths in that setting.

Here we present for the first time the development and testing of ML algorithms (namely, Support Vector Machine Regression (SVMR), Gradient Boosted Decision Trees (GBDT), Random Forests (RF) and Artificial Neural Networks (ANN)) at a sloping structure with a sloping shingle foreshore. The models were trained and tested on a physical modelling experimental dataset of scour depths at a 1 in 2 (1V:2H) impermeable sloping seawall located on a permeable 1 in 20 (1V:20H) shingle foreshore. Novel pre-processing and post-processing techniques such as feature selection and feature importance are proposed to facilitate ML-based modelling of scouring datasets, and we devise a stepwise methodological framework for scour prediction. The predictive performance of the ML models is investigated through well-established statistical metrics. The key objectives of this study are (i) to develop a robust methodological framework for using data-driven ML algorithms to predict scour depth at coastal defences, and (ii) to quantify the predictive performance of the selected ML-based models for estimating scour depths at sloping coastal sea defences.

2 Scour prediction methods

Existing studies assessing scour at sea defences such as vertical seawalls and sloping seawalls are typically underpinned by numerical, laboratory and field-based modelling approaches to derive empirical relations and engineering guidance. Fowler (1992) developed empirical formulae for toe scour depth based on physical modelling of scouring at a vertical seawall placed on a sandy foreshore. Wallis et al. (2010) and Sutherland et al. (2003 , 2006) proposed an improved guidance for predicting scour depths at vertical walls constructed on sandy foreshores using field and laboratory observations. These authors also claimed that for the tested conditions, maximum scour depths at a plain vertical wall were similar to those observed for a 1 in 2 sloping seawall. In recent years, Salauddin and Pearson (2019a) , Salauddin and Pearson, (2019b) conducted a comprehensive suite of laboratory-based physical modelling experiments to characterise scouring at both vertical and sloping structures on shingle foreshores, subjected to a wide range of irregular wave conditions (including storm and swell sea states).

The review of literature relating to scour at seawalls reveals a substantial correlation between toe scour depth and relative water depth at the toe (h_t/L_0m), where h_t is the toe water depth (m) and L_0m is the mean deep-water wavelength (m), for defences on sandy foreshores. Sutherland et al. (2008) proposed an empirical relationship (Eq. 1 ) between the dimensionless scour depth (S_t/H_s), calculated from the scour depth S_t (m) and the significant wave height H_s (m), and the relative toe water depth (h_t/L_0m) for the prediction of toe scour depth at a plain vertical seawall on a sandy beach. This was later verified by Müller et al. (2008) . Similar findings were also observed for scouring at a plain vertical wall with a shingle foreshore slope ( Salauddin and Pearson, 2019a ; Salauddin and Pearson, 2019b ). Sutherland et al. (2008) also proposed an empirically based equation to predict the toe scour depth for vertical seawalls considering the influence of beach slope (Eq. 2 ).

where S_t and S_t,max are the toe scour depth and the maximum toe scour depth, respectively, H_s is the significant wave height (the highest one-third of wave heights), α is the beach slope, h_t is the toe water depth, and L_0m is the deep-water wavelength based on T_m, where T_m is the mean wave period.

Numerical modelling tools have also been developed and applied to simulate scour behaviour at coastal defences ( Peng et al., 2018 ; Peng et al., 2023 ; Yeganeh-Bakhtiary et al., 2020 ). For example, Peng et al. (2023) utilized the Reynolds-Averaged Navier–Stokes equations (RANS) and the Volume of Fluid (VOF) modelling technique, coupled with wave-sediment transport and morphological factors, to simulate scour dynamics in front of an impermeable plane vertical seawall under specific wave conditions. However, robust numerical modelling techniques for estimating scour in wave environments remain limited, largely as a result of the complexity of multiphase flow simulations, but also because of the high computational cost involved. Moreover, in numerical simulations of scour depths, uncertainty arises from the dependency of such models on empirical parameters of the scouring process ( Yang et al., 2018 ).

In recent years, with advancements in data science and computational resources, Artificial Intelligence (AI) in the form of Machine Learning (ML) has been successfully employed to address a wide range of coastal engineering problems. For example, significant research relating to the development of AI based decision-support algorithms for the prediction of wave characteristics ( Yeganeh-Bakhtiary et al., 2023 ) and wave overtopping at coastal defences has been undertaken (see, for example, den Bieman et al., 2021a , 2021b; den Bieman et al., 2020 ; Elbisy, 2023 ; Elbisy and Elbisy, 2021 ; Habib et al., 2022b ; Habib et al., 2023a ; Habib et al., 2023b ). Habib et al. (2022a) has provided an overview of recent studies on the applications of ML approaches in coastal engineering problems.

Data-driven ML modelling approaches have been applied to predict scour depths at vertical breakwaters. Pourzangbar et al. (2017a) , Pourzangbar et al. (2017b) successfully applied several ML algorithms, including Genetic Programming (GP), Artificial Neural Networks (ANN), Support Vector Machine Regression (SVMR) and the M5’ Decision Tree model, to predict scour depth from physical modelling data for impermeable vertical breakwaters with sandy foreshores. However, the ML-based scour prediction models developed to date have been applied only to vertical breakwaters and fine-grained sandy foreshores. Previous studies have not dealt with the prediction of scour depth at a sloping structure on a permeable shingle foreshore using advanced ML algorithms, which is addressed for the first time in this work.

3 Materials and methods

3.1 Scouring dataset

The scour dataset used in this study was obtained from experimental studies conducted in a 2D wave flume, 22 m long, 0.6 m wide, and 1 m deep ( Figure 1 ), at the University of Warwick’s Water Engineering Laboratory ( Salauddin and Pearson, 2019b ). The flume was equipped with a piston-type wave paddle capable of generating monochromatic and random waves, six Wave Gauges (WG), and an active absorption system, producing realistic sea states in the wave channel. The dataset consisted of over 120 experiments in which the scour characteristics at the toe of a sloping wall (1:2) with a shingle foreshore, of approximately 6 m length, on a 1:20 slope were observed, and included a comprehensive range of incident wave conditions covering both impulsive and non-impulsive waves. The JONSWAP wave spectrum with a peak-enhancement factor of 3.3 was applied to generate incident waves representative of a young sea state. The relative crest freeboard (R_c/H_m0, where R_c is the crest freeboard of the defence structure and H_m0 is the wave height at the toe of the structure) ranged from 0.5 to 5.0, achieved by applying six different toe water depths. The scouring characteristics were measured for both impulsive and non-impulsive wave conditions. The dataset, comprising 120 sets of observations, was split into training and test sets in a 70%–30% ratio.


FIGURE 1 . Schematic of the Experimental setup for measuring scour depth at a sloping wall with a shingle foreshore (Adopted from Salauddin and Pearson, 2019b ).

For each test configuration, the scour depth was measured at the toe of the structure and at different locations along the wave flume in front of the structure. The maximum scour depth was then determined from these measurements. Analysis of the experimental data showed that, for the wave conditions tested, the maximum scour depth occurred at the toe of the structure. An insight into the database in terms of statistical correlation (Pearson R) revealed very low correlative relations between the scour parameters (described in the Glossary section) and the relative scour depth (S_t/H_m0, where S_t is the measured scour depth and H_m0 is the wave height at the toe of the structure). No negative correlation was observed between the variables; however, only R_c/H_1/3,deep and I_r showed a maximum correlation of 0.25 with the relative scour depth. Two kernel-based (ANN and SVMR) and two DT-based (RF and GBDT) algorithms were investigated in the study of Habib et al. (2023a) , where it was reported that the algorithms performed satisfactorily in predicting wave overtopping at a vertical seawall. The same algorithms are hence investigated here for the scour dataset, since its intrinsic nature is similar to that of the dataset used in the overtopping study ( Habib et al., 2023b ).
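The Pearson R screening described above can be sketched with synthetic stand-in data (the variable names and values below are illustrative, not the study's measurements):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-ins for one candidate predictor (e.g., R_c/H_1/3,deep)
# and the target (relative scour depth S_t/H_m0).
rc_ratio = rng.uniform(0.5, 5.0, size=120)
rel_scour = 0.25 * rc_ratio + rng.normal(scale=1.0, size=120)

# Pearson R between a candidate feature and the target, as used to
# screen the scour database for correlative relations.
r = np.corrcoef(rc_ratio, rel_scour)[0, 1]
print(round(r, 2))
```

A full screening would repeat this for every feature column and compare the resulting coefficients against the target.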

The workflow followed in the data preparation, together with model development and testing is summarised in Figure 2 .


FIGURE 2 . The methodological approach adopted for the ML based modelling.

3.2 Support Vector Machine Regression (SVMR)

SVMR is a category of supervised ML algorithms and an extension of the classification-based Support Vector Machines (SVM), typically employed for regression tasks ( Noori et al., 2022 ). SVMR algorithms aim to minimize the prediction error while simultaneously maximizing the margin around the fitting function, effectively identifying the best-fit function for a given dataset. Figure 3 illustrates a typical workflow structure for SVMR. For a regression problem using a training dataset containing interlinked input features (x_i) and target values (y_i), SVMR deduces a function f(x) that predicts the target values y based on the input features x. The fundamental goal of SVMR is to construct a hyperplane, defined by a weight vector w, that closely fits the training data within a specified error-tolerance margin (ε). Feature points inside the ε-tube surrounding the hyperplane incur no penalty, whereas points outside this tube are penalized because they add to the error. The loss function is determined using Eq. 3:

minimize (1/2)‖w‖² + C Σ_{i=1}^{N} (ε_i + ε_i*)   (3)

where ε_i and ε_i* are slack variables that gauge how far the outliers are from the ε-tube, N is the total number of slack variables, and C is a regularization factor that can be adjusted to determine the flatness of the hyperplane.


FIGURE 3 . The workflow of a SVMR algorithm adopted in this study.

The main goal of the optimisation approach in SVMR is to deduce the optimal values of w and the slack variables ε_i and ε_i*. The objective is to minimize the regularization term while ensuring that errors are within ε and that the slack variables remain non-negative. If non-linearity exists, the feature data are projected onto a kernel space, a higher-dimensional hyperplane, which improves the model’s accuracy. The function k(x_i, x_j) defines the kernel space, and in this study a Gaussian Radial Basis Function kernel (RBF, Eq. 4 ) is utilised:

k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))   (4)

where σ is the kernel parameter. The Gaussian RBF kernel is suitable for datasets with unknown or challenging-to-trace intrinsic feature characteristics ( Roushangar and Koosheh, 2015 ). This is because the RBF kernel, which corresponds to an infinite Taylor series expansion, can accommodate an infinite number of feature dimensions. SVMR is particularly known for producing robust predictions when dealing with non-linear and high-dimensional data (e.g., Kawashima and Kumano, 2017 ; Lan et al., 2023 ), similar to the dataset used in this study.
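As a minimal sketch of the kernel itself, assuming the common exp(−‖x_i − x_j‖²/(2σ²)) convention:

```python
import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0):
    """Gaussian RBF kernel: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma**2)))

# Identical points give k = 1; similarity decays towards 0 with distance.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # -> 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]) < 0.01)  # -> True
```

Libraries such as scikit-learn apply this kernel internally when `kernel="rbf"` is selected for an SVR model, so the function is shown here only to make the similarity measure concrete.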

3.3 Artificial Neural Networks (ANN)

ANNs are well established in coastal engineering applications for tackling classification and regression tasks by mapping inputs to outputs, assigning weights to specific inputs, and estimating and minimizing a loss function (e.g., Raikar et al., 2018 ; Verhaeghe et al., 2008 ; Zanuttigh et al., 2016 ; Formentin et al., 2017 ; EurOtop, 2018 ; Habib et al., 2023a ; Habib et al., 2023b ). Figure 4 illustrates the workflow of a feed-forward, back-propagation ANN algorithm, including the input, hidden, and output layers. The input layer receives data from the training set. Information is communicated only between layers of the neural network and not between neurones in the same layer. The model’s hidden layers are responsible for assigning numerical weights to the incoming information from the input layers and to the activation functions. The output layer of the network estimates the quantity predicted by the activation functions, calculating the dependent feature(s) from the independent feature(s) in the input layers ( Babaee et al., 2021 ; Khosravi et al., 2023 ).


FIGURE 4 . Schematic of a feed-forward and back propagation ANN algorithm adopted in this study.

A Multi-Layered Perceptron (MLP) ANN, which is feed-forward and back-propagation in nature, is adopted in this study ( Figure 4 ). The term feed-forward and back-propagation essentially means that until a predetermined allowable error rate is achieved, the error rates are minimized by altering the loss functions through a combination of feed-forward (exchange of information from Input to Hidden to Output Layers) and back propagation (exchange of information from Output to Hidden and Hidden to Input Layers). The adjustment of weights and biases during the backpropagation stage is determined by the error rate. This process involves assigning new weights and activation functions to the hidden layers. The optimization of the number of hidden layers is usually based on the complexity of the input data, aiming to minimize prediction error ( Elbeltagi et al., 2021 ; 2022 ).
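A compact illustration of such a feed-forward, back-propagation MLP, here using scikit-learn's MLPRegressor on synthetic data (the layer sizes and data are illustrative, not the study's configuration):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
# Synthetic stand-in for a scour-style dataset: 120 samples, 10 features.
X = rng.normal(size=(120, 10))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=120)

# Feed-forward MLP trained by back-propagation: weights and biases in the
# hidden layers are iteratively adjusted to reduce the loss.
model = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                     alpha=1e-3, max_iter=2000, random_state=0)
model.fit(X, y)
preds = model.predict(X)
print(preds.shape)  # -> (120,)
```

In practice the number of hidden layers and neurons would be tuned against validation error, as discussed above.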

3.4 RF and GBDT

RF and GBDT algorithms are typically categorised as Decision Trees (DTs). DTs are supervised machine learning algorithms used to predict an output variable (i.e., dependent or target variable) based on a set of independent variables (i.e., features). DTs are capable of tackling both classification and regression problems. In regression, they predict continuous or numerical output variables, while in classification, they predict class labels for discrete output variables ( Yeganeh-Bakhtiary et al., 2022 ).

In the case of regression-based DTs, the training data is iteratively partitioned into rectangular regions, and the mean and median values within each region are estimated until pre-determined stopping criteria are met. For example, given a training dataset X = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where x_i represents the input feature vector of the i-th training sample and y_i is the corresponding output, the DT algorithm divides X into a series of rectangular regions, denoted as R_1, R_2, R_3, etc. For each region, the median and mean are estimated to serve as the prediction value P for that region. The final DT is constructed using the input features that distinctly divide these rectangular sections and yield the output variable with the smallest variance. DTs are commonly used in prediction tasks due to their ability to handle noise and non-linearity in input data independently ( Pedregosa et al., 2011 ; Kotu and Deshpande, 2015 ; Yeganeh-Bakhtiary et al., 2023 ).

A Random Forest (RF) algorithm is an ensemble of DTs constructed from a random sub-set of training data. Figure 5 illustrates a schematic of methodological workflow for RF modelling approach.


FIGURE 5 . The workflow of a RF algorithm [Adopted from Habib et al. (2023a) ].

The RF model aims to reduce overfitting and enhance generalization by minimizing overexposure to any specific set of training data. The final prediction from the RF is the average of the predictions made by the individual DTs, an approach often referred to as bagging. An additional advantage of RF is its capability to handle both categorical and numerical data, further minimising overfitting.

The boosting strategy is another method for enhancing DTs’ predictive capabilities. An example of a Boosting approach is the GBDT algorithm (see Figure 6 ). The Mean Squared Error (MSE) between the predicted and actual values is measured in the boosting technique using a loss function. During training, the boosting algorithm aims to minimize this loss function by assigning numerical coefficients to input data, often through gradient descent. The GBDT algorithm, in particular, is known for rapidly minimizing the loss function, resulting in faster and more accurate predictions from DT models ( Sutton, 2005 ).


FIGURE 6 . The workflow of a GBDT algorithm. Adopted from Habib et al. (2023b) .
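Both ensembles can be fitted with scikit-learn; the sketch below uses synthetic data and a 70/30 split mirroring the dataset section (the hyperparameter values are illustrative, not those of Table 1):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 10))                      # synthetic features
y = X @ rng.uniform(size=10) + rng.normal(scale=0.05, size=120)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# RF averages the predictions of many bagged trees (bagging); GBDT fits
# trees sequentially, each reducing the residual error of the ensemble.
results = {}
for model in (RandomForestRegressor(n_estimators=200, random_state=0),
              GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                        max_depth=3, random_state=0)):
    model.fit(X_tr, y_tr)
    results[type(model).__name__] = mean_absolute_error(y_te, model.predict(X_te))
print(sorted(results))  # -> ['GradientBoostingRegressor', 'RandomForestRegressor']
```

Comparing the two test-set MAE values gives a first indication of which ensemble strategy suits a given dataset before any hyperparameter tuning.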

3.5 Model optimization

3.5.1 hyperparameter tuning.

Hyperparameters refer to the parameters of an ML algorithm that can be adjusted or tuned by the user, as opposed to model parameters, such as the coefficients of mapping functions, which are not user-accessible. Hyperparameter tuning is a crucial process for reducing overfitting and ensuring that the ML algorithm is well-suited to a specific set of input data. Hyperparameter tuning was conducted for all the adopted ML models in this study using the open-source scikit-learn library in Python ( Pedregosa et al., 2011 ). Table 1 summarises the optimum hyperparameters adopted for the SVMR, RF, GBDT, and ANN models. The SVMR algorithm’s regularization parameter is represented by the C term in Table 1 . The algorithm’s “engine” is a function called the kernel that maps input parameters (independent variables) onto output values (dependent variable). This study investigates the performance of linear, polynomial and RBF kernels. Gamma ( Table 1 ) is a kernel function coefficient. This study combines “RandomizedSearch” and k-fold Cross Validation (CV) to find the best parameters. CV is a popular resampling method that reduces bias in prediction models ( Pedregosa et al., 2011 ; Salauddin et al., 2023 ). For k-fold cross validation, the data is randomly divided into k sets of nearly equal size. The ML algorithms are first tested on these folds to validate the training, and then applied to the test set. The validation step ensures that the algorithms explicitly capture the variations and patterns in the training set. The RandomizedSearchCV (RS) function evaluates a set number of random combinations of hyperparameters. The RS function is particularly suitable for hypertuning when a large number of hyperparameters is involved, as in this work.


TABLE 1 . Set of hyperparameters and their optimised values.

The key functional components of a DT network can be found in the hyperparameters of the RF model ( Table 1 ). “n_estimators” determines the number of trees in an RF, while “max_depth” and “min_samples_split” help mitigate overfitting. In this study, a random search with Cross Validation (CV) was used for hyperparameter tuning in the RF model.

GBDT and RF are both based on DTs, with GBDT relying on gradient boosting. GBDT’s hyperparameters ( Table 1 ) determine the size of the decision tree that best suits the input data. “learning_rate” is crucial for reducing overfitting, as it weights the contribution of each tree when converging the error in the loss function. “max_depth” also plays a role in reducing overfitting by limiting the number of nodes in the trees. For GBDT, hyperparameter tuning was performed using a random search with 5-fold cross validation. The scope of hyperparameter tuning with ANN is limited ( Huang et al., 2012 ; Ghiasi et al., 2022 ). RS with a k-fold CV approach is implemented in this study to tune the learning-rate parameter “alpha”, and the best model is determined based on the model loss criterion. Typical hyperparameter tuning values for the ANN models are adopted from LeCun et al. (2015) and Glorot and Bengio (2010) . The kernel function of the ANN is located in the hidden layers, and the user can predetermine both the number of layers and the number of neurons in each layer. In addition to the “alpha” parameter and the activation function, the number of epochs was adjusted to attain the optimal set of hyperparameters for the ANN in this study.
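The RandomizedSearchCV plus k-fold CV procedure can be sketched for the SVMR case as follows (the search ranges and synthetic data are illustrative, not the values reported in Table 1):

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))                       # synthetic features
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Random combinations of C, gamma and kernel are scored with 5-fold CV;
# the combination with the best mean validation score is retained.
param_distributions = {
    "C": loguniform(1e-1, 1e3),
    "gamma": loguniform(1e-3, 1e1),
    "kernel": ["linear", "poly", "rbf"],
}
search = RandomizedSearchCV(SVR(), param_distributions, n_iter=20,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0),
                            random_state=0)
search.fit(X, y)
print(sorted(search.best_params_))  # -> ['C', 'gamma', 'kernel']
```

The same pattern applies to the RF, GBDT and ANN models by swapping the estimator and its parameter distributions.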

3.5.2 Feature selection and feature transformation

Robust ML-based predictions can be challenging with high-dimensional data, where redundancy can reduce the effectiveness and accuracy of machine learning algorithms. Additionally, computational resource costs can increase due to prolonged algorithm runtimes. To address the issue of data redundancy, feature selection techniques are employed. These techniques aim to filter a subset of relevant features from a large dataset, effectively eliminating redundancy and irrelevance ( Cai et al., 2018 ). Feature selection is typically achieved through statistics-based permutation combinations, which measure the correlation of individual features with a target feature. The most important features are then deduced based on their correlation scores, as highlighted by Liu and Motoda (2012) and Donnelly et al. (2024) .

Feature transformation is a technique used for extracting useful features from a large dataset, where the initial number of features is transformed into a new, more compact dataset with fewer but relevant features, while conserving the implicit and/or explicit information of the original dataset. One well-known feature transformation technique is Principal Component Analysis (PCA) ( Roessner et al., 2011 ; Noori et al., 2022 ). PCA is particularly useful for capturing and reducing variance in large datasets by selecting the most relevant features that account for the majority of variance across the dataset. It is characterized as a dimensionality reduction technique that converts the original variables into uncorrelated principal components.
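A minimal PCA sketch illustrates the transformation: project the data onto the principal components that capture most of the variance. The dataset below is a random stand-in, not the scour dataset.

```python
# Minimal PCA sketch: keep the smallest number of principal components
# that together explain at least 95% of the variance (illustrative data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))   # stand-in for the scour dataset

pca = PCA(n_components=0.95)     # float in (0, 1): target fraction of variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                        # fewer (or equal) columns than X
print(pca.explained_variance_ratio_.sum())    # >= 0.95 by construction
```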

This study adopts a combination of feature selection and feature transformation techniques to discover and filter the most relevant features in the scour dataset. A Forward Sequential Feature Selection (FSFS) method is employed for feature selection. FSFS is a “greedy” method that iteratively builds a set of selected features ( S ) by adding new features, one at a time, and performing prediction tasks using a chosen estimator. In more concrete terms, FSFS starts with zero features and identifies the feature that, when used to train an estimator (e.g., linear regression in this study), maximizes a Cross Validation (CV) score. This process is repeated, adding one feature at a time, until all features in the dataset have been considered. The number of features that maximizes the CV score is considered the optimal number. FSFS is widely accepted for its simplicity and accuracy in estimating the number of important features in a dataset ( Marcano-Cedeno et al., 2010 ). In this study, FSFS determined 10 parameters as the optimum number of features (see Figure 7 ). Subsequently, PCA was applied to gain insight into the 10 most important features of the dataset utilised in this study, including d 50 (mm), Duration (s), h t (m), R c (m), T m,deep (s), L p (m), L m (m), R c /H 1/3,deep , h t /H 1/3,deep and I r (terms are explained in the glossary). The data corresponding to the features proposed by FSFS are selected as predictive model input for the training and testing phase of the ML algorithms.
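The greedy forward search described above can be sketched with scikit-learn's SequentialFeatureSelector, using a linear-regression estimator and cross-validation as in this study; the toy data and the fixed choice of 10 features below are illustrative.

```python
# Sketch of Forward Sequential Feature Selection: greedily add one feature at
# a time, scoring each candidate set with cross-validation (illustrative data).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=120, n_features=15, n_informative=10,
                       random_state=0)

sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=10,   # the optimum found by FSFS in this study
    direction="forward",       # start from zero features and add greedily
    cv=5,                      # CV score guides each greedy step
)
sfs.fit(X, y)
X_selected = sfs.transform(X)
print(X_selected.shape)
```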


FIGURE 7 . Variation of performance metric (CV score) with the number of features during Forward Sequential Feature Selection (FSFS).

Further analysis of the training phase was conducted by examining the variation of RMSE in the training set ( Figure 8 ). The CV value and the number of training and validation iterations were set at 5 and 100, respectively. Figure 8 illustrates that, despite observing RMSE variations across all the algorithms, the average RMSE remained consistent in all the cases. This indicates that the selected algorithms in this study are capable of producing similar performance on the given dataset.
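The RMSE-variation check illustrated in Figure 8 can be sketched by repeating k-fold CV with different shuffles and tracking the spread of RMSE; the toy data and the 20 repetitions below are illustrative (the study used CV = 5 with 100 iterations).

```python
# Sketch of the training-phase RMSE-variation check: repeat 5-fold CV with a
# new shuffle each time and record the RMSE spread (illustrative data).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=120, n_features=10, noise=5.0, random_state=0)

rmses = []
for i in range(20):  # repeated CV iterations (the study used 100)
    cv = KFold(n_splits=5, shuffle=True, random_state=i)
    scores = cross_val_score(
        RandomForestRegressor(n_estimators=30, random_state=0),
        X, y, cv=cv, scoring="neg_root_mean_squared_error",
    )
    rmses.append(-scores.mean())

print(np.mean(rmses), np.std(rmses))  # average RMSE and its variation
```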


FIGURE 8 . Variation of RMSE during the training phase for SVMR, ANN, RF and GBDT algorithms.

3.6 Evaluation metrics

To evaluate the performance of the machine learning algorithms in predicting relative scour depth, the predicted values were compared to the observed values using statistical metrics including the coefficient of determination ( R 2 ), root mean square error (RMSE), mean absolute error (MAE), and relative absolute error (RAE). The Coefficient of Determination (Eq. 5 ) describes the percentage of the dependent variable’s fluctuation that can be predicted from the independent variables and, as such, serves as a gauge to evaluate the overall effectiveness of ML models ( Cheng et al., 2014 ):

R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²   (5)

where yᵢ, ŷᵢ, and ȳ are the observed values, predicted values, and mean of all observed values, respectively.

The standard deviation of the differences between the observed and predicted values is reflected in the Root Mean Square Error ( RMSE ) calculated from Eq. 6 , and the discrepancies between these values, averaged across the number of observations, are expressed in terms of the Mean Absolute Error (MAE) as in Eq. 7 :

RMSE = √[(1/n) Σ(q_A − q_P)²]   (6)

MAE = (1/n) Σ|q_A − q_P|   (7)

where q_A and q_P are the actual and predicted relative scour depths, respectively, and n is the number of observations.
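These metrics can be computed directly; a minimal sketch with illustrative numbers (not the study's data):

```python
# The metrics of Eqs 5-7 on a toy pair of observed/predicted values.
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination (Eq. 5)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean square error (Eq. 6)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error (Eq. 7)."""
    return np.mean(np.abs(y - y_hat))

y = np.array([0.8, 1.0, 1.2, 1.5])       # illustrative observed values
y_hat = np.array([0.9, 1.0, 1.1, 1.4])   # illustrative predicted values
print(r2(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```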

In a regression test, the null hypothesis is that all of the regression coefficients are zero, i.e., the model has no predictive power. The F-test is performed to determine whether to accept or reject this null hypothesis. The F-test assesses whether the addition of predictor (independent) variables improves the model compared to a model with only an intercept (zero predictor variables). It quantifies the ratio of explained variance to unexplained variance (residuals) as (Eq. 8 ):

where SSR = Σ(yᵢ − ŷᵢ)², SSE = Σ(yᵢ − ȳ)², and k and n are the numbers of independent variables and observations, respectively.
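As an illustrative sketch (not the study's exact computation), the F statistic of a regression can also be expressed through the r² score, the number of predictors k, and the number of observations n; the values below assume the SVMR r² of 0.74 with the study's 10 selected features and 32 test observations.

```python
# Illustrative F statistic from r²: F = (r²/k) / ((1 - r²)/(n - k - 1)).
r2_score = 0.74   # e.g., the SVMR model's r² reported in Table 2
k, n = 10, 32     # selected features and test observations in this study

F = (r2_score / k) / ((1.0 - r2_score) / (n - k - 1))
print(F)          # compare against the critical F value of 4.15
```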

The plot of the residuals or the Discrepancy Ratio (DR) against the predicted values is also an important criterion for assessing the relevance of a prediction model; the residuals should ideally exhibit zero correlation with the predicted values ( Sahay and Dutta, 2009 ; Salauddin et al., 2023 ).

The study of Kissell and Poserina (2017) suggested that the statistical significance of regression models (where predicted values are compared against observed ones) should be evaluated holistically in terms of the r 2 score, the F-test score, and the p -value. The values obtained from these statistical parameters should be in agreement to deduce the stability and accuracy of regression models.

4 Results and discussion

4.1 Model performances

The experimental dataset of Salauddin and Pearson (2019a) was used for training and testing of all the ML algorithms examined in this study, following scalar transformations and feature selection. Training and testing of the models followed a common methodology, which provided the basis for comparing modelled and measured dimensionless scour depths (S t /H 1/3 deep [-]) in Figure 9 . The results indicate that all four ML-based models tested in this study are capable of providing realistic approximations of scour depths. An in-depth statistical evaluation of the predictive models is presented in Table 2 .


FIGURE 9 . Comparison of predicted versus actual relative scour depths (=S t /H 1/3 deep [-]) for (A) SVMR , (B) ANN, (C) RF, and (D) GBDT.


TABLE 2 . Prediction evaluation metrics and statistical scores for ML-based models.

Notably, a number of data points for smaller (near-zero) relative scour depths fall outside the 95% CIs. This pattern is also evident for a few data points of large relative scour depth, while a few other data points representing larger relative scour depths were inside the 95% CI zone. This suggests that, while the algorithms were capable of robust overall predictions, they exhibited some inconsistency for both smaller and larger relative scour depths. Predictions for larger relative scour depths were nevertheless more accurate, positioned on or very close to the regression line in Figure 9 . The scatter in the graphs can be explained by the Pearson R score. Among the tested algorithms, ANN, GBDT, and RF showed similar scatter, with relatively lower R scores compared to SVMR. SVMR, in particular, demonstrated comparatively more accurate predictions, as reflected by the highest r 2 and R scores of 0.74 and 0.85, respectively. The RMSE values of the algorithms did not vary by a large margin with respect to one another: SVMR yielded the lowest RMSE value of 0.28, while RF yielded the highest at 0.33. The highest RMSE value was approximately 22% of the predicted maximum relative scour depth. From a computational efficiency perspective, under the given hyperparameter conditions ( Table 1 ) and using a computer with an 8-core CPU, 16 GB RAM, and 6 GB of dedicated GPU memory, the SVMR, ANN, GBDT and RF algorithms completed the prediction task (for the test set) in 2.5, 6.93, 14.83 and 22.3 s, respectively. This suggests that SVMR outperforms the other algorithms in terms of computational efficiency.

Comprehensive statistical analyses of the developed ML-based models’ performance were conducted in this study. Statistical scores are then used to rank the performance of the four tested ML algorithms in predicting scour depth at sloping structures with a shingle foreshore. Table 2 shows the results of the performance evaluation of the algorithms according to the criteria outlined in Section 3.6 .

The results from the evaluation metrics indicate that all the algorithms yielded strong r 2 scores ( r 2 > 0.40; Kissell and Poserina, 2017 ). The F-test was therefore performed, and all the models yielded F-scores higher than the critical F-test score of 4.15 ( Table 2 ); the p-values for all the models were also substantially (∼10 −6 ) lower than the significance level of 0.05. These findings reflect the statistical significance of the results obtained from the ML algorithms, and it can be inferred that the variations in the observed relative scour depths were well captured by the predictions. The cumulative effect of outliers in the models is expressed in the form of the RMSE. The SVMR algorithm yielded predictions with the fewest outliers, reflected in the lowest RMSE of 0.28 across all the tested algorithms. The RMSE of the other algorithms did not differ significantly, suggesting the appropriateness and robustness of the proposed ML algorithms for predicting relative scour depth. Higher RMSE and MAE were coupled with lower r 2 , and vice versa, for all the models. It is noted that the scale of the MAE depends on the scale of the outputs (here, the predicted relative scour depth). The maximum and minimum absolute relative scour depths in the test set were 1.5 and 0.8, respectively, giving a mean relative scour depth of 1.15. The maximum MAE of 0.22 across the models was observed for the RF model, while the minimum MAE of 0.17 was determined for the SVMR model. Therefore, the MAE evaluated for this study ranged between 14.7% and 19% of the mean relative scour depth in the test set, which consisted of 32 observations derived from the original set of 120 observations using a train-test split of 70%–30%.
The significance of the MAE analysis is that the models were able to predict the relative scour depth with an approximate accuracy of 80%. Overall, the most accurate scour predictions were attributed to the SVMR model, and the least accurate to the RF and GBDT models, suggesting that DT-based algorithms may be less suited to obtaining predictions from smaller datasets.

4.2 Feature importance

The method of evaluating the relative contribution of various features (also referred to as variables or predictors) in a predictive model is known as Feature Importance (FI). It is useful for selecting features, comprehending the underlying data, and gaining new perspectives on the subject at hand. FI reveals which features have the most impact on the model’s predictions, essentially bridging the findings from ML to the physical consistency of the underlying processes (i.e., scouring in this study). The FI results are reported in two formats here, namely the magnitude-of-coefficients method and the permutation importance method. This is because, although the DT-based algorithms (i.e., RF and GBDT) have built-in FI analysis functions, the other two algorithms (SVMR and ANN) do not possess this function in Scikit-Learn’s module. In the magnitude-of-coefficients method, the size of a coefficient directly reflects the significance of the corresponding feature: greater absolute values imply greater significance of the predictors. The permutation importance method involves permuting a predictor’s values at random and analysing the impact on model performance; the greater the loss in performance, the more significant the feature is considered to be. These results are reported in a similar format to that of the magnitude-of-coefficients method. Figure 10 summarizes the impact of the predictors on the prediction analysis.
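The permutation importance method can be sketched with scikit-learn's permutation_importance applied to a fitted model without built-in FI (e.g., SVR); the data below are illustrative.

```python
# Sketch of the permutation importance method: shuffle one feature at a time
# and measure the resulting drop in model performance (illustrative data).
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.svm import SVR

X, y = make_regression(n_samples=120, n_features=10, random_state=0)
model = SVR(kernel="rbf").fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
print(ranking[:3])  # indices of the three most influential features
```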


FIGURE 10 . Feature Importance Analysis showing the impact of predictors.

In experiments on the measurement of relative scour depth at sloping walls with a gravel foreshore, it was reported that the Iribarren Number I r had a strong positive correlation with the measured scour depths for a given relative toe water depth ( h t /L 0m ) ( Salauddin and Pearson, 2019b ). Hence, I r was expected to have the maximum influence in the prediction analysis, for consistency with the experimental results. The FI analysis results show that three of the four algorithms (i.e., ANN, SVMR, and RF) identified I r as the most important predictor. For the GBDT algorithm, I r is ranked among the top three predictors, while the water depth at the toe of the structure ( h t ) is identified as the most important predictor. In Figure 10 , the bars labelled ‘others’ comprise the summed magnitudes of importance of the features d 50 , Duration, R c , T m deep , and L m from the four tested algorithms. It can therefore be inferred from the FI results that the physical scouring processes are reasonably well captured in the proposed ML-based models.

4.3 Residuals

The residual plot for all of the tested algorithms is shown in Figure 11 . The residuals are independent of the predicted values, supporting the reliability of the models.


FIGURE 11 . Variation of Residuals with predicted relative scour depth.

4.4 Taylor’s diagram

An effective visual method for describing the statistical metrics of predictive models is the Taylor Diagram ( Taylor, 2001 ). The Taylor Diagram ( Figure 12 ) shows three statistical parameters: the correlation coefficient projected as an azimuthal angle (in black), the radially plotted Centered Root Mean Square (cRMS) difference (in green), and the horizontally plotted standard deviations (in blue). The Taylor diagram is particularly robust for assessing and comparing several performance aspects of complicated models.


FIGURE 12 . Taylor’s Diagram of the statistical metrics determined for all tested ML-based models.

Since both are square roots of squared differences, the standard deviation and the cRMS are comparable quantities. They differ, however, in that the cRMS gauges the gap between the actual values and the corresponding predictions, while the standard deviation accounts for the spread of the data around the mean. The error of prediction, or the quantitative deviation, is measured using the cRMS. Here, the cRMS of the SVMR is the lowest, while its standard deviation is the highest. This essentially means that while the predicted results are more spread across the regression line, the magnitude of the deviation is small, indicated by the low cRMS score. Conversely, RF and ANN have lower standard deviations, but the magnitude of the deviation is higher, which is reflected in their higher cRMS scores. The SVMR model also yielded the highest correlation coefficient of 0.96, followed by ANN (0.955), RF (0.95) and GBDT (0.92). Therefore, from a holistic point of view, it can be inferred from the Taylor Diagram that SVMR produced the most accurate predictions of relative scour depth.
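The three statistics plotted in a Taylor diagram can be computed directly; a minimal sketch with illustrative observed/predicted series (not the study's data):

```python
# Correlation coefficient, standard deviation, and centred RMS difference:
# the three statistics summarised by a Taylor diagram (illustrative series).
import numpy as np

obs = np.array([0.8, 0.9, 1.1, 1.2, 1.5])
pred = np.array([0.85, 0.95, 1.05, 1.25, 1.45])

corr = np.corrcoef(obs, pred)[0, 1]
std_pred = np.std(pred)
# cRMS: RMS difference after removing each series' mean
crms = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
print(corr, std_pred, crms)
```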

5 Conclusion

Climate change-induced extreme climatic events intensify scouring in front of coastal infrastructure, posing a significant threat to its structural integrity and reliability. The development of robust prediction tools for coastal scouring is therefore crucial for enhancing coastal resilience and safeguarding these vital defences. This study examined the capabilities of advanced ML techniques for the prediction of relative scour depths at sloping seawalls with shingle foreshores and developed a methodological framework for implementing ML-based models for accurate predictions at such structures. Four ML algorithms, RF, GBDT, SVMR, and ANN, were utilised and tested on an experimental dataset of scour depths. We proposed a robust and efficient framework including detailed procedures for data scaling, feature selection, and tuning of the modelling parameters.

A methodological approach is proposed for pre-processing the physical modelling dataset, covering missing-value imputation, feature transformation (PCA), feature selection, and data scaling, to ensure that redundant data and missing values do not impair the performance of the ML models. To verify the ML algorithms on a randomly selected subset of the training data, cross validation was carried out in the training step. A typical train-test split of 70%–30% was implemented. These precautions ensured that a consistent methodology was followed to achieve comparable outcomes from the predictions made by the four algorithms. The Iribarren Number (Ir) was identified as the most important parameter influencing the scouring process, in agreement with the physics of scouring.

The performance of the proposed ML-based predictive models was evaluated on a comprehensive experimental dataset. The predicted relative scour depths and comprehensive statistical evaluation confirmed the robust performance and accuracy of all the tested algorithms.

A set of statistical indices ( r 2 , RMSE, MAE, F-test and Pearson R ) was used to gauge the efficiency of the tested ML algorithms. The SVMR algorithm showed superior performance compared to the other tested algorithms, with an r 2 score of 0.74, an RMSE of 0.28, an MAE of 0.17 and a Pearson R value of 0.96. The DT-based algorithms were not able to match the performance of SVMR and ANN, with r 2 scores of 0.62 for both RF and GBDT. ANN was identified as the second-best performing algorithm, with an r 2 score (0.68) closest to that of SVMR. The F-test scores and Pearson R values of the algorithms indicate that the variation in the observed relative scour depths is accounted for by the predictors and that there is a strong correlation between the actual and predicted values. These findings were reinforced by the high Pearson R values of 0.96, 0.955, 0.95 and 0.92 for SVMR, ANN, RF and GBDT, respectively. The SVMR model was also the most computationally efficient (<3 s), more than twice as fast as ANN (6.93 s), followed by the DT-based GBDT (14.83 s) and RF (22.3 s). The comparison of MAE revealed that the accuracy of predictions was over 80% for all the algorithms. One important reason for the underperformance of the DT-based models in this study may be the relatively small amount of training data. Although there is no explicit requirement on the amount of training data needed by ML algorithms, larger and more diverse datasets could improve the performance of the ML-based models presented in this study. Future studies should focus on further improving the performance of the proposed predictive tool through the inclusion of larger experimental datasets. Hybrid machine learning approaches with optimisation techniques could potentially enhance the predictive performance of the models proposed here and should be tested on wave-induced scouring datasets.
The method proposed in this study could be adopted by coastal engineers for rapid scour depth prediction and to inform the design and maintenance of coastal defence structures.

Data availability statement

The data analyzed in this study is subject to the following licenses/restrictions: Data will be made available on reasonable request. Requests to access these datasets should be directed to [email protected] .

Author contributions

MAH: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing–original draft. SA: Writing–review and editing. JO’S: Writing–review and editing, Supervision. MS: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing–review and editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Babaee, M., Maroufpoor, S., Jalali, M., Zarei, M., and Elbeltagi, A. (2021). Artificial intelligence approach to estimating rice yield. Irrigation Drainage 70 (4), 732–742. doi:10.1002/ird.2566


Cai, J., Luo, J., Wang, S., and Yang, S. (2018). Feature selection in machine learning: a new perspective. Neurocomputing 300, 70–79. doi:10.1016/j.neucom.2017.11.077

Cheng, C.-L., Shalabh, , and Garg, G. (2014). Coefficient of determination for multiple measurement error models. J. Multivar. Analysis 126, 137–152. doi:10.1016/j.jmva.2014.01.006

den Bieman, J. P., van Gent, M. R. A., and van den Boogaard, H. F. P. (2021a). Wave overtopping predictions using an advanced machine learning technique. Coast. Eng. 166, 103830. doi:10.1016/j.coastaleng.2020.103830

den Bieman, J. P., Wilms, J. M., van den Boogaard, H. F. P., and van Gent, M. R. A. (2020). Prediction of mean wave overtopping discharge using gradient boosting decision trees. Water 12 (6), 1703. doi:10.3390/w12061703

Donnelly, J., Daneshkhah, A., and Abolfathi, S. (2024). Forecasting global climate drivers using Gaussian processes and convolutional autoencoders. Eng. Appl. Artif. Intell. 128, 107536. doi:10.1016/j.engappai.2023.107536

Elbeltagi, A., Kumari, N., Dharpure, J., Mokhtar, A., Alsafadi, K., Kumar, M., et al. (2021). Prediction of combined terrestrial evapotranspiration index (CTEI) over large river basin based on machine learning approaches. Water 13 (4), 547. doi:10.3390/w13040547

Elbeltagi, A., Pande, C. B., Kouadri, S., and Islam, A. R. M. T. (2022). Applications of various data-driven models for the prediction of groundwater quality index in the Akot basin, Maharashtra, India. Environ. Sci. Pollut. Res. 29 (12), 17591–17605. doi:10.1007/s11356-021-17064-7

Elbisy, M. S. (2023). Machine learning techniques for estimating wave-overtopping discharges at coastal structures. Ocean. Eng. 273, 113972. doi:10.1016/j.oceaneng.2023.113972

Elbisy, M. S., and Elbisy, A. M. S. (2021). Prediction of significant wave height by artificial neural networks and multiple additive regression trees. Ocean. Eng. 230, 109077. doi:10.1016/j.oceaneng.2021.109077

EurOtop, (2018). Manual on Wave Overtopping of Sea Defences and Related Structures . 2nd Edn. Available online at: www.overtopping-manual.com (accessed July, 2023)

Fitri, A., Hashim, R., Abolfathi, S., and Abdul Maulud, K. N. (2019). Dynamics of sediment transport and erosion-deposition patterns in the locality of a detached low-crested breakwater on a cohesive coast. Water 11 (8), 1721. doi:10.3390/w11081721

Formentin, S. M., Zanuttigh, B., and van der Meer, J. W. (2017). A neural network tool for predicting wave reflection, overtopping and transmission. Coast. Eng. J. 59 (1), 1750006. doi:10.1142/S0578563417500061

Fowler, J. E. (1992). Scour problems and methods for prediction of maximum scour at vertical seawalls. Technical Report CERC-92-16. Vicksburg, MS, USA: US Army Corps of Engineers, Coastal Engineering Research Center.


Ghiasi, B., Noori, R., Sheikhian, H., Zeynolabedin, A., Sun, Y., Jun, C., et al. (2022). Uncertainty quantification of granular computing-neural network model for prediction of pollutant longitudinal dispersion coefficient in aquatic streams. Sci. Rep. 12, 4610. doi:10.1038/s41598-022-08417-4


Glorot, X., and Bengio, Y. (2010). “Understanding the difficulty of training deep feedforward neural networks,” in International conference on artificial intelligence and statistics. 2010 .

Habib, M. A., Abolfathi, S., O'Sullivan, J. J., and Salauddin, M. (2023b). “Prediction of wave overtopping rates at sloping structures using artificial intelligence,” in Proceedings of the 40th IAHR World Congress. Rivers–Connecting Mountains and Coasts , 404–413. doi:10.3850/978-90-833476-1-5_iahr40wc-p0115-cd

Habib, M. A., O'Sullivan, J., and Salauddin, M. (2022a). Comparison of machine learning algorithms in predicting wave overtopping discharges at vertical breakwaters . Austria: EGU General Assembly Vienna , EGU22–329. 23–27 May 2022. doi:10.5194/egusphere-egu22-329

Habib, M. A., O’Sullivan, J. J., Abolfathi, S., and Salauddin, M. (2023a). Enhanced wave overtopping simulation at vertical breakwaters using machine learning algorithms. PLOS ONE 18 (8), e0289318. doi:10.1371/journal.pone.0289318

Habib, M. A., O’Sullivan, J. J., and Salauddin, M. (2022b). Prediction of wave overtopping characteristics at coastal flood defences using machine learning algorithms: a systematic rreview. IOP Conf. Ser. Earth Environ. Sci. 1072 (1), 012003. doi:10.1088/1755-1315/1072/1/012003

Huang, G. B., Zhou, H., Ding, X., and Zhang, R. (2012). Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man. Cyber Part B 42, 513–529. doi:10.1109/tsmcb.2011.2168604

Kawashima, I., and Kumano, H. (2017). Prediction of mind-wandering with electroencephalogram and non-linear regression modeling. Front. Hum. Neurosci. 11, 365. doi:10.3389/fnhum.2017.00365

Khosravi, K., Rezaie, F., Cooper, J. R., Kalantari, Z., Abolfathi, S., and Hatamiafkoueieh, J. (2023). Soil water erosion susceptibility assessment using deep learning algorithms. J. Hydrology 618, 129229. doi:10.1016/j.jhydrol.2023.129229

Kissell, R., and Poserina, J. (2017). “Regression models,” in Optimal sports math, statistics, and fantasy ( Elsevier ), 39–67. doi:10.1016/B978-0-12-805163-4.00002-5

Kotu, V., and Deshpande, B. (2015). “Classification,” in Predictive analytics and data mining ( Elsevier ), 63–163. doi:10.1016/B978-0-12-801460-8.00004-5

Lan, J., Zheng, M., Chu, X., and Ding, S. (2023). Parameter prediction of the non-linear nomoto model for different ship loading conditions using support vector regression. J. Mar. Sci. Eng. 11 (5), 903. doi:10.3390/jmse11050903

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444. doi:10.1038/nature14539

Liu, H., and Motoda, H. (2012). Feature selection for knowledge discovery and data mining . Springer Science & Business Media .

Marcano-Cedeno, A., Quintanilla-Dominguez, J., Cortina-Januchs, M. G., and Andina, D. (2010). “Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network,” in IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society , 2845–2850. doi:10.1109/IECON.2010.5675075

Müller, G., Allsop, W., Bruce, T., Kortenhaus, A., Pearce, A., and Sutherland, J. (2008). “The occurrence and effects of wave impacts,” in Proceedings of the ICE-Maritime Engineering (ICE) , 167–173.

Noori, R., Ghiasi, B., Salehi, S., Esmaeili Bidhendi, M., Raeisi, A., Partani, S., et al. (2022). An efficient data driven-based model for prediction of the total sediment load in rivers. Hydrology 9 (2), 36. doi:10.3390/hydrology9020036

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: machine learning in Python. doi:10.48550/arXiv.1201.0490

Peng, Z., Zou, Q. P., and Lin, P. (2018). A partial cell technique for modeling the morphological change and scour. Coast. Eng. 131, 88–105. doi:10.1016/j.coastaleng.2017.09.006

Peng, Z., Zou, Q. P., and Lin, P. (2023). “Impulsive wave overtopping with toe scour at a vertical seawall,” in ICE Breakwater Conference 2023 , UK .

Pourzangbar, A., Brocchini, M., Saber, A., Mahjoobi, J., Mirzaaghasi, M., and Barzegar, M. (2017b). Prediction of scour depth at breakwaters due to non-breaking waves using machine learning approaches. Appl. Ocean. Res. 63, 120–128. doi:10.1016/j.apor.2017.01.012

Pourzangbar, A., Losada, M. A., Saber, A., Ahari, L. R., Larroudé, P., Vaezi, M., et al. (2017a). Prediction of non-breaking wave induced scour depth at the trunk section of breakwaters using Genetic Programming and Artificial Neural Networks. Coast Eng. 121, 107–118. doi:10.1016/j.coastaleng.2016.12.008

Powell, K. A., and Lowe, J. P. (1994). The scouring of sediments at the toe of seawalls. In: Proceedings of the Hornafjordor International Coastal Symposium, Iceland , 749–755.



Keywords: random forest, gradient boosted decision trees, Support Vector Machine Regression, marine and coastal management, coastal hazards mitigation, toe scouring, sloping structures

Citation: Habib MA, Abolfathi S, O’Sullivan JJ and Salauddin M (2024) Efficient data-driven machine learning models for scour depth predictions at sloping sea defences. Front. Built Environ. 10:1343398. doi: 10.3389/fbuil.2024.1343398

Received: 23 November 2023; Accepted: 26 January 2024; Published: 09 February 2024.

Copyright © 2024 Habib, Abolfathi, O’Sullivan and Salauddin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: M. Salauddin, [email protected]

This article is part of the Research Topic "Recent Developments in Modelling Wave-Structure Interactions at Sea Defences in a Changing Climate".


COMMENTS

  1. 5 Benefits of the Case Study Method

    The case study method, or case method, is a learning technique in which you're presented with a real-world business challenge and asked how you'd solve it. After working through it yourself and with peers, you're told how the scenario played out. HBS pioneered the case method in 1922. Shortly before, in 1921, the first case was written.

  2. What the Case Study Method Really Teaches

    December 21, 2021. It's been 100 years since Harvard Business School began using the case study method. Beyond teaching specific subject matter, the case ...

  3. Case Studies

    Some of the case studies in this collection highlight the decision-making process in a business or management setting. Other cases are descriptive or demonstrative in nature, showcasing something that has happened or is happening in a particular business or management environment.

  4. Do Your Students Know How to Analyze a Case—Really?

    Step 1: Problem definition. What is the major challenge, problem, opportunity, or decision that has to be made? If there is more than one problem, choose the most important one. Often when solving the key problem, other issues will surface and be addressed.

  5. What the Case Study Method Really Teaches

    Cases expose students to real business dilemmas and decisions. Cases teach students to size up business problems quickly while considering the broader organizational, industry, and societal context. Students recall concepts better when they are set in a case, much as people remember words better when used in context.

  6. Decision making and problem solving

    Decision making and problem solving. Magazine article by Steven C. Wheelwright and Robert H. Hayes ("Competing Through Manufacturing"). The past several years have witnessed a growing awareness among ...

  7. PDF The Role of Using Case Studies Method in Improving Students' Critical

    By presenting content in the format of a narrative accompanied by questions and activities that promote group discussion and solving of complex problems, case studies facilitate the development of the higher levels of Bloom's taxonomy of cognitive learning; moving beyond recall of knowledge to analysis, evaluation, and application (Bonney, 2015).

  8. Machine learning in project analytics: a data-driven framework and case

    For data-driven decision-making, machine learning models are advantageous. This is because traditional statistical methods (e.g., ordinary least square (OLS) regression) make assumptions about the ...

  9. (PDF) Strategic Decision Making Cycle in Higher Education: Case Study

    The methodology is structured as a cycle of strategic decision making with four phases, and it is focused on an institutional and national perspective, i.e. on decision making that takes place ...

  10. How does video case-based learning influence clinical decision-making

    This study clarified that video and paper case modalities have different influences on learners' clinical decision-making processes. Video case learning encourages midwifery students to have a woman- and family-centred holistic perspective of labour and birth care, which leads to careful consideration of the psychosocial aspects.

  11. Case Method Teaching and Learning

    What is the case method? How can the case method be used to engage learners? What are some strategies for getting started? This guide helps instructors answer these questions by providing an overview of the case method while highlighting learner-centered and digitally-enhanced approaches to teaching with the case method.

  12. (PDF) Developing Decision-making Skills in Students: an active learning

    Developing Decision-making Skills in Students: an active learning approach. January 2010. Author: Paul Greenbank, Teaching and Learning Development Unit, Edge Hill University.

  13. The effectiveness of case-based learning in health professional

    Abstract. Background: Case-based learning (CBL) is a long established pedagogical method, which is defined in a number of ways depending on the discipline and type of 'case' employed. In health professional education, learning activities are commonly based on patient cases. Basic, social and clinical sciences are studied in relation to the case, are integrated with clinical presentations ...

  14. Using Case Studies to Improve the Critical Thinking Skills of

    We chose a case study approach because real-world problem solving involves making decisions embedded in context [14, 38]. Learning how to think critically about information available in its context and evaluating evidence through identifying assumptions and gaps to arrive at strong inference is better supported through lessons presented in case ...

  15. How does video case-based learning influence clinical decision-making

    Background: Clinical decision-making skills are essential for providing high-quality patient care. To enhance these skills, many institutions worldwide use case-based learning (CBL) as an educational strategy in pre-clinical training.

  16. Effective Decision-Making: A Case Study

    Effective Decision-Making: Leading an Organization Through Timely and Impactful Action. By Ashley Perry | Aug 10, 2022. Senior leaders at a top New England insurance provider need to develop the skills and behaviors for better, faster decision-making.

  17. Case Study-Based Learning

    Case studies are a form of problem-based learning, where you present a situation that needs a resolution. A typical business case study is a detailed account, or story, of what happened in a particular company, industry, or project over a set period of time. The learner is given details about the situation, often in a historical context.

  18. PDF CASE STUDY

    Algorithmic Decision-Making and Accountability. This is part of a set of materials developed by an interdisciplinary research team at Stanford University, led by Hilary Cohen, Rob Reich, Mehran Sahami, and Jeremy Weinstein. Their original use is for an undergraduate course on ethics, public policy, and technology ...

  19. Decision Making: Articles, Research, & Case Studies on Decision Making

    05 Dec 2023, Research & Ideas. "Lessons in Decision-Making: Confident People Aren't Always Correct (Except When They Are)" by Kara Baskin. A study of 70,000 decisions by Thomas Graeber and Benjamin Enke finds that self-assurance doesn't necessarily reflect skill.

  20. AI and Machine Learning for Forecasting and Decision-Making ...

    Microsoft Tokyo's real-world case study demonstrates the power of AI and machine learning in decision-making and forecasting processes. The company has improved accuracy, efficiency, and resource ...

  21. Cooperative learning and case study: does the ...

    The purpose of this study was to investigate the effectiveness of cooperative learning techniques combined with case study on nursing students' self-perception of problem-solving and decision making skills in comparison with other teaching-learning methods.

  22. Decision-Making Exercise (A)

    Provides questionnaires so students can compare their experiences with different decision-making processes. Students read "Growing Pains," a Harvard Business Review (HBR) case study, and then work in teams to come up with recommendations using a consensus approach to decision making. The next day, using Decision-Making Exercise (B) and (C) and "Case of the Unhealthy Hospital," another HBR case ...

  23. Effective Decision-Making; Theory And Case Study in Digital ...

    This case study is designed to illustrate the practical application of effective decision-making concepts to a business challenge. Specifically, it focuses on digital transformation in supply ...

  24. Reinforcement learning based two‐timescale energy management for energy

    Decision-making in the real world is complex, and it may no longer be sufficient to rely solely on a single type of action. ... This study proposes a two-timescale energy management approach based on reinforcement learning. ... We constructed real-world scenarios using building models provided by DOE and the EnergyPlus software to conduct case studies, and ...

  25. Development of a predictive machine learning model for pathogen

    Hyperparameter optimization across different machine learning models: performance metrics obtained from a 10-fold cross-validation grid search. A K-Nearest Neighbors (KNN) model's accuracy varied with the number of neighbors, with optimal performance at k = 1; a boosted logistic regression model's accuracy peaked at nIter = 30.

  26. Classification of inertial sensor-based gait patterns of ...

    Elderly patients often have more than one disease that affects walking behavior. An objective tool to identify which disease is the main cause of functional limitations may aid clinical decision making. Therefore, we investigated whether gait patterns could be used to identify degenerative diseases using machine learning.

  27. Evaluation of cost-effectiveness of single-credit traffic safety course

    Background: Training plays a role in reducing traffic accidents, and evaluating the effectiveness of training programs is important for managers' decisions about continuing them. Thus, the present study aimed to evaluate the cost-effectiveness of a single-credit traffic safety course, based on the four levels of the Kirkpatrick model, in all Iranian universities. Methods: This ...

  28. Efficient data-driven machine learning models for scour depth

    The SVMR algorithm also offered the most computationally efficient performance among the algorithms tested. The methodological framework proposed in this study can be applied to scour datasets for rapid assessment of scour at coastal defence structures, facilitating model-informed decision-making.
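The snippet on machine learning in project analytics above contrasts machine learning models with traditional statistical methods such as ordinary least squares (OLS) regression, which rests on assumptions like a linear relationship between predictors and response. As a minimal sketch of what OLS actually computes (the toy data here is an assumption for illustration, not from the cited paper), the closed-form fit for a single predictor can be written in plain Python:

```python
# Minimal ordinary least squares (OLS) fit for y = a + b*x.
# Illustrates the linearity assumption that machine learning
# models can relax. Data below is a hypothetical toy example.

def ols_fit(xs, ys):
    """Return intercept a and slope b minimising sum((y - (a + b*x))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Perfectly linear toy data (y = 1 + 2x), so OLS recovers it exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = ols_fit(xs, ys)
print(a, b)  # intercept 1.0, slope 2.0
```

When the true relationship is nonlinear or interaction-heavy, this closed form still returns the best *linear* fit, which is precisely where the cited framework argues data-driven models have an advantage.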
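The pathogen-prediction snippet above describes tuning hyperparameters with a 10-fold cross-validation grid search (e.g. choosing k for a K-Nearest Neighbors model). A minimal stdlib-only sketch of that procedure follows; the 1-D toy dataset and candidate k values are assumptions for illustration, not the cited study's data:

```python
# Sketch of a k-fold cross-validation grid search for the
# number of neighbours k in a k-NN classifier, mirroring the
# procedure described in the snippet. Toy 1-D data only.

def knn_predict(train, query, k):
    """Majority label among the k nearest training points (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def cv_accuracy(data, k, folds):
    """Mean accuracy of k-NN over `folds` cross-validation splits."""
    hits = 0
    for i in range(folds):
        test = data[i::folds]  # every folds-th point held out
        train = [p for j, p in enumerate(data) if j % folds != i]
        hits += sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(data)

def grid_search(data, ks, folds=10):
    """Return the k with the highest cross-validated accuracy, plus all scores."""
    scores = {k: cv_accuracy(data, k, folds) for k in ks}
    return max(scores, key=scores.get), scores

# Toy dataset: class 0 clusters near 0, class 1 near 10.
data = [(x * 0.1, 0) for x in range(10)] + [(10 + x * 0.1, 1) for x in range(10)]
best_k, scores = grid_search(data, ks=[1, 3, 5], folds=10)
print(best_k, scores[best_k])  # best k and its cross-validated accuracy
```

In practice a library such as scikit-learn's `GridSearchCV` automates this loop over a full parameter grid; the sketch only makes the mechanics of "hold out one fold, train on the rest, average the accuracy" explicit.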