  • Open access
  • Published: 26 December 2022

Systematic analysis of healthcare big data analytics for efficient care and disease diagnosing

  • Sulaiman Khan 1 ,
  • Habib Ullah Khan 1 &
  • Shah Nazir 2  

Scientific Reports volume 12, Article number: 22377 (2022)


  • Biotechnology
  • Computational biology and bioinformatics

Big data has revolutionized the world by providing tremendous opportunities for a variety of applications. It comprises a gigantic amount of data, and especially a plethora of data types, that has proved useful in diverse research domains. In the healthcare domain, researchers use computational devices to extract rich, relevant information from this data and develop smart applications that solve real-life problems in a timely fashion. Electronic health (eHealth) and mobile health (mHealth) facilities, along with the availability of new computational models, have enabled doctors and researchers to extract relevant information and visualize healthcare big data in new ways. The digital transformation of healthcare systems through information systems, medical technology, and handheld and smart wearable devices has posed many challenges to researchers and caretakers in the form of storage, minimizing treatment cost, and processing time (to extract enriched information and minimize error rates in order to make optimum decisions). In this research work, the existing literature is analysed and assessed to identify gaps that affect the overall performance of the available healthcare applications, and enhanced solutions are suggested to address these gaps. In this comprehensive systematic research work, the literature reported from 2011 to 2021 is thoroughly analysed to identify the efforts made to help doctors and practitioners diagnose diseases using healthcare big data analytics. A set of research questions is formulated to analyse the relevant articles, identify the key features and optimum management solutions, and later use these analyses to achieve effective outcomes.
The results of this systematic mapping indicate that, despite the considerable efforts made in healthcare big data analytics, newer hybrid machine learning-based systems and cloud computing-based models should be adopted to reduce treatment cost and simulation time and to achieve improved quality of care. This systematic mapping will also help doctors, practitioners, researchers, and policymakers use this study as evidence for future research.

Introduction

Healthcare around the world is under high pressure due to limited financial resources, over-population, and disease burden. In this modern technological age, the healthcare paradigm is shifting from a traditional, one-size-fits-all approach to a focus on personalized individual care 1. Additionally, healthcare data varies in both type and amount. Healthcare providers deal not only with patients’ historical, physical, and identifying information, but also with imaging information, laboratory results, and other digital and analogue information such as ECG and MRI. This data is voluminous, varies in type and format, and differs in structure. Big data technologies are able to handle not only these different types and forms of data, but also the 10 V structure comprising volume, variety, venue, varifocal, varmint, vocabulary, validity, volatility, veracity, and velocity. Meanwhile, doctors face an increasing burden of rising patient numbers coupled with progressively less time to spend with each patient. In other words, we are facing more patients, more data, and less time.

Big data has attracted researchers to explore different fields, including healthcare, banking, imaging, smart cities, internet of things (IoT) based smart applications, and tracking and transportation systems 2. Software engineers constantly develop new applications for patients’ health and well-being. Both government and non-government organizations develop infrastructure using big data analytics to improve the decision-making capabilities of doctors and managers 3. It has been reported that 80% of the increase in big data is due to cloud sources, big data analytics, mobile technology, and social media technologies 4. A number of research articles have proposed using big data analytics in various domains, especially healthcare. For example, Kumar et al. 5 proposed a cognitive technology-based healthcare evaluation system using big data analytics. Chen et al. 6 presented an intelligent healthcare application for brain hemorrhage detection using big data analytics and machine learning (ML) techniques. Liang and Zhao 7 developed a smart health appointment system using big data analytics.

Some researchers have explored big data analytics in the healthcare domain through survey and review papers. Galetsi and Kasaliasi performed a review of healthcare big data analytics 8, while Lindell defined big data analytics from accounting and business perspectives 9. Alharthi presented a review article on the healthcare challenges facing Saudi Arabia by analysing the available literature 10. Lee et al. 11 presented a survey paper exploring the applications and challenges of healthcare big data analytics. From the literature it is concluded that multiple new applications have been developed for big data analysis. Review and survey papers outline the published literature, but most of them are region-specific or limited to a small number of papers. A systematic review process, on the other hand, formulates multiple research questions and identifies keywords to explore the available literature from different angles. Systematic analyses of the available literature have been presented in many fields, such as the PMIPv6 domain 12, smart homes 13, navigation assistants 14, and many others, but no significant work has been reported on systematic analysis of the healthcare big data domain to find the gaps in the available literature and suggest future research directions.

The point that inspired us to pursue this systematic analysis was the pervasive and ubiquitous nature of big data. Efficient management and timely execution are essential for big data in order to extract enriched information regarding a problem of interest 15. Many factors motivated this systematic research work, but the most prominent reasons are:

The existing research on big data does not provide significant information about the key features that should be considered to integrate both structured and unstructured big data in the healthcare domain. The pervasiveness of big data features challenges researchers pursuing work in this specialized domain. Research on finding the key features will not only help in integrating big data in the healthcare domain, but will also assist in finding new gateways for future research directions.

The digital transformation of healthcare systems through the integration of information systems, medical technology, and other imaging systems has posed a big barrier for the research community in the form of a vast amount of information to deal with. Meanwhile, over-population, limited data access, and disease burden have restricted the ability of doctors and practitioners to check more patients in a limited time. Finding a suitable model that can efficiently process healthcare big data to extract information about the symptoms of a certain disease will therefore not only help practitioners suggest accurate medication and check more patients in a timely manner, but will also open future research directions for industrialists and policymakers to develop optimal healthcare big data processing models.

Accurate disease diagnosis by processing a gigantic amount of data, and especially a plethora of data types, within a processing domain of interest is a key concern for both researchers and practitioners. Developing an efficient model that can accurately diagnose a certain disease by classifying images or other historical details of patients will not only help doctors diagnose disease in a timely manner and suggest medicine accordingly, but will also encourage researchers and developers to develop accurate disease identification models.

The remainder of the paper is organized as follows. Section 2 outlines the related work reported in the field. Section 3 presents the research framework followed for this systematic research work. Quality assessment is detailed in Sect. 4. Section 5 discusses the findings of the systematic research work. Section 6 describes the limitations of this systematic study, followed by the conclusion and future work in Sect. 7.

Literature review

Over the last few decades, we have experienced an unprecedented transformation of traditional healthcare systems into digital and portable healthcare applications with the help of information systems, medical technology, and other imaging resources 16. Big data is radically changing the healthcare system by encouraging healthcare organizations to embrace the extraction of relevant information from imaging data and other clinical records. This information will produce high throughput in terms of accurate disease diagnosis, reduced treatment cost, and increased availability. In the data visualization context, the term ‘big data’ was first introduced in 1997 17, and it has posed an ambitious and exceptional challenge for both policy-makers and doctors, with special emphasis on personalized medicine. Nonetheless, data gathering moves faster than both data analysis and data processing, emphasizing the widening gap between the rapid technological progress in data acquisition and the comparatively slow functional characterization of healthcare information. In this regard, the historical information (phenotypic and genomic information) of an individual patient from electronic health records (EHR) is becoming critically important. Figure 1 represents the primary sources of big data.

Figure 1. Main steps of the research protocol.

Significant research work has been reported in the domain of healthcare big data analytics. Processing this vast amount of information in a timely manner and identifying an individual’s health condition from it is difficult. Researchers have proposed numerous applications to address this problem. Syed et al. 18 proposed a machine learning-based healthcare system for providing remote healthcare services to both diseased and healthy populations using big data analytics and IoT devices. Venkatesh et al. 19 developed a heart disease prediction model using big data analytics and the Naïve Bayes classification technique. Kaur et al. 20 suggested a machine learning (ML) based healthcare application for disease diagnosis and data privacy restrictions. This model works by considering different aspects such as activity monitoring, granular access control, and mask encryption. Some researchers have presented review and survey papers to outline the recently published work in a specific direction: Patel and Gandhi reviewed the literature to identify the machine learning approaches proposed for healthcare big data analytics 21, and Rumbold et al. 22 reviewed the literature to find the research work reported on diabetes diagnosis using big data analytics.

From the above discussion, it is worth noting that most researchers and industrialists have given significant attention to the development of new computational models or surveyed the literature in a specific research direction (heart disease detection, diabetes detection, storage and security analysis, etc.), but no significant research work has been reported that systematically analyzes the literature from different perspectives. To address this problem, this research work presents a systematic literature review (SLR) to analyze the literature reported in the healthcare big data analytics domain. This systematic analysis will not only find the gaps in the available literature but will also suggest new directions for future research.

Research framework

Systematic literature reviews and meta-analyses have gained significant attention and become increasingly important in the healthcare domain. Clinicians, developers, and researchers follow SLR studies to stay updated on new knowledge reported in their fields 23, 24, and such studies often serve as a starting point for preparing basic records. Granting agencies often require SLR studies to justify further research 25, and some healthcare journals follow the same direction 26. Keeping these SLR applications in mind, the proposed systematic analysis is performed following the guidelines presented by Moher et al. 27 (PRISMA) and Kitchenham et al. 28. This SLR work accumulates the most relevant research work from primary sources. These papers are then evaluated and analyzed to obtain the best results for the selected research problem. Figure 2 represents the results after following the PRISMA guidelines. This systematic analysis is performed using the following preliminary steps:

Identification of research questions to systematically analyze the proposed field from different perspectives.

Selection of relevant keywords and queries to download the most relevant research articles.

Selection of peer-reviewed online databases to download relevant research articles published in healthcare big data domain during the period ranging from 2011 – 2021.

Perform inclusion and exclusion process based on title, abstract and the contents presented in the article to remove duplicate records.

Assess the finalized relevant articles for identifying gaps in the available literature and suggest new research directions to explore.

Figure 2. PRISMA process model for articles accumulation, screening, and final selection.

Research questions

Selecting well-constructed research questions is essential for a successful review process. We formulate a set of five research questions based on the Goal Question Metric approach proposed by Van Solingen et al. 29. The formulated research questions are depicted in Table 1.

Search strategy

The search strategy is the key step in any systematic research work because it ensures that the most relevant articles are retrieved for the analysis and assessment process. To define a well-organized search strategy, a search string is developed using the formulated keywords. Keywords alone are not sufficient for accumulating the most relevant articles for a given research problem, so these keywords are concatenated into different strings for searching articles in multiple online repositories 30. Inspired by the SLR work of Achimugu et al. 31 in the software requirements domain, our search strategy consists of four main steps, including identification of keywords relevant to the selected research problem, formulation of a search string based on the keywords, and selection of online repositories from which to accumulate articles relevant to the selected problem.

Selection of keywords

A list of keywords is defined for each research question to download all relevant articles. Some researchers define a generic query 32 and start downloading articles. Although this makes accumulating articles from online databases simple, it tends to miss some of the most relevant articles. The better option is therefore to define keywords for each research question. This is laborious, but it helps ensure the retrieval of each relevant article on a given research problem from the online databases.

Formulation of search string

Search strings (queries) are formulated using the keywords identified from the selected research questions. The search string was tested in the online databases and modified accordingly to retrieve every relevant article from them. Inspired by the guidelines proposed by Wohlin 33, the following key steps were undertaken to develop an optimal search string:

Identification of key terms from the formulated topic and research questions

Selection of alternate words or synonyms for key terms

Use “OR” operator for alternating words or synonyms during query formation

Link all major terms with Boolean “AND” operator to validate every single keyword.

Following these preliminary steps, a generic query (search string) is developed, as depicted in Table 2. This generic query is further refined for each research question, as depicted in Table 3, to retrieve each relevant article.
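The query-formation steps listed above (synonyms joined with “OR”, major terms linked with “AND”) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `build_search_string` and the example terms are assumptions made here for demonstration and are not the query reported in the paper's tables.

```python
def build_search_string(term_groups):
    """Join synonyms with OR inside each group, then link the groups with AND."""
    clauses = []
    for synonyms in term_groups:
        # Quote each term so multi-word phrases are searched as a unit.
        quoted = [f'"{term}"' for term in synonyms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical key terms and synonyms, chosen only to illustrate the steps.
example_groups = [
    ["healthcare", "eHealth", "mHealth"],
    ["big data", "big data analytics"],
    ["disease diagnosis", "disease prediction"],
]

if __name__ == "__main__":
    print(build_search_string(example_groups))
```

In practice each online repository has its own query syntax, so a string built this way would usually need adjusting per database, which is consistent with the testing and modification step described above.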

Selection of online repositories

After identifying keywords and formulating search strings, the next step is to download relevant articles specific to the research problem of interest. For the accumulation of relevant articles, six well-known, peer-reviewed online repositories are selected, as depicted in Table 3.

Articles accumulation and final database development

For the accumulation of relevant articles and the development of the final database, we followed the guidelines suggested by Kable et al. 34. After specifying the research questions, identifying keywords, formulating search queries, and selecting online repositories, the next key step is to develop a database of relevant articles for analysis and assessment. This includes three prime steps: (1) identification of inclusion/exclusion criteria for research articles, (2) snowballing, and (3) development of the relevant articles database. These steps are discussed in detail below.

Inclusion and exclusion criteria

After selecting the online databases and starting the article download process, the most tedious task facing the authors is deciding whether a given paper should be included in the final database. To overcome this problem, inclusion and exclusion criteria are defined for admitting an article into the final set. Table 4 presents the inclusion and exclusion criteria followed for this systematic research work.

The authors follow a manual process for the inclusion and exclusion of each article. Articles are evaluated based on title, abstract, and the information provided in the overall paper. If more than half of the authors agree on the inclusion of an article based on these parameters (title, abstract, and contents presented in the article), the paper is counted in the final database; otherwise it is rejected. A total of 134 relevant primary studies are selected for the final assessment process. To ensure that no relevant article is skipped, snowballing is then applied.
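The majority rule described above can be expressed as a small Python sketch. The function name `include_article` and the boolean vote representation are assumptions made for illustration; the paper describes a manual process and does not report any tooling.

```python
def include_article(votes):
    """Return True when strictly more than half of the reviewers vote to include.

    `votes` is a list of booleans, one per reviewing author (True = include).
    """
    if not votes:
        raise ValueError("at least one reviewer vote is required")
    return sum(votes) > len(votes) / 2


if __name__ == "__main__":
    # Two of three reviewers agree, so the article is included.
    print(include_article([True, True, False]))
    # A tie is not "more than half", so the article is rejected.
    print(include_article([True, False]))
```

Note that under this rule a tied vote leads to rejection, since the text requires more than half of the authors to agree.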

Snowballing

To extract each relevant primary article, snowballing is applied in this research work 33. Both backward and forward snowballing are applied. A total of 145 articles were retrieved by the snowballing process. Filtering these by title left 53 relevant articles, filtering by abstract left 19, and filtering by the contents presented in the paper left only 5 relevant articles. This overall process is depicted in Fig. 3. Adding these 5 articles to the 134 already accumulated gives a total of 139 articles in the final database.

Figure 3. Extraction of each relevant article using snowballing.
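The snowballing counts reported in the text (145 retrieved, 53 after title filtering, 19 after abstract filtering, 5 after content filtering, added to 134 primary studies) can be checked with a short calculation. The Python sketch below is illustrative; the stage labels and the helper `funnel_report` are names introduced here, and only the counts come from the paper.

```python
# Counts taken from the text; the labels are descriptive names chosen here.
SNOWBALLING_STAGES = [
    ("retrieved by snowballing", 145),
    ("kept after title filter", 53),
    ("kept after abstract filter", 19),
    ("kept after content filter", 5),
]
PRIMARY_STUDIES = 134  # studies selected before snowballing


def funnel_report(stages):
    """Return (label, count, fraction retained from the previous stage) rows."""
    rows = []
    previous = None
    for label, count in stages:
        retained = None if previous is None else count / previous
        rows.append((label, count, retained))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, retained in funnel_report(SNOWBALLING_STAGES):
        share = "" if retained is None else f" ({retained:.1%} of previous stage)"
        print(f"{label}: {count}{share}")
    final_total = PRIMARY_STUDIES + SNOWBALLING_STAGES[-1][1]
    print(f"final database size: {final_total}")
```

The final line reproduces the database size of 139 articles stated in the text.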

Relevant articles database development

After accumulating each primary article reported in the field, a database of relevant articles is developed for assessment and analysis, in order to identify current trends in the healthcare big data analytics domain and investigate the gaps in these research articles so as to open new avenues for future research. A total of 139 relevant articles are added to the final database. The overall contribution of the selected online repositories to the relevant articles database is depicted in Fig. 4.

Figure 4. Distribution of primary studies.

From Fig. 4, it is concluded that IEEE Xplore and Science Direct contribute the most, which reflects the interest of the research community in presenting their work in these venues.

After developing a database of relevant articles, it is evaluated using different parameters like type of article (conference proceedings, journal article, book chapter etc.), publication year, and contribution of individual library. Figure  5 represents the information regarding the total contribution of articles by type in the final database.

Figure 5. Evolution of final database by type of article and year.

Figure 5 indicates that researchers have paid significant attention to the development of new healthcare systems rather than to finding the gaps in the available systems and developing enhanced solutions accordingly. Such an enhanced solution could accurately identify and diagnose a certain disease based on a patient’s historical medical information. A small amount of work is reported in review articles and survey papers, but no systematic mechanism is followed to analyse the work over a specific range of years guided by a set of research questions. The same problem can also be seen in Fig. 6, where the highest percentage contribution is considerably larger than that of book sections, conference papers, etc.

Figure 6. Percentage contribution by type of paper.

Figure  7 depicts the percentage contribution of each library in the proposed assessment work.

Figure 7. Percentage contribution of each library.

Figure 8 represents the annual distribution of articles selected for analysis and assessment. From Fig. 8 it is evident that the number of articles increases over time, which shows the maturity of the domain and the interest of researchers in it.

Figure 8. Annual distribution of articles.

From Fig. 8, it is concluded that IEEE Xplore contributes the most to the final database of relevant articles, which shows the tendency of researchers to present healthcare-related work in IEEE journals. Figure 9 represents the total number of journal articles, survey papers, conference papers, and book sections in the selected relevant articles database.

Figure 9. Evolution of database by number of articles by type.

From Fig. 9 it is concluded that significant attention is given to the development of new healthcare models. This shows the maturity of the field. Dealing with such a mature field and extracting useful information is a laborious job for researchers. A systematic analysis of this research field is needed to provide an overview of the work reported during a specific range of years. This analysis will not only save researchers precious time, but will also open avenues for future research in this field.

Table 5 represents the annual contribution of studies in the final relevant database.

Overall information regarding type of paper, publication year and number of records is depicted in Fig.  10 below.

Figure 10. Evolution of final database.

Quality assessment

After executing the inclusion and exclusion process, all the relevant articles in the database are manually assessed by the authors to check the relevance of each article to the selected research problem. Quality criteria are defined to check every research article against the formulated research questions. These quality criteria are defined in Table 6.

Weighted values are assigned to each quality criterion to check the relevance of an article to a given research question. These weighted values and their descriptions are depicted in Fig. 11.

Figure 11. Quality criteria for the proposed SLR work.

After the assessment process, the relevance of each article is decided based on its aggregated weighted score. A score greater than 3 marks an article as among the most relevant to the selected research topic. Figure 12 represents the aggregate score of each article based on the defined quality assessment criteria.

Figure 12. Quality assessment process.
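The scoring rule described in this section (weighted values per quality criterion, aggregated and compared against a threshold of 3) can be sketched as follows. Only the threshold comes from the text. The weight scale used here (1 for fully addressed, 0.5 for partially addressed, 0 for not addressed), the criterion labels QC1 to QC5, and the function names are assumptions made for illustration, since the actual weights are given in the paper's Fig. 11.

```python
RELEVANCE_THRESHOLD = 3  # from the text: a score greater than 3 marks high relevance

# Assumed weight scale for illustration; the paper's actual values are in its Fig. 11.
ALLOWED_WEIGHTS = {0, 0.5, 1}


def aggregate_score(criterion_scores):
    """Sum the per-criterion weights assigned to one article."""
    for criterion, weight in criterion_scores.items():
        if weight not in ALLOWED_WEIGHTS:
            raise ValueError(f"unexpected weight {weight!r} for {criterion}")
    return sum(criterion_scores.values())


def is_most_relevant(criterion_scores, threshold=RELEVANCE_THRESHOLD):
    """Apply the 'score greater than threshold' rule to one article."""
    return aggregate_score(criterion_scores) > threshold


if __name__ == "__main__":
    # Hypothetical article scored against five criteria, one per research question.
    article = {"QC1": 1, "QC2": 0.5, "QC3": 1, "QC4": 1, "QC5": 0}
    print(aggregate_score(article), is_most_relevant(article))
```

With this assumed scale and five criteria, an article must fully address at least three criteria and partially address another to exceed the threshold.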

Results and discussion

After the quality assessment, the next key step of an SLR is to analyse all the relevant articles to identify the different techniques proposed for efficient communication between patient and practitioner and for accurate feature extraction from healthcare big data, and to put them to practical use.

This section of the paper performs a descriptive analysis of each article based on the five research questions. In this systematic review process, a total of 139 research articles published from 2011 to 2021 are analysed.

Healthcare big data

Researchers and data analysts have suggested no contextual name for “big data” in healthcare, but for implementation and interpretation purposes they divide it into a 5 V architecture. Figure 13 depicts the 5 V architecture of big data.

Figure 13. Big Data 5Vs 15.

The exponential increase in IoT-based smart devices and information systems has resulted in a plethora of information in the healthcare domain, and this information increases exponentially on a daily basis. These smart IoT-based healthcare devices produce a huge amount of data. The term “Big Data” is used for this gigantic amount of data: data whose scale, diversity, and complexity require innovative structures, variables, designs, and analytics for efficient utilization and management, accurate data extraction and visualization, and the discovery of hidden information regarding a specific problem of interest. The main idea behind implementing healthcare big data analytics is to retrieve enriched information from huge amounts of data using different machine learning and data mining techniques 191. These techniques help improve quality of care, reduce cost of care, and help practitioners suggest medicines based on clinical historical information.

RQ1. What are the key features adapted to integrate the structured and unstructured data in healthcare big data domain?

Big data comprises a huge amount of data, and especially a plethora of data types, that must be processed to extract enriched information regarding a problem of interest. Several features have been assessed and analyzed, especially in the healthcare domain, to integrate both structured and unstructured data. Multiple researchers analyzed semantic-based big data features for integration purposes, while others proposed behavior-based and structure-based features for patient monitoring and activity management 151, 192. Still others performed real-time analysis using a group of people for data integration and clustering. Table 7 lists the research work published on the integration of structured and unstructured data.

After analysing the available literature in Table 8, it was concluded that mostly semantic-based, structure-based, and real-time activity-based features are considered for information extraction and organization. Considering geometric features and adopting a clustering mechanism for data organization would not only integrate structured and unstructured data efficiently, but would also improve the simulation capabilities of different applications.

RQ2. What are different techniques proposed to provide an easy and timely data-access interface for doctors?

The digital transformation of healthcare systems through information systems, medical technology, and handheld and smart wearable devices has posed many challenges for both researchers and caretakers in the form of storage, reducing the cost of care, and processing time (to extract relevant information for refining quality of care and reducing waste and error rates). The prime goal of healthcare big data analytics is to process this vast amount of data using machine learning and other processing models, extract information relevant to a given problem, and use it for human well-being 195. Several supervised and unsupervised classification techniques are followed for this purpose. ML-based architectures and big data analytical techniques are integrated in the healthcare domain for efficient information retrieval and exchange, risk analysis, optimum decision support in clinics, and suggesting precise medicines using genomic information 196. Table 8 presents the literature reported on providing an easy and timely data-access interface for practitioners.

RQ3. What are different ways to improve communication between the doctor and patient?

Healthcare around the world is under high pressure due to limited financial resources, over-population, and disease burden. In this modern technological age, the healthcare paradigm is shifting from a traditional, one-size-fits-all approach to a focus on personalized individual care 1. Additionally, healthcare data varies in both type and amount. Healthcare providers deal not only with patients’ historical, physical, and identifying information, but also with imaging information, laboratory results, and other digital and analogue information such as ECG and MRI. This data is voluminous, varies in type and format, and differs in structure. Big data technologies are able to handle not only these different types and forms of data, but also the 5 V structure comprising volume, variety, value, veracity, and velocity. Meanwhile, doctors face an increasing burden of rising patient numbers coupled with progressively less time to spend with each patient. In other words, we are dealing with more patients, more data, and less time.

Different techniques are proposed in the literature to provide an easy and timely communication interface for both doctors and patients. Table 9 depicts different information exchange tools/techniques reported in the literature.

RQ4. What are different types of classification models proposed for accurate disease diagnosing using patient historical information?

This research question aims to outline the different disease diagnosis models proposed in the literature using healthcare big data. Around the world, researchers have proposed diverse approaches to healthcare big data analysis to ensure accurate disease diagnosis, provide healthcare facilities at the doorstep, develop eHealth and mHealth applications, and more. Multiple statistical and ML-based approaches are proposed for accurate diagnosis. Figure 14 represents multiple techniques proposed for automatic disease diagnosis using healthcare big data.

Figure 14. Multiple disease diagnosing techniques proposed in the literature.

All these techniques perform the diagnosis using semantic-based or structure-based features. No attention is given to geometric feature extraction techniques, which are prominent in extracting enriched information from data and result in high identification rates. Also, no advanced hybrid neural network and shallow architectures are proposed for automatic diagnosis. Keeping these gaps in mind, an optimum eHealth application can be developed by applying these hybrid techniques.

RQ5. What are different applications of big data analytics in healthcare domain?

Big data analytics has revolutionized our lives by enabling many state-of-the-art applications in domains ranging from eHealth to mHealth, weather forecasting to climate change, traffic management to object detection, and many others. This research question focuses on listing the different applications of big data analytics in healthcare, which are presented in Table 10.

Limitations

This article has a number of limitations, some of which are listed below.

For this systematic analysis, articles were collected from only six peer-reviewed libraries (ACM, SpringerLink, Taylor & Francis, Science Direct, IEEE Xplore, and Wiley Online Library), although a number of other multi-disciplinary databases exist for collecting articles.

This systematic analysis covers a specific range of years (2011–2021), while new articles are reported on a daily basis.

Articles were collected from online libraries using search queries, so a paper with no words matching the query was skipped during the search process.

Google Scholar was skipped during the article collection phase to shorten the search time. It also gives access to both peer-reviewed and non-peer-reviewed journals, whereas we focused only on peer-reviewed journals for the relevant articles.

As a systematic literature review, this work can be broadened to cover other topics such as healthcare data commercialization and health sociology.
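
The query-matching limitation noted above can be illustrated with a minimal Python sketch. It shows how a relevant paper is skipped when its title shares no words with the query. The query terms and the second title are hypothetical, and plain case-insensitive substring matching is an assumption made for illustration; the search engines of the libraries used in this study may behave differently.

```python
def matches_query(title, query_terms):
    """Return True if any query term occurs in the title (case-insensitive)."""
    text = title.lower()
    return any(term.lower() in text for term in query_terms)


def filter_articles(titles, query_terms):
    """Split titles into those kept by the query and those skipped."""
    kept = [t for t in titles if matches_query(t, query_terms)]
    skipped = [t for t in titles if not matches_query(t, query_terms)]
    return kept, skipped


# Hypothetical query terms, not the search strings used in this review.
QUERY_TERMS = ["big data", "healthcare analytics"]

TITLES = [
    "Big data analytics for personalized medicine",
    # Hypothetical title: on topic, but it shares no term with the query.
    "Large-scale clinical record mining for diagnosis",
]

kept, skipped = filter_articles(TITLES, QUERY_TERMS)
```

Here the second title is relevant to the topic but ends up in `skipped`, which is the effect this limitation describes.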

Despite these limitations, we hope that this systematic research work will inspire future research in the recommended fields and open doors for both industrialists and policymakers.

Conclusion and future work

In this research article, the existing research reported from 2011 to 2021 is thoroughly analysed for the efforts made by researchers to help caretakers and clinicians make sound decisions in disease diagnosis and suggest medicines accordingly. Based on the research problem and underlying requirements, researchers have proposed several feature extraction, identification, and remote communication frameworks to support timely communication between doctors and patients. These real-time or near-real-time applications mostly rely on big data analytics and computational devices. This research work identified several key features and optimum management designs proposed in the healthcare big data analytics domain to achieve effective outcomes in disease diagnosis. The results of this systematic work suggest that advanced hybrid machine learning-based models and cloud computing applications should be adopted to reduce treatment cost and simulation time and to improve quality of care. The findings of this research work will not only help policymakers encourage researchers and practitioners to develop advanced disease diagnosing models, but will also assist in providing an improved quality of treatment for patients.

Advanced hybrid machine learning architectures for cognitive computing are considered the future toolbox for data-driven analysis of healthcare big data. Geometric-based features should also be considered for feature extraction alongside semantic-based and structural-based features. These geometric-based feature extraction techniques are expected not only to reduce simulation time but also to improve the identification and disease diagnosing capabilities of smart health devices. Additionally, these features can help in the accurate identification of Alzheimer's disease and of tumours in PET or MRI images using upgraded machine learning and big data analytics. Cluster-based mechanisms should be considered for data organization to improve timely access to and easy management of big data. Promoting research in these areas will be crucial for future innovation in the healthcare domain.
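
To make the notion of geometric-based features concrete, the following Python sketch computes a few common shape descriptors (area, perimeter, centroid, aspect ratio, and compactness) for a region in a binary image mask. This article does not specify which geometric features are intended, so the choice of descriptors, the function name `geometric_features`, and the toy mask are all assumptions made for illustration. In practice, such a mask might come from segmenting a candidate region in a PET or MRI slice.

```python
import math


def geometric_features(mask):
    """Compute simple shape descriptors of the foreground (1) pixels
    in a 2D binary mask given as a list of equal-length rows."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not pixels:
        return None

    area = len(pixels)

    # Perimeter: count pixel edges that touch background or the image border.
    perimeter = 0
    for r, c in pixels:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if not inside or not mask[nr][nc]:
                perimeter += 1

    centroid = (
        sum(r for r, _ in pixels) / area,
        sum(c for _, c in pixels) / area,
    )

    # Bounding-box dimensions, used for the aspect ratio.
    height = max(r for r, _ in pixels) - min(r for r, _ in pixels) + 1
    width = max(c for _, c in pixels) - min(c for _, c in pixels) + 1

    return {
        "area": area,
        "perimeter": perimeter,
        "centroid": centroid,
        "aspect_ratio": width / height,
        # Isoperimetric quotient 4*pi*A / P^2; larger values indicate
        # rounder, more compact regions.
        "compactness": 4 * math.pi * area / perimeter ** 2,
    }


# Toy mask (hypothetical): a 2 x 3 rectangular region.
MASK = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

features = geometric_features(MASK)
```

The resulting dictionary could then serve as one feature vector for a downstream classifier. Which descriptors are most useful for a given diagnostic task remains an empirical question.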

Data availability

The data used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Rahman, F. & Slepian, M. J. Application of big-data in healthcare analytics—Prospects and challenges. In 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) 13–16 (2016).

Khan, N. et al. Big data: Survey, technologies, opportunities, and challenges. Sci. World J. 2014 , 1–18 (2014).

Groves, P., Kayyali, B., Knott, D. & Van Kuiken, S. The ‘big data’ revolution in healthcare. In McKinsey Quarterly (2013).

Andreu-Perez, J., Poon, C. C., Merrifield, R. D., Wong, S. T. & Yang, G.-Z. Big data for health. IEEE J. Biomed. Health Inform. 19 , 1193–1208 (2015).

Kumar, M. A., Vimala, R. & Britto, K. A. A cognitive technology based healthcare monitoring system and medical data transmission. Measurement 146 , 322–332 (2019).

Chen, H., Khan, S., Kou, B., Nazir, S., Liu, W. & Hussain, A. A smart machine learning model for the detection of brain hemorrhage diagnosis based internet of things in smart cities. Complexity 2020 (2020).

Liang, Y. & Zhao, L. Intelligent hospital appointment system based on health data bank. Procedia Comput. Sci. 159 , 1880–1889 (2019).

Galetsi, P. & Katsaliaki, K. A review of the literature on big data analytics in healthcare. J. Oper. Res. Soc. 1–19 (2019).

Lindell, J. What are big data and analytics?. In Analytics and Big Data for Accountants (2018).

Alharthi, H. Healthcare predictive analytics: An overview with a focus on Saudi Arabia. J. Infect. Public Health 11 , 749–756 (2018).

Lee, C. et al. Big healthcare data analytics: Challenges and applications. In Handbook of Large-Scale Distributed Computing in Smart Healthcare 11–41 (Springer, 2017).

Hussain, A., Nazir, S., Khan, S. & Ullah, A. Analysis of PMIPv6 extensions for identifying and assessing the efforts made for solving the issues in the PMIPv6 domain: A systematic review. Comput. Netw. 179 , 107366 (2020).

Khan, H.-U. et al. Systematic analysis of safety and security risks in smart homes. Comput. Mater. Contin. 68 , 1409–1428 (2021).

Khan, S., Nazir, S. & Khan, H.-U. Analysis of navigation assistants for blind and visually impaired people: A systematic review. IEEE Access 9 , 26712–26734 (2021).

Nazir, S. et al. A comprehensive analysis of healthcare big data management, analytics and scientific programming. IEEE Access 8 , 95714–95733 (2020).

Kitchin, R. Big Data, new epistemologies and paradigm shifts. Big Data Soc. 1 , 2053951714528481 (2014).

Cox, M. & Ellsworth, D. Application-controlled demand paging for out-of-core visualization. In Proceedings. Visualization’97 (Cat. No. 97CB36155) 235–244 (1997).

Syed, L., Jabeen, S., Manimala, S. & Elsayed, H. A. Data science algorithms and techniques for smart healthcare using IoT and big data analytics. In Smart Techniques for a Smarter Planet 211–241 (Springer, 2019).

Venkatesh, R., Balasubramanian, C. & Kaliappan, M. Development of big data predictive analytics model for disease prediction using machine learning technique. J. Med. Syst. 43 , 272 (2019).

Kaur, P., Sharma, M. & Mittal, M. Big data and machine learning based secure healthcare framework. Procedia Comput. Sci. 132 , 1049–1059 (2018).

Patel, H. B. & Gandhi, S. A review on big data analytics in healthcare using machine learning approaches. In 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI) 84–90 (2018).

Rumbold, J. M. M., O’Kane, M., Philip, N. & Pierscionek, B. K. Big Data and diabetes: The applications of Big Data for diabetes care now and in the future. Diabetic Med. (2019).

Oxman, A. D. et al. Users’ guides to the medical literature: VI. How to use an overview. JAMA 272 , 1367–1371 (1994).

Swingler, G. H., Volmink, J. & Ioannidis, J. P. Number of published systematic reviews and global burden of disease: database analysis. BMJ 327 , 1083–1084 (2003).

Canadian Institutes of Health Research. Randomized controlled trials registration/application checklist (12/2006). Available at: http://www.cihr-irsc.gc.ca/e/documents/rct_reg_e.pdf . Accessed 22 June 2009.

Young, C. & Horton, R. Putting clinical trials into context. Lancet 366 , 107–107 (2005).

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G. & PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 6 , e1000097 (2009).

Kitchenham, B. & Charters, S. Guidelines for performing systematic literature reviews in software engineering (2007).

Van Solingen, R., Basili, V., Caldiera, G. & Rombach, H. D. Goal question metric (gqm) approach. Encycl. Softw. Eng. (2002).

Brereton, P., Kitchenham, B. A., Budgen, D., Turner, M. & Khalil, M. Lessons from applying the systematic literature review process within the software engineering domain. J. Syst. Softw. 80 , 571–583 (2007).

Achimugu, P., Selamat, A., Ibrahim, R. & Mahrin, M. N. R. A systematic literature review of software requirements prioritization research. Inf. Softw. Technol. 56 , 568–585 (2014).

Nazir, S., Ali, Y., Ullah, N. & García-Magariño, I. Internet of things for healthcare using effects of mobile computing: A systematic literature review. Wirel. Commun. Mobile Comput. 109 , 5931315 (2019).

Wohlin, C. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering 1–10 (2014).

Kable, A. K., Pich, J. & Maslin-Prothero, S. E. A structured approach to documenting a search strategy for publication: A 12 step guideline for authors. Nurse Educ. Today 32 , 878–886 (2012).

Helmer, A., Kretschmer, F., Müller, F., Eichelberg, M., Deparade, R., Tegtbur, U. et al. Integration of medical models in personal health records using the example of rehabilitation training for cardiopulmonary patients. In 2011 4th International Conference on Biomedical Engineering and Informatics (BMEI) 1887–1892 (2011).

Tian, M. Integrated feature based medical image retrieval. In 2011 International Conference on Control, Automation and Systems Engineering (CASE) 1–3 (2011).

Chaves, R., Ramírez, J., Górriz, J. M., Illán, I. A. & Salas-Gonzalez, D. FDG and PIB biomarker PET analysis for the Alzheimer’s disease detection using Association Rules. In 2012 IEEE Nuclear Science Symposium and Medical Imaging Conference Record (NSS/MIC) 2576–2579 (2012).

Chute, C. G. Obstacles and options for big-data applications in biomedicine: The role of standards and normalizations. In 2012 IEEE International Conference on Bioinformatics and Biomedicine (2012).

Goel, A. & Chandra, N. A prototype model for secure storage of medical images and method for detail analysis of patient records with PACS. In 2012 International Conference on Communication Systems and Network Technologies 167–170 (2012).

Huang, H. & Hsiao, I. Use of anatomical information in a Bayesian reconstruction with an edge-preserving median prior. In 2012 IEEE Nuclear Science Symposium and Medical Imaging Conference Record (NSS/MIC) 3321–3323 (2012).

López, C. M., Welkenhuysen, M., Musa, S., Eberle, W., Bartic, C., Puers, R. et al. Towards a noise prediction model for in vivo neural recording. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society 759–762 (2012).

Ng, H., Chuang, C. & Hsu, C. Extraction and analysis of structural features of lateral ventricle in brain medical images. In 2012 Sixth International Conference on Genetic and Evolutionary Computing 35–38 (2012).

Patel, A. B., Birla, M. & Nair, U. Addressing big data problem using Hadoop and Map Reduce. In 2012 Nirma University International Conference on Engineering (NUiCONE) 1–5 (2012).

Zheng, G., Yu, L., Feng, Y., Han, Z., Chen, L., Zhang, S. et al. Seizure prediction model based on method of common spatial patterns and support vector machine. In 2012 IEEE International Conference on Information Science and Technology 29–34 (2012).

Li, L., Bagheri, S., Goote, H., Hasan, A. & Hazard, G. Risk adjustment of patient expenditures: A big data analytics approach. In 2013 IEEE International Conference on Big Data 12–14 (2013).

Loshin, D. Chapter 8—Developing big data applications. In Big Data Analytics (ed. Loshin, D.) 73–81 (Morgan Kaufmann, 2013).

Loshin, D. Chapter 9—NoSQL data management for big data. In Big Data Analytics (ed. Loshin, D.) 83–90 (Morgan Kaufmann, 2013).

Loshin, D. Chapter 1—Market and business drivers for big data analytics. In Big Data Analytics (ed. Loshin, D.) 1–9 (Morgan Kaufmann, 2013).

Purkayastha, S. & Braa, J. Big data analytics for developing countries–Using the cloud for operational BI in health. Electron. J. Inf. Syst. Dev. Ctries. 59 , 1–17 (2013).

Lin, C.-H., Huang, L.-C., Chou, S.-C. T., Liu, C.-H., Cheng, H.-F. & Chiang, I. J. Temporal event tracing on big healthcare data analytics. In 2014 IEEE International Congress on Big Data 281–287 (2014).

Martínez, J. G., Ramos-Becerril, F. J., Leija, L., López, F., García, U., Vera, A. et al. Development of an electronic equipment for the pre medical diagnose in the progress of diabetic foot disease. In 2014 International Conference on Control, Decision and Information Technologies (CoDIT) 679–683 (2014).

Mian, M., Teredesai, A., Hazel, D., Pokuri, S. & Uppala, K. Work in progress—In-memory analysis for healthcare big data. In 2014 IEEE International Congress on Big Data 778–779 (2014).

Panahiazar, M., Taslimitehrani, V., Jadhav, A. & Pathak, J. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases. In 2014 IEEE International Conference on Big Data (Big Data) 790–795 (2014).

Vargheese, R. Dynamic protection for critical health care systems using cisco CWS: Unleashing the power of big data analytics. In 2014 Fifth International Conference on Computing for Geospatial Research and Application 77–81 (2014).

Archenaa, J. & Anita, E. A. M. A survey of big data analytics in healthcare and government. Procedia Comput. Sci. 50 , 408–413 (2015).

Boman, M. & Sanches, P. Sensemaking in intelligent health data analytics. KI Künstliche Intell. 29 , 143–152 (2015).

Chong, D. & Shi, H. Big data analytics: A literature review. J. Manag. Anal. 2 , 175–201 (2015).

Dantanarayana, G., Sahama, T. & Wikramanayake, G. Quality of information for quality of life: Healthcare big data analytics. In 2015 Fifteenth International Conference on Advances in ICT for Emerging Regions (ICTer) 281–281 (2015).

Gomathi, S. & Narayani, V. Implementing big data analytics to predict systemic lupus erythematosus. In 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS) 1–5 (2015).

Hussain, S. & Lee, S. Semantic transformation model for clinical documents in big data to support healthcare analytics. In 2015 Tenth International Conference on Digital Information Management (ICDIM) 99–102 (2015).

Kuo, M., Chrimes, D., Moa, B. & Hu, W. Design and construction of a big data analytics framework for health applications. In 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity) 631–636 (2015).

Mehmood, R. & Graham, G. Big data logistics: A health-care transport capacity sharing model. Procedia Comput. Sci. 64 , 1107–1114 (2015).

Raj, P., Raman, A., Nagaraj, D. & Duggirala, S. Big data analytics for healthcare. In High-Performance Big-Data Analytics Computer Communications and Networks 1525–1525 (Springer, Cham, 2015).

Viceconti, M., Hunter, P. & Hose, R. Big data, big knowledge: Big data for personalized healthcare. IEEE J. Biomed. Health Inform. 19 , 1209–1215 (2015).

Wang, M. D. Biomedical big data analytics for patient-centric and outcome-driven precision health. In 2015 IEEE 39th Annual Computer Software and Applications Conference 1–2 (2015).

Batarseh, F. A. & Latif, E. A. Assessing the quality of service using big data analytics: With application to healthcare. Big Data Res. 4 , 13–24 (2016).

Buzzi, M. C. et al. Facebook: A new tool for collecting health data?. Multimed. Tools Appl. 76 , 10677–10700 (2016).

Chauhan, R., Jangade, R. & Mudunuru, V. K. A cloud based environment for big data analytics in healthcare. In International Conference on Soft Computing and Pattern Recognition 315–321 (2016).

Stefano, A. D., Corte, A. L., Lió, P. & Scatá, M. Bio-inspired ICT for big data management in healthcare. In Intelligent Agents in Data-intensive Computing 1–26 (Springer, 2016).

Gupta, S. & Tripathi, P. An emerging trend of big data analytics with health insurance in India. In 2016 International Conference on Innovation and Challenges in Cyber Security (ICICCS-INBUSH) 64–69 (2016).

Haas, M. et al. Big data to smart data in Alzheimer’s disease: Real-world examples of advanced modeling and simulation. Alzheimers Dement. 12 , 1022–1030 (2016).

Jiang, P. et al. An intelligent information forwarder for healthcare big data systems with distributed wearable sensors. IEEE Syst. J. 10 , 1147–1159 (2016).

Kankanhalli, A., Hahn, J., Tan, S. & Gao, G. Big data and analytics in healthcare: Introduction to the special section. Inf. Syst. Front. 18 , 233–235 (2016).

Kashyap, H., Ahmed, H. A., Hoque, N., Roy, S. & Bhattacharyya, D. K. Big data analytics in bioinformatics: Architectures, techniques, tools and issues. Netw. Model. Anal. Health Inform. Bioinform. 5 , 28 (2016).

Lv, Z., Chirivella, J. & Gagliardo, P. Bigdata oriented multimedia mobile health applications. J. Med. Syst. 40 , 120 (2016).

Pandey, M. K. & Subbiah, K. A novel storage architecture for facilitating efficient analytics of health informatics big data in cloud. In 2016 IEEE International Conference on Computer and Information Technology (CIT) 578–585 (2016).

Plachkinova, M., Vo, A., Bhaskar, R. & Hilton, B. A conceptual framework for quality healthcare accessibility: A scalable approach for big data technologies. Inf. Syst. Front. 20 , 289–302 (2016).

Rallapalli, S., Gondkar, R. R. & Ketavarapu, U. P. K. Impact of processing and analyzing healthcare big data on cloud computing environment by implementing hadoop cluster. Procedia Comput. Sci. 85 , 16–22 (2016).

Sakr, S. & Elgammal, A. Towards a comprehensive data analytics framework for smart healthcare services. Big Data Res. 4 , 44–58 (2016).

Xu, B. et al. Healthcare data analytics: Using a metadata annotation approach for integrating electronic hospital records. J. Manag. Anal. 3 , 136–151 (2016).

Tresp, V. et al. Going digital: A survey on digitalization and large-scale data analytics in healthcare. Proc. IEEE 104 , 2180–2206 (2016).

Straton, N., Hansen, K., Mukkamala, R. R., Hussain, A., Gronli, T., Langberg, H. et al. Big social data analytics for public health: Facebook engagement and performance. In 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services (Healthcom) 1–6 (2016).

Abouelmehdi, K., Beni-Hssane, A., Khaloufi, H. & Saadi, M. Big data security and privacy in healthcare: A review. Procedia Comput. Sci. 113 , 73–80 (2017).

Alonso, S. G., de la Torre, Diez I., Rodrigues, J. J., Hamrioui, S. & Lopez-Coronado, M. A systematic review of techniques and sources of big data in the healthcare sector. J. Med. Syst. 41 , 183 (2017).

Anjum, A. et al. Big data analytics in healthcare: A cloud-based framework for generating insights. In Cloud Computing 153–170 (Springer, 2017).

Barik, R. K., Dubey, H. & Mankodiya, K. SOA-FOG: Secure service-oriented edge computing architecture for smart health big data analytics. In 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP) 477–481 (2017).

Cano, I., Tenyi, A., Vela, E., Miralles, F. & Roca, J. Perspectives on big data applications of health information. Curr. Opin. Syst. Biol. 3 , 36–42 (2017).

Di Meglio, A. & Manca, M. From big data to big insights: The role of platforms in healthcare IT. In New Perspectives in Medical Records 33–47 (Springer, 2017).

Manogaran, G. et al. Big data analytics in healthcare Internet of Things. In Innovative Healthcare Systems for the 21st Century 263–284 (Springer, 2017).

Plageras, A. P., Stergiou, C., Kokkonis, G., Psannis, K. E., Ishibashi, Y., Kim, B. et al. Efficient large-scale medical data (eHealth Big Data) analytics in Internet of Things. In 2017 IEEE 19th Conference on Business Informatics (CBI) 21–27 (2017).

Pramanik, M. I., Lau, R. Y. K., Demirkan, H. & Azad, M. A. K. Smart health: Big data enabled health paradigm within smart cities. Expert Syst. Appl. 87 , 370–383 (2017).

Spanoudakis, G., Katrakazas, P., Koutsouris, D., Kikidis, D., Bibas, A. & Pontopidan, N. H. Public health policy for management of hearing impairments based on big data analytics: EVOTION at genesis. In 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE) 525–530 (2017).

Wu, J., Li, H., Liu, L. & Zheng, H. Adoption of big data and analytics in mobile healthcare market: An economic perspective. Electron. Commer. Res. Appl. 22 , 24–41 (2017).

Aceto, G., Persico, V. & Pescape, A. The role of Information and Communication Technologies in healthcare: Taxonomies, perspectives, and challenges. J. Netw. Comput. Appl. 107 , 125–154 (2018).

Antoniou, C., Dimitriou, L. & Pereira, F. Mobility Patterns, Big Data and Transport Analytics: Tools and Applications for Modeling (Elsevier, 2018).

Bates, D. W., Heitmueller, A., Kakad, M. & Saria, S. Why policymakers should care about “big data” in healthcare. Health Policy Technol. 7 , 211–216 (2018).

Choi, T.-M., Wallace, S. W. & Wang, Y. Big data analytics in operations management. Prod. Oper. Manag. 27 , 1868–1883 (2018).

Forestiero, A. & Papuzzo, G. Distributed algorithm for big data analytics in healthcare. In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) 776–779 (2018).

Ganesh, S. & Talukder, A. K. Formal methods, artificial intelligence, big-data analytics, and knowledge engineering in medical care to reduce disease burden and health disparities. In International Conference on Big Data Analytics 307–321 (2018).

Giacalone, M., Cusatelli, C. & Santarcangelo, V. Big data compliance for innovative clinical models. Big Data Res. 12 , 35–40 (2018).

Guha, S. & Kumar, S. Emergence of big data research in operations management, information systems, and healthcare: Past contributions and future roadmap. Prod. Oper. Manag. 27 , 1724–1735 (2018).

Gupta, V., Singh Gill, H., Singh, P. & Kaur, R. An energy efficient fog-cloud based architecture for healthcare. J. Stat. Manag. Syst. 21 , 529–537 (2018).

Hopp, W. J., Li, J. & Wang, G. Big data and the precision medicine revolution. Prod. Oper. Manag. 27 , 1647–1664 (2018).

Huang, H. K. Big data in PACS-based multimedia medical imaging informatics. In PACS Based Multimedia Imaging Informatics (ed Huang, H.) 575–589 (2018).

Istepanian, R. S. H. & Al-Anzi, T. m-Health 2.0: New perspectives on mobile health, machine learning and big data analytics. Methods 151 , 34–40 (2018).

Khaloufi, H., Abouelmehdi, K., Beni-hssane, A. & Saadi, M. Security model for big healthcare data lifecycle. Procedia Comput. Sci. 141 , 294–301 (2018).

Krittanawong, C., Johnson, K. W., Hershman, S. G. & Tang, W. H. W. Big data, artificial intelligence, and cardiovascular precision medicine. Expert Rev. Precis. Med. Drug Dev. 3 , 305–317 (2018).

Ma, X., Wang, Z., Zhou, S., Wen, H. & Zhang, Y. Intelligent healthcare systems assisted by data analytics and mobile computing. In 2018 14th International Wireless Communications & Mobile Computing Conference (IWCMC) 1317–1322 (2018).

Manogaran, G. et al. A new architecture of Internet of Things and big data ecosystem for secured smart healthcare monitoring and alerting system. Future Gener. Comput. Syst. 82 , 375–387 (2018).

Mehta, N. & Pandit, A. Concurrence of big data analytics and healthcare: A systematic review. Int. J. Med. Inform. 114 , 57–65 (2018).

Miller, J. B. Big data and biomedical informatics: Preparing for the modernization of clinical neuropsychology. Clin. Neuropsychol. 33 , 287–304 (2018).

Moutselos, K., Kyriazis, D. & Maglogiannis, I. A web based modular environment for assisting health policy making utilizing big data analytics. In 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA) 1–5 (2018).

Nair, L. R., Shetty, S. D. & Shetty, S. D. Applying spark based machine learning model on streaming big data for health status prediction. Comput. Electr. Eng. 65 , 393–399 (2018).

Pashazadeh, A. & Navimipour, N. J. Big data handling mechanisms in the healthcare applications: A comprehensive and systematic literature review. J. Biomed. Inform. 82 , 47–62 (2018).

Ravishankar Rao, A., Clarke, D. & Vargas, M. Building an open health data analytics platform: A case study examining relationships and trends in seniority and performance in healthcare providers. J. Healthc. Inform. Res. 2 , 44–70 (2018).

Sahoo, P. K., Mohapatra, S. K. & Wu, S.-L. SLA based healthcare big data analysis and computing in cloud network. J. Parallel Distrib. Comput. 119 , 121–135 (2018).

Sarkar, B. K. & Sana, S. S. A conceptual distributed framework for improved and secured healthcare system. Int. J. Healthc. Manag. 1–13 (2018).

Sebaa, A., Chikh, F., Nouicer, A. & Tari, A. Medical big data warehouse: architecture and system design, a case study: Improving healthcare resources distribution. J. Med. Syst. 42 , 59 (2018).

Shafqat, S., Kishwer, S., Rasool, R. U., Qadir, J., Amjad, T. & Ahmad, H. F. Big data analytics enhanced healthcare systems: A review. J. Supercomput.

Sivaparthipan, C. B., Karthikeyan, N. & Karthik, S. Designing statistical assessment healthcare information system for diabetics analysis using big data. Multimed. Tools Appl.

Tang, V. et al. An adaptive clinical decision support system for serving the elderly with chronic diseases in healthcare industry. Expert. Syst. 36 , e12369 (2018).

Wang, Y., Kung, L. & Byrd, T. A. Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technol. Forecast. Soc. Change 126 , 3–13 (2018).

Agrawal, A. & Choudhary, A. Health services data: Big data analytics for deriving predictive healthcare insights. In Health Services Evaluation 3–18 (2019).

Ahmed, M., Choudhury, S. & Al-Turjman, F. Big data analytics for intelligent internet of things. In Artificial Intelligence in IoT 107–127 (Springer, 2019).

Ahmed, Z. & Liang, B. T. Systematically dealing practical issues associated to healthcare data analytics. In Future of Information and Communication Conference 599–613 (2019).

Bora, D. J. Chapter 3—Big data analytics in healthcare: A critical analysis. In Big Data Analytics for Intelligent Healthcare Management (eds Dey, N. et al. ) 43–57 (Academic Press, 2019).

Chanchaichujit, J., Tan, A., Meng, F. & Eaimkhong, S. Internet of Things (IoT) and big data analytics in healthcare. In Healthcare 4.0 17–36 (Springer, 2019).

Cirillo, D. & Valencia, A. Big data analytics for personalized medicine. Curr. Opin. Biotechnol. 58 , 161–167 (2019).

Dey, N., Das, H., Naik, B. & Behera, H. S. Big Data Analytics for Intelligent Healthcare Management (Academic Press, 2019).

Din, S. & Paul, A. Smart health monitoring and management system: Toward autonomous wearable sensing for Internet of Things using big data analytics. Future Gener. Comput. Syst. 91 , 611–619 (2019).

Galetsi, P., Katsaliaki, K. & Kumar, S. Values, challenges and future directions of big data analytics in healthcare: A systematic review. Soc. Sci. Med. 241 , 112533 (2019).

Guo, C. & Chen, J. Big data analytics in healthcare: data-driven methods for typical treatment pattern mining. J. Syst. Sci. Syst. Eng. 28 , 694–714 (2019).

Hussain, S. et al. Semantic preservation of standardized healthcare documents in big data. Int. J. Med. Inform. 129 , 133–145 (2019).

Mehta, N., Pandit, A. & Shukla, S. Transforming healthcare with big data analytics and artificial intelligence: A systematic mapping study. J. Biomed. Inform. 100 , 103311 (2019).

Muniasamy, A., Tabassam, S., Hussain, M. A., Sultana, H., Muniasamy, V. & Bhatnagar, R. Deep learning for predictive analytics in healthcare. In International Conference on Advanced Machine Learning Technologies and Applications 32–42 (2019).

Palanisamy, V. & Thirunavukarasu, R. Implications of big data analytics in developing healthcare frameworks–A review. J. King Saud Univ. Comput. Inf. Sci. 31 , 415–425 (2019).

Rajabion, L., Shaltooki, A. A., Taghikhah, M., Ghasemi, A. & Badfar, A. Healthcare big data processing mechanisms: The role of cloud computing. Int. J. Inf. Manag. 49 , 271–289 (2019).

Ramasamy, V., Gomathy, B. & Verma, R. K. Smart HIV/AIDS digital system using big data analytics. In Progress in Advanced Computing and Intelligent Engineering 415–421 (Springer, 2019).

Razzak, M. I., Imran, M. & Xu, G. Big data analytics for preventive medicine. Neural Comput. Appl.

Reiz, A. N., de la Hoz, M. A. & García, M. S. Big data analysis and machine learning in intensive care units. Med. Intensiva 43 , 416–426 (2019).

Saheb, T. & Izadi, L. Paradigm of IoT big data analytics in the healthcare industry: A review of scientific literature and mapping of research trends. Telematics Inform. 41 , 70–85 (2019).

Sahoo, A. K. et al. Chapter 9—Intelligence-based health recommendation system using big data analytics. In Big Data Analytics for Intelligent Healthcare Management (eds Dey, N. et al. ) 227–246 (Academic Press, 2019).

Shahbaz, M., Gao, C., Zhai, L., Shahzad, F. & Hu, Y. Investigating the adoption of big data analytics in healthcare: The moderating role of resistance to change. J. Big Data 6 , 6 (2019).

Sivaparthipan, C. B. et al. Innovative and efficient method of robotics for helping the Parkinson’s disease patient using IoT in big data analytics. Trans. Emerg. Telecommun. Technol. 31 , e3838 (2019).

Sousa, M. J., Pesqueira, A. N. M., Lemos, C., Sousa, M. & Rocha, Á. Decision-making based on big data analytics for people management in healthcare organizations. J. Med. Syst. 43 , 290 (2019).

Strang, K. D. Problems with research methods in medical device big data analytics. Int. J. Data Sci. Anal.

Thomas, J., Kneale, D., McKenzie, J. E., Brennan, S. E. & Bhaumik, S. Determining the scope of the review and the questions it will address. In Cochrane Handbook for Systematic Reviews of Interventions 13–31 (2019).

Wang, Y., Kung, L., Gupta, S. & Ozdemir, S. Leveraging big data analytics to improve quality of care in healthcare organizations: A configurational perspective. Br. J. Manag. 30 , 362–388 (2019).

Zetino, J. & Mendoza, N. Big data and its utility in social work: Learning from the big data revolution in business and healthcare. Soc. Work Public Health 34 , 409–417 (2019).

Nazir, S., Nawaz, M., Adnan, A., Shahzad, S. & Asadi, S. Big data features, applications, and analytics in cardiology—A systematic literature review. IEEE Access 7 , 143742–143771 (2019).

Shah, G., Shah, A. & Shah, M. Panacea of challenges in real-world application of big data analytics in healthcare sector. J. Data Inf. Manag. 1 , 107–116 (2019).

Galetsi, P., Katsaliaki, K. & Kumar, S. Big data analytics in health sector: Theoretical framework, techniques and prospects. Int. J. Inf. Manag. 50 , 206–216 (2020).

Iyengar, S. P., Acharya, H. & Kadam, M. Big data analytics in healthcare using spreadsheets. In Big Data Analytics in Healthcare 155–187 (Springer, 2020).

Kumar, S. A. & Venkatesulu, M. BrownBoost classifier-based bloom hash data storage for healthcare big data analytics. In Information and Communication Technology for Sustainable Development 53–69 (Springer, 2020).

Kumar, Y., Sood, K., Kaul, S. & Vasuja, R. Big data analytics and its benefits in healthcare. In Big Data Analytics in Healthcare 3–21 (Springer, 2020).

Naqishbandi, T. A. & Ayyanathan, N. Clinical big data predictive analytics transforming healthcare:-An integrated framework for promise towards value based healthcare. In Advances in Decision Sciences 545–561 (Springer, 2020).

Lambay, M. A. & Mohideen, S. P. Big data analytics for healthcare recommendation systems. In 2020 International Conference on System, Computation, Automation and Networking (ICSCAN) 1–6 (2020).

Katarya, R. & Jain, S. Exploration of big data analytics in healthcare analytics. In 2020 4th International Conference on Computer, Communication and Signal Processing (ICCCSP) 1–4 (2020).

Javid, T., Faris, M., Beenish, H. & Fahad, M. Cybersecurity and data privacy in the cloudlet for preliminary healthcare big data analytics. In 2020 International Conference on Computing and Information Technology (ICCIT-1441) 1–4 (2020).

Leung, C. K., Chen, Y., Hoi, C. S. H., Shang, S. & Cuzzocrea, A. Machine learning and OLAP on big COVID-19 data. In 2020 IEEE International Conference on Big Data (Big Data) 5118–5127 (2020).

Akhtar, U., Lee, J. W., Bilal, H. S. M., Ali, T., Khan, W. A. & Lee, S. The impact of big data in healthcare analytics. In 2020 International Conference on Information Networking (ICOIN) 61–63 (2020).

Mung, P. S. & Phyu, S. Effective analytics on healthcare big data using ensemble learning. In 2020 IEEE Conference on Computer Applications (ICCA) 1–4 (2002).

Georgakopoulos, S. V., Gallos, P. & Plagianakos, V. P. Using big data analytics to detect fraud in healthcare provision. In 2020 IEEE 5th Middle East and Africa Conference on Biomedical Engineering (MECBME) 1–3 (2020).

Leung, C. K., Chen, Y., Shang, S. & Deng, D. Big data science on COVID-19 Data. In 2020 IEEE 14th International Conference on Big Data Science and Engineering (BigDataSE) 14–21 (2020).

Juddoo, S. & George, C. A Qualitative assessment of machine learning support for detecting data completeness and accuracy issues to improve data analytics in big data for the healthcare industry. In 2020 3rd International Conference on Emerging Trends in Electrical, Electronic and Communications Engineering (ELECOM) 58–66 (2020).

Chauhan, R. & Yafi, E. Big data analytics for prediction modelling in healthcare databases. In 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM) 1–5 (2021).

Islam, M., Karim, R., Khatun, M. A. & Reza, S. A research on big data analytics in healthcare industry. In 2020 International Conference on Information Science and Communications Technologies (ICISCT) 1–5 (2020).

Leung, C. K., Chen, Y., Hoi, C. S. H., Shang, S., Wen, Y. & Cuzzocrea, A. Big data visualization and visual analytics of COVID-19 data. In 2020 24th International Conference Information Visualisation (IV) 415–420 (2020).

Balaji, S. & Prasathkumar, V. Dynamic changes by big data in health care. In 2020 International Conference on Computer Communication and Informatics (ICCCI) 1–4 (2020).

Alahmar, A. & Benlamri, R. Optimizing hospital resources using big data analytics with standardized e-clinical pathways. In 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech) 650–657 (2020).

Sadineni, P. K. Developing a model to enhance the quality of health informatics using big data. In 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) 1267–1272 (2020).

Pramanik, M. I. et al. Healthcare informatics and analytics in big data. Expert Syst. Appl. 152 , 113388 (2020).

Ravikumaran, P., Vimala Devi, K., Kartheeban, K. & Narayanan Prasanth, N. Health data analytics: Framework & review on tool & technology. Mater. Today Proc. (2020).

Ramesh, T. & Santhi, V. Exploring big data analytics in health care. Int. J. Intell. Netw. 1 , 135–140 (2020).

Galetsi, P. & Katsaliaki, K. A review of the literature on big data analytics in healthcare. J. Oper. Res. Soc. 71 , 1511–1529 (2020).

Mehta, N., Pandit, A. & Kulkarni, M. Elements of healthcare big data analytics. In Big Data Analytics in Healthcare 23–43 (Springer, 2020).

Ehwerhemuepha, L. et al. HealtheDataLab–a cloud computing solution for data science and advanced analytics in healthcare with application to predicting multi-center pediatric readmissions. BMC Med. Inform. Decis. Mak. 20 , 1–12 (2020).

Sivasangari, A., Lakshmanan, L., Ajitha, P., Deepa, D. & Jabez, J. Big data analytics for 5G-enabled IoT healthcare. In Blockchain for 5G-Enabled IoT 261.

Ma, S. & Huai, J. Approximate computation for big data analytics. SIGWEB Newsl. (2021).

Uzunbaz, S. & Aref, W. G. Shared execution techniques for business data analytics over big data streams. In Presented at the 32nd International Conference on Scientific and Statistical Database Management, Vienna, Austria (2020).

Chalumporn, G. & Hewett, R. Health data analytics with an opportunistic big data algorithm. In Presented at the Proceedings of the 11th International Conference on Advances in Information Technology, Bangkok, Thailand (2020).

Minami, T. & Ohura, Y. Small data analysis for bigger data analysis. In Presented at the 2021 Workshop on Algorithm and Big Data, Fuzhou, China (2021).

Chakraborty, C. & Rathi, M. Chapter 2—Smart healthcare systems using big data. In Demystifying Big Data, Machine Learning, and Deep Learning for Healthcare Analytics (eds Kautish, P. N. S. & Peng, S.-L.) 17–32 (Academic Press, 2021).

Ilmudeen, A. Chapter 3—Big data-based frameworks for healthcare systems. In Demystifying Big Data, Machine Learning, and Deep Learning for Healthcare Analytics (eds Kautish, P. N. S. & Peng, S.-L.) 33–56 (Academic Press, 2021).

Mendhe, C. H., Henderson, N., Srivastava, G. & Mago, V. A scalable platform to collect, store, visualize, and analyze big data in real time. IEEE Trans. Comput. Soc. Syst. 8 , 260–269 (2021).

Sivabalaselvamani, D., Selvakarthi, D., Yogapriya, J., Thiruvenkatasuresh, M. P., Maruthappa, M. & Chandra, A. S. Artificial Intelligence in data-driven analytics for the personalized healthcare. In 2021 International Conference on Computer Communication and Informatics (ICCCI) 1–5 (2021)

Harb, H., Mansour, A., Nasser, A., Cruz, E. M. & de la Torre Diez, I. A sensor-based data analytics for patient monitoring in connected healthcare applications. IEEE Sens. J. 21 , 974–984 (2021).

Article   ADS   CAS   Google Scholar  

Jones, J. & Jones, J. Optimizing healthcare. In 2020 IEEE International Conference on E-health Networking, Application & Services (HEALTHCOM) 1–6 (2021).

Hassan, S., Dhali, M., Zaman, F. & Tanveer, M. Big data and predictive analytics in healthcare in Bangladesh: Regulatory challenges. Heliyon 7 , e07179 (2021).

Khan, S. et al. KNN and ANN-based recognition of handwritten pashto letters using zoning features. Mach. Learn. 9 , 570–577 (2018).

Pant, D., Kumar, V., Kishore, J. & Pal, R. Healthcare data modeling in R. In 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM) 230–233 (2017).

Brennan, P. F. & Bakken, S. Nursing needs big data and big data needs nursing. J. Nurs. Scholarsh. 47 , 477–484 (2015).

Sreedevi, A. G., Nitya Harshitha, T., Sugumaran, V. & Shankar, P. Application of cognitive computing in healthcare, cybersecurity, big data and IoT: A literature review. Inform. Process. Manag. 59 , 102888 (2022).

Sinha, A., Hripcsak, G. & Markatou, M. Large datasets in biomedicine: A discussion of salient analytic issues. J. Am. Med. Inform. Assoc. JAMIA 16 , 759–767 (2009).

Alonso-Betanzos, A. & Bolón-Canedo, V. Big-Data analysis, cluster analysis, and machine-learning approaches (2018).

Dayal, M. & Singh, N. Indian health care analysis using big data programming tool. Procedia Comput. Sci. 89 , 521–527 (2016).

Jayaraman, P. P., Forkan, A. R. M., Morshed, A., Haghighi, P. D. & Kang, Y.-B. Healthcare 4.0: A review of frontiers in digital health. WIREs Data Min. Knowl. Discov. 10 , e1350 (2018).

Gallos, P. et al. CrowdHEALTH: Big data analytics and holistic health records. Stud. Health Technol. Inform. 258 , 255–256 (2019).

Wang, L., Ranjan, R., Kołodziej, J., Zomaya, A. & Alem, L. Software tools and techniques for big data computing in healthcare clouds. Future Gener. Comput. Syst. 43–44 , 38–39 (2015).

Kiourtis, A. et al. An autoscaling platform supporting graph data modelling big data analytics. Stud. Health Technol. Inform. 295 , 376–379 (2022).

Download references

Acknowledgements

This research work was performed by the Department of Accounting and Information Systems, College of Business and Economics, Qatar University, in collaboration with the Department of Computer Science, University of Swabi, Swabi, Pakistan.

Open Access funding provided by the Qatar National Library. This research was funded by Qatar University Internal Grant under Grant No. IRCC-2021–010. The findings achieved herein are solely the responsibility of the authors.

Author information

Authors and affiliations

Department of Accounting and Information Systems, College of Business and Economics, Qatar University, Doha, Qatar

Sulaiman Khan & Habib Ullah Khan

Department of Computer Science, University of Swabi, Swabi, Pakistan

Shah Nazir

Contributions

S.K. wrote the original draft of the paper and revised it based on the reviewers' suggestions. Dr. H.U.K. developed the experimental setup for the proposed systematic research work. Dr. S.N. carried out the article collection and database development process.

Corresponding author

Correspondence to Habib Ullah Khan .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Khan, S., Khan, H.U. & Nazir, S. Systematic analysis of healthcare big data analytics for efficient care and disease diagnosing. Sci Rep 12 , 22377 (2022). https://doi.org/10.1038/s41598-022-26090-5


Received : 09 September 2022

Accepted : 09 December 2022

Published : 26 December 2022

DOI : https://doi.org/10.1038/s41598-022-26090-5



What is Big Data?

Big data refers to extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time. These datasets are so large and complex in volume, velocity, and variety that traditional data management systems cannot store, process, and analyze them.

The amount and availability of data is growing rapidly, spurred on by digital technology advancements, such as connectivity, mobility, the Internet of Things (IoT), and artificial intelligence (AI). As data continues to expand and proliferate, new big data tools are emerging to help companies collect, process, and analyze data at the speed needed to gain the most value from it. 

Big data describes large and diverse datasets that are huge in volume and also rapidly grow in size over time. Big data is used in machine learning, predictive modeling, and other advanced analytics to solve business problems and make informed decisions.

Read on to learn the definition of big data, some of the advantages of big data solutions, common big data challenges, and how Google Cloud is helping organizations build their data clouds to get more value from their data. 

Big data examples

Data can be a company’s most valuable asset. Using big data to reveal insights can help you understand the areas that affect your business—from market conditions and customer purchasing behaviors to your business processes. 

Here are some big data examples that are helping transform organizations across every industry: 

  • Tracking consumer behavior and shopping habits to deliver hyper-personalized retail product recommendations tailored to individual customers
  • Monitoring payment patterns and analyzing them against historical customer activity to detect fraud in real time
  • Combining data and information from every stage of an order’s shipment journey with hyperlocal traffic insights to help fleet operators optimize last-mile delivery
  • Using AI-powered technologies like natural language processing to analyze unstructured medical data (such as research reports, clinical notes, and lab results) to gain new insights for improved treatment development and enhanced patient care
  • Using image data from cameras and sensors, as well as GPS data, to detect potholes and improve road maintenance in cities
  • Analyzing public datasets of satellite imagery and geospatial datasets to visualize, monitor, measure, and predict the social and environmental impacts of supply chain operations

These are just a few ways organizations are using big data to become more data-driven so they can adapt better to the needs and expectations of their customers and the world around them. 

The Vs of big data

Definitions of big data vary slightly, but it is almost always described in terms of volume, velocity, and variety. These characteristics are often referred to as the “3 Vs of big data” and were first described in 2001 by analyst Doug Laney at META Group, which Gartner later acquired.

In addition to these three original Vs, three others are often mentioned in relation to harnessing the power of big data: veracity, variability, and value.

  • Veracity: Big data can be messy, noisy, and error-prone, which makes it difficult to control the quality and accuracy of the data. Large datasets can be unwieldy and confusing, while smaller datasets could present an incomplete picture. The higher the veracity of the data, the more trustworthy it is.
  • Variability: The meaning of collected data is constantly changing, which can lead to inconsistency over time. These shifts include not only changes in context and interpretation but also data collection methods based on the information that companies want to capture and analyze.
  • Value: It’s essential to determine the business value of the data you collect. Big data must contain the right data and then be effectively analyzed in order to yield insights that can help drive decision-making. 

How does big data work?

The central concept of big data is that the more visibility you have into anything, the more effectively you can gain insights to make better decisions, uncover growth opportunities, and improve your business model. 

Making big data work requires three main actions: 

  • Integration: Big data collects terabytes, and sometimes even petabytes, of raw data from many sources that must be received, processed, and transformed into the format that business users and analysts need to start analyzing it. 
  • Management: Big data needs big storage, whether in the cloud, on-premises, or both. Data must be stored in whatever form is required, and it also needs to be processed and made available in real time. Increasingly, companies are turning to cloud solutions to take advantage of on-demand compute and scalability.
  • Analysis: The final step is analyzing and acting on big data—otherwise, the investment won’t be worth it. Beyond exploring the data itself, it’s also critical to communicate and share insights across the business in a way that everyone can understand. This includes using tools to create data visualizations like charts, graphs, and dashboards. 
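The three actions above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the two input sources, their field names, and the `orders` table are hypothetical, and an in-memory SQLite database stands in for whatever storage layer an organization actually uses.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs from two sources that use different formats and field names.
CSV_ORDERS = "order_id,region,amount\n1,north,120.50\n2,south,80.00\n"
JSON_ORDERS = '[{"id": 3, "region": "north", "total": 200.0}]'


def integrate(csv_text, json_text):
    """Integration: map both sources onto one (order_id, region, amount) shape."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append((int(row["order_id"]), row["region"], float(row["amount"])))
    for item in json.loads(json_text):
        records.append((int(item["id"]), item["region"], float(item["total"])))
    return records


def store(records):
    """Management: persist the unified records (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def analyze(conn):
    """Analysis: total revenue per region, ready to feed a chart or dashboard."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return {region: total for region, total in rows}


summary = analyze(store(integrate(CSV_ORDERS, JSON_ORDERS)))
print(summary)
```

Keeping each action in its own function mirrors the separation described above: the integration step can gain new sources without touching storage or analysis.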

Big data benefits

Improved decision-making

Big data is the key element to becoming a data-driven organization. When you can manage and analyze your big data, you can discover patterns and unlock insights that improve and drive better operational and strategic decisions.

Increased agility and innovation

Big data allows you to collect and process real-time data points and analyze them to adapt quickly and gain a competitive advantage. These insights can guide and accelerate the planning, production, and launch of new products, features, and updates. 

Better customer experiences

Combining and analyzing structured data sources together with unstructured ones provides you with more useful insights for consumer understanding, personalization, and ways to optimize experience to better meet consumer needs and expectations.

Continuous intelligence

Big data allows you to integrate automated, real-time data streaming with advanced data analytics to continuously collect data, find new insights, and discover new opportunities for growth and value. 

More efficient operations

Using big data analytics tools and capabilities allows you to process data faster and generate insights that can help you determine areas where you can reduce costs, save time, and increase your overall efficiency. 

Improved risk management

Analyzing vast amounts of data helps companies evaluate risk better—making it easier to identify and monitor all potential threats and report insights that lead to more robust control and mitigation strategies.

Challenges of implementing big data analytics

While big data has many advantages, it does present some challenges that organizations must be ready to tackle when collecting, managing, and taking action on such an enormous amount of data. 

The most commonly reported big data challenges include: 

  • Lack of data talent and skills. Data scientists, data analysts, and data engineers are in short supply—and are some of the most highly sought after (and highly paid) professionals in the IT industry. Lack of big data skills and experience with advanced data tools is one of the primary barriers to realizing value from big data environments. 
  • Speed of data growth. Big data, by nature, is always rapidly changing and increasing. Without a solid infrastructure in place that can handle your processing, storage, network, and security needs, it can become extremely difficult to manage. 
  • Problems with data quality. Data quality directly impacts the quality of decision-making, data analytics, and planning strategies. Raw data is messy and can be difficult to curate. Having big data doesn’t guarantee results unless the data is accurate, relevant, and properly organized for analysis. This can slow down reporting, but if not addressed, you can end up with misleading results and worthless insights. 
  • Compliance violations. Big data contains a lot of sensitive data and information, making it a tricky task to continuously ensure data processing and storage meet data privacy and regulatory requirements, such as data localization and data residency laws. 
  • Integration complexity. Most companies work with data siloed across various systems and applications across the organization. Integrating disparate data sources and making data accessible for business users is complex, but vital, if you hope to realize any value from your big data. 
  • Security concerns. Big data contains valuable business and customer information, making big data stores high-value targets for attackers. Since these datasets are varied and complex, it can be harder to implement comprehensive strategies and policies to protect them. 
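The data quality challenge in the list above is the easiest one to illustrate concretely. The sketch below assumes a small set of hypothetical customer records and three illustrative checks (missing required fields, duplicate identifiers, and an implausible age range); real pipelines would choose checks that match their own data.

```python
from collections import Counter

# Hypothetical customer records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 2, "email": "b@example.com", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]


def quality_report(records, required=("id", "email", "age")):
    """Count simple quality problems before the data reaches analysis."""
    # Rows where any required field is absent or empty.
    rows_with_missing = sum(
        1 for r in records if any(r.get(f) in (None, "") for f in required)
    )
    # Identifiers that appear more than once.
    id_counts = Counter(r.get("id") for r in records)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)
    # Ages outside an assumed plausible range of 0 to 120.
    age_out_of_range = sum(
        1
        for r in records
        if isinstance(r.get("age"), int) and not 0 <= r["age"] <= 120
    )
    return {
        "rows": len(records),
        "rows_with_missing": rows_with_missing,
        "duplicate_ids": duplicate_ids,
        "age_out_of_range": age_out_of_range,
    }


report = quality_report(RECORDS)
print(report)
```

A report like this can gate downstream jobs, so that misleading results are caught before they reach a dashboard.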

How are data-driven businesses performing?

Some organizations remain wary of going all in on big data because of the time, effort, and commitment it requires to leverage it successfully. In particular, businesses struggle to rework established processes and facilitate the cultural change needed to put data at the heart of every decision.  

But becoming a data-driven business is worth the work. Recent research shows: 

  • Companies that make data-based decisions are 58% more likely to beat revenue targets than those that don't
  • Organizations with advanced insights-driven business capabilities are 2.8x more likely to report double-digit year-over-year growth
  • Data-driven organizations generate, on average, more than 30% growth per year

Enterprises that take steps now and make significant progress toward implementing big data stand to come out as winners in the future.

Big data strategies and solutions

Developing a solid data strategy starts with understanding what you want to achieve, identifying specific use cases, and taking stock of the data you currently have available. You will also need to evaluate what additional data might be needed to meet your business goals and what new systems or tools you will need to support them.

Unlike traditional data management solutions, big data technologies and tools are made to help you deal with large and complex datasets to extract value from them. Tools for big data can help with the volume of the data collected, the speed at which that data becomes available to an organization for analysis, and the complexity or varieties of that data. 

For example, data lakes ingest, process, and store structured, unstructured, and semi-structured data at any scale in its native format. Data lakes act as a foundation to run different types of smart analytics, including visualizations, real-time analytics, and machine learning.

It’s important to keep in mind that there is no one-size-fits-all strategy for big data. What works for one company may not be the right approach for your organization’s specific needs.

Here are four key concepts that our Google Cloud customers have taught us about shaping a winning approach to big data: 



Problems Being Solved With Databases — Executives' Perspectives

Databases are enabling companies to use data, including predictive analytics, to make better-informed, real-time decisions about their business.

By Tom Smith

To gather insights for DZone's Data Persistence Research Guide, scheduled for release in March 2016, we spoke to 16 executives from 13 companies who develop databases and manage persistent data in their own companies or help clients do so.

Here's who we talked to:

  • Satyen Sangani, CEO, Alation
  • Sam Rehman, CTO, Arxan
  • Andy Warfield, Co-Founder/CTO, Coho Data
  • Rami Chahine, V.P. Product Management, and Dan Potter, CMO, Datawatch
  • Eric Frenkiel, Co-Founder/CEO, MemSQL
  • Will Shulman, CEO, MongoLab
  • Philip Rathle, V.P. of Product, Neo Technology
  • Paul Nashawaty, Product Marketing and Strategy, Progress
  • Joan Wrabetz, CTO, Qualisystems
  • Yiftach Shoolman, Co-Founder and CTO, and Leena Joshi, V.P. Product Marketing, Redis Labs
  • Partha Seetala, CTO, Robin Systems
  • Dale Lutz, Co-Founder, and Paul Nalos, Database Team Lead, Safe Software
  • Jon Bock, VP of Product and Marketing, Snowflake Computing

The macro-trend is that more data is being analyzed in real-time. The internet of connected things enables you to see how things interact. Applications are tending to use multiple databases to provide polyglot persistence.

Here's what we heard when we asked, "What problems are being solved with databases?":

A huge evolution is happening very rapidly. Relational databases with big iron are long gone. We are using distributed data and accessing information from different sources. Different databases provide different access. We may use Spark SQL for transactional data and then a warehouse for mining legacy data. Each database looks different, but all of them need to be accessible holistically.

People are being more predictive with analytics. Databases used to be reporting engines. Now they support predictive analytics: advertisers can gauge an audience’s receptivity to a banner ad, retailers can use a customer profile to make a special offer, and logistics companies can route packages from point A to point B by helping determine how to staff and assign drivers.

Companies are using databases to learn about their business faster. Safeway uses Teradata and Alation with its loyalty card data to market better, provide better service, and predict what people will buy and which customers will churn. eBay uses them to decide how the website should present an offer to a customer, which improves the user experience (UX). Learning and measuring what people are doing leads to a better customer experience (CX). We track how people are using the software in order to improve it: one set of techniques, providing better results.

The ability to test before making changes without disturbing the persistent data ensures that the customer experience (CX) improves rather than deteriorates.

New-generation databases operate solely in memory (e.g., Apache Ignite and Spark), enabling data to be read as fast as possible. We expect data to be retrieved quickly, and we solve problems by retrieving data faster than before, which enables us to do many things much faster. Persistence for analytics still needs to be solved.

The ability to get instant insights into what the business is experiencing. Oil and gas IoT drill bits let the drillers know when a drill bit is about to break. Advertisers can deliver the right ad to the right customer in 10 milliseconds. Banks have real-time risk management. Real-time logistics has led to on-demand ride sourcing.

Traditionally, databases stored data and answered inquiries. Now they must mesh with governance, audit, and metadata. The difference between traditional and database applications is increasingly blurred. Virtual machines and containers enable different types of databases, such as data stores for virtualization. Twenty years ago there were seven to ten applications by Microsoft. Now, most large companies are doing their own application development. Support is becoming broader. We need a storage platform for traditional and new forms.

A general class of problems involves getting information from individual pieces of data, and some queries grab individual pieces of data. The macro-trend is that more and more data will be analyzed in real time. We are starting to find more intelligent approaches to recommendations, fraud detection, access rights, and IoT, including how to deal with volumes of sensor data. With the internet of connected things, you can see how things interact, get into the systems and data you need, and avoid those you don’t. This requires understanding connections and data relationships in the graph. As databases proliferate, how do we keep track of the data in all of those places? What’s an original and what’s a copy?

People are noticing that different parts of an application will have different needs. Specialized databases do specific things better (e.g., graphing the database of a social network and crawling links). Other databases scale better. The trend is towards specialization, with apps using two, three, or more databases for polyglot persistence. At the end of the day, all databases store, organize, and retrieve data. Query interfaces may differ (e.g., key-value stores versus robust query languages).
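Polyglot persistence can be illustrated with a small Python sketch. The class and method names here (`KeyValueStore`, `ShopRepository`, `record_order`) are hypothetical, in-memory SQLite stands in for a relational database, and a plain dictionary stands in for a key-value store; the point is only that one application routes each kind of data to the store that suits it.

```python
import sqlite3


class KeyValueStore:
    """Toy stand-in for a key-value database: fast lookups by key, no queries."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class ShopRepository:
    """One application, two stores: relational for orders, key-value for lookups."""

    def __init__(self):
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
        )
        self.kv = KeyValueStore()

    def record_order(self, customer, amount):
        cur = self.sql.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount)
        )
        self.sql.commit()
        # Cache the most recent order id where it can be read without a query.
        self.kv.put(f"last_order:{customer}", cur.lastrowid)
        return cur.lastrowid

    def total_spent(self, customer):
        # Aggregations are a natural fit for the relational store.
        row = self.sql.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]

    def last_order_id(self, customer):
        # Single-key lookups are a natural fit for the key-value store.
        return self.kv.get(f"last_order:{customer}")


repo = ShopRepository()
repo.record_order("ada", 10.0)
repo.record_order("ada", 5.5)
print(repo.total_spent("ada"), repo.last_order_id("ada"))
```

Hiding both stores behind one repository keeps the rest of the application unaware of which database answers which question.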

Mobile, IoT, and distributed applications have led to the distributed nature of data. The cloud has solved some of the problems but also resulted in more distributed data. While it would be ideal for all data to be in a single location, this is no longer realistic. You must have data accessibility across the board. This raises issues around security and privacy. Keys and credentials are critical to managing data. Security is a difficult subject. The more you can tweak and customize to protect the data, the better (e.g., access tokens). Data is critical; don’t let it get outside of control zones. Implement enterprise-level policies.

Organizations that implement this strategy benefit from improved performance in retrieving subsets of the data from large datasets. They can also reuse data for different purposes and perform querying, exploring, and mining to create value from data. They’re ultimately creating a shared operational picture, while providing different views of the same data to different users depending on their needs.
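One common way to give different users different views of the same data is a database view. The sketch below is only an illustration under assumptions: the `sales` table, its rows, and the two view names are invented, and in-memory SQLite is used so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY, region TEXT, customer TEXT, amount REAL
    );
    INSERT INTO sales (region, customer, amount) VALUES
        ('north', 'ada', 100.0),
        ('north', 'bob', 50.0),
        ('south', 'cy', 75.0);

    -- Finance sees totals per region, not individual rows.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Analysts see individual rows, but without customer names.
    CREATE VIEW sales_anonymized AS
        SELECT id, region, amount FROM sales;
    """
)

finance_view = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
analyst_columns = [
    column[0] for column in conn.execute("SELECT * FROM sales_anonymized").description
]
print(finance_view)
print(analyst_columns)
```

Both views read from the single `sales` table, so there is one copy of the data and no question about which one is the original.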

There are many use cases. For example, the Twitter timeline is based on the Redis time series database.

At one end, there are quick-data use cases with non-relational clickstream data, where data can be shared by multiple data science teams without copying it from one source to another. Copying takes time, must be tracked, and costs more to store.

Opinions expressed by DZone contributors are their own.


Big data and disaster management: a systematic review and agenda for future research

  • Applications of OR in Disaster Relief Operations, Part II
  • Published: 21 August 2017
  • Volume 283 , pages 939–959, ( 2019 )


  • Shahriar Akter 1 &
  • Samuel Fosso Wamba 2  

9021 Accesses

182 Citations

3 Altmetric

The era of big data and analytics is opening up new possibilities for disaster management (DM). Because it can help visualize, analyze, and predict disasters, big data is dramatically changing humanitarian operations and crisis management. Yet the relevant literature is diverse and fragmented, which calls for a review to ascertain its development. A number of publications have dealt with big data and its applications for minimizing the impact of disasters. Based on a systematic literature review, this study examines big data in DM to present the main contributions, gaps, challenges, and a future research agenda. The study presents the findings in terms of yearly distribution, main journals, and most cited papers. The findings also show a classification of publications, an analysis of trends, and the impact of published research in the DM context. Overall, the study contributes to a better understanding of the importance of big data in disaster management.

Acknowledgements

The authors appreciate and gratefully acknowledge constructive comments and literature review support of Deepa Mishra (Indian Institute of Technology, Kanpur, India), which improved the quality of our study.

Author information

Authors and affiliations.

Sydney Business School, University of Wollongong, Sydney, NSW, Australia

Shahriar Akter

Toulouse Business School, Toulouse, France

Samuel Fosso Wamba

Corresponding author

Correspondence to Shahriar Akter.

About this article

Akter, S., Wamba, S.F. Big data and disaster management: a systematic review and agenda for future research. Ann Oper Res 283, 939–959 (2019). https://doi.org/10.1007/s10479-017-2584-2

Published: 21 August 2017

Issue Date: December 2019

DOI: https://doi.org/10.1007/s10479-017-2584-2

  • Big data analytics
  • Disaster management
  • Humanitarian services
  • Emergency services
  • Systematic literature review
  • Open access
  • Published: 06 January 2022

The use of Big Data Analytics in healthcare

  • Kornelia Batko   ORCID: orcid.org/0000-0001-6561-3826 1 &
  • Andrzej Ślęzak 2  

Journal of Big Data volume 9, Article number: 3 (2022)

62k Accesses

74 Citations

28 Altmetric

The introduction of Big Data Analytics (BDA) in healthcare will make it possible to use new technologies both in the treatment of patients and in health management. The paper aims to analyze the possibilities of using Big Data Analytics in healthcare. The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities. The direct research was based on a research questionnaire and conducted on a sample of 217 medical facilities in Poland. Literature studies have shown that the use of Big Data Analytics can bring many benefits to medical facilities, while direct research has shown that medical facilities in Poland are moving towards data-based healthcare: they use structured and unstructured data and reach for analytics in the administrative, business, and clinical areas. The research confirmed that medical facilities work with both structured and unstructured data. The following kinds and sources of data can be distinguished: data from databases, transaction data, unstructured content of emails and documents, and data from devices and sensors. However, the use of data from social media is lower. In their activity, medical facilities reach for analytics not only in the administrative and business areas but also in the clinical area. This clearly shows that the decisions made in medical facilities are highly data-driven. The results of the study confirm what has been analyzed in the literature: medical facilities are moving towards data-based healthcare, together with its benefits.

Introduction

The main contribution of this paper is to present an analytical overview of using structured and unstructured data (Big Data) analytics in medical facilities in Poland. Medical facilities use both structured and unstructured data in their practice. Structured data has a predetermined schema, it is extensive, freeform, and comes in a variety of forms [ 27 ]. In contrast, unstructured data, referred to as Big Data (BD), does not fit into the typical data processing format. Big Data is a massive amount of data sets that cannot be stored, processed, or analyzed using traditional tools. It remains stored but not analyzed. Due to the lack of a well-defined schema, it is difficult to search and analyze such data and, therefore, it requires a specific technology and method to transform it into value [ 20 , 68 ]. Integrating data stored in both structured and unstructured formats can add significant value to an organization [ 27 ]. Organizations must approach unstructured data in a different way. Therefore, the potential is seen in Big Data Analytics (BDA). Big Data Analytics are techniques and tools used to analyze and extract information from Big Data. The results of Big Data analysis can be used to predict the future. They also help in identifying trends in past data. When it comes to healthcare, Big Data Analytics allows large datasets from thousands of patients to be analyzed, clusters and correlations between datasets to be identified, and predictive models to be developed using data mining techniques [ 60 ].

This paper is the first study to consolidate and characterize the use of Big Data from different perspectives. The first part consists of a brief literature review of studies on Big Data (BD) and Big Data Analytics (BDA), while the second part presents results of direct research aimed at diagnosing the use of big data analyses in medical facilities in Poland.

Healthcare is a complex system with varied stakeholders: patients, doctors, hospitals, pharmaceutical companies and healthcare decision-makers. This sector is also limited by strict rules and regulations. However, worldwide one may observe a departure from the traditional doctor-patient approach. The doctor becomes a partner and the patient is involved in the therapeutic process [ 14 ]. Healthcare is no longer focused solely on the treatment of patients. The priority for decision-makers should be to promote proper health attitudes and prevent diseases that can be avoided [ 81 ]. This became visible and important especially during the Covid-19 pandemic [ 44 ].

The next challenge that healthcare will have to face is the growing number of elderly people and a decline in fertility. Fertility rates in the country are below the reproductive minimum necessary to keep the population stable [ 10 ]. Both effects, namely the increase in age and lower fertility rates, are reflected in demographic load indicators, which are constantly growing. Forecasts show that providing healthcare in the form it is provided today will become impossible in the next 20 years [ 70 ]. It is especially visible now during the Covid-19 pandemic, when healthcare faced quite a challenge related to the analysis of huge amounts of data and the need to identify trends and predict the spread of the coronavirus. The pandemic showed even more clearly that patients should have access to information about their health condition, the possibility of digital analysis of this data and access to reliable medical support online. Health monitoring and cooperation with doctors in order to prevent diseases can actually revolutionize the healthcare system. One of the most important aspects of the change necessary in healthcare is putting the patient in the center of the system.

Technology is not enough to achieve these goals. Therefore, changes should be made not only at the technological level but also in the management and design of complete healthcare processes and, what is more, they should affect the business models of service providers. The use of Big Data Analytics is becoming more and more common in enterprises [ 17 , 54 ]. However, medical enterprises still cannot keep up with the information needs of patients, clinicians, administrators and policy makers. The adoption of a Big Data approach would allow the implementation of personalized and precise medicine based on personalized information, delivered in real time and tailored to individual patients.

To achieve this goal, it is necessary to implement systems that will be able to learn quickly about the data generated by people within clinical care and everyday life. This will enable data-driven decision making, receiving better personalized predictions about prognosis and responses to treatments; a deeper understanding of the complex factors and their interactions that influence health at the patient level, the health system and society, enhanced approaches to detecting safety problems with drugs and devices, as well as more effective methods of comparing prevention, diagnostic, and treatment options [ 40 ].

In the literature, there is a lot of research showing what opportunities can be offered to companies by big data analysis and what data can be analyzed. However, there are few studies showing how data analysis in the area of healthcare is performed, what data is used by medical facilities, and what analyses they carry out and in which areas. This paper aims to fill this gap by presenting the results of research carried out in medical facilities in Poland. The goal is to analyze the possibilities of using Big Data Analytics in healthcare, especially in Polish conditions. In particular, the paper is aimed at determining what data is processed by medical facilities in Poland, what analyses they perform and in what areas, and how they assess their analytical maturity. In order to achieve this goal, a critical analysis of the literature was performed, and the direct research was based on a research questionnaire conducted on a sample of 217 medical facilities in Poland. It was hypothesized that medical facilities in Poland are working on both structured and unstructured data and moving towards data-based healthcare and its benefits. Examining the maturity of healthcare facilities in the use of Big Data and Big Data Analytics is crucial in determining the potential future benefits that the healthcare sector can gain from Big Data Analytics. There is also a pressing need to predict whether, in the coming years, healthcare will be able to cope with the threats and challenges it faces.

This paper is divided into eight parts. The first is the introduction which provides background and the general problem statement of this research. In the second part, this paper discusses considerations on use of Big Data and Big Data Analytics in Healthcare, and then, in the third part, it moves on to challenges and potential benefits of using Big Data Analytics in healthcare. The next part involves the explanation of the proposed method. The result of direct research and discussion are presented in the fifth part, while the following part of the paper is the conclusion. The seventh part of the paper presents practical implications. The final section of the paper provides limitations and directions for future research.

Considerations on the use of Big Data and Big Data Analytics in healthcare

In recent years one can observe a constantly increasing demand for solutions offering effective analytical tools. This trend is also noticeable in the analysis of large volumes of data (Big Data, BD). Organizations are looking for ways to use the power of Big Data to improve their decision making, competitive advantage or business performance [ 7 , 54 ]. Big Data is considered to offer potential solutions to public and private organizations, however, still not much is known about the outcome of the practical use of Big Data in different types of organizations [ 24 ].

As already mentioned, in recent years, healthcare management worldwide has changed from a disease-centered model to a patient-centered model, and even to a value-based healthcare delivery model [ 68 ]. In order to meet the requirements of this model and provide effective patient-centered care, it is necessary to manage and analyze healthcare Big Data.

The issue often raised when it comes to the use of data in healthcare is the appropriate use of Big Data. Healthcare has always generated huge amounts of data and nowadays, the introduction of electronic medical records, as well as the huge amount of data sent by various types of sensors or generated by patients in social media causes data streams to constantly grow. Also, the medical industry generates significant amounts of data, including clinical records, medical images, genomic data and health behaviors. Proper use of the data will allow healthcare organizations to support clinical decision-making, disease surveillance, and public health management. The challenge posed by clinical data processing involves not only the quantity of data but also the difficulty in processing it.

In the literature one can find many different definitions of Big Data. This concept has evolved in recent years; however, it is still not clearly understood. Nevertheless, despite the range and differences in definitions, Big Data can be treated as a large amount of digital data, large data sets, a tool, a technology or a phenomenon (cultural or technological).

Big Data can be considered as massive and continually generated digital datasets that are produced via interactions with online technologies [ 53 ]. Big Data can be defined as datasets that are of such large sizes that they pose challenges in traditional storage and analysis techniques [ 28 ]. A similar opinion about Big Data was presented by Ohlhorst who sees Big Data as extremely large data sets, possible neither to manage nor to analyze with traditional data processing tools [ 57 ]. In his opinion, the bigger the data set, the more difficult it is to gain any value from it.

In turn, Knapp perceived Big Data as tools, processes and procedures that allow an organization to create, manipulate and manage very large data sets and storage facilities [ 38 ]. From this point of view, Big Data is identified as a tool to gather information from different databases and processes, allowing users to manage large amounts of data.

A similar perception of the term ‘Big Data’ is shown by Carter. According to him, Big Data technologies refer to a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high velocity capture, discovery and/or analysis [ 13 ].

Jordan combines these two approaches by identifying Big Data as a complex system, as it needs databases for data to be stored in, programs and tools to be managed, as well as expertise and personnel able to retrieve useful information and visualization to be understood [ 37 ].

Following Laney's definition of Big Data, it can be stated that it is a large amount of data generated at very high speed and containing a lot of content [ 43 ]. Such data comes from unstructured sources, such as streams of clicks on the web, social networks (Twitter, blogs, Facebook), video recordings from shops, recordings of calls in a call center, real-time information from various kinds of sensors, RFID, GPS devices, mobile phones and other devices that identify and monitor something [ 8 ]. Big Data is a powerful digital data silo: raw, collected from all sorts of sources, unstructured and difficult, or even impossible, to analyze using the conventional techniques applied so far to relational databases.

While describing Big Data, it cannot be overlooked that the term refers more to a phenomenon than to a specific technology. Therefore, instead of trying to define this phenomenon, more and more authors describe Big Data through its characteristics, a collection of V’s related to its nature [ 2 , 3 , 23 , 25 , 58 ]:

Volume (refers to the amount of data and is one of the biggest challenges in Big Data Analytics),

Velocity (speed with which new data is generated, the challenge is to be able to manage data effectively and in real time),

Variety (heterogeneity of data, many different types of healthcare data, the challenge is to derive insights by looking at all available heterogeneous data in a holistic manner),

Variability (inconsistency of data, the challenge is to correctly interpret data whose meaning can vary significantly depending on the context),

Veracity (how trustworthy the data is, quality of the data),

Visualization (ability to interpret data and resulting insights, challenging for Big Data due to its other features as described above),

Value (the goal of Big Data Analytics is to discover the hidden knowledge from huge amounts of data).

Big Data is defined as an information asset with high volume, velocity, and variety, which requires specific technology and method for its transformation into value [ 21 , 77 ]. Big Data is also a collection of information characterized by high volume, high volatility or high diversity, requiring new forms of processing in order to support decision-making, the discovery of new phenomena and process optimization [ 5 , 7 ]. Big Data is too large for traditional data-processing systems and software tools to capture, store, manage and analyze, therefore it requires new technologies [ 28 , 50 , 61 ] to manage (capture, aggregate, process) its volume, velocity and variety [ 9 ].

Undoubtedly, Big Data differs from the data sources used so far by organizations. Therefore, organizations must approach this type of unstructured data in a different way. First of all, organizations must start to see data as flows and not stocks—this entails the need to implement the so-called streaming analytics [ 48 ]. The mentioned features make it necessary to use new IT tools that allow the fullest use of new data [ 58 ]. The Big Data idea, inseparable from the huge increase in data available to various organizations or individuals, creates opportunities for access to valuable analyses, conclusions and enables making more accurate decisions [ 6 , 11 , 59 ].
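
The idea of treating data as flows rather than stocks can be illustrated with a minimal sketch. The example below is not taken from the cited literature: the function name `rolling_alerts`, the window size, the threshold and the heart-rate values are all hypothetical, and a simple rolling mean stands in for whatever streaming analytics a medical facility would actually deploy. The point is only that each reading is processed as it arrives and that no more than the last few values are ever held in memory.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int = 3, threshold: float = 100.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the mean of the last `window`
    readings exceeds `threshold`. Only `window` values are kept in memory,
    so the input can be an unbounded stream."""
    recent = deque(maxlen=window)  # old readings fall out automatically
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:  # wait until the window is full
            mean = sum(recent) / window
            if mean > threshold:
                yield index, mean


# Hypothetical heart-rate stream (beats per minute) from a wearable sensor.
heart_rate = [72, 75, 90, 110, 120, 118, 80, 76]
alerts = list(rolling_alerts(heart_rate, window=3, threshold=100.0))
```

Because `rolling_alerts` is a generator, it could in principle be fed from a sensor feed or a message queue instead of a list; the list here only keeps the sketch self-contained.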

The Big Data concept is constantly evolving and currently it does not focus on huge amounts of data, but rather on the process of creating value from this data [ 52 ]. Big Data is collected from various sources that have different data properties and are processed by different organizational units, resulting in creation of a Big Data chain [ 36 ]. The aim of the organizations is to manage, process and analyze Big Data. In the healthcare sector, Big Data streams consist of various types of data, namely [ 8 , 51 ]:

clinical data, i.e. data obtained from electronic medical records, data from hospital information systems, image centers, laboratories, pharmacies and other organizations providing health services, patient generated health data, physician’s free-text notes, genomic data, physiological monitoring data [ 4 ],

biometric data provided from various types of devices that monitor weight, pressure, glucose level, etc.,

financial data, constituting a full record of economic operations reflecting the conducted activity,

data from scientific research activities, i.e. results of research, including drug research, design of medical devices and new methods of treatment,

data provided by patients, including description of preferences, level of satisfaction, information from systems for self-monitoring of their activity: exercises, sleep, meals consumed, etc.,

data from social media.

These data are provided not only by patients but also by organizations and institutions, as well as by various types of monitoring devices, sensors or instruments [ 16 ]. Data that has been generated so far in the healthcare sector is stored in both paper and digital form. Thus, the essence and the specificity of the process of Big Data analyses means that organizations need to face new technological and organizational challenges [ 67 ]. The healthcare sector has always generated huge amounts of data and this is connected, among others, with the need to store medical records of patients. However, the problem with Big Data in healthcare is not limited to an overwhelming volume but also an unprecedented diversity in terms of types, data formats and speed with which it should be analyzed in order to provide the necessary information on an ongoing basis [ 3 ]. It is also difficult to apply traditional tools and methods for management of unstructured data [ 67 ]. Due to the diversity and quantity of data sources that are growing all the time, advanced analytical tools and technologies, as well as Big Data analysis methods which can meet and exceed the possibilities of managing healthcare data, are needed [ 3 , 68 ].

Therefore, the potential is seen in Big Data analyses, especially in the aspect of improving the quality of medical care, saving lives or reducing costs [ 30 ]. Extracting association rules, patterns and trends from this tangle of data will allow health service providers and other stakeholders in the healthcare sector to offer more accurate and more insightful diagnoses of patients, personalized treatment, monitoring of the patients, preventive medicine, support of medical research and population health, as well as better quality of medical services and patient care while, at the same time, reducing costs (Fig.  1 ).

Figure 1. Healthcare Big Data Analytics applications (Source: Own elaboration)

The main challenge with Big Data is how to handle such a large amount of information and use it to make data-driven decisions in plenty of areas [ 64 ]. In the context of healthcare data, another major challenge is to adjust big data storage, analysis, presentation of analysis results and inference based on them to a clinical setting. Data analytics systems implemented in healthcare are designed to describe, integrate and present complex data in an appropriate way so that it can be understood better (Fig.  2 ). This would improve the efficiency of acquiring, storing, analyzing and visualizing big data from healthcare [ 71 ].

Figure 2. Process of Big Data Analytics

The result of data processing with the use of Big Data Analytics is appropriate data storytelling which may contribute to making decisions with both lower risk and data support. This, in turn, can benefit healthcare stakeholders. To take advantage of the potential massive amounts of data in healthcare and to ensure that the right intervention to the right patient is properly timed, personalized, and potentially beneficial to all components of the healthcare system such as the payer, patient, and management, analytics of large datasets must connect communities involved in data analytics and healthcare informatics [ 49 ]. Big Data Analytics can provide insight into clinical data and thus facilitate informed decision-making about the diagnosis and treatment of patients, prevention of diseases or others. Big Data Analytics can also improve the efficiency of healthcare organizations by realizing the data potential [ 3 , 62 ].

Big Data Analytics in medicine and healthcare refers to the integration and analysis of a large amount of complex heterogeneous data, such as various omics (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenetics, diseasomics), biomedical data, telemedicine data (sensors, medical equipment data) and electronic health records data [ 46 , 65 ].

When analyzing the phenomenon of Big Data in the healthcare sector, it should be noted that it can be considered from the point of view of three areas: epidemiological, clinical and business.

From a clinical point of view, Big Data analysis aims to improve the health and condition of patients, enable long-term predictions about their health status and support the implementation of appropriate therapeutic procedures. Ultimately, the use of data analysis in medicine is to allow the adaptation of therapy to a specific patient, that is, personalized (precision) medicine.

From an epidemiological point of view, it is desirable to obtain an accurate prognosis of morbidity in order to implement preventive programs in advance.

In the business context, Big Data analysis may enable offering personalized packages of commercial services or determining the probability of individual disease and infection occurrence. It is worth noting that Big Data means not only the collection and processing of data but, most of all, the inference and visualization of data necessary to obtain specific business benefits.

In order to introduce new management methods and new solutions in terms of effectiveness and transparency, it becomes necessary to make data more accessible, digital, searchable, as well as analyzed and visualized.

Erickson and Rothberg state that the information and data do not reveal their full value until insights are drawn from them. Data becomes useful when it enhances decision making and decision making is enhanced only when analytical techniques are used and an element of human interaction is applied [ 22 ].

Thus, healthcare has experienced much progress in the usage and analysis of data. Large-scale digitalization and transparency in this sector is a key element of the government policies of almost all countries. For centuries, the treatment of patients was based on the judgment of doctors who made treatment decisions. In recent years, however, Evidence-Based Medicine has become more and more important, as it relies on the systematic analysis of clinical data and on treatment decisions based on the best available information [ 42 ]. In the healthcare sector, Big Data Analytics is expected to improve the quality of life and reduce operational costs [ 72 , 82 ]. Big Data Analytics enables organizations to improve and increase their understanding of the information contained in data. It also helps identify data that provides valuable insights for current as well as future decisions [ 28 ].

Big Data Analytics refers to technologies that are grounded mostly in data mining: text mining, web mining, process mining, audio and video analytics, statistical analysis, network analytics, social media analytics and web analytics [ 16 , 25 , 31 ]. Different data mining techniques can be applied on heterogeneous healthcare data sets, such as: anomaly detection, clustering, classification, association rules as well as summarization and visualization of those Big Data sets [ 65 ]. Modern data analytics techniques explore and leverage unique data characteristics even from high-speed data streams and sensor data [ 15 , 16 , 31 , 55 ]. Big Data can be used, for example, for better diagnosis in the context of comprehensive patient data, disease prevention and telemedicine (in particular when using real-time alerts for immediate care), monitoring patients at home, preventing unnecessary hospital visits, integrating medical imaging for a wider diagnosis, creating predictive analytics, reducing fraud and improving data security, better strategic planning and increasing patients’ involvement in their own health.
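
Of the data mining techniques listed above, association rules are perhaps the easiest to show on a small scale. The following sketch is an illustration only: the diagnosis sets are invented, and the helper names `support` and `confidence` are assumptions of this example rather than part of any cited tool. It computes the two standard measures of a rule such as "diabetes implies hypertension": support (the share of all records containing both conditions) and confidence (the share of records with the antecedent that also contain the consequent).

```python
from typing import List, Set

# Hypothetical patient records: each set holds the diagnoses of one patient.
RECORDS: List[Set[str]] = [
    {"diabetes", "hypertension", "obesity"},
    {"diabetes", "hypertension"},
    {"hypertension"},
    {"diabetes", "obesity"},
    {"diabetes", "hypertension", "obesity"},
]


def count(itemset: Set[str], records: List[Set[str]]) -> int:
    """Number of records that contain every item of `itemset`."""
    return sum(1 for record in records if itemset <= record)


def support(itemset: Set[str], records: List[Set[str]]) -> float:
    """Fraction of all records that contain `itemset`."""
    return count(itemset, records) / len(records)


def confidence(
    antecedent: Set[str], consequent: Set[str], records: List[Set[str]]
) -> float:
    """Of the records containing `antecedent`, the fraction that also
    contain `consequent`. Assumes the antecedent occurs at least once."""
    return count(antecedent | consequent, records) / count(antecedent, records)


rule_support = support({"diabetes", "hypertension"}, RECORDS)
rule_confidence = confidence({"diabetes"}, {"hypertension"}, RECORDS)
```

On real healthcare data the candidate itemsets would be generated by an algorithm such as Apriori rather than chosen by hand; the sketch only shows how a single rule is scored.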

Big Data Analytics in healthcare can be divided into [ 33 , 73 , 74 ]:

descriptive analytics in healthcare is used to understand past and current healthcare decisions, converting data into useful information for understanding and analyzing healthcare decisions, outcomes and quality, as well as making informed decisions [ 33 ]. It can be used to create reports (e.g. about patients’ hospitalizations, physicians’ performance, utilization management), visualization, customized reports, drill down tables, or running queries on the basis of historical data.

predictive analytics operates on past performance in an effort to predict the future by examining historical or summarized health data, detecting patterns of relationships in these data, and then extrapolating these relationships to forecast. It can be used, e.g., to predict the response of different patient groups to different drugs (dosages) or reactions (clinical trials), anticipate risk, find relationships in health data and detect hidden patterns [ 62 ]. In this way, it is possible to predict the epidemic spread, anticipate service contracts and plan healthcare resources. Predictive analytics is used in proper diagnosis and for appropriate treatments to be given to patients suffering from certain diseases [ 39 ].

prescriptive analytics—occurs when health problems involve too many choices or alternatives. It uses health and medical knowledge in addition to data or information. Prescriptive analytics is used in many areas of healthcare, including drug prescriptions and treatment alternatives. Personalized medicine and evidence-based medicine are both supported by prescriptive analytics.

discovery analytics—utilizes knowledge about knowledge to discover new “inventions” like drugs (drug discovery), previously unknown diseases and medical conditions, alternative treatments, etc.
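
The difference between the first two categories can be made concrete with a small sketch. It is a deliberately simplified illustration, not a method described in the cited sources: the monthly admission counts are invented, the function names are hypothetical, and an ordinary least-squares trend line is used as the simplest possible stand-in for a predictive model. Descriptive analytics summarizes what has already happened; predictive analytics extrapolates the detected relationship one step forward.

```python
from typing import Dict, List, Tuple

# Hypothetical monthly hospital admissions for five consecutive months.
ADMISSIONS: List[float] = [100, 112, 118, 131, 139]


def describe(values: List[float]) -> Dict[str, float]:
    """Descriptive analytics: summarize the historical data."""
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def fit_trend(values: List[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares, where x is
    the month index 0, 1, 2, ... Requires at least two values."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_next(values: List[float]) -> float:
    """Predictive analytics: extrapolate the fitted trend by one month."""
    slope, intercept = fit_trend(values)
    return intercept + slope * len(values)


summary = describe(ADMISSIONS)
next_month = forecast_next(ADMISSIONS)
```

A prescriptive step would go further and recommend an action on the basis of the forecast, for example how many beds to reserve; that decision logic depends on medical and organizational knowledge and is left out here.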

Although the models and tools used in descriptive, predictive, prescriptive, and discovery analytics are different, many applications involve all four of them [ 62 ]. Big Data Analytics in healthcare can help enable personalized medicine by identifying optimal patient-specific treatments. This can improve living standards, reduce waste of healthcare resources and save healthcare costs [ 56 , 63 , 71 ]. The introduction of large data analysis gives new analytical possibilities in terms of scope, flexibility and visualization. Techniques such as data mining (a computational process of discovering patterns in large data sets) facilitate inductive reasoning and exploratory data analysis, enabling scientists to identify data patterns that are independent of specific hypotheses. As a result, predictive analysis and real-time analysis become possible, making it easier for medical staff to start early treatments and reduce potential morbidity and mortality. In addition, document analysis, statistical modeling, discovering patterns and topics in document collections and data in the EHR, as well as an inductive approach can help identify and discover relationships between health phenomena.

Advanced analytical techniques can be applied to the large amount of existing (but not yet analyzed) data on patient health and related medical data to achieve a better understanding of the information and results obtained, as well as to design optimal clinical pathways [ 62 ]. Big Data Analytics in healthcare integrates analysis of several scientific areas such as bioinformatics, medical imaging, sensor informatics, medical informatics and health informatics [ 65 ]. Big Data Analytics in healthcare allows large datasets from thousands of patients to be analyzed, clusters and correlations between datasets to be identified, and predictive models to be developed using data mining techniques [ 65 ]. Discussing all the techniques used for Big Data Analytics goes beyond the scope of a single article [ 25 ].

The success of Big Data analysis and its accuracy depend heavily on the tools and techniques used for the analysis and on their ability to provide reliable, up-to-date and meaningful information to various stakeholders [ 12 ]. It is believed that the implementation of big data analytics by healthcare organizations could bring many benefits in the upcoming years, including lowering health care costs, better diagnosis and prediction of diseases and their spread, improving patient care and developing protocols to prevent re-hospitalization, optimizing staff, optimizing equipment, forecasting the need for hospital beds, operating rooms, treatments, and improving the drug supply chain [ 71 ].

Challenges and potential benefits of using Big Data Analytics in healthcare

Modern analytics makes it possible not only to gain insight into historical data, but also to obtain the information necessary to generate insight into what may happen in the future, even when it comes to predicting evidence-based actions. The emphasis on reform has prompted payers and suppliers to pursue data analysis to reduce risk, detect fraud, improve efficiency and save lives. Everyone (payers, providers, even patients) is focusing on doing more with fewer resources. Thus, some areas in which enhanced data and analytics can yield the greatest results include various healthcare stakeholders (Table 1 ).

Healthcare organizations see the opportunity to grow through investments in Big Data Analytics. In recent years, by collecting medical data of patients, converting them into Big Data and applying appropriate algorithms, reliable information has been generated that helps patients, physicians and stakeholders in the health sector to identify values and opportunities [ 31 ]. It is worth noting that there are many changes and challenges in the structure of the healthcare sector. Digitization and effective use of Big Data in healthcare can bring benefits to every stakeholder in this sector. A single doctor would benefit the same as the entire healthcare system. Potential opportunities to achieve benefits and effects from Big Data in healthcare can be divided into four groups [ 8 ]:

Improving the quality of healthcare services:

assessment of diagnoses made by doctors and the manner of treatment of diseases indicated by them based on the decision support system working on Big Data collections,

detection of more effective, from a medical point of view, and more cost-effective ways to diagnose and treat patients,

analysis of large volumes of data to reach practical information useful for identifying needs, introducing new health services, preventing and overcoming crises,

prediction of the incidence of diseases,

detecting trends that lead to an improvement in health and lifestyle of the society,

analysis of the human genome for the introduction of personalized treatment.

Supporting the work of medical personnel

doctors’ comparison of current medical cases to cases from the past for better diagnosis and treatment adjustment,

detection of diseases at earlier stages when they can be more easily and quickly cured,

detecting epidemiological risks and improving control of pathogenic spots and reaction rates,

identification of patients who are predicted to have the highest risk of specific, life-threatening diseases by collating data on the history of the most common diseases, in healing people with reports entering insurance companies,

health management of each patient individually (personalized medicine) and health management of the whole society,

capturing and analyzing large amounts of data from hospitals and homes in real time, life monitoring devices to monitor safety and predict adverse events,

analysis of patient profiles to identify people for whom prevention should be applied, lifestyle change or preventive care approach,

the ability to predict the occurrence of specific diseases or worsening of patients’ results,

predicting disease progression and its determinants, estimating the risk of complications,

detecting drug interactions and their side effects.

Supporting scientific and research activity

supporting work on new drugs and clinical trials thanks to the possibility of analyzing “all data” instead of selecting a test sample,

the ability to identify patients with specific, biological features that will take part in specialized clinical trials,

selecting a group of patients for which the tested drug is likely to have the desired effect and no side effects,

using modeling and predictive analysis to design better drugs and devices.

Business and management

reduction of costs and counteracting abuse and counseling practices,

faster and more effective identification of incorrect or unauthorized financial operations in order to prevent abuse and eliminate errors,

increase in profitability by detecting patients generating high costs or identifying doctors whose work, procedures and treatment methods cost the most and offering them solutions that reduce the amount of money spent,

identification of unnecessary medical activities and procedures, e.g. duplicate tests.

According to research conducted by Wang, Kung and Byrd, Big Data Analytics benefits can be classified into five categories [73]:

IT infrastructure benefits: reducing system redundancy, avoiding unnecessary IT costs, transferring data quickly among healthcare IT systems, better use of healthcare systems, processing standardization among various healthcare IT systems, reducing IT maintenance costs regarding data storage,

operational benefits: improving the quality and accuracy of clinical decisions, processing a large number of health records in seconds, reducing the time of patient travel, immediate access to clinical data to analyze, shortening the time of diagnostic tests, reductions in surgery-related hospitalizations, exploring inconceivable new research avenues,

organizational benefits: detecting interoperability problems much more quickly than traditional manual methods, improving cross-functional communication and collaboration among administrative staff, researchers, clinicians and IT staff, enabling data sharing with other institutions and adding new services, content sources and research partners,

managerial benefits: gaining quick insights about changing healthcare trends in the market, providing members of the board and heads of department with sound decision-support information on the daily clinical setting, optimizing business growth-related decisions,

strategic benefits: providing a big-picture view of treatment delivery for meeting future needs, creating highly competitive healthcare services.

The above list does not cover every potential area of use of Big Data Analytics in healthcare, because the possibilities of using analytics are practically unlimited. In addition, advanced analytical tools make it possible to analyze data from all possible sources and conduct cross-analyses to provide better data insights [26]. For example, a cross-analysis can combine patient characteristics with costs and care results, which can help identify the treatment or treatments that are best in medical terms and most cost-effective, and this may allow a better adjustment of the service provider’s offer [62].

In turn, the analysis of patient profiles (e.g. segmentation and predictive modeling) allows identification of people who should be subject to prophylaxis or prevention, or who should change their lifestyle [8]. A shortened list of benefits of Big Data Analytics in healthcare is presented in [3] and consists of: better performance, day-to-day guides, detection of diseases in early stages, predictive analytics, cost effectiveness, Evidence Based Medicine and effectiveness in patient treatment.

In summary, healthcare big data has great potential to transform healthcare: improving patients’ outcomes, predicting outbreaks of epidemics, providing valuable insights, avoiding preventable diseases, reducing the cost of healthcare delivery and improving quality of life in general [1]. Big Data also generates many challenges, such as difficulties in data capture, data storage, data analysis and data visualization [15]. The main challenges concern: data structure (Big Data should be user-friendly, transparent and menu-driven, but it is fragmented, dispersed, rarely standardized and difficult to aggregate and analyze), security (data security, privacy and the sensitivity of healthcare data, with significant concerns about confidentiality), data standardization (data is stored in formats that are not compatible with all applications and technologies), storage and transfers (especially the costs of securing, storing and transferring unstructured data), managerial skills such as data governance, a lack of appropriate analytical skills, and problems with real-time analytics (healthcare needs to be able to use Big Data in real time) [4, 34, 41].

The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities in Poland.

The presented research results are part of a larger questionnaire study on Big Data Analytics. The direct research was based on an interview questionnaire containing 100 questions on a 5-point Likert scale (1: strongly disagree, 2: rather disagree, 3: neither agree nor disagree, 4: rather agree, 5: definitely agree) and 4 metrics questions. The study was conducted in December 2018 on a sample of 217 medical facilities (110 private, 107 public). The research was conducted by a specialized market research agency: the Center for Research and Expertise of the University of Economics in Katowice.

As for the direct research, the selected entities included entities financed from public sources, i.e. the National Health Fund (23.5%), and entities operating commercially (11.5%). More than half of the surveyed entities (64.9%) are hybrid financed, from both public and commercial sources. The research sample is also diverse in terms of the size of the entities, defined by the number of employees. Medium-sized (10–50 employees, 34% of the sample) and large (51–250 employees, 27%) entities dominate the sample. The research was nationwide, and the entities included in the sample come from all of the voivodeships. The largest groups were entities from the Łódzkie (32%), Śląskie (18%) and Mazowieckie (18%) voivodeships, as these voivodeships have the largest number of medical institutions. Other regions of the country were represented by single units. The sample was selected by stratified random sampling: within the medical facilities database, groups of private and public medical facilities were identified, and the facilities to which the questionnaire was addressed were drawn from each of these groups. The analyses were performed using the GNU PSPP 0.10.2 software.
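
The stratified draw described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used by the research agency: the sampling frame, the facility identifiers and the `stratified_sample` helper are hypothetical, and only the per-stratum sample sizes (110 private, 107 public) come from the study.

```python
import random


def stratified_sample(frame, sizes, seed=0):
    """Draw a simple random sample of sizes[stratum] units from each stratum.

    frame: dict mapping stratum name -> list of unit identifiers (hypothetical).
    sizes: dict mapping stratum name -> number of units to draw.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {stratum: rng.sample(units, sizes[stratum])
            for stratum, units in frame.items()}


# Hypothetical sampling frame; only the sample sizes are taken from the study.
frame = {
    "private": [f"private-{i}" for i in range(1000)],
    "public": [f"public-{i}" for i in range(800)],
}
sample = stratified_sample(frame, {"private": 110, "public": 107})
print({stratum: len(units) for stratum, units in sample.items()})
```

Drawing separately within each ownership group guarantees that both groups are represented in the chosen proportions, which a single draw from the pooled database would not.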

The aim of the study was to determine whether medical facilities in Poland use Big Data Analytics and, if so, in which areas. The characteristics of the research sample are presented in Table 2.

The research is non-exhaustive because of the incomplete and uneven regional distribution of the sample, in which three voivodeships (Łódzkie, Mazowieckie and Śląskie) are overrepresented. The size of the research sample (217 entities) nevertheless allows the authors of the paper to formulate specific conclusions on the use of Big Data in the process of its management.

For the purpose of this paper, the following research hypotheses were formulated: (1) medical facilities in Poland are working on both structured and unstructured data; (2) medical facilities in Poland are moving towards data-based healthcare and its benefits.

The paper poses the following research questions and statements that coincide with the selected questions from the research questionnaire:

What types of data are used by the particular organization, whether structured or unstructured, and to what extent?

From what sources do medical facilities obtain data?

In which areas (clinical or business) do organizations use data and analytical systems?

Is data analytics performed based on historical data or are predictive analyses also performed?

Do administrative and medical staff receive complete, accurate and reliable data in a timely manner?

Are real-time analyses performed to support the particular organization’s activities?

Results and discussion

On the basis of the literature analysis and research study, a set of questions and statements related to the researched area was formulated. The survey results show that medical facilities use a variety of data sources in their operations. These sources include both structured and unstructured data (Table 3).

According to the data provided by the respondents, considering the first statement in the questionnaire, almost half of the medical institutions (47.58%) rather agreed that they collect and use structured data (e.g. databases and data warehouses, reports to external entities), and 10.57% entirely agreed with this statement. A further 23.35% of representatives of medical institutions answered “I neither agree nor disagree”. Of the remaining facilities, 7.93% rather disagreed that they collect and use structured data and 6.17% strongly disagreed. The median calculated from these results (median: 4) also indicates that medical facilities in Poland collect and use structured data (Table 4).

In turn, 28.19% of the medical institutions rather agreed that they collect and use unstructured data, and 9.25% entirely agreed with this statement. The share of representatives who answered “I neither agree nor disagree” was 27.31%. Of the remaining facilities, 17.18% rather disagreed that they collect and use unstructured data and 13.66% strongly disagreed. For unstructured data the median is 3, which means that medical facilities in Poland collect and use this type of data less than structured data.
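
The two medians can be checked directly from the reported shares. The sketch below assumes that the percentages in the text map onto Likert levels 1 to 5 in the order strongly disagree, rather disagree, neutral, rather agree, entirely agree; the `likert_median` helper is hypothetical. The reported shares sum to about 95.6% rather than 100%, so the median is taken over the answers that were given.

```python
def likert_median(shares):
    """Median of a 1..5 Likert item given the share of answers per level.

    shares: percentages for levels 1..5. They need not sum to 100
    (for example when some respondents gave no answer); the median is
    taken over the answers that were given. For an ordinal scale the
    first level whose cumulative share reaches one half is returned.
    """
    half = sum(shares) / 2
    cumulative = 0.0
    for level, share in enumerate(shares, start=1):
        cumulative += share
        if cumulative >= half:
            return level
    raise ValueError("shares must not be empty")


# Shares for levels 1..5 as reported in the text (assumed level mapping).
structured = [6.17, 7.93, 23.35, 47.58, 10.57]
unstructured = [13.66, 17.18, 27.31, 28.19, 9.25]

print(likert_median(structured))    # 4
print(likert_median(unstructured))  # 3
```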

In the next part of the analysis, it was checked whether the size of the medical facility and its form of ownership affect whether it collects and uses structured and unstructured data (Tables 4 and 5). To find this out, correlation coefficients were calculated.

Based on the calculations, it can be concluded that there is a weak but statistically significant monotonic correlation between the size of the medical facility and its collection and use of structured data (p < 0.001; τ = 0.16). This means that the use of structured data increases slightly in larger medical facilities. The size of the medical facility matters more for the use of unstructured data (p < 0.001; τ = 0.23) (Table 4).
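
For readers who want to see how a rank correlation such as the reported τ is obtained, here is a minimal pure-Python sketch of Kendall's tau-b, the variant that corrects for ties, which are unavoidable with size classes and Likert answers. The text does not state which tau variant was used, so tau-b is an assumption, and the data below are invented for illustration only.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal values."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: does not enter tau-b
        if dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + ties_x) * (untied + ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Invented example: facility size class (1..4) and a Likert answer (1..5).
size_class = [1, 1, 2, 2, 3, 3, 4, 4]
answer = [2, 3, 3, 3, 4, 3, 4, 5]
print(round(kendall_tau_b(size_class, answer), 2))
```

A value near 0.2, as in the study, means that larger facilities tend to give somewhat higher answers, but many pairs of facilities still go against that tendency.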

To determine whether the form of medical facility ownership affects data collection, the Mann–Whitney U test was used. The calculations show that the form of ownership does not affect what data the organization collects and uses (Table 5 ).
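
As an illustration of the test named above, the following sketch computes the Mann–Whitney U statistic with a two-sided p-value from the normal approximation, including the usual tie correction and omitting the continuity correction. It is not the PSPP implementation, the `mann_whitney_u` helper is hypothetical, and the two groups of Likert answers are invented.

```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(a, b):
    """Return (U, two-sided p) for two independent samples.

    Uses average ranks for ties and the normal approximation with tie
    correction (no continuity correction), so p is approximate.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(list(a) + list(b))

    # Assign every distinct value the average of the ranks it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = position + (t + 1) / 2
        position += t

    rank_sum_a = sum(rank_of[v] for v in a)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return min(u1, u2), p_value


# Invented Likert answers for two ownership groups.
public = [4, 3, 5, 4, 4, 2, 5, 3]
private = [3, 2, 4, 3, 1, 3, 2, 4]
u, p = mann_whitney_u(public, private)
print(u, round(p, 3))
```

A p-value above the chosen threshold (0.05 in the study) is read as no evidence that the two ownership groups differ on the item.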

Detailed information on the sources from which medical facilities collect and use data is presented in Table 6.

The questionnaire results show that medical facilities mainly use information published in databases, reports to external units and transaction data, but they also use unstructured data from e-mails, medical devices, sensors, phone calls, and audio and video data (Table 6). Data from social media, RFID and geolocation data are used to a small extent. Similar findings are reported in the literature.

The respondents’ answers show that more than half of the medical facilities have implemented an integrated hospital system (HIS): 43.61% use an integrated hospital system and 16.30% use it extensively (Table 7), while 19.38% of the examined medical facilities do not use it at all. Moreover, most of the examined medical facilities keep medical documentation in electronic form (34.80% use it and 32.16% use it extensively), which gives an opportunity to use data analytics. Only 4.85% of medical facilities do not use it at all.

Other questions that needed to be investigated were whether medical facilities in Poland use data analytics and, if so, in what form and in what areas (Table 8). The respondents’ answers about the potential of data analytics in medical facilities show that a similar share of medical facilities use data analytics in administration and business (31.72% agreed with statement no. 5 and 12.33% strongly agreed) as in the clinical area (33.04% agreed with statement no. 6 and 12.33% strongly agreed). When considering decision-making issues, 35.24% agree with the statement “the organization uses data and analytical systems to support business decisions” and 8.37% of respondents strongly agree. As many as 40.09% agree with the statement that “the organization uses data and analytical systems to support clinical decisions (in the field of diagnostics and therapy)” and 15.42% of respondents strongly agree. The examined medical facilities use both analytics based on historical data (33.48% agree with statement no. 7 and 12.78% strongly agree) and predictive analytics (33.04% agree with statement no. 8 and 15.86% strongly agree). Detailed results are presented in Table 8.

Medical facilities focus on development in the field of data processing: they confirm that they conduct analytical planning processes systematically and analyze new opportunities for the strategic use of analytics in business and clinical activities (38.33% rather agree and 10.57% strongly agree with this statement). The picture is less optimistic for real-time data analysis: only 28.19% rather agree and 14.10% strongly agree with the statement that real-time analyses are performed to support an organization’s activities.

When considering whether a facility’s performance in the clinical area depends on the form of ownership, it can be concluded, based on the mean and the Mann–Whitney U test, that it does. A higher degree of use of analyses in the clinical area can be observed in public institutions.

Whether a medical facility performs descriptive or predictive analyses does not depend on the form of ownership (p > 0.05). For the use of analytics in the clinical area, the mean and median are higher in public facilities than in private ones, and the Mann–Whitney U test shows that this variable depends on the form of ownership (p < 0.05) (Table 9).

When considering whether a facility’s performance in the clinical area depends on its size, Kendall’s tau (τ) indicates that it does (p < 0.001; τ = 0.22); the correlation is weak but statistically significant. This means that the use of data and analytical systems to support clinical decisions (in the field of diagnostics and therapy) increases with the size of the medical facility. A similar but even weaker relationship can be found for the use of descriptive and predictive analyses (Table 10).

Considering the results of research in the area of analytical maturity of medical facilities, 8.81% of medical facilities stated that they are at the first level of maturity, i.e. the organization has not developed analytical skills and does not perform analyses. A further 13.66% of medical facilities confirmed that they have poor analytical skills, while 38.33% placed themselves at level 3, meaning that “there is a lot to do in analytics”. On the other hand, 28.19% believe that their analytical capabilities are well developed, and 6.61% stated that analytics are at the highest level and the analytical capabilities are very well developed. Detailed data is presented in Table 11. The mean is 3.11 and the median is 3.
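
The reported mean can be reproduced from the five shares. Note that the shares sum to about 95.6%, so the sketch below normalizes by the share of answers actually given; that normalization is an assumption, but it is the one that yields the reported 3.11 (dividing by 100 would give roughly 2.97). The `likert_mean` helper is hypothetical.

```python
def likert_mean(shares):
    """Mean level of a Likert item from {level: share of answers in %}.

    The shares are normalized by their sum, so missing answers are
    excluded rather than counted as level 0.
    """
    answered = sum(shares.values())
    return sum(level * share for level, share in shares.items()) / answered


# Shares of facilities at maturity levels 1..5, as reported in the text.
maturity = {1: 8.81, 2: 13.66, 3: 38.33, 4: 28.19, 5: 6.61}
print(round(likert_mean(maturity), 2))  # 3.11
```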

The results of the research have enabled the formulation of the following conclusions. Medical facilities in Poland are working on both structured and unstructured data. This data comes from databases, transactions, unstructured content of emails and documents, devices and sensors. However, data from social media is used less. Facilities use analytics in the administrative and business areas as well as in the clinical area. Also, the decisions made are largely data-driven.

In summary, analysis of the literature shows that the benefits that medical facilities can gain from using Big Data Analytics in their activities relate primarily to patients, physicians and medical facilities. It can be confirmed that patients will be better informed, will receive treatments that work for them, will be prescribed medications that work for them and will not be given unnecessary medications [78]. The physician’s role will likely change to that of a consultant more than a decision maker. Physicians will advise, warn and help individual patients and will have more time to form positive and lasting relationships with their patients. Medical facilities will see changes as well, for example in fewer unnecessary hospitalizations, resulting initially in less revenue, but after the market adjusts, also the accomplishment [78]. The use of Big Data Analytics can revolutionize the way healthcare is practiced for better health and disease reduction.

The analysis of the latest data reveals that data analytics increases the accuracy of diagnoses. Physicians can use predictive algorithms to help them make more accurate diagnoses [45]. Moreover, it could be helpful in preventive medicine and public health because, with early intervention, many diseases can be prevented or ameliorated [29]. Predictive analytics also makes it possible to identify risk factors for a given patient, and with this knowledge patients will be able to change their lifestyles, which in turn may dramatically change population disease patterns and result in savings in medical costs. Moreover, personalized medicine is the best solution for an individual patient seeking treatment. It can help doctors decide the exact treatments for those individuals. Better diagnoses and more targeted treatments will naturally lead to more good outcomes and fewer resources used, including doctors’ time.

The quantitative analysis of the research carried out and presented in this article made it possible to determine whether medical facilities in Poland use Big Data Analytics and, if so, in which areas. The results obtained made it possible to formulate the following conclusions. Medical facilities are working on both structured and unstructured data, which comes from databases, transactions, unstructured content of emails and documents, devices and sensors. They use analytics in the administrative and business areas as well as in the clinical area. The study clearly showed that the decisions made are largely data-driven. The results of the study confirm what has been analyzed in the literature. Medical facilities are moving towards data-based healthcare and its benefits.

In conclusion, Big Data Analytics has the potential for positive impact and global implications in healthcare. Future research on the use of Big Data in medical facilities will concern the strategies adopted by medical facilities to promote and implement such solutions, the benefits they gain from the use of Big Data analysis, and how they see the prospects in this area.

Practical implications

This work sought to narrow the gap that exists in analyzing the possibility of using Big Data Analytics in healthcare. Showing how medical facilities in Poland are doing in this respect contributes to the global research carried out in this area, including [29, 32, 60].

Limitations and future directions

The research described in this article does not fully exhaust the questions related to the use of Big Data Analytics in Polish healthcare facilities. Only some of the dimensions characterizing the use of data by medical facilities in Poland have been examined. In order to get the full picture, it would be necessary to examine the results of using structured and unstructured data analytics in healthcare. Future research may examine the benefits that medical institutions achieve as a result of the analysis of structured and unstructured data in the clinical and management areas and what limitations they encounter in these areas. For this purpose, it is planned to conduct in-depth interviews with chosen medical facilities in Poland. These facilities could give additional data for empirical analyses based more on their suggestions. Further research should also include medical institutions from beyond the borders of Poland, enabling international comparative analyses.

Future research in the healthcare field has virtually endless possibilities. These include using Big Data Analytics to diagnose specific conditions [47, 66, 69, 76], to propose an approach that can be used in other healthcare applications, and to create mechanisms to identify “patients like me” [75, 80]. Big Data Analytics could also be used for studies related to the spread of pandemics, the efficacy of COVID-19 treatment [18, 79], or psychology and psychiatry studies, e.g. emotion recognition [35].

Availability of data and materials

The datasets for this study are available on request to the corresponding author.

References

Abouelmehdi K, Beni-Hessane A, Khaloufi H. Big healthcare data: preserving security and privacy. J Big Data. 2018. https://doi.org/10.1186/s40537-017-0110-7 .

Agrawal A, Choudhary A. Health services data: big data analytics for deriving predictive healthcare insights. Health Serv Eval. 2019. https://doi.org/10.1007/978-1-4899-7673-4_2-1 .

Al Mayahi S, Al-Badi A, Tarhini A. Exploring the potential benefits of big data analytics in providing smart healthcare. In: Miraz MH, Excell P, Ware A, Ali M, Soomro S, editors. Emerging technologies in computing—first international conference, iCETiC 2018, proceedings (Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST). Cham: Springer; 2018. p. 247–58. https://doi.org/10.1007/978-3-319-95450-9_21 .

Bainbridge M. Big data challenges for clinical and precision medicine. In: Househ M, Kushniruk A, Borycki E, editors. Big data, big challenges: a healthcare perspective: background, issues, solutions and research directions. Cham: Springer; 2019. p. 17–31.

Bartuś K, Batko K, Lorek P. Business intelligence systems: barriers during implementation. In: Jabłoński M, editor. Strategic performance management new concept and contemporary trends. New York: Nova Science Publishers; 2017. p. 299–327. ISBN: 978-1-53612-681-5.

Bartuś K, Batko K, Lorek P. Diagnoza wykorzystania big data w organizacjach-wybrane wyniki badań. Informatyka Ekonomiczna. 2017;3(45):9–20.

Bartuś K, Batko K, Lorek P. Wykorzystanie rozwiązań business intelligence, competitive intelligence i big data w przedsiębiorstwach województwa śląskiego. Przegląd Organizacji. 2018;2:33–9.

Batko K. Możliwości wykorzystania Big Data w ochronie zdrowia. Roczniki Kolegium Analiz Ekonomicznych. 2016;42:267–82.

Bi Z, Cochran D. Big data analytics with applications. J Manag Anal. 2014;1(4):249–65. https://doi.org/10.1080/23270012.2014.992985 .

Boerma T, Requejo J, Victora CG, Amouzou A, Asha G, Agyepong I, Borghi J. Countdown to 2030: tracking progress towards universal coverage for reproductive, maternal, newborn, and child health. Lancet. 2018;391(10129):1538–48.

Bollier D, Firestone CM. The promise and peril of big data. Washington, D.C: Aspen Institute, Communications and Society Program; 2010. p. 1–66.

Bose R. Competitive intelligence process and tools for intelligence analysis. Ind Manag Data Syst. 2008;108(4):510–28.

Carter P. Big data analytics: future architectures, skills and roadmaps for the CIO: in white paper, IDC sponsored by SAS. 2011. p. 1–16.

Castro EM, Van Regenmortel T, Vanhaecht K, Sermeus W, Van Hecke A. Patient empowerment, patient participation and patient-centeredness in hospital care: a concept analysis based on a literature review. Patient Educ Couns. 2016;99(12):1923–39.

Chen H, Chiang RH, Storey VC. Business intelligence and analytics: from big data to big impact. MIS Q. 2012;36(4):1165–88.

Chen CP, Zhang CY. Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci. 2014;275:314–47.

Chomiak-Orsa I, Mrozek B. Główne perspektywy wykorzystania big data w mediach społecznościowych. Informatyka Ekonomiczna. 2017;3(45):44–54.

Corsi A, de Souza FF, Pagani RN, et al. Big data analytics as a tool for fighting pandemics: a systematic review of literature. J Ambient Intell Hum Comput. 2021;12:9163–80. https://doi.org/10.1007/s12652-020-02617-4 .

Davenport TH, Harris JG. Competing on analytics, the new science of winning. Boston: Harvard Business School Publishing Corporation; 2007.

Davenport TH. Big data at work: dispelling the myths, uncovering the opportunities. Boston: Harvard Business School Publishing; 2014.

De Cnudde S, Martens D. Loyal to your city? A data mining analysis of a public service loyalty program. Decis Support Syst. 2015;73:74–84.

Erickson S, Rothberg H. Data, information, and intelligence. In: Rodriguez E, editor. The analytics process. Boca Raton: Auerbach Publications; 2017. p. 111–26.

Fang H, Zhang Z, Wang CJ, Daneshmand M, Wang C, Wang H. A survey of big data research. IEEE Netw. 2015;29(5):6–9.

Fredriksson C. Organizational knowledge creation with big data. A case study of the concept and practical use of big data in a local government context. 2016. https://www.abo.fi/fakultet/media/22103/fredriksson.pdf .

Gandomi A, Haider M. Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag. 2015;35(2):137–44.

Groves P, Kayyali B, Knott D, Van Kuiken S. The ‘big data’ revolution in healthcare. Accelerating value and innovation. 2015. http://www.pharmatalents.es/assets/files/Big_Data_Revolution.pdf (Reading: 10.04.2019).

Gupta V, Rathmore N. Deriving business intelligence from unstructured data. Int J Inf Comput Technol. 2013;3(9):971–6.

Gupta V, Singh VK, Ghose U, Mukhija P. A quantitative and text-based characterization of big data research. J Intell Fuzzy Syst. 2019;36:4659–75.

Hampel HOBS, O’Bryant SE, Castrillo JI, Ritchie C, Rojkova K, Broich K, Escott-Price V. PRECISION MEDICINE-the golden gate for detection, treatment and prevention of Alzheimer’s disease. J Prev Alzheimer’s Dis. 2016;3(4):243.

Harerimana GB, Jang J, Kim W, Park HK. Health big data analytics: a technology survey. IEEE Access. 2018;6:65661–78. https://doi.org/10.1109/ACCESS.2018.2878254 .

Hu H, Wen Y, Chua TS, Li X. Toward scalable systems for big data analytics: a technology tutorial. IEEE Access. 2014;2:652–87.

Hussain S, Hussain M, Afzal M, Hussain J, Bang J, Seung H, Lee S. Semantic preservation of standardized healthcare documents in big data. Int J Med Inform. 2019;129:133–45. https://doi.org/10.1016/j.ijmedinf.2019.05.024 .

Islam MS, Hasan MM, Wang X, Germack H. A systematic review on healthcare analytics: application and theoretical perspective of data mining. In: Healthcare. Basel: Multidisciplinary Digital Publishing Institute; 2018. p. 54.

Ismail A, Shehab A, El-Henawy IM. Healthcare analysis in smart big data analytics: reviews, challenges and recommendations. In: Security in smart cities: models, applications, and challenges. Cham: Springer; 2019. p. 27–45.

Jain N, Gupta V, Shubham S, et al. Understanding cartoon emotion using integrated deep neural network on large dataset. Neural Comput Appl. 2021. https://doi.org/10.1007/s00521-021-06003-9 .

Janssen M, van der Voort H, Wahyudi A. Factors influencing big data decision-making quality. J Bus Res. 2017;70:338–45.

Jordan SR. Beneficence and the expert bureaucracy. Public Integr. 2014;16(4):375–94. https://doi.org/10.2753/PIN1099-9922160404 .

Knapp MM. Big data. J Electron Resourc Med Libr. 2013;10(4):215–22.

Koti MS, Alamma BH. Predictive analytics techniques using big data for healthcare databases. In: Smart intelligent computing and applications. New York: Springer; 2019. p. 679–86.

Krumholz HM. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff. 2014;33(7):1163–70.

Kruse CS, Goswamy R, Raval YJ, Marawi S. Challenges and opportunities of big data in healthcare: a systematic review. JMIR Med Inform. 2016;4(4):e38.

Kyoungyoung J, Gang HK. Potentiality of big data in the medical sector: focus on how to reshape the healthcare system. Healthc Inform Res. 2013;19(2):79–85.

Laney D. Application delivery strategies 2011. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf .

Lee IK, Wang CC, Lin MC, Kung CT, Lan KC, Lee CT. Effective strategies to prevent coronavirus disease-2019 (COVID-19) outbreak in hospital. J Hosp Infect. 2020;105(1):102.

Lerner I, Veil R, Nguyen DP, Luu VP, Jantzen R. Revolution in health care: how will data science impact doctor-patient relationships? Front Public Health. 2018;6:99.

Lytras MD, Papadopoulou P, editors. Applying big data analytics in bioinformatics and medicine. IGI Global: Hershey; 2017.

Ma K, et al. Big data in multiple sclerosis: development of a web-based longitudinal study viewer in an imaging informatics-based eFolder system for complex data analysis and management. In: Proceedings volume 9418, medical imaging 2015: PACS and imaging informatics: next generation and innovations. 2015. p. 941809. https://doi.org/10.1117/12.2082650 .

Mach-Król M. Analiza i strategia big data w organizacjach. In: Studia i Materiały Polskiego Stowarzyszenia Zarządzania Wiedzą. 2015;74:43–55.

Madsen LB. Data-driven healthcare: how analytics and BI are transforming the industry. Hoboken: Wiley; 2014.

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Hung BA. Big data: the next frontier for innovation, competition, and productivity. Washington: McKinsey Global Institute; 2011.

Marconi K, Dobra M, Thompson C. The use of big data in healthcare. In: Liebowitz J, editor. Big data and business analytics. Boca Raton: CRC Press; 2012. p. 229–48.

Mehta N, Pandit A. Concurrence of big data analytics and healthcare: a systematic review. Int J Med Inform. 2018;114:57–65.

Michel M, Lupton D. Toward a manifesto for the ‘public understanding of big data.’ Public Underst Sci. 2016;25(1):104–16. https://doi.org/10.1177/0963662515609005 .

Mikalef P, Krogstie J. Big data analytics as an enabler of process innovation capabilities: a configurational approach. In: International conference on business process management. Cham: Springer; 2018. p. 426–41.



Acknowledgements

We would like to thank those who have touched our science paths.

This research was fully funded as statutory activity—subsidy of Ministry of Science and Higher Education granted for Technical University of Czestochowa on maintaining research potential in 2018. Research Number: BS/PB–622/3020/2014/P. Publication fee for the paper was financed by the University of Economics in Katowice.

Author information

Authors and affiliations.

Department of Business Informatics, University of Economics in Katowice, Katowice, Poland

Kornelia Batko

Department of Biomedical Processes and Systems, Institute of Health and Nutrition Sciences, Częstochowa University of Technology, Częstochowa, Poland

Andrzej Ślęzak


Contributions

KB proposed the research concept and its design and prepared the manuscript in consultation with AŚ, including the definition of intellectual content, literature search, data acquisition, and data analysis. AŚ reviewed and refined the manuscript and obtained research funding. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Kornelia Batko .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Batko, K., Ślęzak, A. The use of Big Data Analytics in healthcare. J Big Data 9, 3 (2022). https://doi.org/10.1186/s40537-021-00553-4


Received: 28 August 2021

Accepted: 19 December 2021

Published: 06 January 2022

DOI: https://doi.org/10.1186/s40537-021-00553-4


  • Big Data Analytics
  • Data-driven healthcare

the systematic analysis of large databases to solve problems and make informed decisions

  • Open access
  • Published: 06 December 2017

Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study

  • Wichor M. Bramer 1 ,
  • Melissa L. Rethlefsen 2 ,
  • Jos Kleijnen 3 , 4 &
  • Oscar H. Franco 5  

Systematic Reviews volume 6, Article number: 245 (2017)


Within systematic reviews, when searching for relevant references, it is advisable to use multiple databases. However, searching databases is laborious and time-consuming, as the syntax of search strategies is database specific. We aimed to determine the optimal combination of databases needed to conduct efficient searches in systematic reviews and whether the current practice in published reviews is appropriate. While previous studies determined the coverage of databases, we analyzed the actual retrieval from the original searches for systematic reviews.

Since May 2013, the first author prospectively recorded results from systematic review searches that he performed at his institution. PubMed was used to identify systematic reviews published using our search strategy results. For each published systematic review, we extracted the references of the included studies. Using the prospectively recorded results and the studies included in the publications, we calculated recall, precision, and number needed to read for single databases and databases in combination. We assessed the frequency at which databases and combinations would achieve varying levels of recall (e.g., 95%). For a sample of 200 recently published systematic reviews, we calculated how many had used enough databases to ensure 95% recall.

A total of 58 published systematic reviews were included, totaling 1746 relevant references identified by our database searches, while 84 included references had been retrieved by other search methods. Sixteen percent of the included references (291 articles) were only found in a single database; Embase produced the most unique references (n = 132). The combination of Embase, MEDLINE, Web of Science Core Collection, and Google Scholar performed best, achieving an overall recall of 98.3% and reaching 100% recall in 72% of systematic reviews. We estimate that 60% of published systematic reviews do not retrieve 95% of all available relevant references as many fail to search important databases. Other specialized databases, such as CINAHL or PsycINFO, add unique references to some reviews where the topic of the review is related to the focus of the database.

Conclusions

Optimal searches in systematic reviews should cover, at a minimum, Embase, MEDLINE, Web of Science, and Google Scholar to guarantee adequate and efficient coverage.


Investigators and information specialists searching for relevant references for a systematic review (SR) are generally advised to search multiple databases and to use additional methods to be able to adequately identify all literature related to the topic of interest [ 1 , 2 , 3 , 4 , 5 , 6 ]. The Cochrane Handbook, for example, recommends the use of at least MEDLINE and Cochrane Central and, when available, Embase for identifying reports of randomized controlled trials [ 7 ]. There are disadvantages to using multiple databases. It is laborious for searchers to translate a search strategy into multiple interfaces and search syntaxes, as field codes and proximity operators differ between interfaces. Differences in thesaurus terms between databases add another significant burden for translation. Furthermore, it is time-consuming for reviewers, who have to screen more, and likely irrelevant, titles and abstracts. Lastly, access to databases is often limited and only available on a subscription basis.

Previous studies have investigated the added value of different databases on different topics [ 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 ]. Some concluded that searching only one database can be sufficient as searching other databases has no effect on the outcome [ 16 , 17 ]. Others, nevertheless, have concluded that a single database is not sufficient to retrieve all references for systematic reviews [ 18 , 19 ]. Most articles on this topic draw their conclusions based on the coverage of databases [ 14 ]. A recent paper tried to find an acceptable number needed to read for adding an additional database; sadly, no true conclusion could be drawn [ 20 ]. However, whether an article is present in a database may not translate to being found by a search in that database. Because of this major limitation, the question of which databases are necessary to retrieve all relevant references for a systematic review remains unanswered. Therefore, we investigated the probability that single databases or various combinations of databases retrieve the most relevant references in a systematic review by studying actual retrieval in various databases.

The aim of our research is to determine the combination of databases needed for systematic review searches to provide efficient results (i.e., to minimize the burden for the investigators without reducing the validity of the research by missing relevant references). A secondary aim is to investigate the current practice of databases searched for published reviews. Are included references being missed because the review authors failed to search a certain database?

Development of search strategies

At Erasmus MC, search strategies for systematic reviews are often designed via a librarian-mediated search service. The information specialists of Erasmus MC developed an efficient method that helps them perform searches in many databases in a much shorter time than other methods. This method of literature searching and a pragmatic evaluation thereof are published in separate journal articles [ 21 , 22 ]. In short, the method consists of an efficient way to combine thesaurus terms and title/abstract terms into a single line search strategy. This search is then optimized. Articles that are indexed with a set of identified thesaurus terms, but do not contain the current search terms in title or abstract, are screened to discover potential new terms. New candidate terms are added to the basic search and evaluated. Once optimal recall is achieved, macros are used to translate the search syntaxes between databases, though manual adaptation of the thesaurus terms is still necessary.

Review projects at Erasmus MC cover a wide range of medical topics, from therapeutic effectiveness and diagnostic accuracy to ethics and public health. In general, searches are developed in MEDLINE in Ovid (Ovid MEDLINE® In-Process & Other Non-Indexed Citations, Ovid MEDLINE® Daily and Ovid MEDLINE®, from 1946); Embase.com (searching both Embase and MEDLINE records, with full coverage including Embase Classic); the Cochrane Central Register of Controlled Trials (CENTRAL) via the Wiley Interface; Web of Science Core Collection (hereafter called Web of Science); PubMed restricting to records in the subset “as supplied by publisher” to find references that are not yet indexed in MEDLINE (using the syntax publisher [sb]); and Google Scholar. In general, we use the first 200 references as sorted in the relevance ranking of Google Scholar. When the number of references from other databases was low, we expected the total number of potential relevant references to be low. In this case, the number of hits from Google Scholar was limited to 100. When the overall number of hits was low, we additionally searched Scopus, and when appropriate for the topic, we included CINAHL (EBSCOhost), PsycINFO (Ovid), and SportDiscus (EBSCOhost) in our search.

Beginning in May 2013, the number of records retrieved from each search for each database was recorded at the moment of searching. The complete results from all databases used for each of the systematic reviews were imported into a unique EndNote library upon search completion and saved without deduplication for this research. The researchers that requested the search received a deduplicated EndNote file from which they selected the references relevant for inclusion in their systematic review. All searches in this study were developed and executed by W.M.B.

Determining relevant references of published reviews

We searched PubMed in July 2016 for all reviews published since 2014 where first authors were affiliated to Erasmus MC, Rotterdam, the Netherlands, and matched those with search registrations performed by the medical library of Erasmus MC. This search was used in earlier research [ 21 ]. Published reviews were included if the search strategies and results had been documented at the time of the last update and if, at minimum, the databases Embase, MEDLINE, Cochrane CENTRAL, Web of Science, and Google Scholar had been used in the review. From the published journal article, we extracted the list of final included references. We documented the department of the first author. To categorize the types of patient/population and intervention, we identified broad MeSH terms relating to the most important disease and intervention discussed in the article. We copied from the MeSH tree the top MeSH term directly below the disease category or, in the case of the intervention, directly below the therapeutics MeSH term. We selected the domain from a pre-defined set of broad domains, including therapy, etiology, epidemiology, diagnosis, management, and prognosis. Lastly, we checked whether the reviews described limiting their included references to a particular study design.

To identify whether our searches had found the included references, and if so, from which database(s) that citation was retrieved, each included reference was located in the original corresponding EndNote library using the first author name combined with the publication year as a search term for each specific relevant publication. If this resulted in extraneous results, the search was subsequently limited using a distinct part of the title or a second author name. Based on the record numbers of the search results in EndNote, we determined from which database these references came. If an included reference was not found in the EndNote file, we presumed the authors used an alternative method of identifying the reference (e.g., examining cited references, contacting prominent authors, or searching gray literature), and we did not include it in our analysis.
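The lookup described above can be sketched in a few lines. This is only an illustration of the matching logic, not the authors' EndNote workflow: the `Record` structure, its field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One record in a non-deduplicated library (hypothetical structure)."""
    record_number: int
    first_author: str
    year: int
    title: str
    database: str  # database the record was exported from


def locate(library, first_author, year, title_fragment=None):
    """Match on first author and year; optionally narrow by a title fragment."""
    hits = [r for r in library
            if r.first_author.lower() == first_author.lower() and r.year == year]
    if title_fragment is not None:
        hits = [r for r in hits if title_fragment.lower() in r.title.lower()]
    return hits


def source_databases(library, first_author, year, title_fragment=None):
    """Databases that retrieved the reference; an empty list means 'not found'."""
    return sorted({r.database
                   for r in locate(library, first_author, year, title_fragment)})


# Hypothetical sample library: the same article exported from two databases,
# plus an unrelated article by the same author in the same year.
library = [
    Record(1, "Smith", 2015, "Meditation for chronic pain", "Embase"),
    Record(2, "Smith", 2015, "Meditation for chronic pain", "MEDLINE"),
    Record(3, "Smith", 2015, "Statins and stroke risk", "Embase"),
]
```

Here `source_databases(library, "Smith", 2015, "meditation")` returns the two databases that retrieved the article, while a reference with no match returns an empty list and would be treated as found by another method.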

Data analysis

We determined the databases that contributed most to the reviews by the number of unique references retrieved by each database used in the reviews. Unique references were included articles that had been found by only one database search. Those databases that contributed the most unique included references were then considered candidate databases to determine the most optimal combination of databases in the further analyses.
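As a minimal sketch of this counting step, assume each included reference is mapped to the set of databases whose search retrieved it. The mapping and the reference identifiers below are invented for illustration; the study itself worked from EndNote libraries and Excel.

```python
from collections import Counter


def unique_counts(found_in):
    """Count, per database, the references retrieved by that database only.

    found_in: mapping of reference id -> set of databases that retrieved it.
    """
    counts = Counter()
    for databases in found_in.values():
        if len(databases) == 1:
            # The reference is "unique": exactly one database found it.
            counts[next(iter(databases))] += 1
    return counts


# Hypothetical example data.
found_in = {
    "ref1": {"Embase", "MEDLINE"},
    "ref2": {"Embase"},
    "ref3": {"Google Scholar"},
    "ref4": {"Embase"},
}
candidates = unique_counts(found_in).most_common()
```

Sorting with `most_common()` puts the databases that contribute the most unique references first, which is how the candidate databases for the combination analysis would be picked.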

In Excel, we calculated the performance of each individual database and various combinations. Performance was measured using recall, precision, and number needed to read. See Table  1 for definitions of these measures. These values were calculated both for all reviews combined and per individual review.

Performance of a search can be expressed in different ways. Depending on the goal of the search, different measures may be optimized. In the case of a clinical question, precision is most important, as a practicing clinician does not have a lot of time to read through many articles in a clinical setting. When searching for a systematic review, recall is the most important aspect, as the researcher does not want to miss any relevant references. As our research is performed on systematic reviews, the main performance measure is recall.
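Table 1 is not reproduced in this text, so the sketch below uses the conventional definitions of the three measures and should be read as an assumption: recall is the share of included references that the search retrieved, precision is the share of retrieved records that were included, and number needed to read is the reciprocal of precision. The numbers in the example are made up.

```python
def recall(included_retrieved, included_total):
    """Share of all included references that the search retrieved."""
    return included_retrieved / included_total


def precision(included_retrieved, total_retrieved):
    """Share of retrieved records that ended up as included references."""
    return included_retrieved / total_retrieved


def number_needed_to_read(included_retrieved, total_retrieved):
    """Records a reviewer screens, on average, per included reference (1 / precision)."""
    return total_retrieved / included_retrieved


# Hypothetical review: 30 included references, of which the search found 27
# among 1500 retrieved records.
r = recall(27, 30)
p = precision(27, 1500)
nnr = number_needed_to_read(27, 1500)
```

Because number needed to read is the inverse of precision, any change that lowers precision (for example, adding a database that returns many irrelevant records) raises the screening burden proportionally.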

We identified all included references that were uniquely identified by a single database. For the databases that retrieved the most unique included references, we calculated the number of references retrieved (after deduplication) and the number of included references that had been retrieved by all possible combinations of these databases, in total and per review. For all individual reviews, we determined the median recall, the minimum recall, and the percentage of reviews for which each single database or combination retrieved 100% recall.

For each review that we investigated, we determined what the recall was for all possible different database combinations of the most important databases. Based on these, we determined the percentage of reviews where that database combination had achieved 100% recall, more than 95%, more than 90%, and more than 80%. Based on the number of results per database both before and after deduplication as recorded at the time of searching, we calculated the ratio between the total number of results and the number of results for each database and combination.
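The per-review analysis can be sketched as follows. The data layout (one mapping per review from included reference to the databases that retrieved it) and the two toy reviews are assumptions for illustration; the threshold comparison uses "at least", which is one reasonable reading of the cut-offs described above.

```python
from itertools import combinations

DATABASES = ["Embase", "MEDLINE", "Web of Science", "Google Scholar"]


def all_combinations(databases):
    """Yield every non-empty combination of the given databases."""
    for size in range(1, len(databases) + 1):
        yield from combinations(databases, size)


def combo_recall(review, combo):
    """Recall of a combination for one review.

    review: mapping of included reference id -> set of databases that retrieved it.
    """
    wanted = set(combo)
    found = sum(1 for databases in review.values() if databases & wanted)
    return found / len(review)


def share_reaching(reviews, combo, threshold):
    """Fraction of reviews in which the combination reaches at least `threshold` recall."""
    hits = sum(1 for review in reviews if combo_recall(review, combo) >= threshold)
    return hits / len(reviews)


# Two hypothetical reviews.
review_a = {
    "r1": {"Embase"},
    "r2": {"MEDLINE", "Embase"},
    "r3": {"Google Scholar"},
    "r4": {"Embase", "Web of Science"},
}
review_b = {"r1": {"MEDLINE"}, "r2": {"Embase"}}
reviews = [review_a, review_b]
```

With four candidate databases there are 15 non-empty combinations, and `share_reaching(reviews, combo, 0.95)` can be evaluated for each of them to produce the kind of per-threshold percentages reported in the results.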

Improvement of precision was calculated as the ratio between the original precision from the searches in all databases and the precision for each database and combination.
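A small sketch of the two ratios follows. The text does not state unambiguously which quantity is the numerator, so the direction chosen here (combination divided by all databases) is an assumption, as are the function names and the example numbers.

```python
def result_ratio(combo_results, all_results):
    """Results for a database combination relative to results from all databases."""
    return combo_results / all_results


def precision_improvement(combo_included, combo_results, all_included, all_results):
    """Precision of a combination relative to the precision of searching all databases.

    Values above 1 mean the combination is more precise than the full search.
    """
    combo_precision = combo_included / combo_results
    overall_precision = all_included / all_results
    return combo_precision / overall_precision


# Hypothetical review: all databases returned 2000 records containing 40 included
# references; a smaller combination returned 1200 records containing 38 of them.
ratio = result_ratio(1200, 2000)
gain = precision_improvement(38, 1200, 40, 2000)
```

In this made-up case the combination screens 60% of the records and is about 1.6 times as precise, at the price of missing two included references.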

To compare our practice of database usage in systematic reviews against current practice as evidenced in the literature, we analyzed a set of 200 recent systematic reviews from PubMed. On 5 January 2017, we searched PubMed for articles with the phrase “systematic review” in the title. Starting with the most recent articles, we determined the databases searched either from the abstract or from the full text until we had data for 200 reviews. For the individual databases and combinations that were used in those reviews, we multiplied the frequency of occurrence in that set of 200 with the probability that the database or combination would lead to an acceptable recall (which we defined at 95%) that we had measured in our own data.

Our earlier research had resulted in 206 systematic reviews published between 2014 and July 2016, in which the first author was affiliated with Erasmus MC [ 21 ]. In 73 of these, the searches and results had been documented by the first author of this article at the time of the last search. Of those, 15 could not be included in this research, since they had not searched all databases we investigated here. Therefore, for this research, a total of 58 systematic reviews were analyzed. The references to these reviews can be found in Additional file 1 . An overview of the broad topical categories covered in these reviews is given in Table  2 . Many of the reviews were initiated by members of the departments of surgery and epidemiology. The reviews covered a wide variety of diseases, none of which was present in more than 12% of the reviews. The interventions were mostly from the chemicals and drugs category, or surgical procedures. Over a third of the reviews were therapeutic, while slightly under a quarter answered an etiological question. Most reviews did not limit their included references to certain study designs; 9% were limited to RCTs only, and another 9% to other study types.

Together, these reviews included a total of 1830 references. Of these, 84 references (4.6%) had not been retrieved by our database searches and were not included in our analysis, leaving in total 1746 references. In our analyses, we combined the results from MEDLINE in Ovid and PubMed (the subset as supplied by publisher) into one database labeled MEDLINE.

Unique references per database

A total of 292 (17%) references were found by only one database. Table  3 displays the number of unique results retrieved for each single database. Embase retrieved the most unique included references, followed by MEDLINE, Web of Science, and Google Scholar. Cochrane CENTRAL is absent from the table, as for the five reviews limited to randomized trials, it did not add any unique included references. Subject-specific databases such as CINAHL, PsycINFO, and SportDiscus only retrieved additional included references when the topic of the review was directly related to their special content (nursing, psychiatry, and sports medicine, respectively).

Overall performance

The four databases that had retrieved the most unique references (Embase, MEDLINE, Web of Science, and Google Scholar) were investigated individually and in all possible combinations (see Table  4 ). Of the individual databases, Embase had the highest overall recall (85.9%). Of the combinations of two databases, Embase and MEDLINE had the best results (92.8%). Embase and MEDLINE combined with either Google Scholar or Web of Science scored similarly well on overall recall (95.9%). However, the combination with Google Scholar had a higher precision and higher median recall, a higher minimum recall, and a higher proportion of reviews that retrieved all included references. Using both Web of Science and Google Scholar in addition to MEDLINE and Embase increased the overall recall to 98.3%. The higher recall from adding extra databases came at a cost in number needed to read (NNR). Searching only Embase produced an NNR of 57 on average, whereas, for the optimal combination of four databases, the NNR was 73.

Probability of appropriate recall

We calculated the recall for individual databases and databases in all possible combinations for all reviews included in the research. Figure  1 shows the percentages of reviews where a certain database combination led to a certain recall. For example, in 48% of all systematic reviews, the combination of Embase and MEDLINE (with or without Cochrane CENTRAL; Cochrane CENTRAL did not add unique relevant references) reaches a recall of at least 95%. In 72% of studied systematic reviews, the combination of Embase, MEDLINE, Web of Science, and Google Scholar retrieved all included references. In the top bar, we present the results of the complete database searches relative to the total number of included references. This shows that many database searches missed relevant references.

Fig. 1. Percentage of systematic reviews for which a certain database combination reached a certain recall. The x-axis represents the percentage of reviews for which a specific combination of databases, as shown on the y-axis, reached a certain recall (represented with bar colors). Abbreviations: EM Embase, ML MEDLINE, WoS Web of Science, GS Google Scholar. Asterisk indicates that the recall of all databases has been calculated over all included references. The recall of the database combinations was calculated over all included references retrieved by any database.

Differences between domains of reviews

We analyzed whether the added value of Web of Science and Google Scholar was dependent on the domain of the review. For 55 reviews, we determined the domain. See Fig.  2 for the comparison of the recall of Embase, MEDLINE, and Cochrane CENTRAL per review for all identified domains. For all but one domain, the traditional combination of Embase, MEDLINE, and Cochrane CENTRAL did not retrieve enough included references. For four out of five systematic reviews that limited to randomized controlled trials (RCTs) only, the traditional combination retrieved 100% of all included references. However, for one review of this domain, the recall was 82%. Of the 11 references included in this review, one was found only in Google Scholar and one only in Web of Science.

Fig. 2. Percentage of systematic reviews of a certain domain for which the combination Embase, MEDLINE and Cochrane CENTRAL reached a certain recall

Reduction in number of results

We calculated the ratio between the number of results found when searching all databases, including databases not included in our analyses, such as Scopus, PsycINFO, and CINAHL, and the number of results found searching a selection of databases. See Fig.  3 for the legend of the plots in Figs.  4 and 5 . Figure  4 shows the distribution of this value for individual reviews. The database combinations with the highest recall did not reduce the total number of results by large margins. Moreover, in combinations where the number of results was greatly reduced, the recall of included references was lower.

Fig. 3. Legend of Figs. 4 and 5

Fig. 4. The ratio between number of results per database combination and the total number of results for all databases

Fig. 5. The ratio between precision per database combination and the total precision for all databases

Improvement of precision

To determine how searching multiple databases affected precision, we calculated for each combination the ratio between the original precision, observed when all databases were searched, and the precision calculated for different database combinations. Figure  5 shows the improvement of precision for 15 databases and database combinations. Because precision is defined as the number of relevant references divided by the number of total results, we see a strong correlation with the total number of results.

Status of current practice of database selection

From a set of 200 recent SRs identified via PubMed, we analyzed the databases that had been searched. Almost all reviews (97%) reported a search in MEDLINE. Other databases that we identified as essential for good recall were searched much less frequently; Embase was searched in 61% and Web of Science in 35%, and Google Scholar was only used in 10% of all reviews. For all individual databases or combinations of the four important databases from our research (MEDLINE, Embase, Web of Science, and Google Scholar), we multiplied the frequency of occurrence of that combination in the random set, with the probability we found in our research that this combination would lead to an acceptable recall of 95%. The calculation is shown in Table  5 . For example, around a third of the reviews (37%) relied on the combination of MEDLINE and Embase. Based on our findings, this combination achieves acceptable recall about half the time (47%). This implies that 17% of the reviews in the PubMed sample would have achieved an acceptable recall of 95%. The sum of all these values is the total probability of acceptable recall in the random sample. Based on these calculations, we estimate that the probability that this random set of reviews retrieved more than 95% of all possible included references was 40%. Using similar calculations, also shown in Table  5 , we estimated the probability that 100% of relevant references were retrieved is 23%.
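The estimate described above is a weighted sum: for each database combination, the share of sampled reviews that used it is multiplied by the probability that the combination reaches the target recall, and the products are added up. The sketch below shows only the one row quoted in the text (MEDLINE plus Embase); the remaining rows would have to be taken from Table 5, which is not reproduced here, and the function name is hypothetical.

```python
def probability_of_acceptable_recall(rows):
    """Sum of (share of reviews using a combination) x (probability it reaches the target).

    rows: iterable of (usage_share, probability) pairs, both as fractions of 1.
    """
    return sum(usage_share * probability for usage_share, probability in rows)


# Only the row given in the text: 37% of sampled reviews used MEDLINE + Embase,
# which reached 95% recall in 47% of the reviews studied.
medline_embase = (0.37, 0.47)
contribution = probability_of_acceptable_recall([medline_embase])
```

The single contribution is 0.37 × 0.47 ≈ 0.17, matching the 17% quoted above; adding the corresponding products for every other combination in the sample gives the overall estimate.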

Our study shows that, to reach maximum recall, searches in systematic reviews ought to include a combination of databases. To ensure adequate performance in searches (i.e., recall, precision, and number needed to read), we find that literature searches for a systematic review should, at minimum, be performed in the combination of the following four databases: Embase, MEDLINE (including Epub ahead of print), Web of Science Core Collection, and Google Scholar. Using that combination, 93% of the systematic reviews in our study obtained levels of recall that could be considered acceptable (> 95%). Unique results from specialized databases that closely match systematic review topics, such as PsycINFO for reviews in the fields of behavioral sciences and mental health or CINAHL for reviews on the topics of nursing or allied health, indicate that specialized databases should be used additionally when appropriate.

We find that Embase is critical for acceptable recall in a review and should always be searched for medically oriented systematic reviews. However, Embase is only accessible via a paid subscription, which generally makes it challenging for review teams not affiliated with academic medical centers to access. The highest scoring database combination without Embase is a combination of MEDLINE, Web of Science, and Google Scholar, but that reaches satisfactory recall for only 39% of all investigated systematic reviews, while still requiring a paid subscription to Web of Science. Of the five reviews that included only RCTs, four reached 100% recall if MEDLINE, Web of Science, and Google Scholar combined were complemented with Cochrane CENTRAL.

The Cochrane Handbook recommends searching MEDLINE, Cochrane CENTRAL, and Embase for systematic reviews of RCTs. For reviews in our study that included RCTs only, indeed, this recommendation was sufficient for four (80%) of the reviews. The one review where it was insufficient was about alternative medicine, specifically meditation and relaxation therapy, where one of the missed studies was published in the Indian Journal of Positive Psychology. The other study, from the Journal of Advanced Nursing, is indexed in MEDLINE and Embase but was only retrieved because of the addition of KeyWords Plus in Web of Science. We estimate that more than 50% of reviews that include more study types than RCTs would miss more than 5% of included references if only the traditional combination of MEDLINE, Embase, and Cochrane CENTRAL is searched.

We are aware that the Cochrane Handbook [ 7 ] recommends more than only these databases, but further recommendations focus on regional and specialized databases. Though we occasionally used the regional databases LILACS and SciELO in our reviews, they did not provide unique references in our study. Subject-specific databases like PsycINFO only added unique references to a small percentage of systematic reviews when they had been used for the search. The third key database we identified in this research, Web of Science, is only mentioned as a citation index in the Cochrane Handbook, not as a bibliographic database. To our surprise, Cochrane CENTRAL did not identify any unique included studies that had not been retrieved by the other databases, not even for the five reviews focusing entirely on RCTs. If Erasmus MC authors had conducted more reviews that included only RCTs, Cochrane CENTRAL might have added more unique references.

MEDLINE did find unique references that had not been found in Embase, although our searches in Embase included all MEDLINE records. This is likely caused by differences in the thesaurus terms that were added, but further analysis would be required to determine the reasons for not finding the MEDLINE records in Embase. Although Embase covers MEDLINE, it apparently does not index every article from MEDLINE. Thirty-seven references were found in MEDLINE (Ovid) but were not available in Embase.com. These are mostly unique PubMed references, which are not assigned MeSH terms, and are often freely available via PubMed Central.

Google Scholar adds relevant articles not found in the other databases, possibly because it indexes the full text of all articles. It therefore finds articles in which the topic of research is not mentioned in title, abstract, or thesaurus terms, but where the concepts are only discussed in the full text. Searching Google Scholar is challenging as it lacks basic functionality of traditional bibliographic databases, such as truncation (word stemming), proximity operators, the use of parentheses, and a search history. Additionally, search strategies are limited to a maximum of 256 characters, which means that creating a thorough search strategy can be laborious.
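A simple length check can help when condensing a strategy for Google Scholar. The sketch below relies only on the 256-character maximum stated above; the constant and function names are hypothetical, and no Google Scholar query syntax is assumed.

```python
GOOGLE_SCHOLAR_MAX_CHARS = 256  # limit as reported in the text


def fits_google_scholar(query):
    """True if the query string is within the reported character limit."""
    return len(query) <= GOOGLE_SCHOLAR_MAX_CHARS


def characters_over_limit(query):
    """How many characters must be trimmed before the query fits (0 if it fits)."""
    return max(0, len(query) - GOOGLE_SCHOLAR_MAX_CHARS)
```

A searcher could run `characters_over_limit` after each edit of a draft query to see how much still needs to be cut.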

Whether Embase and Web of Science can be replaced by Scopus remains uncertain. We have not yet gathered enough data to be able to make a full comparison between Embase and Scopus. In 23 reviews included in this research, Scopus was searched. In 12 reviews (52%), Scopus retrieved 100% of all included references retrieved by Embase or Web of Science. In the other 48%, the recall by Scopus was suboptimal, on one occasion as low as 38%.
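Recall of one database relative to a reference set, as used in the comparison above, can be expressed with simple set arithmetic. This is a sketch under stated assumptions: the reference identifiers and the `recall` helper are hypothetical, and the numbers are not taken from the study data.

```python
def recall(retrieved, included):
    """Share of the included references that a database search retrieved."""
    if not included:
        raise ValueError("the set of included references must not be empty")
    return len(retrieved & included) / len(included)


# Hypothetical identifiers: included references that Embase or
# Web of Science retrieved for a single review.
included_refs = {"ref01", "ref02", "ref03", "ref04"}

# Hypothetical Scopus result set for the same review; it also contains
# records that were not included in the review.
scopus_hits = {"ref01", "ref02", "ref04", "ref90", "ref91"}

print(f"Scopus recall: {recall(scopus_hits, included_refs):.0%}")
```

In this made-up case Scopus retrieves three of the four included references, so its recall for that review would be 75%.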

Unique references were found in CINAHL for 6% and in PsycINFO for 9% of the reviews in which we searched these databases. For CINAHL and PsycINFO, in one case each, unique relevant references were found. In both these reviews, the topic was highly related to the topic of the database. Although we did not use these special topic databases in all of our reviews, given the low number of reviews where these databases added relevant references, and observing the special topics of those reviews, we suggest that these subject databases will only add value if the topic is related to the topic of the database.

Many articles written on this topic have calculated the overall recall of several reviews instead of the effects on individual reviews. Researchers planning a systematic review generally perform one review, and they need to estimate the probability that they may miss relevant articles in their search. When looking at the overall recall, the combination of Embase and MEDLINE and either Google Scholar or Web of Science could be regarded as sufficient, with 96% recall. This number, however, does not answer the question of a researcher performing a systematic review, namely which databases should be searched. A researcher wants to be able to estimate the chances that his or her current project will miss a relevant reference. When looking at individual reviews, however, the probability of missing more than 5% of included references found through database searching is 33% when Google Scholar is used together with Embase and MEDLINE and 30% for the Web of Science, Embase, and MEDLINE combination. What is considered acceptable recall for systematic review searches is open for debate and can differ between individuals and groups. Some reviewers might accept a potential loss of 5% of relevant references; others would want to pursue 100% recall, no matter what the cost. Using the results in this research, review teams can decide, based on their idea of acceptable recall and the desired probability, which databases to include in their searches.
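The difference between overall recall and per-review recall can be illustrated with a small calculation. The sketch below uses invented (found, included) counts for five reviews; the function names and all numbers are assumptions for illustration and do not reproduce the study data.

```python
# Hypothetical (references found, references included) per review.
reviews = [(50, 50), (40, 40), (18, 20), (9, 10), (96, 100)]


def overall_recall(counts):
    """Pooled recall: all found references over all included references."""
    found = sum(f for f, _ in counts)
    included = sum(n for _, n in counts)
    return found / included


def share_missing_more_than(counts, threshold=0.05):
    """Fraction of reviews whose individual miss rate exceeds the threshold."""
    miss_rates = [1 - f / n for f, n in counts]
    return sum(rate > threshold for rate in miss_rates) / len(counts)


print(f"overall recall: {overall_recall(reviews):.1%}")
print(f"reviews missing >5%: {share_missing_more_than(reviews):.0%}")
```

With these invented counts the pooled recall is close to 97%, yet two of the five reviews still miss more than 5% of their included references. A high overall figure can therefore hide a substantial risk for an individual review.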

Strengths and limitations

We did not investigate whether the loss of certain references had resulted in changes to the conclusion of the reviews. Of course, the loss of a minor non-randomized included study that follows the systematic review’s conclusions would not be as problematic as losing a major included randomized controlled trial with contradictory results. However, the wide range of scope, topic, and criteria between systematic reviews and their related review types make it very hard to answer this question.

We found that two databases previously not recommended as essential for systematic review searching, Web of Science and Google Scholar, were key to improving recall in the reviews we investigated. Because this is a novel finding, we cannot conclude whether it is due to our dataset or to a generalizable principle. It is likely that topical differences in systematic reviews may impact whether databases such as Web of Science and Google Scholar add value to the review. One explanation for our finding may be that if the research question is very specific, the topic of research might not always be mentioned in the title and/or abstract. In that case, Google Scholar might add value by searching the full text of articles. If the research question is more interdisciplinary, a broader science database such as Web of Science is likely to add value. The topics of the reviews studied here may simply have fallen into those categories, though the diversity of the included reviews may point to a more universal applicability.

Although we searched PubMed as supplied by publisher separately from MEDLINE in Ovid, we combined the included references of these databases into one measurement in our analysis. Until 2016, the most complete MEDLINE selection in Ovid still lacked the electronic publications that were already available in PubMed. These could be retrieved by searching PubMed with the subset as supplied by publisher. Since the introduction of the more complete MEDLINE collection Epub Ahead of Print, In-Process & Other Non-Indexed Citations, and Ovid MEDLINE®, the need to separately search PubMed as supplied by publisher has disappeared. According to our data, PubMed’s “as supplied by publisher” subset retrieved 12 unique included references, and it was the most important addition in terms of relevant references to the four major databases. It is therefore important to search MEDLINE including the “Epub Ahead of Print, In-Process, and Other Non-Indexed Citations” references.

These results may not be generalizable to other studies for other reasons. The skills and experience of the searcher are one of the most important aspects in the effectiveness of systematic review search strategies [ 23 , 24 , 25 ]. The searcher in the case of all 58 systematic reviews is an experienced biomedical information specialist. Though we suspect that searchers who are not information specialists or librarians would have a higher possibility of less well-constructed searches and searches with lower recall, even highly trained searchers differ in their approaches to searching. For this study, we searched to achieve as high a recall as possible, though our search strategies, like any other search strategy, still missed some relevant references because relevant terms had not been used in the search. We are not implying that a combined search of the four recommended databases will never result in relevant references being missed, rather that failure to search any one of these four databases will likely lead to relevant references being missed. Our experience in this study shows that additional efforts, such as hand searching, reference checking, and contacting key players, should be made to retrieve extra possible includes.

Based on our calculations made by looking at random systematic reviews in PubMed, we estimate that 60% of these reviews are likely to have missed more than 5% of relevant references only because of the combinations of databases that were used. That is with the generous assumption that the searches in those databases had been designed sensitively enough. Even when taking into account that many searchers consider the use of Scopus as a replacement of Embase, plus taking into account the large overlap of Scopus and Web of Science, this estimate remains similar. Also, while the Scopus and Web of Science assumptions we made might be true for coverage, they are likely very different when looking at recall, as Scopus does not allow the use of the full features of a thesaurus. We see that reviewers rarely use Web of Science and especially Google Scholar in their searches, though they retrieve a great deal of unique references in our reviews. Systematic review searchers should consider using these databases if they are available to them, and if their institution lacks availability, they should ask other institutes to cooperate on their systematic review searches.

The major strength of our paper is that it is the first large-scale study we know of to assess database performance for systematic reviews using prospectively collected data. Prior research on database importance for systematic reviews has looked primarily at whether included references could have theoretically been found in a certain database, but most have been unable to ascertain whether the researchers actually found the articles in those databases [ 10 , 12 , 16 , 17 , 26 ]. Whether a reference is available in a database is important, but whether the article can be found in a precise search with reasonable recall is not only impacted by the database’s coverage. Our experience has shown us that it is also impacted by the ability of the searcher, the accuracy of indexing of the database, and the complexity of terminology in a particular field. Because these studies based on retrospective analysis of database coverage do not account for the searchers’ abilities, the actual findings from the searches performed, and the indexing for particular articles, their conclusions lack immediate translatability into practice. This research goes beyond retrospectively assessed coverage to investigate real search performance in databases. Many of the articles reporting on previous research concluded that one database was able to retrieve most included references. Halladay et al. [ 10 ] and van Enst et al. [ 16 ] concluded that databases other than MEDLINE/PubMed did not change the outcomes of the review, while Rice et al. [ 17 ] found the added value of other databases only for newer, non-indexed references. In addition, Michaleff et al. [ 26 ] found that Cochrane CENTRAL included 95% of all RCTs included in the reviews investigated. Our conclusion that Web of Science and Google Scholar are needed for completeness has not been shared by previous research. Most of the previous studies did not include these two databases in their research.

We recommend that, regardless of their topic, searches for biomedical systematic reviews should combine Embase, MEDLINE (including electronic publications ahead of print), Web of Science (Core Collection), and Google Scholar (the 200 first relevant references) at minimum. Special topics databases such as CINAHL and PsycINFO should be added if the topic of the review directly touches the primary focus of a specialized subject database, like CINAHL for focus on nursing and allied health or PsycINFO for behavioral sciences and mental health. For reviews where RCTs are the desired study design, Cochrane CENTRAL may be similarly useful. Ignoring one or more of the databases that we identified as the four key databases will result in more precise searches with a lower number of results, but the researchers should decide whether that is worth the increased probability of losing relevant references. This study also highlights once more that searching databases alone is, nevertheless, not enough to retrieve all relevant references.

Future research should continue to investigate recall of actual searches beyond coverage of databases and should consider focusing on optimal database combinations, not on single databases.

References

1. Levay P, Raynor M, Tuvey D. The contributions of MEDLINE, other bibliographic databases and various search techniques to NICE public health guidance. Evid Based Libr Inf Pract. 2015;10:50–68.

2. Stevinson C, Lawlor DA. Searching multiple databases for systematic reviews: added value or diminishing returns? Complement Ther Med. 2004;12:228–32.

3. Lawrence DW. What is lost when searching only one literature database for articles relevant to injury prevention and safety promotion? Inj Prev. 2008;14:401–4.

4. Lemeshow AR, Blum RE, Berlin JA, Stoto MA, Colditz GA. Searching one or two databases was insufficient for meta-analysis of observational studies. J Clin Epidemiol. 2005;58:867–73.

5. Zheng MH, Zhang X, Ye Q, Chen YP. Searching additional databases except PubMed are necessary for a systematic review. Stroke. 2008;39:e139; author reply e140.

6. Beyer FR, Wright K. Can we prioritise which databases to search? A case study using a systematic review of frozen shoulder management. Health Inf Libr J. 2013;30:49–58.

7. Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions. London, United Kingdom: The Cochrane Collaboration; 2011.

8. Wright K, Golder S, Lewis-Light K. What value is the CINAHL database when searching for systematic reviews of qualitative studies? Syst Rev. 2015;4:104.

9. Wilkins T, Gillies RA, Davies K. EMBASE versus MEDLINE for family medicine searches: can MEDLINE searches find the forest or a tree? Can Fam Physician. 2005;51:848–9.

10. Halladay CW, Trikalinos TA, Schmid IT, Schmid CH, Dahabreh IJ. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. J Clin Epidemiol. 2015;68:1076–84.

11. Ahmadi M, Ershad-Sarabi R, Jamshidiorak R, Bahaodini K. Comparison of bibliographic databases in retrieving information on telemedicine. J Kerman Univ Med Sci. 2014;21:343–54.

12. Lorenzetti DL, Topfer L-A, Dennett L, Clement F. Value of databases other than MEDLINE for rapid health technology assessments. Int J Technol Assess Health Care. 2014;30:173–8.

13. Beckles Z, Glover S, Ashe J, Stockton S, Boynton J, Lai R, Alderson P. Searching CINAHL did not add value to clinical questions posed in NICE guidelines. J Clin Epidemiol. 2013;66:1051–7.

14. Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. The contribution of databases to the results of systematic reviews: a cross-sectional study. BMC Med Res Methodol. 2016;16:1–13.

15. Aagaard T, Lund H, Juhl C. Optimizing literature search in systematic reviews—are MEDLINE, EMBASE and CENTRAL enough for identifying effect studies within the area of musculoskeletal disorders? BMC Med Res Methodol. 2016;16:161.

16. van Enst WA, Scholten RJ, Whiting P, Zwinderman AH, Hooft L. Meta-epidemiologic analysis indicates that MEDLINE searches are sufficient for diagnostic test accuracy systematic reviews. J Clin Epidemiol. 2014;67:1192–9.

17. Rice DB, Kloda LA, Levis B, Qi B, Kingsland E, Thombs BD. Are MEDLINE searches sufficient for systematic reviews and meta-analyses of the diagnostic accuracy of depression screening tools? A review of meta-analyses. J Psychosom Res. 2016;87:7–13.

18. Bramer WM, Giustini D, Kramer BM, Anderson PF. The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews. Syst Rev. 2013;2:115.

19. Bramer WM, Giustini D, Kramer BMR. Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: a prospective study. Syst Rev. 2016;5:39.

20. Ross-White A, Godfrey C. Is there an optimum number needed to retrieve to justify inclusion of a database in a systematic review search? Health Inf Libr J. 2017;33:217–24.

21. Bramer WM, Rethlefsen ML, Mast F, Kleijnen J. A pragmatic evaluation of a new method for librarian-mediated literature searches for systematic reviews. Res Synth Methods. 2017. doi:10.1002/jrsm.1279.

22. Bramer WM, de Jonge GB, Rethlefsen ML, Mast F, Kleijnen J. A systematic approach to searching: how to perform high quality literature searches more efficiently. J Med Libr Assoc. 2018.

23. Rethlefsen ML, Farrell AM, Osterhaus Trzasko LC, Brigham TJ. Librarian co-authors correlated with higher quality reported search strategies in general internal medicine systematic reviews. J Clin Epidemiol. 2015;68:617–26.

24. McGowan J, Sampson M. Systematic reviews need systematic searchers. J Med Libr Assoc. 2005;93:74–80.

25. McKibbon KA, Haynes RB, Dilks CJW, Ramsden MF, Ryan NC, Baker L, Flemming T, Fitzgerald D. How good are clinical MEDLINE searches? A comparative study of clinical end-user and librarian searches. Comput Biomed Res. 1990;23:583–93.

26. Michaleff ZA, Costa LO, Moseley AM, Maher CG, Elkins MR, Herbert RD, Sherrington C. CENTRAL, PEDro, PubMed, and EMBASE are the most comprehensive databases indexing randomized controlled trials of physical therapy interventions. Phys Ther. 2011;91:190–7.

Acknowledgements

Not applicable

Melissa Rethlefsen receives funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR001067. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available from the corresponding author on a reasonable request.

Author information

Authors and affiliations.

Medical Library, Erasmus MC, Erasmus University Medical Centre Rotterdam, 3000 CS, Rotterdam, the Netherlands

Wichor M. Bramer

Spencer S. Eccles Health Sciences Library, University of Utah, Salt Lake City, Utah, USA

Melissa L. Rethlefsen

Kleijnen Systematic Reviews Ltd., York, UK

Jos Kleijnen

School for Public Health and Primary Care (CAPHRI), Maastricht University, Maastricht, the Netherlands

Department of Epidemiology, Erasmus MC, Erasmus University Medical Centre Rotterdam, Rotterdam, the Netherlands

Oscar H. Franco


Contributions

WB, JK, and OF designed the study. WB designed the searches used in this study and gathered the data. WB and ML analyzed the data. WB drafted the first manuscript, which was revised critically by the other authors. All authors have approved the final manuscript.

Corresponding author

Correspondence to Wichor M. Bramer.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

WB has received travel allowance from Embase for giving a presentation at a conference. The other authors declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Reviews included in the research. References to the systematic reviews published by Erasmus MC authors that were included in the research. (DOCX 19 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Bramer, W.M., Rethlefsen, M.L., Kleijnen, J. et al. Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study. Syst Rev 6, 245 (2017). https://doi.org/10.1186/s13643-017-0644-y


Received: 21 August 2017

Accepted: 24 November 2017

Published: 06 December 2017

DOI: https://doi.org/10.1186/s13643-017-0644-y


  • Databases, bibliographic
  • Review literature as topic
  • Sensitivity and specificity
  • Information storage and retrieval

Systematic Reviews

ISSN: 2046-4053




How can big data analytics be used for healthcare organization management? Literary framework and future research from a systematic review

Nicola Cozzoli

Department of Economics, University of Foggia, Via Caggese n.1, Foggia, Italy

Fiorella Pia Salvatore

Nicola Faccilongo

Michele Milone

Associated data

The datasets analyzed during the current study are not publicly available due to data relating to scientific journal names and authors but are available from the corresponding author on reasonable request.

Multiple attempts aimed at highlighting the relationship between big data analytics and benefits for healthcare organizations have been raised in the literature. The big data impact on health organization management is still not clear due to the relationship’s multi-disciplinary nature. This study aims to answer three research questions: a) What is the state of art of big data analytics adopted by healthcare organizations? b) What about the benefits for both health managers and healthcare organizations? c) What about future directions on big data analytics research in healthcare?

Through a systematic literature review the impact of big data analytics on healthcare management has been examined. The study aims to map extant literature and present a framework for future scholars to further build on, and executives to be guided by.

A positive relationship between big data analytics and healthcare organization management emerged. To identify common elements in the reviewed studies, 16 studies were selected and clustered into 4 research areas: 1) Potentialities of big data analytics. 2) Resource management. 3) Big data analytics and management of health surveillance systems. 4) Big data analytics and technology for healthcare organization.

Conclusions

In conclusion, big data analytics solutions are considered a milestone for managerial studies applied to healthcare organizations, although scientific research needs to investigate the standardization and integration of devices, as well as data analysis protocols, to improve the performance of the healthcare organization.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12913-022-08167-z.

Big data is transforming healthcare organizations and will continue to do so in the near future [ 1 , 2 ]. Scientific literature in the managerial context applied to healthcare organizations considers Big Data Analytics (BDA) a fundamental tool, so much so that it has attracted the attention of the scientific community and stakeholders [ 3 ]. However, a premise should be made: data by themselves explain little. To be useful in healthcare organization management, their quality must first be validated, and the right correlations must then be found. In other words, the data should be processed, analyzed, and interpreted with the appropriate tools [ 4 , 5 ].

BDA-related technological applications in healthcare are rapidly increasing [ 6 ] and will increasingly characterize managers’ decision-making processes. For example, IBM’s Watson project [ 7 ] is a "super-computer" that has scoured several million scientific articles over the last twenty years and uses artificial intelligence tools (e.g., Machine Learning) to correlate disease symptoms and predict possible diagnostic scenarios. This case helps to understand how and to what extent BDA could support healthcare managers in improving their decision processes while increasing the performance of the healthcare organization.

Nowadays, the amount of data is no longer an issue. Internet traffic reports from Cisco and other network operators have estimated the entire digital universe at 44 zettabytes, and 463 exabytes of information could be generated daily by 2025. A new era has begun in which the production and management of human knowledge will no longer be the exclusive preserve of humans; machines will also play their part as knowledge producers [ 8 ]. From pharmaceutical companies to healthcare organizations, this enormous potential of data products, combined with IoT applications and AI tools [ 9 – 11 ], will play a significant role in the near future. Today, IoT-based medical applications allow clinical data to be monitored through data generated by special devices (e.g., wearable devices) [ 12 ] and accessed remotely by a physician rather than by caregivers [ 13 ].

The market size is a useful indicator of how much healthcare organizations are turning their attention to new management models based on the use of big data. By 2025, the big data market in healthcare is expected to reach $70 billion, a record 568% growth in 10 years. The use of such a tool not only represents a complex challenge [ 14 ] but also opens opportunities for all those in the healthcare supply chain who manage decision-making processes. Moreover, this technology will influence the definition of new managerial strategies within healthcare organizations and will also have positive repercussions on the effectiveness and efficiency of healthcare processes [ 15 ]. Indeed, healthcare managers use big data technology to obtain, for example, lists of doctors and nurses or lists of drugs with their expiration dates. Such information facilitates decision-making, improves the quality of the services provided, and rationalizes the use of resources, easing the management of the healthcare organization as a whole.
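The market figures quoted above imply a starting market size and an average yearly growth rate that can be worked out with basic arithmetic. The sketch below assumes that "568% growth" means an increase of 568% over the starting value (a factor of 6.68); if it instead means growth to 568% of the starting value, the factor would be 5.68. Variable names are illustrative.

```python
# Figures quoted in the text.
target_market_busd = 70.0   # projected market size in billions of USD
total_growth_pct = 568.0    # total growth over the period, in percent
years = 10

# Assumption: growth of 568% means final = start * (1 + 5.68).
growth_factor = 1 + total_growth_pct / 100

# Implied market size at the start of the ten-year period.
implied_start_busd = target_market_busd / growth_factor

# Compound annual growth rate that yields the same total growth.
cagr = growth_factor ** (1 / years) - 1

print(f"implied starting market: ${implied_start_busd:.1f} billion")
print(f"implied compound annual growth rate: {cagr:.1%}")
```

Under this reading, the starting market would be roughly $10.5 billion and the compound annual growth rate about 21%.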

BDA satisfies multiple needs that influence the quality of the healthcare organization’s performance and help direct management strategies toward improving the supply of healthcare services. Some of these strategies aim to:

  • Provide specific services to patients, from diagnostics to preventive medicine passing through therapeutic adherence.
  • Detect the onset and spread of diseases in advance.
  • Observe parameters inherent to hospital quality standards, promoting control and prevention actions.
  • Modify treatment techniques.
  • Facilitate research and development in pharmacology, reducing the time to market of drugs.
  • Facilitate research and development of new and specific medical devices.

The main aim of this research is, therefore, to provide both an integrative framework on the state of the art and perspectives on how BDA can be useful for the management of healthcare organizations. Based on the results, the study offers food for thought on how this technological and cultural revolution will affect the modus operandi of healthcare organizations.

Through an overview of recent scientific studies, this research aims to raise awareness among both practitioners and managers about BDA tools applied to healthcare management to address more effectively and efficiently the challenges imposed by an increasing demand for healthcare services.

In this regard, the study provides a systematic literature review (SLR) to explore the effect of BDA on the healthcare management by analyzing articles from the Scopus database during a period of 5 years (2016 – 2021).

Furthermore, through a content analysis, the results aim to be a starting point for identifying the potential barriers and opportunities of BDA-based management systems for smarter healthcare organizations. Specifically, the study answers different research questions (RQs), as different levels of analysis have been performed. The relationship between BDA-based management systems and the benefits delivered to organizations could not be analyzed without first exploring the state of the art of BDA tools deployed in healthcare. Starting from this background, a discussion of future perspectives on BDA development in healthcare organizations becomes necessary.

Theoretical framework

Why use BDA and how to exploit its potential for healthcare organization management? This is the main question asked by managers and decision makers working in the healthcare sector. In recent years there have been multiple attempts in the literature aimed at highlighting the relationship between implementation of BDA and benefits for healthcare organizations, in terms of both resource efficiency and process management.

In 2017, a study by Wang and Hajli [ 16 ] proposed a model founded on Resource-Based Theory and BDA Capabilities (BDAC) to explain the relationship between BDA, benefits, and value creation for healthcare organizations. As stated by Srinivasan and Swink [ 17 ], BDAC refers to “organizational facility with tools, techniques, and processes that enable a firm to process, organize, visualize, and analyze data, thereby producing insights that enable data-driven operational planning, decision-making, and execution”. In the healthcare organization, BDAC represents the ability to collect, store, analyze, and process health data of huge volume, variety, and velocity coming from various sources in order to improve data-driven decisions [ 18 , 19 ]. Indeed, the study of Wang and Hajli [ 16 ], validated empirically on 109 cases of BDA tool implementation in 63 healthcare organizations, demonstrated how specific "paths-to-value" can be identified. With varying degrees of relevance across the identified pathways, it showed that alongside the challenges of implementing certain BDA tools, there are corresponding specific benefits for healthcare organizations. Preliminarily, the study defined the ability to analyze big data through the concept of Information Lifecycle Management (ILM) [ 20 ]. In this perspective, the capabilities of BDA in healthcare organizations are the abilities to process health data from diverse sources and provide significant information to healthcare managers. Through BDA, managers can detect timely indicators and identify business strategies, which allow them to put in place prospective plans, efficient strategies, and programs to increase the performance of organizations.

Researchers have found that BDA capabilities primarily stem from the implementation of various tools and features. Specifically, in order of importance, BDA capabilities are triggered first by processing tools (e.g., OLAP, machine learning, NLP), then by aggregation tools (e.g., data warehouse tools), and finally by data visualization tools and capabilities (e.g., visual dashboards/systems, reporting systems/interfaces).

Among the potentials triggered by the implementation of BDA in the healthcare organization, the main capability was the analytical one, that is, the ability to process clinical data characterized by immense volume, variety (from text to graph), and speed (from batch to streaming) using descriptive analysis techniques [ 21 , 22 ]. In this regard, it is important to note that BDA-based management systems are the only ones capable of analyzing semi-structured or unstructured data. This is crucial for revealing correlation patterns that are difficult to determine with traditional management systems [ 23 ]. Furthermore, launching these systems in a healthcare organization makes it possible to manage the outputs of care processes and services effectively, in order to constantly improve the performance of the organization. In summary, the characteristics of BDA-based management systems implemented in a healthcare organization are:

  • predictive analytics capability, i.e., the ability to explore data and identify useful correlations, patterns and trends, and extrapolate them to predict what is likely to occur in the future [ 24 , 25 ];
  • interoperability capability, i.e., the ability to integrate data and processes to support management, collaboration, and sharing across different healthcare departments, managers, and facilities [ 26 ], and finally,
  • traceability capability, i.e., the ability to integrate and track all patient history data from different IT facilities and different healthcare units.

In terms of expected benefits from BDA implementation, the study of Wang and Hajli [ 16 ] showed that the most important ones derive from improved operational activities, such as improved quality and accuracy of healthcare decisions, rapid processing of issues, and the ability to enable treatments proactively before patients’ conditions worsen. Next in terms of relevance were the benefits related to IT infrastructure, such as standardization, reduced costs for redundant infrastructure, and the ability to quickly transfer data between different IT systems. In substance, the authors have delivered a useful business model that healthcare managers can draw on to evaluate the specific levers they need to activate when implementing BDA-based management systems. In addition to highlighting these benefits, the authors clearly show how specific BDA tools can facilitate the decision-making processes of healthcare managers and make them faster and more effective.

In another study carried out to identify BDA benefits and supports, and to drive organizational strategies, Wang, Kung, and Byrd [ 19 ], through the analysis of 26 case studies related to the BDA applications in the healthcare organization, have identified five "capabilities" of BDA: analytic capability for care patterns, unstructured data analytical capability, decision support, predictive, and traceability capabilities [ 19 ]. The study is remarkably interesting because in addition to mapping precise benefits, it also recommends specific strategies considering the BDA implementation for healthcare organizations. These strategies are useful for achieving effective results by leveraging the potential of BDA.

The first successful strategy is to implement governance based on the use of big data, starting with a definition of objectives, procedures, and key performance indicators (KPIs). Once again, one of the discriminating factors for success in implementing such a strategy remains the integration of information systems and the standardization of data protocols for data that often come from heterogeneous sources already existing in healthcare organizations. The second strategy is related to developing a culture of data sharing. The third one considers the training of healthcare managers, who cannot ignore knowledge related to BDA, for example on the use of data mining and business intelligence tools. The fourth strategy is related to the storage of big data, often available in heterogeneous formats, and is identified in the transition from the more expensive traditional storage systems (NAS) to more efficient and effective systems such as cloud computing solutions. The last strategic driver involves pathways related to the implementation of predictive BDA models. KPIs, interactive visualization, and data aggregation tools such as dashboards and reports should be familiar instruments for healthcare managers and, in general, for healthcare organizations oriented to BDA-driven process management strategies.

More recent studies focus attention on supply chain management practices in healthcare. In the study performed by Yu et al. [ 27 ], the authors, interviewing senior executives in Chinese hospitals, show on both a theoretical and an empirical basis how BDAC positively impacts the three dimensions of hospital supply chain integration (SCI) (inter-functional integration, hospital-patient integration and hospital-supplier integration) and how SCI, in turn, contributes to improving operational flexibility [ 27 ]. In the healthcare organization, “operational flexibility” means the ability of a ward to adapt its operating procedures to unforeseen circumstances while meeting the needs of patients [ 28 , 29 ].

The scholars have delivered an important contribution in demonstrating the relationship between BDAC, SCI, and operational flexibility from multiple perspectives, by providing useful management guidance for healthcare executives and managers involved in the supply chain. By analyzing and processing medical and managerial data with advanced analytical techniques, Chinese healthcare organizations were able to facilitate decision-making process with timely and appropriate actions, for example, tracking people's movements during the lockdown caused by the Coronavirus, understanding ongoing health trends, and managing pharmaceutical supplies [ 30 , 31 ].

This theoretical framework provides a key to interpreting the benefits offered by good practices deriving from the use of the BDA in the healthcare organization.

At the same time, the rigorous scientific method allows the validation of empirical experiences in relation to clear theoretical references. The next section presents projects that demonstrate what is stated in the literature.

Practical framework

N(ursing) + Care App is an mHealth application that supports the work of frontline health workers (FHW) in developing countries [ 32 ]. The system is designed to collect not only patient data, but also diagnostic images. It also makes it possible to add recommended doctors, based on the advice of FHWs, in case the patient needs a specific hospital visit.

For healthcare managers, predicting the number of emergency department accesses is a critical issue that complicates the optimization of human resource management. To this end, Intel and Assistance Publique-Hôpitaux de Paris (AP-HP), the largest university hospital in Europe, leveraging datasets from multiple sources, worked together to build a cloud-based solution to predict the number of patient visits to emergency rooms and hospital admissions. This predictive analytics tool will enable healthcare managers at AP-HP hospitals to know the number of emergency room visits and hospital admissions 15 days ahead in order to reduce wait times, optimize human resource (HR) levels based on anticipated needs, accurately plan patient loads, including by pathology, and overall improve the quality and efficiency of services provided by the healthcare organization [ 33 ].
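To make the idea of a 15-day forecast concrete, the following is a minimal sketch of a seasonal-naive baseline: each future day is predicted with the mean of past visits observed on the same weekday. It is not the method used by Intel and AP-HP, which the source does not describe; the function name `forecast_visits` and the synthetic visit counts are assumptions introduced purely for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def forecast_visits(history, horizon_days=15):
    """Seasonal-naive baseline (illustrative, not the AP-HP model).

    history: list of (date, visit_count) pairs, ordered by date.
    Returns (date, predicted_count) pairs for the next horizon_days days.
    """
    # Group past counts by weekday (0 = Monday ... 6 = Sunday).
    by_weekday = defaultdict(list)
    for day, count in history:
        by_weekday[day.weekday()].append(count)

    # Fallback when a weekday never appears in the history.
    overall_mean = sum(count for _, count in history) / len(history)

    last_day = history[-1][0]
    forecast = []
    for offset in range(1, horizon_days + 1):
        day = last_day + timedelta(days=offset)
        past = by_weekday.get(day.weekday())
        predicted = sum(past) / len(past) if past else overall_mean
        forecast.append((day, predicted))
    return forecast


# Synthetic data (hypothetical, not AP-HP figures): four weeks of daily
# counts, with weekends busier than weekdays.
start = date(2021, 3, 1)
history = []
for i in range(28):
    day = start + timedelta(days=i)
    history.append((day, 130 if day.weekday() >= 5 else 100))

forecast = forecast_visits(history)
```

A baseline of this kind is mainly useful as a reference point: a production model would need to beat it before its additional complexity is justified.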

Chronic conditions, if not kept under control through a rigorous program of therapeutic adherence, can become a source of both more serious physical problems for patients and economic burdens for healthcare organizations. Another project that actively introduced BDA tools into healthcare management was carried out by the European Commission to launch production of the drug Enerzair Breezhaler. It was the first drug for the treatment of asthma co-packaged and co-prescribed with the Propeller digital platform. The app sends reminders to support therapeutic adherence and maintains a record of the data, which the patient shares with his or her physician. Studies have demonstrated that the Propeller platform increases the degree of asthma control by up to 63%, therapeutic adherence by up to 58% [ 34 ], and reduces asthma emergency department visits and hospital admissions by up to 57% [ 35 ].

The practical framework described, aided by some empirical experience, only partially reveals the potential offered by BDA. The diffusion of BDA-based management systems in the healthcare organization will trigger a virtuous circle, soon making it possible to accumulate increasingly accurate medical data. By exploiting the most advanced AI technologies, BDA will support predictive analysis, allow physicians to follow more accurate and faster diagnostic pathways, and allow managers to use the results. It will help health practitioners in the decision-making process, optimize the use of resources with a consequent cost reduction and, overall, improve the quality of services provided by healthcare organizations.

The main aim of this study is to update the state of the art on the BDA-based management systems adopted in the healthcare organization, underlining management advantages for both organizations and managers. BDA has the potential to reduce the cost of care, prevent disease outbreaks, and improve patients’ quality of life. Through its ability to process and cross-reference massive amounts of both management and clinical information, BDA promises to be an effective support tool for both healthcare managers and patients.

To achieve this aim, a Systematic Literature Review (SLR) was performed. This method identifies, evaluates, and summarizes the updates that arise from the literature about the BDA tools used to improve both the performance of healthcare organizations and patients’ quality of life. The method takes inspiration from the protocol used by Khanra S., et al. [ 36 ], which considers inclusion and exclusion criteria.

The present study aims to contribute to the literature by addressing three RQs:

  • What is the state of the art of BDA adopted by healthcare organizations?
  • What are the benefits for both health managers and healthcare organizations?
  • What are the future directions of BDA research in healthcare?

To answer the RQs, Scopus has been selected as a widely used electronic database. To give the selected studies international validity, the research only considers papers in English. Utilizing the Boolean operator “AND”, the following keywords have been searched: “big data analytics” AND “healthcare” AND “management”. As inclusion criteria, only papers published from 2016 to 2021 have been considered. As subject areas, “medicine” and “business, management and accounting” have been selected. As exclusion criteria, articles in press and the following document types have not been taken into account: “review”, “book”, “conference review”, “letter” and “note”. Also, to avoid dispersing the study, conference proceedings have been excluded. Following the search protocol, 34 results have been obtained (Fig. 1).
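The inclusion and exclusion criteria of the search protocol can be sketched as a simple record filter. This is only an illustration of the logic, not the actual Scopus query: the record fields (`title`, `abstract`, `keywords`, `doc_type`, and so on), the document-type labels, and the choice to match keywords against title, abstract, and keywords are assumptions, since the source does not state which fields were searched.

```python
# Criteria taken from the search protocol described in the text.
KEYWORDS = ("big data analytics", "healthcare", "management")
INCLUDED_AREAS = {"medicine", "business, management and accounting"}
EXCLUDED_TYPES = {
    "review", "book", "conference review", "letter", "note",
    "conference proceedings",
}


def is_eligible(record):
    """Return True if a (hypothetical) bibliographic record passes all criteria."""
    # Assumption: keywords are matched in title, abstract, and author keywords.
    text = " ".join(
        [record["title"], record["abstract"], " ".join(record["keywords"])]
    ).lower()
    return (
        all(keyword in text for keyword in KEYWORDS)      # Boolean AND
        and record["language"] == "English"
        and 2016 <= record["year"] <= 2021
        and bool(INCLUDED_AREAS & set(record["subject_areas"]))
        and record["doc_type"] not in EXCLUDED_TYPES
        and not record["in_press"]
    )


# Hypothetical records used only to exercise the filter.
base = {
    "title": "Big data analytics for hospital management",
    "abstract": "A healthcare case study.",
    "keywords": ["BDA"],
    "language": "English",
    "year": 2019,
    "subject_areas": ["medicine"],
    "doc_type": "article",
    "in_press": False,
}
records = [base, {**base, "doc_type": "review"}, {**base, "year": 2014}]
selected = [r for r in records if is_eligible(r)]
```

Expressing the protocol as a single predicate makes each criterion explicit and easy to audit when a review is replicated.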

Fig. 1 Workflow of articles selection

An Excel spreadsheet was used to perform the extraction procedures, while the statistical analyses were carried out using the software STATA 16. The list of the extracted papers investigated with the content analysis can be found in the Appendix.

The work proceeds through a descriptive analysis. After that, a content analysis has been performed to identify the most relevant characteristics of the BDA-based management systems, underlining the positive impact for the healthcare organizations, without neglecting to outline the trends for the future scenarios and research directions.

In line with the SLR method, the iterative process shown in Fig. 1 has made it possible to delete duplicates and match the results with the RQs.

As shown in Fig. 1, the initial search on the Scopus database delivered 227 results. Limiting the search to papers published between 2016 and 2021 removed 11% of the records. At the second stage, selecting the subject areas excluded 131 records, i.e., 57.7% of the results initially retrieved. The last step excluded document types such as Review, Book, Conference Review, Letter, and Note: 37 records, representing 16.3% of the sample. At the end of the screening process, 34 articles were selected, representing about 15% of the sample.
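The screening figures can be checked with a few lines of arithmetic. The sketch below recomputes each share against the 227 initial records. The text reports the first exclusion only as a percentage (11%), so the count of 25 records used here is derived from the other reported numbers rather than stated in the source.

```python
INITIAL = 227      # records returned by the initial search
BY_SUBJECT = 131   # excluded when restricting subject areas
BY_DOC_TYPE = 37   # excluded by document type
FINAL = 34         # articles retained

# Derived, not reported: records removed by the 2016-2021 restriction.
by_year = INITIAL - BY_SUBJECT - BY_DOC_TYPE - FINAL


def share(count):
    """Percentage of the initial sample, rounded to one decimal place."""
    return round(100 * count / INITIAL, 1)


steps = [
    ("publication year", by_year),
    ("subject area", BY_SUBJECT),
    ("document type", BY_DOC_TYPE),
]

remaining = INITIAL
for label, removed in steps:
    remaining -= removed
    print(f"{label}: {removed} removed ({share(removed)}%), {remaining} left")
```

All four reported percentages are taken against the initial 227 records, which is why they add up to roughly 100% together with the retained share.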

The descriptive analysis includes the time distribution of the studies from 2016 to 2021. It is important to note the increasing publication trend from 2017 to 2019. This output confirms a growing interest in the research field of BDA applied to healthcare organizations (Fig. 2).

Fig. 2 Trend of research streams

The trend of research streams considers the sample of 34 scientific contributions resulting from the screening process described above. The 6% of the total sample published in the years 2016 and 2017 is only an early indication of the growing trend of scientific studies on BDA in the healthcare sector. The overall incidence in 2018 was 12%, but the turning point was reached in 2019, when 32% of the studies in the sample were published. This outcome could be read considering the Covid-19 pandemic outbreak, which has been a representative testing ground for BDA tools by helping managers and decision-makers to plan healthcare managerial strategies.

In this context, the use of BDA by Chinese healthcare organizations for tracking people's flow during the lockdown represents an important case study that coincides with the peak in the time flow of research. The 2020 and 2021 data, which represent respectively 24% and 21% of the total scientific contributions, seem to confirm the rising interest in BDA research seen as a planning tool for healthcare processes.
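As a consistency check, the yearly percentages quoted above can be converted back into approximate article counts for a sample of 34. The counts are implied by rounding and are not reported in the text. In particular, reading the 6% figure as applying to each of 2016 and 2017 separately is an assumption; under that reading the implied counts add up to the full sample.

```python
SAMPLE_SIZE = 34

# Percentages quoted in the text; the 6% value is ASSUMED to hold for
# each of 2016 and 2017 individually.
shares = {2016: 6, 2017: 6, 2018: 12, 2019: 32, 2020: 24, 2021: 21}

# Implied (not reported) article counts, obtained by rounding.
counts = {
    year: round(pct * SAMPLE_SIZE / 100) for year, pct in shares.items()
}
total = sum(counts.values())

for year, n in counts.items():
    print(year, n)
print("total", total)
```

If the 6% were instead the combined share of 2016 and 2017, the implied counts would fall short of 34, which is why the per-year reading is used in this sketch.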

The pie chart shows the scientific production by country. It is necessary to specify that the Scopus database clusters the studies by the home country of each author’s organization; therefore, the same study could be attributed to more than one country and thus belong to more than one cluster.

The geographical locations of the studies shown in Fig. 3 indicate that India, the UK, and the USA account for more than one third of the total scientific production. It is well known that IT companies such as Google, Apple, Amazon, and Microsoft are investing considerable resources in BDA tools for healthcare. China and India together contribute 22% of the scientific articles. Big data technology has played a key role in virus tracking during the pandemic crisis. The "Internet Plus Healthcare", a big data center in Zhongwei (China), provides cloud services to both healthcare institutions and IT companies. In Yinchuan (China), an industrial park for big data acts as a catalyst for IT companies involved in the healthcare sector. India is confirmed as one of the countries that most heavily adopt artificial intelligence, big data analytics, and IoT technologies. Although India must face the challenge of providing basic healthcare services in a predominantly rural country, start-ups with BDA skills in healthcare are springing up.

Fig. 3 Geographical locations of the studies

It is also important to underline the performance of the European countries. The UK, Greece, Italy, Spain, Germany, and Portugal support the research with almost 40% of the studies published, suggesting that Europe will be a driving force for BDA research in the near future. The development of a European Health Data Space (EHDS) is an ambitious project of the European Commission. It will lead member states to share an efficient infrastructure for both the exchange and the management of health data, providing citizens with equal treatment, free access to clinical data, and quality healthcare services.

In the area “Others” all the other countries contributing marginally to research have been included.

The next step of the study is focused on a content analysis to show the experiences of applying BDA in healthcare organizations.

Starting from the 34 articles selected for the descriptive analysis, a second screening was performed to identify in detail the core issue of the study. Eighteen articles were excluded because they were only weakly focused on the research objective, which concerns specifically how BDA can be used for healthcare organization management. Thus, after an in-depth reading of abstracts and full papers, the scholars identified 16 papers more closely targeted on the research objective. The 16 studies selected through the content analysis were clustered into 4 research areas (RAs), as shown in Table 1. The clustering procedure identifies 4 relevant topics: Potentialities of BDA (RA1), Resource management (RA2), BDA and management of health surveillance system (RA3), BDA technology for healthcare organization (RA4). The proposed clustering is intended to give an easy-to-follow research map and to support healthcare managers.

Table 1 Clusters by relevant topics

RA1: potentialities of BDA

Wang and Hajli [ 16 ] define BDA potentialities in the healthcare context as “ the ability to acquire, store, process and analyze large amounts of health data in various forms, and deliver meaningful information to users, which allows them to discover business values and insights in a timely fashion ”. The relationship between BDA and the benefits for healthcare organizations has been well expressed by the theory of the “path to value chain” [ 16 ]. This path represents an important contribution to the exploration of business value, not only for drawing the generic and well-established connection between big data capabilities [ 19 ] and the benefits, but also for empirically showing how capabilities can be developed and what benefits can be achieved in healthcare organizations. Another study included in this area explores the key role of BDA capabilities in developing healthcare supply chain integration and its impact on hospital flexibility [ 27 ]. Specifically, BDA has a fundamental role in developing healthcare supply chain integration and operational flexibility. Considering the health and economic crises caused by Covid-19, this dimension of BDA has been an especially important lever for managers to improve the operational flexibility of healthcare organizations. The ability to provide predictive models and real-time insights is a powerful prospect of BDA for helping healthcare professionals and managers in the decision-making process. In this regard, the literature presents several applications of big data in healthcare that support the collection, management, and integration of data in healthcare organizations [ 37 ]. Moreover, BDA enables the integration of massive datasets, supporting managers' decisions and the monitoring of the managerial aspects of healthcare organizations.
Building a decision-making process based on BDA means, first of all, identifying the key big data elements that make it possible to implement ad-hoc strategies to improve efficiency along the healthcare value chain. To this end, the research carried out by Sousa et al. [ 37 ] underlines the benefits that BDA can give to the decision-making process, through predictive models and real-time analytics, assisting in the collection, management, and integration of data in healthcare organizations.

To date, thanks to an integrated and interconnected ecosystem, it is becoming possible to provide personalized healthcare services, collect an enormous quantity of both clinical and biometric data and, thus, implement BDA instruments. Nevertheless, to take real advantage of these tools and turn them into useful decision support systems (DSS), it is necessary for R&D to focus on data filtering mechanisms in order to obtain good-quality, reliable information [ 38 ]. Healthcare models based on BDA and the implementation of new healthcare programs enable both medical and managerial decision support for the provision of healthcare services. New types of interactions with and among users of the healthcare ecosystem will produce a wide variety of complex data in the near future; thus, the main challenges concern information processing and analytics.

In light of the above, RA1 includes studies for which the quality of data and the need for high-performance filtering mechanisms are becoming key factors for the success of BDA-based management systems in healthcare organizations. For example, the study carried out by Maglaveras et al. [ 38 ], included in this area, explores new R&D pathways in biomedical information processing and management, as well as in the design of new intelligent decision support systems.

RA2: resource management

Another important research direction that emerged from the literature review concerns the positive impact of BDA on resource management. Insufficient policies for managing medical materials waste, energy use, and environmental burden restrict the conservation of resources. BDA is extremely useful in this respect; in the near future, it could provide an important contribution to implementing circular economy processes and supporting sustainable development initiatives in healthcare organizations [ 39 ]. To this end, the study developed by Kazançoğlu et al. [ 39 ] underlines the importance of circularity and sustainability concepts to mitigate the sector’s negative impacts on the environment. Furthermore, the study identifies the barriers related to the circular economy in the healthcare organization and provides solutions to these barriers by implementing BDA-based management systems. Lastly, the authors have developed a managerial, policy, and theoretical framework to support healthcare managers in launching sustainable initiatives in the context of the healthcare organization.

The impact on performance has also been investigated by studies that have linked the benefits of BDA and artificial intelligence with the green supply chain integration process [ 40 ]. Digital learning is increasingly becoming a “moderator” of the green supply chain process, with a significant positive impact on the environmental performance of the healthcare organization. BDA-AI technologies will lead to improved environmental process integration and green supply chain collaboration and, consequently, will support the decisions of the managers involved in the supply processes. This study also provides an important reference framework for logistics/supply chain managers who want to implement BDA-AI technologies to support green supply processes and enhance the environmental performance of the healthcare organization [ 40 ].

Nowadays, many scholars are focusing on BDA-driven decision support systems to support healthcare managers [ 41 ]. These types of BDA-based analytical tools will provide useful quantitative support for managers of healthcare organizations. The authors have reported design and technical details of the system implementations using case studies. They have developed a toolkit that represents a reference framework for resource management, making it possible to create strategic models and obtain analytical results for evidence-based decisions and managerial evaluations.

In this RA, two other important topics investigated are high-quality healthcare services and healthcare costs. Optimizing supply chain activities is imperative to keep healthcare costs low. The data generated by medical equipment and devices can be successfully used in forecasting, in the decision-making process, and to make healthcare supply chain management more efficient [ 42 ]. The study carried out by Alotaibi et al. [ 42 ] thus presents a review of the use of big data in healthcare organizations, underlining opportunities and challenges deriving from the application of BDA-based management systems within the organizations.

As already asserted, a good implementation of BDA in the healthcare organization will play a fundamental role in improving the management of clinical outcomes, giving helpful insights to decision makers and managers in order to avoid diseases, reduce healthcare expenses, and improve the performance of the healthcare organization [ 43 ]. However, to achieve these ambitious outcomes, research will face a crucial challenge: how to rationalize heterogeneous data coming from diverse sources and make them easily usable at affordable cost. The research developed by Kundella and Gobinath [ 43 ] represents an important contribution to exploring the key challenges, techniques, technologies, privacy issues, security algorithms, and future directions of the use of BDA in the healthcare organization.

RA3: BDA and management of health surveillance system

The rise of BDA promises to solve many healthcare challenges in developing countries. BDA applied to the healthcare organization helps managers to rationalize resources and helps the health system to better deliver treatments to patients [ 44 ]. In this regard, the government of Zambia is considering implementing BDA solutions to provide more effective and efficient healthcare services. A well-managed health surveillance system represents an important driver for improving the quality of life and reducing medical waste, especially in developing countries where the lack of resources is severe and limits economic development. For all these reasons, Europe is investing in BDA initiatives in public health and in the oncology sectors, to generate new knowledge, improve clinical care, and make the management of the public health surveillance system more efficient [ 45 ]. The capability of BDA to identify specific population patterns, manage high volumes of data, and turn them into real-time (or near real-time) insights makes it a powerful tool to support managers in decision-making processes. Despite this, implementing a BDA-based management system within healthcare organizations requires investment in human capital, strong collaboration with stakeholders, and data integration with and among the healthcare units. To this end, Gunapal et al. [ 46 ] have highlighted that Singapore has set up a Regional Health System (RHS) database to facilitate BDA for proactive population health management (PHM) and health services research [ 46 ]. The healthcare database has been built by collecting data from four databases coming from three RHSs: National Healthcare Group (NHG), Tan Tock Seng Hospital (TTSH), National University Hospital (NUH) and Alexandra Hospital (AH).
The result has been a database that is useful for healthcare managers and incorporates data on patient demographics, chronic disease, and healthcare utilization. These characteristics facilitate the identification of specific patient paths linked by past healthcare utilization and chronic disease information. Converging information into a single database helps to understand the cross-utilization of healthcare services across the three RHSs. Such an approach makes it possible to set up the RHS structure for proactive population health management (PHM) and to improve the performance of healthcare organizations [ 46 ].
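The idea of converging several sources into a single database to study cross-utilization can be sketched in a few lines. This is a simplified illustration and not the structure of the Singapore RHS database: the source names `RHS_A`, `RHS_B`, and `RHS_C`, the record fields, and above all the existence of a patient identifier shared across sources are assumptions.

```python
from collections import defaultdict


def merge_sources(sources):
    """Merge per-source visit records into one view keyed by patient.

    sources: dict mapping a source name to a list of records, each a dict
    with 'patient_id' and 'visits'. Assumes the identifier is shared
    across sources (hypothetical).
    Returns {patient_id: {source_name: total_visits}}.
    """
    merged = defaultdict(dict)
    for source, records in sources.items():
        for record in records:
            usage = merged[record["patient_id"]]
            usage[source] = usage.get(source, 0) + record["visits"]
    return dict(merged)


def cross_utilizers(merged):
    """Patients who used services in more than one source."""
    return sorted(pid for pid, usage in merged.items() if len(usage) > 1)


# Hypothetical records for illustration only.
sources = {
    "RHS_A": [{"patient_id": "p1", "visits": 2}, {"patient_id": "p2", "visits": 1}],
    "RHS_B": [{"patient_id": "p1", "visits": 3}],
    "RHS_C": [{"patient_id": "p3", "visits": 4}, {"patient_id": "p1", "visits": 1}],
}
merged = merge_sources(sources)
shared = cross_utilizers(merged)
```

In practice, linking records across institutions usually requires careful identifier matching and privacy safeguards, which this sketch deliberately leaves out.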

RA4: BDA technology for healthcare organization

Wearable devices and different kinds of sensors able to collect clinical data, in combination with BDA, will constitute the basis of personalized medicine and will be crucial tools for improving the performance of healthcare organizations [ 47 ]. Scientific research has to face the important challenge of adapting data acquisition, storage, transmission, and analytics to healthcare demand. Thus, healthcare data should be categorized, homogenized, and implemented into specific models by adapting machine-learning techniques to the nature of the healthcare organization.

A fruitful field of interest for the application of BDA in the healthcare organization is diagnostic imaging. To obtain the maximum benefit from it and make it useful for managers of healthcare organizations, it is necessary to implement digital platforms and applications [ 48 ]. Indeed, the simple production of a large amount of data does not automatically translate into an advantage for healthcare performance. Specific applications are required to favor the correct and advantageous management of diagnostic images [ 48 ]. The link between BDA and IoT technologies, as an instrument to improve the accessibility, customization, and practical delivery of clinical data, emerged as another research direction investigated by the papers included in this RA. These tools allow: (1) healthcare organizations to decrease expenses; (2) people to self-regulate treatments; (3) practitioners to take decisions remotely as quickly as possible and keep constant contact with patients [ 49 ].

In light of these results, it is possible to state that IoT, big data, and artificial intelligence in the form of machine-learning algorithms are three of the most significant innovations in the healthcare organization. These types of organizations are implementing home-centric data collection networks and intelligent BDA systems based on machine learning technologies. For example, such a system has been efficiently implemented in Cartagena, Colombia, for hypertensive patients by using an e-Health sensor and Amazon Web Services components [ 50 ]. The authors stress the importance of combining IoT, big data, and artificial intelligence as tools to obtain better health outcomes for communities and improved performance for the healthcare organization. The new generation of machine-learning algorithms can use standardized data sets generated by these sources to improve the effectiveness of public health interventions [ 50 ]. To this end, as pointed out by numerous studies in the field of BDA applied to healthcare organizations, it becomes crucial for future research to concentrate R&D efforts on fully standardized dataset protocols.

As highlighted by the results, in Europe, as well as in the rest of the world, a significant trend is emerging among healthcare organizations towards adopting BDA-based management systems [ 45 ]. Across the clusters identified, the common element in the studies reviewed is the positive relationship between BDA tools and achievable benefits for healthcare organizations.

As emerged from the RAs, some studies explore business value for healthcare organizations and the concept of potentialities of BDA (RA1) to explain the evidence of precise path-to-value chains leading to specific benefits [ 16 ]. These perspectives provide useful guidelines for healthcare managers who want to consider implementing BDA tools in their organizations. Some authors focus in particular on the role of BDA capabilities in the development of hospital supply chain integration and operational flexibility, demonstrating a positive relationship between the two dimensions [ 27 ]. During the Covid-19 outbreak, it became clearer how important operational flexibility is to healthcare organizations. The scholars also underline how BDA can improve the efficiency of decision-making processes in healthcare organizations, through predictive models and real-time analytics, helping health professionals in the collection, management, and analysis of data [ 37 ].

In general, BDA-based management systems make personalized care programs possible. However, considering the enormous amount and heterogeneity of information available nowadays, there is a need to direct R&D pathways towards data filtering mechanisms and the engineering of new intelligent decision support systems within healthcare organizations [ 38 ].

Circular economy (CE) and sustainability concepts are becoming key drivers in healthcare organizations for reducing the negative impact on the environment (RA2). Some study directions look at BDA as a tool to provide solutions for barriers related to CE and to support sustainable development initiatives in healthcare organizations [ 39 ]. Empirical studies have demonstrated the benefits of BDA-AI in the supply chain integration process and its impact on environmental performance. By assessing a sample of 168 French hospitals, Benzidia et al. [ 40 ] have observed that the use of BDA-AI technologies has a significant impact on environmental process integration and the green supply chain. In particular, this study provides important insights for healthcare managers who wish to implement BDA-AI technologies for sustaining green supply processes and improving environmental performance [ 40 ]. BDA and web technologies can successfully help managers to redesign healthcare processes, making them more effective and efficient. Since healthcare spending is constantly growing in the world’s major regions, there is an urgent need to redesign processes and optimize supply chain activities so that high-quality services can be provided at lower costs [ 42 ]. Although BDA-based management systems promise to fulfil this role in the healthcare organization, more in-depth studies are required. Due to the heterogeneity of information sources, one future research direction should investigate in depth the standardization and integration of protocols in data analysis, as well as the techniques, technologies, and security algorithms of BDA used for healthcare and medical data [ 43 ].

In developing countries, as in the rest of the world, the management of health surveillance is a sensitive issue (RA3). Authors have therefore studied the key factors that hinder BDA access in healthcare organizations [ 44 ]. Technology, staff, data management, and health policies have been identified as some of the decisive variables [ 44 ]. With an ageing population and the associated rise in disability, healthcare organizations will soon face hard challenges. Here, big data can help healthcare managers detect patterns and turn high volumes of data into usable knowledge. This requires investment in technological infrastructure as well as in human capital [ 45 ]. With large-scale investment, China is proving to be a pioneer in the adoption of BDA-based management systems in healthcare organizations [ 46 ].

The rise of AI, IoT, machine learning [ 49 – 51 ], and sensor technology, together with embedded systems able to communicate with each other, has boosted the adoption of BDA, with valuable benefits for healthcare organizations (RA4). These technologies will play a fundamental role in big data management and in improving the performance of healthcare organizations. Some authors have underlined the privacy issues related to healthcare data and the need to make sensor data homogeneous and tagged. Furthermore, clinical records need to be incorporated into models, and machine-learning techniques adapted accordingly [ 47 ]. Future R&D in this field should focus on developing digital platforms and specific BDA-based applications, including for managing diagnostic images [ 48 ].

By exploring the relationship between BDA-based management systems and the benefits delivered to healthcare organizations, this study answers three RQs: 1) What is the state of the art of BDA adopted by healthcare organizations? 2) What are the benefits for both health managers and healthcare organizations? 3) What are the future directions of BDA research in healthcare?

To answer the RQs, the SLR started from an investigation of the recent literature on BDA in healthcare organizations. A descriptive analysis was performed on a sample of 34 studies from all over the world. The second stage was a detailed content analysis of the 16 studies that best address the research question on the relationship between BDA solutions and benefits for the healthcare organization.

By analyzing successful BDA strategies in the healthcare context, some authors focus on the potential of BDA applied in healthcare organizations [ 16 , 37 ]. Indeed, this research highlights how analytical tools, through personal health systems, support public health management systems, and how BDA suggests new ways to support healthcare managers in the decision-making process.

In the literature, other scholars highlight the positive impact of BDA on resource management. BDA solutions are analyzed as tools to sustain CE initiatives [ 38 , 39 ] as well as to enable green supply chain process integration and improve hospital performance [ 40 ]. By exploiting KPIs derived from BDA solutions, some researchers present innovative models for planning public health policy [ 41 ]. In this context, the studies consider BDA cloud computing solutions and social media data analytics for supporting the performance of healthcare supply chain management [ 42 , 43 ]. Furthermore, researchers from all around the world are showing particular interest in BDA for health surveillance system management [ 44 – 46 ].

According to the recent literature, BDA is transforming healthcare organizations. The SLR has shown that BDA solutions are now largely considered a milestone for managerial studies applied to healthcare organizations. The Coronavirus pandemic has been a good test run for using BDA to design healthcare policy strategies. Although an extensive literature on BDA in support of healthcare management is being produced, the proposed classification into four RAs is an attempt to identify precise key research directions. A limitation of the present research is the difficulty of reviewing a body of literature that is constantly evolving. To date, the amount of data is no longer an issue. For data to be useful in the healthcare context, it is necessary to validate their quality and then find the right correlations. In other words, the data should be processed, analyzed, and interpreted correctly. For this reason, research needs to be directed towards filtering mechanisms that convert data from big to smart, and towards engineering new decision support systems within healthcare organizations [ 38 ].

The content analysis carried out in this research has shown that studies aim to find new models for both predictive and personalized medicine by exploiting BDA technologies [ 47 ]. Researchers underline the added value of using BDA both in the medical diagnostic process [ 48 ] and jointly with technologies such as IoT and machine learning [ 49 , 51 ].

Thus, given the results obtained, it is possible to state that BDA can effectively help healthcare managers detect common patterns and turn high volumes of data into usable knowledge. Investment in human capital becomes a priority for exploiting the potential of BDA [ 45 ].

To achieve these objectives, future research should provide usable insights and standardized procedures for training healthcare managers and practitioners. AI, machine learning, and management strategies will also play their part as knowledge producers in the healthcare organization. Privacy issues related to healthcare data, and the need to make sensor data homogeneous, are becoming crucial research topics. Finally, given the heterogeneity of information sources, future research should investigate the standardization and integration of protocols in data analysis, as well as the techniques that help the managerial sector implement BDA-based management systems more widely in future healthcare organizations [ 43 ].

Today, the challenge for healthcare organizations is the development of useful BDA-based applications. In line with the circular economy view, future research should consider the relationship between digitalization and the consumption of management resources. Data centralization combined with a BDA approach can effectively support circular economy processes in the healthcare supply chain by reducing waste and resource consumption.

Exploiting BDA’s capabilities will also be a key factor in forecasting and monitoring outbreaks. Future studies will need to focus on developing more efficient data-sharing models in order to improve the performance of healthcare organizations around the world.

Acknowledgements

Not applicable.

Authors' contributions

NC and FPS designed and conducted the empirical study, wrote and revised the manuscript. NC and FPS carried out the analysis and wrote the results, discussion and conclusions. NC, FPS, NF, and MM revised the manuscript. All authors read the manuscript and approved the final version.

Funding

The research was carried out without funding.

Availability of data and materials

Declarations.

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Nicola Cozzoli

Fiorella Pia Salvatore

Nicola Faccilongo

Michele Milone
