Abstract

In the era of advancement in information technology and the smart healthcare industry 5.0, the diagnosis of human diseases is still a challenging task. The accurate prediction of human diseases, especially deadly cancer diseases in the smart healthcare industry 5.0, is of utmost importance for human wellbeing. In recent years, the global Internet of Medical Things (IoMT) industry has evolved at a dizzying pace, from a small wristwatch to a big aircraft. With this advancement in the healthcare industry, there also rises the issue of data privacy. To ensure the privacy of patients’ data and fast data transmission, federated deep extreme learning entangled with the edge computing approach is considered in this proposed intelligent system for the diagnosis of lung disease. Federated deep extreme machine learning is applied for the prediction of lung disease in the proposed intelligent system. Furthermore, to strengthen the proposed model, a fused weighted deep extreme machine learning methodology is adopted for better prediction of lung disease. The MATLAB 2020a tool is used for simulation and results. The proposed fused weighted federated deep extreme machine learning model is used for the validation of the best prediction of cancer disease in the smart healthcare industry 5.0. The result of the proposed fused weighted federated deep extreme machine learning approach achieved 97.2%, which is better than the state-of-the-art published methods.

1. Introduction

Automated systems for human disease classification are becoming increasingly important in healthcare for several reasons. First, they can provide more accurate and consistent disease classifications than manual classification by healthcare professionals, because they can analyze vast amounts of patient data and identify patterns that humans may miss. Second, they can speed up diagnosis, which is critical for conditions that require immediate treatment. Early detection is key to the successful treatment of many diseases, and automated systems can help detect diseases at an early stage, when treatment is most effective. Third, they can help healthcare professionals make more informed decisions about patient care. Finally, they can be cost-effective in the long term, as they reduce the need for repeated tests and consultations, which are expensive for patients and healthcare providers. In short, the importance of automated disease classification systems lies in their ability to improve accuracy, speed, and efficiency in healthcare, ultimately leading to better patient outcomes.

Cancer is the second leading cause of mortality worldwide. In 2020, more than 19.2 million new cancer cases were reported worldwide, with 9.95 million fatalities [1], accounting for nearly one in every six deaths [2]. The human body is made up of billions of cells, which develop and multiply to generate new cells in response to the body’s needs through a process known as cell division. When cells reach an age or number at which they could harm the body, it is natural for them to die and be replaced with new ones. If this process fails, harmful cells begin to grow and replicate, leading to tumor growth. These tumors may be cancerous or noncancerous [3]. Cancer can affect any person and any organ; colon, lung, liver, prostate, stomach, and skin cancers are the most common types. Globally, 4.14 million new cases of lung and colon cancer were diagnosed in 2020, with 2.7 million deaths [4]. A smart healthcare system is typically connected to the Internet of Medical Things (IoMT), allowing users to operate and control several smart devices, each of which plays an important role in the healthcare of an individual and their family. An IoMT-based intelligent prediction system for breast cancer, empowered by deep learning, has been presented [5]. Fused machine learning techniques [6–8], applied to the prediction of diabetes and energy consumption, have also been adopted for the prediction of human diseases in previous studies.

Heart disease prediction with the help of machine learning [9] has also been studied for the wellbeing of humans.

Smart healthcare solutions that are more open and secure can make citizens’ lives easier and safer. Such systems not only provide useful tools, such as habit tracking and even safety tests, which have prompted customers and system developers to do extensive research [10], but also raise important issues, such as data privacy, security, and data access. This study focuses on federated learning (FL) as a privacy solution [11, 12], especially in the healthcare sector, and examines in depth how FL works and what it contributes to the handling of healthcare data.

Federated learning is a machine learning technique that allows multiple parties to collaboratively train a shared model while keeping their data locally stored and private. This technique has several advantages for healthcare data. First, FL allows healthcare organizations to benefit from each other’s data without actually sharing it: the data remain on the local servers of each organization, and only the trained model parameters are shared. This helps to protect sensitive patient data, which is essential in healthcare. Second, FL allows a larger and more diverse set of data to be used in the training process, which can improve the accuracy and generalizability of the model. Because data are collected from multiple sources, there is also less risk of the bias or skew that can occur when relying on a single source. Third, FL can be more cost-effective than traditional centralized learning methods, since it avoids the need to transfer data to a central location for training. This can be particularly advantageous in healthcare, where large amounts of data need to be processed and analyzed. Fourth, FL encourages collaboration between healthcare organizations, researchers, and other stakeholders, which can promote knowledge sharing and faster innovation. Overall, the main advantage of federated learning on healthcare datasets is its ability to facilitate collaboration while maintaining data privacy and security, which can improve the quality of healthcare data analysis and ultimately lead to better patient outcomes. An advancement from FL to split learning (SL) [13] has been presented for overcoming privacy issues. The permutation of small datasets [14] of patients in different hospitals allows a well-trained model to achieve the global objective of better evaluation and classification. Resource efficiency management [15] in FL has also been considered.

Handling heterogeneity in data can be a significant challenge in data analysis, as it complicates the process of identifying meaningful patterns, relationships, or insights. A dataset is heterogeneous when its data are not uniform or consistent across all its dimensions. For example, a dataset could be heterogeneous if it contains data in different languages, data of different types (e.g., numerical, categorical), data with different levels of granularity (e.g., daily, monthly), or data from different sources or formats. Effective strategies for dealing with heterogeneity include data preprocessing, data normalization, data integration, and the use of appropriate statistical or machine learning techniques.

IoMT stands for “Internet of Medical Things,” which refers to a system of interconnected medical devices, software applications, and health systems that collect and share data over the Internet. IoMT has the potential to revolutionize healthcare by improving patient outcomes, reducing costs, and increasing efficiency. In healthcare, IoMT can be used in a variety of ways, including remote patient monitoring, telemedicine, predictive analytics, and real-time asset tracking. For example, wearable devices such as smartwatches and fitness trackers can collect and transmit patient data to healthcare providers in real time, allowing them to monitor vital signs and detect potential health problems before they become serious. Edge computing [16] is used for data collection at each node of smart healthcare hospitals, while fog computing [17] is used for connecting all the edge nodes and for data transmission between them. Another significant difficulty at this point is keeping data safe from unauthorized persons. For lung cancer detection, profuse clustering [18] combined with transfer learning (TL) has also been used. Different trained image classification models are being used for disease identification. Automated lung cancer detection from CT images [19] and histopathology [20], with the help of neural networks (NN) and ensemble classifier techniques, has been adopted in recent studies.

In the healthcare industry 5.0, IoMT plays a vital role in collecting data and transferring it wirelessly from different places using different devices and sensors. Healthcare data can therefore be compromised in many ways, yet the healthcare industry 5.0 demands high security and privacy for patients’ data. Different approaches have been tried to overcome this issue, but the problem remains unsettled. In this article, we consider how the security and privacy of patients’ data can be handled using the FL approach with edge computing.

FL is a framework that has lately gained popularity because of the high level of assurance it offers for learning with fragmented and sensitive data. It allows a shared global model to be trained through a central server while the information stays within the organization that owns it, rather than merging the data collected from diverse sources or replicating it in one place. FL is a robust ML approach that works efficiently by combining training from several sources to create a global model without exchanging datasets directly. There are many advanced machine learning models for the classification of human diseases and their prognosis. In FL, machine learning models are applied to the local datasets of each hospital to strengthen data privacy and security, and especially to ensure the integrity of the data. This ensures that patient privacy is maintained between sites. The model is trained by distributing it across remote data centers, such as health facilities or other medical organizations, so that the data stay localized at these sites. No data from any contributor are exchanged or transferred during the training process. Instead of data being provided to a single server, as in traditional deep learning, the server maintains a common global architecture that is shared by all institutions. Each organization then trains its own model on its patient data and sends feedback to the server in the form of the model’s error gradient. The central server compiles all participants’ feedback and adjusts the global model based on predefined criteria. These criteria allow the server to judge the quality of each response and, as a result, to include only information that adds value. Feedback from centers that report unfavorable or atypical results may therefore be overlooked. This process constitutes a single round of FL, and it is repeated until the global model is learned. The entire design of FL is shown in Figure 1.
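The round described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system used in this work: the one-parameter linear model, the function names `local_gradient` and `federated_round`, the `max_norm` acceptance rule, and the toy hospital data are all hypothetical choices made for the example.

```python
# Minimal sketch of FL rounds (illustrative assumptions throughout).
# Each "hospital" holds private (x, y) pairs for a toy model y = w * x.

def local_gradient(w, data):
    """Mean gradient of the squared error 0.5 * (w*x - y)^2 on local data."""
    return sum((w * x - y) * x for x, y in data) / len(data)

def federated_round(w_global, hospitals, lr=0.1, max_norm=100.0):
    """Run one round: collect local gradients, filter, average, update."""
    # Only gradients leave each site; the raw records never do.
    grads = [local_gradient(w_global, data) for data in hospitals]
    # Hypothetical acceptance criterion: ignore updates that are too large.
    accepted = [g for g in grads if abs(g) <= max_norm]
    if not accepted:
        return w_global
    avg = sum(accepted) / len(accepted)
    return w_global - lr * avg

hospitals = [
    [(1.0, 2.0), (2.0, 4.0)],   # Hospital A (toy data)
    [(3.0, 6.0)],               # Hospital B (toy data)
    [(1.5, 3.0), (0.5, 1.0)],   # Hospital C (toy data)
]

w = 0.0
for _ in range(200):            # repeat rounds until the model settles
    w = federated_round(w, hospitals)
```

With these toy records, every site is consistent with the same slope, so repeated rounds drive the shared parameter toward that value even though no site ever sees another site's records.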

Edge computing is the practice of physically bringing computational capacity closer to the source of data, which is mainly an IoT device or sensor. It is named after the way computational power is moved to the edge of the network or device, allowing faster data processing, higher bandwidth, and data sovereignty. By processing data at the network’s edge, edge computing reduces the need for huge volumes of data to travel between servers, the cloud, and devices or edge locations. This is especially relevant for current applications such as data science and artificial intelligence. The purpose of edge computing [16] is to bring data sources and devices closer to each other, reducing processing time and distance, which improves application and device performance. Sensors are valuable instruments for evaluating smart healthcare systems, ecosystems, and customers, and the IoMT therefore collects data from them. Sensors, interaction devices, and smart healthcare devices are the three sorts of devices that fit within this category. Sensors collect data, which are then processed by computers. The IoMT network system includes closed-circuit devices, wearables, and other items that make up the edge layer. Overall, IoMT has the potential to transform healthcare by improving patient care, reducing costs, and increasing efficiency, and as technology continues to evolve, we can expect more widespread adoption of IoMT in the coming years. The data captured and stored by these devices are processed on the edge nodes by applying artificial neural networks (ANNs) and their variants.

Artificial neural networks (ANNs) are increasingly being used in healthcare for a variety of applications. First, ANNs can be trained on large datasets of medical images or other patient data to accurately identify diseases or conditions. For example, ANNs have been used to diagnose breast cancer from mammograms, predict the likelihood of a patient having a heart attack, and detect skin cancer from images of moles. Second, ANNs can be used to predict the progression of a disease or assess the risk of complications. For instance, ANNs have been used to predict the risk of postoperative complications and the likelihood of a patient developing complications from diabetes. Third, ANNs can help in the development of new drugs by predicting the efficacy and toxicity of potential drug candidates. ANNs have been used to predict the toxicity of new drugs on the liver, the effectiveness of cancer drugs, and drug-drug interactions. Fourth, ANNs can be used to analyze electronic health records (EHRs) to identify patterns and predict outcomes. For instance, ANNs can be used to predict which patients are at risk of readmission or to identify patients who may benefit from preventative interventions.

Overall, ANNs have great potential to improve healthcare by providing accurate and efficient diagnosis, prediction, and personalized treatment.

The main objective of the proposed fused weighted model is to highlight the strengths and weaknesses of different variants of ANN on the same healthcare dataset. Fusion of ML models refers to the process of combining multiple machine learning models to improve overall performance or accuracy. There are several ways to perform model fusion, depending on the nature of the problem and the types of models being used. In our proposed FL methodology, we consider the weights of each model for fusion. The deep extreme machine learning approach is used here for deep analysis of healthcare datasets and accurate prediction of disease. To get the maximum advantage of different ML models, the proposed fused weighted federated deep extreme learning approach combines the weights of the Levenberg–Marquardt (LM) and Bayesian regularization (BR) models. In this way, we develop a new generalized model that has the potential to achieve better accuracy in a heterogeneous environment than the individual ML models. This approach has not been considered in traditional ML methods for disease prediction in the healthcare industry 5.0.

The novelty and contribution of this research are as follows:
(i) The privacy and security of patient data, as well as a secure automated healthcare system, are considered in this research.
(ii) The issue of delayed transmission of IoMT data to the cloud, which normally makes it difficult to meet real-time requirements, is addressed.
(iii) The fused weighted federated deep extreme machine learning (FDEML) model is utilized for disease prediction in this research.
(iv) The proposed fused weighted FDEML model provides a better solution for accurate disease identification and treatment.
(v) The proposed fused weighted FDEML model also provides a better opportunity for the selection of the best prediction ML model.

Finally, simulation results show that FL for the classification of lung disease in e-healthcare monitoring systems achieves better accuracy while preserving the privacy of patients’ data.

The following is a breakdown of the paper’s structure: Section 2 highlights the latest research on lung cancer discovery and monitoring as reported in the literature. The research methods, feature extraction, feature selection, and proposed fused weighted FDEML model are all covered in Section 3; the dataset selection, preprocessing, data fusion (augmentation), and results and discussion are presented in Section 4; and the conclusion and future work are discussed in Section 5. References are given in Section 6.

According to a recent study, cloud-based medical records have several disadvantages, most of which are connected to healthcare-related data that come from multiple sources and are taken and analyzed from various databases available anywhere. Furthermore, no infrastructure exists that stores all healthcare-related data, such as lab tests, imaging, or the prescriptions from a patient’s visits, in a cloud-like environment and makes them securely accessible from anywhere. Many medical departments now use computer systems and software to store data rather than a manual system. This minimizes the human labor necessary to obtain data manually, as well as the time and effort needed to do so. Users, on the other hand, are still unable to obtain data online from their homes; they must physically visit the facility, which takes time. The tasks and responsibilities of smart homes are evolving as a result of recent developments in information and communication technology (ICT) and the Internet of Things. A smart healthcare system is a home that collects and sends information in real time. It can use smart technology to offer automated services and information from several medical devices, including a smartwatch, diabetes monitor, blood pressure monitor, electrocardiograph machine, and many more.

Systems that make use of these new technologies are incorporated into the health-based interactive system of computers and the community without user intervention [21]. Depending on their settings and the configuration of the smart healthcare network, consumers may be able to regulate the use of various medical equipment to track and manage their health, so that the full advantages of the health product design are achieved. The IoMT and smart living are becoming important parts of the healthcare system. The smart healthcare network structure is made up of several tiny embedded computers that are linked together and connected to a range of IoMT devices [22] over the Internet. Wired networking services have been phased out in favor of wireless networking services [23]. Data have been a main source of intelligence in recent decades, and smart applications for real-world concerns such as wireless networking, bioinformatics, agriculture, and finance [24] have opened up new opportunities. These systems are data-driven and incorporate user-friendly insights that help people perform their tasks more efficiently. This generates knowledge, customizes consumer perceptions, enhances customer interactions, boosts operational efficiency, and necessitates the use of developing technology. Several sophisticated technologies make people’s lives easier [25]. A lightweight deep model for pulmonary nodule detection on mobile devices has been presented [26]. Large amounts of data are stored in such systems, and the preservation of this constantly changing material in archives raises security concerns. Machine learning techniques have been adopted for analyzing disease-gene relations [27], and skin lesion classification [28] has been presented for identifying skin-related problems.

A precision-weighted FL algorithm [29] has been evaluated on the MNIST image classification dataset. Attack detection via FL in a medical-physical system has been presented [30], in which automated e-healthcare monitoring systems were proposed. An FL-based intrusion detection system for attacks on the IoT has been proposed [31], using a multivariate dataset. Game-based deep reinforcement learning for energy-efficient computation has been proposed [32]; that study uses data segmentation and time division to reduce the computational cost and handles asynchronous transmission of real-time data efficiently.

An FL framework based on deep knowledge tracing has been proposed [33]. Data security is handled intelligently in that work, but a major drawback is that students’ privacy is not considered. FL of predictive models from electronic health records (EHR) [34] proposed a decentralized optimization framework for predicting hospitalization. The authors contribute to the convergence rate and to the reduction of communication cost; the major drawback is that no simulation results obtained via FL were reported. A dynamic fusion-based FL approach for COVID-19 detection [35] was proposed, in which an image-based dataset was used to identify COVID-19 patients. The evaluation parameters showed good results, but the major drawback of this medical image analysis is that it does not protect the privacy of patients’ data. A personalized FL approach [36] for IoT applications based on a cloud-edge framework was proposed. A privacy-preserving FL approach for traffic flow was presented in [37]. In that study, an FL algorithm was designed to predict the flow of traffic, and the FedGRU algorithm was used for simulation and result evaluation. The approach offers a good way of reducing communication overhead, but its major flaw is that no numerical simulations of privacy were shown. Privacy-preserving misbehavior detection in the Internet of Vehicles using FL has also been studied [38]. An FL scheme for collision avoidance in traffic was proposed in [39]; in this approach, the knowledge of transfer reinforcement learning agents can be shared at trial time. A comprehensive study of brain tumor diagnosis via deep learning and FL methodologies has been conducted [40], with a thorough review of all aspects of brain tumor research, including approaches, datasets, and classifiers.

An optimal DL-based fusion model [41] has been presented for image classification in biomedical sciences. In this approach, SIFT-based handcrafted features of the image dataset and Inception v4-based deep features are fused for better classification of cancer. Feature selection and segmentation [42] in MRI images have been presented for the diagnosis of brain tumors. Lung cancer detection has been presented in [43], where transfer learning and class-selective image processing techniques are applied to improve detection accuracy.

Lightweight encryption techniques [44, 45] have been adopted in healthcare for better security of patient data. Different ML algorithms combined with IoT devices [46–48] have been applied to secure data in smart healthcare systems as well as in smart grids. Encryption techniques contribute considerably to securing data in IoT devices. A novel method for predicting hydrogen storage in dibenzyltoluene via weighted federated machine learning has been presented [49]; it allows the use of distributed data without compromising privacy. The authors demonstrate the effectiveness of the method by comparing it with other machine learning approaches and experimental data. The results show that the method can accurately predict the hydrogen storage capacity of dibenzyltoluene, which has important implications for the development of sustainable energy storage technologies.

3. Proposed Work

The proposed fused weighted FDEML model for the prediction of lung cancer in a smart healthcare system is shown in Figure 1. The model is divided into the following phases: (1) an acquisition layer for the collection of patients’ data, (2) preprocessing of raw data, (3) training and retraining of each local model, (4) a private edge-cloud layer for storing the fused weighted FDEML model, (5) a public cloud layer, and (6) a validation layer. Patients’ data are collected in the acquisition layer through IoMT devices. The collected data are then preprocessed to remove noise. After preprocessing, the data are divided into training and validation sets for the FDEML phase.

The global model exchanges the average weights of its model with the local models of each hospital for the prediction and training of the local models, as shown in Figure 1. This exchange continues until the required criteria are met. Once the model reaches the required threshold, the model weights are sent to the private edge cloud for further optimization. As shown in Figure 1, a private edge cloud is deployed for training and retraining each local model to build one global fused weighted FDEML model for the prediction of lung disease in smart healthcare 5.0.

The hospitals are labeled Hospital A, Hospital B, Hospital C, and so on up to Hospital N, and each holds the data of the patients in its vicinity. In Figure 1, a global fused weighted FDEML model is stored in the private edge cloud for predicting lung disease, combining different FDEML models to achieve better accuracy in a heterogeneous environment.

The flow chart of the proposed method is shown in Figure 2. The method is divided into the following phases: (1) data collection, (2) preprocessing, (3) data distribution, (4) the FDEML phase, (5) fusion of the LM model and the BR model, (6) testing and validation, and (7) disease classification. In phase 4, the FDEML approach is applied separately to both models. In phase 5, the weights of the LM and BR models extracted through FDEML are fused: the weights of the two models are combined in proportion to the accuracy each achieved individually. Finally, the fused weighted FDEML model is applied for disease classification.
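The accuracy-proportional fusion in phase 5 can be illustrated with a short Python sketch. It assumes that "ratio of accuracy" means each model's accuracy divided by the sum of both accuracies, and that the two weight matrices already share the same shape; the function name `fuse_by_accuracy` and all numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fuse_by_accuracy(w_lm, w_br, acc_lm, acc_br):
    """Fuse two same-shaped weight matrices in proportion to model accuracy."""
    total = acc_lm + acc_br
    s_lm, s_br = acc_lm / total, acc_br / total   # scaling factors sum to 1
    return [[s_lm * a + s_br * b for a, b in zip(row_lm, row_br)]
            for row_lm, row_br in zip(w_lm, w_br)]

# Made-up weights and accuracies, not results from this study.
w_lm = [[1.0, 2.0], [3.0, 4.0]]
w_br = [[5.0, 6.0], [7.0, 8.0]]
fused = fuse_by_accuracy(w_lm, w_br, acc_lm=0.75, acc_br=0.25)
# fused == [[2.0, 3.0], [4.0, 5.0]]
```

The more accurate model pulls the fused weights toward its own values, which is the intended effect of weighting by individual performance.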

The mathematical model of the proposed fused weighted FDEML is as follows. An artificial neural network (ANN) is fed the dataset once it has been collected from the various networks. An ANN operates on three layers: the input layer, the hidden layer, and the output layer. In the ANN model, [a1, a2, a3, …, an] denotes the input features; “i,” “j,” and “k” denote the element indices in each layer; and each circle within a layer denotes a neuron. A bias is added to each layer, indicated by b1 and b2. One weight matrix holds the weights between the input and hidden layers, and a second holds the weights between the hidden layer and the output layer.

The input layer has n elements, the hidden layer has m, and the output layer has p, which gives the dimensions of each layer.

The goal of the feed-forward phase is to obtain the output of each neuron in the hidden and output layers. For a given client, the output of each hidden-layer neuron is calculated using equation (1) [49].

Similarly, equation (2) [49] gives the output of each neuron in the output layer.

The difference between the actual output and the estimated output is called the error. It is calculated for each client using equation (3) from the actual output and the estimated output at the output layer.
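The feed-forward phase and the error calculation can be sketched as follows. The text above does not preserve the activation function, so the sigmoid used here is an assumption, as are the half-squared-error form and the names `forward`, `squared_error`, `w`, and `v`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(a, w, v, b1, b2):
    """Feed-forward pass for one sample.

    a: n input features; w[i][j]: input i -> hidden j;
    v[j][k]: hidden j -> output k; b1, b2: layer biases.
    """
    n, m, p = len(a), len(w[0]), len(v[0])
    hidden = [sigmoid(b1 + sum(a[i] * w[i][j] for i in range(n)))
              for j in range(m)]
    output = [sigmoid(b2 + sum(hidden[j] * v[j][k] for j in range(m)))
              for k in range(p)]
    return hidden, output

def squared_error(target, output):
    """Half the sum of squared differences between actual and estimated output."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))

# With all-zero weights and biases every neuron outputs sigmoid(0) = 0.5.
w0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # n = 3 inputs, m = 2 hidden
v0 = [[0.0], [0.0]]                         # m = 2 hidden, p = 1 output
hidden, output = forward([1.0, 2.0, 3.0], w0, v0, b1=0.0, b2=0.0)
err = squared_error([1.0], output)
```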

The error is then propagated backward to update the weights. The weight-updating process starts at the output layer and moves back to the input layer via the hidden layer. The change in the weights between the output layer and the hidden layer for a given client is given in equation (4).

Similarly, the change in the weights between the hidden layer and the input layer for a given client is defined by equation (5).

From equations (4) and (5), the relation for the change in weights can be converted into equations (6) and (7) as follows:

Using the chain rule:

After taking the partial derivative and applying the chain rule, equation (8) can be rewritten as equation (9), as given below.

This expression can be reduced further by substituting a constant factor. Equation (10) gives the change in the weights between the output and hidden layers in terms of the constant factor defined in equation (11).

The weights are then updated using equation (12), in which the next weight is computed from the current weight value, the change in the weight, and a learning rate factor.

From equation (7), the change in the hidden-to-input weights can be derived as follows:

Using the chain rule, the change in the hidden-to-input weights is defined by equation (13), which gives the change in the weight between an element of the hidden layer and an element of the input layer for a given client.

By applying the chain rule and taking the derivative, equation (13) can be stated in the following form.

Equation (14) can be made more compact by substituting the constant factor stated in equation (11).

Equation (15) can be reduced further

by substituting a single term for the constant factor.

The hidden-to-input weights can then be updated using equation (18) [49], in the same way as in equation (12):

In equations (12) and (18), the learning rate controls the size of each weight update. Equations (12) and (18) [49] yield the optimum weights, which are then aggregated at the federated server to form the global model.
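For orientation, the derivation described above matches the standard back-propagation rules for a network with one hidden layer. The sketch below is a reconstruction under stated assumptions rather than the original equations: it assumes a sigmoid activation and a half-squared-error loss, drops the client index, and uses hypothetical symbols ($h_j$ for hidden outputs, $\hat{y}_k$ for estimated outputs, $y_k$ for actual outputs, $w_{ij}$ and $v_{jk}$ for the input-to-hidden and hidden-to-output weights, $\delta$ for the constant factors, and $\lambda$ for the learning rate).

```latex
\begin{align*}
h_j &= \frac{1}{1 + e^{-\left(b_1 + \sum_{i=1}^{n} w_{ij} a_i\right)}},
  \qquad j = 1, \dots, m \\
\hat{y}_k &= \frac{1}{1 + e^{-\left(b_2 + \sum_{j=1}^{m} v_{jk} h_j\right)}},
  \qquad k = 1, \dots, p \\
E &= \frac{1}{2} \sum_{k=1}^{p} \left(y_k - \hat{y}_k\right)^2 \\[6pt]
\Delta v_{jk} &= -\frac{\partial E}{\partial v_{jk}}
  = -\frac{\partial E}{\partial \hat{y}_k}\,
     \frac{\partial \hat{y}_k}{\partial v_{jk}}
  = \left(y_k - \hat{y}_k\right) \hat{y}_k \left(1 - \hat{y}_k\right) h_j
  = \delta_k\, h_j \\
\delta_k &= \left(y_k - \hat{y}_k\right) \hat{y}_k \left(1 - \hat{y}_k\right) \\
v_{jk}^{+} &= v_{jk} + \lambda\, \Delta v_{jk} \\[6pt]
\Delta w_{ij} &= -\frac{\partial E}{\partial w_{ij}}
  = -\sum_{k=1}^{p} \frac{\partial E}{\partial \hat{y}_k}\,
     \frac{\partial \hat{y}_k}{\partial h_j}\,
     \frac{\partial h_j}{\partial w_{ij}}
  = \left[\sum_{k=1}^{p} \delta_k\, v_{jk}\right] h_j \left(1 - h_j\right) a_i
  = \delta_j\, a_i \\
\delta_j &= \left[\sum_{k=1}^{p} \delta_k\, v_{jk}\right] h_j \left(1 - h_j\right) \\
w_{ij}^{+} &= w_{ij} + \lambda\, \Delta w_{ij}
\end{align*}
```

The output-layer factor $\delta_k$ is reused inside the hidden-layer factor $\delta_j$, which is why the update proceeds from the output layer back toward the input layer.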

Algorithm 1 shows the pseudocode of the proposed FDEML model, which executes on the ith client.

Client Training Algorithm (, )
(1) Start
(2) Split the local data into minibatches of size Bs
(3) Initialize both layers i.e., input layer and hidden layer weights (, ),
   = 0 and number of epochs t = 0
(4) For every minibatch (Bs)
   (i) Perform the feed-forward phase:
      (a) Calculate the hidden-layer output using equation (1)
      (b) Calculate the estimated output using equation (2)
   (ii) Calculate the error value using equation (3)
   (iii) Back-propagate to update the weights:
      (a) Calculate the change in the hidden-to-output weights using equation (10)
      (b) Calculate the change in the input-to-hidden weights using equation (16)
      (c) Update the hidden-to-output weights using equation (12)
      (d) Update the input-to-hidden weights using equation (18)
(5) If the stopping criteria are not met, go to step 4; otherwise, go to step 6
(6) Return the optimum weights to the federated server
(7) Stop
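
The client-side steps of Algorithm 1 can be sketched in plain Python as shown below. This is an illustrative gradient-descent version under stated assumptions, not the MATLAB implementation used in this work and not the LM or BR training rule: it assumes a sigmoid activation, folds the bias into a constant input feature, and uses hypothetical names (`client_train`, `toy_data`) and hyperparameters.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def client_train(data, n_hidden, batch_size=2, lr=0.5,
                 max_epochs=2000, tol=1e-3, seed=0):
    """Return (w, v, history) for one client; history holds per-epoch error."""
    n_in, n_out = len(data[0][0]), len(data[0][1])
    rng = random.Random(seed)
    # Step 3: initialize w[i][j] (input -> hidden) and v[j][k] (hidden -> output).
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
    history = []
    for _ in range(max_epochs):
        epoch_error = 0.0
        # Steps 2 and 4: walk through the local data in minibatches.
        for start in range(0, len(data), batch_size):
            dw = [[0.0] * n_hidden for _ in range(n_in)]
            dv = [[0.0] * n_out for _ in range(n_hidden)]
            for a, t in data[start:start + batch_size]:
                # Feed-forward phase.
                h = [sigmoid(sum(a[i] * w[i][j] for i in range(n_in)))
                     for j in range(n_hidden)]
                y = [sigmoid(sum(h[j] * v[j][k] for j in range(n_hidden)))
                     for k in range(n_out)]
                epoch_error += 0.5 * sum((t[k] - y[k]) ** 2 for k in range(n_out))
                # Back-propagation: output-layer, then hidden-layer factors.
                d_out = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(n_out)]
                d_hid = [sum(d_out[k] * v[j][k] for k in range(n_out))
                         * h[j] * (1 - h[j]) for j in range(n_hidden)]
                for j in range(n_hidden):
                    for k in range(n_out):
                        dv[j][k] += d_out[k] * h[j]
                for i in range(n_in):
                    for j in range(n_hidden):
                        dw[i][j] += d_hid[j] * a[i]
            # Weight update for this minibatch.
            for j in range(n_hidden):
                for k in range(n_out):
                    v[j][k] += lr * dv[j][k]
            for i in range(n_in):
                for j in range(n_hidden):
                    w[i][j] += lr * dw[i][j]
        history.append(epoch_error)
        # Step 5: stopping criterion.
        if epoch_error < tol:
            break
    # Step 6: these weights would be returned to the federated server.
    return w, v, history

# Toy local dataset (logical OR); the trailing 1.0 acts as a bias input.
toy_data = [([0.0, 0.0, 1.0], [0.0]), ([0.0, 1.0, 1.0], [1.0]),
            ([1.0, 0.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [1.0])]
w, v, history = client_train(toy_data, n_hidden=3)
```

Only `w` and `v` would leave the client in a federated setting; `toy_data` stays local.
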
3.1. Transfer of Weights

These weights are then transferred to the cloud or federated server. To secure this system, these weights can be encrypted and then transmitted. In this study, the encrypting of the weights is not used and it is left as an additional entity that can be added as per application requirements.

3.2. Federated Server

Each client is transmitting its optimum weight (, ) to the federated server. In our case, the clients are trained by (1) Levenberg−Marquardt (LM) and (2) Bayesian regularization (BR). The optimized weights of the LM algorithm and BR algorithm are given in equations (19) and (20), respectively

The combined optimal input-to-hidden-layer weights for the federated server can be stated using equation (21), which gives the aggregated weights of all locally trained clients

This aggregation runs into a constraint of matrix addition: matrices can be added only when their dimensions agree. It is clear from equation (21) that the locally trained weight matrices cannot be added directly, since they do not all have the same dimensions. To resolve this, all the matrices concerned are brought to the same dimensions by concatenating a zero matrix with each matrix where required.

To do so, we use equation (22) to find the maximum row length across all locally trained clients

Similarly, we find the maximum column length across all locally trained clients using

To pad each optimum weight matrix with zeros, the following procedure is used. In this procedure (equations (26)–(28)), a zero matrix is generated for each of the LM and BR algorithms. These zero matrices are then horizontally concatenated with the corresponding locally trained model weights

The horizontal concatenation is given below in equations (26) and (27)

After the concatenation in equations (26) and (27), the padded matrices have the same dimensions and can therefore be aggregated with each other. To obtain the federated server (global) model, we use equation (28)
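
The padding and aggregation procedure described above can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: zeros are appended on the right and, where needed, at the bottom of each matrix, and the aggregation is taken to be an element-wise sum. The function names pad_with_zeros and aggregate and the small example matrices are hypothetical.

```python
def pad_with_zeros(matrix, n_rows, n_cols):
    """Return a copy of `matrix` padded with zeros to n_rows x n_cols."""
    # Append zero columns on the right of every existing row.
    padded = [list(row) + [0.0] * (n_cols - len(row)) for row in matrix]
    # Append zero rows at the bottom if the matrix is too short.
    padded += [[0.0] * n_cols for _ in range(n_rows - len(matrix))]
    return padded


def aggregate(client_weights):
    """Pad every client matrix to the common maximum shape, then add element-wise."""
    max_rows = max(len(m) for m in client_weights)     # maximum row count
    max_cols = max(len(m[0]) for m in client_weights)  # maximum column count
    padded = [pad_with_zeros(m, max_rows, max_cols) for m in client_weights]
    return [[sum(m[r][c] for m in padded) for c in range(max_cols)]
            for r in range(max_rows)]


# Hypothetical client weights with mismatched shapes (2 x 3 and 3 x 2).
w_lm = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w_br = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
w_global = aggregate([w_lm, w_br])
# w_global == [[11.0, 22.0, 3.0], [34.0, 45.0, 6.0], [50.0, 60.0, 0.0]]
```

If the global model is instead meant to be an average or a weighted combination, only the summation inside aggregate would change; the padding step stays the same.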

3.2.1. Optimal Weights of Hidden-Output Layer

As with the input-to-hidden layer, the optimal hidden-to-output layer weights for the LM and BR algorithms can be stated using equations (29) and (30)

Equation (37) gives the fused weights of the hidden-to-output layer. The locally trained clients are given different scaling factors based on their performance.
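
One possible way to realize performance-based scaling factors is sketched below in Python. The exact form of the scaling is not given in the text, so this sketch assumes that each client's factor is its validation accuracy divided by the sum of all clients' accuracies, and that the weight matrices have already been brought to a common shape. The function name fuse and the small weight matrices are hypothetical; the accuracies 93.5% and 82.4% are the LM and BR validation accuracies reported in this paper.

```python
def fuse(weights, scores):
    """Combine same-shaped weight matrices using normalised performance scores."""
    total = sum(scores)
    factors = [s / total for s in scores]  # assumed form of the scaling factors
    n_rows, n_cols = len(weights[0]), len(weights[0][0])
    fused = [[sum(f * w[r][c] for f, w in zip(factors, weights))
              for c in range(n_cols)]
             for r in range(n_rows)]
    return fused, factors


# Hypothetical hidden-to-output weights for the LM-trained and BR-trained clients.
w_lm = [[0.8], [-0.2], [0.5]]
w_br = [[0.6], [0.1], [0.3]]
fused, factors = fuse([w_lm, w_br], scores=[0.935, 0.824])
```

Under this assumption the better-performing client (LM) receives the larger factor, so its weights contribute more to the fused model.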

Algorithm 2 shows the pseudocode of the proposed FDEML model, which executes on the server side.

Federated server algorithm
(1) Start
(2) Initialize weights (, )
(3) For each cycle Do
   For each client Do
      [, ] = Client (t, , )
   End
End
(4) Calculate using equation (37)
(5) Calculate using equation (28)
(6) Prediction of unknown data samples
   (a) For i = 1 to No. of Samples
      (i) Calculate
      (ii) Calculate
      (iii) Calculate the error
(7) Stop
3.3. Dataset

In this study, we used the lung cancer dataset [50], which consists of 309 cases with 15 features of lung cancer disease. The features of the dataset, together with their characteristics, determining units, and ranges, are shown in Table 1. We augmented the dataset by adding 231 records to balance the two classes. An FDEML model cannot be applied directly to a small dataset with nominal values. Therefore, all nominal inputs are converted to numeric values for the proper working of the proposed model, as shown in Table 1.
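
The nominal-to-numeric conversion can be sketched as follows in Python. The actual coding is the one given in Table 1; the mapping, the feature names, and the sample record below are purely hypothetical and serve only to illustrate the idea.

```python
# Hypothetical coding of nominal values; the real coding is defined in Table 1.
NOMINAL_TO_NUMERIC = {"NO": 0, "YES": 1, "F": 0, "M": 1}


def encode_record(record):
    """Replace nominal string values with numeric codes; leave numbers unchanged."""
    encoded = {}
    for feature, value in record.items():
        if isinstance(value, str):
            encoded[feature] = NOMINAL_TO_NUMERIC[value.strip().upper()]
        else:
            encoded[feature] = value
    return encoded


# Hypothetical patient record with mixed nominal and numeric features.
sample = {"gender": "M", "age": 62, "smoking": "yes", "lung_cancer": "NO"}
encoded_sample = encode_record(sample)
# encoded_sample == {"gender": 1, "age": 62, "smoking": 1, "lung_cancer": 0}
```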

The research was carried out to determine the performance of the proposed fused weighted FDEML model in diagnosing lung cancer disease. The data were initially acquired from sensors and sent to the raw database using IoMT. Similarly, data collected from patients via lab results, queries, observations, and medical history were translated from an unstructured to a structured format for additional preprocessing. After the features were gathered from the IoMT-based sensors, the preprocessing module examined the final dataset for further processing.

The lung cancer dataset is then used to train the proposed fused weighted FDEML model for the prediction of lung cancer disease. For evaluation purposes, we trained the proposed fused weighted FDEML model with two different variants of ANN. After training, we combined the weights of each model as described in Section 3.2. The dataset was randomly divided into 80% for training and 20% for testing the proposed fused weighted FDEML model for the prediction of lung disease.
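
A random 80/20 split of this kind can be sketched in Python as follows. The function name split_dataset and the fixed seed are assumptions made for reproducibility of the illustration; the record count of 538 is the size of the augmented dataset reported in Section 4, and integer record indices stand in for the actual records.

```python
import random


def split_dataset(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Indices 0..537 stand in for the 538 augmented records.
train_set, test_set = split_dataset(range(538))
print(len(train_set), len(test_set))  # prints "430 108"
```

The resulting sizes match the 430 training samples and 108 validation samples used in the experiments.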

3.4. Performance Evaluation

To determine the DEML model’s overall efficiency, various evaluation metrics are applied, as given in equations (32)–(37). Accuracy shows the overall predictive capability of the ML models and the proposed fused weighted DEML model [51]. In the confusion matrix, true positives (TP) and true negatives (TN) count the cases in which the classifier correctly predicts the presence and absence of lung cancer disease, respectively. False negatives (FN) and false positives (FP) count the incorrect predictions of the model. The sensitivity and success of the lung cancer disease model are calculated separately using the recall and accuracy metrics. For prediction accuracy, the function measure (FM) metric is used. The mathematical formulae of these performance metrics for the proposed DEML models are as follows:
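
For reference, the standard confusion-matrix definitions of the measures reported in Section 4 are sketched below. This is a reconstruction under assumptions rather than the paper's equations verbatim: the order and numbering are not taken from the original, and the miss rate is taken as the complement of accuracy, which is consistent with the accuracy and miss-rate pairs reported in Section 4 (for example, 96.30% and 3.70%).

```latex
\begin{align*}
\text{Sensitivity (recall)} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Miss rate} &= 1 - \text{Accuracy} = \frac{FP + FN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Negative predictive value (NPV)} &= \frac{TN}{TN + FN}
\end{align*}
```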

The training option and parameters used in the Levenberg−Marquardt optimization, Bayesian regularization backpropagation, and the proposed fused weighted FDEML model are presented in Table 2.

4. Results and Discussion

This section presents the results of the proposed fused weighted FDEML model and its assessment against different classifiers. The ML models used for lung cancer disease prediction are divided into three (3) parts: prediction of lung cancer disease with Levenberg−Marquardt optimization, Bayesian regularization backpropagation, and the proposed fused weighted FDEML model. MATLAB 2020a is used for simulation and results. The results of the individual DEML models are as follows: the confusion matrix of the LM model’s performance at the training level is shown in Table 3 and at the validation level in Table 4. Tables 3 and 4 also summarize the accuracy and miss rate in the training and validation phases.

The LM algorithm has been applied to an augmented dataset of 538 records, which has been divided into a training group of 80% (430 samples) and a validation group of 20% (108 samples). Different metrics, including accuracy, miss rate, precision, sensitivity, specificity, and negative predictive value (NPV), are used to produce the statistical measurements for comparison and performance assessment. The formulae given in Section 3.4 are used to calculate these parameters. The LM model predicts the output as one (1) or zero (0). The value one (1) indicates that a health issue has been discovered, whereas zero (0) indicates that no health issue has been discovered.

Table 3 shows the LM model’s predictions for lung cancer disease during the training phase. Of the 430 training samples, 210 are positive and 220 are negative. Of the 210 positive samples, 207 are correctly predicted as positive (true positives), and only three (3) are incorrectly predicted as negative. Of the 220 negative samples, 211 are correctly predicted as negative (true negatives), and nine (9) are incorrectly predicted as positive.

Table 4 shows the LM model’s predictions for lung cancer disease in the validation phase. Of the 108 validation samples, 50 are positive and 58 are negative. Of the 50 positive samples, 47 are correctly predicted as positive (true positives), and only three (3) are incorrectly predicted as negative. Of the 58 negative samples, 52 are correctly predicted as negative (true negatives), and six (6) are incorrectly predicted as positive.

Similarly, the BR model’s performance in the training and validation phases for the diagnosis of lung cancer disease is shown in Tables 5 and 6, respectively. The performance of the proposed fused weighted FDEML-based lung cancer disease prediction system during the training phase is given in Table 7. Of the 430 training samples, 221 are positive and 209 are negative. Of the 221 positive samples, 209 are correctly predicted as positive (true positives), and twelve (12) are incorrectly predicted as negative. Of the 209 negative samples, 202 are correctly predicted as negative (true negatives), and seven (7) are incorrectly predicted as positive.

Table 8 shows the performance of the proposed fused weighted FDEML-based lung cancer disease prediction system in the validation phase. Of the 108 validation samples, 51 are positive and 57 are negative. Of the 51 positive samples, 50 are correctly predicted as positive (true positives), and only one (1) is incorrectly predicted as negative. Of the 57 negative samples, 54 are correctly predicted as negative (true negatives), and three (3) are incorrectly predicted as positive.

Table 9 shows the proposed fused weighted FDEML model’s performance in terms of sensitivity, specificity, accuracy, negative predictive value, precision, and miss rate during the training and validation phases. During training, the model gives 94.6%, 96.7%, 95.6%, 94.4%, 96.8%, and 4.40% sensitivity, specificity, accuracy, negative predictive value, precision, and miss rate, respectively. During validation, the model gives 98%, 94.7%, 96.3%, 98.2%, 94.3%, and 3.70% sensitivity, specificity, accuracy, negative predictive value, precision, and miss rate, respectively. The comparison with related work is shown in Table 10. The overall performance of the proposed fused weighted FDEML model for the diagnosis of lung cancer disease is shown in Figure 3. The LM model achieved 93.50% accuracy on the given dataset for predicting lung cancer disease, with a miss rate of 6.50%. The BR model achieved 82.40% accuracy, which is lower than that of the LM model on the same dataset. The miss rate of the BR model is 17.60%, which is much higher than that of the LM model, as shown in Figure 3. The proposed fused weighted FDEML model achieved 96.30% accuracy, which is higher than that of both abovementioned ML models. Its miss rate of 3.70% is considerably lower than those of the LM and BR models, as shown graphically in Figure 3. For these reasons, the proposed fused weighted model is the better choice for the prediction of lung cancer disease in the healthcare system.
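
As a worked check, the validation figures for the proposed model can be recomputed from the confusion-matrix counts of Table 8 (50 true positives, 1 false negative, 54 true negatives, 3 false positives). The Python sketch below uses the standard metric definitions and takes the miss rate as the complement of accuracy, which is an assumption that matches the reported values; the function name confusion_metrics is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return the evaluation measures, in percent, from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / total,
        "miss_rate": 100.0 * (fp + fn) / total,  # assumed: 100% minus accuracy
        "precision": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Validation counts of the proposed fused weighted FDEML model (Table 8).
validation = confusion_metrics(tp=50, fn=1, tn=54, fp=3)
for name, value in validation.items():
    print(f"{name}: {value:.1f}%")
# sensitivity: 98.0%
# specificity: 94.7%
# accuracy: 96.3%
# miss_rate: 3.7%
# precision: 94.3%
# npv: 98.2%
```

These values agree with the validation results stated above for the proposed model.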

5. Conclusions

Lung cancer disease is a life-threatening disease that affects many parts of the human body. A fused weighted federated deep extreme machine learning model is proposed for the rapid and accurate prediction of lung cancer disease without jeopardizing patients’ privacy. For a faster response and a higher accuracy rate, a federated DEML approach is applied. Furthermore, the weights of the FDEML models are combined to make a new generalized model for better prediction of human disease. The proposed fused weighted FDEML model accurately predicted whether the patient was suffering from lung cancer disease or not. MATLAB 2020a is used to simulate the proposed fused weighted FDEML model. The LM and BR models achieved validation accuracies of 93.5% and 82.4%, respectively. The proposed fused weighted FDEML model achieved an accuracy of 96.3%, which is better than the state-of-the-art methods previously used for the prediction of cancer disease in the smart healthcare industry 5.0.

The proposed fused weighted model is limited to the lung cancer disease dataset. In this proposed model, only two models, i.e., LM and BR, are considered. In the future, a new generalized model can be generated by utilizing the better results of other models. Furthermore, this approach can also be applied to image datasets to achieve better classification of human diseases.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Center for Cyber-Physical Systems, Khalifa University, under grant 8474000137-RC1-C2PS-T5.