Biomedical Data Mining for Information Retrieval. Various authors

Publisher: John Wiley & Sons Limited

Genre: Databases

ISBN: 9781119711261

quickly, because sepsis [28] can increase the risk of death after discharge from hospital. The objective of the paper is to develop a one-year mortality prediction model. A total of 5,650 admitted patients with sepsis were selected from the MIMIC-III database and split into a training set (70% of patients) and a testing set (30%). The model is built with the Stochastic Gradient Boosting method, variables are selected with the Least Absolute Shrinkage and Selection Operator (LASSO), and performance is measured by the area under the receiver operating characteristic curve (AUROC). On the testing set, the model achieves an AUROC of 0.8039 (95% confidence interval: 0.8033–0.8045). Finally, the Stochastic Gradient Boosting ensemble algorithm is found to predict one-year mortality more accurately than the traditional scoring systems SAPS, OASIS, MPM, and SOFA.
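To make the reported AUROC and its confidence interval concrete, the sketch below computes AUROC as the fraction of (died, survived) patient pairs in which the patient who died receives the higher risk score, with ties counted as one half, and derives a percentile-bootstrap 95% interval. It is a minimal stdlib-only illustration, not the cited paper's code: the function names, the bootstrap procedure, and the toy scores are assumptions, since the summary above does not say how the interval was obtained.

```python
import random


def auroc(labels, scores):
    """AUROC as the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive case has the higher
    score, counting ties as one half. Labels are 1 (died) or 0 (survived)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) interval for AUROC.
    The bootstrap method is an assumption of this sketch; resamples
    that contain only one class are skipped."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("need both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lo, hi


# Toy labels and risk scores, made up for illustration.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of patients, which keeps the definition visible; a rank-based formula gives the same value in O(n log n) for larger cohorts.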

      1.3.1 Dataset

      From these 15 variables, the first, last, highest, lowest, and median values are calculated for nine variables and used as features. For four variables, only the first and last values are taken. For dataset A, five outcome-related descriptors are available (SAPS score, SOFA score, length of stay, length of survival, and in-hospital death), of which in-hospital death is used as the target value (0 = survived, 1 = died in hospital).
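A minimal sketch of this feature construction, assuming each patient record maps a variable name to its time-ordered list of measured values. The text does not say which nine variables receive all five statistics and which four receive only the first and last values, so the group lists in the usage line are hypothetical placeholders, as are the function names; an unmeasured variable yields None, to be handled by the pre-processing step.

```python
from statistics import median


def summarize(values, full=True):
    """Summary features for one variable's time-ordered measurements.

    full=True  -> first, last, highest, lowest and median value
    full=False -> first and last value only
    An unmeasured variable (empty list) yields None for each feature.
    """
    keys = ["first", "last", "max", "min", "median"] if full else ["first", "last"]
    if not values:
        return dict.fromkeys(keys)
    feats = {"first": values[0], "last": values[-1]}
    if full:
        feats.update(max=max(values), min=min(values), median=median(values))
    return feats


def patient_features(record, full_vars, first_last_vars):
    """Flatten one patient record (variable name -> list of values)
    into a single dict of named features, e.g. "HR_median"."""
    row = {}
    for names, full in ((full_vars, True), (first_last_vars, False)):
        for name in names:
            for stat, value in summarize(record.get(name, []), full).items():
                row[f"{name}_{stat}"] = value
    return row


# Hypothetical grouping and toy values, for illustration only.
record = {"HR": [88, 95, 102, 91], "Albumin": [3.1, 2.8]}
row = patient_features(record, full_vars=["HR"], first_last_vars=["Albumin", "Cholesterol"])
print(row["HR_median"], row["Cholesterol_first"])  # 93.0 None
```

With nine full-summary variables and four first/last variables, this yields 9 × 5 + 4 × 2 = 53 features per patient.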

      1.3.2 Data Pre-Processing

      Data pre-processing covers the techniques used to filter out and remove noisy data. The dataset contains 41 variables, of which 15 are selected. Some of the selected variables were not carefully collected and contain missing values; in this chapter, missing values are replaced with zeros.
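The selection and zero-filling step can be sketched as below, assuming the features are held in a dict per patient and that a missing value appears as an absent key, None, or a float NaN. The function name and the example column names are illustrative assumptions.

```python
import math


def select_and_impute(row, selected, fill=0.0):
    """Keep only the `selected` feature columns, in that order, and
    replace missing values (absent key, None or NaN) with `fill`.
    The default of 0.0 follows the zero-filling described in the text."""
    out = {}
    for name in selected:
        value = row.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = fill
        out[name] = value
    return out


# Illustrative column names; "Unused" stands for a variable left out.
row = {"HR_first": 88, "Lactate_first": None, "K_last": float("nan"), "Unused": 5}
print(select_and_impute(row, ["HR_first", "Lactate_first", "K_last", "GCS_first"]))
# {'HR_first': 88, 'Lactate_first': 0.0, 'K_last': 0.0, 'GCS_first': 0.0}
```

Because zero is also a legitimate value for some variables (MechVent, for example, is coded 0/1), a zero-filled entry cannot be distinguished from a measured zero once imputation has run.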

      1.3.3 Normalization

      The variables in the dataset lie in different ranges and are measured on different scales, so their raw values cannot be used directly for classification; classifiers perform better when all variables share comparable ranges and scales. The variables are therefore normalized with the standard z-score method, which replaces each value x of a variable by (x − μ)/σ, where μ and σ are that variable's mean and standard deviation.
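A stdlib sketch of z-score normalization follows. It assumes, as is common practice but not stated in the text, that the mean and standard deviation are estimated on the training data only and then reused for the test data, and it uses the population standard deviation; a feature that is constant in the training data is mapped to 0 rather than dividing by zero. The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def fit_zscore(columns):
    """Estimate (mean, population std) per feature.

    `columns` maps a feature name to its list of training values.
    Fitting on training data only is an assumption of this sketch.
    """
    params = {}
    for name, values in columns.items():
        mu = mean(values)
        params[name] = (mu, pstdev(values, mu))
    return params


def apply_zscore(row, params):
    """Return z = (x - mean) / std for every feature in `row`.

    A feature with zero spread in the training data maps to 0.0.
    """
    out = {}
    for name, x in row.items():
        mu, sigma = params[name]
        out[name] = (x - mu) / sigma if sigma > 0 else 0.0
    return out


# Toy training data, made up for illustration.
params = fit_zscore({"HR_first": [80, 90, 100], "FiO2_first": [0.5, 0.5, 0.5]})
print(apply_zscore({"HR_first": 90, "FiO2_first": 0.5}, params))
# {'HR_first': 0.0, 'FiO2_first': 0.0}
```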

S. no.  Variable     Description                                 Physical units
1.      Albumin      Albumin                                     g/dL
2.      ALP          Alkaline phosphatase                        IU/L
3.      ALT          Alanine transaminase                        IU/L
4.      AST          Aspartate transaminase                      IU/L
5.      Bilirubin    Bilirubin                                   mg/dL
6.      BUN          Blood urea nitrogen                         mg/dL
7.      Cholesterol  Cholesterol                                 mg/dL
8.      Creatinine   Creatinine                                  mg/dL
9.      DiasABP      Invasive diastolic arterial blood pressure  mmHg
10.     FiO2         Fractional inspired oxygen                  [0–1]
11.     GCS          Glasgow Coma Score                          [3–15]
12.     Glucose      Serum glucose                               mg/dL
13.     HCO3         Serum bicarbonate                           mmol/L
14.     HCT          Hematocrit                                  %
15.     HR           Heart rate                                  bpm
16.     K            Serum potassium                             mEq/L
17.     Lactate      Lactate                                     mmol/L
18.     Mg           Serum magnesium                             mmol/L
19.     MAP          Invasive mean arterial blood pressure       mmHg
20.     MechVent     Mechanical ventilation                      0/1 (false/true)
21.