The purpose of this study is to develop a method of ROC analysis that evaluates both the ability of individual readers to
detect abnormal findings and the detectability of abnormal findings in individual cases, by applying item response theory
(IRT) to 1/0 judgments of the presence of abnormal findings in CT image reading. The method was validated with the
following data. Twenty-four readers searched for abnormal findings in 25 cases with chest CT images containing defined
abnormal findings. From the 1/0 judgment matrix with the 24 readers as rows and the 25 CT images as columns, we
calculated each reader's latent ability to detect abnormal findings (θ), the rate of "1" judgments by each reader, P(θ),
which serves as the confidence level for TP and FP, and the response characteristic curve of each image, with the image
as the item. From these we created ROC curves representing each reader's ability to detect abnormal findings. In
addition, from the transposed matrix, with the 25 CT images as rows and the 24 readers as columns, we calculated the
latent detectability of the abnormal findings in each CT image (θ) and the rate of "1" judgments for that image across
readers, P(θ), which again serves as the confidence level for TP and FP. From these we created ROC curves representing
the detectability of the abnormal findings in each case.
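The abstract does not state which IRT model was fitted. As a minimal sketch, assuming a two-parameter logistic (2PL) model whose image (item) parameters a_j (discrimination) and b_j (difficulty) have already been estimated, the probability that a reader of ability θ judges image j as "1" is P_j(θ) = 1/(1 + exp(−a_j(θ − b_j))). Treating these probabilities as the reader's confidence levels for abnormal images (TP) and normal images (FP), and sweeping a threshold over them, traces an empirical ROC curve for that reader. The item parameters and truth labels are hypothetical, and the thresholding step is an illustration rather than the authors' stated procedure.

```python
import math
from typing import List, Tuple

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Item characteristic curve of the 2PL model (an assumed model choice)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def reader_roc(theta: float,
               items: List[Tuple[float, float]],
               truth: List[int]) -> List[Tuple[float, float]]:
    """Empirical ROC points (FPF, TPF) for a reader of ability theta.

    items: hypothetical (a_j, b_j) for each image, assumed already estimated.
    truth: 1 = abnormal image, 0 = normal image (both classes must be present).
    """
    conf = [icc_2pl(theta, a, b) for a, b in items]
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    # Lower the decision threshold one distinct confidence value at a time.
    for thr in sorted(set(conf), reverse=True):
        tp = sum(1 for p, t in zip(conf, truth) if p >= thr and t == 1)
        fp = sum(1 for p, t in zip(conf, truth) if p >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical item parameters: three abnormal images, then two normal images.
items = [(1.5, -1.0), (1.2, 0.0), (0.8, 1.0), (1.0, 0.5), (1.3, 1.5)]
truth = [1, 1, 1, 0, 0]
roc = reader_roc(0.0, items, truth)
print(roc)
```

The same function applies to the transposed analysis if the roles of readers and images are swapped, i.e. θ becomes an image's latent detectability and the items are the readers.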
In this paper, we propose a methodology for evaluating whether the use of CAD is effective for any given reader or
case. We first analyze the results of readers' judgments (0 or 1) with a technique we call analysis of bias-variance
characteristics (BVC) [1,2], and then combine it with ROC analysis to elucidate the internal structure of the ROC curve.
First, multiple readers examine the medical image of a single case, without CAD and with CAD, and assign the value 1
(abnormal findings present, or case abnormal) or 0 (abnormal findings absent, or case normal) to their judgment; the
mean and variance of these values are then calculated. The mean represents the degree of bias from the true diagnosis
for that case, and the variance represents the spread of judgments among readers.
When these two parameters are examined for several cases with differing degrees of diagnostic difficulty, a plot of the
mean (horizontal axis) against the variance (vertical axis) shows a bell-shaped relation. We have named this
phenomenon, which typically arises when images are read, the bias-variance characteristic (BVC) of diagnosis. The mean
of the 0/1 judgments of multiple readers is regarded as the confidence level for that case. Using these confidence
levels, ROC curves were drawn in the usual way for diagnoses made without CAD and with CAD. From the difference between
the TPFs obtained without CAD and with CAD at the same FPF on the ROC curves, we were able to quantify the number of
cases, the total number of readers, and the total number of cases for which CAD support was beneficial.
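A minimal sketch of these two steps, under stated assumptions. For 0/1 judgments, the mean m and the population variance v of a case are tied together: since x² = x for x in {0, 1}, v = m − m² = m(1 − m), an inverted parabola that is 0 at m = 0 and m = 1 and peaks at 0.25 when m = 0.5 (with the sample variance, divisor n − 1, the same shape is scaled by n/(n − 1)). The per-case means are then used as confidence levels, an empirical ROC curve is traced by sweeping a threshold over them, and the TPF at a chosen FPF is read off by linear interpolation (taking the upper TPF on vertical segments). The judgment matrices, truth labels, chosen FPF, and the interpolation rule are hypothetical choices for illustration; the paper may have drawn and read its curves differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def case_mean_variance(matrix: List[List[int]]) -> List[Point]:
    """(mean, population variance) of the 0/1 judgments for each case (column)."""
    n = len(matrix)
    out = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        m = sum(col) / n
        out.append((m, sum((x - m) ** 2 for x in col) / n))
    return out

def roc_points(conf: List[float], truth: List[int]) -> List[Point]:
    """Empirical ROC (FPF, TPF) points; truth is 1 for abnormal, 0 for normal."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    pts = [(0.0, 0.0)]
    for thr in sorted(set(conf), reverse=True):
        tp = sum(1 for c, t in zip(conf, truth) if c >= thr and t == 1)
        fp = sum(1 for c, t in zip(conf, truth) if c >= thr and t == 0)
        pts.append((fp / n_neg, tp / n_pos))
    return pts

def tpf_at(pts: List[Point], fpf: float) -> float:
    """TPF at a given FPF by linear interpolation (upper value on vertical jumps)."""
    best = None
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= fpf <= x1:
            y = max(y0, y1) if x1 == x0 else y0 + (y1 - y0) * (fpf - x0) / (x1 - x0)
            best = y if best is None else max(best, y)
    if best is None:
        raise ValueError("fpf must lie in [0, 1]")
    return best

# Hypothetical judgments: 4 readers (rows) x 8 cases (columns).
truth = [1, 1, 1, 1, 0, 0, 0, 0]  # first four cases abnormal, last four normal
without_cad = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
with_cad = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
]

bvc_without = case_mean_variance(without_cad)
# For 0/1 data every (mean, variance) point lies on v = m(1 - m).
assert all(abs(v - m * (1 - m)) < 1e-12 for m, v in bvc_without)

fpf = 0.1
tpf0 = tpf_at(roc_points([m for m, _ in bvc_without], truth), fpf)
tpf1 = tpf_at(roc_points([m for m, _ in case_mean_variance(with_cad)], truth), fpf)
print(f"FPF={fpf}: TPF without CAD={tpf0:.2f}, with CAD={tpf1:.2f}, gain={tpf1 - tpf0:.2f}")
```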
To demonstrate its usefulness, we applied this method to data from two reading experiments: one that evaluated detection
performance for abnormal findings and one that evaluated diagnostic discrimination between normal and abnormal cases. We
analyzed the internal structure of the ROC curve produced when all cases were included, and showed that the benefit of
CAD support is related to the degree of diagnostic difficulty of the case, and that there are patients and readers for
whom CAD is beneficial and others for whom it is not.
The purpose of our research is to clarify the mechanism by which a reader (a physician or radiological technologist) effectively identifies abnormal findings in CT images from lung cancer screening when using a CAD system. We investigated a method of estimating the 2×2 decision matrices between the reader and CAD, and between the reader without CAD and the reader with CAD. We assume the following scenario. First, the reader judges whether abnormal findings are present (1) or absent (0) for each patient (one CT image per patient) without CAD results. Second, the reader judges whether abnormal findings are present (1) or absent (0) with CAD results. We express the correlation between the diagnoses of the reader and the CAD system, separately for abnormal and for normal cases, by the phi correlation coefficient $\varphi = (cd - ab)/\sqrt{(a+c)(b+d)(b+c)(a+d)}$, where a, b, c, and d are the cells of the 2×2 decision matrix. If $\mathrm{TPR}_1 = (a+c)/n$, $\mathrm{TPR}_2 = (b+c)/n$, and $\mathrm{TPR}_3 = (a+b+c)/n$ for abnormal cases, then $\mathrm{TPR}_3 = \mathrm{TPR}_1 + \mathrm{TPR}_2 - \mathrm{TPR}_1\mathrm{TPR}_2 - \varphi\sqrt{\mathrm{TPR}_1(1-\mathrm{TPR}_1)\,\mathrm{TPR}_2(1-\mathrm{TPR}_2)}$. Therefore $a = n(\mathrm{TPR}_3 - \mathrm{TPR}_2)$, $b = n(\mathrm{TPR}_3 - \mathrm{TPR}_1)$, $c = n(\mathrm{TPR}_1 + \mathrm{TPR}_2 - \mathrm{TPR}_3)$, and $d = n(1 - \mathrm{TPR}_3)$. We applied this theory to experimental data. Forty-one students interpreted the same CT images [no training]. A second interpretation was performed after they had been instructed on how to interpret CT images [training], and a third was assisted by a virtual CAD system [training + CAD]. The mechanism by which the reader and CAD complement each other's strengths when CT images are interpreted with CAD was investigated theoretically and experimentally. We concluded that estimating the 2×2 decision matrix between a reader and CAD, each deciding on the "presence" or "absence" of abnormal findings, explains the mechanism by which diagnostic performance improves with a CAD system.
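The estimation step can be checked numerically. The sketch below uses the cell definitions implied by TPR1 = (a + c)/n and TPR2 = (b + c)/n: c counts abnormal cases called positive by both the reader and CAD, a by the reader only, b by CAD only, and d by neither, so TPR3 = (a + b + c)/n corresponds to calling a case positive when either one does. Starting from a hypothetical table, it computes TPR1, TPR2, and φ, then recovers the table from those three quantities alone. The same arithmetic applies to normal cases with FPRs in place of TPRs.

```python
import math
from typing import Tuple

def phi_coefficient(a: float, b: float, c: float, d: float) -> float:
    """phi = (cd - ab) / sqrt((a+c)(b+d)(b+c)(a+d))."""
    return (c * d - a * b) / math.sqrt((a + c) * (b + d) * (b + c) * (a + d))

def estimate_matrix(n: int, tpr1: float, tpr2: float,
                    phi: float) -> Tuple[float, Tuple[float, float, float, float]]:
    """Return TPR3 and the estimated cells (a, b, c, d) for n abnormal cases."""
    # Probability of the union of the two positive calls, corrected for correlation.
    tpr3 = tpr1 + tpr2 - tpr1 * tpr2 - phi * math.sqrt(
        tpr1 * (1 - tpr1) * tpr2 * (1 - tpr2))
    a = n * (tpr3 - tpr2)         # reader positive, CAD negative
    b = n * (tpr3 - tpr1)         # CAD positive, reader negative
    c = n * (tpr1 + tpr2 - tpr3)  # both positive
    d = n * (1.0 - tpr3)          # both negative
    return tpr3, (a, b, c, d)

# Hypothetical 2x2 table for n = 50 abnormal cases.
a, b, c, d = 10, 5, 20, 15
n = a + b + c + d
tpr1 = (a + c) / n  # reader alone
tpr2 = (b + c) / n  # CAD alone
phi = phi_coefficient(a, b, c, d)
tpr3, cells = estimate_matrix(n, tpr1, tpr2, phi)
print(f"phi={phi:.3f} TPR3={tpr3:.3f} cells={[round(x, 6) for x in cells]}")
```

Because TPR3 falls as φ rises, the example also shows why a reader and a CAD system whose errors are weakly correlated gain the most from being combined.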
When physicians inspect an image, they form a degree of confidence p(t) that the image is abnormal, or n(t) = 1 − p(t) that it is normal. After an infinitely long inspection, they reach the equilibrium confidence levels $p^* = p(\infty)$ and $n^* = n(\infty)$. There is a psychological conflict between the decisions "normal" and "abnormal." We assume that the decision "normal" is drawn toward the decision "abnormal" by a factor of k(1 + ap), and in the opposite direction by a factor of k(1 + bn), where k (> 0) is a parameter related to image quality and the skill of the physician, and a and b are unknown constants. After an infinitely long inspection, the conflict reaches equilibrium, which satisfies $k(1 + ap^*)n^* = k(1 + bn^*)p^*$. Here we define the parameter $c = (2p^* - 1)/[p^*(1 - p^*)]$, which by the equilibrium condition equals a − b. We further assume that the rate of change of confidence with time, dp/dt, is proportional to $k(1 + ap)n - k(1 + bn)p$, i.e. $k[-cp^2 + (c - 2)p + 1]$. Solving this differential equation, we derived expressions for t(p) and p(t) in terms of the parameters k, c, and S, where S (between 0 and 1) is an arbitrarily selected value related to the probability of "abnormal" before the image inspection (S = p(0)).
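The dynamics can be explored numerically. Expanding k(1 + ap)(1 − p) − k(1 + b(1 − p))p gives k[−cp² + (c − 2)p + 1] with c = a − b, and setting the bracket to zero at p = p* gives c = (2p* − 1)/[p*(1 − p*)]. The sketch below takes the proportionality constant as 1 (an assumption), finds p* as the root of the bracket in (0, 1), and evaluates a closed-form t(p) obtained by partial fractions. That closed form is our own derivation and may differ in form from the authors' t(p) and p(t); a fourth-order Runge-Kutta integration from p(0) = S serves as a cross-check. The values of k, c, and S are hypothetical.

```python
import math

def rate(p: float, k: float, c: float) -> float:
    """dp/dt = k[-c p^2 + (c - 2) p + 1] (proportionality constant assumed to be 1)."""
    return k * (-c * p * p + (c - 2.0) * p + 1.0)

def equilibrium(c: float) -> float:
    """p* = p(inf): the root of -c p^2 + (c - 2) p + 1 = 0 lying in (0, 1)."""
    if c == 0:
        return 0.5
    disc = math.sqrt(c * c + 4.0)
    for root in (((c - 2.0) + disc) / (2.0 * c), ((c - 2.0) - disc) / (2.0 * c)):
        if 0.0 < root < 1.0:
            return root
    raise ValueError("no equilibrium in (0, 1)")  # not reached: f(0) = 1 > 0, f(1) = -1 < 0

def c_from_equilibrium(p_star: float) -> float:
    """Inverse relation c = (2p* - 1) / [p*(1 - p*)]."""
    return (2.0 * p_star - 1.0) / (p_star * (1.0 - p_star))

def time_to_reach(p: float, k: float, c: float, s: float) -> float:
    """Closed-form t(p) from p(0) = s; p must lie between s and p*, and s != p*."""
    p_star = equilibrium(c)
    if c == 0:
        return math.log((1.0 - 2.0 * s) / (1.0 - 2.0 * p)) / (2.0 * k)
    r = (c - 2.0) / c - p_star  # the other root (the roots sum to (c - 2)/c)
    ratio = abs(s - p_star) * abs(p - r) / (abs(p - p_star) * abs(s - r))
    return math.log(ratio) / (k * c * (p_star - r))

def p_at(t: float, k: float, c: float, s: float, steps: int = 10000) -> float:
    """p(t) by classical RK4 integration from p(0) = s."""
    h = t / steps
    p = s
    for _ in range(steps):
        s1 = rate(p, k, c)
        s2 = rate(p + 0.5 * h * s1, k, c)
        s3 = rate(p + 0.5 * h * s2, k, c)
        s4 = rate(p + h * s3, k, c)
        p += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return p

# Hypothetical parameters: k = 1, c = 4, prior confidence S = 0.5.
k, c, S = 1.0, 4.0, 0.5
t = time_to_reach(0.7, k, c, S)
print(f"p* = {equilibrium(c):.4f}, t(0.7) = {t:.4f}, p(t) by RK4 = {p_at(t, k, c, S):.6f}")
```

Because f(0) = 1 and f(1) = −1 for every c, the bracket always has exactly one stable root in (0, 1), so confidence starting from any S converges to p*; c > 0 pulls p* above 0.5 and c < 0 pulls it below.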
Image reading experiments were conducted with CT images. ROC curves were generated both by the traditional method based on 4-step confidence scores and from the confidence level p estimated with the equation t(p) of the DDC model from the observed judgment time. We concluded that ROC curves can be generated by measuring the time taken for a dichotomous judgment and applying the DDC model, without subjective scores of diagnostic confidence.
The increasing number of CT images in mass screening requires radiologists to interpret a huge number of images, so screening capacity has been limited by image-reading capacity. To remedy this situation, we considered paramedical staff, especially radiological technologists, as "potential screeners," and investigated their ability to detect abnormalities in CT images from lung cancer screening with and without the assistance of a computer-aided diagnosis (CAD) system. We then compared their performance with that of physicians. A set of 100 thoracic CT slices from 100 cases (73 abnormal and 27 normal), one slice per case, was interpreted by 43 paramedical college students. The students performed a second interpretation after they had been instructed on how to interpret CT images, and a third interpretation assisted by a virtual CAD system. We calculated the areas under the ROC curves (Az values) for both students and physicians. In the first set of interpretations, 40% of the students had Az values within the range of the physicians' Az values (0.870 to 0.964). This proportion rose to 86% in the second set, after the students had been instructed on CT image interpretation, and to 95% in the third set, with virtual CAD assistance. The performance of paramedical college students in detecting abnormalities in thoracic CT images proved sufficient to qualify them as "potential screeners."
In this paper we present two methods of evaluating the effectiveness of double checking (by two radiologists, or by a CAD system and a radiologist): one uses ROC analysis and the other uses the phi correlation coefficient (φ). We used the first method to evaluate double checking in which two radiologists reach a decision through discussion (i.e. the radiologists confer; the conference system). We used the second method to evaluate double checking in which Reader 2 makes the final assessment by referring to the assessment of Reader 1 (the reference system). The results suggest that double checking by two radiologists through discussion may not be very effective, whereas double checking in which Reader 2 makes the final assessment by referring to the assessment of Reader 1 may be very effective. In addition, we discuss problems that may arise when Reader 2 decides whether to adopt the assessment of Reader 1, and practical models of double checking by a CAD system and a radiologist. Further research is needed to establish a double-check system that improves diagnostic accuracy in practical situations, i.e. when it is not known whether the assessments are correct.
The objective of this study was to measure the image exploration activity of physicians and thereby contribute to the development of a support system for CRT image interpretation in thoracic CT screening. We examined how the pupil diameters of five physicians changed over time during interpretation of a large quantity of CT images on a CRT monitor, and how these changes might be related to diagnostic accuracy. When a large quantity of CT images was viewed on a CRT monitor in a dimly lit room, the pupil diameter decreased during the second half of the long interpretation session in three of the five physicians. Furthermore, the measured pupil diameter frequently fell to approximately zero because the physician became drowsy. However, when the relationship between these phenomena and diagnostic accuracy was analyzed for one of the physicians, no evidence was found that they led to statistically significant numbers of false negatives or false positives. Despite these results, the potential risk of misdiagnosis cannot be ignored. It may be necessary to design both equipment and working conditions that prevent the pupil diameter from approaching zero during interpretation of images on a CRT monitor.