1 General Purpose
The general principle of regression
analysis is that the best fit line or curve (linear, exponential,
curvilinear, etc.) is calculated, i.e., the one with the shortest
distances to the data, and that it is subsequently tested how far
the data are from that curve. A significant correlation between y
(the outcome data) and x (the exposure data) means that the data are
closer to the model than would be expected purely by chance. The
level of significance is usually tested with simple t-tests or
analysis of variance. The simplest regression model is a linear
model.
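As an illustration of this principle, the following minimal Python sketch fits a straight line by least squares (the line with the smallest sum of squared vertical distances to the data) and computes the t-statistic of its slope. The function name fit_line and the five demonstration points are hypothetical and are not part of the chapter's data file.

```python
import math

def fit_line(x, y):
    """Least-squares line y = a + b*x, plus the t-statistic of the slope b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    # Residual sum of squares: squared vertical distances to the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return a, b, b / se_b

# Hypothetical demonstration data (not from the chapter)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

a, b, t = fit_line(x_demo, y_demo)
print(f"intercept = {a:.3f}, slope = {b:.3f}, t = {t:.3f}")
```

The t-statistic is the slope divided by its standard error; it is compared with the t-distribution with n − 2 degrees of freedom to obtain the level of significance.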
2 Schematic Overview of Type of Data File
3 Primary Scientific Question
Is curvilinear regression able to find
a best fit regression model for data with both a continuous outcome
and a continuous predictor variable?
4 Data Example
In a 20-patient study the quantity of
care, estimated as the number of daily interventions (such as
endoscopies and small operations) per doctor, is tested against the
quality of care scores. The primary question was: if the
relationship between quantity of care and quality of care is not
linear, does curvilinear regression help find the best fit
curve?
| Quantityscore | Qualityscore |
|---|---|
| 19,00 | 2,00 |
| 20,00 | 3,00 |
| 23,00 | 4,00 |
| 24,00 | 5,00 |
| 26,00 | 6,00 |
| 27,00 | 7,00 |
| 28,00 | 8,00 |
| 29,00 | 9,00 |
| 29,00 | 10,00 |
| 29,00 | 11,00 |
The first ten patients of the data file
are given above. The entire data file is available at
extras.springer.com and is entitled chapter25curvilinearestimation.
Start by opening that data file in SPSS. First, we will make a graph
of the data.
5 Data Graph
Command:
- Analyze….Graphs….Chart builder….click: Scatter/Dot….click quality of care and drag to the Y-Axis….click interventions per doctor and drag to the X-Axis….click OK.
The above graph shows the scattergram
of the data. A nonlinear relationship is suggested. The curvilinear
regression option in SPSS helps us identify the best fit
model.
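For readers who want to see what curve estimation does, the following Python sketch fits the seven standard curves by ordinary least squares and reports R-square for each. It is only a sketch under stated assumptions, not the SPSS implementation. It uses the ten patients listed above, and it assumes that the column running from 2 to 11 is the number of interventions per doctor (x) and the column running from 19 to 29 is the outcome (y); this orientation is an assumption, chosen because it agrees with the size of the constants in the SPSS coefficients tables. The power and exponential curves are fitted here after taking the natural logarithm of y, so their R-square refers to the log scale. All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def r_squared(columns, y):
    """R-square of the least-squares fit of y on the columns plus a constant."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def curve_estimation(x, y):
    """R-square for the seven curves marked in the Curve Estimation dialog."""
    ln_x = [math.log(v) for v in x]
    ln_y = [math.log(v) for v in y]
    x2 = [v ** 2 for v in x]
    x3 = [v ** 3 for v in x]
    return {
        "linear": r_squared([x], y),
        "logarithmic": r_squared([ln_x], y),
        "inverse": r_squared([[1.0 / v for v in x]], y),
        "quadratic": r_squared([x, x2], y),
        "cubic": r_squared([x, x2, x3], y),
        "power": r_squared([ln_x], ln_y),        # ln(y) on ln(x)
        "exponential": r_squared([x], ln_y),     # ln(y) on x
    }

# First ten patients only; the column orientation is an assumption (see text)
X_TEN = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Y_TEN = [19, 20, 23, 24, 26, 27, 28, 29, 29, 29]

r2 = curve_estimation(X_TEN, Y_TEN)
for model, value in r2.items():
    print(f"{model:12s} R-square = {value:.3f}")
```

Because only the first ten patients are used, the results of this sketch will not reproduce the SPSS output for the complete 20-patient data file.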
6 Curvilinear Estimation
For analysis, the statistical model
Curve Estimation in the module Regression is required.
Command:
- Analyze….Regression….Curve Estimation….mark: Linear, Logarithmic, Inverse, Quadratic, Cubic, Power, Exponential….mark: Display ANOVA Table….click OK.
The above graph is produced by the
software program. It looks as though the quadratic and cubic models
produce the best fit curves. All of the curves are tested for
goodness of fit using analysis of variance (ANOVA). The tables
below show the calculated B-values (regression coefficients). The
larger the absolute B-value relative to its standard error, i.e.,
the larger the absolute t-value, the better the fit provided by the
model. The tables also test whether the B-values are significantly
different from 0,0. A B-value of 0,0 indicates no relationship at
all. Significantly different from 0,0 means that the data are closer
to the curve than could happen by chance. The best fit linear,
logarithmic, and inverse models are not statistically significant.
The best fit quadratic and cubic models are very significant. The
power and exponential models are, again, not statistically
significant.
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| Interventions/doctor | −,069 | ,116 | −,135 | −,594 | ,559 |
| (Constant) | 25,588 | 1,556 | | 16,440 | ,000 |

(1) Linear
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| ln(interventions/doctor) | ,726 | 1,061 | ,155 | ,684 | ,502 |
| (Constant) | 23,086 | 2,548 | | 9,061 | ,000 |

(2) Logarithmic
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1/interventions/doctor | −11,448 | 5,850 | −,410 | −1,957 | ,065 |
| (Constant) | 26,229 | ,989 | | 26,512 | ,000 |

(3) Inverse
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| Interventions/doctor | 2,017 | ,200 | 3,960 | 10,081 | ,000 |
| Interventions/doctor**2 | −,087 | ,008 | −4,197 | −10,686 | ,000 |
| (Constant) | 16,259 | 1,054 | | 15,430 | ,000 |

(4) Quadratic
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| Interventions/doctor | 4,195 | ,258 | 8,234 | 16,234 | ,000 |
| Interventions/doctor**2 | −,301 | ,024 | −14,534 | −12,437 | ,000 |
| Interventions/doctor**3 | ,006 | ,001 | 6,247 | 8,940 | ,000 |
| (Constant) | 10,679 | ,772 | | 13,836 | ,000 |

(5) Cubic
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| ln(interventions/doctor) | ,035 | ,044 | ,180 | ,797 | ,435 |
| (Constant) | 22,667 | 2,379 | | 9,528 | ,000 |

(6) Power
Coefficients

| | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| Interventions/doctor | −,002 | ,005 | −,114 | −,499 | ,624 |
| (Constant) | 25,281 | 1,632 | | 15,489 | ,000 |

(7) Exponential
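The t-values in the above tables are the B-values divided by their standard errors. The following Python sketch recomputes these ratios from the tabulated values (decimal commas written as points) and flags the terms whose reported t-value exceeds an approximate critical value. The threshold of 2,1 is an assumption: it is roughly the two-sided 5 % critical value of t for the 16 to 18 degrees of freedom of a 20-patient study. The linear term of the quadratic model is entered as positive, which is also an assumption, based on its positive t-value and Beta. All names in the code are hypothetical.

```python
# (B, Std. error, reported t) of each non-constant term, transcribed from
# the coefficients tables; decimal commas are written as points.
TERMS = {
    "linear: x":          (-0.069, 0.116, -0.594),
    "logarithmic: ln(x)": (0.726, 1.061, 0.684),
    "inverse: 1/x":       (-11.448, 5.850, -1.957),
    "quadratic: x":       (2.017, 0.200, 10.081),   # sign assumed positive
    "quadratic: x**2":    (-0.087, 0.008, -10.686),
    "cubic: x":           (4.195, 0.258, 16.234),
    "cubic: x**2":        (-0.301, 0.024, -12.437),
    "cubic: x**3":        (0.006, 0.001, 8.940),
    "power: ln(x)":       (0.035, 0.044, 0.797),
    "exponential: x":     (-0.002, 0.005, -0.499),
}

# Assumed approximate two-sided 5 % critical t-value (16 to 18 degrees of freedom)
T_CRITICAL = 2.1

def t_ratio(b, std_error):
    """t-statistic of a regression coefficient: B divided by its standard error."""
    return b / std_error

def summarize(terms=TERMS, critical=T_CRITICAL):
    """Return (term, recomputed t, reported t, significant?) for every term."""
    rows = []
    for name, (b, se, reported_t) in terms.items():
        rows.append((name, t_ratio(b, se), reported_t, abs(reported_t) > critical))
    return rows

for name, t, reported_t, significant in summarize():
    print(f"{name:20s} B/SE = {t:8.3f}   reported t = {reported_t:8.3f}   "
          f"significant: {significant}")
```

The recomputed ratios agree with the reported t-values wherever B and its standard error carry enough digits. For very small coefficients, such as the cubic term (,006 with standard error ,001), rounding to three decimals makes the ratio deviate noticeably from the reported value.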
The largest test statistics are given
by (4) Quadratic and (5) Cubic. Now we can construct regression
equations for these two best fit curves using the B-values from the
coefficients tables.
(4) Quadratic: y = 16,259 + 2,017 x − 0,087 x²
(5) Cubic: y = 10,679 + 4,195 x − 0,301 x² + 0,006 x³
The above equations can be used to predict the best fit y-value
from a given x-value, e.g., with x = 10 you might expect a y-value
of about 27,7 according to the quadratic model and about 28,5
according to the cubic model.
Alternatively, predictions about the best fit y-values from given
x-values can also be read fairly accurately from the curves as
drawn.
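The prediction step can also be sketched in Python. The coefficients are the B-values of tables (4) and (5), with decimal commas written as points; the linear term of the quadratic model is taken as positive, which is an assumption based on its positive t-value and Beta. The function and variable names are hypothetical.

```python
# B-values from the coefficients tables, ordered from the constant upwards
QUADRATIC = [16.259, 2.017, -0.087]          # constant, x, x**2 (x sign assumed positive)
CUBIC = [10.679, 4.195, -0.301, 0.006]       # constant, x, x**2, x**3

def predict(coefficients, x):
    """Evaluate a polynomial regression equation at x (Horner's scheme)."""
    y = 0.0
    for b in reversed(coefficients):
        y = y * x + b
    return y

x_new = 10
y_quadratic = predict(QUADRATIC, x_new)
y_cubic = predict(CUBIC, x_new)
print(f"x = {x_new}: quadratic y = {y_quadratic:.1f}, cubic y = {y_cubic:.1f}")
```

Because the tabulated B-values are rounded to three decimals, predictions from the cubic equation in particular should be read as approximate.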
7 Conclusion
The relationship between quantity of
care and quality of care is curvilinear. Curvilinear regression has
helped to find the best fit curve. If the standard curvilinear
regression models do not fit the data adequately, then there are
other possibilities, like logit and probit transformations, Box-Cox
transformations, ACE (alternating conditional expectations)/AVAS
(additive and variance stabilization) packages, Loess (locally
weighted scatter plot smoothing), and spline modeling (see also
Chap. 26). These methods are, however,
increasingly complex and often computationally very intensive.
But for a computer this is no problem.
8 Note
More background, theoretical, and
mathematical information on curvilinear estimation is given in
Statistics applied to clinical studies, 5th edition, Chaps. 16 and
24, Springer Heidelberg Germany, 2012, from the same authors.