2.66: You can define a function for this problem with dummy
variable fun(dummy) and use
sapply(1:200,fun).
But a natural way would be to use "flow control" such as "for". Look at >help("for") # notice that "? for" will
not work...
For part d) you may need "if".
So, write a function for one trial. Then include for(i in 1:200) and
make appropriate changes... Store all output
data in a data frame (read chapters 5.1 and 5.2 here)
Something like this...
function(n)
{
X=c(4,8,12,16,20)
EY=20+4*X
Yh10=0
left.limit=0
right.limit=0
for(i in 1:n)
{
e=rnorm(5,sd=5)
Y=EY+e
lr=lm(Y~X)
b=coef(lr)
Yh10[i]=b[[1]]+b[[2]]*10
sde=sqrt(deviance(lr)/df.residual(lr)) #this computes standard
deviation of the residuals
left.limit[i]=....
right.limit[i]=....
#plot(X,Y)
#abline(lr)
#points(10,Yh10[i],pch="*")
#segments(10,left.limit[i],10,right.limit[i])
}
data.frame(Yh10,left.limit,right.limit)
}
I give you my version of the function which works with matrices and do
not use for or if. If you decide
to use this version as a basis for your solution, then you must clearly comment all lines (just
try them and see what they do).
p266 = function(n,a)
{
X=c(4,8,12,16,20)
EY=20+4*X
EY10=20+4*10
e=rnorm(5*n,sd=5)
dim(e)=c(5,n)
Y=EY+e
lr=lm(Y~X)
b=coef(lr)
Yh10=b[1,]+b[2,]*10
dev.e=deviance(lr)/df.residual(lr)
dev.Yh10 = dev.e*(
1/length(X)+(10-mean(X))^2/sum( (X-mean(X))^2 ) )
s.Yh10=sqrt(dev.Yh10)
left.limit=Yh10-qt(1-a/2,df.residual(lr))*s.Yh10
right.limit=Yh10+qt(1-a/2,df.residual(lr))*s.Yh10
inside = (
(left.limit<EY10) & (EY10<right.limit) )
data.frame(Yh10,left.limit,right.limit,b[2,],inside)
# To get answer for part d): > p =
p266(200,0.05) >
sum(p$inside)/length(p$inside)
# The following can be used if you want to get an answer for part
d) inside the function
# list( conf.int.data =
data.frame(Yh10,left.limit,right.limit,b[2,],inside),
#
prop.inside = sum(inside)/length(inside) )
}
Note: When you write a function
(say, fun()
) and need to "fix" it, use fix(fun).
This way you do not need to re-define it every time
you want make a change.
3.4: When you create several plots of the same data, you may find
useful par(mfrow=c(2,3)). Run
it and plot several graphs.
Try c(2,1)....
a) You can built a histogram instead of a dot plot: hist. Read a help
of hist on the argument breaks and use an "appropriate"
number of "breaks". Also, use dotchart.
This plots a data as dots with
a "time" order on a vertical axis.
c) stem is what you need. Recall that you do not have to compute
residuals once you regressed your data
(lr$res - how to find it out
if
you do not remember?). Try stem(lr$res,
scale=2)... A look at sorted
residuals sort(lr$res)
can help you to figure out how stem-and-leaf
diagram works in R.
d) Again, do not compute Y^hat_i. It is a part of lm-object.
e) cor(x,y) gives correlation
coefficient between x and y. Note:
ordered residuals should be used in the test!
f) The plot is straightforward. acf gives a formal mean to check time
correlation... but more on this in "time series" part of the course.
g) Conduct Brown-Forsythe instead of Breusch-Pagan.