Forecast accuracy in R
Multi tool use
I have followed the instructions in this StMoMo package document to fit Lee Carter to mortality data for Canada.
The next step in my project is to measure the forecast accuracy of the Lee Carter model when fit to this Canadian data.
To do this, I tried to use accuracy() but ran into an error since my Lee Carter fit is of class "fitStMoMo" and not of class "forecast" or a time series.
Is there an alternative forecast accuracy function that I can use on "fitStMoMo" objects that will calculate Mean Error, Root Mean Squared Error, Mean Absolute Error, Mean Percentage Error, Mean Absolute Percentage Error and Mean Absolute Scaled Error for me?
Reprex
Reprex created using EWMaleData as used in the StMoMo document to flag the error specifically:
library("StMoMo")
library("demography")
library("forecast")
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link = "logit", staticAgeFun = TRUE, periodAgeFun = "NP",
constFun = constLC)
LC <- lc(link = "logit")
LC$gnmFormula
#> [1] "D/E ~ -1 + offset(o) + factor(x) + Mult(factor(x), factor(t), inst = 1)"
EWMaleData
#> Mortality data for England and Wales
#> Series: male
#> Years: 1961 - 2011
#> Ages: 0 - 100
#> Exposure: central
EWMaleIniData <- central2initial(EWMaleData)
ages.fit <- 55:89
wxt <- genWeightMat(ages = ages.fit, years = EWMaleIniData$years,
clip = 3)
LCfit <- fit(LC, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
#> StMoMo: The following cohorts have been zero weigthed:
#> 1872 1873 1874 1954 1955 1956
#> StMoMo: Start fitting with gnm
#> Initialising
#> Running start-up iterations..
#> Running main iterations.....
#> Done
#> StMoMo: Finish fitting with gnm
LCfor <- forecast(LCfit, h = 50)
class(LCfit)
#> [1] "fitStMoMo"
class(LCfor)
#> [1] "forStMoMo"
accuracy(LCfit)
#> Error in accuracy.default(LCfit): First argument should be a forecast object
#> or a time series.
accuracy(LCfor)
#> Error in accuracy.default(LCfor): First argument should be a forecast object
#> or a time series.
r
|
show 5 more comments
I have followed the instructions in this StMoMo package document to fit Lee Carter to mortality data for Canada.
The next step in my project is to measure the forecast accuracy of the Lee Carter model when fit to this Canadian data.
To do this, I tried to use accuracy() but ran into an error since my Lee Carter fit is of class "fitStMoMo" and not of class "forecast" or a time series.
Is there an alternative forecast accuracy function that I can use on "fitStMoMo" objects that will calculate Mean Error, Root Mean Squared Error, Mean Absolute Error, Mean Percentage Error, Mean Absolute Percentage Error and Mean Absolute Scaled Error for me?
Reprex
Reprex created using EWMaleData as used in the StMoMo document to flag the error specifically:
library("StMoMo")
library("demography")
library("forecast")
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link = "logit", staticAgeFun = TRUE, periodAgeFun = "NP",
constFun = constLC)
LC <- lc(link = "logit")
LC$gnmFormula
#> [1] "D/E ~ -1 + offset(o) + factor(x) + Mult(factor(x), factor(t), inst = 1)"
EWMaleData
#> Mortality data for England and Wales
#> Series: male
#> Years: 1961 - 2011
#> Ages: 0 - 100
#> Exposure: central
EWMaleIniData <- central2initial(EWMaleData)
ages.fit <- 55:89
wxt <- genWeightMat(ages = ages.fit, years = EWMaleIniData$years,
clip = 3)
LCfit <- fit(LC, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
#> StMoMo: The following cohorts have been zero weigthed:
#> 1872 1873 1874 1954 1955 1956
#> StMoMo: Start fitting with gnm
#> Initialising
#> Running start-up iterations..
#> Running main iterations.....
#> Done
#> StMoMo: Finish fitting with gnm
LCfor <- forecast(LCfit, h = 50)
class(LCfit)
#> [1] "fitStMoMo"
class(LCfor)
#> [1] "forStMoMo"
accuracy(LCfit)
#> Error in accuracy.default(LCfit): First argument should be a forecast object
#> or a time series.
accuracy(LCfor)
#> Error in accuracy.default(LCfor): First argument should be a forecast object
#> or a time series.
r
1
The question is too broad as likely to be closed. Don't ask so many questions in one question. In order to ask a better question please read How to ask a good question and Minimal, Complete, and Verifiable Example and How to make a great R reproducible example.
– Rui Barradas
Nov 21 '18 at 11:35
@RuiBarradas Okay I will edit it to just ask one question.
– Vicky
Nov 21 '18 at 11:40
This answer has a very low chance of being answered unless you supply some reproducible code. Assuming the Canadian data isn't critical for your question, I suggest basing your example code instead on the data set supplied with theStMoMo
package.
– AkselA
Nov 21 '18 at 11:58
@AkselA I'm working on my reproducible example now!
– Vicky
Nov 21 '18 at 12:06
@RuiBarradas I have added a reprex now
– Vicky
Nov 21 '18 at 12:25
|
show 5 more comments
I have followed the instructions in this StMoMo package document to fit Lee Carter to mortality data for Canada.
The next step in my project is to measure the forecast accuracy of the Lee Carter model when fit to this Canadian data.
To do this, I tried to use accuracy() but ran into an error since my Lee Carter fit is of class "fitStMoMo" and not of class "forecast" or a time series.
Is there an alternative forecast accuracy function that I can use on "fitStMoMo" objects that will calculate Mean Error, Root Mean Squared Error, Mean Absolute Error, Mean Percentage Error, Mean Absolute Percentage Error and Mean Absolute Scaled Error for me?
Reprex
Reprex created using EWMaleData as used in the StMoMo document to flag the error specifically:
library("StMoMo")
library("demography")
library("forecast")
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link = "logit", staticAgeFun = TRUE, periodAgeFun = "NP",
constFun = constLC)
LC <- lc(link = "logit")
LC$gnmFormula
#> [1] "D/E ~ -1 + offset(o) + factor(x) + Mult(factor(x), factor(t), inst = 1)"
EWMaleData
#> Mortality data for England and Wales
#> Series: male
#> Years: 1961 - 2011
#> Ages: 0 - 100
#> Exposure: central
EWMaleIniData <- central2initial(EWMaleData)
ages.fit <- 55:89
wxt <- genWeightMat(ages = ages.fit, years = EWMaleIniData$years,
clip = 3)
LCfit <- fit(LC, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
#> StMoMo: The following cohorts have been zero weigthed:
#> 1872 1873 1874 1954 1955 1956
#> StMoMo: Start fitting with gnm
#> Initialising
#> Running start-up iterations..
#> Running main iterations.....
#> Done
#> StMoMo: Finish fitting with gnm
LCfor <- forecast(LCfit, h = 50)
class(LCfit)
#> [1] "fitStMoMo"
class(LCfor)
#> [1] "forStMoMo"
accuracy(LCfit)
#> Error in accuracy.default(LCfit): First argument should be a forecast object
#> or a time series.
accuracy(LCfor)
#> Error in accuracy.default(LCfor): First argument should be a forecast object
#> or a time series.
r
I have followed the instructions in this StMoMo package document to fit Lee Carter to mortality data for Canada.
The next step in my project is to measure the forecast accuracy of the Lee Carter model when fit to this Canadian data.
To do this, I tried to use accuracy() but ran into an error since my Lee Carter fit is of class "fitStMoMo" and not of class "forecast" or a time series.
Is there an alternative forecast accuracy function that I can use on "fitStMoMo" objects that will calculate Mean Error, Root Mean Squared Error, Mean Absolute Error, Mean Percentage Error, Mean Absolute Percentage Error and Mean Absolute Scaled Error for me?
Reprex
Reprex created using EWMaleData as used in the StMoMo document to flag the error specifically:
library("StMoMo")
library("demography")
library("forecast")
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link = "logit", staticAgeFun = TRUE, periodAgeFun = "NP",
constFun = constLC)
LC <- lc(link = "logit")
LC$gnmFormula
#> [1] "D/E ~ -1 + offset(o) + factor(x) + Mult(factor(x), factor(t), inst = 1)"
EWMaleData
#> Mortality data for England and Wales
#> Series: male
#> Years: 1961 - 2011
#> Ages: 0 - 100
#> Exposure: central
EWMaleIniData <- central2initial(EWMaleData)
ages.fit <- 55:89
wxt <- genWeightMat(ages = ages.fit, years = EWMaleIniData$years,
clip = 3)
LCfit <- fit(LC, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
#> StMoMo: The following cohorts have been zero weigthed:
#> 1872 1873 1874 1954 1955 1956
#> StMoMo: Start fitting with gnm
#> Initialising
#> Running start-up iterations..
#> Running main iterations.....
#> Done
#> StMoMo: Finish fitting with gnm
LCfor <- forecast(LCfit, h = 50)
class(LCfit)
#> [1] "fitStMoMo"
class(LCfor)
#> [1] "forStMoMo"
accuracy(LCfit)
#> Error in accuracy.default(LCfit): First argument should be a forecast object
#> or a time series.
accuracy(LCfor)
#> Error in accuracy.default(LCfor): First argument should be a forecast object
#> or a time series.
r
r
edited Nov 21 '18 at 13:06
AkselA
4,32621225
4,32621225
asked Nov 21 '18 at 11:27
VickyVicky
85
85
1
The question is too broad as likely to be closed. Don't ask so many questions in one question. In order to ask a better question please read How to ask a good question and Minimal, Complete, and Verifiable Example and How to make a great R reproducible example.
– Rui Barradas
Nov 21 '18 at 11:35
@RuiBarradas Okay I will edit it to just ask one question.
– Vicky
Nov 21 '18 at 11:40
This answer has a very low chance of being answered unless you supply some reproducible code. Assuming the Canadian data isn't critical for your question, I suggest basing your example code instead on the data set supplied with theStMoMo
package.
– AkselA
Nov 21 '18 at 11:58
@AkselA I'm working on my reproducible example now!
– Vicky
Nov 21 '18 at 12:06
@RuiBarradas I have added a reprex now
– Vicky
Nov 21 '18 at 12:25
|
show 5 more comments
1
The question is too broad as likely to be closed. Don't ask so many questions in one question. In order to ask a better question please read How to ask a good question and Minimal, Complete, and Verifiable Example and How to make a great R reproducible example.
– Rui Barradas
Nov 21 '18 at 11:35
@RuiBarradas Okay I will edit it to just ask one question.
– Vicky
Nov 21 '18 at 11:40
This answer has a very low chance of being answered unless you supply some reproducible code. Assuming the Canadian data isn't critical for your question, I suggest basing your example code instead on the data set supplied with theStMoMo
package.
– AkselA
Nov 21 '18 at 11:58
@AkselA I'm working on my reproducible example now!
– Vicky
Nov 21 '18 at 12:06
@RuiBarradas I have added a reprex now
– Vicky
Nov 21 '18 at 12:25
1
1
The question is too broad as likely to be closed. Don't ask so many questions in one question. In order to ask a better question please read How to ask a good question and Minimal, Complete, and Verifiable Example and How to make a great R reproducible example.
– Rui Barradas
Nov 21 '18 at 11:35
The question is too broad as likely to be closed. Don't ask so many questions in one question. In order to ask a better question please read How to ask a good question and Minimal, Complete, and Verifiable Example and How to make a great R reproducible example.
– Rui Barradas
Nov 21 '18 at 11:35
@RuiBarradas Okay I will edit it to just ask one question.
– Vicky
Nov 21 '18 at 11:40
@RuiBarradas Okay I will edit it to just ask one question.
– Vicky
Nov 21 '18 at 11:40
This answer has a very low chance of being answered unless you supply some reproducible code. Assuming the Canadian data isn't critical for your question, I suggest basing your example code instead on the data set supplied with the
StMoMo
package.– AkselA
Nov 21 '18 at 11:58
This answer has a very low chance of being answered unless you supply some reproducible code. Assuming the Canadian data isn't critical for your question, I suggest basing your example code instead on the data set supplied with the
StMoMo
package.– AkselA
Nov 21 '18 at 11:58
@AkselA I'm working on my reproducible example now!
– Vicky
Nov 21 '18 at 12:06
@AkselA I'm working on my reproducible example now!
– Vicky
Nov 21 '18 at 12:06
@RuiBarradas I have added a reprex now
– Vicky
Nov 21 '18 at 12:25
@RuiBarradas I have added a reprex now
– Vicky
Nov 21 '18 at 12:25
|
show 5 more comments
1 Answer
1
active
oldest
votes
I'm not entirely sure about how accuracy()
from forecast
works, but in some way it must be comparing real and predicted values, and returning metrics on how much they differ. This can, in a broad sense, be regarded as a form of cross-validation. As accuracy()
doesn't work for StMoMo
objects, we might as well develop a cross-validation routine ourself.
For a short primer on this form of cross-validation, I'd recommend Rob Hyndman's notes on tsCV()
from forecast
. It would have been nice if we could use tsCV()
here, but it only works for univariate time series, and mortality data is essentially multivariate time series.
I should also mention that before today I had never heard about Mortality Modeling, so the model theory part of this I'm very fuzzy on.
This first bit is identical to what was already posted
library(StMoMo)
library(demography)
library(forecast)
data(EWMaleData)
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link="logit", staticAgeFun=TRUE, periodAgeFun="NP", constFun=constLC)
LC <- lc(link="logit")
EWMaleIniData <- central2initial(EWMaleData)
Then things get a little different. The central point of performing CV on a time series is to do a prediction on data we actually have, but we pretend we don't. Therefore we'll have to subset our data so that the chunk we want to predict on isn't part of the model. In this concrete example we'll use the first 30 years, and then predict the next 10
ages.fit <- 55:89
years.fit <- EWMaleIniData$years[1]:(EWMaleIniData$years[1] + 30)
years.for <- 10
wxt <- genWeightMat(ages=ages.fit, years=years.fit, clip=3)
LCfit <- fit(LC, data=EWMaleIniData, ages.fit=ages.fit,
years.fit=years.fit, wxt=wxt)
LCfor <- forecast(LCfit, h=years.for)
Now that we have a ten-year forecast, we can compare those years to our actual data and use what ever error measure we want to see how accurate the prediction is.
cvy <- LCfor$years # years used in forecast
cva <- LCfor$ages # ages used in forecast
pred <- LCfor$rates # predicted mortality rates
# actual mortality rates subset to the same ages and years as forecast
actual <- EWMaleIniData$Dxt/EWMaleIniData$Ext
actual <- actual[rownames(actual) %in% cva,
colnames(actual) %in% cvy]
# A collection of error measures. plenty of others can be devised
err <- pred - actual
Q <- pred/actual
rmse <- sqrt(rowMeans(err^2))
mae <- rowMeans(abs(err))
smape <- 100 * (rowMeans(exp(abs(log(Q)))) - 1)
This bit is purely for displaying the results
par(mfrow=c(3, 1), mar=c(3, 3, 1, 1), mgp=c(2, 0.8, 0), oma=c(0, 0, 2, 0))
plot(as.numeric(names(rmse)), rmse, type="h", xlab="")
plot(as.numeric(names(mae)), mae, type="h", xlab="")
plot(as.numeric(names(smape)), smape, type="h", xlab="Ages")
mtext(paste("Forecast accuracy for the years",
paste(cvy[c(1, years.for)], collapse=" - ")),
3, outer=TRUE)
As seen in Hyndman's notes, to do this properly we'd have to do this comparison using forecasts at several points in our time series, and the average the scores.
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53411110%2fforecast-accuracy-in-r%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm not entirely sure about how accuracy()
from forecast
works, but in some way it must be comparing real and predicted values, and returning metrics on how much they differ. This can, in a broad sense, be regarded as a form of cross-validation. As accuracy()
doesn't work for StMoMo
objects, we might as well develop a cross-validation routine ourself.
For a short primer on this form of cross-validation, I'd recommend Rob Hyndman's notes on tsCV()
from forecast
. It would have been nice if we could use tsCV()
here, but it only works for univariate time series, and mortality data is essentially multivariate time series.
I should also mention that before today I had never heard about Mortality Modeling, so the model theory part of this I'm very fuzzy on.
This first bit is identical to what was already posted
library(StMoMo)
library(demography)
library(forecast)
data(EWMaleData)
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link="logit", staticAgeFun=TRUE, periodAgeFun="NP", constFun=constLC)
LC <- lc(link="logit")
EWMaleIniData <- central2initial(EWMaleData)
Then things get a little different. The central point of performing CV on a time series is to do a prediction on data we actually have, but we pretend we don't. Therefore we'll have to subset our data so that the chunk we want to predict on isn't part of the model. In this concrete example we'll use the first 30 years, and then predict the next 10
ages.fit <- 55:89
years.fit <- EWMaleIniData$years[1]:(EWMaleIniData$years[1] + 30)
years.for <- 10
wxt <- genWeightMat(ages=ages.fit, years=years.fit, clip=3)
LCfit <- fit(LC, data=EWMaleIniData, ages.fit=ages.fit,
years.fit=years.fit, wxt=wxt)
LCfor <- forecast(LCfit, h=years.for)
Now that we have a ten-year forecast, we can compare those years to our actual data and use what ever error measure we want to see how accurate the prediction is.
cvy <- LCfor$years # years used in forecast
cva <- LCfor$ages # ages used in forecast
pred <- LCfor$rates # predicted mortality rates
# actual mortality rates subset to the same ages and years as forecast
actual <- EWMaleIniData$Dxt/EWMaleIniData$Ext
actual <- actual[rownames(actual) %in% cva,
colnames(actual) %in% cvy]
# A collection of error measures. plenty of others can be devised
err <- pred - actual
Q <- pred/actual
rmse <- sqrt(rowMeans(err^2))
mae <- rowMeans(abs(err))
smape <- 100 * (rowMeans(exp(abs(log(Q)))) - 1)
This bit is purely for displaying the results
par(mfrow=c(3, 1), mar=c(3, 3, 1, 1), mgp=c(2, 0.8, 0), oma=c(0, 0, 2, 0))
plot(as.numeric(names(rmse)), rmse, type="h", xlab="")
plot(as.numeric(names(mae)), mae, type="h", xlab="")
plot(as.numeric(names(smape)), smape, type="h", xlab="Ages")
mtext(paste("Forecast accuracy for the years",
paste(cvy[c(1, years.for)], collapse=" - ")),
3, outer=TRUE)
As seen in Hyndman's notes, to do this properly we'd have to do this comparison using forecasts at several points in our time series, and the average the scores.
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
add a comment |
I'm not entirely sure about how accuracy()
from forecast
works, but in some way it must be comparing real and predicted values, and returning metrics on how much they differ. This can, in a broad sense, be regarded as a form of cross-validation. As accuracy()
doesn't work for StMoMo
objects, we might as well develop a cross-validation routine ourself.
For a short primer on this form of cross-validation, I'd recommend Rob Hyndman's notes on tsCV()
from forecast
. It would have been nice if we could use tsCV()
here, but it only works for univariate time series, and mortality data is essentially multivariate time series.
I should also mention that before today I had never heard about Mortality Modeling, so the model theory part of this I'm very fuzzy on.
This first bit is identical to what was already posted
library(StMoMo)
library(demography)
library(forecast)
data(EWMaleData)
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link="logit", staticAgeFun=TRUE, periodAgeFun="NP", constFun=constLC)
LC <- lc(link="logit")
EWMaleIniData <- central2initial(EWMaleData)
Then things get a little different. The central point of performing CV on a time series is to do a prediction on data we actually have, but we pretend we don't. Therefore we'll have to subset our data so that the chunk we want to predict on isn't part of the model. In this concrete example we'll use the first 30 years, and then predict the next 10
ages.fit <- 55:89
years.fit <- EWMaleIniData$years[1]:(EWMaleIniData$years[1] + 30)
years.for <- 10
wxt <- genWeightMat(ages=ages.fit, years=years.fit, clip=3)
LCfit <- fit(LC, data=EWMaleIniData, ages.fit=ages.fit,
years.fit=years.fit, wxt=wxt)
LCfor <- forecast(LCfit, h=years.for)
Now that we have a ten-year forecast, we can compare those years to our actual data and use what ever error measure we want to see how accurate the prediction is.
cvy <- LCfor$years # years used in forecast
cva <- LCfor$ages # ages used in forecast
pred <- LCfor$rates # predicted mortality rates
# actual mortality rates subset to the same ages and years as forecast
actual <- EWMaleIniData$Dxt/EWMaleIniData$Ext
actual <- actual[rownames(actual) %in% cva,
colnames(actual) %in% cvy]
# A collection of error measures. plenty of others can be devised
err <- pred - actual
Q <- pred/actual
rmse <- sqrt(rowMeans(err^2))
mae <- rowMeans(abs(err))
smape <- 100 * (rowMeans(exp(abs(log(Q)))) - 1)
This bit is purely for displaying the results
par(mfrow=c(3, 1), mar=c(3, 3, 1, 1), mgp=c(2, 0.8, 0), oma=c(0, 0, 2, 0))
plot(as.numeric(names(rmse)), rmse, type="h", xlab="")
plot(as.numeric(names(mae)), mae, type="h", xlab="")
plot(as.numeric(names(smape)), smape, type="h", xlab="Ages")
mtext(paste("Forecast accuracy for the years",
paste(cvy[c(1, years.for)], collapse=" - ")),
3, outer=TRUE)
As seen in Hyndman's notes, to do this properly we'd have to do this comparison using forecasts at several points in our time series, and the average the scores.
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
add a comment |
I'm not entirely sure about how accuracy()
from forecast
works, but in some way it must be comparing real and predicted values, and returning metrics on how much they differ. This can, in a broad sense, be regarded as a form of cross-validation. As accuracy()
doesn't work for StMoMo
objects, we might as well develop a cross-validation routine ourself.
For a short primer on this form of cross-validation, I'd recommend Rob Hyndman's notes on tsCV()
from forecast
. It would have been nice if we could use tsCV()
here, but it only works for univariate time series, and mortality data is essentially multivariate time series.
I should also mention that before today I had never heard about Mortality Modeling, so the model theory part of this I'm very fuzzy on.
This first bit is identical to what was already posted
library(StMoMo)
library(demography)
library(forecast)
data(EWMaleData)
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link="logit", staticAgeFun=TRUE, periodAgeFun="NP", constFun=constLC)
LC <- lc(link="logit")
EWMaleIniData <- central2initial(EWMaleData)
Then things get a little different. The central point of performing CV on a time series is to do a prediction on data we actually have, but we pretend we don't. Therefore we'll have to subset our data so that the chunk we want to predict on isn't part of the model. In this concrete example we'll use the first 30 years, and then predict the next 10
ages.fit <- 55:89
years.fit <- EWMaleIniData$years[1]:(EWMaleIniData$years[1] + 30)
years.for <- 10
wxt <- genWeightMat(ages=ages.fit, years=years.fit, clip=3)
LCfit <- fit(LC, data=EWMaleIniData, ages.fit=ages.fit,
years.fit=years.fit, wxt=wxt)
LCfor <- forecast(LCfit, h=years.for)
Now that we have a ten-year forecast, we can compare those years to our actual data and use what ever error measure we want to see how accurate the prediction is.
cvy <- LCfor$years # years used in forecast
cva <- LCfor$ages # ages used in forecast
pred <- LCfor$rates # predicted mortality rates
# actual mortality rates subset to the same ages and years as forecast
actual <- EWMaleIniData$Dxt/EWMaleIniData$Ext
actual <- actual[rownames(actual) %in% cva,
colnames(actual) %in% cvy]
# A collection of error measures. plenty of others can be devised
err <- pred - actual
Q <- pred/actual
rmse <- sqrt(rowMeans(err^2))
mae <- rowMeans(abs(err))
smape <- 100 * (rowMeans(exp(abs(log(Q)))) - 1)
This bit is purely for displaying the results
par(mfrow=c(3, 1), mar=c(3, 3, 1, 1), mgp=c(2, 0.8, 0), oma=c(0, 0, 2, 0))
plot(as.numeric(names(rmse)), rmse, type="h", xlab="")
plot(as.numeric(names(mae)), mae, type="h", xlab="")
plot(as.numeric(names(smape)), smape, type="h", xlab="Ages")
mtext(paste("Forecast accuracy for the years",
paste(cvy[c(1, years.for)], collapse=" - ")),
3, outer=TRUE)
As seen in Hyndman's notes, to do this properly we'd have to do this comparison using forecasts at several points in our time series, and the average the scores.
I'm not entirely sure about how accuracy()
from forecast
works, but in some way it must be comparing real and predicted values, and returning metrics on how much they differ. This can, in a broad sense, be regarded as a form of cross-validation. As accuracy()
doesn't work for StMoMo
objects, we might as well develop a cross-validation routine ourself.
For a short primer on this form of cross-validation, I'd recommend Rob Hyndman's notes on tsCV()
from forecast
. It would have been nice if we could use tsCV()
here, but it only works for univariate time series, and mortality data is essentially multivariate time series.
I should also mention that before today I had never heard about Mortality Modeling, so the model theory part of this I'm very fuzzy on.
This first bit is identical to what was already posted
library(StMoMo)
library(demography)
library(forecast)
data(EWMaleData)
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
c1 <- mean(kt[1, ], na.rm = TRUE)
c2 <- sum(bx[, 1], na.rm = TRUE)
list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1))
}
LC <- StMoMo(link="logit", staticAgeFun=TRUE, periodAgeFun="NP", constFun=constLC)
LC <- lc(link="logit")
EWMaleIniData <- central2initial(EWMaleData)
Then things get a little different. The central point of performing CV on a time series is to do a prediction on data we actually have, but we pretend we don't. Therefore we'll have to subset our data so that the chunk we want to predict on isn't part of the model. In this concrete example we'll use the first 30 years, and then predict the next 10
ages.fit <- 55:89
years.fit <- EWMaleIniData$years[1]:(EWMaleIniData$years[1] + 30)
years.for <- 10
wxt <- genWeightMat(ages=ages.fit, years=years.fit, clip=3)
LCfit <- fit(LC, data=EWMaleIniData, ages.fit=ages.fit,
years.fit=years.fit, wxt=wxt)
LCfor <- forecast(LCfit, h=years.for)
Now that we have a ten-year forecast, we can compare those years to our actual data and use what ever error measure we want to see how accurate the prediction is.
cvy <- LCfor$years # years used in forecast
cva <- LCfor$ages # ages used in forecast
pred <- LCfor$rates # predicted mortality rates
# actual mortality rates subset to the same ages and years as forecast
actual <- EWMaleIniData$Dxt/EWMaleIniData$Ext
actual <- actual[rownames(actual) %in% cva,
colnames(actual) %in% cvy]
# A collection of error measures. plenty of others can be devised
err <- pred - actual
Q <- pred/actual
rmse <- sqrt(rowMeans(err^2))
mae <- rowMeans(abs(err))
smape <- 100 * (rowMeans(exp(abs(log(Q)))) - 1)
This bit is purely for displaying the results
par(mfrow=c(3, 1), mar=c(3, 3, 1, 1), mgp=c(2, 0.8, 0), oma=c(0, 0, 2, 0))
plot(as.numeric(names(rmse)), rmse, type="h", xlab="")
plot(as.numeric(names(mae)), mae, type="h", xlab="")
plot(as.numeric(names(smape)), smape, type="h", xlab="Ages")
mtext(paste("Forecast accuracy for the years",
paste(cvy[c(1, years.for)], collapse=" - ")),
3, outer=TRUE)
As seen in Hyndman's notes, to do this properly we'd have to do this comparison using forecasts at several points in our time series, and the average the scores.
edited Nov 21 '18 at 21:12
answered Nov 21 '18 at 18:42
AkselAAkselA
4,32621225
4,32621225
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
add a comment |
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
That's really great, thanks so much! Very useful. You've got me out of my 2 day slump! Thanks again!
– Vicky
Nov 21 '18 at 22:21
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53411110%2fforecast-accuracy-in-r%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
h,Tc9iS 27xC VPMcAytpyJ,ERvO,tIG9PP rIA,gpDs,h3UNy ZBi,6LfeCcFrApov,5Ax
1
The question is too broad as likely to be closed. Don't ask so many questions in one question. In order to ask a better question please read How to ask a good question and Minimal, Complete, and Verifiable Example and How to make a great R reproducible example.
– Rui Barradas
Nov 21 '18 at 11:35
@RuiBarradas Okay I will edit it to just ask one question.
– Vicky
Nov 21 '18 at 11:40
This answer has a very low chance of being answered unless you supply some reproducible code. Assuming the Canadian data isn't critical for your question, I suggest basing your example code instead on the data set supplied with the
StMoMo
package.– AkselA
Nov 21 '18 at 11:58
@AkselA I'm working on my reproducible example now!
– Vicky
Nov 21 '18 at 12:06
@RuiBarradas I have added a reprex now
– Vicky
Nov 21 '18 at 12:25