Understanding Keras LSTM NN input & output for binary classification
I am trying to create a simple LSTM network that, based on the last 16 time frames, provides some output. Let's say I have a dataset with 112000 rows (measurements) and 7 columns (6 features + class). As I understand it, I have to "pack" the dataset into some number of 16-element-long batches. With 112000 rows that would mean 112000/16 = 7000 batches, and therefore a numpy 3D array with shape (7000, 16, 7). After splitting this array into train and test data (and separating the class column from the features), I get these shapes:
xtrain.shape == (5000, 16, 6)
ytrain.shape == (5000, 16)
xtest.shape == (2000, 16, 6)
ytest.shape == (2000, 16)
My model looks like this:
import keras

model = keras.models.Sequential()
model.add(keras.layers.LSTM(8, input_shape=(16, 6), stateful=True, batch_size=16, name="input"))
model.add(keras.layers.Dense(5, activation="relu", name="hidden1"))
model.add(keras.layers.Dense(1, activation="sigmoid", name="output"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(xtrain, ytrain, batch_size=16, epochs=10)
However after trying to fit the model I get this error:
ValueError: Error when checking target: expected output to have shape (1,) but got array with shape (16,)
My guess is that the model expects a single output per batch (so the ytrain shape should be (5000,)), instead of 16 outputs (one for every entry in a batch, i.e. (5000, 16)).
If that is the case, should I, instead of packing the data like this, create a 16-element-long batch for every output? That would give:
xtrain.shape == (80000, 16, 6)
ytrain.shape == (80000,)
xtest.shape == (32000, 16, 6)
ytest.shape == (32000,)
python tensorflow keras lstm
asked Nov 22 '18 at 23:31 by SEnergy
1 Answer
You are close with the last comments of the question. Since it's a binary classification problem, you should have 1 output per input, so you need to get rid of the 16 in your ys and replace it with a 1.
Besides, you need to be able to divide the train set by your batch size, so you can use 5008 samples, for example.
In fact:
ytrain.shape == (5000, 1)
gets past the error you mention, but raises a new one:
ValueError: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size. Found: 5000 samples
Which is addressed by ensuring that:
xtrain.shape == (5008, 16, 6)
ytrain.shape == (5008, 1)
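The two adjustments above can be sketched with plain numpy. This is only an illustration under assumptions the answer does not state: the label of each 16-step window is taken from its last time step, and the sample count is trimmed down to a multiple of the batch size (4992) rather than raised to 5008. The helper name `prepare_for_stateful` and the random stand-in data are hypothetical.

```python
import numpy as np


def prepare_for_stateful(x, y_per_step, batch_size=16):
    """Reduce per-step labels to one label per window and trim the
    sample count to a multiple of batch_size."""
    # Assumption: a window's label is the label of its last time step.
    y = y_per_step[:, -1].reshape(-1, 1)
    # Keep the largest number of samples divisible by batch_size.
    usable = (len(x) // batch_size) * batch_size
    return x[:usable], y[:usable]


# Random stand-in data with the shapes from the question.
rng = np.random.default_rng(0)
xtrain = rng.normal(size=(5000, 16, 6))
ytrain = rng.integers(0, 2, size=(5000, 16))

x_fit, y_fit = prepare_for_stateful(xtrain, ytrain)
print(x_fit.shape, y_fit.shape)  # (4992, 16, 6) (4992, 1)
```

Whether the last step's label is the right target depends on the data; any other reduction that yields one label per window would give the same (N, 1) shape.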
So, considering I have 112000 rows of data, and I want to train the LSTM on as many rows as possible, should I create 111984 "packs" of 16 rows of data that I feed into the LSTM? That would give (111984, 16, 6) as input and (111984, 1) as output... I want to train an LSTM to predict the class on every row, but that class requires information about the last 16 time frames, so for the 16th (i.e. the first usable) row I need information about rows 0-15, for the 17th I need information about rows 1-16, etc. Therefore 112000-16 = 111984 packs of 16-element-long data?
– SEnergy
Nov 23 '18 at 16:32
I'm not sure I'm following you. You have n train samples. n should be divisible by the batch size, but you can fit the n samples and Keras will take care of the "batch" splitting. In turn, each train sample is a sequence of features. Each element of the sequence can be interpreted as a time step. And you have f features to describe that time step. Besides, for each train sample (consisting of a sequence of features), you have one unique y. This is the schema for a binary classification of a sequence.
– Julian Peller
Nov 23 '18 at 20:04
Assume I have n train samples (rows), where every sample has 3 features f (columns). Features f_n0 and f_n1 are inputs and f_n2 is the output. This f_n2 output, however, should be based on the last 16 rows (time frames) only, nothing before that time frame. Assume 1 time frame = 1 second: the NN tries to predict the output based on what happened in the last 16 seconds. Assuming I have 112 000 train samples (n = 112000) with 7 features (f = 7), and the LSTM works with a 3D array, would the resulting array have shape (n-16, 16, f), or rather (n/16, 16, f)?
– SEnergy
Nov 23 '18 at 20:41
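The two shapes asked about in the last comment correspond to two ways of cutting the rows into windows, which a small numpy sketch can make concrete. The helper names are hypothetical, the data is a tiny stand-in (112 rows instead of 112000), and it is assumed that the last column holds the class and that a window is labelled by its last row. Under that labelling a sliding window gives n - 16 + 1 samples; it would be n - 16 if the label came from the row after the window.

```python
import numpy as np


def non_overlapping_windows(data, window=16):
    """Cut rows into consecutive, non-overlapping windows: (n // window, window, f)."""
    n_windows = len(data) // window
    return data[: n_windows * window].reshape(n_windows, window, data.shape[1])


def sliding_windows(data, window=16):
    """One window per possible end row, shifted by one row: (n - window + 1, window, f)."""
    n_windows = len(data) - window + 1
    # Row indices for every window, built by broadcasting start offsets
    # against the positions inside a window.
    idx = np.arange(n_windows)[:, None] + np.arange(window)[None, :]
    return data[idx]


# Tiny stand-in for the 112000 x 7 dataset.
n, f = 112, 7
data = np.arange(n * f, dtype=float).reshape(n, f)

packed = non_overlapping_windows(data)
slid = sliding_windows(data)

# Assumption: column 6 is the class; a window's label is its last row's class.
x = slid[:, :, :6]
y = slid[:, -1, 6].reshape(-1, 1)

print(packed.shape)      # (7, 16, 7)
print(slid.shape)        # (97, 16, 7)
print(x.shape, y.shape)  # (97, 16, 6) (97, 1)
```

With n = 112000 the same functions would give 7000 and 111985 windows respectively. If each window is meant to be independent, as the comment describes, a non-stateful LSTM would also remove the divisibility requirement; that is a design choice the thread does not settle.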
answered Nov 22 '18 at 23:56 by Julian Peller