In Reinforcement learning using feature approximation, does one have a single set of weights or a set of...
This question is an attempt to reframe an earlier question to make it clearer.
This slide shows an equation for Q(state, action) in terms of a set of weights and feature functions.
These discussions (The Basic Update Rule and Linear Value Function Approximation) show a set of weights for each action.
The reason they are different is that the first slide assumes you can anticipate the result of performing an action and then find features for the resulting states. (Note that the feature functions are functions of both the current state and the anticipated action.) In that case, the same set of weights can be applied to all the resulting features.
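A minimal sketch of the first form, Q(s, a) = Σ_i w_i · f_i(s, a), with one weight vector shared by all actions. The 1-D corridor, the `GOAL` position, the feature names, and the weight values are all hypothetical and chosen only for illustration. The key assumption is that the transition is known and deterministic, so the features can be computed from the anticipated next position.

```python
# Hypothetical 1-D corridor: states are integer positions, the goal is at GOAL.
GOAL = 5
ACTIONS = (-1, +1)  # step left, step right


def features(state, action):
    """Features of the (state, action) pair, based on the anticipated next position."""
    next_pos = state + action  # assumes a known, deterministic transition model
    return {
        "bias": 1.0,
        "dist_to_goal": abs(GOAL - next_pos) / 10.0,
    }


def q_value(weights, state, action):
    """Q(s, a) = sum_i w_i * f_i(s, a); the same weights are used for every action."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features(state, action).items())


def greedy_action(weights, state):
    """Pick the action whose anticipated features score highest."""
    return max(ACTIONS, key=lambda a: q_value(weights, state, a))


# Illustrative weights: being farther from the goal is penalised.
weights = {"bias": 0.0, "dist_to_goal": -1.0}
print(greedy_action(weights, 2))  # moves toward the goal
```

The actions are told apart only through the features, which is why a single weight vector is enough here.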
But in some cases, one can't anticipate the effect of an action. Then what does one do? Even if one has perfect weights, one can't apply them to the results of applying the actions if one can't anticipate those results.
My guess is that the second pair of slides deals with that problem. Instead of performing an action and then applying weights to the features of the resulting states, compute features of the current state and apply possibly different weights for each action.
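A minimal sketch of the second form, assuming a cart-pole-like state of four numbers and two discrete actions. Features are computed from the current state only, and each action has its own weight vector. The function names, the step size `alpha`, and the discount `gamma` are illustrative assumptions, and the update shown is the standard Q-learning rule for a linear approximator, not necessarily what the linked slides use.

```python
ACTIONS = ("left", "right")


def state_features(state):
    """Features of the current state only; no next-state prediction is needed."""
    cart_pos, cart_vel, pole_angle, pole_vel = state
    return [1.0, cart_pos, cart_vel, pole_angle, pole_vel]


def q_per_action(weights, state, action):
    """Q(s, a) = w_a . phi(s), with a separate weight vector w_a for each action."""
    return sum(w * f for w, f in zip(weights[action], state_features(state)))


def q_learning_update(weights, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99, done=False):
    """One Q-learning step; only the weights of the action actually taken change."""
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_per_action(weights, next_state, a)
                                      for a in ACTIONS)
    td_error = target - q_per_action(weights, state, action)
    phi = state_features(state)
    weights[action] = [w + alpha * td_error * f
                       for w, f in zip(weights[action], phi)]


weights = {a: [0.0] * 5 for a in ACTIONS}
s = (0.0, 0.0, 0.1, 0.0)
q_learning_update(weights, s, "right", reward=1.0, next_state=s)
```

Here the next state is only ever observed after acting, when it is used to form the update target; it is never predicted in order to choose an action.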
Those are two very different ways of doing feature-based approximation. Are they both valid? The first one makes sense in situations such as Taxi, in which one can effectively simulate what the environment will do for each action. But in some cases, such as cart-pole, that isn't feasible. Then it would seem you need a separate set of weights for each action.
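One way to compare the two forms, offered as a sketch rather than a definitive answer: with discrete actions, per-action weights can be written as a single weight vector over joint features f(s, a) that copy the state features into the slot belonging to the chosen action and put zeros everywhere else. The names and numbers below are hypothetical.

```python
ACTIONS = ("left", "right")


def state_features(state):
    """Features of the current state only, with a leading bias term."""
    return [1.0] + list(state)


def joint_features(state, action):
    """State features placed in the slot for `action`, zeros in the other slots."""
    phi = state_features(state)
    out = []
    for a in ACTIONS:
        out.extend(phi if a == action else [0.0] * len(phi))
    return out


def q_single(w, state, action):
    """Single weight vector applied to joint (state, action) features."""
    return sum(wi * fi for wi, fi in zip(w, joint_features(state, action)))


def q_separate(per_action_w, state, action):
    """Separate weight vector per action applied to state-only features."""
    return sum(wi * fi for wi, fi in zip(per_action_w[action], state_features(state)))


# Illustrative weights; concatenating them gives the equivalent single vector.
per_action_w = {"left": [0.5, -1.0, 2.0], "right": [0.1, 0.3, -0.7]}
w = per_action_w["left"] + per_action_w["right"]
state = (0.2, -0.4)

for a in ACTIONS:
    print(a, q_single(w, state, a), q_separate(per_action_w, state, a))
```

Under this construction both functions give the same Q-values, so the per-action form can be read as the single-weight form with a particular choice of feature functions that never looks at the next state.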
Is this the right way to think about it, or am I missing something?
Thanks.
feature-extraction reinforcement-learning
Why do you think it is necessary to anticipate the effect of an action? If I've understood correctly, the feature functions are applied to the current state and current action, so you don't need to anticipate the next state. Right?
– Pablo EM
Dec 8 at 13:18
There is some code from UC Berkeley for a class that teaches reinforcement learning. It tries each action and, by anticipating what each does, determines the feature values for each of the resulting states and picks an action based on which produces the best result.
– RussAbbott
Dec 9 at 4:40
Do you mean the class from the video you included in the question? From the slide you mention, it doesn't seem that they try all the actions, but honestly I haven't watched the complete video. Anyway, could you please point me to the code you are talking about? I'm curious about it :D
– Pablo EM
Dec 9 at 12:33
Here's a version translated from Python 2 to Python 3. (drive.google.com/file/d/1LJxpJNAu2K_FCq9KLAMhgrrCxcjtg18N/…)
– RussAbbott
Dec 9 at 21:28
You might be interested in the RL portion of this course: inst.eecs.berkeley.edu/~cs188/fa18/project3.html#Q10. (The full course: inst.eecs.berkeley.edu/~cs188/fa18. RL is Week 5. The relevant project is P3, listed on the Week 6 line.)
– RussAbbott
Dec 9 at 21:56
edited Nov 20 at 17:50
asked Nov 20 at 17:31
RussAbbott