In Reinforcement learning using feature approximation, does one have a single set of weights or a set of...


























This question is an attempt to reframe an earlier question to make it clearer.



This slide shows an equation for Q(state, action) in terms of a set of weights and feature functions.



These discussions (The Basic Update Rule and Linear Value Function Approximation) show a set of weights for each action.
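Written out, the two forms being compared look roughly like this. The notation ($w_i$, $f_i$, $\phi_i$, $w_{a,i}$) is an assumption about what the slides show, not a transcription of them.

```latex
% Single weight vector, features of the (state, action) pair:
Q(s, a) = \sum_{i=1}^{n} w_i \, f_i(s, a)

% One weight vector per action, features of the state only:
Q(s, a) = \sum_{i=1}^{n} w_{a,i} \, \phi_i(s)
```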



The reason they are different is that the first slide assumes you can anticipate the result of performing an action and then find features for the resulting states. (Note that the feature functions are functions of both the current state and the anticipated action.) In that case, the same set of weights can be applied to all the resulting features.



But in some cases, one can't anticipate the effect of an action. Then what does one do? Even if one has perfect weights, one can't apply them to the results of applying the actions if one can't anticipate those results.



My guess is that the second pair of slides deals with that problem. Instead of performing an action and then applying weights to the features of the resulting states, compute features of the current state and apply possibly different weights for each action.
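To make the two forms concrete, here is a minimal sketch in plain Python. The names (`phi`, `joint_features`), the two-action toy setup, and the weight values are all assumptions for illustration and come from neither set of slides. It suggests that per-action weights over state-only features can also be written as a single weight vector over (state, action) features: copy the state features into the block that belongs to the chosen action and zero out the other blocks. Neither form computes a next state here.

```python
# Hypothetical toy setup: 2 actions, 3 state features. None of these
# names or numbers come from the slides; they only illustrate the idea.
ACTIONS = [0, 1]

def phi(state):
    """State-only features (assumed): a bias term plus the raw state values."""
    x, v = state
    return [1.0, x, v]

def dot(w, f):
    """Plain dot product of two equal-length lists."""
    return sum(wi * fi for wi, fi in zip(w, f))

# Form A: one weight vector per action, applied to state-only features.
per_action_w = {0: [0.5, -1.0, 2.0], 1: [0.1, 3.0, -0.5]}

def q_per_action(state, action):
    return dot(per_action_w[action], phi(state))

# Form B: a single weight vector applied to features of (state, action).
def joint_features(state, action):
    """Place phi(state) in the chosen action's block; zeros elsewhere."""
    f = phi(state)
    out = []
    for a in ACTIONS:
        out.extend(f if a == action else [0.0] * len(f))
    return out

# Concatenate the per-action weights into one long weight vector.
single_w = per_action_w[0] + per_action_w[1]

def q_single(state, action):
    return dot(single_w, joint_features(state, action))

state = (0.2, -0.4)
for a in ACTIONS:
    # The two columns should agree for every action.
    print(a, q_per_action(state, a), q_single(state, a))
```

Under this construction the two forms give the same Q-values, so the per-action version can be read as a special case of the single-weight version rather than a different method.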



Those are two very different ways of doing feature-based approximation. Are they both valid? The first makes sense in situations such as Taxi, in which one can effectively simulate what the environment will do in response to each action. But in other cases, such as cart-pole, that is not possible or not feasible. Then it would seem you need a separate set of weights for each action.
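For reference, a minimal sketch of one linear Q-learning update, assuming the standard form of the rule. The function names, the toy feature function, and the values of `alpha` and `gamma` are hypothetical. In this sketch the features are evaluated on (state, action) pairs, and `s_next` is the state observed after actually taking the action, not a simulated one. Whether a particular feature function looks ahead internally is a design choice of that function.

```python
def q_value(w, features):
    """Linear Q-value: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def q_learning_update(w, feat_fn, s, a, r, s_next, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """One linear Q-learning step; returns a new weight list."""
    f = feat_fn(s, a)
    # Bootstrap from the best action in the observed next state.
    best_next = 0.0 if done else max(
        q_value(w, feat_fn(s_next, a2)) for a2 in actions
    )
    td_error = r + gamma * best_next - q_value(w, f)
    # Each weight moves in proportion to its own feature value.
    return [wi + alpha * td_error * fi for wi, fi in zip(w, f)]

def toy_features(s, a):
    """Hypothetical features: bias, state value, indicator for action 1."""
    return [1.0, float(s), 1.0 if a == 1 else 0.0]

w = [0.0, 0.0, 0.0]
w = q_learning_update(w, toy_features, s=1, a=1, r=1.0, s_next=2,
                      actions=[0, 1])
print(w)
```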



Is this the right way to think about it, or am I missing something?



Thanks.


































  • Why do you think it is necessary to anticipate the effect of an action? If I've understood correctly, the feature functions are applied to the current state and current action, you don't need to anticipate the next state. Right?
    – Pablo EM
    Dec 8 at 13:18










  • There is some code from UC Berkeley for a class that teaches reinforcement learning. It tries each action and, by anticipating what each one does, determines the feature values for each of the resulting states and picks the action that produces the best result.
    – RussAbbott
    Dec 9 at 4:40










  • Do you mean the class from the video you included in the question? From the slide you mention, it doesn't seem that they try all the actions, but honestly I haven't watched the complete video. Anyway, could you please point me to the code you are talking about? I'm curious about it :D
    – Pablo EM
    Dec 9 at 12:33










  • Here's a version translated from Python 2 to Python 3. (drive.google.com/file/d/1LJxpJNAu2K_FCq9KLAMhgrrCxcjtg18N/…)
    – RussAbbott
    Dec 9 at 21:28










  • You might be interested in the RL portion of this course: inst.eecs.berkeley.edu/~cs188/fa18/project3.html#Q10. (The full course: inst.eecs.berkeley.edu/~cs188/fa18. RL is Week 5. The relevant project is P3, listed on the Week 6 line.)
    – RussAbbott
    Dec 9 at 21:56


















feature-extraction reinforcement-learning














edited Nov 20 at 17:50

























asked Nov 20 at 17:31









RussAbbott












