One of the major challenges ahead is to integrate these disparate approaches into a single conceptual framework. In essence, each of these approaches represents a solution to a different but inter-related problem in understanding how the brain learns from reinforcement. A unified hierarchical framework would seem well poised to accommodate each of these
distinct components. The need to perform learning and inference over state-space structure can be accommodated by adding a level of hierarchy tasked with identifying the relevant features that define the state-space, while other levels of the hierarchy learn the values of actions within that state-space.
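To make this division of labour concrete, the sketch below (Python, purely illustrative) treats state-space construction and value learning as two separate levels: an upper level searches over candidate feature subsets, while a lower level runs standard tabular Q-learning within the state-space that each subset induces. The environment interface (reset, step, actions) and the exhaustive feature search are assumptions made for brevity, not a claim about how the brain, or any published model, solves this problem.

```python
# Illustrative sketch only: a two-level hierarchy in which an upper level selects
# which observation features define the state-space, and a lower level learns
# action values within that state-space. The environment interface and the
# function names (q_learning_return, select_state_space) are hypothetical.
import itertools
import random
from collections import defaultdict

def q_learning_return(env, features, episodes=50, alpha=0.1, gamma=0.95, eps=0.1):
    """Lower level: tabular Q-learning over states defined by `features`."""
    Q = defaultdict(float)
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            s = tuple(obs[f] for f in features)   # project observation onto the chosen features
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a: Q[(s, a)])
            obs, r, done = env.step(a)
            s2 = tuple(obs[f] for f in features)
            td_error = r + gamma * max(Q[(s2, a2)] for a2 in env.actions) - Q[(s, a)]
            Q[(s, a)] += alpha * td_error
            total += r
    return total / episodes

def select_state_space(env, all_features, max_size=2):
    """Upper level: pick the feature subset whose induced state-space supports the best learning."""
    candidates = [fs for k in range(1, max_size + 1)
                  for fs in itertools.combinations(all_features, k)]
    return max(candidates, key=lambda fs: q_learning_return(env, fs))
```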
Furthermore, while hierarchical RL studies in neuroimaging have focused to date on MF rather than MB approaches, it is a natural extension to imagine that both MB and MF learning strategies could be accommodated within this framework. One possibility would be that MB reasoning is most likely to occur at the higher end of a hierarchical structure, for instance at the level of selecting abstract options to pursue abstract goals (for example, selecting the ‘option’ of going to a Chinese restaurant tonight to get dinner), while MF control might be more likely to occur for actions at the lower end of the hierarchy, that is, in selecting the stimulus-response chain that drives one’s car to the restaurant. This proposal is echoed in earlier connectionist [66] and psychological [67] models of decision-making. More recently, the integration of an MB/MF action control hierarchy
has been discussed within the context of RL actor-critic models [44], and a computational model by which meta-actions might be learned via TDPE signals has been described [68]. This framework has also found application in the prediction of human actions in the context of assistive robots [69 and 70]. However, it is also plausible that, as one moves down the action hierarchy, even relatively low-level actions might under some conditions be performed in an MB manner, particularly if the MF system has unreliable predictions for those actions. Considering meta-actions as action sequences performed by the MF controller, the transmission of pseudo-prediction errors (PPE) to the arbitrator might serve as a low-cost monitoring signal, ensuring that behavior is never run exclusively in an ‘open-loop’ manner and that the MB system can always intervene if necessary. It is conceivable that arbitration between MB and MF strategies acts at multiple levels of the action hierarchy, and that behavior is driven by a mix of MB and MF strategies operating at different hierarchical levels (see Figure 2).
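The sketch below (again purely illustrative, with hypothetical interfaces such as mb_plan, mf_policy and mf_predict, and an arbitrary reliability threshold) shows one way such an arrangement could be wired up: an MB planner selects options at the top of the hierarchy, an MF controller executes cached action sequences at the bottom, and the arbitrator monitors the magnitude of recent pseudo-prediction errors so that deliberative control is re-engaged whenever the MF predictions become unreliable.

```python
# Illustrative sketch only: arbitration between a model-based (MB) controller
# operating over abstract options and a model-free (MF) controller executing
# cached action sequences ("meta-actions"). The MF controller reports a
# pseudo-prediction error (PPE) at each step; when the running average of |PPE|
# exceeds a threshold, the arbitrator hands control back to the MB system.
# All names (mb_plan, mf_policy, mf_predict, PPE_THRESHOLD) are hypothetical.
from collections import deque

PPE_THRESHOLD = 0.5   # reliability cut-off for the MF controller (assumed value)
WINDOW = 10           # number of recent PPEs used to estimate MF reliability

def run_episode(env, mb_plan, mf_policy, mf_predict):
    """mb_plan(state) -> option carrying a goal and an action_sequence;
    mf_policy(state) -> cached stimulus-response action;
    mf_predict(state, action) -> the MF system's predicted outcome value."""
    recent_ppe = deque(maxlen=WINDOW)
    state, done = env.reset(), False
    controller = "MB"                       # start under deliberative control
    while not done:
        if controller == "MB":
            option = mb_plan(state)         # high level: choose an abstract option (e.g. a goal)
            for action in option.action_sequence:
                state, reward, done = env.step(action)
                if done:
                    break
            controller = "MF"               # once the sequence is cached, let MF take over
        else:
            action = mf_policy(state)       # low level: cached stimulus-response control
            predicted = mf_predict(state, action)
            state, reward, done = env.step(action)
            ppe = reward - predicted        # pseudo-prediction error sent to the arbitrator
            recent_ppe.append(abs(ppe))
            if sum(recent_ppe) / len(recent_ppe) > PPE_THRESHOLD:
                controller = "MB"           # MF has become unreliable: MB intervenes
    return state
```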