MIT AGI: Cognitive Architecture (Nate Derbinsky)
bfO4EkoGh40 • 2018-03-20
So today we have Nate Derbinsky. He's a professor at Northeastern University working on various aspects of computational agents that exhibit human-level intelligence. Please give Nate a warm welcome. Thanks a lot, and thanks for having me here. So the title that was on the page was cognitive modeling; I'll kind of get there, but I wanted to put it in context. The bigger theme here is I want to talk about what's called cognitive architecture, and if you've never heard about that before, that's great. I wanted to contextualize that: how is that one approach to get us to AGI? I'll say what my view of AGI is, and put up a whole bunch of TV and movie characters that I grew up with that inspire me, and that will lead us into what is this thing called cognitive architecture. It's a whole research field that crosses neuroscience, psychology, cognitive science, and all the way into AI, so I'll try to give you kind of the historical big-picture view of it, and what some of the actual systems are out there that might be of interest to you, and then we'll kind of zoom in on one of them that I've done a good amount of work with, called Soar. What I'll try to do is tell a story, a research story, of how we started with kind of a core research question, looked to how humans operate, understood that phenomenon, and then took it and saw really interesting results from it. At the end, if this field is of interest, there are a few pointers for you to go read more and go experience more of cognitive architecture. So, a rough definition of AGI, given this is an AGI class: depending on the direction that you're coming from, it might be kind of understanding intelligence, or maybe developing intelligent systems that are operating at the level of human intelligence. The typical differences between this and other sorts of AI and machine-learning systems: we want systems that are going to persist for a long period of time, we want them robust to different
conditions, we want them learning over time, and, here's the crux of it, working on different tasks, and in a lot of cases tasks they didn't know were coming ahead of time. I got into this because I clearly watched too much TV and too many movies, and then I looked back at this and I realized I think I'm covering the 70s, 80s, 90s, the aughts I guess it is, and today. So this is what I wanted out of AI, and this is what I wanted to work with, and then there's the reality that we have today. So who's watched Knight Rider, for instance? I don't think that exists yet, but maybe we're getting there. In particular, for fun, during the Amazon sale day I got myself an Alexa, and I could just see myself at some point saying, Alexa, please write me an rsync script, you know, to sync my class. And if you have an Alexa, you probably know the following phrase, and this just always hurts me inside, which is: sorry, I don't know that one. Which is okay, right? A lot of people have no idea what I'm asking, let alone how to do that. What I want Alexa to respond with after that is: do you have time to teach me? And to provide some sort of interface by which, back and forth, we can kind of talk through this. We aren't there yet, to say the least, but I'll talk later about some work on a system called Rosie that's working in that direction; we're starting to see some ideas about being able to teach systems how to do work. So folks who are in this field, I think, generally fall into these three categories. There are folks who are just curious: they want to learn new things, generate knowledge, work on hard problems, great. I think there are folks who are in kind of that middle cognitive-modeling realm, and I'll use this term a lot; it's really understanding how humans think, how humans operate, human intelligence at multiple levels. If you can do that, one, there's just knowledge in and of itself of how we operate, but there are a lot of really important applications that you can think of if we were able to
not only understand but predict how humans would respond and react in various tasks. Medicine is an easy one. There's some work in HCI, or HRI, I'll get to later, where if you can predict how humans would respond to a task, you can iterate tightly and develop better interfaces. It's already being used in the realm of simulation and in defense industries. I happen to fall into the latter group, or the bottom group, which is systems development, which is to say just the desire to build systems for various tasks, tasks that kind of current AI and machine learning can't operate on. And I think when you're working at this level, or on any system that nobody's really achieved before, what do you do? You kind of look to the examples that you have, which in this case, that we know of, is just humans, right? Irrespective of your motivation, when you have kind of an intent that you want to achieve in your research, you kind of let that drive your approach. So I often show my AI students this. The Turing test, which you might have heard of, or variants of it that have come before: these were folks who were trying to create systems that acted in a certain way, that acted intelligently, and the kind of line that they drew, the benchmark that they used, was to say: let's make systems that operate like humans do. Cognitive modelers will fit up into this top point here, to say it's not enough to act that way, but by some definition of thinking we want the system to do what humans do, or at least be able to make predictions about it. So that might be things like: what errors would the human make on this task, or how long would it take them to perform this task, or what emotion would be produced in this task? There are folks who are still thinking about how the computer is operating but are trying to apply kind of rational rules to it. So a logician, for instance, would say if you have A, and A gives you B, and B gives you C, then A should definitely give you C; that's just what's rational, and so
these folks operate in that direction. And then, if you go to an intro AI class anywhere in the country, particularly Berkeley, because they have graphic designers that I get to steal from, the benchmark would be what the system produces in terms of action, and the benchmark is some sort of optimal rational bound. Irrespective of where you work in the space, there's kind of a common outcome that arises when you research these areas, which is that you can learn individual bits and pieces, and it can be hard to bring them together to build a system that either predicts or acts on different tasks. This is part of the transfer learning problem, but it's also part of having distinct theories that are hard to combine together. So I'm going to give an example that comes out of cognitive modeling, or perhaps three examples. If you were in an HCI class or some psychology classes, one of the first things you'll learn about is Fitts's law, which provides you the ability to predict the difficulty of basically a human pointing from where they start to a particular place. It turns out that you can learn some parameters and model this based upon just the distance from where you are to the target and the size of the target. So moving a long distance will take a while, but also, if you're aiming for a very small point, that can take longer than if there's a large area that you just kind of have to get yourself to, and this has held true for many humans. So let's say we've learned this, and then we move on to the next task, and we learn about what's called the power law of practice, which has been shown true in a number of different tasks. What I'm showing here is one of them, where you're going to draw a line through a sequential set of circles, starting at 1, going to 2, and so forth, not making a mistake, or at least trying not to, and trying to do this as fast as possible. For a particular person we would fit the a, b, and c parameters, and we'd see a power law: as you perform this
task more, you're going to see a decrease in the amount of reaction time required to complete the task. Great, we've learned two things about humans; let's add some more in. For those who might have done some reinforcement learning, TD learning, temporal difference learning, is one of those approaches, and it has had some evidence of similar sorts of processes in the dopamine centers of the brain. It basically says: in a sequential learning task, you perform the task, you get some sort of reward; how are you going to update your representation of what to do in the future so as to maximize the expectation of future reward? There are various models of how that changes over time, and you can build up value functions that allow you to perform better and better given trial and error. Great, so we've learned three interesting models here that hold true over multiple people and multiple tasks. And so my question is, if we take these together and add them up, how do we start to understand a task as quote-unquote simple as chess? Which is to say, we could ask questions like: how long would it take for a person to play, what mistakes would they make, and after they've played a few games, how would they adapt? Or what if we want to develop a system that ends up being good at chess, or at least learning to become better at chess? There doesn't seem to be a clear way to take these very, very individual theories, kind of smash them together, and get a reasonable answer for how to play chess, or how humans play chess. The gentleman in this slide is Allen Newell, one of the founders of AI, who did incredible work in psychology and other fields. He gave a series of lectures at Harvard in 1987, and they were published in 1990 as Unified Theories of Cognition, and his argument to the psychology community at that point was the argument on the prior slide: they had many individual studies, many individual results, and so the question was, how do you bring them together?
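The three regularities just described can each be written down in a few lines. Here is a minimal Python sketch; all parameter values are invented for illustration, not fit to any real data from the lecture:

```python
import math

# Illustrative sketch of three human-performance regularities.
# Parameter values (a, b, c, alpha, gamma) are made up for the example;
# in practice they are fit to data from a particular person and task.

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Fitts's law: pointing time grows with the 'index of difficulty'
    log2(2D/W); farther and smaller targets take longer."""
    index_of_difficulty = math.log2(2.0 * distance / width)
    return a + b * index_of_difficulty

def power_law_of_practice(trial, a=0.4, b=2.0, c=0.5):
    """Power law of practice: reaction time falls off as a power
    of the number of trials performed."""
    return a + b * trial ** -c

def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """One temporal-difference update: nudge a state's value estimate
    toward the reward plus the discounted value of the successor."""
    td_error = reward + gamma * next_value - value
    return value + alpha * td_error

# Practice makes you faster ...
assert power_law_of_practice(100) < power_law_of_practice(1)
# ... small, distant targets are harder to hit ...
assert fitts_movement_time(200, 10) > fitts_movement_time(50, 40)
# ... and a surprisingly good outcome raises a state's estimated value.
assert td_update(0.0, 1.0, 0.0) > 0.0
```

Each function is simple in isolation; the lecture's point is exactly that nothing here tells you how to compose the three into a model of a whole task like chess.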
This overall theory, this way to make forward progress, was his proposal: unified theories of cognition, which became known as cognitive architecture. Which is to say, bring together your core assumptions, your core beliefs about what the fixed mechanisms and processes are that intelligent agents would use across tasks (the representations, the learning mechanisms, the memory systems), bring them together, implement them in a theory, and use that across tasks. The core idea is that when you actually have to implement this and see how it's going to work across different tasks, the interconnections between these different processes and representations add constraint, and over time the constraints start limiting the design space of what is necessary and what is possible in terms of building intelligent systems. The overall goal from there was to understand and exhibit human-level intelligence using these cognitive architectures. A natural question to ask is: okay, we've gone from a methodology of science that we understand how to operate in, where we make a hypothesis, we construct a study, we gather our data, we evaluate that data, and we falsify or do not falsify the original hypothesis, and we can do that over and over again, and we know that we're making progress scientifically. If I've now taken that model and changed it into: I have a piece of software, and it's representing my theories, and to some extent I can configure that software in different ways to work on different tasks, how do I know that I'm making progress? There's a form of science called Lakatosian science, and it's kind of shown pictorially here, where you start with your core of what your beliefs are about what is necessary for achieving the goal that you have, and around that you'll have kind of ephemeral hypotheses and assumptions that over time may grow and shrink. So you're trying out different things, trying out different things, and if an assumption is around there long
enough, it becomes part of that core. And so as you work on more tasks and learn more, either by your own work or by data coming in from someone else, the core is growing larger and larger, you've got more constraints, and you've made more progress. So what I wanted to look at is, in this community, what are some of the core assumptions that are driving forward scientific progress? One of them actually came out of those lectures; they're referred to as Newell's time scales of human action. Off on the left, the left two columns are both time units, just expressed somewhat differently, with the second from the left being maybe more useful to a lot of us in understanding daily life. One step over from there would be kind of at what level processes are occurring: the lowest three are down at the substrate, the neuronal level, and we're building up to deliberate acts that occur in the brain, and tasks that are operating on the order of ten seconds. Some of these might occur in the psychology laboratory, but probably a step up, into minutes and hours, and then above that it really becomes interactions between agents over time. So if we start with that, the thing to take away is the hypothesis that regularities will occur at these different time scales and that they're useful. Those who operate at that lowest time scale might be considering neuroscience and cognitive neuroscience; when you shift up to the next couple of levels, the areas of science that deal with that would be psychology and cognitive science; and then we shift up a level and we're talking about sociology and economics and the interplay between agents over time. What we'll find with cognitive architecture is that most of them will tend to sit at the deliberate act: we're trying to take knowledge of a situation and make a single decision, and then sequences of decisions over time will build to tasks, and tasks over time will build to more interesting phenomena. I'm actually
going to show that that isn't strictly true, that there are folks working in this field who actually do operate one level below. Some other assumptions: this is Herb Simon receiving the Nobel Prize in Economics, and part of what he received that award for was the idea of bounded rationality. In various fields we tend to model humans as rational, and his argument was, let's consider that human beings are operating under various kinds of constraints, and so model them as rational with respect to, and bounded by: how complex the problem is that they're working on, how big is that search space that they have to conquer; cognitive limitations, so speed of operations, amount of memory, short-term as well as long-term, as well as other aspects of our computing infrastructure that are going to keep us from being able to solve arbitrarily complex problems; as well as how much time is available to make that decision. And this is actually a phrase that came out of his speech when he received the Nobel Prize: decision-makers can satisfice either by finding optimum solutions for a simplified world, which is to say, take your big problem, simplify it in some way, and then solve that, or by finding satisfactory solutions for a more realistic world, take the world in all its complexity, take the problem in all its complexity, and try to find something that works; neither approach in general dominates the other, and both have continued to co-exist. So what you're actually going to see throughout the cognitive architecture community is this understanding that there are some problems you're not going to be able to get an optimal solution to, if you consider, for instance, a bounded amount of computation, bounded time, and the need to be reactive to a changing environment. In some sense we can decompose problems that come up over and over again into simpler problems and solve those near-optimally or optimally, but for more general problems we might have to satisfice. There's also
the idea of the physical symbol system hypothesis. This is Allen Newell and Herb Simon; they're considering how a computer could play the game of chess. The physical symbol system hypothesis talks about the idea of taking some signal, abstractly referred to as a symbol, combining symbols in some ways to form expressions, and then having operations that produce new expressions, and it makes the claim that symbol systems are necessary and sufficient for intelligent systems. A very weak way of talking about it is the claim that there's nothing unique about the neuronal infrastructure that we have, but that if we got the software right, we could implement it in the bits, bytes, RAM, and processor that make up modern computers; that's kind of the weakest way to look at this, that we can do it with silicon and not carbon. A stronger way that this used to be looked at was more of a logical standpoint, which is to say: if we can encode rules of logic, and these tend to line up if we think intuitively of planning and problem solving, and if we can just get that right and get enough facts in there, eventually we can get to the point of intelligence; that's what you need for intelligence. That was a starting point that lasted for a while. I think by now most folks in this field would agree that that's necessary, to be able to operate logically, but that there are going to be representations and processes that will benefit from non-symbolic representation, particularly perceptual processing, visual and auditory, processing things in a more kind of standard machine-learning sort of way, as well as taking advantage of statistical representations. So we're getting closer to actually looking at cognitive architectures. I did want to go back to the idea that different researchers are coming at this with different research foci, and we'll start off with kind of the lowest level.
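As a toy illustration of the physical symbol system idea just mentioned (symbols combined into expressions, plus processes that produce new expressions), here is a hedged sketch; the little `implies` logic is invented for the example and is not from the lecture or any particular architecture:

```python
# A minimal illustration of the physical symbol system idea:
# symbols are combined into structured expressions, and processes
# take existing expressions and produce new ones.

def derive_transitive(expressions):
    """One symbolic process: from (implies, X, Y) and (implies, Y, Z),
    produce the new expression (implies, X, Z)."""
    new = set()
    for (op1, x, y1) in expressions:
        for (op2, y2, z) in expressions:
            if op1 == op2 == "implies" and y1 == y2:
                new.add(("implies", x, z))
    # Return only expressions we did not already know.
    return new - set(expressions)

# "A gives you B, B gives you C, so A should give you C."
known = [("implies", "A", "B"), ("implies", "B", "C")]
print(derive_transitive(known))  # {('implies', 'A', 'C')}
```

The point of the hypothesis is that, in the weak reading, nothing about this kind of manipulation requires carbon rather than silicon.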
Understanding biological modeling: Leabra and Spaun both try to model different degrees of low-level detail, parameters, firing rates, connectivity between different levels of neuronal representation; they build that up and then try to build tasks above that layer, always being very careful about staying true to human biological processes. At a layer above there would be psychological modeling, which is to say trying to build systems that are true in some sense to areas of the brain and interactions in the brain, and being able to predict errors made and timing produced by the human mind; there I'll talk a little bit about ACT-R. This final level down here, these are systems that are focused mainly on producing functional systems that exhibit really cool artifacts and solve really cool problems, and so I'll spend most of the time talking about Soar, but I want to point out a relative newcomer in the game called Sigma. To talk about Spaun a little bit, we'll see if the sound works in here; I'm going to let the creator take this one, or not, we'll see how the AV system likes this. [A video clip plays; its captions are largely unintelligible. In it, Spaun's creator describes the Spaun model, a simulation of roughly two and a half million individual neurons that can view images of numbers and perform a variety of tasks, and discusses how the flow of information through different parts of the model is of interest both to neuroscience and to artificial intelligence.] I'll provide a pointer at the end; he's got a really cool book called How to Build a Brain, and if you Google Spaun you can find a toolkit where you can kind of construct circuits that will approximate functions that you're interested in, connect them together, set certain properties that you would want at a low level, and build them up, and actually work on tasks at the level of vision and robotic actuation. So that's a really cool system. As we move into architectures that sit above that biological level, I wanted to give you an overall sense of what they're going to look like, what a prototypical architecture looks like. They're going to have some ability to do perception; the modalities typically are more digital and symbolic, but they will, depending on the architecture, be able to handle vision, audition, and various sensory inputs. These get represented in some sort of short-term memory, whatever the state representation for the particular system is. It's typical to have a representation of the knowledge of what tasks can be performed, when they should be performed, and how they should be controlled, and these are typically both actions that take place internally, that manage the internal state of the system and perform internal computations, but also external actuation, where external might be a digital system, a game AI, but might also be some sort of robotic actuation in the real world. There's typically some sort of mechanism by which to select from the available actions in a particular situation, and there's typically some way to
augment this procedural information, which is to say, learn about new actions and possibly modify existing ones. There's typically some semblance of what's called declarative memory. Whereas procedural, at least in humans: if I asked you to describe how to ride a bike, you might be able to say get on the seat and pedal, but in terms of keeping your balance you'd have a pretty hard time describing it declaratively; so that's kind of the procedural side, the implicit representation of knowledge. Declarative would include facts, geography, math, but it also includes experiences that the agent has had, a more episodic representation of declarative memory. They'll typically have some way of learning this information and amending it over time, and then finally some way of taking actions in the world. And they'll all have some sort of cycle, which is: perception comes in; knowledge that the agent has is brought to bear on that; an action is selected; knowledge that knows to condition on that action acts accordingly, both with internal processes as well as eventually taking external action; and then rinse and repeat. So when we talk about an agent in this context, in an AI system, that would be the fixed representation, which is whatever architecture we're talking about, plus a set of knowledge that is typically specific to the task but might be more general. Oftentimes these systems can incorporate a more general knowledge base of facts, of linguistic facts, of geographic facts (let's take Wikipedia and just stick it in the brain of the system), which would be more task-general, but then also knowledge of whatever it is you're doing right now and how you should proceed in that. And then it's typical to see this processing cycle, and going back to the prior assumption, the idea is that these primitive cycles allow the agent to be reactive to its environment. So if new things come in, it has to react: if the lion is sitting over there, I'd better run, and maybe not do my calculus homework, right?
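That perceive-decide-act loop can be caricatured in a few lines. This is purely illustrative: the rules and the `decision_cycle` helper are invented here, and real architectures like Soar or ACT-R are far richer than this sketch.

```python
# A toy sketch of the prototypical perceive-decide-act cycle, with
# procedural knowledge as if-then (production) rules. Illustrative only.

# Each rule: (name, condition on short-term memory, action to propose).
RULES = [
    ("flee",  lambda stm: stm.get("lion-nearby"),  "run-away"),
    ("study", lambda stm: stm.get("homework-due"), "do-calculus"),
    ("idle",  lambda stm: True,                    "wait"),
]

def decision_cycle(stm, percepts):
    # 1. Perception: new input lands in short-term memory.
    stm.update(percepts)
    # 2. Match: bring procedural knowledge to bear on the current state.
    proposed = [action for _name, cond, action in RULES if cond(stm)]
    # 3. Select: some mechanism picks among the proposed actions
    #    (here, trivially, rule order stands in for a real selection scheme).
    selected = proposed[0]
    # 4. Act: the selected action would change the world and/or internal
    #    state; then the cycle repeats.
    return selected

stm = {"homework-due": True}
print(decision_cycle(stm, {}))                     # do-calculus
print(decision_cycle(stm, {"lion-nearby": True}))  # run-away
```

The reactivity argument shows up in the last two calls: the same agent abandons the calculus homework as soon as the lion percept arrives, because knowledge is re-matched against short-term memory on every cycle.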
As long as this cycle is going, I'm reactive, but at the same time, as multiple actions are taken over time, I'm able to get complex behavior over the long term. So this is the ACT-R cognitive architecture. It has many of the core pieces that I talked about before; let's see if the mouse is useful up there. Yes, we have the procedural module here; short-term memory is going to be these buffers that are on the outside; the procedural memory is encoded as what are called production rules, or if-then rules: if this is the state of my short-term memory, then this is what I think should happen as a result. You have a selection of the appropriate rule to fire and an execution, and you're seeing associated parts of the brain represented here. A cool thing that has been done over time in the ACT-R community is to make predictions about brain areas and then perform fMRIs, gather that data, and correlate it. So when you use the system, you will get predictions about things like timing of operations, errors that will occur, and probabilities that something is learned, but you'll also get predictions, to the degree that they can, about the brain areas that are going to light up, if you want that. It's actively being developed at Carnegie Mellon. To the left is John Anderson, who developed this cognitive architecture thirty-ish years ago, and until about the last five years he was the primary researcher and developer behind it, with Christian; recently he's decided to spend more time on cognitive tutoring systems, and so Christian has become the primary developer. There is an annual ACT-R workshop, and there's a summer school where, if you're thinking about modeling a particular task, you can bring your task to them, bring your data, and they teach you how to use the system and try to get that study going right there on the spot. To give you a sense of what kinds of tasks this could be applied to, this is representative of a certain class of tasks, certainly not the only one. Let's try
this again; I think PowerPoint is going to want a restart every time. Okay, so we're getting predictions about basically where the eye is going to move; what you're not seeing is that it's actually processing things like text and colors and making predictions about what to do, how to represent the information, and how to process the graph as a whole. I alluded to this earlier: there's work by Bonnie John that's very similar, making predictions about how humans would use computer interfaces. At the time she got hired away by IBM, and they wanted the ability to have software that you can put in front of software designers, and when they think they have a good interface, they press a button, this model of human cognition tries to perform the tasks it has been told to do, and it makes predictions about how long they would take, so you can have this tight feedback loop with designers saying, here's how good your particular interface is. So ACT-R as a whole is very prevalent in this community; I went to their web page and counted up just the papers that they knew about, and it was over 1,100 papers over time. If you're interested in it, the main distribution is in Lisp, but many people have used this and wanted to apply it to systems that need a little more processing power, so the NRL has a Java port of it that they use in robotics; the Air Force Research Lab in Dayton has implemented it in Erlang for parallel processing of large declarative knowledge bases, and they're trying to do service-oriented architectures with it, because they want what it has to say but they don't want to wait around for it to have to figure that stuff out. So that's the two minutes about ACT-R. Sigma is a relative newcomer, and it's developed out at the University of Southern California by a man named Paul Rosenbloom, whom I'll mention a couple more times because he was one of the prime developers of Soar at Carnegie Mellon, so he knows a lot about how Soar works, and he's worked on it over the years. I think originally, and I'm
gonna speak for him here and he'll probably say I was wrong, I think originally it was kind of a mental exercise: can I reproduce Soar using a uniform substrate? I'll talk about Soar in a little bit; it's thirty years of research code, and if anybody has dealt with research code, it's thirty years of C and C++ with dozens of graduate students over time. It's not pretty at all, and theoretically it's got these boxes sitting out here. So he reimplemented the core functionality of Soar all using factor graphs and message-passing algorithms under the hood. He got to that point and then said, there's nothing stopping me from going further, and so now it can do all sorts of modern machine learning, vision, and optimization sorts of things that would take some time to integrate well in any other architecture. So it's been an interesting experience; it's now going to be the basis for the virtual human project out at the Institute for Creative Technologies, the institute associated with the University of Southern California. Until recently you couldn't really get your hands on it, but in the last couple of years he's done some tutorials on it, and he's got a public release with documentation, so that's something interesting to keep an eye on. But I'm going to spend all the remaining time on the Soar cognitive architecture, and you see it looks quite a bit like the prototypical architecture; I'll give a sense again about how this all operates, and a sense of the people involved. We already talked about Allen Newell; both John Laird, who was my advisor, and Paul Rosenbloom were students of Allen Newell. John's thesis project was related to the chunking mechanism in Soar, which learns new rules based upon sub-goal reasoning. He finished that, I believe, the year I was born, and so he's one of the few researchers you'll find who's still actively working on their thesis project. Beyond that, I think about ten years ago he founded Soar Technology, which is a company up in Ann Arbor, Michigan; while it's called
Soar Technology, it doesn't do exclusively Soar, but that's a part of the portfolio: general intelligent-systems stuff, a lot of defense work. Some notes on what's going to make Soar different from the other architectures that fall into this functional-architecture category. A big thing is a focus on efficiency: John wants to be able to run Soar on just about anything. We just got, on the Soar mailing list, a desire to run it on a real-time processor, and our answer, while we had never done it before, was: probably, it'll work. Every release there are timing tests, and what we look at is this: in a bunch of different domains, for a bunch of different reasons that relate to human processing, there's this magic number that comes out, which is 50 milliseconds, which is to say, in terms of responding to tasks, if you're above that time, humans will sense a delay, and you don't want that to happen. Now, if we're working in a robotics task and you're dramatically above that 50 milliseconds, you just fell off the curb, or worse, you just hit somebody with a car, right? So we're trying to keep that as low as possible, and for most agents it doesn't even register; it's below 1 millisecond, fractions of a millisecond. But I'll come back to this, because a lot of the work that I was doing was computer science, AI, and a lot of efficient algorithms and data structures, and 50 milliseconds was that very high upper bound. It's also one of the projects that has a public distribution; you can get it on all sorts of operating systems. We use something called SWIG that allows you to interface with it in a bunch of different languages: we write kind of a meta description and you are able to basically generate bindings on different platforms. The core is C++; there was a team at SoarTech that said, we don't like C++, it gets messy, so they actually did a port to pure Java, in case that appeals to you. There's an annual Soar workshop that takes place in Ann Arbor; typically it's free, you can go
there, get a Soar tutorial, and talk to folks who are working on Soar, and it's fun. I've been there every year but one in the last decade; it's just fun to see the people around the world that are using the system in all sorts of interesting ways. To give you a sense of the diversity of the applications: one of the first was R1-Soar, which was back in the days when it was an actual challenge to configure a computer, which is to say that your choice of certain components would have radical implications for other parts of the computer. It wasn't just the Dell website where you say I want this much RAM, I want this much CPU; there was a lot of thinking that went behind it, and then physical labor that went into constructing your computer, and so it was making that process a lot better. There are folks that applied it to natural language processing. Soar 7 was the core of the Virtual Humans project for a long time. HCI tasks. TacAir-Soar was one of the largest rule-based systems, tens of thousands of rules; over 48 hours it was a very large-scale defense simulation. Lots of games it's been applied to, for various reasons, and then in the last few years, porting it onto mobile robotics platforms. This is Edwin Olson's SplinterBot, an early version of the robot that went on to win the MAGIC competition. Then I went on to put Soar on the web, and if after this talk you're really interested in a dice game that I'm going to talk about, you can actually go to the iOS App Store and download it. It's called Michigan Liars Dice; it's free, you don't have to pay for it, but you can actually play liar's dice with Soar and even set the difficulty level. It's pretty good; it beats me on a regular basis. I wanted to give you a couple other just kind of really weird-feeling and really cool applications. The first one is out of Georgia Tech. LuminAI is a dome-based interactive art installation in which participants can engage in collaborative movement improvisation with each other
and virtual dance partners. The installation creates a hybrid space in which virtual and real bodies meet; the line between human and non-human is blurred, inviting participants to examine their relationship with technology. The installation ultimately examines how humans and machines can co-create experiences, and it does so in a playful environment. The dome creates a social space that encourages human-human interaction and collective dance experiences, allowing participants to create and explore movement while having fun. The development of LuminAI has been a combined exploration in art forms of theater and dance as well as research in artificial intelligence and cognitive science. LuminAI draws inspiration from the ancient art form of shadow theater; the original two-dimensional version of the installation led to the conceptualization of the dome as a liminal space in which human silhouettes and a virtual character dance together on the projection surface. Rather than relying on a predefined library of movement responses, the virtual dancer learns movements from its partners and utilizes Viewpoints movement theory to systematically reason about them and craft improvisational responses in the moment. Viewpoints theory is based in dance and theater and analyzes performance along the dimensions of tempo, duration, repetition, kinesthetic response, shape, spatial relationship, gesture, architecture, and topography. The virtual dancer is able to use several different strategies to respond to human movements; these include mimicry of a movement, transformation of the movement along Viewpoints dimensions, and recalling a similar or complementary movement from memory, drawing on movement patterns the agent has learned while dancing with its human partner. The reason we did this: this is part of a larger effort in our lab toward understanding the relationship between computation, cognition, and creativity, where a large amount of our
efforts go into understanding human creativity and how we make things together and are creative together, as a way to understand how we can build co-creative AI that serves the same purpose, that can be a colleague and collaborate with us and create things with us. So Brian was a graduate student in John Laird's lab as well. Before I start this, I alluded to this earlier: we're getting closer to Rosie saying "can you teach me?" So let me give you some introduction to this. In the lower left you're seeing the view of a Kinect camera onto a flat surface. There's a robotic arm, mainly 3D-printed parts, a few servos. Above that you're seeing an interpretation of the scene. We're giving it associations of the four areas with semantic titles, like one is the table, one is the garbage, just semantic terms for areas, but other than that the agent doesn't actually know all that much. And it's going to operate in two modalities: one is, we'll call it natural-ish language, a restricted subset of English, as well as some quote-unquote pointing, so you're going to see some mouse pointers in the upper left saying "I'll talk about this," and this is just a way to indicate location. So starting off, we're going to say things like "pick up the blue block," and it's going to be like, "I don't know, what is blue?" We say, "oh, well, that's a color." Okay. "Go get the green thing." "What's green?" "Oh, it's a color." Okay. "Move the blue thing to a particular location." "Where's that?" Point to it. Okay. "What is moving?" It really has to start from the beginning, and it's described, and then you say okay, now you've finished. And once we got to that point, now I can say "move the green thing over here," and it's got everything that it needs to be able to then reproduce the task given new parameters, and it's learned that ability. So let me give it a little bit of time so you can look a little bit at the top left. In terms of the pointers, you're going to see some text commands being entered. So: "what kind of
attribute is blue?" We're going to say it's a color, and so it can map it then to a particular sensory modality. "This is green," with the pointing: "what kind of thing is green?" Okay, color, so now it knows how to understand blue and green as colors with respect to the visual scene. "Move rectangle to the table." "What is rectangle?" Okay, now I can map that onto our understanding of parts of the world. "Is this the blue rectangle?" So the arm is actually pointing itself, to get confirmation from the instructor. And then we're trying to understand, in general, when you say move something, what is the goal of this operation? And so it also has a declarative representation of the idea of this task; not only has it completed it, it can look back on having completed the task and understand what were the steps that led to achieving a particular goal. So in order to move it, you're going to have to pick it up; it knows which one the blue thing is, great; now onto the table, so that's a particular location; and at this point we can say "you're done, you have accomplished moving the blue rectangle to the table." And so it can understand what that very simple kind of process is like and associate that with the verb to move. And now we can say "move the green object over to the garbage," and without any further interaction, based upon everything that it learned up till that point, it can successfully complete that task. So this is work of Shiwali Mohan and others at the Soar group at the University of Michigan on the Rosie project, and they're extending this to playing games and learning the rules of games through text-based descriptions and multimodal experience. So, in order to build up to that, here's a story. I wanted to give you a sense of how research occurs in the group. There's this back and forth that occurs over time: there's this piece of software called Soar, we want to make this thing better and give it new capabilities, so all our agents are going to become better, and we always have to keep in mind, and
you'll see this as I go further, that it has to be useful to a wide variety of agents, it has to be task-independent, and it has to be efficient for us to do anything in the architecture; all of those have to hold true. So we do something cool in the architecture, and then we say okay, let's solve a cool problem, so let's build some agents to do this, and this ends up testing what are the limitations, what are the issues that arise in a particular mechanism, as well as integration with others, and we get to solve interesting problems. We usually find there was something missing, and then we can go back to the architecture, and rinse and repeat. Just to give you an idea again of how Soar works: the working memory is actually a directed connected graph. Perception is just a subset of that graph, and so there's going to be a symbolic representation of most of the world; there is a visual subsystem in which you can provide a scene graph, I'm just not showing it here. Actions are also a subset of that graph, and the procedural knowledge, which is production rules, can modify sections of the input, modify sections of the output, as well as arbitrary parts of the graph, to take actions. The decision procedure says: of all the things that I know how to do, ranked according to various preferences, what single thing should I do? There's semantic memory for facts, and there's episodic memory: the agent is always storing every experience it's ever had over time in episodic memory, and it has the ability to get back to that. And so, in the cycle we saw before, we get input in this perception called the input link; rules fire all in parallel and say here's everything I know about the situation, here's all the things I could do; the decision procedure says here's what we're going to do; and based upon the selected operator, all sorts of things could happen with respect to memories providing input, rules firing to perform computations, as well as potentially output in the world.
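The perceive-decide-act cycle just described can be sketched in a few lines. This is an illustrative toy, not Soar's actual API; the rule bodies, operator names, and preference values are all made up for the example:

```python
# Toy sketch of a perceive-decide-act cycle: working memory is one
# structure, perception and action are just subsets of it, rules fire
# in parallel to propose operators, and a decision procedure picks one.

working_memory = {"input": {}, "output": {}}

def propose(wm):
    """All rules 'fire in parallel': each match contributes a candidate
    operator with a preference value (made up for this example)."""
    candidates = []
    if "obstacle" in wm["input"]:
        candidates.append(("turn", 0.9))
    if wm["input"].get("battery", 1.0) < 0.2:
        candidates.append(("recharge", 1.0))
    candidates.append(("move-forward", 0.5))   # default behavior
    return candidates

def decide(candidates):
    """Decision procedure: of everything proposed, commit to a single operator."""
    return max(candidates, key=lambda c: c[1])[0]

def step(wm, percept):
    wm["input"] = percept                      # perception updates the graph
    op = decide(propose(wm))
    wm["output"] = {"action": op}              # action is also just part of the graph
    return op

print(step(working_memory, {"obstacle": True}))   # -> turn
print(step(working_memory, {"battery": 0.1}))     # -> recharge
```

The structural points this is meant to show: proposals happen in parallel and are cheap, while exactly one operator is selected per cycle, which is why the whole cycle can be held to a fixed time budget.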
And remember, agent reactivity is required: we want the system to be able to react to things in the world at a very quick pace, so the overall cycle, with anything that happens in it, has to be under 50 milliseconds at max, and that's going to be a constraint we hold ourselves to. And so the story I'll be telling is how we got to a point where we started actually forgetting things. We're an architecture that doesn't have to be like humans, we want to create cool systems, but what we realized was that something we humans do probably has some benefit to it, and we actually put it into our system, and it led to good outputs. So here's the research path I'm going to walk down. We had a simple problem, which was: we have these memory systems, and sometimes they're going to get a cue that could relate to multiple memories, and the question is, if you have a fixed mechanism, what should you return, in a task-independent way? Which one of these many memories should you return? That was our question. And we looked to some human data on this, something called the rational analysis of memory, done by John Anderson, and realized that in human language there are recency and frequency effects, and that maybe those would be useful. And so we actually did an analysis and found that not only does this occur, but it's useful in what are called word sense disambiguation tasks; I'll get to what that means in a second. We developed some algorithms to scale this really well, and it turned out to work out well: not only in the original task, but when we looked at two other completely different ones, the same underlying mechanism ended up producing some really interesting outputs. So let me talk about word sense disambiguation real quick; this is a core problem in natural language processing, if you haven't heard of it before. Let's say we have an agent, and for some reason it needs to understand the verb to run. It looks to its memory and finds that it could, you know, run in the park, it could be running a fever, it could run an
election, it could run a program, and the question is: what should a task-independent memory mechanism return, if all you've been given is the verb to run? The rational analysis of memory looked through multiple text corpora, and what they found was that if a particular word had been used recently, it's very likely to be reused again, along with a frequency effect. In the expression here, each t is the time since a past use, and you sum over those uses with an exponential decay. And so what it looks like, if time is going to the right and higher activation is better: as you get these individual usages you get these little bumps, and then eventually it drops down. If we had just one usage of a word, the red curve would be what the decay looks like. And so the core problem here is, if we're at a particular point and we want to select between the blue thing or the red thing, blue would have a higher activation, and so maybe that's useful. This is how things are modeled with human memory, but is it useful in general for tasks? So we looked at common corpora used in word sense disambiguation, and said: if we just go through a corpus twice, and we use prior answers (I ask the question "what is the sense of this word," I take a guess, I get told the right answer), and I use that recency and frequency information in my task-independent memory, would that be useful? And somewhat of a surprise, but maybe not: it actually performed really well across multiple corpora. So we said okay, this seems like a reasonable mechanism, let's look at implementing this efficiently in the architecture. And the problem was this term right here, which said that for every memory, for every time step, you're having to pay; that doesn't sound like a recipe for efficiency if you're talking about lots and lots of knowledge over long periods of time. So we made use of a nice approximation that Petrov had come up with to approximate the tail of that sum.
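The recency-and-frequency score being described is the base-level activation equation from the rational analysis of memory: each past use of an item contributes (time since that use) raised to the power -d, the contributions are summed, and the log is taken. A minimal sketch, using the conventional decay rate d = 0.5; the access times are made up for illustration:

```python
import math

def base_level_activation(access_times, now, d=0.5):
    """Each past use contributes (now - t)^(-d): recent uses contribute
    a lot, old uses decay away, and many uses add up (frequency)."""
    return math.log(sum((now - t) ** (-d) for t in access_times))

# a word sense used a few times, a while ago...
old = base_level_activation([1, 5, 9], now=20)
# ...versus the same history plus one very recent use
recent = base_level_activation([1, 5, 9, 19], now=20)
print(old, recent)  # the extra recent access strictly raises activation
```

Given a cue matching several memories, returning the one with the highest activation is exactly the task-independent tie-breaking policy described above.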
Accesses that happened long, long ago, we could basically approximate their effect on the overall sum, so now we had a fixed set of values. And what we basically said is: since these activations are always decreasing, and all we care about is relative order, let's only recompute when something gets a new access. It's a guess, it's a heuristic, an approximation, but we looked at how this worked on the same set of corpora, and in terms of query time, if we made these approximations, we were well under our 50 milliseconds. The effect on task performance was negligible; in fact on a couple of these it got ever so slightly better in terms of accuracy. And actually, if we looked at the individual decisions being made, making these sorts of approximations led to at least 90 percent of the decisions being identical to having done the true, full calculation. So I said this is great, and we implemented this, and it worked really well. And then we started working on what seemed like completely unrelated problems. One was in mobile robotics: we had a mobile robot, I'll show a picture of it in a little while, roaming around the halls performing all sorts of tasks, and what we were finding was, if you have a system that's remembering everything, your short-term memory gets really, really big. I don't know about you, my short-term memory feels really, really small; I would love it to be big. But if you make your memory really big and you try to remember something, you're now having to pull lots and lots of information into your short-term memory. So the system was actually getting slower, simply because it had a large short-term memory representation of the overall map it was looking up. So: large working memory, a problem. Liar's dice, a game you play with dice: we were doing reinforcement learning in our RL-based system on this, and it turned out there's a really, really big value function; we were having to store lots of data, and we didn't know which stuff we had to keep around to keep performance up.
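The tail approximation mentioned above can be sketched as follows. A common scheme due to Petrov, and the flavor of what is being described, is to keep the k most recent access times exactly and approximate the remaining older accesses as if they were spread evenly over their interval, so the per-memory cost is O(k) instead of growing with the full access history. The ages and k here are made up, and the constants in the actual implementation may differ:

```python
import math

def bla_exact(ages, d=0.5):
    # ages[i] = time since the i-th past access; cost grows with history
    return math.log(sum(t ** (-d) for t in ages))

def bla_hybrid(ages, k=3, d=0.5):
    """Keep the k most recent accesses exactly; replace the older ones
    with an integral-style estimate (Petrov-style hybrid)."""
    ages = sorted(ages)                       # smallest age = most recent
    total = sum(t ** (-d) for t in ages[:k])
    older = ages[k:]
    if older:
        n, t_k, t_n = len(older), ages[k - 1], ages[-1]
        total += n * (t_n ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (t_n - t_k))
    return math.log(total)

ages = [1, 4, 9, 30, 80, 200, 500]
print(bla_exact(ages), bla_hybrid(ages))   # close, but hybrid is O(k) per query
```

Combined with the recompute-only-on-access heuristic (between accesses activations only decay, so relative order is stable), this is the kind of trick that kept queries well under the 50 millisecond budget.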
So we had a hypothesis that forgetting was actually going to be a beneficial thing: maybe the problem we have with our own memories, this forgetting thing that we really, really dislike, maybe it's actually useful. And so we experimented with the following policy. We said let's forget a memory if, one, it's not predicted to be useful by this base-level activation: we haven't used it recently, we haven't used it frequently, maybe it's not worth it; and two, we feel confident that we could approximately reconstruct it if we absolutely had to. If those two things held, we could forget something. So it's the same basic algorithm, but instead of ranking memories, we set a threshold for base-level activation, find when it is that a memory is going to pass that threshold, and try to forget based upon that, in a way that's efficient and isn't going to scale really, really poorly. We were able to come up with an efficient way to implement this using an approximation that ended up, for most memories, being exactly correct, and otherwise a fairly close approximation; I'm happy to go over the details if anybody's interested later. Compared to a completely accurate search for the value, it ended up being somewhere between 15 to 20 times faster. And so when we looked at our mobile robot here, sorry, let me get this back: our little robot is actually going around the third floor of the computer science building at the University of Michigan; it's going around, it's building a map, and again, the idea was this map is getting too big. So here was the basic idea: as the robot's going around, it's going to need this map information about rooms; the color there is describing the strength of the memory, and as the robot gets farther and farther away, and it hasn't used part of the map for planning or other purposes, basically let it decay away, so that by the time it gets to the bottom it's
forgotten about the top, but we had the belief that we could reconstruct it if we needed it.
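The threshold-based forgetting just described can be sketched as: given an activation threshold theta, predict the future cycle at which a memory's base-level activation will cross it, and schedule the removal for then instead of rechecking every memory on every cycle. This toy version finds the crossing by doubling plus bisection; the actual work used a closed-form approximation, and theta, d, and the access times here are made up:

```python
import math

def activation(access_times, now, d=0.5):
    return math.log(sum((now - t) ** (-d) for t in access_times))

def forget_time(access_times, now, theta, d=0.5, horizon=1_000_000):
    """Earliest future time at which activation drops below theta.
    Activation only decays between accesses, so the crossing can be
    computed once and cached until the memory is touched again."""
    hi = now + 1
    while activation(access_times, hi, d) > theta:
        hi *= 2
        if hi > horizon:
            return None               # effectively never worth forgetting
    lo = now + 1
    while hi - lo > 1:                # bisect to the exact integer cycle
        mid = (lo + hi) // 2
        if activation(access_times, mid, d) > theta:
            lo = mid
        else:
            hi = mid
    return hi

print(forget_time([0, 3, 7], now=10, theta=-1.5))
```

A memory is then dropped at its scheduled time only if the second condition also holds, that is, the agent believes it could approximately reconstruct the memory from other knowledge if it turned out to be needed.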