[Reposted] BM Series (2) - Boltzmann Machines for the Grumpy Universe

2014-05-26 14:53:18

In the last post, we thought a bit about machine creatures, and in particular Cid, an unfortunate whom we are going to torment quite a bit over the next few posts.

Today we’ll do a little bit of construction and deconstruction of Cid. We’re going to build him a brain, and try to see how it works, and whether we can get his brain to do what we want.

Thinking about our system architecture

To proceed, we’re going to separate out the entirety of Cid’s Universe into three distinct parts. The first is the External Universe. This will consist of everything outside of Cid. The third is Cid’s brain, which will attempt to build a model of the External Universe, and will reside entirely within Cid. The second is the interface layer between the two, which in this context you can think of as an eye. This interface layer can accept information from the External Universe, whereas his brain cannot. The brain accepts information from the interface layer, and can also send information to the interface layer. Take a look at this picture. Hopefully the idea is clear!

Cid’s high level architecture.

Here’s another way of looking at the same thing that highlights the separation between Cid and the External Universe.

I think I like this one better as it emphasizes that Cid is contained and separate from the External Universe.

This segregation is really important and is tied to some real meaty issues. If you think of your own body and how it lives in your Universe, we have the same type of architecture. Your External Universe is roughly everything outside your skin; your interface layer is roughly everything on the outside of your body; and your internal model of the world is roughly everything inside your skin (probably mostly what’s inside your skull).

In the picture there is a small orange circle. We’ll call this a visible unit. (Now we’re starting to connect to a real Boltzmann Machine. Exciting!). You can think of the visible units as vertices in a graph. They are special in our architecture, in that they are able to ‘see’ into the External Universe, and are connected into Cid’s brain. Whenever you read ‘visible units’ in the context of Boltzmann Machines, think interface layer between the External Universe and the creature’s internal representation of it. It’s the layer that separates ‘outside the creature’ from ‘inside the creature’.

Inside Cid’s brain

So far we haven’t talked at all about what might be going on inside Cid’s brain. Let’s fix that, and build an actual brain that allows Cid to understand the Grumpy Universe.

Recall that the Grumpy Universe is a very silly place, where the External Universe consists of only two possible inputs (those being Grumpy Cat and Creepy Manbaby). Now instead of actually using the images themselves, let’s simplify things a bit and represent these by a zero (for Grumpy Cat) and a one (for Creepy Manbaby). So Cid’s interface layer will only ever see a zero (our stand-in for Grumpy Cat) or a one (for Creepy Manbaby).
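
As a minimal sketch of this encoding (the dictionary name below is just an illustrative choice, not anything from the original post), the whole of the Grumpy Universe's possible inputs fits in a single bit:

```python
# Hypothetical encoding of the Grumpy Universe: its only two possible
# inputs, each represented by the single bit the visible unit will see.
GRUMPY_UNIVERSE = {
    "Grumpy Cat": 0,
    "Creepy Manbaby": 1,
}
```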

To build Cid a brain, let’s do the following. Let’s set up a number of nodes, like the visible unit, but hidden. We’ll call these ones Hidden Units. Here’s a picture of what a possible Cid brain could look like.

Here we have one visible unit (the orange circle) and eight hidden units (the yellow circles).

From now on, we’ll just focus on the visible and hidden units to simplify things. Here they are.

A proto-brain for Cid.

Here we’ve added a couple of things. Each of the nodes now has a label. The visible units (of which there is now only one) we’ll label v_k, where k is an integer denoting which visible unit we’re referring to. The hidden nodes are labeled h_k, where k is again an integer referring to a specific node. We’ve (arbitrarily) chosen eight hidden nodes.

We’ve also added some black lines that connect some, but not all, of the nodes together. The connectivity pattern shown above is just one of many different ones we could pick. This particular one will turn out to be quite useful for some things I want to show you, but we could just as well have allowed all-to-all connectivity.

Wherever there is a black line, we introduce a real number which we call a weight. In the proto-brain above, there are four of these between the visible unit and the hidden units, and 16 of them between the different hidden units. We’ll write the weights between the visible and hidden units as U_k, where k = 0, 1, 2, 3 indexes the hidden unit being connected to. We’ll write the weights between hidden units as W_{k, p}, where k and p are the indices of the hidden units the weight connects. Here’s a picture to help make this clearer.

Here some of the weights are explicitly shown — all four U weights (connecting the visible unit to the hidden units) and three of the W weights are explicitly shown (the bold lines with the W next to them).
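
To make the bookkeeping concrete, here's a minimal Python sketch of this parameterization. The post doesn't spell out the exact hidden-to-hidden wiring in the figure, so the edge set below is an assumption (hidden units h_0..h_3 each connected to h_4..h_7), chosen only because it reproduces the four U weights and 16 W weights described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four U weights coupling the visible unit to hidden units h_0..h_3.
U = rng.normal(size=4)                              # U[k], k = 0, 1, 2, 3

# Assumed hidden-to-hidden edge set: h_0..h_3 all-to-all with h_4..h_7,
# which gives the 16 W weights mentioned in the text.
edges = [(k, p) for k in range(4) for p in range(4, 8)]
W = {(k, p): rng.normal() for (k, p) in edges}      # W[(k, p)]

print(len(U), len(W))                               # -> 4 16
```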

Now let’s assume that each of the nodes can take on one of two values — say either zero or one (it could be -1 and +1 also — any two values will do). The total number of nodes in the current architecture is 1 (visible) + 8 (hidden) = 9. Since each of these nodes can have value 0 or 1, all nine of them together can be specified with nine bits. We’ll use the convention that the leftmost bit is the visible unit, and the rightmost eight bits are the hidden units. Let’s call the value of the visible unit y, and the values of the hidden units x_k, where k=0..7 refers to each of the eight hidden units.
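
Here's a small sketch of that bit convention (the helper name is mine, just for illustration): each joint state of the nine units is a nine-bit string whose leftmost bit is the visible unit y and whose remaining eight bits are the hidden units x_0 through x_7.

```python
def unpack_state(s):
    """Split a 9-bit integer into (y, [x_0, ..., x_7]).

    Convention from the text: the leftmost (most significant) bit is the
    visible unit y; the remaining eight bits are the hidden units.  Reading
    those eight bits left-to-right as x_0..x_7 is an assumption here.
    """
    bits = [(s >> (8 - i)) & 1 for i in range(9)]
    return bits[0], bits[1:]

# All 2**9 = 512 possible joint states of the network.
all_states = [unpack_state(s) for s in range(2 ** 9)]
print(len(all_states))               # -> 512
print(unpack_state(0b100000001))     # -> (1, [0, 0, 0, 0, 0, 0, 0, 1])
```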

We now define the probability of any particular state of our network to be

P(y, x_0, x_1, \ldots, x_7) = \frac{1}{\mathcal{Z}} \exp\bigl(-E(y, x_0, x_1, \ldots, x_7)/T\bigr)

where

E(y, x_0, x_1, \ldots, x_7) = y\, a_0 + \sum_{k=0}^{7} b_k x_k + \sum_{k=0}^{3} y\, U_k x_k + \sum_{(k,p) \in \mathcal{E}} x_k W_{k,p} x_p

The probability distribution P is called a Boltzmann distribution (ergo the term ‘Boltzmann Machine’). The variable T is the temperature of the distribution. The quantity

\mathcal{Z} = \sum_{\text{all possible states}} \exp\bigl(-E(y, x_0, x_1, \ldots, x_7)/T\bigr)

is called the partition function. For this tiny network it is just a sum over 2^9 = 512 states, but in general the number of terms grows exponentially with the number of units, which makes it pretty much impossible to calculate (it will turn out we don’t need to!).

I’ve introduced some parameters here: a_0 and b_k are local biases on each of the nodes. They are (as yet unknown) real numbers, just like the U_k and W_{k,p} weights. The notation \sum_{(k,p) \in \mathcal{E}} just means we sum only over the (k, p) pairs that have an edge between them.
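
Putting the pieces together, here's a self-contained sketch that evaluates E, the partition function, and P by brute force. The biases, weights, and edge set are random/assumed placeholders (their true values are exactly what is still unknown at this point), and the only reason the brute-force sum over all 512 states works is that this toy network is so small:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters -- the real values are still unknown.
a0 = rng.normal()                                        # bias on the visible unit
b = rng.normal(size=8)                                   # biases b_k on the hidden units
U = rng.normal(size=4)                                   # visible-hidden weights U_k
edges = [(k, p) for k in range(4) for p in range(4, 8)]  # assumed 16-edge connectivity
W = {e: rng.normal() for e in edges}                     # hidden-hidden weights W_{k,p}
T = 1.0                                                  # temperature

def energy(y, x):
    """E(y, x_0, ..., x_7) as defined above."""
    return (y * a0
            + sum(b[k] * x[k] for k in range(8))
            + sum(y * U[k] * x[k] for k in range(4))
            + sum(x[k] * W[(k, p)] * x[p] for (k, p) in edges))

# Brute-force partition function: a sum over all 2**9 = 512 joint states.
states = [(y, x) for y in (0, 1) for x in itertools.product((0, 1), repeat=8)]
Z = sum(np.exp(-energy(y, x) / T) for y, x in states)

def prob(y, x):
    """Boltzmann probability of a single joint state of the network."""
    return np.exp(-energy(y, x) / T) / Z

# Sanity check: the probabilities over all states sum to one.
print(sum(prob(y, x) for y, x in states))                # -> ~1.0
```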

OK that’s enough for today. Next post we’re going to start exercising that brain!
