There is a fundamental problem with how the transformer architecture works and its ability to do the kind of reasoning a human can. It breaks down in a very specific scenario. This scenario isn't present in every logic problem, but it is common enough that we can't hand the keys over to AI for unchecked problem solving.
The scenario that breaks it is only moderately convoluted and well within what a human can handle, yet AI fails at it entirely.
The scenario is when you have two systems that may or may not interact with each other, but that are similar in some ways and different in others. That's rather abstract, so I'll give an exact example, one I was just working with when I was reminded of this problem.
Let's say we want to figure out whether I3 Atlas is crossing in front of or behind the Sun on its orbit around the galaxy. We want to do this easily because we just want to settle an online debate: is I3 doing a slingshot maneuver or not? That's a pretty simple premise. In an ideal world you ask the question, the AI thinks about it, and you get an answer. But the most commonly published material an AI can find online about I3 Atlas talks about what it is doing in our solar system.
So this introduces the problem: talking about two similar yet distinct systems in the same context. We want to figure out what's happening at the galactic level, but all our material is about the solar system, so we'll need data about both. Galaxies have orbits in them. So do solar systems. But they are not the same thing. The problem is that an AI cannot reason about two distinct things with similar surrounding vocabularies without the tokens describing item A attending to the tokens describing item B. So its ability to keep track of what relates to what completely breaks down.
To understand this, we need to know how the attention mechanism in an AI works. To oversimplify: what an AI really does is blur words together. It just does this selectively. Semi-selectively. Why does it do this? To solve the problem of words having different meanings in different contexts.
If I say a woman is a fox or a man is a fox, these have different meanings, and both differ from an actual fox. If a woman is a fox, we mean she is attractive. If we say a man is a fox, it's possible we mean something similar, but it is just as likely to mean that he's cunning like a fox. Similar issues come up with a word like green, which could mean a color, someone being envious, someone new to a field, or wood that hasn't aged. How does the model tell the difference? Roughly, it averages words together. Each word, or token, is just a long sequence of numbers representing a position in a space. To modify the word fox we, sort of, take a tenth of the word woman and add it to fox. We may also do it in the other direction, depending on what the context mask looks like. So we now have two new words in our text: one is a woman-ish fox, and the other is a fox-ish woman. The model basically has a learned mapping of which words should have an elevated or diminished impact on each other.
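Here is a minimal sketch of that mixing step in plain numpy. The token embeddings and their values are invented for illustration, and a real attention layer would first apply learned query, key, and value projections, but the core operation, a softmax-weighted average of token vectors, is the same.

```python
# Minimal single-head dot-product attention over toy 4-dimensional embeddings.
# The vectors below are made up for illustration; real models learn them,
# but the blending step is the same weighted average.
import numpy as np

def attention(X):
    # For simplicity, the embeddings themselves act as queries, keys, and values
    # (a real layer would apply learned projection matrices first).
    scores = X @ X.T / np.sqrt(X.shape[1])         # similarity between every pair of tokens
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X                             # each output token is a weighted average

# Hypothetical embeddings for the sentence "the woman is a fox".
tokens = ["the", "woman", "is", "a", "fox"]
X = np.array([
    [0.1, 0.0, 0.0, 0.1],   # the
    [0.9, 0.2, 0.1, 0.0],   # woman
    [0.0, 0.1, 0.0, 0.2],   # is
    [0.1, 0.0, 0.1, 0.0],   # a
    [0.2, 0.8, 0.1, 0.1],   # fox
])

out = attention(X)
# The output row for "fox" is no longer the original fox vector: it has picked up
# a fraction of "woman" (and of every other token) -- the "woman-ish fox" blend.
print(dict(zip(tokens, np.round(out, 3).tolist())))
```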
The result is that if you ask it to keep two things straight that have similar vocabulary surrounding them, its idea of which words apply to which thing gets blurry. Especially when search is involved, because it's going to pull in a lot of articles that all use the same jargon.
So say you have a description of what is going on at the galactic level, and you write words like azimuth, perihelion, retrograde, and UVW vector, or import them implicitly from the articles grabbed under the hood; and you end up with a lot of the same words describing what's happening at the solar-system level. Then all the orbit-related words are going to get both the word galactic and the word solar embedded into them. In fact, all of these words are going to attend to each other a lot. As this gets passed through the layers of the model, everything attends to everything else more and more until you get complete cross-chatter between the reasoning about the two things.
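To make that cross-chatter concrete, here is a toy sketch that applies the same averaging step repeatedly, standing in for stacked attention layers (real models also have learned projections and residual connections, which this omits). The starting vectors are invented: two groups of tokens that share jargon, one tagged by the galactic discussion and one by the solar discussion, so they start out similar but distinct.

```python
# Toy illustration of cross-chatter: repeatedly applying the averaging step
# (a stand-in for stacked attention layers, with no learned projections or
# residual connections) drives the "galactic orbit" and "solar orbit"
# representations toward each other. All embeddings here are invented.
import numpy as np

def attention(X):
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Shared jargon: orbit, perihelion, retrograde, UVW vector.
shared = rng.normal(size=(4, 8))
galactic = shared + 0.7 * rng.normal(size=(4, 8))   # those words in the galactic context
solar    = shared + 0.7 * rng.normal(size=(4, 8))   # the same words in the solar context
X = np.vstack([galactic, solar])

# Row 0 is the galactic "orbit" token, row 4 is the solar "orbit" token.
print("layer 0 similarity:", round(cosine(X[0], X[4]), 3))
for layer in range(1, 5):
    X = attention(X)
    print(f"layer {layer} similarity:", round(cosine(X[0], X[4]), 3))
# In this toy setup the similarity rises toward 1.0 as the rows blend together:
# after a few rounds of mixing, it is no longer clear which "orbit" belongs
# to which system.
```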
This is a real problem for science. Basically, we can't use it for science. It really is just a better word completer. We want it to do reasoning, but you had better pray none of your problems describe two separate things with a lot of similar words. This also becomes a problem in moral reasoning.