Technology invades the modern world
Chapter 148 You all know more about machine translation than I do
“Professor, you’re not a language translator. Language is a game of rules; probability is too unreliable.” Paul Calvin wanted to try again.
Of course, he genuinely didn't believe that translation and statistics could be related.
There is a one-to-one correspondence between the words.
The English and Russian words are matched one-to-one for direct translation, thus expanding the corpus.
At the time, this was considered the right way.
This is also known as the exhaustive search method.
Once all the words are mapped one-to-one, automatic translation is achieved.
Statistics is a game of probability. Leaving aside the fact that their incompetence would be exposed if Lin Ran were right, the improvement principle that Lin Ran mentioned is intuitively wrong.
Simply put, it's counterintuitive.
It was just like the days before GPT-style large models appeared, when everyone thought the algorithm mattered most.
After GPT was released, everyone frantically piled on compute and data instead.
Then DeepSeek arrived, and it seemed the algorithm still mattered after all.
Even top researchers can have problems with blindly following others, and may feel lost, unable to find their direction, or unable to break free from their predicament.
In this chaotic age of computers, this is perfectly normal.
"Precision? Precision means making mistakes, and modern computers are far from being precise."
Don't you understand that the good results you demonstrated in 1954 came from Russian sentences you had carefully selected yourselves?
The complexity of natural language is far greater than you expect.
You only expanded the corpus, but didn't do rule coverage or context dependency handling.
"Can you possibly understand machine translation better than I do?"
Lin Ran roared, "You've been working on this for nine years without any progress. Now, do exactly as I say!"
Given Lin Ran's status, strength, and power, they had no room to refuse.
It goes without saying that Watson would believe Lin Ran, given that Project Deep Blue had just ended, and McNamara of the Department of Defense was completely subservient to Lin Ran.
Do you computer scientists really know more about computers than math masters?
McNamara hasn't forgotten Lin Ran's brilliance in game theory and statistics.
IBM's CEO supports Lin Ran, the Secretary of Defense supports Lin Ran, and the research team at Georgetown University can only be utterly humiliated.
"We need to do five things: optimize algorithms and rule design, expand the corpus and vocabulary, improve the efficiency of data processing, introduce statistical methods, and maximize the use of hardware."
IBM was responsible for improving data processing efficiency and maximizing hardware utilization.
The other three points were handled by members of Georgetown University.
Let's first talk about optimization algorithms and rule design.
Your persistent problem is that you haven't introduced more granular syntactic rules into your expansion of the rule set.
Because of limited storage, you think that expanding the comparative vocabulary database is enough.
In fact, syntax rules are more important.
What you need to do is introduce common, high-frequency sentence patterns.
Next comes context-dependent processing: lexical translation takes into account the preceding and following words, reducing ambiguity through a limited context window.
For example, свет can mean both light and world.
The preceding words make it easy to determine whether it means light or world.
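As an aside, the limited context window Lin Ran describes can be sketched in modern Python. This is only an illustration: the cue words, the default sense, and the function name `disambiguate` are assumptions made for the example, not anything specified in the story.

```python
# Hypothetical cue words for each sense of an ambiguous Russian word.
# A real system would derive these from a corpus rather than hard-code them.
SENSE_CUES = {
    "свет": {
        "light": {"яркий", "солнечный", "включить"},
        "world": {"весь", "белый", "этот"},
    }
}
# Assumed fallback sense when no cue word appears in the window.
DEFAULT_SENSE = {"свет": "light"}


def disambiguate(words, index, window=2):
    """Pick an English sense for words[index] from its neighbours.

    Returns None when the word is not listed as ambiguous.
    """
    word = words[index]
    senses = SENSE_CUES.get(word)
    if senses is None:
        return None
    start = max(0, index - window)
    # Neighbours inside the window, excluding the word itself.
    context = set(words[start:index] + words[index + 1:index + 1 + window])
    # Choose the sense whose cue words overlap the context the most.
    best = max(senses, key=lambda sense: len(senses[sense] & context))
    if not senses[best] & context:
        return DEFAULT_SENSE[word]
    return best
```

Here `disambiguate(["весь", "свет"], 1)` picks "world" because "весь" is listed as a cue for that sense, while `disambiguate(["яркий", "свет"], 1)` picks "light".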
Watson tentatively reminded him, "Professor, you also speak Russian?"
Lin Ran replied matter-of-factly, "Of course, I've met Korolev twice. How could I communicate with him if I don't speak Russian?"
I speak Russian, German, English, and Chinese.
Lin Ran's status as a multilingual expert adds credibility to his theories.
In this day and age, it is not unusual for scientists to be fluent in several languages.
Of course, some sensitive departments will increase their suspicion of you.
Take John McCarthy, mentioned earlier, for example. He was fluent in Russian and received Russian education from a young age, even though he was born in America.
"Furthermore, the translation process should be modular, rather than a simple mapping relationship."
It should be divided into three parts: preprocessing, translation, and postprocessing.
Preprocessing includes word segmentation and word form restoration, translation is the mapping of the dictionary, and postprocessing is the adjustment of word order.
This reduces the complexity of a single calculation and increases the reusability of rules!
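The three-stage split can be pictured with a small Python sketch. Everything in it is a toy assumption: the lemma table, the dictionary entries, the swap-pair rule format, and the function names are invented for illustration and stand in for the much larger rule sets the team would need.

```python
# Toy lemma table and dictionary; all entries are illustrative assumptions.
LEMMAS = {"говорим": "говорить", "мире": "мир"}
DICTIONARY = {"мы": "we", "говорить": "speak", "о": "about", "мир": "peace"}


def preprocess(sentence):
    """Word segmentation plus word-form restoration (lemmatization)."""
    tokens = sentence.lower().split()
    return [LEMMAS.get(token, token) for token in tokens]


def translate(lemmas):
    """Dictionary mapping; unknown words pass through unchanged."""
    return [DICTIONARY.get(lemma, lemma) for lemma in lemmas]


def postprocess(words, swap_pairs=frozenset()):
    """Word-order adjustment, reduced here to swapping listed adjacent pairs."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if (out[i], out[i + 1]) in swap_pairs:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    sentence = " ".join(out)
    return sentence[:1].upper() + sentence[1:]


def run_pipeline(sentence, swap_pairs=frozenset()):
    """Chain the three stages; each can be changed without touching the others."""
    return postprocess(translate(preprocess(sentence)), swap_pairs)
```

With these toy tables, `run_pipeline("Мы говорим о мире")` yields "We speak about peace". Because each stage only sees the previous stage's output, a new lemma or reordering rule can be added in one place and reused by every sentence.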
Lin Ran's words gave the members of the research team present a lot of inspiration.
It was as if they had been trapped in the Baiyue jungle and couldn't find a way out, but now a light appeared in the sky to guide them on how to get out of the jungle maze.
Everyone is eager to try it out.
All the researchers frantically took notes of what Lin Ran said in their notebooks.
Although it's uncertain whether the professor's method will work, having a way is better than having no way at all.
Furthermore, if you don't take good notes, being expelled is just a matter of a single word from the professor.
"Okay, we've covered some simple stuff so far. Now comes the hardest part."
Because IBM's machines are not very powerful, we can only introduce some relatively simple statistical methods to improve the accuracy of our translations.
I call it frequency-based word alignment.
This is also the core of our introduction of statistical models.
We first need to manually analyze the parallel sentences and mark the correspondence between Russian words or phrases and their English translations.
Russian sentence: "Мы говорим о мире"
English translation: "We speak about peace"
Alignment result: "мы" corresponds to "we"
“говорим” corresponds to “speak”.
“о” corresponds to “about”.
“мире” corresponds to “peace”.
Then we need to count how often each alignment occurs: for every Russian word or phrase, we tally the frequency of each corresponding English translation.
For example, in the corpus, “говорим” is translated as “speak” in 80% of the sentences and as “talk” in 20%.
This allows us to construct a probability table.
These probabilities are compiled into a table for machine querying. Due to limited memory space, we will only store high-frequency word pairs for now, such as the top 1000 word pairs, ignoring low-frequency pairs.
When multiple options are available when translating a word, refer to the probability table to select the most likely translation.
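The counting procedure described above can be sketched in Python as follows. The function names, the `top_n` parameter, and the choice to normalize probabilities over the retained pairs only are assumptions made for this illustration; the story does not say how the cut-off interacts with the probabilities.

```python
from collections import Counter, defaultdict


def build_probability_table(aligned_pairs, top_n=1000):
    """Turn hand-aligned (russian, english) pairs into a probability table.

    Only the top_n most frequent pairs are kept, mimicking the limited
    memory in the story. Probabilities are normalized over the kept pairs
    (an assumption of this sketch).
    """
    pair_counts = Counter(aligned_pairs)
    kept = pair_counts.most_common(top_n)

    # Total count per Russian word, over the retained pairs only.
    totals = defaultdict(int)
    for (ru, _en), count in kept:
        totals[ru] += count

    table = defaultdict(dict)
    for (ru, en), count in kept:
        table[ru][en] = count / totals[ru]
    return dict(table)


def most_likely(table, ru_word):
    """Return the highest-probability translation, or None if unknown."""
    options = table.get(ru_word)
    if not options:
        return None
    return max(options, key=options.get)
```

Fed eight alignments of "говорим" to "speak" and two to "talk", the table stores 0.8 and 0.2 for that word, and `most_likely` returns "speak".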
Another method is to analyze the co-occurrence frequency of adjacent words. мы often appears together with говорим, corresponding to "We speak," and the machine prioritizes this combination during translation.
We compensate for the shortcomings of rules by prioritizing rule-based processing and using statistical methods to handle ambiguous cases!
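A rough Python sketch of the "rules first, statistics for ambiguity" idea follows. It counts adjacent-word co-occurrence on the English side and uses it only when no rule applies. The function names, the fixed-translation rule format, and the decision to count English bigrams are all assumptions of this example, not details given in the story.

```python
from collections import Counter


def count_bigrams(sentences):
    """Count how often each pair of adjacent words occurs."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts


def choose_translation(ru_word, candidates, prev_en, rules, bigram_counts):
    """Apply a hand-written rule if one exists; otherwise use statistics.

    rules maps a Russian word to a fixed English translation (hypothetical
    format). For ambiguous words, the candidate that most often follows the
    previous English word wins; ties go to the first candidate listed.
    """
    if ru_word in rules:
        return rules[ru_word]
    return max(candidates, key=lambda cand: bigram_counts[(prev_en, cand)])
```

For instance, if "we speak" occurs twice and "we talk" once in the English side of the corpus, then after "we" the machine prefers "speak" for "говорим", while a word covered by a rule never reaches the statistical step.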
Lin Ran gave them a good lesson from a statistical perspective.
However, this is just the beginning.
The research team members present now understood the outline of Lin Ran's optimization strategy, but many details still had to be adjusted, tried, and optimized during actual implementation.
However, the mere idea of introducing probability gave the senior researchers on the Georgetown translation machine a sudden realization.
They felt that the algorithm optimization and rule design mentioned earlier made sense, but they couldn't judge whether those would actually work in practice.
With the statistical method, though, they could tell just by imagining it that the Georgetown translation machine's performance would improve significantly.
After finishing their work for the day, Calvin and Dostert sat in a corner of a small restaurant near Redstone Base, with two glasses of local beer in front of them.
Calvin put down his notebook, sighed, and said, "Leon, are we really fools?"
After hearing this today, Calvin was questioning his existence.
Lin Ran had proposed a complete solution. It was comprehensive, it covered many points they had considered but never figured out how to implement, and it included some points they had never even thought of.
Nearly ten years of research and development ideas from an entire team were worth less than the practical advice Lin Ran had shared in a single afternoon.
Calvin was beginning to question the meaning of life.
"The professor's idea wasn't ahead of its time, but rather too practical."
You might find it fantastical, but when you put it all together, it becomes incredibly practical.
Even though work hasn't started yet, just from the framework the professor proposed, I can imagine how good the Georgetown translation machine will be once it is upgraded with this complete solution," Calvin continued with feeling.
Now he finally understood why NASA researchers and engineers tolerated Lin Ran's sharp tongue; the gap was simply too great, and they were completely convinced. Take the idea of statistically analyzing the co-occurrence frequency of adjacent words: it was not a hard idea, yet they had never come up with it.
Using statistical methods to handle ambiguous scenarios and adding statistical algorithms is something they never even considered.
Dostert turned his head and smiled wryly, "I've been thinking about it too. The statistical method he proposed sounds like a fantasy, but the results are undeniable."
I estimate that the Georgetown translation system, under the professor's guidance, will see a significant improvement in quality.
We don't need to carefully prepare short phrases; they can be applied to a wider range of scenarios, rather than being limited to the military field.
Calvin nodded: "Yes, I didn't believe it at first. Language is clearly driven by rules, how can it be solved by statistics? But he shut me up with the facts."
"As expected of a professor; his insight into the essence of things transcends disciplines."
Dostert pondered for a moment: "You're right, it feels as if he can see through the essence of machine translation."
Perhaps this is a benefit of his mathematical training; I'm afraid if I spend any more time with the professor, I might even consider pursuing a PhD in mathematics.
Calvin looked at him in surprise: "A PhD in mathematics? You must be joking."
Dostert said seriously, "I'm not joking."
If mathematics can truly help us better understand the essence of things, I think pursuing a PhD in statistics wouldn't be a bad idea.
Calvin was silent for a moment, then smiled and said, "If you go, I'll go too."
Dostert raised his overflowing beer glass: "Cheers to the professor! The professor will bring us victory!"
Calvin laughed and replied, "Cheers! But Professor, if only you could be a little gentler."
On the other side, IBM engineers Cuthbert Hurd and Peter Sheridan were also full of admiration for Lin Ran.
Cuthbert rubbed his temples and asked, "Peter, do you really think the professor's statistical model will work?"
Peter put down his pen, smiled, and said, "Cuthbert, I must confess, I wasn't optimistic about him at first, but now I'm completely convinced. The professor's method not only gets the most out of the IBM 7090, but also gives translation at least a probabilistic basis, lifting it out of its chaotic state."
Cuthbert nodded: "I think those guys at Georgetown University think so too. Didn't you see Calvin's attitude change from initial skepticism to listening intently later?"
The professor's algorithm optimization is practically magic.
Peter smiled wryly: "Magic? As one of the most brilliant mathematicians of this era, perhaps the most brilliant, statistics is probably just a simple Sudoku puzzle to the professor."
I just didn't expect the professor to combine probability theory and linguistics so cleverly; I never imagined machine translation could be done this way.
Cuthbert asked curiously, "You said the professor is fluent in Russian, and the few sentences he spoke today were as standard as could be."
It also spans multiple fields; let alone IBM, probably no one in America could come up with such a solution.
Could the professor have connections to Soviet Russia?
Peter said speechlessly, "The Soviets would let the professor stay in America?"
If I were Nikita, I would never let someone as talented as the professor stay in the White House.
Even if you could obtain technical secrets from NASA, no matter how many secrets you get, they probably wouldn't be as valuable as the professor himself.
And have you ever considered that if the professor weren't at NASA, but instead teamed up with Korolev in Moscow, can you imagine America winning the space race?
Cuthbert thought for a moment, then immediately shook his head: "Absolutely impossible."
"So, if the professor had ties to Soviet Russia, how could he possibly stay in America?"
"The first thing he'll probably need to do is lead a manned lunar landing in Moscow," Peter laughed.
It's possible for scientists to collude with Russia, but it's less likely for scientists with the power to influence the balance to collude with Russia.
If the engineers admired Lin Ran only for his academic achievements, Watson admired him in every respect.
In that, he was much like John Morgan.
However, Watson's reasons for admiring Lin Ran were different from John Morgan's.
"Professor, how did you come up with the idea of using an exhibition hall to build a corporate image?" Watson raised his glass of red wine, smiling.
The Deep Blue Exhibition Hall next to Times Square earned IBM tremendous prestige.
Times Square has always been a New York landmark and a must-visit attraction for almost every tourist who comes to the city.
The Deep Blue Pavilion, however, drew everyone's attention with a style that seemed out of place in this era.
It also housed the world's only chess-playing AI, which could play against humans on its own.
The impact on the public was unprecedented.
America's companies have a long tradition of showcasing their technological prowess and promoting their products through public exhibitions, a tradition that dates back to England.
Whether it was Stephenson's early locomotive or the later steamship, the English rallied the public and publicized it extensively in newspapers.
America's earliest and most successful example was probably Edison's light bulb, and Bell's telephone later became another classic case.
But these are all fleeting moments. Only when the product enters people's daily lives will they have a deeper understanding of the company and the brand.
The existence of the Deep Blue Exhibition Hall leaves a deep impression on every visitor with its deep blue and black lines.
The impression that IBM = Artificial Intelligence = High Technology was etched in the minds of every visitor.
This benefits IBM not only in terms of its corporate image and its association with artificial intelligence, but it has also laid the foundation for the White House to compete with Russia in the field of artificial intelligence, should the White House be determined to do so.
The supplier will have no other choice but IBM.
Lin Ran's suggestion essentially delivered one of IBM's largest customers out of thin air, and it was a long-term order that could last for decades.
John Morgan's General Aerospace had won a contract from NASA and had at least given Lin Ran some shares in return, whereas Watson had given him nothing.
Even if Lin Ran pointed straight at him and called him an idiot, he could only answer yes, yes, yes, I am an idiot.
Inside the private room, the waiter quietly left, leaving a quiet space for conversation.
"Because I think that artificial intelligence like Deep Blue should leave a deep enough impression on the public."
And not just showcased within IBM.
As for why an artist was chosen to design it, how could an ordinary exhibition hall be worthy of Deep Blue?
Watson smiled and nodded: "You're absolutely right."
When I first saw the completed Technology Ark, I felt that it didn't belong to this era. It is because of you that Deep Blue and the Technology Ark came into being.
Professor, I must raise a glass to you.
Forget the old joke that even Stephen Hawking would have to raise a glass if he showed up.
In front of Lin Ran, even Watson had to offer a toast.
Watson continued, "Professor, on behalf of IBM, I would like to express my sincerest gratitude to you."
Not only Deep Blue, but your contributions to the Georgetown-IBM project have been phenomenal. Your innovative approaches will lead to groundbreaking advancements in our machine translation systems.
While it's not yet a breakthrough, Watson is already quite confident.
Lin Ran nodded and said, "That's only right. Besides, Watson, I think my contributions to IBM, whether it's Deep Blue or the Georgetown translation machine, cannot be measured in money."
Lin Ran was not modest at all.
This caused Watson's smile to freeze: "Professor, we will give you a generous sum of money as a reward."
Lin Ran shook his head: "I'm not interested in money."
Watson thought about it and realized that he had never heard of the other party being interested in money.
But giving him a stake in the company was something Watson hesitated over.
"Professor..." Watson began, but he did not get to finish.
Lin Ran continued, "I need you to fulfill one small condition of mine."
If you cannot meet my conditions, I may have to seriously consider the possibility of cooperating with General Electric.
General Electric was a Morgan company.
The relationship between Lin Ran and the Morgan family needs no further explanation.
Watson knew it all too well.
General Electric also makes computers.
Although General Electric did not make large mainframe computers, its GE-225 series, a transistor-based computer, was used for tasks such as payroll, inventory management, and accounting.
General Electric has the capability, and more importantly, the resources.
If Lin Ran is added to the mix, along with his influence as a master, he could indeed pose a significant threat to IBM.
Watson's tone softened immediately: "Professor, what do you want?"
"MIT Radiation Laboratory Series"
(End of this chapter)