Rebirth: Starting from Lighting Up the Tech Tree

Chapter 102 Quickly

For the next two weeks, Zuo Cheng locked himself in his office and hardly ever went out.

Han Lu knocked on the door twice, but Zuo Cheng only said, "I'm busy, don't bother me." Chen Hao also knocked once, but Zuo Cheng dismissed him with the same words. The whole company knew that Zuo Cheng was working on something important, but no one knew what it was.

He carefully worked through all the learning materials Yu Ying had recommended. Sutton's Reinforcement Learning: An Introduction provided the theoretical foundation, Mnih's DQN paper provided the methodology, and with several recent studies on deep reinforcement learning for resource scheduling added in, he built a complete knowledge framework in five days.

Such speed would be unimaginable in academia. Someone with no AI background getting through the core reinforcement learning papers in five days? But Zuo Cheng had his tech tree as a foundation. The fusion leaf of the Intelligent Star Network Scheduling System gave him an intuitive understanding; many concepts that others had to ponder repeatedly, he grasped at a glance.

The technology amplification was also continuing to take effect. With all AI-related learning efficiency up by 20% on top of that intuition, he was learning more than an order of magnitude faster than the average person.

On the seventh day, Zuo Cheng began designing the algorithm framework on paper.

The core idea of deep reinforcement learning is simple: let an agent learn by trial and error in an environment, finding the optimal policy through rewards and penalties. Applied to inter-satellite link scheduling, this meant letting the AI model keep trying different spectrum allocation schemes in a simulation environment until it found the one with the highest spectrum utilization.

The idea was easy; implementing it was hard. How should the state space be defined? How should the action space be designed? How should the reward function be constructed? Every choice would affect the final result.

Zuo Cheng took out the leaf description of the Intelligent Star Network Scheduling System and studied it carefully.

The key parameters the leaf provided were a great help. The state space should include three dimensions: link quality, satellite position, and spectrum occupancy. The action space should be continuous rather than discrete, because the finer the granularity of spectrum allocation, the larger the room for optimization. The reward function should be based primarily on spectrum utilization, plus a penalty term for link instability.

Zuo Cheng compiled these parameters into a technical document and handed it to Tang Xu.

"Build the simulation environment according to this framework," Zuo Cheng said. "The state space is three-dimensional, the action space is continuous, and the reward function uses this formula."

Tang Xu took the document, looked at it for a while, and his expression changed from confusion to shock.

"Mr. Zuo, this framework is very professional. Where did you learn it?"

"I've been self-studying these past few days," Zuo Cheng said. "Don't ask how I learned it, just follow the framework."

Tang Xu didn't press further. He knew Zuo Cheng's learning ability far surpassed that of ordinary people; from communications to the Internet of Things to AI now, he always grasped the core knowledge in the shortest possible time. Tang Xu couldn't explain this ability; he could only attribute it to talent.

Three days later, the simulation environment was up. Tang Xu reported that it was running smoothly, that the state space and action space followed Zuo Cheng's framework exactly, and that the reward function had been implemented as well.

"Okay, the next step is to train the model," Zuo Cheng said. "Is the GPU server in place?"

"It's in place. Han Lu rush-ordered four RTX 2080 Ti cards, and they were installed just yesterday."

"Are four cards enough?"

"In a simulation environment with 480 satellites, it would take about three days to run a single DQN model on four GPUs," Tang Xu said. "If we want to run multiple models for comparison, it might take a week."

"A week is too long," Zuo Cheng said. "I'll give you a training configuration: learning rate of 0.0003, batch size of 256, experience replay buffer size of 100 million, and a target network update every 1,000 steps. Running with this configuration should cut the training time to two days."

Tang Xu noted down the parameters, then asked with some confusion, "How did you determine these parameters?"

"I worked them out through testing," Zuo Cheng said. Of course, he couldn't say that these parameters were given directly in the leaf description.

After Tang Xu left, Zuo Cheng opened the system panel and glanced at it. The number of leaves on the Internet of Things branch had changed again, increasing from fifteen to sixteen, with the newly grown one called "Neural Network Architecture Search." This leaf's ability was to automatically search for the optimal neural network structure, reducing the need for manual tuning.

It came at the perfect time. Zuo Cheng incorporated the NAS (Neural Architecture Search) concept into the algorithm design, adding an automatic architecture search module to the training script. This way, the model could not only learn the optimal scheduling strategy but also automatically find the network structure that suited it best.

The effect of the 1.2x technology boost was vividly demonstrated here. The same training task might take five days to converge without the boost; with it, only two. This was the power of the tech tree; what seemed like a mere 20% increase could be a lifesaver at crucial junctures.

Zuo Cheng closed his laptop and walked to the window. Night had fallen, and the lights in the technology park were sparse, with only a few office buildings still lit. He knew that under one of those lights, Tang Xu was training the model.

He sent Yu Ying a message: "Kongkong, thank you for recommending those papers. I've built the reinforcement learning framework and am currently training the model."

Yu Ying replied: "You really studied it? Only for two weeks?"

"Don't underestimate your brother."

"I'm not underestimating you, I just find it incredible." Yu Ying sent a shocked emoji. "A senior of mine, a first-year PhD student, spent half a year just barely getting the hang of reinforcement learning, and you've built the framework in two weeks?"

Zuo Cheng smiled but didn't reply. He couldn't tell Yu Ying that he had the tech tree behind him. In other people's eyes, he was a genius; only he knew that behind his genius stood an invisible tech tree.

Two days later, Tang Xu rushed into Zuo Cheng's office.

"Mr. Zuo! The model has converged! The spectrum utilization rate is 76%!"

Seventy-six percent. With the added 20 percent from the technology boost, the final utilization rate would reach 91 percent, far exceeding the design target of 80 percent.

Zuo Cheng took a deep breath, but remained calm: "Have you run the full simulation?"

"It's in progress, and the results are expected this afternoon."

"Okay, let me know when it's finished."

After Tang Xu left, Zuo Cheng leaned back in his chair, a slight smile playing on his lips. The first step in the AI field had been taken. 402 didn't have an AI team? No problem, he was an AI team all by himself.

At least for this stage, he could manage on his own. But once the AI branches were truly activated, 402 would need a real AI team to support them.
