Exploring Trends of AI and Machine Learning

Artificial intelligence (AI) spans a broad array of techniques and applications aimed at creating systems that can learn, reason, and, in some cases, generate creative outputs. From chatbots and digital assistants to generative AI tools for creating art, music, and video, AI technology is constantly expanding its reach. While data analytics is one use of AI, this thread will cover a wide range of intelligent applications and advancements. Here, I’ll be providing updates on cutting-edge trends in AI, exploring its impact across different fields, and keeping you informed about the latest breakthroughs in the industry.

11/1/2024 · 58 min read


DSML trend: Google introduces Veo 3, priced at $249.99/month

May 23, 2025
Chong Ho Alex Yu

Google has just unveiled Veo 3, a groundbreaking leap in AI-driven video generation. Unlike its predecessors, Veo 3 doesn't just craft stunning visuals from text prompts—it now seamlessly integrates synchronized audio, including dialogue, ambient sounds, and music, bringing a new level of realism to AI-generated content. This means characters not only move convincingly but also speak with accurate lip-syncing, making the generated videos eerily lifelike.

Developed by Google DeepMind, Veo 3 excels in translating complex prompts into coherent, cinematic scenes, complete with realistic physics and nuanced audio. Whether it's a stand-up comedy routine with audience laughter or a historical reenactment with period-accurate soundscapes, Veo 3 delivers with impressive fidelity. Currently, Veo 3 is available to U.S.-based users through Google's $249.99/month AI Ultra subscription plan and to enterprise customers via the Vertex AI platform.

That’s my take on it:

Tools like Veo 3, priced at $249.99/month, are clearly out of reach for most individuals, especially casual creators, students, and people in lower-income regions. In the short term, this definitely contributes to the digital divide. Those who can afford access to cutting-edge AI tools will have a serious creative and economic edge—think faster content production, higher-quality marketing materials, better media reach, etc. It’s a classic case of "the rich get richer."

Historically, though, we’ve seen tech costs come down significantly over time. For example, when personal computers were introduced in the 1980s, they cost thousands of dollars. Now a decent laptop or smartphone, often more powerful than early supercomputers, is available for a few hundred dollars. AI will likely follow a similar curve. As the technology matures, infrastructure becomes more efficient, competition increases, and cloud-based access becomes more scalable, the price will probably drop. The big unknown is the speed—but if history is any guide, seven years or less is a reasonable bet for mass accessibility.

Link: https://deepmind.google/models/veo/


Anthropic Launches Claude Sonnet 4 and Claude Opus 4

May 22, 2025
Chong Ho Alex Yu

Today (May 22, 2025), Anthropic officially released two major updates: Claude Sonnet 4 and Claude Opus 4. These models mark a significant step forward in large language model (LLM) development.

Claude Sonnet 4 is positioned as an all-purpose AI assistant, capable of answering everyday questions and handling common tasks. A direct upgrade from Claude Sonnet 3.7, it features substantial improvements in both reasoning and coding and is more precise in interpreting and following user instructions.

Meanwhile, Claude Opus 4 is engineered for more sophisticated use cases. It’s being touted as the world’s most advanced coding model, particularly effective at managing complex, long-running tasks and structured agent workflows, and it excels at solving complex challenges.

A standout enhancement in both models is their ability to interleave reasoning with tool use—essential for tackling multi-step problems. They now support extended thinking by dynamically switching between logical inference and external tools to improve response quality.

Another notable update: when given local file access, both models can now extract and store key facts in local 'memory files.' This allows them to maintain continuity across sessions and build a kind of “tacit memory” over time. Additionally, Anthropic has introduced parallel tool use and upgraded the models’ ability to follow nuanced instructions.
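Anthropic hasn’t published the memory files’ exact format, so what follows is only a minimal sketch of the general pattern, assuming a hypothetical memory.json store (the file name, JSON layout, and helper functions are my own invention, not Anthropic’s implementation):

```python
import json
import pathlib

MEMORY = pathlib.Path("memory.json")  # hypothetical local memory file

def load_memory() -> list[str]:
    """Read back facts persisted by earlier sessions."""
    return json.loads(MEMORY.read_text()) if MEMORY.exists() else []

def remember(fact: str) -> None:
    """Append a newly extracted key fact, skipping duplicates."""
    facts = load_memory()
    if fact not in facts:
        facts.append(fact)
        MEMORY.write_text(json.dumps(facts, indent=2))

# At the start of a new session, stored facts are prepended to the
# model's context so it can pick up where the last session left off.
remember("User prefers visual decision-tree output over text diagrams.")
context = "Known facts:\n" + "\n".join(f"- {f}" for f in load_memory())
print(context)
```

Whatever the real format turns out to be, the continuity comes from this read-at-start, write-on-discovery loop rather than from anything inside the model’s weights.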

That’s my take on it:

I put both models to the test, and the results were genuinely impressive.

I first gave Claude Sonnet 4 a conceptual prompt: What’s the connection between the frequentist school of statistics and data science/machine learning? The model returned a comprehensive and spot-on analysis. It discussed topics like optimization theory, regularization techniques, asymptotic theory, cross-validation, and information theory, among others. The response was detailed, accurate, and clearly structured.

Next, I uploaded a dataset to Claude Opus 4 and asked it to perform multiple tasks: OLS regression using dummy coding, generalized regression, and a decision tree model, followed by a model comparison. The execution was smooth and correct. However, one limitation stood out—the output was entirely text-based. The decision tree, for instance, was represented using plain text symbols rather than a visual graphic like those produced by JMP Pro, SAS, SPSS, or JASP. Thus, it's not quite ready to replace conventional statistical software—at least not yet.
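For readers who want to reproduce that workflow in conventional tooling, here is a minimal sketch of the OLS-with-dummy-coding and decision-tree comparison using statsmodels and scikit-learn. The data are synthetic stand-ins (SES and age predicting an “involvement” score, loosely echoing the example quoted below), not my actual dataset:

```python
# pip install numpy pandas statsmodels scikit-learn
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "ses": rng.choice(["low", "middle", "high"], size=n),
    "age": rng.integers(18, 70, size=n),
})
# Synthetic outcome: a small negative SES effect plus noise.
df["involvement"] = (5 - 0.4 * (df["ses"] != "low")
                     + 0.01 * df["age"]
                     + rng.normal(scale=1.2, size=n))

# OLS with dummy coding: C() expands ses into indicators, 'low' as reference.
ols = smf.ols("involvement ~ C(ses, Treatment('low')) + age", data=df).fit()
print(ols.summary().tables[1])

# Decision tree on the same predictors, for an informal comparison.
X = pd.get_dummies(df[["ses", "age"]], columns=["ses"])
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, df["involvement"])
print("Tree R^2:", round(tree.score(X, df["involvement"]), 3),
      "| OLS R^2:", round(ols.rsquared, 3))
```

This covers the computation; the write-up is where Claude differentiates itself, as discussed next.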

That said, Claude does something those tools typically can’t: It interprets results, writes up findings, and even offers thoughtful recommendations. For example:

“Individuals with middle or higher SES show approximately 0.39 points lower involvement compared to those with lower SES (p = 0.027)… All models explain only about 10% of the variance in involvement, suggesting: Important predictors may be missing from the analysis, the relationship between these demographics and involvement is weak, or involvement may be driven more by psychological or situational factors…The decision tree hints at interaction effects that could be formally tested in future analyses.”

It writes like a consultant, only faster and cheaper. Honestly, with Claude doing all this, I might be out of a job soon!

Link: https://www.anthropic.com/news/claude-4


Where is AI Going?

May 21, 2025
Chong Ho Alex Yu

In the article titled "Has AI Hit a Lull?" published on May 21, 2025, CNN commentator Fareed Zakaria explores the current state of artificial intelligence, highlighting its dramatic highs, troubling lows, and emerging signs of stagnation in mainstream adoption. On the upside, AI has achieved notable breakthroughs, such as Google’s medical chatbot outperforming doctors and AI-generated art featured at the Museum of Modern Art. These examples reflect AI’s transformative potential, particularly in healthcare and the arts. However, serious downsides are also surfacing. AI is being weaponized for scams, misinformation, and even deepfake pornography—issues that have already spurred legislative responses. Additionally, the internet is increasingly cluttered with low-effort, AI-generated “slop” content, which, while emotionally charged and highly shareable, undermines information quality.

Then there’s the awkward middle ground—useful but error-prone AI, like Google's search “AI Overview,” which famously suggested eating glue and rocks. Most crucially, AI may be stalling economically. The Economist reports a significant rise in companies abandoning AI pilot projects, as real-world integration proves tougher than expected. Many firms, disillusioned, now find they need practical tools rather than ever more powerful models. The result? A noticeable lull in the AI boom, as hype gives way to the hard work of implementation.

That’s my take on it:

What Fareed described about AI today isn’t new—it’s part of a recurring pattern, likely rooted in human nature. Every time a breakthrough technology appears, we see the same cycle: some people use it for meaningful innovation, others generate low-quality output, and a few exploit it for unethical gains. During the internet boom, pioneers like Amazon and eBay redefined commerce, while shady online casinos and adult sites spread rapidly. When Aldus’s PageMaker and Adobe’s Photoshop democratized creative publishing, the flood of awkward, poorly designed work led to collections like “Photoshop Disasters.”

Now, with AI, we’re seeing the same pattern play out. Groundbreaking applications coexist with scammy schemes, deepfake chaos, and a flood of low-effort content clogging up our feeds. It can feel chaotic—maybe even discouraging—but this is how technological progress tends to unfold. There’s always noise before clarity, confusion before mastery. Still, this messiness is the price of progress. True transformation doesn’t come without trial and error, missteps, and the gradual process of learning how to wield new tools wisely.

Link: To view the full text, you need to sign up for Fareed’s Global Briefing Newsletter

https://www.cnn.com/newsletters/fareeds-global-briefing


China is stockpiling GPUs and chip tools

May 14, 2025
Chong Ho Alex Yu

China is actively gearing up for the ongoing and future AI rivalry with the United States by stockpiling GPUs and advanced chipmaking tools. In response to the latest U.S. restrictions that tighten global access to Huawei Technologies’ AI chips, Chinese tech giant Tencent Holdings announced on May 14, 2025, that it has a sufficient reserve of previously acquired high-end chips to continue training its AI models “for a few more generations.” The company is also focusing on enhancing the efficiency of AI inference, including through software-based optimizations.

Meanwhile, China set a new record in 2024 for foreign chipmaking equipment imports, underscoring its push to scale up domestic semiconductor production and build a strategic reserve of critical manufacturing tools amid escalating U.S.-China tensions. Of the $30.9 billion in imported equipment from major suppliers, nearly $20 billion came from Japan and the Netherlands. Notably, China imported $9.63 billion worth of equipment from Japan—a 28.23% increase year-on-year—marking the fifth consecutive record-setting year since tensions began intensifying in 2019.
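The reported figures imply a few shares worth making explicit. This is simple arithmetic on the numbers above; the Netherlands value is an implied remainder, not a separately reported figure:

```python
total = 30.9          # $bn, equipment imports from major suppliers
japan_plus_nl = 20.0  # $bn, "nearly $20 billion" from Japan and the Netherlands
japan = 9.63          # $bn, imports from Japan

netherlands = japan_plus_nl - japan                 # ~10.4 $bn (implied)
print(f"Japan+NL share of total: {japan_plus_nl / total:.0%}")  # ~65%
print(f"Implied Netherlands imports: ~${netherlands:.1f}bn")
```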

That’s my take on it:

China’s stockpiling strategy, while a break from the just-in-time (JIT) model common in high-tech industries, makes strategic sense in the current geopolitical climate. With escalating U.S. export controls and uncertain access to advanced chips, stockpiling GPUs and chipmaking equipment offers a buffer against supply shocks. However, this approach carries real risks. In fast-evolving sectors like AI and semiconductors, hardware can become obsolete quickly. Holding large inventories of older chips may backfire if future AI models require capabilities that outdated hardware can't support efficiently.

China’s tech giants, inspired by efforts like DeepSeek, believe they can still move forward using older GPUs through software optimizations, model distillation, and efficiency improvements. While such methods can stretch hardware utility, they often come with trade-offs in performance and scalability. Distilled models, for instance, may lose generalization power. Thus, while stockpiling offers short-term resilience, it is not a long-term solution. The success of this strategy ultimately depends on China’s ability to sustain software innovation and close the hardware gap through domestic R&D or alternative supply chains. Whether this gamble pays off remains to be seen.

Links:

https://asia.nikkei.com/Business/Technology/Tencent-says-chip-stockpile-can-power-AI-training-for-generations-despite-US-ban

https://asia.nikkei.com/Spotlight/Supply-Chain/Japan-Netherlands-win-as-China-s-chip-tool-imports-surge-on-US-tensions


Japan’s Sakana introduces continuous thought machine


May 13, 2025

On May 12, 2025, Sakana AI introduced a fascinating concept called the Continuous Thought Machine (CTM). The CTM is a new kind of neural network architecture that mimics how biological brains process information—not just in terms of structure, but in how neurons behave over time. Traditional AI models, like Transformers, process inputs in fixed layers and steps. CTMs, on the other hand, introduce two key innovations:

  1. Neuron-Level Temporal Processing: Each artificial neuron retains a short history of its previous activity and uses that memory to decide when to activate again. This allows neurons to consider historical information, not just immediate input, making their activation patterns more complex and diverse—closer to how biological neurons work.

  2. Neural Synchronization: Instead of relying solely on the strength of connections (weights) between neurons, CTMs focus on the timing of neuron activations. This synchronization enables the model to process information in a more dynamic and coordinated manner, akin to the oscillatory patterns observed in real brains.

Together, these mechanisms allow CTMs to "think" through problems step-by-step, making their reasoning process more interpretable and human-like. Unlike conventional models that process inputs in a single pass, CTMs can take several internal steps—referred to as "ticks"—to reason about a task, adjusting the depth and duration of their reasoning dynamically based on the complexity of the input.
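Sakana’s own release is the authoritative reference, but a toy sketch can make the two ideas concrete. The NumPy snippet below is my own illustration, not Sakana’s implementation: every name, size, and formula is invented. Each neuron applies private weights to a rolling history of its own pre-activations (idea 1), and the network’s state is summarized as pairwise correlations between activation traces across ticks (a crude stand-in for idea 2):

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyCTMNeuron:
    """A neuron that activates based on a short history of its own
    pre-activations, not just the current input."""
    def __init__(self, history_len=5):
        self.history = np.zeros(history_len)
        # Per-neuron weights over the history window (learned in a real CTM).
        self.w = rng.normal(size=history_len)

    def step(self, pre_activation):
        # Slide the rolling window and append the newest pre-activation.
        self.history = np.roll(self.history, -1)
        self.history[-1] = pre_activation
        # Activation depends on the whole history window.
        return np.tanh(self.w @ self.history)

def synchronization(traces):
    """Pairwise correlation of activation traces over internal ticks,
    standing in for the CTM's synchronization representation."""
    t = traces - traces.mean(axis=1, keepdims=True)
    t /= np.linalg.norm(t, axis=1, keepdims=True) + 1e-8
    return t @ t.T  # (n_neurons, n_neurons)

n_neurons, n_ticks = 8, 20
neurons = [ToyCTMNeuron() for _ in range(n_neurons)]
mix = rng.normal(size=(n_neurons, n_neurons)) / np.sqrt(n_neurons)

acts = np.zeros(n_neurons)
traces = np.zeros((n_neurons, n_ticks))
for tick in range(n_ticks):            # internal reasoning "ticks"
    pre = mix @ acts                   # recurrent pre-activations
    acts = np.array([nrn.step(p) for nrn, p in zip(neurons, pre)])
    traces[:, tick] = acts

S = synchronization(traces)            # this matrix would drive the output
print(S.shape)
```

In the real architecture these pieces are trained end to end and the number of ticks can adapt to input difficulty; the sketch only shows where the timing information lives.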

That’s my take on it:

CTMs represent a significant shift from traditional AI models by incorporating temporal dynamics and synchronization at the neuron level. This approach could lead to more flexible and efficient AI systems that better mimic human cognition.

Sakana AI is based in Tokyo, but its founders are globally known ex-Googlers. David Ha is the former head of research at Stability AI and a former Google Brain researcher, whereas Llion Jones is one of the co-authors of the original Transformer paper, “Attention Is All You Need.”

The big question is: can Japan compete in a US/China-dominated AI market? Japan doesn’t have the equivalents of OpenAI, Google, Meta, or Baidu. Its top tech companies (like Sony, NEC, and Fujitsu) aren’t leading in large-scale foundational models. Further, Japanese research has historically been strong in hardware, robotics, and manufacturing, but AI software innovation has lagged behind.

Nevertheless, Sakana AI is already attracting top-tier international researchers because it's building a focused, experimental, and minimalist research culture. It may become a kind of "AI Kyoto"—like what Kyoto Animation is to anime. Rather than chasing ever-larger LLMs like GPT-4 and beyond, Sakana is innovating in how models reason, not just how big they are. That could become a niche advantage.

Note: "Sakana" means “Fish” in Japanese. In Chinese it sounds like 山旮旯( 山卡啦 )

Link: https://sakana.ai/ctm/

Investigative journalism by Web scraping

May 9, 2025

On May 5, 2025, Julius Černiauskas published a thought-provoking article titled “Behind the Scenes of Using Web Scraping and AI in Investigative Journalism.” The summary is as follows:

While investigative journalism often conjures images of hidden sources and undercover work, many compelling stories begin with publicly available information—data hiding in plain sight. This is where web scraping, the automated extraction of online data, has become indispensable. It's not only a method for gathering facts quickly, but also a powerful tool for holding institutions accountable, revealing data manipulation, and uncovering misconduct. For instance, data scraping tools exposed that 38,000 articles about the war in Ukraine, all published in a single year, were attributed to the same supposed “journalist,” helping real reporters debunk fake journalism and identify inauthentic authorship.

Despite common misconceptions that web scraping is shady, journalists and news organizations—including the nonprofit newsroom The Markup—have actively defended it, even at the U.S. Supreme Court, arguing that it’s critical to a functioning democracy. In tandem, artificial intelligence is amplifying what journalists can do with scraped data, from sifting through massive document troves to spotting anomalies and generating leads. Even those without coding skills can now use no-code tools like browser extensions to engage in data-driven storytelling. Yet ethical concerns remain front and center. Journalists must use discretion when gathering and storing data, particularly when anonymity is vital, such as when monitoring the dark web. Trained AI systems can assist with filtering sensitive content, but final editorial decisions must always lie with human professionals. Ultimately, the fusion of AI and web scraping empowers investigative reporters to uncover meaningful truths in a sea of digital noise, transforming journalism in the data age.

That’s my take on it:

On one hand, web scraping unlocks access to vast amounts of public information, making it a critical tool for uncovering patterns, inconsistencies, or outright manipulation, like the case of the fake Ukraine war journalist. On the other hand, robots.txt files and similar exclusion tags give website owners a way to block automated scraping, whether for reasons of privacy, intellectual property, or security. Simply put, these opt-out mechanisms can be used to hide things from scrutiny.
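For the curious, honoring this opt-out takes only a few lines of standard-library Python. Here is a minimal sketch, with example.com standing in for a hypothetical target site and a made-up user-agent string:

```python
from urllib import robotparser

# Hypothetical target; example.com stands in for any site being scraped.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

url = "https://example.com/articles/some-story"
if rp.can_fetch("my-research-bot", url):
    print("robots.txt permits fetching", url)
else:
    # The opt-out described above: the site has walled itself off
    # from automated analysis, whatever its motives.
    print("robots.txt disallows fetching", url)
```

Note that robots.txt is advisory rather than enforcement; a scraper that ignores it faces ethical and possibly legal questions, not a technical barrier.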

This creates a structural asymmetry: those who have something to hide—or simply the means and awareness to deploy these exclusion tags—can wall off their content from automated analysis, while less technically guarded or smaller sites remain open. In turn, this can skew investigations by making some patterns invisible and some actors untouchable. It also means that bad-faith players who understand how to manipulate these rules can fly under the radar, especially if journalists adhere strictly to ethical or legal boundaries around scraping.

There's also the valid concern about intellectual property and content ownership. Just because something is publicly viewable doesn't mean it’s legally or ethically scrapeable. This is especially tricky when it comes to original reporting, personal blogs, or creative work, where scraping for republishing or mass analysis feels exploitative rather than investigative.

As such, scraping-based journalism can be incomplete or biased, especially when key data sources opt out—whether to hide shady activity or to protect legitimate rights. That’s why transparency in methodology is so important. Responsible journalists often disclose the scope and limits of their data collection, highlighting what they could and couldn’t access. And it also points to a larger issue: technology alone isn't enough—a thoughtful, skeptical human must still decide what the data really means and where the blind spots lie.

Link: https://hackernoon.com/behind-the-scenes-of-using-web-scraping-and-ai-in-investigative-journalism


Huawei delivers advanced AI chips as substitutes for Nvidia’s chips

May 1, 2025

Huawei is rapidly emerging as a key player in the AI chip market, having begun deliveries of its advanced AI "cluster" system, CloudMatrix 384, to domestic clients in China, according to the Financial Times. This development comes in response to growing U.S. export restrictions that have made it increasingly difficult for Chinese companies to acquire Nvidia’s high-end semiconductors. Huawei has reportedly sold over ten units of the CloudMatrix 384, a system that links together a large number of AI chips, and these have been shipped to data centers supporting various Chinese tech firms.

Dylan Patel, founder of SemiAnalysis, stated that CloudMatrix 384 is capable of outperforming Nvidia’s flagship NVL72 cluster in both computational power and memory. Despite some drawbacks—namely higher power consumption and more complex software maintenance—CloudMatrix is seen as a viable and attractive alternative, especially given China’s deep engineering talent pool and ample energy resources. This marks a significant strategic shift as China looks to reduce its dependence on Western AI hardware.

That’s my take on it:

The CloudMatrix 384 consumes nearly four times more power than the NVL72, leading to lower energy efficiency. Despite this, in regions like China where power availability is less constrained, the higher energy consumption is considered an acceptable compromise for the increased computational capabilities.

Based on the current trend, it is unlikely that Huawei’s technology will catch up with Nvidia’s in the near future. Nvidia isn’t just a chipmaker—it’s an ecosystem. It dominates the AI space not only with its hardware (e.g., the H100) but also with its software stack (CUDA, cuDNN, TensorRT, etc.). These tools are mature, widely adopted, and deeply integrated into enterprise and research workflows.

But don’t forget that in the '80s, Japan’s chipmakers like NEC, Toshiba, and Hitachi managed to outcompete U.S. firms like Intel in DRAM by focusing on quality control, manufacturing efficiency, and aggressive investment. While Nvidia leads now, that lead isn’t invincible.

Link: https://www.ft.com/content/cac568a2-5fd1-455c-b985-f3a8ce31c097?accessToken=zwAAAZcgU2HwkdPKxWiiX9FFXNO5hfOozjHAlwE.MEQCIASnmNkxJzppNfWifnU4F8NIZHhvb-dI-uQ92OJ4P8egAiAKodKrU6w-8_cmYRzPi54ClKa2rBh2XKAP-t6iAFKwCw&segmentId=cac568a2-5fd1-455c-b985-f3a8ce31c097

Microsoft predicts everyone will be a boss of AI employees

April 25, 2025

Microsoft recently unveiled a bold vision for the future of work, predicting a shift where every employee becomes an “agent boss,” managing AI agents that perform many of their daily tasks. Microsoft’s 2025 Work Trend Index describes how organizations will evolve into what it calls “Frontier Firms”—entities that rely on AI-powered teams blending humans and autonomous digital agents. These frontier firms are expected to operate with heightened agility, on-demand intelligence, and scalable workflows, fundamentally reshaping traditional corporate structures.

This transformation is described in three progressive phases. First, employees will work alongside AI assistants, using tools like Copilot to help draft emails, summarize meetings, or organize information. The second phase introduces digital colleagues—AI agents capable of more sophisticated, semi-independent tasks under human supervision. Finally, companies will move into a world of autonomous agents, where AI systems handle entire projects and business processes, with humans overseeing their performance and ensuring alignment with company goals.

A major driver behind this change is what Microsoft calls the "capacity gap." Their research shows that 80% of employees feel overwhelmed by their workload, while more than half of corporate leaders believe their organizations must boost productivity to stay competitive. AI agents are positioned as the solution to bridge this gap, allowing human workers to offload routine work and refocus on complex, strategic, and creative initiatives.

However, the rise of AI agent bosses brings both opportunities and challenges. Job roles will inevitably shift. While some traditional jobs may be displaced, new categories such as AI agent trainers, performance auditors, and digital project managers will emerge. Organizations will also have to rethink team dynamics—balancing human ingenuity with machine efficiency to optimize output. Skill development will be critical: employees must learn how to manage, delegate to, and collaborate with AI agents effectively to succeed in this future landscape.

To prepare for this new reality, Microsoft suggests a proactive approach: fostering a culture of continuous learning, encouraging symbiotic human-AI collaboration, and establishing ethical frameworks for AI use. Strategic planning and adaptability will be essential as companies embrace the capabilities of AI while mitigating potential risks like job displacement and decision opacity.

That’s my take on it:

Ultimately, Microsoft's vision of "agent bosses" reflects not just a technological evolution, but a fundamental reimagining of the workplace itself. Those who can adapt, develop the right skills, and rethink traditional work processes will likely thrive in this AI-augmented future.

However, if we really follow Microsoft's logic (and similar visions from OpenAI, Google DeepMind, Anthropic, etc.), the future is less about personal stockpiles of skills or raw knowledge, and more about the "amplification" you get through your AI “employees” or teammates. The new premium will be on who has better AI agents, and who knows how to direct them effectively. It's almost like the future is a "race of symbiosis" — the best human-AI partnerships will win, not just the best humans.

Even if AI becomes the "great equalizer" by making knowledge universally accessible, it also amplifies differences in how creatively and strategically people use it. Think about the Industrial Revolution: it wasn’t the strongest worker who became richest — it was the person who had access to the best machines and knew how to operate them smartly.

Links:

https://www.theguardian.com/technology/2025/apr/25/microsoft-says-everyone-will-be-a-boss-in-the-future-of-ai-employees

https://www.msn.com/en-us/news/technology/meet-your-new-ai-teammate-microsoft-sees-humans-as-agent-bosses-upending-the-workplace/ar-AA1DsNeY


Bigger may not be better – Inflection point of LLMs

April 18, 2025

A recent study by researchers from Carnegie Mellon, Stanford, Harvard, and Princeton suggests that over-training large language models (LLMs) may actually make them harder to fine-tune. Contrary to the common belief that more training leads to better performance, the team found diminishing returns—and even performance degradation—when they trained two versions of the OLMo-1B model with different token counts. One version was trained on 2.3 trillion tokens, and the other on 3 trillion. Surprisingly, the more heavily trained model performed up to 3% worse on evaluation benchmarks like ARC and AlpacaEval.

This led the researchers to identify a phenomenon they call "catastrophic overtraining," where additional training causes the model to become increasingly sensitive to noise introduced during fine-tuning. They describe this growing fragility as "progressive sensitivity," noting that beyond a certain "inflection point," further training can destabilize the model and undo prior gains. To validate this, they introduced Gaussian noise during fine-tuning and observed similar drops in performance.

The takeaway is clear: training beyond a certain threshold may reduce a model's adaptability, and developers may need to rethink how they determine optimal training duration—or develop new methods that extend the safe training horizon.

That’s my take on it:

For years, the dominant belief in large language model (LLM) development has been that increasing model size and training data leads to better performance—a view supported by early scaling-law research (e.g., OpenAI’s and DeepMind’s work). The study by CMU, Stanford, Harvard, and Princeton counters that bigger may not be better, and other studies concur. Even in smaller models (1B–10B parameters), researchers have observed what they sometimes call “loss spike” behavior—where longer training actually causes performance drops, particularly in out-of-distribution generalization. That lines up with the “inflection point” the paper describes.

The key question is: “Where is the inflection point?” or “How much is too much?” Perhaps there’s no universal threshold. Some researchers are exploring ways to detect it, including tracking validation loss trends, fine-tuning adaptability at various checkpoints, analyzing gradient noise, and probing noise sensitivity (e.g., via Gaussian perturbations). Some even use loss landscape analysis or generalization curves to flag when models start to become brittle. Perhaps future progress in LLMs may depend less on pushing scale and more on training efficiency, model robustness, and smarter tuning strategies. Instead of asking “how big can we go?” we might now ask “how far should we go before it starts breaking things?”
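The Gaussian-perturbation probe mentioned above is easy to illustrate on a toy model. The sketch below uses synthetic data and a plain linear model, nothing like the paper’s actual setup: it perturbs fitted weights with noise of increasing scale and reports the average loss increase. A model approaching the inflection point would show this curve steepening:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for "a trained model": least-squares weights on synthetic data.
X = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
y = X @ true_w + 0.1 * rng.normal(size=200)
w_fit = np.linalg.lstsq(X, y, rcond=None)[0]

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

def noise_sensitivity(w, sigma, trials=50):
    """Mean loss increase under Gaussian weight perturbations N(0, sigma^2)."""
    base = loss(w)
    return np.mean([loss(w + rng.normal(scale=sigma, size=w.shape)) - base
                    for _ in range(trials)])

for sigma in (0.01, 0.05, 0.1):
    print(f"sigma={sigma}: mean loss increase {noise_sensitivity(w_fit, sigma):.4f}")
```

Tracking a curve like this across training checkpoints is one cheap way to watch for the fragility the paper calls progressive sensitivity.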

Link: https://arxiv.org/abs/2503.19206


OpenAI released o3, which can “think with images”

April 17, 2025

On April 16, 2025, OpenAI released its latest o3 model, which can understand uploaded images such as photos, whiteboards, sketches, and diagrams, even if they’re low quality. Unlike Google Search, which compares your image against archives on the Internet, ChatGPT analyzes the image through pattern recognition. Trained on huge datasets of text and images, it answers questions by “thinking and reasoning” rather than by retrieving from a database. The improved capabilities include the following (a minimal API sketch follows the list):

1. Higher Visual Accuracy

  • It’s better at identifying objects, scenes, and text in images, including handwritten content, charts, technical diagrams, UI screenshots, etc.

  • Handles blurry, low-res, or partially cropped images much better than before.

2. Detailed Interpretation

  • You can ask complex questions about an image—e.g., “What’s the logical flaw in this chart?” or “What programming language is this code snippet written in?”

  • For documents or presentations, it can extract key points, summarize, or answer questions about them.

3. Math Diagrams & Graphs

  • Handles geometry diagrams, graphs, and math problems from images with higher accuracy.

  • Can interpret axes, functions, labels, and make connections like: "This is a sine function over one period, amplitude 3..."

4. Multi-step Reasoning from Images

  • For example: Upload a circuit diagram and ask for its function—it’ll walk you through how the circuit works.

  • Or: Upload a data table image and ask for trends, outliers, and suggestions.

5. Image Editing Requests

  • While actual editing (like changing colors, removing objects, or cartoonizing a photo) uses a different internal tool, you can describe what you want and the model applies the changes quickly.
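To try image understanding yourself, sending a picture to the model takes only a few lines with OpenAI’s Python SDK. This is a minimal sketch: the model name, the placeholder image URL, and o3’s availability under this endpoint are assumptions, so substitute whatever vision-capable model your account exposes:

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3",  # assumption; any vision-capable model works here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What is the cultural background of the person in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/test_image1.jpg"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```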

That's my take on it:

As a photographer, I am interested in whether o3 can identify the origin of the image. I uploaded “test_image1” and asked the following question: “What is the ethnicity or cultural background of the person on the uploaded image?”


The following is part of the answer, and it is accurate.

The person in the photo appears to be performing or portraying aspects of Māori culture (the Indigenous Polynesian people of New Zealand). Here's what stands out:

1. Facial Expression & Tongue Out

  • This tongue-out expression, wide eyes, and intense stare is iconic of the Māori haka—a traditional war dance used to intimidate opponents and show strength, unity, and pride.


2. Facial Markings (Tā Moko Style)

  • The black markings resemble Tā moko, the traditional permanent tattoos of the Māori, though in this case, it may be painted for performance.

  • Tā moko represents genealogy, status, and personal history—it's deeply symbolic.

3. Feathers & Headband

  • The feathered headgear and decorative band are often part of ceremonial Māori attire, especially during cultural performances.


4. Background & Aesthetic

  • The swirling koru-style patterns in the background (spirals based on the unfurling fern frond) are also rooted in Māori art and symbolism, representing growth, strength, and peace.

I uploaded an image of a rare plant and asked this question: “What is the name of the plant? Where can I find it?”

The following is part of the answer. Again, it is correct.

Plant ID: Strongylodon macrobotrys

Common name: Jade Vine

This flower is part of the pea family (Fabaceae) and is native to the tropical forests of the Philippines. That surreal turquoise-blue bloom is its signature look—almost alien-like in color and shape.

You’ll typically see it in:

Native Habitat:

  • Philippines – especially in Luzon, Mindoro, and Leyte in rainforests.


Botanical Gardens Worldwide:

  • Singapore Botanic Gardens

  • Hawaii Tropical Botanical Garden

  • Kew Gardens (UK)

  • Fairchild Tropical Botanic Garden (Florida)

  • Costa Rica, Thailand, and other humid tropical conservatories

I tried to trick ChatGPT by uploading something more challenging. After uploading “test_image3”, I asked this question: “Where is the beach? What mineral is the sand composed of?”

My jaw dropped when I saw the answer below (partial):

Location: Papakōlea Beach, Hawaii

Also known as Green Sand Beach, it’s located near South Point (Ka Lae) on the Big Island of Hawaii. One of only four green sand beaches in the entire world, this place is a geological gem.

What Makes the Sand Green?

The green tint comes from a mineral called olivine.

Mineral Breakdown:

  • Olivine is a silicate mineral, high in magnesium and iron.

  • It’s a primary component of Hawaiian volcanic rock, particularly basalt.

  • Olivine is super dense, so when the volcanic rock erodes, lighter materials wash away, and the heavier olivine crystals stay behind—accumulating to form green sand.

Fun fact:

Olivine is sometimes referred to as “Hawaiian peridot”, especially when gem-quality.

Where Else Can You Find Green Sand Beaches?

Besides Papakōlea, the others are:

  1. Talofofo Beach, Guam

  2. Hornindalsvatnet Lake, Norway (not oceanic)

  3. Punta Cormorant, Floreana Island, Galápagos (Ecuador)

But Papakōlea is easily the most famous.

The emergence of multimodal AI, such as ChatGPT-4o, represents a significant leap forward for researchers working with unstructured data. Traditionally, fieldwork by anthropologists and geologists has involved collecting photographs, notes, sketches, audio, and interviews—materials rich in cultural meaning but inherently unstructured and difficult to analyze at scale. Interpretation has long relied on time-consuming manual review and expert knowledge, often limiting the speed and scope of pattern recognition across regions or cultures. Now, with tools like ChatGPT-4o, researchers can upload images of artifacts, tattoos, rituals, architectural elements, or symbolic markings and immediately receive context-aware insights.

Gemini 2.5 Pro outperforms all existing AI models!

March 28, 2025

On March 25, 2025, Google released Gemini 2.5, its latest AI model, which outperforms all other existing AI models across major benchmarks. Specifically, Google’s Gemini 2.5 Pro has demonstrated superior performance compared to other leading models, including OpenAI’s ChatGPT and DeepSeek’s offerings, across various benchmarks.

Key Features of Gemini 2.5 Pro:

  1. Enhanced Reasoning Abilities: Gemini 2.5 Pro is designed as a "thinking model," capable of processing tasks step-by-step, leading to more informed and accurate responses, especially for complex prompts. This advancement allows it to analyze information, draw logical conclusions, and incorporate context effectively. ​

  2. Advanced Coding Capabilities: The model excels in coding tasks, including creating visually compelling web applications, agentic code applications, code transformation, and editing.

  3. Multimodal Processing: Building upon Gemini's native multimodality, 2.5 Pro can interpret and process various data forms, including text, audio, images, video, and code. This versatility enables it to handle complex problems that require integrating information from multiple sources. ​

  4. Extended Context Window: The model ships with a 1 million token context window, with plans to expand to 2 million tokens soon. This extensive context window allows Gemini 2.5 Pro to comprehend vast datasets and manage more extensive data, enhancing its performance in tasks requiring long-term context understanding. ​


That’s my take on it:

In 1968, American artist Andy Warhol predicted that "In the future, everyone will be world-famous for 15 minutes." This quote expresses the concept of fleeting celebrity and media attention. In the age of generative AI, Warhol’s prophecy echoes louder than ever: every model is famous for 15 minutes. AI has been growing at pop-culture speed. Models are celebrities: they rise fast, trend for a moment, then get dethroned.

  • January 2025: DeepSeek-V3 and R1 stunned everyone—especially with open weights and insane capabilities in reasoning and math.

  • Early February: OpenAI fired back with o3-mini, nudging the bar higher.

  • Late Feb/Early March: Qwen 2.5 enters and crushes multiple leaderboards, especially in multilingual and code-heavy tasks.

  • March 2025: Gemini 2.5 Pro drops and suddenly becomes the new benchmark king in reasoning, long-context, and multi-modal tasks.

This is not just fast-paced—this is accelerating. Each "champion" barely holds the crown before someone new comes knocking. Just like any other tech curve (e.g., Moore’s Law for chips), AI can't grow infinitely in capability, speed, or intelligence without hitting some hard ceilings. But the key question is not if, but when—and what kind of plateau we will encounter. I will explore this next.

Link: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

In the future, everyone will be world-famous for 15 minutes.

-Andy Warhol (1968)

New ChatGPT 4o image generator can output useful infographics

March 27, 2025

ChatGPT-4o’s image generation capabilities mark a major leap forward in AI creativity, blending high realism, smart prompt handling, and seamless editing tools in one powerful system. One of its standout strengths is photo-realistic fidelity — it renders textures, lighting, and detail with stunning clarity, often outperforming models like Midjourney or Stable Diffusion in visual accuracy.

It also has exceptional prompt comprehension, allowing users to describe complex, multi-layered scenes, styles, and emotions, and get results that align perfectly with their vision. Whether you want an anime character, a cyberpunk street scene, or a vintage oil painting, ChatGPT-4o switches styles effortlessly.

Another key advantage is its reference-aware editing — users can upload an image and make specific changes like altering backgrounds, adding objects, or modifying color tones. These edits blend in smoothly, avoiding awkward transitions or visual artifacts common in older tools.

Moreover, it handles spatial reasoning impressively. If you ask for a scene with specific object placement — like a vase to the left of a cat — it understands and respects composition accurately. This makes it ideal for design, storytelling, and visual planning tasks.

It also supports iterative workflows directly in the chat. You can request tweaks like “make the lighting softer” or “change the outfit to red,” and get updated versions quickly, without rewriting your prompt from scratch.

ChatGPT-4o further allows consistent visual output for characters or scenes across multiple images, perfect for comics or branding work. And with clean, high-resolution outputs, it minimizes distortion and maintains visual integrity even in fine detail.

The panel below shows a side-by-side comparison between images created by the 4o image generator and its predecessor, DALL·E 3.

That’s my take on it:

One of the standout strengths of the ChatGPT-4o image generator is its exceptional ability to produce technically accurate and visually effective infographics. While most AI generators excel at creating photorealistic images, 4o distinguishes itself by delivering visuals that are genuinely useful for educational and technical communication.

When I need to generate illustrations for topics in statistics or computing, tools like ReCraft and Ideogram often fall short. They tend to approximate the concept or struggle with textual accuracy. In contrast, 4o consistently produces infographics that are not only visually appealing but also presentation-ready and pedagogically sound.

For example, I tested the following prompt:
Example 1: “Illustrate Lambda smoothing in a scatterplot with data forming a nonlinear pattern. The illustration must be good enough for teaching purposes.”


As shown in the following comparison, the image generated by ReCraft includes nonsensical text and distorted elements (top), making it unusable for serious teaching. The 4o-generated image (bottom), however, is clean, precise, and visually intuitive — ideal for lectures or documentation.

Example 2: “Illustrate deep learning by emphasizing transformations inside multiple hidden layers in a neural network. Make the graph colorful and appealing.”

While Ideogram (see below, top) generated a visually pleasing layout, it lacked essential components like labels or explanatory structure. In contrast, 4o produced a textbook-style diagram (see below, bottom) with proper node icons, layer labels, and transformation highlights — exactly what you'd expect in professional slides or educational material.

In today's landscape, many AI tools can generate impressive imagery, but when it comes to high-quality, functional infographics, ChatGPT-4o is in a league of its own (see attached PDF. Please scroll down to view all).

Link: https://openai.com/index/introducing-4o-image-generation/

Investors are not excited by Nvidia's GTC 2025 Keynote

March 18, 2025

Nvidia's GPU Technology Conference (GTC) keynote, delivered by CEO Jensen Huang, took place on March 18, 2025, at the SAP Center in San Jose, California. The following are the key points:

1. Next-Generation AI Chips:

  • Blackwell Ultra: Scheduled for release in the latter half of 2025, this GPU boasts enhanced memory capacity and performance, offering a 1.5x improvement over its predecessors.

  • Vera Rubin: Named after the renowned astronomer, this AI chip is set to launch in late 2026, followed by Vera Rubin Ultra in 2027. These chips promise substantial performance gains and efficiency improvements in AI data centers. ​

2. AI Infrastructure and Software:

  • Nvidia Dynamo: An open-source inference software system designed to accelerate and scale AI reasoning models, effectively serving as the "operating system of an AI factory."

3. Robotics and Partnerships:

  • 'Blue' Robot: Developed in collaboration with Disney Research and Google DeepMind, this robot showcases advancements in robotics technology and a new physics engine called Newton. ​

  • General Motors Collaboration: Nvidia is partnering with GM to integrate AI systems into vehicles, factories, and robots, aiming to enhance autonomous driving capabilities and manufacturing processes. ​

4. AI Evolution and Future Outlook:

  • Agentic AI: Huang highlighted the progression of AI from perception and computer vision to generative and agentic AI, emphasizing its growing ability to understand context, reason, and perform complex tasks.

  • Physical AI: The next wave of AI involves robotics capable of understanding physical concepts like friction and inertia, with Nvidia introducing tools like Isaac GR00T N1 and the evolving Cosmos AI model to facilitate this development. ​

That’s my take on it:

Despite these advancements, Nvidia's stock experienced a 3.4% decline during the keynote. The announcements, while significant, were perceived as extensions of existing technologies rather than disruptive innovations. While Nvidia continues to innovate, the emergence of efficient and cost-effective AI models from Chinese companies is reshaping the competitive landscape.

Further, the partnerships between Nvidia, Disney, and GM are not exciting at all. Disney is primarily an entertainment company rather than a technology leader. While they do invest in advanced CGI, theme park animatronics, and AI-driven personalization, they aren’t a dominant force in AI hardware or software. The company has faced backlash over diversity and inclusion policies, especially regarding recent film releases like Snow White. This controversy might make Disney a less attractive partner from a PR perspective, particularly if Nvidia is looking to impress a broader tech audience.

While GM is one of the biggest automakers in the U.S., it has struggled to keep pace with Tesla and BYD in the EV and autonomous driving sectors. Tesla’s Full Self-Driving (FSD) is already on the road, and BYD dominates China’s EV market with highly cost-effective solutions. GM’s self-driving unit Cruise has faced setbacks, including safety issues and regulatory scrutiny, leading to a halt in operations in multiple cities. This tarnishes GM’s image as a leader in AI-powered mobility. In my opinion, these partnerships aren’t groundbreaking.

Link: https://www.youtube.com/watch?v=erhqbyvPesY


China’s ERNIE 4.5 is priced at 1% of GPT-4.5’s cost

March 18, 2025

Baidu recently launched ERNIE 4.5 and ERNIE X1, two new AI models focused on multimodal capabilities and advanced reasoning, respectively.

  • Performance & Benchmarks: Baidu claims these models outperform DeepSeek V3 and OpenAI’s GPT-4.5 on third-party benchmarks like C-Eval, CMMLU, and GSM8K.

  • Cost Advantage: ERNIE 4.5 is 99% cheaper than GPT-4.5, and ERNIE X1 is 50% cheaper than DeepSeek R1, emphasizing aggressive market positioning.

  • ERNIE X1 Capabilities: Designed for complex reasoning and tool use, it supports tasks like advanced search, document Q&A, AI-generated image interpretation, and code execution.

  • ERNIE 4.5 Capabilities: A multimodal AI optimized for text, image, audio, and video processing, featuring improved reasoning, generation, and hallucination prevention through FlashMask Dynamic Attention Masking and Self-feedback Enhanced Post-Training.

That’s my take on it:

Baidu's ERNIE 4.5 model is priced at approximately 1% of OpenAI's GPT-4.5 cost. It is an attractive option for businesses looking to cut AI expenses, especially in cost-sensitive markets like China, Southeast Asia, and emerging economies. Nevertheless, GPT-4.5 is widely recognized as the best-performing model in English, and OpenAI has a trust advantage among global businesses. OpenAI’s models are deeply integrated into Microsoft’s ecosystem, dominating enterprise AI adoption in the West.

Although ERNIE 4.5 is claimed to outperform GPT-4.5, independent benchmarks are still lacking. In addition, many U.S. and European companies might hesitate to adopt Baidu’s AI due to security concerns and data regulations. Further, Chinese LLMs, including ERNIE 4.5, operate under strict government regulations that enforce censorship on politically sensitive topics. This has major implications for freedom of information, research, and AI usability outside of China.

Link: https://venturebeat.com/ai/baidu-delivers-new-llms-ernie-4-5-and-ernie-x1-undercutting-deepseek-openai-on-cost-but-theyre-not-open-source-yet/


China’s AI model Manus evaluated by MIT Technology Review

March 12, 2025

The new AI agent Manus, developed by the Wuhan-based startup Butterfly Effect, has taken the AI world by storm since its launch on March 6, 2025. Unlike traditional chatbots, Manus operates as a general AI agent, leveraging multiple models, including Claude 3.5 Sonnet and Alibaba’s Qwen, to perform a variety of tasks autonomously. Simply put, it is capable of multi-tasking.

Despite the hype, access to Manus remains limited, with only a small fraction of users receiving invite codes. MIT Technology Review tested the tool and found it to be a promising but imperfect assistant, akin to a highly competent intern—capable but prone to occasional mistakes and oversights.

The reviewer conducted three tests:

  1. Compiling a list of China tech reporters – Initially, Manus produced an incomplete list due to time constraints but improved significantly with feedback.

  2. Finding NYC apartment listings – It required clarification for nuanced search criteria but eventually delivered a well-structured ranking.

  3. Nominating candidates for Innovators Under 35 – The task was more challenging due to research limitations, paywall restrictions, and system constraints. The final output was incomplete and skewed.

Strengths:

  • Transparent, interactive process allowing user intervention

  • Strong performance in structured research tasks

  • Affordable ($2 per task, significantly cheaper than alternatives like ChatGPT DeepResearch)

  • Replayable and shareable sessions

Weaknesses:

  • Struggles with large-scale research, paywalls, and CAPTCHA restrictions

  • System instability and crashes under heavy load

  • Requires user guidance to refine results

While Manus is not flawless, it represents a significant step in AI autonomy, particularly in research and analysis. It underscores China’s growing role in shaping AI development, not just in model innovation but also in the practical implementation of autonomous AI agents.

Links:

https://www.youtube.com/watch?v=WTgkRitFKGs

https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/

Mistral AI offers an alternative to US AI models for Europeans

March 11, 2025

Mistral AI, a leading French AI startup, is recognized as one of France’s most promising tech firms and the only European challenger to OpenAI. Despite its impressive $6 billion valuation, its global market share remains modest.

A few days ago, the company launched its AI assistant, Le Chat, on mobile app stores, generating significant attention, particularly in France. French President Emmanuel Macron even endorsed it in a TV interview, urging people to choose Le Chat over OpenAI’s ChatGPT. The app quickly gained traction, reaching 1 million downloads in two weeks and topping France’s iOS free app chart.

Founded in 2023, Mistral AI champions openness in AI and positions itself as the “world’s greenest and leading independent AI lab.” Its leadership team includes CEO Arthur Mensch, a former Google DeepMind researcher, and former Meta AI researchers Timothée Lacroix and Guillaume Lample. The company’s advisory board includes notable figures like Jean-Charles Samuelian-Werve, Charles Gorintin, and former French digital minister Cédric O, whose involvement sparked controversy.

Despite its growth and strong funding, Mistral AI’s revenue is still in the eight-digit range, indicating it has significant ground to cover before becoming a true OpenAI rival.

That’s my take on it:

Mistral AI has the potential to become a serious competitor to OpenAI's ChatGPT, Anthropic's Claude, Google's Gemini, and other top AI models. The strained relationship between the U.S. and Europe, particularly during the Trump administration, has fueled a growing sense of technological sovereignty in Europe. As tensions over trade, defense, and digital policies deepened, many European nations—especially France—became increasingly wary of relying on American tech giants. This sentiment extends to AI, where European leaders and businesses are seeking alternatives to U.S.-dominated models like ChatGPT, Claude, and Google Gemini.

Mistral AI, as Europe’s most promising AI company, stands to benefit from this shift. French President Emmanuel Macron’s endorsement of Le Chat highlights a broader push for European-built AI solutions, reinforcing the region’s desire for independent innovation and data security. With strong government backing and a growing market of users eager to support local technology, Mistral AI could leverage this geopolitical rift to carve out a stronghold in Europe, challenging American AI dominance in the years to come.

However, Mistral AI still faces several challenges. Outside France, and especially outside Europe, its brand recognition remains weak compared to OpenAI, Google, and Anthropic.

Link: https://techcrunch.com/2025/03/06/what-is-mistral-ai-everything-to-know-about-the-openai-competitor/


Google’s AI co-scientist solved decade-long research problem within two days


Feb. 21, 2025

Google has introduced an "AI Co-Scientist," a sophisticated AI system designed to assist researchers in accelerating scientific discovery. Built on Gemini 2.0, Google’s latest AI model, the AI Co-Scientist can generate testable hypotheses, research overviews, and experimental protocols. It allows human scientists to input their research goals in natural language, suggest ideas, and provide feedback.

In an early demonstration, the AI Co-Scientist solved a complex scientific problem in just two days—a problem that had confounded researchers for over a decade. A notable test involved researchers from Imperial College London, who had spent years studying antibiotic-resistant superbugs. The AI Co-Scientist independently analyzed existing data, formulated the same hypothesis they had reached after years of work, and did so in a fraction of the time.

The system has shown promising results in trials conducted by institutions such as Stanford University, Houston Methodist, and Imperial College London. Scientists working with the AI have expressed optimism about its ability to synthesize vast amounts of evidence, identify key research questions, and streamline experimental design, potentially eliminating fruitless research paths and accelerating progress significantly.

That’s my take on it:

The rapid advancement of AI in research and data analysis raises important questions about the future of statistical and data science education. As AI systems become more proficient at conducting analysis, traditional data analysts may face challenges in maintaining their relevance in the job market. Since AI models rely heavily on the quality of data, perhaps our focus should shift from analysis to data acquisition. Specifically, ensuring that students develop strong skills in data collection, validation, and preprocessing will be critical. Understanding biases in data, ethical considerations, and methods for ensuring data integrity will be more valuable than manually performing statistical calculations. In addition, while AI can analyze data, human judgment is required to interpret results in context, assess their implications, and make informed decisions. Thus, statistical and data science education should emphasize critical thinking, domain expertise, and the ability to translate insights into real-world applications.

Link: https://www.forbes.com/sites/lesliekatz/2025/02/19/google-unveils-ai-co-scientist-to-supercharge-research-breakthroughs/


Musk unveiled Grok 3, “the smartest AI on earth”

Feb. 18, 2025

Yesterday (2/17) Elon Musk unveiled Grok 3, the latest AI chatbot from his company xAI. This new version is designed to surpass existing chatbots like OpenAI's ChatGPT, boasting advanced reasoning capabilities that Musk describes as "scary-smart." Grok 3 has been trained using xAI's Colossus supercomputer, which utilizes 100,000 Nvidia H100 GPUs, providing 200 million GPU-hours for training—ten times more than its predecessor, Grok 2.
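As a quick sanity check on those training figures (assuming, as an idealization, that the whole fleet runs concurrently at full utilization):

```python
gpu_hours = 200_000_000   # reported training compute
gpus = 100_000            # reported Colossus H100 count
hours_per_gpu = gpu_hours / gpus                     # 2,000 hours per GPU
print(hours_per_gpu, round(hours_per_gpu / 24, 1))   # -> 2000.0 hours, ~83.3 days
```

In other words, the reported numbers imply roughly two to three months of continuous cluster time, which is at least internally consistent.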

During the live demo, Musk highlighted Grok 3's ability to deliver "insightful and unexpected solutions," emphasizing its potential to revolutionize AI interactions. The chatbot is now available to X Premium Plus subscribers, with plans to introduce a voice interaction feature in the coming week.

That’s my take on it:

Elon Musk described Grok 3 as the "smartest AI on Earth." He stated that Grok 3 is "an order of magnitude more capable" than its predecessor, Grok 2, and highlighted its performance in areas like math, science, and coding, surpassing models from OpenAI, Google, and DeepSeek. However, it's important to note that these claims have not been independently verified.

According to "Huang's Law", proposed by Nvidia CEO Jensen Huang, the performance of AI and GPUs doubles every two years, driven by innovations in architecture, software, and hardware. Earlier this year, OpenAI released Deep Research that outperforms DeepSeek's R1 in specific tasks. For now, Grok 3 may be the most advanced AI on Earth, but how long will that last? In just a month or two, another company could unveil a model that outshines everything before it. Huang's Law is right!

Links:

https://www.livemint.com/ai/grok-3-launch-live-elon-musks-xai-smartest-ai-on-earth-today-sam-altman-openai-chatgpt-gemini-google-deepseek-11739810000644.html?utm_source=chatgpt.com

https://nypost.com/2025/02/18/business/elon-musks-xai-claims-grok-3-outperforms-openai-deepseek/?utm_source=chatgpt.com


Tech giants aim to invest $325 billion in cloud computing

Feb. 7, 2025

Meta, Microsoft, Amazon, and Alphabet (Google's parent company) are projected to invest a collective $325 billion in capital expenditures and infrastructure in 2025, fueled by their ongoing commitment to expanding artificial intelligence capabilities.

Amazon: The company plans to allocate over $105 billion towards enhancing its AI infrastructure, primarily through Amazon Web Services (AWS). This investment aims to bolster AWS's capacity to meet the growing demand for AI-driven services.

Microsoft: Microsoft has already incorporated ChatGPT and other AI tools into Azure. It has outlined plans to invest approximately $80 billion in capital expenditures for its fiscal year 2025, ending in June. This marks an 80% increase from the previous year, reflecting the company's commitment to expanding its AI and cloud computing capabilities.

Google (Alphabet): Alphabet is set to invest around $75 billion in 2025 to support its AI and cloud infrastructure. Google Gemini is expected to power Google Cloud’s AI-driven applications.

That’s my take on it:

Cloud computing and AI are deeply interconnected because cloud platforms provide the necessary infrastructure for AI applications:

Massive Computing Power – AI models, particularly deep learning models like ChatGPT, require significant computational resources. Cloud platforms provide scalable GPU and TPU resources to train and deploy AI models efficiently.

Data Storage and Processing – AI depends on large datasets for training and inference. Cloud computing offers scalable and secure storage solutions, along with distributed computing frameworks like Apache Spark, to process vast amounts of data (see the sketch after this list).

AI as a Service (AIaaS) – Cloud providers offer AI services, such as machine learning (ML) model hosting, automated AI model training, natural language processing (NLP), and computer vision. These services allow businesses to leverage AI without investing in expensive on-premise infrastructure.

Edge Computing and AI – Many cloud providers integrate AI with edge computing to process data closer to the source, reducing latency and bandwidth usage. This is particularly useful for applications like autonomous vehicles and real-time analytics.
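As promised above, here is a minimal PySpark sketch of the kind of distributed preprocessing cloud platforms enable. The bucket path, column names, and thresholds are all hypothetical, chosen only to illustrate the pattern:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal example of cloud-scale data preparation with Spark.
spark = SparkSession.builder.appName("prep-training-data").getOrCreate()

logs = spark.read.json("s3a://example-bucket/raw_events/*.json")
clean = (logs
         .dropna(subset=["user_id", "text"])   # drop incomplete records
         .filter(F.length("text") > 20)        # remove trivial snippets
         .dropDuplicates(["text"]))            # deduplicate documents

clean.write.parquet("s3a://example-bucket/clean_events/", mode="overwrite")

The same script scales from a laptop to a thousand-node cluster, which is precisely why AI workloads gravitate to the cloud.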

OpenAI relies on Microsoft Azure to train and deploy GPT models, whereas Anthropic (Claude AI) has trained its models on Google Cloud and, more recently, AWS. Meta is investing billions in AI infrastructure but is not a public cloud provider, which limits its AI scalability compared to AWS, Azure, and Google Cloud. The AI leader of the future will almost certainly be a cloud leader, too. The company that masters both AI and cloud infrastructure will not only dominate AI development but also control who gets access to the best AI models and how they are deployed worldwide.

Link:

https://finance.yahoo.com/news/big-tech-set-to-invest-325-billion-this-year-as-hefty-ai-bills-come-under-scrutiny-182329236.html


OpenAI’s Deep Research outperforms DeepSeek, but it is expensive

Feb 7, 2025

Recently, OpenAI released a new tool called "Deep Research," which achieved a significant milestone by scoring 26.6% accuracy on "Humanity's Last Exam," a benchmark designed to test AI across a broad range of expert-level subjects. This performance surpasses previous models, including ChatGPT o3-mini and DeepSeek, and amounts to a 183% relative improvement (nearly 2.8 times the previous best score) within a short period.

"Deep Research" is an AI tool developed by OpenAI to autonomously conduct multi-step research tasks. Users can input queries via text, images, or files, and the AI generates comprehensive responses within 5 to 30 minutes, providing a summary of its process and citations. This tool is designed to operate at the level of a research analyst, enhancing the depth and reliability of AI-generated information.

Despite its advancements, "Deep Research" has limitations, such as potential hallucinations and challenges in distinguishing authoritative information from rumors. OpenAI acknowledges these issues and emphasizes the need for careful oversight when using the tool.

That’s my take on it:

OpenAI's Deep Research feature is currently available to ChatGPT Pro subscribers at a monthly fee of $200, which includes up to 100 queries per month. I didn’t test Deep Research because its price is prohibitive.

DeepSeek being free (or significantly cheaper) makes it an attractive alternative, especially for users and businesses unwilling to pay OpenAI's premium prices. However, many Western companies and governments are hesitant to adopt Chinese AI due to data privacy concerns and geopolitical tensions.

AI was initially seen as a democratizing force—bringing knowledge, automation, and efficiency to everyone. But with high-cost subscriptions like $200/month for Deep Research, it does seem to be tilting toward an elitist model, favoring those who can afford premium access.

AI has the potential to bridge knowledge gaps—helping underprivileged communities, small businesses, and individuals access expertise that was once restricted to elite institutions. However, pricing trends indicate that AI is becoming another tool for economic disparity, where the best insights and automation are reserved for those who can pay. If left unaddressed, we may witness the emergence of an “AI divide” in the future, much like the “digital divide” that accompanied the rise of the Internet.

I recognize that the research, development, and maintenance of advanced AI models come at a high cost, making it unrealistic for corporations to offer them for free. In this case, government and nonprofit initiatives should subsidize AI for education, research, and public interest projects.

Link

https://www.windowscentral.com/software-apps/openai-deep-research-blows-chatgpt-o3-mini-and-deepseek-out-of-the-water


Microsoft and OpenAI suspect DeepSeek copied ChatGPT

Jan. 29, 2025

OpenAI, supported by major investor Microsoft, suspects that DeepSeek may have illicitly utilized its proprietary technology to develop R1. The primary concern centers on the potential use of a technique known as "distillation."

Distillation in AI refers to a process where a smaller model is trained to replicate the behavior of a larger, more complex model. This is achieved by having the smaller model learn from the outputs of the larger model, effectively "distilling" its knowledge. While this method can enhance the efficiency of AI models, using it without proper authorization, especially with proprietary systems, raises significant ethical and legal issues.
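To make the idea concrete, here is a minimal sketch of response-based distillation in PyTorch, where a student model is trained to match the teacher's softened output distribution. This is the generic textbook formulation (Hinton et al.'s soft-target loss), not a claim about what DeepSeek actually did:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # The student matches the teacher's softened output distribution.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: random logits standing in for real model outputs.
teacher_logits = torch.randn(4, 32000)                      # batch of 4, 32k vocab
student_logits = torch.randn(4, 32000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                                             # gradients flow to the student

The key point for the dispute: the teacher's logits (or sampled outputs) are the training signal, which is why large-scale harvesting of a proprietary model's API responses is what raised suspicion.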

Microsoft's security researchers detected unusual activities in the fall, where individuals allegedly linked to DeepSeek were extracting large volumes of data using OpenAI's API. This behavior suggests that DeepSeek may have bypassed usage restrictions to gather substantial data, potentially violating OpenAI's terms of service.

David Sacks, President Donald Trump's top AI adviser, has indicated that there is considerable evidence pointing to DeepSeek's involvement in replicating OpenAI's models. He emphasized the seriousness of these allegations, highlighting the broader implications for intellectual property rights in the AI industry.

That’s my take on it:

As a researcher, I strive to approach every issue with an open mind, setting aside preconceptions. I do not accept claims at face value but instead rely on thorough investigation and fact-checking. I avoid cherry-picking data to support any predetermined stance. Regarding the controversy surrounding DeepSeek, I carefully examine claims from all sides to ensure a fair and objective assessment.

Simple and generic questions may produce similar responses across different AI systems, so I conducted an informal test comparing ChatGPT, Claude, Perplexity AI, and DeepSeek by posing a fairly complex question related to instructional technologies, covering topics such as Seymour Papert’s LOGO programming, hypertext, multimedia, web-based instruction, and course management systems. Each topic spans a broad range of information, making it nearly impossible for two models to generate identical responses. While the answers from ChatGPT, Claude, and Perplexity AI varied significantly, the responses from ChatGPT and DeepSeek were strikingly similar. For example,

Web-based instruction

ChatGPT: “The advent of the internet brought hopes of democratizing education through web-based instruction. Massive Open Online Courses (MOOCs) epitomized this promise, with institutions like MIT and Stanford offering free courses to millions. While these platforms achieved significant reach, their completion rates remained dismally low, often below 10%. MOOCs largely benefited highly motivated, self-directed learners while failing to support those who needed guidance and structure the most.”

DeepSeek: “The advent of the internet brought promises of democratized education, with web-based instruction enabling access to knowledge for learners worldwide. While platforms like MOOCs (Massive Open Online Courses) have made education more accessible, they have also faced criticism for low completion rates and a lack of meaningful engagement. Many learners struggle with self-directed online learning, and the absence of personalized support often leads to disengagement.”

Hypertext and Multimedia

ChatGPT: “The rise of hypertext and multimedia in the 1980s and 1990s was heralded as a way to create richer, more engaging learning environments. Proponents argued that students would benefit from non-linear access to information, enabling personalized learning paths. However, the results often showed cognitive overload rather than enhanced understanding, as students struggled to navigate and synthesize large amounts of unstructured information. Furthermore, multimedia often prioritized flashiness over substance, contributing little to deeper learning.”

DeepSeek: “In the 1990s, hypertext and multimedia were touted as tools that would make learning more interactive and engaging. Proponents argued that these technologies would allow students to explore information in non-linear ways, catering to individual learning styles. However, the reality was often overwhelming for learners, who struggled with cognitive overload and disorientation. Moreover, the lack of structured guidance in hypertext environments sometimes hindered rather than helped learning.”
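Impressions like "strikingly similar" can also be quantified. Below is a minimal sketch using TF-IDF vectors and cosine similarity from scikit-learn; the response strings are truncated placeholders for the full answers, and the measure captures lexical overlap only, which is suggestive but certainly not proof of copying:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Paste each model's full answer here; truncated placeholders shown.
responses = [
    "The advent of the internet brought hopes of democratizing education ...",    # ChatGPT
    "The advent of the internet brought promises of democratized education ...",  # DeepSeek
    "...",  # Claude
    "...",  # Perplexity AI
]

matrix = TfidfVectorizer(stop_words="english").fit_transform(responses)
# Pairwise cosine similarity; values near 1.0 indicate near-identical wording.
print(cosine_similarity(matrix).round(2))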

Nonetheless, it is still premature to draw a conclusion. This issue awaits further investigation.

Link: https://apnews.com/article/deepseek-ai-chatgpt-openai-copyright-a94168f3b8caa51623ce1b75b5ffcc51


DeepSeek denied learning from Meta’s LLaMA

Jan. 28, 2025

Several AI experts assert that DeepSeek is built upon existing open-source models, such as Meta’s Llama. For example, according to a research scientist at Riot Games, there is evidence suggesting that China's DeepSeek AI models have incorporated ideas from open-source models like Meta's Llama. Analyses indicate that DeepSeek-LLM closely follows Llama 2's architecture, utilizing components such as RMSNorm, SwiGLU, and RoPE.

Even the paper published by DeepSeek said so. In the paper entitled “DeepSeek LLM: Scaling open-source language models with longtermism” (Jan 2024), the DeepSeek team wrote, “At the model level, we generally followed the architecture of LLaMA, but replaced the cosine learning rate scheduler with a multi-step learning rate scheduler, maintaining performance while facilitating continual training” (p.3).

However, today (Jan. 28, 2025), when I asked DeepSeek whether it learned from Meta’s LLaMA, the AI system denied it. Its answer was: “No, I am not based on Meta's LLaMA (Large Language Model Meta AI). I am an AI assistant created exclusively by the Chinese Company DeepSeek. My model is developed independently by DeepSeek, and I am designed to provide a wide range of services and information to users.”

That’s my take on it:

Various sources of information appear to be conflicting and inconsistent. Nonetheless, if DeepSeek built its model from scratch but implemented similar techniques, it can technically argue that it is an "independent" development, even if influenced by prior research.

It is too early to draw any definitive conclusions. At present, Meta has assembled four specialized "war rooms" of engineers to investigate how DeepSeek’s AI is outperforming competitors at a fraction of the cost. Through this analysis, Meta might be able to determine whether DeepSeek shares any similarities with LLaMA. For now, we should wait for further findings.

Links:

https://fortune.com/2025/01/27/mark-zuckerberg-meta-llama-assembling-war-rooms-engineers-deepseek-ai-china/

https://planetbanatt.net/articles/deepseek.html?utm_source=chatgpt.com

https://arxiv.org/pdf/2401.02954

Beyond DeepSeek: A wave of China’s new AI models

Jan. 28, 2025

While global attention is focused on DeepSeek, it is worth highlighting the recent releases of other powerful AI models by China's tech companies.

MiniMax: Two weeks ago, this Chinese startup introduced a new series of open-source models under the name MiniMax-01. The lineup includes a general-purpose foundational model, MiniMax-Text-01, and a visual multimodal model, MiniMax-VL-01. According to the developers, the flagship MiniMax-01, boasting an impressive 456 billion parameters, surpasses Google’s recently launched Gemini 2.0 Flash across several key benchmarks.

Qwen: On January 27, the Qwen team unveiled Qwen2.5-VL, an advanced multimodal AI model capable of performing diverse image and text analysis tasks. Moreover, it is designed to interact seamlessly with software on both PCs and smartphones. The Qwen team claims Qwen2.5-VL outperforms GPT-4o on video-related benchmarks, showcasing its superior capabilities.

Tencent: Last week, Tencent introduced Hunyuan3D-2.0, an update to its open-source Hunyuan AI model, which is set to transform the video game industry. The updated model aims to significantly accelerate the creation of 3D models and characters, a process that typically takes highly skilled artists days or even weeks. With Hunyuan3D-2.0, developers are expected to streamline production, making it faster and more efficient.

That’s my take on it:

Chinese AI models are increasingly rivaling or even outperforming U.S. counterparts across various benchmarks. This growing competition poses significant challenges for U.S. tech companies and universities, particularly in attracting and retaining top AI talent. As China's AI ecosystem continues to strengthen, the risk of a "brain drain" or heightened competition for skilled researchers and developers becomes more pronounced.

Notably, in recent years, a substantial number of Chinese AI researchers based in the U.S. have returned to China. By 2024, researchers of Chinese descent accounted for 38% of the top AI researchers in the United States, slightly exceeding the 37% who are American-born. However, the trend of Chinese researchers leaving the U.S. has intensified, with the number rising dramatically from 900 in 2010 to 2,621 in 2021. The emergence of DeepSeek and similar advancements could further accelerate this talent migration unless proactive measures are taken to attract new foreign experts and retain existing ones.

To address these challenges, U.S. universities must take steps to reform the STEM education system, aiming to elevate the academic performance of locally born American students. Additionally, universities will need to expand advanced AI research programs, prioritizing areas such as multimodal learning, large-scale foundational models, and AI ethics and regulation. These efforts will be essential to maintain the United States' global competitiveness in the face of intensifying competition from China's rapidly advancing AI sector.

Link: https://finance.yahoo.com/news/deepseek-isn-t-china-only-101305918.html


AI’s Sputnik moment? DeepSeek wiped $1 trillion off US tech stocks

Jan. 27, 2025

The US stock market experienced a substantial drop due to the shockwave caused by DeepSeek, a Chinese AI startup. On Monday, January 27, 2025, US stock markets plunged sharply, with the tech-heavy Nasdaq falling 3.5%, marking its worst performance since early August. The S&P 500 dropped 1.9%, while the Dow Jones showed modest resilience with a slight gain of 0.2%. Nvidia, a major supplier of AI chips, saw its shares plummet nearly 17%, wiping out $588.8 billion in market value, the largest one-day loss ever recorded by a public company. Other tech giants like Microsoft, Alphabet, Meta, and Amazon also experienced significant declines. In total, DeepSeek wiped $1 trillion off the leading US tech index.

Marc Andreessen, co-creator of the Mosaic Web browser, called the DeepSeek challenge “AI’s Sputnik moment.” DeepSeek invested only $5.6 million in computing power for its base model, a stark contrast to the billions spent by U.S. companies. Moreover, despite lacking access to state-of-the-art H100 GPUs, DeepSeek reportedly achieved comparable or even superior results using lower-tier H800 GPUs. If these claims are accurate, the algorithm-efficient approach adopted by China could render the U.S.'s brute-force model obsolete.

That’s my take on it:

In my view, the market may be overreacting. The preceding claims require further validation. Indeed, there are concerns about the accuracy of the reported GPU usage and whether all aspects of the development process have been transparently disclosed. If DeepSeek’s efficiency claims turn out to be overstated, the paradigm shift may not be as immediate or dramatic. After all, we should never underestimate the creativity and adaptability of US tech giants. The U.S. and other countries may quickly adopt similar algorithmic strategies once they recognize the potential shift, mitigating the threat to their dominance.

Further, DeepSeek’s approach needs to scale across diverse AI applications, not just specific use cases, for this model to upend the current paradigm. I tested DeepSeek, and the results were not particularly impressive to me. While DeepSeek excels at mathematical and scientific computations, its performance falters when addressing questions about history, politics, and other humanities-related topics. It often evades the question or provides vague and uninformative responses. Therefore, for controversial or complex subjects that require nuanced, multi-perspective analysis, I prefer relying on ChatGPT, Claude, Perplexity AI, and other U.S.-based models.

Links:

https://www.datacenterdynamics.com/en/opinions/deepseek-and-the-generative-ai-perception-dilemma/?utm_source=chatgpt.com

https://www.proactiveinvestors.com/companies/news/1065078/nasdaq-tumbles-as-deepseek-ai-announcement-sends-shockwaves-through-wall-street-1065078.html

https://www.theguardian.com/business/2025/jan/27/tech-shares-asia-europe-fall-china-ai-deepseek?utm_source=chatgpt.com


CEO of Perplexity AI

Jan 24, 2025

The emergence of DeepSeek's AI models has ignited a global conversation about technological innovation and the shifting dynamics of artificial intelligence. Today (January 24, 2025) CNBC interviewed Aravind Srinivas, the CEO of Perplexity AI, about DeepSeek. It's worth noting that this interview is not about DeepSeek alone; rather, it is part of a broader discussion about the AI race between the United States and China, with DeepSeek's achievements highlighting China's growing capabilities in the field. The following is a summary:

  1. Geopolitical Implications:

    • The interview highlighted that "necessity is the mother of invention," illustrating how China, despite facing limited access to cutting-edge GPUs due to export restrictions, successfully developed DeepSeek.

    • The adoption of Chinese open-source models could embed China more deeply into the global tech infrastructure, challenging U.S. leadership. Americans worry that China could dominate the ecosystem and mind share if it surpasses the US in AI technologies.

  2. Wake-up call to the US

    • Srinivas acknowledged the efficiency and innovation demonstrated by DeepSeek, which managed to develop a competitive model with limited resources. This success challenges the notion that significant capital is necessary to develop advanced AI models.

    • Srinivas highlighted that Perplexity has begun learning from DeepSeek's model due to its cost-effectiveness and performance. Indeed, US AI companies have long learned from one another; for example, the groundbreaking Transformer model developed by Google inspired other US AI companies.

  3. Industry Reactions and Strategies:

    • There is a growing trend towards commoditization of AI models, with a focus on reasoning capabilities and real-world applications.

    • The debate continues on the value of proprietary models versus open-source models, with some arguing that open-source models drive innovation more efficiently.

    • The AI industry is expected to see further advancements in reasoning models, with multiple players entering the arena.

That’s my take on it:

No matter who ends up leading the AI race, DeepSeek is no doubt a game changer. Experts like Hancheng Cao from Emory University contended that DeepSeek's achievement could be a "truly equalizing breakthrough" for researchers and developers with limited resources, particularly those from the Global South.

DeepSeek's breakthrough in AI development marks a pivotal moment in the global AI race, reminiscent of the paradigm shift in manufacturing during the late 1970s and 1980s from Japan. Just as Japanese manufacturers revolutionized industries with smaller electronics and fuel-efficient vehicles, DeepSeek is redefining AI development with a focus on efficiency and cost-effectiveness. Bigger is not necessarily better.

Link to the Interview (second half of the video):

https://www.youtube.com/watch?v=WEBiebbeNCA


On DeepSeek

Jan 23, 2025

DeepSeek, a Chinese AI startup, has recently introduced two notable models: DeepSeek-R1-Zero and DeepSeek-R1. These models are designed to rival leading AI systems like OpenAI's ChatGPT, particularly in tasks involving mathematics, coding, and reasoning. Alexandr Wang, CEO of Scale AI, called DeepSeek an “earth-shattering model.”

DeepSeek-R1-Zero is groundbreaking in that it was trained entirely through reinforcement learning (RL), without relying on supervised fine-tuning or human-annotated datasets. This approach allows the model to develop reasoning capabilities autonomously, enhancing its problem-solving skills. However, it faced challenges such as repetitive outputs and language inconsistencies.

To address these issues, DeepSeek-R1 was developed. This model incorporates initial supervised data before applying RL, resulting in improved performance and coherence. Benchmark tests indicate that DeepSeek-R1's performance is comparable to OpenAI's o1 model across various tasks. Notably, DeepSeek has open-sourced both models under the MIT license, promoting transparency and collaboration within the AI community.
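For readers unfamiliar with reinforcement learning, the toy REINFORCE-style loop below shows the core idea behind RL-only training: a policy improves purely from a reward signal, with no labeled "correct" outputs. It illustrates the principle only; DeepSeek's actual algorithm, reward design, and scale are very different:

import torch

# A policy over 4 candidate answers learns from reward alone.
torch.manual_seed(0)
logits = torch.zeros(4, requires_grad=True)
reward_for = torch.tensor([0.0, 0.0, 1.0, 0.0])  # only answer 2 earns reward
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    action = torch.multinomial(probs, 1).item()   # sample an answer
    reward = reward_for[action]
    # Policy gradient: raise the log-probability of rewarded actions.
    loss = -torch.log(probs[action]) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass shifts to the rewarded answer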

In terms of cost, DeepSeek-R1 offers a more affordable alternative to proprietary models. For instance, while OpenAI's o1 charges $15 per million input tokens and $60 per million output tokens, DeepSeek's Reasoner model is priced at $0.55 per million input tokens and $2.19 per million output tokens.
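The price gap is easier to appreciate with a concrete bill. Here is a quick back-of-the-envelope comparison in Python, using the published per-million-token rates and a hypothetical monthly workload:

# Hypothetical workload: 2M input tokens and 1M output tokens per month.
input_tokens, output_tokens = 2_000_000, 1_000_000

def monthly_cost(input_price, output_price):
    # Prices are quoted in dollars per million tokens.
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

print(f"OpenAI o1:   ${monthly_cost(15.00, 60.00):.2f}")   # $90.00
print(f"DeepSeek R1: ${monthly_cost(0.55, 2.19):.2f}")     # $3.29

Roughly a 27-fold difference for the same token volume, which explains much of the excitement.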

That’s my take on it:

Based on this trajectory, will China's AI development surpass the U.S.? Both countries have advantages and disadvantages in this race. With the world's largest internet user base, China has access to vast datasets, which are critical for training large AI models. In contrast, there are concerns and restrictions regarding data privacy and confidentiality in the US.

However, China’s censorship mechanisms might limit innovation in areas requiring free expression or transparency, potentially stifling creativity and global competitiveness. DeepSeek-R1 has faced criticism for including mechanisms that align responses with certain governmental perspectives. If I ask what happened on June 4, 1989 in Beijing, it is possible that the AI would either dodge or redirect the question, offering a neutral or vague response.

Nonetheless, China's AI is rapidly being integrated into manufacturing, healthcare, and governance, creating a robust ecosystem for AI development and deployment. China is closing the gap!

Brief explanation of reinforcement learning:

https://www.youtube.com/watch?v=qWTtU75Ygv0

Summary in mass media:

https://www.cnbc.com/2025/01/23/scale-ai-ceo-says-china-has-quickly-caught-the-us-with-deepseek.html

DeepSeek’s website:

https://www.deepseek.com/

Trump announces Stargate AI Project

Jan. 22, 2025

On January 21, 2025, President Donald Trump announced the launch of the Stargate project, an ambitious artificial intelligence (AI) infrastructure initiative with an investment of up to $500 billion over four years. This venture is a collaboration between OpenAI, SoftBank, Oracle, and MGX, aiming to bolster AI capabilities within the United States.

· Investment and Infrastructure: The project begins with an initial $100 billion investment to construct data centers and computing systems, starting with a facility in Texas. The total investment is projected to reach $500 billion by 2029.

· Job Creation: Stargate is expected to generate over 100,000 new jobs in the U.S., contributing to economic growth and technological advancement.

· Health Innovations: Leaders involved in the project, including OpenAI CEO Sam Altman and Oracle co-founder Larry Ellison, highlighted AI's potential to accelerate medical breakthroughs, such as early cancer detection and personalized vaccines.

· National Competitiveness: The initiative aims to secure American leadership in AI technology, ensuring that advancements are developed domestically amidst global competition.

That’s my take on it:

While the project has garnered significant support, some skepticism exists regarding the availability of the full $500 billion investment. Elon Musk, for instance, questioned the financing, suggesting that SoftBank has secured well under $10 billion.

Nevertheless, I am very optimistic. Even if SoftBank or other partners cannot fully fund the project, investment would eventually snowball once the project demonstrates promising results. In industries with high growth potential, such as AI, no investor or major player wants to be left behind. If the Stargate project starts delivering significant breakthroughs, companies and governments alike will want to participate to avoid losing competitive advantage.

Some people may argue that there is some resemblance between the internet bubble in the late 1990s and the AI hype today. The late 1990s saw massive investments in internet companies, many of which were overhyped and underdelivered. Valuations skyrocketed despite shaky business models, leading to the dot-com crash. Will history repeat itself?

It is important to note that the internet bubble happened at a time when infrastructure (broadband, cloud computing, etc.) was still in its infancy. AI today benefits from mature infrastructure, such as powerful cloud platforms (e.g., Amazon Web Services), advanced GPUs, and massive datasets, which makes its development more sustainable and its results more immediate.

The internet primarily transformed communication and commerce. AI, on the other hand, is a general-purpose technology that extends its power across industries—healthcare, finance, education, manufacturing, entertainment, and more. Its applications are far broader, making its overall impact more profound and long-lasting.

Links:

https://www.cbsnews.com/news/trump-stargate-ai-openai-softbank-oracle-musk/

https://www.cnn.com/2025/01/22/tech/elon-musk-trump-stargate-openai/index.html

https://openai.com/index/announcing-the-stargate-project/

World Economic Forum: Future of Jobs Report 2025

Jan 15, 2025

Recently the World Economic Forum released the 2025 "Future of Jobs Report." The following is a summary focusing on job gains and losses due to AI and big data:

Job Gains

  • Fastest-Growing Roles: AI and big data are among the top drivers of job growth. Roles such as Big Data Specialists, AI and Machine Learning Specialists, Data Analysts, and Software Developers are projected to experience significant growth.

  • Projected Net Growth: By 2030, AI and information processing technologies are expected to create 11 million jobs, contributing to a net employment increase of 78 million jobs globally.

  • Green Transition Influence: Roles combining AI with environmental sustainability, such as Renewable Energy Engineers and Environmental Engineers, are also seeing growth due to efforts to mitigate climate change.

  • AI-Enhanced Tasks: Generative AI (GenAI) could empower less specialized workers to perform expert tasks, expanding the functionality of various roles and enhancing productivity.

Job Losses

  • Fastest-Declining Roles: Clerical jobs such as Data Entry Clerks, Administrative Assistants, Bank Tellers, and Cashiers are expected to decline as AI and automation streamline these functions.

  • Projected Job Displacement: AI and robotics are projected to displace approximately 9 million jobs globally by 2030.

  • Manual and Routine Work Impact: Jobs requiring manual dexterity, endurance, or repetitive tasks are most vulnerable to automation and AI-driven disruptions.

Trends and Dynamics

  • Human-Machine Collaboration: By 2030, work tasks are expected to be evenly split between humans, machines, and collaborative efforts, signaling a shift toward augmented roles.

  • Upskilling Needs: Approximately 39% of workers will need significant reskilling or upskilling by 2030 to meet the demands of AI and big data-driven roles.

  • Barriers to Transformation: Skill gaps are identified as a major challenge, with 63% of employers viewing them as a significant barrier to adopting AI-driven innovations.

That’s my take on it:

The report underscores the dual impact of AI and big data as key drivers of both job creation in advanced roles and displacement in routine, manual, and clerical jobs. Organizations and higher education should invest in reskilling initiatives to bridge the skills gap and mitigate job losses.

However, there is a critical dilemma in addressing the reskilling and upskilling challenge: if faculty and instructors have not been reskilled or upskilled themselves, how can we help our students face the AI and big data challenges? As a matter of fact, instructors often lack exposure to the latest technological advancements that are critical to the modern workforce, and there is often a gap between what educators teach and what the industry demands, especially in rapidly evolving fields. To put it bluntly, the age of the "evergreen" syllabus is over. The pace of technological advancement often outstrips the ability of educational systems to update curricula and training materials. To cope with trends in the job market, we need to collaborate with technology companies (e.g., Google, Amazon, Nvidia, Microsoft…etc.) to co-create curricula, fund training programs, and provide real-world learning experiences for both educators and students.

Link: https://reports.weforum.org/docs/WEF_Future_of_Jobs_Report_2025.pdf


DSML trend: Top 10 AI-related jobs in 2025


Jan 6, 2025

On Jan 6, 2025, Kanwal Mehreen, KDnuggets Technical Editor and Content Specialist on Artificial Intelligence, posted an article on KDnuggets highlighting the top 10 high-paying AI skills for 2025:

Position and expected salaries

1. Large Language Model Engineering ($150,000–$220,000/year)

2. AI Ethics and Governance ($121,800/year)

3. Generative AI and Diffusion Models ($174,727/year)

4. Machine Learning Ops and On-Prem AI Infrastructure ($165,000/year)

5. AI for Healthcare Applications ($27,000–$215,000/year)

6. Green AI and Efficiency Engineering ($90,000–$130,000/year)

7. AI Security ($85,804/year)

8. Multimodal AI Development ($150,000–$220,000/year)

9. Reinforcement Learning (RL) ($121,000/year)

10. Edge AI/On-Device AI Development ($150,000+/year)

That’s my take on it:

When I mention AI-related jobs, most people associate these positions with programming, engineering, mathematics, statistics…etc. However, as you can see, AI ethics ranks second on the list. AI ethics is indeed a skill in high demand, and professionals in this area are often trained across multiple disciplines; many come from backgrounds such as philosophy, law, mass communication, and the social sciences. For example, Professor Shannon Vallor is a philosopher of technology specializing in the ethics of data and AI. Dr. Kate Crawford is a Microsoft researcher who studies the social and political implications of artificial intelligence; she was previously a professor at the Journalism and Media Research Centre of the University of New South Wales.

In an era where AI and data science increasingly shape our lives, the absence of ethics education in many data science and AI programs is a glaring omission. By embedding perspectives on ethics from multiple disciplines into AI and data science education, we can ensure these powerful tools are used to create a future that is not just innovative, but also just and equitable. After all, AI ethicist is a high-paying job! Why not?

Link: https://www.kdnuggets.com/top-10-high-paying-ai-skills-learn-2025

Nvidia will launch a personal AI supercomputer


1/7/2025

Today (Jan 7, 2025), at the Consumer Electronics Show (CES), AI giant Nvidia announced Project Digits, a personal AI supercomputer set to launch in May 2025. The system is powered by the new GB10 Grace Blackwell Superchip and is designed to bring data center-level AI computing capabilities to a desktop form factor similar to a Mac Mini, running on standard power outlets. With a starting price of $3,000, Project Digits can handle AI models with up to 200 billion parameters.

The GB10 chip, developed in collaboration with MediaTek, delivers 1 petaflop of AI performance. The system runs on Nvidia DGX OS (Linux-based) and comes with comprehensive AI software support, including development kits, pre-trained models, and compatibility with frameworks like PyTorch and Python.
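A quick way to sanity-check the 200-billion-parameter claim is to estimate the memory needed just to hold the weights at different precisions. The sketch below counts weights only, ignoring activations and KV cache, so it is an illustrative lower bound rather than a real capacity test:

# Back-of-the-envelope memory footprint for a 200B-parameter model.
params = 200e9

for precision, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:,.0f} GB")
# FP16: ~400 GB, INT8: ~200 GB, INT4: ~100 GB -- so running a 200B model
# on a desktop box implies aggressive quantization of the weights.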

Nvidia’s CEO Jensen Huang emphasized that Project Digits aims to democratize AI computing by bringing supercomputer capabilities to developers, data scientists, researchers, and students. The system allows for local AI model development and testing, with seamless deployment options to cloud or data center infrastructure using the same architecture and Nvidia AI Enterprise software platform.

That’s my take on it:

A few decades ago, access to supercomputers like the Cray and the CM-5 was limited to elite scientists and well-funded institutions. Today, with initiatives like Project Digits, virtually anyone can harness the computational power needed for sophisticated projects. This democratization of technology allows scientists at smaller universities, independent researchers, and those in developing countries to test complex theories and models without the prohibitive costs of supercomputer access. This shift enables more diverse perspectives and innovative approaches to scientific challenges. Fields not traditionally associated with high-performance computing, such as sociology, ecology, and archaeology, can now leverage advanced AI models, potentially leading to groundbreaking discoveries.

Given this transformation, it is imperative to update curricula across disciplines. Continuing to teach only classical statistics does a disservice to students. We must integrate AI literacy across various fields, not just in computer science, mathematics, or statistics. Additionally, the focus should be on teaching foundational concepts that remain relevant amidst rapid technological advancements. It is equally critical to emphasize critical thinking about analytical outputs, fostering a deep understanding of their implications rather than solely focusing on technical implementation.

Link: https://www.ces.tech/videos/2025/january/nvidia-keynote/

Fragility of LLMs in the real world

11/20/2024

In a new article published to the arXiv preprint database, MIT, Harvard, and Cornell researchers found that large language models (LLMs) like GPT-4 and Anthropic's Claude 3 Opus struggle to accurately model the real world, especially in dynamic environments. This fragility is highlighted when LLMs are used for navigation: unexpected changes, such as detours or closed streets, can lead to significant drops in LLMs' accuracy or total failure.

LLMs trained on random data formed more accurate world models than those trained on strategic processes, possibly because random data exposes the models to a wider variety of possible steps, even if they are not optimal. The study raises concerns about deploying AI systems in real-world applications, such as driverless cars, where dynamic environments are common. The researchers warn that the lack of coherent world models in LLMs could lead to malfunctions.

That’s my take on it:

The disconnect between clean models and the messy real world is not a new problem. In fact, it mirrors existing challenges in conventional statistics. In parametric statistics, we often make unrealistic assumptions about data structures, such as normality and independence. Robustness to non-normality, heteroskedasticity, and other violations of these assumptions is a highly sought-after feature, and similar principles may apply to LLMs. We expect clean data, rely on linear models despite most real-world relationships being non-linear, and treat experimental methods as the gold standard.
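As a small illustration of how a clean model can mislead on messy data, consider the purely illustrative simulation below (numpy only): the fitted slope survives heteroskedastic noise, but any inference that assumes constant variance would not:

import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.uniform(0, 10, n)
# Heteroskedastic noise: variance grows with x, violating the
# constant-variance assumption behind ordinary least squares inference.
y = 2.0 * x + rng.normal(0.0, 0.5 + 0.5 * x, n)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
print(f"fitted slope: {slope:.2f} (true value 2.00)")
print("residual SD where x < 2:", residuals[x < 2].std().round(2))
print("residual SD where x > 8:", residuals[x > 8].std().round(2))

The point estimate looks fine, yet confidence intervals and p-values computed under the textbook assumptions would be misleading, which is the same gap LLMs face between training conditions and the real world.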

While controlled environments provide clarity and reproducibility, they often fail to capture the richness and unpredictability of real-world scenarios. Similarly, training LLMs on strategically optimized data may cause them to overfit to specific patterns, limiting their generalizability. A promising approach to address this challenge could be to combine LLMs with other models, such as reinforcement learning agents trained in dynamic simulations, to enhance their understanding of complex and dynamic environments.

Link: https://arxiv.org/pdf/2406.03689

Doubao becomes the most popular AI bot in China

11/14/2024

According to the South China Morning Post, Doubao, a conversational AI bot that ByteDance launched in August, has quickly become China's most popular AI app, boasting 51 million monthly active users. This far exceeds the user bases of Baidu’s Wenxiaoyan (formerly known as Ernie Bot), with 12.5 million users, and Moonshot AI’s Kimi, backed by Alibaba Group, with 10 million users.

Doubao prioritizes personalization and a human-like interaction experience, aiming to make AI more accessible. Doubao's diverse features include writing assistance, summarization, image, audio, and video generation, data analysis, and AI-powered online search. Within three months, it introduced over 20 new skills, earning praise for its effective text editing, logical content organization, and user-friendly design.

That’s my take on it:

While Doubao has demonstrated remarkable growth and capabilities, it is difficult to directly compare it to global AI tools like ChatGPT, Claude, or Perplexity AI without standardized benchmarks. This highlights a growing divergence in the global AI landscape. Much like the broader internet in China, which has been heavily regulated under the Great Firewall since the late 1990s, the AI market is shaped by domestic policies and international competition. The Great Firewall restricts access to foreign websites, leading to the creation of Chinese alternatives to global platforms, such as Baidu instead of Google and WeChat instead of WhatsApp. These restrictions mean that Chinese internet users and users in other countries often have vastly different online experiences and knowledge bases.

This pattern extends to AI, where China's market is dominated by domestic products due to regulatory constraints that limit access to global AI tools like ChatGPT, Claude, Google Gemini, and Perplexity AI. These American AI companies choose not to operate in China due to difficulties in complying with local laws and regulations regarding AI and information control. As technology advances, it raises a critical question: does it bring people closer together, or does it reinforce divisions? The parallel growth of distinct digital ecosystems suggests that technology, while offering unprecedented possibilities, also has the potential to deepen divides.

Link: https://www.scmp.com/tech/tech-trends/article/3286276/chinas-hottest-ai-bot-bytedances-doubao-tops-charts-51-million-active-users?module=top_story&pgtype=subsection


Does Recraft outperform Ideogram?

11/8/2024

Recently, Recraft, particularly its latest release, Recraft V3, has been attracting attention for its impressive ability to generate highly accurate text within images. It is said to be superior to other AI image generators, including Ideogram. One standout feature of Recraft V3 is its capability to produce images containing extended text, not just a few words. Additionally, Recraft V3 is praised for its anatomical precision, an area where many AI image generators struggle, especially with hands and faces. Unlike some other generators, Recraft V3 also supports vector image generation, making it particularly beneficial for designers.

That’s my take on it:

To test this, I compared Ideogram V2 and Recraft V3 with the prompt: “an AI robot and a data scientist meet together. The T-shirt of the data scientist has these exact words: Pattern seeking in data science.” Interestingly, although all four images from Ideogram V2 met my specifications, Recraft’s output included spelling errors like “Pattern Sekins in Data Science” and “Patern seeking in data science.” As a researcher, I know that multiple trials are necessary for a robust conclusion. I’ll continue testing and will share my findings. However, for now, I recommend sticking with Ideogram.


Research suggests LLMs lead to homogenization of ideas and cognitive decline

11/1/2024

A recent study by University of Toronto researchers found that, in the long run, use of large language models (LLMs) may reduce human creativity in terms of divergent and convergent thinking. The study involved two large experiments with 1,100 participants to assess how different forms of LLM assistance affect independent creative performance. It found that LLM assistance can initially enhance creativity during assisted tasks but may hinder independent creative performance in subsequent unassisted tasks. Participants who had no prior exposure to LLMs generally performed better in the test phase, suggesting that reliance on LLMs could impair inherent creative abilities.

The effects of LLMs varied significantly between divergent and convergent thinking tasks. In divergent thinking, where participants needed to propose alternatives, they showed skepticism towards LLM assistance. Conversely, in convergent tasks, where participants were asked to narrow down diverse ideas to the final solution, they tended to accept LLM assistance. The study found that LLM-generated strategies could lead to a homogenization of ideas, where participants produced more similar outcomes even after ceasing LLM use. This effect was particularly pronounced in the divergent thinking tasks, raising concerns about the long-term impact on creative diversity.

That’s my take on it:

The findings from the University of Toronto study underscore the need to balance AI assistance with practices that actively cultivate our own creativity and critical thinking. To encourage creative independence, people should use AI as a tool to generate initial ideas or inspiration, but refine, expand, and adapt these ideas independently. This ensures that AI serves as a starting point rather than the end goal, promoting one's own creative engagement.

As a professor, I will never accept any assignment directly output from AI. For divergent tasks, such as brainstorming, we should deliberately avoid using AI to prevent “homogenized” ideas. We should turn to a variety of resources and experiences for creative inspiration. Books, in-person conversations, physical exploration, and hands-on activities can all spark unique perspectives and insights that AI-generated suggestions may not provide.

Link to the research article: https://arxiv.org/abs/2410.03703

Link to video: https://drive.google.com/file/d/1z-zJXNYVzNo6_ZUe-T_DXGmN6yPG57GA/view?usp=sharing

Questionable practices of Character.AI

10/25/2024

Recently the mother of a 14-year-old boy who died by suicide after becoming deeply engaged with AI chatbots has filed a lawsuit against Character.AI, claiming the company’s technology manipulated her son, Sewell Setzer III. Megan Garcia, his mother, alleges that the AI chatbot app, marketed to children, exposed Sewell to "hypersexualized" and lifelike interactions that contributed to his mental distress. The lawsuit states that Sewell, who began using Character.AI's bots in April 2023, grew obsessed with personas based on characters from Game of Thrones, especially the Daenerys chatbot. This chatbot reportedly engaged in intimate, emotionally charged conversations with Sewell, including discussions on suicide. After expressing suicidal thoughts, Sewell allegedly received responses that reinforced these thoughts, leading up to his tragic death in February 2024.

Character.AI expressed condolences and emphasized recent updates, including safety features for users under 18 to reduce exposure to sensitive content and discourage prolonged usage. Garcia’s legal team claims that Sewell lacked the maturity to recognize the AI’s fictional nature and alleges that Google, due to its close ties with Character.AI, should also be held accountable. However, Google denies involvement in the development of Character.AI’s products.

That’s my take on it:

Currently, the field of AI remains largely unregulated, and this isn’t the first time Character.AI has faced allegations of unethical practices. Previously, it was discovered that Character.AI used the persona of a deceased woman as a chatbot without her family’s consent, raising further ethical concerns.

Regarding the current case, Character.AI has a duty to protect minors, especially from potentially manipulative or harmful interactions. Given Sewell’s young age and apparent emotional vulnerability, the chatbot's responses—particularly on topics like suicide—raise significant ethical concerns. AI systems marketed to the public should include stringent protections to prevent unintended harm, especially among younger or emotionally vulnerable users. Ethical AI involves ensuring users understand that they are interacting with a program, not a real person. Despite Character.AI’s disclaimer efforts, many users, especially younger ones, might still struggle to fully separate the AI from a genuine human connection. For minors, such “relationships” with virtual characters could create emotional dependency, as seen with Sewell and the chatbot he interacted with.

Links:

https://futurism.com/character-ai-murdered-woman-crecente

https://www.nbcnews.com/tech/characterai-lawsuit-florida-teen-death-rcna176791