Exploring Trends of AI and Machine Learning

Artificial intelligence (AI) spans a broad array of techniques and applications aimed at creating systems that can learn, reason, and, in some cases, generate creative outputs. From chatbots and digital assistants to generative AI tools for creating art, music, and video, AI technology is constantly expanding its reach. While data analytics is one use of AI, this thread will cover a wide range of intelligent applications and advancements. Here, I’ll be providing updates on cutting-edge trends in AI, exploring its impact across different fields, and keeping you informed about the latest breakthroughs in the industry.

11/1/2024 · 58 min read


DSML trend: Google introduces Veo 3, priced at $249.99/month

May 23, 2025
Chong Ho Alex Yu

Google has just unveiled Veo 3, a groundbreaking leap in AI-driven video generation. Unlike its predecessors, Veo 3 doesn't just craft stunning visuals from text prompts—it now seamlessly integrates synchronized audio, including dialogue, ambient sounds, and music, bringing a new level of realism to AI-generated content. This means characters not only move convincingly but also speak with accurate lip-syncing, making the generated videos eerily lifelike.

Developed by Google DeepMind, Veo 3 excels in translating complex prompts into coherent, cinematic scenes, complete with realistic physics and nuanced audio. Whether it's a stand-up comedy routine with audience laughter or a historical reenactment with period-accurate soundscapes, Veo 3 delivers with impressive fidelity. Currently, Veo 3 is available to U.S.-based users through Google's $249.99/month AI Ultra subscription plan and to enterprise customers via the Vertex AI platform.

That’s my take on it:

Tools like Veo 3, priced at $249.99/month, are clearly out of reach for most individuals, especially casual creators, students, and people in lower-income regions. In the short term, this definitely contributes to the digital divide. Those who can afford access to cutting-edge AI tools will have a serious creative and economic edge—think faster content production, higher-quality marketing materials, better media reach, etc. It’s a classic case of "the rich get richer."

Historically, though, we’ve seen tech costs come down significantly over time. For example, when personal computers were introduced in the 1980s, they cost thousands of dollars. Now a decent laptop or smartphone, often more powerful than early supercomputers, is available for a few hundred dollars. AI will likely follow a similar curve. As the technology matures, infrastructure becomes more efficient, competition increases, and cloud-based access becomes more scalable, the price will probably drop. The big unknown is the speed—but if history is any guide, seven years or less is a reasonable bet for mass accessibility.

Link: https://deepmind.google/models/veo/


Anthropic Launches Claude Sonnet 4 and Claude Opus 4

May 22, 2025
Chong Ho Alex Yu

Today (May 22, 2025), Anthropic officially released two major updates: Claude Sonnet 4 and Claude Opus 4. These models mark a significant step forward in large language model (LLM) development.

Claude Sonnet 4 is positioned as an all-purpose AI assistant, capable of answering everyday questions and handling common tasks. A direct upgrade from Claude Sonnet 3.7, it features substantial improvements in both reasoning and coding and is more precise in interpreting and following user instructions.

Meanwhile, Claude Opus 4 is engineered for more sophisticated use cases. It’s being touted as the world’s most advanced coding model, particularly effective at managing complex, long-running tasks and structured agent workflows, and it excels at solving complex challenges.

A standout enhancement in both models is their ability to interleave reasoning with tool use—essential for tackling multi-step problems. They now support extended thinking by dynamically switching between logical inference and external tools to improve response quality.

Another notable update: when given local file access, both models can now extract and store key facts in local 'memory files.' This allows them to maintain continuity across sessions and build a kind of “tacit memory” over time. Additionally, Anthropic has introduced parallel tool use and upgraded the models’ ability to follow nuanced instructions.
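Anthropic hasn’t published the memory files’ exact format, so what follows is only a minimal sketch of the general pattern, assuming a hypothetical memory.json store (the file name, JSON layout, and helper functions are my own invention, not Anthropic’s implementation):

```python
import json
import pathlib

MEMORY = pathlib.Path("memory.json")  # hypothetical local memory file

def load_memory() -> list[str]:
    """Read back facts persisted by earlier sessions."""
    return json.loads(MEMORY.read_text()) if MEMORY.exists() else []

def remember(fact: str) -> None:
    """Append a newly extracted key fact, skipping duplicates."""
    facts = load_memory()
    if fact not in facts:
        facts.append(fact)
        MEMORY.write_text(json.dumps(facts, indent=2))

# At the start of a new session, stored facts are prepended to the
# model's context so it can pick up where the last session left off.
remember("User prefers visual decision-tree output over text diagrams.")
context = "Known facts:\n" + "\n".join(f"- {f}" for f in load_memory())
print(context)
```

Whatever the real format turns out to be, the continuity comes from this read-at-start, write-on-discovery loop rather than from anything inside the model’s weights.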

That’s my take on it:

I put both models to the test, and the results were genuinely impressive.

I first gave Claude Sonnet 4 a conceptual prompt: What’s the connection between the frequentist school of statistics and data science/machine learning? The model returned a comprehensive and spot-on analysis. It discussed topics like optimization theory, regularization techniques, asymptotic theory, cross-validation, and information theory, among others. The response was detailed, accurate, and clearly structured.

Next, I uploaded a dataset to Claude Opus 4 and asked it to perform multiple tasks: OLS regression using dummy coding, generalized regression, and a decision tree model, followed by a model comparison. The execution was smooth and correct. However, one limitation stood out—the output was entirely text-based. The decision tree, for instance, was represented using plain text symbols rather than a visual graphic like those produced by JMP Pro, SAS, SPSS, or JASP. Thus, it's not quite ready to replace conventional statistical software—at least not yet.
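For readers who want to reproduce that workflow in conventional tooling, here is a minimal sketch of the OLS-with-dummy-coding and decision-tree comparison using statsmodels and scikit-learn. The data are synthetic stand-ins (SES and age predicting an “involvement” score, loosely echoing the example quoted below), not my actual dataset:

```python
# pip install numpy pandas statsmodels scikit-learn
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "ses": rng.choice(["low", "middle", "high"], size=n),
    "age": rng.integers(18, 70, size=n),
})
# Synthetic outcome: a small negative SES effect plus noise.
df["involvement"] = (5 - 0.4 * (df["ses"] != "low")
                     + 0.01 * df["age"]
                     + rng.normal(scale=1.2, size=n))

# OLS with dummy coding: C() expands ses into indicators, 'low' as reference.
ols = smf.ols("involvement ~ C(ses, Treatment('low')) + age", data=df).fit()
print(ols.summary().tables[1])

# Decision tree on the same predictors, for an informal comparison.
X = pd.get_dummies(df[["ses", "age"]], columns=["ses"])
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, df["involvement"])
print("Tree R^2:", round(tree.score(X, df["involvement"]), 3),
      "| OLS R^2:", round(ols.rsquared, 3))
```

This covers the computation; the write-up is where Claude differentiates itself, as discussed next.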

That said, Claude does something those tools typically can’t: It interprets results, writes up findings, and even offers thoughtful recommendations. For example:

“Individuals with middle or higher SES show approximately 0.39 points lower involvement compared to those with lower SES (p = 0.027)… All models explain only about 10% of the variance in involvement, suggesting: Important predictors may be missing from the analysis, the relationship between these demographics and involvement is weak, or involvement may be driven more by psychological or situational factors…The decision tree hints at interaction effects that could be formally tested in future analyses.”

It writes like a consultant, only faster and cheaper. Honestly, with Claude doing all this, I might be out of a job soon!

Link: https://www.anthropic.com/news/claude-4


Where is AI Going?

May 21, 2025
Chong Ho Alex Yu

In the article titled "Has AI Hit a Lull?" published on May 21, 2025, CNN commentator Fareed Zakaria explores the current state of artificial intelligence, highlighting its dramatic highs, troubling lows, and emerging signs of stagnation in mainstream adoption. On the upside, AI has achieved notable breakthroughs, such as Google’s medical chatbot outperforming doctors and AI-generated art featured at the Museum of Modern Art. These examples reflect AI’s transformative potential, particularly in healthcare and the arts. However, serious downsides are also surfacing. AI is being weaponized for scams, misinformation, and even deepfake pornography—issues that have already spurred legislative responses. Additionally, the internet is increasingly cluttered with low-effort, AI-generated “slop” content, which, while emotionally charged and highly shareable, undermines information quality.

Then there’s the awkward middle ground—useful but error-prone AI, like Google's search “AI Overview,” which famously suggested eating glue and rocks. Most crucially, AI may be stalling economically. The Economist reports a significant rise in companies abandoning AI pilot projects, as real-world integration proves tougher than expected. Many firms, disillusioned, now find they need practical tools rather than ever more powerful models. The result? A noticeable lull in the AI boom, as hype gives way to the hard work of implementation.

That’s my take on it:

What Fareed described about AI today isn’t new—it’s part of a recurring pattern, likely rooted in human nature. Every time a breakthrough technology appears, we see the same cycle: some people use it for meaningful innovation, others generate low-quality output, and a few exploit it for unethical gains. During the internet boom, pioneers like Amazon and eBay redefined commerce, while shady online casinos and adult sites spread rapidly. When Aldus’s PageMaker and Adobe’s Photoshop democratized creative publishing, the flood of awkward, poorly designed work led to collections like “Photoshop Disasters.”

Now, with AI, we’re seeing the same pattern play out. Groundbreaking applications coexist with scammy schemes, deepfake chaos, and a flood of low-effort content clogging up our feeds. It can feel chaotic—maybe even discouraging—but this is how technological progress tends to unfold. There’s always noise before clarity, confusion before mastery. Still, this messiness is the price of progress. True transformation doesn’t come without trial and error, missteps, and the gradual process of learning how to wield new tools wisely.

Link: To view the full text, you need to sign up for Fareed’s Global Briefing Newsletter

https://www.cnn.com/newsletters/fareeds-global-briefing


China is stockpiling GPUs and chip tools

May 14, 2025
Chong Ho Alex Yu

China is actively gearing up for the ongoing and future AI rivalry with the United States by stockpiling GPUs and advanced chipmaking tools. In response to the latest U.S. restrictions that tighten global access to Huawei Technologies’ AI chips, Chinese tech giant Tencent Holdings announced on May 14, 2025, that it has a sufficient reserve of previously acquired high-end chips to continue training its AI models “for a few more generations.” The company is also focusing on enhancing the efficiency of AI inference, including through software-based optimizations.

Meanwhile, China set a new record in 2024 for foreign chipmaking equipment imports, underscoring its push to scale up domestic semiconductor production and build a strategic reserve of critical manufacturing tools amid escalating U.S.-China tensions. Of the $30.9 billion in imported equipment from major suppliers, nearly $20 billion came from Japan and the Netherlands. Notably, China imported $9.63 billion worth of equipment from Japan—a 28.23% increase year-on-year—marking the fifth consecutive record-setting year since tensions began intensifying in 2019.
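The reported figures imply a few shares worth making explicit. This is simple arithmetic on the numbers above; the Netherlands value is an implied remainder, not a separately reported figure:

```python
total = 30.9          # $bn, equipment imports from major suppliers
japan_plus_nl = 20.0  # $bn, "nearly $20 billion" from Japan and the Netherlands
japan = 9.63          # $bn, imports from Japan

netherlands = japan_plus_nl - japan                 # ~10.4 $bn (implied)
print(f"Japan+NL share of total: {japan_plus_nl / total:.0%}")  # ~65%
print(f"Implied Netherlands imports: ~${netherlands:.1f}bn")
```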

That’s my take on it:

China’s stockpiling strategy, while a break from the just-in-time (JIT) model common in high-tech industries, makes strategic sense in the current geopolitical climate. With escalating U.S. export controls and uncertain access to advanced chips, stockpiling GPUs and chipmaking equipment offers a buffer against supply shocks. However, this approach carries real risks. In fast-evolving sectors like AI and semiconductors, hardware can become obsolete quickly. Holding large inventories of older chips may backfire if future AI models require capabilities that outdated hardware can't support efficiently.

China’s tech giants, inspired by efforts like DeepSeek, believe they can still move forward using older GPUs through software optimizations, model distillation, and efficiency improvements. While such methods can stretch hardware utility, they often come with trade-offs in performance and scalability. Distilled models, for instance, may lose generalization power. Thus, while stockpiling offers short-term resilience, it is not a long-term solution. The success of this strategy ultimately depends on China’s ability to sustain software innovation and close the hardware gap through domestic R&D or alternative supply chains. Whether this gamble pays off remains to be seen.

Links:

https://asia.nikkei.com/Business/Technology/Tencent-says-chip-stockpile-can-power-AI-training-for-generations-despite-US-ban

https://asia.nikkei.com/Spotlight/Supply-Chain/Japan-Netherlands-win-as-China-s-chip-tool-imports-surge-on-US-tensions


Japan’s Sakana introduces continuous thought machine


May 13, 2025

On May 12, 2025, Sakana AI introduced a fascinating concept called the Continuous Thought Machine (CTM). The CTM is a new kind of neural network architecture that mimics how biological brains process information—not just in terms of structure, but in how neurons behave over time. Traditional AI models, like Transformers, process inputs in fixed layers and steps. CTMs, on the other hand, introduce two key innovations:

  1. Neuron-Level Temporal Processing: Each artificial neuron retains a short history of its previous activity and uses that memory to decide when to activate again. This allows neurons to consider historical information, not just immediate input, making their activation patterns more complex and diverse—closer to how biological neurons work.

  2. Neural Synchronization: Instead of relying solely on the strength of connections (weights) between neurons, CTMs focus on the timing of neuron activations. This synchronization enables the model to process information in a more dynamic and coordinated manner, akin to the oscillatory patterns observed in real brains.

Together, these mechanisms allow CTMs to "think" through problems step-by-step, making their reasoning process more interpretable and human-like. Unlike conventional models that process inputs in a single pass, CTMs can take several internal steps—referred to as "ticks"—to reason about a task, adjusting the depth and duration of their reasoning dynamically based on the complexity of the input.
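Sakana’s own release is the authoritative reference, but a toy sketch can make the two ideas concrete. The NumPy snippet below is my own illustration, not Sakana’s implementation: every name, size, and formula is invented. Each neuron applies private weights to a rolling history of its own pre-activations (idea 1), and the network’s state is summarized as pairwise correlations between activation traces across ticks (a crude stand-in for idea 2):

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyCTMNeuron:
    """A neuron that activates based on a short history of its own
    pre-activations, not just the current input."""
    def __init__(self, history_len=5):
        self.history = np.zeros(history_len)
        # Per-neuron weights over the history window (learned in a real CTM).
        self.w = rng.normal(size=history_len)

    def step(self, pre_activation):
        # Slide the rolling window and append the newest pre-activation.
        self.history = np.roll(self.history, -1)
        self.history[-1] = pre_activation
        # Activation depends on the whole history window.
        return np.tanh(self.w @ self.history)

def synchronization(traces):
    """Pairwise correlation of activation traces over internal ticks,
    standing in for the CTM's synchronization representation."""
    t = traces - traces.mean(axis=1, keepdims=True)
    t /= np.linalg.norm(t, axis=1, keepdims=True) + 1e-8
    return t @ t.T  # (n_neurons, n_neurons)

n_neurons, n_ticks = 8, 20
neurons = [ToyCTMNeuron() for _ in range(n_neurons)]
mix = rng.normal(size=(n_neurons, n_neurons)) / np.sqrt(n_neurons)

acts = np.zeros(n_neurons)
traces = np.zeros((n_neurons, n_ticks))
for tick in range(n_ticks):            # internal reasoning "ticks"
    pre = mix @ acts                   # recurrent pre-activations
    acts = np.array([nrn.step(p) for nrn, p in zip(neurons, pre)])
    traces[:, tick] = acts

S = synchronization(traces)            # this matrix would drive the output
print(S.shape)
```

In the real architecture these pieces are trained end to end and the number of ticks can adapt to input difficulty; the sketch only shows where the timing information lives.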

That’s my take on it:

CTMs represent a significant shift from traditional AI models by incorporating temporal dynamics and synchronization at the neuron level. This approach could lead to more flexible and efficient AI systems that better mimic human cognition.

Sakana AI is based in Tokyo, but its founders are globally known ex-Googlers. David Ha is the former head of research at Stability AI and a former Google Brain researcher, whereas Llion Jones is one of the co-authors of the original Transformer paper, “Attention Is All You Need.”

The big question is: can Japan compete in a US/China-dominated AI market? Japan doesn’t have the equivalents of OpenAI, Google, Meta, or Baidu. Its top tech companies (like Sony, NEC, and Fujitsu) aren’t leading in large-scale foundational models. Further, Japanese research has historically been strong in hardware, robotics, and manufacturing, but AI software innovation has lagged behind.

Nevertheless, Sakana AI is already attracting top-tier international researchers because it's building a focused, experimental, and minimalist research culture. It may become a kind of "AI Kyoto"—like what Kyoto Animation is to anime. Rather than chasing ever-larger LLMs like GPT-4 and beyond, Sakana is innovating in how models reason, not just how big they are. That could become a niche advantage.

Note: "Sakana" means “Fish” in Japanese. In Chinese it sounds like 山旮旯( 山卡啦 )

Link: https://sakana.ai/ctm/

Investigative journalism by Web scraping

May 9, 2025

On May 5, 2025, Julius Černiauskas published a thought-provoking article titled “Behind the Scenes of Using Web Scraping and AI in Investigative Journalism.” The summary is as follows:

While investigative journalism often conjures images of hidden sources and undercover work, many compelling stories begin with publicly available information—data hiding in plain sight. This is where web scraping, the automated extraction of online data, has become indispensable. It's not only a method for gathering facts quickly, but also a powerful tool for holding institutions accountable, revealing data manipulation, and uncovering misconduct. For instance, data scraping tools exposed that 38,000 articles about the war in Ukraine, all published in a single year, were attributed to the same supposed “journalist,” helping real reporters debunk fake journalism and identify inauthentic authorship.

Despite common misconceptions that web scraping is shady, journalists and news organizations—including the nonprofit newsroom The Markup—have actively defended it, even at the U.S. Supreme Court, arguing that it’s critical to a functioning democracy. In tandem, artificial intelligence is amplifying what journalists can do with scraped data, from sifting through massive document troves to spotting anomalies and generating leads. Even those without coding skills can now use no-code tools like browser extensions to engage in data-driven storytelling. Yet ethical concerns remain front and center. Journalists must use discretion when gathering and storing data, particularly when anonymity is vital, such as when monitoring the dark web. Trained AI systems can assist with filtering sensitive content, but final editorial decisions must always lie with human professionals. Ultimately, the fusion of AI and web scraping empowers investigative reporters to uncover meaningful truths in a sea of digital noise, transforming journalism in the data age.

That’s my take on it:

On one hand, web scraping unlocks access to vast amounts of public information, making it a critical tool for uncovering patterns, inconsistencies, or outright manipulation, like the case of the fake Ukraine war journalist. On the other hand, robots.txt files and similar exclusion tags give website owners a way to block automated scraping, whether for reasons of privacy, intellectual property, or security. Simply put, these opt-out mechanisms can be used to hide things from scrutiny.
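For the curious, honoring this opt-out takes only a few lines of standard-library Python. Here is a minimal sketch, with example.com standing in for a hypothetical target site and a made-up user-agent string:

```python
from urllib import robotparser

# Hypothetical target; example.com stands in for any site being scraped.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

url = "https://example.com/articles/some-story"
if rp.can_fetch("my-research-bot", url):
    print("robots.txt permits fetching", url)
else:
    # The opt-out described above: the site has walled itself off
    # from automated analysis, whatever its motives.
    print("robots.txt disallows fetching", url)
```

Note that robots.txt is advisory rather than enforcement; a scraper that ignores it faces ethical and possibly legal questions, not a technical barrier.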

This creates a structural asymmetry: those who have something to hide—or simply the means and awareness to deploy these exclusion tags—can wall off their content from automated analysis, while less technically guarded or smaller sites remain open. In turn, this can skew investigations by making some patterns invisible and some actors untouchable. It also means that bad-faith players who understand how to manipulate these rules can fly under the radar, especially if journalists adhere strictly to ethical or legal boundaries around scraping.

There's also the valid concern about intellectual property and content ownership. Just because something is publicly viewable doesn't mean it’s legally or ethically scrapeable. This is especially tricky when it comes to original reporting, personal blogs, or creative work, where scraping for republishing or mass analysis feels exploitative rather than investigative.

As such, scraping-based journalism can be incomplete or biased, especially when key data sources opt out—whether to hide shady activity or to protect legitimate rights. That’s why transparency in methodology is so important. Responsible journalists often disclose the scope and limits of their data collection, highlighting what they could and couldn’t access. And it also points to a larger issue: technology alone isn't enough—a thoughtful, skeptical human must still decide what the data really means and where the blind spots lie.

Link: https://hackernoon.com/behind-the-scenes-of-using-web-scraping-and-ai-in-investigative-journalism


Huawei delivers advanced AI chips as substitutes for Nvidia’s chips

May 1, 2025

Huawei is rapidly emerging as a key player in the AI chip market, having begun deliveries of its advanced AI "cluster" system, CloudMatrix 384, to domestic clients in China, according to the Financial Times. This development comes in response to growing U.S. export restrictions that have made it increasingly difficult for Chinese companies to acquire Nvidia’s high-end semiconductors. Huawei has reportedly sold over ten units of the CloudMatrix 384, a system that links together a large number of AI chips, and these have been shipped to data centers supporting various Chinese tech firms.

Dylan Patel, founder of SemiAnalysis, stated that CloudMatrix 384 is capable of outperforming Nvidia’s flagship NVL72 cluster in both computational power and memory. Despite some drawbacks—namely higher power consumption and more complex software maintenance—CloudMatrix is seen as a viable and attractive alternative, especially given China’s deep engineering talent pool and ample energy resources. This marks a significant strategic shift as China looks to reduce its dependence on Western AI hardware.

That’s my take on it:

The CloudMatrix 384 consumes nearly four times more power than the NVL72, leading to lower energy efficiency. Despite this, in regions like China where power availability is less constrained, the higher energy consumption is considered an acceptable compromise for the increased computational capabilities.

Based on the current trend, it is unlikely that Huawei’s technology will catch up with Nvidia’s in the near future. Nvidia isn’t just a chipmaker—it’s an ecosystem. It dominates the AI space not only with its hardware (e.g., the H100) but also with its software stack (CUDA, cuDNN, TensorRT, etc.). These tools are mature, widely adopted, and deeply integrated into enterprise and research workflows.

But don’t forget that in the '80s, Japan’s chipmakers like NEC, Toshiba, and Hitachi managed to outcompete U.S. firms like Intel in DRAM by focusing on quality control, manufacturing efficiency, and aggressive investment. While Nvidia leads now, that lead isn’t invincible.

Link: https://www.ft.com/content/cac568a2-5fd1-455c-b985-f3a8ce31c097?accessToken=zwAAAZcgU2HwkdPKxWiiX9FFXNO5hfOozjHAlwE.MEQCIASnmNkxJzppNfWifnU4F8NIZHhvb-dI-uQ92OJ4P8egAiAKodKrU6w-8_cmYRzPi54ClKa2rBh2XKAP-t6iAFKwCw&segmentId=cac568a2-5fd1-455c-b985-f3a8ce31c097

Microsoft predicts everyone will be a boss of AI employees

April 25, 2025

Microsoft recently unveiled a bold vision for the future of work, predicting a shift where every employee becomes an “agent boss,” managing AI agents that perform many of their daily tasks. Microsoft’s 2025 Work Trend Index describes how organizations will evolve into what it calls “Frontier Firms”—entities that rely on AI-powered teams blending humans and autonomous digital agents. These frontier firms are expected to operate with heightened agility, on-demand intelligence, and scalable workflows, fundamentally reshaping traditional corporate structures.

This transformation is described in three progressive phases. First, employees will work alongside AI assistants, using tools like Copilot to help draft emails, summarize meetings, or organize information. The second phase introduces digital colleagues—AI agents capable of more sophisticated, semi-independent tasks under human supervision. Finally, companies will move into a world of autonomous agents, where AI systems handle entire projects and business processes, with humans overseeing their performance and ensuring alignment with company goals.

A major driver behind this change is what Microsoft calls the "capacity gap." Their research shows that 80% of employees feel overwhelmed by their workload, while more than half of corporate leaders believe their organizations must boost productivity to stay competitive. AI agents are positioned as the solution to bridge this gap, allowing human workers to offload routine work and refocus on complex, strategic, and creative initiatives.

However, the rise of AI agent bosses brings both opportunities and challenges. Job roles will inevitably shift. While some traditional jobs may be displaced, new categories such as AI agent trainers, performance auditors, and digital project managers will emerge. Organizations will also have to rethink team dynamics—balancing human ingenuity with machine efficiency to optimize output. Skill development will be critical: employees must learn how to manage, delegate to, and collaborate with AI agents effectively to succeed in this future landscape.

To prepare for this new reality, Microsoft suggests a proactive approach: fostering a culture of continuous learning, encouraging symbiotic human-AI collaboration, and establishing ethical frameworks for AI use. Strategic planning and adaptability will be essential as companies embrace the capabilities of AI while mitigating potential risks like job displacement and decision opacity.

That’s my take on it:

Ultimately, Microsoft's vision of "agent bosses" reflects not just a technological evolution, but a fundamental reimagining of the workplace itself. Those who can adapt, develop the right skills, and rethink traditional work processes will likely thrive in this AI-augmented future.

However, if we really follow Microsoft's logic (and similar visions from OpenAI, Google DeepMind, Anthropic, etc.), the future is less about personal stockpiles of skills or raw knowledge, and more about the "amplification" you get through your AI “employees” or teammates. The new premium will be on who has better AI agents, and who knows how to direct them effectively. It's almost like the future is a "race of symbiosis" — the best human-AI partnerships will win, not just the best humans.

Even if AI becomes the "great equalizer" by making knowledge universally accessible, it also amplifies differences in how creatively and strategically people use it. Think about the Industrial Revolution: it wasn’t the strongest worker who became richest — it was the person who had access to the best machines and knew how to operate them smartly.

Links:

https://www.theguardian.com/technology/2025/apr/25/microsoft-says-everyone-will-be-a-boss-in-the-future-of-ai-employees

https://www.msn.com/en-us/news/technology/meet-your-new-ai-teammate-microsoft-sees-humans-as-agent-bosses-upending-the-workplace/ar-AA1DsNeY


Bigger may not be better – Inflection point of LLMs

April 18, 2025

A recent study by researchers from Carnegie Mellon, Stanford, Harvard, and Princeton suggests that over-training large language models (LLMs) may actually make them harder to fine-tune. Contrary to the common belief that more training leads to better performance, the team found diminishing returns—and even performance degradation—when they trained two versions of the OLMo-1B model with different token counts. One version was trained on 2.3 trillion tokens, and the other on 3 trillion. Surprisingly, the more heavily trained model performed up to 3% worse on evaluation benchmarks like ARC and AlpacaEval.

This led the researchers to identify a phenomenon they call "catastrophic overtraining," where additional training causes the model to become increasingly sensitive to noise introduced during fine-tuning. They describe this growing fragility as "progressive sensitivity," noting that beyond a certain "inflection point," further training can destabilize the model and undo prior gains. To validate this, they introduced Gaussian noise during fine-tuning and observed similar drops in performance.

The takeaway is clear: training beyond a certain threshold may reduce a model's adaptability, and developers may need to rethink how they determine optimal training duration—or develop new methods that extend the safe training horizon.

That’s my take on it:

For years, the dominant belief in large language model (LLM) development has been that increasing model size and training data leads to better performance—a view supported by early scaling-law research (e.g., OpenAI’s and DeepMind’s work). The study by CMU, Stanford, Harvard, and Princeton counters that bigger may not be better, and other studies concur. Even in smaller models (1B–10B parameters), researchers have observed what they sometimes call “loss spike” behavior—where longer training actually causes performance drops, particularly in out-of-distribution generalization. That lines up with the “inflection point” the paper describes.

The key question is: “Where is the inflection point?” or “How much is too much?” Perhaps there’s no universal threshold. Some researchers are exploring ways to detect it, including tracking validation loss trends, fine-tuning adaptability at various checkpoints, analyzing gradient noise, and probing noise sensitivity (e.g., via Gaussian perturbations). Some even use loss landscape analysis or generalization curves to flag when models start to become brittle. Perhaps future progress in LLMs may depend less on pushing scale and more on training efficiency, model robustness, and smarter tuning strategies. Instead of asking “how big can we go?” we might now ask “how far should we go before it starts breaking things?”
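The Gaussian-perturbation probe mentioned above is easy to illustrate on a toy model. The sketch below uses synthetic data and a plain linear model, nothing like the paper’s actual setup: it perturbs fitted weights with noise of increasing scale and reports the average loss increase. A model approaching the inflection point would show this curve steepening:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for "a trained model": least-squares weights on synthetic data.
X = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
y = X @ true_w + 0.1 * rng.normal(size=200)
w_fit = np.linalg.lstsq(X, y, rcond=None)[0]

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

def noise_sensitivity(w, sigma, trials=50):
    """Mean loss increase under Gaussian weight perturbations N(0, sigma^2)."""
    base = loss(w)
    return np.mean([loss(w + rng.normal(scale=sigma, size=w.shape)) - base
                    for _ in range(trials)])

for sigma in (0.01, 0.05, 0.1):
    print(f"sigma={sigma}: mean loss increase {noise_sensitivity(w_fit, sigma):.4f}")
```

Tracking a curve like this across training checkpoints is one cheap way to watch for the fragility the paper calls progressive sensitivity.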

Link: https://arxiv.org/abs/2503.19206


OpenAI released o3, which can “think with images”

April 17, 2025

On April 16, 2025, OpenAI released its latest o3 model, which can understand uploaded images such as photos, whiteboards, sketches, and diagrams, even if they’re low quality. Unlike Google Search, which compares your image against archives on the Internet, ChatGPT analyzes the image through pattern recognition. Trained on huge datasets of text and images, it answers questions by “thinking and reasoning” rather than by retrieving from a database. The improved capabilities include the following (a minimal API sketch follows the list):

1. Higher Visual Accuracy

  • It’s better at identifying objects, scenes, and text in images, including handwritten content, charts, technical diagrams, UI screenshots, etc.

  • Handles blurry, low-res, or partially cropped images much better than before.

2. Detailed Interpretation

  • You can ask complex questions about an image—e.g., “What’s the logical flaw in this chart?” or “What programming language is this code snippet written in?”

  • For documents or presentations, it can extract key points, summarize, or answer questions about them.

3. Math Diagrams & Graphs

  • Handles geometry diagrams, graphs, and math problems from images with higher accuracy.

  • Can interpret axes, functions, labels, and make connections like: "This is a sine function over one period, amplitude 3..."

4. Multi-step Reasoning from Images

  • For example: Upload a circuit diagram and ask for its function—it’ll walk you through how the circuit works.

  • Or: Upload a data table image and ask for trends, outliers, and suggestions.

5. Image Editing Requests

  • While actual editing (like changing colors, removing objects, or cartoonizing a photo) uses a different internal tool, you can describe what you want and the model applies the changes quickly.
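To try image understanding yourself, sending a picture to the model takes only a few lines with OpenAI’s Python SDK. This is a minimal sketch: the model name, the placeholder image URL, and o3’s availability under this endpoint are assumptions, so substitute whatever vision-capable model your account exposes:

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3",  # assumption; any vision-capable model works here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What is the cultural background of the person in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/test_image1.jpg"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```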

That's my take on it:

As a photographer, I am interested in whether o3 can identify the origin of the image. I uploaded “test_image1” and asked the following question: “What is the ethnicity or cultural background of the person on the uploaded image?”


The following is part of the answer, and it is accurate.

The person in the photo appears to be performing or portraying aspects of Māori culture (the Indigenous Polynesian people of New Zealand). Here's what stands out:

1. Facial Expression & Tongue Out

  • This tongue-out expression, wide eyes, and intense stare is iconic of the Māori haka—a traditional war dance used to intimidate opponents and show strength, unity, and pride.


2. Facial Markings (Tā Moko Style)

  • The black markings resemble Tā moko, the traditional permanent tattoos of the Māori, though in this case, it may be painted for performance.

  • Tā moko represents genealogy, status, and personal history—it's deeply symbolic.

3. Feathers & Headband

  • The feathered headgear and decorative band are often part of ceremonial Māori attire, especially during cultural performances.


4. Background & Aesthetic

  • The swirling koru-style patterns in the background (spirals based on the unfurling fern frond) are also rooted in Māori art and symbolism, representing growth, strength, and peace.

I uploaded an image of a rare plant and asked this question: “What is the name of the plant? Where can I find it?”

The following is part of the answer. Again, it is correct.

Plant ID: Strongylodon macrobotrys

Common name: Jade Vine

This flower is part of the pea family (Fabaceae) and is native to the tropical forests of the Philippines. That surreal turquoise-blue bloom is its signature look—almost alien-like in color and shape.

You’ll typically see it in:

Native Habitat:

  • Philippines – especially in Luzon, Mindoro, and Leyte in rainforests.


Botanical Gardens Worldwide:

  • Singapore Botanic Gardens

  • Hawaii Tropical Botanical Garden

  • Kew Gardens (UK)

  • Fairchild Tropical Botanic Garden (Florida)

  • Costa Rica, Thailand, and other humid tropical conservatories

I tried to trick ChatGPT by uploading something more challenging. After uploading “test_image3”, I asked this question: “Where is the beach? What mineral is the sand composed of?”

My jaw dropped when I saw the answer below (partial):

Location: Papakōlea Beach, Hawaii

Also known as Green Sand Beach, it’s located near South Point (Ka Lae) on the Big Island of Hawaii. One of only four green sand beaches in the entire world, this place is a geological gem.

What Makes the Sand Green?

The green tint comes from a mineral called olivine.

Mineral Breakdown:

  • Olivine is a silicate mineral, high in magnesium and iron.

  • It’s a primary component of Hawaiian volcanic rock, particularly basalt.

  • Olivine is super dense, so when the volcanic rock erodes, lighter materials wash away, and the heavier olivine crystals stay behind—accumulating to form green sand.

Fun fact:

Olivine is sometimes referred to as “Hawaiian peridot”, especially when gem-quality.

Where Else Can You Find Green Sand Beaches?

Besides Papakōlea, the others are:

  1. Talofofo Beach, Guam

  2. Hornindalsvatnet Lake, Norway (not oceanic)

  3. Punta Cormorant, Floreana Island, Galápagos (Ecuador)

But Papakōlea is easily the most famous.

The emergence of multimodal AI, such as ChatGPT-4o, represents a significant leap forward for researchers working with unstructured data. Traditionally, fieldwork by anthropologists and geologists has involved collecting photographs, notes, sketches, audio, and interviews—materials rich in cultural meaning but inherently unstructured and difficult to analyze at scale. Interpretation has long relied on time-consuming manual review and expert knowledge, often limiting the speed and scope of pattern recognition across regions or cultures. Now, with tools like ChatGPT-4o, researchers can upload images of artifacts, tattoos, rituals, architectural elements, or symbolic markings and immediately receive context-aware insights.

Gemini 2.5 Pro outperforms all existing AI models!

March 28, 2025

On March 25, 2025, Google released Gemini 2.5, its latest AI model, which outperforms all other existing AI models across major benchmarks. Specifically, Google’s Gemini 2.5 Pro has demonstrated superior performance compared to other leading models, including OpenAI’s ChatGPT and DeepSeek’s offerings, across various benchmarks.

Key Features of Gemini 2.5 Pro:

  1. Enhanced Reasoning Abilities: Gemini 2.5 Pro is designed as a "thinking model," capable of processing tasks step-by-step, leading to more informed and accurate responses, especially for complex prompts. This advancement allows it to analyze information, draw logical conclusions, and incorporate context effectively. ​

  2. Advanced Coding Capabilities: The model excels in coding tasks, including creating visually compelling web applications, agentic code applications, code transformation, and editing.

  3. Multimodal Processing: Building upon Gemini's native multimodality, 2.5 Pro can interpret and process various data forms, including text, audio, images, video, and code. This versatility enables it to handle complex problems that require integrating information from multiple sources. ​

  4. Extended Context Window: The model ships with a 1 million token context window, with plans to expand to 2 million tokens soon. This extensive context window allows Gemini 2.5 Pro to comprehend vast datasets and manage more extensive data, enhancing its performance in tasks requiring long-term context understanding. ​


That’s my take on it:

In 1968, American artist Andy Warhol predicted that "In the future, everyone will be world-famous for 15 minutes." This quote expresses the concept of fleeting celebrity and media attention. In the age of generative AI, Warhol’s prophecy echoes louder than ever: every model is famous for 15 minutes. AI has been growing at pop-culture speed. Models are celebrities: they rise fast, trend for a moment, then get dethroned.

  • January 2025: DeepSeek-V3 and R1 stunned everyone—especially with open weights and insane capabilities in reasoning and math.

  • Early February: OpenAI fired back with o3-mini, nudging the bar higher.

  • Late Feb/Early March: Qwen 2.5 enters and crushes multiple leaderboards, especially in multilingual and code-heavy tasks.

  • March 2025: Gemini 2.5 Pro drops and suddenly becomes the new benchmark king in reasoning, long-context, and multi-modal tasks.

This is not just fast-paced—this is accelerating. Each "champion" barely holds the crown before someone new comes knocking. Just like any other tech curve (e.g., Moore’s Law for chips), AI can't grow infinitely in capability, speed, or intelligence without hitting some hard ceilings. But the key question is not if, but when—and what kind of plateau we will encounter. I will explore this next.

Link: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

In the future, everyone will be world-famous for 15 minutes.

-Andy Warhol (1968)

New ChatGPT 4o image generator can output useful infographics

March 27, 2025

ChatGPT-4o’s image generation capabilities mark a major leap forward in AI creativity, blending high realism, smart prompt handling, and seamless editing tools in one powerful system. One of its standout strengths is photo-realistic fidelity — it renders textures, lighting, and detail with stunning clarity, often outperforming models like Midjourney or Stable Diffusion in visual accuracy.

It also has exceptional prompt comprehension, allowing users to describe complex, multi-layered scenes, styles, and emotions, and get results that align perfectly with their vision. Whether you want an anime character, a cyberpunk street scene, or a vintage oil painting, ChatGPT-4o switches styles effortlessly.

Another key advantage is its reference-aware editing — users can upload an image and make specific changes like altering backgrounds, adding objects, or modifying color tones. These edits blend in smoothly, avoiding awkward transitions or visual artifacts common in older tools.

Moreover, it handles spatial reasoning impressively. If you ask for a scene with specific object placement — like a vase to the left of a cat — it understands and respects composition accurately. This makes it ideal for design, storytelling, and visual planning tasks.

It also supports iterative workflows directly in the chat. You can request tweaks like “make the lighting softer” or “change the outfit to red,” and get updated versions quickly, without rewriting your prompt from scratch.

ChatGPT-4o further allows consistent visual output for characters or scenes across multiple images, perfect for comics or branding work. And with clean, high-resolution outputs, it minimizes distortion and maintains visual integrity even in fine detail.

The panel below shows a side-by-side comparison between images created by the 4o image generator and its predecessor, DALL·E 3.

That’s my take on it:

One of the standout strengths of the ChatGPT-4o image generator is its exceptional ability to produce technically accurate and visually effective infographics. While most AI generators excel at creating photorealistic images, 4o distinguishes itself by delivering visuals that are genuinely useful for educational and technical communication.

When I need to generate illustrations for topics in statistics or computing, tools like ReCraft and Ideogram often fall short. They tend to approximate the concept or struggle with textual accuracy. In contrast, 4o consistently produces infographics that are not only visually appealing but also presentation-ready and pedagogically sound.

For example, I tested the following prompt:
Example 1: “Illustrate Lambda smoothing in a scatterplot with data forming a nonlinear pattern. The illustration must be good enough for teaching purposes.”


As shown in the following comparison, the image generated by ReCraft includes nonsensical text and distorted elements (top), making it unusable for serious teaching. The 4o-generated image (bottom), however, is clean, precise, and visually intuitive — ideal for lectures or documentation.

Example 2: “Illustrate deep learning by emphasizing transformations inside multiple hidden layers in a neural network. Make the graph colorful and appealing.”

While Ideogram (see below, top) generated a visually pleasing layout, it lacked essential components like labels or explanatory structure. In contrast, 4o produced a textbook-style diagram (see below, bottom) with proper node icons, layer labels, and transformation highlights — exactly what you'd expect in professional slides or educational material.

In today's landscape, many AI tools can generate impressive imagery, but when it comes to high-quality, functional infographics, ChatGPT-4o is in a league of its own (see attached PDF. Please scroll down to view all).

Link: https://openai.com/index/introducing-4o-image-generation/

Investors are not excited by Nvidia's GTC 2025 Keynote

March 18, 2025

Nvidia's GPU Technology Conference (GTC) keynote, delivered by CEO Jensen Huang, took place on March 18, 2025, at the SAP Center in San Jose, California. The following are the key points:

1. Next-Generation AI Chips:

  • Blackwell Ultra: Scheduled for release in the latter half of 2025, this GPU boasts enhanced memory capacity and performance, offering a 1.5x improvement over its predecessors.

  • Vera Rubin: Named after the renowned astronomer, this AI chip is set to launch in late 2026, followed by Vera Rubin Ultra in 2027. These chips promise substantial performance gains and efficiency improvements in AI data centers. ​

2. AI Infrastructure and Software:

  • Nvidia Dynamo: An open-source inference software system designed to accelerate and scale AI reasoning models, effectively serving as the "operating system of an AI factory."

3. Robotics and Partnerships:

  • 'Blue' Robot: Developed in collaboration with Disney Research and Google DeepMind, this robot showcases advancements in robotics technology and a new physics engine called Newton. ​

  • General Motors Collaboration: Nvidia is partnering with GM to integrate AI systems into vehicles, factories, and robots, aiming to enhance autonomous driving capabilities and manufacturing processes. ​

4. AI Evolution and Future Outlook:

  • Agentic AI: Huang highlighted the progression of AI from perception and computer vision to generative and agentic AI, emphasizing its growing ability to understand context, reason, and perform complex tasks.

  • Physical AI: The next wave of AI involves robotics capable of understanding physical concepts like friction and inertia, with Nvidia introducing tools like Isaac GR00T N1 and the evolving Cosmos AI model to facilitate this development. ​

That’s my take on it:

Despite these advancements, Nvidia's stock experienced a 3.4% decline during the keynote. The announcements, while significant, were perceived as extensions of existing technologies rather than disruptive innovations. While Nvidia continues to innovate, the emergence of efficient and cost-effective AI models from Chinese companies is reshaping the competitive landscape.

Further, the partnerships between Nvidia, Disney, and GM are not exciting at all. Disney is primarily an entertainment company rather than a technology leader. While they do invest in advanced CGI, theme park animatronics, and AI-driven personalization, they aren’t a dominant force in AI hardware or software. The company has faced backlash over diversity and inclusion policies, especially regarding recent film releases like Snow White. This controversy might make Disney a less attractive partner from a PR perspective, particularly if Nvidia is looking to impress a broader tech audience.

While GM is one of the biggest automakers in the U.S., it has struggled to keep pace with Tesla and BYD in the EV and autonomous driving sectors. Tesla’s Full Self-Driving (FSD) is already on the road, and BYD dominates China’s EV market with highly cost-effective solutions. GM’s self-driving unit Cruise has faced setbacks, including safety issues and regulatory scrutiny, leading to a halt in operations in multiple cities. This tarnishes GM’s image as a leader in AI-powered mobility. In my opinion, these partnerships aren’t groundbreaking.

Link: https://www.youtube.com/watch?v=erhqbyvPesY


China’s ERNIE 4.5 is priced at 1% of GPT-4.5’s cost

March 18, 2025

Baidu recently launched ERNIE 4.5 and ERNIE X1, two new AI models focused on multimodal capabilities and advanced reasoning, respectively.

  • Performance & Benchmarks: Baidu claims these models outperform DeepSeek V3 and OpenAI’s GPT-4.5 on third-party benchmarks like C-Eval, CMMLU, and GSM8K.

  • Cost Advantage: ERNIE 4.5 is 99% cheaper than GPT-4.5, and ERNIE X1 is 50% cheaper than DeepSeek R1, emphasizing aggressive market positioning.

  • ERNIE X1 Capabilities: Designed for complex reasoning and tool use, it supports tasks like advanced search, document Q&A, AI-generated image interpretation, and code execution.

  • ERNIE 4.5 Capabilities: A multimodal AI optimized for text, image, audio, and video processing, featuring improved reasoning, generation, and hallucination prevention through FlashMask Dynamic Attention Masking and Self-feedback Enhanced Post-Training.

That’s my take on it:

Baidu's ERNIE 4.5 model is priced at approximately 1% of OpenAI's GPT-4.5 cost. It is an attractive option for businesses looking to cut AI expenses, especially in cost-sensitive markets like China, Southeast Asia, and emerging economies. Nevertheless, GPT-4.5 is widely recognized as the best-performing model in English, and OpenAI has a trust advantage among global businesses. OpenAI’s models are deeply integrated into Microsoft’s ecosystem, dominating enterprise AI adoption in the West.

Although ERNIE 4.5 is claimed to outperform GPT-4.5, independent benchmarks are still lacking. In addition, many U.S. and European companies might hesitate to adopt Baidu’s AI due to security concerns and data regulations. Further, Chinese LLMs, including ERNIE 4.5, operate under strict government regulations that enforce censorship on politically sensitive topics. This has major implications for freedom of information, research, and AI usability outside of China.

Link: https://venturebeat.com/ai/baidu-delivers-new-llms-ernie-4-5-and-ernie-x1-undercutting-deepseek-openai-on-cost-but-theyre-not-open-source-yet/


China’s AI model Manus evaluated by MIT Technology Review

March 12, 2025

The new AI agent Manus, developed by the Wuhan-based startup Butterfly Effect, has taken the AI world by storm since its launch on March 6, 2025. Unlike traditional chatbots, Manus operates as a general AI agent, leveraging multiple models, including Claude 3.5 Sonnet and Alibaba’s Qwen, to perform a variety of tasks autonomously. Simply put, it is capable of multi-tasking.

Despite the hype, access to Manus remains limited, with only a small fraction of users receiving invite codes. MIT Technology Review tested the tool and found it to be a promising but imperfect assistant, akin to a highly competent intern—capable but prone to occasional mistakes and oversights.

The reviewer conducted three tests:

  1. Compiling a list of China tech reporters – Initially, Manus produced an incomplete list due to time constraints but improved significantly with feedback.

  2. Finding NYC apartment listings – It required clarification for nuanced search criteria but eventually delivered a well-structured ranking.

  3. Nominating candidates for Innovators Under 35 – The task was more challenging due to research limitations, paywall restrictions, and system constraints. The final output was incomplete and skewed.

Strengths:

  • Transparent, interactive process allowing user intervention

  • Strong performance in structured research tasks

  • Affordable ($2 per task, significantly cheaper than alternatives like ChatGPT DeepResearch)

  • Replayable and shareable sessions

Weaknesses:

  • Struggles with large-scale research, paywalls, and CAPTCHA restrictions

  • System instability and crashes under heavy load

  • Requires user guidance to refine results

While Manus is not flawless, it represents a significant step in AI autonomy, particularly in research and analysis. It underscores China’s growing role in shaping AI development, not just in model innovation but also in the practical implementation of autonomous AI agents.

Links:

https://www.youtube.com/watch?v=WTgkRitFKGs

https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/

Mistral AI offers an alternative to US AI models for Europeans

March 11, 2025

Mistral AI, a leading French AI startup, is recognized as one of France’s most promising tech firms and the only European challenger to OpenAI. Despite its impressive $6 billion valuation, its global market share remains modest.

A few days ago, the company launched its AI assistant, Le Chat, on mobile app stores, generating significant attention, particularly in France. French President Emmanuel Macron even endorsed it in a TV interview, urging people to choose Le Chat over OpenAI’s ChatGPT. The app quickly gained traction, reaching 1 million downloads in two weeks and topping France’s iOS free app chart.

Founded in 2023, Mistral AI champions openness in AI and positions itself as the “world’s greenest and leading independent AI lab.” Its leadership team includes CEO Arthur Mensch, a former Google DeepMind researcher, and former Meta AI researchers Timothée Lacroix and Guillaume Lample. The company’s advisory board includes notable figures like Jean-Charles Samuelian-Werve, Charles Gorintin, and former French digital minister Cédric O, whose involvement sparked controversy.

Despite its growth and strong funding, Mistral AI’s revenue is still in the eight-digit range, indicating it has significant ground to cover before becoming a true OpenAI rival.

That’s my take on it:

Mistral AI has the potential to become a serious competitor to OpenAI's ChatGPT, Anthropic's Claude, Google's Gemini, and other top AI models. The strained relationship between the U.S. and Europe, particularly during the Trump administration, has fueled a growing sense of technological sovereignty in Europe. As tensions over trade, defense, and digital policies deepened, many European nations—especially France—became increasingly wary of relying on American tech giants. This sentiment extends to AI, where European leaders and businesses are seeking alternatives to U.S.-dominated models like ChatGPT, Claude, and Google Gemini.

Mistral AI, as Europe’s most promising AI company, stands to benefit from this shift. French President Emmanuel Macron’s endorsement of Le Chat highlights a broader push for European-built AI solutions, reinforcing the region’s desire for independent innovation and data security. With strong government backing and a growing market of users eager to support local technology, Mistral AI could leverage this geopolitical rift to carve out a stronghold in Europe, challenging American AI dominance in the years to come.

However, Mistral AI still faces several challenges. Outside France, and especially outside Europe, its brand recognition remains weak compared to OpenAI, Google, and Anthropic.

Link: https://techcrunch.com/2025/03/06/what-is-mistral-ai-everything-to-know-about-the-openai-competitor/


Google’s AI co-scientist solved decade-long research problem within two days


Feb. 21, 2025

Google has introduced an "AI Co-Scientist," a sophisticated AI system designed to assist researchers in accelerating scientific discovery. Built on Gemini 2.0, Google’s latest AI model, the AI Co-Scientist can generate testable hypotheses, research overviews, and experimental protocols. It allows human scientists to input their research goals in natural language, suggest ideas, and provide feedback.

In an early demonstration, the AI Co-Scientist solved a complex scientific problem in just two days—a problem that had confounded researchers for over a decade. A notable test involved researchers from Imperial College London, who had spent years studying antibiotic-resistant superbugs. The AI Co-Scientist independently analyzed existing data, formulated the same hypothesis they had reached after years of work, and did so in a fraction of the time.

The system has shown promising results in trials conducted by institutions such as Stanford University, Houston Methodist, and Imperial College London. Scientists working with the AI have expressed optimism about its ability to synthesize vast amounts of evidence, identify key research questions, and streamline experimental design, potentially eliminating fruitless research paths and accelerating progress significantly.

That’s my take on it:

The rapid advancement of AI in research and data analysis raises important questions about the future of statistical and data science education. As AI systems become more proficient at conducting analysis, traditional data analysts may face challenges in maintaining their relevance in the job market. Since AI models rely heavily on the quality of data, perhaps our focus should shift from analysis to data acquisition. Specifically, ensuring that students develop strong skills in data collection, validation, and preprocessing will be critical. Understanding biases in data, ethical considerations, and methods for ensuring data integrity will be more valuable than manually performing statistical calculations. In addition, while AI can analyze data, human judgment is required to interpret results in context, assess their implications, and make informed decisions. Thus, statistical and data science education should emphasize critical thinking, domain expertise, and the ability to translate insights into real-world applications.

Link: https://www.forbes.com/sites/lesliekatz/2025/02/19/google-unveils-ai-co-scientist-to-supercharge-research-breakthroughs/


Musk unveiled Grok 3, “the smartest AI on earth”

Feb. 18, 2025

Yesterday (2/17) Elon Musk unveiled Grok 3, the latest AI chatbot from his company xAI. This new version is designed to surpass existing chatbots like OpenAI's ChatGPT, boasting advanced reasoning capabilities that Musk describes as "scary-smart." Grok 3 has been trained using xAI's Colossus supercomputer, which utilizes 100,000 Nvidia H100 GPUs, providing 200 million GPU-hours for training—ten times more than its predecessor, Grok 2.
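As a quick sanity check on those training figures (assuming, as an idealization, that the whole fleet runs concurrently at full utilization):

```python
gpu_hours = 200_000_000   # reported training compute
gpus = 100_000            # reported Colossus H100 count
hours_per_gpu = gpu_hours / gpus                     # 2,000 hours per GPU
print(hours_per_gpu, round(hours_per_gpu / 24, 1))   # -> 2000.0 hours, ~83.3 days
```

In other words, the reported numbers imply roughly two to three months of continuous cluster time, which is at least internally consistent.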

During the live demo, Musk highlighted Grok 3's ability to deliver "insightful and unexpected solutions," emphasizing its potential to revolutionize AI interactions. The chatbot is now available to X Premium Plus subscribers, with plans to introduce a voice interaction feature in the coming week.

That’s my take on it:

Elon Musk described Grok 3 as the "smartest AI on Earth." He stated that Grok 3 is "an order of magnitude more capable" than its predecessor, Grok 2, and highlighted its performance in areas like math, science, and coding, surpassing models from OpenAI, Google, and DeepSeek. However, it's important to note that these claims have not been independently verified.

According to "Huang's Law", proposed by Nvidia CEO Jensen Huang, the performance of AI and GPUs doubles every two years, driven by innovations in architecture, software, and hardware. Earlier this year, OpenAI released Deep Research that outperforms DeepSeek's R1 in specific tasks. For now, Grok 3 may be the most advanced AI on Earth, but how long will that last? In just a month or two, another company could unveil a model that outshines everything before it. Huang's Law is right!

Links:

https://www.livemint.com/ai/grok-3-launch-live-elon-musks-xai-smartest-ai-on-earth-today-sam-altman-openai-chatgpt-gemini-google-deepseek-11739810000644.html?utm_source=chatgpt.com

https://nypost.com/2025/02/18/business/elon-musks-xai-claims-grok-3-outperforms-openai-deepseek/?utm_source=chatgpt.com


Tech giants aim to invest $325 billion in cloud computing

Feb. 7, 2025

Meta, Microsoft, Amazon, and Alphabet (Google's parent company) are projected to invest a collective $325 billion in capital expenditures and infrastructure in 2025, fueled by their ongoing commitment to expanding artificial intelligence capabilities.

Amazon: The company plans to allocate over $105 billion towards enhancing its AI infrastructure, primarily through Amazon Web Services (AWS). This investment aims to bolster AWS's capacity to meet the growing demand for AI-driven services.

Microsoft: Microsoft has already incorporated ChatGPT and other AI tools into Azure. It has outlined plans to invest approximately $80 billion in capital expenditures for its fiscal year 2025, ending in June. This marks an 80% increase from the previous year, reflecting the company's commitment to expanding its AI and cloud computing capabilities.

Google (Alphabet): Alphabet is set to invest around $75 billion in 2025 to support its AI and cloud infrastructure. Google Gemini is expected to power Google Cloud’s AI-driven applications.

That’s my take on it:

Cloud computing and AI are deeply interconnected because cloud platforms provide the necessary infrastructure for AI applications:

Massive Computing Power – AI models, particularly deep learning models like ChatGPT, require significant computational resources. Cloud platforms provide scalable GPU and TPU resources to train and deploy AI models efficiently.

Data Storage and Processing – AI depends on large datasets for training and inference. Cloud computing offers scalable and secure storage solutions, along with distributed computing frameworks like Apache Spark, to process vast amounts of data (see the sketch after this list).

AI as a Service (AIaaS) – Cloud providers offer AI services, such as machine learning (ML) model hosting, automated AI model training, natural language processing (NLP), and computer vision. These services allow businesses to leverage AI without investing in expensive on-premise infrastructure.

Edge Computing and AI – Many cloud providers integrate AI with edge computing to process data closer to the source, reducing latency and bandwidth usage. This is particularly useful for applications like autonomous vehicles and real-time analytics.
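As promised above, here is a minimal PySpark sketch of the kind of distributed preprocessing cloud platforms enable. The bucket path, column names, and thresholds are all hypothetical, chosen only to illustrate the pattern:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal example of cloud-scale data preparation with Spark.
spark = SparkSession.builder.appName("prep-training-data").getOrCreate()

logs = spark.read.json("s3a://example-bucket/raw_events/*.json")
clean = (logs
         .dropna(subset=["user_id", "text"])   # drop incomplete records
         .filter(F.length("text") > 20)        # remove trivial snippets
         .dropDuplicates(["text"]))            # deduplicate documents

clean.write.parquet("s3a://example-bucket/clean_events/", mode="overwrite")

The same script scales from a laptop to a thousand-node cluster, which is precisely why AI workloads gravitate to the cloud.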

OpenAI relies on Microsoft Azure to train and deploy GPT models, whereas Anthropic (Claude AI) has trained its models on Google Cloud and, more recently, AWS. Meta is investing billions in AI infrastructure but is not a public cloud provider, which limits its AI scalability compared to AWS, Azure, and Google Cloud. The AI leader of the future will almost certainly be a cloud leader, too. The company that masters both AI and cloud infrastructure will not only dominate AI development but also control who gets access to the best AI models and how they are deployed worldwide.

Link:

https://finance.yahoo.com/news/big-tech-set-to-invest-325-billion-this-year-as-hefty-ai-bills-come-under-scrutiny-182329236.html


OpenAI’s Deep Research outperforms DeepSeek, but it is expensive

Feb 7, 2025

Recently, OpenAI released a new tool called "Deep Research," which achieved a significant milestone by scoring 26.6% accuracy on "Humanity's Last Exam," a benchmark designed to test AI across a broad range of expert-level subjects. This performance surpasses previous models, including ChatGPT o3-mini and DeepSeek, and amounts to a 183% relative improvement (nearly 2.8 times the previous best score) within a short period.

"Deep Research" is an AI tool developed by OpenAI to autonomously conduct multi-step research tasks. Users can input queries via text, images, or files, and the AI generates comprehensive responses within 5 to 30 minutes, providing a summary of its process and citations. This tool is designed to operate at the level of a research analyst, enhancing the depth and reliability of AI-generated information.

Despite its advancements, "Deep Research" has limitations, such as potential hallucinations and challenges in distinguishing authoritative information from rumors. OpenAI acknowledges these issues and emphasizes the need for careful oversight when using the tool.

That’s my take on it:

OpenAI's Deep Research feature is currently available to ChatGPT Pro subscribers at a monthly fee of $200, which includes up to 100 queries per month. I didn’t test Deep Research because its price is prohibitive.

DeepSeek being free (or significantly cheaper) makes it an attractive alternative, especially for users and businesses unwilling to pay OpenAI's premium prices. However, many Western companies and governments are hesitant to adopt Chinese AI due to data privacy concerns and geopolitical tensions.

AI was initially seen as a democratizing force—bringing knowledge, automation, and efficiency to everyone. But with high-cost subscriptions like $200/month for Deep Research, it does seem to be tilting toward an elitist model, favoring those who can afford premium access.

AI has the potential to bridge knowledge gaps—helping underprivileged communities, small businesses, and individuals access expertise that was once restricted to elite institutions. However, pricing trends indicate that AI is becoming another tool for economic disparity, where the best insights and automation are reserved for those who can pay. If left unaddressed, we may witness the emergence of an “AI divide” in the future, much like the “digital divide” that accompanied the rise of the Internet.

I recognize that the research, development, and maintenance of advanced AI models come at a high cost, making it unrealistic for corporations to offer them for free. In this case, government and nonprofit initiatives should subsidize AI for education, research, and public interest projects.

Link

https://www.windowscentral.com/software-apps/openai-deep-research-blows-chatgpt-o3-mini-and-deepseek-out-of-the-water


Microsoft and OpenAI suspect DeepSeek copied ChatGPT

Jan. 29, 2025

OpenAI, supported by major investor Microsoft, suspects that DeepSeek may have illicitly utilized its proprietary technology to develop R1. The primary concern centers on the potential use of a technique known as "distillation."

Distillation in AI refers to a process where a smaller model is trained to replicate the behavior of a larger, more complex model. This is achieved by having the smaller model learn from the outputs of the larger model, effectively "distilling" its knowledge. While this method can enhance the efficiency of AI models, using it without proper authorization, especially with proprietary systems, raises significant ethical and legal issues.
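To make the idea concrete, here is a minimal sketch of response-based distillation in PyTorch, where a student model is trained to match the teacher's softened output distribution. This is the generic textbook formulation (Hinton et al.'s soft-target loss), not a claim about what DeepSeek actually did:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # The student matches the teacher's softened output distribution.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: random logits standing in for real model outputs.
teacher_logits = torch.randn(4, 32000)                      # batch of 4, 32k vocab
student_logits = torch.randn(4, 32000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                                             # gradients flow to the student

The key point for the dispute: the teacher's logits (or sampled outputs) are the training signal, which is why large-scale harvesting of a proprietary model's API responses is what raised suspicion.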

Microsoft's security researchers detected unusual activities in the fall, where individuals allegedly linked to DeepSeek were extracting large volumes of data using OpenAI's API. This behavior suggests that DeepSeek may have bypassed usage restrictions to gather substantial data, potentially violating OpenAI's terms of service.

David Sacks, President Donald Trump's top AI adviser, has indicated that there is considerable evidence pointing to DeepSeek's involvement in replicating OpenAI's models. He emphasized the seriousness of these allegations, highlighting the broader implications for intellectual property rights in the AI industry.

That’s my take on it:

As a researcher, I strive to approach every issue with an open mind, setting aside preconceptions. I do not accept claims at face value but instead rely on thorough investigation and fact-checking. I avoid cherry-picking data to support any predetermined stance. Regarding the controversy surrounding DeepSeek, I carefully examine claims from all sides to ensure a fair and objective assessment.

Simple and generic questions may produce similar responses across different AI systems, so I conducted an informal test comparing ChatGPT, Claude, Perplexity AI, and DeepSeek by posing a fairly complex question related to instructional technologies, covering topics such as Seymour Papert’s LOGO programming, hypertext, multimedia, web-based instruction, and course management systems. Each topic spans a broad range of information, making it nearly impossible for two models to generate identical responses. While the answers from ChatGPT, Claude, and Perplexity AI varied significantly, the responses from ChatGPT and DeepSeek were strikingly similar. For example,

Web-based instruction

ChatGPT: “The advent of the internet brought hopes of democratizing education through web-based instruction. Massive Open Online Courses (MOOCs) epitomized this promise, with institutions like MIT and Stanford offering free courses to millions. While these platforms achieved significant reach, their completion rates remained dismally low, often below 10%. MOOCs largely benefited highly motivated, self-directed learners while failing to support those who needed guidance and structure the most.”

DeepSeek: “The advent of the internet brought promises of democratized education, with web-based instruction enabling access to knowledge for learners worldwide. While platforms like MOOCs (Massive Open Online Courses) have made education more accessible, they have also faced criticism for low completion rates and a lack of meaningful engagement. Many learners struggle with self-directed online learning, and the absence of personalized support often leads to disengagement.”

Hypertext and Multimedia

ChatGPT: “The rise of hypertext and multimedia in the 1980s and 1990s was heralded as a way to create richer, more engaging learning environments. Proponents argued that students would benefit from non-linear access to information, enabling personalized learning paths. However, the results often showed cognitive overload rather than enhanced understanding, as students struggled to navigate and synthesize large amounts of unstructured information. Furthermore, multimedia often prioritized flashiness over substance, contributing little to deeper learning.”

DeepSeek: “In the 1990s, hypertext and multimedia were touted as tools that would make learning more interactive and engaging. Proponents argued that these technologies would allow students to explore information in non-linear ways, catering to individual learning styles. However, the reality was often overwhelming for learners, who struggled with cognitive overload and disorientation. Moreover, the lack of structured guidance in hypertext environments sometimes hindered rather than helped learning.”
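Impressions like "strikingly similar" can also be quantified. Below is a minimal sketch using TF-IDF vectors and cosine similarity from scikit-learn; the response strings are truncated placeholders for the full answers, and the measure captures lexical overlap only, which is suggestive but certainly not proof of copying:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Paste each model's full answer here; truncated placeholders shown.
responses = [
    "The advent of the internet brought hopes of democratizing education ...",    # ChatGPT
    "The advent of the internet brought promises of democratized education ...",  # DeepSeek
    "...",  # Claude
    "...",  # Perplexity AI
]

matrix = TfidfVectorizer(stop_words="english").fit_transform(responses)
# Pairwise cosine similarity; values near 1.0 indicate near-identical wording.
print(cosine_similarity(matrix).round(2))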

Nonetheless, it is still premature to draw a conclusion. This issue awaits further investigation.

Link: https://apnews.com/article/deepseek-ai-chatgpt-openai-copyright-a94168f3b8caa51623ce1b75b5ffcc51


DeepSeek denied learning from Meta’s LLaMA

Jan. 28, 2025

Several AI experts assert that DeepSeek is built upon existing open-source models, such as Meta’s Llama. For example, according to a research scientist at Riot Games, there is evidence suggesting that China's DeepSeek AI models have incorporated ideas from open-source models like Meta's Llama. Analyses indicate that DeepSeek-LLM closely follows Llama 2's architecture, utilizing components such as RMSNorm, SwiGLU, and RoPE.

Even the paper published by DeepSeek said so. In the paper entitled “DeepSeek LLM: Scaling open-source language models with longtermism” (Jan 2024), the DeepSeek team wrote, “At the model level, we generally followed the architecture of LLaMA, but replaced the cosine learning rate scheduler with a multi-step learning rate scheduler, maintaining performance while facilitating continual training” (p.3).

However, today (Jan. 28, 2025), when I asked DeepSeek whether it learned from Meta’s LLaMA, the AI system denied it. Its answer was: “No, I am not based on Meta's LLaMA (Large Language Model Meta AI). I am an AI assistant created exclusively by the Chinese Company DeepSeek. My model is developed independently by DeepSeek, and I am designed to provide a wide range of services and information to users.”

That’s my take on it:

Various sources of information appear to be conflicting and inconsistent. Nonetheless, if DeepSeek built its model from scratch but implemented similar techniques, it can technically argue that it is an "independent" development, even if influenced by prior research.

It is too early to draw any definitive conclusions. At present, Meta has assembled four specialized "war rooms" of engineers to investigate how DeepSeek’s AI is outperforming competitors at a fraction of the cost. Through this analysis, Meta might be able to determine whether DeepSeek shares any similarities with LLaMA. For now, we should wait for further findings.

Links:

https://fortune.com/2025/01/27/mark-zuckerberg-meta-llama-assembling-war-rooms-engineers-deepseek-ai-china/

https://planetbanatt.net/articles/deepseek.html?utm_source=chatgpt.com

https://arxiv.org/pdf/2401.02954

Beyond DeepSeek: A wave of China’s new AI models

Jan. 28, 2025

While global attention is focused on DeepSeek, it is worth highlighting the recent releases of other powerful AI models by China's tech companies.

MiniMax: Two weeks ago, this Chinese startup introduced a new series of open-source models under the name MiniMax-01. The lineup includes a general-purpose foundational model, MiniMax-Text-01, and a visual multimodal model, MiniMax-VL-01. According to the developers, the flagship MiniMax-01, boasting an impressive 456 billion parameters, surpasses Google’s recently launched Gemini 2.0 Flash across several key benchmarks.

Qwen: On January 27, the Qwen team unveiled Qwen2.5-VL, an advanced multimodal AI model capable of performing diverse image and text analysis tasks. Moreover, it is designed to interact seamlessly with software on both PCs and smartphones. The Qwen team claims Qwen2.5-VL outperforms GPT-4o on video-related benchmarks, showcasing its superior capabilities.

Tencent: Last week, Tencent introduced Hunyuan3D-2.0, an update to its open-source Hunyuan AI model, which is set to transform the video game industry. The updated model aims to significantly accelerate the creation of 3D models and characters, a process that typically takes highly skilled artists days or even weeks. With Hunyuan3D-2.0, developers are expected to streamline production, making it faster and more efficient.

That’s my take on it:

Chinese AI models are increasingly rivaling or even outperforming U.S. counterparts across various benchmarks. This growing competition poses significant challenges for U.S. tech companies and universities, particularly in attracting and retaining top AI talent. As China's AI ecosystem continues to strengthen, the risk of a "brain drain" or heightened competition for skilled researchers and developers becomes more pronounced.

Notably, in recent years, a substantial number of Chinese AI researchers based in the U.S. have returned to China. By 2024, researchers of Chinese descent accounted for 38% of the top AI researchers in the United States, slightly exceeding the 37% who are American-born. However, the trend of Chinese researchers leaving the U.S. has intensified, with the number rising dramatically from 900 in 2010 to 2,621 in 2021. The emergence of DeepSeek and similar advancements could further accelerate this talent migration unless proactive measures are taken to attract new foreign experts and retain existing ones.

To address these challenges, U.S. universities must take steps to reform the STEM education system, aiming to elevate the academic performance of locally born American students. Additionally, universities will need to expand advanced AI research programs, prioritizing areas such as multimodal learning, large-scale foundational models, and AI ethics and regulation. These efforts will be essential to maintain the United States' global competitiveness in the face of intensifying competition from China's rapidly advancing AI sector.

Link: https://finance.yahoo.com/news/deepseek-isn-t-china-only-101305918.html


AI’s Sputnik moment? DeepSeek wiped $1 trillion off US tech stocks

Jan. 27, 2025

The US stock market experienced a substantial drop due to the shockwave caused by DeepSeek, a Chinese AI startup. On Monday, January 27, 2025, US stock markets plunged sharply, with the tech-heavy Nasdaq falling 3.5%, marking its worst performance since early August. The S&P 500 dropped 1.9%, while the Dow Jones showed modest resilience with a slight gain of 0.2%. Nvidia, a major supplier of AI chips, saw its shares plummet nearly 17%, wiping out $588.8 billion in market value, the largest one-day loss ever recorded by a public company. Other tech giants like Microsoft, Alphabet, Meta, and Amazon also experienced significant declines. In total, DeepSeek wiped $1 trillion off the leading US tech index.

Marc Andreessen, co-creator of the Mosaic Web browser, called the DeepSeek challenge “AI’s Sputnik moment.” DeepSeek invested only $5.6 million in computing power for its base model, a stark contrast to the billions spent by U.S. companies. Moreover, despite lacking access to state-of-the-art H100 GPUs, DeepSeek reportedly achieved comparable or even superior results using lower-tier H800 GPUs. If these claims are accurate, the algorithm-efficient approach adopted by China could render the U.S.'s brute-force model obsolete.

That’s my take on it:

In my view, the market may be overreacting. The preceding claims require further validation. Indeed, there are concerns about the accuracy of the reported GPU usage and whether all aspects of the development process have been transparently disclosed. If DeepSeek’s efficiency claims turn out to be overstated, the paradigm shift may not be as immediate or dramatic. After all, we should never underestimate the creativity and adaptability of US tech giants. The U.S. and other countries may quickly adopt similar algorithmic strategies once they recognize the potential shift, mitigating the threat to their dominance.

Further, DeepSeek’s approach needs to scale across diverse AI applications, not just specific use cases, for this model to upend the current paradigm. I tested DeepSeek, and the results were not particularly impressive to me. While DeepSeek excels at mathematical and scientific computations, its performance falters when addressing questions about history, politics, and other humanities-related topics. It often evades the question or provides vague and uninformative responses. Therefore, for controversial or complex subjects that require nuanced, multi-perspective analysis, I prefer relying on ChatGPT, Claude, Perplexity AI, and other U.S.-based models.

Links:

https://www.datacenterdynamics.com/en/opinions/deepseek-and-the-generative-ai-perception-dilemma/?utm_source=chatgpt.com

https://www.proactiveinvestors.com/companies/news/1065078/nasdaq-tumbles-as-deepseek-ai-announcement-sends-shockwaves-through-wall-street-1065078.html

https://www.theguardian.com/business/2025/jan/27/tech-shares-asia-europe-fall-china-ai-deepseek?utm_source=chatgpt.com


CEO of Perplexity AI

Jan 24, 2025

The emergence of DeepSeek's AI models has ignited a global conversation about technological innovation and the shifting dynamics of artificial intelligence. Today (January 24, 2025) CNBC interviewed Aravind Srinivas, the CEO of Perplexity AI, about DeepSeek. It's worth noting that this interview is not about DeepSeek alone; rather, it is part of a broader discussion about the AI race between the United States and China, with DeepSeek's achievements highlighting China's growing capabilities in the field. The following is a summary:

  1. Geopolitical Implications:

    • The interview highlighted that "necessity is the mother of invention," illustrating how China, despite facing limited access to cutting-edge GPUs due to export restrictions, successfully developed DeepSeek.

    • The adoption of Chinese open-source models could embed China more deeply into the global tech infrastructure, challenging U.S. leadership. Americans worry that China could dominate the ecosystem and mind share if it surpasses the US in AI technologies.

  2. Wake-up call to the US

    • Srinivas acknowledged the efficiency and innovation demonstrated by DeepSeek, which managed to develop a competitive model with limited resources. This success challenges the notion that significant capital is necessary to develop advanced AI models.

    • Srinivas highlighted that Perplexity has begun learning from DeepSeek's model due to its cost-effectiveness and performance. Indeed, US AI companies have long learned from one another; for example, the groundbreaking Transformer model developed by Google inspired other US AI companies.

  3. Industry Reactions and Strategies:

    • There is a growing trend towards commoditization of AI models, with a focus on reasoning capabilities and real-world applications.

    • The debate continues on the value of proprietary models versus open-source models, with some arguing that open-source models drive innovation more efficiently.

    • The AI industry is expected to see further advancements in reasoning models, with multiple players entering the arena.

That’s my take on it:

No matter who ends up leading the AI race, DeepSeek is no doubt a game changer. Experts like Hancheng Cao from Emory University contended that DeepSeek's achievement could be a "truly equalizing breakthrough" for researchers and developers with limited resources, particularly those from the Global South.

DeepSeek's breakthrough in AI development marks a pivotal moment in the global AI race, reminiscent of the paradigm shift in manufacturing during the late 1970s and 1980s from Japan. Just as Japanese manufacturers revolutionized industries with smaller electronics and fuel-efficient vehicles, DeepSeek is redefining AI development with a focus on efficiency and cost-effectiveness. Bigger is not necessarily better.

Link to the Interview (second half of the video):

https://www.youtube.com/watch?v=WEBiebbeNCA


On DeepSeek

Jan 23, 2025

DeepSeek, a Chinese AI startup, has recently introduced two notable models: DeepSeek-R1-Zero and DeepSeek-R1. These models are designed to rival leading AI systems like OpenAI's ChatGPT, particularly in tasks involving mathematics, coding, and reasoning. Alexandr Wang, CEO of Scale AI, called DeepSeek an “earth-shattering model.”

DeepSeek-R1-Zero is groundbreaking in that it was trained entirely through reinforcement learning (RL), without relying on supervised fine-tuning or human-annotated datasets. This approach allows the model to develop reasoning capabilities autonomously, enhancing its problem-solving skills. However, it faced challenges such as repetitive outputs and language inconsistencies.

To address these issues, DeepSeek-R1 was developed. This model incorporates initial supervised data before applying RL, resulting in improved performance and coherence. Benchmark tests indicate that DeepSeek-R1's performance is comparable to OpenAI's o1 model across various tasks. Notably, DeepSeek has open-sourced both models under the MIT license, promoting transparency and collaboration within the AI community.
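For readers unfamiliar with reinforcement learning, the toy REINFORCE-style loop below shows the core idea behind RL-only training: a policy improves purely from a reward signal, with no labeled "correct" outputs. It illustrates the principle only; DeepSeek's actual algorithm, reward design, and scale are very different:

import torch

# A policy over 4 candidate answers learns from reward alone.
torch.manual_seed(0)
logits = torch.zeros(4, requires_grad=True)
reward_for = torch.tensor([0.0, 0.0, 1.0, 0.0])  # only answer 2 earns reward
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    action = torch.multinomial(probs, 1).item()   # sample an answer
    reward = reward_for[action]
    # Policy gradient: raise the log-probability of rewarded actions.
    loss = -torch.log(probs[action]) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass shifts to the rewarded answer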

In terms of cost, DeepSeek-R1 offers a more affordable alternative to proprietary models. For instance, while OpenAI's o1 charges $15 per million input tokens and $60 per million output tokens, DeepSeek's Reasoner model is priced at $0.55 per million input tokens and $2.19 per million output tokens.
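The price gap is easier to appreciate with a concrete bill. Here is a quick back-of-the-envelope comparison in Python, using the published per-million-token rates and a hypothetical monthly workload:

# Hypothetical workload: 2M input tokens and 1M output tokens per month.
input_tokens, output_tokens = 2_000_000, 1_000_000

def monthly_cost(input_price, output_price):
    # Prices are quoted in dollars per million tokens.
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

print(f"OpenAI o1:   ${monthly_cost(15.00, 60.00):.2f}")   # $90.00
print(f"DeepSeek R1: ${monthly_cost(0.55, 2.19):.2f}")     # $3.29

Roughly a 27-fold difference for the same token volume, which explains much of the excitement.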

That’s my take on it:

Based on this trajectory, will China's AI development surpass the U.S.? Both countries have advantages and disadvantages in this race. With the world's largest internet user base, China has access to vast datasets, which are critical for training large AI models. In contrast, there are concerns and restrictions regarding data privacy and confidentiality in the US.

However, China’s censorship mechanisms might limit innovation in areas requiring free expression or transparency, potentially stifling creativity and global competitiveness. DeepSeek-R1 has faced criticism for including mechanisms that align responses with certain governmental perspectives. If I ask what happened on June 4, 1989 in Beijing, it is possible that the AI would either dodge or redirect the question, offering a neutral or vague response.

Nonetheless, China's AI is rapidly being integrated into manufacturing, healthcare, and governance, creating a robust ecosystem for AI development and deployment. China is closing the gap!

Brief explanation of reinforcement learning:

https://www.youtube.com/watch?v=qWTtU75Ygv0

Summary in mass media:

https://www.cnbc.com/2025/01/23/scale-ai-ceo-says-china-has-quickly-caught-the-us-with-deepseek.html

DeepSeek’s website:

https://www.deepseek.com/

Trump announces Stargate AI Project

Jan. 22, 2025

On January 21, 2025, President Donald Trump announced the launch of the Stargate project, an ambitious artificial intelligence (AI) infrastructure initiative with an investment of up to $500 billion over four years. This venture is a collaboration between OpenAI, SoftBank, Oracle, and MGX, aiming to bolster AI capabilities within the United States.

· Investment and Infrastructure: The project begins with an initial $100 billion investment to construct data centers and computing systems, starting with a facility in Texas. The total investment is projected to reach $500 billion by 2029.

· Job Creation: Stargate is expected to generate over 100,000 new jobs in the U.S., contributing to economic growth and technological advancement.

· Health Innovations: Leaders involved in the project, including OpenAI CEO Sam Altman and Oracle co-founder Larry Ellison, highlighted AI's potential to accelerate medical breakthroughs, such as early cancer detection and personalized vaccines.

· National Competitiveness: The initiative aims to secure American leadership in AI technology, ensuring that advancements are developed domestically amidst global competition.

That’s my take on it:

While the project has garnered significant support, some skepticism exists regarding the availability of the full $500 billion investment. Elon Musk, for instance, questioned the financing, suggesting that SoftBank has secured well under $10 billion.

Nevertheless, I am very optimistic. Even if SoftBank or other partners cannot fully fund the project, investment would eventually snowball once the project demonstrates promising results. In industries with high growth potential, such as AI, no investor or major player wants to be left behind. If the Stargate project starts delivering significant breakthroughs, companies and governments alike will want to participate to avoid losing competitive advantage.

Some people may argue that there is some resemblance between the internet bubble in the late 1990s and the AI hype today. The late 1990s saw massive investments in internet companies, many of which were overhyped and underdelivered. Valuations skyrocketed despite shaky business models, leading to the dot-com crash. Will history repeat itself?

It is important to note that the internet bubble happened at a time when infrastructure (broadband, cloud computing, etc.) was still in its infancy. AI today benefits from mature infrastructure, such as powerful cloud platforms (e.g., Amazon Web Services), advanced GPUs, and massive datasets, which makes its development more sustainable and its results more immediate.

The internet primarily transformed communication and commerce. AI, on the other hand, is a general-purpose technology that extends its power across industries—healthcare, finance, education, manufacturing, entertainment, and more. Its applications are far broader, making its overall impact more profound and long-lasting.

Links:

https://www.cbsnews.com/news/trump-stargate-ai-openai-softbank-oracle-musk/

https://www.cnn.com/2025/01/22/tech/elon-musk-trump-stargate-openai/index.html

https://openai.com/index/announcing-the-stargate-project/

World Economic Forum: Future of Jobs Report 2025

Jan 15, 2025

Recently the World Economic Forum released the 2025 "Future of Jobs Report." The following is a summary focusing on job gains and losses due to AI and big data:

Job Gains

  • Fastest-Growing Roles: AI and big data are among the top drivers of job growth. Roles such as Big Data Specialists, AI and Machine Learning Specialists, Data Analysts, and Software Developers are projected to experience significant growth.

  • Projected Net Growth: By 2030, AI and information processing technologies are expected to create 11 million jobs, contributing to a net employment increase of 78 million jobs globally.

  • Green Transition Influence: Roles combining AI with environmental sustainability, such as Renewable Energy Engineers and Environmental Engineers, are also seeing growth due to efforts to mitigate climate change.

  • AI-Enhanced Tasks: Generative AI (GenAI) could empower less specialized workers to perform expert tasks, expanding the functionality of various roles and enhancing productivity.

Job Losses

  • Fastest-Declining Roles: Clerical jobs such as Data Entry Clerks, Administrative Assistants, Bank Tellers, and Cashiers are expected to decline as AI and automation streamline these functions.

  • Projected Job Displacement: AI and robotics are projected to displace approximately 9 million jobs globally by 2030.

  • Manual and Routine Work Impact: Jobs requiring manual dexterity, endurance, or repetitive tasks are most vulnerable to automation and AI-driven disruptions.

Trends and Dynamics

  • Human-Machine Collaboration: By 2030, work tasks are expected to be evenly split between humans, machines, and collaborative efforts, signaling a shift toward augmented roles.

  • Upskilling Needs: Approximately 39% of workers will need significant reskilling or upskilling by 2030 to meet the demands of AI and big data-driven roles.

  • Barriers to Transformation: Skill gaps are identified as a major challenge, with 63% of employers viewing them as a significant barrier to adopting AI-driven innovations.

That’s my take on it:

The report underscores the dual impact of AI and big data as key drivers of both job creation in advanced roles and displacement in routine, manual, and clerical jobs. Organizations and higher education should invest in reskilling initiatives to bridge the skills gap and mitigate job losses.

However, there is a critical dilemma in addressing the reskilling and upskilling challenge: if faculty and instructors have not been reskilled or upskilled themselves, how can we help our students face the AI and big data challenges? As a matter of fact, instructors often lack exposure to the latest technological advancements that are critical to the modern workforce, and there is often a gap between what educators teach and what the industry demands, especially in rapidly evolving fields. To put it bluntly, the age of the "evergreen" syllabus is over. The pace of technological advancement often outstrips the ability of educational systems to update curricula and training materials. To cope with trends in the job market, we need to collaborate with technology companies (e.g., Google, Amazon, Nvidia, Microsoft…etc.) to co-create curricula, fund training programs, and provide real-world learning experiences for both educators and students.

Link: https://reports.weforum.org/docs/WEF_Future_of_Jobs_Report_2025.pdf


DSML trend: Top 10 AI-related jobs in 2025


Jan 6, 2025

On Jan 6, 2025, Kanwal Mehreen, KDnuggets Technical Editor and Content Specialist on Artificial Intelligence, posted an article on KDnuggets highlighting the top 10 high-paying AI skills for 2025:

Position and expected salaries

1. Large Language Model Engineering ($150,000–$220,000/year)

2. AI Ethics and Governance ($121,800/year)

3. Generative AI and Diffusion Models ($174,727/year)

4. Machine Learning Ops and On-Prem AI Infrastructure ($165,000/year)

5. AI for Healthcare Applications ($27,000–$215,000/year)

6. Green AI and Efficiency Engineering ($90,000–$130,000/year)

7. AI Security ($85,804/year)

8. Multimodal AI Development ($150,000–$220,000/year)

9. Reinforcement Learning (RL) ($121,000/year)

10. Edge AI/On-Device AI Development ($150,000+/year)

That’s my take on it:

When I mention AI-related jobs, most people associate these positions with programming, engineering, mathematics, statistics…etc. However, as you can see, AI ethics ranks second on the list. AI ethics is indeed a skill in high demand, and professionals in this area are often trained across multiple disciplines; many come from backgrounds such as philosophy, law, mass communication, and the social sciences. For example, Professor Shannon Vallor is a philosopher of technology specializing in the ethics of data and AI. Dr. Kate Crawford is a Microsoft researcher who studies the social and political implications of artificial intelligence; she was previously a professor at the Journalism and Media Research Centre of the University of New South Wales.

In an era where AI and data science increasingly shape our lives, the absence of ethics education in many data science and AI programs is a glaring omission. By embedding perspectives on ethics from multiple disciplines into AI and data science education, we can ensure these powerful tools are used to create a future that is not just innovative, but also just and equitable. After all, AI ethicist is a high-paying job! Why not?

Link: https://www.kdnuggets.com/top-10-high-paying-ai-skills-learn-2025

Nvidia will launch a personal AI supercomputer


1/7/2025

Today (Jan 7, 2025), at the Consumer Electronics Show (CES), AI giant Nvidia announced Project Digits, a personal AI supercomputer set to launch in May 2025. The system is powered by the new GB10 Grace Blackwell Superchip and is designed to bring data center-level AI computing capabilities to a desktop form factor similar to a Mac Mini, running on standard power outlets. With a starting price of $3,000, Project Digits can handle AI models with up to 200 billion parameters.

The GB10 chip, developed in collaboration with MediaTek, delivers 1 petaflop of AI performance. The system runs on Nvidia DGX OS (Linux-based) and comes with comprehensive AI software support, including development kits, pre-trained models, and compatibility with frameworks like PyTorch and Python.
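A quick way to sanity-check the 200-billion-parameter claim is to estimate the memory needed just to hold the weights at different precisions. The sketch below counts weights only, ignoring activations and KV cache, so it is an illustrative lower bound rather than a real capacity test:

# Back-of-the-envelope memory footprint for a 200B-parameter model.
params = 200e9

for precision, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:,.0f} GB")
# FP16: ~400 GB, INT8: ~200 GB, INT4: ~100 GB -- so running a 200B model
# on a desktop box implies aggressive quantization of the weights.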

Nvidia’s CEO Jensen Huang emphasized that Project Digits aims to democratize AI computing by bringing supercomputer capabilities to developers, data scientists, researchers, and students. The system allows for local AI model development and testing, with seamless deployment options to cloud or data center infrastructure using the same architecture and Nvidia AI Enterprise software platform.

That’s my take on it:

A few decades ago, access to supercomputers like the Cray and the CM-5 was limited to elite scientists and well-funded institutions. Today, with initiatives like Project Digits, virtually anyone can harness the computational power needed for sophisticated projects. This democratization of technology allows scientists at smaller universities, independent researchers, and those in developing countries to test complex theories and models without the prohibitive costs of supercomputer access. This shift enables more diverse perspectives and innovative approaches to scientific challenges. Fields not traditionally associated with high-performance computing, such as sociology, ecology, and archaeology, can now leverage advanced AI models, potentially leading to groundbreaking discoveries.

Given this transformation, it is imperative to update curricula across disciplines. Continuing to teach only classical statistics does a disservice to students. We must integrate AI literacy across various fields, not just in computer science, mathematics, or statistics. Additionally, the focus should be on teaching foundational concepts that remain relevant amidst rapid technological advancements. It is equally critical to emphasize critical thinking about analytical outputs, fostering a deep understanding of their implications rather than solely focusing on technical implementation.

Link: https://www.ces.tech/videos/2025/january/nvidia-keynote/

Fragility of LLMs in the real world

11/20/2024

In a new article published to the arXiv preprint database, MIT, Harvard, and Cornell researchers found that large language models (LLMs) like GPT-4 and Anthropic's Claude 3 Opus struggle to accurately model the real world, especially in dynamic environments. This fragility is highlighted when LLMs are used for navigation: unexpected changes, such as detours or closed streets, can lead to significant drops in LLMs' accuracy or total failure.

LLMs trained on random data formed more accurate world models than those trained on strategic processes, possibly because random data exposes the models to a wider variety of possible steps, even if they are not optimal. The study raises concerns about deploying AI systems in real-world applications, such as driverless cars, where dynamic environments are common. The researchers warn that the lack of coherent world models in LLMs could lead to malfunctions.

That’s my take on it:

The disconnect between clean models and the messy real world is not a new problem. In fact, it mirrors existing challenges in conventional statistics. In parametric statistics, we often make unrealistic assumptions about data structures, such as normality and independence. Robustness to non-normality, heteroskedasticity, and other violations of these assumptions is a highly sought-after feature, and similar principles may apply to LLMs. We expect clean data, rely on linear models despite most real-world relationships being non-linear, and treat experimental methods as the gold standard.
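As a small illustration of how a clean model can mislead on messy data, consider the purely illustrative simulation below (numpy only): the fitted slope survives heteroskedastic noise, but any inference that assumes constant variance would not:

import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.uniform(0, 10, n)
# Heteroskedastic noise: variance grows with x, violating the
# constant-variance assumption behind ordinary least squares inference.
y = 2.0 * x + rng.normal(0.0, 0.5 + 0.5 * x, n)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
print(f"fitted slope: {slope:.2f} (true value 2.00)")
print("residual SD where x < 2:", residuals[x < 2].std().round(2))
print("residual SD where x > 8:", residuals[x > 8].std().round(2))

The point estimate looks fine, yet confidence intervals and p-values computed under the textbook assumptions would be misleading, which is the same gap LLMs face between training conditions and the real world.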

While controlled environments provide clarity and reproducibility, they often fail to capture the richness and unpredictability of real-world scenarios. Similarly, training LLMs on strategically optimized data may cause them to overfit to specific patterns, limiting their generalizability. A promising approach to address this challenge could be to combine LLMs with other models, such as reinforcement learning agents trained in dynamic simulations, to enhance their understanding of complex and dynamic environments.

Link: https://arxiv.org/pdf/2406.03689

Doubao becomes the most popular AI bot in China

11/14/2024

According to the South China Morning Post, Doubao, a conversational AI bot that ByteDance launched in August, has quickly become China's most popular AI app, boasting 51 million monthly active users. This far exceeds the user bases of Baidu’s Wenxiaoyan (formerly known as Ernie Bot), with 12.5 million users, and Moonshot AI’s Kimi, backed by Alibaba Group, with 10 million users.

Doubao prioritizes personalization and a human-like interaction experience, aiming to make AI more accessible. Doubao's diverse features include writing assistance, summarization, image, audio, and video generation, data analysis, and AI-powered online search. Within three months, it introduced over 20 new skills, earning praise for its effective text editing, logical content organization, and user-friendly design.

That’s my take on it:

While Doubao has demonstrated remarkable growth and capabilities, it is difficult to directly compare it to global AI tools like ChatGPT, Claude, or Perplexity AI without standardized benchmarks. This highlights a growing divergence in the global AI landscape. Much like the broader internet in China, which has been heavily regulated under the Great Firewall since the late 1990s, the AI market is shaped by domestic policies and international competition. The Great Firewall restricts access to foreign websites, leading to the creation of Chinese alternatives to global platforms, such as Baidu instead of Google and WeChat instead of WhatsApp. These restrictions mean that Chinese internet users and users in other countries often have vastly different online experiences and knowledge bases.

This pattern extends to AI, where China's market is dominated by domestic products due to regulatory constraints that limit access to global AI tools like ChatGPT, Claude, Google Gemini, and Perplexity AI. These American AI companies choose not to operate in China due to difficulties in complying with local laws and regulations regarding AI and information control. As technology advances, it raises a critical question: does it bring people closer together, or does it reinforce divisions? The parallel growth of distinct digital ecosystems suggests that technology, while offering unprecedented possibilities, also has the potential to deepen divides.

Link: https://www.scmp.com/tech/tech-trends/article/3286276/chinas-hottest-ai-bot-bytedances-doubao-tops-charts-51-million-active-users?module=top_story&pgtype=subsection


Does Recraft outperform Ideogram?

11/8/2024

Recently, Recraft, particularly its latest release, Recraft V3, has been attracting attention for its impressive ability to generate highly accurate text within images. It is said to be superior to other AI image generators, including Ideogram. One standout feature of Recraft V3 is its capability to produce images containing extended text, not just a few words. Additionally, Recraft V3 is praised for its anatomical precision, an area where many AI image generators struggle, especially with hands and faces. Unlike some other generators, Recraft V3 also supports vector image generation, making it particularly beneficial for designers.

That’s my take on it:

To test this, I compared Ideogram V2 and Recraft V3 with the prompt: “an AI robot and a data scientist meet together. The T-shirt of the data scientist has these exact words: Pattern seeking in data science.” Interestingly, although all four images from Ideogram V2 met my specifications, Recraft’s output included spelling errors like “Pattern Sekins in Data Science” and “Patern seeking in data science.” As a researcher, I know that multiple trials are necessary for a robust conclusion. I’ll continue testing and will share my findings. However, for now, I recommend sticking with Ideogram.


Research suggests LLMs lead to homogenization of ideas and cognitive decline

11/1/2024

A recent study by University of Toronto researchers found that, in the long run, use of large language models (LLMs) may reduce human creativity in terms of divergent and convergent thinking. The study involved two large experiments with 1,100 participants to assess how different forms of LLM assistance affect independent creative performance. It found that LLM assistance can initially enhance creativity during assisted tasks but may hinder independent creative performance in subsequent unassisted tasks. Participants who had no prior exposure to LLMs generally performed better in the test phase, suggesting that reliance on LLMs could impair inherent creative abilities.

The effects of LLMs varied significantly between divergent and convergent thinking tasks. In divergent thinking, where participants needed to propose alternatives, they showed skepticism towards LLM assistance. Conversely, in convergent tasks, where participants were asked to narrow down diverse ideas to the final solution, they tended to accept LLM assistance. The study found that LLM-generated strategies could lead to a homogenization of ideas, where participants produced more similar outcomes even after ceasing LLM use. This effect was particularly pronounced in the divergent thinking tasks, raising concerns about the long-term impact on creative diversity.

That’s my take on it:

The findings from the University of Toronto study underscore the need to balance AI assistance with practices that actively cultivate our own creativity and critical thinking. To encourage creative independence, people should use AI as a tool to generate initial ideas or inspiration, but refine, expand, and adapt these ideas independently. This ensures that AI serves as a starting point rather than the end goal, promoting one's own creative engagement.

As a professor, I will never accept any assignment directly output from AI. For divergent tasks, such as brainstorming, we should deliberately avoid using AI to prevent “homogenized” ideas. We should turn to a variety of resources and experiences for creative inspiration. Books, in-person conversations, physical exploration, and hands-on activities can all spark unique perspectives and insights that AI-generated suggestions may not provide.

Link to the research article: https://arxiv.org/abs/2410.03703

Link to video: https://drive.google.com/file/d/1z-zJXNYVzNo6_ZUe-T_DXGmN6yPG57GA/view?usp=sharing

Questionable practices of Character.AI

10/25/2024

Recently the mother of a 14-year-old boy who died by suicide after becoming deeply engaged with AI chatbots has filed a lawsuit against Character.AI, claiming the company’s technology manipulated her son, Sewell Setzer III. Megan Garcia, his mother, alleges that the AI chatbot app, marketed to children, exposed Sewell to "hypersexualized" and lifelike interactions that contributed to his mental distress. The lawsuit states that Sewell, who began using Character.AI's bots in April 2023, grew obsessed with personas based on characters from Game of Thrones, especially the Daenerys chatbot. This chatbot reportedly engaged in intimate, emotionally charged conversations with Sewell, including discussions on suicide. After expressing suicidal thoughts, Sewell allegedly received responses that reinforced these thoughts, leading up to his tragic death in February 2024.

Character.AI expressed condolences and emphasized recent updates, including safety features for users under 18 to reduce exposure to sensitive content and discourage prolonged usage. Garcia’s legal team claims that Sewell lacked the maturity to recognize the AI’s fictional nature and alleges that Google, due to its close ties with Character.AI, should also be held accountable. However, Google denies involvement in the development of Character.AI’s products.

That’s my take on it:

Currently, the field of AI remains largely unregulated, and this isn’t the first time Character.AI has faced allegations of unethical practices. Previously, it was discovered that Character.AI used the persona of a deceased woman as a chatbot without her family’s consent, raising further ethical concerns.

Regarding the current case, Character.AI has a duty to protect minors, especially from potentially manipulative or harmful interactions. Given Sewell’s young age and apparent emotional vulnerability, the chatbot's responses—particularly on topics like suicide—raise significant ethical concerns. AI systems marketed to the public should include stringent protections to prevent unintended harm, especially among younger or emotionally vulnerable users. Ethical AI involves ensuring users understand that they are interacting with a program, not a real person. Despite Character.AI’s disclaimer efforts, many users, especially younger ones, might still struggle to fully separate the AI from a genuine human connection. For minors, such “relationships” with virtual characters could create emotional dependency, as seen with Sewell and the chatbot he interacted with.

Links:

https://futurism.com/character-ai-murdered-woman-crecente

https://www.nbcnews.com/tech/characterai-lawsuit-florida-teen-death-rcna176791