Ideas Worth Exploring: 2025-03-26

Ideas: Benj Edwards - Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries

https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries

Benj Edwards reports on a growing problem in the open source community: aggressive AI crawlers from companies like Amazon, OpenAI, Anthropic, and Alibaba are overwhelming Git repositories, causing instability, higher bandwidth costs, and DDoS-like load on vital public resources. These crawlers evade standard defenses by spoofing user agents, routing traffic through residential IP addresses used as proxies, and ignoring robots.txt directives.

Software developers are resorting to measures like moving servers behind VPNs and implementing custom proof-of-work challenge systems to filter out bot traffic. However, these solutions also penalize legitimate users, who must wait for the computational puzzle to complete before a page loads, and the wait is longest on mobile devices.
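To make the proof-of-work idea concrete, here is a minimal Python sketch of a hash-based challenge of the general kind described above. It is not the implementation of any tool mentioned in the article; the function names, the "challenge:nonce" string format, and the difficulty value are assumptions chosen for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check a submitted answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one satisfies the target.

    Expected work is roughly 2**difficulty hashes, which is why slower
    devices wait noticeably longer.
    """
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    # Hypothetical challenge value; a real server would issue a random one.
    nonce = solve("example-challenge", 16)
    print(nonce, verify("example-challenge", nonce, 16))
```

The asymmetry is the point of the design: solving costs many hashes while checking costs one, so a site can raise the price of each request for a mass crawler without doing much work itself.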

Open source projects, which typically operate with limited resources compared to commercial entities, are facing a disproportionate burden from these AI crawlers. The cost goes beyond raw bandwidth: the crawlers often hit expensive endpoints like git blame and log pages, which take far more server work per request than static content. Some open source projects have also started receiving AI-generated bug reports, wasting developer time with fabricated vulnerabilities.

It is unclear why these companies do not adopt more collaborative approaches or rate-limit their data harvesting runs to avoid overwhelming source websites. The article mentions that some companies like OpenAI and Anthropic are at least setting proper user agent strings, while others like Alibaba are reportedly more deceptive in their approaches.

In response to these attacks, defensive tools like Nepenthes and Cloudflare's AI Labyrinth have emerged to trap crawlers in endless mazes of fake content or link them to a series of AI-generated pages, aiming to waste the companies' resources and potentially poison their training data. The "ai.robots.txt" project also offers an open list of web crawlers associated with AI companies and provides premade robots.txt files to implement the Robots Exclusion Protocol.
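As a rough illustration of what honoring the Robots Exclusion Protocol looks like from the crawler's side, the sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the example.org URL, and the is_allowed helper are assumptions made up for this example, not files taken from the ai.robots.txt project.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler by name, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/repo/log"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", url))
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", url))
```

The check only works when the crawler chooses to run it and identifies itself honestly, which is why the spoofed user agents described above defeat robots.txt entirely.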

GitHub Repos: Whose code am I running in GitHub Actions?

https://alexwlchan.net/2025/github-actions-audit

The author reflects on the compromise of the GitHub Action 'tj-actions/changed-files', in which an attacker added malicious code to leak secrets. The attack was possible because workflows referenced the action by mutable Git tags instead of immutable commit IDs, so an attacker could change which code runs when a workflow is triggered.

The article then provides a shell script that can be used to find all the GitHub Actions being used in your repositories and counts how many times each one appears. It explains how to use Unix pipelines to chain together various text processing tools like find, xargs, sed, tr, awk, and sort to achieve this. The script helps users identify which actions they are using, evaluate the trustworthiness of their authors, and decide whether they need to write their own scripts instead.
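The original article does this with a shell pipeline. For readers who prefer Python, the following is a rough equivalent sketch, not the author's script. It assumes workflows live under .github/workflows, matches "uses:" lines with a simple regular expression rather than a YAML parser, and adds a check for references pinned to a full 40-character commit ID; all function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Matches "uses: owner/repo@ref", with or without a leading "- ".
# A regex is an approximation; unusual YAML formatting may be missed.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*['\"]?([^\s'\"#]+)", re.MULTILINE)
# A full commit ID is 40 hexadecimal characters after the "@".
SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def extract_uses(workflow_text: str) -> list[str]:
    """Return every action reference found in one workflow file."""
    return USES_RE.findall(workflow_text)


def is_pinned(ref: str) -> bool:
    """True if the reference points at an immutable commit ID, not a tag."""
    return bool(SHA_RE.search(ref))


def count_actions(root: str) -> Counter:
    """Count action references across all workflow files below root."""
    counts: Counter = Counter()
    for path in Path(root).glob("**/.github/workflows/*.y*ml"):
        counts.update(extract_uses(path.read_text(encoding="utf-8")))
    return counts
```

Calling count_actions on a directory of checked-out repositories and iterating over most_common() gives the per-action tally; filtering that list with is_pinned shows which references could still be changed underneath you. Local actions (paths starting with "./") will show up as unpinned and need to be judged separately.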

The author encourages readers who use GitHub Actions to run the script on their repositories to check what they're using and become familiar with Unix text processing tools and pipelines for creating one-off scripts for data processing.

GitHub Repos: Versus Incident

https://github.com/VersusControl/versus-incident

An open-source incident management tool that supports alerting across multiple channels with easy custom messaging and on-call integrations. Compatible with any tool supporting webhook alerts, it’s designed for modern DevOps teams to quickly respond to production incidents.

Multi-channel Alerts: Send incident notifications to Slack, Microsoft Teams, Telegram, and Email (more channels coming!)
Custom Templates: Define your own alert messages using Go templates
Easy Configuration: YAML-based configuration with environment variables support
REST API: Simple HTTP interface to receive alerts
On-call: On-call integrations with AWS Incident Manager

Ideas: Robin Wieruch - Authorization in Next.js

https://www.robinwieruch.de/next-authorization

Robin Wieruch provides a guide on implementing authorization in Next.js when using React Server Components and Server Actions in Next's App Router. The guide emphasizes enforcing authorization before users can access data sources, which is typically done in the API layer, and outlines authorization approaches for data access, routing, UI, and middleware.

For data access, authorization checks should be implemented in custom query functions or Server Actions to prevent unauthorized read and write operations. Routing-based authorization can be achieved by applying checks in entry point components that have access to the database to prevent unauthorized users from accessing certain routes. However, using Layout components for authorization is not recommended as a security solution but may provide convenience for developers.
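The principle of checking authorization inside the query function itself is not specific to Next.js. The sketch below shows it in Python purely as a language-neutral illustration; it is not code from the guide, and the User and Ticket types, the in-memory store, and the get_ticket function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class NotAuthorized(Exception):
    """Raised when the caller may not access the requested data."""


@dataclass(frozen=True)
class User:
    id: str


@dataclass(frozen=True)
class Ticket:
    id: str
    owner_id: str
    title: str


# Stand-in for a database table (assumption for the example).
_TICKETS = {"t1": Ticket(id="t1", owner_id="alice", title="Fix login")}


def get_ticket(current_user: Optional[User], ticket_id: str) -> Ticket:
    """Query function that refuses to return data the caller may not see.

    Because the check lives next to the data access, every route, page,
    or action that calls this function is protected automatically.
    """
    if current_user is None:
        raise NotAuthorized("sign-in required")
    ticket = _TICKETS.get(ticket_id)
    if ticket is None or ticket.owner_id != current_user.id:
        # Same error for "missing" and "not yours" avoids leaking existence.
        raise NotAuthorized("ticket not accessible")
    return ticket
```

A check placed only in a layout or a route wrapper protects one path to the data; a check inside the query function protects all of them, which is the reason for keeping authorization close to the sensitive data.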

Robin Wieruch concludes that the most important part of authorization is having it as close as possible to sensitive data, emphasizing the need for robust authorization checks in API, Service, and Data Access Layers.

Ideas: Paul Stamatiou - Browse No More

https://paulstamatiou.com/browse-no-more

Paul Stamatiou discusses the decline of browsing the web due to the increasing use of AI answer engines such as ChatGPT, Perplexity, Grok, Copilot, and Gemini.

These tools provide quick answers to queries but sacrifice the joy of serendipitous discovery, personal connections, and authentic web experiences that traditional search engines offered.

The author laments the lack of transparency in how these AI tools function, their de-prioritization of attribution, black-box decision making, and homogenization of responses. The article suggests a need for intentional personalization, transparency, and control in AI tools to enrich the web browsing experience.

Ideas: Ignite Digital - Goldilocks Effect 🧠 Why We Buy

https://ckarchive.com/b/n4uohvhxg3lels7q339qeh6dxngggil

The article discusses the Goldilocks Effect in marketing, a psychological phenomenon where consumers tend to avoid extremes and prefer the middle option due to factors such as Extremeness Aversion, Loss Aversion, and Regret Aversion. This effect can be utilized by businesses to drive sales by strategically presenting options that make the desired choice appear most attractive.

Examples provided include using a decoy option to drive sales in e-commerce, segmenting plans by user identity in streaming services, and making the middle plan stand out in SaaS offerings. By understanding and applying this effect, businesses can influence customers' purchasing decisions and increase their sales effectively.
