Ideas Worth Exploring: 2025-03-26

  • Writer: Charles Ray
  • Mar 26

Ideas: Benj Edwards - Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries


Benj Edwards reports on a growing problem in the open source community: aggressive AI crawlers from companies such as Amazon, OpenAI, Anthropic, and Alibaba are overwhelming Git repositories, causing instability, higher bandwidth costs, and traffic spikes that resemble DDoS attacks on vital public resources. These crawlers evade standard defenses by spoofing user agents, routing requests through residential IP addresses used as proxies, and ignoring robots.txt directives.


Software developers are resorting to measures such as moving servers behind VPNs and deploying custom proof-of-work challenge systems to filter out bot traffic. These defenses also penalize legitimate users, who must wait for the computational puzzle to complete before a page loads. The delay is most noticeable on mobile devices.
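To make the proof-of-work idea concrete, here is a minimal Python sketch of a hash-based challenge. It is an illustration under stated assumptions, not the system any particular project deploys: the function names, the `challenge:nonce` string format, and the use of SHA-256 are all hypothetical choices. The client must try many nonces, while the server verifies a solution with a single hash.

```python
import hashlib


def _meets_target(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(challenge:nonce) starts with `difficulty_bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    # Treat the 256-bit digest as an integer; a smaller target means more work.
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def solve_challenge(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force the smallest nonce that meets the target."""
    nonce = 0
    while not _meets_target(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


def verify_solution(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return _meets_target(challenge, nonce, difficulty_bits)


if __name__ == "__main__":
    # "example-session-token" is a made-up challenge value for illustration.
    found = solve_challenge("example-session-token", 12)
    print(found, verify_solution("example-session-token", found, 12))
```

Each extra bit of difficulty roughly doubles the expected number of hashes the client has to compute, which is why a setting that slows a crawler fleet can also mean a visible wait on a slow phone.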


Open source projects, which typically operate with far fewer resources than commercial entities, bear a disproportionate share of the burden. The cost is not only raw bandwidth: the crawlers often hit expensive endpoints such as git blame and log pages, which are costly for the server to generate. Some projects have also started receiving AI-generated bug reports, which waste developer time on fabricated vulnerabilities.


It is unclear why these companies do not adopt more collaborative approaches or rate-limit their data harvesting runs to avoid overwhelming source websites. The article mentions that some companies like OpenAI and Anthropic are at least setting proper user agent strings, while others like Alibaba are reportedly more deceptive in their approaches.


In response to these attacks, defensive tools like Nepenthes and Cloudflare's AI Labyrinth have emerged to trap crawlers in endless mazes of fake content or link them to a series of AI-generated pages, aiming to waste the companies' resources and potentially poison their training data. The "ai.robots.txt" project also offers an open list of web crawlers associated with AI companies and provides premade robots.txt files to implement the Robots Exclusion Protocol.
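As a rough illustration of how a robots.txt file targets specific crawlers, the sketch below uses Python's standard `urllib.robotparser` module to evaluate a small, hand-written policy. The policy text and the `is_allowed` helper are assumptions made for this example and are not copied from the ai.robots.txt project. `GPTBot` and `ClaudeBot` are the crawler names OpenAI and Anthropic publish.

```python
from urllib.robotparser import RobotFileParser

# Hand-written example policy: block two AI crawlers everywhere,
# and keep every other client out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.org/repo/log"))
```

The Robots Exclusion Protocol is advisory: it only works when the crawler performs this check itself. That is why it offers no protection against the crawlers described above that ignore robots.txt or spoof their user agent.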


GitHub Repos: Whose code am I running in GitHub Actions?


The author reflects on the compromise of the GitHub Action 'tj-actions/changed-files', in which an attacker added malicious code that leaked secrets from the workflows that used it. The attack worked because those workflows referenced the action by mutable Git tag rather than by immutable commit ID, so the attacker could change the code that runs when a workflow is triggered.


The article then provides a shell script that can be used to find all the GitHub Actions being used in your repositories and counts how many times each one appears. It explains how to use Unix pipelines to chain together various text processing tools like find, xargs, sed, tr, awk, and sort to achieve this. The script helps users identify which actions they are using, evaluate the trustworthiness of their authors, and decide whether they need to write their own scripts instead.
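The article's own script is a Unix pipeline and is not reproduced here. As a hedged alternative, the same audit can be sketched in Python: collect every `uses:` reference from workflow files, count them, and flag references that are not pinned to a full commit ID. All function names, the regular expressions, and the sample workflow (including its made-up 40-character commit ID) are assumptions for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Matches "uses: owner/repo@ref", with or without a leading "- " list marker.
USES_RE = re.compile(r"^[ \t]*(?:-[ \t]+)?uses:[ \t]*['\"]?([^\s'\"#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")

SAMPLE_WORKFLOW = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - name: Changed files
        uses: tj-actions/changed-files@v45
      - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567
"""


def find_actions(workflow_text: str) -> list:
    """Return every action reference named by a `uses:` key."""
    return USES_RE.findall(workflow_text)


def is_pinned(reference: str) -> bool:
    """True if the part after '@' is a full 40-character commit ID.

    Tags and branches (and local ./path references) count as not pinned.
    """
    _, _, ref = reference.partition("@")
    return FULL_SHA_RE.fullmatch(ref) is not None


def count_actions(workflow_texts) -> Counter:
    """Count how often each action reference appears across workflows."""
    counts = Counter()
    for text in workflow_texts:
        counts.update(find_actions(text))
    return counts


def workflow_files(root: str):
    """Yield .yml/.yaml files that sit in a .github/workflows directory."""
    for path in Path(root).rglob("*"):
        in_workflows = path.parent.name == "workflows" and path.parent.parent.name == ".github"
        if in_workflows and path.suffix in {".yml", ".yaml"} and path.is_file():
            yield path


if __name__ == "__main__":
    texts = [p.read_text(encoding="utf-8") for p in workflow_files(".")]
    for action, n in count_actions(texts or [SAMPLE_WORKFLOW]).most_common():
        print(f"{n:4d}  {'pinned ' if is_pinned(action) else 'MUTABLE'}  {action}")
```

Run from a directory that contains your checked-out repositories, this prints each action with its usage count and whether the reference is a mutable tag or branch. A regex scan like this is a quick approximation rather than a YAML parser, so unusual formatting may be missed.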


The author encourages readers who use GitHub Actions to run the script on their own repositories and check what they depend on. The exercise doubles as practice with Unix text processing tools and pipelines, which are well suited to one-off data processing scripts.


GitHub Repos: Versus Incident


Versus Incident is an open-source incident management tool that supports alerting across multiple channels, with easy custom messaging and on-call integrations. It is compatible with any tool that supports webhook alerts and is designed to help modern DevOps teams respond quickly to production incidents.


  • Multi-channel Alerts: Send incident notifications to Slack, Microsoft Teams, Telegram, and Email (more channels coming!)

  • Custom Templates: Define your own alert messages using Go templates

  • Easy Configuration: YAML-based configuration with environment variables support

  • REST API: Simple HTTP interface to receive alerts

  • On-call: On-call integrations with AWS Incident Manager


Ideas: Robin Wieruch - Authorization in Next.js


Robin Wieruch provides a guide on implementing authorization in Next.js when using React Server Components and Server Actions in Next's App Router. It emphasizes the importance of enforcing authorization before users can access data sources, typically done in the API layer. The guide outlines approaches for authorization for data access, routing, UI, and middleware.


For data access, authorization checks should be implemented in custom query functions or Server Actions to prevent unauthorized read and write operations. Routing-based authorization can be achieved by applying checks in entry point components that have access to the database to prevent unauthorized users from accessing certain routes. However, using Layout components for authorization is not recommended as a security solution but may provide convenience for developers.
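The principle of checking authorization inside the data access function itself is not specific to Next.js. The sketch below shows it in Python rather than in Next.js code, and it is not taken from Wieruch's guide: the `User` and `Post` types, the in-memory `POSTS` store, and the ownership rule are all hypothetical. The point is that every caller, whether a page, a layout, or an action, goes through the same check before touching the data.

```python
from dataclasses import dataclass, replace
from typing import Optional


class AuthorizationError(Exception):
    """Raised when the current user may not perform the operation."""


@dataclass(frozen=True)
class User:
    id: str
    role: str  # hypothetical roles: "member" or "admin"


@dataclass(frozen=True)
class Post:
    id: str
    owner_id: str
    body: str


# In-memory stand-in for a database table.
POSTS = {"p1": Post(id="p1", owner_id="alice", body="draft")}


def get_post(current_user: Optional[User], post_id: str) -> Post:
    """Query function: the check lives next to the read, not in the caller."""
    if current_user is None:
        raise AuthorizationError("not signed in")
    post = POSTS[post_id]
    if post.owner_id != current_user.id and current_user.role != "admin":
        raise AuthorizationError("not allowed to read this post")
    return post


def update_post(current_user: Optional[User], post_id: str, body: str) -> Post:
    """Write operation: reuses the read check before changing anything."""
    post = get_post(current_user, post_id)
    updated = replace(post, body=body)
    POSTS[post_id] = updated
    return updated
```

Because the check sits in `get_post` and `update_post`, a route or UI component that forgets its own guard still cannot read or change another user's post.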


Robin Wieruch concludes that the most important part of authorization is having it as close as possible to sensitive data, emphasizing the need for robust authorization checks in API, Service, and Data Access Layers.


Ideas: Paul Stamatiou - Browse No More


Paul Stamatiou discusses the decline of browsing the web due to the increasing use of AI answer engines such as ChatGPT, Perplexity, Grok, Copilot, and Gemini.


These tools provide quick answers to queries but sacrifice the joy of serendipitous discovery, personal connections, and authentic web experiences that traditional search engines offered.


The author laments the lack of transparency in how these AI tools function, their de-prioritization of attribution, black-box decision making, and homogenization of responses. The article suggests a need for intentional personalization, transparency, and control in AI tools to enrich the web browsing experience.


Ideas: Ignite Digital - Goldilocks Effect 🧠 Why We Buy


The article discusses the Goldilocks Effect in marketing, a psychological phenomenon where consumers tend to avoid extremes and prefer the middle option due to factors such as Extremeness Aversion, Loss Aversion, and Regret Aversion. This effect can be utilized by businesses to drive sales by strategically presenting options that make the desired choice appear most attractive.


Examples provided include using a decoy option to drive sales in e-commerce, segmenting plans by user identity in streaming services, and making the middle plan stand out in SaaS offerings. By understanding and applying this effect, businesses can influence customers' purchasing decisions and increase their sales effectively.



©2025 by Mitcer Incorporated.
