AI Agents for Pen Testing and Cybersecurity
Over the weekend, I read an interesting paper[1] by researchers at Stanford, Carnegie Mellon, and Gray Swan AI about an AI agent they built that outperformed 9 out of 10 human penetration testers. It got me thinking about AI tooling for pen testing, and I learned about a neat tool called Strix[2]. Strix provides autonomous AI agents that act like real hackers: they run your code dynamically, find vulnerabilities, and validate them through actual proofs-of-concept. Strix works with many models, including Claude.
I’ve been using Claude Code for a little while and have really enjoyed my time with it building applications, troubleshooting code, and using it as a sounding board for the new curriculum I plan to use this semester. As I dove into Strix using Claude for pen testing, I found that it is extremely capable. I had it run a pen test against one of my applications, and it returned a 65-page write-up with findings; one of the findings had a CVSS score of 9.8 (oof!). Strix validated the findings, walked me through the methodology, and recommended fixes. I gave it the URL of my application, access to my GitHub repo, and login credentials, and Strix did the rest. Strix uses APIs to interact with the models it supports, which can get expensive when doing deep analysis. I wanted a way to use my existing Claude Code Max subscription instead of paying for the more expensive API usage.
I paired with Claude to modify the Strix code base to work with Claude Code in a local environment. I wanted to expose the Strix agents via an MCP server so the Strix tooling would be available to Claude Code using the subscription we already had. Additionally, I wanted the tests to continue running after my session ended on my EC2 instance. We set out to build the app[3] so we could run the tests in a screen and come back to them later. I also needed a good way to manage concurrent scans. I wanted to know which scans were running, an easy way to attach to them to view status, and a way to clean them up quickly after they finished. So we built a terminal UI wrapper that calls the app that runs the tests. We can now run pen tests using our Claude Code subscription on remote Linux servers that persist between sessions.
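The session handling described above can be sketched with GNU screen's standard flags. This is a minimal Ruby sketch, not the actual wrapper: the `ScanSessions` module, the `strix-` session prefix, and the method names are hypothetical and chosen only for illustration. It builds argument lists and parses text without executing anything.

```ruby
# Hypothetical helper that builds GNU screen commands for long-running scans.
# Nothing is executed here; callers would pass the arrays to system(*cmd).
module ScanSessions
  PREFIX = "strix-" # assumed naming convention, not taken from the real tool

  # Start a detached, named session so the scan survives an SSH logout.
  def self.start_command(name, scan_command)
    ["screen", "-dmS", "#{PREFIX}#{name}", *scan_command]
  end

  # Reattach to a running scan to check on its status.
  def self.attach_command(name)
    ["screen", "-r", "#{PREFIX}#{name}"]
  end

  # Ask a finished session to quit so it no longer shows up in the list.
  def self.cleanup_command(name)
    ["screen", "-S", "#{PREFIX}#{name}", "-X", "quit"]
  end

  # Parse `screen -ls` output into hashes, keeping only sessions with our prefix.
  def self.parse_list(output)
    output.each_line.filter_map do |line|
      match = line.match(/^\s*(\d+)\.(\S+)\s.*\((Attached|Detached)\)/)
      next unless match && match[2].start_with?(PREFIX)

      { pid: match[1].to_i, name: match[2].delete_prefix(PREFIX), state: match[3] }
    end
  end
end
```

A terminal UI could call `parse_list` on the text returned by `screen -ls` to show which scans are still running, then use `attach_command` or `cleanup_command` for the selected one.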
If you have not looked at AI for pen testing, I think you might want to. I don’t think it will replace the pros anytime soon, but it seems to be a pretty good sidekick for a developer.
Typo of Theseus: Reviving the oldest Ruby on Rails Open Source Project
Reviving might be a bit of an overstatement. The project I “revived” is still actively supported, just under a different name.
Last week, I pulled down an old Ruby on Rails project, Typo[1] (now known as Publify[2]), widely regarded as the first Ruby on Rails blogging platform and one of the oldest open-source projects built on Rails. I forked it from Emmanuel Auffray[3], who had version 6.1.0 on GitHub, and quickly realized just how old the codebase was—the project originally shipped in January 2005, right at the dawn of Rails itself. Its last meaningful development activity was over a decade ago.
I had a simple task in mind: see if I could use Claude Opus 4.5 to get it running on Rails 7 and Ruby 3.4.7, even though it originally shipped with Rails 3.0.10 and Ruby 1.9.3. I told Claude not to ask me for permission and to just go to work upgrading what it needed using the Claude CLI. Over the holidays, Anthropic doubled Claude’s usage window, which meant I could let Claude run uninterrupted for hours at a time.
Claude ran for nearly 18 hours straight, debugging, attempting fixes, failing, retrying, and debugging again. By the end of the hackathon, I had a running Typo instance with many failing tests and a truly Frankenstein backend. Once I had time to review the code, I realized what had happened: Claude had stitched together a hybrid of Rails 3 and Rails 7, with a compatibility layer to keep old components alive under a modern framework. No wonder the tests were failing. Anyone who has lived through a major Rails upgrade[4] will recognize this pattern.
I had to give Claude additional instructions to remove deprecated code, rewrite things the right (Rails) way, and, while we were at it, upgrade everything again to Rails 8. I wasn’t able to one-shot the application to Rails 8, but by iterating together, error by error, we eventually got there. Once Typo was functional again and core features worked (creating articles, posting comments, managing categories and tags, and loading sidebar plugins), I decided it was time to modernize the user experience. I removed the legacy visual and HTML editors and replaced them with a proper Markdown editor, CodeMirror[5]. I added Active Storage[6] so I could use S3 for uploads, replaced legacy JavaScript with Turbo and Stimulus via Hotwire[7], and experimented with new sidebar plugins for Flickr, Spotify, and X, with mixed results. I also removed a bunch of themes that I was not going to use. When I used Typo for the first time 12 years ago, I was a big fan of the Scribbish theme by packagethief. I used that as my base template and then leveraged styles I like from different places, mostly based on LaTeX[8] templates I have used in the past. Oh, and on posts where I write a lot I can indent paragraphs! I know that seems trivial[9].
On the content side, I built new Typo textfilters for generating PDF slideshows from Markdown, added beautiful code syntax highlighting using Prism.js[10], and implemented automatic link referencing for articles. After all of that, Claude summarized the work we did:
In just 7 days and 16 commits, this modernization effort transformed a decade-old Rails 3 application into a modern Rails 8 application with:
- 5 major Rails version upgrades
- 2 major Ruby version upgrades
- 54% more test coverage
- 29% fewer files, resulting in a leaner codebase
- A modern CI/CD pipeline using GitHub Actions
- Docker deployment support
- A fully modern JavaScript architecture
The repository—now totaling 3,740 commits since January 2005—has been successfully brought up to modern standards while preserving its original purpose as a blogging platform.
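The automatic link referencing mentioned above can be illustrated with a small filter that turns inline Markdown links into numbered citations and collects the URLs into a reference list. This is only a sketch of the idea, not the actual Typo textfilter: the `reference_links` name and the exact output layout are assumptions.

```ruby
# Hypothetical text filter: rewrites [text](url) as text[n] and appends
# a "References" list. Image links (![alt](url)) are left untouched.
def reference_links(markdown)
  urls = []
  body = markdown.gsub(/(?<!!)\[([^\]]+)\]\(([^)\s]+)\)/) do
    text = Regexp.last_match(1)
    url = Regexp.last_match(2)
    urls << url unless urls.include?(url) # repeated URLs reuse their number
    "#{text}[#{urls.index(url) + 1}]"
  end
  return body if urls.empty?

  references = urls.map { |url| "- #{url}" }
  "#{body}\n\nReferences\n#{references.join("\n")}"
end

puts reference_links("I forked [Typo](https://typosphere.org) last week.")
```

Numbering URLs in order of first appearance keeps the inline markers stable when the same link is cited more than once.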
References
- typosphere.org
- github.com/publify/publify
- github.com/ManuInNZ/typo
- guides.rubyonrails.org/upgrading_ruby_on_rails.html
- codemirror.net
- guides.rubyonrails.org/active_storage_overview.html
- hotwired.dev
- latex-project.org
- reddit.com/r/TrueAskReddit/comments/33qilf/why_dont_we_indent_paragraphs_online/
- prismjs.com
ARobot.Wiki - Claude and FIRST® Robotics Competition
Over the last week or so, I’ve had some time for Learn and Be Curious[1]. I wanted to get better acquainted with Amazon Bedrock[2] and its vast offerings. I also have a kick-off on January 10th for FRC[3], when FIRST will release a new robotics challenge for high school robotics teams. Each year, these challenges kick off with a new game and rules. The rules are released at the same time the game is announced, so it’s imperative that teams understand the rules as quickly as possible. A team will frequently reference the rules throughout the build season as they design, prototype, and build their robot for competition.
Many of the mentors on my team are not district employees and can’t access school computers for research, so they rely heavily on their phones (we also don’t have many computers in our build space). I wanted to help solve the mentors’ problem with checking the rules for specifics, and also introduce my students to AI. My team is required to read the game manual from cover to cover so they have a good understanding of the game. Sometimes, though, they’ll be working on a particular problem with the robot and recall a rule they read but not where they read it, so they can’t reference it. This sounded like a perfect problem for a Retrieval-Augmented Generation (RAG) AI. So I set out, with the help of Claude, to develop a mobile-friendly web application that FRC teams can use to quickly reference the rules by asking questions.
We developed ARobot.wiki[4] over several days. The app uses Amazon Bedrock with Titan (for vectors) and Claude Sonnet 4.5 (for information retrieval) to let users ask questions about last year’s game. The system provides answers with citations, which users can click to open the PDF game manual and see where each citation came from. This worked pretty well out of the box, but we ran into issues where Claude was missing some FRC context. To overcome this challenge, in addition to the vectors, we implemented agentic lookups that allowed Claude to follow additional context vectors to provide users with better answers. We also implemented an FRC glossary of terms to give Claude additional context as it worked to understand our users’ questions. To date, we have over 50 users and 35 unique teams registered and gearing up for the REBUILT[5] kick-off next weekend. When the new game manual is released, we will clear out the old rules and replace them with the new rule book. The best part of the application is the wiki that allows users to share helpful conversations with the rest of the FRC community.
The application has been a lot of fun to build, and we’ve encountered some interesting challenges. One of the biggest challenges to overcome with an application like this is trust: not just whether the AI is being truthful, but more specifically trust in data security and privacy, because minors (13+) are using it. We wanted to build a platform that parents, coaches, mentors, and students could use together. Some parents (me included) are cautious about the AI tools we let our kids use, so we built a platform that gives parents complete oversight of their students’ conversations to ensure appropriate engagement. We also built in fairly heavy guardrails with the agent to keep it within its intended purpose as it answers questions.
Overall, I am very pleased with the application. Existing software engineering practices and rituals are very helpful when working with coding agents. I learned a lot about building specs and pairing with Claude. The biggest lesson I learned was that Test-Driven Development[6] and Gherkin scenarios[7] take a lot of the rework out of working with agents. Lastly, a well-formatted and thoughtful CLAUDE.md[8] file will help keep the agent on track and remind it to follow the TDD / BDD coding instructions.
I have shared our specs[9] along with the slides that Claude wrote as we completed the journey together. The slides are more technical and discuss some of the design decisions we made and our tech stack.
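The retrieval step behind this kind of RAG setup can be sketched in a few lines of Ruby. This is a toy illustration under stated assumptions, not the ARobot.wiki code: the vectors are hand-made three-number stand-ins for real embeddings, and the `Chunk` struct, the glossary entry, the page numbers, and the chunk text are all hypothetical placeholders.

```ruby
# Toy retrieval step for a RAG pipeline. A real system would get vectors
# from an embedding model; these tiny vectors are made up for the example.
Chunk = Struct.new(:text, :page, :vector)

# Hypothetical glossary entry used to add domain context to a question.
GLOSSARY = { "bumper" => "protective padding around the robot frame" }.freeze

# Append glossary notes for any known terms that appear in the question.
def expand_question(question)
  notes = GLOSSARY.select { |term, _| question.downcase.include?(term) }
                  .map { |term, meaning| "#{term}: #{meaning}" }
  notes.empty? ? question : "#{question}\nGlossary: #{notes.join('; ')}"
end

# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

# Return the k chunks most similar to the query, best match first.
def top_chunks(query_vector, chunks, k: 2)
  chunks.max_by(k) { |chunk| cosine_similarity(query_vector, chunk.vector) }
end

chunks = [
  Chunk.new("Bumper rules (placeholder text)", 41, [0.9, 0.1, 0.0]),
  Chunk.new("Match timing (placeholder text)", 12, [0.0, 0.2, 0.9]),
  Chunk.new("Robot weight (placeholder text)", 38, [0.7, 0.6, 0.1])
]

puts expand_question("Can the bumper be blue?")
top_chunks([1.0, 0.2, 0.0], chunks).each { |c| puts "p.#{c.page}: #{c.text}" }
```

Keeping the page number on each chunk is what makes clickable citations possible: the model answers from the retrieved text, and the interface can link each answer back to the page it came from.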
References
- amazon.jobs/content/en/our-workplace/leadership-principles
- aws.amazon.com/bedrock/
- firstinspires.org/programs/frc/
- arobot.wiki
- firstinspires.org/programs/frc/game-and-season
- martinfowler.com/bliki/TestDrivenDevelopment.html
- cucumber.io/docs/gherkin/reference/
- anthropic.com/engineering/claude-code-best-practices
- github.com/tghastings/frc-rag-docs/blob/main/README.md
Ruby 4.0!
Ruby 4.0 was released[1] on Christmas Day! They also redesigned the Ruby website[2], and it is beautiful. You can tell a lot of love went into the new design. I’m not sure how they do it, but the language continues to bring joy with each release.
# Output "I love Ruby"
say = "I love Ruby"
puts say
# Output "I *LOVE* RUBY"
say = say.sub("love", "*love*")
puts say.upcase
# Output "I *love* Ruby"
# five times
5.times { puts say }
New Book for Christmas
I received a new book for Christmas, Disciplines of a Godly Man[1] by R. Kent Hughes. It has a lot of Godly wisdom and reminders about what men are called to, and it’s given me a fresh perspective. I’m only a few chapters in and already have a lot of work to do. Each chapter ends with a reflection. I need to go back and re-read each chapter at least twice to gather and internalize all of the wisdom.
It has been nice to take a break from the computer and dive deeper into this book. This season has been filled with many advancements in AI, which has made my life interesting and a little busier than usual for December (I’ll share more about that later). In the meantime, I wanted to get my first post for 2026 out. Here’s to many more.
References
Spring 2025 Student Project Reflections
DevEdu... Edtech Cloud Development Environment
Excited to announce the launch of DevEdu[1]. Educators can create an account, create a course, and assign development environments to the course. Students access containerized development environments through their web browsers, featuring VSCode with integrated terminal access.
Instructors can configure specific environments—I use Django in my UCCS courses—where they specify the version of Python and Django each student gets. Students enroll using course links and launch containers matching the instructor’s specifications.
For students unable to purchase licenses, the platform offers Docker images as open-source software so they can host the environments locally. The service also partners with bookstores to provide bulk licensing through textbook affordability programs.
The platform has proven successful at UCCS with over 100 students, eliminating the need for individual technical support on personal devices while maintaining flexibility for different programming stacks.
References
New Year, New Blog, Welcome Rails 8!
I love this time of year. Things at work slow down and I have time to learn new tools and techniques. This year, I’ve spent my time moving my blog over from Rails 6 with React and Devise to native Rails 8. Rails 8 provides an authentication mechanism and many new improvements.
Graduated 2024! Ph.D.
I finally made it. After six years studying at the University of Colorado Colorado Springs, I graduated in May with my Ph.D. in Engineering. I’m excited to be done and to focus on my family and my professional career, which still includes teaching occasionally.