Will Smidlein's Blog

uv: It's Really Good

Basically, they took what was working in pip, rye and poetry, and discarded all the stuff that didn’t work. Then they spent months killing tickets to bring it to an insane level of quality.

This cannot be understated, as such a level of quality and dedication is so extremely rare in software that I usually associate it with things like VLC or sqlite. This is the league I consider uv in.

Always try uv first. If it doesn’t work (which is very rare), go back to what you did before or find a workaround.

If you haven’t played with uv yet, I can’t recommend it enough. Python ecosystem tools come and go but this one feels like it’s got some staying power. Lots of tremendously well thought-out decisions, many of which are laid out in the linked article. No magic, just good abstractions. Such a joy.

I will say I am somewhat confused about who is paying for all of this. Like obviously VCs but uh… why? Definitely not out of the goodness of their hearts. Similar uneasiness around Bun, which I also love.

Harper Reed's LLM codegen workflow

For some reason I say “over my skis” a lot when talking about LLMs. I don’t know why. It resonates with me. Maybe it’s because it is beautiful smooth powder skiing, and then all of a sudden you are like “WHAT THE FUCK IS GOING ON!” and are completely lost and suddenly fall off a cliff.

I find that using a planning step […] can help keep things under control. At least you will have a doc you can double-check against. I also do believe that testing is helpful - especially if you are doing wild style aider coding. Helps keep things good, and tight.

Broad strokes, this is very similar to my workflow, but there are lots of nuggets of wisdom in this post. He also goes into a lot of interesting detail about using LLMs in non-greenfield projects. Just a great read all around.

First time hearing about Aider and repomix, excited to try them out.

Where The Linear Chat Paradigm Starts To Fall Apart

Expanding on this comment from my last post:

I have yet to find a UI that lets me tag a chunk of response (eg a specific bullet point) to come back to, or what I really want which is a waterfall of the different conversations that split out of a long response. Typically I want to respond and ask more (or provide more detail) about 2-5 bullet points but in a purely linear structure I’m constantly scrolling back up and trying to remember things I want to loop back and ask about.

I put together a (totally fake) example diagram to illustrate what I mean:

Human: How can I improve this code?

LLM:
- Consistent Error Handling and Wrapping
- Context-Aware Concurrency
- More Idiomatic Data Structures
- Reduce Nesting & Improve Legibility
- Centralized Logging
- Test Coverage & Benchmarks
- Optional: Configurable Retries / Intervals

…and the follow-up threads that all branch off that single response:

- “How do you suggest we solve?” → Creates its own logger → “Can you just use slog?” → Uses slog → “Okay, apply & can you check the whole codebase?” → Finds ~30% → “Okay fix; any more?” → Finds another ~20% → “Okay fix, I'll do the rest with find-and-replace”
- “Why go beyond Go's built-in primitives?” → Explanation → “Hm, makes sense—apply”
- Client in client.go (code) → “Try pseudocode?” → {code} → “Great, apply it”
- “Which data structures?” → bindingManager in internal/client/binding.go → “Great, apply it. Any others?” → TransactionMap in internal/client/transaction_manager.go → “Great, apply it”
- Intentional; idiomatic Go → (No changes needed)
- Duplicate; ignore
- “Expand on the benchmarks” → Big verbose response → “It's fine we use an external test suite”
- “Expand on test coverage” → Transaction Testing in internal/client/transaction.go → “Is that the right place for the test?” → No, (fix) → “Great apply it”
- Binding Manager Testing in internal/client/binding_test.go → “Can we collocate these?” → Yes, {code} → “Great apply it”
- “Are these retries/intervals RFC-defined or other standard?” → RFC 5766, 8656, 6062, 8489 → “Okay they're standardized, no need to make them configurable”

As we get further into the “reasoning era” I think this problem will only become more pronounced. It’s surprising to me that none of the major LLM providers have explored branching logic (at least to my knowledge).

The current solution of “scroll back up and reference earlier in the conversation” falls apart as soon as you get past a few messages. You almost need a mechanism that says “pick back up with my state from here”.
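Mechanically, none of this seems hard: it’s basically a tree of messages where each branch inherits the transcript up to its fork point, and siblings are parallel follow-up threads. A totally hypothetical sketch (in Python, because why not) of the state I’m imagining - none of this reflects any provider’s actual API:

# Hypothetical sketch of "branch from here" chat state.
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str                 # "user" or "assistant"
    content: str
    parent: "Message | None" = None
    children: list["Message"] = field(default_factory=list)

    def branch(self, role: str, content: str) -> "Message":
        """Fork a new follow-up thread off this specific message."""
        child = Message(role, content, parent=self)
        self.children.append(child)
        return child

    def transcript(self) -> list[dict]:
        """The context this branch would send to the model: root -> here."""
        path, node = [], self
        while node is not None:
            path.append({"role": node.role, "content": node.content})
            node = node.parent
        return list(reversed(path))

Every leaf is its own conversation, but they all share the expensive context that came before the fork - “pick back up with my state from here” falls out for free.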

More to come…

How I Quickly Build Complex Side Projects With LLMs (in Feb 2025)

I’m a programmer both professionally and, somewhat begrudgingly, as a hobby. I am constantly building dumb hyper-specific side projects for an audience of 1. I have a shell script that orders a coffee at my local coffee drive-through. I have a Chrome extension that scrapes and syncs my Amazon purchases to YNAB. I have a neural network constantly checking the IP cam in my garage on trash day to make sure I took the trash out. These are things I build- almost compulsively- upon encountering the tiniest annoyance or inconvenience. And it’s been that way for as long as I can remember.

LLMs have hypercharged this. At first I was able to do projects in 1 or 2 iterations with a chatbot, but as these things’ capabilities have grown, so has my appetite. I’m building dumber projects of grander scales than I’ve ever built before. I figured I should share my approach and see how it compares to others.

This workflow particularly shines for data-heavy automation projects that would normally be tedious to build and maintain. Some examples I’ve built using this approach:


Step 1: Lay out the requirements

This stage is usually in a chatbot UI, and I’m generally reaching for Sonnet or 4o unless the project has complex data parsing/structures, at which point I’m usually going for o1.

First, specify that the goal of the conversation is to create a comprehensive bullet point list of requirements “in the style of a Jira ticket”. Next, specify that you want to have a conversation about the problem and the requirements prior to generating the list- ask it to be extremely detailed about its requirement gathering and verbose in its response. Instruct it to remain language-agnostic, reaching for pseudocode where necessary. Finally, the fun part… just word dump the problem. Explain what you want to do in plain English. If you have example input data, provide it. If you have a HAR file from a reverse engineered API, (turn it into Markdown and) provide it. API docs? Provide them.

I have yet to find a UI that lets me tag a chunk of response (eg a specific bullet point) to come back to, or what I really want which is a waterfall of the different conversations that split out of a long response. Typically I want to respond and ask more (or provide more detail) about 2-5 bullet points but in a purely linear structure I’m constantly scrolling back up and trying to remember things I want to loop back and ask about.

At the end you’ll get a nice big text blob with requirements. Save this as a markdown doc - initial-reqs.md. Assume that this requirements list is missing about 30% of what you think you told it. You also cannot assume any logic whatsoever for what it decides to drop and at what part of the process it decides to drop it. But it’s okay! We have a solution…

Step 2: Do it again

Usually still in a chatbot UI, usually reaching for a “smarter” model from a different provider. So if I used Sonnet in the last step, this time I’m probably going for o1. If I used a GPT model, I’m reaching for Gemini 2.0. I have not seen any evidence that this actually matters but anecdotally it feels right so ¯\_(ツ)_/¯. I’m using llm more and more for this step.

First, specify that the goal is to find any gaps in the requirements document “so that a talented but new Junior engineer can complete it without interruption”. Specify that you’re writing Python (even if the end goal is not Python- these things seem to be trained on a lot of Python… and it’s very easy to iterate on). Specify that you require small, well-defined methods with verbose commentary. Ask an open-ended question like “am I missing anything?” and answer any questions you feel are relevant. At the end, dump that entire conversation to a markdown file - detailed-reqs.md.

Step 3: Unit tests!

I’m generally in an “LLM-native” code editor like Cursor, VS Code with GH Copilot, or Windsurf for this step. I can’t say there is any distinct pattern of which model I’m reaching for- in fact I’m generally hot swapping between them just for funsies.

I’ve found I generally ask for the intended output language at this point. For me that’s either Python (I need to iterate on this quickly and run it once), Node.js (I need to run this on a regular basis and can quickly deal with it when something breaks), or Go (I need this to run exactly the same today as it will run 5 years from now). However on a few projects- specifically one parsing really complex CSV structures using Go and another transcoding G.711->Opus with libopus- I found the models choked a bit and got stuck in a loop writing code that would never execute. The trick is to ask it to write the unit tests and code in Python and then ask it to turn that code into Go/Swift/whatever. For the sake of a (marginally) interesting post, I’m going to do that.

Specify that you want a comprehensive set of unit tests that check for conformity against the specifications attached. Attach both initial-reqs.md and detailed-reqs.md. Ask it to include excerpts of the requirements alongside the relevant tests.
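To give a flavor of what I’m nudging it toward, the output looks roughly like this - convert_f_to_c, the converter module, and the quoted requirement text are all made up for illustration:

# test_conversion.py - illustrative only; the module, function, and quoted
# requirements are hypothetical stand-ins for whatever your spec says.
from converter import convert_f_to_c


def test_freezing_point_converts_to_zero():
    """initial-reqs.md: 'Temperatures must be converted from Fahrenheit
    to Celsius and rounded to one decimal place.'"""
    assert convert_f_to_c(32) == 0.0


def test_rounding_to_one_decimal_place():
    """detailed-reqs.md: 'Rounded values carry exactly one decimal place.'"""
    assert convert_f_to_c(100) == 37.8

Each test carries the exact requirement language it’s enforcing, which makes the review pass described below a lot less painful.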

This is the part of the process that requires the most brainpower. You need to go through all of the generated tests and ensure they

  1. Make sense
  2. Are comprehensive enough
  3. Cover every iota of intended functionality

This is made even easier by the ability to chat with the models about specific individual tests. I am finding that I generally 3-5x each test size from the initial generation. This may be out of habit but it’s likely because it’s so goddamn easy.

Make sure you’re breaking tests into individual files; I have completely unscientifically settled on <1k lines/file. This is really where an LLM-native editor starts to show its perks vs copying and pasting from a chatbot UI.

It is absolutely crucial that you’re reviewing the diffs on a line-by-line basis rather than just blindly accepting the “full file” results. The LLMs will drop comments to an almost comical degree- even for lines they haven’t touched the logic on. This is probably the biggest flaw of these models (or rather, the “code apply” models the IDEs are using) at the moment.

By the end of this step you want a comprehensive test suite. It’s on you, the human, to make sure this test suite actually encompasses everything you want your final program to do. You won’t (or at least I don’t) but you should try really really really hard to. Actually read every single one and question if it makes sense. This is literally the first time in the process that this is happening, so it’s critical that you take your time here.

Step 4: Write the code

If you did your job right in the previous steps, the model should be able to generate code that gets the expected results in 1-3 steps. If you did not do a good job, it has probably quickly become clear to you what gaps exist in your unit tests. Do not fix the code. Fix the unit tests. It is extremely tempting to be like “oh duh I just forgot to tell it Fahrenheit instead of Celsius, if I correct it I’ll fix this one remaining bug with the program and be done.” It’s tempting because it’s probably true! For simple enough corrections, the LLM will get it right first iteration. But for more complex implementations, you may need to cycle back and forth for many messages in order to get it to do what you want in the way you want. These conversations sometimes go totally off the rails into a loop where it’s just suggesting the same 3 solutions that will never work over and over and over again. Those conversations are where the unit tests really shine because you can close them and start fresh ones with zero fear of losing context. You’re only losing conversation-time context, which in this case is actually quite useful. I have yet to encounter a problem I could not solve in this way- although it does sometimes require several attempts to guide the model through the exact implementation path I want. I always always “save the state” of the requirements back into the unit tests.

Even if you get the thing to run in a way that seems like it’s working, it’s very possible some (or even many) unit tests will fail. Again, that’s the beauty of unit tests. Now you’ve just gotta run through them and figure out if it’s the code that’s wrong, the unit test that’s wrong, or both- and then fix. I find the cycle time on this to be fast enough (and fun enough) that I haven’t tried anything “agentic” to automate this process beyond a little bit of experimentation with Block’s Goose.

I cannot hammer this home enough… the code is the byproduct of the unit tests. In 6 months or 2 years or 10 years when the languages change or the dependencies change or your tastes change, you can simply ask the most cutting-edge model of the time to write you new code against those unit tests, and then you can objectively evaluate (in a single shot) whether it did it. Sure sounds like a practical implementation of those “reward functions” I hear the thinkfluencers talking about.
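If you squint, the harness is trivial. Here’s a toy sketch - not something I actually run, just the idea - that turns “did the generated code pass the suite?” into a score:

# reward_sketch.py - toy illustration of "the tests are the reward function".
import subprocess


def reward(test_dir: str = "tests") -> float:
    """Run the suite against whatever code the model just wrote."""
    result = subprocess.run(
        ["pytest", test_dir, "-q", "--tb=no"],
        capture_output=True,
        text=True,
    )
    # pytest exits 0 only when every collected test passes.
    return 1.0 if result.returncode == 0 else 0.0

Everything else - model choice, language, even the code itself - is swappable underneath that.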

Step 5: Write a Readme

I’m not actually sure if this helps the LLMs at all, but it definitely helps me when I’ve gotta go back and figure out what the hell I built and why I built it that way.

I will usually feed both the initial-reqs.md and detailed-reqs.md and ask it something along the lines of “Generate a comprehensive Readme with sections outlining goals, data structures, dependencies, how to run locally, how to run unit tests” which pretty much any model can do in one shot. Sometimes I’m feeding the unit tests and the code in (which definitely improves the quality of the Readme) but sometimes I’m limited by context size. On my next project I want to try this stage with Gemini 2.0 Pro’s insane 2M context window.


Please email me with feedback and ideas! I am excited to see how this process evolves (and likely continues to get simplified by better tools and better models) over time.

LLM Trick: Improve Project Directory Structure With tree


❯ tree -I 'node_modules|dist' | llm "How can I improve this project \
structure to follow best practices in 2025? Be verbose but format \
your response so that each line is less than 80 characters. Use as \
many newlines as necessary."

Improving your project structure involves adhering to best practices for organization,
maintainability, scalability, and clarity. Below are recommendations to enhance your
existing structure in line with current trends and methodologies.

### 1. Directory Organization

#### **Logical Grouping**
- Ensure that files are organized based on their functionality or purpose.
- Keep components, layouts, and utilities grouped logically and consistently.

#### **Feature-based Structure**
- Consider organizing files based on features or modules.
- For example, group related pages, components, and styles together.

#### **Example Structure**
src
├── features
│   ├── blog
│   │   ├── components
│   │   ├── pages
│   │   ├── utils
│   │   └── styles
│   └── about
│       ├── components
│       ├── pages
│       └── styles

### 2. Dedicated Folder for Hooks and Contexts

- If using React, create a dedicated folder for hooks and context providers. This
  enhances maintainability and encourages reusability.

src
├── hooks
│   ├── useAuth.ts
│   └── useFetch.ts
└── contexts
    ├── AuthContext.tsx
    └── ThemeContext.tsx

### 3. Asset Management

#### **Centralized Assets Folder**
- Move all assets (images, videos, fonts) into a designated directory for clarity.

src
├── assets
│   ├── images
│   ├── videos
│   └── fonts

### 4. Naming Conventions

- Use consistent naming conventions for files and directories. Opt for lowercase
  with hyphens for filenames (e.g., `header.astro`, `post-list.astro`).

- Ensure that folder names clearly describe the contents or purpose (e.g., use
  `components` instead of `component`).

### 5. Documentation

- Update the `README.md` to contain information on project setup,
  contributing guidelines, and architecture overview.

- Consider adding a **Contributing** folder to document coding standards, PR
  guidelines, and issue templates.

### 6. Typescript Organization

- Organize TypeScript files into dedicated folders depending on whether they
  are types, interfaces, or enums, enhancing structure and discoverability.

src
├── types
│   ├── global.d.ts
│   └── post.ts
└── interfaces
    └── user.ts

### 7. Tests Directory

- If not already present, create a `tests` directory to contain all test-related
  files, enhancing testing organization.

tests
├── unit
└── integration

### 8. Configuration Management

- Keep configuration files (like `tsconfig.json`) in a clearly labeled folder,
  perhaps under a `config` directory.

### 9. Enhance Version Control

- Consider adding `.gitignore` and `.editorconfig` files if not present
  to manage ignored files and maintain coding styles.

### 10. Update `public` folder organization

- Consolidate assets in the `public` directory and structure them for easy access.

public
├── assets
│   ├── images
│   └── styles
└── favicon.ico

### Conclusion

By adopting these organizational strategies, your project structure will be more
robust, easier to navigate, and maintainable as it grows over time. Furthermore,
these best practices align with modern development trends, fostering a better
developer experience and improving collaboration within teams.


Samuel Colvin On AI Abstractions

If you’re running a customer service business and you have loads of people sitting answering telephones, the less well trained they are, the less that you trust them, the more that you need to give them a script to go through. […] If you’re doing high net worth banking, you just employ people who you think are going to be charming to other rich people and set them off to go and have coffee with people. […] And the same is true of models. The more intelligent they are, the less we need to tell them, like structure what they go and do and constrain the routes in which they take.

If models are getting faster as quickly as you say they are, then we don’t need agents and we don’t really need any of these abstraction layers. We can just give our model […] access to the internet, cross our fingers and hope for the best. Agents, agent frameworks, graphs, all of this stuff is basically making up for the fact that right now the models are not that clever.

Samuel Colvin [~00:26:32]

One of many great tidbits from Samuel in this podcast.

I am generally not a fan of Python (in favor of the clearly far superior JavaScript) but I am a superfan of Pydantic. I was thrilled when Pydantic AI was announced and have continued to follow its developments and iterations closely. I have a strong feeling it will continue to define mental models in the AI SDK space for many years to come.

[Video] Sidewalk Chalk Robot

Taylorator: Flood the FM Broadcast Band with Taylor Swift

Anomalous Tokens in DeepSeek-V3 and r1

Fascinating. Only a matter of time before somebody writes an LLM fuzzer.

How I Write <img> Alt Tags for This Blog

llm 'Write an alt tag for this image' -a xyz.gif

I use the llm CLI tool and cycle through different vision-capable models until I get an answer I like. At some point I will probably automate this as part of the build step of the blog, but the novelty of trying out the various different models far outweighs any annoyance for now. I was surprised they all handled animated GIFs without complaints, although they may just be pulling a single frame (or a few).

The price of that query with GPT-4o? $0.0023 - that is, one half of one half of one penny.

Presidio

Presidio helps to ensure sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text and images such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.

How it works

I was just taking a look at Chainlit (more specifically, this example) and saw Presidio mentioned.

I have seen basic attempts at doing this with hand-spun regexes in the past and I’ve seen commercial products, but this feels like it strikes a nice middle ground. Despite the very Microsoft-y website that made me immediately assume it was for C# or .NET, it’s a Python library, and it’s MIT licensed. From their FAQs:

Microsoft Presidio is not an official Microsoft product. […] The authors and maintainers of Presidio come from [our] Industry Solutions Engineering team.
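I haven’t done more than skim the docs, but the core flow looks like an analyzer pass followed by an anonymizer pass - something like this (the sample text is mine, and I haven’t actually run it):

# Sketch of Presidio's documented analyze -> anonymize flow; untested by me.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "My name is Will and my phone number is 212-555-0123."

analyzer = AnalyzerEngine()
results = analyzer.analyze(
    text=text,
    entities=["PERSON", "PHONE_NUMBER"],
    language="en",
)

anonymizer = AnonymizerEngine()
print(anonymizer.anonymize(text=text, analyzer_results=results).text)
# e.g. "My name is <PERSON> and my phone number is <PHONE_NUMBER>."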

Understanding Home Assistant’s Database and Statistics Model

Zero

Terrible name, interesting idea.

Hacking Subaru

I bought my mom a 2023 Subaru Impreza with the promise that she would let me borrow it to try and hack it

Another Sam Curry Banger. Don’t miss the Bypassing 2FA portion, it’s a doozy.

First time hearing of ffuf, looks neat.

Pagefind

Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure.

Delightful project I accidentally stumbled upon while building this very blog. It pre-computes all the search indexes at build time and then packages them into a gloriously simple frontend. Who knows if I’ll ever post enough that it’s worth using. For the time being, it lives at /search.

It spits out a directory structure like this:

dist/pagefind
├── fragment
│   ├── en_4733ec7.pf_fragment
│   ├── en_5c9a98e.pf_fragment
│   ├── en_7ca223a.pf_fragment
│   └── en_953c689.pf_fragment
├── index
│   └── en_4d96258.pf_index
├── pagefind-entry.json
├── pagefind-highlight.js
├── pagefind-modular-ui.css
├── pagefind-modular-ui.js
├── pagefind-ui.css
├── pagefind-ui.js
├── pagefind.en_f57a1155c8.pf_meta
├── pagefind.js
├── wasm.en.pagefind
└── wasm.unknown.pagefind

The .js, .css, and even the wasm stuff all made sense, but I was curious about the binary blobs in the .pf_fragment, .pf_index, and .pf_meta files.

Weirdly (and somewhat ironically), I could not find any documentation on the actual binary format the indexes were being stored as. I poked around a bit before deciding to dig into the source code.

With the help of Claude, I’ve figured out that they’re using Concise Binary Object Representation via the minicbor Rust lib and sort of pieced together the root data structures. I have linked to them below.

.pf_fragment

#[derive(Serialize, Debug, Clone)]
pub struct PageFragmentData {
    pub url: String,
    pub content: String,
    pub word_count: usize,
    pub filters: BTreeMap<String, Vec<String>>,
    pub meta: BTreeMap<String, String>,
    pub anchors: Vec<PageAnchorData>,
}

#[derive(Serialize, Debug, Clone)]
pub struct PageAnchorData {
    pub element: String,
    pub id: String,
    pub text: String,
    pub location: u32,
}

Code

.pf_index

/// A single word index chunk: `pagefind/index/*.pf_index`
#[derive(Encode)]
pub struct WordIndex {
    #[n(0)]
    pub words: Vec<PackedWord>,
}

/// A single word as an inverse index of all locations on the site
#[derive(Encode, Clone, Debug)]
pub struct PackedWord {
    #[n(0)]
    pub word: String,
    #[n(1)]
    pub pages: Vec<PackedPage>,
}

/// A set of locations on a given page
#[derive(Encode, Clone, Debug)]
pub struct PackedPage {
    #[n(0)]
    pub page_number: usize, // Won't exceed u32 but saves us some into()s
    #[n(1)]
    pub locs: Vec<i32>,
}

Code

.pf_meta

/// All metadata we need to glue together search queries & results
#[derive(Encode, Debug)]
pub struct MetaIndex {
    #[n(0)]
    pub version: String,
    #[n(1)]
    pub pages: Vec<MetaPage>,
    #[n(2)]
    pub index_chunks: Vec<MetaChunk>,
    #[n(3)]
    pub filters: Vec<MetaFilter>,
    #[n(4)]
    pub sorts: Vec<MetaSort>,
}

/// Communicates the pagefind/index/*.pf_index file we need to load
/// when searching for a word that sorts between `from` and `to`
#[derive(Encode, PartialEq, Debug)]
pub struct MetaChunk {
    #[n(0)]
    pub from: String,
    #[n(1)]
    pub to: String,
    #[n(2)]
    pub hash: String,
}

#[derive(Encode, Debug)]
pub struct MetaPage {
    #[n(0)]
    pub hash: String,
    #[n(1)]
    pub word_count: u32,
}

Code
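If you want to poke at the blobs yourself, a quick-and-dirty Python script along these lines is a reasonable starting point. The gzip and JSON fallbacks are guesses on my part about how the files might be packaged; the cbor2 package covers the minicbor-encoded structs above.

# inspect_pagefind.py - rough sketch for peeking at .pf_* files.
import gzip
import json
import sys
from pathlib import Path
from pprint import pprint

import cbor2  # pip install cbor2


def load_blob(path: Path):
    raw = path.read_bytes()
    try:
        # Guess: the files may be gzip-compressed on disk.
        raw = gzip.decompress(raw)
    except gzip.BadGzipFile:
        pass
    try:
        # The index/meta structs above are CBOR via minicbor.
        return cbor2.loads(raw)
    except Exception:
        # Fallback in case a given file (e.g. a fragment) is actually JSON.
        return json.loads(raw)


if __name__ == "__main__":
    pprint(load_blob(Path(sys.argv[1])))

Run it against any of the files, e.g. python inspect_pagefind.py dist/pagefind/index/en_4d96258.pf_index.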

Things That Seem Like They Should Be Easy but Are Actually Virtually Impossible

Comic by Randall Munroe of xkcd. Illustration humorously contrasts the simplicity of checking a photo’s location using GIS versus the immense complexity of identifying the subject of the photo as a bird. Caption: ‘In CS, it can be hard to explain the difference between the easy and the virtually impossible.’
xkcd 1425

Getting an Odometer Reading via the OBD2 Port

There are PIDs for pretty much everything, but alas no PID for “total mileage” or even “trip mileage”. I have wanted a trip logger that syncs via WiFi for YEARS now and this weird limitation is the primary thing stopping me from building it.

And like, I get it, you don’t want people to have any reason to tamper with OBD2, and “odometer fraud” seems like a reason people might want to tamper with OBD2. But you completely sidestep that risk by just not including it. How inconvenient for me.

Every couple months I’m like “how hard could it possibly be to find the manufacturer-specific CAN messages for all the vehicles in my fleet” and every couple months I am reminded why I am not a car hacker.

UPDATE: Apparently all cars after model year 2021 are mandated to have a standardized odometer PID because California said so. This actually started phasing in with the 2019 model year. However, this does not help me specifically, as none of my vehicles are 2019 or newer.
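For anyone with a newer vehicle, the query itself should just be mode 01, PID 0xA6 over an ELM327-style adapter. Here’s an untested sketch - the serial port, baud rate, and the 0.1 km scaling are my reading of J1979, so treat it as a starting point rather than gospel:

# odometer_sketch.py - untested; port, baud, and PID 0xA6 scaling are assumptions.
import serial  # pip install pyserial


def elm_query(ser: serial.Serial, cmd: str) -> str:
    ser.write((cmd + "\r").encode())
    # ELM327 adapters terminate every response with a '>' prompt.
    return ser.read_until(b">").decode(errors="ignore")


def read_odometer_km(port: str = "/dev/ttyUSB0") -> float:
    ser = serial.Serial(port, 38400, timeout=3)
    for setup in ("ATZ", "ATE0", "ATSP0"):  # reset, echo off, auto protocol
        elm_query(ser, setup)
    resp = elm_query(ser, "01A6")  # mode 01, PID 0xA6: odometer
    parts = resp.replace("\r", " ").split()
    if "A6" not in parts:
        raise RuntimeError(f"No odometer PID support? Response: {resp!r}")
    i = parts.index("A6")
    a, b, c, d = (int(x, 16) for x in parts[i + 1 : i + 5])
    # As I read J1979: 4 data bytes, 0.1 km per bit.
    return ((a << 24) | (b << 16) | (c << 8) | d) / 10.0


if __name__ == "__main__":
    print(f"Odometer: {read_odometer_km():.1f} km")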

Programmatically Detecting an Answering Machine

SIP is a wacky, wonderful protocol that covers many many cases. Over the years it has evolved and been tacked onto to solve various issues individuals and organizations have had.

You know what that protocol does not cover? Answering machines. Absolutely zero indication to the caller that they’re talking to an automated answering machine vs talking to a human.

And reader, you’re probably saying to yourself “huh, that’s weird, but surely somewhere in the telephony stack, SOMEWHERE you can determine this information, right?” And reader, I thought the same thing! But no! Nothing anywhere in the chain can deterministically tell you whether an answering machine or a human is answering.

In the grand scheme of things I guess this is probably a good thing? Scammers and spammers would definitely figure out how to use the distinction to their advantage, although there are many commercial products available today that can do this by having an AI listen to the call for the beep or whatever.

I’ll be sure to file a feature request with the phone providers.

A Marriage Proposal Spoken Entirely in Office Jargon

Programming With LLMs

The ideal task for an LLM is one where it needs to use a lot of common libraries (more than a human can remember, so it is doing a lot of small-scale research for you), working to an interface you designed or produces a small interface you can verify as sensible quickly, and it can write readable tests.

David Crawshaw

Self-driving a 1993 Volvo 940 with Openpilot

Prototype This was ahead of its time