<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Notes to a Future Self - tips</title>
    <subtitle>Mostly software</subtitle>
    <link rel="self" type="application/atom+xml" href="https://jhugman.com/tags/tips/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://jhugman.com"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-03-17T00:00:00+00:00</updated>
    <id>https://jhugman.com/tags/tips/atom.xml</id>
    <entry xml:lang="en">
        <title>Auto-research is wild</title>
        <published>2026-03-17T00:00:00+00:00</published>
        <updated>2026-03-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              James Hugman
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jhugman.com/posts/autoresearch-is-wild/"/>
        <id>https://jhugman.com/posts/autoresearch-is-wild/</id>
        
        <content type="html" xml:base="https://jhugman.com/posts/autoresearch-is-wild/">&lt;p&gt;This is pretty exciting.&lt;&#x2F;p&gt;
&lt;p&gt;I heard about &lt;a class=&quot;external-link&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;karpathy&#x2F;autoresearch&quot;&gt;Karpathy&#x27;s Autoresearch&lt;&#x2F;a&gt; only vaguely: it&#x27;s a way of getting the LLM to run experiments on itself(?), running for long periods of time.&lt;&#x2F;p&gt;
&lt;p&gt;It wasn&#x27;t until I read David Cortés&#x27; &lt;a class=&quot;external-link&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;davebcn87&#x2F;pi-autoresearch&quot;&gt;&lt;code&gt;autoresearch-pi&lt;&#x2F;code&gt; skill&lt;&#x2F;a&gt; that I understood the significance. The skill lets you give your coding agent an optimization target for some or all of your code, then it:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;comes up with ideas,&lt;&#x2F;li&gt;
&lt;li&gt;tries each of them out,&lt;&#x2F;li&gt;
&lt;li&gt;keeps the ideas that improve the optimization target,&lt;&#x2F;li&gt;
&lt;li&gt;reverts the ones that don&#x27;t, and&lt;&#x2F;li&gt;
&lt;li&gt;comes up with more ideas, in a loop.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
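&lt;p&gt;As I understand it (this is my own sketch of the loop, not the skill&#x27;s actual implementation), the shape is a simple keep-or-revert hill climb:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;best = measure(target)
loop:
    idea  = propose(history)
    apply(idea)
    score = measure(target)
    if score beats best and the verification suite still passes:
        best = score              # keep the change
    else:
        revert(idea)              # roll it back
    record(history, idea, score)  # feeds the next round of ideas
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;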
&lt;p&gt;I tried it on improving the integration test performance on &lt;a class=&quot;external-link&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;jhugman&#x2F;uniffi-bindgen-react-native&quot;&gt;a project I work on&lt;&#x2F;a&gt;. The details are not important, but I include them just to give a sense of how many moving parts are involved.&lt;&#x2F;p&gt;
&lt;p&gt;The tests check the correctness of the FFI generator. It&#x27;s a complicated test setup:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;A Rust test file with a macro, which generates the actual test function.&lt;&#x2F;li&gt;
&lt;li&gt;The generated test function takes the fixture crate, compiles it, then generates the FFI for the crate.&lt;&#x2F;li&gt;
&lt;li&gt;The generated FFI is itself part TypeScript and part C++.&lt;&#x2F;li&gt;
&lt;li&gt;The actual test is a TypeScript file written for the fixture crate.&lt;&#x2F;li&gt;
&lt;li&gt;The &lt;code&gt;test.ts&lt;&#x2F;code&gt; and the TypeScript part of the generated FFI are bundled together, with &lt;code&gt;tsc&lt;&#x2F;code&gt; and &lt;code&gt;metro&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;The bundle is sent off to a hand-written C++ test runner which runs TypeScript in a &lt;a class=&quot;external-link&quot; href=&quot;https:&#x2F;&#x2F;reactnative.dev&#x2F;docs&#x2F;hermes&quot;&gt;Hermes execution environment&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
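&lt;p&gt;Roughly, as a sketch (the command names here are illustrative, not the project&#x27;s real scripts), each test run chains together:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;cargo test generated_fixture_test   # macro-generated test compiles the fixture crate
generate-ffi fixture-crate          # emit the TypeScript and C++ halves of the FFI
tsc; metro build test.ts            # bundle test.ts with the generated TypeScript
hermes-test-runner test.bundle      # run the bundle in the C++ Hermes runner
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;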
&lt;p&gt;There is a similarly convoluted setup to run the same tests with the same fixture crates, but in a WASM environment.&lt;&#x2F;p&gt;
&lt;p&gt;The tests are pretty extensive, and on my machine they take about 30-40 minutes. They take so long that I rarely run them all at once, preferring to run them one at a time, or just JSI, or just WASM.&lt;&#x2F;p&gt;
&lt;p&gt;Over the past couple of years I&#x27;ve tried to improve the performance, but I&#x27;ve always ended up backing away: the setup is too delicate and too important to break, but not important enough to make faster.&lt;&#x2F;p&gt;
&lt;p&gt;After setting up &lt;a class=&quot;external-link&quot; href=&quot;https:&#x2F;&#x2F;pi.dev&quot;&gt;pi.dev&lt;&#x2F;a&gt; and installing the plugin, I answered a few questions. These were essentially:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;what do I want to optimize and how do I measure it?&lt;&#x2F;li&gt;
&lt;li&gt;how do we know an experiment didn&#x27;t break anything?&lt;&#x2F;li&gt;
&lt;li&gt;are there any ideas that I wanted to try first?&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The plugin is a skill, which I guess could be ported to Claude or Codex, or whatever, and a gadget which updates the status bar.&lt;&#x2F;p&gt;
&lt;p&gt;I chose to optimize the JSI test performance, picking out a couple of representative tests to measure. To check nothing had broken, it would run the entire JSI test suite.&lt;&#x2F;p&gt;
&lt;p&gt;Then went to bed.&lt;&#x2F;p&gt;
&lt;p&gt;I came back in the morning, and it had got the whole test suite down to 74 seconds.&lt;&#x2F;p&gt;
&lt;p&gt;Its most significant win was to string together three ideas which together unlocked parallel test running.&lt;&#x2F;p&gt;
&lt;p&gt;Since then, I have run it twice more: once for the WASM test suite, and then, once both suites were sub two minutes and running in parallel, a third time to optimize a realistic development scenario: mutating a template which changes the generated FFI, so that only code changes that alter the generated code trigger an expensive re-compile of the C++. For the second and third runs, I limited the area of code it could change to one crate.&lt;&#x2F;p&gt;
&lt;p&gt;The entire test suite, after two nights of optimizing and almost no effort from me, is now 1m42s.&lt;&#x2F;p&gt;
&lt;p&gt;A powerful new tool has been added to our toolset.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Tip: Being Karen to an agent</title>
        <published>2026-02-26T00:00:00+00:00</published>
        <updated>2026-02-26T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              James Hugman
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jhugman.com/posts/agent-karen-md/"/>
        <id>https://jhugman.com/posts/agent-karen-md/</id>
        
<content type="html" xml:base="https://jhugman.com/posts/agent-karen-md/">&lt;p&gt;I just found a neat trick. I started talking to GPT-5-mini. I had a relatively long back-and-forth, but in the end decided to move to a more powerful model.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;prompt&quot; style=&quot;background-color:#ffffff;color:#303030;&quot; class=&quot;language-prompt &quot;&gt;&lt;code class=&quot;language-prompt&quot; data-lang=&quot;prompt&quot;&gt;&lt;span&gt;I&amp;#39;d like to speak to another LLM. Please give me a prompt getting the next LLM up to speed efficiently with all context and my choices. Use a fence block to make copying the prompt easier.
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This prompt is entirely unremarkable, but for how it starts.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
