Idea:
- Interface ⇒ stream activations ⇐ Dictionary learning: https://transformer-circuits.pub/2023/monosemantic-features/index.html
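For reference, the dictionary-learning setup in that write-up trains a sparse autoencoder on stored activations x; roughly (omitting the pre-encoder bias trick):
$$f = \mathrm{ReLU}(W_e x + b_e), \qquad \hat{x} = W_d f + b_d, \qquad \mathcal{L} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert f \rVert_1$$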
Training SAEs for finding features in attention activations:
- https://github.com/EleutherAI/sae
- https://github.com/jbloomAus/SAELens
- Probably can use these as references, but torch.distributed and transformers.Trainer should be more than enough (see the sketch after this list)
- OpenAI's SAEs trained on GPT-2 (also usable as a reference): https://github.com/openai/sparse_autoencoder and the accompanying paper
- A lens for viewing (random) activations: LSTMVis https://lstmvis.vizhub.ai/ ⇒ TransformerLens https://github.com/TransformerLensOrg/TransformerLens
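A minimal single-GPU sketch of what the training loop could look like (plain PyTorch instead of transformers.Trainer, no torch.distributed; the layer index, dictionary size, L1 coefficient, and toy corpus below are placeholder choices):

```python
# Sketch: hook GPT-2's attention output and fit a ReLU SAE on it.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f         # reconstruction, features

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").to(device).eval()

captured = []
def capture_hook(module, inputs, output):
    # GPT2Attention returns a tuple; output[0] is the attention output
    captured.append(output[0].detach())

layer = 6  # placeholder: which layer's attention to probe
model.h[layer].attn.register_forward_hook(capture_hook)

sae = SparseAutoencoder(d_model=768, d_dict=8 * 768).to(device)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coef = 1e-3  # sparsity strength, to be tuned

texts = ["The quick brown fox jumps over the lazy dog."]  # stand-in corpus
for step in range(100):
    captured.clear()
    batch = tok(texts, return_tensors="pt").to(device)
    with torch.no_grad():
        model(**batch)
    acts = captured[0].reshape(-1, 768)   # (batch * seq, d_model)
    x_hat, f = sae(acts)
    loss = ((x_hat - acts) ** 2).mean() + l1_coef * f.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

In a real run the activations would be precomputed and batched from a large corpus, with the torch.distributed / Trainer wiring added on top.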
Attribute al
- What features does it capture?
How do we steer? Feature-based or token-based?
Alignment with human representations?
Feature representations? Correctness w.r.t. the model's internal representations
Accurate mappings between human and machine features?
Context: reduction in third space
Representation based on the user's text (build features)
Use SAEs for steerable generation
User feedback
Actionable steering based on attention-based models (sketch below)
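A hedged sketch of feature-based steering: take one column of the trained SAE's decoder (reusing sae and layer from the training sketch above) and add it to that layer's attention output during generation. The feature index and scale are hypothetical values, but they are exactly the knobs a user-feedback interface could expose.

```python
# Sketch: steer generation by pushing one layer's attention output
# along a chosen SAE decoder direction (feature-based steering).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

feature_idx = 123  # hypothetical feature picked from user feedback
scale = 4.0        # steering strength
# `sae` and `layer` come from the training sketch above
direction = sae.decoder.weight[:, feature_idx].detach()  # (d_model,)

def steering_hook(module, inputs, output):
    # output[0] is the attention output; shift it along the feature direction
    return (output[0] + scale * direction,) + tuple(output[1:])

handle = lm.transformer.h[layer].attn.register_forward_hook(steering_hook)
ids = tok("The movie was", return_tensors="pt").to(device)
out = lm.generate(**ids, max_new_tokens=30, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # restore un-steered behavior
```

Token-based steering (biasing logits directly) would sit at the other end of the spectrum; the hook-based version above is what ties steering back to SAE features.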