features development
design UI: everyone P0 (time to do this)
-
workflow: how does panel interact
-
implementation UI: @nebrask (A/B testing)
-
iterative feature development afterwards: @lucas @waleed
-
ML components: @aarnphm
-
training: (research) (quality testing) ← @aarnphm
-
inference: (infrastructure) (A/B testing, regression testing) @waleed
- OpenAI-compatible API server: functional
- Edit logits for inference server (vLLM, llama.cpp)
- local inference
- UX: TTFT (time to first token)
- inference engine: vLLM (GPU), llama.cpp (CPU)
- vLLM plugin support
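A minimal sketch of how TTFT could be measured against any streaming response, assuming the client exposes the stream as a plain Python iterator of text chunks. `measure_ttft` and `fake_stream` are hypothetical names for illustration; `fake_stream` only stands in for a real streaming call to the inference server.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Consume a token stream and return (ttft_seconds, total_seconds, text)."""
    start = time.perf_counter()
    ttft = None
    pieces: List[str] = []
    for token in stream:
        if ttft is None:
            # First chunk arrived: this is the time to first token.
            ttft = time.perf_counter() - start
        pieces.append(token)
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, "".join(pieces)


def fake_stream(tokens, first_delay=0.05, step_delay=0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming completion response."""
    for i, tok in enumerate(tokens):
        time.sleep(first_delay if i == 0 else step_delay)
        yield tok


if __name__ == "__main__":
    ttft, total, text = measure_ttft(fake_stream(["Hel", "lo", "!"]))
    print(f"TTFT={ttft * 1000:.1f} ms, total={total * 1000:.1f} ms, text={text!r}")
```

Because the function only needs an iterator, the same measurement could wrap a vLLM-backed or llama.cpp-backed stream without changes.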
-
multiplayer text editor: (target: stakeholders) + (other player: AI models) (P3)
ux.
- session history: https://translucentweb.site/
- writing ⇒ graph (embeddings representation for the text)
Some examples of interesting research problems in this direction + interfaces:
— Linus (@thesephist) May 21, 2024
- How can a user sort through millions of features to select and edit precise features they desire in a sample? What's the best way to organize millions of features in a UI?
- How do you reconcile…
Inline-definition
inline definitions pic.twitter.com/RfYBRLiMS3
— JohnPhamous (@JohnPhamous) October 2, 2024
Storage (local): `XDG_DATA_HOME/tinymorph` for configuration, state, db
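A small sketch of resolving that directory, assuming the usual XDG Base Directory fallback of `$HOME/.local/share` when `XDG_DATA_HOME` is unset, empty, or relative. The function name `tinymorph_data_dir` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def tinymorph_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve XDG_DATA_HOME/tinymorph, falling back to ~/.local/share/tinymorph."""
    env = os.environ if env is None else env
    base = env.get("XDG_DATA_HOME", "")
    # The XDG spec treats unset, empty, or relative values as invalid.
    if not base or not os.path.isabs(base):
        home = env.get("HOME") or os.path.expanduser("~")
        base = os.path.join(home, ".local", "share")
    return Path(base) / "tinymorph"


if __name__ == "__main__":
    data_dir = tinymorph_data_dir()
    data_dir.mkdir(parents=True, exist_ok=True)
    print(data_dir)
```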
accessibility:
expansion upon telescopic text: notation
Announcing: Writing Examples
— David Perell (@david_perell) October 3, 2024
Today is launch day! We built this website to celebrate great writing.
It’s 100% free. Each article deconstructs a piece of writing from an iconic writer. The goal is to give you X-Ray vision into what makes sentences and paragraphs come alive (so… pic.twitter.com/5w9Mvkz01b
cursor navigation:
I invented a new cursor for zooming in 🔎
— Jace 🤎 (@JaceThings) October 8, 2024
live @ https://t.co/uy2N38yXSX pic.twitter.com/TDPcmXYtX7
graph-based:
⇒ Conceptual: Mind map
- empirical
non-linear actions → linear actions
drag-and-drop Post-it notes ⇒ generate Post-it notes
cost.
Using EC2 for GPUs and inference cost. (Running on A100 with 32 CPUs)
text editor
Question
- What sort of data structure do we want to use to implement this?
- How should we implement the cursor and buffers?
- File management locally (preferably user-owned instead of centralized data storage)
- [Stretch, preference] Can we support modal editing?
- How do we handle syntax highlighting as well as markdown rendering? (Think of Tree-sitter, but then Shiki is pretty computationally expensive.)
- How should we handle files? (Chromium has a file system API built into the browser.)
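To make the data-structure and cursor questions concrete, here is a sketch of one common candidate, a gap buffer, which keeps a block of free space at the cursor so local inserts and deletes are cheap. This is only an illustration of the trade-off, not a decision for the project; the class name and its methods are hypothetical.

```python
class GapBuffer:
    """Text buffer with a movable gap at the cursor position."""

    def __init__(self, text: str = "", gap_size: int = 16) -> None:
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)      # cursor position
        self._gap_end = len(self._buf)   # first index after the gap

    def __len__(self) -> int:
        return len(self._buf) - (self._gap_end - self._gap_start)

    def move_cursor(self, pos: int) -> None:
        pos = max(0, min(pos, len(self)))
        while self._gap_start > pos:     # shift the gap left
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
            self._buf[self._gap_start] = None
        while self._gap_start < pos:     # shift the gap right
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._buf[self._gap_end] = None
            self._gap_start += 1
            self._gap_end += 1

    def insert(self, s: str) -> None:
        for ch in s:
            if self._gap_start == self._gap_end:
                self._grow()
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete_backward(self, n: int = 1) -> None:
        n = min(n, self._gap_start)
        for i in range(self._gap_start - n, self._gap_start):
            self._buf[i] = None
        self._gap_start -= n

    def _grow(self) -> None:
        # Re-open the gap at the cursor, at least doubling the capacity.
        extra = max(16, len(self._buf))
        tail = self._buf[self._gap_end:]
        self._buf = self._buf[:self._gap_start] + [None] * extra + tail
        self._gap_end = self._gap_start + extra

    def text(self) -> str:
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])


if __name__ == "__main__":
    buf = GapBuffer("hello world")
    buf.move_cursor(5)
    buf.insert(",")
    print(buf.text())
```

Ropes and piece tables are the usual alternatives when documents get large or when undo history matters; an editor framework may also bring its own document model.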
For the Node server, I’m thinking we should keep it light: it can run a simple proxy server that opens a WebSocket to stream the JSON to the browser (probably easiest for us, as we don’t have to worry too much about GraphQL or any of that database nonsense).
has context menu
See play.html for a dead-simple editor I played with.
Local files are a must (can be accessed via `file:///`)
Possible UI component library: shadcn/ui
what if your journal visualized your emotions? (little concept inspired by hume and obsidian) pic.twitter.com/zJe3oe3f5I
— lele (@CherrilynnZ) September 19, 2024
editor: https://prosemirror.net/
What is the data model for planning?
CoT drawbacks
training SAEs
see also: Goodfire preview releases
Dictionary learning: https://transformer-circuits.pub/2023/monosemantic-features/index.html ⇒ motivation to prove SAE results in interpretable features
https://transformer-circuits.pub/2024/scaling-monosemanticity/
for finding attention activation.
Anthropic’s report on training SAEs
- https://github.com/EleutherAI/sae
- https://github.com/jbloomAus/SAELens
- use as reference, but probably `torch.distributed` and `transformers.Trainer` should be more than enough
- lens into viewing random activations
https://lstmvis.vizhub.ai/ ⇒ LSTM vis
https://github.com/TransformerLensOrg/TransformerLens
https://blog.eleuther.ai/autointerp/
Attribute allocation?
Question
How should we steer?
- Think of using SAEs ⇒ iterate better prompt
Features composition for guided steering
features rep? Correctness w/ the model’s internal representation (trie for models)
- manually curate features
features ablation:
Accurate mappings based on human and machine features?
Context: reduction in third space of model representations
representation based on users text (build features)
Use SAEs to steer generations ⇐ user feedback
problem statement.
actionable steering for attention-based models
RAG-infused pipeline
What if we add additional web-search vectors to enhance correctness in steering?
inference
Steering Llama 2 via Contrastive Activation Addition (Panickssery et al., 2024), code
- Seems like they are using layer 16 for interp Claude’s features
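A rough sketch of the core arithmetic behind contrastive activation addition: average the difference between activations from positive and negative examples of a behavior, then add a scaled copy of that vector to the residual stream at inference. Plain Python lists stand in for real activations here, and the function names are hypothetical; capturing and patching activations in an actual model would go through hooks in the inference stack.

```python
from typing import List, Sequence

Vector = List[float]


def mean_difference(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Steering vector: mean of (positive - negative) activations over paired examples."""
    if not pos or len(pos) != len(neg):
        raise ValueError("need the same non-zero number of positive and negative examples")
    dim = len(pos[0])
    n = len(pos)
    return [sum(p[i] - q[i] for p, q in zip(pos, neg)) / n for i in range(dim)]


def add_steering(residual: Sequence[Vector], steering: Vector, multiplier: float) -> List[Vector]:
    """Add multiplier * steering to every token position of a residual stream."""
    return [[x + multiplier * s for x, s in zip(row, steering)] for row in residual]


if __name__ == "__main__":
    # Toy 2-dimensional activations for two contrastive pairs.
    positive = [[1.0, 2.0], [3.0, 4.0]]
    negative = [[0.0, 0.0], [1.0, 2.0]]
    vec = mean_difference(positive, negative)
    print(vec)
    print(add_steering([[0.0, 0.0], [1.0, 1.0]], vec, multiplier=2.0))
```

A negative multiplier pushes generations away from the behavior instead of toward it.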
self-explanation
Excerpt from Self-explaining SAE features
- Idea: replace residual stream on X with decoder direction times a given scale, called self-explanation
- auto-interp: use a larger LLM to spot patterns in max activating examples (See Neuronpedia’s auto-interp)
A variant of activation patching
See also: SelfE or Patchscope
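The replacement step described above can be sketched as follows, assuming the residual stream is a list of per-token vectors and the position of the placeholder token X is already known. `patch_placeholder` is a hypothetical helper, not code from the excerpt.

```python
from typing import List, Sequence

Vector = List[float]


def patch_placeholder(
    residual: Sequence[Vector],
    position: int,
    decoder_direction: Vector,
    scale: float,
) -> List[Vector]:
    """Return a copy of the residual stream where the row at `position`
    is replaced by scale * decoder_direction."""
    if not 0 <= position < len(residual):
        raise IndexError("placeholder position out of range")
    if len(decoder_direction) != len(residual[position]):
        raise ValueError("decoder direction must match the residual width")
    patched = [list(row) for row in residual]
    patched[position] = [scale * d for d in decoder_direction]
    return patched


if __name__ == "__main__":
    stream = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    # Pretend the placeholder token X sits at index 1.
    print(patch_placeholder(stream, 1, [0.0, 1.0], scale=5.0))
```

The scale is the free parameter here, which is why the metrics that follow are used to pick a good value for it.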
Important
align with auto-interp as the current standard for SAE feature interpretation.
self-similarity
measure cosine similarity between the 16th layer residual stream of last prompt tokens and original SAE feature.
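For reference, the cosine similarity itself is a one-liner; the sketch below uses plain lists in place of the residual-stream vector and the SAE feature direction, and the function name is only illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    residual_last_token = [0.5, 1.0, -0.25]   # toy stand-in for a layer-16 residual
    sae_feature = [1.0, 2.0, -0.5]            # toy stand-in for the feature direction
    print(cosine_similarity(residual_last_token, sae_feature))
```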
entropy
based on predicted distribution of the answer’s first token.
The distribution is represented as $P(t_n)$. Since we will insert the SAE feature direction into one of the prompt tokens, the distribution becomes $P(t_n \mid f)$, where $f$ is the SAE feature index.
Note that entropy decreases as the mutual information between the random variable representing the feature and the first answer token increases.
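A small sketch of the entropy computation, assuming we already have the logits for the answer's first token after the feature direction has been inserted. The helper names and the toy logits are illustrative only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    flat = softmax([0.0, 0.0, 0.0, 0.0])      # feature tells the model nothing
    peaked = softmax([10.0, 0.0, 0.0, 0.0])   # feature strongly pins the first token
    print(entropy(flat), entropy(peaked))
```

A scale at which the first-token distribution is sharply peaked gives low entropy, which matches the intuition that the inserted feature is informative about the answer.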
composite
ranks the optimal scale in the top-3 list for a much larger percentage of cases.
mathematical framework for transformers circuits
excerpt from this transformers thread
automatic interpretability
see also: Transluce’s Monitor source
plan
- update docs
- waleed & lucas
- frontend
- nebras
- server
- aarnphm
Prep for POC interview
- Lectures
- TA meetings
- Extras
Extras
user manual and usability testing