See also: this document revision

Problem

To know the world one must construct it. -- Pavese

In 1972, Alan Kay (Kay, 2011, p. 11) conceptualised the Dynabook, a portable device that would empower users of all ages to explore and learn. More than just a portable computer, the Dynabook was envisioned as a dynamic, interactive medium for learning, creation, and self-expression that could adapt to users’ increasing skills and needs.

Fast forward to this century: software has become commoditised and permeates every aspect of our lives. Simultaneously, we’ve seen exponential growth in the capabilities of machine learning (ML) systems1, largely due to the push of large language models (LLMs) into the mainstream. As these systems exhibit emergent properties of intelligence, how should we craft interfaces that amplify users’ agency and encourage a sense of personalisation through interaction, rather than providing a mere tool for automation?

Imagine you are an engineer who pursues creative writing as a hobby. You often curate topics and ideas from discussions on social media, then categorise them into themes for your arguments. There is a plethora of tools out there to assist you with planning your writing. For those inclined towards more adventurous endeavours, such as running customised models to meet specific requirements, you might find yourself in the land of auto-regressive models: GPTs and friends.

Auto-regressive models excel at surfacing a machine’s internal representation of the world through a simple interface: given a blob of text, the model generates a continuation made up of the tokens it predicts as most probable. For example, if you give it a Wikipedia article, the model should produce text consistent with the remainder of said article. These models work well under one assumption: the input prompt must be coherent and well structured around the problem the user wants to solve. A writer might provide paragraphs from a favourite author, say Joan Didion, as context to formulate arguments for a piece of writing. The model then “suggests” ideas that simulate Didion’s style of writing. Here is the big catch: garbage in, garbage out. If your prompt is disconnected or incoherent, the model will generate text that is equally incoherent.
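
To make this concrete, here is a minimal sketch of that next-token loop using the Hugging Face transformers library. The model name (gpt2), prompt, and sampling settings are placeholders for illustration, not choices tinymorph commits to.

```python
# Minimal sketch of auto-regressive continuation with Hugging Face transformers.
# The model, prompt, and sampling settings are purely illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Joan Didion once wrote that we tell ourselves stories in order to live."
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts a probable next token (here with light sampling)
# and appends it to the running context.
output = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```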

This heuristic lays the foundation for the proliferation of conversational user interfaces (CUIs), a natural fit given that chat is a thin wrapper around the text modality. Yet CUIs often prove frustrating for tasks that involve larger sets of information (think support portals, order forms, etc.). Additionally, for tasks that require frequent information retrieval (research, travel planning, writing, etc.), CUIs are suboptimal because they compel users to unnecessarily maintain information in their working memory. For writers, the hardest part of writing, or of getting over writer’s block, usually lies in structuring their thoughts coherently on the page. This requires a step beyond a pure conversation partner: an interface that supports both planning and the modelling of ideas.

Given these challenges, tinymorph doesn’t seek to be a mere tool for rewriting text. tinymorph aims to explore alternative interfaces for text-generation models to extend our cognitive abilities. This means developing spatial and visual interfaces that allow for non-linear exploration of information and ideas through writing.

Inputs and Outputs

Inputs from users’ perspective:

  1. The user’s initial writing on a topic of their choice (e.g. an essay on growth).
  2. The user’s personal preferences for tonality and writing style.
  3. External information sources related to the topic at hand.
  4. Context for the topic as well as the desired goals.
  5. Optionally, a personalised system prompt to set up the model.

Outputs from tinymorph:

  1. Text-based inlay-hint suggestions from the model.
  2. A left-to-right (LTR) sequential panel to collect user feedback for manual steering.
  3. An optional panel allowing users to select certain features to play with possible generations of their text.
  4. An additional panel for hyperparameter tuning to intervene in generation (see the sketch after this list).
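
As a rough illustration of how these inputs and the hyperparameter panel could come together, the sketch below assembles a single request against an OpenAI-compatible endpoint. The base URL, model name, and prompt contents are assumptions for the example, not a fixed tinymorph API.

```python
# Hypothetical sketch: mapping user inputs (style preferences, draft, system
# prompt) and panel hyperparameters onto one OpenAI-compatible request.
from openai import OpenAI

# Base URL and API key are placeholders for a locally hosted server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

system_prompt = (
    "You are a writing companion. Preserve the author's tone: "
    "reflective, plain, lightly ironic."  # user's personal style preferences
)
draft = "Growth is rarely linear; most of it happens in the waiting."

response = client.chat.completions.create(
    model="placeholder-model",  # whichever model the server exposes
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Suggest a continuation for:\n{draft}"},
    ],
    temperature=0.7,            # hyperparameters surfaced in the tuning panel
    top_p=0.9,
    max_tokens=120,
)
print(response.choices[0].message.content)
```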

Stakeholders

tinymorph’s main stakeholders include writers and engineers who write. tinymorph seeks to help writers be more productive in their writing and get over creative blocks. Additionally, tinymorph aims to encourage engineers who write to be more creative and structured in their writing while staying true to their tone. Writing, at the end of the day, is deeply personal, which means a tool should not dictate how one writes.

Environment

The following outlines both the hardware and software environment for tinymorph:

hardware.

  • For local inference, GPUs are preferred over CPUs for faster time-to-first-token.
  • For users running their own hosted inference server, an internet connection is required to run inference.
  • tinymorph is a text editor, so it should be able to run on any modern computer regardless of operating system.

software.

  • Cross-platform support
  • A series of SAEs (sparse autoencoders) will be trained and used for feature steering (see the sketch after this list).
  • Support for offline inference (users can run LLMs locally).
  • Minimal dependencies with intuitive UIs to demonstrate tinymorph’s capabilities.
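
For a sense of what feature steering with an SAE could look like, the following is a minimal, self-contained sketch over a transformer’s residual stream. The dimensions, the chosen feature index, and the steering strength are hypothetical, and a real SAE would be trained on cached activations rather than randomly initialised.

```python
# Hypothetical sketch of SAE-based feature steering on a residual-stream vector.
import torch
import torch.nn as nn

d_model, d_sae = 768, 16384  # residual width / dictionary size (illustrative)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_sae)
        self.decoder = nn.Linear(d_sae, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))   # sparse feature activations
        return feats, self.decoder(feats)     # reconstruction of the residual

sae = SparseAutoencoder()
residual = torch.randn(1, d_model)            # stand-in for a cached activation

feats, recon = sae(residual)
feats[:, 42] = 8.0                            # clamp one feature to steer generation
# Decode the edited features and keep the SAE's reconstruction error term.
steered = sae.decoder(feats) + (residual - recon)
```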

Goals

  • A file-over-app web-based text editor.
  • A user feedback loop for steering through planning.
  • Efficient activation caching for feature steering (sketched after this list).
  • Functional SAEs for guided steering.
  • An OpenAI-compatible API for the LLM server.
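
As one possible starting point for activation caching, the sketch below records a block’s hidden states with a PyTorch forward hook so they can be reused for SAE feature steering. The GPT-2-style module path and layer index are assumptions for illustration only.

```python
# Hedged sketch: caching residual-stream activations with a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

cache = {}

def save_activation(name):
    def hook(module, inputs, output):
        # output[0] holds the hidden states for a GPT-2 block
        cache[name] = output[0].detach()
    return hook

layer = 6  # arbitrary block index for the sketch
handle = model.transformer.h[layer].register_forward_hook(
    save_activation(f"block_{layer}")
)

inputs = tokenizer("Growth is rarely linear.", return_tensors="pt")
with torch.no_grad():
    model(**inputs)
handle.remove()

print(cache[f"block_{layer}"].shape)  # (batch, seq_len, d_model)
```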

Stretch Goals

  • Local inference.
  • Integrations with different device form factors (tablets, e-readers).
  • Cross-platform GUI.

Challenge Level and Extras

The challenge level for tinymorph is advanced, since the project explores what is possible when building AI-native interfaces. It involves mechanistic interpretability, a relatively new field in alignment research that requires substantial domain knowledge of the inner workings of transformer circuits.

Extras that we wish to pursue: Design Thinking, Research.

Reasoning:

  1. Design Thinking: tinymorph will explore new interactions for writing, which means design thinking is required for us to prototype ideas.
  2. Research: The field of mechanistic interpretability is relatively new, with exciting research happening. We will explore it to understand how we can manually edit activations rather than drive the model through a system prompt.

Appendix

Haugeland, J. (1997). Mind Design II: Philosophy, Psychology, and Artificial Intelligence. The MIT Press. https://doi.org/10.7551/mitpress/4626.001.0001
Hendler, J. (2008). Avoiding Another AI Winter. IEEE Intelligent Systems, 23(2), 2–4. https://doi.org/10.1109/MIS.2008.20
Kay, A. C. (2011). A Personal Computer for Children of All Ages. Proceedings of the ACM Annual Conference - Volume 1. https://doi.org/10.1145/800193.1971922
Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, LIX(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433

Reflection

Writing this document helped me structure my thoughts around the problem definition and somewhat confirmed an intuition I had about the problem. Interfaces for personal computing are supposed to be open-ended, surprising us with new ways to create, so I think it is important for me to write these thoughts down. At the end of the day, writing is a form of lossy mutation, turning a “net into a line”.

The main pain point I had while writing the problem statement was declaring the scope of the hardware and software requirements. We had an internal discussion on whether to build a GUI text editor or a web-based one. After a quick discussion, we decided that, given everyone’s expertise and preference for the web, our main focus for the capstone would be a web-based interface.

We consulted with our supervisor to narrow the scope of both the UX work and the potential avenues for ML research. We established clear boundaries for the UX components we will develop to supplement the core ML work (mechanistic interpretability).

Revision

Date | Revision | Change
Sept. 16 2024 | 0.0 | Initial scaffolding
Sept. 19 2024 | 0.1 | Problem statement definitions
Oct. 5 2024 | 0.2 | Update problem statement, adjust stakeholders

Footnotes

  1. Historically, the seminal work of Alan Turing laid the foundation for exploring the possibilities of a thinking machine (Turing, 1950). Subsequently, the development of AI took a symbolic approach, popularised through decision-tree reasoning and expert systems: representing the world through high-level, human-readable symbols that are manipulated to produce results. Haugeland referred to these systems as Good Old-Fashioned AI (GOFAI) (Haugeland, 1997).

    However, GOFAI showed its limitations and led us into an “AI winter”, largely due to cynicism in the general research community as well as a reduction in funding at most research labs (Hendler, 2008).

    Given the rise of Moore’s Law and the exponential growth in compute and data, a new approach centered on statistical methods and connectionist networks arose, referred to as “New-Fangled AI” (NFAI). Machine learning systems nowadays are also known as NFAI systems.