humanities ∩ technology ∩ education

This paper on context windows for Large Language Models is important and interesting, if only because it is a reminder that bigger is not always better and that advances can have unexpected consequences.


The headline getting attention is that what comes first and what comes last matters more to results than what comes in the middle. In some ways that's not surprising: working with existing, smaller context windows builds the intuition that what you say at the margins matters a lot, and small changes there can override things said in the middle.

But I think the more relevant consequence of this work is not that large context windows aren't useful; it's that we may end up having to subdivide tasks anyway. The hype around LLMs has been how much they can do in one go: one generation from a prompt yields results that might once have required multiple passes through traditional NLP parsers, classification models, or explicit logic. The reality, though, is that getting repeatable and reliable results still requires some subdivision or architecting of tasks, just as it always has. This paper is a reminder that, at a practical level, it can still pay off to consider what can be done with smaller units, and that we should not assume larger context windows will solve all problems.
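To make the subdivision idea concrete, here is a minimal map-reduce sketch: split a long document into smaller pieces, run each through the model separately, then combine the partial answers. `ask_llm` is a hypothetical placeholder for a real model call, and the chunk size is an illustrative assumption, not a recommendation.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"summary of: {prompt[:40]}"

def chunk(text: str, size: int) -> list[str]:
    """Split text into roughly size-character pieces on word boundaries."""
    words, pieces, current, length = text.split(), [], [], 0
    for w in words:
        if length + len(w) + 1 > size and current:
            pieces.append(" ".join(current))
            current, length = [], 0
        current.append(w)
        length += len(w) + 1
    if current:
        pieces.append(" ".join(current))
    return pieces

def map_reduce_summarize(document: str, size: int = 200) -> str:
    """Summarize each chunk on its own, then combine the partial answers."""
    partials = [ask_llm(f"Summarize: {c}") for c in chunk(document, size)]
    return ask_llm("Combine these partial summaries: " + " | ".join(partials))
```

The point is less the summarization itself than the shape: each model call sees a small, fully attended-to context instead of one long prompt where the middle gets lost.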

tldr: 1. prompt first, build later; 2. shape context, don't summarize.

  1. I've been working with a number of student projects on the side and teaching a bit of prompt engineering. A key thing that struck me in working with them is how unobvious it can be that the first step of working with LLMs should always be working up the prompts and seeing the limits of what the prompts themselves can do. Reliability and repeatability are, of course, key concerns, so the two criteria that matter most when deciding whether to do something more computationally expensive, like fine-tuning or other model training, are: a) is this something that can't be done with the prompt itself, and b) is this something that can't be done reliably and repeatably with the prompt itself? If the answer to either is yes, then it's time to think about what goes into the frontend/backend, fine-tuning, other forms of data manipulation outside the LLM, or other techniques. Particularly with a lot of people coming to this for the first time, there's a tendency to be seduced by the YouTube tutorials out there and to forget that simple is usually better.

  2. I've been working a lot on how to use knowledge graphs to shape and select prompts and relevant context in Retrieval Augmented Generation; some of this is public in a webinar with Fauna and AWS. Beyond what's in the webinar, the field is wide open on ways to combine specific kinds of graph or relational data with the less structured data returned from vector search. What I think is missing are ways to use the structure of relationships to, at a minimum, select, and in other cases reshape, the prompt itself. That is, the graph structure returned by a query can itself be treated as loosely structured data that feeds into generating a prompt. As LLMs themselves get better and better at generating prompts (raising the possibility that prompt engineering might be a short-lived field for most general usage), I expect to see more and more prompts generated from a variety of indirect inputs, as parses of or reactions to features of those inputs. Graphs are particularly amenable to this because they can be represented in language as sentences defining relationships (e.g., Oscar knows LLMs; Kate does not know Oscar).
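The graph-to-sentence idea in point 2 can be sketched very simply: each edge in the returned subgraph becomes a plain-language sentence, and those sentences become the factual context of a generated prompt. The triples and the verbalization scheme here are illustrative assumptions, not a fixed API.

```python
def triples_to_sentences(triples: list[tuple[str, str, str]]) -> str:
    """Render (subject, relation, object) edges as plain sentences."""
    return " ".join(f"{s} {r} {o}." for s, r, o in triples)

def build_prompt(question: str, triples: list[tuple[str, str, str]]) -> str:
    """Shape a prompt from graph structure rather than summarized text."""
    context = triples_to_sentences(triples)
    return f"Given these facts: {context}\nAnswer the question: {question}"

# Example subgraph returned by some hypothetical graph query
edges = [("Oscar", "knows", "LLMs"), ("Kate", "does not know", "Oscar")]
prompt = build_prompt("Who knows LLMs?", edges)
```

A real system would pull the triples from a graph query (Cypher, GQL, or similar) and might vary the verbalization by relation type, but the core move is the same: the structure selects and shapes the prompt rather than just padding its context.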
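The repeatability criterion from point 1 can also be made operational: before reaching for fine-tuning, run the same prompt several times and measure how often the answers agree. `ask_llm` is a hypothetical stand-in for a real, nondeterministic model call, and the 0.9 threshold is an arbitrary illustrative choice.

```python
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; real output varies at temperature > 0."""
    return "POSITIVE"

def repeatability(prompt: str, trials: int = 5) -> float:
    """Fraction of runs that agree with the most common answer."""
    answers = [ask_llm(prompt) for _ in range(trials)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / trials

def needs_more_than_a_prompt(prompt: str, threshold: float = 0.9) -> bool:
    """If agreement falls below threshold, consider restructuring or fine-tuning."""
    return repeatability(prompt) < threshold
```

The measurement is crude, but it turns "can the prompt do this reliably?" from a gut feeling into a number you can track as you iterate.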

I build a lot, most of which feeds into temporary artifacts and a long list of projects, both side and primary. I'm not particularly good at building in public, I'll freely admit. But lessons from building do make their way immediately into teaching, both in a former life as a professor and in my present day-to-day with a variety of stakeholders, students, colleagues, and collaborators. Still, far too much of this stays in my head, despite knowing that documentation can be an invaluable asset, particularly as projects grow in complexity.

So, build notes serve a simple purpose: collect some learnings and outcomes periodically and put them together here.

Let's go.
