Forgotten Knowledge, Humans and AI
How many books never made it to digital? What gaps do we have in either our information or knowledge?
TL;DR
- Lots of knowledge exists in copyrighted books, not accessible to search engines and LLMs
- Many mental models exist, and research has already been done, that you may not be aware of
- We need to think about the information and knowledge gaps in the sources we have
While doing some research into software development, I chanced upon a book published in 1980 called “Software Psychology”. I bought it on a whim for $20 shipped, and it turned out to be full of insights and surprises.
What the book made me question was not AI or human capability, but coverage: what knowledge exists, what knowledge is visible, and what knowledge has quietly disappeared or been forgotten.
On the initial skim, I found:
- In 1977, a commercial company was already building natural language query interfaces for databases
- Psychological challenges of using natural language and chatbots. The original chatbot, ELIZA, was written in 1966!
- Code review processes which factor in reviewer learning (IMHO - this helps with knowledge transfer and reducing key person risk)
- Halstead’s software science - foundational work in measuring software attributes
- English is ambiguous, and has limitations when querying complex systems with incomplete information and potential human bias (this sounds very relevant to AI hallucinations!)
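Halstead’s measures are simple enough to sketch. From counts of distinct and total operators and operands in a program, you derive vocabulary, length, volume, difficulty and effort. A minimal illustration (the token counts below are hypothetical, not taken from any real program):

```python
import math

def halstead_metrics(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Core Halstead software science measures.
    n1: distinct operators, n2: distinct operands,
    N1: total operators,    N2: total operands."""
    vocabulary = n1 + n2                        # n  = n1 + n2
    length = N1 + N2                            # N  = N1 + N2
    volume = length * math.log2(vocabulary)     # V  = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)           # D  = (n1/2) * (N2/n2)
    effort = difficulty * volume                # E  = D * V
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": round(volume, 1),
        "difficulty": round(difficulty, 1),
        "effort": round(effort, 1),
    }

# Hypothetical counts for a small function:
print(halstead_metrics(n1=10, n2=7, N1=25, N2=18))
```

Tools still compute these metrics today, which is a nice reminder that “foundational” work from the 1970s never really went away, it just dropped out of everyday awareness.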
I decided to go down the rabbit hole, looking at LLMs’ blind spots: books, copyrighted information, proprietary information inside companies, and people’s knowledge that has never been written down anywhere.
No model trained primarily on the web can access what was never made visible. It’s a definite information gap in many of the LLMs we use today. Even Larry Ellison, Oracle’s co-founder and CTO, has talked about the need for private and commercial data.
This was a “that’s interesting” moment for me, the blind spot of what’s missing, and after a bit more searching I came across terms like “Dark Knowledge” and “Information Amnesia”.
“Dark knowledge” does not mean secret, just that it exists but is invisible if you don’t look for it. There are many causes: undigitised books, abandoned research paths, tacit organisational memory, and ideas that never survived a tooling or funding transition. These are gaps in current thinking or information, either because we don’t know they exist or because they have simply been forgotten.
Some of this loss is informational: books, papers, presentations, reports, and source code. Some is lost knowledge that only ever existed in practice, context, and human judgment. Often tacit knowledge and experience was never written down in the first place.
I think there is a systemic blind spot on certain types of information and trapped knowledge.
The “forgetting” is a byproduct of complex systems, delivery pressures and communication overhead.
The following quote was in the book, from an even older book, The Myth of the Machine (1970): “Those who are so fascinated by the computer’s lifelike feats - that they would turn it into the voice of omniscience - betray how little understanding they have of either themselves, their mechanical-electrical agents or the possibilities of life”
This quote led me down another rabbit hole, to the next big word I had to look up: “anthropomorphism”. This is when people attribute human characteristics, emotions, or behaviours to non-human entities like animals, objects or natural phenomena. This, I think, is the blessing and the curse of many LLMs: they sound like confident humans, and there can be more trust in the outputs than is warranted. Simon Wardley calls LLMs “Coherence engines, not Truth engines”. We’ve also seen this pattern of thinking with “magic” methodologies or tools that were going to 10x our productivity, but didn’t factor in people, their learning or their motivations.
Coming back to the psychology aspect of software, which is why I bought the book, there are many other classic texts I’ve reread, like “The Mythical Man-Month” and “Peopleware”, which have a similar “that hasn’t really changed in the last few decades” feel to them. Things like team sizes, communication overhead and efficiency, distractions and deep work are still challenges, with some probably timeless solutions, not solved by technology.
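The communication-overhead point from The Mythical Man-Month can be made concrete. Brooks observed that if every member of a team must coordinate with every other, the number of communication paths grows as n(n-1)/2, i.e. quadratically with team size. A quick sketch:

```python
def communication_paths(team_size: int) -> int:
    """Pairwise communication channels in a team of n people: n*(n-1)/2."""
    return team_size * (team_size - 1) // 2

# Paths grow much faster than headcount:
for n in (3, 5, 10, 20):
    print(f"{n:>2} people -> {communication_paths(n):>3} paths")
```

Doubling a team from 10 to 20 people quadruples the coordination paths from 45 to 190, which is why adding people to a late project so often makes it later.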
So, if we build software, systems and AIs, on what we remember, not what we once knew, we need to think more about the limits and gaps.
Many development teams I’ve spoken with say that maintenance is difficult to plan and budget for, that skills and knowledge transfer are difficult to fit into release schedules, and that people leaving teams or companies compounds the challenge.
Part of the solution to knowledge gaps is visibility, awareness and focus, but for many people these factors compete directly with delivery pressures and shorter-term incentives. This is a structural challenge, not a technical one.
It seems like part of the solution is critical thinking about “what are we missing?” - and, now that we can build fast, remembering that we still have to focus on the right outcome, and that for a long-running product, maintenance is where the effort really is.