Feeding The AI some Pie!
I never ever really got my head around formulae (laziness / lack of direction), so LLMs are a revelation / revolution, except:
Not having the best grasp, I find it most interesting to understand / decode when there are missing bits. I suppose this is what life / science is all about - connecting dots that are not there.
Anyway, a hack / a fix / a solution to help a chat window display formulae - but first a recap:
LLM1
“It seems there were persistent encoding issues with the symbols and characters in the content. Let me adjust the text to replace problematic characters like…” ffs nearly all of them..
LLM2
“It appears that persistent Unicode issues are preventing successful PDF creation using the current method. I’ll simplify the problematic symbols (e.g., replacing square root symbols and other special characters with plain text equivalents) and retry.”
LLM3
“It seems the tool continues to encounter Unicode encoding issues, even with symbols replaced. Let me switch to a more robust PDF generation method that avoids these problems altogether.”
LLM4
“It appears the session has reset, and required tools are unavailable. Could you re-enable the necessary tools, or I can guide you in generating the output manually?”
So this is tear one’s hair out or ROFL territory..
So the hack / fix / solution
Feed it pie
Ask: can you understand this image?
Hopefully the LLM will say: “Yes, I can understand the image. It represents the Greek letter π (pi), a mathematical constant approximately equal to 3.14159. It is widely used in mathematics to represent the ratio of a circle's circumference to its diameter.”
Instruct:
So then please take the elements of the combined formula and explain them, using an image or verbal explanation?
You may then reach acquiescence:
“Certainly! Let's break down the elements of the combined formula we discussed earlier using verbal explanations paired with an image-based mental visualization to clarify how the pieces fit together.”
Some of you may think this represents a New Year’s resolution on my part to learn more formulae - not a fucking chance… lol
After all that, some of this will/may form the basis of an article revolving around the combination of the Fisher–Rao metric with a Markov chain, in order to look at the continuous statistical geometry while still capturing the discrete sequential nature of inputs at the early stages.
Hence the Yellow Brick Road illustrative image above. But do not reprogramme your LLM, yet..
An ELI16 version may be good for me when I come back to this..so..
Markov Chain: Capturing Sequential Transitions
A Markov chain models processes that unfold in steps, where each step depends only on the current state. Think of predicting the next word in a sentence: the word "on" often follows "the gun is" because the likelihood of "on" depends on the immediate context. Formally, a Markov chain represents this with transition probabilities, like
P(s_{t+1} | s_t), which tell us how likely we are to move from one state (s_t) to the next (s_{t+1}).
This framework is ideal for early-stage prompt processing in language models, where the sequence of tokens (words or subwords) is parsed step by step. The Markov chain perspective captures this discrete, probabilistic nature.
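If it helps to make that concrete, here is a minimal sketch in Python - the tiny corpus and the word choices are entirely made up for illustration, not taken from any real model:

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = "the gun is on the table the gun is on the floor the cat is on the mat".split()

# Count how often each word follows each word (first-order Markov chain).
transitions = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word][next_word] += 1

def next_word_probabilities(state):
    """Estimate P(s_{t+1} | s_t) by normalising the counts for the current state."""
    counts = transitions[state]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probabilities("is"))   # {'on': 1.0}
print(next_word_probabilities("the"))  # 'gun' is the most likely continuation; the rest share the remainder
```

Real language models condition on far more context than the single previous token, but the count-and-normalise idea is the same one the transition probabilities above describe.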
You may laugh but…
Large Language Models as Markov Chains
ArXiv
https://arxiv.org/abs/2410.02724
So to the Fisher–Rao Metric: Measuring the Statistical Geometry
The Fisher–Rao metric, by contrast, operates in a continuous, geometric realm. It quantifies how "far apart" two probability distributions are on a statistical manifold, using information theory as its foundation.
Imagine words in a sentence as points in a space where proximity reflects similarity or shared meaning. The Fisher–Rao metric measures the sensitivity of this space, helping us understand how small changes in inputs (e.g., tweaking a word) ripple through the model. In essence, it defines a Riemannian geometry on the manifold of probability distributions, enabling us to calculate distances that respect the data's probabilistic nature. This may seem complicated but it does not have to be:
Think of bank..
I was so happy I robbed the bank and got away
I was so happy to be on the bank reflecting on life
I was happy, even ecstatic, to catch a fish on the bank and have something to eat beyond what my bank / aka finance vulture mf’s had already eaten from my paycheck / cheque / bank account. Throwing myself off the bank or at the bank would return nothing.
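To put a number on that intuition, here is a minimal Python sketch of the Fisher–Rao distance between categorical distributions, using the standard closed form for the probability simplex. The three-way "sense" distributions for bank are numbers I have invented purely to illustrate the three sentences above:

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions.

    Standard closed form on the probability simplex:
    d(p, q) = 2 * arccos( sum_i sqrt(p_i * q_i) ).
    """
    affinity = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp for floating-point safety before arccos.
    return 2.0 * math.acos(min(1.0, max(-1.0, affinity)))

# Invented "sense" distributions for bank over (financial, riverside, other).
bank_robbery  = [0.90, 0.05, 0.05]   # "robbed the bank"
bank_fishing  = [0.10, 0.85, 0.05]   # "catch a fish on the bank"
bank_paycheck = [0.80, 0.15, 0.05]   # "bank account"

print(fisher_rao_distance(bank_robbery, bank_fishing))    # ~1.96, far apart on the manifold
print(fisher_rao_distance(bank_robbery, bank_paycheck))   # ~0.35, close together
```

Small tweaks to the probabilities move the points only slightly, which is exactly the "sensitivity" the metric is meant to capture.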
Again, whilst my knowledge is / will never be complete, it does seem that there is a bit of a dilemma here in terms of the trad Euclidean interpretation of ‘flatness’.
There do appear to be differences of opinion in terms of:
does the Fisher–Rao metric intrinsically measure curvature?
how does the Riemannian metric on the statistical manifold vary in curvature?
how do the key components adjust in response to changes in the underlying probability distributions or data topology?
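As a rough note-to-self for when I come back to this, the textbook definition of the metric, and the closed form it gives in the categorical case (the same one used in the sketch above), look like this - my understanding, not gospel:

```latex
% Fisher information metric on a parametric family p_theta(x)
g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[
  \frac{\partial \log p_\theta(x)}{\partial \theta_i}\,
  \frac{\partial \log p_\theta(x)}{\partial \theta_j}
\right]

% For categorical distributions, the map p -> 2*sqrt(p) sends the
% probability simplex onto part of a sphere of radius 2, giving the
% geodesic distance
d_{FR}(p, q) = 2 \arccos\!\left( \sum_i \sqrt{p_i\, q_i} \right)
```

On that reading the categorical statistical manifold is curved like (part of) a sphere rather than flat in the Euclidean sense, which is at least one way of framing the flatness question above.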
Always more questions and less time to answer… :-(
Please forgive me if these are in the ELI range of 5 to 15
I will revert with a further article in due course, but.. it’s only January.