This post was delayed out of respect for the passing of the President of the Law Society of Singapore, Mr Adrian Tan. Requiescat in pace.
Hi friends, it has been a while! Many things have been cooking in this absolutely riveting season of change[1]. This week, I pick up on the thread that I started here and here, where I compared legal argumentation to filling in cloze passages and found some interesting similarities. Large Language Models (LLMs) learn on much the same principle - they complete cloze-like tasks (predicting a missing or next word) and are adjusted through iterative feedback. Reminiscent of those halcyon days!
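For those who like to see the cloze idea in miniature, here is a rough sketch. It is not how an LLM is actually built: it simply counts which word tends to follow which, then fills a blank with the most frequent follower. The three-sentence corpus and the function names `train_bigram_counts` and `fill_blank` are my own illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count, for every word, which words follow it and how often."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def fill_blank(following, prev_word):
    """Fill the blank after prev_word with its most frequent follower."""
    candidates = following.get(prev_word.lower())
    if not candidates:
        return None  # never saw this word, so no guess
    return candidates.most_common(1)[0][0]


# Hypothetical toy corpus, made up for illustration only.
corpus = [
    "the court dismissed the appeal",
    "the court allowed the appeal",
    "the court dismissed the claim",
]
model = train_bigram_counts(corpus)

# Cloze passage: "the court ____ ..."
guess = fill_blank(model, "court")
print(guess)
```

A real model looks at far more context than one preceding word and learns its guesses by gradient updates rather than counting, but the exercise is the same: see a gap, propose a word, get corrected.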
The next obvious step would have been to talk about Reinforcement Learning from Human Feedback (RLHF), a mechanism that underpins the GPT revolution. When people communicate their approval or disapproval of GPT's responses, they effectively shape its subsequent responses. They set up a scaffolding for it to follow. That feedback serves as a 'reward system', refining the model's performance. This seems a familiar phenomenon, doesn't it? All of us have been through performance reviews, or been told that our ideas or arguments need improvement, or are just outright wrong. But it's important to remember - we're focusing on 'preferences' rather than 'correctness'.
But I won’t take that step just yet. Let’s take a slight detour.
Because, you see, I've recently been immersing myself in the riveting world of squash, playing with some friends. These same friends can testify to my skill level.
It is Bad.
But that didn’t deter me. I live near some public squash courts (ActiveSG is amazing). So I told myself that I would practise my forehand technique 750 times. Since I’m a beginner, almost nothing would go according to plan. So, I would probably have to rush to get to the ball on time with my backhand - mixing things up, or interleaving, generally boosts learning.
Towards the end of that first session, I started finding a groove and an aim that I had not had before. Sweat gave way to a little bit of pride, a little bit of a gleam, and the huff and puff of reaching each ball turned into a bit of a dance[2]. So, I did a second session after getting some expert advice from YouTube videos, this time with 500 strokes. Yes, listen to the squash women and men on the screen. Point your racket where you would like the ball to go. Yes, make your opponent run! Make the ball lose energy! Oh yes, you are a squash player - did someone just walk by and see my disastrous serve? Then I did a third session, and you know what? I felt like I had improved tremendously.
This was all unsupervised learning in action.
Then, I played with a friend. AND I GOT THOROUGHLY, THOROUGHLY TRASHED. The sort of trashing where the floor you knew is now a gaping hole. Well, at least, in the first couple of games. My friend commented that my timing got a lot better during our last few games, and while our score differences had not quite decreased compared to our previous challenges, the rallies were longer - and I like to think I’d given him more of a run for his money.
But the feedback from playing with him during those first few games was important: it offered a reward and penalty system. I was rewarded with points for things that I did right, and penalised for things that I did wrong. And as has been said of such games, it is less about scoring points than it is about not losing them. Yes, that combination of words takes a while to sink in.
Does this mean this was supervised learning? Not quite. Roughly speaking, I think this was more like reinforcement learning. You will find games being mentioned in the reinforcement learning literature for precisely this reason. Chess, StarCraft, Go, etc. The list goes on.
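To make the reward-and-penalty idea concrete, here is a minimal sketch of one of the simplest reinforcement learning setups, an epsilon-greedy bandit. A player picks a shot, wins (+1) or loses (-1) the rally, and nudges their estimate of that shot accordingly. The shot names, the win probabilities and the function name `practise` are all assumptions made up for illustration; real squash (and real RL) is a good deal messier.

```python
import random


def practise(win_prob, rounds=500, epsilon=0.1, seed=0):
    """Epsilon-greedy learning over a set of shots.

    win_prob maps each shot to a hypothetical chance of winning the rally.
    Returns the learned value estimate and the play count for each shot.
    """
    rng = random.Random(seed)
    estimates = {shot: 0.0 for shot in win_prob}
    counts = {shot: 0 for shot in win_prob}

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: occasionally try a random shot.
            shot = rng.choice(list(win_prob))
        else:
            # Exploit: play the shot that has paid off best so far.
            shot = max(estimates, key=estimates.get)

        # Reward for winning the rally, penalty for losing it.
        reward = 1.0 if rng.random() < win_prob[shot] else -1.0

        # Update the running average reward for that shot.
        counts[shot] += 1
        estimates[shot] += (reward - estimates[shot]) / counts[shot]

    return estimates, counts


# Hypothetical win probabilities, invented for this example.
shots = {"drive": 0.6, "drop": 0.4, "lob": 0.3}
estimates, counts = practise(shots)
favourite = max(estimates, key=estimates.get)
print(favourite, counts)
```

Notice that nobody tells the player which shot is 'correct'. There is only the score, and the player's estimates drift towards whatever the score rewards. That is the difference from having a coach.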
Supervised learning would have been like learning how to play with a coach (which I am seriously considering by the way).
What does this mean for legal learning? Or learning how to learn (metalearning, to be concise)?
Here are some quick observations:
Some of us are better at unsupervised learning than others.
Unsupervised learning is not innately better than supervised learning. Neither is reinforcement learning when compared to the other two. An ensemble of all three is usually better.
That does not mean that responsiveness to feedback is equivalent to quickness (to reach correct conclusions) in unsupervised learning. The distinction is subtle, but significant.
What that means for adaptability and creativity is something I need to think more about.
But adaptability and creativity shine in novel situations that have been untouched by previous supervised learning or reinforcement learning. Such situations are truly tests, and set the better ones apart. Can such skills be learned? Can learning be transferred? (Stay tuned for a future post with a deep dive into reinforcement learning.)
But the stellar performance of models based on supervised learning and reinforcement learning corroborates our more general intuitions about the importance of guidance in learning.
We all engage in different types of learning: supervised, unsupervised, as well as reinforcement learning.
It is perhaps the last of these that is reinforcing, incentivising, rewarding, and ultimately, encouraging toxic behaviours in the legal profession across the world.
It requires a great deal of introspection to rectify these.[3]
Winning the game isn’t always the ‘correct’ action. Neither is playing the game. Playing a game comes with rewards and penalties, victories and losses, but it doesn't say anything about the 'correctness' of winning or losing, playing or sitting it out. Is football correct?[4]
We'll seek to address this intriguing question in the next post, where I'll pose this exact question to GPT.
Until then, stay safe and thrive!
[1] I have learned how to cook lentils, for one!
[2] My wife has informed me that I dance like an uncle. She tells me that is not because I am now past 30, but that it is certainly OK because I am now past 30. I do not intend to seek feedback elsewhere. But I do intend to own it. Roller-blade disco uncle, anyone?
[3] This is the reason behind the subtitle of this article. One of the reasons I adore delving into AI is the beautiful cycle of learning and reflection that it sets up. Where I had viewed one topic as if through a glass darkly, I now learn a little more about two topics: AI and law. And in turn, I learn more about how I learn. Great stuff.
[4] As a Fantasy Premier League player - I even wrote some optimisation algorithms which ensured that I did not finish last in my league even though I am not a regular fan - I know, and I understand, what some of you may be tempted to answer. But it is an absurd question, and a great one because it is absurd.