Video to Written Word: Contributing to the Network State Project

mightpossibly681.069 days ago7 min read

 https://img.inleo.io/DQmWZvVE4SSxLiPpd1TzL59EcPuQ3UAhLX6WXQwdTQq6c1s/Thumb-book-video.jpg
 
I was approached by @starkerz the other day, with a request to contribute to his and @theycallmedan's recent book project, @networkstate - Securing Digital Rights for Communities: Game Theory and Governance of Scalable Blockchains for use in Digital Network States.

From the preface:

What follows is the basis of what we hope will become a book and set a standard in the industry for what true decentralisation is. No premines, ICO's, companies, CEO's or early Venture Capital. Just community backed freedom for all participants.

Their approach to writing this book has been to record a series of videos to 3Speak - one for each chapter - instead of actually writing it. They do, however, want it compiled into an actual written book, which is why they approached me.

They were familiar with my work through the @ai-summaries project, an ongoing project I started at the beginning of this year (2024). It basically entails transcribing and summarizing various 3Speak videos and podcasts, with the main intention of making their contents accessible to more people (keywords: accessibility, language barriers, consumption preferences). So far approximately 4800 video summaries have been posted to chain, and the number is growing every week.

So their request was basically to summarize all the chapters using AI, so that they could use them as a basis for getting a book down in written form.

Step 1: Transcription

To be able to do anything with it, the first thing I did was to transcribe all the videos. I already have a pipeline set up to automatically download and transcribe a series of videos, so I saved a bit of time there. It still took approximately one afternoon, as video transcription is quite a resource-intensive process when you do it on your own equipment. For those who might wonder, I use WhisperAI for transcription.

With all 26 transcripts in place, I could then refine them using my custom-built open source program HIVE ASR Dictionary, whose database I've been adding Hive-related words to since the beginning of the year.
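
To illustrate this kind of dictionary-based cleanup, here's a minimal sketch in Python. It is not the actual HIVE ASR Dictionary code: the entries in `CORRECTIONS` and the function name `apply_dictionary` are made up for the example, and the real program may well work differently.

```python
import re

# Hypothetical entries for illustration: misheard form -> intended spelling.
CORRECTIONS = {
    "three speak": "3Speak",
    "in leo": "InLeo",
}


def apply_dictionary(text: str, corrections: dict[str, str]) -> str:
    """Replace each misheard phrase with its correction, ignoring case."""
    # Longest phrases first, so a short entry cannot clobber part of a longer one.
    for wrong in sorted(corrections, key=len, reverse=True):
        right = corrections[wrong]
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in `right` literal.
        text = pattern.sub(lambda _match: right, text)
    return text


print(apply_dictionary("We posted it on three speak and In Leo.", CORRECTIONS))
```

The word boundaries (`\b`) are there so an entry only matches whole words and doesn't rewrite the middle of an unrelated word.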

Step 2: Compiling Transcripts into Chapters

I started out by crafting a prompt which I tested on a few of the transcripts. I had to refine it further a few times until I got results that were consistent enough for me to process the rest of the transcripts with.

From experience, I knew that I should use Anthropic's Claude 3.5 Sonnet to process the transcripts, as ChatGPT's context window tends to be a bit too small for long transcripts like some of the ones I have. I've also in general had much more success with Anthropic than with OpenAI, so the choice was easy.

After processing all 25 transcripts, I had 25 chapters neatly organized with subheadings that I could manually paste into a single Word document with proper formatting. It totaled 60 pages.

 https://img.inleo.io/DQmX47eaGXgusnc8dTTWLB8iUbRcbSHxjjeLu8mg8Yi7GCC/image.png
 

Token Output Limitations

I shared the results with Starkerz, who was happy with them but had envisioned a much longer book. And rightly so: I did a word count of all the transcripts I had and roughly calculated how many pages that would equate to (font Arial, size 12), which came to approximately 300 pages. A lot more than the 60 pages I had managed to create in v1, for sure.
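
For anyone curious, that kind of rough page estimate can be sketched in a few lines of Python. The figure of 500 words per page is my assumption for single-spaced Arial 12, not a number from the actual calculation, and `estimate_pages` is just an illustrative name.

```python
import math


def estimate_pages(transcripts: list[str], words_per_page: int = 500) -> int:
    """Rough page count: total whitespace-separated words / words per page."""
    # words_per_page=500 is an assumed density, adjust it to your layout.
    total_words = sum(len(transcript.split()) for transcript in transcripts)
    return math.ceil(total_words / words_per_page)
```

Calling it with the list of transcript strings gives a ballpark figure only, since headings, paragraph breaks and margins all shift the real count.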

The problem is, of course, that even the most cutting edge LLMs have pretty severe limits on output tokens (i.e. the length of responses they can generate).

Regardless, I went about processing all the transcripts again with an improved prompt and a doubled token limit (set to the maximum). But even with the increased token limit and a refined prompt, I was only able to get version 2 to about 80 pages. Better, but still way too short.

I was about to say that this is the best I can do when...

Refining the transcripts

Starkerz had a great idea: what if, instead of summarizing the chapters, I used AI to simply reproduce the transcripts, but with the language cleaned up and the filler words removed?

I'd of course still be limited by the output token limits, so I had to do some testing to figure out how large a part of a transcript I could process without the model simply summarizing it instead of keeping the original wording intact. After testing for a bit and crafting a new prompt that worked fairly consistently, I found that transcript chunks could be about 5.5k characters before the LLM started acting inconsistently.

Write a program to chunk-process the transcripts

Now I had to write a program to chunk-process the transcripts, as the largest ones were about 80k characters and the biggest chunk I could process was closer to 5.5k. This was done fairly quickly in Python using my trusted AI coding assistant.
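
To give an idea of what such a program can look like, here's a minimal sketch. It is not the actual script: the names `chunk_transcript`, `process_transcript` and `refine_chunk` are made up for the example, and `refine_chunk` stands in for whatever function sends one chunk to the LLM. The sketch assumes chunks should break at sentence boundaries and stay within the roughly 5.5k character limit.

```python
import re
from typing import Callable

MAX_CHARS = 5500  # approximate per-chunk limit found through testing


def chunk_transcript(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit gets hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def process_transcript(text: str, refine_chunk: Callable[[str], str]) -> str:
    """Refine each chunk separately and join the results into one chapter."""
    return "\n\n".join(refine_chunk(chunk) for chunk in chunk_transcript(text))
```

Breaking at sentence boundaries rather than at a fixed character offset means no chunk starts or ends mid-sentence, which should make it easier for the model to keep the original wording intact.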

At this point I could process all the transcripts again for the third time, this time hopefully with an even better result.

Step 3: Putting V3 together, Consisting of the Refined Transcripts

Again I got to work putting the chapters together in a Word document with proper formatting. With the new method, each chapter was significantly longer - much better. The downside of this approach compared to the first method was, of course, that it did not add any subheaders or anything like that.

I ended up with a total of 176 pages. Keep in mind that it would probably be a lot more pages in a completed book, as this version (as mentioned) has no subheaders to further organize the chapters.

 https://img.inleo.io/DQmNi4VqNSa1WGfGBPCF9TP2jUPhGtbA7EwHKDL3pq1emmd/image.png
 

Assignment Completed: Thoughts on Next Steps

I delivered all three versions to Starkerz, noting that the key difference between the three versions is that the latest was much more content-rich but lacked the subheader structure seen in the earlier summaries. So in my head, it would make sense to use all three versions in the continued work with the book, using the subheader structures of the first two versions, while using the third version for the actual contents of said chapters and subchapters.

Luckily for me though, I'm not a writer, so my job ends here.

My understanding is that they intend to engage a professional writer to help them complete the book. I wish them the best of luck in the continued process, and sincerely hope that my efforts turn out to be of help.

Reflections: The Power and Limitations of AI in Creative Projects

This has been an exciting project for me, a chance to test the limits of AI and to see how far one can get with these tools in such a project. And as we can see: quite far! Not all the way there obviously, but pretty far. I'm pretty certain a ton of man-hours have been saved because of this work.

It also confirms my previous experience, that you get much better results when you have good content to feed into it to begin with - as these videos truly are. Really! Go watch them on 3Speak right now.

Conclusion: A New Model for Book Creation?

I'm also intrigued by this way of writing a book. Perhaps this is how books will be written in the future? What we're definitely looking at is a new model for book creation that blends video, AI and traditional authorship. I'm excited to have gotten the chance to be a part of it.


Here are the three versions (google doc links):

Posted Using InLeo Alpha