# Obsidian Performance Test - Take 1

![[graph.jpg|Graph View of Performance Test Database]]

Imagine a database containing the full text of all the books and publications you have ever read, plus all your reading notes with links to the source paragraphs and your text highlights in the source. It also holds everything you have ever written: your journal, all the articles, books, documents, and reports. Such a database would be a powerful source of knowledge. With persistence, it is possible to build this in [Obsidian](https://obsidian.md). But can Obsidian handle this volume of information?

This post is about my attempt to understand Obsidian's performance limits. I will present my approach and findings.

# Obsidian's architecture at a glance

Obsidian is a web application hosted in a local web browser environment called [Electron](https://github.com/electron/electron). It operates on a folder of files stored locally on your computer. Obsidian calls this folder the Vault. Information is stored in documents and attachments such as image files. Documents are formatted using markdown, a simple, platform-independent markup language.

Obsidian provides features that facilitate the navigation and editing of these documents: search; linking to documents, to images, to chapters, and to paragraphs (a.k.a. blocks); autocomplete for links; management of backlinks; automatic updates when you move or rename files; publishing; synchronization; etc.

As I was preparing for this test, my expectation was that Obsidian should be very scalable, because it stores documents and attachments in the computer's file system. All Obsidian needs to maintain is an index database to support full-text search and the maintenance and navigation of links.

# My test database

I loaded 2459 full-text books into my test Vault and generated 565 full-book literature notes containing 161 820 block references. The total size of my test Vault is 727 MB, or 8 074 688 non-empty paragraphs (a.k.a. blocks). I did not load any images. The largest document I loaded is 14.1 MB (2 662 616 words).

I used plain-text books from the [Gutenberg Library](https://www.gutenberg.org/). For these, I auto-generated 18 645 chapter headings (e.g. `# Chapter 1`); a sketch of this preprocessing step is shown at the end of this section. Besides the files from the Gutenberg Library, I also loaded Joschua's [Bible Study in Obsidian Kit](https://forum.obsidian.md/t/bible-study-in-obsidian-kit-including-the-bible-in-markdown/12503?u=selfire).

Some may argue that they read more than ~2500 books and create detailed notes for more than ~550 of those. Personally, I'd be thrilled to have detailed personal notes on 50 books! I feel this volume of written information is a realistic target for a Personal Knowledge Management system.

I did not include images and multimedia files in my test because, even though these take up significant space on the local filesystem, I believe that Obsidian only indexes their filenames. I don't expect images and other attachments to contribute significantly to overall Obsidian performance.
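To give an idea of what this preprocessing looks like, here is a minimal Node.js sketch. It is an illustration rather than the exact script I used; the `CHAPTER` pattern is an assumption about how Gutenberg plain-text files mark chapters and usually needs adjusting per book:

```js
// Sketch: turn Gutenberg-style chapter lines (e.g. "CHAPTER I.") into markdown headings.
// Assumption: each chapter marker sits alone on its own line; tweak the regex per book.
const fs = require("fs");

function addChapterHeadings(path) {
  const text = fs.readFileSync(path, "utf8");
  const lines = text.split(/\r?\n/);
  let count = 0;
  const out = lines.map((line) => {
    if (/^\s*CHAPTER\s+[IVXLC\d]+\.?\s*$/i.test(line)) {
      count++;
      return "# " + line.trim(); // e.g. "# CHAPTER I."
    }
    return line;
  });
  fs.writeFileSync(path, out.join("\r\n")); // CRLF, matching the line endings the Vault scripts below assume
  return count;
}

console.log("headings added:", addChapterHeadings("path/to/book.md"));
```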
# Findings

Obsidian performed well. The editing experience, such as highlighting paragraphs, worked smoothly even with the largest document. Search was slow at first, but maybe Obsidian was still indexing in the background; by the time I wanted to capture the slow search performance on screen, it was already performing well.

## Referencing chapters worked well

Even with this volume of chapters, searching for chapters felt very smooth. Here's an example of looking up a chapter/sub-chapter by typing `[[##` followed by part of the chapter's heading text:

## Referencing blocks had some limitations

The same discovery feature, however, stopped working for block references. In my main Vault, I can reference paragraphs by typing `[[^^` followed by some text from the paragraph. In my performance test Vault, this did not work.

If, however, I know the title of the document from which I want to reference a specific paragraph, then typing `Document-Title#^` followed by some text from the paragraph worked smoothly.

If I am unsure which document contains the paragraph I want, I can use search to find it first. In the example below, I open Obsidian search with a hotkey and type the beginning of the paragraph I am searching for. Once found, I drag the link of the document containing the paragraph into my notes, and use the `[[Document-Title#^` format to search for the specific paragraph (again) to create a block reference to it. Admittedly, this is an extra step; Roam's solution of CTRL+drag to create a block reference is much nicer. But the workaround works, and the use case is not frequent enough to make this a deal-breaker. (A console-based sketch for locating the note that contains a paragraph is included with the scripts at the end of this post.)

## Editing a 14 MB document felt fluid

The title of the document is "gn06v10", which is The Entire PG Works of George Meredith. Here's what I do in the demo below:

1. I open the search to look for this document. Note that search would probably have performed faster if, instead of doing a full-text search for "gn06v10", I had searched for `file:gn06v10`. Search locates the document in a few seconds.
2. I open the document and add some random text highlights. Highlights take 1-2 seconds to take effect.
3. Finally, I scroll to the middle of the document (scrolling is quick) and add a line of text. Here, Obsidian freezes for more than a few seconds, then adds the sentence I've typed from the keyboard buffer.

Remember, this is a 14 MB text file. I am not aware of any other text editor that would perform significantly better managing a document of this size.

## Hitting the wrong button occasionally resulted in long waits... sometimes followed by a black screen

Truth be told, these incidents happened right after I loaded the large volume of documents into Obsidian. When I wanted to reproduce the issue to create a screen capture, I wasn't able to. I consider this an issue that occurred while Obsidian was still indexing in the background. That said, it is not elegant that starting a search resulted in Obsidian freezing, eventually halting, and leaving a blank black screen behind. After terminating and restarting Obsidian, the same search worked with no issues.

# Conclusion

Obsidian passed the first performance test well. Apart from the issue with block discovery and the occasional black screen, it handled the large volume of text well.

Loading and reading books in Obsidian, however, is not yet solved very well. In parallel with this performance test, I have started to read my first ebook fully within Obsidian. Unfortunately, there don't seem to be good tools available to convert ePub files into Markdown (separate files per chapter, a proper table of contents with chapter links, navigation links between chapters, and images copied to a separate folder with working links in the document). Even once you have cleared the hurdle of format conversion, reading and highlighting are not very comfortable, especially on mobile phones.
Obsidian not remembering where I left off reading is just one of the hassles that make reading books in Obsidian challenging.

Now that I know Obsidian can handle this volume of information, so that in theory I could have my library integrated with my notes, I will focus on creating the scripts, and maybe a plugin, to allow for a more friction-free reading and note-taking experience.

# Comparing to Roam

While the following articles won't offer a direct comparison of a similar scenario, I spent many weeks trying to load books into Roam. Here are my posts dealing with the topic:

- [[Importing the Bible to Roam - Final Solution]]
- [[Study Bible or ePub Books in Roam - My rollercoaster ride with Roam JSON]]
- [[My Adventures with Roam.JSON]]
- [[Read Books in Roam - A Detailed How To Guide for Importing and Using ePub in Roam]]

# Scripts used

For reference, I am sharing the two scripts I used in the performance testing process.

## Generating literature notes with block references

```js
// For every markdown file (except index.md): add block IDs to randomly sampled
// paragraphs and create a companion " - litnote" file that embeds those blocks.
files = app.vault.getMarkdownFiles();
stepsize = 15; // upper bound (exclusive) for the random step between sampled lines
refs = 0;
blocks = 0;
for (f of files) {
  notes = "";
  if (f.path != "index.md") {
    text = await app.vault.read(f);
    lines = text.split("\r\n"); // assumes CRLF line endings
    i = Math.floor(Math.random() * stepsize);
    while (i < lines.length) {
      if (lines[i].length > 10) {
        refs++;
        // Random base36 block ID, appended to the source paragraph
        blockId = "^" + Math.floor(Math.random() * Date.now()).toString(36);
        lines[i] = lines[i] + " " + blockId;
        // Literature note entry: an embedded block reference plus a dummy comment paragraph
        notes += "> ![[" + f.basename + "#" + blockId + "]]\n" + i + ". Morbi lobortis augue egestas arcu porttitor, in cursus felis posuere. Nulla finibus vestibulum arcu, id molestie urna fringilla at. Fusce sit amet velit a est tincidunt iaculis sit amet vehicula dui. Aliquam elementum ex eget accumsan pulvinar. In in sollicitudin ex. Nam ut est condimentum, efficitur augue in, cursus augue. Sed faucibus mi non tempor egestas. Proin et nibh dignissim sapien feugiat porta a quis enim. Donec id leo ultrices, molestie dui ut, elementum ligula. Maecenas id suscipit tellus, et luctus libero.\n\n";
        blocks += 2; // the litnote gains two blocks: the embed and the dummy paragraph
      }
      i += Math.floor(Math.random() * stepsize);
    }
    await app.vault.create(f.path.split(".md")[0] + " - litnote.md", notes); // litnote next to the source file
    await app.vault.modify(f, lines.join("\r\n")); // write back the source file with block IDs
    blocks += lines.filter((l) => l != "").length; // count the source file's non-empty lines as blocks
  }
}
console.log("Number of files", files.length * 2 - 1);
console.log("Number of blocks", blocks);
console.log("Number of block references", refs);
```

## Basic Vault statistics

```js
// Count non-empty paragraphs, headings, and block references across the Vault.
files = app.vault.getMarkdownFiles();
paras = 0;
headings = 0;
block_refs = 0;
for (f of files) {
  text = await app.vault.read(f);
  lines = text.split("\r\n"); // assumes CRLF line endings
  paras += lines.filter((l) => l != "").length;
  headings += lines.filter((l) => l.match(/^#+\s/)).length;
  block_refs += lines.filter((l) => l.match(/ \^[^\s]+$/)).length;
}
console.log("number of non-empty paragraphs", paras);
console.log("number of block-refs", block_refs);
console.log("number of headings", headings);
```
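Both scripts use Obsidian's global `app` object and top-level `await`; one way to run them is to paste them into the developer console (Ctrl+Shift+I, or Cmd+Option+I on macOS) while the Vault is open.

As mentioned in the block-referencing section above, the same console approach can also stand in for the missing `[[^^` lookup. The following is only a rough sketch (the search snippet is an arbitrary example, and it rereads every file on each run); it merely locates the note, after which the `[[Note#^` link still has to be created by hand:

```js
// Sketch: print the notes that contain a given paragraph fragment,
// so a [[Note#^...]] block reference can then be created manually.
snippet = "It was the best of times"; // example text; replace with the paragraph fragment you remember
for (f of app.vault.getMarkdownFiles()) {
  text = await app.vault.read(f);
  if (text.includes(snippet)) {
    console.log("Found in:", f.path);
  }
}
```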