A journey towards a proper string type

I decided to create my own string caching library named kdy_str. This article is about my reasoning.

ยท

5 min read

Problem

I've been working to replace the atom types of the swc project. It uses string-cache to intern strings, but it uses a global Mutex internally, and as a result, it made swc slower. So there were an enormous amount of requests asking if it could be replaced with something different.

But well, I had no luck. Of course, I tried lots of existing libraries. Most of them were good, but all of them made swc slower than before. Even if they don't have mutex at all.

Initial requirements

  • swc must be faster with the new string library

  • String creation should be faster (No mutex)

  • Drop of AST should be faster (No mutex)

That's all! I just wanted faster swc. And... because everyone said Mutex in string-cache is the cause of slowness; I just thought it was true. But... but... swc was using js_word!() in almost everywhere, so the PR for removing direct usage of js_word!() was extremely time-consuming.

Anyway, I managed to do it. So I could replace string libraries easily if I wanted.

Tried solutions

Of course, I tried various libraries. At the moment of the js_word removal task being done, it's simply matter of performance; swc_atoms::Atom is now a newtype struct with a private field, so the library I use is simply an implementation detail.

I tried

  • tendril

  • smartstring

  • ustr

  • compact_str

I always had a good reasoning behind a choice. I'll describe them in the order.

tendril

tendril provides a strange, but useful type. It's similar to String, but it's internally Clone-on-Write , and it's reference counted. The noticeable thing is that the tendril instance for the whole file can be used to store text data for subtendrils, but with a good encapsulation.

If I use it, I'll be able to avoid many string copies. But I have to modify 4 parsers. The swc project now has parsers for ECMAScript, CSS, HTML, and XML so it would be really time-consuming.

Sadly, I couldn't profile tendril version correctly, because I had to patch a lot more than when I chose another library.

smartstring

I was already using smartstring in some parts of swc. So, I thought it might be faster. But well, it was not the case. The operations for hash maps became much slower, and it resulted in 10% ~ 25% performance regression.

I'm not saying it's bad. It's simply not a fit for a project that relies heavily on hashmap operations.

string-cache didn't have this problem, because it interns all strings. After interning, a string is simply an u64, with a small optimization for Option<T>.

I realized that I have to think in the way of DoD (data-oriented design). We are not interested in modifying strings, at least in place. Yeah, strings can be stored at the stack, but well, for what? We are more interested in storing it in a hashmap and using it as a key.

ustr

So I tried ustr. This is quite similar to string-cache, and it's faster! But it has a critical problem. It does not free strings.

Okay Bye GIFs | Tenor

I couldn't use it. But anyway, I now know what matters, and what doesn't.

compact_str

I knew I had to create a new string type, but I wanted to check this library because the documentation of it made me surprised. And the source code of it is quite beautiful. You may not agree with this, but it's fine. ๐Ÿ˜Š

I did what I wanted to do, and it made swc slower compared to string-cache. But while profiling, I found that I made a huge mistake while analyzing profiling results in the previous steps.

My mistake

What matters is Eq being fast, not Eq + Hash. Once Hash produces an identical result, eq (==) is called, and it's very slow. Eq of two strings becomes a call to memcmp. Hash is also important as it's also invoked, but hashing is much faster than I thought. Additionally, if you manage to implement very fast Eq, it's very likely that Hash is also fast.

New requirements

So... time to create a new library. Let's list the requirements first. I'll describe what each item means after listing items.

  • Eq must be very fast. It should not call memcmp at least.

  • Hash should also be fast.

  • Creation and drop of the type should be fast, and it should not involve a Mutex.

  • The size of the type should be small. It shows up in AST quite frequently.

  • It should store small strings inline without going to the heap.

Restrictions are super tricky, so I thought

Is this possible in the first place?

But yeah, it's possible.


Eq must be very fast. It should not call memcmp at least.

It means string should be intern-ed. After interning, those are simply numbers so == becomes very cheap. But typically interning is global and as a result involves a Mutex. This makes design harder, but I concluded it's possible without a Mutex if I don't need the global interning.

Hash should also be fast.

It's a bit complex, but we can store the hash value next to the string data, just like string-cache.

Creation and drop of the type should be fast, and it should not involve a Mutex.

Hmm................

After some thinking, I found that it's also possible if I don't need global interning.

The size of the type should be small. It shows up in AST quite frequently.

This is also a hard requirement. But I'm good at reading and adjusting the code from others, so I decided to look at the source code of string-cache.

It should store small strings inline without going to the heap

The authors of string-cache already solved this issue, so I can take a look at their code.


Roadmap

After a long time of thinking, I now have some concrete ideas. I'm writing design documents, and experimenting with the Rust library. It's very unusual that I write documentation. Also, putting my name in my work feels quite strange, but I wanted to do so because this can be useful for optimization in general.

I want to use it for swc asap, but it involves lots of unsafe, so I have to be cautious. My plan is

  1. I'll publish the design document using this blog. (kdy1.dev)

  2. I'll open-source it once it's ready.

Currently only features for DYNAMIC_SET of string-cache are implemented. No inline strings, no known (static) strings. I want to have all of them. I'll post an update asap.

ย