A journey towards a proper string type
Problem
I've been working to replace the atom types of the swc project. It uses string-cache
to intern strings, but it uses a global Mutex
internally, and as a result, it made swc slower. So there were an enormous amount of requests asking if it could be replaced with something different.
But well, I had no luck. Of course, I tried lots of existing libraries. Most of them were good, but all of them made swc slower than before. Even if they don't have mutex at all.
Initial requirements
swc must be faster with the new string library
String creation should be faster (No mutex)
Drop of AST should be faster (No mutex)
That's all! I just wanted faster swc. And... because everyone said Mutex
in string-cache
is the cause of slowness; I just thought it was true. But... but... swc was using js_word!()
in almost everywhere, so the PR for removing direct usage of js_word!()
was extremely time-consuming.
Anyway, I managed to do it. So I could replace string libraries easily if I wanted.
Tried solutions
Of course, I tried various libraries. At the moment of the js_word removal task being done, it's simply matter of performance; swc_atoms::Atom
is now a newtype struct with a private field, so the library I use is simply an implementation detail.
I tried
tendril
smartstring
ustr
compact_str
I always had a good reasoning behind a choice. I'll describe them in the order.
tendril
tendril
provides a strange, but useful type. It's similar to String
, but it's internally Clone-on-Write
, and it's reference counted. The noticeable thing is that the tendril
instance for the whole file can be used to store text data for subtendrils, but with a good encapsulation.
If I use it, I'll be able to avoid many string copies. But I have to modify 4 parsers. The swc project now has parsers for ECMAScript, CSS, HTML, and XML so it would be really time-consuming.
Sadly, I couldn't profile tendril
version correctly, because I had to patch a lot more than when I chose another library.
smartstring
I was already using smartstring
in some parts of swc. So, I thought it might be faster. But well, it was not the case. The operations for hash maps became much slower, and it resulted in 10% ~ 25% performance regression.
I'm not saying it's bad. It's simply not a fit for a project that relies heavily on hashmap operations.
string-cache
didn't have this problem, because it interns all strings. After interning, a string is simply an u64
, with a small optimization for Option<T>
.
I realized that I have to think in the way of DoD (data-oriented design). We are not interested in modifying strings, at least in place. Yeah, strings can be stored at the stack, but well, for what? We are more interested in storing it in a hashmap and using it as a key.
ustr
So I tried ustr
. This is quite similar to string-cache
, and it's faster! But it has a critical problem. It does not free strings.
I couldn't use it. But anyway, I now know what matters, and what doesn't.
compact_str
I knew I had to create a new string type, but I wanted to check this library because the documentation of it made me surprised. And the source code of it is quite beautiful. You may not agree with this, but it's fine. ๐
I did what I wanted to do, and it made swc slower compared to string-cache
. But while profiling, I found that I made a huge mistake while analyzing profiling results in the previous steps.
My mistake
What matters is Eq
being fast, not Eq + Hash
. Once Hash
produces an identical result, eq
(==
) is called, and it's very slow. Eq
of two strings becomes a call to memcmp
. Hash
is also important as it's also invoked, but hashing is much faster than I thought. Additionally, if you manage to implement very fast Eq
, it's very likely that Hash
is also fast.
New requirements
So... time to create a new library. Let's list the requirements first. I'll describe what each item means after listing items.
Eq
must be very fast. It should not callmemcmp
at least.Hash
should also be fast.Creation and drop of the type should be fast, and it should not involve a
Mutex
.The size of the type should be small. It shows up in AST quite frequently.
- It should store small strings inline without going to the heap.
Restrictions are super tricky, so I thought
Is this possible in the first place?
But yeah, it's possible.
Eq
must be very fast. It should not callmemcmp
at least.
It means string should be intern
-ed. After interning, those are simply numbers so ==
becomes very cheap. But typically interning is global and as a result involves a Mutex
. This makes design harder, but I concluded it's possible without a Mutex
if I don't need the global interning.
Hash
should also be fast.
It's a bit complex, but we can store the hash value next to the string data, just like string-cache
.
Creation and drop of the type should be fast, and it should not involve a
Mutex
.
Hmm................
After some thinking, I found that it's also possible if I don't need global interning.
The size of the type should be small. It shows up in AST quite frequently.
This is also a hard requirement. But I'm good at reading and adjusting the code from others, so I decided to look at the source code of string-cache
.
It should store small strings inline without going to the heap
The authors of string-cache
already solved this issue, so I can take a look at their code.
Roadmap
After a long time of thinking, I now have some concrete ideas. I'm writing design documents, and experimenting with the Rust library. It's very unusual that I write documentation. Also, putting my name in my work feels quite strange, but I wanted to do so because this can be useful for optimization in general.
I want to use it for swc asap, but it involves lots of unsafe
, so I have to be cautious. My plan is
I'll publish the design document using this blog. (kdy1.dev)
I'll open-source it once it's ready.
Currently only features for DYNAMIC_SET
of string-cache
are implemented. No inline strings, no known (static) strings. I want to have all of them. I'll post an update asap.