I’ve been working on parsing 1 million records into a local graph. I noticed that it was taking a while, so I put together a test app to find where the problem could be. The test creates int.MaxValue vertices in the HashRepository.
Running the test verified what I thought was happening. Somewhere the code is causing \(O(n^2)\) growth.
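One rough way to put a number on "this looks quadratic" is to fit a straight line to the timings on a log-log scale: a slope near 2 suggests \(O(n^2)\), a slope near 1 suggests \(O(n)\). The sketch below is only an illustration in Python, not the test app itself; `growth_exponent` is a hypothetical helper and the timings are synthetic, not measurements from the HashRepository.

```python
import math

def growth_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    Hypothetical helper: a slope near 1 suggests linear growth,
    a slope near 2 suggests quadratic growth.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic timings (assumed, not measured) for doubling input sizes.
sizes = [1_000, 2_000, 4_000, 8_000]
quadratic_times = [1e-6 * n * n for n in sizes]
linear_times = [1e-6 * n for n in sizes]

print(round(growth_exponent(sizes, quadratic_times), 3))
print(round(growth_exponent(sizes, linear_times), 3))
```

With real measurements the slope will be noisier, so it is worth timing several sizes and repeating each run a few times before trusting the number.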
I ran a profiler on the code and found that 60% of the time was being spent on JSON serialization.
So I swapped in a different JSON serializer and re-ran the test.
The first part of the graph could be \(O(n^2)\) and the second part looks linear, \(O(n)\). The profiler showed an improvement: now only 41% of the time is spent serializing. But it’s still not what I’m looking for. I want at least linear time.
Since it looks like it may be the serializer, I changed the serializer to just return a constant value each time.
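The idea of stubbing out the serializer can be sketched like this. It is a toy Python illustration, not the real HashRepository: `ToyRepository`, `JsonSerializer`, `ConstantSerializer`, and `fill` are all hypothetical names, and the only point is that the repository takes its serializer as a dependency so it can be swapped for one that does no work.

```python
import json

class JsonSerializer:
    """Stand-in for a real JSON serializer."""
    def serialize(self, value):
        return json.dumps(value)

class ConstantSerializer:
    """Stub that ignores its input, removing serialization cost from the test."""
    def serialize(self, value):
        return "{}"

class ToyRepository:
    """Hypothetical repository that stores serialized vertex properties by id."""
    def __init__(self, serializer):
        self.serializer = serializer
        self.vertices = {}

    def add_vertex(self, vertex_id, properties):
        self.vertices[vertex_id] = self.serializer.serialize(properties)

def fill(repo, count):
    """Add `count` vertices, each with a single property."""
    for i in range(count):
        repo.add_vertex(i, {"name": f"v{i}"})
    return repo

real = fill(ToyRepository(JsonSerializer()), 3)
stubbed = fill(ToyRepository(ConstantSerializer()), 3)
```

Timing `fill` with each serializer over the same counts separates the cost of serialization from the cost of everything else in the insert path.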
Voilà! There is the linear time I wanted. Notice the hops on the line where something took a little longer than usual. I would guess that is the hash table resizing. Let’s see where the profiler says the code is spending its time. Now all the time is spent calculating the KeyHash for the property.
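The guess about the hops can be illustrated with a toy hash table that doubles its capacity when it gets too full. This is a Python sketch under assumed parameters (initial capacity 8, load factor 0.75), not the hash table the repository actually uses; `ToyHashTable` and `resize_points` are hypothetical names. Each resize rehashes every stored entry, so the insert that triggers it takes noticeably longer than its neighbours while the overall trend stays linear.

```python
class ToyHashTable:
    """Chained hash table that doubles when the load factor would exceed 0.75."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.buckets = [[] for _ in range(capacity)]
        self.count = 0
        self.resize_points = []  # entry counts at which a resize happened

    def insert(self, key, value):
        if self.count + 1 > self.capacity * 0.75:
            self._grow()
        bucket = self.buckets[hash(key) % self.capacity]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1

    def _grow(self):
        # Rehashing touches every stored entry: this is the occasional slow insert.
        self.resize_points.append(self.count)
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        for bucket in old_buckets:
            for key, value in bucket:
                self.buckets[hash(key) % self.capacity].append((key, value))

table = ToyHashTable()
for i in range(100):
    table.insert(i, "constant")

print(table.resize_points)
```

Because the capacity doubles each time, the resizes get further apart as the table grows, which would match hops that become rarer along the line.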
Here is how they compare to each other:
I still don’t feel comfortable with the serializer as the culprit for the quadratic growth. Instead of creating a lot of vertices, I will just create one vertex and add a lot of properties using the default serializer. This had much better growth.
The profiler tells me most of the time is in the serializer. Hmmm. I have good growth and am using a serializer. Perhaps the serializer isn’t the issue.
Let’s see the Newtonsoft serializer.
Same growth, and most of the time was spent in the serializer. Hmmm.
Let’s see with no serializer.
Together they look like this:
I’m not really seeing anything.