Having solved The Big Problem®, I was afforded no respite before another problem needed addressing. Along the way I noticed that the testing code I had added to draw debug lines for every bullet that had entered a bounding sphere was showing a different set of bullets than the bullet rendering script was drawing. When I sucked up the performance impact and had the rendering script draw debug lines for every bullet in existence, I found something worrisome: all of the visible bullets were in fact behaving as they should and appearing in their correct locations, but a fair chunk of the bullets that had been fired were invisible!
I temporarily disabled stretching for the bullet sprites and made them really big, so it was obvious which bullets were being displayed properly and which only via debug lines (tiny blue dots).
At first I figured that some conditional statement or other was hiding bullets, so one by one I tried disabling all such checks. I even made the system draw red debug lines for inactive bullets. I had no luck.
Then I figured maybe something was up with the collision system. Fortunately that already has a simple Off switch, but it turned out that wasn't it either.
It did turn out, though I didn't bother counting them myself and don't expect anyone else to, that exactly 2/3 of the bullets were invisible.
One may notice that 3 is the exact number of vertices in a triangle! HALF LIFE 3 CONFIRMED
Just kidding. The real reason this is relevant is that I was calling the sparsely documented function Graphics.DrawProcedural(). Because my geometry shader outputs triangles, I figured that when it asked for a MeshTopology argument, I should specify triangles. Nope! That argument describes how the input vertices are grouped into primitives, not what the geometry shader emits, so every three consecutive vertices were being treated as a single triangle. The geometry shader only turned the first vertex of each group into a bullet, and the other two were effectively skipped. Odd. Changing the MeshTopology argument to specify points (individual vertices) fixed it.
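To make the 2/3 figure concrete, here's a toy model of primitive assembly for a non-indexed draw. It's only an illustration, not Unity's or the driver's actual code, and it assumes (hypothetically) that the geometry shader builds one bullet sprite from the first vertex of each primitive it receives. The function names `assemble` and `drawn_bullets` are made up for this sketch.

```python
# Toy model of primitive assembly for a non-indexed procedural draw.
# Illustration only; not Unity's or the GPU driver's real implementation.


def assemble(vertex_count: int, topology: str) -> list[tuple[int, ...]]:
    """Group vertex indices 0..vertex_count-1 into primitives."""
    if topology == "points":
        # Each vertex is its own primitive, so the geometry shader sees all of them.
        return [(i,) for i in range(vertex_count)]
    if topology == "triangles":
        # Every three consecutive vertices form one triangle; a leftover
        # partial triangle at the end is discarded.
        return [(i, i + 1, i + 2) for i in range(0, vertex_count - 2, 3)]
    raise ValueError(f"unsupported topology: {topology!r}")


def drawn_bullets(bullet_count: int, topology: str) -> int:
    """Count bullets that get a sprite, assuming (hypothetically) that the
    geometry shader builds one sprite from the first vertex of each primitive."""
    return len({prim[0] for prim in assemble(bullet_count, topology)})


if __name__ == "__main__":
    for topo in ("triangles", "points"):
        print(topo, drawn_bullets(9, topo), "of 9 bullets drawn")
```

With nine bullets, the "triangles" grouping yields three primitives and therefore three sprites, while "points" yields all nine, which matches the two-out-of-three loss described above.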
So yeah, the moral of the story is: if you're drawing a point cloud with Graphics.DrawProcedural, use MeshTopology.Points. Hooray! Now my system draws three times as many bullets with one simple change in code and no significant change in performance! Here's how the system looked after all the recent improvements:
The captions in the image should be fairly self-explanatory. I can stretch the bullets based on their absolute world-space velocities or on their velocities relative to the camera or some other object, and I can spawn particle effects at their precise points of impact. When the bullets ricochet, they finally do so based on proper reflection vectors, and I can specify how much of their original velocity to maintain and how much to randomly scatter. The impact positions are still very slightly imperfect, but I feel I've refined them as much as I need for the time being.
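For readers wondering what "relative" stretching and a tunable ricochet might look like, here's a minimal CPU-side sketch in plain Python. It is not the system's actual shader code: the functions `stretch_vector` and `ricochet` and the `retain`/`scatter` parameters are hypothetical names chosen for illustration. The reflection uses the standard mirror formula r = v - 2(v·n)n, and the scatter is a simple linear blend toward a random direction above the surface.

```python
import math
import random

Vec3 = tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def unit(a: Vec3, fallback: Vec3) -> Vec3:
    """Normalize a, or return fallback if a is (nearly) zero-length."""
    length = math.sqrt(dot(a, a))
    return scale(a, 1.0 / length) if length > 1e-12 else fallback


def stretch_vector(bullet_vel: Vec3, reference_vel: Vec3 = (0.0, 0.0, 0.0)) -> Vec3:
    """Vector to stretch a sprite along. A zero reference gives absolute
    world-space stretching; passing the camera's (or another object's)
    velocity gives stretching relative to that object."""
    return sub(bullet_vel, reference_vel)


def ricochet(vel: Vec3, normal: Vec3, retain: float, scatter: float,
             rng: random.Random) -> Vec3:
    """Bounce vel off a surface with the given normal.

    retain:  fraction of the reflected speed kept (0..1), hypothetical knob.
    scatter: 0 = perfect mirror reflection, 1 = fully random direction in
             the hemisphere above the surface (also a hypothetical knob).
    """
    n = unit(normal, (0.0, 1.0, 0.0))
    # Mirror reflection: r = v - 2 (v . n) n
    r = sub(vel, scale(n, 2.0 * dot(vel, n)))
    speed = math.sqrt(dot(r, r)) * retain
    # A normalized Gaussian sample is uniform on the sphere; flip it into
    # the hemisphere on the outside of the surface.
    rand = unit((rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)), n)
    if dot(rand, n) < 0.0:
        rand = scale(rand, -1.0)
    direction = unit(add(scale(unit(r, n), 1.0 - scatter), scale(rand, scatter)), n)
    return scale(direction, speed)
```

With `scatter=0` and `retain=1`, a bullet travelling along (1, -1, 0) that hits a floor with normal (0, 1, 0) comes back out along (1, 1, 0) at the same speed. Raising `scatter` toward 1 trades that clean mirror bounce for a random direction, while `retain` only ever affects speed.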
Here's another comparison, this time showing all of the iterations thus far:
Observe how, as the camera moves to maintain its position relative to the target ship, the purple bullets stretched according to their absolute world-space velocities appear to stretch in the wrong direction, whereas the cyan bullets appear much more correct. It's not shown above, but I can also have the cyan bullets maintain some portion of their forward velocities and penetrate into the target rather than bounce. This could be useful on some kind of powerful railgun that can punch through armor plates to damage modules inside a ship.
Also note that I finally fully implemented the option to have bullets become inactive upon striking a module, as I expect will be the case for most bullets in the final game. And look! I even got bullet damage working:
With all these kinks worked out, my next task was optimization. When I originally cooked it up, even with all its flaws, the GPU Bullet System could handle over 100,000 bullets on screen at once before any noticeable performance drop. At this point? Not so much.
The bulk of the problem is that while the GPU can easily draw tons of identical sprites, communication between the GPU and CPU is relatively slow, even on an integrated graphics chip. Although the GPU and CPU on my machine are intimately connected, literally sharing a casing, in this architecture (if I'm not mistaken) the GPU stores its data in the same place the CPU does: the RAM. Thus, every time a buffer needs to travel between the GPU's memory region and the CPU's, I lose several cycles of processing waiting for the data to be fetched from RAM. Since each type of bullet needs its own set of compute buffers, and at least one of these has to travel one way or the other up to four times per frame, having many different kinds of bullets in play at once (as I currently do) leads to a significantly lower framerate.
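The scaling argument can be written down as a back-of-the-envelope model. This is a sketch under loudly stated assumptions, not a measurement: every bullet type owns its own buffers, each type forces up to four blocking transfers per frame (the figure above), and each transfer costs a fixed `latency_ms` regardless of its size. The class name and the latency value in the tests are placeholders to be replaced with profiler numbers.

```python
from dataclasses import dataclass


@dataclass
class ReadbackCostModel:
    """Back-of-the-envelope estimate of per-frame CPU stall time.

    Assumptions (not measurements): every bullet type owns its own buffers,
    each type forces up to `transfers_per_type` blocking CPU<->GPU transfers
    per frame, and every transfer costs a fixed `latency_ms` regardless of size.
    """
    latency_ms: float          # placeholder: measure this in a profiler
    transfers_per_type: int = 4

    def stall_ms(self, bullet_types: int) -> float:
        # Cost grows with the number of bullet *types*, not the number of bullets.
        return bullet_types * self.transfers_per_type * self.latency_ms

    def budget_fraction(self, bullet_types: int, target_fps: float = 60.0) -> float:
        # Share of one frame's time budget spent waiting on transfers.
        frame_ms = 1000.0 / target_fps
        return self.stall_ms(bullet_types) / frame_ms
```

Whatever the real per-transfer latency turns out to be, the model's point is that stall time is linear in the number of bullet types, which is part of what makes consolidating everything into fewer buffers attractive.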
Currently I'm investigating a few remedies to this.
By optimizing the code to reduce unnecessary operations and eliminate garbage generation wherever possible, I've sped it up by a noteworthy factor, but I still have a delay of several milliseconds every frame while the CPU waits on data from the GPU. I've experimented with making this subsystem asynchronous, but then the data can arrive in an entirely different frame from the one in which I requested it. That leaves me with discrepancies between where the bullets are in the buffer I've received and where they are in the main buffer, which in turn leads to grossly inaccurate results from physics queries and, once again, to my frustration, bullets floating through the target ship without touching it.
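Here's a toy model of why a late readback breaks collision queries. The `AsyncReadback` class below is hypothetical and is not Unity's API; it simply delivers a snapshot requested on frame f on frame f + `latency_frames`. For a bullet flying in a straight line, the reported position trails the true one by speed × dt × latency, so with an assumed 600 units/s at 60 fps and a two-frame delay, the physics query looks 20 units behind the bullet.

```python
from collections import deque


class AsyncReadback:
    """Toy model of an asynchronous GPU readback: a request issued on frame f
    is delivered on frame f + latency_frames. Not a real Unity API."""

    def __init__(self, latency_frames: int):
        self.latency_frames = latency_frames
        self.pending = deque()  # entries: (ready_frame, requested_frame, snapshot)

    def request(self, frame: int, snapshot: list[float]) -> None:
        self.pending.append((frame + self.latency_frames, frame, list(snapshot)))

    def poll(self, frame: int) -> list[tuple[int, list[float]]]:
        """Return every snapshot whose delivery frame has arrived."""
        ready = []
        while self.pending and self.pending[0][0] <= frame:
            _, requested_frame, snapshot = self.pending.popleft()
            ready.append((requested_frame, snapshot))
        return ready


def position_errors(speed: float, dt: float, latency_frames: int, frames: int) -> list[float]:
    """Gap between a bullet's true x position and the x reported by the
    readback, for one bullet moving at constant speed along x."""
    readback = AsyncReadback(latency_frames)
    errors = []
    for frame in range(frames):
        true_x = speed * dt * frame
        readback.request(frame, [true_x])
        for _, snapshot in readback.poll(frame):
            errors.append(true_x - snapshot[0])
    return errors
```

A gap of that size can easily exceed the thickness of a hull plate, which is consistent with bullets seeming to pass straight through the target.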
Next I think I'll investigate having only one bullet manager script for the whole game, with each bullet in the buffer carrying extra data about what sort of bullet it is. Depending on how far I go with this, it could get very, very complicated.
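One way the single-manager idea could be laid out is a single bullet list where each entry stores a small `type_id` that indexes into a shared table of per-type parameters. The sketch below is a CPU-side illustration in Python with hypothetical field names (`damage`, `deactivate_on_hit`) and hypothetical bullet types. On the GPU, the equivalent would presumably be one structured buffer of bullets plus a small buffer of type parameters, so the number of buffers and transfers would no longer grow with the number of bullet types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BulletType:
    """Per-type parameters shared by every bullet of that type (hypothetical fields)."""
    name: str
    damage: float
    deactivate_on_hit: bool


@dataclass
class Bullet:
    """One entry in the single shared bullet buffer."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float
    type_id: int          # index into BulletManager.types
    active: bool = True


@dataclass
class BulletManager:
    types: list[BulletType]
    bullets: list[Bullet] = field(default_factory=list)

    def spawn(self, type_id: int, pos: tuple, vel: tuple) -> int:
        """Append a bullet and return its index in the shared buffer."""
        self.bullets.append(Bullet(*pos, *vel, type_id=type_id))
        return len(self.bullets) - 1

    def step(self, dt: float) -> None:
        """Advance every active bullet, whatever its type, in one pass."""
        for b in self.bullets:
            if b.active:
                b.x += b.vx * dt
                b.y += b.vy * dt
                b.z += b.vz * dt

    def on_hit(self, index: int) -> float:
        """Apply a hit and return the damage dealt, looked up via type_id."""
        b = self.bullets[index]
        if not b.active:
            return 0.0
        kind = self.types[b.type_id]
        if kind.deactivate_on_hit:
            b.active = False
        return kind.damage
```

The trade-off is that every per-type behavior turns into a table lookup or a branch inside one shared update, which is where the complexity mentioned above would likely pile up.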
There's also still the option of keeping it as-is. I'm open to suggestions on this.
Finally, I leave you with a picture of what happens when I dispense with concerns about framerate entirely and make the system display ONE MILLION BULLETS!
There's no starfield background here - every single little white dot in the image is a bullet that the system is processing and rendering. The framerate is less than stellar in this situation but still surprisingly high, and I surmise that a more powerful GPU than mine would handle it easily.