Yep, all this sounds right. I mentioned Bridge just because it also includes a crossfader. So does Pendulum, come to think of it, but yeah, these are all single-channel faders.
I don’t think either the Visual Cortex or the Marble Index is really designed for this. In particular, the problem if you want the Visual Cortex output specifically is that you can’t really access the output signal; it’s intended as a final compositor that feeds directly to your display or capture device. If you had a signal generated elsewhere in the video synthesizer that was going to VC channel A, no problem: leave channel B unpatched, then trigger a wipe or patch to the compositor CV input. This does seem pretty achievable with some software or an outboard video mixer that accepts a MIDI trigger or something, though I know you said you don’t want to involve a PC.
If you had some more dramatic processing you could do with your camera feed (Staircase and Castle modules are great for that kind of thing), I think using VC to “gate” the output would probably be pretty satisfactory for a workflow like this.