The initial power-up checked the basics: the power rails were operational across the board with no obvious faults (i.e. nothing drawing a lot of current) and the reset circuits were operating correctly. The next step is to exercise each of the components on the board, to gain confidence that each of them is working as expected.

This means installing just the ESP32 and using a minimal piece of code to set all the output pins to safe initial states. Before initialisation, a number of fairly weak pull-up resistors on the board ensure that, while the board is not being driven by the ESP32, other devices are disabled / not selected, preventing bad drive combinations such as motors running uncontrolled. Once the ESP32 boots, it takes over and starts driving the outputs to known states. Its output drivers are much stronger than the high-value pull-up resistors, so they are overcome with little effort and little wasted energy.
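To put rough numbers on "little effort and little wasted energy", here is a small Python sketch of the divider formed by a pull-up and an output that is driving low. The values used (3.3 V supply, 100 k pull-up, about 30 ohms of driver resistance) are my assumptions for illustration and are not taken from the board.

```python
def pin_when_driven_low(vcc, r_pullup, r_driver):
    """Return (pin voltage, pull-up current) while an output drives low.

    The pull-up and the driver's on-resistance form a simple voltage
    divider between the supply rail and ground.
    """
    current = vcc / (r_pullup + r_driver)   # amps flowing through the pull-up
    return current * r_driver, current      # volts left on the pin, amps wasted


# Assumed example values, not measurements from the board.
volts, amps = pin_when_driven_low(vcc=3.3, r_pullup=100e3, r_driver=30.0)
print(f"pin sits at {volts * 1e3:.2f} mV, pull-up wastes {amps * 1e6:.1f} uA")
```

With those assumed values the pin sits at roughly a millivolt and the pull-up costs a few tens of microamps, which is why a weak pull-up is no obstacle to the driver.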

Testing times

Now that the board is initialised, it's possible to drive each of the outputs to ensure that each pin can switch to both a high and a low level and that the signal arrives at the correct places on the board. This is validated with the oscilloscope (scope). The purpose of this testing is to ensure that the board is defect free: no short circuits, which would prevent some signals from switching properly, and no open circuits between the driving device and the devices that use that signal. Any failure indicates either a manufacturing issue or a chip failure.
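As a rough sketch of that pin-by-pin exercise, the Python below drives one output at a time high, then low, then back to its safe level, pausing at each step so the pin can be probed. The pin names, the `write_pin` callback and the `hold` hook are all hypothetical. This is a host-side model of the idea, not the firmware running on the ESP32.

```python
def exercise_outputs(pins, write_pin, hold=lambda: None):
    """Toggle each output in turn and return the sequence that was driven.

    pins:      dict of pin name -> safe level (0 or 1), names are hypothetical
    write_pin: caller-supplied function write_pin(name, level)
    hold:      caller-supplied pause, e.g. wait for a key press while probing
    """
    log = []
    for name, safe_level in pins.items():
        # High, then low, then back to the safe level before moving on.
        for level in (1, 0, safe_level):
            write_pin(name, level)
            log.append((name, level))
            hold()
    return log


# Example with a fake pin driver that just records the last level written.
state = {}
sequence = exercise_outputs(
    {"MOTOR_EN": 0, "DECODER_EN": 1},          # assumed names and safe levels
    lambda name, level: state.__setitem__(name, level),
)
```

Driving only one pin at a time makes a short to a neighbouring net easier to spot, since the neighbour should stay put while the pin under test moves.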

Once the confidence is there that all the pins are driving as expected, the proper code (which is still not ready at this point) can begin to be used. It brings up the ICs on the board so that they can start controlling their respective parts of the system, extending the testing to all components across the board.

Once all outputs are driven and the software is operating, the return signals from other devices to the various chips and the ESP32 must be checked. This is more complex, as it requires that the other devices are configured and operating and that the conditions that cause their outputs to switch have occurred.

In reality, comprehensive testing takes a while, as it has a lot of dependencies. However, the more that can be tested, the more confidence can be gained that the boards and devices are operating correctly. That takes manufacturing issues off the table and, because the devices are actually running, confirms that the devices themselves are functioning correctly too.

I started with the SPI decoder and all the normal output pins, of which there are not very many, since most of the general purpose IO pins (GPIO pins) hang off Microchip MCP23S17 16-bit IO Expanders with SPI interfaces. There are two interrupt pins that can be configured in different ways depending on what I need them to do; these pick up from all the IO Expanders and go back to the ESP32. This in turn means that the SPI bus, the interrupt pins and the I2C bus go across much of the board, connecting up to the relevant ICs.
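For anyone following along with a logic analyser, this is roughly what an MCP23S17 transfer looks like on the wire. The control byte layout (0100, three address bits, then the read/write bit) and the register addresses come from the Microchip datasheet, using the default register banking. The helper names are mine. The address bits only matter if hardware addressing (IOCON.HAEN) is enabled. If the SPI decoder gives each expander its own chip select, which is an assumption on my part, address 0 may be all that is needed.

```python
# MCP23S17 register addresses with IOCON.BANK = 0 (the power-on default).
IODIRA, IODIRB = 0x00, 0x01   # direction registers, 1 = input
GPIOA, GPIOB = 0x12, 0x13     # port registers


def control_byte(hw_addr, read):
    """Build the first byte of a transfer: 0100 A2 A1 A0 R/W."""
    if not 0 <= hw_addr <= 7:
        raise ValueError("hardware address must be 0..7")
    return 0x40 | (hw_addr << 1) | (1 if read else 0)


def write_frame(hw_addr, reg, value):
    """Three bytes to clock out for a single register write."""
    return bytes([control_byte(hw_addr, False), reg, value & 0xFF])


def read_frame(hw_addr, reg):
    """Three bytes to clock out for a single register read.

    The last byte is a dummy, the device returns the data while it is sent.
    """
    return bytes([control_byte(hw_addr, True), reg, 0x00])


# Example: set all of PORTA on expander 0 to inputs.
frame = write_frame(0, IODIRA, 0xFF)
```

Seeing 0x40 or 0x41 at the start of each transfer in the analyser's SPI decode is a quick sanity check that the framing is right.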

Test pins are present next to the ESP32, so that I can connect them to my Saleae Logic Analyser. It is worth its weight in gold when working with a higher level protocol or more than a couple of signals at the same time, as you can see things in context and decode the communication that is taking place, seeing actual data, errors and status messages flowing. If you enjoy this sort of hobby or are studying Electronics at college or university, give them a call and ask about their hobby and educational discounts. If you qualify, you might be pleasantly surprised.

The Logic Analyser is generally more useful than the Scope in digital systems, since it has 16 channels compared to the Scope's 4. However, the Scope can see things such as poor levels, noise, overshoot and undershoot, none of which a Logic Analyser can see. On the flip side, a Scope doesn't decode data (unless you get an expensive one that has this feature built in). In general, a Scope simply shows what is actually happening on the pin - both good and bad - whereas the Logic Analyser shows the information flowing and the interrelationships between signals. It can also record this information for a much longer period than the Scope can and, importantly, put it on the computer's screen, which is far larger than that of a Scope. It's also easier to move around with the mouse and keyboard, search for specific data sequences and trigger on a variety of logical conditions. Both tools are essential and overlap in some areas, but in general they provide different capabilities.

Interrupt problem

Although initial testing went well, another problem was encountered when testing the IO Expanders, more specifically the detection of changes on an input pin on one of those devices. A change should alert the ESP32 via an interrupt. On receipt of that, the ESP32 goes to all the IO Expanders to find out which one (or ones) had an event, reads what happened, processes it and continues. This information is stored in a couple of registers on each IC, so it's quick to retrieve and process.
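The polling step can be sketched in Python along these lines. The register addresses (INTF and INTCAP, default banking) are from the MCP23S17 datasheet. The `read_reg` callback and the way devices are identified are hypothetical stand-ins for whatever the real bus code provides. This is a model of the flow, not the actual handler.

```python
# MCP23S17 interrupt registers with IOCON.BANK = 0 (the power-on default).
INTFA, INTFB = 0x0E, 0x0F       # flags: which pins caused the interrupt
INTCAPA, INTCAPB = 0x10, 0x11   # pin levels captured when it happened


def service_interrupt(expanders, port, read_reg):
    """Ask every expander about one port and collect the pins that changed.

    expanders: iterable of device identifiers (hypothetical)
    port:      "A" or "B"
    read_reg:  caller-supplied function read_reg(dev, reg) -> int
    Returns a list of (dev, port, bit, captured_level) tuples.
    """
    intf, intcap = (INTFA, INTCAPA) if port == "A" else (INTFB, INTCAPB)
    events = []
    for dev in expanders:
        flags = read_reg(dev, intf)
        if flags == 0:
            continue                       # this device did not fire
        captured = read_reg(dev, intcap)   # reading INTCAP clears the interrupt
        for bit in range(8):
            if flags & (1 << bit):
                events.append((dev, port, bit, (captured >> bit) & 1))
    return events


# Example with fake register contents: expander 1 saw pin A2 go high.
fake = {(1, INTFA): 0b00000100, (1, INTCAPA): 0b00000100}
events = service_interrupt([0, 1], "A", lambda d, r: fake.get((d, r), 0))
```

Two single-byte reads per device that fired, and one for each that did not, is about as cheap as this can get.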

The problem was that the interrupt pins were not firing properly. These pins are open-drain outputs and they are wire-OR'ed across the board, meaning that any IO Expander can signal an interrupt to the ESP32 by pulling the line low. When none are signalling a change, the pull-up resistor does its job, returning the signal to a logic 1 level. There are two of these interrupt lines. The configuration sent to the devices during initialisation uses one line for PORTA changes and the other for PORTB changes. This halves the amount of information that needs to be read when an interrupt is received, again helping to keep SPI bus traffic to a minimum.
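In MCP23S17 terms, that setup corresponds to leaving MIRROR clear (so INTA follows PORTA and INTB follows PORTB) and setting ODR for open-drain outputs in the IOCON register. The bit positions below are from the datasheet. The helper functions are a sketch of mine, and whether the real initialisation also sets HAEN is something I am only guessing at.

```python
# IOCON bit positions from the MCP23S17 datasheet.
IOCON_MIRROR = 1 << 6   # 1 = INTA and INTB are internally connected
IOCON_HAEN = 1 << 3     # 1 = hardware address pins enabled
IOCON_ODR = 1 << 2      # 1 = INT pins are open-drain (overrides INTPOL)


def iocon_value(mirror=False, open_drain=True, hw_addr=False):
    """Compose an IOCON byte for the interrupt arrangement described above."""
    value = 0
    if mirror:
        value |= IOCON_MIRROR
    if open_drain:
        value |= IOCON_ODR
    if hw_addr:
        value |= IOCON_HAEN
    return value


def wired_or_level(pulling_low):
    """Level seen by the ESP32 on a shared open-drain interrupt line.

    pulling_low: one boolean per expander, True if it is asserting.
    Any single device pulling low wins, otherwise the pull-up gives a 1.
    """
    return 0 if any(pulling_low) else 1


config = iocon_value()                          # separate INTA/INTB, open-drain
idle = wired_or_level([False, False, False])    # nobody asserting
active = wired_or_level([False, True, False])   # one expander asserting
```

The same wired-OR behaviour is what makes a bridge between the two lines so confusing: either port on any device ends up pulling both lines.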

Probing with the scope confirmed that the interrupt pins were not firing properly, but this was new and therefore untested code, so was it a hardware problem or a software problem? After double-checking the code, I swapped to the second board and re-tested. It worked fine, so the fault was on the first board. This was not a design issue; it was an assembly problem or a bad IC.

Further investigation with the multimeter showed a low resistance of a couple of ohms between the two interrupt pins, which are adjacent - not a short, but very close to it. This is less than ideal, since those traces go all over the board.

A thorough visual inspection under my magnifier showed no sign of the problem, so now I had to locate it and sort it out. I used my multimeter on its lowest ohms setting and zeroed out the leads so that I could measure small changes in resistance. All wires have resistance, and the longer the wire, the more resistance it has, so if I move the probes and the reading goes down, I'm closer to the problem. That narrowed the fault down to one part of the board, where there were a couple of chips. Out came the hot air soldering iron and one chip was removed with the help of a bit of flux, yet the problem remained. Nice, a 50-50 chance of finding it and it's the other one. Why is that always the case? I popped that one off in the same way and the problem disappeared. At least I'd found it, but where was it? I visually inspected the chips around those pins and they were fine. I re-applied both to the board with a bit of fresh flux and the existing solder, the problem was no longer there and things jumped into life. The fault must have been a little bit of solder from initial assembly that didn't flow properly and was sitting under the chip.
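The resistance-chasing trick works because a PCB trace is just a long, thin copper resistor: R = resistivity x length / (width x thickness). The Python sketch below estimates how much resistance a trace adds. The dimensions are assumed for illustration (a 0.2 mm wide trace in 35 micron, "1 oz" copper) and are not taken from this board's layout.

```python
COPPER_RESISTIVITY = 1.68e-8   # ohm metres, copper at about 20 degrees C


def trace_resistance(length_m, width_m, thickness_m):
    """Resistance in ohms of a rectangular copper trace."""
    return COPPER_RESISTIVITY * length_m / (width_m * thickness_m)


# Assumed geometry: 100 mm of 0.2 mm wide trace in 35 micron copper.
r = trace_resistance(length_m=0.100, width_m=0.2e-3, thickness_m=35e-6)
print(f"{r:.3f} ohms")
```

With those assumed dimensions each 100 mm of trace adds roughly a quarter of an ohm, so across a board the reading moves by tenths of an ohm, which is why the leads have to be zeroed out first.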

An hour or so wasted, but at least the problem is fixed and I can continue talking to the IO Expanders and sensing all inputs from other devices. 

Once that was done, the I2C bus was exercised and it was working, at least for the PWM controller, and the I2C multiplexer was switching the bus to the driver ICs. I can't test the differential bus drivers yet, though, since the remote modules haven't been smoke tested and I want one known good, working main board to expand out from.

Filled with confidence that at least the majority of the main board was working, I now really needed the main code base to be available, so work transitioned to getting that going, since the code was the main issue blocking further testing on the main board and the remote modules.  There is a lot still to do and spring is nearly upon us.