I suppose the root of the problem is one of the following:
a) Lenovo made a BAD design of the main PCB, so this voltage regulator is overloaded all the time, and burning out is just a matter of time.
(Buying the same PCB, or even a whole server, is in most cases not a solution, because you are back in the situation where the server may die any day, or after the next shutdown/restart…)
b) Lenovo made a good PCB design but chose a LOW-QUALITY manufacturer to buy this regulator from AND had NO TIME TO TEST before production.
(The M3 had been on the market for a long time, and Lenovo needed to deliver a more powerful model quickly to hold its position in the server market against many cheaper brands with more aggressive advertising.)
c) Lenovo made a BAD DESIGN of the PCB (but the voltage regulator is well designed and comes from a manufacturer with excellent quality control), and in conjunction with UNDERDEVELOPED FW this leads the voltage regulator to burn out because its outputs are overloaded.
(This c) is just an extended variant of a).)
- My personal experience and research on the web;
- my personal contacts with service engineers who have spent over 30 years working with hundreds of thousands of enterprise servers; and
- the fact that LENOVO MADE A 2ND GENERATION OF THIS MAINBOARD after the scandal with the x3550/x3650 M4 (though only small repair services with experienced engineers and official servicemen know about this, and they are under NDA)
lead me to choose c) as the right answer.
(You can read more on the official Lenovo forum at this link: https://forums.lenovo.com/t5/System-x-X6...).
So, there is only ONE WAY if you need STABLE & PREDICTABLE operation of your server:
JUST BUY THE 2ND GENERATION OF THIS PCB
This way also saves you the time of migrating and re-setting up all your software on a different Lenovo server or on another brand's server (Dell, Fujitsu, MSI,…), and it significantly reduces the downtime of your service, which matters especially if it is linked to a government or financial organization and needs to be online 24/7/365.
It is worth noting that most users who have experienced the same problem (and written about it on forums)
a) are geographically from the EU or Asia;
b) have had their M4s in use for around 1-2 years.
But again, this is definitely NOT A PROBLEM OF CERTAIN GEOGRAPHICAL REGIONS (because the FRU of the main PCB is the same in the US and in Indonesia/Portugal, with all respect to all countries). This is just about Lenovo trying to limit the reputation damage from possible problems with a FRESH model, when there was not enough time to polish the electronics, physical design, firmware and software (like Director, etc…)
Logically, it is better to take some reputation damage on LESS PROFITABLE markets than BIG REPUTATION DAMAGE on the main market (>80% of sales are in the US).
Nothing wrong with this; it is normal practice for EVERY COMPANY in the tech industry nowadays.
The problem is that Lenovo CONTINUED TO SELL THE PROBLEM MODEL AFTER THOUSANDS OF USERS CONFIRMED THE PROBLEM. (Maybe many more, because hundreds of big data centers have NDAs covering cases like this and do not disclose all the facts so as not to lose contracts.)
Even more: Lenovo did not agree to offer a free replacement program for either old or new customers, and proposes only paying for a serviceman to come and do the replacement, or selling a new motherboard (sometimes the same PROBLEM 1ST GENERATION). ;(
VERY BAD CUSTOMER CARE.
Today I found an old Russian electronics forum (messages dated 2015-2017) with VERY USEFUL info about diagnosing and resolving this case:
Use Google Translate to read it.