Antminer S19 Pro+ Hydro Hash Board Repair Guide
Ⅰ. Repair platform / tools / equipment requirements:
1. Platform requirements:
Anti-static repair workbench (must be grounded) and anti-static wristband (grounded).
2. Equipment requirements:
(1) Constant temperature soldering iron (350-380°C), with a pointed tip for soldering small SMD components such as resistors and capacitors;
(2) Hot air gun and BGA rework station for chip / BGA disassembly and soldering;
(3) Multimeter, with steel needles soldered to the probes and covered with heat shrink tubing for easier measurement (Fluke 15b+ 15BMAX-01 recommended);
(4) Oscilloscope (UTD2102CEX+ recommended);
(5) Network cable: a stable Internet connection is required.
3. Test tool requirements:
(1) APW11 PSU (Part No.: TCAP021600130);
(2) DIY power adapter cable: Use thick copper wire to connect the positive and negative poles of the PSU to the hash board. It is recommended to use 4AWG copper wire with a length of less than 60cm for powering the hash board;
(3) Test fixture using V2.2040 control board (test fixture part No. ZJ0001000001).
4. Maintenance auxiliary materials / tools required:
(1) Solder paste M705, flux, and board cleaner with anhydrous alcohol (used to clean flux residue after repair);
(2) Thermal conductive gel (specification: Fujipoly SPG-30B), applied to the chip surface after repair;
(3) Reballing stencil (ball planting steel mesh), solder wick, and solder balls (recommended ball diameter 0.4mm): when replacing a chip, tin the pins of the new chip before soldering it to the hash board. Then apply thermal conductive gel evenly on the chip surface and fasten the water cooling heat sink;
(4) Barcode scanner;
(5) RS232/TTL adapter board;
(6) DIY short-circuit probe (Pins can be used).
5. Common maintenance spare material requirements:
(1) 0402 resistors (0R, 10R, 33R, 100R, 1K, 2K);
(2) 0402 capacitors (0.1uf, 1uf).
Ⅱ. Repair requirements:
1. Pay attention to the operation method when replacing a chip. After any component is replaced, the PCB must show no obvious deformation. Check the replaced parts and the surrounding parts for open circuits and short circuits.
2. The repair personnel must have certain electronic knowledge, more than one year of maintenance experience, and be proficient in BGA / QFN / LGA packaging soldering technology.
3. After repair, the hash board must be tested at least twice, and every test must be OK before the board can pass.
4. Check whether the tools and testers can work properly, determine the maintenance station test software parameters, test fixture version, etc.
5. After a chip is replaced, run the chip detection test first, and run the functional test only after chip detection passes. For the functional test, the water cooling plate must be correctly assembled: when installing it, coat the chip surface evenly with thermal conductive gel, and run the cooling fan at full speed (this applies to factory PT1 testing; after-sales repair can use a water radiator instead).
6. When measuring the signal, 4 fans are used to assist in heat dissipation, and the fans are kept at full speed.
7. When powering the hash board, connect the negative copper wire of the PSU first, and only then run the pattern test. A repaired hash board must cool down before testing; otherwise the test will report Pattern NG (PNG).
8. When replacing a chip, print solder paste onto the pins of the new chip so that it is pre-tinned before it is soldered to the PCBA.
9. Test fixtures at the repair end all use Test_Mode and the code scanning mode for testing. After a board passes, it re-enters the production flow from the first test station and is then installed and aged normally (installed at the same level).
Ⅲ. Test fixture production and precautions
The test jig should meet the heat dissipation of the hash board and facilitate the measurement of signals:
1. Obtain the test fixture, material number ZJ0001000001.
2. Before first use, update the FPGA of the fixture control board with the S19 pro+ Hydro series test fixture SD card firmware: decompress the package, copy it to the SD card, and insert the card into the control board card slot. Power on for about 1 minute and wait for the indicator to flash 3 times, which shows that the update is complete. (Without this update, the test may persistently report the same chip as failed.)
3. Make the test SD card as needed. For the PT1 chip detection and PT2 function tests, decompress the compressed package directly onto the SD card. After decompression, delete the original Config file, then rename the required configuration file (for example, Config.ini-HHB42601-PT2) to Config.ini and click "Yes" to confirm. The final configuration file must be named "Config.ini".
4. When testing the PT1 chip at the production, after-sales, and outsourced maintenance ends, a barcode scanner, serial port tool, and network cable are required. For details, see the S19pro+ Hyd. test process document.
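The configuration swap in step 3 can be scripted when many SD cards are prepared. The following is only a minimal sketch: the function name `select_config` and the idea of passing the SD card mount point as a path are assumptions made for illustration, and the script copies the variant file instead of renaming it so that the variant stays on the card.

```python
import shutil
from pathlib import Path


def select_config(sd_root: Path, variant: str) -> Path:
    """Make Config.ini-<variant> the active Config.ini on the SD card.

    `sd_root` is the mounted SD card directory (hypothetical helper,
    not part of the fixture software).
    """
    source = sd_root / f"Config.ini-{variant}"
    target = sd_root / "Config.ini"
    if not source.is_file():
        raise FileNotFoundError(f"no such configuration file: {source}")
    # Delete the original Config.ini first, as the procedure requires.
    target.unlink(missing_ok=True)
    # Copy rather than rename so the variant file remains available;
    # this is a choice of the sketch, the guide itself renames the file.
    shutil.copyfile(source, target)
    return target
```

For example, `select_config(Path("/media/sdcard"), "HHB42601-PT2")` would leave a `Config.ini` whose content matches the PT2 variant; the mount path here is a placeholder.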
Ⅳ. Principle Overview
1. Working structure of S19 pro+ Hydro hash board:
(1) The hash board consists of 180 BM1362 chips (screen printing order BM1-BM180), divided into 60 groups (domains), each group consists of 3 ICs;
(2) The operating voltage of the BM1362 chip used in the S19 pro+ Hydro hash board is 0.32-0.326V;
(3) In domains 60, 59, 58, 57, 56, 55, and 54 (7 domains in total), the LDO is fed from the 25V output of boost circuit U9 and supplies 1.2V to the ASIC chips;
(4) In domains 53-1, the LDO chip is fed from VDD_IN and supplies 1.2V to the ASIC chips. The voltage drops by about 0.3V for each domain stepped down;
(5) In domains 60-54, the 0.8V supply is derived from the 1.95V VDD output of the LDO in the same domain.
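The chip and domain layout above lends itself to a small lookup helper for use during fault location. This is a sketch under one loud assumption: chips are taken to be grouped into domains consecutively in silkscreen order (BM1-BM3 in domain 1, BM4-BM6 in domain 2, and so on), which the guide does not state explicitly. The function names are illustrative.

```python
CHIPS = 180                            # BM1..BM180, from the guide
DOMAINS = 60                           # voltage domains, from the guide
CHIPS_PER_DOMAIN = CHIPS // DOMAINS    # 3 ICs per domain
FIRST_BOOST_DOMAIN = 54                # domains 54-60 feed their LDO from the boost rail


def domain_of_chip(bm_number: int) -> int:
    """Return the 1-based domain of a 1-based silkscreen number.

    ASSUMPTION: consecutive grouping, BM1-BM3 form domain 1.
    """
    if not 1 <= bm_number <= CHIPS:
        raise ValueError(f"BM number out of range: {bm_number}")
    return (bm_number - 1) // CHIPS_PER_DOMAIN + 1


def ldo_source(domain: int) -> str:
    """Return which rail feeds the 1.2V LDO of the given 1-based domain."""
    if not 1 <= domain <= DOMAINS:
        raise ValueError(f"domain out of range: {domain}")
    return "boost 25V (U9)" if domain >= FIRST_BOOST_DOMAIN else "VDD_IN"
```

Under that grouping assumption, a suspect chip BM160 maps to domain 54, so a missing 1.2V there points toward the boost circuit before VDD_IN.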
2. S19 pro+ Hydro hash board boost circuit:
The boost circuit converts the 18V input from the PSU to 25V, as shown in the figure.
3. Signal direction of S19 pro+ Hydro ASIC chip:
(1) CLK signal flow: generated by the 25MHz crystal oscillator Y1 and passed from chip BM1 to chip BM180; the voltage is about 0.6V;
(2) TX (CI, CO) signal flow: from pin 7 of the IO port (3.3V) to level conversion IC U2, and then from chip BM1 to chip BM180;
(3) RX (RI, RO) signal flow: from chip BM180 to chip BM1, then through U1 to pin 8 of the signal cable terminal and back to the control board;
(4) BO (BI, BO) signal flow: from chip BM1 to chip BM180;
(5) RST signal flow: from pin 3 of the IO port to D2, R13, and R14, and then from chip BM1 to chip BM180;
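When a signal is missing at a given chip, the directions above determine which neighbouring chip to inspect next. The sketch below encodes them as a table; the names `SIGNAL_DIRECTION` and `driving_chip` are illustrative only, and the table covers just the five signals listed in this section.

```python
# +1: passed forward from BM1 to BM180; -1: passed in reverse from BM180 to BM1.
SIGNAL_DIRECTION = {
    "CLK": +1,
    "CI": +1,    # TX
    "BO": +1,
    "RST": +1,
    "RO": -1,    # RX
}
LAST_CHIP = 180


def driving_chip(signal: str, chip: int):
    """Return the chip whose output feeds `signal` into `chip`.

    Returns None when the signal enters the chain from outside the chip
    string at this position (crystal, level shifter, IO port) or when
    `chip` is the far end of the RO chain.
    """
    if not 1 <= chip <= LAST_CHIP:
        raise ValueError(f"chip out of range: {chip}")
    source = chip - SIGNAL_DIRECTION[signal]
    return source if 1 <= source <= LAST_CHIP else None
```

For instance, a missing CLK at BM40 directs attention to BM39 and the series resistor between them, while a missing RO at BM40 directs attention to BM41.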
4. Overall architecture:
The overall architecture is mainly composed of 4 hash boards, 1 control board, and an APW111721a power supply.
Ⅴ. Common faults and troubleshooting steps of the hash board
1. Phenomenon: The single-board test detects 0 chips (PT1 station)
Step 1: Check the PSU output first.
Step 2: Check the voltage output of the voltage domain.
The voltage of each domain is about 0.33V. If the 19V supply is present, the domain voltage is usually present as well. First measure the output at the hash board power terminal, and check whether the MOS is short-circuited (measure the resistance between pins 1, 4, and 8). If the 19V supply is present but there is no domain voltage, continue troubleshooting.
Step 3: Check the PIC circuit.
Measure the output at pin 2 of U3; the voltage should be about 3.2V. If it is, continue troubleshooting. If the 3.3V supply is missing, check whether the connection between the fixture cable and the hash board is OK, and re-flash the PIC chip.
PIC chip flashing steps:
① Flash the PIC program on the hash board.
Program: PIC_H20_release_v101_4c54.hex
Programming tool: PICkit3. Pin 1 of the PICkit3 cable corresponds to pin 1 of J2 on the hash board, and pins 1 through 6 must all be connected.
② Flashing software:
Open MPLAB IPE, select device: PIC16F1704, click power to select the powering mode, and then click operate.
(1) Step 1: Select file, find the .HEX file to be burned;
(2) Step 2: Click connect to connect normally;
(3) Step 3: Click the program button. After programming completes, click verify; a completed verification proves that the flashing was successful.
Step 4: Check the output of the boost circuit. As shown in the figure, 28V should be measured at C76 and 25V at C62.
Step 5: Check the output of each group of LDO 1.2V or PLL 0.8V.
Step 6: Check the signal output of the ASIC chip (CLK/CI/RI/BO/RST)
Refer to the voltage ranges given in the signal flow description. If a measured voltage deviates significantly, compare it with the value measured in an adjacent domain before making a judgment.
2. Phenomenon: EEPROM NG is displayed on the test fixture LCD screen.
Solution: Check whether U6 is soldered normally.
3. Phenomenon: When the test fixture LCD screen displays sensor NG, the test reading temperature is abnormal.
Follow the steps below to troubleshoot:
A) Check the serial port log. If sensor=0, check whether U4, R453, and R454 and the adjacent SMD resistors and capacitors are soldered normally;
B) Sensor={0, 1, 2, 3} corresponds to sensor positions {U4, U5, U260, U261};
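The index-to-position mapping in B) can be kept as a lookup table next to the serial log viewer. This is a minimal sketch; the function name `sensor_designator` is hypothetical, and only the four indices given in the guide are covered.

```python
# Sensor index reported in the serial log -> designator on the hash board,
# taken directly from item B) above.
SENSOR_DESIGNATORS = {0: "U4", 1: "U5", 2: "U260", 3: "U261"}


def sensor_designator(index: int) -> str:
    """Return the board designator for a sensor index from the log."""
    try:
        return SENSOR_DESIGNATORS[index]
    except KeyError:
        raise ValueError(f"unknown sensor index: {index}") from None
```

A log line reporting sensor=2 therefore points at U260 and its surrounding SMD parts.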
4. Phenomenon: When the test fixture LCD screen displays INIT NG WATER_TEMP, the water temperature at the water inlet and outlet is abnormal.
Solution: Check whether U4, U5 and SMD resistors and capacitors are soldered normally.
5. Phenomenon: The indicator light of the hash board is off.
Solution: Check whether the PIC 3.3V is normal.
6. Phenomenon: Incomplete chip detection on the single board (PT1/PT2 station).
Repair solution:
(1) When the LCD displays ASIC NG: (0), first check that the total domain voltage (21V) and the boost circuit voltage (25V) are normal. Then use a short-circuit probe to short the RO test point to the 1V2 test point between the first and second chips, run the chip search program, and look at the serial port log. If 0 chips are still found, it will be one of the following situations:
(1-1) Use a multimeter to check whether the 1V2 and 0V8 test points measure 1.2V and 0.8V. If not, the 1.2V and 0.8V LDO circuits of the domain may be abnormal, or the two ASIC chips of the domain may not be soldered well. In most cases the cause is a short-circuited 0.8V or 1.2V chip filter capacitor (measure the resistance of the chip filter capacitors on the front and back of the PCBA).
(1-2) Check whether the U1&U2 circuits are abnormal, such as resistor solder joints, etc.
(1-3) After 1V2 and 0V8 are normal, measure the RO, RST, CLK, CI, and BI signals in turn to see if they are normal;
(1-4) Abnormal water temperature or poor heat dissipation has burned out U4 and U5, PIC chip U3 is short-circuited to ground, the 1.2V and 0.8V of the first domain have no output, and the BM1 and BM2 chips are burned out (U4, U5, U260, and U261 may all be burned out).
(2) If 1 chip is found in step (1), the first chip and the circuits before it are good. Use the same method to check the subsequent chips. For example, short the 1V2 test point to the RO test point between the 38th and 39th chips. If the log finds 38 chips, the first 38 chips are fine and the fault is after the 38th chip. If 0 chips are still found, first check whether 1V2 is normal; if it is, the fault is among the first 38 chips. Continue with the binary search method until the problematic chip is found. Assuming the Nth chip is faulty: when 1V2 and RO are shorted between chips N-1 and N, N-1 chips are found; when they are shorted between chips N and N+1, N chips cannot all be found.
(3) When the LCD displays ASIC NG: X (fixed report of a certain chip), there are two cases:
(3-1) The first case: the test time is basically the same as the OK board, and usually the value of X does not change each time it is tested.
The test time refers to the time from pressing the start test button to the LCD displaying the result ASIC NG: (X).
This situation is most likely caused by abnormal soldering of the CLK, CI, and BO series resistors before and after the Xth chip, so just focus on checking these 6 resistors. A small probability is that one of the three chips X-1, X, and X+1 has abnormal soldering of the following pins:
(3-2) The second case: the test time is almost twice as long as the OK board (sometimes the X value changes each time the test is performed, and sometimes X=0);
During the test, if the domain voltages in front of the abnormal position are almost all below 0.3V and the domain voltages behind it are almost all above 0.37V, a chip is not soldered properly; usually 1.2V, 0.8V, RST, or CLK is not soldered properly. It is recommended to measure the domain voltages directly to locate the faulty domain. The 1V2 and RO short-circuit method from step (1) can also locate the abnormal position;
(3-3) The third case: the chip appearance is normal, the voltage signal is normal, and it is a problem with the chip itself;
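The binary search described in step (2) can be written out as follows. This is a sketch, not fixture software: `probe(k)` is a hypothetical callback standing for the manual step of shorting 1V2 to RO between chip k and chip k+1, running the chip search, and reading the chip count from the serial log. It assumes the board has already reported a fault, and it returns the first faulty position in the chain.

```python
from typing import Callable


def locate_faulty_chip(probe: Callable[[int], int], chip_count: int = 180) -> int:
    """Return the 1-based position N of the first faulty chip.

    probe(k) must return the number of chips found with the short placed
    between chip k and chip k+1 (k chips when chips 1..k are all good).
    """
    lo, hi = 1, chip_count       # the first fault lies somewhere in lo..hi
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid) == mid:    # chips 1..mid respond: fault is after mid
            lo = mid + 1
        else:                    # fewer chips found: fault is at or before mid
            hi = mid
    return lo


if __name__ == "__main__":
    # Simulated board with a fault at chip 39: a short placed before the
    # fault finds k chips, a short placed at or after it finds none.
    calls = []

    def simulated_probe(k: int) -> int:
        calls.append(k)
        return k if k < 39 else 0

    print(locate_faulty_chip(simulated_probe), len(calls))
```

With 180 chips the loop needs at most 8 probes, compared with up to 179 when the short is moved one chip at a time.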
7. Phenomenon: Single board Pattern NG, i.e. the reply nonce data is incomplete (PT2 station)
Pattern NG is caused by the fact that the characteristics of some chips are quite different from those of other chips. Currently, there are several causes of defects:
(1) If a chip is found to be damaged, only the chip needs to be replaced;
(2) The chip has a cold solder joint or is otherwise poorly soldered (the nonce reply count of two chips in one domain is 0 or 1);
(3) The domain voltage of this domain is low while the 1.2V and 0.8V voltages are normal, in which case the chip itself has a problem;
(4) If the nonce reply count of multiple chips is 0, measure the domain voltages and start checking from the domain with the lowest voltage;
PS: Special attention should be paid to the fact that the domain and asic numbers both start from 0;
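For item (4), once the 60 domain voltages have been measured by hand, sorting them makes the starting point obvious. The sketch below is illustrative: the 0.30V cut-off is taken from the "less than 0.3V" figure used earlier in this guide for abnormal domains, the function name is hypothetical, and the returned indices are zero-based to match the log numbering noted in the PS above.

```python
LOW_LIMIT_V = 0.30   # domains below this are treated as low (cut-off assumed from the guide)


def low_voltage_domains(voltages: list[float], low_limit: float = LOW_LIMIT_V) -> list[tuple[int, float]]:
    """Return (zero_based_domain, volts) for every low domain, lowest first.

    `voltages` holds the hand-measured domain voltages in domain order,
    starting with domain 0.
    """
    low = [(index, volts) for index, volts in enumerate(voltages) if volts < low_limit]
    return sorted(low, key=lambda item: item[1])
```

Given measurements of `[0.33, 0.28, 0.34, 0.25]` for domains 0-3, the function lists domain 3 first and domain 1 second, so inspection would start at domain 3.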
8. Phenomenon: The chip test is OK, but the PT2 function test serial port does not stop (long run)
Repair method: During the PT2 test, check the serial port print log. Generally, it is caused by a chip register address error. Just replace the BM5 chip as shown in the figure (asic starts from 0);
9. Phenomenon: PT1 chip test is OK, PT2 function test always reports a certain chip as NG;
Repair method: Inspect the appearance and measure the SMD capacitors and resistors in front of the chip. The cause is generally poor chip soldering, a damaged capacitor or resistor with an abnormal value, or a problem with the chip itself;
10. Phenomenon: Most of the chips in the whole miner or on a single board return 0 nonces.
Generally, it is a chip soldering problem, and it will be OK after reinstalling.
Ⅵ. Failure caused by control board problem
1. The whole machine does not run
Maintenance ideas:
(1) Check whether the voltage of several voltage output points is normal. If 3.3V is short-circuited, disconnect U8 first. If it is still short-circuited, unplug the CPU and measure again. For other voltage abnormalities, generally replace the corresponding voltage converter IC.
(2) If the voltage is normal, please check the welding condition of DDR/CPU (X-RAY inspection at the production end)
(3) Try to update the flash program with an SD card. For a machine to start normally after its control board has been flashed, the following steps are required:
a) After the card is flashed successfully, the green LED indicator stays on; turn the power off and restart;
b) Wait for 30 seconds after powering on again (the time needed to open OTP);
c) OTP (One Time Programmable) is a type of MCU memory that can be programmed only once: after the program is burned into the IC, it cannot be changed or cleared;
Notes:
① If the power is suddenly cut off during the OTP opening process or the time is less than 30 seconds, the control board will fail to open the OTP function, and the control board will not start (not connected to the Internet). It is necessary to replace U1 (control board main control IC FBGA). The replaced U1 cannot be used on the 19 series;
② For control boards that have opened the OTP function, U1 cannot be used on models of other series;
2. The whole machine cannot find the IP
(1) It is likely that the IP cannot be found due to abnormal operation. Refer to point 1 for troubleshooting.
(2) Check the appearance and soldering condition of the network port, network transformer T1, and CPU.
3. The whole machine cannot be upgraded
(1) Check the appearance and soldering condition of the network port, network transformer T1, and CPU.
4. The whole machine fails to read the hash board or the chain is missing, reporting J:4
(1) Check the connection condition of the cable.
(2) Check the parts of the corresponding chain of the control board.
(3) Check the wave soldering quality of the power strip pins and the resistance around the plug-in interface.
Ⅶ. Whole machine failure phenomenon
1. Whole machine initial test
Refer to the test process document, the general problems are assembly process problems and control board process problems.
Common phenomena: IP cannot be detected, abnormal number of detected fans, abnormal detection chain. If an abnormality occurs during the test, repair it according to the monitoring interface and test LOG prompts. The repair methods for the whole machine initial test and aging test are the same.
2. Aging test: During the aging test, perform repair according to the monitoring interface, for example:
(1) Missing chain: 1 of the 4 boards is missing. In most cases the connection between the hash board and the control board is faulty. Check whether the ribbon cable is open. If the connection is OK, run a PT2 test on the single board. If it passes, the problem is most likely with the control board. If it fails, repair it using the PT2 repair method.
(2) Abnormal temperature: generally the temperature is too high. The monitoring system limits the PCB temperature to 80 degrees and the chip temperature to 95 degrees. Above these limits, the machine raises an alarm and cannot work normally. The usual cause is a water outlet temperature above 65 degrees. Problems with the thermal conductive gel can also cause temperature abnormalities.
(3) Not all chips are found (the machine still runs, but the hash rate is 3/4 or 2/4 of the normal value): for an insufficient chip count, refer to the PT2 test and repair;
(4) After running for a period of time, there is no hash rate and the mining pool connection is lost: check the network;
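The temperature limits in item (2) can be checked in one pass when readings are copied from the monitoring interface. This is a minimal sketch: the dictionary keys and the function name `over_limit` are made up for illustration and do not correspond to fields of the miner's actual monitoring output, and treating 65 degrees as a hard limit for the water outlet is an interpretation of the text above.

```python
# Limits in degrees, as stated in item (2); the key names are illustrative.
LIMITS = {"pcb": 80.0, "chip": 95.0, "water_outlet": 65.0}


def over_limit(readings: dict[str, float]) -> list[str]:
    """Return the names of all readings that exceed their limit.

    Readings that are absent from `readings` are skipped.
    """
    return [
        name
        for name, limit in LIMITS.items()
        if name in readings and readings[name] > limit
    ]
```

A board reporting a 70-degree PCB, a 96-degree chip, and a 66-degree outlet would be flagged for the chip and the water outlet, which suggests checking the water loop before the gel.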
3. Construction of water cooling test platform for after-sales repair and production maintenance.
(1) Because the first generation of water radiators cannot meet the heat dissipation requirements, a water pump needs to be added in series (or two water radiators need to be connected in series) during the modification;
(2) Use 8mm air pipe joints and air pipes during the modification;
(3) If the heat dissipation does not meet the requirements, add an air cooling fan to assist;
For the station inspection steps above, please contact the after-sales engineer for details of the relevant test procedures and test fixtures. After repair, use the code scanning mode to test PT1 and then PT2.
Ⅷ. Other precautions
Maintenance flow chart
1. Conventional inspection: First, visually inspect the hash board to be repaired for PCB deformation or burning, and for parts that are visibly burned, knocked out of position, or missing. Any such damage must be dealt with first. Second, after the visual inspection, measure the impedance of each voltage domain to detect short circuits or open circuits. If any are found, they must be dealt with first. Third, check whether the voltage of each domain is around 0.33V.
2. Once the conventional inspection passes (short circuit detection is generally necessary, so that a short does not burn the chip or other components at power-on), test the chips with a test fixture and locate the fault based on the test results.
3. According to the display results of the test fixture, start from the vicinity of the faulty chip and detect the chip test points (CI/RST/RO/CLK/BI) and voltages such as VDD0V8 and VDD1V2.
4. According to the signal flow, the RO signal is transmitted in reverse (chip No. 180 to No. 1), while CLK, CI, BI, and RST are transmitted forward (No. 1 to No. 180). Find the abnormal point by following the powering sequence.
5. Once the faulty chip is located, re-solder it: add flux around the chip (preferably no-clean flux) and heat the solder joints of the chip pins until the solder melts, so that the pins settle back onto the pads and the solder reflows, achieving a re-tinning effect. If the fault remains after re-soldering, replace the chip directly.
6. A repaired hash board must pass the test fixture at least twice before it is judged good. First pass: after replacing the parts, wait for the hash board to cool down, run the test fixture until it passes, then set the board aside to continue cooling. Second pass: wait a few minutes until the hash board has cooled completely, then test again.
7. After the hash board is repaired, make the relevant repair/analysis records (repair report requirements: date, SN, PCB version, bit number, defect cause, defect responsibility, etc.) so that feedback can be given to production, after-sales, and R&D.
8. After recording, assemble the whole machine for routine aging.
9. Boards repaired at the production end must re-enter the flow from the first production station (at least the appearance inspection and the PT1/PT2 test stations must be passed).
10. For a repaired hash board, the thermal conductive gel must be removed from the water cooling plate and reprinted before the board can re-enter the flow.
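The repair report fields listed in item 7 can be captured in a fixed structure so that every record carries the same columns. The sketch below is one possible layout, not a required format: the class and function names are hypothetical, "bit number" is interpreted as the component position on the board, and the sample values in the usage note are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RepairRecord:
    """One repair/analysis record with the fields required by item 7."""
    date: str
    sn: str
    pcb_version: str
    bit_number: str              # interpreted here as the component position
    defect_cause: str
    defect_responsibility: str


def to_csv(records: list[RepairRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[field.name for field in fields(RepairRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

For example, `to_csv([RepairRecord("2024-01-01", "SN-PLACEHOLDER", "V1.0", "BM39", "poor soldering", "production")])` produces a header line followed by one data line that can be appended to a shared feedback sheet.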