Antminer L9 Hash Board Repair Guide
I. Requirements for Preparation of Maintenance Platforms/Tools/Equipment
1. Platform Requirements: Anti-static leather maintenance workbench (the workbench needs to be grounded), anti-static wrist straps and proper grounding.
2. Equipment Requirements:
(1) A constant temperature soldering iron (with a temperature range of 350℃ - 380℃). The pointed soldering iron tip is used for soldering small SMD resistors, capacitors and other small SMD components;
(2) A hot air gun and a heating platform (with a temperature range of 300℃ - 400℃), together with a BGA rework station, used for chip / BGA removal and soldering;
(3) A multimeter, with steel needles soldered to the probes and covered with heat shrink tubing for convenient measurement (Fluke 15BMAX-01 is recommended);
(4) An oscilloscope (FNIRSI-1013D is recommended);
(5) Network cables (Requirement: connected to the Internet with a stable network).
3. Requirements for Maintenance and Testing Tools:
(1) PSU: the Antminer L9 uses the APW171215a (voltage range 12V - 15V, version 1.3, calibrated and meeting safety regulations); other PSUs with specifications listed in the BOM can also be used. Power supply adapter cables are needed to power the hash board (self-made: use thick copper wires to connect the positive and negative poles of the PSU to the hash board; 4AWG copper wire no longer than 60cm is recommended, for use in PT1 and maintenance testing only).
(2) Test fixture: use a test jig with a V2.1 or V2.3 control board (the part number of the test jig is ZJ0001000004). Discharge resistors must be installed across the positive and negative poles of the test fixture's power supply; use cement resistors of 20 ohms rated above 100W.
(3) When using the 19 series general testers for the first time, it is necessary to flash the tester firmware package version B031 (refer to the L9 test guidance document for details).
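As a rough sanity check on the discharge resistor in item (2), the sketch below computes the steady-state dissipation of a 20 ohm resistor across the supply rail. It assumes the resistor sits directly across the PSU output and that the rail stays within the 12V - 15V range quoted in item (1); the function and constant names are illustrative only.

```python
def resistor_dissipation_w(rail_voltage_v: float, resistance_ohm: float) -> float:
    """Steady-state power (W) in a resistor wired straight across a DC rail: P = V^2 / R."""
    return rail_voltage_v ** 2 / resistance_ohm


RESISTANCE_OHM = 20.0   # value from the guide
MIN_RATING_W = 100.0    # the guide asks for a rating above 100 W

# 12 V and 15 V are the ends of the PSU output range quoted in the guide.
for rail_v in (12.0, 15.0):
    power_w = resistor_dissipation_w(rail_v, RESISTANCE_OHM)
    print(f"{rail_v:.0f} V rail: {power_w:.2f} W dissipated "
          f"(rating is at least {MIN_RATING_W / power_w:.1f}x higher)")
```

Under these assumptions the continuous dissipation stays far below the specified rating, which is consistent with the guide asking for a generously rated cement resistor.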
4. Requirements for Maintenance Auxiliary Materials/Tools:
(1) Solder paste columns M705, flux, circuit board cleaning fluid mixed with anhydrous alcohol; the circuit board cleaning fluid is used to clean the flux residues after maintenance;
(2) Thermal conductive gel, applied to the surface of the chips after maintenance;
(3) Solder ball stencils (BM1491, FCLGA, with a chip size of 8 * 10mm) and desoldering wick; when fitting new chips, tin the chip pins before soldering the chips to the hash board, and apply thermal conductive gel evenly on the chip surface before fastening the large heat sinks;
(4) Barcode scanner;
(5) RS232/TTL adapter board with 3.3V;
(6) Self-made short-circuit probes.
5. Requirements for Common Maintenance Spare Parts:
(1) SMD resistors RES, 33 ohms, 1%, 1/20W, R0201(0603);
(2) SMD resistors RES, 10K ohms, +/- 1%, 1/16W, 0402;
(3) SMD resistors Res, 0 ohms, 5%, 1/16W, 0402;
(4) SMD ceramic capacitors, 100nF, 6.3V, 10%, X5R, C0201(0603);
(5) SMD ceramic capacitors Cap, 1uF, +/- 10%, 16V, X5R, 0402;
(6) SMD ceramic capacitors, 1uF, 6.3V, 20%, X5R, C0201(0603);
(7) SMD ceramic capacitors Cap, 22uF, +/- 20%, 6.3V, X5R, 0603.
II. Maintenance Requirements
1. Pay attention to the operation techniques when replacing chips. After replacing any components, there should be no obvious deformation on the PCB board. Check whether there are any missing components, open circuits or short circuits on the replaced parts and their surrounding areas.
2. Maintenance personnel must have solid electronics knowledge and more than one year of maintenance experience, and must be proficient in soldering BGA/QFN/LGA packages.
3. After maintenance, a hash board must be tested at least twice, and all test results must be OK before it can pass. Each work order covers a fixed set of hash board serial numbers (SN). Each board has a unique SN; during maintenance and test scanning, SNs must not be swapped between boards and must not be reused.
4. Check whether the tools and testers can work properly, and determine the parameters of the test software at the maintenance workstation, the version of the test fixture, etc.
5. Chips replaced during maintenance must first pass the PT1 chip test; after that, run the PT2 frequency sweep function test. The heat sink must be properly assembled for the function test: apply thermal conductive gel evenly on the chip surface when installing the heat sink, and connect the fan to the test fixture so that it is controlled during testing.
6. If there is no fan for heat dissipation on a single board, continuous testing of PT1 will cause the chip to overheat and damage the board.
7. When measuring signals and voltages with the heat sink attached, a fan can be used on the TOP surface to dissipate heat and perform measurements.
8. When fitting new chips, use a repair stencil to print solder paste on the chip's pins so that the chip is pre-tinned before it is soldered to the PCBA.
9. Test fixtures at the maintenance end should all use Test_Mode and the scanning mode for testing. After a board passes, the production end restarts it from the first test station of the flow line, followed by normal installation and aging (install according to the same level).
10. After a PCBA whose heat sink was removed has been repaired, thermal gel must be re-applied and the heat sink refitted before the post-installation flow line test (otherwise the whole miner will run poorly).
III. Production and precautions of test fixture
The fixture that supports the hash board during testing should provide adequate heat dissipation for the hash board and allow easy signal measurement.
1. When using the BSL41601 series test fixtures for the first time, use the SD card flashing program to update the FPGA of the jig control board: unzip the program and copy it to the SD card, and then insert the card into the card slot of the jig. Power it on and wait for about 1 minute. After the indicator lights on the control board flash 3 times at the same time, the update is completed. (If the update is not carried out, it may cause a fixed report of a certain chip being defective during testing.)
2. Make the test SD cards according to requirements.
For PT1 chip inspection and PT2 functional testing, unzip the compressed package directly onto the SD card. After unzipping, delete the original Config file first, then rename the required configuration file (such as Config.PT1.ini) to Config.ini and click "Yes" when prompted, so that the final configuration file is "Config.ini". (The repaired board must be re-tested from PT1 according to the production process; maintenance personnel must not skip test stations without permission.)
3. When testing the PT1 chip at the production, after-sales, and outsourced repair ends, a barcode scanner, serial port tool, and network cable are required.
Product specification reference:
Number | Miner specifications | L9 BSL41601 | L7 BSL39601 |
1 | Miner hash rate MH/s | 15,992 | 9,500 |
2 | Wall power consumption @25℃ W | 3,367 | 3,425 |
3 | Operating frequency MHz | 1,200 | 1,850 |
4 | Number of chips in the Miner | 330 | 360 |
5 | Number of hash boards in the Miner | 3 | 3 |
6 | Hash board chip matrix | 22 * 5 | 24 * 5 |
7 | Miner size mm | 400 * 195 * 290 | 370 * 195.5 * 290 |
8 | Miner weight (incl. packaging) Kg | 15.4 | 15 |
9 | PSU specifications | APW171215a, copper busbar + screws | APW121417B, copper busbar + screws |
10 | Chassis fan | 4*7000 | 4*6000 |
11 | Network connection | RJ45 Ethernet 10/100M | RJ45 Ethernet 10/100M |
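A few per-board, per-chip and efficiency figures follow directly from the table above. The sketch below derives them; the `MinerSpec` class and its property names are illustrative, and only the numeric inputs come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinerSpec:
    name: str
    hash_rate_mhs: float   # whole-miner hash rate, MH/s
    wall_power_w: float    # wall power at 25 degrees C, W
    chips: int             # chips in the whole miner
    boards: int            # hash boards in the whole miner

    @property
    def per_board_mhs(self) -> float:
        return self.hash_rate_mhs / self.boards

    @property
    def per_chip_mhs(self) -> float:
        return self.hash_rate_mhs / self.chips

    @property
    def joules_per_mh(self) -> float:
        # W / (MH/s) = J/MH
        return self.wall_power_w / self.hash_rate_mhs


L9 = MinerSpec("L9 BSL41601", 15992, 3367, 330, 3)
L7 = MinerSpec("L7 BSL39601", 9500, 3425, 360, 3)

for spec in (L9, L7):
    print(f"{spec.name}: {spec.per_board_mhs:.0f} MH/s per board, "
          f"{spec.per_chip_mhs:.2f} MH/s per chip, "
          f"{spec.joules_per_mh:.3f} J/MH")
```

The per-board figure is a convenient reference when a miner reports roughly 2/3 or 1/3 of its nominal hash rate, since that pattern suggests one or two boards are not contributing.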
IV. Principle Overview
1. Working Structure of L9 (BSL41601) Hash Board:
The hash board has no MOS chips and is powered directly. The VDDIN voltage is 13.2V - 13.5V, depending on the voltage and frequency gear. The board is composed of 110 BM1491 chips (in sequence BM1 - BM110), divided into 22 groups (domains). Each domain consists of 5 ASICs, and the domain voltage is about 0.6V (22 × 0.6V ≈ 13.2V).
The VDDIO of the chips in domains 1 - 17 is powered by LDO chips (1.2V & 0.8V). The LDO input is taken across 5 voltage domains (5 × 0.6V = 3V). Each domain is powered by 2 LDO chips (one outputs 1.2V and one outputs 0.8V);
In the high-voltage domains 18 - 22, the Boost circuit boosts Vin to 19.6V to power the MP2019, which outputs 2.5V to the LDO chips; the LDOs then output 1.2V & 0.8V to power the chip VDDIO.
The clock crystal oscillator Y1 is powered with 1.2V from the output of the U6 LDO chip and outputs 25MHz.
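The power arrangement described above can be summarized numerically. The following is a minimal sketch that uses the nominal 0.6V domain voltage and works in millivolts to avoid floating-point rounding; the function names are illustrative, and the returned descriptions only restate the text.

```python
DOMAIN_COUNT = 22        # voltage domains on the board
CHIPS_PER_DOMAIN = 5     # ASICs per domain
DOMAIN_MV = 600          # nominal domain voltage ("about 0.6V")
LDO_SPAN_DOMAINS = 5     # the LDO input is taken across 5 domains


def stack_voltage_mv() -> int:
    """Nominal total of all domain voltages, in mV."""
    return DOMAIN_COUNT * DOMAIN_MV


def ldo_input_mv() -> int:
    """Nominal LDO input voltage for domains 1-17, in mV."""
    return LDO_SPAN_DOMAINS * DOMAIN_MV


def vddio_supply_path(domain: int) -> str:
    """Describe how VDDIO is supplied for a domain numbered 1..22 as in the text."""
    if 1 <= domain <= 17:
        return "LDO fed from 5 stacked domains (about 3V)"
    if 18 <= domain <= DOMAIN_COUNT:
        return "Boost (19.6V) -> MP2019 (2.5V) -> LDO"
    raise ValueError(f"domain must be 1..{DOMAIN_COUNT}, got {domain}")


print(stack_voltage_mv() / 1000, "V nominal stack")
print(ldo_input_mv() / 1000, "V nominal LDO input")
print(DOMAIN_COUNT * CHIPS_PER_DOMAIN, "chips in total")
```

The nominal stack of 13.2V matches the lower end of the quoted VDDIN range, which is a useful cross-check when a measured domain voltage looks off.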
2. Hash board BOT surface diagram (before the heat sink is attached):
The "red frame" is the 1.2V LDO chips, and the "yellow frame" is the 0.8V LDO chips;
Level_shifter0-20 are 21 op amps in total, shown in the blue frame, which perform addition operations on chip-related signals. Level shifters 0-12 take their operating voltage of about 6.6V across domains (cross-domain VDD); level shifters 13-20 are supplied by the MP2019 output fed from the Boost circuit.
3. Hash board BOT surface diagram:
The identification numbers mark the chip heat sinks, on which the voltage of the adjacent domain can be measured.
The temperature sensing circuit uses two LM75A temperature sensors to read the temperature of the air inlet and outlet. U4 corresponds to the air outlet temperature sensor, and U5 is the air inlet temperature sensor:
1.2V & 0.8V LDO circuit schematic diagram (2 of them are captured):
4. BSL41601 hash board boost circuit:
The boost is powered by the PSU VDD_IN and converted to 19.6V through U7.
5. Signal transmission direction of BSL41601 chip:
(1) CLK signal: generated by the 25MHz crystal oscillator Y1 and transmitted from chip BM1 to chip BM110; the voltage is about 0.58-0.6V;
(2) TX (CI, CO) signal: enters from pin 7 of the IO port (3.3V), passes through the level conversion IC U2, and is then transmitted from chip BM1 to chip BM110. The voltage measured with a multimeter is about 1.2V;
(3) RX (RI, RO) signal: transmitted from chip BM110 to chip BM1, returns to pin 8 of the signal cable terminal through U1, and then returns to the control board. The voltage measured with a multimeter is about 1.2V;
(4) BO (BI, BO) signal: transmitted from chip BM1 to chip BM110;
(5) RST signal: enters from pin 3 of the IO port, passes through R3, and is then transmitted from chip BM1 to chip BM110. The voltage measured with a multimeter is about 1.2V.
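The signal directions and nominal voltages listed above can be kept as a small lookup table for quick checks at the bench. This is only a sketch: the ±0.1V tolerance is an assumption chosen for illustration and is not specified in this guide, and no nominal voltage is given for BO, so that entry is left empty.

```python
# Direction and nominal multimeter reading per signal, as listed above.
SIGNALS = {
    "CLK": {"direction": "BM1 -> BM110", "nominal_v": (0.58, 0.60)},
    "TX":  {"direction": "BM1 -> BM110", "nominal_v": (1.2, 1.2)},
    "RX":  {"direction": "BM110 -> BM1", "nominal_v": (1.2, 1.2)},
    "BO":  {"direction": "BM1 -> BM110", "nominal_v": None},  # no value given in the guide
    "RST": {"direction": "BM1 -> BM110", "nominal_v": (1.2, 1.2)},
}

TOLERANCE_V = 0.1  # ASSUMPTION: illustrative tolerance, not taken from the guide


def reading_is_plausible(signal: str, measured_v: float):
    """Return True/False for a reading, or None when no nominal value is known."""
    nominal = SIGNALS[signal]["nominal_v"]
    if nominal is None:
        return None
    low, high = nominal
    return (low - TOLERANCE_V) <= measured_v <= (high + TOLERANCE_V)


print(reading_is_plausible("CLK", 0.59))
print(reading_is_plausible("RX", 0.2))
print(reading_is_plausible("BO", 1.0))
```

When a reading falls outside the window, the guide's advice still applies: compare it with the same test point in an adjacent group before drawing a conclusion.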
6. Miner architecture:
The L9 miner is mainly composed of 3 hash boards, 1 control board, and 1 APW17 power supply.
Ⅴ. Hash board common bad phenomena and troubleshooting steps
1. Phenomenon: Single board test detects 0 chips (PT1 station)
Step 1: Check the power output first.
Step 2: Check the output voltage of each voltage domain.
Measure whether each domain voltage is normal at about 0.6V. If VDD_IN is powered, there is usually a domain voltage. Check the output at the power terminal of the hash board first.
Step 3: Check whether CLK is output. Measure before and after BM1. If there is output, measure whether the last chip has CLK. If not, use the binary (bisection) method to locate the fault.
Step 4: Check the LDO 1.2V and PLL 0.8V outputs of each group.
Step 5: Check the chip signal outputs (CLK/CI/RI/BO/RST) against the voltage ranges given under the signal transmission direction. If a measured voltage deviates significantly, compare it with the value measured in the adjacent group before judging.
2. Phenomenon: EEPROM NG is displayed on the fixture LCD
Check whether U3 and surrounding components are soldered normally and whether the data cable contact is OK.
3. Phenomenon: The fixture LCD screen displays sensor NG, and the test reading temperature is abnormal. Follow the steps below to troubleshoot:
(A) Check the serial port log. If it reports sensor NG, check whether the U4 and U5 temperature sensor ICs and the adjacent SMD resistors and capacitors are soldered properly;
(B) If a temperature sensor IC appears faulty, check whether the 3.3V supply, which is output from the control board to J1, is normal.
4. Phenomenon: The fixture LCD screen displays INIT NG TEMP
The test reads abnormal inlet and outlet temperatures. Check whether the power supply is normal and whether the SMD resistors and capacitors are soldered properly.
5. Phenomenon: Incomplete chip detection on single board - PT1/PT2 (sweep) station
(a) LCD display ASIC NG: (0)
First measure the total domain voltage (about 13.2-13.5V) and the single domain voltage (about 0.6V). If the voltages are normal, use the short-circuit probe to short the RO test point to the 1.2V test point between the 1st and 2nd chips, and then run the chip search program. Check the serial port log. If 0 chips are still found, the cause is one of the following:
a-1) Use a multimeter to check that the 1.2V and 0.8V test points read 1.2V and 0.8V. If not, the 1.2V and 0.8V LDO circuits of this domain may be abnormal, or the two ASIC chips of this domain may be poorly soldered. Most cases are caused by short-circuited 0.8V and 1.2V SMD filter capacitors: measure the resistance of the related SMD filter capacitors on the front and back of the PCBA, or measure whether the chip VDDIO impedance to ground is abnormal. If it is abnormal, disconnect one of the chips. If the impedance or diode value is still abnormal after disconnecting the resistor, the corresponding chip can be removed;
a-2) Check whether the U1 & U2 circuit is abnormal, such as the resistor is poorly soldered, etc.;
a-3) After 1.2V and 0.8V are normal, measure the RO, RST, CLK, CI, and BI signals in turn to see if they are normal;
a-4) A 3.3V short circuit on the fixture, an abnormal fan or abnormal heat dissipation can burn out U4 and U5. If such defects occur, first measure whether the 3.3V-to-ground impedance is abnormal. If it is short-circuited, remove the short-circuited component first, and install the replacement components after the impedance returns to normal.
(b) If 1 chip can be found in step (a), the 1st chip and the circuits before it are OK, and the binary method can be used to locate the fault.
For example, short-circuit the 1.2V test point and the RO test point between the 38th and 39th chips. If the log finds 38 chips, the first 38 chips are fine and the fault generally lies after the 38th chip; if 0 chips are still found, first check whether the 1.2V is normal, then look for the fault among the first 38 chips. Continue with the binary method until the problematic chip is found.
Assume that the Nth chip has a problem: when the 1.2V and RO test points between the (N-1)th and Nth chips are short-circuited, N-1 chips can be found, but when those between the Nth and (N+1)th chips are short-circuited, 0 chips are reported, which means that the Nth chip is abnormal.
(c) LCD display ASIC NG: (X) - Fixed report of a certain chip
There are two situations:
(c-1) The first one: The test time is basically the same as the OK board, and the value of X usually does not change each time it is tested.
The test time refers to the time from pressing the start test button to the LCD displaying the result ASIC NG: (X).
This situation is most likely caused by abnormal soldering of the CLK, CI, and BO series resistors before and after the Xth chip, so focus on checking these 6 resistors. Less commonly, the chip pins of one of the three chips X-1, X, and X+1 are poorly soldered;
(c-2) The second one: There is no abnormality in the appearance of the chip, and the voltage signal is normal. It is a problem with the chip itself.
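The binary method from step (b) can be written down as an ordinary bisection. In the sketch below, `probe(k)` is a hypothetical stand-in for the manual action "short the 1.2V and RO test points between chip k and chip k+1, run the chip search, and read the chip count from the log"; it is assumed to return k when chips 1..k are all fine and 0 otherwise, and the board is assumed to have a single faulty chip. `simulated_probe` exists only to exercise the logic.

```python
from typing import Callable


def locate_faulty_chip(probe: Callable[[int], int], chip_count: int = 110) -> int:
    """Bisect for the faulty chip using short-and-search probes.

    Invariant: chips 1..good are known good; the fault is at or before chip `bad`.
    """
    good = 0
    bad = chip_count
    while bad - good > 1:
        mid = (good + bad) // 2
        if probe(mid) == mid:
            good = mid   # chips 1..mid answered, so the fault is after mid
        else:
            bad = mid    # 0 chips found, so the fault is at or before mid
    return bad


def simulated_probe(faulty_chip: int) -> Callable[[int], int]:
    """Hypothetical board model: shorting after chip k finds k chips only if k < faulty_chip."""
    def probe(k: int) -> int:
        return k if k < faulty_chip else 0
    return probe


print(locate_faulty_chip(simulated_probe(39)))
```

With 110 chips this needs at most 7 probe placements, compared with up to 109 when stepping chip by chip. Remember that a reading of 0 should first be checked against the 1.2V supply, as noted in step (b), before the chip itself is blamed.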
6. Phenomenon: Single board Pattern NG, i.e. the reply nonce data is incomplete (PT2 station)
Pattern NG occurs when the characteristics of some chips differ significantly from those of the other chips. The known causes are:
(1) The chip itself is defective (as found during trial production). In this case, simply replace the corresponding chip;
(2) The chip is tinned or poorly soldered (the nonce reply count of two chips in one domain is 0 or 1);
(3) The domain voltage of this domain is low while the 1.2V & 0.8V voltages are normal, and the chip itself is faulty;
(4) Multiple chips or an entire domain reply with abnormal nonces. Measure the domain voltages and start checking from the domain whose voltage is low.
Special attention: domain and ASIC numbers start from 0. When repairing or replacing chips, be sure to clean the residual flux from the surface. Otherwise, contact with the thermal conductive gel will be poor, causing abnormal heat dissipation.
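Because the logs count from 0 while the board designators run BM1 - BM110, an off-by-one is easy to make when picking the chip to replace. The sketch below converts a zero-based ASIC index into a designator and a zero-based domain index. It assumes that ASIC index i corresponds to BM(i+1) and that chips are assigned to domains consecutively, five at a time; neither assumption is confirmed by this guide, so verify against the board diagram before desoldering anything.

```python
CHIP_COUNT = 110
CHIPS_PER_DOMAIN = 5


def locate_asic(asic_index: int) -> dict:
    """Map a zero-based ASIC index from the log to a board designator and domain.

    ASSUMPTIONS: index i is chip BM(i+1); domains are filled in chip order.
    """
    if not 0 <= asic_index < CHIP_COUNT:
        raise ValueError(f"asic_index must be 0..{CHIP_COUNT - 1}, got {asic_index}")
    return {
        "designator": f"BM{asic_index + 1}",
        "domain_index": asic_index // CHIPS_PER_DOMAIN,  # zero-based, like the log
    }


print(locate_asic(0))
print(locate_asic(109))
```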
7. Phenomenon: The whole miner test reports R:1 with a bad log prompting "find x asic" (where x is less than 110)
It means that not all chips on the hash board can be found, and the board can be repaired as a PT1 chip abnormality. Chain0 represents the No. 1 hash board, Chain1 represents the No. 2 board (the middle one), and Chain2 represents the No. 3 board, which is the hash board next to the PSU.
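The chain numbering above is small enough to keep as a lookup table next to the bench. The names `CHAIN_TO_BOARD` and `board_for_chain` below are illustrative, and the mapping simply restates the text.

```python
# Chain index as shown in the miner log -> physical hash board.
CHAIN_TO_BOARD = {
    0: "No. 1 hash board",
    1: "No. 2 hash board (middle)",
    2: "No. 3 hash board (next to the PSU)",
}


def board_for_chain(chain_index: int) -> str:
    """Return the physical board for a chain index, or raise for unknown chains."""
    try:
        return CHAIN_TO_BOARD[chain_index]
    except KeyError:
        raise ValueError(f"unknown chain index: {chain_index}") from None


print(board_for_chain(2))
```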
8. Phenomenon: PT1 chip batch abnormality
Refer to the following pin diagram, measure the corresponding impedances with the ohm range of the multimeter, and record the results in a table.
VI. Schematic diagram of control board & failures caused by some problems.
1. The whole miner does not run
(1) Check whether the voltage of several voltage output points is normal. If 3.3V is short-circuited, disconnect the LDO first. If it is still short-circuited, remove the CPU and measure. For other voltage abnormalities, generally replace the corresponding LDO.
(2) If the voltage is normal, please check the soldering condition of DDR/CPU (X-RAY inspection at the production end).
(3) If the voltage is normal, try to update the flash program online.
2. The whole miner cannot find the IP
The control board is probably not running normally. Refer to point 1 for troubleshooting.
Check the appearance and soldering condition of the network port, network transformer T1, and CPU.
3. The whole miner cannot be upgraded
Check the appearance and soldering condition of the network port, network transformer T1, and CPU.
4. The whole miner fails to read the hash board or the chain is missing
A. Check the connection condition of the cable.
B. Check the parts of the corresponding chain of the control board.
C. Check the wave soldering quality of the power strip pins and the resistors around the plug-in interface.
5. U61 provides power for DDR.
CV1835 control board:
6. U8 is 1.8V LDO and U11 is 1.5V LDO
7. U16 is a 0.9V LDO
8. 3.3V LDO
9. Schematic diagram of bidirectional level converter
VII. Whole miner failure phenomenon
1. Whole miner initial test
Refer to the test process document. Problems such as assembly process problems, control board process problems and power supply abnormalities can generally be identified by cross-checking.
Common phenomena: IP cannot be detected, abnormal number of detected fans, abnormal chain detection.
If abnormalities occur during the test, repairs should be performed according to the monitoring interface and test LOG prompts. The repair methods for the whole miner initial test and aging test are the same.
2. Aging test:
(1) Fan display abnormality: Check whether the fan is working normally, whether the control board connection is normal, and whether the control board fan circuit is abnormal.
(2) Missing chain: Only 2 of the 3 hash boards can be identified.
This situation is usually due to a problem with the connection between the hash board and the control board. Check whether the cable is open-circuit. If the connection is OK, test the single board with PT1/PT2. If it passes, the problem is most likely with the control board. If it fails, repair it according to the PT1/PT2 repair method.
(3) Temperature abnormality: First check whether it is caused by high ambient temperature. In the monitoring system, the PCB temperature must not exceed 81℃ and the chip temperature must not exceed 98℃; if either limit is exceeded, the miner reports R:1 and cannot operate normally. Next, check whether the fan speed is abnormal, since low speed can also cause temperature abnormality. If the temperature of a single hash board is found to be abnormal, check whether the temperature sensors of that hash board are abnormal (the L9 has only two temperature sensor chips), and refer to the single board repair.
(4) The whole miner does not have complete chip inspection: disassemble the single board and retest PT1, and refer to the PT1 repair method for repair.
(5) There is no hash rate after running for a period of time, and the mining pool connection is interrupted. Check the network.
(6) Not all chips are found (OM firmware can start and run, but IM firmware cannot, and the hash rate under OM firmware is 2/3 or 1/3 of the normal value). Check the log to see whether the number of chips is insufficient. If chips are missing, refer to the PT1 & PT2 test and repair.
(7) Aging test status of normal good miner:
(8) The hash rate of the whole miner drops JX:1;
a. First check the miner operation page and the whole miner log to see whether any chip is marked with an X and reported in red. If 1 or more chips are confirmed, read the chip temperatures and the nonce replies through port 6060 on the miner's IP address to determine whether they are abnormal. If the temperature of a chip marked with an X is abnormal, check the heat dissipation first. If the domain voltage is abnormal, check whether the impedance of the corresponding chip is abnormal.
b. If the chip temperature and domain voltage are normal but one of the chips is reported as a bad ASIC: for boards that cannot be repaired, verify with a PT2 retest and feed the statistical data back to quality engineering for processing.
c. If an I2C error occurs while the whole miner is running, it is generally a cable contact problem. Restart the miner or re-plug the cable.
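The temperature limits from item (3) above can be checked mechanically when reviewing readings from the monitoring page. The sketch below assumes both values are in degrees Celsius and that a reading exactly at the limit is still acceptable, since the guide says the limits must not be exceeded; the function name is illustrative.

```python
PCB_LIMIT_C = 81    # maximum PCB temperature from the guide
CHIP_LIMIT_C = 98   # maximum chip temperature from the guide


def over_temperature_reasons(pcb_c: float, chip_c: float) -> list:
    """Return the limits that are exceeded; an empty list means both are within range."""
    reasons = []
    if pcb_c > PCB_LIMIT_C:
        reasons.append(f"PCB {pcb_c} exceeds {PCB_LIMIT_C}")
    if chip_c > CHIP_LIMIT_C:
        reasons.append(f"chip {chip_c} exceeds {CHIP_LIMIT_C}")
    return reasons


print(over_temperature_reasons(75, 90))
print(over_temperature_reasons(83, 99))
```

A non-empty result corresponds to the condition under which the guide says the miner reports R:1; ambient temperature and fan speed should still be ruled out before suspecting the sensors.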
3. After-sales repair, production repair PT1 test platform construction (the test requires the fan to blow air up and down to dissipate heat).
Refer to the above troubleshooting steps for each station. After maintenance, we can use the scan mode to test PT1 and then test PT2.
Ⅷ. Other precautions
Maintenance flow chart
1. Routine inspection: First, visually inspect the hash board to be repaired for PCB deformation or burning; if found, it must be dealt with first. Also look for obvious signs of burnt parts, parts knocked out of position, missing parts, etc. Second, after the visual inspection, test the impedance of each voltage domain to detect short circuits or open circuits; if found, they must be dealt with first. Third, check whether the voltage of each domain is about 0.6V.
2. After the routine inspection passes (short-circuit detection is a necessary part of routine inspection, to avoid burning chips or other materials when power is applied), test the chips with a test fixture and locate the fault based on the test results.
3. According to the display results of the test fixture test, start from the vicinity of the faulty chip and detect the chip test points (CI/RST/RO/CLK/BI) and voltages such as VDD0V8 and VDD1.2V.
4. According to the signal flow, the RO signal is transmitted in reverse (chip No. 110 to No. 1), while the CLK, CI, BI and RST signals are transmitted forward (No. 1 to No. 110); find the abnormal fault point by following the power supply sequence.
5. Once the faulty chip is located, it needs to be re-soldered. Add flux (preferably no-clean flux) around the chip and heat the solder joints of the chip pins until the solder melts, so that the chip pins and pads re-seat and the solder re-wets them, achieving the effect of re-tinning. If the fault remains after re-soldering, replace the chip directly.
6. A repaired hash board must pass the test fixture at least twice to be judged a good product. First pass: after replacing the parts, wait for the hash board to cool down, test it with the test fixture, and then set it aside to cool. Second pass: wait a few minutes until the hash board has completely cooled down, then test again.
7. After the hash board is repaired, relevant maintenance/analysis records must be made (repair report requirements: date, SN, PCB version, bit number, defect cause, defect responsibility, etc.). Maintenance reports are issued daily.
8. After recording, assemble the whole miner for routine aging.
9. Repaired good products on the production side must re-enter the flow line from the first production station (at least the appearance inspection and the PT1/PT2 test stations must be passed).
10. For a repaired hash board, the heat sink thermal gel must be re-applied before it re-enters the flow line (otherwise it will cause abnormal heat dissipation and temperature differences).
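The repair report fields listed in item 7 map naturally onto a small record type. The sketch below is one possible layout, not a prescribed format: the class names, field names and the example values are all hypothetical, and the duplicate-SN check assumes one record per SN within a work order, in line with the rule that SNs must not be reused.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RepairRecord:
    repair_date: date
    sn: str                     # hash board serial number
    pcb_version: str
    bit_number: str             # position of the replaced or reworked part
    defect_cause: str
    defect_responsibility: str


class WorkOrderLog:
    """Collects repair records for one work order and rejects repeated SNs."""

    def __init__(self) -> None:
        self._records = {}

    def add(self, record: RepairRecord) -> None:
        if record.sn in self._records:
            raise ValueError(f"SN {record.sn} already recorded in this work order")
        self._records[record.sn] = record

    def __len__(self) -> int:
        return len(self._records)


# Hypothetical example values, for illustration only.
log = WorkOrderLog()
log.add(RepairRecord(date(2025, 1, 2), "SN-EXAMPLE-0001", "V1.0",
                     "BM39", "poor soldering", "process"))
print(len(log))
```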