For yield management systems, the old computing adage "garbage in, garbage out" still rings true. Aligning and cleaning data remains a dirty business.
With the increasing value of data in the semiconductor supply chain, there are now essentially two supply chains running in parallel. One involves the physical product being created, while the other includes the data associated with each process step and manufacturing facility. To be successful, chip manufacturers need to tightly manage both.
At wafer, assembly, and test manufacturing facilities, data can be aligned between product test programs, steps, and facilities. Once done, that alignment can be managed with checks, repairs, and alerts in the data management infrastructure. Whenever data looks wrong, the system places a hold on it so an engineer can examine it and determine why.
Data wrangling (getting clean data to analyze) can be a huge time sink for engineers or data scientists looking to extract information from semiconductor data. In some cases, it can soak up 80% to 90% of their time. Efforts to streamline this process have been under development for nearly two decades. The initial target was the front end of the manufacturing process, but recently attention has shifted to the backend: wafer test, assembly, and packaging.
The backend remains the least understood area of semiconductor data analytics, in large part because it is tedious work involving data generators. For data scientists, the exciting stuff involves analysis, adaptive flows, and predictive models built upon the data.
“What’s not well understood is the complexity of, and methods to solve, data alignment across the supply chain,” said Greg Prewitt, director of Exensio solutions at PDF Solutions. “The genealogy of devices and respective data is important to the training of complex ML models. Complexities include the use of different identifiers at different steps, and reverse-BOM (bill of materials, one unit to many devices) and then BOM (many devices to one unit).”
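The reverse-BOM and BOM relationships Prewitt describes can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation, and all identifiers and fan-out counts are hypothetical.

```python
# A minimal sketch of device genealogy across the supply chain.
# All identifiers here are hypothetical illustrations.

# Reverse-BOM: one wafer fans out to many singulated die.
wafer_to_die = {
    "LOT42.W07": [f"LOT42.W07.D{i:03d}" for i in range(3)],  # 3 die for brevity
}

# BOM: many die are assembled into one multi-chip unit.
die_to_unit = {
    "LOT42.W07.D000": "UNIT-0001",
    "LOT42.W07.D001": "UNIT-0001",   # two die share one package
    "LOT42.W07.D002": "UNIT-0002",
}

def genealogy(unit_id: str) -> list[str]:
    """Trace a packaged unit back to its constituent die IDs."""
    return [die for die, unit in die_to_unit.items() if unit == unit_id]

print(genealogy("UNIT-0001"))  # ['LOT42.W07.D000', 'LOT42.W07.D001']
```

Training an ML model on packaged-unit outcomes requires exactly this kind of backward walk, so that wafer-level measurements can be attached to the right final unit.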
One of the big challenges involves how to structure different types of data. “In backend processes where Advantest has good visibility, data from one insertion to another insertion can have different formats, making it difficult to use across other insertions,” said Keith Schaub, vice president of technology and strategy for Advantest America. “The biggest thing we are doing is enabling the data to be used across multiple test insertions, and ultimately across the entire test lifecycle.”
That problem is being echoed across the data management industry. “There is just a lack of the common data formats for test cell and test program data,” said Shai Eisen, vice president of product marketing at proteanTecs. “If the industry converges upon common structure, this will help. There’s also a technical limitation with data. Just the parsing, translation, and augmentation of data, and the cleansing of data, are extremely resource- and time-consuming. It’s not just about a common data structure. It’s what you read, how you read and store it, how you format it and how you use it.”
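The parsing-and-translation burden Eisen describes typically means writing a per-source adapter that maps each insertion's format into one common record structure. The field names and units below are hypothetical; real flows would parse formats such as STDF rather than dictionaries.

```python
# Hypothetical sketch: two test insertions log the same measurement under
# different field names and units; per-source adapters normalize both into
# one common schema before loading.

RAW_WAFER_SORT = {"lot": "L42", "die_xy": "3,7", "vdd_mv": 812}      # probe format
RAW_FINAL_TEST = {"LotID": "L42", "Unit": "U-0099", "VDD_V": 0.812}  # FT format

def from_wafer_sort(rec: dict) -> dict:
    x, y = map(int, rec["die_xy"].split(","))
    return {"lot": rec["lot"], "id": f"{x},{y}", "insertion": "WS",
            "vdd_volts": rec["vdd_mv"] / 1000.0}   # normalize mV to V

def from_final_test(rec: dict) -> dict:
    return {"lot": rec["LotID"], "id": rec["Unit"], "insertion": "FT",
            "vdd_volts": rec["VDD_V"]}

common = [from_wafer_sort(RAW_WAFER_SORT), from_final_test(RAW_FINAL_TEST)]
# Both records are now comparable on the same keys and in the same units.
```

The adapter pattern is what a common industry data format would largely eliminate; until then, each new insertion format means another translation layer to write and maintain.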
Today, engineering teams spend considerable energy aligning and fixing manufacturing data produced in the back end. While this isn't strictly a technical barrier, done poorly it can render the data unusable. This was the impetus behind the master data concept in the IT world, which was developed to deal with siloed and inconsistent data in enterprise-level business systems. The process includes identifying primary sources for a particular type of information, assigning responsible parties for the associated data, and establishing procedures for the maintenance of the data.
Why test and assembly data is not aligned
Applying this concept to semiconductor manufacturing is proving more difficult, though.
“There are two primary drivers that break the bridging and aligning of data in the semiconductor manufacturing space today,” said Mike McIntyre, director of software product management at Onto Innovation. “The first occurs when material, or some component of material, is transferred between companies. The second break in the master data practice occurs when insufficient automation is in place to support the equipment and its data generation and collection needs.”
There are literally hundreds of process steps between an unpatterned wafer and the final packaged unit. Each of these is complex in its own right, which makes managing data a daunting task. On top of that, the foundry model puts a boundary between the design and manufacturing processes. And while IDMs may have an advantage in some areas, they also outsource some manufacturing steps.
And just to make matters more confusing, factories differ in naming conventions and operational practices. Sometimes this exists within the same company, a problem that is often made worse by repeated acquisitions.
Fig. 1: Fabless/foundry data supply chain in 2017. Source: GlobalFoundries
More problems stem from legacy equipment generating test programs. It's harder to align data in older assembly and test facilities than at wafer probe for several reasons. First, they have more material to track. Second, backend operations have more diverse steps in the overall material flow, so there are more opportunities for problems with data correctness, cleanliness, and alignment. And third, legacy equipment and less factory automation contribute to all of these problems.
Due to the nature of their business, high-mix/low-volume facilities have a much more challenging time with data alignment because there is less data consistency from product to product. Each product has a unique test program written by different engineers, and products often differ in test cell setup, ATE, loadboards, and handlers. In one day, a single test-cell site may change three to four times.
Non-aligned data impedes a yield engineer’s analysis across the supply chain. It also impedes a product engineer’s ability to use an adaptive test flow.
“How do we identify the database keys to properly link and correlate data from one process to the next? In spite of marketing hype from software companies, this is rarely a straightforward exercise,” said Wes Smith, CEO of Galaxy Semiconductor. “Understanding which products were processed on which machines which correspond to which process data is a daunting task. Often the only key available to link the data is the timestamp.”
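When the timestamp is the only available key, a common workaround is a nearest-preceding-time match within a tolerance window. The sketch below is a hypothetical illustration of that idea (tools such as pandas' `merge_asof` do the same thing at scale); the equipment names and times are invented.

```python
# Link lots to machines when a timestamp is the only shared key:
# match each lot start to the most recent preceding equipment-log entry.
from datetime import datetime, timedelta

equipment_log = [  # (timestamp, machine), sorted by time
    (datetime(2024, 5, 1, 8, 0), "ETCH-01"),
    (datetime(2024, 5, 1, 9, 30), "ETCH-02"),
]
lot_starts = [("LOT42", datetime(2024, 5, 1, 8, 5)),
              ("LOT43", datetime(2024, 5, 1, 9, 45))]

def machine_for(lot_time, log, tolerance=timedelta(hours=1)):
    """Return the machine whose log entry most recently precedes lot_time."""
    candidates = [(t, m) for t, m in log
                  if t <= lot_time and lot_time - t <= tolerance]
    return max(candidates)[1] if candidates else None

linked = {lot: machine_for(t, equipment_log) for lot, t in lot_starts}
print(linked)  # {'LOT42': 'ETCH-01', 'LOT43': 'ETCH-02'}
```

The tolerance window matters: too wide and lots get attributed to the wrong tool, too narrow and clock skew between systems drops valid matches, which is one reason timestamp-only joins are so fragile.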
Aligning data is tedious
No one can avoid the tedious work of correcting data alignment issues. It requires getting different stakeholders into the same room, and the effort can be measured in weeks or months.
“In many cases, these people are not in the same department or even in the same office,” said McIntyre. “It is also essential to have the objectives or goals for aligning the data to begin with. In the best of cases, a factory full of misalignment could be resolved in a matter of weeks. In some cases, it has taken months or quarters to resolve.”
Planned alignment often requires more effort up front. Michael Schuldenfrei, NI Fellow, shared an early data alignment exercise with a major customer: “From a certain date and onward, they actually laid out a set of instructions for all new products to ensure the data is logged consistently. We sat with them on a field-by-field basis and set up rules to catch every possible changing value that was outside of the accepted format, or outside of the relevant lookup.”
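The field-by-field rules Schuldenfrei describes amount to a format check or lookup per field, with any failing record flagged for review. The specific field names, formats, and allowed values below are hypothetical illustrations.

```python
# Hypothetical sketch of field-by-field data-logging rules: each field
# gets a format rule or a lookup, and violations are flagged rather
# than loaded silently.
import re

RULES = {
    "lot_id": lambda v: isinstance(v, str)
                        and bool(re.fullmatch(r"[A-Z]\d{4}\.\d", v)),  # assumed format
    "step":   lambda v: v in {"WS", "FT", "SLT"},                      # assumed lookup
    "temp_c": lambda v: isinstance(v, (int, float)) and -55 <= v <= 155,
}

def violations(record: dict) -> list[str]:
    """Return the fields that break their rule (a missing field counts too)."""
    return [f for f, ok in RULES.items() if f not in record or not ok(record[f])]

good = {"lot_id": "A1234.1", "step": "FT", "temp_c": 25}
bad  = {"lot_id": "a1234",   "step": "FT", "temp_c": 200}
print(violations(good))  # []
print(violations(bad))   # ['lot_id', 'temp_c']
```

Encoding the rules once means every incoming file is checked the same way, instead of each engineer rediscovering the same formatting surprises downstream.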
Despite the new mandate, the data did not always meet the required level of consistency. This resulted in product groups making changes to existing published test programs. “That was a very tedious problem because you discovered it too late in the game,” Schuldenfrei said. “Eventually they started integrating processes in the way they develop their test programs to ensure that when the test program is published, it already met the rules for data logging.”
Putting data files on hold
Yield management systems add automation to manage customer master data. As previously described, there’s a lot of investment in such efforts to encode valid data in each field and to check that the proper files have been sent. When data is not correct, that triggers an alert.
The vast majority of alignment, once the basic initial work is completed, can be monitored and controlled through automation. Still, setting it up for a fabless company with multiple subcontractors in the supply chain isn't simple. It takes time to figure out all the misalignments and why they occur.
“Typically with big customers (millions of wafers, hundreds of products) it takes us about two months to go through that learning process,” said Paul Simon, director of analytics for silicon lifecycle management at Synopsys. “Then we have very clean data and only half a percent or so of the data goes on hold, and we can handle that manually. In our YMS platform, we have a data integration platform that the customer doesn’t see. It cleans up all the data issues. For every product we set it up so that before data goes into the database, we check for completeness and accuracy. If there is an issue, the data integration platform automatically repairs the data. If we cannot repair it automatically, then that data goes on hold. An engineer tries to identify the root cause of the data problem, then updates the data and the recipe to reflect how this specific data problem should be handled.”
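The check / auto-repair / hold flow Simon describes can be sketched as a small state machine. This is a hypothetical illustration, not the Synopsys implementation; the fields and the repair recipe (recovering a lot ID from a filename) are invented.

```python
# Minimal sketch of a check -> auto-repair -> hold pipeline for incoming
# test data files. Fields and repair rules are hypothetical.

def process(record: dict, repairs: dict):
    """Validate a record, auto-repair known issues, else place it on hold."""
    if record.get("lot_id") and record.get("test_data") is not None:
        return ("loaded", record)
    fixed = dict(record)
    for field, fix in repairs.items():        # recipe of known repairs
        if not fixed.get(field):
            fixed[field] = fix(fixed)
    if fixed.get("lot_id") and fixed.get("test_data") is not None:
        return ("repaired", fixed)
    return ("hold", record)                   # engineer investigates root cause

# Example repair recipe: recover a missing lot ID from the file name.
repairs = {"lot_id": lambda r: r.get("filename", "").split("_")[0] or None}

print(process({"lot_id": "", "filename": "LOT42_ft.stdf", "test_data": [1]},
              repairs)[0])                                       # repaired
print(process({"lot_id": "", "filename": "", "test_data": None},
              repairs)[0])                                       # hold
```

The key design point is that the recipe grows over time: each manually resolved hold teaches the pipeline a new automatic repair, which is how the hold rate falls to the half-percent level Simon mentions.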
Simon noted that during this learning process engineers can find the root cause of bad data and send it back to the data suppliers so they can fix it. Examples include entering the wrong lot number, receiving wrong or incomplete files, and missing test data.
But because of legacy systems, not all data problems can be fixed, so engineers don't use that data. When it comes to making decisions based on manufacturing data, incomplete but accurate data beats complete but inaccurate data.
Improving data alignment
Mapping data attributes between factory test steps, and between manufacturing and product groups, often occurs after the fact. Product, factory, and IT engineering organizations also can tackle this up front by aligning on data, business, and operational processes.
Material identifier standards help here. “One of the classic challenges is the lack of universal data schema to effectively harmonize data and build unique identifiers,” said Ram Shanmugasundram, director of analytics at Amkor Technology. “We are utilizing SEMI standards. As an example, E142 has a specification for substrate mapping, but it does not define material. We expect this standard to get updated soon to accommodate modern semiconductor unit-level traceability (ULT) needs.”
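Unit-level traceability ultimately rests on composing one globally unique identifier from the hierarchy of IDs available at each step. The sketch below illustrates the idea; the field order, separator, and formats are hypothetical and are not drawn from SEMI E142.

```python
# Hypothetical sketch of composing a unit-level traceability (ULT)
# identifier from a die's genealogy. Field layout is illustrative only.

def make_ult(fab: str, lot: str, wafer: int, x: int, y: int) -> str:
    """Compose one globally unique die identifier from its hierarchy of IDs."""
    return f"{fab}-{lot}-W{wafer:02d}-X{x:03d}Y{y:03d}"

ult = make_ult("FAB7", "LOT42", 7, 12, 34)
print(ult)  # FAB7-LOT42-W07-X012Y034
```

The value of a standard here is that every party in the supply chain composes and parses the identifier the same way, so a packaged unit can be traced back to its die coordinates without per-partner translation tables.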
When converging on factory or product group business processes (aka operational practices), engineering teams need to consider leaner processes, meaning simpler ones with fewer steps. Sunil Narayanan, senior director of applied intelligence solutions at GlobalFoundries, shared his experience comparing and aligning data between various factories, and the natural attempts to share BKMs (best-known methods).
“You want to bring your BKM, but the business processes are not always the same,” Narayanan said. “Unless you align on the fundamental business process to a great extent, technically you can put only so much effort to standardize things.”
Alignment provides an opportunity to simplify prior to automation or making automation improvements, which is important. “Otherwise, if you try to automate it before you standardize, you are going to carry over problems from the business side to the technical side,” Narayanan said. “That means you will need to continue to fix it. You have increased the amount of technical debt to carry.”
Yet with multiple data sources in the factory generating enormous amounts of data, petabytes per day in a fab, how do you figure out where your master data issues are in the first place?
“In the past few years, we have placed emphasis on master data management,” he said. “We have a separate organization chartered, but when the volume of data is so high, an organization alone cannot fix that problem. Our data is scattered across multiple domains and a variety of data storage systems. A single organization cannot tackle our MDM challenges unless they have tools to show them where the problem is and where things are broken. We have a major initiative going on called data cataloging. By leveraging technology, we are trying to automate many of those things that clearly will show the data lineage, from where the data originated, where data injection happened, and where the data ended up.”
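The lineage a data catalog tracks can be modeled as an ordered log of hops from origin to destination. This is a hypothetical sketch of that record-keeping, not GlobalFoundries' system; all dataset and system names are invented.

```python
# Minimal sketch of data-lineage records a catalog might keep: where each
# dataset originated, where it was ingested, and where it landed.

lineage: list[dict] = []

def record_hop(dataset: str, source: str, destination: str) -> None:
    """Append one hop of a dataset's journey to the lineage log."""
    lineage.append({"dataset": dataset, "from": source, "to": destination})

record_hop("ft_results_LOT42", "tester/ATE-09", "ingest/landing-zone")
record_hop("ft_results_LOT42", "ingest/landing-zone", "warehouse/yield_db")

def trace(dataset: str) -> list[str]:
    """Reconstruct the path a dataset took, in order of recording."""
    hops = [h for h in lineage if h["dataset"] == dataset]
    return [hops[0]["from"]] + [h["to"] for h in hops] if hops else []

print(trace("ft_results_LOT42"))
# ['tester/ATE-09', 'ingest/landing-zone', 'warehouse/yield_db']
```

With lineage captured this way, a broken field can be traced back hop by hop to the system that introduced it, which is what makes master data problems findable at petabyte scale.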
The semiconductor manufacturing industry continues to make strides in master data management, as well as in delivering cleaner data to engineers who need to figure out yield excursions, quality issues, and equipment problems. The challenges that remain are more operational than technical. Still, there is widespread agreement among analytics and automation professionals that more standardization would help the industry. That could include test data standards, factory automation standards, and better engineering discipline to streamline operations and agree on nomenclature. The more this happens, the more the burden of aligning data is reduced.
Meanwhile, when a fabless company or factory brings in a yield management system, the process reveals its own contributions to misalignment. “When you bring in a system, customers begin to appreciate the problem they have in aligning the data,” said Schuldenfrei. “It tends to push them into projects to improve their pre-fact alignment. They begin to understand there’s a lot more they could be doing, a lot more value they could be generating, if only they did more work on aligning things before as part of their process.”
Yield management systems and manufacturing automation solutions together can enable engineering teams by supporting master data systems, data visualizations and predictive models. Predictive models cannot perform well without complete, accurate and aligned data. The effort to align data requires the painstaking detective work of master data management processes.