Unlocking Automated Datapath Gating via Machine Learning Power Prediction
Design Automation Conference (DAC) · 2026
Senior Staff R&D Engineer at Synopsys
I am a Senior Staff Engineer in Research and Development within Synopsys' Design Technology Group in Sunnyvale, California, where I work on the next generation of logic synthesis technologies.
My work focuses on Electronic Design Automation (EDA), logic synthesis, and optimization algorithms for digital circuits. During my Ph.D. at EPFL's Integrated Systems Laboratory (LSI), advised by Prof. Giovanni De Micheli, I developed state-of-the-art synthesis methods for advanced technologies—spanning conventional CMOS for FPGAs and standard-cell designs, as well as superconducting electronics such as the Adiabatic Quantum-Flux Parametron (AQFP) and Rapid Single-Flux Quantum (RSFQ). My doctoral research was recognized with the 2025 ACM SIGDA Outstanding Dissertation Award in EDA, along with other best paper awards and nominations.
Much of my work is open-sourced in the logic synthesis library Mockturtle, which I help maintain. I am also a maintainer of the EPFL Combinational Benchmark Suite and its associated contest, whose best results are presented annually at the International Workshop on Logic & Synthesis (IWLS).
International Workshop on Logic & Synthesis (IWLS) · 2025
Ashenhurst-Curtis decomposition (ACD) is a Boolean decomposition technique widely used in logic synthesis for tasks such as the decomposition of multi-valued relations, the encoding of multi-valued networks, and technology mapping into standard cells for ASICs and lookup tables (LUTs) for FPGAs. A recent truth-table-based implementation of ACD has proven effective for delay-driven LUT mapping while reducing the number of lookup tables, but it does not leverage the flexibility provided by don't-care conditions. In this paper, we enhance ACD by incorporating controllability don't-cares extracted from cuts. Exploiting these additional degrees of freedom, the proposed method increases the decomposition success rate of practical functions into 6-LUTs from 51% to 53.4% and lowers the average number of LUTs per decomposition from 2.50 to 2.46, with even larger gains for large fixed free sets, at only a 1.5x runtime overhead.
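To make the role of don't-cares concrete, here is a minimal Python sketch of the column-multiplicity test that underlies Ashenhurst-Curtis decomposition. It is an illustration only, not the paper's implementation. The function names, the truth-table encoding (bit `i` of the index is variable `i`), the hand-written care mask, and the greedy merging strategy are all assumptions made for this example. In the paper the don't-cares are controllability don't-cares extracted from cuts, which this sketch does not model.

```python
from itertools import product

def columns(f, care, n, bound):
    """One column per bound-set assignment. Each column lists the function
    value for every free-set assignment, or None where the value is a
    don't-care. f and care are lists of length 2**n."""
    free = [i for i in range(n) if i not in bound]
    cols = []
    for b in product((0, 1), repeat=len(bound)):
        col = []
        for a in product((0, 1), repeat=len(free)):
            x = [0] * n
            for var, val in zip(bound, b):
                x[var] = val
            for var, val in zip(free, a):
                x[var] = val
            idx = sum(v << i for i, v in enumerate(x))
            col.append(f[idx] if care[idx] else None)
        cols.append(tuple(col))
    return cols

def compatible(c1, c2):
    """Two columns are compatible if they agree wherever both are specified."""
    return all(a is None or b is None or a == b for a, b in zip(c1, c2))

def merge(c1, c2):
    """Combine two compatible columns, keeping every specified value."""
    return tuple(b if a is None else a for a, b in zip(c1, c2))

def column_multiplicity(cols):
    """Greedy grouping of compatible columns. Exact when there are no
    don't-cares; otherwise an upper bound on the minimum multiplicity."""
    classes = []
    for col in cols:
        for i, rep in enumerate(classes):
            if compatible(rep, col):
                classes[i] = merge(rep, col)
                break
        else:
            classes.append(col)
    return len(classes)

# Hypothetical 3-input function, bound set {x0, x1}, free set {x2}.
f = [0, 0, 1, 1, 0, 1, 0, 1]
all_care = [1] * 8
some_dc = [1, 1, 1, 1, 0, 1, 0, 1]  # minterms 4 and 6 are don't-cares

mu_full = column_multiplicity(columns(f, all_care, 3, [0, 1]))
mu_dc = column_multiplicity(columns(f, some_dc, 3, [0, 1]))
```

In this toy case the fully specified function has four distinct columns, while the two don't-care minterms let the columns merge into two classes. A lower multiplicity means fewer bound-set output functions are needed, which is the kind of extra freedom the abstract refers to.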
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems · 2025
This paper addresses the challenge of reducing the number of nodes in lookup table (LUT) networks, with two significant applications: minimizing node count to meet FPGA resource constraints, and area-oriented design space exploration for standard-cell designs, where collapsing a circuit into a LUT network, restructuring it, and remapping helps escape local minima. State-of-the-art substitution algorithms for LUT networks rely heavily on SAT solving, which limits the number of optimization attempts and restricts substitution sub-networks to a single node. In contrast, our method relies on circuit simulation to increase the number of substitution candidates and enables substitutions with more than one node. Experimental results show the method identifies optimization opportunities overlooked by other methods, improving 11 out of 23 best-known results in the EPFL synthesis competition and yielding a 3.46% area reduction compared to the state-of-the-art.
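The idea of using simulation to screen substitution candidates can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm from the paper. The network encoding, the function names `simulate` and `try_resubstitute`, and the example circuit are all hypothetical. The check asks whether a target node can be re-expressed as a single LUT over a chosen set of existing signals (divisors): it can whenever no two simulation patterns agree on the divisors but disagree on the target.

```python
from itertools import product

def simulate(inputs, luts, patterns):
    """Return {signal: [value per pattern]}. luts is a topologically ordered
    list of (name, fanins, truth_table), where bit i of truth_table is the
    output for the fanin assignment whose j-th fanin is bit j of i."""
    values = {name: [p[i] for p in patterns] for i, name in enumerate(inputs)}
    for name, fanins, table in luts:
        out = []
        for k in range(len(patterns)):
            idx = sum(values[f][k] << j for j, f in enumerate(fanins))
            out.append((table >> idx) & 1)
        values[name] = out
    return values

def try_resubstitute(values, target, divisors):
    """Return a partial truth table {divisor assignment: target value} if the
    simulated patterns show no conflict, else None. Unseen divisor
    assignments are simply absent and can be treated as don't-cares."""
    table = {}
    for k in range(len(values[target])):
        key = tuple(values[d][k] for d in divisors)
        bit = values[target][k]
        if table.setdefault(key, bit) != bit:
            return None
    return table

# Hypothetical network: t = (a AND b) OR (b AND c), and n3 = a OR c exists.
inputs = ["a", "b", "c"]
luts = [
    ("n1", ["a", "b"], 0b1000),    # AND
    ("n2", ["b", "c"], 0b1000),    # AND
    ("n3", ["a", "c"], 0b1110),    # OR
    ("t", ["n1", "n2"], 0b1110),   # OR
]
patterns = list(product((0, 1), repeat=3))  # exhaustive for 3 inputs
values = simulate(inputs, luts, patterns)

good = try_resubstitute(values, "t", ["b", "n3"])  # t = b AND n3
bad = try_resubstitute(values, "t", ["a", "c"])    # t also depends on b
```

Here `t` can be rebuilt as one LUT over `b` and `n3`, which would make `n1` and `n2` redundant if nothing else uses them. The patterns are exhaustive in this toy case, so the result is exact. With random or partial patterns, a conflict-free result is only a candidate and would still need formal verification.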
Asia and South Pacific Design Automation Conference (ASP-DAC) · 2025
Quantum oracle synthesis involves compiling arbitrary Boolean functions into quantum circuits using the gates supported by the target quantum computer. In fault-tolerant quantum computing, these gates (e.g., the Clifford+T library) must be further expressed by logical quantum error correction (QEC) code operations, a process known as back-end compilation. This paper enhances current XAG-based oracle synthesis techniques by establishing a link between the properties of XOR-AND-inverter graphs (XAGs) and the quality measures of back-end-compiled quantum oracles. This connection unlocks additional optimization opportunities: experimental results demonstrate average reductions of 4.49% in T count, 7.00% in logical time steps, and 14.89% in helper qubit count.
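As a rough illustration of how XAG properties can serve as proxies for oracle cost, the Python sketch below counts AND gates and AND-depth in a small XAG. It is not the paper's method or cost model. The gate encoding and the function name `xag_metrics` are hypothetical. The conversion factor of 4 T gates per AND is an assumption taken from commonly cited constructions in the literature. Inverters are ignored, and the back-end-compiled measures from the abstract (logical time steps, helper qubits) are not modeled.

```python
def xag_metrics(num_inputs, gates):
    """gates is a list of (op, a, b) with op in {"and", "xor"}. Signals
    0..num_inputs-1 are primary inputs; gate i defines signal num_inputs + i.
    Returns (number of AND gates, maximum AND-depth over all signals)."""
    and_depth = [0] * num_inputs
    and_count = 0
    for op, a, b in gates:
        d = max(and_depth[a], and_depth[b])
        if op == "and":
            and_count += 1
            d += 1  # only AND gates contribute to the depth measure
        and_depth.append(d)
    return and_count, max(and_depth, default=0)

# Two XAGs for the 3-input majority function over inputs 0, 1, 2.
naive = [
    ("and", 0, 1), ("and", 0, 2), ("and", 1, 2),  # ab, ac, bc
    ("xor", 3, 4), ("xor", 6, 5),                 # ab ^ ac ^ bc
]
optimized = [
    ("xor", 0, 1), ("xor", 0, 2),  # a ^ b, a ^ c
    ("and", 3, 4),                 # (a ^ b)(a ^ c)
    ("xor", 5, 0),                 # ... ^ a
]

T_PER_AND = 4  # assumed cost per AND gate, not taken from the paper
naive_t = T_PER_AND * xag_metrics(3, naive)[0]
optimized_t = T_PER_AND * xag_metrics(3, optimized)[0]
```

Both graphs compute the same function, but the second uses one AND gate instead of three, so under the assumed cost model its estimated T count is a third of the first.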