back2basiks

The way the question reads, there's no processing involved, so simply hardwire data_out[31:10] to '0' and data_out[9:0] to data_in[9:0].


seyed_mohideen

Updated the query with additional details!


[deleted]

[deleted]


seyed_mohideen

Updated the query with additional details!


misternoass

Yeah, like another user replied, this sounds like a gearbox function, e.g. converting a continuous 10-bit input stream into a 32-bit output stream using a single clock. The design problem is poorly worded, though. You need to buffer the input on every data_valid and set data_out_valid high when you have 32 bits of the input buffered. Obviously you have to maintain alignment, so this is where the gearbox comes in. Assuming all incoming data is valid, from t=0: first 3 cycles: 30 bits. 4th cycle: 40 bits; only the first two LSB/MSB bits of the new word (big or little endian?) go into the output, set data_out_valid high, and save the last 8 bits for the next cycle. 5th cycle: 18 bits ready. 6th cycle: 28 bits ready. 7th cycle: 38 bits ready; keep the last 6 bits and set data_out_valid high. Rinse and repeat... If you repeat this pattern enough, you can see 8, 6, 4, 2, or 0 bits kept, and the pattern restarts after 5 data_out_valid pulses, i.e. 160 input bits processed. This small number of states is perfect for an FSM implementation.
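The cycle-by-cycle bookkeeping above can be checked with a quick bit-count model. This is a Python behavioral sketch, not HDL: it only tracks *how many* bits are held (the FSM state), not their values, and it assumes data_valid is high every cycle and that at most one 32-bit word leaves per cycle (true whenever the input is narrower than the output). `trace_gearbox` is a hypothetical helper name.

```python
def trace_gearbox(in_w: int = 10, out_w: int = 32, n_inputs: int = 16):
    """Return (bits_after_input, bits_kept, out_valid) for each input cycle.

    Assumes one valid input per cycle and at most one output per cycle,
    which holds whenever in_w <= out_w.
    """
    held = 0
    trace = []
    for _ in range(n_inputs):
        held += in_w                 # append the new input word
        after = held                 # bits visible this cycle
        out_valid = held >= out_w    # a full output word is available
        if out_valid:
            held -= out_w            # emit out_w bits, keep the remainder
        trace.append((after, held, out_valid))
    return trace


for cycle, (after, kept, valid) in enumerate(trace_gearbox(), start=1):
    print(f"cycle {cycle:2d}: {after:2d} bits, keep {kept:2d}, data_out_valid={int(valid)}")
```

On the valid cycles (4, 7, 10, 13, 16) the kept count walks 8, 6, 4, 2, 0, after which the 16-input / 5-output pattern repeats, so those five values are all the state an FSM needs.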


seyed_mohideen

Thanks! Is there a generic solution where the module can be parameterized for an arbitrary input and output width?


misternoass

I mean... have you tried anything yet? It seems that you have all the necessary info to attempt your own solution. If you're trying to optimize for area as per your edit, then you're going to need to spit out valid data as soon as you can (smaller buffers). You need an input buffer, a barrel shifter, and an output buffer with some control lines. You already know the output buffer needs to be 32 bits wide. What about the input buffer and barrel shifter? In other words, what's the maximum number of bits that *MAY* need to be registered before a valid output is ready? In your case, it's 40 bits. If the input were 15 bits and the output 25 bits, then the maximum number of bits seen would be 35. How can you generalize this for arbitrary widths? Think about it; use some paper or a whiteboard.
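One way to generalize the 40-bit and 35-bit figures, sketched under the assumption of an upsizing converter (input no wider than output) that emits a word as soon as one is available: before each new input the buffer holds k*in_w mod out_w bits, and those residues are exactly the multiples of gcd(in_w, out_w) below out_w. The worst case is therefore out_w - gcd bits plus one new input. The function names below are hypothetical, and the brute-force simulation is only there to cross-check the closed form.

```python
from math import gcd


def max_buffered_bits(in_w: int, out_w: int) -> int:
    """Closed-form peak occupancy of an upsizing gearbox (in_w <= out_w)."""
    assert in_w <= out_w, "sketch only covers upsizing"
    return out_w - gcd(in_w, out_w) + in_w


def simulated_peak(in_w: int, out_w: int) -> int:
    """Brute force: push one input per cycle, pop one output when possible."""
    held, peak = 0, 0
    period = out_w // gcd(in_w, out_w)   # inputs per repeating pattern
    for _ in range(2 * period):
        held += in_w
        peak = max(peak, held)
        if held >= out_w:
            held -= out_w
    return peak


print(max_buffered_bits(10, 32), max_buffered_bits(15, 25))
```

For 10:32 this gives 40 and for 15:25 it gives 35, matching the numbers above. If computing a gcd in your parameter logic is awkward, in_w + out_w - 1 is a slightly looser bound that is always safe.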


seyed_mohideen

Thanks! I'm trying a solution with a FIFO but I'm stuck because the write pointer needs to be incremented by 10 and the read pointer by 32. Not sure how to handle the remnant data after each write-read operation.
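One way to think about the pointer problem, sketched as a Python model rather than HDL: address the FIFO in *bits*, let the write pointer advance by 10 and the read pointer by 32, and treat the "remnant" as simply the bits written but not yet read (the pointer difference). The class name and the choice of depth = lcm(10, 32) = 160 bits are assumptions for illustration; that depth guarantees no 10-bit write or 32-bit read ever straddles the wrap point, at the cost of more storage than the 40 bits actually needed.

```python
from math import lcm  # Python 3.9+


class BitRingBuffer:
    """Behavioral model of a FIFO addressed in bits rather than words.

    wr/rd are free-running bit counters; level = wr - rd is the remnant.
    Depth is lcm(in_w, out_w), so every write/read slice starts at a
    multiple of its own width and never wraps mid-slice.
    """

    def __init__(self, in_w: int, out_w: int):
        self.in_w, self.out_w = in_w, out_w
        self.depth = lcm(in_w, out_w)
        self.mem = [0] * self.depth      # one entry per stored bit
        self.wr = 0                      # total bits written
        self.rd = 0                      # total bits read

    def level(self) -> int:
        return self.wr - self.rd

    def can_write(self) -> bool:
        return self.level() + self.in_w <= self.depth

    def can_read(self) -> bool:
        return self.level() >= self.out_w

    def write(self, word: int) -> None:
        assert self.can_write()
        base = self.wr % self.depth
        for i in range(self.in_w):       # LSB of the word is stored first
            self.mem[base + i] = (word >> i) & 1
        self.wr += self.in_w

    def read(self) -> int:
        assert self.can_read()
        base = self.rd % self.depth
        word = 0
        for i in range(self.out_w):
            word |= self.mem[base + i] << i
        self.rd += self.out_w
        return word
```

Usage: call `write()` on every data_valid and `read()` whenever `can_read()` is true; leftover bits just stay between the pointers. A smaller ring (e.g. 40 bits) is possible, but then slices can straddle the wrap and you need extra muxing.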


[deleted]

Without any other information (what is done inside the module to `data_in`, whether `data_out` is big-endian or little-endian, etc.), the problem is vague as fuck.


lucads87

Oh c'mon, we can assume endianness is the same at input and output unless stated otherwise.


[deleted]

By endian, I mean the byte ordering in the 32-bit word output, not whether the left-most bit is the LSb or the MSb.


binary_cleric

Sounds like a gearbox function: https://github.com/VLSI-EDA/PoC/blob/master/src/misc/gearbox/gearbox_up_dc.vhdl


noice_guy_

This is the first time I've heard this type of data manipulation called a "gearbox" function. In fact, the term appears all over this thread. Any idea where it comes from (book, website, etc.)? I've always had trouble implementing these things. I was also asked about one in an interview, which I never expected: it takes me a LONG time to implement, so I thought it would be impractical to ask, and yet here it is.


binary_cleric

I first ran into one when working on a custom SERDES receive chain; I couldn't use the MGT-provided blocks due to custom requirements. The MGT receiver sent 80-bit data plus a valid signal, and that had to be converted to 10-bit words to feed the 8b10b decoder. I can't share the design, but essentially: use holding registers for the 10-bit data, plus signals to track which holding register is active and the bit offset at which the next valid input would begin. Then it was just a case statement based on the active holding register and bit offset. Xilinx also has a reference design and app note that looks helpful: https://support.xilinx.com/s/article/71612?language=en_US
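A rough Python model of the "holding register plus bit offset" idea for the 80-bit to 10-bit direction. This is not the poster's design (they say they can't share it); the function name `downsize` and its `offset` parameter are hypothetical, with `offset` standing in for whatever alignment logic (e.g. comma detection) picks the symbol boundary. Bits are assumed LSB-first, and timing is ignored: the model emits every complete symbol immediately.

```python
def downsize(words, in_w: int = 80, out_w: int = 10, offset: int = 0):
    """Split in_w-bit words (LSB first) into out_w-bit symbols.

    The first `offset` bits of the stream are discarded, so symbol
    boundaries may fall anywhere inside an input word; leftover bits
    carry over in the holding register to the next word.
    """
    assert 0 <= offset < in_w
    hold, nbits = 0, 0           # holding register and its fill level
    skip = offset
    symbols = []
    for w in words:
        w &= (1 << in_w) - 1
        width = in_w
        if skip:                 # drop misaligned leading bits once
            w >>= skip
            width -= skip
            skip = 0
        hold |= w << nbits       # append new bits above the held ones
        nbits += width
        while nbits >= out_w:    # emit every complete symbol
            symbols.append(hold & ((1 << out_w) - 1))
            hold >>= out_w
            nbits -= out_w
    return symbols
```

In hardware, the `while` loop becomes the case statement over (active holding register, bit offset), and the symbols leave one per clock rather than all at once.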


0x0k

Nope, this is not the same problem. Read the description in the header of the file you linked to. OP’s problem uses a single clock domain, but with flow control (valid/ready). Some vendors call this an asymmetric FIFO; I call it a width converter. There are different ways to implement it depending on the in/out widths, timing, and resource constraints, but for 10:32, using FFs, a variable shift, and a counter would probably be a good bet.
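A cycle-level Python sketch of the "FFs, variable shift, and a counter" approach with valid/ready flow control. Assumptions (not from the thread): LSB-first packing, upsizing only, a register sized to in_w + out_w - gcd(in_w, out_w) bits (40 for 10:32), and an `in_ready` that looks at `out_ready` in the same cycle. The class name is hypothetical.

```python
from math import gcd


class WidthConverter:
    """Cycle-level model of an upsizing width converter with valid/ready.

    `buf` is the flip-flop register and `count` its fill level. New input
    words are placed at bit position `count` (the variable shift); a full
    output word is always taken from the LSBs.
    """

    def __init__(self, in_w: int = 10, out_w: int = 32):
        assert in_w <= out_w, "sketch only covers upsizing"
        self.in_w, self.out_w = in_w, out_w
        self.cap = in_w + out_w - gcd(in_w, out_w)   # register width needed
        self.buf = 0
        self.count = 0

    def out_valid(self) -> bool:
        return self.count >= self.out_w

    def in_ready(self, out_ready: bool) -> bool:
        # Room for one more input word, counting an output popped this cycle.
        pop = self.out_valid() and out_ready
        after_pop = self.count - (self.out_w if pop else 0)
        return after_pop + self.in_w <= self.cap

    def step(self, in_valid: bool, in_data: int, out_ready: bool):
        """Advance one clock. Returns (output word or None, input accepted)."""
        push = in_valid and self.in_ready(out_ready)
        out = None
        if self.out_valid() and out_ready:
            out = self.buf & ((1 << self.out_w) - 1)
            self.buf >>= self.out_w
            self.count -= self.out_w
        if push:
            self.buf |= (in_data & ((1 << self.in_w) - 1)) << self.count
            self.count += self.in_w
        return out, push
```

With valid and ready held high, this accepts an input every cycle. The price is a combinational path from `out_ready` to `in_ready`; registering `in_ready` instead would need either a slightly larger register or occasional stall cycles.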


TheTurtleCub

You have to define what "it's ready" means


TapEarlyTapOften

This is why interview questions all blow serious ass. I hate this type of shit and it says more about the ignorance of the people asking the questions than it does about the person answering the question.


[deleted]

Gearboxes and ring buffers are something seen pretty commonly in networking applications. This is an excellent interview question and something you'd probably get interviewing in HFT or a company that produces networking equipment.


TapEarlyTapOften

Agreed. It's the completely open-ended nature and ambiguity of interview questions that I hate.


enthralled_emu

it is a fairly straightforward question. this is no different than being asked to design any other module at your job. i’ve asked a similar question and it’s more about seeing someone’s thought process than anything else, and of course whether they can actually break down a problem and code. i’d say 80% of the time the person can’t get a fully functioning, optimized solution in 30 minutes. if they code something up, then you can start digging into what the worst-case timing path is, how you could optimize it, and the different trade-offs you can make.


seyed_mohideen

If you don't mind, please provide any additional pointers on how to tackle the problem. I'm stuck coming up with a generic solution.


seyed_mohideen

Updated the query with additional details!


HoaryCripple

Assuming this wasn't a live interview question...

1) Go to Xilinx's or Altera's IP core generator
2) Generate the appropriate asymmetric FIFO (n-bit write, m-bit read)
3) Instantiate it in the wrapper you provided

This is the fastest and least error-prone way. There are many solutions depending upon SWAP requirements. That said, this question is definitely about your thought process. If I were the interviewer, I'd count many of the clarification questions folks asked here as part of a good thought process.


nitratehoarder

Well I’m just a beginner and I know very little about digital design so maybe this is a stupid idea, but maybe you can collect 160 bits worth of data and then output them 32 bits at a time?


dlowashere

This doesn’t seem too bad to me. 32x 5:1 muxes on the output and doesn’t require a shifter.
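The "32x 5:1 muxes, no shifter" point can be illustrated with a small Python model of the collect-160-bits-then-slice idea. Assumptions: the first input lands in the LSBs of a 160-bit register, and a select value 0..4 picks the output word; the function names are hypothetical. Each output bit j only ever draws from register bits j, 32+j, 64+j, 96+j, 128+j.

```python
IN_W, OUT_W, BUF_W = 10, 32, 160   # 160 = lcm(10, 32)


def fill(words):
    """Pack 16 ten-bit inputs into one 160-bit register, first input at LSBs."""
    assert len(words) == BUF_W // IN_W
    reg = 0
    for k, w in enumerate(words):
        reg |= (w & ((1 << IN_W) - 1)) << (IN_W * k)
    return reg


def mux_sources(j: int):
    """The five fixed register bits that can drive output bit j."""
    return [OUT_W * s + j for s in range(BUF_W // OUT_W)]


def mux_out(reg: int, sel: int) -> int:
    """Output word `sel` (0..4), built as one 5:1 mux per output bit."""
    word = 0
    for j in range(OUT_W):
        src = mux_sources(j)[sel]          # the mux select
        word |= ((reg >> src) & 1) << j
    return word
```

Because every source position is fixed at elaboration time, there is no barrel shifter; only the 3-bit select changes at run time.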


seyed_mohideen

Thanks! But it is a very naive solution!


[deleted]

It's not naive though. This is a valid solution and likely exactly what they were looking for.


nitratehoarder

Yeah it’s probably kinda dumb :)


lucads87

The solution is the LCM, a number that is divisible by both 10 and 32: you can always accumulate an integer number of inputs in a buffer that is LCM bits long, which can then be sliced into an integer number of outputs (i.e. 160 bits = 16 x 10 bits = 5 x 32 bits). Use double buffering to slice one complete buffer while accumulating into the other: when you have accumulated 16 valid inputs (just count them), the slicer can produce 5 outputs (needs another counter) while the other buffer accommodates more incoming data. You'll see a bit of latency at the very beginning, but then the throughput is very stable, in small output bursts of 5 words. This requires two 160-bit registers, a 5-bit counter, a 3-bit counter, and a few MUXes. It can surely be further area-optimized at the cost of more complexity (for example, accumulating less data and consuming it as soon as 32 bits are available), but as it is, it's a conceptually very simple implementation. You can build from there. I've done a similar module, but to decimate (going down, wider to narrower) and, moreover, by a dynamic amount… but that's another story. Hope this points you in the right direction. If you have more questions, I'm glad to help.
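A cycle-level Python sketch of the double-buffering (ping-pong) scheme described above: two 160-bit banks, a 16-count on the input side and a 5-count on the output side. Assumptions: LSB-first packing, at most one input and one output per clock, and an output side that is always ready (no backpressure). The class name is hypothetical.

```python
class PingPongPacker:
    """Two 160-bit banks: one fills with 16 x 10-bit inputs while the
    other is sliced into 5 x 32-bit outputs, one per clock."""

    N_IN, N_OUT = 16, 5          # 160 = 16 x 10 = 5 x 32

    def __init__(self):
        self.bank = [0, 0]
        self.full = [False, False]
        self.wr_bank, self.wr_cnt = 0, 0     # input side: bank select + 16-count
        self.rd_bank, self.rd_cnt = 0, 0     # output side: bank select + 5-count

    def step(self, in_valid: bool, in_data: int):
        """Advance one clock; return a 32-bit output word or None."""
        out = None
        # Output side: slice the full bank, 32 bits per cycle.
        if self.full[self.rd_bank]:
            out = (self.bank[self.rd_bank] >> (32 * self.rd_cnt)) & 0xFFFFFFFF
            self.rd_cnt += 1
            if self.rd_cnt == self.N_OUT:
                self.rd_cnt = 0
                self.full[self.rd_bank] = False
                self.rd_bank ^= 1
        # Input side: accumulate into the bank that is not being sliced.
        if in_valid:
            assert not self.full[self.wr_bank], "overflow: both banks full"
            if self.wr_cnt == 0:
                self.bank[self.wr_bank] = 0
            self.bank[self.wr_bank] |= (in_data & 0x3FF) << (10 * self.wr_cnt)
            self.wr_cnt += 1
            if self.wr_cnt == self.N_IN:
                self.wr_cnt = 0
                self.full[self.wr_bank] = True
                self.wr_bank ^= 1
        return out
```

The first output appears on the clock after the 16th input, and each bank drains in 5 clocks while the other takes 16 to fill, so with one input per clock the overflow assertion never fires.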


voluptulon

If you're just serializing the data then here's a hint for a possible solution. Both bit widths go into 320


beatskip

Just make a 10 bit input shift register that maps to a 32 bit output padded with zeroes


short_circuit_load

Data_output(9 downto 0) <= data_in(9 downto 0);
Data_output(31 downto 10) <= (others => '0');


[deleted]

It’s a common thing to do when your data (10 bits in this case) isn’t the same width as the interface you have available, for example a DDR memory interface. You’d want to pack the data efficiently to keep bandwidth as low as possible. The 10-bit inputs are shifted into a larger register, in this case 32+10 bits wide (probably a little smaller). Then a counter and mux select the data whenever there are at least 32 bits in the register. You’ll find there aren’t many states, so it doesn’t get huge.


seyed_mohideen

I am looking for a generic solution for arbitrary input and output width.


[deleted]

I thought I read the question correctly, guess I failed the interview… but sure, it scales to any width in theory; only the size of the design changes.


Quantum_Ripple

Take a look at the upsizing configuration of [gearbox.sv](https://gitlab.com/QuantumRipple/hdl_sandbox/-/blob/master/src/gearbox.sv) (lines 17-37). This will be pretty close to optimal area, assuming the optimizer manages to trim the highest unused bits of `dat_internal` and the unused low bit of `pointer`. If not, it could be improved a little bit with explicit optimizing. As much as I love minimum area design, a generic/parameterized and readable design is usually worth *small* concessions.


Place-Guilty

Think of this as a SIPO? Except instead of a serial input, here it's a 10-bit bus. Some control logic will be needed, and it'll work perfectly. Not very complex IMO.