Comments on Burger's FPGA Paper


--D. Thiebaut (talk) 08:14, 12 December 2016 (EST)



Comments on "Microsoft Bets Its Future on a Reprogrammable Computer Chip" (wired.com)



Microsoft Bets Its Future on a Reprogrammable Computer Chip (wired.com). HN comments: https://news.ycombinator.com/item?id=12578163


It's a little odd to say that Microsoft is betting its future on reprogrammable chips. From reading the article, they are simply now using FPGAs where previously they (along with many other companies) were not. It would be like saying Amazon is betting its future on UPS trucks. What you put in the FPGA matters. The reprogrammability is really just there to let them implement heavily used algorithms that are not executed efficiently on current Intel chips. Basically, what this shows is that the pace of development in the software and algorithms that power the cloud is fast enough that they need the ability to improve performance on data and computation flows that are not (and may never be) well supported on current silicon. FPGAs are not very cost efficient, so it's unlikely they would buy that many compared to the number of Intel processors and Nvidia GPUs; they basically fill in gaps that are not being well supported today. Intel now owns Altera, so this internal thrust at MSFT may be why they bought them.
 	
RandomOpinion 77 days ago [-]

>Intel now owns Altera, so this internal thrust at MSFT may be why they bought them.
Intel openly acknowledges it. From the article:
"Microsoft’s services are so large, and they use so many FPGAs, that they’re shifting the worldwide chip market. The FPGAs come from a company called Altera, and Intel vice president Diane Bryant tells me that Microsoft is why Intel acquired Altera last summer—a deal worth $16.7 billion, the largest acquisition in the history of the largest chipmaker on Earth. By 2020, she says, a third of all servers inside all the major cloud computing companies will include FPGAs."
Altera strikes me as an odd choice though. I'd have thought Intel would buy out Xilinx, the industry leader, instead.
 	
mastax 77 days ago [-]

>Altera strikes me as an odd choice though. I'd have thought Intel would buy out Xilinx, the industry leader, instead.
Intel had been fabbing (some of?) Altera's chips for a few years before the acquisition, so from this angle it makes more sense than Xilinx. As to why Altera had this partnership and not Xilinx, who knows. Perhaps being in second place motivates you to shake things up.
 	
totalZero 77 days ago [-]

My understanding is that Intel has a service called Intel Custom Foundry that gives smaller partner companies access to Intel's fabrication pipeline. Altera was a client of Intel Custom Foundry, and the two companies started to build some FPGA-accelerated x86 products together, so it was a pretty natural acquisition versus a potential Xilinx deal.
 	
friendzis 77 days ago [-]

In my own experience, Altera floods education/research facilities with free or cheap development boards and tools far more than Xilinx does, much like Atmel or Microsoft itself. Microsoft already has large programs to grow users for its software, and Altera is more suitable in this regard.
 	
nickpsecurity 77 days ago [-]

They're similar in capabilities, Altera's tooling is easier to use per the comments I see, and it was a much cheaper buy. The funny thing was that, when Intel was buying Altera, people were suggesting Xilinx should buy AMD. That's how funny the financial situation is between the top CPU and FPGA vendors. Haha.
 	
qq66 77 days ago [-]

In heavily scale-dependent industries there's often an enormous financial gap between #1 and #2. Compare Intel's CPU profits to AMD's, Nvidia's to ATI's (now AMD), Xilinx's to Altera's, Apple's to Samsung's, etc.
 	
nickpsecurity 77 days ago [-]

It's true. As far as I'm aware, it comes from first-mover advantage. I used to say go for quality first, but these days I'm more likely to tell people to be first to ship, with extra care on upgradeable interfaces. Then you get the first-mover advantage plus the ability to fix the BS it causes.
 	
WheelsAtLarge 77 days ago [-]

I think they have no choice. Current CPUs have hit a wall when it comes to speed, so they have to find other ways to speed up the software.
Yes, FPGAs are a pain to program now, but so were general-purpose CPUs in the early days. Given time and innovation, that will change. We are headed towards a time where most algorithms will run on specialized chips rather than general-purpose CPUs. It's only a matter of time.
 	
pjc50 77 days ago [-]

For me the question of "flexibility" ties to the tools. Anyone who's ever tried to do FPGA development will know that it's not exactly agile. Third-party improvement of tools is blocked, because the "bitstream" format is confidential. Imagine if Intel or ARM refused to give you an instruction set manual and made you use a slow proprietary compiler with encrypted outputs.
Without great improvement and opening of the tools, FPGAs aren't going to be a general-purpose accelerator. It'll be a question of building one specific piece of firmware for one specific project and then deploying it semi-permanently with minor revisions across the cloud.
Wake me up when Visual Studio has an FPGA backend.
 	
friendzis 77 days ago [-]

It is also possible that Microsoft leverages the reprogrammability of FPGAs to iterate over designs that will at some point land in Intel chips. An FPGA is probably the only solution that is both relatively cost efficient and real silicon, allowing you to iterate over silicon designs.
 	
nickpsecurity 77 days ago [-]

I'm surprised some cloud vendor hasn't acquired eASIC yet. They easily throw together parts whose price/performance sits between FPGAs and standard ASICs, and they have a bunch of machines for rapid chip prototyping from FPGAs to their S-ASICs at 90nm, 45nm, and 28nm. Any IP for networking, storage, security, whatever could be turned into a chip easily.
A vendor could pull an IBM or SGI with little boards combining high-end CPUs plus FPGAs or S-ASICs for various coprocessors. Not sure it's ultimately the best idea, but I'm surprised I haven't seen anyone try it. Wait, I just looked up their press releases and it seems they're doing something through OpenPOWER:
http://www.easic.com/easic-joins-the-openpower-foundation-to...
 	
ttul 77 days ago [-]

eASIC is a great little niche-filler. You can get a design fabbed in test quantities for as little as $150K or so, which is akin to "free" next to real ASIC design fabrication.
 	
ttul 77 days ago [-]

Note that Altera also offers a similar service but it costs more.
 	
foobarcrunch 77 days ago [-]

Whaaa? Isn't this what Tilera and Tabula were chasing? Maybe they were too early and/or didn't have the momentum to drive the industry? It does seem like an inevitable direction of technological evolution; however, compilers, debuggers and so on will need to optimize for an almost entirely new set of constraints.
The thing with using FPGAs in systems (they're great for low-volume, high-priced items where ASICs would be too costly) is that they end up just emulating logic which could be more cheaply implemented as actual execution units (which is why many modern FPGAs include hard blocks like cache, ROM, ALUs, etc.). That is, it's expensive flexibility that isn't really all that useful. Sure, you could reconfigure a "computer" from doing database things to suddenly having more GPU cores to play games, but how useful or power/cost efficient would that be? Sure, it's nice to cut down on ASICs and to upgrade them after the fact, but it seems more like category development than a practical advantage that solves a real problem. Maybe a super-fast HPC-on-a-chip would be possible, but I don't see that we're storage or compute constrained; however, we may be bandwidth and latency constrained in terms of shrinking clusters to a single rack of ridiculously power-hungry reprogrammable chips.
Instead of infinitely customizable, arbitrary logic, you might have a crap-ton of simplified RISC cores with some memory and lots of interconnect bandwidth, or something in between an FPGA and an MPPA.
https://en.wikipedia.org/wiki/Massively_parallel_processor_a...
 	
Eridrus 77 days ago [-]

I keep hearing that FPGAs suck, but they seem like a reasonable middle ground between CPUs and ASICs in terms of cost efficiency as this article mentions.
 	
spydum 77 days ago [-]

Curious to hear what exactly the devices do? Article hints at compression and such, and the fact that they've moved devices to the edge of the machine to handle network connectivity makes me think it's a shift for better interaction with SDN more than anything else. I don't get how it has much to do with AI though?
 	
Scaevolus 77 days ago [-]

In Bing, it's used to score results for queries: "One of these FPGA nodes did the Bing search scoring for a 48-node server pod, and was linked to the nodes by four 10 Gb/sec Ethernet ports." http://www.enterprisetech.com/2014/09/03/microsoft-using-fpg...
Here's a paper about it: https://www.microsoft.com/en-us/research/wp-content/uploads/...
4.4, Feature Extraction: "The first stage of the scoring acceleration pipeline, Feature Extraction (FE), calculates numeric scores for a variety of “features” based on the query and document combination. There are potentially thousands of unique features calculated for each document, as each feature calculation produces a result for every stream in the request—furthermore, some features produce a result per query term as well. Our FPGA accelerator offers a significant advantage over software because each of the feature extraction engines can run in parallel, working on the same input stream. This is effectively a form of Multiple Instruction Single Data (MISD) computation."
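To make the MISD idea concrete, here is a minimal software sketch; the feature functions are hypothetical stand-ins, not Bing's actual engines. On the FPGA each extractor is a separate hardware engine, and the thread pool below only approximates that parallelism.

# Minimal MISD-style sketch: many feature extractors, one shared input.
# The feature functions are hypothetical, not Bing's real ones.
from concurrent.futures import ThreadPoolExecutor

def query_terms_in_doc(query, doc):
    # How many query terms appear anywhere in the document text.
    return sum(term in doc.lower() for term in query.lower().split())

def doc_length(query, doc):
    return len(doc.split())

def title_match(query, doc):
    # 1 if the whole query appears in the first line (the "title").
    return int(query.lower() in doc.split("\n", 1)[0].lower())

FEATURE_ENGINES = [query_terms_in_doc, doc_length, title_match]

def extract_features(query, doc):
    # Same data, many different "instructions" running side by side.
    with ThreadPoolExecutor(max_workers=len(FEATURE_ENGINES)) as pool:
        futures = [pool.submit(fn, query, doc) for fn in FEATURE_ENGINES]
        return [f.result() for f in futures]

print(extract_features("fpga cloud", "FPGAs in the cloud\nMicrosoft deploys FPGAs in its datacenters."))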
 	
gradys 77 days ago [-]

FTA
> ... in the coming weeks, they will drive new search algorithms based on deep neural networks—artificial intelligence modeled on the structure of the human brain—executing this AI several orders of magnitude faster than ordinary chips could.
 	
aab0 77 days ago [-]

That raises as many questions as it answers. Why FPGAs and not GPUs, which can run just about any deep neural network, usually faster and more efficiently?
 	
Const-me 77 days ago [-]

One reason is that FPGAs are more flexible.
Sure, GPUs deliver impressive raw performance, but to be useful the task must benefit from massively parallel hardware. GPU hardware works fantastically well for shading polygons, training neural networks, or raytracing. For compression and encryption algorithms, however, GPUs aren't terribly good.
Another reason is that, while a GPU delivers impressive bandwidth on parallel-friendly workloads, it's usually possible to achieve lower latencies with an FPGA. An FPGA doesn't decode any instructions, and its computing modules exchange data directly.
 	
Eridrus 77 days ago [-]

GPUs give you great throughput, but they're expensive, eat a lot of power, only work well in batches, and aren't tuned for prediction (e.g. poor INT8 performance and more VRAM than inference needs).
 	
emcq 76 days ago [-]

GPUs have worse performance per watt than a tuned FPGA. Some newer FPGAs have 400 megabits of on-chip RAM; that's huge, significantly larger than the 128-256 KB of cache typically available on chip for a GPU, and it translates into big energy savings.
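For scale, a quick conversion of the numbers cited above:

# Converting the figures above into comparable units.
fpga_bits = 400 * 10**6            # 400 megabits of FPGA block RAM
fpga_mb = fpga_bits / 8 / 10**6    # = 50 MB on chip
gpu_cache_bytes = 256 * 1024       # upper end of the cited 128-256 KB range
print(fpga_mb)                            # 50.0
print(fpga_bits / 8 / gpu_cache_bytes)    # roughly 190x more on-chip memory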
 	
p1esk 75 days ago [-]

>GPUs have worse performance per watt than a tuned FPGA
Citation needed.
The Maxwell-based Jetson TX1 is claimed to achieve 1 TFLOPS FP16 at under 10 W, and the soon-to-be-released Pascal-based replacement will probably be even more efficient.
 	
emcq 75 days ago [-]

While I don't have any external publications addressing this general claim, it comes from my current and past experience with internal studies of neural networks implemented on the TX1, other GPUs, custom ASICs, and FPGAs. In terms of power efficiency it generally goes ASIC > FPGA > GPU > CPU. If you're doing just FP32 BLAS it's hard to beat a GPU, but it turns out many problems have features you can optimize for.
The TX1's power consumption, including DRAM and other subsystems, peaks at 20-30 W. Typical usage is 10-15 W if you're running anything useful.
That 1 TFLOPS figure counts an FMA instruction as 2 flops; while that's accurate and useful for, say, dot products, for other workloads the throughput will be half this number.
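As a back-of-the-envelope check on where the 1 TFLOPS figure comes from (assuming the commonly quoted TX1 specs of 256 CUDA cores at roughly 1 GHz with a 2-wide FP16 rate; these specs are an assumption here, not stated in the thread):

# Rough reconstruction of the TX1's quoted FP16 peak, counting FMA as 2 flops.
cores = 256                 # CUDA cores (assumed spec)
clock_ghz = 1.0             # approximate clock (assumed spec)
fp16_lanes = 2              # FP16 issued as 2-wide vectors per core
flops_per_fma = 2           # multiply + add counted separately
peak_tflops = cores * clock_ghz * fp16_lanes * flops_per_fma / 1000
print(peak_tflops)          # ~1.0 TFLOPS; halve it for workloads that aren't FMA-shaped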
As an example of an FPGA performing significantly better than the TX1, see DeePhi [0].
[0] http://www.deephi.com/en/technology/
 	
p1esk 75 days ago [-]

In that link, where's the comparison of FPGA vs TX1?
 	
emcq 75 days ago [-]

If you click on Papers, there is a link to "Going Deeper with Embedded FPGA for Convolutional Neural Network", which compares against the TK1: https://nicsefc.ee.tsinghua.edu.cn/media/publications/2016/F...
While not the TX1 vs FPGA result you want, it's very close. Note that they aren't using the latest FPGA or GPU, they aren't using TensorRT on the GPU, and on the FPGA side they are using fat 16-bit weights on an older part rather than the newer lower-precision tricks (which improve FPGA efficiency further, since more high-speed RAM sits collocated with the computation, whereas a GPU's memory is primarily off-chip).
If you want to learn more about this stuff, I suggest a presentation by one of the students of Bill Dally (NVIDIA's chief scientist): http://on-demand.gputechconf.com/gtc/2016/presentation/s6561...
 	
p1esk 75 days ago [-]

Thanks, but TK1 is using FP32 weights, as opposed to FP16 on FPGA. If you double the GOP/s number for TK1 to account for that, you will end up with pretty much identical performance, and the paper claims they both consume ~9W.
I'm not saying you're wrong, just that to make a convincing claim that FPGAs are more power efficient than GPUs, one needs an apples-to-apples comparison.
And of course, let's not forget about price: the Zynq ZC706 board is what, over $6k? And the Jetson TK1 was what, $300 when it was released? If you need to deploy a thousand of these chips in your datacenter to save a million per year on power, you will need several years to break even, and by that time you will probably need to upgrade.
It just seems that GPUs are a better deal currently, with or without looking at power efficiency.
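A worked version of that break-even arithmetic, using the rough numbers from the comment above:

# Break-even estimate from the rough numbers above: $6k per FPGA board,
# $300 per GPU board, 1,000 units deployed, $1M/year saved on power.
fpga_board_cost = 6_000
gpu_board_cost = 300
units = 1_000
power_savings_per_year = 1_000_000

extra_capex = units * (fpga_board_cost - gpu_board_cost)   # $5.7M up-front premium
print(extra_capex / power_savings_per_year)                # ~5.7 years to break even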
 	
Neeek 77 days ago [-]

Anything they want. I think of it like having a blank breadboard: Microsoft can mock up a new purpose-designed chip and program it into these FPGAs instead of having to fab one from scratch.
It's hard to tell from the article though, I'm just guessing.
 	
j1vms 77 days ago [-]

There are resource limitations to contend with, roughly a limit on the total number of logic gates (e.g. AND, NOR, ...) that can be "rewired", and the bus speed to memory can also become prohibitive for some applications. This is why GPUs have been so successful at the subset of problems they target, just as GPUs are relative to CPUs. However, FPGAs are becoming ever more malleable and heterogeneous: the embedded toolkit (like on-chip hardwired DSPs) is expanding, and the potential for live reconfiguration of parts of an FPGA while other parts are in use (dynamic/active partial reconfiguration) [0] is pretty exciting.
Edit: Though I don't think it was mentioned in the article, Microsoft and Intel/Altera have gone this route in no small part because of the empirical death of Moore's Law (which has been much discussed on HN over the past few years).
[0] https://en.wikipedia.org/wiki/Reconfigurable_computing#Parti...
 	
Neeek 76 days ago [-]

Yeah, I'm not thinking they're all-powerful; it just sounds like they can be fine-tuned more easily than designing a whole new for-purpose chip.
Thank you for the info though!
 	
andreyk 77 days ago [-]

Brief summary: Microsoft is now using FPGAs (flexible hardware that can be 'programmed' to implement various chips) as part of its cloud tech stack. The FPGAs can run their algorithms with better speed and energy efficiency than CPUs, but are less flexible (a pain to alter). The article does not explain it very well; I think MS itself lays it out quite clearly: https://www.microsoft.com/en-us/research/project/project-cat...
Also, annoyingly, this article does not link to the paper about this, which explains it better than the article does. I recall this one from MS about how they use FPGAs in Bing; I was pretty impressed by it at the time. https://www.microsoft.com/en-us/research/publication/a-recon...