Neuroflow

A Workflow Foundation-based machine learning algorithm library using GPGPU as its computation backend

NOOOCL – OpenCL bindings for Node.js that actually work

I’m making progress toward a Node.js-based Neuroflow implementation. From now on, OpenCL is flawlessly supported in Node.js through my new module, NOOOCL.

Rewriting core implementation in D language

In a nutshell, the D language ROCKS! The new version has far fewer lines of code than the C++ version, and its performance is about 20% better with the GNU D Compiler than it was with LLVM/Clang.

Here is the repo: https://github.com/unbornchikken/neuroflow-D

There are some unit tests in the source directory; you can figure out the basic usage from them until I write some docs.

OpenCL implementation in D is still in progress.

Workflow 4 Node – Milestone 1

It has been announced here.

Workflow 4 Node Designer Announcement

I have made progress; you can check it out in the latest README on GitHub. Beyond that, Neuroflow-related activities such as TrainingLoop and TrainingAlgorithm will show up in the designer in the near future.

New direction: Node.js

Sorry for the radio silence. I wasn’t asleep, I just got totally addicted to Node.js recently. It feels as awesome to me now as .NET did back in the early 2000s.

Right now I’m working on Workflow 4 Node, which is going to be Workflow Foundation for Node.js. And on the way there, I’ve just released my first Node.js module on npmjs.org, called Backpack.

Of course Neuroflow is on this train too. It has a C++ core that can easily be integrated with Node.js using node-pre-gyp, and the port is going to be based on Workflow 4 Node, of course. (I haven’t dropped my .NET WF-based plans; Node.js is just the top priority.)

I’m planning to head to Kickstarter with my projects if I find the time to put them together as working prototypes with documentation and examples.

“Our strategy has shifted”

A long time has passed since I last wrote about Neuroflow. The reason is simple: my strategy has shifted.

The previous version is a .NET/C# library that can implement “inner loops” with managed technologies like TPL or with native technologies like OpenCL. These are abstracted behind interfaces, but the core logic is written in C#.

For the native parts there is a hand-written C++/CLI layer that does the marshalling, plus native interfaces that the native implementation classes are based on.
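To make the “abstracted behind interfaces” idea concrete, here is a minimal C++ sketch of one such native interface with a plain CPU implementation behind it. The names (`IVectorUtils`, `CpuVectorUtils`, `makeCpuVectorUtils`) are hypothetical and not Neuroflow’s actual API; the assumption is that an OpenCL-backed class would implement the same interface, and the marshalling layer would only ever forward calls to it.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical native interface for one "inner loop" (not Neuroflow's real API).
// Core logic depends only on this, never on a concrete backend.
struct IVectorUtils {
    virtual ~IVectorUtils() = default;
    // Dot product of a and b; extra elements of the longer vector are ignored.
    virtual float dot(const std::vector<float>& a, const std::vector<float>& b) const = 0;
};

// Plain CPU implementation; an OpenCL-backed class would implement the same interface.
class CpuVectorUtils final : public IVectorUtils {
public:
    float dot(const std::vector<float>& a, const std::vector<float>& b) const override {
        const std::size_t n = a.size() < b.size() ? a.size() : b.size();
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
        return sum;
    }
};

// Hypothetical factory: callers receive only the interface type.
std::unique_ptr<IVectorUtils> makeCpuVectorUtils() {
    return std::make_unique<CpuVectorUtils>();
}
```

Keeping the factory’s return type as the interface is what lets a backend be swapped without touching the code that calls it.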

The problem is not the performance, which is actually quite good. The problem is C++/CLI, which is the most fucked-up thing Microsoft has ever offered, including that mess called Windows 8. If you try it, you’ll see.

Some examples:

  • You cannot use some of the advanced parts of the Boost library (coroutines, for example) in a /clr project.
  • If you have a native static library project that is used by a managed C++/CLI library, and you change something in the native project, the managed library won’t build. You have to do a manual Rebuild, which is a huge pain in the ass.
  • In the same situation, if you also have a unit test that tests the C++/CLI parts and you change something in the native code after a test run, the project won’t build anymore. You have to kill the VS test execution engine process, either manually or with a post-build command line.
  • In the same situation, if native code calls lambdas that capture managed objects, it will crash.
  • etc.

All of the above implies that no one at Microsoft builds production code with C++/CLI.

So I decided that Neuroflow’s core will be a pure C++ library with Boost and OpenCL as its only dependencies. I will support platforms and compilers other than Windows and MSVC. I’ve already set up an Ubuntu-based dev machine with Qt Creator and g++/clang.

There will be Workflow and Activities, of course, but managed-to-native marshalling will happen at a higher level than I originally planned, and it definitely won’t use C++/CLI. I’m thinking about C++/CX right now; MS folks definitely use it in production, so it might be usable.

The next release will contain the new native core along with a prototype header-only library called linqlike, yet another LINQ implementation for C++, targeting features proposed for the upcoming C++14 standard (and using coroutines, a.k.a. yield).
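linqlike’s API isn’t shown in this post, so the following is only a hypothetical sketch of LINQ-style chaining in C++; `from`, `where`, `select` and `toVector` are made-up names. Note the design difference: this version is eager and materializes a vector at each step, whereas a coroutine (yield) based library like the one described would evaluate lazily, one element at a time.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, eager LINQ-style wrapper (not linqlike's real API).
template <typename T>
class Query {
public:
    explicit Query(std::vector<T> items) : items_(std::move(items)) {}

    // Keep only the elements for which pred returns true (LINQ "Where").
    template <typename Pred>
    Query<T> where(Pred pred) const {
        std::vector<T> out;
        for (const T& x : items_)
            if (pred(x)) out.push_back(x);
        return Query<T>(std::move(out));
    }

    // Project each element through f (LINQ "Select"); the element type may change.
    template <typename F>
    auto select(F f) const -> Query<std::decay_t<decltype(f(std::declval<const T&>()))>> {
        using U = std::decay_t<decltype(f(std::declval<const T&>()))>;
        std::vector<U> out;
        out.reserve(items_.size());
        for (const T& x : items_) out.push_back(f(x));
        return Query<U>(std::move(out));
    }

    // Return the results by value so the query object may be a temporary.
    std::vector<T> toVector() const { return items_; }

private:
    std::vector<T> items_;
};

// Entry point of a query chain.
template <typename T>
Query<T> from(std::vector<T> items) { return Query<T>(std::move(items)); }
```

Usage: `from(std::vector<int>{1, 2, 3, 4, 5}).where([](int x) { return x % 2 == 1; }).select([](int x) { return x * 10; }).toVector()` yields `{10, 30, 50}`.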

Stay tuned.

C++ AMP v2 vs OpenCL performance

About a year ago, I complained in the MSDN Forums about C++ AMP’s slow performance compared to OpenCL. Now that C++ AMP v2 is out, and MS folks have promised improved performance and zero-copy features, I’ve run my test again.

The test is very simple. It loads an array of floats to the device, then copies those floats into the top half of a device array that is twice the size of the source. Next, it launches a kernel that copies the top half of the device array into the bottom half, and finally it copies those values back to the host through another array.
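To pin down what each step moves where, here is a host-only C++ model of one iteration of the test, with plain vectors standing in for the device buffers (the function name `simulateRoundTrip` is hypothetical, and no OpenCL or C++ AMP is involved). Since the data only travels from the top half to the bottom half and back out, the output should equal the input, which also makes a cheap correctness check for either GPU version.

```cpp
#include <cstddef>
#include <vector>

// Host-only model of one benchmark iteration; "device" stands in for
// the double-sized device array.
std::vector<float> simulateRoundTrip(const std::vector<float>& input) {
    const std::size_t size = input.size();
    std::vector<float> device(2 * size);

    // 1) Copy the source floats into the top half of the device array.
    for (std::size_t i = 0; i < size; ++i) device[i] = input[i];

    // 2) "Kernel": work item x copies device[x] into device[x + size].
    for (std::size_t x = 0; x < size; ++x) device[x + size] = device[x];

    // 3) Copy the bottom half back to the host through another array.
    std::vector<float> output(size);
    for (std::size_t i = 0; i < size; ++i) output[i] = device[size + i];
    return output;
}
```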

Here is the OpenCL code:

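// "host" is the double-sized device buffer; "input" and "output" are single-sized.
queue.enqueueWriteBuffer(input, CL_FALSE, 0, sizeInBytes, &inputVector[0]); // non-blocking upload
queue.enqueueCopyBuffer(input, host, 0, 0, sizeInBytes);                     // into the top half

kernel.setArg(0, host); // kernel argument (can be set once, outside the timed loop)
queue.enqueueNDRangeKernel(kernel, NDRange(0), NDRange(size)); // top half -> bottom half

queue.enqueueCopyBuffer(host, output, sizeInBytes, 0, sizeInBytes); // bottom half -> output

queue.enqueueReadBuffer(output, CL_TRUE, 0, sizeInBytes, &outputVector[0]); // blocking download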

And the kernel:

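kernel void CopyData(global float * hostData)
{
    // The global size is the number of floats copied, i.e. half the buffer.
    const int size = get_global_size(0);
    const int x = get_global_id(0);
    // Copy the top half into the bottom half.
    hostData[x + size] = hostData[x];
}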

Here is the C++ AMP version:

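// Upload the input into the top section of the device buffer.
copy(inputView, iSec);
// bs is the device buffer size and cs the copied size, so this mirrors the OpenCL kernel.
parallel_for_each(concurrency::extent<1>(copySize), [=, &buff] (index<1> idx) restrict(amp)
{
    buff[(bs - cs) + idx] = buff[idx];
});
// Download the bottom section through the output view.
copy(oSec, outputView);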

Results

Buffer size: 1024 * 100
Repetitions: 10000

CPU results (Core i5 3.4 GHz):

  • OpenCL: 3893ms
  • C++ AMP: 5330ms

GPU Results (Radeon HD6870):

  • OpenCL: 5581ms
  • C++ AMP: 8031ms
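Putting the numbers side by side: per iteration, OpenCL took about 0.39 ms vs. 0.53 ms for C++ AMP on the CPU, and about 0.56 ms vs. 0.80 ms on the GPU, so C++ AMP was roughly 1.37x slower on the CPU and 1.44x slower on the GPU. The small C++ snippet below only reproduces that arithmetic from the measured totals; the helper names are hypothetical.

```cpp
// Totals measured above, in milliseconds; each run was repeated 10000 times.
constexpr double kRepeat = 10000.0;

struct Timing {
    double openclMs;
    double ampMs;
};

constexpr Timing kCpu{3893.0, 5330.0};  // Core i5 3.4 GHz
constexpr Timing kGpu{5581.0, 8031.0};  // Radeon HD6870

// Average duration of a single iteration, in milliseconds.
constexpr double perIterationMs(double totalMs) { return totalMs / kRepeat; }

// A ratio > 1 means C++ AMP was that many times slower than OpenCL.
constexpr double ampSlowdown(Timing t) { return t.ampMs / t.openclMs; }
```

For example, `ampSlowdown(kGpu)` comes out at about 1.44.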

Conclusion

Sorry guys, but I still can’t see the performance improvements. These are exactly the same numbers I saw with the first version! I like the idea and I like the integrated C++ syntax, but I don’t like the performance. I will stick with OpenCL in Neuroflow for the current Visual Studio time frame.

v0.0.2 Alpha is out

v0.0.2 Changelog

  • Reduced kernel compilation time (binary caching implemented)
  • Minor performance optimizations
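Binary caching here means not rebuilding the OpenCL kernels from source every time. In OpenCL the real building blocks are `clGetProgramInfo` with `CL_PROGRAM_BINARIES` (to extract a built program’s binary) and `clCreateProgramWithBinary` (to load it back). The sketch below leaves OpenCL out and shows only the caching logic, keyed by device name and kernel source; `KernelBinaryCache` and its compile callback are hypothetical stand-ins, and this post does not say whether Neuroflow’s cache lives in memory or on disk.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Binary = std::vector<unsigned char>;

// Hypothetical in-memory cache: each (device, source) pair is compiled once,
// later requests reuse the stored binary. "compile" stands in for the slow build.
class KernelBinaryCache {
public:
    using CompileFn = std::function<Binary(const std::string& device, const std::string& source)>;

    explicit KernelBinaryCache(CompileFn compile) : compile_(std::move(compile)) {}

    const Binary& get(const std::string& device, const std::string& source) {
        auto key = std::make_pair(device, source);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // Cache miss: pay the compilation cost once and remember the result.
            ++compiles_;
            it = cache_.emplace(std::move(key), compile_(device, source)).first;
        }
        return it->second;  // std::map references stay valid across later inserts
    }

    std::size_t compileCount() const { return compiles_; }

private:
    CompileFn compile_;
    std::map<std::pair<std::string, std::string>, Binary> cache_;
    std::size_t compiles_ = 0;
};
```

Keying on the full source string keeps the sketch simple; a persistent cache would more likely key on a hash of the source plus the device and driver version.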

Download

First words

Hello Folks,

Finally I had the time to release a tiny portion of my spare-time work. This is, or will be, a machine learning toolset built on top of Windows Workflow Foundation and the .NET Framework that anyone can use for free.

GPGPU? On .NET?

If you look at the source, you’ll find C++ code in it alongside C#. The computation back-end is a pure C++ library that is marshaled to the .NET layer above it via C++/CLI, and it is designed with support for multiple GPGPU frameworks in mind. Right now there is only an OpenCL implementation, but I have plans (and some proof-of-concept code) for C++ AMP support. Last time I checked, in the VS 2012 time frame, C++ AMP was outperformed by OpenCL, so I went ahead with Khronos’ stuff. Look at my MSDN forum thread. Today I have VS 2013 with the C++ AMP v2 compiler, so I’ll rerun my performance comparison tests soon and report the results here on the blog.

Where is Workflow?

There is no WF stuff in the current version yet, sorry. I have proof-of-concept activities, but I need time to integrate them into the Neuroflow source code. At least everything there is async. 🙂

Examples?

I don’t have publishable examples yet, but there are some unit tests in the code where you can see how things work.

Later there will be interesting stuff, like image noise filtering, object identification, data mining, game AI, etc.

You can find the list of the features and link to the source code here.
