7,500 Faceless Coders Paid in Bitcoin Built a Hedge Fund’s Brain (wired.com)
228 points by mhb on Dec 12, 2016 | hide | past | favorite | 101 comments


The company the article refers to is https://numer.ai/about

From their homepage:

"February 27th 2016, an artificial intelligence named NCVSAI joined Numerai. . . He uses an untraceable email address. . . He is completely anonymous. His strongest prediction: buy Salmar ASA — a Norwegian salmon company. . . Numerai’s hedge fund went long Salmar ASA."

Ok, so how is this anything but an anonymous proxy to arbitrage insider information?

Wrap confidential, non-public information under the guise of developing "AI" for trading?

I'm serious: I don't understand the need for the data scientists to maintain anonymity. It seems like the only functional reason is to let them break the law?


I've been participating in Numerai for a few months now. (I've only made some beer money from it, nothing serious.) When you get the data, you have no idea what it is. It's just a file with ~70,000 data points, each of which has 21 features. Each feature is uniformly distributed between 0 and 1. All you have to do is make a binary classification of 0 or 1 (or, more accurately, the probability that the data point is in class 0 or 1). They don't tell you what the 21 features represent.

As far as you know, these predictions could be used to make currency trades, or stock predictions, or real estate purchase, or something more exotic. You really have no idea. And since you don't know what these data points represent you can't use any insider knowledge about anything to help you.
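For a sense of scale, here's a minimal sketch of what working with such a file looks like, with synthetic data standing in for the real download (the row count, feature count, and uniform distribution match the description above; the target generation is entirely invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Synthetic stand-in for the tournament file: ~70,000 rows,
# 21 anonymous features, each uniform on [0, 1]
X = rng.uniform(0, 1, size=(70_000, 21))

# Invented hidden relationship so there is something to learn
w = rng.normal(size=21)
y = ((X - 0.5) @ w + rng.normal(scale=0.5, size=70_000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Submissions are probabilities of class 1, not hard labels,
# so score with log loss rather than accuracy
model = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]
print(log_loss(y_te, probs))
```

You never learn what the 21 columns mean; the model only ever sees anonymous numbers in [0, 1].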


EDIT: This is total conjecture.

Only one, or at most a few, of those 21 features represent real data. The real data encodes information similar to the insider information they wish to act upon.

Example data prep:

1. Insider source says that a contract is falling through, a patent is being filed for, quarterly numbers have been missed/surpassed etc.

2. Similar information is gathered from historic performance data of the company, similar companies or market segments.

3. The information is correlated with whatever metric they wish to move along and encoded in one of the 21 feature classes.

4. Repeat for whatever relevant information that can be linked to the insider source - i.e. competing companies, re-encoding separately for long and short positions etc.

5. Fill in remainder of 21 feature classes with noise.

6. Profit.
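Taken with the grain of salt the edit above asks for, steps 3 and 5 can be sketched as a toy dataset with one informative column hidden among noise (every number here is invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 5_000

# Step 3: one feature column encodes the "real" signal
signal = rng.uniform(0, 1, n)
y = (signal > 0.5).astype(int)  # the metric they wish to move along

# Step 5: pad out the remaining 20 of the 21 columns with noise
noise = rng.uniform(0, 1, (n, 20))
X = np.column_stack([signal, noise])

# Any reasonable model latches onto the one informative column
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.feature_importances_.round(2))
```

The data scientists would then unknowingly be predicting the hidden signal, which is the conjecture's whole point.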


I'm not sure how seriously you were proposing this, but I would rank its plausibility as being in the vicinity of Guam tipping over and sinking from the weight of a military base.

Inside traders are usually caught because they make profitable trades shortly before significant company events, or they're caught communicating with the tipper. This wouldn't protect from that.

Once they're being investigated, having a plausible reason for making the trade is mostly irrelevant. If you have no reason other than inside information for trading and just say "I felt like gambling," it doesn't matter. If they can't actually prove you had the inside information, you're innocent.

If they can prove you had illegal inside information it doesn't matter if you can prove that you made the trade for unrelated reasons. You're guilty.


Needlessly convoluted. It would be much simpler to create Numerai as described and participate oneself, uploading the 'predictions'. The key Numerai person can de-anonymize the data by looking up the mapping for the one stock with insider information and adding a big buy on it. (I'm not sure if participants upload their model to be run by Numerai or just provide their predictions based on a public data feed; I think it's the latter, but even if it's the former, you can just create a model optimized to emit a big buy on the key stock that is otherwise random and self-canceling.) The rest of the data can, and should, be genuine and released as described, in case of an SEC investigation, since using large-scale real data generates lots of paper trails (like payments to data providers). Who knows, the participants might actually find real signals that are profitable and also mask the insider trading.


What would be the point of all this? Step 1 is the illegal part, and the other steps don't erase that.


Well of course it is illegal, you just don't want to get caught, and this allows you to have plausible deniability.


It's like when law enforcement uses a legally questionable tactic to get information about someone: with the help of that illegally obtained knowledge, they can go and re-create it (along with a legal paper trail) via other, perfectly legal means.


Seems that it would make detecting it more difficult


aka parallel construction


The grandparent's point wasn't that the people doing the analysis are injecting insider information... it's that the data dump you've been presented could be some fancy encoding of insider information.


How would that work? If Numerai already has insider information, why not just use it directly? Why go to the bother (and risk) of anonymizing and distributing it?


As someone else suggested, think of it like parallel construction in law enforcement: an agency already knows someone is guilty, but they know it from illegal surveillance they can't use in court. So they give a tip to an unsuspecting officer to watch out for a car with a broken tail light at a particular location/time, and in the course of the traffic stop the officer finds drugs in the car, then they claim that's how they found out (despite the officer only being there because they already knew).


Plausible deniability.


There would be none if they were aware the information they hand out contains insider info.


How would you prove it? It's just an anonymous list of features to train a model.


I think it's unlikely that insider information comes in some nicely formatted dataset with 21 features week in and week out. I think of insider information as usually being a one-off tip that a company is going to take a particular action before they make that information public. This feels more like prosaic financial data. (But I'm not in finance so maybe my perceptions are way off!)


That's why I phrased it as "encodes insider information".

So you get your tip, and you generate a data set that will lead some data analyst to the conclusion you want.


No, the grandparent's point was that the AI data dump is misdirection from the actual trading logic.


Can I participate as an investor putting my money into the fund? Or I can only participate as a data scientist?


>They don't tell you what the 21 features represent.

Half of 42.


"Though most of these data scientists are anonymous, a small handful are not, including [named individual] of Buffalo, New York."

Most data scientists are anonymous, but they don't need to be. The datasets provided to the data scientists must be and are anonymized.


The data is anonymized too. Numerai could be playing a shell game, but it would require submitting their own solutions, or publishing confidential information, both presumably illegal (and particularly indefensible if caught).


> Ok, so how is this anything but an anonymous proxy to arbitrage insider information?

I think an anonymous means to act-on/acquire insider information is in the "omg I want this" category. Kudos to Numerai.


The numer.ai data they give to their researchers is anonymised.


You don't know what stocks your model is predicting to buy - it's all anonymized using homomorphic encryption


How many de-anonymization schemes have withstood more than casual attacks? There's so much stock data available that I wouldn't want to bet against someone using historical data to de-anonymize a stock and then place a bet using insider information. The use of ML would even give plausible deniability: just imagine a prosecutor weighing the odds of successfully arguing that your model couldn't have produced that result.


I'm almost positive it's not using homomorphic encryption (at least in any real sense). It's misleading how they seem to suggest that they are, though.


I went to school with Richard and so clearly remember him explaining this to me over coffee less than a year ago when the website had an under-construction style landing page and it's been amazing to watch this grow so fast.

The platform is great and I'd strongly recommend anyone wanting to get machine learning experience or who has played with Kaggle to check out Numerai!

The homomorphic encryption piece is fascinating and I think it'll be an important piece in balancing the privacy vs. utility of personal data as machine learning seeps deeper into the fabric of our lives.


Do you have much context on how he got this off the ground in the first place? These sorts of businesses are always very interesting to me from a launch standpoint. What would an MVP look like? How did they get their first users? Etc.


> The trouble with homomorphic encryption is that it can significantly slow down data analysis tasks. “Homomorphic encryption requires a tremendous amount of computation time,” says Ameesh Divatia, the CEO of Baffle, a company that’s building encryption similar to what Craib describes.

> According to Raphael Bost, a visiting scientist at MIT’s Computer Science and Artificial Intelligence Laboratory who has explored the use of machine learning with encrypted data, Numerai is likely using a method similar to the one described by Microsoft, where the data is encrypted but not in a completely secure way.

Doesn't this imply that homomorphic encryption isn't being used, but something like it instead?


I am pretty sure homomorphic encryption is not being used. In fact, I suspect that no real encryption is being used.

Isn't it the case that if I just removed the labels, and renormalized all my data to fall in [0, 1], then what I end up with looks a lot like what Numer.ai gives you?

I'm not aware of any homomorphic encryption / structure preserving schemes that have homomorphic evaluation on ciphertexts equivalent to literal multiplication and addition of ciphertexts, and this seems to be what they want you to do to train your model. (unless I'm misunderstanding how to interact with the "encrypted" dataset)

EDIT: seems like most people think they are using Order Preserving Encryption, which allows one to compare ciphertexts with the "less than" predicate. This makes more sense looking at what they give, but I never saw anything where they say "only do comparisons on the encrypted data."


    """
    https://arxiv.org/abs/1508.06574
    "An encryption scheme is said to be homomorphic 
    if certain mathematical operations can be applied 
    directly to the cipher text in such a way that 
    decrypting the result renders the same answer as 
    applying the function to the original unencrypted 
    data."
    The function = GradientBoostingRegressor
    the cipher text = X_encrypted
    original data = X
    same answer = mean absolute error
    """
    import numpy as np
    from sklearn.metrics import mean_absolute_error
    from sklearn.ensemble import GradientBoostingRegressor

    # Replicability
    np.random.seed(0)

    # Create a data set with 1000 samples and 3 features
    X = np.random.randint(0, 60, (1000,3))

    # Create ground truth (the product of the three 
    # features - 100) / 11
    y = (np.prod(X, axis=1) - 100) / 11.
    
    # Encrypt y
    y_encrypted = y + 20

    # Encrypt X
    X_encrypted = X * -0.5

    # Init our model
    rgr = GradientBoostingRegressor(random_state=42)

    # Fit model on the first 500 unencrypted samples
    rgr.fit(X[:500], y[:500])

    # Predict the remaining 500 samples
    preds = rgr.predict(X[500:])

    # Fit model on the first 500 encrypted samples
    rgr.fit(X_encrypted[:500], y[:500])

    # Predict the remaining encrypted samples and decrypt
    preds_decrypted = rgr.predict(X_encrypted[500:]) - 20

    # Evaluate both functions
    print(mean_absolute_error(preds, y[500:]))
    print(mean_absolute_error(preds_decrypted, y[500:]))

    #>>> 323.09
    #>>> 323.72


The encryption here is being done by "adding 20" / "multiplying by -0.5"?

Given this "encrypted" X , y dataset, I could easily find the unencrypted version... (even if I don't know 20 or -0.5, this still reveals so much of the structure that I don't believe it provides any real protection against anything except the most lazy attackers)


It is a toy example to show that a form of homomorphic encryption is possible, without going Fully Homomorphic Encryption.

And simple linear transforms on already anonymized features are not so easy to reverse engineer as you may think. Just try it on a few datasets from UCI.


Ah ok, sure. I wouldn't call something like a linear transform on anonymized features "encryption" (more like obfuscation?), but I guess it's good marketing in that it lets them associate with the "recent advances in [real] homomorphic encryption"


If you desire something more one-way, consider PCA, random projections, feature expansions (with something like Random Bits Regression), hashing, or the last hidden layer activations of your best in-house neural net. Then combine these approaches for good measure.

Agreed on the clever marketing, but at least they put their money (expensive dataset) where their mouth is (releasing it to reverse engineers the world over).

Fully Homomorphic Encryption challenges would be interesting, but that would disqualify our current state-of-the-art algorithms and reduce the playing field to a handful of people who know how to write algos that work with Fully Homomorphic Encryption (if any competitor at all is allowed to work on this, and not too busy working for the NSA).
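Of the one-way transforms listed above, random projections are the easiest to sketch (toy data; a square Gaussian matrix is used for simplicity, though note a square projection is invertible if the secret matrix ever leaks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an already-anonymized feature matrix
X = rng.uniform(0, 1, size=(1_000, 21))

# Obfuscate by multiplying with a secret random matrix. By the
# Johnson-Lindenstrauss intuition, pairwise geometry is roughly
# preserved, so models can still learn from the projected data,
# but recovering X without knowing P is hard.
P = rng.normal(size=(21, 21)) / np.sqrt(21)
X_obf = X @ P

# Distances survive approximately (compare 100 random pairs)
d_orig = np.linalg.norm(X[:100] - X[100:200], axis=1)
d_obf = np.linalg.norm(X_obf[:100] - X_obf[100:200], axis=1)
print((d_obf / d_orig).mean().round(3))
```

The preserved geometry is exactly what lets an unwitting modeler still find structure in data they cannot interpret.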


A while ago I wrote about how AI might be trained on a data set and then used to form the basis of an alternative cryptocurrency for data mining. I.e., maybe you train the AI to recognize images of mountains and then use that as your proof-of-work algorithm to reward "miners" for finding related images on the Internet.

What I never imagined is that you might outsource the process of building that AI itself as a separate entity (maybe even a separate blockchain.) You could do the entire thing with commitments:

* Build a blockchain that is about rewarding data scientists for predictive models on data sets.

* Commit to a hash of a data set (the challenge to the network.)

* Hash enters chain.

* Release data set.

* Give them a deadline for a solution, after which they commit to a hash of their solution / predictive model along with an ECDSA pub key for a reward.

* Solutions are released after N blocks.

* Top N solutions automatically receive rewards in this new cryptoasset.

* Zero trust would be required, since it operates on outcomes that anyone can check.

* (This could also be done as an Ethereum smart contract instead of a blockchain.)

Scaling that would be quite hard though and you would need to use standard proof-of-work to avoid attacks.
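The commit/reveal mechanics above can be sketched with plain hashing, no chain involved (all names and values here are invented):

```python
import hashlib
import json

def commit(solution, salt):
    # Hash of (salt || serialized solution); only this goes public
    # at commit time, so the solution itself stays hidden
    payload = salt + json.dumps(solution, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# A data scientist commits to predictions before the deadline
predictions = {"id_001": 0.73, "id_002": 0.41}
salt = b"random-nonce"
c = commit(predictions, salt)

# After N blocks, solution and salt are revealed; anyone can verify
assert commit(predictions, salt) == c

# Any tampering after the commit is detectable
assert commit({"id_001": 0.99, "id_002": 0.41}, salt) != c
print("commitment verified")
```

The salt prevents a rival from brute-forcing commitments of likely solutions before the reveal.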

IMO: What I find the most interesting about this article is that they're using masked data as the input which I think is the kind of futuristic cipherpunk vision a lot of people had in mind for Ethereum when it launched. Ethereum + AI + crypto + game theory is a match made in heaven and we probably haven't even scratched the surface of what's possible in this space. I can't wait to see the kinds of things people come up with in the future.

Edit: formatting.


Great idea! Although sharing data or models via a blockchain is very costly, I think?


That would make it a bad idea.


Anyone want to speculate about how the data is being "encrypted"? It seems like they don't want to say, which immediately sets off red flags in my head...

I am pretty sure homomorphic encryption is not being used. I think if they are doing anything rigorous, maybe they are using order-preserving encryption (http://www.cc.gatech.edu/~aboldyre/papers/bclo.pdf). This would mean that the only valid operations on the ciphertexts are comparison operations. I can't seem to find anywhere where numer.ai actually says how to interact with the "encrypted" data. I think it's a little strange that they would suggest that they are using homomorphic encryption yet have only comparison operations actually make sense on their "encrypted" data.

A second hypothesis would be that no encryption is being used at all, and this is just unlabeled features that have been renormalized within [0,1].

A third would be that order-preserving encryption is being used, but in an ineffective way which is basically just resulting in the second scenario. (understanding the security guarantees of order-preserving encryption is practice is very complicated)
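To make the first hypothesis concrete, here is a toy order-preserving mapping (not the Boldyreva et al. scheme from the linked paper, just the property it guarantees): any secret, strictly increasing function lets you compare ciphertexts without revealing plaintexts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Secret, strictly increasing encryption table over a small
# plaintext domain: cumulative sums of positive random jumps
domain_size = 256
table = np.cumsum(rng.uniform(1.0, 100.0, size=domain_size))

def encrypt(x: int) -> float:
    return table[x]

# Comparisons on ciphertexts agree with comparisons on plaintexts
a, b = 17, 200
print((a < b) == (encrypt(a) < encrypt(b)))  # True
```

This also illustrates the limitation mentioned above: only "less than" survives encryption, so arithmetic on ciphertexts (which most ML training needs) is meaningless.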


I might be wrong, but this looks like Bernard Madoff in disguise.

"Give me your money and trust my big black box that nobody understands!"

Then, just to be more credible, add some meaningless data point ("We have 7,500 anonymous data scientists"), because as everyone knows, the less you know about the people who write the abstract models that manage your money, the more reassured you should be.

"My son told me about Anonymous, they are a hell of good hackers!"


Unless they require some sort of financial contribution from the 7500, it seems like the biggest risk (apart from wasting your time) is that if you do come up with something useful you have no way to know the value of it; you basically just have to trust them.


I think parent's concern is scamming of investors, not programmers.


Why not both?


Sure. I was just responding to the original comment that spoke exclusively to the investor risk.


It's a free market; if you're not getting paid enough for great work, you just leave.

But then why would they risk you leaving by not paying fair value?

Like any job: if you invent something that makes the company millions, you were always free to have left years ago and started your own business, if you truly were worth more.


From reading the article, how do we know that the algorithms are actually working? Isn't the market overall increasing in value which could make it seem like the models are effective? How does Machine Learning / A.I. differentiate between correlation and causality?


That all sounds awesome and impressive, but I have the same problem with it that I have with any of the previous neural network hedge funds or, for that matter any of the 24-hour business news stations:

* If you truly had the ability to predict the price direction of ANY stock/fund/option in ANY consistent time frame (24-hours/week/month/quarter/year) ahead of time, with ANY accuracy greater than a coin flip, it would be ridiculously easy to turn this information into piles of cash.

So, if you are NOT talking to me from atop your piles of cash, I have to assume your accuracy is no better than the flip of the quarter in my right pocket.


It's a lot more complicated than that. Even a consistent prediction will generally only give you a small edge, and turning a small edge into real money is incredibly difficult. Just like ideas aren't worth much in startups, execution is a bottleneck on most otherwise-reasonable trading ideas.

There are a lot of problems:

- transaction costs quickly eat into your margins

- regulatory compliance requires a lot of careful work, with non-trivial penalties for mistakes

- different exchanges and countries have different rules

- trading across different timezones is difficult

- synchronizing your actions across multiple venues is incredibly hard

- you might need more leverage than you can get as an individual

- sourcing and cleaning the up-to-date data you need to actually trade on a model (vs back-testing) requires intense work and is expensive

- since your edge is probabilistic in nature, you need a particular risk tolerance (backed by enough cash to survive runs of bad luck, of course)

A lot of successful trading firms live and die more on their infrastructure and execution prowess than they do on brilliant insights and predictions. Even if your prediction or model is good, you might simply not be able to use it because you don't have access to the capital and infrastructure you'd need.

This is probably Numerai's core value proposition: they have an open system for people to submit models (ie predictions) and then use their existing infrastructure to actually execute the best ones.


Notice I used the word ANY, not EVERY. If you can do better than a coin flip in ANY prediction, you can make money over time.

I would assume the rational person with such predictive power would choose the relationship with the least complications.


A good description of investing: it's like a game of chess that you can win (using skill) to earn $1, but after every game you bet $50 on a random (unbiased) coin flip.

Clearly, you can make money over the long run. But it's also incredibly easy to go bankrupt. In addition, there are only so many chess players (trading opportunities), so after a certain point there's nothing to be gained by investing more money (e.g. Renaissance has been returning 20% for decades but only has about $10B of capital; they pay out the profits because it's impossible for them to invest more money using these strategies).


I'd say it's better than 50% that you'll sleep tonight. It's also better than 50% that you'll sleep on a bed or couch. I guess all I need to do to make loads of money is invest in furniture companies?


There is no market in predicting where I sleep at night because no one is willing to take the other side of that bet.


So... there exist predictions where being able to do better than a coin flip does not mean you can make money over time. This is in direct contradiction to what you said above:

> If you can do better than a coin flip in ANY prediction, you can make money over time.

It's also not true even in the case of predictions where there is a relevant liquid market. If everyone can do better than a coin flip, then I can only make money doing better than the rest. And even if I can do better than everyone else, if it's not better enough that the earnings cover my costs I am still losing money over time.


Fair view. A few points:

- Numerai does not have the ability itself, it harvests predictions from many others.

- To assume that better-than-random guessing is impossible is to assume that the stock market is operating at maximum efficiency, and that all the profitable hedge funds have very lucky random number generators instead of well-paid quants.

- The co-founder of perhaps the biggest money making hedge fund, Renaissance Technologies, has invested in Numerai. http://www.wsj.com/articles/renaissance-technologies-hedge-f...

It's a bit like saying you don't trust any SEO company that does not rank #1 for SEO. Fair, sure. But the reality is a bit more grey. Disclosure: Won a few bitcoin on Numerai.


I'm not saying it has to rank #1, I'm saying it has to move up any amount.

Renaissance Technologies obviously is a good name to tout but they could have invested for reasons that don't apply to other investors, such as keeping an eye on a new/competing technique in case it bears fruit.


Unless you're extremely well capitalised, you'll be ruined by variance. If you can predict stock movements with 50.1% accuracy, you have a potentially profitable trading strategy. To make meaningful profits, you need a massive reserve of cash behind you. If you're right 50.1% of the time, you're still wrong 49.9% of the time. The wins and losses won't be evenly distributed.

Poker is much the same, on a smaller scale. A professional player can turn consistent profits in the long run, but there's a huge amount of variance; they might have a mean hourly profit of $200, with a standard deviation of +/-$2000. You need a huge amount of capital to absorb that variance without going bust.
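A rough Monte Carlo sketch of that point (all figures invented): the same 50.1% edge with the same bet size survives or goes bust depending almost entirely on the starting capital.

```python
import numpy as np

rng = np.random.default_rng(0)

def survival_rate(bankroll, bet=1, p=0.501, n_trades=50_000, runs=200):
    """Fraction of simulated traders with this edge who never go bust."""
    wins = rng.random((runs, n_trades)) < p          # each trade wins w.p. p
    steps = np.where(wins, bet, -bet)                # +bet on win, -bet on loss
    wealth = bankroll + np.cumsum(steps, axis=1)     # running bankroll per run
    busted = (wealth <= 0).any(axis=1)               # ever hit zero?
    return 1.0 - busted.mean()

# Same edge, same bets; only the capital cushion differs
print(survival_rate(bankroll=100))
print(survival_rate(bankroll=10_000))
```

With a thin bankroll, a large fraction of these profitable-in-expectation traders go bankrupt before the edge can pay off; with a deep one, essentially none do.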


I can't argue with your point or your analogy.

The amount of capitalization needed however doesn't seem prohibitive as long as you are sure you have that slight, if inconsistent, edge.

Options are made to slice and dice exactly this kind of risk for a reasonable cost.


This is a very simplistic view of the markets, and there is nothing ridiculously easy about turning information into piles of cash; it's not like a light bulb goes off and your bank account instantly has $1M in it. Here are some examples where I might be able to develop an edge based on an algorithm but not be able to monetize it without help.

- The strategy may require trading access to markets or software that I do not have because I lack capital/connections/education required, like large capital requirements for heavily leveraged strategies or market-making strategies that require one of a limited number of seats.

- The strategy may require very active trading but I am not in a personal situation to be that active because of job/family/prison. Many options strategies are here.

- The strategy may be profitable on a small dollar amount but will require a trader with more experience to scale it up to a larger amount of capital. This happens a ton; dealing with scale is much more difficult than most people realize.

I'm not saying you should accept anything at face value, I'm just saying you should not dismiss them because they aren't already rich.


I'm not talking about "information", I'm talking about the ability to predict. And I'm setting a very low bar.


Predictions are information. :-) In any case, I was just responding to the maxim of "if it works why aren't you already rich?".

I have an options strategy in front of me that is definitely profitable over the last 10 years, no question: up in every month over that time frame except three. I'm not rich because it needs nearly constant monitoring and large starting capital, and I can't do that with my day job.

But, if I ever try to raise that capital, I will be very sad if all I hear is "if it works why aren't you already rich?"...


Best of luck.


The article says: "Numerai’s fund has been trading stocks for a year. Though he declines to say just how successful it has been, due to government regulations around the release of such information, he does say it’s making money. And an increasingly large number of big-name investors have pumped money into the company"

So perhaps they are sitting on piles of cash?


The stock market itself has gone up in the last year. Exactly what regulation prevents funds from releasing information about their track record? Any fund you buy will show you its track record, as well as "SEC Yield", a normalized measure of recent returns.

Sounds to me like they're trying to avoid being compared to other metrics, because they don't fare as well.


Laws against making a public offering, which hedge funds are generally subject to, prevent public disclosure of track records.


I'm going to need a source on that one. That sounds like bunk.


"Given the uncertainty and potential adverse consequences of forfeiting the Regulation D exemption, many fund managers [are] careful to avoid specifically mentioning any fund names or details in interviews and press releases." Thomas P. Lemke, Gerald T. Lins, Hedge Fund and Other Private Funds: Regulation and Compliance, General Solicitation or Advertising Activities, S. 4:8 (2016). The edit above from "were" to "are" is based on my own experience as a practicing hedge fund attorney.

While, as mentioned below, the JOBS Act loosened the compliance burden with respect to public offerings, most smart hedge fund managers still play by the old rules. "Avoid specific discussions of past performance--In the context of a press release or interview, providing applicable disclaimers and disclosures would be problematic and cumbersome. If a discussion of performance is unavoidable, the performance should be given net of fees." Id., Press Releases and Interviews, s. 5:12. We would generally advise our clients that there is no situation where a discussion of past performance is avoidable, and it should be avoided absolutely.


Hedge funds do not want to run afoul of the SEC rules on marketing. Even though these rules were relaxed in 2013, the risk of revocation is there. http://www.newyorker.com/business/currency/why-arent-hedge-f...

I bet they show potential investors their returns. But to the public? That may be both iffy and unnecessary.


Showing returns isn't the same as making a public offering. This article doesn't even claim that. Hedge fund managers do it on CNBC all the time without running afoul of the law.


This screams 'Ponzi scheme'. If you are thinking of putting money into this, don't walk away - run away.

Above-the-board trading institutions don't have problems with nebulous 'government regulations', when talking about how profitable their funds are.


You've no idea what you're talking about. Hedge funds (legally) can't advertise to retail investors (if they do, they're no longer hedge funds, which substantially increases their regulatory burden and narrows the range of strategies they can trade). Having said that, even if they could, they probably wouldn't want retail investors - if they're good, they can get rich investors (e.g. $10M+), reducing their administrative costs and allowing them to trade with longer horizons.


I agree. Just look up ai-expert Ray Kurzweil and his hedge fund FatKat. Boom.


Investor cash, it sounds like.


I don't know if it counts as insider information, but Numerai might have access to superior data that they obviously don't want to give away, as they would lose their chance to make money. This way they keep the power but leverage the knowledge of those with good analytic skills. I guess it works in a situation where one party has knowledge and the other party has skill, profit is only possible when both are combined, and the one with the knowledge wants to maintain all the power in the bargain.


The idea that data science is some kind of numerical alchemy where you can just anonymize the data so much that the people doing the modeling don't even know what the problem domain is just irks me. It's utter nonsense.


I wonder if in the end there will be just a single model, and a single faceless coder who actually makes the best predictions, and the whole of Numerai will be just an elaborate way of finding that particular person and his model to make the market.


With meta-modeling you can build an ensemble (single model) that beats any individual model. This should also have less variance (See Breiman's Bagging Predictors). I'm saying that even if there is a super talent on Numerai, one should still use, say, 0.95 to 0.05 weights, to "hedge your bets" and improve accuracy. With enough competitors, it becomes near impossible for a single agent to beat all the others combined.

Kaggle competitions also see this. https://www.boozallen.com/content/dam/boozallen/documents/20... [page 93] shows a graph where a simple average of the top models in a competition gives an ensemble model with higher accuracy than any of the individual competitors.

I think that is largely the beauty of Numerai. Using adversarial agents to build a "collaborative" model. Disclaimer: Won a few bitcoin on Numerai.
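A small sketch of the bagging point (synthetic data, arbitrary model choices): averaging clipped probability predictions can never do worse than the models' average log loss, by convexity of the loss.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=4_000, n_features=21, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [
    LogisticRegression(max_iter=1_000),
    DecisionTreeClassifier(max_depth=5, random_state=0),
    KNeighborsClassifier(),
]
# Clip so no model can emit an exact 0 or 1 probability
preds = [
    np.clip(m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1], 1e-6, 1 - 1e-6)
    for m in models
]

for m, p in zip(models, preds):
    print(type(m).__name__, round(log_loss(y_te, p), 4))

# Simple unweighted average of the individual predictions
ensemble = np.mean(preds, axis=0)
print("ensemble", round(log_loss(y_te, ensemble), 4))
```

In practice the averaged model often beats every individual model, not just their mean, which is the graph the Kaggle reference above shows.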


This reminds me of the "neural network black box based forex trading" companies 10 years ago, based in countries where financial fraud did not count as a crime.


This is about 2 (big) steps from the plot of "Daemon".



Yep! It's a good read. Highly recommend it.

It has a follow up called Freedom(TM). Both books by Daniel Suarez.


Thanks! I placed a hold for it at my library.


>> " If you are a US taxpayer and have tournament winnings, you will be required to submit to Numerai a Form W-9 with your Taxpayer Identification Number, and you will receive from Numerai a Form 1099."

Bit of a stretch to say they do not know any of the names of coders.

Anyone know how many valid W-9s they've received? Seems like if the user is connecting from a US IP, they would need to prove they're not a US taxpayer.


You have the option to self declare to be not a US resident.


If you actually are a US resident and declare that, you would be breaking the law.

And if you receive payment in a US bank account and declare yourself a non-resident, expect questions to be asked.


Payment is in bitcoin.


It's much more difficult to track. You can convert your bitcoin to cold hard cash at an ATM anonymously.


The fact that it's paid in Bitcoin is irrelevant both to it being taxable and to being tracked down if need be.


How does this compare to Quantopian?


> Write your algorithm in your browser. Then backtest it, for free, over 14 years of minute-level US equities data, and soon, US futures. (source: Quantopian website)

Numerai encrypts the data they give you, so you can't just take your algorithm and start your own hedge fund (because you won't have any data to improve with).


So it could be 1 dev who supplied 7500 different models.


No sarcasm intended here: Does it matter in terms of social utility whether these types of predictive models are making predictions of human decisions to buy or sell a stock versus predictions of AI based decisions to buy or sell a stock?


A model that we understand seems more useful than one that we don't. If the hedge fund is saying "buy company X but not company Y because company Y leases its equipment and that will cost them in the long term" (say), that's notionally more valuable than "buy company X because the computer says so", because other companies can learn the lesson of this and not lease their equipment.

In practice given that hedge funds tend to be secretive anyway maybe it makes very little difference.


It's reasonably easy (not trivial, not impossible) to track the trades of hedge funds, since they do have to disclose their transactions. And some guys will even tell you! [1] [2] [3]

Why they make their decisions is another question.

[1]: https://www.bloomberg.com/gadfly/articles/2016-09-14/herbali...

[2]: http://etfdb.com/etfdb-category/hedge-fund/

[3]: https://whalewisdom.com/


Without knowing much how they obfuscate the data, if anyone could figure out a way to unblind the data, they could reasonably predict what trades Numerai will make, right? Even more important, they could influence those trades.


No, because the predictions are not public. If they were, everyone could just copy the top predictions.


It may be possible to anticipate the predictions of others working from the same data if you know what techniques people tend to apply.


Doesn't this assume that market is composed of completely rational actors? On the same note, why not just do sentiment analysis on the people trading and buying stocks?


I'm pretty sure they're using Order Preserving Encryption and not true Fully Homomorphic Encryption. Marketing is great though - I love the look and messaging.


Investing like this is like implementing a Texas Hold'em poker AI without knowing the history of previous hands or your opponents' attitudes. It just won't work; all the marketing around it is just a bubble.



