
kurumo · on Feb 2, 2014

Imagine I want to send a letter to someone, but I am worried that it may be intercepted (by the government, for the sake of argument). So instead of sending it directly to my recipient, I make arrangements with some reliable friends so that if they receive a letter from one of us, they take it out of the envelope and put it in a different one, addressed to another friend. To make sure the letter doesn't travel forever we add dots at the end; once there are more than three (four, five...), we send the letter to the original intended recipient. Nobody knows for sure who originally sent the letter (as I can put a different number of dots at the end of the letter). We can also make it so that none of the intermediate parties know what the message says (possibly even who the final recipient is?) by encoding the message at every step and using extra envelopes. At the outset I write my letter, add some dots at the end and send it to one of my friends, picked randomly, together with an extra envelope addressed to the intended recipient.

That should do as far as Tor goes, but the more general problem of explaining why your mom would need such a thing is much harder (unless she lives in North Korea or some such place).
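
For the curious, the relay-with-dots scheme can be simulated in a few lines of Python. This is only a rough sketch of the analogy, not of Tor itself: the function name `relay_letter`, the `threshold` parameter, and the friend names are all made up for illustration, and the re-enveloping (encryption) step is left out entirely.

```python
import random

def relay_letter(friends, recipient, threshold=3, rng=None):
    """Simulate the letter analogy; returns the list of people who handle the letter.

    Hypothetical sketch: needs at least two friends, and models only the
    forwarding and the dot counter, not the envelopes or any encoding.
    """
    rng = rng or random.Random()
    # The sender starts with a random number of dots, so the dot count
    # does not reveal how far the letter has already travelled.
    dots = rng.randint(1, threshold)
    holder = rng.choice(friends)
    route = [holder]
    # Each friend adds a dot and forwards to another friend until the
    # letter carries more than `threshold` dots.
    while dots <= threshold:
        dots += 1
        holder = rng.choice([f for f in friends if f != holder])
        route.append(holder)
    # The last friend uses the extra envelope addressed to the recipient.
    route.append(recipient)
    return route

if __name__ == "__main__":
    print(relay_letter(["Ann", "Bob", "Cid", "Dee"], "Mom", rng=random.Random(1)))
```

The route length varies from run to run, which is the point: a friend who receives the letter cannot tell whether the previous holder wrote it or merely forwarded it.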

kurumo · on Jan 25, 2014

While not free, 'Machine Learning: A Probabilistic Perspective' (http://www.amazon.co.uk/gp/aw/d/0262018020) is the best book I have found so far. I also second the recommendations for Tibshirani's and MacKay's books; the former for mathematical foundations, the latter for the intuition.

kurumo · on Jan 25, 2014

There is an IM in Boston who routinely gives people, mostly without regard for their rating, 30 seconds to 5 minutes odds with no increment. I have seen him win at these controls against other masters. So comparatively it is not too surprising that Carlsen would be willing to accept these odds against an unrated amateur. Chess with these kinds of time controls is basically a different game, which requires emphasis on different skills than standard time controls do.

kurumo · on Jan 25, 2014

That is simply incorrect. A GM can perhaps give a strong amateur (2000+) 1 to 5 time odds, maybe 1 to 7, in time controls of under a minute. Larger odds would be arrogant, foolish or even suicidal, depending on the specific GM (rating, opening repertoire, etc.).

RoboTeddy · on Jan 28, 2014

By amateur I was imagining ~1500. If in chess circles "amateur" means "anyone other than a master/grandmaster", then my statement as written is probably false!

kurumo · on April 20, 2013

What precisely makes this particular engine 'the most powerful in the world'? Does it do domain independent named entity recognition with an F score better than 0.8? For what classes of entities? Is it at least adaptable without oodles of training data? Does it do syntactic parsing? With F scores of 0.9 or better? Faster than 200ms per sentence? Across domains? Does it do anything at all in languages other than English? If there is a page on that site where it answers these types of questions, I couldn't find it.

drakaal · on April 20, 2013

Install the TLDR plugin. Pick a web site. Or better yet, go out to Project Gutenberg and pick a book. Tom Sawyer. Push TLDR. Way faster than 200ms per sentence.

Yes it does most Germanic and Romance languages.

Yes it does domain independent named entities with a higher score than anything else on the planet. ALL English classes. Medical, Dental, Animal. (that doesn't include Latin uses of animal names) Technical.

As I said we are just stepping out of stealth. I linked a PDF in the comments here.

kurumo · on April 20, 2013

Thanks, that's somewhat helpful. I am not particularly interested in the summarizer plugin itself (mostly because we have one, built in house), but I would love to talk about the underlying pipeline. If you have e.g. a named entity recognition library that performs as well as you say in Romance languages on standard data sets, you have material for at least one conference paper, and furthermore a product much more valuable than the summarizer itself.

My question about speed referred to syntactic parsing specifically. I am sure you can do entropy scoring faster than 200ms per sentence, but unless you have access to parses you are unlikely to be able to do more than purely extractive summarization. That's what Summly does, and every other summarizer on the planet as well. (Except perhaps Columbia's Newsblaster, but that's a bit of a different story).

drakaal · on April 20, 2013

We do extractive summarization because we don't feel that changing the author's words is fair use. We could do rewriting. We actually have an in house demo that, for lack of a better word, builds Wikipedia pages for animals. (animals have fixed traits so it is easier than if we were to try and do general people, and the information on them changes much less frequently)

I don't have time to do conference papers.

Our pipeline requires almost every one of our capabilities in order to do TLDR.

We have to grab the page. We have to separate the content from the theme. We have to convert the HTML to a not HTML "thing" that lets us work on the text but maintain the HTML. Then we have to disambiguate/segment the sentences. Then we have to analyze the type of content to pick how we are going to summarize it, which requires all the noun, stemming and keyword analysis. Then we have to rank the sentences in importance based on concepts, causation, readability and emotion. Then we have to put all the HTML back, and present it to the user.
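
To make the segment-then-rank steps concrete, here is a toy extractive summarizer in Python. It is emphatically not the pipeline described above: it swaps in plain word-frequency scoring for the concept, causation, readability and emotion ranking, works on plain text rather than HTML, and the names `split_sentences` and `summarize` are invented for this sketch.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")

def split_sentences(text):
    # Naive segmentation: break after ., ! or ? followed by whitespace.
    # Assumption: real segmentation (abbreviations, quotes) is much harder.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, ratio=0.25):
    """Keep roughly `ratio` of the sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(WORD_RE.findall(text.lower()))

    def score(sentence):
        # Average document-wide frequency of the words in the sentence.
        tokens = WORD_RE.findall(sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(len(sentences) * ratio))
    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:keep]
    # Extractive: sentences are copied verbatim, never rewritten.
    return [sentences[i] for i in sorted(best)]
```

Calling `summarize(text, ratio=0.25)` would correspond loosely to a "25% version" of a text, though with a far cruder ranking than anything a production system would use.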

We set the goal that Tom Sawyer can't take more than 45 seconds to run.

kurumo · on April 20, 2013

Fair use or not, if you could do it I would buy it :) Fine, forget conference papers. If you can demonstrate fast NER in multiple languages, across domains, with competitive precision/recall metrics, I will buy it. The rest of it is not particularly interesting to me because it's frankly not that hard.

drakaal · on April 20, 2013

Clothing, Textiles... We did recently learn that I missed furniture. Apparently a curio cabinet is not something that I was getting... but we get chest of drawers just fine, and writing desk. We even get all the weird dogs.

czzarr · on April 20, 2013

I tried it on http://paulgraham.com/startupideas and here's what it gave me:

"How to Get Startup Ideas

[1] [2] [3] You want to know how to paint a perfect painting? It's easy. Make yourself perfect and then just paint naturally. Live in the future, then build what's missing. [4] [5] [6] [7] Live in the future and build what seems interesting. [8] [9] [10] 10 [11] 11 [12] 12 [13] 13 [14] 14 [15] 15 [16] 16 [17] 17"

doesn't seem to work at all...

drakaal · on April 21, 2013

Highlight the part you want to summarize. Like the part without the Notes.

Also Paul's writing is pretty poor. The ideas are good, but he jumps around and uses short sentences with far too many pronouns.

Garbage in Garbage out.

Here is the 25% version, which I think is readable:

The way to get startup ideas is not to try to think of startup ideas. And yet by far the most common mistake startups make is to solve problems no one has.

I made it myself. But galleries didn't want to be online. Because I didn't pay attention to users. Because they begin by trying to think of startup ideas. That m.o. is doubly dangerous: it doesn't merely yield few good ideas; it yields bad ideas that sound plausible enough to fool you into working on them.

At YC we call these "made-up" or "sitcom" startup ideas. But coming up with good startup ideas is hard.

For example, a social network for pet owners. Millions of people have pets. Choose the latter. Not all ideas of that type are good startup ideas, but nearly all good startup ideas are of that type.

Made-up startup ideas are usually of the first type.

Nearly all good startup ideas are of the second type. If you can't answer that, the idea is probably bad. But you almost always do get it.

But while demand shaped like a well is almost a necessary condition for a good startup idea, it's not a sufficient one. If Mark Zuckerberg had built something that could only ever have appealed to Harvard students, it would not have been a good startup idea. Facebook was a good idea because it started with a small market there was a fast path out of. So you spread rapidly through all the colleges. Often you can't.

kurumo · on July 14, 2012

I have been using this when interviewing people with math degrees:

Two players play a game with a single six-sided die. The player that starts can only win by rolling a 1. If he or she doesn't win, the other player gets to roll; he or she can only win by rolling a 6. The game continues until one player wins. What's the probability the first player wins (eventually)?

debacle · on July 16, 2012

Let me see if I can work through this.

Player 1 wins based on the following series:

Chance to get a 1 (turn 1): 1/6

Chance to be allowed to roll: Previous chance to be allowed to roll * 5/6 (Chance player 1 didn't roll a 1) * 5/6 (Chance player 2 didn't roll a 6)

Chance to get a 1 (turns 2+): Chance to be allowed to roll * Chance to get a 1

So... summation( 1/6 * (25/36)^(n-1) )

I don't know what that comes out to be.

kurumo · on July 23, 2012

It's actually 1/6 * Sum_{n>=0} (5/6)^(2n) = 1/6 * 1 / (1 - 25/36) = 6/11. The first move provides a small advantage, as intuition dictates.
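
A quick way to check that result is with exact fractions in Python. The sketch below uses a recursive argument rather than the series: if both players miss, the game is back where it started, so P = p1 + (1 - p1)(1 - p2) * P. The function names are invented for this example, and `partial_sum` simply adds up the first terms of the series above as a cross-check.

```python
from fractions import Fraction

def first_player_win_probability(p_first=Fraction(1, 6), p_second=Fraction(1, 6)):
    # Solve P = p_first + both_miss * P for P, where both_miss is the
    # chance that a full round passes with no winner.
    both_miss = (1 - p_first) * (1 - p_second)
    return p_first / (1 - both_miss)

def partial_sum(rounds):
    # First `rounds` terms of 1/6 * Sum_{n>=0} (25/36)^n.
    return sum(Fraction(1, 6) * Fraction(25, 36) ** n for n in range(rounds))

if __name__ == "__main__":
    print(first_player_win_probability())   # exact closed form
    print(float(partial_sum(200)))          # numeric check of the series
```

Both approaches agree on 6/11, roughly 0.545, and the closed form also works for games where the two players have different per-roll chances.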

stephencanon · on July 14, 2012

I like this one. It can be brute-forced easily enough, but there are also a couple much more elegant approaches that I would expect a good mathematician to produce.

hypeibole · on July 14, 2012

As a mathematician, I'd be interested in knowing why you ask that question of people with math degrees.

What information does that give to you as an interviewer?

kurumo · on July 14, 2012

It tells me if they can at least minimally apply their knowledge and reason about a problem. It's a filter question. If a person with a college degree in math cannot solve this, what's the likelihood they will be able to solve an actual real problem I need them to work on?

kurumo · on Nov 24, 2011

We do (Bloomberg), but on equities, not commodities. We are working on it though :)

kurumo · on Oct 10, 2011

Now, how hard would it be to design a sort of minimal virtual machine that can run this in parallel with e.g. a Windows host OS? Distribute it via some existing delivery vector, et voila...

kurumo · on Oct 1, 2011

New York, NY; Skillman, NJ; London, UK; other locations.

Bloomberg is hiring. Intern and H1B candidates are welcome.

http://careers.bloomberg.com/hire/internsearch.html

http://careers.bloomberg.com/hire/experiencesearch.html

C++, Java, .NET developer positions are available, as well as network engineers, UI designers, sysadmins, etc. Lots of real time and low latency infrastructure work, generally financial domain.

kurumo · on Aug 17, 2011

That is not entirely correct. If you have already acquired credibility (somehow, doesn't matter how for the purposes of this exercise), your estimate or advice can (and frequently does) become a self-fulfilling prophecy. Knowing that a significant number of investors will follow your advice gives you information which can be used in the market. It then stands to reason that the most profitable thing you could do would be to give this advice to the maximum possible number of people.

extension · on Aug 17, 2011

No particular reason the advice has to be accurate.

sliverstorm · on Aug 17, 2011

Accurate advice is more likely to come true, thus bolstering your credibility.

extension · on Aug 17, 2011

But the advice comes true because you gave it. This is the inherent flaw with the market at issue here -- it selects for consistency, but not accuracy per se.