Show HN: Nextjournal – seamless data science for teams

kvlr · on May 7, 2019

I’m one of the founders of Nextjournal and I’m really excited that after almost three years in Private Beta we’re finally opening signups to everyone today!

Nextjournal is a computational notebook platform. Our goal is to make computation more accessible and automatically reproducible, so it becomes easier to collaborate and build on top of each other's work.

If you'd like to know more, check out our launch blog post at https://nextjournal.com/mk/public-beta or sign up and give it a try!

wuschel · on May 7, 2019

Hey,

just had a very brief look out of sheer curiosity, so please take this quick feedback with a grain of salt: The running times of your Python notebook get longer and longer with each print() statement and cell. While the reproducibility of your Python notebook is a wonderful thing to have, I think the performance decrease is a very strong downside.

Cheers -

kvlr · on May 7, 2019

Hey, this sounds like something we'd want to look at. I don't think it's inherent to reproducibility; I think we should be truncating the output in this case. Please send us the notebook via the Help button that shows up when you produce an error in a cell. I'll take a look and we'll figure this out.
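
To make the truncation idea concrete, here is a minimal Python sketch. It is an illustration only, not Nextjournal's implementation: the function name `truncate_output` and the `max_lines` default are hypothetical.

```python
def truncate_output(text: str, max_lines: int = 1000) -> str:
    """Keep only the last `max_lines` lines of a cell's captured output.

    Hypothetical helper: the name and the default limit are assumptions
    made for illustration.
    """
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short output is passed through unchanged
    omitted = len(lines) - max_lines
    kept = lines[-max_lines:]  # the most recent output is usually the most useful
    return f"[{omitted} earlier line(s) truncated]\n" + "\n".join(kept)
```

Capping what is stored and rendered per cell this way would keep the cost of displaying a cell bounded, no matter how many print() calls ran.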

wuschel · on May 8, 2019

Done.

1ba9115454 · on May 7, 2019

Just signed up hoping for Scala but it wasn't there.

Is that something you would be looking at adding?

kvlr · on May 7, 2019

Yes, we certainly want to support a lot more languages in the future, and it should be relatively easy to do as long as there's a Jupyter kernel for the language (which is the case for a lot of them).

We also have a proof-of-concept PR where you can implement a runtime in a notebook.

I think we'll expand the available languages as soon as we're confident the core product is really solid.

kfk · on May 7, 2019

So I do BI/analytics at a big company with a team of 6 people, here is my take. We need something like this aimed at business analysts with little to no coding experience, and we need it to be priced in the $100-300 per year range, not more. Such a tool would compete with the MS Office package and would be great. Most of the stuff is available in various open source packages; it would be about putting it all together in one easy desktop install and adding a nice GUI on top of various functions (like ipywidgets but more high level). For instance, we could totally add a basic GUI on top of altair to do some basic charting, and that basic charting is 80% of business needs when it comes to exploratory analysis.

kvlr · on May 7, 2019

What you're describing is close to our long-term vision for the platform: having it be easy to use for non-programmers but with the ability to customise things and always "peek under the hood" to see how things are built.

Our GitHub and S3/GCS Components (see it in action in my launch post) are actually just thin layers that execute code from other notebooks and we plan to offer this ability to create custom components like this in the future.

mahmoudimus · on May 7, 2019

Thank you for leaving this feedback. Essential for product development.

boltzmannbrain · on May 7, 2019

Looker? https://looker.com

akrymski · on May 7, 2019

isn't that tableau.com ?

kfk · on May 7, 2019

How is that tableau.com? Tableau only does visualization and a very small subset of it, business analytics is a lot more than visually appealing dashboards

MartinMond · on May 7, 2019

I've been researching what data science tools to use at my company.

How is Nextjournal different from Jupyter or Google Colaboratory?

kvlr · on May 7, 2019

While Colaboratory is built on top of Jupyter, Nextjournal is not.

We do support importing Jupyter notebooks and running Jupyter kernels, but we also have our own runtime protocol.

In Jupyter (and hence in Colaboratory) you normally have one runtime that's running both your server code as well as the user code. In Nextjournal there's a separate application called the Runner that orchestrates the runtimes, which are currently Docker images.

This allows us to use Nextjournal notebooks to do any kind of installations without the need for a full Jupyter kernel inside the image, something that gets tricky in Jupyter. Once we have a bash shell inside the image, we can do installations.

You can choose to commit the filesystem state at any time as a Docker image and reuse it in other notebooks. This is actually how our default environment images are built: our default Python environment https://nextjournal.com/nextjournal/python-environment is built on top of the minimal bash environment https://nextjournal.com/nextjournal/bash-environment which just imports a stock Ubuntu image.

Our system takes care of only referencing the image SHAs everywhere, so everything is immutable and you can't accidentally overwrite anything.

You can also pull those docker images and use them locally.

Any data you upload or results you save (just write to a /results folder) is put into content-addressed storage, so same thing here, you'll never accidentally overwrite a file.

Lastly the document is stored in the database (Datomic) and you can restore any previous state.

Leveraging immutability at all layers of the stack is what enables our "remix" feature, so the ability to quickly and cheaply clone any published notebook and continue where another person left off.
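
As a rough illustration of why content-addressed storage makes overwrites impossible and clones cheap, here is a minimal Python sketch. It is not Nextjournal's implementation: `ContentStore`, `remix`, and the dict-based notebook are hypothetical names invented for this example, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Toy content-addressed store: each blob is saved under its SHA-256 digest.

    Hypothetical class for illustration; the real storage layer is not
    described in this thread.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if not path.exists():  # identical content is stored only once
            path.write_bytes(data)
        return digest  # callers keep the digest, never a mutable path

    def get(self, digest: str) -> bytes:
        return (self.root / digest).read_bytes()


def remix(notebook: dict) -> dict:
    """Clone a notebook by copying its name -> digest references, not the data."""
    return dict(notebook)


if __name__ == "__main__":
    store = ContentStore(Path(tempfile.mkdtemp()))
    original = {"results/out.csv": store.put(b"a,b\n1,2\n")}

    # The clone starts out pointing at the same blobs as the original.
    clone = remix(original)
    # Writing a new result adds a new blob; the old one is untouched.
    clone["results/out.csv"] = store.put(b"a,b\n3,4\n")

    assert store.get(original["results/out.csv"]) == b"a,b\n1,2\n"
    assert store.get(clone["results/out.csv"]) == b"a,b\n3,4\n"
```

Because a file's name is derived from its content, changed content always lands under a new name, so a clone only needs to copy references.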

joshe · on May 7, 2019

Put this on your site. I know that https://nextjournal.com/features seems more like marketing copy, but this is much more compelling.

Just saying "much more" or "fully" doesn't help much. Try removing all the adjectives from your marketing copy to see if it's actually communicating anything. (Then edit, then add some back :-)). Also, most of the features on this page are things you get with Jupyter or Colab; address what is actually different, like you do here.

vincentmarle · on May 7, 2019

Awesome, I’ve used Colaboratory mainly for machine learning tasks (I noticed you also have GPU support, nice!), but in my experience it has been very buggy and unstable, so I am definitely looking forward to trying this out.

r3tex · on May 7, 2019

Nextjournal is really how notebooks were meant to be used - for sharing one's code, its output, and all the reasoning in-between with great looking presentation. I'm very happy that my articles turned out so good looking on the platform.

kvlr · on May 7, 2019

for reference: https://nextjournal.com/r3tex/loss-landscape is the article he's talking about.

sandGorgon · on May 7, 2019

How does this compare to Google Colab or Azure ML notebooks for Python only? (I know that Nextjournal supports many more languages.)

Especially pricing per resources (it's not clear from the website).

kvlr · on May 7, 2019

Our standard instances have 3.75 GB of RAM and we keep a pool of three idle ones around. With the free account you can currently use larger instances with up to 16 GB of RAM and one Nvidia K80 GPU.

If you sign up for the paid plan, which is $99 per researcher per month, you can provision more powerful machines, basically anything that Google Cloud offers.

We currently don't enforce any storage limits.

This is our first iteration of pricing though so I'm pretty sure this will still change over time. We've gotten a lot of feedback from people asking for a cheaper plan.

What most people don't realise, however, is that you can use most of the features (including private drafts) for free as it stands now. We've also been debating whether we should allow private drafts on the free plan or take a stance on what open science really means (working in the open from the start), but decided against the latter for now.

Curious to hear what others think about this. Do you expect drafts to be private and would it be a violation of those expectations if they were not?

reacharavindh · on May 7, 2019

I wish I could leverage such polished interfaces for my research group. But we have a lot of contracts that bind us to keep our research data in house. We cannot simply "run something in the cloud".

So, Jupyterhub and manual tinkering to get such polish for now.

kvlr · on May 7, 2019

While I can't say anything definitive or give a timeline, we do want to support research groups like yours. Ideally we'd be able to have our paid offering for companies using Nextjournal in private subsidize our open science/source offering.

We also definitely want to open source parts of our product but we haven't figured out what parts (or everything) and under what license.

Our priority is currently on providing a useful hosted product and becoming sustainable. It's certainly also interesting to see how e.g. metabase is doing it the other way around, open source first without a hosted product, but I guess I'm a bit scared of not being ready for developing Nextjournal in the open at this point in terms of bandwidth and keeping things backwards compatible.

reacharavindh · on May 7, 2019

Completely fair. Your hard work and your product - you should turn it into a sustainable business as you see fit. I was only casually commenting to share my personal thought that it would be a great time saver for me as a sysadmin to "just" use something like your product instead of spending hours tinkering with Jupyterhub to get it to work well. I'm not entitled to anything.

From my past experiences there are a lot of enterprises that are rightfully scared to let their employees use such a service and open up data regardless of what promises a SaaS company makes. There is a general assumption that if the SaaS company fucks up, all we get is a "we take our security very seriously...." blog post.

So, making your product work within a corporate network without "call home" is a great advantage and immediately expands your target audience with some potentially big pockets.

okennedy · on May 7, 2019

> We cannot simply "run something in the cloud".

You might want to check out VizierDB (a project my group is working on). It's a self-hosted multi-language notebook with versioning, branching, and snapshots; as well as a spreadsheet-like editor and provenance-based data annotations.

http://vizierdb.info

all2 · on May 7, 2019

This is something that I want for personal use. I want to be able to control my data, and so hosting an instance (managed and paid for is not a problem) is desirable to me.

refset · on May 7, 2019

This is really neat - great work! It took me less than 10m to figure out how to copy a Crux tutorial into Nextjournal using the Clojure template: https://nextjournal.com/crux/a-bitemporal-tale

The only issue I encountered was that adding comments after the final close parens in the code sections creates EOF errors.

kvlr · on May 7, 2019

Awesome, happy to see that. Been wanting to play with crux anyway. Nextjournal runs on Clojure and Datomic and we use some of pack.alpha and aero from juxt, so thanks!

kvlr · on May 7, 2019

oh and I was able to reproduce the comment issue and will look into it!

ZeroCool2u · on May 7, 2019

Having just gone through an evaluation for platforms just like Nextjournal, there are a lot of companies that make similar claims, but very few that deliver in reality.

In the end, the only one we found that delivered on the promises of reproducibility and managing the entire data science life cycle end to end, facilitating collaboration, and getting stuff done was Domino Datalab[1].

Can you compare and contrast Nextjournal to DD? Better yet, do you feel you're competing in the same areas or are you really more focused just on reproducibility? Even if you're not now, it feels like all these types of products eventually converge to this state, just by nature of the sales process and promising more and more features to customers.

Regardless, it looks really solid, so best of luck!

[1]: https://www.dominodatalab.com/

kvlr · on May 7, 2019

I haven't tried Domino Datalab myself so take this with a grain of salt.

While data science is an obvious use case of literate programming, it's not the only one. I see the fundamental problem that needs to be addressed as one of dependency management. We address this today using Docker. In the future we plan to use a more functional approach, most likely based on Nix or Guix. This more principled approach should address both reproducibility and usability (by allowing images to be composed and providing much better install times thanks to binary caching).

I'm not sure whether Domino Datalab allows for the installation of arbitrary system libraries and packages like we do. Check out some of our machine learning samples, which run on GPUs: https://nextjournal.com/collection/machine-learning

In the future we also plan to allow in-browser JavaScript execution. This is currently hidden behind a feature flag, but we still have an article that uses it at https://nextjournal.com/dubroy/ohm-parsing-made-easy

linkingday · on May 7, 2019

I'm currently working on implementing this same sort of platform at an FI. Most of these vendors have the same exact product, give or take a few key features (like visual programming with Dataiku, cloud/on-prem hosting, etc.).

The biggest thing that I've seen that is missing from almost all of them is a robust data ingestion and transformation engine. THAT'S what I'm interested in seeing.

pmarkovics · on May 7, 2019

Hi, I’m one of the founders of Nextjournal. I haven’t tried Domino Datalab so I can’t give you a comparison but I can speak to the vision of Nextjournal:

We market towards Data Science because our feature set (automatic versioning, reproducibility, collaboration, etc) applies very well to many pain points currently existing in the field. But Nextjournal is not limited to Data Science. We designed it as a general purpose literate programming environment that should be able to address many different use cases: generative art, cloud APIs, spreadsheet-like applications, molecular dynamics simulations, you name it.

The way we want to achieve this is by allowing people to extend Nextjournal eventually, by bringing their own languages and by implementing their own components that can be used in a notebook and shared with other users (e.g. a spreadsheet component or a task board component). We are already building some parts of Nextjournal with Nextjournal, like our component for cloning GitHub repositories into a notebook. We think this will eventually make the platform, as a whole, much more understandable and learnable and will give our users much more agency in what they want to accomplish.

cw · on May 7, 2019

really excited to see you guys launch!