How Microsoft rewrote its C# compiler in C# and made it open source (2017) (medium.com/microsoft-open-source-stories)
148 points by nudpiedo on Feb 11, 2020 | 134 comments


Roslyn's parser and syntax tree are pretty amazing. You can recreate the precise source text from the parse tree up to whitespace. This sort of "bijective parsing" is truly incredible and probably one of the cooler innovations I've seen in parsing technology. I can see a bunch of really interesting things you could do with bijective parsing. For instance, imagine Rails-style boilerplate generation, but done on a semantics-aware, within-file basis. You could conceivably have a code generator that finds a class, introspects whether it has the corresponding method, then generates it if not.

Or imagine syntax reformatting but purely locally. Or semantically aware git diffs that can actually compare the underlying parse trees instead of just raw text. There's so much cool stuff you can do. I wish every language had a parser like this.
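To make the round-trip property concrete, here's a minimal sketch using the public Roslyn syntax API (assumes the Microsoft.CodeAnalysis.CSharp NuGet package):

```csharp
using Microsoft.CodeAnalysis.CSharp;

class RoundTrip
{
    static void Main()
    {
        // Deliberately odd formatting: tabs, double spaces, a trailing comment.
        var source = "class C {\n\tint  X;// comment\n}";
        var tree = CSharpSyntaxTree.ParseText(source);

        // ToFullString() includes all trivia (whitespace, comments),
        // so the original text is reproduced exactly.
        System.Console.WriteLine(tree.GetRoot().ToFullString() == source); // True
    }
}
```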


Dart's parser can do this too. The formatter (dartfmt) takes advantage of this in order to preserve comments and use the original formatting as a hint to the output in some cases.


>Or imagine syntax reformatting but purely locally.

You don't have to imagine it, this has been supported by IDEA for about a decade now at least for some languages (it works for Kotlin at least, I just re-checked before posting this).

> Or semantically aware git diffs that can actually compare the underlying parse trees instead of just raw text.

http://semanticmerge.com/


Pardon me if I'm missing something, but doesn't just about any AST implementation allow recovering the source text up to whitespace – or indeed including whitespace, given that most real-world parsers have to retain lexical information in order to support reasonable diagnostic messages anyway?


Many (if not most) ASTs, especially in batch compilers, do not preserve comments, whitespace, or even all of the actual syntax. Even the diagnostic information you're referring to is insufficient to reproduce the whitespace.

Further, what Roslyn does is quite interesting even on top of preserving all of that. It does so in a way that lets it incrementally re-parse parts of the program without touching the rest, including position information for the rest of the file after the change, which has all shifted to match.

None of this was standard practice anywhere when Roslyn was designed, though it's starting to be picked up in other projects since.
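That incremental re-parse is exposed directly on the syntax tree; a small sketch (the edit position is an arbitrary example):

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

class Incremental
{
    static void Main()
    {
        var text = SourceText.From("class C { void M() { } }");
        var tree = CSharpSyntaxTree.ParseText(text);

        // Insert "2" after the class name; Roslyn re-parses only the
        // affected region and reuses the untouched internal nodes,
        // while positions after the edit shift automatically.
        var newText = text.WithChanges(new TextChange(new TextSpan(7, 0), "2"));
        var newTree = tree.WithChangedText(newText);

        System.Console.WriteLine(newTree.GetRoot().ToFullString());
        // class C2 { void M() { } }
    }
}
```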


There is a reason an AST is called an abstract syntax tree, not just a syntax tree. Many syntax details, like parentheses or the choice between alternative syntax forms, exist only to avoid syntactic ambiguity, improve readability, or for historical reasons, and don't affect the semantics or even error messages. It's not surprising for compiler authors to choose to discard some of that unnecessary information as early as possible to save memory.


Notably, a lot of JavaScript AST libraries, at least a while back, omitted comments and whitespace, which is fine for writing a minifier but horrible when trying to implement source-code transformations.

What I really like about Roslyn's AST (and I guess TypeScript's looks about the same) is that all those details are put into Trivia; ignoring them is just a matter of passing one flag to the visitor, and you don't have to be aware of how many different types of AST nodes there are that might be ignorable. It also means that each and every piece of whitespace and every comment belongs to a syntax token instead of being other nodes interleaved with the normal nodes. This helps keep comments where they belong even after rearranging code in the AST – something that, e.g., ReSharper is atrociously bad at.
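A small sketch of what that looks like in practice (nothing here beyond the public Roslyn syntax API):

```csharp
using Microsoft.CodeAnalysis.CSharp;

class TriviaDemo
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText("// header\nclass C { }");
        var root = tree.GetRoot();

        // The comment and the newline are leading trivia attached to the
        // 'class' keyword token, not separate nodes interleaved in the tree.
        var firstToken = root.GetFirstToken();
        foreach (var trivia in firstToken.LeadingTrivia)
            System.Console.WriteLine(trivia.Kind());
        // SingleLineCommentTrivia
        // EndOfLineTrivia
    }
}
```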


It's called an abstract syntax tree because the concrete syntax is not a tree; it's a list of characters. Not because it makes semantics-invariant transformations.


> It's called an abstract syntax tree because the concrete syntax is not a tree; it's a list of characters.

It's called abstract syntax tree to distinguish it from parse trees (a.k.a. concrete syntax trees).


Many times things like comments are dropped and names are disambiguated. Printing the tree back out would yield an equivalent program but the source code would look way different.


I'm not sure to be honest. I suspect most parse trees preserve most of the info but not up to bijection. For instance, Ruby has ways of taking the Ripper output and generating source text but there's no guarantee that this output will be precisely the original source text. After all, the parser would need to distinguish between spaces and tabs and store them as tokens to reproduce the text fully.


> doesn't just about any AST implementation allow recovering the source text up to whitespace

That's correct. Especially for parsers used in IDEs you most definitely care about exact locations of everything, including whitespace.


How would you compare special features, if any, as well as advantages (and disadvantages) of using C# for developing an embedded DSL versus using Julia macros for the same purpose versus using specialized toolsets (e.g., MPS) for developing an external DSL? Please note that I'm aware of the Modeling SDK for Visual Studio. However, since it only allows integration with / targets the Visual Studio environment, it is not a good general approach, hence the question.


Maybe you can look at F#; it will compile for Android, iOS, Windows, Linux, macOS, JavaScript frontends, and server side. The feature set is different from Julia's, but it is very powerful and versatile.


Thank you for the suggestion. I will definitely take a more detailed look at F#. I read about it some time ago, including some very positive general feedback. However, even if F# is an excellent fit feature-wise, I can see two potential issues: 1) lack of a decent package ecosystem (since the planned DSL is only one part of a multi-feature, multi-aspect platform puzzle) and 2) lack of a significant enough pool of experienced developers, which would make building a good team a challenge (due to its relative popularity, C# is IMO much better than F# in this regard).


I find it amusing how Microsoft moved from being the industry's closed-source referent to such an open-source player. It now even allows hooking some of its tools and platforms into competing platforms and tools.


Their intent is still entirely market share. We need to be vigilant about how they get there. History has taught us a lot of lessons we seem to have forgotten because of a shiny new layer of marketing. There is still a massive cultural and technical impedance mismatch.


It is interesting to me that I read people criticizing Microsoft for open sourcing code as doing it for marketing reasons (which seems accurate) but not criticizing Google or Apple or whoever else when they open source code.


There are plenty of people on this very forum regularly criticizing Google and Apple's open source efforts...


Apple doesn't release code under Free licenses for marketing purposes: I'd be surprised if anyone in the marketing department of Apple even realized that they were doing so.


Apple releases code under free licenses because they are required to and do exactly that minimum required.


That's not entirely true. launchd was released under a free license, and it was one of the most interesting innovations in software in a long while, for example.


So now copying what Sun, AIX, Tru64, and HP-UX already did counts as an innovation? I guess it fits the pattern of many Apple "innovations".


You're completely wrong in Sun's case, SMF is nothing like launchd.


It looks quite similar to me in concept.


I agree. A lot of what they are doing now looks like EEE when you peel back a couple of layers.

I say this as someone who happily uses a lot of the stuff they produce in the process. I really, really, intensely hope I'm wrong and my fears aren't realised.


It's the same corporate entity, but Microsoft's incentives and the majority of its executives have turned over. So I don't see any reason why Microsoft would be more likely to EEE than another company. In fact, I think they would be less likely, because they have more to risk reputationally.


> A lot of what they are doing now looks like EEE when you peel back a couple of layers.

Doesn't look like; they are doing EEE, just of a more 'benevolent' nature.

.NET between around 2010 and 2016 was a pretty boring/dire time for the community. Things were so bad that the community started solving its own problems, which wound up giving us a bunch of great projects.

But now, fast-forwarding to today, .NET Core (or, to put not too fine a point on it, ASP.NET Core) is doing a whole lot of what is, at best, steering developers away from good practices and, at worst, (un?)intentional EEE:

- EF Core: EF6 was terrible for a lot of reasons. EF Core tries to get people to access relational and nonrelational databases via the same API. Why? (Of course, with Cosmos the answer becomes obvious...)

- MS DI: Microsoft looked at the best examples of DI the community provided, and wound up deciding that breaking a number of established paradigms made it easier to build ASP.NET Core, even if folks who have written the best libraries/literature on DI in .NET explain why it's a bad idea.

- Serialization: 'System.Text.Json will be better than the other libraries out there' I heard that from multiple voices. It's still not really that good, but people try anyway because MS Stack.

And that's a bit of the problem. Many of these things _are_ needed for the 'safe shops'. I've been at places where deviating from the MS stack was specifically discouraged because they were worried about longer-term support. Broadly speaking, if it had a big backer/sponsor (i.e. StackOverflow's Dapper) it was a non-issue, and it wasn't a -bad- way to make sure developers weren't just using $"{blogToolOfTheWeek}". Later on, however, we shoved that mindset aside as much as we could. Instead we made it our goal to find the best libraries to solve our problems... and our productivity went way up.


If it's open source, how does extinguish work?


Well, it's the EEE I'm comfortable with. I hope it stays this way.


Given the new mission statement of Microsoft, I think if the community is more vocal about the things they want to accomplish it will become imperative that people at the company will be assigned to making those desiderata a reality. So if you want more visual tooling, find a way to become more vocal about it. If you want more support for this or that tech fad, make it known.


The community has been very vocal on a number of issues and been thoroughly ignored and steamrolled over. Or the direction changes once the desired outcome is established.


> Their intent is still entirely market share.

So like essentially all other corporations?


Precisely.


On the other hand, a company just has to come out with propaganda like "Don't be evil" and everyone rallies behind them.


I think they are really repositioning the company as powering general computing through their cloud platform as opposed to powering general computing through their OS.


I think they will slowly turn into IBM. Maybe a better version of IBM though.


I hope that they have no interest in becoming a global services company...


I think they are becoming more and more enterprise oriented so global services company is the way to go.


Wonder if this will eventually lead to the company splitting into separate consumer and enterprise companies as the overlap between the two halves grows smaller.


Isn't this because they've jumped on the ad model perfected by Google, Facebook, etc.? Open source it for press/goodwill, give it away for free, and collect data.

"The world’s most valuable resource is no longer oil, but data"

https://news.ycombinator.com/item?id=14269073


There are so many flavors of open source. It's not just about the licenses but also the way development is done. Do you truly collaborate in the open, or do you develop first and then push to a public repo? Is development a joint effort involving serious contributors from multiple companies, or mainly driven by a single entity?


I think the article answers your concerns. They develop the whole process on GitHub, and many features are driven and implemented from outside Microsoft.


Their Azure platform is very closed.


I've always loved C# but can't quite get past having to target specific .net frameworks and keeping the different frameworks in my different servers straight and targeting each of them differently. Maybe that'll change with .net core (and once I'm able to get that on my servers), but for now I've discovered a personal love for Go as a way around this for my small utilities.


.NET Core 3.1 is a good LTS release, and there's no time like the present to start migrating to it. Unlike .NET Framework, .NET Core puts a lot less emphasis on machine-wide installs and offers a lot more capability for application-specific framework deployments. .NET Core even has tools to bundle all of your framework dependencies into a single "EXE" and tree-shake that down to something approaching a minimal bundle size. There's a small but growing world of "go-like" small utilities that are entirely self-contained .NET Core deployments.

https://www.hanselman.com/blog/MakingATinyNETCore30EntirelyS...

There's even fun experiments of AOT compiling to get interesting in "EXE golf" results from .NET Core applications such as getting them below 8 KB or running on Windows 3.1 (because why not):

https://www.hanselman.com/blog/NETEverywhereApparentlyAlsoMe...

.NET 5 will integrate further AOT capabilities as the Mono world is merging in, in addition to the raw marketing advantage that 5 > 4 for anyone still struggling to convince non-technical managers that .NET Core is a better investment in 2020 than .NET Framework.
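For anyone trying this, the self-contained single-file publish mentioned above boils down to a couple of flags; the runtime identifier here is just an example:

```
# Self-contained, single-file publish (.NET Core 3.x).
# PublishTrimmed is optional and can break reflection-heavy code.
dotnet publish -c Release -r linux-x64 \
  -p:PublishSingleFile=true \
  -p:PublishTrimmed=true \
  --self-contained true
```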


.NET Core doesn’t need to be preinstalled. You can package a self contained deployment now and it comes with all of its dependencies. Or use one of the docker images to build on and run your app in a container.


The madness will hopefully be ending soon with .NET 5.

https://devblogs.microsoft.com/dotnet/introducing-net-5/

> There will be just one .NET going forward, and you will be able to use it to target Windows, Linux, macOS, iOS, Android, tvOS, watchOS and WebAssembly and more.


I'd say it's a mess, but it isn't a hassle. Targeting .NET Standard or .NET Core is pretty straightforward and easy.

But the fact that all this exists is a mess and thankfully MS finally has a good plan to clean it up.


While the rewrite enabled faster development of the language as a whole, it also gradually destroyed the IDE's performance. The editor in VS 2019 is simply unworkable.

I blame this directly on the immutable AST. While a nice concept in theory, it causes too many allocations, and is cumbersome to work with.

I predict another rewrite in 2 or 3 years.


I had to ditch ReSharper to get to 2019 because, together with the performance of the IDE itself, it just wasn't usable. R# ate gigs of RAM and Roslyn does the same. That's not surprising, since they basically do the same thing, in managed code! But I can't pay the CPU time and memory to analyze my code TWICE on every edit. I also suspect things like switching build configuration offer thousands of opportunities for some IDE widget to hold references to old compilation data structures that are never garbage collected. On the bright side, at least 2019 works pretty well without R#.


> I had to ditch ReSharper to get to 2019 because together with the performance of the IDE itself, it just wasn't usable. R# ate gigs of Ram and Roslyn does the same. It's not surprising since they basically do the same thing, in managed code! But I can't pay the CPU time and memory to analyze my code TWICE on every edit.

I'm holding off on 2019 as much as I can. Between 'forcing' an upgrade for .NET Core 3.0 and the fact Resharper slows it down too much, I decided to give Rider a try.

I'm finding myself not missing VS a whole lot; on one hand Rider is taking way more RAM to start and load, but it stays pretty constant after the first debug session, winds up staying under VS for memory on longer loads (Especially if I've got multiple solutions open) and it's smoother than VS the whole time.


Is Rider 64-bit?


It is, but that's not too relevant, as the ReSharper component runs in another process (but that's managed code as well, so probably also runs as a 64-bit process).


I'm not using ReSharper, and never have. The editor simply gets slower every version, and now it's come to the point where I seriously consider downgrading.


I did the exact opposite: I migrated to Rider, an IDE built around ReSharper. In my experience, it is much faster than VS2019 for my use case (including a huge monolith with hundreds of projects)


I highly suggest giving Rider (also by JetBrains) a try. They run most things in separate processes/threads instead of running everything in the UI thread like VS does. It's almost a drop-in replacement.


NB: It's not VS that's forcing ReSharper to run in the same process. It's JetBrains refusing to listen to the VS team on how to build VS extensions that need plenty of resources since 2008 ... (they're finally considering it as of summer 2019, I think).


> I blame this directly on the immutable AST. While a nice concept in theory, it causes too many allocations, and is cumbersome to work with.

This isn't the cause of the performance problems you're seeing. What's most likely happening is that the overall size of your code has increased a lot over time, surfacing issues that have existed for a long time but weren't being felt yet.

The issue that's most directly related to this is here: https://github.com/dotnet/roslyn/issues/40300

However, immutable vs. mutable isn't related to the issue I linked. It's just about keeping more data around than is (perhaps) necessary. You'd see the same issue with a mutable AST. If you're curious about specific work that's being tracked you can use this label: https://github.com/dotnet/roslyn/issues?q=is%3Aopen+is%3Aiss...

And if you submit reports via the VS Report a Problem tool, with the option to collect a diagnostic trace, you'll generate exactly the data needed to fix issues that you're facing. The team is very keen on addressing performance problems, especially if there's diagnostic data that can pinpoint the source of a problem.


> the overall size of your code has increased a lot over time, causing performance issues due to issues that have existed for a long time, but weren't being felt yet.

Not really. A simple empty project displays the same problems. You can try going back to 2017 right now with any project you're working on, you'll feel the difference instantly.

IntelliSense simply takes longer to respond, and likewise other editor functions.

Their feedback forums have hundreds of similar reports.


Interesting. I've observed the opposite, with VS 2017 often being unbearably slow in comparison as the codebase gets large. But I can appreciate that you may be experiencing the opposite. I highly recommend filing issues, especially if you can do so with a specific, reproducible problem. Those tend to get resolved quite quickly.


> Those tend to get resolved quite quickly.

The issues are getting resolved in the public bug tracker, pretty quickly indeed, typically with "can't reproduce, won't fix", sometimes "not a bug, won't fix".

I'm not sure the software problems are getting resolved.


You can look through the GitHub milestones to see the specific issues fixed in each release. For example, here are the 230 tracked items done for the VS 16.4 public release: https://github.com/dotnet/roslyn/milestone/53?closed=1

You can also see that there are many more resolved issues in previews of that public release: https://github.com/dotnet/roslyn/milestones?state=closed

So there are a _lot_ of legitimate problems being fixed and enhancements being added over pretty short periods of time.

When something is resolved as "no reproduction" or "not a bug", that's because there was an earnest attempt to reproduce an issue with the latest bits set to go out to a release with no reproduction, or something is truly by design (e.g., user files an issue because they would prefer a feature to do something different than it does today).


My experience might be irrelevant to Roslyn, I experienced these things when reporting Visual Studio bugs in general. Examples:

https://developercommunity.visualstudio.com/content/problem/... (BTW, people have been reporting that bug for years now)

https://developercommunity.visualstudio.com/content/problem/...

https://developercommunity.visualstudio.com/content/problem/...

https://developercommunity.visualstudio.com/content/problem/...


That is interesting and counter to my experience. Consider looking into alternative causes (uninstalling plugins that may not be playing nicely with your particular version, background updates, etc.)


Have you replicated this issue on multiple machines?


If an immutable data structure causes too many allocations, surely that's an issue with the allocator rather than immutability?

I'm not familiar with this compiler but I'm currently writing a mostly immutable structure so I'm curious as to whether it's an issue.


The problem with immutability in a rapidly mutating environment is that the theory clashes with reality.

Anytime a leaf node changes, all its ancestors have to be replaced, instead of just updating the leaf in place. (I'm aware of the red/green tree separation, but I believe that in practice most of the tree is constantly regenerated.)

I realized it when trying to write a complex analyzer. I had to replace the tree all the way up to the project level. If you combine different chunks of the tree, each with a slight change, you're forced to recreate each of those chunks.

This is extremely wasteful, and no wonder the IDE behaves so poorly.
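For illustration, the path copying is visible even in a tiny example; the rename below is arbitrary:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class PathCopy
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText("class C { int X; }");
        var root = tree.GetRoot();
        var field = root.DescendantNodes().OfType<FieldDeclarationSyntax>().First();

        // Changing one leaf produces a new field node AND new copies of
        // every ancestor up to the root; the original tree is untouched.
        var newRoot = root.ReplaceNode(field,
            field.WithDeclaration(field.Declaration.WithVariables(
                SyntaxFactory.SingletonSeparatedList(
                    SyntaxFactory.VariableDeclarator("Y")))));

        System.Console.WriteLine(ReferenceEquals(root, newRoot)); // False
    }
}
```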


Also, I forgot to mention that in some analyzers, even with no actual code change, the symbols change meaning, and then you're forced to recreate the tree regardless, because you can't change the node-symbol association.



I found that VS 2019 is faster than VS 2017, but VS 2019 with ReSharper is a lot slower than VS 2017 with ReSharper. I would blame ReSharper there, not VS. VS has never felt as fast as it does right now.


Every VS release is one reason less to use Resharper.


The biggest reason IMO not to use ReSharper is actually Rider (so I don't use VS anymore either). You get all the ReSharper goodies, but inside an editor that's more nimble than vanilla VS.


I just don't use ReSharper anymore and almost don't miss it. I tried Rider but don't use it anymore. I don't like the IntelliJ/Rider UI: the default shortcuts are horrible, and basic actions are hidden in submenus. In my opinion, Visual Studio's UX is not great, but it's far better than that of JetBrains products.


If I learned anything from my Borland days, it was to only get my IDE tooling from the same factory that makes the sausage, instead of always playing catch-up with the OS vendor's tooling.


I never had any issues with it, I guess because I keep myself away from JetBrains products as much as possible (have to endure them on Android).


Visual Studio is 32-bit and limited to about 3.5GB of RAM. The increasing functionality and solution sizes create bottlenecks in the processing.

This is a fundamental problem that for various (outdated and bad) reasons the team hasn't fixed. They've been refactoring components to run in separate processes but it's slow progress and still won't solve the main thread running out of memory anyway.


They kind of are already rewriting it, with VSCode + OmniSharp.


OmniSharp + VSCode uses Roslyn as well, so there's no rewrite going on here. But VSCode is very different from VS: it's a process host where language services run in entirely separate processes. In VS, things are a bit more complicated, since lots of things run in the IDE process, but in the case of Roslyn, other processes are spun up to run specific things.


VS Code is like 50% of VS capabilities.


VS2019 runs fast for me O.o


I still can't believe more people aren't leveraging the Roslyn APIs to write compiler extensions or additional tools around C#. It's conceptually powerful.


Personally speaking, I'd love to, but I'd then have to figure out how to make them work with an IDE. I'm very tired of waiting for record classes to show up, and I could have written a (to be clear: inferior) set of stuff around regular classes that magics one into "everything is readonly and we autogenerate a `copy` method", much like Kotlin does for its data classes... but my IDE isn't going to understand it unless I do a lot more work, so I never bothered.


I have a code-gen for records and discriminated unions [1] (as well as other useful features). It will generate a record type at build time (which includes the background build process in VS). All that's needed is a [Record] or [Union] attribute:

[1] https://github.com/louthy/language-ext/wiki/Code-generation


You just made my day. Not kidding. Thank you.


I'm not sure that it would be that difficult with Roslyn and Visual Studio these days. There are Roslyn-based syntax-highlighters and linters and snippet-generators and code-transformers. I use one for making sure all of my code is not just formatted the way I want it, but auto-inserting "readonly" modifiers for fields that don't get re-written outside of the constructor, one that treats not implementing IDisposable correctly as an error (it can also track the lifetime of a Disposable object and warn when it detects that it never gets disposed, which is super cool), and another one that rainbow-highlights code blocks. The tooling is there to support just a thing, it just needs someone to put all the pieces together.


Hey, that's cool. Sounds like things have really improved. Maybe next time I get back into C# I'll see about writing the gizmo I want, if somebody else hasn't already. Thanks a lot.

(If somebody has--well, I wouldn't mind some lazyweb recommendations...)


What is the name of that IDisposable checking one? That sounds very, very useful.


I think the easiest way to set it all up is to manually edit your CSProj files, especially if you are still building .NET Framework projects. By default, Visual Studio will only create the new SDK Style project file format for .NET Standard and .NET Core projects, but it's still usable for .NET Framework projects if you manually change the format. Once you change it, it sticks, so you can use VS to edit the config after that, but it's still pretty easy to edit by hand now.

So here is my base project config: https://github.com/capnmidnight/Juniper/blob/master/Juniper....

The most important part is the first PropertyGroup sets values for all build configs, in particular is setting LangVersion to 8.0. Framework 4.8 taps out at C# 7.2, but you can use most of the C# 8.0 features, including fully async streams if you manually set the language version. Features that aren't available are some minor things like the array ranges and indexing: https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...
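As a sketch, the relevant slice of such an SDK-style project file looks like this (the values are examples, not taken from the linked repo):

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net48</TargetFramework>
    <!-- Opt in to C# 8.0 on .NET Framework. Purely syntactic features
         work; runtime-dependent ones (e.g. default interface members)
         still won't. -->
    <LangVersion>8.0</LangVersion>
  </PropertyGroup>
</Project>
```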

And here are my base project with the analyzers I use: https://github.com/capnmidnight/Juniper/blob/master/Juniper....

They're all ones provided directly from Microsoft, though there are a bunch more from other vendors: https://www.nuget.org/packages?q=analyzer

Then here is an example project using that targets file: https://github.com/capnmidnight/Juniper/blob/master/src/Juni...

You can see just how much the new SDK Style project file format simplifies things. There is no importing of any base Targets files hidden deep in Visual Studio's install directory anymore.

I manually import the .props and .targets file instead of using Directory.Build.props and Directory.Build.targets because I have other projects that use these configs, included via a git submodule.

Here is my .editorconfig file, where I set most of the rules related to Disposable types to errors: https://github.com/capnmidnight/Juniper/blob/master/.editorc...

And this Visual Studio extension makes .editorconfig files a lot nicer to work with: https://marketplace.visualstudio.com/items?itemName=MadsKris...

(BTW, I pretty much install all of Mads Kristensen's extensions)

And while I'm here, I'll give a shout-out to Viasfora for its syntax highlighting modifications that rainbow-highlight code blocks: https://marketplace.visualstudio.com/items?itemName=TomasRes...

And VSColorOutput for making the Output window in Visual Studio actually readable: https://marketplace.visualstudio.com/items?itemName=MikeWard...


Five years ago I wrote a C# code analyzer that found, suggested, and fixed async versions of EntityFramework extension methods in your async functions.

Example: if my function was async and had a .ToList() call inside, it would suggest using .ToListAsync() and, with a click, auto-fix it in your function.

Here is the repo for reference:

https://github.com/aviatrix/YARA

It took me a couple of days to wrap my head around all of the new concepts, but it was quite fun! To get it integrated with the IDE, you need a "CodeAnalyzer" class, and that gets executed automagically and provides annotations in the IDE :)
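For anyone curious, the skeleton of such an analyzer looks roughly like this (the diagnostic ID, messages, and the name-only check are simplifications; a real analyzer would also consult the semantic model to confirm the receiver is an EF queryable):

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class UseAsyncAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical ID and message, for illustration only.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        "DEMO001", "Use async variant", "Consider '{0}Async' inside async methods",
        "Usage", DiagnosticSeverity.Info, isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        // Called once per invocation expression in the compilation.
        context.RegisterSyntaxNodeAction(Analyze, SyntaxKind.InvocationExpression);
    }

    private static void Analyze(SyntaxNodeAnalysisContext context)
    {
        var invocation = (InvocationExpressionSyntax)context.Node;
        if (invocation.Expression is MemberAccessExpressionSyntax member
            && member.Name.Identifier.Text == "ToList")
        {
            context.ReportDiagnostic(
                Diagnostic.Create(Rule, member.Name.GetLocation(), "ToList"));
        }
    }
}
```

A code-fix provider (not shown) would then offer the one-click rewrite to the Async variant.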


We've written a custom compiler from C# to Java and JavaScript based on Roslyn. Over time it gained more features and target languages as well (Python is in progress; we can also emit a working GWT wrapper for the JavaScript output, and we can emit TypeScript typings or just normal TypeScript, etc.). This helps us offer our products on various different platforms without having to write the code anew in different places. Since our product is a library, not an application, we couldn't really take advantage of existing conversion tools, which mostly take the path of converting IL to hideous code and an entry point.

The whole thing is now used in basically every library build we have at some point, even for the C# versions, as it ties in with our documentation writing process and places the correct API names and links for that product into the documentation, even though the docs start with mostly the same content for each.

I agree that lack of documentation makes working with Roslyn a bit daunting at times, although the API is very well designed and oftentimes it's very obvious where to look for something. I was also very impressed by their compatibility efforts. We started while Roslyn was in beta and upgrading through the releases worked without a hitch.


I'll second the API being very well designed. It is incredibly legible from a technical perspective and it has actually been generally a joy to figure it out, as opposed to just being told how it works. That being said: I need some bathroom friendly reading material every now and again.


I looked a bit when it was in preview. It's indeed powerful, but also quite verbose, and it distinguishes between a lot of concepts, so there is a steep learning curve. Then there was the lack of examples beyond a few blog posts.


Oh man, the verbosity thing was a huge problem until C# got "using static". Any static classes that just hold static methods can be imported into your code module as bare functions. I frequently do "using static System.Math;" and "using static System.Console;" when tossing together little mathy processor apps.
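For example (a trivial sketch):

```csharp
using static System.Console;
using static System.Math;

class Mathy
{
    static void Main()
    {
        // Sqrt, Pow, and WriteLine resolve as bare names
        // thanks to the 'using static' imports above.
        WriteLine(Sqrt(Pow(3, 2) + Pow(4, 2))); // 5
    }
}
```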


Agreed. I have a few ideas and it would be nice to see more examples. Right now it’s hard to make sense of things without spending a lot of time on it.


At some point I wrote a barebones scripting system using Roslyn for a project of mine. At runtime I compiled a piece of code to a DLL in memory and then executed it. Worked well, but at the time there was no support for destroying AppDomains or something; I don't remember the exact name. Still pretty fun. But there's no real complete documentation anywhere, and the DLL hell was real: dozens of DLLs just to support this. Now, with .NET Core 3+, things must have improved a lot.
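The compile-to-memory trick is still only a handful of lines; a sketch (the minimal reference set shown here often needs to be expanded on .NET Core):

```csharp
using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class ScriptHost
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "public static class Script { public static int Run() => 42; }");

        var compilation = CSharpCompilation.Create(
            "InMemoryScript",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Emit the DLL to a memory stream instead of disk.
        using var ms = new MemoryStream();
        var result = compilation.Emit(ms);
        if (!result.Success) throw new Exception("compile failed");

        // Load the emitted bytes and invoke the method via reflection.
        var asm = Assembly.Load(ms.ToArray());
        var value = asm.GetType("Script").GetMethod("Run").Invoke(null, null);
        Console.WriteLine(value); // 42
    }
}
```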


Roslyn is very powerful, but it is still pretty cumbersome. At one point I spent quite a while trying to get a game scripting system akin to the way that Lua is commonly used working, and I just couldn't get it working fast enough to be viable. I essentially wanted to have a core C# engine that provided services, and then have it call an initialization function and a gameloop function that were defined in designated script files, and then all of my game code would also be written in other scripts; this way I could run things, edit the code on the fly, and hot-reload.

It's entirely possible that I just don't know what I'm doing well enough to do this correctly, but I just couldn't get it to do the kinds of things that I wanted from it. My sense is that it is really great for injecting custom code that is used rather infrequently. I've had good results using it for building out reporting systems that are pluggable.


I think ignorance is a pretty powerful argument to be made by any developer wanting to use Roslyn. The last few times I messed with it there was literally no documentation and everything I knew/know comes from reading headers, trial and error, and experimentation.

I feel defensive about calling it cumbersome though, and I can't imagine why something like your Lua vision isn't possible (though I've never tried; I just assumed someone would inevitably do this). For example, if World of Warcraft were to switch out their Lua UI extension system for C#, I could totally imagine this being possible (though it'd be suicidal for their mod community). Likewise, if Unity were to begin using it for this kind of thing (if they don't already), I'd imagine it is possible.


We have some documentation now: https://docs.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/

Would really appreciate bug reports/comments on these docs pages about what else you would like to see.


Well, it's not cumbersome at all. The problem is really the lack of real documentation and, most importantly, real-world examples. About the performance: if you use Roslyn's actual scripting support, yeah, I think you'll get bad performance. But if you compile the code to memory at runtime and execute it like I did, it's pretty fast.
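A minimal sketch of the compile-to-memory approach described here, assuming the Microsoft.CodeAnalysis.CSharp package (on .NET Core you may need extra references such as System.Runtime beyond the one shown):

```csharp
using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class InMemoryCompile
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "public static class Script { public static int Run() => 42; }");

        var compilation = CSharpCompilation.Create(
            "ScriptAssembly",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using var ms = new MemoryStream();
        var result = compilation.Emit(ms);        // compile straight into memory
        if (!result.Success)
            throw new Exception("compile failed");

        var asm = Assembly.Load(ms.ToArray());    // no DLL ever touches disk
        var run = asm.GetType("Script").GetMethod("Run");
        Console.WriteLine(run.Invoke(null, null));
    }
}
```

Once loaded this way, calls into the generated assembly run as ordinary JIT-compiled code, which is why it performs so much better than interpreting or re-evaluating scripts.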


FWIW, I wrote effectively what you describe pre-Roslyn, using the old CSharpCompiler, and it was plenty fast enough for my own developer tolerances. I ended up going to a compile-on-startup mode instead, though.


DLL Hell doesn't refer to "lots of DLLs", it refers to conflicting versions of DLLs not being easily manageable across multiple applications. Outside of putting assemblies in the GAC (which was always a hack of last resort anyway), .NET has never had "DLL Hell".


Yeah, the closest .NET equivalent is Binding Redirection hell, and NuGet and Visual Studio and .NET Core have spent years of work making that better than it was at its worst in the early NuGet days. There's a couple of low level APIs that are still annoying problems to redirect if you need to support particular sets of .NET Framework and .NET Core, but beyond that a lot of it is managed for developers automatically these days.


Yeah, the VS interface for managing it is not great, but the new project file format is a lot easier to understand and manage, so I've had good luck just making manual changes.


> .NET has never had "DLL Hell"

I take it you've never worked on a .Net solution with projects targeting both full framework and Core framework.

Core v1.x stuff was a nightmare - haven't had so many issues with versioning in 20 years. Core v2.0 was still pretty bad but each v2 point release made decent strides - and specific packages would get updated out of band at times to fix issues.

But to say .Net has never had DLL Hell is just wrong. Even pre-Core you could run into difficult situations with conflicting downstream dependencies of directly used packages.


DLL Hell had nothing to do with project development. It was a problem of application deployment and running applications with DLLs in shared locations.

https://en.wikipedia.org/wiki/DLL_Hell

  The problem arises when the version of the DLL on the computer is different than the version that was used when the program was being created. 
Other than the GAC, which was never recommended for use anyway, .NET has never had DLL Hell.


GAC was surely the recommended way until .NET 4.0, when the location changed.


No, the recommended way was to install your application with all dependency DLLs in the application install location.

And I've misspoken about GAC causing DLL Hell for .NET. It fixed DLL Hell, but introduced a new Strong Naming Hell.


Initially that recommendation only applied to native code DLLs, not managed ones, as far as I can remember.


Yeah, lots of assemblies is kinda like .NET DLL hell ;) .NET Core had this problem in earlier versions. Not anymore.


Maybe Assembly unloading, in which case if you wanted to swap out new dlls in memory you'd have to tear down the process and restart it. Definitely not sexy. That being said: I think Assembly Unloading is a thing now (and AppDomains don't exist in .NET Core last I heard).


In the .NET Framework (i.e. not .NET Core) you can load and unload assemblies without tearing down whole processes by creating AppDomains within a process. You then load your desired assemblies into these app domain(s), consume them, and when you need to load, say, a newer DLL version, you just tear down the AppDomain and create a new one for the new DLLs.

It's a feature that's been around since .NET 1.1 and I used to use heavily 10+ years ago.

https://docs.microsoft.com/en-us/dotnet/api/system.appdomain...
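A minimal sketch of that pattern on the .NET Framework (it won't run on .NET Core, where `AppDomain.CreateDomain` is unsupported; the `Plugin` type here is a stand-in for whatever you'd actually load):

```csharp
using System;

// The plugin type must derive from MarshalByRefObject so calls cross the
// AppDomain boundary via a proxy, rather than loading the assembly into
// the default domain (which would pin it there forever).
public class Plugin : MarshalByRefObject
{
    public string Greet() => "hello from the sandbox domain";
}

class Host
{
    static void Main()
    {
        var domain = AppDomain.CreateDomain("PluginDomain");
        var plugin = (Plugin)domain.CreateInstanceAndUnwrap(
            typeof(Plugin).Assembly.FullName, typeof(Plugin).FullName);

        Console.WriteLine(plugin.Greet());

        // Unloading the domain unloads every assembly loaded into it.
        AppDomain.Unload(domain);
    }
}
```

The proxy boundary is also why this approach was cumbersome for hot game scripting: every cross-domain call pays remoting overhead, and only serializable or MarshalByRef types can cross.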


Yeah, if I remember correctly it's indeed a thing in the latest versions of .NET Core.


This looks like a pretty cool demonstration of what we're talking about: https://www.strathweb.com/2019/01/collectible-assemblies-in-...

See the section titled: Collecting a dynamically emitted assembly
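The .NET Core 3+ replacement for AppDomain unloading is a collectible `AssemblyLoadContext`. A minimal sketch (the DLL path and the `GameLogic.Tick` entry point are hypothetical placeholders):

```csharp
using System;
using System.Runtime.Loader;

class HotReload
{
    static void Main()
    {
        // isCollectible: true is what makes the context unloadable.
        var alc = new AssemblyLoadContext("scripts", isCollectible: true);
        var asm = alc.LoadFromAssemblyPath(@"C:\scripts\GameLogic.dll");

        var tick = asm.GetType("GameLogic")?.GetMethod("Tick");
        tick?.Invoke(null, null);

        // Drop all references to the assembly's types first; the actual
        // unload completes once the GC collects the context.
        alc.Unload();
    }
}
```

Unlike AppDomains there's no proxy boundary, so loaded code runs at full speed, but you're responsible for not leaking references that keep the context alive.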


Yeah that's exactly it. Thanks!


From my perspective I’d love to, but after writing C# for nearly 20 years I can say it’s a rough ride for anyone trying to keep up with the platform. Knowledge is thrown in the trash faster than you can learn it, and direction changes of all sorts jump out at you. I’m not up for investing my time in that any longer.


I don't think that's true. If you're familiar with functional programming concepts, nothing in C# is really that new. In my experience, C# is just getting easier to use as they add more language support for things you'd normally have to implement on your own.


The core language is fine but the surrounding ecosystem is volatile.


Why isn't the native code compiler also now written in C#, like Java is doing?


Depending on what you consider the runtime, a lot of it is already written in C#. And many new concepts are written in C# when they may have been written with C++ in the past, in part due to C# supporting more low-level concepts than before.

That said, large-scale rewrites don't happen just because engineers feel like doing them. In the case of Roslyn, there were multiple compilers (C# language compiler, VB language compiler, C# tooling compiler, VB tooling compiler, etc.), requiring an enormous cost to evolve C# and VB as languages while ensuring that end users of these languages had a good experience using tools like Visual Studio. Beyond that, there was a host of additional tooling to provide more advanced code analysis that was equally expensive to maintain and evolve as the languages evolved. This meant that the kinds of language and tooling innovations end users expected were more challenging to deliver. A similar set of challenges - focused on problems centered around the end users - must exist to justify the enormous cost of rewriting a massive engineering system.


People have done research before into having the C#->native JIT be written in C#, I recall seeing prototypes. There are a ton of barriers between a prototype and a shipping implementation though, so I'm not sure we'll ever see it. In particular you risk regressions in startup time or memory usage since the amount of infrastructure needed to run C# (pre-jitted?) to generate all your jitcode is much higher than a small blob of hand-written C/C++ that's spitting out jitcode. See https://www.mono-project.com/news/2018/09/11/csharp-jit/ for one example.

There's an old saying (from one of Unity's developers, I think?) that you can't run Hello World in C# or Java without an XML parser because so much configuration for things like locales ends up pulling in serialization libraries, reflection, etc. Things have improved in this area for both platforms but it's definitely still very difficult to trim managed code down as far as you can trim native code.

For something as critical as the JIT you also want to keep the generated code small (for cache efficiency) and if possible, PGO it - things that existing managed code generators aren't especially great at compared to clang or modern MSVC.

From my past experience writing/maintaining C# compiler and runtime code, I would never bother trying to port the JIT to C#. I don't think the returns are worth the massive investment. I'd sooner port it to a language like Rust for safety benefits or to some sort of hypothetical language that enables producing smaller/faster code to improve JIT performance. I'm not sure I'd invest in that either though because for most workloads JIT time is not the bottleneck (and you can optimize that out in many cases by pre-generating the JITcode). EDIT: Also, for workloads where JIT is the bottleneck, it's questionable whether it's possible to extract big gains out of optimizing the JIT because of the nature of JIT workloads - you may just be bottlenecked on memory bandwidth or instructions per clock. A JIT converting IR to machine code is not trivially vectorized.

People are figuring this out (again, from scratch) now with webassembly as all the modern browsers go through the churn and angst involved in answering the question 'can we actually JIT this whole 50mb executable from scratch at load time?' even though we already knew the answer back when WebAssembly wasn't a spec yet.

[DISCLOSURE: I get paid to work on Mono right now and was paid to help draft the initial WebAssembly spec, and before that my work on the JSIL MSIL->JS compiler was sponsored. So I have some massive biases here.]


The way Java does it is their JIT written in Java is optionally AOT compiled to native code, so it doesn't have a startup time or warmup time problem and it can be PGOd.

We do know that the returns are worth it, because people are able to develop new optimisations in the Java version that people just won't attempt in the C++ version, because the code is so much harder to work with (though possibly just due to the age of the C++ version). It's achieving a 13% speedup over the C++ version in practice at places like Twitter, which is worth millions of dollars.


This is fascinating. Is the claim here that the C++ version could have never been as fast as the current Java one or just that people have been optimizing the Java one and getting improvements past the C++ one?

Also, it sounds like the old C++ JIT has to be maintained and shipped in the event that the Java-based JIT isn't available at startup, right? If so that makes sense as a stop-gap and it would be more of a tiered JIT, not a port. Tiered JITs are definitely proven, successful technology.


> Is the claim here that the C++ version could have never been as fast as the current Java one

The claim is that the work to add new optimisations to the old C++ code is so difficult that people aren't prepared to do it. The Java code is easier to write and debug, enough so that people are managing to add new optimisations that they haven't added to the C++ code.

> the old C++ JIT has to be maintained and shipped in the event that the Java-based JIT isn't available at startup

Well only until the new Java JIT is mature. The AOT build of the Java JIT can be shipped in the binaries so it's always there.

> If so that makes sense as a stop-gap and it would be more of a tiered JIT, not a port. Tiered JITs are definitely proven, successful technology.

It's used as a new top-tier yes, replacing the C++ top tier. The C++ JIT is likely to go I think, in the medium-to-long term.


I imagine it's on a roadmap somewhere. Having the code->IL layer be readily accessible from managed code will have bigger developer impact than the IL->machine code step, though.


It should be noted they are now contributing to openjdk too!


If .NET/C# could output native binaries for Windows Desktop apps, I could seriously think about switching to C# over C++. How do people deal with securing their source code otherwise? Quite trivial to get the source from a decompiled C# exe file.


It's quite trivial to get the source from a C# exe file in the same way as a decompiler makes it trivial to get the source from a C++ exe file.

Obfuscators are common if you're concerned about the symbols leaking.

If you don't include the .PDB debugging symbol files it's much the same as native binary code. The .NET virtual machine code is a little more expressive but is superficially similar to x86 native code.

In my opinion, if you're concerned about people reading your compiled machine code, the only solution is to run your app on a server and give users an API.


A C# binary can be decompiled back to easy to read source code. Show me how this can be done with a compiled C++ binary. It is a valid concern.


With IDA, HexRays, Hopper and a couple of Python scripts.


You can reverse engineer native binaries, too. Many companies are happily publishing web apps where the actual source is available, in minified form. Why does your source code need to be so secure?


To my understanding, commercial software licenses can be effectively enforced globally as long as you have reach into the local justice system. So you need lawyers to represent you, and so on.

If the financial loss isn’t big enough to warrant a global license compliance scheme, then I don’t see how the source code would be that valuable (in a general commercial context).

But I don’t think any form of software that is distributed to end users can be fully secured from unlicensed use. You always need a legal recourse if you actually want to stop unlicensed use.

The very-small niche where you can’t afford lawyers but want to force license compliance maybe isn’t a niche you can actually serve through a sound business.

So, rather than seek for automated technical compliance solutions (they don’t really exist without the physical lawyer component) maybe you should find the biggest market you can serve, use the most productive tool for the job and try to make sure unlicensed use can be noticed.


You can use an obfuscator (see e.g. https://stackoverflow.com/questions/19163701/how-can-i-obfus...).

Note that you can still decompile the obfuscated code and look around (I’ve done so to attempt to debug a 3rd party library), but mangling all identifiers makes it quite hard to read.


.NET has been able to do that since version 1.0, via NGEN at installation time.

UWP makes use of AOT compiled .NET via .NET Native.

How do people secure their C++ code otherwise? It is quite trivial to use IDA or HexRays.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: