Saturday, February 20, 2010

Use JavaScript to Format C# Code to HTML

To format C# code online go here.
To add it as a widget go to this post.
To get the most recent source code go here.

Example of how it looks:
using System.Xml.Serialization;

namespace foo
{
    public partial class Form1<AAA> : Form
    {
        void bar<T, R, L>(int k, Form1 f)
            where T : ISerializable
            where R : Dictionary<string, List<Form1>>, ISerializable
        {
            Form1 Form1 = new Form1();
        }
    }
}

I wrote a script in JavaScript to format C# Code.
I needed such a script because I didn't like the old way of copying the code to Microsoft Word, then to Blogger, and then looking for and removing all the <META> and <LINK> tags that Blogger doesn't accept.
Before I did that, I looked on the Internet, and I found this wonderful site: http://www.manoli.net/csharpformat
But it doesn't handle strings very well, and doesn't recognize types.

For example, the expression 'string path = @"c:\";' returns:
<span class="kwrd">string</span> path = @"c:\";
Which means the string is black instead of red.
As for types, I don't think they are supported at all.
I downloaded their source code and I found some useful things for my code.

I used more accurate regular expressions to identify strings:
// a backslash right before the closing quote of an @ (verbatim) string
string path = @"c:\";
// an even number of backslashes before a quote means the quote is not escaped
string x = " \" \\\" \\\\";
/* support for multi line
   comments and strings
*/
string y = @" begin
                end";
/// <summary> This kind is also supported
/// </summary>

Formatting strings, comments and keywords is no big deal, but what about types?
Sometimes you don't know if something is a type or a variable, for example:
Something.DoIt();
It could be a field or a property called "Something", or it could be a class called "Something", in which case "DoIt" would be a static method.
(There is a way to tell the script that "Something" is a type; it is explained later.)
But sometimes you do know that something is a type, and it is worth remembering that for later:
class Something { } // here the script identifies "Something" as a type
Something.DoIt(); // here it uses the information from earlier
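
One way this kind of learning could work is sketched below. The learnDeclaredTypes function is hypothetical and is not taken from the script; it also assumes that strings and comments have already been set aside, so a word like "class" inside a comment is not picked up.

```javascript
// Hypothetical helper: collect the names that the code itself declares
// with class, struct, interface or enum.
function learnDeclaredTypes(code) {
  const declaration = /\b(?:class|struct|interface|enum)\s+([A-Za-z_][A-Za-z0-9_]*)/g;
  const learned = new Set();
  for (const match of code.matchAll(declaration)) {
    learned.add(match[1]); // the identifier right after the keyword
  }
  return learned;
}
```

A later pass can then treat any identifier found in the returned set as a type, even where the usage alone is ambiguous.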

But what about "Application.DoEvents()" ?
In that case the script has a list of common types:
Application.DoEvents();

I compiled this list using this code:
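// Requires: using System; using System.Reflection; using System.Text.RegularExpressions;
static string GetAllTypes()
{
    string regex = "";
    foreach (Assembly asm in AppDomain.CurrentDomain.GetAssemblies())
    {
        foreach (Type type in asm.GetTypes())
        {
            // Strip the generic arity suffix, e.g. "List`1" becomes "List".
            string name = new Regex("(`.*)").Replace(type.Name, "");
            // Keep only plain alphanumeric names (skips names with other
            // characters, such as compiler-generated ones).
            if (new Regex("^[A-Za-z0-9]+$").IsMatch(name))
            {
                regex += name + "|";
                // An attribute can also be written without its "Attribute" suffix.
                Match match = new Regex("^(.+)Attribute$").Match(name);
                if (match.Success)
                {
                    regex += match.Groups[1].Value + "|";
                }
            }
        }
    }
    // Drop the trailing "|" so the alternation has no empty branch.
    return regex.TrimEnd('|');
}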

Sometimes the long list affects the loading time of the blog, so I have another manually written list that has the most commonly used types like: List, EventArgs, ...
Note that the shorter list doesn't necessarily mean the script won't recognize other types, because it also learns types from the code itself (as shown in the example above).
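
As a rough illustration of how a "|"-separated list of names can be applied, here is a sketch. The markKnownTypes function and the "type" class name are assumptions made for this example, not the script's real code or CSS.

```javascript
// Hypothetical helper: wrap every known type name in a span.
function markKnownTypes(code, pipeSeparatedTypes) {
  // Drop empty entries, such as one left behind by a trailing "|".
  const names = pipeSeparatedTypes.split("|").filter((name) => name.length > 0);
  if (names.length === 0) return code;
  // \b keeps "List" from matching inside "ListView". The names are assumed
  // to be alphanumeric, so no regex escaping is done here.
  const known = new RegExp("\\b(?:" + names.join("|") + ")\\b", "g");
  return code.replace(known, '<span class="type">$&</span>');
}
```

The same function works with the long generated list or the short hand-written one; only the size of the resulting expression changes.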

In order to use the script you need to write the following code:
<pre formatcs=1>
your code here
</pre>

If the script doesn't recognize a type for some reason, as in the "Something.DoIt()" example, you can tell it about the type by adding an attribute called "types":
<pre formatcs=1 types="Something">
Something.DoIt();
</pre>
And the result:
Something.DoIt();
The types attribute accepts a list of type names, which can be separated by commas, white space, or any other separator.
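
A lenient parser of this kind can be sketched by splitting on anything that cannot be part of a name. The parseTypesAttribute function below is hypothetical and only illustrates the idea.

```javascript
// Hypothetical helper: turn the value of the "types" attribute into a list
// of names, treating any run of non-identifier characters as a separator.
function parseTypesAttribute(value) {
  return (value || "")
    .split(/[^A-Za-z0-9_]+/)
    .filter((name) => name.length > 0); // ignore leading or trailing separators
}
```

With this approach "Something, Foo", "Something Foo" and "Something;Foo" all give the same two names.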

The script can handle generics, which means it doesn't color a type parameter such as T as a type, as shown here:
class MyClass<T>
{
    List<T> x = new List<T>();
}

However, there might be cases where the class definition is missing, and then T might be mistaken for a type:
x = new T();
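
The dependence on the class definition can be seen in a small sketch. The findTypeParameters function is hypothetical, handles only simple non-nested parameter lists, and assumes the code contains a real "<" rather than "&lt;".

```javascript
// Hypothetical helper: collect generic type parameter names from
// declarations such as "class MyClass<T>" so they can be left uncolored.
function findTypeParameters(code) {
  const declaration = /\b(?:class|struct|interface)\s+[A-Za-z_]\w*\s*<([^<>]*)>/g;
  const parameters = new Set();
  for (const match of code.matchAll(declaration)) {
    for (const part of match[1].split(",")) {
      const name = part.trim();
      if (name.length > 0) parameters.add(name);
    }
  }
  return parameters;
}
```

When the snippet contains only "x = new T();", there is no declaration to read the parameter from, so nothing marks T as a type parameter.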

Another issue with generics, and more specifically with the "<" sign, is that you have to write "&lt;"; otherwise Blogger goes crazy and adds all sorts of strange closing tags to the post. This behavior is not caused by the script, but it does keep the script from working correctly. I don't know how other blogging sites behave, but I tested the script offline on my computer and it works with a plain "<" too.
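
Escaping by hand is error-prone, so code can be run through a small helper before it is pasted into a post. The escapeForHtml function below is a generic sketch and is not part of the script.

```javascript
// Hypothetical helper: escape the characters that HTML treats specially.
// "&" must be replaced first, or the "&" in "&lt;" would be escaped again.
function escapeForHtml(code) {
  return code
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```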

I plan to make a gadget that adds this functionality to a blog. I will call it "C# Code Formatter".
But in the meantime the script can be used from here:
jscsharpformatter.googlecode.com

I added a new "HTML/JavaScript" gadget at the bottom of the page with the following code:
<script type="text/javascript" src="http://sites.google.com/site/ycouriel/files/csharp_formatter_normal.js">
</script>
<script type="text/javascript">
var howmany = findAndFormatCSharp();
document.write(howmany + " c# code elements were formatted");
</script>
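
For readers curious how a function like findAndFormatCSharp might locate the marked blocks, here is a rough sketch. It is not the script's source: formatMarkedBlocks and its formatCode callback are made-up names, and only the "formatcs" and "types" attributes come from this post.

```javascript
// Hypothetical sketch: format every <pre> that carries the formatcs
// attribute and report how many were handled.
function formatMarkedBlocks(doc, formatCode) {
  const blocks = doc.getElementsByTagName("pre");
  let formatted = 0;
  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    if (!block.hasAttribute("formatcs")) continue; // not marked for formatting
    // Pass along the optional list of extra type names.
    const extraTypes = block.getAttribute("types") || "";
    block.innerHTML = formatCode(block.innerHTML, extraTypes);
    formatted++;
  }
  return formatted;
}
```

In a browser this would be called as formatMarkedBlocks(document, someFormatter), where someFormatter takes the block's HTML and the extra type names and returns the colored HTML.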

Feedback is very welcome!

2 comments:

  1. visit http://codebeautify.org/

    it is one of the good online editor+beautifier+minifier+validator for javascript,json,xml,html,css....

    1. Loved http://codebeautify.org/jsonviewer one of the very nice tool.
