Maxime Euzière / articles / obfuscatweet

Obfusc-a-tweet reloaded

November 2015 - January 2020

TL;DR - the app is here: xem.github.io/obfuscatweet-reloaded

Backlog

In 2014, Subzey and I introduced Obfusc-a-tweet, which fits 190 chars of JavaScript in a single 140-char tweet by packing a pair of ASCII chars into each of 95 Unicode symbols and using the other 45 chars to unpack and execute them.
In 2015, the idea was to take advantage of Twitter's URL shortener: pack JS code in up to 4 URLs and add just enough glue to make the whole tweet executable.
In 2020, we extended it to take advantage of the new limit of 8 URLs in a single tweet.

The limits of Twitter's URLs

When you tweet a URL, Twitter converts it into a 23-char URL, in the form https://t.co/xxxxxxxxxx.
In the final tweet, your original URL can appear truncated on the left or on the right, or both, according to this rule.
But when you copy-paste such a tweet from Twitter to a text editor or a JS console, you get the entire original URL (instead of the shortened URL or the truncated URL).
(In 2015, it was followed by a space and an ellipsis character "…", but that's not the case anymore.)
A tweet can contain up to 10 URLs, separated by a space, a punctuation mark or an astral character.
A tweeted URL can be up to 4096 chars long. If it's longer, it will appear as text and overflow the 280-char tweet limit.
If you use eight 4096-char URLs, you can't really add a 9th and a 10th URL to your tweet, because they won't be interpreted correctly.
International domain names (IDN) are not well supported. For example, you can tweet "http://mañana.com" and it will work fine. If you tweet a 140-chars-long URL like "http://mañññ(...)ññññññana.com", it will appear as text instead of a link.
If you stick to regular domain names (limited to letters, numbers and dashes), and if you choose a two-letter-long top-level domain like ".fr", the URL's domain name can take up to 4086 letters or numbers. The 10 remaining chars will be used by the automatic "http://" and the TLD ".fr". As you can imagine, this is not very interesting to store JS code.
But if you choose a one-letter-long domain name and a two-letter-long top-level domain (like "a.fr/..."), the URL can contain up to 4077 chars after the slash. This is interesting for storing JS code, because these 4077 chars can include any url-encoded text... and more!

Packing JS code

So, if we want to store a lot of text in a 280-char tweet, the theoretical limit is 8 x 4077 url-encoded chars (stored in 8 URLs) + 96 other characters.
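
The budget above can be re-derived in a few lines. This is only a sketch of the arithmetic: the constant names are made up for illustration, and the figures (280, 23, 8, 4077) are the ones quoted in this article.

```javascript
// Figures quoted in the article; the constant names are illustrative only.
const TWEET_LIMIT = 280;      // chars allowed in a tweet
const SHORT_URL_LENGTH = 23;  // every URL counts as a 23-char t.co link
const URL_COUNT = 8;          // long URLs used by the packer
const PAYLOAD_PER_URL = 4077; // usable chars after "http://a.fr/"

// Chars left for glue code once the 8 shortened URLs are counted.
const glueBudget = TWEET_LIMIT - URL_COUNT * SHORT_URL_LENGTH;

// Chars that can be hidden inside the 8 URLs.
const payloadBudget = URL_COUNT * PAYLOAD_PER_URL;

console.log(glueBudget);    // 96
console.log(payloadBudget); // 32616
```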

But our goal is to have an executable tweet, that can be copied as-is into a JS console and run instantly. Thus, the tweet needs to contain some long URLs plus a small JS unpacker and executer. After a lot of trial and error, here is the most efficient solution I could find: (it contains 8 URLs and uses all the other chars to unpack and execute the JS code).

eval(decodeURI(" http://a.fr/SSS http://a.fr/TTT http://a.fr/UUU http://a.fr/VVV http://a.fr/WWW http://a.fr/XXX http://a.fr/YYY http://a.fr/ZZZ".replace(/ .{1,12}/g,'')))

... where SSS, TTT, UUU, VVV, WWW, XXX, YYY and ZZZ are url-encoded strings up to 4077 chars long.

Note that the URLs are placed into a long string, surrounded by JS code.

When you execute that into a JS console, here's what happens:

// step 0: copy-paste the tweet

eval(decodeURI(" http://a.fr/SSS http://a.fr/TTT http://a.fr/UUU http://a.fr/VVV http://a.fr/WWW http://a.fr/XXX http://a.fr/YYY http://a.fr/ZZZ".replace(/ .{1,12}/g,'')))

// step 1: the replace(/ .{1,12}/g,'') transforms the string to keep only the interesting parts
// it removes the beginning: " http://a.fr/", because it's a space followed by 12 characters.
// After the replace, here's the remaining string:

eval(decodeURI("SSSTTTUUUVVVWWWXXXYYYZZZ"))

// step 2: the remaining string is url-decoded to produce the final code
// for example, any occurrence of "%7D" will be decoded to a closing brace ("}"), etc.

eval("final code")

// step 3: the remaining string (our final JS code) is evaluated
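
To see these steps on something small, here is a minimal sketch with two short URLs standing in for the eight long ones. The helper name unpack is hypothetical (the real tweet inlines this logic), and the payload is a toy expression chosen for the example.

```javascript
// Hypothetical helper mirroring the tweet's glue code, minus the final eval.
function unpack(urlString) {
  // Drop every " http://a.fr/" prefix (a space plus 12 chars), then URL-decode.
  return decodeURI(urlString.replace(/ .{1,12}/g, ''));
}

// Two short URLs standing in for SSS and TTT (parentheses kept escaped).
const demo = ' http://a.fr/%5B1,2,3%5D.map%28x=%3Ex*2 http://a.fr/%29.join%28%22-%22%29';

const code = unpack(demo);
console.log(code);       // [1,2,3].map(x=>x*2).join("-")
console.log(eval(code)); // 2-4-6
```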

At this point, we can potentially store up to 8 * 4077 = 32,616 JS chars in a single tweet! But this total includes escape sequences for the many characters that can't be used in URLs (such sequences take 3 to 12 chars each).

A word about encoded URIs

Here are the 95 printable ASCII chars:

 !"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^
_`abcdefghijklmnopqrstuvwxyz{|}~

Here are the same chars after passing them through JS's encodeURI:

%20!%22#$%25&'()*+,-./0123456789:;%3C=%3E?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E
_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~

Basically, according to JS, the only chars that need to be escaped are space, ", %, <, >, [, \, ], ^, `, {, | and }.
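
If you want to check that list yourself, the following sketch filters the 95 printable ASCII chars through encodeURI. The variable names are arbitrary; only standard built-ins are used.

```javascript
// Build the 95 printable ASCII chars (U+0020 to U+007E).
const printable = Array.from({ length: 95 }, (_, i) => String.fromCharCode(32 + i));

// Keep only the chars that encodeURI rewrites as an escape sequence.
const needEscaping = printable.filter(c => encodeURI(c) !== c);

console.log(JSON.stringify(needEscaping.join('')));
console.log(needEscaping.length); // 13
```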

But Twitter has some quirks: it doesn't support URLs containing empty parentheses like "()" or unbalanced parentheses like "a(a)a)" or "(a(a)a". On the other hand, it supports URLs containing %, [, ] and |.

We need to take that into account to build a packer that is as effective as possible. (We will keep parentheses and percent signs escaped to avoid weird parsing bugs, and we will keep [, ] and | unescaped because they are safe enough to be used as-is in our URLs.)

Building the packer

To sum things up, if we want to make a JS packer, the mission is to take a JS string as input, URL-encode it, URL-encode the parentheses that can cause problems on Twitter, decode the three chars that don't need to be encoded ([, ], |), split the result into 4077-char chunks, pack these chunks into URLs, surround these URLs with an unpacker/executer, and output the result as a tweetable message.
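
As a rough illustration of that recipe, here is a minimal sketch. It is not the code of the actual app: the function name pack and the constants are assumptions made for this example, and it ignores Twitter-side quirks beyond the ones listed above.

```javascript
const CHUNK_SIZE = 4077;           // payload chars per URL, per the article
const URL_PREFIX = 'http://a.fr/'; // 12 chars, stripped later by / .{1,12}/

// Hypothetical packer: returns a tweet-ready string for the given JS source.
function pack(source) {
  const encoded = encodeURI(source)
    .replace(/\(/g, '%28') // parentheses can confuse Twitter's URL parser
    .replace(/\)/g, '%29')
    .replace(/%5B/g, '[')  // these three are safe unescaped on Twitter
    .replace(/%5D/g, ']')
    .replace(/%7C/g, '|');

  // Split the encoded text into chunks and turn each chunk into a URL.
  const urls = [];
  for (let i = 0; i < encoded.length; i += CHUNK_SIZE) {
    urls.push(' ' + URL_PREFIX + encoded.slice(i, i + CHUNK_SIZE));
  }
  if (urls.length > 8) throw new RangeError('source does not fit in 8 URLs');

  // Surround the URLs with the unpacker/executer.
  return 'eval(decodeURI("' + urls.join('') + '".replace(/ .{1,12}/g,\'\')))';
}

const tweet = pack('[1, 2, 3].map((x) => x * 2).join("|")');
console.log(tweet);
console.log(eval(tweet)); // 2|4|6
```

Splitting at a fixed offset can cut an escape sequence in two. That is harmless for the unpacker, since the chunks are concatenated again before decodeURI runs.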

Here's the packer! (use the first option to test the above technique)

UPDATE #1: Go further with charcode shifting!

After releasing the first version of this packer, I realized with Anders Kaare that a lot of non-ASCII chars were actually useable in Twitter's URLs. After some manual research, we discovered that there's actually a total of 483 chars that can be used safely.

!#$&'()*+,-./0123456789:;=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz[]|~ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗȘșȚțȜȝȞȟȠȡȢȣȤȥȦȧȨȩȪȫȬȭȮȯȰȱȲȳȴȵȶȷȸȹȺȻȼȽȾȿɀɁɂɃɄɅɆɇɈɉɊɋɌɍɎɏ

This represents a subset of the Unicode blocks "Basic Latin" and "Latin-1 supplement" plus the whole blocks "Latin-extended-A" and "Latin-extended-B", containing respectively 128 and 208 characters.

So I had the idea to "shift" all the ASCII characters present in the JS source code (U+0000 - U+007F), to represent them with Latin-extended-A characters (U+0100 - U+017F). The advantage here is to avoid all the escape sequences needed to represent the special chars of our JS code in a URL. All we need to do is to add 0x100 to each charcode.

With the previous version, a 14-chars string like "Hello World ^^" was URL-encoded as "Hello%20World%20%5E%5E" (which is 22 chars long). With this "shift", it's converted to "ňťŬŬůĠŗůŲŬŤĠŞŞ" (which is only 14 chars long).
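
The comparison above can be reproduced with a minimal sketch like this one. The helper name shift is an assumption made for the example; it simply adds 0x100 to each char code.

```javascript
// Hypothetical helper: move each ASCII char (U+0000-U+007F) up by 0x100.
function shift(ascii) {
  return Array.from(ascii, c => String.fromCharCode(c.charCodeAt(0) + 0x100)).join('');
}

const hello = 'Hello World ^^';
console.log(encodeURI(hello).length); // 22
console.log(shift(hello));            // ňťŬŬůĠŗůŲŬŤĠŞŞ
console.log(shift(hello).length);     // 14
```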

Of course, this shifted code is not directly executable, it needs to be unshifted first. Here's the unshifter I wrote with the help of Subzey:

eval(unescape(escape('...shifted code...').replace(/u../g,'')))

If you're confused by this code, you can read an explanation here. (we used the same principle in the original obfusc-a-tweet app)
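
In short, the trick relies on the legacy escape() function writing each shifted char as %uXXXX. The sketch below walks through the intermediate values for a shifted "Hello"; the variable names are only there for illustration.

```javascript
const shiftedHello = '\u0148\u0165\u016C\u016C\u016F'; // "Hello" shifted by 0x100

// escape() turns each char above U+00FF into a %uXXXX sequence.
const asEscapes = escape(shiftedHello);         // %u0148%u0165%u016C%u016C%u016F

// Removing "u" plus the next two chars ("u01") leaves plain %XX sequences.
const asBytes = asEscapes.replace(/u../g, '');  // %48%65%6C%6C%6F

// unescape() decodes them back to the original ASCII chars.
const unshifted = unescape(asBytes);            // Hello

console.log(asEscapes, asBytes, unshifted);
```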

We don't have room in the tweet itself to include this unshifter, because the tweet is already filled with the URLs and their unpacker and executer, so we need to place this code directly inside our URLs.

Here's an example of a complete tweet:

eval(decodeURI(" http://a.fr/eval%28unescape%28escape%28'SSS http://a.fr/TTT http://a.fr/UUU http://a.fr/VVV http://a.fr/WWW http://a.fr/XXX http://a.fr/YYY http://a.fr/ZZZ'%29.replace(/u../g,'')%29%29".replace(/ .{1,12}/g,'')))

where SSS and ZZZ are a bit shorter to make room for the JS code.

Note that the first URL starts with a URL-encoded version of eval(unescape(escape(' and the last URL ends with a URL-encoded version of ').replace(/u../g,''))).

Here's what happens when this tweet is executed:

// step 0: copy-paste the tweet

eval(decodeURI(" http://a.fr/eval%28unescape%28escape%28'SSS http://a.fr/TTT http://a.fr/UUU http://a.fr/VVV http://a.fr/WWW http://a.fr/XXX http://a.fr/YYY http://a.fr/ZZZ'%29.replace(/u../g,'')%29%29".replace(/ .{1,12}/g,'')))

// Unpacker-side:

// step 1: the replace() transforms the string to keep only the interesting parts
// After the replace, here's the remaining string:

eval(decodeURI("eval%28unescape%28escape%28'SSSTTTUUUVVVWWWXXXYYYZZZ'%29.replace(/u../g,'')%29%29"))

// step 2: the remaining string is url-decoded:

eval("eval(unescape(escape('SSSTTTUUUVVVWWWXXXYYYZZZ').replace(/u../g,'')))")

// step 3: the remaining string is evaluated:

eval(unescape(escape('SSSTTTUUUVVVWWWXXXYYYZZZ').replace(/u../g,'')))

// Unshifter-side:

// Step 4: the string is unshifted to produce the final code

eval('final code')

// Step 5: the remaining string is evaluated.

This unshifter consumes 57 chars of our URLs, but it removes all risk of escape sequences. So we're now sure to store up to 32,559 chars of JS in a tweet.
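
The 57-char figure can be checked by measuring the two pieces of the unshifter as they appear inside the URLs. This is just the arithmetic; the variable names are arbitrary.

```javascript
// The two unshifter fragments, exactly as embedded in the first and last URL.
const unshifterPrefix = "eval%28unescape%28escape%28'";
const unshifterSuffix = "'%29.replace(/u../g,'')%29%29";

const overhead = unshifterPrefix.length + unshifterSuffix.length;
console.log(overhead);            // 57
console.log(8 * 4077 - overhead); // 32559
```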

Here's the updated packer! (choose the second option under the input section to use the shifter)

And here's a demo

UPDATE #2: Go even further with shift + PNG bootstrapping

So, if we sum up our possibilities right now, we can either execute up to 32,616 chars of JS (including the escape sequences for all the chars that are not URL-safe), or execute up to 32,559 chars of JS (shifted to avoid using any escape sequence). This works fine with minified and even RegPacked code, as long as it's ASCII only. But what if we could gzip our JS code, embed it into our 32,559 shifted chars, and add a little bit of extra code to extract and execute it?

Well, some people already kinda had this idea, and it's called JsExe. JsExe encodes the chars of any JS code in the pixels of a greyscale PNG image. The PNG format is natively gzipped. The PNG is then saved as an HTML page that includes itself in an IMG, draws it on a canvas, and reads the pixels to retrieve and execute the original JS code. Even though the extra code takes about 200 bytes, this "PNG bootstrapping" technique is generally more efficient than RegPack, even for demos as small as 1kb.

So we decided to try that. First, Subzey and I developed zpng, a pure JS clone of JsExe. It turned out to be even more efficient than JsExe thanks to the use of zopfli compression!

Then I took the code that generates PNG from zpng and included it in obfuscatweet-reloaded.

I made it encode the JS source code into a PNG, converted the PNG bytes into extended ASCII, shifted them, and it gave me something like this:

// Source code
alert("Hello World ^^");

// PNG as dataURI
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABkAAAABCAAAAAAOB1pJAAAAIklEQVR4AWNIzEktKtFQ8kjNyclXCM8vyklRiItT0rRmAABw4wfh
  
// PNG's content, as text

‰PNG


IHDRZI"IDATxcHÌI-*ÑPòHÍÉÉWÏ/ÊIQˆ‹SÒ´fpãá
  
// PNG's content, as shifted text
ƉŐŎŇčĊĚĊĀĀĀčŉňńŒĀĀĀęĀĀĀāĈĀĀĀĀĎćŚŉĀĀĀĢŉńŁŔŸāţňǌŉĭĪǑŐǲňǍǉǉŗĈǏįǊŉőƈƋœǒƴŦĀĀŰǣćǡ

Cool, so now the PNG's content is tweetable. In order to use it, we have to unshift it, convert it into a data URI, put it in an image, draw that image on a canvas, read the pixels of the canvas, turn them back into JS and execute that JS.
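
The shift / unshift part of that pipeline can be tried outside a browser. The sketch below assumes an environment where atob, btoa, escape and unescape are available as globals (browsers, recent Node.js versions). It reuses the base64 data from the example above and leaves out the image and canvas steps.

```javascript
// Base64 part of the data URI shown above.
const base64 = 'iVBORw0KGgoAAAANSUhEUgAAABkAAAABCAAAAAAOB1pJAAAAIklEQVR4AWNIzEktKtFQ8kjNyclXCM8vyklRiItT0rRmAABw4wfh';

// One char per PNG byte (0x00-0xFF).
const pngBytes = atob(base64);

// Shift every byte into U+0100-U+01FF so it becomes tweetable.
const shiftedPng = Array.from(pngBytes, c => String.fromCharCode(c.charCodeAt(0) + 0x100)).join('');

// Unshift with the same escape/unescape trick as before.
const restoredBytes = unescape(escape(shiftedPng).replace(/u../g, ''));

console.log(shiftedPng.slice(0, 4));           // ƉŐŎŇ
console.log(btoa(restoredBytes) === base64);   // true
```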

The principle is the following:

var png = "ƉŐŎŇčĊĚĊĀĀĀčŉňńŒĀĀĀęĀĀĀāĈĀĀĀĀĎćŚŉĀĀĀĢŉńŁŔŸāţňǌŉĭĪǑŐǲňǍǉǉŗĈǏįǊŉőƈƋœǒƴŦĀĀŰǣćǡ";
var g;
// (snip) unshift png and put the result in g
z=new Image;
z.src='data:image/png;base64,'+btoa(g);
V=document.createElement('canvas');
C=V.getContext("2d");
for($=_=''; C.drawImage(z,$--,0), X=(q=C.getImageData(0,0,64,32)).data[0]; _+=String.fromCharCode(X)){};
(e=eval)(_);

Let's golf it:

(z=new Image).src='data:;base64,'+btoa(g);C=document.createElement('canvas').getContext('2d');for($=_='';C.drawImage(z,$--,0),X=(q=C.getImageData(0,0,64,32)).data[0];_+=String.fromCharCode(X));(e=eval)(_)

204 bytes. Great! Now here's the real trick. The PNG content is stored as shifted text. This bootstrap code should also be shifted, to use as few chars as possible. So the idea is to put the two shifted blocks in the same string:

var tmp = 'ĨźĽŮťŷĠŉŭšŧťĩĮųŲţĽħŤšŴšĺĻŢšųťĶĴĬħīŢŴůšĨŧĮųŬũţťĨĲıĵĩĩĻŃĽŤůţŵŭťŮŴĮţŲťšŴťŅŬťŭťŮŴĨħţšŮŶšųħĩĮŧťŴŃůŮŴťŸŴĨħĲŤħĩĻŦůŲĨĤĽşĽħħĻŃĮŤŲšŷŉŭšŧťĨźĬĤĭĭĬİĩĬŘĽĨűĽŃĮŧťŴŉŭšŧťńšŴšĨİĬİĬĶĴĬĳĲĩĩĮŤšŴšśİŝĻşīĽœŴŲũŮŧĮŦŲůŭŃŨšŲŃůŤťĨŘĩĩĻĨťĽťŶšŬĩĨşĩ' /*bootstrap*/ + 'ƉŐŎŇčĊĚĊĀĀĀčŉňńŒĀĀĀęĀĀĀāĈĀĀĀĀĎćŚŉĀĀĀĢŉńŁŔŸāţňǌŉĭĪǑŐǲňǍǉǉŗĈǏįǊŉőƈƋœǒƴŦĀĀŰǣćǡ' /* PNG */

Then unshift this string, as we already did in the previous chapter, and put the result in g:

g=unescape(escape('ĨźĽŮťŷĠŉŭšŧťĩĮųŲţĽħŤšŴšĺĻŢšųťĶĴĬħīŢŴůšĨŧĮųŬũţťĨĲıĵĩĩĻŃĽŤůţŵŭťŮŴĮţŲťšŴťŅŬťŭťŮŴĨħţšŮŶšųħĩĮŧťŴŃůŮŴťŸŴĨħĲŤħĩĻŦůŲĨĤĽşĽħħĻŃĮŤŲšŷŉŭšŧťĨźĬĤĭĭĬİĩĬŘĽĨűĽŃĮŧťŴŉŭšŧťńšŴšĨİĬİĬĶĴĬĳĲĩĩĮŤšŴšśİŝĻşīĽœŴŲũŮŧĮŦŲůŭŃŨšŲŃůŤťĨŘĩĩĻĨťĽťŶšŬĩĨşĩƉŐŎŇčĊĚĊĀĀĀčŉňńŒĀĀĀęĀĀĀāĈĀĀĀĀĎćŚŉĀĀĀĢŉńŁŔŸāţňǌŉĭĪǑŐǲňǍǉǉŗĈǏįǊŉőƈƋœǒƴŦĀĀŰǣćǡ').replace(/u../g,''))

Then we execute only the first 204 chars of g.

eval((g=unescape(escape('ĨźĽŮťŷĠŉŭšŧťĩĮųŲţĽħŤšŴšĺĻŢšųťĶĴĬħīŢŴůšĨŧĮųŬũţťĨĲıĵĩĩĻŃĽŤůţŵŭťŮŴĮţŲťšŴťŅŬťŭťŮŴĨħţšŮŶšųħĩĮŧťŴŃůŮŴťŸŴĨħĲŤħĩĻŦůŲĨĤĽşĽħħĻŃĮŤŲšŷŉŭšŧťĨźĬĤĭĭĬİĩĬŘĽĨűĽŃĮŧťŴŉŭšŧťńšŴšĨİĬİĬĶĴĬĳĲĩĩĮŤšŴšśİŝĻşīĽœŴŲũŮŧĮŦŲůŭŃŨšŲŃůŤťĨŘĩĩĻĨťĽťŶšŬĩĨşĩƉŐŎŇčĊĚĊĀĀĀčŉňńŒĀĀĀęĀĀĀāĈĀĀĀĀĎćŚŉĀĀĀĢŉńŁŔŸāţňǌŉĭĪǑŐǲňǍǉǉŗĈǏįǊŉőƈƋœǒƴŦĀĀŰǣćǡ').replace(/u../g,''))).slice(0,204))

Let's go back to the bootstrap code. Instead of creating a dataURI from g, it needs to create it from g.slice(204).

(z=new Image).src='data:;base64,'+btoa(g.slice(204));C=document.createElement('canvas').getContext('2d');for($=_='';C.drawImage(z,$--,0),X=(q=C.getImageData(0,0,64,32)).data[0];_+=String.fromCharCode(X));(e=eval)(_)

But the ".slice(204)" adds 11 chars to the bootstrap, so we need to replace 204 with 215 everywhere:

// In bootstrap
(z=new Image).src='data:;base64,'+btoa(g.slice(215));C=document.createElement('canvas').getContext('2d');for($=_='';C.drawImage(z,$--,0),X=(q=C.getImageData(0,0,64,32)).data[0];_+=String.fromCharCode(X));(e=eval)(_)

// In unshifter
eval((g=unescape(escape('ĨźĽŮťŷĠŉŭšŧťĩĮųŲţĽħŤšŴšĺĻŢšųťĶĴĬħīŢŴůšĨŧĮųŬũţťĨĲıĵĩĩĻŃĽŤůţŵŭťŮŴĮţŲťšŴťŅŬťŭťŮŴĨħţšŮŶšųħĩĮŧťŴŃůŮŴťŸŴĨħĲŤħĩĻŦůŲĨĤĽşĽħħĻŃĮŤŲšŷŉŭšŧťĨźĬĤĭĭĬİĩĬŘĽĨűĽŃĮŧťŴŉŭšŧťńšŴšĨİĬİĬĶĴĬĳĲĩĩĮŤšŴšśİŝĻşīĽœŴŲũŮŧĮŦŲůŭŃŨšŲŃůŤťĨŘĩĩĻĨťĽťŶšŬĩĨşĩƉŐŎŇčĊĚĊĀĀĀčŉňńŒĀĀĀęĀĀĀāĈĀĀĀĀĎćŚŉĀĀĀĢŉńŁŔŸāţňǌŉĭĪǑŐǲňǍǉǉŗĈǏįǊŉőƈƋœǒƴŦĀĀŰǣćǡ').replace(/u../g,''))).slice(0,215))
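
The 204 to 215 adjustment is a small fixed-point problem: the offset written inside the bootstrap must equal the length of the bootstrap itself. Here is a sketch that finds it by iteration; the helper name bootstrapFor is made up for this example.

```javascript
// Hypothetical helper: the golfed bootstrap, slicing g at offset n.
const bootstrapFor = n =>
  `(z=new Image).src='data:;base64,'+btoa(g.slice(${n}));C=document.createElement('canvas').getContext('2d');for($=_='';C.drawImage(z,$--,0),X=(q=C.getImageData(0,0,64,32)).data[0];_+=String.fromCharCode(X));(e=eval)(_)`;

// Iterate until the offset in the code matches the code's own length.
let offset = 0;
while (bootstrapFor(offset).length !== offset) {
  offset = bootstrapFor(offset).length;
}

console.log(offset); // 215
```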

And we're done! Phew! So, here's what happens when we tweet a PNG bootstrapped code:

// step 0: copy-paste the tweet
// SSS, TTT, UUU, VVV, WWW, XXX, YYY and ZZZ represent the shifted string containing the bootstrap code and the PNG
eval(decodeURI(" http://h.ck/eval%28%28g=unescape%28escape%28'SSS http://h.ck/TTT http://h.ck/UUU http://h.ck/VVV http://h.ck/WWW http://h.ck/XXX http://h.ck/YYY http://h.ck/ZZZ'%29.replace(/u../g,'')%29%29.slice(0,215)%29".replace(/ .{1,12}/g,'')))

// Unpacker-side:

// step 1: the replace() transforms the string to keep only the interesting parts
// After the replace, here's the remaining string:

eval(decodeURI("eval%28%28g=unescape%28escape%28'SSSTTTUUUVVVWWWXXXYYYZZZ'%29.replace(/u../g,'')%29%29.slice(0,215)%29"))

// step 2: the remaining string is url-decoded:

eval("eval((g=unescape(escape('SSSTTTUUUVVVWWWXXXYYYZZZ').replace(/u../g,''))).slice(0,215))")

// step 3: the remaining string is evaluated:

eval((g=unescape(escape('SSSTTTUUUVVVWWWXXXYYYZZZ').replace(/u../g,''))).slice(0,215))


// Unshifter-side:

// Step 4: unshift, g contains all the unshifted string, and the first 215 chars are evaluated

g = "(z=new Image).src='data:;base64,'+btoa(g.slice(215));C=document.createElement('canvas').getContext('2d');for($=_='';C.drawImage(z,$--,0),X=(q=C.getImageData(0,0,64,32)).data[0];_+=String.fromCharCode(X));(e=eval)(_) // +  PNG content";

eval(g.slice(0,215))

// PNG bootstrap side

// z contains an image whose src is a data URI built from g.slice(215)
// C contains a canvas context2d
// _ contains our final JS code
// e is a shortcut for eval
e(_)

// Step 5: the final code is evaluated.

We successfully embedded a PNG bootstrap technique in a tweet.

Unfortunately, non-WebKit browsers misbehave when they try to read and execute a PNG image bigger than 16kb. That's why we recommend using the non-PNG option for scripts that are not too big, and the PNG option only if the zipped code is not too large.

Cheers,
@MaximeEuziere

XEM

game designer & JS hacker