The Cleaner

Task To Be Addressed

  The simple .URL file: your link back to a website you might wish to revisit in the future.  These files reside in your Favorites directory (and in each of the subdirectories you have created).  Their contents stay out of sight for most users, because seeing them means opening the file in a text editor (a programmer's editor like UltraEdit does the job), and Windows Explorer offers no hints as to what is inside.  Ideal for hiding tracking links.  Mind you, not all .URL files are as bad as the one I am going to show you, but it sure looks like fertile ground to me.  Let's have a peek inside one...

[DEFAULT]
BASEURL=http://sustainablesources.com/
[DOC#564]
BASEURL=http://www.youtube.com/embed/uGsmKY_RrmI
ORIGURL=http://www.youtube.com/embed/uGsmKY_RrmI
[DOC_google_ads_frame1]
BASEURL=http://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-1859845853929504&output=html&h=600&slotname=1323859760&w=160&lmt=1336790857&ea=0&flash=11.2.202.235&url=http%3A%2F%2Fsustainablesources.com%2F&dt=1336790857526&shv=r20120502&jsv=r20110914&saldr=1&correlator=1336790857554&frm=20&adk=2902830490&ga_vid=1099170898.1336790858&ga_sid=1336790858&ga_hid=1957588677&ga_fc=0&u_tz=-420&u_his=245&u_java=1&u_h=1080&u_w=1920&u_ah=1050&u_aw=1920&u_cd=32&u_nplug=0&u_nmime=0&dff=tahoma&dfs=11&adx=-2&ady=-2&biw=932&bih=935&oid=2&docm=8&fu=0&ifi=1&dtd=50
ORIGURL=http://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-1859845853929504&output=html&h=600&slotname=1323859760&w=160&lmt=1336790857&ea=0&flash=11.2.202.235&url=http%3A%2F%2Fsustainablesources.com%2F&dt=1336790857526&shv=r20120502&jsv=r20110914&saldr=1&correlator=1336790857554&frm=20&adk=2902830490&ga_vid=1099170898.1336790858&ga_sid=1336790858&ga_hid=1957588677&ga_fc=0&u_tz=-420&u_his=245&u_java=1&u_h=1080&u_w=1920&u_ah=1050&u_aw=1920&u_cd=32&u_nplug=0&u_nmime=0&dff=tahoma&dfs=11&adx=-2&ady=-2&biw=932&bih=935&oid=2&docm=8&fu=0&ifi=1&dtd=50
[DOC_I1_1336790857644]
BASEURL=https://plusone.google.com/_/+1/fastbutton?bsv=p&url=http%3A%2F%2Fsustainablesources.com%2F&size=standard&count=false&hl=en-US&jsh=m%3B%2F_%2Fapps-static%2F_%2Fjs%2Fgapi%2F__features__%2Frt%3Dj%2Fver%3DNec4xg3wDg8.en_US.%2Fsv%3D1%2Fam%3D!AuYF0E1N7E-Ine7KrA%2Fd%3D1%2Frs%3DAItRSTMUSSt3OSnDgL9qnPccCbWYHQBtyg
ORIGURL=https://plusone.google.com/_/+1/fastbutton?bsv=p&url=http%3A%2F%2Fsustainablesources.com%2F&size=standard&count=false&hl=en-US&jsh=m%3B%2F_%2Fapps-static%2F_%2Fjs%2Fgapi%2F__features__%2Frt%3Dj%2Fver%3DNec4xg3wDg8.en_US.%2Fsv%3D1%2Fam%3D!AuYF0E1N7E-Ine7KrA%2Fd%3D1%2Frs%3DAItRSTMUSSt3OSnDgL9qnPccCbWYHQBtyg#id=I1_1336790857644&parent=http%3A%2F%2Fsustainablesources.com&rpctoken=538808861&_methods=onPlusOne%2C_ready%2C_close%2C_open%2C_resizeMe%2C_renderstart
[InternetShortcut]
URL=http://sustainablesources.com/
IDList=
IconFile=http://sustainablesources.com/wp-content/themes/atahualpa3.7.10/images/favicon/favicon.ico
IconIndex=1
[{000214A0-0000-0000-C000-000000000046}]
Prop3=19,2

  That was pretty bad, no?  All your browser needs in order to take you back to this site is listed under the bracketed text "[InternetShortcut]": the URL= line holding the site's address.  Nothing more.  But look what we have here: links to YouTube, Google Ads, DoubleClick (ugh), Google's Plus One, a link to an icon file, and then some cryptic code.  Each time you click your link to return to the site, all of this unnecessary hoohah gets triggered without your knowledge.  Well friends, this is not for me.
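
To see at a glance which sites a shortcut names, something along these lines would do.  This is only a sketch and not part of the CLEANSE program: HostOf and HostsIn are hypothetical names chosen for illustration, and the text of the .URL file is assumed to be loaded into a string already.

```cpp
#include <set>
#include <sstream>
#include <string>

// Return the host part of "scheme://host/path", or "" if the text has no "://".
// (HostOf is a hypothetical helper, not something found in the original program.)
std::string HostOf(const std::string& url)
{
    const std::string::size_type marker = url.find("://");
    if (marker == std::string::npos) { return ""; }
    const std::string::size_type begin = marker + 3;
    const std::string::size_type end = url.find_first_of("/?#", begin);
    return url.substr(begin, end == std::string::npos ? std::string::npos : end - begin);
}

// Collect the distinct hosts found on the right-hand side of every key=value line.
std::set<std::string> HostsIn(const std::string& shortcutText)
{
    std::set<std::string> hosts;
    std::istringstream lines(shortcutText);
    std::string line;
    while (std::getline(lines, line))
    {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) { continue; }   // section header or blank line
        const std::string host = HostOf(line.substr(eq + 1));
        if (!host.empty()) { hosts.insert(host); }
    }
    return hosts;
}
```

Fed the sample file above, HostsIn would report the site itself plus the YouTube, DoubleClick and Plus One hosts.  Lines such as IDList= and Prop3=19,2 carry no "://" and are skipped.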

Development

The idea is simple.  Take the above example, eliminate the unnecessary and rewrite the .URL as follows:

[InternetShortcut]
URL=http://sustainablesources.com/
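
The same idea can be sketched on a string before any files get involved.  MinimalShortcut is a hypothetical name used only for this illustration; it assumes the whole .URL file has been read into memory, and it takes the first line that starts with "URL=" without checking which section it sits in.

```cpp
#include <sstream>
#include <string>

// Return the pared-down shortcut text for the given .URL file contents, or an
// empty string when no URL= line is present (so the caller can leave the file alone).
// (MinimalShortcut is a hypothetical helper, not part of the original program.)
std::string MinimalShortcut(const std::string& shortcutText)
{
    std::istringstream lines(shortcutText);
    std::string line;
    while (std::getline(lines, line))
    {
        // Files written on Windows may carry a trailing carriage return.
        if (!line.empty() && line.back() == '\r') { line.pop_back(); }
        // BASEURL= and ORIGURL= do not match: the line must begin with "URL=".
        if (line.compare(0, 4, "URL=") == 0)
        {
            return "[InternetShortcut]\n" + line + "\n";
        }
    }
    return "";
}
```

Returning an empty string for a file without a URL= line is a design choice made for the sketch: it lets the caller skip the rewrite rather than wipe out a shortcut it could not understand.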

I will post the commented source code in the implementation section.  It is written in C++, so you will need a C++ development environment to compile the program for your machine.  I use the Code::Blocks IDE (available at http://www.codeblocks.org).  In addition, there are some limitations:

Implementation

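// The CLEANSE program acts on the file (dirlist.txt) created with the DOS "dir *.url /o:n /x /w" command.
// In the sample listing below, asterisks stand in for the spaces of the real output.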
//
// Volume in drive C is OS
// Volume Serial Number is 780A-AE63
//
// Directory of C:\Users\Scott\Programming\Beginning_Programming-CPP2\QuickPrep1
//
//11/22/1999**02:32*PM***************132*COMPUT~1.URL*Computers & Structures, Inc. Home Page.url
//11/22/1999**02:20*PM***************124*COSMOS~1.URL*COSMOS, the line of powerful FEA software and design analysis.url
//11/22/1999**02:24*PM***************287*DOWNLO~1.URL*Download FEMAP and mtabSTRESS FEA finite element analysis sof.url
//11/22/1999**02:01*PM***************116*ENGINE~1.URL*Engineering News-Record-enr.com homepage.url
//11/22/1999**02:11*PM***************120*LUSASF~1.URL*LUSAS Finite Element Analysis - Home Page.url
//11/22/1999**02:28*PM***************128*MATHTO~1.URL*Mathtools.net Scientific portal for MATLAB, MIDEVA, Excel, C,.url
//11/22/1999**01:41*PM***************126*RESEAR~1.URL*RESEARCH ENGINEERS.url
//11/22/1999**02:21*PM***************124*WELCOM~1.URL*Welcome to AutoFEA - Finite Element Analysis Software.url
//**************8 File(s) 1,157 bytes
//**************0 Dir(s) 1,336,025,403,392 bytes free
//***************************************^**********^****Note filename position.
//0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
//          1         2         3         4         5         6         7         8         9         X         1         2
//
// The contents of each .url file are reviewed, the URL= line is extracted, and the file rewritten with just the label
// [InternetShortcut] and the URL= text. Yes, more includes than necessary. Pare them down if you like.
//
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iosfwd>
#include <cstring>
#include <sstream>
#include <string>
using namespace std;
//
// Global constants
const int MaxArrayChars = 1000; // Length of "lines" as read from files.
//
// Replacement for the EOLN() function from Pascal. Checks for End Of LiNe.
bool bEOLN(char ch)
{
return (ch == '\n');
}
//
// Fill the caLine array with blanks. Array spaces 0..MaxArrayChars-1. Set length to zero.
// The number of cells is not passed as a parameter; the global constant is used instead.
void ClearLine(char caLine[], int& nLineL)
{
for (int n = 0; n < MaxArrayChars; n++) { caLine[n] = ' '; }
nLineL = 0;
}
//
// Using the supplied file, retrieve one line of characters and return the
// number of characters found.
//
// Aside: The character array variable begins with "c" for character and "a" for array.
// The integer variable nLineL begins with "n" for integer and ends with "L" for length.
// nLineL returns the number of occupied spaces in the array -> 1 if [0] is filled,
// 7 if [0..6] are filled.
//
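void ReadLine(istream& inputFile, char caLine[], int& nLineL)
{
ClearLine(caLine,nLineL);
inputFile.get(caLine[0]);
// Stop on any stream failure (a file that never opened, not just end of file)
// and never step past the last cell of the array.
while (inputFile && (!bEOLN(caLine[nLineL])) && (nLineL < MaxArrayChars - 1))
{
nLineL++;
inputFile.get(caLine[nLineL]);
}
}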
// Write a line to the output file. An empty line (nLineL == 0) produces just the line ending.
void WriteLine(ostream& outputFile, char caLine[], int nLineL)
{
for (int m = 0; m < nLineL; m++) { outputFile.put(caLine[m]); }
outputFile << endl;
}
//
// The .url file looks like:
//[DEFAULT]
//BASEURL=http://www.autofea.com/ (Several of these statements may be present.)
// (There may also be links to Google AdSense, Facebook, DoubleClick, etc.)
//
//[InternetShortcut] (Only one of these)
//URL=http://www.autofea.com/ (Only one of these)
//Modified=40F884897235BF01AC
//(There may be more crap down here related to favicon.ico)
//
//
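void GetURL(istream& inputFile, char caURL[], int& nURLL)
{
bool Flag = false;
while (!Flag)
{
ReadLine(inputFile,caURL,nURLL);
if ((nURLL >= 3) && (caURL[0] == 'U') && (caURL[1] == 'R') && (caURL[2] == 'L')) { Flag = true; }
else
{
ClearLine(caURL,nURLL);
// Stop at end of file instead of looping forever. nURLL == 0 tells the caller
// that no URL= line was found.
if (!inputFile) { break; }
}
}
}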
//
// See above for filename position.
//
void ExtractFilename(char caLine[],char caFilename[],int& nFilenameL)
{
nFilenameL = 0;
for (int n = 39; n <=50; n++)
{
switch (caLine[n])
{
// Eliminate forward spaces in filename using ' ' -> break.
case ' ': break;
default :
caFilename[nFilenameL] = caLine[n];
nFilenameL++;
}
}
}
//
// Build a string out of an array of characters
//
void BuildString(char cFilename[], int nFilenameL, string& strFilename)
{
// assign() copies exactly nFilenameL characters and also copes with a length of zero.
strFilename.assign(cFilename, nFilenameL);
}
//
// Reminder area for previous functions
//
// bool bEOLN(char ch)
// void ClearLine(char caLine[], int& nLineL)
// void ReadLine(istream& inputFile, char caLine[], int& nLineL)
// void WriteLine(ostream& outputFile, char caLine[], int nLineL)
// void GetURL(istream& inputFile, char caURL[], int& nURLL)
// void ExtractFilename(char caLine[],char caFilename[],int& nFilenameL)
// void BuildString(char cFilename[], int nFilenameL, string& strFilename)
//
int main(int nNumberofArgs, char* pszArgs[])
{
// Initialize main input and output files and check for good input.
ifstream in_stream("dirlist.txt", ios_base::in);
if (!in_stream)
{
cout << "Could not open file. Exiting.";
exit(1);
}
//
// End of initialization.
//
// Main program body.
int nTempL = 0; // Temporary Line Length
int nURL = 0; // URL length
int nfilenameL = 0; // filename length (the same for both read and write)
char caTempLine[MaxArrayChars]; // Temporary Line - an array of characters.
char caULine[MaxArrayChars]; // Holds the .URL link text.
char caTempFN[MaxArrayChars]; // Holds the .URL filename. Use standard array even though only 12 spaces required.
string FNstr; // An assembled string from array characters.
int nEndofList = 0; // Flag to indicate we are through processing directory lines.
string sTempFN1; // The identifier for each .URL to be read.
string sTempFN2; // The identifier for each .URL to be written.
//
// Dispose of first five lines in the "dirlist.txt" file.
for (int n = 1; n <= 5; n++) { ReadLine(in_stream,caTempLine,nTempL); }
//
// If the file lacks .url entries, the first character is an "F" as in "File not found".
// Otherwise it will be a number. A space in the first position signals the end of the
// .url entries. When nEndofList is found, value increases and loop is terminated.
//
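for (;;)
{
ClearLine(caTempLine,nTempL);
ReadLine(in_stream,caTempLine,nTempL);
if (nTempL != 0)
{
switch (caTempLine[0])
{
case 'F':
cout << "No .url files found.\n";
nEndofList = 2;
exit(1);
case ' ':
cout << "End of .url listing found. Normal termination.\n";
nEndofList = 1;
break;
default:
ExtractFilename(caTempLine,caTempFN,nfilenameL);
BuildString(caTempFN,nfilenameL,FNstr);
ifstream sTempFN1(FNstr.c_str(), ios::in);
GetURL(sTempFN1,caULine,nURL);
sTempFN1.close();
//
// Rewrite the file only if a URL= line was actually found; otherwise leave it untouched.
if ((nURL >= 3) && (caULine[0] == 'U') && (caULine[1] == 'R') && (caULine[2] == 'L'))
{
ofstream sTempFN2(FNstr.c_str(), ios::out);
sTempFN2 << "[InternetShortcut]\n";
WriteLine(sTempFN2,caULine,nURL);
sTempFN2.flush();
sTempFN2.close();
}
else { cout << "No URL= line found in \"" << FNstr << "\". File left unchanged.\n"; }
}
if (nEndofList > 0) { break; }
}
else if (!in_stream) { break; } // Ran off the end of dirlist.txt without finding the summary line.
else { cout << "Blank line encountered, skipping...\n"; }
}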
in_stream.close();
return 0;
}

What makes the above code look funny is the lack of proper indentation.  Bless HTML.  Copy and paste and re-establish the indenting per your preference.