10 Mar

Number of times a word appears in a string

Occasionally I get awkward data. Usually it is text data of some sort that came from a spreadsheet or Word document and I needed to get it into a database. To be honest, sometimes even when it showed up clean I would do this.

What I would do is take the data, drop it in Excel, or a text editor (Notepad++ is great for this), convert it to an INSERT statement and load it. Yeah, there are alternatives but if I were working at my desktop and wanted to quickly get it on a website, it felt easier to just convert it, drop the INSERT statements into phpMyAdmin and roll with it. It’s really easy to do this with an editor that can read CR/LF like Notepad++.

I’m sure my lizard friend speaks for many, as there are other ways, but now we’re going to use my awkward approach to things, for a noble purpose: to count the number of times a word appears in a string.

DECLARE @string nvarchar(4000) = 
'Now   is the time''s for all good men to come to the aid of their country';
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp;
CREATE TABLE #tmp
(
	myWord nvarchar(4000)
);
SET @string = LTRIM(RTRIM(@string));
SET @string = REPLACE(@string, '  ','');
SET @string = REPLACE(@string, ',','');
SET @string = REPLACE(@string, '.','');
SET @string = REPLACE(@string, '.','');
SET @string = REPLACE(@string, '''',''''''); 
SET @string = 'INSERT INTO #tmp VALUES (''' + REPLACE(@string,' ','''),(''') + ''');'

EXEC SP_EXECUTESQL  @string;
PRINT @string; 

INSERT INTO #tmp VALUES ('Now'),('is'),('the'),('time''s'),('for'),('all'),('good'),('men'),
('to'),('come'), ('to'),('the'),('aid'),('of'),('their'),('country');

A lot of things happened in that code.

  1. We create a string with extra spaces and an apostrophe for testing purposes.
  2. Create a temporary table.
  3. Remove a parts of the string with consecutive spaces.
  4. Remove periods, commas and quotes from the string.  There’s probably more but this will do for this example.
  5. Keep the double-quotes in the string.
  6. Replace the spaces in the string with ‘,’ and append an INSERT statement around the replaced data.
  7. Execute the INSERT statement.

You can see the INSERT statement in the example.

Now we’ll run some queries against the #tmp table.

SELECT *
FROM #tmp;

myWord
Now
is
the
time's
for
all
good
men
to
come
to
the
aid
of
their
country

SELECT myWord, COUNT(*) AS noTimes
FROM #tmp
GROUP BY myWord
HAVING COUNT(*) > 1;

myWord	noTimes
the	2
to	2

SELECT COUNT(*) AS wordCount
FROM #tmp;

wordCount
16

DROP TABLE #tmp;

And guess what, we have code that counts the number of words in a string. There are a lot of restrictions, particularly the size of the string, and other alternatives that are better, but for me, and as a thinking exercise, at least, it was kind of screwball fun to write.