Sentence and Word Analysis #2
This is part two of my post on sentence and word analysis. In part one I discussed my motives for analysing the RSS feed in question. In this post I build on my initial findings and present the C# and SQL code that I used to do so.
I have continued to run the RSS reader periodically and now have 284 job descriptions to analyse. I went through the initial results, identified the words and sentences that were irrelevant, and placed them in a keywords table so that I could strip them from my results. This was a lengthy process, as there were nearly a thousand of them to exclude. Because the keywords I was looking for appear in many different permutations, it was evident that I would need to look within the top 100 words and phrases to find the ones I was interested in. I decided to leave in keywords relating to job skills as well as computer languages.
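The counting and exclusion step described above can be sketched roughly as follows. This is a minimal illustration, not the code from RssReader.zip or the 'analyse' stored procedure: the `PhraseCounter` class, its `Count` method, the separator characters, the two-word phrase limit, and the sample data are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PhraseCounter
{
    // Counts every run of 1..maxWords consecutive words in each description.
    // A phrase is skipped only when it matches an excluded entry exactly.
    public static Dictionary<string, int> Count(
        IEnumerable<string> descriptions, ICollection<string> excluded, int maxWords)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string description in descriptions)
        {
            // Full stops are deliberately kept so that ".NET" and "ASP.NET" survive.
            string[] words = description.Split(
                new[] { ' ', '\t', '\r', '\n', ',', ';' },
                StringSplitOptions.RemoveEmptyEntries);

            for (int start = 0; start < words.Length; start++)
            {
                for (int length = 1;
                     length <= maxWords && start + length <= words.Length;
                     length++)
                {
                    string phrase = string.Join(" ", words, start, length);
                    if (excluded.Contains(phrase)) continue;

                    int current;
                    counts.TryGetValue(phrase, out current); // 0 when not yet seen
                    counts[phrase] = current + 1;
                }
            }
        }
        return counts;
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical sample data standing in for the job descriptions.
        var descriptions = new[]
        {
            "C# developer with SQL Server experience",
            "SQL Server and C# required"
        };
        // Hypothetical exclusions standing in for the table of irrelevant words.
        var excluded = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "with", "and", "experience", "required"
        };

        var top = PhraseCounter.Count(descriptions, excluded, 2)
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.OrdinalIgnoreCase)
            .Take(100);

        foreach (var pair in top)
            Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
    }
}
```

Counting overlapping phrases in this way is one explanation for why permutations such as "C# .NET", "C#.NET" and "C# ASP.NET" appear as separate entries, and why it is necessary to look well down the list to find every variant of a skill.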
The top 100 keywords and skills from the analysis of 284 job descriptions are listed below. The analysis took 9.5 minutes to run.
| # | Word | Rank | # | Word | Rank | # | Word | Rank |
|---|------|------|---|------|------|---|------|------|
| 1 | C# | 301 | 35 | CSS | 29 | 69 | structured | 16 |
| 2 | SQL | 206 | 36 | E-commerce | 29 | 70 | Unix | 16 |
| 3 | .NET | 203 | 37 | ASP | 26 | 71 | Website | 16 |
| 4 | Server | 173 | 38 | C# .NET | 26 | 72 | will work | 16 |
| 5 | ASP.NET | 129 | 39 | CRM | 26 | 73 | automated | 15 |
| 6 | SQL Server | 122 | 40 | Equities | 26 | 74 | Datawarehouse | 15 |
| 7 | SharePoint | 79 | 41 | RAD | 26 | 75 | Derivative | 15 |
| 8 | Office | 78 | 42 | SQL Server 2005 | 26 | 76 | desk | 15 |
| 9 | Test | 70 | 43 | VBA | 25 | 77 | Equity | 15 |
| 10 | C++ | 69 | 44 | Winforms | 25 | 78 | International | 15 |
| 11 | banking | 63 | 45 | C#. | 23 | 79 | MOSS | 15 |
| 12 | Java | 59 | 46 | C#.NET | 23 | 80 | OLAP | 15 |
| 13 | London | 59 | 47 | Fixed Income | 23 | 81 | VB6 | 15 |
| 14 | Front Office | 55 | 48 | framework | 23 | 82 | ASAP | 14 |
| 15 | XML | 52 | 49 | Quant | 22 | 83 | Back End | 14 |
| 16 | Windows | 47 | 50 | 2 | 21 | 84 | Basic | 14 |
| 17 | Oracle | 45 | 51 | Visual Studio | 21 | 85 | business req. | 14 |
| 18 | tools | 45 | 52 | GUI | 20 | 86 | comm. skills | 14 |
| 19 | database | 44 | 53 | VB.NET | 20 | 87 | document | 14 |
| 20 | Excel | 43 | 54 | Web based | 20 | 88 | experienced C# | 14 |
| 21 | FX | 43 | 55 | Access | 19 | 89 | functional | 14 |
| 22 | MS | 41 | 56 | Cash | 18 | 90 | VB | 14 |
| 23 | HTML | 40 | 57 | digital | 18 | 91 | .NET Framework | 13 |
| 24 | C# ASP.NET | 39 | 58 | Finance | 18 | 92 | .NET 3.5 | 13 |
| 25 | life cycle | 38 | 59 | AJAX | 17 | 93 | ASP.NET C# | 13 |
| 26 | C# Developer | 36 | 60 | Biztalk | 17 | 94 | ASP.Net Developer | 13 |
| 27 | Reporting | 36 | 61 | Excel VBA | 17 | 95 | C# ASP.net SQL | 13 |
| 28 | analyst | 34 | 62 | media | 17 | 96 | degree | 13 |
| 29 | JavaScript | 33 | 63 | Security | 17 | 97 | MS SQL | 13 |
| 30 | 3.5 | 32 | 64 | ASP.Net SQL | 16 | 98 | Rates FX | 13 |
| 31 | agile | 31 | 65 | CMS | 16 | 99 | Reporting Services | 13 |
| 32 | architecture | 31 | 66 | credit derivatives | 16 | 100 | Siebel | 13 |
| 33 | communication | 31 | 67 | Silverlight | 16 | | | |
| 34 | .NET developer | 29 | 68 | Sophis | 16 | | | |
A link to a backup of the database may be found here: jobs.zip (392.41 kb)
You will need to restore this into SQL Server before the .NET code (below) will work.
A link to the .NET code (C#) is here: RssReader.zip (3.48 kb)
You will need to modify the App.Config file to point to your RSS feed and database.
To run the analysis on the sentences in the database, you'll need to execute the 'analyse' stored procedure. Once that has finished executing, select from the 'analysis_results' view to see the results.
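If you would rather run those two steps from code than from a query window, something along the following lines should work. It is only a sketch: the connection string (including the database name `jobs`) is a placeholder, the procedure is assumed to take no parameters, and because the columns of the view are not listed here the loop simply prints whatever comes back. The command timeout is disabled because the default for `SqlCommand` is 30 seconds, far shorter than the 9.5 minutes the analysis took.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class RunAnalysis
{
    static void Main()
    {
        // Assumption: placeholder connection string; point it at the restored database.
        const string connectionString =
            "Data Source=localhost;Initial Catalog=jobs;Integrated Security=True";

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // Step 1: execute the 'analyse' stored procedure (assumed parameterless).
            using (var analyse = new SqlCommand("analyse", connection))
            {
                analyse.CommandType = CommandType.StoredProcedure;
                analyse.CommandTimeout = 0; // 0 = wait indefinitely
                analyse.ExecuteNonQuery();
            }

            // Step 2: read the 'analysis_results' view and print each row tab-separated.
            using (var select = new SqlCommand("SELECT * FROM analysis_results", connection))
            using (SqlDataReader reader = select.ExecuteReader())
            {
                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    string[] cells = Array.ConvertAll(values, v => Convert.ToString(v));
                    Console.WriteLine(string.Join("\t", cells));
                }
            }
        }
    }
}
```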