
Using the ATL Regular Expression Library

May 31, 2019 - JavaScript
1. Key header file

    #include <atlrx.h>

This header ships with VS2005. In VS 2008, part of the ATL code was split off into a separate open-source project, so to use ATL features such as regular expressions you have to download that project manually.

Reference: http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=306398
Download: http://www.codeplex.com/AtlServer

Extract the download to a directory, for example c:\alt\. In VS, open [Tools]--[Options]--[Projects and Solutions]--[VC++ Directories], select [Include files] in the drop-down at the upper right, and add c:\alt\include.

2. CAtlRegExp and GRETA

Neither supports quantifiers of the form {m,n}, whereas Boost does.

3. Sub-matches

Another point worth noting: ATL uses braces ({}) to mark a sub-match (group). Group indices start at 0.

4. Key classes and structures

(1) The CAtlRegExp class. Declaration:

    template <class CharTraits=CAtlRECharTraits>
    class CAtlRegExp;

(2) The CAtlREMatchContext class. Declaration:

    template <class CharTraits=CAtlRECharTraits>
    class CAtlREMatchContext

(3) CAtlREMatchContext<>::MatchGroup

Code 1. Note that this code only works when the project uses the multi-byte character set, not UNICODE.

    #include <iostream>
    #include <cstring>
    #include <afxwin.h>
    #include <atlrx.h>
    using namespace std;

    int main(int argc, char* argv[]) {
        CAtlRegExp<> re;
        CAtlREMatchContext<> mc;

        const char* szIn = "98a76";
        char szMatch[128];
        memset(szMatch, '\0', sizeof(szMatch));
        re.Parse("[0-9][0-9]");
        // Each successful Match() moves szIn past the text just matched.
        while (re.Match(szIn, &mc, &szIn)) {
            ptrdiff_t nLen = mc.m_Match.szEnd - mc.m_Match.szStart;
            strncpy(szMatch, mc.m_Match.szStart, nLen);
            szMatch[nLen] = '\0';  // strncpy does not terminate the copy
            cout << szMatch << endl;
        }
        return 0;
    }

    /*
    In my project I wrote it like this:

        CString strMatch;
        int nLen = (int)(mg.szEnd - mg.szStart);
    #ifdef _UNICODE
        wcsncpy(strMatch.GetBuffer(nLen), mg.szStart, nLen);
    #else
        strncpy(strMatch.GetBuffer(nLen), mg.szStart, nLen);
    #endif
        // Pass the length explicitly: the copied text is not null-terminated.
        strMatch.ReleaseBuffer(nLen);
    */

Here is the sample from MSDN: http://msdn.microsoft.com/zh-cn/library/k3zs4axe(VS.80).aspx
Note how sub-match groups and the GetMatch() method of CAtlREMatchContext<> are used. The rest is not covered in detail here.

    // catlregexp_class.cpp
    #include <afx.h>
    #include <atlrx.h>
    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        CAtlRegExp<> reUrl;
        // Five match groups: scheme, authority, path, query, fragment
        REParseError status = reUrl.Parse(
            "({[^:/?#]+}:)?(//{[^/?#]*})?{[^?#]*}(?{[^#]*})?(#{.*})?" );

        if (REPARSE_ERROR_OK != status)
        {
            // Unexpected error.
            return 0;
        }

        CAtlREMatchContext<> mcUrl;
        if (!reUrl.Match(
            "http://search.microsoft.com/us/Search.asp?qu=atl&boolean=ALL#results",
            &mcUrl))
        {
            // Unexpected error.
            return 0;
        }

        for (UINT nGroupIndex = 0; nGroupIndex < mcUrl.m_uNumGroups;
             ++nGroupIndex)
        {
            const CAtlREMatchContext<>::RECHAR* szStart = 0;
            const CAtlREMatchContext<>::RECHAR* szEnd = 0;
            mcUrl.GetMatch(nGroupIndex, &szStart, &szEnd);

            ptrdiff_t nLength = szEnd - szStart;
            // "%.*s" expects an int precision, so cast the length.
            printf_s("%u: \"%.*s\"\n", nGroupIndex, (int)nLength, szStart);
        }

        return 0;
    }
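Since CAtlRegExp has no {m,n} quantifier, a bounded repetition has to be spelled out by hand. The following is a minimal sketch of one way to generate such a pattern. expandRepeat is a hypothetical helper, not part of ATL, and it assumes that atom is a single unit (one character, a bracket expression, or a parenthesized group) and that ? keeps its usual zero-or-one meaning.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ATL): rewrites atom{m,n} as m mandatory
// copies of atom followed by (n - m) optional copies, using only '?'.
// Assumption: atom is a single unit such as "a", "[0-9]" or "(ab)".
std::string expandRepeat(const std::string& atom, unsigned m, unsigned n)
{
    if (n < m)
        throw std::invalid_argument("expandRepeat: n must be >= m");

    std::string pattern;
    for (unsigned i = 0; i < m; ++i)
        pattern += atom;          // copies that must match
    for (unsigned i = m; i < n; ++i)
        pattern += atom + "?";    // copies that may match
    return pattern;
}
```

For example, expandRepeat("[0-9]", 2, 4) produces [0-9][0-9][0-9]?[0-9]?, which could then be passed to CAtlRegExp<>::Parse. The generated pattern grows linearly with n, so this only suits small bounds.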

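1. Download the library from the RegexKitLite site. The archive unpacks to two example packages and one document, but only two files are actually needed (RegexKitLite.h and RegexKitLite.m). Add them to your project.

2. Add libicucore.dylib to the project's frameworks.

Friendly reminder: most build errors after importing RegexKitLite happen because this library has not been added. Once it is linked, the build succeeds.

3. Every NSString object can now call the RegexKitLite methods.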

NSString *email = @"kkk@aaa.com";

[email isMatchedByRegex:@"\\b([a-zA-Z0-9%_.+\\-]+)@([a-zA-Z0-9.\\-]+?\\.[a-zA-Z]{2,6})\\b"];

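This returns YES, which means the string has the form of an email address. Note that the regular expression syntax used by RegexKitLite differs slightly from what is described on the wiki.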
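For comparison, the same check can be sketched in portable C++. This uses std::regex (ECMAScript grammar) as a stand-in, not RegexKitLite or ICU, and looksLikeEmail is a hypothetical name. The hyphen is moved to the end of each character class instead of being escaped. Otherwise the pattern is the one shown above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if text contains something shaped like
// an email address. std::regex stands in for RegexKitLite here.
bool looksLikeEmail(const std::string& text)
{
    static const std::regex emailPattern(
        R"(\b([a-zA-Z0-9%_.+-]+)@([a-zA-Z0-9.-]+?\.[a-zA-Z]{2,6})\b)");
    // regex_search mirrors isMatchedByRegex: the match may start anywhere.
    return std::regex_search(text, emailPattern);
}
```

Calling looksLikeEmail("kkk@aaa.com") returns true. Like the original pattern, this only checks the shape of the string and does not verify that the address exists.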

NSString *searchString = @"http://www.example.com:8080/index.html";

NSString *regexString  = @"\\bhttps?://[a-zA-Z0-9\\-.]+(?::(\\d+))?(?:(?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";

NSInteger portInteger = [[searchString stringByMatching:regexString capture:1L] integerValue];

NSLog(@"portInteger: '%ld'", (long)portInteger);

// 2008-10-15 08:52:52.500 host_port[8021:807] portInteger: '8080'

This is an example of extracting part of an HTTP URL from a string (here, capture group 1, the port number).
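The capture-group lookup can be sketched in C++ as well. Again std::regex is a stand-in for RegexKitLite, extractPort is a hypothetical name, and the pattern is shortened to the scheme, host, and optional port, with the path part of the original left out.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the port of the first http/https URL found
// in text, or -1 when there is no URL or the URL has no explicit port.
int extractPort(const std::string& text)
{
    static const std::regex urlPattern(
        R"(\bhttps?://[a-zA-Z0-9.-]+(?::(\d+))?)");
    std::smatch match;
    // Group 1 is the digits after ':'; it is unmatched if the port is absent.
    if (!std::regex_search(text, match, urlPattern) || !match[1].matched)
        return -1;
    return std::stoi(match[1].str());
}
```

With the URL from the example, extractPort("http://www.example.com:8080/index.html") returns 8080. The (?:...) group keeps the colon out of the captured text, which is why group 1 holds only the digits.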

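Below are some commonly used regular expression constructs (these come straight from the RegexKitLite website, reproduced here in case readers are too lazy to look them up).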

Character                   Description

\a                          Match a BELL, \u0007.
\A                          Match at the beginning of the input. Differs from ^ in that \A will not match after a new-line within the input.
\b, outside of a [Set]      Match if the current position is a word boundary. Boundaries occur at the transitions between word \w and non-word \W characters, with combining marks ignored. See also: RKLUnicodeWordBoundaries
\b, within a [Set]          Match a BACKSPACE, \u0008.
\B                          Match if the current position is not a word boundary.
\cx                         Match a Control-x character.
\d                          Match any character with the Unicode General Category of Nd (Number, Decimal Digit).
\D                          Match any character that is not a decimal digit.
\e                          Match an ESCAPE, \u001B.
\E                          Terminates a \Q…\E quoted sequence.
\f                          Match a FORM FEED, \u000C.
\G                          Match if the current position is at the end of the previous match.
\n                          Match a LINE FEED, \u000A.
\N{Unicode Character Name}  Match the named Unicode Character.
\p{Unicode Property Name}   Match any character with the specified Unicode Property.
\P{Unicode Property Name}   Match any character not having the specified Unicode Property.
\Q                          Quotes all following characters until \E.
\r                          Match a CARRIAGE RETURN, \u000D.
\s                          Match a white space character. White space is defined as [\t\n\f\r\p{Z}].
\S                          Match a non-white space character.
\t                          Match a HORIZONTAL TABULATION, \u0009.
\uhhhh                      Match the character with the hex value hhhh.
\Uhhhhhhhh                  Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\w                          Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
\W                          Match a non-word character.
\x{h…}                      Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh                        Match the character with two digit hex value hh.
\X                          Match a Grapheme Cluster.
\Z                          Match if the current position is at the end of input, but before the final line terminator, if one exists.
\z                          Match if the current position is at the end of input.
\n (n is a number)          Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern. Note: Octal escapes, such as \012, are not supported.
[pattern]                   Match any one character from the set. See ICU Regular Expression Character Classes for a full description of what may appear in the pattern.
.                           Match any character.
^                           Match at the beginning of a line.
$                           Match at the end of a line.
\                           Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /
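As a small illustration of \b, \w, \s and a back reference working together, here is a sketch that finds the first word that appears twice in a row. It uses std::regex (ECMAScript grammar) as a stand-in for ICU, so \w covers only ASCII letters, digits and underscore instead of the Unicode categories listed in the table. firstRepeatedWord is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately repeated
// (for example "is" in "this is is a test"), or an empty string if none.
std::string firstRepeatedWord(const std::string& text)
{
    // \b anchors at word boundaries, (\w+) captures a word,
    // \s+ skips the gap, and \1 requires the same word again.
    static const std::regex repeated(R"(\b(\w+)\s+\1\b)");
    std::smatch match;
    return std::regex_search(text, match, repeated) ? match[1].str()
                                                    : std::string();
}
```

The leading \b stops the pattern from starting in the middle of a word, so the "is" inside "this" does not count as the first half of a repeat.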

Operators

Operator            Description

|                   Alternation. A|B matches either A or B.
*                   Match zero or more times. Match as many times as possible.
+                   Match one or more times. Match as many times as possible.
?                   Match zero or one times. Prefer one.
{n}                 Match exactly n times.
{n,}                Match at least n times. Match as many times as possible.
{n,m}               Match between n and m times. Match as many times as possible, but not more than m.
*?                  Match zero or more times. Match as few times as possible.
+?                  Match one or more times. Match as few times as possible.
??                  Match zero or one times. Prefer zero.
{n}?                Match exactly n times.
{n,}?               Match at least n times, but no more than required for an overall pattern match.
{n,m}?              Match between n and m times. Match as few times as possible, but not less than n.
*+                  Match zero or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails. Possessive match.
++                  Match one or more times. Possessive match.
?+                  Match zero or one times. Possessive match.
{n}+                Match exactly n times. Possessive match.
{n,}+               Match at least n times. Possessive match.
{n,m}+              Match between n and m times. Possessive match.
(…)                 Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.
(?:…)               Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses.
(?>…)               Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the (?>.
(?#…)               Free-format comment (?#comment).
(?=…)               Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?!…)               Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<=…)              Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?<!…)              Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?ismwx-ismwx:…)    Flag settings. Evaluate the parenthesized expression with the specified flags enabled or -disabled.
(?ismwx-ismwx)      Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match.

See also: Regular Expression Options
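The difference between the greedy and the "as few as possible" quantifiers, and the effect of a look-ahead, is easiest to see on a concrete input. The sketch below uses std::regex (ECMAScript grammar) as a stand-in for ICU, and firstMatch is a hypothetical helper. Possessive quantifiers, atomic groups and look-behind are left out because std::regex does not support them.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of pattern in
// text, or an empty string when nothing matches.
std::string firstMatch(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch match;
    return std::regex_search(text, match, re) ? match[0].str()
                                              : std::string();
}
```

On the input <b>bold</b>, the greedy pattern <.+> matches the whole string, while the lazy pattern <.+?> stops at the first closing bracket and yields <b>. On the input "price: 42 USD", the pattern \d+(?= USD) yields 42: the look-ahead requires " USD" to follow but does not include it in the match.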


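Also pay attention to escape characters. When you copy a pattern in Safari, it is converted automatically (a thoughtful touch by the site).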

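A conversion tool is also provided to help with testing in Safari. The download (linked from the original post) may be a little slow, so please be patient.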
