Frequently Asked Questions
General
This website is a programmer's tool for generating regular expressions from text. You can input any string and use the generated code in PHP, Perl, Java, JavaScript, C++, C#, Visual Basic .NET, Pascal, Python, etc. Instead of trying to build the regular expression yourself, you start off with the string that you want to search. You paste the text into the site, click Show Matches, and the site finds recognisable patterns in your string. You then select the patterns that you are interested in, and it writes a fully fledged program that extracts those patterns from that string. Finally, you copy the program into your editor or IDE and adapt it to fit into your own program.
I created this site to fill the gap left by txt2re, which I used to use. txt2re.com has been offline for quite some time; the current txt2re.com is nothing but a shell and seems to have been taken over by a domain squatter. Working out a regular expression by hand can be time-consuming, so I've decided to make this tool publicly available free of charge and with no ads, like the original txt2re. Since I work with more than one programming language, and each language seems to have a different way of doing things, I made the output available in several languages. Hopefully this is beneficial for others as well.
Generated expressions always follow the same form. If you are looking for the second integer from the left, for example, this system will output a regex that finds:
1. non-integer text, which I call 'filler'
2. the first integer
3. more non-integer text
4. the second integer
Only the second integer is bracketed (captured) for extraction. In the generated code, you can see it building this expression piece by piece. See this example to see what I mean.
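As a rough illustration of that shape, here is a minimal Java sketch that assembles the four pieces for the "second integer" case. It is not the site's generated code; the class name `SecondInteger`, the method name and the exact sub-patterns (`\D*` for the leading filler, `\D+` between the integers) are assumptions chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondInteger {
    // The four pieces, in the order described above (sub-patterns are assumptions).
    static final String FILLER_BEFORE = "\\D*";   // 1. non-integer text ("filler"), possibly empty
    static final String FIRST_INT = "\\d+";       // 2. the first integer, not captured
    static final String FILLER_BETWEEN = "\\D+";  // 3. non-integer text separating the integers
    static final String SECOND_INT = "(\\d+)";    // 4. the second integer, bracketed for extraction

    static final Pattern PATTERN =
            Pattern.compile(FILLER_BEFORE + FIRST_INT + FILLER_BETWEEN + SECOND_INT);

    /** Returns the second integer in text, or null if there are fewer than two. */
    public static String extractSecondInteger(String text) {
        Matcher m = PATTERN.matcher(text);
        // lookingAt() anchors the match at the start of the string, so the
        // leading filler really is everything before the first integer.
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractSecondInteger("order 12 of 168 items")); // prints 168
    }
}
```

Requiring at least one non-digit between the two integers stops the engine from splitting a single number such as 12 into "1" and "2" when the string contains only one integer.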
Most tools start with the regular expression and provide a graphical interface, instead of a text-based one, to help you build it. That is just as cumbersome and difficult as typing the regular expression into an editor, and I've never seen the big advantage. Txt2regex, on the other hand, takes a fundamentally different approach: it starts with the string to be searched.
Another site with similar functionality is https://regex-generator.olafneumann.org.
I've added quite a few more patterns, including Unicode text (including the smiley subset). It can also detect whether the input contains Chinese or Arabic characters. The original txt2re.com could not detect Unicode strings.
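One way to detect Chinese or Arabic characters in Java is with the Unicode script properties that `java.util.regex.Pattern` supports (`\p{IsHan}` and `\p{IsArabic}`). This is a minimal sketch, not how this site does it; the class name `ScriptDetect` and the returned labels are hypothetical. Note that the Han script also covers Japanese kanji, so this check alone cannot tell Chinese from Japanese.

```java
import java.util.regex.Pattern;

public class ScriptDetect {
    // Unicode script properties supported by java.util.regex.Pattern.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");
    private static final Pattern ARABIC = Pattern.compile("\\p{IsArabic}");

    /** Labels the input by the scripts it contains; the labels themselves are arbitrary. */
    public static String detect(String text) {
        boolean han = HAN.matcher(text).find();
        boolean arabic = ARABIC.matcher(text).find();
        if (han && arabic) return "mixed";
        if (han) return "chinese";   // Han also matches Japanese kanji
        if (arabic) return "arabic";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(detect("\u4f60\u597d"));                   // chinese
        System.out.println(detect("\u0645\u0631\u062d\u0628\u0627")); // arabic
        System.out.println(detect("hello"));                          // other
    }
}
```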
I can't include every possible pattern in the database, and I have tried to steer clear of patterns that are unlikely to be commonly used. If you think that your particular pattern warrants inclusion, let me know via the feedback.
Let's say you are reading lines from a file in Java, and you want to extract a filename, an amount and a date from each line:
f:\db4\q1\A404.rt,3500.0000,0,01/11/2005,,J NEVILLE.,49.00,11/01/2005,Notes 3.11
f:\db4\q1\A482.rt,1500.0000,0,01/21/2006,,A MONTGOMERY.,24.00,19/01/2002,Amendment #4
f:\db4\q1\A414.rt,3500.0000,0,01/22/2006,,J DUNNE.,19.00,02/01/2001,Amendment #193
f:\db4\q1\A443.rt,4500.0000,0,01/22/2006,,H JEFFERSON.,33.00,24/01/2005,NA
f:\db4\q1\A443.rt,3500.0000,0,01/23/2006,,C ENNIS.,90.00,08/01/2000,Real #30
f:\db4\q1\A4476.rt,1500.0000,0,01/24/2006,,P DOYLE.,C WILLS.,19.00,02/01/2003,Amendment #4
My old way (60 minutes):
1. Find an example of how Java Regex works.
2. Try a simple expression to extract the Windows filename field.
3. Work out that the dots in the filename need to be escaped in the regular expression string.
4. Work out that the backslashes in the path need to be escaped also.
5. Work out that the backslashes need to be doubly escaped - once for Java and once for the regular expressions.
6. Match the amount column, which has an integer on its left-hand side and a date on its right.
7. Match the date column, which has the amount on its left.
8. Work out whether the forward slashes in the date expression need to be escaped (in a Java regex they don't).
9. Test, test, test.
With txt2regex (5 minutes):
1. Paste a line from the file into txt2regex.
2. Click on the 'windows_filename_no_spaces' pattern, the second 'real_number' pattern and the second 'dd-mm-yyyy' pattern.
3. Click on Java.
4. Integrate the logic into your program.
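For comparison, the program produced in step 3 builds one expression from the selected patterns with filler between them. The Java below is a hand-written sketch of that idea, not the site's actual output: the sub-patterns standing in for 'windows_filename_no_spaces', 'real_number' and 'dd-mm-yyyy', and the names `ExtractFields` and `extract`, are assumptions. Because the second real number and the second date are wanted, one uncaptured real number and one uncaptured date are matched first, in the order they appear in the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractFields {
    // Assumed stand-ins for the site's named patterns (not its real definitions).
    static final String FILENAME = "([a-zA-Z]:\\\\[^\\s,]+)"; // drive letter, colon, backslash, no spaces/commas
    static final String REAL = "\\d+\\.\\d+";                 // digits, a dot, digits
    static final String DATE = "\\d{2}/\\d{2}/\\d{4}";        // two digits / two digits / four digits
    static final String FILLER = ".*?";                       // lazy filler between items

    // Filename, then real #1 and date #1 (uncaptured), then real #2 and date #2 (captured),
    // in the order they occur in the line.
    static final Pattern LINE = Pattern.compile(
            FILENAME + FILLER + REAL + FILLER + DATE
            + FILLER + "(" + REAL + ")" + FILLER + "(" + DATE + ")");

    /** Returns {filename, amount, date}, or null if the line does not fit the pattern. */
    public static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "f:\\db4\\q1\\A404.rt,3500.0000,0,01/11/2005,,J NEVILLE.,49.00,11/01/2005,Notes 3.11",
            "f:\\db4\\q1\\A4476.rt,1500.0000,0,01/24/2006,,P DOYLE.,C WILLS.,19.00,02/01/2003,Amendment #4"
        };
        for (String line : lines) { // in real use these lines would be read from the file
            String[] fields = extract(line);
            if (fields != null) {
                System.out.println(String.join(" | ", fields));
            }
        }
    }
}
```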
Say you are trying to match a backslash in a PHP program and are having some difficulty. You paste a backslash into the text area, click Submit, select the backslash and select PHP. You then have a complete PHP program that correctly matches a backslash. Copy and paste!
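Steps 3 to 5 of the old way, and the backslash question above, come down to the same escaping rules. Here is a minimal Java sketch of them (the class and constant names are hypothetical): a literal backslash is `\\` in a regex and therefore `\\\\` in a Java string literal, an unescaped `.` matches any character, and `Pattern.quote` avoids hand escaping altogether.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Hand-escaped: each path backslash is \\ in the regex, i.e. four backslashes
    // in the Java literal; the dot is escaped so it only matches a real dot.
    public static final String MANUAL = "f:\\\\db4\\\\q1\\\\A404\\.rt";
    // Pattern.quote wraps the text in \Q...\E, so nothing inside needs escaping.
    public static final String QUOTED = Pattern.quote("f:\\db4\\q1\\A404.rt");

    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String line = "f:\\db4\\q1\\A404.rt,3500.0000,0,01/11/2005"; // the text f:\db4\q1\A404.rt,...
        System.out.println(found(MANUAL, line));           // true
        System.out.println(found(QUOTED, line));           // true
        System.out.println(found("A404.rt", "A404Xrt"));   // true: an unescaped dot matches any character
        System.out.println(found("A404\\.rt", "A404Xrt")); // false
    }
}
```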
Say your string contains the number 168. This is matched both as an integer and as the literal 168, and the system displays both. If you click on the integer, the system determines how many integers are to the left of the 168 you clicked on. It then outputs a program that matches each of those, followed by an extraction of the one you clicked on. If you had clicked on the literal 168 instead, it would count how many occurrences of 168 are to the left of the one you clicked on, and so on.
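The counting described above can be sketched in Java as follows. This is an assumption about the approach, not the site's code: `NthMatchBuilder`, `build` and `extractedOffset` are hypothetical names, the clicked occurrence is identified by its character offset, and lazy `.*?` filler is used between occurrences. For the literal case the sub-pattern is `Pattern.quote("168")`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NthMatchBuilder {
    /**
     * Hypothetical helper: counts the occurrences of subPattern that start before
     * clickedOffset, emits one "filler + occurrence" pair for each, then the clicked
     * occurrence in a capturing group. Assumes subPattern has no groups of its own.
     */
    public static String build(String text, String subPattern, int clickedOffset) {
        Matcher m = Pattern.compile(subPattern).matcher(text);
        int before = 0;
        while (m.find() && m.start() < clickedOffset) {
            before++;
        }
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < before; i++) {
            regex.append(".*?").append(subPattern);
        }
        regex.append(".*?(").append(subPattern).append(")");
        return regex.toString();
    }

    /** Start offset of the extracted occurrence, or -1 if nothing matched. */
    public static int extractedOffset(String text, String subPattern, int clickedOffset) {
        Matcher m = Pattern.compile(build(text, subPattern, clickedOffset)).matcher(text);
        return m.lookingAt() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String text = "Room 12, floor 168, code 168";
        int clicked = text.lastIndexOf("168"); // the user clicks the last 168
        System.out.println(build(text, "\\d+", clicked));               // clicked as an integer
        System.out.println(build(text, Pattern.quote("168"), clicked)); // clicked as the literal 168
    }
}
```

On that sample string, clicking the last 168 as an integer yields `.*?\d+.*?\d+.*?(\d+)` (two integers to its left), whereas clicking it as the literal yields `.*?\Q168\E.*?(\Q168\E)` (one earlier 168).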
I've added Unicode support. The original txt2re.com didn't have any Unicode support and couldn't detect non-ASCII characters correctly. For instance, if you type Lüdenscheid Pražští filharmonici Île-de-France, it will recognize the characters.
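The difference Unicode support makes is easy to see in Java by comparing an ASCII-only character class with the Unicode letter category `\p{L}`. This is a minimal sketch with a hypothetical class name, not the site's implementation. The sample text is written with Java Unicode escapes to avoid source-encoding problems, and the sketch assumes precomposed characters (decomposed text would also need `\p{M}` for the combining marks).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWords {
    /** Returns every run of text matched by regex, in order. */
    public static List<String> words(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Luedenscheid Prazsti filharmonici Ile-de-France" with the original accents
        String text = "L\u00fcdenscheid Pra\u017e\u0161t\u00ed filharmonici \u00cele-de-France";
        // ASCII-only class: words are split at every accented letter
        System.out.println(words("[A-Za-z]+", text)); // [L, denscheid, Pra, t, filharmonici, le, de, France]
        // Unicode letters: the accented letters stay inside their words (6 words)
        System.out.println(words("\\p{L}+", text));
    }
}
```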
This system wasn't designed for newbies! It was designed for programmers who know what they are doing but can't be bothered doing it. If you understand the problem that is being solved, the interface is fine. If you don't, it's gibberish. Do your career a favor and learn how regular expressions work: you will meet problems that this site does not solve!
I realise that a Web 2.0 version of this tool would look great, but I don't care. I spend my professional life caring hugely about how things look, but not here. A half-decent programmer should have no problem with it. I am not trying to win a design award or to help weak students get through regex projects.
It's free because I have been helped so much in my career by the programmers who created free systems like Linux, Apache, PHP and MySQL. This is my way of giving back to the community. The original txt2re.com was also free.
The source code will be released at the end of 2023.
Note: the source code of the original txt2re.com was never made public. I wrote this site in PHP, basing it on Shaun Patterson's code (written in Perl and released into the public domain). I have made numerous fixes to the original code and added Unicode support; compared with Shaun's code, it is roughly a 60% rewrite.